CS203: Advanced Computer Architecture (2021 Fall)

Lecture: MW 9:30a – 10:50a

Where: UCR Campus Olmsted 1208


Instructor

Hung-Wei Tseng
email: htseng @ ucr.edu
Office Hours: MTu 2p-3p @ WCH 406 & Zoom (please find the link in eLearn)

Teaching Assistant

Abenezer Wudenhe
email: awude001 @ ucr.edu
Office Hours: WTh 3p-4p @ Zoom (please find the link in eLearn)

Other important links

Quizzes, Assignments, Grading: eLearn
Discussion Forum on Piazza: https://piazza.com/class/ktq8dff9z053pw
YouTube Channel/Video Archive: https://www.youtube.com/profusagi

Course Overview

This course will describe the basics of modern processor operation and techniques to optimize your applications. Topics include computer system performance, instruction set architectures, pipelining, branch prediction, memory-hierarchy design, and a brief introduction to multiprocessor architecture issues.

Text books

Required: John L. Hennessy & David A. Patterson, Computer Architecture: A Quantitative Approach, 6th Edition, Morgan Kaufmann.
Required: Assigned research papers and other readings throughout the quarter.

Many research papers can only be downloaded from the campus network. For instructions on connecting to the UCR VPN and downloading research papers, please visit https://library.ucr.edu/using-the-library/technology-equipment/connect-from-off-campus

Grading

  • Homework/Class participation 15%
    • Homework will be assigned throughout the course.
  • Class participation (5%)
    • This class uses “peer instruction,” and we REQUIRE each of you to download the Poll Everywhere app or log in to its website during class.
    • You must log in with your UCRNetID@ucr.edu address. If you do not, you will not receive credit.
    • You need to answer at least 50% of the poll questions to receive full credit for class participation.
  • Reading Quizzes 15%
    • We will have reading quizzes on eLearn!
    • 2 attempts; we take the average.
  • Project 10%
    We will have one coding project this quarter. It will be run as a contest, and the winners will receive a prize!
  • Midterm 20%
  • Final 35%
    The final will be cumulative.
  • Additional notes about grades in this course
    • Your scores will be available on eLearn. Your final grade is the weighted average of these grades.
      We do our best to record grades accurately, but you should double-check them.
    • Late submissions: We do not accept late submissions of any kind, including quizzes, assignments, and projects.
    • Errors in grading: If you feel there has been an error in how an assignment or test was graded, you have one week from when the assignment is returned to bring it to our attention. You must submit (via email to the instructor and the appropriate TAs) a written description of the problem. Neither I nor the TAs will discuss regrades without first receiving an email from you about it. For arithmetic errors (adding up points, etc.) you do not need to submit anything in writing, but the one-week limit still applies.
    • For midterm and final: We do not regrade individual problems; we regrade your whole test. The one-week regrading window still applies.
    • Final grades: If you have a problem with your final grade in the course, send me email and we can set up an appointment to discuss it.

Schedule and Slides


Date | Topic | Reading | Slides (Preview) | Slides (Release) | Due | Note
9/27/2021 | Introduction
– G.E. Moore. Cramming More Components Onto Integrated Circuits. Electronics, pp. 114–117, April 19, 1965.

– Chapter 1.1-1.6

– John L. Hennessy and David A. Patterson. 2019. A new golden age for computer architecture. Commun. ACM 62, 2

Slides: 1 Intro

Demo


9/29/2021 | Performance Evaluation (I)
– Chapter 1.3 & 1.8-1.9
Slides: Performance (Preview); 2 Performance

Demo
Reading Quiz #1
10/4/2021 | Performance Evaluation (II)
– M. D. Hill and M. R. Marty. Amdahl’s Law in the Multicore Era. in Computer, vol. 41, no. 7, pp. 33-38, July 2008.
– V. Sze, Y.-H. Chen, T.-J. Yang and J. S. Emer. How to Evaluate Deep Neural Network Processors: TOPS/W (Alone) Considered Harmful. In IEEE Solid-State Circuits Magazine, vol. 12, no. 3, pp. 28-41, Summer 2020.
– (Optional) David H. Bailey. Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers. In Andrew Davison (ed.), Humour the Computer, MIT Press, 1995, pp.

Slides: 3 Performance (II)

Demo (Programming Languages)
Reading Quiz #2
10/6/2021 | Performance Evaluation (III)
Slides: 4 Performance (III)

Demo (GPU Sort)

10/11/2021 | Memory Hierarchy (1): The Basics
– Appendix B.1-B.3
Slides: Memory Hierarchy (Preview); 5 Performance (IV) and Memory (I)
Reading Quiz #3
10/13/2021 | Memory Hierarchy (2)
– Appendix B.1-B.3, 2.3
Slides: 6 Memory (II)
Assignment #1
10/18/2021 | Memory Hierarchy (3): Optimizing Cache Performance Applications
– Chapter 2.3

– Norman P. Jouppi. 1990. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. SIGARCH Comput. Archit. News 18, 2SI (June 1990), 364–373.

Slides: 7 Memory (III)
Reading Quiz #4
10/20/2021 | Memory Hierarchy (4): Programmer’s Optimizations
Slides: 8 Memory (IV)

Aligned Access

Data Structure

10/25/2021 | Virtual Memory
– Appendix B.4 & B.5, 2.4

– Barr, Thomas W., Alan L. Cox, and Scott Rixner. “Translation caching: skip, don’t walk (the page table).” ACM SIGARCH Computer Architecture News 38.3 (2010): 48-59.

– Basu, Arkaprava, et al. “Efficient virtual memory for big memory servers.” ACM SIGARCH Computer Architecture News 41.3 (2013): 237-248.

Reading Quiz #5
10/27/2021 | Basic Processor Design
– Appendix C.1 – C.4
Assignment #2
11/01/2021 | Midterm
11/03/2021 | Branch Prediction
– Chapter 3.3

– Appendix C.1 – C.4


Reading Quiz #6
11/08/2021 | Advanced Branch Prediction
– M. Evers, S. J. Patel, R. S. Chappell and Y. N. Patt, “An analysis of correlation and predictability: what makes two-level branch predictors work,” Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235), Barcelona, Spain, 1998, pp. 52-61.

– James E. Smith. Retrospective: a study of branch prediction strategies. ISCA ’98: 25 years of the international symposia on Computer architecture (selected papers), New York, NY, USA, 1998, pages 22-23

– Jiménez, Daniel A., and Calvin Lin. “Dynamic branch prediction with perceptrons.” Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture. IEEE, 2001.

– André Seznec and P. Michaud. A case for (partially) TAgged GEometric history length branch prediction. Journal of Instruction Level Parallelism. June 2006.
Reading Quiz #7
11/10/2021 | OOO Scheduling
– Chapter 3.4

– K. C. Yeager, “The MIPS R10000 superscalar microprocessor,” in IEEE Micro, vol. 16, no. 2, pp. 28-41, April 1996.

– R. E. Kessler, “The Alpha 21264 microprocessor,” in IEEE Micro, vol. 19, no. 2, pp. 24-36, March-April 1999.
Reading Quiz #8
11/15/2021 | OOO Scheduling
Assignment #3
11/17/2021 | OOO Scheduling
Reading Quiz #9
11/22/2021 | SMT & Chip Multiprocessors
– Chapter 3.11

– Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm, ISCA ’96: Proceedings of the 23rd annual international symposium on Computer architecture, New York, NY, USA, 1996, pages 191-202.

– Collins, J. D., Wang, H., Tullsen, D. M., Hughes, C., Lee, Y. F., Lavery, D., & Shen, J. P. (2001, June). Speculative precomputation: Long-range prefetching of delinquent loads. In Proceedings 28th Annual International Symposium on Computer Architecture (pp. 14-25). IEEE.

– Chapter 5.1 – 5.3, 5.5 & 5.6

– The case for a single-chip multiprocessor, Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson, and Kunyung Chang, SIGPLAN Not. 31(9):2-11, 1996.
Reading Quiz #10
11/24/2021 | Chip Multiprocessors & Modern Processors
– (Optional) D. Suggs, M. Subramony and D. Bouvier, “The AMD ‘Zen 2’ Processor,” in IEEE Micro, vol. 40, no. 2, pp. 45-52, March-April 2020, doi: 10.1109/MM.2020.2974217.

– (Optional) J. Doweck et al. Inside 6th-Generation Intel Core: New Microarchitecture Code-Named Skylake. In IEEE Micro, vol. 37, no. 2, pp. 52-62, Mar.-Apr. 2017, doi: 10.1109/MM.2017.38.

Assignment #4
11/29/2021 | Dark Silicon
– Chapter 1.7

– H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam and D. Burger. Dark Silicon and the End of Multicore Scaling. In IEEE Micro, vol. 32, no. 3, pp. 122-134, May-June 2012.

– Rakesh Kumar, Keith Farkas, Norm P. Jouppi, Partha Ranganathan, Dean M. Tullsen. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction. In 36th International Symposium on Microarchitecture, December, 2003.

Reading Quiz #11

Project

12/1/2021 | TPU, FPGA
– Adrian M. Caulfield, Eric S. Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, Daniel Lo, Todd Massengill, Kalin Ovtcharov, Michael Papamichael, Lisa Woods, Sitaram Lanka, Derek Chiou, and Doug Burger. 2016. A cloud-scale acceleration architecture. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-49).

– Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA ’17). Association for Computing Machinery, New York, NY, USA, 1–12. DOI:https://doi.org/10.1145/3079856.3080246

– John L. Hennessy and David A. Patterson. 2019. A new golden age for computer architecture. Commun. ACM 62, 2

Assignment #5
12/10/2021 | Final Exam, 8am-11am



Assignments

Assignment #1
Assignment #2

Integrity Policy

  • Cheating WILL be taken seriously. Doing otherwise is not fair to honest students. It is also not fair to allow the cheater to think that it is a reasonable alternative in life.
  • Please review the UCR student handbook for more details on Academic Integrity.
  • Anyone copying information or having information copied during a test will receive an F for the class and will not be allowed to drop. They will be reported to their college dean. If you can prove non-cooperative copying took place, your grade may be restored, but you must prove it to the dean; I don’t want to be involved. Anyone caught cheating or falsely representing the work of others on the homework will not be allowed to turn in further homework. Your grade will be based exclusively on the tests, with a penalty of 25% OR GREATER applied.
  • We photocopy a random sampling of the exams in order to ensure that students do not modify their tests after they have been returned.
  • Online solutions, etc.: A solutions manual exists for this text. Using it, or any solutions you may find elsewhere on the internet, IS CHEATING and will be dealt with accordingly. We know what the solution manual solutions look like. Homework is a small fraction of your grade, so cheating on it is unproductive.

Public Health Regulation

  • Masks are always required in the classroom.
  • If you have any symptom of