CS203: Advanced Computer Architecture (2020 Fall)

Lecture: MW 9:30a – 10:50a on Zoom/Youtube Live!


Instructor

Hung-Wei Tseng
email: htseng @ ucr.edu
Office Hours: M 8p-9p & W 2p-3p @ Zoom or by appointment

Teaching Assistant

Quan Fan
email: qfan005 @ ucr.edu
Office Hours: F 1p-3p @ Zoom or by appointment

Other important links

Quizzes, Assignments, Grading: iLearn
Discussion Forum on Piazza: https://piazza.com/class/kffrohnk4kw6vo
Youtube Channel/Video Archive: https://www.youtube.com/profusagi

Course Overview

This course will describe the basics of modern processor operation and techniques to optimize your applications. Topics include computer system performance, instruction set architectures, pipelining, branch prediction, memory-hierarchy design, and a brief introduction to multiprocessor architecture issues.

Text books

Required: John Hennessy & David Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 6th Edition, and assigned research papers.
Required: Other assigned readings throughout the quarter.

Many research papers require a campus network connection to download. For instructions on connecting to the UCR VPN and downloading research papers, please visit https://library.ucr.edu/using-the-library/technology-equipment/connect-from-off-campus

Grading

  • Homework/Class participation 15%
    • Homework will be assigned throughout the course.
    • Class participation (Zoom-poll Based)
      • This class uses “peer instruction”, and we encourage you to participate in the live Zoom sessions.
      • You need to answer 50% of the poll questions to receive full credit for class participation.
  • Reading Quizzes 15%
    • We will have reading quizzes on iLearn!
    • You get 2 attempts; we take the average.
  • Project 15%
    We will have one coding project during the quarter. It will be a contest, and you can win a prize for it!
  • Midterm 20%
  • Final 35%
    The final will be cumulative.
  • Additional notes about grades in this course
    • Your score will be available on iLearn. Your final grade is the weighted average of these grades.
      We do our best to record grades accurately, but you should double-check.
    • Late submission: We do not accept any late submissions, including quizzes, assignments, and projects.
    • Errors in grading: If you feel there has been an error in how an assignment or test was graded, you have one week from when the assignment is returned to bring it to our attention. You must submit (via email to the instructor and the appropriate TAs) a written description of the problem. Neither I nor the TAs will discuss regrades without receiving an email from you about it first. For arithmetic errors (adding up points, etc.) you do not need to submit anything in writing, but the one-week limit still applies.
    • For midterm and final: We do not regrade a single problem. We will regrade your whole test. The one-week regrading window still applies.
    • Final grades: If you have a problem with your final grade in the course, send me an email and we can set up an appointment to discuss it.
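
As a rough illustration of the weighted average described above, here is a minimal Python sketch. The weights come from the grading list; everything else is an assumption made for illustration: every component is taken to be on a 0-100 scale, homework and class participation are treated as one combined score (the split within the 15% is not specified), and the names WEIGHTS, quiz_score, and final_grade are hypothetical. It is not the official grade calculation on iLearn.

```python
# Weights taken from the grading list (they sum to 100%).
WEIGHTS = {
    "homework_participation": 0.15,
    "reading_quizzes": 0.15,
    "project": 0.15,
    "midterm": 0.20,
    "final": 0.35,
}


def quiz_score(attempts):
    """Average the attempts made on one reading quiz (up to 2 attempts).

    Assumption: only attempts actually made are averaged.
    """
    return sum(attempts) / len(attempts)


def final_grade(scores):
    """Weighted average of component scores (each assumed to be 0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student, for illustration only.
example = {
    "homework_participation": 90.0,
    "reading_quizzes": quiz_score([80.0, 100.0]),  # two attempts, averaged
    "project": 85.0,
    "midterm": 70.0,
    "final": 80.0,
}
print(f"{final_grade(example):.2f}")
```

For the hypothetical scores above, the weighted sum is 13.5 + 13.5 + 12.75 + 14 + 28 = 81.75. You can use the same arithmetic to double-check the grade recorded on iLearn.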

Schedule and Slides


Columns: Date, Topic, Reading, Slides (Preview), Slides (Release), Due, Note
10/05/2020: Introduction
Reading:
– Cramming More Components Onto Integrated Circuits, G.E. Moore, Proceedings of the IEEE 86(1):82-85, Jan 1998
– Chapter 1.1-1.6
Slides: Intro
Demo
10/07/2020: Performance Evaluation (I)
Reading:
– Chapter 1.3 & 1.8-1.9
Slides: Performance (Preview), Performance
Demo
Due: Reading Quiz
10/12/2020: Performance Evaluation (II)
Reading:
– M. D. Hill and M. R. Marty. Amdahl’s Law in the Multicore Era. In Computer, vol. 41, no. 7, pp. 33-38, July 2008.
Slides: Performance (2)
Demo
Due: Reading Quiz
10/14/2020: Performance Evaluation (III) & Memory Hierarchy
Reading:
– V. Sze, Y.-H. Chen, T.-J. Yang and J. S. Emer. How to Evaluate Deep Neural Network Processors: TOPS/W (Alone) Considered Harmful. In IEEE Solid-State Circuits Magazine, vol. 12, no. 3, pp. 28-41, Summer 2020.
– (Optional) Andrew Davison, Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers. In Humour the Computer, MITP, 1995, pp.
– Appendix B.1-B.3
Slides: Memory Hierarchy (Preview), Performance (3)
Demo
Due: Reading Quiz
10/19/2020: Memory Hierarchy (1): The Basics (cont.)
Reading:
– Appendix B.1-B.3
Slides: Memory (1)
Due: Assignment #1
10/21/2020: Memory Hierarchy (2)
Reading:
– Appendix B.1-B.3, 2.3
Slides: Memory (2)
Due: Reading Quiz
10/26/2020: Joel Emer’s Talk @ 11a
Note: Please attend Joel Emer’s talk at 11am
10/28/2020: Virtual Memory (Youtube only)
Reading:
– Chapter B.4 & B.5, 2.4
– Barr, Thomas W., Alan L. Cox, and Scott Rixner. “Translation caching: skip, don’t walk (the page table).” ACM SIGARCH Computer Architecture News 38.3 (2010): 48-59.
– Basu, Arkaprava, et al. “Efficient virtual memory for big memory servers.” ACM SIGARCH Computer Architecture News 41.3 (2013): 237-248.
Lecture
Slides: Virtual Memory
Demo
Due: Reading Quiz
11/02/2020: Memory Hierarchy (3): Optimizing Cache Performance Applications
Reading:
– Chapter 2.3
– Norman P. Jouppi. 1990. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. SIGARCH Comput. Archit. News 18, 2SI (June 1990), 364–373.
Slides: Optimizing Memory Performance
Demo
Due: Assignment #2
11/04/2020: Memory Hierarchy (4): Programmer’s optimizations & Basic Processor Design
Reading:
– Chapter 3.3
– Appendix C.1 – C.4
Slides: Basic Pipelined Processor (Preview), Optimizing Memory Performance (2)
Demo (Memory)
Demo (Pipeline)
Due: Reading Quiz
11/09/2020: Basic Processor Design & Branch Prediction
Reading:
– Chapter 3.3
Slides: Basic Pipeline Processor/Branch Prediction/Sample Midterm
Due: Assignment #3
11/11/2020: Veterans Day
Note: No lecture. Take-home midterm due 11/13/2020 midnight
11/16/2020: Advanced Branch Prediction
Reading:
– M. Evers, S. J. Patel, R. S. Chappell and Y. N. Patt, “An analysis of correlation and predictability: what makes two-level branch predictors work,” Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235), Barcelona, Spain, 1998, pp. 52-61.
– James E. Smith. Retrospective: a study of branch prediction strategies. ISCA ’98: 25 years of the international symposia on Computer architecture (selected papers), New York, NY, USA, 1998, pages 22-23
– Jiménez, Daniel A., and Calvin Lin. “Dynamic branch prediction with perceptrons.” Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture. IEEE, 2001.
– André Seznec and P. Michaud. A case for (partially) TAgged GEometric history length branch prediction. Journal of Instruction Level Parallelism. June 2006.
Slides: Branch Prediction (Preview), Dynamic Branch Prediction
Due: Reading Quiz
11/18/2020: OOO Scheduling
Reading:
– Chapter 3.4
– K. C. Yeager, “The MIPS R10000 superscalar microprocessor,” in IEEE Micro, vol. 16, no. 2, pp. 28-41, April 1996.
– R. E. Kessler, “The Alpha 21264 microprocessor,” in IEEE Micro, vol. 19, no. 2, pp. 24-36, March-April 1999.
Slides: Dynamic Instruction Scheduling (Preview), Branch Prediction & Data Hazards
Due: Reading Quiz
11/23/2020: OOO Scheduling
Slides: Data Hazards & Dynamic Instruction Scheduling (2)
11/25/2020: OOO Scheduling
Slides: Data Hazards & Dynamic Instruction Scheduling (3)
Due: Reading Quiz
11/30/2020: SMT & Chip Multiprocessors
Reading:
– Chapter 3.11
– Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm, ISCA ’96: Proceedings of the 23rd annual international symposium on Computer architecture, New York, NY, USA, 1996, pages 191-202.
– Collins, J. D., Wang, H., Tullsen, D. M., Hughes, C., Lee, Y. F., Lavery, D., & Shen, J. P. (2001, June). Speculative precomputation: Long-range prefetching of delinquent loads. In Proceedings 28th Annual International Symposium on Computer Architecture (pp. 14-25). IEEE.
– Chapter 5.1 – 5.3, 5.5 & 5.6
– The case for a single-chip multiprocessor, Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson, and Kunyung Chang, SIGPLAN Not. 31(9):2-11, 1996.
Slides: Multithreaded Architectures, Speculative Execution and SMT
Due: Reading Quiz, Assignment #4
12/02/2020: Chip Multiprocessors & Modern Processors
Reading:
– (Optional) D. Suggs, M. Subramony and D. Bouvier, “The AMD “Zen 2” Processor,” in IEEE Micro, vol. 40, no. 2, pp. 45-52, 1 March-April 2020, doi: 10.1109/MM.2020.2974217.
– (Optional) J. Doweck et al. Inside 6th-Generation Intel Core: New Microarchitecture Code-Named Skylake. In IEEE Micro, vol. 37, no. 2, pp. 52-62, Mar.-Apr. 2017, doi: 10.1109/MM.2017.38.
Slides: Multithreaded Architecture
Demo
12/07/2020: Dark Silicon
Reading:
– Chapter 1.7
– H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam and D. Burger. Dark Silicon and the End of Multicore Scaling. In IEEE Micro, vol. 32, no. 3, pp. 122-134, May-June 2012.
– Rakesh Kumar, Keith Farkas, Norm P. Jouppi, Partha Ranganathan, Dean M. Tullsen. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction. In 36th International Symposium on Microarchitecture, December, 2003.
Slides: Power/Energy & Dark Silicon (Preview), Power/Energy and Dark Silicon
Due: Reading Quiz, Project
12/09/2020: TPU, FPGA
Reading:
– Adrian M. Caulfield, Eric S. Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, Daniel Lo, Todd Massengill, Kalin Ovtcharov, Michael Papamichael, Lisa Woods, Sitaram Lanka, Derek Chiou, and Doug Burger. 2016. A cloud-scale acceleration architecture. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-49).
– Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA ’17). Association for Computing Machinery, New York, NY, USA, 1–12. DOI:https://doi.org/10.1145/3079856.3080246
Slides: Dark Silicon and Future Architecture
Due: Assignment #5
12/15/2020: Final Exam
Note: Take home. Due 11:59:00pm.



Assignments

Assignment #1
– Questions:
Please find the homework questions and complete the homework using the following template
https://intra.engr.ucr.edu/~htseng/classes/cs203_2020fa/Assignments/assignment_1.pdf
or
https://intra.engr.ucr.edu/~htseng/classes/cs203_2020fa/Assignments/assignment_1.docx
– Deliverable
Turn in your solutions through iLearn under the assignment section
– Due
11:59pm 10/19/2020

Assignment #2
– Questions:
Please find the homework questions and complete the homework using the following template
https://intra.engr.ucr.edu/~htseng/classes/cs203_2020fa/Assignments/assignment_2.pdf
or
https://intra.engr.ucr.edu/~htseng/classes/cs203_2020fa/Assignments/assignment_2.docx
– Deliverable
Turn in your solutions through iLearn under the assignment section
– Due
11:59pm 11/02/2020

Assignment #3
– Questions:
Please find the homework questions and complete the homework using the following template
https://intra.engr.ucr.edu/~htseng/classes/cs203_2020fa/Assignments/assignment_3.pdf
or
https://intra.engr.ucr.edu/~htseng/classes/cs203_2020fa/Assignments/assignment_3.docx
– Deliverable
Turn in your solutions through iLearn under the assignment section
– Due
11:59pm 11/09/2020

Assignment #4
– Questions:
Please find the homework questions and complete the homework using the following template
https://intra.engr.ucr.edu/~htseng/classes/cs203_2020fa/Assignments/assignment_4.pdf
or
https://intra.engr.ucr.edu/~htseng/classes/cs203_2020fa/Assignments/assignment_4.docx
– Deliverable
Turn in your solutions through iLearn under the assignment section
– Due
11:59pm 11/30/2020

Assignment #5
– Questions:
Please find the homework questions and complete the homework using the following template
https://intra.engr.ucr.edu/~htseng/classes/cs203_2020fa/Assignments/assignment_5.pdf
or
https://intra.engr.ucr.edu/~htseng/classes/cs203_2020fa/Assignments/assignment_5.docx
– Deliverable
Turn in your solutions through iLearn under the assignment section
– Due
11:59pm 12/09/2020

Project

https://github.com/hungweitseng/CS203_Fa20_Project

– Due
11:59pm 12/07/2020

Integrity Policy

  • Cheating WILL be taken seriously. Doing otherwise is not fair to honest students. It is also not fair to allow the cheater to think that it is a reasonable alternative in life.
  • Please review the UCR student handbook for more details on Academic Integrity.
  • Anyone copying information or having information copied during a test will receive an F for the class and will not be allowed to drop. They will be reported to their college dean. If you can prove non-cooperative copying took place, your grade may be restored, but you must prove it to the dean–I don’t want to be involved. Anyone caught cheating or falsely representing the work of others on the homework will not be allowed to turn in further homework. Your grade will be based exclusively on the tests with a penalty of 25% OR GREATER applied.
  • We photocopy a random sampling of the exams in order to ensure that students do not modify their tests after they have been returned.
  • Online solutions, etc.: A solutions manual exists for this text. Using it, or any solutions you may find on the internet elsewhere IS CHEATING and will be dealt with accordingly. We know what the solution manual solutions look like. Homework is a small fraction of your grade, so cheating on it is unproductive.