Course DetailsCourse Details
• Time: Wed 10:00-11:40am, Fri 10:00-11:40am• Location: 下院 201
• Course Website: http://www.cs.sjtu.edu.cn/~liang-xy/ms108/MS108-L.ppt
http://www.cs.sjtu.edu.cn/~liang-xy/ms108/hw*.pdf• Instructor: Xiaoyao Liang, [email protected]• TA: TBD• Textbook: Computer Architecture:A Quantitative Approach,Fifth
Edition/ 计算机体系结构 : 量化研究方法 ( 英文版•第 5 版 ) ISBN 9787111364580 ( 英文影印版 ) , John L.Hennessy, David A.Patterson 著,机械工业出版社2012年 1 月 1 日出版
• Reference: 计算机组成与设计 : 硬件、软件接口 ( 原书第 3 或第 4 版 ) , David A.Patterson, John L.Hennessy 著,机械工业出版社出版
• Grades:
Homework (40%), Attendance (10%), Middle-term Exam (20%), Project
(30%)
2
Course PrerequisitesCourse Prerequisites
3
• Computing Hardware or similarLogic Design (computer arithmetic)Basic ISA (what is a RISC instruction)Pipelining (control/data hazards, forwarding)Will review the above during the first couple of weeks
• C Programming, Linux
• Compilers, OS, Circuits/VLSI background is aplus, not needed
Course ImportanceCourse Importance
4
• Embarrassing if you are a BS in CS/CE and can’t make sense of the following terms: DRAM, pipelining, cache hierarchies, I/O, virtual memory
• Embarrassing if you are a BS in CS/CE and can’t decide which processor to buy: 3 GHz Core2duo or 1GHz ARM (helps us reason about performance/power)
• Obvious first step for chip designers, compiler/OS writers
• Will knowledge of the hardware help me write better programs?
Course ImportanceCourse Importance
5
• Memory management: if we understand how/where data is placed, we can help ensure that relevant data is nearby
• Thread management: if we understand how threads interact, we can write smarter multi-threaded programs
Why do we care about multi-threaded programs?
Average Joe Programmer Vs. Stephaney Programmer
Course TopicsCourse Topics
6
• Focus on what modern computer architects worryabout (both academia and industry)
• Get through the basics of modern processor design
• Understand the interfaces between architectureand system software (compilers, OS)
• System architecture and I/O (disks, memory,multiprocessors)
• Look at technology trends, recent research ideas,and the future of computing hardware
Course ArrangementCourse Arrangement
• Introduction and Performance Metrics (1 week)• ISA/Basic Pipelining Review (2 week)• Hardware ILP (2 weeks)• Software ILP (1 week)• Caches/Memory (2 weeks)• Modern Processor Case Studies (2 weeks)• Multiprocessors/Multithreading (2 weeks)• Input/Output and Interconnects (1 week)• Research Trends (1 week)• Technology Trends impact on architecture (1 week)
7
Computers Are Computers Are EverywhereEverywhere
• General-Purpose Laptop/DesktopProductivity, interactive graphics, video, audio
Optimize price-performanceExamples: Intel Core2duo, Nvidia GTX
• Embedded ComputersPDAs, cell-phones, sensors => Price, lifetimeExamples: Iphone, Ipad, Android PhoneGame Machines, Network uPs => Price-PerformanceExamples: Sony PS, Xbox, IBM 750FX
• Data CentersHPC, Cloud => Price, throughput, power, cooling Example: Google, Amazon
9
Microprocessor CapacityMicroprocessor Capacity
2X transistors/Chip Every 1.5 years
Called “Moore’s Law”
Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.
Microprocessors have become smaller, denser, and more powerful.
10
Microprocessor SpeedMicroprocessor Speed
i4004
i80286
i80386
i8080
i8086
R3000R2000
R10000
Pentium
1,000
10,000
100,000
1,000,000
10,000,000
100,000,000
1970 1975 1980 1985 1990 1995 2000 2005
Year
Tran
sist
ors
Growth in transistors per chip Increase in clock rate
0.1
1
10
100
1000
1970 1980 1990 2000
Year
Clo
ck R
ate
(MH
z)
Why bother with parallel programming? Just wait a year or two…11
Limit #1: Power DensityLimit #1: Power Density
40048008
80808085
8086
286386
486Pentium®
P6
1
10
100
1000
10000
1970 1980 1990 2000 2010
Year
Po
wer
Den
sity
(W
/cm
2 )
Hot Plate
NuclearReactor
RocketNozzle
Sun’sSurface
Source: Patrick Gelsinger, Intel
Scaling clock speed (business as usual) will not work
Can soon put more transistors on a chip than can afford to turn on. -- Patterson ‘07
13
Limit #2: ILP Tapped OutLimit #2: ILP Tapped Out
Year
1
10
100
1000
10000
1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006
25%/year
52%/year
??%/year
• VAX : 25%/year 1978 to 1986• RISC + x86: 52%/year 1986 to 2002
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006
Application performance was increasing by 52% per year as measured by the SpecInt benchmarks here
• ½ due to transistor density• ½ due to architecture
changes, e.g., Instruction Level Parallelism (ILP)
14
Limit #2: ILP Tapped OutLimit #2: ILP Tapped Out
Year
• Superscalar (SS) designs were the state of the art; many forms of parallelism not visible to programmer multiple instruction issue dynamic scheduling: hardware discovers parallelism
between instructions speculative execution: look past predicted branches non-blocking caches: multiple outstanding memory ops
• You may have heard of these before, but you haven’t needed to know about them to write software
• Unfortunately, these sources have been used up
15
Limit #3: Chip YieldLimit #3: Chip Yield
Year
• Moore’s (Rock’s) 2nd law: fabrication costs go up
• Yield (% usable chips) drops
• Parallelism can helpMore smaller, simpler processors are easier to design and validateCan use partially working chips:E.g., Cell processor (PS3) is sold with 7 out of 8 “on” to improve yield
Manufacturing costs and yield problems limit use of density
16
Current SituationCurrent Situation
17
• Chip density is continuing increasing Clock speed is
not Number of
processor cores may double instead
• There is little or no hidden parallelism (ILP) to be found
• Parallelism must be exposed to and managed by software
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)
AbstractionAbstraction
18
• As an architect, our main job is to deal with tradeoffsPerformance, Power, Die Size, Complexity, Applications Support, Functionality, Compatibility, Reliability, etc.
• Technology trends, applications… How do we deal with all of this to make real tradeoffs?
• Abstractions allow this to happen
• Focus is on metrics of these abstractionsPerformance, Cost, Availability, Power
Computer ComponentsComputer Components
19
• Input/output devices
• Secondary storage: non-volatile, slower, cheaper
• Primary storage: volatile, faster, costlier
• Communication: Bus, cable
• CPU/processor
Processor Technology Processor Technology TrendTrend
21
• Integrated circuit technology– Transistor density: 35%/year– Die size: 10-20%/year– Integration overall: 40-55%/year
• DRAM capacity: 25-40%/year (slowing)
• Flash capacity: 50-60%/year– 15-20X cheaper/bit than DRAM
• Magnetic disk technology: 40%/year– 15-25X cheaper/bit then Flash– 300-500X cheaper/bit than DRAM
Memory and IO Technology TrendMemory and IO Technology Trend
22
• Bandwidth or throughput– Total work done in a given time– 10,000-25,000X improvement for processors– 300-1200X improvement for memory and disks
• Latency or response time– Time between start and completion of an event– 30-80X improvement for processors– 6-8X improvement for memory and disks