cs 531 advanced computer architecture - auburn universityxqin/courses/cs531/lec01.pdfcs 531, new...
TRANSCRIPT
Slide 01-1CS 531, New Mexico Tech
CS 531 Advanced Computer Architecture
These slides are adapted from notes by Dr. David E. Culler (U.C. Berkeley)
Dr. Xiao QinNew Mexico Tech
http://www.cs.nmt.edu/[email protected]
Slide 01-2CS 531, New Mexico Tech
Sony PlayStation 3
Slide 01-3CS 531, New Mexico Tech
Insomniac Games
From IEEE Spectrum, Dec. 2006 Issue
Slide 01-4CS 531, New Mexico Tech
Resistance – Fall of Man
Source: images.playsyde.com
Slide 01-5CS 531, New Mexico Tech
Resistance – Fall of Man
Source: www.insomniacgames.com
Slide 01-6CS 531, New Mexico Tech
Slide 01-7CS 531, New Mexico Tech
Multi-Core CPU Classification
Homogeneous Multi-Core Processor Configuration
Heterogeneous Multi-Core Processor Configuration
Slide 01-8CS 531, New Mexico Tech
Cell Multi-Core Processor
Slide 01-9CS 531, New Mexico Tech
Cure for the Multicore Blues
• Michael McCool • The RapidMind Platform• The prescription for programmers
paralyzed by parallel processing • Applications:
– Real-time ray tracing – Games– Simulations
From IEEE Spectrum, Jan. 2007 Issue
Slide 01-10CS 531, New Mexico Tech
A crowd simulation of 16,000 individual chickens
From IEEE Spectrum, Jan. 2007 Issue
Slide 01-11CS 531, New Mexico Tech
Today’s Goal:
• Course Objectives• Course Content & Grading• Introduce you to Parallel Computer
Architecture• Answer your questions about CS 531• Provide you a sense of the trends that shape
the field
Slide 01-12CS 531, New Mexico Tech
CS 531: Semester Calendar
http://www.cs.nmt.edu/~xqin/courses/cs531
See the class webpage for the most up to date version!
Slide 01-13CS 531, New Mexico Tech
Slide 01-14CS 531, New Mexico Tech
Slide 01-15CS 531, New Mexico Tech
Slide 01-16CS 531, New Mexico Tech
Slide 01-17CS 531, New Mexico Tech
Slide 01-18CS 531, New Mexico Tech
What will you get out of CS531?
• In-depth understanding of the design and engineering of modern parallel computers– technology forces– fundamental architectural issues
• naming, replication, communication, synchronization– basic design techniques
• cache coherence, protocols, networks, pipelining, …– methods of evaluation– underlying engineering trade-offs
• from moderate to very large scale• across the hardware/software boundary
Slide 01-19CS 531, New Mexico Tech
Will it be worthwhile?• Absolutely!
– even through few of you will become PP designers
• The fundamental issues and solutions translate across a wide spectrum of systems.– Crisp solutions in the context of parallel machines.
• Pioneered at the thin-end of the platform pyramid on the most-demanding applications– migrate downward with time
• Understand implications for software
SuperServers
Departmenatal Servers
Workstations
Personal Computers
Workstations
Slide 01-20CS 531, New Mexico Tech
Topic Coverage• David E. Culler, Jaswinder Pal Singh, and Anoop Gupta,
“Parallel Computer Architecture: A Hardware/Software Approach,” ISBN 1-55860-343-3.
• Covers (These topics may change)• Fundamental computer architecture design issues • Perspective on parallel programming • Performance across the hardware/software boundary • Shared memory multiprocessors • Evaluating protocol design tradeoffs• Snoop-based multiprocessor design• Scalable multiprocessors• Directory-based cache coherence• Hardware/software tradeoffs• Latency tolerance• Interconnection network design
Slide 01-21CS 531, New Mexico Tech
Course Syllabus• Prerequisite: CS 331Computer Architecture• 1 midterm exam and no final exam• Grading
– Class Participation 10%– Exam 30%– Proposal 10%– Progress Report 10%– Presentation 10%– Technical Report 30%
Slide 01-22CS 531, New Mexico Tech
Course Syllabus (cont.)• Scale
– Letter grades will be awarded based on the following scale. This scale may be adjusted upwards if it is necessary based on the final grades.
– A+ >= 97 A >= 93 A- >= 90 B+ >= 87 B >= 83 B- >= 80 C+ >= 77 C >= 73 C- >= 70 D+ >= 67 D >= 63 D- >= 60 F < 60
Slide 01-23CS 531, New Mexico Tech
Office Hours and Exams
Mid-term Wednesday 3/5/2007
Office hours: Wednesday 1:30-3:00pm
Slide 01-24CS 531, New Mexico Tech
Am I going to read the book to you?
• NO!• Book provides a framework and complete
background, so lectures can be more interactive.– You do the reading– We’ll discuss it
• Projects will go “beyond”
Slide 01-25CS 531, New Mexico Tech
Questions
Please ask at any time!
Slide 01-26CS 531, New Mexico Tech
What is Parallel Architecture?• A parallel computer is a collection of processing
elements that cooperate to solve large problems fast
• Some broad issues:– Resource Allocation:
• how large a collection? • how powerful are the elements?• how much memory?
– Data access, Communication and Synchronization• how do the elements cooperate and communicate?• how are data transmitted between processors?• what are the abstractions and primitives for cooperation?
– Performance and Scalability• how does it all translate into performance?• how does it scale?
Slide 01-27CS 531, New Mexico Tech
Why Study Parallel Architecture?
Slide 01-28CS 531, New Mexico Tech
Why Study Parallel Architecture?
Slide 01-29CS 531, New Mexico Tech
Why Study Parallel Architecture?
Slide 01-30CS 531, New Mexico Tech
Role of a computer architect:To design and engineer the various levels of a computer system to maximize performance and programmability within limits of technology and cost.
Parallelism:• Provides alternative to faster clock for performance• Applies at all levels of system design• Is a fascinating perspective from which to view architecture• Is increasingly central in information processing
Why Study Parallel Architecture?
Slide 01-31CS 531, New Mexico Tech
Why Study it Today?
• History: diverse and innovative organizational structures, oftentied to novel programming models
• Rapidly maturing under strong technological constraints– The “killer micro” is ubiquitous– Laptops and supercomputers are fundamentally similar!– Technological trends cause diverse approaches to converge
• Technological trends make parallel computing inevitable• Need to understand fundamental principles and design
tradeoffs, not just taxonomies– Naming, Ordering, Replication, Communication performance
Slide 01-32CS 531, New Mexico Tech
Is Parallel Computing Inevitable?
• Application demands: Our insatiable need for computing cycles
• Technology Trends• Architecture Trends• Economics• Current trends:
– Today’s microprocessors have multiprocessor support– Servers and workstations becoming MP: Sun, SGI,
DEC, COMPAQ!...– Tomorrow’s microprocessors are multiprocessors
Slide 01-33CS 531, New Mexico Tech
Application Trends• Application demand for performance fuels advances
in hardware, which enables new applications, which...– Cycle drives exponential increase in microprocessor
performance– Drives parallel architecture harder
• most demanding applications
• Range of performance demands– Need range of system performance with progressively
increasing cost
New ApplicationsMore Performance
Slide 01-34CS 531, New Mexico Tech
Speedup
• Speedup (p processors) =
• For a fixed problem size (input data set), performance = 1/time
• Speedup fixed problem (p processors) =
Performance (p processors)
Performance (1 processor)
Time (1 processor)
Time (p processors)
Slide 01-35CS 531, New Mexico Tech
Slide 01-36CS 531, New Mexico Tech
Commercial Computing• Relies on parallelism for high end
– Computational power determines scale of business that can be handled
• Databases, online-transaction processing, decision support, data mining, data warehousing ...
• TPC benchmarks (TPC-C order entry, TPC-D decision support)– Explicit scaling criteria provided– Size of enterprise scales with size of system– Problem size not fixed as p increases.– Throughput is performance measure (transactions per
minute or tpm)
Slide 01-37CS 531, New Mexico Tech
Thro
ughp
ut (t
pmC
)
Number of processors
0
5,000
10,000
15,000
20,000
25,000
0 20 40 60 80 100 120
Tandem HimalayaDEC AlphaSGI PowerChallenge
HP PAIBM PowerPCOther
TPC-C Results for March 1996
• Parallelism is pervasive• Small to moderate scale parallelism very important• Difficult to obtain snapshot to compare across vendor platforms
Observations?
Slide 01-38CS 531, New Mexico Tech
Scientific Computing Demand
Slide 01-39CS 531, New Mexico Tech
Engineering Computing Demand• Large parallel machines a mainstay in many industries
– Petroleum (reservoir analysis)– Automotive (crash simulation, drag analysis, combustion
efficiency), – Aeronautics (airflow analysis, engine efficiency, structural
mechanics, electromagnetism), – Computer-aided design– Pharmaceuticals (molecular modeling)– Visualization
• in all of the above• entertainment (films like Toy Story)• architecture (walk-throughs and rendering)
– Financial modeling (yield and derivative analysis)– etc.
Slide 01-40CS 531, New Mexico Tech
1980 1985 1990 1995
1 MIPS
10 MIPS
100 MIPS
1 GIPS
Sub-BandSpeech Coding
200 WordsIsolated SpeechRecognition
SpeakerVeri¼cation
CELPSpeech Coding
ISDN-CD StereoReceiver
5,000 WordsContinuousSpeechRecognition
HDTV Receiver
CIF Video
1,000 WordsContinuousSpeechRecognitionTelephone
NumberRecognition
10 GIPS
• Also CAD, Databases, . . .• 100 processors gets you 10 years, 1000 gets you 20 !
Applications: Speech and Image Processing
Slide 01-41CS 531, New Mexico Tech
Is better parallel arch enough?
• AMBER molecular dynamics simulation program• Starting point was vector code for Cray-1• 145 MFLOP on Cray90, 406 for final version on 128-processor
Paragon, 891 on 128-processor Cray T3D
Why?
Slide 01-42CS 531, New Mexico Tech
Summary of Application Trends
• Transition to parallel computing has occurred for scientific and engineering computing
• In rapid progress in commercial computing– Database and transactions as well as financial– Usually smaller-scale, but large-scale systems also used
• Desktop also uses multithreaded programs, which are a lot like parallel programs
• Demand for improving throughput on sequential workloads– Greatest use of small-scale multiprocessors
• Solid application demand exists and will increase
Slide 01-43CS 531, New Mexico Tech
Perfo
rman
ce
0.1
1
10
100
1965 1970 1975 1980 1985 1990 1995
Supercomputers
Minicomputers
Mainframes
Microprocessors
Technology Trends
• Today the natural building-block is also fastest!
Observations?
Slide 01-44CS 531, New Mexico Tech
• Microprocessor performance increases 50% - 100% per year• Transistor count doubles every 3 years• DRAM size quadruples every 3 years• Huge investment per generation is carried by huge commodity market
0
20
4060
80100
120
140160
180
1987 1988 1989 1990 1991 1992
Integer FP
Sun 4260
MIPSM/120
IBMRS6000
540MIPSM2000
HP 9000750
DECalpha
Can’t we just wait for it to get faster?
Slide 01-45CS 531, New Mexico Tech
Proc $
Interconnect
Technology: A Closer Look• Basic advance is decreasing feature size ( λ )
– Circuits become either faster or lower in power• Die size is growing too
– Clock rate improves roughly proportional to improvement in λ– Number of transistors improves like λ2 (or faster)
• Performance > 100x per decade– clock rate < 10x, rest is transistor count
• How to use more transistors?– Parallelism in processing
• multiple operations per cycle reduces CPI– Locality in data access
• avoids latency and reduces CPI• also improves processor utilization
– Both need resources, so tradeoff• Fundamental issue is resource distribution, as in
uniprocessors
Slide 01-46CS 531, New Mexico Tech
• 30% per year
0.1
1
10
100
1,000
19701975
19801985
19901995
20002005
Clo
ck ra
te (M
Hz)
i4004i8008
i8080
i8086 i80286i80386
Pentium100
R10000
Growth Rates
Tran
sist
ors
1,000
10,000
100,000
1,000,000
10,000,000
100,000,000
19701975
19801985
19901995
20002005
i4004i8008
i8080
i8086
i80286i80386
R2000
Pentium R10000
R3000
40% per year
Slide 01-47CS 531, New Mexico Tech
Architectural Trends• Architecture translates technology’s gifts into
performance and capability• Resolves the tradeoff between parallelism and
locality– Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip
connect– Tradeoffs may change with scale and technology advances
• Understanding microprocessor architectural trends – Helps build intuition about design issues or parallel machines– Shows fundamental role of parallelism even in “sequential”
computers