9/12/2004 \course\eleg652-04F\Topic0a.ppt 1
Course Description:
Parallel Computer Architecture
Reading List
Slides: Topic1x
Henn&Patt: Chapter 1
CullerSingh98: Chapter 1
Other assigned readings from homework and classes
Why Study Parallel Architecture?
Role of a computer architect:
To design and engineer the various levels of a computer system to maximize performance and programmability within limits of technology and cost.
Parallelism:
• Provides an alternative to a faster clock for performance
• Applies at all levels of system design
• Is a fascinating perspective from which to view architecture
• Is increasingly central in information processing
Inevitability of Parallel Computing
Application demands
Technology trends
Architecture trends
Economics
Application Trends
Demand for cycles fuels advances in hardware, and vice versa
Range of performance demands
Goal of applications in using parallel machines: Speedup
Productivity requirement
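The speedup goal can be made concrete with the common definition speedup(p) = T(1) / T(p), where T(p) is the execution time on p processors; efficiency = speedup / p is a usual companion metric. A minimal Python sketch; the timings are hypothetical and chosen only for illustration.

```python
def speedup(time_1: float, time_p: float) -> float:
    """Speedup on p processors: time on one processor / time on p processors."""
    if time_p <= 0:
        raise ValueError("time_p must be positive")
    return time_1 / time_p


def efficiency(time_1: float, time_p: float, p: int) -> float:
    """Parallel efficiency: speedup divided by the processor count p."""
    return speedup(time_1, time_p) / p


# Hypothetical timings in seconds (illustrative only).
t1, t16 = 800.0, 64.0
print(f"speedup    = {speedup(t1, t16):.2f}")
print(f"efficiency = {efficiency(t1, t16, 16):.2f}")
```

With these made-up numbers, 16 processors give a 12.5x speedup at about 78% efficiency.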
Summary of Application Trends
Transition to parallel computing has occurred for scientific and engineering computing
Transition is in rapid progress in commercial computing
Desktop also uses multithreaded programs, which are a lot like parallel programs
Demand for improving throughput on sequential workloads
Demand for productivity
Technology: A Closer Look
Basic advance is decreasing feature size (λ)
• Clock rate improves roughly in proportion to the improvement in λ
• Number of transistors improves like λ² (or faster)
Performance improves by more than 100x per decade: clock rate contributes about 10x, and the rest comes from transistor count
How to use more transistors?
• Parallelism in processing
• Locality in data access
• Both need resources, so there is a tradeoff
[Figure: processor (Proc), cache ($) and interconnect]
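The λ rules of thumb above can be written as first-order scaling factors. A minimal Python sketch, assuming a hypothetical shrink factor s (λ_new = λ_old / s): the clock gain is taken as about s and the transistor gain as about s². The decade-level split simply divides the slide's >100x performance figure by the ~10x clock contribution; real designs deviate from these idealized relations.

```python
def scaling_gains(shrink: float) -> tuple[float, float]:
    """First-order gains when feature size lambda shrinks by `shrink`
    (lambda_new = lambda_old / shrink), per the slide's rules of thumb:
    clock rate ~ linear in the improvement, transistor count ~ quadratic."""
    if shrink <= 0:
        raise ValueError("shrink must be positive")
    clock_gain = shrink
    transistor_gain = shrink ** 2
    return clock_gain, transistor_gain


# A hypothetical 2x reduction in lambda.
clock_gain, transistor_gain = scaling_gains(2.0)
print(clock_gain, transistor_gain)  # 2.0 4.0

# Decade-level budget from the slide: >100x performance, ~10x from clock
# rate, so the remaining factor (>10x) must come from the extra transistors.
total_gain_per_decade = 100.0
clock_gain_per_decade = 10.0
print(total_gain_per_decade / clock_gain_per_decade)  # 10.0
```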
Clock Frequency Growth Rate
• Clock rate grows about 30% per year
[Figure: clock rate (MHz, log scale, 0.1 to 10,000) vs. year, 1970 to 2005, for the 4004, 8008, 8080, 8086, 286, 386, Pentium, Pentium 4 and Itanium 2 (2003)]
Transistor Count Growth Rate
• 1 billion transistors on a chip in the early 2000s
• Transistor count grows much faster than clock rate: about 40% per year, an order of magnitude more contribution over 2 decades
[Figure: transistor count (millions, log scale, 0.001 to 1,000) vs. year, 1970 to 2005, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, Pentium Pro, Pentium 4 and Itanium 2 (2002 and 2003)]
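The growth rates on these two slides compound annually. A minimal sketch relating a constant annual rate to its per-decade factor and its doubling time; the 30% and 40% figures are the slides' own, and everything else is plain arithmetic.

```python
import math


def growth_factor(rate: float, years: float) -> float:
    """Compound growth over `years` at a constant annual `rate`: (1 + rate) ** years."""
    return (1.0 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)


for name, rate in [("clock rate", 0.30), ("transistor count", 0.40)]:
    print(f"{name:16s} {rate:.0%}/yr: x{growth_factor(rate, 10):.1f} per decade, "
          f"doubles every {doubling_time(rate):.2f} years")
```

At 30% per year, clock rate grows about 13.8x per decade (doubling roughly every 2.6 years), in the same range as the ~10x per decade clock contribution cited earlier; at 40% per year, transistor count grows about 28.9x per decade (doubling roughly every 2.1 years).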
Similar Story for Storage
Divergence between memory capacity and speed is more pronounced
Larger memories are slower
• Need deeper cache hierarchies
Parallelism and locality within memory systems
Disks too: Parallel disks plus caching
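Why deeper cache hierarchies help can be illustrated with the standard average memory access time (AMAT) relation, AMAT = hit time + miss rate × miss penalty, applied recursively level by level. A minimal sketch; all latencies and miss rates below are hypothetical values in processor cycles, not measurements of any real system.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical latencies in processor cycles (illustrative only).
DRAM = 200.0

# One cache level: every L1 miss goes all the way to DRAM.
one_level = amat(hit_time=1.0, miss_rate=0.05, miss_penalty=DRAM)

# Two levels: the L1 miss penalty is itself the AMAT of the L2.
l2 = amat(hit_time=10.0, miss_rate=0.20, miss_penalty=DRAM)
two_level = amat(hit_time=1.0, miss_rate=0.05, miss_penalty=l2)

print(one_level, two_level)
```

With these made-up numbers, adding a second level cuts the average from 11 to 3.5 cycles, because most L1 misses are served by the smaller, faster L2 instead of the large, slow DRAM.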
Moore’s Law and Headcount
Along with the number of transistors, the effort and headcount required to design a microprocessor have grown exponentially
Architectural Trends
Architecture: performance and capability
Tradeoff between parallelism and locality
• Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect
Understanding microprocessor architectural trends
Four generations of architectural history: tube, transistor, IC, VLSI
Technology Progress Overview
Processor speed improvement: 2x per year (since 1985); 100x in the last decade.
DRAM memory capacity: 2x in 2 years (since 1996); 64x in the last decade.
Disk capacity: 2x per year (since 1997); 250x in the last decade.
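The per-decade totals above can be converted to an implied constant annual growth factor and doubling time with compound-growth arithmetic (annual factor = total^(1/years)). A minimal Python sketch using the slide's decade totals; it assumes steady growth across the whole decade.

```python
import math


def implied_annual_factor(total: float, years: float) -> float:
    """Constant annual growth factor that compounds to `total` over `years`."""
    return total ** (1.0 / years)


def doubling_time_years(total: float, years: float) -> float:
    """Doubling time implied by reaching `total` growth over `years`."""
    return years * math.log(2.0) / math.log(total)


# Decade totals as listed on the slide.
for name, total in [("processor speed", 100.0),
                    ("DRAM capacity", 64.0),
                    ("disk capacity", 250.0)]:
    print(f"{name:15s} x{total:g}/decade -> x{implied_annual_factor(total, 10):.2f}/yr, "
          f"doubling every {doubling_time_years(total, 10):.2f} yr")
```

This works out to roughly 1.58x per year (doubling about every 1.5 years) for processor speed, 1.52x (about every 1.67 years) for DRAM, and 1.74x (about every 1.26 years) for disk. As decade averages, these can differ from the per-year rates quoted alongside them.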
[Figure: Motorola’s PowerPC 604; Pentium]
Summary: Parallel Architecture?
Increasingly attractive
• Economics, technology, architecture, application
Parallelism exploited at many levels
Same story from the memory system perspective
A wide range of parallel architectures makes sense
The Earth Simulator Machine in Japan
Earth Simulator (2002)
• Max 40 TFLOPS
• No. 1 in the TOP500 list
• General purpose
• Parallel vector processors
• 400 M$ (development)
HPC Architecture
• Vector processors (from 1976): CRAY-1
• Parallel processors (from 1985): CM-1
• MPU cluster / grid (from 1997): ASCI-RED
• Massively parallel processors (2008 to 2010): DARPA-HPCS machines, GRAPE-DR, BlueGene/L, BG/C64
Cluster computers of commodity MPUs (from 1997)
ASCI Project
• ASCI-Q: 20 TFLOPS (2003), 8,192 CPUs
• ASCI-Purple: 100 TFLOPS (2005), 12,544 CPUs
• OLNL project (2004)
Limitations of current clusters
• Low CPU utilization due to high interconnect latency
• No automatic parallelization
Limitation by size and power
• ASCI-Purple (12,544 CPUs): … MW
[Figure: ASCI-Q, 20 TFLOPS]
New-generation parallel systems (~2008)
• IBM BlueGene/L Project (360 TFLOPS, 2005)
   High-density parallel processor (65,536 CPU chips in 64 racks, 131,072 processors)
• IBM BlueGene/C64 Project (1.1 PFLOPS, 2007?)
• HPCS Project
   IBM PERCS
   Cray Cascade
   SUN Hero project
[Figure: IBM Blue Gene/L]
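Dividing each quoted aggregate rate by its processor count gives a rough per-processor figure, which highlights two different design points. A minimal sketch using only the numbers on these slides; it assumes the TFLOPS figures are aggregate rates that can be spread evenly over processors (the slides do not say whether they are peak or sustained).

```python
# Aggregate TFLOPS and processor counts as quoted on the slides.
systems = {
    "ASCI-Q (2003)": (20.0, 8_192),
    "ASCI-Purple (2005)": (100.0, 12_544),
    "BlueGene/L (2005)": (360.0, 131_072),
}


def gflops_per_processor(tflops: float, processors: int) -> float:
    """Aggregate rate divided evenly over processors, in GFLOPS."""
    return tflops * 1_000.0 / processors


for name, (tflops, procs) in systems.items():
    print(f"{name:20s} {procs:>7,d} procs  "
          f"{gflops_per_processor(tflops, procs):5.2f} GFLOPS/proc")
```

This yields roughly 2.44, 7.97 and 2.75 GFLOPS per processor: BlueGene/L reaches its larger aggregate through many more, individually modest processors, consistent with its description as a high-density parallel processor.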
Landscape of Microprocessor Families
[Figure: SPECint2000/MHz (0 to 1) vs. frequency (0 to 2000 MHz) for the Intel-x86, AMD-x86, Alpha, PowerPC, Sparc and IPF families, with curves labeled SPECint2000 25 to 800; labeled points include PIII, PIII-Xeon, P4, Athlon, 264A, 264C, Sparc-III, 604e and Itanium]
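The Landscape of Microprocessor Families chart plots SPECint2000/MHz against clock frequency, with curves labeled from SPECint2000 25 to 800. Assuming those curves are lines of constant SPECint2000, each one satisfies (SPECint2000/MHz) × MHz = SPECint2000, so a family can reach a given performance either with more work per cycle or with a higher clock. A minimal sketch; the design points are hypothetical, not values read off the chart.

```python
def specint2000(specint_per_mhz: float, mhz: float) -> float:
    """Performance = work per clock (SPECint2000/MHz) times clock rate (MHz)."""
    return specint_per_mhz * mhz


def mhz_for_target(target_specint: float, specint_per_mhz: float) -> float:
    """Clock rate needed to reach a target SPECint2000 at a given per-MHz efficiency."""
    return target_specint / specint_per_mhz


# Hypothetical design points (not taken from the chart).
print(specint2000(0.5, 1600))     # 800.0
print(mhz_for_target(800, 0.25))  # 3200.0
```

In this idealized view, halving the work done per cycle requires doubling the clock rate to stay on the same performance curve.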