An Overview of High Performance Computing
TRANSCRIPT
11/29/2005
An Overview of High Performance Computing
Jack Dongarra
University of Tennessee and
Oak Ridge National Laboratory
Overview
♦ Look at the fastest computers, from the Top500
♦ Some of the changes that face us: hardware, software, algorithms
Technology Trends: Microprocessor Chip Capacity
2X transistors per chip every 1.5 years ("Moore's Law")
Microprocessors have become smaller, denser, and more powerful. The trend is not just in processors: bandwidth, storage, etc. improve as well. Roughly 2X memory and processor speed, and half the size, cost, and power, every 18 months.
Gordon Moore (co-founder of Intel) Electronics Magazine, 1965
Number of devices/chip doubles every 18 months
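A compact restatement of the doubling trend on this slide (t in years; the 1.5-year period is the figure quoted above), with a worked decade-scale consequence:

```latex
N(t) = N_0 \cdot 2^{\,t/1.5},
\qquad\text{e.g. } \frac{N(10)}{N_0} = 2^{10/1.5} \approx 100\times \text{ over a decade.}
```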
H. Meuer, H. Simon, E. Strohmaier, & JD
- Listing of the 500 most powerful computers in the world
- Yardstick: Rmax from LINPACK MPP (Ax = b, dense problem); see the sketch below
- Updated twice a year: at SC'xy in the States in November and at a meeting in Germany in June
- All data available from www.top500.org
[Diagram: benchmark rate vs. problem size, with TPP performance marked]
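For readers who want to see what the yardstick actually times, here is a minimal sketch in the spirit of the benchmark, assuming a NumPy environment (this is not the HPL code used for official Top500 submissions): it times a dense solve of Ax = b and converts the elapsed time to Gflop/s with the standard 2/3·n³ operation count.

```python
import time
import numpy as np

def linpack_like_gflops(n=2000, seed=0):
    """Time a dense solve of Ax = b and report Gflop/s (2/3 n^3 dominant cost)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    t0 = time.perf_counter()
    x = np.linalg.solve(A, b)               # LU factorization plus triangular solves
    elapsed = time.perf_counter() - t0
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # LU cost plus the two solves
    return flops / elapsed / 1e9

if __name__ == "__main__":
    print(f"~{linpack_like_gflops():.1f} Gflop/s on this machine")
```

Rmax on the list is the best such rate a machine achieves on the largest problem it can run.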
Performance Development
[Chart: Top500 performance, 1993–2005, on a log scale from 100 Mflop/s to 1 Pflop/s, with curves for N=1, N=500, and SUM. In 1993 the #1 system delivered 59.7 Gflop/s, the #500 entry 0.4 Gflop/s, and the list total 1.167 Tflop/s; by November 2005 the #1 system (IBM BlueGene/L) reaches 280.6 Tflop/s, the #500 entry 1.646 Tflop/s, and the total 2.3 Pflop/s. Milestone #1 systems marked: Fujitsu 'NWT' (NAL), Intel ASCI Red (Sandia), IBM ASCI White (LLNL), NEC Earth Simulator, IBM BlueGene/L. 'My Laptop' is plotted for comparison.]
Architecture/Systems Continuum
♦ Custom processor with custom interconnect
Cray X1, NEC SX-8, IBM Regatta, IBM Blue Gene/L
♦ Commodity processor with custom interconnect
SGI Altix (Intel Itanium 2); Cray XT3, XD1 (AMD Opteron)
♦ Commodity processor with commodity interconnect
Clusters: Pentium, Itanium, Opteron, or Alpha processors with GigE, Infiniband, Myrinet, or Quadrics
NEC TX7, IBM eServer, Dawning
The continuum runs from tightly coupled (custom) to loosely coupled (commodity):
♦ Custom: best processor performance for codes that are not "cache friendly"; good communication performance; simpler programming model; most expensive
♦ Hybrid: good communication performance; good scalability
♦ Commodity: best price/performance (for codes that work well with caches and are latency tolerant); more complex programming model
[Chart: share (%) of Custom, Commodity, and Hybrid systems in the Top500, June 1993 – June 2004]
Commodity Processors
♦ Intel Pentium Nocona: 3.6 GHz, peak = 7.2 Gflop/s; Linpack 100 = 1.8 Gflop/s; Linpack 1000 = 4.2 Gflop/s
♦ Intel Itanium 2: 1.6 GHz, peak = 6.4 Gflop/s; Linpack 100 = 1.7 Gflop/s; Linpack 1000 = 5.7 Gflop/s
♦ AMD Opteron: 2.6 GHz, peak = 5.2 Gflop/s; Linpack 100 = 1.6 Gflop/s; Linpack 1000 = 3.9 Gflop/s
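The peak figures above follow from clock rate times floating-point operations per cycle; the per-cycle counts below are inferred from the slide's numbers rather than stated on it:

```latex
\text{peak} = \text{clock rate} \times \text{flops/cycle}:\quad
2.6\,\text{GHz}\times 2 = 5.2~\text{Gflop/s (Opteron)},\qquad
1.6\,\text{GHz}\times 4 = 6.4~\text{Gflop/s (Itanium 2)}.
```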
Architectures / Systems
[Chart: number of Top500 systems by architecture class, 1993–2005: SIMD, Single Processor, Cluster, Constellations, SMP, MPP; clusters are now the largest class (360 systems)]
Cluster: commodity processors & commodity interconnect
Constellation: # of procs per node ≥ # of nodes in the system
Interconnects / Systems
[Chart: number of Top500 systems by interconnect, 1993–2005: Others, Cray Interconnect, SP Switch, Crossbar, Quadrics, Infiniband, Myrinet, Gigabit Ethernet, N/A; Gigabit Ethernet (249 systems) and Myrinet (101) are the most common]
Processor Types
[Chart: number of Top500 systems by processor family, 1993–2005: SIMD, Sparc, Vector, MIPS, Alpha, HP, AMD, IBM Power, Intel]
Processors Used in Each of the 500 Systems
Intel IA-32: 41%
Intel EM64T: 16%
IBM Power: 15%
AMD x86_64: 11%
Intel IA-64: 9%
HP PA-RISC: 3%
Cray: 2%
HP Alpha: 1%
NEC: 1%
Sun Sparc: 1%
Hitachi SR8000: 0%
91% of the list: 66% Intel, 15% IBM, 11% AMD
26th List: The TOP10
| Rank | Manufacturer | Computer | Rmax [TF/s] | Installation Site | Country | Year | # Proc |
|---|---|---|---|---|---|---|---|
| 1 | IBM | BlueGene/L, eServer Blue Gene | 280.6 | DOE/NNSA/LLNL | USA | 2005 | 131,072 |
| 2 | IBM | BGW, eServer Blue Gene | 91.29 | IBM Thomas Watson | USA | 2005 | 40,960 |
| 3 | IBM | ASC Purple, Power5 p575 | 63.39 | DOE/NNSA/LLNL | USA | 2005 | 10,240 |
| 4 | SGI | Columbia, Altix, Itanium/Infiniband | 51.87 | NASA Ames | USA | 2004 | 10,160 |
| 5 | Dell | Thunderbird, Pentium/Infiniband | 38.27 | Sandia | USA | 2005 | 8,000 |
| 6 | Cray | Red Storm, Cray XT3 AMD | 36.19 | Sandia | USA | 2005 | 10,880 |
| 7 | NEC | Earth-Simulator SX-5 | 35.86 | Earth Simulator Center | Japan | 2002 | 5,120 |
| 8 | IBM | MareNostrum, PPC 970/Myrinet | 27.91 | Barcelona Supercomputer Center | Spain | 2005 | 4,800 |
| 9 | IBM | eServer Blue Gene | 27.45 | ASTRON/University Groningen | Netherlands | 2005 | 12,288 |
| 10 | Cray | Jaguar, Cray XT3 AMD | 20.53 | Oak Ridge National Lab | USA | 2005 | 5,200 |
Customer Segments / Performance
[Chart: share of Top500 performance by customer segment, 1993–2005: Research, Industry, Academic, Vendor, Classified, Government]
Countries / Performance
[Chart: share of Top500 performance by country, 1993–2005. Legend: Others, France, China, Germany, UK, Japan, US. In 2005 the US holds 68% of total performance; the other labeled shares are 6.0%, 5.4%, 3.0%, 2.6%, and 1.0%.]
Asian Countries / Systems
[Chart: number of Top500 systems in Asian countries, 1993–2005. Legend: Others, India, Korea, China, Japan. Labels for 2005: Japan 21, China 17, Korea 7, India 4, others 17.]
Concurrency Levels of the Top500
[Chart: number of Top500 systems by processor count, 1993–2005, binned: 1, 2, 3-4, 5-8, 9-16, 17-32, 33-64, 65-128, 129-256, 257-512, 513-1024, 1025-2048, 2049-4096, 4k-8k, 8k-16k, 16k-32k, 32k-64k, 64k-128k]
Concurrency Levels of the Top500
[Chart: minimum, average, and maximum number of processors per Top500 system, June 1993 – June 2005, on a log scale from 1 to 1,000,000; labeled values in 2005: 50, 1,462, and 131K]
IBM BlueGene/L: 131,072 Processors (#1 - 64K and #2 - 40K)
"Fastest Computer": BG/L, 700 MHz, 131K processors, 64 racks. Peak: 367 Tflop/s; Linpack: 281 Tflop/s (77% of peak).
System buildup:
- Chip (2 processors): 2.8/5.6 GF/s, 4 MB cache
- Compute Card (2 chips, 2x1x1): 4 processors, 5.6/11.2 GF/s, 1 GB DDR
- Node Board (32 chips, 4x4x2): 16 compute cards, 64 processors, 90/180 GF/s, 16 GB DDR
- Rack (32 node boards, 8x8x16): 2,048 processors, 2.9/5.7 TF/s, 0.5 TB DDR
- System (64 racks, 64x32x32): 131,072 processors, 180/360 TF/s, 32 TB DDR
BlueGene/L Compute ASIC: the compute node ASICs include all networking and processor functionality. Each compute ASIC includes two 32-bit superscalar PowerPC 440 embedded cores (note that L1 cache coherence is not maintained between these cores). Full system total of 131,072 processors.
Linpack run: about 13K seconds (roughly 3.6 hours) at n = 1.8M.
Power: 1.6 MWatts (about 1,600 homes); 43,000 ops/s per person.
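As a sanity check on the run-time note above, the standard LINPACK operation count gives, ignoring lower-order terms:

```latex
t \approx \frac{\tfrac{2}{3}\,n^{3}}{R_{\max}}
  = \frac{\tfrac{2}{3}\,(1.8\times10^{6})^{3}}{280.6\times10^{12}\ \text{flop/s}}
  \approx 1.4\times10^{4}\ \text{s} \approx 3.8\ \text{hours},
```

consistent with the roughly 13,000-second figure quoted on the slide.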
Performance Projection
[Chart: projection of Top500 N=1, N=500, and SUM performance, 1993–2015, on a log scale from 100 Mflop/s to 1 Eflop/s, with the DARPA HPCS program marked and 'My Laptop' shown for comparison]
A PetaFlop Computer by the End of the Decade
♦ 10 companies are working on building a Petaflop system by the end of the decade:
Cray, IBM, Sun; Dawning, Galactic, Lenovo (Chinese companies); Hitachi, NEC, Fujitsu (the Japanese "Life Simulator", 10 Pflop/s); Bull
Flops per Gross Domestic Product (Based on the November 2005 Top500)
[Chart: Flops per GDP by country: Israel, USA, New Zealand, Switzerland, Russia, UK, Netherlands, Australia, South Korea, Saudi Arabia, China, Japan, Ireland, Spain, Germany, Taiwan, Canada, Brazil]
KFlop/s per Capita (Flops/Pop), Based on the November 2005 Top500
[Chart: KFlop/s per capita by country, 0 to 6,000: United States, Switzerland, Israel, New Zealand, United Kingdom, Netherlands, Australia, Japan, Germany, Spain, Canada, South Korea, Sweden, Saudi Arabia, France, Taiwan, Italy, Brazil, Mexico, China, Russia, India]
WETA Digital (Lord of the Rings)
Hint: Peter Jackson had something to do with this
Has nothing to do with the 47.2 million sheep in NZ
Fuel Efficiency: GFlops/Watt
Top 20 systems; based on processor power rating only (3, >100, >800)
[Chart: GFlops/Watt for the top 20 systems, 0 to 0.9: BlueGene/L and other Blue Gene systems, ASC Purple (p5 1.9 GHz), Columbia (SGI Altix 1.5 GHz), Thunderbird (Pentium 3.6 GHz), Red Storm (Cray XT3 2.0 GHz), Earth-Simulator, MareNostrum (PPC 970 2.2 GHz), Jaguar (Cray XT3 2.4 GHz), Thunder (Intel Itanium2 1.4 GHz), Cray XT3 2.6 GHz, Apple XServe 2.0 GHz, Cray X1E (4 GB and 2 GB), ASCI Q (Alpha 1.25 GHz), IBM p5 575 1.9 GHz, System X (2.3 GHz Apple XServe), SGI Altix 3700 1.6 GHz]
[Chart: power density (W/cm²) vs. time, 1970–2010, on a log scale from 1 to 10,000, following Intel processors from the 4004, 8008, 8080, 8085, 8086, 286, 386, and 486 to the Pentium; reference lines for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface. Source: Intel Developer Forum, Spring 2004, Pat Gelsinger (Pentium at 90 W).]
Cube relationship between the cycle time and power.
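One common way to see that cube relationship (a rough dynamic-power argument, assuming the supply voltage is scaled roughly in proportion to clock frequency):

```latex
P_{\text{dynamic}} \approx C\,V^{2} f, \qquad V \propto f \;\Rightarrow\; P \propto f^{3}.
```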
Lower Voltage; Increase Clock Rate & Transistor Density
We have seen an increasing number of gates on a chip and increasing clock speeds.
Heat is becoming an unmanageable problem; Intel processors now exceed 100 Watts.
We will not see dramatic increases in clock speed in the future.
However, the number of gates on a chip will continue to increase.
Intel Yonah will double the processing power on a per watt basis.
[Chart: the end of the "free lunch." One axis shows operations per second for serial code; the other shows the additional operations per second available if code can take advantage of concurrency. The old trajectory (3 GHz, 6 GHz, 12 GHz, 24 GHz, all single-core) was a free lunch for traditional software: it just ran twice as fast every 18 months with no change to the code. The new trajectory (3 GHz with 2, 4, then 8 cores) offers no free lunch for traditional software: without highly concurrent software it won't get any faster.]
CPU Desktop Trends 2004-2010
[Chart: projected cores per processor chip and hardware threads per chip by year, 2004–2010 (0 to 300)]
♦ Relative processing power will continue to double every 18 months
♦ 256 logical processors per chip in late 2010
Commodity Processor Trends: Bandwidth/Latency is the Critical Issue, not FLOPS
| | Annual increase | Typical value in 2005 | Typical value in 2010 | Typical value in 2020 |
|---|---|---|---|---|
| Single-chip floating-point performance | 59% | 4 GFLOP/s | 32 GFLOP/s | 3,300 GFLOP/s |
| Front-side bus bandwidth | 23% | 1 GWord/s (0.25 word/flop) | 3.5 GWord/s (0.11 word/flop) | 27 GWord/s (0.008 word/flop) |
| DRAM latency | (5.5%) | 70 ns (280 FP ops, 70 loads) | 50 ns (1,600 FP ops, 170 loads) | 28 ns (94,000 FP ops, 780 loads) |
Source: Getting Up to Speed: The Future of Supercomputing, National Research Council, 222 pages, 2004, National Academies Press, Washington DC, ISBN 0-309-09502-6.
Got Bandwidth?
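The "FP ops" entries in the DRAM-latency row appear to be simply the latency multiplied by that year's single-chip floating-point rate, i.e. how many floating-point operations the chip could issue while waiting for one memory access; using the 2005 column:

```latex
70\ \text{ns} \times 4\ \text{GFLOP/s}
  = 70\times10^{-9}\,\text{s} \times 4\times10^{9}\,\text{flop/s}
  = 280\ \text{FP ops per memory access.}
```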
Fault Tolerance: Motivation
♦ Trends in HPC: high-end systems with thousands of processors
♦ Increased probability of a node failure, even though most systems nowadays are robust (see the estimate after this list)
♦ MPI is widely accepted in scientific computing, but process faults are not tolerated in the MPI model
There is a mismatch between the hardware and the (non-fault-tolerant) programming paradigm of MPI.
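A rough estimate behind the node-failure bullet, assuming independent, identically reliable nodes (the numbers below are illustrative and not from the slide):

```latex
\text{MTBF}_{\text{system}} \approx \frac{\text{MTBF}_{\text{node}}}{N},
\qquad\text{e.g. } \frac{10^{5}\ \text{hours}}{10^{4}\ \text{nodes}} = 10\ \text{hours}.
```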
Reliability of Leading-Edge HPC Systems
| System | CPUs | Reliability |
|---|---|---|
| LANL ASCI Q | 8,192 | MTBI: 6.5 hours. Leading outage sources: storage, CPU, memory. |
| LLNL ASCI White | 8,192 | MTBF: 5.0 hours ('01) and 40 hours ('03). Leading outage sources: storage, CPU, 3rd-party HW. |
| Pittsburgh Lemieux | 3,016 | MTBI: 9.7 hours. |

MTBI: mean time between interrupts = wall clock hours / # downtime periods. MTBF: mean time between failures (measured).
♦ 100K-processor systems are here; we have fundamental challenges in dealing with machines of this size … and little in the way of programming support
Future Challenge: Developing the Ecosystem for HPC
From the NRC Report on "The Future of Supercomputing":
♦ Hardware, software, algorithms, tools, networks, institutions, applications, and people who solve supercomputing applications can be thought of collectively as an ecosystem.
♦ Research investment in HPC should be informed by the ecosystem point of view: progress must come on a broad front of interrelated technologies, rather than in the form of individual breakthroughs.
A supercomputer ecosystem is a continuum of computing platforms, system software, algorithms, tools, networks, and the people who know how to exploit them to solve computational science applications.
Real Crisis With HPC Is With The Software
♦ Our ability to configure a hardware system capable of 1 PetaFlop (10^15 ops/s) is without question just a matter of time and $$.
♦ A supercomputer application and its software are usually much longer-lived than the hardware: hardware life is typically five years at most, while applications live 20-30 years. Fortran and C are (still!!) the main programming models.
♦ The REAL CHALLENGE is software: programming hasn't changed since the 70's; it takes a HUGE manpower investment; MPI… is that all there is?; it often requires HERO programming; investment in the entire software stack is required (OS, libraries, etc.).
♦ Software is a major cost component of modern technologies. The tradition in HPC system procurement is to assume that the software is free… SOFTWARE COSTS (over and over).
Summary of Current Unmet Needs
♦ Performance / portability
♦ Fault tolerance
♦ Memory bandwidth / latency
♦ Adaptability: some degree of autonomy to self-optimize, test, or monitor; able to change mode of operation (static or dynamic)
♦ Better programming models: global shared address space, visible locality
♦ Maybe coming soon (incremental, yet offering real benefits): Global Address Space (GAS) languages (UPC, Co-Array Fortran, Titanium, Chapel); "minor" extensions to existing languages; more convenient than MPI; performance transparency via explicit remote memory references
Collaborators / Support
♦ Top500 Team: Erich Strohmaier (NERSC), Hans Meuer (Mannheim), Horst Simon (NERSC)
http://www.top500.org/
Next Steps
♦ Software to determine the checkpointing interval and number of checkpoint processors from the machine characteristics; perhaps use historical information; monitoring; migration of a task if there is a potential problem.
♦ Local checkpoint and restart algorithm: coordination of local checkpoints; processors hold backups of neighbors.
♦ Have the checkpoint processes participate in the computation and do data rearrangement when a failure occurs: use p processors for the computation and have k of them hold checkpoints.
♦ Generalize the ideas to provide a library of routines to do the diskless checkpointing (see the sketch after this list).
♦ Look at "real applications" and investigate "lossy" algorithms.
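A minimal sketch of the diskless-checkpointing idea behind these bullets, under simplifying assumptions that are not in the slides: a single checkpoint processor, NumPy arrays standing in for each processor's local state, and a sum-based parity so that the data of any one lost processor can be rebuilt from the survivors.

```python
import numpy as np

def make_checkpoint(local_states):
    """Checkpoint processor stores the elementwise sum of all local states."""
    return sum(local_states)                 # parity-style encoding over floats

def recover(surviving_states, checkpoint):
    """Rebuild the single lost processor's state from the checkpoint and survivors."""
    return checkpoint - sum(surviving_states)

if __name__ == "__main__":
    p = 4                                                    # compute processors (hypothetical)
    states = [np.full(3, rank + 1.0) for rank in range(p)]   # each rank's local data
    ckpt = make_checkpoint(states)

    lost = 2                                                 # pretend processor 2 fails
    survivors = [s for r, s in enumerate(states) if r != lost]
    restored = recover(survivors, ckpt)
    assert np.allclose(restored, states[lost])
    print("recovered state of rank", lost, ":", restored)
```

A real implementation would checkpoint over MPI, use more robust encodings (e.g. XOR or Reed-Solomon across k checkpoint processors), and account for the floating-point roundoff this toy version ignores.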
Operating Systems / Systems
[Chart: number of Top500 systems by operating system, 1998–2005: Unix, Linux, BSD Based, N/A, FreeBSD, Others]
Clusters / Systems
[Chart: number of cluster systems in the Top500 by family, 1997–2005: IBM Cluster, HP Cluster, Intel Pentium, HP AlphaServer, AMD Cluster, Dell Cluster, Sun Cluster, Alpha Cluster, Sun Fire, Intel Itanium, Others]
Japanese: Tightly-Coupled Heterogeneous System
♦ Would like to get to 10 PetaFlop/s by 2011
♦ Scalable, fits any computer center: size, cost, and ratio of components
♦ Easy and low-cost to develop a new component
♦ Scale merit of components
[Diagram: present system: vector, scalar, and MD nodes, each with its own faster interconnect, joined through a switch over a slower connection. Future system: vector, scalar, FPGA, and MD nodes all joined by a faster interconnect.]
How Big Is Big?
♦ Every 10X brings new challenges
64 processors was once considered large; it hasn't been "large" for quite a while
1024 processors is today's "medium" size; 8096 processors is today's "large", and we're struggling even here
♦ 100K-processor systems are in construction; we have fundamental challenges in dealing with machines of this size … and little in the way of programming support
Interconnects / Systems
[Chart: number of Top500 systems by interconnect, 1993–2005: Cray Interconnect, SP Switch, Crossbar, Others, Infiniband, Quadrics, Gigabit Ethernet, Myrinet, N/A; the two largest shares are labeled 28% and 24%]
Myrinet and GigE > 50% of market
Real Crisis With HPC Is With The Software
♦ Programming is stuck: arguably it hasn't changed since the 60's
♦ It's time for a change: complexity is rising dramatically
highly parallel and distributed systems, from 10 to 100 to 1,000 to 10,000 to 100,000 processors!!
multidisciplinary applications
♦ A supercomputer application and its software are usually much longer-lived than the hardware: hardware life is typically five years at most; Fortran and C are the main programming models
♦ Software is a major cost component of modern technologies.
The tradition in HPC system procurement is to assume that the software is free.
♦ We have too few ideas about how to solve this problem.
Processor Types
[Chart: number of Top500 systems by processor type, 1993–2004: SIMD, Sparc, MIPS, Intel, HP PA-RISC, HP Alpha, IBM Power, other scalar, vector]
Today's Processors
♦ Pipelining (superscalar, OOO, VLIW, branch prediction, predication)
♦ Simultaneous multithreading (SMT, Hyper-Threading, multi-core)
♦ SIMD vector instructions (VIS, MMX/SSE, AltiVec)
♦ Caches and the memory hierarchy
♦ Intel added 36 instructions per year to IA-32, or 3 instructions per month!
KFlop/s per Capita (Flops/Pop), Based on the November 2004 Top500 only
[Chart: KFlop/s per capita by country, 0 to 1,600: China, Brazil, Italy, Mexico, France, South Korea, Saudi Arabia, Germany, Belarus, Switzerland, Canada, Spain, Japan, Israel, United Kingdom, New Zealand, United States]
WETA Digital (Lord of the Rings)
Hint: Peter Jackson had something to do with this
Has nothing to do with the 47.2 million sheep in NZ
Today's CPU Architecture
Moore’s Law for Power Consumption
Heat is becoming an unmanageable problem
Self Adapting Numerical Software
♦ The process of arriving at an efficient solution involves many decisions by an expert: algorithm decisions, data decisions, management of the computing environment, and processor-specific tuning.
Proceedings of the IEEE, Vol. 93, No. 2, Feb. 2005: Issue on Program Generation, Optimization, and Platform Adaptation.
There is a complex set of interactions between the user's application, algorithm, programming language, compiler, machine instructions, and hardware: many layers of translation from the application to the hardware, changing with each generation of hardware.
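A minimal sketch of the self-adapting idea, under assumptions that are not in the slides (pure NumPy, with the block size of a blocked matrix multiply as the only tuning parameter): time a few candidate configurations empirically on the machine at hand and keep the fastest, which is the spirit of ATLAS-style empirical tuning.

```python
import time
import numpy as np

def blocked_matmul(A, B, block):
    """Blocked matrix multiply; the block size is the tunable parameter."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, block):
        for k in range(0, n, block):
            for j in range(0, n, block):
                C[i:i+block, j:j+block] += A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
    return C

def autotune(n=512, candidates=(32, 64, 128, 256)):
    """Empirically pick the fastest block size on this machine."""
    rng = np.random.default_rng(0)
    A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    best_block, best_time = None, float("inf")
    for block in candidates:
        t0 = time.perf_counter()
        blocked_matmul(A, B, block)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block

if __name__ == "__main__":
    print("best block size on this machine:", autotune())
```

A production system would search a much larger space (algorithm variants, data layouts, compiler flags) and cache the results per platform.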