Presented at the October 9, 2001 THIC Meeting
WestCoast Silverdale Hotel, Silverdale WA 98383-9191
The NNSA ASCI Program: Advanced Simulation and Computing
Steve Louis
Lawrence Livermore National Laboratory
7000 East Avenue, Livermore, CA, 94550-0234
Phone: +1-925-422-1550 FAX: +1-925-423-8715
E-mail: [email protected]
UCRL-PRES-146034
This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.
THIC Mtg Silverdale WA 9-October-2001 2
Outline
• The NNSA Stockpile Stewardship Program
• Where We are Now: ASCI/LLNL Computing
• Challenges for Today and for the Future
ASCI supports the DOE/NNSA Stockpile Stewardship Program
Past
• Underground nuclear testing
• Many different system designs
• Frequent modernization
• Large numbers of warheads
• Numerical simulations
• Simulation of the high-explosive pre-nuclear phase
Future
• No nuclear testing
• Few system designs
• No modernization
• Modest numbers of warheads
• Numerical simulations
• Simulation of the high-explosive pre-nuclear phase
• Laboratory simulations of the nuclear or high-energy density phase
2004/2005 is a critical time period. Confidence in the stockpile is at risk.
Challenge: Maintain stockpile confidence as changes occur
Simulation plays central role to maintain stockpile confidence
Stockpile confidence
• Physical data: opacity, EOS, nuclear
• Experiments testing algorithms: numerical, physical
• Algorithms: hydro, transport, burn, mix
• Codes: 1D, 2D, 3D
• Performance predictions: yield, output
• Theory
• Comparison to archived test data
• Stockpile surveillance
• Experiments measuring physical data and material aging
• Aboveground integral experiments testing complex interactions
• Stockpile refurbishment and rebuilding
But its value is critically dependent on the other elements of the integrated program
Nature of simulations changing with the loss of nuclear testing
Supporting a stockpile of aging, highly optimized nuclear weapons demands advanced simulation capability
Without Nuclear Experiments
• Will it continue to work as it ages?
• Is the simulation adequate for making decisions affecting national security?
With Nuclear Experiments
• Will it work as designed?
• Is the simulation good enough to risk cost of a nuclear experiment?
[Image: ASCI White]
ASCI simulation time scales go from femtoseconds to years
[Figure: simulation levels arranged by distance (m), with ticks at 10^-10, 10^-9, 10^-8, 10^-6, 10^-2, and 1, and by time scale: femtoseconds, nanoseconds, microseconds, milliseconds, minutes, years]
Level 1: Atomic physics opacity
Level 2: Molecular/atomic level simulations
Level 3: Turbulence and mix simulations
Level 4: Continuum models
Level 5: System validation. Examples: fires, aging, explosions
Computing capabilities require advances in several key areas
Stockpile stewardship pushes the limits in weapon simulation codes, computational power, and supporting infrastructures
Weapon codes and science
Applications
Computing power
Computational infrastructure
[Figure: System-Area-Network Model for a Site Wide Global File System. Diagram labels: capability platform, capacity compute farms, visualization cluster, digital displays, CPU nodes, login, NFS, and net nodes, file systems, HPSS, system data and control networks, InfiniBand I/O network, high-speed I/O network(s)]
ASCI's elements have evolved to meet the SSP requirements
• Defense Applications & Modeling: Applications; V & V; Physical & Materials Modeling
• Simulation & Computer Science: PSE; Distance Computing; VIEWS; PathForward
• Integrated Computing Systems: Platforms; Production Computing & Center Operation
• University Partnerships: Institutes; Alliances
• Integration
The ASCI budget is healthy and still growing
[Figure: ASCI budget in millions of dollars (axis from 0 to 800), FY96 through FY01. Excludes building construction.]
Simulation also plays a key role in virtually every LLNL program
Terascale Scientific Simulation
• Environmental: global climate, groundwater flow
• Lasers & Energy: combustion, ICF modeling
• Engineering: structural dynamics, electromagnetics
• Physics & Biology: materials modeling, drug design
• Stockpile Stewardship: radiation transport, hydrodynamics
Experiments are too expensive
Experiments are impractical
Experiments are prohibited
Simulation is emerging as a peer to theory and experiment in scientific discovery, since it challenges these to refine quality and accuracy
Outline
• The NNSA Stockpile Stewardship Program
• Where We are Now: ASCI White at LLNL
• Challenges for Today and for the Future
Requirements for very large parallel computer systems
• Many thousands of processors (up to ~20,000), with high reliability and high parallel efficiency
• Typical calculations will require a large fraction of the total machine for a hundred or more hours
• Some examples of problems that need extremely large-scale computing capabilities beyond ASCI
—Micro and macro weather simulations
—Global climate and ocean simulations
—Material aging studies
—Pharmaceutical design
—Biology (brain function, circulatory systems, DNA)
What can you get for $250M?
Shortstop for the NNSA Softball Team
100-TF National Security Computer
ASCI White delivery in FY00 was a key programmatic milepost
[Figure: milestone timeline for ASCI program elements, FY00 through FY05 (starting 10/1/99); vertical axis: capability]
ASCI Program Elements
Applications
• 3-D primary burn prototype simulation
• 3-D prototype hostile environment simulation
• 3-D secondary burn prototype simulation
• Mechanics for normal environments
• 3-D prototype full-system coupled simulation
• 3-D high-fidelity-physics full-system burn simulation, initial capability
• 3-D high-fidelity-physics full-system simulation, initial capability
• Full-system STS simulation
• 3-D high-fidelity-physics primary burn simulation, initial capability
• 3-D safety simulation of a complex abnormal explosive-initiation scenario
• Coupled multi-physics for abnormal environments
Materials & Physics Modeling
• Delivery of initial macro-scale reactive flow model for high explosive detonation derived from grain scale dynamics
Verification & Validation
• Demonstrate initial validation methodology on the then-current state of application modeling of early-time primary behavior
• Demonstrate initial uncertainty quantification assessments of ASCI nuclear and non-nuclear simulation codes
PSE
• Initial software development environment extended to the 10-TOPS system
DisCom2
• Distance-computing environment available for use of the 10-TOPS ASCI system
• Complex-wide infrastructure that integrates all ASCI resources
VIEWS
• Prototype system that allows weapons analysts to see and understand results from 3-D prototype primary-burn simulations
• Ability to do realtime analysis on a 200 TB ASCI dataset
Platforms
• 10 TeraOPS (Option White) system delivery and checkout
• 30 TeraOPS final system delivery and checkout
• 50+ TeraOPS (Option Purple) final system delivery and checkout
• 100 TeraOPS final system delivery and checkout
100 TeraOPS is the entry-level capability for SSP requirements
[Figure: peak TeraOPS by year, 1996 through 2004, plotted against a curve labeled "Moore's Law, single processor performance"]
• Sandia (Intel) ASCI “Red”: 1+ teraOPS
• LLNL (IBM) and LANL (SGI) ASCI “Blue”: 3+ teraOPS
• LLNL IBM's ASCI White: 10+ teraOPS (demonstrated last spring and delivered summer 2000)
• LANL ASCI: 30 teraOPS
• ASCI 50+ teraOPS
• ASCI 100 teraOPS
Our deadline is the year 2004
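The gap between this roadmap and the Moore's Law curve can be estimated with a short calculation. The sketch below is a rough illustration, not the program's own analysis. It assumes the 1+ teraOPS ASCI Red point sits at 1997 and the 100 teraOPS target at 2004 (both read off the chart's 1996 to 2004 axis), and it assumes the commonly quoted 18-month doubling period for Moore's Law, which the slide does not state.

```python
import math


def doubling_time_months(start_perf, end_perf, years):
    """Months per doubling implied by growing from start_perf to end_perf over `years`."""
    doublings = math.log2(end_perf / start_perf)
    return years * 12 / doublings


# Assumed endpoints: ~1 teraOPS (ASCI Red, taken as 1997) to 100 teraOPS (2004).
START_YEAR, END_YEAR = 1997, 2004
asci_months = doubling_time_months(1.0, 100.0, END_YEAR - START_YEAR)

# Assumption: Moore's Law taken as one doubling every 18 months.
MOORE_MONTHS = 18.0
moore_growth = 2 ** ((END_YEAR - START_YEAR) * 12 / MOORE_MONTHS)

print(f"ASCI roadmap doubling time: {asci_months:.1f} months")
print(f"Growth over 7 years at an 18-month doubling: {moore_growth:.0f}x (roadmap: 100x)")
```

Under these assumptions the roadmap doubles roughly every 12 to 13 months, while an 18-month doubling would yield only about a 25x gain over the same seven years.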
National security continues to require very high capability
[Figure: bar chart of site capability in trillions of calc/sec. Legend: DOE-NNSA Laboratory, DoD Laboratory, Commercial Enterprise, Non-Defense Research]
Lawrence Livermore National Laboratory
National Security Agency
Nu-Tec Life Sciences
Compaq
University of Tokyo
Sandia National Laboratories
Raytheon
Los Alamos National Laboratory
Lawrence Berkeley Laboratory
Nat. Center for Environmental Prediction
Naval Oceanographic Office
Maui HPC Center
Charles Schwab
Blue Pacific SST 3.9 TeraOP Hyper-Cluster Architecture
Each SP sector comprised of
• 488 Silver nodes
• 24 HPGN Links
System Parameters
• 3.89 TFLOP/s Peak
• 2.6 TB Memory
• 62.5 TB Global disk
Sector Y: 1.5 GB/node Memory, 20.5 TB Global Disk, 4.4 TB Local Disk
Sector S: 2.5 GB/node Memory, 24.5 TB Global Disk, 8.3 TB Local Disk
Sector K: 1.5 GB/node Memory, 20.5 TB Global Disk, 4.4 TB Local Disk
[Diagram: three sectors with link labels 24 HPGN, 6 FDDI, and 2 GbE]
• Partnership to build the world's most powerful computer announced at the White House on July 26, 1996
• Contract was signed on August 12, 1996
• Initial delivery was made in Livermore more than 30 days early on September 20, 1996
• Full configuration was up and running by September 1998
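The system memory figure can be cross-checked against the per-sector numbers on this slide. The sketch below is only an illustrative check. It assumes every sector holds exactly 488 nodes and that the slide's TB figure uses binary units (1 TB = 1024 GB), neither of which the slide states explicitly.

```python
NODES_PER_SECTOR = 488  # Silver nodes per SP sector, from the slide

# Memory per node in GB for each sector, from the slide.
gb_per_node = {"Y": 1.5, "S": 2.5, "K": 1.5}

total_nodes = NODES_PER_SECTOR * len(gb_per_node)
total_memory_gb = sum(NODES_PER_SECTOR * gb for gb in gb_per_node.values())

# Assumption: binary units, 1 TB = 1024 GB.
total_memory_tb = total_memory_gb / 1024

print(f"{total_nodes} nodes, {total_memory_gb:.0f} GB = {total_memory_tb:.2f} TB of memory")
```

With these assumptions the three sectors add up to 1,464 nodes and about 2.6 TB of memory, in line with the stated system parameter.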
Smaller White system used for science runs Nov '00 - Feb '01
• Recent science runs on Frost have provided a unique opportunity to study material dynamics at the atomistic level with unprecedented problem sizes.
• IBM Almaden, with LLNL materials scientists and visualization experts, successfully ran computations involving a billion atoms on 2000-5000 CPUs.
• Results: surprising discoveries that cracks can travel at supersonic speeds, and showing in unprecedented detail the complexities and structure of the molecular dislocation dynamics.
LLNL now operates the world's most powerful supercomputer
• ASCI White peak speed of 12.3 TeraOPS (trillion operations per second)
• ASCI White weighs 106 tons, covers 10,000 square feet of floor space
• Contains 8,192 P3 375 MHz processors in 512 shared memory nodes
• Latest IBM technology: silicon-on-insulator with copper interconnects
• 8 TB of memory, 36 TB local disk, 110 TB global (GPFS) disk
• 5.12 GB/s local I/O and 12.8 GB/s global I/O bandwidths
• 4 login nodes, 3 system nodes, 16 GPFS server nodes, 32 VIEWS nodes
• 457 batch/compute nodes (each with 16 GB memory)
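The node breakdown above can be tallied against the stated totals. This is a minimal sketch using only the counts on the slide. The 16 CPUs per node is taken from the stated 8,192 processors in 512 nodes, and the variable names are illustrative.

```python
# Node counts by role, as listed on the slide.
node_roles = {
    "login": 4,
    "system": 3,
    "GPFS server": 16,
    "VIEWS": 32,
    "batch/compute": 457,
}

CPUS_PER_NODE = 8192 // 512  # 16, from the stated processor and node totals

total_nodes = sum(node_roles.values())
total_cpus = total_nodes * CPUS_PER_NODE
compute_share = node_roles["batch/compute"] / total_nodes

print(f"{total_nodes} nodes, {total_cpus} CPUs, {compute_share:.0%} of nodes run batch/compute work")
```

The roles sum to the 512 nodes quoted for the machine, with roughly nine in ten nodes dedicated to batch and compute work.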
ASCI White IBM Nighthawk-2 Node Specifications
Number of CPUs per Node: 16
CPU Clock Speed: 375 MHz
Node Peak Perf.: ~24 GigaOP/s
Memory per node: 16 GB
Local Disk per node: 72 GB
POWER3 processors are super-scalar pipelined 64-bit RISC chips with two floating-point units and three integer units. They are capable of executing up to eight instructions per clock cycle and up to four floating-point operations per cycle.
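The node and system peak figures follow directly from the numbers quoted in these slides. The sketch below reproduces that arithmetic. It assumes peak means every CPU retiring four floating-point operations on every cycle, and it treats 1 TB as 1024 GB for the memory and disk totals. The 512-node count comes from the system summary slide.

```python
CPUS_PER_NODE = 16
CLOCK_HZ = 375e6        # 375 MHz
FLOPS_PER_CYCLE = 4     # up to four floating-point operations per cycle
NODES = 512             # from the ASCI White system summary
MEMORY_GB_PER_NODE = 16
LOCAL_DISK_GB_PER_NODE = 72

# Peak rate assumes every CPU issues its maximum FLOPs every cycle.
node_peak = CPUS_PER_NODE * CLOCK_HZ * FLOPS_PER_CYCLE
system_peak = node_peak * NODES

# Assumption: 1 TB = 1024 GB.
total_memory_tb = NODES * MEMORY_GB_PER_NODE / 1024
total_local_disk_tb = NODES * LOCAL_DISK_GB_PER_NODE / 1024

print(f"Node peak:   {node_peak / 1e9:.0f} GigaOP/s")
print(f"System peak: {system_peak / 1e12:.1f} TeraOPS")
print(f"Memory: {total_memory_tb:.0f} TB, local disk: {total_local_disk_tb:.0f} TB")
```

These products match the 24 GigaOP/s node peak, the 12.3 TeraOPS system peak, and the 8 TB of memory and 36 TB of local disk quoted for ASCI White.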
ASCI White IBM SP Switch2 high-performance technology
• Increased communication bandwidth with reduced message latencies
• Bi-sectional B/W over 4 Tb/s
• Compared to previous technologies, provides:
— Faster interfaces, data paths, and microprocessor
— Reduced microcode workloads using hardware assisted datagram reassembly
— Microprocessor bus operations concurrent with data movement
• Switch adapters controlled by node software and on-card microcode
• 500 MB/s data rates each direction per adapter with message retry
Simulations managed from data generation to data assessment
• High performance simulation requires balanced systems
— Supercomputers
— GigE networks/switches
— Local and archival storage
— Data analysis/visualization
— Algorithm development
— Programming techniques
— Facilities
– floor space
– power
– cooling
State-of-the-art visualization facilities in same buildings that house code physicists/analysts
HPSS archival storage recent performance improvements
Accomplishments
— A 20x performance increase in 15 months (faster nets and disks)
— PSE Milepost demonstrated 170 MB/s aggregate throughput White-to-HPSS
— Large single file transfer rates of up to 80 MB/s White-to-HPSS
— Large single file transfer rates of up to 150 MB/s White-to-SGI
Challenges
— Yearly doubling of throughput is needed for next machine
At 170 MB/s, 2 TB of data moves to storage in less than 4 hours. A year and a half ago it took two and a half days to move the same amount of data.
[Figure: Aggregate Throughput to Storage, in MB/s (axis from 0 to 180), FY96 through FY01]
Data labels, in order from FY96 to FY01: 1 MB/s, 4 MB/s, 6 MB/s, 9 MB/s, 120 MB/s, 170 MB/s
Annotations:
• Moved to HPSS
• Moved to SP Nodes
• Moved to Jumbo GE & Parallel Striping
• Moved to Faster Disk on Faster Nodes & multi-node Concurrency
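The transfer-time claim on this slide can be checked with simple arithmetic. The sketch below is an illustration only. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant sustained rate, and the function name is hypothetical.

```python
TB = 1e12  # assumption: decimal terabyte
MB = 1e6   # assumption: decimal megabyte


def transfer_hours(data_bytes, rate_bytes_per_s):
    """Hours needed to move data_bytes at a constant sustained rate."""
    return data_bytes / rate_bytes_per_s / 3600


# Today: 2 TB at the demonstrated 170 MB/s aggregate rate.
hours_now = transfer_hours(2 * TB, 170 * MB)

# Earlier: the same 2 TB took two and a half days; back out the implied rate.
seconds_then = 2.5 * 24 * 3600
rate_then_mb_s = 2 * TB / seconds_then / MB

speedup = 170 / rate_then_mb_s

print(f"2 TB at 170 MB/s: {hours_now:.2f} hours")
print(f"Implied earlier rate: {rate_then_mb_s:.1f} MB/s")
print(f"Speedup: {speedup:.1f}x")
```

Under these assumptions the transfer takes a little over three hours, and the earlier two-and-a-half-day figure implies a rate of roughly 9 MB/s, which agrees with the 9 MB/s data label and is broadly consistent with the quoted 20x improvement.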
How does a scientist cope with terascale, 3D simulation data?
“With 3D data sets, I can’t look at all the numbers anymore” ... LLNL scientist
XY plots and 2D graphics cannot accurately represent 3D simulation data
New methods are needed to analyze high resolution, 3D simulation data.
Terascale data sets will not fit on monitors.
[Figure: 3D sine wave]
Current ASCI PowerWall Capabilities at LLNL
• B-132 Assessment Theater
— 5x3 tiled 20M pixel display
— Usage: Two to three times per week for presentations to dignitaries, some data analysis, other demos
• B-451 Video Cube PowerWall
— 3x2 modular 8M pixel display
— Usage: Extensive both DNT and other
• B-111 Visualization Work Center
— 4x2 tiled 10M pixel display
— Usage: Milepost review in January, some for data analysis, several demos. Will grow more when resource management system deployed for instant access
• B-451 Vis Development Lab PowerWall
— 2x2 tiled 20M pixel display
— Usage: Demos (less since VideoCube), development
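For a sense of scale, the pixel budget of each wall can be divided across its tiles. This is a rough sketch that assumes "5x3" and similar labels describe a grid of equally sized tiles and that the quoted pixel counts are totals for the whole wall. The slide does not confirm either reading.

```python
# (columns, rows, total pixels) per display, from the slide.
powerwalls = {
    "B-132 Assessment Theater": (5, 3, 20e6),
    "B-451 Video Cube PowerWall": (3, 2, 8e6),
    "B-111 Visualization Work Center": (4, 2, 10e6),
    "B-451 Vis Development Lab PowerWall": (2, 2, 20e6),
}

# Assumption: the total is spread evenly over cols * rows tiles.
pixels_per_tile = {
    name: total / (cols * rows)
    for name, (cols, rows, total) in powerwalls.items()
}

for name, per_tile in pixels_per_tile.items():
    print(f"{name}: {per_tile / 1e6:.2f}M pixels per tile")
```

On this reading the first three walls come to about 1.25M to 1.33M pixels per tile, while the development lab wall works out to 5M pixels per tile.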
Outline
• The NNSA Stockpile Stewardship Program
• Where We are Now: ASCI White at LLNL
• Challenges for Today and for the Future
ASCI 30 TeraOPS “Q” System
• ~30 TeraOPS
• ~12,000 processors
• ~12 TB of memory
• ~600 TB usable disk storage
• Multi-rail high-speed switch
• ID System is first delivery
• FS-P1 - Final System Phase 1
• FS-BU - Final System Build-Up
• FS - Final System 30 TeraOPS
Strategic Computing Complex for siting the ASCI Q at LANL
• 303,000 sq. ft.
• 43,500 sq. ft. unobstructed computer room
• 1 PowerWall Theater, 4 Collaboration rooms, 2 Immersive Rooms
• Design Simulation Laboratories (200 classified, 100 unclassified)
• 200 seat auditorium
A 50+ TeraOPS procurement strategy for viz and filesystems
• New ideas for visualization capabilities
— Include visualization requirements with platform procurement
— Separately priced options with target of <10% platform budget
— Framework for multiple solutions, bridge existing environment
— Fast access to raw data and visualization files
— Special network to commodity rendering resources
• New ideas for networking, I/O and file systems
— Site-wide shared global file system
— Possible open source development
— 100+ GB/s delivered I/O xfer rates
— External InfiniBand or 10Gb Enet
— Parallel FTP transfers for speed
[Figure: System-Area-Network Model for a Site Wide Global File System. Diagram labels: capability platform, capacity compute farms, visualization cluster, digital displays, CPU nodes, login, NFS, and net nodes, file systems, HPSS, system data and control networks, InfiniBand I/O network, high-speed I/O network(s)]
Warning: There is a major ASCI speed bump ahead
Some Challenges
• ASCI will achieve its objectives...
— 100 TF by 2004 + full systems code
— Exceeding Moore’s Law for 7-10 years
— Energized other agencies and Nation
• ASCI soon at its limit for acceleration
— 4x non-ASCI sites not sustainable
— If the H/W doesn’t break, the S/W complexity will kill you...
• Major problems looming ...
— Cost, power, space
— SW scalability, usability, reliability
— Interconnect
— Distance-to-memory issues
— Availability
Quote from PITAC HPC summary
“Suppliers of high-end systems suffer from unusual market pressures...”
If ASCI acceleration stops, how to take simulation to next step?
PITAC Recommendations (President’s IT Advisory Committee)
— $$ for R&D on innovative computing technologies
— $$ for software research
— $$ for Petaflops on some applications by 2010
— $$ to fund the most powerful high-end systems
— Can this be leveraged into a broad national program?
Disruptive technologies
• Disruptive technology tends to start small with much faster growth rate
• R&D today can impact 2010-2020 timeframe and accelerate a transition from PetaFLOP to ExaFLOP computations
• This transition will require fundamental research and development in:
— Processor technology
— Memory
— Computer architecture
— Operating systems
— Programming environments
— Scientific applications
— Storage including MEMS and holographic devices
[Figure: performance (Kilo, Mega, Giga, Tera, Peta, Exa) versus year, 1970 through 2020. Labels: ECL, CMOS, "Mainframes died", "Conventional architectures die", "Coming Disruptive Technologies?"]
Evolutionary improvements require R&D as well
Summary and Conclusions
Unprecedented hardware and software simulation capabilities have been built through the ASCI Program at LLNL, LANL, SNL
Advanced simulation capabilities have several major elements (all of which must be present for effective use)
— Advanced codes, skilled scientists
— Advanced computing platforms and visualization
Simulation has become an integral part of science and technology programs at the national labs
“We are changing the nature of scientific discovery”
ASCI-style acceleration is inevitably going to slow down
— R&D for evolutionary and disruptive technologies is the hope
— Partnerships with industry and academia critical to success
“The opportunities go to those who understand the trends”
This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.
DISCLAIMER
This document was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes.