![Page 1: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/1.jpg)
Petascale I/O Impacts on Visualization
Hank Childs
Lawrence Berkeley National Laboratory
& UC Davis
March 24, 2010
27B element Rayleigh-Taylor Instability (MIRANDA, BG/L)
2 trillion element mesh
2006
2012
![Page 2: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/2.jpg)
How does the {peta-, exa-} scale affect visualization?
Large # of time steps
Large ensembles
High-res meshes
Large # of variables
Your mileage may vary
- Are you running full machine?
- How much data do you output?
![Page 3: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/3.jpg)
The soon-to-be “good ole days” … how visualization is done right now
[Figure: the parallel simulation code writes pieces of data (P0 to P9) to disk; in the parallelized visualization data flow network, Processors 0, 1, and 2 each run Read, Process, Render on their pieces.]
This technique is called “pure parallelism”
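The data flow above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool implements it: the `read`, `process`, and `render` functions are hypothetical stand-ins, the round-robin assignment of pieces to processors is an assumption, and threads stand in for the processors of a parallel machine.

```python
from concurrent.futures import ThreadPoolExecutor

def read(piece_id):
    # Hypothetical stand-in for reading one piece of the mesh from disk.
    return [float(piece_id + i) for i in range(4)]

def process(values):
    # Hypothetical stand-in for a vis operation (keep values above a threshold).
    return [v for v in values if v >= 3.0]

def render(values):
    # Hypothetical stand-in for rendering: reduce the piece to one number.
    return len(values)

def assign_pieces(num_pieces, num_processors):
    # Assumed round-robin assignment of pieces to processors.
    return {p: list(range(p, num_pieces, num_processors))
            for p in range(num_processors)}

def run_processor(piece_ids):
    # Each processor runs Read -> Process -> Render on every piece it owns.
    return [render(process(read(pid))) for pid in piece_ids]

assignment = assign_pieces(num_pieces=10, num_processors=3)
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_processor, assignment.values()))
```

Every piece is read in full before anything else happens, which is why the cost of this approach is tied to the number of bytes on disk.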
![Page 4: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/4.jpg)
Pure parallelism performance is based on # bytes to process and I/O rates.
Vis is almost always >50% I/O and sometimes 98% I/O
Amount of data to visualize is typically O(total mem)
[Figure: FLOPs, memory, and I/O compared for a terascale machine and a “petascale machine”.]
Two big factors:
① how much data you have to read
② how fast you can read it
Relative I/O (ratio of total memory and I/O) is key
![Page 5: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/5.jpg)
Anecdotal evidence: relative I/O really is getting slower.

| Machine name | Main memory | I/O rate | Time to write memory to disk |
| --- | --- | --- | --- |
| ASC purple | 49.0TB | 140GB/s | 5.8min |
| BGL-init | 32.0TB | 24GB/s | 22.2min |
| BGL-cur | 69.0TB | 30GB/s | 38.3min |
| Petascale machine | ?? | ?? | >40min |
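The "time to write memory to disk" figures follow from dividing main memory by the I/O rate. A small sketch of that arithmetic, assuming 1 TB = 1000 GB (the function name is illustrative, not from the talk):

```python
def minutes_to_write_memory(memory_tb, io_rate_gb_per_s):
    # Assumes 1 TB = 1000 GB, which reproduces the slide's numbers.
    seconds = memory_tb * 1000.0 / io_rate_gb_per_s
    return seconds / 60.0

# (main memory in TB, I/O rate in GB/s), taken from the slide.
machines = {
    "ASC purple": (49.0, 140.0),
    "BGL-init": (32.0, 24.0),
    "BGL-cur": (69.0, 30.0),
}

times = {name: round(minutes_to_write_memory(mem, rate), 1)
         for name, (mem, rate) in machines.items()}
```

Memory grows faster than I/O bandwidth from one machine to the next, so the ratio, and with it the time, keeps climbing.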
![Page 6: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/6.jpg)
Why is relative I/O getting slower?
• “I/O doesn’t pay the bills”
—And I/O is becoming a dominant cost in the overall supercomputer procurement.
• Simulation codes aren’t as exposed.
—And will be less exposed with proposed future architectures.
![Page 7: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/7.jpg)
Recent runs of trillion cell data sets provide further evidence that I/O dominates
● Weak scaling study: ~62.5M cells/core

| #cores | Problem Size | Type | Machine |
| --- | --- | --- | --- |
| 8K | 0.5TZ | AIX | Purple |
| 16K | 1TZ | Sun Linux | Ranger |
| 16K | 1TZ | Linux | Juno |
| 32K | 2TZ | Cray XT5 | JaguarPF |
| 64K | 4TZ | BG/P | Dawn |
| 16K, 32K | 1TZ, 2TZ | Cray XT4 | Franklin |

[Figures: 2T cells, 32K procs on Jaguar; 2T cells, 32K procs on Franklin]
- Approx I/O time: 2-5 minutes
- Approx processing time: 10 seconds
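A quick check of what these numbers imply, as a sketch: it assumes "32K" means 32,000 cores and that total time is simply I/O time plus processing time (both assumptions, not statements from the talk).

```python
def cells_per_core(total_cells, cores):
    # Weak scaling: the work per core stays fixed as the problem grows.
    return total_cells / cores

def io_fraction(io_seconds, processing_seconds):
    # Share of total time spent in I/O, assuming the two phases do not overlap.
    return io_seconds / (io_seconds + processing_seconds)

per_core = cells_per_core(2e12, 32000)      # 2T cells on 32K cores
low = round(io_fraction(2 * 60, 10), 3)     # 2 minutes of I/O, 10 s processing
high = round(io_fraction(5 * 60, 10), 3)    # 5 minutes of I/O, 10 s processing
```

Under these assumptions `per_core` is 62.5 million cells, and I/O accounts for roughly 92% to 97% of the run.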
![Page 8: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/8.jpg)
Summary: what are the challenges?
• Scale
—We can’t read all of the data at full resolution any more…
  • Q: What can we do?
  • A: We need algorithmic change.
• Insight
—How are we going to understand it?
  • (There is a lot more data than pixels!)
![Page 9: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/9.jpg)
Multi-resolution techniques use coarse representations then refine.
[Figure: the same data flow diagram as for pure parallelism (pieces of data on disk, Read, Process, Render on Processors 0 to 2, parallel simulation code), with pieces P2 and P4 called out.]
![Page 10: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/10.jpg)
Multi-resolution: pros and cons
• Pros
—Drastically reduce I/O & memory requirements
• Cons
—Is it meaningful to process simplified version of the data?
—How do we generate hierarchical representations? What costs do they incur?
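To make the trade-off concrete, here is a minimal sketch of one possible hierarchy: a 1D field coarsened by averaging neighbouring pairs. The averaging scheme and the function names are assumptions chosen for illustration; real multi-resolution schemes for meshes are more involved.

```python
def coarsen(values):
    # Halve the resolution by averaging neighbouring pairs (even length assumed).
    return [(values[i] + values[i + 1]) / 2.0 for i in range(0, len(values), 2)]

def build_hierarchy(values):
    # Level 0 is full resolution; each later level is half the previous size.
    levels = [values]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels

full = [float(i) for i in range(16)]
levels = build_hierarchy(full)

# A viewer would start from the coarsest level and refine only where needed.
coarsest_first = list(reversed(levels))

# Fraction of the full data touched when only level 2 is read.
fraction_read = len(levels[2]) / len(full)
```

Reading level 2 touches a quarter of the values, which is the "pro"; the "cons" show up as the averaged values no longer being the original data and as the extra levels that have to be generated and stored.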
![Page 11: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/11.jpg)
In situ processing does visualization as part of the simulation.
[Figure: the pure parallelism diagram again: pieces of data (P0 to P9) on disk, Read, Process, Render on Processors 0 to 2, and the parallel simulation code.]
![Page 12: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/12.jpg)
In situ processing does visualization as part of the simulation.
[Figure: the parallelized visualization data flow network runs inside the parallel simulation code; each processor (0, 1, 2, …, 9) runs GetAccessToData, Process, Render on its own piece (P0 to P9).]
![Page 13: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/13.jpg)
In situ: pros and cons
• Pros:
—No I/O!
—Lots of compute power available
• Cons:
—Very memory constrained
—Many operations not possible
  • Once the simulation has advanced, you cannot go back and analyze it
—User must know what to look for a priori
  • Expensive resource to hold hostage!
![Page 14: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/14.jpg)
Now we know the tools … what problem are we trying to solve?
• Three primary use cases:
—Exploration
  Examples: Scientific discovery, Debugging
—Confirmation
  Examples: Data analysis, Images / movies, Comparison
—Communication
  Examples: Data analysis, Images / movies
Multi-res
In situ
Will still need pure parallelism and possibly other techniques such as data subsetting and streaming.
![Page 15: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/15.jpg)
Prepare for difficult conversations in the future.
• Multi-resolution:
—Do you understand what a multi-resolution hierarchy should look like for your data?
—Who do you trust to generate it?
—Are you comfortable with your I/O routines generating these hierarchies while they write?
—How much overhead are you willing to tolerate on your dumps? 33+%?
—Willing to accept that your visualizations are not the “real” data?
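The slide does not say where the 33% figure comes from. One way an overhead of about that size can arise, offered here only as an assumption, is a hierarchy in which each coarser level holds one quarter of the cells of the level below it (as when halving resolution in two dimensions). The extra storage relative to the full-resolution data of size $N$ is then a geometric series:

```latex
\frac{1}{N}\sum_{k=1}^{\infty} \frac{N}{4^{k}}
  = \frac{1/4}{1 - 1/4}
  = \frac{1}{3} \approx 33\%
```

With a ratio of one eighth per level (halving resolution in three dimensions) the same series gives $1/7 \approx 14\%$, so the actual overhead depends on how the hierarchy is built.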
![Page 16: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/16.jpg)
Prepare for difficult conversations in the future.
• In situ:
—How much memory are you willing to give up for visualization?
—Will you be angry if the vis algorithms crash?
—Do you know what you want to generate a priori?
  • Can you re-run simulations if necessary?
![Page 17: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/17.jpg)
Visualization on BlueWaters: Two Scenarios
1) Pure parallelism continues (no SW change):
—Visualization and analysis will be done using large portions of BW, users charging against science allocations
  • Lots of time spent doing I/O
  • This increases overall I/O contention on BW
—Vis & analysis is slow, so people do less (so insights will be lost (?))
2) Smart techniques deployed:
—Allocations are used for simulation, not vis & analysis
—Less artificial I/O contention introduced
—Ability to explore / interact with data
![Page 18: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/18.jpg)
VisIt is a richly featured, turnkey application
• VisIt is an open source, end user visualization and analysis tool for simulated and experimental data
– Used by: physicists, engineers, code developers, vis experts
– >100K downloads on web
• R&D 100 award in 2005
• Used “heavily to exclusively” on 8 of world’s top 12 supercomputers
[Figure: 217 pin reactor cooling simulation. Run on ¼ of Argonne BG/P. 1 billion grid points]
![Page 19: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/19.jpg)
Terribly Named!! Intended for more than just visualization!
• Data Exploration
• Visual Debugging
• Quantitative Analysis
• Presentations
• Comparative Analysis
![Page 20: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/20.jpg)
VisIt has a rich feature set that can impact many science areas.
• Meshes: rectilinear, curvilinear, unstructured, point, AMR
• Data: scalar, vector, tensor, material, species
• Dimension: 1D, 2D, 3D, time varying
• Rendering (~15): pseudocolor, volume rendering, hedgehogs, glyphs, mesh lines, etc…
• Data manipulation (~40): slicing, contouring, clipping, thresholding, restrict to box, reflect, project, revolve, …
• File formats (~85)
• Derived quantities: >100 interoperable building blocks
  +,-,*,/, gradient, mesh quality, if-then-else, and, or, not
• Many general features: position lights, make movie, etc
• Queries (~50): ways to pull out quantitative information, debugging, comparative analysis
![Page 21: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/21.jpg)
VisIt employs a parallelized client-server architecture.
• Client-server observations:
—Good for remote visualization
—Leverages available resources
—Scales well
—No need to move data
Additional design considerations:
• Plugins
• Multiple UIs: GUI (Qt), CLI (Python), more…
[Figure: a remote machine with parallel vis resources and user data, connected to localhost (Linux, Windows, Mac) with graphics hardware.]
![Page 22: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/22.jpg)
The VisIt team focuses on making a robust, usable product for end users.
• Manuals
—300 page user manual
—200 page command line interface manual
—“Getting your data into VisIt” manual
• Wiki for users (and developers)
• Revision control, nightly regression testing, etc
• Executables for all major platforms
• Day long class, complete with exercises
[Figure: slides from the VisIt class]
![Page 23: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/23.jpg)
VisIt is a vibrant project with many participants.
• Over 50 person-years of effort
• Over one million lines of code
• Partnership between: Department of Energy’s Office of Nuclear Energy, Office of Science, and National Nuclear Security Administration, among others
—NSF XD centers both expected to make large contributions.
Timeline:
• 2000: Project started
• 2003: LLNL user community transitioned to VisIt
• 2004-6: User community grows, including AWE & ASC Alliance schools
• 2005: 2005 R&D100
• Fall ‘06: VACET is funded
• 2007: SciDAC Outreach Center enables public SW repo
• 2007: Saudi Aramco funds LLNL to support VisIt
• Spring ‘07: GNEP funds LLNL to support GNEP codes at Argonne
• Summer ‘07: Developers from LLNL, LBL, & ORNL start dev in repo
• ‘07-’08: UC Davis & UUtah research done in VisIt repo
• ‘07-’08: Partnership with CEA is developed
• 2008: Institutional support leverages effort from many labs
• Spring ‘08: AWE enters repo
• Spring ‘09: More developers entering repo all the time
![Page 24: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/24.jpg)
VisIt: What’s the Big Deal?
• Everything works at scale
• Robust, usable tool
• Vis to code development to scientific insight
![Page 25: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/25.jpg)
VisIt and the smart data techniques
• Full pure parallelism implementation
• Data subsetting well integrated
—(if you set up your data properly)
• In situ: yes, but … it’s a memory hog
• Multi-res: emerging effort in this space
![Page 26: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/26.jpg)
Three Ways To Get Data Into VisIt
• (1) Write to a known output format
• (2) Write a plugin file format reader
• (3) Integrate VisIt “in situ”
—“lib-VisIt” is linked into simulation code
  • (Note: Memory footprint issues with implementation!)
—Use model:
  • simulation code advances
  • at some time interval (e.g. end of cycle), hands control to lib-VisIt.
  • lib-VisIt performs vis & analysis tasks, then hands control back to simulation code
  • repeat
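The in situ use model above amounts to a simple control loop. The sketch below shows only the shape of that loop: `Simulation`, `in_situ_vis`, and `run` are hypothetical names, and nothing here is the lib-VisIt interface.

```python
class Simulation:
    # Toy simulation: a single field that is updated every cycle.
    def __init__(self, size):
        self.cycle = 0
        self.field = [0.0] * size

    def advance(self):
        self.cycle += 1
        self.field = [v + 1.0 for v in self.field]

def in_situ_vis(sim):
    # Hypothetical stand-in for the vis & analysis step. It works on the
    # simulation's arrays in memory, so no files are read from disk.
    return {"cycle": sim.cycle, "max": max(sim.field)}

def run(sim, num_cycles, vis_interval):
    outputs = []
    for _ in range(num_cycles):
        sim.advance()                      # simulation code advances
        if sim.cycle % vis_interval == 0:  # at some interval, hand over control
            outputs.append(in_situ_vis(sim))
        # control is back with the simulation; repeat
    return outputs

outputs = run(Simulation(size=8), num_cycles=10, vis_interval=5)
```

The loop also shows the constraint raised earlier in the talk: once `advance` has overwritten the field, earlier cycles can no longer be analyzed unless something was saved.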
![Page 27: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability](https://reader036.vdocuments.us/reader036/viewer/2022062515/56649cd75503460f949a0037/html5/thumbnails/27.jpg)
Summary
• Massive data will force algorithmic change in visualization tools.
• The VisIt project is making progress on bringing these algorithmic changes to the user community.
• Contact info:
—Hank Childs, LBL & UC Davis
—[email protected] / [email protected]
—http://vis.lbl.gov/~hrchilds
Questions??