efficient visualization and analysis of very large climate data

16
Efficient Visualization and Analysis of Very Large Climate Data Hank Childs, Lawrence Berkeley National Laboratory December 8, 2011 Lawrence Livermore National Laboratory

Upload: tarak

Post on 08-Jan-2016

23 views

Category:

Documents


0 download

DESCRIPTION

Efficient Visualization and Analysis of Very Large Climate Data. Lawrence Livermore National Laboratory. Hank Childs, Lawrence Berkeley National Laboratory December 8, 2011. Problem Statement. Climate data is getting increasingly large (massively data!), both spatially and temporally - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Efficient Visualization and Analysis of Very Large Climate Data

Efficient Visualization and Analysis of Very Large Climate Data

Hank Childs, Lawrence Berkeley National Laboratory

December 8, 2011

Lawrence Livermore National Laboratory

Page 2: Efficient Visualization and Analysis of Very Large Climate Data

Problem Statement

• Climate data is getting increasingly large (massively data!), both spatially and temporally

• Climate scientists want to process this data in lots of ways:– Analyze simulation recently run on their supercomputer– Download remote data to nearby supercomputer for

analysis– Download remote data to desktop computer for analysis– Send analysis routines to remote machine location,

results are returned.

Page 3: Efficient Visualization and Analysis of Very Large Climate Data

VisIt is an open source, richly featured, turn-key application for large data.

• Used by:– Visualization experts– Simulation code developers– Simulation code consumers

• Popular– R&D 100 award in 2005– Used on many of the Top500– >>>100K downloads

• Developed by:– NNSA, SciDAC, NEAMS, NSF,

and more217 pin reactor cooling simulation

Run on ¼ of Argonne BG/P Image credit: Paul Fischer, ANL

1 billion grid points / time slice

Page 4: Efficient Visualization and Analysis of Very Large Climate Data

VisIt employs a parallelized client-server architecture.

• Client-server observations:– Good for remote visualization– Leverages available resources– Scales well– No need to move data

remote machine

Parallel vis resources

Userdata

localhost – Linux, Windows, Mac

Graphics Hardware

Page 5: Efficient Visualization and Analysis of Very Large Climate Data

VisIt recently demonstrated good performance at unprecedented scale.

● Weak scaling study: ~62.5M cells/core

5

#coresProblem Size

ModelMachine

8K0.5TIBM P5Purple

16K1TSunRanger

16K1TX86_64Juno

32K2TCray XT5JaguarPF

64K4TBG/PDawn

16K, 32K1T, 2TCray XT4Franklin

Two trillion cell data set, rendered in VisIt by David Pugmire on ORNL

Jaguar machine

VisIt’s data processing techniques are more than scalability at massive concurrency;

we are leveraging a suite of techniques developed over the last decade by VACET,

the NNSA, and more.

VisIt’s data processing techniques are more than scalability at massive concurrency;

we are leveraging a suite of techniques developed over the last decade by VACET,

the NNSA, and more.

Page 6: Efficient Visualization and Analysis of Very Large Climate Data

VisIt is used to look at simulated and experimental data from many areas.

Fusion, Sanderson, UUtah

Particle accelerators, Ruebel, LBNL

Astrophysics, Childs

Nuclear Reactors, Childs

Page 7: Efficient Visualization and Analysis of Very Large Climate Data

Problem Statement

• Climate data is getting increasingly massive, both spatially and temporally

• Climate scientists want to process this data in lots of ways:– Analyze simulation recently run on their supercomputer– Download remote data to nearby supercomputer for

analysis– Download remote data to desktop computer for analysis– Send analysis routines to remote machine location,

results are returned.

Mission accomplished?Mission accomplished?

Page 8: Efficient Visualization and Analysis of Very Large Climate Data

General-purpose tools vs application-specific tools

• Are made specifically to solve your problem:– Streamlined user

interface– Application-specific

analysis• But, they have smaller user

and developer communities, so they are:– Less robust– Less efficient algorithms– Smaller set of features

• Are developed by large teams, leading to:– Robustness– Efficient algorithms– Rich set of features

General-purpose tools: Application-specific tools:

• But:– They aren’t streamlined

for a given application area (lots of button clicks)

– They don’t have application-specific methods

So which do users want?So which do users want?

Page 9: Efficient Visualization and Analysis of Very Large Climate Data

Amazing developments over the last decade…

• Very useful packages now available that make quick tool development possible:– Python, R, VTK, Qt

• But this doesn’t solve the large data issue. – … but tools now are available that do that as well:

• VisIt & ParaView• Great idea: put all these products together into one tool (VisTrails).• Users get:

– The robustness, richness, and efficiency of large development efforts

– Streamlined user interface & climate-specific analysis (via CDAT)

This tool is UV-CDAT.This tool is UV-CDAT.

Page 10: Efficient Visualization and Analysis of Very Large Climate Data

UV-CDAT

• UV-CDAT = Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT)

• Goal: robust tool, capable of doing powerful analysis on large climate data sets.

• Collaboration between two DOE BER teams• One tool that incorporates many packages…

– … including “VisIt”

UV-CDAT Developers:Dean Williams (PI)Jim Ahrens, Andy Bauer, Dave Bader, Berk Geveci, Timo Bremer, Claudio Silva, David Partyka, Charles Doutriaux, Robert Drach, Emanuele Marques, Feiyi Wang, Jerry Potter, Galen Shipman, John Patchett, Phil Jones, Thomas Maxwell, and many more

Page 11: Efficient Visualization and Analysis of Very Large Climate Data

Integrated UV-CDAT GUI: Project, Plot, and Variables View

Page 12: Efficient Visualization and Analysis of Very Large Climate Data

P0P1

P3

P2

P8P7 P6

P5

P4

P9

Pieces of data

(on disk)

Read Process Render

Processor 0

Read Process Render

Processor 1

Read Process Render

Processor 2

Parallelized visualizationdata flow network

P0P0 P3P3P2P2

P5P5P4P4 P7P7P6P6

P9P9P8P8

P1P1

Parallel Simulation Code

Big data visualization tools use “pure parallelism” to process data.

This is a good approach for high resolution meshes

… but climate data is often different

This is a good approach for high resolution meshes

… but climate data is often different

Page 13: Efficient Visualization and Analysis of Very Large Climate Data

Parallelizing Processing Over Time Slices: Improving Performance For Climate Data

Objective

Progress & ResultsImpact

VisIt’s parallel processing techniques were designed for single time slices of very high resolution meshes. We must adapt this approach for the lower resolution and high temporal frequency characteristic of climate data.

• We have modified VisIt’s underlying infrastructure to have a “parallelize over time” processing mode.

• We implemented a simple algorithm (“maximum value over time”) as a proof of concept

Climate scientists with access to parallel resources for their data will be able to process their data significantly faster through parallelization. This software investment will enable other algorithms developed by the project to also be accelerated through parallelization.

P0P0 P1P1 P2P2

VisIt parallelizing over a high resolution spatial

mesh

VisIt parallelizing temporally over a low

resolution mesh

P0P0 P1P1 P2P2

T=0T=0 T=1T=1 T=2T=2

T=3T=3 T=4T=4 T=5T=5

T=6T=6 T=7T=7 T=8T=8

Concurrency Average Time

Speedup

1 480.0s -

2 223.0s 2.1X

4 120.0s 4.0X

8 62.0s 7.8X

16 29.0s 16.5X

32 15.5s 31.0X

64 7.8s 61.5X

128 4.2s 114X

Scaling on 2130 time slices of NetCDF climate data (source: Wehner)

Scaling on 2130 time slices of NetCDF climate data (source: Wehner)

This is just one aspect of the uniqueness of climate data … many more.

This is just one aspect of the uniqueness of climate data … many more.

Page 14: Efficient Visualization and Analysis of Very Large Climate Data

Success story: spatial extreme value analysis

• We were able to parallelize the spatial extreme value analysis:– Parallelization was both spatial and temporal– Parallelization was done with VisIt infrastructure, but

analysis was done with R (via VisIt-R linkage)• Ran on CCSM3.0, 100 year, daily precipitation data• Near perfect scaling on 36,500 time slices

Page 15: Efficient Visualization and Analysis of Very Large Climate Data

Summary

• Visualization and analysis of massive data is possible– Other communities have built tools for this purpose– These tools are effective for:

• Local or remote data• Modest to massive parallelization (& serial too!)• Interactive or batch processing

• Our effort focuses on deploying VisIt and R as part of UV-CDAT.– VisIt is excellent with large data and has been adapted to

work with climate data.– We also are collaborating on cutting edge analysis

techniques with climate scientists

Page 16: Efficient Visualization and Analysis of Very Large Climate Data

Example climate visualizations with VisIt