IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

High-Throughput Scientific Computing, Hanspeter Pfister

Upload: npinto

Post on 29-Nov-2014

DESCRIPTION

See http://sites.google.com/site/cudaiap2009 and http://pinto.scripts.mit.edu/Classes/CUDAIAP2009

TRANSCRIPT

Page 1: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

High-Throughput Scientific Computing

Hanspeter Pfister

Page 2: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Themes

• How is the brain wired?

• How did the Universe start?

Page 3: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

How is the brain wired?

The Connectome Project

Page 4: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Connectome Team

• Harvard Center for Brain Science

– Jeff Lichtman & Clay Reid

• Microsoft Research / UW

– Michael Cohen

• Kitware Inc.

– Will Schroeder, Charles Law, Rusty Blue

• VRVis Vienna

– Markus Hadwiger, Johanna Beyer

• IIC

– Amelio Vazquez, Eric Miller (Tufts)

– Won-Ki Seung, Hanspeter Pfister

Page 5: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

The Scientific Challenge

composite from Roe et al. 1989, Sutton and Brunso-Bechtold 1991

Page 6: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Confocal Microscopy: Brainbow

Adapted from OlympusConfocal.com

Page 7: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Electron Microscopy: ATLUM

Page 8: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Serial Sectioning

Section i, i ∈ (1, …, N)

Adapted from http://parasol.tamu.edu Texas A&M University

[Figure axes: x, y, z]

Page 9: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

40,000 x 40,000 pixels, 1.6 GB

120 x 120 µm (3 nm/pixel)

Here shown 40x undersampled

[Image: EM big view, 15 µm]
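The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes one byte per pixel (8-bit grayscale), which the slide does not state; the function names are illustrative only.

```python
# Figures from the slide: a 40,000 x 40,000 pixel section covering 120 x 120 µm.
WIDTH_PX = 40_000
HEIGHT_PX = 40_000
FIELD_UM = 120.0
BYTES_PER_PIXEL = 1  # assumption: 8-bit grayscale, not stated on the slide


def section_bytes(width_px, height_px, bytes_per_pixel=BYTES_PER_PIXEL):
    """Uncompressed size of one section image in bytes."""
    return width_px * height_px * bytes_per_pixel


def nm_per_pixel(field_um, width_px):
    """Pixel pitch in nanometers for a field of view given in micrometers."""
    return field_um * 1000.0 / width_px


size_gb = section_bytes(WIDTH_PX, HEIGHT_PX) / 1e9
print(f"{size_gb:.1f} GB per section, {nm_per_pixel(FIELD_UM, WIDTH_PX):.0f} nm/pixel")
print(f"40x undersampled width: {WIDTH_PX // 40} pixels")
```

Under that assumption the result agrees with the 1.6 GB and 3 nm/pixel quoted on the slide.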

Page 10: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

[Image: EM view, 8 µm]

Page 11: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

[Image: EM view, 3 µm]

Page 12: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

[Image: EM view, 1 µm]

Page 13: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

[Image: EM view, 300 nm]

Page 14: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

The Data Challenge

• 1 mm³ ≈ mouse thalamus ≈ 1 petabyte

• 1 cm³ ≈ mouse brain ≈ 1 exabyte

• 1000 cm³ ≈ human brain ≈ 1 zettabyte

All of Google’s world-wide storage today ~= 1 exabyte
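The three estimates above follow from a single density figure, roughly one petabyte per cubic millimeter, scaled by volume. A minimal sketch of that scaling, using decimal (SI) units and illustrative helper names that are not part of the talk:

```python
PETABYTE = 10**15
BYTES_PER_MM3 = 1 * PETABYTE  # slide estimate: 1 mm^3 of tissue is about 1 petabyte

UNITS = [("zettabyte", 10**21), ("exabyte", 10**18), ("petabyte", 10**15)]


def volume_bytes(volume_mm3):
    """Estimated raw data size for a tissue volume given in cubic millimeters."""
    return volume_mm3 * BYTES_PER_MM3


def human(nbytes):
    """Format a byte count using the largest unit that fits."""
    for name, size in UNITS:
        if nbytes >= size:
            return f"{nbytes / size:g} {name}"
    return f"{nbytes} bytes"


# 1 cm^3 = 10^3 mm^3 and 1000 cm^3 = 10^6 mm^3
for label, mm3 in [("mouse thalamus", 1), ("mouse brain", 10**3), ("human brain", 10**6)]:
    print(f"{label}: {human(volume_bytes(mm3))}")
```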

Page 15: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Addressing the Data Challenge

• Multi-Scale Imaging

• Hierarchical Data Representation

• Distributed Heterogeneous Computing

• Visualization

• Segmentation

• Analysis

Page 16: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Addressing the Data Challenge

• Multi-Scale Imaging

• Hierarchical Data Representation

• Distributed Heterogeneous Computing

• Visualization

• Segmentation

• Analysis

Page 17: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

MARKUS HADWIGER, VRVIS RESEARCH CENTER, VIENNA, AUSTRIA

Direct Volume Rendering

Page 18: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Ray Casting

• Image-order ray shooting

• Interpolate

• Assign color & opacity

• Composite

• Simple to implement

• Very flexible (adaptive sampling, …)

• Correct perspective
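The three per-sample steps (interpolate, assign color and opacity, composite) can be sketched for a single ray. This is a CPU-side illustration in Python rather than a GPU kernel; the 1D density profile, the `classify` mapping, and the step size are all assumptions made for the example.

```python
def lerp_sample(densities, t):
    """Interpolate: linearly sample a 1D density profile at fractional index t."""
    i = int(t)
    if i >= len(densities) - 1:
        return densities[-1]
    frac = t - i
    return densities[i] * (1.0 - frac) + densities[i + 1] * frac


def classify(density):
    """Assign color and opacity (hypothetical mapping: gray = density, opacity = density / 2)."""
    return density, 0.5 * density


def cast_ray(densities, step=0.5):
    """Composite: front-to-back accumulation of color and opacity along one ray."""
    color, alpha = 0.0, 0.0
    t = 0.0
    while t <= len(densities) - 1:
        c, a = classify(lerp_sample(densities, t))
        color += (1.0 - alpha) * a * c  # contribution is attenuated by what is in front
        alpha += (1.0 - alpha) * a
        t += step
    return color, alpha


print(cast_ray([0.0, 0.2, 0.9, 0.4, 0.0]))
```

An image-order renderer would run this loop once per pixel, which is why the method maps naturally onto one GPU thread per ray.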

Page 19: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Transfer Functions

• Mapping of density to optical properties

• Simplest: color table with opacity over density
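The simplest case, a color table with opacity indexed by density, might look like the sketch below. The four table entries are made up for illustration, and linear interpolation between entries is one common choice, not necessarily the one used in the talk.

```python
# Hypothetical table: (red, green, blue, opacity) at evenly spaced densities in [0, 1]
TABLE = [
    (0.0, 0.0, 0.0, 0.0),  # lowest density: fully transparent
    (0.0, 0.0, 1.0, 0.1),
    (1.0, 1.0, 0.0, 0.5),
    (1.0, 1.0, 1.0, 1.0),  # highest density: opaque white
]


def transfer(density, table=TABLE):
    """Map a density in [0, 1] to RGBA by interpolating between table entries."""
    d = min(max(density, 0.0), 1.0)        # clamp out-of-range densities
    pos = d * (len(table) - 1)
    i = min(int(pos), len(table) - 2)      # index of the lower table entry
    frac = pos - i
    lo, hi = table[i], table[i + 1]
    return tuple(a + (b - a) * frac for a, b in zip(lo, hi))


print(transfer(0.5))
```

On a GPU the same lookup is typically a fetch from a small 1D texture, with the hardware doing the interpolation.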

Page 20: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Connectome: EM Data

Page 21: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Single-Pass Ray Casting

• Enabled by conditional loops

• Substitute multiple passes with single loop and early loop exit

• Volume rendering example in NVIDIA CUDA SDK (procedural ray setup)

Page 22: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Basic Ray Setup / Termination

• Two main approaches:

– Procedural ray/box intersection [Röttger et al., 2003], [Green, 2004]

– Rasterize bounding box [Krüger and Westermann, 2003]

Page 23: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Procedural Ray Setup / Term.

• Procedural ray / box intersection

• Everything handled in fragment shader

• Ray given by camera position and volume entry position

• Exit criterion needed

• Pro: simple and self-contained

• Con: full load on fragment shader
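One common way to compute a procedural ray/box intersection is the slab method, sketched below for an axis-aligned box. The slides do not say which formulation the cited work uses, so treat this as a generic illustration; the unit-cube default and the function name are assumptions.

```python
def intersect_box(origin, direction, box_min=(0.0, 0.0, 0.0), box_max=(1.0, 1.0, 1.0)):
    """Return (t_entry, t_exit) along the ray, or None if the ray misses the box.

    t_entry is clamped to 0 so rays that start inside the box begin at the camera.
    """
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must already lie between them
            if o < lo or o > hi:
                return None
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far


print(intersect_box((0.5, 0.5, -1.0), (0.0, 0.0, 1.0)))
```

The exit parameter doubles as the exit criterion mentioned above: the ray-cast loop stops once the distance traveled exceeds `t_exit`.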

Page 24: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

"Image-Based" Ray Setup / Term.

• Rasterize bounding box front faces and back faces

• Ray start positions: front faces

• Direction vectors: back faces − front faces

• Independent of projection (orthogonal/perspective)
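The per-pixel arithmetic of image-based setup is small enough to sketch directly. Here the two rasterized images are stood in for by nested Python lists of volume-space positions, with `None` marking pixels the bounding box does not cover; that representation and the function name are assumptions for the example.

```python
import math


def ray_setup(front_image, back_image):
    """Per pixel, return (start, unit_direction, length) or None for uncovered pixels."""
    rays = []
    for front_row, back_row in zip(front_image, back_image):
        row = []
        for front, back in zip(front_row, back_row):
            if front is None or back is None:
                row.append(None)  # bounding box not rasterized at this pixel
                continue
            delta = tuple(b - f for f, b in zip(front, back))  # back faces minus front faces
            length = math.sqrt(sum(x * x for x in delta))
            if length == 0.0:
                row.append(None)  # degenerate ray: entry and exit coincide
                continue
            row.append((front, tuple(x / length for x in delta), length))
        rays.append(row)
    return rays


front = [[(0.0, 0.0, 0.0), None]]
back = [[(0.0, 0.0, 2.0), None]]
print(ray_setup(front, back))
```

Because start and direction come from interpolated positions, nothing in this step depends on whether the projection is orthogonal or perspective.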

Page 25: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Kernel

• Image-based ray setup

– Ray start image

– Direction image

• Ray-cast loop

– Sample volume

– Accumulate color and opacity

• Terminate

• Store output

Page 26: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Standard Ray Casting Optim. (1)

Early ray termination

• Isosurfaces: stop when surface hit

• Direct volume rendering: stop when opacity >= threshold

• Several possibilities

• Current GPUs: early loop exit works well
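For direct volume rendering, early ray termination amounts to a single `break` in the compositing loop. The sketch below takes samples that are already classified into (color, opacity) pairs, ordered front to back; the 0.95 threshold is an arbitrary example value.

```python
def cast_ray_early_exit(samples, threshold=0.95):
    """Front-to-back compositing that stops once accumulated opacity reaches the threshold.

    Returns (color, opacity, number_of_samples_used).
    """
    color, alpha = 0.0, 0.0
    used = 0
    for c, a in samples:
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        used += 1
        if alpha >= threshold:
            break  # early loop exit: remaining samples are nearly invisible
    return color, alpha, used


# Ten half-opaque samples: the loop stops well before the end of the ray
print(cast_ray_early_exit([(1.0, 0.5)] * 10))
```

How much this saves depends on the data and the transfer function: fully transparent rays still visit every sample.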

Page 27: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Standard Ray Casting Optim. (2)

Empty space skipping

• Skip transparent samples

• Depends on transfer function

• Start casting close to first hit

• Several possibilities

– Per-sample check of opacity (expensive)

– Hierarchical data store (e.g., octree with stack-less traversal [Gobbetti et al., 2008])

• These are image-order: what about object-order?

Page 28: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Object-Order Empty Space Skip. (1)

• Modify initial rasterization step

[Figure panels: rasterize bounding box; rasterize “tight” bounding geometry]

Page 29: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Object-Order Empty Space Skip. (2)

• Store min-max values of volume blocks

• Cull blocks against transfer function or isovalue

• Rasterize front and back faces of active blocks
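The first two steps, building per-block min-max values and culling against an isovalue or a transfer function, can be sketched as follows. To keep the example short the volume is a flat list of integer densities split into 1D blocks; a real implementation would use 3D bricks. All names are illustrative.

```python
def block_min_max(volume, block_size):
    """Split a flat list of integer densities into blocks and record (min, max) per block."""
    return [
        (min(volume[i:i + block_size]), max(volume[i:i + block_size]))
        for i in range(0, len(volume), block_size)
    ]


def active_for_isovalue(min_max, isovalue):
    """A block can contain the isosurface only if the isovalue lies within its range."""
    return [lo <= isovalue <= hi for lo, hi in min_max]


def active_for_transfer_function(min_max, opacity_table):
    """A block is active if any density in its range maps to non-zero opacity."""
    return [
        any(opacity_table[d] > 0.0 for d in range(lo, hi + 1))
        for lo, hi in min_max
    ]


volume = [0, 0, 0, 0, 10, 200, 30, 0, 255, 255, 250, 251]
ranges = block_min_max(volume, 4)
opacity = [0.0] * 240 + [1.0] * 16  # hypothetical table: only densities >= 240 are visible
print(ranges)
print(active_for_isovalue(ranges, 100))
print(active_for_transfer_function(ranges, opacity))
```

Only the blocks flagged active would then have their front and back faces rasterized, so rays start and end close to visible data.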

Page 30: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Connectome: Fluorescence Data

Page 31: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Connectome: Implicit Surfaces

Page 32: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Addressing the Data Challenge

• Multi-Scale Imaging

• Hierarchical Data Representation

• Distributed Heterogeneous Computing

• Visualization

• Segmentation

• Analysis

Page 33: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Active Ribbons

Active Ribbon: A set of two non-intersecting and coupled Active Contours

Active Contour: Deformable closed curve that can be used to segment objects in an image

Inner Active Contour

Outer Active Contour

Active Ribbon

Page 34: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Results (Matlab)

Page 35: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Axon Segmentation

Page 36: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Interactive Analysis

Page 37: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

How did the Universe start?

The MWA Project

Kevin Dale, Richard Edgar, Daniel Mitchell, Randall Wayth, Lincoln Greenhill, and Hanspeter Pfister

Page 38: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

MWA CfA / IIC Team

• Harvard Center for Astrophysics / Smithsonian Astrophysical Observatory

– Lincoln Greenhill

– Daniel Mitchell

– Randall Wayth

– Stephen Ord

• IIC / SEAS

– Richard Edgar

– Kevin Dale, Hanspeter Pfister

Page 39: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

The Scientific Goals

• Epoch of Re-Ionisation (EOR)

• Heliospheric and Ionospheric

• Transient detection

• Pulsars, Surveys, Interstellar Medium, Galactic Magnetic Field, …

[Figure labels: ionized; neutral (H); ionized; The “Gap”]

Page 40: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

The Murchison Widefield Array (MWA)

• Located in the remote Australian outback

• Extremely wide fields of view for radio astronomy in the 80-300 MHz band

• 512 tiles, each a 4x4 array of dipoles, scattered over ~ 1.5 km

• Data center for real-time processing co-located with the array

http://www.haystack.mit.edu/ast/arrays/mwa/index.html

Page 41: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

© Murchison Wide-field Array Project (MIT/Harvard/Smithsonian/ANU/Curtin U./U.Melb./UWA/CSIRO)

Page 42: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

© Murchison Wide-field Array Project (MIT/Harvard/Smithsonian/ANU/Curtin U./U.Melb./UWA/CSIRO)

Page 43: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)
Page 44: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Ionospheric offsets

Ungridded visibilities with bright sources peeled

Imaging

Calibration

Page 45: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

The Data Rate Challenge

[Pipeline diagram. Stage labels: FFT, Averaging, Gridding, Vector Rotation, Mapping, Science. Data rates: 16 GB/s at 0.5 s cadence; (1) GB/s at 8 s cadence. Annotations: "v. parallel computation", "entangled Calibration Loop".]
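The two data rates on this slide are consistent with simple time averaging: going from a 0.5 s to an 8 s cadence combines 16 samples into one. The sketch below assumes that averaging reduces the data volume in direct proportion, which the slide does not spell out; the function names are illustrative.

```python
def averaged_rate(input_rate_gb_s, input_cadence_s, output_cadence_s):
    """Data rate after averaging consecutive time samples (assumes proportional reduction)."""
    factor = output_cadence_s / input_cadence_s
    return input_rate_gb_s / factor


def average_in_time(samples, factor):
    """Average each complete run of `factor` consecutive samples into one output sample."""
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, len(samples) - factor + 1, factor)
    ]


print(averaged_rate(16.0, 0.5, 8.0), "GB/s after averaging")
print(average_in_time([1.0, 3.0, 5.0, 7.0], 2))
```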

Page 46: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Implementation

• Hardware

– 2.4 GHz dual-core AMD Opteron, 4GB RAM

– NVIDIA Quadro FX 5600

• Software

– AMD Core Math Library (ACML)

– NVIDIA CUDA (CUBLAS, CUFFT)

– OpenGL

Page 47: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Single-GPU Speedup

[Bar chart: CPU to GPU speedup, axis from 0 to 70. Bars: RotateAndAccumulateVisibilities, MeasureIonosphericOffset, MeasureTileResponse, ReRotateVisibilities, PeelTileResponse, UnpeelTileResponse, Gridding *, Imaging. Group labels: Image Formation, Calibration Loop. Note: Mostly OpenGL.]

Page 48: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Example Results

[Image panels: GPU, Reference]

• Noisy images from test data

Page 49: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Scaling to a Cluster

• 1000 frequency channels, 65 sources every 8 seconds, and 1600² output image

• 20-40 frequencies / GPU

• 32-64 GPUs, i.e., 16 Tesla S1070s

• Need MPI for internal data transfer
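One plausible way to spread the frequency channels over the cluster is a contiguous block per MPI rank, with one rank per GPU. That mapping is an assumption made for this sketch, not something the slide specifies; the code is plain Python and only computes the ranges, it does not call MPI.

```python
N_CHANNELS = 1000
GPUS_PER_S1070 = 4  # a Tesla S1070 unit houses four GPUs
N_UNITS = 16


def channel_range(rank, n_ranks, n_channels=N_CHANNELS):
    """Half-open range [start, stop) of channels handled by one rank (block distribution)."""
    base, extra = divmod(n_channels, n_ranks)
    start = rank * base + min(rank, extra)        # the first `extra` ranks get one more channel
    count = base + (1 if rank < extra else 0)
    return start, start + count


print("GPUs available:", N_UNITS * GPUS_PER_S1070)
for rank in (0, 1, 31):
    print("rank", rank, "of 32 handles channels", channel_range(rank, 32))
```

With 32 ranks each GPU receives 31 or 32 channels, which falls inside the 20-40 frequencies per GPU quoted above.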

Page 50: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

Conclusions

• GPUs enable high-throughput scientific computing

• Performance gains of 10-100x

• CUDA makes life easier (but not perfect)

• Rasterization / OpenGL still useful

• Need CUDA MPI for clusters