TRANSCRIPT
Cardiac Blood Flow: Alpha Project Impact
Kathy Yelick
EECS Department
U.C. Berkeley
The 20+ Year Vision
• Imagine a “digital body double”
– 3D image-based medical record
– Includes diagnostic, pathologic, and other information
• Used for:
– Diagnosis
– Less invasive surgery-by-robot
– Experimental treatments
• Digital Human Effort
– Led by the Federation of American Scientists
Digital Human Today: Imaging
• The Visible Human Project
– 18,000 digitized sections of the body
• Male: 1mm sections, released in 1994
• Female: 0.33mm sections, released in 1995
– Goals
• study of human anatomy
• testing medical imaging algorithms
– Current applications:
• educational, diagnostic, treatment planning, virtual reality, artistic, mathematical, and industrial
• Used by > 1,400 licensees in 42 countries
Image Source: www.madsci.org
Digital Human Roadmap
Timeline, 1995 to 2010:
• 1 organ, 1 model → 1 organ, multiple models → multiple organs → organ system → digital human
• Enabling advances along the way: 3D model construction, scalable implementations, better algorithms, coupled models, faster computers, improved programmability
Building 3D Models from Images
Image source: John Sullivan et al, WPI
Image data from:
• visible human
• MRI
• laboratory experiments
Automatic construction of:
• surface mesh
• volume mesh
(John Sullivan et al, WPI)
Simulation: From Genome to Function (James B. Bassingthwaighte, UW)
Hierarchy (diagram): Genes → Molecule → Cell → Tissue → Organ → Organism → Health
The Physiome Project (http://www.physiome.org)
Structure to Function:
• Experiments, Databases
• Problem Formulation
• Engineering the Solutions
• Quantitative System Modeling
• Archiving & Dissemination
Organ Simulation
• Cardiac cells/muscles – SDSC, Auckland, UW, Utah, …
• Cardiac flow – NYU, …
• Lung transport – Vanderbilt
• Lung flow – ORNL
• Cochlea – Caltech, UM
• Kidney mesh generation – Dartmouth
• Electrocardiography – Johns Hopkins, …
• Skeletal mesh generation
• Brain – Ellisman
Just a few of the efforts at understanding and simulating parts of the human body
Immersed Boundaries within the Body
• Fluid flow within the body is one of the major challenges, e.g.,
– Blood through the heart
– Coagulation of platelets in clots
– Movement of bacteria
• A key problem is modeling an elastic structure immersed in a fluid
– Irregular moving boundaries
– Wide range of scales
– Vary by structure, connectivity, viscosity, external forces, internally generated forces, etc.
Heart Simulation
• Developed by Peskin and McQueen at NYU
– On a Cray C90: 1 heartbeat in 100 CPU hours
– 128³-point fluid mesh; a coarser mesh gives incorrect results
– Heart represented by muscle fibers
– Applications:
• Understanding structural abnormalities
• Evaluating artificial heart valves
• Eventually, artificial hearts
Source: www.psc.org
Heart Simulation
Source: www.psc.org
Animation of lower portion of the heart
Platelet Simulation
• Simulation of blood clotting
– Developed by Aaron Fogelson and Charlie Peskin
– Cells are represented by polygons
• Rules added to simulate adhesion (an illustrative sketch follows)
– Cells within some distance of each other are linked
– Sufficient force on the link can break it
• Artery walls represented as fibers
• 2D simulation
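The adhesion rule above can be made concrete with a small sketch. This is an illustrative model only, with hypothetical names and constants; it is not the Fogelson/Peskin code.

    // Illustrative sketch of the adhesion rules: cells within a cutoff
    // become linked; a link carrying too much force breaks. All names
    // and constants here are hypothetical.
    class AdhesionLink {
        static final double LINK_DISTANCE = 1.0; // cells closer than this get linked
        static final double STIFFNESS = 10.0;    // elastic constant of a link
        static final double BREAK_FORCE = 25.0;  // links carrying more force break

        boolean linked = false;

        // Update the link for the current cell separation; return its force.
        double update(double separation) {
            if (!linked && separation < LINK_DISTANCE) linked = true; // adhesion
            if (!linked) return 0.0;
            double force = STIFFNESS * separation; // a stretched link pulls cells together
            if (force > BREAK_FORCE) {             // sufficient force breaks the link
                linked = false;
                return 0.0;
            }
            return force;
        }
    }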
Cochlea Simulation
• Simulates fluid-structure interactions due to incoming sound waves
• Potential applications: design of cochlear implants
• Simulation
– OpenMP code by E. Givelberg, J. Bunn, M. Rajan
– 256×256×128 fluid grid
– 18 hours on an HP Superdome
Cochlea Simulation
• Embedded structures are
– Elastic membranes
– Shell model
– Variable elasticity
• Capture shape of cochlea
• Shown here is a plate
– Implemented in a few weeks in Titanium code
Source: Titanium simulation on a T3E
Bacteria Simulation
• Simulation of bacteria in a fluid
– Requires smaller scale
– Small animal swimming
– Lisa Fauci, Tulane, and James Sullivan, Tulane
• E. coli simulation
– Video © James A. Sullivan, Cells Alive!
• Brownian motion extension
– Under development by Peter Kramer, RPI
– Microscopic processes where thermal fluctuation is important
Insect Flight Simulation
• Wings are immersed 2D structures
• Under development at UW and NYU
• Applications to insect robot design
Source: Dickinson, UCB
Other Applications
• The immersed boundary method has also been used, or is being applied to
– Flags and parachutes
– Flagella
– Embryo growth
– Valveless pumping
– Paper making
Immersed Boundary Method Structure
Four steps in each timestep, coupling the fiber points to the fluid lattice through interaction phases (a schematic sketch follows):
1. Fiber activation & force calculation (on the fiber points)
2. Spread force (fiber points → fluid lattice)
3. Navier-Stokes solver (on the fluid lattice)
4. Interpolate velocity (fluid lattice → fiber points)
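The four phases can be sketched as a single timestep loop. This is a minimal Java sketch assuming placeholder types (FiberPoint, FluidLattice); it is not the actual Titanium code.

    // Minimal sketch of one immersed-boundary timestep; all type and
    // method names are hypothetical placeholders.
    interface FluidLattice {
        void spreadForce(FiberPoint p);             // deposit fiber force on nearby cells
        void solveNavierStokes(double dt);          // advance the fluid
        double[] interpolateVelocity(FiberPoint p); // sample fluid velocity at a point
    }
    interface FiberPoint {
        void computeForce();                        // fiber activation & elastic force
        void move(double[] velocity, double dt);    // advect with the local fluid velocity
    }
    class IBTimestep {
        static void step(FiberPoint[] fibers, FluidLattice fluid, double dt) {
            for (FiberPoint p : fibers) p.computeForce();     // 1. fiber force calculation
            for (FiberPoint p : fibers) fluid.spreadForce(p); // 2. spread force
            fluid.solveNavierStokes(dt);                      // 3. fluid solve
            for (FiberPoint p : fibers)
                p.move(fluid.interpolateVelocity(p), dt);     // 4. interpolate velocity
        }
    }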
Architecture of the Alpha Project
Layered architecture:
• Application models: Heart (Titanium), Cochlea, Flagellate Swimming, …
• Generic Immersed Boundary Method (Titanium)
• Solvers: Spectral (Titanium), Multigrid (Titanium), Multigrid MLC (KeLP), AMR
Extensible simulation:
– Based on the Generic Immersed Boundary Method
– Can plug in new models by extending fiber points (see the sketch below)
– Uses Java inheritance for simplicity
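The inheritance-based extension point might look like the following Java sketch; class and method names are illustrative assumptions, not the package's actual API.

    // A new model subclasses a generic fiber point and overrides its force law.
    class GenericFiberPoint {
        double restLength = 1.0, stiffness = 1.0;
        // Default elastic force; models override this.
        double force(double length) {
            return stiffness * (length - restLength);
        }
    }
    class CochleaMembranePoint extends GenericFiberPoint {
        // A membrane model might use a different (e.g., variable) elasticity.
        double elasticityScale = 0.5;
        @Override
        double force(double length) {
            return elasticityScale * super.force(length);
        }
    }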
Alpha Project Plans
• Support for new applications – Berkeley and Michigan
• Scaling – Berkeley
• Solvers – San Diego, Berkeley, LBNL
• Data analysis and model construction – Ohio, Maryland
Use of PTE Tools in Implementation
• Titanium provides
– Java classes, linked data structures, overloading
– Distributed data structures (global address space)
– Useful for trying different layouts
• KeLP provides
– Alternative programming model for solvers
• DataCutter provides
– Help with analysis and organization of output
– Especially for hierarchical data
IB Titanium Status
• IB (Generic) rewritten in Titanium
– Heartbeat simulation
• Experiment 800
• 1/5th of a heartbeat done
• Using it to validate the code
• Recent improvements
– FFTW in the solver: 10x faster
– Study of fluid/fiber communication: better load balancing
Source: Titanium simulation on Millennium
Reduced Solver Bottleneck
• Performance improvements in the past year
– About 5x overall; ~10 seconds per timestep on 64 T3E nodes
– The bottleneck has changed to the interaction phases
[Pie charts. 2000 performance: Fiber 2%, Solve 44%, Spread 27%, Interpolate 27%. 2002 performance: Fiber 14%, Solve 23%, Spread 30%, Interpolate 27%.]
Added Scatter/Gather Array Copy
• Fine-grained communication in Titanium
– Using references: x = cell.xPos; y = allCells[k].yPos;
• Bulk communication in Titanium
– Using array copy: A.copy(B);
• Added scatter/gather
– Using array copy with a list of “Points”: A.copy(B,Indices) (see the gather sketch below)
Use of Scatter/Gather in IB Code
• Scatter/gather is useful for sparse matrix algorithms, e.g., A*x
– Incurs packing overhead
– Cutoff for benefit is under 50% fill
• Modified IB code
– Fill was closer to 60%
– Overhead of computing indices was surprisingly high
[Bar chart: time in msec for the orig, box, hash, and bool variants, broken down into total, setup, and network]
Load Balance and Locality
• Tried various assignments of fiber points
[Figures: fiber point assignments optimized for locality vs. optimized for load balance]
• Optimizing for load balance is generally better
– Application (fiber structure) dependent
– Somewhat constrained by the spectral solver
– Multigrid should give more options
Scallop: Multigrid Poisson Solver
• A latency-tolerant elliptic solver library
– Will be used to build the Navier-Stokes solver
– Implemented in KeLP, with a simple interface
• Work by Scott Baden and Greg Balls
– Based on the Balls/Colella algorithm
– 2D implementation in both KeLP and Titanium
• 3D solver
– Algorithm is complete
– Implementation running, but performance tuning is ongoing
– Interface between Titanium and KeLP developed
Elliptic Solvers
• Finite-difference based solvers
– Good for regular, block-structured domains
• Method of Local Corrections
– Local solutions corrected by a coarse solution
– Good accuracy, well-conditioned solutions
• Limited communication
– Once to generate coarse grid values
– Once to correct local solutions
– Trades extra computation for fewer messages
KeLP implementation
• Advantages
– abstractions available in C++
– built-in domain calculus
– communication management
– numerical kernels written in Fortran
• Simple interface
– callable from other languages
– no KeLP required in user code
Improved Heart Structure Model
• Current model is
– Based on a dog heart and textbook anatomy
– An approximation built by composing cones
• Building a more accurate model
– Use modern imaging of a human heart for the model
– Need to see individual fibers
– Collaboration between Joel Saltz's group and Dr. Robert DePhilip in the Anatomy Division of the Biomedical Informatics Dept. at Ohio State University
• Long-term goal
– Specialize the model to a patient using MRI data
Analysis of Cardiac Simulations
• Methods and tools to analyze 3D datasets from cardiac blood flow
– Outputs are velocity and pressure values on a 3D grid (128³) over many time steps
• Characterize behavior under different pathological and physiological conditions
– Natural and artificial heart valves
– Vary the initial values of kinematic viscosity, fluid density, and fiber stiffness and resting lengths
• Use a parameter study to find conditions that may lead to aneurysm (a hypothetical sweep is sketched below)
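A parameter study of this kind is essentially a sweep over initial values. The sketch below is hypothetical; runSimulation and the particular values are placeholders, not from the project.

    // Hypothetical parameter sweep; one simulation run per setting.
    class ParameterStudy {
        static void sweep() {
            double[] viscosities = {0.01, 0.02, 0.04}; // kinematic viscosity
            double[] densities   = {1.00, 1.05};       // fluid density
            double[] stiffnesses = {0.5, 1.0, 2.0};    // fiber stiffness
            for (double nu : viscosities)
                for (double rho : densities)
                    for (double k : stiffnesses)
                        runSimulation(nu, rho, k);     // record outputs for analysis
        }
        static void runSimulation(double nu, double rho, double k) {
            // placeholder: launch the IB simulation with these initial values
        }
    }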
Analysis of Cardiac Simulations
• These queries require support for
– spatial subsetting of one or more datasets
– processing of the data of interest to visualize and compare results from one or more datasets
– execution on high-performance machines and in a distributed environment
• Implemented using DataCutter, developed by Joel Saltz's group
Summary of Alpha Project Impact
• Several categories
– Application development
• Heart and cochlea-component simulation
– Application-level package
• Generic immersed boundary method
• Parallel for shared and distributed memory
• Enables new, larger-scale simulations on finer grids
– Solver libraries
• Method of Local Corrections
• Improved scalability and load balance expected
– Data analysis
• Building input data and analysis of results
Alpha Project Team Collaborators
• Scott Baden, UCSD
• Greg Balls, SDSC
• Julian Bunn, Caltech
• Phillip Colella, LBNL
• Kaushik Datta, UCB
• Robert DePhilip, OSU
• Ed Givelberg, UM
• Susan Graham, UCB
• Dan Martin, LBNL
• Dave McQueen, NYU
• Sabrina Merchant, UCB
• Charles Peskin, NYU
• Joel Saltz, OSU
• Dan Shalit, SDSC
• Alan Sussman, UMD
• Siu Man Yau, UCB
• Kathy Yelick, UCB
And lots of others….
End of Slides
Outline
• Motivation: digital human
• Immersed Boundary method applications
• Titanium
• Parallel Immersed Boundary method
• Alpha Project
– Heart simulation
– Cochlea
• Performance and scaling of the problems
• Future plans
– Input data
– Data analysis
– Solvers
– As a driver for machine architectures
Project Summary
• Provide an easy-to-use, high-performance tool for simulation of fluid flow in biological systems
– Using the Immersed Boundary Method
• Enable simulations on large-scale parallel machines
– Distributed memory machines, including SMP clusters
• Using Titanium, ADR, and KeLP with AMR
Outline
• Short-term goals and plans
• Technical status of the project
– Immersed Boundary Method
– Software Tools
– Solvers
• Next Steps
Short Term Goals for October 2001
• IB Method written in Titanium (IBT)
• IBT simulation on distributed memory
• Heart model input and visualization support in IBT
• Titanium running on Blue Horizon
• IBT users on BH and other SPs
? Performance tuning of code to exceed T90 performance
? Replace solver with (adaptive) multigrid
Building a User Community
• Many users of the IB Method
• Lots of concern over lack of distributed memory implementation
• Once IBT is more robust and efficient (May ’01), advertise to users
• Identify 1 or 2 early adopters
• Longer term: workshop or short course
Long Term Software Release Model
• Titanium
– Working with UPC and possibly others on a common runtime layer
– Compiler is relatively stable but needs ongoing support
• IB Method
– Release Titanium source code
– Parameterized “black box” for the IB Method, with possible cross-language support
• Visualization software is tied to SGI
Immersed Boundary Method
• Developed at NYU by Peskin & McQueen to model biological systems where elastic fibers are immersed in an incompressible fluid
– Fibers (e.g., heart muscles) modeled by a list of fiber points
– Fluid space modeled by a regular lattice
Challenges to Parallelization
• Irregular fiber lists need to interact with the regular fluid lattice
– Trade-off between load balancing of fibers and minimizing communication
– Requires efficient “scatter-gather” across processors
• Need a scalable elliptic solver
– Plan to use multigrid
– Eventually add Adaptive Mesh Refinement
– New algorithms under development by Colella's group
Titanium Status
• Titanium runs on uniprocessors, SMPs, and distributed memory with a single programming model
• It has run on Blue Horizon
– Issues related to communication balance
– Revamped backends are more organized, but the BH backend is not working right now
– Need to replace personnel
Solver Status
• Current solver is based on a 3D FFT
• Multigrid might be more scalable
• Multigrid with adaptive meshes might be more so
• The Balls and Colella algorithm could also be used
• KeLP implementations of solvers are included
• Note: McQueen is looking into solver issues for numerical reasons unrelated to scaling
• Not critical for first-year goals
Immersed Boundary on Titanium
• Performance Breakdown (torus simulation):
[Pie chart. Berkeley NOW, 4 processors: NS Solver 44%, Interpolate Velocity 39%, Fiber Calculation 0%, Spread Force 27%.]
[Pie chart. Argonne Lab SGI Origin, 4 processors: NS Solver 30%, Interpolate Velocity 34%, Fiber Calculation 2%, Spread Force 34%.]
Immersed Boundary on Titanium
Relative speedup on the NOW
[Chart: relative speedup (y-axis 0 to 5) on 1, 2, 4, and 8 processors for Fiber Calculation, Spread Force, Fluid Solver, Interpolate Velocity, and Overall]
Next Steps
• Improve performance of IBT
• Generate heart input for IBT
• Recover Titanium on BH
• Identify early user(s) of IBT
• Improve NS solver
• Add functionality (bending angles, anchorage points, sources & sinks) to the software package
General Questions
• How has your project addressed the goals of the PACI program (providing access to traditional HPC, providing early access to experimental systems, fostering interdisciplinary research, contributing to intellectual development, broadening the base)?
• What infrastructure products (e.g., software, algorithms, etc.) have you produced?
• Where have you deployed them (on NPACI systems, other systems)?
• What have you done to communicate the availability of this infrastructure?
• What training have you done?
• What kind/size of community is using your infrastructure?
• How have you integrated your work with EOT activities?
• What scientific accomplishments - or other measurable impacts not covered by answers to previous questions - have resulted from its use?
• What are the emerging trends/technologies that NPACI should build on/leverage?
• How can we increase the impact of NPACI development to date?
• How can we increase the community that uses the infrastructure you've developed?
Greg’s Slides
A Finite Difference Domain Decomposition Method Using Local Corrections for the Poisson Equation
Greg Balls
University of California, Berkeley
The Poisson Equation
• We are interested in the solution to
$\Delta \varphi = \rho$ on all of $\mathbb{R}^2$, with $\mathrm{supp}(\rho)$ compact
• A particular solution to this equation is
$\varphi(x) = \int G(x - y)\,\rho(y)\,dy$, where $G(z) = \frac{1}{2\pi} \log \|z\|$
Infinite Domain Boundary Conditions
• We can write our infinite domain boundary condition as
$\varphi(x) = \frac{R}{2\pi} \log \|x\| + o(1)$ as $\|x\| \to \infty$, where $R = \int \rho(x)\,dx$
• These boundary conditions specify a unique solution.
The Discretized Problem
• We would like an approximate solution
$\varphi^h_{i,j} \approx \varphi_{\mathrm{exact}}(ih, jh)$, with $\mathrm{supp}(\rho) \subset [0,1] \times [0,1]$, $0 \le i, j \le N$, $h = 1/N$
Solving the Discretized Problem
• We could calculate the convolution integral at each point: $O(N^4)$ work
• Multigrid provides a faster method: $O(N^2 \log N)$ work (see the example below)
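As an illustrative calculation (not from the slides), at $N = 512$:

    $N^4 \approx 6.9 \times 10^{10}$ versus $N^2 \log_2 N \approx 2.4 \times 10^{6}$,

a difference of more than four orders of magnitude.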
A Standard Finite Difference Discretization
• With a discretization of the Laplacian, e.g. the 5-point operator
$L^5 \varphi_{i,j} = \frac{1}{h^2} \left( \varphi_{i-1,j} + \varphi_{i+1,j} + \varphi_{i,j-1} + \varphi_{i,j+1} - 4 \varphi_{i,j} \right)$
or the 9-point operator
$L^9 \varphi_{i,j} = \frac{1}{6h^2} \left( 4(\varphi_{i-1,j} + \varphi_{i+1,j} + \varphi_{i,j-1} + \varphi_{i,j+1}) + \varphi_{i-1,j-1} + \varphi_{i+1,j-1} + \varphi_{i-1,j+1} + \varphi_{i+1,j+1} - 20 \varphi_{i,j} \right)$
• We solve the discretized equation
$L \varphi_{i,j} = \rho(ih, jh)$
A Finite Difference Approach for the Infinite Domain Problem
• A discrete solution can be found in three steps:
1. Solve a multigrid problem with homogeneous Dirichlet boundary conditions.
2. Do a potential calculation to set accurate inhomogeneous Dirichlet boundary conditions.
3. Solve a second multigrid problem with these boundary conditions.
A Finite Difference Approach for the Infinite Domain Problem
• The first multigrid solution:
$L^9 \varphi_D = \rho$ on $D$, with $\varphi_D = 0$ on $\partial D$
A Finite Difference Approach for the Infinite Domain Problem
• The potential calculation, for $x$ outside $D$:
$P(x) = \int_{\partial D} G(x - s)\, \frac{\partial \varphi_D}{\partial n}(s)\, ds$
using the normal derivative of the solution computed inside $D$
A Finite Difference Approach for the Infinite Domain Problem
• The second multigrid solution:
$L^9 \varphi_B = \rho$ on $B$, with $\varphi_B = P$ on $\partial B$
Domain Decomposition
• We would like to solve this problem in parallel, calculating $\varphi^h$ such that
$\| \varphi^h - \varphi_{\mathrm{exact}} \| = O(h^2)$
• A basic domain decomposition strategy: do until converged:
– Break into pieces.
– Solve on each piece.
– Compute coupling.
Domain Decomposition Options
• Point relaxation
– Too much communication and too much computation.
• Multigrid
– Less computation, but still too much communication.
• Finite element domain decomposition
– Less communication, but still iterative.
The Importance of Communication
• Current parallel machines can do many floating point operations in the time that it takes to send a message.
• This imbalance will get worse.
Fast Particle Methods
• Methods such as FMM and MLC need no iteration.
• They take advantage of the fact that the local and far-field solutions are only weakly coupled.
A Method of Local Corrections for Finite Difference Calculations
• The basic strategy:
– Break into pieces.
– Solve on each piece.
– Compute coupling through a single coarse solution.
– Compute the corrected solution on each piece.
(A schematic sketch of these steps follows.)
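Schematically, the strategy is the following non-iterative sequence; every name in this Java sketch is a placeholder, not the actual KeLP/Titanium solver API.

    // Outline of the method of local corrections: two communication
    // steps couple otherwise independent local solves.
    interface Patch {
        void solveLocalInfiniteDomain();    // initial local solve, ignoring other pieces
        void correctBoundary(CoarseGrid g); // set corrected boundary conditions
        void solveCorrected();              // final local multigrid solve
    }
    interface CoarseGrid {
        void addCharge(Patch p);            // accumulate a coarse-grid charge
        void solve();                       // single global coarse solve
    }
    class MethodOfLocalCorrections {
        static void solve(Patch[] pieces, CoarseGrid coarse) {
            for (Patch p : pieces) p.solveLocalInfiniteDomain();
            for (Patch p : pieces) coarse.addCharge(p);       // communicate once
            coarse.solve();
            for (Patch p : pieces) p.correctBoundary(coarse); // communicate once
            for (Patch p : pieces) p.solveCorrected();
        }
    }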
The Initial Solution
• An infinite domain solution is found on each piece $l$:
$L^9 \varphi_l^{h,\mathrm{initial}} = \rho_l$
• The effects of all other pieces are ignored.
A Coarse Grid Charge
• A coarse grid charge is computed for each piece:
$\rho_l^{H,\mathrm{initial}} = L^{9,H} A\big( \varphi_l^{h,\mathrm{initial}} \big)$
where $A$ averages the fine-grid solution onto the coarse grid
The Global Coarse Solution
• All the individual coarse grid charges are combined on a global coarse grid.
• A global coarse solution is found:
$L^{9,H} \varphi^H = \sum_l \rho_l^{H,\mathrm{initial}}$
Setting Accurate Boundary Conditions
• The interpolation stencil only interpolates far-field information:
$\varphi^{H,\mathrm{stencil}}_l = \varphi^H - \sum_{m \,\mathrm{near}\, l} \varphi_m^{H,\mathrm{initial}}$
subtracting the initial coarse-grid contributions of nearby pieces
Setting Accurate Boundary Conditions
• The coarse stencil information is interpolated to a corresponding fine grid stencil to $O(H^4)$.
• Local information is added from nearby fine grids:
$\varphi_l^{h,\mathrm{stencil}} = I\big( \varphi^{H,\mathrm{stencil}}_l \big) + \sum_{m \,\mathrm{near}\, l} \varphi_m^{h,\mathrm{initial}}$
The Corrected Solution
• Once the boundary conditions have been set for each piece, we solve one last time with multigrid:
$L \varphi_l^h = \rho_l$ on each piece
• The full solution is then $\varphi^h = \bigcup_l \varphi_l^h$
How Accuracy Is Maintained
• Local error is only $O(h^2)$.
• Error in the global coarse solution is $O(H^4)$.
• The coarse solution is accurate to $O(H^4)$ because of the error of the $L^9$ discretization: $\varphi^H \approx \varphi_{\mathrm{exact}} + O(H^4)$ away from $\mathrm{supp}(\tilde{\rho}^H)$
Scaling for Accuracy and Performance
• We can scale the coarse and fine grids as $H = O(\sqrt{h})$, so the coarse grid has $N_C = O(\sqrt{N})$ points per side (a worked instance follows).
• The coarse grid solve represents much less work than the work done on the fine grids.
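As a worked instance of this scaling (an illustration under the $H = O(\sqrt{h})$ assumption above, not a measured result):

    $N = 1024 \;\Rightarrow\; N_C \approx \sqrt{N} = 32$, and the coarse solve costs roughly $N_C^2 \log N_C \approx 5 \times 10^3$ operations against $N^2 \log N \approx 10^7$ on the fine grids.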
The Titanium Programming Language
• Titanium is a new language designed for scientific computing on parallel architectures.
– SPMD parallelism.
– A dialect of Java, compiled to native code.
– Language support for scientific computing.
The Benefits of Titanium
• An object-oriented language with built-in support for fast, multi-dimensional arrays.
• Language support for
– Tuples (Points).
– Rectangular regions (RectDomains).
– Expressing array bounds as RectDomains and indexing arrays by Points.
• A global address space.
(A Java emulation of these concepts follows.)
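In plain Java, these array features have to be emulated; the sketch below only conveys the semantics of RectDomains and Point-indexed arrays, which Titanium builds into the language.

    // Java emulation of Titanium-style domains and arrays (illustrative only).
    class RectDomain2 {
        final int lo0, hi0, lo1, hi1; // a rectangular index region
        RectDomain2(int lo0, int hi0, int lo1, int hi1) {
            this.lo0 = lo0; this.hi0 = hi0; this.lo1 = lo1; this.hi1 = hi1;
        }
        int size() { return (hi0 - lo0 + 1) * (hi1 - lo1 + 1); }
    }
    class Grid2 {
        final RectDomain2 domain; // array bounds expressed as a RectDomain
        final double[] data;
        Grid2(RectDomain2 d) { domain = d; data = new double[d.size()]; }
        // Index the array by a 2-tuple (i, j), i.e., a "Point".
        double get(int i, int j) {
            int width = domain.hi1 - domain.lo1 + 1;
            return data[(i - domain.lo0) * width + (j - domain.lo1)];
        }
    }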
Accuracy of the Method
Grid Size   Np   Nr   Max Error   L2 Error   Max Convergence   L2 Convergence
257         2    4    8.61e-8     2.18e-8
257         2    8    8.51e-8     2.13e-8
257         4    8    8.25e-8     2.02e-8
257         4    16   7.23e-8     1.77e-8
513         2    4    2.02e-8     5.32e-9    2.22              2.06
513         2    8    2.01e-8     5.26e-9    2.23              2.06
513         4    8    1.96e-8     5.05e-9    2.32              2.18
513         4    16   1.67e-8     4.12e-9    2.42              2.30
Error on a Large, High-Wavenumber Problem
Max error: 6.47e-9; L2 error: 1.31e-9
Scalability of the Method
• Results from the SDSC IBM SP-2:
Scalability of the Method
• Results from the NERSC Cray T3E:
Future Work
• Extension to three dimensions.
• Implementation of different boundary conditions.
• Use in other solvers such as:– Euler.– Navier-Stokes.
Conclusions
• The method is second-order accurate.
• The method does not iterate between the local fine representations and the global coarse grid.
• The need for communication is kept to a minimum.
• The method is scalable.
Comparison to the Serial Method
• Extra computational costs:
– The time spent on the coarse grid solution can be kept to less than 10% of the time spent on the local fine grids.
– The final multigrid solution adds 40% more fine grid work.
• Communication costs:
– Experimentally, less than 1% of the total time.
End of Slides
Simulation: The Third Pillar of Science
• Traditional scientific and engineering paradigm:
1) Do theory or paper design.
2) Perform experiments or build system.
• Limitations:
– Too difficult -- build large wind tunnels.
– Too expensive -- build a throw-away passenger jet.
– Too slow -- wait for climate or galactic evolution.
– Too dangerous -- drug design.
• Computational science paradigm:
3) Use high performance computer systems to simulate the phenomenon.
Economics of Large Scale Simulation
• Automotive design:
– Crash and aerodynamics simulation (500+ CPUs).
– Savings: approx. $1 billion per company per year.
• Semiconductor industry:
– Device simulation and logic validation (500+ CPUs).
– Savings: approx. $1 billion per company per year.
• Airlines:
– Logistics optimization on parallel systems.
– Savings: approx. $100 million per airline per year.
• Securities industry:
– Home mortgage investment and risk analysis.
– Savings: approx. $15 billion per year.
• What about health care, which is 20% of GNP?
Source: David Bailey, LBNL
IB Method Users
• Peskin and McQueen at NYU
– Heart model, including valve design
• At Washington
– Insect flight
• Fauci et al at Tulane
– Small animal swimming
• Peter Kramer at RPI
– Brownian motion in the IBM
• John Stockie at Simon Fraser
– Paper making
• Others
– parachutes, flags, flagellates, robot insects
ToDo
• Work on solver slides
– 2nd-order accurate
– MLC, AMR, MG
– Load balancing
• Add performance numbers
• Add conclusions
• Add scatter/gather slide
• Add more on Titanium
• Add Fogelson visualization
• Add info on beating heart movie