TRANSCRIPT
Coalition for Academic Scientific Computation, September 18, 2002
PITTSBURGH SUPERCOMPUTING CENTER
High Performance Computing and Biomedical Research
Ralph Roskies, Professor of Physics, University of Pittsburgh
Scientific Director, Pittsburgh Supercomputing Center
Coalition for Academic Scientific Computation, September 18, 2002
Why the recent emphasis on computing in biomedicine?
Economics: in 15 years, for the same cost (see the back-of-envelope doubling times below),
- can do 10,000 times more processing
- can store 100,000 times more data
Software developments:
- new algorithms for processing
- the Web for finding data
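As a back-of-envelope check (my arithmetic, not from the original slides), those factors correspond to very short doubling times: solving $10^{4} = 2^{15/T}$ and $10^{5} = 2^{15/T}$ for the doubling time $T$ gives

$$T_{\text{processing}} = \frac{15\,\ln 2}{\ln 10^{4}} \approx 1.1\ \text{years}, \qquad T_{\text{storage}} = \frac{15\,\ln 2}{\ln 10^{5}} \approx 0.9\ \text{years},$$

i.e., price-performance doubling roughly every 11 to 14 months.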
Now possible to do things previously not feasible:
- Acquiring, storing, finding, and accessing large amounts of data (terabytes to petabytes of text, numbers, and images)
- Assembling large databases and repeatedly reprocessing them to ask new questions
- Simulations based on realistic models to understand the data and do predictive, as opposed to descriptive, biology and medicine
Why High Performance Computing?
- HPC is a discovery tool: bringing more problems within reasonable human timescales encourages creativity and exploration
- HPC is a time machine: incorporating more realism in models crosses a threshold of relevance to experiment
- HPC really involves computing, networking, visualization, and storage
Real-time fMRI
[Images: 3.0T MRI scanner, Cray T3E, SGI Onyx]
In 1996, this needed a supercomputer. Today, it's routine.
Compelling biomedical investigations that HPC enables today:
- Genomics
- Analyzing and storing images for revolutionizing medical care
- Blood flow and heart disease
- Structural biology
Data Explosion: Exponential Growth of GenBank
[Chart: GenBank size in gigabases, 1982-2002, growing exponentially from near zero to roughly 20 gigabases; the 2002 figure reflects additions only up to May 23]
(courtesy of Thom Dunning, BioGrid North Carolina)
Simulations linking genes and diseases
- Based on Utah's resource of 1.5M people, with their genealogy, record-linked to cancer and death records back to the early 1900s
- Typical pedigree goes back 8 generations
- Need genotypic data on hundreds of people alive today
(University of Utah Division of Genetic Epidemiology and Center for High Performance Computing)
Image Analysis
- USC microtomography: high-throughput 3D microtomography using data from electron microscopes; needs high performance computing for real-time analysis
- Comparing 3-D MRI of normal and knockout mice (Allan Johnson, Duke)
Blood Flow
- 1990s: realistic geometry, artificial valve design (Charles Peskin et al., Courant Institute; 150 hours on the C90)
- Today: the role of turbulence in loosening plaque, leading to embolisms (Henry Tufo et al., Argonne; 10^4 hours on the TCS)
- Tomorrow: designing heart pumps to minimize damage to individual blood cells (Jim Antaki et al., University of Pittsburgh; millions of hours on the TCS?)
Structural Biology: e.g., How Do Aquaporins Work?
Aquaporins are proteins that conduct large volumes of water through cell membranes while filtering out charged particles like hydrogen ions.
A massive simulation showed that water moves through aquaporin channels in single file. Oxygen leads the way in; halfway through, the water molecule flips over. That breaks the 'proton wire'.
(Klaus Schulten et al., U. of Illinois, SCIENCE, April 19, 2002; 35,000 hours on the TCS)
For a given computing capability, certain important problems get solved. Many problems need more computing power than we currently have:
- Protein folding
- Analyzing genomic and proteomic data
- Cell modeling and metabolism
Protein folding
- Critical for drug design
- Understanding misfolding is crucial for diseases like Alzheimer's and mad cow disease
- Today's most powerful systems can only simulate microseconds of real time, but folding takes milliseconds or more (see the arithmetic below)
[Image: villin headpiece; red: native, blue: partially folded]
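To make that gap concrete (a rough estimate of mine, assuming a typical molecular-dynamics timestep of about 2 femtoseconds, which the slide does not state): simulating one millisecond of folding requires

$$N_{\text{steps}} = \frac{10^{-3}\ \text{s}}{2\times 10^{-15}\ \text{s/step}} = 5\times 10^{11}\ \text{timesteps},$$

roughly 1,000 times the $\sim 5\times 10^{8}$ steps needed for the microseconds that today's systems can reach.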
Genomics and Proteomics
Data is increasing rapidly. Computational demands for integrating, mining, and analyzing that data grow even faster.
(courtesy of Thom Dunning, BioGrid North Carolina)
By 2005: genomic data measured in petabytes; computational needs of ~10 teraflops (from TimeLogic)
Cell modeling
Need to take account of:
- spatial inhomogeneity
- cell geometry
- signal variability (stochastic behavior)
Synaptic Transmission: many neurological diseases are due to problems of release or absorption of neurotransmitters like acetylcholine, glutamate, glycine, GABA, and serotonin.
(Joel Stiles, PSC, and Tom Bartol, Salk: MCell)
Unusual Medical Success
In Slow Channel Congenital Myasthenic Syndrome, the channel closes more slowly upon binding, so the electrical current continues longer than normal.
A particular patient presented puzzling symptoms. Stiles experimented with the model parameters, and simulations showed that the symptoms could be explained if the receptors also opened slowly; this was then verified medically.
An unusual interplay of simulation and medical diagnosis, depending critically on realistic geometry and on stochastic modeling (sketched below).
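To illustrate the kind of stochastic channel modeling involved, here is a minimal Monte Carlo sketch (my own illustration, not MCell's actual algorithm; the function name and rate constants are hypothetical placeholders). It shows how a slower closing rate lengthens the mean channel open time, and hence the duration of the electrical current:

import random

random.seed(0)  # fixed seed so the sketch is reproducible

def mean_open_time(k_close, trials=2000, dt=1e-6):
    """Monte Carlo estimate of the mean single-channel open duration (s).

    Each trial starts with a channel that has just opened; in each small
    step dt it closes with probability k_close*dt (k_close in 1/s).
    """
    total = 0.0
    for _ in range(trials):
        t = 0.0
        while random.random() >= k_close * dt:  # channel stays open
            t += dt
        total += t
    return total / trials

# Hypothetical closing rates (placeholders, not patient-derived values):
# a normal receptor vs. a 'slow channel' mutant that closes 10x more slowly.
for label, k_close in [("normal", 5000.0), ("slow channel", 500.0)]:
    est = mean_open_time(k_close)
    print(f"{label:>12}: mean open time ~ {est * 1e3:.2f} ms "
          f"(analytic 1/k_close = {1e3 / k_close:.2f} ms)")

Because each opening is a random draw, individual open times scatter widely around the mean; capturing that variability, together with realistic synaptic geometry, is what a deterministic rate-equation model misses and a stochastic simulator like MCell provides.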
HPC has enormous promise for biomedicine and improving health
See, e.g.:
- The Biomedical Information Science and Technology Initiative (BISTI) report, June 1999, www.nih.gov/about/director/060399.htm
- PITAC Report to the President, "Transforming Health Care Through Information Technology," February 2001, www.itrd.gov/pubs/pitac/index.html
- Department of Energy, Computational Structural Biology, http://cbcg.lbl.gov/ssi-csb/Program.html
PITAC Recommendations for NIH
- Pilot projects and Enabling Technology Centers should be established to extend the practical uses of information technology to health care systems and biomedical research.
- NCRR is doing some of this, but Resource budgets are limited at $700K. NIBIB?
PITAC recommendations (cont’d)
- Programs should be established to increase the pool of biomedical research and health care professionals with training at the intersection of health and information technology.
- NPEBC programs are a start, but biologists will soon be overtaken by technical developments and the associated analysis needs.
PITAC recommendations (cont'd)
- A scalable national computing infrastructure should be provided to support the biomedical research community. Still badly needed.
- NSF and civilian DOE have each recently invested ~$100M in HPC infrastructure.
- Biomedical users are very heavy users of NSF and DOE facilities (at PSC, close to 50% this past year).
- NIH has almost no investment in these or comparable resources. (PSC has 30 times the compute power of NCI's ABCC at Frederick.)
Hardware is not enough
Also need support people, knowledgeable in both computing and biology, to interact with and support the biomedical research community.
Emerging paradigm: Grid Computing
Stresses collaboration and seamless access to data wherever it is located.
- Multiple computers
- Distributed data sets
- High-speed networks
- Common interface
We urge you to consider:
- NIH should establish HPC Centers, with leading-edge hardware, biomedically oriented support staff, research into relevant algorithms, and vigorous training.
- NIH should actively cooperate with NSF, DOE, and other agencies and shoulder its fair share in building the national computing infrastructure. A comparable budget scale is ~$100M/year.
- There are many sites in the nation that could respond credibly to an NIH solicitation for such a Center, including many minority institutions as partners.
- Because computing infrastructure cuts across Institutes, it is not, and will never be, the major priority of any Institute.
- A cross-Institute initiative of this magnitude and importance cannot happen without leadership from the Director.