microsoft research faculty summit 2008. ian foster computation institute university of chicago &...
![Page 1: Microsoft Research Faculty Summit 2008. Ian Foster Computation Institute University of Chicago & Argonne National Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d195503460f949ee1e1/html5/thumbnails/1.jpg)
Microsoft Research Faculty Summit 2008
Towards a Data Cauldron
Ian Foster
Computation Institute
University of Chicago & Argonne National Laboratory
If you want to build a ship, don’t drum up the men to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea.
Antoine de Saint-Exupéry
Biomedical Research, circa 1600
Biomedical Research, circa 2000
Growth of Sequences & Annotations since 1982
Folker Meyer, Genome Sequencing vs. Moore’s Law: Cyber Challenges for the Next Decade, CTWatch, August 2006.
An Open Analytics Environment
Results out
Data in
Programs & rules in
“No limits”: Storage, Computing, Format, Program
Allowing for: Versioning, Provenance, Collaboration, Annotation
o·pen [oh-puhn] adjective
having the interior immediately accessible
relatively free of obstructions to sight, movement, or internal arrangement
generous, liberal, or bounteous
in operation; live
readily admitting new members
not constipated
What Goes In (2)
Rules
Workflows
Dryad
MapReduce
Parallel programs
SQL
BPEL
Swift
SCFL
R
MatLab
Octave
How it Cooks
Virtualization: Run any program, store any data
Indexing: Automated maintenance
Provisioning: Policy-driven allocation of resources to competing demands
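To make "policy-driven allocation of resources to competing demands" concrete, here is a minimal sketch of one possible policy: weighted proportional sharing, capped by what each workload asks for. This is an illustration only, not the scheme used in the talk; the `Demand` class, the `allocate` function, the workload names, and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    name: str       # hypothetical workload label
    requested: int  # nodes the workload asks for
    weight: int     # policy weight, assumed to be a positive integer


def allocate(total_nodes: int, demands: list[Demand]) -> dict[str, int]:
    """Split nodes in proportion to weight, never granting more than requested.

    Nodes left over by satisfied demands are redistributed in later rounds.
    """
    alloc = {d.name: 0 for d in demands}
    remaining = total_nodes
    while remaining > 0:
        # Only demands that are still short of their request compete.
        active = [d for d in demands if alloc[d.name] < d.requested]
        if not active:
            break
        total_weight = sum(d.weight for d in active)
        granted = 0
        for d in active:
            share = remaining * d.weight // total_weight
            give = min(share, d.requested - alloc[d.name])
            alloc[d.name] += give
            granted += give
        if granted == 0:
            # Every share rounded down to zero: give one node to the
            # highest-weight unsatisfied demand so the loop makes progress.
            top = max(active, key=lambda d: d.weight)
            alloc[top.name] += 1
            granted = 1
        remaining -= granted
    return alloc


# Hypothetical example: 100 nodes shared by three workloads.
result = allocate(
    100,
    [
        Demand("genomics", requested=80, weight=2),
        Demand("fmri", requested=50, weight=1),
        Demand("astro", requested=10, weight=1),
    ],
)
print(result)
```

The policy lives entirely in the weights, so changing priorities means changing data rather than code. A real provisioner would also have to handle demands that arrive and leave over time, which this sketch ignores.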
What Comes Out
Data
Analysis as (Collaborative) Process
Transform
Annotate
Search
Add to
Tag
Visualize
Discover
Extend
Group
Share
Data Cauldron @ U.Chicago: Applications
Astrophysics
Cognitive science
East Asian studies
Economics
Environmental science
Epidemiology
Genomic medicine
Neuroscience
Political science
Sociology
Solid state physics
Data Cauldron @ U.Chicago: Hardware
500 TB reliable storage (data, metadata)
180 TB, 180 GB/s, 17 Top/s analysis
Data ingest
Dynamic provisioning
Parallel analysis
Remote access
Offload to remote data centers
PADS
Diverse users
Diverse data sources
1000 TB tape backup
DOCK on BG/P: ~1M Tasks on 118,000 CPUs
CPU cores: 118784
Tasks: 934803
Elapsed time: 7257 sec
Compute time: 21.43 CPU yr
Average task time: 667 sec
Relative Efficiency: 99.7% (from 16 to 32 racks)
Utilization: Sustained: 99.6%, Overall: 78.3%
Ioan Raicu
Zhao Zhang
Mike Wilde
Time (secs)
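The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes that overall utilization means compute time divided by the core-seconds available during the run, and that a CPU year is 365 days. The slide states neither definition, so both are assumptions.

```python
# Figures copied from the slide.
cores = 118_784
tasks = 934_803
elapsed_s = 7_257
compute_cpu_years = 21.43

# Assumption: one CPU year is 365 days of one core.
SECONDS_PER_YEAR = 365 * 24 * 3600

# Core-seconds available if every core were busy for the whole run.
available_core_s = cores * elapsed_s

# Core-seconds actually spent computing, per the slide's compute time.
used_core_s = compute_cpu_years * SECONDS_PER_YEAR

# Assumed definition of overall utilization: used / available.
overall_utilization = used_core_s / available_core_s

# Average task completion rate over the whole run.
throughput = tasks / elapsed_s

print(f"available: {available_core_s} core-seconds")
print(f"overall utilization: {overall_utilization:.1%}")
print(f"throughput: {throughput:.1f} tasks/s")
```

Under these assumptions the result is about 78%, close to the overall utilization quoted on the slide. The run also completes roughly 129 tasks per second on average.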
Data Cauldron @ U.Chicago: Methods
HPC systems software (MPICH, PVFS, ZeptoOS)
Collaborative data tagging (GLOSS)
Data integration (XDTM)
HPC data analytics and visualization
Loosely coupled parallelism (Swift, Hadoop)
Dynamic provisioning (Falkon)
Service authoring (Introduce, caGrid, gRAVI)
Provenance recording and query (Swift)
Service composition and workflow (Taverna)
Virtualization management (Workspace Service)
Distributed data management (GridFTP, etc.)
High-Performance Data Analytics
Functional MRI
Ben Clifford, Mihael Hategan, Mike Wilde, Yong Zhao
Social Informatics Data Grid (SIDgrid)
Collaborative, multi-modal analysis of cognitive science data
TeraGrid PADS …
SIDgrid
Diverse experimental data & metadata
Browse data
Search
Content preview
Transcode
Download
Analyze
Bennett Bertenthal, Mike Papka, Mike Wilde … and others
A Vast and Endless Sea …
Results out
Data in
Programs & rules in
“No limits”: Storage, Computing, Format, Program
Allowing for: Versioning, Provenance, Collaboration, Annotation