![Page 1: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/1.jpg)
HPC for Biomed Applications
Marcos Athanasoulis, Dr.PH
Director, Information Technology
Harvard Medical School
![Page 2: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/2.jpg)
Outline
About HMS
Why Biomed HPC is different
Context
Results from Biomed HPC 2007 Summit
Predictions
Recommendations for Fabric weavers
![Page 3: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/3.jpg)
About the Longwood Medical Area
213 Acres, 37,000 employees, 15,000 students
21 institutions
2.15 million in- and outpatient visits
Forty-seven percent of all hospital-based outpatient clinical visits, and fifty-one percent of all inpatient admissions in Boston
Forty-seven percent of all staffed beds in Boston
15,016 births in the LMA
![Page 4: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/4.jpg)
HMS Affiliated Research – Longwood
Four of the top five Independent Hospital recipients of NIH funding nationwide
Massachusetts was the number two state recipient of National Institutes of Health (NIH) funding
Boston is ranked as the number one city in the nation for NIH support
If the LMA were ranked as a city, it would be number three for funding, after New York and before Philadelphia.
If the LMA were ranked as a state, it would be number eight, after North Carolina, and before Washington.
National Institutes of Health (NIH) awards more than doubled for the LMA institutions, from $302 million to $722 million, over the decade between FY 1991 and FY 2001
![Page 5: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/5.jpg)
What makes Biomed HPC Different?
Larger problem space
◦ Whole genome processing
◦ Whole ‘Ome processing
◦ Image Processing
◦ Simulations
◦ Everything Else
Bursty Usage
◦ Processing power is not always the bottleneck
◦ Most work is “embarrassingly parallel”
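To make "embarrassingly parallel" concrete, here is a minimal Python sketch, not taken from the talk. It assumes a hypothetical per-sequence task (`gc_fraction`, the share of G and C bases in a DNA string) where each input is processed with no communication between workers. The function names and the choice of task are illustrative assumptions only.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def gc_fraction(sequence: str) -> float:
    """Hypothetical per-item task: fraction of G/C bases in one sequence."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def gc_fractions(sequences, executor_cls=ProcessPoolExecutor, workers=2):
    """Apply gc_fraction to every sequence independently.

    No item depends on any other, so the work can be split across
    workers with a plain map; results come back in input order.
    """
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(gc_fraction, sequences))
```

Calling `gc_fractions(reads)` from a script's `if __name__ == "__main__":` block spreads the items over worker processes. Passing `executor_cls=ThreadPoolExecutor` runs the same map on threads, which is convenient for quick checks. Because the items never talk to each other, a workload like this stresses storage and scheduling more than a low-latency interconnect.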
![Page 6: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/6.jpg)
Biomed HPC Differences (cont.)
Researchers
◦ Funding challenges
◦ Grant funding limitations and requirements
◦ Everyone is a CIO
Systems Diversity
◦ Plethora of small clusters
◦ General lack of centralization
◦ White boxes to Blue Genes
![Page 7: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/7.jpg)
About HPC @ HMS
Today:
◦ Modest shared cluster
◦ 1000 processor cores
◦ 100TB attached NAS storage
◦ Interconnect: Gigabit Ethernet
◦ Subsidized user contribution model
◦ BUT, MOST computing happens under the desk and behind the curtain!
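As a rough illustration of why the interconnect matters at this scale, the following back-of-the-envelope Python sketch estimates how long it would take to move the full 100TB over a single link. It is not from the talk: it assumes decimal units (1 TB = 10^12 bytes), a single link running at full line rate, and no protocol overhead, so real transfers would be slower.

```python
def transfer_seconds(num_bytes: float, link_bits_per_second: float,
                     efficiency: float = 1.0) -> float:
    """Seconds needed to move num_bytes over one link.

    efficiency is the assumed fraction of line rate actually achieved
    (1.0 means an ideal link, which is an optimistic assumption).
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and 0 < efficiency <= 1")
    return num_bytes * 8 / (link_bits_per_second * efficiency)


TB = 10**12        # decimal terabyte, in bytes (assumption)
GIGABIT = 10**9    # bits per second

for name, rate in (("Gigabit Ethernet", GIGABIT),
                   ("10 Gigabit Ethernet", 10 * GIGABIT)):
    hours = transfer_seconds(100 * TB, rate) / 3600
    print(f"{name}: {hours:.1f} hours to move 100 TB")
```

Under these idealized assumptions the Gigabit link needs about 222 hours (roughly nine days) and the 10 Gigabit link about 22 hours.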
![Page 8: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/8.jpg)
About HPC @ HMS (cont.)
Tomorrow:
◦ Mid-scale cluster and Harvard Grid
◦ 10-20K processor cores
◦ Petabyte of storage
◦ Parallel file system
◦ 10g Ethernet or InfiniBand
◦ More centralized
![Page 9: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/9.jpg)
Challenge: Natural Language Processing
HOSPITAL COURSE: ... It was recommended that she receive …We also added Lactinax, oral form of Lactobacillus acidophilus to attempt a repopulation of her gut.
SH: widow,lives alone,2 children,no tob/alcohol.
BRIEF RESUME OF HOSPITAL COURSE: 63 yo woman with COPD, 50 pack-yr tobacco (quit 3 wks ago), spinal stenosis, ...
SOCIAL HISTORY: Negative for tobacco, alcohol, and IV drug abuse.
SOCIAL HISTORY: The patient is a nonsmoker. No alcohol.
SOCIAL HISTORY: The patient is married with four grown daughters,uses tobacco, has wine with dinner.
Smoker
Non-Smoker
SOCIAL HISTORY: The patient lives in rehab, married. Unclear smoking history from the admission note…
Past Smoker
Hard to pick
Hard to pick
???
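The difficulty on this slide can be illustrated with a deliberately naive keyword classifier. This is a sketch under stated assumptions, not the method used at HMS: the patterns, their ordering, and the function name `classify_smoking` are all hypothetical, and the labels it returns are one reading of the notes rather than the slide's own mapping.

```python
import re

# Hypothetical rules, checked in order; the first match wins.
# The ordering matters: "quit" must be tested before current-use cues,
# and explicit uncertainty before everything else.
RULES = [
    ("Unknown", re.compile(r"unclear smoking")),
    ("Past Smoker", re.compile(r"\bquit\b|\bformer smoker\b|\bex-smoker\b")),
    ("Non-Smoker", re.compile(
        r"\bnon-?smoker\b|\bno tob(acco)?\b|negative for tobacco")),
    ("Smoker", re.compile(r"uses tobacco|\bsmoker\b|\bsmokes\b")),
]


def classify_smoking(note: str) -> str:
    """Return a smoking-status label for a free-text note, or 'Unknown'."""
    text = note.lower()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "Unknown"


notes = [
    "SH: widow,lives alone,2 children,no tob/alcohol.",
    "63 yo woman with COPD, 50 pack-yr tobacco (quit 3 wks ago)",
    "SOCIAL HISTORY: The patient is a nonsmoker. No alcohol.",
    "married with four grown daughters,uses tobacco, has wine with dinner.",
    "We also added Lactinax, oral form of Lactobacillus acidophilus",
]
for note in notes:
    print(f"{classify_smoking(note):12s} <- {note}")
```

Rules like these are brittle. Abbreviations ("no tob"), a history that contradicts current status ("50 pack-yr ... quit 3 wks ago"), and notes that never mention smoking each need special handling, and every new phrasing needs a new rule. That is part of why processing clinical text at scale is listed here as a challenge.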
![Page 10: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/10.jpg)
Challenge: Whole Omes
Current cost 100K
Working on <$1,000 whole genome
High Throughput Instrumentation
◦ $250-$500 for 500,000 SNPs
◦ $50-100K for good quality phenotyping of 100K++ individuals
◦ What about the samples (consented)
  ◦ $650/patient
  ◦ Dozens a week
  ◦ Wait in clinic: $450+/patient
![Page 11: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/11.jpg)
Sequencing Equipment
[Diagram labels: HPLC autosampler (96 wells); syringe pump; microscope with xyz controls; flow-cell; temperature control]
![Page 12: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/12.jpg)
2nd-generation sequencing
Harvard-model-F07: $106K incl. computer. $14K support.
Open-source software, hardware, wetware
Reduce reagent volume & per vol cost 100X each.
[Image labels: E07 (Nikon); F07]
![Page 13: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/13.jpg)
![Page 14: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/14.jpg)
Challenge: Everything to Everything
![Page 15: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/15.jpg)
Biomed HPC Leadership Summit
150 leaders in biomedical HPC
The tech guy is between you and a sale
2008 Summit to convene October 6 and 7th in Boston MA
http://biomedhpc.med.harvard.edu
![Page 16: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/16.jpg)
Biomed HPC Audience Surveys
Audience response devices
N=60-100 Leaders in HPC
Questions asked over the two day event
And, survey says!
![Page 17: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/17.jpg)
Primary Network Fabric
[Chart, percent; HMS Biomed HPC Leadership Summit 2007]
◦ Gig-Ethernet: 63
◦ InfiniBand: 12
◦ Myrinet: 5
◦ 10g Ethernet: 17
◦ Other: 3
![Page 18: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/18.jpg)
Do you use virtualization?
[Chart, percent; HMS Biomed HPC Leadership Summit 2007]
◦ Yes, we do now: 47
◦ No, we don't and don't have plans to: 14
◦ No, but considering it for future: 39
![Page 19: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/19.jpg)
What are you using for virtualization?
[Chart: What are you using for virtualization in your environment? Percent; HMS Biomed HPC Leadership Summit 2007]
◦ VMware: 66
◦ Xen: 23
◦ VMI: 2
◦ HPVM: 9
![Page 20: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/20.jpg)
Use of parallel/distributed FS
[Chart: Use of parallel/distributed/network filesystem for production storage. Percent; HMS Biomed HPC Leadership Summit 2007]
◦ Yes, we do now: 50
◦ No, we don't and don't have plans to: 5
◦ No, but have plans to: 22
◦ No, but considering for future: 23
![Page 21: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/21.jpg)
Which parallel filesystem?
[Chart: If using a distributed/network file system, which one? Percent; HMS Biomed HPC Leadership Summit 2007]
◦ Lustre: 18
◦ Microsoft Distributed File System: 15
◦ Open AFS: 12
◦ PVFS: 6
◦ Brix: 8
◦ Other: 41
![Page 22: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/22.jpg)
Which publication do you rely on?
[Chart: Most useful, relevant, and timely publication for Grid and HPC computing. Percent; HMS Biomed HPC Leadership Summit 2007]
◦ HPC Wire: 44
◦ Bio IT World: 12
◦ Grid World: 3
◦ Grid Today: 5
◦ Computerworld: 8
◦ Other: 27
![Page 23: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/23.jpg)
Primary Storage Infrastructure
[Chart, percent; HMS Biomed HPC Leadership Summit 2007]
◦ NAS: 45
◦ SAN: 30
◦ Locally attached for storage only: 10
◦ Distributed file system for production: 15
![Page 24: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/24.jpg)
Data center challenges
[Chart: Data Center Status. Percent; HMS Biomed HPC Leadership Summit 2007]
◦ Plenty of power, cooling, and space: 30
◦ Plenty of space, but power/cooling constraints: 45
◦ Short of physical space, plenty of power and cooling: 25
![Page 25: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/25.jpg)
Data center expansion plans
[Chart: Data Center Expansion (in next year). Percent; HMS Biomed HPC Leadership Summit 2007]
◦ Will build new data center space: 43
◦ Will lease commercial data center space: 35
◦ Will not expand data center: 19
◦ Don't run any data centers: 4
![Page 26: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/26.jpg)
Job schedulers used
[Chart: Job Scheduler Used. Percent; HMS Biomed HPC Leadership Summit 2007]
◦ Platform LSF: 42
◦ Sun Grid Engine: 19
◦ Open PBS: 17
◦ Other: 9
◦ No Scheduler/Not Applicable: 13
![Page 27: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/27.jpg)
Primary drives being purchased
[Chart: Primary type of drive being bought for storage infrastructure in new HPC systems. Percent; HMS Biomed HPC Leadership Summit 2007]
◦ SATA: 55
◦ SCSI/SAS: 16
◦ Fibre Channel: 27
◦ Other: 2
![Page 28: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/28.jpg)
Types of servers deployed
[Chart: Primarily Purchased New Computational Hardware (Current). Percent; HMS Biomed HPC Leadership Summit 2007]
◦ 1U Nodes: 33
◦ Blade Servers: 56
◦ Larger Scale SMP Boxes (>16 CPU): 9
◦ Other: 2
![Page 29: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/29.jpg)
Installed 10GbE today
[Chart: Installed 10GbE in Facility. Percent; HMS Biomed HPC Leadership Summit 2007]
◦ Yes: 53
◦ Plans for 2008: 20
◦ No: 27
![Page 30: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/30.jpg)
Installed 10GbE to endpoints
[Chart: Installed 10GbE to End Points (Servers). Percent; HMS Biomed HPC Leadership Summit 2007]
◦ Yes: 24
◦ Plans for 2008: 17
◦ No plans: 58
![Page 31: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/31.jpg)
Best use of 10GbE today
[Chart: Best Use for 10 Gigabit Ethernet Today. Percent; HMS Biomed HPC Leadership Summit 2007]
◦ Connecting Storage to Core Network: 17
◦ Connecting Switches Together: 24
◦ Both: 58
![Page 32: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/32.jpg)
Prediction
Biomed HPC will continue double-digit growth for the foreseeable future
The importance of the network fabric will increase dramatically
Biomedical HPC will become more centralized
![Page 33: HPC for Biomed Applications Marcos Athanasoulis, Dr.PH Director, Information Technology Harvard Medical School](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649e705503460f94b6dba5/html5/thumbnails/33.jpg)
Recommendations for Open Fabric
User-centered design
◦ End-to-end analysis of your product's usability
Don’t ignore the small guys
Bring costs down
Continue your pursuit of enlightened self-interest
Be involved in the community