From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences
DESCRIPTION
Talk given at the Leverage Big Data '14 Event in May 2014. Big data has arrived in the life science research domain, and it has caught researchers and IT professionals alike off-guard. Workstations, Excel, and even small clusters under people's desks are no longer sufficient to meet the data storage and processing needs of modern biological research techniques. Data is being produced cheaply and rapidly at unprecedented rates in academic, commercial and clinical laboratories while budgets in those spaces continue to be slashed. Despite the reduced budgets, it is predicted that 25% of all researchers will require HPC to analyze their data in the coming year. Research organizations are starting to realize they have to run to catch up, or face failure in the wake of old-school IT infrastructures and policies. IT organizations have been forced to get creative and build amazing infrastructures for pennies, or fail in the wake of the user pressure being generated from the laboratories. Converged infrastructure is the present and the future for biomedical, clinical, and life sciences research. In this talk, I'll cover the IT challenges in life sciences, how and where they are being met, and talk about the near-future trends in IT infrastructure, services, and informatics and how they will affect medical discoveries in the next 5-10 years.
TRANSCRIPT
![Page 1: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/1.jpg)
From the Benchtop to the Datacenter
IT and Converged Infrastructure in Life Science Research
Leverage Big Data ’14, San Diego, CA; May 2014
Thursday, May 22, 14
![Page 2: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/2.jpg)
Ari E. Berman, Ph.D.
Who am I?
Director of Government Services, Principal Investigator
I’m a fallen scientist - Ph.D. Molecular Biology, Neuroscience, Bioinformatics
I’m an HPC/Infrastructure geek - 14 years
I help enable science!
![Page 3: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/3.jpg)
BioTeam
‣ Independent consulting shop
‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done
‣ Infrastructure, Informatics, Software Development, Cross-disciplinary Assessments
‣ 11+ years bridging the “gap” between science, IT & high performance computing
‣ Our wide-ranging work is what gets us invited to speak at events like this ...
![Page 10: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/10.jpg)
What do we do?
BioTeam
Laboratory Knowledge
Converged Solution
![Page 12: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/12.jpg)
Mostly work in Life Sciences
Our domain coverage
• Government
• Universities
• Big pharma
• Biotech
• Private institutes
• Diagnostic startups
• Oil and Gas
• Geospatial
• Hollywood Animation
• Law Enforcement
![Page 13: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/13.jpg)
A ton of .gov work on a massive scale
My Recent Work...
‣ National Institutes of Health
‣ US Department of Agriculture (USDA)
‣ Navy
‣ Centers for Disease Control
![Page 14: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/14.jpg)
OK, so why are we here talking to you?
![Page 15: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/15.jpg)
We have a unique perspective across much of life sciences
We’ve noticed a few things
‣ Big Data has arrived in Life Sciences
‣ Data is being generated at unprecedented rates
‣ Research and Biomedical Orgs were caught off guard
‣ IT running to catch up, limited budgets
‣ Money is tight, Orgs reluctant to invest in Bio-IT
25% of all Life Scientists will require HPC in 2015!
![Page 16: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/16.jpg)
It’s being made harder by lack of support
Research is hard
‣ Scientists are getting frustrated
‣ Stubborn, fight for what they need
‣ They will build a cluster under their desk if it gets the job done
‣ In general they will win against IT
![Page 17: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/17.jpg)
It’s a risky time to be doing Bio-IT
![Page 18: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/18.jpg)
Big Picture / Meta Issue
‣ HUGE revolution in the rate at which lab platforms are being redesigned, improved & refreshed
‣ IT not a part of the conversation, running to catch up
![Page 19: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/19.jpg)
Science progressing way faster than IT can refresh/change
The Central Problem Is ...
‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure
• Bench science is changing month-to-month ...
• ... while our IT infrastructure only gets refreshed every 2-7 years
‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...)
![Page 20: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/20.jpg)
The Central Problem Is ...
‣ The easy period is over
‣ 5 years ago - under the desk solutions worked
‣ This doesn’t work any more!
![Page 21: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/21.jpg)
The new normal for informatics
![Page 22: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/22.jpg)
And a related problem ...
‣ Vast amounts of data are cheap and easy to acquire
‣ Growth rate of data creation exceeds rate of storage improvements
‣ Not just a storage lifecycle problem.
• This data *moves* and often needs to be shared among multiple entities and providers
• ... ideally without punching holes in your firewall or consuming all available internet bandwidth
![Page 23: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/23.jpg)
If you get it wrong ...
‣ Lost opportunity
‣ Missing capability
‣ Frustrated & very vocal scientific staff
‣ Problems in recruiting, retention, publication & product development
![Page 24: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/24.jpg)
What are the drivers in Bio-IT today?
![Page 25: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/25.jpg)
Genomics: Next Generation Sequencing (NGS)
![Page 26: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/26.jpg)
It’s like the hard drive of life
The big deal about DNA
‣ DNA is the template of life
‣ DNA is read --> RNA
‣ RNA is read --> Proteins
‣ Proteins are the functional machinery that make life possible
‣ Understanding the template = understanding basis for disease
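The DNA → RNA → protein flow in the bullets above can be illustrated with a toy example. This is a minimal sketch, not a bioinformatics tool: the input sequence is made up, it assumes the coding-strand convention (transcription is modeled as swapping T for U), and the codon table holds only the three codons this example needs.

```python
# Toy illustration of DNA -> RNA -> protein.
# Assumption: `dna` is the coding strand, so transcription is just T -> U.
# CODONS is deliberately tiny; the real genetic code has 64 entries.
CODONS = {"AUG": "Met", "GCC": "Ala", "UAA": "STOP"}


def transcribe(dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> list[str]:
    """Read codons three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        residue = CODONS[rna[i:i + 3]]
        if residue == "STOP":
            break
        protein.append(residue)
    return protein


rna = transcribe("ATGGCCTAA")
print(rna)             # AUGGCCUAA
print(translate(rna))  # ['Met', 'Ala']
```

A variation in the DNA template changes the codons, which can change the protein; that link is why reading the template matters for understanding disease.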
![Page 27: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/27.jpg)
Sequencing by Synthesis
How does NGS work?
![Page 28: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/28.jpg)
Reference assembly, variant calling
How does NGS work?
![Page 31: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/31.jpg)
Gateway to personalized medicine
The Human Genome
‣ 3.2 Gbp
‣ 23 chromosomes
‣ ~21,000 genes
‣ Over 55M known variations
![Page 32: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/32.jpg)
...and why NGS is the primary driver
The Problem...
‣ Sequencers are now relatively cheap and fast
‣ Some can generate a human genome in 18 hours, for $2,000
‣ Everyone is doing it
‣ Can generate 3TB of data in that time
‣ First genome took 13 years and $2.7B to complete
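To put the numbers above in infrastructure terms, the following back-of-the-envelope sketch works out the sustained write rate one sequencer run implies and how far cost and turnaround have moved since the first genome. All figures come from the slide; decimal units (1 TB = 1e12 bytes) and a 365.25-day year are assumptions.

```python
# Figures taken from the slide; decimal units (1 TB = 1e12 bytes) are an assumption.
RUN_HOURS = 18
RUN_TERABYTES = 3
RUN_COST_USD = 2_000
FIRST_GENOME_YEARS = 13
FIRST_GENOME_COST_USD = 2.7e9


def sustained_mb_per_s(terabytes: float, hours: float) -> float:
    """Average write rate needed to keep up with one instrument run."""
    return terabytes * 1e12 / (hours * 3600) / 1e6


rate = sustained_mb_per_s(RUN_TERABYTES, RUN_HOURS)
cost_ratio = FIRST_GENOME_COST_USD / RUN_COST_USD
time_ratio = FIRST_GENOME_YEARS * 365.25 * 24 / RUN_HOURS

print(f"{rate:.1f} MB/s per sequencer")  # 46.3 MB/s per sequencer
print(f"{cost_ratio:,.0f}x cheaper")     # 1,350,000x cheaper
print(f"{time_ratio:,.0f}x faster")      # 6,331x faster
```

One instrument is manageable; the storage problem comes from multiplying that rate by every sequencer in every lab.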
![Page 33: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/33.jpg)
Other Methodologies Not Far Behind
![Page 34: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/34.jpg)
High-throughput Imaging
‣ Robotics screening millions of compounds on live cells 24/7
• Not as much data as genomics in volume, but just as complex
• Data volumes in the 10’s of TB/week
‣ Confocal Imaging
• Scanning 100’s of tissue sections/week, each with 10’s of scans, each with 20-40 layers and multiple fluorescent channels
• Data volumes in the 1’s - 10’s of TB/week
![Page 35: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/35.jpg)
High-power, dense detector MRI scanners in use 24/7 at large research hospitals
High-res medical imaging
‣ Creating 3D models of brains, comparing large datasets
‣ Using those models to perform detailed neurosurgery with real-time analytic feedback from supercomputer in the OR (cool stuff)
‣ Also generates 10’s of TB/week
![Page 36: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/36.jpg)
This is a huge problem
‣ Causing a literal deluge of data, in the 10’s of Petabytes
‣ NIH generating 1.5PB of data/month
‣ First real case in life science where 100Gb networking might really be needed
‣ But, not enough storage or compute
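As a rough check on the networking claim above, this sketch converts the quoted 1.5PB/month into an average line rate. The 30-day month and decimal petabytes are assumptions, and the result is only an average over the whole month.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def average_gbit_per_s(petabytes_per_month: float) -> float:
    """Average line rate needed to move a monthly data volume exactly once."""
    bits = petabytes_per_month * 1e15 * 8
    return bits / SECONDS_PER_MONTH / 1e9


avg = average_gbit_per_s(1.5)
print(f"{avg:.2f} Gbit/s sustained")  # 4.63 Gbit/s sustained
```

An average near 5 Gbit/s looks modest, but it assumes perfectly smooth traffic and a single copy. Instrument bursts, repeated reads for analysis and replication would plausibly push peak demand well above that, which is where faster links start to matter.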
![Page 37: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/37.jpg)
And the problem is getting even bigger
![Page 38: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/38.jpg)
We have them all
File & Data Types
‣ Massive text files
‣ Massive binary files
‣ Flatfile ‘databases’
‣ Spreadsheets everywhere
‣ Directories w/ 6 million files
‣ Large files: 600GB+
‣ Small files: 30KB or smaller
![Page 39: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/39.jpg)
Application characteristics
‣ Mostly SMP/threaded apps, performance bound by IO and/or RAM
‣ Hundreds of apps, codes & toolkits
‣ 1TB - 2TB RAM “High Memory” nodes becoming essential
‣ Lots of Perl/Python/R
‣ MPI is rare
• Well written MPI is even rarer
‣ Few MPI apps actually benefit from expensive low-latency interconnects*
• *Chemistry, modeling and structure work is the exception
![Page 40: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/40.jpg)
Why, giant meta-analyses, of course
What to do with all that data?
‣ Typical problem across all of big data: how do you use it?
‣ In life sciences: no real standards for data formats
‣ Data scattered all over, despite push for Data Commons
‣ Not always accessible
‣ Even if you have all the data, combining it is a real challenge
![Page 41: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/41.jpg)
Scientists don’t like to share (really!)
A Compounding Problem...
‣ The fear:
• If someone sees data before it is published, they might steal it and publish it themselves (getting scooped)
‣ Causes:
• Long time to publication
• Outdated methods of assigning scientific credit
• Not properly incentivized
![Page 43: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/43.jpg)
Sharing required
A Problem for Data Commons
‣ Data piling up
‣ Bad network infrastructures
‣ Few central analytics platforms
‣ Wild-west file formats/algorithms
‣ No sharing
Hyperscale analytics will only work if the data is accessible!
![Page 44: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/44.jpg)
But wait! There’s more!
![Page 45: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/45.jpg)
Local IT inhibiting research
‣ IT originally developed for business administration
‣ Tight policies, security at the expense of performance
‣ IT determines resources, not users
‣ Result: lack of institutional investment and lack of talent
‣ Doesn’t work for science
![Page 46: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/46.jpg)
IT staff become a part of the research projects
Successful Bio-IT orgs
‣ When IT and scientists are at the table together, mutual understanding happens
‣ IT is driven to innovate to accommodate science
‣ Scientists better understand resourcing
‣ Roadblocks are removed, science is enabled
![Page 47: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/47.jpg)
How are these issues being solved?
![Page 48: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/48.jpg)
Compute related design patterns largely static
Core Compute
‣ Linux compute clusters are the baseline compute platform
‣ Even our lab instruments know how to submit jobs to common HPC cluster schedulers
‣ Compute is not hard. It’s a commodity that is easy to acquire & deploy in 2014
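As an illustration of an instrument handing work to a cluster, here is a sketch that builds a Slurm `sbatch` command line. Slurm is an assumption (the talk names no particular scheduler), and the job name, resource sizes, run directory and `align_reads.sh` script are all hypothetical placeholders.

```python
import shlex


def build_sbatch_command(job_name: str, cpus: int, mem_gb: int, command: str) -> list[str]:
    """Build (but do not run) an sbatch invocation for a one-line job."""
    return [
        "sbatch",
        f"--job-name={job_name}",
        f"--cpus-per-task={cpus}",
        f"--mem={mem_gb}G",
        f"--wrap={command}",
    ]


# Hypothetical post-run hook: the run directory and script are placeholders.
argv = build_sbatch_command("run42-align", 16, 64, "./align_reads.sh /data/run42")
print(shlex.join(argv))
```

On a host where Slurm is installed, the list could be handed to `subprocess.run(argv, check=True)`; building it separately keeps the sketch runnable without a cluster.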
![Page 49: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/49.jpg)
Defensive hedge against Big Data / HDFS
Compute: Local Disk is Back
‣ We’ve started to see organizations move away from blade servers and 1U pizza box enclosures for HPC
‣ The “new normal” may be 4U enclosures with massive local disk spindles - not occupied, just available
‣ Why? Hadoop & Big Data
‣ This is a defensive hedge against future HDFS or similar requirements
• Remember the ‘meta’ problem - science is changing far faster than we can refresh IT. This is a defensive future-proofing play.
‣ Hardcore Hadoop rigs sometimes operate at 1:1 ratio between core count and disk count
![Page 50: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/50.jpg)
New and refreshed HPC systems running many node types
Compute: Huge trend in ‘diversity’
‣ Accelerated trend since at least 2012 ...
• HPC compute resources no longer homogeneous; many types and flavors now deployed in single HPC stacks
‣ Newer clusters mix and match node types to fit the known use cases:
• GPU nodes for compute
• GPU nodes for visualization
• Large memory nodes (512GB +)
• Very Large memory nodes (1TB +)
• ‘Fat’ nodes with many CPU cores
• ‘Thin’ nodes with super-fast CPUs
• Analytic nodes with SSD, FusionIO, flash or large local disk for ‘big data’ tasks
![Page 51: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/51.jpg)
Most use Amazon Web Services
Big push for cloud use
‣ Many Orgs are pushing for cloud
‣ Unsupported scientists end up using cloud
‣ It’s fast, flexible, affordable, if done right
‣ If done wrong, way more expensive than local compute
‣ Biggest problem: getting data to it!
![Page 52: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/52.jpg)
Storage & Data Management
‣ LifeSci core requirement:
• Shared, simultaneous read/write access across many instruments, desktops & HPC silos
‣ NAS = “easiest” option
• Scale Out NAS products are the mainstream standard
‣ Parallel & Distributed storage for edge cases and large organizations with known performance needs
• Becoming much more common: GPFS has taken hold in LifeSci
![Page 53: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/53.jpg)
Storage & Data Management
‣ Storage & data mgmt. is the #1 infrastructure headache in life science environments
‣ Most labs need “peta capable” storage due to unpredictable future
• Only a small % will actually hit 1PB
• Often forced to trade away performance in order to obtain capacity
‣ Object stores, ZFS and commodity “Nexentastor-style” methods are making significant inroads
![Page 54: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/54.jpg)
Data Movement & Data Sharing
‣ Peta-scale data movement needs
• Within an organization
• To/from collaborators
• To/from suppliers
• To/from public data repos
‣ Peta-scale data sharing needs
• Collaborators and partners may be all over the world
![Page 55: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/55.jpg)
Physical & Network
We Have Both Ingest Problems
‣ Significant physical ingest occurring in Life Science
• Standard media: naked SATA drives shipped via FedEx
‣ Cliche example:
• 30 genomes outsourced means 30 drives will soon be sitting in your mail pile
‣ Organizations often use similar methods to freight data between buildings and among geographic sites
![Page 56: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/56.jpg)
Physical Ingest Just Plain Nasty
‣ Most common high-speed network: FedEx
‣ Easy to talk about in theory
‣ Seems “easy” to scientists and even IT at first glance
‣ Really really nasty in practice
• Incredibly time consuming
• Significant operational burden
• Easy to do badly / lose data
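A quick calculation shows both why shipping drives looks attractive and why it hurts in practice. The 30 drives come from the "30 genomes" example; the 4 TB drive size, 24-hour delivery and 150 MB/s copy rate are assumptions chosen only for illustration.

```python
def shipment_gbit_per_s(drives: int, tb_per_drive: float, transit_hours: float) -> float:
    """Effective bandwidth of a box of drives, ignoring copy-on/copy-off time."""
    bits = drives * tb_per_drive * 1e12 * 8
    return bits / (transit_hours * 3600) / 1e9


def ingest_hours(drives: int, tb_per_drive: float, copy_mb_per_s: float) -> float:
    """Hours spent copying the drives off one at a time after they arrive."""
    return drives * tb_per_drive * 1e12 / (copy_mb_per_s * 1e6) / 3600


# Assumed: 4 TB drives, overnight delivery, 150 MB/s sequential copy per drive.
print(f"{shipment_gbit_per_s(30, 4, 24):.1f} Gbit/s in transit")  # 11.1 Gbit/s in transit
print(f"{ingest_hours(30, 4, 150):.0f} hours to copy off")        # 222 hours to copy off
```

Under these assumptions the box outruns a 10-Gig link while it is on the truck, but unloading it serially takes more than nine days of hands-on work, which matches the "time consuming" and "operational burden" points above.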
![Page 57: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/57.jpg)
Networking
‣ Major 2014 focus
‣ May surpass storage as our #1 infrastructure headache
‣ Why?
• Petascale storage meaningless if you can’t access/move it
• 10-Gig, 40-Gig and 100-Gig networking will force significant changes elsewhere in the ‘bio-IT’ infrastructure
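To make "meaningless if you can’t access/move it" concrete, this sketch estimates how long one petabyte takes to cross each of the link speeds named above. It assumes decimal units and, by default, a link running at full wire speed with no protocol overhead; the `efficiency` parameter is a hypothetical knob for more pessimistic estimates.

```python
def transfer_days(petabytes: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Days needed to move a dataset over a link at a given utilization."""
    seconds = petabytes * 1e15 * 8 / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 86400


for speed in (10, 40, 100):
    print(f"{speed:>3} Gbit/s: {transfer_days(1, speed):.1f} days")
#  10 Gbit/s: 9.3 days
#  40 Gbit/s: 2.3 days
# 100 Gbit/s: 0.9 days
```

Calling `transfer_days(1, 10, efficiency=0.5)` doubles the first figure, a reminder that real paths through firewalls and shared links rarely reach wire speed.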
![Page 58: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/58.jpg)
Network: Speed @ Core and Edge
‣ Remember 2004 when research storage requirements started to dwarf what the enterprise was using?
‣ Same thing is happening now for networking
‣ Research core, edge and top-of-rack networking speeds may exceed what the rest of the organization has standardized on
![Page 59: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/59.jpg)
Network: ‘ScienceDMZ’
‣ Very fast “low-friction” network links and paths with security policy and enforcement specific to scientific workflows
‣ “ScienceDMZ” concept is real and necessary
‣ BioTeam will be building them in 2014 and beyond
‣ Central premise:
• Legacy firewall, network and security methods architected for “many small data flows” use cases
• Not built to handle smaller #s of massive data flows
• Also very hard to deploy ‘traditional’ security gear on 10 Gigabit and faster networks
‣ More details, background & documents at http://fasterdata.es.net/science-dmz/
[Figure: 10GE links carrying background traffic or competing bursts alongside DTN traffic with wire-speed bursts]
![Page 60: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/60.jpg)
Simple Science DMZ:
Image source: “The Science DMZ: Introduction & Architecture” -- ESnet
![Page 61: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/61.jpg)
What does it all mean?
![Page 62: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/62.jpg)
What just happened?
Laboratory Knowledge
![Page 68: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/68.jpg)
Converged Infrastructure
The meta issue
‣ Individual technologies, each used successfully on its own, are fine
‣ Unless they all work together as a unified solution, it all means nothing
‣ Creating an end-to-end solution based on the use case (science!): converged infrastructure
![Page 70: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/70.jpg)
It’s what we do
Convergence
Laboratory Knowledge
Converged Solution
![Page 72: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/72.jpg)
What’s the future look like?
![Page 73: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/73.jpg)
Small desktop appliances; gateways
Local Compute Appliances
‣ BioTeam is developing appliances: affordable compute for labs without IT resources
‣ More of these will make their way into the field
‣ Act as gateways to larger resources
‣ Pre-packaged with best-practices, low IT burden
![Page 74: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/74.jpg)
Also known as hybrid clouds
Hybrid HPC
‣ Relatively new idea
• small local footprint
• large, dynamic, scalable, orchestrated public cloud component
‣ DevOps is key to making this work
‣ High-speed network to public cloud required
‣ Software interface layer acting as the mediator between local and public resources
‣ Good for tight budgets, has to be done right to work
‣ Not many working examples yet
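The "software interface layer acting as the mediator" can be pictured with a very small sketch. This is a hypothetical greedy policy for illustration only, not a description of any existing product: jobs run on the local footprint while cores remain and burst to the public cloud otherwise. The `Job` class, job names and core counts are all made up.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int


def place_jobs(jobs: list[Job], local_free_cores: int) -> dict[str, str]:
    """Greedy placement: run locally while cores remain, otherwise burst to cloud."""
    placement = {}
    for job in jobs:
        if job.cores <= local_free_cores:
            placement[job.name] = "local"
            local_free_cores -= job.cores
        else:
            placement[job.name] = "cloud"
    return placement


queue = [Job("align", 32), Job("assemble", 64), Job("qc", 8)]
print(place_jobs(queue, local_free_cores=48))
# {'align': 'local', 'assemble': 'cloud', 'qc': 'local'}
```

A real mediator would also have to weigh data location, transfer time and cloud cost, which is where the high-speed network and DevOps requirements above come in.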
![Page 75: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/75.jpg)
Commodity Science DMZs
Scientific Networks
‣ As data grows, need clean/fast networks
‣ Security needs to be addressed, but performance is key
‣ Internet2 enables direct, clean, long-distance, affordable fast networking
![Page 76: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/76.jpg)
Reduce guessing game on scientists’ part
Commoditized Infrastructure
‣ Building blocks matched to use cases
‣ Blocks designed as converged infrastructure
‣ Let scientists get back to the science
‣ Make the infrastructure serve the science
![Page 77: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/77.jpg)
Central storage of knowledge with compute
Data Commons
‣ Common structure for data storage and indexing (a cloud?)
‣ Associated compute for analytics
‣ Development platform for application development (PaaS)
‣ Make discovery more possible
![Page 78: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/78.jpg)
[Figure: Source 1, Source 2 and Source 3 connect to a Data Integration Platform, which serves Analytics/Data Retrieval]
Don’t move data, move compute
Data Integration Systems
‣ PaaS is a meta-index and development platform
‣ Index knows where the data is
‣ System reads data as needed from remote sources
‣ Requires good networks
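The meta-index idea in the bullets above can be sketched in a few lines. This is a hypothetical in-memory model, not an actual platform API: the index records only where each dataset lives and fetches it from that source on demand. The `MetaIndex` class, source names and dataset IDs are invented, and the lambda readers stand in for real network calls.

```python
class MetaIndex:
    """Maps dataset IDs to the source that holds them; stores no data itself."""

    def __init__(self):
        self._locations = {}  # dataset_id -> source name
        self._readers = {}    # source name -> callable(dataset_id) -> bytes

    def register_source(self, name, reader):
        self._readers[name] = reader

    def record(self, dataset_id, source_name):
        self._locations[dataset_id] = source_name

    def locate(self, dataset_id):
        return self._locations[dataset_id]

    def read(self, dataset_id):
        """Fetch on demand from wherever the data already lives."""
        return self._readers[self.locate(dataset_id)](dataset_id)


# Stand-in readers; a real system would issue network requests here.
index = MetaIndex()
index.register_source("source1", lambda ds: f"variants for {ds}".encode())
index.register_source("source2", lambda ds: f"images for {ds}".encode())
index.record("sample-001", "source1")
index.record("scan-17", "source2")

print(index.locate("scan-17"))   # source2
print(index.read("sample-001"))  # b'variants for sample-001'
```

Because every `read` crosses the network to a remote source, the design only works as well as the links underneath it, which is the point of the last bullet.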
![Page 80: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/80.jpg)
Not so distant future!
Fully converged laboratories
Laboratory Knowledge
Converged Solution
New Bottleneck!
![Page 82: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/82.jpg)
Not so distant future!
Fully converged laboratories
Laboratory
Knowledge
![Page 83: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/83.jpg)
end; Thanks!
Slides can be found at http://www.slideshare.com/arieberman