![Page 1: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/1.jpg)
From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Science Research
Leverage Big Data ’14, San Diego, CA; May 2014
Thursday, May 22, 14
![Page 2: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/2.jpg)
Ari E. Berman, Ph.D.
Who am I?
Director of Government Services, Principal Investigator
I’m a fallen scientist - Ph.D. Molecular Biology, Neuroscience, Bioinformatics
I’m an HPC/Infrastructure geek - 14 years
I help enable science!
![Page 3: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/3.jpg)
BioTeam
‣ Independent consulting shop
‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done
‣ Infrastructure, Informatics, Software Development, Cross-disciplinary Assessments
‣ 11+ years bridging the “gap” between science, IT & high performance computing
‣ Our wide-ranging work is what gets us invited to speak at events like this ...
![Page 4: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/4.jpg)
What do we do?
BioTeam
Laboratory Knowledge
Converged Solution
![Page 12: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/12.jpg)
Mostly work in Life Sciences
Our domain coverage
• Government
• Universities
• Big pharma
• Biotech
• Private institutes
• Diagnostic startups
• Oil and Gas
• Geospatial
• Hollywood Animation
• Law Enforcement
![Page 13: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/13.jpg)
A ton of .gov work on a massive scale
My Recent Work...
‣ National Institutes of Health
‣ US Department of Agriculture (USDA)
‣ Navy
‣ Centers for Disease Control
![Page 14: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/14.jpg)
OK, so why are we here talking to you?
![Page 15: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/15.jpg)
We have a unique perspective across much of life sciences
We’ve noticed a few things
‣ Big Data has arrived in Life Sciences
‣ Data is being generated at unprecedented rates
‣ Research and Biomedical Orgs were caught off guard
‣ IT running to catch up, limited budgets
‣ Money is tight, Orgs reluctant to invest in Bio-IT
25% of all Life Scientists will require HPC in 2015!
![Page 16: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/16.jpg)
It’s being made harder by lack of support
Research is hard
‣ Scientists are getting frustrated
‣ Stubborn, fight for what they need
‣ They will build a cluster under their desk if it gets the job done
‣ In general they will win against IT
![Page 17: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/17.jpg)
It’s a risky time to be doing Bio-IT
![Page 18: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/18.jpg)
Big Picture / Meta Issue
‣ HUGE revolution in the rate at which lab platforms are being redesigned, improved & refreshed
‣ IT not a part of the conversation, running to catch up
![Page 19: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/19.jpg)
Science progressing way faster than IT can refresh/change
The Central Problem Is ...
‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure
• Bench science is changing month-to-month ...
• ... while our IT infrastructure only gets refreshed every 2-7 years
‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...)
![Page 20: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/20.jpg)
The Central Problem Is ...
‣ The easy period is over
‣ 5 years ago - under the desk solutions worked
‣ This doesn’t work any more!
![Page 21: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/21.jpg)
The new normal for informatics
![Page 22: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/22.jpg)
And a related problem ...
‣ Vast amounts of data can be acquired cheaply and easily
‣ Growth rate of data creation exceeds rate of storage improvements
‣ Not just a storage lifecycle problem.
• This data *moves* and often needs to be shared among multiple entities and providers
• ... ideally without punching holes in your firewall or consuming all available internet bandwidth
![Page 23: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/23.jpg)
If you get it wrong ...
‣ Lost opportunity
‣ Missing capability
‣ Frustrated & very vocal scientific staff
‣ Problems in recruiting, retention, publication & product development
![Page 24: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/24.jpg)
It’s a risky time to be doing Bio-IT
What are the drivers in Bio-IT today?
![Page 25: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/25.jpg)
Genomics: Next Generation Sequencing (NGS)
![Page 26: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/26.jpg)
It’s like the hard drive of life
The big deal about DNA
‣ DNA is the template of life
‣ DNA is read --> RNA
‣ RNA is read --> Proteins
‣ Proteins are the functional machinery that make life possible
‣ Understanding the template = understanding basis for disease
![Page 27: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/27.jpg)
Sequencing by Synthesis
How does NGS work?
![Page 28: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/28.jpg)
Reference assembly, variant calling
How does NGS work?
![Page 31: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/31.jpg)
Gateway to personalized medicine
The Human Genome
‣ 3.2 Gbp
‣ 23 chromosome pairs
‣ ~21,000 genes
‣ Over 55M known variations
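A rough worked example of what 3.2 Gbp means on disk. The 2-bits-per-base packing is an assumption made for illustration (four bases fit in two bits), not something the slide states; real sequencer output is far larger because it carries quality scores, read names and many-fold coverage.

```python
# Sketch: raw size of a genome at a fixed-width encoding.
HUMAN_GENOME_BP = 3_200_000_000  # 3.2 Gbp, from the slide


def packed_size_bytes(base_pairs: int, bits_per_base: int = 2) -> float:
    """Bytes needed to store `base_pairs` bases at `bits_per_base` each.

    bits_per_base=2 is an illustrative assumption (A, C, G, T only).
    """
    return base_pairs * bits_per_base / 8


if __name__ == "__main__":
    size = packed_size_bytes(HUMAN_GENOME_BP)
    print(f"One packed copy of the sequence: {size / 1e9:.1f} GB")
```

The finished sequence itself is small; the storage problem comes from the raw instrument output needed to produce it.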
![Page 32: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/32.jpg)
...and why NGS is the primary driver
The Problem...
‣ Sequencers are now relatively cheap and fast
‣ Some can generate a human genome in 18 hours, for $2,000
‣ Everyone is doing it
‣ Can generate 3TB of data in that time
‣ First genome took 13 years and $2.7B to complete
![Page 33: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/33.jpg)
Other Methodologies Not Far Behind
![Page 34: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/34.jpg)
High-throughput Imaging
‣ Robotics screening millions of compounds on live cells 24/7
• Not as much data as genomics in volume, but just as complex
• Data volumes in the 10’s TB/week
‣ Confocal Imaging
• Scanning 100’s of tissue sections/week, each with 10’s of scans, each with 20-40 layers and multiple fluorescent channels
• Data volumes in the 1’s - 10’s TB/week
![Page 35: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/35.jpg)
High-power, dense detector MRI scanners in use 24/7 at large research hospitals
High-res medical imaging
‣ Creating 3D models of brains, comparing large datasets
‣ Using those models to perform detailed neurosurgery with real-time analytic feedback from supercomputer in the OR (cool stuff)
‣ Also generates 10’s of TB/week
![Page 36: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/36.jpg)
This is a huge problem
‣ Causing a deluge of data, in the 10’s of Petabytes
‣ NIH generating 1.5PB of data/month
‣ First real case in life science where 100Gb networking might really be needed
‣ But, not enough storage or compute
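To put the 1.5PB/month figure in network terms, the sketch below converts a monthly volume into the average line rate needed just to move it once. The 30-day month and decimal petabytes are assumptions; bursts, protocol overhead and second copies all push the real requirement higher.

```python
# Sketch: average line rate needed to move a given volume in a given time.
PB = 1e15                             # assumption: decimal petabytes
SECONDS_PER_MONTH = 30 * 24 * 3600    # assumption: 30-day month


def sustained_gbps(total_bytes: float, seconds: float) -> float:
    """Average rate in gigabits per second to move total_bytes in seconds."""
    return total_bytes * 8 / seconds / 1e9


if __name__ == "__main__":
    rate = sustained_gbps(1.5 * PB, SECONDS_PER_MONTH)
    print(f"1.5 PB/month averages {rate:.2f} Gb/s, around the clock")
```

Even as a flat average this is several Gb/s, which is one way to see why 10Gb links leave limited headroom once traffic gets bursty.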
![Page 37: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/37.jpg)
And the problem is getting even bigger
![Page 38: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/38.jpg)
We have them all
File & Data Types
‣ Massive text files
‣ Massive binary files
‣ Flatfile ‘databases’
‣ Spreadsheets everywhere
‣ Directories w/ 6 million files
‣ Large files: 600GB+
‣ Small files: 30kb or smaller
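One way to find out which of these extremes a given file system actually holds is to survey it. The sketch below walks a directory tree and counts files per size class; the thresholds come from the slide, while the function names and the "medium" bucket are hypothetical.

```python
import os
from collections import Counter

SMALL_MAX = 30 * 1024        # "30kb or smaller", from the slide
LARGE_MIN = 600 * 1000**3    # "600GB+", from the slide


def classify(size_bytes: int) -> str:
    """Bucket a file size into small / medium / large."""
    if size_bytes <= SMALL_MAX:
        return "small"
    if size_bytes >= LARGE_MIN:
        return "large"
    return "medium"


def survey(root: str) -> Counter:
    """Count files under root by size class, without following symlinks."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            counts[classify(size)] += 1
    return counts
```

Calling `survey("/path/to/project")` returns a `Counter` such as `{"small": ..., "medium": ...}`. On a directory with millions of entries the walk itself will be slow, which is part of the point.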
![Page 39: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/39.jpg)
Application characteristics
‣ Mostly SMP/threaded apps performance bound by IO and/or RAM
‣ Hundreds of apps, codes & toolkits
‣ 1TB - 2TB RAM “High Memory” nodes becoming essential
‣ Lots of Perl/Python/R
‣ MPI is rare
• Well written MPI is even rarer
‣ Few MPI apps actually benefit from expensive low-latency interconnects*
• *Chemistry, modeling and structure work is the exception
![Page 40: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/40.jpg)
Why, giant meta-analyses, of course
What to do with all that data?
‣ Typical problem across all of big data: how do you use it?
‣ In life sciences: no real standards of data formats
‣ Data scattered all over, despite push for Data Commons
‣ Not always accessible
‣ Even if you have it all, combining the data is a real challenge
![Page 41: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/41.jpg)
Scientists don’t like to share (really!)
A Compounding Problem...
‣ The fear:
• If someone sees data before it is published, they might steal it and publish it themselves (getting scooped)
‣ Causes:
• Long time to publication
• Outdated methods of assigning scientific credit
• Not properly incentivized
![Page 42: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/42.jpg)
Sharing required
A Problem for Data Commons
‣ Data piling up
‣ Bad network infrastructures
‣ Few central analytics platforms
‣ Wild-west file formats/algorithms
‣ No sharing
Hyperscale analytics will only work if the data is accessible!
![Page 44: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/44.jpg)
But wait! There’s more!
![Page 45: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/45.jpg)
Local IT inhibiting research
‣ IT originally developed for business administration
‣ Tight policies, security at the expense of performance
‣ IT determines resources, not users
‣ Result: lack of institutional investment and lack of talent
‣ Doesn’t work for science
![Page 46: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/46.jpg)
IT staff become a part of the research projects
Successful Bio-IT orgs
‣ When IT and scientists are at the table together, mutual understanding happens
‣ IT is driven to innovate to accommodate science
‣ Scientists better understand resourcing
‣ Roadblocks are removed, science is enabled
![Page 47: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/47.jpg)
How are these issues being solved?
![Page 48: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/48.jpg)
Compute related design patterns largely static
Core Compute
‣ Linux compute clusters are the baseline compute platform
‣ Even our lab instruments know how to submit jobs to common HPC cluster schedulers
‣ Compute is not hard. It’s a commodity that is easy to acquire & deploy in 2014
![Page 49: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/49.jpg)
Defensive hedge against Big Data / HDFS
Compute: Local Disk is Back
‣ We’ve started to see organizations move away from blade servers and 1U pizza box enclosures for HPC
‣ The “new normal” may be 4U enclosures with massive local disk spindles - not occupied, just available
‣ Why? Hadoop & Big Data
‣ This is a defensive hedge against future HDFS or similar requirements
• Remember the ‘meta’ problem - science is changing far faster than we can refresh IT. This is a defensive future-proofing play.
‣ Hardcore Hadoop rigs sometimes operate at 1:1 ratio between core count and disk count
![Page 50: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/50.jpg)
New and refreshed HPC systems running many node types
Compute: Huge trend in ‘diversity’
‣ Accelerated trend since at least 2012 ...
• HPC compute resources no longer homogeneous; many types and flavors now deployed in single HPC stacks
‣ Newer clusters mix-and-match to fit the known use cases:
• GPU nodes for compute
• GPU nodes for visualization
• Large memory nodes (512GB +)
• Very Large memory nodes (1TB +)
• ‘Fat’ nodes with many CPU cores
• ‘Thin’ nodes with super-fast CPUs
• Analytic nodes with SSD, FusionIO, flash or large local disk for ‘big data’ tasks
![Page 51: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/51.jpg)
Most use Amazon Web Services
Big push for cloud use
‣ Many Orgs are pushing for cloud
‣ Unsupported scientists end up using cloud
‣ It’s fast, flexible, affordable, if done right
‣ If done wrong, way more expensive than local compute
‣ Biggest problem: getting data to it!
![Page 52: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/52.jpg)
Storage & Data Management
‣ LifeSci core requirement:
• Shared, simultaneous read/write access across many instruments, desktops & HPC silos
‣ NAS = “easiest” option
• Scale Out NAS products are the mainstream standard
‣ Parallel & Distributed storage for edge cases and large organizations with known performance needs
• Becoming much more common: GPFS has taken hold in LifeSci
![Page 53: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/53.jpg)
Storage & Data Management
‣ Storage & data mgmt. is the #1 infrastructure headache in life science environments
‣ Most labs need “peta capable” storage due to unpredictable future
• Only a small % will actually hit 1PB
• Often forced to trade away performance in order to obtain capacity
‣ Object stores, ZFS and commodity “Nexentastor-style” methods are making significant inroads
![Page 54: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/54.jpg)
Data Movement & Data Sharing
‣ Peta-scale data movement needs
• Within an organization
• To/from collaborators
• To/from suppliers
• To/from public data repos
‣ Peta-scale data sharing needs
• Collaborators and partners may be all over the world
![Page 55: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/55.jpg)
Physical & Network
We Have Both Ingest Problems
‣ Significant physical ingest occurring in Life Science
• Standard media: naked SATA drives shipped via FedEx
‣ Cliche example:
• 30 genomes outsourced means 30 drives will soon be sitting in your mail pile
‣ Organizations often use similar methods to freight data between buildings and among geographic sites
![Page 56: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/56.jpg)
Physical Ingest Just Plain Nasty
‣ Most common high-speed network: FedEx
‣ Easy to talk about in theory
‣ Seems “easy” to scientists and even IT at first glance
‣ Really really nasty in practice
• Incredibly time consuming
• Significant operational burden
• Easy to do badly / lose data
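The gap between theory and practice can be put into numbers. The sketch below estimates the effective bandwidth of a box of drives, first counting only transit time and then adding per-drive handling. Only the 30-drive count comes from these slides; the drive capacity, transit time and handling hours are hypothetical inputs.

```python
# Sketch: effective "bandwidth" of shipping drives, with and without handling.
def shipment_gbps(drives: int, tb_per_drive: float,
                  transit_hours: float,
                  handling_hours_per_drive: float = 0.0) -> float:
    """Effective gigabits per second delivered by a shipment of drives.

    handling_hours_per_drive covers mounting, copying and verifying each
    drive one at a time (an assumed, serial workflow).
    """
    total_bits = drives * tb_per_drive * 1e12 * 8
    total_hours = transit_hours + drives * handling_hours_per_drive
    return total_bits / (total_hours * 3600) / 1e9


if __name__ == "__main__":
    # Hypothetical: 2 TB drives, 24 h in transit, 3 h of handling per drive.
    print(f"In theory:   {shipment_gbps(30, 2, 24):.2f} Gb/s")
    print(f"In practice: {shipment_gbps(30, 2, 24, 3):.2f} Gb/s")
```

Under these assumed numbers the shipment looks like a multi-gigabit link on paper and loses most of that once someone has to ingest every drive by hand.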
![Page 57: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/57.jpg)
Networking
‣ Major 2014 focus
‣ May surpass storage as our #1 infrastructure headache
‣ Why?
• Petascale storage meaningless if you can’t access/move it
• 10-Gig, 40-Gig and 100-Gig networking will force significant changes elsewhere in the ‘bio-IT’ infrastructure
![Page 58: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/58.jpg)
Network: Speed @ Core and Edge
‣ Remember 2004 when research storage requirements started to dwarf what the enterprise was using?
‣ Same thing is happening now for networking
‣ Research core, edge and top-of-rack networking speeds may exceed what the rest of the organization has standardized on
![Page 59: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/59.jpg)
Network: ‘ScienceDMZ’
‣ Very fast “low-friction” network links and paths with security policy and enforcement specific to scientific workflows
‣ “ScienceDMZ” concept is real and necessary
‣ BioTeam will be building them in 2014 and beyond
‣ Central premise:
• Legacy firewall, network and security methods architected for “many small data flows” use cases
• Not built to handle smaller #s of massive data flows
• Also very hard to deploy ‘traditional’ security gear on 10Gigabit and faster networks
‣ More details, background & documents at http://fasterdata.es.net/science-dmz/
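One reason a single massive flow behaves differently from many small ones is the bandwidth-delay product: to keep a long path full, a sender must have bandwidth × round-trip time of data in flight. This is standard networking arithmetic rather than something taken from the slides, and the 50 ms round-trip time below is a hypothetical value.

```python
# Sketch: bandwidth-delay product for one flow on a long, fast path.
def bdp_bytes(link_gbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to fill a link of link_gbps at rtt_ms."""
    bits_in_flight = link_gbps * 1e9 * (rtt_ms / 1e3)
    return bits_in_flight / 8


if __name__ == "__main__":
    # Hypothetical path: 10GE end to end, 50 ms round trip.
    need = bdp_bytes(10, 50)
    print(f"Data in flight to fill the link: {need / 1e6:.1f} MB")
```

Any device in the path that cannot buffer or forward at that scale caps the whole transfer, which is the motivation for giving large science flows their own low-friction path.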
[Figure: background traffic or competing bursts and DTN traffic with wire-speed bursts, on 10GE links]
![Page 60: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/60.jpg)
Simple Science DMZ:
Image source: “The Science DMZ: Introduction & Architecture” -- ESnet
![Page 61: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/61.jpg)
What does it all mean?
![Page 62: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/62.jpg)
What just happened?
Laboratory Knowledge
![Page 68: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/68.jpg)
Converged Infrastructure
The meta issue
‣ Individual technologies and their general successful use are fine
‣ Unless they all work together as a unified solution, it all means nothing
‣ Creating an end-to-end solution based on the use case (science!): converged infrastructure
![Page 69: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/69.jpg)
It’s what we do
Convergence
Laboratory Knowledge
Converged Solution
![Page 72: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/72.jpg)
What’s the future look like?
![Page 73: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/73.jpg)
Small desktop appliances; gateways
Local Compute Appliances
‣ BioTeam is developing appliances: affordable compute for labs without IT resources
‣ More of these will make their way into the field
‣ Act as gateways to larger resources
‣ Pre-packaged with best-practices, low IT burden
![Page 74: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/74.jpg)
Also known as hybrid clouds
Hybrid HPC
‣ Relatively new idea
• small local footprint
• large, dynamic, scalable, orchestrated public cloud component
‣ DevOps is key to making this work
‣ High-speed network to public cloud required
‣ Software interface layer acting as the mediator between local and public resources
‣ Good for tight budgets, has to be done right to work
‣ Not many working examples yet
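A minimal sketch of what the software interface layer might do: fill the small local footprint first and burst everything else to the public cloud. The class and method names are hypothetical, and a real mediator would also have to weigh data locality, cost and queue times.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int


class HybridDispatcher:
    """Hypothetical mediator between local and public-cloud resources."""

    def __init__(self, local_cores: int) -> None:
        self.free_local = local_cores

    def place(self, job: Job) -> str:
        """Return 'local' if the job fits on free local cores, else 'cloud'."""
        if job.cores <= self.free_local:
            self.free_local -= job.cores
            return "local"
        return "cloud"

    def release(self, job: Job) -> None:
        """Give a finished local job's cores back to the local pool."""
        self.free_local += job.cores


if __name__ == "__main__":
    dispatcher = HybridDispatcher(local_cores=16)
    for job in (Job("align", 12), Job("assemble", 8), Job("qc", 4)):
        print(job.name, "->", dispatcher.place(job))
```

The placement policy here is deliberately naive; the point is only that one layer owns the decision so users submit work the same way regardless of where it runs.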
![Page 75: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/75.jpg)
Commodity Science DMZs
Scientific Networks
‣ As data grows, need clean/fast networks
‣ Security needs to be addressed, but performance is key
‣ Internet2 enables direct, clean, long-distance, affordable fast networking
![Page 76: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/76.jpg)
Reduce guessing game on scientists’ part
Commoditized Infrastructure
‣ Building blocks matched to use cases
‣ Blocks designed as converged infrastructure
‣ Let scientists get back to the science
‣ Make the infrastructure serve the science
![Page 77: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/77.jpg)
Central storage of knowledge with compute
Data Commons
‣ Common structure for data storage and indexing (a cloud?)
‣ Associated compute for analytics
‣ Development platform for application development (PaaS)
‣ Make discovery more possible
![Page 78: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/78.jpg)
[Diagram: Data Integration Platform, with Source 1, Source 2, Source 3 and Analytics/Data Retrieval]
Don’t move data, move compute
Data Integration Systems
‣ PaaS is a meta-index and development platform
‣ Index knows where the data is
‣ System reads data as needed from remote sources
‣ Requires good networks
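The meta-index idea can be sketched in a few lines: the index stores where each dataset lives and how to read it, and records are pulled from the source only when a query asks for them. Everything named here is hypothetical, and the remote sources are stood in for by in-memory lists.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = dict
Reader = Callable[[], Iterable[Record]]


class MetaIndex:
    """Hypothetical meta-index: knows where data is, reads it on demand."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}

    def register(self, dataset_id: str, reader: Reader) -> None:
        """Record how to reach a dataset without copying any of its data."""
        self._readers[dataset_id] = reader

    def query(self, dataset_id: str,
              predicate: Callable[[Record], bool]) -> Iterator[Record]:
        """Stream matching records from the source at query time."""
        for record in self._readers[dataset_id]():
            if predicate(record):
                yield record


if __name__ == "__main__":
    index = MetaIndex()
    # Stand-in for a remote source; a real reader would go over the network.
    index.register("source-1", lambda: [{"gene": "BRCA1"}, {"gene": "TP53"}])
    print(list(index.query("source-1", lambda r: r["gene"] == "TP53")))
```

Because every query turns into remote reads, the quality of the network between the platform and its sources sets the ceiling on performance.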
![Page 79: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/79.jpg)
Not so distant future!
Fully converged laboratories
Laboratory Knowledge
Converged Solution
New Bottleneck!
![Page 81: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/81.jpg)
Not so distant future!
Fully converged laboratories
Laboratory
Knowledge
![Page 83: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences](https://reader034.vdocuments.us/reader034/viewer/2022051314/558b07d3d8b42abd468b45b2/html5/thumbnails/83.jpg)
end; Thanks!
Slides can be found at http://www.slideshare.com/arieberman