![Page 1: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/1.jpg)
High-Performance Networking Use Cases in Life Sciences
1
2014 Internet2 Technology Exchange; Indianapolis, IN
Slides available at http://www.slideshare.net/arieberman
![Page 2: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/2.jpg)
Who am I?
2
Director of Government Services, Principal Investigator
I’m a fallen scientist - Ph.D. Molecular Biology, Neuroscience, Bioinformatics
I’m an HPC/Infrastructure geek - 15 years
I help enable science!
I’m Ari
![Page 3: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/3.jpg)
3
BioTeam
‣ Independent consulting shop
‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done
‣ Infrastructure, Informatics, Software Development, Cross-disciplinary Assessments
‣ 11+ years bridging the “gap” between science, IT & high performance computing
‣ Our wide-ranging work is what gets us invited to speak at events like this ...
![Page 4: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/4.jpg)
What do we do?
BioTeam
4
Laboratory Knowledge
Converged Solution
![Page 12: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/12.jpg)
Mostly work in Life Sciences
Our domain coverage
• Government
• Universities
• Big pharma
• Biotech
• Private institutes
• Diagnostic startups
• Oil and Gas
• Geospatial
• Hollywood Animation
• Law Enforcement
5
![Page 13: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/13.jpg)
6
OK, so why am I here talking to you?
![Page 14: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/14.jpg)
We have a unique perspective across much of life sciences
We’ve noticed a few things
‣ Big Data has arrived in Life Sciences
‣ Data is being generated at unprecedented rates
‣ Research and Biomedical Orgs were caught off guard
‣ IT running to catch up, limited budgets
‣ Money is tight, Orgs reluctant to invest in Bio-IT
7
25% of all Life Scientists will require HPC in 2015!
![Page 15: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/15.jpg)
8
Big Picture / Meta Issue
‣ HUGE revolution in the rate at which lab platforms are being redesigned, improved & refreshed
‣ IT not a part of the conversation, running to catch up
![Page 16: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/16.jpg)
Science progressing way faster than IT can refresh/change
The Central Problem Is ...
‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure
  • Bench science is changing month-to-month ...
  • ... while our IT infrastructure only gets refreshed every 2-7 years
‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...)
9
![Page 17: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/17.jpg)
10
It’s a risky time to be doing Bio-IT
11
What are the drivers in Bio-IT today?
![Page 18: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/18.jpg)
11
Genomics: Next Generation Sequencing (NGS)
![Page 19: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/19.jpg)
It’s like the hard drive of life
12
The big deal about DNA
‣ DNA is the template of life
‣ DNA is read --> RNA
‣ RNA is read --> Proteins
‣ Proteins are the functional machinery that make life possible
‣ Understanding the template = understanding basis for disease
![Page 20: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/20.jpg)
Sequencing by Synthesis
How does NGS work?
13
![Page 21: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/21.jpg)
Reference assembly, variant calling
How does NGS work?
14
![Page 24: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/24.jpg)
Gateway to personalized medicine
The Human Genome
‣ 3.2 Gbp
‣ 23 chromosomes
‣ ~21,000 genes
‣ Over 55M known variations
15
![Page 25: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/25.jpg)
...and why NGS is the primary driver
16
The Problem...
‣ Sequencers are now relatively cheap and fast
‣ Some can generate a human genome in 18 hours, for $2,000
‣ Everyone is doing it
‣ Can generate 3TB of data in that time
‣ First genome took 13 years and $2.7B to complete
‣ Know of 10 organizations: 100,000 genomes over 5 years
That’s 14PB of data, folks
![Page 27: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/27.jpg)
17
Other Methodologies Not Far Behind
![Page 28: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/28.jpg)
High-throughput Imaging
‣ Robotics screening millions of compounds on live cells 24/7
  • Not as much data as genomics in volume, but just as complex
  • Data volumes in the 10’s of TB/week
‣ Confocal Imaging
  • Scanning 100’s of tissue sections/week, each with 10’s of scans, each with 20-40 layers and multiple fluorescent channels
  • Data volumes in the 1’s - 10’s of TB/week
18
![Page 29: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/29.jpg)
High-power, dense detector MRI scanners in use 24/7 at large research hospitals
High-res medical imaging
‣ Creating 3D models of brains, comparing large datasets
‣ Using those models to perform detailed neurosurgery with real-time analytic feedback from supercomputer in the OR (cool stuff)
‣ Also generates 10’s of TB/week
19
![Page 30: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/30.jpg)
20
This is a huge problem
‣ Causing a literal deluge of data, in the 10’s of Petabytes
‣ NIH generating 1.5PB of data/month
‣ First real case in life science where 100Gb networking might really be needed
‣ But, not enough storage or compute
![Page 31: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/31.jpg)
21
And, just to make things more complicated
![Page 32: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/32.jpg)
We have them all
File & Data Types
‣ Massive text files
‣ Massive binary files
‣ Flatfile ‘databases’
‣ Spreadsheets everywhere
‣ Directories w/ 6 million files
‣ Large files: 600GB+
‣ Small files: 30KB or smaller
22
![Page 33: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/33.jpg)
Why, giant meta-analyses, of course
23
What to do with all that data?
‣ Typical problem across all of big data: how do you use it?
‣ In life sciences: no real standards of data formats
‣ Data scattered all over, despite push for Data Commons
‣ Not always accessible
‣ Combining the data if you have it all is a real challenge
![Page 34: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/34.jpg)
Scientists don’t like to share (really!)
A Compounding Problem...
‣ The fear:
  • If someone sees data before it is published, they might steal it and publish it themselves (getting scooped)
‣ Causes:
  • Long time to publication
  • Outdated methods of assigning scientific credit
  • Not properly incentivized
24
![Page 35: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/35.jpg)
Sharing required
A Problem for Data Commons
‣ Data piling up (scientists are hoarders)
‣ Bad network infrastructures
‣ Few central analytics platforms
‣ Wild-west file formats/algorithms
‣ No sharing
25
Hyperscale analytics will only work if the data is accessible!
![Page 37: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/37.jpg)
Every kind of flow imaginable
Clear issue for Networking
‣ Mouse --> Elephant
‣ Typical problem: firewalls not designed for this
‣ Potentially massive amount of constant data movement
‣ How are people handling all of this?
26
![Page 38: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/38.jpg)
27
Use Cases in Life Sciences
![Page 39: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/39.jpg)
28
Getting Data out of the Laboratory
![Page 40: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/40.jpg)
Usually very little IT infrastructure in labs
Laboratories not Integrated
‣ Tons of data generating equipment going in now
‣ Can generate 15GB of data in 50 hours
‣ Others can generate 64GB/day
‣ Labs are not designed to transmit data, lucky if wired for ethernet
29
![Page 43: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/43.jpg)
OK, so write data over ethernet to network drive…
Getting data out
‣ Sounds good, 64GB in 24 hours ~= 6Mb/s
‣ Problem: desktop class ethernet adaptors
‣ No error checking, no retries, no MD5, no local buffer
‣ If network goes, whole run is lost
30
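The ~6Mb/s figure on this slide is simply data volume divided by time. A minimal Python sketch of that arithmetic, assuming decimal units (1 GB = 8,000 megabits); the function name `average_rate_mbps` is made up for illustration:

```python
def average_rate_mbps(gigabytes: float, hours: float) -> float:
    """Average rate in megabits per second needed to move `gigabytes` in `hours`.

    Assumes decimal units: 1 GB = 8,000 megabits.
    """
    megabits = gigabytes * 8 * 1000
    seconds = hours * 3600
    return megabits / seconds


if __name__ == "__main__":
    # Instrument figures quoted on the slides.
    print(f"64GB per day     : {average_rate_mbps(64, 24):.2f} Mb/s")
    print(f"15GB in 50 hours : {average_rate_mbps(15, 50):.2f} Mb/s")
```

Both instruments come out far below what even a 100Mb link can carry, which suggests the problem is reliability of the transfer rather than raw bandwidth.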
![Page 44: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/44.jpg)
Scientists have to get creative, but not in a good way
Getting data out
‣ Usually ends up going to local workstation
‣ Go buy the cheapest disks they can
‣ Carry it somewhere, transfer the data to a workstation
‣ Put the disk in a drawer under a sink (really)
‣ Works if lab only does one or two runs/month, fails if more
31
![Page 45: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/45.jpg)
Unless you’re dealing with a bigger lab with lots of equipment, or a core facility
Lab data transit not huge!
‣ Fast networking not required, 100Mb OK
‣ Just GOOD networking
‣ ….for now (more later)
32
![Page 46: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/46.jpg)
Some generalized network models that have successfully solved the problem
Successful models
‣ Most of it is protocol and topology
‣ Quality of Service (QoS)
‣ Appropriate segmentation (L2 and/or L3)
‣ MPLS paths
‣ Intermediate protocols (e.g., Aspera FASP)
‣ One way or another, guarantee transfer
33
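One way to read "guarantee transfer" at the application level is: checksum the source, copy, checksum the destination, and retry on a mismatch. The sketch below illustrates that idea in Python with only the standard library. It is not a description of how Aspera FASP or any particular product works, and `md5_of` and `copy_verified` are hypothetical helper names.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(source: Path, destination: Path, max_attempts: int = 3) -> str:
    """Copy source to destination, retrying until the checksums match.

    Returns the verified MD5 digest, or raises OSError after max_attempts.
    """
    expected = md5_of(source)
    for _ in range(max_attempts):
        try:
            shutil.copyfile(source, destination)
            if md5_of(destination) == expected:
                return expected
        except OSError:
            # Treat I/O errors (for example a dropped network share) as retryable.
            continue
    raise OSError(f"could not verify copy of {source} after {max_attempts} attempts")
```

The source file stays on local disk until the digest matches, so a network drop costs a retry instead of the whole run.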
![Page 47: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/47.jpg)
34
Storing the Data
![Page 48: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/48.jpg)
As storage needs increase, the need to transmit it goes up too
Storage: a networking problem
‣ Networking will quickly replace storage as #1 headache in Bio-IT
‣ Petascale storage is useless without high-performance networking
‣ Most enterprise networks won’t cut it
35
![Page 49: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/49.jpg)
Most single laboratories don’t have an immediate need for peta-scale storage
Storage: an Org Problem
‣ BUT - labs need to be peta-capable
‣ Can’t predict how much or what kind of equipment
‣ Have to build for an indeterminate future
‣ Does it make sense for each lab to buy its own storage?
  • Probably not, doesn’t scale well financially
36
![Page 50: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/50.jpg)
Orgs that don’t invest will find themselves in a mess of storage support
Storage: an Org Problem
‣ This is when the storage problem becomes a networking problem
‣ Scientists need to share, collaborate
‣ Lab with 100TB of data, needs to share with offsite or onsite scientist
‣ Also: backups and disaster recovery: data is the new commodity
37
![Page 51: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/51.jpg)
Without high-performance networking, petascale anything is useless
Storage: a networking problem
‣ Traditional enterprise networks don’t cut it
‣ Large single-stream flows get squashed through firewalls and IDS
‣ Centralized: 10’s of PBs
‣ Distributed: 100’s of PBs
  • Likely a lot of duplication
‣ Network becomes key
‣ Cloud use makes this an even bigger problem
38
![Page 52: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/52.jpg)
Storage: options!
‣ There are a ton of options for storage
  • Local: small and large
  • Institutional: mostly large
  • Distributed Institutional: distributed NAS (GPFS over WAN), Object store networks, iRODS
  • Public clouds: block and object storage
‣ All require high-performance networking
‣ Anything external requires awesome external connection
39
![Page 53: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/53.jpg)
External connections that make petascale storage useful to scientists
Storage networking: solutions
‣ OC-192
  • Works for large institutions willing to make investment
  • Cost prohibitive: $200-$300k/month
  • Start-up cost of at least $1-2M for border equipment
‣ Internet2 10/100Gb Hybrid ports
  • Much better cost, fewer routing options
  • $200k/year
‣ Google Fiber, AT&T Gigapower?
40
![Page 54: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/54.jpg)
Internal networking more critical than external for petascale storage
Storage networking: solutions
‣ Infrastructure must be able to support the inevitable 1PB transit
  • Disaster recovery
  • High-availability
  • Backup
‣ Need at least 10Gb
  • Probably dedicated 10Gb per >1PB storage facility: 40Gb min --> 1Tb backbone
‣ 1Gb will not cut it for that data size
  • ~97 days to transmit at saturation
  • 10Gb: ~9.7 days
41
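The transit times on this slide follow from size divided by line rate. A rough Python sketch, under the assumption (mine, not stated on the slide) that the ~97-day figure uses binary units, i.e. 1PB = 2^50 bytes and 1Gb/s = 2^30 bits/s. With decimal units the same calculation gives roughly 93 days. Protocol overhead is ignored.

```python
SECONDS_PER_DAY = 86_400

PB_BINARY = 2**50    # bytes in 1PB, binary convention (assumption)
GBIT_BINARY = 2**30  # bits per second in 1Gb/s, binary convention (assumption)


def transfer_days(size_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to move size_bytes over a fully saturated link."""
    return size_bytes * 8 / link_bits_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    for gbits in (1, 10, 40, 100):
        days = transfer_days(PB_BINARY, gbits * GBIT_BINARY)
        print(f"1PB over {gbits:>3}Gb: {days:5.1f} days")
```

Real transfers rarely hold a link at saturation, so these numbers are best treated as lower bounds.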
![Page 55: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/55.jpg)
And now, the real problem: topology and logical design
Storage networking: solutions
‣ Need a scaling internal topology
‣ One core switch doing all routing and packet transit == bad
‣ More advanced designs needed
‣ Also: prioritize performance over security
  • Nearly impossible for most orgs
‣ Most implemented option: Science DMZ
42
![Page 56: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/56.jpg)
Sensitive data have policies and compliance issues, breaking them can be illegal
Science DMZ: not for everything
‣ Need logical topology flexible enough for security AND performance
‣ Best example: ISP model
  • Collapsed PE/CE on single router at edge
  • OSPF routing at edge, fast label switching on dual 100Gb cores
  • VRF for network segments
  • MPLS for fast transit and bandwidth guarantees
‣ Side benefit: trusted and untrusted Science DMZ
43
![Page 57: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/57.jpg)
44
Analyzing the data
![Page 58: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/58.jpg)
The pinnacle of data transit, the reason we store it in the first place
Compute == Answers!
‣ High performance computing: clusters, supercomputers, single servers, powerful workstations, etc.
‣ Mostly a datacenter issue
‣ Unless…
  • Storage not centralized or co-located: data gets duplicated unless you have a killer network
  • New methods: data doesn’t move, compute moves to data
45
![Page 59: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/59.jpg)
Assumes the use of central high-performance storage system
Use Case: Get data to cluster
‣ Easier problem within the same datacenter
‣ Large data needs large pipe
‣ Output of storage device needs to be fast
  • Needs to drive data to/from all compute nodes simultaneously
‣ Large clusters: big problem
  • Needs parallel filesystems: GPFS, Lustre
46
![Page 60: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/60.jpg)
Use of local disk in newer clusters
Internal network esp. important
‣ Implementation of storage/analytics systems for Big Data/HDFS
‣ Hadoop, Gluster, local ZFS volumes, virtual disk pools
‣ Now storage can be both internal and external
‣ I/O throughput is critical
47
![Page 61: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/61.jpg)
Application characteristics
‣ Mostly single process apps
‣ Some SMP/threaded apps performance bound by IO and/or RAM
‣ Lots of Perl/Python/R
‣ Hundreds of apps, codes & toolkits
‣ 1TB - 2TB RAM “High Memory” nodes becoming essential
‣ MPI is rare
  • Well written MPI is even rarer
‣ Few MPI apps actually benefit from expensive low-latency interconnects*
  • *Chemistry, modeling and structure work is the exception
48
![Page 62: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/62.jpg)
Genomics especially
Life Science very I/O bound
‣ Sync time for data often takes longer than the job itself
‣ Have to load up to 300GB into memory, for 1min process
‣ Do this thousands of times
‣ Largely due to bad programming and improperly configured systems
49
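To see why sync time can dominate, compare load time with compute time. The sketch below uses the 300GB and 1-minute figures from the slide. The 1 GB/s sustained read rate is purely an assumed number for illustration, and the function names are hypothetical.

```python
def load_seconds(dataset_gb: float, read_gb_per_s: float) -> float:
    """Seconds spent just reading the dataset into memory."""
    return dataset_gb / read_gb_per_s


def io_fraction(dataset_gb: float, read_gb_per_s: float, compute_s: float) -> float:
    """Fraction of total wall-clock time spent on I/O rather than compute."""
    load = load_seconds(dataset_gb, read_gb_per_s)
    return load / (load + compute_s)


if __name__ == "__main__":
    # Assumed: storage sustains 1 GB/s to this node.
    share = io_fraction(dataset_gb=300, read_gb_per_s=1.0, compute_s=60)
    print(f"I/O share of each job: {share:.0%}")
```

Under that assumption each job spends five minutes loading for one minute of work, and repeating it thousands of times multiplies the waste.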
![Page 63: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/63.jpg)
Interconnects between the nodes and the cluster’s connection to the main network critical
Cluster networking Solutions
‣ Optimal cluster networks: fat tree and torus topologies
  • All layer 2, internally
‣ Most keep oversubscription to 1:4, depending on usage
‣ Top-level switches connect at high speed to datacenter network
  • Newest are multiple 10Gb or 40Gb
  • InfiniBand internal networks: Mellanox ConnectX3 - Ethernet and IB capable switch ports
50
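The subscription ratio on this slide compares uplink capacity with the node-facing capacity below it. A small Python sketch of that calculation; the port counts are hypothetical, chosen only so the result lands on the 1:4 mentioned above.

```python
def oversubscription(node_ports: int, node_gbps: float,
                     uplink_ports: int, uplink_gbps: float) -> float:
    """Ratio of node-facing bandwidth to uplink bandwidth on one switch."""
    return (node_ports * node_gbps) / (uplink_ports * uplink_gbps)


if __name__ == "__main__":
    # Hypothetical leaf switch: 16 nodes at 10Gb each, one 40Gb uplink.
    ratio = oversubscription(node_ports=16, node_gbps=10,
                             uplink_ports=1, uplink_gbps=40)
    print(f"uplink:downlink = 1:{ratio:g}")
```

A ratio of 1.0 would mean every node can drive its full line rate through the uplinks at once; higher values trade that headroom for cost.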
![Page 64: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/64.jpg)
51
Sharing the data: Collaboration
![Page 65: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/65.jpg)
Fundamental to science
Collaboration
‣ Now that data production is reaching petascale, collaboration is getting harder
‣ Projects are getting more complex, more data is being generated, takes more people to work on the science
‣ Journal authorships: common to see 40+ authors now
‣ Clearly a networking problem at its core
‣ Let’s face it, doing this right is expensive!
52
![Page 66: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/66.jpg)
The gist of collaborative data sharing in life sciences
Data Movement & Data Sharing
‣ Peta-scale data movement needs
  • Within an organization
  • To/from collaborators
  • To/from suppliers
  • To/from public data repos
‣ Peta-scale data sharing needs
  • Collaborators and partners may be all over the world
53
![Page 67: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/67.jpg)
54
Most common high-speed network: FedEx
![Page 68: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/68.jpg)
Physical & Network
We Have Both Ingest Problems
‣ Significant physical ingest occurring in Life Science
  • Standard media: naked SATA drives shipped via FedEx
‣ Cliche example:
  • 30 genomes outsourced means 30 drives will soon be sitting in your mail pile
‣ Organizations often use similar methods to freight data between buildings and among geographic sites
55
![Page 69: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/69.jpg)
Physical Ingest Just Plain Nasty
‣ Easy to talk about in theory
‣ Seems “easy” to scientists and even IT at first glance
‣ Really really nasty in practice
  • Incredibly time consuming
  • Significant operational burden
  • Easy to do badly / lose data
56
![Page 70: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/70.jpg)
Science DMZ: making it easier to collaborate
Collaboration Solutions
57
Image source: “The Science DMZ: Introduction & Architecture” -- ESnet
![Page 71: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/71.jpg)
Internet2: making data accessible and affordable
Collaboration Solutions
‣ Internet2 is bringing Research and Education together
  • High-speed, clean networking at its core
  • Novel and advanced uses of SDN
  • Subsidized rates: national high-performance networking affordable
‣ AL2S: quickly establish national networks at high-speed
‣ Combined with Science DMZ: platform for collaboration
58
![Page 72: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/72.jpg)
Push for Cloud use: Most use Amazon Web Services, Google Cloud not far behind
Collaboration Solutions
‣ Many Orgs are pushing for cloud
‣ Unsupported scientists end up using cloud
‣ It’s fast, flexible, affordable, if done right
‣ Great place for large public datasets to live
‣ Has existing high(ish)-performance networking
‣ If done wrong, way more expensive than local compute
‣ Biggest problem: getting data to it!
59
![Page 73: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/73.jpg)
Hybrid HPC: Also known as hybrid clouds
Collaboration Solutions
‣ Relatively new idea
  • small local footprint
  • large, dynamic, scalable, orchestrated public cloud component
‣ DevOps is key to making this work
‣ High-speed network to public cloud required
‣ Software interface layer acting as the mediator between local and public resources
‣ Good for tight budgets, has to be done right to work
‣ Not many working examples yet
60
![Page 74: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/74.jpg)
Central storage of knowledge with compute
Data Commons
‣ Common structure for data storage and indexing (a cloud?)
‣ Associated compute for analytics
‣ Development platform for application development (PaaS)
‣ Make discovery more possible
61
![Page 75: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/75.jpg)
62
An Example of Progress
![Page 76: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/76.jpg)
Huge Government Agency trying to make agriculture better in every way
USDA: Agricultural Research Service
‣ Researchers doing amazing research on how crops and animals can be better farmed
‣ Lower environmental impacts
‣ Better economic returns
‣ How to optimize how agriculture functions in the US
‣ But, there’s a problem…
63
![Page 77: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/77.jpg)
They are doing every kind of high-throughput research discussed so far, and more, on a massive scale
They’re doing all the things!
64
![Page 78: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/78.jpg)
Just to list a few…
‣ Genomics (a lot of de novo assembly)
‣ Large scale imaging
  • LIDAR
  • Satellite
‣ Simulations
‣ Climatology
‣ Remote sensing
‣ Farm equipment sensors (IoT)
65
![Page 79: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/79.jpg)
Their current network
66
• Upgrading to DS3
• Still a lot of T1
• Won’t cut it for science
![Page 80: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/80.jpg)
Build a Science DMZ: SciNet, on an Internet2 AL2S Backbone
The new initiative
67
![Page 81: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/81.jpg)
Hybrid HPC, Storage, Virtualization environment
SciNet to feature compute
68
![Page 82: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/82.jpg)
69
What’s the Big Picture?
![Page 83: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/83.jpg)
Utilizing scientific computing to enable discovery
Problems getting solved
70
Laboratory Knowledge
![Page 89: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/89.jpg)
Converged Infrastructure
71
The meta issue
‣ Individual technologies and their general successful use are fine
‣ Unless they all work together as a unified solution, it all means nothing
‣ Creating an end-to-end solution based on the use case (science!): converged infrastructure
![Page 90: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/90.jpg)
It’s what we do
[Hyper-]convergence
72
Laboratory Knowledge
Converged Solution
![Page 93: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/93.jpg)
People matter too
Convergence
73
Laboratory Knowledge
Converged Solution
![Page 94: High-Performance Networking Use Cases in Life Sciences](https://reader030.vdocuments.us/reader030/viewer/2022020721/558b06e3d8b42a8d0f8b4728/html5/thumbnails/94.jpg)
“The network IS the computer” - John Gage, Sun Microsystems
Universal Truth
‣ Convergence is not possible without networking
‣ Also not possible without GOOD networking
‣ Life Sciences is learning lessons learned by physics and astronomy 5-10 years ago
‣ Biggest problem is Org acceptance and investment in personnel and equipment
‣ Next-Gen biomedical research advancing too quickly: must invest now
74