(BAC208) Bursting to the Cloud: Deploying a Hybrid Cloud Storage Solution with AWS | AWS re:Invent...
TRANSCRIPT
November 13, 2014, Las Vegas, Nevada
Aaron Black, Director of Informatics, Inova Translational Medicine Institute (ITMI)
Ron Bianchini, President and CEO, Avere Systems
• Five-hospital + ambulatory healthcare system
• Largest healthcare system in Northern Virginia
• Two million patient visits/year
• 1,700 beds
• 20,000 deliveries/year
• Inova Translational Medicine Institute (ITMI) started in 2010
ITMI research studies
Preterm birth study (2011)
• Molecular associations with preterm birth
• 365 preterm birth trios, 590 full-term trios
• WGS + ’omics + clinical data
• Biobank: blood, buccal mucosa, saliva, cord blood, placenta
Longitudinal study (2012)
• WGS + ’omics + clinical data on 5,000 → 10,000 family trios; currently 2,800 genomes
• Longitudinal study (≥18 yrs)
• Blood, saliva, urine, cord blood, placenta
• DNA, RNA, protein, epigenetic + clinical data
Congenital disorders study (2012)
• Mostly NICU-based
• Any other patient with a “congenital/genetic” disorder
and there’s more...
• Additional studies
• Diabetes
• Obesity
• Heart implants
• Ethics
• Clinical pharmacogenomics testing
• Anticipating disease-specific genetic panels in 2015
• Predictive analytics
Community health system
Large, diverse patient cohort
Family trio-based data sets
Longitudinal
Non-disease specific
Clinic linked to R&D
• Data diversity
– Over 100 countries of origin represented
• Data quality
– CLIA-regulated lab
– Complete medical records
– Whole genome sequence
• Interoperable and comparable data
– Consistent SOPs across studies
– Consistent data formats / platforms
– Industry best practices
• Access to policy makers
• Patients
– 8,300 subjects enrolled, >110 different countries of birth
• Banked samples
– >200,000 banked samples
• Whole genome sequences
– 5,745
– 28,750,000,000
• Diagnoses
– 46,500 patient diagnoses
• Lab results
– 1,245,612 discrete lab results
• 91,881 surveys and case report forms
• 1,470,772 discrete variables from case report forms and surveys
ITMI omics
• Whole genome sequences
– CGI and Illumina, high quality
• RNA-Seq
– Gene expression
• MicroRNA-Seq
– Gene regulation
• Methylation (Infinium 450k arrays)
– Gene regulation
• External enrichment
– Reference annotations from open and commercial sources
Scalability needed!
Integrate data across services
Enable disease prediction models
Represent all informatics activities
Enable fast data discovery
Improve clinical care through better insight
Learn / Predict / Report / Manage / Store
Securely manage data at scale
ITMI informatics challenges
• Petabyte-scale storage
• Execution: how to set up an effective, large-scale data store
• Cost: on-premises initial costs in the tens of millions of dollars
• Data durability: an average of 2% decay/month* is unacceptable
• HIPAA compliance
• Support for obscure bioinformatics tools
• Data movement between AWS and on-premises systems
• Hundreds of millions of files
• Large files: up to 0.5 TB each
• Encryption: difficult to ensure integrity given large file sizes
• Fluctuating HPC demands and benchmarks
• Balancing development and support
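The integrity concern above (encrypting and moving files of up to 0.5 TB) is commonly addressed by hashing in a streaming fashion rather than loading a file into memory. The following is a minimal Python sketch. It assumes SHA-256 digests are recorded before upload and recomputed after download and decryption. The function names and the 64 MiB chunk size are illustrative assumptions, not ITMI's actual pipeline. Per-chunk digests let a failed comparison point to the damaged region instead of forcing a full re-transfer.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per read; an assumed value, tune for the storage in use


def _chunks(path: Path, chunk_size: int):
    """Yield a file's contents in fixed-size pieces so memory use stays bounded."""
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            yield chunk


def sha256_of_file(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Whole-file SHA-256, computed without loading the file into memory."""
    digest = hashlib.sha256()
    for chunk in _chunks(path, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


def chunk_digests(path: Path, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """One SHA-256 per chunk, so a mismatch can be localized."""
    return [hashlib.sha256(c).hexdigest() for c in _chunks(path, chunk_size)]


def verify_copy(source: Path, copy: Path, chunk_size: int = CHUNK_SIZE) -> bool:
    """True if the copy is byte-identical to the source."""
    return sha256_of_file(source, chunk_size) == sha256_of_file(copy, chunk_size)


def mismatched_chunks(source: Path, copy: Path, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Indices of chunks that differ, including chunks present in only one file."""
    a, b = chunk_digests(source, chunk_size), chunk_digests(copy, chunk_size)
    return [i for i in range(max(len(a), len(b)))
            if i >= len(a) or i >= len(b) or a[i] != b[i]]
```

In practice, only the chunks listed by `mismatched_chunks` would need to be fetched again.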
• On-premises: SGI UV2K / NetApp
– Large in-memory processing
– 16 TB memory
– 1,024 CPU cores
– 1 PB magnetic storage
– 40 TB SSD storage
• 10+ Linux servers
• Numerous virtual machines
– Applications and databases (Oracle, MSSQL, PostgreSQL, MySQL)
Why ITMI uses AWS
• Facilitates movement of biological data from vendors
• Lower data storage costs
– Saved >$10 million in up-front costs
– 1.3 PB storage (mostly bio data)
– 7,000,000+ files (scripts for ETL)
– Only pay for usage per month
• Biological data QC and analysis workflows
• Flexible number of Amazon EC2 instances
– Linux clusters
– Hadoop/Cloudera big data engine
– Custom bioinformatics
• Quickly do proofs of concept
• Easy to share data with collaborators
• Easy to deploy web applications
ITMI hybrid cloud with AWS & Avere
[Diagram: bio materials; omic data; on-premises laboratory & clinical data; on-premises HPC cluster & storage; on-premises Inova bioinformatics staff; reporting & analysis]
• Analysis time cut from days (or never finishing) to hours!
• Agile and targeted analysis
• Faster outcomes
• Improved patient care!
• Improved prediction!
Edge-core architecture
[Diagram: clients & servers on the LAN/WAN; Avere Edge Filer (low-latency read, write & metadata ops; add performance); core filer(s): NetApp, EMC, Oracle]
• Edge filer: performance optimized
– Read caching, write posting
– Clustering for linear scaling
• Core filer: capacity optimized
– High-density disk
– Latency independent
• Heterogeneous global namespace
• Online data migration
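Read caching and write posting can be illustrated with a toy model. Reads that hit the edge never touch the core. Writes are acknowledged at the edge and pushed to the core later (write-back). The sketch below is hypothetical: `SlowCore` and `EdgeCache` are invented for illustration, a simple LRU eviction policy is assumed, and it is not how Avere's filers are implemented. A production edge filer must also protect acknowledged but unflushed writes, which this sketch ignores.

```python
from collections import OrderedDict


class SlowCore:
    """Hypothetical stand-in for a capacity-optimized core filer."""

    def __init__(self):
        self.data = {}
        self.reads = 0   # round trips that reached the core
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.data[key]

    def write(self, key, value):
        self.writes += 1
        self.data[key] = value


class EdgeCache:
    """Toy edge filer: LRU read cache plus posted (write-back) writes."""

    def __init__(self, core, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.core = core
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> value, least recently used first
        self.dirty = set()          # keys acknowledged at the edge, not yet on the core

    def read(self, key):
        if key in self.cache:        # hit: served at the edge, no core round trip
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.core.read(key)  # miss: fetch once from the core
        self._put(key, value)
        return value

    def write(self, key, value):
        # Write posting: acknowledge now, update the core later.
        self._put(key, value)
        self.dirty.add(key)

    def flush(self):
        """Push every posted write to the core."""
        for key in list(self.dirty):
            self.core.write(key, self.cache[key])
        self.dirty.clear()

    def _put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            old_key, old_value = self.cache.popitem(last=False)
            if old_key in self.dirty:  # evicting unflushed data: write it back first
                self.core.write(old_key, old_value)
                self.dirty.discard(old_key)
```

Repeated reads of the same file cost one core access. A posted write costs no core access until `flush()` or eviction. This is the sense in which the edge absorbs core latency.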
Success stories
• High performance
– Up to 50x traditional NAS
• Cost savings
– Up to 80% savings vs. NAS
• Remote office/WAN
– Hides 98% of WAN latency
• Public benchmarks
– 80% footprint, WAN neutral
Hybrid cloud storage
[Diagram: clients & servers on the LAN/WAN; Avere Edge Filer (low-latency read, write & metadata ops; add performance); core filer(s): NetApp, EMC, on-premises object storage, Amazon S3; labels AOS 3.0, AOS 4.0, FlashMove]
• An S3 bucket is treated as a core filer
• All prior AOS features apply
• 50:1 offload
• GNS (global namespace)
• Online in/out migration
• Public benchmarks: performance neutral
• Ultimate cloud on-ramp!
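A global namespace (GNS) presents one directory tree to clients while the data lives on several core filers. This is how an S3 bucket can sit alongside NetApp or EMC exports. One simple way to model this is longest-prefix matching of client paths against junctions. The `Junction` and `GlobalNamespace` classes, and all filer and bucket names, are hypothetical. This is not Avere's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    prefix: str      # client-visible path, e.g. "/genomes"
    core_filer: str  # hypothetical core filer name (NAS export or S3 bucket)
    export: str      # location on that core filer


class GlobalNamespace:
    """Hypothetical GNS: map client paths to (core filer, location) by longest prefix."""

    def __init__(self, junctions):
        # Longest prefixes first so "/genomes/raw" wins over "/genomes".
        self.junctions = sorted(junctions, key=lambda j: len(j.prefix), reverse=True)

    def resolve(self, path):
        for j in self.junctions:
            base = j.prefix.rstrip("/")
            if path == j.prefix or path.startswith(base + "/"):
                rest = path[len(base):].lstrip("/")
                target = f"{j.export.rstrip('/')}/{rest}" if rest else j.export
                return j.core_filer, target
        raise FileNotFoundError(path)
```

Clients only see the prefix. Repointing a junction to a different core filer therefore changes where data lives without changing client-visible paths, which is the idea behind online migration. A real migration must also copy the data, which this sketch does not model.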
This is the Inova Translational Medicine Institute use case
[Diagram: customer premises with on-premises object storage, NAS (NetApp) and NAS (Isilon); Amazon Web Services with Amazon S3 as a core filer; AOS 4.0]
Cloudbursting (AOS 4.5)
[Diagram: clients & servers on the LAN/WAN; Avere Edge Filer; core filer(s); virtual FXT in the cloud on Amazon EC2]
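Cloudbursting addresses the fluctuating HPC demand listed among ITMI's challenges. When on-premises slots run out, extra compute is rented on Amazon EC2. Below is a minimal sizing sketch under the following assumptions:

- a hypothetical policy with a fixed number of on-premises job slots
- a fixed number of job slots per EC2 instance
- a cap on concurrent instances

Launching the instances (for example through the EC2 RunInstances API) is deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BurstPolicy:
    onprem_slots: int        # job slots on the on-premises HPC cluster
    slots_per_instance: int  # job slots one EC2 instance provides (instance-type dependent)
    max_instances: int       # cap on concurrent cloud instances, i.e. on spend


def instances_to_launch(policy, pending_jobs, running_instances=0):
    """Hypothetical rule: how many more EC2 instances to start for the current demand.

    pending_jobs counts every job that needs a slot (queued or running).
    """
    overflow = max(0, pending_jobs - policy.onprem_slots)       # work that does not fit on-premises
    needed = math.ceil(overflow / policy.slots_per_instance)  # round up to whole instances
    target = min(needed, policy.max_instances)                 # respect the spending cap
    return max(0, target - running_instances)                  # never a negative launch count
```

This sketch only computes the upward step. A real scheduler would also scale back down once the backlog clears.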
Best-in-class NAS, 100% AWS cloud enabled
• Storage: Amazon S3
• Compute: Amazon EC2
• No vendor-specific hardware
• No overpriced disks
The only best-in-class NAS that is 100% enabled for AWS
[Chart: comparison with cloud gateway products and traditional NAS products]
• Personalized medicine is happening at ITMI
• Many drivers for preventive, personalized medicine
• Multiple, complicated barriers to broader success
• Keys for IT / informatics
– Leverage both cloud and on-premises HPC
– Use the right tools (there are many)
– Collaborate
– Be secure!
– Agile, scalable, resilient, durable