Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
![Page 1: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/1.jpg)
Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Ola Spjuth
SNIC, UPPMAX and Science for Life Laboratory
Uppsala University, Sweden
![Page 2: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/2.jpg)
Ola Spjuth
• Associate Professor in Pharmaceutical Bioinformatics
• Guest Researcher
• Co-Director
• Manager of Bioinformatics Compute and Storage facility
![Page 3: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/3.jpg)
![Page 4: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/4.jpg)
2003: First sequenced human genome - 13 years for $3 billion
![Page 5: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/5.jpg)
2015: Human whole genome sequenced in 3 days for ~$1150
Massively parallel sequencing…
…requires supercomputers for analysis and storage
![Page 6: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/6.jpg)
2010: Science for Life Laboratory inaugurated
An internationally leading center that develops and applies
large-scale technologies for molecular biosciences with a focus
on health and environment.
National platform since 2013
Stockholm node
Uppsala node
![Page 7: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/7.jpg)
Data generation and delivery
1. Sample transfer
2. Data delivery
3. Analysis
Scientists
www.uppmax.uu.se/uppnex
High-performance computers and large scale storage for bioinformatics analysis.
![Page 8: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/8.jpg)
![Page 9: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/9.jpg)
Sequence production 2014:
• Generated > 120 Tbp of sequence data
• 13.7 Gbp/hour, 3.8 Mbp/sec (on average)
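The average rates on this slide follow from the yearly total. A minimal sketch of the arithmetic, assuming a 365-day year, decimal prefixes (1 Tbp = 1e12 bp) and 120 Tbp as a lower bound for the total:

```python
# Average sequencing throughput implied by the 2014 total.
# Assumptions (not stated on the slide): 365-day year, decimal prefixes.
TOTAL_BP_2014 = 120e12  # "> 120 Tbp", used here as a lower bound

HOURS_PER_YEAR = 365 * 24
SECONDS_PER_HOUR = 3600


def average_rates(total_bp: float) -> tuple[float, float]:
    """Return (Gbp per hour, Mbp per second) averaged over one year."""
    bp_per_hour = total_bp / HOURS_PER_YEAR
    bp_per_second = bp_per_hour / SECONDS_PER_HOUR
    return bp_per_hour / 1e9, bp_per_second / 1e6


gbp_per_hour, mbp_per_sec = average_rates(TOTAL_BP_2014)
print(f"{gbp_per_hour:.1f} Gbp/hour, {mbp_per_sec:.1f} Mbp/sec")
```

Under these assumptions the result matches the rounded figures quoted above.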
![Page 10: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/10.jpg)
Hardware resources
milou: HP cluster of 208 nodes
pica: 6 (7) PB Hitachi storage
halvan: 2 TB high-memory computer
nestor: 48 nodes production cluster
meles: 547 TB Hitachi storage
mosler: 24 nodes, 223 TB
smog: 100 nodes, ~300 TB
Fast network via SUNET
Backup via SNIC
Long-term storage at SweStore
2015: 250 nodes
2016: 200 new nodes
+1 PB
+2 PB
![Page 11: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/11.jpg)
A national e-Infrastructure for NGS
Software + reference data
Support
Education
Compute resources
Storage resources
Efficiency + automation
![Page 12: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/12.jpg)
What we sequenced at NGI /
![Page 13: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/13.jpg)
Chipster workbench on UPPMAX
UpCloud – smog - (OpenStack)
![Page 14: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/14.jpg)
Managing Virtual Machine Images
• Open catalogue of VMIs
• Hosted at Uppsala University
M. Dahlö, F. Haziza, A. Kallio, E. Korpelainen, E. Bongcam-Rudloff, and O. Spjuth. BioImg.org: A catalogue of virtual machine images for the life sciences. Accepted in Bioinformatics and Biology Insights.
www.bioimg.org
![Page 15: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/15.jpg)
Mosler overview
• e-Infrastructure for working with sensitive data
• Copy of Norwegian solution (TSD)
• Designed to look like UPPMAX clusters
![Page 16: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/16.jpg)
Mosler specifications
• High-performance computing in a virtualized environment (OpenStack)
• 2-factor authentication
• Restricted data transfer in/out
• Only accessible over remote desktop (ThinLinc) via Mosler dashboard
• Aim: Compliant with all laws and regulations for analyzing sensitive data in Sweden
![Page 17: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/17.jpg)
Data hosting use case
Diagram labels: Consortia, DBA, Consortium member, MyResearch; Mosler virtual environment (storage, compute)
Flows: Data hosting, Data syncing, Access and analysis
![Page 18: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/18.jpg)
Data extraction use case
Diagram labels: Manager, DBA, Scientist, LifeGene; Mosler virtual environment (storage, compute)
1. Request for data
2. Approval
3. Data extraction
4. Data transfer
5. Access, analysis
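The five numbered steps of the data extraction use case form a strict sequence. A minimal sketch of that ordering in Python follows; the `Step` and `DataRequest` names are hypothetical, chosen for illustration, and do not describe how Mosler or LifeGene actually implement the process:

```python
from enum import IntEnum


class Step(IntEnum):
    """The five steps from the slide, in order (names are illustrative)."""
    REQUEST = 1      # 1. Request for data
    APPROVAL = 2     # 2. Approval
    EXTRACTION = 3   # 3. Data extraction
    TRANSFER = 4     # 4. Data transfer
    ACCESS = 5       # 5. Access, analysis


class DataRequest:
    """Hypothetical record that only lets the steps happen in order."""

    def __init__(self, scientist: str) -> None:
        self.scientist = scientist
        self.completed: list[Step] = []

    def advance(self, step: Step) -> None:
        if len(self.completed) == len(Step):
            raise ValueError("workflow already complete")
        expected = Step(len(self.completed) + 1)
        if step is not expected:
            raise ValueError(f"expected {expected.name}, got {step.name}")
        self.completed.append(step)


request = DataRequest("scientist-1")
for step in Step:
    request.advance(step)
print([s.name for s in request.completed])
```

The point of the sketch is only that analysis inside the virtual environment cannot begin before approval, extraction and transfer have happened; skipping a step raises an error.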
![Page 19: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/19.jpg)
Nov 2014
20M € total grant
4M € IT-infrastructure
![Page 20: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/20.jpg)
X-Ten System
• First system able to deliver $1000 genome
• Each run 1.2 TB data
• 16 human genomes (30X)
• 3 days per run
• Population scale genomics
• 15K genomes per year
Swedish Genome Initiative
Call for a reference variation Database (1000 genomes) and for Whole Human Genome (half price).
Goal: 5,000 genomes in 2015, 10,000 genomes in 2016
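The run figures above give a rough sense of the raw data volume behind these goals. A back-of-the-envelope sketch, assuming the 1.2 TB per run covers all 16 genomes (about 75 GB per genome) and counting raw run output only, not analysis results or backups:

```python
# Rough raw-data estimate from the slide's run figures.
# Assumption: 1.2 TB per run is shared evenly by the 16 genomes in that run.
TB_PER_RUN = 1.2
GENOMES_PER_RUN = 16
GOALS = {2015: 5_000, 2016: 10_000}  # genome goals from the slide


def raw_storage_tb(genomes: int) -> float:
    """Raw run output in TB for a given number of 30X human genomes."""
    return genomes * TB_PER_RUN / GENOMES_PER_RUN


for year, genomes in GOALS.items():
    print(f"{year}: {genomes} genomes -> ~{raw_storage_tb(genomes):.0f} TB raw data")
```

On these assumptions the 2016 goal alone corresponds to roughly 750 TB of raw output.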
![Page 21: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/21.jpg)
Figure: NGI-Stockholm Production (Jan-12 to Dec-15). X axis: Production date (Aug-11 to Jan-16). Y axis: Giga Bases (0 to 1,000,000). Series: Data production; Conservative Prediction (60% of maximum production).
![Page 22: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/22.jpg)
Whole Genome Sequencing
• Data on new scale, 80% expected to be sensitive
New challenges
• Funding for IT-infrastructure from KAW foundation
– Resources for data production (2 M EUR)
– Resources for scientists (2 M EUR)
• A national security project funded by Swedish Research Council (5 M EUR over 4 years) – SNIC-Sens
![Page 23: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/23.jpg)
SNIC-Sens
• 4-year project, started Jan 2015
• Project owner: SNIC (Ann-Charlotte Sonnhammer)
• Project leader: Ola Spjuth (until end of this week)
• Aims:
– Specifications for analyzing sensitive data in SNIC (hardware, legal, contracts, processes etc.)
– Evaluation on the use of public cloud providers (Google, Amazon)
– Make available e-Infrastructure for production and research of data generated at NGI, blueprint for other domains
![Page 24: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/24.jpg)
SNIC-Sens roadmap
• Information classification workshop (21/5)
• Risk/vulnerability analysis (2/6)
• Specifications for hardware procurement
• Public tender (end of this week)
• Installation and testing of production system (Aug-Sept)
• Installation, configuration and testing of research system (Q3-Q4)
• Research system online (Q1 2016)
![Page 25: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/25.jpg)
Two pilots for clinical data management
![Page 26: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/26.jpg)
CML, Lucia Cavelier
![Page 27: Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden](https://reader035.vdocuments.us/reader035/viewer/2022070516/58738bae1a28ab272d8b6c4b/html5/thumbnails/27.jpg)
MDR, Åsa Melhus