eXact lab - ICTP Data Day
how HPC meets big data
● HPDA: High Performance Data Analysis
  ● tasks involving sufficient data volumes and algorithm complexity to require HPC resources
● Use cases
  ● climate modeling
  ● risk analysis
  ● national security
  ● life science
use case: DNA sequencing
● In 2006 the XPRIZE Foundation offered $10 million to the first team that “... could sequence 100 whole human genomes at a cost of $10,000 or less per genome, in 30 days or less ...”
● from September 5, 2013 to October 5
➔ HURRY UP!!
➔ XPRIZE canceled the prize on August 22: "... genome sequencing technology is plummeting in cost and increasing in speed independent of our competition. Today, companies can do this for less than $5,000 per genome, in a few days or less ..."
Customer needs analysis
● Huge amount of genomic data from Illumina HiSeq 2000
  ● To back up (~20k € per run)
  ● To post-process
  ● Always available
➔ Data from the sequencer need to be served to the computational infrastructure
➔ Need for a fast, high performance, highly scalable file system, with robust failover and recovery mechanisms
➔ Lustre File System
➔ parallel and distributed
➔ high availability features
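A minimal sketch of what "parallel and distributed" means for a single file, assuming the usual round-robin (RAID-0 style) striping of a file over several storage targets. The `StripeLayout` class, the `locate` function and the 1 MiB / 4-target values are hypothetical names and numbers chosen for illustration; this is not the Lustre API and not the configuration used in the project.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLayout:
    """Hypothetical description of how one file is striped."""
    stripe_size: int   # bytes written to one target before moving to the next
    stripe_count: int  # number of storage targets the file is spread over


def locate(layout: StripeLayout, offset: int) -> tuple[int, int]:
    """Map a file byte offset to (target index, offset inside that target's object)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    chunk = offset // layout.stripe_size        # index of the chunk within the file
    target = chunk % layout.stripe_count        # round-robin over the targets
    full_rounds = chunk // layout.stripe_count  # complete rounds before this chunk
    object_offset = full_rounds * layout.stripe_size + offset % layout.stripe_size
    return target, object_offset


# Illustrative values only: 1 MiB stripes over 4 targets.
layout = StripeLayout(stripe_size=1 << 20, stripe_count=4)
for off in (0, 1 << 20, 5 * (1 << 20) + 10):
    print(off, locate(layout, off))
```

Consecutive chunks land on different targets, so a large sequential read or write can be served by several servers at once; this is where the aggregate bandwidth of a striped file system comes from.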
The way to ensure MDT integrity