![Page 1: Accelerating Genomic Computations 1000X with Hardware Talks... · Genomic Granular Computing Applications 3 • 4 million newborns per year in the US alone • 1 in 33 newborns with](https://reader030.vdocuments.us/reader030/viewer/2022040408/5ebca3f609eaff67267834a2/html5/thumbnails/1.jpg)
Accelerating Genomic Computations 1000X with Hardware
Prof. Bill Dally (Electrical Engineering and Computer Science)
Prof. Gill Bejerano (Computer Science, Developmental Biology and Pediatrics)
Yatish Turakhia, EE PhD candidate, Stanford University
DNA sequencing costs and data explosion
• Since 2003, genomics data doubling every 7 months!
• Exabyte data by 2025 – 100M to 2B genomes to be sequenced!
[Figure labels: 1st gen, 2nd gen, 3rd gen]
“Storing and processing genome data will exceed the computing challenges of running YouTube and Twitter, biologists warn.” [Nature News, 2015]
“The decreasing cost of sequencing and the increasing number of sequence reads being generated are placing greater demand on the computational resources and knowledge necessary to handle sequence data.” [Genome Biology, 2016]
Stephens, Zachary D., et al. "Big data: astronomical or genomical?." PLoS Biology (2015)
Genomic Granular Computing Applications
Neonatal ICU
• 4 million newborns per year in the US alone
• 1 in 33 newborns with rare genetic conditions admitted to NICU
• Time is of the essence for genome-based diagnosis

Prenatal ICU and IVF clinics
• Non-invasively diagnose for over 3,000 rare genetic conditions (e.g. Down Syndrome)
• Free-floating DNA in blood – enormous volume!

Liquid Biopsy
• Early cancer detection – life-saving application for millions of individuals
• Non-invasive – circulating tumor DNA
• Periodic sequencing of healthy individuals – enormous volume!
Patient Diagnosis: Sample-to-answer
[Figure: sample-to-answer pipeline with numbered steps: Patient → Genome (3 Billion base pairs) → Genome Sequencing Machine → Reads → Read assembly → Mutations → Find the causal mutation]
Reads: ATGTCGAT CGATACGA GAGTCATC ACTGACGT
REFERENCE: --ATGTCGATGATCCAGAGGATACTAGGATAT-
PATIENT:   --ATGTCTATGATC--GAGGATATTAGGATAT-

• Long reads (>10Kbp) offer a better resolution of the mutation spectrum but have high error rate (15-40%)
• >1,300 CPU hours for reference-guided assembly of noisy long reads
  • 14.2M CPU-years for 100M individuals
• >15,600 CPU hours for de novo assembly of noisy long reads
  • 178M CPU-years for 100M individuals
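The CPU-year totals follow from simple scaling. The sketch below is only an illustration: the function name `cpu_years` and the 8,760-hour year are assumptions, not part of the slides. With 15,600 CPU hours per genome it reproduces the 178M figure; with the 1,300-hour lower bound it gives about 14.8M, the same order as the slide's 14.2M (the exact per-genome figure behind 14.2M is not given on the slide).

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 8,760 hours, leap years ignored


def cpu_years(cpu_hours_per_genome: float, num_genomes: int) -> float:
    """Total CPU-years needed to assemble `num_genomes` genomes."""
    return cpu_hours_per_genome * num_genomes / HOURS_PER_YEAR


if __name__ == "__main__":
    individuals = 100_000_000  # 100M individuals, as on the slide
    for label, hours in (("reference-guided", 1_300), ("de novo", 15_600)):
        total = cpu_years(hours, individuals)
        print(f"{label}: {total / 1e6:.1f}M CPU-years")
```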
Darwin: A Genomics Co-processor
High speed and programmability
1. D-SOFT: Tunable speed/precision to match any error profile
2. GACT: First algorithm with O(1) memory for compute-intensive step of alignment allowing arbitrarily long alignments in hardware – ideal for long reads
3. First framework shown to accelerate reference-guided as well as de novo assembly of reads in hardware

[Figure: a Software Aligner talks to Darwin through the D-SOFT API and the GACT API; Darwin contains D-SOFT (filter) and GACT (aligner). Two insets, one for D-SOFT and one for GACT, are drawn on Reference (R) vs. Query (Q) axes.]
Darwin: 40nm ASIC configuration
[Figure: Darwin block diagram. Software connects through the D-SOFT API and the GACT API to Darwin, which contains D-SOFT, an array of GACT units and LPDDR4 (32GB) memories.]
Area: 300 mm², Power: 9 W

Software (Intel Xeon E5)

| Algorithm | Power (1 thread) |
|---|---|
| BWA-MEM | 9.2W |
| GraphMap | 10.7W |
| DALIGNER | 8.8W |
GACT algorithm and hardware design
Strategies for long sequence alignment
| Algorithm | Time | Space (compute-intensive step) | Optimal |
|---|---|---|---|
| Smith-Waterman | O(mn) | O(mn) | Y |
| Hirschberg | O(mn) | O(m+n) | Y |
| Banded Smith-Waterman | O(n) | O(n) | N |
| X-drop | O(n) | O(n) | N |
| GACT | O(n) | O(1) | N |

m, n: sequence lengths, m >= n

Profound hardware design implications
Prior assumptions (hardware): small upper bound on sequence length n, OR trace-back of alignment in software – SLOW!
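To make the space column concrete, here is a minimal sketch of the Smith-Waterman recurrence. It keeps only two rows of scores, which is enough for the score alone; recovering the alignment itself needs a trace-back pointer for every cell, which is where the O(mn) space comes from. The function names are illustrative, and the default scores are the simple scheme used later in the GACT evaluation (match +1, mismatch -1, gap -1).

```python
def sw_score(r: str, q: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score, computed with two rows of the DP matrix."""
    prev = [0] * (len(q) + 1)
    best = 0
    for i in range(1, len(r) + 1):
        curr = [0] * (len(q) + 1)
        for j in range(1, len(q) + 1):
            sub = match if r[i - 1] == q[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def traceback_cells(m: int, n: int) -> int:
    """Pointer cells a full trace-back matrix would need for lengths m and n."""
    return m * n


if __name__ == "__main__":
    print(sw_score("TTACGTTT", "GGACGGG"))
    print(traceback_cells(10_000, 10_000))
```

For two 10 Kbp sequences the full pointer matrix has 10^8 cells, which is why prior hardware designs either capped n or left trace-back to software.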
Genome Alignment using Constant-memory Trace-back (GACT)
1. Initialize Icurr, Jcurr in R, Q
2. Form tile of maximum size T around Icurr, Jcurr in R, Q
3. Align tile with trace-back from Icurr, Jcurr with at most (T-D) steps
4. Update Icurr, Jcurr with trace-back end coordinates
5. Repeat 2-4 till extension no longer possible

[Figure: GACT on Reference (R) = GGCGACTTT vs. Query (Q) = GGTCGTTT with T = 5, D = 2, covered by Tile 1, Tile 2 and Tile 3; (Icurr, Jcurr) marks where trace-back starts in a tile.]

The optimal alignment and the GACT alignment shown are identical (Score = 11):
R: G G - C G A C T T T
   | |   | |     | | |
Q: G G T C G - - T T T
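The five steps above can be sketched in software as follows. This is a simplified illustration, not the Darwin implementation. Several details are assumptions the slide does not specify: each tile is aligned with a plain global dynamic program and linear gap penalties, extension runs leftwards from the ends of both sequences, and the scores are the simple +1/-1/-1 scheme. The parameters `tile` and `overlap` play the roles of T and D. Working memory never exceeds (T+1)² cells, whatever the sequence lengths.

```python
def align_tile(r_tile, q_tile, max_steps, match=1, mismatch=-1, gap=-1):
    """Align one tile and trace back from its bottom-right corner.

    Returns trace-back ops, last column first: "M" consumes one base of each
    sequence, "D" consumes R only, "I" consumes Q only. Trace-back stops once
    `max_steps` bases of either sequence have been consumed.
    """
    n, m = len(r_tile), len(q_tile)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], ptr[i][0] = i * gap, "D"
    for j in range(1, m + 1):
        score[0][j], ptr[0][j] = j * gap, "I"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if r_tile[i - 1] == q_tile[j - 1] else mismatch
            choices = (
                (score[i - 1][j - 1] + sub, "M"),
                (score[i - 1][j] + gap, "D"),
                (score[i][j - 1] + gap, "I"),
            )
            score[i][j], ptr[i][j] = max(choices, key=lambda c: c[0])
    ops, i, j = [], n, m
    while (i > 0 or j > 0) and n - i < max_steps and m - j < max_steps:
        op = ptr[i][j]
        ops.append(op)
        if op != "I":
            i -= 1
        if op != "D":
            j -= 1
    return ops


def gact(r, q, tile=5, overlap=2):
    """Tile-by-tile alignment of r and q; returns the two gapped strings."""
    if not 0 <= overlap < tile:
        raise ValueError("need 0 <= overlap < tile")
    i_curr, j_curr = len(r), len(q)            # step 1: start at the ends
    r_aln, q_aln = [], []
    while i_curr > 0 or j_curr > 0:            # step 5: repeat until done
        i0, j0 = max(0, i_curr - tile), max(0, j_curr - tile)   # step 2
        ops = align_tile(r[i0:i_curr], q[j0:j_curr], tile - overlap)  # step 3
        for op in ops:                         # step 4: move (i_curr, j_curr)
            if op != "I":
                i_curr -= 1
            if op != "D":
                j_curr -= 1
            r_aln.append(r[i_curr] if op != "I" else "-")
            q_aln.append(q[j_curr] if op != "D" else "-")
    return "".join(reversed(r_aln)), "".join(reversed(q_aln))


if __name__ == "__main__":
    for line in gact("GGCGACTTT", "GGTCGTTT", tile=5, overlap=2):
        print(line)
```

Only the first T-D trace-back steps of each tile are committed; the remaining D bases are recomputed in the next tile, so the part of the path closest to the tile's artificial corner is never used.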
GACT empirically provides optimal alignments
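• GACT tile size T=400
• GACT compared to optimal Smith-Waterman for 200,000 10Kbp sequences with 4 different error rates: 10%, 20%, 30% and 40%
• Simple scoring (match: +1, mismatch: -1, gap: -1)
• At D=120, all observed alignments were optimal

Fraction of alignments non-optimal and worst-case score loss, by error rate:

| D (in bp) | Non-optimal (10%) | Non-optimal (20%) | Non-optimal (30%) | Non-optimal (40%) | Worst-case loss (10%) | Worst-case loss (20%) | Worst-case loss (30%) | Worst-case loss (40%) |
|---|---|---|---|---|---|---|---|---|
| 0 | 30.4% | 61.0% | 83.0% | 94.7% | 0.29% | 0.67% | 1.26% | 2.38% |
| 30 | 0.0% | 0.02% | 0.55% | 55.3% | 0.0% | 0.35% | 0.63% | 1.59% |
| 60 | 0.0% | 0.0% | 0.01% | 1.38% | 0.0% | 0.0% | 0.34% | 0.81% |
| 90 | 0.0% | 0.0% | 0.0% | 0.05% | 0.0% | 0.0% | 0.0% | 0.33% |
| 120 | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |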
GACT Hardware-acceleration
[Figure: tile with T = 9. Reference AGGTCGGTA is streamed against query AGTCACTAT, which is split into Query Block 1, Query Block 2 and Query Block 3. PE 0 to PE 3 each have an SRAM and feed the trace-back (TB) logic.]

• Systolic array of Npe (= 4) processing elements (PEs) solve Smith-Waterman-Gotoh
• Tile with size T > Npe: query divided into blocks, reference streamed through each block
• Computation exploits wave-front parallelism
• On-chip SRAM for storing trace-back state (4 bits per cell)
• Total SRAM size = 4 bits × (Tmax)² => 128KB for Tmax = 512
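The sizing and blocking on this slide can be checked with a few lines. The helper names below are illustrative only; the 4 bits per cell, Tmax = 512, Npe = 4 and the 9-base query come from the slide. `wavefronts` lists the cells of a block by anti-diagonal, since cells on the same anti-diagonal do not depend on each other and can be computed by different PEs in the same step.

```python
def traceback_sram_bytes(t_max: int, bits_per_cell: int = 4) -> int:
    """Bytes of trace-back state for a t_max x t_max tile."""
    return t_max * t_max * bits_per_cell // 8


def query_blocks(query: str, n_pe: int) -> list:
    """Split the query into blocks of at most n_pe bases, one base per PE."""
    return [query[i:i + n_pe] for i in range(0, len(query), n_pe)]


def wavefronts(rows: int, cols: int) -> list:
    """Cells of a rows x cols block grouped by anti-diagonal (row + col)."""
    return [
        [(i, d - i) for i in range(max(0, d - cols + 1), min(rows, d + 1))]
        for d in range(rows + cols - 1)
    ]


if __name__ == "__main__":
    print(traceback_sram_bytes(512) // 1024, "KB")
    print(query_blocks("AGTCACTAT", 4))
    print(max(len(w) for w in wavefronts(4, 9)), "cells in the widest wavefront")
```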
Darwin: GACT Performance
• Runtime scales linearly with sequence length
• 300-1000X faster than Edlib
• 10,000X faster than software implementation of GACT

[Chart: Alignments/sec (log scale, 1 to 1,000,000) vs. Sequence length (1-10 Kbp) for GACT (Software), Edlib and GACT (Darwin). Data labels: 574K, 108K, 54K. Speedup annotations: 302X, 591X, 986X and 35X, 19X, 11X.]
D-SOFT algorithm and hardware design
Seed Position table based exact matching
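R = AGCTATACTA

Seed position table (seeds of 2 bp):

| Seed | Positions in R |
|---|---|
| AA | |
| AC | 6 |
| AG | 0 |
| AT | 4 |
| CA | |
| CC | |
| CG | |
| CT | 2, 7 |
| GA | |
| GC | 1 |
| GG | |
| GT | |
| TA | 3, 5, 8 |
| TC | |
| TG | |
| TT | |

Q = GCTA
Seed hits: GC: 1; CT: 2, 7; TA: 3, 5, 8

[Figure: plot of the seed hits on R (ticks 1-8) vs. Q (ticks 0-3) axes, with a line marked Slope=1.]

For human genome, seed position table size > 12GB (4B × 3 × 10^9)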
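A minimal sketch of the table and lookup, using the slide's R and Q. The function names are illustrative, and a Python dict stands in for the flat array a real implementation would more likely use. Grouping hits by the diagonal (position in R minus offset in Q) shows why the slope-1 line matters: the three hits on diagonal 1 correspond to the exact match of Q at R[1:5].

```python
from collections import defaultdict


def build_seed_table(reference: str, k: int) -> dict:
    """Map every k-mer of the reference to the sorted list of its positions."""
    table = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        table[reference[pos:pos + k]].append(pos)
    return dict(table)


def seed_hits(table: dict, query: str, k: int) -> list:
    """(query offset, reference position) for every exact seed match."""
    hits = []
    for offset in range(len(query) - k + 1):
        for pos in table.get(query[offset:offset + k], []):
            hits.append((offset, pos))
    return hits


def diagonal_counts(hits: list) -> dict:
    """Number of hits per diagonal, where diagonal = position - offset."""
    counts = defaultdict(int)
    for offset, pos in hits:
        counts[pos - offset] += 1
    return dict(counts)


if __name__ == "__main__":
    table = build_seed_table("AGCTATACTA", 2)
    hits = seed_hits(table, "GCTA", 2)
    print(table)
    print(hits)
    print(diagonal_counts(hits))
```

At 4 bytes per position, one entry per base of a 3 × 10^9 bp genome gives the 12GB figure quoted on the slide.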
Diagonal-band Seed Overlapping based Filtration Technique (D-SOFT)
• Divide R into NB bins (diagonal bands)
• Use N seeds of size k bp from different offsets in Q
• Look up positions of seeds in R and assign each seed hit to corresponding bin (diagonal band)
• Count non-overlapping Q base-pairs covered by seed hits for each bin and filter based on threshold h (same as DALIGNER)

[Figure: Reference (R) vs. Query (Q) plane divided into diagonal bands Bin 1 to Bin 6, with seeds numbered 1-10. Values shown for the six bins: 6, 5, 9, 4, 0, 5. Parameters: NB = 6, N = 10, k = 4, h = 7.]
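The filtering steps above can be sketched as follows. This is an illustration under stated assumptions, not Darwin's implementation: `bin_width` (band width in bp) and `stride` (spacing between seed offsets) are hypothetical parameters, since the slide specifies the number of bins NB rather than their width and does not say how seed offsets are chosen. Overlap is tracked per band by remembering the query offset of the last seed hit, so each query base is counted at most once per band.

```python
from collections import defaultdict


def dsoft(reference, query, k, n_seeds, h, bin_width, stride=1):
    """Return {band: covered query bases} for bands reaching the threshold h."""
    # Seed position table: k-mer -> positions in the reference.
    table = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        table[reference[pos:pos + k]].append(pos)

    bp_count = defaultdict(int)   # non-overlapping query bases per band
    last_offset = {}              # query offset of the last hit per band
    for s in range(n_seeds):
        offset = s * stride
        if offset + k > len(query):
            break
        for pos in table.get(query[offset:offset + k], ()):
            band = (pos - offset) // bin_width       # diagonal band of the hit
            prev = last_offset.get(band)
            overlap = 0 if prev is None else max(0, prev + k - offset)
            bp_count[band] += k - overlap
            last_offset[band] = offset
    return {band: n for band, n in bp_count.items() if n >= h}


if __name__ == "__main__":
    ref = "G" * 8 + "ACGTACCGGT" + "G" * 4
    print(dsoft(ref, "ACGTACCGGT", k=4, n_seeds=7, h=8, bin_width=4))
```

In the example, all seven seeds hit the same diagonal, so one band accumulates the full 10 query bases and passes the threshold, while raising h above 10 filters it out.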
D-SOFT hardware-acceleration design
Area: 264 mm2
Power: 7.3W
• Random accesses to update bins using on-chip SRAM (bin count SRAM)
• Area and power both dominated by 64MB Bin count SRAM
• Hardware exploits DRAM channel parallelism for seed position lookup
D-SOFT hardware-acceleration throughput
• ~2X speedup from parallel DRAM channels
• ~3X reduction in number of memory accesses to the DRAM
• All random memory accesses to update bins using on-chip SRAM (64MB)
• On-chip updates completely hide off-chip (DRAM) bandwidth

| k | Avg. hits per seed (Human Genome) | Software throughput (10^3 seeds/sec) | Darwin throughput (10^3 seeds/sec) | Darwin speedup |
|---|---|---|---|---|
| 11 | 1765 | 7.9 | 760.6 | 96.3X |
| 12 | 457 | 29.1 | 2,796.2 | 96.1X |
| 13 | 118 | 136.1 | 9,126.3 | 67.1X |
| 14 | 32 | 339.0 | 21,271.1 | 62.7X |
| 15 | 8 | 784.3 | 34,166.7 | 43.5X |
Long read assembly on Darwin
Darwin: Read assembly
[Figure panels: Reference-guided; De novo]
Darwin: Performance Results
Reference-guided (54X human genome)
Baseline: BWA-MEM (15%), GraphMap (30%, 40%)

| Read Error Rate | D-SOFT settings (k, N, h) | Sensitivity (Baseline) | Sensitivity (Darwin) | Speedup |
|---|---|---|---|---|
| 15% | (14, 750, 24) | 95.95% | 99.91% | 4,110X |
| 30% | (12, 1000, 25) | 98.11% | 98.40% | 4,088X |
| 40% | (11, 1300, 22) | 97.10% | 97.40% | 128X |

De novo (54X human genome)
Baseline: DALIGNER

| Read Error Rate | D-SOFT settings (k, N, h) | Sensitivity (Baseline) | Sensitivity (Darwin) | Speedup (Bottleneck) |
|---|---|---|---|---|
| 15% | (14, 1300, 24) | 99.80% | 99.89% | 264X |
Thank you!
Questions or feedback?