ramethy: reconfigurable acceleration of bisulfite sequence … 8/2_james... · 2015. 4. 16. ·...
TRANSCRIPT
![Page 1: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/1.jpg)
1
Ramethy:Reconfigurable Acceleration ofBisulfite Sequence Alignment
James Arram, Wayne Luk (Imperial College)Peiyong Jiang (Chinese University of Hong Kong)
February 24, 2015
![Page 2: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/2.jpg)
2
Contributions
I A run-time reconfigurable architecture for acceleratingsequence alignment using FPGAs
I An application of this architecture to accelerate bisulfitesequence alignment
I Optimisations which improve the performance of theFM-index search operation
I Implementation of our design: Ramethy on a Maxeler nodeI 14.9x speed of multicore CPU (up to 88.4x)I 3.8x speed of GPU (up to 22x)
![Page 3: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/3.jpg)
3
Motivation
I Cost of DNA sequencing decreasing faster than Moore’s LawI Amount of sequenced data increasingI Current compute infrastructures strugglingI Slow analysis = slow diagnosis
![Page 4: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/4.jpg)
4
Bisulfite Sequencing
I Particular DNA sequencing used to find methylation status ofDNA
I Mechanism which controls gene expression
I Studies link Methylation to certain illnesses
Figure: Image by Karina Zillner
![Page 5: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/5.jpg)
5
Bisulfite Sequencing Applications
I Exciting diagnosis methods developed by CUHK:
1. Cancer diagnosisI General diagnosis: targets most cancersI Early stage detection
2. Prenatal diagnosisI Test baby for illness before birthI non-invasive: safe for mother and baby
![Page 6: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/6.jpg)
6
Bisulfite Sequencing Analysis
I Latest bisulfite sequencing analysis tool: Methy-Pipe(developed by CUHK)
I Fully integrated, but still slow (∼1 day to perform analysis)I Sequence alignment is bottleneck (over 50% of run-time)
T C G G A A G G A G T A G T T A C G T
T C G G A
G G A G C
G T T A C
C G G G A
G A G C A
T T A C G
G G A A G
A G C A G
T A C G T
Genetic diversity Sequencing error
Reference genome
Reads
![Page 7: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/7.jpg)
7
Alignment Programs
I Multiple software programs for sequence alignmentI Soap2, BWA, Bowtie, etc.
I Several GPU-based alignment programsI Soap3-dp, CUSHAWI up to 10x faster than CPU-based alignment programs
I FPGA based alignment programs have been proposedI Typically accelerate Smith-Waterman alignment algorithmI Good speed-up, but often compromise functionality and
accuracy
![Page 8: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/8.jpg)
8
FM-index
I Given Methy-Pipe alignment parameters, we target FM-indexsearch operation for acceleration
I Performs alignment using an index of the reference genomeI Closely related to suffix arrays (SA)
I FM-index search operation:I Update SA interval for each symbol in read (backwards search)I number updates (iterations) = number symbols in read
1: initialise SA interval2: for each symbol in read (last to first) do3: update SA interval4: end for
![Page 9: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/9.jpg)
9
FM-index Search Operation Analysis
I Extend search with backtracking for inexact alignmentI Edit operations (sub, ins, del) performed on the read
I Algorithm bottlenecks:I Dynamic data access to index (2 per update)I Excessive backtracking in inexact alignment
I Optimise algorithm and alignment architecture
![Page 10: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/10.jpg)
10
Static Architecture
I FPGA is configured with a static alignment circuit
Module 1
Module 2
Module 3
Input
Output
Static Architecture
FPGA Configuration 1
I Performance limited by features of alignment algorithms:
1. Idle Modules2. Unbalanced pipelines
3. Not enough resources4. Inflexible alignment
![Page 11: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/11.jpg)
11
Run-time Reconfigurable Architecture
I FPGA configurations for each moduleI Modules replicated as many times as possible
I Run-time reconfiguration used to switch configurations
Module 1
Module 1
Module 1
Input I.D
Module 2
Module 2
Module 2
I.D I.D
Module 3
Module 3
Module 3
I.D Output
Reconfigurable Architecture
FPGA Configuration 1 FPGA Configuration 2 FPGA Configuration 3
I.D = intermediate data
![Page 12: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/12.jpg)
12
Run-time Reconfigurable Architecture Analysis
I Performance of run-time reconfigurable architecture:
T =∑i
(Ti/Ni + tr + tt)
I Ti : time for module to process input dataI Ni : number of modules in configurationI Overheads: tr (reconfiguration time), tt (transfer time)
I Performance of static architecture
best: T = max (T1/N1, ...) worst: T =∑i
(Ti/Ni )
I Ni :runtime >> Ni :static
![Page 13: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/13.jpg)
13
Run-time Reconfigurable Architecture Analysis
I Reconfiguration addresses limitations of static architecture:
1. Configurations contain single type of module, datapartitioned:
I No idle modules
2. Modules are not interlinked:I No unbalanced pipeline of modules
3. Configurations for each stage in alignment algorithm:I Easier to map full alignment circuit to hardware
4. Run-time reconfiguration used to switch configurations:I Flexible alignment
![Page 14: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/14.jpg)
14
FM-index Optimisations
I Reduce dynamic data access: n-step FM-indexI Old: SA interval updated for 1 symbol per iterationI New: SA interval updated for n symbols per iterationI Number of dynamic data accesses reduced by factor of n
I Reduce excessive backtracking: bi-directional searchI Old: brute force searchI New: Use bi-directional search and constrain edit positionI Reduce search space and excessive backtracking
I More details in paper
![Page 15: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/15.jpg)
15
Ramethy: Implementation
I Design goal: Match alignment parameters of Methy-PipeI Permit up to two mismatches in readsI Report unique alignments only
Exact
Match
One
Mismatch
Reads Two
Mismatches
I Implement Ramethy on a 1U Maxeler MPC-X1000 nodeI 8 Altera Stratix V FPGAs each with 48GB of off-chip DRAMI Design as data flow graph, then compiled into bitstream
![Page 16: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/16.jpg)
16
Module Designs
I Host CPU streams reads to modules, results streamed backI Index stored in off-chip DRAMI Module uses BRAM to store reads and SA intervals
Off-chip DRAM
Host Device
FPGA
resultsreads
Module
Index
BR
AM
update
old interval
new interval
![Page 17: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/17.jpg)
17
Experimentation
I Experimental data:I Reference genome = chr22 of Human genome (Full 3.3GB)I 10M reads of 75 symbols
I Comparisons with:soap2 (CPU - 2 Intel Xeon: 32 cores)soap3-dp (GPU - 1 NVIDIA GTX 580, 512 cores)
I Among fastest alignment programs availableI soap2 used in Methy-Pipe, later versions may use soap3-dp
I Provide 3 performance values for Ramethy (8 FPGAs):I v1: measured values (1 module per Conf.)I v2: upper bound estimate (3 modules per Conf.)I static: best case static design 4 EM, 1 OM, 3 TM (1 module
per Conf.)
![Page 18: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/18.jpg)
18
Performance: Bisulfite Sequence Alignment
program platform clock freq. device Mbps speedup
soap2 Intel E5-2650 2000 2 4.5 1.0xsoap3-dp NVIDIA GTX-580 772 1 17.4 3.9x
static design MPC-X1000 150 8 58.5 13.1xRamethy v1 MPC-X1000 150 8 66.4 14.9xRamethy v2 MPC-X1000 150 8 395 88.4x
I 14.9x speed of soap2, 3.8x speed of soap3-dp
I Same accuracy as software
0 20 40 60 80 100
soap2
soap3-dp
static design
Ramethy v1
Ramethy v2
Speedup
![Page 19: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/19.jpg)
19
Performance: Standard DNA Sequence Alignment
program platform clock freq. device Mbps
Fernandez et al. Convey HC-1 150 4 13.2Olson et al. Pico M-503 250 8 120Ramethy v1 MPC-X1000 150 8 135
I Fernandez et al.: Does not test all edit positions
I Olson et al.: Backtracing?
I Ramethy: Identical to software
0 20 40 60 80 100 120 140 160
Fernandez et al.
Olson et al.
Ramethy v1
bases per second (M)
![Page 20: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/20.jpg)
20
Energy Consumption
I Measure energy consumption from run-time device power(obtained from MaxOS)
Program Device Power (W) Energy Consumption (kJ)
soap2 190 31.9soap3-dp 244 10.5
Ramethy v1 72 1.5
I Almost an order of magnitude less energy consumed byFPGAs (short run-time plus low power)
![Page 21: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/21.jpg)
21
Future Work
I Further optimising RamethyI Other ways of reducing dynamic data access?I Pre-filer reads for quick discarding
I Accelerating other parts of Methy-Pipe
I Study scientific and clinical impact
![Page 22: Ramethy: Reconfigurable Acceleration of Bisulfite Sequence … 8/2_James... · 2015. 4. 16. · soap2 Intel E5-2650 2000 2 4.5 1.0x soap3-dp NVIDIA GTX-580 772 1 17.4 3.9x static](https://reader034.vdocuments.us/reader034/viewer/2022052006/601a8053e7839e735b387dca/html5/thumbnails/22.jpg)
22
Conclusion
I Accelerate bisulfite sequence alignment using FPGAs
I Run-time reconfigurable architecture for acceleratingalignment algorithms
I Target FM-index for accelerationI Reduce dynamic data access and excessive backtracking
I Ramethy 14.9x speed of soap2, 3.8x speed of soap3-dpI Identical alignment accuracy to softwareI Almost an order of magnitude less energy consumed
I Faster alignment = improved science and healthcare