exact alignment with fm-index on the intel xeon phi...

Post on 02-Jun-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

FM-Index Search Algorithm Hardware resources Results Conclusions

Exact Alignment with FM-Index on the Intel XeonPhi Knights Landing Processor

Jose M. Herruzo1 Sonia Gonzalez-Navarro1 Pablo Ibanez2

Vıctor Vinals2 Jesus Alastruey-Benede2 Oscar Plata1

1Departamento de Arquitectura de ComputadoresUniversidad de Malaga

2Grupo de Arquitectura de ComputadoresUniversidad de Zaragoza

Accelerator Architecture in Computational Biology andBioinformatics, 2018

FM-Index Search Algorithm Hardware resources Results Conclusions

Motivation

Genomic Sequencing

New SequencingTechnologies

Sequence alignment

FM-Index Search Algorithm Hardware resources Results Conclusions

MotivationIndices

Suffix Tree

Hash Tables

FM-Index

FM-Index Search Algorithm Hardware resources Results Conclusions

MotivationBWT & FM-Index

Burrows Wheeler Transform(BWT)

FM-Index

BowtieBWASOAP2

banana$

0 banana$

1 anana$b

2 nana$ba

3 ana$ban

4 na$bana

5 a$banan

6 $banana

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index

Data structures

C arrayOcc table

Random memoryaccesses

Memory boundalgorithm

Backward Searchalgorithm

Algorithm: Backward Search Based on FM-index

Input: FM-index of T text (C & Occ), Q query, n:|T|, p:|Q|

Ouput: (sp,ep): Interval pointers of Q in T

begin

1: sp = C[Q[p]]

2: ep = C[Q[p]+1]

3: for i from p-1 to 1 step -1

4: sp = LF(Q[i],sp)

5: ep = LF(Q[i],ep)

6: end for

7: return (sp+1,ep)

end

2 LFop-chains

FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index

Lmem Lop

LLFM

MEMLF(Q{p-1},sp) OP

MEM OPLF(Q{p-1},ep)

MEM OP

MEM OP

LF(Q{p-2},sp)

LF(Q{p-2},ep)

idle time

Liter

FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index

Lmem Lop

LLFM

MEMLF(Q[1]{p-1},sp) OP

MEM OP

MEM OP

MEM OP

idle time

LF(Q[1]{p-1},ep)

LF(Q[1]{p-2},sp)

LF(Q[1]{p-2},ep)

MEMLF(Q[2]{p-1},sp) OP

MEM OPLF(Q[2]{p-1},ep)

OPLF(Q[Nq]{p-1},ep)

Liter

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Sampled FM-Index

Occ

rOcc

d=4

SFM

d=4

d=4

Reduce Memory Footprint↓

Increase computingrequirements

From Ferragina, P. and Manzini, G.: ”Opportunistic DataStructures with Applications” (2000)

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

K-Step FM-Index

Searching severalsymbols per

iteration

Increasedmemoryfootprint

Reducednumber of LFoperations

C1 4=X

A C G T

2-C1

AA AC AG AT CA CC TTGT

1

1 2 3

1

4=X

rOccA

C

G

T

16=X2

1 2 3

2-rOccAA

AC

AG

AT

CA

CC

TT

1 2 3 n+1

1

2

3

n+1

M

BWT

G C

T C

A A

A G

G T

1

2

3

n+1

GC

TC

AA

AG

GT

2-BWT

From Chacon et al. (2015)

FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Comparison

K1-32S

FM

K1-192

SFM

K1-448

SFM

K2-16S

FM

K2-128

SFM

K2-256

SFM

Implementation

10.0

12.5

15.0

17.5

20.0

22.5

25.0

27.5

30.0

SI (L

FMs/

KB

yte)

0.0 0.0 0.0

26.728.4

20.8

27.6 28.4

15.9

0.0 0.0 0.0

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Bit-Vector FM-Index

1 Q{i} X 1 d

SFM row

m%d

F A R R A M S M…………C A F L G A O M D

1 1 d

bvSFM row

m%d

1 0 0 0 0 0 0 0……0 0 1 0 0 0 0 0 0

(m-1)/d +1

F A R M S D

F

Q{i} 1 dm%d

0 0 0 0 0 1 0 1……0 0 0 0 0 0 0 1 0

M

X 1 dm%d

0 0 0 0 0 0 0 0……0 0 0 0 0 0 0 0 1

D

FM-Index Search Algorithm Hardware resources Results Conclusions

Bit-Vector FM-Index

Advantages

Reduced data movement

Reduced computing requirements

Disadvantages

Increased memory footprint

FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Comparison

K1-32S

FM

K1-192

SFM

K1-448

SFM

K2-16S

FM

K2-128

SFM

K2-256

SFM

K2-64b

vSFM

K2-96b

vSFM

Implementation

10

20

30

40

50

60

SI (L

FMs/

KB

yte)

27.6 28.4

15.9

26.7 28.4

20.8

56.3 56.3

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Hardware Resources

Xeon Phi 7210 Xeon E5-2630V4(KNL) (Broadwell)

64 cores @ 1.3 GHz 10 cores @ 2.2 GHz4 threads per core 2 threads per core

400 GB/s (MCDRAM) 68 GB/s (DDR4)

Intel Xeon Phi

AVX 512 VectorialProcessing Units

High Bandwidth Memory(HBM)

Cache / Hybrid / Flat

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

ResultsXeon Broadwell RANDOM Benchmark

4 8 12 16 20 24 28 32Dependency chains per thread

30

35

40

45

50

55

60

Ban

dwid

th (G

B/s

)

37.93

43.67 43.46 43.51 43.22 43.49 43.16 43.4

44.31 44.77 45.24 45.74 46.58 46.35 45.19 45.76

1 Cache Block2 Cache Block

FM-Index Search Algorithm Hardware resources Results Conclusions

ResultsXeon Phi KNL RANDOM Benchmark

2 3 4 5 6 8 10 12 14 16Dependency chains per thread

100

150

200

250

300

Ban

dwid

th (G

B/s

)

147.8

197.7209.9 211.3 219.6 217.2 217.6 217.8 216.9 215.5

82.3

108.0 109.6 111.4 112.4 114.5 115.3 117.2 115.0 113.6

1 Cache Block MCDRAM1 Cache Block DDR

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

ResultsExperimental Setup

Human Genome (Around 3GBases).

20 million input queries generated by Mason.

200 symbols per Sequence.

Measurements started after loading the sequences into mainmemory.

LFM/s as performance evaluation metric

FM-Index Search Algorithm Hardware resources Results Conclusions

Results

k1-19

2SFM

k1-32

SFM

k2-12

8SFM

k2-16

SFM

k2-96

bvSFM

k2-64

bvSFM

Implementation

0

2

4

6

8

10

12

Perf

orm

ance

(GLF

M/s

)

1.7

4.3

3.2

5.15.9

11.6

0.61.1 1.0 1.2

1.8 2.0

Theoretical limitsIntel Xeon Phi 7210Intel Xeon E5-2630v4

FM-Index Search Algorithm Hardware resources Results Conclusions

ResultsRoofline (Broadwell)

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14Search Intensity (LFM/Byte)

0

2

4

6

8

10

GLF

M/s

STREAM Peak Bandwidth

RANDOM Peak Bandwidth

Peak LF Performance (8 GLFM/s)

Previous CPU Performance (0.5 GLFM/s)

Previous GPU Performance (3.8 GLFM/s)

Tesla P100 NVBIO Match (2.7 GLFM/s)

K2-16SFM (1.23 GLFM/s)K2-64bvSFM (1.97 GLFM/s)

FM-Index Search Algorithm Hardware resources Results Conclusions

ResultsRoofline (Knights Landing)

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14Search Intensity (LFM/Byte)

0.0

2.5

5.0

7.5

10.0

12.5

15.0

17.5

GLF

M/s

STRE

AM

Pea

k Ba

ndw

idth

RANDOM Pe

ak B

andw

idth

Peak LF Performance (15.10 GLFM/s)

Previous CPU Performance (0.5 GLFM/s)

Previous GPU Performance (3.8 GLFM/s)Tesla P100 NVBIO Match (2.7 GLFM/s)

K2-16SFM (5.1G GLFM/s)K2-64bvSFM (11.6 GLFM/s)

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Conclusions

Reduced data movement

Efficiently used memory bandwidth

Great performance improvement

Up to 11.7 G LFM/s3x faster than previous GPU versions

FM-Index Search Algorithm Hardware resources Results Conclusions

Exact Alignment with FM-Index on the Intel XeonPhi Knights Landing Processor

Jose M. Herruzo1 Sonia Gonzalez-Navarro1 Pablo Ibanez2

Vıctor Vinals2 Jesus Alastruey-Benede2 Oscar Plata1

1Departamento de Arquitectura de ComputadoresUniversidad de Malaga

2Grupo de Arquitectura de ComputadoresUniversidad de Zaragoza

Accelerator Architecture in Computational Biology andBioinformatics, 2018

top related