tailored off the peg - freie universität · andreas döring: "tailored off the peg"...

29
Tailored off the peg Static Polymorphism in Software Library Design Andreas Döring 11/2007

Upload: others

Post on 06-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Tailored off the pegStatic Polymorphism in Software Library Design

Andreas Döring

11/2007

Page 2: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

SeqAn – Sequence Analysis

www.seqan.de

Page 3: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

SeqAn – Sequence Analysis

- Searching in long strings- Quite simple concepts- Main problem: Performance!- C++

www.seqan.de

Page 4: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Tailored off the PegSpeed up trick:

different implementationsfor different cases

Page 5: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Specialization Hierarchy

general

special

— slow

— fast

Page 6: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Run Time "Tailoring"Select best variant

at run timedepending on the data

Page 7: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Run Time "Tailoring"String matching algorithms:

alphabet size

patternlength

[Navarro, Raffinot, "Flexible Pattern Matching in Strings" (2002)]

Page 8: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Compile Time "Tailoring"Select best variant

at compile timedepending on data types

Page 9: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Gapped q-Gram HashGiven:1. Alphabet:

Σ = {A,C,G,T}

2. Sequence:

3. Shape:

A C G T G C C A T G C C A T C G T G C …

Page 10: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Gapped q-Gram Hash

A C G T G C C A T G C C A T C G T G C …

Σ = {A,C,G,T}

Page 11: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Gapped q-Gram Hash

A C G T G C C A T G C C A T C G T G C …A C T C

0 1 3 0 = 28 hash value4 10

To do:Compute hash values for all positions!

Σ = {A,C,G,T}0 1 2 3

Page 12: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Gapped q-Gram Hash

A C G T G C C A T G C C A T C G T G C …

To do:Compute hash values for all positions!

Application:Fast scan for similar regions in two sequences

Page 13: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Implementations

A C G T G C C A T G C C A T C G T G C …

Generic Shape

Positions

hash(){

(loop)}

Page 14: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Implementations

Ungapped Shape

Length q

hash()hashNext(prev)

A C G T G C C A T G C C A T C G T G C …1 2 30 4 42 41

27101101018510

O(q)O(1)

1 multiplication1 modulo1 addition

Page 15: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Implementations

Hardwired Ungapped Shape ⟨q ⟩

hash()hashNext(prev)

C++ Template

Page 16: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Implementations

Hardwired Shape ⟨ ⟩

hash() {

(loop unrolled)}

Page 17: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Polymorphism

hashAll(Sequence, Shape)

hashAll(Sequence, GenericShape)hashAll(Sequence, HardwiredShape ⟨…⟩)hashAll(Sequence, UngappedShape)hashAll(Sequence, HardwiredUngappedShape ⟨…⟩)

Page 18: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Generic Algorithm

hashAll(Sequence, Shape){

for position=0 to Sequence.length():Shape.hash(position)

}

hashAll(Sequence, Shape){

Shape.hash(0)for position=1 to Sequence.length():

Shape.hashNext(position)}

Generic

Hardwired

Ungapped

Hardwired Ungapped

Page 19: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

OOP

Generic

hashAll()

Hardwired Ungapped

hashAll()

HardwiredUngapped

Page 20: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Dynamic Binding

Generic

hashAll()hash()

Hardwired

hash()

Generic.hashAll(Sequence){

for position=0 to Sequence.length():hash(position)

}

virtual is bad!

virtual

Page 21: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Wish ListWe need:

- Polymorphism- Delegation (Inheritance)- Spezialization (Overloading)- Static Binding

Solution:C++ Templates

Page 22: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Template Subclassing

Generic

Hardwired

Ungapped

HardwiredUngapped

Shape ⟨ ⟩

Shape ⟨ Hardwired ⟨ … ⟩ ⟩

Shape ⟨ Ungapped ⟨ ⟩ ⟩

Shape ⟨ Ungapped ⟨ q ⟩ ⟩

Page 23: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Template Subclassing

hashAll(seq, Shape ⟨ ? ⟩ sh){

for position=0 to seq.length():hash(sh, position)

}

hashAll(seq, Shape ⟨ Ungapped ⟨ ? ⟩ ⟩ sh){

hash(sh, 0)for position=1 to seq.length():

hashNext(sh, position)}

Shape ⟨ ⟩

Shape ⟨ Hardwired ⟨ … ⟩ ⟩

Shape ⟨ Ungapped ⟨ ⟩ ⟩

Shape ⟨ Ungapped ⟨ q ⟩ ⟩

Page 24: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Template SubclassingSubclasses specified by template arguments

Animal

Mammal

Rodent

Mouse

Animal ⟨ ⟩

Animal ⟨ Mammal ⟨ ⟩ ⟩

Animal ⟨ Mammal ⟨ Rodent ⟨ ⟩ ⟩ ⟩

Animal ⟨ Mammal ⟨ Rodent ⟨ Mouse ⟩ ⟩ ⟩

Page 25: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Template SubclassingAll animals must eat:

eat( Animal ⟨ ? ⟩ )

Rodents eat greenstuff:eat( Animal ⟨ Mammal ⟨ Rodent ⟨ ? ⟩ ⟩ ⟩ )

Pandas eat bamboo:eat( Animal ⟨ Marsupials ⟨ Panda ⟨ ? ⟩ ⟩ ⟩ )

Animal ⟨ Mammal ⟨ Human ⟩ ⟩

Page 26: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Template SubclassingAll animals must eat:

eat( Animal ⟨ ? ⟩ )

Rodents eat greenstuff:eat( Animal ⟨ Mammal ⟨ Rodent ⟨ ? ⟩ ⟩ ⟩ )

Koala bears eat eucalyptus:eat( Animal ⟨ Marsupial ⟨ Koala ⟨ ? ⟩ ⟩ ⟩ )

Animal ⟨ Mammal ⟨ Rodent ⟨ Mouse ⟩ ⟩ ⟩

Page 27: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Wish ListWe need:

- Polymorphism- Delegation (Inheritance)- Spezialization (Overloading)- Static Binding

Page 28: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

ConclusionMessage 1:

"Good libraries offer tailor-made suits of the peg."

Message 2:"Never put off till run time

what you can do at compile time."

Message 3:"Use C++ templates

to move work to compile time."

Page 29: Tailored off the peg - Freie Universität · Andreas Döring: "Tailored off the peg" (2007) SeqAn – Sequence Analysis - Searching in long strings - Quite simple concepts - Main

Andreas Döring: "Tailored off the peg" (2007)

Questions?

EOT(end of talk)