CSC 448: Bioinformatics Algorithms. Alex Dekhtyar. Ukkonen’s Algorithm for Generalized Suffix Trees


Upload: jennifer-quinn

Post on 16-Dec-2015


Page 1

CSC 448: Bioinformatics Algorithms

Alex Dekhtyar

Ukkonen’s Algorithm for Generalized Suffix Trees

Page 2

Example for two DNA sequences: T and T’ = reverse(complement(T))

T = AATGTT

T’ = AACATT
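The relationship between T and T’ can be checked with a short sketch. The helper name `reverse_complement` is illustrative and not from the slides; it assumes the plain four-letter DNA alphabet.

```python
# Watson-Crick pairing for the DNA alphabet (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


t = "AATGTT"
t_prime = reverse_complement(t)
print(t_prime)  # AACATT
```

Applying the function twice returns the original sequence, which is a quick sanity check on the mapping.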

Page 3

Steps

1. Create SuffixTree(T$) using Ukkonen’s algorithm. Keep suffix links.

2. Add “T:” to all leaf labels (this designates the current labels).

3. Traverse SuffixTree(T$) using the prefix of T’. The stopping point is the new active point.

4. Use Ukkonen’s algorithm to insert the remainder of T’.

4.1. Label leaves “T’: [x, ∞]”.

4.2. Modification: traverse to existing leaves to leave a label.
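To see what the finished structure should contain, it can help to compare against a naive reference. The sketch below is not Ukkonen’s algorithm: it inserts every suffix into an uncompressed trie in quadratic time. It assumes both strings are terminated by the same `$`, and it labels each suffix end with a (string name, 1-based suffix start) pair rather than the edge indices used on the slides. The names `build_generalized_trie` and `leaf_labels` are hypothetical.

```python
def build_generalized_trie(strings):
    """Insert every suffix of every string (terminated by '$') into a dict trie.

    `strings` maps a name such as "T" to its sequence. Each node is a dict
    from a character to a child node; the extra key "labels" holds the
    (name, start) pairs of the suffixes that end at that node.
    """
    root = {}
    for name, seq in strings.items():
        text = seq + "$"
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node.setdefault(ch, {})
            node.setdefault("labels", []).append((name, start + 1))
    return root


def leaf_labels(root, suffix):
    """Follow `suffix` from the root and return the labels stored at its end."""
    node = root
    for ch in suffix:
        node = node[ch]
    return node.get("labels", [])


trie = build_generalized_trie({"T": "AATGTT", "T'": "AACATT"})
print(leaf_labels(trie, "TT$"))   # [('T', 5), ("T'", 5)]
print(leaf_labels(trie, "ATT$"))  # [("T'", 4)]
```

The suffix TT$ is shared by both strings, so its end carries two labels. This is the situation step 4.2 addresses: an existing leaf has to receive an additional label instead of a new leaf being created.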

Page 4

T = AATGTT, T’ = AACATT

[Figure: two panels, “Tree” and “Trie”, each containing only the root ε.]

Page 5

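T = AATGTT, T’ = AACATT

Step 1: insert first string

[Figure: “Tree” and “Trie” panels. The trie for T has the root ε and one node per substring of T: A, AA, AAT, AATG, AATGT, AATGTT, AT, ATG, ATGT, ATGTT, T, TG, TGT, TGTT, TT, G, GT, GTT.]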
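The trie nodes for T are exactly the distinct non-empty substrings of T, plus the root ε. A minimal sketch that enumerates them (the function name `trie_nodes` is illustrative):

```python
def trie_nodes(text):
    """Every distinct non-empty substring of `text` labels one suffix-trie node."""
    return {
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
    }


nodes = trie_nodes("AATGTT")
print(len(nodes))  # 18
print(sorted(nodes, key=lambda s: (len(s), s)))
```

The count grows quadratically with the length of T in the worst case, which is why the compressed suffix tree is built instead of the trie.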

Page 6

T = AATGTT, T’ = AACATT

Step 1: insert first string

[Figure: the trie from Page 5, with the last boundary path and the last active point marked.]

Page 7

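T = AATGTT, T’ = AACATT

Step 1: insert first string

[Figure: the trie from Page 6 next to the suffix tree for T. In the tree, the root has edges starting with A, T, and G. Node A has leaf edges labeled 2,∞ and 3,∞; node T has leaf edges labeled 4,∞ (starting with G) and 6,∞ (starting with T); the G edge from the root is a leaf labeled 4,∞. The last boundary path and the last active point are marked.]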

Page 8

T = AATGTT$, T’ = AACATT

Step 1: insert first string

Step 1.5: finish the tree

[Figure: the tree and trie from Page 7 after $ is appended to T. The tree gains two leaf edges labeled $ with index 7,∞. The last boundary path and the last active point are marked.]
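Before $ is appended, a suffix of T that also occurs earlier in T ends inside the tree rather than at a leaf. The sketch below lists those suffixes by plain string search; the name `implicit_suffixes` is illustrative and the check is a naive one, not part of Ukkonen’s algorithm.

```python
def implicit_suffixes(text):
    """Suffixes of `text` that also occur earlier in `text`.

    Such a suffix is a prefix of a longer path in the tree, so it has no
    leaf of its own until a unique terminator is appended.
    """
    result = []
    for start in range(len(text)):
        suffix = text[start:]
        if text.find(suffix) != start:  # first occurrence is earlier
            result.append(suffix)
    return result


print(implicit_suffixes("AATGTT"))   # ['T']
print(implicit_suffixes("AATGTT$"))  # []
```

For AATGTT the only implicit suffix is T, so appending $ creates one leaf for T$ and one for $ itself, which matches the two 7,∞ labels in the figure.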

Page 9

T = AATGTT$, T’ = AACATT

Step 2: Traverse the prefix of T’

[Figure: the tree and trie from Page 8, with the last boundary path and the last active point of T marked, and the new active point marked where the traversal of T’ stops.]
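Where the traversal stops can be computed directly: it is the longest prefix of T’ that already occurs in T$. A minimal sketch using substring search in place of a tree walk (the name `longest_prefix_in` is illustrative):

```python
def longest_prefix_in(text, query):
    """Longest prefix of `query` that occurs somewhere in `text`."""
    length = 0
    while length < len(query) and query[: length + 1] in text:
        length += 1
    return query[:length]


print(longest_prefix_in("AATGTT$", "AACATT"))  # AA
```

For this example the traversal matches AA and stops, because AAC does not occur in T. The insertion of T’ therefore resumes with its third character, C.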

Page 10

T = AATGTT$, T’ = AACATT

Step 2: Traverse the prefix of T’

Step 3: Start inserting the rest of T’

[Figure: the tree from Page 9 with the active point marked. The trie gains the nodes AAC, AC, and C.]

Page 11

T = AATGTT$, T’ = AACATT

Step 2: Traverse the prefix of T’

Step 3: Start inserting the rest of T’

Make leaf nodes “generalized”

[Figure: as on Page 10, with every leaf label in the tree prefixed by its string: T:2,∞, T:3,∞, T:4,∞ (twice), T:6,∞, and T:7,∞ (twice).]

Page 12

T = AATGTT$, T’ = AACATT

Step 2: Traverse the prefix of T’

Step 3: Start inserting the rest of T’

[Figure: as on Page 11. The tree gains three leaf edges starting with C, each labeled T’:3,∞, corresponding to the new trie nodes AAC, AC, and C.]

Page 13

T = AATGTT$, T’ = AACATT

Step 2: Traverse the prefix of T’

Step 3: Start inserting the rest of T’

[Figure: as on Page 12, with the active point and the end point marked. The trie gains the nodes AACA, ACA, and CA.]

Nothing to do!

Page 14

T = AATGTT$, T’ = AACATT

Step 2: Traverse the prefix of T’

Step 3: Start inserting the rest of T’

[Figure: as on Page 13, with the active point and the end point marked. The trie gains the nodes AACAT, ACAT, and CAT.]

Nothing to do!

Page 15

T = AATGTT$, T’ = AACATT

Step 2: Traverse the prefix of T’

Step 3: Start inserting the rest of T’

[Figure: as on Page 14, with the active point and the end point marked. The figure adds the node ATT; in the tree, a new leaf edge starting with T and labeled T’:6,∞ appears beside an edge starting with G.]
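The sequence of slides so far (three new leaves for C, nothing to do for A and T, one new leaf for the final T) can be reproduced with a naive simulation. This is a sketch under stated assumptions, not Ukkonen’s algorithm: it replaces the tree walk and suffix links with substring checks, and the function name `new_leaves_by_phase` is hypothetical. A phase ends as soon as the pending suffix is already present, which is the case the slides mark with “Nothing to do!”.

```python
def new_leaves_by_phase(first, second):
    """Map each phase i (1-based position in `second`) to the suffixes of
    second[:i] that receive a new leaf in that phase."""
    text = first + "$"
    # Phases start after the longest prefix of `second` already in the tree.
    matched = 0
    while matched < len(second) and second[: matched + 1] in text:
        matched += 1
    phases = {}
    pending = 0  # 0-based start of the longest suffix without a leaf
    for i in range(matched + 1, len(second) + 1):
        created = []
        while pending < i:
            suffix = second[pending:i]
            # Present already (from `first` or an earlier phase): phase ends.
            if suffix in text or suffix in second[: i - 1]:
                break
            created.append(suffix)
            pending += 1
        phases[i] = created
    return phases


print(new_leaves_by_phase("AATGTT", "AACATT"))
# {3: ['AAC', 'AC', 'C'], 4: [], 5: [], 6: ['ATT']}
```

The phase number is also the start index of each new leaf edge, which agrees with the labels T’:3,∞ and T’:6,∞ in the figures. The simulation stops at TT in phase 6 because TT already occurs in T.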

Page 16

T = AATGTT$, T’ = AACATT

Step 2: Traverse the prefix of T’

Step 3: Start inserting the rest of T’

[Figure: as on Page 15, with a second T’:6,∞ label shown in the tree.]

Crucial bit coming!