embo global exchange lecture course on bioinformatics and comparative genome analysis 13 dec – 18...

87
EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010 Methods for Repeat Detection In Nucleotide Sequences Gary Benson Computer Science, Biology, Bioinformatics Boston University [email protected]

Upload: rudolf-bridges

Post on 28-Dec-2015

216 views

Category:

Documents


2 download

TRANSCRIPT

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Methods for Repeat Detection In Nucleotide Sequences

Gary BensonComputer Science, Biology, Bioinformatics

Boston [email protected]

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Outline

• Classes of Repeats in DNA• Tandem Repeats• Techniques for finding repetitive sequence• Tandem Repeats Database

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Why Look at Repeats in DNA?

• Repeats make up the largest portion of DNA.– coding sequence (~5% of human DNA) – repetitive sequence (>50% of human DNA)

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Classes of Repeats in DNA

• Interspersed repeats:o Retrotransposons

• Sines:• Lines:• LTRs

o Transposons

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Classes of Repeats in DNA

• Inverted repeats• Tandem repeats

o Satellite repeatsomicrosatellitesominisatelliteso VNTR (variable number of tandem repeats)

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Tandem Repeats

A tandem repeat (TR) is any pattern of nucleotides that has been duplicated so that it appears several times in a row.

For example, the sequence fragment below contains a tandem repeat of the trinucleotide cgt:

tcgctggtcata cgt cgt cgt cgt cgt tacaaacgtcttccgt

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Approximate Tandem Repeats

More typically, the tandem copies are only approximate due to mutations. Here is an alignment of copies from a human TR from Chromosome 5.

Shown are

and

a consensus pattern

23.7 copies

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Why are tandem repeats interesting?

• They are associated with human disease:Fragile-X mental retardation Myotonic dystrophy Huntington’s diseaseFriedreich’s ataxiaEpilepsy DiabetesOvarian cancer

• They are often polymorphic, making them valuable genomic markers.

• They are involved in gene regulation and often may contain transcription factor binding sites.

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

LOCUS RATIGCA 4461 bp DNA ROD 18-APR-1994

DEFINITION Rat Ig germline epsilon H-chain gene C-region, 3' end.

2881 cgccccaagt aggcttcatc atgctctttg gtttagcaat agcccaaagc aagctatgca 2941 tccatctcag gcccagaggg atgaggagac cagaatcaag acatacccac gcccatccca

3001 cgcccaacca ccaaccacca gcacatcagg ttcacacacc tgagaccagt ggctcccatc

3061 acacacacac acacacacac acacacacac acacacacac acacacaagc ccgtacacat

3121 ccaccatatc cagagacaag tgtctgagtc tgagatacct ctgaggatca ccaatggcag

3181 agtcggccag cacctcagcc tccaggccaa tccttatact ttggcccact gcaggccatg

3241 agagatggag gaggtggagg cctgagctgt ggaaaaccag agacaggaag atggtctgta

3301 ctccaggcca atccttatac tttggcccac tgcaggccat gagagatgga ggaggtggag

3361 gcctgagctg tggaaaacca gagacaggaa gatggtctgt atggagagag tagtaaacca

3421 gattataggg agactgaggc aggagtagag ctcctacaag gccagtagtc taccttagag

3481 tcctataagt ctgggctggg agtccatgtg tcctgacttg ctcctcagat atcacaacca

3541 agattcctgg agccagagtg tgcatgcagg ccctagaaga aatgtggagc ttagagccct

3601 tcctggaggg ccctgggcac tctgaacaaa aggcaattct gtaggctgta tagaggcatc

3661 ctgtcagata cacacacaca tgcacacaca tacacacaca gagacacaga cacacacaca

3721 tgcccacaca catgcataca cacatgcaca cacatacaca cacagagaca cagacacaca

3781 catgcccaca cacatgcata cacacatgca tgcacacaca cacacacaca tacacataca

3841 cacacacaca cacaccccgc aggtagcctt catcatgctg tctagcgata gccctgctga

3901 gggtgggaga tactgggtca tggtgggcac cggagtagaa agagggaatg agcagtcagg

3961 gtcaggggaa aaggacatct gcctccaggg ctgaacagag acttggagca gtcccagagc

4021 aagtgggatg gggagctctg ccactccagt ttcaccagga ctgcctgaga ccagtgaggg

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

LOCUS RATIGCA 4461 bp DNA ROD 18-APR-1994

DEFINITION Rat Ig germline epsilon H-chain gene C-region, 3' end.

2881 cgccccaagt aggcttcatc atgctctttg gtttagcaat agcccaaagc aagctatgca 2941 tccatctcag gcccagaggg atgaggagac cagaatcaag acatacccac gcccatccca

3001 cgcccaacca ccaaccacca gcacatcagg ttcacacacc tgagaccagt ggctcccatc

3061 acacacacac acacacacac acacacacac acacacacac acacacaagc ccgtacacat

3121 ccaccatatc cagagacaag tgtctgagtc tgagatacct ctgaggatca ccaatggcag

3181 agtcggccag cacctcagcc tccaggccaa tccttatact ttggcccact gcaggccatg

3241 agagatggag gaggtggagg cctgagctgt ggaaaaccag agacaggaag atggtctgta

3301 ctccaggcca atccttatac tttggcccac tgcaggccat gagagatgga ggaggtggag

3361 gcctgagctg tggaaaacca gagacaggaa gatggtctgt atggagagag tagtaaacca

3421 gattataggg agactgaggc aggagtagag ctcctacaag gccagtagtc taccttagag

3481 tcctataagt ctgggctggg agtccatgtg tcctgacttg ctcctcagat atcacaacca

3541 agattcctgg agccagagtg tgcatgcagg ccctagaaga aatgtggagc ttagagccct

3601 tcctggaggg ccctgggcac tctgaacaaa aggcaattct gtaggctgta tagaggcatc

3661 ctgtcagata cacacacaca tgcacacaca tacacacaca gagacacaga cacacacaca

3721 tgcccacaca catgcataca cacatgcaca cacatacaca cacagagaca cagacacaca

3781 catgcccaca cacatgcata cacacatgca tgcacacaca cacacacaca tacacataca

3841 cacacacaca cacaccccgc aggtagcctt catcatgctg tctagcgata gccctgctga

3901 gggtgggaga tactgggtca tggtgggcac cggagtagaa agagggaatg agcagtcagg

3961 gtcaggggaa aaggacatct gcctccaggg ctgaacagag acttggagca gtcccagagc

4021 aagtgggatg gggagctctg ccactccagt ttcaccagga ctgcctgaga ccagtgaggg

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Tandem Repeats Finder

An online sequence analysis tool.

OR

A program to download and run locally.

Data from TRF is listed as “simple repeats” at the UCSC genome browser website.

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Similarity Models

Match/mismatch model – Sequences differ only by mismatches:

AAAGCTTCGGAGTGCCCGA

AATGCATCGGGGTGCCTGA

Sequence 1:

Sequence 2:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Similarity Models

Match/mismatch model – Sequences differ only by mismatches:

AAAGCTTCGGAGTGCCCGA

AATGCATCGGGGTGCCTGA

1101101111011111011

Alignments of similar sequences can be represented by bit strings (zeros and ones).

Sequence 1:

Sequence 2:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Similarity Models

Match/mismatch model – Sequences differ only by mismatches:

One model parameter required:

p = probability of matching letters in a column = probability of a 1 in the bit string

Sometimes known as a Bernoulli (coin toss) model.

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Similarity Models

Match/mismatch/indel model – adds indels to sequence differences:

AAAGCTTCGG-AGT--GCCCGA

AA-GCATCGGGAGTTAGCCTGA

Sequence 1:

Sequence 2:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Similarity Models

Match/mismatch/Indel model – adds indels to sequence differences:

AAAGCTTCGG-AGT--GCCCGA

AA-GCATCGGGAGTTAGCCTGA

1121101111211122111011

Alignments of sequences can be represented by strings of numbers in [0,1,2].

Sequence 1:

Sequence 2:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Similarity Models

Match/mismatch/indel model – adds indels to sequence differences:

At least two model parameters required:

p = probability of matching letters in a column = probability of a 1 in the numerical string r = probability of an insertion or deletion

= probability of a 2 or 3 in the numerical string

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Detecting Similar Sequences

Methods for similarity detection involve some form of scanning the input sequences, usually, with a window of fixed size. Information about the contents of the window is stored. This is called indexing.

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8

A C G T T G C A G T T G A C T G A C G

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Indexing

The index is a list of all possible window contents together with a list, for each content, of where it occurs:

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…,TTG,…TTT

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Building an Index

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8

A C G T T G C A G T T G A C T G A C G

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

First sequence:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Building an Index

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8

A C G T T G C A G T T G A C T G A C G

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Building an Index

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8

A C G T T G C A G T T G A C T G A C G

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0 1

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Building an Index

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8

A C G T T G C A G T T G A C T G A C G

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0 1 2

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Building an Index

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8

A C G T T G C A G T T G A C T G A C G

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0 1 2 3

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Building an Index

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8

A C G T T G C A G T T G A C T G A C G

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0 1 2 4 3

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Building an Index

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8

A C G T T G C A G T T G A C T G A C G

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0 1 2 4 3

8

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Building an Index

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8

A C G T T G C A G T T G A C T G A C G

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0 1 2 4 3

8 9

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Scanning a new sequence

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0 1 2 4 3

8 9

T G C A G T T G . . .

Second sequence:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Scanning a new sequence

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0 1 2 4 3

8 9

T G C A G T T G . . .

Second sequence:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Scanning a new sequence

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0 1 2 4 3

8 9

T G C A G T T G . . .

Second sequence:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Scanning a new sequence

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0 1 2 4 3

8 9

T G C A G T T G . . .

Second sequence:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Scanning a new sequence

AAA, AAC,….,ACG,…,CAG,…,CGT,…GAC,…,GTT,…,TGC,…TTG,…TTT

0 1 2 4 3

8 9

T G C A G T T G . . .

Second sequence:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Test subject sequence at matching locations

Indexed sequence: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 A C G T T G C A G T T G A C T G A C GSubject seq: T G C A G T T G . . .

Indexed sequence: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 A C G T T G C A G T T G A C T G A C GSubject seq: T G C A G T T G . . .

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Interaction between the similarity model and the index

Once a model is chosen and the index is built, two questions arise:

1. Is it possible to find a match using the window size chosen?

2. How many character matches are likely to be detected with the window size chosen?

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Q1: Is it possible to find a match?

This is known as the waiting time problem. Waiting Time: How many consecutive positions must

be examined until a run of k ones occurs.

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Q1: Is it possible to find a match?

This is known as the waiting time problem. Waiting Time: How many consecutive positions must

be examined until a run of k ones occurs.

Specific sequence example:AAAGCTTCGGAGTGCCCGA

AATGCATCGGGGTGCCTGA

1101101111011111011

Sequence 1:

Sequence 2:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Waiting Time Specific ExampleAAAGCTTCGGAGTGCCCGA

AATGCATCGGGGTGCCTGA

1101101111011111011

k waiting time1 12 23 94 105 166 -

Sequence 1:

Sequence 2:

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Q1: Is it possible to find a match?

Waiting Time: Given a Bernoulli sequence with generating probability p and length n, what is the probability that a run of k ones occurs?

Randomly generated Bernoulli sequence using p: k

1110101111011011010

n

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Waiting Time Formulas

These calculate the probability of a first occurrence of a run of k ones at every sequence length from 1 to n.

for n ≥ 3, k = 3F(111:n) =

P(1)3 – F(111: n – 1) · P(1) – F(111: n – 2) · P(1)2 – ∑ k = 3 to n – 3 F(111: k) · P(1)3

where:F(111:n) is the probability of a first occurrence of 3 ones in a row at position

n,P(1) is the model probability of a match.

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Waiting Time Formulas

Predictions: If k = 3, p = .5, n = 121. In what position [0..12] is it most likely to get a first

occurrence of 3 ones in a row?2. By what position will there be a cumulative probability of 30%

to see a first occurrence of 3 ones in a row? 3. What is the likely cumulative probability of getting 3 ones in a

row anywhere up to position 12?

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Coin Flip ExperimentGoal: To estimate the probability of finding three heads in a row

at least once when tossing a coin n times.The Experiment: Each student tosses a coin until the first

occurrence of 3 heads in a row. This is one trial. Record the flip where the 3rd head is found. If twelve tosses do not produce three heads in a row, stop and begin a new trial. Repeat.

EXAMPLE: Note that the first occurrence of three heads in a row is marked in bold. 1 2 3 4 5 6 7 8 9 10 11 12

Trial 1: H T H H T H H HTrial 2: T T H H H Trial 3: H T T H H T H H H Trial 3: H T T H H T H T H H T T

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Waiting Time Formulas

Calculated probabilities:

Probabilities of first occurrence of patterns in coin toss sequences

P(1) P(0) P(1) P(1)^2 P(1)^30.5 0.5 0.5 0.25 0.125

HHHP(111) Position 1 2 3 4 5 6 7 8 9 10 11 12

Probability 0.000000 0.000000 0.125000 0.062500 0.062500 0.062500 0.054688 0.050781 0.046875 0.042969 0.039551 0.036377

Cumulative 0.000000 0.000000 0.125000 0.187500 0.250000 0.312500 0.367188 0.417969 0.464844 0.507813 0.547363 0.583740

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Q2: How many character matches will be detected?

This is known as the coverage problem. Coverage: Given a Bernoulli sequence with generating

probability p and length n, what is the probability distribution for number of ones contained in runs of k or more ones?

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Q2: How many character matches will be detected?

Specific sequence example: Let k = 3, n = 19

AAAGCTTCGGAGTGCCCGA

AATGCATCGGGGTGCCTGA

1101101111011111011

Sequence 1:

Sequence 2:

n

k

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Q2: How many character matches will be detected?

Specific sequence example: Let k = 3, n = 19

AAAGCTTCGGAGTGCCCGA

AATGCATCGGGGTGCCTGA

1101101111011111011

Sequence 1:

Sequence 2:

n

Total character matches detected is 9.

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Data Structure – modified Aho Corasick Tree

Seed is 1*1**1.

Tree represents all patternsobtained by replacing each * by either 0 or 1.

Fail links in AC tree go to longest match between a stringsuffix and a prefix of a pattern.

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Recurrence Formula

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Q2: How many character matches will be detected?

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Finding Tandem Repeats:Basic Assumption

We assume that two, mutated, adjacent copies of a pattern will contain runs of exact matches.

d d

T A T A C G T C T C C A C G G A

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

We assume that two, mutated, adjacent copies of a pattern will contain runs of exact matches.

d d

T A T A C G T C T C C A C G G A

We identify the runs with seeds.

Finding Tandem Repeats:Basic Assumption

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

We assume that two, mutated, adjacent copies of a pattern will contain runs of exact matches.

d d

T A T A C G T C T C C A C G G A

Finding Tandem Repeats:Basic Assumption

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

The TRF Algorithm Outline

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Criteria for Recognition• Are there enough matches at a common distance?• Are there enough matches if nearby distances are

included (random walk)?• Do the matches start close enough to the left end?

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Part 2The Tandem Repeats Database

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Tools

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Selecting a Data Set

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Viewing a Data Set

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

TRF Characteristics Table

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Filter for large patterns with many copies

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

More information about a single repeat

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Single repeat view

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Filter for Gene Overlap or Proximity

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Filters for Triplets in Genes

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Triplets in Genes

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Changing Visible Columns

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Changing Visible Columns

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Link for Annotations

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Annotations

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Information link to the Source Database

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Following the Information link to the UCSC Browser

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

The TRDB Browser link

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

The TRDB Browser

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Distributions for a Data Set

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Pattern size distribution Human chr. I: size 1 - 60

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Pattern size distribution Human chr. I: sizes 60 - 120

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Pattern size distribution Drosophila chr. 2R: size 1 - 60

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Clustering repeats

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Clustering repeats

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Human Chr. 15 Family

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Human Chr. 1 Family

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Data Download

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Data Download

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Using TRF on your own data

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Uploading a Sequence

EMBO Global Exchange Lecture Course on Bioinformatics and Comparative Genome Analysis 13 Dec – 18 Dec 2010

Running TRF on a Sequence