Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010
Methods for Repeat Detection In Nucleotide Sequences
Gary Benson
Computer Science, Biology, Bioinformatics
Boston [email protected]
Outline
• Classes of Repeats in DNA
• Tandem Repeats
• Techniques for finding repetitive sequence
• Tandem Repeats Database
• Variant Tandem Repeats
Why Look at Repeats in DNA?
• Repeats make up the largest portion of DNA.
– coding sequence (~5% of human DNA)
– repetitive sequence (>50% of human DNA)
Classes of Repeats in DNA
• Interspersed repeats:
– Retrotransposons
  • SINEs
  • LINEs
  • LTRs
– Transposons
Classes of Repeats in DNA
• Inverted repeats
• Tandem repeats
– Satellite repeats
– microsatellites
– minisatellites
– VNTR (variable number of tandem repeats)
Tandem Repeats
A tandem repeat (TR) is any pattern of nucleotides that has been duplicated so that it appears several times in succession.
For example, the sequence fragment below contains a tandem repeat of the trinucleotide cgt:
tcgctggtcata cgt cgt cgt cgt cgt tacaaacgtcttccgt
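A minimal sketch of finding exact tandem repeats such as the one above with a regular expression back-reference. The function name, `min_copies`, and `max_period` are illustrative assumptions; this is not how Tandem Repeats Finder works, and it cannot detect approximate copies.

```python
import re

def exact_tandem_repeats(seq, min_copies=3, max_period=6):
    """Return (start, unit, copies) for runs of an exactly repeated unit."""
    # ([acgt]{1,P}?) captures a candidate unit, shortest first;
    # \1{M,} requires at least M further copies immediately after it.
    pattern = re.compile(r"([acgt]{1,%d}?)\1{%d,}" % (max_period, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

# The fragment from the slide, with the display spaces removed.
fragment = "tcgctggtcata cgt cgt cgt cgt cgt tacaaacgtcttccgt".replace(" ", "")
for start, unit, copies in exact_tandem_repeats(fragment):
    print(start, unit, copies)
```

The report includes the five copies of cgt starting at 0-based position 12. Short homopolymer runs also qualify under these permissive defaults.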
Approximate Tandem Repeats
More typically, the tandem copies are only approximate due to mutations. Here is an alignment of copies from a human TR from Chromosome 5.
Shown are a consensus pattern and 23.7 copies.
Why are tandem repeats interesting?
• They are associated with human disease:
– Fragile-X mental retardation
– Myotonic dystrophy
– Huntington’s disease
– Friedreich’s ataxia
– Epilepsy
– Diabetes
– Ovarian cancer
• They are often polymorphic, making them valuable genomic markers. Also, they may cause significant variation in the human population.
• They are involved in gene regulation and often may contain transcription factor binding sites.
LOCUS RATIGCA 4461 bp DNA ROD 18-APR-1994
DEFINITION Rat Ig germline epsilon H-chain gene C-region, 3' end.
2881 cgccccaagt aggcttcatc atgctctttg gtttagcaat agcccaaagc aagctatgca
2941 tccatctcag gcccagaggg atgaggagac cagaatcaag acatacccac gcccatccca
3001 cgcccaacca ccaaccacca gcacatcagg ttcacacacc tgagaccagt ggctcccatc
3061 acacacacac acacacacac acacacacac acacacacac acacacaagc ccgtacacat
3121 ccaccatatc cagagacaag tgtctgagtc tgagatacct ctgaggatca ccaatggcag
3181 agtcggccag cacctcagcc tccaggccaa tccttatact ttggcccact gcaggccatg
3241 agagatggag gaggtggagg cctgagctgt ggaaaaccag agacaggaag atggtctgta
3301 ctccaggcca atccttatac tttggcccac tgcaggccat gagagatgga ggaggtggag
3361 gcctgagctg tggaaaacca gagacaggaa gatggtctgt atggagagag tagtaaacca
3421 gattataggg agactgaggc aggagtagag ctcctacaag gccagtagtc taccttagag
3481 tcctataagt ctgggctggg agtccatgtg tcctgacttg ctcctcagat atcacaacca
3541 agattcctgg agccagagtg tgcatgcagg ccctagaaga aatgtggagc ttagagccct
3601 tcctggaggg ccctgggcac tctgaacaaa aggcaattct gtaggctgta tagaggcatc
3661 ctgtcagata cacacacaca tgcacacaca tacacacaca gagacacaga cacacacaca
3721 tgcccacaca catgcataca cacatgcaca cacatacaca cacagagaca cagacacaca
3781 catgcccaca cacatgcata cacacatgca tgcacacaca cacacacaca tacacataca
3841 cacacacaca cacaccccgc aggtagcctt catcatgctg tctagcgata gccctgctga
3901 gggtgggaga tactgggtca tggtgggcac cggagtagaa agagggaatg agcagtcagg
3961 gtcaggggaa aaggacatct gcctccaggg ctgaacagag acttggagca gtcccagagc
4021 aagtgggatg gggagctctg ccactccagt ttcaccagga ctgcctgaga ccagtgaggg
Tandem Repeats Finder
An online sequence analysis tool.
OR
A program to download and run locally.
Data from TRF is listed as “simple repeats” at the UCSC genome browser website.
Similarity Models
Match/mismatch model – Sequences differ only by mismatches:
Sequence 1: AAAGCTTCGGAGTGCCCGA
Sequence 2: AATGCATCGGGGTGCCTGA
Bit string: 1101101111011111011
Alignments of similar sequences can be represented by bit strings (zeros and ones).
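A minimal sketch of how the bit string for the match/mismatch model can be derived from two equal-length sequences. The function name `match_bits` is an illustrative assumption.

```python
def match_bits(seq1, seq2):
    """Encode a gap-free alignment: 1 where the letters match, 0 where they differ."""
    if len(seq1) != len(seq2):
        raise ValueError("the match/mismatch model needs equal-length sequences")
    return "".join("1" if a == b else "0" for a, b in zip(seq1, seq2))

seq1 = "AAAGCTTCGGAGTGCCCGA"
seq2 = "AATGCATCGGGGTGCCTGA"
print(match_bits(seq1, seq2))  # 1101101111011111011
```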
Similarity Models
Match/mismatch model – Sequences differ only by mismatches:
One model parameter required:
p = probability of matching letters in a column = probability of a 1 in the bit string
Sometimes known as a Bernoulli (coin toss) model.
Similarity Models
Match/mismatch/Indel model – adds indels to sequence differences:
Sequence 1: AAAGCTTCGG-AGT--GCCCGA
Sequence 2: AA-GCATCGGGAGTTAGCCTGA
Numerical string: 1121101111211122111011
Alignments of sequences can be represented by strings of numbers in [0,1,2].
Similarity Models
Match/mismatch/Indel model – adds indels to sequence differences:
Sequence 1: AAAGCTTCGG-AGT--GCCCGA
Sequence 2: AA-GCATCGGGAGTTAGCCTGA
Numerical string: 1121101111311133111011
If the direction of insertion or deletion matters, we use strings of numbers in [0,1,2,3].
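A minimal sketch of the encoding, assuming the two rows of the alignment are given with `-` for gaps. The function name and the `directional` flag are illustrative. The assignment of 2 to a gap in Sequence 2 and 3 to a gap in Sequence 1 is read off the example strings on these slides.

```python
def encode_alignment(row1, row2, directional=True):
    """Encode aligned rows column by column: 1 match, 0 mismatch, 2/3 indel."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have the same length")
    codes = []
    for a, b in zip(row1, row2):
        if b == "-":
            codes.append("2")                           # gap in the second row
        elif a == "-":
            codes.append("3" if directional else "2")   # gap in the first row
        else:
            codes.append("1" if a == b else "0")
    return "".join(codes)

row1 = "AAAGCTTCGG-AGT--GCCCGA"
row2 = "AA-GCATCGGGAGTTAGCCTGA"
print(encode_alignment(row1, row2, directional=False))  # 1121101111211122111011
print(encode_alignment(row1, row2))                     # 1121101111311133111011
```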
Similarity Models
Match/mismatch/indel model – adds indels to sequence differences:
At least two model parameters required:
p = probability of matching letters in a column = probability of a 1 in the numerical string
r = probability of an insertion or deletion = probability of a 2 or 3 in the numerical string
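As a rough illustration, the two parameters can be estimated from an encoded alignment by simple symbol frequencies. This naive estimator and its name are assumptions for the example, not a method stated on the slides.

```python
def estimate_parameters(codes):
    """Frequency estimates of p (matches) and r (indels) from a numerical string."""
    n = len(codes)
    p = codes.count("1") / n
    r = (codes.count("2") + codes.count("3")) / n
    return p, r

p, r = estimate_parameters("1121101111311133111011")
print(round(p, 3), round(r, 3))  # 0.727 0.182
```

The remaining fraction, 1 - p - r, is the mismatch frequency.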
Detecting Similar Sequences
Methods for similarity detection involve some form of scanning the input sequences, usually with a window of fixed size. Information about the contents of the window is stored. This is called indexing.
Position: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
Sequence: A C G T T G C A G T T G A C T G A C G
Indexing
The index is a list of all possible window contents together with a list, for each content, of where it occurs:
AAA, AAC, …, ACG, …, CAG, …, CGT, …, GAC, …, GTT, …, TGC, …, TTG, …, TTT
Building an Index
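First sequence:
Position: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
Sequence: A C G T T G C A G T T G A C T G A C G
Index keys: AAA, AAC, …, ACG, …, CAG, …, CGT, …, GAC, …, GTT, …, TGC, …, TTG, …, TTT
Positions recorded so far: ACG: 0; CGT: 1; GTT: 2, 8; TGC: 4; TTG: 3, 9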
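A minimal sketch of building such an index with a window of size 3. The name `build_index` is illustrative. Unlike the slide, which lists all 64 possible window contents, this version stores only the contents that actually occur.

```python
from collections import defaultdict

def build_index(seq, w=3):
    """Map each window content of length w to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index

first = "ACGTTGCAGTTGACTGACG"
index = build_index(first)
for word in ("ACG", "CGT", "GTT", "TGC", "TTG"):
    print(word, index[word])
```

Running the scan to the end of the sequence also records position 16 for ACG, beyond the steps shown on the slide.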
Scanning a new sequence
Second sequence: T G C A G T T G . . .
Index keys: AAA, AAC, …, ACG, …, CAG, …, CGT, …, GAC, …, GTT, …, TGC, …, TTG, …, TTT
Positions in the first sequence: ACG: 0; CGT: 1; GTT: 2, 8; TGC: 4; TTG: 3, 9
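A minimal sketch of the scan: each window of the second sequence is looked up in the index of the first, and every hit is reported as a pair of positions. Function names are illustrative, and only the eight letters of the second sequence visible on the slide are used.

```python
from collections import defaultdict

def build_index(seq, w=3):
    """Map each window content of length w to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index

def scan(second, index, w=3):
    """Return (position in second, position in first) for every shared window."""
    hits = []
    for j in range(len(second) - w + 1):
        for i in index.get(second[j:j + w], []):
            hits.append((j, i))
    return hits

index = build_index("ACGTTGCAGTTGACTGACG")
hits = scan("TGCAGTTG", index)
for j, i in hits:
    print(j, i, "offset", i - j)
```

Hits that share the same offset i - j lie on one diagonal, which is what a later extension step would try to join into a longer match.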
Interaction between the similarity model and the index
Once a model is chosen and the index is built, two questions arise:
1. Is it possible to find a match using the window size chosen?
2. How many character matches are likely to be detected with the window size chosen?
Q1: Is it possible to find a match?
This is known as the waiting time problem.
Waiting Time: How many consecutive positions must be examined until a run of k ones occurs?
Specific sequence example:
Sequence 1: AAAGCTTCGGAGTGCCCGA
Sequence 2: AATGCATCGGGGTGCCTGA
Bit string: 1101101111011111011
Waiting Time Specific Example
Sequence 1: AAAGCTTCGGAGTGCCCGA
Sequence 2: AATGCATCGGGGTGCCTGA
Bit string: 1101101111011111011

| k | waiting time |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 9 |
| 4 | 10 |
| 5 | 16 |
| 6 | - |
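The table can be reproduced with a short scan over the bit string. This is a minimal sketch; the function name is illustrative and positions are counted from 1, as on the slide.

```python
def waiting_time(bits, k):
    """Return the 1-based position at which a run of k ones is first completed."""
    run = 0
    for pos, b in enumerate(bits, start=1):
        run = run + 1 if b == "1" else 0
        if run == k:
            return pos
    return None  # no run of k ones in this string

bits = "1101101111011111011"
for k in range(1, 7):
    print(k, waiting_time(bits, k))
```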
Q1: Is it possible to find a match?
Waiting Time: Given a Bernoulli sequence with generating probability p and length n, what is the probability that a run of k ones occurs?
Randomly generated Bernoulli sequence using p:
1110101111011011010
(Figure labels: n, the sequence length; k, the length of a run of ones.)
Waiting Time Formulas
These calculate the probability of a first occurrence of a run of k ones at every sequence length from 1 to n.
For n ≥ 3 and k = 3:

$$F(111{:}n) = P(1)^3 - F(111{:}n-1)\cdot P(1) - F(111{:}n-2)\cdot P(1)^2 - \sum_{j=3}^{n-3} F(111{:}j)\cdot P(1)^3$$

where:
F(111:n) is the probability of a first occurrence of 3 ones in a row at position n (with F(111:n) = 0 for n < 3),
P(1) is the model probability of a match.
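A minimal sketch that evaluates the recurrence for k = 3 directly. The function name is illustrative, and F(111:n) is taken to be 0 for n < 3, since a run of three cannot finish that early.

```python
def first_occurrence_probs(p, n_max):
    """F[n] = probability that the first run of three ones ends exactly at position n."""
    F = [0.0] * (n_max + 1)          # F[0], F[1], F[2] stay 0
    for n in range(3, n_max + 1):
        earlier = sum(F[j] for j in range(3, n - 2))   # j = 3 .. n-3
        F[n] = p**3 - F[n - 1] * p - F[n - 2] * p**2 - earlier * p**3
    return F

F = first_occurrence_probs(0.5, 12)
cumulative = 0.0
for n in range(1, 13):
    cumulative += F[n]
    print(n, f"{F[n]:.6f}", f"{cumulative:.6f}")
```

With p = 0.5 the cumulative probability at position 12 comes out as 0.583740.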
Waiting Time Formulas
Predictions: If k = 3, p = .5, n = 12
1. In what position [0..12] is it most likely to get a first occurrence of 3 ones in a row?
2. By what position will there be a cumulative probability of 30% to see a first occurrence of 3 ones in a row?
3. What is the likely cumulative probability of getting 3 ones in a row anywhere up to position 12?
Waiting Time Formulas
Calculated probabilities:
Probabilities of first occurrence of patterns in coin toss sequences
P(1) = 0.5, P(0) = 0.5, P(1)^2 = 0.25, P(1)^3 = 0.125
Pattern: HHH, P(111)

| Position | Probability | Cumulative |
|---|---|---|
| 1 | 0.000000 | 0.000000 |
| 2 | 0.000000 | 0.000000 |
| 3 | 0.125000 | 0.125000 |
| 4 | 0.062500 | 0.187500 |
| 5 | 0.062500 | 0.250000 |
| 6 | 0.062500 | 0.312500 |
| 7 | 0.054688 | 0.367188 |
| 8 | 0.050781 | 0.417969 |
| 9 | 0.046875 | 0.464844 |
| 10 | 0.042969 | 0.507813 |
| 11 | 0.039551 | 0.547363 |
| 12 | 0.036377 | 0.583740 |
Q2: How many character matches will be detected?
This is known as the coverage problem.
Coverage: Given a Bernoulli sequence with generating probability p and length n, what is the probability distribution for the number of ones contained in runs of k or more ones?
Specific sequence example: Let k = 3, n = 19
Sequence 1: AAAGCTTCGGAGTGCCCGA
Sequence 2: AATGCATCGGGGTGCCTGA
Bit string: 1101101111011111011
Total character matches detected is 9: only the runs of 4 and 5 ones reach length k; the three runs of 2 ones are missed.
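A minimal sketch of the coverage count for one specific bit string: sum the lengths of all runs of ones that are at least k long. The function name is illustrative.

```python
import re

def coverage(bits, k):
    """Number of ones that lie inside runs of k or more consecutive ones."""
    return sum(len(run) for run in re.findall(r"1+", bits) if len(run) >= k)

bits = "1101101111011111011"
print(coverage(bits, 3))  # 9
print(coverage(bits, 2))  # 15
```

Of the 15 matching columns in this alignment, a window of size 3 detects 9, while a window of size 2 would detect all 15.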
Data Structure – modified Aho-Corasick Tree
Seed is 1*1**1.
Tree represents all patterns obtained by replacing each * by either 0 or 1.
Fail links in AC tree go to longest match between a string suffix and a prefix of a pattern.
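A minimal sketch of the pattern set that such a tree represents, plus a direct check of where the seed fits a bit string. It does not build the Aho-Corasick tree or its fail links; the function names are illustrative.

```python
from itertools import product

def expand_seed(seed):
    """All bit patterns obtained by replacing each * in the seed with 0 or 1."""
    patterns = []
    for fill in product("01", repeat=seed.count("*")):
        it = iter(fill)
        patterns.append("".join(next(it) if c == "*" else c for c in seed))
    return patterns

def seed_hits(bits, seed):
    """0-based starts where every 1 of the seed lines up with a 1 in bits."""
    w = len(seed)
    return [i for i in range(len(bits) - w + 1)
            if all(s != "1" or b == "1" for s, b in zip(seed, bits[i:i + w]))]

patterns = expand_seed("1*1**1")
print(len(patterns), patterns)
print(seed_hits("1101101111011111011", "1*1**1"))
```

The three * positions give 2^3 = 8 patterns, from 101001 to 111111.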
Recurrence Formula
Q2: How many character matches will be detected?
Basic Assumption
We assume that two, mutated, adjacent copies of a pattern will contain runs of exact matches.
T A T A C G T C T C C A C G G A
(Figure labels: d, d)
We identify the runs with seeds.
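A minimal sketch of this idea, assuming a seed is simply an exact window of k letters: for each window, record the distance back to the previous occurrence of the same contents, then tally the distances. The function name and k = 3 are illustrative; this is not the TRF implementation.

```python
from collections import Counter

def seed_distances(seq, k=3):
    """Return (position, distance) for each window that repeats an earlier window."""
    last_seen = {}
    hits = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            hits.append((i, i - last_seen[word]))
        last_seen[word] = i
    return hits

hits = seed_distances("TATACGTCTCCACGGA", k=3)
by_distance = Counter(d for _, d in hits)
print(hits)         # [(11, 8)]
print(by_distance)  # Counter({8: 1})
```

In the slide's sequence the only repeated window is ACG, at positions 3 and 11, which suggests a candidate pattern size of d = 8.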
The TRF Algorithm Outline
Criteria for Recognition
• Are there enough matches at a common distance?
• Are there enough matches if nearby distances are included?
• Do the matches start close enough to the left end?
![Page 53: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/53.jpg)
![Page 54: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/54.jpg)
Tools
![Page 55: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/55.jpg)
Selecting a Data Set
![Page 56: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/56.jpg)
Viewing a Data Set
![Page 57: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/57.jpg)
TRF Characteristics Table
![Page 58: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/58.jpg)
Filter for large patterns with many copies
![Page 59: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/59.jpg)
More information about a single repeat
![Page 60: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/60.jpg)
Single repeat view
![Page 61: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/61.jpg)
Filter for Gene Overlap or Proximity
![Page 62: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/62.jpg)
Filters for Triplets in Genes
![Page 63: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/63.jpg)
Triplets in Genes
![Page 64: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/64.jpg)
Changing Visible Columns
![Page 65: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/65.jpg)
Changing Visible Columns
![Page 66: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/66.jpg)
Link for Annotations
![Page 67: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/67.jpg)
Annotations
![Page 68: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/68.jpg)
Information link to the Source Database
![Page 69: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/69.jpg)
Following the Information link to the UCSC Browser
![Page 70: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/70.jpg)
The TRDB Browser link
![Page 71: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/71.jpg)
The TRDB Browser
![Page 72: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/72.jpg)
Distributions for a Data Set
![Page 73: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/73.jpg)
Pattern size distribution Human chr. 1: sizes 1 - 60
![Page 74: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/74.jpg)
Pattern size distribution Human chr. 1: sizes 60 - 120
![Page 75: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/75.jpg)
Pattern size distribution Drosophila chr. 2R: sizes 1 - 60
![Page 76: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/76.jpg)
Clustering repeats
![Page 77: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/77.jpg)
Clustering repeats
![Page 78: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/78.jpg)
Human Chr. 15 Family
![Page 79: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/79.jpg)
Human Chr. 1 Family
![Page 80: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/80.jpg)
Data Download
![Page 81: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/81.jpg)
Data Download
![Page 82: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/82.jpg)
Using TRF on your own data
![Page 83: Bioinformatic and Comparative Genome Analysis course, Institut Pasteur, July 5 - July 17, 2010 Methods for Repeat Detection In Nucleotide Sequences Gary](https://reader031.vdocuments.us/reader031/viewer/2022013011/56649d2d5503460f94a04a06/html5/thumbnails/83.jpg)
Uploading a Sequence