exploring protein sequences
![Page 1: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/1.jpg)
Exploring Protein Sequences
Tutorial 5
![Page 2: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/2.jpg)
Exploring Protein Sequences
• Multiple alignment
  – ClustalW
• Motif discovery
  – MEME
  – JASPAR
![Page 3: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/3.jpg)
Multiple Sequence Alignment
• More than two sequences
  – DNA
  – Protein
• Evolutionary relation
  – Homology, phylogenetic tree
  – Detect motif
[Figure: example DNA sequences aligned with gaps, shown with a tree whose leaves are labeled A, B, C and D]
![Page 4: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/4.jpg)
Multiple Sequence Alignment
• Dynamic Programming
  – Optimal alignment
  – Exponential in the number of sequences
• Progressive
  – Efficient
  – Heuristic
[Figure: example DNA sequences aligned with gaps, shown with a tree whose leaves are labeled A, B, C and D]
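The trade-off between the two strategies can be made concrete with a minimal sketch. The function below fills the standard dynamic-programming table for two sequences. The scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions, not ClustalW defaults. The helper `dp_cells` (a hypothetical name) counts the table cells that an exact alignment of N sequences of equal length would need.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # prev[j] holds the best score for a[:i-1] against b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def dp_cells(length, n_sequences):
    """Cells in the full DP table for n_sequences sequences of equal length."""
    return (length + 1) ** n_sequences


if __name__ == "__main__":
    print(nw_score("GTCGTAGTCG", "GTCTAGCG"))
    for n in (2, 3, 4):
        print(n, "sequences:", dp_cells(100, n), "cells")
```

The cell count grows by a factor of about the sequence length for every added sequence, which is why exact dynamic programming is practical for pairs but not for large sets.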
![Page 5: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/5.jpg)
ClustalW
“CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J. D. Thompson, D. G. Higgins and T. J. Gibson, Nucleic Acids Research 22(22):4673-4680, 1994
![Page 6: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/6.jpg)
ClustalW
• Progressive
  – At each step, align two existing alignments or sequences
  – Gaps present in older alignments remain fixed
[Figure: gapped DNA sequence fragments shown above an ungapped DNA sequence]
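The "gaps remain fixed" rule can be sketched as a single progressive step: a new sequence is aligned against the columns of an existing alignment, and the old rows only ever receive new all-gap columns. This is a simplified illustration, not ClustalW's actual profile alignment: the names `column_score` and `add_sequence`, the averaging of column scores, and the scoring values are all assumptions made for the example.

```python
def column_score(column, residue, match, mismatch, gap):
    """Average score of one residue against every symbol in a column."""
    total = 0
    for symbol in column:
        if symbol == "-":
            total += gap
        elif symbol == residue:
            total += match
        else:
            total += mismatch
    return total / len(column)


def add_sequence(alignment, seq, match=1, mismatch=-1, gap=-2):
    """Add seq to an alignment (list of equal-length strings).

    Old columns are kept exactly as they are; the only changes are
    new all-gap columns in the old rows and gaps in the new row.
    """
    columns = list(zip(*alignment))
    n, m = len(columns), len(seq)

    def pair(i, j):
        return column_score(columns[i - 1], seq[j - 1], match, mismatch, gap)

    # score[i][j]: best score of the first i columns vs. the first j residues
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, building rows in reverse.
    old_rows = [[] for _ in alignment]
    new_row = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            take_column, take_residue = True, True
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            take_column, take_residue = True, False
        else:
            take_column, take_residue = False, True
        symbols = columns[i - 1] if take_column else "-" * len(alignment)
        for row, symbol in zip(old_rows, symbols):
            row.append(symbol)
        new_row.append(seq[j - 1] if take_residue else "-")
        if take_column:
            i -= 1
        if take_residue:
            j -= 1
    return ["".join(reversed(r)) for r in old_rows], "".join(reversed(new_row))


if __name__ == "__main__":
    old, new = add_sequence(["AC-GT", "ACTGT"], "ACGT")
    print("\n".join(old + [new]))
```

Because earlier columns are never revisited, an early misplaced gap propagates to every later step, which is the price of the heuristic's speed.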
![Page 7: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/7.jpg)
ClustalW - Input
Scoring matrix
Gap scoring
Input sequences
![Page 8: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/8.jpg)
ClustalW - Output
![Page 9: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/9.jpg)
ClustalW - Output
Input sequences
Pairwise alignment scores
Building alignment
Final score
![Page 10: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/10.jpg)
ClustalW - Output
![Page 11: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/11.jpg)
ClustalW - Output
Sequence names
Sequence positions
Match strength in decreasing order: * : .
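The three match symbols can be reproduced with a short sketch. It marks a column `*` when all residues are identical, `:` when they all fall within one "strong" similarity group, and `.` when they all fall within one "weak" group. The group lists below are the ones commonly quoted from the Clustal documentation and should be treated as an assumption here; the function name `conservation_line` is hypothetical. The six sequences are the example used on the motif slide of this tutorial.

```python
# Residue groups as commonly listed for Clustal output (assumed, not verified
# against a specific ClustalW release).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(rows):
    """Return one symbol per alignment column: '*', ':', '.' or ' '."""
    line = []
    for column in zip(*rows):
        residues = set(column)
        if "-" in residues:
            line.append(" ")          # columns with gaps are not scored
        elif len(residues) == 1:
            line.append("*")          # fully conserved
        elif any(residues <= set(group) for group in STRONG):
            line.append(":")          # all residues in one strong group
        elif any(residues <= set(group) for group in WEAK):
            line.append(".")          # all residues in one weak group
        else:
            line.append(" ")
    return "".join(line)


if __name__ == "__main__":
    rows = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
            "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
    for row in rows:
        print(row)
    print(conservation_line(rows))
```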
![Page 12: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/12.jpg)
http://www.megasoftware.net/
![Page 13: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/13.jpg)
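Can we find motifs using multiple sequence alignment?

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0 | 0 | 0 | 0 | 0 | 0.5 | 1/6 | 1/3 | 0 | 0 |
| D | 0 | 0.5 | 1/3 | 0 | 0 | 1/6 | 5/6 | 1/6 | 0 | 1/6 |
| E | 0 | 0 | 2/3 | 1 | 0 | 0 | 0 | 0 | 1 | 5/6 |
| G | 0 | 1/6 | 0 | 0 | 1 | 1/3 | 0 | 0 | 0 | 0 |
| H | 0 | 1/6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| N | 0 | 1/6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Y | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0 | 0 |

  1 3 5 7 9
..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
  * :**   *:

Motif: a widespread pattern with biological significance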
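The profile on this slide can be recomputed from the six aligned sequences by counting residues column by column. The sketch below does this with exact fractions; the function name `frequency_matrix` is an assumption made for the example.

```python
from collections import Counter
from fractions import Fraction


def frequency_matrix(rows):
    """Per-column residue frequencies of an ungapped alignment."""
    n = len(rows)
    return [
        {residue: Fraction(count, n) for residue, count in Counter(column).items()}
        for column in zip(*rows)
    ]


if __name__ == "__main__":
    rows = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
            "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
    for position, column in enumerate(frequency_matrix(rows), start=1):
        cells = ", ".join(f"{r}={f}" for r, f in sorted(column.items()))
        print(position, cells)
```

Each column's frequencies sum to 1, which is a quick sanity check for any hand-built profile.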
![Page 14: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/14.jpg)
Can we find motifs using multiple sequence alignment?
YES! NO
![Page 15: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/15.jpg)
MEME – Multiple EM for Motif Elicitation
• http://meme.sdsc.edu/
• Motif discovery from unaligned sequences
  – Genomic or protein sequences
• Flexible model of motif presence (a motif can be absent from some sequences or appear several times in one sequence)
![Page 16: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/16.jpg)
MEME - Input
Email address
Multiple input sequences
How many times in each sequence?
How many motifs?
How many sites?
Range of motif lengths
![Page 17: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/17.jpg)
MEME - Output
Motif length
Number of times
Like BLAST
![Page 18: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/18.jpg)
MEME - Output
Probability * 10
‘a’=10, ‘:’=0
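The one-character encoding described on this slide can be sketched as follows: each probability is multiplied by 10 and written as a single symbol, with `a` standing for 10 and `:` for 0. Rounding to the nearest integer is an assumption made here, and `simplified_symbol` is a hypothetical name.

```python
def simplified_symbol(probability):
    """Encode a probability in [0, 1] as one character (probability * 10)."""
    level = int(probability * 10 + 0.5)   # assumed: round to nearest integer
    if level == 0:
        return ":"
    if level == 10:
        return "a"
    return str(level)


if __name__ == "__main__":
    for p in (0.0, 1 / 6, 1 / 3, 0.5, 5 / 6, 1.0):
        print(f"{p:.3f} -> {simplified_symbol(p)}")
```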
![Page 19: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/19.jpg)
MEME - Output
Low uncertainty = High information content
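The link between uncertainty and information content can be illustrated per motif column: the information content is the maximum possible entropy minus the column's Shannon entropy. The sketch below assumes a uniform background over 20 amino acids and no small-sample correction, so it is a simplification rather than MEME's exact calculation; `information_content` is a hypothetical name.

```python
import math


def information_content(column, alphabet_size=20):
    """Bits of information in one column given as {residue: probability}."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(alphabet_size) - entropy


if __name__ == "__main__":
    conserved = {"E": 1.0}
    mixed = {"D": 0.5, "G": 1 / 6, "H": 1 / 6, "N": 1 / 6}
    print(round(information_content(conserved), 3))
    print(round(information_content(mixed), 3))
```

A fully conserved column has zero entropy and therefore the maximum information content, while a column with several equally likely residues scores lower.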
![Page 20: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/20.jpg)
MEME - Output
Multilevel Consensus
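A multilevel consensus can be derived from a motif's frequency matrix. The sketch below assumes the rule usually given for MEME output: in each column the residues are sorted by decreasing probability, only residues with probability of at least 0.2 are kept, and level k of the consensus shows the k-th most probable residue. The function names and the example sequences (taken from the motif slide of this tutorial) are choices made for illustration.

```python
from collections import Counter


def frequency_matrix(rows):
    """Per-column residue frequencies of an ungapped alignment."""
    n = len(rows)
    return [{r: c / n for r, c in Counter(col).items()} for col in zip(*rows)]


def multilevel_consensus(matrix, threshold=0.2):
    """Return consensus levels as strings, most probable residues first."""
    ranked = []
    for column in matrix:
        kept = [r for r, p in column.items() if p >= threshold]
        # Most probable first; ties broken alphabetically.
        kept.sort(key=lambda r: (-column[r], r))
        ranked.append(kept)
    depth = max((len(kept) for kept in ranked), default=0)
    return [
        "".join(kept[level] if level < len(kept) else " " for kept in ranked)
        for level in range(depth)
    ]


if __name__ == "__main__":
    rows = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
            "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
    for level in multilevel_consensus(frequency_matrix(rows)):
        print(level)
```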
![Page 21: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/21.jpg)
MEME - Output
Sequence names
Reverse complement (genomic input only)
Position in sequence
Strength of match
Motif within sequence
![Page 22: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/22.jpg)
MEME - Output
Overall strength of motif matches
Sequence lengths
Motif instance
‘-’ = other strand
![Page 23: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/23.jpg)
MAST
• Searches for motifs (one or more) in sequence databases:
  – Like BLAST, but with motifs as input
  – Similar to iterations of PSI-BLAST
• Profile defines strength of match
  – Multiple motif matches per sequence
  – Combined E-value for all motifs
• MEME uses MAST to summarize results:
  – Each MEME result is accompanied by the MAST result of searching for the discovered motifs in the given sequences.
![Page 24: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/24.jpg)
JASPAR
• Profiles
  – Transcription factor binding sites
  – Multicellular eukaryotes
  – Derived from published collections of experiments
• Open data access
![Page 25: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/25.jpg)
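JASPAR
• Profiles
  – Modeled as matrices
  – Can be converted into a PSSM for scanning genomic sequences

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0 | 0 | 0 | 0 | 0 | 0.5 | 1/6 | 1/3 | 0 | 0 |
| D | 0 | 0.5 | 1/3 | 0 | 0 | 1/6 | 5/6 | 1/6 | 0 | 1/6 |
| E | 0 | 0 | 2/3 | 1 | 0 | 0 | 0 | 0 | 1 | 5/6 |
| G | 0 | 1/6 | 0 | 0 | 1 | 1/3 | 0 | 0 | 0 | 0 |
| H | 0 | 1/6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| N | 0 | 1/6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Y | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0 | 0 |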
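Converting a profile into a PSSM and scanning a sequence with it can be sketched in a few lines. The count matrix below is made up for illustration and is not a JASPAR entry. The uniform background of 0.25, the pseudocount of 1 and the function names `to_pssm` and `scan` are likewise assumptions.

```python
import math


def to_pssm(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores."""
    length = len(counts["A"])
    pssm = []
    for i in range(length):
        total = sum(counts[base][i] for base in "ACGT")
        column = {}
        for base in "ACGT":
            # The pseudocount keeps unseen bases from scoring minus infinity.
            p = (counts[base][i] + pseudocount * background) / (total + pseudocount)
            column[base] = math.log2(p / background)
        pssm.append(column)
    return pssm


def scan(pssm, sequence):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pssm)
    return [
        (start, sum(col[sequence[start + k]] for k, col in enumerate(pssm)))
        for start in range(len(sequence) - width + 1)
    ]


if __name__ == "__main__":
    # Hypothetical counts for a 4-bp site with consensus TATA.
    counts = {"A": [0, 10, 0, 10], "C": [0, 0, 0, 0],
              "G": [0, 0, 0, 0], "T": [10, 0, 10, 0]}
    hits = scan(to_pssm(counts), "GGTATACC")
    best_start, best_score = max(hits, key=lambda hit: hit[1])
    print(best_start, round(best_score, 2))
```

In practice a score threshold decides which windows are reported as putative binding sites; choosing it is a separate question that this sketch does not address.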
![Page 26: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/26.jpg)
Search profile
http://jaspar.cgb.ki.se/
![Page 27: Exploring Protein Sequences](https://reader035.vdocuments.us/reader035/viewer/2022081520/568159a6550346895dc70a3a/html5/thumbnails/27.jpg)
http://jaspar.cgb.ki.se/