MGM workshop. 19 Oct 2010
Some frequently-used Bioinformatics Tools
Konstantinos Mavrommatis
Prokaryotic Superprogram
Outline
Pairwise Alignment: Global/Local, Scoring
BLAST, BLAT, SIM, LALIGN, Dotlet, Ublast
Multiple Sequence Alignment: ClustalW, Kalign, MAFFT, Muscle, T-Coffee, MSA, DIALIGN, Match-Box, Multalin, MUSCA
Phylogenetic analysis and tree construction: BIONJ, DendroUPGMA, PHYLIP, PhyML, Phylogeny.fr, POWER, BlastO, TraceSuite II
HMM: Protein family profiles
http://expasy.org/tools/
Alignment
Insert spaces (gaps) at arbitrary locations so that both sequences end up the same length and no two spaces fall in the same position.
Find the arrangement of the two sequences that identifies regions of similarity.
Alignment methods: Dot plots
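A dot plot places one sequence on each axis and marks every position where the two agree, so that similar regions show up as diagonal runs. A minimal sketch, assuming exact matching of fixed-length words; the function name `dot_plot` and the `window` parameter are illustrative and do not come from the slides:

```python
def dot_plot(seq_x, seq_y, window=1):
    """Return one text row per position of seq_y; '*' marks a matching word."""
    rows = []
    for i in range(len(seq_y) - window + 1):
        row = ""
        for j in range(len(seq_x) - window + 1):
            # Compare the word starting at i in seq_y with the word at j in seq_x.
            same = seq_y[i:i + window] == seq_x[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    for line in dot_plot("GATTACA", "GATCACA"):
        print(line)
```

Raising `window` filters out isolated chance matches at the cost of hiding short similarities.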
Global vs local alignment
Global alignment: an alignment that assumes the two sequences are broadly similar over their entire length.
Local alignment: an alignment that searches for segments of the two sequences that match well.
It may seem that one should always use local alignment. However, each has its applications.
Substitution matrices
http://www.russelllab.org/aas/
Scoring an alignment
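An alignment score is the sum of per-column scores. The sketch below assumes a simple match/mismatch scheme with a linear gap penalty; the numeric values and the name `score_alignment` are illustrative assumptions, and in practice a substitution matrix would supply the per-residue scores:

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Sum the column scores of two equal-length aligned rows ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap        # linear penalty: one charge per gap column
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


if __name__ == "__main__":
    print(score_alignment("AC-T", "ACGT"))
```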
Global alignment
S1 = HGSAQVKGHG
S2 = KTEAEMKASEDLKKHGT
KTEAEMKAESEDLKKHGT
--HG--SA--Q-VKGHG-
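A global alignment like this one can be computed by dynamic programming (the Needleman-Wunsch approach). The following is a minimal sketch, assuming unit match/mismatch scores and a linear gap penalty rather than whatever scoring the slide used, so its output will not necessarily reproduce the alignment shown; `global_align` is an illustrative name:

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (row_a, row_b, score) for one optimal global alignment of a and b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap            # a prefix aligned entirely to gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    row_a, row_b = [], []
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align("HGSAQVKGHG", "KTEAEMKASEDLKKHGT")
    print(top)
    print(bottom)
    print(total)
```

Both returned rows always have the same length, and removing the `-` characters gives back the input sequences.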
Local Alignment
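Local alignment uses the same dynamic-programming table as global alignment, except that cell scores never drop below zero and the traceback starts at the highest-scoring cell (the Smith-Waterman approach). A minimal sketch under assumed scores; the parameter values, the toy sequences and the name `local_align` are illustrative:

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Return (row_a, row_b, score) for one best-scoring local alignment."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The floor at 0 lets an alignment restart anywhere.
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    row_a, row_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), best


if __name__ == "__main__":
    print(local_align("XXXHELLOYYY", "ZZHELLOWW"))
```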
How BLAST works
Query: MLVTTILAFALFKNSYAQQCGSQAGGALCSNRLCCSKFGYCGSTDPYCGTGCQSQCGGGG
Subject (database): VVWMLLVGGSYGVQCGTEAGGALCPRGLCCSQWGWCGSTIDYCGPGCQSQCGG
Common 3mer -> extend: GCQSQCGG
HSP:
Score = 66.6 bits (161), Expect = 3e-12, Method: Compositional matrix adjust.
Identities = 32/53 (60%), Positives = 39/53 (74%), Gaps = 0/53 (0%)
Query  6   ILAFALFKNSYAQQCGSQAGGALCSNRLCCSKFGYCGSTDPYCGTGCQSQCGG  58
           ++   L   SY  QCG++AGGALC   LCCS++G+CGST  YCG GCQSQCGG
Sbjct  15  VVWMLLVGGSYGVQCGTEAGGALCPRGLCCSQWGWCGSTIDYCGPGCQSQCGG  67
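The seed-and-extend idea can be sketched on the query and subject shown on this slide. This is a deliberate simplification: it uses exact 3-mer matches and extends only while residues are identical, whereas BLAST itself scores neighbourhood words and extensions with a substitution matrix. The function names are illustrative:

```python
QUERY = "MLVTTILAFALFKNSYAQQCGSQAGGALCSNRLCCSKFGYCGSTDPYCGTGCQSQCGGGG"
SUBJECT = "VVWMLLVGGSYGVQCGTEAGGALCPRGLCCSQWGWCGSTIDYCGPGCQSQCGG"


def shared_words(query, subject, k=3):
    """Map each k-mer found in both sequences to (query positions, subject positions)."""
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    hits = {}
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        if word in index:
            hits.setdefault(word, ([], index[word]))[0].append(i)
    return hits


def extend_exact(query, subject, qi, si, k=3):
    """Grow a seed at (qi, si) left and right while residues stay identical."""
    left = 0
    while (qi - left > 0 and si - left > 0
           and query[qi - left - 1] == subject[si - left - 1]):
        left += 1
    right = k
    while (qi + right < len(query) and si + right < len(subject)
           and query[qi + right] == subject[si + right]):
        right += 1
    return query[qi - left:qi + right]


if __name__ == "__main__":
    seeds = shared_words(QUERY, SUBJECT)
    print(sorted(seeds))
    print(extend_exact(QUERY, SUBJECT, QUERY.find("GCQ"), SUBJECT.find("GCQ")))
```

Starting from the shared word `GCQ`, the exact extension recovers the `GCQSQCGG` stretch marked on the slide.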
Types of BLAST
Nucleic sequence: atcgatatatatagactgactgact
Protein sequence: MTAVYHILRALRARARVARARVH
Program: Query -> Database
blastn: nucleic sequence -> nucleic acids sequence database
blastp: protein sequence -> protein sequences database
blastx: nucleic sequence (6 frame translation) -> protein sequences database
tblastn: protein sequence -> nucleic acids sequence database (6 frame translation)
tblastx: nucleic sequence (6 frame translation) -> nucleic acids sequence database (6 frame translation)
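The 6 frame translation used by blastx, tblastn and tblastx reads a nucleotide sequence in three offsets on each strand. A minimal sketch, assuming the standard genetic code and writing `X` for any codon containing a non-ACGT character; the helper names are illustrative:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations of frames +1, +2, +3, then -1, -2, -3."""
    forward = dna.upper()
    reverse = forward.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return [translate(strand[offset:])
            for strand in (forward, reverse)
            for offset in range(3)]


if __name__ == "__main__":
    for frame in six_frames("atcgatatatatagactgactgact"):
        print(frame)
```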
Exact multiple alignment by dynamic programming
Complexity = $O(N^{S} \, 2^{S} \, S^{2})$
N: length of sequences
S: number of sequences
Only feasible for 4-5 sequences max.
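To see why exact alignment stops being practical so quickly, the cost can be tabulated for a fixed sequence length. The sketch below assumes the operation count grows as N^S * 2^S * S^2 with constant factors ignored; the length of 100 and the function name are arbitrary choices for illustration:

```python
def exact_msa_cost(n, s):
    """Rough operation count N^S * 2^S * S^2 for S sequences of length N."""
    return n ** s * 2 ** s * s ** 2


if __name__ == "__main__":
    for s in range(2, 7):
        # Each extra sequence multiplies the work by more than 2 * N.
        print(s, f"{exact_msa_cost(100, s):.2e}")
```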
Neighbor Joining
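One neighbor-joining step picks the pair of taxa that minimises the Q criterion, Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of distances from taxon i, and then assigns branch lengths from the pair to their new parent node. A minimal sketch of that single step; the five-taxon distance matrix is an illustrative example rather than data from the slides, and `nj_step` is a hypothetical name:

```python
def nj_step(dist):
    """Return the pair (i, j) to join and their branch lengths to the new node."""
    n = len(dist)
    totals = [sum(row) for row in dist]          # r_i: summed distance from taxon i
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    limb_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    limb_j = dist[i][j] - limb_i
    return (i, j), (limb_i, limb_j)


if __name__ == "__main__":
    example = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_step(example))
```

A full tree is built by replacing the joined pair with the new node, recomputing distances to it, and repeating until three nodes remain.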
Unrooted NJ tree
Comparison of multiple sequence alignment programs
Primary sequence changes:
Profiles
CGGSV
0.8 * 0.4 * 0.8 * 0.6 * 0.2 = .031
ln(0.8)+ln(0.4)+ln(0.8)+ln(0.6)+ln(0.2) = -3.48
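The two lines above are the same score expressed two ways: the product of the per-position probabilities and the sum of their natural logarithms. A short check of that arithmetic, using the five probabilities from the slide for the sequence CGGSV (the variable names are illustrative):

```python
import math

# Per-position probabilities for C, G, G, S, V, as given on the slide.
column_probs = [0.8, 0.4, 0.8, 0.6, 0.2]

probability = math.prod(column_probs)
log_score = sum(math.log(p) for p in column_probs)

if __name__ == "__main__":
    print(round(probability, 3))   # product of the probabilities
    print(round(log_score, 2))     # sum of the natural logs
```

Working in log space avoids numerical underflow when a profile has many positions, since sums of logs stay in a comfortable range while long products shrink toward zero.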
Hidden Markov Models
Assumptions:
Observations are ordered
Random process can be represented by a stochastic finite state machine with emitting states
Probabilistic parameters of a Hidden Markov Model
x – states, y – possible observations
a – state transition probabilities, b – output/emission probabilities
HMM estimation, usage & applications
Training/Estimation
Feed an architecture (given in advance) a set of observation sequences
The training process will iteratively alter its parameters to fit the training set
The trained model will assign the training sequences high probabilities
Usage
Evaluate the probability of an observation sequence given the model (Forward)
Find the most likely path through the model for a given observation sequence (Viterbi)
Applications
Gene finding
Protein family modeling
…
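The two usage tasks listed above, Forward and Viterbi, can be sketched on a toy model. Everything about this model is a made-up assumption for illustration: two states `S1` and `S2`, two observation symbols `A` and `B`, and the probabilities below. Training is not shown.

```python
STATES = ("S1", "S2")
START = {"S1": 0.5, "S2": 0.5}
TRANS = {"S1": {"S1": 0.9, "S2": 0.1},      # a: state transition probabilities
         "S2": {"S1": 0.1, "S2": 0.9}}
EMIT = {"S1": {"A": 0.9, "B": 0.1},         # b: emission probabilities
        "S2": {"A": 0.2, "B": 0.8}}


def forward(obs):
    """Probability of the observation sequence, summed over all state paths."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for symbol in obs[1:]:
        alpha = {s: EMIT[s][symbol] * sum(alpha[r] * TRANS[r][s] for r in STATES)
                 for s in STATES}
    return sum(alpha.values())


def viterbi(obs):
    """Return (probability, path) of the single most likely state path."""
    best = {s: (START[s] * EMIT[s][obs[0]], [s]) for s in STATES}
    for symbol in obs[1:]:
        step = {}
        for s in STATES:
            # Best way to arrive in state s from any previous state r.
            prob, path = max((best[r][0] * TRANS[r][s], best[r][1]) for r in STATES)
            step[s] = (prob * EMIT[s][symbol], path + [s])
        best = step
    return max(best.values())


if __name__ == "__main__":
    print(forward("AAB"))
    print(viterbi("AAB"))
```

Forward sums over every path while Viterbi keeps only the best one, so the Viterbi probability can never exceed the Forward probability for the same observations.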
Profile HMMs
Families of functional biological sequences
Primary sequences have diverged due to evolution, while maintaining structure/function.
Questions:
Does a biological sequence belong to a certain protein family? For example, is a given protein (sequence) a globin?
Given a set of sequences, find more sequences of the same family
Trade-offs
Advantages
•Statistics
•Modularity
•Transparency
•Prior knowledge
Disadvantages
•State independence
•Over-fitting
•Local maxima
•Speed
Questions?