Multiple Sequence Alignment
Definition
• Homology: related by descent
• Homologous sequence positions
ATTGCGC ATTGCGC
ATCCGCC
ATTGCGC AT-CCGC
ATTGCGC
Reasons for aligning sets of sequences
• Organise data to reflect sequence homology
• Infer phylogenetic trees from homologous sites
• Highlight conserved sites/regions
• Highlight variable sites/regions
• Uncover changes in gene structure
• Summarise information
Alignments help to organise, visualise and analyse sequence data
The process of aligning sequences is a game of playing gaps off against mismatches
Ways of aligning multiple sequences
• By hand
• Automated
• Combination
Definition
Optimality criteria: some kind of rule or scoring scheme that helps you decide which alignment you consider to be the best
Pairwise vs Multiple Sequences
• Pairs of sequences are typically aligned using exhaustive algorithms (dynamic programming)
– complexity of exhaustive methods is $O(2^n m^n)$, where n = number of sequences and m = sequence length
• Multiple sequence alignment uses heuristic methods
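As an illustration of the exhaustive (dynamic programming) approach for a pair of sequences, here is a minimal Needleman-Wunsch style sketch. The function name and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for the example and do not come from the slides.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences by dynamic programming (linear gap cost).

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ATTGCGC", "ATCCGC")
print(best)
print(top)
print(bottom)
```

The table has one cell per pair of prefix lengths, which is why the exhaustive approach is practical for two sequences but grows out of reach as more sequences are added. When several alignments tie for the best score, the traceback here returns only one of them.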
The Correct Alignment
ATTGCGC
ATTGCGC
ATCCGCC
ATTGCGC AT-CCGC
ATTGCGC ATC-CGC
ATTGCGC
The Correct Alignment

| | Correct according to optimality criteria | Correct according to homology |
|---|---|---|
| Exhaustive methods | Always | Not always |
| Heuristic methods | Not always | Not always |
• Sequence alignment is easy with sufficiently closely related sequences
• Below a certain level of identity, sequence alignment may become meaningless – the "twilight zone", roughly 30% identity for amino acid sequences
• In the twilight zone it is good to make use of additional information if possible (e.g. structure)
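To judge whether a pair of aligned sequences is near the twilight zone, percent identity can be computed from the two alignment rows. The sketch below assumes one common convention (columns containing a gap in either row are ignored); other conventions exist and give different numbers. The function name is hypothetical.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity between two rows of an alignment.

    Assumption: columns where either row has a gap are skipped, so the
    denominator is the number of columns with a residue in both rows.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(round(percent_identity("ATTGCGC", "AT-CCGC"), 1))
```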
Consensus Sequences
• Simplest form: a single sequence which represents the most common amino acid/base in each position
Y D D G A V - E A L
Y D G G - - - E A L
F E G G I L V E A L
F D - G I L V Q A V
Y E G G A V V Q A L
Consensus:
Y D G G A/I V/L V E A L
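The simplest form of consensus can be sketched as a per-column majority vote. This is a minimal illustration, assuming that gaps are ignored when counting and that ties are reported as slash-separated alternatives in alphabetical order (so a tie between V and L prints as L/V rather than V/L). The function name is hypothetical.

```python
from collections import Counter


def consensus(rows, gap="-"):
    """Return the most common residue per column; ties are joined with '/'."""
    result = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        if not counts:
            # Column made only of gaps.
            result.append(gap)
            continue
        top = max(counts.values())
        winners = sorted(c for c, n in counts.items() if n == top)
        result.append("/".join(winners))
    return result


alignment = [
    "YDDGAV-EAL",
    "YDGG---EAL",
    "FEGGILVEAL",
    "FD-GILVQAV",
    "YEGGAVVQAL",
]
print(" ".join(consensus(alignment)))
```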
Multiple Alignment Formats
e.g. Clustal, Phylip, MSF, MEGA, etc.
Clustal Format
CLUSTAL X (1.81) multiple sequence alignment

CAS1_BOVIN  MKLLILTCLVAVALARPKHPIKHQGLPQ--------EVLNEN-
CAS1_SHEEP  MKLLILTCLVAVALARPKHPIKHQGLSP--------EVLNEN-
CAS1_PIG    MKLLIFICLAAVALARPKPPLRHQEHLQNEPDSRE--------
CAS1_HUMAN  MRLLILTCLVAVALARPKLPLRYPERLQNPSESSE--------
CAS1_RABBIT MKLLILTCLVATALARHKFHLGHLKLTQEQPESSEQEILKERK
CAS1_MOUSE  MKLLILTCLVAAAFAMPRLHSRNAVSSQTQ------QQHSSSE
CAS1_RAT    MKLLILTCLVAAALALPRAHRRNAVSSQTQ-------------
            *:***: **.*.*:* : . :
Phylip Format (Interleaved)
7 100
SOMA_BOVIN MMAAGPRTSL LLAFALLCLP WTQVVGAFPA MSLSGLFANA VLRAQHLHQL
SOMA_SHEEP MMAAGPRTSL LLAFTLLCLP WTQVVGAFPA MSLSGLFANA VLRAQHLHQL
SOMA_RAT_P -MAADSQTPW LLTFSLLCLL WPQEAGAFPA MPLSSLFANA VLRAQHLHQL
SOMA_MOUSE -MATDSRTSW LLTVSLLCLL WPQEASAFPA MPLSSLFSNA VLRAQHLHQL
SOMA_RABIT -MAAGSWTAG LLAFALLCLP WPQEASAFPA MPLSSLFANA VLRAQHLHQL
SOMA_PIG_P -MAAGPRTSA LLAFALLCLP WTREVGAFPA MPLSSLFANA VLRAQHLHQL
SOMA_HUMAN -MATGSRTSL LLAFGLLCLP WLQEGSAFPT IPLSRLFDNA MLRAHRLHQL

           AADTFKEFER TYIPEGQRYS -IQNTQVAFC FSETIPAPTG KNEAQQKSDL
           AADTFKEFER TYIPEGQRYS -IQNTQVAFC FSETIPAPTG KNEAQQKSDL
           AADTYKEFER AYIPEGQRYS -IQNAQAAFC FSETIPAPTG KEEAQQRTDM
           AADTYKEFER AYIPEGQRYS -IQNAQAAFC FSETIPAPTG KEEAQQRTDM
           AADTYKEFER AYIPEGQRYS -IQNAQAAFC FSETIPAPTG KDEAQQRSDM
           AADTYKEFER AYIPEGQRYS -IQNAQAAFC FSETIPAPTG KDEAQQRSDV
           AFDTYQEFEE AYIPKEQKYS FLQNPQTSLC FSESIPTPSN REETQQKSNL
Phylip Format (Sequential)
3 100
Rat    ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGTTAATGGCCGTGGTGGCTGGAGTGGCCAGTGCCCTGGCTCACAAGTACCACTAA
Mouse  ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGTCTCTTGCCTTGGGGAAAGGTGAACTCCGATGAAGTTGGTGGTGAGGCCCTGGG
Rabbit ATGGTGCATCTGTCCAGT---GAGGAGAAGTCTGCGGTCACTGCTGGGGCAAGGTGAATGTGGAAGAAGTTGGTGGTGAGGCCCTGGG
Mega Format
#mega
TITLE: No title

#Rat      ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGT
#Mouse    ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGT
#Rabbit   ATGGTGCATCTGTCCAGT---GAGGAGAAGTCTGC
#Human    ATGGTGCACCTGACTCCT---GAGGAGAAGTCTGC
#Oppossum ATGGTGCACTTGACTTTT---GAGGAGAAGAACTG
#Chicken  ATGGTGCACTGGACTGCT---GAGGAGAAGCAGCT
#Frog     ---ATGGGTTTGACAGCACATGATCGT---CAGCT
Progressive Multiple Alignment
• Heuristic
• Perform pairwise alignments
• Align sequences to alignments or alignments to existing alignments (profile alignments)
• Do the alignments in some sensible order
Iterative methods
• Several progressive alignment methods can be iterated
– e.g. Barton-Sternberg, ClustalX
ClustalX Algorithm
• Perform alignments and calculate distances for all pairs of sequences
• Construct guide tree (dendrogram) joining the most similar sequences using Neighbour Joining
• Align sequences, starting at the leaves of the guide tree. This involves pair-wise comparisons as well as comparison of a single sequence with a group of sequences (a profile)
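The first two steps (pairwise distances, then a guide tree that joins the most similar sequences first) can be sketched as follows. This is a simplified illustration and not ClustalX's procedure: it assumes a normalised edit distance in place of alignment-based distances, and average-linkage clustering in place of Neighbour Joining. All function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, used here as a stand-in for an alignment-based distance."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def guide_tree(seqs):
    """Build a join order as nested tuples from a dict of name -> sequence.

    Assumes non-empty sequences. Clusters are merged by smallest average
    pairwise distance (average linkage), not by Neighbour Joining.
    """
    dist = {
        frozenset((a, b)): edit_distance(seqs[a], seqs[b]) / max(len(seqs[a]), len(seqs[b]))
        for a, b in combinations(seqs, 2)
    }
    # Map each subtree to the list of leaf names it contains.
    clusters = {name: [name] for name in seqs}

    def average_distance(t1, t2):
        pairs = [(a, b) for a in clusters[t1] for b in clusters[t2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        t1, t2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        clusters[(t1, t2)] = clusters.pop(t1) + clusters.pop(t2)
    return next(iter(clusters))


example = {"a": "ATTGCGC", "b": "ATTGCGC", "c": "ATCCGC", "d": "GGGGAAAA"}
print(guide_tree(example))
```

Reading the nested tuples from the inside out gives the order in which a progressive aligner would combine sequences and profiles: the innermost pair is aligned first, and each enclosing tuple adds a sequence or profile to the growing alignment.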
• ClustalX is not optimal
• There are known areas in which ClustalX performs badly, e.g.
– errors introduced early cannot be corrected by subsequent information
– alignments of sequences of differing lengths cause strange guide trees and unpredictable effects
– edges: ClustalX does not penalise gaps at edges
• There are alternatives to ClustalX available
Using ClustalX
• Start with sequences in FASTA format (or an existing alignment in Clustal format)
• Choose [Do Alignment] from the Alignment menu
ClustalX Parameters
• Scoring Matrix
• Gap opening penalty
• Gap extension penalty
• Protein gap parameters
• Additional algorithm parameters
• Secondary structure penalties
Score Matrices
• Pairwise matrices and multiple alignment matrix series
• PAM (Dayhoff), BLOSUM (Henikoff), GONNET (default), user defined
• Transition (A<->G, C<->T)/transversion (purine<->pyrimidine, e.g. A<->C) ratio – low for distantly related sequences
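For nucleotide data, the observed transition/transversion balance can be counted directly from two aligned rows. The sketch below uses the standard definitions (a transition swaps purine for purine, A<->G, or pyrimidine for pyrimidine, C<->T; every other substitution is a transversion). The function name is hypothetical, and gaps or ambiguity codes are simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_counts(row_a, row_b):
    """Count transitions and transversions between two aligned DNA rows."""
    transitions = 0
    transversions = 0
    for x, y in zip(row_a.upper(), row_b.upper()):
        # Skip identical columns, gaps, and anything that is not A, C, G or T.
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


ts, tv = ts_tv_counts("A-GT", "G-GA")
print(ts, tv)
```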
Gap Penalties
• Linear gap penalties
– Affine gap penalties: $p = o + l \cdot e$ (o = gap opening penalty, l = gap length, e = gap extension penalty)
• Gap opening
• Gap extension
• Protein specific penalties (on by default)
– Increase the probability of gaps associated with certain residues
– Increase the chances of gaps in loop regions (> 5 hydrophilic residues)
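The affine penalty, opening cost plus gap length times extension cost, is easy to evaluate for a gapped row. In this minimal sketch the function names and the default values for the opening and extension penalties are illustrative assumptions, not program defaults, and the residue-specific adjustments mentioned above are not modelled.

```python
import re


def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.5):
    """Penalty for one gap of the given length: opening cost plus length times extension cost."""
    return gap_open + length * gap_extend


def total_gap_penalty(row, gap_open=10.0, gap_extend=0.5, gap="-"):
    """Sum the affine penalty over every run of gap characters in an alignment row."""
    runs = re.finditer(re.escape(gap) + "+", row)
    return sum(affine_gap_penalty(len(m.group()), gap_open, gap_extend) for m in runs)


print(affine_gap_penalty(3))
print(total_gap_penalty("AT---GC--A"))
```

With an affine scheme one long gap costs less than several short gaps of the same total length, which matches the idea that a single insertion or deletion event can span many residues.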
Algorithm parameters
• Slow-accurate pair-wise alignment
• Do alignment from guide tree
• Reset gaps before aligning (iteration)
• Delay Divergent sequences (%)
Additional displays
• Column Scores
• Low quality regions
• Exceptional residues
Multiple Alignment Strategies
• Align pairs of sequences using an optimal method
• Choose representative sequences to align carefully
• Choose sequences of comparable lengths
• Progressive alignment programs such as ClustalX for multiple alignment
• Progressive alignment programs may be combined
• Review alignment by eye and edit
Alignment of coding regions
• Nucleotide sequences are much harder to align accurately than protein sequences
• Protein coding sequences can be aligned using the protein sequences
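One way to align coding sequences via their proteins is to align the translated sequences first and then thread each unaligned coding sequence back onto its aligned protein row, codon by codon. A minimal sketch, assuming the coding sequence is in frame, starts at the first aligned residue, and contains at least one codon per residue (a trailing stop codon is ignored). The function name is hypothetical.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map an unaligned coding sequence onto an aligned protein row.

    Each residue consumes the next codon; each protein gap becomes three
    nucleotide gap characters. No check is made that codons translate to
    the residues given.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    residues = sum(1 for c in aligned_protein if c != gap)
    if len(codons) < residues:
        raise ValueError("coding sequence is too short for the protein row")
    next_codon = iter(codons)
    return "".join(gap * 3 if c == gap else next(next_codon) for c in aligned_protein)


print(thread_codons("M-K", "ATGAAA"))
```

Because gaps are inserted in multiples of three, the resulting nucleotide alignment keeps the reading frame intact.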
Multiple Alignments and Phylogenetic Trees
– You can make a more accurate multiple sequence alignment if you know the tree already
– A good multiple sequence alignment is an important starting point for drawing a tree
– The process of constructing a multiple alignment (unlike pair-wise) needs to take account of phylogenetic relationships
Editing a multiple sequence alignment
• It is NOT fraud to edit a multiple sequence alignment
• Incorporate additional knowledge if possible
• Alignment editors help to keep the data organised and help to prevent unwanted mistakes
Alignment Editors
• e.g. GDE, BioEdit, Seaview, Jalview etc.
• Alignment editors can function as an organisational tool (analysis tools in BioEdit)
• Construct sub-sequences (GDE, Seaview)
• Annotate sequences (Seaview)
Aligning weakly similar sequences
Sequence contains conserved regions
• e.g. DIALIGN (Morgenstern, Dress, Werner) – re-aligns regions between conserved blocks; useful if the sequences contain consistent conserved blocks http://bibiserv.techfak.uni-bielefeld.de/
• Block Maker – searches for conserved words that may be inconsistent http://blocks.fhcrc.org/
Profile Alignment (Gribskov et al. 1987)
• Position specific scores
• Allows alignment of alignments
• Gaps introduced as whole columns in the separate alignments
• Optimal alignment in time $O(a^2 l^2)$, where a = alphabet size and l = sequence length
• Information about the degree of conservation of sequence positions is included
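To make "position specific scores" concrete, the sketch below builds a simple profile from an alignment and scores an ungapped sequence against it. This is a simplified illustration, not the Gribskov et al. scoring scheme: it assumes log-odds scores against a uniform background with a pseudocount, and it does not handle gaps in the query. Function names are hypothetical.

```python
import math
from collections import Counter


def build_profile(rows, alphabet, pseudocount=1.0):
    """One dict of log-odds scores per alignment column (uniform background)."""
    profile = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({
            a: math.log(((counts[a] + pseudocount) / total) * len(alphabet))
            for a in alphabet
        })
    return profile


def score_sequence(profile, seq):
    """Sum the position-specific scores of an ungapped sequence of profile length."""
    if len(seq) != len(profile):
        raise ValueError("sequence length must equal profile length")
    return sum(column[c] for column, c in zip(profile, seq))


rows = ["ATTGCGC", "ATTGCGC", "ATCGCGC"]
profile = build_profile(rows, "ACGT")
print(score_sequence(profile, "ATTGCGC") > score_sequence(profile, "GGGGGGG"))
```

Conserved columns give large positive scores to the conserved residue and negative scores to everything else, while variable columns score all residues closer to zero. That is how the degree of conservation at each position enters the alignment.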
Good reasons to use profile alignments
– Adding a new sequence to an existing multiple alignment that you want to keep the same (align sequence to profile)
– Searching a database for new members of your protein family (pfsearch)
– Searching a database of profiles to find out which one your sequence belongs to (pfscan)
– Combining two multiple sequence alignments (profile to profile)
Profile Alignment Using ClustalX
• Profile Alignment Mode
• Align sequence to profile
• Align profile 1 to profile 2
• Secondary structure parameters
Profile searching using PSI-BLAST
• Position Specific Iterative
• Perform search – construct profile – perform search
• Convergence (hopefully…)
• Increased sensitivity for distantly related sequences
• Available on-line (NCBI)
Databases of Aligned Sequences
• Hovergen http://pbil.univ-lyon1.fr/databases/hovergen.html (vertebrate alignments)
• Pfam http://www.sanger.ac.uk/Software/Pfam/ (protein domain alignments and profile HMMs)
• BLOCKS http://blocks.fhcrc.org/
• Ribosomal Database Project http://rdp.cme.msu.edu/html/ (alignments and trees derived from rRNA sequences)
• Interpro – combines information from other sources
• Many more…
Probabilistic Models of Sequence Alignment
• Hidden Markov Models
– sequence of states and associated symbol probabilities
• Produces a probabilistic model of a sequence alignment
• Align a sequence to a Profile Hidden Markov Model
– Algorithms exist to find the most probable path through the model
Markov Chain: A chain of things. The probability of the next thing depends only on the current thing
Hidden Markov Model: A sequence of states which form a Markov Chain. The states are not observable. The observable characters have “emission” probabilities which depend on the current state.
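A standard algorithm for finding the most probable path of hidden states is the Viterbi algorithm. The sketch below runs it on a toy two-state model. The state names and all probabilities are invented for illustration; this is not a profile HMM, which would have match, insert and delete states per alignment column. The code assumes a non-empty observation sequence and strictly positive probabilities.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log probability)."""
    # Work in log space to avoid underflow on long sequences.
    best = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best previous state from which to enter state s.
            prev = max(states, key=lambda r: best[-1][r] + math.log(trans_p[r][s]))
            scores[s] = best[-1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (all values are assumptions made up for this example).
states = ["AT_rich", "GC_rich"]
start_p = {"AT_rich": 0.5, "GC_rich": 0.5}
trans_p = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
emit_p = {
    "AT_rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC_rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}
path, log_prob = viterbi("AATTGGCC", states, start_p, trans_p, emit_p)
print(path)
```

Only the characters A, T, G, C are observed; the state labels are inferred, which is what "hidden" refers to in the definition above.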