Sequence Based Analysis Tutorial
DESCRIPTION
Sequence Based Analysis Tutorial. March 26, 2004, NIH Proteomics Workshop. Lai-Su L. Yeh, Ph.D., Protein Science Team Lead, Protein Information Resource at Georgetown University Medical Center. Retrieval, Sequence Search & Classification Methods. Retrieve protein info by text / UID.
TRANSCRIPT
![Page 1: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/1.jpg)
Sequence Based Analysis Tutorial
March 26, 2004
NIH Proteomics Workshop
Lai-Su L. Yeh, Ph.D.
Protein Science Team Lead
Protein Information Resource at Georgetown University Medical Center
![Page 2: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/2.jpg)
Retrieval, Sequence Search & Classification Methods
- Retrieve protein info by text / UID
- Sequence Similarity Search
  - BLAST, FASTA, Dynamic Programming
- Family Classification
  - Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks
- Integrated Search and Classification System
![Page 3: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/3.jpg)
Sequence Similarity Search
- Based on Pair-Wise Comparisons
- Dynamic Programming Algorithms
  - Global Similarity: Needleman-Wunsch
  - Local Similarity: Smith-Waterman
- Heuristic Algorithms
  - FASTA: Based on K-Tuples (2-Amino Acid)
  - BLAST: Triples of Conserved Amino Acids
  - Gapped-BLAST: Allow Gaps in Segment Pairs
  - PHI-BLAST: Pattern-Hit Initiated Search
  - PSI-BLAST: Position-Specific Iterated Search
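The contrast between global (Needleman-Wunsch) and local (Smith-Waterman) dynamic programming can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's implementation: the function name `align_score`, the match/mismatch scores, and the linear gap penalty are all assumptions, and a real search would score residue pairs with a substitution matrix.

```python
def align_score(a, b, match=2, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of sequences a and b.

    local=False gives a Needleman-Wunsch (global) score,
    local=True gives a Smith-Waterman (local) score.
    The scores are illustrative assumptions with a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment charges for leading gaps; local alignment does not.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never go below 0.
                score = max(score, 0)
                best = max(best, score)
            H[i][j] = score
    return best if local else H[-1][-1]
```

With these assumed scores, a short motif embedded in a longer sequence keeps its full score locally, while the global score is reduced by the gaps needed to span the flanking residues.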
![Page 4: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/4.jpg)
Sequence Similarity Search
- Similarity Search Parameters
  - Scoring Matrices – Based on Conserved Amino Acid Substitution
    - Dayhoff Mutation Matrix, e.g., PAM250 (~20% Identity)
    - Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM 62
  - Gap Penalty
- Search Time Comparisons
  - Smith-Waterman: 10 Min
  - FASTA: 2 Min
  - BLAST: 20 Sec
![Page 5: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/5.jpg)
Feature Representation
- Features: Residue Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features
- Alternative Alphabets: Classification of Amino Acids To Capture Different Features of Amino Acid Residues
![Page 6: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/6.jpg)
Substitution Matrix
- Likelihood of One Amino Acid Mutated into Another Over Evolutionary Time
- Negative Score: Unlikely to Happen (e.g., Gly/Trp, -7)
- Positive Score: Conservative Substitution (e.g., Lys/Arg, +3)
- High Score for Identical Matches: Rare Amino Acids (e.g., Trp, Cys)
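A substitution matrix is used as a lookup table: each aligned residue pair contributes its score, and the scores are summed. The sketch below uses a tiny excerpt only. The Gly/Trp (-7) and Lys/Arg (+3) entries are the ones quoted on the slide; the identity scores, the `SCORES` table, and the helper names are assumptions chosen for illustration, and a real search would load a complete matrix such as PAM250 or BLOSUM 62.

```python
# Tiny excerpt in the style of a PAM matrix. Gly/Trp and Lys/Arg are the
# slide's examples; the identity scores are assumed illustrative values.
SCORES = {
    ("G", "W"): -7,
    ("K", "R"): 3,
    ("W", "W"): 17,
    ("C", "C"): 12,
    ("G", "G"): 5,
    ("K", "K"): 5,
    ("R", "R"): 6,
}


def pair_score(x, y):
    """Look up the symmetric substitution score for residues x and y."""
    if (x, y) in SCORES:
        return SCORES[(x, y)]
    return SCORES[(y, x)]  # raises KeyError for pairs outside this excerpt


def ungapped_score(a, b):
    """Sum substitution scores over two equal-length, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(x, y) for x, y in zip(a, b))
```

In this excerpt an identical Trp pair outweighs an identical Gly pair, which mirrors the slide's point that matches between rare amino acids score highest.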
![Page 7: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/7.jpg)
BLAST
- BLAST (Basic Local Alignment Search Tool)
  - To search a sequence against the database
  - Extremely fast
  - Robust
  - Most widely used
- It finds very short segment pairs between the query and sequence in the database
- These segments are then extended in both directions until the maximum possible score of this particular segment is reached
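The seed-and-extend idea described above can be sketched as follows. This is a simplified illustration rather than BLAST itself: it seeds on exact word matches only (BLAST also accepts similar "neighborhood" words), scores with assumed match/mismatch values instead of a substitution matrix, and stops extending once the running score falls more than an assumed `x_drop` below the best score seen. The function names are hypothetical.

```python
def seed_hits(query, subject, w=3):
    """Yield (i, j) where query[i:i+w] equals subject[j:j+w] (exact-word seeds)."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            yield i, j


def extend(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps in both directions.

    Returns (best_score, q_start, q_end) for the best-scoring segment,
    where query[q_start:q_end] is the extended segment.
    """
    score = best = w * match
    start, end = i, i + w
    # Extend to the right of the seed.
    qi, sj = i + w, j + w
    while qi < len(query) and sj < len(subject) and best - score <= x_drop:
        score += match if query[qi] == subject[sj] else mismatch
        qi += 1
        sj += 1
        if score > best:
            best, end = score, qi
    # Extend to the left, starting again from the best score so far.
    score = best
    qi, sj = i - 1, j - 1
    while qi >= 0 and sj >= 0 and best - score <= x_drop:
        score += match if query[qi] == subject[sj] else mismatch
        if score > best:
            best, start = score, qi
        qi -= 1
        sj -= 1
    return best, start, end
```

For example, with the query `MKTAYIAKQR` and the subject `GGTAYIAKGG`, the seed at position 2 extends to the shared segment `TAYIAK`.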
![Page 8: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/8.jpg)
BLAST Search
- From BLAST Search Interface
- Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment
![Page 9: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/9.jpg)
BLAST/SSEARCH Results
SSEARCH Alignment
BLAST Alignment
![Page 10: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/10.jpg)
Family Classification Methods
- Based on Family Information
- ClustalW Multiple Sequence Alignment
- ProSite Pattern Search
- Profile Search
- Hidden Markov Models (HMMs)
- Neural Networks
- Integrated Analysis
![Page 11: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/11.jpg)
Multiple Sequence Alignment
- ClustalW
  - Progressive Pairwise Approach
    - Based on Exhaustive Pairwise Alignments
  - Neighbor Joining
    - Joining Order Corresponding to a Tree
  - Alignment Varies
    - Dependent on Joining Order
![Page 12: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/12.jpg)
How do you build a tree?
- Pick sequences to align
- Align them
- Verify the alignment
- Keep the parts that are aligned correctly
- Build and evaluate a phylogenetic tree
![Page 13: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/13.jpg)
Multiple Alignment and Tree
- From Text/Sequence Search Result or ClustalW Alignment Interface
![Page 14: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/14.jpg)
![Page 15: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/15.jpg)
Motif Patterns (Regular Expressions)
- Signature Patterns for Functional Motifs
ProClass Motif Alignments
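Because PROSITE-style signature patterns are essentially regular expressions, a pattern can be translated and matched with Python's `re` module. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, only the common syntax elements are handled (`x`, `[..]`, `{..}`, `(n)` or `(n,m)` repeats, and `<` / `>` anchors outside brackets), and only non-overlapping matches are reported. The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site signature.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = pattern.rstrip(".")                          # drop the trailing period
    regex = regex.replace("-", "")                       # '-' only separates elements
    regex = regex.replace("x", ".")                      # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")   # {P} = any residue except P
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) = two to four residues
    regex = regex.replace("<", "^").replace(">", "$")    # N- and C-terminal anchors
    return regex


def find_motifs(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]
```

The order of the replacements matters: the exclusion braces are rewritten before the parentheses become regex repeat braces, so the two never collide.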
![Page 16: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/16.jpg)
PIR Pattern Search
- From Text/Sequence Search Result or Pattern Search Interface
- One Query Sequence Against PROSITE Pattern Database
- One Query Pattern (PROSITE or User-Defined) Against Sequence DB
![Page 17: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/17.jpg)
Pattern Search Result (I)
- One Query Sequence Against PROSITE Pattern Database
![Page 18: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/18.jpg)
Pattern Search Result (II)
- One Query Pattern Against Sequence Database
![Page 19: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/19.jpg)
Profile Method
- Profile: A Table of Scores to Express Family Consensus Derived from Multiple Sequence Alignments
  - Num of Rows = Num of Aligned Positions
  - Each row contains a score for the alignment with each possible residue.
- Profile Searching
  - Summation of Scores for Each Amino Acid Residue along Query Sequence
  - Higher Match Values at Conserved Positions
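The profile construction and search described above can be sketched as a position-specific scoring table. This is a minimal illustration, not the method used by any particular tool: the log-odds formula, the uniform background frequency, the pseudocount, and the function names are all assumptions, and gaps in the alignment are simply skipped when counting.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} row per aligned position."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r in AMINO_ACIDS]  # ignore gap characters
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        row = {
            aa: math.log2((residues.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        }
        profile.append(row)
    return profile


def score_window(profile, segment):
    """Sum the profile scores for each residue of an equal-length segment."""
    return sum(row[aa] for row, aa in zip(profile, segment))


def best_window(profile, sequence):
    """Slide the profile along the sequence; return (best_score, start)."""
    width = len(profile)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    return max(
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

A fully conserved column gives its residue a higher score than a partly conserved column does, so matches at conserved positions contribute most to the summed score.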
![Page 20: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/20.jpg)
PIR HMM Domain/Motif Search
- From Text/Sequence Search Result or HMM Search Interface
- HMMER Model Building & Sequence Search
- Search One Query Protein Against All HMMs
- Search One HMM Against Sequence DB
![Page 21: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/21.jpg)
HMM Search Result (I)
- One Query Protein Against All Pfam HMMs
![Page 22: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/22.jpg)
HMM Search Result (II)
- Search User-Built HMM Against Protein Sequence DB
- Input Sequences (Optional Residue Ranges) -> Multiple Sequence Alignment -> Model Building -> HMM Search
![Page 23: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/23.jpg)
Secondary Structure Features
- α Helix: Patterns of Hydrophobic Residue Conservation Showing I, I+3, I+4, I+7 Pattern Are Highly Indicative of an α Helix (Amphipathic)
- β Strands That Are Half Buried in the Protein Core Will Tend to Have Hydrophobic Residues at Positions I, I+2, I+4, I+6
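The positional patterns above can be checked with a simple scan. The sketch below is a simplification under stated assumptions: it tests a single sequence rather than conservation across an alignment, the `HYDROPHOBIC` residue set is an assumed alphabet chosen for illustration, and the function name is hypothetical.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic alphabet for illustration

HELIX_OFFSETS = (0, 3, 4, 7)   # i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4, 6)  # i, i+2, i+4, i+6


def pattern_starts(sequence, offsets):
    """Return every start i where the residues at i + offset are all hydrophobic."""
    span = max(offsets)
    return [
        i
        for i in range(len(sequence) - span)
        if all(sequence[i + k] in HYDROPHOBIC for k in offsets)
    ]
```

For instance, `pattern_starts("LKKLLKKL", HELIX_OFFSETS)` reports position 0, while the same sequence produces no hits for `STRAND_OFFSETS`.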
![Page 24: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/24.jpg)
Integrated Bioinformatics System for Function and Pathway Discovery
- Data Integration
- Associative Analysis
Sequence Analysis Pipeline
(Family Classification & Feature Identification)
Data Mining Tools
(Retrieval, Visualization, Analysis, Correlation)
Data Warehouse
(Gene, Protein, Family, Function, Structure, Pathway, Interaction)
Graphical User Interface
(Browsing, Querying, Navigation)
Input
(Gene/Protein Expression Data)
Output
(Analysis Results, Biological Interpretation)
Integrated Bioinformatics System
User
Input
(Local Data, Search Criteria, Report Format)
![Page 25: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/25.jpg)
Analytical Pipeline
Query Sequence
PIR-NREF
iProClass
Top-Matched Superfamilies/Domains
BLAST Search HMM Domain Search
Predicted Superfamilies/Domains/Motifs/Sites/Signal Peptides/TMHs
SSEARCH CLUSTALW
Superfamily/Domain/Motif Alignments
Family Relationships & Functional Features
Family Classification & Functional Analysis
HMM Motif Search Pattern Search SignalP/TMHMM
![Page 26: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/26.jpg)
Integrated Bioinformatics System
- Global Bioinformatics Analysis of 1000’s of Genes and Proteins
- Pathway Discovery, Target Identification
Gene Expression Data Proteomic Data
Clustering
Expression Pattern
Visualization & Statistical Analysis
Clustered Matrix, Pathway Map, Process Hierarchy, Clustered Graph
Gene/Peptide-Protein Mapping
Pathway Discovery (Browsing, Sorting, Visualization & Statistical Analysis)
Functional Analysis (Sequence Analysis & Information Retrieval)
Integrated Protein Knowledge System
Comprehensive Protein
Information Matrix
Protein List
![Page 27: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/27.jpg)
![Page 28: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/28.jpg)
Lab Section
![Page 29: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/29.jpg)
Peptide Search & Results
![Page 30: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/30.jpg)
BLAST Similarity Search
![Page 31: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/31.jpg)
BLAST Search Results
![Page 32: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/32.jpg)
Pair-Wise Alignment
![Page 33: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/33.jpg)
Multiple Sequence Alignment
![Page 34: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/34.jpg)
Pattern Search Results
![Page 35: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/35.jpg)
HMM Domain Search Result
![Page 36: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/36.jpg)
Building HMM Profile
![Page 37: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/37.jpg)
Using HMM Profile for Searching
![Page 38: Sequence Based Analysis Tutorial](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813545550346895d9ca43c/html5/thumbnails/38.jpg)
Rabbit Alpha Crystallin A Chain
An iProClass View of the entry