Mathematical Challenges in Protein Motif Recognition
Bonnie Berger
MIT
Approaches to Structural Motif Recognition
Alignments
Multiple alignments & HMMs
Threading
Profile methods (1D, 3D)
* Statistical methods
Structural Motif Recognition
1) Collect a database of positive examples of a motif (e.g., coiled coil, beta helix).
2) Devise a method to determine if an unknown sequence folds as the motif or not.
3) Verification in lab.
Our Coiled-Coil Programs
PairCoil [Berger, Wilson, Wolf, Tonchev, Milla, Kim, 1995]
• predicts 2-stranded CCs
• http://theory.lcs.mit.edu/paircoil
MultiCoil [Wolf, Kim, Berger, 1997]
• predicts 3-stranded CCs
• http://theory.lcs.mit.edu/multicoil
LearnCoil-Histidine Kinase [Singh, Berger, Kim, Berger, Cochran, 1998]
• predicts CCs in histidine kinase linker domains
• http://theory.lcs.mit.edu/learncoil
LearnCoil-VMF [Singh, Berger, Kim, 1999]
• predicts CCs in viral membrane fusion proteins
• http://theory.lcs.mit.edu/learncoil-vmf
Long Distance Correlations
In beta structures, amino acids close in the folded 3D structure may be far away in the linear sequence
Biological Importance of Beta Helices
Surface proteins in human infectious disease:
• virulence factors (plants, too)
• adhesins
• toxins
• allergens
Amyloid fibrils (e.g., Alzheimer’s, Creutzfeldt-Jakob (Mad Cow) disease)
Potential new materials
What is Known
Solved beta-helix structures:
12 structures in PDB in 7 different SCOP families
Related work:
• 1D profile of pectate lyase (Heffron et al. ’98)
• HMM (e.g., HMMER)
• Threading (e.g., 3D-PSSM)
Key Databases
Solved structures:
Protein Data Bank (PDB) (100’s of non-redundant structures) [www.rcsb.org/pdb/]
Sequence databases:
Genbank (100’s of thousands of protein sequences) [www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html]
SWISSPROT (10’s of thousands of protein sequences) [www.ebi.ac.uk/swissprot]
Performance:
• On PDB: no false positives & no false negatives.
• Recognizes beta helices in PDB across SCOP families in cross-validation.
• Recognizes many new potential beta helices.
• Runs in linear time (~5 min. on SWISS-PROT).
[Bradley, Cowen, Menke, King, Berger: RECOMB 2001]
BetaWrap Program
Histogram of protein scores for:
• beta helices not in database (12 proteins)
• non-beta helices in PDB (1346 proteins)
Single Rung of a Beta Helix
3D Pairwise Correlations
• Stacking residues in adjacent beta-strands exhibit strong correlations
• Residues in the T2 turn have special correlations (Asparagine ladder, aliphatic stacking)
[Figure labels: B1, B2, B3, T2]
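One hedged way to quantify stacking correlations like those above is a log-odds table: compare how often two residues are observed stacked against how often they would pair by chance. The sketch below is illustrative only. The function name pair_log_odds and its input format (a list of stacked residue pairs) are assumptions, not the statistic or the data used in BetaWrap.

```python
import math
from collections import Counter

def pair_log_odds(stacked_pairs):
    """Return log(P(a, b) / (P(a) * P(b))) for every observed stacked pair.

    stacked_pairs: iterable of (residue, residue) tuples taken from
    stacking positions in adjacent strands (hypothetical input format).
    A positive score means the pair stacks more often than chance.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for a, b in stacked_pairs:
        # Count both orders so the resulting table is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
        residue_counts[a] += 1
        residue_counts[b] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())

    scores = {}
    for (a, b), n in pair_counts.items():
        p_pair = n / total_pairs
        p_a = residue_counts[a] / total_residues
        p_b = residue_counts[b] / total_residues
        scores[(a, b)] = math.log(p_pair / (p_a * p_b))
    return scores
```

Pairs never observed are simply absent from the table; a real scoring scheme would need pseudocounts or a floor value for them.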
Question: but how can we find these correlations which are a variable distance apart in sequence?
[Tailspike, 63 residue turn]
Finding Candidate Wraps
• Assume we have the correct locations of a single T2 turn (fixed B2 & B3).
• Generate the 5 best-scoring candidates for the next rung.
[Figure labels: B2, B3, T2, Candidate Rung]
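The candidate step above can be sketched as a bounded search: slide the next rung over a range of sequence offsets, score each placement against the fixed rung, and keep the 5 best. Everything below is a minimal sketch under stated assumptions. The gap range (min_gap, max_gap), the window length, and the placeholder toy_match_score are hypothetical and stand in for the real rung-to-rung score.

```python
import heapq

def toy_match_score(seq, cur, nxt, window=3):
    """Placeholder score (assumption): identical residues between two windows."""
    return sum(a == b for a, b in zip(seq[cur:cur + window],
                                      seq[nxt:nxt + window]))

def best_next_rungs(seq, t2_pos, score_fn=toy_match_score,
                    min_gap=5, max_gap=15, window=3, keep=5):
    """Return the `keep` best (score, position) placements for the next rung.

    t2_pos is the known position of the current rung; the next rung is
    tried at every offset from min_gap to max_gap residues downstream.
    """
    candidates = []
    for gap in range(min_gap, max_gap + 1):
        nxt = t2_pos + gap
        if nxt + window > len(seq):
            break  # the candidate window would run off the sequence
        candidates.append((score_fn(seq, t2_pos, nxt, window), nxt))
    # nlargest sorts by score first, then by position to break ties.
    return heapq.nlargest(keep, candidates)
```

Because the gap is a parameter, turns of very different lengths can be accommodated by widening max_gap, at the cost of scoring more placements.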
Scoring Candidate Wraps (rung-to-rung)
Similar to probabilistic framework plus:
• Pairwise probabilities taken from amphipathic beta (not beta helix) structures in PDB.
• Additional stacking bonuses on internal pairs.
• Incorporates distribution on turn lengths.
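A rung-to-rung score with the three ingredients listed above could be combined additively in log space, roughly as follows. This is a sketch, not the BetaWrap scoring function: the bonus rule (a fixed bonus when identical residues stack at an inward-pointing position), the floor values for unseen pairs and turn lengths, and all parameter names are assumptions made for illustration.

```python
import math

UNSEEN_LOGP = math.log(1e-4)  # assumed floor for pairs missing from the table

def rung_to_rung_score(strand_a, strand_b, pair_logp, inward_positions,
                       turn_len, turn_len_prob, stack_bonus=1.0):
    """Score two aligned strands from consecutive rungs.

    pair_logp:        dict mapping (res_a, res_b) to a log-probability
    inward_positions: indices of strand positions that point into the core
    turn_len_prob:    dict mapping turn length to its empirical probability
    """
    score = 0.0
    for i, (a, b) in enumerate(zip(strand_a, strand_b)):
        # Pairwise term from the (externally supplied) probability table.
        score += pair_logp.get((a, b), UNSEEN_LOGP)
        # Hypothetical stacking bonus on internal (inward-pointing) pairs.
        if i in inward_positions and a == b:
            score += stack_bonus
    # Turn-length term: penalize rarely observed turn lengths.
    score += math.log(turn_len_prob.get(turn_len, 1e-6))
    return score
```

Working in log space keeps the terms additive, so the per-rung scores can later be summed across a multi-rung wrap.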
Scoring Candidate Wraps (5 rungs)
• Iterate out to 5 rungs generating candidate wraps:
• Score each wrap:
- sum the rung-to-rung scores
- B1 correlations filter
- screen for alpha-helical content
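The iteration above can be sketched as repeated extension: every partial wrap is extended by its 5 best next rungs until it holds 5 rungs, and a wrap's score is the sum of its rung-to-rung scores. The sketch below makes several assumptions: the starting rung counts as the first of the 5, rung_score is any caller-supplied function of two positions, the gap bounds are hypothetical, and the B1 correlation filter and alpha-helix screen are omitted.

```python
import heapq

def best_wrap(seq_len, start, rung_score, n_rungs=5, keep=5,
              min_gap=10, max_gap=30):
    """Return (total_score, rung_positions) for the best wrap, or None.

    rung_score(cur, nxt) is a caller-supplied rung-to-rung score
    (hypothetical signature); higher is better.
    """
    wraps = [(0.0, [start])]
    for _ in range(n_rungs - 1):
        extended = []
        for total, positions in wraps:
            last = positions[-1]
            first = last + min_gap
            final = min(last + max_gap, seq_len - 1)
            options = [(rung_score(last, nxt), nxt)
                       for nxt in range(first, final + 1)]
            # Keep only the `keep` best placements for the next rung.
            for score, nxt in heapq.nlargest(keep, options):
                extended.append((total + score, positions + [nxt]))
        wraps = extended
    if not wraps:
        return None  # sequence too short to fit n_rungs rungs
    return max(wraps, key=lambda w: w[0])
```

With 5 candidates per step and 4 extension steps, at most 5^4 = 625 wraps are scored per starting position, so the work per start is bounded by a constant.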
Potential Beta Helices
Toxins:
• Vacuolating cytotoxin from the human gastric pathogen H. pylori
• Toxin B from the enterohemorrhagic E. coli strain O157:H7
Allergens:
• Antigen AMB A II, major allergen from A. artemisiifolia (ragweed)
• Major pollen allergen CRY J II, from C. japonica (Japanese cedar)
Adhesins:
• AIDA-I, involved in diffuse adherence of diarrheagenic E. coli
Other cell surface proteins:
• Outer membrane protein B from Rickettsia japonica
• Putative outer membrane protein F from Chlamydia trachomatis
• Toxin-like outer membrane protein from Helicobacter pylori
The Problem
Given an amino acid residue subsequence, does it fold as a coiled coil? A beta helix?
Very difficult:
• peptide synthesis (1-2 months)
• X-ray crystallography, NMR (>1 year)
• molecular dynamics
Our goal: predict folded structure based on a template of positive examples.
Collaborators
Math / CS
Mona Singh
Ethan Wolf
Phil Bradley
Lenore Cowen
Matt Menke
David Wilson
Theo Tonchev
Biologists
Peter S. Kim
Jonathan King
Andrea Cochran
James Berger
Mari Milla