Mathematical Challenges in Protein Motif Recognition

Bonnie Berger

MIT


Approaches to Structural Motif Recognition

Alignments

Multiple alignments & HMMs

Threading

Profile methods (1D, 3D)

* Statistical methods


Structural Motif Recognition

1) Collect a database of positive examples of a motif (e.g., coiled coil, beta helix).

2) Devise a method to determine if an unknown sequence folds as the motif or not.

3) Verification in lab.


Our Coiled-Coil Programs

PairCoil [Berger, Wilson, Wolf, Tonchev, Milla, Kim, 1995]
• predicts 2-stranded CCs
• http://theory.lcs.mit.edu/paircoil

MultiCoil [Wolf, Kim, Berger, 1997]
• predicts 3-stranded CCs
• http://theory.lcs.mit.edu/multicoil

LearnCoil-Histidine Kinase [Singh, Berger, Kim, Berger, Cochran, 1998]
• predicts CCs in histidine kinase linker domains
• http://theory.lcs.mit.edu/learncoil

LearnCoil-VMF [Singh, Berger, Kim, 1999]
• predicts CCs in viral membrane fusion proteins
• http://theory.lcs.mit.edu/learncoil-vmf


Long Distance Correlations

In beta structures, amino acids close in the folded 3D structure may be far away in the linear sequence
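As a small illustration of the point above, the sketch below computes how far apart in sequence two stacked residues can be. The rung start positions are made-up numbers chosen for illustration, not coordinates from any solved structure.

```python
# Hypothetical start positions (0-based sequence indices) of the same strand
# in four consecutive rungs. The values are invented for illustration only.
rung_starts = [10, 32, 55, 140]


def stacking_separations(starts):
    """Return the sequence separation between residues stacked in adjacent rungs."""
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


separations = stacking_separations(rung_starts)
print(separations)
```

Residues that sit directly on top of one another in the fold are separated by one full rung in sequence, and that separation changes whenever a turn is longer or shorter.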


Biological Importance of Beta Helices

Surface proteins in human infectious disease:
• virulence factors (plants, too)
• adhesins
• toxins
• allergens

Amyloid fibrils (e.g., Alzheimer's, Creutzfeldt-Jakob (Mad Cow) disease)

Potential new materials


What is Known

Solved beta-helix structures:

12 structures in PDB in 7 different SCOP families

Related work:

• 1D profile of pectate lyase (Heffron et al. '98)

• HMM (e.g., HMMER)

• Threading (e.g., 3D-PSSM)


Key Databases

Solved structures:

Protein Data Bank (PDB) (100's of non-redundant structures) [www.rcsb.org/pdb/]

Sequence databases:

Genbank (100's of thousands of protein sequences) [www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html]

SWISSPROT (10's of thousands of protein sequences) [www.ebi.ac.uk/swissprot]


BetaWrap Program

Performance:

• On PDB: no false positives & no false negatives.

• Recognizes beta helices in PDB across SCOP families in cross-validation.

• Recognizes many new potential beta helices.

• Runs in linear time (~5 min. on SWISS-PROT).

[Bradley, Cowen, Menke, King, Berger: RECOMB 2001]


BetaWrap Program

Histogram of protein scores for:

• beta helices not in database (12 proteins)
• non-beta helices in PDB (1346 proteins)


Single Rung of a Beta Helix


3D Pairwise Correlations

Stacking residues in adjacent beta-strands exhibit strong correlations.

Residues in the T2 turn have special correlations (asparagine ladder, aliphatic stacking).

[Figure labels: B1, B2, B3, T2]


Question: How can we find these correlations, which lie a variable distance apart in sequence?

[Tailspike, 63-residue turn]


Finding Candidate Wraps

• Assume we have the correct locations of a single T2 turn (fixed B2 & B3).

• Generate the 5 best-scoring candidates for the next rung.

[Figure labels: B2, B3, T2, candidate rung]
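The candidate-generation step above can be sketched as follows. This is a minimal illustration, not the BetaWrap implementation: the scoring function `toy_alignment_score`, the window width, and the gap range `min_gap`/`max_gap` are all assumptions introduced here. Only the idea of keeping the 5 best-scoring candidates for the next rung comes from the slide.

```python
import heapq


def toy_alignment_score(seq, pos_a, pos_b, width=5):
    """Toy stand-in score (an assumption): count identical residues when
    the window at pos_a is stacked on the window at pos_b."""
    window_a = seq[pos_a:pos_a + width]
    window_b = seq[pos_b:pos_b + width]
    return sum(1 for x, y in zip(window_a, window_b) if x == y)


def best_next_rungs(seq, current_pos, min_gap=18, max_gap=30, k=5,
                    width=5, score=toy_alignment_score):
    """Return the k best (score, position) candidates for the next rung.

    Every gap in [min_gap, max_gap] is tried, so the next rung may lie a
    variable distance downstream of the current one.
    """
    candidates = []
    for gap in range(min_gap, max_gap + 1):
        next_pos = current_pos + gap
        if next_pos + width > len(seq):
            break  # candidate window would run off the end of the sequence
        candidates.append((score(seq, current_pos, next_pos, width), next_pos))
    return heapq.nlargest(k, candidates)


# Made-up sequence: the motif "GVNIS" recurs 22 residues downstream.
seq = "GVNIS" + "T" * 17 + "GVNIS" + "T" * 30
top5 = best_next_rungs(seq, current_pos=0)
print(top5[0])
```

Keeping several candidates rather than only the single best one leaves room for a later rung to rescue a wrap whose locally best choice was wrong.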


Scoring Candidate Wraps (rung-to-rung)

Similar to probabilistic framework plus:

• Pairwise probabilities taken from amphipathic beta (not beta helix) structures in PDB.

• Additional stacking bonuses on internal pairs.

• Incorporates distribution on turn lengths.
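One plausible way to combine the three ingredients listed above is an additive log-score, sketched below. The slide does not give the formula, so the log-probability form, the `floor` value for unseen pairs, the bonus size, and every table entry are assumptions made for illustration.

```python
import math


def rung_to_rung_score(strand_a, strand_b, turn_len, pair_prob, turn_len_prob,
                       inward_positions, bonus_pairs, bonus=1.0, floor=1e-4):
    """Score the stacking of strand_b on strand_a (hypothetical scheme).

    pair_prob        : dict mapping (residue, residue) to a pairwise probability
    turn_len_prob    : dict mapping turn length to its probability
    inward_positions : strand indices treated as internal (inward-pointing)
    bonus_pairs      : residue pairs that earn an extra stacking bonus
    """
    score = 0.0
    for i, pair in enumerate(zip(strand_a, strand_b)):
        # Pairwise term; pairs missing from the table get a small floor value.
        score += math.log(pair_prob.get(pair, floor))
        # Stacking bonus, applied only at internal positions.
        if i in inward_positions and pair in bonus_pairs:
            score += bonus
    # Turn-length term.
    score += math.log(turn_len_prob.get(turn_len, floor))
    return score


# Made-up tables, for illustration only.
pair_prob = {("V", "V"): 0.2, ("N", "N"): 0.1}
turn_len_prob = {22: 0.5}

good = rung_to_rung_score("VN", "VN", 22, pair_prob, turn_len_prob,
                          inward_positions={0}, bonus_pairs={("V", "V")})
poor = rung_to_rung_score("VN", "GK", 40, pair_prob, turn_len_prob,
                          inward_positions={0}, bonus_pairs={("V", "V")})
print(good > poor)
```

Working in log space keeps the three terms additive, which is what lets rung-to-rung scores be summed along a whole wrap.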


Scoring Candidate Wraps (5 rungs)

• Iterate out to 5 rungs generating candidate wraps:

• Score each wrap:

- sum the rung-to-rung scores

- B1 correlations filter

- screen for alpha-helical content
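The iteration above can be sketched as a small beam-style search. This is an illustration under stated assumptions: `next_candidates` is a hypothetical callback returning (rung score, next position) pairs, a wrap is counted as `n_rungs` positions including the starting rung, and the B1 and alpha-helix checks are represented only as placeholder predicates because the slide does not define them.

```python
import heapq


def extend_wraps(start, next_candidates, n_rungs=5, keep=5):
    """Grow wraps rung by rung, keeping the `keep` best partial wraps.

    Returns a list of (total_score, positions) sorted best first, where
    total_score is the sum of the rung-to-rung scores along the wrap.
    """
    wraps = [(0.0, [start])]
    for _ in range(n_rungs - 1):
        extended = [
            (total + rung_score, positions + [next_pos])
            for total, positions in wraps
            for rung_score, next_pos in next_candidates(positions[-1])
        ]
        if not extended:
            return []  # no wrap reaches the requested number of rungs
        wraps = heapq.nlargest(keep, extended)
    return wraps


def passes_filters(positions, filters):
    """Apply placeholder filters (e.g. a B1 check, an alpha-helix screen)."""
    return all(check(positions) for check in filters)


def toy_candidates(pos):
    """Made-up candidates: a good rung 22 residues on, a weaker one 25 on."""
    return [(1.0, pos + 22), (0.5, pos + 25)]


wraps = extend_wraps(0, toy_candidates)
accepted = [w for w in wraps if passes_filters(w[1], filters=[lambda p: True])]
print(accepted[0])
```

Pruning to a fixed number of partial wraps at each rung bounds the work per starting position, so the search does not grow exponentially with the number of rungs.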


Potential Beta Helices

Toxins:
• Vacuolating cytotoxin from the human gastric pathogen H. pylori
• Toxin B from the enterohemorrhagic E. coli strain O157:H7

Allergens:
• Antigen AMB A II, major allergen from A. artemisiifolia (ragweed)
• Major pollen allergen CRY J II, from C. japonica (Japanese cedar)

Adhesins:
• AIDA-I, involved in diffuse adherence of diarrheagenic E. coli

Other cell surface proteins:
• Outer membrane protein B from Rickettsia japonica
• Putative outer membrane protein F from Chlamydia trachomatis
• Toxin-like outer membrane protein from Helicobacter pylori


The Problem

Given an amino acid residue subsequence, does it fold as a coiled coil? A beta helix?

Very difficult:

• peptide synthesis (1-2 months)

• X-ray crystallography, NMR (>1 year)

• molecular dynamics

Our goal: predict folded structure based on a template of positive examples.


Collaborators

Math / CS

Mona Singh

Ethan Wolf

Phil Bradley

Lenore Cowen

Matt Menke

David Wilson

Theo Tonchev

Biologists

Peter S. Kim

Jonathan King

Andrea Cochran

James Berger

Mari Milla