Current Status of Homology Modeling Using MCSG Structures

319 MCSG structures in PDB have over 400,000 sequence homologues.

These structures represent ~350 domains.

Models are built with MODELLER (Sali), and their quality is assessed with PROSA (Sippl).

High-quality models can be generated for ~80,000 proteins.

A web site has been established that allows automated modeling of sequence homologues and evaluation of the quality of the models.

www.biochem.ucl.ac.uk/~dlee/GeMMA

[Figure: 1t5b domain template compared with Q92LV5 domain model. Labeled residues: Gly140/141, Phe97, Asp96, Tyr95; Gly154/155, Phe103, Trp102.]

Protein Structure Initiative - the Need for Large-Scale Homology Modeling

In the next five years PSI can determine approximately 3,000-4,000 protein structures, mainly at coarse granularity.

Reality check: novel structures in PDB will represent a very small fraction of sequences in GenBank, so reliable homology modeling is critical for obtaining 3D models and extending experimental work.

In PSI2, targets for structure determination are selected from large families; the determined structures therefore have a large number of sequence homologues over a wide range of sequence similarity. These proteins often display different functions.

Homology modeling must provide tools and 3D protein models that can be used for high-confidence, reliable interpretation of specific structural features in distant (15-25%) sequence homologues, for protein function assignment, and for studies of protein evolution.

Models should guide an increasing number of more sophisticated experiments, including: (i) mutagenesis and biochemical studies, (ii) prediction of ligand binding, (iii) prediction of oligomerization state, and (iv) prediction of cellular interactions (protein/protein/DNA/RNA).

We need to consider how PSI target selection of protein sequences and subsequent structure determination can improve homology modeling and the quality of the models.

Major Issues with Large-Scale Homology Modeling for Structural Genomics

3D protein models for distant (15-25%) sequence homologues are often not suitable.

Because of sequence divergence, only a small fraction (10-20%) of the sequences in very large families can be reliably modeled.
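The "small fraction" above can be illustrated with a minimal sketch that counts how many members of a family lie above a sequence-identity cutoff to the available template. The function name, the 25% cutoff, and the toy identity values are all assumptions made for illustration; they are not taken from the slides, and a real assessment would also rely on model-quality scores rather than identity alone. For comparison, the MCSG figures quoted earlier (~80,000 high-quality models for over 400,000 homologues) correspond to at most about 20%.

```python
def modelable_fraction(identities_percent, min_identity=25.0):
    """Return the fraction of homologues at or above an identity cutoff.

    identities_percent: percent sequence identity of each homologue to the
    template structure. min_identity is a hypothetical cutoff, not a value
    stated in the source.
    """
    identities = list(identities_percent)
    if not identities:
        raise ValueError("need at least one homologue")
    modelable = sum(1 for pct in identities if pct >= min_identity)
    return modelable / len(identities)


# Invented identities for a divergent family: most members are distant.
toy_family = [12, 15, 17, 18, 19, 21, 22, 24, 33, 58]
fraction = modelable_fraction(toy_family)
print(f"{fraction:.0%} of the toy family is above the cutoff")
```

Lowering the cutoff raises the count but, as noted above, models for such distant homologues are often not suitable, which is why finer coverage of large families matters.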

Homology modeling must provide input to target selection for fine coverage of protein families.

Domain parsing needs improvement. We should be able to model multi-domain proteins from structures of individual domains. We should be able to model neighbouring side chains and important structural and functional features that are currently difficult to assign and predict correctly.

We need methods to predict unusual features and departures from the template structure that is used for modeling.

Modeling of loops and high B-factor regions needs improvement.

Structure of P5CR Exemplifies Challenges for Homology Modeling

• Two structures of P5CR were determined.
• The proteins share 22% sequence identity and 47% sequence similarity.
• The monomer structures are very similar but show individual features.
• Problems:
  • The protein has two domains and forms oligomers; one domain shows major swapping, and the protein adopts different oligomeric forms in different species.
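As a rough illustration of how identity and similarity percentages like those quoted for P5CR are obtained from a pairwise alignment, here is a minimal sketch. The residue groupings, the function names, and the choice of denominator (aligned, non-gap columns) are assumptions; alignment programs typically derive similarity from a substitution matrix and may normalize differently, so this will not reproduce the exact figures from the slide.

```python
# Hypothetical similarity groups, used only for this illustration.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG"]


def _group_of(residue):
    """Return the similarity group containing a residue, or None."""
    for group in SIMILAR_GROUPS:
        if residue in group:
            return group
    return None


def identity_and_similarity(aligned_a, aligned_b):
    """Percent identity and similarity over aligned (non-gap) columns.

    Both inputs are rows of one pairwise alignment, with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif _group_of(a) is not None and _group_of(a) == _group_of(b):
            similar += 1
    if aligned == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * identical / aligned, 100.0 * similar / aligned


# Toy alignment: 5 aligned columns, 3 identical, 2 conservative changes.
identity, similarity = identity_and_similarity("MKV-LD", "MRVALE")
print(f"identity {identity:.0f}%, similarity {similarity:.0f}%")
```

Similarity is always at least as large as identity under this definition, which matches the 22% versus 47% relationship reported above.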

Human Aldose Reductase – SeMet MAD at 0.9 Å Comparison – Experimental vs. Refined Map

Refined map @ 0.9 Å, sigmaA (2mFo-DFc), contour level: 1 sigma

Experimental map @ 0.9 Å, Fo, contour level: 1 sigma
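For reference, the sigmaA-weighted 2mFo-DFc map named in the caption is conventionally written as the Fourier synthesis below. The symbols are standard crystallographic notation and are not taken from the slides: $m$ is the figure of merit, $D$ the sigmaA-derived scale factor, $\varphi_c$ the model phase, and $V$ the unit-cell volume. A contour level of 1 sigma means the isosurface is drawn one r.m.s. deviation $\sigma_\rho$ above the mean density.

```latex
\rho(\mathbf{x}) = \frac{1}{V} \sum_{\mathbf{h}}
  \bigl( 2m\,|F_o(\mathbf{h})| - D\,|F_c(\mathbf{h})| \bigr)\,
  e^{i\varphi_c(\mathbf{h})}\, e^{-2\pi i\,\mathbf{h}\cdot\mathbf{x}},
\qquad
\rho_{\text{contour}} = \langle \rho \rangle + 1\cdot\sigma_\rho
```

The experimental map differs in that its phases come from the MAD experiment rather than from the refined model, so comparing the two shows how much of the detail is independent of model bias.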

MAD Map at 3.2 Å, 1.8 Å, 1.6 Å and 1.1 Å

Inhibitor Head Existing in Double Conformation Hard to Interpret at RT (1.45 Å), Clear at 100 K (0.8 Å)

Labeled residues: Tyr 48, His 110
