Page 1: Structural Genomics: Case studies in assigning function from structure. James D Watson watson@ebi.ac.uk

Structural Genomics: Case studies in assigning function from structure

James D Watson

watson@ebi.ac.uk

Page 2

Structural Genomics Collaborators

MCSG – Midwest Center for Structural Genomics

SPINE – Structural Proteomics in Europe

SGC – Structural Genomics Consortium

Page 3

Structural Genomics Aims

Pathogens and disease

Human Proteins

Coverage of Fold Space

Automation / High Throughput

Page 4

~1.3m non-redundant protein sequences (figure: example sequence fragments such as MRTKSPGDSKFHEITKTPPKNQVSNS…)

Proteins: known sequences and 3D structures

5,500 non-redundant structures

~260,000 homology models

Page 5

~10% unknown

Proteins: known sequences and 3D structures

5,500 non-redundant structures

Homology models

3D structures of ~16,000 carefully selected proteins

Page 6

Protein Function

Protein function has many definitions:

• Biochemical Function - The biochemical role of the protein e.g. serine protease

• Biological Function - The role of the protein in the cell/organism, e.g. digestion, blood clotting, fertilisation

Page 7

Function through homology

Surface comparison

Sequence similarity

Motif searches

Active Site Templates

Structural Similarity

HTH motifs

Page 8

Template Methodology

Use 3D templates to describe the active site of an enzyme: analogous to 1D sequence motifs such as those in PROSITE, but in 3D

(Wallace et al. 1997)

• Define a functional site

• Search a new structure for a functional site

• Search a database of structures for similar clusters
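The idea of searching a structure for a 3D template can be sketched as follows. This is a minimal illustration, not the method of Wallace et al.: it assumes one representative atom per residue, uses a superposition-free distance rmsd (dRMSD) rather than a fitted rmsd, and the template coordinates, the `search` function and the 1.0Å cut-off are all hypothetical.

```python
from itertools import combinations, permutations
from math import dist, sqrt

# Hypothetical 3-residue template: (residue type, coordinate of one functional atom in Å).
TEMPLATE = [("SER", (0.0, 0.0, 0.0)),
            ("HIS", (3.0, 0.0, 0.0)),
            ("ASP", (3.0, 4.0, 0.0))]


def drmsd(coords_a, coords_b):
    """Root-mean-square difference between corresponding inter-atom distances."""
    pairs = list(combinations(range(len(coords_a)), 2))
    squared = [(dist(coords_a[i], coords_a[j]) - dist(coords_b[i], coords_b[j])) ** 2
               for i, j in pairs]
    return sqrt(sum(squared) / len(squared))


def search(structure, template, cutoff=1.0):
    """Return sorted (dRMSD, residue ids) for residue sets whose types match the template.

    structure is a list of (residue id, residue type, coordinate) tuples.
    The brute-force enumeration is only suitable for small examples.
    """
    template_types = [res_type for res_type, _ in template]
    template_coords = [coord for _, coord in template]
    hits = []
    for combo in permutations(structure, len(template)):
        if [res_type for _, res_type, _ in combo] != template_types:
            continue
        score = drmsd(template_coords, [coord for _, _, coord in combo])
        if score <= cutoff:
            hits.append((score, tuple(res_id for res_id, _, _ in combo)))
    return sorted(hits)


# Toy structure (made-up coordinates): residues 10, 57 and 102 reproduce the template geometry.
structure = [(10, "SER", (1.0, 1.0, 1.0)),
             (57, "HIS", (4.0, 1.0, 1.0)),
             (102, "ASP", (4.0, 5.0, 1.0)),
             (200, "ASP", (20.0, 20.0, 20.0)),
             (5, "GLY", (2.0, 2.0, 2.0))]
hits = search(structure, TEMPLATE)
```

Because dRMSD compares distances only, it cannot tell a site from its mirror image; a superposition-based rmsd would be needed for that.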

Page 9

SiteSeer’s “reverse” templates

Query structure

3-residue templates (figure: nine numbered example templates, 1 to 9)

Page 10

Problems with template methods

• Too many hits (hundreds, thousands or even tens of thousands)

• Use of rmsd rarely discriminates true from false positives

• Local distortion in structure may give a large rmsd

• Top hit rarely the correct hit – even in “obvious” cases

Page 11

An example

PDB code: 1hsk

UDP-N-acetylenolpyruvoylglucosamine reductase (MurB)

E.C.1.1.1.158

Contains the 3D template that characterises this enzyme class

Sequence identity to template’s representative structure (1mbb) is 28%

Template residues (figure labels): Ser, Arg, Glu

Page 12

Enzyme active site templates

Hits for 1hsk

Hit E.C number Rmsd Enzyme

1. E.C.1.3.99.2 0.76Å Acyl-CoA dehydrogenase

2. E.C.4.2.1.20 0.76Å Tryptophan synthase α-subunit

3. E.C.3.2.1.73 1.19Å Glycosyl hydrolases, family 17

4. E.C.3.2.1.73 1.21Å Glycosyl hydrolases, family 16

5. E.C.4.1.2.13 1.25Å Fructose-bisphosphate aldolase (class I)

… … …

102. E.C.1.1.1.158 2.19Å UDP-N-acetylmuramate dehydrogenase

… … …

386. … 3.94Å …

(figure: matched Arg, Glu and Ser residues, rmsd=2.19Å)

Page 13

Comparison of template environments

Template structure – 1mbb

Query structure – 1hsk

Match to template: Arg, Glu, Ser

Page 15

Comparison of template environments

Template structure – 1mbb

Query structure – 1hsk

Identical residues in neighbourhood (highlighted in figure)

Page 16

Comparison of template environments

Template structure – 1mbb

Query structure – 1hsk

Template residues: Arg, Glu, Ser

Similar residues in neighbourhood (highlighted in figure)
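The environment comparison in these slides counts identical and similar residues around the matched template. A rough sketch of such a count is given below. The slides do not give the actual scoring scheme, so the similarity groups, the weights and the assumption that neighbouring residues have already been paired by superposition are all illustrative.

```python
# Hypothetical groups of "similar" amino acids (one-letter codes).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"),
                  set("ST"), set("NQ"), set("AG")]


def classify_pair(template_res, query_res):
    """Label a pair of equivalenced residues as identical, similar or different."""
    if template_res == query_res:
        return "identical"
    if any(template_res in group and query_res in group for group in SIMILAR_GROUPS):
        return "similar"
    return "different"


def environment_score(paired_neighbours, w_identical=2.0, w_similar=1.0):
    """Weighted count of identical and similar residue pairs (weights are assumptions)."""
    counts = {"identical": 0, "similar": 0, "different": 0}
    for template_res, query_res in paired_neighbours:
        counts[classify_pair(template_res, query_res)] += 1
    score = w_identical * counts["identical"] + w_similar * counts["similar"]
    return score, counts


# Made-up neighbourhood pairing: (residue in template structure, residue in query structure).
pairs = [("S", "S"), ("R", "K"), ("E", "D"), ("G", "W"), ("L", "L")]
score, counts = environment_score(pairs)
```

A score of this kind rewards a conserved chemical environment even when local distortion inflates the rmsd of the template residues themselves.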

Page 17

Results for 1hsk

Hit E.C number Rmsd (Å) Score Enzyme

1. E.C.1.1.1.158 2.08 209.1 UDP-N-acetylmuramate dehydrogenase

2. E.C.3.2.1.14 2.13 146.0 Chitinase A (chitodextrinase; 1,4-beta-poly-N-acetylglucosaminidase; poly-beta-glucosaminidase)

3. E.C.3.2.1.17 1.92 142.4 Turkey lysozyme

4. E.C.3.2.1.17 1.89 138.7 Hen lysozyme

5. E.C.3.5.1.26 1.47 132.3 Aspartylglucosylaminidase

6. E.C.3.2.1.3 1.54 131.1 Glucan 1,4-alpha-glucosidase
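The effect of the new score on ranking can be checked directly from the six hits listed on this slide. The snippet below only re-sorts those six rows; the full hit list is not in the slides, so the ranks apply to this subset alone, and the helper `rank_of` is introduced here for illustration.

```python
# (E.C. number, rmsd in Å, score, enzyme) as listed on the slide for query 1hsk.
hits = [
    ("1.1.1.158", 2.08, 209.1, "UDP-N-acetylmuramate dehydrogenase"),
    ("3.2.1.14", 2.13, 146.0, "Chitinase A"),
    ("3.2.1.17", 1.92, 142.4, "Turkey lysozyme"),
    ("3.2.1.17", 1.89, 138.7, "Hen lysozyme"),
    ("3.5.1.26", 1.47, 132.3, "Aspartylglucosylaminidase"),
    ("3.2.1.3", 1.54, 131.1, "Glucan 1,4-alpha-glucosidase"),
]

by_rmsd = sorted(hits, key=lambda hit: hit[1])                  # lowest rmsd first
by_score = sorted(hits, key=lambda hit: hit[2], reverse=True)   # highest score first


def rank_of(ec_number, ranked):
    """1-based position of the first hit with the given E.C. number."""
    return next(i for i, hit in enumerate(ranked, start=1) if hit[0] == ec_number)


rank_by_rmsd = rank_of("1.1.1.158", by_rmsd)
rank_by_score = rank_of("1.1.1.158", by_score)
```

Among these six rows the correct enzyme class ranks fifth by rmsd but first by score.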

Page 18

ProFunc – function from 3D structure

Functional sequence motifs, e.g. Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]

HTH-motifs Electrostatics Surface comparison Nests … etc

Homologous structures of known function

Homologous sequences of known function

Template based methods

Binding site identification and analysis

Residue conservation analysis

Function
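The sequence-motif component uses PROSITE-style patterns such as the one shown on this slide. As a small sketch, such a pattern can be translated into a regular expression and scanned against a sequence. The converter below is a simplification (it ignores the `<` and `>` anchors), and the test sequence is made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.split("-"):
        # Split an element such as "x(3)" or "x(2,4)" into its core and repeat counts.
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        # "[ABC]" and single letters are already valid regex fragments.
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex("Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]")

# Hypothetical sequence containing one occurrence of the motif.
match = re.search(regex, "MKQAAAGLCYPPSN")
```

Here `match.start()` gives the zero-based position of the motif in the sequence, or `match` is `None` when the pattern is absent.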

Page 19

Large scale analysis

• Create an edited version of the target database from the PDB – only those with status “In PDB”

• Extract all PDB codes for each Structural Genomics group

• Extract ‘prior’ knowledge (Header, Title, Jrnl, etc.)

• Find any associated GOA annotation

• Classify each structure by whether function is “known”, “unknown” or “limited info”

• Run ProFunc in a batch process on all codes (~560)

• Extract summary results from each analysis

• Compare to prior knowledge and estimate success
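The classification step above could be approximated with a keyword scan over the extracted header and title text. The slides do not say how the classification was actually done, so the keyword lists and the function `classify_prior_knowledge` below are assumptions, and the example records are invented.

```python
# Hypothetical keyword lists; a real analysis would likely also use GOA annotation.
UNKNOWN_TERMS = ("unknown function", "hypothetical protein",
                 "uncharacterized", "uncharacterised")
LIMITED_TERMS = ("putative", "probable", "possible")


def classify_prior_knowledge(header, title):
    """Classify an entry as 'known', 'unknown' or 'limited info' from its PDB text."""
    text = f"{header} {title}".lower()
    if any(term in text for term in UNKNOWN_TERMS):
        return "unknown"
    if any(term in text for term in LIMITED_TERMS):
        return "limited info"
    return "known"


# Invented example records: (HEADER line, TITLE line).
examples = [
    ("STRUCTURAL GENOMICS, UNKNOWN FUNCTION", "Crystal structure of a hypothetical protein"),
    ("HYDROLASE", "Structure of a probable glutaminase"),
    ("OXIDOREDUCTASE", "UDP-N-acetylenolpyruvoylglucosamine reductase"),
]
labels = [classify_prior_knowledge(header, title) for header, title in examples]
```

Checking the "unknown" terms first means an entry described as a "putative hypothetical protein" is still counted as unknown.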

Page 20

Number of deposits to the TargetDB by Structural Genomics group (Total of 577 unique entries)

CESG (6)

JCSG (59)

NESG (83)

NYSGC (73)

PSF (4)

S2F (37)

SECSG (19)

TB (26)

MCSG (117)

BCSG (35)

RIKEN (124)

March 2004

Page 21

PDB Blast

(pie chart: 63% / 37%; categories: No Matches, Significant Hits (> 30% Seq ID))

• Run query sequences against the PDB using BLAST

• Filter out those matches released AFTER the query sequence

• Any hits are excluded from subsequent analyses

• Still get significant matches – why?

Target selection criteria

Released within months of SG target
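The filtering described on this slide can be sketched as a simple pass over parsed BLAST hits. The sketch assumes the hits have already been parsed into records; the `Hit` type, the function name and the example entries (with placeholder PDB codes) are illustrative. The 30% identity threshold is the one quoted on the slide.

```python
from datetime import date
from typing import NamedTuple


class Hit(NamedTuple):
    pdb_code: str
    seq_identity: float   # percent sequence identity to the query
    released: date        # release date of the matched PDB entry


def prior_significant_hits(hits, query_released, min_identity=30.0):
    """Keep hits above the identity threshold that were not released after the query."""
    return [hit for hit in hits
            if hit.released <= query_released and hit.seq_identity > min_identity]


# Invented example: placeholder codes, not real PDB entries.
query_released = date(2003, 6, 1)
hits = [
    Hit("xxx1", 45.0, date(2001, 3, 15)),   # earlier and similar: counts as prior knowledge
    Hit("xxx2", 80.0, date(2003, 9, 1)),    # released after the query: filtered out
    Hit("xxx3", 22.0, date(2000, 1, 10)),   # below the identity threshold
]
kept = prior_significant_hits(hits, query_released)
```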

Page 22

InterPro Scan

(pie chart categories: No Matches, Significant Hits)

• InterPro scan on proteins of known function

• Cannot “backdate” the InterPro database

• Essentially each protein picks up itself

Page 23

Function of query structure “known”

(bar chart, 0% to 100%, one bar per method: HTH motif, Enzyme, Ligand, DNA, SiteSeer, SSM; categories: No Hits, Different Function, Same Function)

Page 24

Limited Functional Info

(bar chart, 0% to 100%, one bar per method: HTH motif, Enzyme, Ligand, DNA, SiteSeer, SSM; categories: No Hits, Different Function, Same Function, New Function)

Page 25

Unknown Function

(bar chart, 0% to 100%, one bar per method: InterPro, HTH motif, Enzyme, Ligand, DNA, SiteSeer, SSM; categories: No Hits, Hit Unknown Function, New Function)

Page 26

The Good, the Not So Good and the Ugly

Three examples show the varying levels of information that can be retrieved from structures:

1. New functional assignment

2. Possible function identified

3. Function remains unknown

Page 27

The Good: BioH structure (MCSG)

One very strong hit: Ser-His-Asp catalytic triad of the lipases with rmsd=0.28Å (template cut-off is 1.2Å)

Experimentally confirmed by hydrolase assays

Novel carboxylesterase acting on short acyl chain substrates

Function Discovered

Page 28

The Not So Good: APC1040 (MCSG)

• Assigned as a probable glutaminase

• Most methods suggest β-lactamase activity

• No match to Prosite patterns

Class A: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]

APC1040: 70 F-T-M-Q-S-I-S-K-V-I-S-F-I-A-A-C 85

Function being assayed
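The near miss between the Class A pattern and residues 70 to 85 of APC1040 can be made explicit by checking the alignment position by position. The sketch below uses the pattern and sequence exactly as shown on the slide; the helper functions are illustrative and handle only fixed repeats such as x(4), not ranges.

```python
def expand(pattern):
    """Expand fixed repeats so each element covers one residue, e.g. x(4) -> x, x, x, x."""
    elements = []
    for element in pattern.replace(" ", "").split("-"):
        if element.endswith(")"):
            core, count = element[:-1].split("(")
            elements.extend([core] * int(count))
        else:
            elements.append(element)
    return elements


def element_matches(element, residue):
    """Test one residue against one pattern element."""
    if element == "x":
        return True
    if element.startswith("["):
        return residue in element[1:-1]
    if element.startswith("{"):
        return residue not in element[1:-1]
    return residue == element


def mismatches(pattern, sequence, start):
    """Return (residue number, residue, pattern element) for every failing position."""
    elements = expand(pattern)
    if len(elements) != len(sequence):
        raise ValueError("pattern and sequence segment differ in length")
    return [(start + i, residue, element)
            for i, (element, residue) in enumerate(zip(elements, sequence))
            if not element_matches(element, residue)]


CLASS_A = "[FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]"
APC1040_SEGMENT = "FTMQSISKVISFIAAC"   # residues 70-85 as shown on the slide

failed = mismatches(CLASS_A, APC1040_SEGMENT, start=70)
```

Fourteen of the sixteen positions agree; the segment fails only at Ile75 (pattern requires [TV]) and Ile82 (pattern requires [AGLM]), which is consistent with the structure-based methods suggesting β-lactamase activity despite the absence of a strict pattern match.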

Page 29

The Ugly: MT0777 (MCSG)

Hypothetical protein from Methanobacterium thermoautotrophicum

• Fold associated with many functions (Rossmann fold)

• No sequence motifs

• Residue conservation is poor

• Template methods fail

Function Unknown

Page 30

Future Work

• Improvements to scoring system and additional templates

• Further utilisation of SOAP services as they become available (e.g. KEGG API service)

• Possible adaptation to use as part of a larger workflow or in LIMS systems (Taverna and MyGrid)

• More truly predictive analyses being developed (e.g. electrostatics, ligand prediction, catalytic residue prediction)

Page 31

Detection of DNA-binding proteins (with HTH motif) using structural motifs and electrostatics

(Hugh Shanahan)

● Combine electrostatics with HTH structural templates.

● Can detect HTH DNA-binding proteins only.

● 1/3 of DNA-binding protein families have an HTH motif.

● Use linear predictor as discriminant.

● True positive rate (~80%) comparable to that of more complicated methods.

● Very low (< 0.01%) false positive rate.
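A linear predictor used as a discriminant is simply a weighted sum of features compared against a threshold. The sketch below shows the general form only: the two features (an HTH template rmsd and an electrostatic score), the weights and the bias are hypothetical and are not the values used in the work described.

```python
def linear_score(features, weights, bias):
    """Weighted sum of the features plus a constant term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_dna_binding(template_rmsd, electrostatic_score,
                        weights=(-1.0, 2.0), bias=0.5):
    """Predict DNA binding when the linear score is positive (illustrative weights).

    A low rmsd to an HTH template and a high electrostatic score both push
    the prediction towards DNA binding.
    """
    return linear_score((template_rmsd, electrostatic_score), weights, bias) > 0


good_candidate = predict_dna_binding(template_rmsd=0.8, electrostatic_score=0.6)
poor_candidate = predict_dna_binding(template_rmsd=2.5, electrostatic_score=0.2)
```

In practice the weights would be fitted on proteins with known DNA-binding status, and the decision threshold chosen to trade true positives against false positives.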

Page 32

Ligand Prediction

Active Site & Ligand description/fingerprinting methods:

Can active site geometry, shape, physical-chemical properties etc. be used to predict the preferred ligand class?

• Spherical Harmonics

• Hybrid Ellipsoids

Page 33

Spherical Harmonics (Richard Morris)

The computation of Legendre polynomials of high order requires a robust integration scheme

Spherical t-designs
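As background to the point about high-order Legendre polynomials, they can be evaluated stably with Bonnet's three-term recurrence, (n+1)P_{n+1}(x) = (2n+1)xP_n(x) - nP_{n-1}(x), rather than from explicit polynomial coefficients. The function below is a generic sketch of that recurrence only; it does not implement the integration scheme or the spherical t-designs mentioned on the slide.

```python
def legendre(n, x):
    """Evaluate the Legendre polynomial P_n(x) using Bonnet's recurrence."""
    if n == 0:
        return 1.0
    previous, current = 1.0, x            # P_0(x) and P_1(x)
    for k in range(1, n):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        previous, current = current, ((2 * k + 1) * x * current - k * previous) / (k + 1)
    return current


p2 = legendre(2, 0.5)    # (3 * 0.25 - 1) / 2
p3 = legendre(3, 0.5)    # (5 * 0.125 - 3 * 0.5) / 2
```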

Page 34

Hybrid Ellipsoids (Rafael Najmanovich)

• Every shape can be modelled by a set of hybrid ellipsoids

• The parameters describe the location and semi-axes a, b, c of the ellipsoid and a smear factor

• Similar parameters mean similar active sites and ligands

Page 35

Predicting Catalytic Residues

(Alex Gutteridge)

• Aims:

• To predict the location of the active site in an enzyme structure.

• To predict the catalytic residues of an enzyme.

• How?

• Train a neural network to identify catalytic residues.

• Cluster high scoring residues to find the active site.
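The second step, clustering high-scoring residues to locate the active site, can be sketched with a simple single-linkage grouping. The per-residue scores are assumed to come from the trained network (not shown); the score cut-off, the 8Å distance cut-off and the function name are illustrative choices, not values from the slides.

```python
from math import dist


def cluster_high_scoring(residues, score_cutoff=0.5, distance_cutoff=8.0):
    """Group residues scoring at or above the cut-off into single-linkage clusters.

    residues is a list of (residue id, score, (x, y, z)) tuples.
    Returns lists of residue ids, largest cluster first.
    """
    selected = [res for res in residues if res[1] >= score_cutoff]
    unvisited = set(range(len(selected)))
    clusters = []
    while unvisited:
        stack = [unvisited.pop()]
        members = []
        while stack:
            i = stack.pop()
            members.append(i)
            # Pull in every unvisited residue within the distance cut-off of residue i.
            near = {j for j in unvisited
                    if dist(selected[i][2], selected[j][2]) <= distance_cutoff}
            unvisited -= near
            stack.extend(near)
        clusters.append(sorted(selected[i][0] for i in members))
    return sorted(clusters, key=len, reverse=True)


# Made-up scores and coordinates (Å).
residues = [(10, 0.9, (0.0, 0.0, 0.0)),
            (57, 0.8, (5.0, 0.0, 0.0)),
            (102, 0.7, (9.0, 0.0, 0.0)),
            (200, 0.95, (40.0, 40.0, 40.0)),
            (150, 0.1, (2.0, 0.0, 0.0))]
clusters = cluster_high_scoring(residues)
```

The largest cluster is taken as the predicted active site; an isolated high-scoring residue such as 200 ends up in a cluster of its own and can be discounted.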

Page 36

Workflows and Taverna (Tom Oinn)

• Most procedures used now follow a workflow-type scheme

• Taverna allows users to pick elements from services to create their own workflows for automation of complex sets of procedures.

• Removes the need to write complex scripts

Beta 9 release available at: http://taverna.sourceforge.net/

Page 37

Acknowledgements

• Janet Thornton

• Christine Orengo

• Roman Laskowski – ProFunc

• Richard Morris – InterPro search, Spherical Harmonics

• Gail Bartlett, Craig Porter – Enzyme Templates

• Alex Gutteridge – Catalytic Residue Prediction

• Sue Jones – HTH motifs

• Hugh Shanahan – DNA binding, Electrostatics

• Jonathan Barker – JESS

• Hannes Ponstingl – PITA

• Rafael Najmanovich – Hybrid Ellipsoids

• Martin Senger, Siamak Sobhany – SOAP, Tom Oinn – Taverna

• Annabel Todd and Russell Marsden – UCL

• MCSG consortium for lots of structures, plus many more at EBI and UCL

• Work was supported by NIH grant (GM 62414) and by the US DoE under contract (W-31-109-Eng-38)