Protein Structure Database for Structural Genomics Group. Jessica Lau. December 13, 2004. M.S. Thesis Defense.


Page 1: Protein Structure Database  for Structural Genomics Group

Protein Structure Database for Structural Genomics Group

Jessica Lau

December 13, 2004

M.S. Thesis Defense

Page 2: Protein Structure Database  for Structural Genomics Group

• Bioinformatics is
  – Analysis of biological data: gene expression, DNA sequence, protein sequence.
  – Data mining and management of biological information through database systems.
• At the Northeast Structural Genomics Consortium (NESG), database management systems play a large role in daily operations
  – Data collection and mining of experimental results
  – Tracking target progress through status milestones
  – Exchanging information with the rest of the world
• My thesis presents work in database management systems at the NESG.
  – Part 1: ZebaView
  – Part 2: Worm Structure Gallery
  – Part 3: Prototype of NESG Structure Gallery

Page 3: Protein Structure Database  for Structural Genomics Group

• ZebaView is the official target list of the Northeast Structural Genomics Consortium.
• Displays a summary table of NESG targets.
  – Status milestones
  – Protein properties: DNA and protein sequences, molecular weight, isoelectric point
• New targets are curated and then uploaded to SPiNE.
• 11,284 targets from 88 organisms.
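One of the protein properties in the summary table, molecular weight, can be derived directly from the protein sequence. The sketch below is only an illustration of that calculation, not ZebaView's own code: the function name `molecular_weight` is hypothetical, and it assumes an unmodified chain made of the 20 standard amino acids, using standard average residue masses.

```python
# Average residue masses in daltons (standard values; each is the free amino
# acid mass minus one water, which is lost when a peptide bond forms).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Return the average molecular weight (Da) of an unmodified protein chain.

    Hypothetical helper for illustration; not taken from ZebaView.
    """
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS
```

The isoelectric point needs a set of pKa values and an iterative charge calculation, so it is left out of this sketch.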

Page 4: Protein Structure Database  for Structural Genomics Group

Family View

NESG Families

• Unfolded
• Membrane
• Core 50
• NF-κB

Page 5: Protein Structure Database  for Structural Genomics Group

Target Summary Statistics

[Chart: In PDB / Cloned, Prokaryotic vs. Eukaryotic. X-axis: Organism (H. sapiens (H), D. melanogaster (F), S. cerevisiae (Y), C. elegans (W)). Y-axis: Percentage In PDB/Cloned, 0 to 35. Series: Prokaryotic, Eukaryotic.]

[Chart: Success of soluble targets, Prokaryotic vs. Eukaryotic. X-axis: Organism (D. melanogaster (F), S. cerevisiae (Y), H. sapiens (H), C. elegans (W)). Y-axis: Percentage of Soluble/Cloned, 0 to 90. Series: Prokaryotic, Eukaryotic.]

Status milestones: Selected → Cloned → Expressed → Soluble → Purified → X-ray or NMR data collection → In PDB

• 4,418 targets cloned
• 141 structures
• 3.4% successful targets
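The charts and totals on this slide are all ratios of one status milestone to the number of cloned targets. A minimal sketch of that arithmetic follows; the function name and the dictionary layout are assumptions for illustration, and treating "141 structures" as the "In PDB" count is also an assumption. With the slide's totals, 141 / 4,418 comes to roughly 3.2%, so the quoted 3.4% presumably rests on slightly different counts that the slides do not spell out.

```python
# Status milestones in pipeline order, as listed on the slide.
MILESTONES = [
    "Selected", "Cloned", "Expressed", "Soluble", "Purified",
    "X-ray or NMR data collection", "In PDB",
]


def percent_of_cloned(counts: dict, milestone: str) -> float:
    """Percentage of cloned targets that reached `milestone`.

    `counts` maps milestone name -> number of targets (hypothetical layout).
    """
    if milestone not in MILESTONES:
        raise ValueError(f"unknown milestone: {milestone}")
    cloned = counts["Cloned"]
    if cloned == 0:
        return 0.0
    return 100.0 * counts[milestone] / cloned


# Totals quoted on the slide; mapping "141 structures" to "In PDB" is assumed.
totals = {"Cloned": 4418, "In PDB": 141}
success_rate = percent_of_cloned(totals, "In PDB")
```

The same function gives the per-organism bars (In PDB/Cloned, Soluble/Cloned) when called with per-organism counts.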

Page 6: Protein Structure Database  for Structural Genomics Group

GO, Cellular Localization, and SignalP

• Search for targets that have
  – any of the three GO ontologies defined
  – no GO ontologies defined at all

116 NESG structures do not have Molecular Function defined
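The two searches above, plus the count of structures lacking a Molecular Function, amount to simple filters over per-target GO annotations. The sketch below assumes a hypothetical in-memory layout (target ID mapped to a dictionary of ontology name to GO terms); the function names and target IDs are illustrative and not taken from ZebaView.

```python
# The three Gene Ontology aspects.
GO_ONTOLOGIES = ("molecular_function", "biological_process", "cellular_component")


def has_any_go(annotations: dict) -> bool:
    """True if at least one of the three ontologies has a term assigned."""
    return any(annotations.get(ontology) for ontology in GO_ONTOLOGIES)


def split_by_go(targets: dict):
    """Split target IDs into (annotated, unannotated) lists."""
    annotated = [tid for tid, ann in targets.items() if has_any_go(ann)]
    unannotated = [tid for tid, ann in targets.items() if not has_any_go(ann)]
    return annotated, unannotated


def missing_ontology(targets: dict, ontology: str):
    """Target IDs with no term in the given ontology."""
    if ontology not in GO_ONTOLOGIES:
        raise ValueError(f"unknown ontology: {ontology}")
    return [tid for tid, ann in targets.items() if not ann.get(ontology)]


# Hypothetical targets for illustration only.
example_targets = {
    "target-1": {"molecular_function": ["GO:0003824"]},
    "target-2": {},
    "target-3": {"cellular_component": ["GO:0005737"]},
}
no_function = missing_ontology(example_targets, "molecular_function")
```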

Page 7: Protein Structure Database  for Structural Genomics Group

LOCTarget

• Secretory proteins require formation of disulfide bonds
• Oxidative folding is needed for proper native folding
• 2,132 “Extracellular” NESG targets

Bovine ribonuclease A has four disulfide bonds that stabilize its 3-D structure.
Mahesh Narayan et al. (2000) Acc. Chem. Res., 33 (11), 805-812.

Page 8: Protein Structure Database  for Structural Genomics Group

SignalP

• mRNAs are translated with a signal peptide for cellular localization
• The peptide is cleaved once the protein reaches its destination
• SignalP predicts the cleavage site of the signal peptide
• Removal of the signal peptide gives the proper native fold

Lodish et al. Molecular Cell Biology 4th edition, Figure 7.1 (2000)
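Once a cleavage site has been predicted, obtaining the mature sequence is a matter of trimming the precursor. The sketch below does not call SignalP; it assumes the predicted position is already available as the 1-based index of the last signal-peptide residue. The function name and the example sequence are hypothetical.

```python
def mature_sequence(precursor: str, signal_length: int) -> str:
    """Return the sequence left after removing the N-terminal signal peptide.

    `signal_length` is the number of residues in the predicted signal peptide,
    i.e. cleavage occurs after that residue (1-based). A value of 0 means no
    signal peptide was predicted and the sequence is returned unchanged.
    """
    if not 0 <= signal_length < len(precursor):
        raise ValueError("cleavage position lies outside the sequence")
    return precursor[signal_length:]


# Made-up precursor: a 10-residue signal peptide followed by the mature chain.
signal_peptide = "MKTLLALAAS"
mature_part = "AQDEVKR"
trimmed = mature_sequence(signal_peptide + mature_part, len(signal_peptide))
```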

Page 9: Protein Structure Database  for Structural Genomics Group

Part 2 – Worm Structure Gallery

Page 10: Protein Structure Database  for Structural Genomics Group

Caenorhabditis elegans
– Widely studied model organism
  • 2-3 week life span, small size (1.5 mm long), ease of laboratory cultivation, transparent body
  • Small genome, yet complex organ systems similar to those of higher organisms: digestive, excretory, neuromuscular, reproductive

Donald Riddle et al., C. elegans II (1997)

Altun ZF and Hall DH, Atlas of C. elegans Anatomy, WormAtlas (2002-2004)

Page 11: Protein Structure Database  for Structural Genomics Group

System Components

• 22,653 C. elegans proteins
• 42 experimentally determined
  – 4 are from NESG
• 24 homology models
  – 14 are from NESG
• 960 C. elegans proteins potentially modeled
• UniProt: Pfam domain, gene name, ORF name
• PDB coordinates
• Structure Validation Report
• Sequence similarities to proteins in PDB

Page 12: Protein Structure Database  for Structural Genomics Group

Protein Structure Validation Software

• Suite of quality validation software
  – PROCHECK
    • Quality of experimental data
    • Distribution of φ, ψ angles in Ramachandran plot
  – MolProbity Clashscore
    • Number of H atom clashes per 1,000 atoms
• Z-scores are reported with respect to a set of scores from 129 high-resolution X-ray crystal structures
  – < 500 residues, resolution <= 1.80 Å, R-factor <= 0.25 and R-free <= 0.28

Bhattacharya, A. et al., to be published
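Scoring "with respect to" a reference set is the usual Z-score normalization: subtract the reference mean and divide by the reference standard deviation. The sketch below shows that calculation only; the function name, the use of the sample standard deviation, and the `higher_is_better` sign flip (so that a raw score where lower is better, such as a clash count, still yields negative Z-scores for poor structures) are assumptions, not details confirmed by the slides.

```python
from statistics import mean, stdev


def z_score(raw: float, reference: list, higher_is_better: bool = True) -> float:
    """Normalize `raw` against the scores of a reference set of structures.

    `reference` would hold the same quality measure computed for each of the
    high-resolution reference structures. Sample standard deviation is assumed.
    """
    mu = mean(reference)
    sigma = stdev(reference)
    if sigma == 0:
        raise ValueError("reference scores have zero spread")
    z = (raw - mu) / sigma
    # Flip the sign for measures where a lower raw value means better quality.
    return z if higher_is_better else -z
```

With this convention a Z-score near 0 means the structure scores like a typical reference structure, and strongly negative values flag structures that score much worse.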

Page 13: Protein Structure Database  for Structural Genomics Group

Homology Modeling Automatically (HOMA)

• Algorithm based on alignment between query and template sequences.
  – Regions of conserved residues form a set of constraints for modeling
• Sequence identity of 40% or more
• Good quality template
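The 40% identity requirement can be checked directly on the query-template alignment. The sketch below is not HOMA code: the function names are hypothetical, and it assumes identity is counted over aligned columns where neither sequence has a gap (other conventions divide by the query length or the full alignment length and give different numbers).

```python
def percent_identity(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns in which the two residues match."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_identity_cutoff(aligned_query: str, aligned_template: str,
                          cutoff: float = 40.0) -> bool:
    """True if the pair satisfies the 'identity of 40% or more' criterion."""
    return percent_identity(aligned_query, aligned_template) >= cutoff
```

Identity alone says nothing about template quality, which the slide lists as a separate requirement.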

Page 14: Protein Structure Database  for Structural Genomics Group

Bad alignment → Bad model

Page 15: Protein Structure Database  for Structural Genomics Group

Poor quality template → Poor quality model

Page 16: Protein Structure Database  for Structural Genomics Group

Quality scores of 3-D structures

[Scatter plot: Quality Z-scores, Homology Models vs. Experimentally Determined Structures. X-axis: Procheck (all) z-score, -10 to 2. Y-axis: MolProbity Clashscore z-score, -45 to 5. Series: Homology Models, Experimentally Determined Structures.]

Page 17: Protein Structure Database  for Structural Genomics Group

Search

• Search for C. elegans proteins in local database.

• Keyword: “Ubiquitin” in any field

Results: 72 C. elegans proteins; 2 experimentally determined structures; 1 homology model; 11 potential models

Results: 152 C. elegans proteins; 2 experimentally determined structures; 1 homology model; 19 potential models
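A keyword search "in any field" can be pictured as a case-insensitive substring match over every field of every protein record. The sketch below is a simplified in-memory illustration; the gallery itself queries a database, and the function name, field names, and records here are hypothetical.

```python
def search_any_field(records: list, keyword: str) -> list:
    """Return the records in which any field contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [
        record for record in records
        if any(needle in str(value).lower() for value in record.values())
    ]


# Made-up records for illustration only.
example_records = [
    {"id": "example-1", "gene_name": "gene-a", "description": "Polyubiquitin"},
    {"id": "example-2", "gene_name": "gene-b", "description": "Ubiquitin-conjugating enzyme"},
    {"id": "example-3", "gene_name": "gene-c", "description": "Protein kinase"},
]
hits = search_any_field(example_records, "Ubiquitin")
```

Because the match is a substring test, "Ubiquitin" also finds "Polyubiquitin"; restricting the search to a single field would typically return fewer hits.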

Page 18: Protein Structure Database  for Structural Genomics Group

System Architecture
• Java, Tomcat, MySQL, Perl.

Three-tier architecture
• Client: Web browser
• Application: JSP, logic components, data access components
• Data: MySQL
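The separation of tiers can be sketched in a few functions: a data access component that is the only code touching the database, a logic component that works on its results, and a presentation step that formats output for the client. This is an illustration of the layering only. Python and the standard-library `sqlite3` module stand in for the Java/JSP components and MySQL named on the slide, and the table, column, and function names are all hypothetical.

```python
import sqlite3


def open_database() -> sqlite3.Connection:
    """Data tier: an in-memory SQLite database standing in for MySQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein (id TEXT PRIMARY KEY, description TEXT, has_structure INTEGER)"
    )
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [
            ("example-1", "Ubiquitin-like protein", 1),
            ("example-2", "Ubiquitin ligase", 0),
            ("example-3", "Protein kinase", 0),
        ],
    )
    return conn


def find_proteins(conn: sqlite3.Connection, keyword: str) -> list:
    """Data access component: the only place that issues SQL."""
    cursor = conn.execute(
        "SELECT id, description, has_structure FROM protein "
        "WHERE description LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    )
    return cursor.fetchall()


def summarize(rows: list) -> dict:
    """Logic component: derive counts from rows without knowing about SQL."""
    return {
        "proteins": len(rows),
        "with_structure": sum(1 for row in rows if row[2]),
    }


def render(summary: dict) -> str:
    """Presentation: format the summary for the client (a JSP page in the real system)."""
    return f"Results: {summary['proteins']} proteins, {summary['with_structure']} with structures"
```

Keeping SQL inside the data access function means the database engine can change without touching the logic or presentation code.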

Page 19: Protein Structure Database  for Structural Genomics Group

Part 3 – NESG Structure Gallery

Page 20: Protein Structure Database  for Structural Genomics Group

• Structure files submitted by individual groups
• Structure information is entered into SPiNE manually
• PSVS and MolScript are run manually

• Structure files submitted by automated pipeline
• ADIT integrated with SPiNE for uniform format
• PSVS reports and images generated automatically
• Structure information from PSVS goes directly into SPiNE
• Structure files are archived

Page 21: Protein Structure Database  for Structural Genomics Group

• Downloads
  – Structure Validation Report
  – Structure-related files
    • Atomic coordinates
    • NMR constraints
    • NMR peak lists
    • Chemical shifts
    • Structure factor
• Annotation
  – Functional annotation provided by other NESG members
  – UniProt
  – PDB coordinates file
• Reusing Java components from Worm Structure Gallery

Page 22: Protein Structure Database  for Structural Genomics Group

– Enhance ZebaView performance to handle increased load and functionalities

– Integrate annotation from other protein and structure databases.

– Make modules available for other Java-based applications within structural genomics.

– Develop a gallery for other organisms: yeast, fruit fly, human

– Continue specifications for the new NESG Structure Gallery

Page 23: Protein Structure Database  for Structural Genomics Group

Advisor: Dr. Gaetano Montelione

Thanks to everyone at the Protein NMR lab and NESG!

Aneerban Bhattacharya
John Everett

All the scientists who solved the structures!