computational analysis of proteins

Computational Analysis of Proteins Dr. K. Sivakumar Department of Chemistry SCSVMV University [email protected] Chemistry – Our Life, Our Future National Workshop on Modern Techniques in Analytical Chemistry www.kanchiuniv.ac.in/DrKSivakumar_chemistry.html


Post on 02-Feb-2016


Page 1: Computational Analysis of Proteins

Computational Analysis of Proteins

Dr. K. Sivakumar

Department of Chemistry

SCSVMV University

[email protected]

Chemistry – Our Life, Our Future

National Workshop

on

Modern Techniques in Analytical Chemistry

www.kanchiuniv.ac.in/DrKSivakumar_chemistry.html

Page 2: Computational Analysis of Proteins

AMINO ACIDS: THE BUILDING BLOCKS OF PROTEINS

Triple- and single-letter codes of amino acids

General structure of an amino acid

Amino acid      Triple-letter code   Single-letter code
Alanine         Ala                  A
Cysteine        Cys                  C
Aspartic acid   Asp                  D
Glutamic acid   Glu                  E
Phenylalanine   Phe                  F
Glycine         Gly                  G
Histidine       His                  H
Isoleucine      Ile                  I
Lysine          Lys                  K
Leucine         Leu                  L
Methionine      Met                  M
Asparagine      Asn                  N
Proline         Pro                  P
Glutamine       Gln                  Q
Arginine        Arg                  R
Serine          Ser                  S
Threonine       Thr                  T
Valine          Val                  V
Tryptophan      Trp                  W
Tyrosine        Tyr                  Y
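The code table above can be used directly as a lookup. The following is a minimal Python sketch; the names `ONE_TO_THREE` and `to_three_letter` are illustrative and not part of the original slides.

```python
# One-letter -> three-letter codes, exactly as listed in the table above.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence to hyphen-separated three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


print(to_three_letter("MALSF"))  # Met-Ala-Leu-Ser-Phe
```

A letter outside the 20 standard codes raises a `KeyError`, which is a deliberate choice here so that typing mistakes in a sequence are noticed early.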

Page 3: Computational Analysis of Proteins

PROTEIN SEQUENCING (order of amino acids in proteins)

Protein sequence

MALSFTVGQLIFLFWTMRITEASPD

M: Methionine, A: Alanine, L: Leucine, S: Serine, F: Phenylalanine

Protein sequencer

• Protein sequencing: determining the order of amino acids in a protein

• Methods: mass spectrometry, Edman degradation, …

• The amino acids in a protein determine the properties of the protein

• Proteins are sequenced by microbiologists and biotechnologists for various purposes.
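Because the amino acids present determine a protein's properties, a first computational step is often to count them. The sketch below computes the amino acid composition of the example sequence from this slide; the function name `composition` is an assumption made for illustration.

```python
from collections import Counter


def composition(sequence: str) -> dict:
    """Return the percentage of each residue type, most frequent first."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: round(100.0 * n / total, 1) for aa, n in counts.most_common()}


seq = "MALSFTVGQLIFLFWTMRITEASPD"  # example sequence from the slide
for residue, percent in composition(seq).items():
    print(residue, percent)
```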


Page 4: Computational Analysis of Proteins

www.writersujatha.com

See “GENOME” by Sujatha for a simple explanation of the sequencing process.

Page 5: Computational Analysis of Proteins

Various levels of protein structure…

Page 6: Computational Analysis of Proteins

Methane (CH4): primary structure, secondary structure, tertiary structure

Protein: primary structure, secondary structure, tertiary structure

C for carbon: C stands for a single atom

M for methionine: M stands for a group of atoms

Page 7: Computational Analysis of Proteins

• Sequencing centers continuously submit protein sequences, and the protein databases are updated accordingly.

• To date, more than 10 lakh (1 million) protein sequences have been determined and made publicly available through protein databases. For example:

Protein Sequence Databases: No. of Sequences

524,420

1,365,912

13,593,921

Page 8: Computational Analysis of Proteins

Sequence growth in Protein sequence databases:

Ref: SwissProt, Feb 2011

Ref: GenomeNet, Feb 2011

Page 9: Computational Analysis of Proteins

Protein Sequence Databases: No. of Sequences

524,420 (~5 lakh)

1,365,912 (>10 lakh)

13,593,921 (~1 crore)

The ONLY Protein Structure Database: No. of Structures

70,947 (as of 01 Feb 2011)

Ref: K. Sivakumar, Advanced BioTech, V (9), 20-27 (2007)
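The gap between sequence and structure counts can be made concrete with simple arithmetic. The sketch below uses only the numbers quoted on this slide. It treats the ratio as a rough indicator, since one structure does not correspond to exactly one database sequence; the function name `coverage_percent` is illustrative.

```python
# Counts quoted on the slide (as of Feb 2011).
sequence_counts = [524_420, 1_365_912, 13_593_921]
structure_count = 70_947


def coverage_percent(structures: int, sequences: int) -> float:
    """Structures expressed as a percentage of the number of sequences."""
    return 100.0 * structures / sequences


for n in sequence_counts:
    ratio = coverage_percent(structure_count, n)
    print(f"{n:>12,} sequences: structures amount to {ratio:.2f}%")
```

For the largest sequence database the ratio is about half a percent, which is the information deficit that motivates computational modeling.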

Page 10: Computational Analysis of Proteins

PDB contains 70,947 structures determined by X-ray, NMR & electron microscopy

X-ray: ~60,500

NMR: ~8,700

EM: ~350

Page 11: Computational Analysis of Proteins

Most of the sequenced proteins lack a descriptive, documented physico-chemical and STRUCTURAL characterization.

This is because experimental methods (X-ray, NMR, EM) are:

• Trial-and-error based

• Time consuming

• Expensive

Computational methods:

• Minimize the number of experimental trials.

• Reduce the cost of experimental investigation.

• Help experimental analysis to be more focused.

Ref: K. Sivakumar, S. Balaji, Ganga Radhakrishnan, Journal of Theoretical and Computational Chemistry, 6 (1), 127-140 (2007).

Page 12: Computational Analysis of Proteins

Need for computational analysis

• More than 10 lakh sequences are available in public databases

• Sequences are highly valuable resources, because a huge amount of structural, functional & evolutionary information is locked up in them

• By contrast, the number of unique protein structures is very small; this represents a huge information deficit

• So we need to construct 3D models by COMPUTATIONAL METHODS

Page 13: Computational Analysis of Proteins

3D structure can be modelled by…

• Homology Modeling

• Threading

• Ab initio

Page 14: Computational Analysis of Proteins

Homology Modeling – Principle…

Repeated with other suitable templates

Ref: K. Sivakumar, Advanced BioTech, IV (11), 18-23 (2006)

Page 15: Computational Analysis of Proteins

Predicting Protein Structure: Comparative Modeling (formerly, homology modeling)

Target sequence (1alc), structure unknown (??):

KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE

Template sequence (8lyz), with known template structure:

KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL

Similar sequence: the target and template are homologous, so the template structure is used as a template to model the target.

Page 16: Computational Analysis of Proteins

What is Homology Modeling?

• Predicts the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES)

• If similarity between the TARGET sequence and the TEMPLATE sequence is detected, structural similarity can be assumed.

• In general, a sequence identity of at least about 30% between TARGET and TEMPLATE is required for generating useful models.
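The identity threshold can be checked once a target and a template have been aligned. The following is a minimal sketch, assuming the two sequences are already aligned to equal length with `-` as the gap character and that identity is counted over columns where neither sequence has a gap (other definitions exist). The function names and the toy alignment are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def is_usable_template(aligned_target: str, aligned_template: str,
                       threshold: float = 30.0) -> bool:
    """Apply the roughly 30% identity rule of thumb from the slide."""
    return percent_identity(aligned_target, aligned_template) >= threshold


# Toy alignment for illustration only (not real protein data).
print(is_usable_template("ACDE-FGH", "ACDQWF-H"))
```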

Page 17: Computational Analysis of Proteins

Homology Modeling

Get protein sequence from sequence database

http://expasy.org/sprot/

Page 18: Computational Analysis of Proteins


Click to get protein details

Page 19: Computational Analysis of Proteins


Click to get protein sequence

Page 20: Computational Analysis of Proteins

Protein sequence in FASTA format

Save it in Notepad for further use
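A saved FASTA file is plain text: a header line starting with `>` followed by one or more sequence lines. The sketch below parses such text using only the Python standard library; the function name `parse_fasta` and the example header are hypothetical, chosen for illustration.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical record: the header is made up for illustration.
example = """>example_protein hypothetical header
MALSFTVGQLIF
LFWTMRITEASPD
"""
records = parse_fasta(example)
print(records)
```

To read a file saved from the database page, pass its contents instead, for example `parse_fasta(open("protein.fasta").read())`, where the file name is whatever was chosen when saving.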

Page 21: Computational Analysis of Proteins

Using Protein BLAST server to find similar STRUCTURE

http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins

Paste sequence in FASTA format

Choose PDB

Click to search for similar structures in PDB

Page 22: Computational Analysis of Proteins

Graphical summary of Blastp suite

BLAST search of O70456 vs. PDB

Page 23: Computational Analysis of Proteins

List of similar structures - Blastp suite

Page 24: Computational Analysis of Proteins


Detailed summary of Blastp suite
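Choosing templates from a list of hits can also be scripted. The sketch below assumes the hits have been exported as a tab-separated table in the 12-column layout used by BLAST+ `-outfmt 6` (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The rows, the subject names `TEMPLATE_A` to `TEMPLATE_C`, and the E-value cutoff are made up for illustration and are not results from the slides.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    identity: float
    length: int
    evalue: float
    bitscore: float


def parse_hits(table: str) -> list:
    """Parse 12-column tab-separated BLAST hits into Hit tuples."""
    hits = []
    for line in table.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11])))
    return hits


def candidate_templates(hits, min_identity=30.0, max_evalue=1e-5):
    """Keep hits above the identity threshold, best bit score first."""
    keep = [h for h in hits if h.identity >= min_identity and h.evalue <= max_evalue]
    return sorted(keep, key=lambda h: h.bitscore, reverse=True)


# Hypothetical rows, for illustration only.
rows = [
    ["query", "TEMPLATE_A", "45.2", "120", "60", "2", "1", "118", "5", "124", "1e-30", "110.0"],
    ["query", "TEMPLATE_B", "27.5", "115", "80", "3", "4", "117", "2", "116", "1e-08", "52.0"],
    ["query", "TEMPLATE_C", "62.0", "118", "45", "0", "1", "118", "1", "118", "1e-50", "180.0"],
]
table = "\n".join("\t".join(row) for row in rows)
for hit in candidate_templates(parse_hits(table)):
    print(hit.subject, hit.identity, hit.evalue)
```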

Page 25: Computational Analysis of Proteins

Method 1: EsyPred3D server - submit the sequence and PDB ID

Paste sequence only

Type the PDB ID

Click to submit

Page 26: Computational Analysis of Proteins

Receive the built structure by email in your inbox

Page 27: Computational Analysis of Proteins

Download the attached *.pdb file and save it

Page 28: Computational Analysis of Proteins


Open and visualize the *.pdb file in RasMol

Page 29: Computational Analysis of Proteins


Open and visualize the *.pdb file in RasMol

Page 30: Computational Analysis of Proteins

Method 2: SWISS-MODEL server

Click for modeling

Page 31: Computational Analysis of Proteins

Submit sequence only, in FASTA format (without PDB ID)

Similarity search (BlastP) will be done by SWISS-MODEL server

Paste sequence

Click to submit

Page 32: Computational Analysis of Proteins

Receive the built structure by email in your inbox

Page 33: Computational Analysis of Proteins


The links in the email will lead to

Click to download 3D structure

Page 34: Computational Analysis of Proteins


Open and visualize the *.pdb file in RasMol

Page 35: Computational Analysis of Proteins

Structure retrieval from Protein 3D Structure Database – PDB…

Page 36: Computational Analysis of Proteins

Structure retrieval from Protein 3D Structure Database – PDB…

PDB ID

Click for protein details

491 sequences in SwissProt for «Keratin»

Page 37: Computational Analysis of Proteins

Structure retrieval from Protein 3D Structure Database – PDB…

Click for downloading structure

Page 38: Computational Analysis of Proteins

Structure retrieval from Protein 3D Structure Database – PDB…

Save the file and note its location

Page 39: Computational Analysis of Proteins


Open and visualize the *.pdb file in RasMol

Structure of 3EUU

Page 40: Computational Analysis of Proteins

Sequence data

MNRVDLSLFIPDSLTAETGDLKIKTYKVVLIARAASIFGVKRIVIYHDDADGEARFIRDILTYMDTPQYLRRKVFPIMRELKHVGILPPLRTPHHPTG

Structural data (in notepad)

Atom No.  Atom  Amino acid (AA)  AA No.  x         y       z
-------------------------------------------------------------------------------
1         N     PRO              98      8.824     17.273  88.787
2         CA    PRO              98      8.452     18.679  89.088
3         CD    PRO              98      9.692     16.763  89.899
4         CB    PRO              98      8.73      18.889  90.578
5         CG    PRO              98      9.172     17.521  91.124
6         C     PRO              98      9.482     19.367  88.271
7         O     PRO              98      10.515    19.739  88.825
8         N     ARG              99      9.263     19.522  86.956
9         CA    ARG              99      10.346    20.102  86.231
10        CB    ARG              99      11.564    19.174  86.276
11        CG    ARG              99      12.054    19.012  87.718
12        CD    ARG              99      10.944    18.606  88.698
...
6213      N     GLY              1078    -299.78   40.023  17.009
6214      CA    GLY              1078    -285.59   39.377  19.813
6215      C     GLY              1078    -267.82   38.403  22.744
6216      O     GLY              1078    -267.78   38.205  24.03
6217      N     ILE              1079    -255.59   37.727  24.695
6218      CA    ILE              1079    -241.59   37.013  27.144
6219      CB    ILE              1079    -241.06   35.864  27.728

The last three columns are the x, y, z coordinates.

Structural data (in RasMol)
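In an actual *.pdb file the same fields sit at fixed character columns of each ATOM line, so they can be read by slicing. The sketch below follows the standard PDB column layout; the helper `format_atom_line` exists only to build a correctly spaced example line, and the chain identifier `A` is an assumption because the table above does not show a chain.

```python
def parse_atom_line(line: str) -> dict:
    """Extract the main fields of a PDB ATOM record by column position."""
    return {
        "serial": int(line[6:11]),           # columns 7-11: atom number
        "atom": line[12:16].strip(),         # columns 13-16: atom name
        "residue": line[17:20].strip(),      # columns 18-20: amino acid
        "chain": line[21],                   # column 22: chain identifier
        "residue_number": int(line[22:26]),  # columns 23-26: amino acid number
        "x": float(line[30:38]),             # columns 31-38
        "y": float(line[38:46]),             # columns 39-46
        "z": float(line[46:54]),             # columns 47-54
    }


def format_atom_line(serial, atom, residue, chain, resnum, x, y, z) -> str:
    """Build a minimal ATOM line (illustrative helper, coordinates only)."""
    atom_field = f" {atom:<3}" if len(atom) < 4 else atom
    return (f"ATOM  {serial:>5} {atom_field} {residue:>3} {chain}{resnum:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# First row of the table above; chain "A" is assumed.
line = format_atom_line(1, "N", "PRO", "A", 98, 8.824, 17.273, 88.787)
print(line)
print(parse_atom_line(line))
```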

Page 41: Computational Analysis of Proteins


Built model validation by ProQ server

Click for uploading structure

Page 42: Computational Analysis of Proteins


Built model validation by ProQ server

Click & upload the structure

Page 43: Computational Analysis of Proteins


Built model validation by ProQ server

Submit after uploading

Page 44: Computational Analysis of Proteins


Built model validation by ProQ server result

Page 45: Computational Analysis of Proteins


Built model validation by Ramachandran Plot

Click & upload the structure

Page 46: Computational Analysis of Proteins

Built model validation by Ramachandran Plot…

Submit after uploading

Page 47: Computational Analysis of Proteins


Built model validation by Ramachandran Plot…. RESULTS

G.N.Ramachandran
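A Ramachandran plot shows the backbone dihedral angles phi and psi of each residue. As a rough sketch of what the validation server computes internally, the function below calculates one dihedral angle from four atom positions using the standard atan2 formula. For phi the four atoms are C(i-1), N(i), CA(i), C(i), and for psi they are N(i), CA(i), C(i), N(i+1). The function names and the test points are illustrative.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4) -> float:
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Simple geometric checks around the z axis (not real protein coordinates).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```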

Page 48: Computational Analysis of Proteins

3D structure modeling and validation

Ref: K. Sivakumar, S. Balaji, Ganga Radhakrishnan, Journal of Chemical Sciences, 119 (5), 571-579 (2007)

Page 49: Computational Analysis of Proteins

Disulphide bridges in 3D structure of Q01758

• Backbone of Q01758 (rainbow smelt fish)

• 10 Cysteines - ball and stick

• 10 Sulphur in Cysteines and 5 SS bonds (dotted lines)

Page 50: Computational Analysis of Proteins

Disulphide bridges in 3D structure of P05140

• Ribbon model of P05140 (sea raven)

• 10 Cysteines - ball and stick

• 10 Sulphur in Cysteines and 5 SS bonds (dotted lines)
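Disulphide bridges like these can be located in a model by measuring distances between cysteine sulphur (SG) atoms, since an S–S bond is about 2.05 Å long. The following is a minimal sketch; the 2.5 Å cutoff, the residue labels and the coordinates are assumptions made up for illustration and are not taken from the Q01758 or P05140 models.

```python
import math
from itertools import combinations


def find_disulfides(sg_atoms: dict, max_distance: float = 2.5) -> list:
    """Return (residue_a, residue_b, distance) for SG pairs within the cutoff."""
    bonds = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sg_atoms.items(), 2):
        d = math.dist(xyz_a, xyz_b)  # Euclidean distance in angstroms
        if d <= max_distance:
            bonds.append((res_a, res_b, round(d, 2)))
    return bonds


# Hypothetical SG coordinates (angstroms), for illustration only.
sg_atoms = {
    "Cys10": (0.0, 0.0, 0.0),
    "Cys50": (2.04, 0.0, 0.0),
    "Cys22": (10.0, 0.0, 0.0),
    "Cys70": (10.0, 2.0, 0.3),
}
for bond in find_disulfides(sg_atoms):
    print(bond)
```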

Page 51: Computational Analysis of Proteins

Secondary structure prediction from modeled 3D structure

Structures shown: Q01758, P05140

Legend: α-helices, beta strand, coil

Page 52: Computational Analysis of Proteins

Finding cavities in the built model using CASTp server

Click for calculation

Page 53: Computational Analysis of Proteins

Finding cavities in the built model using CASTp server

Click, upload & Submit the structure

Page 54: Computational Analysis of Proteins

Finding cavities in the built model using CASTp server - RESULTS

Page 55: Computational Analysis of Proteins


For literature

Page 56: Computational Analysis of Proteins


Page 57: Computational Analysis of Proteins
Page 58: Computational Analysis of Proteins


Page 59: Computational Analysis of Proteins

EXERCISE

• Download the sequence file for any one of the following proteins from Swissprot/Protein Information Resource/Protein Research Foundation:

Antifreeze

Vascular Endothelial growth factor protein

Keratin

• Generate at least 3 homology models using the EsyPred server or the SWISS-MODEL server (i.e., using different PDB structures)

• Visualize the structures using the RasMol tool

• Compare and evaluate the modelled 3D structures using the RamPage, ProQ and Combinatorial Extension servers.

Results table columns: Target sequence code; Template (PDB) codes; RamPage: percentage of residues in favoured region; ProQ: LG score and MaxSub

Page 60: Computational Analysis of Proteins

EXERCISE…

• Generate the report as an MS-Word file and submit it to [email protected]

• Repeat the exercise for other protein sequences of your choice

Page 61: Computational Analysis of Proteins

Thank you all!
