
Post on 13-Feb-2016

Page 1: Bioinformatics and Protein Structural Analysis

Bioinformatics and Protein Structural Analysis

Surabhi Agarwal

The molecular structures of proteins are complex and can be defined at various levels. These structures can also be predicted from their amino-acid sequences. Protein structure prediction is one of the most widespread fields of research in bioinformatics.

Page 2: Bioinformatics and Protein Structural Analysis

Master Layout (Part 1)

This animation consists of 2 parts:

Part 1: Protein Structural Databases

Part 2: Uses of Structural databases

Different types of data and the organization of data in a Structural Database

Search the Database for Protein Structures

Page 3: Bioinformatics and Protein Structural Analysis

Definitions of the components: Part 1 – Protein structural databases

1. Query Peptide: The unknown protein or peptide whose sequence is first determined, and with which further analysis is performed. This protein sequence is compared with other known protein sequences in existing databases.

2. Protein sequence: The linear chain or sequence of amino acids, which form the structural unit of a protein, is known as the protein sequence. This sequence is unique to each protein and is also known as the primary structure of the protein.

3. Sequence similarity: The process by which the amino acid sequences of two proteins are aligned linearly to evaluate their similarities.

4. 3-D structural alignment: The three dimensional structural alignment is the process of superimposing two given protein structures. This can be achieved with suitable software by entering protein identifiers or their atomic coordinates.

Page 4: Bioinformatics and Protein Structural Analysis

Definitions of the components: Part 1 – Protein structural databases

5. Geometry of Protein Structure: Geometry of a protein structure refers to the three dimensional coordinates of its atoms and the angles between their bonds. These are essential to simulate the protein structure on computers.

6. Biology of Protein Structure: Information regarding the biological source of the protein and its metabolic roles within the cell and organism is referred to as the biology of protein structure.

7. SCOP classification: SCOP stands for “Structural Classification of Proteins” and aims to provide a detailed description of the various structural and evolutionary relationships between all proteins that have been structurally characterized. SCOP Classification can be done at four levels - Class, Fold, Superfamily and Family.

8. CATH classification: CATH stands for “Class Architecture Topology and Homologous Superfamily” and provides a semi-automatic, hierarchical classification of protein domains. The levels for CATH classification are Class, Architecture, Topology and Homologous Superfamily.

Page 5: Bioinformatics and Protein Structural Analysis

Step 1: Protein Structure Database: Search

Protein Structural Database

Enter Protein ID or text query: Capsid

Optional Inputs

Structure Features: Macromolecule type, Number of Chains, Number of models, Molecular Weight, Secondary Structure Content, Secondary Structure Length, SCOP classification, CATH classification

Biology: Source Organism, Expression Organism, Enzyme Classification, Biological Process, Cellular component

Experiment: Experimental method, Resolution, Crystal Properties, Detectors used, Experimental Data Available

Sequence Features: Sequence, Translated Nucleotide Sequence, Sequence Length, Sequence Motif

Options selected in this example:

Number of Chains: 10

Source Organism: Retro Transcribing Viruses

Experimental method: X-RAY CRYSTALLOGRAPHY

Sequence Length: < 500

Search

http://www.pdb.org/pdb/search/advSearch.do

Page 6: Bioinformatics and Protein Structural Analysis

Step 1: Protein Structure Database: Search

Action: Schematic for Database functioning

Description of the action: Follow the steps as shown in the animations. First show the basic layout of the database. Then input the text “Capsid” in the text box at the top of the page. For each of the 4 categories, when the drop-down link is clicked, announce the options as the mouse hovers over them. The drop-down link in the animation should look like the drop-down links on web pages. Re-create all images.

Audio Narration: The protein structural databases contain a basic search box that takes an identifier of the protein as input. This identifier can be the protein name, a keyword, an ID, an author, etc. In this example, we take the case of viral capsid proteins. These databases also have advanced search features, which are optional but help make the query very specific. The general options can be grouped into 4 broad classes: Structural Features, Biology, Sequence Data and Experimental Details.

http://www.pdb.org/pdb/search/advSearch.do
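The idea of combining a text query with optional filters can be sketched in a few lines of Python. This is only an in-memory illustration, not the database's actual search interface: the `StructureEntry` record, the `search` function and the two entries with IDs `HYP1` and `HYP2` are hypothetical, and only the 1AUM title, method and length are taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class StructureEntry:
    """Hypothetical, simplified record for one database entry."""
    pdb_id: str
    title: str
    method: str
    sequence_length: int


def search(entries, text, method=None, max_length=None):
    """Return entries whose title contains `text` and that pass the optional filters."""
    hits = []
    for entry in entries:
        if text.lower() not in entry.title.lower():
            continue  # text query does not match
        if method is not None and entry.method != method:
            continue  # optional "Experimental method" filter
        if max_length is not None and entry.sequence_length >= max_length:
            continue  # optional "Sequence Length < max_length" filter
        hits.append(entry)
    return hits


entries = [
    StructureEntry("1AUM", "HIV CAPSID C-TERMINAL DOMAIN (CAC146)", "X-RAY CRYSTALLOGRAPHY", 70),
    StructureEntry("HYP1", "HYPOTHETICAL CAPSID MODEL", "SOLUTION NMR", 120),   # made-up entry
    StructureEntry("HYP2", "HYPOTHETICAL KINASE", "X-RAY CRYSTALLOGRAPHY", 300),  # made-up entry
]

hits = search(entries, "capsid", method="X-RAY CRYSTALLOGRAPHY", max_length=500)
print([entry.pdb_id for entry in hits])
```

Leaving a filter as `None` mirrors leaving an optional input empty: only the text query is then applied.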

Page 7: Bioinformatics and Protein Structural Analysis

Step 2.a: Protein Structure database: Output

Action: Schematic for Database functioning

Description of the action: Follow the steps as shown in the animations. Re-create all images. Show “67” displayed next to the tab titled “Number of Hits”. Then show the figure under the 2nd horizontal line. Show a clicking effect on the 1st point. This slide and the 8 that follow it are part of the same animated webpage.

Audio Narration: The search for the query protein returned 67 structures in the database that match the criteria given by the user in the search options. The first page of the results shows the titles of all the hits. The user then needs to select the protein structure of interest to study in detail. Here we select the structure titled “HIV CAPSID C-TERMINAL DOMAIN (CAC146)” for further study.

Protein Structural Database

Number of Hits: 67

1. HIV CAPSID C-TERMINAL DOMAIN (CAC146)

2. X-RAY CRYSTAL STRUCTURE OF EQUINE INFECTIOUS ANEMIA VIRUS (EIAV) CAPSID PROTEIN P26

3. ROUS SARCOMA VIRUS CAPSID PROTEIN: N-TERMINAL DOMAIN

4. STRUCTURE OF HIV1 PROTEASE AND AKC4P_133A COMPLEX.

Showing 1 to 4 of 67 Next

http://www.pdb.org/pdb/explore/explore.do?structureId=1AUM

Page 8: Bioinformatics and Protein Structural Analysis

Step 2.b - Protein Structure database: Output

Action: Schematic for Database functioning

Description of the action: Follow the steps as shown in the animations. Re-create all images. This slide and the 7 slides that follow it are part of the same webpage. The mouse pointer should be shown clicking on each of the 8 tabs one by one, and the text below changes accordingly. Always highlight the active tab with a different color, as done on websites. As each of the four headings is narrated in the audio narration, that particular text must be highlighted in the animation.

Audio Narration: The summary page shows all the general information pertaining to the basic features of the protein. This includes:

1. Protein Identifier

2. Molecule name, structure weight, polymer type, number of chains, length of the molecule and its classification

3. Source organism and Expression organism

4. Journal, paper and author name

Protein Structural Database

Tabs: Summary | Sequence data | Sequence similarity | 3D similarity | Biology | Methods | Geometry | Derived data

1. 1AUM

2. Molecule: HIV CAPSID; Structure Weight: 7970.16; Type: polypeptide(L); Chains: A; Length: 70; Classification: Viral Protein

3. Scientific Name: Human immunodeficiency virus 1; Expression System: Escherichia coli bl21(de3)

4. “Structure of the carboxyl-terminal dimerization domain of the HIV-1 capsid protein”, Science, 1997

http://www.pdb.org/pdb/explore/explore.do?structureId=1AUM

Page 9: Bioinformatics and Protein Structural Analysis

Step 2.c - Protein Structure database: Output

Action: Schematic for Database functioning

Description of the action: Follow the steps as shown in the animations. Re-create all images. This is a follow-up slide to slide #8, as described there.

Audio Narration: The sequence data tab contains all the information related to the amino acid sequence of the protein under consideration:

1. FASTA sequence for all chains in the polypeptide

2. Type of chain, such as polypeptide, glyco-peptide, lipo-peptide, etc.

3. Diagrammatic representation of the classification and secondary structure of this chain, assigning residues to helix, sheet or turn

Protein Structural Database

Tabs: Summary | Sequence data | Sequence similarity | 3D similarity | Biology | Methods | Geometry | Derived data

1. FASTA

>1AUM:A|PDBID|CHAIN|SEQUENCE
LDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPGATLEEMMTACQG

2. Chain Type: polypeptide(L)

3. Diagram labels: Sequence of Amino Acid Residues and their positions; Cysteine Residues; Di-sulphide bridge; Domain of the protein; Alpha Helix; Hydrogen Bonded Turn; No assigned secondary structure

http://www.pdb.org/pdb/explore/explore.do?structureId=1AUM
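As a rough illustration of how the FASTA record on this slide can be handled programmatically, the sketch below splits the header from the sequence and locates the cysteine residues highlighted in the diagram. The function name `parse_fasta` is an assumption made for this example, and the reported positions are counted from the start of this record, not from the numbering of the full-length protein.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


fasta_text = """>1AUM:A|PDBID|CHAIN|SEQUENCE
LDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPGATLEEMMTACQG
"""

header, sequence = parse_fasta(fasta_text)[0]

# 1-based positions of cysteine (C) within this record.
cysteine_positions = [i + 1 for i, residue in enumerate(sequence) if residue == "C"]

print(header)
print(len(sequence), cysteine_positions)
```

The sequence length agrees with the "Length: 70" value shown on the summary tab.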

Page 10: Bioinformatics and Protein Structural Analysis

Step 2.d - Protein Structure database: Output

Action: Schematic for Database functioning

Description of the action: Follow the steps as shown in the animations. Re-create all images. This is a follow-up slide to slide #8, as described there.

Audio Narration: The sequence similarity tab shows the information related to comparative studies of sequences.

1. Option to perform a BLAST search.

2. A list of clusters of proteins is produced. These clusters are formed and ranked based on the resolution of the structures within them. The better the quality (resolution) of the cluster, the higher it is ranked. When the user clicks on a particular cluster, the component proteins within the cluster are displayed along with supporting information.

Protein Structural Database

Tabs: Summary | Sequence data | Sequence similarity | 3D similarity | Biology | Methods | Geometry | Derived data

BLAST: Perform BLAST of the sequence of the retrieved protein

Table for cluster of similar proteins where the structure has been determined:

Cluster Similarity Cut-off 100%: Rank 1

Cluster Similarity Cut-off 95%: Rank 3

PDB ID and Name of the Protein:

1A8O: HIV CAPSID

2ONT: Capsid protein p24

1AUM: HIV CAPSID

http://www.pdb.org/pdb/explore/explore.do?structureId=1AUM

Page 11: Bioinformatics and Protein Structural Analysis

Step 2.e - Protein Structure database: Output

Action: Schematic for Database functioning

Description of the action: Follow the steps as shown in the animations. Re-create all images. This is a follow-up slide to slide #8, as described there.

Audio Narration: The structural similarity tab shows the information related to comparative studies of the two structures. It establishes equivalences based on the 3D conformations of both proteins. The default visualization tool for PDB is Jmol. Structural alignment is covered in more detail in the second part of this animation.

Protein Structural Database

Tabs: Summary | Sequence data | Sequence similarity | 3D similarity | Biology | Methods | Geometry | Derived data

HIV capsid alignment with GAG polyprotein

HIV CAPSID (colored orange)

GAG POLYPROTEIN (colored blue)

http://www.pdb.org/pdb/workbench/showPrecalcAlignment.do?action=pw_fatcat&mol=1A8O.A&mol=1BAJ.A

Page 12: Bioinformatics and Protein Structural Analysis

Step 2.f - Protein Structure database: Output

Action: Schematic for Database functioning

Description of the action: All tables have to be re-drawn by the animator. Follow the steps as shown in the animations. This is a follow-up slide to slide #8, as described there.

Audio Narration: This tab provides details of the methodology used in conducting the experiments. This includes:

1. Crystallization methods, pH, temperature, and other details of the experiment

2. Crystal data (space group, unit cell dimensions)

3. Diffraction source, diffraction protocol and diffraction detectors

4. Data related to resolution and refinement details

5. Software, programs and computing utilized

A brief summary of this result is shown in this animation. For details visit http://www.pdb.org/pdb/explore/materialsAndMethods.do?structureId=1AUM#

Protein Structural Database

Tabs: Summary | Sequence data | Sequence similarity | 3D similarity | Biology | Methods | Geometry | Derived data

Crystallization Experiments

Method: vapor diffusion - sitting drop

pH: 8

Space Group Name: I 41

Diffraction Detector: CCD

Computing

Data Reduction (intensity integration): DENZO

Data Reduction (data scaling): SCALEPACK

Structure Solution: X-PLOR 3.843

Structure Refinement: X-PLOR 3.843

http://www.pdb.org/pdb/explore/geometryDisplay.do?structureId=1AUM

Page 13: Bioinformatics and Protein Structural Analysis

Step 2.g - Protein Structure database: Output

Action: Schematic for Database functioning

Description of the action: All tables have to be re-drawn by the animator. Follow the steps as shown in the animations. This is a follow-up slide to slide #8, as described there.

Audio Narration: The Geometry tab contains all the spatial information about the molecule, so that it can be simulated in a virtual environment. This includes:

Bond lengths: number of occurrences and their positions in the chains

Bond angles: number of occurrences and their positions in the chains

Dihedral angles: number of occurrences and their positions in the chains

Ramachandran plot, Fold Deviation Scores and other structural details

Protein Structural Database

Tabs: Summary | Sequence data | Sequence similarity | 3D similarity | Biology | Methods | Geometry | Derived data

Bond lengths: the position, total number and range of the covalent bond lengths between two adjacent atoms in a protein molecule.

Bond angles: the angle formed by 3 consecutive atoms in the native conformation of a protein, and their statistics.

Dihedral angles: the angle between the 2 planes defined by 4 consecutively bonded atoms, with their occurrence, positions and other statistics.

Ramachandran Map: shows the residues that lie in the favored region (outlined in dark blue) and the permitted region (outlined in light blue). 67/68 residues lie in the favored region and none of the residues lie in the dis-allowed region.

Values for Fold Deviation Score (FDS). For a specific reference value, FDS is a multiple of the standard deviation.

Residue: Value

LEU1: 1.29

ASP2: 0.56

ILE3: 1.19

ARG4: 1.73

GLN5: 1.29

GLY6: 1.85

PRO7: 0.65

LYS8: 0.73

GLU9: 1.27

PRO10: 1.53

PHE11: 0.41

Plot for Fold Deviation Score: the x-axis has the residue positions and the y-axis has the FDS values.

http://www.pdb.org/pdb/explore/geometryDisplay.do?structureId=1AUM
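To make the geometric quantities on this slide concrete, the following sketch computes a bond length, a bond angle and a dihedral angle from plain 3D coordinates. It is a generic vector-algebra illustration and not the database's own routine; the function names and the toy coordinates are assumptions chosen for this example.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_length(p0, p1):
    """Distance between two bonded atoms."""
    return math.dist(p0, p1)


def bond_angle(p0, p1, p2):
    """Angle in degrees at p1 formed by three consecutive atoms."""
    v, w = sub(p0, p1), sub(p2, p1)
    cos_angle = dot(v, w) / (math.sqrt(dot(v, v)) * math.sqrt(dot(w, w)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def dihedral_angle(p0, p1, p2, p3):
    """Angle in degrees between the planes (p0, p1, p2) and (p1, p2, p3)."""
    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


# Toy coordinates: four atoms forming a planar "trans" arrangement.
atoms = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)]
print(bond_length(atoms[0], atoms[1]))
print(bond_angle(*atoms[:3]))
print(dihedral_angle(*atoms))
```

In a protein backbone, the phi and psi angles plotted on a Ramachandran map are dihedral angles of this kind, each computed from four consecutive backbone atoms.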

Page 14: Bioinformatics and Protein Structural Analysis

Step 2.h - Protein Structure database: Output

Action: Schematic for Database functioning

Description of the action: Follow the steps as shown in the animations. Re-create all images. This is a follow-up slide to slide #8, as described there.

Audio Narration: The biology tab contains information about the significance of the molecule at the biological and cellular level. This includes:

1. Molecule type

2. Formula weight

3. Monomers and linkages

4. Source method

5. Ligands and prosthetic groups

6. Gene detail and genome information

7. Keywords

Protein Structural Database

Tabs: Summary | Sequence data | Sequence similarity | 3D similarity | Biology | Methods | Geometry | Derived data

Protein Details

Description: HIV CAPSID

Fragment: C-TERMINAL DOMAIN, RESIDUES 146 - 231

Nonstandard Linkage: no

Nonstandard Monomers: no

Polymer Type: polypeptide(L)

Formula Weight: 7970.2

Source Method: genetically manipulated

Entity Name: CAC146

SWS/UNP ID: POL_HV1N5

SWS/UNP Accession(s): P12497

Gene Details

Scientific Name: Human immunodeficiency virus 1

Genus: Lentivirus

Cell Line: Bl21

Host Scientific Name: Escherichia coli bl21(de3)

Host Genus: Escherichia

Host Species: Escherichia Coli

Host Strain: Bl21 (de3)

Host Vector: Pet11a

Host Plasmid Name: WISP97-7

http://www.pdb.org/pdb/explore/geometryDisplay.do?structureId=1AUM

Page 15: Bioinformatics and Protein Structural Analysis

Step 2.i - Protein Structure database: Output

Action: Schematic for Database functioning

Description of the action: Follow the steps as shown in the animations. Re-create all images. This is a follow-up slide to slide #8, as described there.

Audio Narration: Data for the same protein from other resources, such as SCOP, CATH and PFAM classification details, are provided in the derived data tab. For more detailed analysis visit http://www.pdb.org/pdb/explore/derivedData.do?structureId=1AUM

Protein Structural Database

Tabs: Summary | Sequence data | Sequence similarity | 3D similarity | Biology | Methods | Geometry | Derived data

SCOP classification

Domain Info: d1auma_

Class: All alpha proteins

Fold: Acyl carrier protein

Super-Family: Retrovirus capsid dimerization domain-like

Family: Retrovirus capsid protein C-terminal domain

Domain: HIV capsid protein, dimerisation domain

Species: Human immunodeficiency virus type 1 [TaxId: 11676]

CATH classification

Domain: 1aumA00

Class: Mainly Alpha

Architecture: Orthogonal Bundle

Topology: Non-ribosomal Peptide Synthetase Peptidyl Carrier Protein; Chain A

PFAM classification

Chain: A

PFAM Accession: PF00607

PFAM ID: Gag_p24

Description: gag gene protein p24 (core nucleocapsid protein)

Type: Family

http://www.pdb.org/pdb/explore/geometryDisplay.do?structureId=1AUM
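Because SCOP and CATH are hierarchical, a classification can be held as an ordered set of levels, and two domains can be compared by how many leading levels they share. The sketch below uses the 1AUM values from this slide; the helper names `lineage` and `shared_depth` and the second domain (`other_domain`) are hypothetical and exist only to illustrate the comparison.

```python
SCOP_LEVELS = ("Class", "Fold", "Superfamily", "Family")
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous Superfamily")

# Values as shown on the slide for 1AUM.
scop_1aum = {
    "Class": "All alpha proteins",
    "Fold": "Acyl carrier protein",
    "Superfamily": "Retrovirus capsid dimerization domain-like",
    "Family": "Retrovirus capsid protein C-terminal domain",
}
cath_1aum = {
    "Class": "Mainly Alpha",
    "Architecture": "Orthogonal Bundle",
    "Topology": "Non-ribosomal Peptide Synthetase Peptidyl Carrier Protein; Chain A",
    # The slide does not show the Homologous Superfamily level, so it is left out.
}


def lineage(classification, levels):
    """Join the available levels from the broadest to the most specific."""
    return " > ".join(classification[level] for level in levels if level in classification)


def shared_depth(a, b, levels):
    """Count how many leading levels two classifications have in common."""
    depth = 0
    for level in levels:
        if level not in a or level not in b or a[level] != b[level]:
            break
        depth += 1
    return depth


# Made-up domain that shares only Class and Fold with 1AUM.
other_domain = {
    "Class": "All alpha proteins",
    "Fold": "Acyl carrier protein",
    "Superfamily": "Hypothetical superfamily",
    "Family": "Hypothetical family",
}

print(lineage(scop_1aum, SCOP_LEVELS))
print(shared_depth(scop_1aum, other_domain, SCOP_LEVELS))
```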

Page 16: Bioinformatics and Protein Structural Analysis

Master Layout (Part 2)

This animation consists of 2 parts:

Part 1: Protein Structural Databases

Part 2: Uses of Structural databases

Protein Structural Alignment

Secondary Structure Prediction

Functional Annotation

Page 17: Bioinformatics and Protein Structural Analysis

Definitions of the components: Part 2 – Uses of structural databases

1. Protein Structural Alignment: The geometry of two given protein structures can be compared by means of available software tools that analyse their three dimensional similarity to each other.

2. Protein Structure Prediction: The prospective secondary structures of peptides or proteins can be predicted from a given stretch of amino acid residues by using machine learning algorithms.

3. Machine Learning Algorithms: These are computer algorithms that can be trained on a given classified dataset. During training, these programs adjust their parameters in such a way that they can classify new data. The most widely used Machine Learning Algorithms in Bioinformatics are Artificial Neural Networks, Hidden Markov Models, Support Vector Machines, etc.

4. Functional Annotation: For novel proteins that are yet to be characterized, the potential functions can be predicted by techniques such as Homology Modelling, which provide an initial insight into the protein’s properties.

Page 18: Bioinformatics and Protein Structural Analysis

Definitions of the components: Part 2 – Uses of structural databases

5. Gene Ontology: Also known as GO terms, they are identifiers that represent a gene’s functional properties, categorized to cover three domains, namely “cellular component”, “molecular function” and “biological process”.

6. Root Mean Square Deviation (RMSD): Quantification of the average distance between the atoms of the super-imposed proteins. The higher the RMSD value, the lower the similarity.

7. Protein Structural Alignment Server: Web based servers which help in determining the structural similarity of two given proteins by superimposing the two proteins and calculating various comparative parameters. Currently there are a large number of web based servers for this task. A few examples include DALI (Distance Matrix Alignment), MAMMOTH (Matching Molecular Models Obtained from Theory), CE/CE-MC (Combinatorial Extension -- Monte Carlo), SSAP (Sequential Structure Alignment Program), ProFit (Protein least-squares Fitting), etc.
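The RMSD defined above can be written out as a short calculation: square the distance between each pair of equivalent atoms, average the squares, and take the square root. The sketch below assumes the two structures have already been superimposed and that atoms are paired by list position; it does not search for the optimal superposition, which alignment servers do before reporting an RMSD. The function name `rmsd` and the coordinates are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: the second structure is the first one shifted by 2 units along z.
structure_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_b = [(x, y, z + 2.0) for x, y, z in structure_a]

print(rmsd(structure_a, structure_a))
print(rmsd(structure_a, structure_b))
```

Identical structures give an RMSD of zero, and the value grows as equivalent atoms move further apart, which is why a higher RMSD indicates lower similarity.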

Page 19: Bioinformatics and Protein Structural Analysis

Step 1: Structure Alignment - Input

Action: Web-Tool functioning

Description of the action: Follow the steps as shown in the animations. Re-create all images. Enter the 2 IDs in the text boxes. Follow this with a clicking effect on the “Submit” button. Show the action-in-progress effect as shown in the slide. Follow it with the two simple structures getting superimposed, and highlight the non-aligned areas. Follow this with the actual output in the next slide.

Audio Narration: Two given proteins can be structurally aligned to evaluate the similarity between them. The server requires as input two protein structures or their IDs, which are then simulated and aligned based on their 3D coordinates, bond angles and dihedral angles. A few of the servers available for this are DALI, MAMMOTH, CE/CE-MC, SSAP and ProFit.

Protein Structural Alignment Server (DALI)

Enter the first PDB ID and Chain (or Upload a Protein Structure): 1A8O

Enter the second PDB ID and Chain (or Upload a Protein Structure): 1BAJ

Submit

Running the Server…

3D Superimposition

Non-aligned regions on super-imposed structures

Page 20: Bioinformatics and Protein Structural Analysis

Step 2: Structure Alignment - Output

Protein Structural Alignment Server (DALI)

1A8O 1BAJ

P-value: 0.00e+00

Score: 190.92

RMSD: 0.75

%Id: 94.0%

P-value: the probability of obtaining an alignment at least this good by chance. P-value < 0.05 indicates significant similarity.

Score: the raw score of the alignment, used to compare against other similarity matches with the same proteins.

RMSD: in super-imposed proteins, the average of the distances between the atoms.

%Id: the percentage of identical residues in the sequences of the alignment.

http://www.pdb.org/pdb/workbench/showPrecalcAlignment.do?action=pw_fatcat&mol=1A8O.A&mol=1BAJ.A

Page 21: Bioinformatics and Protein Structural Analysis

Step 2: Structure Alignment - Output

Action: Web-Tool functioning

Description of the action: Follow the steps as shown in the animations. Mention the definitions of the results in the audio narration as well as in written format. Re-create all images.

Audio Narration: The results are:

1. P-value: the probability of obtaining such a structural match by chance. A P-value < 0.05 indicates significant similarity.

2. Raw score: used to compare against other similarity matches with the same proteins.

3. RMSD: a measure of the average distance between the atoms of the super-imposed proteins.

4. Percentage sequence identity in the alignment.

http://www.pdb.org/pdb/workbench/showPrecalcAlignment.do?action=pw_fatcat&mol=1A8O.A&mol=1BAJ.A
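The percentage identity reported with an alignment can be illustrated with a small calculation over two aligned sequences. This is a sketch under stated assumptions: gaps are written as `-`, identity is counted over columns where both sequences have a residue, and the two short sequences are invented for the example. Servers may use a different denominator, so values can differ slightly between tools.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Invented aligned fragments: one gap column and one mismatch.
identity = percent_identity("ACDE-G", "ACDQWG")
print(identity)
```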

Page 22: Bioinformatics and Protein Structural Analysis

Step 3: Structure Prediction

Protein Structural Prediction Server

Enter the sequence of amino acids (primary structure of protein)

DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEVDVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLPKLDELRDEGKASSAKQRLKCASLQKFGERA

Predicted Secondary Structure: Alpha Helix, Beta Sheets, Coils

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html

Page 23: Bioinformatics and Protein Structural Analysis

Step 3: Structure Prediction

Action: Web-Tool functioning

Description of the action: Follow the steps as shown in the animations. Re-create all images.

Audio Narration: Once the amino acid sequence of the protein is known, its secondary and tertiary structures can be predicted using many prediction algorithms, which utilize information from previously characterized structures. In the secondary structure prediction:

1. “h” represents Alpha Helix

2. “e” represents Beta Sheets

3. “c” represents Coils

Since not all known proteins have been structurally characterized yet, this provides a useful bioinformatics analysis tool for researchers. Servers for structure prediction include GOR, HNN, PredictProtein, NNPredict and SSpro.

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html
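A predicted secondary structure is usually returned as a string with one letter per residue, using the “h”, “e” and “c” codes described above. The sketch below summarizes such a string into percentages and contiguous segments. The function names and the short prediction string are assumptions made for illustration; the string is not the output of any server for the sequence on the previous slide.

```python
from collections import Counter
from itertools import groupby

STATES = {"h": "Alpha Helix", "e": "Beta Sheet", "c": "Coil"}


def composition(prediction):
    """Return the percentage of residues assigned to each state."""
    if not prediction:
        raise ValueError("prediction string is empty")
    counts = Counter(prediction.lower())
    unknown = set(counts) - set(STATES)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    return {code: 100.0 * counts.get(code, 0) / len(prediction) for code in STATES}


def segments(prediction):
    """Return (state, start, end) for each run of identical codes, 1-based inclusive."""
    result, position = [], 1
    for code, run in groupby(prediction.lower()):
        length = len(list(run))
        result.append((code, position, position + length - 1))
        position += length
    return result


example = "cchhhheecc"  # invented prediction for a 10-residue peptide
print(composition(example))
print(segments(example))
```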

Page 24: Bioinformatics and Protein Structural Analysis

Step 4: Functional Annotation

Action: Web-Tool functioning

Description of the action: Follow the steps as shown in the animations. Re-create all images.

Audio Narration: Given a particular amino acid sequence, the cellular components, molecular functions and biological processes associated with the sequence can be predicted using functional annotation servers. These are represented by a unique set of identifiers called “Gene Ontology Terms” or “GO Terms”. A GO term can be a word or an alphanumeric identifier, and includes a definition with cited sources and a namespace indicating the domain to which it belongs. Servers for this include DbAli Annolite, PFP, ProteomeAnalyst, GOPET, SpearMint and ProKnow.

Protein Functional Annotation Server

Enter the sequence of amino acids (primary structure of protein)

DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEVDVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLPKLDELRDEGKASSAKQRLKCASLQKFGERA

Functional Prediction

Molecular Functions (Probability, GO Term, Description)

100%, GO0549, Vitamin D Binding

97%, GO0543, Water Binding

Biological Functions (Probability, GO Term, Description)

100%, GO0189, C21 Steroid Hormone Metabolism

97%, GO0243, Vitamin Transport

Cellular Component (Probability, GO Term, Description)

89%, GO0432, Membrane

74%, GO0, Intra-cellular organelle

http://www.pdb.org/pdb/explore/remediatedSequence.do?structureId=1AO6, http://kiharalab.org/web/pfp.php
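Functional annotation output of this kind is naturally a list of records, each with a namespace, a GO term, a description and a probability, which can then be filtered by a confidence threshold. The sketch below uses the rows from the slide's schematic table; the record type and function names are assumptions, the term identifiers are the placeholders shown on the slide rather than real GO accessions, and the row whose identifier is incomplete on the slide is left out.

```python
from typing import NamedTuple


class GoPrediction(NamedTuple):
    namespace: str       # one of the three GO domains
    term_id: str         # placeholder identifier as shown on the slide
    description: str
    probability: float   # fraction between 0 and 1


predictions = [
    GoPrediction("molecular_function", "GO0549", "Vitamin D Binding", 1.00),
    GoPrediction("molecular_function", "GO0543", "Water Binding", 0.97),
    GoPrediction("biological_process", "GO0189", "C21 Steroid Hormone Metabolism", 1.00),
    GoPrediction("biological_process", "GO0243", "Vitamin Transport", 0.97),
    GoPrediction("cellular_component", "GO0432", "Membrane", 0.89),
]


def confident(preds, threshold):
    """Keep predictions at or above the threshold, most probable first."""
    kept = [p for p in preds if p.probability >= threshold]
    return sorted(kept, key=lambda p: -p.probability)


def by_namespace(preds):
    """Group predictions by GO domain."""
    groups = {}
    for p in preds:
        groups.setdefault(p.namespace, []).append(p)
    return groups


for p in confident(predictions, 0.95):
    print(f"{p.probability:.0%}  {p.term_id}  {p.description}")
```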

Page 25: Bioinformatics and Protein Structural Analysis

Interactivity option 1: Predict the 3 Dimensional Structure of Human Serum Albumin and cross-validate

Boundary/limits   Interactivity Type   Options   Results

Steps, numbered in the correct order:

1. Input the term “human serum albumin” in a structural database

2. Click on the hit which matches your query

3. Go to the “sequence details” tab and retrieve the FASTA sequence of the protein

4. Go to the 3D structure details and save the actual coordinates and the 3D structure of the protein, derived from experimental details

5. Predict the tertiary structure from the amino-acid sequence and save the predicted structure coordinates

6. Select a structural alignment tool and superimpose the predicted structure on the actual structure derived from the database

7. Check the quality of the alignment. If the RMSD value is low, then the structural alignment is good, which suggests that the structure prediction was accurate

Arrange the steps in the order to be performed. Remove the step number from the bottom of the tab.

Remove the step number mentioned in the tabs in “yellow” color. Show all the steps in mixed order. The user must click on the tabs in order. If the user clicks on a tab which is not in the right order, then flash a message saying “try again”.

All the tabs must be arranged in the right order.
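The click-in-order behaviour described for this interactivity can be sketched as a small state machine that remembers which step is expected next. The class name, method name and return strings below are assumptions for illustration only; the slide itself specifies just the “try again” message.

```python
class StepOrderQuiz:
    """Accept steps only in the expected order and report the outcome of each click."""

    def __init__(self, correct_order):
        self.correct_order = list(correct_order)
        self.position = 0  # index of the step expected next

    def click(self, step):
        if self.position >= len(self.correct_order):
            return "complete"
        if step != self.correct_order[self.position]:
            return "try again"  # wrong tab: the expected step does not advance
        self.position += 1
        if self.position == len(self.correct_order):
            return "complete"
        return "ok"


quiz = StepOrderQuiz(range(1, 8))  # the seven steps listed on this slide
responses = [quiz.click(step) for step in (1, 3, 2, 3, 4, 5, 6, 7)]
print(responses)
```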

Page 26: Bioinformatics and Protein Structural Analysis

Interactivity option 2.a - True/False - Questions

Interactivity Type: True or False

Options: Flash the questions one at a time. The user needs to press either the “Green tab” marked “TRUE” or the “Red tab” marked “FALSE”. If the answer is correct, flash “Tick”. If the answer is incorrect, flash “Cross”. For all questions whose answer is “False”, also show the correct answer as given in the next slide.

Results: Next Slide

GO stands for “Genetic Oncology”

DALI is a server for Protein Structural Alignment

SCOP is a classification scheme for Nucleic Acids

p-value is one of the result from Structural Alignment

In protein secondary structure, “e” stands for coil

RMSD stands for “Root Mean Square Distance”

TRUE

FALSE

Page 27: Bioinformatics and Protein Structural Analysis

Interactivity option 2.b - True/False - Correct Answers

Interactivity Type: True or False

Options: Flash the questions one at a time. The user needs to press either the “Green tab” marked “TRUE” or the “Red tab” marked “FALSE”. If the answer is correct, flash “Tick”. If the answer is incorrect, flash “Cross”.

Results: The questions are followed by their correct answers.

1. GO stands for “Genetic Oncology”: FALSE. GO stands for “Gene Ontology”.

2. DALI is a server for Protein Structural Alignment: TRUE.

3. SCOP is a classification scheme for Nucleic Acids: FALSE. SCOP is a classification scheme for Proteins.

4. p-value is one of the results from Structural Alignment: TRUE.

5. In protein secondary structure, “e” stands for coil: FALSE. “e” stands for beta sheets.

6. RMSD stands for “Root Mean Square Distance”: FALSE. RMSD stands for “Root Mean Square Deviation”.

Page 28: Bioinformatics and Protein Structural Analysis

Interactivity option 2.c - True/False - Example

Boundary/limits   Interactivity Type   Options   Results

Interactivity Type: True or False

Flash the questions one at a time. The user needs to press either the “Green tab” marked “TRUE” or the “Red tab” marked “FALSE”. If the answer is correct, flash “Tick”. If the answer is incorrect, flash “Cross” and the correct answer as mentioned in the next slide.

This is an example slide to show the various cases of answers.

GO stands for “Genetic Oncology”

TRUE

FALSE

The correct answer is “False”. GO stands for “Gene Ontology”

DALI is a server for Protein Structural Alignment

SCOP is a classification scheme for Nucleic Acids

SCOP is a classification scheme for Proteins

Page 29: Bioinformatics and Protein Structural Analysis

Questionnaire

1. Which is the server for Protein Structure Prediction?

Answers: a) ProtParam b) PeptideMass c) nnPREDICT d) DALI

2. Which is the server for Functional annotation of Proteins?

Answers: a) DALI b) GOR c) SSAP d) Proteome Analyst

3. Which amongst these is NOT the output for Functional annotation?

Answers: a) GO Term b) Source Organism c) Probability of annotation d) Description of Function

4. By default, PDB structures appear in which visualization tool?

Answers: a) VMD b) NAMD c) Jmol d) None of the above

5. PDB is primarily which Database?

Answers: a) Protein b) Nucleotide c) Gene d) None of the Above

Page 30: Bioinformatics and Protein Structural Analysis

Links for further reading: Reference websites

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html

http://cubic.bioc.columbia.edu/predictprotein/

http://ekhidna.biocenter.helsinki.fi/dali_lite/start

http://kiharalab.org/web/pfp.php

http://pa.cs.ualberta.ca:8080/pa/index.html

http://www.ebi.ac.uk/Tools/clustalw2/index.html

http://www.pdb.org/pdb/home/home.do

http://expasy.org/sprot/

http://expasy.org/prosite/

http://webdocs.cs.ualberta.ca/~bioinfo/PA/

Page 31: Bioinformatics and Protein Structural Analysis

Links for further reading

Following URLs are used for animations

http://www.pdb.org/pdb/search/advSearch.do

http://www.pdb.org/pdb/explore/explore.do?structureId=1AUM

http://www.pdb.org/pdb/workbench/showPrecalcAlignment.do?action=pw_fatcat&mol=1A8O.A&mol=1BAJ.A

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html

http://www.pdb.org/pdb/explore/remediatedSequence.do?structureId=1AO6

http://kiharalab.org/web/pfp.php

Page 32: Bioinformatics and Protein Structural Analysis

Links for further reading: Published Literature

SCOP: A Structural Classification of Proteins Database for the Investigation of Sequences and Structures. Alexey G. Murzin, Steven E. Brenner, Tim Hubbard and Cyrus Chothia. J. Mol. Biol. (1995) 247, 536–540

CATH - a hierarchic classification of protein domain structures. CA Orengo, AD Michie, S Jones, DT Jones, MB Swindells and JM Thornton. Structure 1997, Vol 5 No 8

Books:

Bioinformatics: Sequence and Genome Analysis by David Mount