

INFORMATION SYSTEM IN BIOLOGY

HOMOLOGY MODELING IN BIOLOGY AND MEDICINE

Virginie Lafleur EPFL


01/12/2004 2

Plan

What is homology modeling: goals, biological background

How to construct a model: checks, steps

Homology modeling programs: quick presentation, Modeller


The goal of Homology Modeling

Use homologous sequences to build a model of a protein's 3D structure

Analyze the relationships between the sequence and the 3D structure of proteins

Know a protein's 3D structure to understand its interactions with other molecules

Support computer-aided drug design, mutagenesis and protein engineering


Homologous sequences

Figure: origins of homologous genes

Paralogy: gene A is duplicated within a genome (gene A, gene A'), and the copies then diverge through mutations

Orthology: after speciation, gene A is found in species 1 and in species 2

Xenology: a copy of gene A is transferred to the species from another organism


Levels of structure

Primary structure: the amino acid sequence, here ubiquitin (PDB 1ubq, 76 residues)

>1ubq_ mol:protein length:76 Ubiquitin
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG

Secondary structure

Tertiary structure

Quaternary structure: several subunits that are not part of the same polypeptide chain
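The ubiquitin entry above is written as a FASTA-style record: a `>` header line, then the one-letter amino acid sequence. As a minimal Python sketch (assuming the PDB-style `length:N` field shown in this header, which is not part of the general FASTA format), the record can be parsed and the declared length checked against the actual sequence:

```python
import re


def parse_fasta_record(text: str) -> tuple[str, int, str]:
    """Parse one FASTA record; return (identifier, declared length, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0]
    if not header.startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    identifier = header[1:].split()[0]
    # Assumption: the header carries a 'length:N' field, as in the PDB-style header above.
    match = re.search(r"length:\s*(\d+)", header)
    if match is None:
        raise ValueError("no length:N field in the header")
    sequence = "".join(lines[1:])  # the sequence may span several lines
    return identifier, int(match.group(1)), sequence


record = """>1ubq_ mol:protein length:76  Ubiquitin
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"""

ident, declared, seq = parse_fasta_record(record)
print(ident, declared, len(seq), declared == len(seq))
```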


Why is it important to determine protein structure?

A diversity of structures serves many functions:

Enzymatic activity

Storage

Transport

Immune response

Structure implies function, as a denaturation experiment shows:

Denaturation: the unfolded protein loses its activity

Refolding: the protein recovers its native structure and its activity


How is protein structure determined?

Experimentally: X-ray crystallography, NMR (nuclear magnetic resonance) spectroscopy

Today, the number of known sequences has exploded: we have the data

We need to construct 3D models

The idea: use a similar known structure to identify constraints and build the corresponding fold

Homology Modeling


Where to find the data?

Protein Data Bank (PDB): http://www.rcsb.org/pdb/

More than 10,000 protein structures

Each entry is a text file containing the coordinates of every heavy (non-hydrogen) atom, from the first residue to the last

ATOM      1  N   SER A   2      29.089   9.397  51.904  1.00 81.75
ATOM      2  CA  SER A   2      27.883  10.162  52.185  1.00 79.71
ATOM      3  C   SER A   2      26.659   9.634  51.463  1.00 82.64
ATOM      4  O   SER A   2      26.718   8.686  50.686  1.00 81.02
ATOM      5  CB  SER A   2      28.039  11.660  51.932  1.00 75.59
ATOM      6  OG  SER A   2      27.582  12.038  50.639  1.00 43.28
-------
ATOM   1737  CD1 ILE A 229      39.535  21.584  52.346  1.00 41.62
TER    1738      ILE A 229
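The ATOM records above use fixed columns, so they are best read by column position rather than by splitting on spaces (in real files, neighbouring fields can run together). A minimal Python sketch, assuming the column layout of the PDB coordinate format (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54, occupancy 55-60, B-factor 61-66); `Atom` and `parse_atoms` are hypothetical names, not part of any library:

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Read ATOM records using fixed PDB column positions (0-based slices)."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip TER, HETATM, header records, ...
        atoms.append(Atom(
            serial=int(line[6:11]),
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
            occupancy=float(line[54:60]),
            b_factor=float(line[60:66]),
        ))
    return atoms


SAMPLE = """\
ATOM      1  N   SER A   2      29.089   9.397  51.904  1.00 81.75
ATOM      2  CA  SER A   2      27.883  10.162  52.185  1.00 79.71
ATOM      3  C   SER A   2      26.659   9.634  51.463  1.00 82.64
ATOM      4  O   SER A   2      26.718   8.686  50.686  1.00 81.02
ATOM      5  CB  SER A   2      28.039  11.660  51.932  1.00 75.59
ATOM      6  OG  SER A   2      27.582  12.038  50.639  1.00 43.28
ATOM   1737  CD1 ILE A 229      39.535  21.584  52.346  1.00 41.62
TER    1738      ILE A 229
"""

atoms = parse_atoms(SAMPLE)
ca_atoms = [a for a in atoms if a.name == "CA"]  # the atoms of a Cα trace
print(len(atoms), ca_atoms[0].x, ca_atoms[0].y, ca_atoms[0].z)
```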


Visualizing the protein

The text file is practically impossible to read without a graphical viewer such as RASMOL (http://www.bernstein-plus-sons.com/software/rasmol). Different ways to visualize:

Coloring by structure

All-atom model, in ball-and-stick representation

Space-filling model

Cα trace


Structure and homologous sequences

With at least 30% identity between two sequences, a definite correlation exists between sequence and structure

In particular, homologous sequences show very similar structures, with strong conservation in secondary structural elements

Some folds are adopted by very different sequences, which conserve the structure of the active site

On the other hand, some proteins adopt very similar structures, with no obvious sequence similarity
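The 30% figure refers to sequence identity over an alignment. A minimal Python sketch of one common convention (identical positions divided by aligned positions where neither sequence has a gap); other tools divide by the shorter sequence length or by the full alignment length, so values are only comparable under the same convention. The aligned fragments are hypothetical:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Convention used here: identical positions / columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    paired = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not paired:
        return 0.0
    identical = sum(1 for a, b in paired if a == b)
    return 100.0 * identical / len(paired)


# Hypothetical aligned fragments, for illustration only.
pid = percent_identity("MQIFVKTLTG", "MQIFVRTL-G")
print(f"{pid:.1f}% identity")
```

Nine columns are gap-free and eight of them are identical (only K/R differs), giving about 88.9%.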


Why homology modeling?

Other ways to construct a 3D model: prediction methods

Ab initio

Threading

But: expensive in time and computation

The homology modeling solution: starting from a known 3D structure for each protein family,

construct the model from this known structure, the template structure


Before building a model…

We first consider the elements of sequence analysis that are essential for building a molecular model:

Multiple sequence alignment

Alignment checks

Protein domains, …


Some problems have to be solved

Homologous sequences are identified with database search methods (BLAST)

To build a model, we need an alignment of the complete protein sequences collected from the database searches

Identical residues must be lined up

The rest should be arranged based on:

observed substitutions in protein families

chemical similarity

charge similarity

Where none of the other 19 amino acids is a suitable match, the alignment skips that position, creating a ‘gap’ (insertion/deletion regions)

Multiple sequence alignment (MSA) programs: CLUSTALW/CLUSTALX, MAXHOM, MALIGN
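The alignment principles above (line up identical residues, reward chemically similar substitutions, pay a penalty for gaps) can be illustrated with a pairwise global alignment. This is a minimal Needleman-Wunsch sketch in Python, not how CLUSTALW or the other MSA programs work internally: the scores (2 for identity, 1 for a pair in the same hypothetical similarity group, -1 otherwise) and the linear gap penalty of -2 are illustrative assumptions; real tools use substitution matrices such as BLOSUM and affine gap penalties.

```python
MATCH, SIMILAR, MISMATCH, GAP = 2, 1, -1, -2
# Hypothetical groups of chemically similar residues (illustrative, not a real matrix).
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "STNQ", "AG"]


def pair_score(a: str, b: str) -> int:
    if a == b:
        return MATCH
    if any(a in g and b in g for g in SIMILAR_GROUPS):
        return SIMILAR
    return MISMATCH


def global_align(s: str, t: str) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),
                score[i - 1][j] + GAP,  # gap in t
                score[i][j - 1] + GAP,  # gap in s
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))


best, top, bottom = global_align("MQIFVKTLTG", "MQIFVRTLG")
print(best)
print(top)
print(bottom)
```

Here the K/R pair scores as similar, and the single gap is placed opposite the second threonine, where it costs the least.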


After alignment, check the result

The function of a protein depends on the spatial location of a few key residues

Some residues are critical for the stability of the protein fold or for the formation of functional quaternary structures

Residues conserved across all sequences usually indicate a conserved structural or functional role, especially buried charged residues
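Fully conserved columns can be located mechanically once an alignment exists. A minimal Python sketch over a hypothetical three-sequence alignment; it also flags conserved charged residues (D, E, K, R) as candidates for closer inspection, though whether they are actually buried can only be judged from a structure:

```python
def conserved_columns(alignment: list[str]) -> list[int]:
    """0-based indices of columns where every sequence has the same residue and no gap."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col, residues in enumerate(zip(*alignment)):
        if "-" not in residues and len(set(residues)) == 1:
            conserved.append(col)
    return conserved


# Hypothetical three-sequence alignment, for illustration only.
msa = [
    "MKCDE-G",
    "MRCDEAG",
    "MKCNEAG",
]
CHARGED = set("DEKR")

conserved = conserved_columns(msa)
charged = [c for c in conserved if msa[0][c] in CHARGED]
print(conserved, charged)
```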


Checking protein domain

A polypeptide sequence can contain several compact globular regions that fold independently: these are domains

A domain is a compact unit of protein structure, usually associated with a function

It is important to know which domains make up a given protein

The 3D model of a protein is composed of these modular elements, usually constructed individually and then assembled


How to construct a homology model?

Find homologous sequences

Select the template sequence of known structure

Align the template and target sequences

Build the model


The most important step: find a homologous structure

The criteria: alignment score and E-value (discard hits with low scores or high E-values, > 0.005)

Domain coverage (at least 60% of the domain)

Gaps (the fewer the gaps, the better the structural model)

For small proteins, a specific search (disulfide bonds)

If no structure is found: use prediction methods (secondary and tertiary structure prediction)
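The first two criteria act as filters and the gap count as a ranking key. A minimal Python sketch, assuming the search hits have already been parsed into a hypothetical `Hit` record (the field names and the example hits are invented for illustration); the thresholds are the ones quoted above (E-value ≤ 0.005, at least 60% domain coverage):

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical summary of one database-search hit (e.g. parsed from BLAST output).
    name: str
    score: float
    evalue: float
    domain_coverage: float  # fraction of the target domain covered, 0..1
    gaps: int


def candidate_templates(hits, max_evalue=0.005, min_coverage=0.60):
    """Keep hits passing the E-value and coverage filters; best first (high score, then few gaps)."""
    kept = [h for h in hits if h.evalue <= max_evalue and h.domain_coverage >= min_coverage]
    return sorted(kept, key=lambda h: (-h.score, h.gaps))


hits = [
    Hit("hit_A", 310.0, 1e-40, 0.95, 2),
    Hit("hit_B", 310.0, 1e-40, 0.90, 0),
    Hit("hit_C", 150.0, 0.02, 0.85, 1),   # E-value too high: discarded
    Hit("hit_D", 280.0, 1e-30, 0.40, 0),  # covers too little of the domain: discarded
]
ranked = candidate_templates(hits)
print([h.name for h in ranked])
```

Sorting by score first and using the gap count only to break ties is one possible reading of the criteria; other weightings are equally reasonable.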


Selection of the template sequence

A single structural homologue: only one choice for the template

Several equally good structural homologues: how many, and which one(s), should we choose? To pick among them, look at:

a simple phylogenetic tree (shows the most similar structure)

the completeness of the structural information (view the PDB entry with RASMOL and verify that the structure is complete)

X-ray and NMR entries
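One aspect of completeness can be checked without a viewer: residues missing from the coordinate file leave holes in the residue numbering. A minimal Python sketch on synthetic records (dummy coordinates, residues 4 and 5 deliberately omitted); `missing_residues` is a hypothetical helper that ignores insertion codes and numbering offsets, so it is a first filter rather than a full check:

```python
def missing_residues(pdb_text: str, chain: str) -> list[int]:
    """Residue numbers absent between the first and last ATOM residue of a chain."""
    seen = set()
    for line in pdb_text.splitlines():
        # PDB columns: chain ID at column 22, residue number at columns 23-26.
        if line.startswith("ATOM") and line[21] == chain:
            seen.add(int(line[22:26]))
    if not seen:
        return []
    return [r for r in range(min(seen), max(seen) + 1) if r not in seen]


# Synthetic Cα-only records with dummy coordinates; residues 4 and 5 are missing.
records = "\n".join(
    f"ATOM  {i:>5}  CA  GLY A{res:>4}       0.000   0.000   0.000  1.00  0.00"
    for i, res in enumerate([1, 2, 3, 6, 7, 8], start=1)
)
print(missing_residues(records, "A"))
```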


One or many templates?

When several templates of the same quality and similarity have been selected:

Compare their 3D structures to check what unique information each template provides

Structural alignment of Cα atoms

If two templates are very close, keep only one

Keep templates that provide new information
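Deciding whether two templates are "very close" usually means superposing their matched Cα atoms and computing the root-mean-square deviation (RMSD). A minimal Python/NumPy sketch of the Kabsch superposition, assuming the Cα atoms of the two templates have already been paired by a structural alignment; the 1.0 Å redundancy cutoff in `redundant` is a hypothetical value, not one given in these slides:

```python
import numpy as np


def superposed_rmsd(P, Q) -> float:
    """RMSD between matched (N, 3) Cα coordinate sets after optimal superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched Cα atoms")
    # Remove translation: center both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # exclude reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def redundant(P, Q, cutoff=1.0) -> bool:
    """Hypothetical rule: two templates are redundant below `cutoff` Å RMSD."""
    return superposed_rmsd(P, Q) < cutoff


P = [[0, 0, 0], [1.5, 0, 0], [0, 1.5, 0], [0, 0, 1.5], [1, 1, 1]]
# Q is P rotated by 90 degrees about z and translated: same shape, different placement.
Q = [[-y + 5.0, x - 2.0, z + 3.0] for x, y, z in P]
print(superposed_rmsd(P, Q), redundant(P, Q))
```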


Align the template and the target sequences

With high sequence identity (> 40%), the alignment is reliable and any alignment method can be used

Otherwise, using multiple sequences improves the quality of the alignment

Some checks are needed to increase confidence in the model:

Residue conservation checks (pattern and function)

Visual inspection of indel regions (RASMOL)
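Before the visual inspection, it can help to list where the indels are. A minimal Python sketch that reports gap runs in a pairwise target/template alignment (hypothetical sequences); a gap in the target marks template residues with no counterpart, while a gap in the template marks target residues that have no template coordinates and will need loop modeling:

```python
import re


def indel_regions(aligned_target: str, aligned_template: str) -> list[tuple[str, int, int]]:
    """Gap runs as (row, start, end) alignment columns, 0-based and inclusive."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("alignment rows must have equal length")
    regions = []
    for label, row in (("target", aligned_target), ("template", aligned_template)):
        for m in re.finditer(r"-+", row):
            regions.append((label, m.start(), m.end() - 1))
    return sorted(regions, key=lambda r: r[1])


# Hypothetical pairwise alignment, for illustration only.
target = "MKV--LDGAG"
template = "MKVAPLD--G"
print(indel_regions(target, template))
```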


Illustration of model building


And finally… Build the model

Now it is time to use a modeling program

Input: the target and template sequences and their alignment

Output: a 3D structure that satisfies the constraints
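The simplest way to turn an alignment into coordinates is to copy the template position of every aligned residue. The Python sketch below does only that for Cα atoms and leaves target residues facing a gap empty (`None`). It is plain coordinate transfer, shown to make the input/output relation concrete; it is not the restraint-based method Modeller uses, and `transfer_coordinates` is a hypothetical name:

```python
def transfer_coordinates(aligned_target, aligned_template, template_coords):
    """Copy template Cα coordinates onto aligned target residues.

    template_coords[k] is the (x, y, z) of the k-th template residue (gaps excluded).
    Returns one entry per target residue: copied coordinates, or None where the
    target residue faces a gap (an insertion that must be modeled separately).
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("alignment rows must have equal length")
    model = []
    k = 0  # index of the next template residue
    for t_res, s_res in zip(aligned_target, aligned_template):
        if t_res != "-":
            model.append(template_coords[k] if s_res != "-" else None)
        if s_res != "-":
            k += 1
    if k != len(template_coords):
        raise ValueError("template_coords does not match the template row")
    return model


# Hypothetical alignment and dummy template coordinates (8 template residues).
target = "MKV--LDGAG"
template = "MKVAPLD--G"
template_coords = [(float(i), 0.0, 0.0) for i in range(8)]
model = transfer_coordinates(target, template, template_coords)
print(model)
```

Residues G and A of the target face template gaps, so their coordinates stay `None` and would have to be built by loop modeling.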


Which program to choose?

WHATIF (1990)

SWISSMODEL (1993)

MODELLER (1994)

ICM (1994)

CPH Models (1997)

SDSC1 (2000)

3D-JIGSAW (2001)


A short presentation of Modeller [Šali & Blundell, 1993]

This is one of the best available modeling programs

Written in Fortran 90

A graphical interface to MODELLER is commercially available from Accelrys, as part of Discovery Studio Modeling 1.1.


Advantages of Modeller

Implements comparative protein structure modelling by satisfaction of spatial restraints (2,3)

Can perform many additional tasks, including de novo modelling of loops in protein structures

Optimizes various models of protein structure with respect to a flexibly defined objective function

Performs multiple alignment of protein sequences and/or structures

Searches for sequences in databases

Compares protein structures


Optimization with iteration
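The idea of satisfying spatial restraints by iterative optimization can be shown on a toy problem: move a few points until their distances match target values. The Python sketch below uses plain gradient descent on a sum of squared distance violations; the restraint values (3.8 Å between consecutive points, 5.5 Å between points two apart) and the starting coordinates are hypothetical, and Modeller's real objective function and optimizer are far more elaborate:

```python
import math


def objective(coords, restraints):
    """Sum of squared violations of distance restraints (i, j, target_distance)."""
    return sum((math.dist(coords[i], coords[j]) - d0) ** 2 for i, j, d0 in restraints)


def optimize(coords, restraints, step=0.05, iterations=2000):
    """Plain gradient descent on the restraint objective; returns new coordinates."""
    coords = [list(c) for c in coords]
    for _ in range(iterations):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0 in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # gradient undefined for coincident points
            factor = 2.0 * (d - d0) / d  # derivative of (d - d0)^2, divided by d
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
        for c, g in zip(coords, grad):
            for k in range(3):
                c[k] -= step * g[k]
    return coords


start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 1.0)]
# Hypothetical restraints: (i, j, target distance in Å).
restraints = [(0, 1, 3.8), (1, 2, 3.8), (2, 3, 3.8), (0, 2, 5.5), (1, 3, 5.5)]

initial = objective(start, restraints)
final_coords = optimize(start, restraints)
final = objective(final_coords, restraints)
print(f"objective: {initial:.2f} -> {final:.6f}")
```

Each iteration lowers the violation of the restraints; the loop stops after a fixed number of steps here, whereas a real program would also test for convergence.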


To conclude

Protein structure determines function

Knowing protein structure is important for applications in biology and medicine

Homology modeling: from a known structure in a protein family,

build a model of a homologous sequence