Post on 21-Dec-2015

Page 1: Signals in Sequences The number of sequences available for analysis rapidly approaches infinite. We need new ways to look at all this information

Signals in Sequences

The number of sequences available for analysis rapidly approaches infinite.

We need new ways to look at all this information.

Page 2

Rule 1

First rule of sequence analysis:

If a residue is conserved, it is important.

Page 3

Rule 2

Second rule of sequence analysis:

If a residue is very conserved, it is very important.

Page 4

GPCR Project

GPCR is THE drug target.
Lots of data available.
You have ~630 GPCRs.
Little structure data.
2000 sequences known.
‘Easy’ to align.

Page 5

The GPCR (rhodopsin)

Page 6

1 conserved aa / helix!

Page 7

Laerte about modelling:

“Use the sequence, Luke”

Page 8

Conserved, CMA, variable

QWERTYASDFGRGH
QWERTYASDTHRPM
QWERTNMKDFGRKC
QWERTNMKDTHRVW

Black = conserved
White = variable
Green = correlated mutations (CMA)
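The three colour classes on this slide can be approximated with a small sketch. It is not the CMA method used in the talk. It assumes a toy rule: a column is "correlated" when it is not conserved, does not put every sequence in its own group, and splits the sequences into exactly the same groups as at least one other column. The function names `partition` and `classify_columns` are hypothetical.

```python
from collections import Counter

def partition(column):
    """Encode which sequences share a residue in this column."""
    first_seen = {}
    return tuple(first_seen.setdefault(res, i) for i, res in enumerate(column))

def classify_columns(alignment):
    """Label each column: c = conserved, m = correlated, v = variable.

    Assumed toy rule, not the talk's CMA implementation.
    """
    columns = list(zip(*alignment))
    parts = [partition(col) for col in columns]
    counts = Counter(parts)
    labels = []
    for col, part in zip(columns, parts):
        n_groups = len(set(part))
        if n_groups == 1:
            labels.append("c")   # one residue type in all sequences
        elif n_groups < len(col) and counts[part] > 1:
            labels.append("m")   # same grouping seen in another column
        else:
            labels.append("v")   # no shared grouping
    return "".join(labels)

# The four toy sequences from the slide.
alignment = [
    "QWERTYASDFGRGH",
    "QWERTYASDTHRPM",
    "QWERTNMKDFGRKC",
    "QWERTNMKDTHRVW",
]
print(classify_columns(alignment))  # cccccmmmcmmcvv
```

With only four sequences the rule is far too permissive for real data. It is meant only to make the black/green/white distinction concrete.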

Page 9

CMA and tree

1 ASASDFDFGHKM
2 ASASDFDFRRRL
3 ASLPDFLPGHSI
4 ASLPDFLPRRRV

Page 10

CMA versus tree

1 ASASDFDFGHKMGHS
2 ASASDFDFRRRLRHS
3 ASLPDFLPGHSIGHS
4 ASLPDFLPRRRVRIT
5 ASASDFDFRRRLRIT
6 ASLPDFLPGHSIGIT

Red : 1,2,5 vs 3,4,6
Black : 1,3,6 vs 2,4,5
Yellow: 1,2,3 vs 4,5,6
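The Red, Black and Yellow groupings can be recovered from the six sequences with a short sketch. It assumes that a correlated group is a set of columns that split the sequences into the same two subfamilies. The helper names `split_of` and `correlated_splits` are hypothetical.

```python
from collections import defaultdict

def split_of(column):
    """Return the sequence groups (1-based numbers) induced by a column."""
    groups = defaultdict(list)
    for seq_no, res in enumerate(column, start=1):
        groups[res].append(seq_no)
    return frozenset(tuple(g) for g in groups.values())

def correlated_splits(alignment):
    """Map each two-way split shared by at least two columns to its columns."""
    by_split = defaultdict(list)
    for col_no, column in enumerate(zip(*alignment), start=1):
        split = split_of(column)
        if len(split) == 2:
            by_split[split].append(col_no)
    return {s: cols for s, cols in by_split.items() if len(cols) > 1}

# The six toy sequences from the slide.
alignment = [
    "ASASDFDFGHKMGHS",
    "ASASDFDFRRRLRHS",
    "ASLPDFLPGHSIGHS",
    "ASLPDFLPRRRVRIT",
    "ASASDFDFRRRLRIT",
    "ASLPDFLPGHSIGIT",
]
for split, cols in correlated_splits(alignment).items():
    print(sorted(split), "columns", cols)
```

On this alignment the sketch finds three splits: 1,2,5 vs 3,4,6 (columns 3, 4, 7, 8), 1,3,6 vs 2,4,5 (columns 9, 10, 13) and 1,2,3 vs 4,5,6 (columns 14, 15). Each pair of splits overlaps in all four combinations, so no single tree can contain all three as branches. That is one way to read the "versus" in the slide title.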

Page 11

CMA on GPCR

Page 12

CMA on GPCR

Page 13

Florence Horn

Page 14

Class B Ligands

Page 15

Class B – ligand docking

Page 16

G protein-coupling?

Page 17

Sequence Signals

Three classes of residues

1) Conserved
2) CMA
3) Variable

Page 18

Conservation Artefacts

Conservation can result from

Not enough sequences
Too conserved sequences
Over-alignment

Page 19

Variability Artefacts

Variability can result from

Wrong sequence choice
Variable loops
Alignment errors

Page 20

CMA Artefacts

CMA can result from

Wrong sequence choice
Poor sequence homogeneity
Over-fitting

Page 21

Recalcitrant residues

Page 22

Sequence Entropy

$$E = -\sum_{i=1}^{20} p_i \ln(p_i)$$

where p_i is the fraction of sequences with residue type i at an alignment position.
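As a minimal sketch, the entropy of one alignment column can be computed as the Shannon entropy over the residue types observed in it, using the natural logarithm as on the slide. The function name `column_entropy` is hypothetical. The sketch assumes the column holds only the 20 standard amino acid letters, with no gap characters.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (natural log) of the residue frequencies in a column."""
    counts = Counter(column)
    total = len(column)
    # Residue types that never occur contribute nothing to the sum.
    return -sum((n / total) * math.log(n / total) for n in counts.values())

print(column_entropy("AAAA"))   # fully conserved column
print(column_entropy("AACC"))   # two residue types, equally frequent
```

A fully conserved column gives 0. A column in which all 20 residue types are equally frequent gives the maximum, ln(20), about 3.0.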

Page 23

Sequence Variability

Sequence variability is the number of residue types at an alignment position that are present in more than 0.5% of all sequences.
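A minimal sketch of this definition for a single alignment column follows. The function name `column_variability` and the `threshold` parameter are hypothetical. The 0.5% cut-off is the one stated on the slide.

```python
from collections import Counter

def column_variability(column, threshold=0.005):
    """Count residue types seen in more than `threshold` of the sequences."""
    counts = Counter(column)
    total = len(column)
    return sum(1 for n in counts.values() if n / total > threshold)

# 1000 sequences: A in 99.0%, C in 0.6%, D in 0.4% of them.
column = "A" * 990 + "C" * 6 + "D" * 4
print(column_variability(column))  # 2
```

Here D falls below the cut-off and is not counted. The cut-off keeps rare residues, such as sequencing errors, from inflating the variability.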

Page 24

Entropy - Variability

Entropy = Information
Variability = Chaos

Page 25

Entropy - Variability

Variability is the result of evolution.

Entropy is the protein’s brake on evolutionary speed.

Page 26

GPCR Entropy - Variability

11 Red
12 Orange
22 Yellow
23 Green
33 Blue

Page 27

GPCR Location

11 Red
12 Orange
22 Yellow
23 Green
33 Blue

Page 28

Ras Entropy - Variability

Page 29

Ras Location

11 Red
12 Orange
22 Yellow
23 Green
33 Blue

Page 30

Protease Entropy - Variability

Page 31

Protease Location

11 Red
12 Orange
22 Yellow
23 Green
33 Blue

Page 32

Globin Entropy - Variability

GPCR

Page 33

Globin Location

11 Red
12 Orange
22 Yellow
23 Green
33 Blue

Page 34

GPCR Again…

Page 35

GPCR Location (Again)

11 Red
12 Orange
22 Yellow
23 Green
33 Blue

Page 36

GPCR signaling

11 Purple
12 Red
22 ‘Yellow’
23 Green
33 Blue

Page 37

Summary

Given infinitely many sequences:

Every residue's role is known.
Signaling paths are detectable.

So, sequences contain many signals.

Page 38

Thanks to:

Laerte Oliveira, Sao Paulo
Wilma Kuipers, Weesp
Florence Horn, San Francisco

Bob Bywater, Copenhagen
Nora vd Wenden, The Hague
Mike Singer, New Haven
Ad IJzerman, Leiden
Margot Beukers, Leiden
Amos Bairoch, Geneva
Fabien Campagne, New York