What can (many) sequences tell us?

Upload: joseph-benjamin

Post on 31-Dec-2015


TRANSCRIPT

Page 1: What can (many) sequences tell us?

What can (many) sequences tell us?

Page 2: What can (many) sequences tell us?

Nuclear receptor function

Page 3: What can (many) sequences tell us?

Nuclear receptor family

NR1C1-PPAR
NR1C2-PPAS
NR1C3-PPAT
NR1D1-EAR1
NR1D2-BD73
NR1I3-MB67
NR1I4-CAR1-MOUSE
NR1H2-NER
NR1H3-LXR
NR1H4-FAR
NR4A2-NOT
NR4A3-NOR1
NR4A1-NGFI
NR2F1-COTF
NR2F2-ARP1
NR2F6-EAR2
NR2E3-PNR
NR2B1-RRXA
NR2B2-RRXB
NR2A2-HN4G
NR3C1-GCR
NR3C4-ANDR
NR3C3-PRGR
NR3A1-ESTR
NR3A2-ERBT
NR3B1-ERR1
NR3B2-ERR2
NR5A1-SF1
NR5A2-FTF
NR1I1-VDR
NR1B3-RRG1
NR2E1-TLX
NR2C1-TR2-11
NR2C2-TR4
NR6A1-GCNF
NR2B3-RRXG
NR2A1-HNF4
NR2A5-HN4
NR0B1-DAX1
NR0B2-SHP
NR3C2-MCR
NR1F3-RORG
NR1F2-RORB
NR1F1-ROR1
NR1A2-THB1
NR1A1-THA1
NR1I2-PXR
NR1B2-RRB2
NR1B1-RRA1

Page 4: What can (many) sequences tell us?

Nuclear receptor structure

Domains: A-B, C, D, E, F

A-B: AF-1

C: DNA binding domain (DNA)
- highly conserved
- > 90% similarity

E: Ligand binding domain (LBD)
- conserved protein fold
- > 20% sequence similarity

Page 5: What can (many) sequences tell us?

The questions

As Organon is paying the bills, question one is, of course ☺, how do ligands relate to activity?

With or without a ligand present, NRs can bind co-activators and co-repressors, so what is an agonist, an antagonist, or an inverse agonist?

What is the role of each amino acid in the NR LBD?

Which data handling is needed to answer these questions?

Page 6: What can (many) sequences tell us?

3D structure LBD

(hER)

Page 7: What can (many) sequences tell us?

Available NR data

56 structures in the PDB (>200 now)

>500 sequences (scattered) (>1500 now)

>1000 mutations (very scattered)

>10000 ligand-binding studies (secret)

Disease patterns, expression, >1000 SNPs, genetic localization, etc., etc., etc.

This data must be integrated, sorted, combined, validated, understood, and used to answer our questions.

Page 8: What can (many) sequences tell us?

Step 1

The first important step is a common numbering scheme.

Whoever solves that problem once and for all should get three Nobel prizes.

Page 9: What can (many) sequences tell us?

Large data volumes

Large data volumes allow us to develop new data analysis techniques.

Entropy-variability analysis is a novel technique to look at very large multiple sequence alignments.

Entropy-variability analysis requires ‘better’ alignments than are routinely obtained with ‘standard’ multiple sequence alignment programs.

Page 10: What can (many) sequences tell us?

Part of the big alignment

Page 11: What can (many) sequences tell us?

Vriend’s first rule of sequence analysis

If it is conserved, it is important

Page 12: What can (many) sequences tell us?

Vriend’s second rule of sequence analysis

If it is very conserved, it is very important

Page 13: What can (many) sequences tell us?

What is CMA?

QWERTYASDFGRGH
QWERTYASDTHRPM
QWERTNMKDFGRKC
QWERTNMKDTHRVW

Red = conserved
Green = variable
Blue = correlated
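The slide does not expand the acronym; assuming CMA stands for correlated mutation analysis, the toy alignment can be classified column by column as in the sketch below. The function names and the rule used here (a column counts as "correlated" when it is not conserved, at least two sequences share a residue, and another column shows the same pattern of changes) are illustrative assumptions, not the method used in the talk.

```python
from collections import Counter

# Toy alignment from the slide.
ALIGNMENT = [
    "QWERTYASDFGRGH",
    "QWERTYASDTHRPM",
    "QWERTNMKDFGRKC",
    "QWERTNMKDTHRVW",
]


def column_pattern(column):
    """Map residues to first-occurrence indices, e.g. 'YYNN' -> (0, 0, 1, 1)."""
    seen = {}
    return tuple(seen.setdefault(res, len(seen)) for res in column)


def classify_columns(alignment):
    """Label each column 'conserved', 'correlated', or 'variable'.

    Assumed rule: 'correlated' means the column's pattern of residue changes
    is shared with at least one other column, and the pattern is informative
    (not every sequence has a different residue).
    """
    columns = ["".join(col) for col in zip(*alignment)]
    patterns = [column_pattern(col) for col in columns]
    pattern_counts = Counter(patterns)
    labels = []
    for col, pattern in zip(columns, patterns):
        n_types = len(set(col))
        if n_types == 1:
            labels.append("conserved")
        elif n_types < len(col) and pattern_counts[pattern] > 1:
            labels.append("correlated")
        else:
            labels.append("variable")
    return labels


labels = classify_columns(ALIGNMENT)
for position, label in enumerate(labels, start=1):
    print(position, label)
```

On this alignment, positions 6 to 8 change together, positions 10 and 11 change together, and the last two positions vary without a shared pattern.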

Page 14: What can (many) sequences tell us?

Wilma

Wilma Kuipers Thesis

Page 15: What can (many) sequences tell us?

Correlation analysis

Receptor   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 ...
Affinity   +  +  +  +  -  -  -  -  -  -  -  -  -  -  -  +  +  +  -  -  -  -  -  - ...
res. 386   N  N  N  N  T  T  T  T  A  A  A  V  V  L  L  N  N  N  Y  Y  Y  Y  T  T ...

1 = 5HT-1a
2 = 5HT-1b
3 = 5HT-1d
....

• Correlate sequences with ligand binding affinities
• Alignments showed 100% correlation of affinity for pindolol and the absence/presence of Asn386
• Obviously, Asn386 plays an important role in ligand binding
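The correlation on this slide can be checked with a few lines of code. The sketch below uses only the 24 receptors shown in the table (the slide elides the rest), and the function name is a hypothetical helper introduced for illustration.

```python
# Affinity for pindolol (+ / -) and the residue at position 386 for the
# 24 receptors shown on the slide; receptors beyond 24 are elided there.
AFFINITY = "+" * 4 + "-" * 11 + "+" * 3 + "-" * 6
RESIDUE_386 = "NNNNTTTTAAAVVLLNNNYYYYTT"


def residue_affinity_agreement(affinity, residues, target="N"):
    """Fraction of receptors where 'binds' coincides with 'has target residue'."""
    if len(affinity) != len(residues):
        raise ValueError("affinity and residues must have the same length")
    matches = sum((a == "+") == (r == target) for a, r in zip(affinity, residues))
    return matches / len(affinity)


agreement = residue_affinity_agreement(AFFINITY, RESIDUE_386)
print(f"Agreement between affinity and Asn at 386: {agreement:.0%}")
```

An agreement of 1.0 means every receptor with affinity has Asn at position 386 and every receptor without affinity has a different residue there.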

Page 16: What can (many) sequences tell us?

Wilma Kuipers Thesis

Wilma

Page 17: What can (many) sequences tell us?

Wilma Kuipers Thesis

Wilma

Page 18: What can (many) sequences tell us?

Entropy

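Sequence entropy E_i at position i is calculated from the frequencies p_{i,a} of the twenty amino acid types a at position i:

$$E_i = -\sum_{a=1}^{20} p_{i,a} \ln p_{i,a}$$

With this sign convention E_i ranges from 0 (fully conserved) to ln 20 ≈ 3.0 (all twenty types equally frequent).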
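A minimal sketch of the entropy calculation for one alignment column, using the conventional minus sign so that values are non-negative. Ignoring gap characters ("-" and ".") is an assumption made here; the slide does not say how gaps are treated.

```python
import math
from collections import Counter


def sequence_entropy(column):
    """Shannon entropy (natural log) of the residue types in one column.

    Gap characters are skipped; this is an assumption of the sketch.
    """
    residues = [res for res in column if res not in "-."]
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


print(sequence_entropy("AAAA"))  # fully conserved column
print(sequence_entropy("AACC"))  # two types, equally frequent
```

A fully conserved column gives 0, and a column with all twenty amino acid types at equal frequency gives ln 20.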

Page 19: What can (many) sequences tell us?

Variability

Sequence variability Vi is the number of amino acid types observed at position i in more than 0.5% of all sequences.
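The variability definition translates directly into code. The sketch below counts the residue types seen in more than 0.5% of the sequences in one column; skipping gap characters and taking the fraction over non-gap residues are assumptions, since the slide does not address gaps.

```python
from collections import Counter


def sequence_variability(column, min_fraction=0.005):
    """Number of residue types present in more than min_fraction of sequences.

    Gap characters are skipped and fractions are taken over the non-gap
    residues; both choices are assumptions of this sketch.
    """
    residues = [res for res in column if res not in "-."]
    total = len(residues)
    counts = Counter(residues)
    return sum(1 for n in counts.values() if n / total > min_fraction)


# 1000 sequences: C (0.6%) passes the 0.5% threshold, D (0.4%) does not.
column = "A" * 990 + "C" * 6 + "D" * 4
print(sequence_variability(column))
```

The threshold keeps rare residues, which may be sequencing or alignment errors in a very large alignment, from inflating the count.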

Page 20: What can (many) sequences tell us?

Ras Entropy-Variability

11 Red, 12 Orange, 22 Yellow, 23 Green, 33 Blue

Page 21: What can (many) sequences tell us?

Protease Entropy-Variability

11 Red, 12 Orange, 22 Yellow, 23 Green, 33 Blue

Page 22: What can (many) sequences tell us?

Globin Entropy-Variability

11 Red, 12 Orange, 22 Yellow, 23 Green, 33 Blue

Page 23: What can (many) sequences tell us?

GPCR Entropy-Variability; signalling path

GPCR: 11 G protein, 12 Support, 22 Signaling, 23 Ligand in, 33 Ligand out

Page 24: What can (many) sequences tell us?

NR LBD Entropy-Variability

[Figure: entropy (0.0 to 2.8) plotted against variability (0 to 18), divided into boxes 11, 12, 22, 23, and 33.]

11 main function

12 first shell around main function

22 core residues (signal transduction)

23 modulator

33 mainly surface

Page 25: What can (many) sequences tell us?

Mutation data

http://www.cmbi.ru.nl/NR/
http://www.receptors.org/

1095 entries
41 receptors
12 species
3D numbers
7 sources

Page 26: What can (many) sequences tell us?

Mutation data

[Figure: four bar charts showing percentages per entropy-variability box (Box 11, Box 12, Box 22, Box 23, Box 33). Panels: Diseases (axis 0% to 60%), Transcription (0% to 20%), Coregulator (0% to 40%), Dimerization (0% to 40%).]

Page 27: What can (many) sequences tell us?

Mutation data

[Figure: three bar charts showing percentages per entropy-variability box (Box 11, Box 12, Box 22, Box 23, Box 33). Panels: Ligand binding (axis 0% to 30%), No effect (0% to 6%), No mutations (0% to 25%).]

Page 28: What can (many) sequences tell us?

Ligand binding data

Ligand-binding positions extracted from PDB files (nomenclature)

Categorized from ‘very frequent’ to ‘not so frequent’ binders

Type of ligand (agonist/antagonist=inverse agonist…)

Page 29: What can (many) sequences tell us?

Ligand-binding residues

LIG 1: more than 50 of 56
LIG 2: 25-50 of 56
LIG 3: 11-24 of 56
LIG 4: 1-10 of 56

H-bonds (~35, 15, 15, 15)
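The four categories amount to a simple binning rule. The sketch below assumes that the counts are the number of the 56 PDB structures in which a position contacts the ligand, which is an interpretation of "of 56" on the slide; the function name is hypothetical.

```python
N_STRUCTURES = 56  # number of NR structures the slide's categories refer to


def lig_category(n_contacts):
    """Return the slide's LIG category for a position, or None if it never binds.

    n_contacts is assumed to be the number of structures (out of 56) in which
    the position contacts the ligand.
    """
    if not 0 <= n_contacts <= N_STRUCTURES:
        raise ValueError(f"n_contacts must be between 0 and {N_STRUCTURES}")
    if n_contacts > 50:
        return "LIG 1"
    if n_contacts >= 25:
        return "LIG 2"
    if n_contacts >= 11:
        return "LIG 3"
    if n_contacts >= 1:
        return "LIG 4"
    return None


for count in (56, 50, 24, 10, 0):
    print(count, lig_category(count))
```

The thresholds are tied to the 56 structures available at the time; with a larger structure set they would need to be rescaled.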

Page 30: What can (many) sequences tell us?

Example: role of Asp 351

antagonist / agonist

Page 31: What can (many) sequences tell us?

Ligand, cofactor and dimerization data combined with entropy-variability analysis

[Figure: three bar charts showing counts per entropy-variability box (Box 11, Box 12, Box 22, Box 23, Box 33). Panels: Ligand contacting residues (axis 0 to 12), Cofactor contacting residues (0 to 3.5), Residues involved in dimerization (0 to 7).]

Page 32: What can (many) sequences tell us?

Conclusions:

Data is difficult, but we need it (sic); life would be so nice if we could do without it. PDB files are the worst.

Nomenclature is not homogeneous. Ontologies….

Much data has been carefully hidden in the literature, where it can only be retrieved with great difficulty.

Residue numbering is difficult but very necessary.

Variability-entropy analysis is powerful, but requires very 'good' alignments.

Page 33: What can (many) sequences tell us?

A short break for a word from our sponsors

Laerte Oliveira

Our industrial sponsor:

Florence Horn

Wilma Kuipers, Weesp
Bob Bywater, Copenhagen
Nora vd Wenden, The Hague
Mike Singer, New Haven
Ad IJzerman, Leiden
Margot Beukers, Leiden
Fabien Campagne, New York
Øyvind Edvardsen, Tromsø

Simon Folkertsma, Frisia
Henk-Jan Joosten, Wageningen
Joost van Durma, Brussels
David Lutje Hulsik, Utrecht
Tim Hulsen, Goffert
Manu Bettler, Lyon

Elmar Krieger