DECONSTRUCTING EVOLUTION TO PERTURB PROTEINS AND NETWORKS
Olivier Lichtarge MD, PhD
Cullen Professor of Human and Molecular Genetics, Baylor College of Medicine
Houston, Texas, USA
What I will tell
What I want to tell
What I tried to tell
PROLOGUE
MORBIDITY AND MORTALITY OF PROTEIN DISEASES
• Alzheimer’s
• Cancers
• Sickle cell
• HIV entry
• Autoimmune diseases
• Amyloidosis
• Type II diabetes
• Bleeding diathesis
• Molecular mimicry
• Cardiomyopathies
• Cystic Fibrosis
• Huntington’s chorea
• Ataxias…
PROTEIN DYSFUNCTION IS LINKED TO MANY AILMENTS
Functional site identification has widespread applications
UNDERSTAND FUNCTIONAL SITES
Molecular recognition: protein-small molecule, protein-peptide, protein-protein, protein-nucleic acid
Function: catalysis, signaling, motion, metabolism, immunity, transport…
FOCUS EXPERIMENTS RELEVANT TARGETS
Engineer: drugs, peptide mimics, binding sites, catalytic sites
Modulate pathways: signaling, transcription, development, apoptosis…
FUNCTIONAL SITES MEDIATE PROTEIN FUNCTION
Lichtarge Lab, Baylor College of Medicine
HOW DO PROTEINS WORK?
To control proteins, know their functional determinants
functional determinants
RELEVANT PATTERNS OF EVOLUTIONARY VARIATIONS
FUNCTIONAL DETERMINANTS IN PROTEINS
Block, separate function
RATIONAL PROTEIN DESIGN
Galpha Bourne Onrust Science (1997)
RGS Wensel Sowa PNAS (2000)
Nucl. Transp Moore Cushman JMB ’04
Nucl. Recept. Smith Raviscioni Proteins ’06
Ku70/80 Bertuch Ribes-Zamora NSMB ’07
GRK Clark Baamaeur Mol. Pharm ’10
RecA-LexA Lichtarge Adikesavan (submitted)
Peptide inhibitor or trigger
RATIONAL PROTEIN DESIGN
Cohesin Pati
GRK Clarke
Ku70/80 Bertuch
Rewire function
RATIONAL PROTEIN DESIGN
RGS Wensel Sowa NSMB (2001)
Proneural Tx Hassan Quan Develop. ’04
Nucl. Recept. Cooney Raviscioni JMB ’06
GPCR Wensel Rodriguez PNAS ’10
Constitutive activity: “internal reprogramming”
RATIONAL PROTEIN DESIGN
GPCR Wensel Madabushi JBC ’04
GPCR Lefkowitz Shenoy JBC ’06
GPCR Wensel Rodriguez PNAS ’10
MUTATIONAL IMPACT
Item Mol. Genet. Metab. ’07
Shaibani Arch. Neurol. ’09
Haberle Hum. Mutations ’10
Katsonis in prep
PROTEIN FUNCTION PREDICTION
Kristensen Prot Sci ’06
Kristensen BMC Bioinfo ’08
Ward PLoS One ’09
Kristensen J Mol Biol ’09
Venner PLoS One ’10
PATTERNS AND EMERGING PROTEOMIC RULES
Scalable
Robust
Not random
Match known sites
Predict and guide experiments
Three classes of automated, accurate ET ranking functions
Three ET servers: http://mammoth.bcm.tmc.edu
[Diagram: Function, Sequence and Structure linked by EVOLUTION]
1. Amino acids may be ranked by importance
2. Top-ranked residues cluster
3. Clusters predict functional sites
4. ET quality measures (Clustering, Rank Information) correlate with prediction quality
5. ET clusters exchange water more slowly and have more H-bonds and salt bridges
6. Importance symmetry across interface
7. Top-ranked residue variations: specificity key
What I will tell
What I want to tell
What I tried to tell
– LECTURE 1 –
PROBLEM
Given a structure
• Where is the active site?
• What are the key residue determinants of function?
METHOD
Evolutionary Trace (ET): use evolution’s mutations and assays
• Overview: SH2, SH3, ZnF
• Functional sites, 4º structure: RGS, Gα
• Functional annotation: ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality
EVOLUTION: A COMPUTATIONAL TOOL FOR PROTEIN FUNCTIONAL SITE DISCOVERY
INTEGRATING SEQUENCE-STRUCTURE-FUNCTION INFORMATION
SEQUENCE
STRUCTURE FUNCTION
A FUNDAMENTAL CHALLENGE
Non-deterministic process
Deterministic process
STRUCTURE X
A SIMPLER PROBLEM
SEQUENCE
FUNCTIONAL SITE
? EXPERIMENTS, THEORY
Given structure x, where are its functional sites?
NEED a CHEAP, SCALABLE method to characterize the key residue determinants of protein function
• What is important in the structure?
• Where are the functional sites?
• How is specificity encoded?
FUNCTIONAL SITE CHARACTERIZATION: A LIMITING STEP IN EXPLOITING STRUCTURES
• Mutational analysis is precise, but protein specific, costly, and requires assays.
• Structural Genomics is producing vast numbers of new structures.
Very basic Evolutionary Tracing (ET)
• Location, architecture and function of active sites are conserved
• Specific variations impart novel and unique functional modulations
GAALF…….RT…W…KL
GAALY…….RT…W…KD
GAQLF…….FT…W…RE
IF these macroscopic observations apply to proteins,
THEN active site residues will be invariant within functional classes,
THEREFORE identify functional sites by looking for such class-specific residues.
FUNCTIONAL SITES EVOLVE THROUGH VARIATIONS ON A CONSERVED ARCHITECTURE
0. GATHER SEQUENCES
KERTFTGHKKLM
KERTFTGHKRLM
KERTFTVHKRLM
KEKTFTGHKKLM
VERTFTGHKSQM
VERTDTGHKRQM
VERTFTGMKRQM
ASR.YTGVKKNV
ASR.YTGVKKNV
ASR.YTGHKKNM
ASR.YTGHKKNM
1. SPLIT THEM INTO FUNCTIONAL SUBGROUPS
KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
ASR.YTGVKKNV ASR.YTGVKKNV
ASR.YTGHKKNM ASR.YTGHKKNM
GROUP  SEQUENCES      CONSENSUS SEQUENCE
1      KERTFTGHKKLM
       KERTFTGHKRLM
       KERTFTVHKRLM
       KEKTFTGHKKLM   KE-TFT-HK-LM
2      VERTFTGHKSQM
       VERTDTGHKRQM
       VERTFTGMKRQM   VERT-TG-K-QM
3      ASR.YTGVKKNV
       ASR.YTGVKKNV   ASR.YTGVKKNV
4      ASR.YTGHKKNM
       ASR.YTGHKKNM   ASR.YTGHKKNM
Consensus sequence: residues that are invariant within that group
2. IDENTIFY KEY RESIDUES IN EACH SUBGROUP
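The consensus step can be sketched in a few lines of Python. This is only an illustration of the rule stated on the slide (keep a residue where the whole group agrees, write "-" otherwise); the function name `consensus` and the choice of Python are assumptions, not part of the original ET software.

```python
def consensus(group, variable="-"):
    """Keep a residue where every sequence in the group agrees; mark the column otherwise."""
    if len({len(seq) for seq in group}) != 1:
        raise ValueError("sequences in a group must be aligned to the same length")
    return "".join(
        column[0] if len(set(column)) == 1 else variable
        for column in zip(*group)
    )


# Groups 1 and 2 as shown on the slide.
group1 = ["KERTFTGHKKLM", "KERTFTGHKRLM", "KERTFTVHKRLM", "KEKTFTGHKKLM"]
group2 = ["VERTFTGHKSQM", "VERTDTGHKRQM", "VERTFTGMKRQM"]

print(consensus(group1))  # KE-TFT-HK-LM
print(consensus(group2))  # VERT-TG-K-QM
```

Both outputs match the consensus strings shown for groups 1 and 2.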
Compare consensus sequences
KE-TFT-HK-LM
VERT-TG-K-QM
ASR.YTGVKKNV
ASR.YTGHKKNM
EVOLUTIONARY TRACE
3. COMPARE KEY RESIDUES ACROSS GROUPS
EVOLUTIONARY TRACE
By definition: if X varies, function changes. This is the sine qua non of importance.
KE-TFT-HK-LM
VERT-TG-K-QM
ASR.YTGVKKNV
ASR.YTGHKKNM
X
4. IDENTIFY CLASS SPECIFIC RESIDUES X
ACTIVE SITE
A site where any variation is linked to functional change.
EVOLUTIONARY TRACE
KE-TFT-HK-LM
VERT-TG-K-QM
ASR.YTGVKKNV
ASR.YTGHKKNM
XX___T__K_XX
5. MAP TRACE RESIDUES ON THE STRUCTURE
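Steps 3 and 4 (comparing the consensus sequences and marking class-specific positions) can be sketched as below. The function name is hypothetical, and one rule is an assumption inferred from the slide's own output rather than stated on it: a column containing the alignment gap "." is treated as neither conserved nor class specific.

```python
def evolutionary_trace(consensus_seqs, variable="-", alignment_gap="."):
    """Compare group consensus sequences column by column.

    Output symbols follow the slide:
      residue letter : invariant across all groups
      X              : invariant within every group, but different between groups
      _              : variable within at least one group (or gapped; see assumption above)
    """
    trace = []
    for column in zip(*consensus_seqs):
        if variable in column or alignment_gap in column:
            trace.append("_")
        elif len(set(column)) == 1:
            trace.append(column[0])
        else:
            trace.append("X")
    return "".join(trace)


# The four consensus sequences from the slide.
CONSENSUS = ["KE-TFT-HK-LM", "VERT-TG-K-QM", "ASR.YTGVKKNV", "ASR.YTGHKKNM"]
print(evolutionary_trace(CONSENSUS))  # XX___T__K_XX
```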
HOW TO DEFINE FUNCTIONAL SUBGROUPS?
Expert bias
Experiments
Approximation
APPROXIMATE FUNCTIONAL SUBGROUPS FROM EVOLUTIONARY INFORMATION
GROUP 1: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
GROUP 2: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
GROUP 3: ASR.YTGVKKNV ASR.YTGVKKNV
GROUP 4: ASR.YTGHKKNM ASR.YTGHKKNM
Hypothesis
A sequence identity tree approximates a functional classification.
If so, each node is a virtual assay that defines functional subgroups.
Cutting the tree into 4 branches gives 4 functional groups.
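As a rough illustration of the hypothesis, the toy sequences can be grouped by sequence identity alone. The sketch below uses plain average-linkage clustering on mismatch counts; that method, and the function names, are assumptions for illustration. The slides do not say which tree-building method the ET software uses.

```python
from itertools import combinations


def mismatches(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(seqs, k):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(seqs))]

    def distance(c1, c2):
        total = sum(mismatches(seqs[i], seqs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(seqs[i] for i in cluster) for cluster in clusters]


SEQS = [
    "KERTFTGHKKLM", "KERTFTGHKRLM", "KERTFTVHKRLM", "KEKTFTGHKKLM",
    "VERTFTGHKSQM", "VERTDTGHKRQM", "VERTFTGMKRQM",
    "ASR.YTGVKKNV", "ASR.YTGVKKNV", "ASR.YTGHKKNM", "ASR.YTGHKKNM",
]

for group in average_linkage(SEQS, 3):
    print(group)
```

Cut at three branches, this recovers the K..., V... and ASR... families. With such short toy sequences several merges tie at the same distance, so a cut at four branches depends on tie-breaking and is not asserted here.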
4. IDENTIFY TRACE RESIDUES X
Definition
A trace residue is one that does NOT vary within branches. Generically this property is also called class specificity.
6. EVOLUTIONARY IMPORTANCE RANK
Definition: the trace rank is the smallest number of branches at which a residue first becomes class specific.
RANK 1 (1 group)
Group 1: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM ASR.YTGVKKNM ASR.YTGVKKNV ASR.YTGHKKNM ASR.YTGHKKNM
Consensus 1: ---.-T--K---
Evolutionary trace: _____T__K___
RANK 2 (2 groups)
Group 1: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
Consensus 1: -E-T-T--K--M
Group 2: ASR.YTGVKKNM ASR.YTGVKKNV ASR.YTGHKKNM ASR.YTGHKKNM
Consensus 2: ASR.YTG-KKN-
Evolutionary trace: _X___T__K___
RANK 3 (3 groups)
Group 1: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
Consensus 1: KE-TFT-HK-LM
Group 2: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
Consensus 2: VERT-TG-K-QM
Group 3: ASR.YTGVKKNM ASR.YTGVKKNV ASR.YTGHKKNM ASR.YTGHKKNM
Consensus 3: ASR.YTG-KKN-
Evolutionary trace: XX___T__K_X_
ACTIVE SITE
A WELL DEFINED ALGORITHMIC PROCEDURE
Use the tree’s intrinsic hierarchy to assign an evolutionary trace rank to every residue.
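The ranking procedure can be sketched as follows, using the toy alignment from the rank 1 to rank 3 slides. The nested partitions are written out by hand here, which is an assumption for illustration; in practice they would come from cutting the sequence identity tree at 1, 2, 3, ... branches. Function names are hypothetical, and columns containing the alignment gap "." are left unranked, as in the slides' traces.

```python
def invariant(column, alignment_gap="."):
    """True when a group shows a single residue (and no gap) at this position."""
    return alignment_gap not in column and len(set(column)) == 1


def trace_ranks(partitions):
    """Rank of a position = smallest number of branches at which it is
    invariant within every branch; None if that never happens."""
    length = len(partitions[0][0][0])
    ranks = [None] * length
    for groups in partitions:  # ordered from fewest to most branches
        for pos in range(length):
            if ranks[pos] is None and all(
                invariant([seq[pos] for seq in group]) for group in groups
            ):
                ranks[pos] = len(groups)
    return ranks


K = ["KERTFTGHKKLM", "KERTFTGHKRLM", "KERTFTVHKRLM", "KEKTFTGHKKLM"]
V = ["VERTFTGHKSQM", "VERTDTGHKRQM", "VERTFTGMKRQM"]
A = ["ASR.YTGVKKNM", "ASR.YTGVKKNV", "ASR.YTGHKKNM", "ASR.YTGHKKNM"]

PARTITIONS = [
    [K + V + A],   # 1 branch
    [K + V, A],    # 2 branches
    [K, V, A],     # 3 branches
]

print(trace_ranks(PARTITIONS))
# [3, 2, None, None, None, 1, None, None, 1, None, 3, None]
```

The ranked positions agree with the rank 3 trace on the slide (XX___T__K_X_): T and K are rank 1, the second position is rank 2, and the first and eleventh positions are rank 3.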
PROBLEM
Given a protein structure
• Where is the active site?
• What are the key residue determinants of function?
METHOD
The Evolutionary Trace (ET): use evolution’s mutations and assays
• Overview: SH2, SH3, ZnF
• Functional sites, 4º structure: RGS, Gα
• Functional annotation: ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality
EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES
SH2 DOMAIN
Get an SH2 structure
Extract the sequence
Gather homologs: Blast, FASTA...
Align: PILEUP, CLUSTALW...
Construct a tree: PHYLIP,…
Trace!
[Figure: SH2 domain shown in four views rotated by 0°, 90°, 180° and 270°, with labels A to G]
SH2 DOMAIN
Trace residues (colored)
• exist
• Accumulate with more branches
• map unevenly on the structure,
• up until they scatter
SH2 DOMAIN
Mutations of residues ranked
• best kill function
• lower modulate it
• worst have no effect
SH2 DOMAIN
Trace cluster matches binding site
Binding site (Waksman et al.)
80 sequences
40 sequences
SH2 DOMAIN
Trace cluster matches the binding site (cyan).
But it matches the functional site (red) even better.
Evolution’s experiments agree with laboratory experiments
SH3 DOMAIN
INTRACELLULAR HORMONE RECEPTORS
Trace residue clusters match the protein-DNA interface
If
• The dendrogram approximates a functional tree.
• The active site evolves through variations on a conserved architecture.
Then
• Class specific residues can be found
• They cluster at functional sites (protein-protein, protein-DNA interfaces)
• They are ranked following a functional hierarchy:
  • functionally essential residues are first,
  • modulators of specificity follow,
  • then noise appears; unlike signal, it is scattered rather than clustered.
Lichtarge et al. J. Mol. Biol. (1996)
PROOF OF PRINCIPLE
PROBLEM
Given a protein structure
• Where is the active site?
• What are the key residue determinants of function?
METHOD
The Evolutionary Trace (ET): use evolution’s mutations and assays
• Overview, control studies: SH2, SH3, ZnF
• Bona fide predictions of functional sites and 4º structure: Galpha, RGS
• Functional annotation: ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality
EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES
G PROTEIN SIGNALING
[Diagram: 7TMR, Gα GDP, Gβγ and effector; step 1]
Ligands (>1000): light, calcium, epinephrine, angiotensin, thrombin, LH, FSH
• Ubiquitous in eukaryotes
• Sight, smell, taste, pain, reward, inflammation
• ≥ 80% of neuroendocrine signaling
• 100% of autonomic physiology
• 40-60% of all drugs
[Diagram repeated on each of the following slides: ligand (light, calcium, epinephrine, angiotensin, thrombin, LH, FSH; >1000), 7TMR, Gα GDP, Gα GTP, Gβγ, effector]
1. G PROTEIN-COUPLED RECEPTOR ACTIVATION
2. G PROTEIN ACTIVATION
3. EFFECTOR ACTIVATION
   Gα subtypes: Gαs, Gαt, Gαi, Gαq
   Effectors: adenylyl cyclase, cGMP-PDE, K channels, phospholipase C β
4. CELLULAR EFFECT
   Changes in concentration of intracellular 2nd messengers
5. RGS: STOP
FIRST PROSPECTIVE STUDY
Where does Galpha bind the receptor?
A MODEL OF THE G PROTEIN TRIMER-RECEPTOR COMPLEX
• ET identifies 3 surfaces on Gα
  1. Cleft ----> GTP/GDP
  2. C-term ----> GPCR
  3. ? ----> Gβ
• Since Gβ also interacts with the 7TMR, this leads to a low-resolution model of the complex
Structures from Wall et al., Cell ’95; Lambright et al., Nature ’96
THE Galpha-Gbeta INTERFACE
• A2 and B1 match the footprints of Gbeta and Galpha
• A2 goes beyond the Gbeta footprint: additional interaction ?
Lichtarge et al. (1996) PNAS
PREDICTION vs ALA SCAN
Lichtarge et al. (1996) PNAS
No effect
108 alanine mutants. Two assays:
• Activation-dependent protection from Trypsin degradation
• Binding to photoactivated rhodopsin in membranes
Impaired
Onrust et al. (1997) Science
            Ala scan +   Ala scan -
ET +        36           17
ET -        17           38
• Accuracy > 70%
• p = 0.004
• Disagreement in yellow region linked to assay limitations
Lichtarge et al. Meth Enzym. (2002)
PREDICTION vs ALA SCAN
EVOLUTIONARY TRACE IN G PROTEINS
PROSPECTIVE STUDY
Multiple clusters of trace residues
Each assigned to a specific ligand (receptor, effector, Gb, nucleotides)
Low resolution 4º structure follows from the assignments
Anticipates 7 out of 10 alanine mutations correctly (p=0.004); disagreements reflect assay limitations
Lichtarge et al. (1996) PNAS Onrust et al. (1997) Science
Lichtarge et al. (2001, In Press) Meth. Enzym.
[Diagram: membrane (out/in) with ligand, GPCR, Gα (GTP/GDP), Gβγ, PDE and RGS]
RGS: Regulator of G protein Signaling
REGULATORS OF G PROTEIN SIGNALING
RGS proteins bind Galpha and enhance its GTPase activity
PDEgamma slows the GTPase-accelerating property of RGS7
PDEgamma boosts the GTPase-accelerating property of RGS9
What is the basis for this difference?
EVOLUTION-BASED PROTEIN DESIGN
[Figure labels: Family 1 to Family 7]
1. Identify relevant patterns of variation
2. Map these positions onto the structure
3. Clusters predict functional sites
4. Model 4º structure
5. Selectively block function
6. Swap residues to swap function
17 trace residues
• 10 at Galpha interface
• 7 that extend beyond: a second active site S2 ?
A NEW FUNCTIONAL SITE - S2
Putative Gα-PDEγ Binding Site
RGS cluster S2
AN RGS SITE LINKED TO PDEgamma
Sowa et al (2000) PNAS
S2 interacts with PDEgamma and modulates its effect on Galpha
[Bar plot: Δk inact (s-1), axis from 0.025 to 0.200, for RGS7 and RGS9 and for mutants at positions 117, 124 and 131, without (- PDE) and with (+ PDE) PDEγ]
(117,124) mutations mimic PDEγ
131 confers RGS9-like activity
Sowa et al 2001 Nature Struct Bio
PDE INTERACTION
Direct contact between S2 and PDEgamma
[Figure labels: 117, 124, 131]
Slep et al. (2001) Nature
Putative Gα-PDEγ Binding Site
RGS cluster S2
Sowa et al (2000) PNAS
PREDICTION and VALIDATION in RGS
• Uncover in part how G protein signaling turns off
• Link raw sequence and structure data to function
• Guide mutational studies and anticipate outcome
• Design specificity by swapping trace residues among homologs
• Anticipate protein-protein 4º structure
Predict
• novel functional interface
• specificity determinants
• RGS-effector 4º structure
Target mutagenesis
• allosteric on-off switch
• RGS7-RGS9 specificity
• trace residue pathway
Crystallography
• Confirms RGS-effector 4º structure
Sowa et al. PNAS 2000
Sowa et al. Nature Struc Biol 2001
PROBLEM
Given a protein structure
• Where is the active site?
• What are the key residue determinants of function?
METHOD
The Evolutionary Trace (ET): use evolution’s mutations and assays
• Overview, control studies: SH2, SH3, ZnF
• Bona fide predictions of functional sites and 4º structure: Gα, RGS
• Functional consistency and annotation: RGS, ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality
EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES
Function has been experimentally determined in 0.5% of sequences. It has been inferred by homology in another 4.5% (Karp P. Bioinformatics 2001).
SEQUENCE
STRUCTURE FUNCTION
DATA vs INFORMATION
FUNCTIONAL ANNOTATION
1. How does specificity arise at a functional site?
2. Do these proteins perform the same function?
INTRACELLULAR HORMONE RECEPTORS
• The largest eukaryotic family of transcriptional regulators.
• Steroid IRs homodimerize onto palindromic response elements.
• Others heterodimerize onto double or inverted repeats.
• All bind DNA via a Zn-finger domain.
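[Figure: receptor family tree and domain map]
Binding modes: Steroid, head to head; Non-steroid, head to tail; Non-steroid, tail to tail; Monomer
Receptors: ESTR, THR, PPA, ROR, NUR, ANDR, PRGR, MCR, GCR, RAR, EAR, RXR
Domain labels (N to C terminus axis): ACTIVATION, CHAPERONE BINDING, DNA BINDING, LIGAND BINDING, SILENCING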
Trace residue clusters match the protein-DNA interface
INTRACELLULAR HORMONE RECEPTORS
TRACE RANK AND CORRELATED EVOLUTION OF A PROTEIN-DNA INTERFACE
GROUP 1 TRACE RESIDUES (position: residues observed)
452: Y, F
451: H
461: K, A
496: R
489: R
466: R
463: F
GROUP 2 TRACE RESIDUES (position: residues observed)
513: G, K, D, E, M, N
511: R, K, L, V, Q
465: G, K, R
458: E, G, D, N
459: G, S, A
490: T, K, N, R, S, C
493: K, P, Q, R
RESPONSE ELEMENT (figure labels: MOSTLY INVARIANT, MOSTLY VARIABLE)
• DNA binding evolves through variations on a theme.
• Protein-DNA contacts have similar patterns of variation.
DNA RECOGNITION DETERMINANTS
POSITION
Y452 K461 H451 F463 R466 R489 R496
• Highly conserved
• Bind conserved bases
N490 D513 G459 R465 Q493 E458 R511
• Highly variable
• Not conserved
• Bind variable bases
VARIATION (labels as extracted from the figure)
S A S; EGK GKP EKP EKP EKQ EKQ DRR DKR; K L L V L Q; F; A; T K K K R R N N N N K S N N C
A DNA RECOGNITION KEY ?
PARTITION
Receptors: knir, andr, prgr, mcr, gcr, estr, thra, thrb, ppa, rar, ror, nur, ear, rxr
Partition labels: P21, P19, P17, P15, P9, P7, P1, Q3
By tracking class specific variations along the tree, it is possible to link specific side chains to specific functions.
• Highly conserved
• Bind conserved bases
• K461A correlates with a base change
• Highly variable
• Not conserved
• Bind variable bases
THIS WAS THE TECHNIQUE USED IN THIS RGS STUDY
FUNCTIONAL ANNOTATION
1. How does specificity arise at a functional site?
2. Do these proteins perform the same function?
NO SIGNAL AT THE DIMER INTERFACE !
INTRACELLULAR HORMONE RECEPTORS
• Steroid IRs homodimerize onto palindromic response elements
• Other IRs heterodimerize onto double or inverted repeats
THE STEROID DIMER INTERFACE
Can identify an interface that is unique to a subgroup (steroids) by restricting ET to that branch
DO THRs USE THE DIMER INTERFACE ?
Can test whether a specific subgroup uses a given functional site. NO!
DO PPARs USE THE DIMER INTERFACE ?
Can test whether a specific subgroup uses a given functional site. NO!
DO RXRs USE THE DIMER INTERFACE ?
NO!
DO RARs USE THE DIMER INTERFACE ?
Can test whether a specific branch uses a given functional site. YES !
SUBGROUP ANALYSIS or DIFFERENCE ET
• ET IDENTIFIES RESIDUES INVARIANT WITHIN EVERY BRANCH OF A FAMILY.
• BRANCHES CAN BE PRUNED, TO SEARCH FOR SURFACES UNIQUE TO THE REMAINING BRANCHES.
• IF FOUND, THESE SURFACES SUGGEST THAT THE REMAINING BRANCHES SHARE COMMON SPECIFIC FUNCTIONS.
[Diagram: traces over all branches A, B, C, D compared with traces over pruned subsets of branches]
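Pruning can be illustrated on the toy alignment used earlier in the talk. The sketch below is an assumption-laden illustration, not the lab's implementation: it treats each group as a branch, computes the positions invariant within every branch, then repeats the computation on a pruned set of branches and reports what is newly found. Branch names and function names are hypothetical.

```python
def trace_positions(branches, alignment_gap="."):
    """Positions that are invariant (and ungapped) within every branch."""
    length = len(next(iter(branches.values()))[0])
    positions = set()
    for pos in range(length):
        columns = [{seq[pos] for seq in seqs} for seqs in branches.values()]
        if all(len(col) == 1 and alignment_gap not in col for col in columns):
            positions.add(pos)
    return positions


def difference_trace(branches, keep):
    """Positions that become trace positions only after pruning to `keep`."""
    kept = {name: branches[name] for name in keep}
    return trace_positions(kept) - trace_positions(branches)


BRANCHES = {
    "K": ["KERTFTGHKKLM", "KERTFTGHKRLM", "KERTFTVHKRLM", "KEKTFTGHKKLM"],
    "V": ["VERTFTGHKSQM", "VERTDTGHKRQM", "VERTFTGMKRQM"],
    "A1": ["ASR.YTGVKKNV", "ASR.YTGVKKNV"],
    "A2": ["ASR.YTGHKKNM", "ASR.YTGHKKNM"],
}

print(sorted(trace_positions(BRANCHES)))                # [0, 1, 5, 8, 10, 11]
print(sorted(difference_trace(BRANCHES, ["K", "V"])))   # [3]
```

With all four branches the trace positions match the XX___T__K_XX trace (0-based positions 0, 1, 5, 8, 10, 11). Pruning the two ASR branches exposes position 3, which is gapped in those branches but invariant in the remaining ones.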
INTRACELLULAR RECEPTORS DNA BINDING DOMAIN
• Identify protein-DNA binding sites.
• Suggest how DNA recognition specificity is encoded.
• Identify subgroup specific active sites (dimerization, LH).
• Find sites, and by inference functions, that may be shared by distant branches of a sequence family.
• Consistency and intersection of ET signal is important
Lichtarge et al. J. Mol. Biol. (1997) 274:325-337
PROBLEM
Given a protein structure
• Where is the active site?
• What are the key residue determinants of function?
METHOD
The Evolutionary Trace (ET): use evolution’s mutations and assays
• Overview, control studies: SH2, SH3, ZnF
• Bona fide predictions of functional sites and 4º structure: Gα, RGS
• Functional annotation: RGS, ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality
EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES
FUNCTIONAL ANNOTATION
1. How does specificity arise at a functional site?
2. Do these proteins perform the same function?
Asthma. Expert Opin Investig Drugs. 2000 Bertrand CP, Ponath PD
Cardiac diseases Cell Signal. 2000 Chakraborti S, Chakraborti T, Shaw G.
Inflammation and infectious diseases. Blood Murdoch C, Finn A. 2000 May 15;95(10):3032-43.
Proliferative vascular disease. Life. 1999 Sep;48(3):257-61. Iaccarino G,
Hypercalcemia of malignancy: Int J Oncol. 2000 Rabbani SA
Allergic lung disease Inflamm Res. 1999 . Wells TN, Proudfoot AE.
Hyperthyroidism. Thyroid. 1999 Jul;9(7):727-33. Zimmerman D.
HIV-1 co-receptor. Annu Rev Immunol. 1999; Berger EA,
The next generation of drug targets? Br J Pharmacol. 1998 Wilson S, et al
Kidney: Exp Nephrol. 1998 Breyer MD.
Calcium receptor Exp Nephrol. 1998 Hory B, et al.
Nephrogenic diabetes insipidus. J Mol Med. 1998 Oksche A, Rosenthal W.
Lung cancer. J Clin Oncol. 1998 Salgia R, Skarin AT.
Alzheimer's disease Life Sci. 1995 Flynn DD, Ferrari-DiLeo G, Levey AI, Mash DC.
GPCRs
• Ubiquitous eukaryotic receptors
• Mediate
  • sight/smell/taste
  • nearly all neuroendocrine signaling
  • all autonomic physiology
• 40-60% of all drugs target GPCRs
• 7 transmembrane helices
• Variable length of loops and termini (N is out, C is in)
• 5 main classes:
  • Rhodopsin-like
  • Secretin-like
  • Metabotropic glutamate / pheromone
  • Fungal pheromone
  • cAMP Dicty receptors
  • Frizzled/Smoothened
  • Drosophila odorant
  • Nematode Chemor…
  • Class Y
  • Bacterial rhodopsins
• Helices 3, 6, 7 change relative orientations upon activation.
PURPOSE OF ET STUDIES IN GPCRS
Determine
• Ligand binding site
• Conformational switch involving H3, H6,...
• Dimerization?
• G protein coupling site
Goals
• Target mutations and drug design
• Predict the G protein target
• Create constitutively active receptors for assays
• Modify G protein target for assay purposes
STRATEGY TO IDENTIFY FUNCTIONAL DETERMINANTS IN A RECEPTOR FAMILY
Trace of GPCR X - Trace of all GPCRs = X specificity determinants
ET IN GPCRS
This strategy is only justified if GPCRs have related structures and function.
• Do they share a common structure?
• Do they share common functional determinants?
1. Show similarities in related GPCRs (+ control)
2. Show similarities in unrelated GPCRs (test)
3. Show no similarities in non-GPCRs (- control)
PROBLEM
Given a protein structure
• Where is the active site?
• What are the key residue determinants of function?
METHOD
The Evolutionary Trace (ET): use evolution’s mutations and assays
• Overview: SH2, SH3, ZnF
• Functional sites, 4º structure: RGS, Gα
• Functional annotation: ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality
EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES
COGNATE RESIDUES OFTEN HAVE SIMILAR TRACE RANKS
OPSIN
THR
ADR
Trace ranks along helix 5
Peaks = greater importance. Troughs = lesser importance.
Peaks and troughs tend to align, more so closer to the G protein interface.
Evolutionary importance appears to be correlated across GPCR families.
COMBINED
TRACE RANKS ARE CORRELATED IN CLASS A
Helix segments compared (ADR is shifted against the others by the offset):
OPSIN   AVTRILTVLLIFS
C5A     MAVRLMASTYPYA
OLF     GIMRIAFLTYTFL
OFFSET  ADR
-4      VMLAVTAPL....
-3      RVMLAVTAPL...
-2      LRVMLAVTAPL..
-1      ILRVMLAVTAPL.
 0      IILRVMLAVTAPL
 1      .IILRVMLAVTAP
 2      ..IILRVMLAVTA
 3      ...IILRVMLAVT
 4      ....IILRV (truncated in the source)
[Plot: correlation of trace ranks (y axis, -0.2 to 0.4) against offset (x axis, -4 to 4)]
OFFSET -4 -3 -2 -1 0 1 2 3 4OPSIN AVTRILTVLLIFS AVTRILTVLLIFS AVTRILTVLLIFS AVTRILTVLLIFS AVTRILTVLLIFS AVTRILTVLLIFS AVTRILTVLLIFS AVTRILTVLLIFS AVTRILTVLC5A MAVRLMASTYPYA MAVRLMASTYPYA MAVRLMASTYPYA MAVRLMASTYPYA MAVRLMASTYPYA MAVRLMASTYPYA MAVRLMASTYPYA MAVRLMASTYPYA MAVRLMASTOLF G IMRIAFLTYTFL GIMRIAFLTYTFL GIMRIAFLTYTFL GIMRIAFLTYTFL GIMRIAFLTYTFL GIMRIAFLTYTFL GIMRIAFLTYTFL GIMRIAFLTYTFL GIMRIAFLTADR VMLAVTAPL . . RVMLAVTAPL... LRVMLAVTAPL.. ILRVMLAVTAPL. IILRVMLAVTAPL .IILRVMLAVTAP ..IILRVMLAVTA ...IILRVMLAVT . . IILRV
-0.2
-0.1
0
0.1
0.2
0.3
0.4
1 2 3 4 5 6 7 8 9
Series1
• The correlation is significant and greatest at the correct alignment.
• Can this guide the alignment of Class A with Class B?
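The offset scan described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rank vectors are made-up toy values, and the helper names (`average_ranks`, `spearman`, `offset_scan`) are hypothetical, not taken from the ET software.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def offset_scan(ref_ranks, query_ranks, max_shift=4):
    """Correlate ref_ranks[i] with query_ranks[i + shift] for each shift."""
    scores = {}
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(ref_ranks[i], query_ranks[i + shift])
                 for i in range(len(ref_ranks))
                 if 0 <= i + shift < len(query_ranks)]
        if len(pairs) < 2:
            continue  # too little overlap to correlate
        xs, ys = zip(*pairs)
        scores[shift] = spearman(xs, ys)
    return scores


# Toy trace ranks (hypothetical): the query is the reference displaced by two
# positions, so the scan should peak at shift +2.
ref = [1, 5, 2, 8, 3, 9, 4, 7, 6, 10, 2, 5, 1]
query = [3, 7] + ref
scores = offset_scan(ref, query)
best_shift = max(scores, key=scores.get)
print(best_shift, round(scores[best_shift], 3))
```

With these toy values the scan peaks at shift +2, the displacement that was built into the query.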
[Figure: four panels (ADR vs Class A, Class B vs Class A, Class C vs Class A, BR vs. Class A). Each panel plots three quantities against residue position shift (offset, -4 to +4): Spearman rank-order correlation coefficient (-0.2 to 0.5), % identity (0 to 25), and correlation coefficient × % identity (-2 to 12).]
CLASS A GPCRs CAN BE CO-ALIGNED
CLASS A and B GPCRs CAN BE CO-ALIGNED
CLASS A, B, and C GPCRs CAN BE CO-ALIGNED
CLASS A, B, C GPCRs CAN BE CO-ALIGNED BUT BACTERIORHODOPSIN CANNOT
ET (Rhodopsin) - ET (A+B) = trace residues that:
• Surround retinal
• Funnel towards the generic signaling determinants
TRACE RESIDUES UNIQUE TO RHODOPSIN
CHEMOKINE, OPSIN, OLFACTORY, ADRENERGIC
These variations suggest there are significant differences in the details of ligand coupling.
TRACE RESIDUES UNIQUE TO OTHER GPCRs
CONCLUSIONS
Sheikh et al. JBC 1999; Baranski et al. JBC 1999; Geva et al. JBC 2000; Lichtarge et al. Meth. Enzymol. 2001
A strategy to:
• Align receptors from different Classes.
• Identify global determinants of the switch mechanism.
• Identify specific determinants of ligand binding.
FUTURE
• Test in specific receptors.
• Extend to study other aspects of GPCR function (dimerization, G protein specificity).
• Correlate binding specificity determinants with ligand binding affinity data.
PROBLEM
Given a protein structure:
• Where is the active site?
• What are the key residue determinants of function?
METHOD
The Evolutionary Trace (ET): use evolution's mutations and assays.
• Overview: SH2, SH3, ZnF
• Functional sites, 4º structure: RGS, Gα
• Functional annotation: ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality
EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES
LARGE SCALE ET
Large scale ET
• Scalability. Input: tolerance to insertions and deletions
• Statistics. Output: objective significance
• Pipeline. Automation and GUI
Applications
• Structural Genomics
• Functional Genomics
• Pharmaceuticals & Bioengineering
STATISTICS OF CLUSTER NUMBER AND SIZE
If trace residues were randomly picked
ET clusters are fewer and larger than random.
An actual trace of pyruvate decarboxylase.
Random Distribution of the total number of clusters at 15% coverage in Pyruvate Decarboxylase
80 residues out of 537 (15%) were drawn randomly, and the number of clusters was tallied in each of 5000 trials.
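The random-draw experiment can be sketched as a small Monte Carlo simulation. Everything structural here is an assumption made for illustration: residues sit on a hypothetical cubic lattice instead of real pyruvate decarboxylase coordinates, and two picked residues count as touching when they are lattice neighbors. The deck does not spell out its clustering rule, so the resulting counts are not expected to match the slide.

```python
import random
from itertools import product

NEIGHBOR_STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1))


def toy_lattice_coords(n_residues, side=9):
    """Hypothetical stand-in for a structure: one residue per lattice site."""
    coords = list(product(range(side), repeat=3))[:n_residues]
    if len(coords) < n_residues:
        raise ValueError("lattice too small for this many residues")
    return coords


def count_clusters(picked, coords):
    """Count connected groups among the picked residues (neighbors touch)."""
    picked_sites = {coords[i] for i in picked}
    seen = set()
    clusters = 0
    for site in picked_sites:
        if site in seen:
            continue
        clusters += 1
        seen.add(site)
        stack = [site]
        while stack:  # flood-fill one cluster
            x, y, z = stack.pop()
            for dx, dy, dz in NEIGHBOR_STEPS:
                nb = (x + dx, y + dy, z + dz)
                if nb in picked_sites and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return clusters


def null_cluster_counts(n_residues, n_picked, n_trials, seed=0):
    """Tally the cluster count for each random draw of n_picked residues."""
    rng = random.Random(seed)
    coords = toy_lattice_coords(n_residues)
    return [count_clusters(rng.sample(range(n_residues), n_picked), coords)
            for _ in range(n_trials)]


# Same sizes as the slide: 80 of 537 residues, 5000 trials.
counts = null_cluster_counts(n_residues=537, n_picked=80, n_trials=5000)
print(min(counts), max(counts))
```

A real trace would be scored by passing its residue indices to `count_clusters` and comparing the result with this null distribution.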
[Figure: histogram of percentage frequency versus total number of clusters for the random draws, with the 5%, 95%, and 1% levels marked. Inset: 1% threshold versus protein size.]
A random draw of 80 out of 537 (15%) residues will generate 27 clusters or fewer only once every 100 trials.
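Reading a threshold like this off a simulated null distribution can be sketched as follows. The function names are hypothetical, and the `null_counts` list is a hand-made toy distribution, shaped only so that 27 clusters or fewer occurs once in 100 draws. It is not the actual distribution behind the slide.

```python
def lower_tail_threshold(null_counts, alpha=0.01):
    """Largest k whose lower tail P(count <= k) is at most alpha, else None."""
    n = len(null_counts)
    threshold = None
    for k in sorted(set(null_counts)):
        if sum(c <= k for c in null_counts) / n <= alpha:
            threshold = k
        else:
            break
    return threshold


def empirical_p_value(observed, null_counts):
    """Fraction of random draws with as few clusters as observed, or fewer."""
    return sum(c <= observed for c in null_counts) / len(null_counts)


# Hypothetical null distribution over 100 random draws (toy values).
null_counts = [27] * 1 + [30] * 9 + [35] * 40 + [40] * 40 + [45] * 10

print(lower_tail_threshold(null_counts, alpha=0.01))
print(empirical_p_value(35, null_counts))
```

Fewer clusters than the threshold means the picked residues group together more tightly than chance would explain at that level.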
Random Distribution of the total number of clusters at 15% coverage in α-Amylase
[Figure: histogram of percentage frequency versus total number of clusters, with the 1% level marked. Inset: 1% threshold versus protein size.]
A random draw of 64 of 425 residues (15%) will generate fewer than 20 clusters only once every 100 trials in α-amylase.
Random Distribution of the total number of clusters at 15% coverage in annexin III
[Figure: histogram of percentage frequency versus total number of clusters, with the 1% level marked. Inset: 1% threshold versus protein size.]
A random draw of 48 of 323 residues (15%) will generate fewer than 20 clusters only once every 100 trials in annexin III.
SIGNIFICANCE THRESHOLDS FOR THE NUMBER OF CLUSTERS VARY LINEARLY WITH PROTEIN SIZE
[Figure: 1% threshold for the total number of clusters (10 to 30) versus protein size (100 to 500). The two sides of the line are labeled "Significant at the 1% confidence level" and "Not significant at the 1% confidence level".]
Similar linear relationships hold at other confidence levels.
Protein | Function
Ligand binding domain of LDL receptor | LDL receptor
c-Src tyrosine kinase; SH3 | Tyrosine kinase
Biotinyl domain | Carboxylase
Acyl CoA binding protein | Binding protein
c-Src tyrosine kinase; SH2 | Tyrosine kinase
Bikunin | Kunitz type inhibitor
Mannose binding protein | Binds mannose
Tpr2a-domain of Hop | Chaperone
Pseudoazurin | Electron transport
Tpr1 domain of Hop | Chaperone
Regulator of G-protein signaling | Regulator of G-protein signaling
Galectin-3 CRD | Galectin carbohydrate recognition domain
Myoglobin | Oxygen transport
Thermosome | Chaperonin
Poly-A binding protein | Gene regulation
Growth hormone | Growth hormone
Growth hormone receptor | Growth hormone receptor
Astacin | Metalloproteinase (hydrolase)
von Willebrand factor | Blood coagulation
HSP-90 | Chaperone
Glutathione S-transferase, type-III | Transferase
Adenylate kinase | Phosphotransferase
F-MuLV | Viral glycoprotein
Estrogen receptor | Nuclear receptor
Indole-3-glycerol phosphate synthase | Synthase
Triosephosphate isomerase | Gluconeogenesis
Cyclins | Transferase
β-Lactamase | Hydrolase
Deacetoxycephalosporin C | Oxidoreductase
2,5-diketo-D-gluconic acid reductase A | Oxidoreductase
Endonuclease IV | Endonuclease
Dihydropteroate Synthase |
Protein phosphatase-1 | Hydrolase
Signal sequence recognition protein |
Cyclins | Transferase
Thioredoxin reductase | Reductase
Annexin III | Calcium/phospholipid binding protein
Transferrin | Iron transport
Peroxidase | Peroxidase
Rhodopsin | Signaling protein
Serine/Threonine phosphatase | Hydrolase
Citrate synthase | Synthase
Phosphoglycerate kinase | Kinase
α-Amylase | α-Amylase
HIV Reverse transcriptase | Reverse transcriptase
Pyruvate decarboxylase | Carbon-Carbon lyase
ET IN 46 PROTEINS THAT ARE FUNCTIONALLY, STRUCTURALLY, EVOLUTIONARILY DIVERSE
• Folds: 19 α/β, 15 α, 7 β, 2 small, 1 multidomain, 1 membrane protein.
• Origin: 24 eukaryotic, 18 euk.+prok., 2 prokaryotic, 2 viral proteins.
• Role: signaling, metabolism, gene regulation, transport, folding, etc.
Number of Clusters Statistics: 20% coverage fraction, with gaps
[Figure: number of clusters (10 to 30) versus protein size (0 to 600 aa), with lines labeled 30%, 5%, and 0.3%.]
32 proteins have a coverage fraction of ~20%.
• 29 are significant with a p-value ≤ 5%.
• 19 are significant at a p-value ≤ 0.3%.
• Only 3 are not significant at a level of 5%.
[Figure: for each protein below, three views: Structural Epitope, Trace Without Gaps, Trace With Gaps.]
1a3k: Galectin-3 CRD
1elw: Tpr2a-domain of Hop
1bqk: Pseudoazurin
1am1: HSP-90
http://imgen.bcm.tmc.edu/molgen/labs/lichtarge/trace_of_the_week/
[Figure: for each protein below, three views: Structural Epitope, Trace Without Gaps, Trace With Gaps.]
16pk: Phosphoglycerate kinase
3ert: Estrogen receptor
1a80: 2,5-diketo-D-gluconic acid reductase A
1qum: Endonuclease IV
LARGE SCALE ET
• Gap tolerance improves signal:noise and ease of use.
• Trace residues cluster non-randomly.
• Consistent with the cooperative nature of folding/function.
• In nearly all proteins tested thus far, trace clusters are statistically significant and overlap with the binding sites.
Madabushi et al. JMB 2002
SUMMARY
ET is useful to:
• Rank: residue importance
• Identify: functional sites, ligand binding pockets, and specificity determinants (SH2, SH3, GPCRs)
• Anticipate: mutation outcomes (RGS, Gα) and quaternary structure (Gαβγ, Gα-RGS-PDEγ)
• Recognize: remote homology (GPCR)
• Target mutations to relevant sites (GPCR, RGS, NTP, ...)
• Infer which homologs may share functions (ZnF, GPCR)
• It is statistically significant.
• It can be applied to a significant fraction of the PDB.
STRATEGIES OF SEQUENCE ANALYSIS
SEQUENCE CONSERVATION
Sequence A ~ Sequence B
Protein A ~ Protein B
SEQUENCE VARIATION
Deduce R(function, sequence) = 0
Assay Function A & Function B
Structure A ~ Structure B; Function A ~ Function B
Mutate sequences so Sequence A ≠ Sequence B
COMPUTATIONAL BIOLOGICAL
THE EVOLUTIONARY TREE
EVOLUTION INTEGRATES SEQUENCE AND STRUCTURE
SEQUENCE DATABASE
EXPERIMENTS
THEORY
STRUCTURE DATABASE
FUNCTION
Meta-DATABASE
EVOLUTION AS A COMPUTATIONAL PRINCIPLE
FACTS
Annotation of functional sites in proteins
EXPERIMENTS
THEORY
EVOLUTIONARY FILTER
RELEVANT FACTS
USING EVOLUTION TO INTEGRATE SEQUENCE-STRUCTURE-FUNCTION INFORMATION
SEQUENCE
STRUCTURE FUNCTION
EVOLUTION
A FUNDAMENTAL CHALLENGE