Protein Targeting by Functional Linkage of Non-Homologous Proteins
with examples from M. tuberculosis
[Figure: genome-wide functional linkage map; axes are TB Gene A and TB Gene B, 0 to 4000]
Structural Genomics of Complexes:
Identifying subunits of complexes by analyzing co-evolution of non-homologous proteins, from genome-wide functional linkage maps
Limitations of Relying Entirely on Homology-Based Targeting
• Many (most?) proteins function in complexes made up of non-homologous proteins
• Some (many?) proteins are crystallizable only with their functional partners
This suggests that targeting non-homologous, functionally linked proteins may offer a useful shortcut to learning protein structures and functions
Identifying Subunits of Protein Complexes by Analyzing the
Co-evolution of Non-homologous Proteins
Structural Genomics of Protein Complexes
4 Methods to Infer Non-Homologous Protein Pairs that have Co-evolved and hence are Functionally Linked
• Rosetta Stone: protein fusion
• Phylogenetic Profile: protein co-occurrence
• Gene Neighbor: constant separation
• Operon: small separation
[Figure: schematic showing proteins A, A′, B, and B′]
Figure 7. M. Strong, T. Graeber et al.
Whole Genome Functional Linkage Map (RS, PP, GN, OP methods for TB)
[Figure: whole-genome functional linkage map; axes are TB Gene A and TB Gene B, 0 to 4000]
Classical graphical representation of protein functional linkages
Research of Michael Strong and Morgan Beeby
Requiring 2 or more functional linkages: 1,865 genes make 9,766 linkages
Functional Linkages Between Genes of M. tuberculosis
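A minimal sketch of one way the "2 or more functional linkages" filter could work. It assumes (the slide does not say) that a gene pair is kept when at least two of the four methods link it. The function name, the method keys, and the gene names g1 to g5 are hypothetical.

```python
from collections import Counter

def high_confidence_links(links_by_method, min_methods=2):
    """Keep gene pairs linked by at least `min_methods` methods.

    links_by_method: dict mapping a method name to an iterable of
    (gene_a, gene_b) tuples. Pairs are treated as unordered.
    Returns (kept_pairs, genes_involved).
    """
    support = Counter()
    for pairs in links_by_method.values():
        # De-duplicate within a method so each method votes once per pair.
        for pair in {frozenset(p) for p in pairs}:
            support[pair] += 1
    kept = {pair for pair, n in support.items() if n >= min_methods}
    genes = set().union(*kept) if kept else set()
    return kept, genes

# Invented toy input, for illustration only.
toy_links = {
    "RS": [("g1", "g2")],
    "PP": [("g2", "g1"), ("g3", "g4")],
    "GN": [("g3", "g5")],
    "OP": [("g3", "g4")],
}
kept_pairs, linked_genes = high_confidence_links(toy_links)
```

On the toy input, the pairs g1/g2 and g3/g4 are each supported by two methods and are kept; g3/g5 is supported by one and is dropped.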
Hierarchical Clustering of the Combined Genome-Wide Linkage Map for M. tb Reveals Complexes and Pathways
[Figure: two linkage maps; axes are TB Gene A and TB Gene B, 0 to 5000]
Genome-wide functional linkage map based on 4 methods
Clustered linkage map showing complexes and pathways
Cluster similar linkage patterns: each cluster is a complex or pathway
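A rough sketch of clustering genes by similar linkage patterns. The slides do not state the distance measure or the linkage rule, so this assumes Jaccard distance between each gene's set of linked genes and average-linkage agglomerative merging. Function names, the cutoff, and the toy gene names are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sets of linked genes."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def cluster_genes(profiles, max_distance):
    """Average-linkage agglomerative clustering of linkage profiles.

    profiles: dict gene -> set of genes it is linked to (here each gene
    is also included in its own profile, like the map's diagonal).
    Merging stops when the closest clusters are farther than max_distance.
    """
    clusters = [[g] for g in sorted(profiles)]

    def dist(c1, c2):
        total = sum(jaccard_distance(profiles[x], profiles[y])
                    for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        if dist(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters

# Invented toy profiles: two groups with identical linkage patterns.
toy_profiles = {
    "a": {"a", "b", "c"}, "b": {"a", "b", "c"}, "c": {"a", "b", "c"},
    "x": {"x", "y"}, "y": {"x", "y"},
}
toy_clusters = cluster_genes(toy_profiles, max_distance=0.5)
```

Genes with the same linkage pattern end up in one cluster, which is the sense in which a cluster is read as a candidate complex or pathway.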
[Figure: clustered linkage map, with clusters labeled as follows]
Detoxification
Polyketide and non-ribosomal peptide synthesis
Energy Metabolism, oxidoreductase
Degradation of Fatty Acids
Virulence
Energy Metabolism, oxidoreductase
Amino Acid Biosynthesis
Energy Metabolism, Aerobic Respiration
Lipid Biosynthesis
Degradation of Fatty Acids
Amino Acid Biosynthesis (Branched)
Synthesis and Modification of Macromolecules, rpl, rpm, rps
Biosynthesis of Cofactors, Prosthetic Groups
Purine, Pyrimidine Nucleotide Biosynthesis
Novel Group
Sugar Metabolism
Aromatic Amino Acid Biosynthesis
Energy Metabolism, Anaerobic Respiration
Two Component Systems
Cell Envelope
Cytochrome P450
Chaperones
Biosynthesis of Cofactors
Cell Envelope, Cell Division
Transport/Binding Proteins
Energy Metabolism, TCA
Broad Regulatory, Serine Threonine Protein Kinase
Cell Envelope, Murein Sacculus and Peptidoglycan
Transport/Binding Proteins, Cations
Energy Metabolism, ATP Proton Motive Force
Fig. 4. M. Strong, T. Graeber et al.
Quantitative Assessment of Inferred Protein Complexes
Calculating Probabilities of Co-evolution
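Phylogenetic Profile and Rosetta Stone:
\[ P(k \mid n, m, N) = \frac{\binom{n}{k}\binom{N-n}{m-k}}{\binom{N}{m}} \]
N = number of fully sequenced genomes
n = number of homologs of protein A
m = number of homologs of protein B
k = number of genomes shared in common

Gene Neighbor:
\[ P_m(\le X) = 1 - P_m(> X) = X \sum_{k=0}^{m-1} \frac{(-\ln X)^k}{k!} \]
X = fractional separation of genes

Operon:
P(n), of the form 1 − e^{(…)n} [the coefficient in the exponent did not survive in the slide text]
n = intergenic separation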
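A short sketch of the first two probabilities, under stated assumptions. The phylogenetic-profile score is taken to be the hypergeometric probability P(k | n, m, N) of exactly k shared genomes, using the slide's definitions of N, n, m, and k. The gene-neighbor score is taken to be X times the sum over k = 0 .. m−1 of (−ln X)^k / k!, where m is assumed here to be the number of genomes contributing to X (the slide does not define it for this method). Function names are hypothetical, and the operon formula is omitted because it is not fully recoverable.

```python
from math import comb, factorial, log

def phylo_profile_prob(k, n, m, N):
    """Hypergeometric probability of exactly k shared genomes by chance.

    N genomes in total, protein A has homologs in n, protein B in m.
    """
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)

def phylo_profile_tail(k, n, m, N):
    """Probability of k or more shared genomes by chance."""
    return sum(phylo_profile_prob(j, n, m, N)
               for j in range(k, min(n, m) + 1))

def gene_neighbor_prob(X, m):
    """X * sum_{k=0}^{m-1} (-ln X)^k / k!, for 0 < X <= 1 and m >= 1."""
    return X * sum((-log(X)) ** k / factorial(k) for k in range(m))
```

A small tail probability from `phylo_profile_tail` means the two proteins co-occur across genomes more often than chance would predict, which is the evidence for co-evolution. For m = 1 the gene-neighbor expression reduces to X itself.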
Combining Inferences of Co-Evolution from 4 Methods
We use a Bayesian approach to combine the probabilities from the four methods to arrive at a single probability that two proteins co-evolve:
\[ O_{post} = \frac{P(pos)}{P(neg)} \prod_{i=1}^{4} \frac{P(f_i \mid pos)}{P(f_i \mid neg)} \]
where positive pairs are proteins with common pathway annotation and negative pairs are proteins with different annotation
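A minimal sketch of the combination step, assuming the posterior odds are the prior odds P(pos)/P(neg) multiplied by one likelihood ratio P(f_i | pos)/P(f_i | neg) per method, and that P(neg) = 1 − P(pos). How the likelihood ratios are estimated from annotated pairs is not shown on the slide and is not attempted here; the function names are hypothetical.

```python
from math import prod

def posterior_odds(likelihood_ratios, prior_pos):
    """Combine per-method likelihood ratios with the prior odds.

    likelihood_ratios: one value P(f_i | pos) / P(f_i | neg) per method.
    prior_pos: prior probability that a random pair is a positive pair.
    """
    prior_odds = prior_pos / (1.0 - prior_pos)  # assumes P(neg) = 1 - P(pos)
    return prior_odds * prod(likelihood_ratios)

def odds_to_probability(odds):
    """Convert odds O to a probability O / (1 + O)."""
    return odds / (1.0 + odds)
```

Multiplying the ratios treats the four methods as conditionally independent given the pair's class, which is the usual simplification behind this form of the formula.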
Benchmarking this Approach Against Known Complexes
Ecocyc: Karp et al. NAR, 30, 56 (2002)
True positive interactions are between subunits of known complexes and false positive ones are between subunits of different complexes.
[Figure: ROC plot; x-axis: Fraction of False Positives (0 to 0.009); y-axis: Fraction of True Positives (0 to 0.4); a curve for random assignment is shown for comparison]
For high-confidence links, we find 1/3 of true interactions with only 1/1000 of the false positive ones
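A small sketch of how one point on such an ROC plot can be computed, assuming each candidate pair carries a confidence score and a benchmark label (same complex or different complexes). The function name and the scores below are invented for illustration and are not data from the talk.

```python
def roc_point(scored_pairs, threshold):
    """Return (false-positive fraction, true-positive fraction).

    scored_pairs: iterable of (score, is_true) tuples, where is_true marks
    pairs whose proteins are subunits of the same known complex.
    Pairs with score >= threshold are called linked.
    """
    pairs = list(scored_pairs)
    n_true = sum(1 for _, is_true in pairs if is_true)
    n_false = len(pairs) - n_true
    tp = sum(1 for score, is_true in pairs if is_true and score >= threshold)
    fp = sum(1 for score, is_true in pairs if not is_true and score >= threshold)
    return fp / n_false, tp / n_true

# Invented toy benchmark: 3 true pairs and 4 false pairs.
toy_pairs = [(0.9, True), (0.8, True), (0.7, False), (0.4, True),
             (0.3, False), (0.2, False), (0.1, False)]
strict = roc_point(toy_pairs, threshold=0.75)
loose = roc_point(toy_pairs, threshold=0.35)
```

Sweeping the threshold from high to low traces the curve from the lower left; a strict threshold trades coverage of true interactions for very few false positives.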
Example Complex: NADH Dehydrogenase I
11 of 13 subunits detected
3 false positives
Functional Linkages Among Cytochrome Oxidase Genes
[Figure: linkage graph connecting CtaB, CtaC, CtaD, and CtaE]
Functional linkages relate all 3 components of the cytochrome oxidase complex and also CtaB, the cytochrome oxidase assembly factor
These genes are at four different chromosomal locations
Membrane proteins linked to soluble proteins
From Inferred Protein Complexes to their Structures
PE, PE-PGRS, and PPE Proteins in M. tuberculosis
38 PE proteins; 61 PE-PGRS proteins; 68 PPE proteins
Together comprise about 5% of the genome
No function is known, but some appear to be membrane bound
No structure is known: always insoluble when expressed
Goal: use functional linkages to predict a complex between a PE and a PPE protein, express the complex, and determine its structure
Research of Shuishu Wang and Michael Strong
The Problem of PE and PPE Proteins in M. tb
Construction of a co-expression vector to test for protein-protein interactions (Mike Strong)
[Figure: co-expression construct in pET 29b(+). Labeled elements include the T7 promoter, lac operator, RBS, gene A, a second RBS, gene B, thrombin site, and His tag; restriction sites NdeI, HindIII, KpnI, and NcoI. Transcription gives a polycistronic mRNA; translation gives protein A and protein B (with His tag). Two outcomes are illustrated: the proteins interact (protein-protein interaction), or the proteins do not interact.]
When co-expressed, the PE and PPE proteins, inferred to interact, do form a soluble complex
Sedimentation equilibrium experiments: Mr = 35,200
Rv2430c + Rv2431c, fraction 49, in 20 mM HEPES, 150 mM NaCl, pH 7.8
Concentrations: OD280 0.7, 0.45, 0.15
Expected Mr:
Rv2431c (PE): 10,687 (10,563.12 from mass spec)
Rv2430c + His tag (PPE): 24,072 (23,895.00 from mass spec)
This suggests a possible 1:1 complex between these two proteins
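The arithmetic behind the 1:1 suggestion can be checked directly. The sketch below uses the expected Mr values from the slide and compares a few candidate stoichiometries with the sedimentation equilibrium result; the list of candidates and the variable names are assumptions added for illustration.

```python
PE_MR = 10_687        # Rv2431c (PE), expected Mr from the slide
PPE_HIS_MR = 24_072   # Rv2430c + His tag (PPE), expected Mr from the slide
OBSERVED_MR = 35_200  # sedimentation equilibrium result from the slide

def expected_mr(n_pe, n_ppe):
    """Expected Mr of a complex with n_pe PE and n_ppe PPE chains."""
    return n_pe * PE_MR + n_ppe * PPE_HIS_MR

# Candidate (PE, PPE) stoichiometries, chosen here only for comparison.
candidates = [(1, 1), (2, 1), (1, 2), (2, 2)]
best = min(candidates, key=lambda s: abs(expected_mr(*s) - OBSERVED_MR))
deviation = abs(expected_mr(*best) - OBSERVED_MR) / OBSERVED_MR
```

A 1:1 complex has an expected Mr of 34,759, within about 1.3% of the measured 35,200; the other candidates are off by more than 10,000.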
Crystallization Trials of the Complex Between PE Protein Rv2431c and PPE Protein Rv2430c
Summary
Many functional linkages are revealed from genomic data (high coverage)
Known subunits of E. coli complexes can be identified with high accuracy from functional linkages
Clustered genome-wide functional maps can reveal and organize information on complexes (and pathways)
A protein complex suitable for structural studies has been revealed from functional linkages
The procedures for identifying and producing protein complexes can be adapted for high throughput
Protein Interactions in M. tb.
Analysis of M. tb. Genome: Michael Strong, Debnath Pal, Sulmin Kim
Whole Genome Interaction Maps: Michael Strong, Tom Graeber, Huiying Li, Matteo Pellegrini
Methods of Inferring Interactions: Edward Marcotte, Matteo Pellegrini, Todd Yeates, Michael Thompson
PI of TB Structural Genomics Consortium: Tom Terwilliger