Proteomics technologies and protein-protein interaction
Michael Kerner (Lars Kiemer, Anders Hinsby)
Center for Biological Sequence Analysis
The Technical University of Denmark
Advanced Bioinformatics – November 2006
Outlining the problem
Around 30% of the human proteins still have no annotated function.
Even when the function is known, we often know nothing about the bigger picture (regulation, multiple functions, pathogenesis, mutations, splice variants).
In fact, the individual proteins are as interesting as bricks in a wall – what we want to know about is the system.
Example: signal transduction cascade
[Figure: signalling cascade drawn across three compartments (extracellular, cytoplasm, nucleus). Labelled components: NCAM, FGFR, CB1, Fyn, Frs2, Shc, Fak, Grb2, Sos, PLC, DAGL, Ca2+, PKC, PKA, CaMKII, GAP43, Rap1, bRaf, Ras, Raf, MEK, MAPK, CREB, c-Fos.]
Example: signal transduction cascade
[Figure: the same cascade with further labels added: cAMP, DAG, PIP2, 2-AG, IP3, and Transcription as the output in the nucleus.]
Obtaining data
High-throughput data can provide information about interactions with other proteins, protein abundance in different tissues, transcriptional regulation, etc.
High-throughput experimental techniques produce data sets that are too large to curate manually.
These data sets often contain false positives, and they still miss many true interactions.
But combining several such data sets increases confidence and coverage.
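As a rough illustration of this last point, the sketch below pools several interaction data sets and counts how many of them report each pair. The data set names and protein identifiers are invented for the example and are not taken from the lecture.

```python
from collections import Counter

def normalise(pair):
    """Treat A-B and B-A as the same undirected interaction."""
    return tuple(sorted(pair))

def combine(datasets):
    """Count, for every interaction, how many data sets report it."""
    support = Counter()
    for pairs in datasets.values():
        # Using a set means one data set contributes at most one vote per pair.
        for pair in {normalise(p) for p in pairs}:
            support[pair] += 1
    return support

# Hypothetical toy data sets (names and proteins are made up).
datasets = {
    "y2h_screen": [("A", "B"), ("B", "C"), ("A", "D")],
    "pulldown": [("B", "A"), ("C", "D")],
    "literature": [("A", "B"), ("D", "C")],
}

support = combine(datasets)
coverage = set(support)  # union of all data sets: more interactions covered
confident = {p for p, n in support.items() if n >= 2}  # reported more than once
```

The union grows with every data set added (coverage), while the subset reported by two or more data sets is smaller but more trustworthy (confidence).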
Protein interactions reveal a lot!
Knowing a protein's interaction partners gives hints about its function.
Guilt by association!
Complexes in which none of the interaction partners have known functions are even more interesting.
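Guilt by association can be sketched as a simple vote among interaction partners: an unannotated protein is assigned the function that is most common among its annotated neighbours. This is only a minimal illustration; the proteins, interactions and annotations below are invented, and real methods weight the evidence far more carefully.

```python
from collections import Counter, defaultdict

def build_neighbours(interactions):
    """Build an undirected adjacency map from a list of interaction pairs."""
    neighbours = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours

def predict_function(protein, neighbours, annotation):
    """Return the most common annotation among a protein's partners, or None."""
    votes = Counter(
        annotation[p] for p in neighbours.get(protein, ()) if p in annotation
    )
    if not votes:
        return None  # no annotated partner, so no hint is available
    return votes.most_common(1)[0][0]

# Hypothetical toy network and annotations.
interactions = [("X", "P1"), ("X", "P2"), ("X", "P3"), ("P1", "P2"), ("U", "V")]
annotation = {"P1": "DNA repair", "P2": "DNA repair", "P3": "translation"}

neighbours = build_neighbours(interactions)
guess_for_x = predict_function("X", neighbours, annotation)
guess_for_u = predict_function("U", neighbours, annotation)
```

Here `X` inherits the majority annotation of its partners, while `U` gets no prediction because its only partner is itself unannotated.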
Yeast two-hybrid screening
Has been widely used
Only binary interactions
High false positive rate
Proteins are always expressed as chimeras
Proteins must be able to enter the nucleus
Affinity purification
Large-scale
Can be done on any preparation of cells
Often complexes are purified and the order of binding is not obtained
An extra step is needed to identify purified proteins
Mass spectrometer
3 principal components
Ion source: converts the analyte into gas-phase ions
Mass analyzer(s): separate gas-phase ions by m/z
Detector: ions are detected as they discharge on the detector
[Figure: instrument schematic with labels Q1, q2 and TOF]
Mass spectrometry in short
Extremely sensitive
Mass precision of one atom
In principle, detection of one relatively short peptide allows unambiguous identification of a protein (in practice: two or three peptides).
Proteins usually have to be digested into smaller peptides before analysis.
Some proteins are difficult to digest with proteases.
Some peptides are very difficult to ionize.
Due to the high sensitivity of the method, contamination is difficult to avoid.
Protein/peptide identification is still mostly qualitative.
Relative (but not yet absolute) protein concentrations can be obtained with more sophisticated experimental setups.
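To make the digestion and m/z steps concrete, here is a minimal sketch of an in silico digest followed by a peptide mass and m/z calculation. It assumes trypsin as the protease (the slides only say "proteases") with the common rule of cleaving after K or R unless the next residue is P; the residue masses are standard monoisotopic values, and the example sequence is invented.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of each charge-carrying proton

def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def mz(mass, charge):
    """m/z of a peptide carrying `charge` protons."""
    return (mass + charge * PROTON) / charge

# Hypothetical sequence; the analyzer would report an m/z for each peptide.
peptides = tryptic_digest("AAKGGRPLK")
spectrum = {p: round(mz(peptide_mass(p), 2), 4) for p in peptides}
```

Identification then amounts to matching observed m/z values against such theoretical values for every protein in a sequence database, which is why a single sufficiently distinctive peptide can in principle pinpoint a protein.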
Protein interaction databases: Spoke/Matrix
[Figure: an affinity pulldown with bait and prey proteins, drawn as three networks labelled Spoke, Matrix and Truth?]
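The two models named on this slide are different ways of turning one pulldown result into binary pairs: the spoke model links the bait to each prey, while the matrix model links every purified protein to every other. A minimal sketch, with made-up protein names:

```python
from itertools import combinations

def spoke(bait, preys):
    """Spoke model: the bait interacts with each prey, preys are not linked."""
    return {tuple(sorted((bait, prey))) for prey in preys}

def matrix(bait, preys):
    """Matrix model: every protein in the purification is linked to every other."""
    members = sorted({bait, *preys})
    return set(combinations(members, 2))

# Hypothetical pulldown: one bait and three co-purified preys.
bait, preys = "B", ["P1", "P2", "P3"]
spoke_pairs = spoke(bait, preys)
matrix_pairs = matrix(bait, preys)
```

For a bait with n preys the spoke model yields n pairs and the matrix model n(n+1)/2, so the same experiment can contribute very different pair counts depending on how a database records it. The true set of direct contacts usually lies somewhere between the two.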
Protein interaction databases: Overlap
Protein interaction data: a total of 18,629 articles were represented in the databases (June 2005).
Database               Unique article references   Interaction pairs in unique references
DIP                    1,353                       5,403 (binary?)
MINT                   1,406                       5,430 (spoke)
IntAct                 355                         6,836 (spoke)
GRID                   1,232                       49,135 (binary?)
BIND* (protein part)   5,733                       44,279 (spoke/matrix)
HPRD                   6,989                       14,533 (matrix)
*Approx. 10% of protein-protein interactions in BIND are database imports.
Species bias in available data
A few select organisms are very well-studied, while others are not.
Species distribution in the BIND database (Alfarano et al., NAR, 2005):
[Pie chart with the categories Yeast, Drosophila, Human, C. elegans, Mouse, Helicobacter, Bos taurus, HIV, Other]
Trans-organism protein interaction network
Orthologs?
Orthologous genes are direct descendants of a gene in a common ancestor (O'Brien K, Remm et al. 2005).
[Figure: orthologous genes compared across S. cerevisiae, D. melanogaster and H. sapiens]
Trans-organism protein interaction network
[Figure: D. melanogaster (experimental), C. elegans (experimental), S. cerevisiae (experimental), H. sapiens (mosaic)]
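One way to read this slide is that interactions observed in a model organism are carried over to human whenever both partners have a human ortholog. The sketch below shows that transfer step only; the ortholog table and identifiers are hypothetical, and in practice the mapping would come from an orthology resource.

```python
def transfer_interactions(interactions, orthologs):
    """Map each observed pair to the target species when both partners have orthologs."""
    inferred = set()
    for a, b in interactions:
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                if target_a != target_b:  # skip self-pairs created by the mapping
                    inferred.add(tuple(sorted((target_a, target_b))))
    return inferred

# Hypothetical yeast interactions and yeast-to-human ortholog table.
yeast_interactions = [("yA", "yB"), ("yB", "yC")]
yeast_to_human = {"yA": ["hA"], "yB": ["hB1", "hB2"]}  # yC has no human ortholog

human_inferred = transfer_interactions(yeast_interactions, yeast_to_human)
```

Pairs with a partner that lacks an ortholog are dropped, and one-to-many orthology fans a single observation out into several inferred pairs, which is one reason such inferred interactions are treated as weaker evidence than direct experiments.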
Repetition of experiments adds credibility
Light blue connection: 1 experiment.
Darker blue connection: >1 experiment, 1 organism.
Purple connection: >1 experiment, >1 organism.
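The colour scheme in this legend corresponds to a simple three-level classification of each interaction by its supporting evidence. A minimal sketch, assuming each observation is recorded as an (experiment, organism) pair; the record format and identifiers are invented for illustration:

```python
def evidence_tier(observations):
    """Classify one interaction from its list of (experiment_id, organism) records."""
    experiments = {experiment for experiment, _ in observations}
    organisms = {organism for _, organism in observations}
    if len(experiments) > 1 and len(organisms) > 1:
        return "purple"        # >1 experiment, >1 organism
    if len(experiments) > 1:
        return "darker blue"   # >1 experiment, 1 organism
    return "light blue"        # 1 experiment

# Hypothetical evidence records.
single = [("exp1", "yeast")]
repeated = [("exp1", "yeast"), ("exp2", "yeast")]
cross_species = [("exp1", "yeast"), ("exp3", "fly")]

tiers = [evidence_tier(obs) for obs in (single, repeated, cross_species)]
```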
Adding co-expression data
Red connector: co-expression in 80 different tissues with a correlation coefficient above 0.7.
Grey nodes: no expression data available.
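The co-expression criterion can be sketched as a Pearson correlation between two expression profiles, with a pair flagged when the coefficient exceeds 0.7. The slide does not say which correlation measure was used, so Pearson is an assumption here, and the short profiles below stand in for the 80 tissues.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)

def coexpressed(x, y, threshold=0.7):
    """True when the two profiles correlate above the threshold."""
    return pearson(x, y) > threshold

# Hypothetical expression levels across five tissues.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises with gene_a
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]    # falls as gene_a rises

link_ab = coexpressed(gene_a, gene_b)
link_ac = coexpressed(gene_a, gene_c)
```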
Nucleolus dynamics
Nodes are coloured according to the level of protein in the nucleolus following transcriptional inhibition (Andersen et al., Nature, 2005).
Legend (relative level of protein in the nucleolus after inhibition of transcription): decreased, unchanged, increased.
Adding up to make high quality associations
Integration of various data sources builds up confidence
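The slides do not state how the individual sources are scored or combined. One common scheme, shown here purely as an assumption, treats each source as giving an independent probability that the interaction is real and combines them as one minus the product of the individual error probabilities. The source names and scores are invented.

```python
from math import prod

def combined_confidence(scores):
    """Probability that at least one independent line of evidence is correct."""
    return 1.0 - prod(1.0 - s for s in scores)

# Hypothetical per-source confidence for a single interaction.
evidence = {"two_hybrid": 0.5, "pulldown": 0.5, "coexpression": 0.2}

one_source = combined_confidence([evidence["two_hybrid"]])
two_sources = combined_confidence([evidence["two_hybrid"], evidence["pulldown"]])
all_sources = combined_confidence(evidence.values())
```

Under this scheme each added source can only raise the combined score, which matches the intuition of the slide. The independence assumption is rarely true in practice, so such scores tend to be optimistic when sources share experimental biases.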
Upon integration comes enlightenment
Identifying functional complexes
Ribosome (predominantly 60S)
DNA repair
SMARCA complex
TFIID
Arp2/3
Summary
Protein-protein interactions can reveal hints about the function of a protein (guilt by association).
Information about protein interactions is obtained with different technologies, each with its own advantages and weaknesses.
Because biological systems are highly conserved, interactions can be inferred from interactions observed in other species.
Data are always error-prone. Repeated observations build up confidence.
Integrating different types of data can further build up confidence.