Proteomics technologies and protein-protein interaction

Michael Kerner (Lars Kiemer, Anders Hinsby)

Center for Biological Sequence Analysis, The Technical University of Denmark

Advanced Bioinformatics – November 2006

Outlining the problem

Around 30% of human proteins still have no annotated function.

Even if the function is known, we often don’t know anything about the big picture (regulation?, multiple functions?, pathogenesis?, mutations?, splice variants?).

In fact, the individual proteins are as interesting as bricks in a wall – what we want to know about is the system.

Example: signal transduction cascade

[Figure: signalling diagram spanning the extracellular space, the cytoplasm and the nucleus. Labelled components: NCAM, FGFR, CB1, Fyn, Fak, Frs2, Shc, Grb2, Sos, Ras, Raf, MEK, MAPK, Rap1, bRaf, PLC, DAGL, PKC, PKA, CaMKII, GAP43, CREB, C-Fos. Labelled small molecules: Ca2+, cAMP, DAG, IP3, PIP2, 2-AG. Output in the nucleus: transcription.]

Obtaining data

High-throughput data can provide information about interactions with other proteins, protein abundance in different tissues, transcriptional regulation, etc.

High-throughput experimental techniques produce large data sets, so manual curation is not feasible.

These data sets often contain false positives, and they still miss many true interactions (false negatives).

But combining several such data sets increases confidence and coverage.
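
As a rough illustration of the last point, the sketch below merges three made-up interaction sets. Coverage is taken as the union of all reported pairs, and the number of data sets reporting a pair serves as a simple confidence proxy. The protein pairs, the data set contents and the helper `combine` are hypothetical, not taken from any real screen.

```python
from collections import Counter

def combine(*datasets):
    """Count, for every undirected pair, how many data sets report it."""
    support = Counter()
    for pairs in datasets:
        # frozenset makes (A, B) and (B, A) the same interaction;
        # the set comprehension lets each data set vote once per pair.
        support.update({frozenset(p) for p in pairs})
    return support

# Hypothetical toy data sets (not real measurements).
y2h = [("Ras", "Raf"), ("Raf", "MEK"), ("Ras", "Sos")]
pulldown = [("Raf", "Ras"), ("MEK", "MAPK")]
literature = [("Ras", "Raf"), ("MEK", "Raf")]

support = combine(y2h, pulldown, literature)
coverage = len(support)                                # pairs seen at least once
confident = {p for p, n in support.items() if n >= 2}  # pairs seen repeatedly

print(coverage, len(confident))  # 4 2
```

No single toy data set contains all four pairs, while only the two pairs reported more than once pass the stricter filter.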

Protein interactions reveal a lot!

Hints of the function of a protein are revealed when its interaction partners are known.

Guilt by association!

Complexes in which none of the interaction partners have known functions are even more interesting.
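
Guilt by association can be sketched as a majority vote over the annotations of a protein's interaction partners. This is a minimal illustration, not the method used for the networks shown later; the function `predict_function`, the toy network and the annotation labels are all assumptions.

```python
from collections import Counter

def predict_function(protein, interactions, annotations):
    """Return the most common annotation among a protein's partners, or None."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    # Only partners with a known annotation get a vote.
    votes = Counter(annotations[p] for p in partners if p in annotations)
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical toy network; "X" is an unannotated protein.
interactions = [("X", "RPL3"), ("X", "RPL5"), ("X", "ARP2"), ("RPL3", "RPL5")]
annotations = {"RPL3": "ribosome", "RPL5": "ribosome", "ARP2": "actin nucleation"}

print(predict_function("X", interactions, annotations))  # ribosome
```

A protein whose partners are all unannotated gets no prediction (`None`), which is the situation described in the last point above.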

Yeast two-hybrid screening

Has been widely used.

Detects only binary interactions.

High false-positive rate.

Proteins are always expressed as chimeras.

Proteins must be able to enter the nucleus.

Affinity purification

Large-scale.

Can be done on any preparation of cells.

Often complexes are purified, and the order of binding is not obtained.

An extra step is needed to identify the purified proteins.

Mass spectrometer

3 principal components:

Ion source: converts the analyte into gas-phase ions.

Mass analyzer(s): separate gas-phase ions by m/z (shown in the figure: Q1, q2, TOF).

Detector: ions are detected as they discharge on the detector.

Mass spectrometry in short

Extremely sensitive.

Mass precision of one atom.

In principle, detection of one relatively short peptide allows unambiguous identification of a protein (in practice: two or three peptides).

Proteins usually have to be digested into smaller peptides before analysis.

Some proteins are difficult to digest with proteases.

Some peptides are very difficult to ionize.

Due to the high sensitivity of the method, contamination is difficult to avoid.

Protein/peptide identification is still mostly qualitative.

Relative (but not yet absolute) protein concentrations can be obtained with more sophisticated experimental setups.
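
To make the digestion step and the m/z value concrete, here is a small sketch. It assumes trypsin as the protease (the slides only say "proteases"), using the common rule of cleaving after K or R unless the next residue is P, and it ignores missed cleavages and modifications. The function names and the example sequence are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # mass of each charge-carrying proton

def tryptic_digest(sequence):
    """Cleave after K or R unless followed by P (no missed cleavages)."""
    # Splitting on an empty match needs Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def mz(peptide, charge=1):
    """m/z of a peptide carrying `charge` protons."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge

# Made-up sequence, for illustration only.
for peptide in tryptic_digest("MKRPGAKLR"):
    print(peptide, round(mz(peptide), 4), round(mz(peptide, charge=2), 4))
```

The same peptide appears at different m/z values depending on its charge state, which is why the analyzer is said to separate ions by m/z rather than by mass.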

Protein interaction databases: Spoke/Matrix

[Figure: an affinity pulldown with bait and prey proteins, drawn as three networks labelled "Spoke", "Matrix" and "Truth?".]
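
The two representations can be sketched as follows, using the usual definitions: the spoke model pairs the bait with every prey, while the matrix model pairs every protein in the pulldown with every other. The function names and protein labels are hypothetical.

```python
from itertools import combinations

def spoke(bait, preys):
    """Spoke model: the bait is paired with every prey."""
    return {frozenset((bait, prey)) for prey in preys}

def matrix(bait, preys):
    """Matrix model: every pair of proteins in the pulldown is connected."""
    return {frozenset(pair) for pair in combinations([bait, *preys], 2)}

# Hypothetical pulldown: one bait and three preys.
s = spoke("Bait", ["P1", "P2", "P3"])
m = matrix("Bait", ["P1", "P2", "P3"])
print(len(s), len(m))  # 3 6
```

For n preys the spoke model yields n pairs and the matrix model n(n+1)/2, so the same experiment can contribute very different pair counts to a database depending on the model. The true set of direct contacts typically lies somewhere between the two.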

Protein interaction databases: Overlap

Protein interaction data: a total of 18,629 articles are represented in the databases (June 2005).

Database               Unique article references   Interaction pairs in unique references
DIP                    1,353                       5,403 (binary?)
MINT                   1,406                       5,430 (spoke)
IntAct                 355                         6,836 (spoke)
GRID                   1,232                       49,135 (binary?)
BIND* (protein part)   5,733                       44,279 (spoke/matrix)
HPRD                   6,989                       14,533 (matrix)

*Approx. 10% of protein-protein interactions in BIND are database imports.

Species bias in available data

A few select organisms are very well-studied, while others are not.

The BIND database, species distribution (Alfarano et al., NAR, 2005):

[Figure: chart with the categories yeast, Drosophila, human, C. elegans, mouse, Helicobacter, Bos taurus, HIV and other.]

Trans-organism protein interaction network

Orthologs?

Orthologous genes are direct descendants of a gene in a common ancestor (O'Brien K, Remm et al. 2005).

[Figure: orthologous genes in S. cerevisiae, D. melanogaster and H. sapiens.]

Trans-organism protein interaction network

[Figure: experimental ("Experim.") interaction data from D. melanogaster, C. elegans and S. cerevisiae, shown together with an H. sapiens network labelled "MOSAIC".]
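
One simple way to carry interactions across species is to map both partners of an observed interaction onto their orthologs. The sketch below illustrates that idea only; it is not a description of how the network in the figure was built. The function `transfer_interactions`, the ortholog table and all protein names are hypothetical.

```python
def transfer_interactions(interactions, orthologs):
    """Map model-organism interactions onto human proteins via orthologs.

    `orthologs` maps a model-organism protein to a set of human orthologs.
    """
    inferred = set()
    for a, b in interactions:
        for ha in orthologs.get(a, ()):
            for hb in orthologs.get(b, ()):
                if ha != hb:  # skip self-pairs created by shared orthologs
                    inferred.add(frozenset((ha, hb)))
    return inferred

# Hypothetical yeast interactions and ortholog table.
yeast = [("yA", "yB"), ("yB", "yC")]
orthologs = {"yA": {"hA"}, "yB": {"hB1", "hB2"}}  # yC has no known ortholog

human = transfer_interactions(yeast, orthologs)
print(sorted(sorted(pair) for pair in human))  # [['hA', 'hB1'], ['hA', 'hB2']]
```

An interaction is only transferred when both partners have an ortholog, and a one-to-many ortholog assignment multiplies the inferred pairs, so inferred edges deserve lower confidence than directly observed ones.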

Repetition of experiments adds credibility

Light blue connection: 1 experiment.

Darker blue connection: >1 experiment, 1 organism.

Purple connection: >1 experiment, >1 organism.
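
The colour scheme amounts to a small classification rule over the evidence for each connection. A minimal sketch, assuming each observation is recorded as an (experiment, organism) pair; the function name and the data layout are hypothetical:

```python
def edge_class(observations):
    """Classify one interaction from its (experiment_id, organism) observations."""
    if not observations:
        raise ValueError("an edge needs at least one observation")
    experiments = {exp for exp, _ in observations}
    organisms = {org for _, org in observations}
    if len(experiments) > 1 and len(organisms) > 1:
        return "purple"        # >1 experiment, >1 organism
    if len(experiments) > 1:
        return "darker blue"   # >1 experiment, 1 organism
    return "light blue"        # 1 experiment

print(edge_class([("exp1", "yeast"), ("exp2", "fly")]))  # purple
```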

Adding co-expression data

Red connector: co-expression in 80 different tissues with a correlation coefficient above 0.7.

Grey nodes: no expression data available.
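
The co-expression criterion can be sketched as a correlation test between two expression profiles. The slide does not say which correlation coefficient was used; Pearson correlation is assumed here, and the profiles below are made up and cover five tissues rather than 80.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long profiles (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpressed(x, y, threshold=0.7):
    """True if the two profiles correlate above the threshold."""
    return pearson(x, y) > threshold

# Hypothetical expression levels across five tissues.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8]
gene_c = [5.0, 1.0, 4.0, 2.0, 3.0]

print(coexpressed(gene_a, gene_b), coexpressed(gene_a, gene_c))  # True False
```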

Nucleolus dynamics

Nodes are coloured according to the level of protein in the nucleolus following transcriptional inhibition (Andersen et al., Nature, 2005).

Legend: relative level of protein in the nucleolus after inhibition of transcription (decreased, unchanged, increased).

Adding up to make high quality associations

Integration of various data sources builds up confidence.
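
The slides do not state how the evidence is combined. One common choice, shown here purely as an assumption, is a noisy-OR: each source gives a probability that the association is real, the sources are treated as independent, and the combined score is the probability that at least one of them is right. The function name and the scores are hypothetical.

```python
def combined_confidence(scores):
    """Noisy-OR combination of per-source probabilities (assumes independence)."""
    p_all_wrong = 1.0
    for s in scores:
        p_all_wrong *= 1.0 - s
    return 1.0 - p_all_wrong

# Hypothetical scores: interaction data, co-expression, ortholog transfer.
print(round(combined_confidence([0.6, 0.5, 0.3]), 2))  # 0.86
```

Three individually weak sources give a combined score higher than any one of them. The independence assumption is a strong one, since data sets often share experimental biases.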

Upon integration comes enlightenment

Identifying functional complexes

Ribosome (predominantly 60S)

DNA repair

SMARCA complex

TFIID

Arp2/3

Summary

Protein-protein interactions can reveal hints about the function of a protein (guilt by association).

Information about protein interactions is obtained with different technologies each with its own advantages and weaknesses.

Due to the high degree of systemic conservation, interactions can be inferred from observed interactions in other species.

Data are always error-prone. Repeated observations build up confidence.

Integrating different types of data can further build up confidence.
