BCB 570 Spring 2008 1
Protein-Protein Interaction Networks & Methods
Julie Dickerson
Electrical and Computer Engineering
Outline
Data for Protein-protein interaction networks
Brief review of network concepts for network analysis
Effect of different data sets
Biological network comparison
Two hybrid system
A protein of interest, referred to as the "bait," is fused to a DNA-binding domain (DBD).
A separate protein, called the "prey," is fused to a transcriptional activation domain (AD).
If these two proteins (the bait and prey) interact, a reporter gene is transcribed.
In general, used for initial identification of interacting proteins, not for detailed characterization of the interaction
Image from http://www.biochem.arizona.edu/classes/bioc568/two-hybrid_system.htm.
Domain Belief
Assumptions:
A domain is a discrete functional and structural unit: it folds as a unit and carries out a particular function.
Proteins consist of a number of these domains, laid out in a linear array along the polypeptide chain.
The properties of a domain are basically the same when the unit is put into a different context (such as a hybrid protein, for instance in the two-hybrid system).
Limitations:
Not all proteins have a domain structure.
In many proteins, domains exist but include portions of the polypeptide from different parts of the chain; for example, a domain might be composed of residues 1-100 and 250-350.
Properties of a domain may change when it is taken out of the context of the intact protein; e.g., some proteins contain "autoinhibitory" regions.
Co-Immunoprecipitation (co-IP)
Used to find out what is binding to a protein.
The protein itself is used as an affinity reagent to isolate its binding partners.
Compared with two-hybrid and chip-based approaches, this strategy has the advantage that the fully processed and modified protein serves as bait.
Proteome Mass Spectrometry
Problems
Noisy data
Many weak associations
Self-activators
Contaminants
Molecules are highly connected
Approach
Get more evidence:
Physical interactions
Synthetic lethality
Co-citation
Co-expression
Literature
MIPS Database GDA1p
PIR Database
DIP
[Figure: DIP entry for GDA1p; node labels: YEL017W, YBR161W, YJL152W, ALD5p, Ssp120p, HPA2p]
Biogrid.org
Analyzing P-P interaction networks
Create networks
Find structure in networks; search for modules or motifs
Analyze results using known databases, functional enrichment, expression data, organelle information, etc.
Giot et al. A protein interaction map of Drosophila melanogaster. Science. 2003 Dec 5;302(5651):1727-36. Epub 2003 Nov 6.
Jonsson, P. F. et al. Bioinformatics 2006 22:2291-2297; doi:10.1093/bioinformatics/btl390
A description of the protein communities identified by k-clique cluster analysis (k = 6)
Find structure
Use cliques or highly connected regions in a network.
Clique Percolation Method (CPM; see Derényi et al., 2005): locates the k-clique percolation clusters of the network.
MCL (Markov Cluster Algorithm): based on simulation of (stochastic) flow in graphs. Enright A.J., Van Dongen S., Ouzounis C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research 30(7):1575-1584 (2002).
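The clique percolation idea can be illustrated with a small sketch. It assumes the usual definition (two k-cliques are adjacent when they share k-1 nodes, and a community is the union of a connected chain of adjacent k-cliques). The function name `k_clique_communities` and the toy graph are made up for illustration, and the brute-force clique enumeration is only practical for very small graphs.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Return k-clique percolation communities as a list of frozensets."""
    # Build an undirected adjacency map.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Brute-force enumeration of all k-cliques (toy-sized graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: link two cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of all cliques in one linked group.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return [frozenset(g) for g in groups.values()]

# Hypothetical toy graph: two triangles sharing an edge (A-B-C, B-C-D),
# a bridge D-E, and a separate triangle E-F-G.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("B", "D"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("F", "G"), ("E", "G")]
communities = k_clique_communities(toy_edges, k=3)
```

With k = 3, the two triangles that share an edge percolate into one community, while the bridge D-E belongs to no triangle and so joins nothing.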
Method: MCL
Cluster definition: natural clusters in a graph are characterised by the presence of many edges between the members of that cluster, and one expects that the number of ‘higher-length’ (longer) paths between two arbitrary nodes in the cluster is high. Random walks on the graph rarely go from one natural cluster to another.
The MCL algorithm finds cluster structure in graphs by deterministically computing (the probabilities of) random walks through the similarity graph, using two operators that transform one set of probabilities into another. It uses the language of stochastic matrices (also called Markov matrices) to capture the mathematical concept of random walks on a graph.
Expansion coincides with taking the power of a stochastic matrix using the normal matrix product, which gives the probabilities of longer random walks between nodes.
Inflation corresponds to taking the Hadamard (entrywise) power of a matrix and then rescaling each column so that it sums to 1 again:
$$(\Gamma_r M)_{pq} = \frac{(M_{pq})^r}{\sum_{i=1}^{k} (M_{iq})^r}$$
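The expansion and inflation steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the reference MCL implementation. The added self-loops, the fixed iteration count, the 1e-6 threshold used to read off clusters, and the toy graph are all assumptions made for the example.

```python
def normalize_columns(m):
    """Rescale each column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def expand(m):
    """Expansion: ordinary matrix product M * M (longer random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, r):
    """Inflation: entrywise (Hadamard) power r, then column rescaling."""
    return normalize_columns([[x ** r for x in row] for row in m])

def mcl(adjacency, r=2.0, iterations=100):
    """Alternate expansion and inflation, then read clusters off the rows."""
    n = len(adjacency)
    # Assumption: add self-loops before normalizing, a common convention.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(iterations):
        m = inflate(expand(m), r)
    # Each row that keeps mass is an attractor; the columns it covers
    # form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = [[0] * 6 for _ in range(6)]
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[u][v] = adj[v][u] = 1
clusters = mcl(adj)
```

On this toy graph the flow across the single bridge edge dies out under repeated inflation, so the two triangles end up as separate clusters. Larger values of r tend to give finer clusterings.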
Example
Adding in Transcriptional Interactions
ChIP-chip with whole genome microarrays determines the range of in vivo DNA binding sites for any given protein
Map protein complexes (interacting proteins and their
Map co-regulated complexes within and across species.
Approach: Cross Species
Sharan R., Ideker T. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, 427-433 (2006).
Network Alignment
Why is this hard?
PathBLAST
Identifies pairs of interaction paths, drawn from the networks of different species or from different processes within a species.
Proteins at equivalent path positions must share strong sequence homology.
The score sums the sequence-alignment terms for the aligned proteins and the probability terms for the interactions along the path, each ideally measured against a null model.
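One way to read this scoring scheme is as a sum of log-odds terms, one per aligned protein pair and one per aligned interaction. The sketch below assumes that form. The function name, the base-10 logarithm, and the background probabilities `p_random` and `q_random` are illustrative choices, not values taken from PathBLAST itself.

```python
import math

def path_alignment_score(homology_probs, interaction_probs,
                         p_random=0.1, q_random=0.1):
    """Log-odds score of one aligned path pair (illustrative form only).

    homology_probs: probability that each aligned protein pair is truly
        homologous, one value per path position.
    interaction_probs: probability that each aligned interaction is real,
        one value per path step.
    p_random, q_random: assumed background (null) probabilities.
    """
    vertex_term = sum(math.log10(p / p_random) for p in homology_probs)
    edge_term = sum(math.log10(q / q_random) for q in interaction_probs)
    return vertex_term + edge_term

# Hypothetical three-protein paths: one well supported, one at background.
strong = path_alignment_score([0.9, 0.8, 0.9], [0.7, 0.6])
background = path_alignment_score([0.1, 0.1, 0.1], [0.1, 0.1])
```

A path whose probabilities all equal the background scores zero; a path supported by strong homology and reliable interactions scores above zero.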
Algorithms for Network Alignment
Scoring: measure similarity of each subnetwork to a predefined structure of interest and the level of conservation of the subnetwork across networks being compared.
Search procedures: find conserved subnetworks of interest.
Edit-Distance Methods
Evolution-based
M: the set of matches, determined by orthology relationships between pairs of proteins
N: the set of mismatched interactions, i.e., pairs of matches in which only one of the two protein pairs interacts
D: the union of the sets of duplicated protein pairs within each network
$$S = \sum_{a \in M} m(a) + \sum_{a \in N} n(a) + \sum_{a \in D} d(a)$$
Fit to a desired structure
Maximum likelihood
Compute a log-likelihood ratio that measures the fit to an ideal structure vs. the chance that the subnetwork is observed at random (null hypothesis).
Ratios are summed over aligned subnetworks to give an overall score.
Model of Protein Complex
Model: every two proteins in a complex interact with high probability $\beta$, independently of other protein pairs.
Null: every two proteins interact with a probability that depends on their node degrees, $p(u,v)$.
The log-likelihood ratio that a set of proteins, C, with interactions E(C) forms a complex is:
$$L(C) = \sum_{(u,v) \in E(C)} \log \frac{\beta}{p(u,v)} + \sum_{(u,v) \notin E(C)} \log \frac{1-\beta}{1-p(u,v)}$$
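The log-likelihood ratio for a candidate complex can be computed directly from its definition. In the sketch below, the degree-based null probability `degree[u] * degree[v] / (2 * total_edges)` (capped below 1) is an assumed stand-in for p(u,v), since the slide only says that it depends on node degree. The value 0.9 for the high interaction probability, the function names, and the toy network are likewise illustrative.

```python
import math
from itertools import combinations

def degree_null(edges, cap=0.99):
    """Assumed null model: p(u,v) grows with the product of node degrees."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total_edges = len(edges)

    def p(u, v):
        return min(cap, degree[u] * degree[v] / (2.0 * total_edges))
    return p

def complex_log_likelihood(proteins, edges, null_prob, beta=0.9):
    """Sum log(beta/p) over interacting pairs in the set and
    log((1-beta)/(1-p)) over non-interacting pairs."""
    edge_set = {frozenset(e) for e in edges}
    score = 0.0
    for u, v in combinations(sorted(proteins), 2):
        p = null_prob(u, v)
        if frozenset((u, v)) in edge_set:
            score += math.log(beta / p)
        else:
            score += math.log((1.0 - beta) / (1.0 - p))
    return score

# Hypothetical toy network: triangle A-B-C with a tail C-D-E.
network = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
null_prob = degree_null(network)
dense = complex_log_likelihood({"A", "B", "C"}, network, null_prob)
sparse = complex_log_likelihood({"A", "D", "E"}, network, null_prob)
```

The fully connected triangle scores above zero (more complex-like than the null), while the sparsely connected set {A, D, E} scores below zero.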
Network Queries
Searching
Greedy search: start from a promising seed network, then refine it by local search using edit operations (adding or deleting a protein).
Works well for defined graph structures such as paths or trees
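The greedy seed-and-refine search can be sketched as a hill climb over node sets. The scoring function here (internal edges minus missing internal pairs) and the toy network are hypothetical stand-ins. In practice the score would be one of the subnetwork scores discussed earlier.

```python
from itertools import combinations

def greedy_refine(seed, candidates, score, max_steps=100):
    """Repeatedly apply the best single add/delete edit while it helps."""
    current = set(seed)
    best = score(current)
    for _ in range(max_steps):
        # Candidate edits: add one outside protein or delete one member.
        moves = [current | {v} for v in sorted(candidates - current)]
        if len(current) > 1:
            moves += [current - {v} for v in sorted(current)]
        if not moves:
            break
        top = max(moves, key=score)
        top_score = score(top)
        if top_score <= best:
            break  # local optimum: no single edit improves the score
        current, best = top, top_score
    return current, best

# Hypothetical toy network: triangle A-B-C with a tail C-D-E.
toy_edges = {frozenset(e) for e in
             [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]}

def density_score(nodes):
    """Toy score: internal edges minus missing internal pairs."""
    pairs = [frozenset(p) for p in combinations(sorted(nodes), 2)]
    present = sum(1 for p in pairs if p in toy_edges)
    return present - (len(pairs) - present)

subnetwork, value = greedy_refine({"A"}, {"A", "B", "C", "D", "E"},
                                  density_score)
```

Starting from the seed {A}, the search grows to the triangle {A, B, C} and stops there, because adding D or E would introduce more missing pairs than edges. Like any greedy local search, it can stop at a local optimum that depends on the seed.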
Network Evolution