Algorithms for Biological Networks Prof. Tijana Milenković Computer Science and Engineering University of Notre Dame tmilenko@nd.edu Fall 2010


Algorithms for Biological Networks

Prof. Tijana Milenković
Computer Science and Engineering

University of Notre Dame
tmilenko@nd.edu

Fall 2010


Topics

• Introduction: biology
• Introduction: graph theory
• Network properties
  – Network/node centralities
  – Network motifs
• Network models
• Network/node clustering
• Network comparison/alignment
• Software tools for network analysis
• Interplay between topology and biology


Network Properties

1. Global Network Properties (Chapter 3 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

They give an overall view of a network:
1) Degree distribution
2) Clustering coefficient and spectrum
3) Average diameter


1) Degree Distribution

[Figure: example network G]
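The degree distribution gives, for each degree k, the fraction of nodes that have exactly k neighbors. A minimal sketch follows, assuming an undirected network given as an edge list; the function name and the 5-node example are hypothetical and are not the network G from the slide.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: fraction of nodes with degree k} for an undirected edge list.

    Nodes that appear in no edge are not seen, so degree 0 is never reported.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())  # k -> number of nodes of degree k
    n = len(degree)
    return {k: counts[k] / n for k in sorted(counts)}

# Hypothetical example: triangle a-b-c with a tail c-d-e.
example_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
dist = degree_distribution(example_edges)
print(dist)
```

Plotting the fractions against k on log-log axes is the usual way to look for a heavy-tailed (e.g. scale-free) shape.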


Research debates…

• Degree correlation:
  – Pearson corr. coefficient between degrees of adjacent vertices
  – Average neighbor degree; then average over all nodes of degree k
• Structural robustness and attack tolerance:
  – “Robust, yet fragile”
• Scale-free degree distribution:
  – Party vs. date hubs
    • J.D. Han et al., Nature, 430:88-93, 2004
  – Bias in the data construction (sampling)?
    • M. Stumpf et al., PNAS, 102:4221-4224, 2005
    • J. Han et al., Nature Biotechnology, 23:839-844, 2005
• High degree nodes:
  – Essential genes
    • H. Jeong et al., Nature 411, 2001.
  – Disease/cancer genes
    • Jonsson and Bates, Bioinformatics, 22(18), 2006
    • Goh et al., PNAS, 104(21), 2007
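The two degree-correlation measures in the first bullet can be sketched as follows. This is an illustrative implementation under stated assumptions: the network is undirected, each edge is counted in both directions for the Pearson coefficient, and the function names and the star-shaped example are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def degree_correlation(adj):
    """Pearson correlation between the degrees at the two ends of every edge.

    Undefined (division by zero) when all nodes have the same degree.
    """
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:  # each edge is visited once in each direction
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def average_neighbor_degree_by_k(adj):
    """For each degree k, the mean over degree-k nodes of their average neighbor degree."""
    per_k = defaultdict(list)
    for v, nbrs in adj.items():
        if nbrs:
            per_k[len(nbrs)].append(sum(len(adj[w]) for w in nbrs) / len(nbrs))
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_k.items())}

# Hypothetical star network: one hub connected to three leaves.
star = build_adjacency([("hub", "x"), ("hub", "y"), ("hub", "z")])
r = degree_correlation(star)
knn = average_neighbor_degree_by_k(star)
print(r, knn)
```

In the star, high-degree and low-degree nodes are always paired, so the correlation is as negative as it can be and the average neighbor degree falls as k grows.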


2) Clustering Coefficient and Spectrum

[Figure: example network G]

• C_v – Clustering coefficient of node v: the fraction of pairs of neighbors of v that are themselves connected by an edge
  C_A = 1/1 = 1
  C_B = 1/3 = 0.33
  C_C = 0
  C_D = 2/10 = 0.2
  …

• C = Avg. clust. coefficient of the whole network = avg {C_v over all nodes v of G}

• C(k) – Avg. clust. coefficient of all nodes of degree k
  E.g.: C(2) = (C_A + C_C)/2 = (1+0)/2 = 0.5
  => Clustering spectrum
  [Figure: example clustering spectrum (not for G)]

Need to evaluate whether the value of C (or any other property) is statistically significant.
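The per-node coefficient, the network average C, and the spectrum C(k) can be computed in a few lines. The sketch below assumes an undirected network stored as adjacency sets and assigns coefficient 0 to nodes of degree below 2 (a common convention, assumed here). The example graph is hypothetical, not the G drawn on the slide.

```python
from collections import defaultdict
from itertools import combinations

def clustering_coefficients(adj):
    """Per-node coefficient: edges among neighbors / number of neighbor pairs."""
    coeff = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeff[v] = 0.0  # assumed convention for degree 0 or 1
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeff[v] = links / (k * (k - 1) / 2)
    return coeff

def clustering_spectrum(adj, coeff):
    """C(k): mean coefficient over all nodes of degree k."""
    by_degree = defaultdict(list)
    for v, nbrs in adj.items():
        by_degree[len(nbrs)].append(coeff[v])
    return {k: sum(vals) / len(vals) for k, vals in sorted(by_degree.items())}

# Hypothetical example: triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
coeff = clustering_coefficients(graph)
average_c = sum(coeff.values()) / len(coeff)
spectrum = clustering_spectrum(graph, coeff)
print(coeff, average_c, spectrum)
```

To judge significance, the same quantities would be recomputed on an ensemble of randomized networks and compared with the observed values.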


3) Average Diameter

[Figure: example network G with nodes u and v]

• Distance between a pair of nodes u and v:
  D_{u,v} = min {length of all paths between u and v} = min {3,4,3,2} = 2 = dist(u,v)

• Average diameter of the whole network:
  D = avg {D_{u,v} for all pairs of nodes {u,v} in G}

• Spectrum of the shortest path lengths
  [Figure: example spectrum (not for G)]
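For an unweighted network, all distances from one node can be found with breadth-first search, and the average diameter and the spectrum follow by aggregating over node pairs. This is a sketch under the assumptions that the network is undirected, that node labels are orderable, and that unreachable pairs are simply skipped; the example graph is hypothetical.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def shortest_path_spectrum(adj):
    """{d: number of unordered node pairs at distance d} (reachable pairs only)."""
    spectrum = Counter()
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if u < v:  # count each unordered pair once
                spectrum[d] += 1
    return dict(spectrum)

def average_diameter(adj):
    """Mean shortest-path length over all reachable unordered pairs."""
    spectrum = shortest_path_spectrum(adj)
    pairs = sum(spectrum.values())
    return sum(d * c for d, c in spectrum.items()) / pairs

# Hypothetical example: triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
spectrum = shortest_path_spectrum(graph)
avg = average_diameter(graph)
print(spectrum, avg)
```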


Network Properties

• Global network properties might not be detailed enough to capture complex topological characteristics of large networks


Network Properties

2. Local Network Properties (Chapter 5 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

• They encompass a larger number of constraints, thus reducing the degrees of freedom in which the networks being compared can vary
• How do we show that two networks are different?
• How do we show that they are the same?
• How do we quantify the level of their similarity?


Network Properties

2. Local Network Properties (Chapter 5 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

1) Network motifs
2) Graphlets:
  2.1) Relative Graphlet Frequency Distance between 2 networks
  2.2) Graphlet Degree Distribution Agreement between 2 networks


1) Network motifs (Uri Alon’s group, ’02-’04)

• Small subgraphs that are overrepresented in a network when compared to randomized networks
• Network motifs:
  – Reflect the underlying evolutionary processes that generated the network
  – Carry functional information
  – Define superfamilies of networks
    - Z_i is the statistical significance (Z-score) of subgraph i; SP is the vector of Z-scores normalized to length 1
• But:
  – Functionally important but not statistically significant patterns could be missed
  – The choice of the appropriate null model is crucial, especially across “families”
  – Random graphs with the same in- and out-degree distribution as the data might not be the best network null model
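The significance values used for motifs can be illustrated with a short computation. The sketch assumes the usual definitions from the motif literature: the Z-score of subgraph i is (count in the real network minus mean count in randomized networks) divided by the standard deviation over the randomized networks, and the significance profile is the Z-score vector scaled to unit length. The subgraph counts are taken as given; the numbers below are made up, and the population standard deviation is an assumption.

```python
from math import sqrt
from statistics import mean, pstdev

def z_scores(real_counts, random_counts):
    """Z-score per subgraph type.

    real_counts[i]   : count of subgraph i in the real network
    random_counts[i] : counts of subgraph i across the randomized networks
    Fails with a division by zero if a subgraph count never varies.
    """
    return [
        (real - mean(rand)) / pstdev(rand)
        for real, rand in zip(real_counts, random_counts)
    ]

def significance_profile(z):
    """Z-score vector normalized to length 1; entries may be negative."""
    norm = sqrt(sum(x * x for x in z))
    return [x / norm for x in z]

# Made-up counts for two subgraph types and two randomized networks each.
real = [10, 2]
randomized = [[4, 6], [2, 4]]
z = z_scores(real, randomized)
sp = significance_profile(z)
print(z, sp)
```

A large positive Z-score marks a candidate motif, and a large negative one marks an underrepresented pattern. Both depend entirely on which randomized networks serve as the null model, which is the concern raised in the last bullets.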


1) Network motifs (Uri Alon’s group, ’02-’04)

http://www.weizmann.ac.il/mcb/UriAlon/

Also, see Pajek, MAVisto, and FANMOD


2) Graphlets (Przulj group, ’04-’10)

Different from network motifs:
• Induced subgraphs
• Of any frequency

N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.
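"Induced" means that once a set of nodes is chosen, every edge of the network between those nodes belongs to the subgraph. The sketch below counts the two connected 3-node graphlets (the path and the triangle) by brute force over all node triples. It is meant only to make the definition concrete: real graphlet counting goes up to 5 nodes and uses far more efficient enumeration. The function name and example graph are hypothetical.

```python
from itertools import combinations

def count_three_node_graphlets(adj):
    """Count induced connected 3-node subgraphs in an undirected network.

    A triple with exactly 2 edges induces a path; with 3 edges, a triangle.
    Triples with 0 or 1 edges are disconnected and are not graphlets.
    """
    counts = {"path": 0, "triangle": 0}
    for a, b, c in combinations(sorted(adj), 3):
        n_edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if n_edges == 3:
            counts["triangle"] += 1
        elif n_edges == 2:
            counts["path"] += 1
    return counts

# Hypothetical example: triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
graphlet_counts = count_three_node_graphlets(graph)
print(graphlet_counts)
```

Note that the triple {a, b, c} is counted as a triangle only, not also as three paths, because the induced subgraph on those nodes contains all three edges.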


2.1) Relative Graphlet Frequency (RGF) distance between networks G and H:

N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.
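The formula on this slide did not survive in the transcript. As a hedged reconstruction based on the cited paper, the RGF distance compares the log-scaled relative frequencies of each graphlet type: with N_i(G) the number of graphlets of type i in G and T(G) the total over all types, F_i(G) = -log(N_i(G) / T(G)) and the distance is the sum over i of |F_i(G) - F_i(H)|. The sketch below assumes this form, natural logarithms, and strictly positive counts; it should be checked against the paper before use.

```python
from math import log

def rgf_distance(counts_g, counts_h):
    """Relative graphlet frequency distance between two networks.

    counts_g[i], counts_h[i]: number of graphlets of type i in G and in H.
    Assumes every count is positive (log of zero is undefined).
    """
    total_g = sum(counts_g)
    total_h = sum(counts_h)
    distance = 0.0
    for n_g, n_h in zip(counts_g, counts_h):
        f_g = -log(n_g / total_g)
        f_h = -log(n_h / total_h)
        distance += abs(f_g - f_h)
    return distance

# Made-up graphlet counts for two graphlet types.
same = rgf_distance([3, 1], [3, 1])
different = rgf_distance([3, 1], [1, 1])
print(same, different)
```

Because only relative frequencies enter, two networks of very different sizes can still be at distance 0 if their graphlet proportions match.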


2.2) Graphlet Degree Distributions

Generalize node degree


N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.


Network structure vs. biological function & disease

Graphlet Degree (GD) vectors, or “node signatures”

T. Milenković and N. Pržulj, “Uncovering Biological Network Function via Graphlet Degree Signatures,” Cancer Informatics, vol. 6, pg. 257-273, 2008.


Similarity measure between “node signature” vectors


Signature Similarity Measure between nodes u and v
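The formula for this measure was an image on the slide and is missing from the transcript. The sketch below follows the definition in the cited paper as best it can be reconstructed here, so treat it as an assumption to verify: for each orbit i, the distance is D_i = w_i * |log(u_i + 1) - log(v_i + 1)| / log(max(u_i, v_i) + 2), the total distance is the sum of D_i divided by the sum of the weights, and the similarity is 1 minus that distance. The paper derives the orbit weights w_i from orbit dependencies; here they are an optional parameter that defaults to uniform weights, which is a simplification.

```python
from math import log

def signature_similarity(sig_u, sig_v, weights=None):
    """Similarity in [0, 1] between two graphlet degree vectors.

    sig_u[i], sig_v[i]: number of times node u (or v) touches orbit i.
    weights: per-orbit weights; uniform if omitted (an assumption).
    """
    if weights is None:
        weights = [1.0] * len(sig_u)
    total = 0.0
    for w, u_i, v_i in zip(weights, sig_u, sig_v):
        total += w * abs(log(u_i + 1) - log(v_i + 1)) / log(max(u_i, v_i) + 2)
    return 1.0 - total / sum(weights)

# Made-up 3-orbit signatures.
identical = signature_similarity([2, 1, 0], [2, 1, 0])
partial = signature_similarity([2, 1, 0], [5, 0, 3])
print(identical, partial)
```

The logarithms damp the effect of orbits with very large counts, and the +1 and +2 offsets keep the expression defined when a count is 0.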


[Figure: SMD1, PMA1, and YBR095C; 40% signature similarity]


[Figure: SMD1, SMB1, and RPO26; 90%* signature similarity]

*Statistically significant threshold at ~85%


Later we will see how to use this and other techniques to link network structure with biological function


Generalize Degree Distribution of a network

The degree distribution measures:
• the number of nodes “touching” k edges for each value of k

N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” Bioinformatics, vol. 23, pg. e177-e183, 2007.


/ sqrt(2) (to make it between 0 and 1)

This is called Graphlet Degree Distribution (GDD) Agreement between networks G and H.
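Only the final normalization of the formula survives in the transcript. As a hedged reconstruction from the cited paper: for each orbit j, let d(k) be the number of nodes touching that orbit k times. The distribution is scaled as d(k)/k, normalized to sum to 1, and the two networks are compared by Euclidean distance divided by sqrt(2). The per-orbit agreement is 1 minus that distance, and the overall agreement is the mean over orbits. The sketch assumes this form and the arithmetic mean; names and inputs are hypothetical.

```python
from math import sqrt

def normalized_gdd(distribution):
    """Scale a graphlet degree distribution by 1/k and normalize it to sum to 1.

    distribution: {k: number of nodes touching the orbit exactly k times}.
    Entries with k < 1 are ignored; an empty result would divide by zero.
    """
    scaled = {k: count / k for k, count in distribution.items() if k >= 1}
    total = sum(scaled.values())
    return {k: s / total for k, s in scaled.items()}

def orbit_agreement(dist_g, dist_h):
    """Agreement in [0, 1] between two networks for a single orbit."""
    n_g = normalized_gdd(dist_g)
    n_h = normalized_gdd(dist_h)
    ks = set(n_g) | set(n_h)
    squared = sum((n_g.get(k, 0.0) - n_h.get(k, 0.0)) ** 2 for k in ks)
    return 1.0 - sqrt(squared) / sqrt(2)

def gdd_agreement(dists_g, dists_h):
    """Arithmetic mean of the per-orbit agreements."""
    agreements = [orbit_agreement(g, h) for g, h in zip(dists_g, dists_h)]
    return sum(agreements) / len(agreements)

# Made-up distributions for two orbits in networks G and H.
g = [{1: 4, 2: 2}, {1: 4}]
h = [{1: 4, 2: 2}, {2: 4}]
agreement = gdd_agreement(g, h)
print(agreement)
```

In the example the first orbit matches exactly (agreement 1) and the second has disjoint support (agreement 0), so the mean is 0.5. The division by sqrt(2) is what caps the distance between two normalized distributions at 1.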


Software that implements many of these network properties and compares networks with respect to them: GraphCrunch
http://www.ics.uci.edu/~bio-nets/graphcrunch/


Network properties

3. Network/node centralities (Chapter 4 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

• Rank nodes according to their “topological importance”


3) Network/node centralities

[Figure: example network with nodes 1-10]

If nodes are housing communities, where to build a hospital?


3) Network/node centralities

[Figure: example network with nodes 1-6]

If nodes are housing communities, where to build a hospital?


Network properties

3. Network/node centralities

• Different centrality measures exist
• Centrality values comparable inside a given network only
• Centrality values of two centrality measures incomparable even within the same network
• Some centrality measures can be applied to connected networks only


3) Network/node centralities

• Degree centrality
• Closeness centrality
• Eccentricity centrality
• Betweenness centrality

• Other centrality measures exist, e.g.:
  – Eigenvector centrality
  – Subgraph centrality
  – …

• Software tools: Visone (social nets) and CentiBiN (biological nets)


3) Network/node centralities

• Degree centrality:
  – Nodes with high degrees have high centrality
    Cd(v)=deg(v)
• Closeness centrality:
  – Nodes with short paths to all other nodes have high centrality
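Both measures are easy to compute on a small example. The sketch assumes a connected, undirected, unweighted network and takes closeness as 1 divided by the sum of distances from v to all other nodes, a common definition that the transcript itself does not spell out. The 6-node path is a hypothetical stand-in for the "housing communities" picture, whose actual layout is not recoverable from the transcript.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def degree_centrality(adj):
    """Cd(v) = deg(v)."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """1 / (sum of distances from v to all other nodes); connected networks only."""
    return {v: 1.0 / sum(bfs_distances(adj, v).values()) for v in adj}

# Hypothetical path network 1-2-3-4-5-6.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
degree = degree_centrality(path)
closeness = closeness_centrality(path)
best = sorted(v for v in path if closeness[v] == max(closeness.values()))
print(degree, closeness, best)
```

On the path, degree cannot tell the four inner nodes apart, while closeness singles out the two middle nodes, which minimize the total travel distance to everyone else.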


3) Network/node centralities

• Eccentricity centrality:
  – Nodes with short paths to any other node have high centrality
• Betweenness centrality:
  – Nodes (or edges) that occur in many of the shortest paths have high centrality
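A sketch of both measures follows, under these assumptions: the network is connected, undirected and unweighted, with at least two nodes and orderable node labels. Eccentricity centrality is taken as 1 divided by the largest distance from v. The betweenness of v sums, over all unordered pairs s, t not containing v, the fraction of shortest s-t paths that pass through v. The pairwise loop is written for clarity rather than speed, and the path network is hypothetical.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return (dist, sigma): distances from source and number of shortest paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma

def eccentricity_centrality(adj):
    """1 / (largest distance from v to any other node)."""
    return {v: 1.0 / max(bfs_counts(adj, v)[0].values()) for v in adj}

def betweenness_centrality(adj):
    """Sum over pairs {s, t} of the fraction of shortest s-t paths through v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    nodes = sorted(adj)
    result = {v: 0.0 for v in nodes}
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:
            if t not in dist_s:
                continue  # s and t are not connected
            for v in nodes:
                if v == s or v == t or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    result[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return result

# Hypothetical path network 1-2-3-4-5-6.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
eccentricity = eccentricity_centrality(path)
betweenness = betweenness_centrality(path)
print(eccentricity, betweenness)
```

Eccentricity answers the hospital question in the worst-case sense (smallest maximum distance to any community), whereas closeness answers it in the average sense. On the path both pick the two middle nodes, which also carry the most shortest paths.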


Topics

• Introduction: biology
• Introduction: graph theory
• Network properties
  – Network/node centralities
  – Network motifs
• Network models
• Network/node clustering
• Network comparison/alignment
• Software tools for network analysis
• Interplay between topology and biology