341: Introduction to Bioinformatics

341: Introduction to Bioinformatics Dr. Nataša Pržulj Department of Computing Imperial College London [email protected] Winter 2011

Page 1: 341: Introduction to Bioinformatics

341: Introduction to Bioinformatics

Dr. Nataša Pržulj
Department of Computing
Imperial College London
[email protected]

Winter 2011

Page 2: 341: Introduction to Bioinformatics

Topics

• Introduction: biology
• Introduction: graph theory
• Network properties
  – Network/node centralities
  – Network motifs
• Network models
• Network/node clustering
• Network comparison/alignment
• Software tools for network analysis
• Interplay between topology and biology

Page 5: 341: Introduction to Bioinformatics

Network Properties

1. Global Network Properties (Chapter 3 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

They give an overall view of a network:
1) Degree distribution
2) Clustering coefficient and spectrum
3) Average diameter

Page 6: 341: Introduction to Bioinformatics

1) Degree Distribution

[Figure: example network G]

Page 7: 341: Introduction to Bioinformatics

Research debates…

• Degree correlation:
  – Pearson corr. coefficient between degrees of adjacent vertices
  – Average neighbor degree; then average over all nodes of degree k
• Structural robustness and attack tolerance:
  – “Robust, yet fragile”
• Scale-free degree distribution:
  – Party vs. date hubs
    • J.D. Han et al., Nature, 430:88-93, 2004
  – Bias in the data construction (sampling)?
    • M. Stumpf et al., PNAS, 102:4221-4224, 2005
    • J. Han et al., Nature Biotechnology, 23:839-844, 2005
• High degree nodes:
  – Essential genes
    • H. Jeong et al., Nature 411, 2001.
  – Disease/cancer genes
    • Jonsson and Bates, Bioinformatics, 22(18), 2006
    • Goh et al., PNAS, 104(21), 2007
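The two degree-correlation measures listed above can be sketched in a few lines. This is a minimal illustration, not code from the course: the function names and the star-shaped toy network are assumptions made for the example.

```python
from collections import defaultdict


def node_degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def degree_pearson(edges):
    """Pearson correlation between the degrees at the two ends of each edge."""
    deg = node_degrees(edges)
    xs, ys = [], []
    for u, v in edges:
        # Count every undirected edge in both directions so the measure is symmetric.
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    mean = sum(xs) / len(xs)  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var > 0 else float("nan")  # undefined if all degrees are equal


def average_neighbor_degree_by_k(edges):
    """Average neighbor degree of each node, then averaged over all nodes of degree k."""
    deg = node_degrees(edges)
    neighbor_degree_sum = defaultdict(int)
    for u, v in edges:
        neighbor_degree_sum[u] += deg[v]
        neighbor_degree_sum[v] += deg[u]
    per_k = defaultdict(list)
    for v, k in deg.items():
        per_k[k].append(neighbor_degree_sum[v] / k)
    return {k: sum(vals) / len(vals) for k, vals in per_k.items()}


# Hypothetical toy network: one hub connected to three leaves.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
r = degree_pearson(star)
knn = average_neighbor_degree_by_k(star)
```

In the star, high-degree nodes attach only to low-degree nodes, so the correlation is strongly negative and the average neighbor degree falls as k grows.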

Page 8: 341: Introduction to Bioinformatics

2) Clustering Coefficient and Spectrum

• Cv – Clustering coefficient of node v
  CA = 1/1 = 1
  CB = 1/3 = 0.33
  CC = 0
  CD = 2/10 = 0.2
  …
• C = Avg. clust. coefficient of the whole network = avg {Cv over all nodes v of G}
• C(k) – Avg. clust. coefficient of all nodes of degree k
  E.g.: C(2) = (CA + CC)/2 = (1+0)/2 = 0.5
  => Clustering spectrum (example shown is not for G)

Need to evaluate whether the value of C (or any other property) is statistically significant.
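A minimal sketch of these quantities, assuming the usual definition of Cv as the number of edges among the neighbors of v divided by the number of possible neighbor pairs. The network G from the slide is not recoverable from the transcript, so the edge list below is a hypothetical graph constructed only to reproduce the example values CA, CB, CC, CD and C(2).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical edge list (NOT the slide's G), chosen to match the slide's example values.
EDGES = [
    ("A", "B"), ("A", "D"), ("B", "D"), ("B", "C"), ("C", "H"), ("D", "E"),
    ("D", "F"), ("D", "H"), ("E", "F"), ("E", "P"), ("F", "Q"), ("H", "R"),
]


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def clustering_coefficient(adj, v):
    """Edges among the neighbors of v, divided by the number of neighbor pairs."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here for nodes of degree 0 or 1
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """C: average of Cv over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)


def clustering_spectrum(adj):
    """C(k): average of Cv over all nodes of degree k."""
    per_k = defaultdict(list)
    for v in adj:
        per_k[len(adj[v])].append(clustering_coefficient(adj, v))
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_k.items())}


adj = build_adjacency(EDGES)
cc = {v: clustering_coefficient(adj, v) for v in adj}
spectrum = clustering_spectrum(adj)
```

Here `cc["D"]` is 2/10 because D has five neighbors (ten possible pairs) and only two of those pairs are connected.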

Page 9: 341: Introduction to Bioinformatics

3) Average Diameter

• Distance between a pair of nodes u and v:
  Du,v = min {length of all paths between u and v} = min {3,4,3,2} = 2 = dist(u,v)
• Average diameter of the whole network:
  D = avg {Du,v for all pairs of nodes {u,v} in G}
• Spectrum of the shortest path lengths (example shown is not for G)
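For an unweighted network, the distances above can be computed with breadth-first search. The sketch below is illustrative only: the four-node path is a hypothetical example, and skipping unreachable pairs is an assumption about how to treat disconnected networks.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Shortest path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def shortest_path_lengths(adj):
    """Du,v for every unordered pair {u, v} that is connected by a path."""
    nodes = list(adj)
    lengths = []
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            if v in dist:  # assumption: unreachable pairs are left out
                lengths.append(dist[v])
    return lengths


# Hypothetical example: the path a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
lengths = shortest_path_lengths(path)
average_diameter = sum(lengths) / len(lengths)
spectrum = Counter(lengths)  # how many pairs lie at each distance
```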

Page 10: 341: Introduction to Bioinformatics

Network Properties

• Global network properties might not be detailed enough to capture complex topological characteristics of large networks

Page 11: 341: Introduction to Bioinformatics

Network Properties

2. Local Network Properties (Chapter 5 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

• They encompass a larger number of constraints, thus reducing the degrees of freedom in which the networks being compared can vary
• How do we show that two networks are different?
• How do we show that they are the same?
• How do we quantify the level of their similarity?

Page 12: 341: Introduction to Bioinformatics

Network Properties

2. Local Network Properties (Chapter 5 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

1) Network motifs
2) Graphlets:
  2.1) Relative Graphlet Frequency Distance between 2 networks
  2.2) Graphlet Degree Distribution Agreement between 2 networks

Page 14: 341: Introduction to Bioinformatics

1) Network motifs (Uri Alon’s group, ’02-’04)

• Small subgraphs that are overrepresented in a network when compared to randomized networks
• Network motifs:
  – Reflect the underlying evolutionary processes that generated the network
  – Carry functional information
  – Define superfamilies of networks
    - Zi is statistical significance of subgraph i, SPi is a vector of numbers in 0-1
• But:
  – Functionally important but not statistically significant patterns could be missed
  – The choice of the appropriate null model is crucial, especially across “families”
  – Random graphs with the same in- and out-degree distribution as data might not be the best network null model
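The formulas for Zi and SPi did not survive in this transcript. The sketch below assumes the commonly used definitions: Zi is the subgraph count in the real network minus the mean count over randomized networks, divided by the standard deviation of the randomized counts, and SP is the vector of Z-scores normalized to length 1. The counts are invented for illustration, and the use of the population standard deviation is an arbitrary choice.

```python
import math
import statistics


def z_score(real_count, random_counts):
    """Z-score of a subgraph count against counts from randomized networks."""
    mean = statistics.mean(random_counts)
    std = statistics.pstdev(random_counts)  # assumption: population standard deviation
    if std == 0:
        return float("nan")  # no variation in the null model, Z-score undefined
    return (real_count - mean) / std


def significance_profile(z_scores):
    """Normalize a vector of Z-scores to length 1."""
    norm = math.sqrt(sum(z * z for z in z_scores))
    if norm == 0:
        return [0.0 for _ in z_scores]
    # Under this definition an under-represented subgraph gets a negative entry.
    return [z / norm for z in z_scores]


# Hypothetical counts: 10 occurrences in the data, 4 to 6 in four randomized networks.
z = z_score(10, [4, 6, 5, 5])
profile = significance_profile([3.0, -4.0])
```

The result depends entirely on how the randomized networks are generated, which is why the choice of null model matters so much.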

Page 15: 341: Introduction to Bioinformatics

1) Network motifs (Uri Alon’s group, ’02-’04)

http://www.weizmann.ac.il/mcb/UriAlon/

Also, see Pajek, MAVisto, and FANMOD

Page 16: 341: Introduction to Bioinformatics

2) Graphlets (Przulj group, ’04-’10)

Different from network motifs:
• Induced subgraphs
• Of any frequency

N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.
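To make "induced" concrete, the sketch below counts the two connected 3-node graphlets in a small undirected network. Three nodes count as an induced path only if the third edge is absent; if it is present they form a triangle instead. The function names and the toy network are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def count_three_node_graphlets(adj):
    """Return (number of induced 3-node paths, number of triangles)."""
    paths = 0
    triangle_corners = 0
    for v, neighbors in adj.items():
        for a, b in combinations(neighbors, 2):
            if b in adj[a]:
                triangle_corners += 1  # each triangle is seen once from each of its 3 nodes
            else:
                paths += 1  # v is the middle node of the induced path a - v - b
    return paths, triangle_corners // 3


# Hypothetical network: a triangle a-b-c with a tail c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
paths, triangles = count_three_node_graphlets(build_adjacency(edges))
```

The path a - b - c is not counted here because a and c are also connected, so those three nodes induce a triangle.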

Page 19: 341: Introduction to Bioinformatics

2.1) Relative Graphlet Frequency (RGF) distance between networks G and H:

N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.
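The formula itself is missing from this transcript. The sketch below assumes the definition given in the cited paper: with Ni(G) the number of graphlets of type i in G and T(G) the total over all types, Fi(G) = −log(Ni(G)/T(G)) and the distance is the sum over i of |Fi(G) − Fi(H)|. The natural logarithm, the rejection of zero counts, and the short count vectors are assumptions for illustration (the paper uses the 29 graphlets on 3 to 5 nodes).

```python
import math


def relative_frequencies(counts):
    """Fi = -log(Ni / T) for each graphlet type i."""
    if any(n <= 0 for n in counts):
        # Assumption: zero counts are rejected rather than smoothed.
        raise ValueError("every graphlet count must be positive")
    total = sum(counts)
    return [-math.log(n / total) for n in counts]


def rgf_distance(counts_g, counts_h):
    """Sum over graphlet types of |Fi(G) - Fi(H)|."""
    if len(counts_g) != len(counts_h):
        raise ValueError("both networks need counts for the same graphlet types")
    f_g = relative_frequencies(counts_g)
    f_h = relative_frequencies(counts_h)
    return sum(abs(a - b) for a, b in zip(f_g, f_h))


# Hypothetical counts for two graphlet types in two networks.
same = rgf_distance([10, 30, 60], [10, 30, 60])
different = rgf_distance([1, 9], [5, 5])
```

Because the counts are normalized by T, two networks of very different size can still have a distance of 0 if their graphlet proportions match.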

Page 20: 341: Introduction to Bioinformatics

2.2) Graphlet Degree Distributions

Generalize node degree

Page 21: 341: Introduction to Bioinformatics

N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.

Page 23: 341: Introduction to Bioinformatics

T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 6, pg. 257-273, 2008.

Network structure vs. biological function & disease

Graphlet Degree (GD) vectors, or “node signatures”
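A node signature generalizes the degree: instead of counting only the edges a node touches, it counts how often the node sits at each position (orbit) of each graphlet. The full signature in the cited work covers graphlets with up to 5 nodes; the sketch below handles only the four orbits of the 2- and 3-node graphlets, and the orbit numbering 0 to 3 is assumed rather than taken from the slides.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def graphlet_degree_vector(adj, v):
    """Counts for node v: (edge end, end of induced 3-path, middle of induced 3-path, triangle)."""
    neighbors = adj[v]
    orbit0 = len(neighbors)  # the ordinary degree
    orbit3 = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    orbit2 = orbit0 * (orbit0 - 1) // 2 - orbit3  # neighbor pairs that are not connected
    # v is an end of the induced path v - u - w when w is not v and not a neighbor of v.
    orbit1 = sum(1 for u in neighbors for w in adj[u] if w != v and w not in neighbors)
    return (orbit0, orbit1, orbit2, orbit3)


# Hypothetical network: a triangle a-b-c with a tail c-d.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
signatures = {v: graphlet_degree_vector(adj, v) for v in adj}
```

Nodes a and b get the same vector, which reflects that they occupy topologically equivalent positions in this network.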

Page 24: 341: Introduction to Bioinformatics

Similarity measure between “node signature” vectors

T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 6, pg. 257-273, 2008.

Page 25: 341: Introduction to Bioinformatics

T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 6, pg. 257-273, 2008.

Signature Similarity Measure between nodes u and v
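The formula is not preserved in this transcript. The sketch below assumes the form given in the cited paper: for orbit i with weight wi, Di(u,v) = wi · |log(ui + 1) − log(vi + 1)| / log(max{ui, vi} + 2), the distance D(u,v) is the sum of Di divided by the sum of wi, and the similarity is 1 − D(u,v). The paper derives the weights from dependencies between orbits; the uniform default used here is a simplification made for this example.

```python
import math


def signature_similarity(sig_u, sig_v, weights=None):
    """Similarity in [0, 1] between two graphlet degree vectors of equal length."""
    if len(sig_u) != len(sig_v):
        raise ValueError("signatures must have the same length")
    if weights is None:
        weights = [1.0] * len(sig_u)  # assumption: all orbits weighted equally
    total = 0.0
    for u_i, v_i, w_i in zip(sig_u, sig_v, weights):
        numerator = abs(math.log(u_i + 1) - math.log(v_i + 1))
        denominator = math.log(max(u_i, v_i) + 2)  # always at least log(2) > 0
        total += w_i * numerator / denominator
    distance = total / sum(weights)
    return 1.0 - distance


# Hypothetical signatures over four orbits.
identical = signature_similarity([2, 1, 0, 1], [2, 1, 0, 1])
dissimilar = signature_similarity([3], [0])
```

The logarithms dampen the effect of orbits with very large counts, so one heavily used orbit does not dominate the comparison.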

Page 27: 341: Introduction to Bioinformatics

[Figure labels: 40%; SMD1, PMA1, YBR095C]

T. Milenković and N. Pržulj, “Uncovering Biological Network Function via Graphlet Degree Signatures,” Cancer Informatics, 2008:6 257-273, 2008 (Highly Visible).

Page 29: 341: Introduction to Bioinformatics

[Figure labels: 90%*; SMD1, SMB1, RPO26]

*Statistically significant threshold at ~85%

T. Milenković and N. Pržulj, “Uncovering Biological Network Function via Graphlet Degree Signatures,” Cancer Informatics, 2008:6 257-273, 2008 (Highly Visible).

Page 30: 341: Introduction to Bioinformatics

Later we will see how to use this and other techniques to link network structure with biological function

Page 31: 341: Introduction to Bioinformatics

N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” Bioinformatics, vol. 23, pg. e177-e183, 2007.

Generalize Degree Distribution of a network

The degree distribution measures:
• the number of nodes “touching” k edges for each value of k
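As a minimal sketch of this definition, the function below counts, for each k, how many nodes touch exactly k edges. The function name and the star-shaped toy network are assumptions for illustration; nodes with no edges do not appear in an edge list and are therefore not counted.

```python
from collections import Counter, defaultdict


def degree_distribution(edges):
    """Return {k: number of nodes that touch exactly k edges}."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(Counter(degree.values()))


# Hypothetical toy network: one hub connected to three leaves.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
distribution = degree_distribution(star)
```

The graphlet degree distribution applies the same counting to every orbit of every graphlet, with the plain degree as the simplest case.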

Page 34: 341: Introduction to Bioinformatics

The distance is divided by sqrt(2) (to make it between 0 and 1).

This is called Graphlet Degree Distribution (GDD) Agreement between networks G and H.
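The intermediate formulas are missing from this transcript. The sketch below assumes the construction in the cited paper: for each orbit, the number of nodes touching the orbit k times is scaled by 1/k, the scaled values are normalized to sum to 1, the Euclidean distance between the two normalized distributions is divided by sqrt(2), and the agreement for that orbit is 1 minus the distance. Averaging over orbits with the arithmetic mean is one of the variants; all input numbers are invented.

```python
import math


def normalized_distribution(dist):
    """dist maps k >= 1 to the number of nodes touching the orbit k times."""
    scaled = {k: n / k for k, n in dist.items() if k >= 1}  # down-weight large k
    total = sum(scaled.values())
    return {k: s / total for k, s in scaled.items()} if total > 0 else {}


def orbit_agreement(dist_g, dist_h):
    """Agreement in [0, 1] between two networks for a single orbit."""
    n_g = normalized_distribution(dist_g)
    n_h = normalized_distribution(dist_h)
    ks = set(n_g) | set(n_h)
    squared = sum((n_g.get(k, 0.0) - n_h.get(k, 0.0)) ** 2 for k in ks)
    distance = math.sqrt(squared) / math.sqrt(2)  # sqrt(2) bounds the distance by 1
    return 1.0 - distance


def gdd_agreement(dists_g, dists_h):
    """Arithmetic mean of the per-orbit agreements (one dict per orbit)."""
    agreements = [orbit_agreement(g, h) for g, h in zip(dists_g, dists_h)]
    return sum(agreements) / len(agreements)


# Hypothetical distributions for a single orbit.
same = orbit_agreement({1: 5, 2: 3}, {1: 5, 2: 3})
disjoint = orbit_agreement({1: 5}, {2: 4})
overall = gdd_agreement([{1: 5, 2: 3}, {1: 5}], [{1: 5, 2: 3}, {2: 4}])
```

Two normalized distributions with no overlap are at Euclidean distance sqrt(2), which is why dividing by sqrt(2) maps the worst case to a distance of 1 and an agreement of 0.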

Page 35: 341: Introduction to Bioinformatics

Software that implements many of these network properties and compares networks with respect to them: GraphCrunch
http://www.ics.uci.edu/~bio-nets/graphcrunch/

Page 36: 341: Introduction to Bioinformatics

Network properties

3. Network/node centralities (Chapter 4 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

• Rank nodes according to their “topological importance”

Page 37: 341: Introduction to Bioinformatics

3) Network/node centralities

[Figure: example network with nodes labeled 1 to 10]

If nodes are housing communities, where to build a hospital?

Page 38: 341: Introduction to Bioinformatics

3) Network/node centralities

[Figure: example network with nodes labeled 1 to 6]

If nodes are housing communities, where to build a hospital?

Page 39: 341: Introduction to Bioinformatics

Network properties

3. Network/node centralities

• Different centrality measures exist
• Centrality values comparable inside a given network only
• Centrality values of two centrality measures incomparable even within the same network
• Some centrality measures can be applied to connected networks only

Page 40: 341: Introduction to Bioinformatics

3) Network/node centralities

• Degree centrality
• Closeness centrality
• Eccentricity centrality
• Betweenness centrality
• Other centrality measures exist, e.g.:
  – Eigenvector centrality
  – Subgraph centrality
  – …
• Software tools: Visone (social nets) and CentiBiN (biological nets)

Page 41: 341: Introduction to Bioinformatics

3) Network/node centralities

• Degree centrality:
  – Nodes with high degrees have high centrality
    Cd(v) = deg(v)
• Closeness centrality:
  – Nodes with short paths to all other nodes have high centrality
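A minimal sketch of these two measures. The closeness formula is not preserved in the transcript; the code assumes the common definition of 1 divided by the sum of distances from v to all other nodes, which is meaningful only for a connected network. The six-node path is a hypothetical stand-in for the hospital example, since the edges of the slide's figure are not recoverable.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    """Cd(v) = deg(v)."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def closeness_centrality(adj):
    """Assumed definition: 1 / (sum of distances from v to all other nodes)."""
    result = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        result[v] = 1 / total if total > 0 else 0.0
    return result


# Hypothetical network: the path 1 - 2 - 3 - 4 - 5 - 6.
path = {i: set() for i in range(1, 7)}
for i in range(1, 6):
    path[i].add(i + 1)
    path[i + 1].add(i)

degree = degree_centrality(path)
closeness = closeness_centrality(path)
best = {v for v in path if closeness[v] == max(closeness.values())}
```

On this path every inner node has the same degree, so degree centrality cannot pick a hospital site, while closeness singles out the two middle nodes.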

Page 42: 341: Introduction to Bioinformatics

3) Network/node centralities

• Eccentricity centrality:
  – Nodes with short paths to any other node have high centrality
• Betweenness centrality:
  – Nodes (or edges) that occur in many of the shortest paths have high centrality
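Both measures can be sketched for an unweighted, connected network. The formulas are not in the transcript, so the code assumes common definitions: eccentricity centrality as 1 divided by the largest distance from v to any other node, and node betweenness as the number of shortest paths through v (split fractionally when several shortest paths exist), computed with Brandes' accumulation scheme. The six-node path is again a hypothetical example.

```python
from collections import deque


def eccentricity_centrality(adj):
    """Assumed definition: 1 / (largest shortest-path distance from v)."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        farthest = max(dist.values())
        result[s] = 1 / farthest if farthest > 0 else 0.0
    return result


def betweenness_centrality(adj):
    """Unnormalized node betweenness for an undirected, unweighted network."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                         # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}       # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):          # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each pair was counted from both ends


# Hypothetical network: the path 1 - 2 - 3 - 4 - 5 - 6.
path = {i: set() for i in range(1, 7)}
for i in range(1, 6):
    path[i].add(i + 1)
    path[i + 1].add(i)

eccentricity = eccentricity_centrality(path)
betweenness = betweenness_centrality(path)
```

On the path, node 3 lies between 2 nodes on one side and 3 on the other, so it sits on 2 × 3 = 6 shortest paths, while the end nodes sit on none.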

Page 43: 341: Introduction to Bioinformatics

Topics

• Introduction: biology
• Introduction: graph theory
• Network properties
  – Network/node centralities
  – Network motifs
• Network models
• Network/node clustering
• Network comparison/alignment
• Software tools for network analysis
• Interplay between topology and biology