341: Introduction to Bioinformatics

Dr. Nataša Pržulj
Department of Computing
Imperial College London
natasha@imperial.ac.uk

Winter 2011

Topics

• Introduction: biology
• Introduction: graph theory
• Network properties
  – Network/node centralities
  – Network motifs
• Network models
• Network/node clustering
• Network comparison/alignment
• Software tools for network analysis
• Interplay between topology and biology

Network Properties

1. Global Network Properties (Chapter 3 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

They give an overall view of a network:
1) Degree distribution
2) Clustering coefficient and spectrum
3) Average diameter

1) Degree Distribution

Research debates…

• Degree correlation:
  – Pearson correlation coefficient between the degrees of adjacent vertices
  – Average neighbor degree, then averaged over all nodes of degree k
• Structural robustness and attack tolerance:
  – “Robust, yet fragile”
• Scale-free degree distribution:
  – Party vs. date hubs
    • J.D. Han et al., Nature, 430:88-93, 2004
  – Bias in the data construction (sampling)?
    • M. Stumpf et al., PNAS, 102:4221-4224, 2005
    • J. Han et al., Nature Biotechnology, 23:839-844, 2005
• High degree nodes:
  – Essential genes
    • H. Jeong et al., Nature 411, 2001
  – Disease/cancer genes
    • Jonsson and Bates, Bioinformatics, 22(18), 2006
    • Goh et al., PNAS, 104(21), 2007
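The degree distribution and the two degree-correlation measures listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is undirected and stored as a dict of neighbor sets, and the star graph `star` and all function names are hypothetical, not taken from the course material.

```python
from collections import Counter

# Hypothetical toy graph: a star with center 0 and leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

def degree_distribution(adj):
    """P(k): fraction of nodes that have degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}

def degree_correlation(adj):
    """Pearson correlation between the degrees at the two ends of each edge."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            # Each undirected edge is visited in both directions,
            # which makes the two degree samples symmetric.
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    m = len(xs)
    mean = sum(xs) / m  # by symmetry this is also the mean of ys
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    return cov / var if var > 0 else float("nan")

def avg_neighbor_degree_by_k(adj):
    """Average neighbor degree per node, then averaged over nodes of degree k."""
    by_k = {}
    for v, nbrs in adj.items():
        if nbrs:
            avg = sum(len(adj[w]) for w in nbrs) / len(nbrs)
            by_k.setdefault(len(nbrs), []).append(avg)
    return {k: sum(vals) / len(vals) for k, vals in sorted(by_k.items())}

print(degree_distribution(star))
print(degree_correlation(star))
print(avg_neighbor_degree_by_k(star))
```

In the star, the hub connects only to degree-1 nodes, so the degree correlation is strongly negative (a disassortative network).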

2) Clustering Coefficient and Spectrum

• C_v – clustering coefficient of node v:
  C_A = 1/1 = 1
  C_B = 1/3 = 0.33
  C_C = 0
  C_D = 2/10 = 0.2
  …
• C = avg. clustering coefficient of the whole network = avg {C_v over all nodes v of G}
• C(k) – avg. clustering coefficient of all nodes of degree k
  E.g.: C(2) = (C_A + C_C)/2 = (1+0)/2 = 0.5
  => Clustering spectrum
  E.g. (not for G) [figure not reproduced]

Need to evaluate whether the value of C (or any other property) is statistically significant.
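A minimal sketch of C_v, C, and C(k), assuming the usual definition of C_v as the number of edges among the neighbors of v divided by the number of neighbor pairs (this matches the 1/1, 1/3, and 2/10 values above). The toy graph `toy` is hypothetical, since the example graph G from the slide is not available, and setting C_v = 0 for nodes of degree below 2 is a convention chosen here.

```python
from itertools import combinations

# Hypothetical toy graph: triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

def clustering_coefficient(adj, v):
    """C_v: edges among the neighbors of v / number of neighbor pairs."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here for degree 0 or 1
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """C: average of C_v over all nodes v."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

def clustering_spectrum(adj):
    """C(k): average of C_v over all nodes of degree k."""
    by_k = {}
    for v in adj:
        by_k.setdefault(len(adj[v]), []).append(clustering_coefficient(adj, v))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_k.items())}

print(average_clustering(toy))
print(clustering_spectrum(toy))
```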

3) Average Diameter

• Distance between a pair of nodes u and v:
  D_{u,v} = min {length of all paths between u and v} = min {3,4,3,2} = 2 = dist(u,v)
• Average diameter of the whole network:
  D = avg {D_{u,v} for all pairs of nodes {u,v} in G}
• Spectrum of the shortest path lengths
  E.g. (not for G) [figure not reproduced]
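The three quantities above can be sketched with breadth-first search on an unweighted, undirected graph. The path graph `path` and the function names are hypothetical. Skipping unreachable pairs is an assumption made here; the slide does not say how disconnected networks are handled.

```python
from collections import Counter, deque

# Hypothetical toy graph: the path 1-2-3-4.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

def bfs_distances(adj, source):
    """Shortest path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def all_pairs_distances(adj):
    """D_{u,v} for every unordered pair {u, v} that is connected."""
    nodes = sorted(adj)
    result = {}
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            if v in dist:  # unreachable pairs are skipped (assumption)
                result[(u, v)] = dist[v]
    return result

def average_diameter(adj):
    """D: average of D_{u,v} over all connected pairs."""
    d = all_pairs_distances(adj)
    return sum(d.values()) / len(d)

def path_length_spectrum(adj):
    """Number of node pairs at each shortest path length."""
    counts = Counter(all_pairs_distances(adj).values())
    return dict(sorted(counts.items()))

print(average_diameter(path))
print(path_length_spectrum(path))
```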

Network Properties

• Global network properties might not be detailed enough to capture complex topological characteristics of large networks

2. Local Network Properties (Chapter 5 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

• They encompass a larger number of constraints, thus reducing the degrees of freedom in which the networks being compared can vary
• How do we show that two networks are different?
• How do we show that they are the same?
• How do we quantify the level of their similarity?

1) Network motifs
2) Graphlets:
  2.1) Relative Graphlet Frequency Distance between 2 networks
  2.2) Graphlet Degree Distribution Agreement between 2 networks

1) Network motifs (Uri Alon’s group, ’02-’04)

• Small subgraphs that are overrepresented in a network when compared to randomized networks
• Network motifs:
  – Reflect the underlying evolutionary processes that generated the network
  – Carry functional information
  – Define superfamilies of networks
  – Z_i is the statistical significance of subgraph i; SP_i is a vector of numbers in 0-1
• But:
  – Functionally important but not statistically significant patterns could be missed
  – The choice of the appropriate null model is crucial, especially across “families”
  – Random graphs with the same in- and out-degree distribution as the data might not be the best network null model
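A rough sketch of how overrepresentation can be scored, assuming the commonly used Z-score Z_i = (N_i^real − mean(N_i^rand)) / std(N_i^rand). Several simplifications are made here and are not from the slides: the network is undirected, the only subgraph counted is the triangle, and the null model is a degree-preserving randomization by double-edge swaps. The edge list `edges` and all function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev

# Hypothetical toy network with two triangles (1-2-3 and 4-5-6).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6), (6, 7)]

def node_degrees(edges):
    """Degree of every node, used to check that the null model preserves degrees."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def count_triangles(edges):
    """Number of triangles in a simple undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once per edge, i.e. three times in total.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

def randomize(edges, n_swaps, rng):
    """Null model: double-edge swaps (a,b),(c,d) -> (a,d),(c,b) keep all degrees."""
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if a == d or c == b:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list

def motif_z_score(edges, n_random=100, seed=0):
    """Z-score of the triangle count against degree-preserving random networks."""
    rng = random.Random(seed)
    real = count_triangles(edges)
    rand = [count_triangles(randomize(edges, 10 * len(edges), rng))
            for _ in range(n_random)]
    sd = pstdev(rand)
    if sd == 0:
        return float("nan")  # undefined when all random networks agree
    return (real - mean(rand)) / sd

print(count_triangles(edges), motif_z_score(edges))
```

The choice of null model is exactly the point raised in the last bullet: a different randomization scheme can change which subgraphs come out as significant.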

http://www.weizmann.ac.il/mcb/UriAlon/

Also, see Pajek, MAVisto, and FANMOD

2) Graphlets (Przulj group, ’04-’10)

Different from network motifs:
• Induced subgraphs
• Of any frequency

N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.

2.1) Relative Graphlet Frequency (RGF) distance between networks G and H:
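A minimal sketch of the RGF distance, assuming the definition from the cited 2004 paper: D(G,H) = sum over i of |F_i(G) − F_i(H)|, with F_i(G) = −log(N_i(G) / T(G)), where N_i(G) is the number of graphlets of type i in G and T(G) is the total number of graphlets in G. The graphlet counts are taken as given (counting graphlets is not shown), the three-entry count vectors are hypothetical stand-ins for the 29 graphlet types on 3 to 5 nodes, and rejecting zero counts is a choice made for this sketch only.

```python
from math import log

def relative_frequencies(counts):
    """F_i = -log(N_i / T) for each graphlet type i."""
    total = sum(counts)
    if any(c <= 0 for c in counts):
        # Sketch-only choice: the logarithm is undefined for a zero count.
        raise ValueError("all graphlet counts must be positive in this sketch")
    return [-log(c / total) for c in counts]

def rgf_distance(counts_g, counts_h):
    """Sum over graphlet types of |F_i(G) - F_i(H)|."""
    if len(counts_g) != len(counts_h):
        raise ValueError("count vectors must have the same length")
    f_g = relative_frequencies(counts_g)
    f_h = relative_frequencies(counts_h)
    return sum(abs(a - b) for a, b in zip(f_g, f_h))

# Hypothetical graphlet counts for two networks.
counts_g = [10, 30, 60]
counts_h = [20, 20, 60]
print(rgf_distance(counts_g, counts_h))
```

Because only relative frequencies enter the formula, two networks whose graphlet counts differ by a constant factor are at distance 0.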

2.2) Graphlet Degree Distributions

Generalize node degree

N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.

T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures,” Cancer Informatics, vol. 6, pg. 257-273, 2008.

Network structure vs. biological function & disease

Graphlet Degree (GD) vectors, or “node signatures”

Similarity measure between “node signature” vectors

Signature Similarity Measure between nodes u and v

T. Milenković and N. Pržulj, “Uncovering Biological Network Function via Graphlet Degree Signatures,” Cancer Informatics, 2008:6 257-273, 2008 (Highly Visible).
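A sketch of the signature similarity between two nodes, assuming the definition from the cited 2008 paper: D_i(u,v) = w_i · |log(u_i + 1) − log(v_i + 1)| / log(max{u_i, v_i} + 2), D(u,v) = sum of D_i divided by sum of w_i, and S(u,v) = 1 − D(u,v). Here u_i is the number of times node u touches orbit i. The paper assigns a specific weight w_i to each of the 73 orbits; those weights are not reproduced here, so uniform weights are used as a placeholder assumption, and the short vectors below are hypothetical.

```python
from math import log

def signature_distance(sig_u, sig_v, weights=None):
    """Weighted distance between two graphlet degree vectors."""
    if len(sig_u) != len(sig_v):
        raise ValueError("signatures must have the same length")
    if weights is None:
        # Placeholder assumption: uniform weights instead of the per-orbit ones.
        weights = [1.0] * len(sig_u)
    total = 0.0
    for u_i, v_i, w_i in zip(sig_u, sig_v, weights):
        total += w_i * abs(log(u_i + 1) - log(v_i + 1)) / log(max(u_i, v_i) + 2)
    return total / sum(weights)

def signature_similarity(sig_u, sig_v, weights=None):
    """S(u, v) = 1 - D(u, v); 1 means identical signatures."""
    return 1.0 - signature_distance(sig_u, sig_v, weights)

# Hypothetical 3-orbit signatures (real signatures have 73 entries).
sig_a = [3, 1, 0]
sig_b = [1, 1, 0]
print(signature_similarity(sig_a, sig_b))
```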

[Figure: signature similarity example with the proteins SMD1, YBR095C, SMB1, and RPO26; one label reads 40%]

*Statistically significant threshold at ~85%

Later we will see how to use this and other techniques to link network structure with biological function

N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” Bioinformatics, vol. 23, pg. e177-e183, 2007.

Generalize Degree Distribution of a network

The degree distribution measures:
• the number of nodes “touching” k edges for each value of k

… / sqrt(2) (to make the value lie between 0 and 1)

This is called the Graphlet Degree Distribution (GDD) agreement between networks G and H.
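A sketch of the agreement for a single orbit, assuming the definition from the cited 2007 paper: with d(k) the number of nodes touching the orbit k times, scale S(k) = d(k)/k, normalize N(k) = S(k) / sum of S(k), take the Euclidean distance between N_G and N_H divided by sqrt(2), and set the agreement to 1 minus that distance. The full GDD agreement averages this over all 73 orbits. Only orbit 0 (the ordinary node degree) is used below because computing the other orbits requires graphlet counting, and the two degree lists are hypothetical.

```python
from collections import Counter
from math import sqrt

def normalized_distribution(graphlet_degrees):
    """N(k): distribution of d(k)/k, normalized to sum to 1 (k >= 1 only)."""
    counts = Counter(k for k in graphlet_degrees if k >= 1)
    scaled = {k: c / k for k, c in counts.items()}
    total = sum(scaled.values())
    return {k: s / total for k, s in scaled.items()} if total else {}

def orbit_agreement(degrees_g, degrees_h):
    """1 - (Euclidean distance between N_G and N_H) / sqrt(2), for one orbit."""
    n_g = normalized_distribution(degrees_g)
    n_h = normalized_distribution(degrees_h)
    ks = set(n_g) | set(n_h)
    dist = sqrt(sum((n_g.get(k, 0.0) - n_h.get(k, 0.0)) ** 2 for k in ks))
    return 1.0 - dist / sqrt(2)

# Orbit 0 is the node degree: a 3-node path versus a triangle (hypothetical).
path_degrees = [1, 1, 2]
triangle_degrees = [2, 2, 2]
print(orbit_agreement(path_degrees, triangle_degrees))
```

Two normalized distributions with no degree value in common are at Euclidean distance sqrt(2), which is why dividing by sqrt(2) bounds the value by 1.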

Software that implements many of these network properties and compares networks with respect to them: GraphCrunch
http://www.ics.uci.edu/~bio-nets/graphcrunch/

Network properties

3. Network/node centralities (Chapter 4 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

• Rank nodes according to their “topological importance”

[Figure: example network with nodes labeled 1 to 6]

If nodes are housing communities, where to build a hospital?

• Different centrality measures exist
• Centrality values comparable inside a given network only
• Centrality values of two centrality measures incomparable even within the same network
• Some centrality measures can be applied to connected networks only

• Degree centrality
• Closeness centrality
• Eccentricity centrality
• Betweenness centrality
• Other centrality measures exist, e.g.:
  – Eigenvector centrality
  – Subgraph centrality
  – …

• Software tools: Visone (social nets) and CentiBiN (biological nets)

• Degree centrality:
  – Nodes with high degrees have high centrality
  – C_d(v) = deg(v)
• Closeness centrality:
  – Nodes with short paths to all other nodes have high centrality
• Eccentricity centrality:
  – Nodes with short paths to any other node have high centrality
• Betweenness centrality:
  – Nodes (or edges) that occur in many of the shortest paths have high centrality
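The four measures can be sketched as follows, assuming the standard definitions: closeness as 1 / (sum of distances from v), eccentricity as 1 / (largest distance from v), and betweenness as the sum, over pairs s,t, of the fraction of shortest s-t paths that pass through v. The graph must be connected. The 6-node path used for the hospital question is an assumption, since the drawing from the slide is not available, and all names are hypothetical.

```python
from collections import deque

# Assumed example: nodes 1..6 arranged as the path 1-2-3-4-5-6.
path6 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}

def shortest_path_info(adj, s):
    """BFS from s: distances and number of shortest paths to every node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]  # every shortest path to u extends to w
    return dist, sigma

def centralities(adj):
    """Degree, closeness, eccentricity, betweenness for a connected graph."""
    info = {s: shortest_path_info(adj, s) for s in adj}
    result = {}
    for v in adj:
        dist_v, sigma_v = info[v]
        others = [d for w, d in dist_v.items() if w != v]
        between = 0.0
        for s in adj:
            dist_s, sigma_s = info[s]
            for t in adj:
                if s == t or v in (s, t):
                    continue
                # v lies on a shortest s-t path iff the distances add up.
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    between += sigma_s[v] * sigma_v[t] / sigma_s[t]
        result[v] = {
            "degree": len(adj[v]),
            "closeness": 1 / sum(others),
            "eccentricity": 1 / max(others),
            "betweenness": between / 2,  # each unordered pair was counted twice
        }
    return result

for node, values in centralities(path6).items():
    print(node, values)
```

On this path, nodes 3 and 4 score highest on closeness, eccentricity, and betweenness, while degree centrality cannot separate them from nodes 2 and 5. This illustrates why values from different centrality measures are not comparable with each other.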
