ANALYSIS OF GENETIC NETWORKS USING ATTRIBUTED GRAPH MATCHING

Post on 21-Dec-2015


Page 1: ANALYSIS OF GENETIC NETWORKS USING ATTRIBUTED GRAPH MATCHING

ANALYSIS OF GENETIC NETWORKS USING ATTRIBUTED GRAPH MATCHING

BACKGROUND

• Completion of sequencing projects
• Need for functional discovery
• Emerging area of study: large-scale genomic analysis
• Similarity of living systems

GENETIC NETWORKS

• Modelling genetic networks
• Interaction of genes and proteins
• Relationship between topology and function

MOTIVATION

• Common biological processes
• Comparison of networks
• Discovering missing interactions
• Discovering missing genes

GRAPH MATCHING

[Figure: two interaction graphs, G1 and G2, with nodes labelled mpn124, mpn132, mpn133, mpn134, mpn141, mpn145 and mge234, mge235, mge236, mge310, mge312, mge313, mge314, mge336, mge337]

• Search-based algorithm
• Pruning techniques

ROADMAP

• Scale-Free Networks
• Modelling Genetic Networks
• Graph Matching
• Algorithm
• Results

SCALE-FREE NETWORKS

COMPLEX NETWORKS

• Small-world model
– WWW
– Human acquaintances network
– Citation networks
– Biological networks

SMALL-WORLD

• Features:
– Characteristic path length
– Clustering coefficient
– Sparseness
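The first two features can be computed directly from an adjacency list. The sketch below is illustrative only: the function names and the toy graph are assumptions, not part of the original tool, and the clustering coefficient is taken as the average of the per-node values (nodes with fewer than two neighbours count as 0).

```python
from collections import deque

def clustering_coefficient(adj):
    """Average, over all nodes, of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # fewer than two neighbours: contributes 0 to the average
        nbr_list = list(nbrs)
        links = sum(1 for i, u in enumerate(nbr_list)
                    for v in nbr_list[i + 1:] if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def characteristic_path_length(adj):
    """Mean shortest-path length over all connected node pairs, via BFS from every node."""
    dist_sum, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs if pairs else 0.0

# Toy undirected graph (hypothetical): triangle a-b-c with a pendant node d on c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(clustering_coefficient(toy), characteristic_path_length(toy))
```

A small-world graph would show a high clustering coefficient together with a short characteristic path length, while remaining sparse.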

SMALL-WORLD

• Somewhere in between regular & random graphs

SMALL-WORLD

• Highly clustered
• Short diameter

SCALE-FREE NETWORKS

• Complex networks: biological, social, www, power grid, citation etc.

• Power-law connectivity: $P(k) \sim k^{-\gamma}$

• Hubs - authorities

SCALE-FREE NETWORKS

• Application for testing scale-free behavior
• Yeast
• Helicobacter pylori
• Mycoplasma pneumoniae
• Mycoplasma genitalium
• Linear log-log graph
• Slope = $-\gamma$ (the exponent in $P(k) \sim k^{-\gamma}$)

SCALE-FREE NETWORKS

• Slope is calculated by the least mean square method
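A minimal sketch of such a fit, assuming an ordinary least-squares line through the points (log k, log P(k)); the helper names are hypothetical and the slides do not say how the degree distribution was binned.

```python
import math
from collections import Counter

def degree_distribution(degrees):
    """Empirical P(k): fraction of nodes having degree k (degree-0 nodes are skipped)."""
    counts = Counter(d for d in degrees if d > 0)
    n = sum(counts.values())
    return {k: c / n for k, c in counts.items()}

def loglog_slope(pk):
    """Least-squares slope and intercept of log10 P(k) against log10 k."""
    if len(pk) < 2:
        raise ValueError("need at least two distinct degrees to fit a line")
    xs = [math.log10(k) for k in pk]
    ys = [math.log10(p) for p in pk.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic check (not real data): values lying exactly on k^-2 give a slope of about -2.
synthetic = {k: k ** -2.0 for k in (1, 2, 4, 8)}
slope, intercept = loglog_slope(synthetic)
```

For a real network, `degree_distribution` would be applied to the list of node degrees first, and the fitted slope read as an estimate of the negative power-law exponent.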

TOPOLOGY & FUNCTIONALITY

• Small diameter
– Ease of dissemination of information
– Ease of restoring after disturbance
• Cliquishness
– Alternate paths are found
• Heterogeneity
– Random removal does not affect the network
– Hubs are vulnerable to attack

BIOLOGICAL ASPECTS

• Multifunctionality
– Grouped into functional units
• Stability
– Reason: most of the interactions are between hubs and authorities

MODELLING GENETIC NETWORKS

TYPES OF GENETIC NETWORKS

• Categorized by data sources
– Metabolic pathways
– Gene expression arrays
– Protein interactions
– Gene interactions

INTERACTION MAPS

• High-level perspective
– Nodes: genes or proteins
– Edges: presence of an interaction
• Data sources
– Two-hybrid analysis
– Fusion analysis
– Chromosomal proximity
– Phylogenetic analysis

GRAPH MATCHING

PROBLEM DEFINITION

Attributed Relational Graph (ARG)

$G = \{V, E, X\}$

$V = \{v_1, v_2, \ldots, v_n\}$: nodes

$E = \{e_1, e_2, \ldots, e_m\}$: edges

$X = \{x_1, x_2, \ldots, x_n\}$: attributes
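One way to hold such a graph in code is sketched below. The class name, its methods and the two-element attribute vectors are illustrative assumptions; the slides do not describe the tool's data structures, and edges are assumed undirected here because an edge only records the presence of an interaction.

```python
from dataclasses import dataclass, field

@dataclass
class AttributedGraph:
    """Hypothetical container for G = {V, E, X}; edges are assumed undirected."""
    nodes: list = field(default_factory=list)       # V: node names, in insertion order
    edges: set = field(default_factory=set)         # E: frozenset({u, v}) per edge
    attributes: dict = field(default_factory=dict)  # X: one attribute vector per node

    def add_node(self, name, attribute):
        if name not in self.attributes:
            self.nodes.append(name)
        self.attributes[name] = attribute

    def add_edge(self, u, v):
        if u not in self.attributes or v not in self.attributes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add(frozenset((u, v)))

    def neighbours(self, node):
        return {other for edge in self.edges if node in edge
                for other in edge if other != node}

# Gene names are taken from the slides; the attribute vectors are placeholders.
g = AttributedGraph()
g.add_node("MG098", [0.2, 0.8])
g.add_node("MG099", [0.5, 0.5])
g.add_node("MG100", [0.9, 0.1])
g.add_edge("MG098", "MG099")
g.add_edge("MG099", "MG100")
```

Storing each edge as a `frozenset` makes (u, v) and (v, u) the same edge, which matches the undirected assumption.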

INEXACT SUBGRAPH MATCHING

Allow for:

• Mismatching attribute values

• Missing nodes

• Missing links

Also called error-correcting subgraph isomorphism

NP-Complete

SEARCH TECHNIQUES

• Cost function
• Pruning (structure constraints)
• Backtracking
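The three ingredients can be combined in a small branch-and-bound search. This is a generic sketch, not the tool's actual algorithm: the function name, the penalty values and the rule that a node may map to `None` (a missing node) are all assumptions, and pruning here uses only the cost bound, so it relies on every cost being non-negative.

```python
def match_graphs(g1_adj, g2_adj, attr_cost, missing_node=1.0, missing_link=0.5):
    """Map each G1 node to a distinct G2 node or to None, minimising the total cost."""
    # Visit the most connected G1 nodes first.
    order = sorted(g1_adj, key=lambda n: -len(g1_adj[n]))
    best = {"cost": float("inf"), "mapping": None}

    def extend(i, mapping, used, cost):
        if cost >= best["cost"]:
            return  # pruning: this branch cannot beat the best complete mapping
        if i == len(order):
            best["cost"], best["mapping"] = cost, dict(mapping)
            return
        u = order[i]
        for v in list(g2_adj) + [None]:
            if v is not None and v in used:
                continue
            if v is None:
                step = missing_node
            else:
                step = attr_cost(u, v)
                # Penalise G1 links to already-mapped neighbours that are absent in G2.
                for w in g1_adj[u]:
                    if mapping.get(w) is not None and mapping[w] not in g2_adj[v]:
                        step += missing_link
                used.add(v)
            mapping[u] = v
            extend(i + 1, mapping, used, cost + step)
            del mapping[u]  # backtracking: undo the choice and try the next candidate
            if v is not None:
                used.discard(v)

    extend(0, {}, set(), 0.0)
    return best["mapping"], best["cost"]

# Toy example with hypothetical scalar attributes.
g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
g2 = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
attrs = {"a": 0.1, "b": 0.5, "c": 0.9, "x": 0.1, "y": 0.5, "z": 0.9}
cost_fn = lambda u, v: abs(attrs[u] - attrs[v])
mapping, cost = match_graphs(g1, g2, cost_fn)
```

Because the underlying problem is NP-complete, a search like this is only practical when pruning removes most branches early, which is where structural constraints come in.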

ATTRIBUTED GRAPH MATCHING TOOL

ATTRIBUTE MATCHING

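• Amino acid sequence content composition
– Array of 20: percentage of each amino acid
– Amino acids grouped into classes: array of 6
– Amino acid triples grouped into classes: array of 216 (6 x 6 x 6)

Example sequence: MKVLNKNEL

$$S = \sum_{i=1}^{216} \left[ X_1(i) - X_2(i) \right]^2$$

[A second equation, defining $X(a)$ in terms of $O$, $a$, $A$ and $n$, is not recoverable from the transcript]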
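A sketch of the 216-element triple composition and the score S follows. The six-class grouping below is a placeholder, since the slides do not list the classes; using overlapping triples and normalising counts to fractions are also assumptions.

```python
# Hypothetical 6-class grouping of the 20 amino acids (not given in the slides).
CLASSES = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
AA_TO_CLASS = {aa: idx for idx, members in enumerate(CLASSES) for aa in members}

def triple_composition(sequence):
    """216-element vector: fraction of overlapping residue triples per class triple."""
    classes = [AA_TO_CLASS[aa] for aa in sequence if aa in AA_TO_CLASS]
    counts = [0] * 216
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 36 + b * 6 + c] += 1  # index into the 6 x 6 x 6 array
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 216

def composition_score(x1, x2):
    """S = sum over i of [X1(i) - X2(i)]^2; 0 means identical composition."""
    return sum((p - q) ** 2 for p, q in zip(x1, x2))

x_a = triple_composition("MKVLNKNEL")   # example sequence from the slide
x_b = triple_composition("MKVLNKNELG")  # hypothetical variant with one extra residue
score = composition_score(x_a, x_b)
```

A lower S means more similar compositions, so it can serve as the attribute part of a matching cost.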

ATTRIBUTE MATCHING

[Figure: difference in amino acid composition values of gene pairs for M. genitalium and M. pneumoniae; axes labelled "Score" and "observations"]

STRUCTURAL CONSTRAINTS

• Effect of scale-free behaviour
– Connectivity information: highly heterogeneous, thus start with the most connected node and work around it
– Pruning strategy: comparability is determined by the power law

$$\frac{y_2 - y_1}{x_2 - x_1} = \frac{\log P(k_2) - \log P(k_1)}{\log k_2 - \log k_1}$$

STRUCTURAL CONSTRAINTS

• Neighborhood connectivity
– Choose the neighbor at the next stage
• Backtracking
– Component by component
– Go back to the neighbor with the most connectivity within the component
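One possible reading of these constraints is a visiting order for the search: begin at the most connected node, keep expanding to the most connected neighbor reached so far, and move to the next component only when the current one is exhausted. The sketch below illustrates that reading; the function name and the alphabetical tie-breaking are assumptions.

```python
def matching_order(adj):
    """Order nodes hub-first, component by component, expanding through neighbors."""
    order, seen = [], set()
    # Seeds are tried from most to least connected; ties are broken by name.
    for seed in sorted(adj, key=lambda n: (-len(adj[n]), n)):
        if seed in seen:
            continue  # already reached while expanding an earlier component
        frontier = {seed}
        while frontier:
            # Pick the reachable, unvisited node with the most connectivity.
            node = min(frontier, key=lambda n: (-len(adj[n]), n))
            frontier.remove(node)
            seen.add(node)
            order.append(node)
            frontier |= {m for m in adj[node] if m not in seen}
    return order

# Toy graph (hypothetical): a hub component plus a separate two-node component.
toy_graph = {
    "h": {"a", "b", "c"}, "a": {"h", "b"}, "b": {"h", "a"}, "c": {"h"},
    "x": {"y"}, "y": {"x"},
}
visit = matching_order(toy_graph)
```

A backtracking search that assigns nodes in this order would, on failure, return to the most recently assigned node, which is always a connected neighbor within the same component.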

TEST CASE

• Mycoplasma genitalium:
– Smallest genome (470 ORFs)
• Mycoplasma pneumoniae:
– Very similar, superset (688 ORFs)

TEST CASE...

• Mycoplasma genitalium:
– 232 nodes
– 211 links
• Mycoplasma pneumoniae:
– 267 nodes
– 257 links
• Inputs:
– MGE links
– MPN links
– MGE synonyms
– MPN synonyms
– MGE amino acid sequence
– MPN amino acid sequence

RESULTS

[Figure with panels labelled MGE and MPN]

DISCOVERY OF MISSING DATA

• Missing link

• Link between MPN632 and MPN637 is missing in our data but exists in the literature

DISCOVERY OF MISSING DATA

• Missing node with known COG

MPN236 --- MPN237 --- MPN238 --- MPN678
MG098 --- MG099 --- MG100 --- MG459

MG459 is ortholog of MPN678

DISCOVERY OF MISSING DATA

• Missing node without known ortholog

CONCLUSION

• Large-scale genomics
• Interaction data captures system structure and dynamics
• Graph matching exploits the scale-free characteristics
• Novel interactions and genes can be identified

ACKNOWLEDGEMENT

• YASEMİN TÜRKELİ