analysis of genetic networks using attributed graph matching
Post on 21-Dec-2015
ANALYSIS OF GENETIC NETWORKS USING
ATTRIBUTED GRAPH MATCHING
BACKGROUND
• Completion of sequencing projects
• Need for functional discovery
• Emerging area of study: large-scale genomic analysis
• Similarity of living systems
GENETIC NETWORKS
• Modelling genetic networks
• Interaction of genes and proteins
• Relationship between topology and function
MOTIVATION
• Common biological processes
• Comparison of networks
• Discovering missing interactions
• Discovering missing genes
GRAPH MATCHING
[Figure: two example graphs, G1 and G2, with nodes labelled mpn124, mpn132, mpn133, mpn134, mpn141, mpn145 and mge234, mge235, mge236, mge310, mge312, mge313, mge314, mge336, mge337. Annotations: search-based algorithm, pruning techniques.]
ROADMAP
• Scale-Free Networks
• Modelling Genetic Networks
• Graph Matching
• Algorithm
• Results
SCALE-FREE NETWORKS
COMPLEX NETWORKS
• Small-world model
– WWW
– Human acquaintances network
– Citation networks
– Biological networks
SMALL-WORLD
• Features:
– Characteristic path length
– Clustering coefficient
– Sparseness
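The three features listed above can be computed directly from an adjacency map. The following is a minimal sketch, assuming an undirected graph stored as a dict of neighbor sets; the function names and the toy graph are illustrative and do not come from the slides.

```python
from collections import deque


def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph."""
    total = 0.0
    for nbrs in adj.values():
        nbrs = sorted(nbrs)
        k = len(nbrs)
        if k < 2:
            continue  # fewer than two neighbours: contributes 0
        # count links that exist among the neighbours of this node
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbrs[j] in adj[nbrs[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def density(adj):
    """Fraction of possible links that are present (low value = sparse)."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return edges / (n * (n - 1) / 2) if n > 1 else 0.0


# Toy graph (illustrative): a triangle a-b-c with a pendant node d on c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(clustering_coefficient(toy), characteristic_path_length(toy), density(toy))
```

A small-world network would show a high clustering coefficient together with a short characteristic path length at low density.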
SMALL-WORLD
• Somewhere in between regular & random graphs
SMALL-WORLD
• Highly clustered
• Short diameter
SCALE-FREE NETWORKS
• Complex networks: biological, social, WWW, power grid, citation, etc.
• Power-law connectivity: $P(k) \sim k^{-\gamma}$
• Hubs and authorities
SCALE-FREE NETWORKS
• Application for testing scale-free behavior
• Yeast
• Helicobacter pylori
• Mycoplasma pneumoniae
• Mycoplasma genitalium
• Linear log-log graph
• Magnitude of slope = $\gamma$, the power-law exponent
SCALE-FREE NETWORKS
• The slope is calculated by a least-squares fit
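The slope estimate can be sketched as an ordinary least-squares line through the points (log k, log P(k)). This is a minimal illustration, assuming the degree of every node is available as a list; the function names are hypothetical and the synthetic distribution is not data from the slides.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical P(k) for k > 0, as a dict {k: fraction of nodes}."""
    counts = Counter(d for d in degrees if d > 0)
    n = sum(counts.values())
    return {k: c / n for k, c in counts.items()}


def loglog_slope(pk):
    """Least-squares slope and intercept of log10 P(k) against log10 k."""
    ks = sorted(pk)
    if len(ks) < 2:
        raise ValueError("need at least two distinct degrees")
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(pk[k]) for k in ks]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic check (illustrative): an exact power law with exponent 2.
pk = {k: k ** -2.0 for k in (1, 2, 4, 8)}
slope, intercept = loglog_slope(pk)
print(slope)  # close to -2, so the estimated exponent is about 2
```

On real interaction data the points scatter around the line, and the fitted slope is only an estimate of the exponent.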
TOPOLOGY & FUNCTIONALITY
• Small diameter
– Ease of dissemination of information
– Ease of recovery after disturbance
• Cliquishness
– Alternate paths are found
• Heterogeneity
– Random removal does not affect the network
– Hubs are vulnerable to attack
BIOLOGICAL ASPECTS
• Multifunctionality
– Grouped into functional units
• Stability
• Reason: most of the interactions are between hubs and authorities
MODELLING GENETIC NETWORKS
TYPES OF GENETIC NETWORKS
• Categorized by data sources
– Metabolic pathways
– Gene expression arrays
– Protein interactions
– Gene interactions
INTERACTION MAPS
• High level perspective
– Nodes: genes or proteins
– Edges: presence of an interaction
• Data sources
– Two-hybrid analysis
– Fusion analysis
– Chromosomal proximity
– Phylogenetic analysis
GRAPH MATCHING
PROBLEM DEFINITION
Attributed Relational Graph (ARG)
G = { V, E, X}.
V = {v1, v2, …, vn} Nodes
E = {e1, e2, …, em} Edges
X = {x1, x2,…,xn} Attributes
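The definition above maps naturally onto a small container type. The sketch below assumes undirected edges and one attribute value per node; the class layout, method names, and the three-node example are illustrative and not taken from the tool described in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ARG:
    """Attributed relational graph G = {V, E, X}."""
    nodes: list = field(default_factory=list)       # V
    edges: set = field(default_factory=set)         # E, undirected pairs
    attributes: dict = field(default_factory=dict)  # X, one entry per node

    def add_node(self, v, x):
        """Add node v with attribute x (or update x if v already exists)."""
        if v not in self.attributes:
            self.nodes.append(v)
        self.attributes[v] = x

    def add_edge(self, u, v):
        """Add an undirected edge between u and v."""
        self.edges.add(frozenset((u, v)))

    def neighbors(self, v):
        """Set of nodes that share an edge with v."""
        return {w for e in self.edges if v in e for w in e if w != v}

    def degree(self, v):
        return len(self.neighbors(v))


# Illustrative three-node path v1 - v2 - v3 with made-up attribute values.
g = ARG()
for name, value in (("v1", 0.2), ("v2", 0.5), ("v3", 0.3)):
    g.add_node(name, value)
g.add_edge("v1", "v2")
g.add_edge("v2", "v3")
print(g.degree("v2"), g.neighbors("v1"))
```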
INEXACT SUBGRAPH MATCHING
Allow for:
• Mismatching attribute values
• Missing nodes
• Missing links
Also called error-correcting subgraph isomorphism
NP-Complete
SEARCH TECHNIQUES
• Cost function
• Pruning (structure constraints)
• Backtracking
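How a cost function, pruning, and backtracking fit together can be sketched with a small branch-and-bound search. This is not the algorithm of the tool itself, only a generic illustration under stated assumptions: graphs are dicts of neighbor sets, `attr_cost` is a caller-supplied attribute distance, and `edge_penalty` and `node_penalty` are hypothetical fixed costs for a missing link and a missing node.

```python
def inexact_match(adj1, attr1, adj2, attr2, attr_cost,
                  edge_penalty=1.0, node_penalty=1.0):
    """Map nodes of graph 1 onto graph 2 (or None) at minimum total cost."""
    # Visit the most connected nodes of graph 1 first.
    order = sorted(adj1, key=lambda v: len(adj1[v]), reverse=True)
    best = {"cost": float("inf"), "mapping": {}}

    def extend(i, mapping, used, cost):
        if cost >= best["cost"]:
            return  # pruning: this branch cannot beat the best match so far
        if i == len(order):
            best["cost"], best["mapping"] = cost, dict(mapping)
            return
        v = order[i]
        for w in adj2:
            if w in used:
                continue
            step = attr_cost(attr1[v], attr2[w])
            # charge a penalty for every already-mapped neighbour whose
            # image is not linked to w in graph 2 (a missing link)
            for u in adj1[v]:
                image = mapping.get(u)
                if image is not None and image not in adj2[w]:
                    step += edge_penalty
            mapping[v] = w
            used.add(w)
            extend(i + 1, mapping, used, cost + step)
            del mapping[v]      # backtrack
            used.remove(w)
        # alternative: treat v as a missing node in graph 2
        mapping[v] = None
        extend(i + 1, mapping, used, cost + node_penalty)
        del mapping[v]

    extend(0, {}, set(), 0.0)
    return best["mapping"], best["cost"]


# Illustrative paths a-b-c and x-y-z with made-up scalar attributes.
adj1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
adj2 = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
attr1 = {"a": 1.0, "b": 2.0, "c": 3.0}
attr2 = {"x": 1.1, "y": 2.0, "z": 3.2}
mapping, cost = inexact_match(adj1, attr1, adj2, attr2,
                              lambda p, q: abs(p - q))
print(mapping, cost)
```

The search is exponential in the worst case, consistent with the problem being NP-complete, so the quality of the pruning bound decides whether it is practical.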
ATTRIBUTED GRAPH MATCHING TOOL
ATTRIBUTE MATCHING
• Amino acid sequence content composition
– Array of 20: percentage of each amino acid
– Amino acids grouped into classes: array of 6
– Amino acid triples grouped into classes: array of 216
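Example sequence: MKVLNKNEL
Triple classes: 6 x 6 x 6 = 216
$S = \sum_{i=1}^{216} [X_1(i) - X_2(i)]^2$
[Second equation, defining the attribute $X$ from occurrence terms $O$ with index $n = 1, \dots, A$, is not legible in the transcript.]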
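The three attribute arrays and the score $S$ can be sketched as follows. The slides do not list the six amino acid classes, so the grouping in `CLASSES` is a hypothetical placeholder; counting overlapping triples and skipping unknown residues are likewise assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Hypothetical six-class grouping; the classes used in the slides are unknown.
CLASSES = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
CLASS_OF = {aa: idx for idx, members in enumerate(CLASSES) for aa in members}


def _to_percent(counts):
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


def composition(seq):
    """Array of 20: percentage of each amino acid."""
    counts = [0] * 20
    for aa in seq:
        if aa in AA_INDEX:
            counts[AA_INDEX[aa]] += 1
    return _to_percent(counts)


def class_composition(seq):
    """Array of 6: percentage of residues in each class."""
    counts = [0] * 6
    for aa in seq:
        if aa in CLASS_OF:
            counts[CLASS_OF[aa]] += 1
    return _to_percent(counts)


def triple_composition(seq):
    """Array of 6 x 6 x 6 = 216: percentage of each class triple."""
    counts = [0] * 216
    for i in range(len(seq) - 2):
        triple = seq[i:i + 3]  # overlapping window (assumption)
        if all(aa in CLASS_OF for aa in triple):
            a, b, c = (CLASS_OF[aa] for aa in triple)
            counts[36 * a + 6 * b + c] += 1
    return _to_percent(counts)


def score(x1, x2):
    """Sum of squared differences between two attribute arrays."""
    return sum((u - v) ** 2 for u, v in zip(x1, x2))


x = triple_composition("MKVLNKNEL")
print(len(x), score(x, x))
```

A score of zero means identical composition; larger values indicate a poorer attribute match between two genes.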
ATTRIBUTE MATCHING
Difference in amino acid composition values of gene pairs for M. genitalium and M. pneumoniae.
[Figure axes: score, observations]
STRUCTURAL CONSTRAINTS
• Effect of scale-free behaviour
– Connectivity information: highly heterogeneous, thus start with the most connected node and work around it
– Pruning strategy: comparability is determined by the power law
$\frac{y_2 - y_1}{x_2 - x_1} = \frac{\log P(k_2) - \log P(k_1)}{\log k_2 - \log k_1}$
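As a worked example of the two-point slope on the log-log plot: for an exact power law the slope between any two degrees equals the negative exponent. The function name and the exponent value below are illustrative, and the way the tool turns this slope into a pruning decision is not stated in the slides.

```python
import math


def two_point_slope(k1, p1, k2, p2):
    """Slope between (log k1, log P(k1)) and (log k2, log P(k2))."""
    return (math.log10(p2) - math.log10(p1)) / (math.log10(k2) - math.log10(k1))


gamma = 2.5  # illustrative exponent, not a value from the slides
slope = two_point_slope(2, 2 ** -gamma, 16, 16 ** -gamma)
print(slope)  # close to -gamma
```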
STRUCTURAL CONSTRAINTS
• Neighborhood connectivity
– Choose the neighbor at the next stage
• Backtracking
– Component by component
– Go back to the neighbor with the most connectivity within the component
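One way to read the strategy above is as a visiting order: begin at the most connected node, grow outward through neighbors, and finish one component before starting the next. The sketch below is an interpretation, not the tool's code; picking the most connected frontier node first and breaking ties by name are assumptions.

```python
def matching_order(adj):
    """Order nodes hub-first, expanding through neighbours per component."""
    def rank(v):
        return (-len(adj[v]), str(v))  # high degree first, then by name

    order, seen = [], set()
    for seed in sorted(adj, key=rank):
        if seed in seen:
            continue  # already reached from an earlier, larger hub
        seen.add(seed)
        order.append(seed)
        frontier = set(adj[seed]) - seen
        while frontier:
            nxt = min(frontier, key=rank)  # most connected neighbour next
            frontier.remove(nxt)
            seen.add(nxt)
            order.append(nxt)
            frontier |= set(adj[nxt]) - seen
    return order


# Illustrative graph: hub h with neighbours a, b, c; a also links to d;
# x-y forms a separate component.
adj = {
    "h": {"a", "b", "c"}, "a": {"h", "d"}, "b": {"h"}, "c": {"h"},
    "d": {"a"}, "x": {"y"}, "y": {"x"},
}
print(matching_order(adj))
```

Matching nodes in this order keeps every newly visited node adjacent to an already matched one, so structural constraints can be checked as early as possible.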
TEST CASE
• Mycoplasma genitalium:
– Smallest genome (470 ORFs)
• Mycoplasma pneumoniae:
– Very similar, superset (688 ORFs)
TEST CASE...
• Mycoplasma genitalium:
– 232 nodes
– 211 links
• Mycoplasma pneumoniae:
– 267 nodes
– 257 links
• Inputs:
– MGE links
– MPN links
– MGE synonyms
– MPN synonyms
– MGE amino acid sequence
– MPN amino acid sequence
RESULTS
[Figure panels: MGE, MPN]
DISCOVERY OF MISSING DATA
• Missing link
• The link between MPN632 and MPN637 is missing from our data but exists in the literature
DISCOVERY OF MISSING DATA
• Missing node with known COG
MPN236 --- MPN237 --- MPN238 --- MPN678
MG098 --- MG099 --- MG100 --- MG459
MG459 is the ortholog of MPN678
DISCOVERY OF MISSING DATA
• Missing node without known ortholog
CONCLUSION
• Large-scale genomics
• Interaction data captures system structure and dynamics
• Graph matching exploits the scale-free characteristics
• Novel interactions and genes can be identified
ACKNOWLEDGEMENT
• YASEMİN TÜRKELİ