University at Buffalo The State University of New York
Clustering of Interaction Network
Definition
Process to detect densely connected sub-graphs
Determines protein complexes or functional modules
Difficulties
Noisy data (too many false positives or false negatives)
Cannot be solved by traditional clustering techniques
Difficult to define the pair-wise distance between proteins in the network
Protein complexes may overlap
Disparate sources of data
Different reliabilities: 17%~50%
Small overlaps: <17%
Protein Interaction Network
Undirected, unweighted graph
Node represents protein, edge represents interaction
Example: yeast protein interaction network
Importance
Provide a global view of cellular organizations and biological functions
Applicable to systematic approaches for functional knowledge discovery
Problem
Large scale
Complex connectivity
Small-world Phenomenon ( Watts & Strogatz )
Networks that lie between regular and random networks
Higher average clustering coefficient than expected by random chance
Significantly small average shortest path length
Scale-free Distribution ( Barabasi & Albert )
Network growth by preferential attachment
Power law degree distribution – a few high degree nodes, many low degree nodes
Clustering coefficient distribution independent of degree
Structural Property
Protein Interaction Database       DIP      MIPS
density                            0.0015   0.0015
average clustering coefficient     0.2283   0.2878
average shortest path length       4.14     4.43
degree distribution (γ)            1.77     1.64
Observed properties: high modularity, hub existence
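The structural properties listed in the table can be computed directly from an adjacency structure. The following is a minimal sketch on a four-node toy graph; the graph, the function names, and the convention that nodes of degree below 2 have clustering coefficient 0 are assumptions for illustration, not the DIP or MIPS data.

```python
from collections import deque
from itertools import combinations

def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2  # each edge is stored twice
    return 2 * m / (n * (n - 1))

def average_clustering(adj):
    """Mean local clustering coefficient (0 for nodes with degree < 2)."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def average_shortest_path_length(adj):
    """Mean hop distance over all reachable ordered node pairs (BFS per node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Toy undirected, unweighted graph: a triangle A-B-C with a pendant node D.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(density(toy), average_clustering(toy), average_shortest_path_length(toy))
```

On a real interaction network the same three functions apply unchanged, although the all-pairs BFS becomes the dominant cost.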
Conventional Graph Clustering Approaches
Density-based Clustering
Finding densely connected sub-graphs (e.g. maximal clique algorithm)
Hierarchical Clustering
Top-down approach: iteratively partitioning a graph (e.g. minimum cut algorithm)
Bottom-up approach: iteratively merging nodes (e.g. node merging by common neighbors)
Problems
Computationally inefficient
Unable to detect overlapping clusters
Discard sparsely connected nodes
Functional Influence Model
Functional Flow
Treat each protein with a known functional annotation as a 'source' of 'functional flow' for that function.
Simulate the spread of this functional flow through the neighborhoods surrounding the sources with a random walk.
'Functional score': the amount of 'flow' that a protein has received for that function.
[Figure: nodes u and v, function Func(a)]
Functional Influence Based on Distance
Weibull distribution:
$$f(d; k, \lambda) = \frac{k}{\lambda}\left(\frac{d}{\lambda}\right)^{k-1} e^{-(d/\lambda)^{k}}$$
Curve fitting
d is the distance between two nodes
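The Weibull density named on this slide can be evaluated with a few lines of code. This is a minimal sketch of the standard two-parameter form with shape k and scale λ; the function name `weibull_pdf` and the sample parameter values are assumptions, since the slide does not give the fitted values.

```python
import math

def weibull_pdf(d, k, lam):
    """Standard Weibull density at distance d > 0 with shape k and scale lam."""
    if d <= 0 or k <= 0 or lam <= 0:
        raise ValueError("d, k and lam must all be positive")
    return (k / lam) * (d / lam) ** (k - 1) * math.exp(-((d / lam) ** k))

# Hypothetical parameters: influence as a function of hop distance 1..5.
for hops in range(1, 6):
    print(hops, round(weibull_pdf(hops, k=2.0, lam=2.0), 4))
```

In a curve-fitting setting, k and λ would be estimated from observed functional similarity at each distance; that fitting step is not shown here.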
Information Flow Simulation
Computation of the functional influence inf_s(x) of s on each x ∈ V, based on shortest paths
Input: a weighted interaction network and a source node s
Output: functional influence pattern of s
Measurements
Functional Influence Model
PathRatio
PathRatio models the natural "aging" or loss of information as it propagates through the network.
SPath(s,y) is the set of all shortest paths between node s and node y.
PR(s,y) is the PathRatio between node s and node y:
$$PR(s,y) = \frac{\sum_{p \in SPath(s,y)} PS(p)}{N(SPath(s,y))}$$
PathStrength
PS(p) measures the strength of path p using the weights on the edges along p:
$$PS(p) = \prod_{i=1}^{k} w(v_i, v_{i+1})$$
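The two measurements above can be sketched in code as follows. This is an illustration under stated assumptions: shortest paths are taken by hop count, PathStrength is read as the product of edge weights along a path, N(SPath(s,y)) is read as the number of shortest paths, and the toy network and all function names are hypothetical.

```python
from collections import deque
from math import prod

def all_shortest_paths(adj, s, y):
    """Enumerate every hop-count shortest path from s to y (BFS + backtracking)."""
    dist, preds = {s: 0}, {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)  # another equally short route into v
    if y not in dist:
        return []
    paths = []
    def walk(node, suffix):
        if node == s:
            paths.append([s] + suffix)
            return
        for p in preds[node]:
            walk(p, [node] + suffix)
    walk(y, [])
    return paths

def path_strength(adj, path):
    """PS(p): product of the edge weights along the path (assumed reading)."""
    return prod(adj[a][b] for a, b in zip(path, path[1:]))

def path_ratio(adj, s, y):
    """PR(s, y): mean PathStrength over all shortest paths from s to y."""
    paths = all_shortest_paths(adj, s, y)
    if not paths:
        return 0.0
    return sum(path_strength(adj, p) for p in paths) / len(paths)

# Hypothetical weighted network: two shortest paths from s to y.
net = {
    "s": {"a": 0.8, "b": 0.4},
    "a": {"s": 0.8, "y": 0.5},
    "b": {"s": 0.4, "y": 0.5},
    "y": {"a": 0.5, "b": 0.5},
}
print(path_ratio(net, "s", "y"))
```

Here the two paths s-a-y and s-b-y have strengths 0.4 and 0.2, so the ratio is their mean.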
Framework of functional influence simulation
Algorithm
1. Initialize inf(s).
2. Compute the initial flow I(s → y) by
$$I(s \to y) = \mathrm{inf}(s) \times PR(s,y) \times F(d)$$
3. Update inf_s(y) by
$$\mathrm{inf}_s(y) = \sum_{x \in N(y)} I(x \to y)$$
4. Repeat step 3 for every node in the network.
5. Finally, the functional profile
$$V(y) = [\mathrm{inf}_{s_1}(y), \mathrm{inf}_{s_2}(y), \ldots, \mathrm{inf}_{s_n}(y)]$$
is generated for every node y in the network.
F(d) is the functional distribution model. d is the distance between node s and node y.
PR(s,y) is the PathRatio between node s and node y.
inf(s) is the initial functional influence from node s.
inf_s(y) is the functional influence received by node y from node s.
N(y) is the set of neighbors of node y.
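As a hedged numeric illustration of step 2, read the initial flow as the product of inf(s), PR(s,y), and F(d). The values below are assumptions chosen for the example, not results from the slides: inf(s) = 1, PR(s,y) = 0.3, and F taken to be a Weibull density with k = 2 and λ = 2, evaluated at distance d = 2.

```latex
\begin{aligned}
F(2) &= \frac{2}{2}\left(\frac{2}{2}\right)^{2-1} e^{-(2/2)^{2}} = e^{-1} \approx 0.368 \\
I(s \to y) &= \mathrm{inf}(s) \times PR(s,y) \times F(d) \approx 1 \times 0.3 \times 0.368 \approx 0.110
\end{aligned}
```

Under these assumed values, node y receives roughly 11% of the influence that s starts with.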
Functional Module Detection (FMD)
Flowchart for functional module detection
Functional Modularity Detection
Experimental Data
DIP (4935 proteins, 14162 interactions)
Evaluation
Functional categories and annotations from MIPS
Hyper-geometric p-value
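The hypergeometric p-value used for evaluation can be sketched as below. It gives the probability of seeing at least k annotated proteins in a module of size n by chance; the function name, argument names, and the example counts are assumptions, and only the total of 4935 proteins comes from the slide.

```python
from math import comb

def hypergeom_pvalue(N, M, n, k):
    """P(X >= k) when drawing n proteins from N, of which M carry the annotation.

    N: proteins in the network, M: proteins in the functional category,
    n: module size, k: module members that belong to the category.
    """
    upper = min(n, M)
    favorable = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Hypothetical module: 20 proteins, 8 of them in a category of 100 proteins.
print(hypergeom_pvalue(N=4935, M=100, n=20, k=8))
```

A smaller p-value indicates that the module is more strongly enriched for the category than random sampling would explain.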
Result
Computational Epidemiology
Computational epidemiology is a multidisciplinary field that develops tools and models to aid epidemiologists in their study of the spread of diseases.
1. Developing a virus spread and containment response model
2. Understanding virus spread and identifying critical properties
3. Applying these findings to real infectious virus spread
4. Analyzing results of the containment strategy (death toll vs. strategies)
Virus Spread Network Model
What do nodes and edges represent in the virus spread network model?
Node: person (community network) or town or place (road network)
Edge: interaction (community network) or pathway (road network)
Weight of nodes and edges
Changed over time t based on the virus spread dynamics model
Node weight: status of health (0 ~ 1)
Edge weight: status of strength (0 ~ 1)
Model Scheme
Spread Model
Spreading phase: edges that are in the region of spreading will be damaged
Defense Model
Signaling and propagation phase: nodes that have a certain number of damaged edges will send signals to neighbor nodes
Defense action phase: nodes that receive a certain level of signals from neighbor nodes will remove all of their edges
Figure labels:
Virus progression to neighbor nodes
Signaling alarms to neighbor nodes from an infected neighbor node
Culling nodes to prevent virus progression
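The two phases of the defense model can be sketched as a single step function. This is an illustration only: the slides do not give the thresholds or the signal rule, so `damage_threshold`, `signal_threshold`, and the choice of one signal per alarmed neighbor are assumptions.

```python
def defense_step(edges, damaged, damage_threshold, signal_threshold):
    """Return the set of edges culled by one signaling + defense action round.

    edges: list of (u, v) tuples; damaged: set of edges already damaged.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Signaling phase: nodes with enough damaged edges alarm each neighbor once.
    signals = {n: 0 for n in neighbors}
    for n in neighbors:
        n_damaged = sum(1 for e in damaged if n in e)
        if n_damaged >= damage_threshold:
            for nb in neighbors[n]:
                signals[nb] += 1

    # Defense action phase: nodes with enough signals remove all their edges.
    culling = {n for n, s in signals.items() if s >= signal_threshold}
    return {(u, v) for u, v in edges if u in culling or v in culling}

# Hypothetical chain A-B-C-D-E with the first edge already damaged.
chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
print(sorted(defense_step(chain, {("A", "B")}, damage_threshold=1, signal_threshold=1)))
```

Raising `signal_threshold` makes the defense slower to react but culls fewer healthy edges, which is the kind of trade-off a parameter study would explore.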
Spread Model
Simulating disease spreading
Damaging nodes and edges that lie within the virus spread radius of the center
Virus Spread by r(t)
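A radius-based spreading phase can be sketched as follows. The slides do not define r(t) or the exact damage rule, so the linear growth function, the `rate` parameter, and the rule that an edge is damaged when either endpoint lies inside the radius are all assumptions for illustration.

```python
import math

def spread_radius(t, rate=1.0):
    """Hypothetical spread radius: grows linearly with time t."""
    return rate * t

def damaged_edges(positions, edges, center, t, rate=1.0):
    """Edges with at least one endpoint within r(t) of the outbreak center."""
    r = spread_radius(t, rate)
    inside = {n for n, p in positions.items() if math.dist(p, center) <= r}
    return {(u, v) for u, v in edges if u in inside or v in inside}

# Hypothetical road network: three places on a line, outbreak at place A.
places = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0)}
roads = [("A", "B"), ("B", "C")]
for t in (0.5, 1.0):
    print(t, sorted(damaged_edges(places, roads, center=(0.0, 0.0), t=t)))
```

Replacing `spread_radius` with a fitted growth curve would change only the first function and leave the damage rule unchanged.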
Defense Model
Simulating the defense system against disease spreading, including message spreading
Culling interactions from damaged nodes in order to stop spreading (edge culling shown in green circles)
Problem / Solution Approach
Problem: Which element of the virus spread system has the greatest impact on a containment campaign?
Solution: Identifying the critical elements of the system by computational modeling and stochastic simulation.
Problem: How can we plan an effective containment campaign that minimizes the damage from virus spread?
Solution: Mining the best combination of critical parameters under given conditions.
[Figure: Parameters, Simulation & Analysis, Critical parameter]
Application
Virus spread simulation on the road network of the city of Oldenburg, Germany
Green edges: healthy edges
Red edges: edges damaged by the spread process
Blue edges: edges damaged by the defense process
Uncontrolled = 0.02
Intermediate = 0.12
Controlled = 0.22
Osteoporosis
Definition: “a systemic skeletal disease characterized by low bone mass and micro-architectural deterioration of bone tissue leading to enhanced bone fragility and a consequent increase in fracture risk”
25 million people in the United States suffer from osteoporosis.
$10 billion is spent on medical charges, including rehabilitation and treatment facilities.
Research funding will be $200 billion by the year 2040.
[Figure: Normal, Osteoporosis]
Challenges
Diagnosis of osteoporosis?
The traditional method of evaluating bone strength is to assess bone mineral density (BMD).
Limitations of BMD
A major limitation of BMD is that it incompletely reflects variation in bone strength.
Other factors, such as bone microarchitecture, contribute substantially to bone strength.
By evaluating bone microstructure we can improve the determination of bone quality and strength.
Computational Model on Bone Microstructure
Computational Model on Bone Microstructure
Questions
What is a better way to evaluate bone strength?
How can we identify fragile locations in the bone structure?
Why not approach this problem from a new direction?
Let us consider this problem from a structural point of view.
Graph-based approach to bone microstructure
Bone microstructure contributes to bone strength.
We suppose that rod-like mineral fibers are represented by edges in a graph.
The approach is capable of quantitative assessment of bone mineral density and bone micro-architecture.
Model Approach
Bone is not a uniformly solid material, but rather has some spaces between its hard elements.
Designing a network approach model for the bone microstructure.
Quantitative assessment of bone mineral density could be successfully done with this approach.
Bone Network Model
Creating the bone network
A femur bone image from patients with osteoporosis, obtained by DXA scan.
By image profiling of the DXA scan image, we create a bone network based on the bone density.
What do nodes and edges represent in the bone network model?
Node: fiber binding point for bone cell movements and biochemical interactions
Edge: a group of mineralized fibers
Weight of nodes and edges
Node weight: average weight of the directly connected edges
Edge weight: strength status of the mineralized fibers
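The node-weight rule above (average weight of directly connected edges) can be sketched in a few lines. The edge list and its strength values are hypothetical; in the described approach they would come from image profiling of the DXA scan.

```python
def node_weights(edge_weights):
    """Node weight = mean weight of the edges incident to that node."""
    incident = {}
    for (u, v), w in edge_weights.items():
        incident.setdefault(u, []).append(w)
        incident.setdefault(v, []).append(w)
    return {n: sum(ws) / len(ws) for n, ws in incident.items()}

# Hypothetical fiber strengths between three binding points.
fibers = {("a", "b"): 0.9, ("b", "c"): 0.3}
print(node_weights(fibers))
```

Node b sits between a strong and a weak fiber group, so its weight falls halfway between the two.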
Problem / Solution Approach
Problem: What alternatives to bone mineral density (BMD) are there for determining the strength of bone?
Solution: Designing a computational model of bone microstructure.
Problem: How can we identify fragile locations in the bone structure?
Solution: Creating algorithms for mining weak locations from a computational model of bone microstructure.
[Figure: Human Bone, Bone Model]
Identifying Critical Locations
Information Propagation Model
An algorithm to find critical edges in the bone network
Measuring the quantity of stress energy in each edge
Cutting the most critical edge identified by the Information Propagation Model
Running iteratively to find the next critical edges
The iteration stops when the first isolated sub-network appears
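The iterative cut-until-disconnected loop can be sketched as below. The slides do not define how stress energy is computed, so this sketch substitutes a stand-in criterion (the edge with the lowest strength weight is treated as most critical); that criterion, the function names, and the toy network are assumptions rather than the Information Propagation Model itself.

```python
def is_connected(nodes, edges):
    """True if every node is reachable from an arbitrary start node."""
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)

def find_critical_edges(nodes, strength):
    """Cut the most critical edge repeatedly until the network first splits.

    strength: dict mapping (u, v) to edge strength. Stand-in rule: the
    weakest remaining edge is the most critical one.
    """
    remaining = dict(strength)
    cut = []
    while remaining and is_connected(nodes, remaining):
        weakest = min(remaining, key=remaining.get)
        del remaining[weakest]
        cut.append(weakest)
    return cut

# Hypothetical bone network: a triangle a-b-c with a pendant node d.
bone_nodes = {"a", "b", "c", "d"}
bone_edges = {("a", "b"): 0.9, ("b", "c"): 0.2, ("a", "c"): 0.8, ("c", "d"): 0.5}
print(find_critical_edges(bone_nodes, bone_edges))
```

The returned list gives the edges in the order they were cut; the last one is the cut that isolated part of the network.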
Conclusions
Various applications are generating data very rapidly and in great volume, demanding data mining approaches.
Network-based approaches look promising to solve complex problems.
This research requires close collaboration among multidisciplinary groups.
Semi-supervised approaches to integrate domain knowledge into data mining tools are important to the success of the research.