![Page 1: Streaming Graph Analysis A Statistical Framework forstingergraph.com/data/uploads/asonam2013_slides.pdf · Streaming Graph Analysis James Fairbanks, David Ediger, Rob McColl, David](https://reader033.vdocuments.us/reader033/viewer/2022052804/6049ebe5c34dbc54af20b01f/html5/thumbnails/1.jpg)
A Statistical Framework for Streaming Graph Analysis
James Fairbanks, David Ediger, Rob McColl, David A. Bader, Eric Gilbert
Problem
James Fairbanks, ASONAM 2013
In order to understand social media, we must understand the evolution of relationships in streaming data.
● How can we detect change?
● What is a significant change?
● Are two sets of vertices significantly different?
● How can we visualize 10,000+ vertices?
● Which vertices look anomalous?
We look to statistical analysis for guidance on these questions.
Challenge
● Graph data is big, sparse, irregular, messy, and high-dimensional
● Statistics works best on dense, regular, clean, low-dimensional data
Solution
● Use graph-theoretic computations to embed the graph in a low-dimensional space
● The embedding is not topology preserving
● Do machine learning and statistics in Euclidean space
Figures: Wikipedia articles; scatter plot of vertices in feature space
Related Work
● Tracking Earthquakes [Sakaki, et al., 2010]
● Rumors about Earthquakes [Mendoza, et al., 2010]
● London Riots and Hashtags [Glasgow and Fink, 2013]
● Streaming Clustering Coefficient [Ediger, Riedy, et al., 2011]
● Atlanta Floods, H1N1 [Ediger, et al., 2010]
● Dynamic Visual Analysis [Federico, et al., 2012]
Definitions
Vertex [Features, Metrics, Statistics]
A vertex statistic associates a number to each vertex at each time step.
Graph Kernels
Computational subroutines that compute vertex features or maintain a data structure on top of the graph.
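To make the two definitions concrete, here is a minimal sketch, assuming an undirected graph held as an in-memory adjacency dict (not STINGER) and a stream of edge-insertion batches. The hypothetical `degree_kernel` plays the role of the "counting neighbors" kernel, and the hypothetical `run_stream` records the resulting vertex statistic once per time step.

```python
from collections import defaultdict

def degree_kernel(adj):
    """Hypothetical 'counting neighbors' kernel: degree of every vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def run_stream(batches, kernel):
    """Apply each batch of undirected edge insertions, then record the vertex
    statistic produced by `kernel` for that time step."""
    adj = defaultdict(set)
    history = []  # history[t][v] = value of the statistic for vertex v at step t
    for batch in batches:
        for u, v in batch:
            if u != v:  # skip self-loops
                adj[u].add(v)
                adj[v].add(u)
        history.append(kernel(adj))  # a fresh snapshot dict per time step
    return history

batches = [[(1, 2), (2, 3)], [(1, 3), (3, 4)]]
history = run_stream(batches, degree_kernel)
print(history[0])  # {1: 1, 2: 2, 3: 1}
print(history[1])  # {1: 2, 2: 2, 3: 3, 4: 1}
```

Each entry of `history` is one time step of the vertex statistic, which is the shape of data the statistical methods in the rest of the talk consume.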
Examples
Vertex Features
● Degree
● Size of connected component
● Geodesic distance
● Local clustering coefficient
● PageRank
● Betweenness Centrality
Graph Kernels
● Counting neighbors
● Shiloach-Vishkin connected components
● Breadth First Search (BFS)
● Counting neighborhood intersection
● Power Iteration
● Brandes 2001
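As one example of a kernel producing a feature, the sketch below computes PageRank by power iteration. It assumes a symmetric adjacency dict in which every vertex appears as a key; the damping factor 0.85 and tolerance are conventional choices, not values taken from the slides, and `pagerank` is a hypothetical name.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on an undirected adjacency dict.
    Rank held by vertices with no neighbors is spread uniformly."""
    vertices = list(adj)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in vertices if not adj[v])
        # teleport term plus the redistributed dangling mass
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for v in vertices:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for w in adj[v]:
                    new[w] += share  # v passes an equal share to each neighbor
        delta = sum(abs(new[v] - rank[v]) for v in vertices)
        rank = new
        if delta < tol:  # L1 change small enough: treat as converged
            break
    return rank

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
ranks = pagerank(star)
```

On the star graph the hub ends up with the largest score and the three leaves share equal scores; the total rank stays 1 at every iteration.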
STINGER
A high-performance, dynamic graph data structure with semantic and temporal properties
● Supports concurrent streaming data sources and analysis
● Scalable on shared-memory Intel x86 platforms and Cray XMT
● Open source and free (BSD License)
● http://www.stingergraph.com
Data Set
● Hurricane Sandy public Tweets [28 Oct to 12 Nov 2012]
● 1,238,109 mentions
● 662,575 unique users
● Batches of 10,000 updates
● Update interval: 1 batch represents ~3 hours of Tweets
photo credit: NASA Earth Observatory
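The batch counts follow directly from the numbers above. This sketch uses a hypothetical `batches` generator that cuts a stream of updates into consecutive, non-overlapping batches of 10,000, with a `range` standing in for the 1,238,109 mention updates.

```python
def batches(updates, batch_size=10_000):
    """Yield consecutive, non-overlapping batches; the last one may be short."""
    batch = []
    for u in updates:
        batch.append(u)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # remainder that did not fill a whole batch
        yield batch

sizes = [len(b) for b in batches(range(1_238_109))]
print(len(sizes), sizes[-1])  # 124 8109
```

That is 123 full batches plus one of 8,109 updates. Spread over the roughly 15 to 16 days of the collection window, 124 batches average about 3 hours each, consistent with the interval quoted above (assuming Tweets arrive roughly evenly, which real traffic will not do exactly).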
Clustering Coefficient
$$cc(v) = \frac{tri(v)}{\binom{\deg(v)}{2}}$$
● where tri(v) is the number of 3-cycles (triangles) containing v
● Measures how tightly knit the graph is at the local level [Watts and Strogatz, 1998]
● Compute in time
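Assuming the standard Watts–Strogatz form cc(v) = tri(v) / C(deg(v), 2), the sketch below computes the local clustering coefficient with the "counting neighborhood intersection" kernel. The name `local_clustering` and the convention of reporting 0 for vertices of degree below 2 are assumptions made for illustration.

```python
def local_clustering(adj):
    """cc(v) = tri(v) / C(deg(v), 2), with tri(v) found by intersecting
    v's neighbor set with each neighbor's neighbor set."""
    cc = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            cc[v] = 0.0  # undefined for degree < 2; reported as 0 by convention
            continue
        # Each triangle {v, u, w} is counted once from u and once from w.
        tri = sum(len(nbrs & adj[u]) for u in nbrs) // 2
        cc[v] = tri / (d * (d - 1) / 2)
    return cc

# Triangle 1-2-3 with a pendant vertex 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cc = local_clustering(adj)
```

Here vertices 1 and 2 get 1.0 (their only neighbor pair is connected), vertex 3 gets 1/3 (one of its three neighbor pairs is connected), and vertex 4 gets 0.0.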
Clustering Coefficient
Figure annotation: NJ Landfall
● Counting vertices that have increasing or decreasing clustering coefficient
● Model these counts as a stochastic process for forecasting
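The counting step can be sketched as follows, assuming the clustering coefficients for each batch are stored as a dict per time step (as in the earlier sketches). Both function names are hypothetical, and only vertices present at both steps are compared.

```python
def count_changes(prev, curr, eps=0.0):
    """Count vertices whose statistic rose or fell between two time steps."""
    up = down = 0
    for v in prev.keys() & curr.keys():  # vertices seen at both steps
        delta = curr[v] - prev[v]
        if delta > eps:
            up += 1
        elif delta < -eps:
            down += 1
    return up, down

def change_series(history):
    """(increasing, decreasing) counts for each pair of successive steps."""
    return [count_changes(a, b) for a, b in zip(history, history[1:])]

history = [{1: 0.5, 2: 1.0, 3: 0.0}, {1: 0.7, 2: 0.5, 3: 0.0, 4: 1.0}]
print(change_series(history))  # [(1, 1)]
```

The resulting pairs form the time series that a stochastic model would then be fitted to; the modeling step itself is not shown here.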
Temporal Correlation
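● Defined for any quantity measuring strength of association
● For Pearson's correlation, $\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$
● The temporal correlation $\rho_t = \rho(f_t, f_{t+1})$, where $f_t$ is the vector of a vertex statistic over all vertices at time step $t$, quantifies the strength of association between successive measurements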
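A minimal sketch of temporal correlation with Pearson's coefficient, assuming each time step is a dict from vertex to statistic value and that only vertices present at both steps are compared. The names `pearson` and `temporal_correlation` are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def temporal_correlation(history):
    """rho_t = Pearson(f_t, f_{t+1}) over vertices present at both steps."""
    out = []
    for prev, curr in zip(history, history[1:]):
        common = sorted(prev.keys() & curr.keys())
        out.append(pearson([prev[v] for v in common], [curr[v] for v in common]))
    return out

history = [{1: 1.0, 2: 2.0, 3: 3.0},
           {1: 2.0, 2: 4.0, 3: 6.0, 4: 9.0},
           {1: 6.0, 2: 4.0, 3: 2.0}]
rhos = temporal_correlation(history)
```

The first pair of steps is perfectly positively correlated and the second perfectly negatively correlated; vertex 4 is ignored because it does not appear at both steps of either pair.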
Correlation Decay
● New edges change vertex statistics
● Correlation measures the forgetfulness of a vertex statistic
● In a bigger graph, each batch of new edges has less impact
Derivatives
The centered discrete derivative of a vertex feature $f$, with time measured in batches:
$$f'(v, t) = \frac{f(v, t+1) - f(v, t-1)}{2}$$
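Applied to one vertex's time series, the centered difference can be sketched as below. The interior formula is (f[t+1] − f[t−1]) / 2 with a unit step of one batch. At the two endpoints the centered form is unavailable, so this sketch falls back to one-sided differences; that boundary choice is an assumption, and `centered_derivative` is a hypothetical name.

```python
def centered_derivative(series):
    """Centered differences (f[t+1] - f[t-1]) / 2 at interior steps,
    one-sided differences at the first and last steps."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two time steps")
    d = [0.0] * n
    d[0] = float(series[1] - series[0])
    d[-1] = float(series[-1] - series[-2])
    for t in range(1, n - 1):
        d[t] = (series[t + 1] - series[t - 1]) / 2
    return d

print(centered_derivative([0, 1, 4, 9, 16]))  # [1.0, 2.0, 4.0, 6.0, 7.0]
```

NumPy's `numpy.gradient` follows the same scheme by default (central differences inside, first-order differences at the edges) if an array-based version is preferred.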
Anomaly Detection
In graphs:
● What is an anomalous vertex in a graph?
● In social media, who uses the service in a novel way?
● A vertex with edges that look different.
From statistics:
● Outlier: a point in a region of space with very low probability density.
● Points close to the outliers in space are rare.
● If we can estimate the true density from a finite sample, then we can find outliers.
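The last point, estimating the density from a finite sample and then flagging low-density points, can be sketched with a one-dimensional Gaussian kernel density estimate. This is a plain density-estimation illustration, not the one-class SVM the talk actually uses; the bandwidth, the 5% fraction, and the names `gaussian_kde` and `low_density_outliers` are assumptions.

```python
from math import exp, pi, sqrt

def gaussian_kde(sample, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from `sample`."""
    norm = 1.0 / (len(sample) * bandwidth * sqrt(2 * pi))
    def density(x):
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
    return density

def low_density_outliers(sample, bandwidth, fraction=0.05):
    """Points with the lowest estimated density (at least one point)."""
    density = gaussian_kde(sample, bandwidth)
    ranked = sorted(sample, key=density)  # lowest density first
    k = max(1, int(fraction * len(sample)))
    return ranked[:k]

sample = [0.0, 0.1, -0.1, 0.05, -0.05] * 4 + [5.0]
print(low_density_outliers(sample, bandwidth=0.3))  # [5.0]
```

The isolated point at 5.0 lies where the estimated density is lowest, so it is the one flagged. With multimodal, multi-dimensional features a single bandwidth becomes harder to choose, which is one reason to prefer a kernel method such as a one-class SVM.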
Outlier Detection Features
● Mean(CC)
● Var(CC)
● Mean(Deriv(CC))
● Var(Deriv(CC))
● Gaussian Radial Basis Function
● Radius 0.3
● By choice, 5% of the data is labeled outlier
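The four features and the kernel can be sketched per vertex as follows. Several choices here are assumptions: population variance is used for Var, the derivative uses one-sided differences at the series ends, and "radius 0.3" is read as the width σ in exp(−‖x − y‖² / (2σ²)). The names `outlier_features` and `rbf` are hypothetical.

```python
from math import exp
from statistics import fmean, pvariance

def centered_derivative(series):
    """Centered differences inside, one-sided at the ends (needs >= 2 steps)."""
    n = len(series)
    d = [float(series[1] - series[0])] + [0.0] * (n - 2) + [float(series[-1] - series[-2])]
    for t in range(1, n - 1):
        d[t] = (series[t + 1] - series[t - 1]) / 2
    return d

def outlier_features(cc_series):
    """(Mean(CC), Var(CC), Mean(Deriv(CC)), Var(Deriv(CC))) for one vertex."""
    d = centered_derivative(cc_series)
    return (fmean(cc_series), pvariance(cc_series), fmean(d), pvariance(d))

def rbf(x, y, radius=0.3):
    """Gaussian RBF kernel, assuming radius means the width sigma."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-sq / (2 * radius ** 2))

f = outlier_features([0.0, 0.5, 1.0])  # a vertex whose CC rises steadily
```

Under the same reading, scikit-learn's `OneClassSVM` would take `kernel="rbf"` with `gamma = 1 / (2 * 0.3**2)`, and `nu=0.05` roughly matches labeling 5% of the data as outliers, since `nu` bounds the fraction of training points treated as outliers.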
Outlier Detection
Used a one-class SVM because the features are multimodal
Validation
● Inlier and Outlier distributions differ
● Outliers more uniformly distributed
● Inliers and outliers mix in every scatter plot, so all feature dimensions are needed to separate them
Conclusions
● Separating computation into a graph-algorithm phase followed by a machine learning and statistics phase allows us to leverage the best techniques from both fields.
● Applying multivariate outlier detection methods to streaming graphs reveals two distinct distributions of vertices.
● These feature-based methods enable dynamic visualization of much larger graphs than traditional two-dimensional embeddings.
Acknowledgment of Support
Future Work
Explore predictive ability in feature space
Signal Processing
Estimate periodicity; filter out small deviations and trends.