![Page 1: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/1.jpg)
Graph OLAP: Towards Online Analytical Processing on Graphs
Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, Philip S. Yu
University of Illinois at Urbana-Champaign
IBM T. J. Watson Research Center
University of Illinois at Chicago
![Page 2: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/2.jpg)
Outline
Motivation
Framework
Efficient Computation
Experiments
Conclusion
![Page 3: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/3.jpg)
Online Analytical Processing
Jim Gray, 1997
OLAP as a powerful analytical tool
![Page 4: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/4.jpg)
The Usefulness of OLAP
Multi-dimensional: different perspectives
Multi-level: different granularities
Can we offer roll-up/drill-down and slice/dice on graph data?
Traditional OLAP cannot handle this, because it ignores links among data objects
![Page 5: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/5.jpg)
The Prevalence of Graphs
Chemical compounds, computer vision objects, circuits, XML
Especially various information networks: biological networks, bibliographic networks, social networks, World Wide Web (WWW)
![Page 6: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/6.jpg)
Applications
WWW: ≥ 3 billion nodes, ≥ 50 billion arcs
Facebook: ≥ 100 million active users
Combining topological structures with node/edge attributes makes these networks very challenging to view and analyze
We propose Graph OLAP to tackle this issue
![Page 7: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/7.jpg)
Scenario #1
A bibliographic network
The collaboration patterns among researchers for SIGMOD 2004
![Page 8: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/8.jpg)
![Page 9: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/9.jpg)
Scenario #2
![Page 10: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/10.jpg)
Outline
Motivation
Framework: Data Model; Two Types of Graph OLAP; Dimension, Measure and OLAP Operations
Efficient Computation
Experiments
Conclusion
![Page 11: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/11.jpg)
Data Model
We have a collection of network snapshots $\{\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_N\}$
Each snapshot $\mathcal{G}_i = (I_{1,i}, I_{2,i}, \ldots, I_{k,i}; G_i)$, where $I_{1,i}, I_{2,i}, \ldots, I_{k,i}$ are $k$ informational attributes describing the snapshot as a whole
$G_i = (V_i, E_i)$ is an attributed graph, with attributes attached to its nodes $V_i$ and edges $E_i$
Since $G_1, G_2, \ldots, G_N$ only represent different observations of one network, $V_1, V_2, \ldots, V_N$ actually correspond to the same set of objects
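One way to hold this data model in code is sketched below. It is a minimal sketch under our own assumptions: the class names `AttributedGraph` and `Snapshot`, the helper `shares_node_set`, and the researchers and attribute values are hypothetical, not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """G_i = (V_i, E_i), with attributes attached to nodes and edges."""
    nodes: dict[str, dict] = field(default_factory=dict)              # node id -> attributes
    edges: dict[tuple[str, str], dict] = field(default_factory=dict)  # (u, v) -> attributes


@dataclass
class Snapshot:
    """A snapshot (I_1,i, ..., I_k,i; G_i): k informational attributes plus a graph."""
    info: dict[str, object]
    graph: AttributedGraph


def shares_node_set(snapshots):
    """Check the model's assumption that every snapshot observes the same objects."""
    first = set(snapshots[0].graph.nodes)
    return all(set(s.graph.nodes) == first for s in snapshots)


# Hypothetical researchers A, B, C observed at two venues.
people = {"A": {"institute": "Inst1"}, "B": {"institute": "Inst1"}, "C": {"institute": "Inst2"}}
sigmod_2004 = Snapshot({"venue": "SIGMOD", "year": 2004},
                       AttributedGraph(dict(people), {("A", "B"): {"freq": 2}}))
vldb_2004 = Snapshot({"venue": "VLDB", "year": 2004},
                     AttributedGraph(dict(people), {("A", "B"): {"freq": 1}, ("B", "C"): {"freq": 3}}))
```

Keeping the informational attributes separate from the graph mirrors the split between Info-Dims and Topo-Dims used by the two kinds of Graph OLAP.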
![Page 12: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/12.jpg)
Two Types of OLAP
Informational OLAP (abbr. I-OLAP)
Topological OLAP (abbr. T-OLAP)
![Page 13: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/13.jpg)
Informational OLAP
Dimensions come from informational attributes attached at the whole-snapshot level, so-called Info-Dims
e.g., scenario #1
![Page 14: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/14.jpg)
I-OLAP Characteristics
Overlay multiple pieces of information
Do not change the objects whose interactions are being looked at
In the underlying snapshots, each node is a researcher
In the summarized view, each node is still a researcher
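An I-OLAP roll-up can be sketched as an overlay that keeps the nodes and sums edge weights. This is a minimal sketch assuming each snapshot is a dict from an undirected edge (stored as a sorted tuple) to a collaboration frequency; the function `i_aggregate` and the sample data are hypothetical.

```python
from collections import Counter


def i_aggregate(snapshots):
    """Overlay snapshots into an I-aggregated graph.

    Nodes remain the same objects (researchers); each edge's weight is the sum
    of its collaboration frequencies over all overlaid snapshots."""
    total = Counter()
    for edges in snapshots:
        total.update(edges)  # Counter.update adds the counts from a mapping
    return dict(total)


# Hypothetical collaboration frequencies for researchers A, B, C.
sigmod_2004 = {("A", "B"): 2}
vldb_2004 = {("A", "B"): 1, ("B", "C"): 3}
overlay = i_aggregate([sigmod_2004, vldb_2004])
```

Every key in the result is still a pair of researchers, which is exactly the "objects do not change" property of I-OLAP.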
![Page 15: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/15.jpg)
Topological OLAP
Dimensions come from the node/edge attributes inside individual networks, so-called Topo-Dims
e.g., scenario #2
![Page 16: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/16.jpg)
T-OLAP Characteristics
Zoom in/zoom out
Network topology changes: “generalized” nodes and “generalized” edges
In the underlying network, each node is a researcher
In the summarized view, each node becomes an institute that comprises multiple researchers
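A T-OLAP roll-up from researchers to institutes can be sketched by mapping every node to its group and merging the edges that fall between the same pair of groups. This is a minimal sketch under our own assumptions: `t_aggregate` and the `group_of` mapping are hypothetical, merged edge weights are summed, and collaborations inside one institute are kept as self-loops (one possible design choice).

```python
from collections import Counter


def t_aggregate(edges, group_of):
    """Roll up a researcher graph into a graph of higher-level nodes (e.g. institutes).

    Each node is replaced by its group; all edges between the same two groups
    are merged into one generalized edge whose weight is the sum of the
    originals. Edges inside a group become self-loops."""
    merged = Counter()
    for (u, v), weight in edges.items():
        key = tuple(sorted((group_of[u], group_of[v])))
        merged[key] += weight
    return set(group_of.values()), dict(merged)


group_of = {"A": "Inst1", "B": "Inst1", "C": "Inst2"}  # hypothetical affiliations
edges = {("A", "B"): 2, ("A", "C"): 1, ("B", "C"): 3}
institutes, generalized_edges = t_aggregate(edges, group_of)
```

Unlike the I-OLAP overlay, the node set itself changes here: three researchers shrink to two institutes.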
![Page 17: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/17.jpg)
Measures in Graph OLAP
The measure is an aggregated graph: an I-aggregated graph or a T-aggregated graph
Other measures, like node count, average degree, etc., can be treated as derived from it
The graph plays a dual role: data source and aggregate measure
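Because the aggregated graph itself is the measure, scalar measures such as node count and average degree can be computed from it afterwards. This is a minimal sketch assuming an undirected aggregated graph without self-loops; the helper names and sample graph are hypothetical.

```python
def node_count(nodes):
    """Number of nodes in an aggregated graph."""
    return len(nodes)


def average_degree(nodes, edges):
    """Average degree of an undirected graph without self-loops:
    each edge adds 1 to the degree of both of its endpoints."""
    return 2 * len(edges) / len(nodes) if nodes else 0.0


# A hypothetical I-aggregated graph over researchers A, B, C.
nodes = {"A", "B", "C"}
edges = {("A", "B"): 3, ("B", "C"): 3}
```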
![Page 18: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/18.jpg)
Generality of the Framework
Measures could be complex, e.g., maximum flow, shortest path, centrality
I-OLAP and T-OLAP can be combined into a hybrid case
![Page 19: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/19.jpg)
Graph OLAP Operations

| Operation | Graph I-OLAP | Graph T-OLAP |
| --- | --- | --- |
| Roll-up | Overlay multiple snapshots to form a higher-level summary via an I-aggregated graph | Shrink the topology and obtain a T-aggregated graph that represents a compressed view, whose topological elements (i.e., nodes and/or edges) have been merged and replaced by corresponding higher-level ones |
| Drill-down | Return to the set of lower-level snapshots from the higher-level overlaid (aggregated) graph | A reverse operation of roll-up |
| Slice/dice | Select a subset of qualifying snapshots based on Info-Dims | Select a subgraph of the network based on Topo-Dims |
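The two slice/dice operations in the table can be sketched as simple filters. This is a minimal sketch assuming snapshots are dicts with an `info` part (Info-Dims) and an `edges` part, and that Topo-Dims are node attributes; the function names and data are hypothetical.

```python
def slice_snapshots(snapshots, **info_dims):
    """I-OLAP slice/dice: keep the snapshots whose Info-Dims match every given value."""
    return [s for s in snapshots
            if all(s["info"].get(dim) == value for dim, value in info_dims.items())]


def dice_subgraph(nodes, edges, keep_node):
    """T-OLAP slice/dice: keep nodes whose Topo-Dim attributes pass keep_node,
    plus the edges whose endpoints both survive."""
    kept = {n: attrs for n, attrs in nodes.items() if keep_node(attrs)}
    kept_edges = {(u, v): w for (u, v), w in edges.items() if u in kept and v in kept}
    return kept, kept_edges


snapshots = [
    {"info": {"venue": "SIGMOD", "year": 2004}, "edges": {("A", "B"): 2}},
    {"info": {"venue": "VLDB", "year": 2004}, "edges": {("A", "B"): 1, ("B", "C"): 3}},
    {"info": {"venue": "SIGMOD", "year": 2005}, "edges": {("A", "C"): 1}},
]
sigmod = slice_snapshots(snapshots, venue="SIGMOD")

nodes = {"A": {"institute": "Inst1"}, "B": {"institute": "Inst1"}, "C": {"institute": "Inst2"}}
sub_nodes, sub_edges = dice_subgraph(nodes, {("A", "B"): 3, ("B", "C"): 3},
                                     lambda attrs: attrs["institute"] == "Inst1")
```

Drill-down in I-OLAP then amounts to going back from an overlay to the list returned by `slice_snapshots`, rather than recomputing anything.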
![Page 20: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/20.jpg)
Outline
Motivation
Framework
Efficient Computation: Measure Classification; Optimizations; Constraint Pushing
Experiments
Conclusion
![Page 21: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/21.jpg)
Two Categories of Strategies
Top-down: generalized cells are computed later. How can we combine and leverage intermediate results?
Bottom-up: generalized cells are computed first. How can we stop early?
![Page 22: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/22.jpg)
Measure Classification
How can we combine and leverage intermediate results?
Distributive: the computation of high-level cells can be directly built on low-level cells
Algebraic: not distributive, but can be easily derived from several distributive measures
Holistic: neither distributive nor algebraic
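The three classes can be illustrated with scalar aggregates over per-snapshot collaboration frequencies. This is a minimal sketch with hypothetical numbers, using the textbook data-cube examples (sum, mean, median) rather than the graph measures discussed in the deck.

```python
import statistics

# Hypothetical per-snapshot collaboration frequencies of one author pair,
# grouped into two low-level cells (one per year).
cells = {"2004": [2, 1], "2005": [0, 3, 1]}

# Distributive: the high-level sum is built directly from the low-level sums.
cell_sums = {year: sum(v) for year, v in cells.items()}
total = sum(cell_sums.values())

# Algebraic: the mean is derived from two distributive measures, sum and count.
cell_counts = {year: len(v) for year, v in cells.items()}
mean = total / sum(cell_counts.values())

# Holistic: the overall median cannot be built from the low-level medians;
# the raw values have to be consulted again.
cell_medians = {year: statistics.median(v) for year, v in cells.items()}
median_of_medians = statistics.median(list(cell_medians.values()))
raw_median = statistics.median([x for v in cells.values() for x in v])
```

Here `median_of_medians` is 1.25 while the true median is 1, which is why a holistic measure forces a return to the raw data.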
![Page 23: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/23.jpg)
Examples
Distributive: collaboration frequency. Use distributiveness to drive computation up the cuboid lattice
Algebraic: maximum flow (shown later); semi-distributive
Holistic: centrality. Need to go down to the raw data and start from scratch
![Page 24: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/24.jpg)
Optimizations
Special measures may have special properties that can help optimize the calculations
We discuss two of them here, with regard to I-OLAP: localization and attenuation
![Page 25: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/25.jpg)
Localization
During computation, only a neighborhood of the networks needs to be consulted
e.g., the collaboration frequency of “R. Agrawal” and “R. Srikant” for [sigmod, all-years] only depends on their collaboration frequencies in each SIGMOD conference
Perfect (i.e., 0-neighborhood) localization
k-neighborhood localization is less ideal, but still useful
e.g., the number of common friends shared by “R. Agrawal” and “R. Srikant” (1-neighborhood)
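The common-friends example can be sketched as follows: only the 1-neighborhoods of the two researchers are read from each snapshot, but those neighborhoods must be merged before intersecting, because per-snapshot counts do not simply add up. This is a minimal sketch assuming each snapshot is an adjacency dict (node to set of neighbors) that lists only the rows the function reads; the names and data are hypothetical.

```python
def common_friends_overlay(snapshots, a, b):
    """Count the common neighbors of a and b in the overlay of several snapshots.

    Only the adjacency entries of a and b are consulted in each snapshot
    (1-neighborhood localization); the rest of each network is never read."""
    neighbors_a, neighbors_b = set(), set()
    for adjacency in snapshots:
        neighbors_a |= adjacency.get(a, set())
        neighbors_b |= adjacency.get(b, set())
    return len((neighbors_a & neighbors_b) - {a, b})


# Hypothetical friendships; C is a common friend in both snapshots.
s1 = {"A": {"C"}, "B": {"C"}}
s2 = {"A": {"C", "D"}, "B": {"C", "D"}}
overlaid = common_friends_overlay([s1, s2], "A", "B")
summed = sum(common_friends_overlay([s], "A", "B") for s in (s1, s2))
```

Summing the per-snapshot counts gives 3 because C is counted twice, while the overlay has only 2 common friends (C and D).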
![Page 26: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/26.jpg)
Attenuation
Consider the transporting capability (i.e., maximum flow) from source S to destination T
Multiple transportation networks, each operated by a separate company
With regard to I-OLAP, each network is a “snapshot”, and overlaying more than one snapshot means sharing link capacities among companies
![Page 27: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/27.jpg)
Attenuation
Data graph C: nodes are cities; an edge carries the capacity of a link
Measure graph F: nodes are cities; an edge carries the quantity that passes through a link when the maximum flow is transmitted
![Page 28: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/28.jpg)
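Attenuation
Maximum flow is algebraic
F can be derived from C: just run a maximum flow algorithm
The capacity graph C is distributive: overlaying snapshots simply adds link capacities
Lemma
Let $F$ be a flow in $C$ and let $C_F$ be its residual graph, where residual means that $C_F = C - F$ (in the standard definition, $C_F$ also contains a reverse arc of capacity $F(u,v)$ for every edge $(u,v)$ carrying flow, so $F'$ may cancel part of $F$). Then $F'$ is a maximum flow in $C_F$ if and only if $F + F'$ is a maximum flow in $C$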
![Page 29: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/29.jpg)
Attenuation
Consider two snapshots that are overlaid
Maximum flows $F_1$, $F_2$ have already been calculated from $C_1$, $C_2$
Without attenuation: compute the overall maximum flow $F$ from $C_1 + C_2$
With attenuation: take $F_1 + F_2$ as the basis, compute the residual maximum flow $F'$ from $(C_1 - F_1) + (C_2 - F_2)$, and augment it onto $F_1 + F_2$
Thus, the input attenuates from $C_1 + C_2$ to $(C_1 + C_2) - (F_1 + F_2)$, which can substantially reduce the work left for the maximum flow computation
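The two-snapshot procedure can be sketched in Python. This is a minimal illustration under our own assumptions: Edmonds-Karp is our choice of maximum flow algorithm (the slides do not name one), capacity graphs are plain dicts from directed edges to capacities, and the networks `C1`, `C2` and the helpers `overlay`, `max_flow` and `flow_value` are hypothetical. Flows are stored skew-symmetrically, so the residual graph automatically includes the reverse arcs that the lemma relies on.

```python
from collections import defaultdict, deque


def overlay(*graphs):
    """Sum edge values (capacities or flows) across snapshots."""
    total = defaultdict(int)
    for g in graphs:
        for edge, value in g.items():
            total[edge] += value
    return dict(total)


def max_flow(cap, s, t, init=None):
    """Edmonds-Karp, optionally starting from a feasible flow `init`.

    Flows are skew-symmetric (f[(u, v)] == -f[(v, u)]); the residual capacity
    of (u, v) is cap[(u, v)] - f[(u, v)], which covers reverse arcs as well.
    Returns the final flow and the number of augmenting paths used."""
    adj = defaultdict(set)
    for u, v in cap:
        adj[u].add(v)
        adj[v].add(u)
    f = defaultdict(int, init or {})
    paths = 0
    while True:
        parent = {s: None}  # BFS for a shortest augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap.get((u, v), 0) - f[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return dict(f), paths
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap.get(e, 0) - f[e] for e in path)
        for u, v in path:
            f[(u, v)] += delta
            f[(v, u)] -= delta
        paths += 1


def flow_value(f, s):
    """Net flow leaving the source."""
    return sum(x for (u, _), x in f.items() if u == s)


# Two hypothetical companies' networks over the same cities.
C1 = {("s", "a"): 3, ("a", "t"): 2, ("s", "b"): 1, ("b", "t"): 4}
C2 = {("s", "a"): 1, ("a", "t"): 3, ("s", "b"): 2, ("b", "t"): 1}
F1, _ = max_flow(C1, "s", "t")
F2, _ = max_flow(C2, "s", "t")
C = overlay(C1, C2)
direct, _ = max_flow(C, "s", "t")                                # without attenuation
attenuated, _ = max_flow(C, "s", "t", init=overlay(F1, F2))      # with attenuation
```

On this toy input the attenuated run starts from the 5 units already routed by $F_1 + F_2$ and only has to find the remaining 2, reaching the same maximum flow of 7 as the direct run. Note that the overlaid flow (7) exceeds $|F_1| + |F_2|$ (5), so maximum flow itself is not distributive.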
![Page 30: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/30.jpg)
Constraint Pushing
Iceberg graph cube: partial materialization of only the cells satisfying some interestingness requirement
Push the constraints:
Anti-monotone, e.g., maximum flow $|f| \ge \delta_{|f|}$
Monotone, e.g., diameter $d \ge \delta_d$
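Pushing an anti-monotone constraint into cube computation can be sketched on a small two-dimensional (area, year) cube. This is a minimal sketch under our own assumptions: the measure is a summed collaboration frequency of one author pair rather than the deck's maximum flow (both only grow when snapshots are overlaid, so "measure ≥ δ" is anti-monotone in both cases), and `iceberg_cube` and the base cells are hypothetical.

```python
from itertools import product


def iceberg_cube(base, delta):
    """Materialize only the cells of an (area, year) cube whose measure >= delta.

    With non-negative frequencies the summed measure only grows when cells are
    rolled up, so if a cell fails the threshold, every more specific cell below
    it fails too and is never computed."""
    areas = sorted({a for a, _ in base})
    years = sorted({y for _, y in base})

    def measure(area, year):
        # "*" stands for "all values" of that dimension
        return sum(v for (a, y), v in base.items()
                   if area in ("*", a) and year in ("*", y))

    kept = {}
    apex = measure("*", "*")
    if apex < delta:
        return kept  # no cell can qualify
    kept[("*", "*")] = apex
    for cell in [(a, "*") for a in areas] + [("*", y) for y in years]:
        m = measure(*cell)
        if m >= delta:
            kept[cell] = m
    for a, y in product(areas, years):
        # prune: compute a base cell only if both of its parents qualified
        if (a, "*") in kept and ("*", y) in kept:
            m = measure(a, y)
            if m >= delta:
                kept[(a, y)] = m
    return kept


# Hypothetical base cells: (area, year) -> collaboration frequency.
base = {("DB", 2004): 3, ("DB", 2005): 1, ("DM", 2004): 0, ("DM", 2005): 1}
cube = iceberg_cube(base, delta=2)
```

With δ = 2, the cell (DM, *) fails, so neither (DM, 2004) nor (DM, 2005) is ever evaluated.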
![Page 31: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/31.jpg)
Outline
Motivation
Framework
Efficient Computation
Experiments
Conclusion
![Page 32: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/32.jpg)
OLAP a Bibliographic Network
We get the coauthorship data from DBLP
Measure: information centrality
Two Info-Dims:
Area: Database (DB): PODS/SIGMOD/VLDB/ICDE/EDBT; Data Mining (DM): ICDM/SDM/KDD/PKDD; Information Retrieval (IR): SIGIR/WWW/CIKM
Time
![Page 33: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/33.jpg)
OLAP a Bibliographic Network
![Page 34: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/34.jpg)
Efficiency
A test that computes maximum flow as the measure
Synthetically generated flow networks (details in the paper), with each “snapshot” representing an individual player in the transportation industry
Two algorithms that, like the Multi-Way method, calculate low-level cells before merging them into high-level ones: one takes advantage of the attenuation heuristic, the other does not
![Page 35: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/35.jpg)
Efficiency
![Page 36: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/36.jpg)
Outline
Motivation
Framework
Efficient Computation
Experiments
Conclusion
![Page 37: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/37.jpg)
Conclusion
We propose a Graph OLAP framework to perform multi-dimensional, multi-level analysis on network data
The measure is an aggregated graph
Informational/topological dimensions lead to I-OLAP and T-OLAP
![Page 38: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/38.jpg)
Conclusion
Focusing mainly on I-OLAP, we discuss how a graph cube can be efficiently computed and materialized:
Measure classification: distributive, algebraic, holistic
Optimizations: localization, attenuation
Constraint pushing
![Page 39: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/39.jpg)
Future Work
Technical issues for T-OLAP
Selective drilling and discovery-driven
InfoNet-OLAP
![Page 40: Graph OLAP: Towards Online Analytical Processing on Graphs](https://reader036.vdocuments.us/reader036/viewer/2022062411/5681685a550346895dde9150/html5/thumbnails/40.jpg)
Thank You!