April 21, 2023 1
Data Mining: Concepts and Techniques
Chapter 9: Graph Mining and Social Network Analysis
Li Xiong
Slides credits: Jiawei Han and Micheline Kamber
Graph Mining and Social Network Analysis
Graph mining
  Frequent subgraph mining
Social network analysis
  Social network
  Social network analysis at different levels
  Link analysis
April 21, 2023, Mining and Searching Graphs in Graph Databases
Graph Mining
Methods for Mining Frequent Subgraphs
Applications:
  Graph Indexing
  Similarity Search
  Classification and Clustering
Summary
Why Graph Mining?
Graphs are ubiquitous
  Chemical compounds (cheminformatics)
  Protein structures, biological pathways/networks (bioinformatics)
  Program control flow, traffic flow, and workflow analysis
  XML databases, Web, and social network analysis
Graph is a general model
  Trees, lattices, sequences, and items are degenerate graphs
Diversity of graphs
  Directed vs. undirected, labeled vs. unlabeled (edges & vertices), weighted, with angles & geometry (topological vs. 2-D/3-D)
Complexity of algorithms: many problems are of high complexity
Graph, Graph, Everywhere
[Figure panels: Aspirin; Yeast protein interaction network (from H. Jeong et al., Nature 411, 41 (2001)); Internet; Co-author network]
Graph Pattern Mining
Frequent subgraph mining
  Finding frequent subgraphs within a single graph
  Finding frequent (sub)graphs in a set of graphs: support (occurrence frequency) no less than a minimum support threshold
Applications of graph pattern mining
  Mining biochemical structures, program control flow analysis, XML structures or Web communities
  Building blocks for graph classification, clustering, compression, comparison, and correlation analysis
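The support definition above can be illustrated on the smallest possible patterns, single labeled edges. The following is a minimal sketch, not a full subgraph miner: the function name `frequent_edges`, the edge-list representation, and the toy database are all assumptions made for illustration, and larger patterns would additionally need subgraph isomorphism tests.

```python
from collections import Counter

def frequent_edges(graphs, min_support):
    """Return single-edge patterns contained in at least min_support graphs.

    Each graph is a list of undirected edges (label_u, label_v).
    Support is counted once per graph, not once per occurrence.
    """
    support = Counter()
    for edges in graphs:
        # Sort the two labels so (O, C) and (C, O) are the same pattern,
        # and use a set so a pattern counts once per graph.
        patterns = {tuple(sorted(edge)) for edge in edges}
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}

# Hypothetical toy database of three small labeled graphs.
graphs = [
    [("C", "O"), ("C", "C"), ("C", "N")],
    [("C", "C"), ("O", "C")],
    [("C", "N"), ("N", "S")],
]
result = frequent_edges(graphs, min_support=2)
print(result)
```

With a minimum support of 2, the N-S edge is dropped because it occurs in only one graph.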
Example: Frequent Subgraph Mining in Chemical Compounds
[Figure: graph dataset with compounds (A), (B), (C); frequent patterns (1), (2) found with minimum support 2]
Graph Mining Algorithms
Finding interesting and frequent substructures in a single graph
  SUBDUE
Finding frequent patterns in a set of independent graphs
  Apriori-based approach
  Pattern-growth approach
SUBDUE (Holder et al., KDD’94)
Problem
  Finding “interesting” and repetitive substructures (connected subgraphs) in data represented as a graph
Basic idea
  Minimum description length (MDL) principle
  Beam search algorithm
    Start with best single vertices
    Expand best substructures with a new edge
    Substructures are evaluated based on their ability to compress input graphs
Minimum Description Length (MDL)
Minimum description length (MDL) principle
  A formalization of Occam’s Razor
  Best hypothesis minimizes description length of the data (largest compression)
Graph substructure discovery based on MDL
  Description length (DL): represent vertices and adjacency matrix
  Graph compression: replace substructure instances with pointers
  Find best substructure S in G that minimizes DL(S) + DL(G|S)
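[Figure (Holder et al.): Input Database (G) with objects R1, C1, T1 to T4 and S1 to S4; Substructure (S1), built from a triangle and a square; Compressed Database (G|S1), in which R1 and C1 remain and the triangle/square instances are replaced by S1]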
Beam Search Algorithm
Beam search
  An optimization of best-first search
  Breadth-first search with a predetermined number of paths kept as candidates (beam width)
Subgraph discovery based on beam search
  Start with best single vertices
  Expand best substructures with a new edge
  Substructures are evaluated based on their ability to compress input graphs (minimize description length)
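The beam-search loop described above can be sketched generically. This is only an illustration under stated assumptions: `beam_search`, `expand`, and `score` are hypothetical names, and the toy problem (growing tuples of digits scored by their sum) stands in for SUBDUE's real expansion step (adding an edge) and its compression-based evaluation.

```python
def beam_search(start_states, expand, score, beam_width, max_steps):
    """Generic beam search: keep only the beam_width best candidates per level."""
    beam = sorted(start_states, key=score, reverse=True)[:beam_width]
    best = beam[0]
    for _ in range(max_steps):
        # Expand every state on the beam by one step.
        candidates = [child for state in beam for child in expand(state)]
        if not candidates:
            break
        # Prune: only the top beam_width candidates survive to the next level.
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]
    return best

# Toy stand-in for substructure growth: a state is a tuple of digits,
# expansion appends one digit (up to length 3), and the score is the sum.
def expand(state):
    return [state + (d,) for d in (1, 2, 3)] if len(state) < 3 else []

best = beam_search([(1,), (2,), (3,)], expand, score=sum, beam_width=2, max_steps=10)
print(best)
```

A small beam width trades completeness for speed: candidates pruned at one level are never revisited.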
Algorithm
1. Create substructure for each unique vertex label
Substructures (S): triangle (4), square (4), circle (1), rectangle (1)
[Figure: Input Database (G) with objects R1, C1, T1 to T4 and S1 to S4, and the same database in graph form, with circle, rectangle, triangle and square vertices connected by "on" edges]
Algorithm (cont.)
2. Expand best substructures by an edge or edge + neighboring vertex
[Figure (Holder et al., SRL Workshop): candidate substructures (S), each a pair of shapes joined by an "on" edge (triangle/square, square/rectangle, triangle/rectangle, rectangle/circle), shown next to the input graph]
Algorithm (cont.)
3. Keep best beam-width substructures on queue
4. Terminate when queue is empty or #discovered substructures >= limit
5. Compress graph with hierarchical description
Frequent Subgraph Mining Approaches
Problem: finding frequent subgraphs in a set of graphs
Apriori-based approach
  AGM: Inokuchi, et al. (PKDD’00)
  FSG: Kuramochi and Karypis (ICDM’01)
  PATH#: Vanetik and Gudes (ICDM’02, ICDM’04)
  FFSM: Huan, et al. (ICDM’03)
Pattern growth approach
  MoFa: Borgelt and Berthold (ICDM’02)
  gSpan: Yan and Han (ICDM’02)
  Gaston: Nijssen and Kok (KDD’04)
Closed pattern mining
  CloseGraph: Yan & Han (KDD’03)
Apriori-Based Approach
[Figure: frequent subgraphs G’ and G’’ are combined (JOIN) into candidate subgraphs G, G1, G2, …, Gn with an extra vertex or edge]
Level-wise algorithm: building candidate subgraphs from small frequent subgraphs
Apriori-Based Search
AGM (Apriori-based Graph Mining), Inokuchi, et al., PKDD’00
  generates new graphs with one more node
FSG (Frequent SubGraph mining), Kuramochi and Karypis, ICDM’01
  generates new graphs with one more edge
Pattern Growth Method
[Figure: a k-edge graph G is extended to (k+1)-edge graphs G1, G2, …, Gn and then to (k+2)-edge graphs; one of the extensions is marked as a duplicate graph]
Depth-based search and right-most extension
gSpan (Yan and Han, ICDM’02)
Graph Mining
Methods for Mining Frequent Subgraphs
Applications:
  Classification and Clustering
  Graph Indexing
  Similarity Search
Using Graph Patterns
Similarity measures based on graph patterns
  Feature-based similarity measure
    Each graph is represented as a feature vector
    Frequent subgraphs can be used as features
    Vector distance
  Structure-based similarity measure
    Maximal common subgraph
    Graph edit distance: insertion, deletion, and relabel
Frequent and discriminative subgraphs are high-quality indexing features
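As a rough illustration of the feature-based measure above, each graph can be turned into a 0/1 vector recording which frequent subgraphs it contains, and graphs can then be compared with an ordinary vector distance. The pattern names, the three toy graphs, and the choice of Euclidean distance are assumptions made for this sketch; deciding which patterns a graph contains is taken as given.

```python
import math

def feature_vector(contained_patterns, features):
    """Binary vector: 1 if the graph contains the feature pattern, else 0."""
    return [1 if f in contained_patterns else 0 for f in features]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical frequent subgraphs used as features, named by their edge labels.
features = ["C-C", "C-O", "C-N", "N-S"]

# Hypothetical graphs, each described by the feature patterns it contains.
v1 = feature_vector({"C-C", "C-O", "C-N"}, features)
v2 = feature_vector({"C-C", "C-O"}, features)
v3 = feature_vector({"C-N", "N-S"}, features)

print(euclidean(v1, v2), euclidean(v2, v3))
```

Graphs 1 and 2 differ in one feature and so are closer than graphs 2 and 3, which share none.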
Social Network Analysis
Social network
Different levels of social network analysis
Common measures and methods for social network analysis
Link analysis
Social Network
Social network: a social structure consisting of nodes and ties
Nodes are the individual actors within the network
  May be of different kinds
  May have attributes, labels or classes
Ties are the relationships between the actors
  May be of different kinds
  Links may have attributes and may be directed or undirected
Homogeneous networks
  Single object type and single link type
  Single-mode social networks (e.g., friends)
  WWW: a collection of linked Web pages
Heterogeneous networks
  Multiple object and link types
  Medical network: patients, doctors, disease, contacts, treatments
  Bibliographic network: publications, authors, venues
Small World Phenomenon
Number of degrees of separation in actual social networks?
Six degrees of separation: everyone is, on average, six "steps" away from any other person on Earth
Empirical studies
  Michael Gurevich, 1961: US population linked by 2 intermediaries
  Duncan Watts, 2001: email delivery on the Internet, average number of intermediaries is 6
  Leskovec and Horvitz, 2007: instant messages, average path length is 6.6
Six Degrees of Kevin Bacon
Vertices: actors and actresses
Edge between u and v if they appeared in a film together
Is Kevin Bacon the most connected actor? NO!

Rank  Name               Average distance  # of movies  # of links
1     Rod Steiger        2.537527          112          2562
2     Donald Pleasence   2.542376          180          2874
3     Martin Sheen       2.551210          136          3501
4     Christopher Lee    2.552497          201          2993
5     Robert Mitchum     2.557181          136          2905
6     Charlton Heston    2.566284          104          2552
7     Eddie Albert       2.567036          112          3333
8     Robert Vaughn      2.570193          126          2761
9     Donald Sutherland  2.577880          107          2865
10    John Gielgud       2.578980          122          2942
11    Anthony Quinn      2.579750          146          2978
12    James Earl Jones   2.584440          112          3787
…
876   Kevin Bacon        2.786981          46           1811

Kevin Bacon: No. of movies: 46; No. of actors: 1811; Average separation: 2.79
[Figure: Rod Steiger (#1), Donald Pleasence (#2), Martin Sheen (#3), Kevin Bacon (#876)]
Social Network Analysis
Actor level: centrality, prestige, and roles such as isolates, liaisons, bridges, etc.
Dyadic level: distance and reachability, structural and other notions of equivalence, and tendencies toward reciprocity
Triadic level: balance and transitivity
Subset level: cliques, cohesive subgroups, components
Network level: connectedness, diameter, centralization, density, prestige, etc.
April 21, 2023, Social network analysis: methods and applications
Measures in Social Network Analysis – Actor level
Non-directional graphs
  Degree centrality
    The number of direct connections a node has
    A 'connector' or 'hub' in the network
  Betweenness centrality
    The degree to which an individual lies between other individuals in the network
    An intermediary; liaison; bridge
  Closeness centrality
    The degree to which an individual is near all other individuals in a network (directly or indirectly)
  Eigenvector centrality
    A measure of the relative importance of a node
    Based on the principle that connections to nodes having a high score contribute more to the current node
Directional graphs
  Prestige: measures the degree of incoming ties
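Two of the actor-level measures above are simple enough to sketch directly. The code below assumes an undirected, connected graph stored as an adjacency dictionary, and it uses one common normalisation of closeness, (n - 1) divided by the sum of shortest-path distances; the slide itself does not fix a formula, so that choice, the function names, and the five-node example graph are assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(adj):
    """Number of direct connections of each node."""
    return {u: len(neighbors) for u, neighbors in adj.items()}

def closeness_centrality(adj):
    """(n - 1) / (sum of distances to all other nodes); assumes a connected graph."""
    n = len(adj)
    return {u: (n - 1) / sum(bfs_distances(adj, u).values()) for u in adj}

# Hypothetical undirected example graph: A-B, B-C, B-D, C-D, D-E.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
print(degree, closeness)
```

In this example B and D are both the best-connected and the most central nodes, while the leaf A has the lowest closeness.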
Actor Centrality Example
[Figure: example network; source: OrgNet.com]
Measures in Social Network Analysis – Dyadic, Triadic and Subset Level
Path length
  The distances between pairs of nodes in the network
Structural equivalence
  Extent to which actors have a common set of linkages to other actors in the system
Clustering coefficient
  A measure of the likelihood that two associates of a node are associates themselves
  Cliquishness of u’s neighborhood
Cohesion
  The degree to which actors are connected directly to each other by cohesive bonds
  Cliques
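The clustering coefficient of a node u can be computed as the fraction of pairs of u's neighbors that are themselves connected. This is a minimal sketch for an undirected graph; the function name, the convention of returning 0.0 for nodes with fewer than two neighbors, and the example graph are assumptions.

```python
from itertools import combinations

def clustering_coefficient(adj, u):
    """Fraction of pairs of u's neighbors that are linked to each other."""
    neighbors = adj[u]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention chosen here: no pairs of neighbors to compare
    linked = sum(1 for v, w in combinations(neighbors, 2) if w in adj[v])
    return linked / (k * (k - 1) / 2)

# Hypothetical undirected example graph: A-B, B-C, B-D, C-D, D-E.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
print({u: clustering_coefficient(adj, u) for u in adj})
```

Node C scores 1.0 because its two neighbors B and D are linked, whereas only one of the three neighbor pairs of B is linked.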
Measures in Social Network Analysis – Network Level
Network centralization
  The difference between number of links for each node
  Centralized vs. decentralized networks
Network density
  Proportion of ties in a network relative to the total number possible
  Sparse vs. dense networks
Average path length
  Average of distances between all pairs of nodes
Reach
  The degree to which any member of a network can reach other members of the network
Structural cohesion
  The minimum number of members who, if removed from a group, would disconnect the group
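Density and average path length follow directly from their definitions above. The sketch below assumes an undirected, connected graph without self-loops, so the number of possible ties is n(n - 1)/2; the function names and the example graph are illustrative assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def density(adj):
    """Existing ties divided by the n(n - 1)/2 possible ties."""
    n = len(adj)
    ties = sum(len(neighbors) for neighbors in adj.values()) / 2
    return ties / (n * (n - 1) / 2)

def average_path_length(adj):
    """Mean shortest-path distance over all unordered node pairs (connected graph)."""
    nodes = sorted(adj)
    total = 0
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        total += sum(dist[v] for v in nodes[i + 1:])
    n = len(nodes)
    return total / (n * (n - 1) / 2)

# Hypothetical undirected example graph: A-B, B-C, B-D, C-D, D-E.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
print(density(adj), average_path_length(adj))
```

Here 5 of the 10 possible ties exist, and the ten pairwise distances sum to 16.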
Another Taxonomy of Link Mining Tasks
Object-Related Tasks
  Link-based object ranking
  Link-based object classification
  Object clustering (group detection)
  Object identification (entity resolution)
Link-Related Tasks
  Link prediction
Graph-Related Tasks
  Subgraph discovery
  Graph classification
  Generative model for graphs
Social Network Applications
Link-based object ranking for WWW (actor-level analysis)
  PageRank
  HITS
Influence and diffusion
Link-Based Object Ranking (LBR)
Exploit the link structure of a graph to order or prioritize the set of objects within the graph
  Focused on graphs with single object type and single link type
Focus of link analysis community
Algorithms
  PageRank
  HITS
PageRank: Ranking web pages (Brin & Page’98)
Intuition
  Web pages are not equally “important”
    www.joe-schmoe.com vs. www.stanford.edu
  Links as citations: a page cited often is more important
    www.stanford.edu has 23,400 inlinks
    www.joe-schmoe.com has 1 inlink
  Are all links equal?
    Recursive model: being cited by a highly cited paper counts a lot…
Eigenvector prestige measure
  Each link’s vote is proportional to the importance of its source page
  If page P with importance x has n outlinks, each link gets x/n votes
  Page P’s own importance is the sum of the votes on its inlinks
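Simple Recursive Flow Model
[Figure: three-page web graph with Yahoo (y), Amazon (a) and M’soft (m); Yahoo sends y/2 to itself and y/2 to Amazon, Amazon sends a/2 to Yahoo and a/2 to M’soft, M’soft sends m to Amazon]
Flow equations:
  $y = y/2 + a/2$
  $a = y/2 + m$
  $m = a/2$
Solving the equations with the constraint $y + a + m = 1$: $y = 2/5$, $a = 2/5$, $m = 1/5$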
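The stated solution can be checked by substitution. The steps below are a worked derivation from the three flow equations and the normalisation constraint, using only the symbols y, a and m from the slide:

```latex
\begin{align*}
m &= \tfrac{a}{2} \\
a &= \tfrac{y}{2} + m = \tfrac{y}{2} + \tfrac{a}{2}
   \;\Rightarrow\; a = y \\
y + a + m &= a + a + \tfrac{a}{2} = \tfrac{5}{2}\,a = 1
   \;\Rightarrow\; a = \tfrac{2}{5} \\
y &= a = \tfrac{2}{5}, \qquad m = \tfrac{a}{2} = \tfrac{1}{5}
\end{align*}
```

The remaining equation, y = y/2 + a/2, holds automatically once a = y, which is why the normalisation constraint is needed to pin down a unique solution.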
Matrix formulation
Web link matrix $M$: one row and one column per web page
  Suppose page $j$ has $n$ outlinks; if $j \to i$, then $M_{ij} = 1/n$, else $M_{ij} = 0$
  $M$ is a column stochastic matrix: columns sum to 1
Rank vector $r$: one entry per web page
  $r_i$ is the importance score of page $i$
  $|r| = 1$
Flow equation: $r = Mr$
  Rank vector is an eigenvector of the web matrix
[Figure: in $Mr = r$, entry $M_{ij}$ multiplies $r_j$ and contributes to $r_i$]
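Matrix formulation Example
Web graph: Yahoo (y), Amazon (a), M’soft (m); rows and columns of $M$ are ordered y, a, m

$$M = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1 \\ 0 & 1/2 & 0 \end{pmatrix}$$

Flow equations: $y = y/2 + a/2$, $a = y/2 + m$, $m = a/2$

$r = Mr$:

$$\begin{pmatrix} y \\ a \\ m \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1 \\ 0 & 1/2 & 0 \end{pmatrix} \begin{pmatrix} y \\ a \\ m \end{pmatrix}$$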
Power Iteration method
Simple iterative scheme (aka relaxation)
Suppose there are $N$ web pages
Initialize: $r^{(0)} = [1/N, \ldots, 1/N]^T$
Iterate: $r^{(k+1)} = M r^{(k)}$
Stop when $|r^{(k+1)} - r^{(k)}|_1 < \varepsilon$
  $|x|_1 = \sum_{1 \le i \le N} |x_i|$ is the L1 norm
  Can use any other vector norm, e.g., Euclidean
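Power Iteration Example
Web graph: Yahoo (y), Amazon (a), M’soft (m); rows and columns of $M$ are ordered y, a, m

$$M = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1 \\ 0 & 1/2 & 0 \end{pmatrix}$$

Successive iterates:

$$\begin{pmatrix} y \\ a \\ m \end{pmatrix} = \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix},\; \begin{pmatrix} 1/3 \\ 1/2 \\ 1/6 \end{pmatrix},\; \begin{pmatrix} 5/12 \\ 1/3 \\ 1/4 \end{pmatrix},\; \begin{pmatrix} 3/8 \\ 11/24 \\ 1/6 \end{pmatrix},\; \ldots,\; \begin{pmatrix} 2/5 \\ 2/5 \\ 1/5 \end{pmatrix}$$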
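The power iteration scheme can be run on the Yahoo/Amazon/M'soft matrix in a few lines. This is a minimal pure-Python sketch; the function name `power_iteration`, the tolerance `eps`, and the iteration cap `max_iter` are assumptions chosen for illustration.

```python
def power_iteration(M, eps=1e-10, max_iter=1000):
    """Iterate r <- M r from the uniform vector until the L1 change is below eps."""
    n = len(M)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
        if sum(abs(x - y) for x, y in zip(r_next, r)) < eps:
            return r_next
        r = r_next
    return r

# Column-stochastic link matrix; rows and columns ordered Yahoo, Amazon, M'soft.
M = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 1.0],
    [0.0, 0.5, 0.0],
]
r = power_iteration(M)
print(r)
```

The result approaches (2/5, 2/5, 1/5), matching the solution of the flow equations.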
Random Walk Interpretation
Imagine a random web surfer
  At any time $t$, surfer is on some page $P$
  At time $t+1$, the surfer follows an outlink from $P$ uniformly at random
  Ends up on some page $Q$ linked from $P$
  Process repeats indefinitely
$p(t)$ is the probability distribution whose $i$th component is the probability that the surfer is at page $i$ at time $t$
The stationary distribution
Where is the surfer at time $t+1$?
  $p(t+1) = Mp(t)$
Suppose the random walk reaches a state such that $p(t+1) = Mp(t) = p(t)$
  Then $p(t)$ is a stationary distribution for the random walk
Our rank vector $r$ satisfies $r = Mr$
Existence and Uniqueness of the Solution
Theory of random walks (aka Markov processes): for graphs that satisfy certain conditions, the stationary distribution is unique and will eventually be reached no matter what the initial probability distribution at time t = 0.
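Spider traps
A group of pages is a spider trap if there are no links from within the group to outside the group
Spider traps violate the conditions needed for the random walk theorem
Example: Yahoo (y), Amazon (a), M’soft (m), where M’soft links only to itself; rows and columns of $M$ are ordered y, a, m

$$M = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 1 \end{pmatrix}$$

Successive iterates:

$$\begin{pmatrix} y \\ a \\ m \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},\; \begin{pmatrix} 1 \\ 1/2 \\ 3/2 \end{pmatrix},\; \begin{pmatrix} 3/4 \\ 1/2 \\ 7/4 \end{pmatrix},\; \begin{pmatrix} 5/8 \\ 3/8 \\ 2 \end{pmatrix},\; \ldots,\; \begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix}$$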
Random teleports
At each time step, the random surfer has two options:
  With probability $\beta$, follow a link at random
  With probability $1-\beta$, jump to some page uniformly at random
  Common values for $\beta$ are in the range 0.8 to 0.9
Surfer will teleport out of spider trap within a few time steps
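Random teleports Example ($\beta = 0.8$)
Web graph: Yahoo (y), Amazon (a), M’soft (m); rows and columns are ordered y, a, m

$$A = 0.8 \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 1 \end{pmatrix} + 0.2 \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix} = \begin{pmatrix} 7/15 & 7/15 & 1/15 \\ 7/15 & 1/15 & 1/15 \\ 1/15 & 7/15 & 13/15 \end{pmatrix}$$

Successive iterates:

$$\begin{pmatrix} y \\ a \\ m \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},\; \begin{pmatrix} 1.00 \\ 0.60 \\ 1.40 \end{pmatrix},\; \begin{pmatrix} 0.84 \\ 0.60 \\ 1.56 \end{pmatrix},\; \begin{pmatrix} 0.776 \\ 0.536 \\ 1.688 \end{pmatrix},\; \ldots,\; \begin{pmatrix} 7/11 \\ 5/11 \\ 21/11 \end{pmatrix}$$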
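Matrix formulation
Matrix $A$
  $A_{ij} = \beta M_{ij} + (1-\beta)/N$, where $\beta$ is the probability of following a link and $N$ is the number of pages
  $M_{ij} = 1/|O(j)|$ when $j \to i$ and $M_{ij} = 0$ otherwise, where $O(j)$ is the set of outlinks of page $j$
  Verify that $A$ is a stochastic matrix
The page rank vector $r$ is the principal eigenvector of this matrix, satisfying $r = Ar$
Equivalently, $r$ is the stationary distribution of the random walk with teleports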
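The teleport formulation can be tried on the spider-trap graph from the earlier example (M'soft linking only to itself). The sketch below is a pure-Python illustration, not a production implementation: the function name `pagerank` and the parameters `beta` (the probability of following a link, 0.8 in the example), `eps` and `max_iter` are assumptions, and the rank vector is normalised to sum to 1 rather than 3.

```python
def pagerank(M, beta=0.8, eps=1e-12, max_iter=1000):
    """Power iteration on A = beta * M + (1 - beta) / N, starting from uniform r."""
    n = len(M)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        # Because r sums to 1, the teleport term adds (1 - beta) / n to every entry.
        r_next = [
            beta * sum(M[i][j] * r[j] for j in range(n)) + (1 - beta) / n
            for i in range(n)
        ]
        if sum(abs(x - y) for x, y in zip(r_next, r)) < eps:
            return r_next
        r = r_next
    return r

# Spider-trap matrix; rows and columns ordered Yahoo, Amazon, M'soft.
M = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 1.0],
]
r = pagerank(M, beta=0.8)
print(r)
```

Multiplying the result by 3 recovers the limit (7/11, 5/11, 21/11) from the example. Without teleports (`beta=1.0`) all of the rank drains into M'soft.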
HITS: Capturing Authorities & Hubs (Kleinberg’98)
Intuitions
  Pages that are widely cited are good authorities
  Pages that cite many other pages are good hubs
HITS (Hypertext-Induced Topic Selection)
  1. Authorities are pages containing useful information and linked by hubs
     course home pages
     home pages of auto manufacturers
  2. Hubs are pages that link to authorities
     course bulletin
     list of US auto manufacturers
Iterative reinforcement …
[Figure: hubs linking to authorities]
Matrix Formulation
Transition (adjacency) matrix $A$
  $A[i, j] = 1$ if page $i$ links to page $j$, 0 if not
The hub score vector $h$: score is proportional to the sum of the authority scores of the pages it links to
  $h = \lambda A a$
  Constant $\lambda$ is a scale factor
The authority score vector $a$: score is proportional to the sum of the hub scores of the pages it is linked from
  $a = \mu A^T h$
  Constant $\mu$ is a scale factor
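Transition Matrix Example
Web graph: Yahoo (y), Amazon (a), M’soft (m); rows and columns of $A$ are ordered y, a, m

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$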
Iterative algorithm
  Initialize $h$, $a$ to all 1’s
  $h = Aa$
  Scale $h$ so that its max entry is 1.0
  $a = A^T h$
  Scale $a$ so that its max entry is 1.0
  Continue until $h$, $a$ converge
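Iterative Algorithm Example

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad A^T = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

Successive values, each scaled so that the max entry is 1:

a(yahoo)  =  1   1    1     1     . . .  1
a(amazon) =  1   1    4/5   0.75  . . .  0.732
a(m’soft) =  1   1    1     1     . . .  1

h(yahoo)  =  1   1    1     1     . . .  1.000
h(amazon) =  1   2/3  0.71  0.73  . . .  0.732
h(m’soft) =  1   1/3  0.29  0.27  . . .  0.268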
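The hub and authority updates can be reproduced with a short pure-Python loop on the Yahoo/Amazon/M'soft adjacency matrix. This is a sketch under stated assumptions: the names `hits` and `scale`, the tolerance `eps`, and the iteration cap `max_iter` are illustrative, and the loop updates h before a in each round, as in the iterative algorithm listed earlier.

```python
def scale(v):
    """Scale a vector so that its largest entry is 1.0."""
    top = max(v)
    return [x / top for x in v]

def hits(A, eps=1e-12, max_iter=1000):
    """Alternate h = A a and a = A^T h, rescaling after each step."""
    n = len(A)
    h = [1.0] * n
    a = [1.0] * n
    for _ in range(max_iter):
        # A hub's score sums the authority scores of the pages it links to.
        h_new = scale([sum(A[i][j] * a[j] for j in range(n)) for i in range(n)])
        # An authority's score sums the hub scores of the pages linking to it.
        a_new = scale([sum(A[i][j] * h_new[i] for i in range(n)) for j in range(n)])
        change = sum(abs(x - y) for x, y in zip(h_new + a_new, h + a))
        h, a = h_new, a_new
        if change < eps:
            break
    return h, a

# A[i][j] = 1 if page i links to page j; order is Yahoo, Amazon, M'soft.
A = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
]
h, a = hits(A)
print(h, a)
```

The hub scores converge to about (1, 0.732, 0.268) and the authority scores to about (1, 0.732, 1), in line with the limits shown in the example.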
Existence and Uniqueness of the Solution
$h = \lambda A a$
$a = \mu A^T h$
$h = \lambda\mu A A^T h$
$a = \lambda\mu A^T A a$
Under reasonable assumptions about $A$, the dual iterative algorithm converges to vectors $h^*$ and $a^*$ such that:
• $h^*$ is the principal eigenvector of the matrix $AA^T$
• $a^*$ is the principal eigenvector of the matrix $A^TA$
PageRank and HITS
Similarities
  Iterative algorithm based on the linkage of the documents on the web
  Same problem: what is the value of a link from S to D?
Different models
  PageRank: depends on the links into S
  HITS: depends on the value of the other links out of S
The destinies of PageRank and HITS post-1998
  PageRank: trademark of Google
  HITS: not commonly used by search engines (Ask.com?)
Social Network Analysis Applications
Link-based object ranking for WWW (actor-level analysis)
  PageRank
  HITS
Influence and diffusion
Influence and Diffusion
[Figure: CDC: Spread of Airborne Disease; source: OrgNet.com]
Coming Up
Paper presentations:
  Knowledge discovery from transportation network data
  Maximizing the spread of influence through a social network
  Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography
References (1)
T. Asai, et al. “Efficient substructure discovery from large semi-structured data”, SDM'02
C. Borgelt and M. R. Berthold, “Mining molecular fragments: Finding relevant
substructures of molecules”, ICDM'02
D. Cai, Z. Shao, X. He, X. Yan, and J. Han, “Community Mining from Multi-Relational
Networks”, PKDD'05.
M. Deshpande, M. Kuramochi, and G. Karypis, “Frequent Sub-structure Based
Approaches for Classifying Chemical Compounds”, ICDM 2003
M. Deshpande, M. Kuramochi, and G. Karypis. “Automated approaches for classifying
structures”, BIOKDD'02
L. Dehaspe, H. Toivonen, and R. King. “Finding frequent substructures in chemical
compounds”, KDD'98
C. Faloutsos, K. McCurley, and A. Tomkins, “Fast Discovery of Connection Subgraphs”,
KDD'04
H. Fröhlich, J. Wegner, F. Sieker, and A. Zell, “Optimal Assignment Kernels For Attributed
Molecular Graphs”, ICML’05
T. Gärtner, P. Flach, and S. Wrobel, “On Graph Kernels: Hardness Results and Efficient
Alternatives”, COLT/Kernel’03
References (2)
L. Holder, D. Cook, and S. Djoko. “Substructure discovery in the subdue
system”, KDD'94
J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A. Tropsha.
“Mining spatial motifs from protein structure graphs”, RECOMB’04
J. Huan, W. Wang, and J. Prins. “Efficient mining of frequent subgraph in the
presence of isomorphism”, ICDM'03
H. Hu, X. Yan, Yu, J. Han and X. J. Zhou, “Mining Coherent Dense Subgraphs
across Massive Biological Networks for Functional Discovery”, ISMB'05
A. Inokuchi, T. Washio, and H. Motoda. “An apriori-based algorithm for mining
frequent substructures from graph data”, PKDD'00
C. James, D. Weininger, and J. Delany. “Daylight Theory Manual Daylight
Version 4.82”. Daylight Chemical Information Systems, Inc., 2003.
G. Jeh, and J. Widom, “Mining the Space of Graph Properties”, KDD'04
H. Kashima, K. Tsuda, and A. Inokuchi, “Marginalized Kernels Between
Labeled Graphs”, ICML’03
References (3)
M. Koyuturk, A. Grama, and W. Szpankowski. “An efficient algorithm for detecting
frequent subgraphs in biological networks”, Bioinformatics, 20:I200--I207, 2004.
T. Kudo, E. Maeda, and Y. Matsumoto, “An Application of Boosting to Graph
Classification”, NIPS’04
M. Kuramochi and G. Karypis. “Frequent subgraph discovery”, ICDM'01
M. Kuramochi and G. Karypis, “GREW: A Scalable Frequent Subgraph Discovery
Algorithm”, ICDM’04
C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu, “Mining Behavior Graphs for ‘Backtrace’ of
Noncrashing Bugs”, SDM'05
P. Mahé, N. Ueda, T. Akutsu, J. Perret, and J. Vert, “Extensions of Marginalized Graph
Kernels”, ICML’04
B. McKay. Practical graph isomorphism. Congressus Numerantium, 30:45--87, 1981.
S. Nijssen and J. Kok. A quickstart in frequent structure mining can make a difference.
KDD'04
J. Prins, J. Yang, J. Huan, and W. Wang. “Spin: Mining maximal frequent subgraphs
from graph databases”. KDD'04
References (4)
D. Shasha, J. T.-L. Wang, and R. Giugno. “Algorithmics and applications of tree and graph searching”, PODS'02
J. R. Ullmann. “An algorithm for subgraph isomorphism”, J. ACM, 23:31--42, 1976.
N. Vanetik, E. Gudes, and S. E. Shimony. “Computing frequent graph patterns from semistructured data”, ICDM'02
C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi. “Scalable mining of large disk-based graph databases”, KDD'04
T. Washio and H. Motoda, “State of the art of graph-based data mining”, SIGKDD Explorations, 5:59-68, 2003
X. Yan and J. Han, “gSpan: Graph-Based Substructure Pattern Mining”, ICDM'02
X. Yan and J. Han, “CloseGraph: Mining Closed Frequent Graph Patterns”, KDD'03
X. Yan, P. S. Yu, and J. Han, “Graph Indexing: A Frequent Structure-based Approach”, SIGMOD'04
X. Yan, X. J. Zhou, and J. Han, “Mining Closed Relational Graphs with Connectivity Constraints”, KDD'05
X. Yan, P. S. Yu, and J. Han, “Substructure Similarity Search in Graph Databases”, SIGMOD'05
X. Yan, F. Zhu, J. Han, and P. S. Yu, “Searching Substructures with Superimposed Distance”, ICDE'06
M. J. Zaki. “Efficiently mining frequent trees in a forest”, KDD'02
Ref: Mining on Social Networks
D. Liben-Nowell and J. Kleinberg. The Link Prediction Problem for Social Networks. CIKM’03
P. Domingos and M. Richardson, Mining the Network Value of Customers. KDD’01
M. Richardson and P. Domingos, Mining Knowledge-Sharing Sites for Viral Marketing. KDD’02
D. Kempe, J. Kleinberg, and E. Tardos, Maximizing the Spread of Influence through a Social Network. KDD’03.
P. Domingos, Mining Social Networks for Viral Marketing. IEEE Intelligent Systems, 20(1), 80-82, 2005.
S. Brin and L. Page, The anatomy of a large scale hypertextual Web search engine. WWW7.
S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, Mining the link structure of the World Wide Web. IEEE Computer’99
D. Cai, X. He, J. Wen, and W. Ma, Block-level Link Analysis. SIGIR'2004.