A Game Theoretic Framework for Heterogeneous Information Network Clustering


Upload: faris-alqadah

Post on 24-May-2015


Page 1: A Game Theoretic Framework for Heterogeneous Information Network Clustering

Introduction Preliminaries The Bi-clustering Game Framework Reward Functions Experimental Results Conclusion

Game Theoretic Framework for Heterogeneous Information Network Clustering

Faris Alqadah

Johns Hopkins University

Outline

1 Introduction: Motivation

2 Preliminaries: HINs and FCA, Game Theory

3 The Bi-clustering Game: Party-Planners

4 Framework: GHIN

5 Reward Functions: Expected Satisfaction

6 Experimental Results: Real world HINs

7 Conclusion

Motivation

Heterogeneous Information Networks (HINs) are pervasive in applications ranging from bioinformatics to e-commerce.

HIN clustering generalizes bi-clustering to pairwise relations, as opposed to tensor spaces.

There is no unified definition of a HIN-cluster and no algorithmic framework to mine them.

Address shortcomings of ‘pattern’-based approaches.

HINs

Objects are derived from distinct domains.

Topology of the network is determined by pairwise binary relations amongst domains.

Graph representation of a HIN is a multi-partite graph.

Examples: clicking patterns, social networks, gene networks from different experiments.

Related Work

Three major categories of work:

Multi-way clustering [5, 4, 1, 2]: Directly extend bi-clustering or co-clustering. Mostly hard clusters.

Information-network [10, 11]: Combine ranking and clustering using probability generating models; limited by network topology; hard clustering.

Pattern-based [3, 12, 7]: Formal Concept Analysis; overlapping clustering; too many clusters; sensitive parameter settings.

Key Idea

For a single-edge HIN: trade-off between the number of nodes in the two bipartite sets.

For a multiple-edge HIN: competing cluster influences.

An ‘ideal’ HIN-cluster should be an equilibrium point among all competing clustering influences.

Nash equilibrium: No one can do any better, assuming everyone else retains the same strategy.

Notation

Context $K_{ij} = (G_i, G_j, I_{ij})$: two sets and a relation between them.

A HIN $G^n = (V, E)$, where $V$ is a set of domains $\{G_1, \ldots, G_n\}$ and $(G_i, G_j) \in E$ iff $\exists K_{ij}$.

Concepts (maximal bicliques)

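Common neighbors:

$$\psi^j(A_i) = \begin{cases} \{ g_j \in G_j \mid g_j \, I_{ij} \, g_i \ \ \forall g_i \in A_i \} & \text{if } (G_i, G_j) \in E, \\ \emptyset & \text{otherwise.} \end{cases}$$

Concept or maximal bi-clique: $(A_i, A_j)$ such that $\psi^j(A_i) = A_j$ and $\psi^i(A_j) = A_i$.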
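
The two definitions above can be tried out on a toy context. The sketch below is illustrative only: the context (objects `g1..g3` and `m1..m3`) is a made-up example rather than data from the talk, and the names `common_neighbors` and `is_concept` are hypothetical helpers.

```python
def common_neighbors(objects, relation, other_domain):
    """Return every element of other_domain related to all of `objects`.

    `relation` is a set of (g_i, g_j) pairs. For an empty `objects` set the
    condition is vacuous, so the whole of other_domain is returned.
    """
    return {h for h in other_domain if all((g, h) in relation for g in objects)}


def is_concept(a_i, a_j, relation, g_i, g_j):
    """Check psi^j(A_i) == A_j and psi^i(A_j) == A_i."""
    flipped = {(h, g) for (g, h) in relation}
    return (common_neighbors(a_i, relation, g_j) == set(a_j)
            and common_neighbors(a_j, flipped, g_i) == set(a_i))


# Hypothetical context K_12 = (G_1, G_2, I_12); not taken from the slides.
G1 = {"g1", "g2", "g3"}
G2 = {"m1", "m2", "m3"}
I12 = {("g1", "m1"), ("g1", "m2"), ("g1", "m3"),
       ("g2", "m1"),
       ("g3", "m1"), ("g3", "m2")}

print(common_neighbors({"g1", "g3"}, I12, G2))               # neighbours shared by g1 and g3
print(is_concept({"g1", "g3"}, {"m1", "m2"}, I12, G1, G2))   # maximal: nothing can be added
print(is_concept({"g1"}, {"m1", "m2"}, I12, G1, G2))         # not maximal: m3 is missing
```

In this toy context, `({g1, g3}, {m1, m2})` is a concept, while `({g1}, {m1, m2})` is a biclique that is not maximal because `m3` can still be added.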

FCA-based approaches

Generalize the notion of a concept (several definitions), and enumerate all such concepts.

Parameter settings are not always intuitive.

A simple change in definition requires a substantially different algorithm design.

For a suitably defined game, Nash equilibrium points capture maximal bi-cliques.

Normal form game

A finite, $n$-player, normal form game, $G$, is a triple $\langle N, (M_i), (r_i) \rangle$ where

$N = \{1, \ldots, n\}$ is the set of players.

$M_i = \{m_i^1, \ldots, m_i^{l_i}\}$ is the set of moves available to player $i$, and $l_i$ is the number of available moves for that player.

$r_i : M_1 \times \cdots \times M_n \to \mathbb{R}$ is the reward function for each player $i$. It maps a profile of moves to a value.

Each player $i$ selects a strategy from the set of all available strategies, $P_i = \{p_i : M_i \to [0, 1]\}$.

Nash equilibrium and example

Nash equilibrium: A strategy profile in which no player has an incentive to unilaterally deviate [8, 6].

$$\forall i \in N,\ p_i \in P_i: \quad r_i(p_1^*, \ldots, p_{i-1}^*, p_i, p_{i+1}^*, \ldots, p_n^*) \le r_i(p_1^*, \ldots, p_n^*)$$

| | Player 2 chooses 0 | Player 2 chooses 1 | Player 2 chooses 2 |
|---|---|---|---|
| Player 1 chooses 0 | (0,0) | (1,0) | (2,-2) |
| Player 1 chooses 1 | (0,1) | (1,1) | (3,-2) |
| Player 1 chooses 2 | (-2,2) | (-2,3) | (2,2) |
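
The inequality can be checked cell by cell on the example table. The sketch below covers pure strategies only (each player picks one move with probability 1), which is narrower than the mixed strategies allowed by $P_i$. The function name `pure_nash_equilibria` is a hypothetical helper.

```python
from itertools import product

# Payoff table from the slide: PAYOFFS[a][b] = (reward to player 1, reward to player 2)
# when player 1 chooses a and player 2 chooses b.
PAYOFFS = [
    [(0, 0), (1, 0), (2, -2)],
    [(0, 1), (1, 1), (3, -2)],
    [(-2, 2), (-2, 3), (2, 2)],
]


def pure_nash_equilibria(payoffs):
    """Return all (row, column) cells where neither player gains by deviating alone."""
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for a, b in product(range(n_rows), range(n_cols)):
        r1, r2 = payoffs[a][b]
        # Player 1 deviates by changing the row while the column stays fixed.
        p1_ok = all(payoffs[alt][b][0] <= r1 for alt in range(n_rows))
        # Player 2 deviates by changing the column while the row stays fixed.
        p2_ok = all(payoffs[a][alt][1] <= r2 for alt in range(n_cols))
        if p1_ok and p2_ok:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PAYOFFS))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Note that the cell (2, 2) pays both players more than any of these equilibria, yet it is not one: with player 2 fixed at move 2, player 1 can move to row 1 and earn 3 instead of 2.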

Party planner game

Two party planners, P1 and P2, plan a party by inviting guests from disjoint sets of clients G1 and G2.

Party planners receive compensation based on the overall satisfaction of their clients.

Client satisfaction is a function of positive and negative interactions at the party.

P1 and P2 do not cooperate, but are privy to each other's guest list at any point. Both wish to maximize compensation.

Satisfaction Reward Function

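Let $(A_1, A_2)$ be a party. Define the satisfaction of $g_1 \in A_1$ attending party $(A_1, A_2)$ as

$$\mathrm{sat}_1(g_1, A_2) = \frac{|\psi^2(g_1) \cap A_2| - w \cdot |A_2 \setminus \psi^2(g_1)|}{|A_2|} \qquad (1)$$

Overall reward to party planner $i$:

$$r_i^{sat}(A_i, A_j) = \sum_{g_i \in A_i} \mathrm{sat}_i(g_i, A_j) \qquad (2)$$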
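
Equations (1) and (2) translate directly into code. In the sketch below, the context and the weight `W = 5` are assumptions: the transcript does not include the underlying network, so the adjacency lists were chosen so that the rewards agree with the payoff table on the "Concepts as Nash equilibrium points" slide. The names `sat` and `reward` are hypothetical.

```python
def sat(neighbors, party, w):
    """Equation (1): satisfaction of one guest whose neighbours are `neighbors`.

    `party` is the other planner's guest list and must be non-empty.
    """
    liked = len(neighbors & party)      # positive interactions
    disliked = len(party - neighbors)   # negative interactions
    return (liked - w * disliked) / len(party)


def reward(own_guests, other_guests, adjacency, w):
    """Equation (2): total satisfaction of the planner's own guests."""
    return sum(sat(adjacency[g], other_guests, w) for g in own_guests)


# Assumed context (not given in the transcript) and assumed weight.
ADJ_G = {"G1": {"M1", "M2", "M3"}, "G2": {"M1"}, "G3": {"M1", "M2"}}
ADJ_M = {"M1": {"G1", "G2", "G3"}, "M2": {"G1", "G3"}, "M3": {"G1"}}
W = 5

party_g, party_m = {"G1", "G3"}, {"M1", "M2"}
print(reward(party_g, party_m, ADJ_G, W), reward(party_m, party_g, ADJ_M, W))  # 2.0 2.0
```

Under these assumptions the party `({G1, G3}, {M1, M2})` yields the pair (2, 2), and `({G1, G2}, {M1, M2})` yields (-1, -1), matching the corresponding table cells.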

Concepts as Nash equilibrium points

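Rows are guest lists chosen from {G1, G2, G3}, columns are guest lists chosen from {M1, M2, M3}; each cell holds (reward to the row planner, reward to the column planner).

| | M1 | M1,M2 | M1,M2,M3 | M1,M3 | M2 | M2,M3 | M3 |
|---|---|---|---|---|---|---|---|
| G1 | (1,1) | (1,2) | (1,3) | (1,2) | (1,1) | (1,2) | (1,1) |
| G1,G2 | (2,1) | (-1,-1) | (-2,-3) | (-1,-1) | (-4,-2) | (-4,-4) | (-4,-2) |
| G1,G2,G3 | (3,1) | (0,0) | (-3,-3) | (-3,-2) | (-3,-1) | (-6,-4) | (-9,-3) |
| G1,G3 | (2,1) | (2,2) | (0,0) | (-1,-1) | (2,1) | (-1,-1) | (-4,-2) |
| G2 | (1,1) | (-2,-4) | (-3,-9) | (-2,-4) | (-5,-5) | (-5,-10) | (-5,-5) |
| G2,G3 | (2,1) | (-1,-1) | (-4,-6) | (-4,-4) | (-4,-2) | (-7,-7) | (-10,-5) |
| G3 | (1,1) | (1,2) | (-1,-3) | (-2,-4) | (1,1) | (-2,-4) | (-5,-5) |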

Theorem

For any instance of the bi-clustering game $G_{bicluster}$ in which $r_i^{sat}$ is the selected reward function, there exists $w^*$ such that $\forall w \ge w^*$, if $(A_1^*, A_2^*)$ is a concept of $K = (G_1, G_2, I_{12})$, then $(A_1^*, A_2^*)$ is a Nash equilibrium point of $G_{bicluster}$.

HIN-clustering game

Extend the bi-clustering game to n party planners and n sets of guests. Guest interactions are determined by the network topology.

Mining HIN-clusters is equivalent to finding Nash equilibrium points of the HIN-clustering game.

Finding a Nash equilibrium is non-trivial [9].

Adapt a simple strategy and a key heuristic to enumerate the Nash equilibrium points.

Strategy and heuristics

| | M1 | M1,M2 | M1,M2,M3 | M1,M3 | M2 | M2,M3 | M3 |
|---|---|---|---|---|---|---|---|
| G1 | (1,1) | (1,2) | (1**,3**) | (1**,2) | (1,1) | (1**,2) | (1**,1) |
| G1,G2 | (2,1**) | (-1,-1) | (-2,-3) | (-1,-1) | (-4,-2) | (-4,-4) | (-4,-2) |
| G1,G2,G3 | (3**,1**) | (0,0) | (-3,-3) | (-3,-2) | (-3,-1) | (-6,-4) | (-9,-3) |
| G1,G3 | (2,1) | (2**,2**) | (0,0) | (-1,-1) | (2**,1) | (-1,-1) | (-4,-2) |
| G2 | (1,1**) | (-2,-4) | (-3,-9) | (-2,-4) | (-5,-5) | (-5,-10) | (-5,-5) |
| G2,G3 | (2,1**) | (-1,-1) | (-4,-6) | (-4,-4) | (-4,-2) | (-7,-7) | (-10,-5) |
| G3 | (1,1) | (1,2**) | (-1,-3) | (-2,-4) | (1,1) | (-2,-4) | (-5,-5) |

1 Mark all second components that are maximal in each row.

2 Mark all first components that are maximal in each column.

3 Any cell that has both components marked is a Nash equilibrium.

Heuristic: Every Nash equilibrium point is a superset of an n-concept.
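
The three marking steps can be replayed mechanically on the payoff table. The sketch below hard-codes the table from the slide; `marked_cells` is a hypothetical helper name, and the procedure only finds pure-strategy equilibria.

```python
ROWS = ["G1", "G1,G2", "G1,G2,G3", "G1,G3", "G2", "G2,G3", "G3"]
COLS = ["M1", "M1,M2", "M1,M2,M3", "M1,M3", "M2", "M2,M3", "M3"]
TABLE = [
    [(1, 1), (1, 2), (1, 3), (1, 2), (1, 1), (1, 2), (1, 1)],
    [(2, 1), (-1, -1), (-2, -3), (-1, -1), (-4, -2), (-4, -4), (-4, -2)],
    [(3, 1), (0, 0), (-3, -3), (-3, -2), (-3, -1), (-6, -4), (-9, -3)],
    [(2, 1), (2, 2), (0, 0), (-1, -1), (2, 1), (-1, -1), (-4, -2)],
    [(1, 1), (-2, -4), (-3, -9), (-2, -4), (-5, -5), (-5, -10), (-5, -5)],
    [(2, 1), (-1, -1), (-4, -6), (-4, -4), (-4, -2), (-7, -7), (-10, -5)],
    [(1, 1), (1, 2), (-1, -3), (-2, -4), (1, 1), (-2, -4), (-5, -5)],
]


def marked_cells(table):
    """Return the (row, column) indices of cells with both components marked."""
    # Step 1: in each row, mark the cells whose second component is maximal.
    second_marks = set()
    for r, row in enumerate(table):
        best = max(cell[1] for cell in row)
        second_marks |= {(r, c) for c, cell in enumerate(row) if cell[1] == best}
    # Step 2: in each column, mark the cells whose first component is maximal.
    first_marks = set()
    for c in range(len(table[0])):
        best = max(row[c][0] for row in table)
        first_marks |= {(r, c) for r, row in enumerate(table) if row[c][0] == best}
    # Step 3: a cell with both components marked is a Nash equilibrium.
    return sorted(first_marks & second_marks)


for r, c in marked_cells(TABLE):
    print(f"({ROWS[r]}) x ({COLS[c]}) -> {TABLE[r][c]}")
```

For this table the doubly marked cells are (G1; M1,M2,M3), (G1,G2,G3; M1) and (G1,G3; M1,M2), the same three cells that carry two `**` marks above.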

GHIN framework

Even when utilizing the heuristic, exponential run time is still possible.

Sacrifice completeness, but guarantee correctness.

Attempt to form a Nash equilibrium point with each object in the HIN.

1 For each object gi in the seed set, attempt to form a maximally large n-partite clique in the HIN.

2 Add objects from all domains to the clique while the reward increases.

3 Remove objects not in the original clique from all domains while the reward increases.

4 If steps 2 and 3 produce no change, a Nash equilibrium has been found; otherwise repeat steps 2 and 3.

5 Update the seed set by removing all objects in the cluster.
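
One possible reading of these five steps, restricted to two domains and the satisfaction reward, is sketched below. This is a simplification and not the GHIN implementation: all function names are hypothetical, the context and `w = 5` are assumed, seeds are drawn from the first domain only, and each seed must have at least one neighbour. The loop stops when no single addition or removal raises a planner's reward, which by itself does not certify a full Nash equilibrium.

```python
def sat(neighbors, party, w):
    """Satisfaction of one object given a non-empty `party` from the other domain."""
    return (len(neighbors & party) - w * len(party - neighbors)) / len(party)


def reward(own, other, adj, w):
    """Summed satisfaction of one planner's guests."""
    return sum(sat(adj[g], other, w) for g in own)


def closure(objs, adj, other_domain):
    """Objects of other_domain adjacent to every object in objs."""
    return {h for h in other_domain if all(h in adj[g] for g in objs)}


def grow_cluster(seed, adj, domains, w, max_rounds=100):
    first, second = domains
    # Step 1: maximal biclique generated by the seed.
    b = closure({seed}, adj, second)
    a = closure(b, adj, first)
    parts, core = [set(a), set(b)], [set(a), set(b)]
    for _ in range(max_rounds):
        changed = False
        for i in (0, 1):
            own, other = parts[i], parts[1 - i]
            # Step 2: add an object whenever it raises this planner's reward.
            for g in sorted(domains[i] - own):
                if reward(own | {g}, other, adj, w) > reward(own, other, adj, w):
                    own.add(g)
                    changed = True
            # Step 3: remove objects outside the original clique if that helps.
            for g in sorted(own - core[i]):
                if reward(own - {g}, other, adj, w) > reward(own, other, adj, w):
                    own.remove(g)
                    changed = True
        if not changed:  # Step 4: stable under single-object changes.
            break
    return parts


def ghin_sketch(adj, domains, w):
    seeds, clusters = sorted(domains[0]), []
    while seeds:
        cluster = grow_cluster(seeds[0], adj, domains, w)
        clusters.append(cluster)
        # Step 5: drop the seed and every clustered object from the seed set.
        seeds = [s for s in seeds if s != seeds[0] and s not in cluster[0]]
    return clusters


# Assumed two-domain context; not taken from the slides.
G, M = {"G1", "G2", "G3"}, {"M1", "M2", "M3"}
ADJ = {"G1": {"M1", "M2", "M3"}, "G2": {"M1"}, "G3": {"M1", "M2"},
       "M1": {"G1", "G2", "G3"}, "M2": {"G1", "G3"}, "M3": {"G1"}}
print(ghin_sketch(ADJ, (G, M), 5))
```

On this toy context the sketch returns two clusters, `({G1}, {M1, M2, M3})` and `({G1, G2, G3}, {M1})`. A third concept, `({G1, G3}, {M1, M2})`, is never reached because G3 is removed from the seed set first, which illustrates the trade of completeness for speed.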

Shortcomings of satisfaction reward function

The satisfaction reward function is simple, intuitive, and efficient.

If the matrices in the HIN have significantly different density levels, then bias occurs.

Use expected satisfaction instead.

Expected satisfaction

Assume all objects are independent.

For a given party (A1, . . . , An), the number of interactions of an object is the number of successes in |Aj| draws from a finite population of |Gj| objects.

The number of successes is a hypergeometrically distributed random variable.

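$$\mathrm{exp}_{ij}(g_i, A_j) = \frac{|A_j| \cdot |\psi^j(g_i)|}{|G_j|}$$

$$\mathrm{var}_{ij}(g_i, A_j) = \frac{|A_j| \cdot |\psi^j(g_i)| \cdot (|G_j| - |A_j|) \cdot (|G_j| - |\psi^j(g_i)|)}{|G_j|^2 \cdot (|G_j| - 1)}$$

$$\mathrm{esat}(g_i, A_j) = \frac{|\psi^j(g_i) \cap A_j| - \mathrm{exp}_{ij}(g_i, A_j)}{\mathrm{var}_{ij}(g_i, A_j)} - w$$

$$\mathrm{esat}(g_i, A_{-i}) = \sum_{A_j \subseteq G_j,\ (G_i, G_j) \in E} \mathrm{esat}(g_i, A_j)$$

$$r_i^{esat}(A_i, A_{-i}) = \sum_{g_i \in A_i} \mathrm{esat}(g_i, A_{-i})$$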
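
The first three formulas are the mean and variance of a hypergeometric distribution plugged into a normalized score. The sketch below is a hedged illustration: the function names are hypothetical, the numbers in the example are made up, and the placement of `- w` after the fraction is one reading of the slide. The score is undefined when the variance is zero (for example when the party is the whole domain or the object has no neighbours); that case is left unhandled here.

```python
def expected_hits(n_drawn, n_neighbors, n_total):
    """Hypergeometric mean: |A_j| * |psi^j(g_i)| / |G_j|."""
    return n_drawn * n_neighbors / n_total


def variance_hits(n_drawn, n_neighbors, n_total):
    """Hypergeometric variance, term by term as in the formula above."""
    return (n_drawn * n_neighbors * (n_total - n_drawn) * (n_total - n_neighbors)
            / (n_total ** 2 * (n_total - 1)))


def esat(neighbors, party, domain, w):
    """(observed - expected) / variance - w for one object and one neighbouring domain."""
    n, k, total = len(party), len(neighbors), len(domain)
    observed = len(neighbors & party)
    return (observed - expected_hits(n, k, total)) / variance_hits(n, k, total) - w


# Made-up example: a domain of 6 objects, 3 of them neighbours of g_i,
# and a party of 2 objects that are both neighbours.
domain = {"a", "b", "c", "d", "e", "f"}
neighbors = {"a", "b", "c"}
party = {"a", "b"}
print(expected_hits(2, 3, 6))              # 1.0
print(esat(neighbors, party, domain, w=1)) # roughly 1.5: (2 - 1) / 0.4 - 1
```

Because the score compares observed interactions with what a random party of the same size would give, a dense relation no longer earns reward merely for being dense.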

Tiring party goers

Incorporate a ‘tiring’ factor to avoid too much overlap. Let $c(g_i)$ denote the number of clusters $g_i$ has appeared in up to the current time-step, then let

$$t = f(c(g_i))$$

where $f : \mathbb{N} \to (0, 1]$ and $f$ is anti-monotonic. For example:

$$f(x) = \frac{1}{x^2}$$

$$f(x) = \frac{1}{e^x}$$
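
The two example functions decay at different rates, which a few values make concrete. The sketch below only evaluates them; how the factor $t$ enters the reward is not stated in the transcript, so no reward is computed. The function names are hypothetical, and counts are assumed to start at 1 because $1/x^2$ is undefined at 0.

```python
import math


def tiring_inverse_square(count):
    """f(x) = 1 / x^2, defined for count >= 1."""
    return 1.0 / count ** 2


def tiring_exponential(count):
    """f(x) = 1 / e^x."""
    return math.exp(-count)


for c in range(1, 5):
    print(c, tiring_inverse_square(c), round(tiring_exponential(c), 4))
```

Both functions shrink as an object joins more clusters; the exponential form falls off faster for large counts.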

HINs and evaluation

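| HIN name | Description | Num domains | Num classes | Total num objects |
|---|---|---|---|---|
| MER | Newsgroup, Middle East politics and religion | 3 | 2 | 24,783 |
| REC | Newsgroup, recreation | 3 | 2 | 26,225 |
| SCI | Newsgroup, science | 3 | 5 | 37,413 |
| PC | Newsgroup, PC and software | 3 | 5 | 35,186 |
| PCR | Newsgroup, politics and Christianity | 3 | 2 | 24,485 |
| FOUR_AREAS | DBLP subset of database, data mining, AI, and IR papers | 4 | 4 | 70,517 |

Extrinsic evaluation, $B^3$ recall and precision:

$$\mathrm{Prec}(g, g') = \frac{\min(|C(g) \cap C(g')|,\ |L(g) \cap L(g')|)}{|C(g) \cap C(g')|}$$

$$\mathrm{Rcl}(g, g') = \frac{\min(|C(g) \cap C(g')|,\ |L(g) \cap L(g')|)}{|L(g) \cap L(g')|}$$

$$B^3\mathrm{Prec} = \mathrm{Avg}_g \left[ \mathrm{Avg}_{g' :\, C(g) \cap C(g') \neq \emptyset} \left[ \mathrm{Prec}(g, g') \right] \right]$$

$$B^3\mathrm{Rcl} = \mathrm{Avg}_g \left[ \mathrm{Avg}_{g' :\, L(g) \cap L(g') \neq \emptyset} \left[ \mathrm{Rcl}(g, g') \right] \right]$$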
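
A small worked example shows how the four formulas combine. The sketch assumes that $C(g)$ is the set of clusters containing object $g$ and $L(g)$ is its set of class labels, which the transcript does not spell out. Objects with no cluster (or no label) are skipped in the corresponding outer average; that is also an assumption. All names and the three-object data set are hypothetical.

```python
def pair_precision(c_g, c_h, l_g, l_h):
    """Prec(g, g'); only called when the two objects share a cluster."""
    shared_clusters = len(c_g & c_h)
    shared_labels = len(l_g & l_h)
    return min(shared_clusters, shared_labels) / shared_clusters


def pair_recall(c_g, c_h, l_g, l_h):
    """Rcl(g, g'); only called when the two objects share a label."""
    shared_clusters = len(c_g & c_h)
    shared_labels = len(l_g & l_h)
    return min(shared_clusters, shared_labels) / shared_labels


def b3_scores(clusters, labels):
    """Return (B3 precision, B3 recall) for dicts mapping object -> set."""
    objects = list(clusters)
    precisions, recalls = [], []
    for g in objects:
        p = [pair_precision(clusters[g], clusters[h], labels[g], labels[h])
             for h in objects if clusters[g] & clusters[h]]
        r = [pair_recall(clusters[g], clusters[h], labels[g], labels[h])
             for h in objects if labels[g] & labels[h]]
        if p:
            precisions.append(sum(p) / len(p))
        if r:
            recalls.append(sum(r) / len(r))
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)


# Three objects of the same class "x"; the clustering splits off object "c".
clusters = {"a": {1}, "b": {1}, "c": {2}}
labels = {"a": {"x"}, "b": {"x"}, "c": {"x"}}
print(b3_scores(clusters, labels))  # precision 1.0, recall about 0.556 (5/9)
```

Every cluster is pure, so precision is 1, while recall drops to 5/9 because object `c` was separated from its two classmates.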

Results

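| HIN | Algorithm | F1 | F0.5 | F2 |
|---|---|---|---|---|
| MER | GHIN expsat | 0.627051 | 0.736396 | 0.622735 |
| MER | GHIN sat | 0.553790 | 0.649559 | 0.569664 |
| MER | NetClus | 0.3759 | 0.4512 | 0.322 |
| MER | MDC | 0.3661 | 0.4533 | 0.3070 |
| REC | GHIN expsat | 0.544189 | 0.633362 | 0.508778 |
| REC | GHIN sat | 0.434367 | 0.485025 | 0.451840 |
| REC | NetClus | 0.2784 | 0.2870 | 0.2704 |
| REC | MDC | 0.2845 | 0.2953 | 0.2746 |
| SCI | GHIN expsat | 0.484068 | 0.589704 | 0.530239 |
| SCI | GHIN sat | 0.402306 | 0.481798 | 0.462886 |
| SCI | NetClus | 0.2609 | 0.2583 | 0.2635 |
| SCI | MDC | 0.2532 | 0.2529 | 0.2535 |
| PC | GHIN expsat | 0.334827 | 0.520472 | 0.302943 |
| PC | GHIN sat | 0.306503 | 0.432229 | 0.345382 |
| PC | NetClus | 0.2254 | 0.2068 | 0.2477 |
| PC | MDC | 0.2282 | 0.2116 | 0.2476 |
| PCR | GHIN expsat | 0.640894 | 0.793399 | 0.508778 |
| PCR | GHIN sat | 0.541986 | 0.574588 | 0.530971 |
| PCR | NetClus | 0.3642 | 0.4396 | 0.3109 |
| PCR | MDC | 0.3440 | 0.4268 | 0.2810 |
| FOUR_AREAS | GHIN expsat | 0.623117 | 0.598877 | 0.650079 |
| FOUR_AREAS | GHIN sat | 0.5315 | 0.506687 | 0.5588 |
| FOUR_AREAS | NetClus | 0.3612 | 0.36655 | 0.3560 |
| FOUR_AREAS | MDC | 0.5085 | 0.5162 | 0.5010 |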

Class distributions in clusters

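| Algorithm | Class | C1 | C2 | C3 | C4 |
|---|---|---|---|---|---|
| GHIN expsat | DB | 0.0601266 | 0.93633 | 0.0133188 | 0.0512748 |
| GHIN expsat | DM | 0.028481 | 0.0363608 | 0.0106007 | 0.850142 |
| GHIN expsat | IR | 0.882911 | 0.0204432 | 0.133188 | 0.0339943 |
| GHIN expsat | AI | 0.028481 | 0.00686642 | 0.842892 | 0.0645892 |
| NetClus | DB | 0.0553833 | 0.450802 | 0.500074 | 0.0955971 |
| NetClus | DM | 0.163934 | 0.15815 | 0.128535 | 0.304584 |
| NetClus | IR | 0.179553 | 0.0512035 | 0.242707 | 0.112786 |
| NetClus | AI | 0.60113 | 0.339844 | 0.128684 | 0.487033 |
| MDC | DB | 0.186681 | 0.232455 | 0.803727 | 0.000000 |
| MDC | DM | 0.261844 | 0.000000 | 0.128592 | 0.161790 |
| MDC | IR | 0.003183 | 0.278748 | 0.000000 | 0.75888 |
| MDC | AI | 0.548292 | 0.488797 | 0.067680 | 0.079323 |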

Sample Clusters

| Terms | Authors | Conferences |
|---|---|---|
| data | Surajit Chaudhuri | VLDB |
| database | Divesh Srivastava | SIGMOD |
| queries | H. V. Jagadish | ICDE |
| databases | Jeffrey F. Naughton | PODS |
| querys | Michael J. Carey | EDBT |
| xml | Raghu Ramakrishnan | |

| Terms | Authors | Conferences |
|---|---|---|
| mining | Jiawei Han | KDD |
| learning | Christos Faloutsos | PAKDD |
| data | Wei Wang | ICDM |
| frequent | Heikki Mannila | SDM |
| association | Srinivasan Parthasarathy | PKDD |
| patterns | Ke Wang | ICML |

Applying GHIN to EMAP data

E-MAP (epistatic miniarray profiles) query and target genes.

Genetic interaction score indicates whether a strain is healthier or sicker than expected (positive or negative).

Negative network derived by using scores ≤ −2.5.

Find Nash points, and use functional enrichment: Do we find small functional classes?

[Figure, two panels: "Functional enrichment by large classes (31−500)" and "Functional enrichment by small classes". Each plots the fraction of patterns enriched against the P-value threshold for two series, "Exp sat tiring" and "Sat".]

Clusters exclusively annotated by small functional classes:

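| ORF | Gene |
|---|---|
| YBR078W | ECM33 |
| YIL034C | CAP2 |
| YIL159W | BNR1 |
| YKL007W | CAP1 |
| YMR054W | STV1 |
| YMR058W | FET3 |
| YMR089C | YTA12 |

| ORF | Gene |
|---|---|
| YFL031W | HAC1 |
| YHR079C | IRE1 |
| YJL095W | BCK1 |
| YCL048W | SPS22 |
| YIL073C | SPO22 |
| YJL155C | FBP26 |
| YLR267W | BOP2 |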

Parameter study

Effect of w on extrinsic clustering quality.

[Figure, three panels: F1 score, F0.5 score, and F2 score plotted against w (0 to 12), one curve per HIN: mer, rec, pcr, pc, sci, four.]

Effect of w on algorithm operation.

[Figure, three panels: average number of iterations to find a Nash point, total number of iterations (×10^4), and number of clusters, each plotted against w (0 to 12), one curve per HIN: mer, rec, pcr, pc, sci, four.]

Conclusion

Novel framework for defining and enumerating HIN-clusters.

First (as far as I know) connection between information network clustering and game theory.

Initial experimental results show promise.

Ongoing and future work

Development of further reward functions (information-theoretic, spectral?).

Clustering in biological data: do we find smaller functional classes compared to other bi-clustering methods?

Extension of the framework to weighted HINs.

More algorithmic development.

Compare algorithms with an actual Nash solver.

[1] A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In Proceedings of the SIAM International Conference on Data Mining, 2007.

[2] R. Bekkerman, R. El-Yaniv, and A. McCallum. Multi-way distributional clustering via pairwise interactions. In ICML ’05: Proceedings of the 22nd International Conference on Machine Learning, pages 41–48, New York, NY, USA, 2005. ACM.

[3] J. Li, G. Liu, H. Li, and L. Wong. Maximal biclique subgraphs and closed pattern pairs of the adjacency matrix: A one-to-one correspondence and mining algorithms. IEEE Trans. Knowl. Data Eng., 19(12):1625–1637, 2007.

[4] B. Long, X. Wu, Z. M. Zhang, and P. S. Yu. Unsupervised learning on k-partite graphs. In KDD ’06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 317–326, New York, NY, USA, 2006. ACM.

[5] B. Long, Z. M. Zhang, X. Wu, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML ’06: Proceedings of the 23rd International Conference on Machine Learning, pages 585–592, New York, NY, USA, 2006. ACM.

[6] E. Mendelson. Introducing Game Theory and Its Applications. Chapman & Hall / CRC, 2004.

[7] M. J. Zaki, M. Peters, I. Assent, and T. Seidl. Clicks: An effective algorithm for mining subspace clusters in categorical datasets. Data and Knowledge Engineering, special issue on Intelligent Data Mining, 60(2):51–70, 2007.

[8] G. Owen. Game Theory. Academic Press, 1995.

[9] R. Porter, E. Nudelman, and Y. Shoham. Simple search methods for finding a Nash equilibrium. In Games and Economic Behavior, pages 664–669, 2004.

[10] Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu. Rankclus: Integrating clustering with ranking for heterogeneous information network analysis. In Proc. 2009 Int. Conf. on Extending Data Base Technology (EDBT’09), 2009.

[11] Y. Sun, Y. Yu, and J. Han. Ranking-based clustering of heterogeneous information networks with star network schema. In Proc. 2009 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’09), 2009.

[12] A. Tanay, R. Sharan, and R. Shamir. Discovering statistically significant biclusters in gene expression data. In Proceedings of ISMB 2002, 2002.