A Game Theoretic Framework for Heterogeneous Information Network Clustering



Game Theoretic Framework for Heterogeneous Information Network Clustering

Faris Alqadah

Johns Hopkins University


Outline

1 Introduction: Motivation

2 Preliminaries: HINs and FCA; Game Theory

3 The Bi-clustering Game: Party Planners

4 Framework: GHIN

5 Reward Functions: Expected Satisfaction

6 Experimental Results: Real-world HINs

7 Conclusion


Motivation

Heterogeneous Information Networks (HINs) are pervasive in applications ranging from bioinformatics to e-commerce.

Generalization of bi-clustering to pairwise relations as opposed to tensor spaces.

No unified definition of a HIN-cluster, nor an algorithmic framework to mine them.

Address shortcomings of 'pattern'-based approaches.


HINs

Objects are derived from distinct domains.

The topology of the network is determined by pairwise binary relations amongst domains.

The graph representation of a HIN is a multi-partite graph.

Examples: clicking patterns, social networks, gene networks from different experiments.


Related Work

Three major categories of work

Multi-way clustering [5, 4, 1, 2]: directly extends bi-clustering or co-clustering; produces mostly hard clusters.

Information-network clustering [10, 11]: combines ranking and clustering using probabilistic generative models; limited by network topology; hard clustering.

Pattern-based [3, 12, 7]: Formal Concept Analysis, overlapping clustering; too many clusters, unintuitive parameter settings.

Key Idea

For a single-edge HIN, there is a trade-off between the number of nodes in the two bipartite sets.

For a multiple-edge HIN, there are competing cluster influences.

An 'ideal' HIN-cluster should be an equilibrium point among all competing clustering influences.

Nash equilibrium: no one can do any better assuming everyone else retains the same strategy.


Notation

Context $K_{ij} = (G_i, G_j, I_{ij})$: two sets and a relation between them.

A HIN $\mathcal{G}_n = (V, E)$, where $V$ is a set of domains $\{G_1, \ldots, G_n\}$ and $(G_i, G_j) \in E$ iff $\exists K_{ij}$.

Concepts (maximal bi-cliques)

Common neighbors:

$$\psi_j(A_i) = \begin{cases} \{g_j \in G_j \mid g_j \, I_{ij} \, g_i \ \forall g_i \in A_i\} & \text{if } (G_i, G_j) \in E, \\ \emptyset & \text{otherwise.} \end{cases}$$

Concept or maximal bi-clique: a pair $(A_i, A_j)$ such that $\psi_j(A_i) = A_j$ and $\psi_i(A_j) = A_i$.
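For concreteness, here is a minimal Python sketch (not the paper's code) of the common-neighbors operator and the concept check for a single context, assuming the relation $I_{ij}$ is stored as plain dictionaries mapping each object to its neighbor set; the toy context matches the small running example used later in the deck.

```python
# Minimal sketch: common neighbors and concept (maximal bi-clique) check
# for one context K_ij. The relation I_ij is assumed to be stored as dicts
# of neighbor sets in both directions.

def common_neighbors(A, neighbors):
    """psi(A): the objects related to every element of A."""
    A = list(A)
    if not A:
        return set()
    result = set(neighbors[A[0]])
    for g in A[1:]:
        result &= neighbors[g]
    return result

def is_concept(A_i, A_j, nbrs_i_to_j, nbrs_j_to_i):
    """(A_i, A_j) is a concept iff psi_j(A_i) = A_j and psi_i(A_j) = A_i."""
    return common_neighbors(A_i, nbrs_i_to_j) == set(A_j) and \
           common_neighbors(A_j, nbrs_j_to_i) == set(A_i)

# Toy context: G1 = {g1, g2, g3}, G2 = {m1, m2, m3}
nbrs_1_to_2 = {"g1": {"m1", "m2", "m3"}, "g2": {"m1"}, "g3": {"m1", "m2"}}
nbrs_2_to_1 = {"m1": {"g1", "g2", "g3"}, "m2": {"g1", "g3"}, "m3": {"g1"}}
print(is_concept({"g1", "g3"}, {"m1", "m2"}, nbrs_1_to_2, nbrs_2_to_1))  # True
```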


FCA-based approaches

Generalize the notion of a concept (several definitions exist) and enumerate all such concepts.

Parameter settings are not always intuitive.

A simple change in definition requires a substantially different algorithm design.

For a suitably defined game, Nash equilibrium points capture maximal bi-cliques.


Normal form game

A finite, n-player, normal form game $\mathcal{G}$ is a triple $\langle N, (M_i), (r_i) \rangle$ where

$N = \{1, \ldots, n\}$ is the set of players.

$M_i = \{m_i^1, \ldots, m_i^{l_i}\}$ is the set of moves available to player $i$, and $l_i$ is the number of available moves for that player.

$r_i : M_1 \times \cdots \times M_n \to \mathbb{R}$ is the reward function for player $i$; it maps a profile of moves to a value.

Each player $i$ selects a strategy from the set of all available strategies, $P_i = \{p_i : M_i \to [0,1]\}$.

Nash equilibrium and example

Nash equilibrium: a strategy profile in which no player has an incentive to unilaterally deviate [8, 6].

$$\forall i \in N,\ \forall p_i \in P_i:\quad r_i(p_1^*, \ldots, p_{i-1}^*, p_i, p_{i+1}^*, \ldots, p_n^*) \leq r_i(p_1^*, \ldots, p_n^*)$$

                     Player 2 chooses 0   Player 2 chooses 1   Player 2 chooses 2
Player 1 chooses 0   (0,0)                (1,0)                (2,-2)
Player 1 chooses 1   (0,1)                (1,1)                (3,-2)
Player 1 chooses 2   (-2,2)               (-2,3)               (2,2)


Party planner game

Two party planners P1 and P2 plan a party by inviting guests from disjoint sets of clients G1 and G2.

Party planners receive compensation based on the overall satisfaction of their clients.

Client satisfaction is a function of positive and negative interactions at the party.

P1 and P2 do not cooperate, but are privy to each other's guest list at any point. Both wish to maximize compensation.


Satisfaction Reward Function

Let $(A_1, A_2)$ be a party. Define the satisfaction of $g_1 \in A_1$ attending party $(A_1, A_2)$ as

$$\mathrm{sat}_1(g_1, A_2) = \frac{|\psi_2(g_1) \cap A_2| - w \cdot |A_2 \setminus \psi_2(g_1)|}{|A_2|} \qquad (1)$$

Overall reward to party planner $i$:

$$r^{\mathrm{sat}}_i(A_i, A_j) = \sum_{g_i \in A_i} \mathrm{sat}_i(g_i, A_j) \qquad (2)$$
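A minimal sketch of Eqs. (1)-(2) in Python, assuming the relation is given as a dictionary of neighbor sets (`neighbors[g]` standing in for $\psi_2(g)$); the data layout and function names are illustrative, not the paper's implementation.

```python
# Satisfaction reward of Eqs. (1)-(2). `neighbors[g]` is the set psi_2(g) of
# objects related to g; w penalizes invited guests g does not interact with.

def satisfaction(g, A2, neighbors, w):
    """sat(g, A2) = (|psi(g) & A2| - w * |A2 \\ psi(g)|) / |A2|."""
    if not A2:
        return 0.0
    hits = len(neighbors[g] & A2)
    misses = len(A2 - neighbors[g])
    return (hits - w * misses) / len(A2)

def reward(A1, A2, neighbors, w):
    """r^sat_1(A1, A2): total satisfaction of party planner 1's guests."""
    return sum(satisfaction(g, A2, neighbors, w) for g in A1)

# With the toy relation from the concept sketch above and w = 5 (the value
# consistent with the example payoff table below), this reproduces e.g. the
# entry for guest lists {g1, g3} vs {m1, m2}.
nbrs = {"g1": {"m1", "m2", "m3"}, "g2": {"m1"}, "g3": {"m1", "m2"}}
print(reward({"g1", "g3"}, {"m1", "m2"}, nbrs, w=5.0))  # 2.0
```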


Concepts as Nash equilibrium points

              M1          M1,M2       M1,M2,M3    M1,M3      M2         M2,M3      M3
G1            (1,1)       (1,2)       (1,3)       (1,2)      (1,1)      (1,2)      (1,1)
G1,G2         (2,1)       (-1,-1)     (-2,-3)     (-1,-1)    (-4,-2)    (-4,-4)    (-4,-2)
G1,G2,G3      (3,1)       (0,0)       (-3,-3)     (-3,-2)    (-3,-1)    (-6,-4)    (-9,-3)
G1,G3         (2,1)       (2,2)       (0,0)       (-1,-1)    (2,1)      (-1,-1)    (-4,-2)
G2            (1,1)       (-2,-4)     (-3,-9)     (-2,-4)    (-5,-5)    (-5,-10)   (-5,-5)
G2,G3         (2,1)       (-1,-1)     (-4,-6)     (-4,-4)    (-4,-2)    (-7,-7)    (-10,-5)
G3            (1,1)       (1,2)       (-1,-3)     (-2,-4)    (1,1)      (-2,-4)    (-5,-5)


Concepts as Nash equilibrium points

Theorem

For any instance of the bi-clustering game $\mathcal{G}_{\mathrm{bicluster}}$ in which $r^{\mathrm{sat}}_i$ is the selected reward function, there exists $w^*$ such that, for all $w \geq w^*$, if $(A^*_1, A^*_2)$ is a concept of $K = (G_1, G_2, I_{12})$ then $(A^*_1, A^*_2)$ is a Nash equilibrium point of $\mathcal{G}_{\mathrm{bicluster}}$.


HIN-clustering game

Extend the bi-clustering game to n party planners and n sets of guests. Guest interactions are determined by the network topology.

Mining HIN-clusters is equivalent to finding Nash equilibrium points of the HIN-clustering game.

Finding Nash equilibria is non-trivial [9].

Adapt a simple strategy and a key heuristic to enumerate the Nash equilibrium points.

Strategy and heuristics

              M1          M1,M2       M1,M2,M3    M1,M3      M2         M2,M3      M3
G1            (1,1)       (1,2)       (1**,3**)   (1**,2)    (1,1)      (1**,2)    (1**,1)
G1,G2         (2,1**)     (-1,-1)     (-2,-3)     (-1,-1)    (-4,-2)    (-4,-4)    (-4,-2)
G1,G2,G3      (3**,1**)   (0,0)       (-3,-3)     (-3,-2)    (-3,-1)    (-6,-4)    (-9,-3)
G1,G3         (2,1)       (2**,2**)   (0,0)       (-1,-1)    (2**,1)    (-1,-1)    (-4,-2)
G2            (1,1**)     (-2,-4)     (-3,-9)     (-2,-4)    (-5,-5)    (-5,-10)   (-5,-5)
G2,G3         (2,1**)     (-1,-1)     (-4,-6)     (-4,-4)    (-4,-2)    (-7,-7)    (-10,-5)
G3            (1,1)       (1,2**)     (-1,-3)     (-2,-4)    (1,1)      (-2,-4)    (-5,-5)

1 Mark all second components that are maximal in each row (marked ** above).

2 Mark all first components that are maximal in each column.

3 Any cell that has both components marked is a Nash equilibrium.

Heuristic: every Nash equilibrium point is a superset of an n-concept.

A sketch of this marking procedure is given below.
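The marking procedure is just a pure-Nash search on a bimatrix game; here is a minimal sketch, assuming the payoff table is given as a list of lists of (reward_1, reward_2) pairs.

```python
# Find pure Nash equilibria of a two-player game by marking row-wise best
# responses of player 2 and column-wise best responses of player 1.

def pure_nash_cells(payoffs):
    """payoffs[i][j] = (r1, r2). Return the (i, j) cells that are pure Nash equilibria."""
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    # Player 2 best responses: second component maximal in each row.
    row_marks = {(i, j) for i in range(n_rows) for j in range(n_cols)
                 if payoffs[i][j][1] == max(payoffs[i][k][1] for k in range(n_cols))}
    # Player 1 best responses: first component maximal in each column.
    col_marks = {(i, j) for j in range(n_cols) for i in range(n_rows)
                 if payoffs[i][j][0] == max(payoffs[k][j][0] for k in range(n_rows))}
    # Cells marked twice are Nash equilibria.
    return sorted(row_marks & col_marks)

# 3x3 game from the "Nash equilibrium and example" slide:
example = [[(0, 0), (1, 0), (2, -2)],
           [(0, 1), (1, 1), (3, -2)],
           [(-2, 2), (-2, 3), (2, 2)]]
print(pure_nash_cells(example))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```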


GHIN framework

Even utilizing the heuristic, exponential run time is still possible.

Sacrifice completeness, but guarantee correctness.

Attempt to form a Nash equilibrium point with each object in the HIN.


GHIN framework

1 For each object $g_i$ in the seed set, attempt to form a maximally large n-partite clique in the HIN.

2 Add objects from all domains to the clique while the reward increases.

3 Remove objects not in the original clique from all domains while the reward increases.

4 If steps 2 and 3 produce no change, a Nash equilibrium has been found; otherwise repeat steps 2 and 3.

5 Update the seed set by removing all objects in the cluster.

A sketch of this local-search loop is given below.
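A minimal sketch of steps 2-4, assuming a `reward(cluster)` callable (e.g. built from the satisfaction or expected-satisfaction rewards) and a `candidates(cluster)` generator of (domain, object) pairs to try; this illustrates the loop, not the released GHIN implementation.

```python
def ghin_local_search(seed_cluster, candidates, reward):
    """Grow and shrink one cluster until neither step improves the reward."""
    cluster = {dom: set(objs) for dom, objs in seed_cluster.items()}
    original = {dom: set(objs) for dom, objs in cluster.items()}
    changed = True
    while changed:  # step 4: repeat steps 2 and 3 until nothing changes
        changed = False
        # Step 2: add objects from any domain while the reward increases.
        for dom, obj in list(candidates(cluster)):
            if obj in cluster[dom]:
                continue
            base = reward(cluster)
            cluster[dom].add(obj)
            if reward(cluster) > base:
                changed = True
            else:
                cluster[dom].remove(obj)
        # Step 3: drop objects not in the original clique while the reward increases.
        for dom in cluster:
            for obj in list(cluster[dom] - original[dom]):
                base = reward(cluster)
                cluster[dom].remove(obj)
                if reward(cluster) > base:
                    changed = True
                else:
                    cluster[dom].add(obj)
    return cluster  # no improving add or drop remains
```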


Shortcomings of satisfaction reward function

The satisfaction reward function is simple, intuitive, and efficient.

If the matrices in a HIN have significantly different density levels, then bias occurs.

Use expected satisfaction instead.


Expected satisfaction

Assume all objects are independent.

For a given party $(A_1, \ldots, A_n)$, the expected number of interactions is the number of successes in $|A_j|$ draws from a finite population of $|G_j|$ objects.

The number of successes is a hypergeometrically distributed random variable.


Expected satisfaction

$$\mathrm{exp}_{ij}(g_i, A_j) = \frac{|A_j| \cdot |\psi_j(g_i)|}{|G_j|}$$

$$\mathrm{var}_{ij}(g_i, A_j) = \frac{|A_j| \cdot |\psi_j(g_i)| \cdot (|G_j| - |A_j|) \cdot (|G_j| - |\psi_j(g_i)|)}{|G_j|^2 \cdot (|G_j| - 1)}$$

$$\mathrm{esat}(g_i, A_j) = \frac{|\psi_j(g_i) \cap A_j| - \mathrm{exp}_{ij}(g_i, A_j)}{\mathrm{var}_{ij}(g_i, A_j)} - w$$

$$\mathrm{esat}(g_i, A_{-i}) = \sum_{A_j \subseteq G_j,\ (G_i, G_j) \in E} \mathrm{esat}(g_i, A_j)$$

$$r^{\mathrm{esat}}_i(A_i, A_{-i}) = \sum_{g_i \in A_i} \mathrm{esat}(g_i, A_{-i})$$
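A minimal numeric sketch of these quantities, under the reading of the reconstructed $\mathrm{esat}$ formula above (observed hits minus expected hits, normalized by the hypergeometric variance, minus $w$); the function names and edge-case handling are assumptions.

```python
# Expected-satisfaction building blocks for one object g_i and one domain G_j:
# deg = |psi_j(g_i)|, a = |A_j|, n = |G_j|, hits = |psi_j(g_i) & A_j|.

def expected_hits(deg, a, n):
    """Hypergeometric mean: |A_j| * |psi_j(g_i)| / |G_j|."""
    return a * deg / n

def variance_hits(deg, a, n):
    """Hypergeometric variance of the number of neighbors of g_i falling in A_j."""
    return a * deg * (n - a) * (n - deg) / (n ** 2 * (n - 1))

def esat(hits, deg, a, n, w):
    """(observed - expected) / variance - w; returns -w if the variance is zero."""
    var = variance_hits(deg, a, n)
    if var == 0:
        return -w
    return (hits - expected_hits(deg, a, n)) / var - w

# Example: g_i has 10 neighbors among 100 objects of G_j; the current party
# invited 20 objects of G_j, 8 of which are neighbors of g_i.
print(esat(hits=8, deg=10, a=20, n=100, w=1.0))  # 3.125
```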


Tiring party goers

Incorporate a 'tiring' factor to avoid too much overlap. Let $c(g_i)$ denote the number of clusters $g_i$ has appeared in up to the current time-step, and let

$$t = f(c(g_i)), \qquad f : \mathbb{N} \to (0, 1],$$

where $f$ is anti-monotonic. For example:

$$f(x) = \frac{1}{x^2} \qquad \text{or} \qquad f(x) = \frac{1}{e^x}$$
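The two example tiring functions as code; the slide does not spell out exactly how the factor $t$ is folded into the reward, so only the functions themselves are shown (the handling of $c(g_i) = 0$ is an assumption).

```python
import math

def tiring_inverse_square(c):
    """f(x) = 1 / x^2; c = 0 (object not yet used) is mapped to 1.0 -- an assumption."""
    return 1.0 / (c ** 2) if c >= 1 else 1.0

def tiring_exponential(c):
    """f(x) = 1 / e^x."""
    return math.exp(-c)

# An object already placed in 3 clusters is down-weighted to 1/9 or e^{-3}.
print(tiring_inverse_square(3), tiring_exponential(3))
```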


HINs and evaluation

HIN name     Description                                                Num domains   Num classes   Total num objects
MER          Newsgroup, Middle East politics and religion               3             2             24,783
REC          Newsgroup, recreation                                      3             2             26,225
SCI          Newsgroup, science                                         3             5             37,413
PC           Newsgroup, PC and software                                 3             5             35,186
PCR          Newsgroup, politics and Christianity                       3             2             24,485
FOUR_AREAS   DBLP subset of database, data mining, AI, and IR papers    4             4             70,517

Extrinsic evaluation, B^3 (B-cubed) recall and precision:

$$\mathrm{Prec}(g, g') = \frac{\min(|C(g) \cap C(g')|, |L(g) \cap L(g')|)}{|C(g) \cap C(g')|}$$

$$\mathrm{Rcl}(g, g') = \frac{\min(|C(g) \cap C(g')|, |L(g) \cap L(g')|)}{|L(g) \cap L(g')|}$$

$$B^3\mathrm{Prec} = \mathrm{Avg}_g\big[\mathrm{Avg}_{g',\, C(g) \cap C(g') \neq \emptyset}[\mathrm{Prec}(g, g')]\big]$$

$$B^3\mathrm{Rcl} = \mathrm{Avg}_g\big[\mathrm{Avg}_{g',\, L(g) \cap L(g') \neq \emptyset}[\mathrm{Rcl}(g, g')]\big]$$
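A minimal sketch of the B-cubed scores, assuming `C[g]` and `L[g]` are the sets of clusters and ground-truth classes of object `g` (overlapping assignments allowed); this is an illustration, not the evaluation script behind the reported numbers.

```python
def b_cubed(C, L):
    """Return (B3 precision, B3 recall) averaged over all objects.

    Assumes every object belongs to at least one cluster and one class, so each
    inner average is over a non-empty set (g always pairs with itself).
    """
    objects = list(C)
    prec_total, rcl_total = 0.0, 0.0
    for g in objects:
        prec_terms, rcl_terms = [], []
        for g2 in objects:
            c_overlap = len(C[g] & C[g2])
            l_overlap = len(L[g] & L[g2])
            if c_overlap:   # g2 shares a cluster with g
                prec_terms.append(min(c_overlap, l_overlap) / c_overlap)
            if l_overlap:   # g2 shares a ground-truth class with g
                rcl_terms.append(min(c_overlap, l_overlap) / l_overlap)
        prec_total += sum(prec_terms) / len(prec_terms)
        rcl_total += sum(rcl_terms) / len(rcl_terms)
    return prec_total / len(objects), rcl_total / len(objects)

# Toy check: a perfect clustering scores (1.0, 1.0).
C = {"a": {1}, "b": {1}, "c": {2}}
L = {"a": {"x"}, "b": {"x"}, "c": {"y"}}
print(b_cubed(C, L))
```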


Results

HIN          Algorithm     F1         F0.5       F2
MER          GHIN expsat   0.627051   0.736396   0.622735
             GHIN sat      0.553790   0.649559   0.569664
             NetClus       0.3759     0.4512     0.322
             MDC           0.3661     0.4533     0.3070
REC          GHIN expsat   0.544189   0.633362   0.508778
             GHIN sat      0.434367   0.485025   0.451840
             NetClus       0.2784     0.2870     0.2704
             MDC           0.2845     0.2953     0.2746
SCI          GHIN expsat   0.484068   0.589704   0.530239
             GHIN sat      0.402306   0.481798   0.462886
             NetClus       0.2609     0.2583     0.2635
             MDC           0.2532     0.2529     0.2535
PC           GHIN expsat   0.334827   0.520472   0.302943
             GHIN sat      0.306503   0.432229   0.345382
             NetClus       0.2254     0.2068     0.2477
             MDC           0.2282     0.2116     0.2476
PCR          GHIN expsat   0.640894   0.793399   0.508778
             GHIN sat      0.541986   0.574588   0.530971
             NetClus       0.3642     0.4396     0.3109
             MDC           0.3440     0.4268     0.2810
FOUR_AREAS   GHIN expsat   0.623117   0.598877   0.650079
             GHIN sat      0.5315     0.506687   0.5588
             NetClus       0.3612     0.36655    0.3560
             MDC           0.5085     0.5162     0.5010


Class distributions in clusters

Algorithm     Class   C1          C2           C3          C4
GHIN expsat   DB      0.0601266   0.93633      0.0133188   0.0512748
              DM      0.028481    0.0363608    0.0106007   0.850142
              IR      0.882911    0.0204432    0.133188    0.0339943
              AI      0.028481    0.00686642   0.842892    0.0645892
NetClus       DB      0.0553833   0.450802     0.500074    0.0955971
              DM      0.163934    0.15815      0.128535    0.304584
              IR      0.179553    0.0512035    0.242707    0.112786
              AI      0.60113     0.339844     0.128684    0.487033
MDC           DB      0.186681    0.232455     0.803727    0.000000
              DM      0.261844    0.000000     0.128592    0.161790
              IR      0.003183    0.278748     0.000000    0.75888
              AI      0.548292    0.488797     0.067680    0.079323


Sample Clusters

Terms         Authors                    Conferences
data          Surajit Chaudhuri          VLDB
database      Divesh Srivastava          SIGMOD
queries       H. V. Jagadish             ICDE
databases     Jeffrey F. Naughton        PODS
querys        Michael J. Carey           EDBT
xml           Raghu Ramakrishnan

mining        Jiawei Han                 KDD
learning      Christos Faloutsos         PAKDD
data          Wei Wang                   ICDM
frequent      Heikki Mannila             SDM
association   Srinivasan Parthasarathy   PKDD
patterns      Ke Wang                    ICML


Applying GHIN to EMAP data

E-MAP (epistatic miniarray profiles): query and target genes.

The genetic interaction score indicates whether a strain is healthier or sicker than expected (positive or negative).

The negative network is derived by using scores ≤ −2.5.

Find Nash points, and use functional enrichment: do we find small functional classes?
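The negative-network construction is a simple thresholding step; here is a sketch under an assumed data layout (interaction scores keyed by (query, target) pairs).

```python
# Derive the negative interaction network from E-MAP genetic interaction
# scores by thresholding at -2.5. `scores[(query, target)]` holds the score.

def negative_network(scores, threshold=-2.5):
    """Return query -> set of targets whose interaction score is <= threshold."""
    network = {}
    for (query, target), s in scores.items():
        if s <= threshold:
            network.setdefault(query, set()).add(target)
    return network

# Example:
scores = {("Q1", "T1"): -3.1, ("Q1", "T2"): 0.4, ("Q2", "T1"): -2.6}
print(negative_network(scores))  # {'Q1': {'T1'}, 'Q2': {'T1'}}
```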


Applying GHIN to EMAP data

[Figure: fraction of patterns enriched vs. P-value threshold for the Exp sat and tiringSat reward functions. Left panel: functional enrichment by large classes (31-500); right panel: functional enrichment by small classes.]


Clusters exclusively annotated by small functional classes:

YBR078W ECM33, YIL034C CAP2, YIL159W BNR1, YKL007W CAP1, YMR054W STV1, YMR058W FET3, YMR089C YTA12

YFL031W HAC1, YHR079C IRE1, YJL095W BCK1, YCL048W SPS22, YIL073C SPO22, YJL155C FBP26, YLR267W BOP2


Parameter study

Effect of w on extrinsic clustering quality.

[Figure: F1, F0.5, and F2 scores as a function of w for the MER, REC, PCR, PC, SCI, and FOUR_AREAS HINs.]


Parameter study

Effect of w on algorithm operation.

[Figure: average number of iterations to find a Nash equilibrium, total number of iterations, and number of clusters as a function of w for the MER, REC, PCR, PC, SCI, and FOUR_AREAS HINs.]


Conclusion

Novel framework for defining and enumerating HIN-clusters.

First (as far as I know) connection between information network clustering and game theory.

Initial experimental results show promise.


Ongoing and future work

Development of further reward functions (information theoretic, spectral?).

Clustering in biological data: do we find smaller functional classes compared to other bi-clustering methods?

Extension of framework to weighted HINs.

More algorithmic development.

Compare algorithms with actual Nash solver.


References

[1] A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In Proceedings of the SIAM International Conference on Data Mining, 2007.

[2] R. Bekkerman, R. El-Yaniv, and A. McCallum. Multi-way distributional clustering via pairwise interactions. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pages 41-48, New York, NY, USA, 2005. ACM.

[3] J. Li, G. Liu, H. Li, and L. Wong. Maximal biclique subgraphs and closed pattern pairs of the adjacency matrix: A one-to-one correspondence and mining algorithms. IEEE Trans. Knowl. Data Eng., 19(12):1625-1637, 2007.

[4] B. Long, X. Wu, Z. M. Zhang, and P. S. Yu. Unsupervised learning on k-partite graphs. In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 317-326, New York, NY, USA, 2006. ACM.

[5] B. Long, Z. M. Zhang, X. Wu, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, pages 585-592, New York, NY, USA, 2006. ACM.

[6] E. Mendelson. Introducing Game Theory and Its Applications. Chapman & Hall / CRC, 2004.

[7] M. J. Zaki, M. Peters, I. Assent, and T. Seidl. CLICKS: An effective algorithm for mining subspace clusters in categorical datasets. Data and Knowledge Engineering, special issue on Intelligent Data Mining, 60(2):51-70, 2007.

[8] G. Owen. Game Theory. Academic Press, 1995.

[9] R. Porter, E. Nudelman, and Y. Shoham. Simple search methods for finding a Nash equilibrium. Games and Economic Behavior, pages 664-669, 2004.

[10] Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu. RankClus: Integrating clustering with ranking for heterogeneous information network analysis. In Proc. 2009 Int. Conf. on Extending Data Base Technology (EDBT'09), 2009.

[11] Y. Sun, Y. Yu, and J. Han. Ranking-based clustering of heterogeneous information networks with star network schema. In Proc. 2009 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'09), 2009.

[12] A. Tanay, R. Sharan, and R. Shamir. Discovering statistically significant biclusters in gene expression data. In Proceedings of ISMB 2002, 2002.
