
Combinatorial Hodge Theory and a Geometric Approach to Ranking


Yuan Yao

2008 SIAM Annual Meeting

San Diego, July 7, 2008

with Lek-Heng Lim et al.


Outline

1 Ranking on networks (graphs)
    Netflix example
    Skew-symmetric matrices and network flows

2 Combinatorial Hodge Theory
    Discrete Differential Geometry
    Hodge Decomposition Theorem
    Rank aggregation as a linear projection

3 Why pairwise works for Netflix: a spectral embedding view

4 Conclusions


Ranking on Networks (Graphs)

“Multi-criteria” rank/decision systems
• Amazon or Netflix’s recommendation system (user-product)
• Interest ranking in social networks (person-interest)
• Financial analyst recommendation system (analyst-stock)
• Voting (voter-candidate)

“Peer-review” systems
• Publication citation systems (paper-paper)
• Google’s webpage ranking (web-web)
• eBay’s reputation system (customer-customer)


continued...

Ranking data are

incomplete: partial lists or pairwise comparisons (e.g. only ∼ 1% of the entries are observed in Netflix)

unbalanced: the number of votes varies widely across items (e.g. power law)

cardinal: scores or stochastic choices

Implicitly or explicitly, ranking data may be viewed as living on a simple graph G = (V, E)

V : set of alternatives (products, interests,...) to be ranked

E : pairs of alternatives compared


Example: Netflix

Example (Netflix Customer-Product Rating)

480189-by-17770 customer-product 5-star rating matrix X

X is incomplete, with 98.82% of its values missing

However,

the pairwise comparison graph G = (V, E) is very dense!

only 0.22% of the edges are missing, so G is almost complete

rank aggregation is possible without estimating missing values!

unbalanced: the number of raters on each edge e ∈ E varies.

Caveat: we are not trying to solve the Netflix prize problem


Netflix example continued

The first-order statistic, the mean score of each product, is often inadequate because:

most customers rate only a very small portion of the products

different products might have different raters, whence mean scores involve noise due to arbitrary individual rating scales

it cannot characterize inconsistency (ubiquitous in rank aggregation: Arrow’s impossibility theorem)

How about higher-order statistics?


From 1st Order to 2nd Order: Pairwise Rankings

Linear Model: the average score difference between products i and j over all customers who have rated both of them,

$$ w_{ij} = \frac{\sum_k (X_{kj} - X_{ki})}{\#\{k : X_{ki}, X_{kj} \text{ exist}\}}, $$

is translation invariant.

Log-linear Model: when all the scores are positive, the logarithmic average score ratio,

$$ w_{ij} = \frac{\sum_k (\log X_{kj} - \log X_{ki})}{\#\{k : X_{ki}, X_{kj} \text{ exist}\}}, $$

is invariant up to a multiplicative constant.
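A minimal NumPy sketch of the Linear Model, assuming the rating matrix X is stored as a dense array with `np.nan` for missing entries (the function name `pairwise_linear` and the toy data are illustrative, not from the talk). Passing `np.log(X)` instead of `X` would give the Log-linear Model.

```python
import numpy as np

def pairwise_linear(X):
    """Mean score difference w[i, j] over customers who rated both i and j.

    X is a customers-by-products array with np.nan marking missing ratings.
    Returns (w, counts); w[i, j] is np.nan where no customer rated both.
    """
    observed = ~np.isnan(X)                  # which ratings exist
    filled = np.where(observed, X, 0.0)      # zeros contribute nothing to sums
    mask = observed.astype(float)
    counts = mask.T @ mask                   # counts[i, j] = #{k : X_ki, X_kj exist}
    sum_j = mask.T @ filled                  # [i, j] = sum of X_kj over co-raters
    sum_i = filled.T @ mask                  # [i, j] = sum of X_ki over co-raters
    with np.errstate(invalid="ignore", divide="ignore"):
        w = (sum_j - sum_i) / counts
    w[counts == 0] = np.nan                  # pair never compared: not an edge
    np.fill_diagonal(w, 0.0)
    return w, counts

# Toy data: 3 customers, 3 products, each customer rates two products.
X = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 2.0, 1.0]])
w, counts = pairwise_linear(X)
w_shifted, _ = pairwise_linear(X + 10.0)     # translation leaves w unchanged
```

The matrix products avoid an explicit loop over product pairs; for data at Netflix scale a sparse representation would presumably be needed instead of dense arrays.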


More Invariants

Linear Probability Model: the probability that product j is preferred to i in excess of a purely random choice,

$$ w_{ij} = \Pr\{k : X_{kj} > X_{ki}\} - \frac{1}{2}. $$

This is invariant up to a monotone transformation.

Bradley-Terry Model: the logarithmic odds ratio (logit),

$$ w_{ij} = \log \frac{\Pr\{k : X_{kj} > X_{ki}\}}{\Pr\{k : X_{kj} < X_{ki}\}}. $$

This is invariant up to a monotone transformation.
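A rough sketch of both probability-based flows, again assuming `np.nan` marks missing ratings. The slides do not say how ties are treated; here a tie is counted as half a win for each side so that the result stays skew-symmetric, and a small `eps` guards the logarithm. Both choices, and the function name, are assumptions made for illustration.

```python
import numpy as np

def pairwise_probability_models(X, eps=1e-12):
    """Probability-difference and log-odds flows from a ratings array X.

    X is customers-by-products with np.nan for missing ratings.
    Assumption (not from the slides): ties count as half a win for each
    side, and eps guards the logarithm when one side never wins.
    """
    n = X.shape[1]
    prob_diff = np.full((n, n), np.nan)
    log_odds = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(n):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if i == j or not both.any():
                continue                      # no comparison: leave as NA
            xi, xj = X[both, i], X[both, j]
            ties = 0.5 * np.mean(xi == xj)
            p_j = np.mean(xj > xi) + ties     # Pr{j preferred to i}
            p_i = np.mean(xj < xi) + ties     # Pr{i preferred to j}
            prob_diff[i, j] = p_j - 0.5
            log_odds[i, j] = np.log((p_j + eps) / (p_i + eps))
    return prob_diff, log_odds

# Toy data: 4 customers rate both products; one of them is a tie.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.0, 2.0]])
prob_diff, log_odds = pairwise_probability_models(X)
# A monotone transformation of the scores leaves both flows unchanged.
prob_diff_exp, log_odds_exp = pairwise_probability_models(np.exp(X))
```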


Skew-Symmetric Matrices of Pairwise Rankings

All such models induce (sparse) skew-symmetric |V|-by-|V| matrices (or 2-alternating tensors),

$$ w_{ij} = \begin{cases} -w_{ji}, & \{i, j\} \in E \\ \text{NA}, & \text{otherwise} \end{cases} $$

where G = (V, E) is a pairwise comparison graph.

Note: Such a skew-symmetric matrix induces a pairwise ranking network flow on graph G.


Pairwise Ranking Flow for IMDB Top 20 Movies

Figure: Pairwise ranking flow on a complete graph


Rank Aggregation Problem

Hardness:

Arrow-Sen’s impossibility theorems in social choice theory

Kemeny-Snell optimal ordering is NP-hard to compute

Spectral analysis on $S_n$ is impractical for large n since $|S_n| = n!$

Our approach:

Problem (Rank Aggregation)

Does there exist a global ranking function, v : V → R, such that

$$ w_{ij} = v_j - v_i =: \delta_0(v)(i, j)? $$

Equivalently, does there exist a scalar field v : V → R whose gradient field gives the flow w?


Answer: Not Always!

From multivariate calculus, there are non-integrable vector fields (cf. the movie A Beautiful Mind). A combinatorial version:

Figure: No global ranking v gives $w_{ij} = v_j - v_i$: (a) cyclic ranking on nodes A, B, C, note $w_{AB} + w_{BC} + w_{CA} \neq 0$; (b) a flow on nodes A to F that contains a 4-node cyclic flow A → C → D → E → A, note that on the 3-clique {A, B, C} (also {A, E, F}), $w_{AB} + w_{BC} + w_{CA} = 0$


Triangular Transitivity: null triangular-trace

Fact

For a skew-symmetric matrix $W = (w_{ij})$ associated with graph G,

$$ \exists v : w_{ij} = v_j - v_i \;\Rightarrow\; w_{ij} + w_{jk} + w_{ki} = 0, \quad \forall \text{ 3-cliques } \{i, j, k\} $$

Note:

Transitivity subspace: null triangular-trace or curl-free

$$ \{W : w_{ij} + w_{jk} + w_{ki} = 0, \ \forall \text{ 3-cliques } \{i, j, k\}\} $$

In the example on the last slide, (a) lies outside this subspace; (b) lies in it, but is not a gradient flow.
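The Fact above can be checked numerically. The sketch below assumes a complete graph, so every triple of nodes is a 3-clique; the scores `v` and the helper name `triangular_traces` are hypothetical. The cyclic matrix mimics example (a), a loop A → B → C → A with unit weights.

```python
import numpy as np
from itertools import combinations

def triangular_traces(W):
    """Return {(i, j, k): w_ij + w_jk + w_ki} over all triples of a dense W."""
    n = W.shape[0]
    return {(i, j, k): W[i, j] + W[j, k] + W[k, i]
            for i, j, k in combinations(range(n), 3)}

# A gradient flow w_ij = v_j - v_i built from hypothetical scores v.
v = np.array([0.0, 1.0, 3.0])
W_grad = v[None, :] - v[:, None]

# Cyclic flow on A, B, C: A -> B -> C -> A, each edge carrying weight 1.
W_cyc = np.array([[0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, -1.0, 0.0]])

trace_grad = triangular_traces(W_grad)[(0, 1, 2)]   # vanishes
trace_cyc = triangular_traces(W_cyc)[(0, 1, 2)]     # does not vanish
```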


Hodge decomposition for skew-symmetric matrices

A skew-symmetric matrix W associated with G has a unique orthogonal decomposition

$$ W = W_1 + W_2 + W_3 $$

where

$W_1$ satisfies (‘integrable’): $W_1(i, j) = v_j - v_i$ for some v : V → R;

$W_2$ satisfies that
• (‘curl-free’) $W_2(i, j) + W_2(j, k) + W_2(k, i) = 0$ for all 3-cliques {i, j, k}
• (‘divergence-free’) $\sum_{j : \{i, j\} \in E} W_2(i, j) = 0$ for all i ∈ V;

$W_3 \perp W_1$ and $W_3 \perp W_2$.


Hodge decomposition for network flows

A network flow (e.g. pairwise rankings) on graph G has an orthogonal decomposition into

gradient flow + locally acyclic (harmonic) + locally cyclic

where the first two components lie in the transitivity subspace and

the gradient flow is integrable and gives a global ranking

example (b) is locally acyclic, but cyclic on a large scale (harmonic)

example (a) is locally cyclic


Clique Complex and Discrete Differential Forms

We extend graph G to a simplicial complex $\chi_G$ by attaching triangles

0-simplices $\chi_G^0$: vertices V

1-simplices $\chi_G^1$: edges E such that a comparison (i.e. pairwise ranking) between i and j exists

2-simplices $\chi_G^2$: triangles {i, j, k} such that every edge exists

Note: it suffices here to construct $\chi_G$ up to dimension 2!

global ranking v : V → R, 0-forms, vectors

pairwise ranking w(i, j) = −w(j, i) (∀(i, j) ∈ E), 1-forms, skew-symmetric matrices, network flows
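The construction of the clique complex up to dimension 2 can be sketched in a few lines. The function name `clique_complex` and the example graph (one filled triangle glued to a 4-cycle) are illustrative assumptions; the brute-force scan over all triples is only meant for small graphs.

```python
from itertools import combinations

def clique_complex(n_vertices, edges):
    """Vertices, oriented edges (i < j) and triangles of the clique complex."""
    edge_set = {tuple(sorted(e)) for e in edges}
    # A triple is a 2-simplex exactly when all three of its sides are edges.
    triangles = [t for t in combinations(range(n_vertices), 3)
                 if all(pair in edge_set for pair in combinations(t, 2))]
    return list(range(n_vertices)), sorted(edge_set), triangles

# Hypothetical example: a triangle {0, 1, 2} plus a 4-cycle 0-2-3-4-0.
vertices, edges, triangles = clique_complex(
    5, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 0)])
```

Only the triangle {0, 1, 2} is attached; the 4-cycle stays unfilled and is the kind of hole that can carry a harmonic flow.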


Space of k-Forms (k-cochains) and Metrics

k-forms:

$$ C^k(\chi_G, \mathbb{R}) = \{u : \chi_G^{k+1} \to \mathbb{R}, \ u_{\sigma(i_0), \ldots, \sigma(i_k)} = \mathrm{sign}(\sigma)\, u_{i_0, \ldots, i_k}\} $$

for $(i_0, \ldots, i_k) \in \chi_G^{k+1}$, where $\sigma \in S_{k+1}$ is a permutation on (0, . . . , k). These are also (k + 1)-alternating tensors.

One may equip $C^k(\chi_G, \mathbb{R})$ with inner products.

In particular, the following inner product on 1-forms addresses the imbalance issue in pairwise ranking

$$ \langle w, \omega \rangle_D = \sum_{(i,j) \in E} D_{ij} w_{ij} \omega_{ij}, \quad w, \omega \in C^1(\chi_G, \mathbb{R}) $$

where $D_{ij}$ = |{customers who rate both i and j}|.


Discrete Exterior Derivatives (Coboundary Maps)

k-coboundary maps $\delta_k : C^k(\chi_G, \mathbb{R}) \to C^{k+1}(\chi_G, \mathbb{R})$ are defined as the alternating difference operator

$$ (\delta_k u)(i_0, \ldots, i_{k+1}) = \sum_{j=0}^{k+1} (-1)^{j}\, u(i_0, \ldots, i_{j-1}, i_{j+1}, \ldots, i_{k+1}) $$

$\delta_k$ plays the role of differentiation

In particular,
• $(\delta_0 v)(i, j) = v_j - v_i =: (\mathrm{grad}\, v)(i, j)$
• $(\delta_1 w)(i, j, k) = w_{ij} + w_{jk} + w_{ki} =: (\mathrm{curl}\, w)(i, j, k)$ (triangular trace of the skew-symmetric matrix $(w_{ij})$)
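Once edges and triangles are given a fixed orientation, grad and curl become ordinary matrices. The sketch below assumes edges are stored as pairs (i, j) with i < j and triangles as sorted triples; the function name and the small example complex are illustrative.

```python
import numpy as np

def coboundary_matrices(n_vertices, edges, triangles):
    """Matrices of delta_0 (grad) and delta_1 (curl) for oriented simplices.

    edges are pairs (i, j) with i < j; triangles are triples (i, j, k) with
    i < j < k whose three sides all appear in edges.
    """
    edge_index = {e: r for r, e in enumerate(edges)}
    d0 = np.zeros((len(edges), n_vertices))
    for r, (i, j) in enumerate(edges):
        d0[r, i], d0[r, j] = -1.0, 1.0          # (delta_0 v)(i, j) = v_j - v_i
    d1 = np.zeros((len(triangles), len(edges)))
    for r, (i, j, k) in enumerate(triangles):
        d1[r, edge_index[(i, j)]] = 1.0         # + w_ij
        d1[r, edge_index[(j, k)]] = 1.0         # + w_jk
        d1[r, edge_index[(i, k)]] = -1.0        # + w_ki = - w_ik
    return d0, d1

edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4)]
triangles = [(0, 1, 2)]
d0, d1 = coboundary_matrices(5, edges, triangles)

v = np.array([0.0, 1.0, 3.0, 2.0, 5.0])         # hypothetical global scores
grad_v = d0 @ v                                  # one value v_j - v_i per edge
```

With these matrices the product `d1 @ d0` is the zero matrix, which is the matrix form of "the curl of a gradient vanishes".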


Curl

Definition

For each triangle {i, j, k}, the curl (triangular trace)

$$ w_{ij} + w_{jk} + w_{ki} $$

measures the total flow-sum along the loop i → j → k → i.

$(\delta_1 w)(i, j, k) = 0$ implies the flow is path-independent, which defines the triangular transitivity subspace.


Two directions of cochain maps:

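Forward:

$$ C^0 \xrightarrow{\ \delta_0\ } C^1 \xrightarrow{\ \delta_1\ } C^2, $$

in other words,

$$ \text{Global} \xrightarrow{\ \mathrm{grad}\ } \text{Pairwise} \xrightarrow{\ \mathrm{curl}\ } \text{Triplewise} $$

Backward:

$$ \text{Global} \xleftarrow{\ \delta_0^* = \mathrm{grad}^*\ } \text{Pairwise} \xleftarrow{\ \delta_1^* = \mathrm{curl}^*\ } \text{3-alternating } C^2 $$

• $\mathrm{grad}^* = \delta_0^*$ is the negative divergence
• $\mathrm{curl}^* = \delta_1^*$ is the boundary operator; it gives triangular (locally) cyclic pairwise rankings along triangles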


Divergence

Definition

For each alternative i ∈ V, the divergence

$$ (\mathrm{div}\, w)(i) := -(\delta_0^T w)(i) := \sum_j w_{ij} $$

measures the inflow-outflow sum at i.

$(\delta_0^T w)(i) = 0$ implies alternative i is preference-neutral in all pairwise comparisons.

a divergence-free flow, $\delta_0^T w = 0$, is cyclic

With metric D, the conjugate operator gives the weighted flow-sum

$$ -(\delta_0^* w)(i) = \sum_j D_{ij} w_{ij} = -(\delta_0^T D w)(i) $$
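A small numerical check of the two ways of writing the divergence: as a row sum of the skew-symmetric matrix, and as the negative transpose of grad acting on an edge vector. The matrices W and D below are made-up values on a complete graph with three alternatives.

```python
import numpy as np

# Hypothetical skew-symmetric flow on 3 alternatives (complete graph).
W = np.array([[0.0, 2.0, 1.0],
              [-2.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
D = np.array([[0.0, 10.0, 1.0],      # D_ij: number of customers rating both
              [10.0, 0.0, 4.0],
              [1.0, 4.0, 0.0]])

div = W.sum(axis=1)                  # (div w)(i) = sum_j w_ij
weighted_div = (D * W).sum(axis=1)   # sum_j D_ij w_ij

# Same quantities through the edge-indexed operator delta_0.
edges = [(0, 1), (0, 2), (1, 2)]
d0 = np.zeros((len(edges), 3))
for r, (i, j) in enumerate(edges):
    d0[r, i], d0[r, j] = -1.0, 1.0
w = np.array([W[i, j] for i, j in edges])
d = np.array([D[i, j] for i, j in edges])

div_from_d0 = -d0.T @ w
weighted_div_from_d0 = -d0.T @ (d * w)
```

Because W is skew-symmetric, the entries of `div` always sum to zero.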


A Fundamental Property: Closed Map Property

‘Boundary of boundary is empty’: $\delta_{k+1} \circ \delta_k = 0$

in particular
• gradient flow is curl-free: $\mathrm{curl} \circ \mathrm{grad} = \delta_1 \circ \delta_0 = 0$
• circular flow is divergence-free: $\mathrm{div} \circ \mathrm{curl}^* = -\delta_0^* \circ \delta_1^* = 0$

This leads to the powerful combinatorial Laplacians.
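For k = 0 the closed map property can be verified by hand, using only the definitions of grad and curl given on the earlier slides:

```latex
\begin{aligned}
(\mathrm{curl}\,\mathrm{grad}\, v)(i,j,k)
  &= (\mathrm{grad}\, v)(i,j) + (\mathrm{grad}\, v)(j,k) + (\mathrm{grad}\, v)(k,i) \\
  &= (v_j - v_i) + (v_k - v_j) + (v_i - v_k) = 0 .
\end{aligned}
```

The second bullet follows by taking adjoints: $(\delta_1 \delta_0)^* = \delta_0^* \delta_1^* = 0$.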


Combinatorial Laplacians

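Define the k-dimensional combinatorial Laplacian, $\Delta_k : C^k \to C^k$, by

$$ \Delta_k = \delta_{k-1} \delta_{k-1}^* + \delta_k^* \delta_k, \quad k > 0 $$

k = 0, $\Delta_0 = \delta_0^T \delta_0$ is the well-known graph Laplacian

k = 1, $\Delta_1 = \mathrm{curl}^* \circ \mathrm{curl} - \mathrm{grad} \circ \mathrm{div}$

Important Properties:
• $\Delta_k$ is positive semi-definite
• $\ker(\Delta_k) = \ker(\delta_{k-1}^T) \cap \ker(\delta_k)$: divergence-free and curl-free, called harmonics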
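A sketch of both Laplacians for a tiny complex, assuming the unit metric so that adjoints are transposes. The complex (a filled triangle glued to an unfilled 4-cycle) is a hypothetical example chosen so that exactly one harmonic direction exists.

```python
import numpy as np

# Hypothetical complex: filled triangle {0, 1, 2} glued to an unfilled
# 4-cycle 0-2-3-4-0 (a "hole" of boundary length 4).
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4)]
triangles = [(0, 1, 2)]
edge_index = {e: r for r, e in enumerate(edges)}

d0 = np.zeros((len(edges), 5))
for r, (i, j) in enumerate(edges):
    d0[r, i], d0[r, j] = -1.0, 1.0
d1 = np.zeros((len(triangles), len(edges)))
for r, (i, j, k) in enumerate(triangles):
    d1[r, edge_index[(i, j)]] = 1.0
    d1[r, edge_index[(j, k)]] = 1.0
    d1[r, edge_index[(i, k)]] = -1.0

L0 = d0.T @ d0                 # graph Laplacian: vertex degrees on the diagonal
L1 = d0 @ d0.T + d1.T @ d1     # Delta_1 with the unit metric

eigvals = np.linalg.eigvalsh(L1)
harmonic_dim = int(np.sum(eigvals < 1e-10))   # dimension of ker(Delta_1)
```

Here `harmonic_dim` counts the independent holes of the complex: the graph has two independent cycles, and filling the triangle removes one of them.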


Hodge Decomposition Theorem

Theorem

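The space of pairwise rankings, $C^1(\chi_G, \mathbb{R})$, admits an orthogonal decomposition into three components

$$ C^1(\chi_G, \mathbb{R}) = \mathrm{im}(\delta_0) \oplus H_1 \oplus \mathrm{im}(\delta_1^*) $$

where $H_1 = \ker(\delta_1) \cap \ker(\delta_0^*) = \ker(\Delta_1)$.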
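Numerically, the three components can be obtained from two least-squares projections. This is a minimal sketch assuming the unit metric (all $D_{ij} = 1$); the function name, the example complex and the flow `w` are illustrative.

```python
import numpy as np

def hodge_decompose(w, d0, d1):
    """Split an edge flow w into gradient, harmonic and curl components.

    Uses the unit metric, so the adjoints are plain transposes.
    """
    v, *_ = np.linalg.lstsq(d0, w, rcond=None)      # project onto im(delta_0)
    grad_part = d0 @ v
    z, *_ = np.linalg.lstsq(d1.T, w, rcond=None)    # project onto im(delta_1^*)
    curl_part = d1.T @ z
    harmonic = w - grad_part - curl_part            # what is left lies in H_1
    return grad_part, harmonic, curl_part

# Filled triangle {0, 1, 2} glued to an unfilled 4-cycle 0-2-3-4-0.
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4)]
d0 = np.zeros((6, 5))
for r, (i, j) in enumerate(edges):
    d0[r, i], d0[r, j] = -1.0, 1.0
d1 = np.array([[1.0, -1.0, 0.0, 1.0, 0.0, 0.0]])    # triangle (0, 1, 2)

w = np.array([1.0, 2.0, -1.0, 3.0, 0.5, 1.5])       # arbitrary edge flow
grad_part, harmonic, curl_part = hodge_decompose(w, d0, d1)
```

The two projections can be computed independently because im(δ0) and im(δ1*) are orthogonal, a consequence of δ1 ∘ δ0 = 0.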


Hodge Decomposition Illustration

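Figure: Hodge decomposition for pairwise rankings. The three blocks are im δ0 (global, globally acyclic) ⊕ H1 (harmonic, locally acyclic) ⊕ im δ∗1 (locally cyclic). The first two together form ker δ1, the locally consistent (curl-free) part; the last two together form ker δ∗0, the cyclic (divergence-free) part.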


Harmonic Rankings: Locally but NOT Globally Consistent

Figure: (a) a harmonic ranking; (b) from truncated Netflix network


Rank aggregation as a linear projection

Corollary

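Every pairwise ranking admits a unique orthogonal decomposition,

$$ w = \mathrm{proj}_{\mathrm{im}\,\delta_0} w + \mathrm{proj}_{\ker(\delta_0^*)} w $$

i.e. pairwise = grad(global) + cyclic

In particular, the first projection grad(global) gives a global ranking

$$ x^* = (\delta_0^* \delta_0)^{\dagger} \delta_0^* w = -(\Delta_0)^{\dagger}\, \mathrm{div}\, w $$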
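The projection formula translates directly into a weighted least-squares solve. The sketch below assumes the metric D is given as one weight per edge; the function name is illustrative, and the "relative cyclic residue" is taken here to be the ratio of squared D-norms of the residual and of w, which is an assumption about how the residue is measured rather than something the slides define.

```python
import numpy as np

def global_ranking(n, edges, w, weights):
    """Minimum-norm least-squares scores x with x_j - x_i ~ w_ij.

    edges: list of (i, j); w: flow on each edge; weights: D_ij per edge.
    Returns (x, relative_cyclic_residue).
    """
    d0 = np.zeros((len(edges), n))
    for r, (i, j) in enumerate(edges):
        d0[r, i], d0[r, j] = -1.0, 1.0
    D = np.diag(weights)
    laplacian = d0.T @ D @ d0                        # delta_0^* delta_0
    x = np.linalg.pinv(laplacian) @ (d0.T @ D @ w)   # pseudo-inverse solve
    residual = w - d0 @ x                            # cyclic part of w
    rel = (residual @ D @ residual) / (w @ D @ w)    # squared-norm ratio in <.,.>_D
    return x, rel

edges = [(0, 1), (0, 2), (1, 2)]

# A perfectly consistent flow built from hypothetical scores v.
v = np.array([0.0, 1.0, 3.0])
w_consistent = np.array([v[j] - v[i] for i, j in edges])
x, rel = global_ranking(3, edges, w_consistent, np.array([10.0, 1.0, 4.0]))

# A purely cyclic flow 0 -> 1 -> 2 -> 0 with unit weights.
w_cyclic = np.array([1.0, -1.0, 1.0])
x_cyc, rel_cyc = global_ranking(3, edges, w_cyclic, np.ones(3))
```

The consistent flow recovers v up to an additive constant with zero residue; the cyclic flow yields no ranking information at all and a residue of 1.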


When Harmonic Ranking Vanishes

Corollary

If the clique complex $\chi_G$ has trivial 1-homology (no holes of boundary length ≥ 4), then triangular transitivity (curl w = 0) implies the existence of a global ranking, i.e. w ∈ im δ0.

On simply connected domains, curl-free vector fields are integrable.

If $\chi_G$ has trivial 1-homology, local consistency implies global consistency

A particular case is when G is a complete graph: the projection onto the transitivity subspace gives a unique global ranking, called the Borda count (1782) in social choice theory
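For the complete graph the projection has a closed form. A short derivation, assuming unit weights $D_{ij} = 1$ and the normalization $\sum_i x_i = 0$ (both assumptions made here for simplicity), and using that $\sum_i (\mathrm{div}\, w)(i) = 0$ for any skew-symmetric w:

```latex
\begin{aligned}
\Delta_0 &= \delta_0^T \delta_0 = n I - \mathbf{1}\mathbf{1}^T, \\
\Delta_0 x &= n x \quad \text{whenever } \textstyle\sum_i x_i = 0, \\
x^*_i &= -\bigl(\Delta_0^{\dagger}\, \mathrm{div}\, w\bigr)(i)
       = -\frac{1}{n} \sum_{j} w_{ij}
       = \frac{1}{n} \sum_{j} w_{ji}.
\end{aligned}
```

So the score of alternative i is, up to the factor 1/n, the total margin by which i is preferred over the other alternatives, which is a row-sum rule of Borda type.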


Example: Erdős-Rényi Random Graph

Theorem (Kahle ’06)

For an Erdős-Rényi random graph G(n, p) with n vertices and each edge independently emerging with probability p, its clique complex $\chi_G$ almost always has zero 1-homology, except when

$$ \frac{1}{n^2} \ll p \ll \frac{1}{n}. $$

Note that the full Netflix movie-movie comparison graph is almost complete (0.22% missing edges); is it an Erdős-Rényi random graph?


Which Pairwise Ranking Model is More Consistent?

The curl distribution measures the intrinsic local inconsistency in a pairwise ranking:

Figure: Curl distribution of three pairwise rankings (Pairwise Score Difference, Pairwise Probability Difference, Probability Log Ratio), based on the 500 most popular movies. The pairwise score difference (the Linear model), in red, has the thinnest tail.


Comparisons of Netflix Global Rankings

                         Mean Score   Score Difference   Probability Difference   Logarithmic Odds Ratio
Mean Score               1.0000       0.9758             0.9731                   0.9746
Score Difference                      1.0000             0.9976                   0.9977
Probability Difference                                   1.0000                   0.9992
Logarithmic Odds Ratio                                                            1.0000
Cyclic Residue           -            6.03%              7.16%                    7.15%

Table: Kendall’s rank correlation coefficients between different global rankings for Netflix. Note that the pairwise score difference (the Linear model) has the smallest relative residue.


Why Pairwise Ranking Works for Netflix?

Pairwise rankings are good approximations of gradient flows on movie-movie networks

In fact, Netflix data on the large scale behave like a 1-dimensional curve in a high-dimensional space

To visualize this, we may use a spectral embedding approach


Spectral Embedding

Map every movie to a point in $S^4$ by

$$ \text{movie } m \mapsto \left(\sqrt{p_1(m)}, \ldots, \sqrt{p_5(m)}\right) $$

where $p_k(m)$ is the probability that movie m is rated as star k ∈ {1, . . . , 5}. Obtain a movie-by-star matrix Y.

Do SVD on Y, which is equivalent to an eigenvalue decomposition of the linear kernel

$$ K(s, t) = \langle s, t \rangle^d, \quad d = 1, \quad s, t \in S^4 $$

K(s, t) is non-negative, whence the first eigenvector captures the centricity (density) of the data and the second captures a tangent field of the manifold.
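The embedding steps can be sketched as follows. The star counts are synthetic random numbers used purely for illustration (not Netflix data), and the function name `spectral_embedding` is hypothetical.

```python
import numpy as np

def spectral_embedding(star_counts):
    """Embed movies via the square roots of their star-rating frequencies.

    star_counts: movies-by-5 array, entry [m, k-1] = number of k-star
    ratings of movie m (every movie is assumed to have at least one rating).
    Returns (Y, U, s): the movie-by-star matrix, its left singular vectors
    and its singular values.
    """
    p = star_counts / star_counts.sum(axis=1, keepdims=True)   # p_k(m)
    Y = np.sqrt(p)                                             # rows lie on S^4
    U, s, _ = np.linalg.svd(Y, full_matrices=False)
    return Y, U, s

# Synthetic counts, for illustration only (not Netflix data).
rng = np.random.default_rng(0)
counts = rng.integers(1, 50, size=(20, 5)).astype(float)
Y, U, s = spectral_embedding(counts)

K = Y @ Y.T                                   # linear kernel K(s, t) = <s, t>
top_eigs = np.sort(np.linalg.eigvalsh(K))[::-1][:5]
```

Each row of Y has unit norm because the frequencies $p_k(m)$ sum to one, and the nonzero eigenvalues of K are the squared singular values of Y, which is the stated equivalence between the SVD and the kernel eigendecomposition. The columns `U[:, 0]` and `U[:, 1]` would be the coordinates to plot.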


SVD Embedding

Figure: The second singular vector is monotonic in the mean score, indicating that the intrinsic parameter of the horseshoe curve is driven by the mean score


Conclusions

Ranking as 1-dimensional scaling of data

Pairwise ranking as approximate gradient fields or flows on graphs

Hodge Theory provides an orthogonal decomposition for pairwise ranking flows

This decomposition helps characterize the local (triangular) vs. global consistency of pairwise rankings, and gives a natural rank aggregation scheme

Geometry and topology play important roles, but everything is just linear algebra!


Acknowledgements

Gunnar Carlsson

Vin de Silva

Persi Diaconis

Leo Guibas

Fei Han

Susan Holmes

Qixing Huang

Xiaoye Jiang

Ming Ma

Jason Morton

Art Owen

Michael Saunders

Harlan Sexton

Steve Smale

Shmuel Weinberger

Ya Xu