combinatorial hodge theory and a geometric approach to ranking

36
Combinatorial Hodge Theory and a Geometric Approach to Ranking Combinatorial Hodge Theory and a Geometric Approach to Ranking Yuan Yao 2008 SIAM Annual Meeting San Diego, July 7, 2008 with Lek-Heng Lim et al.

Upload: others

Post on 12-Sep-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a GeometricApproach to Ranking

Yuan Yao

2008 SIAM Annual Meeting

San Diego, July 7, 2008

with Lek-Heng Lim et al.

Page 2: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Outline

1 Ranking on networks (graphs)Netflix exampleSkew-symmetric matrices and network flows

2 Combinatorial Hodge TheoryDiscrete Differential GeometryHodge Decomposition TheoremRank aggregation as a linear projection

3 Why pairwise works for netflix: a spectral embedding view

4 Conclusions

Page 3: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Ranking on Networks (Graphs)

“Multi-criteria” rank/decision systems• Amazon or Netflix’s recommendation system (user-product)• Interest ranking in social networks (person-interest)• Financial analyst recommendation system (analyst-stock)• Voting (voter-candidate)

“Peer-review” systems• publication citation systems (paper-paper)• Google’s webpage ranking (web-web)• eBay’s reputation system (customer-customer)

Page 4: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

continued...

Ranking data are

incomplete: partial list or pairwise (e.g. ∼ 1% in netflix)

unbalanced: varied distributed votes (e.g. power law)

cardinal: scores or stochastic choices

Implicitly or explicitly, ranking data may be viewed to live on asimple graph G = (V ,E )

V : set of alternatives (products, interests,...) to be ranked

E : pairs of alternatives compared

Page 5: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Netflix example

Example: Netflix

Example (Netflix Customer-Product Rating)

480189-by-17770 customer-product 5-star rating matrix X

X is incomplete, with 98.82% of missing values

However,

pairwise comparison graph G = (V ,E ) is very dense!

only 0.22% edges are missed, almost complete

rank aggregation without estimating missing values!

unbalanced: number of raters on e ∈ E varies.

Caveat: we are not trying to solve the Netflix prize problem

Page 6: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Netflix example

Netflix example continued

The first-order statistics, mean score for each product, is ofteninadequate because of the following:

most customers just rate a very small portion of the products

different products might have different raters, whence meanscores involve noise due to arbitrary individual rating scales

not able to characterize the inconsistency (ubiquitous for rankaggregation: Arrow”s impossibility theorem)

How about high order statistics?

Page 7: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Skew-symmetric matrices and network flows

From 1st Order to 2nd Order: Pairwise Rankings

Linear Model: average score difference between product i andj over all customers who have rated both of them,

wij =

k(Xkj − Xki )

#{k : Xki ,Xkj exist} ,

is translation invariant.

Log-linear Model: when all the scores are positive, thelogarithmic average score ratio,

wij =

k(log Xkj − log Xki )

#{k : Xki ,Xkj exist} ,

is invariant up to a multiplicative constant.

Page 8: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Skew-symmetric matrices and network flows

More Invariants

Linear Probability Model: the probability that product j ispreferred to i in excess of a purely random choice,

wij = Pr{k : Xkj > Xki} −1

2.

This is invariant up to a monotone transformation.

Bradley-Terry Model: logarithmic odd ratio (logit)

wij = logPr{k : Xkj > Xki}Pr{k : Xkj < Xki}

.

This is invariant up to a monotone transformation.

Page 9: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Skew-symmetric matrices and network flows

Skew-Symmetric Matrices of Pairwise Rankings

All such models induce (sparse) skew-symmetric matrices of|V |-by-|V | (or 2-alternating tensor),

wij =

{

−wji , {i , j} ∈ ENA, otherwise

where G = (V ,E ) is a pairwise comparison graph.Note: Such a skew-symmetric matrix induces a pairwise rankingnetwork flow on graph G .

Page 10: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Skew-symmetric matrices and network flows

Pairwise Ranking Flow for IMDB Top 20 Movies

Figure: Pairwise ranking flow on a complete graph

Page 11: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Skew-symmetric matrices and network flows

Rank Aggregation Problem

Hardness:

Arrow-Sen’s impossibility theorems in social choice theory

Kemeny-Snell optimal ordering is NP-hard to compute

Spectral analysis on Sn is impractical for large n since|Sn| = n!

Our approach:

Problem (Rank Aggregation)

Does there exist a global ranking function, v : V → R, such that

wij = vj − vi =: δ0(v)(i , j)?

Equivalently, does there exist a scalar field v : V → R whosegradient field gives the flow w?

Page 12: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Skew-symmetric matrices and network flows

Answer: Not Always!

From multivariate calculus, there are non-integrable vector fields(cf. movie A Beautiful Mind). A combinatorial version:

1

A

B

C

1

1

B

1 2

1

1

1

1

1

2

C

D

EF

A

Figure: No global ranking v gives wij = vj − vi : (a) cyclic ranking, notewAB + wBC + wCA 6= 0; (b) contains a 4-node cyclic flowA→ C → D → E → A, note on 3-clique {A, B, C} (also {A, E , F}),wAB + wBC + wCA = 0

Page 13: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Skew-symmetric matrices and network flows

Triangular Transitivity: null triangular-trace

Fact

For a skew-symmetric matrix W = (wij) associated with graph G,

∃v : wij = vj − vi⇒wij + wjk + wki = 0,∀ 3-clique {i , j , k}

Note:

Transitivity subspace: null triangular-trace or curl-free

{W : wij + wjk + wki = 0,∀ 3-clique {i , j , k}}

Example in the last slide, (a) lies outside; (b) lies in thissubspace, but not a gradient flow.

Page 14: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Skew-symmetric matrices and network flows

Hodge decomposition for skew-symmetric matrices

A skew-symmetric matrix W associated with G has an uniqueorthogonal decomposition

W = W1 + W2 + W3

where

W1 satisfies (’integrable’): W1(i , j) = vj − vi for somev : V → R;

W2 satisfies that• (’curl-free’) W2(i , j) + W2(j , k) + W3(k, i) = 0 for all3-clique {i , j , k}• (’divergence-free’)

j W3(i , j) = 0 for all edge {i , j} ∈ E ;

W3 ⊥W1 and W3 ⊥W2.

Page 15: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Ranking on networks (graphs)

Skew-symmetric matrices and network flows

Hodge decomposition for network flows

A network flows (e.g. pairwise rankings) on graph G has anorthogonal decomposition into

gradient flow + locally acyclic (harmonic) + locally cyclic

where the first two components lie in the transitivity subspace and

gradient flow is integrable to give a global ranking

example (b) is locally acyclic, but cyclic on large scale(harmonic)

example (a) is locally cyclic

Page 16: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Discrete Differential Geometry

Clique Complex and Discrete Differential Forms

We extend graph G to a simplicial complex χG by attachingtriangles

0-simplices χ0G : vertices V

1-simplices χ1G : edges E such that comparison (i.e. pairwise

ranking) between i and j exists

2-simplices χ2G : triangles {i , j , k} such that every edge exists

Note: it suffices here to construct χG up to dimension 2!

global ranking v : V → R, 0-forms, vectors

pairwise ranking w(i , j) = −w(j , i) (∀(i , j) ∈ E ), 1-forms,skew-symmetric matrices, network flows

Page 17: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Discrete Differential Geometry

Space of k-Forms (k-cochains) and Metrics

k-forms:

C k(χG , R) = {u : χk+1G → R, uσ(i0),...,σ(ik ) = sign(σ)ui0,...,ik}

for (i0, . . . , ik) ∈ χk+1G , where σ ∈ Sk+1 is a permutation on

(0, . . . , k). Also k + 1-alternating tensors.

One may associate C k(χG , R) with inner-products.

In particular, the following inner-product on 1-forms is forunbalance issue in pairwise ranking

〈wij , ωij〉D =∑

(i ,j)∈E

Dijwijωij , w , ω ∈ C 1(χG , R)

where Dij = |{customers who rate both i and j}|.

Page 18: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Discrete Differential Geometry

Discrete Exterior Derivatives (Coboundary Maps)

k-coboundary maps δk : C k(χG , R)→ C k+1(χG , R) aredefined as the alternating difference operator

(δku)(i0, . . . , ik+1) =k+1∑

j=0

(−1)j+1u(i0, . . . , ij−1, ij+1, . . . , ik+1)

δk plays the role of differentiation

In particular,• (δ0v)(i , j) = vj − vi =: (gradv)(i , j)• (δ1w)(i , j , k) = (±)wij + wjk + wki =: (curlw)(i , j , k)(triangular-trace of skew-symmetric matrix (wij))

Page 19: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Discrete Differential Geometry

Curl

Definition

For each triangle {i , j , k}, the curl (triangular trace)

wij + wjk + wki

measures the total flow-sum along the loop i → j → k → i .

(δ1w)(i , j , k) = 0 implies the flow is path-independent, whichdefines the triangular transitivity subspace.

Page 20: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Discrete Differential Geometry

Two directions of cochain maps:

ForwardC 0 δ0−→ C 1 δ1−→ C 2,

in other words,

Globalgrad−−→ Pairwise

curl−−→ Triplewise

Backward

Globalδ∗0 =grad∗←−−−−−− Pairwise

δ∗1 =curl∗←−−−−− 3-alternating C 2

• grad∗ = δ∗0 is the negative divergence• curl∗ = δ∗1 , is the boundary operator, gives triangular(locally) cyclic pairwise rankings along triangles

Page 21: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Discrete Differential Geometry

Divergence

Definition

For each alternative i ∈ V , the divergence

(div w)(i) := −(δT0 w)(i) :=

wi∗

measures the inflow-outflow sum at i .

(δT0 w)(i) = 0 implies alternative i is preference-neutral in all

pairwise comparisons.

divergence-free flow δT0 w = 0 is cyclic

With metric D, conjugate operator gives weighted flow-sum

(δ∗0w)(i) =∑

j

wijDij = (δT0 Dw)(i)

Page 22: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Discrete Differential Geometry

A Fundamental Property: Closed Map Property

’Boundary of boundary is empty’: δk+1 ◦ δk = 0

in particular• gradient flow is curl-free: curl ◦ grad = δ2 ◦ δ1 = 0• circular flow is divergence-free: div ◦ curl∗ = δ∗1 ◦ δ∗2 = 0

This leads to the powerful combinatorial Laplacians.

Page 23: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Discrete Differential Geometry

Combinatorial Laplacians

Define the k-dimensional combinatorial Laplacian,∆k : C k → C k by

∆k = δk−1δ∗k−1 + δ∗kδk , k > 0

k = 0, ∆0 = δT0 δ0 is the well-known graph Laplacian

k = 1,∆1 = curl ◦ curl∗− grad ◦ div

Important Properties:• ∆k positive semi-definite• ker(∆k) = ker(δT

k−1) ∩ ker(δk): divergence-free andcurl-free, called harmonics

Page 24: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Hodge Decomposition Theorem

Hodge Decomposition Theorem

Theorem

The space of pairwise rankings, C 1(χG , R), admits an orthogonaldecomposition into three

C 1(χG , R) = im(δ0)⊕ H1 ⊕ im(δ∗1)

whereH1 = ker(δ1) ∩ ker(δ∗0) = ker(∆1).

Page 25: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Hodge Decomposition Theorem

Hodge Decomposition Illustration

ker δ1

⊕ ⊕

CYCLIC(divergence-free)

LOCALLY CONSISTENT(curl-free)

im δ0

Harmonic

H1

Locally cyclic

im δ∗1

(locally acyclic)

Global

(globally acyclic)

ker δ∗0

Figure: Hodge decomposition for pairwise rankings

Page 26: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Hodge Decomposition Theorem

Harmonic Rankings: Locally but NOT Globally Consistent

B

1 2

1

1

1

1

1

2

C

D

EF

A

Figure: (a) a harmonic ranking; (b) from truncated Netflix network

Page 27: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Rank aggregation as a linear projection

Rank aggregation as a linear projection

Corollary

Every pairwise ranking admits a unique orthogonal decomposition,

w = projim δ0w + projker(δ∗0 ) w

i.e.pairwise = grad(global) + cyclic

Particularly the first projection grad(global) gives a global ranking

x∗ = (δ∗0δ0)†δ∗0w = −(∆0)

† div w

Page 28: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Rank aggregation as a linear projection

When Harmonic Ranking Vanishes

Corollary

If the clique complex χG has trivial 1-homology (no holes ofboundary length ≥ 4), then triangular transitivity (curl w = 0)implies the existence of global ranking im δ0.

On simple-connected domains, curl-free vector fields areintegrable.

With trivial 1-homology χG , local consistency implies theglobal consistency

A particular case is when G is a complete graph, theprojection on transitivity subspace gives a unique globalranking, called Borda Count (1782) in social choice theory

Page 29: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Rank aggregation as a linear projection

Example: Erdos-Renyi Random Graph

Theorem (Kahle ’06)

For an Erdos-Renyi random graph G (n, p) with n vertices and eachedge independently emerging with probability p, its clique complexχG almost always has zero 1-homology, except that

1

n2≪ p ≪ 1

n.

Note that full Netflix movie-movie comparison graph is almostcomplete (0.22% missing edges), is that a Erdos-Renyirandom graph?

Page 30: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Rank aggregation as a linear projection

Which Pairwise Ranking Model is More Consistent?

Curl distribution measures the intrinsic local inconsistency in apairwise ranking:

0 100 200 300 400 500 600 700 800 900 10000

0.5

1

1.5

2

2.5

3x 10

5 Curl Distribution of Pairwise Rankings

Pairwise Score Difference

Pairwise Probability Difference

Probability Log Ratio

Figure: Curl distribution of three pairwise rankings, based on mostpopular 500 movies. The pairwise score difference (the Linear model) inred have the thinnest tail.

Page 31: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory

Rank aggregation as a linear projection

Comparisons of Netflix Global Rankings

Mean Score Score Difference Probability Difference Logarithmic Odd Ratio

Mean Score 1.0000 0.9758 0.9731 0.9746Score Difference 1.0000 0.9976 0.9977

Probability Difference 1.0000 0.9992Logarithmic Odd Ratio 1.0000

Cyclic Residue - 6.03% 7.16% 7.15%

Table: Kendall’s rank correlation coefficients between different globalrankings for Netflix. Note that the pairwise score difference (the Linearmodel) has the smallest relative residue.

Page 32: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Why pairwise works for netflix: a spectral embedding view

Why Pairwise Ranking Works for Netflix?

Pairwise rankings are good approximations of gradient flowson movie-movie networks

In fact, netflix data in the large scale behave like a1-dimensional curve in high dimensional space

To visualize this, we may use a spectral embedding approach

Page 33: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Why pairwise works for netflix: a spectral embedding view

Spectral Embedding

Map every movie to a point in S4 by

movie m → (√

p1(m), . . . ,√

p5(m))

where pk(m) is the probability that movie m is rated as stark ∈ {1, . . . , 5}. Obtain a movie-by-star matrix Y .

Do SVD on Y , which is equivalent to do eigenvaluedecomposition on the linear kernel

K (s, t) = 〈s, t〉d , d = 1, s, t ∈ S4

K (s, t) is non-negative, whence the first eigenvector capturesthe centricity (density) of data and the second captures atangent field of the manifold.

Page 34: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Why pairwise works for netflix: a spectral embedding view

SVD Embedding

Figure: The second singular vector is monotonic to the mean score,indicating the intrinsic parameter of the horseshoe curve is driven by themean score

Page 35: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Conclusions

Conclusions

Ranking as 1-dimensional scaling of data

Pairwise ranking as approximate gradient fields or flows ongraphs

Hodge Theory provides an orthogonal decomposition forpairwise ranking flows

This decomposition helps characterize the local (triangular)vs. global consistency of pairwise rankings, and gives anatural rank aggregation scheme

Geometry and topology play important roles, but everything isjust linear algebra!

Page 36: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Acknowledgements

Gunnar Carlsson

Vin de Silva

Persi Diaconis

Leo Guibas

Fei Han

Susan Holmes

Qixing Huang

Xiaoye Jiang

Ming Ma

Jason Morton

Art Owen

Michael Saunders

Harlan Sexton

Steve Smale

Shmuel Weinberger

Ya Xu