New Models for Relational Classification Ricardo Silva (Statslab) Joint work with Wei Chu and Zoubin Ghahramani

Upload: jody-campbell

Post on 29-Jan-2016

New Models for Relational Classification

Ricardo Silva (Statslab)

Joint work with Wei Chu and Zoubin Ghahramani

The talk

Classification with non-iid data
A source of non-iidness: relational information
A new family of models, and what is new
Applications to classification of text documents

The prediction problem

[Figure: graphical model with nodes X and Y]

Standard setup

[Figure: graphical model with nodes X and Y (N cases), plus Xnew and Ynew]

Prediction with non-iid data

[Figure: graphical model with nodes X1, Y1, X2, Y2, Xnew and Ynew]

Where does the non-iid information come from?

Relations
  Links between data points
    Webpage A links to Webpage B
    Movie A and Movie B are often rented together
Relations as data
  “Linked webpages are likely to present similar content”
  “Movies that are rented together often have correlated personal ratings”

The vanilla relational domain: time-series

Relations: “$Y_i$ precedes $Y_{i+k}$”, k > 0
Dependencies: “Markov structure G”

[Figure: chain of nodes Y1, Y2, Y3, …]

A model for integrating link data

How to model the dependencies among class labels?

Movies that are often rented together might share all sorts of common, unmeasured factors

These hidden common causes affect the ratings

Example

[Figure: nodes MovieFeatures(M1), Rating(M1), MovieFeatures(M2), Rating(M2)]

Same genre?
Both released in same year?
Same director?
Target same age groups?

Integrating link data

Of course, many of these common causes will be measured
Many will not
Idea:
  Postulate a hidden common cause structure, based on relations
  Define a model Markov to this structure
  Design an adequate inference algorithm

Example: Political Books database

A network of books about recent US politics sold by the online bookseller Amazon.com
  Valdis Krebs, http://www.orgnet.com/
Relations: frequent co-purchasing of books by the same buyers
Political inclination factors as the hidden common causes

Political Books relations

Political Books database

Features:
  I collected the Amazon.com front page for each of the books
  Bag-of-words, tf-idf features, normalized to unity
Task:
  Binary classification: “liberal” or “not-liberal” books
  43 liberal books out of 105
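
As a rough illustration of the feature pipeline, the sketch below builds bag-of-words tf-idf vectors and rescales each one to unit Euclidean length. The whitespace tokenizer, the idf formula log(N / df), the reading of "normalized to unity" as unit L2 norm, the function name `tfidf_unit_vectors`, and the three toy strings are all assumptions made for illustration; the talk does not specify them.

```python
import math
from collections import Counter


def tfidf_unit_vectors(docs):
    """Return (vocabulary, unit-norm tf-idf vectors) for a list of strings.

    Assumed variant: tf = raw count, idf = log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = [counts[w] * math.log(n_docs / doc_freq[w]) for w in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        # Leave all-zero vectors untouched to avoid dividing by zero.
        vectors.append([v / norm for v in vec] if norm > 0 else vec)
    return vocab, vectors


# Toy "front pages" (hypothetical text, not the real data).
docs = [
    "tax cuts and liberty",
    "health care and unions",
    "liberty and tax reform",
]
vocab, vectors = tfidf_unit_vectors(docs)
```

A word that occurs in every document (here "and") gets idf 0 and so contributes nothing to the features.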

Contribution

We will
  show a classical multiple linear regression model
  build a relational variation
  generalize with a more complex set of independence constraints
  generalize it using Gaussian processes

Seemingly unrelated regression (Zellner,1962)

Y = (Y1, Y2), X = (X1, X2)

Suppose you regress Y1 ~ X1, X2 and X2 turns out to be useless
  Analogously for Y2 ~ X1, X2 (X1 vanishes)
Suppose you regress Y1 ~ X1, X2, Y2
  And now every variable is a relevant predictor

[Figures: graph over X1, X2, Y1; graph over X1, X2, Y1, Y2]
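
A small simulation can make the seemingly-unrelated-regression effect concrete. This is a minimal sketch under assumed values: Gaussian inputs, coefficients 1.0 and -1.0, and an error correlation of 0.8 are illustrative choices, not numbers from the talk. Each output depends only on its own input, yet the errors are correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Illustrative inputs and correlated errors (correlation 0.8 is an assumption).
X = rng.normal(size=(n, 2))
err_cov = np.array([[1.0, 0.8],
                    [0.8, 1.0]])
E = rng.multivariate_normal(np.zeros(2), err_cov, size=n)

# Each output depends only on "its own" input.
Y1 = 1.0 * X[:, 0] + E[:, 0]
Y2 = -1.0 * X[:, 1] + E[:, 1]


def ols(design, target):
    """Least-squares coefficients for target ~ design (no intercept)."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Y1 ~ X1, X2: the coefficient of X2 is close to zero.
coef_a = ols(X, Y1)

# Y1 ~ X1, X2, Y2: X2 and Y2 both receive clearly non-zero coefficients.
coef_b = ols(np.column_stack([X, Y2]), Y1)

print("Y1 ~ X1, X2     :", np.round(coef_a, 2))
print("Y1 ~ X1, X2, Y2 :", np.round(coef_b, 2))
```

Under these assumptions, conditioning on Y2 reveals part of the shared error, which in turn makes X2 informative about Y1.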

Graphically, with latents

[Figure: X: Capital(GE), Capital(Westinghouse); Y: Stock price(GE), Stock price(Westinghouse); latents: Industry factor 1, Industry factor 2, ..., Industry factor k?]

The Directed Mixed Graph (DMG)

[Figure: X: Capital(GE), Capital(Westinghouse); Y: Stock price(GE), Stock price(Westinghouse)]

Richardson (2003), Richardson and Spirtes (2002)

A new family of relational models

Inspired by SUR
Structure: DMG graphs
Edges postulated from given relations

[Figure: DMG over inputs X1, ..., X5 and outputs Y1, ..., Y5]

Model for binary classification

Nonparametric probit regression
  Zero-mean Gaussian process prior over f(.)

$P(y_i = 1 \mid x_i) = P(y^*(x_i) > 0)$

$y^*(x_i) = f(x_i) + \epsilon_i, \quad \epsilon_i \sim N(0, 1)$
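
The generative story of this probit model can be sketched as follows: draw f from a zero-mean GP, add unit-variance Gaussian noise to get the latent y*, and threshold at zero. The squared-exponential kernel, the jitter term, and the random inputs are assumptions for illustration; the talk does not fix a kernel at this point.

```python
import numpy as np


def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential kernel matrix (an illustrative kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale ** 2)


def sample_probit_labels(X, rng):
    """Draw f ~ GP(0, K), add N(0, 1) noise, and threshold at zero."""
    n = X.shape[0]
    K = rbf_kernel(X) + 1e-8 * np.eye(n)   # jitter for numerical stability
    f = rng.multivariate_normal(np.zeros(n), K)
    y_star = f + rng.normal(size=n)        # y*(x_i) = f(x_i) + eps_i
    return f, (y_star > 0).astype(int)     # y_i = 1 iff y*(x_i) > 0


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))               # hypothetical inputs
f, y = sample_probit_labels(X, rng)
```

Here the noise terms are drawn independently, which is exactly the assumption the relational model goes on to relax.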

Relational dependency model

Make $\{\epsilon_i\}$ dependent multivariate Gaussian

For convenience, decouple it into two error terms

$\epsilon = \epsilon^* + \zeta$

Dependency model: the decomposition

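$\epsilon = \epsilon^* + \zeta$
  $\epsilon^*$ and $\zeta$: independent from each other
  $\epsilon^*$: marginally independent
  $\zeta$: dependent according to relations

$\Sigma = \Sigma_{\epsilon^*} + \Sigma_\zeta$
  $\Sigma_{\epsilon^*}$: diagonal
  $\Sigma_\zeta$: not diagonal, with 0s only on unrelated pairs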

Dependency model: the decomposition

$y^*(x_i) = f(x_i) + \epsilon_i = f(x_i) + \zeta_i + \epsilon^*_i = g(x_i) + \epsilon^*_i$

If K was the original kernel matrix for f(.), the covariance of g(.) is simply

$\Sigma_{g(\cdot)} = K + \Sigma_\zeta$
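
The covariance identity for g(.) can be checked numerically. This sketch writes the relational error term as zeta (a symbol chosen here) and assumes a linear kernel on three made-up inputs and a made-up relational covariance `Sigma_zeta` with a zero for the unrelated pair (1, 3). It draws f and zeta independently and compares the empirical covariance of g = f + zeta with K + Sigma_zeta.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs; linear kernel K = X X^T.
X = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
K = X @ X.T

# Hypothetical relational covariance: points 1 and 3 are unrelated (zero entry).
Sigma_zeta = np.array([[1.0, 0.5, 0.0],
                       [0.5, 1.0, 0.5],
                       [0.0, 0.5, 1.0]])

n_draws = 200_000
# f = X w with w ~ N(0, I) has covariance X X^T = K.
F = rng.normal(size=(n_draws, 2)) @ X.T
# zeta ~ N(0, Sigma_zeta) via a Cholesky factor.
L = np.linalg.cholesky(Sigma_zeta)
Z = rng.normal(size=(n_draws, 3)) @ L.T

G = F + Z                                  # g(x_i) = f(x_i) + zeta_i
empirical = np.cov(G, rowvar=False)

print(np.round(empirical, 2))
print(K + Sigma_zeta)
```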

Approximation

Posterior for f(.), g(.) is a truncated Gaussian, hard to integrate
Approximate posterior with a Gaussian
  Expectation-Propagation (Minka, 2001)
The reason for $\epsilon^*$ becomes apparent in the EP approximation

Approximation

Likelihood does not factorize over f(.), but factorizes over g(.)

$p(g \mid x, y) \propto p(g \mid x) \prod_i p(y_i \mid g(x_i))$

Approximate each factor $p(y_i \mid g(x_i))$ with a Gaussian
  If $\epsilon^*$ were 0, $y_i$ would be a deterministic function of $g(x_i)$
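
To see why the independent noise term matters, one can write out a single likelihood factor. The sketch below assumes the independent noise has standard deviation `sigma_star` (a name and parameterization introduced here for illustration), so that the factor is a probit function of g. As `sigma_star` shrinks, the factor approaches a hard step, which is the deterministic case mentioned above.

```python
import math


def probit_factor(y, g, sigma_star):
    """p(y | g) for y in {0, 1}, assuming y* = g + eps*, eps* ~ N(0, sigma_star^2)."""
    # Standard normal CDF evaluated at g / sigma_star.
    p_one = 0.5 * (1.0 + math.erf(g / (sigma_star * math.sqrt(2.0))))
    return p_one if y == 1 else 1.0 - p_one


# Smooth factor with unit noise, nearly a step function with tiny noise.
for sigma_star in (1.0, 0.1, 1e-3):
    print(sigma_star, probit_factor(1, 0.5, sigma_star))
```

A smooth factor of this kind is the sort of term that EP replaces with a Gaussian; this snippet only evaluates the factor and does not implement EP.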

Generalizations

This can be generalized for any number of relations

[Figure: relation graph over Y1, ..., Y5]

$\epsilon = \epsilon^* + \zeta_1 + \zeta_2 + \zeta_3$

But how to parameterize $\Sigma_\zeta$?

Non-trivial
Desiderata:
  Positive definite
  Zeroes in the right places
  Few parameters, but broad family
  Easy to compute

But how to parameterize $\Sigma_\zeta$?

“Poking zeroes” on a positive definite matrix doesn’t work

Positive definite:

      Y1   Y2   Y3
Y1    1    0.8  0.8
Y2    0.8  1    0.8
Y3    0.8  0.8  1

Not positive definite:

      Y1   Y2   Y3
Y1    1    0.8  0
Y2    0.8  1    0.8
Y3    0    0.8  1
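
The two matrices on this slide can be checked directly by looking at their eigenvalues. This is a minimal sketch; the helper name `is_positive_definite` is introduced here for illustration.

```python
import numpy as np

full = np.array([[1.0, 0.8, 0.8],
                 [0.8, 1.0, 0.8],
                 [0.8, 0.8, 1.0]])

# Same matrix with the (Y1, Y3) entries "poked" to zero.
poked = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.8],
                  [0.0, 0.8, 1.0]])


def is_positive_definite(M):
    """A symmetric matrix is positive definite iff all eigenvalues are > 0."""
    return bool(np.all(np.linalg.eigvalsh(M) > 0))


print(is_positive_definite(full))    # True
print(is_positive_definite(poked))   # False
print(np.linalg.eigvalsh(poked))     # smallest eigenvalue is 1 - 0.8 * sqrt(2)
```

The poked matrix has eigenvalues 1 and 1 ± 0.8·√2, so the smallest is about -0.13, which rules it out as a covariance matrix.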

Approach #1

Assume we can find all cliques for the bi-directed subgraph of relations
Create a “factor analysis model”, where
  for each clique C_i there is a latent variable L_i
  members of each clique are the only children of L_i
  the set of latents {L} is a set of N(0, 1) variables
  coefficients in the model are equal to 1

Approach #1

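$Y_1 = L_1 + \epsilon_1$

$Y_2 = L_1 + L_2 + \epsilon_2$

[Figure: relation graph over Y1, Y2, Y3, Y4 and the corresponding latent model with L1 and L2 as parents of the Ys]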

Approach #1

In practice, we set the variance of each $\epsilon$ to a small constant ($10^{-4}$)

Covariance between any two Ys is
  proportional to the number of cliques they belong to together
  inversely proportional to the number of cliques they belong to individually

Approach #1

Let U be the correlation matrix obtained from the proposed procedure

To define the error covariance, use a single hyperparameter $\rho \in [0, 1]$

$\Sigma_\zeta = (I - \rho\, U_{diag}) + \rho\, U$
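
The clique-based construction can be sketched in a few lines. This is an illustration under stated assumptions: the clique list is hypothetical (chosen so that Y1 has parent L1 only and Y2 has parents L1 and L2), the hyperparameter in [0, 1] is written `rho`, the diagonal part of U is taken to be its literal diagonal, and the function names are introduced here.

```python
import numpy as np


def clique_correlation(n, cliques, noise_var=1e-4):
    """Correlation matrix U implied by one N(0, 1) latent per clique.

    Each Y_i is the sum of the latents of the cliques it belongs to
    (all coefficients equal to 1) plus independent noise of small variance.
    """
    M = np.zeros((n, len(cliques)))        # membership matrix: node x clique
    for c, members in enumerate(cliques):
        M[list(members), c] = 1.0
    cov = M @ M.T + noise_var * np.eye(n)  # entry (i, j) counts shared cliques
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)


def error_covariance(U, rho):
    """(I - rho * U_diag) + rho * U, with U_diag the diagonal part of U."""
    U_diag = np.diag(np.diag(U))
    return (np.eye(U.shape[0]) - rho * U_diag) + rho * U


# Hypothetical cliques over 4 nodes (0-based indices): {Y1, Y2, Y3} and {Y2, Y4}.
cliques = [(0, 1, 2), (1, 3)]
U = clique_correlation(4, cliques)
Sigma_zeta = error_covariance(U, rho=0.5)

print(np.round(U, 3))
print(np.round(Sigma_zeta, 3))
```

In this toy graph Y1 and Y4 share no clique, so their entry stays exactly zero, while the result remains positive definite because it is a convex combination of the identity and a correlation matrix.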

Approach #1

Notice: if everybody is connected, model is exchangeable and simple

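[Figure: fully connected relation graph over Y1, ..., Y4 and the corresponding model with a single latent L1]

$\Sigma_\zeta = \begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix}$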

Approach #1

Finding all cliques is “impossible”, what to do?
  Triangulate and then extract cliques
  Can be done in polynomial time
This is a relaxation of the problem, since constraints are thrown away
Can have bad side effects: the “Blow-Up” effect

Political Books dataset

Political Books dataset: the “Blow-up” effect

Approach #2

Don’t look for cliques: create a latent for each pair of variables

Very fast to compute, zeroes respected

[Figure: relation graph over Y1, Y2, Y3, Y4 and the corresponding model with one latent per related pair, e.g. L13]

Approach #2

Correlations, however, are given by

$\mathrm{Corr}(i, j) \propto \frac{1}{\sqrt{\#\mathrm{neigh}(i) \cdot \#\mathrm{neigh}(j)}}$

Penalizes nodes with many neighbors, even if Yi and Yj have many neighbors in common

We call this the “pulverization” effect
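
The pairwise construction and its correlation formula can be illustrated with a short sketch. It assumes one N(0, 1) latent per related pair, unit coefficients, and the same small noise variance (1e-4) as in Approach #1; the function name and both toy graphs are made up for illustration.

```python
import math


def pairwise_correlation(n, edges, noise_var=1e-4):
    """Correlations implied by one N(0, 1) latent per related pair.

    Var(Y_i) = #neigh(i) + noise_var, and Cov(Y_i, Y_j) = 1 for related pairs,
    so Corr(i, j) is roughly 1 / sqrt(#neigh(i) * #neigh(j)).
    """
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    var = [d + noise_var for d in degree]

    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        corr[i][i] = 1.0
    for i, j in edges:
        corr[i][j] = corr[j][i] = 1.0 / math.sqrt(var[i] * var[j])
    return corr


# A single related pair: correlation close to 1.
corr_pair = pairwise_correlation(2, [(0, 1)])

# Four mutually related nodes: every pair shares all its other neighbors,
# yet each correlation drops to about 1/3.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
corr_k4 = pairwise_correlation(4, k4_edges)

print(round(corr_pair[0][1], 3), round(corr_k4[0][1], 3))
```

The drop from roughly 1 to roughly 1/3 in the fully connected case is one way to picture the effect described above.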

Political Books dataset

Political Books dataset: the “pulverization” effect

WebKB dataset: links of pages in University of Washington

Approach #1

Approach #2

Comparison: undirected models

Generative stories
  Conditional random fields (Lafferty, McCallum, Pereira, 2001)
  Wei et al., 2006 / Richardson and Spirtes, 2002

[Figure: model over inputs X1, X2, X3 and labels Y1, Y2, Y3]

Chu Wei’s model

[Figure: inputs X1, X2, X3; latent Y1*, Y2*, Y3*; labels Y1, Y2, Y3; relation indicators R12 = 1, R23 = 1]

Dependency family equivalent to a pairwise Markov random field

[Figure: graph over Y1, Y2, Y3]

Properties of undirected models

MRFs propagate information among “test” points

[Figure: relation graph over Y1, ..., Y12]

Properties of DMG models

DMGs propagate information among “training” points

[Figure: relation graph over Y1, ..., Y12]

Properties of DMG models

In a DMG, each “test” point will have in the Markov blanket a whole “training component”

[Figure: relation graph over Y1, ..., Y12]

Properties of DMG models

It seems acceptable that a typical relational domain will not have an “extrapolation” pattern
  Like typical “structured output” problems, e.g., NLP domains
Ultimately, the choice of model concerns the question:
  “Hidden common causes” or “relational indicators”?

Experiment #1

A subset of the CORA database
  4,285 machine learning papers, 7 classes
  Links: citations between papers
  “Hidden common cause” interpretation: particular ML subtopic being treated
Experiment: 7 binary classification problems, Class 5 vs. others
Criterion: AUC
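
For reference, the AUC criterion can be computed as the fraction of (positive, negative) pairs that the classifier ranks correctly, counting ties as one half. The sketch below is a generic pairwise implementation with made-up scores; it is not the evaluation code used in the experiments.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 (positive, negative) pairs are ordered correctly.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```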

Experiment #1

Comparisons:
  Regular GP
  Regular GP + citation adjacency matrix
  Chu Wei’s Relational GP (RGP)
  Our method, miXed graph GP (XGP)
Fairly easy task
Analysis of low-sample tasks
  Uses 1% of the data (roughly 10 data points for training)
  Not that useful for XGP, but more useful for RGP

Experiment #1

Chu Wei’s method gets up to 0.99 in several of those…

Experiment #2

Political Books database
  105 datapoints, 100 runs using 50% for training
Comparison with standard Gaussian processes
  Linear kernels
Results
  0.92 for regular GP
  0.98 for XGP (using pairwise kernel generator)
  Hyperparameters optimized by grid search
  Difference: 0.06 with std 0.02
  Chu Wei’s method does the same…

Experiment #3

WebKB
  Collections of webpages from 4 different universities
Task: “outlier classification”
  Identify which pages are not student, course, project or faculty pages
  10% for training data (still not that hard)
  However, an order of magnitude more data than in Cora

Experiment #3

As far as I know, XGP easily gets the best results on this task

Future work

Tons of possibilities on how to parameterize the output covariance matrix
  Incorporating relation attributes too
  Heteroscedastic relational noise
  Mixtures of relations
New approximation algorithms
Clustering problems
On-line learning

Thank You