
Automated Model Construction through Compositional Grammars

David Duvenaud

Cambridge University, Computational and Biological Learning Lab

April 22, 2013

OUTLINE

- Motivation
- Automated structure discovery in regression
  - Gaussian process regression
  - Structures expressible through kernel composition
  - A massive missing piece
  - Grammar & search over models
  - Examples of structures discovered
- Automated structure discovery in matrix models
  - Expressing models as matrix decompositions
  - Grammar & special cases
  - Examples of structures discovered on images

CREDIT WHERE CREDIT IS DUE

Talk based on two papers:

- Structure Discovery in Nonparametric Regression through Compositional Kernel Search [ICML 2013]
  David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani
- Exploiting compositionality to explore a large space of model structures [UAI 2012]
  Roger B. Grosse, Ruslan Salakhutdinov, William T. Freeman, Joshua B. Tenenbaum

MOTIVATION

- Models today are built by hand, or chosen from a fixed set.
- The fixed set is sometimes not that rich.
- Just being nonparametric sometimes isn't good enough: to learn efficiently, you need a rich prior that can express most of the structure in your data.
- Building by hand requires expertise and an understanding of the dataset.
- It follows the cycle: propose a model, do inference, check model fit.
- Andrew Gelman asks: how would an AI do statistics? It would need a language for describing arbitrarily complicated models, a way to search over those models, and a way of checking model fit.

FINDING STRUCTURE IN GP REGRESSION

( Lin × SE + SE × ( Per + RQ ) )

[Figure: the model above decomposed into additive components, each plotted over 1960–2010: Lin × SE, SE × Per (detail shown for 1984–1989), SE × RQ, and the residuals.]

GAUSSIAN PROCESS REGRESSION

Assume X, y is generated by y = f(X) + ε_σ, where ε_σ is observation noise. A GP prior over f means that, for any finite set of points X,

    p(f(X)) = N( µ(X), K(X, X) )

where K_ij = k(X_i, X_j), and k is the covariance function or kernel: k(x, x′) = cov[ f(x), f(x′) ].

Typically, the kernel says that nearby points x₁, x₂ will have highly correlated function values f(x₁), f(x₂):

    k_SE(x, x′) = exp( −‖x − x′‖₂² / 2θ )

[Figure: the squared-exp covariance as a function of the distance |x − x′|, decaying from 1 toward 0.]


SAMPLING FROM A GP

function simple_gp_sample
    % Choose a set of x locations.
    N = 100;
    x = linspace(-2, 2, N);

    % Specify the covariance between function
    % values, depending on their location.
    sigma = zeros(N, N);
    for j = 1:N
        for k = 1:N
            sigma(j,k) = covariance(x(j), x(k));
        end
    end
    % Add jitter so sigma is numerically positive definite.
    sigma = sigma + 1e-6*eye(N);

    % Specify that the prior mean of f is zero.
    mu = zeros(1, N);

    % Sample from a multivariate Gaussian.
    f = mvnrnd(mu, sigma);

    plot(x, f);
end

% Squared-exp covariance function.
function k = covariance(x, y)
    k = exp(-0.5*(x - y)^2);
end

[Figure: a smooth random function drawn from this GP prior; the talk shows several independent draws.]


Swapping in a periodic covariance function produces repeating random functions:

% Periodic covariance function.
function c = covariance(x, y)
    c = exp(-0.5*(sin((x - y)*1.5).^2));
end

[Figure: a periodic random function drawn from the GP prior with this kernel.]

CONDITIONAL POSTERIOR

    f(x∗) | X, y ∼ N( k(x∗, X) K⁻¹ y,  k(x∗, x∗) − k(x∗, X) K⁻¹ k(X, x∗) )

With the SE kernel:

[Figure: GP posterior mean and posterior uncertainty as functions of x, first under the prior alone and then conditioned on data; the uncertainty contracts around the observations.]
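As a minimal sketch of computing this posterior (not from the slides; the toy data, unit hyperparameters, and jitter value are illustrative assumptions):

function gp_posterior_demo
    % Toy observations (illustrative).
    X  = [-1.5; -0.5; 0.3; 1.2];
    y  = sin(3*X);
    xs = linspace(-2, 2, 200)';   % test inputs x*

    % Gram matrices; jitter keeps K numerically positive definite.
    K   = se_kernel(X,  X) + 1e-6*eye(numel(X));
    Ks  = se_kernel(xs, X);       % k(x*, X)
    Kss = se_kernel(xs, xs);      % k(x*, x*)

    mu = Ks * (K \ y);            % posterior mean: k(x*,X) K^-1 y
    S  = Kss - Ks * (K \ Ks');    % posterior covariance
    s2 = diag(S);                 % pointwise predictive variance

    plot(xs, mu, 'b', xs, mu + 2*sqrt(s2), 'b--', xs, mu - 2*sqrt(s2), 'b--');
    hold on; plot(X, y, 'ko');
end

% Squared-exp kernel; uses implicit expansion (R2016b+).
function K = se_kernel(a, b)
    K = exp(-0.5*(a - b').^2);
end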



KERNEL CHOICE IS IMPORTANT

- The kernel determines almost all the properties of the prior.
- There are many different kinds, with very different properties:

    Squared-exp (SE):         local variation
    Periodic (PER):           repeating structure
    Linear (LIN):             linear functions
    Rational-quadratic (RQ):  multi-scale variation

[Figure: a sample draw from each of the four base kernels.]

KERNELS CAN BE COMPOSED

- Two main operations: adding and multiplying.

    LIN × LIN:  quadratic functions
    SE × PER:   locally periodic
    LIN + PER:  periodic with trend
    SE + PER:   periodic with noise

[Figure: a sample draw from each composite kernel.]
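A minimal sketch of composing kernels as function handles (parameter values are illustrative, not from the talk):

% Base kernels as anonymous functions of scalar inputs.
se  = @(x, y) exp(-0.5*(x - y).^2);           % local variation
per = @(x, y) exp(-2*sin(pi*(x - y)).^2);     % repeating structure, period 1
lin = @(x, y) x .* y;                         % linear functions

% Sums and products of valid kernels are valid kernels:
se_times_per = @(x, y) se(x, y) .* per(x, y); % SE x PER: locally periodic
lin_plus_per = @(x, y) lin(x, y) + per(x, y); % LIN + PER: periodic with trend

Any of these handles can replace the covariance function in the earlier sampling code.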

- Kernels can also be composed across multiple dimensions:

    LIN × SE:   increasing variation
    LIN × PER:  growing amplitude
    SE1 + SE2:  additive structure, f1(x1) + f2(x2)
    SE1 × SE2:  fully coupled f(x1, x2)

[Figure: 1-D draws from LIN × SE and LIN × PER, and 2-D surfaces drawn from SE1 + SE2 and SE1 × SE2 over [−4, 4]².]

SPECIAL CASES

    Bayesian linear regression:         LIN
    Bayesian polynomial regression:     LIN × LIN × …
    Generalized Fourier decomposition:  PER + PER + …
    Generalized additive models:        ∑_{d=1}^D SE_d
    Automatic relevance determination:  ∏_{d=1}^D SE_d
    Linear trend with deviations:       LIN + SE
    Linearly growing amplitude:         LIN × SE
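For example, the product of one-dimensional SE kernels collapses into a single multi-dimensional SE with one lengthscale per dimension, which is exactly the automatic relevance determination kernel:

    ∏_{d=1}^D exp( −(x_d − x′_d)² / 2θ_d ) = exp( −∑_{d=1}^D (x_d − x′_d)² / 2θ_d )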

APPROPRIATE KERNELS ARE NECESSARY FOR EXTRAPOLATION

- The SE kernel → basic smoothing.
- Richer kernels mean richer structure can be captured.

[Figure: two fits to the same series, panels titled "Local smooth structure" and "True structure"; only the latter extrapolates the pattern beyond the data.]

KERNELS ARE HARD TO CHOOSE

- Given the diversity of priors available, how do you choose one?
- Standard GP software packages include many base kernels and means to combine them, but no default kernel.
- Software can't choose the model for you; you're the expert (?)

KERNELS ARE HARD TO CONSTRUCT

- Carl Rasmussen devotes 4 pages of his book to constructing a custom kernel for CO₂ data.
- Doing so requires specialized knowledge, trial and error, and a dataset small and low-dimensional enough that a human can interpret it.
- In practice, most users can't or won't make a custom kernel, and the SE kernel became the de facto standard through inertia.

RECAP

- GP regression is a powerful tool.
- Kernel choice allows rich structure to be captured; different kernels express very different model classes.
- Composition generates a rich space of models.
- That space is hard and slow to search by hand.
- Can kernel specification be automated?

COMPOSITIONAL STRUCTURE SEARCH

- Define a grammar over kernels:
  - K → K + K
  - K → K × K
  - K → {SE, RQ, LIN, PER}
- Search the space of kernels greedily by applying production rules and checking model fit (approximate marginal likelihood), as in the sketch below.
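A rough sketch of that loop (illustrative only: marginal_likelihood, X, y, and max_depth are hypothetical placeholders, and the real search also optimizes hyperparameters and can expand any subexpression, not just the root):

% Greedy compositional kernel search (sketch).
base = {'SE', 'RQ', 'LIN', 'PER'};
candidates = base;
for depth = 1:max_depth
    % Score each candidate by approximate marginal likelihood on the data.
    scores = cellfun(@(k) marginal_likelihood(k, X, y), candidates);
    [~, i] = max(scores);
    best = candidates{i};
    % Expand the winner with the productions K -> K + B and K -> K x B.
    candidates = {};
    for b = 1:numel(base)
        candidates{end+1} = ['(' best ' + ' base{b} ')'];
        candidates{end+1} = ['(' best ' * ' base{b} ')'];
    end
end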

[Figure: the greedy search tree. From "No structure", the first level proposes the base kernels SE, RQ, LIN, PER; expanding RQ yields SE + RQ, …, PER + RQ, …, PER × RQ; expanding PER + RQ yields SE + PER + RQ, …, SE × (PER + RQ); and so on.]

EXAMPLE SEARCH: MAUNA LOA CO₂

[Figure: models found at successive stages of the search, each fit and extrapolated on 2000–2010 CO₂ data: RQ; ( Per + RQ ); SE × ( Per + RQ ); ( SE + SE × ( Per + RQ ) ).]

EXAMPLE DECOMPOSITION: MAUNA LOA CO₂

( Lin × SE + SE × ( Per + RQ ) )

[Figure: the model decomposed into Lin × SE, SE × Per (detail for 1984–1989), SE × RQ, and residuals, each plotted over 1960–2010.]

COMPOUND KERNELS ARE INTERPRETABLE

Suppose functions f1, f2 are drawn from independent GP priors, f1 ∼ GP(µ1, k1), f2 ∼ GP(µ2, k2). Then it follows that

    f := f1 + f2 ∼ GP( µ1 + µ2, k1 + k2 )

A sum of kernels is equivalent to a sum of functions. Distributivity means we can write compound kernels as sums of products of base kernels:

    SE × (RQ + LIN) = SE × RQ + SE × LIN

This is what makes the per-component posteriors sketched below possible.
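A minimal sketch (assumed names: k1 and k2 are kernel functions returning Gram matrices, sn2 is the noise variance, xs the test inputs) showing that the posterior mean of each additive component can be read off separately:

% Posterior mean of each component under the sum kernel k = k1 + k2.
K = k1(X, X) + k2(X, X) + sn2*eye(numel(X));   % covariance of the observations
f1_mean = k1(xs, X) * (K \ y);   % part of the signal attributed to f1
f2_mean = k2(xs, X) * (K \ y);   % part attributed to f2

Plots like the Mauna Loa decomposition above are exactly these per-component posteriors.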

EXAMPLE DECOMPOSITION: AIRLINE

SE × ( Lin + Lin × ( Per + RQ ) )

[Figure: the airline-passenger model decomposed into SE × Lin, SE × Lin × Per, SE × Lin × RQ, and residuals, each plotted over 1950–1962.]


EXAMPLE: SUNSPOTS

CHANGEPOINT KERNEL

Can express a change in covariance:

[Figure: draws from changepoint kernels: periodic changing to SE, and SE changing to linear.]
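One standard construction (a paraphrase of the changepoint kernels used in this line of work, not copied from the slides) blends two kernels with a sigmoid σ(x) that switches from 0 to 1 at the changepoint:

    k_CP(x, x′) = σ(x) k1(x, x′) σ(x′) + (1 − σ(x)) k2(x, x′) (1 − σ(x′))

Each term is a valid kernel because multiplying a kernel by g(x)g(x′) preserves positive definiteness.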

SUMMARY

- Choosing the form of the kernel is currently done by hand.
- Compositions of kernels lead to more interesting priors on functions than are typically considered.
- A simple grammar specifies all such compositions, and can be searched over automatically.
- Composite kernels lead to interpretable decompositions.

GRAMMARS FOR MATRIX DECOMPOSITIONS

Previous work introduced the idea of a grammar of compositions:

- Exploiting compositionality to explore a large space of model structures [UAI 2012]
  Roger B. Grosse, Ruslan Salakhutdinov, William T. Freeman, Joshua B. Tenenbaum
- The slides that follow are from Roger Grosse.


MATRIX DECOMPOSITION

RECURSIVE MATRIX DECOMPOSITION

- Main idea: matrices can be recursively decomposed.
- Example: co-clustering by clustering the cluster assignments.


BUILDING BLOCKS


MATRIX DECOMPOSITION: GRAMMAR
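Roughly, following the UAI 2012 paper rather than the slide itself: the grammar starts from a Gaussian matrix G and repeatedly expands components with productions such as G → GG + G (low-rank structure), G → MG + G (clustering rows, with M a multinomial indicator matrix), G → BG + G (binary latent features), G → CG + G (temporal/chain structure), and G → exp(G) ∘ G (sparsity).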


MATRIX DECOMPOSITION: SPECIAL CASES


EVOLUTION OF IMAGE MODELS


APPLICATION TO NATURAL IMAGE PATCHES

CONCLUSIONS

- Model-building is currently done mostly by hand.
- Grammars over composite structures are a simple way to specify open-ended model classes.
- Composite structures often imply interpretable decompositions of the data.
- Searching over these model classes is a step towards automating statistical analysis.

Thanks!



RELATED WORK


EXTRAPOLATION


MULTI-D INTERPOLATION


GRAMMARS FOR MATRIX DECOMPOSITIONS
