
Page 1: Revealing inductive biases with Bayesian models Tom Griffiths UC Berkeley with Mike Kalish, Brian Christian, and Steve Lewandowsky

Revealing inductive biases with Bayesian models

Tom Griffiths, UC Berkeley

with Mike Kalish, Brian Christian, and Steve Lewandowsky

Page 2

Inductive problems

blicket toma

dax wug

blicket wug

S → X Y

X → {blicket, dax}

Y → {toma, wug}

Learning languages from utterances

Learning functions from (x,y) pairs

Learning categories from instances of their members

Page 3

Generalization requires induction

Generalization: predicting the properties of an entity from observed properties of others


Page 4

What makes a good inductive learner?

• Hypothesis 1: more representational power
  – more hypotheses, more complexity
  – the spirit of many accounts of learning and development

Page 5

Some hypothesis spaces

Linear functions: g(x) = p1 x + p0

Quadratic functions: g(x) = p2 x^2 + p1 x + p0

8th-degree polynomials: g(x) = sum_{j=0}^{8} p_j x^j
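As a quick illustration (not part of the talk; the true function and noise level here are made up), each of these hypothesis spaces can be fit by minimizing squared error with numpy's least-squares polynomial fit. A larger space always achieves a lower training error:

```python
# Hypothetical demo: fit linear, quadratic, and 8th-degree hypotheses
# to noisy data by least squares (np.polyfit minimizes squared error).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = 2 * x**2 - x + 0.5 + rng.normal(scale=0.1, size=x.size)  # assumed quadratic truth

for degree in (1, 2, 8):
    coeffs = np.polyfit(x, y, degree)                # minimizes sum of squared errors
    sse = np.sum((np.polyval(coeffs, x) - y) ** 2)   # training error on D
    print(f"degree {degree}: training SSE = {sse:.4f}")
```

Training error alone always favors the largest space; the bias-variance analysis in the following slides shows why that is misleading.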

Page 6

Minimizing squared error



Page 10

Measuring prediction error


Page 11

What makes a good inductive learner?

• Hypothesis 1: more representational power
  – more hypotheses, more complexity
  – the spirit of many accounts of learning and development

• Hypothesis 2: good inductive biases
  – constraints on hypotheses that match the environment

Page 12

Outline

The bias-variance tradeoff

Bayesian inference and inductive biases

Revealing inductive biases

Conclusions


Page 14

A simple schema for induction

• Data D are n pairs (x,y) generated from function f
• Hypothesis space of functions, y = g(x)
• Error is E = (y − g(x))²
• Pick the function g that minimizes error on D
• Measure prediction error, averaging over x and y


Page 15

Bias and variance

• A good learner makes (f(x) − g(x))² small

• g is chosen on the basis of the data D

• Evaluate learners by the average of (f(x) − g(x))² over datasets D generated from f:

E_{p(D)}[(f(x) − g(x))²] = (f(x) − E_{p(D)}[g(x)])² + E_{p(D)}[(g(x) − E_{p(D)}[g(x)])²]

The first term is the (squared) bias, the second the variance.

(Geman, Bienenstock, & Doursat, 1992)

Page 16

Making things more intuitive…

• The next few slides were generated by:
  – choosing a true function f(x)
  – generating a number of datasets D from p(x,y), with uniform p(x) and p(y|x) = f(x) plus noise
  – finding the function g(x) in the hypothesis space that minimized the error on D

• Comparing the average of g(x) to f(x) reveals the bias

• The spread of g(x) around the average is the variance
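A minimal sketch of this procedure (the quadratic f(x), noise level, and dataset counts are assumed, chosen to mirror the slides): estimate the squared bias and the variance of each hypothesis space across many small datasets.

```python
# Sketch: bias^2 and variance of polynomial fits of each degree,
# averaged over many datasets D generated from an assumed quadratic f.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: 2 * x**2 - x + 0.5      # assumed true function
x_test = np.linspace(0, 1, 50)
n, n_datasets = 10, 200               # small-n regime, as on the slides

for degree in (1, 2, 8):
    preds = []
    for _ in range(n_datasets):
        x = rng.uniform(0, 1, n)                       # uniform p(x)
        y = f(x) + rng.normal(scale=0.1, size=n)       # f(x) plus noise
        preds.append(np.polyval(np.polyfit(x, y, degree), x_test))
    preds = np.asarray(preds)
    bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)  # squared bias
    variance = np.mean(preds.var(axis=0))                   # variance
    print(f"degree {degree}: bias^2 = {bias2:.3g}, variance = {variance:.3g}")
```

With n = 10, the linear fits show substantial bias, the quadratic fits show little of either, and the 8th-degree fits show enormous variance, matching the pattern on the next slides.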

Page 17

Linear functions (n = 10)


Page 18

Linear functions (n = 10)

[figure: pink shows g(x) for each dataset, red the average g(x), black f(x); brackets mark the bias (gap between red and black) and the variance (spread of pink around red)]

Page 19

Quadratic functions (n = 10)

[figure: pink shows g(x) for each dataset, red the average g(x), black f(x)]

Page 20

8th-degree polynomials (n = 10)

[figure: pink shows g(x) for each dataset, red the average g(x), black f(x)]

Page 21

Bias and variance

(for our (quadratic) f(x), with n = 10)

Linear functions: high bias, medium variance

Quadratic functions: low bias, low variance

8th-degree polynomials: low bias, super-high variance

Page 22

In general…

• Larger hypothesis spaces result in higher variance, but low bias across a range of f(x)

• The bias-variance tradeoff: if we want a learner that has low bias on a range of problems, we pay a price in variance

• This is mainly an issue when n is small, the regime of much of human learning

Page 23

Quadratic functions (n = 100)

[figure: pink shows g(x) for each dataset, red the average g(x), black f(x)]

Page 24

8th-degree polynomials (n = 100)

[figure: pink shows g(x) for each dataset, red the average g(x), black f(x)]

Page 25

The moral

• General-purpose learning mechanisms do not work well with small amounts of data: more representational power isn't always better

• To make good predictions from small amounts of data, you need a bias that matches the problem; these biases are the key to successful induction, and characterize the nature of an inductive learner

• So… how can we identify human inductive biases?

Page 26

Outline

The bias-variance tradeoff

Bayesian inference and inductive biases

Revealing inductive biases

Conclusions

Page 27

Bayesian inference


Reverend Thomas Bayes

• Rational procedure for updating beliefs

• Foundation of many learning algorithms

• Lets us make the inductive biases of learners precise

Page 28

Bayes’ theorem

P(h | d) = P(d | h) P(h) / Σ_{h′ ∈ H} P(d | h′) P(h′)

where h is a hypothesis and d is data. The left-hand side is the posterior probability, P(d | h) is the likelihood, P(h) is the prior probability, and the denominator sums over the space of hypotheses.
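A toy numerical example of the theorem (the prior and likelihood values are made up purely for illustration):

```python
# Bayes' rule over a two-hypothesis space: multiply prior by likelihood,
# then normalize by the evidence (the sum in the denominator).
prior = {"h1": 0.7, "h2": 0.3}
likelihood = {"h1": 0.2, "h2": 0.9}   # P(d | h) for one observed datum d

evidence = sum(likelihood[h] * prior[h] for h in prior)            # denominator
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}

print(posterior)  # h2 overtakes h1 despite its lower prior
```

Here the datum is much more probable under h2, so the posterior favors h2 (about 0.66) even though the prior favored h1.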

Page 29

Priors and biases

• Priors indicate the kind of world a learner expects to encounter, guiding their conclusions

• In our function learning example…
  – the likelihood gives probability to data that decreases with the sum of squared errors (i.e., a Gaussian)
  – priors are uniform over all functions in hypothesis spaces of different kinds of polynomials
  – having more functions corresponds to a belief in a more complex world…

Page 30

Outline

The bias-variance tradeoff

Bayesian inference and inductive biases

Revealing inductive biases

Conclusions

Page 31

Two ways of using Bayesian models

• Specify models that make different assumptions about priors, and compare their fit to human data

(Anderson & Schooler, 1991; Oaksford & Chater, 1994; Griffiths & Tenenbaum, 2006)

• Design experiments explicitly intended to reveal the priors of Bayesian learners


Page 32

Iterated learning (Kirby, 2001)

What are the consequences of learners learning from other learners?

Page 33

Objects of iterated learning

• Knowledge communicated across generations through provision of data by learners

• Examples:– religious concepts– social norms– myths and legends– causal theories– language

Page 34

Analyzing iterated learning


PL(h|d): probability of inferring hypothesis h from data d

PP(d|h): probability of generating data d from hypothesis h

(each learner applies PL(h|d) to the data it sees, then PP(d|h) to generate data for the next learner, in alternation)

Page 35

Markov chains

• Variables x(t+1) are independent of history given x(t)

• Transition matrix T = P(x(t+1) | x(t))

• Converges to a stationary distribution under easily checked conditions (i.e., if it is ergodic)
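A sketch with a made-up two-state transition matrix: repeatedly applying T drives any initial distribution to the stationary one.

```python
# Toy Markov chain: iterate p <- p @ T until p stops changing; the
# limit satisfies p = p @ T (the stationary distribution).
import numpy as np

T = np.array([[0.9, 0.1],     # row i gives P(x(t+1) | x(t) = i)
              [0.4, 0.6]])

p = np.array([1.0, 0.0])      # start entirely in state 0
for _ in range(100):
    p = p @ T                 # one step of the chain

print(p)                      # converges to the stationary distribution [0.8, 0.2]
```

For this T the stationary distribution is [0.8, 0.2], reached regardless of where the chain starts; ergodicity is what guarantees such a unique limit.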

Page 36

Analyzing iterated learning

Composing PL(h|d) and PP(d|h) in alternation (d0 → h1 → d1 → h2 → d2 → h3 → …) gives two Markov chains:

• A Markov chain on hypotheses: h1 → h2 → h3 → …, with transitions given by composing PP(d|h) and PL(h|d)

• A Markov chain on data: d0 → d1 → d2 → …, with transitions given by composing PL(h|d) and PP(d|h)

Page 37

Iterated Bayesian learning

Assume learners sample from their posterior distribution:

PL(h | d) = PP(d | h) P(h) / Σ_{h′ ∈ H} PP(d | h′) P(h′)

Page 38

Stationary distributions

• Markov chain on h converges to the prior, P(h)

• Markov chain on d converges to the “prior predictive distribution”

P(d) = Σ_h P(d | h) P(h)

(Griffiths & Kalish, 2005)
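A minimal simulation of this result, in a toy two-hypothesis, two-datum world with made-up probabilities: a chain of Bayesian learners who sample from their posteriors visits hypotheses with frequencies that approach the prior.

```python
# Iterated Bayesian learning: teacher samples d from P(d|h), the next
# learner samples h from the posterior. The chain on h converges to P(h).
import numpy as np

rng = np.random.default_rng(2)
prior = np.array([0.8, 0.2])                 # P(h), assumed
like = np.array([[0.6, 0.4],                 # P(d | h = 0), assumed
                 [0.3, 0.7]])                # P(d | h = 1), assumed

h = 1                                        # start at the low-prior hypothesis
counts = np.zeros(2)
for _ in range(20000):
    d = rng.choice(2, p=like[h])             # generate data from current hypothesis
    post = like[:, d] * prior                # posterior ∝ P(d|h) P(h)
    post /= post.sum()
    h = rng.choice(2, p=post)                # next learner samples a hypothesis
    counts[h] += 1

print(counts / counts.sum())                 # ≈ [0.8, 0.2], the prior
```

Even though the chain starts at the low-prior hypothesis, the long-run frequencies match P(h), not the starting point or the likelihood.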

Page 39

Explaining convergence to the prior


• Intuitively: data acts once, prior many times

• Formally: iterated learning with Bayesian agents is a Gibbs sampler on P(d,h)

(Griffiths & Kalish, in press)

Page 40

Revealing inductive biases

• If iterated learning converges to the prior, it might provide a tool for determining the inductive biases of human learners

• We can test this by reproducing iterated learning in the lab, with stimuli for which human biases are well understood

Page 41

Iterated function learning

• Each learner sees a set of (x,y) pairs

• Makes predictions of y for new x values

• Predictions are data for the next learner

data hypotheses

(Kalish, Griffiths, & Lewandowsky, in press)

Page 42

Function learning experiments

[task display: stimulus, response slider, feedback]

Examine iterated learning with different initial data

Page 43

[figure: initial data and learner responses across iterations 1–9]

Page 44

Identifying inductive biases

• Formal analysis suggests that iterated learning provides a way to determine inductive biases

• Experiments with human learners support this idea: when stimuli for which biases are well understood are used, those biases are revealed by iterated learning

• What do inductive biases look like in other cases?
  – continuous categories
  – causal structure
  – word learning
  – language learning

Page 45

Outline

The bias-variance tradeoff

Bayesian inference and inductive biases

Revealing inductive biases

Conclusions

Page 46

Conclusions

• Solving inductive problems and forming good generalizations requires good inductive biases

• Bayesian inference provides a way to make assumptions about the biases of learners explicit

• Two ways to identify human inductive biases:
  – compare Bayesian models assuming different priors
  – design tasks to extract biases from Bayesian learners

• Iterated learning provides a lens for magnifying the inductive biases of learners: small effects for individuals are big effects for groups

Page 47

Page 48

Iterated concept learning

• Each learner sees examples from a species

• Identifies species of four amoebae

• Iterated learning is run within-subjects

data hypotheses

(Griffiths, Christian, & Kalish, in press)

Page 49

Two positive examples


data (d)

hypotheses (h)

Page 50

Bayesian model (Tenenbaum, 1999; Tenenbaum & Griffiths, 2001)

P(h | d) = P(d | h) P(h) / Σ_{h′ ∈ H} P(d | h′) P(h′)

d: 2 amoebae; h: set of 4 amoebae

P(d | h) = 1/|h|^m if d ∈ h, and 0 otherwise

m: number of amoebae in the set d (= 2)
|h|: number of amoebae in the set h (= 4)

Since the likelihood is the same for every consistent hypothesis, the posterior is a renormalized prior:

P(h | d) = P(h) / Σ_{h′ : d ∈ h′} P(h′)

What is the prior?
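A sketch of this renormalized-prior computation (the hypothesis sets and prior values below are invented for illustration; in the experiment each hypothesis is a set of 4 amoebae):

```python
# With all hypotheses the same size (|h| = 4), the likelihood 1/|h|^m is
# identical for every h consistent with d, so it cancels: the posterior
# is just the prior restricted to consistent hypotheses and renormalized.
hypotheses = {
    "hA": ({1, 2, 3, 4}, 0.5),   # (members, prior) -- made-up values
    "hB": ({1, 2, 5, 6}, 0.3),
    "hC": ({5, 6, 7, 8}, 0.2),
}
d = {1, 2}  # two positive examples

consistent = {name: p for name, (members, p) in hypotheses.items() if d <= members}
z = sum(consistent.values())                       # renormalization constant
posterior = {name: p / z for name, p in consistent.items()}

print(posterior)  # hA ≈ 0.625, hB ≈ 0.375; hC is ruled out
```

Because inconsistent hypotheses get probability zero and the rest keep their prior ratios, iterating this learner mostly re-exposes the prior, which is what makes the chain useful for estimating it.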

Page 51

Classes of concepts (Shepard, Hovland, & Jenkins, 1961)

Class 1

Class 2

Class 3

Class 4

Class 5

Class 6

shape

size

color

Page 52

Experiment design (for each subject): 6 iterated learning chains and 6 independent learning "chains", one for each of Classes 1–6.

Page 53

Estimating the prior

[figure: hypotheses (h) vs. data (d)]

Page 54

Estimating the prior

Estimated prior probability for each class:

Class 1: 0.861
Class 2: 0.087
Class 3: 0.009
Class 4: 0.002
Class 5: 0.013
Class 6: 0.028

Bayesian model vs. human subjects: r = 0.952

Page 55

Two positive examples (n = 20)

[figure: probability vs. iteration, for human learners and the Bayesian model]

Page 56

Two positive examples (n = 20)

[figure: probability, human learners vs. Bayesian model]

Page 57

Three positive examples

data (d)

hypotheses (h)

Page 58

Three positive examples (n = 20)

[figure: probability vs. iteration, for human learners and the Bayesian model]

Page 59

Three positive examples (n = 20)

[figure: probability, human learners vs. Bayesian model]

Page 60

Page 61

Serial reproduction (Bartlett, 1932)

• Participants see stimuli, then reproduce them from memory

• Reproductions of one participant are stimuli for the next

• Stimuli were interesting, rather than controlled– e.g., “War of the Ghosts”

Page 62

Page 63

Discovering the biases of models


Generic neural network:

Page 64


Discovering the biases of models

EXAM (Delosh, Busemeyer, & McDaniel, 1997):

Page 65


Discovering the biases of models

POLE (Kalish, Lewandowsky, & Kruschke, 2004):

Page 66