Part IV: Monte Carlo and nonparametric Bayes
Part IV: Monte Carlo and nonparametric Bayes
Outline
Monte Carlo methods
Nonparametric Bayesian models
The Monte Carlo principle
• The expectation of f with respect to P can be approximated by

  $E_{P(x)}[f(x)] \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i)$

  where the x_i are sampled from P(x)

• Example: the average number of spots on a die roll (sketched in code below)
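A minimal Python sketch of this die-roll example (not from the slides): average n samples and watch the estimate approach the true expectation of 3.5.

```python
import random

def monte_carlo_die_average(n):
    """Estimate E[f(X)] for f(x) = x, with X a fair die roll, by averaging n samples."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return sum(rolls) / n

# The estimate approaches the true expectation 3.5 as n grows (law of large numbers).
for n in (10, 100, 10000):
    print(n, monte_carlo_die_average(n))
```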
The Monte Carlo principle
$E_{P(x)}[f(x)] \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i)$

(Figure: average number of spots plotted against the number of rolls, converging as predicted by the law of large numbers)
Two uses of Monte Carlo methods
1. For solving problems of probabilistic inference involved in developing computational models
2. As a source of hypotheses about how the mind might solve problems of probabilistic inference
Making Bayesian inference easier
$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h' \in H} P(d \mid h')\,P(h')}$
Evaluating the posterior probability of a hypothesis requires considering all hypotheses
Modern Monte Carlo methods let us avoid this
Modern Monte Carlo methods
• Sampling schemes for distributions with large state spaces known up to a multiplicative constant
• Two approaches:
  – importance sampling (and particle filters)
  – Markov chain Monte Carlo
Importance sampling
Basic idea: generate from the wrong distribution, assign weights to samples to correct for this
$E_{p(x)}[f(x)] = \int f(x)\,p(x)\,dx = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,dx \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i)\,\frac{p(x_i)}{q(x_i)} \quad \text{for } x_i \sim q(x)$
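A minimal sketch of this estimator in Python (not from the slides), assuming a normalized target density p and an easy-to-sample Gaussian proposal q; the function names are illustrative.

```python
import math, random

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def importance_sampling(f, p, q_sampler, q_pdf, n=100_000):
    """Estimate E_p[f(x)] by drawing from the proposal q and weighting each sample by p(x)/q(x)."""
    total = 0.0
    for _ in range(n):
        x = q_sampler()
        total += f(x) * p(x) / q_pdf(x)
    return total / n

# Target p: standard normal. Proposal q: a wider normal that is easy to sample from.
p = lambda x: normal_pdf(x, 0.0, 1.0)
q_pdf = lambda x: normal_pdf(x, 0.0, 3.0)
q_sampler = lambda: random.gauss(0.0, 3.0)

print(importance_sampling(lambda x: x ** 2, p, q_sampler, q_pdf))  # ≈ 1.0, the variance of p
```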
Importance sampling
Works when sampling from the proposal is easy but sampling from the target is hard
An alternative scheme…
$E_{p(x)}[f(x)] \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i)\,\frac{p(x_i)}{q(x_i)} \quad \text{for } x_i \sim q(x)$

$E_{p(x)}[f(x)] \approx \frac{\sum_{i=1}^{n} f(x_i)\,\frac{p(x_i)}{q(x_i)}}{\sum_{i=1}^{n} \frac{p(x_i)}{q(x_i)}} \quad \text{for } x_i \sim q(x)$

Works when p(x) is known only up to a multiplicative constant
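A short sketch of this self-normalized estimator (not from the slides): the target is supplied unnormalized, and the unknown constant cancels in the ratio of sums.

```python
import math, random

def self_normalized_is(f, p_unnorm, q_sampler, q_pdf, n=100_000):
    """Self-normalized importance sampling: p_unnorm may omit its normalizing constant."""
    num = den = 0.0
    for _ in range(n):
        x = q_sampler()
        w = p_unnorm(x) / q_pdf(x)   # unnormalized importance weight
        num += w * f(x)
        den += w
    return num / den                  # the unknown constant cancels in this ratio

# Example: unnormalized standard-normal target, uniform proposal on [-10, 10].
p_unnorm = lambda x: math.exp(-0.5 * x * x)
print(self_normalized_is(lambda x: x ** 2, p_unnorm,
                         lambda: random.uniform(-10, 10), lambda x: 1 / 20))
```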
Likelihood weighting
• A particularly simple form of importance sampling for posterior distributions
• Use the prior as the proposal distribution
• Weights:

  $\frac{p(h \mid d)}{p(h)} = \frac{p(d \mid h)\,p(h)}{p(d)\,p(h)} = \frac{p(d \mid h)}{p(d)} \propto p(d \mid h)$
Likelihood weighting
• Generate samples of all variables except observed variables
• Assign weights proportional to probability of observed data given values in sample
(Graphical model over variables X1, X2, X3, X4)
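A minimal likelihood-weighting sketch on a hypothetical two-variable network X1 → X2 with X2 observed; the structure and probabilities are made up for illustration, not taken from the slide's network.

```python
import random

P_X1 = {True: 0.3, False: 0.7}            # prior P(X1)
P_X2_given_X1 = {True: 0.9, False: 0.2}   # P(X2 = True | X1)

def likelihood_weighting(x2_obs, n=100_000):
    """Estimate P(X1 = True | X2 = x2_obs): sample X1 from its prior,
    weight each sample by the probability of the observed X2."""
    num = den = 0.0
    for _ in range(n):
        x1 = random.random() < P_X1[True]
        p_obs = P_X2_given_X1[x1] if x2_obs else 1.0 - P_X2_given_X1[x1]
        num += p_obs * (1.0 if x1 else 0.0)
        den += p_obs
    return num / den

print(likelihood_weighting(True))   # ≈ 0.27 / 0.41 ≈ 0.66 by Bayes' rule
```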
Importance sampling
• A general scheme for sampling from complex distributions that have simpler relatives
• Simple methods for sampling from posterior distributions in some cases (easy to sample from the prior, prior and posterior are close)
• Can be more efficient than simple Monte Carlo
  – particularly for, e.g., tail probabilities
• Also provides a solution to the question of how people can update beliefs as data come in…
Particle filtering
(State-space model: hidden states s1, s2, s3, s4, each generating an observation d1, d2, d3, d4)

We want to generate samples from P(s4 | d1, …, d4)

$P(s_4 \mid d_1,\ldots,d_4) \propto P(d_4 \mid s_4)\,P(s_4 \mid d_1,\ldots,d_3) = P(d_4 \mid s_4) \sum_{s_3} P(s_4 \mid s_3)\,P(s_3 \mid d_1,\ldots,d_3)$

We can use likelihood weighting if we can sample from P(s4 | s3) and P(s3 | d1, …, d3)
Particle filtering

$P(s_4 \mid d_1,\ldots,d_4) \propto P(d_4 \mid s_4) \sum_{s_3} P(s_4 \mid s_3)\,P(s_3 \mid d_1,\ldots,d_3)$

(Diagram: samples from P(s3 | d1,…,d3) → sample from P(s4 | s3) → samples from P(s4 | d1,…,d3) → weight by P(d4 | s4) → weighted atoms for P(s4 | d1,…,d4) → samples from P(s4 | d1,…,d4))
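A bootstrap-style particle filter sketch following this recipe (not from the slides); init_sampler, transition_sampler, and likelihood are hypothetical interfaces standing in for P(s1), P(st | st−1), and P(dt | st).

```python
import random

def particle_filter(observations, n_particles, init_sampler, transition_sampler, likelihood):
    """Propagate particles through the transition model, weight them by the
    likelihood of each observation, then resample to return to unweighted samples."""
    particles = [init_sampler() for _ in range(n_particles)]
    for d in observations:
        particles = [transition_sampler(s) for s in particles]   # sample from P(s_t | s_{t-1})
        weights = [likelihood(d, s) for s in particles]           # weight by P(d_t | s_t)
        particles = random.choices(particles, weights=weights, k=n_particles)  # resample
    return particles  # approximate samples from P(s_T | d_1, ..., d_T)
```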
The promise of particle filters
• People need to be able to update probability distributions over large hypothesis spaces as more data become available
• Particle filters provide a way to do this with limited computing resources…
  – maintain a fixed, finite number of samples
• Not just for dynamic models
  – can work with a fixed set of hypotheses, although this requires some further tricks for maintaining diversity
Markov chain Monte Carlo
• Basic idea: construct a Markov chain that will converge to the target distribution, and draw samples from that chain
• Just uses something proportional to the target distribution (good for Bayesian inference!)
• Can work in state spaces of arbitrary (including unbounded) size (good for nonparametric Bayes)
Markov chains

Each variable x(t+1) is independent of all previous variables given its immediate predecessor x(t)

(Diagram: a chain of states x → x → x → …)

Transition matrix T = P(x(t+1) | x(t))
An example: card shuffling
• Each state x(t) is a permutation of a deck of cards (there are 52! permutations)
• The transition matrix T indicates how likely one permutation is to become another
• The transition probabilities are determined by the shuffling procedure
  – riffle shuffle
  – overhand
  – one card
Convergence of Markov chains
• Why do we shuffle cards?
• Convergence to a uniform distribution takes only 7 riffle shuffles…
• Other Markov chains will also converge to a stationary distribution, if certain simple conditions are satisfied (called “ergodicity”)
  – e.g. every state can be reached in some number of steps from every other state
Markov chain Monte Carlo
• States of the chain are the variables of interest
• The transition matrix is chosen to give the target distribution as the stationary distribution

(Diagram: a chain of states x → x → x → …)

Transition matrix T = P(x(t+1) | x(t))
Metropolis-Hastings algorithm
• Transitions have two parts:
  – proposal distribution: Q(x(t+1) | x(t))
  – acceptance: take proposals with probability

    $A(x^{(t)}, x^{(t+1)}) = \min\left(1,\; \frac{P(x^{(t+1)})\,Q(x^{(t)} \mid x^{(t+1)})}{P(x^{(t)})\,Q(x^{(t+1)} \mid x^{(t)})}\right)$
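A minimal random-walk Metropolis-Hastings sketch (not from the slides); with a symmetric Gaussian proposal the Q terms cancel, so only the (possibly unnormalized) target enters the acceptance ratio.

```python
import math, random

def metropolis_hastings(log_p, x0, n_steps, proposal_sd=1.0):
    """Random-walk Metropolis-Hastings: log_p is the log of the target density,
    known only up to an additive constant. The Gaussian proposal is symmetric,
    so the proposal terms cancel in the acceptance probability."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = x + random.gauss(0.0, proposal_sd)                 # propose
        accept_prob = min(1.0, math.exp(log_p(x_new) - log_p(x)))  # A(x, x_new)
        if random.random() < accept_prob:                          # accept or reject
            x = x_new
        samples.append(x)
    return samples

# Example: unnormalized standard normal target.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=10_000)
```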
Metropolis-Hastings algorithm

(Animation frames: a target density p(x) with a sequence of proposed moves; individual proposals are accepted with probabilities such as A(x(t), x(t+1)) = 0.5 and A(x(t), x(t+1)) = 1)
Metropolis-Hastings in a slide
Gibbs sampling
A particular choice of proposal distribution

For variables x = x1, x2, …, xn, draw xi(t+1) from P(xi | x-i), where

$x_{-i} = x_1^{(t+1)}, x_2^{(t+1)}, \ldots, x_{i-1}^{(t+1)}, x_{i+1}^{(t)}, \ldots, x_n^{(t)}$

(this is called the full conditional distribution)
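A Gibbs sampling sketch for a standard textbook example, not taken from the slides: a bivariate Gaussian with correlation ρ, where each full conditional is a univariate Gaussian.

```python
import random

def gibbs_bivariate_normal(rho, n_steps):
    """Gibbs sampling for a standard bivariate normal with correlation rho:
    each variable is drawn in turn from its full conditional distribution."""
    x1, x2 = 0.0, 0.0
    samples = []
    sd = (1.0 - rho * rho) ** 0.5
    for _ in range(n_steps):
        x1 = random.gauss(rho * x2, sd)   # draw from P(x1 | x2)
        x2 = random.gauss(rho * x1, sd)   # draw from P(x2 | x1)
        samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_steps=5000)
```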
In a graphical model…
(Diagram: a graphical model over X1, X2, X3, X4, shown once for each variable being resampled)
Sample each variable conditioned on its Markov blanket
Gibbs sampling
(Figure: Gibbs sampling for two variables X1 and X2; MacKay, 2002)
The magic of MCMC
• Since we only ever need to evaluate the relative probabilities of two states, we can have huge state spaces (much of which we rarely reach)
• In fact, our state spaces can be infinite
  – common with nonparametric Bayesian models
• But… the guarantees it provides are asymptotic
  – making algorithms that converge in practical amounts of time is a significant challenge
MCMC and cognitive science
• The main use of MCMC is for probabilistic inference in complex models
• The Metropolis-Hastings algorithm seems like a good metaphor for aspects of development…
• A form of cultural evolution can be shown to be equivalent to Gibbs sampling (Griffiths & Kalish, 2007)
• We can also use MCMC algorithms as the basis for experiments with people…
Samples from Subject 3 (projected onto a plane from LDA)
Three uses of Monte Carlo methods

1. For solving problems of probabilistic inference involved in developing computational models
2. As a source of hypotheses about how the mind might solve problems of probabilistic inference
3. As a way to explore people’s subjective probability distributions
Outline
Monte Carlo methods
Nonparametric Bayesian models
Nonparametric Bayes
• Nonparametric models…
  – can capture distributions outside parametric families
  – have infinitely many parameters
  – grow in complexity with the data
• Provide a way to automatically determine how much structure can be inferred from data
  – how many clusters?
  – how many dimensions?
How many clusters?
Nonparametric approach: Dirichlet process mixture models
Mixture models
• Each observation is assumed to come from a single (possibly previously unseen) cluster
• The probability that the ith sample belongs to the kth cluster is

  $p(z_i = k \mid x_i) \propto p(x_i \mid z_i = k)\,p(z_i = k)$

  where p(xi | zi = k) reflects the structure of cluster k (e.g. Gaussian) and p(zi = k) is its prior probability
Dirichlet process mixture models
• Use a prior that allows infinitely many clusters (but only finitely many for a finite set of observations)
• The ith sample is drawn from the kth cluster with probability

  $p(z_i = k) = \frac{n_k}{i - 1 + \alpha}$ for an existing cluster, and $\frac{\alpha}{i - 1 + \alpha}$ for a new one

  where n_k is the number of earlier samples assigned to cluster k and α is a parameter of the model (known as the “Chinese restaurant process”)
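A minimal sketch of drawing cluster assignments from the Chinese restaurant process prior described above (not from the slides); the function name is illustrative.

```python
import random

def sample_crp(n, alpha):
    """Draw cluster assignments for n observations from the Chinese restaurant
    process with concentration parameter alpha."""
    assignments = []
    counts = []                      # counts[k] = number of observations in cluster k
    for i in range(n):
        # Join existing cluster k with probability counts[k] / (i + alpha),
        # or start a new cluster with probability alpha / (i + alpha).
        weights = counts + [alpha]
        k = random.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

print(sample_crp(20, alpha=1.0))   # e.g. [0, 0, 1, 0, 2, ...]; the number of clusters grows with n
```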
Nonparametric Bayes and cognition
• Nonparametric Bayesian models are useful for answering questions about how much structure people should infer from data
• Many cognitive science questions take this form
  – how should we represent categories?
  – what features should we identify for objects?
The Rational Model of Categorization (RMC; Anderson, 1990, 1991)
• Computational problem: predicting a feature based on observed data
  – assume that category labels are just features
• Predictions are made on the assumption that objects form clusters with similar properties
  – each object belongs to a single cluster
  – feature values are likely to be the same within clusters
  – the number of clusters is unbounded
Representation in the RMC

A flexible representation can interpolate between prototype and exemplar models

(Figure: two panels plotting probability against feature value)
The “optimal solution”

The probability of the missing feature (i.e., the category label) taking a certain value is

$P(j \mid F_n) = \sum_{x_n} P(j \mid x_n, F_n)\,P(x_n \mid F_n)$

where j is a feature value, Fn are the observed features of a set of n objects, and xn is a partition of the objects into clusters; the second factor is the posterior over partitions
The prior over partitions
• An object is assumed to have a constant probability of joining the same cluster as another object, known as the coupling probability c
• This allows some probability that a stimulus forms a new cluster, so the probability that the ith object is assigned to the kth cluster is

  $P(z_i = k) = \frac{c\,n_k}{(1 - c) + c\,(i - 1)}$ for an existing cluster, and $\frac{1 - c}{(1 - c) + c\,(i - 1)}$ for a new one

  where n_k is the number of objects already assigned to cluster k
Equivalence

Neal (1998) showed that the prior for the RMC and the prior for the DPMM are the same, with α = (1 − c) / c

RMC prior: $P(z_i = k) = \frac{c\,n_k}{(1 - c) + c\,(i - 1)}$

DPMM prior: $P(z_i = k) = \frac{n_k}{(i - 1) + \alpha}$
The computational challenge

The probability of the missing feature (i.e., the category label) taking a certain value is

$P(j \mid F_n) = \sum_{x_n} P(j \mid x_n, F_n)\,P(x_n \mid F_n)$

where j is a feature value, Fn are the observed features of a set of n objects, and xn is a partition of the objects into groups; the number of possible partitions grows very rapidly with n:

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|----|
| number of partitions | 1 | 2 | 5 | 15 | 52 | 203 | 877 | 4140 | 21147 | 115975 |
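These counts are the Bell numbers; a short sketch (not from the slides) computes them with the Bell triangle and reproduces the row above.

```python
def bell_numbers(n_max):
    """Return [B_1, ..., B_n_max], the number of partitions of n objects,
    computed with the Bell triangle."""
    results = []
    row = [1]                       # Bell triangle row for n = 1
    for _ in range(n_max):
        results.append(row[-1])     # B_n is the last entry of the current row
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return results

print(bell_numbers(10))  # [1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975]
```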
Anderson’s approximation
• Data are observed sequentially
• Each object is deterministically assigned to the cluster with the highest posterior probability
• Call this the “Local MAP”
  – choosing the cluster with the maximum a posteriori probability

(Diagram: binary stimuli 111, 011, 100 are assigned one at a time; candidate assignments have posterior probabilities such as 0.54 vs. 0.46 and 0.33 vs. 0.67, and the higher-probability option is always taken, yielding a single final partition)
Two uses of Monte Carlo methods
1. For solving problems of probabilistic inference involved in developing computational models
2. As a source of hypotheses about how the mind might solve problems of probabilistic inference
Alternative approximation schemes
• There are several methods for making approximations to the posterior in DPMMs
  – Gibbs sampling
  – Particle filtering
• These methods provide asymptotic performance guarantees (in contrast to Anderson’s procedure)
(Sanborn, Griffiths, & Navarro, 2006)
Gibbs sampling for the DPMM

(Diagram: a starting partition of the stimuli 111, 011, 100 is repeatedly updated; each stimulus is reassigned given the current assignments of the others, with full-conditional probabilities such as 0.33 / 0.67 and 0.48 / 0.12 / 0.40, producing Sample #1, Sample #2, …)
• All the data are required at once (a batch procedure)
• Each stimulus is sequentially assigned to a cluster based on the assignments of all of the remaining stimuli
• Assignments are made probabilistically, using the full conditional distribution
Particle filter for the DPMM
• Data are observed sequentially
• The posterior distribution at each point is approximated by a set of “particles”
• Particles are updated, and a fixed number of them are carried over from trial to trial
(Diagram: particles representing partitions of the stimuli 111, 011, 100 are propagated from trial to trial, with weights such as 0.27, 0.23, 0.17, 0.33, 0.08, 0.16, 0.26, producing Sample #1, Sample #2, …)
Approximating the posterior
• For a single order, the Local MAP will produce a single partition
• The Gibbs sampler and particle filter will approximate the exact DPMM distribution
(Diagram: the partitions of the stimuli 111, 011, 100 produced by the Local MAP, the Gibbs sampler, and the particle filter, compared with the exact DPMM posterior)
Order effects in human data
• The probabilistic model underlying the DPMM does not produce any order effects
  – follows from exchangeability
• But… human data show order effects (e.g., Medin & Bettger, 1994)
• Anderson and Matessa tested Local MAP predictions about order effects in an unsupervised clustering experiment (Anderson, 1990)
Anderson and Matessa’s Experiment
• Subjects were shown all sixteen stimuli that had four binary features
• Front-anchored ordered stimuli emphasized the first two features in the first eight trials; end-anchored ordered emphasized the last two
Front-Anchored Order: scadsporm, scadstirm, sneksporb, snekstirb, sneksporm, snekstirm, scadsporb, scadstirb

End-Anchored Order: snadstirb, snekstirb, scadsporm, sceksporm, sneksporm, snadsporm, scedstirb, scadstirb
Anderson and Matessa’s Experiment

Proportion of partitions divided along a front-anchored feature:

|                       | Front-Anchored Order | End-Anchored Order |
|-----------------------|----------------------|--------------------|
| Experimental Data     | 0.55                 | 0.30               |
| Local MAP             | 1.00                 | 0.00               |
| Particle Filter (1)   | 0.59                 | 0.38               |
| Particle Filter (100) | 0.50                 | 0.50               |
| Gibbs Sampler         | 0.48                 | 0.49               |
A “rational process model”
• A rational model clarifies a problem and serves as a benchmark for performance
• Using a psychologically plausible approximation can change a rational model into a “rational process model”
• Research in machine learning and statistics has produced useful approximations to statistical models, which can be tested as general-purpose psychological heuristics
Nonparametric Bayes and cognition
• Nonparametric Bayesian models are useful for answering questions about how much structure people should infer from data
• Many cognitive science questions take this form
  – how should we represent categories?
  – what features should we identify for objects?
Learning the features of objects
• Most models of human cognition assume objects are represented in terms of abstract features
• What are the features of this object?
• What determines what features we identify?
(Austerweil & Griffiths, submitted)
Binary matrix factorization
![Page 67: Part IV: Monte Carlo and nonparametric Bayes](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815bac550346895dc9aeb8/html5/thumbnails/67.jpg)
Binary matrix factorization
How should we infer the number of features?
![Page 68: Part IV: Monte Carlo and nonparametric Bayes](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815bac550346895dc9aeb8/html5/thumbnails/68.jpg)
The nonparametric approach
Use the Indian buffet process as a prior on Z
Assume that the total number of features is unbounded, but only a finite number will be expressed in any finite dataset
(Griffiths & Ghahramani, 2006)
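A minimal sketch of one common description of the Indian buffet process generative scheme (not from the slides): object i takes each existing feature with probability proportional to how many earlier objects have it, then adds a Poisson number of brand-new features. Function and variable names are illustrative.

```python
import math, random

def poisson_sample(lam):
    """Draw from a Poisson(lam) distribution using Knuth's method."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def sample_ibp(n_objects, alpha):
    """Draw a binary feature-ownership matrix Z from the Indian buffet process:
    object i takes existing feature k with probability m_k / i, then adds
    Poisson(alpha / i) new features. The number of features is unbounded in
    principle, but finite for any finite dataset."""
    Z = []          # rows of 0/1 feature indicators, one per object
    counts = []     # counts[k] = number of objects that have feature k so far
    for i in range(1, n_objects + 1):
        row = [1 if random.random() < m / i else 0 for m in counts]
        new_features = poisson_sample(alpha / i)
        row.extend([1] * new_features)
        counts = [c + z for c, z in zip(counts, row)] + [1] * new_features
        Z.append(row)
    width = len(counts)
    return [row + [0] * (width - len(row)) for row in Z]  # pad rows to equal length

Z = sample_ibp(10, alpha=2.0)
```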
![Page 69: Part IV: Monte Carlo and nonparametric Bayes](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815bac550346895dc9aeb8/html5/thumbnails/69.jpg)
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
(Austerweil & Griffiths, submitted)
![Page 70: Part IV: Monte Carlo and nonparametric Bayes](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815bac550346895dc9aeb8/html5/thumbnails/70.jpg)
Summary
• Sophisticated tools from Bayesian statistics can be valuable in developing probabilistic models of cognition…
• Monte Carlo methods provide a way to perform inference in probabilistic models, and a source of ideas for process models and experiments
• Nonparametric models help us tackle questions about how much structure to infer, with unbounded hypothesis spaces
• We look forward to seeing what you do with them!
Resources…
• “Bayesian models of cognition” chapter in the Handbook of Computational Psychology
• Tom’s Bayesian reading list:
  – http://cocosci.berkeley.edu/tom/bayes.html
  – tutorial slides will be posted there!
• Trends in Cognitive Sciences special issue on probabilistic models of cognition (vol. 10, iss. 7)
• IPAM graduate summer school on probabilistic models of cognition (with videos!)