The dynamics of iterated learning
Tom Griffiths, UC Berkeley
with Mike Kalish, Steve Lewandowsky, Simon Kirby, and Mike Dowman
Learning
Iterated learning (Kirby, 2001)
What are the consequences of learners learning from other learners?
Objects of iterated learning
• Knowledge communicated across generations through provision of data by learners
• Examples:
– religious concepts
– social norms
– myths and legends
– causal theories
– language
Language
• The languages spoken by humans are typically viewed as the result of two factors:
– individual learning
– innate constraints (biological evolution)
• This limits the possible explanations for different kinds of linguistic phenomena
Linguistic universals
• Human languages possess universal properties
– e.g. compositionality
(Comrie, 1981; Greenberg, 1963; Hawkins, 1988)
• Traditional explanation:
– linguistic universals reflect innate constraints specific to a system for acquiring language (e.g., Chomsky, 1965)
Cultural evolution
• Languages are also subject to change via cultural evolution (through iterated learning)
• Alternative explanation:
– linguistic universals emerge as the result of the fact that language is learned anew by each generation (using general-purpose learning mechanisms)
(e.g., Briscoe, 1998; Kirby, 2001)
Analyzing iterated learning
PL(h|d): probability of inferring hypothesis h from data d
PP(d|h): probability of generating data d from hypothesis h
Markov chains

x1 → x2 → x3 → x4 → …

• Variables x(t+1) independent of history given x(t)
• Converges to a stationary distribution under easily checked conditions (i.e., if it is ergodic)
• Transition matrix: T = P(x(t+1) | x(t))
Stationary distributions
• Stationary distribution:

$$\pi_i = \sum_j P(x^{(t+1)} = i \mid x^{(t)} = j)\,\pi_j = \sum_j T_{ij}\,\pi_j$$

• In matrix form, $\pi = T\pi$: $\pi$ is the first eigenvector of the matrix T
– easy to compute for finite Markov chains
– can sometimes be found analytically
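The "easy to compute for finite Markov chains" point can be sketched in a few lines. This is a minimal pure-Python illustration with a made-up 2×2 transition matrix (not from the slides): it finds the stationary distribution by power iteration, repeatedly applying π ← Tπ until π stops changing.

```python
# Stationary distribution of a finite Markov chain by power iteration.
# T is a made-up transition matrix: T[i][j] = P(x(t+1) = i | x(t) = j),
# so each column sums to 1.
T = [[0.9, 0.5],
     [0.1, 0.5]]

pi = [0.5, 0.5]              # any starting distribution works (this chain is ergodic)
for _ in range(100):         # repeatedly apply pi <- T pi
    pi = [sum(T[i][j] * pi[j] for j in range(2)) for i in range(2)]

# pi is now the eigenvector of T with eigenvalue 1, normalized to sum to 1;
# for this T it can also be found analytically: pi = (5/6, 1/6).
```

Solving $\pi = T\pi$ by hand for this matrix gives $\pi_1 = \pi_0/5$, hence $(5/6, 1/6)$, matching the iteration.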
Analyzing iterated learning
d0 → h1 → d1 → h2 → d2 → h3 → … (learning PL(h|d) alternates with production PP(d|h))

A Markov chain on hypotheses: h1 → h2 → h3 → … (each step composes PP(d|h) with PL(h|d))

A Markov chain on data: d0 → d1 → d2 → … (each step composes PL(h|d) with PP(d|h))
Bayesian inference
Reverend Thomas Bayes
• Rational procedure for updating beliefs
• Foundation of many learning algorithms
• Lets us make the inductive biases of learners precise
Bayes’ theorem
$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h' \in H} P(d \mid h')\,P(h')}$$

h: hypothesis, d: data. The left-hand side is the posterior probability; the numerator is the likelihood times the prior probability; the denominator sums over the space of hypotheses H.
Iterated Bayesian learning
Assume learners sample from their posterior distribution:

$$P_L(h \mid d) = \frac{P_P(d \mid h)\,P(h)}{\sum_{h' \in H} P_P(d \mid h')\,P(h')}$$
Stationary distributions
• Markov chain on h converges to the prior, P(h)
• Markov chain on d converges to the “prior predictive distribution”
$$P(d) = \sum_h P(d \mid h)\,P(h)$$
(Griffiths & Kalish, 2005)
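A small simulation makes the convergence result concrete. Everything numeric below is a made-up toy model, not from the slides: two hypotheses with prior (0.7, 0.3), one binary datum passed per generation, and a learner that samples from its posterior. The chain of hypotheses ends up visiting each h in proportion to the prior, regardless of the likelihoods.

```python
import random

random.seed(0)

# Toy model (all numbers made up): two hypotheses and one binary datum.
prior = {0: 0.7, 1: 0.3}          # P(h)
lik = {0: 0.8, 1: 0.2}            # P(d = 1 | h)

def sample_data(h):
    """Generate one datum from P_P(d | h)."""
    return 1 if random.random() < lik[h] else 0

def sample_posterior(d):
    """Sample a hypothesis from P_L(h | d), proportional to P_P(d | h) P(h)."""
    w = {h: (lik[h] if d == 1 else 1 - lik[h]) * prior[h] for h in (0, 1)}
    return 0 if random.random() < w[0] / (w[0] + w[1]) else 1

# Iterated learning: each learner sees one datum produced by the previous one.
h = 0
counts = {0: 0, 1: 0}
n = 200_000
for _ in range(n):
    h = sample_posterior(sample_data(h))
    counts[h] += 1

freq0 = counts[0] / n             # ≈ prior[0] = 0.7
```

The empirical frequency of h = 0 matches the prior (0.7), not anything about the likelihoods, which is exactly the Griffiths & Kalish result for posterior-sampling learners.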
Explaining convergence to the prior
• Intuitively: data acts once, prior many times
• Formally: iterated learning with Bayesian agents is a Gibbs sampler for P(d,h)
(Griffiths & Kalish, submitted)
Gibbs sampling
For variables $x = x_1, x_2, \ldots, x_n$:

Draw $x_i^{(t+1)}$ from $P(x_i \mid x_{-i})$, where $x_{-i} = x_1^{(t+1)}, \ldots, x_{i-1}^{(t+1)}, x_{i+1}^{(t)}, \ldots, x_n^{(t)}$

Converges to $P(x_1, x_2, \ldots, x_n)$
(a.k.a. the heat bath algorithm in statistical physics)
(Geman & Geman, 1984)
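A sketch of the scheme above on a tiny made-up joint distribution over two binary variables: each sweep redraws each variable from its conditional given the current value of the other, and the visit frequencies converge to the joint.

```python
import random

random.seed(1)

# A made-up joint distribution P(x1, x2) over two binary variables.
P = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}

def p_one(i, other):
    """P(x_i = 1 | x_{-i} = other), read off the joint by renormalizing."""
    if i == 0:
        num, den = P[(1, other)], P[(0, other)] + P[(1, other)]
    else:
        num, den = P[(other, 1)], P[(other, 0)] + P[(other, 1)]
    return num / den

x = [0, 0]
counts = {k: 0 for k in P}
n = 200_000
for _ in range(n):
    for i in (0, 1):                              # one Gibbs sweep
        x[i] = 1 if random.random() < p_one(i, x[1 - i]) else 0
    counts[tuple(x)] += 1

# Visit frequencies converge to the joint: counts[k] / n ≈ P[k]
```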
Gibbs sampling
(MacKay, 2003)
Explaining convergence to the prior
When the target distribution is P(d,h), the conditional distributions are PL(h|d) and PP(d|h)
Implications for linguistic universals
• When learners sample from P(h|d), the distribution over languages converges to the prior
– identifies a one-to-one correspondence between inductive biases and linguistic universals
Iterated Bayesian learning
Assume learners sample from their posterior distribution:

$$P_L(h \mid d) = \frac{P_P(d \mid h)\,P(h)}{\sum_{h' \in H} P_P(d \mid h')\,P(h')}$$
From sampling to maximizing

$$P_L(h \mid d) \propto \left[\frac{P_P(d \mid h)\,P(h)}{\sum_{h' \in H} P_P(d \mid h')\,P(h')}\right]^r$$

r = 1 corresponds to sampling from the posterior; increasing r (r = 2, …, r = ∞) moves learners towards maximizing, i.e. choosing the hypothesis with highest posterior probability
From sampling to maximizing

• General analytic results are hard to obtain
– (r = ∞ is a stochastic version of the EM algorithm)
• For certain classes of languages, it is possible to show that the stationary distribution gives each hypothesis h probability proportional to P(h)^r
– the ordering identified by the prior is preserved, but not the corresponding probabilities

(Kirby, Dowman, & Griffiths, submitted)
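The effect of the exponent r can be illustrated with the same kind of made-up two-hypothesis toy model used above (all numbers are assumptions for illustration): learners sample from the posterior raised to the power r. With r = 1 the chain converges to the prior; with r = 2 the prior's influence is exaggerated well beyond 0.7.

```python
import random

random.seed(2)

# Toy model (all numbers made up): two hypotheses, one binary datum.
prior = {0: 0.7, 1: 0.3}              # P(h)
lik = {0: 0.6, 1: 0.4}                # P(d = 1 | h)

def next_h(h, r):
    """One generation: produce a datum, then sample h from the posterior^r."""
    d = 1 if random.random() < lik[h] else 0
    p_d = {k: (lik[k] if d == 1 else 1 - lik[k]) for k in (0, 1)}
    w = {k: (p_d[k] * prior[k]) ** r for k in (0, 1)}   # posterior^r, unnormalized
    return 0 if random.random() < w[0] / (w[0] + w[1]) else 1

def stationary_freq(r, n=200_000):
    h, c0 = 0, 0
    for _ in range(n):
        h = next_h(h, r)
        c0 += (h == 0)
    return c0 / n

f1 = stationary_freq(1)               # ≈ prior[0] = 0.7 (sampling)
f2 = stationary_freq(2)               # prior exaggerated: ≈ 0.83 for this model
```

Raising the unnormalized posterior to the power r is enough, since the normalizing constant cancels; the r = 2 chain spends noticeably more time on the favored hypothesis than its prior probability alone would predict, illustrating how a weak bias can produce a strong universal.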
Implications for linguistic universals
• When learners sample from P(h|d), the distribution over languages converges to the prior
– identifies a one-to-one correspondence between inductive biases and linguistic universals
• As learners move towards maximizing, the influence of the prior is exaggerated
– weak biases can produce strong universals
– cultural evolution is a viable alternative to traditional explanations for linguistic universals
Analyzing iterated learning
• The outcome of iterated learning is determined by the inductive biases of the learners
– hypotheses with high prior probability ultimately appear with high probability in the population
• Clarifies the connection between constraints on language learning and linguistic universals…
• …and provides formal justification for the idea that culture reflects the structure of the mind
Conclusions
• Iterated learning provides a lens for magnifying the inductive biases of learners
– small effects for individuals are big effects for groups
• When cognition affects culture, studying groups can give us better insight into individuals