Analyzing cultural evolution by iterated learning
Tom Griffiths
Department of Psychology
Cognitive Science Program
UC Berkeley
Inductive problems
blicket toma
dax wug
blicket wug
S → X Y
X → {blicket, dax}
Y → {toma, wug}
Learning languages from utterances
Learning functions from (x,y) pairs
Learning categories from instances of their members
Learning
Iterated learning (Kirby, 2001)
What are the consequences of learners learning from other learners?
Outline
Part I: Formal analysis of iterated learning
Part II: Iterated learning in the lab
Outline
Part I: Formal analysis of iterated learning
Part II: Iterated learning in the lab
Objects of iterated learning
How do constraints on learning (inductive biases) influence cultural universals?
Language
• The languages spoken by humans are typically viewed as the result of two factors:
– individual learning
– innate constraints (biological evolution)
• This limits the possible explanations for different kinds of linguistic phenomena
Linguistic universals
• Human languages possess universal properties
– e.g., compositionality
(Comrie, 1981; Greenberg, 1963; Hawkins, 1988)
• Traditional explanation:
– linguistic universals reflect strong innate constraints specific to a system for acquiring language (e.g., Chomsky, 1965)
Cultural evolution
• Languages are also subject to change via cultural evolution (through iterated learning)
• Alternative explanation:
– linguistic universals emerge as the result of the fact that language is learned anew by each generation (using general-purpose learning mechanisms, expressing weak constraints on languages)
(e.g., Briscoe, 1998; Kirby, 2001)
Analyzing iterated learning
PL(h|d): probability of inferring hypothesis h from data d
PP(d|h): probability of generating data d from hypothesis h
[Diagram: each learner infers a hypothesis from the previous learner's data via PL(h|d), then generates data for the next learner via PP(d|h)]
Markov chains
x → x → x → x → x → x → x → x
Transition matrix P(x(t+1) | x(t))
• Variables x(t+1) independent of history given x(t)
• Converges to a stationary distribution under easily checked conditions (i.e., if the chain is ergodic)
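As a minimal sketch of the convergence claim (the 3-state transition matrix below is invented for illustration, not from the talk), pushing any starting distribution through an ergodic chain settles on the same stationary distribution:

```python
import numpy as np

# Toy transition matrix: T[i, j] = P(x(t+1) = j | x(t) = i).
# Rows sum to one; all entries positive, so the chain is ergodic.
T = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# Power iteration: repeatedly apply the transition matrix to an arbitrary start.
p = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    p = p @ T

print("stationary distribution:", p)            # independent of the start state
print("fixed point check:", np.allclose(p, p @ T))
```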
Analyzing iterated learning
d0 → h1 → d1 → h2 → d2 → h3 → … (learning steps use PL(h|d); production steps use PP(d|h))

A Markov chain on hypotheses: h1 → h2 → h3 → …, with transitions obtained by composing PP(d|h) and PL(h|d) (marginalizing over d)

A Markov chain on data: d0 → d1 → d2 → …, with transitions obtained by composing PL(h|d) and PP(d|h) (marginalizing over h)
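A small illustration of how the two chains are built by marginalizing out the intermediate variable. The PL and PP matrices over two hypotheses and two data sets are made up for this sketch:

```python
import numpy as np

# PL[d, h]: probability the learner infers hypothesis h after seeing data d.
# PP[h, d]: probability hypothesis h generates data d. (Illustrative values.)
PL = np.array([[0.9, 0.1],
               [0.2, 0.8]])
PP = np.array([[0.6, 0.4],
               [0.3, 0.7]])

# Chain on hypotheses: h -> d -> h', so T_h[h, h'] = sum_d PP[h, d] * PL[d, h'].
T_h = PP @ PL
# Chain on data: d -> h -> d', so T_d[d, d'] = sum_h PL[d, h] * PP[h, d'].
T_d = PL @ PP

print("hypothesis chain:\n", T_h, "row sums:", T_h.sum(axis=1))
print("data chain:\n", T_d, "row sums:", T_d.sum(axis=1))
```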
Bayesian inference
Reverend Thomas Bayes
Bayes’ theorem
P(h | d) = P(d | h) P(h) / Σ_{h′∈H} P(d | h′) P(h′)

Posterior probability ∝ Likelihood × Prior probability; the denominator is a sum over the space of hypotheses.

h: hypothesis
d: data
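A tiny numerical instance of the theorem (the three hypotheses, prior, and likelihoods are invented for illustration):

```python
import numpy as np

prior = np.array([0.7, 0.2, 0.1])        # P(h) over three hypotheses
likelihood = np.array([0.1, 0.5, 0.9])   # P(d | h) for one observed data set d

posterior = likelihood * prior
posterior /= posterior.sum()             # divide by the sum over the hypothesis space
print("P(h | d) =", posterior)
```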
A note on hypotheses and priors
• No commitment to the nature of hypotheses
– neural networks (Rumelhart & McClelland, 1986)
– discrete parameters (Gibson & Wexler, 1994)
• Priors do not necessarily represent innate constraints specific to language acquisition
– not innate: can reflect independent sources of data
– not specific: general-purpose learning algorithms also have inductive biases expressible as priors
Iterated Bayesian learning
Assume learners sample from their posterior distribution:

PL(h | d) = PP(d | h) P(h) / Σ_{h′∈H} PP(d | h′) P(h′)
Stationary distributions
• Markov chain on h converges to the prior, P(h)
• Markov chain on d converges to the “prior predictive distribution”
P(d) = Σ_h P(d | h) P(h)
(Griffiths & Kalish, 2005)
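A simulation sketch of this result. The hypothesis space, prior, and likelihoods below are invented toy values; each simulated learner samples a hypothesis from its posterior and generates data for the next learner, and the long-run frequencies of hypotheses should approach the prior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy spaces: 3 hypotheses, 4 possible data sets (values are illustrative).
prior = np.array([0.5, 0.3, 0.2])                  # P(h)
PP = np.array([[0.4, 0.3, 0.2, 0.1],               # P(d | h), rows indexed by h
               [0.1, 0.4, 0.4, 0.1],
               [0.1, 0.1, 0.3, 0.5]])

def posterior(d):
    p = PP[:, d] * prior
    return p / p.sum()

h = rng.choice(3, p=prior)                         # first teacher
counts = np.zeros(3)
for t in range(100_000):
    d = rng.choice(4, p=PP[h])                     # teacher produces data
    h = rng.choice(3, p=posterior(d))              # next learner samples from its posterior
    counts[h] += 1

print("empirical hypothesis frequencies:", counts / counts.sum())  # ≈ prior
print("prior:                           ", prior)
```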
Explaining convergence to the prior
• Intuitively: data acts once, prior many times
• Formally: iterated learning with Bayesian agents is a Gibbs sampler on P(d,h)
(Griffiths & Kalish, 2007)
Gibbs sampling
For variables x = (x1, x2, …, xn):
Draw xi(t+1) from P(xi | x−i), where x−i = (x1(t+1), …, x(i−1)(t+1), x(i+1)(t), …, xn(t))
Converges to P(x1, x2, …, xn)
(a.k.a. the heat bath algorithm in statistical physics)
(Geman & Geman, 1984)
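A minimal Gibbs sampler for an invented joint distribution over two binary variables: each step resamples one variable from its conditional given the other, and the visited states end up distributed according to the joint:

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary joint P(x1, x2) over two binary variables (illustrative numbers).
joint = np.array([[0.30, 0.10],
                  [0.15, 0.45]])

def cond_x1(x2):                        # P(x1 | x2)
    col = joint[:, x2]
    return col / col.sum()

def cond_x2(x1):                        # P(x2 | x1)
    row = joint[x1, :]
    return row / row.sum()

x1, x2 = 0, 0
counts = np.zeros((2, 2))
for t in range(50_000):
    x1 = rng.choice(2, p=cond_x1(x2))   # resample x1 given the current x2
    x2 = rng.choice(2, p=cond_x2(x1))   # resample x2 given the new x1
    counts[x1, x2] += 1

print(counts / counts.sum())            # ≈ joint
```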
Gibbs sampling
(MacKay, 2003)
Explaining convergence to the prior
When the target distribution is P(d, h) = PP(d|h) P(h), the conditional distributions are PL(h|d) and PP(d|h)
Implications for linguistic universals
• When learners sample from P(h|d), the distribution over languages converges to the prior
– identifies a one-to-one correspondence between inductive biases and linguistic universals
Iterated Bayesian learning
Assume learners choose a hypothesis with probability proportional to the posterior raised to a power r:

PL(h | d) ∝ [ PP(d | h) P(h) / Σ_{h′∈H} PP(d | h′) P(h′) ]^r

r = 1 (sampling from the posterior)    r = 2    r = ∞ (maximizing)
From sampling to maximizing
• General analytic results are hard to obtain
– (r = ∞ is Monte Carlo EM with a single sample)
• For certain classes of languages, it is possible to show that the stationary distribution gives each hypothesis h probability proportional to P(h)^r
– the ordering identified by the prior is preserved, but not the corresponding probabilities
(Kirby, Dowman, & Griffiths, 2007)
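A sketch of how raising the posterior to a power r changes the stationary distribution, reusing the invented toy prior and likelihoods from earlier. Note the exact P(h)^r form holds only for the language classes analyzed in the paper; with these arbitrary numbers the code just illustrates the qualitative effect (r = 1 recovers the prior, larger r exaggerates it):

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])                  # P(h), illustrative
PP = np.array([[0.4, 0.3, 0.2, 0.1],               # P(d | h), rows indexed by h
               [0.1, 0.4, 0.4, 0.1],
               [0.1, 0.1, 0.3, 0.5]])

def stationary(r):
    # Learners pick h with probability proportional to the posterior raised to r.
    post = (PP * prior[:, None]) ** r              # [h, d] ∝ (PP(d|h) P(h))^r
    PL = post / post.sum(axis=0, keepdims=True)    # P_L(h | d): columns sum to 1
    T = PP @ PL.T                                  # T[h, h'] = sum_d PP(d|h) PL(h'|d)
    vals, vecs = np.linalg.eig(T.T)                # stationary dist = eigenvector for eigenvalue 1
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

for r in (1, 2, 10):
    print(f"r = {r:2d}:", np.round(stationary(r), 3))
print("prior:  ", np.round(prior, 3))
```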
From sampling to maximizing
Implications for linguistic universals
• When learners sample from P(h|d), the distribution over languages converges to the prior
– identifies a one-to-one correspondence between inductive biases and linguistic universals
• As learners move towards maximizing, the influence of the prior is exaggerated
– weak biases can produce strong universals
– cultural evolution is a viable alternative to traditional explanations for linguistic universals
Infinite populations in continuous time
• “Language dynamical equation”:
dx_i/dt = Σ_j q_ij f_j(x) x_j − φ(x) x_i    (Nowak, Komarova, & Niyogi, 2001)
• “Neutral model” (f_j(x) constant):
dx_i/dt = Σ_j q_ij x_j − x_i,  i.e.,  dx/dt = (Q − I) x    (Komarova & Nowak, 2003)
• Stable equilibrium at the first eigenvector of Q, which is our stationary distribution
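A sketch of the neutral-model claim, assuming Q is column-stochastic (q_ij = probability that a learner exposed to language j acquires language i) with invented entries: integrating dx/dt = (Q − I)x from an arbitrary starting mix reaches the eigenvector of Q with eigenvalue 1.

```python
import numpy as np

# Q[i, j]: probability that a learner exposed to language j ends up with
# language i (columns sum to one; values invented for illustration).
Q = np.array([[0.8, 0.3, 0.2],
              [0.1, 0.6, 0.2],
              [0.1, 0.1, 0.6]])

# Euler integration of the neutral dynamics dx/dt = (Q - I) x.
x = np.array([0.1, 0.1, 0.8])
dt = 0.01
for _ in range(20_000):
    x = x + dt * ((Q - np.eye(3)) @ x)

# Equilibrium predicted by the first eigenvector of Q (eigenvalue 1).
vals, vecs = np.linalg.eig(Q)
v = np.real(vecs[:, np.argmax(np.real(vals))])
v = v / v.sum()

print("ODE solution:      ", np.round(x, 4))
print("first eigenvector: ", np.round(v, 4))
```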
Analyzing iterated learning
• The outcome of iterated learning is strongly affected by the inductive biases of the learners
– hypotheses with high prior probability ultimately appear with high probability in the population
• Clarifies the connection between constraints on language learning and linguistic universals…
• …and provides formal justification for the idea that culture reflects the structure of the mind
Outline
Part I: Formal analysis of iterated learning
Part II: Iterated learning in the lab
Inductive problems
blicket toma
dax wug
blicket wug
S → X Y
X → {blicket, dax}
Y → {toma, wug}
Learning languages from utterances
Learning functions from (x,y) pairs
Learning categories from instances of their members
Revealing inductive biases
• Many problems in cognitive science can be formulated as problems of induction
– learning languages, concepts, and causal relations
• Such problems are not solvable without bias (e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995)
• What biases guide human inductive inferences?
If iterated learning converges to the prior, then it may provide a method for investigating biases
Serial reproduction (Bartlett, 1932)
General strategy
• Step 1: use well-studied and simple tasks for which people’s inductive biases are known
– function learning
– concept learning
• Step 2: explore learning problems where effects of inductive biases are controversial
– frequency distributions
– systems of color terms
Iterated function learning
• Each learner sees a set of (x,y) pairs
• Makes predictions of y for new x values
• Predictions are data for the next learner
[Diagram: data = (x, y) pairs; hypotheses = functions]
(Kalish, Griffiths, & Lewandowsky, 2007)
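This is not the experimental procedure itself, but a toy Bayesian analogue of the transmission chain with invented parameters: each simulated learner does Bayesian linear regression on the previous learner's (x, y) responses under a hypothetical prior centered on the positive linear function y = x, and over generations the transmitted function drifts toward that prior regardless of the initial data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical learner: assumes y = a*x + b + noise, with a Gaussian prior on
# (a, b) centered on y = x. All parameter values are invented for illustration.
prior_mean = np.array([1.0, 0.0])
prior_prec = 5.0          # precision of the prior on (a, b)
noise_prec = 1.0          # assumed precision of the observation noise

x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x)            # initial data: decidedly non-linear
X = np.column_stack([x, np.ones_like(x)])

for generation in range(20):
    # Posterior mean over (a, b): standard Bayesian linear regression update.
    A = noise_prec * X.T @ X + prior_prec * np.eye(2)
    w = np.linalg.solve(A, noise_prec * X.T @ y + prior_prec * prior_mean)
    # The learner's (noisy) predictions become the next learner's data.
    y = X @ w + rng.normal(0, 0.05, size=len(x))
    if generation % 5 == 4:
        print(f"generation {generation + 1}: slope = {w[0]:.2f}, intercept = {w[1]:.2f}")
```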
Function learning experiments
[Figure: experiment interface showing the stimulus, a response slider, and feedback]
Examine iterated learning with different initial data
[Figure: functions produced at iterations 1–9 of each chain, starting from different initial data]
Iterated concept learning
• Each learner sees examples from a species
• Identifies species of four amoebae
• Species correspond to boolean concepts
[Diagram: data = labeled examples; hypotheses = Boolean concepts]
(Griffiths, Christian, & Kalish, 2006)
Types of concepts (Shepard, Hovland, & Jenkins, 1961)
[Figure: the six concept types (Type I–VI), defined over three binary stimulus dimensions: shape, size, and color]
Results of iterated learning
[Figure: probability of each concept type across iterations, for human learners and the Bayesian model]
Frequency distributions
• Each learner sees objects receiving two labels
• Produces labels for those objects at test
• First learner: sees one label {0, 1, 2, 3, 4, 5} times out of 10
[Diagram: data = labeled objects; hypotheses = label frequencies]
(Reali & Griffiths, submitted)
5 × “DUP”
5 × “NEK”
P(“DUP” | · ) = ?
(Vouloumanos, 2008)
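A toy posterior-sampling chain for this task, not the experiment itself: each simulated learner infers the probability of the target label under an assumed symmetric Beta prior and produces 10 new labels. With a prior favoring regular, one-label languages (alpha < 1, an assumption for this sketch), the chain's long-run distribution over label counts is the U-shaped Beta-Binomial prior predictive, i.e., regularized languages dominate.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 10          # labels produced per learner for one object
alpha = 0.5     # symmetric Beta(alpha, alpha) prior on the target label's probability
                # (alpha < 1 encodes a bias toward regular, one-label languages)

def run_chain(k0, generations=2000):
    counts = []
    k = k0                                             # times the target label was heard
    for _ in range(generations):
        theta = rng.beta(alpha + k, alpha + N - k)     # sample theta from the posterior
        k = rng.binomial(N, theta)                     # produce data for the next learner
        counts.append(k)
    return np.array(counts)

ks = run_chain(k0=5)
hist = np.bincount(ks, minlength=N + 1) / len(ks)
print("distribution over counts 0..10:", np.round(hist, 3))  # most mass at 0 and 10
```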
Results after one generation
[Figure: frequency of the target label by condition]
Results after five generations
[Figure: frequency of the target label by condition]
The Wright-Fisher model
• Basic model: x copies of gene A in a population of N
x(t+1) ~ Binomial(N, π),  π = x(t) / N
• With mutation (A → a with probability u, a → A with probability v):
x(t+1) ~ Binomial(N, π),  π = [ x(t)(1 − u) + (N − x(t)) v ] / N
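A direct simulation of the update above (population size and mutation probabilities are invented for illustration); the long-run mean frequency of A settles at v / (u + v), the deterministic fixed point of the mutation step:

```python
import numpy as np

rng = np.random.default_rng(4)

N = 100             # population size
u, v = 0.02, 0.01   # mutation probabilities: A -> a and a -> A

x = 50              # initial number of copies of allele A
trajectory = []
for t in range(5000):
    pi = (x * (1 - u) + (N - x) * v) / N    # expected frequency after mutation
    x = rng.binomial(N, pi)                 # resample the next generation
    trajectory.append(x)

print("mean frequency of A:", np.mean(trajectory) / N)
print("fixed point v/(u+v):", v / (u + v))
```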
Iterated learning and Wright-Fisher
• Basic model is MAP with uniform prior, mutation model is MAP with Beta prior
• Extends to other models of genetic drift…
– connection between drift models and inference
[Equation: the Beta prior parameters expressed in terms of N, u, and v]
with Florencia Reali
Cultural evolution of color terms
with Mike Dowman and Jing Xu
Identifying inductive biases
• Formal analysis suggests that iterated learning provides a way to determine inductive biases
• Experiments with human learners support this idea
– when stimuli for which biases are well understood are used, those biases are revealed by iterated learning
• What do inductive biases look like in other cases?
– continuous categories
– causal structure
– word learning
– language learning
Conclusions
• Iterated learning provides a lens for magnifying the inductive biases of learners
– small effects for individuals are big effects for groups
• When cognition affects culture, studying groups can give us better insight into individuals
Credits
Joint work with…
Brian Christian
Mike Dowman
Mike Kalish
Simon Kirby
Steve Lewandowsky
Florencia Reali
Jing Xu
Computational Cognitive Science Lab
http://cocosci.berkeley.edu/