Learning causal theories
Josh Tenenbaum, MIT
Department of Brain and Cognitive Sciences
Computer Science and AI Lab (CSAIL)
Collaborators: Charles Kemp, Noah Goodman, Tom Griffiths, Vikash Mansinghka
How do people learn causal relations from data?
A standard answer: infer the network structure that best fits the statistics of the observed data.
[Figure: Structure inferred from Data.]
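To make the standard answer concrete, here is a minimal sketch (our illustration, not any specific published implementation) of Bayesian structure learning on three binary variables: every DAG is enumerated and scored by its Beta-Bernoulli marginal likelihood, and the best-scoring structure is kept. The toy data and variable indices are assumptions.

```python
# Minimal sketch: exhaustive Bayesian structure search over all DAGs
# on three binary variables, scoring each by marginal likelihood with
# Beta(1,1) priors on every conditional probability table entry.
from math import lgamma

def log_marginal(data, child, parents):
    """Beta(1,1)-Bernoulli marginal likelihood of `child` given `parents`."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        c = counts.setdefault(key, [0, 0])
        c[row[child]] += 1
    return sum(lgamma(n0 + 1) + lgamma(n1 + 1) - lgamma(n0 + n1 + 2)
               for n0, n1 in counts.values())

def acyclic(parents):
    """Check a parent-set assignment for cycles by peeling off root nodes."""
    remaining = dict(parents)
    while remaining:
        roots = [v for v, ps in remaining.items()
                 if not any(p in remaining for p in ps)]
        if not roots:
            return False
        for v in roots:
            del remaining[v]
    return True

variables = [0, 1, 2]
# Toy data in which 0 tends to cause 1, and 1 tends to cause 2.
data = [(0, 0, 0), (1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1),
        (0, 1, 1), (1, 1, 1), (0, 0, 0), (0, 0, 1), (1, 0, 0)]

all_edges = [(i, j) for i in variables for j in variables if i != j]
best_score, best_edges = float("-inf"), None
for mask in range(1 << len(all_edges)):            # every subset of edges
    edges = [e for k, e in enumerate(all_edges) if mask >> k & 1]
    parents = {v: tuple(i for i, j in edges if j == v) for v in variables}
    if not acyclic(parents):
        continue
    score = sum(log_marginal(data, v, parents[v]) for v in variables)
    if score > best_score:
        best_score, best_edges = score, edges

print("best structure:", best_edges, "log score: %.2f" % best_score)
```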
What’s missing from this account? The background knowledge that makes causal learning possible.
• Causal schemata: domain-specific theories that constrain “natural” causal hypotheses
  – Abstract classes of variables and mechanisms
  – Causal laws defined over these classes
• Causal variables: the substrate of causal hypotheses
  – Which variables are relevant
  – How variables ground out in perceptual and motor experience
The puzzle: this background knowledge must itself be learned, and learned together with specific causal relations. How?
A possible answer: hierarchical Bayesian models
Learning causal schemata
Three levels of a hierarchy: a causal schema generates a causal model, which generates event data.
[Figure: an example schema with classes Behaviors (high-fat diet, working in a factory, …) → Diseases (heart disease, lung cancer, …) → Symptoms (coughing, chest pain, …).]
(Griffiths & Tenenbaum; Kemp, Goodman, Tenenbaum)
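A minimal generative sketch (our illustration, with assumed class names, probabilities, and noisy-OR strengths) of this three-level hierarchy: the schema assigns variables to abstract classes and makes causal links "natural" only between certain classes; a causal model is a network sampled under the schema; event data are samples from that network.

```python
# Level 1, the schema: classes of variables plus class-level causal laws.
# Level 2: a causal model (network) sampled under the schema.
# Level 3: event data sampled from the network (noisy-OR mechanisms).
import random
random.seed(0)

classes = {"high-fat diet": "behavior", "factory work": "behavior",
           "heart disease": "disease",  "lung cancer":  "disease",
           "coughing":      "symptom",  "chest pain":   "symptom"}
# Schema: edge probability for each (cause class, effect class) pair;
# every other class pairing is simply not a candidate hypothesis.
edge_prob = {("behavior", "disease"): 0.5, ("disease", "symptom"): 0.5}

# Level 2: sample a causal model consistent with the schema.
edges = [(c, e) for c in classes for e in classes
         if random.random() < edge_prob.get((classes[c], classes[e]), 0.0)]

# Level 3: sample one event (assumed base rate and causal strength).
def sample_event(base=0.05, strength=0.8):
    vals = {v: random.random() < 0.3
            for v, k in classes.items() if k == "behavior"}
    for level in ("disease", "symptom"):
        for v in (x for x, k in classes.items() if k == level):
            p_off = 1 - base
            for cause, effect in edges:       # noisy-OR over active parents
                if effect == v and vals.get(cause, False):
                    p_off *= 1 - strength
            vals[v] = random.random() > p_off
    return vals

print("sampled model:", edges)
print("sampled event:", sample_event())
```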
Why a schema helps:
• Number of Bayes nets on 12 variables: 521,939,651,343,829,405,020,504,063
• Number of Bayes nets on 12 variables that fit this schema: 131,072
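The unconstrained count can be verified with Robinson's recurrence for the number of labeled DAGs; a short stdlib-only sketch:

```python
# Count labeled DAGs on n nodes with Robinson's recurrence:
#   a(n) = sum_{k=1..n} (-1)^(k+1) * C(n,k) * 2^(k(n-k)) * a(n-k),  a(0) = 1.
from math import comb

def count_dags(n):
    a = [1]                                  # a(0) = 1
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k)
                     * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

print(count_dags(12))   # 521939651343829405020504063
```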
[Figure: recovered model vs. number of samples (20, 80, 1000), comparing learning a causal model directly from data with jointly learning a causal schema and a causal model. (Mansinghka, Kemp, Tenenbaum, Griffiths)]
Towards more schema-based machine learning
[Figure: variables 1–16, … grouped into abstract classes c1, c2, …; schema-level structure is learned from limited data, the “blessing of abstraction.”]
Learning causal schemata (Kemp, Goodman, Tenenbaum)
[Figure: the causal-model level. Does a new kind of nut (macadamia) cause a rash? Known nuts: cashew, almond, chestnut, walnut. Event data: records of nuts eaten and rashes observed.]
[Figure: causal models with learned strengths. walnut → rash: +0.54; cashew → rash: +0.47; macadamia → rash: ?; links for almond and chestnut also shown. Event data: nut–rash records for each nut.]
[Figure: adding the causal-schema level. The nuts fall into two types: T1 (walnut, cashew, macadamia) with type-level strength near +0.5, and T2 (almond, chestnut). The schema lets the learner predict macadamia → rash ≈ +0.5 before much direct macadamia evidence is seen.]
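A hedged sketch of the schema-level inference behind these numbers (our own grid approximation, not the paper's implementation): each nut's strength is drawn from a type-level Beta distribution; observing walnut and cashew pins down the Type-1 mean, which then serves as the prediction for macadamia. The counts and the concentration parameter are illustrative.

```python
# Grid-approximate hierarchical inference: strengths of nuts of one type
# are drawn from Beta(mu*K, (1-mu)*K); infer the type mean mu from the
# observed nuts, then use it to predict a brand-new nut of that type.
from math import lgamma, exp

def betaln(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

K = 10.0                                  # assumed type-level concentration
GRID = [i / 100 for i in range(1, 100)]   # grid over the type mean mu

def log_lik(mu, rash, no_rash):
    """log P(counts | mu): the nut's own strength integrated out."""
    a, b = mu * K, (1 - mu) * K
    return betaln(a + rash, b + no_rash) - betaln(a, b)

# Illustrative (rash, no-rash) counts for the observed Type-1 nuts,
# chosen to give strengths near the +0.54 and +0.47 on the slide.
observed = [(54, 46),   # walnut
            (47, 53)]   # cashew

log_post = [sum(log_lik(mu, r, n) for r, n in observed) for mu in GRID]
m = max(log_post)
post = [exp(lp - m) for lp in log_post]
z = sum(post)
post = [p / z for p in post]

# Predictive strength for a new Type-1 nut (macadamia): E[mu | data].
print("predicted macadamia strength: %.2f" %
      sum(mu * p for mu, p in zip(GRID, post)))        # ~0.5
```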
Learning causal schemata: the “GO” machine
[Figure: objects o1–o8 are placed on a machine that sometimes activates. Four training conditions: (1) all eight objects belong to one type, activating at an intermediate rate; (2) the objects split into Type 1 and Type 2, one type weakly and the other strongly activating; (3) and (4) analogous conditions over objects o1–o6.]
One-shot learning: Design
• Training phase: one of the four conditions shown above (causal strengths near +0.5 in the uniform condition; near +0.1 and +0.9 for the two types in the split conditions).
• Test phase: a new object either activates the machine once or fails on the machine once.
One-shot learning: Results
[Figure: for each training condition, the model’s and people’s judgments of the new object’s causal strength, and of the likelihood of the test outcome, shown side by side.]
Question: what is the causal strength of the new object’s effect on the machine?
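A sketch of how a learned schema turns a single observation into a strong strength judgment in the two-type condition (our illustration; the type strengths and equal base rates are assumptions):

```python
# One-shot inference under a learned schema: two object types with
# known strengths; one activation (or failure) by a new object updates
# the posterior over its type, and the predicted strength is the
# posterior-weighted average of the type strengths.
types = {"T1": 0.1, "T2": 0.9}            # schema-level strengths (training)
prior = {"T1": 0.5, "T2": 0.5}            # assumed equal base rates

def predict(activated):
    lik = {t: s if activated else 1 - s for t, s in types.items()}
    z = sum(prior[t] * lik[t] for t in types)
    post = {t: prior[t] * lik[t] / z for t in types}
    return post, sum(post[t] * types[t] for t in types)

for outcome in (True, False):
    post, strength = predict(outcome)
    print("activated" if outcome else "failed",
          {t: round(p, 2) for t, p in post.items()},
          "-> predicted strength %.2f" % strength)
```

One activation pushes the type posterior to 0.9 for the strong type, so the predicted strength jumps to 0.82 after a single trial; one failure gives 0.18.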
What’s missing from this account?
• Causal schemata: domain-specific theories that constrain “natural” causal hypotheses
  – Abstract classes of variables and mechanisms
  – Causal laws defined over these classes
• Causal variables: the constituents of causal hypotheses
  – Which variables are relevant
  – How variables ground out in perceptual and motor experience
A possible answer: hierarchical Bayesian models (Kemp, Goodman, Griffiths, Tenenbaum).
The problem
A child learns that petting the cat leads to purring, while pounding leads to growling. But what are the origins of these symbolic event concepts (“variables”) over which causal links are defined?
• Option 1: Variables are innate.
• Option 2 (“clusters, then causes”): Variables are learned first, independently of causal relations, through a kind of bottom-up perceptual clustering.
• Option 3: Variables are learned together with causal relations (see the sketch below).
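A toy sketch of Option 3 (our illustration, not the paper's model): candidate definitions of a variable over raw sensors are scored jointly with how well the defined variable predicts the effect, so the variable and the causal relation are learned together. The sensors, candidate definitions, and data are all assumptions.

```python
# Score candidate variable *definitions* by the Bayesian marginal
# likelihood of the effect given the defined variable (Beta(1,1) prior).
from math import lgamma

def log_marginal(pairs):
    """Beta(1,1)-Bernoulli marginal likelihood of effect given variable."""
    counts = {}
    for v, e in pairs:
        c = counts.setdefault(v, [0, 0])
        c[e] += 1
    return sum(lgamma(n0 + 1) + lgamma(n1 + 1) - lgamma(n0 + n1 + 2)
               for n0, n1 in counts.values())

# Raw data: (sensor1, sensor2, effect); the effect tracks s1 AND s2.
data = [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 0), (1, 1, 1),
        (1, 0, 0), (0, 1, 0), (1, 1, 1), (0, 0, 0), (1, 1, 1)]

candidates = {
    "V = s1":        lambda s1, s2: s1,
    "V = s2":        lambda s1, s2: s2,
    "V = s1 AND s2": lambda s1, s2: s1 & s2,
    "V = s1 OR s2":  lambda s1, s2: s1 | s2,
}
for name, f in candidates.items():
    score = log_marginal([(f(s1, s2), e) for s1, s2, e in data])
    print("%-14s log score %.2f" % (name, score))
# The AND definition wins: the variable is carved out because it
# supports a clean causal relation, not from sensor statistics alone.
```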
Learning grounded causal models (Goodman, Mansinghka & Tenenbaum)
[Figure: hypotheses map raw data, panel states at time t and time t′, onto candidate variables A, B, C and the causal links among them.]
“Alien control panel” experiment
[Figure: human judgments (blue bars) vs. model judgments (red bars) across panel configurations A, B, C, another instance of the “blessing of abstraction.”]
Testing joint model vs. bottom-up model
[Figure: how many variables are discovered? Proportions choosing three variables (blue bars) vs. four variables (red bars), shown for humans, the joint model, and the bottom-up model.]
Conclusions
• Hierarchical Bayesian models (HBMs) explain how the background knowledge that supports causal learning may itself be learned from data through rational inferential means:
  – Domain-specific schemata constraining candidate causal networks
  – Causal variables grounded in sensorimotor experience
• These issues are more general than causal learning alone; they are relevant to learning associations, symbolic rules, …
• Contrast with traditional approaches to knowledge acquisition:
  – Classical empiricism: variables are innate; schemata are learned slowly by accretion and superposition.
  – Classical nativism: variables and schemata are both innate.
  – Hierarchical Bayes: variables and schemata can both be learned, and abstract knowledge may be learned from surprisingly little data.
• Ongoing and future work: applying HBMs to many different aspects of cognitive development – categories and properties, word learning, syntax in language, social relations, theory of mind, …
“Alien control panel” experiment: modeling learning curves
[Figure: human (blue bars) vs. model (red bars) learning curves for panel configurations A, B, C.]