no slide title - mit opencourseware...- 2 premises 0.90 0.67 n/a (pos. and neg.) property type...

35
Outline Theory-based Bayesian framework for property induction Causal structure induction – Constraint-based (bottom-up) learning – Theory-based Bayesian learning

Upload: others

Post on 05-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Outline

• Theory-based Bayesian framework for property induction

• Causal structure induction – Constraint-based (bottom-up) learning – Theory-based Bayesian learning

Page 2: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

The Bayesian approach

?

Species 1 Species 2 ? Species 3 ? Species 4 ? Species 5 ? Species 6 ? Species 7 ? Species 8 ? Species 9 Species 10 ?

Features New property

Page 3: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

x

x

CowMutation process generates p(h|T): Horse – Choose label for root. Rhino

Gorilla– Probability that label mutates Chimp

1− e−2λ balong branch b : Dolphin Seal2

λ = mutation rate T|b| = length of branch b p(h|T)

h dCowHorse ?

?Rhino ?Gorilla

ChimpDolphin ?

?Seal

Features Generalization New property Hypothesis

Page 4: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Results

"Theory-based" Bayes

Images removed due to copyright considerations.

"Empiricist" Bayes

Max-sim

Page 5: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

A Bayesian dreamPrior based on mutations over tree structure

addresses all the challenges to traditional Bayesian concept learning (Mitchell, Tenenbaum, etc.) – Assign a reasonable prior over all logically

possible concepts (labelings) in a potentially unbounded domain, with natural Occam’s razor.

– Efficiently integrate over all logically possible concepts consistent with the training data.

– Robust with respect to label noise. – PAC-style guarantees of generalization.

Page 6: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Bayes with alternative theories• Taxonomic Bayes (strictly taxonomic

hypotheses, with no mutation process)

Page 7: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Results

Bias is Theory-based

Bayes just

right!

Taxonomic Bayes Images removed due to

copyright considerations.

Bias is too

strong

Bias is “Empiricist” too

Bayes weak

Page 8: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

p

“all mammals”

a e w nt o nele am rs n hiill o usar o eo h ihi c irrh pu sh op rc olg qme dsel

Cows have property P. Seals have property P. Dolphins have property P. Dolphins have property P. Squirrels have property P. Squirrels have property P.

All mammals have property P. All mammals have property P.

Strong: 0.76 [max = 0.82] Weak: 0.30 [min = 0.14] l

Page 9: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Bayes with alternative theories• Taxonomic Bayes (strictly taxonomic

hypotheses, with no mutation process) • Theory-based Bayes using actual

evolutionary tree.

Page 10: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Results

Theory-basedBayes

Theory-based Bayes w/

evolutionary tree

“Empiricist”Bayes

Images removed due tocopyright considerations.

Bias is just

right!

Bias is wrong

Bias is too

weak

Page 11: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Bayes with alternative theories• Taxonomic Bayes (strictly taxonomic

hypotheses, with no mutation process) • Theory-based Bayes using actual

evolutionary tree. • Replace mutation process with generic

“Occam’s Razor” prior over branches of tree: – e.g., p(feature changes along branch b) = λ, independent of branch length.

Page 12: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Bayes(taxonomy+mutation)

Bayes(taxonomy+

Occam)

Max-sim

Conclusion kind:

Number of

Images removed due to copyright considerations.

“all mammals”

Premise typicality effect (Rips, 1975; Osherson et al., 1990):

Strong:

Horses have property P.

All mammals have property P.

Weak:

Seals have property P.

All mammals have property P.

examples: 1

Page 13: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Typicality meets hierarchies•

hierarchically Collins and Quillian: semantic memory structured

Figure of semantic trees from Quillian (1968). Quillian, M. R. "Semantic Memory." In Semantic Information Processing. Edited by M. Minsky. Cambridge, MA: MIT Press,1968, pp. 216-270. Courtesy of the MIT Press. Used with permission.

• Traditional story: Simple hierarchical structure uncomfortable with typicality effects & exceptions.

• New story: Typicality & exceptions compatible with rational statistical inference over hierarchy.

Page 14: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Bayes with alternative theories• Taxonomic Bayes (strictly taxonomic

hypotheses, with no mutation process) • Theory-based Bayes using actual

evolutionary tree. • Replace mutation process with generic

“Occam’s Razor” prior over branches of tree.

• Infinite flat mixture model (essentially, Anderson’s model of categorization)

Page 15: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Beaver Otter Rat

Weasel Raccoon

Chihuahua Persian Cat Siamese Cat

Collie German Shepherd

Lion Tiger

Leopard Wolf

Bobcat Fox

Polar Bear Grizzly Bear

Dalmatian

Best Cluster Structure

Gorilla Chimp

Monkey Bat

Rabbit Mouse

Hamster Mole Skunk

Squirrel

Cow Pig Ox

Sheep Buffalo Moose Horse Zebra

Antelope Deer

Giraffe Rhinoceros

Elephant Hippopotamus

Dolphin Seal

Humpback Whale Blue Whale

Walrus Killer Whale

Giant Panda

Page 16: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Results with flat mixture modelCows can catch Disease X Rhinos can catch Disease X

Horses can catch Disease X

Gorillas can catch Disease X Mice can catch Disease X Seals can catch Disease X

All mammals can catch Disease X

Specific

Model

General

Peop

le

Peop

le

Model

Page 17: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Results with flat mixture modelPersian Cats have property X

Otters have property X

Conclusion Animal

Arg

umen

t Stre

ngth

Persian Cats have property X

Blue whales have property X

Persian Cats have property X

Pigs have property X

Page 18: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Beyond similarity-based induction

• Reasoning based on known

Poodles can bite through wire.

German shepherds can bite through wire.

Dobermans can bite through wire.dimensions: (Smith et al., 1993) German shepherds can bite through wire.

Page 19: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Beyond similarity-based induction

• Reasoning based on known

Poodles can bite through wire.

German shepherds can bite through wire.

Dobermans can bite through wire.dimensions: (Smith et al., 1993) German shepherds can bite through wire.

• Reasoning based on causal relations:

Salmon carry E. Spirus bacteria.

Grizzly bears carry E. Spirus bacteria.

(Medin et al., Grizzly bears carry E. Spirus bacteria. 2004; Coley & Salmon carry E. Spirus bacteria.Shafto, 2003)

Page 20: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Property type “has T4 neurons” “can bite through wire” “carry E. Spirus bacteria”

Theory type taxonomic tree directed chain directed network + mutation + random threshold + noisy transmission

Class D

Class A

Class B

Class DClass A Class B Class A

Class F Class EClass C Class D Class C Class C Class E Class GClass EClass F Class B Class G

Class FHypotheses Class G

Class AClass BClass C

. . . . . .. . .Class DClass EClass FClass G

Page 21: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Property type “can bite through wire”

Theory type directed chain + random threshold

Class D

Class A Class F

Class C

Class E Class B

Class G

Reasoning based on known dimensions (Smith et al., 1993):

Poodles can bite through wire.

German shepherds can bite through wire.

Dobermans can bite through wire.

German shepherds can bite through wire.

Page 22: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Models Bayes Bayes Sim.-Datasets (chain) (tree) Cover. Smith et al. (1993):

- night vision - thick skin

Blok et al. (2002):

- 1 premise - 2 premises - 1 premise (pos. and neg.)

- 2 premises (pos. and neg.)

Page 23: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

BayesModels Bayes Sim.-(chain)Datasets (tree) Cover.

Smith et al. (1993):

- night vision - thick skin

r = 0.84 0.94

Blok et al. (2002):

- 1 premise 0.97 - 2 premises 0.98 - 1 premise 0.91 (pos. and neg.)

- 2 premises 0.90 (pos. and neg.)

Page 24: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Models Bayes Bayes Sim.-Datasets (chain) (tree) Cover. Smith et al. (1993):

- night vision - thick skin

r = 0.84 0.94

0.49 0.32

0.51 0.27

Blok et al. (2002):

- 1 premise 0.97 0.07 0.32 - 2 premises 0.98 0.47 0.47 - 1 premise 0.91 0.46 N/A (pos. and neg.)

- 2 premises 0.90 0.67 N/A (pos. and neg.)

Page 25: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Property type “carry E. Spirus bacteria”

Theory typedirected network+ noisy transmission

Class A

Class B

Class D

Class E

Class C

Class G

Class F

Reasoning based on causal relations (Medin et al., 2004; Coley & Shafto, 2003):

Salmon carry E. Spirus bacteria.

Grizzly bears carry E. Spirus bacteria.

Grizzly bears carry E. Spirus bacteria.

Salmon carry E. Spirus bacteria.

Page 26: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Property type “carry E. Spirus bacteria”

Theory typedirected network+ noisy transmission

Class A

Class B

Class D

Class E

Class C

Class G

Class F

Experiment w/ Pat Shafto, Liz Baraff & John Coley:

• Participants taught two systems of relations: – Food web – Taxonomic tree

• Asked to reason about two kinds of properties:– Diseases

– Genetic properties • Two different ecosystems:

– Mammals, Island

Page 27: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

HerringIsland ecosystemTuna Mako shark Sand shark

Taxonomy

Dolphin

Human

Kelp

Food web Sand shark

Human Mako shark Tuna Herring Kelp

Dolphin

Page 28: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Models Bayes Bayes Sim.-Datasets (food web) (tree) Cover. Mammalecosystem:- disease

- genetic property

Island ecosystem:- disease

- geneticproperty

Page 29: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Models Bayes Bayes Sim.-Datasets (food web) (tree) Cover. Mammalecosystem:- disease r = 0.75 -0.15 0.07 - genetic 0.25 0.92 0.87 property

Island ecosystem:- disease 0.79 0.01 0.17 - genetic 0.31 0.89 0.86 property

Page 30: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Conclusions• Beyond classic dichotomies of “domain-specific vs.

domain-general”, or “structured theories vs. statistical learning”. – Bayes provides a powerful domain-general statistical engine for

generalizing reliably from limited data. – Theories generate structured domain-specific priors that provide

crucial constraints for Bayesian induction.

• Advantages of Theory-based Bayesian models:– Strong quantitative models of generalization behavior, with

minimal free parameters or arbitrary assumptions. – Flexibility to model different patterns of reasoning that arise

with different kinds of properties, using differently structured theories (but the same general-purpose Bayesian engine).

– Framework for explaining why inductive generalization works.

Page 31: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Theory-based Bayesian framework

• The big picture. – What do we mean by “theory”?

Page 32: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

T1 theory (c.f. theory type, structure grammar, “framework theory”)taxonomic tree directed chain directed network+ mutation + random threshold + noisy transmission

T0 theory (c.f. structure, “specific theory”) Class A Class D

Class A

Class B

Class D Class B Class A

Class C Class F Class E Class D Class C Class C Class E Class GClass EClass F

Class B Class G

Class F Properties Class G

Class AClass BClass C

. . . . . .. . .Class DClass EClass FClass G

Page 33: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Theory-based Bayesian framework

• The big questions:– How are new properties learned, guided by a

T0 theory? – How is a T0 theory learned, guided by a T1

level theory? – How are T1 theories learned?

Page 34: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Theory-based Bayesian framework

• The big questions:– How does a T0 theory generate a hypothesis

space of properties? – How does a T1 theory generate a hypothesis

space of T0 theories? – What does the hypothesis space of T1 theories

look like? (i.e., what are the T2 and higher-level theories?)

Page 35: No Slide Title - MIT OpenCourseWare...- 2 premises 0.90 0.67 N/A (pos. and neg.) Property type “carry E. Spirus bacteria” Theory type directed network + noisy transmission Class

Theory-based Bayesian framework

• The big questions:– How do we figure out which theory to use for

which properties? – What structures and relations exist between

properties? • Clusters, hierarchies

• Ordered dimensions • Causal networks

– How do structures over properties relate to structures over classes?