
Page 1: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Peter Clark, Vulcan Inc

August 2013

Page 2: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Outline

1. Halo: The Goal and Road Travelled… AURA, Inquire, and reflections

2. Exploiting Semi-Formal Representations and Textual Inference

3. A New Challenge: Fourth-Grade Science Tests

Page 3: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Overall Goals

Long-Term Goal: The Digital Aristotle
  Have large volumes of knowledge encoded in a computable form, such that the computer can answer questions, explain its answers, and ultimately dialog with users about the subject matter

History
• Halo Pilot: Assess representation & reasoning technologies
    Formal reasoning works, but acquisition and language are problems
• Halo: Develop high-performance acquisition tool (AURA)
• HaloBook (2010-12): Aim to encode much of a textbook
    Inquire: An iPad app – the knowledgeable book
• Halo 2.0: Reorient towards semi-automated acquisition
    focus on taking K-12 science exams
    “Explainable Reasoning”

Page 4: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

…Eukaryotic cells similarly have a plasma membrane, but also contain a cell nucleus that houses the eukaryotic cell's DNA…

∀x isa(x, Eukaryotic-cell) → ∃p,n,d isa(p, Plasma-membrane) ∧ isa(n, Nucleus) ∧ isa(d, DNA) ∧ has-part(x, p) ∧ has-part(x, n) ∧ has-part(x, d) ∧ is-inside(d, n)

Logic (Internal View)

Concept Map (User View)

The Knowledge Encoding Process

Page 5: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

The Knowledge Encoding Process

Page 6: Towards a Knowledgeable Machine that can Pass an Elementary Science Test
Page 7: Towards a Knowledgeable Machine that can Pass an Elementary Science Test
Page 8: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

EukaryoticCell
Parts:
• Plasma membrane
• Nucleus
• DNA

PlantCell
Parts:
• Plasma membrane
• Cell wall
• Chloroplast

Reasoning: Deductive elaboration of the graph using other graphs and commonsense rules

PlantCell
Parts:
• Plasma membrane
• Cell wall
• Chloroplast
• Nucleus
• DNA
(more)
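The elaboration step on this slide can be illustrated with a minimal Python sketch. It assumes a single-parent "isa" link from PlantCell to EukaryoticCell and plain part lists; the names `ISA`, `PARTS`, and `elaborated_parts` are hypothetical and are not AURA's actual data structures or reasoner.

```python
# Toy taxonomy and part lists copied from the slide (assumed structure).
ISA = {"PlantCell": "EukaryoticCell"}
PARTS = {
    "EukaryoticCell": ["Plasma membrane", "Nucleus", "DNA"],
    "PlantCell": ["Plasma membrane", "Cell wall", "Chloroplast"],
}

def elaborated_parts(concept):
    """Collect a concept's own parts, then parts inherited from its ancestors."""
    parts = []
    while concept is not None:
        for part in PARTS.get(concept, []):
            if part not in parts:  # skip parts already listed lower down
                parts.append(part)
        concept = ISA.get(concept)  # climb one level; None at the root
    return parts

print(elaborated_parts("PlantCell"))
```

Inheritance is only one of the "other graphs and commonsense rules" the slide mentions; a real reasoner would also handle exceptions and multiple parents.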

Page 9: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Question Answering

Typical examples of questions the system can answer:
• During mitosis, when does the cell plate begin to form?
• What happens during DNA replication?
• What is the relationship between photosynthesis and cellular respiration?
• What do ribosomes do?
• During synapsis, when are chromatids exchanged?
• What are the differences between eukaryotic cells and prokaryotic cells?
• How many chromosomes are in a human cell?
• In which phase of mitosis does the cell divide?
• What is the structure of a plasma membrane?

Page 10: Towards a Knowledgeable Machine that can Pass an Elementary Science Test
Page 11: Towards a Knowledgeable Machine that can Pass an Elementary Science Test
Page 12: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Outcomes

The good…
• Experiments suggested Inquire is educationally useful
• Some question classes answered well
• “Suggested question” mechanism helped a lot

The bad…
• Only covered ~25% of the book after 2 years
• Deductive question-answering somewhat hit-and-miss

Page 13: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

The Dilemma of Knowledge Engineering
Manual methods are expensive, automatic methods are shallow

It’s not that manually constructed rulebases are “bad”, but:
• Expensive (of course, costs may be brought down)
• Brittle (unless the task is very tightly constrained)
• Never seem to be finished (permanently incomplete)…

Textual Inference / Semi-Formal Representations:
• Create language-based representations from (lots of) text
    include words/phrases – deferred ontological commitment
• Imprecise, shallower reasoning
    an evidential process, using multiple sources of evidence

Page 14: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Outline

1. Halo: The Goal and Road Travelled… AURA, Inquire, and reflections

2. Exploiting Semi-Formal Representations and Textual Inference

3. A New Challenge: Fourth-Grade Science Tests

Page 15: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Levels of Formality

? ?- has-part(ribosome,?x).

Text Logic

Query

Semi-Formal

Logical entailment

Textual entailment

Page 16: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

1. Representation

Sentence
"Channel proteins facilitate the passage of molecules across the membrane."

Parse
*S:-17
  NP:-3
    N^:-2
      N:-2
        N:-1 CHANNEL
        N:0 PROTEINS
  VP:-13
    V:0 FACILITATE
    *NP:-12*
      NP:-8
        NP:-1
          DET:0 THE
          N^:0
            N:0 PASSAGE
        PP:-2
          P:0 OF
          NP:-1
            N^:0
              N:0 MOLECULES
      PP:-2
        P:0 ACROSS
        NP:-1
          DET:0 THE
          N^:0
            N:0 MEMBRANE

Logical Form
subject(facilitate-1, channel-protein-1).
object(facilitate-1, passage-1).
of(passage-1, molecule-1).
across(passage-1, membrane-1).

Graph
“facilitate” –subj→ “channel protein”
“facilitate” –obj→ “passage”
“passage” –of→ “molecule”
“passage” –across→ “membrane”
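As a rough illustration of how the logical form on this slide maps onto a labeled graph, the following Python sketch parses the four facts into triples and looks up a node's outgoing edges. The regular expression and the helper names `parse_facts` and `outgoing` are assumptions made for this example, not part of the system described.

```python
import re

# The logical form from the slide, one fact per line.
LOGICAL_FORM = """
subject(facilitate-1, channel-protein-1).
object(facilitate-1, passage-1).
of(passage-1, molecule-1).
across(passage-1, membrane-1).
"""

# Matches facts of the shape relation(head, dependent).
FACT = re.compile(r"([\w-]+)\(([\w-]+),\s*([\w-]+)\)\.")

def parse_facts(text):
    """Return (relation, head, dependent) triples."""
    return [m.groups() for m in FACT.finditer(text)]

def outgoing(triples, node):
    """Edges leaving a node, as {relation: dependent}."""
    return {rel: dep for rel, head, dep in triples if head == node}

triples = parse_facts(LOGICAL_FORM)
print(outgoing(triples, "passage-1"))
```

Keeping the words themselves as node labels, rather than mapping them to ontology concepts, is what the talk calls deferred ontological commitment.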

Page 17: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

2. Textual Inference
• Reasoning with semi-formal structures
• Find sequence of transformations from text to question
• Requires general lexical and world knowledge

Channel proteins facilitate the passage of molecules across the membrane.

Knowledge resources
  IF X facilitates Y THEN X helps Y
  “passage”(n) → “move”(v)
  “through” ↔ “across”

Which proteins help move molecules through the membrane?

A. Channel proteins
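The idea of finding a sequence of transformations from text to question can be sketched with plain string rewriting. This is a deliberately crude stand-in: the system in the talk operates on semi-formal structures, not raw strings, and the rule list and function names below are hypothetical.

```python
# Rewrite rules mirroring the three knowledge-resource entries on the slide.
RULES = [
    ("facilitate", "help"),       # IF X facilitates Y THEN X helps Y
    ("the passage of", "move"),   # "passage"(n) -> "move"(v)
    ("across", "through"),        # "through" <-> "across"
]

def transform(sentence, rules=RULES):
    """Apply each rewrite in turn, recording the chain of intermediate texts."""
    chain = [sentence]
    for old, new in rules:
        if old in chain[-1]:
            chain.append(chain[-1].replace(old, new))
    return chain

def answer(question_body, sentence):
    """Return what precedes question_body in some rewrite of the sentence."""
    for text in transform(sentence):
        if text.endswith(question_body):
            return text[: -len(question_body)].strip()
    return None

text = "channel proteins facilitate the passage of molecules across the membrane"
for step in transform(text):
    print(step)
print(answer("help move molecules through the membrane", text))
```

The recorded chain doubles as a simple explanation of how the answer was reached.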

Page 18: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

2. Textual Inference

Which proteins help move molecules through the membrane?

1. (simple) question decomposition
  What ?x help move molecules through the membrane?
  Is ?x a protein?

2a. textual entailment
  Channel proteins facilitate the passage of molecules across the membrane.
    IF X “facilitates” Y THEN X “helps” Y
  Channel proteins help the passage of molecules across the membrane.
    “passage”(n) → “move”(v), “through” ↔ “across”
  Channel proteins help move molecules through the membrane.
  What ?x help move molecules through the membrane?
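The "(simple) question decomposition" step can be sketched as a single pattern over "Which <type> <rest>?" questions. The pattern, the naive singularization, and the name `decompose` are assumptions for illustration; the actual system's question interpretation is not described in the slides.

```python
import re

def decompose(question):
    """Split 'Which <type> <rest>?' into an open query and a type check."""
    m = re.fullmatch(r"Which (\w+) (.+)\?", question)
    if m is None:
        return None
    noun, rest = m.groups()
    # Naive singularization (assumption): strip a trailing "s".
    singular = noun[:-1] if noun.endswith("s") else noun
    return (f"What ?x {rest}?", f"Is ?x a {singular}?")

print(decompose("Which proteins help move molecules through the membrane?"))
```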

Page 19: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

2. Textual Inference

Which proteins help move molecules through the membrane?

1. (simple) question decomposition
  What ?x help move molecules through the membrane?
  Is ?x a protein?

2a. textual entailment
  Is an evidence-gathering process
  Channel proteins facilitate the passage of molecules across the membrane.
    IF X “facilitates” Y THEN X “helps” Y
  Channel proteins help the passage of molecules across the membrane.
    “passage”(n) → “move”(v), “through” ↔ “across”
  Channel proteins help move molecules through the membrane.
  What ?x help move molecules through the membrane?

Page 20: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

2. Textual Inference

Channel proteins facilitate the passage of molecules across the membrane.

Channel proteins help the passage of molecules across the membrane.

What evidence can I find that “X facilitates Y” → “X helps Y”?

WordNet
PPDB (Johns Hopkins)
DIRT paraphrases
BioKB-101 ontology

12M rules, 30k rules, 4M rules, 146k rules

Page 24: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Domain-Biased Paraphrases (Johns Hopkins)
Paraphrases learned via bilingual pivoting, and rescored using distributional similarity.
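For readers unfamiliar with bilingual pivoting, the standard formulation scores a paraphrase e2 of e1 as p(e2 | e1) = Σ_f p(f | e1) · p(e2 | f), summing over foreign phrases f that both translate to. The Python sketch below shows only that pivot step; the translation tables and probabilities are invented toy values, and the distributional-similarity rescoring mentioned on the slide is not modeled.

```python
from collections import defaultdict

def pivot_paraphrases(e_to_f, f_to_e, phrase):
    """p(e2 | e1) = sum over foreign phrases f of p(f | e1) * p(e2 | f)."""
    scores = defaultdict(float)
    for f, p_f in e_to_f.get(phrase, {}).items():
        for e2, p_e2 in f_to_e.get(f, {}).items():
            if e2 != phrase:  # a phrase is not its own paraphrase
                scores[e2] += p_f * p_e2
    return dict(scores)

# Invented toy translation tables (illustrative probabilities only).
E_TO_F = {"facilitate": {"faciliter": 0.75, "aider": 0.25}}
F_TO_E = {
    "faciliter": {"facilitate": 0.5, "ease": 0.5},
    "aider": {"help": 1.0},
}

print(pivot_paraphrases(E_TO_F, F_TO_E, "facilitate"))
```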

Page 25: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Some examples from PPDB

amplify   elevate      0.993
amplify   explore      0.992
amplify   enhance      0.984
amplify   speed up     0.984
amplify   strengthen   0.982
amplify   improve      0.982
amplify   magnify      0.98
amplify   extend       0.978
amplify   accept       0.97
amplify   follow       0.965
amplify   carry out    0.965
amplify   broaden      0.962
amplify   go into      0.962
amplify   promote      0.959
amplify   explain      0.955
amplify   implement    0.951
amplify   leave        0.944
amplify   adopt        0.944
amplify   acquire      0.942
amplify   expand       0.942
…         …            …

travel    fly          0.893
travel    roll over    0.882
travel    relax        0.87
travel    freeze       0.861
travel    breathe      0.861
travel    swim         0.858
travel    move         0.855
travel    die          0.848
travel    swell        0.845
travel    switch       0.842
travel    consumers    0.838
travel    bend         0.835
travel    walk         0.835
travel    paint        0.828
travel    work         0.828
travel    move over    0.825
travel    feed         0.825
travel    evolve       0.825
travel    survive      0.821
…         …            …

???

???

Page 26: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Performance
• Currently, 3 databases of semi-formal representations
• Current F1 ≈ 30% (e.g., 50% on 10% of questions)
• Answer = weighted sum of evidence
• Learn the weights (via simulated annealing)
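"Answer = weighted sum of evidence" with weights learned by simulated annealing might look roughly like the Python sketch below. Everything specific here is an assumption: the objective (plain accuracy rather than F1), the Gaussian proposal, the linear cooling schedule, and the two toy training questions.

```python
import math
import random

def score(weights, evidence):
    """Answer confidence as a weighted sum of per-resource evidence."""
    return sum(w * e for w, e in zip(weights, evidence))

def accuracy(weights, examples):
    """Fraction of questions whose top-scoring candidate is the correct one."""
    hits = 0
    for candidates, correct in examples:
        best = max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))
        hits += best == correct
    return hits / len(examples)

def anneal(examples, n_weights, steps=2000, seed=0):
    """Simulated annealing over the weight vector (assumed schedule)."""
    rng = random.Random(seed)
    current = [rng.random() for _ in range(n_weights)]
    cur_acc = accuracy(current, examples)
    best, best_acc = current, cur_acc
    for step in range(steps):
        temp = max(1e-3, 1.0 - step / steps)  # linear cooling
        proposal = [w + rng.gauss(0, 0.1) for w in current]
        acc = accuracy(proposal, examples)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if acc >= cur_acc or rng.random() < math.exp((acc - cur_acc) / temp):
            current, cur_acc = proposal, acc
        if cur_acc > best_acc:
            best, best_acc = current, cur_acc
    return best, best_acc

# Toy data: each question is (evidence vectors per candidate, index of correct one).
EXAMPLES = [
    ([[0.9, 0.1], [0.2, 0.8]], 0),
    ([[0.3, 0.9], [0.7, 0.2]], 1),
]
weights, acc = anneal(EXAMPLES, n_weights=2)
print(weights, acc)
```

Annealing suits this setting because ranking accuracy is piecewise constant in the weights, so gradient-based fitting has nothing to follow.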

Page 27: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Performance

Page 28: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Levels of Formality

? ?- has-part(ribosome,?x).

Text Logic

Query

Semi-Formal

Page 29: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Levels of Formality

? ?- has-part(ribosome,?x).

Text Logic

Query

Semi-Formal

What should go in here?

Page 30: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Outline

1. Halo: The Goal and Road Travelled… AURA, Inquire, and reflections

2. Exploiting Semi-Formal Representations and Textual Inference

3. A New Challenge: Fourth-Grade Science Tests

Page 31: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

K-12 Grade Science Tests
• Provide a (task-oriented) focus
• Simpler (question) language
• Involves more common sense
• Wide variety of question types and difficulties

Caveats
• Multiple-choice questions are common
• Diagrams are common

Page 32: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

The 4th Grade NY Regents’ Science Exam
• What types of questions are there?
• What would it take to answer them?

Page 34: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

The 4th Grade NY Regents’ Science Exam
• What types of questions are there?
• What would it take to answer them?

“Retrieval”

Page 35: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

1. Taxonomic

Question interpretation: Decompose question into “isa” queries

Several good sources of simple “isa” knowledge
• WordNet, Cyc, Wikipedia
• Within text itself

“isa” knowledge is fundamental to other reasoning types
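Decomposing a taxonomic question into "isa" queries can be sketched as reachability over isa links. The facts below are hypothetical examples of what WordNet or Wikipedia might supply, and the function names are invented for this illustration.

```python
# Hypothetical "isa" facts (child -> set of direct parents).
ISA = {
    "robin": {"bird"},
    "bird": {"animal"},
    "oak": {"tree"},
    "tree": {"plant"},
}

def isa(x, y, facts=ISA):
    """True if y is reachable from x by following isa links."""
    frontier, seen = [x], set()
    while frontier:
        node = frontier.pop()
        if node == y:
            return True
        if node not in seen:
            seen.add(node)
            frontier.extend(facts.get(node, ()))
    return False

def which_is_a(options, category):
    """Answer 'Which of these is a <category>?' with one isa query per option."""
    return [o for o in options if isa(o, category)]

print(which_is_a(["robin", "oak"], "animal"))
```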

Page 36: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

2. Definitions

erosion: The process of being eroded by wind, water, or other natural agents.
erosion: The wearing away of rocks and other deposits on the earth's surface …
erosion: The gradual wearing away of land surface materials, especially rocks, …

Dictionary Resources

Page 37: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

2. Definitions

erosion: The process of being eroded by wind, water, or other natural agents.
erosion: The wearing away of rocks and other deposits on the earth's surface …
erosion: The gradual wearing away of land surface materials, especially rocks, …

the movement of soil by wind or water

The gradual wearing away of land surface materials, especially rocks, sediments, and soils, by the action of water, wind, or a glacier.

Entailment-Style Reasoning

Dictionary Resources
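A very crude stand-in for the entailment-style matching on this slide is content-word overlap between the phrase in the question and each dictionary definition. The stop-word list, the plural-stripping heuristic, and the function names are assumptions; the second definition is used in its truncated form as it appears on the slide.

```python
import re

STOP = {"the", "of", "by", "or", "and", "a", "on", "other", "being", "especially"}

def content_words(text):
    """Lowercased content words with a naive plural strip (assumed heuristic)."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:-1] if len(w) > 3 and w.endswith("s") else w
            for w in words if w not in STOP}

def overlap(phrase, definition):
    """Number of content words the phrase and definition share."""
    return len(content_words(phrase) & content_words(definition))

DEFINITIONS = [
    "The process of being eroded by wind, water, or other natural agents.",
    "The wearing away of rocks and other deposits on the earth's surface",
    "The gradual wearing away of land surface materials, especially rocks, "
    "sediments, and soils, by the action of water, wind, or a glacier.",
]
PHRASE = "the movement of soil by wind or water"

scores = [overlap(PHRASE, d) for d in DEFINITIONS]
best = max(range(len(DEFINITIONS)), key=lambda i: scores[i])
print(scores, DEFINITIONS[best])
```

Overlap misses the step from "wearing away" to "movement", which is the kind of gap the lexical and paraphrase resources are meant to close.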

Page 38: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

3. Basic Facts

“Semantic Databases”
• Some basic facts can be pre-extracted and cleaned
    parts, functions, steps in a process, etc.
• + existing resources have some of this knowledge

Page 39: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Building Semantic Databases…

[Pipeline diagram]
Known parts (AURA, WordNet, LOD): good “parts” relations
  → (training data) Sentences expressing those relations
  → MultiR (Univ Washington)
  → Classifier

Text
  → candidate pair, e.g., “plant cell” has-part “chloroplast”?
  → Classifier
  → Decision (yes/no + confidence)
  → Final parts database

Iterate, + Human/machine validation

Example: has-part(Leaf,Stomata)
“Stomata in a leaf's surface lead to a maze of internal air spaces”

Page 40: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

The 4th Grade NY Regents’ Science Exam
• What types of questions are there?
• What would it take to answer them?

“Inference”

Page 41: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

4. “Rules” (simple inference)

Many questions require simple, one-step entailments
• X eats → X gets nutrients
• X breathes oxygen –enables→ X make energy
• X made of metal → X conducts electricity

Large number of such facts and rules needed
• Manually enter them? Induce them? Just read them?

Via:
• Judicious forms of text
• Good NLP
• Manual validation

Page 42: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

4. Knowledge (Rule) Extraction from Text

Animals take in air by breathing. They need oxygen, which is in the air. Oxygen allows the animal to make and use energy, which it needs to survive. Animals also need water to survive. Water is used to break down and move materials throughout the body. Animals cannot make their own food so they must eat to get nutrients. Nutrients are necessary for growth and energy.

Assertions
• air contains oxygen
• animals need oxygen
• animals need energy
• animals need water

Implications
• animal breathes → animal takes in air
• animal breathes oxygen -enables→ animal make energy
• animal eat -enables→ animal get nutrients
• animal get nutrients -enables→ animal grow
• animal has water -enables→ animal break down materials
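Applying implications like these can be sketched as forward chaining with exact string matching on the premise. Two simplifying assumptions are made: "→" and "-enables→" are treated alike, and a premise must match a known fact verbatim, whereas the talk uses textual entailment to decide whether a condition holds.

```python
# Implications transcribed from the slide as (premise, conclusion) pairs.
RULES = [
    ("animal breathes", "animal takes in air"),
    ("animal breathes oxygen", "animal make energy"),
    ("animal eat", "animal get nutrients"),
    ("animal get nutrients", "animal grow"),
    ("animal has water", "animal break down materials"),
]

def infer(facts, rules=RULES, max_steps=1):
    """Apply every rule whose premise is known, for up to max_steps rounds."""
    known = set(facts)
    for _ in range(max_steps):
        new = {then for cond, then in rules if cond in known} - known
        if not new:
            break
        known |= new
    return known

print(infer({"animal eat"}))               # one-step entailment
print(infer({"animal eat"}, max_steps=2))  # chains through "get nutrients"
```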

Page 43: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

4. Knowledge (Rule) Extraction from Text

Rule acquisition: specific patterns in text
  “X Ys by Z” → IF X Zs THEN X Ys
  “Animals take in air by breathing.” → IF an animal breathes THEN an animal takes in air

Rule application: using textual entailment-style inference
  If rule condition entailed, then infer conclusion

Current status: Pretty noisy rules!
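The "X Ys by Z" pattern can be sketched with a regular expression over single-word subjects and gerund endings. This is an assumed surface-level approximation: the system in the talk works over parsed text, and the sketch leaves the gerund ("breathing") as is rather than producing "breathes".

```python
import re

# "X Ys by Z-ing": one-word subject, any predicate, then a gerund.
PATTERN = re.compile(r"(\w+) (.+?) by (\w+ing)\.?")

def extract_rule(sentence):
    """Return an IF/THEN rule for 'X Ys by Z-ing', or None if no match."""
    m = PATTERN.fullmatch(sentence.strip())
    if m is None:
        return None
    x, ys, z = m.groups()
    return {"if": (x, z), "then": (x, ys)}

print(extract_rule("Animals take in air by breathing."))
print(extract_rule("Animals also need water to survive."))
```

A pattern this loose will also fire on sentences where "by ...ing" is not a means clause, which is one plausible source of noisy rules.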

Page 44: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

The 4th Grade NY Regents’ Science Exam
• What types of questions are there?
• What would it take to answer them?

“Models”

Page 45: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

5. Domain Models

Sometimes you do need some “computational clockwork”

Qualitative models
• qualitative influences (X goes up → Y goes down)
• what happens to Z if X goes up?

Process models
• partially ordered network of events
• how does X contribute to Y?

Acquisition
• Task ≠ “read the text”
• Task = extract/build model instances from the text
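A qualitative model of the kind described here can be sketched as a signed influence graph, with "what happens to Z if X goes up?" answered by multiplying signs along a path. The quantities and edges below are invented for illustration, and the sketch returns the first path found, so it does not resolve conflicting influences.

```python
# Hypothetical signed influences: +1 = moves the same way, -1 = opposite way.
INFLUENCES = {
    ("temperature", "evaporation"): +1,
    ("evaporation", "water level"): -1,
}

def effect(source, target, change=+1, influences=INFLUENCES):
    """Propagate a qualitative change; returns +1 (up), -1 (down), or None."""
    frontier, seen = [(source, change)], set()
    while frontier:
        node, sign = frontier.pop()
        if node == target:
            return sign
        if node in seen:
            continue
        seen.add(node)
        for (a, b), s in influences.items():
            if a == node:
                frontier.append((b, sign * s))
    return None

# What happens to the water level if temperature goes up?
print(effect("temperature", "water level"))
```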

Page 46: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

5. Example: Process Models

Process reasoner: Given a process, can answer questions, e.g.
• What is the role of Entity in Process?
• What Entity performs Role in Event?
• During X, what happens after Y?

KA Task = extract a process instance from text:
1. Identify where a process is being described
2. Extract it, e.g., with a set of trained classifiers

When the cell is stimulated, gated channels open that facilitate Na+ diffusion. Sodium ions then diffuse down their electrochemical gradient….

“stimulate” [theme: “cell”]
“open” [theme: “gated channels”]
“diffuse” [theme: “sodium ions”, “Na+”; direction: “down ec gradient”]
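The three question types listed on this slide can be sketched against a tiny process instance built from the extracted events. The event ordering in `NEXT` is an assumption read off the passage ("When ... then ..."), and the function names are hypothetical.

```python
# Events and role fillers as extracted on the slide.
EVENTS = {
    "stimulate": {"theme": "cell"},
    "open": {"theme": "gated channels"},
    "diffuse": {"theme": "sodium ions", "direction": "down ec gradient"},
}
# Assumed ordering of events (event -> events that directly follow it).
NEXT = {"stimulate": ["open"], "open": ["diffuse"]}

def happens_after(event, order=NEXT):
    """During the process, what happens after `event`?"""
    later, frontier = [], list(order.get(event, []))
    while frontier:
        e = frontier.pop(0)
        if e not in later:
            later.append(e)
            frontier.extend(order.get(e, []))
    return later

def performer(role, event, events=EVENTS):
    """What entity performs `role` in `event`?"""
    return events[event].get(role)

def roles_of(entity, events=EVENTS):
    """What is the role of `entity` in the process? Returns (event, role) pairs."""
    return [(ev, role) for ev, roles in events.items()
            for role, filler in roles.items() if filler == entity]

print(happens_after("stimulate"))
print(performer("theme", "open"))
print(roles_of("sodium ions"))
```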

Page 47: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Extracting Process Models: The annotation tool

Page 48: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Extracting Process Models

[Diagram: process graph. Events: flow down, enter, change, spin, activate, produce. Entities: H+ ions, gradient, binding site, rotor, shape, rod, catalytic site, ATP, ADP, Pi. Relation between events: causes.]

H+ ions flowing down their gradient enter a half channel in a stator, which is anchored in the membrane. H+ ions enter binding sites within a rotor, changing the shape of each subunit so that the rotor spins within the membrane... Spinning of the rotor causes an internal rod to spin as well. This rod extends like a stalk into the knob below it, which is held stationary by part of the stator. Turning of the rod activates catalytic sites in the knob that can produce ATP from ADP and Pi.

Page 49: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Another Example: Energy Conversion

Modeling technique: Energy conversion
• extract event sequence (process model)
• layer energy types on top
→ initial form of energy? final? form that produced X? etc.

Event sequence: baby shake rattle → rattle make noise
• baby shake rattle: movement, mechanical energy
• rattle make noise: sound, sound energy
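Layering energy types on an event sequence can be sketched as a list of (event, energy form) pairs with three lookups, one per question on the slide. The representation and function names are assumptions for illustration.

```python
# Event sequence with the energy form layered on each event, as on the slide.
PROCESS = [
    ("baby shake rattle", "mechanical energy"),
    ("rattle make noise", "sound energy"),
]

def initial_energy(process=PROCESS):
    """Initial form of energy: the form on the first event."""
    return process[0][1]

def final_energy(process=PROCESS):
    """Final form of energy: the form on the last event."""
    return process[-1][1]

def energy_producing(form, process=PROCESS):
    """Form of energy on the step just before `form` first appears."""
    for prev, cur in zip(process, process[1:]):
        if cur[1] == form:
            return prev[1]
    return None

print(initial_energy(), final_energy(), energy_producing("sound energy"))
```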

Page 50: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

The 4th Grade NY Regents’ Science Exam
• What types of questions are there?
• What would it take to answer them?

“Diagrams”

Page 51: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

6. Diagrams, Images, Tables
Common in exams; many different styles and challenges

(Non-essential diagram)

Page 52: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

6. Diagrams, Images, Tables
Common in exams; many different styles and challenges

(Hard)

Page 53: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

6. Diagrams, Images, Tables
Common in exams; many different styles and challenges

(Extremely hard)

Page 54: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Where to?

Revised picture of intelligence
• Knowledge as a collection of resources, at various levels of formality
    taxonomic, factual, semi-formal rules, formal models
• Reasoning as a collection of “experts”, with various specialized skills
    taxonomic, textual entailment, targeted formal systems
• Semi-formal representations avoid some of the rigidity of deductive logic
    ≠ proof tree, = most plausible chain of inference
• Introspection: Why materialize knowledge at all?
    Allows refinement and inconsistency reduction

Page 55: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

What did we learn from Watson?

The obvious:
• leverage lots of data
• multiple solvers + machine reasoning = better results

The less obvious:
• evidential reasoning
    not about finding a proof, but searching for evidence
    deduction often comes “tantalizingly close”
• no single, pre-defined ontology
    Doesn’t mean we don’t need ontologies! (judiciously chosen)

“What material is DNA made of?” → “nucleotides”

“What shape does the six carbon atoms in glucose form?”

Page 56: Towards a Knowledgeable Machine that can Pass an Elementary Science Test

Summary
• Halo: toward knowledgeable machines
• Now pursuing a quite different model of intelligence

Fourth-Grade Science Tests
• Wide variety of question types and challenges
    taxonomic
    definitional
    basic facts
    simple (but many possible) inferences from given facts
    formal modeling techniques
    diagrams
• A good driver and test for this picture!