Post on 24-Feb-2016
Towards a Knowledgeable Machine that can Pass an Elementary Science Test
Peter Clark, Vulcan Inc.
August 2013
Outline
1. Halo: The Goal and Road Travelled… AURA, Inquire, and reflections
2. Exploiting Semi-Formal Representations and Textual Inference
3. A New Challenge: Fourth-Grade Science Tests
Overall Goals
Long-Term Goal: The Digital Aristotle
Have large volumes of knowledge encoded in a computable form, such that the computer can answer questions, explain its answers, and ultimately dialog with users about the subject matter
History
Halo Pilot: Assess representation & reasoning technologies
  Formal reasoning works, but acquisition and language are problems
Halo: Develop high-performance acquisition tool (AURA)
HaloBook (2010-12): Aim to encode much of a textbook
Inquire: An iPad app – the knowledgeable book
Halo 2.0: Reorient towards semi-automated acquisition
  focus on taking K-12 science exams
“Explainable Reasoning”
…Eukaryotic cells similarly have a plasma membrane, but also contain a cell nucleus that houses the eukaryotic cell's DNA…
∀x isa(x, Eukaryotic-cell) → ∃p,n,d isa(p, Plasma-membrane) ∧ isa(n, Nucleus) ∧ isa(d, DNA) ∧ has-part(x, p) ∧ has-part(x, n) ∧ has-part(x, d) ∧ is-inside(d, n)
Logic (Internal View)
Concept Map (User View)
The Knowledge Encoding Process
EukaryoticCell
  Parts: Plasma membrane, Nucleus, DNA
PlantCell
  Parts: Plasma membrane, Cell wall, Chloroplast
Reasoning: Deductive elaboration of the graph using other graphs and commonsense rules
PlantCell (elaborated)
  Parts: Plasma membrane, Cell wall, Chloroplast, Nucleus, DNA
  (more)
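The elaboration step above can be illustrated with a minimal Python sketch. It assumes that PlantCell is a kind of EukaryoticCell and that a concept inherits the parts of its ancestors; the `ISA` and `PARTS` tables and the `all_parts` helper are hypothetical names chosen for illustration, not AURA's actual representation or reasoner.

```python
# Hypothetical toy knowledge base; the part lists follow the slide's example.
ISA = {"PlantCell": "EukaryoticCell"}  # child -> parent (assumed, single inheritance)
PARTS = {
    "EukaryoticCell": ["Plasma membrane", "Nucleus", "DNA"],
    "PlantCell": ["Plasma membrane", "Cell wall", "Chloroplast"],
}


def all_parts(concept):
    """Collect the parts stated for a concept plus those inherited from its ancestors."""
    parts = []
    while concept is not None:
        for part in PARTS.get(concept, []):
            if part not in parts:  # avoid listing a shared part twice
                parts.append(part)
        concept = ISA.get(concept)  # climb one level; None ends the loop
    return parts


print(all_parts("PlantCell"))
```

Walking up the hierarchy is only one of the "commonsense rules" the slide alludes to; it is shown here because it reproduces the elaborated PlantCell part list.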
Question Answering
Typical examples of questions the system can answer:
  During mitosis, when does the cell plate begin to form?
  What happens during DNA replication?
  What is the relationship between photosynthesis and cellular respiration?
  What do ribosomes do?
  During synapsis, when are chromatids exchanged?
  What are the differences between eukaryotic cells and prokaryotic cells?
  How many chromosomes are in a human cell?
  In which phase of mitosis does the cell divide?
  What is the structure of a plasma membrane?
Outcomes
The good…
  Experiments suggested Inquire is educationally useful
  Some question classes answered well
  “Suggested question” mechanism helped a lot
The bad…
  Only covered ~25% of the book after 2 years
  Deductive question-answering somewhat hit-and-miss
The Dilemma of Knowledge Engineering
Manual methods are expensive, automatic methods are shallow
It’s not that manually constructed rulebases are “bad”, but:
  Expensive (of course, costs may be brought down)
  Brittle (unless the task is very tightly constrained)
  Never seem to be finished (permanently incomplete)…
Textual Inference / Semi-Formal Representations:
  Create language-based representations from (lots of) text
  include words/phrases – deferred ontological commitment
  Imprecise, shallower reasoning
  an evidential process, using multiple sources of evidence
2. Exploiting Semi-Formal Representations and Textual Inference
Levels of Formality
[Diagram: levels of formality, Text / Semi-Formal / Logic. Labels: Query (? on the text side; ?- has-part(ribosome,?x). on the logic side), Textual entailment, Logical entailment.]
1. Representation
Sentence: "Channel proteins facilitate the passage of molecules across the membrane."
Parse (node scores omitted):
  (S (NP channel proteins)
     (VP (V facilitate)
         (NP (NP (NP the passage) (PP of (NP molecules)))
             (PP across (NP the membrane)))))
Logical Form:
  subject(facilitate-1, channel-protein-1).
  object(facilitate-1, passage-1).
  of(passage-1, molecule-1).
  across(passage-1, membrane-1).
Graph: “facilitate” –subj→ “channel protein”; “facilitate” –obj→ “passage”; “passage” –of→ “molecule”; “passage” –across→ “membrane”
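The logical form above can be held as a list of relation triples. This is a minimal Python sketch; the `Triple` type and the `dependents` helper are hypothetical names introduced for illustration, and the actual system's data structures are not described in the slides.

```python
from collections import namedtuple

# One triple per logical-form literal: relation(head, dependent).
Triple = namedtuple("Triple", ["relation", "head", "dependent"])

LOGICAL_FORM = [
    Triple("subject", "facilitate-1", "channel-protein-1"),
    Triple("object", "facilitate-1", "passage-1"),
    Triple("of", "passage-1", "molecule-1"),
    Triple("across", "passage-1", "membrane-1"),
]


def dependents(node, triples):
    """Return {relation: dependent} for every triple whose head is `node`."""
    return {t.relation: t.dependent for t in triples if t.head == node}


print(dependents("facilitate-1", LOGICAL_FORM))
print(dependents("passage-1", LOGICAL_FORM))
```

Keeping the words themselves as node labels, rather than mapping them to ontology concepts, mirrors the "deferred ontological commitment" mentioned earlier.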
2. Textual Inference
Reasoning with semi-formal structures
Find sequence of transformations from text to question
Requires general lexical and world knowledge
Text: Channel proteins facilitate the passage of molecules across the membrane.
Knowledge resources:
  IF X facilitates Y THEN X helps Y
  “passage”(n) → “move”(v)
  “through” ↔ “across”
Question: Which proteins help move molecules through the membrane?
Answer: A. Channel proteins
2. Textual Inference
Which proteins help move molecules through the membrane?
1. (simple) question decomposition
  What ?x help move molecules through the membrane?
  Is ?x a protein?
2a. textual entailment
  Channel proteins facilitate the passage of molecules across the membrane.
    [IF X “facilitates” Y THEN X “helps” Y]
  Channel proteins help the passage of molecules across the membrane.
    [“passage”(n) → “move”(v), “through” ↔ “across”]
  Channel proteins help move molecules through the membrane.
  What ?x help move molecules through the membrane?
2a. Textual entailment is an evidence-gathering process
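The chain of transformations from text to question can be sketched in a few lines of Python. This is a toy illustration only: it rewrites surface strings, whereas the slides describe reasoning over semi-formal structures, and the `RULES` table, `rewrite_chain`, and `bind_answer` are hypothetical names.

```python
# Each rule is (pattern, replacement, source rule from the slide); toy string rewrites.
RULES = [
    ("facilitate", "help", 'IF X "facilitates" Y THEN X "helps" Y'),
    ("the passage of", "move", '"passage"(n) -> "move"(v)'),
    ("across", "through", '"through" <-> "across"'),
]

TEXT = "Channel proteins facilitate the passage of molecules across the membrane."
QUESTION = "What ?x help move molecules through the membrane?"


def rewrite_chain(text, rules):
    """Apply each applicable rule in turn, keeping every intermediate sentence."""
    steps = [text]
    for pattern, replacement, _source in rules:
        if pattern in steps[-1]:
            steps.append(steps[-1].replace(pattern, replacement))
    return steps


def bind_answer(question, text, rules):
    """Bind ?x by matching the fully rewritten text against 'What ?x <rest>?'."""
    prefix = "What ?x "
    if not question.startswith(prefix):
        return None
    rest = question[len(prefix):].rstrip("?")
    final = rewrite_chain(text, rules)[-1].rstrip(".")
    if final.endswith(" " + rest):
        return final[: -len(rest) - 1]  # everything before " <rest>"
    return None


for step in rewrite_chain(TEXT, RULES):
    print(step)
print(bind_answer(QUESTION, TEXT, RULES))
```

The second sub-question from the decomposition (Is ?x a protein?) would then be checked separately against the binding.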
2. Textual Inference
Channel proteins facilitate the passage of molecules across the membrane.
Channel proteins help the passage of molecules across the membrane.
What evidence can I find that “X facilitates Y” → “X helps Y”?
Resources: WordNet; PPDB (Johns Hopkins); DIRT paraphrases; BioKB-101 ontology
Sizes (as listed on the slide): 12M rules, 30k rules, 4M rules, 146k rules
Domain-Biased Paraphrases (Johns Hopkins)
Paraphrases learned via bilingual pivoting, and rescored using distributional similarity.
Some examples from PPDB
  phrase    paraphrase   score
  amplify   elevate      0.993
  amplify   explore      0.992
  amplify   enhance      0.984
  amplify   speed up     0.984
  amplify   strengthen   0.982
  amplify   improve      0.982
  amplify   magnify      0.98
  amplify   extend       0.978
  amplify   accept       0.97
  amplify   follow       0.965
  amplify   carry out    0.965
  amplify   broaden      0.962
  amplify   go into      0.962
  amplify   promote      0.959
  amplify   explain      0.955
  amplify   implement    0.951
  amplify   leave        0.944
  amplify   adopt        0.944
  amplify   acquire      0.942
  amplify   expand       0.942
  …         …            …
  travel    fly          0.893
  travel    roll over    0.882
  travel    relax        0.87
  travel    freeze       0.861
  travel    breathe      0.861
  travel    swim         0.858
  travel    move         0.855
  travel    die          0.848
  travel    swell        0.845
  travel    switch       0.842
  travel    consumers    0.838
  travel    bend         0.835
  travel    walk         0.835
  travel    paint        0.828
  travel    work         0.828
  travel    move over    0.825
  travel    feed         0.825
  travel    evolve       0.825
  travel    survive      0.821
  …         …            …
???
???
Performance
Currently, 3 databases of semi-formal representations
Current F1 ≈ 30% (e.g., 50% on 10% of questions)
Answer = weighted sum of evidence
Learn the weights (via simulated annealing)
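The scoring and weight-learning idea above can be sketched as follows. Everything here is an assumption made for illustration: the evidence values in `QUESTIONS` are invented, plain accuracy stands in for the F1 measure on the slide, and the cooling schedule and proposal step in `anneal` are arbitrary choices rather than the system's actual settings.

```python
import math
import random


def score(evidence, weights):
    """Weighted sum of evidence values for one candidate answer."""
    return sum(w * e for w, e in zip(weights, evidence))


def accuracy(weights, questions):
    """Fraction of questions whose top-scoring candidate is the correct one."""
    correct = 0
    for candidates, gold in questions:
        best = max(range(len(candidates)), key=lambda i: score(candidates[i], weights))
        correct += (best == gold)
    return correct / len(questions)


def anneal(questions, n_weights, steps=2000, seed=0):
    """Simulated annealing over the weight vector, maximizing accuracy."""
    rng = random.Random(seed)
    current = [1.0] * n_weights
    current_acc = accuracy(current, questions)
    best, best_acc = list(current), current_acc
    for step in range(steps):
        temperature = max(1e-3, 1.0 - step / steps)  # linear cooling (assumed)
        proposal = [w + rng.gauss(0, 0.5) for w in current]
        acc = accuracy(proposal, questions)
        # Accept improvements and ties; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if acc >= current_acc or rng.random() < math.exp((acc - current_acc) / temperature):
            current, current_acc = proposal, acc
            if acc > best_acc:
                best, best_acc = list(proposal), acc
    return best, best_acc


# Invented data: each question has candidate answers described by three
# evidence values (one per evidence source) and the index of the right answer.
QUESTIONS = [
    ([[0.9, 0.1, 0.2], [0.2, 0.8, 0.9]], 0),
    ([[0.1, 0.9, 0.8], [0.8, 0.2, 0.1]], 1),
    ([[0.7, 0.5, 0.5], [0.3, 0.6, 0.4]], 0),
]

weights, acc = anneal(QUESTIONS, n_weights=3)
print(weights, acc)
```

Annealing is a reasonable fit when the objective is a non-differentiable count of correct answers, since it only needs to evaluate candidate weight vectors, not compute gradients.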
Levels of Formality
[Diagram: levels of formality, Text / Semi-Formal / Logic, with queries ? and ?- has-part(ribosome,?x).]
What should go in here?
3. A New Challenge: Fourth-Grade Science Tests
K-12 Grade Science Tests
  Provide a (task-oriented) focus
  Simpler (question) language
  Involves more common sense
  Wide variety of question types and difficulties
Caveats
  Multiple-choice questions are common
  Diagrams are common
The 4th Grade NY Regents’ Science Exam
What types of questions are there?
What would it take to answer them?
“Retrieval”
1. Taxonomic
Question interpretation: Decompose question into “isa” queries
Several good sources of simple “isa” knowledge
  WordNet, Cyc, Wikipedia
  Within text itself
“isa” knowledge is fundamental to other reasoning types
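Answering an “isa” query amounts to following hypernym links. The Python sketch below uses a small hand-written taxonomy as a stand-in for resources such as WordNet or Cyc; the `ISA` table, its entries, and the helpers `isa` and `which_are` are hypothetical and chosen only to illustrate the decomposition.

```python
# Hypothetical hand-written taxonomy: term -> direct hypernyms.
ISA = {
    "sparrow": ["bird"],
    "bird": ["animal"],
    "oak": ["tree"],
    "tree": ["plant"],
    "animal": ["living thing"],
    "plant": ["living thing"],
}


def isa(term, category):
    """True if `category` is reachable from `term` by following isa links."""
    stack, seen = [term], set()
    while stack:
        node = stack.pop()
        if node == category:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(ISA.get(node, []))
    return False


def which_are(category, options):
    """Answer 'Which of these is a <category>?' as a set of isa queries."""
    return [option for option in options if isa(option, category)]


print(which_are("animal", ["oak", "sparrow", "tree"]))
```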
2. Definitions
Dictionary Resources
  erosion: The process of being eroded by wind, water, or other natural agents.
  erosion: The wearing away of rocks and other deposits on the earth's surface …
  erosion: The gradual wearing away of land surface materials, especially rocks, …
Entailment-Style Reasoning
  the movement of soil by wind or water
  The gradual wearing away of land surface materials, especially rocks, sediments, and soils, by the action of water, wind, or a glacier.
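As a rough stand-in for entailment-style matching between a candidate phrase and dictionary definitions, the sketch below scores each definition by how many of the phrase's content words it covers. Word overlap is a much cruder technique than textual entailment and is swapped in here only for illustration; the stopword list, the plural-stripping `stem`, and the function names are all assumptions.

```python
import re

STOPWORDS = {"the", "of", "by", "or", "and", "a", "an", "on"}

DEFINITIONS = [
    "The process of being eroded by wind, water, or other natural agents.",
    "The wearing away of rocks and other deposits on the earth's surface",
    "The gradual wearing away of land surface materials, especially rocks, "
    "sediments, and soils, by the action of water, wind, or a glacier.",
]


def stem(word):
    """Very crude stemmer: drop a trailing 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def content_words(text):
    """Lowercased, stemmed words of `text` with stopwords removed."""
    return {stem(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def coverage(phrase, definition):
    """Fraction of the phrase's content words that also occur in the definition."""
    phrase_words = content_words(phrase)
    if not phrase_words:
        return 0.0
    return len(phrase_words & content_words(definition)) / len(phrase_words)


def best_definition(phrase, definitions):
    """Return the definition that covers the phrase best."""
    return max(definitions, key=lambda d: coverage(phrase, d))


phrase = "the movement of soil by wind or water"
for definition in DEFINITIONS:
    print(round(coverage(phrase, definition), 2), definition[:50])
```

Note that “movement” is never matched by overlap alone; bridging it to “wearing away” is exactly the kind of lexical knowledge the paraphrase resources are meant to supply.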
3. Basic Facts
“Semantic Databases”
Some basic facts can be pre-extracted and cleaned
  parts, functions, steps in a process, etc.
  + existing resources have some of this knowledge
Building Semantic Databases…
[Pipeline diagram]
  Known parts (AURA, WordNet, LOD) → good “parts” relations
  Sentences expressing those relations (training data) → MultiR (Univ Washington) → Classifier
  Text + candidate pair, e.g., “plant cell” has-part “chloroplast”? → Classifier → Decision (yes/no + confidence)
  Iterate, + Human/machine validation → Final parts database
Example: has-part(Leaf,Stomata), from “Stomata in a leaf's surface lead to a maze of internal air spaces”
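The training-data step of the pipeline above can be approximated by pairing known part relations with sentences that mention both terms. This Python sketch is an assumption about how such data might be gathered, using simple substring matching; it does not use or reproduce MultiR, and apart from the Stomata sentence the example sentences are invented.

```python
# Known (whole, part) pairs, e.g. drawn from an existing resource.
KNOWN_PARTS = [("leaf", "stomata"), ("plant cell", "chloroplast")]

SENTENCES = [
    "Stomata in a leaf's surface lead to a maze of internal air spaces",
    "A plant cell contains a chloroplast.",  # invented example sentence
    "Leaves change color in autumn.",        # invented example sentence
]


def training_sentences(known_parts, sentences):
    """Pair each known (whole, part) with the sentences that mention both terms."""
    examples = []
    for whole, part in known_parts:
        for sentence in sentences:
            lowered = sentence.lower()
            if whole in lowered and part in lowered:
                examples.append((whole, part, sentence))
    return examples


for example in training_sentences(KNOWN_PARTS, SENTENCES):
    print(example)
```

Sentences collected this way are noisy, since co-occurrence does not guarantee that the sentence expresses the part relation, which is one reason the pipeline includes a validation loop.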
“Inference”
4. “Rules” (simple inference)
Many questions require simple, one-step entailments
  X eats → X gets nutrients
  X breathes oxygen –enables→ X make energy
  X made of metal → X conducts electricity
Large number of such facts and rules needed
  Manually enter them? Induce them? Just read them?
Via:
  Judicious forms of text
  Good NLP
  Manual validation
4. Knowledge (Rule) Extraction from Text
Animals take in air by breathing. They need oxygen, which is in the air. Oxygen allows the animal to make and use energy, which it needs to survive. Animals also need water to survive. Water is used to break down and move materials throughout the body. Animals cannot make their own food so they must eat to get nutrients. Nutrients are necessary for growth and energy.
Assertions
  air contains oxygen
  animals need oxygen
  animals need energy
  animals need water
Implications
  animal breathes → animal takes in air
  animal breathes oxygen -enables→ animal make energy
  animal eat -enables→ animal get nutrients
  animal get nutrients -enables→ animal grow
  animal has water -enables→ animal breakdown materials
4. Knowledge (Rule) Extraction from Text
Rule acquisition: specific patterns in text
  X Ys by Z → IF X Zs THEN X Ys
  “Animals take in air by breathing.” → IF an animal breathes THEN an animal takes in air
Rule application: using textual entailment-style inference
  If rule condition entailed, then infer conclusion
Current status: Pretty noisy rules!
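The “X Ys by Z” acquisition pattern can be sketched with a regular expression. This is a minimal illustration, not the system's extractor: the `PATTERN`, the tiny `GERUND_TO_VERB` lookup (standing in for real morphological analysis), and the `extract_rule` helper are hypothetical, and number agreement is left untouched.

```python
import re

# Matches "<X> <Ys ...> by <Z>ing." for a one-word subject X.
PATTERN = re.compile(r"^(?P<x>\w+) (?P<y>.+) by (?P<z>\w+ing)\.?$")

# Tiny hypothetical gerund-to-verb lookup; a real system would use a morphological analyzer.
GERUND_TO_VERB = {"breathing": "breathe", "eating": "eat"}


def extract_rule(sentence):
    """Turn 'X Ys by Zing.' into an IF/THEN rule, or return None if it does not match."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    verb = GERUND_TO_VERB.get(match.group("z").lower())
    if verb is None:
        return None  # unknown gerund: better to skip than to guess the verb
    x = match.group("x").lower()
    return {"if": f"{x} {verb}", "then": f"{x} {match.group('y')}"}


print(extract_rule("Animals take in air by breathing."))
print(extract_rule("Animals also need water to survive."))
```

A surface pattern like this fires on many sentences where “by” does not signal means, which is consistent with the slide's note that the extracted rules are noisy.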
“Models”
5. Domain Models
Sometimes you do need some “computational clockwork”
Qualitative models
  qualitative influences (X goes up → Y goes down)
  what happens to Z if X goes up?
Process models
  partially ordered network of events
  how does X contribute to Y?
Acquisition Task ≠ “read the text”
  = extract/build model instances from the text
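A qualitative model of the kind described above can be sketched as a set of signed influences, with “what happens to Z if X goes up?” answered by multiplying signs along a chain. The quantities in `INFLUENCES` are invented for illustration, and the sketch assumes a single chain between two quantities (it returns the first one found and does not reconcile conflicting paths).

```python
# Hypothetical qualitative influences: (cause, effect) -> sign.
# +1 means the effect moves in the same direction as the cause, -1 the opposite.
INFLUENCES = {
    ("temperature", "evaporation"): +1,
    ("evaporation", "water level"): -1,
}


def effect_of_increase(x, z, influences):
    """Return +1 or -1 for how z changes when x goes up, or None if unconnected."""
    frontier = [(x, +1)]
    seen = {x}
    while frontier:
        node, sign = frontier.pop()
        if node == z:
            return sign
        for (cause, effect), influence in influences.items():
            if cause == node and effect not in seen:
                seen.add(effect)
                frontier.append((effect, sign * influence))
    return None


print(effect_of_increase("temperature", "water level", INFLUENCES))
```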
5. Example: Process Models
Process reasoner: Given a process, can answer questions, e.g.
  What is the role of Entity in Process?
  What Entity performs Role in Event?
  During X, what happens after Y?
KA Task = extract a process instance from text:
  1. Identify where a process is being described
  2. Extract it, e.g., with a set of trained classifiers
When the cell is stimulated, gated channels open that facilitate Na+ diffusion. Sodium ions then diffuse down their electrochemical gradient….
  “stimulate” [theme: “cell”]
  “open” [theme: “gated channels”]
  “diffuse” [theme: “sodium ions”, “Na+”; direction: “down ec gradient”]
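Once a process instance has been extracted, the question templates above reduce to lookups over it. The Python sketch below simplifies the slide's partially ordered network to a plain ordered list, and keeps only one filler per role; `EVENTS`, `what_happens_after`, and `roles_of` are hypothetical names for illustration.

```python
# Extracted process instance, simplified to a totally ordered list of events.
EVENTS = [
    {"event": "stimulate", "theme": "cell"},
    {"event": "open", "theme": "gated channels"},
    {"event": "diffuse", "theme": "sodium ions", "direction": "down ec gradient"},
]


def what_happens_after(event_name, events):
    """'During X, what happens after Y?': return the event following `event_name`."""
    names = [e["event"] for e in events]
    if event_name not in names:
        return None
    i = names.index(event_name)
    return events[i + 1] if i + 1 < len(events) else None


def roles_of(entity, events):
    """'What is the role of Entity in Process?': list (event, role) pairs for `entity`."""
    return [
        (e["event"], role)
        for e in events
        for role, filler in e.items()
        if role != "event" and filler == entity
    ]


print(what_happens_after("stimulate", EVENTS))
print(roles_of("gated channels", EVENTS))
```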
Extracting Process Models: The annotation tool
Extracting Process Models
[Diagram: extracted process graph. Events: flow down, enter, change, spin, activate, produce. Participants: H+ ions, gradient, binding site, rotor, shape, rod, catalytic site, ATP, ADP, Pi. Relation: causes.]
H+ ions flowing down their gradient enter a half channel in a stator, which is anchored in the membrane. H+ ions enter binding sites within a rotor, changing the shape of each subunit so that the rotor spins within the membrane... Spinning of the rotor causes an internal rod to spin as well. This rod extends like a stalk into the knob below it, which is held stationary by part of the stator. Turning of the rod activates catalytic sites in the knob that can produce ATP from ADP and Pi.
Another Example: Energy Conversion
Modeling technique: Energy conversion
  extract event sequence (process model)
  layer energy types on top
  → initial form of energy? final? form that produced X? etc.
Event sequence: baby shake rattle → rattle make noise
  movement: mechanical energy
  sound: sound energy
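Layering energy types over an extracted event sequence can be sketched as a lookup table on top of an ordered list. The pairing of events with energy forms below follows the rattle example; the names `ENERGY_OF`, `initial_energy`, `final_energy`, and `energy_that_produced` are hypothetical.

```python
# Event sequence from the example, in order.
EVENT_SEQUENCE = ["baby shake rattle", "rattle make noise"]

# Energy type layered on top of each event.
ENERGY_OF = {
    "baby shake rattle": "mechanical energy",  # movement
    "rattle make noise": "sound energy",       # sound
}


def initial_energy(events):
    """Energy form associated with the first event."""
    return ENERGY_OF[events[0]]


def final_energy(events):
    """Energy form associated with the last event."""
    return ENERGY_OF[events[-1]]


def energy_that_produced(target, events):
    """Energy form of the event immediately before the one yielding `target`."""
    forms = [ENERGY_OF[e] for e in events]
    if target not in forms:
        return None
    i = forms.index(target)
    return forms[i - 1] if i > 0 else None


print(initial_energy(EVENT_SEQUENCE), "->", final_energy(EVENT_SEQUENCE))
print(energy_that_produced("sound energy", EVENT_SEQUENCE))
```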
“Diagrams”
6. Diagrams, Images, Tables
Common in exams; many different styles and challenges
Examples shown: (Non-essential diagram), (Hard), (Extremely hard)
Where to? Revised picture of intelligence
Knowledge as a collection of resources, at various levels of formality
  taxonomic, factual, semi-formal rules, formal models
Reasoning as a collection of “experts”, with various specialized skills
  taxonomic, textual entailment, targeted formal systems
Semi-formal representations avoid some of the rigidity of deductive logic
  ≠ proof tree, = most plausible chain of inference
Introspection: Why materialize knowledge at all?
  Allows refinement and inconsistency reduction
What did we learn from Watson?
The obvious:
  leverage lots of data
  multiple solvers + machine reasoning = better results
The less obvious:
  evidential reasoning
    not about finding a proof, but searching for evidence
    deduction often comes “tantalizingly close”
  no single, pre-defined ontology
    Doesn’t mean we don’t need ontologies! (judiciously chosen)
    “What material is DNA made of?” → “nucleotides”
    “What shape does the six carbon atoms in glucose form?”
Summary
Halo: toward knowledgeable machines
Now pursuing a quite different model of intelligence
Fourth-Grade Science Tests
  Wide variety of question types and challenges
    taxonomic
    definitional
    basic facts
    simple (but many possible) inferences from given facts
    formal modeling techniques
    diagrams
  A good driver and test for this picture!