ESWC SS 2012 - Friday Keynote, Chris Welty: Inside the Mind of Watson
© 2011 IBM Corporation
Inside the mind of Watson
Chris Welty
IBM Research
ibmwatson.com
Do Not Record. Do Not Distribute.
The Core Technical Team*
Researchers and Engineers in NLP, ML, IR, KR&R and CL at IBM Labs and a growing number of universities
Automatic Open-Domain Question Answering
A Long-Standing Challenge in Artificial Intelligence to emulate human expertise
Given
– Rich Natural Language Questions
– Over a Broad Domain of Knowledge
Deliver
– Precise Answers: Determine what is being asked & give precise response
– Accurate Confidences: Determine likelihood answer is correct
– Consumable Justifications: Explain why the answer is right
– Fast Response Time: Precision & Confidence in <3 seconds
What is Jeopardy?
Jeopardy! is an American quiz show
– 1964 – Today
Answer-and-question format
– contestants are presented with clues in the form of answers
– must phrase their responses in question form.
Example
– Category: General Science
– Clue: When hit by electrons, a phosphor gives off electromagnetic energy in this form
– Answer: What is light?
The Jeopardy! Challenge
Hard for humans, hard for machines
Broad/Open Domain, Complex Language, High Precision, Accurate Confidence, High Speed
$200 If you are looking at the wainscoting, you are looking in this direction.
– What is down?
$1000 The first person mentioned by name in ‘The Man in the Iron Mask’ is this hero of a previous book by the same author.
– Who is D’Artagnan?
$600 In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus
– What is cytoplasm?
$800 The conspirators against this man were wounded by each other while they stabbed at him
– Who is Julius Caesar?
But hard for different reasons.
For people, the challenge is knowing the answer
For machines, the challenge is understanding the question
What It Takes to Compete against Top Human Jeopardy! Players
Our Analysis Reveals the Winner’s Cloud
[Figure: scatter plot in which each dot is an actual historical human Jeopardy! game, running from More Confident to Less Confident. Labeled regions: Winning Human Performance, Grand Champion Human Performance, 2007 QA Computer System.]
Top human players are remarkably good.
Computers? Not So Good.
In 2007, we committed to making a Huge Leap!
Welty’s Trident
A new software paradigm is emerging – Increasingly, computational tasks require inexact solutions that
combine multiple methods in unpredictable ways
Knowledge is not the destination – Watson does not answer a question by translating natural language
input into formally represented knowledge and simply running queries against this knowledge
Machine intelligence is not human intelligence – The difference is most notable in the mistakes they make
DeepQA: The Technology Behind Watson
An example of a new software paradigm
[Diagram: DeepQA architecture. Stages shown: Question; Question & Topic Analysis; Question Decomposition; Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation); Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval over Evidence Sources, Deep Evidence Scoring); Synthesis; Final Confidence Merging & Ranking (Models); Answer & Confidence. Hypothesis Generation and Hypothesis and Evidence Scoring are drawn several times in parallel.]
Learned Models help combine and weigh the Evidence
DeepQA generates and scores many hypotheses using an extensible collection of
Natural Language Processing, Machine Learning and Reasoning Algorithms.
These gather and weigh evidence over both unstructured and structured content to
determine the answer with the best confidence.
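As a rough illustration of the generate, score, and rank flow described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Hypothesis` class, the `run_pipeline` function, and the toy generator, scorers, and averaging step are invented stand-ins, not DeepQA components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    """A candidate answer plus the feature scores gathered for it."""
    answer: str
    features: Dict[str, float] = field(default_factory=dict)
    confidence: float = 0.0


# Hypothetical scorer signature: (question, candidate answer) -> score.
Scorer = Callable[[str, str], float]


def run_pipeline(question: str,
                 generate: Callable[[str], List[str]],
                 scorers: Dict[str, Scorer],
                 combine: Callable[[Dict[str, float]], float]) -> List[Hypothesis]:
    """Generate many hypotheses, score each one independently, then rank."""
    hypotheses = [Hypothesis(answer=a) for a in generate(question)]
    for h in hypotheses:
        for name, scorer in scorers.items():
            h.features[name] = scorer(question, h.answer)
        h.confidence = combine(h.features)
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


# Toy stand-ins, invented for this example only.
def toy_generate(question: str) -> List[str]:
    return ["Battle Creek", "Post Foods", "1895"]


toy_scorers: Dict[str, Scorer] = {
    "is_not_a_number": lambda q, a: 0.0 if a.isdigit() else 1.0,
    "looks_like_a_place": lambda q, a: 1.0 if a.endswith("Creek") else 0.0,
}


def toy_combine(features: Dict[str, float]) -> float:
    # A plain average stands in for the learned models.
    return sum(features.values()) / len(features)


clue = "In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city"
ranked = run_pipeline(clue, toy_generate, toy_scorers, toy_combine)
```

The point of the shape is that scorers never call each other; adding one means adding an entry to the dictionary and letting the combining step decide how much it matters.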
Example Question
In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city
Question Analysis
– Keywords: 1894, C.W. Post, created …
– Lexical Answer Type: (Michigan city)
– Date(1894)
– Relations: Create(Post, cereal drink) …
Primary Search
– Related Content (Structured & Unstructured)
Candidate Answer Generation
– 1895, Post Foods, aramour, General Foods, Grand Rapids, …, Battle Creek, …
Evidence Retrieval
Evidence Scoring
– [0.58 0 -1.3 … 0.97]
– [0.71 1 13.4 … 0.72]
– [0.12 0 2.0 … 0.40]
– [0.84 1 10.6 … 0.21]
– [0.33 0 6.3 … 0.83]
– [0.21 1 11.1 … 0.92]
– [0.91 0 -8.2 … 0.61]
– [0.91 0 -1.7 … 0.60]
Merging & Ranking
– 1) Battle Creek (0.85) 2) Post Foods (0.20) 3) 1895 (0.05)
Hypothesis Scoring
Category: MICHIGAN MANIA
Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city
Answer Scorers can be applied depending on the relations or constraints detected in the question. For example, the focus of this question, with its modifiers, is “Michigan city.” Watson can detect this as a geospatial relation indicating that the correct answer must be a city located within the state of Michigan.
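A geospatial constraint like the one above could be checked with a containment lookup. The sketch below is a toy under stated assumptions: the `CONTAINED_IN` table and the `geo_score` function are hypothetical names with hand-written entries, and a real scorer would consult structured geographic data rather than a three-row dictionary.

```python
from typing import Dict, Optional

# Hypothetical gazetteer mapping a place to the state that contains it.
# The entries are hand-written for this example.
CONTAINED_IN: Dict[str, str] = {
    "Battle Creek": "Michigan",
    "Grand Rapids": "Michigan",
    "Chicago": "Illinois",
}


def geo_score(candidate: str, required_region: Optional[str]) -> float:
    """Return 1.0 if the candidate is known to lie inside the region.

    Returns 0.0 when no containment is known, or when no region
    constraint was detected in the question.
    """
    if required_region is None:
        return 0.0
    return 1.0 if CONTAINED_IN.get(candidate) == required_region else 0.0


candidates = ["General Foods", "Post Foods", "Battle Creek", "Grand Rapids"]
geo = {c: geo_score(c, "Michigan") for c in candidates}
```

Note that the scorer only reports a feature value. It does not eliminate General Foods or Post Foods; that decision is left to whatever combines the scores.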
Evidence Feature Scores (Answer Scoring + Passage Scoring)
Candidate Answer      Doc Rank   Pass Rank   TyCor   Geo
General Foods         0          1           0.1     0
Post Foods            2          1           0.1     0
Battle Creek          1          2           0.8     1
Will Keith Kellogg    3          -           0.1     0
Grand Rapids          -          -           0.9     1
1895                  -          0           0.0     0
(- = no value shown)
Answer scorers: TyCor, Temporal, Spatial, Popularity, …
Passage Scoring
Category: MICHIGAN MANIA
Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city
In Deep Evidence Scoring, Watson retrieves evidence for each candidate answer, then evaluates that evidence using a large number of deep evidence scoring analytics. The evidence for a candidate answer may come from the original document or passage where the candidate answer was generated, or from an evidence retrieval search: take the keyword search query from Step 2, replace the focus terms with the candidate answer, and retrieve the relevant passages. The passages, or “context”, in which the candidate answer occurs are evaluated as evidence to support or refute the candidate as the correct answer to the question.
Candidate answers: Battle Creek, Post Foods, General Foods
Retrieved passages:
– It was named after C. W. Post, the founder of the Postum Cereal Company that later became General Foods. The cereal company unit was later sold off and is now Post Foods
– C.W. Post came to the Battle Creek sanitarium to cure his upset stomach. He later created Postum, a cereal-based coffee substitute
– The company was incorporated in 1922, having developed from the earlier Postum Cereal Co. Ltd., founded by C.W. Post (1854-1914) in 1895 in Battle Creek, Mich.
– After a number of experiments, Post marketed his first product, the cereal beverage called Postum, in 1895
– 1854 C. W. Post (Charles William) was born. He founded the Postum Cereal Co. in 1895 (renamed General Foods Corp. in 1922) to manufacture Postum cereal beverage
– Post Foods, LLC, also known as Post Cereals (formerly Postum Cereals) was founded by C.W. Post. It began in 1895 with the first Postum, a "cereal beverage", developed by Post in Battle Creek, Michigan. The first cereal, Grape-Nuts, was developed in 1897
– General Foods' products go from breakfast (Post's cereals) to warm nightcaps (Postum, Sanka), also wash the pots and pans that its foods are cooked in (S.O.S. Scouring Pads)
– 1895: In Battle Creek, Michigan, C.W. Post made the first POSTUM, a cereal beverage. Post created GRAPE-NUTS cereal in 1897, and POST TOASTIES corn flakes in 1908
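The evidence retrieval step described in the Passage Scoring text (swap the focus terms of the keyword query for a candidate answer, then look at the passages that come back) can be sketched as follows. The function names, the keyword list beyond "1894, C.W. Post, created", and the term-overlap measure are all assumptions for illustration; the overlap score is a deliberately crude stand-in for the deep evidence scoring analytics.

```python
from typing import List


def evidence_query(keywords: List[str], focus_terms: List[str], candidate: str) -> List[str]:
    """Swap the focus terms in a keyword query for a candidate answer."""
    focus = {t.lower() for t in focus_terms}
    kept = [k for k in keywords if k.lower() not in focus]
    return kept + [candidate]


def term_overlap(query: List[str], passage: str) -> float:
    """Fraction of query terms that occur in the passage (case-insensitive)."""
    text = passage.lower()
    hits = sum(1 for term in query if term.lower() in text)
    return hits / len(query) if query else 0.0


# Assumed keyword query for the Postum clue; "Michigan" and "city" are the focus.
keywords = ["1894", "C.W. Post", "created", "cereal drink", "Postum", "Michigan", "city"]
query = evidence_query(keywords, focus_terms=["Michigan", "city"], candidate="Battle Creek")

passage = ("1895: In Battle Creek, Michigan, C.W. Post made the first POSTUM, "
           "a cereal beverage.")
score = term_overlap(query, passage)
```

Here three of the six query terms ("C.W. Post", "Postum", "Battle Creek") appear in the passage. The passage says "made" and "cereal beverage" where the clue says "created" and "cereal drink", which is exactly the kind of mismatch that simple term matching misses.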
Merging Candidate Answers and Scoring the Confidence
Category: MICHIGAN MANIA
Clue: In 1894 C.W. Post created his warm cereal drink Postum in this …
In the final processing step, Watson detects variants of the same answer and merges their feature scores together.
Watson then computes the final confidence scores for the candidate answers by applying a series of Machine
Learning models that weight all of the feature scores to produce the final confidence scores.
Evidence Feature Scores
Candidate Answer      Doc Rank   Pass Rank   TyCor   Geo   LFACS   Term Match   Temporal
General Foods         0          1           0.1     0     0.2     22           1
Post Foods            2          1           0.1     0     0.4     41           1
Battle Creek          1          2           0.8     1     0.5     30           0.9
Will Keith Kellogg    3          -           0.1     0     0       23           0.5
Grand Rapids          -          -           0.9     1     0       10           0.5
1895                  -          0           0.0     0     0       21           0.6
(- = no value shown)
Machine Learning Model Application
Final Answer          Confidence
Battle Creek          0.946   (Correct Answer)
Post Foods            0.152
1895                  0.040
Grand Rapids          0.033
General Foods         0.014
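The merge-then-weigh step described on this slide might look like the sketch below. It is a sketch under several assumptions: variants are detected by a crude string rule, merged features take the maximum value, the model is a single logistic function, and the weights are made up rather than learned. The "Battle Creek, Michigan" variant and its scores are also invented for the example; only the TyCor and Geo values for the other rows come from the slide.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]


def canonical(answer: str) -> str:
    """Rough variant key: text before the first comma, lowercased."""
    return answer.split(",")[0].strip().lower()


def merge_variants(scored: List[Tuple[str, Features]]) -> Dict[str, Features]:
    """Merge candidates sharing a canonical form, keeping the max of each feature."""
    merged: Dict[str, Features] = {}
    for answer, feats in scored:
        slot = merged.setdefault(canonical(answer), {})
        for name, value in feats.items():
            slot[name] = max(slot.get(name, value), value)
    return merged


def confidence(feats: Features, weights: Features, bias: float) -> float:
    """Logistic combination of weighted feature scores."""
    z = bias + sum(w * feats.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights; a real model would learn them from training questions.
WEIGHTS = {"tycor": 3.0, "geo": 2.0}
BIAS = -3.0

scored = [
    ("Battle Creek", {"tycor": 0.8, "geo": 1.0}),
    ("Battle Creek, Michigan", {"tycor": 0.7, "geo": 1.0}),  # invented variant
    ("Post Foods", {"tycor": 0.1, "geo": 0.0}),
    ("General Foods", {"tycor": 0.1, "geo": 0.0}),
]
merged = merge_variants(scored)
ranked = sorted(merged, key=lambda k: confidence(merged[k], WEIGHTS, BIAS), reverse=True)
```

With these made-up weights the numbers will not reproduce the confidences on the slide; the sketch only shows the mechanics of merging and of turning a feature vector into one score.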
“Minimal” DeepQA Pipeline
Category: MICHIGAN MANIA
Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city
[Diagram: Question -> Question Analysis -> Primary Search -> Hypothesis Generation -> Hypothesis and Evidence Scoring -> Final Confidence Merging & Ranking]
LAT: Michigan City
Document Search Results (R, Title)
– 0 General Foods
– 1 Battle Creek
– 2 Post Foods
– 3 Will Keith Kellogg
Candidate Answers and Evidence Features (TyCor, Geo)
– General Foods: 0.1, 0
– Post Foods: 0.1, 0
– Battle Creek: 0.8, 1
Final Answers (Confidence)
– Battle Creek 0.946
– Post Foods 0.152
– 1895 0.040
Answer: Battle Creek
A new software paradigm emerging (not that we invented it)
The basic Watson computation is Hypothesis Scoring
How well does an answer fit into a question?
More than 100 different Hypothesis scoring software components
No single scoring component does the whole job
Many of them do very similar jobs
12 typing components, 8 passage alignment components, 10 n-gram components, …
These components are not integrated with each other, except that each one produces a score for each hypothesis
A machine learning algorithm learns how to combine them to produce a final score
The development methodology involved an incremental approach of producing stable baseline systems and testing changes with “follow-ons”
Changes that improve performance according to our metrics are accepted into the next stable baseline
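The baseline and follow-on methodology above amounts to a simple acceptance gate. The sketch below assumes accuracy on a fixed test set as the metric; the slide only says "our metrics", so the metric, the function names, and the toy systems are all assumptions for the example.

```python
from typing import Callable, List, Tuple

# A system maps a question to an answer; a test set pairs questions with gold answers.
System = Callable[[str], str]
TestSet = List[Tuple[str, str]]


def accuracy(system: System, test_set: TestSet) -> float:
    """Fraction of test questions the system answers correctly."""
    correct = sum(1 for question, gold in test_set if system(question) == gold)
    return correct / len(test_set)


def accept_follow_on(baseline: System, follow_on: System, test_set: TestSet) -> bool:
    """Accept a change only if it beats the current stable baseline."""
    return accuracy(follow_on, test_set) > accuracy(baseline, test_set)


# Toy systems, invented for this example.
test_set = [("q1", "a"), ("q2", "b"), ("q3", "c")]
baseline = lambda q: "a"                                  # right on q1 only
follow_on = lambda q: {"q1": "a", "q2": "b"}.get(q, "?")  # right on q1 and q2

accepted = accept_follow_on(baseline, follow_on, test_set)
```

An accepted follow-on becomes the next baseline, so every later change is measured against the best system so far.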
Follow-on development
+ ~10%
[Figure: Incremental Baselines. Precision (vertical axis, 0% to 100%) versus % Answered (horizontal axis, 0% to 100%), with one curve for each of: Baseline, 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010, 11/2010.]
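A precision against % answered curve of the kind plotted above can be computed from per-question confidences. The sketch assumes the system answers its most confident questions first, so each point on the curve corresponds to a confidence threshold; the function name and the five sample results are invented for the example.

```python
from typing import List, Tuple


def precision_at_answered(results: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (fraction answered, precision) points.

    `results` holds one (confidence, was_correct) pair per question. The
    k-th point covers the k highest-confidence questions.
    """
    ordered = sorted(results, key=lambda r: r[0], reverse=True)
    points = []
    correct = 0
    for k, (_, ok) in enumerate(ordered, start=1):
        correct += ok
        points.append((k / len(ordered), correct / k))
    return points


# Invented sample: (confidence, was the answer correct?)
results = [(0.95, True), (0.80, True), (0.60, False), (0.30, True), (0.10, False)]
curve = precision_at_answered(results)
```

The right-hand end of the curve is plain accuracy with every question answered. A system with accurate confidences shows higher precision toward the left, where it answers only the questions it is most sure of.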
ClassicQA: NOT The Technology Behind Watson
[Diagram labels: Question, GOFNLP, Formal Query, Logical Reasoner, Formal Knowledge, Primary Search, Answer Sources, Answer & Confidence.]
From the dawn of AI, it was envisioned that question answering would work by having a process that completely translated natural language (content & questions) into an unambiguous (logical) representation, and a reasoning process would run on that representation to produce answers. This vision has never been realized.
into the Gap
[Diagram labels: Language, Knowledge, NLP, Semantic Technology; Precision, Recall, Mentions, Brittleness, Acquisition, Scale.]
FAIL
No!
Knowledge is not the destination
into the Gap
[Diagram labels: Language, Task (e.g. QA); Parsing, SemTech, ML, IR, NER, LF, Crowds.]
Using Structured Evidence
• Useful for explanation data
– Precise and reliable evidence (e.g. spatial / temporal constraint match)
• Exploit wealth of freely available structured information
– e.g. Linked Open Data (LOD)
– Types, Relations, Links
• Complement results from unstructured text analysis
– Classic Precision vs. Recall Tradeoff
Structured Data and Inference in Watson
[Diagram: the DeepQA architecture (Question & Topic Analysis, Question Decomposition, Hypothesis Generation, Hypothesis and Evidence Scoring, Synthesis, Final Confidence Merging & Ranking), annotated with the uses of structured data and inference listed below.]
LAT Inference (Using PRISMATIC)
– Q: “Annexation of this in 1803..”
– “this”: Region
Relation Detection and Scoring Using Structured KBs
– Q: “This 1997 Titanic hero..” matches <Dicaprio, lead-actor, Titanic>
Answer Typing (Type Coercion)
– LAT: Scottish Inventor
– Answer: James Watt
Anti-Type Coercion
– LAT: Country
– Candidate: Einstein
Spatial Reasoning
– Containment (“This African country..”)
– Relative direction (“This sea east of Florida..”)
– Border (“This state bordering the Great Lakes..”)
– Relative location (“bldg. near Times Square..”)
– Numeric Properties: area/population/height (“This sea, largest in area,..”)
Temporal Reasoning
– Lifespan, Duration
Answer In Clue
– Q: “In 2003, ‘Big Blue’ acquired this company..”
– Downweigh IBM
Evidence Diffusion
– Q: “Sunan Intl. Airport is in this country”
– Diffuse evidence from (Pyongyang ->> N Korea)
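Of the inference types listed on this slide, Answer In Clue is the easiest to sketch: a candidate that the clue already names, directly or by alias, is unlikely to be the answer and gets its score reduced. The alias table, the function names, and the penalty factor below are assumptions for illustration, as is the second candidate used for contrast.

```python
from typing import Dict, List

# Hypothetical alias table; the slide's example treats 'Big Blue' as a name for IBM.
ALIASES: Dict[str, List[str]] = {
    "IBM": ["IBM", "Big Blue", "International Business Machines"],
}


def answer_in_clue(candidate: str, clue: str) -> bool:
    """True if the candidate, or a known alias of it, already appears in the clue."""
    names = ALIASES.get(candidate, [candidate])
    text = clue.lower()
    return any(name.lower() in text for name in names)


def downweight(score: float, candidate: str, clue: str, penalty: float = 0.1) -> float:
    """Scale down the score of a candidate that the clue already mentions."""
    return score * penalty if answer_in_clue(candidate, clue) else score


clue = "In 2003, 'Big Blue' acquired this company"
ibm_score = downweight(0.9, "IBM", clue)
other_score = downweight(0.9, "Rational Software", clue)  # illustrative candidate
```

Without the alias table the check would miss this case entirely, since the string "IBM" never occurs in the clue; that is where structured knowledge about names earns its place.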
LOD Impact on DeepQA for Typing Answers
[Figure: chart for an ensemble of TyCor components; vertical axis ticks run from 61.5% to 66.5% in steps of 0.5%.]
+ ~10%
And the winner is….
[Figure: Precision (vertical axis, 0% to 100%) versus % Answered (horizontal axis, 0% to 100%).]
And the winner is….not human
© 2009 IBM Corporation
IBM Research
Category: FATHERLY NICKNAMES
Clue: THIS FRENCHMAN WAS "THE FATHER OF BACTERIOLOGY"
Response: HOW TASTY WAS MY LITTLE FRENCHMAN
Category: THERE'S A FIRST TIME FOR EVERYTHING
Clue: IN 1824 THIS FIRST FOREIGNER TO ADDRESS A JOINT SESSION OF CONGRESS CONGRATULATED THE U.S. ON ITS GROWTH
Response: President Bush
Category: MUSIC
Clue: WHAT IS THE TEXT OF AN OPERA CALLED?
Response: Michael
Category: HAPPY MEALS
Clue: GRASSHOPPERS EAT PRIMARILY THIS
Response: Kosher
Category: OLYMPIC ODDITIES
Clue: It was the anatomical oddity of U.S. gymnast George Eyser, who won a gold medal on the parallel bars in 1904
Response: Had only one hand
Welty’s Trident
A new software paradigm is emerging – Increasingly, computational tasks require inexact solutions that
combine multiple methods in unpredictable ways
Knowledge is not the destination – Watson does not answer a question by translating natural language
input into formally represented knowledge and simply running queries against this knowledge
Machine intelligence is not human intelligence – The difference is most notable in the mistakes they make
iswc2012.semanticweb.org
CONFIRMED KEYNOTE:
TOM MALONE, MIT
PAPER DEADLINES:
MID JUNE