ESWC SS 2012 - Friday Keynote, Chris Welty: Inside the Mind of Watson

© 2011 IBM Corporation

Inside the mind of Watson

Chris Welty

IBM Research

ibmwatson.com

Do Not Record. Do Not Distribute.


The Core Technical Team*
Researchers and Engineers in NLP, ML, IR, KR&R and CL at IBM Labs and a growing number of universities


Automatic Open-Domain Question Answering
A Long-Standing Challenge in Artificial Intelligence to emulate human expertise

Given – Rich Natural Language Questions

– Over a Broad Domain of Knowledge

Deliver – Precise Answers: Determine what is being asked & give precise response

– Accurate Confidences: Determine likelihood answer is correct

– Consumable Justifications: Explain why the answer is right

– Fast Response Time: Precision & Confidence in <3 seconds



What is Jeopardy?

Jeopardy! is an American quiz show
– 1964 – Today

Answer-and-question format
– contestants are presented with clues in the form of answers
– must phrase their responses in question form

Example – Category: General Science

– Clue: When hit by electrons, a phosphor gives off electromagnetic energy in this form

– Answer: What is light?


$200: If you are looking at the wainscoting, you are looking in this direction.

$1000: The first person mentioned by name in ‘The Man in the Iron Mask’ is this hero of a previous book by the same author.


The Jeopardy! Challenge
Hard for humans, hard for machines

– Broad/Open Domain
– Complex Language
– High Precision
– Accurate Confidence
– High Speed

$600: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus

$800: The conspirators against this man were wounded by each other while they stabbed at him

But hard for different reasons.
For people, the challenge is knowing the answer
For machines, the challenge is understanding the question

What is down? Who is D’Artagnan? What is cytoplasm? Who is Julius Caesar?


What It Takes to compete against Top Human Jeopardy! Players
Our Analysis Reveals the Winner’s Cloud

[Figure: each dot is an actual historical human Jeopardy! game. The axis runs from More Confident to Less Confident. Labels: Winning Human Performance, Grand Champion Human Performance, 2007 QA Computer System.]

Top human players are remarkably good.


What It Takes to compete against Top Human Jeopardy! Players
Our Analysis Reveals the Winner’s Cloud

[Figure: each dot is an actual historical human Jeopardy! game. The axis runs from More Confident to Less Confident. Labels: Winning Human Performance, Grand Champion Human Performance, 2007 QA Computer System.]

Computers? Not So Good.
In 2007, we committed to making a Huge Leap!


Welty’s Trident

A new software paradigm is emerging
– Increasingly, computational tasks require inexact solutions that combine multiple methods in unpredictable ways

Knowledge is not the destination
– Watson does not answer a question by translating natural language input into formally represented knowledge and simply running queries against this knowledge

Machine intelligence is not human intelligence
– The difference is most notable in the mistakes they make


DeepQA: The Technology Behind Watson

An example of a new software paradigm

[Figure: DeepQA architecture. Main stages, in order: Question, Question & Topic Analysis, Question Decomposition, Hypothesis Generation, Hypothesis and Evidence Scoring, Synthesis, Final Confidence Merging & Ranking, Answer & Confidence. Component labels: Primary Search, Candidate Answer Generation, Answer Scoring, Evidence Retrieval, Deep Evidence Scoring. Resources: Answer Sources, Evidence Sources, Models. Callout: Learned Models help combine and weigh the Evidence.]

DeepQA generates and scores many hypotheses using an extensible collection of Natural Language Processing, Machine Learning and Reasoning Algorithms. These gather and weigh evidence over both unstructured and structured content to determine the answer with the best confidence.


Example Question

In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city

Question Analysis
– Keywords: 1894, C.W. Post, created …
– Lexical Answer Type: (Michigan city)
– Date(1894)
– Relations: Create(Post, cereal drink) …

Primary Search
– Related Content (Structured & Unstructured)

Candidate Answer Generation
– 1985, Post Foods, aramour, General Foods, Grand Rapids, …, Battle Creek, …

Evidence Retrieval and Evidence Scoring
– [0.58 0 -1.3 … 0.97]
– [0.71 1 13.4 … 0.72]
– [0.12 0 2.0 … 0.40]
– [0.84 1 10.6 … 0.21]
– [0.33 0 6.3 … 0.83]
– [0.21 1 11.1 … 0.92]
– [0.91 0 -8.2 … 0.61]
– [0.91 0 -1.7 … 0.60]

Merging & Ranking
– 1) Battle Creek (0.85) 2) Post Foods (0.20) 3) 1985 (0.05)


Category: MICHIGAN MANIA
Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city

Hypothesis Scoring

Answer Scorers can be applied depending on different relations or constraints detected in the question. For example, this question focus with modifiers is “Michigan city.” Watson can detect this as a geospatial relation that indicates the correct answer must be a city spatially located within the state of Michigan.

Candidate Answers and Evidence Feature Scores (Answer Scoring + Passage Scoring)

Candidate Answer      Doc Rank  Pass Rank  TyCor  Geo
General Foods         0         1          0.1    0
Post Foods            2         1          0.1    0
Battle Creek          1         2          0.8    1
Will Keith Kellogg    3                    0.1    0
Grand Rapids                               0.9    1
1895                            0          0.0    0

Scorers shown: TyCor, Temporal, Spatial, Popularity, …
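The geospatial check described on this slide can be sketched as a single answer scorer. This is only an illustration, not Watson's implementation: the CITY_TO_STATE gazetteer and the geo_score function are hypothetical names, and a real system would consult a structured source rather than a hand-written dictionary.

```python
# Hypothetical gazetteer mapping a candidate answer to the US state containing it.
CITY_TO_STATE = {
    "Battle Creek": "Michigan",
    "Grand Rapids": "Michigan",
}


def geo_score(candidate: str, required_state: str) -> int:
    """Return 1 if the candidate is a known city inside required_state, else 0."""
    return 1 if CITY_TO_STATE.get(candidate) == required_state else 0


candidates = ["General Foods", "Post Foods", "Battle Creek",
              "Will Keith Kellogg", "Grand Rapids", "1895"]

# One Geo feature value per candidate, for the constraint "Michigan city".
geo_feature = {c: geo_score(c, "Michigan") for c in candidates}
```

With this toy gazetteer, only Battle Creek and Grand Rapids receive a Geo score of 1, which matches the Geo column on the slide.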


Category: MICHIGAN MANIA
Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city

Passage Scoring

In Deep Evidence Scoring, Watson retrieves evidence for each candidate answer, then evaluates the evidence using a large number of deep evidence scoring analytics. The evidence for a candidate answer may come from the original document or passage where the candidate answer was generated, or it may come from an evidence retrieval search performed by taking the keyword search query from Step 2, replacing the focus terms with the candidate answer, and retrieving the relevant passages that are found. The passages, or “context” in which the candidate answer occurs, are evaluated as evidence to support or refute the candidate answer as the correct answer for the question.

Candidate answers: Battle Creek, Post Foods, General Foods

Passages:
– It was named after C. W. Post, the founder of the Postum Cereal Company that later became General Foods. The cereal company unit was later sold off and is now Post Foods
– C.W. Post came to the Battle Creek sanitarium to cure his upset stomach. He later created Postum, a cereal-based coffee substitute
– The company was incorporated in 1922, having developed from the earlier Postum Cereal Co. Ltd., founded by C.W. Post (1854-1914) in 1895 in Battle Creek, Mich. After a number of experiments, Post marketed his first product - the cereal beverage called Postum - in 1895
– 1854 C. W. Post (Charles William) was born. He founded the Postum Cereal Co. in 1895 (renamed General Foods Corp. in 1922) to manufacture Postum cereal beverage
– Post Foods, LLC, also known as Post Cereals (formerly Postum Cereals) was founded by C.W. Post. It began in 1895 with the first Postum, a "cereal beverage", developed by Post in Battle Creek, Michigan. The first cereal, Grape-Nuts, was developed in 1897
– General Foods' products go from breakfast (Post's cereals) to warm nightcaps (Postum, Sanka), also wash the pots and pans that its foods are cooked in (S.O.S. Scouring Pads
– 1895: In Battle Creek, Michigan, C.W. Post made the first POSTUM, a cereal beverage. Post created GRAPE-NUTS cereal in 1897, and POST TOASTIES corn flakes in 1908
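The evidence retrieval search described on this slide (take the keyword query, replace the focus terms with the candidate answer) can be sketched as follows. The function name evidence_query and the keyword and focus lists are assumptions made for illustration; the slides do not show the actual query format.

```python
def evidence_query(keywords, focus_terms, candidate):
    """Replace the focus terms in a keyword query with a candidate answer.

    The first focus term found is replaced by the candidate; any further
    focus terms are dropped. If no focus term occurs, the candidate is appended.
    """
    focus = {t.lower() for t in focus_terms}
    query, replaced = [], False
    for k in keywords:
        if k.lower() in focus:
            if not replaced:
                query.append(candidate)
                replaced = True
        else:
            query.append(k)
    if not replaced:
        query.append(candidate)
    return query


# Illustrative keyword query and focus terms, loosely taken from the clue.
keywords = ["1894", "C.W. Post", "created", "warm", "cereal", "drink",
            "Postum", "Michigan", "city"]
focus_terms = ["Michigan", "city"]

query = evidence_query(keywords, focus_terms, "Battle Creek")
```

Running one such query per candidate yields the candidate-specific passages that the passage scorers then evaluate.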



Clue: In 1894 C.W. Post created his warm cereal drink Postum in this …

Merging Candidate Answers and Scoring the Confidence

In the final processing step, Watson detects variants of the same answer and merges their feature scores together. Watson then computes the final confidence scores for the candidate answers by applying a series of Machine Learning models that weight all of the feature scores to produce the final confidence scores.

Candidate Answers and Evidence Feature Scores

Candidate Answer      Doc Rank  Pass Rank  TyCor  Geo  LFACS  Term Match  Temporal
General Foods         0         1          0.1    0    0.2    22          1
Post Foods            2         1          0.1    0    0.4    41          1
Battle Creek          1         2          0.8    1    0.5    30          0.9
Will Keith Kellogg    3                    0.1    0    0      23          0.5
Grand Rapids                               0.9    1    0      10          0.5
1895                            0          0.0    0    0      21          0.6

Machine Learning Model Application

Final Answers         Confidence
Battle Creek          0.946  (Correct Answer)
Post Foods            0.152
1895                  0.040
Grand Rapids          0.033
General Foods         0.014
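The two steps on this slide, merging variants of the same answer and then turning feature scores into a confidence, can be sketched as below. Everything specific here is an assumption: the variant map, the choice of taking the maximum of each feature when merging, the logistic form of the model, and the weights, none of which are stated in the slides. The feature values are invented and are not the slide's numbers.

```python
import math


def merge_variants(scored, canonical):
    """Merge feature dicts of answer variants, keeping the max of each feature."""
    merged = {}
    for answer, features in scored.items():
        key = canonical.get(answer, answer)
        target = merged.setdefault(key, {})
        for name, value in features.items():
            target[name] = max(value, target.get(name, value))
    return merged


def confidence(features, weights, bias):
    """Logistic model: sigmoid of the weighted sum of feature scores."""
    z = bias + sum(weights[name] * features.get(name, 0.0) for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


# Invented feature scores for three candidate strings.
scored = {
    "Battle Creek": {"tycor": 0.8, "geo": 1.0},
    "Battle Creek, Michigan": {"tycor": 0.7, "geo": 1.0},
    "Post Foods": {"tycor": 0.1, "geo": 0.0},
}
canonical = {"Battle Creek, Michigan": "Battle Creek"}  # hypothetical variant map
weights, bias = {"tycor": 2.0, "geo": 3.0}, -3.0         # hypothetical, not learned

merged = merge_variants(scored, canonical)
ranked = sorted(merged, key=lambda a: confidence(merged[a], weights, bias),
                reverse=True)
```

After merging, the two Battle Creek variants count as one candidate, and the ranking is produced from the confidence values alone.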


“Minimal” Deep QA Pipeline

[Figure: pipeline stages, in order: Question, Question Analysis, Hypothesis Generation (Primary Search), Hypothesis and Evidence Scoring, Final Confidence Merging & Ranking.]

Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city
Answer: Battle Creek

LAT: Michigan City

Document Search Results
R  Title
0  General Foods
1  Battle Creek
2  Post Foods
3  Will Keith Kellogg

Candidate Answers and Evidence Features
Candidate Answer      TyCor  Geo
General Foods         0.1    0
Post Foods            0.1    0
Battle Creek          0.8    1

Final Answers         Confidence
Battle Creek          0.946
Post Foods            0.152
1895                  0.040


A new software paradigm emerging (not that we invented it)

The basic Watson computation is Hypothesis Scoring

How well does an answer fit into a question?

More than 100 different Hypothesis scoring software components

No single scoring component does the whole job

Many of them do very similar jobs

12 typing components, 8 passage alignment components, 10 n-gram components, …

These components are not integrated with each other beyond that they each produce a score for each hypothesis

A machine learning algorithm learns how to combine them to produce a final score

The development methodology involved an incremental approach of producing stable baseline systems and testing changes with “follow-ons”

Changes that improve performance according to our metrics are accepted into the next stable baseline
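The idea that independent scorers are combined by a learned model can be sketched with a small logistic regression trained by gradient descent. This is a stand-in chosen for illustration: the slides only say that a machine learning algorithm learns the combination, and the two features, the training rows, and the hyperparameters below are all invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, n_features, epochs=2000, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent."""
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in examples:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(examples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(examples)
    return w, b


def score(x, w, b):
    """Combined score for one hypothesis given its per-scorer feature values."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Each row: ([typing score, geo score], 1 if the hypothesis was correct else 0).
# Toy training data, invented for illustration.
examples = [
    ([0.8, 1.0], 1), ([0.9, 1.0], 1), ([0.7, 1.0], 1),
    ([0.1, 0.0], 0), ([0.2, 0.0], 0), ([0.9, 0.0], 0), ([0.1, 1.0], 0),
]
w, b = train(examples, n_features=2)
```

The scorers never see each other's output; only the learned weights decide how much each one contributes, so adding a scorer means adding one column and retraining.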


Follow-on development

[Figure: Precision (vertical axis, 0% to 100%) versus % Answered (horizontal axis, 0% to 100%), with one curve per incremental baseline: Baseline, 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010, 11/2010. Annotation: + ~10%]
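A point on a Precision versus % Answered curve can be computed as sketched below, assuming the system answers only the questions on which it is most confident. That reading of the axes, the function name precision_at_answered, and the toy results are assumptions; the slides show only the axis labels.

```python
def precision_at_answered(results, fraction):
    """Precision over the most confident `fraction` of questions.

    results: list of (confidence, is_correct) pairs, one per question.
    """
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    n = max(1, round(fraction * len(ranked)))
    answered = ranked[:n]
    return sum(1 for _, ok in answered if ok) / n


# Toy results, invented for illustration: (confidence of top answer, was it right?)
results = [(0.95, True), (0.90, True), (0.80, False), (0.60, True),
           (0.40, False), (0.30, True), (0.20, False), (0.10, False)]

# A few points along the curve, from answering 25% of questions to all of them.
curve = [(f, precision_at_answered(results, f)) for f in (0.25, 0.5, 0.75, 1.0)]
```

When confidence estimates are informative, precision falls as the fraction answered rises, which is why accurate confidence matters as much as raw accuracy.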









ClassicQA: NOT The Technology Behind Watson

[Figure: classic QA pipeline. Labels: Question, Primary Search, Answer Sources, GOFNLP, Formal Query, Formal Knowledge, Logical Reasoner, Answer & Confidence.]

From the dawn of AI, it was envisioned that question answering would work by having a process that completely translated natural language (content & questions) into an unambiguous (logical) representation, and a reasoning process would run on that representation to produce answers. This vision has never been realized.


into the Gap

[Figure labels: Language, Knowledge, NLP, Semantic Technology, Precision, Recall, Mentions, Brittleness, Acquisition, Scale. Overlay: FAIL]


into the Gap

[Figure labels: Language, Knowledge, NLP, Semantic Technology, Precision, Recall, Mentions, Brittleness, Acquisition, Scale. Overlay: No!]


into the Gap

[Figure labels: Language, Knowledge, NLP, Semantic Technology, Precision, Recall, Mentions, Brittleness, Acquisition, Scale.]

Knowledge is not the destination


into the Gap

[Figure labels: Language, Task (e.g. QA), Parsing, SemTech, ML, IR, NER, LF, Crowds.]


Using Structured Evidence

Useful for explanation data
– Precise and reliable evidence (e.g. spatial / temporal constraint match)

• Exploit wealth of freely available structured information
  • e.g. Linked Open Data (LOD)
  • Types, Relations, Links
• Complement results from unstructured text analysis
  • Classic Precision vs. Recall Tradeoff


Structured Data and Inference in Watson

[Figure: DeepQA architecture diagram. Labels: Question, Question & Topic Analysis, Question Decomposition, Hypothesis Generation, Primary Search, Candidate Answer Generation, Hypothesis and Evidence Scoring, Evidence Retrieval, Evidence Scoring, Synthesis, Final Confidence Merging & Ranking, Answer & Confidence, Evidence Sources, Models.]

LAT Inference (Using PRISMATIC)
– Q: “Annexation of this in 1803..”
– “this” → Region

Relation Detection and Scoring Using Structured KBs
– Q: “This 1997 Titanic hero..” matches <Dicaprio, lead-actor, Titanic>

Answer Typing (Type Coercion)
– LAT: Scottish Inventor
– Answer: James Watt

Anti-Type Coercion
– LAT: Country
– Candidate: Einstein

Spatial Reasoning
– Containment (“This African country..”)
– Relative direction (“This sea east of Florida..”)
– Border (“This state bordering the Great Lakes..”)
– Relative location (“bldg. near Times Square..”)
– Numeric Properties: area/population/height (“This sea, largest in area,..”)

Temporal Reasoning
– Lifespan, Duration

Answer In Clue
– Q: “In 2003, ‘Big Blue’ acquired this company..”
– Downweight IBM

Evidence Diffusion
– Q: “Sunan Intl. Airport is in this country”
– Diffuse evidence from (Pyongyang ->> N Korea)
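The Answer In Clue item can be sketched as a simple penalty applied to any candidate that the clue already names. This assumes an earlier step has resolved “Big Blue” to IBM; the alias list, the penalty value, the function name, and the scores below are hypothetical and not taken from the slides.

```python
def answer_in_clue_penalty(candidate, clue_entities, penalty=0.5):
    """Return a multiplier below 1 if the candidate is already named in the clue."""
    names = {e.lower() for e in clue_entities}
    return penalty if candidate.lower() in names else 1.0


# Entities mentioned in the clue, including the resolved alias (hypothetical).
clue_entities = ["Big Blue", "IBM"]

# Invented pre-penalty scores for two candidates.
scores = {"IBM": 0.6, "Candidate B": 0.5}

adjusted = {c: s * answer_in_clue_penalty(c, clue_entities)
            for c, s in scores.items()}
```

The candidate named in the clue drops below the other one after the penalty, even though it started with the higher score.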


LOD Impact on DeepQA for Typing Answers

[Figure: chart with axis ticks from 61.5% to 66.5%, for an ensemble of TyCor components. Annotation: + ~10%]









And the winner is….

[Figure: Precision (vertical axis, 0% to 100%) versus % Answered (horizontal axis, 0% to 100%).]

And the winner is….not human


Category: FATHERLY NICKNAMES
Clue: THIS FRENCHMAN WAS “THE FATHER OF BACTERIOLOGY”
Response: HOW TASTY WAS MY LITTLE FRENCHMAN


Category: THERE'S A FIRST TIME FOR EVERYTHING
Clue: IN 1824 THIS FIRST FOREIGNER TO ADDRESS A JOINT SESSION OF CONGRESS CONGRATULATED THE U.S. ON ITS GROWTH
Response: President Bush


Category: MUSIC
Clue: WHAT IS THE TEXT OF AN OPERA CALLED?
Response: Michael


Category: HAPPY MEALS
Clue: GRASSHOPPERS EAT PRIMARILY THIS
Response: Kosher


Category: OLYMPIC ODDITIES
Clue: It was the anatomical oddity of U.S. gymnast George Eyser, who won a gold medal on the parallel bars in 1904
Response: Had only one hand








iswc2012.semanticweb.org
CONFIRMED KEYNOTE: TOM MALONE, MIT
PAPER DEADLINES: MID JUNE