
Page 1:

Getting Computers to Understand What They Read (Or Hear)

Christopher Manning

http://nlp.stanford.edu/

Computer Forum 2012

Page 2:

The future was …

A vast quantity of information, contained in knowledge bases, with artificial intelligence systems for reasoning over it.

Page 3:

The future is …

A vast quantity of information in an ugly mess known as The Web.

Page 4:

But it's all indexed and easily searchable, and, for humans, most of the time it actually works amazingly well.

Page 5:

But how can we use it to get computers to do more advanced tasks, ones which require getting knowledge from language and putting facts together?

Page 6:

We need machine reading.

Page 7:

We need more than word counts.

Page 8:

Extracting Knowledge

Textual abstract: A summary for humans

  "The Lawrence Livermore National Laboratory (LLNL) in Livermore, California is a scientific research laboratory founded by the University of California in 1952."

Structured knowledge: A summary for machines (relations between entities)

  LLNL EQ Lawrence Livermore National Laboratory
  LLNL LOC-IN California
  Livermore LOC-IN California
  LLNL IS-A scientific research laboratory
  LLNL FOUNDED-BY University of California
  LLNL FOUNDED-IN 1952
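For concreteness, the structured summary can be stored as (subject, relation, object) triples. The sketch below is illustrative only (it is not the extraction system described in the talk); it puts the LLNL facts into that form and runs a trivial query over them.

```python
# Illustrative only: the structured summary as (subject, relation, object)
# triples, plus a trivial query. Not the extraction system from the talk.
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "relation", "object"])

facts = [
    Triple("LLNL", "EQ", "Lawrence Livermore National Laboratory"),
    Triple("LLNL", "LOC-IN", "California"),
    Triple("Livermore", "LOC-IN", "California"),
    Triple("LLNL", "IS-A", "scientific research laboratory"),
    Triple("LLNL", "FOUNDED-BY", "University of California"),
    Triple("LLNL", "FOUNDED-IN", "1952"),
]

# Once text has been reduced to triples, machines can answer simple queries.
founded_by = [t.object for t in facts
              if t.subject == "LLNL" and t.relation == "FOUNDED-BY"]
print(founded_by)  # ['University of California']
```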

Page 9:

Machine Reading with Distant Supervision
[Mintz et al., ACL 2009; Surdeanu et al. 2011]

• If we had relations marked in texts, we could train a conventional relation extraction system …
• Can we exploit the abundant found information about relations – such as from DBpedia or Freebase – to bootstrap systems for machine reading?
• Method: use the database as "distant supervision" of the text (sketched below)
• The challenge is dealing with the "noise" that enters the picture
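As a hedged sketch of the distant supervision idea (not the actual Mintz et al. or Surdeanu et al. system): every sentence that mentions both arguments of a known knowledge-base fact is treated as a noisy positive training example for that fact's relation.

```python
# Hedged sketch of distant supervision (not the Mintz et al. / Surdeanu et al.
# system): knowledge-base facts label any sentence mentioning both arguments,
# producing plentiful but noisy training data for a relation extractor.

kb_facts = {
    ("Vince McMahon", "WWE"): "FOUNDED",
    ("Fort Erie", "Ontario"): "IS-IN",
}

sentences = [
    "Vince McMahon founded WWE.",
    "Fort Erie is a town in Ontario.",
    "Vince McMahon appeared on WWE television last night.",
]

training_examples = []
for (arg1, arg2), relation in kb_facts.items():
    for sentence in sentences:
        if arg1 in sentence and arg2 in sentence:
            # Every co-occurrence gets the KB relation as its label. The third
            # sentence shows the catch: it mentions both arguments but does not
            # express FOUNDED -- this is the "noise" the slide refers to.
            training_examples.append((sentence, arg1, arg2, relation))

for example in training_examples:
    print(example)
```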

Page 10:

Results

• Precision of extracted facts: about 70%
• New relations learned:

  Montmartre IS-IN Paris
  Fyodor Kamensky DIED-IN Clearwater
  Fort Erie IS-IN Ontario
  Upton Sinclair WROTE Lanny Budd
  Vince McMahon FOUNDED WWE
  Thomas Mellon HAS-PROFESSION Judge

Page 11:

Where syntactic knowledge helps

How useful are syntactic representations for this goal?

  Back Street is a 1932 film made by Universal Pictures, directed by John M. Stahl, and produced by Carl Laemmle Jr.

– Back Street and John M. Stahl are far apart in the surface string
– But they are close together in a dependency parse

Page 12:

Stanford Dependencies as a representation for relation extraction

The little boy jumped over the fence.

  det(boy-3, The-1)
  amod(boy-3, little-2)
  nsubj(jumped-4, boy-3)
  det(fence-7, the-6)
  prep_over(jumped-4, fence-7)

[The slide shows the dependency graph (nsubj, prep_over, det, amod edges) alongside the phrase-structure parse of the sentence.]

[de Marneffe & Manning 2008]
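To make the representation concrete, the following sketch (plain Python, not Stanford CoreNLP or the Stanford Parser) stores the typed dependencies above as an undirected graph and finds the path between two words; this is the sense in which related content words end up close together.

```python
# Plain-Python sketch (not Stanford CoreNLP): the typed dependencies above as
# an undirected graph, with breadth-first search for the path between words.
from collections import deque

dependencies = [                      # (relation, head, dependent)
    ("det", "boy-3", "The-1"),
    ("amod", "boy-3", "little-2"),
    ("nsubj", "jumped-4", "boy-3"),
    ("det", "fence-7", "the-6"),
    ("prep_over", "jumped-4", "fence-7"),
]

graph = {}
for _, head, dependent in dependencies:
    graph.setdefault(head, set()).add(dependent)
    graph.setdefault(dependent, set()).add(head)

def shortest_path(start, goal):
    """Breadth-first search over the dependency graph."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph.get(path[-1], ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

# "boy" and "fence" both attach to "jumped", so the path has length two even
# though the words are separated in the surface string.
print(shortest_path("boy-3", "fence-7"))  # ['boy-3', 'jumped-4', 'fence-7']
```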

Page 13:

Stanford Dependencies as a representation for relation extraction

• Stanford Dependencies favor short paths between related content words

[Björne et al. 2009: over ¾ …]

Page 14:

How do we design a human language understanding system?

• Most systems use a pipeline of processing stages (see the sketch below):
  – Tokenize
  – Part-of-speech
  – Named entities
  – Syntactic parse
  – Semantic roles
  – Coreference
  – …
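As a rough illustration of such a pipeline (the stage functions below are made-up placeholders, not any particular toolkit's API), each stage consumes the previous stage's output:

```python
# Rough sketch of a pipeline architecture. The stage functions are placeholder
# stand-ins invented for illustration, not a real toolkit's API.

def tokenize(text):
    return text.split()                                  # stand-in tokenizer

def tag_parts_of_speech(tokens):
    return [(token, "NN") for token in tokens]           # stand-in POS tagger

def recognize_named_entities(tagged_tokens):
    return [(token, pos, "O") for token, pos in tagged_tokens]  # stand-in NER

PIPELINE = [tokenize, tag_parts_of_speech, recognize_named_entities]

def analyze(text):
    """Run each stage on the previous stage's output, in order."""
    result = text
    for stage in PIPELINE:
        result = stage(result)
    return result

print(analyze("The little boy jumped over the fence."))
```

The joint-inference work on the next slide replaces this strictly sequential arrangement by modeling several of these phases at once.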

Page 15:

Probabilistic joint inference helps component tasks
[Finkel & Manning, NAACL 2009, 2010]

[Chart: Named Entity Recognition F1-score on OntoNotes, by section (ABC, CNN, MNB, NBC, PRI, VOA), comparing the Baseline against Joint Inference.]

• Goal: joint modeling of the many phases of linguistic analysis; here, parsing and named entities
• Fixed 24% of named entity boundary errors and of incorrect label errors
• 22% improvement in parsing scores

Page 16:

How can we understand relationships between pieces of text?

• Can one conclude one piece of text from another?
  – Emphasis is on handling the variability of linguistic expression
• This textual inference technology would enable:
  – Semantic search: lobbyists attempting to bribe U.S. legislators
    The A.P. named two more senators who received contributions engineered by lobbyist Jack Abramoff in return for political favors.
  – Question answering: Who bought J.D. Edwards?
    Thanks to its recent acquisition of J.D. Edwards, Oracle will soon be able …
  – Customer email response
  – Paraphrase and contradiction detection

Page 17:

Natural Logic [MacCartney & Manning 2008, 2009]

• Natural logic attempts to capture valid inferences from their surface linguistic forms
• A revival of Aristotelian syllogistics
• An example (OK, the example is contrived, but it compactly exhibits containment, exclusion, and implicativity …):

  P: Jimmy Dean refused to move without blue jeans.
  H: James Dean didn't dance without pants.
  → yes

Page 18:

7 basic entailment relations

  symbol   name                                     example
  P = Q    equivalence                              couch = sofa
  P ⊏ Q    forward entailment (strict)              crow ⊏ bird
  P ⊐ Q    reverse entailment (strict)              European ⊐ French
  P ^ Q    negation (exhaustive exclusion)          human ^ nonhuman
  P | Q    alternation (non-exhaustive exclusion)   cat | dog
  P _ Q    cover (exhaustive non-exclusion)         animal _ nonhuman
  P # Q    independence                             hungry # hippo

(The original slide also shows a Venn diagram for each relation.)

Relations are defined for all semantic types: tiny ⊏ small, hover ⊏ fly, kick ⊏ strike, this morning ⊏ today, in Beijing ⊏ in China, everyone ⊏ someone, all ⊏ most ⊏ some
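The names in the table have simple set-theoretic readings. The toy sketch below (an illustration over a tiny made-up universe, not part of the original system) classifies a pair of sets into one of the seven relations by checking containment, disjointness, and exhaustiveness:

```python
# Toy illustration (not part of the original system) of the set-theoretic
# readings behind the seven relations, over a tiny made-up universe.

def basic_relation(P, Q, universe):
    disjoint = not (P & Q)
    exhaustive = (P | Q) == universe
    if P == Q:
        return "= (equivalence)"
    if P < Q:
        return "⊏ (forward entailment)"
    if P > Q:
        return "⊐ (reverse entailment)"
    if disjoint and exhaustive:
        return "^ (negation)"
    if disjoint:
        return "| (alternation)"
    if exhaustive:
        return "_ (cover)"
    return "# (independence)"

universe = {"cat", "dog", "person", "crow", "sparrow", "rock"}
human    = {"person"}
nonhuman = universe - human
animal   = universe - {"rock"}
cat, dog = {"cat"}, {"dog"}
crow, bird = {"crow"}, {"crow", "sparrow"}

print(basic_relation(human, nonhuman, universe))   # ^ (negation)
print(basic_relation(cat, dog, universe))          # | (alternation)
print(basic_relation(crow, bird, universe))        # ⊏ (forward entailment)
print(basic_relation(animal, nonhuman, universe))  # _ (cover)
```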

Page 19:

Lexical entailment classification

P: Jimmy Dean refused to move without blue jeans
H: James Dean did n't dance without pants

  edit index   1    2    3    4    5    6    7    8
  edit type    SUB  DEL  INS  INS  SUB  MAT  DEL  SUB
  lex feats    strsim=0.67, implic: –/o, cat:aux, cat:neg, hypo, hyper
  lex entrel   =    |    =    ^    ⊐    =    ⊏    ⊏

Page 20:

Entailment projection

P: Jimmy Dean refused to move without blue jeans
H: James Dean did n't dance without pants

  edit index     1    2    3    4    5    6    7    8
  edit type      SUB  DEL  INS  INS  SUB  MAT  DEL  SUB
  lex feats      strsim=0.67, implic: –/o, cat:aux, cat:neg, hypo, hyper
  lex entrel     =    |    =    ^    ⊐    =    ⊏    ⊏
  projectivity   ↑    ↑    ↑    ↑    ↓    ↓    ↑    ↑
  atomic entrel  =    |    =    ^    ⊏    =    ⊏    ⊏

(Note the inversion under downward projectivity: the lexical ⊐ at edit 5 becomes an atomic ⊏.)

Page 21:

Entailment composition

P: Jimmy Dean refused to move without blue jeans
H: James Dean did n't dance without pants

  edit index     1    2    3    4    5    6    7    8
  edit type      SUB  DEL  INS  INS  SUB  MAT  DEL  SUB
  lex feats      strsim=0.67, implic: –/o, cat:aux, cat:neg, hypo, hyper
  lex entrel     =    |    =    ^    ⊐    =    ⊏    ⊏
  projectivity   ↑    ↑    ↑    ↑    ↓    ↓    ↑    ↑
  atomic entrel  =    |    =    ^    ⊏    =    ⊏    ⊏
  composition    =    |    |    ⊏    ⊏    ⊏    ⊏    ⊏

The last composition value, ⊏, is the final answer: the premise entails the hypothesis. ✓

For example, composing | with ^:
  fish | human
  human ^ nonhuman
  ⇒ fish ⊏ nonhuman
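The composition row is produced by joining adjacent relations left to right. The sketch below encodes only a fragment of the join table, just enough to reproduce the fish/human example above; the full table and the real implementation are in MacCartney & Manning (2008, 2009).

```python
# Illustrative fragment of natural-logic relation composition (join). Only a
# couple of join-table entries are encoded, enough to reproduce the example
# above; the full table is in MacCartney & Manning (2009).

EQ, FWD, REV, NEG, ALT, COV, IND = "=", "⊏", "⊐", "^", "|", "_", "#"

JOIN = {
    (ALT, NEG): FWD,   # x | y and y ^ z  =>  x ⊏ z
    (FWD, FWD): FWD,   # forward entailment is transitive
}

def join(r1, r2):
    if r1 == EQ:       # equivalence leaves the other relation unchanged
        return r2
    if r2 == EQ:
        return r1
    return JOIN.get((r1, r2), IND)   # unlisted combinations: fall back to "#"

# The slide's example: fish | human, human ^ nonhuman  =>  fish ⊏ nonhuman
print(join(ALT, NEG))   # ⊏
```

Running the join cumulatively across the atomic relations in the table is what yields the composition row above.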

Page 22:

Multiword paraphrases

• But this system is not so good at working out "multiword paraphrases":
  – walked inland ≈ moved away from the coast
  – Pollack said the plaintiffs failed to show that Merrill and Blodget directly caused their losses ≈ Basically, the plaintiffs did not show that omissions in Merrill's research caused the claimed losses

Page 23:

Hierarchical Deep Learning: Unsupervised Recursive Autoencoder

Page 24:

Recursive autoencoders capture semantic similarity

Page 25:

Recursive autoencoders for full-sentence paraphrase detection

Experiments on the Microsoft Research Paraphrase Corpus (Dolan et al. 2004)
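To give a sense of the model family, here is a minimal NumPy sketch of a single recursive-autoencoder composition step (an illustration, not the actual system behind these experiments): two child vectors are encoded into a parent vector, which is trained to reconstruct the children, and this is applied bottom-up over a parse tree.

```python
# Minimal NumPy sketch of one recursive-autoencoder composition step
# (illustration only, not the actual system behind these experiments).
import numpy as np

rng = np.random.default_rng(0)
d = 4                                              # toy embedding dimension

W_enc = rng.normal(scale=0.1, size=(d, 2 * d))     # encoder weights
b_enc = np.zeros(d)
W_dec = rng.normal(scale=0.1, size=(2 * d, d))     # decoder weights
b_dec = np.zeros(2 * d)

def encode(child1, child2):
    """Compose two child vectors into a single parent vector."""
    return np.tanh(W_enc @ np.concatenate([child1, child2]) + b_enc)

def reconstruction_error(child1, child2):
    """Autoencoder objective: how well the parent reconstructs its children."""
    parent = encode(child1, child2)
    reconstruction = W_dec @ parent + b_dec
    target = np.concatenate([child1, child2])
    return 0.5 * np.sum((reconstruction - target) ** 2)

# Toy word vectors; in the real model these come from learned embeddings and
# the encoder is applied recursively, bottom-up, over a parse of the sentence.
little, boy = rng.normal(size=d), rng.normal(size=d)
print(reconstruction_error(little, boy))
```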

Page 26:

Language is inherently connected to people

  "… the common misconception [is] that language use has primarily to do with words and what they mean. It doesn't. It has primarily to do with people and what they mean."

  Clark & Schober, 1992, "Asking questions and influencing answers"

Page 27:

What does it mean?

  A: Was the movie good?
  B: Hysterical. We laughed so hard.

  Was it a good movie?   YES / NO?

The outpouring of social language use on the web lets us learn what people mean (as never before).

Page 28:

Review ratings can teach modifier scales

Page 29:

Grounded learning of answer interpretations

  A: Is this hurricane season extraordinary?
  B: Very unusual in the sense of how many storms we've had.

• We learn "contingent oppositions":

  A: Is Obama qualified?
  B: I think he is young.

Page 30:

Envoi

• Probabilistic models have given us very good tools for analyzing human language sentences
• We can extract participants and their relations with good accuracy
• There is exciting work in text understanding and inference based on these foundations
• This provides a basis for computers to do higher-level tasks that involve knowledge & reasoning
• But much work remains to achieve the language competence of science fiction robots …
