discovery of linguistic relations using lexical attraction

73
Discovery of Linguistic Relations Using Lexical Attraction Deniz Yuret

Upload: others

Post on 10-Jan-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Discovery of Linguistic Relations Using Lexical Attraction

Discovery of Linguistic Relations

Using Lexical Attraction

Deniz Yuret

Page 2: Discovery of Linguistic Relations Using Lexical Attraction

Overview

• Motivation

• Demonstration

• Theory, Learning, Algorithm

• Evaluation

• Contributions

Page 3: Discovery of Linguistic Relations Using Lexical Attraction

Syntax and Semantics independently constrain

linguistic relations

• I saw the Statue of Liberty flying over New

York.

– Lenat, 1984

• I hit the boy with the girl with long hair

with a hammer with vengeance.

– Schank, 1973

• Colorless green ideas sleep furiously.

– Chomsky, 1956

Page 4: Discovery of Linguistic Relations Using Lexical Attraction

Contributions of this thesis

• Opening a door for the use of common

sense knowledge in language processing

and acquisition.

• A learning paradigm that bootstraps by

interdigitating learning with processing.

Page 5: Discovery of Linguistic Relations Using Lexical Attraction

Bringing common sense into language

John eats ice−cream

S O

John

ice−cream

eat

Page 6: Discovery of Linguistic Relations Using Lexical Attraction

Bootstrapping by interdigitating learning and

processing

P

M

Page 7: Discovery of Linguistic Relations Using Lexical Attraction

Phrase structure versus dependency

structure

The glorious sun will shine in the winter

DeterminerAdjective Noun

NP

NP2

Aux Verb

VP

VP2

PrepPP

S

Noun

NP2Determiner

NP

The glorious sun will shine in the winter

Page 8: Discovery of Linguistic Relations Using Lexical Attraction

Discovery of Linguistic Relations

An Example

Simple Sentence 1/5

(Before training)

* these people also want more government money for education . *

Page 9: Discovery of Linguistic Relations Using Lexical Attraction

Simple Sentence 2/5

(After 1000 words of training)

* these people also want more government money for education . *

Page 10: Discovery of Linguistic Relations Using Lexical Attraction

Simple Sentence 3/5

(After 10,000 words of training)

* these people also want more government money for education . *

Page 11: Discovery of Linguistic Relations Using Lexical Attraction

Simple Sentence 4/5

(After 100,000 words of training)

* these people also want more government money for education . *

Page 12: Discovery of Linguistic Relations Using Lexical Attraction

Simple Sentence 5/5

(After 1,000,000 words of training)

* these people also want more government money for education . *

Page 13: Discovery of Linguistic Relations Using Lexical Attraction

Bringing common sense into language

The theory

John eats ice−cream

S O

John

ice−cream

eat

Page 14: Discovery of Linguistic Relations Using Lexical Attraction

A Theory of Syntactic Relations

• Lexical attraction is the likelihood of a

syntactic relation

• The context of a word is given by its syn-

tactic relations

• Syntactic relations can be formalized as a

graph

• Entropy is determined by syntactic rela-

tions

Page 15: Discovery of Linguistic Relations Using Lexical Attraction

H = −∑pi log pi

The information content of a word:

The IRA is fighting British rule in Northern Ireland4.20 15.85 7.33 13.27 12.38 13.20 5.80 12.60 14.65

Total: 99.28 bits

Page 16: Discovery of Linguistic Relations Using Lexical Attraction

The word pair and relative information:

Ireland

3.53

Northern

Northern1.48

Ireland

Northern Ireland

12.60

12.60

14.65

14.65

Page 17: Discovery of Linguistic Relations Using Lexical Attraction

The lexical attraction link:

IrelandNorthern12.60 14.65

11.12

Page 18: Discovery of Linguistic Relations Using Lexical Attraction

Language Model Determines the Context

The IRA is fighting British rule in Northern Ireland4.20 12.90 3.73 10.54 8.66 5.96 3.57 9.25 3.53

> > > > > > > >

Total: 99.28 → 62.34 bits

Page 19: Discovery of Linguistic Relations Using Lexical Attraction

Context should be determined by syntactic re-

lations:

The man with the dog spoke?

The man with the dog spoke

Page 20: Discovery of Linguistic Relations Using Lexical Attraction

Context should be determined by syntactic re-

lations:

The IRA is fighting British rule in Northern Ireland1.25 6.60 4.60 13.27 5.13 8.13 2.69 1.48 6.70

<<

<>

<

><

<

Total: 62.34 → 49.85 bits

Page 21: Discovery of Linguistic Relations Using Lexical Attraction

Dependency structure is acyclic:

• Mathematically: cannot use all the lexical

attraction links in a cycle.

• Linguistically: cannot construct a consis-

tent head-modifier structure.

A B C

Page 22: Discovery of Linguistic Relations Using Lexical Attraction

Syntactic relations form a planar tree:

(Links do not cross)

I met the woman in the red dress in the afternoon

I met the woman in the afternoon in the red dress

?

Page 23: Discovery of Linguistic Relations Using Lexical Attraction

Syntactic relations form a planar tree:

(Links do not cross)

• Hays and Lecerf (1960) discovered that

(almost) all sentences in a language are

planar.

• Gaifman (1965) proved that a planar de-

pendency grammar can generate the same

set of languages as a context free gram-

mar.

• Planar trees can be encoded with constant

number of bits per word.

Page 24: Discovery of Linguistic Relations Using Lexical Attraction

Cayley’s formula for counting trees:

T(n) = nn−2

Planar trees are polynomial in n:

The IRA is fighting British rule in Northern Ireland<

<<

><

><

<

Encoding: LPLLPPRLPRLPLPPP

L:10 R:11 P:0

Upper bound: 3 bits per word

Page 25: Discovery of Linguistic Relations Using Lexical Attraction

Lexical attraction is symmetric

The IRA is fighting British rule

The IRA is fighting British rule

The IRA is fighting British rule

Page 26: Discovery of Linguistic Relations Using Lexical Attraction

Lexical attraction is symmetric

S = (W, L, w0)

W = { wi }L = { (wi, wj) }

P(S) = P(L)P(w0)∏

(wi,wj)∈L

P(wj | wi)

= P(L)P(w0)∏

(wi,wj)∈L

P(wi, wj)

P(wi)

= P(L)∏

wi∈W

P(wi)∏

(wi,wj)∈L

P(wi, wj)

P(wi)P(wj)

Page 27: Discovery of Linguistic Relations Using Lexical Attraction

Dependency structure is an undirected, acyclic,

planar graph:

The IRA is fighting British rule in Northern Ireland4.20 15.85 7.33 13.27 12.38 13.20 5.80 12.60 14.65

2.95

9.25

2.73

5.07

7.25

7.95

3.11

11.12

Page 28: Discovery of Linguistic Relations Using Lexical Attraction

Information in a Sentence =

Information in Words

+ Information in the Tree

- Mutual Information in Syntactic Relations

Page 29: Discovery of Linguistic Relations Using Lexical Attraction

The Memory

P

M

Page 30: Discovery of Linguistic Relations Using Lexical Attraction

The memory observes the processor

kick the ball now

kick the ball now

ball nowthekick

Page 31: Discovery of Linguistic Relations Using Lexical Attraction

Learning simple structures

kick the ball now

the ball

ballthe

throw at

with in

kick the ball now

Page 32: Discovery of Linguistic Relations Using Lexical Attraction

Simple structures help see complex

structures

kick the ball now

kick ball now

the

kick the now

ball

Page 33: Discovery of Linguistic Relations Using Lexical Attraction

Learning complex structures

kick the ball now

kick ball now

the

kick the nowball

kick the ball now

Page 34: Discovery of Linguistic Relations Using Lexical Attraction

The Processor

P

M

Page 35: Discovery of Linguistic Relations Using Lexical Attraction

• We need to discover the best linkage.

* these people also want more government money for education . *

Page 36: Discovery of Linguistic Relations Using Lexical Attraction

• Words are read in left to right order.

* these

118

Page 37: Discovery of Linguistic Relations Using Lexical Attraction

• New word considers links with previous

words.

* these people

118

348

Page 38: Discovery of Linguistic Relations Using Lexical Attraction

• Cycles are not allowed.

• Link with minimum score gets rejected.

* these people

118 348

55

Page 39: Discovery of Linguistic Relations Using Lexical Attraction

• Link with negative value not accepted.

* these people also

118 348

−164

Page 40: Discovery of Linguistic Relations Using Lexical Attraction

• Link crossing not allowed.

• Link with minimum score gets eliminated.

* these people also want

118 348178

143

315

Page 41: Discovery of Linguistic Relations Using Lexical Attraction

* these people also want

118 348 143315

261

Page 42: Discovery of Linguistic Relations Using Lexical Attraction

• The two constraints straighten out previ-

ous mistakes by eliminating bad links.

* these people also want more government money

118 348 143315

12653

43

401

Page 43: Discovery of Linguistic Relations Using Lexical Attraction

• Eliminating bad links 2/3

* these people also want more government money

118 348 143315

126

43401

209

Page 44: Discovery of Linguistic Relations Using Lexical Attraction

• Eliminating bad links 3/3

* these people also want more government money

118 348 143315

43401

209

66

Page 45: Discovery of Linguistic Relations Using Lexical Attraction

• New link can knock off old link in cycle.

* these people also want more government money for education

118 348 143315

43401

209

261 258

392

Page 46: Discovery of Linguistic Relations Using Lexical Attraction

• The final result.

* these people also want more government money for education .

118 348 143315

43401

209

261392

107

Page 47: Discovery of Linguistic Relations Using Lexical Attraction

Discovery of Linguistic Relations

Using Lexical Attraction

A demonstration

• Long distance link

• Complex noun phrase

• Syntactic ambiguity

Page 48: Discovery of Linguistic Relations Using Lexical Attraction

Long Distance Link 1/3

(After 1,000 words of training)

* the cause of his death friday was not given . *

Page 49: Discovery of Linguistic Relations Using Lexical Attraction

Long Distance Link 2/3

(After 100,000 words of training)

* the cause of his death friday was not given . *

Page 50: Discovery of Linguistic Relations Using Lexical Attraction

Long Distance Link 3/3

(After 10,000,000 words of training)

* the cause of his death friday was not given . *

Page 51: Discovery of Linguistic Relations Using Lexical Attraction

Complex Noun Phrase 1/4

(After 10,000 words of training)

* the new york stock exchange composite index fell . *

Page 52: Discovery of Linguistic Relations Using Lexical Attraction

Complex Noun Phrase 2/4

(After 100,000 words of training)

* the new york stock exchange composite index fell . *

Page 53: Discovery of Linguistic Relations Using Lexical Attraction

Complex Noun Phrase 3/4

(After 1,000,000 words of training)

* the new york stock exchange composite index fell . *

Page 54: Discovery of Linguistic Relations Using Lexical Attraction

Complex Noun Phrase 4/4

(After 10,000,000 words of training)

* the new york stock exchange composite index fell . *

Page 55: Discovery of Linguistic Relations Using Lexical Attraction

Syntactic Ambiguity 1/3

(After 1,000,000 words of training)

* many people died in the clashes in the west in september . *

Page 56: Discovery of Linguistic Relations Using Lexical Attraction

Syntactic Ambiguity 1/3

(After 10,000,000 words of training)

* many people died in the clashes in the west in september . *

Page 57: Discovery of Linguistic Relations Using Lexical Attraction

Syntactic Ambiguity 2/3

(After 500,000 words of training)

* a number of people protested . *

* the number of people increased . *

Page 58: Discovery of Linguistic Relations Using Lexical Attraction

Syntactic Ambiguity 2/3

(After 5,000,000 words of training)

* a number of people protested . *

* the number of people increased . *

Page 59: Discovery of Linguistic Relations Using Lexical Attraction

Syntactic Ambiguity 3/3

(After 1,000,000 words of training)

* the driver saw the airplane flying over washington . *

* the pilot saw the train flying over washington . *

Page 60: Discovery of Linguistic Relations Using Lexical Attraction

Syntactic Ambiguity 3/3

(After 10,000,000 words of training)

* the driver saw the airplane flying over washington . *

* the pilot saw the train flying over washington . *

Page 61: Discovery of Linguistic Relations Using Lexical Attraction

Results

• Evaluation criteria

• Upper and lower bounds

• Link accuracy

• Related work

Page 62: Discovery of Linguistic Relations Using Lexical Attraction

Evaluation criteria: Content-word links

I saw the mountains flying over New York?

?

People want more money for education? ?

Page 63: Discovery of Linguistic Relations Using Lexical Attraction

Training

• Up to 100 million words of Associated Press

material.

Testing

• 200 out-of-sample sentences.

• Selected from 5000 word vocabulary (90%

of all the words seen in the corpus).

• 3152 words (15.76 words per sentence).

• Hand parsed with 1287 content-word links.

Page 64: Discovery of Linguistic Relations Using Lexical Attraction

Accuracy:

n1 = human links

n2 = program links

n12 = common links

• Precision = n12 / n2

• Recall = n12 / n1

Page 65: Discovery of Linguistic Relations Using Lexical Attraction

Lower bound:

Random lexical attraction → 8.9% precision,

5.4% recall

Linking every adjacent word → 41% recall

Upper bound:

85% of syntactically related pairs have posi-

tive lexical attraction

Page 66: Discovery of Linguistic Relations Using Lexical Attraction

Recording adjacent pairs

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 10 100 1000 10000 100000 1e+06 1e+07 1e+08

Per

cent

age

Number of words trained

Procedure 1: Recording adjacent pairs

PrecisionRecall

Precision = 67%

Recall = 41%

Page 67: Discovery of Linguistic Relations Using Lexical Attraction

Recording all pairs

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 10 100 1000 10000 100000 1e+06 1e+07 1e+08

Per

cent

age

Number of words trained

Procedure 2: Recording all pairs

PrecisionRecall

Precision = 55%

Recall = 48%

Page 68: Discovery of Linguistic Relations Using Lexical Attraction

Using feedback from processor

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 10 100 1000 10000 100000 1e+06 1e+07 1e+08

Per

cent

age

Number of words trained

Procedure 3: Recording pairs selected by processor

PrecisionRecall

Precision = 62%

Recall = 52%

Page 69: Discovery of Linguistic Relations Using Lexical Attraction

Related work

• Magerman and Marcus, 1990

• Lari and Young, 1990

• Pereira and Schabes, 1992

• Briscoe and Waegner, 1992

• Carroll and Charniak, 1992

• Stolcke, 1994

• Chen, 1996

• de Marcken, 1996

Page 70: Discovery of Linguistic Relations Using Lexical Attraction

de Marcken, 1995

S

CP

AP C

A BP

B

S

CP

C

A

BAP

BP

AP => A BPBP => BCP => AP C

AP => ABP => AP BCP => BP C

Page 71: Discovery of Linguistic Relations Using Lexical Attraction

Lessons learned

• Training with words instead of parts of

speech enable the program to learn com-

mon but idiosyncratic usages of words.

• Not committing to early generalizations

prevent the program from making irrecov-

erable mistakes early.

• Using a representation that makes the rel-

evant features (such as syntactic relations)

explicit simplifies learning.

Page 72: Discovery of Linguistic Relations Using Lexical Attraction

Contributions

• Opening a door for common sense in lan-

guage

• Bootstrapping from zero by interdigitat-

ing learning and processing

Page 73: Discovery of Linguistic Relations Using Lexical Attraction

Future Work

• Second degree models

• History mechanism

• Categorization and generalization