Grammatical processing with LFG and XLE
Ron Kaplan
ARDA Symposium, August 2004
Advanced QUestion Answering for INTelligence
Layered Architecture for Question Answering

[Diagram: Text sources and a Question are parsed by XLE/LFG into f-structures (conceptual semantics); a KR mapping converts these into the target KR as Assertions and a Query for the KRR engine; a Match step relates Query and Assertions; Answers, Explanations, and Subqueries flow back through composed f-structure templates and XLE/LFG Generation to produce text to the user.]
Theories: Lexical Functional Grammar, Ambiguity management, Glue Semantics
Resources: English Grammar, Glue lexicon, KR mapping
Infrastructure: XLE, MaxEnt models, Linear deduction, Term rewriting
Deep analysis matters… if you care about the answer
Example: A delegation led by Vice President Philips, head of the chemical division, flew to Chicago a week after the incident.

Question: Who flew to Chicago?

Candidate answers:
- division (closest noun), head (next closest), V.P. Philips (next): shallow but wrong
- delegation (furthest away, but the Subject of flew): deep and right

The key notion is "grammatical function".
F-structure: localizes arguments

Was John pleased?
"John was easy to please": Yes
  PRED 'easy<SUBJ, COMP>'  SUBJ John
  COMP [PRED 'please<SUBJ, OBJ>'  SUBJ someone  OBJ John]
"John was eager to please": Unknown
  PRED 'eager<SUBJ, COMP>'  SUBJ John
  COMP [PRED 'please<SUBJ, OBJ>'  SUBJ John  OBJ someone]

The key notion is "lexical dependency".
Topics

- Basic LFG architecture
- Ambiguity management in XLE
- Pargram project: large-scale grammars
- Robustness
- Stochastic disambiguation
- [Shallow markup]
- [Semantic interpretation]

Focus on the language end, not knowledge.
The Language Mapping: LFG & XLE

"Tony decided to go."

[Diagram: a sentence (English, German, etc.) passes through tokens and morphology into XLE, which parses it with an LFG grammar (plus named entities and a stochastic model) into functional structures, and generates sentences from them; the f-structures connect onward to knowledge. XLE provides efficient ambiguity management.]
Why deep analysis is difficult

Languages are hard to describe:
- Meaning depends on complex properties of words and sequences
- Different languages rely on different properties
- Errors and disfluencies

Languages are hard to compute:
- Expensive to recognize complex patterns
- Sentences are ambiguous
- Ambiguities multiply: explosion in time and space
Different patterns code same meaning

English (group, order): The small children are chasing the dog.
  [Tree: S over NP (Det the, Adj small, N children), Aux are, V' (V chasing, NP (Det the, N dog))]

Japanese (group, mark): tiisai 'small', kodomotati 'children', ga (Sbj), inu 'dog', o (Obj), oikaketeiru 'are chasing'
  [Tree: S over NP+ga (Adj tiisai, N kodomotati), NP+o (N inu), V oikaketeiru]
Warlpiri (mark only): witajarrarlu 'small-Sbj', maliki 'dog-Obj', kurdujarrarlu 'children-Sbj', wajilipinyi 'chase', kapala (Present)
  [Tree: S over NP, Aux, V, and further NPs, each word marked for its function]

All express: chase(small(children), dog)

Common f-structure:
  Pred 'chase<Subj, Obj>'
  Subj [Pred 'children', Mod 'small']
  Obj  [Pred 'dog']
  Tense Present

LFG theory: minor adjustments on a universal theme.
LFG architecture

C(onstituent)-structures and F(unctional)-structures, related by a piecewise correspondence:

  [Tree: S over NP (John) and VP (V likes, NP (Mary))]

  SUBJ  [PRED 'John', NUM SG]
  TENSE PRESENT
  PRED  'like<SUBJ,OBJ>'
  OBJ   [PRED 'Mary', NUM SG]

C-structure: formal encoding of order and grouping.
F-structure: formal encoding of grammatical relations.
Modularity: nearly decomposable.
LFG grammar

Rules:
  S  → NP           VP
       (↑ SUBJ)=↓    ↑=↓
  VP → V     (NP)
       ↑=↓    (↑ OBJ)=↓
  NP → (Det)   N
       ↑=↓     ↑=↓

Lexical entries:
  John   N  (↑ PRED)='John'
            (↑ NUM)=SG
  likes  V  (↑ PRED)='like<SUBJ, OBJ>'
            (↑ SUBJ NUM)=SG
            (↑ SUBJ PERS)=3

Context-free rules define valid c-structures (trees). Annotations on rules give constraints that the corresponding f-structures must satisfy. Satisfiability of the constraints determines grammaticality. The f-structure is the solution of the constraints (if satisfied).
Rules as well-formedness conditions

  S → NP           VP
      (↑ SUBJ)=↓    ↑=↓

A tree containing S over NP - VP is OK if the f-unit corresponding to the NP node is the SUBJ of the f-unit corresponding to the S node, and the same f-unit corresponds to both the S and VP nodes.

If * denotes a particular daughter node:
  ↑ : f-structure of the mother, M(*)
  ↓ : f-structure of the daughter, *
Inconsistent equations = Ungrammatical

What's wrong with "They walks"?

  [Tree: S over NP (they) and VP (walks)]

  S → NP           VP         they:  (↑ NUM)=PL
      (↑ SUBJ)=↓    ↑=↓        walks: (↑ SUBJ NUM)=SG

Let f be the (unknown) f-structure of the S, s the f-structure of the NP, and v the f-structure of the VP. Then (substituting equals for equals):

  (f SUBJ) = s and (s NUM)=PL  =>  (f SUBJ NUM)=PL
  f = v and (v SUBJ NUM)=SG    =>  (f SUBJ NUM)=SG
  (f SUBJ NUM)=PL and (f SUBJ NUM)=SG  =>  SG=PL  =>  FALSE

If a valid inference chain yields FALSE, the premises are unsatisfiable: no f-structure.
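To make the inference concrete, here is a minimal Python sketch of inconsistency detection over path equations. The nested-dict representation and the function name assert_eq are illustrative assumptions, not XLE's actual solver.

```python
# Minimal sketch of f-structure constraint solving (illustrative, not XLE's solver).
# An f-structure is a nested dict; asserting a path equation either extends the
# structure or, on a conflicting atomic value, signals unsatisfiability.

def assert_eq(fstruct, path, value):
    """Assert (f path) = value; return False on an inconsistent equation."""
    node = fstruct
    for attr in path[:-1]:
        node = node.setdefault(attr, {})
    last = path[-1]
    if last in node and node[last] != value:
        return False          # e.g. SG = PL  =>  FALSE
    node[last] = value
    return True

f = {}
ok = assert_eq(f, ("SUBJ", "NUM"), "PL")          # from "they":  (f SUBJ NUM)=PL
ok = ok and assert_eq(f, ("SUBJ", "NUM"), "SG")   # from "walks": (f SUBJ NUM)=SG
print("grammatical" if ok else "ungrammatical")   # -> ungrammatical
```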
English and Japanese

Japanese: any number of NPs before the verb; the particle on each defines its grammatical function.

  S → NP*             V
      (↑ (↓ GF))=↓     ↑=↓

  ga: (↑ GF)=SUBJ
  o:  (↑ GF)=OBJ

English: one NP before the verb, one after: Subject and Object.

  S → NP           V     NP
      (↑ SUBJ)=↓    ↑=↓   (↑ OBJ)=↓
Warlpiri: Discontinuous constituents

  witajarrarlu 'small-Sbj', maliki 'dog-Obj', kurdujarrarlu 'children-Sbj', wajilipinyi 'chase', kapala (Present)
  [Tree: S over NP, Aux, V, and further NPs, with the Subject NPs discontinuous]

  PRED 'chase<Subj, Obj>'
  SUBJ [PRED 'children', MOD 'small']
  OBJ  [PRED 'dog']
  TENSE Present

Like Japanese: any number of NPs; the particle on each defines its grammatical function.

  rlu: (↑ GF)=SUBJ
  ki:  (↑ GF)=OBJ

  S → … NP* …
        (↑ (↓ GF))=↓

Unlike Japanese, the head Noun is optional in NP:

  NP → A*           (N)
       ↓∈(↑ MOD)     ↑=↓
English: Discontinuity in questions

  Who did Mary see?
  Who did Bill think Mary saw?
  Who did Bill think saw Mary?

Who is understood as subject/object of a distant verb. Uncertainty: which function of which verb?

  [Tree: S' over NP (Who) and S (Aux did, NP Bill, V think, S (NP Mary, V saw))]

  Q     Who
  TENSE past
  PRED  'think<SUBJ, COMP>'
  COMP  [PRED 'see<SUBJ,OBJ>', TENSE past, SUBJ Mary, OBJ (= Q)]

Rule with functional uncertainty:

  S' → NP                      S
       (↑ Q)=↓                  ↑=↓
       (↑ COMP* {SUBJ|OBJ})=↓

Paths instantiated in the examples: OBJ, COMP OBJ, COMP SUBJ.
Summary: Lexical Functional Grammar

Modular: c-structure/f-structure in correspondence. Mathematically simple, computationally transparent:
- Combination of context-free grammar and quantifier-free equality theory
- Closed under composition with regular relations: finite-state morphology

Grammatical functions are universal primitives:
- Subject and Object are expressed differently in different languages. English: Subject is the first NP. Japanese: Subject has ga.
- But Subject and Object behave similarly in all languages. Active to Passive: Object becomes Subject. English: move words. Japanese: move ga.

Adopted by a world-wide community of linguists:
- Large literature: papers, (text)books, conferences; reference theory
- (Relatively) easy to describe all languages
- Linguists contribute to practical computation

Stable: only minor changes in 25 years (Kaplan and Bresnan, 1982).
Efficient computation with LFG grammars: Ambiguity Management in XLE

Computation challenge: Pervasive ambiguity
walks: Noun or Verb? untieable knot: (untie)able or un(tieable)? bank: river or financial?
The duck is ready to eat. Cooked or hungry?
Every proposer wants an award. The same award or each their own?
I like Jan. |Jan|.| or |Jan.|.| (sentence end or abbreviation)
The sheet broke the beam. Atoms or photons?

Ambiguity arises at every level: Tokenization, Morphology, Syntax, Semantics, Knowledge.
Coverage vs. Ambiguity

I fell in the park.
I know the girl in the park.
I see the girl in the park.
Ambiguity can be explosive

If alternatives multiply within or across components: Tokenize → Morphology → Syntax → Semantics → Knowledge.
Computational consequences of ambiguity

Serious problem for computational systems:
- Broad-coverage, hand-written grammars frequently produce thousands of analyses, sometimes millions
- Machine-learned grammars easily produce hundreds of thousands of analyses if allowed to parse to completion

Three approaches to ambiguity management:
- Prune: block unlikely analysis paths early
- Procrastinate: do not expand alternative analysis paths until something else requires them (also known as underspecification)
- Manage: compact representation and computation of all possible analyses
Pruning ⇒ Premature Disambiguation

Conventional approach: use statistics-driven heuristics to kill alternatives as soon as possible, at each stage of the pipeline (Tokenize, Morphology, Syntax, Semantics, Knowledge).

Oops: strong constraints downstream may reject the so-far-best (= only) option.

Fast computation, wrong result.
Procrastination: Passing the Buck

Chunk parsing as an example:
- Collect noun groups, verb groups, PP groups
- Leave it to later processing to put these together
- Some combinations are nonsense

Later processing must either:
- Call (another) parser to check constraints
- Have its own model of constraints (= grammar)
- Solve the constraints that the chunker includes with its output
Computational Complexity of LFG

LFG is a simple combination of two simple theories:
- Context-free grammars for trees
- Quantifier-free theory of equality for f-structures

Both theories are easy to compute:
- Cubic CFG parsing
- Linear equation solving

But the combination is difficult: the parsing problem is NP complete.
- Exponential/intractable in the worst case (but computable, unlike some other linguistic theories)
- Can we avoid the worst case?
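The cubic bound for the context-free half can be made concrete with a CKY-style recognizer. This toy Python version (binary rules only, with a hypothetical grammar fragment) is a sketch of the idea, not XLE's chart parser.

```python
# Toy CKY recognizer: O(n^3) in sentence length for a binarized CFG.
# Hypothetical grammar; real grammars may map an RHS pair to several categories.

from collections import defaultdict

lexical = {"they": {"NP"}, "walk": {"V"}, "dogs": {"N"}, "the": {"Det"}}
binary = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}

def cky_recognize(words, start="S"):
    n = len(words)
    chart = defaultdict(set)                  # chart[i, j]: categories over words[i:j]
    for i, w in enumerate(words):
        chart[i, i + 1] |= lexical.get(w, set())
    for width in range(2, n + 1):             # O(n) span widths
        for i in range(n - width + 1):        # O(n) start positions
            for k in range(i + 1, i + width): # O(n) split points
                for b in chart[i, k]:
                    for c in chart[k, i + width]:
                        if (b, c) in binary:
                            chart[i, i + width].add(binary[(b, c)])
    return start in chart[0, n]

print(cky_recognize("the dogs walk the dogs".split()))  # True
```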
Some syntactic dependencies

Local dependencies: These dogs / *This dogs (agreement)
Nested dependencies: The dogs [in the park] bark (agreement)
Cross-serial dependencies: Jan Piet Marie zag helpen zwemmen (predicate/argument map)
  See(Jan, help(Piet, swim(Marie)))
Long-distance dependencies: The girl who John says that Bob believes … likes Henry left.
  Left(girl), Says(John, believes(Bob, (… likes(girl, Henry))))
Expressiveness vs. complexity

The Chomsky Hierarchy (n is the length of the sentence):

  Type               Dependency                       Computational complexity
  Regular            Local                            O(n)     linear
  Context-free       Nested                           O(n^3)   cubic
  Context-sensitive  Cross-serial and long-distance   O(2^n)   exponential: intractable!

But languages have mostly local and nested dependencies... so (mostly) cubic performance should be possible.
NP Complete Problems

Problems that can be solved by a Nondeterministic Turing Machine in polynomial time. General characterization: generate and test.
- Lots of candidate solutions that need to be verified for correctness (n elements, 2^n candidates)
- Every candidate is easy to confirm or disconfirm

A nondeterministic TM has an oracle that provides only the right candidates to test; it doesn't search. A deterministic TM doesn't have an oracle and must test all (exponentially many) candidates.
Polynomial search problems

- Subparts of a candidate are independent of other parts: the outcome is not influenced by other parts (context-free)
- The same independent subparts appear in many candidates
- We can (easily) determine that this is the case
- Consequence: test subparts independent of context, share the results
Why is LFG parsing NP Complete?

A classic generate-and-test search problem:
- Exponentially many tree candidates: a CFG chart parser quickly produces a packed representation of all trees, but a CFG can be exponentially ambiguous
- Each tree must be tested for f-structure satisfiability: exponentially many exponential problems
- Boolean combinations of per-tree constraints. English base verbs are "not 3rd singular":
    (↑ SUBJ NUM)≠SG ∨ (↑ SUBJ PERS)≠3
  Disjunction!
XLE Ambiguity Management: The intuition

The sheep saw the fish. How many sheep? How many fish?

Options multiplied out:
  The sheep-sg saw the fish-sg.
  The sheep-pl saw the fish-sg.
  The sheep-sg saw the fish-pl.
  The sheep-pl saw the fish-pl.

Options packed:
  The sheep {sg|pl} saw the fish {sg|pl}

A packed representation is a "free choice" system:
- Encodes all dependencies without loss of information
- Common items are represented and computed once
- Key to practical efficiency

In principle, a verb might require agreement of Subject and Object, so we would have to check. But English doesn't do that: the subparts are independent.

… but in general this naive packing is wrong: it doesn't encode all dependencies, and choices are not free.
Dependent choices

  Das Mädchen-nom sah die Katze-nom    bad
  Das Mädchen-nom sah die Katze-acc    The girl saw the cat
  Das Mädchen-acc sah die Katze-nom    The cat saw the girl
  Das Mädchen-acc sah die Katze-acc    bad

Packed: Das Mädchen {nom|acc} sah die Katze {nom|acc}

Again, packing avoids duplication, but here the choices are dependent.

Another example: Who do you want to succeed?
  I want to succeed John (want intransitive, succeed transitive)
  I want John to succeed (want transitive, succeed intransitive)
Solution: Label dependent choices

- Label each choice with distinct Boolean variables p, q, etc.
- Record the acceptable combinations as a Boolean expression φ
- Each analysis corresponds to a satisfying truth-value assignment (free choice from the true lines of φ's truth table)

  Das Mädchen {p:nom | ¬p:acc} sah die Katze {q:nom | ¬q:acc}

  φ = (p ∧ ¬q) ∨ (¬p ∧ q)
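A sketch of the free-choice reading of labeled contexts: each choice is a Boolean variable, the acceptable combinations are a Boolean condition, and every satisfying assignment is one analysis. The Python names below are illustrative.

```python
# Packed German example: p labels the case of "das Mädchen", q the case of
# "die Katze". Valid analyses are exactly the satisfying assignments of phi.

from itertools import product

def case(np, nom_flag):                # a choice variable selects one alternative
    return f"{np}-{'nom' if nom_flag else 'acc'}"

def phi(p, q):                         # exactly one nominative
    return (p and not q) or (not p and q)

for p, q in product([True, False], repeat=2):
    reading = f"das {case('Mädchen', p)} sah die {case('Katze', q)}"
    print(reading, "->", "OK" if phi(p, q) else "bad")
```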
Boolean Satisfiability

Can solve Boolean formulas by multiplying out: Disjunctive Normal Form.

  (a ∨ b) ∧ (c ∨ d)  ⇒  (a ∧ c) ∨ (a ∧ d) ∨ (b ∧ c) ∨ (b ∧ d)

This produces simple conjunctions of literal propositions ("facts": equations), with easy checks for satisfiability: if a ∧ d ⇒ FALSE, replace any conjunction containing a and d by FALSE.

But: the disjunctive structure blows up before fact processing. Individual facts are replicated (and re-processed): exponential.
Alternative: "Contexted" normal form

Produce a flat conjunction of contexted facts (context→fact):

  (a ∨ b) ∧ (c ∨ d)  ⇒  (p→a) ∧ (¬p→b) ∧ (q→c) ∧ (¬q→d)

- Each fact is labeled with its position in the disjunctive structure
- The Boolean hierarchy is discarded

No blow-up, no duplicates:
- Each fact appears, and can be processed, once
- Claims: checks for satisfiability are still easy; facts can be processed first, disjunctions deferred
The conversion yields a logically equivalent contexted form.

Lemma: a ∨ b iff (p→a) ∧ (¬p→b), where p is a new Boolean variable.

Proof: (If) If a ∨ b is true, then a or b is true. If a, let p be true; if b, let p be false. Either way (p→a) ∧ (¬p→b) is satisfied. (Only if) If (p→a) ∧ (¬p→b) is true, then a is true when p is, and b is true when p is not; in either case a ∨ b is true.
A sound and complete method (Maxwell & Kaplan, 1987, 1991)

Ambiguity-enabled inference (by trivial logic): if φ, ψ ⊢ χ is a rule of inference, then so is

  C1→φ, C2→ψ ⊢ (C1 ∧ C2)→χ

E.g., substitution of equals for equals: x=y, φ ⊢ φ[x/y] is a rule of inference. Therefore:

  C1→(x=y), C2→φ ⊢ (C1 ∧ C2)→φ[x/y]

Valid for any theory.
Test for satisfiability

- Perform all fact-inferences, conjoining contexts
- If FALSE is inferred, add its context to the nogoods
- Solve the conjunction of the nogoods
  - Boolean satisfiability: exponential in the nogood context-Booleans
  - Independent facts: no FALSE, no nogoods

E.g., R→(SG=PL) ⇒ R→FALSE. R is called a "nogood" context.

Suppose R→FALSE is deduced from a contexted formula φ. Then φ is satisfiable only if ¬R.

The method implicitly notices independence/context-freeness.
Example 1

"They walk":
- No disjunction; all facts are in the default "True" context
- No change to inference:
    T→(f SUBJ NUM)=PL ∧ T→(f SUBJ NUM)=PL ⊢ T→(PL=PL)
  which reduces to: (f SUBJ NUM)=PL ∧ (f SUBJ NUM)=PL ⊢ PL=PL

"They walks":
- No disjunction; all facts are still in the default "True" context
- No change to inference:
    T→(f SUBJ NUM)=PL ∧ T→(f SUBJ NUM)=SG ⊢ T→(PL=SG) ⊢ T→FALSE
- Satisfiable only if ¬T; but T always holds, so unsatisfiable
Example 2

"The sheep walks":
- Disjunction of the NUM feature from sheep: (f SUBJ NUM)=SG ∨ (f SUBJ NUM)=PL
- Contexted facts: p→(f SUBJ NUM)=SG ∧ ¬p→(f SUBJ NUM)=PL
- Plus (f SUBJ NUM)=SG (from walks)
- Inferences:
    p→(f SUBJ NUM)=SG ∧ (f SUBJ NUM)=SG ⊢ p→(SG=SG)
    ¬p→(f SUBJ NUM)=PL ∧ (f SUBJ NUM)=SG ⊢ ¬p→(PL=SG) ⊢ ¬p→FALSE
- ¬p→FALSE is true iff ¬p is false, iff p is true. Conclusion: the sentence is grammatical in context p: only 1 sheep.
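The nogood computation for Example 2 can be sketched directly: process every pair of contexted facts once, conjoin their contexts, and record a nogood whenever the facts clash. This is illustrative Python, not the XLE implementation.

```python
# Contexted facts for "The sheep walks": a context is a dict of variable
# assignments ({} = the always-true context T).

from itertools import combinations

facts = [
    ({"p": True},  ("f SUBJ NUM", "SG")),   # p  -> (f SUBJ NUM)=SG  (sheep, sg)
    ({"p": False}, ("f SUBJ NUM", "PL")),   # ~p -> (f SUBJ NUM)=PL  (sheep, pl)
    ({},           ("f SUBJ NUM", "SG")),   # T  -> (f SUBJ NUM)=SG  (walks)
]

def conjoin(c1, c2):
    """Conjoin two contexts; return None if they contradict (p and ~p)."""
    merged = dict(c1)
    for var, val in c2.items():
        if merged.setdefault(var, val) != val:
            return None
    return merged

nogoods = []
for (c1, (path1, v1)), (c2, (path2, v2)) in combinations(facts, 2):
    ctx = conjoin(c1, c2)
    if ctx is not None and path1 == path2 and v1 != v2:
        nogoods.append(ctx)                 # e.g. ~p: PL=SG -> FALSE

print(nogoods)   # [{'p': False}] -- only p (one sheep) is grammatical
```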
Contexts and packing: Index by facts

The sheep saw the fish.

Contexted unification ≈ concatenation, when choices don't interact:

  SUBJ [NUM {p:SG | ¬p:PL}]
  OBJ  [NUM {q:SG | ¬q:PL}]
Compare: DNF unification

The sheep saw the fish.

DNF cross-product of alternatives: exponential.

  [SUBJ [NUM SG]] or [SUBJ [NUM PL]], crossed with [OBJ [NUM SG]] or [OBJ [NUM PL]], yields:

  SUBJ [NUM SG], OBJ [NUM SG]
  SUBJ [NUM SG], OBJ [NUM PL]
  SUBJ [NUM PL], OBJ [NUM SG]
  SUBJ [NUM PL], OBJ [NUM PL]
The XLE wager (for real sentences of real languages)

- Alternatives from distant choice-sets can be freely chosen without affecting satisfiability: FALSE is unlikely to appear
- The contexted method optimizes for independence: no FALSE, no nogoods, nothing to solve

Bet: the worst case 2^n reduces to k·2^m where m << n.
Ambiguity-enabled inference: Choice-logic common to all modules

If φ, ψ ⊢ χ is a rule of inference, then so is C1→φ, C2→ψ ⊢ (C1 ∧ C2)→χ.

1. Substitution of equals for equals (e.g. for LFG syntax): x=y, φ ⊢ φ[x/y]
   Therefore: C1→(x=y), C2→φ ⊢ (C1 ∧ C2)→φ[x/y]

2. Reasoning: Cause(x,y), Prevent(y,z) ⊢ Prevent(x,z)
   Therefore: C1→Cause(x,y), C2→Prevent(y,z) ⊢ (C1 ∧ C2)→Prevent(x,z)

3. Log-linear disambiguation: Prop1(x), Prop2(x) ⊢ Count(Feature_n)
   Therefore: C1→Prop1(x), C2→Prop2(x) ⊢ (C1 ∧ C2)→Count(Feature_n)

Ambiguity-enabled components propagate choices and can defer choosing and enumerating. A sketch of the lifting follows.
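The lifting of any inference rule to a contexted rule is mechanical, as this sketch suggests: wrap the base rule so that conclusions carry the conjunction of the premise contexts. All names here are illustrative.

```python
# Generic ambiguity-enabling wrapper: if rule(p1, p2) -> conclusion is valid,
# then from C1->p1 and C2->p2 we may conclude (C1 & C2) -> conclusion.
# Contexts are modeled as frozensets of choice literals.

def contexted(rule):
    def lifted(cf1, cf2):
        (c1, p1), (c2, p2) = cf1, cf2
        conclusion = rule(p1, p2)
        return (c1 | c2, conclusion) if conclusion is not None else None
    return lifted

def cause_prevent(fact1, fact2):
    """Cause(x,y), Prevent(y,z) => Prevent(x,z); None if the rule doesn't apply."""
    if fact1[0] == "Cause" and fact2[0] == "Prevent" and fact1[2] == fact2[1]:
        return ("Prevent", fact1[1], fact2[2])
    return None

lifted = contexted(cause_prevent)
print(lifted((frozenset({"C1"}), ("Cause", "a", "b")),
             (frozenset({"C2"}), ("Prevent", "b", "c"))))
# (frozenset({'C1', 'C2'}), ('Prevent', 'a', 'c'))
```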
Summary: Contexted constraint satisfaction

Packed:
- facts not duplicated
- facts not hidden in Boolean structure

Efficient:
- deductions not duplicated
- fast fact processing (e.g. equality) can prune slow disjunctive processing
- optimized for independence

General and simple:
- applies to any deductive system, uniform across modules
- not limited to special-case disjunctions
- mathematically trivial

Compositional free-choice system:
- enumeration of (exponentially many?) valid solutions deferred across module boundaries
- enables backtrack-free, linear-time, on-demand enumeration
- enables packed refinement by cross-module constraints: new nogoods
The remaining exponential

Contexted constraint satisfaction (typically) avoids the Boolean explosion in solving the f-structure constraints of single trees. How can we also suppress tree enumeration (and still determine satisfiability)?
Ordering strategy: Easy things first

- Do all c-structure processing before any f-structure processing: the chart is a free-choice representation that guarantees valid trees
- Only produce/solve f-structure constraints for constituents in complete, well-formed trees
- [NB: interleaved, bottom-up pruning is a bad idea]
- This bets on inconsistency, not independence
Asking the right question

How can we make it faster?
- More efficient unifier: undoable operations, better indexing, clever data structures, compiling
- Reordering for more effective pruning

Why not cubic?
- Intuitively, the problem isn't that hard
- GPSG: natural language is nearly context-free
- Surely for context-free-equivalent grammars!
No f-structure filtering, no nogoods... but still explosive

An LFG grammar for a context-free language:

  S → S        S           S → a
      (↑ L)=↓   (↑ R)=↓         (↑ A)=+

[Chart: the trees over a string of a's are packed, but the f-structures enumerate the trees: each tree yields a distinct nesting of L [A +] and R [A +] features.]
Disjunctive lazy copy

- Pack the functional information from alternative local subtrees
- Unpack/copy to higher consumers only on demand

[Diagram: alternative S subtrees with packed f-structures p:f1, q:f2, r:f3 and p:f6, q:f5, r:f4; the (↑ L)=↓ annotation on S doesn't access internal features, so nothing needs to be copied up.]

This automatically takes advantage of context-freeness, without grammar analysis or compilation.
The XLE wager

Most feature dependencies are restricted to local subtrees:
- mother/daughter/sister interactions
- maybe a grandmother now and then
- very rarely span an unbounded distance

Optimize for the local case:
- bounded computation per subtree gives a cubic curve
- graceful degradation with non-local interactions
- … but still correct
Packing Equalities in F-structure

  "Visiting relatives is boring."

[Tree: S over NP (visiting relatives) and VP (V is, Adj boring). The subject has two analyses: A1, the gerund reading, with (↑ NUM)=sg; A2, the plural-noun reading, with (↑ NUM)=pl. The verb is contributes (↑ SUBJ NUM)=sg in the default context T.]

  T  → (SUBJ NUM)=sg
  A1 → (SUBJ NUM)=sg
  A2 → (SUBJ NUM)=pl

  T ∧ A1 → sg=sg
  T ∧ A2 → sg=pl  ⇒  nogood(A2)
XLE Performance: HomeCentre Corpus

[Plots, about 1100 English sentences: parse time (secs) vs. sentence length (words); local subtrees vs. words; and time (secs) vs. local subtrees, roughly linear at 2.1 ms/subtree, R^2 = .79.]

Time is ~linear in subtrees: nearly cubic overall.
French HomeCentre

[Plot: time (secs) vs. local subtrees; 3.3 ms/subtree, R^2 = .80.]
German HomeCentre

[Plot: time (secs) vs. local subtrees; 3.8 ms/subtree, R^2 = .44.]
Generation with LFG/XLE

- Parse: string → c-structure → f-structure
- Generate: f-structure → c-structure → string
- Same grammar: shared development and maintenance
- Formal criterion: s ∈ Gen(Parse(s))
- Practical criterion: don't generate everything
  - Parsing robustness → undesired strings, needless ambiguity
  - Use optimality marks to restrict the generation grammar
  - Restricted (un)tokenizing transducer: don't allow arbitrary white space, etc.
Mathematics and Computation

Formal properties: Gen(f) is a (possibly infinite) set.
- Equality is idempotent: x=y ∧ x=y ⇔ x=y
- Longer strings with redundant equations map to the same f-structure

What kind of set? A context-free language (Kaplan & Wedekind, 2000).

Computation: XLE/LFG generation converts the LFG grammar to a CFG only for the strings that map to the given f-structure.
- NP complete, ambiguity managed (as usual)
- All strings in the CFL are grammatical w.r.t. the LFG grammar
- Composition with regular relations is crucial

The CFG is a packed, free-choice representation of all strings:
- Can use ordinary CF generation algorithms to enumerate strings
- Can defer enumeration and hand the CFG to a client to enumerate
- Can apply other context-free technology:
  - Choose the shortest string (see the sketch below)
  - Reduce to a finite set of unpumped strings (context-free Pumping Lemma)
  - Choose the most probable (for fluency, not grammaticality)
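Choosing the shortest string is an easy dynamic program over the packed CFG. A Python sketch with a hypothetical toy grammar (the fixpoint keeps the shortest known yield for each nonterminal):

```python
# Choose the shortest string generated by a CFG (the packed free-choice
# representation of all realizations). Toy grammar for illustration.

grammar = {                       # nonterminal -> list of right-hand sides
    "S":  [["NP", "VP"]],
    "NP": [["john"], ["the", "N"]],
    "N":  [["girl"], ["girl", "PP"]],
    "PP": [["in", "NP"]],
    "VP": [["walks"]],
}

def shortest_string(grammar, start="S"):
    best = {}                     # nonterminal -> shortest yield (list of words)
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                parts = [best.get(sym, [sym] if sym not in grammar else None)
                         for sym in rhs]
                if any(p is None for p in parts):
                    continue      # some nonterminal not yet derivable
                candidate = [w for p in parts for w in p]
                if nt not in best or len(candidate) < len(best[nt]):
                    best[nt] = candidate
                    changed = True
    return " ".join(best[start])

print(shortest_string(grammar))   # "john walks"
```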
Generating from incomplete f-structures

Grammatical features can't be read from:
- Back-end question-answering logic
- F-structures translated from other languages

Generating from a bounded underspecification of a complete f-structure is still context-free.
- Example: a skeleton of predicates
- Proof: CFLs are closed under union; bounded extensions produce finite alternatives

Generation from arbitrary underspecification is undecidable: it reduces to an undecidable emptiness problem (= Hilbert's 10th). (Dymetman, van Noord, Wedekind, Roach)
Question: What is the graph partitioning problem?
- Generated queries: "The graph partitioning problem is *"
- Answer (Google): The graph partitioning problem is defined as dividing a graph into disjoint subsets of nodes …

Question: When were the Rolling Stones formed?
- Generated queries: "The Rolling Stones were formed *", "* formed the Rolling Stones *"
- Answer (Google): Mick Jagger, Keith Richards, Brian Jones, Bill Wyman, and Charlie Watts formed the Rolling Stones in 1962.
A (light-weight?) approach to QA

Analyze the question, then anticipate and search for possible answer phrases: Question → Parse → F-structure → Ask → Generate → Queries → Search.
Pipeline for Answer Anticipation

[Diagram: Question → Parser (English grammar) → question f-structures → Convert → answer f-structures → Generator (English grammar) → Answer Phrases → Search (Google...).]
Grammar engineering: The Parallel Grammar Project

Pargram project: large-scale LFG grammars for several languages.
- English, German, Japanese, French, Norwegian
- Coming along: Korean, Urdu, Chinese, Arabic, Welsh, Malagasy, Danish
- Intuition + corpus: cover real uses of language: newspapers, documents, etc.

Parallelism: test LFG universality claims.
- Common c- to f-structure mapping conventions (unless typologically motivated variation)
- Similar underlying f-structures permit shared disambiguation properties and Glue interpretation premises
- Practical: all grammars run on XLE software

International consortium of world-class linguists:
- PARC, Stuttgart, Fuji Xerox, Konstanz, Bergen, Copenhagen, Oxford, Dublin City University, PIEAS…
- Full-week meetings, twice a year
- Contributions to linguistics and comp-ling: books and papers
- Each group is self-funded, self-managed
Pargram goals

Practical:
- Create grammatical resources for NL applications: translation, question answering, information retrieval, ...
- Develop a discipline of grammar engineering: what tools, techniques, and conventions make it easy to develop and maintain broad-coverage grammars? How long does it take? How much does it cost?

Theoretical:
- Refine and guide LFG theory through broad coverage of multiple languages
- Refine and guide XLE algorithms and implementation
Parallel f-structures (where possible) … but different c-structures.
Pargram grammars

              German   English*   French   Japanese (Korean)
  #Rules         251        388      180       56
  #States      3,239     13,655    3,422      368
  #Disjuncts  13,294     55,725   16,938    2,012

* English allows for shallow markup: labeled bracketing, named entities.
Why Norwegian and Japanese?

Engineering assessment: given a mature system and parallel grammar specs, how hard is it?

Norwegian: best case
- Well-trained LFG linguists
- Users of previous PARC software
- Closely related to existing Pargram languages

Japanese: worst case
- One computer scientist, one traditional Japanese linguist, no LFG experience
- Typologically different language
- Character sets, typographical conventions

Conclusion: not that hard. For both languages: good coverage and accuracy in ~2 person-years.
Engineering results

Grammars and lexicons; Grammar writer's cookbook (Butt et al., 1999).

New practical formal devices:
- Complex categories for efficiency: NP[nom] vs. NP: (↑ CASE)=NOM
- Optimality marks for robustness: enlarge the grammar without being overrun by peculiar analyses
- Lexical priority: merging different lexicons

Integration of off-the-shelf morphology: from Inxight (based on earlier PARC research) and Kyoto.
Accuracy and coverage (Riezler et al., 2002)

WSJ F-scores for the English Pargram grammar:
- Produces dependencies, not labeled trees
- Stochastic model trained on sections 2-22
- Tested on dependencies for 700 sentences in section 23
- Robustness: some output for every input

                 Full (74.7%)   Fragments (25.3%)
  Best               88.5            76.7
  Most probable      82.5            69
  Random             78.4            67.7

(Named entities seem to bump these by ~3%.)
"Meridian will pay a premium of $30.5 million to assume $2 billion in deposits."

mood(pay~0, indicative), tense(pay~0, fut), adjunct(pay~0, assume~7), obj(pay~0, premium~3), stmt_type(pay~0, declarative), subj(pay~0, Meridian~5), det_type(premium~3, indef), adjunct(premium~3, of~23), num(premium~3, sg), pers(premium~3, 3), adjunct(million~4, 30.5~28), number_type(million~4, cardinal), num(Meridian~5, sg), pers(Meridian~5, 3), obj(assume~7, $~9), stmt_type(assume~7, purpose), subj(assume~7, pro~8), number($~9, billion~17), adjunct($~9, in~11), num($~9, pl), pers($~9, 3), adjunct_type(in~11, nominal), obj(in~11, deposit~12), num(deposit~12, pl), pers(deposit~12, 3), adjunct(billion~17, 2~19), number_type(billion~17, cardinal), number_type(2~19, cardinal), obj(of~23, $~24), number($~24, million~4), num($~24, pl), pers($~24, 3), number_type(30.5~28, cardinal)
Accuracy and coverage: Japanese Pargram grammar

- ~97% coverage on large corpora: 10,000 newspaper sentences (EDR), 460 copier-manual sentences, 9,637 customer-relations sentences
- F-scores against 200 hand-annotated sentences from the newspaper corpus: best 87%, average 80%

Recall: the grammar was constructed with ~2 person-years of effort (compare the effort to create an annotated training corpus).
Robustness: Some output for every input
Sources of Brittleness

Vocabulary problems:
- Gaps in coverage, neologisms, terminology
- Incorrect entries, missing frames…

Missing constructions:
- No theoretical guidance (or interest), e.g. dates, company names
- Core constructions overlooked: intuition and corpus are both limited

Ungrammatical input:
- Real-world text is not perfect; sometimes it's horrendous

Strict performance limits (XLE parameters).
Real world input

"Other weak blue-chip issues included Chevron, which went down 2 to 64 7/8 in Big Board composite trading of 1.3 million shares; Goodyear Tire & Rubber, off 1 1/2 to 46 3/4, and American Express, down 3/4 to 37 1/4." (WSJ, section 13)

"The croaker's done gone from the hook." (WSJ, section 13)

"(SOLUTION 27000 20) Without tag P-248 the W7F3 fuse is located in the rear of the machine by the charge power supply (PL3 C14 item 15." (Copier repair tip)
LFG entries from Finite-State Morphologies

Broad-coverage inflectional transducers:
  falls  → fall +Noun +Pl
           fall +Verb +Pres +3sg
  Mary   → Mary +Prop +Giv +Fem +Sg
  vienne → venir +SubjP +SG {+P1|+P3} +Verb

For listed words, the transducer provides the canonical stem form and inflectional information.
On-the-fly LFG entries

The "-unknown" head-word matches unrecognized stems; the grammar writer defines -unknown and the affixes:

  -unknown  N  (↑ PRED)='%stem' (↑ NTYPE)=common;
            V  (↑ PRED)='%stem<SUBJ,OBJ>'.   (transitive)
  +Noun  N-AFX  (↑ PERS)=3.
  +Pl    N-AFX  (↑ NUM)=pl.
  +Pres  V-AFX  (↑ TENSE)=present
  +3sg   V-AFX  (↑ SUBJ PERS)=3 (↑ SUBJ NUM)=sg

Pieces are assembled by sublexical rules:
  NOUN → N N-AFX*.
  VERB → V V-AFX*.
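A sketch of how the sublexical assembly might look in code: split the transducer output into stem plus tags and let each tag contribute feature equations. The tag-to-constraint table mirrors the slide; everything else (names, representation) is an illustrative assumption.

```python
# Assemble an on-the-fly lexical entry from finite-state morphology output,
# e.g. "fall +Noun +Pl" -> PRED 'fall' plus features from each affix tag.

AFFIX_CONSTRAINTS = {
    "+Noun": {("PERS",): 3},
    "+Pl":   {("NUM",): "pl"},
    "+Pres": {("TENSE",): "present"},
    "+3sg":  {("SUBJ", "PERS"): 3, ("SUBJ", "NUM"): "sg"},
}

def entry_for(morph_output):
    stem, *tags = morph_output.split()
    fstruct = {("PRED",): stem}          # -unknown entry: (PRED)='%stem'
    for tag in tags:
        # tags without constraints here (e.g. "+Verb") contribute nothing
        fstruct.update(AFFIX_CONSTRAINTS.get(tag, {}))
    return fstruct

print(entry_for("fall +Noun +Pl"))
# {('PRED',): 'fall', ('PERS',): 3, ('NUM',): 'pl'}
print(entry_for("fall +Verb +Pres +3sg"))
# adds ('SUBJ','PERS')=3 and ('SUBJ','NUM')='sg'
```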
Guessing for unlisted words

Use an FST guesser for general patterns:
- Capitalized words can be proper nouns: Saakashvili → Saakashvili +Noun +Proper +Guessed
- -ed words can be past-tense verbs or adjectives: fumped → fump +Verb +Past +Guessed, or fumped +Adj +Deverbal +Guessed

Languages with richer morphology allow better guessers.
Subcategorization and Argument Mapping?

Transitive, intransitive, inchoative…
- Not related to inflection
- Can't be inferred from shallow data

Fill in gaps from external sources:
- Machine-readable dictionaries
- Other resources: VerbNet, WordNet, FrameNet, Cyc
- Not always easy, not always reliable (current research)
Grammatical failures

Fall-back approach: first try to get a complete analysis.
- Prefer standard rules, but allow for anticipated errors (e.g. subject/verb disagree, but the interpretation is obvious)
- Optimality-theory marks to prefer standard analyses

If that fails, enlarge the grammar and try again:
- Build up fragments that get complete sub-parses (c-structure and f-structure)
- Allow tokens that can't be chunked
- Link chunks and tokens in a single f-structure
Fall-back grammar for fragments

The grammar writer specifies REPARSECAT:
- An alternative c-structure root if there is no complete parse
- Allows for fragments and linking

The grammar writer specifies the possible chunks:
- Categories (e.g. S, NP, VP but not N, V)
- Looser expansions

Optimality theory: the grammar writer specifies marks to
- Prefer standard rules over anticipated errors
- Prefer the parse with the fewest chunks
- Disprefer using tokens over chunks
Example

"The the dog appears."

Analyzed as a "token" the plus the sentence "the dog appears" [linked in the c-structure and f-structure].

- Many chunks have useful analyses
- XLE/LFG degrades to shallow parsing in the worst case
Robustness summary

External resources for incomplete lexical entries:
- Morphologies, guessers, taggers
- Current work: VerbNet, WordNet, FrameNet, Cyc
- Order by reliability

Fall-back techniques for missing constructions:
- Dispreferred rules
- Fragment grammar

Current WSJ evaluation:
- 100% coverage, ~85% full parses
- F-score (esp. recall) declines for fragment parses
Brief demo
Stochastic disambiguation: When you have to choose
Finding the most probable parse

XLE produces many candidates:
- All valid (with respect to the grammar and OT marks)
- Not all equally likely
- Some applications are ambiguity-enabled (defer selection)
- … but some require a single best guess

Grammar writers have only coarse preference intuitions: there are many implicit properties of words and structures with unclear significance.

Appeal to a probability model to choose the best parse:
- Assume previous experience is a good guide for future decisions
- Collect a corpus of training sentences
- Build a probability model that optimizes for previous good results
- Apply the model to choose the best analysis of new sentences
Issues

What kind of probability model? What kind of training data? Efficiency of training and disambiguation? Benefit vs. a random choice of parse?
- Random is awful for treebank grammars
- Hard LFG constraints restrict to plausible candidates
Probability model

Conventional models: a stochastic branching process.
- Hidden Markov models
- Probabilistic context-free grammars

A sequence of decisions, each independent of previous decisions, each choice having a certain probability:
- HMM: choose from the outgoing arcs at a given state
- PCFG: choose from the alternative expansions of a given category

Probability of an analysis = product of the choice probabilities. Efficient algorithms:
- Training: forward/backward, inside/outside
- Disambiguation: Viterbi

Abney 1997 and others: not appropriate for LFG, HPSG…
- Choices are not independent: information from different CFG branches interacts through the f-structure
- The relative-frequency estimator is inconsistent
Exponential models are appropriate (aka log-linear models)

- Assign probabilities to representations, not to choices in a derivation
- No independence assumption
- Arithmetic combined with human insight
  - Human: define properties of representations that may be relevant, based on any computable configuration of f-structure features and trees
  - Arithmetic: train to figure out the weight of each property
Stochastic Disambiguation in XLE: All parses → Most probable

Discriminative ranking: a conditional log-linear model on c/f-structure pairs.

  p_λ(x | s) = e^{λ·f(x)} / Z_λ(s)

This is the probability of parse x for string s, where f is a vector of feature values for x, λ is a vector of feature weights, and Z_λ(s) is the normalizer over all parses of s.

- Discriminative estimation of λ from partially labeled data (Riezler et al., ACL'02)
- Combined l1-regularization and feature selection: avoid over-fitting, choose the best features (Riezler & Vasserman, EMNLP'04)
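The conditional log-linear model is easy to state in code: score each parse by λ·f(x), exponentiate, and normalize over the parses of the same string. The feature values and weights below are invented for illustration.

```python
# Conditional log-linear disambiguation: p(x|s) = exp(lambda . f(x)) / Z(s),
# where Z(s) sums the exp-scores over all candidate parses of s.

import math

weights = {"cs_right_branch": -0.027, "fs_attr_val DET-TYPE def": 0.286}

def parse_probabilities(parses):
    """parses: list of property-count dicts, one per candidate analysis of s."""
    scores = [sum(weights.get(prop, 0.0) * count for prop, count in f.items())
              for f in parses]
    z = sum(math.exp(s) for s in scores)      # normalizer over parses of s
    return [math.exp(s) / z for s in scores]

candidates = [
    {"cs_right_branch": 3, "fs_attr_val DET-TYPE def": 1},
    {"cs_right_branch": 5},
]
print(parse_probabilities(candidates))        # probabilities summing to 1
```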
Coarse training data for XLE: "correct" parses are consistent with weak annotation

  Considering/VBG (NP the naggings of a culture imperative), (NP-SBJ I) promptly signed/VBD up.

Sufficient for disambiguation, not for grammar induction. Compare with the full PTB annotation:

  (S (S-ADV (NP-SBJ (-NONE- *-1)) (VP (VBG Considering) (NP (NP (DT the) (NNS naggings)) (PP (IN of) (NP (DT a) (NN culture) (NN imperative)))))) (, ,) (NP-SBJ-1 (PRP I)) (VP (ADVP-MNR (RB promptly)) (VBD signed) (PRT (RB up))) (. .))
Classes of properties

- C-structure nodes and subtrees: indicating certain attachment preferences
- Recursively embedded phrases: indicating high vs. low attachment
- F-structure attributes: presence of grammatical functions
- Atomic attribute-value pairs in f-structure: particular feature values
- Left/right branching behavior of c-structures
- (Non)parallelism of coordinations in c- and f-structures
- Lexical elements: tuples of head words, argument words, grammatical relations

~60,000 candidate properties, ~1000 selected.
Some properties and weights

   0.937481    cs_embedded VPv[pass] 1
  -0.126697    cs_embedded VPv[perf] 3
  -0.0204844   cs_embedded VPv[perf] 2
  -0.0265543   cs_right_branch
  -0.986274    cs_conj_nonpar 5
  -0.536944    cs_conj_nonpar 4
  -0.0561876   cs_conj_nonpar 3
   0.373382    cs_label ADVPint
  -1.20711     cs_label ADVPvp
  -0.57614     cs_label AP[attr]
  -0.139274    cs_adjacent_label DATEP PP
  -1.25583     cs_adjacent_label MEASUREP PPnp
  -0.35766     cs_adjacent_label NPadj PP
  -0.00651106  fs_attrs 1 OBL-COMPAR
   0.454177    fs_attrs 1 OBL-PART
  -0.180969    fs_attrs 1 ADJUNCT
   0.285577    fs_attr_val DET-FORM the
   0.508962    fs_attr_val DET-FORM this
   0.285577    fs_attr_val DET-TYPE def
   0.217335    fs_attr_val DET-TYPE demon
   0.278342    lex_subcat achieve OBJ,SUBJ,VTYPE SUBJ,OBL-AG,PASSIVE=+
   0.00735123  lex_subcat acknowledge COMP-EX,SUBJ,VTYPE
Efficiency

Property counts:
- Associated with the AND/OR tree of XLE contexts (a1, b2); detectors may add new nodes to the tree: conjoined contexts
- Shared among many parses

Training:
- Dynamic-programming algorithm applied to the AND/OR tree: avoids unpacking individual parses (Miyao and Tsujii, HLT'02); similar to the inside-outside algorithm for PCFGs
- Fast algorithm for choosing the best properties
- Can train only on sentences with relatively low ambiguity: shorter, perhaps easier to annotate
- 5 hours to train over the WSJ (given a file of parses)

Disambiguation (see the sketch below):
- Viterbi algorithm applied to the Boolean tree
- 5% of parse time to disambiguate
- 30% gain in F-score over the random-parse baseline
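Disambiguation itself is a maximization over the packed choice space. A sketch of Viterbi-style selection on a tiny AND/OR tree (the tree structure and scores are invented for illustration): at an OR node pick the best-scoring alternative, at an AND node sum the parts.

```python
# Viterbi-style selection over an AND/OR choice tree. Toy tree and scores.

def best(node):
    kind = node[0]
    if kind == "leaf":                    # ("leaf", score, label)
        return node[1], [node[2]]
    if kind == "and":                     # ("and", child, child, ...): sum parts
        total, choices = 0.0, []
        for child in node[1:]:
            s, c = best(child)
            total += s
            choices += c
        return total, choices
    if kind == "or":                      # ("or", alt, ...): pick best alternative
        return max((best(child) for child in node[1:]), key=lambda sc: sc[0])

tree = ("and",
        ("or", ("leaf", 0.9, "p:NUM=sg"), ("leaf", 0.2, "~p:NUM=pl")),
        ("or", ("leaf", 0.1, "q:NUM=sg"), ("leaf", 0.6, "~q:NUM=pl")))
print(best(tree))   # (1.5, ['p:NUM=sg', '~q:NUM=pl'])
```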
Integrating Shallow Markup: Part-of-speech tags, Named entities, Syntactic brackets
Shallow mark-up of input strings

Part-of-speech tags (tagger?):
  I/PRP saw/VBD her/PRP duck/VB.
  I/PRP saw/VBD her/PRP$ duck/NN.

Named entities (named-entity recognizer):
  <person>General Mills</person> bought it.
  <company>General Mills</company> bought it.

Syntactic brackets (chunk parser?):
  [NP-S I] saw [NP-O the girl with the telescope].
  [NP-S I] saw [NP-O the girl] with the telescope.
Hypothesis

Shallow mark-up:
- Reduces ambiguity
- Increases speed
- Without decreasing accuracy
- (Helps development)

Issues:
- Markup errors may eliminate correct analyses
- The markup process may be slow
- Markup may interfere with existing robustness mechanisms (optimality, fragments, guessers)
- Backoff may restore robustness but decrease speed in a 2-pass system
Implementation in XLE

[Diagram: the standard pipeline (input string → Tokenizer (FST) → Morphology (FST) → LFG grammar → c-structure/f-structure) is extended for marked-up strings with a POS/NE converter in the tokenizer, a POS filter in the morphology, and a bracket metarule and NE sublexical rule in the grammar.]

Integration with minimal changes to the existing system/grammar.
Experimental Results: PARC 700

             % Full parses   Optimal sol'ns   Best F-sc   Time %
  Unmarked        76            482/1753        82/79      65/100
  Named ent       78            263/1477        86/84      60/91
  POS tag         62            248/1916        76/72      40/48
  Lab brk         65            158/774         85/79      19/31

(Paired figures are Full/All.)
Comparison: Shallow vs. Deep parsing (HLT, 2004)

Popular myth:
- Shallow statistical parsers are fast, robust… and useful
- Deep grammar-based parsers are slow and brittle

Is this true? Comparison on predicate-argument relations, not phrase trees:
- Needed for meaning-sensitive applications (= usefulness): translation, question answering… but maybe not IR
- Collins (1999) parser: state of the art, marks arguments (for a fair test, wrote special code to make the relations explicit; not so easy)
- LFG/XLE with morphology, named entities, disambiguation
- Measured time and accuracy against the PARC 700 Gold Standard

Results:
- Collins is a bit faster than LFG/XLE
- LFG/XLE makes somewhat fewer errors and provides more useful detail
XLE System

- Parser/generator for LFG grammars: multilingual
- Composition with finite-state transductions
- Careful ambiguity-management implementation
  - Preserves context-free locality in equational disjunctions
  - Exports ambiguity-enabling interfaces
  - Efficient implementation of clause conjunction (C1 ∧ C2)
- Log-linear disambiguation
  - Appropriate for LFG representations
  - Ambiguity-enabled theory and implementation
- Robustness: shallow in the worst case
- Scales to broad-coverage grammars and long sentences
- Semantic interface: Glue
LFG/XLE: Current issues

Induction of LFG grammars from treebanks:
- Basic work in ParGram: Dublin City University
- Principles of generalization, for human extension and combination with a manual grammar (DCU + PARC)

Large grammars for more language typologies:
- E.g. verb-initial: Welsh, Malagasy, Arabic

Reduce performance variance; why not linear?
- Competence vs. performance: limit center embedding?
- Investigate the speed/accuracy trade-off

Embedding in applications: XLE as a black box.
- Question answering(!), translation, sentence condensation…
- Develop and combine with other ambiguity-enabled modules: reasoning, transfer-rewriting…
Matching for Question Answering

[Diagram: a Question and Answer Sources are each parsed with the English grammar into f-structures, then mapped to semantics; an overlap detector compares the two.]
Glue Semantics
Logical & Collocational Semantics

Logical semantics:
- Map sentences to logical representations of meaning
- Enables inference and reasoning

Collocational semantics:
- Represent word meanings as feature vectors
- Typically obtained by statistical corpus analysis
- Good for indexing, classification, language modeling, word sense disambiguation
- Currently does not enable inference

Complementary, not conflicting, approaches.
Example Semantic Representation

"The wire broke."

Syntax (f-structure):
  PRED  'break<SUBJ>'
  SUBJ  [PRED wire, SPEC def, NUM sg]
  TENSE past

Semantics (logical form):
  ∃w. wire(w) & w=part25 &
  ∃t. interval(t) & t<now &
  ∃e. break_event(e) & occurs_during(e,t) & object_of_change(e,w) &
  ∃c. cause_of_change(e,c)

The f-structure gives basic predicate-argument structure, but lacks:
- Standard logical machinery (variables, connectives, etc.)
- Implicit arguments (events, causes)
- Contextual dependencies (the wire = part25)

The mapping from f-structure to logical form is systematic, but non-trivial.
Glue Semantics (Dalrymple, Lamping & Saraswat 1993 and subsequently)

Syntax-semantics mapping as linear logic inference. Two logics in semantics:
- Meaning logic (the target semantic representation): any suitable semantic representation
- Glue logic (deductively assembles the target meaning): a fragment of linear logic

Syntactic analysis produces lexical glue premises; semantic interpretation uses deduction to assemble the final meaning from these premises.
Linear Logic

An influential development in theoretical computer science (Girard 87). Premises are resources, consumed in inference. (In traditional logic, premises are non-resourced.) Linguistic processing is typically resource-sensitive: words/meanings are used exactly once.

  Traditional                       Linear
  A, A⊃B |= B                       A, A -o B |= B
  A, A⊃B |= A&B   (A re-used)       A, A -o B does not yield A⊗B   (A consumed)
  A, B |= B       (A discarded)     A, B does not yield just B     (cannot discard A)
Glue Interpretation (Outline)

Parsing a sentence instantiates lexical entries to produce lexical glue premises. Example lexical premise (the verb "saw" in "John saw Fred"):

  see : g -o (h -o f)

Here see is the meaning term (a 2-place predicate) and g -o (h -o f) is the glue formula; g, h, f are constituents in the parse: "consume the meanings of g and h to produce the meaning of f".

Glue derivation: premises |= M : f
- Consume all lexical premises,
- to produce a meaning, M, for the entire sentence, f.
Glue Interpretation: Getting the premises

Syntactic analysis:
  [Tree: S over NP (John) and VP (V saw, NP (Fred))]
  f: [PRED see, SUBJ g: [PRED John], OBJ h: [PRED Fred]]

Lexicon:
  John  NP  john: ↑
  Fred  NP  fred: ↑
  saw   V   see: (↑ SUBJ) -o ((↑ OBJ) -o ↑)

Instantiated premises:
  john: g
  fred: h
  see: g -o (h -o f)
Glue Interpretation: Deduction with premises

Premises:
  john: g
  fred: h
  see: g -o (h -o f)

Linear logic derivation (using linear modus ponens):
  g -o (h -o f), g  |=  h -o f
  h -o f, h         |=  f

Derivation with meaning terms:
  see: g -o (h -o f), john: g  |=  see(john): h -o f
  see(john): h -o f, fred: h   |=  see(john)(fred): f

Linear modus ponens = function application:
  Fun: A -o B, Arg: A  |=  Fun(Arg): B
Modus Ponens = Function Application: The Curry-Howard Isomorphism

The Curry-Howard isomorphism pairs linear-logic inference rules with operations on meaning terms:

  F: g -o f, A: g  |=  F(A): f

Propositional linear logic inference constructs meanings, yet the LL inference is completely independent of the meaning language (modularity of meaning representation).
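A sketch of the derivation as resource-sensitive function application: premises are (meaning, formula) pairs, linear modus ponens consumes both premises it uses, and the Curry-Howard pairing applies the function meaning to the argument meaning. The representations below are invented for illustration.

```python
# Glue derivation by linear modus ponens: A, A -o B |- B, consuming both
# premises. Formulas: atoms are strings; ("-o", A, B) encodes A -o B.
# Meanings: curried functions (Curry-Howard: modus ponens = application).

def derive(premises):
    """Repeatedly apply linear modus ponens until no step is possible."""
    premises = list(premises)
    applied = True
    while applied:
        applied = False
        for i, (fun_m, fun_f) in enumerate(premises):
            if isinstance(fun_f, tuple) and fun_f[0] == "-o":
                for j, (arg_m, arg_f) in enumerate(premises):
                    if j != i and arg_f == fun_f[1]:
                        rest = [p for k, p in enumerate(premises) if k not in (i, j)]
                        premises = rest + [(fun_m(arg_m), fun_f[2])]  # consume both
                        applied = True
                        break
                if applied:
                    break
    return premises

john = ("john", "g")
fred = ("fred", "h")
see = (lambda x: lambda y: f"see({x},{y})", ("-o", "g", ("-o", "h", "f")))
print(derive([john, fred, see]))   # [('see(john,fred)', 'f')]
```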
Semantic Ambiguity: Multiple derivations from a single set of premises

"Alleged criminal from London"

  f: [PRED criminal, MODS {alleged, from London}]

Premises:
  criminal: f
  alleged: f -o f
  from-London: f -o f

Two distinct derivations:
1. from-London(alleged(criminal))
2. alleged(from-London(criminal))
Semantic Ambiguity & Modifiers

- Multiple derivations from a single premise set arise through different ways of permuting modifiers around a skeleton
- Modifiers are given formal representation in glue as X -o X logical identities; e.g., an adjective is a noun -o noun modifier
- Modifiers are prevalent in natural language and lead to combinatorial explosion: given N f -o f modifiers, there are N! ways of permuting them around the f skeleton
Ambiguity management in semantics

Efficient theorem provers manage the combinatorial explosion of modifiers:
- Packing of the N! analyses: represent all N! analyses in polynomial space, compute the representation in polynomial time, and (free choice) read off any given analysis in linear time
- Packing through structure re-use: N! analyses through combinations of N sub-analyses; compute each sub-analysis once and re-use it
Parc Linguistic Environment

[Diagram: a multidimensional architecture relating theory and tableware (LFG Syntax, Glue Semantics, FS Morphology; mathematics, algorithms, programs, data structures; models, parameters), languages (English, French, German, Japanese, Urdu, Norwegian), operations (Parse, Generate, Select, Transfer, Interpret), and applications (Translation, Condensation, Dialog, Question Answering, Email Routing, Email Response, Knowledge tracking), organized around ambiguity management, scale, modularity, and robustness.]