Starting With Complex Primitives Pays Off: Complicate Locally, Simplify Globally
ARAVIND K. JOSHI
Department of Computer and Information Science
and Institute for Research in Cognitive Science
CogSci 2003
Boston, August 1 2003
cogsci-03: 3
Introduction
• Formal systems to specify a grammar formalism
• Start with primitives (basic primitive structures or building blocks) as simple as possible, then introduce various operations for constructing more complex structures
• Alternatively,
cogsci-03: 4
Introduction: CLSG
• Start with complex (more complicated) primitives which directly capture some crucial linguistic properties, then introduce some general operations for composing them -- Complicate Locally, Simplify Globally (CLSG)
• CLSG approach is characterized by localizing almost all complexity in the set of primitives, a key property
cogsci-03: 5
Introduction: CLSG – localization of complexity
• Specification of the finite set of complex primitives becomes the main task of a linguistic theory
• CLSG pushes all dependencies to become local, i.e., they arise initially within the primitive structures
cogsci-03: 6
Constrained formal systems: another dimension
• Unconstrained formal systems -- add linguistic constraints, which become, in a sense, all stipulative
• Alternatively, start with a constrained formal system, just adequate for describing language -- formal constraints become universal, in a sense; other linguistic constraints remain stipulative
• Convergence: the CLSG approach leads to constrained formal systems
cogsci-03: 7
CLSG approach
• The CLSG approach has led to several new insights into
  • Syntactic description
  • Semantic composition
  • Language generation
  • Statistical processing
  • Psycholinguistic properties
  • Discourse structure
• The CLSG approach will be described via a particular class of grammars, TAG (LTAG), which illustrates the CLSG approach to its maximum
• Simple examples to communicate the interplay between formal analysis and linguistic and processing issues
cogsci-03: 8
Context-free Grammars
• The domain of locality is the one-level tree -- the primitive building blocks
CFG, G:
  S → NP VP       VP → V NP
  VP → VP ADV     NP → DET N
  DET → the       N → man | car
  V → likes       ADV → passionately

Derived tree for "the man likes the car passionately":

  S
    NP
      DET  the
      N    man
    VP
      VP
        V   likes
        NP
          DET  the
          N    car
      ADV  passionately
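The one-level domain of locality can be made concrete with a small sketch (illustrative, not from the talk): each rule in the table below is a one-level tree, and a top-down derivation composes them. The `choose` parameter is a hypothetical hook for picking among alternative rules; by default the first rule is used.

```python
# Illustrative sketch of a CFG whose rules are one-level trees.
RULES = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP"], ["VP", "ADV"]],
    "NP":  [["DET", "N"]],
    "DET": [["the"]],
    "N":   [["man"], ["car"]],
    "V":   [["likes"]],
    "ADV": [["passionately"]],
}

def derive(symbol, choose=lambda options: options[0]):
    """Expand `symbol` top-down into a terminal string; `choose` picks
    which rule to use when a nonterminal has several (default: first)."""
    if symbol not in RULES:          # terminal symbol
        return [symbol]
    return [word
            for child in choose(RULES[symbol])
            for word in derive(child, choose)]

print(" ".join(derive("S")))         # the man likes the man
```

Note that the derivation glues one-level trees together; nothing groups a verb with all of its arguments, which is the point the next slide makes.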
cogsci-03: 9
Context-free Grammars
• The arguments of the predicate are not in the same local domain
• They can be brought together in the same domain -- by introducing a rule
  S → NP V NP
• However, then the structure is lost
• Further, the local domains of a CFG are not necessarily lexicalized
• Domain of Locality and Lexicalization
cogsci-03: 10
Towards CLSG: Lexicalization
• Lexical item → one or more elementary structures (trees, directed acyclic graphs), which are syntactically and semantically encapsulated
• Universal combining operations
• Grammar → Lexicon
cogsci-03: 11
Lexicalized Grammars
• Context-free grammar (CFG)

CFG, G:
  S → NP VP       (non-lexical)
  VP → V NP       (non-lexical)
  VP → VP ADV     (non-lexical)
  NP → Harry      (lexical)
  NP → peanuts    (lexical)
  V → likes       (lexical)
  ADV → passionately  (lexical)

Derived tree for "Harry likes peanuts passionately":

  S
    NP  Harry
    VP
      VP
        V   likes
        NP  peanuts
      ADV  passionately
cogsci-03: 12
Weak Lexicalization
• Greibach Normal Form (GNF): CFG rules are of the form
  A → a B1 B2 ... Bn
  A → a
This lexicalization gives the same set of strings but not the same set of trees, i.e., the same set of structural descriptions. Hence, it is a weak lexicalization.
cogsci-03: 13
Strong Lexicalization
• Same set of strings and same set of trees or structural descriptions.
• Tree substitution grammars (TSG)– Increased domain of locality– Substitution as the only combining operation
cogsci-03: 15
Strong Lexicalization
• Tree substitution grammars (TSG)

CFG, G:
  S → NP VP     VP → V NP
  NP → Harry    NP → peanuts
  V → likes

TSG, G′ (↓ marks a substitution site):

  α1:  S
         NP↓
         VP
           V   likes
           NP↓

  α2:  NP
         Harry

  α3:  NP
         peanuts
cogsci-03: 16
Insufficiency of TSG
• Formal insufficiency of TSG

CFG, G:
  S → S S   (non-lexical)
  S → a     (lexical)

TSG, G′ (↓ marks a substitution site):

  α1:  S
         S↓
         S
           a

  α2:  S
         S
           a
         S↓

  α3:  S
         a
cogsci-03: 17
Insufficiency of TSG
TSG, G′ (repeated):

  α1:  S
         S↓
         S
           a

  α2:  S
         S
           a
         S↓

  α3:  S
         a

Consider a tree γ of G that grows on both sides of the root, e.g.:

  γ:  S
        S
          S
            a
          S
            a
        S
          S
            a
          S
            a

G′ can generate all strings of G but not all trees of G -- trees such as γ, which grow on both sides of the root, are not derivable. CFGs cannot be lexicalized by TSGs, i.e., by substitution only.
cogsci-03: 19
With Adjoining
With adjoining (foot nodes marked *):

  β1:  S
         S*
         S
           a

  β2:  S
         S
           a
         S*

  α3:  S
         a

G:  S → S S,  S → a

Adjoining β2 to α3 at the S node (the root node), and then adjoining β1 to the root S node of the derived tree, we obtain γ:

  γ:  S
        S
          S
            a
          S
            a
        S
          a

CFGs can be lexicalized by LTAGs. Adjoining is crucial for lexicalization.
Adjoining arises out of lexicalization.
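The same toy tuple encoding can illustrate why adjoining rescues lexicalization (a sketch; foot nodes are written ("S*",)): adjoining an auxiliary tree with its foot on the right at the root of the elementary tree S(a), and then one with its foot on the left, yields a tree that grows on both sides of the root, which substitution alone cannot produce.

```python
# Toy adjoining: the auxiliary tree's foot node ("S*",) is replaced by
# the tree it is adjoined to (adjunction at the root, for simplicity).
def adjoin_at_root(aux, tree):
    if aux == ("S*",):
        return tree
    if isinstance(aux, str):
        return aux
    return (aux[0],) + tuple(adjoin_at_root(c, tree) for c in aux[1:])

def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    return [w for c in tree[1:] for w in leaves(c)]

beta1 = ("S", ("S*",), ("S", "a"))   # foot to the left of the spine
beta2 = ("S", ("S", "a"), ("S*",))   # foot to the right of the spine
alpha3 = ("S", "a")

t = adjoin_at_root(beta2, alpha3)    # S( S(a) S(a) )
t = adjoin_at_root(beta1, t)         # now grows on both sides of the root
print(" ".join(leaves(t)))           # a a a
```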
cogsci-03: 20
Lexicalized LTAG
• Finite set of elementary trees anchored on lexical items -- extended projections of lexical anchors -- encapsulate syntactic and semantic dependencies
• Elementary trees: Initial and Auxiliary
• Operations: Substitution and Adjoining
• Derivation:
  – Derivation tree: how elementary trees are put together
  – Derived tree
cogsci-03: 21
Localization of Dependencies
• agreement: person, number, gender
• subcategorization: sleeps: null; eats: NP; gives: NP NP; thinks: S
• filler-gap: who did John ask Bill to invite e
• word order: within and across clauses, as in scrambling and clitic movement
• function-argument: all arguments of the lexical anchor are localized
cogsci-03: 22
Localization of Dependencies
• word-clusters (flexible idioms): non-compositional aspect -- take a walk, give a cold shoulder to
• word co-occurrences
• lexical semantic aspects
• statistical dependencies among heads
• anaphoric dependencies
cogsci-03: 23
LTAG: Examples

Two of the elementary trees for likes:

  transitive:
    S
      NP↓
      VP
        V   likes
        NP↓

  object extraction:
    S
      NP↓ (wh)
      S
        NP↓
        VP
          V   likes
          NP  e

Some other trees for likes: subject extraction, topicalization, subject relative, object relative, passive, etc.
cogsci-03: 24
LTAG: A derivation

Elementary trees (foot nodes marked *):

  likes (object extraction):
    S
      NP↓ (wh)
      S
        NP↓
        VP
          V   likes
          NP  e

  think:
    S
      NP↓
      VP
        V   think
        S*

  does:
    S
      V   does
      S*

  who:        Harry:        Bill:
    NP          NP            NP
      who         Harry         Bill
cogsci-03: 25
LTAG: A Derivation

The same elementary trees are combined: who, Harry, and Bill are substituted at the NP↓ nodes; think is adjoined at the inner S node of the likes tree; does is adjoined at the root of the think tree.

  who does Bill think Harry likes
cogsci-03: 26
LTAG: Derived Tree
  S
    NP  who
    S
      V   does
      S
        NP  Bill
        VP
          V   think
          S
            NP  Harry
            VP
              V   likes
              NP  e

who does Bill think Harry likes
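The whole derivation can be traced with a toy tuple encoding (an illustrative sketch, not a TAG system): substitution fills NP↓ sites, adjoining splices an auxiliary tree in at a node, and the empty element e is dropped from the output string.

```python
def substitute(tree, sub):
    """Fill the leftmost substitution site (label + '↓') with `sub`."""
    if isinstance(tree, str):
        return tree, False
    if tree == (sub[0] + "↓",):
        return sub, True
    out, done = [tree[0]], False
    for child in tree[1:]:
        if not done:
            child, done = substitute(child, sub)
        out.append(child)
    return tuple(out), done

def splice_foot(aux, subtree):
    """Replace the foot node ("S*",) of `aux` by `subtree`."""
    if aux == ("S*",):
        return subtree
    if isinstance(aux, str):
        return aux
    return (aux[0],) + tuple(splice_foot(c, subtree) for c in aux[1:])

def adjoin(tree, path, aux):
    """Adjoin `aux` at the node reached by child-index `path`."""
    if not path:
        return splice_foot(aux, tree)
    kids = list(tree[1:])
    kids[path[0]] = adjoin(kids[path[0]], path[1:], aux)
    return (tree[0],) + tuple(kids)

def leaves(tree):
    if isinstance(tree, str):
        return [] if tree == "e" else [tree]   # drop the empty element
    return [w for c in tree[1:] for w in leaves(c)]

likes = ("S", ("NP↓",),                        # object-extraction tree
         ("S", ("NP↓",),
          ("VP", ("V", "likes"), ("NP", "e"))))
think = ("S", ("NP↓",), ("VP", ("V", "think"), ("S*",)))
does  = ("S", ("V", "does"), ("S*",))

t, _ = substitute(likes, ("NP", "who"))        # fill the wh-site
t, _ = substitute(t, ("NP", "Harry"))          # fill the embedded subject
aux, _ = substitute(think, ("NP", "Bill"))     # Bill into the think tree
aux = splice_foot(does, aux)                   # adjoin does at think's root
t = adjoin(t, (1,), aux)                       # adjoin into the inner S
print(" ".join(leaves(t)))                     # who does Bill think Harry likes
```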
cogsci-03: 27
LTAG: Derivation Tree

who does Bill think Harry likes

  likes
    who     (substitution)
    Harry   (substitution)
    think   (adjoining)
      does  (adjoining)
      Bill  (substitution)

• Compositional semantics on this derivation structure
• Related to dependency diagrams
cogsci-03: 28
Nested Dependencies

  α:  S
        a
        b

  β:  S
        a
        S*
        b

Adjoining β repeatedly at the root yields, e.g.:

  S
    a
    S
      a
      S
        a
        b
      b
    b

G generates a a a … b b b -- nested dependencies.

The architecture of the elementary trees determines the nature of the dependencies described by the TAG grammar.
cogsci-03: 29
Crossed dependencies

  α:  S
        a
        S
          b

  β:  S
        a
        S
          S*
          b

b is one level below a and to the right of the spine.

The architecture of the elementary trees determines the kinds of dependencies that can be characterized.
cogsci-03: 30
Topology of Elementary Trees: Crossed dependencies

Adjoining β into α (at the S node dominating b) gives:

  S
    a
    S
      a
      S
        S
          b
        b

Linear structure: a a b b
cogsci-03: 31
Topology of Elementary Trees: Crossed dependencies

The same derived tree; linear structure: a a b b

• (Linear) Crossed Dependencies
• Dependencies are nested on the tree
cogsci-03: 32
Examples: Nested Dependencies
• Center embedding of relative clauses in English
(1) The rat1 the cat2 chased2 ate1 the cheese
• Center embedding of complement clauses in German
(2) Hans1 Peter2 Marie3 schwimmen3 lassen2 sah1
(Hans saw Peter let/make Marie swim)
Important differences between (1) and (2)
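The difference between the two orders can be stated over the index sequences alone; a small sketch (an illustrative helper, not from the talk):

```python
def dependency_pattern(noun_indices, verb_indices):
    """Classify a center-embedded clause sequence by how the verb indices
    repeat the noun indices: reversed -> nested, same order -> crossed."""
    if verb_indices == noun_indices[::-1]:
        return "nested"
    if verb_indices == noun_indices:
        return "crossed"
    return "mixed"

# (1) the rat1 the cat2 chased2 ate1            -> nested  (English/German)
# (2) Jan1 Piet2 Marie3 zag1 laten2 zwemmen3    -> crossed (Dutch)
print(dependency_pattern([1, 2], [2, 1]))       # nested
print(dependency_pattern([1, 2, 3], [1, 2, 3])) # crossed
```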
cogsci-03: 33
Examples: Crossed Dependencies
• Center embedding of complement clauses in Dutch
Jan1 Piet2 Marie3 zag1 laten2 zwemmen3
(Jan saw Piet let/make Marie swim)
• It is possible to obtain a wide range of complex dependencies, i.e., complex combinations of nested and crossed dependencies. Such patterns arise in word order phenomena such as scrambling and clitic movement and also due to scope ambiguities
cogsci-03: 34
LTAG: Some Formal Properties

• TAGs are more powerful than CFGs, both weakly and strongly, i.e., in terms of both
  -- the string sets they characterize and
  -- the structural descriptions they support
• TAGs carry over all formal properties of CFGs, modified in an appropriate way
  -- polynomial parsing, O(n^6) as compared to O(n^3)
• TAGs correspond to Embedded Pushdown Automata (EPDA) in the same way as PDAs correspond to CFGs (Vijay-Shanker, 1987)
cogsci-03: 35
LTAG: Some Formal Properties
• An EPDA is like a PDA; however, at each move it can
  -- create a specified (by the move) number of stacks to the left and right of the current stack and push specified information onto them
  -- push or pop on the current stack
• At the end of the move, the stack pointer moves to the top of the rightmost stack
• If a stack becomes empty, it drops out
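A toy recognizer in the spirit of this multi-stack bookkeeping (a drastic simplification, not a faithful EPDA) shows the kind of language the extra power buys: a^n b^n c^n, which no PDA can handle, falls out with two stacks.

```python
def accepts_anbncn(s):
    """Recognize a^n b^n c^n (n >= 1) with two stacks -- a toy stand-in
    for EPDA-style multi-stack bookkeeping, not a faithful EPDA."""
    stacks, phase = ([], []), 0
    for ch in s:
        if ch == "a" and phase == 0:
            stacks[0].append("a")                # count the a's
        elif ch == "b" and phase <= 1 and stacks[0]:
            phase = 1
            stacks[0].pop()                      # match a b against an a
            stacks[1].append("b")                # and remember it
        elif ch == "c" and 1 <= phase <= 2 and stacks[1]:
            phase = 2
            stacks[1].pop()                      # match a c against a b
        else:
            return False
    return phase == 2 and not stacks[0] and not stacks[1]

print(accepts_anbncn("aaabbbccc"))   # True
print(accepts_anbncn("aabbbcc"))     # False
```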
cogsci-03: 37
LTAG: Some Formal Properties
[Figure: EPDA schematic -- a finite control reading an input tape, with the old current stack and the stacks newly created by the move to its left and right.]
cogsci-03: 38
LTAG: Some Formal Properties
• TAGs (more precisely, the languages of TAGs) belong to the class of languages called mildly context-sensitive languages (MCSL), characterized by
  • polynomial parsing complexity
  • grammars for the languages in this class can characterize a limited set of patterns of nested and crossed dependencies and their combinations
  • languages in this class have the constant growth property, i.e., sentences, if arranged in increasing order of length, grow only by a bounded amount
  • this class properly includes CFLs
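The constant growth property is easy to state as a check over sentence lengths (a sketch; the bound k is a hypothetical parameter):

```python
def constant_growth(lengths, k):
    """True if consecutive distinct sentence lengths differ by <= k."""
    ls = sorted(set(lengths))
    return all(b - a <= k for a, b in zip(ls, ls[1:]))

# a^n b^n c^n d^n: lengths 4, 8, 12, ... grow by a constant 4
print(constant_growth([4 * n for n in range(1, 30)], 4))   # True
# {a^(2^n)}: lengths 2, 4, 8, 16, ... violate constant growth
print(constant_growth([2 ** n for n in range(1, 10)], 4))  # False
```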
cogsci-03: 39
LTAG: Some Formal Properties
• MCSL hypothesis: natural languages belong to MCSL
• This hypothesis has generated very fruitful research in
  • comparing different linguistic and formal proposals
  • discovering provable equivalences among formalisms and constrained formal systems
  • providing new perspectives on linguistic theories and processing issues
• In general, it leads to a fruitful interplay of formal frameworks, substantive linguistic theories, and computational and processing paradigms
cogsci-03: 41
Elementary trees associated with a lexical item can be regarded as super parts-of-speech (super POS or supertags) associated with that item.

Supertagging -- supertag disambiguation: two supertags for likes:

  transitive:
    S
      NP↓
      VP
        V   likes
        NP↓

  object extraction:
    S
      NP↓ (wh)
      S
        NP↓
        VP
          V   likes
          NP  e
cogsci-03: 42
Supertagging –supertag disambiguation
• Given a corpus parsed by an LTAG grammar
  – we have statistics of supertags -- unigram, bigram, trigram, etc.
  – these statistics combine the lexical statistics as well as the statistics of the constructions in which the lexical items appear
• Apply the statistical disambiguation techniques used for standard parts-of-speech (POS) -- N (noun), V (verb), P (preposition), etc. -- to supertagging: Joshi & Srinivas (1994), Srinivas and Joshi (1998)
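The unigram baseline reported on the next slides can be sketched with toy counts (the words, supertag names, and counts here are hypothetical placeholders, not the actual corpus statistics):

```python
from collections import Counter, defaultdict

# Hypothetical (word, supertag) counts standing in for corpus statistics.
counts = defaultdict(Counter)
for word, supertag, n in [
    ("likes", "transitive", 80), ("likes", "object-extraction", 20),
    ("price", "noun-head", 90),  ("price", "noun-modifier", 10),
]:
    counts[word][supertag] += n

def baseline_supertag(word):
    """Unigram baseline: assign each word its most frequent supertag."""
    return counts[word].most_common(1)[0][0]

print(baseline_supertag("likes"))   # transitive
```

A trigram supertagger replaces this per-word lookup with context-sensitive n-gram statistics, which is what lifts the accuracy above the baseline.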
cogsci-03: 43
Supertagging
the purchase price includes two ancillary companies
On average, a lexical item has about 15 to 20 supertags
cogsci-03: 44
Supertagging
the purchase price includes two ancillary companies
- Select the correct supertag for each word (shown in blue on the slide)
- The correct supertag for a word is the supertag that corresponds to that word in the correct parse of the sentence
cogsci-03: 45
Supertagging -- performance
• Training corpus: 1 million words
• Test corpus: 47,000 words
• Baseline: Assign the most likely supertag: 77%
• Trigram supertagger: 92% Srinivas (1997)
• Some recent results: 93% Chen & Vijay-Shanker (2000)
• Improvement from 77% to 93%
• For comparison, standard POS tagging: from over 90% to 98%
cogsci-03: 46
Abstract characterization of supertagging
• Complex (richer) descriptions of primitives (anchors) – contrary to the standard mathematical convention
• Associate with each primitive all information associated with it
cogsci-03: 47
Abstract characterization of supertagging
• Making descriptions of primitives more complex
  – increases the local ambiguity, i.e., there are more descriptions for each primitive
  – however, these richer descriptions of primitives locally constrain each other
  – analogy to a jigsaw puzzle: the richer the description of each primitive, the better
cogsci-03: 48
Complex descriptions of primitives
• Making the descriptions of primitives more complex
  – allows statistics to be computed over these complex descriptions
  – these statistics are more meaningful
  – local statistical computations over these complex descriptions lead to robust and efficient processing
cogsci-03: 49
Flexible Composition
Split a tree at an internal node X into two components:
  -- the supertree at X
  -- the subtree at X

Adjoining as Wrapping
cogsci-03: 50
Flexible Composition: Adjoining as Wrapping

Adjoining a tree at node X can equivalently be viewed as wrapping: the two components of the host tree (the supertree at X and the subtree at X) are wrapped around the adjoined tree.
cogsci-03: 51
Flexible Composition: Wrapping as substitutions and adjunctions

Elementary trees: the object-extraction tree for likes

  S
    NP↓ (wh)
    S
      NP↓
      VP
        V   likes
        NP  e

and the auxiliary tree for think

  S
    NP↓
    VP
      V   think
      S*

- We can also view this composition the other way around: the two components of the likes tree are wrapped around the think tree
- Flexible composition
cogsci-03: 52
Flexible Composition: Wrapping as substitutions and adjunctions

The likes tree is split at its lower S node into two components:
  -- the supertree is attached (adjoined) to the root node S of the think tree
  -- the subtree is attached (substituted) at the foot node S* of the think tree

Leads to multi-component TAG (MC-TAG)
cogsci-03: 53
Multi-component LTAG (MC-LTAG)
The two components are used together in one composition step. Both components attach to nodes in an elementary tree. This preserves locality.
The representation can be used for both -- predicate-argument relationships -- non-predicate-argument information such as scope, focus, etc.
cogsci-03: 54
Tree-Local Multi-component LTAG (MC-LTAG)
- How can the components of an MC-LTAG compose while preserving the locality of LTAG?
- Tree-local MC-LTAG: components of a set compose only with an elementary tree or an elementary component
- Flexible composition
- Tree-local MC-LTAGs are weakly equivalent to LTAGs
- However, tree-local MC-LTAGs provide structural descriptions not obtainable by LTAGs
- Increased strong generative power
cogsci-03: 55
Scope ambiguities: Example
For the quantifiers every and some, two-component sets: a scope component (a degenerate S* auxiliary tree) and an NP component:

  every:  scope component:  S*
          NP component:
            NP
              DET  every
              N↓

  some:   scope component:  S*
          NP component:
            NP
              DET  some
              N↓

The elementary tree for hates:

  S
    NP↓
    VP
      V   hates
      NP↓

Noun trees:

  N            N
    student      course

(every student hates some course)
cogsci-03: 56
Derivation with scope information: Example
The same elementary trees; the derivation additionally records where the scope components and the NP components attach.

(every student hates some course)
cogsci-03: 57
Derivation tree with scope information: Example
Derivation tree:

  (hates)
    (every)     NP component at address 1, scope component at the root
      (student)
    (some)      NP component at address 2.2, scope component at the root
      (course)

(every student hates some course)

- The scope components of every and some are both adjoined at the root of (hates)
- They can be adjoined in either order, thus representing the two scope readings (an underspecified representation)
- The scope readings are represented in the LTAG derivation itself
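The underspecified representation can be sketched by enumerating the orders in which the scope components adjoin at the root (an illustrative helper; the predicate notation is assumed, not from the talk):

```python
from itertools import permutations

def scope_readings(scope_parts, body):
    """One reading per order in which the scope components adjoin
    at the root of the derivation."""
    return [" ".join(order) + " . " + body
            for order in permutations(scope_parts)]

readings = scope_readings(["every x: student(x)", "some y: course(y)"],
                          "hates(x, y)")
for r in readings:
    print(r)
# every x: student(x) some y: course(y) . hates(x, y)
# some y: course(y) every x: student(x) . hates(x, y)
```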
cogsci-03: 58
Tree-Local MC-LTAG and flexible semantics
• Applications to word order variations including scrambling, clitic movement, and even scope ambiguities
• All word order variations up to two levels of embedding (three clauses in all) can be correctly described by tree-local MC-TAGs with flexible composition
  – correctly means providing appropriate structural descriptions, i.e., correct semantics -- however:
cogsci-03: 59
Tree-Local MC-LTAG and flexible semantics
• Beyond two levels of embedding not all patterns of word order variation will be correctly described Joshi, Becker, and Rambow (2002)
• Thus the class of tree-local MC-TAG has the property that for any grammar, G, in this class, if G works up to two levels of embedding then it fails beyond two levels for at least some patterns of word order
cogsci-03: 60
Tree-Local MC-LTAG and flexible semantics
Main idea:
cogsci-03: 61
Tree-Local MC-LTAG and flexible semantics
• Three clauses, C1, C2, and C3; each clause can be either a single elementary tree or a multi-component tree set with two components
• The verb in C1 takes the verb in C2 as its argument, and the verb in C2 takes the verb in C3 as its argument
• Flexible composition allows us to compose the three clauses in three ways
cogsci-03: 62
Tree-Local MC-LTAG and flexible semantics
Three ways of composing C1, C2, and C3:

[Diagram: the three orders in which C1, C2, and C3 can compose under flexible composition.]

• The third mode of composition is crucial for completing the proof for two levels of embedding
• It is not available beyond two levels without violating semantics!
cogsci-03: 63
Psycholinguistic processing issues
• Supertagging in psycholinguistic models • Processing of crossed and nested dependencies
• A new twist to the competence performance distinction -- a different perspective on this distinction
cogsci-03: 64
Supertagging in psycholinguistic models
• Convergence of perspectives on the roles of computational linguistics and psycholinguistics
• Due to a shift to lexical and statistical approaches to sentence processing
• A particular integration by Kim, Srinivas, and Trueswell (2002) from the perspective of LTAG
cogsci-03: 65
Supertagging in psycholinguistic models
• Supertagging: much of the computational work of linguistic analysis, traditionally viewed as structure building, can be viewed as lexical disambiguation
• Integration of supertagging in a psycholinguistic model: one would predict that many of the initial processing commitments of syntactic analysis are made at the lexical level, in the sense of supertagging
cogsci-03: 66
Supertagging in psycholinguistic models
• Integration of
  • a constraint-based lexicalist theory (CBL): MacDonald, Pearlmutter, and Seidenberg (1994), Trueswell and Tanenhaus (1984)
  • a lexicon represented as supertags, with their distribution estimated from the supertagging experiments described earlier (Srinivas (1997))
cogsci-03: 67
Supertagging in psycholinguistic models
• Distinction between PP attachment ambiguities in
  (1) I saw the man in the park with a telescope
  (2) The secretary of the general with red hair

Two supertags for with:

  noun-attaching:
    NP
      NP*
      PP
        P   with
        NP↓

  verb-attaching:
    VP
      VP*
      PP
        P   with
        NP↓
cogsci-03: 68
Supertagging in psycholinguistic models
• Distinction between PP attachment ambiguities in(1) I saw the man in the park with a telescope(2) The secretary of the general with red hair
• In (1) the ambiguity is lexical in the supertagging sense
• In (2) the ambiguity is resolved at the level of attachment computation (structure building)
• The ambiguity in (1) is resolved at an earlier level of processing, while in (2) it is resolved at a later level of processing
cogsci-03: 69
Supertagging in psycholinguistic models
(3) The student forgot her name(4) The student forgot that the homework was due today
In (3) forgot takes an NP complement, while in (4) it takes a that-S complement
  -- Thus there will be two different supertags for forgot
  -- The ambiguity in (3) and (4) is lexical (in the supertagging sense) and need not be viewed as a structural ambiguity
Kim, Srinivas, and Trueswell (2002) present a neural-net-based architecture using supertags and confirm these and related results.
cogsci-03: 70
Processing of nested and crossed dependencies
• CFG -- associated automaton: PDA
• TAG -- associated automaton: EPDA (embedded PDA) (Vijay-Shanker (1987))
• EPDAs provide a new perspective on the relative ease or difficulty of processing crossed and nested dependencies, which arise in center-embedded complement constructions
cogsci-03: 71
Processing of nested and crossed dependencies
(1)Hans1 Peter2 Marie3 schwimmen3 lassen2 sah1
(German– nested order)
(2)Jan1 Piet2 Marie3 zag1 laten2 zwemmen3
(Dutch– crossed order)(3) Jan saw Peter let/make Mary swim
(English– iterated order, no center embedding)
Center embedding of complements -- each verb is embedded in a higher verb, except the matrix verb (top level tensed verb)
cogsci-03: 72
Processing of nested and crossed dependencies
(1)Hans1 Peter2 Marie3 schwimmen3 lassen2 sah1
(German– nested order)
(2)Jan1 Piet2 Marie3 zag1 laten2 zwemmen3
(Dutch– crossed order)
Bach, Brown, and Marslen-Wilson (1986)
Stated very simply, they showed that Dutch is easier than German:
crossed order is easier to process than nested order.
cogsci-03: 73
Processing of nested and crossed dependencies
• “German and Dutch subjects performed two tasks -- rating comprehensibility and a test of successful comprehension -- on matched sets of sentences which varied in complexity from a simple sentence to one containing three levels of embedding”; “no difference between Dutch and German for sentences within the normal range (up to one level) but with a significant preference emerging for the Dutch crossed order” Bach, Brown, and Marslen-Wilson (1986)
cogsci-03: 74
Processing of nested and crossed dependencies
(1)Hans1 Peter2 Marie3 schwimmen3 lassen2 sah1
(German– nested order)
(2)Jan1 Piet2 Marie3 zag1 laten2 zwemmen3
(Dutch– crossed order)
• It is not enough to locate a well-formed structure; we need to have a place for it to go
  -- In (1) a PDA can locate the innermost N3 and V3, but we do not know at this stage where this structure belongs; we do not have the higher verb, V2
• The PDA is inadequate for (1) and, of course, for (2)
cogsci-03: 75
Processing of nested and crossed dependencies
(1)Hans1 Peter2 Marie3 schwimmen3 lassen2 sah1
(German– nested order)
(2)Jan1 Piet2 Marie3 zag1 laten2 zwemmen3
(Dutch– crossed order)
• An EPDA can precisely model the processing of (1) and (2), consistent with the principle that when a well-formed structure is identified, it is POPPED only if there is a place for it to go, i.e., the structure in which it fits has already been POPPED
  -- Principle of Partial Interpretation (PPI), Joshi (1990), based on Bach, Brown, and Marslen-Wilson (1986)
cogsci-03: 76
Processing of nested and crossed dependencies
(1)Hans1 Peter2 Marie3 schwimmen3 lassen2 sah1
(German– nested order)
(2)Jan1 Piet2 Marie3 zag1 laten2 zwemmen3
(Dutch– crossed order)
• Measure of complexity: the maximum number of items from the input that have to be held back before the sentence processing (interpretation) is complete
• German is about twice as hard as Dutch
cogsci-03: 77
Processing of nested and crossed dependencies
• The Principle of Partial Interpretation (PPI) can be correctly instantiated for both Dutch and German, resulting in a complexity for German about twice that for Dutch
• Among all possible strategies consistent with PPI, choose the one, say M1, which makes Dutch as hard as possible
• Among all possible strategies consistent with PPI, choose the one, say M2, which makes German as easy as possible
• Then show that the complexity of M1 is less than that of M2 by about the same proportion as in Bach et al. (1986)!
cogsci-03: 78
Processing of nested and crossed dependencies
• Significance of the EPDA modeling of the processing of nested and crossed dependencies
• Precise correspondence between EPDA and TAG -- a direct correspondence between processing and grammars
• We have a precise characterization of the computational power of the processing strategy
Much more recent work, e.g., Gibson (2000), Lewis (2002), Vasishth (2002)
cogsci-03: 79
Competence performance distinction - a new twist
• How do we decide whether a certain property is a competence property or a performance property?
• Main point: the answer depends on the formal devices available for describing language!
• In the context of MC-LTAG describing a variety of word order phenomena, such as scrambling, clitic movement, and even scope ambiguities, there is an interesting answer
• We will look at scrambling (e.g., in German)
cogsci-03: 80
Competence performance distinction- a new twist
(1) Hans1 Peter2 Marie3 schwimmen3 lassen2 sah1
(Hans saw Peter make Marie swim)
• In (1) the three nouns are in the standard order. It is possible for them to be in any order, in principle, keeping the verbs in the same order as in (1) -- for example, as in (2)
(2) Hans1 Marie3 Peter2 schwimmen3 lassen2 sah1
In general, P(N1, N2 … Nk) Vk Vk-1 … V1
where P is a permutation of k nouns
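The space of scrambled orders is just the set of noun permutations; a quick sketch (illustrative, not from the talk):

```python
from itertools import permutations

def scrambled_orders(k):
    """All word orders P(N1..Nk) Vk ... V1 for k center-embedded clauses."""
    verbs = [f"V{i}" for i in range(k, 0, -1)]
    return [list(p) + verbs
            for p in permutations([f"N{i}" for i in range(1, k + 1)])]

orders = scrambled_orders(3)
print(len(orders))                                     # 6 = 3!
# Example (2) above corresponds to:
print(["N1", "N3", "N2", "V3", "V2", "V1"] in orders)  # True
```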
cogsci-03: 81
Competence performance distinction- a new twist
(A) Sentences involving scrambling from more than two levels are difficult to interpret
(B) Similar to the difficulty of processing more than two (perhaps even more than one) center embeddings of relative clauses in English:
  The rat the cat the dog chased bit ate the cheese
(C) Since the difficulty in (B) is regarded as a performance property, we could declare the difficulty in (A) also a performance property -- but WAIT!
cogsci-03: 82
Competence performance distinction- a new twist
• We already know that the class of tree-local MC-TAG has the property that for any grammar G in this class, if G works up to two levels of embedding, then it fails beyond two levels for some patterns of word order, by not being able to assign a correct structural description, i.e., correct semantics
• The inability to assign correct structural descriptions is the reason for the processing difficulty!!
cogsci-03: 83
Competence performance distinction- a new twist
• So what should we conclude?
• The claim is not that we must conclude that the difficulty of processing sentences with scrambling from more than two levels of embedding has to be a competence property
• The claim is that we are presented with a choice -- the property can be a competence property -- or, we can continue to regard it a performance property
cogsci-03: 84
Competence performance distinction- a new twist
• To the best of my knowledge, this is the first example where a particular processing difficulty can be claimed as a competence property
• Hence, whether a property is a competence property or a performance property depends on the formal devices (grammars and machines) available to us for describing language
• What about the difficulty of processing more than two levels (perhaps only one) of center embedding of relative clauses in English?
cogsci-03: 85
Competence performance distinction- a new twist
• In order to show that the difficulty of processing sentences with more than two levels of center embedding of relative clauses is a competence property, we would have to exhibit a class of grammars such that, for any grammar G in the class, if G assigns correct structural descriptions (correct semantics) for all sentences up to two levels of embedding, then G fails to assign correct structural descriptions to some sentences with more than two embeddings
cogsci-03: 86
Competence performance distinction- a new twist
• For each grammar G in such a class
  -- if G works up to two levels, then
  -- G fails beyond two levels
• However, as far as I know, we cannot exhibit such a class
  -- Finite State Grammars (FSG) will not work
  -- CFGs will not work
  -- TAGs will not work
• So we have no choice but to regard the processing difficulty as a performance property
cogsci-03: 87
Competence performance distinction- a new twist
• For center embedding of relative clauses -- we have no choice, so far
• For scrambling of center-embedded complement clauses -- we have a choice; we have an opportunity to claim the property as a competence property
• The two constructions are quite different
• The traditional assumption that all such properties have to be performance properties is not justified at all!
cogsci-03: 89
Tree-Local Multi-component LTAG (MC-LTAG)
- How can the components of an MC-LTAG compose while preserving the locality of LTAG?
- Tree-local MC-LTAG: components of a set compose only with an elementary tree or an elementary component
- Non-directional composition
- Tree-local MC-LTAGs are weakly equivalent to LTAGs
- However, tree-local MC-LTAGs provide structural descriptions not obtainable by LTAGs
- Increased strong generative power
cogsci-03: 91
Scrambling: N3 N2 N1 V3 V2 V1
Each clause i contributes a two-component set: one component carries the noun Ni (a VP auxiliary tree) and the other carries the verb Vi together with an empty element e (a VP tree). The components combine by substitution and adjoining (non-directional composition; the semantics comes from the attachments).

[Tree diagrams: the two-component VP sets for N3/V3, N2/V2, and N1/V1.]
cogsci-03: 92
Scrambling: N0 N3 N2 N1 V3 V2 V1 V0
The same two-component sets, now with an additional clause (N0, V0):

[Tree diagrams: the two-component VP sets for N0/V0, N3/V3, N2/V2, and N1/V1.]

(breakdown after two levels of embedding)
cogsci-03: 93
Scrambling: N0 N3 N2 N1 V3 V2 V1 V0
-- Beyond two levels of embedding, semantically coherent structural descriptions cannot be assigned to all scrambled strings
-- The multi-component tree for V0 is forced to combine with the VP component of the V2 tree
-- The V0 tree cannot be combined with the V1 tree because the composition has to be tree-local
-- Similar results hold for clitic ‘movement’
cogsci-03: 94
Semantics
Example: Harry eats fruit for breakfast

Derivation tree:

  (eats)
    (Harry)   substitution, address 1
    (fruit)   substitution, address 2.2
    (for)     adjoining, address 2
      (breakfast)   substitution

Each elementary tree contributes a labeled predication:

  l1: eats(x, y, e)   l2: Harry(x)   l3: fruit(y)   l4: for(e, z)   l5: breakfast(z)

Composed semantics (flat conjunction):

  eats(x, y, e) ^ Harry(x) ^ fruit(y) ^ for(e, z) ^ breakfast(z)
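The flat, conjunctive composition can be sketched directly from the labeled predications (the dictionary encoding is assumed; the predications themselves are from the slide):

```python
# Each elementary tree contributes one labeled predication; semantic
# composition on the derivation tree simply conjoins them, sharing
# variables (x, y, e, z) across labels.
labels = {
    "l1": "eats(x, y, e)",
    "l2": "Harry(x)",
    "l3": "fruit(y)",
    "l4": "for(e, z)",
    "l5": "breakfast(z)",
}

def flat_semantics(labels):
    """Conjoin the predications in label order."""
    return " ^ ".join(labels[k] for k in sorted(labels))

print(flat_semantics(labels))
# eats(x, y, e) ^ Harry(x) ^ fruit(y) ^ for(e, z) ^ breakfast(z)
```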