morphological decomposition in word recognition: how to exploit

Decomposition to the Root: MEG Studies of Morphologically Complex Words
Alec Marantz
Olla Solomyak, Ehren Reilly
NYU Depts. of Linguistics and Psychology
KIT/NYU MEG Joint Research Lab




Decomposition to the root
(why the morphologist cares about lexical access)

• Claim associated now with Distributed Morphology:
  – all “lexical categories” decompose at least to a root and a category-determining affix
  – all relations between words or morphemes (e.g., blocking relations) are computed at the syntactic level of terminal nodes. Thus a single item (e.g., undecomposed irregular past tense “gave”) cannot “compete” with a complex structure (e.g., [give [pst]]) (Embick & Marantz 2008)
  – the grammar itself demands full decomposition to the root; the existence of “whole word” routes to lexical access or processing would necessitate a different grammatical system for processing language as opposed to, say, computing grammaticality


Decomposition to the root
(why the morphologist cares about lexical access)

• Tracking the “ami” in “amiable,” then, is one step along the way toward understanding how the root “cat” functions inside “cat”

[Tree diagrams: [adj [√ami] [adj -able]] and [n [√cat] [n ø]]]


(Overly) Simplified Models of Lexical Access: Pinker’s Words and Rules

• Full storage model: all complex words (walked, taught) stored and accessed as wholes
  – only surface frequency effects predicted
  – Reaction Time (RT) correlates with the surface frequency of a complex word
• Full decomposition model: no complex words stored and accessed as wholes
  – only stem frequency effects predicted
  – RT correlates with the frequency of the stem of a complex word, not the frequency of the word as a whole (surface frequency)


• Dual Route Model (Pinker’s): irregular complex forms (taught) are stored and accessed as wholes; regular complex forms (walked) are not:
  – surface frequency effects for irregulars (and high frequency regulars) = RT to taught correlates with freq of taught, not teach
  – stem frequency but no surface frequency effects on access for regulars = RT to walked correlates with freq of walk, not walked


• There are Stem Frequency effects in access for complex words
  – RT to walked does correlate with freq of walk

• These effects are not attributable to post-access decomposition


• But surface frequency effects in lexical access are found in a wide variety of cases, including completely regular morphology (e.g., for most inflected words in Finnish)


• Surface frequency effects even for transparent productive regular morphology like -less and for the same words that yield base frequency effects
  – surface frequency effects when surface frequency is varied and base frequency is held constant
  – base frequency effects when base frequency is varied and surface frequency is held constant

E.g.:


Additional Problems for Pinker-style Dual Route Model

• The representation of irregular derived or inflected forms must be complex
  – from the grammatical point of view, gave is as complex as walked
    • no further affixation: *the gaving, *the walkeding (note: Pinker’s appeal to irregular plurals inside compounds highlights his incorrect prediction here – mice eater, but *micey (mousey))
    • alternations with do support: Did he walk/*walked, Did he give/*gave


  – from the psycho- and neurolinguistic point of view, irregulars contain the stem in the same way that regulars do
    • taught-teach identity priming in long-lag priming (only identity (“morphological”) relations, not semantic or phonological ones, survive in long-distance priming)
    • and for M350 brain response (e.g., Stockall & Marantz 2006)
      – taught-teach M350 (~N400) priming equivalent to identity priming, although RT priming is reduced


Whole Word “Representations” for Regulars, if Surface Frequency effects imply whole word representations (in some sense)

• Surface frequency effects on access are seen for a variety of completely regular derivations and inflections, implying whole word representations, in some sense
• Obligatory decomposition:
  – surface frequency effects could be tied to decomposition (the more you’ve decomposed a particular letter/sound sequence into stem and affix, the faster you are at it) and/or
  – recombination (the more often you’ve put together a particular stem and affix, the faster you are at it)
  – in either case, against Pinker’s dual route model, such effects imply representation of whole word as complex structure, regardless of regularity


• walked may be “stored” as a complex form with a certain frequency in the same way that a saying like “And now for something completely different” is

• That is, any surface frequency effect may be connected to long-term effects of having computed a complex form and thus imply a “representation” of the complex form, no matter how regular

• This “usage-based” account of frequency effects holds no immediate implications for the grammar of morphologically complex words, nor for the issue of whether all complex words are recognized via decomposition (and recomposition)


[Diagram: levels of representation in visual word recognition, illustrated with “unreal”]
• form code (letters): u n r e a l
• modality specific access lexicon (visual word form): un, real, unreal (??)
• lemma (lexical entry): “not”, REAL, UN+REAL (??)
• structure: [un[real]]
• Encyclopedia: stored info about encountered items (outside language system), e.g., “And now for something completely different”, White House

Interactive dual route models and obligatory decomposition models differ on the possible presence of complex word forms in modality specific access lexicons, and perhaps on whether derived forms have “lexical entries”


Differences Between Realistic Dual Route Model and Realistic Full Decomposition Model

• Both models require a (modality specific) word form “lexicon”
  – for full decomposition model, this lexicon holds only forms of morphemes
  – for dual route model, this lexicon holds some morphologically complex forms
• Dual Route but not Full Decomposition model allows whole word lexical entries and word form entries for morphologically complex forms


Stages of Lexical Access: which computations in a Full Decomposition Model affect RT?

• I. Decomposition (affix-stripping): no general effect on RT
  – Taft: cost-free
  – Literature: no evidence that ease or difficulty in affix stripping generally correlates with change in RT
  – MEG studies (to be discussed): brain activity correlated with decomposition does not correlate with RT (more brain work associated with decomposition does not yield longer RTs)


• II. Lemma access: frequency of “lemma” (stem) correlates with RT
  – Lemma (stem) access is modulated by frequency and by priming
  – Morphological family size of a stem and number of related senses (polysemy) have been shown to modulate brain activity associated with lemma access at the same brain time/place (the “M350”) as stem frequency
  – However, the relationship between an affix and a stem for a morphologically complex word has not been shown to affect the same brain response


• III. Recomposition: surface frequency statistics correlate with RT because of their role in determining the ease of recomposition of stem and affixes
  – So, whole word “representations” (in the sense of “Encyclopedia” storage or simply in the sense of repeatedly used neural pathways) are accessed via decomposition and recomposition, where the surface frequency properties of these representations exert a late influence on lexical access


Sequential processing of words


Sequential processing of words

Pylkkänen and Marantz, 2003, Trends in Cognitive Sciences


[Figures: (1) Behavioral Data: Reaction Time by Frequency Category (Frequent -- Infrequent); (2) Latency of M350 Component by Frequency Category (Frequent -- Infrequent). Panel labels: Repetition, Frequency.]

Frequency categories (n/million): 1: 700, 2: 140, 3: 30, 4: 6, 5: 1, 6: .2

Sources: (Pylkkänen, Stringfellow, Flagg, Marantz, Biomag2000 Proceedings, 2000); (Embick, Hackl, Schaeffer, Kelepir, Marantz, Cognitive Brain Research, 2001)

Latency of M350 sensitive to lexical factors such as lexical frequency and repetition: reflects stage of lexical access


Full Decomposition Model Related to MEG response components

• M100 (“Type I” Tarkiainen et al.) response from primary visual areas
  – visual feature analysis
• M130 (“Type II”) response from occipital-temporal junction
  – abstract letter string analysis
• M170 (“visual word form area”) response from fusiform area
  – affix stripping and functional morpheme identification
  – visual word form recognition


Regions of interest derived from peak activity in grand averaged data across subjects


• M350 (early “N400m”) response from temporal lobe, with possible (likely) contribution from inferior frontal cortex
  – lemma activation
• Post-M350 N400m response from temporal lobe (and other regions)
  – recombination of stem and affix, contact with Encyclopedic knowledge, integration into context


Statistical Connections between Stem and Affix

• J. Hay proposes that the transition probability of the affix given the stem (so, from stem to affix) should correlate with ease of decomposition: the higher this probability, the harder the decomposition and the more “affix dominant” a complex word is


• The transition probability of the stem given the affix (from affix to stem), on the other hand, could reflect the ease of recomposition.
  – Note that for all but the most frequent regular English past tense verbs, the probability of the stem given the past tense suffix is vanishingly small.
  – If RT that seems to correlate with surface frequency is actually correlating with the transition probability from affix to stem, this could explain why regular formations in English do not show surface frequency effects unless the frequencies are very high.


Transition Probabilities & Affix dominance

[Figure: two ratios illustrated with “merely”]
• stem to suffix: tokens of “merely” / tokens of words containing “mere”
• suffix to stem: tokens of “merely” / tokens of words with -ly

Transition probability from stem to suffix correlates with ratio of a suffixed word’s frequency to frequency of words with the same stem, which is essentially equivalent to “affix dominance”

Transition probability from suffix to stem correlates with ratio of a suffixed word’s frequency to the frequency of words with the same suffix


Hypothetical example: matched for stem frequency (9), differing in whether surface dominant (mere(ly)) or stem dominant (sane(ly))

• Tokens with stem “mere”: mere ×4, merely ×5
• Tokens with stem “sane”: sane ×8, sanely ×1
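The hypothetical mere/sane counts can be turned into a stem-to-affix transition probability directly. The sketch below is illustrative only: the function names are made up for this example, prefix matching stands in for “words containing the stem,” and the suffix total passed to the second function is a placeholder, not a corpus count.

```python
from collections import Counter

# Token lists copied from the slide's hypothetical example (9 tokens per stem).
MERE_TOKENS = ["mere", "merely"] * 4 + ["merely"]
SANE_TOKENS = ["sane", "sanely"] + ["sane"] * 7


def stem_to_affix_probability(tokens, stem, affixed_word):
    """P(affix | stem): tokens of the affixed word / tokens of words with that stem.

    Assumption: a word "contains" the stem if it starts with it.
    """
    counts = Counter(tokens)
    stem_total = sum(n for word, n in counts.items() if word.startswith(stem))
    return counts[affixed_word] / stem_total


def affix_to_stem_probability(affixed_word_tokens, suffix_tokens):
    """P(stem | affix): tokens of the affixed word / tokens of all words with the suffix."""
    return affixed_word_tokens / suffix_tokens


if __name__ == "__main__":
    print(stem_to_affix_probability(MERE_TOKENS, "mere", "merely"))  # 5 of 9 tokens
    print(stem_to_affix_probability(SANE_TOKENS, "sane", "sanely"))  # 1 of 9 tokens
    # 1000 is a placeholder total for -ly tokens, not a real corpus figure.
    print(affix_to_stem_probability(5, 1000))
```

With stem frequency matched at 9, “merely” comes out affix dominant (5/9) and “sanely” stem dominant (1/9).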


Effect of “Dominance” on Lexical Access: view from interactive dual route model

• Hay: affix dominance leads to difficulty in parsing/decomposition, thus reliance on whole-word recognition and suppression of decomposition in favor of whole-word route

• So, words with high affix dominance should not be recognized via decomposition and should show only surface frequency effects


Taft (2004): “Morphological Decomposition and the Reverse Base Frequency Effect”

Obligatory decomposition makes predictions similar to those of the interactive Dual Route model for RT in lexical decision

• Base frequency effects (RT to complex word correlates with freq of stem) reflect accessing the stem of morphologically complex forms, whereas
• Surface frequency effects (RT to complex word correlates with freq of complex word) reflect the stage of checking the recombination of stem and stripped affix for existence and/or well-formedness.


How can we distinguish these accounts of RT differences?

• With brain evidence for the various stages of lexical access leading up to the RT
  – Interactive dual route models: no base frequency effects at lexical access for affix-dominant words
  – Full decomposition: base frequency effects across affix- and stem-dominant words at lexical access followed by surface frequency effects in RT associated with recombination

Reilly, Badecker & Marantz 2006 (Mental Lexicon):


Experiment: parallel behavioral and MEG processing measures

• Lexical Manipulation (Baayen, Dijkstra & Schreuder, 1997, JML)
  – Lemma/stem frequency (CELEX database)
  – Stem vs. affix dominance

Stem Frequency | Stem Dominant = low surface freq | Affix Dominant = high surface freq
High           | desk – desks                     | crop – crops
Mid            | deck – decks                     | cliff – cliffs
Low            | chef – chefs                     | chord – chords


Stimuli: 3 Lexical Categories
fully productive morphology

• Nouns: singular/plural
  – bone
  – bones
• Verbs: stem/progressive
  – chop
  – chopping
• Adjectives: adjective/-ly adverb
  – clear
  – clearly


Experiment: behavioral measures
• Reliable effect of stem frequency in RT in lexical decision

[Figure: RT (axis 620 to 760) by Stem Frequency (High, Medium, Low)]


Experiment: behavioral measures
• Interacting effects on RT of affixation (base vs. affixed) and dominance (base-dominant vs. affix-dominant)

[Figure: RT (axis 640 to 780) by Affixation (Unaffixed, Affixed), plotted separately for Base-Dominant and Affix-Dominant words]

This is a surface frequency effect for completely regular morphology.

Same words, both base and surface frequency effects, undermining Pinker theory


M350 sensors chosen subject by subject


Analysis of M350 peak latency
(brain index of lexical access)

• Reliable effect of Stem frequency for unaffixed words and for affixed words

[Figures: M350 peak latency (axis 250 to 400) by Stem Frequency (High, Medium, Low); one panel for Unaffixed Words, one for Affixed Words]


Analysis of M350 peak latency
• No effect of Dominance (base-dominant vs. affix-dominant), i.e., no effect of surface frequency, on M350 peak latency
  – Against prediction of interactive dual route theory

[Figure: M350 peak latency (axis 250 to 400) for Affix Dominant vs. Base Dominant Affixed Words]


Analysis of M350 peak latency
• No interaction between Dominance (base-dominant vs. affix-dominant) and Affixation (base vs. affixed)

[Figures: two panels by Affixation (Unaffixed, Affixed), each plotted separately for Base-Dominant and Affix-Dominant words: M350 peak latency (axis 335 to 385) and Behavioral RT (axis 640 to 780)]


Analysis of M350 peak latency
• Evidence that early stages of access for affixed words are based on full parsing: Stem frequency affects M350/lexical access while whole word frequency affects post-access (recombination) stage of word recognition.

[Figure: M350 Peak Latency and Residual RT for Base-Dominant and Affix-Dominant Affixed Words (axis 0 to 800)]


But what about evidence for parsing and recombination?

RMS Correlations Across Subjects
• For some set of sensors, calculate at each time point in each experimental “epoch” the root mean square (RMS) = the square root of the mean of the squares of the values at each sensor (after normalization of values)

• So, for each subject, for each item, an RMS “wave” can be provided for the correlational analysis

• At each time point, the RMS value for each stimulus is correlated with a stimulus variable
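The RMS steps above can be sketched in NumPy. This is a minimal illustration, not the lab's analysis code: the array shapes, the function names, and the use of Pearson correlation are assumptions, and the input is taken to be already normalized.

```python
import numpy as np


def rms_wave(epoch):
    """RMS across sensors at each time point.

    epoch: array of shape (n_sensors, n_times), assumed already normalized.
    Returns an array of shape (n_times,).
    """
    epoch = np.asarray(epoch, dtype=float)
    return np.sqrt(np.mean(np.square(epoch), axis=0))


def timepoint_correlations(epochs, stimulus_variable):
    """Pearson r between a stimulus variable and the RMS value at each time point.

    epochs: array of shape (n_items, n_sensors, n_times), one epoch per stimulus.
    stimulus_variable: sequence of length n_items (e.g., a frequency measure).
    Returns an array of shape (n_times,).
    """
    epochs = np.asarray(epochs, dtype=float)
    variable = np.asarray(stimulus_variable, dtype=float)
    waves = np.sqrt(np.mean(np.square(epochs), axis=1))  # (n_items, n_times)
    x = variable - variable.mean()
    y = waves - waves.mean(axis=0)
    return (x @ y) / np.sqrt((x ** 2).sum() * (y ** 2).sum(axis=0))
```

Calling `timepoint_correlations` once per subject gives one correlation wave per subject, which can then be compared across subjects.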


Grand Average All Stimuli All Subjects (11)


M170 sensors chosen on the basis of field pattern, subject by subject


M170 Correlation with Dominance: Significant “parsing” effect

The higher the transition probability from stem to affix, the higher the M170 amplitude – for affix-dominant words


Recombination Effect?: Correlation with Conditional Probability of Stem, Given Affix, for Affixed Words at 450ms, after the M350


Summary of Dominance Exp
• Base and Surface Freq RT effects for same words again argue against simplistic (Pinker) Dual Route theory
• Affix dominance effect at M170 for high affix dominant words argues against Hay’s interactive Dual Route theory, where such words should be accessed via the whole word route, as does the lack of M350 latency effects for these words
• M350 latency effects for stem frequency but not surface frequency (and not affix dominance), followed by effect of transition probability from affix to stem post M350, argue that recombination dominates RT effect for surface frequency of affixed words


Evidence for an orthographic word form lexicon

• Frequency of stem relative to full affixed form – affix dominance – correlates with M170 amplitude; implies access to some kind of stem representation

• Zweig & Pylkkänen (2008) show M170 effect of decomposition in the contrast between farmer (complex) and winter (simple), where the contrast implies access to a representation of farm at the M170 (wint lacks a representation)


Zweig & Pylkkänen (2008, LCP)

Bimorphemic: farmer, Monomorphemic Orth: winter


Modality-Specific Access Lexicon?

• Pulvermüller in a number of studies has found early (~150ms) word frequency effects in evoked brain responses in the posterior brain regions

• These are found for monomorphemic words, and the effects seem limited to shorter words

• These could be explained by higher order n-gram frequencies - by the frequencies of letter strings, i.e., by features of word form representations that do not make contact with the (semantic) lexicon


Modality-Specific Access Lexicon?
• “Parsing” at the M170 requires access to word forms (or to high-n n-grams)
• Dominance effects at the M170 suggest frequency information associated with word-forms
  – dominance reflects the conditional probability of the affix given the stem, where notion of “stem” implies form representation of the stem
• Difference between visual word form representation and lexical entry?
  – heteronyms like “wind” (“moving air” vs. “twist”)
  – visual word form frequency is not the same as lexical frequency
  – “wind” has one word form frequency but two lexical frequencies, one for each meaning
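The distinction between word form frequency and lexical frequency can be made concrete with a small sketch. The function name and the counts in the usage note are hypothetical, and the sketch assumes the heteronym has exactly two readings, so that its form frequency is the sum of the two lexical frequencies.

```python
def heteronym_measures(freq_reading_a, freq_reading_b):
    """Return (word_form_frequency, frequency_ratio) for a two-reading heteronym.

    Assumption: every token of the letter string belongs to one of the two
    readings, so the form frequency is the sum of the lexical frequencies.
    """
    form_frequency = freq_reading_a + freq_reading_b
    frequency_ratio = freq_reading_a / freq_reading_b
    return form_frequency, frequency_ratio
```

For example, made-up counts of 30 and 10 for the two readings of “wind” give a single form frequency of 40 and a ratio of 3.0 between the readings.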


Lexical access in early stages of visual word processing: A single-trial correlational MEG study of heteronym recognition
Marantz & Solomyak (2008, Brain & Language)

• All (20) monomorphemic heteronyms (meeting other criteria) of English

• If M170 marks access to visual word form representations, but not lexical entries, then only form frequency variables associated with heteronyms should correlate with M170 brain activity

• If M170 marks lexical access, relative frequency of the 2 pronunciations of heteronyms should correlate with activity


Regions of interest derived from peak activity in grand averaged data across subjects


• The white point represents the peak of the Visual Word Form Area, as identified by Cohen et al. (2002)
• The yellow line outlines the region of peak M170 activation in an average of 9 subjects’ brain activity.

Visual Word Form Area
Left Hemisphere, Ventral View


Mean Activity in LH M170 Region for 9 Subjects
(Dotted line shows average across subjects)


Grand averaged activation over time from M170 and M350 ROIs


Only the form property (~bigram frequency) showed significant correlation with brain activity in the M170 ROI while only the semantic property (ratio of frequency of meanings) showed a significant correlation in the M350 ROI

A Monte Carlo procedure was used to test for significance in the face of multiple comparisons (across time points)
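The slide does not say which Monte Carlo procedure was used. One common way to control for multiple comparisons across time points is a max-statistic permutation test, sketched below under that assumption; the function names and defaults are illustrative, not taken from the study.

```python
import numpy as np


def pearson_by_time(waves, variable):
    """Pearson r between `variable` (n_items,) and each column of `waves` (n_items, n_times)."""
    x = variable - variable.mean()
    y = waves - waves.mean(axis=0)
    return (x @ y) / np.sqrt((x ** 2).sum() * (y ** 2).sum(axis=0))


def max_stat_permutation_test(waves, variable, n_permutations=1000, seed=0):
    """Corrected p-value per time point from a max-|r| permutation distribution.

    Shuffling the stimulus variable breaks any item-level link to the brain data;
    the largest |r| over all time points in each shuffle forms the null distribution.
    """
    waves = np.asarray(waves, dtype=float)
    variable = np.asarray(variable, dtype=float)
    rng = np.random.default_rng(seed)
    observed = pearson_by_time(waves, variable)
    null_max = np.empty(n_permutations)
    for i in range(n_permutations):
        shuffled = rng.permutation(variable)
        null_max[i] = np.abs(pearson_by_time(waves, shuffled)).max()
    # Count shuffles whose maximum |r| reaches the observed |r| at each time point.
    exceed = (null_max[None, :] >= np.abs(observed)[:, None]).sum(axis=1)
    p_corrected = (exceed + 1) / (n_permutations + 1)
    return observed, p_corrected
```

Because each observed correlation is compared against the maximum over all time points, a time point only counts as significant if its correlation is unusual for the whole analysis window.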


Evidence so Far

• Decomposition even for “affix-dominant” words
  – evidence at M170 that high transition probability between stem and affix makes affix-stripping harder
  – evidence post-M350 and at RT that surface frequency makes recomposition easier


• Evidence for “visual word form lexicon” accessed at M170
  – transition probability effects at M170 depend on frequencies over word form representations
  – complexity effects at M170 (Zweig & Pylkkänen) depend on wint vs. farm word form contrast: farm is a word form but wint (in winter) isn’t


• Evidence that word form effects involve word forms, not lexical entries
  – open bi-gram frequency (representational form for word forms) correlates with activity at M170
  – but frequency ratio for heteronyms doesn’t correlate with activity at M170
  – but does correlate with activity at M350


What about the status of bound stems?
Can MEG help settle a disputed linguistic issue?

• Bound stem: durable
  – same root in duration
  – predicts durability
• Unique stem: amiable
  – no other uses of root
  – but, predicts amiability


tracking the -able in amiable

• If words like durable, with a recurring root, and amiable, with a unique root, are nevertheless parsed and computed in the same way as workable, which has a free root, then
  – M170 “parsing” effects should be visible for these “opaque” words, since effects are strong for affix-dominant words
  – M350 effects should be observed for stem frequency for bound stems


Crucial contrasts:
• To show effect of affix processing, need to show correlation with, e.g., affix frequency that is not equally explained by the positional frequency of the letters at end of the affixed word
  – distinguish “able” as affix from “a-b-l-e”
• To show effect of “parsing variable” transition probability of affix given the stem, need to show correlation with transition probability that is not equally explained by the transition probability between the last letters of the stem and letters of the suffix.


Categories of Affixed Words for New Experiment

• 1. Free Root-Affix
  – taxable
• 2. Bound Root-Affix
  – tolerable
• 3. Unique Root-Affix
  – capable

• Morphological parsing as from English Lexicon Project


Nine Affixes
(All derivational suffixes in English that yielded reasonable number of examples for each category)

• able
• ary
• ant
• ity
• ate
• ic
• er
• al
• ion


The ROIs determined again from the grand average across subjects.


Decomposition Effects at M170
• Positional letter string freq effects at M130
• Affix freq effects but no letter string effects at M170
• Morph trans probability effects but no orthographic trans probability effects at M170
  – Multiple regression, taking out first (non-significant) orthographic parsability leaves significant effect of morphological parsability at M170
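The logic of “taking out” the orthographic variable first can be sketched as a two-step regression: regress activity on the orthographic predictor, then test the morphological predictor against the residuals. This is an illustrative sketch with hypothetical names, not the analysis actually run; the slide does not give the regression details.

```python
import numpy as np


def residualize(y, nuisance):
    """Residuals of y after least-squares regression on an intercept and `nuisance`."""
    y = np.asarray(y, dtype=float)
    nuisance = np.asarray(nuisance, dtype=float)
    design = np.column_stack([np.ones_like(nuisance), nuisance])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def effect_after_removing(activity, orthographic_tp, morphological_tp):
    """Correlation of the morphological predictor with activity once the
    orthographic predictor has been regressed out of the activity."""
    residuals = residualize(activity, orthographic_tp)
    return np.corrcoef(residuals, np.asarray(morphological_tp, dtype=float))[0, 1]
```

If the morphological transition probability still correlates with the residuals, its effect is not equally explained by the letter-level transition probability, which is the contrast the slide describes.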


Summary
• At M130, form property of final letter frequency correlates with activity
• At M170, affix frequency but not final letter frequency correlates with activity for all groups, including bound and unique root groups

• At M170 transition probability between stem and affix, but not between last letters of stem and letters of affix, correlates with activity for both free and bound stems

• At M350, stem frequency effects for both free and bound root stems


bound stems

• For transition probability results, bound stems pattern with free stems

• For affix frequency results, all stems, including unique bound stems, pattern alike

• Thus we find evidence for full decomposition for free, bound, and unique stems


Conclusions
• All evidence massively disconfirms a Pinker-style dual route theory in which some morphologically complex words are recognized as undecomposed wholes

• Full Decomposition theories of lexical access are completely consistent with (in fact predict) surface frequency effects for morphologically complex words

• Surface frequency effects reflect statistics of composition rather than the frequency of whole word access


• MEG data confirm the existence of a visual word form lexicon that enters into morphological decomposition in the recognition of complex words

• MEG confirms the morphologist’s claim that decomposition extends to bound and unique roots


Thanks to the audience and the colloquium organizers!