
Upload: brittany-casey

Post on 16-Dec-2015


TRANSCRIPT

Page 1: RECOGNISING NOMINALISATIONS Supervisors: Dr. Alex Lascarides Dr. Mirella Lapata (Andrew) Yuk On KONG University of Edinburgh

RECOGNISING NOMINALISATIONS

• Supervisors: Dr. Alex Lascarides

Dr. Mirella Lapata

• (Andrew) Yuk On KONG

• University of Edinburgh

Page 2:

DEFINITION

• “Nominalisation refers to the process of forming a noun from some other word-class. (e.g. red+ness)

• or (in classical transformational grammar especially) the derivation of a noun phrase from an underlying clause (e.g. Her answering of the letter….from She answered the letter).

• The term is also used in the classification of relative clauses (e.g. What concerns me is her attitude)…….” (Crystal 1997)

Page 3:

• Nominalisations (1st definition) from verbs only are considered here, e.g. "statement" from "state".

• Problem: given a word, is it a noun? Is it derived from a verb or not?

• Nominalisations derived from verbs are very productive in English and are usually created by means of suffixation (i.e., suffixes that form nouns are attached to verb bases).

Page 4:

EXCLUSIONS

• Nominals, e.g. the poor, the wounded

• Nominalisation NOT From Verb, e.g. redness

• -ing form, e.g. the making of the movie

• Antidisestablish-ment-arian-ism

Page 5:

REGULAR?

• Nominalise nominalisation

• Interpret interpretation

• Interrupt interruption

• Associate association

• delete deletion

• break breakage

• leak leakage

Page 6:

• Confine confinement

• Refine refinement

• (but define definition)

• submit submission

• admit admission (but also admittance)

• remit remission; remittance; remit

Page 7:

VERB=NOUN

• Debate debate (not debation); debater

• Pay pay

• Love love

• Boss boss

• Stand stand

• purchase purchase

• Lie lie (“tell a lie”) (cf. lie down)

Page 8:

VERB=NOUN (except stress)

• transfer transfer

• transport transport

• import import

• rebel rebel; (rebellion)

Page 9:

1 VERB, >1 NOUNS

• Collect collection; collector

• Interpret interpretation; interpreter

• Cover cover; coverage

• Conduct conduction; conductor

• Depend dependant/dependent; dependence; dependency

Page 10:

SEMANTICS

• Conduct conduction (conduct electricity/heat)

• Conduct conduct (behave/organise)

Page 11:

WHEN TO USE WHICH SUFFIX

• -tion/-sion

• -er/-or

• Debate debater

• Talk talker

• Collect collector

• Conduct conductor

Page 12:

IRREGULAR NOMINALISATION

• Choose choice

• Succeed success; succession; successor

• Decide decision

• Sell sale

Page 13:

PSEUDO-NOMINALISATION

• mote?? Motion (“mote” is a noun: a very small piece of dust)

• Depart Departure; Department???

• Apart apartment????

Page 14:

WHY BOTHER?

• The identification of nominalisations and their associated verbs (e.g. "statement" and "state") is important for a number of NLP tasks:

– machine translation

– information retrieval

– automatic learning of machine-readable dictionaries

– grammar induction

Page 15:

HOW ?

• nominalisation is a productive morphological phenomenon:

• list all acceptable nominalised forms?

• New words?

Page 16:

TECHNIQUES NOT FOCUSING ON NOMINALISATIONS

• build rules

• machine-learning approaches to induce morphological structures using large corpora

• knowledge-free induction of inflectional morphologies (Schone and Jurafsky 2001).

Page 17:

SCHONE AND JURAFSKY (2001)

• Schone and Jurafsky (2001) worked on acquiring cognates and morphological variants, drawing on:

– Induced semantics: Latent Semantic Analysis (LSA)

– Induced orthographic info

– Induced syntactic info

– Transitive information

– Affix frequencies

Page 18:

GOAL OF THIS STUDY

• The principal goal of this project is to develop a system which can recognise nominalisations, together with the verbs from which they are derived.

Page 19:

EXPERIMENT 1 (baseline)

• identify nouns using the tags in the corpus

• identify potential nominalisations from the list of nouns with a list of nominalisation suffixes

• find the corresponding potential verb for each by identifying the verb (from among verbs as tagged) that shares with it the greatest number of letters in sequence

• accept a pair of nominalisation and verb if the percentage of letters matched is > 50%, and discard any other
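
The four baseline steps might be sketched as follows. This is only an illustration under stated assumptions: the suffix list is hypothetical (the slides do not give one), "letters in sequence" is read as the longest run of consecutive shared letters, and the percentage matched is taken relative to the length of the noun.

```python
from difflib import SequenceMatcher

# Hypothetical suffix list; the slides do not give the one actually used.
NOMINAL_SUFFIXES = ("tion", "sion", "ment", "age", "ance", "ence", "er", "or", "ure")


def longest_shared_run(a, b):
    """Length of the longest run of consecutive letters shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def baseline_pairs(nouns, verbs, threshold=0.5):
    """Map each candidate nominalisation to its best-matching verb."""
    verbs = [v.lower() for v in verbs]
    pairs = {}
    for noun in (n.lower() for n in nouns):
        # Step 2: keep only nouns that end in a nominalisation suffix.
        if not noun.endswith(NOMINAL_SUFFIXES) or not verbs:
            continue
        # Step 3: pick the verb sharing the longest run of letters.
        best = max(verbs, key=lambda v: longest_shared_run(noun, v))
        # Step 4 (assumption): percentage matched is relative to noun length.
        if longest_shared_run(noun, best) / len(noun) > threshold:
            pairs[noun] = best
    return pairs


nouns = ["statement", "collection", "table", "motion"]
verbs = ["state", "collect", "move", "sit"]
pairs = baseline_pairs(nouns, verbs)
```

Under these assumptions an irregular pair such as decide/decision scores exactly 50% and is discarded, which hints at why a purely string-based baseline may miss irregular nominalisations.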

Page 20:

EXPERIMENT 2

• using a decision tree to build a model

• possible features include:

- letter similarity between verbs and nouns

- suffix frequency

- verb frequency

- verb semantics

- subject of noun

- subject of verb
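
A rough sketch of how the string-based and frequency-based features could be computed for one candidate pair before being passed to a decision-tree learner. The suffix list, the function names and the normalisation of letter similarity by noun length are all assumptions made for illustration. The semantic and subject features are omitted because they would need a parsed corpus, and the tree learner itself is not shown.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical suffix list, for illustration only.
SUFFIXES = ("tion", "sion", "ment", "ance", "ence", "age", "er", "or")


def suffix_of(noun):
    """Return the longest listed suffix the noun ends with, or ''."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if noun.endswith(suffix):
            return suffix
    return ""


def letter_similarity(noun, verb):
    """Longest shared run of letters, as a fraction of the noun length."""
    matcher = SequenceMatcher(None, noun, verb, autojunk=False)
    run = matcher.find_longest_match(0, len(noun), 0, len(verb)).size
    return run / len(noun)


def feature_vector(noun, verb, suffix_counts, verb_counts):
    """Features for one (noun, verb) candidate pair."""
    suffix = suffix_of(noun)
    return {
        "letter_similarity": letter_similarity(noun, verb),
        "suffix": suffix,
        "suffix_frequency": suffix_counts[suffix],  # noun types with this suffix
        "verb_frequency": verb_counts[verb],        # verb tokens in the corpus
    }


noun_tokens = ["statement", "collection", "motion", "table", "statement"]
verb_tokens = ["state", "collect", "state", "move"]
suffix_counts = Counter(suffix_of(n) for n in set(noun_tokens))
verb_counts = Counter(verb_tokens)
features = feature_vector("collection", "collect", suffix_counts, verb_counts)
```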

Page 21:

EVALUATION

• Experiments will be based on the British National Corpus (BNC).

• The obtained nominalisations will be evaluated against the CELEX morphological lexicon and manually annotated data.

• Precision, recall and F-score
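
As a minimal sketch, the three measures can be computed over sets of (nominalisation, verb) pairs. The slides only say "F-score"; the balanced F1 (harmonic mean of precision and recall) is assumed here, and the example pairs are made up for illustration rather than taken from CELEX.

```python
def precision_recall_f(predicted, gold):
    """Precision, recall and balanced F-score over sets of pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Illustrative data only.
predicted = {("statement", "state"), ("collection", "collect"), ("motion", "move")}
gold = {("statement", "state"), ("collection", "collect"),
        ("decision", "decide"), ("sale", "sell")}
p, r, f = precision_recall_f(predicted, gold)
```

Here two of the three predicted pairs are correct and two of the four gold pairs are found, so precision is 2/3, recall is 1/2 and the F-score is 4/7.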

Page 22:

BRITISH NATIONAL CORPUS

• Over 100 million words

• Corpus of modern English

• Both spoken (10%) and written (90%)

• Each word is automatically tagged by the CLAWS stochastic POS tagger

• 65 different tags

• encoded using SGML to represent POS tags and a variety of other structural properties of texts (e.g. headings, paragraphs, lists, etc.)

Page 23:

• <item>

• <s n=086>

• <w NN1-VVG>Shopping <w PRP>including <w NN1>collection <w PRF>of

• <w NN2>prescriptions

• </item>

• <item>

• <s n=087>

• <w VVG>Daysitting <w CJC>and <w VVG>nightsitting

• </item>
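
Tagged words in markup of this shape could be pulled out with a regular expression, roughly as below. This is a sketch that assumes the `<w TAG>word` layout shown in the sample, treats tags beginning with NN as common nouns and tags beginning with VV as lexical verbs, and resolves ambiguity tags such as NN1-VVG by taking the first tag. That last choice is an assumption, not something stated in the slides.

```python
import re

SAMPLE = """<item>
<s n=086>
<w NN1-VVG>Shopping <w PRP>including <w NN1>collection <w PRF>of
<w NN2>prescriptions
</item>
<item>
<s n=087>
<w VVG>Daysitting <w CJC>and <w VVG>nightsitting
</item>"""

# A <w TAG> element followed by the word text up to the next tag.
W_ELEMENT = re.compile(r"<w ([A-Z0-9-]+)>([^<]+)")


def tagged_words(sgml):
    """Return (tag, word) pairs in document order."""
    return [(tag, word.strip()) for tag, word in W_ELEMENT.findall(sgml)]


def nouns_and_verbs(sgml):
    """Split tagged words into lower-cased noun and verb lists."""
    nouns, verbs = [], []
    for tag, word in tagged_words(sgml):
        first = tag.split("-")[0]  # assumption: first tag of an ambiguity tag wins
        if first.startswith("NN"):
            nouns.append(word.lower())
        elif first.startswith("VV"):
            verbs.append(word.lower())
    return nouns, verbs


nouns, verbs = nouns_and_verbs(SAMPLE)
```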

Page 24:

CELEX

• English, Dutch and German

• Annotated by humans using lemmata from two dictionaries of English

• 52,446 lemmata and 160,594 wordforms

• orthographic, phonological, morphological, syntactic and frequency information

• morphological structure, e.g. ((celebrate),(ion))
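
A bracketed structure like the one above can be split into its morphemes with a short helper. The sketch handles only the simplified notation shown on the slide; the actual CELEX file layout is not given here and may differ, and both function names are hypothetical.

```python
import re


def split_morphemes(structure):
    """Return the innermost bracketed pieces, e.g. ['celebrate', 'ion']."""
    return re.findall(r"\(([^()]+)\)", structure)


def base_and_suffix(structure):
    """Return (base, suffix) for a two-part structure, otherwise None."""
    parts = split_morphemes(structure)
    if len(parts) == 2:
        return parts[0], parts[1]
    return None


pair = base_and_suffix("((celebrate),(ion))")
```

A gold-standard list of base and suffix pairs extracted this way could then be compared with the pairs a system proposes.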

Page 25:

MILESTONES

• 6/2002 Experiment 1—baseline

• 7/2002 Experiment 2

• 8/2002 Write-up

• 9/2002 Finalise report