1
Natural Language Processing (2a)
Zhao Hai 赵海
Department of Computer Science and Engineering
Shanghai Jiao Tong University
2010-2011
http://bcmi.sjtu.edu.cn/~zhaohai/lessons/nlp2011/index.html
2
Lexicons and Lexical Analysis
Lexicon: A Language Resource
A Lexicon for English Words: WordNet
Outline
3
Lexicon: A Language Resource (1) Features for Lexicons (1)
A lexicon is a machine-readable dictionary with the following features:
It provides in detail all the information that a dictionary contains;
Based on semantic descriptions, it describes the syntagmatic and paradigmatic relationships of each word, e.g.:
red + flower, green + leaf, big + eye (syntagmatic rel.);
red, green, and big; flower, leaf, and eye (paradigmatic rel.);
Lexicons and Lexical Analysis (1)
4
Lexicon: A Language Resource (2) Features for Lexicons (2)
word building: fixed collocations between words;
systematization: consistent description, covering morphological, syntactic, and semantic description;
formalization: expression in a meta-language, e.g. [±noun].
Lexicons and Lexical Analysis (2)
5
Lexicon: A Language Resource (3) Construction of Lexicons
The construction of a lexicon involves the following critical points:
a knowledge base rather than a database is built, and this work should be carried out by domain experts;
it can be built manually or semi-automatically;
it should be applicable to any machine platform and domain;
it should have a general framework, so that it can interact with other lexicons.
Lexicons and Lexical Analysis (3)
6
Lexicon: A Language Resource (4) Types of Lexicons
Lexicons can be divided into four categories:
general lexicon (or basic lexicon);
collocation lexicon;
bilingual lexicon;
domain lexicon.
Lexicons and Lexical Analysis (4)
7
Lexicon: A Language Resource (5) Information within Lexicons
The information in a basic lexicon may include:
lexical information (lexical entry, etc.);
morphological information (POS, tense, etc.);
syntactic information (sentence pattern of a verb, etc.);
semantic information (semantic attribute, predicate frame, etc.);
conceptual information (conceptual mark, explanation of word meaning, etc.).
Lexicons and Lexical Analysis (5)
8
Lexicon: A Language Resource (6) Sample (Morp., Syn. and Sem.)
“给” (give):
Morp = [hq2, hq7, vjg, vjl, …];
Syn = [bso, bss, ksd, …];
Sem = [kyd, 240202].
e.g.: hq2 – may be followed by a numeral (verb as a quantifier);
bso – cannot act as an object on its own;
kyd – donate or bestow;
240202 – taxonomic code.
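The sample entry can be pictured as a small record with one list of feature codes per description layer. The sketch below is only an illustration: the dictionary layout and the `describe` helper are assumptions, not the format of any actual lexicon, and only the codes and glosses shown on the slide are included (the elided codes stay elided).

```python
# Hypothetical in-memory form of the sample entry for "给" (give).
# The Morp and Syn lists are truncated on the slide ("…"), so only the
# codes that are actually shown there appear here.
ENTRY_GEI = {
    "word": "给",
    "gloss": "give",
    "Morp": ["hq2", "hq7", "vjg", "vjl"],
    "Syn": ["bso", "bss", "ksd"],
    "Sem": ["kyd", "240202"],
}

# Glosses given on the slide; the remaining codes are not explained there.
CODE_GLOSSES = {
    "hq2": "may be followed by a numeral (verb as a quantifier)",
    "bso": "cannot act as an object on its own",
    "kyd": "donate or bestow",
    "240202": "taxonomic code",
}


def describe(entry):
    """Return one readable line per feature code, grouped by layer."""
    lines = []
    for layer in ("Morp", "Syn", "Sem"):
        for code in entry[layer]:
            gloss = CODE_GLOSSES.get(code, "(not explained in the source)")
            lines.append(f"{layer}: {code} - {gloss}")
    return lines


if __name__ == "__main__":
    for line in describe(ENTRY_GEI):
        print(line)
```

Keeping the three layers as separate lists mirrors the slide's split into morphological, syntactic, and semantic description.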
Lexicons and Lexical Analysis (6)
9
Lexicon: A Language Resource (7) Sample (Frame)
“给” (give):
Syntactic Frame:
S = NP + VP + NP1 + NP2
NP = [AP] + [QP] + N
VP = [ADP] + V
NP1 = [QP] + N
NP2 = [QP] + N
Semantic Frame:
NP = AGT (Agent)
NP1 = DAT (Dative)
NP2 = OBJ (Patient)
Semantic Constraint:
NP = human | country | society | saying
NP1 = human | animal | collectivity | region
NP2 = thing | a slap in the face | way out | elicitation
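One way to read the semantic constraint is as a per-slot whitelist of semantic classes that a candidate sentence must satisfy. The following is a minimal sketch under that assumption; the `GIVE_FRAME` layout and the `violations` function are hypothetical names introduced for illustration, while the roles and classes are copied from the slide.

```python
# Semantic frame and constraints for "给" (give), copied from the slide.
# The dictionary layout itself is an assumption made for this sketch.
GIVE_FRAME = {
    "NP": {"role": "AGT", "allowed": {"human", "country", "society", "saying"}},
    "NP1": {"role": "DAT", "allowed": {"human", "animal", "collectivity", "region"}},
    "NP2": {
        "role": "OBJ",
        "allowed": {"thing", "a slap in the face", "way out", "elicitation"},
    },
}


def violations(frame, fillers):
    """Return the slots whose filler class is missing or not allowed."""
    return [
        slot
        for slot, spec in frame.items()
        if fillers.get(slot) not in spec["allowed"]
    ]


if __name__ == "__main__":
    ok = {"NP": "human", "NP1": "animal", "NP2": "thing"}
    bad = {"NP": "human", "NP1": "animal", "NP2": "human"}
    print(violations(GIVE_FRAME, ok))
    print(violations(GIVE_FRAME, bad))
```

An empty result means every slot is filled with a class the frame permits; a non-empty result names the slots that break the constraint.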
Lexicons and Lexical Analysis (7)
10
Lexicon: A Language Resource (8) Collocation Lexicon
Col(w) = <cat, mor, syn, msy, sen>
where:
cat – multi-POS;
mor – morphology;
syn – syntax and semantics;
msy – nesting collocation;
sen – sentence modifying rule set.
Lexicons and Lexical Analysis (8)
11
Lexicon: A Language Resource (9) Sample (Collocation Lexicon)
w: ‘大概’ (probably)
cat: ^ ‘大概’ + (‘的’; n) @setmark(a);
cat: ^ ‘大概’ + (m; p; v; a; b; z) @setmark(d);
cat: q + ^ ‘大概’ @setmark(n);
…
…
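The three `cat` rules shown above can be sketched as context tests that choose a POS tag for the word. The slides do not define the rule notation, so the reading used here is an assumption: `^` marks the position of the word, `+` means adjacency, `(x; y)` lists alternative neighbours (a literal word or a POS tag), and `@setmark(t)` assigns tag `t`. The function name and data layout are hypothetical, and the elided rules are not modelled.

```python
# Assumed reading of the slide's notation (not confirmed by the source):
#   "^ w + (x; y) @setmark(t)"  -> if w is followed by x or y, tag w as t
#   "q + ^ w @setmark(t)"       -> if w is preceded by q, tag w as t
WORD = "大概"

# (allowed right-hand neighbours, tag to assign), in the order on the slide.
FOLLOWED_BY = [
    ({"的", "n"}, "a"),
    ({"m", "p", "v", "a", "b", "z"}, "d"),
]

# (allowed left-hand neighbours, tag to assign).
PRECEDED_BY = [
    ({"q"}, "n"),
]


def set_mark(prev_item, next_item):
    """Return the tag for WORD given its neighbours, or None if no rule fires.

    Each neighbour is a literal word or a POS tag; None means "no neighbour".
    """
    for items, tag in FOLLOWED_BY:
        if next_item in items:
            return tag
    for items, tag in PRECEDED_BY:
        if prev_item in items:
            return tag
    return None


if __name__ == "__main__":
    print(set_mark(None, "的"))
    print(set_mark(None, "v"))
    print(set_mark("q", None))
```

Rules are tried in the order they appear on the slide; whether the original system resolves conflicts that way is not stated in the source.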
Lexicons and Lexical Analysis (9)
12
A Lexicon for English Words: WordNet (1) What is WordNet?
WordNet is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory.
English nouns, verbs, adjectives, and adverbs are organized into synonym sets (synsets), each representing one underlying lexical concept. Different relations link the synonym sets.
Lexicons and Lexical Analysis (10)
13
A Lexicon for English Words: WordNet (2) Information within WordNet
WordNet divides the lexicon into five categories: nouns, verbs, adjectives, adverbs, and function words (particles).
WordNet organizes lexical information in terms of word meanings rather than word forms. Its organization is therefore based on semantic relations.
Lexicons and Lexical Analysis (11)
14
A Lexicon for English Words: WordNet (3) Psycholinguistics
The 20th century saw the emergence of psycholinguistics, an interdisciplinary field of research concerned with the cognitive bases of linguistic competence.
Both linguists and psycholinguists have explored in considerable depth the factors determining the contemporary (belonging to the same time) structure of linguistic knowledge in general, and lexical knowledge in particular.
Lexicons and Lexical Analysis (12)
15
A Lexicon for English Words: WordNet (4) Psycholexicology
Miller and Johnson-Laird (1976) have proposed that research concerned with the lexical component of language should be called psycholexicology. As linguistic theories evolved in recent decades, linguists became increasingly explicit about the information a lexicon must contain in order for the phonological, syntactic, and lexical components to work together in the everyday production and comprehension of linguistic messages, and those proposals have been incorporated into the work of psycholinguists.
Lexicons and Lexical Analysis (13)
16
A Lexicon for English Words: WordNet (5)
Lexicography
Beginning with word association studies at the turn of the
century and continuing down to the sophisticated experimental
tasks of the past twenty years, psycholinguists have discovered
many synchronic properties of the mental lexicon that can be
exploited in lexicography.
Lexicons and Lexical Analysis (14)
17
A Lexicon for English Words: WordNet (6) Naissance of WordNet
In 1985 a group of psychologists and linguists at Princeton
University undertook to develop a lexical database along lines
suggested by these investigations (Miller, 1985).
The initial idea was to provide an aid to use in searching
dictionaries conceptually, rather than merely alphabetically.
As the work proceeded, however, it demanded a more
ambitious formulation of its own principles and goals.
Lexicons and Lexical Analysis (15)
18
Lexicons and Lexical Analysis (16)
A Lexicon for English Words: WordNet (7) Size of WordNet
POS         Unique Strings   Synsets   Total Word-Sense Pairs
Noun        117798           82115     146312
Verb        11529            13767     25047
Adjective   21479            18156     30002
Adverb      4481             3621      5580
Totals      155287           117659    206941
Source: http://wordnet.princeton.edu/
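The totals in the size table can be checked by summing each column, and the ratio of word-sense pairs to unique strings gives a rough idea of how many senses a string has on average in each category. This is a small arithmetic sketch; the function names are hypothetical and the numbers are copied from the table.

```python
# Counts copied from the table: (unique strings, synsets, word-sense pairs).
COUNTS = {
    "Noun": (117798, 82115, 146312),
    "Verb": (11529, 13767, 25047),
    "Adjective": (21479, 18156, 30002),
    "Adverb": (4481, 3621, 5580),
}

# The "Totals" row as printed in the table.
TOTALS = (155287, 117659, 206941)


def column_sums(counts):
    """Sum each of the three columns over all parts of speech."""
    return tuple(sum(row[i] for row in counts.values()) for i in range(3))


def senses_per_string(counts):
    """Average number of word-sense pairs per unique string, per POS."""
    return {
        pos: pairs / strings
        for pos, (strings, _synsets, pairs) in counts.items()
    }


if __name__ == "__main__":
    print(column_sums(COUNTS) == TOTALS)
    for pos, ratio in senses_per_string(COUNTS).items():
        print(f"{pos}: {ratio:.2f}")
```

On these figures, verbs have the highest ratio of word-sense pairs per string among the four categories.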
19
A Lexicon for English Words: WordNet (8)
Some Problems
What kinds of utterances enter into these lexical associations?
What is the nature and organization of the lexicalized concepts that words can express?
What syntactic roles do different words play?
Lexicons and Lexical Analysis (17)
20
Lexicons and Lexical Analysis (18)
A Lexicon for English Words: WordNet (9) Lexical Matrix (1)
To reduce ambiguity, “word form” is used here to refer to the physical utterance, and “word meaning” to refer to the lexicalized concept that a form can be used to express.
The starting point for lexical semantics can then be said to be the mapping between forms and meanings.
21
Lexicons and Lexical Analysis (19)
A Lexicon for English Words: WordNet (10) Lexical Matrix (2)
Word        Word Forms
Meanings    F1      F2      F3      ...     Fn
M1          E1,1    E1,2
M2                  E2,2
M3                          E3,3
...                                 ...
Mm                                          Em,n
If there are two entries in the same column, the word form is polysemous; if there are two entries in the same row, the two word forms are synonyms (relative to a context).
Therefore, F1 and F2 are synonyms; F2 is polysemous.
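The row and column reading of the lexical matrix can be sketched in a few lines: group the filled entries by meaning (rows) to find synonyms and by form (columns) to find polysemous forms. The entry list below contains only the entries shown explicitly in the matrix, and the function name is a hypothetical one chosen for this sketch.

```python
from collections import defaultdict

# Filled cells E(i,j) of the matrix as (meaning, form) pairs.
# Only the explicitly shown entries are listed; the elided part is left out.
ENTRIES = [("M1", "F1"), ("M1", "F2"), ("M2", "F2"), ("M3", "F3")]


def analyse(entries):
    """Return (synonym sets per meaning, set of polysemous forms)."""
    forms_of = defaultdict(set)     # row view: meaning -> forms expressing it
    meanings_of = defaultdict(set)  # column view: form -> meanings it expresses
    for meaning, form in entries:
        forms_of[meaning].add(form)
        meanings_of[form].add(meaning)
    # Two or more entries in a row: those forms are synonyms for that meaning.
    synonym_sets = {m: fs for m, fs in forms_of.items() if len(fs) > 1}
    # Two or more entries in a column: that form is polysemous.
    polysemous = {f for f, ms in meanings_of.items() if len(ms) > 1}
    return synonym_sets, polysemous


if __name__ == "__main__":
    synonyms, polysemous = analyse(ENTRIES)
    print(synonyms)
    print(polysemous)
```

With these entries the sketch reproduces the slide's conclusion: F1 and F2 are synonyms (both express M1), and F2 is polysemous (it expresses M1 and M2).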
22
Lexicons and Lexical Analysis (20)
A Lexicon for English Words: WordNet (11) Polysemy and Synonymy
Mappings between forms and meanings are many-to-many: some forms have several different meanings, and some meanings can be expressed by several different forms.
That is to say, a listener or reader who recognizes a form must cope with its polysemy; a speaker or writer who hopes to express a meaning must decide between synonyms.
23
Lexicons and Lexical Analysis (21)
A Lexicon for English Words: WordNet (12)
Some of the Relations
Synonymy
Antonymy
Hyponymy / Hypernymy (Subordination / Superordination)
Meronymy / Holonymy (Part-Whole)
24
Lexicons and Lexical Analysis (22)
A Lexicon for English Words: WordNet (13) Synonym (1)
There are several definitions of synonymy:
Two expressions are synonymous if the substitution of one for the
other never changes the truth value of a sentence in which the
substitution is made.
Two expressions are synonymous in a linguistic context C if the
substitution of one for the other in C does not alter the truth value.
…
25
Lexicons and Lexical Analysis (23)
A Lexicon for English Words: WordNet (14) Synonym (2)
Note that the definition of synonymy in terms of substitutability
makes it necessary to partition WordNet into nouns, verbs,
adjectives, and adverbs.
That is to say, if concepts are represented by synsets, and if
synonyms must be interchangeable, then words in different
syntactic categories cannot be synonyms (cannot form synsets)
because they are not interchangeable.
26
Lexicons and Lexical Analysis (24)
A Lexicon for English Words: WordNet (15) Antonym (1)
The antonym of a word x is sometimes not-x, but not always. For
example, rich and poor are antonyms, but to say that someone is
not rich does not imply that they must be poor; many people
consider themselves neither rich nor poor.
Antonymy is a lexical relation between word forms, not a
semantic relation between word meanings.
27
Lexicons and Lexical Analysis (25)
A Lexicon for English Words: WordNet (16) Antonym (2)
For example, the meanings {rise, ascend} and {fall, descend} may be conceptual opposites, but they are not antonyms; [rise / fall] are antonyms and so are [ascend / descend], but most people hesitate and look thoughtful when asked if rise and descend, or ascend and fall, are antonyms.
Note that synonyms are enclosed in curly brackets, ‘{’ and ‘}’, and other lexical relations are enclosed in square brackets, ‘[’ and ‘]’.
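The difference between a relation on word forms and a relation on word meanings can be made concrete with the example above: synonymy is looked up through synsets, while antonymy is stored as explicit pairs of forms and is not inherited by the other members of a synset. This is a toy sketch using only the four words from the slide; the function names are hypothetical.

```python
# Synsets (curly brackets on the slide) and antonym pairs between word forms
# (square brackets on the slide), both taken from the example.
SYNSETS = [frozenset({"rise", "ascend"}), frozenset({"fall", "descend"})]
ANTONYM_PAIRS = {frozenset({"rise", "fall"}), frozenset({"ascend", "descend"})}


def are_synonyms(a, b):
    """Two distinct forms are synonyms if some synset contains both."""
    return a != b and any(a in s and b in s for s in SYNSETS)


def are_antonyms(a, b):
    """Antonymy holds between specific word forms only.

    It is looked up directly in the pair list and is deliberately not
    propagated to the other members of each form's synset.
    """
    return frozenset({a, b}) in ANTONYM_PAIRS


if __name__ == "__main__":
    print(are_synonyms("rise", "ascend"))
    print(are_antonyms("rise", "fall"))
    print(are_antonyms("rise", "descend"))
```

Propagating antonymy through synsets would wrongly make rise / descend antonyms, which is exactly the case the slide warns about.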
28
Lexicons and Lexical Analysis (26)
A Lexicon for English Words: WordNet (17) Hyponymy / Hypernymy
Hyponymy is a semantic relation between word meanings. It is also called subordination / superordination, subset / superset, or the ISA relation.
Hyponymy is transitive and asymmetrical.
x is said to be a hyponym of y if native speakers of English accept the sentence constructed as “An x is a (kind of) y.”
Ex.: tree is a hyponym of plant;
plant is a hypernym of tree.
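Transitivity and asymmetry of the ISA relation can be illustrated by following direct hypernym links upward. In the sketch below only the link tree → plant comes from the slide; the links oak → tree and plant → organism, and both function names, are assumptions added purely for illustration.

```python
# Toy ISA links: hyponym -> direct hypernym.
# "tree" -> "plant" is the slide's example; the other two links are
# illustrative assumptions, not taken from the source.
HYPERNYM_OF = {"oak": "tree", "tree": "plant", "plant": "organism"}


def hypernyms(word):
    """Return all hypernyms of word, nearest first, by following ISA links."""
    chain = []
    seen = {word}
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        if word in seen:  # guard against accidental cycles in the data
            break
        seen.add(word)
        chain.append(word)
    return chain


def is_hyponym_of(x, y):
    """True if y is reachable from x through one or more ISA links."""
    return y in hypernyms(x)


if __name__ == "__main__":
    print(hypernyms("oak"))
    print(is_hyponym_of("oak", "plant"))   # transitivity
    print(is_hyponym_of("plant", "oak"))   # asymmetry
```

Storing only the direct links and computing the rest on demand keeps the data small while still answering indirect "is a kind of" questions.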
29
Lexicons and Lexical Analysis (27)
A Lexicon for English Words: WordNet (18)
Meronymy / Holonymy
Meronymy is a semantic relation that is also called the part-whole or HASA relation.
x is said to be a meronym of y if native speakers of English accept the sentence constructed as “An x is a part of y.”
Ex.: a frame is a part of a car, or a car has a frame.
30
Lexicons and Lexical Analysis (28)
A Lexicon for English Words: WordNet (19) User Interface
31
Lexicons and Lexical Analysis (29)
A Lexicon for English Words: WordNet (20) References
G. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller. 1990. Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, Vol. 3, pages 235-244.
G. Miller. 1990. Nouns in WordNet: A Lexical Inheritance System. International Journal of Lexicography, Vol. 3, pages 245-264.
C. Fellbaum. 1990. English Verbs as a Semantic Net. International Journal of Lexicography, Vol. 3, pages 278-301.
32
Lexicons and Lexical Analysis (30) Assignments (2)
1. The text described several different example tests for distinguishing word
classes. For example, nouns can occur in sentences of the form I saw the
X, whereas adjectives can occur in sentences of the form It’s so X. Give
some additional tests to distinguish these forms and to distinguish
between count nouns and mass nouns. State whether each of the
following words can be used as an adjective, count noun, or mass noun.
If the word is ambiguous, give all its possible uses.
milk, house, liquid, green, group, concept, airborne