037_2002_v1_josef ruppenhofer, collin f. baker & charles j. fillmore_collocational information in
8/18/2019 037_2002_V1_Josef Ruppenhofer, Collin F. Baker & Charles J. Fillmore_Collocational Information In
REPORTS ON LEXICOGRAPHICAL AND LEXICOLOGICAL PROJECTS
Collocational Information in the FrameNet Database
Josef Ruppenhofer, Collin F. Baker, and Charles J. Fillmore
International Computer Science Institute
1947 Center St.
Berkeley, CA 94704-1198, USA
{josef, collinb, fillmore}@icsi.berkeley.edu
website: http://framenet.icsi.berkeley.edu/~framenet
Abstract
The FrameNet lexical database yields information about collocations and multiword expressions in various ways. In some cases phrasal units have been entered from the start as lexical entries (write down). In other cases headword+preposition pairs can be recognized as special collocations where the preposition in question is a necessary and lexically specified marker of an argument of the headword (fond of, hostile to). Nominal compounds are annotated with respect to noun or (pertinative) adjective modifiers, some of which are analyzable but also entrenched (wheel chair, fiscal year). Nouns that name aggregates, portions, types, etc., sometimes hold lexically specified relations to their dependents (flock of geese). And event nouns frequently select the support verbs which permit them to enter into predications (file an objection, enter a plea). A subproject aims at extracting, as structured clusters of lexical items, the minimal semantically central kernel dependency graphs from the set of annotations. Such research will yield not only commonplace groupings (eat: dog, bone) but also hitherto unnoticed collocations within such graphs (answer: you, door) where certain dependency links within them are idiomatic or otherwise lexically special, here answer door. Collocational information can also be retrieved by various types of queries within our MySQL search tool.
Introduction
The FrameNet research project [Baker et al. 1998; Fillmore & Baker 2001] is building an online lexical resource that aims to provide, for a significant portion of the vocabulary of contemporary English, a body of semantically and syntactically annotated sentences from which reliable information can be reported on the valences or combinatorial possibilities of each item included. The project uses a descriptive model based on semantic frames [Fillmore 1977, 1982, 1985; Fillmore & Atkins 1998] and documents its observations by means of carefully annotated attestations taken from corpora, each sentence annotated in respect to a single target word, with the phrases that are in grammatical construction with it labelled according to their grammatical relation to the target, the semantic role they serve within the frame to which the target word belongs, and their syntactic phrase type.
The FN database can serve both human and machine users, and can function both as a dictionary and as a thesaurus. As a dictionary, each lexical unit (lemma in a given sense) is provided with (1) the name of the frame it belongs to and access to a description of the frame, (2) a definition (either original or from the Concise Oxford Dictionary, courtesy of Oxford University Press), (3) a valence description which summarizes the attested combinatorial possibilities in respect to both semantic roles and the syntactic form and function of the phrases that instantiate those roles, and (4) access to annotated examples illustrating each syntactic pattern found in the corpus and the kinds of semantic information they contain. The semantic role annotation is done manually by persons trained in frame semantic theory; the syntactic information is added automatically, and the full valence descriptions are produced automatically.
EURALEX 2002 PROCEEDINGS
It is possible to consider the database as a thesaurus by noting that lemmas are linked to the semantic frames in which they participate, and frames, in turn, are linked both to the full set of words which instantiate them and to related frames. Frame-to-frame relations include (1) composition, by which a complex frame is decomposable into subframes, often in a structured procedural sequence (thus, the Arraignment frame is treated as a subframe of the Criminal Justice frame), and (2) inheritance, by which a single frame can be seen as an elaboration of one or more other frames, with bindings between the inherited semantic roles (as when criticize (in the Judgement_Communication frame) can be seen as inheriting from both the Judgement and Communication frames, requiring a binding between the Speaker of the Communication frame and the Judge of the Judgement).
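The thesaurus-style links and the two frame-to-frame relations described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the frame, role and lemma names come from the text, while the dictionary layout and the function names (frame_mates, ancestors) are hypothetical and do not reflect the actual FrameNet schema.

```python
# Toy data. Frame, role and lemma names are taken from the text; the
# dictionary layout and function names are assumptions for this sketch.
LEMMA_FRAMES = {
    "criticize": ["Judgement_Communication"],
    "grill": ["Questioning", "Apply_heat"],
    "write": ["Writing"],
    "write down": ["Writing"],
}

# (1) composition: complex frame -> subframes
SUBFRAMES = {"Criminal Justice": ["Arraignment"]}

# (2) inheritance: child frame -> parent frames
INHERITS_FROM = {"Judgement_Communication": ["Judgement", "Communication"]}

# Bindings needed when one child frame inherits roles from two parents
ROLE_BINDINGS = {
    "Judgement_Communication": [("Communication.Speaker", "Judgement.Judge")],
}


def frame_mates(lemma):
    """Thesaurus-style lookup: other lemmas that share a frame with `lemma`."""
    mates = {}
    for frame in LEMMA_FRAMES.get(lemma, []):
        mates[frame] = sorted(
            other
            for other, frames in LEMMA_FRAMES.items()
            if frame in frames and other != lemma
        )
    return mates


def ancestors(frame):
    """All frames reachable from `frame` through inheritance links."""
    seen = []
    queue = list(INHERITS_FROM.get(frame, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(INHERITS_FROM.get(parent, []))
    return seen
```

Here frame_mates("write") groups write with write down under the Writing frame, and ancestors("Judgement_Communication") returns both parent frames, whose Speaker and Judge roles are tied together by the entry in ROLE_BINDINGS.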
The FrameNet Database
The FN data are stored in a MySQL database [Fillmore et al. 2001] which is basically divided into two halves, one representing the frames, the frame elements, the lemmas connected with them and the relations among them (shown in Fig. 1), and the other (not shown) representing the corpus sentences and the labels attached to them, marking phrases as instantiations of given frame elements, phrase types and grammatical functions, etc. This division corresponds to the two main software tools used in FN work, the Frame Editor and the Annotation tool, both of which will be demonstrated in the FrameNet demonstration.
[Figure 1. Database diagram showing, among others, the Lemma, LexUnit and Frame tables]
[...] including the name and definition, one for frame elements, and a relation between them (the line marked "A" in Fig. 1) such that each FE is associated with exactly one frame. Lexical units (LUs) are represented as a table linking lemmas and frames, i.e. an LU is a Saussurean sign linking form and meaning; there is also a field for a description of the sense. Lemmas, in turn, are composed of one or more lexemes, and lexemes have one or more word forms. For example, the lexeme grill (with word forms grilled, grilling, etc.) is the only lexeme of a lemma which is associated with two quite different frames, Questioning and Apply_heat.
Of particular interest to the present discussion is the relation between lemmas and lexemes, which is many to many (the line marked "B" in Fig. 1), meaning that a lemma can be comprised of more than one lexeme (multiword expressions, MWEs), and a lexeme can be associated with more than one lemma. For example, the lemmas write up, write down and write in are all associated with the Writing frame on the one hand, and all contain the lexeme write, sharing its word forms writing, wrote, etc. on the other.
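A minimal relational sketch may make the many-to-many link between lemmas and lexemes more concrete. It uses an in-memory SQLite database in place of MySQL, and every table and column name below (lexeme, word_form, lemma, lemma_lexeme, lex_unit) is a hypothetical stand-in rather than the actual FrameNet schema; only the example words and frames come from the text.

```python
import sqlite3

# In-memory stand-in for the relational layout; all table and column
# names are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lexeme (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE word_form (lexeme_id INTEGER, form TEXT);
CREATE TABLE lemma (id INTEGER PRIMARY KEY, name TEXT);
-- many-to-many link between lemmas and lexemes
CREATE TABLE lemma_lexeme (lemma_id INTEGER, lexeme_id INTEGER);
-- a lexical unit pairs a lemma with a frame
CREATE TABLE lex_unit (lemma_id INTEGER, frame TEXT);
""")
conn.executemany("INSERT INTO lexeme VALUES (?, ?)",
                 [(1, "write"), (2, "up"), (3, "down"), (4, "in"), (5, "grill")])
conn.executemany("INSERT INTO word_form VALUES (?, ?)",
                 [(1, "writing"), (1, "wrote"), (5, "grilled"), (5, "grilling")])
conn.executemany("INSERT INTO lemma VALUES (?, ?)",
                 [(1, "write up"), (2, "write down"), (3, "write in"), (4, "grill")])
conn.executemany("INSERT INTO lemma_lexeme VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 4), (4, 5)])
conn.executemany("INSERT INTO lex_unit VALUES (?, ?)",
                 [(1, "Writing"), (2, "Writing"), (3, "Writing"),
                  (4, "Questioning"), (4, "Apply_heat")])


def lemmas_with_lexeme(lexeme):
    """Lemmas that contain the given lexeme."""
    rows = conn.execute(
        "SELECT lemma.name FROM lemma "
        "JOIN lemma_lexeme ON lemma_lexeme.lemma_id = lemma.id "
        "JOIN lexeme ON lexeme.id = lemma_lexeme.lexeme_id "
        "WHERE lexeme.name = ? ORDER BY lemma.name", (lexeme,))
    return [row[0] for row in rows]


def multiword_lemmas():
    """Lemmas made up of more than one lexeme, i.e. MWEs."""
    rows = conn.execute(
        "SELECT lemma.name FROM lemma "
        "JOIN lemma_lexeme ON lemma_lexeme.lemma_id = lemma.id "
        "GROUP BY lemma.id, lemma.name HAVING COUNT(*) > 1 "
        "ORDER BY lemma.name")
    return [row[0] for row in rows]


def frames_of(lemma):
    """Frames in which the lemma has a lexical unit."""
    rows = conn.execute(
        "SELECT frame FROM lex_unit "
        "JOIN lemma ON lemma.id = lex_unit.lemma_id "
        "WHERE lemma.name = ? ORDER BY frame", (lemma,))
    return [row[0] for row in rows]
```

With this layout, one query over the link table retrieves every lemma that shares the lexeme write, and a simple count over the same table separates multiword lemmas from single-lexeme ones such as grill.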
There are three procedures by which data are made part of the database: (1) the annotation process, through which, for a specifically targeted lexical unit, exemplary sentence constituents are tagged according to their semantic and syntactic relation to the target; (2) descriptions of frames prepared by the frame analyst, and (3) relations among frames prepared by the lexicon analyst.
Figure 2. Screenshot of the Annotation Tool
The Annotator, shown in Figure 2, consists of three frames¹. The top left one contains a listing of the names of individual subcorpora within which annotation is carried out, derived from searches for the target word (write) in certain predefined syntactic contexts. The top right frame lists the sentences of the currently selected subcorpus. The lower frame is the place where individual sentences are annotated by applying (using the mouse or the keyboard) labels to the annotation layers of the constituents instantiating particular semantic roles. The most important annotation layers are the top three shown here, which represent the Frame Element (i.e. frame-specific semantic role), the Grammatical Function, and the Phrase Type of the constituent being labeled.
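The three annotation layers can be thought of as parallel labels on the same span of tokens. The sketch below is an illustration only: the sentence and its Frame Element labels follow the second Writing example shown in the Frame Editor screenshot, while the Grammatical Function and Phrase Type values (Ext, Obj, Comp, NP, PP) and all class and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Label:
    start: int  # index of the first token covered
    end: int    # index one past the last token covered
    fe: str     # Frame Element layer
    gf: str     # Grammatical Function layer (values assumed)
    pt: str     # Phrase Type layer (values assumed)


TOKENS = "The brothers said not two words to each other".split()
TARGET = 2  # index of the target word "said"
LABELS = [
    Label(0, 2, "AUTHOR", "Ext", "NP"),
    Label(3, 6, "TEXT", "Obj", "NP"),
    Label(6, 9, "ADDR", "Comp", "PP"),
]


def bracket(tokens, target, labels):
    """Render the FE layer in bracket notation, with the target in braces."""
    by_start = {label.start: label for label in labels}
    out = []
    i = 0
    while i < len(tokens):
        if i in by_start:
            label = by_start[i]
            span = " ".join(tokens[label.start:label.end])
            out.append("[" + span + " " + label.fe + "]")
            i = label.end
        elif i == target:
            out.append("{" + tokens[i] + "}")
            i += 1
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

Calling bracket(TOKENS, TARGET, LABELS) reproduces the bracketed form of the example; the other two layers stay available on each Label for valence summaries.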
[Screenshot of the Frame Editor displaying the Writing frame. Frame Definition as shown: An Author creates a Text (physical, like a letter, or spoken, like a speech) which contains meaningful linguistic tokens, and may have a particular Addressee in mind. The topic is not an FE, though it may be found inside the constituent of the Text. Examples: [... AUTHOR] {penned} [a letter concerning racism TEXT] [to Congress ADDR]. [The brothers AUTHOR] {said} [not two words TEXT] [to each other ADDR]. Lexical units listed include say, type, type in, type out, type up, utter, write, write down, write in, write out, write up.]
[Screenshot of the Elaboration Editor with Child Frame and Parent Frame panes. Mappings listed for the child frame include Author = Intentionally_create.Agent, Components = Intentionally_create.Components, Created_entity = Intentionally_create.Created_entity, Instrument = Intentionally_create.Instrument, Location = Intentionally_create.Place, Manner = Intentionally_create.Manner.]
Figure 4. Screenshot of the Frame Inheritance Editing Tool
The Relationships editor allows the lexicon analyst to record inheritance relationships between the current frame and other frames. As Figure 4 shows, one can consider the semantic scenario of writing as a sub-type of intentionally creating something. The (not necessarily complete) equivalences between the semantic roles of the two frames are indicated as so-called mappings: the author of the Writing frame corresponds to the agent of the intentional creation frame; etc.
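One way to picture such mappings is as a partial dictionary from child-frame roles to parent-frame roles. The sketch below is hedged accordingly: the Author to Agent pair is stated in the text, the Location and Manner pairs are read off the (poorly legible) screenshot, and the frame name Intentionally_create, the FE_MAPPINGS layout and the view_as_parent function are assumptions for illustration.

```python
# Child FE -> parent FE, for Writing treated as a sub-type of intentional
# creation. The mapping is deliberately partial, as the text allows.
FE_MAPPINGS = {
    ("Writing", "Intentionally_create"): {
        "Author": "Agent",
        "Location": "Place",
        "Manner": "Manner",
    },
}


def view_as_parent(child, parent, instance):
    """Relabel an annotated child-frame instance with parent-frame roles.

    Returns the relabelled roles plus the child roles that have no
    recorded equivalent in the parent frame.
    """
    mapping = FE_MAPPINGS[(child, parent)]
    mapped = {mapping[fe]: text for fe, text in instance.items() if fe in mapping}
    unmapped = sorted(fe for fe in instance if fe not in mapping)
    return mapped, unmapped


example = {
    "Author": "The brothers",
    "Text": "not two words",
    "Addressee": "to each other",
}
mapped_roles, unmapped_roles = view_as_parent("Writing", "Intentionally_create", example)
```

Returning the unmapped roles separately keeps the "not necessarily complete" character of the equivalences visible instead of silently dropping child-specific roles.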
Collocations and MWEs in a Sample Text
In order to get some idea of the MWEs present in typical texts, let us consider the following example of journalistic prose, taken from CNN.com/LAW, dated 14 February 2001.
Washington (CNN) -- Alleged White House gunman Robert Pickett was arraigned Wednesday at a federal court in Washington and ordered held without bond. A federal magistrate informed Pickett of the charges against him -- assaulting a federal officer with a deadly weapon, which carries a maximum of ten years in prison. The magistrate set a preliminary hearing for next Tuesday and ordered Pickett held without bond. Pickett, who was shot in the knee by the Secret Service after allegedly firing two shots outside the White House, used crutches to walk into the court. He did not enter a plea.
A general way of applying FN valence information to the analysis of a sentence is to (1) choose a word (starting from the highest semantically-relevant predicate), (2) determine the frames that this word evokes in this context, (3) notice the semantic roles of the participants in each such frame, (4) match the semantic needs associated with each such frame (hence with each sense of the word) with phrases found in the sentence, (5) select those which permit the most coherent fit, and (6) register the semantic structures associated with the dependent constituents as provided by the selected frame.
But the analysis cannot simply proceed on the basis of frame information built on the words of the text taken one at a time. Many word sequences in our text must be identified as fixed phrases or tight collocations, the most obvious ones being the proper names White House, Robert Pickett, and Secret Service, others including held without bond, assaulting a federal officer with a deadly weapon, preliminary hearing, firing shots, and enter a plea. All of these phrases are parsable and semantically transparent, but they are also entrenched: held without bond is one of the standard phrases for reporting a decision in an arraignment hearing, assaulting a federal officer with a deadly weapon is a named offense in American law, preliminary hearing is a named step in the criminal justice process, and fire and enter are best treated as support verbs for the event nouns shot and plea respectively.
Many of these words evoke subframes of a frame involving steps in U.S. criminal process. Other phases of the process deal with bail, findings of guilt or innocence, sentencing, etc., and various abortions or alternative routings through the process such as skipping bail, changing one's plea, having a judge dismiss (or the prosecutor withdraw) the charges against the defendant, and so on. The lexical units in a complex frame will simultaneously evoke both the phase of the process which maps onto the grammatical structure of sentences containing a given lexical unit, and the larger event type of which that phase is a part.
Information on Multiword Expressions in the FrameNet Database
Information about Multiword Expressions is represented in, or derivable from, the FrameNet database in a variety of ways.
1. Multiword Lexical Units
Certain lemmas in the FN database were entered as multiword units from the start. Examples of MWEs entered as such include lexicalized noun-noun compounds (wheel chair, etc.); verb-particle lemmas (trip up, etc.), and various kinds of idioms (cook someone's goose, etc.). In some cases lemmas originally treated on their own were later recognized as best treated as part of a MWE and the analysis was changed accordingly.
2. Collocations Involving Subcategorization Details
In extracting sentences for annotation, lemmas are targeted in their subcategorizational contexts, and some of these involve the marking of a constituent by a particular preposition. Thus, we searched for the verb object and the noun objection in contexts where it preceded a to-phrase (nobody objected to your decision; the main objection to that proposal). These LUs can occur alone, where the entity objected to is missing but pragmatically salient (I object! How could there be any objection?). But whenever this FE is made explicit in the sentence, the syntactic constituent that expresses it has to be a prepositional phrase headed by to. (In this regard, then, verbs like object differ from the particle verbs, whose particles can never be omitted.)
To prepare for the discovery of such collocations (as object to, prevent from, interested in, fond of, etc.), the lexicographers identify the core FEs in the words' valence. That is, for each LU, we identify those FEs which are most centrally connected with the word's meaning, as distinguished from those that are more peripheral. Thus, in a sentence like She objected to the bill in an angry editorial in the Times, the phrase to the bill is more central than in an angry editorial in the Times. We can then regard LUs with lexically specified prepositions in their core valence as instances of multiword units.
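The criterion just stated can be sketched as a simple filter over annotation records: an LU qualifies when every explicit realization of one of its core FEs is a prepositional phrase headed by the same preposition. This is only an illustration; the record layout, the LU identifiers (object.v, fond.a, write.v), the FE names and the toy data are assumptions, not FrameNet output.

```python
from collections import defaultdict

# Each record: (lexical unit, frame element, is_core, phrase type, marker).
# All records are made-up illustrations of the kind of data involved.
ANNOTATIONS = [
    ("object.v", "Content", True, "PP", "to"),
    ("object.v", "Content", True, "PP", "to"),
    ("object.v", "Medium", False, "PP", "in"),
    ("fond.a", "Content", True, "PP", "of"),
    ("write.v", "Text", True, "NP", None),
    ("write.v", "Addressee", True, "PP", "to"),
    ("write.v", "Addressee", True, "NP", None),
]


def lexically_specified_prepositions(annotations):
    """Map (LU, core FE) to its preposition when the marking never varies."""
    realizations = defaultdict(set)
    for lu, fe, is_core, phrase_type, marker in annotations:
        if is_core:  # peripheral FEs are ignored
            realizations[(lu, fe)].add((phrase_type, marker))
    result = {}
    for key, seen in realizations.items():
        if len(seen) == 1:
            phrase_type, marker = next(iter(seen))
            if phrase_type == "PP":
                result[key] = marker
    return result
```

On the toy data this singles out object to and fond of, while write is excluded because its Addressee is attested both with and without a preposition.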
3. Noun Compounds with Core FE as Modifier
Among the possibilities for the amplification of the frames associated with nouns is modification by another noun (navy captain) or by a pertinative² (also called relational) adjective (naval commander). Extracting noun compounds from the FrameNet database, then, will seek out either noun sequences identified as multiword targets, or nouns with modifiers that are labeled with core FEs.
4. Collocations across Transparent Nouns
One of the families of noun types we have tagged, which we call transparent nouns, includes nouns designating types, aggregates, parts, portions, classifiers, unitizers, etc., especially as they occur as the first noun in an N of N construction. By "transparent" we mean that in this construction the first noun is transparent with regard to collocational or selection relations between the second N and the external context of the construction [Fontenelle 20xx] or transparent to number agreement [Svensson 1998].³ Examples with the relevant collocations underlined follow:
(1) In the 1920s, after the British literary establishment had neglected him for forty years, Machen attracted a coterie of admirers in the United States.
(2) Certain strains of Escherichia coli (E. coli), for example, are responsible for causing "Traveller's Diarrhoea".
(3) He has pinned a little square of material onto both his knees so that when he drives, the fabric of his best trousers will not rub against the steering wheel.
5. Collocations with Transparent Nouns
The fact that we have transparent nouns labeled as such makes it possible to produce tables of these N-N pairs, and when we do we will find some that are lexically significant: flock of geese, pride of lions, swarm of bees, bout of the flu, case of hepatitis, etc. Of the Type nouns we will find many that are completely general (type, kind, sort), others that are more special (variety, brand, strain, breed). Thus, the decision to give special status to transparent nouns contributes to the detection of MWEs in two ways: first, there can be lexically relevant pairings between the two nouns in the construction, but secondly, means of collecting linguistically relevant collocates can be devised in which it can be shown that the second N, not the first, figures in the collocational relation. (Thus, using the examples in section 4, we can detect the collocations attract admirers, pin material onto his knees, and Escherichia coli (E. coli) causes Traveller's Diarrhoea.) This is related to the kernel dependency graph extraction exercise discussed below.
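The idea of letting the second N, not the first, figure in the collocation can be sketched as a small head-finding routine. This is a rough heuristic under explicit assumptions: the noun phrase is already tokenized, its head is taken to be the last token before any of-phrase, and the set of transparent nouns is a hand-picked sample from the examples in this paper; the function names are hypothetical.

```python
# Sample of transparent nouns drawn from the paper's examples; a real
# list would come from the annotation, not from this hard-coded set.
TRANSPARENT_NOUNS = {
    "coterie", "strains", "strain", "square", "flock", "pride", "swarm",
    "bout", "case", "type", "kind", "sort", "variety", "brand", "breed",
    "dozens", "piece", "part",
}


def semantic_head(tokens):
    """Head noun of a tokenized noun phrase, looking through 'N of N'.

    If the noun right before 'of' is transparent, the head is sought in
    the material after 'of'; otherwise that noun itself is the head.
    """
    if "of" in tokens:
        i = tokens.index("of")
        before = tokens[i - 1].lower() if i > 0 else None
        if before in TRANSPARENT_NOUNS and i + 1 < len(tokens):
            return semantic_head(tokens[i + 1:])
        return tokens[i - 1] if i > 0 else tokens[-1]
    return tokens[-1]


def collocation(governor, noun_phrase):
    """Pair a governing word with the semantic head of its dependent."""
    return (governor, semantic_head(noun_phrase.split()))
```

For instance, collocation("attract", "a coterie of admirers") pairs attract with admirers, whereas a non-transparent first noun, as in "a maximum of ten years", is kept as the head.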
6. Event Nouns with Support Verbs
Among the contexts in which FrameNet annotators identify external arguments for event nouns is that of being a syntactic argument of an accompanying predicate (control structures, but also support verbs of all types). Since support verbs [Akimoto 1989; Mel'cuk 1995, 1996, 1998] represent the interesting case of objects selecting verbs rather than verbs selecting objects, we will find that there are lexicographically interesting pairings of support verb plus noun, and it will be possible to construct such tables as the following for the nouns in our Statement frame, which show these relationships:
Support Verbs                              Event Nouns
make                                       address, admission, allegation, announcement, assertion, comment, complaint, concession, confession, declaration, exclamation, proclamation, remark, statement
give                                       address, exclamation, lecture
deliver                                    address, lecture
issue                                      declaration, denial, proclamation
utter                                      exclamation, remark
express, lodge, register, submit, voice    complaint
face, get                                  complaint
have                                       complaint, revelation
Table 1. Event Nouns and Associated Support Verbs in the Statement Frame
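A table of this kind can be derived mechanically once support verb and event noun pairs have been collected from the annotations. The sketch below assumes such pairs are already available as tuples; the PAIRS list is a hand-copied subset of Table 1, and the function names are hypothetical.

```python
from collections import defaultdict

# (support verb, event noun) pairs; a subset of Table 1 entered by hand.
PAIRS = [
    ("make", "address"), ("make", "complaint"), ("make", "remark"),
    ("give", "address"), ("give", "exclamation"), ("give", "lecture"),
    ("deliver", "address"), ("deliver", "lecture"),
    ("utter", "exclamation"), ("utter", "remark"),
    ("lodge", "complaint"), ("voice", "complaint"),
    ("face", "complaint"), ("get", "complaint"),
]


def nouns_by_verb(pairs):
    """Support verb -> sorted event nouns (the row view of the table)."""
    table = defaultdict(set)
    for verb, noun in pairs:
        table[verb].add(noun)
    return {verb: sorted(nouns) for verb, nouns in table.items()}


def verbs_by_noun(pairs):
    """Event noun -> sorted support verbs (the inverse view)."""
    table = defaultdict(set)
    for verb, noun in pairs:
        table[noun].add(verb)
    return {noun: sorted(verbs) for noun, verbs in table.items()}
```

The inverse view makes it easy to see which nouns, such as complaint, combine with many different support verbs.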
Table 1 provides several interesting pieces of information. It shows that make occurs with the broadest range of nouns in this frame. It also suggests that the type of speech events that are delivered are ones that have a public audience rather than just an interlocutor as an Addressee. In addition the table shows, in an indirect way, that various semantic roles of an event noun can be realized as the subject of different support verbs. Consider the fact that the noun complaint occurs in four different rows in the table. The verbs in the first two of the rows take the perspective of the speaker (make; express, utter, lodge, register, submit), whereas those in the last two rows (face, get; have) take the perspective of the addressee. This distinction is exemplified for complaint by the sentences in (4-5) and (6-7).
(4) The woman MADE no complaint to the police for five days.
(5) Voters have VOICED complaints at the elections being held before the trials begin and before Mr Papandreou has a chance to prove his innocence.
(6) Bernard Antoine, general manager of the Novotel, West London, said he had RECEIVED no complaints about charges.
(7) Some of the things he says are really quite outrageous; do you ever GET any complaints about his language?
Inspection of support verb patterns across many semantic frames would likely support a larger (currently only intuition-based) generalization that face and get always express patients (or at least, non-agents).
Kernel Dependency Graphs (KDGs)
One of the side activities of the FrameNet work is that of devising a means of extracting what we are calling kernel dependency graphs, by which we mean displays of frame-bearing lexical units found in the corpus together with the lexical heads of the constituents that realize their core FEs. (In the case of phrases marked with function words, we would want this to include information about the marker and the head of the marked constituent.) This would effectively be a display of governors together with their dependents along with an indication of both the semantic roles and the marking of those dependents. Thus, for a sentence like (8) the top-level KDG could be represented as in (9).
(8) The patient objected strenuously to the diet her doctor put her on.
(9) [object
      actor     patient
      content   diet
      marker    to]
One of the advantages of recognizing that arguments can be found at a distance from the predicates they are semantically dependent on, through control structures of the familiar kind, support verbs and transparent nouns, is that it becomes possible to zero in on the semantically correct KDGs in the data. Thus, from a sentence like (10) it should be possible to detect a KDG as in (11), centered in the noun objection, that looks almost exactly like the previous one, by 'seeing through' the control structure around likely, the support verb have, and the transparent nouns; given the meaning of the sentence, patient and diet are more appropriate lexical companions to objection than kind and sort.
(10) What kind of patient is likely to have strong objections to this sort of diet.
(11) [objection
       support   have
       actor     patient
       content   diet
       marker    to]
The minimal parsing needed for finding the head nouns can generally be done automatically. By ignoring all of the transparent structures, we can easily find the words needed for extracting the semantically significant KDGs in a text.
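As a rough illustration of what a KDG holds, the sketch below builds the displays in (9) and (11) from a governor, an optional support verb, and a list of dependents that already carry their lexical heads. It assumes that head finding (including seeing through transparent nouns and control structures) has been done beforehand; the Dependent class, the render_kdg function and the "manner" role are hypothetical names used only here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dependent:
    role: str                     # semantic role (frame element)
    head: str                     # lexical head of the constituent
    marker: Optional[str] = None  # function word marking the phrase, if any
    core: bool = True             # peripheral FEs stay out of the kernel


def render_kdg(governor, dependents, support=None):
    """Render a kernel dependency graph in the bracketed display format."""
    lines = [governor]
    if support:
        lines.append("support " + support)
    for dep in dependents:
        if not dep.core:
            continue
        lines.append(dep.role + " " + dep.head)
        if dep.marker:
            lines.append("marker " + dep.marker)
    return "[" + "\n  ".join(lines) + "]"


# Sentence (8): the manner adverb is peripheral and is left out.
kdg_9 = render_kdg("object", [
    Dependent("actor", "patient"),
    Dependent("content", "diet", marker="to"),
    Dependent("manner", "strenuously", core=False),
])

# Sentence (10): heads are given after seeing through "kind of" and "sort of".
kdg_11 = render_kdg("objection", [
    Dependent("actor", "patient"),
    Dependent("content", "diet", marker="to"),
], support="have")
```

The two results differ only in the governor and in the extra support line, which mirrors the observation that (11) looks almost exactly like (9).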
Of particular interest to our present point, some of the KDGs we are now able to recognize will turn out to be important collocations in their own right. Since special collocations occur not only between verbs and deverbal nouns and their complements, but also between adjectives and the nouns they modify, expanding the search for KDGs beyond complementation structures to modification structures allows us to add a new class of collocations. Including examples from Fontenelle [1999, pp. 28-29] we can find, by skipping past the transparent nouns, the collocations in the left column in the phrases given in the right column of Table 2. The first two are relations between verbs and objects; the next two are relations between adjectives and their semantic heads; the last two are relations between prepositions and their object nouns.
Collocation       Text
lay eggs          The hens laid dozens of eggs.
suffered fever    suffered a bout of fever
sound advice      a sound piece of advice
a fine mess       a fine sort of mess
on table          on this part of the table
in closet         in this part of the closet
Table 2. Finding Collocations across Transparent Nouns.
Summary
Some information about multiword expressions is a part of the FrameNet database because lexicographers chose to enter space-separated words as lexical units; the rest is derivable by searches or reports based on the information entered into the database by other means. Sometimes an argument (typically a subject) of a verb that takes a frame-bearing noun as its direct object is necessarily identified with a frame element of that object noun. While many such verbs (support verbs and other sorts of lexical functions in the sense of Mel'cuk) add configurational information of one kind or another to the verbal concept (features of aspect, point of view, evaluation, etc.), their main function in many cases is to combine with the nominal object to express a verbal meaning: all such pairings (verb + object noun) can count as MWEs; those in which the verb is lexically selected by the noun are entrenched MWEs and should be listed separately in the lexicon. Verbs, adjectives and nouns whose semantically basic complements are lexically specified as being marked by particular prepositions can be counted as MWEs and listed with their prepositions. All noun compounds are MWEs, but only those which have meanings assigned to them beyond whatever semantic structure they may have by compositional principles will need separate entries in the lexicon. In sum, the database resulting from the straightforward lexicographic practice created for the FrameNet project has proved capable of yielding reliable information about participation in collocational patterns and multiword expressions for the words covered in the database.
Acknowledgements
We are grateful to the National Science Foundation for funding the work of the FrameNet project through two grants, IRI #9618838 "Tools for Lexicon Building" (March 1997 - February 2000), and ITR/HCI #0086132 "FrameNet++: An On-Line Lexical Semantic Resource and Its Application to Speech and Language Technology" (September 2000 - August 2003). The Principal Investigators of FrameNet++ are Charles J. Fillmore (ICSI), Dan Jurafsky (University of Colorado at Boulder), Srini Narayanan (SRI International/ICSI), and Mark Gawron (San Diego State University).
References
[Akimoto 1989] Akimoto, Minoji 1989, A Study of Verbo-Nominal Structures in English. Shinozaki Shorin, Tokyo.
[Baker et al. 1998] Baker, C.F., C.J. Fillmore & J.B. Lowe, 1998. The Berkeley FrameNet Project, in: COLING-ACL '98: Proceedings of the Conference, held at the University of Montreal, pp. 86-90, Association for Computational Linguistics, Montreal.
[Fillmore 1985] Fillmore, C.J. 1985, Frames and the Semantics of Understanding, in: Quaderni di Semantica VI.2
[Fillmore & Atkins 1998] Fillmore, C.J. & B.T.S. Atkins 1998, FrameNet and Lexicographic Relevance, in: Proceedings of the First International Conference on Language Resources And Evaluation. Granada, Spain.
[Fillmore & Baker 2001] Fillmore, C.J. & C.F. Baker 2001, Frame Semantics for Text Understanding, in: Proceedings of WordNet and Other Lexical Resources Workshop, held at North American Association for Computational Linguistics, Pittsburgh.
[Fillmore et al. 2001] Fillmore, C.J., C. Wooters & C.F. Baker 2001, Building a Large Lexical Database Which Provides Deep Semantics, in: B. Tsou & O. Kwong (eds.), Proceedings of the 15th Pacific Asia Conference on Language, Information and Computation. Hong Kong.
[Fontenelle 1999] Fontenelle, T. 1999, Semantic Resources for Word Sense Disambiguation: A Sine Qua Non?, in: Linguistica e Filologia 9. Università degli Studi di Bergamo, Italy.
[Gildea & Jurafsky 2000] Gildea, Daniel and Daniel Jurafsky. 2000, Automatic Labeling of Semantic Roles, in: Proceedings of the ACL 2000, Hong Kong.
[Mel'cuk 1995] Mel'cuk, I. 1995, Lexical Functions, in: L. Wanner (ed.) Lexical Function in Lexicography and NLP. Benjamins.
[Mel'cuk 1996] Mel'cuk, I. 1996, Phrasemes and Phraseology, in: M. Everaert, E.-J. van der Linden, A. Schenk, R. Schreuder (eds.), Idioms. Structural and Psychological Perspectives, Lawrence Erlbaum Associates. New Jersey.
[Mel'cuk 1998] Mel'cuk, I. 1998, Collocations and Lexical Functions, in: A.P. Cowie (ed.) Phraseology. Theory, Analysis and Applications. Clarendon Press.
[Svensson 1998] Svensson, P., 1998. Number and Countability in English Nouns. Uppsala: Swedish Science Press.
Endnotes
1 The word "frame" here means "section of a window on the screen".
2 Pertinative adjectives ("pertainyms" in WordNet terminology) generally are not used predicatively and when modifying nouns generally function in ways similar to modifying nouns in noun compounds. (Compare linguistic [adj] society and linguistics [n] society, Paris [n] connection and French [adj] connection.) Occasions of predicate use are found in special constructions (this problem is economic in nature).
3 Of course we need to recognize that not every instance of an N of N pattern is a transparent noun structure: many relational nouns occurring as the first N in this pattern can have a following of-phrase as a complement. Where the same word occurs in either such structure we can have local ambiguity, with cases in which it is the first noun of an N of N phrase that is the relevant collocate of something in its environment: compare eat a number of apples with calculate the number of apples. The noun number is a paradigmatic fellow to such nouns as bunch, group, cluster, collection, etc., in the one context, and to quantity, size, weight, height, etc., in the other.