038_2002_v1_josef ruppenhofer, collin f. baker & charles j. fillmore_the framenet database and...


REPORTS ON LEXICOGRAPHICAL AND LEXICOLOGICAL PROJECTS

The FrameNet Database and Software Tools

Josef Ruppenhofer, Collin F. Baker, and Charles J. Fillmore

International Computer Science Institute, 1947 Center St.

Berkeley, CA 94704-1198, USA {josef, collinb, fillmore}@icsi.berkeley.edu

website: http://framenet.icsi.berkeley.edu/~framenet

Abstract

The FrameNet Project is producing a lexicon of English for both human use and NLP applications, based on the principles of Frame Semantics, in which sentences are described on the basis of predicators which evoke semantic frames and other constituents which express the participants (frame elements) in these frames. Our lexicon contains detailed information about the possible syntactic realizations of frame elements, derived from annotated corpus examples. In the process, we have developed a suite of tools for the definition of semantic frames, for annotating sentences, for searching the results, and for creating a variety of reports. We will discuss the conceptual basis of our work and demonstrate the tools we work with, the results we produce, and how they may be of use to other NLP projects.

Introduction

The FrameNet (FN) research project [Fillmore and Baker 2001, Baker et al. 1998, Lowe et al. 1997]¹ is building a lexical resource that aims to provide, for a significant portion of the vocabulary of contemporary English, a body of semantically and syntactically annotated sentences from which reliable information can be reported on the valences (combinatorial possibilities) of each item to be analyzed. The project is committed to a descriptive framework based on semantic frames [Fillmore 1985, Fillmore and Atkins 1998] and to documenting its observations on the basis of carefully annotated attestations taken from corpora.

A semantic frame (henceforth simply frame) is a script-like structure of inferences, linked by linguistic convention to the meanings of linguistic units (in our case, lexical items). Each frame identifies a set of frame elements (FEs): the participants and props in the frame. A frame semantic description of a lexical item identifies the frames which underlie a given meaning and specifies the ways in which FEs, and constellations of FEs, are realized in structures headed by the word. Generalizations about frame structure and grammatical organization are derived automatically from a large body of annotated sentences, each of them annotated to show one combinatory arrangement for the particular targeted word.
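To make the terms frame, frame element and annotated sentence concrete, here is a minimal Python sketch of one frame and one sentence annotated for a single target word. The class names, the frame name `Telling`, and the FE names `Speaker` and `Message` are assumptions for illustration (only Addressee is named in this paper), and character-offset spans are just one possible encoding, not necessarily FrameNet's.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A semantic frame: a name plus the frame elements (FEs) it defines."""
    name: str
    frame_elements: set[str]


@dataclass
class AnnotatedSentence:
    """One corpus sentence annotated for a single target word.

    Spans are (start, end) character offsets, end exclusive.
    """
    text: str
    target: tuple[int, int]
    fe_spans: dict[str, tuple[int, int]] = field(default_factory=dict)

    def realizations(self) -> dict[str, str]:
        """Map each annotated FE to the stretch of text that realizes it."""
        return {fe: self.text[s:e] for fe, (s, e) in self.fe_spans.items()}


def check_labels(frame: Frame, sent: AnnotatedSentence) -> None:
    """Raise ValueError if the sentence uses an FE the frame does not define."""
    unknown = set(sent.fe_spans) - frame.frame_elements
    if unknown:
        raise ValueError(f"FEs not in frame {frame.name}: {sorted(unknown)}")


# Hypothetical frame and FE names, chosen for illustration only.
telling = Frame("Telling", {"Speaker", "Addressee", "Message"})
s = "The president told the reporters the answer to the question."
sent = AnnotatedSentence(
    text=s,
    target=(14, 18),  # "told"
    fe_spans={"Speaker": (0, 13), "Addressee": (19, 32), "Message": (33, 59)},
)
check_labels(telling, sent)
print(sent.realizations())
```

Each such object records one combinatory arrangement for the target; the lexicon's generalizations come from aggregating many of them.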

Corpora and Software

For the first part of the project, the British National Corpus (BNC, http://info.ox.ac.uk/bnc) was used, courtesy of Oxford University Press. For our continuing work, we are depending on both the BNC and the corpora of English news texts provided by the LDC; eventually we hope to be able to add the full resources of the American National Corpus (http://www.cs.vassar.edu/ide/anc/). The project has used an in-house user interface to run the Corpus Workbench software from the Institut für Maschinelle Sprachverarbeitung of the


The frame database contains, for each frame, its name and description, a list of frame elements, each with a description and examples, and information about relations among them. The most important relations include frame inheritance (ISA, with inheritance of FEs from parent to child) and frame composition (PART-OF, with optional bindings between FEs of subframes and those of the complex frame they constitute).
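The two relation types can be sketched as follows, assuming a simple in-memory model in which ISA inheritance unions a frame's own FEs with its parent's, and a PART-OF binding maps a subframe FE onto an FE of the complex frame. The frames `Event`, `Transfer` and `Commerce` and their FEs are hypothetical examples, not entries quoted from the FrameNet database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    name: str
    own_fes: set[str]
    parent: Optional["Frame"] = None  # ISA link to the parent frame
    # PART-OF: subframe name -> {FE of the subframe: FE of this complex frame}
    subframes: dict[str, dict[str, str]] = field(default_factory=dict)

    def all_fes(self) -> set[str]:
        """FEs defined on this frame plus those inherited along the ISA chain."""
        inherited = self.parent.all_fes() if self.parent is not None else set()
        return inherited | self.own_fes

    def ancestors(self) -> list[str]:
        """Names of ISA ancestors, nearest first."""
        chain, f = [], self.parent
        while f is not None:
            chain.append(f.name)
            f = f.parent
        return chain

    def bad_bindings(self) -> list[str]:
        """Bindings that point at an FE this complex frame does not have."""
        fes = self.all_fes()
        return [f"{sub}.{sub_fe}->{fe}"
                for sub, binding in self.subframes.items()
                for sub_fe, fe in binding.items()
                if fe not in fes]


# Hypothetical frames and FE names, for illustration only.
event = Frame("Event", {"Time", "Place"})
transfer = Frame("Transfer", {"Donor", "Recipient", "Theme"}, parent=event)
commerce = Frame(
    "Commerce", {"Buyer", "Seller", "Goods", "Money"}, parent=event,
    subframes={
        "Goods_transfer": {"Donor": "Seller", "Recipient": "Buyer", "Theme": "Goods"},
        "Money_transfer": {"Donor": "Buyer", "Recipient": "Seller", "Theme": "Money"},
    },
)
print(sorted(transfer.all_fes()))
print(commerce.bad_bindings())
```

Keeping bindings optional, as the text describes, means a subframe FE with no entry in the mapping simply has no counterpart in the complex frame.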

The lexical database consists of a lexicon with entries for nouns, verbs, and adjectives. Each entry represents a lexical unit, a pairing of a lemma with a semantic frame (i.e. one sense of a word). Each entry details the FEs that can occur with a particular lexical unit and the syntactic patterns in which they can occur, in terms of phrase type and grammatical function. Every such pattern is supported by annotated examples from a corpus (averaging more than 20 examples per lexical unit).
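Since each valence pattern is a combination of FEs with phrase types and grammatical functions, backed by example sentences, a lexical entry's pattern table can be derived by grouping annotations. The sketch below assumes a flat per-sentence list of realizations; the grammatical-function labels (`Ext`, `Obj`, `Obj2`, `Dep`) and the FE names other than Addressee are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FERealization:
    fe: str
    phrase_type: str    # e.g. "NP", "PP[to]"
    gram_function: str  # e.g. "Ext" (subject), "Obj" (object)


def valence_patterns(annotations):
    """Group one lexical unit's annotated examples by valence pattern.

    `annotations` maps a sentence id to the FE realizations in that sentence.
    A pattern is stored as a tuple sorted by FE name, so that surface word
    order does not split one pattern into several.
    Returns {pattern: [ids of the sentences supporting it]}.
    """
    patterns = defaultdict(list)
    for sent_id, realizations in annotations.items():
        key = tuple(sorted(realizations,
                           key=lambda r: (r.fe, r.phrase_type, r.gram_function)))
        patterns[key].append(sent_id)
    return dict(patterns)


# Hypothetical annotations for tell.v; GF labels are illustrative.
annotations = {
    "s1": [FERealization("Speaker", "NP", "Ext"),
           FERealization("Addressee", "NP", "Obj"),
           FERealization("Message", "NP", "Obj2")],
    "s2": [FERealization("Speaker", "NP", "Ext"),
           FERealization("Message", "NP", "Obj"),
           FERealization("Addressee", "PP[to]", "Dep")],
    "s3": [FERealization("Speaker", "NP", "Ext"),
           FERealization("Message", "NP", "Obj2"),
           FERealization("Addressee", "NP", "Obj")],
}
for pattern, ids in valence_patterns(annotations).items():
    print(" + ".join(f"{r.fe}:{r.phrase_type}.{r.gram_function}" for r in pattern), ids)
```

The sentence ids kept with each pattern play the role of the supporting examples each line of a lexical entry links to.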

This section of the demo will describe the database in some detail, including the internal database structure (part of which is shown in Fig. 1) and the format of the XML files used for the distribution, and will give instructions for obtaining copies. Because the data for the second phase of the FrameNet project are much more complex than those for the first phase, a new XML format has been used, allowing the representation of multiple layers, overlapping labels, frame inheritance, etc. We are also interested in making the FN data accessible through the Smart Web; to this end we are in the process of adding RDF using DAML+OIL to our XML representation. Some preliminary versions of this format will be displayed.
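As a rough illustration of why a layered format helps, the following sketch reads a small XML fragment in which each layer carries its own labels, so labels on different layers can cover the same words. The element and attribute names (`sentence`, `layer`, `label`, `start`, `end`) are invented for this example and are not the actual FrameNet distribution format; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML fragment; element and attribute names are invented
# for illustration and are NOT FrameNet's actual distribution format.
SAMPLE = """
<sentence id="s1">
  <text>The president told the reporters the answer to the question.</text>
  <layer name="Target">
    <label name="Target" start="14" end="18"/>
  </layer>
  <layer name="FE">
    <label name="Speaker" start="0" end="13"/>
    <label name="Addressee" start="19" end="32"/>
    <label name="Message" start="33" end="59"/>
  </layer>
  <layer name="GF">
    <label name="Ext" start="0" end="13"/>
    <label name="Obj" start="19" end="32"/>
  </layer>
</sentence>
"""


def read_layers(xml_text):
    """Return (sentence text, {layer name: [(label name, covered text), ...]})."""
    root = ET.fromstring(xml_text.strip())
    text = root.findtext("text")
    layers = {}
    for layer in root.findall("layer"):
        spans = []
        for label in layer.findall("label"):
            start, end = int(label.get("start")), int(label.get("end"))
            spans.append((label.get("name"), text[start:end]))
        layers[layer.get("name")] = spans
    return text, layers


text, layers = read_layers(SAMPLE)
for name, spans in layers.items():
    print(name, spans)
```

Here the FE label Speaker and the GF label Ext cover the same words on separate layers, which a single flat bracketing of the sentence could not express.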

Software to be Demonstrated

Web-based Report System Reports are generated from the database providing various views of the data. Of particular interest is the Lexical Entry report, which concisely shows the definition, the FEs, and the valence patterns (FEs in particular combinations of phrase type and grammatical function), with links to the annotated sentences supporting each line in these tables.

Other reports give the complete description of a frame and its FEs, and provide convenient ways to look up frames from lemmas and lemmas from frames.
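Because each lexical unit pairs one lemma with one frame, the two lookups just mentioned amount to inverse indexes over the same set of pairs. A minimal sketch, with hypothetical lexical units and frame names:

```python
def build_indexes(lexical_units):
    """Index (lemma, frame) pairs both ways: lemma -> frames, frame -> lemmas."""
    frames_by_lemma, lemmas_by_frame = {}, {}
    for lemma, frame in lexical_units:
        frames_by_lemma.setdefault(lemma, set()).add(frame)
        lemmas_by_frame.setdefault(frame, set()).add(lemma)
    return frames_by_lemma, lemmas_by_frame


# Hypothetical lexical units; a polysemous lemma contributes one pair per sense.
units = [
    ("tell.v", "Telling"),
    ("tell.v", "Differentiation"),  # as in "tell the twins apart"
    ("inform.v", "Telling"),
    ("say.v", "Statement"),
]
frames_by_lemma, lemmas_by_frame = build_indexes(units)
print(sorted(frames_by_lemma["tell.v"]))
print(sorted(lemmas_by_frame["Telling"]))
```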

Frame Editing Tools We will demonstrate the frame editor, which uses a GUI written in Java to edit the tables in a MySQL database which represent the frames, frame elements, lemmas and lexical units being described. The editor facilitates not only creating these units but also establishing relations among them, i.e. frame inheritance and frame composition as mentioned above.
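The editor's tables are not specified here beyond the objects they represent, so the following is only a guess at a plausible layout: one table each for frames, FEs, lemmas and lexical units, plus a relation table holding ISA and PART-OF links. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server; all table and column names are hypothetical. The recursive query collects a frame's FEs together with those inherited from its ISA ancestors.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE frame (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE frame_element (
    id INTEGER PRIMARY KEY,
    frame_id INTEGER NOT NULL REFERENCES frame(id),
    name TEXT NOT NULL);
CREATE TABLE lemma (id INTEGER PRIMARY KEY, form TEXT NOT NULL, pos TEXT NOT NULL);
CREATE TABLE lexical_unit (
    id INTEGER PRIMARY KEY,
    lemma_id INTEGER NOT NULL REFERENCES lemma(id),
    frame_id INTEGER NOT NULL REFERENCES frame(id),
    UNIQUE (lemma_id, frame_id));
CREATE TABLE frame_relation (
    type TEXT NOT NULL CHECK (type IN ('ISA', 'PART-OF')),
    child_id INTEGER NOT NULL REFERENCES frame(id),
    parent_id INTEGER NOT NULL REFERENCES frame(id));
"""


def inherited_fes(conn, frame_name):
    """FE names of a frame plus those of all its ISA ancestors, sorted."""
    rows = conn.execute("""
        WITH RECURSIVE anc(id) AS (
            SELECT id FROM frame WHERE name = ?
            UNION
            SELECT r.parent_id FROM frame_relation r JOIN anc ON r.child_id = anc.id
            WHERE r.type = 'ISA')
        SELECT fe.name FROM frame_element fe JOIN anc ON fe.frame_id = anc.id
        ORDER BY fe.name""", (frame_name,)).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO frame (id, name) VALUES (?, ?)",
                 [(1, "Event"), (2, "Transfer")])
conn.executemany("INSERT INTO frame_element (frame_id, name) VALUES (?, ?)",
                 [(1, "Time"), (1, "Place"),
                  (2, "Donor"), (2, "Recipient"), (2, "Theme")])
conn.execute("INSERT INTO frame_relation VALUES ('ISA', 2, 1)")  # Transfer ISA Event
print(inherited_fes(conn, "Transfer"))
```

Storing both relation types in one table with a `type` column keeps the editor's "establish a relation" operation uniform; a real schema might instead separate them, for example to attach FE bindings to PART-OF links.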

EURALEX 2002 PROCEEDINGS

Demonstration of Manual Frame Element Annotation

We will show the process of annotation used in the daily work of the project, using a different GUI which adds data to different tables in the MySQL database, representing sentences and the labels attached to them. Information regarding POS tags, the location of the target lemma, and FEs is represented by labels in several layers associated with each sentence. The annotation software uses multiple layers that allow not only overlapping FE labels and discrepancies between syntactic and semantic constituents, but also multiple target lemmas within a sentence, each associated with a separate set of annotation layers. We will briefly discuss the policies for choosing appropriate sentences, delimiting frame elements, defining grammatical functions, etc.
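The layering described above can be modelled, under stated assumptions, as one annotation set per target lemma, each holding named layers of (start, end) labels. In this sketch labels on different layers may share a span, and labels from different targets may overlap; the layer names, lemma names and the FE `Question` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    name: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


@dataclass
class AnnotationSet:
    """All layers of annotation for one target lemma in a sentence."""
    target: Label
    layers: dict[str, list[Label]] = field(default_factory=dict)

    def add(self, layer: str, label: Label) -> None:
        self.layers.setdefault(layer, []).append(label)


def overlapping(labels):
    """Return pairs of label names whose character spans overlap."""
    pairs = []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if a.start < b.end and b.start < a.end:
                pairs.append((a.name, b.name))
    return pairs


s = "The president told the reporters the answer to the question."

told = AnnotationSet(Label("tell.v", 14, 18))
told.add("FE", Label("Speaker", 0, 13))
told.add("FE", Label("Addressee", 19, 32))
told.add("FE", Label("Message", 33, 59))
told.add("GF", Label("Ext", 0, 13))   # same span as Speaker, different layer
told.add("GF", Label("Obj", 19, 32))

answer = AnnotationSet(Label("answer.n", 37, 43))  # a second target
answer.add("FE", Label("Question", 44, 59))

sentence_annotation = [told, answer]  # one annotation set per target lemma
# told's Message contains answer's FE, so labels from the two sets overlap.
print(overlapping(told.layers["FE"] + answer.layers["FE"]))
```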

The FrameSQL Tools for Searching the FN Database FrameSQL is a web interface written by Hiroaki Sato of Senshu University, Japan, which allows the user to search the FN database in a variety of ways; there are two levels of complexity, depending on the needs and sophistication of the user. Search parameters include the frame, the lemma, the FEs, specific phrase types and grammatical functions of FEs, the head noun of a particular FE, etc. We will demonstrate how to use FrameSQL for queries such as "find all example sentences containing verbs in the Communication frame whose Addressees are expressed as direct objects."
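FrameSQL's internals are not described in this paper, so the sketch below merely expresses the example query as a filter over an in-memory list of annotated examples. The record fields, the `lemma.pos` naming convention, and the use of `Obj` for direct object are assumptions; it also matches only the named frame and ignores any frames that inherit from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    sentence: str
    lemma: str   # assumed "lemma.pos" form, e.g. "tell.v"
    frame: str
    fes: tuple   # (FE, phrase type, grammatical function) triples


def find(examples, frame, pos, fe, gf):
    """Examples whose target is in `frame` with part of speech `pos`
    and in which `fe` is realized with grammatical function `gf`."""
    return [ex for ex in examples
            if ex.frame == frame
            and ex.lemma.endswith("." + pos)
            and any(f == fe and g == gf for f, _pt, g in ex.fes)]


# Hypothetical annotated examples, for illustration only.
examples = [
    Example("The president told the reporters the answer.", "tell.v", "Communication",
            (("Speaker", "NP", "Ext"), ("Addressee", "NP", "Obj"), ("Message", "NP", "Obj2"))),
    Example("She explained the plan to the board.", "explain.v", "Communication",
            (("Speaker", "NP", "Ext"), ("Message", "NP", "Obj"), ("Addressee", "PP[to]", "Dep"))),
    Example("His remark to the press surprised us.", "remark.n", "Communication",
            (("Speaker", "Poss", "Gen"), ("Addressee", "PP[to]", "Dep"))),
]

# "Verbs in the Communication frame whose Addressees are direct objects."
hits = find(examples, frame="Communication", pos="v", fe="Addressee", gf="Obj")
print([ex.lemma for ex in hits])
```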

Automatic Frame Element Recognition In an effort to speed up the annotation process, we are developing a system for recognition of frame elements using rules specified a priori by lexicographers, based partially on introspection and partially on preliminary corpus searching. For example, in annotating the lemma tell, with sentences such as

1. The president told the reporters the answer to the question.

2. The president told the story to the Cabinet later that morning.

the lexicographer recognizes two possible valence patterns, V NP NP (ditransitive) and V NP to NP; in Ex. 2, the preposition to marks the Addressee, while in Ex. 1, the presence of two NPs signals that the first must be the Addressee. A set of simple rules, functioning as a cascaded filter, can be written to automatically mark frame elements which match portions of the rules. The annotation task can then be reduced (in most instances) to approving, disapproving, or editing the pre-labeled sentences. Results from this rule-based system will be compared to those from a different system, created by Dan Gildea [Gildea and Jurafsky 2000, Gildea 2001], using an algorithm that learns from lexical units that have already been annotated.
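One plausible reading of a cascaded rule filter for tell, assuming sentences have already been chunked into phrase-typed constituents, is an ordered list of rules in which the first rule that matches supplies the labels and a sentence that matches no rule is left for manual annotation. The rule functions, the phrase-type tags, and the FE name `Message` are assumptions; only the two valence patterns come from the examples above.

```python
from typing import Callable, Optional

# A constituent is (phrase type, text). Each rule sees the constituents that
# follow the target verb; chunking is assumed to have been done upstream.
Constituent = tuple[str, str]
Labels = Optional[dict[str, str]]


def ditransitive(after: list[Constituent]) -> Labels:
    """V NP NP: the first NP is the Addressee, the second the Message."""
    if len(after) >= 2 and after[0][0] == "NP" and after[1][0] == "NP":
        return {"Addressee": after[0][1], "Message": after[1][1]}
    return None


def to_dative(after: list[Constituent]) -> Labels:
    """V NP to-NP: the preposition "to" marks the Addressee."""
    if len(after) >= 2 and after[0][0] == "NP" and after[1][0] == "PP[to]":
        return {"Message": after[0][1], "Addressee": after[1][1]}
    return None


def cascade(rules: list[Callable[[list[Constituent]], Labels]],
            after: list[Constituent]) -> Labels:
    """Apply rules in order; the first match wins. None means no rule fired,
    so the sentence is left for the lexicographer to annotate by hand."""
    for rule in rules:
        labels = rule(after)
        if labels is not None:
            return labels
    return None


RULES = [ditransitive, to_dative]
ex1 = [("NP", "the reporters"), ("NP", "the answer to the question")]
ex2 = [("NP", "the story"), ("PP[to]", "to the Cabinet"), ("AdvP", "later that morning")]
print(cascade(RULES, ex1))
print(cascade(RULES, ex2))
```

Rule order matters in a cascade: a more specific rule placed first shields later, more general rules from sentences it has already handled.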

Acknowledgements We are grateful to the National Science Foundation for funding the work of the FrameNet project through two grants, IRI 9618838, "Tools for Lexicon Building" (March 1997 to February 2000), and 0086132, "FrameNet++: An On-Line Lexical Semantic Resource and its Application to Speech and Language Technology" (September 2000 to August 2003). The Principal Investigators of FrameNet++ are Charles J. Fillmore (ICSI), Dan Jurafsky (University of Colorado at Boulder), Srini Narayanan (SRI International/ICSI), and Mark Gawron (San Diego State University).


Fig. 1. Structure of the FrameNet Database (partial)

References

[Baker et al. 1998] Baker, C.F., C.J. Fillmore & J.B. Lowe 1998, The Berkeley FrameNet Project, in: COLING-ACL '98: Proceedings of the Conference, held at the University of Montreal, pp. 86-90, Association for Computational Linguistics, Montreal.

[Fillmore 1985] Fillmore, C.J. 1985, Frames and the Semantics of Understanding, in: Quaderni di Semantica VI.2.

[Fillmore & Atkins 1998] Fillmore, C.J. & B.T.S. Atkins 1998, FrameNet and Lexicographic Relevance, in: Proceedings of the First International Conference on Language Resources and Evaluation, Granada, Spain.

[Fillmore & Baker 2001] Fillmore, C.J. & C.F. Baker 2001, Frame Semantics for Text Understanding, in: Proceedings of the WordNet and Other Lexical Resources Workshop, held at the North American Association for Computational Linguistics, Pittsburgh.

[Fillmore et al. 2001] Fillmore, C.J., C. Wooters & C.F. Baker 2001, Building a Large Lexical Database Which Provides Deep Semantics, in: B. Tsou & O. Kwong (eds.), Proceedings of the 15th Pacific Asia Conference on Language, Information and Computation, Hong Kong.

[Gildea & Jurafsky 2000] Gildea, D. & D. Jurafsky 2000, Automatic Labeling of Semantic Roles, in: Proceedings of the ACL 2000, Hong Kong.

[Lowe et al. 1997] Lowe, J.B., C.F. Baker & C.J. Fillmore 1997, A Frame-Semantic Approach to Semantic Annotation, in: M. Light (ed.), Tagging Text with Lexical Semantics: Why, What and How?, Special Interest Group on the Lexicon, Association for Computational Linguistics.

Endnotes

1. The full texts of most references in this paper are available at http://framenet.icsi.berkeley.edu/~framenet/papers.html.
