









Collocations – Mediating between Lexical Abstractions and Textual Concretions

Petra Ludewig
Institute of Cognitive Science
University of Osnabrück, Germany

Joachim Wagner
National Centre for Language Technology
School of Computing, Dublin City University


Overview

1. Introduction
– Gap between Abstractions and Concretions
– Collocations

2. LogoTax
– General Objectives
– Three Layer Representation
– Technological Aspects

3. Conclusion

Abstractions vs. Concretions

Linguistic paradigms
• Generative Grammar
• Structuralism

Pedagogical paradigms
• Instructivism
• Constructivism

Gap between Abstractions and Concretions

Abstract description of a single word

?

Authentic example sentences

LogoTax, a system combining dictionary and corpus


Collocations

• Associations of two or more lexemes
– are more or less semantically transparent
– involve an arbitrary choice of at least one lexeme
– usually cannot be translated compositionally
– are often highly frequent
– sometimes show a special morpho-syntactic behaviour
• “give a talk”
– German: “einen Vortrag halten” (literally “to hold a talk”)
– French: “faire une conférence” (literally “to make a lecture”)

Morpho-syntactic Behaviour of Collocations

• to put an end to something
– *But then I decided to put the end to these unedifying contacts.
– *The end to which I put these unedifying contacts was pleasant.
• Normal behaviour
– to give a talk
– the talk that I give today ...

LogoTax - General Objectives

• A combination of dictionary and corpus
• Tool to build up a personal dictionary
– tailored to individual needs
– reading-based and production-oriented
– learning as knowledge construction
– data-driven entry design
– German verb-noun combinations

LogoTax - Three Layer Representation

Abstract Layer: canonical form

Intermediate Layer: morpho-syntactic features and their frequency counts

Example Layer: full, authentic sentences (full set / subset)
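
The three layers can be pictured as one data structure. The following is only a minimal sketch, not LogoTax's actual design: the class names, the feature labels such as `det=indef`, and the three German sentences are all invented for illustration (they are not corpus data).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Example:
    """Example layer: one full sentence plus the features observed in it."""
    sentence: str
    features: frozenset  # hypothetical feature labels, e.g. "det=indef"


@dataclass
class CollocationEntry:
    """One dictionary entry spanning all three layers (assumed layout)."""
    canonical_form: str                           # abstract layer
    examples: list = field(default_factory=list)  # example layer (full set)

    def feature_counts(self) -> Counter:
        """Intermediate layer: frequency of each feature across the examples."""
        counts = Counter()
        for ex in self.examples:
            counts.update(ex.features)
        return counts

    def subset(self, feature: str) -> list:
        """Example layer, subset view: only examples showing one feature."""
        return [ex for ex in self.examples if feature in ex.features]


# Invented sentences standing in for authentic corpus examples.
entry = CollocationEntry("einen Vortrag halten")
entry.examples.append(
    Example("Sie hält heute einen Vortrag.", frozenset({"det=indef", "number=sg"})))
entry.examples.append(
    Example("Er hielt den Vortrag in Dublin.", frozenset({"det=def", "number=sg"})))
entry.examples.append(
    Example("Wir halten Vorträge.", frozenset({"det=none", "number=pl"})))

counts = entry.feature_counts()
print(counts["number=sg"])
print([ex.sentence for ex in entry.subset("number=pl")])
```

Deriving the counts from the examples, rather than storing them separately, keeps the intermediate layer consistent with whatever sentences the learner has collected.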

LogoTax - Three Layer Representation - Abstract Layer

LogoTax - Three Layer Representation - Example Layer

• Screenshot “Examples”


LogoTax - Three Layer Representation - Intermediate Layer

• Screenshot “Variations”

LogoTax - Three Layer Representation - Example Layer – Grouped

LogoTax - Three Layer Representation - Connecting the Representation Layers

Lexical abstraction

Mediating description

Textual concretion

Connecting the Representation Layers

How is this done? gepard lfg-parser

[Figure: candidate sentences passed through the parser. Labels: parseable (~ 30%); irrelevant: no/wrong relation; examples + feature description.]

Sentences shown in the figure:
– Light the fire.
– The explosion lit a fire at a nearby mobile home park.
– He lit the candle that caused the fire.
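
The sorting step in the figure can be sketched as follows. This is an illustration under strong assumptions: `toy_parse` is a hypothetical lookup table of hand-written analyses, not the gepard parser or its output format, and the feature labels and the assignment of the three sentences to outcomes are my own reading of the figure.

```python
from typing import Optional

# Hand-written stand-in analyses (assumption, NOT real parser output).
# "verb_object" lists (verb lemma, object lemma) pairs found in the sentence.
TOY_ANALYSES = {
    "Light the fire.": {
        "verb_object": [("light", "fire")],
        "features": {"det=def", "mood=imperative"},
    },
    "The explosion lit a fire at a nearby mobile home park.": {
        "verb_object": [("light", "fire")],
        "features": {"det=indef", "tense=past"},
    },
    "He lit the candle that caused the fire.": {
        "verb_object": [("light", "candle"), ("cause", "fire")],
        "features": {"det=def", "tense=past"},
    },
}


def toy_parse(sentence: str) -> Optional[dict]:
    """Return an analysis, or None when the sentence is outside coverage."""
    return TOY_ANALYSES.get(sentence)


def classify(sentence: str, verb: str, noun: str):
    """Sort one candidate sentence into one of the three outcomes."""
    analysis = toy_parse(sentence)
    if analysis is None:
        return ("not parsed", None)
    if (verb, noun) not in analysis["verb_object"]:
        # Both words occur, but not as verb and object: no/wrong relation.
        return ("irrelevant", None)
    return ("example", sorted(analysis["features"]))


candidates = list(TOY_ANALYSES) + ["A sentence the toy parser does not cover."]
for s in candidates:
    print(classify(s, "light", "fire"), "<-", s)
```

The point of the third sentence is that both "lit" and "fire" occur in it, yet "fire" is not the object of "lit", so a plain co-occurrence search would wrongly keep it while a syntactic check can reject it.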

LogoTax - Technological Aspects

• Automatic retrieval of examples
– POS Tagger (IMS)
– Der Spiegel 1994
– aligned (en/de) corpus of EU publications
• LFG-based parsing
• Parser coverage:
– approx. 30%
– low recall, high precision
• Chart parser: exponential degradation
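
To make "low recall, high precision" concrete, here is a small worked calculation using the standard definitions. Only the 30% coverage figure comes from the slide; every other count below is invented for illustration and says nothing about the system's measured performance.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts for 100 retrieved candidate sentences.
candidates = 100
parsed = 30     # approx. 30% coverage, as stated on the slide
genuine = 60    # candidates that really contain the collocation (assumed)
accepted = 20   # parsed sentences kept as examples (assumed)
correct = 19    # accepted sentences that are genuine (assumed)

p, r = precision_recall(
    true_pos=correct,
    false_pos=accepted - correct,
    false_neg=genuine - correct,
)
print(f"coverage={parsed / candidates:.0%} precision={p:.2f} recall={r:.2f}")
```

With these assumed counts, almost everything the parser accepts is a genuine example, but most genuine examples are never accepted because their sentences fall outside parser coverage.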


LogoTax
• does more than just show examples
• uses parsing
– to automate feature identification
– to distinguish compatible sentences from incompatible ones
• groups examples according to features
• gives relevant statistics of features


Thank you!


