deliverable 2.6 evaluation of robustness improvements · have_vh0 a_at1 midi_np1 on_ii my_app$...
TRANSCRIPT
DEEPTHOUGHT
Hybrid Deep and Shallow Methods
for Knowledge-Intensive
Information Extraction
Deliverable 2.6 Evaluation of Robustness
Improvements
The DeepThought Consortium October 2004
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 II
PROJECT REF. NO. IST-2001-37836
Project acronym DeepThought
Project full title DeepThought - Hybrid Deep and Shallow Methods for
Knowledge-Intensive Information Extraction
Security (distribution level)
Contractual date of delivery
Actual date of delivery
Deliverable number
Deliverable name
Type
Status & version
Number of pages
WP contributing to the
deliverable
WP / Task responsible
Other contributors
Author(s) John Carroll, Alex Fang, Frederik Fouvry EC Project Officer Evangelia Markidou
Keywords
Abstract For extracting a subcategorisation lexicon for the English HPSG, we collected a 4M-word corpus of Internet mobile phone discussions, applied the RASP shallow analyser to it, and automatically collected, filtered, and mapped verb frames to appropriate HPSG lexical types. The acquired lexicon contains 3,864 verb stems and a total of 5,608 entries. With respect to the original lexicon and the mobile phone corpus, lexical coverage increased from 85.9% to 94.4% in terms of tokens and from 42.3% to 62.9% in terms of types. Using the LKB parser and the [incr tsdb()] profiling package, on a set of 1,000 randomly selected sentences the lexicon led to an increase in parse success rate from 52.9% to 57.3%.
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 3
Content
1 Introduction
Typed unification-based grammatical frameworks such as head-driven phrase-structure
grammars (HPSG; Pollard and Sag, 1994) typically make use of two levels of lexical
descriptions: a generic feature-based description for the word class in general and a set
of lexical entries with specific features for each item. The grammar uses these lexical
descriptions to assign precise syntactic and semantic analyses. Verb subcategorisation
information, in particular, plays a vital role in the correct identification of complements of
predicates. Consider:
[1] I’m thinking of buying this software too but the trial version doesn't seem to have the
option to set priorities on channels.
The two prepositional phrases in [1] can only be attached correctly if the lexicon
specifies that the verb think is complemented by the of phrase and that the complex
transitive verb set can take a prepositional phrase headed by on as the object
complement.
To date, verb subcategorisation lexicons have generally been constructed by
hand. As with most handcrafted linguistic resources, they tend to excel in terms of
precision of description but suffer from omissions and inconsistencies. A reason for this
is that linguistically sophisticated descriptions of natural language are complex and
expensive to construct and there is always some disparity between the computational
grammarian’s introspection and realistic input to the system. As an example, the LinGO
English Resource Grammar (ERG; Copestake and Flickinger, 2000) is a large HPSG
grammar that contains around 2,000 lexical entries for over 1,100 verbs1 selected to
cover a number of application-orientated corpora. When tested with 10,000 randomly
selected sentences from a corpus of mobile phone-related discussions – a new domain
for the ERG and containing many lexical items not present in standard dictionaries – it
1 We use version 1.4 (April 2003) of the LinGO ERG.
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 4
achieved a lexical coverage of only about 86% in terms of tokens and 42.3% in terms of
types (see Section 4 below).
In this report, we describe research that aims to enhance the lexical coverage
(and hence the parser success rate) of the LinGO ERG, using purely automatic
techniques. We describe the construction and pre-processing of a domain-specific
corpus, the extraction of verb subcategorisations from it using shallow processing, and
experimental results on the impact of empirically extracted verb subcategorisations on
the performance of the deep grammar/parser combination.
The approach we take is novel in that it combines unsupervised learning of
language with manually constructed linguistic resources, working towards robust deep
language processing.
2 Corpus Construction
We take as our domain and text type emails about models of mobile phones. In selecting
a source for building a corpus of material of this type, practical considerations include:
• The source be unencumbered by copyright.
• The source be rich enough to allow for a sizable sample.
• The source be diversified enough to include a mixture of both spontaneous discussions and
more formal prose (e.g. press releases).
As a result of these considerations, we decided to use the following mobile phone news
groups, as archived by Google:
• alt.cell-phone.tech
• alt.cellular-phone-tech
• alt.cellular sub-groups:
alltel, attws, bluetooth, cingular, clearnet, data, ericsson, fido, gsm, motorola, nextel, nokia,
oki, rogersatt, sprintpcs, tech, telephones, umts, verizon
We downloaded 16,979 newsgroup postings, covering the period from 27 December
2002 to 4 April 2003.
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 5
2.1 Pre-processing
Each posting was automatically segmented into a header, quoted text, body text, and
signature, based on the HTML markup and formatting clues (see Figure 1). We made
sure that only the body text was retained as part of the corpus, with the header, any
quotation and signature automatically removed.
Header
From: John S. ([email protected])
Subject: Re: Which GSM phone
Newsgroups: alt.cellular
Date: 2003-03-04 07:19:25 PST
Quotation
>The only real disadvantage of it is that you must
manually change from 1900 to
>900/1800 when you travel.
>
Text This was a minor annoyance compared to all
the other issues with Motorola equipment!
Signature
--
John S.
e-mail responses to – john at nationwide-online.net
Figure 1. An Example Email in the Corpus.
Because of the planned evaluation of the subcategorisation lexicon, it was necessary to
divide the corpus into training, development, and testing sections. From the corpus,
1,000 articles were set aside as the development set, 2,000 were held out for testing,
and the rest (13,979 articles in all) were used for the training of the subcategorisation
lexicon. Table 1 summarises the division of the corpus. Ten sets of 1,000 sentences
were randomly selected from the testing section, to be used for evaluation purposes.
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 6
Development 1,000
Testing 2,000 Articles
Training 13,979
16,979
Development 226,089
Testing 482,810 Tokens
Training 3,341,417
4,050,316
Development 20,465
Testing 33,454 Types
Training 109,771
123,177
Table 1. Summary Statistics of the Corpus.
2.2 Tokenisation and Part-of-Speech Tagging
We next applied the RASP system (Briscoe and Carroll, 2002) sentence boundary
detector, tokeniser and part-of-speech (PoS) tagger to the training corpus. The 13,979
articles gave rise to 251,805 sentences. On this corpus we estimate sentence
segmentation and tagging accuracy to both be around 95%, and the tagger’s guessing
of unknown words to be 80% accurate. See [2].
[2] I_PPIS1 've_VH0 been_VBN looking_VVG around_RL and_CC finally_RR
have_VH0 a_AT1 MIDI_NP1 on_II my_APP$ computer_NN1 for_IF my_APP$
Motorola_NP1 C332_NN1 GSM/-GPRS_MC phone_NN1 ._.
2.3 Lemmatisation
The lexical component of the ERG is based around word stems rather than inflected
word forms. We therefore applied the RASP morphological analyser (Minnen et al, 2001)
to reduce inflected verbs and nouns to their base forms. See [3].
[3] I_PPIS1 have_VH0 be+en_VBN look+ing_VVG around_RL and_CC finally_RR
have_VH0 a_AT1 MIDI_NP1 on_II my_APP$ computer_NN1 for_IF my_APP$
Motorola_NP1 C332_NN1 GSM/-GPRS_MC phone_NN1 ._.
2.4 Syntactic Parsing
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 7
The RASP parser was used to syntactically analyse the corpus. The parser uses a
manually written “shallow” unification grammar of PoS tags and contains a statistical
disambiguation model that selects the structurally most plausible syntactic analysis. The
parser by default does not contain any word co-occurrence information, which makes it
suitable for use in a system that acquires lexical information. The parser can recover
from extra-grammaticality by returning partial analyses, which is essential for the
processing of real-world data. The parser produces output that explicitly indicates the
verb frame for each clause.
[4] I_PPIS1 have_VH0 talk+ed_VVN to_II the_AT tech_JJ in_II Motorola_NP1 ,_, and_CC
for_IF my_APP$ local_JJ service_NN1 provider_NN1 ,_, but_CCB they_PPHS2
cannot_NN1 help_VV0 I+_PPIO1 ._.
For instance, [4] has a pair of coordinated finite clauses and its analysis by RASP
comprises the two verb frames shown in Figure 2. As this example indicates, each verb
frame is represented as a “pattern set” that also contains the heads of the complements
of the verb (with SLT being the subject, and OLT the object). Prepositional complements
are also represented if they occur.
After parsing, 165,852 of the 251,805 sentences in the training section of the
corpus produced at least one verb pattern. This means that about 65.9% of the corpus
was useful data for the extraction of verb subcategorisation patterns.
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 8
(#S(LPATTERNSET
:TGTWD |talk+ed|
:TPNO 1
:PATTS(#S(LPATTERN
:TLT (|talk+ed| VVN)
:SLT ((I PPIS1))
:RNAME (VSUBCAT PP)
:OLT1 ((PSUBCAT NP) ((|to| II))
NIL) :PNO 1)))
#S(LPATTERNSET
:TGTWD |help|
:TPNO 1
:PATTS (#S(LPATTERN
:TLT (|help| VV0)
:SLT ((|they| PPHS2))
:RNAME (VSUBCAT NP)
:OLT1 ((I+ PPIO1))
:PNO 1))))
Figure 2. The RASP Analysis for [4].
3 Lexicon Construction
This phase consists of two stages: extraction of subcategorisation frames, and then
mapping them to the ERG scheme.
3.1 Extraction of Subcategorisation Frames
This stage involves the extraction of all the observed frames for any particular verb. For
this purpose, we used the subcategorisation acquisition system of Briscoe and Carroll
(1997) as enhanced by Korhonen (2002) and applied it to all the verbal pattern sets
extracted from the training section of the parsed corpus. There are thus three sub-
categorisation representations: the RASP grammar subcategorisation values used in the
parsed corpus, the Briscoe and Carroll (B&C) classes produced by the acquisition
system, and the ERG lexical types for the target grammar.
Figure 3 shows one frame produced for the verb abandon. From the training
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 9
section of the corpus, a total of 16,371 such frames were extracted with 4,295 unique
verb stems. On average, each verb has 3.8 different frames.
3.2 Mapping between RASP and ERG
The final stage in constructing the new ERG verb subcategorisation lexicon is the
mapping of subcategorisation frames from the B&C scheme to the ERG scheme. The
B&C scheme comprises a total of 163 possible subcategorisations and ERG 216.
#S(EPATTERN
:TARGET |abandon|
:SUBCAT (VSUBCAT NP) :CLASSES (24 5281)
:FREQSCORE 1.5707502e-6 :FREQCNT 7 :TLTL (VV0 VV0 VV0 VV0 VV0 VV0 VV0)
:SLTL
((|they| PPHS2)) ((|they| PPHO2)) ((|Motorola| NP1)) (I+ PPIO1)) :OLT1L
(((|phone| NN1))
((|microcell| NN1))
((TDMA NP1) (AMPS NP1))
((TDMA NP1))
((|packaging| NN1))
((|packaging| NN1)) (|iPAQ| NP1)))
Figure 3. An Extracted Frame for abandon.
.A B&C-to-ERG translation map was manually drawn up2 and automatically applied to
the acquired subcategorisation lexicon. 145 of the B&C subcategorisation frames map to
70 unique ERG lexical types, indicating a considerable degree of many-to-one matching.
In the current mapping, for example, 9 different B&C frames are mapped to
v_empty_prep_trans_le, the ERG type for verbs complemented by a prepositional phrase:
NP-FOR-NP She bought a book for him.
NP-P-ING-OC She accused him of being lazy.
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 10
NP-P-ING-SC She wasted time on combing her hair.
NP-P-ING-AC She told him about going to the park.
NP-P-NP-ING She blamed it on no one buying it.
NP-P-POSSING She asked him about his missing the train.
NP-P-WH-S She asked whether she should go.
NP-P-WHAT-S She asked what she should do.
NP-PP She added salt to the food.
Conversely, 48 ERG lexical types do not have a B&C counterpart. Both schemes
encode syntactic and semantic distinctions but it is evident that they do this in different
ways.
The eventual lexicon contains 5,608 entries for 3,864 verb stems, an average of
1.45 entries per verb. See Figure 4 for entries acquired for the verb accept (where the
lexical type v_np_trans_le represents a transitive entry and v_unerg_le an intransitive
one).
Accept_rasp_v_np_trans_le
:= v_np_trans_le &
[ STEM <“accept”>].
Accept_rasp_v_unerg_le
:= v_unerg_le &
[ STEM <“accept”>].
Figure 4. Entries Acquired for accept.
4 Lexical Coverage
We carried out a number of experiments to measure the impact of the subcategorisation
lexicon on the performance of the HPSG parser. First of all, we wanted to determine
whether the empirically extracted verb stems would enhance the lexical coverage of the
grammar. Secondly, we wanted to see whether the empirically extracted verb entries
would result in better parsing success rate, i.e., more sentences receiving an analysis by
2 We are grateful to Dan Flickinger for providing us with this mapping.
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 11
the parser. A third possibility is that the use of empirically extracted verb
subcategorisations, when applied to text of the same subject domain, would produce
more accurate analyses. However, our experiments to date have been limited to the
evaluation of the first two. In addition, we investigated the impact of the acquired lexicon
on system performance, as measured by parsing time and space.
For the first experiment, we collected and unified all the verb stems from three
machine-readable lexicons: 5,453 from the machine-readable Oxford Advanced
Learner’s Dictionary (OALD), 5,654 from ComLex (Grishman et al, 1994), and 1,126
from the ERG lexicon. These three lexicons jointly yielded a total of 6,341 verbs.
Comparing the 3,864 mobile phone verbs with this joint list, there are a number of items
that are not represented which thus can be considered to be particular to the mobile
phone domain. Manual inspection of the list indicates that there seem to be three groups
of verbs. First of all, verbs like recognise, realise, and patronise may have crept into the
list as a result of British English spelling not being represented in the three sources.
Secondly, we observe some neologisms such as txt and msg, which are almost
exclusively used within the mobile phone domain. Finally, we observe verbs that have
been derived from free combinations of prefixes and verb stems such as re-install and
xtnd-connect. These observations suggest the importance of the inclusion of variant
spellings, and empirical corpus-based selection of lexical items.
To measure the lexical coverage of these verb stems, ten sets of 1,000 randomly
selected sentences each were used as testing material. For comparison, all the verb
stems (1,126 in all) were extracted from the manually coded ERG lexicon. The results
show that the hand-selected verb stems have an average coverage of 85.9% for verb
tokens and 42.3% for verb types in the 10 test sets. In comparison, the empirically
selected list has a much higher coverage of 94.4% in terms of tokens and 62.9% in
terms of types. The average token coverage of the lexicon that would be produced by
taking the union of the verbs in OALD and Comlex is 91%. See Figure 5. There is
considerable variation in the coverage by ERG verb stems across the ten test sets
(SD=1.1) while the empirically selected verbs have a much more consistent coverage
with a standard deviation of only 0.34. The high level and consistency of coverage by the
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 12
empirically selected verb stems demonstrate the advantages of a corpus-based
approach to lexical selection.
5 Parse Success Rate
In order to achieve tangible results through comparison, we compared the original
version of the ERG with an enhanced version that incorporates the empirically extracted
verb subcategorisations. For these experiments we used the efficient HPSG parser PET
(Callmeier, 2000), launched from [incr tsdb()], an integrated package for evaluating parser
and grammar performance on test suites (Oepen, 1999). For all of the experiments
described in the following sections, we set a resource limit on the parser, restricting it to
produce at most 40,000 chart edges.
The evaluation aimed at two scenarios: theoretical enhancements from the
empirically extracted verb subcategorisations and realistic improvements. The former
was an attempt to ascertain an upper bound on parser coverage independent of
practical limitations of parser timeouts as a result of the increase in lexical ambiguity in
the grammar. The latter was performed in order to establish the real enhancements
given practical constraints such as available parsing space and time. The following
sections describe these two experiments.
80
82
84
86
88
90
92
94
96
98
100
1 2 3 4 5 6 7 8 9 10
Figure 5. Lexical Coverage by the 3 Verb Sets.
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 13
5.1 Theoretical Enhancement
Deep grammars tend to require large amounts of memory when parsing to
accommodate rich intermediate representations and their combination. High levels of
lexical ambiguity often result in parser timeouts and thus could affect the correct
estimate of the enhancements afforded by the extracted verb subcategorisations.
Whether PET produces an analysis for a test item is decided by several factors. While
the coverage of the grammar plays a central role, lexical complexities may cause the
parser to run out of memory. We therefore attempted to factor out effects caused by
increased lexical ambiguity in the acquired lexicon. For this purpose, a set of 1,133
sentences were specially selected from the test corpus with the condition that all the
verbs were represented in the ERG grammar. We then manipulated this sub-corpus,
replacing all of the nouns by sense, adjectives by nice, and adverbs by nicely. In doing
so, it was guaranteed that failure to produce an analysis by the parser was not due to
lexical combinatorial problems. Any observable improvements could be unambiguously
attributed to the extracted verb subcategorisations. We refer to the original handcrafted
grammar as the “ERG grammar” and the modified version with extracted verb
subcategorisations as the “ERG+Subcat grammar”.
Table 2 and the subsequent ones in this section have columns that provide
summary information about the parsing of the test set:
Table 2. Theoretical Coverage of the ERG.
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 14
• Aggregate sentence length breakdown
• Total items number of sentences
• Positive items sentences seen by the parser
• Word string average number of words
• Lexical items average number of lexical entries
• Parser analyses average number of analyses
• Total results total number of successfully parsed sentences
• Overall coverage percentage of successfully parsed sentences
As can be seen from Table 2, the average parse success rate of the ERG for the test set is
48.5%, with sentences of fewer than 5 words receiving the highest percentage of
analysis (60.8%). Lexical ambiguity is about 3.6 entries per word (33.71/9.32). In
contrast, Table 3 shows that the enhanced ERG achieved a considerably higher parse
success rate, i.e., 63.5%, an increase of 15 basis points over that achieved by the
original grammar. This time, 85.7% of the sentences of fewer than 5 words have at least
one analysis, an almost 25% increase over the coverage of the same group by the
original grammar. This improvement was achieved at a considerable increase in lexical
ambiguity, 6.98 entries per word versus 3.6 for the original grammar. Parsing speed
dropped from one second per sentence to about 4 seconds per sentence3, and the
average space requirement increased from 4.8 MB to 13.4 MB.
3 There are 10 sentences which disproportionally increase the average. This appears to be due to errors in the subcategorisations for a few common words such as ask.
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 15
Table 3. Theoretical Coverage of the ERG+Subcat Grammar.
It should be noted that many ‘sentences’ in the original test set do not conform to
standard English grammar; the version of PET that we used does not contain a
robustness component, so such sentences failed to get a parse. In addition, lexical
substitution was applied only to words tagged by RASP as either adverbs, adjectives,
and nouns; there remain still a large number of ill-formed words (e.g. *that*, dfgh, 17c)
that eventually caused parse failures. Moreover, the words we used for lexical
substitution are not grammatical in all contexts so in some cases substitution actually
reduced parser coverage. Therefore, the absolute coverage figures should not be taken
as definitive. However, since both setups used the same set of substituted sentences
these failures do not affect our measurement of the difference in coverage, the subject of
this experiment.
5.2 Realistic Enhancement
For the second experiment, we used a test set of 1,000 sentences randomly selected
from the original test corpus. Unlike the previous experiment, the sentences were not
lexically reduced. Since PET requires that every input token be known to the grammar,
we supplemented the grammar with an openclass lexicon of nouns, adjectives and
adverbs. These additional openclass items were extracted from the training section of
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 16
the mobile phone corpus (as tagged by RASP) and consisted of 29,328 nouns, 7,492
adjectives, and 2,159 adverbs, totalling 38,797 stems. To cater for lexical items in the
test set that are still not represented in the extended lexicon, the RASP tagging system
was used as a PoS pre-processor for the test set. Each input sentence is thus annotated
with PoS tags for all the tokens. For any lexical item unknown to the grammar, PET falls
back to the PoS tag assigned automatically by RASP and applies an appropriate generic
lexical description for the unknown word.
Table 4. Realistic Coverage of the ERG.
For this experiment, again, we have two grammars. The baseline statistics were
obtained with the original handcrafted ERG grammar supplemented by the additional
openclass items in the noun, adjective, and adverb classes. The enhanced grammar is
additionally supplemented with the extracted verb subcategorisations. As indicated in
Table 4, the original ERG grammar scored an average parse success rate of 52.9% with
the space limit of 40,000 edges. Four test items had to be deleted from the test set since
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 17
they consistently caused the parser to crash. The actual success rate is 52.7%.
Table 5. Realistic Coverage of the Enhanced Grammar.
The same version of the ERG grammar was then integrated with the automatically
extracted verb subcategorisations and the augmented version was applied to the same
set of test sentences with the same system settings. As shown in Table 5, the enhanced
grammar had 57.3% coverage, an increase of 4.4% over the original grammar without
the automatically extracted verb subcategorisations.
We thus calculate the overall coverage enhancement as 4.4%. As in our previous
experiment, we observed an expected increase in lexical ambiguity due to the increase
of lexical items in the lexicon, up from 3.05 entries per word to 4.87. CPU time increased
from 9.78 seconds per string for the original grammar to 21.78 for the enhanced
grammar, with space requirements increasing from 45 MB to 55 MB.
6 Conclusion
We presented a novel approach to improving the robustness of deep parsing, through
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 18
the unsupervised acquisition of a verb subcategorisation lexicon. The acquired
information helps deep parsing to produce detailed logical form representations that
shallow analysers are unable to.
We described the construction of a corpus in our target mobile phone domain,
and the acquisition of verb subcategorisations from the corpus through shallow
processing with the RASP system. The empirically selected verb stems show enhanced
lexical coverage of 94.4% against the 85.9% achieved by an existing handcrafted list of
verb stems.
We then reported experiments to establish theoretical and practical gains from
the use of extracted verb subcategorisations. When tested with 1,133 lexically reduced
sentences, we observed that the enhanced grammar had 15% better coverage than the
original grammar. Under practical constraints of parsing space and time, the enhanced
grammar had an overall parse success rate of 57.3%, a 4.4% increase over the original
grammar.
In our experiments, we used a completely automatic process to augment an
existing hand-crafted lexicon. If manual effort is available, a good strategy would be for
a linguist to check the acquired entries before they are added to the lexicon.
References Briscoe, E. and J. Carroll. 1997. Automatic extraction of subcategorization from corpora. In Proceedings
of the 5th ACL Conference on Applied Natural Language Processing, Washington, DC. 356–363.
Briscoe, E. and J. Carroll. 2002. Robust accurate statistical annotation of general text. In Proceedings of
the 3rd International Conference on Language Resources and Evaluation, Las Palmas, Gran Canaria.
1499–1504.
Callmeier, U. 2000. PET – A platform for experimentation with efficient HPSG processing techniques.
Natural Language Engineering, 6(1) (Special Issue on Efficient Processing with HPSG):99–108.
Copestake, A. and D. Flickinger. 2000. An open-source grammar development environment and broad-coverage English grammar using HPSG. In Proceedings of LREC 2000, Athens, Greece.
Grishman, R., C. Macleod and A. Meyers. 1994. Comlex syntax: Building a computational lexicon. In
Proceedings of the 15th International Conference on Computational Linguistics, Kyoto, Japan. 268–
272.
Evaluation of Robustness Improvements D 2.6
DeepThought IST-2001-37836 19
Korhonen, A. 2002. Subcategorization Acquisition. PhD thesis published as Techical Report UCAM-CL-
TR-530. Computer Laboratory, University of Cambridge.
Minnen, G., J. Carroll and D. Pearce. 2001. Applied morphological processing of English. Natural
Language Engineering, 7(3). 207-223.
Oepen, S. 1999. [incr tsdb()]: Competence and Performance Laboratory: User & Reference Manual,
Computational Linguistics, Saarland University, Saarbrücken, Germany.
Pollard, C. and I. Sag. 1994. Head-Driven Phrase Structure Grammar. Chicago University Press.