CSA2050: Introduction to Computational Linguistics
Part of Speech (POS) Tagging II
Transformation-Based Tagging (Brill 1995)
February 2007 CSA3050: Tagging III and Chunking 2
3 Approaches to Tagging
1. Rule-Based Tagger: ENGTWOL Tagger (Voutilainen 1995)
2. Stochastic Tagger: HMM-based Tagger
3. Transformation-Based Tagger: Brill Tagger (Brill 1995)
April 2005 CLINT Lecture IV 3
Transformation-Based Tagging
A combination of rule-based and stochastic tagging methodologies: like rule-based tagging, because rules are used to specify tags in a certain environment; like stochastic tagging, because machine learning is used. Uses Transformation-Based Learning (TBL).
Input: a tagged corpus and a dictionary (with most frequent tags)
Transformation-Based Tagging
Basic Process: Set the most probable tag for each word as a start value, e.g. tag all occurrences of "race" with NN, since P(NN|race) = .98 and P(VB|race) = .02.
The set of possible transformations is limited by using a fixed number of rule templates, containing slots and allowing a fixed number of fillers to fill the slots.
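A minimal sketch of this dictionary-based initial annotation in Python (the word list and frequency counts are invented for illustration):

```python
# Initial-state annotator: tag every word with its most frequent tag.
# The dictionary below is a toy example; real counts would come from
# a tagged training corpus such as the Penn Treebank.
FREQ = {
    "race": {"NN": 98, "VB": 2},   # P(NN|race) = .98, P(VB|race) = .02
    "to":   {"TO": 100},
    "the":  {"DT": 100},
}

def initial_tag(words, default="NN"):
    """Assign each word its most frequent tag (default for unknowns)."""
    return [max(FREQ[w], key=FREQ[w].get) if w in FREQ else default
            for w in words]

print(initial_tag(["the", "race", "to"]))  # ['DT', 'NN', 'TO']
```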
Transformation-Based Error-Driven Learning
[Diagram after Brill (1996): unannotated text is passed to the initial-state annotator, producing annotated text; the learner compares this annotation with the truth, derives transformation rules, and uses them to retag the text.]
TBL Requirements
• Initial State Annotator
• List of allowable transformations
• Scoring function
• Search strategy
Initial State Annotation
Input: corpus, dictionary, frequency counts for each entry
Output: corpus tagged with most frequent tags
Transformations
Each transformation comprises a source tag, a target tag, and a triggering environment.
Example: NN → VB when the previous tag is TO.
More Examples
Source Tag | Target Tag | Triggering Environment
NN | VB | previous tag is TO
VBP | VB | one of the three previous tags is MD
JJR | RBR | next tag is JJ
VBP | VB | one of the two previous words is n't
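The rules in this table can be represented and applied as below; the representation (source tag, target tag, trigger predicate) is a sketch for illustration, not Brill's actual rule format:

```python
# A transformation as in the table above: change NN to VB when the
# previous tag is TO. Triggers are predicates over the tag sequence.
def apply_rule(tags, source, target, trigger):
    """Return a new tag sequence with the rule applied at every position."""
    out = list(tags)
    for i in range(len(tags)):
        if tags[i] == source and trigger(tags, i):
            out[i] = target
    return out

prev_is_TO = lambda tags, i: i > 0 and tags[i - 1] == "TO"

# "expected to race", with "race" mis-tagged NN by the dictionary pass:
tags = ["VBN", "TO", "NN"]
print(apply_rule(tags, "NN", "VB", prev_is_TO))  # ['VBN', 'TO', 'VB']
```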
Rule Templates (triggering environments)
Schema slots: t(i-3)  t(i-2)  t(i-1)  t(i)  t(i+1)  t(i+2)  t(i+3)
Nine schemas (1-9), each marking with * which of the neighbouring tag slots around the current tag t(i) its trigger examines.
Set of Possible Transformations
The set of possible transformations is enumerated by allowing every possible tag or word in every possible slot in every possible schema.
This set can get quite large.
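A back-of-the-envelope count illustrates how large this set gets; the figures below (45 tags as in the Penn Treebank tagset, 9 tag-based schemas) are illustrative assumptions:

```python
# Rough size of the candidate space for tag-triggered rules alone:
# every (source, target) tag pair combined with every schema and
# every possible trigger tag. Figures are illustrative.
n_tags = 45      # e.g. the Penn Treebank tagset
n_schemas = 9    # the nine tag-based templates

# source tag x target tag x schema x trigger tag
candidates = n_tags * n_tags * n_schemas * n_tags
print(candidates)  # 820125
```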
Rule Types and Instances: Brill's Templates
• Each rule begins with "change tag a to tag b"
• The variables a, b, z, w range over POS tags
• All possible variable substitutions are considered
Scoring Function
For a given tagging state of the corpus and a given transformation, for every word position in the corpus:
• If the rule applies and yields a correct tag, increment the score by 1.
• If the rule applies and yields an incorrect tag, decrement the score by 1.
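The scoring function can be sketched directly from this description; the trigger and the toy corpus below are invented for illustration:

```python
# Score a candidate transformation against a gold-standard tagging:
# +1 for each position where the rule fires and yields the gold tag,
# -1 for each position where it fires and yields a wrong tag.
def score(tags, gold, source, target, trigger):
    s = 0
    for i in range(len(tags)):
        if tags[i] == source and trigger(tags, i):
            s += 1 if target == gold[i] else -1
    return s

prev_is_TO = lambda tags, i: i > 0 and tags[i - 1] == "TO"

current = ["TO", "NN", "TO", "NN"]
gold    = ["TO", "VB", "TO", "NN"]
# The rule fixes position 1 (+1) but spoils position 3 (-1):
print(score(current, gold, "NN", "VB", prev_is_TO))  # 0
```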
The Basic Algorithm
1. Label every word with its most likely tag.
2. Repeat the following until a stopping condition is reached:
   • Examine every possible transformation, selecting the one that results in the most improved tagging.
   • Retag the data according to this rule.
   • Append this rule to the output list.
3. Return the output list.
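The greedy loop above can be sketched end-to-end. To keep the example small, candidate triggers are restricted to "previous tag is X"; the tag set, toy sentence, and stopping threshold are all assumptions:

```python
# Greedy TBL training loop, a minimal sketch. Candidate rules are
# (source, target, previous-tag) triples only; real TBL enumerates
# many more templates.
def train_tbl(tags, gold, tagset, min_score=1):
    tags = list(tags)
    learned = []
    while True:
        best, best_score = None, 0
        for src in tagset:
            for tgt in tagset:
                if tgt == src:
                    continue
                for prev in tagset:
                    s = sum((1 if tgt == gold[i] else -1)
                            for i in range(1, len(tags))
                            if tags[i] == src and tags[i - 1] == prev)
                    if s > best_score:
                        best, best_score = (src, tgt, prev), s
        if best is None or best_score < min_score:
            return learned
        src, tgt, prev = best
        # Retag the corpus with the winning rule, then keep it.
        tags = [tgt if tags[i] == src and i > 0 and tags[i - 1] == prev
                else tags[i] for i in range(len(tags))]
        learned.append(best)

tagset = ["TO", "NN", "VB", "DT"]
initial = ["DT", "NN", "TO", "NN"]   # dictionary pass: "race" tagged NN
gold    = ["DT", "NN", "TO", "VB"]
print(train_tbl(initial, gold, tagset))  # [('NN', 'VB', 'TO')]
```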
Examples of learned rules
TBL: Remarks
Execution speed: the TBL tagger is slower than the HMM approach.
Learning speed is slow: Brill's implementation takes over a day (600k tokens).
BUT … it learns a small number of simple, non-stochastic rules, and can be made to work faster with Finite State Transducers.
Tagging Unknown Words
New words are added to (newspaper) language at 20+ per month, plus many proper names … Unknown words increase error rates by 1-2%.
Methods:
• Assume the unknowns are nouns.
• Assume the unknowns have a probability distribution similar to words occurring once in the training set.
• Use morphological information, e.g. words ending in -ed tend to be tagged VBN.
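The morphological heuristic can be sketched as a simple suffix lookup; the suffix-to-tag table below is an illustrative assumption, not a learned model:

```python
# Morphology-based guessing for unknown words: pick a tag from the
# word's suffix, falling back to the "assume it's a noun" heuristic.
SUFFIX_TAGS = [("ed", "VBN"), ("ing", "VBG"), ("ly", "RB"), ("s", "NNS")]

def guess_tag(word, default="NN"):
    """Guess a tag for an unknown word from its suffix."""
    for suffix, tag in SUFFIX_TAGS:
        if word.endswith(suffix):
            return tag
    return default

print(guess_tag("defenestrated"))  # VBN
print(guess_tag("blorf"))          # NN
```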
Evaluation
The result is compared with a manually coded "Gold Standard". Typically accuracy reaches 95-97%.
This may be compared with the result for a baseline tagger (one that uses no context).
Important: 100% accuracy is impossible even for human annotators.
A word of caution
95% accuracy: every 20th token wrong.
96% accuracy: every 25th token wrong.
Is that really an improvement of 25%? Going from 95% to 96% accuracy is in fact a 20% relative reduction in error.
97% accuracy: every 33rd token wrong.
98% accuracy: every 50th token wrong.
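The arithmetic behind these figures, sketched in Python: at accuracy a, roughly one token in 1/(1-a) is wrong, and moving from 95% to 96% removes a fifth of the errors:

```python
# Tokens per error at a given accuracy, and the relative error
# reduction when accuracy improves.
def tokens_per_error(accuracy):
    return round(1 / (1 - accuracy))

def error_reduction(old_acc, new_acc):
    """Relative reduction in error rate, as a fraction."""
    return ((1 - old_acc) - (1 - new_acc)) / (1 - old_acc)

print(tokens_per_error(0.95))       # 20
print(tokens_per_error(0.96))       # 25
print(error_reduction(0.95, 0.96))  # ~0.2: a 20% relative error reduction
```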
How much training data is needed?
When working with the STTS tagset (50 tags) we observed:
• a strong increase in accuracy when training on 10,000, 20,000, …, 50,000 tokens,
• a slight increase in accuracy when training on up to 100,000 tokens,
• hardly any increase thereafter.
Summary
Tagging decisions are conditioned on a wider range of events than in the HMM models mentioned earlier. For example, left and right context can be used simultaneously.
Learning and tagging are simple, intuitive and understandable.
Transformation-based learning has also been applied to sentence parsing.
The Three Approaches Compared
Rule-Based
• Hand-crafted rules
• It takes too long to come up with good rules
• Portability problems
Stochastic
• Find the sequence with the highest probability (Viterbi algorithm)
• Result of training not accessible to humans
• Large volume of intermediate results
Transformation-Based
• Rules are learned
• Small number of rules
• Rules can be inspected and modified by humans