Persian Part Of Speech Tagging
Mostafa Keikha
Database Research Group (DBRG)
ECE Department, University of Tehran
Decision Trees

A decision tree (DT) is a tree in which the root and each internal node are labeled with a question, and the arcs leaving a node represent the possible answers to that question. Each leaf node represents a prediction of a solution to the problem. Decision trees are a popular technique for classification, where the leaf node indicates the class to which the corresponding tuple belongs.
Decision Trees

A decision tree model is a computational model consisting of three parts: the decision tree itself, an algorithm to create the tree, and an algorithm that applies the tree to data.
Creating the tree is the most difficult part. Applying the tree to data is basically a search, similar to search in a binary search tree (although a DT need not be binary).
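As a minimal sketch of this structure (not code from the slides), a tree can be stored as question nodes with one child per answer, and applying it is a root-to-leaf walk; all names here are illustrative:

```python
class Leaf:
    """Leaf node: the predicted class for any tuple that reaches it."""
    def __init__(self, label):
        self.label = label

class Node:
    """Internal node: a question plus one child per possible answer."""
    def __init__(self, question, children):
        self.question = question   # function mapping a tuple to an answer
        self.children = children   # dict mapping each answer to a subtree

def classify(tree, tuple_):
    """Apply the tree to data: follow the arc matching each answer."""
    while isinstance(tree, Node):
        tree = tree.children[tree.question(tuple_)]
    return tree.label
```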
Using DT in POS Tagging

Compute ambiguity classes:
- Each term may have different tags.
- The ambiguity class of a term is the set of all tags it can take.
- Compute the number of occurrences of each tag within each ambiguity class (a sketch of this computation follows the table).

Ambiguity class    Occurrences per tag
a b c d            a: 10   b: 20   c: 25   d: 40
b c d              b: 40   c: 39   d: 50
b d                b: 60   d: 55
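A sketch of this computation, assuming the training data is available as (word, tag) pairs (the slides do not specify the input format):

```python
from collections import Counter, defaultdict

def ambiguity_classes(tagged_corpus):
    """tagged_corpus: list of (word, tag) pairs.
    Returns a map from each ambiguity class (the frozenset of all tags
    a word can take) to a Counter of tag occurrences in that class."""
    word_tags = defaultdict(set)           # word -> set of possible tags
    for word, tag in tagged_corpus:
        word_tags[word].add(tag)
    class_counts = defaultdict(Counter)    # ambiguity class -> tag counts
    for word, tag in tagged_corpus:
        class_counts[frozenset(word_tags[word])][tag] += 1
    return class_counts
```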
Using DT in POS Tagging

Create the decision tree over the ambiguity classes: at each level, delete the tag with the minimum number of occurrences (a code sketch follows the example).

a b c d  (a: 10, b: 20, c: 25, d: 40)
b c d    (b: 40, c: 39, d: 50)    <- a had the minimum count, so it is deleted
b d      (b: 60, d: 55)           <- c deleted
b                                 <- d deleted
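A sketch of this construction, reusing the class_counts map from above; starting from the full ambiguity class, the least frequent tag is dropped at each level until one tag remains (the tree here degenerates to a chain):

```python
def prune_chain(start_class, class_counts):
    """Repeatedly drop the tag with the minimum occurrence count,
    yielding the chain of ambiguity classes down to a single tag."""
    current = frozenset(start_class)
    chain = [current]
    while len(current) > 1:
        counts = class_counts[current]
        weakest = min(current, key=lambda tag: counts[tag])
        current = current - {weakest}   # move to the smaller class
        chain.append(current)
    return chain

# With the counts from the slide, {a,b,c,d} -> {b,c,d} -> {b,d} -> {b}:
class_counts = {
    frozenset("abcd"): {"a": 10, "b": 20, "c": 25, "d": 40},
    frozenset("bcd"):  {"b": 40, "c": 39, "d": 50},
    frozenset("bd"):   {"b": 60, "d": 55},
}
print(prune_chain("abcd", class_counts))   # chain ending in {'b'}
```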
Using DT in POS Tagging

Advantages:
- Easy to understand
- Easy to implement

Disadvantage:
- Context independent
Using DT in POS Tagging

Known Tokens Results
(Percent: percentage of test tokens that are known, i.e., seen in training)

Run      Percent   Tokens     Correct    Accuracy
1        97.97     393923     363764     92.34%
2        98.06     355630     328965     92.50%
3        97.96     397528     367789     92.51%
4        97.92     410561     381578     92.94%
5        97.97     403079     372305     92.36%
Average  97.976    392144.2   362880.2   92.474%
POS tagging using HMMs

Let W be a sequence of words: W = w1, w2, ..., wn
Let T be the corresponding tag sequence: T = t1, t2, ..., tn

Task: find the T that maximizes P(T | W):
T' = argmax_T P(T | W)
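Written directly, this is a search over all tag sequences; a brute-force sketch (illustrative only, exponential in sentence length) makes the objective concrete, and the factorizations on the following slides are what make it tractable:

```python
from itertools import product

def brute_force_tag(words, tagset, score):
    """T' = argmax_T P(T | W) by exhaustive enumeration.
    score(words, tags) stands in for P(T | W); real taggers use the
    HMM factorization plus Viterbi decoding instead of this search."""
    return max(product(tagset, repeat=len(words)),
               key=lambda tags: score(words, tags))
```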
POS tagging using HMMs

By Bayes' rule:
P(T | W) = P(W | T) * P(T) / P(W)
Since P(W) does not depend on T, it can be dropped from the maximization:
T' = argmax_T P(W | T) * P(T)

Transition probability:
P(T) = P(t1) * P(t2 | t1) * P(t3 | t1 t2) * ... * P(tn | t1 ... tn-1)

Applying the trigram approximation:
P(T) = P(t1) * P(t2 | t1) * P(t3 | t1 t2) * ... * P(tn | tn-2 tn-1)

Introducing a dummy tag, $, to represent the beginning of a sentence:
P(T) = P(t1 | $) * P(t2 | $ t1) * P(t3 | t1 t2) * ... * P(tn | tn-2 tn-1)
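A sketch of the maximum-likelihood trigram estimates from a tagged corpus; the input format is an assumption, and two dummy '$' tags are used for padding so every position has a full history (the slide writes the first factor as P(t1 | $)):

```python
from collections import Counter

def trigram_transitions(tag_sequences):
    """tag_sequences: list of per-sentence tag lists.
    Returns P(t | u, v) estimated as count(u, v, t) / count(u, v)."""
    tri, bi = Counter(), Counter()
    for tags in tag_sequences:
        padded = ["$", "$"] + list(tags)   # dummy tags at sentence start
        for u, v, t in zip(padded, padded[1:], padded[2:]):
            tri[(u, v, t)] += 1
            bi[(u, v)] += 1
    return {(u, v, t): n / bi[(u, v)] for (u, v, t), n in tri.items()}
```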
POS tagging using HMMs

Smoothing the transition probabilities:
- Sparse data problem: many tag trigrams never occur in the training corpus, so their maximum-likelihood probability is zero.
- Linear interpolation method:
P'(ti | ti-2, ti-1) = λ1 P(ti) + λ2 P(ti | ti-1) + λ3 P(ti | ti-2, ti-1)
such that the λs sum to 1.
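A sketch of the interpolated estimate; the slide does not say how the λs are chosen (deleted interpolation is a common method), so fixed example values are used here:

```python
def smoothed_transition(t, u, v, p_uni, p_bi, p_tri,
                        lambdas=(0.1, 0.3, 0.6)):
    """P'(t | u, v) = λ1·P(t) + λ2·P(t | v) + λ3·P(t | u, v),
    where u, v are the two preceding tags and λ1 + λ2 + λ3 = 1.
    Unseen n-grams contribute probability 0."""
    l1, l2, l3 = lambdas
    return (l1 * p_uni.get(t, 0.0)
            + l2 * p_bi.get((v, t), 0.0)
            + l3 * p_tri.get((u, v, t), 0.0))
```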
POS tagging using HMMs

Emission probability:
P(W | T) ≈ P(w1 | t1) * P(w2 | t2) * ... * P(wn | tn)

Context dependency: to make the model more dependent on the context, the emission probability is instead calculated as:
P(W | T) ≈ P(w1 | $ t1) * P(w2 | t1 t2) * ... * P(wn | tn-1 tn)
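A sketch of estimating these context-dependent emissions by maximum likelihood, counting each word against its (previous tag, current tag) pair; the input format is an assumption:

```python
from collections import Counter

def emission_probs(sentences):
    """sentences: list of lists of (word, tag) pairs.
    Returns P(w | t_prev, t) = count(t_prev, t, w) / count(t_prev, t)."""
    triple, pair = Counter(), Counter()
    for sent in sentences:
        prev = "$"                        # dummy tag before the sentence
        for word, tag in sent:
            triple[(prev, tag, word)] += 1
            pair[(prev, tag)] += 1
            prev = tag
    return {k: n / pair[k[:2]] for k, n in triple.items()}
```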
POS tagging using HMMs

The same smoothing technique is applied to the emission probabilities:
P'(wi | ti-1 ti) = θ1 P(wi | ti) + θ2 P(wi | ti-1 ti)
where the θs sum to 1 and are different for different words.
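A sketch of the interpolated emission estimate; the slide states that the θs differ per word but not how they are set, so they are supplied as a per-word table here (the fallback split is an assumption):

```python
def smoothed_emission(w, t_prev, t, p_simple, p_context, thetas):
    """P'(w | t_prev, t) = θ1·P(w | t) + θ2·P(w | t_prev, t),
    with θ1 + θ2 = 1; thetas maps each word to its (θ1, θ2) pair."""
    th1, th2 = thetas.get(w, (0.5, 0.5))  # fallback is an assumption
    return (th1 * p_simple.get((t, w), 0.0)
            + th2 * p_context.get((t_prev, t, w), 0.0))
```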
POS tagging using HMMs

Known Tokens Results

Run      Percent   Tokens     Correct    Accuracy
1        98.07     394290     382211     96.94%
2        98.16     345913     345913     97.18%
3        98.04     397849     343894     96.96%
4        98.02     410970     398487     96.96%
5        98.07     403460     391475     97.03%
Average  98.072    390496.4   372396     97.01%
POS tagging using HMMs

Unknown Tokens Results
(Percent: percentage of test tokens that are unknown, i.e., unseen in training)

Run      Percent   Tokens    Correct   Accuracy
1        1.93      7760      5829      75.12%
2        1.84      6689      5357      80.09%
3        1.96      7956      6153      77.34%
4        1.98      8283      6435      77.69%
5        1.93      7945      6246      78.62%
Average  1.928     7726.6    6004      77.77%