Text Classification

Elnaz Delpisheh. Introduction to Computational Linguistics, York University, Department of Computer Science and Engineering. August 24, 2014.
![Page 1: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/1.jpg)
Elnaz Delpisheh
Introduction to Computational Linguistics
York University, Department of Computer Science and Engineering
April 24, 2023

Text Classification
![Page 2: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/2.jpg)
Outline
- Definition and applications
- Representing texts
- Pre-processing the text
- Text classification methods
  - Naïve Bayes
  - Voted Perceptron
  - Support Vector Machines
  - Decision Trees
  - K-nearest neighbor
  - Rocchio's algorithm
  - Neural Networks
- Performance evaluation
- Subjective text classification
![Page 3: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/3.jpg)
![Page 4: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/4.jpg)
Text Classification - Definition
Text classification is the assignment of text documents to one or more predefined categories based on their content.
The classifier:
- Input: a set of m hand-labeled documents (x1,y1), ..., (xm,ym)
- Output: a learned classifier f: x → y

[Diagram: a text document is fed to the classifier, which assigns it to Class A, Class B, or Class C.]
![Page 5: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/5.jpg)
Text Classification - Applications
- Classify news stories as World, US, Business, SciTech, Sports, Entertainment, Health, Other.
- Classify business names by industry.
- Classify student essays as A, B, C, D, or F.
- Classify email as Spam, Other.
- Classify email to tech staff as Mac, Windows, ..., Other.
- Classify pdf files as ResearchPaper, Other.
- Classify documents as WrittenByReagan, GhostWritten.
- Classify movie reviews as Favorable, Unfavorable, Neutral.
- Classify technical papers as Interesting, Uninteresting.
- Classify jokes as Funny, NotFunny.
- Classify web sites of companies by Standard Industrial Classification (SIC) code.
![Page 6: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/6.jpg)
Text Classification - Example
Best-studied benchmark: Reuters-21578 newswire stories. 9603 train, 3299 test documents, 80-100 words each, 93 classes.

ARGENTINE 1986/87 GRAIN/OILSEED REGISTRATIONS
BUENOS AIRES, Feb 26
Argentine grain board figures show crop registrations of grains, oilseeds and their products to February 11, in thousands of tonnes, showing those for future shipments month, 1986/87 total and 1985/86 total to February 12, 1986, in brackets:
• Bread wheat prev 1,655.8, Feb 872.0, March 164.6, total 2,692.4 (4,161.0).
• Maize Mar 48.0, total 48.0 (nil).
• Sorghum nil (nil)
• Oilseed export registrations were:
• Sunflowerseed total 15.0 (7.9)
• Soybean May 20.0, total 20.0 (nil)
The board also detailed export registrations for subproducts, as follows....

Categories: grain, wheat (of 93 binary choices)
![Page 7: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/7.jpg)
![Page 8: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/8.jpg)
Text Classification - Representing Texts

[The ARGENTINE 1986/87 GRAIN/OILSEED REGISTRATIONS document shown above.]

f(x) = y: what is the best representation for the document x being classified? (the simplest useful one)
![Page 9: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/9.jpg)
Pre-processing the Text
- Removing stop words: punctuation, prepositions, pronouns, etc.
- Stemming: walk, walker, walked, walking all reduce to the stem "walk".
- Indexing
- Dimensionality reduction
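A minimal sketch of the first two steps in Python; the stop-word list and suffix rules below are illustrative stand-ins, not the ones used in the lecture:

```python
# Minimal pre-processing sketch: stop-word removal plus crude suffix stemming.
# The stop-word list and suffix rules are illustrative only.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "it", "for"}

def strip_stop_words(tokens):
    """Drop common function words such as prepositions and articles."""
    return [t for t in tokens if t not in STOP_WORDS]

def crude_stem(token):
    """Map walk, walker, walked, walking to the shared stem 'walk'."""
    for suffix in ("ing", "ed", "er", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = ["the", "walker", "walked", "in", "the", "park"]
print([crude_stem(t) for t in strip_stop_words(tokens)])  # ['walk', 'walk', 'park']
```

A real system would use a curated stop list and a proper stemmer (e.g. Porter) instead of these toy rules.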
![Page 10: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/10.jpg)
Representing text: a list of words

[The ARGENTINE 1986/87 GRAIN/OILSEED REGISTRATIONS document shown above.]

f(x) = y, where x = (argentine, 1986, 1987, grain, oilseed, registrations, buenos, aires, feb, 26, argentine, grain, board, figures, show, crop, registrations, of, grains, oilseeds, and, their, products, to, february, 11, in, …
![Page 11: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/11.jpg)
Pre-processing the Text - Indexing
Using the vector space model.
![Page 12: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/12.jpg)
Indexing (Cont.)
![Page 13: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/13.jpg)
Indexing (Cont.)
- tfc-weighting: considers the normalized length of documents (M).
- ltc-weighting: considers the logarithm of the word frequency, to reduce the effect of large differences in frequencies.
- Entropy weighting
![Page 14: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/14.jpg)
Indexing - Word Frequency Weighting

[The ARGENTINE 1986/87 GRAIN/OILSEED REGISTRATIONS document shown above.]

Categories: grain, wheat

| word       | freq |
|------------|------|
| grain(s)   | 3    |
| oilseed(s) | 2    |
| total      | 3    |
| wheat      | 1    |
| maize      | 1    |
| soybean    | 1    |
| tonnes     | 1    |
| ...        | ...  |

If the order of words doesn't matter, x can be a vector of word frequencies. "Bag of words": a long sparse vector x = (…, fi, …) where fi is the frequency of the i-th word in the vocabulary.
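The bag-of-words mapping can be sketched in a few lines; the regex tokenizer here is an illustrative simplification:

```python
import re
from collections import Counter

# Bag-of-words sketch: a document becomes a sparse mapping from word to
# frequency. The tokenizer (lowercase alphabetic runs) is deliberately simple.
def bag_of_words(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

doc = "Argentine grain board figures show crop registrations of grains, oilseeds"
freqs = bag_of_words(doc)
print(freqs["grain"], freqs["grains"])  # 1 1
```

A `Counter` is already a sparse vector: words that never occur are simply absent, and lookups for them return 0.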
![Page 15: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/15.jpg)
Pre-processing the Text - Dimensionality Reduction
- Feature selection: attempts to remove non-informative words.
  - Document frequency thresholding
  - Information gain
- Latent Semantic Indexing
  - Singular Value Decomposition
![Page 16: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/16.jpg)
![Page 17: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/17.jpg)
Text Classification - Methods
Naïve Bayes, Voted Perceptron, Support Vector Machines, Decision Trees, K-nearest neighbor, Rocchio's algorithm, Neural Networks.
![Page 18: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/18.jpg)
![Page 19: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/19.jpg)
Text Classification - Naïve Bayes
Represent document x as a list of words w1, w2, …
For each class y, build a probabilistic model Pr(X|Y=y) of "documents":
- Pr(X={argentine, grain, ...} | Y=wheat) = ....
- Pr(X={stocks, rose, in, heavy, ...} | Y=nonWheat) = ....
To classify, find the y which was most likely to generate x, i.e., which gives x the best score according to Pr(x|y):
f(x) = argmax_y Pr(x|y) Pr(y)
![Page 20: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/20.jpg)
Text Classification - Naïve Bayes
How to estimate Pr(X|Y)? Simplest useful process to generate a bag of words:
- pick word 1 according to Pr(W|Y)
- repeat for word 2, 3, ....
- each word is generated independently of the others (which is clearly not true), but this means

Pr(w1, ..., wn | Y=y) = ∏_{i=1}^{n} Pr(wi | Y=y)

How to estimate Pr(W|Y)?
![Page 21: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/21.jpg)
Text Classification - Naïve Bayes
How to estimate Pr(X|Y)?

Pr(w1, ..., wn | Y=y) = ∏_{i=1}^{n} Pr(wi | Y=y)

Estimate Pr(w|y) by looking at the data:

Pr(W=w | Y=y) = (count(W=w and Y=y) + mp) / (count(Y=y) + m)

Terms:
- This Pr(W|Y) is a multinomial distribution.
- This use of m and p is a Dirichlet prior for the multinomial.
![Page 22: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/22.jpg)
Text Classification - Naïve Bayes
How to estimate Pr(X|Y)?

Pr(w1, ..., wn | Y=y) = ∏_{i=1}^{n} Pr(wi | Y=y)

Pr(W=w | Y=y) = (count(W=w and Y=y) + 0.5) / (count(Y=y) + 1)

This Pr(W|Y) is a multinomial distribution. This use of m and p is a Dirichlet prior for the multinomial; for instance, m=1, p=0.5.
![Page 23: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/23.jpg)
Text Classification - Naïve Bayes
Putting this together:

for each document xi with label yi:
    for each word wij in xi:
        count[wij][yi]++
        count[yi]++
        count++

To classify a new x = w1...wn, pick the y with the top score:

score(w1...wn, y) = lg(count[y] / count) + ∑_{i=1}^{n} lg((count[wi][y] + 0.5) / (count[y] + 1))

Key point: we only need counts for words that actually appear in document x.
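The counting-and-scoring procedure above can be sketched directly in Python; the two-document toy corpus is an illustrative stand-in, and the 0.5 and 1 in the score are the slides' m=1, p=0.5 smoothing:

```python
import math
from collections import defaultdict

# count[w][y] = occurrences of word w in class y; count[y] = word tokens in
# class y; count = word tokens overall, as in the slide's pseudocode.
word_count = defaultdict(int)
class_count = defaultdict(int)
total = 0

def train(docs):
    """docs: list of (list_of_words, label) pairs."""
    global total
    for words, y in docs:
        for w in words:
            word_count[w, y] += 1
            class_count[y] += 1
            total += 1

def score(words, y):
    """lg(count[y]/count) + sum_i lg((count[wi][y]+0.5)/(count[y]+1))."""
    s = math.log2(class_count[y] / total)
    for w in words:
        s += math.log2((word_count[w, y] + 0.5) / (class_count[y] + 1))
    return s

def classify(words):
    return max(class_count, key=lambda y: score(words, y))

train([("wheat grain harvest".split(), "grain"),
       ("stocks rose in heavy trading".split(), "other")])
print(classify(["grain"]))  # grain
```

Note that a word never seen with a class still gets the nonzero smoothed probability 0.5/(count[y]+1), so a single unseen word cannot zero out a class.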
![Page 24: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/24.jpg)
Naïve Bayes for SPAM filtering
![Page 25: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/25.jpg)
Naïve Bayes - Summary
Pros:
- Very fast and easy to implement
- Well-understood formally and experimentally
Cons:
- Seldom gives the very best performance
- "Probabilities" Pr(y|x) are not accurate; e.g., Pr(y|x) decreases with the length of x, and probabilities tend to be close to zero or one
![Page 26: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/26.jpg)
![Page 27: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/27.jpg)
The Curse of Dimensionality
How can you learn with so many features?
- For efficiency (time and memory), use sparse vectors.
- Use simple classifiers (linear or loglinear).
- Rely on wide margins.
![Page 28: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/28.jpg)
Margin-based Learning

[Figure: two classes of points (+ and −) separated by a line with a wide margin.]

The number of features matters: but not if the margin is sufficiently wide and examples are sufficiently close to the origin (!!)
![Page 29: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/29.jpg)
Text Classification - Voted Perceptron
- Text documents: x = <x1, x2, ..., xk>; k = number of features
- Two classes = {yes, no}
- w = <w1, w2, ..., wk>
Objective: learn a weight vector w and a threshold θ. Predict "yes" if

∑_{i=1}^{k} w_i x_i > θ

and "no" otherwise.
If the answer is correct: c_k++ (the current weight vector collects one more vote).
If incorrect, there is a mistake, and a correction is made:
w_{k+1} = w_k + y_i x_i;  c_{k+1} = 1;  k = k + 1
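A minimal sketch of the underlying perceptron rule (the plain, unvoted variant; the ±1 labels, toy data, and epoch count are illustrative — the voted variant would additionally keep the vote counts c_k and combine all intermediate weight vectors at prediction time):

```python
# Perceptron sketch: predict +1 ("yes") when the weighted sum exceeds the
# threshold, and move the weights toward y*x on each mistake.
def predict(w, x, theta=0.0):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else -1

def train_perceptron(examples, n_features, epochs=10):
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in examples:
            if predict(w, x) != y:       # mistake: correct the weights
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# Toy linearly separable data: feature vectors for two "documents".
data = [([2.0, 0.0], 1), ([0.0, 2.0], -1)]
w = train_perceptron(data, 2)
print(predict(w, [3.0, 0.0]))  # 1
```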
![Page 30: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/30.jpg)
Voted Perceptron-Error Correction
![Page 31: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/31.jpg)
![Page 32: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/32.jpg)
Lessons of the Voted Perceptron

The Voted Perceptron shows that you can make few mistakes while incrementally learning as you pass over the data, if the examples x are small (bounded by R) and some u exists that has a large margin.

Why not look for this line directly?

Support vector machines: find u to maximize the margin.
![Page 33: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/33.jpg)
Text Classification - Support Vector Machines
Facts about support vector machines:
- The "support vectors" are the xi's that touch the margin.
- The classifier can be written as f(x) = sign(∑_i α_i y_i (x_i · x)), where the xi's are the support vectors.
- Support vector machines often give very good results on topical text classification.

[Figure: + and − examples with the maximum-margin separator; every example satisfies the margin constraint y_i (u · x_i) ≥ γ.]
![Page 34: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/34.jpg)
Support Vector Machine Results
![Page 35: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/35.jpg)
![Page 36: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/36.jpg)
Text Classification - Decision Trees
Objective of decision tree learning:
- Learn a decision tree from a set of training data.
- The decision tree can then be used to classify new examples.
Decision tree learning algorithms:
- ID3 (Quinlan, 1986)
- C4.5 (Quinlan, 1993)
- CART (Breiman, Friedman, et al., 1983)
- etc.
![Page 37: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/37.jpg)
Decision Trees - CART
Creating the tree:
- Split the set of training vectors. The best splitter has the purest children; the diversity measure is entropy:

Entropy(S) = −∑_{i=1}^{k} P(Ci) log P(Ci)

where S is the set of examples, k is the number of classes, and P(Ci) is the proportion of examples in S that belong to Ci.
- To find the best splitter, each component of the document vector is considered in turn.
- This process is repeated until no sets can be partitioned any further.
- Each leaf is then assigned a class.
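The entropy measure above is easy to compute from the class counts at a node; a pure node scores 0 and an even two-class split scores 1:

```python
import math

# Entropy of a node's class distribution: -sum_i P(Ci) * log2 P(Ci).
# Lower entropy means purer children, which is what the best splitter gives.
def entropy(class_counts):
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

print(entropy([5, 5]))  # 1.0 (maximally impure two-class node)
```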
![Page 38: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/38.jpg)
Decision Trees-Example
![Page 39: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/39.jpg)
![Page 40: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/40.jpg)
TF-IDF Representation
The results above use a particular way to represent documents: bag of words with TF-IDF weighting.
- "Bag of words": a long sparse vector x = (…, fi, …) where fi is the "weight" of the i-th word in the vocabulary.
- For a word w that appears in DF(w) documents out of N in a collection, and appears TF(w) times in the document being represented, use the weight:

f_i = log(TF(w) + 1) × log(N / DF(w))

- Also normalize all vector lengths (||x||) to 1.
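The weighting and normalization steps can be sketched as follows (the word counts in the example are made up):

```python
import math

# TF-IDF weighting as on the slide: f_i = log(TF(w)+1) * log(N / DF(w)),
# followed by normalizing the document vector to unit length.
def tfidf_vector(term_freqs, doc_freqs, n_docs):
    """term_freqs: {word: TF in this doc}; doc_freqs: {word: DF in collection}."""
    weights = {w: math.log(tf + 1) * math.log(n_docs / doc_freqs[w])
               for w, tf in term_freqs.items()}
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    return {w: v / norm for w, v in weights.items()}

vec = tfidf_vector({"grain": 3, "the": 10}, {"grain": 5, "the": 100}, 100)
print(vec["the"])  # 0.0: a word that occurs in every document gets zero weight
```

The IDF factor is what pushes uninformative high-DF words toward zero, doing some of the work of stop-word removal automatically.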
![Page 41: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/41.jpg)
TF-IDF Representation
The TF-IDF representation is an old trick from the information retrieval community, and often improves the performance of other algorithms, such as:
- the K-nearest neighbor algorithm
- Rocchio's algorithm
![Page 42: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/42.jpg)
Text Classification - K-nearest Neighbor
The nearest neighbor algorithm.
Objective: to classify a new object x̃, find the object in the training set that is most similar, then assign the category of this nearest neighbor.
- Determine the largest similarity with any element in the training set:
  sim_max = max_{x ∈ X} sim(x, x̃)
- Collect the subset of X that has the highest similarity with x̃:
  N = { x ∈ X | sim(x, x̃) = sim_max }
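A 1-nearest-neighbor sketch over bag-of-words vectors, using cosine similarity as sim (the two training documents are illustrative):

```python
import math

# Nearest-neighbor sketch: cosine similarity between sparse word-count
# vectors, then assign the label of the most similar training document.
def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_neighbor(train, x):
    """train: list of (vector, label); return the label of the closest vector."""
    return max(train, key=lambda vl: cosine(vl[0], x))[1]

train = [({"grain": 2, "wheat": 1}, "grain"),
         ({"stocks": 3, "rose": 1}, "finance")]
print(nearest_neighbor(train, {"wheat": 1, "grain": 1}))  # grain
```

The k > 1 variant would take the majority label among the k most similar training documents instead of just the single best one.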
![Page 43: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/43.jpg)
![Page 44: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/44.jpg)
Text Classification - Rocchio's Algorithm
Classify using the distance to the centroid of the documents from each class; assign a test document to the class with maximum similarity. The prototype vector for a class is

w = α ∑_{z ∈ (+)} z − β ∑_{z ∈ (−)} z

where (+) and (−) are the positive and negative training documents for the class.
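A simplified sketch of the centroid classifier (positive examples only, i.e. β = 0; the vocabulary indexing and toy vectors are illustrative):

```python
# Rocchio sketch: one centroid per class (the mean of its training vectors);
# classify by maximum dot-product similarity to a centroid.
def centroid(vectors, vocab_size):
    c = [0.0] * vocab_size
    for v in vectors:
        for i, x in enumerate(v):
            c[i] += x / len(vectors)
    return c

def rocchio_classify(centroids, x):
    """centroids: {label: centroid vector}; return the most similar class."""
    sim = lambda c: sum(ci * xi for ci, xi in zip(c, x))
    return max(centroids, key=lambda y: sim(centroids[y]))

centroids = {
    "grain": centroid([[2, 0, 1], [1, 0, 0]], 3),
    "finance": centroid([[0, 3, 0], [0, 1, 1]], 3),
}
print(rocchio_classify(centroids, [1, 0, 1]))  # grain
```

With TF-IDF weights and unit-length vectors, the dot product here is exactly cosine similarity.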
![Page 45: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/45.jpg)
Support Vector Machine Results
![Page 46: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/46.jpg)
![Page 47: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/47.jpg)
Text Classification - Neural Networks
A neural network text classifier is a collection of interconnected neurons, representing documents, that incrementally learns from its environment to categorize the documents.

[Diagram: a single neuron with inputs X1, X2, ..., Xn and weights W1, W2, ..., Wn computes u = ∑ Xi Wi and outputs F(u), where F(u) = 1 / (1 + e^(−u)); the outputs y1, y2, y3 are trained against the training set.]
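The single neuron in the diagram is just a weighted sum passed through the logistic activation:

```python
import math

# One neuron: u = sum_i x_i * w_i, then the logistic activation
# F(u) = 1 / (1 + e^(-u)), which squashes u into (0, 1).
def neuron(xs, ws):
    u = sum(x * w for x, w in zip(xs, ws))
    return 1.0 / (1.0 + math.exp(-u))

print(neuron([1.0, 2.0], [0.0, 0.0]))  # 0.5: zero weights give no preference
```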
![Page 48: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/48.jpg)
Text Classification - Neural Networks

[Diagram: a multilayer network with input units IM1, IM2, ..., IMn, hidden units H1, H2, ..., Hm (weights u_ji into the hidden layer, v_j out of it), and output units OS1(r), OS2(r), OS3(r); training alternates forward propagation with backward propagation of errors.]
![Page 49: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/49.jpg)
![Page 50: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/50.jpg)
Performance Evaluation
Performance of a classification algorithm can be evaluated in the following aspects:
- Predictive performance: how accurate is the learned model in prediction?
- Complexity of the learned model
- Run time: time to build the model, and time to classify examples using the model
Here we focus on predictive performance.
![Page 51: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/51.jpg)
Predictive Performance Evaluation
How predictive is the model? The most common performance measures:
- Classification accuracy on a dataset
- Recall, precision, F-measure, fallout, accuracy, error, ...
- Classification error rate (1 − accuracy)
Error on the training data is not a good indicator of performance on future data.
![Page 52: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/52.jpg)
Performance Measures

|          | Correct | Incorrect |
|----------|---------|-----------|
| Assigned | a       | b         |
| Rejected | c       | d         |

Accuracy = (a + d) / (a + b + c + d)
Error = (b + c) / (a + b + c + d)
Precision = a / (a + b)
Recall = a / (a + c)
F1 = 2 × Precision × Recall / (Precision + Recall)
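Computing the measures from the contingency counts is direct (the counts in the example call are made up):

```python
# Performance measures from the contingency table: a = assigned & correct,
# b = assigned & incorrect, c = rejected & correct, d = rejected & incorrect.
def measures(a, b, c, d):
    precision = a / (a + b)
    recall = a / (a + c)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (a + d) / (a + b + c + d)
    return precision, recall, f1, accuracy

p, r, f1, acc = measures(a=90, b=10, c=10, d=890)
print(round(p, 2), round(r, 2), round(f1, 2), round(acc, 2))  # 0.9 0.9 0.9 0.98
```

Note how accuracy (0.98) looks far better than precision and recall when the negative class dominates, which is why precision/recall/F1 are preferred for skewed text categories.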
![Page 53: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/53.jpg)
Performance Measures (Cont.)
Macro-averaging:
- Gives equal weight to each category
- Computes the performance measures per category
- Computes the global means by averaging the above results
Micro-averaging:
- Gives equal weight to each document
- Constructs a global contingency table, then computes the performance measure
![Page 54: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/54.jpg)
Macro/Micro-averaging Example

Class 1:
|           | Yes | No  |
|-----------|-----|-----|
| Call: yes | 10  | 10  |
| Call: no  | 10  | 970 |

Class 2:
|           | Yes | No  |
|-----------|-----|-----|
| Call: yes | 90  | 10  |
| Call: no  | 10  | 890 |

Pooled table:
|           | Yes | No   |
|-----------|-----|------|
| Call: yes | 100 | 20   |
| Call: no  | 20  | 1860 |

Macro-averaged precision: [10/(10+10) + 90/(90+10)] / 2 = (0.5 + 0.9)/2 = 0.7
Micro-averaged precision: 100/(100+20) = 0.83
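The same example, computed both ways:

```python
# Macro- vs micro-averaged precision for the two per-class tables above:
# (true positives, false positives) for class 1 and class 2.
tables = [(10, 10),   # class 1: precision 10/20 = 0.5
          (90, 10)]   # class 2: precision 90/100 = 0.9

# Macro: average the per-class precisions (equal weight per category).
macro = sum(tp / (tp + fp) for tp, fp in tables) / len(tables)
# Micro: pool the counts first (equal weight per document).
micro = sum(tp for tp, _ in tables) / sum(tp + fp for tp, fp in tables)

print(round(macro, 2), round(micro, 2))  # 0.7 0.83
```

The gap between the two (0.7 vs 0.83) shows how micro-averaging lets the large, easy class 2 dominate, while macro-averaging exposes the weak performance on class 1.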
![Page 55: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/55.jpg)
Estimating Error Rate
Hold-out estimation:
- Randomly split the data set into a training set and a test set, e.g., training set (2/3), test set (1/3).
- Build the model from the training set and estimate the error on the test set.
Repeated hold-out estimation:
- The hold-out estimate can be made more reliable by repeating the process with different random splits.
- For each split, an error rate is collected; an overall error rate is obtained by averaging the error rates over the different splits.
![Page 56: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/56.jpg)
Estimating Error Rate (Cont.)
Cross-validation:
- Randomly divide the data set into k subsets of equal size.
- Use k−1 subsets as training data and one subset as test data; do this k times, using each subset in turn for testing.
- The error rates are averaged to yield an overall error estimate.
This is called k-fold cross-validation. The standard method for evaluation is 10-fold cross-validation. Why 10? Extensive experiments have shown that this is the best choice for getting an accurate estimate.
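The fold construction and averaging can be sketched as follows; the `error_fn` callback stands in for whatever train-and-evaluate step the learner provides:

```python
import random

# k-fold cross-validation sketch: split the example indices into k folds,
# train on k-1 folds, test on the held-out fold, and average the k error rates.
def kfold_indices(n, k, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]   # every index lands in exactly one fold

def cross_validate(error_fn, n, k=10):
    """error_fn(train_idx, test_idx) -> error rate on the held-out fold."""
    folds = kfold_indices(n, k)
    errors = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        errors.append(error_fn(train, test))
    return sum(errors) / k

folds = kfold_indices(20, 10)
print(sorted(sum(folds, [])) == list(range(20)))  # True: a proper partition
```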
![Page 57: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/57.jpg)
Overfitting
Overfitting occurs when a model begins to memorize the training data rather than learning to generalize from it. The model can learn to perfectly predict the training data, but it fails to predict unseen data.
Why?
- Insufficient examples
- An overly complex model
- Too many features
- Noisy data
![Page 58: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/58.jpg)
Underfitting
Underfitting occurs when a model is too simple to capture the relationships in the input data. It fails to work well on both the training data and unseen data.
![Page 59: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/59.jpg)
![Page 60: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/60.jpg)
Text Classification: Examples
- Classify news stories as World, US, Business, SciTech, Sports, Entertainment, Health, Other: topical classification, few classes.
- Classify email to tech staff as Mac, Windows, ..., Other: topical classification, few classes.
- Classify email as Spam, Other: topical classification, few classes; an adversary may try to defeat your categorization scheme.
- Add MeSH terms to Medline abstracts, e.g. "Conscious Sedation" [E03.250]: topical classification, many classes.
- Classify web sites of companies by Standard Industrial Classification (SIC) code: topical classification, many classes.
- Classify business names by industry.
- Classify student essays as A, B, C, D, or F.
- Classify pdf files as ResearchPaper, Other.
- Classify documents as WrittenByReagan, GhostWritten.
- Classify movie reviews as Favorable, Unfavorable, Neutral.
- Classify technical papers as Interesting, Uninteresting.
- Classify jokes as Funny, NotFunny.
![Page 61: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/61.jpg)
Classifying Reviews as Favorable or Not

Dataset: 410 reviews from Epinions (Autos, Banks, Movies, Travel Destinations).
Learning method:
- Extract two-word phrases containing an adverb or adjective (e.g. "unpredictable plot").
- Classify reviews based on the average Semantic Orientation (SO) of the phrases found, computed using queries to a web search engine.
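The slides do not spell out the SO computation; the following is a hedged sketch of the Turney-style PMI formulation it refers to, where the hit counts would come from search-engine queries (all numbers here are made up):

```python
import math

# Semantic Orientation sketch: SO(phrase) = PMI(phrase, "excellent") -
# PMI(phrase, "poor"), which reduces to a log-ratio of co-occurrence counts.
# In the original work the hits(...) values are search-engine result counts.
def so(hits_near_excellent, hits_near_poor, hits_excellent, hits_poor):
    return math.log2((hits_near_excellent * hits_poor) /
                     (hits_near_poor * hits_excellent))

# A review is called Favorable when the average SO of its phrases is positive.
phrase_sos = [so(200, 10, 1000, 1000),   # a phrase seen near "excellent"
              so(5, 50, 1000, 1000)]     # a phrase seen near "poor"
avg = sum(phrase_sos) / len(phrase_sos)
print("Favorable" if avg > 0 else "Unfavorable")  # Favorable
```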
![Page 62: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/62.jpg)
Classifying Reviews as Favorable or Not
![Page 63: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/63.jpg)
Classifying Movie Reviews
Idea: like Turney, focus on “polar” sections: subjective sentences
![Page 64: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/64.jpg)
"Fearless" allegedly marks Li's last turn as a martial arts movie star--at 42, the ex-wushu champion-turned-actor is seeking a less strenuous on-camera life--and it's based on the life story of one of China's historical sports heroes, Huo Yuanjia. Huo, a genuine legend, lived from 1868-1910, and his exploits as a master of wushu (the general Chinese term for martial arts) raised national morale during the period when beleaguered China was derided as "The Sick Man of the East."
"Fearless" shows Huo's life story in highly fictionalized terms, though the movie's most dramatic sequence--at the final Shanghai tournament, where Huo takes on four international champs, one by one--is based on fact. It's a real old-fashioned movie epic, done in director Ronny Yu's ("The Bride with White Hair") usual flashy, Hong Kong-and-Hollywood style, laced with spectacular no-wires fights choreographed by that Bob Fosse of kung fu moves, Yuen Wo Ping ("Crouching Tiger" and "The Matrix"). Dramatically, it's on a simplistic level. But you can forgive any historical transgressions as long as the movie keeps roaring right along.
![Page 65: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/65.jpg)
![Page 66: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/66.jpg)
Classifying Movie Reviews [Pang et al., ACL 2004]

Dataset: Rotten Tomatoes snippets (+), IMDB plot summaries (−). Apply machine learning to build a sentence-level subjectivity classifier, and try to force nearby sentences to have similar subjectivity labels by finding a minimum cut on a constructed graph.
![Page 67: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/67.jpg)
Classifying Movie Reviews
- One vertex for each sentence, plus source and sink vertices for "subjective" and "non-subjective".
- Edges to the source and sink encode confidence in the individual sentence classifications.
- Edges between sentence vertices indicate proximity.
![Page 68: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/68.jpg)
Classifying Movie Reviews [Pang et al., ACL 2004]

[Figure: a minimum cut picks class − vs + for v2 and v3, and class + vs − for v1; the cut retains the constraint f(v2)=f(v3), but not f(v2)=f(v1).]
![Page 69: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/69.jpg)
Summary & Conclusions
- There are many, many applications of text classification.
- Topical classification is fairly well understood: most of the information is in individual words, and very fast, simple methods work well.
- In many applications, classes are not topics: sentiment detection, subjectivity/opinion detection.
- In many applications, distinct classification decisions are interdependent, e.g. the subjectivity of nearby sentences in reviews.
- There is lots of prior work to build on, and lots of prior experimentation to consider.
![Page 70: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/70.jpg)
References
- K. Aas, L. Eikvil, Text Categorization: A Survey, June 1999.
- W. W. Cohen, Text Classification: An Advanced Tutorial, Machine Learning Department, Carnegie Mellon, 2010.
- C. D. Manning, H. Schutze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 1999.
- A. C. Cachopo, A. L. Oliveira, An Empirical Comparison of Text Categorization Methods, String Processing and Information Retrieval, 10th International Symposium, Springer Verlag, 2003.
- C. Donghui, L. Zhijing, A New Text Categorization Method based on HMM and SVM, International Symposium on Communications and Information Technologies, 2007.
- W. Xue, X. Xu, Three New Feature Weighting Methods for Text Categorization, Springer-Verlag Berlin Heidelberg, 2010.
![Page 71: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/71.jpg)
References (Cont.)
- F. Xia, T. Jicun, L. Zhihui, A Text Categorization Method Based on Local Document Frequency, IEEE Computer Society, 2009.
- F. Sebastiani, Machine Learning in Automated Text Categorization, ACM Computing Surveys, 2002.
- K. Nigam, A. K. McCallum, S. Thrun, T. Mitchell, Text Classification from Labeled and Unlabeled Documents using EM, special issue on information retrieval, ACM, 2000.
- M. E. Ruiz, P. Srinivasan, Hierarchical Text Categorization using Neural Networks, Kluwer Academic Publishers, Springer, 2002.
![Page 72: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/72.jpg)
References (Cont.)
- J. Meng, H. Lin, A Two-stage Feature Selection Method for Text Categorization, Seventh International Conference on Fuzzy Systems and Knowledge Discovery, IEEE, 2010.
![Page 73: Text Classification](https://reader035.vdocuments.us/reader035/viewer/2022062400/568155b7550346895dc38d88/html5/thumbnails/73.jpg)
Thanks!