Text Classification, Active/Interactive Learning

Page 1: Text Classification, Active/Interactive learning

Text Classification, Active/Interactive Learning

Page 2: Text Classification, Active/Interactive learning

Text Categorization

• Categorization of documents, based on topics

• When the topics are known in advance – categorization (supervised)

• When the topics are unknown – clustering (unsupervised)

Page 3: Text Classification, Active/Interactive learning

Supervised learning

• Training data – documents with their assigned (true) categories

• Documents are represented by feature vectors x = (x_1, x_2, …, x_n)

Page 4: Text Classification, Active/Interactive learning

Document features

• Word frequencies
• Stems/lemmas
• Phrases
• POS tags
• Semantic features (concepts, named entities)
• tf-idf:
  – tf = term frequency
  – idf = inverse document frequency

Page 5: Text Classification, Active/Interactive learning

TFIDF Weights

TFIDF definitions:
• tf_ik: number of occurrences of term t_k in document D_i
• df_k: number of documents which contain t_k
• idf_k = log(d / df_k), where d is the total number of documents
• term weight: w_ik = tf_ik · idf_k

Intuition: rare words get more weight, common words less weight
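A minimal Python sketch of these weights, assuming a small corpus of already-tokenized documents (the variable and function names are illustrative):

```python
import math
from collections import Counter

# Toy corpus: each document D_i is a list of tokens (tokenization assumed done).
docs = [
    ["the", "game", "went", "to", "overtime"],
    ["the", "court", "ruled", "on", "the", "case"],
    ["playoffs", "start", "after", "the", "regular", "season"],
]

d = len(docs)                    # total number of documents
df = Counter()                   # df_k: number of documents containing term t_k
for doc in docs:
    df.update(set(doc))          # count each term once per document

def tfidf(doc):
    """Return w_ik = tf_ik * idf_k for every term in one document."""
    tf = Counter(doc)            # tf_ik: occurrences of t_k in D_i
    return {t: tf[t] * math.log(d / df[t]) for t in tf}

for i, doc in enumerate(docs):
    print(i, tfidf(doc))
```

Terms that appear in every document (like "the") get idf = log(1) = 0, while rare terms keep a large weight, matching the intuition above.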

Page 6: Text Classification, Active/Interactive learning

Naïve Bayes classifier

• Tightly tied to text categorization
• Interesting theoretical properties
• A simple example of an important class of learners based on generative models that approximate how data is produced

• For certain special cases, NB is the best thing you can do

Page 7: Text Classification, Active/Interactive learning

Bayes’ rule

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

Page 8: Text Classification, Active/Interactive learning

Maximum a posteriori hypothesis

$$h_{MAP} = \underset{h \in H}{\arg\max}\; P(h \mid D) = \underset{h \in H}{\arg\max}\; \frac{P(D \mid h)\,P(h)}{P(D)} = \underset{h \in H}{\arg\max}\; P(D \mid h)\,P(h)$$

since P(D) is constant.

Page 9: Text Classification, Active/Interactive learning

Maximum likelihood hypothesis

If all hypotheses are a priori equally likely, we only need to consider the P(D|h) term:

$$h_{ML} = \underset{h \in H}{\arg\max}\; P(D \mid h)$$

Page 10: Text Classification, Active/Interactive learning

Naive Bayes classifiers

Task: Classify a new instance D = (x_1, x_2, …, x_n), described by a tuple of attribute values, into one of the classes c_j ∈ C.

$$c_{MAP} = \underset{c_j \in C}{\arg\max}\; P(c_j \mid x_1, x_2, \ldots, x_n) = \underset{c_j \in C}{\arg\max}\; P(x_1, x_2, \ldots, x_n \mid c_j)\,P(c_j)$$

Page 11: Text Classification, Active/Interactive learning

Naïve Bayes assumption

• P(c_j)
  – Can be estimated from the frequency of classes in the training examples.
• P(x_1, x_2, …, x_n | c_j)
  – O(|X|^n · |C|) parameters
  – Could only be estimated if a very, very large number of training examples was available.

Naïve Bayes conditional independence assumption:
• Assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities P(x_i | c_j), as written out below.
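Written as an equation, the conditional independence assumption is:

$$P(x_1, x_2, \ldots, x_n \mid c_j) = \prod_{i=1}^{n} P(x_i \mid c_j)$$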

Page 12: Text Classification, Active/Interactive learning

Naive Bayes for text categorization

• Attributes are text positions, values are words
• Still too many possibilities
• Assume that classification is independent of the positions of the words
  – Use the same parameters for each position
  – Result is the bag-of-words model (over tokens, not types)

$$c_{NB} = \underset{c_j \in C}{\arg\max}\; P(c_j) \prod_{i} P(x_i \mid c_j) = \underset{c_j \in C}{\arg\max}\; P(c_j)\, P(x_1 = \text{"our"} \mid c_j) \cdots P(x_n = \text{"text"} \mid c_j)$$

Page 13: Text Classification, Active/Interactive learning

Naïve Bayes: learning probabilities

• From training corpus, extract Vocabulary
• Calculate the required P(c_j) and P(x_k | c_j) terms
  – For each c_j in C do:
    • docs_j ← subset of documents for which the target class is c_j
    • P(c_j) = |docs_j| / |total # documents|
    • Text_j ← a single document containing all of docs_j
    • For each word x_k in Vocabulary:
      – n_k ← number of occurrences of x_k in Text_j

Page 14: Text Classification, Active/Interactive learning

Naïve Bayes: classifying

• positions ← all word positions in the current document which contain tokens found in Vocabulary

• Return c_NB, where

$$c_{NB} = \underset{c_j \in C}{\arg\max}\; P(c_j) \prod_{i \in positions} P(x_i \mid c_j)$$
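A minimal Python sketch of the learning and classification steps from the last two slides. Tokenization is assumed to be done already, and the add-one (Laplace) smoothing of P(x_k | c_j) is an assumption of this sketch, not something stated on the slides:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Learn P(c_j) and P(x_k | c_j) from (document, class) pairs.

    docs: list of token lists; labels: parallel list of class names.
    Add-one smoothing of the word probabilities is an assumption of
    this sketch; the slides do not specify a smoothing scheme.
    """
    vocab = {w for doc in docs for w in doc}
    class_texts = defaultdict(list)
    for doc, c in zip(docs, labels):
        class_texts[c].extend(doc)               # Text_j: all docs of class c_j concatenated

    priors, cond = {}, {}
    for c, text in class_texts.items():
        priors[c] = labels.count(c) / len(docs)  # P(c_j) = |docs_j| / |total # documents|
        counts = Counter(text)                   # n_k: occurrences of x_k in Text_j
        cond[c] = {w: (counts[w] + 1) / (len(text) + len(vocab)) for w in vocab}
    return priors, cond, vocab

def classify_nb(doc, priors, cond, vocab):
    """Return c_NB = argmax_c [ log P(c) + sum over positions of log P(x_i | c) ]."""
    positions = [w for w in doc if w in vocab]   # only tokens found in Vocabulary
    scores = {c: math.log(priors[c]) + sum(math.log(cond[c][w]) for w in positions)
              for c in priors}
    return max(scores, key=scores.get)
```

Note that classify_nb sums log probabilities instead of multiplying raw probabilities, which is exactly the point of the next slide.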

Page 15: Text Classification, Active/Interactive learning

Underflow Prevention
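Multiplying many small conditional probabilities quickly underflows floating-point arithmetic, so the product is evaluated as a sum of logarithms (as already done in the sketch above):

$$c_{NB} = \underset{c_j \in C}{\arg\max}\; \Big[ \log P(c_j) + \sum_{i \in positions} \log P(x_i \mid c_j) \Big]$$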

Page 16: Text Classification, Active/Interactive learning

Binomial Naïve Bayes

• One feature Xw for each word in dictionary

• Xw = true in document d if w appears in d

• Naive Bayes assumption: given the document’s topic, the appearance of one word in the document tells us nothing about the chances that another word appears

Page 17: Text Classification, Active/Interactive learning

Parameter Estimation

• Binomial model: P̂(X_w = t | c_j) = fraction of documents of topic c_j in which word w appears
• Multinomial model: P̂(X_i = w | c_j) = fraction of times in which word w appears across all documents of topic c_j
  – Can create a mega-document for topic j by concatenating all documents in this topic
  – Use the frequency of w in the mega-document
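A small sketch of the two estimators, assuming each topic's documents are given as lists of tokens (the function and variable names are illustrative):

```python
from collections import Counter

def binomial_estimates(docs_of_topic):
    """P_hat(X_w = true | c_j): fraction of the topic's documents containing w."""
    n_docs = len(docs_of_topic)
    doc_freq = Counter()
    for doc in docs_of_topic:
        doc_freq.update(set(doc))                # each word counted once per document
    return {w: doc_freq[w] / n_docs for w in doc_freq}

def multinomial_estimates(docs_of_topic):
    """P_hat(X_i = w | c_j): frequency of w in the topic's mega-document."""
    mega_doc = [w for doc in docs_of_topic for w in doc]   # concatenate all documents
    counts = Counter(mega_doc)
    total = len(mega_doc)
    return {w: counts[w] / total for w in counts}
```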

Page 18: Text Classification, Active/Interactive learning

Learning probabilities

• Passive learning – learning from the annotated corpus

• Active learning – learning only from the most informative instances (real time annotation)

• Interactive learning – the annotator can choose to suggest specific features (e.g., playoffs to indicate sports) and not just complete instances

Page 19: Text Classification, Active/Interactive learning

https://github.com/burrsettles/dualist

Page 20: Text Classification, Active/Interactive learning

{Inter}active learning

• Available actions:
  – Annotate an instance with a class
  – Annotate a feature with a class
  – Suggest a new feature and annotate it with a class

Page 21: Text Classification, Active/Interactive learning

Adding prior to features’ max likelihood

• The denominator is a normalization factor (summing over all words); see the sketch below.
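As a hedged sketch (an assumption, not necessarily the slide's exact expression), a Dirichlet-style prior can be folded into the maximum-likelihood estimate by adding a pseudo-count α_wc for each word w the annotator has associated with class c:

$$\hat{P}(w \mid c) = \frac{N_{wc} + \alpha_{wc}}{\sum_{w'} \big( N_{w'c} + \alpha_{w'c} \big)}$$

where N_wc is the count of w in documents of class c and annotated features receive a larger α_wc. The class prior on the next slide is treated analogously, with the normalization running over all classes.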

Page 22: Text Classification, Active/Interactive learning

Adding prior to classes’ max likelihood

- Normalization factor (summing over all classes)

Page 23: Text Classification, Active/Interactive learning

Semi-supervised (learning from unlabeled data)

• Learn the model probabilities (P(c_j) and P(w | c_j)) using only the priors

• Apply the induced classifier to the unlabeled instances

• Re-estimate the probabilities using the labeled as well as the probabilistically labeled instances (multiply the latter by 0.1 to avoid overwhelming the model)

• Possibly iterate this process
• This is actually EM (a sketch of one round follows below)
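A rough sketch of one round of this loop, reusing the classify_nb helper from the earlier sketch. Hard self-labels with a flat 0.1 weight are used here for brevity; the probabilistic version described on the slide would weight each unlabeled document by its class posterior (this simplification is an assumption):

```python
from collections import Counter, defaultdict

def train_weighted_nb(docs, labels, weights):
    """Multinomial NB where each (doc, label) pair carries a weight in [0, 1]."""
    vocab = {w for doc in docs for w in doc}
    class_weight = defaultdict(float)            # weighted document counts per class
    word_weight = defaultdict(Counter)           # weighted word counts per class
    for doc, c, wt in zip(docs, labels, weights):
        class_weight[c] += wt
        for token in doc:
            word_weight[c][token] += wt

    total_weight = sum(class_weight.values())
    priors = {c: class_weight[c] / total_weight for c in class_weight}
    cond = {}
    for c in class_weight:
        total = sum(word_weight[c].values())
        cond[c] = {w: (word_weight[c][w] + 1) / (total + len(vocab)) for w in vocab}
    return priors, cond, vocab

def em_round(labeled, unlabeled):
    """One semi-supervised round: train, self-label the unlabeled docs, retrain."""
    docs, labels = zip(*labeled)
    priors, cond, vocab = train_weighted_nb(list(docs), list(labels), [1.0] * len(docs))
    pseudo = [(d, classify_nb(d, priors, cond, vocab)) for d in unlabeled]
    all_docs = list(docs) + [d for d, _ in pseudo]
    all_labels = list(labels) + [c for _, c in pseudo]
    all_weights = [1.0] * len(docs) + [0.1] * len(pseudo)   # down-weight self-labeled docs
    return train_weighted_nb(all_docs, all_labels, all_weights)
```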

Page 24: Text Classification, Active/Interactive learning

Suggesting instances for annotation

• Use a weight function
• For example, an entropy-based uncertainty weight computed over the class posterior of each document d (see the form below)

• Then, suggest the top D documents
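A standard form for this weight, assuming it matches the slide's equation, is the entropy of the current model's class posterior for document d:

$$H(d) = -\sum_{c_j \in C} P(c_j \mid d)\, \log P(c_j \mid d)$$

Documents are ranked by H(d) and the D highest-entropy (most uncertain) ones are suggested for annotation.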

Page 25: Text Classification, Active/Interactive learning

Suggesting features for annotation

• Use information gain (info-gain); a standard definition is sketched below

• Then, suggest the top V features for the class with which they occur most
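The slide does not spell info-gain out; the usual definition for a binary word feature w, stated here as an assumption, is the reduction in class entropy from observing whether w is present:

$$IG(w) = H(C) - P(w)\,H(C \mid w) - P(\bar{w})\,H(C \mid \bar{w})$$

where H(C) is the entropy of the class distribution and w, w̄ denote the presence and absence of the word.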

Page 26: Text Classification, Active/Interactive learning

Results of 3 annotators, comparing active, interactive and passive learning

(Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances, Burr Settles)