Unsupervised Transfer Classification Application to Text Categorization Tianbao Yang, Rong Jin, Anil Jain, Yang Zhou, Wei Tong Michigan State University

Post on 18-Dec-2015

Page 1:

Unsupervised Transfer Classification

Application to Text Categorization

Tianbao Yang, Rong Jin, Anil Jain, Yang Zhou, Wei Tong

Michigan State University

Page 2:

Overview

Introduction
Related Work
Unsupervised Transfer Classification
    Problem Definition
    Approach & Analysis
Experiments
Conclusions

Page 3:

Introduction

Classification
    supervised learning
    semi-supervised learning

What if no label information is available?
    impossible, but not with some additional information

[Figure labels: supervised, semi-supervised, unsupervised classification]

Page 4:

Introduction

Unsupervised transfer classification (UTC)
    uses a collection of training examples and their assignments to auxiliary classes
    to build a classification model for a target class

[Figure labels: auxiliary class 1, ..., auxiliary class K; target class (no labeled training examples); prior; conditional probabilities]

Page 5:

Introduction: Motivating Examples

Image Annotation
    [Table: binary (0/1) assignments of images to the annotation words sky, sun, water, grass; the entries for grass are unknown (?)]

Social Tagging
    [Table: binary (0/1) assignments of items to the tags phone, verizon, apple, google; unknown entries are marked ?]

[Figure labels: auxiliary classes, target classes]

How to predict an annotation word/social tag that does not appear in the training data?

Page 6:

Related Work

Transfer Learning
    transfer knowledge from source domain to target domain
    similarity: transfer label information for auxiliary classes to target class
    difference: assume NO label information for target class

Multi-Label Learning, Maximum Entropy Model

Page 7:

Unsupervised Transfer Classification

Data
    Examples
    Auxiliary classes
    Assignments of examples to auxiliary classes

Class Information (for the target class)
    Prior probability
    Conditional probabilities

Goal
    Target classification model for the target class label

Page 8:

Maximum Entropy Model (MaxEnt)

Objective: favor the uniform distribution
Constraints: feature statistics computed from the conditional model
             are matched to feature statistics computed from the training data
Each constraint involves the jth feature function
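
For reference, the textbook conditional MaxEnt problem that these annotations describe can be written as below. The slide's own equation did not survive in the transcript, so the notation is an assumption: $n$ training pairs $(x_i, y_i)$, a conditional model $p(y \mid x_i)$, and feature functions $f_j$ for $j = 1, \dots, d$. It is not necessarily the authors' exact formula.

```latex
\begin{aligned}
\max_{p}\quad & -\sum_{i=1}^{n}\sum_{y} p(y \mid x_i)\,\log p(y \mid x_i) \\
\text{s.t.}\quad & \frac{1}{n}\sum_{i=1}^{n}\sum_{y} p(y \mid x_i)\, f_j(x_i, y)
   = \frac{1}{n}\sum_{i=1}^{n} f_j(x_i, y_i), \qquad j = 1,\dots,d, \\
 & \sum_{y} p(y \mid x_i) = 1, \qquad i = 1,\dots,n .
\end{aligned}
```

The objective is the conditional entropy, which is largest for a uniform distribution. In the first constraint, the left side is the feature statistic computed from the conditional model and the right side is the one computed from the training data.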

Page 9:

Generalized MaxEnt

With high probability

Equality constraints

Inequality constraints
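
One common way to relax equality constraints into inequality constraints is with box constraints, sketched below. All notation is assumed rather than taken from the slide: $p(y \mid x_i)$ is the conditional model, $f_j$ the jth feature function, $\hat{m}_j$ the empirical statistic of feature $j$, and $\beta_j \ge 0$ a tolerance. The slide's exact form is not recoverable from the transcript.

```latex
\begin{aligned}
\hat{m}_j &= \frac{1}{n}\sum_{i=1}^{n} f_j(x_i, y_i) \\
\text{Equality constraint:}\quad &
  \frac{1}{n}\sum_{i=1}^{n}\sum_{y} p(y \mid x_i)\, f_j(x_i, y) = \hat{m}_j \\
\text{Inequality constraint:}\quad &
  \left|\, \frac{1}{n}\sum_{i=1}^{n}\sum_{y} p(y \mid x_i)\, f_j(x_i, y) - \hat{m}_j \,\right| \le \beta_j
\end{aligned}
```

Empirical statistics only approximate the true expectations, within some tolerance with high probability, so the inequality form avoids fitting the model to sampling noise.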

Page 10:

Generalized MaxEnt

Page 11:

Generalized MaxEnt

Feature statistics from training data: unknown for the target class

How to extend generalized MaxEnt to unsupervised transfer classification?

Page 12:

Unsupervised Transfer Classification

Estimating feature statistics of the target class from those of the auxiliary classes

Page 13:

Unsupervised Transfer Classification

Build a relation between the auxiliary classes and the target class
    Independence assumption

Page 14:

Unsupervised Transfer Classification

Estimating feature statistics for the target class by regression

[Diagram labels: Feature Statistics for Auxiliary Classes; Class Information; Feature Statistics for Target Class]
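
The regression step can be illustrated with a minimal sketch. It assumes one particular reading of the independence assumption, namely that features are independent of the auxiliary labels given the target label, which gives E[f | y_k = 1] = w_k u + (1 - w_k) v with w_k = P(y_t = 1 | y_k = 1), where u and v are the feature means for target-positive and target-negative examples. The transcript does not preserve the authors' exact formulation, so this relation, the function name `estimate_target_stats`, and the use of ordinary least squares are all assumptions.

```python
def estimate_target_stats(aux_stats, cond_probs):
    """Least-squares estimate of target-class feature statistics (a sketch).

    aux_stats[k][j]: mean of feature j over examples in auxiliary class k.
    cond_probs[k]:   assumed P(target = 1 | auxiliary class k = 1).
    Returns (u, v): estimated feature means for target = 1 and target = 0.
    """
    w = list(cond_probs)
    if len(w) != len(aux_stats):
        raise ValueError("need one conditional probability per auxiliary class")

    # Entries of the 2x2 normal-equation matrix, shared by every feature.
    s11 = sum(wk * wk for wk in w)
    s12 = sum(wk * (1.0 - wk) for wk in w)
    s22 = sum((1.0 - wk) ** 2 for wk in w)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        # Happens when all conditional probabilities are equal.
        raise ValueError("need at least two distinct conditional probabilities")

    u, v = [], []
    for j in range(len(aux_stats[0])):
        # Right-hand side of the normal equations for feature j.
        b1 = sum(wk * row[j] for wk, row in zip(w, aux_stats))
        b2 = sum((1.0 - wk) * row[j] for wk, row in zip(w, aux_stats))
        # Closed-form solution of the 2x2 system (Cramer's rule).
        u.append((s22 * b1 - s12 * b2) / det)
        v.append((s11 * b2 - s12 * b1) / det)
    return u, v


# Synthetic check: auxiliary statistics generated from known u and v.
aux = [[0.9, 0.2], [0.5, 1.0], [0.1, 1.8]]
u_hat, v_hat = estimate_target_stats(aux, [0.9, 0.5, 0.1])
```

With K auxiliary classes there are K linear equations per feature and only two unknowns, so more auxiliary classes with varied conditional probabilities make the estimate better determined.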

Page 15:

Unsupervised Transfer Classification

Dual problem
    one term is a function of U; its definition can be found in the paper

Page 16:

Consistency Result

With high probability

The optimal dual solution using the label information for the target class

The dual solution obtained by the proposed approach

Page 17:

Experiments

Text categorization
    Data sets: multi-labeled data
    Protocol: leave one class out as the target class
    Metric: AUC (area under the ROC curve)
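
The protocol and metric above can be sketched in a few lines. This is an illustrative sketch only: the function names `auc` and `leave_one_class_out` are hypothetical, and AUC is computed by the pairwise (Mann-Whitney) definition rather than by whatever tooling the authors used.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def leave_one_class_out(num_classes):
    """Yield (target, auxiliary_classes) for every choice of target class."""
    for target in range(num_classes):
        yield target, [k for k in range(num_classes) if k != target]


# Each class takes a turn as the target; the rest act as auxiliary classes.
splits = list(leave_one_class_out(3))
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; a rank-based computation is the usual choice for large test sets.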

Page 18:

Experiments: Baselines

cModel
    train a classifier for each auxiliary class
    linearly combine them for the target class
cLabel
    predict the assignment of the target class for training examples by linearly combining the labels of auxiliary classes
    train a classifier using the predicted labels for the target class
GME-avg
    use the generalized MaxEnt model
    compute the feature statistics for the target class by linearly combining those for the auxiliary classes

Proposed Approach: GME-Reg
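
The first step of the cLabel baseline, predicting target assignments from a linear combination of auxiliary labels, might look like the sketch below. The transcript does not say how the combination weights are chosen, so the weights, the normalization by their sum, and the 0.5 threshold are all hypothetical choices, as is the function name `predict_target_labels`.

```python
def predict_target_labels(aux_labels, weights, threshold=0.5):
    """Predict target-class labels from auxiliary labels (a sketch).

    aux_labels[i][k]: 0/1 assignment of example i to auxiliary class k.
    weights[k]:       hypothetical weight of auxiliary class k.
    Returns one 0/1 label per example.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    predicted = []
    for row in aux_labels:
        # Weighted vote of the auxiliary labels, normalized to [0, 1]
        # when the weights are non-negative.
        score = sum(w * y for w, y in zip(weights, row)) / total
        predicted.append(1 if score >= threshold else 0)
    return predicted


# Three examples, three auxiliary classes, illustrative weights.
pseudo_labels = predict_target_labels(
    [[1, 0, 1], [0, 1, 0], [1, 1, 0]], [0.6, 0.1, 0.3]
)
```

The predicted labels would then be used to train an ordinary supervised classifier for the target class, which is the second step of cLabel.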

Page 19:

Experiment (I)

Estimate class information from training data
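
Estimating class information from data can be sketched as simple counting. The sketch assumes that the class information consists of the prior P(target = 1) and the conditionals P(target = 1 | auxiliary class k = 1), and that target labels are available on some sample purely for this estimation; both the direction of the conditional and the function name `estimate_class_information` are assumptions, not details confirmed by the transcript.

```python
def estimate_class_information(target, aux):
    """Estimate prior and conditional probabilities by counting (a sketch).

    target[i]: 0/1 label of example i for the target class.
    aux[i][k]: 0/1 assignment of example i to auxiliary class k.
    Returns (prior, conds), where conds[k] estimates
    P(target = 1 | auxiliary class k = 1), or None if class k is empty.
    """
    n = len(target)
    if n == 0 or n != len(aux):
        raise ValueError("target and aux must be non-empty and aligned")
    prior = sum(target) / n
    conds = []
    for k in range(len(aux[0])):
        # Target labels of the examples that belong to auxiliary class k.
        members = [target[i] for i in range(n) if aux[i][k] == 1]
        conds.append(sum(members) / len(members) if members else None)
    return prior, conds


# Four examples, two auxiliary classes.
prior, conds = estimate_class_information(
    [1, 0, 1, 0], [[1, 0], [1, 1], [0, 0], [0, 1]]
)
```

Returning None for an empty auxiliary class keeps an undefined conditional from silently becoming zero.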

Page 20:

Experiment (I)

Estimate class information from training data

Compare to the classifier of the target class learned by supervised learning

[Figure annotations: 1500, 2500]

Page 21:

Experiment (II)

Obtain class information from external sources

Datasets: bibtex and delicious

External source    URL                       Data set
bibsonomy          www.bibsonomy.org/tags    bibtex
ACM DL             www.portal.acm.org        bibtex
deli.cio.us        www.delicious.com/tag     delicious

Page 22:

Experiment (II)

Comparison with Supervised Classification

[Figure annotations: 650, 1000~1200]

Page 23:

Conclusions

A new problem: unsupervised transfer classification

A statistical framework for unsupervised transfer classification
    based on generalized maximum entropy
    robust estimation of feature statistics for the target class
    provable performance by consistency analysis

Future Work
    relax the independence assumption
    better estimation of feature statistics for the target class

Page 24:

Thanks

Questions?