actively transfer domain knowledge xiaoxiao shi wei fan jiangtao ren sun yat-sen university ibm t....

Actively Transfer Domain Knowledge

Xiaoxiao Shi† Wei Fan‡ Jiangtao Ren†

†Sun Yat-sen University  ‡IBM T. J. Watson Research Center

Transfer when you can, otherwise ask and don’t stretch it

2

Standard Supervised Learning

[Diagram: a Classifier trained on labeled New York Times data and tested on unlabeled New York Times data reaches 85.5%.]

3

In Reality...

[Diagram: with only a small labeled New York Times training set, the result on the unlabeled New York Times test set is 47.3%.]

Labeled data are insufficient!

How to improve the performance?

4

Solution I : Active Learning

[Diagram: a Domain Expert labels New York Times training data at a labeling cost ($); the Classifier trained on these labels reaches 83.4% on the unlabeled New York Times test data.]

5

Solution II : Transfer Learning

[Diagram: a Transfer Classifier trained on labeled out-of-domain data (Reuters) is applied to unlabeled in-domain test data (New York Times).]

No guarantee that transfer learning will help!

With significant differences between the domains, accuracy drops: 82.6% → 43.5%

6

Motivation

• Active Learning:
  – Labeling cost

• Transfer Learning:
  – Domain difference risk

Both have disadvantages. Which one to choose?

7

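Proposed Solution (AcTraK)

[Diagram: the Active Learner chooses an instance from the unlabeled in-domain training data. The Transfer Classifier, built from the labeled out-of-domain training data (Reuters), predicts its label, and the Decision Function judges the prediction. Reliable: label by the classifier. Unreliable: the Domain Expert labels it. The labeled training data then train the Classifier, which produces the classification result on the test data.]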

8

Transfer Classifier

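[Diagram: an unlabeled in-domain instance X is first classified by the out-of-domain model Mo as L+ or L-, then by the mapping classifiers ML+ and ML-, which output the in-domain label + or -.]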

1. Classify X by out-of-domain Mo: P(L+|X, Mo) and P(L-|X, Mo).

2. Classify X by mapping classifiers ML+ and ML-: P(+|X, ML+) and P(+|X, ML-).

3. Then the probability for X to be “+” is:

T(X) = P(+|X) = P(L+|X, Mo) × P(+|X, ML+) + P(L-|X, Mo) × P(+|X, ML-)

Training:

• Mo is trained on the out-of-domain dataset (labeled).
• Mo classifies the in-domain labeled examples (very few), splitting them into L+ and L-.
• ML+ is trained on L+; ML- is trained on L-.

L+ = { (x, y = +/-) | Mo(x) = ‘L+’ }; the true in-domain label y may be either ‘-’ or ‘+’.

Mapping between the label transferred by Mo and the in-domain label (in-domain label / Mo label):

                     Mo: L+    Mo: L-
In-domain label +    +/L+      +/L-
In-domain label -    -/L+      -/L-
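A minimal Python sketch of the transfer classifier described on this slide, under stated assumptions: Mo, ML+ and ML- are passed in as plain functions that return a probability, there are two out-of-domain classes, and "Mo predicts L+" means P(L+|x, Mo) >= 0.5. The names `split_by_mo` and `transfer_probability` and the 0.5 threshold are illustrative, not taken from the slides.

```python
from typing import Callable, List, Tuple

# Assumption: each model is a function mapping an instance to a probability.
#   mo(x)       -> P(L+ | x, Mo)
#   ml_plus(x)  -> P(+ | x, ML+)
#   ml_minus(x) -> P(+ | x, ML-)
Model = Callable[[object], float]
Example = Tuple[object, int]  # (x, true in-domain label: 1 for '+', 0 for '-')


def split_by_mo(mo: Model, labeled: List[Example]) -> Tuple[List[Example], List[Example]]:
    """Split the few in-domain labeled examples by Mo's prediction.

    L+ holds the examples Mo calls 'L+' (assumed threshold: 0.5); their true
    in-domain label may still be '+' or '-'. ML+ would then be trained on L+
    and ML- on L-.
    """
    l_plus = [(x, y) for x, y in labeled if mo(x) >= 0.5]
    l_minus = [(x, y) for x, y in labeled if mo(x) < 0.5]
    return l_plus, l_minus


def transfer_probability(x: object, mo: Model, ml_plus: Model, ml_minus: Model) -> float:
    """T(x) = P(L+|x,Mo) * P(+|x,ML+) + P(L-|x,Mo) * P(+|x,ML-)."""
    p_l_plus = mo(x)
    p_l_minus = 1.0 - p_l_plus  # two out-of-domain classes assumed
    return p_l_plus * ml_plus(x) + p_l_minus * ml_minus(x)


# Toy usage with constant stand-in models (hypothetical values).
t = transfer_probability("doc", lambda x: 0.8, lambda x: 0.9, lambda x: 0.2)
```

With these stand-in values, T(x) combines 0.8 × 0.9 and 0.2 × 0.2, so the ML+ branch dominates because Mo is fairly sure of L+.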

9

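Our Solution (AcTraK)

[Diagram: the Active Learner chooses an instance from the unlabeled training data. The Transfer Classifier, built from the labeled out-of-domain training data (Reuters), predicts its label, and the Decision Function judges the prediction. Reliable: label by the classifier. Unreliable: the Domain Expert labels it. The labeled training data then train the Classifier, which produces the classification result on the test data.]

When the prediction by the transfer classifier is unreliable, ask the domain expert.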
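The labeling flow in the diagram can be sketched in Python as follows. This is only an outline under assumptions: the function name `actrak_label` and its four parameters are hypothetical, the transfer classifier, the reliability check, and the expert are passed in as callables, and retraining of the in-domain classifier between instances is left out.

```python
from typing import Callable, List, Sequence, Tuple


def actrak_label(
    chosen: Sequence[object],
    transfer_label: Callable[[object], int],
    is_reliable: Callable[[object], bool],
    ask_expert: Callable[[object], int],
) -> Tuple[List[Tuple[object, int]], int]:
    """Label each chosen instance, querying the expert only when needed.

    Returns the labeled training data and the number of expert queries.
    """
    labeled: List[Tuple[object, int]] = []
    n_queries = 0
    for x in chosen:
        if is_reliable(x):
            # Reliable: label by the transfer classifier, at no labeling cost.
            y = transfer_label(x)
        else:
            # Unreliable: ask the domain expert, which costs one query.
            y = ask_expert(x)
            n_queries += 1
        labeled.append((x, y))
    return labeled, n_queries


# Toy usage: even numbers count as "reliable" (purely illustrative).
data, cost = actrak_label(
    [1, 2, 3, 4],
    transfer_label=lambda x: 0,
    is_reliable=lambda x: x % 2 == 0,
    ask_expert=lambda x: 1,
)
```

The point of the design is that labeling cost grows only with the number of unreliable predictions, not with the number of chosen instances.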

10

Decision Function

Transfer Classifier

• In the following cases, ask the domain expert, not the transfer classifier, to label the instance:
  a) Conflict
  b) Low confidence
  c) Few labeled in-domain examples

11

Decision Function

a) Conflict?  b) Confidence?  c) Size?

Decision Function: [formula not preserved] Its two outcomes are "Label by Transfer Classifier" and "Label by Domain Expert".

R: random number in [0,1]

AcTraK asks the domain expert to label the instance with probability of [formula not preserved]

T(x): prediction by the transfer classifier
ML(x): prediction given by the in-domain classifier
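Since the slide's formula is not preserved, the Python sketch below uses a hypothetical stand-in for the query probability. It is not the authors' formula; it only illustrates how conflict, confidence, and labeled-set size could be combined and compared against the random number R. The names `query_probability` and `label_by_expert`, the 0.5 label threshold, and the `exp(-1/n)` size factor are all assumptions.

```python
import math
import random
from typing import Optional


def query_probability(t_prob: float, ml_prob: float, n_labeled: int) -> float:
    """Hypothetical probability of asking the domain expert.

    t_prob:    P(+|x) from the transfer classifier T
    ml_prob:   P(+|x) from the in-domain classifier ML
    n_labeled: number of labeled in-domain examples so far
    """
    # (a) Conflict: T and ML disagree on the label, so always ask.
    if (t_prob >= 0.5) != (ml_prob >= 0.5):
        return 1.0
    # (b) Confidence: the stronger of the two agreeing predictions.
    confidence = max(t_prob, 1.0 - t_prob, ml_prob, 1.0 - ml_prob)
    # (c) Size: few labeled examples push the factor toward 0 (assumed form).
    size_factor = math.exp(-1.0 / n_labeled) if n_labeled > 0 else 0.0
    return 1.0 - confidence * size_factor


def label_by_expert(t_prob: float, ml_prob: float, n_labeled: int,
                    rng: Optional[random.Random] = None) -> bool:
    """Draw R in [0, 1) and return True when the expert should label x."""
    if rng is None:
        rng = random.Random()
    r = rng.random()
    return r < query_probability(t_prob, ml_prob, n_labeled)
```

Under this stand-in, a conflict always triggers a query, and agreement with high confidence on a large labeled set makes a query unlikely.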

12

Properties

• It can reduce domain difference risk.
  – According to Theorem 2, the expected error is bounded.

• It can reduce labeling cost.
  – According to Theorem 3, the query probability is bounded.

13

Theorems

expected error of the transfer classifier

Maximum size

14

Experiments Setup

• Data Sets
  – Synthetic data sets
  – Remote Sensing: data collected from regions with a specific ground surface condition, applied to data collected from a new region
  – Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup)

• Comparison Models
  – Inductive Learning model: AdaBoost, SVM
  – Transfer Learning model: TrAdaBoost (ICML’07)
  – Active Learning model: ERS (ICML’01)

15

Experiments on Synthetic Datasets

[Figure labels: "In-domain: 2 labeled training & testing"; "4 out-of-domain labeled training"]

16

Experiments on Real World Dataset

Evaluation metric:
• Compared with transfer learning on accuracy.
• Compared with active learning on IEA (Integral Evaluation on Accuracy).
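The slides give only the metric's name and the three-argument form IEA(AcTraK, ERS, 250). As a rough Python sketch, assuming IEA is the area between two learners' accuracy curves over the first ν expert queries, it could be approximated by a simple sum. The function name `iea`, the per-query accuracy lists, and the unit step are assumptions made for illustration.

```python
from typing import Sequence


def iea(acc_a: Sequence[float], acc_b: Sequence[float], nu: int) -> float:
    """Approximate the integral of (accuracy of A - accuracy of B).

    acc_a[t] and acc_b[t] are the accuracies of learners A and B after
    t + 1 expert queries (assumed indexing). The sum runs over the first
    nu queries with a step of 1, i.e. a rectangle-rule approximation.
    """
    if nu > len(acc_a) or nu > len(acc_b):
        raise ValueError("accuracy curves are shorter than nu")
    return sum(a - b for a, b in zip(acc_a[:nu], acc_b[:nu]))


# Toy usage with made-up curves (not results from the experiments).
score = iea([0.6, 0.7, 0.8], [0.5, 0.6, 0.7], 3)
```

Under this assumed definition, a positive value means learner A is on average more accurate than learner B over the first ν queries.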

17

20 Newsgroup

1. Comparison with Transfer Learner

[Figure: "Accuracy Comparison". x-axis: Datasets 1 to 6; y-axis: Accuracy (0.45 to 0.85); series: SVM, TrAdaBoost, AcTraK.]

2. Comparison with Active Learner

[Figure: "IEA(AcTraK, ERS, 250)". x-axis: Datasets 1 to 6; y-axis: IEA (0 to 2); comparison with active learner ERS.]

18

Conclusions

• Actively Transfer Domain Knowledge
  – Reduce domain difference risk: transfer useful knowledge (Theorem 2)
  – Reduce labeling cost: query domain experts only when necessary (Theorem 3)
