Actively Transfer Domain Knowledge
Xiaoxiao Shi† Wei Fan‡ Jiangtao Ren†
†Sun Yat-sen University ‡IBM T. J. Watson Research Center
Transfer when you can, otherwise ask and don’t stretch it
2
Standard Supervised Learning
New York Times
training (labeled)
test (unlabeled)
Classifier
New York Times
85.5%
3
In Reality……
New York Times
training (labeled)
test (unlabeled)
New York Times
Labeled data are insufficient!
47.3%
How to improve the performance?
4
Solution I : Active Learning
New York Times
training (labeled)
test (unlabeled)
Classifier
New York Times
Label
Domain Expert
$
Labeling Cost
83.4%
5
Solution II : Transfer Learning
Reuters
Out-of-domain training (labeled)
In-domain test (unlabeled)
Transfer Classifier
New York Times
No guarantee transfer learning could help!
Accuracy drops
Significant Differences
82.6% ?? 43.5%
6
Motivation
• Active Learning:
– Labeling cost
• Transfer Learning:
– Domain difference risk
Both have disadvantages. Which one to choose?
7
Proposed Solution (AcTraK)
Out-of-domain training (labeled): Reuters
Unlabeled in-domain training data
Active Learner chooses an instance
Transfer Classifier
Classification Result
Decision Function
Reliable: label by the classifier
Unreliable: label by the Domain Expert
Labeled Training
Classifier
Test
8
Transfer Classifier
[Diagram: X, an unlabeled in-domain instance, is classified by M_o as L+ or L-, then mapped by M_L+ or M_L- to the in-domain label + or -]
1. Classify X with the out-of-domain classifier M_o: P(L+|X, M_o) and P(L-|X, M_o).
2. Classify X with the mapping classifiers M_L+ and M_L-: P(+|X, M_L+) and P(+|X, M_L-).
3. The probability that X is "+" is then:
T(X) = P(+|X) = P(L+|X, M_o) × P(+|X, M_L+) + P(L-|X, M_o) × P(+|X, M_L-)
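The three steps above amount to a weighted sum over the two out-of-domain predictions. A minimal Python sketch of that arithmetic follows; the function and argument names are illustrative assumptions, and the four probabilities are taken as given rather than produced by real classifiers.

```python
def transfer_probability(p_lplus_mo, p_lminus_mo, p_plus_mlplus, p_plus_mlminus):
    """Return T(x) = P(+|x) as written on the slide.

    p_lplus_mo     : P(L+|x, M_o), out-of-domain classifier says L+
    p_lminus_mo    : P(L-|x, M_o), out-of-domain classifier says L-
    p_plus_mlplus  : P(+|x, M_L+), mapping classifier for the L+ group
    p_plus_mlminus : P(+|x, M_L-), mapping classifier for the L- group
    """
    for p in (p_lplus_mo, p_lminus_mo, p_plus_mlplus, p_plus_mlminus):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Weight each mapping classifier by how much M_o believes in its group.
    return p_lplus_mo * p_plus_mlplus + p_lminus_mo * p_plus_mlminus


# Example: M_o is 80% sure of L+, and L+ usually maps to "+".
t_x = transfer_probability(0.8, 0.2, 0.9, 0.2)
print(round(t_x, 4))
```

With these example inputs the result is 0.8 × 0.9 + 0.2 × 0.2, about 0.76, so X would be labeled "+".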
Training:
• Out-of-domain dataset (labeled): train M_o.
• In-domain labeled data (very few): split by M_o's prediction into L+ and L-, then train M_L+ on L+ and M_L- on L-.
L+ = { (x, y = +/-) | M_o(x) = 'L+' }; the true in-domain label may be either '-' or '+'.
Mapping from M_o's prediction to the in-domain label (in-domain label / M_o prediction): +/L+, +/L-, -/L+, -/L-
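The grouping step can be sketched in Python as follows. This is an illustration under stated assumptions: `mo_predict` is a hypothetical stand-in for M_o, and the "mapping classifier" here is reduced to a Laplace-smoothed frequency estimate that ignores x, which is a simplification chosen for brevity and not the model used in the slides.

```python
def split_by_mo_prediction(labeled_in_domain, mo_predict):
    """Partition in-domain (x, y) pairs by M_o's out-of-domain prediction."""
    groups = {"L+": [], "L-": []}
    for x, y in labeled_in_domain:
        groups[mo_predict(x)].append((x, y))
    return groups


def fit_constant_mapping(group, smoothing=1.0):
    """Simplified stand-in for M_L+ or M_L-: estimate P(+ | group).

    Uses Laplace smoothing so an empty or tiny group still yields a
    usable probability. A real mapping classifier would also use x.
    """
    positives = sum(1 for _, y in group if y == "+")
    return (positives + smoothing) / (len(group) + 2.0 * smoothing)


# Toy data: M_o (hypothetical) predicts L+ for positive x, L- otherwise.
data = [(1.0, "+"), (2.0, "+"), (3.0, "-"), (-1.0, "-"), (-2.0, "+")]
groups = split_by_mo_prediction(data, lambda x: "L+" if x > 0 else "L-")
p_plus_given_lplus = fit_constant_mapping(groups["L+"])
p_plus_given_lminus = fit_constant_mapping(groups["L-"])
print(p_plus_given_lplus, p_plus_given_lminus)
```

Note that the L+ group contains an example whose true in-domain label is "-", which is exactly the case the mapping classifiers are meant to absorb.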
9
Our Solution (AcTraK)
Out-of-domain training (labeled): Reuters
Unlabeled in-domain training data
Active Learner chooses an instance
Transfer Classifier
Classification Result
Decision Function
Reliable: label by the classifier
Unreliable: label by the Domain Expert
Labeled Training
Classifier
Test
When the prediction by the transfer classifier is unreliable, ask the domain experts.
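The overall flow can be sketched as a loop in Python. Everything here is an assumption made for illustration: the three callables (`transfer_predict`, `is_reliable`, `ask_expert`) are hypothetical placeholders, instances are visited in order rather than chosen by an active learner, and retraining of the classifier is omitted.

```python
def actrak_style_loop(unlabeled, transfer_predict, is_reliable, ask_expert):
    """Label each instance by the transfer classifier when its prediction
    is judged reliable, and by the domain expert otherwise.

    Returns the labeled pairs and the number of expert queries made.
    """
    labeled = []
    n_queries = 0
    for x in unlabeled:
        label, confidence = transfer_predict(x)
        if is_reliable(x, label, confidence, labeled):
            labeled.append((x, label))          # trust the transfer classifier
        else:
            labeled.append((x, ask_expert(x)))  # pay the labeling cost
            n_queries += 1
    return labeled, n_queries


# Toy setup: the transfer classifier thresholds at 0, the expert at 0.1.
result, queries = actrak_style_loop(
    [0.9, -0.8, 0.05],
    transfer_predict=lambda x: ("+" if x > 0 else "-", min(1.0, abs(x))),
    is_reliable=lambda x, label, conf, labeled: conf >= 0.5,
    ask_expert=lambda x: "+" if x > 0.1 else "-",
)
print(result, queries)
```

In the toy run only the low-confidence instance 0.05 is sent to the expert, who corrects the label the transfer classifier would have assigned.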
10
Decision Function
Transfer Classifier
• In the following cases, ask the domain expert, rather than the transfer classifier, to label the instance:
a) Conflict
b) Low confidence
c) Few labeled in-domain examples
11
Decision Function
a) Conflict? b) Confidence? c) Size?
Decision Function:
Label by Transfer Classifier
Label by Domain Expert
R: random number in [0, 1]
AcTraK asks the domain expert to label the instance with probability of [formula not preserved in this transcript]
T(x): prediction by the transfer classifier
M_L(x): prediction given by the in-domain classifier
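The randomized decision can be sketched in Python as below. The query-probability formula from the slide is not available in this transcript, so `illustrative_query_probability` is a hypothetical heuristic invented here only to show how conflict, confidence, and labeled-set size could each raise the chance of asking the expert. It is not the formula used by AcTraK. Only the comparison against a random number R in [0, 1] follows the slide.

```python
import random


def illustrative_query_probability(t_prob, ml_prob, n_labeled):
    """HYPOTHETICAL heuristic, not the formula from the slides.

    t_prob    : P(+|x) from the transfer classifier T
    ml_prob   : P(+|x) from the in-domain classifier M_L
    n_labeled : number of labeled in-domain examples so far
    """
    # a) Conflict: the two classifiers disagree on the label.
    if (t_prob >= 0.5) != (ml_prob >= 0.5):
        return 1.0
    # b) Confidence: distance of the more confident prediction from 0.5.
    confidence = 2.0 * max(abs(t_prob - 0.5), abs(ml_prob - 0.5))
    # c) Size: few labeled in-domain examples make querying more likely.
    size_term = 1.0 / (1.0 + n_labeled)
    return min(1.0, (1.0 - confidence) + size_term)


def should_ask_expert(query_probability, rng=random):
    """Draw R uniformly from [0, 1) and ask the expert if R < probability."""
    return rng.random() < query_probability


p = illustrative_query_probability(0.9, 0.8, 99)
print(round(p, 4), should_ask_expert(p, random.Random(0)))
```

Under this heuristic, agreement with high confidence and many labeled examples drives the query probability toward zero, while any disagreement forces a query.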
12
Properties
• It can reduce domain difference risk.
– According to Theorem 2, the expected error is bounded.
• It can reduce labeling cost.
– According to Theorem 3, the query probability is bounded.
13
Theorems
expected error of the transfer classifier
Maximum size
14
Experiments setup
• Data Sets
– Synthetic data sets
– Remote Sensing: data collected from regions with a specific ground surface condition versus data collected from a new region
– Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup)
• Comparable Models
– Inductive Learning model: AdaBoost, SVM
– Transfer Learning model: TrAdaBoost (ICML'07)
– Active Learning model: ERS (ICML'01)
15
Experiments on Synthetic Datasets
In-domain: 2 labeled training & testing
Out-of-domain: 4 labeled training
16
Experiments on Real World Dataset
Evaluation metric:
• Compared with transfer learning on accuracy.
• Compared with active learning on IEA (Integral Evaluation on Accuracy).
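The transcript names IEA but does not define it. One plausible reading, assumed here and not confirmed by the slides, is the area between two accuracy learning curves up to a query budget, so that IEA(AcTraK, ERS, 250) would accumulate the accuracy gap over the first 250 queries. A minimal Python sketch under that assumption, using a unit-step sum:

```python
def iea(acc_a, acc_b, budget):
    """Assumed reading of IEA: sum of accuracy differences (a minus b)
    over the first `budget` points of two learning curves.

    acc_a, acc_b : accuracy after each queried example, same spacing
    budget       : number of curve points to integrate over
    """
    if budget < 0 or len(acc_a) < budget or len(acc_b) < budget:
        raise ValueError("both curves must cover the full budget")
    return sum(a - b for a, b in zip(acc_a[:budget], acc_b[:budget]))


# Toy curves: learner A stays ahead of learner B at every point.
score = iea([0.6, 0.7, 0.8], [0.5, 0.6, 0.6], 3)
print(round(score, 4))
```

Under this reading a positive value means the first learner is more accurate on average across the budget, and a value near zero means the two curves are close.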
17
20 Newsgroup
1. Comparison with Transfer Learner
[Figure: "Accuracy Comparison"; x-axis: Datasets 1 to 6; y-axis: Accuracy; series: SVM, TrAdaBoost, AcTraK]
2. Comparison with Active Learner (ERS)
[Figure: "IEA(AcTraK, ERS, 250)"; x-axis: Datasets 1 to 6; y-axis: IEA]
18
Conclusions
• Actively Transfer Domain Knowledge
– Reduce domain difference risk: transfer useful knowledge (Theorem 2)
– Reduce labeling cost: query domain experts only when necessary (Theorem 3)