![Page 1: Actively Transfer Domain Knowledge Xiaoxiao Shi Wei Fan Jiangtao Ren Sun Yat-sen University IBM T. J. Watson Research Center Transfer when you can, otherwise](https://reader036.vdocuments.us/reader036/viewer/2022062318/551515aa55034673228b4bc1/html5/thumbnails/1.jpg)
Actively Transfer Domain Knowledge
Xiaoxiao Shi† Wei Fan‡ Jiangtao Ren†
†Sun Yat-sen University ‡IBM T. J. Watson Research Center
Transfer when you can, otherwise ask and don’t stretch it
Standard Supervised Learning
• Training (labeled): New York Times
• Test (unlabeled): New York Times
• Classifier accuracy: 85.5%
In Reality…
• Training (labeled): New York Times, but the labeled data are insufficient!
• Test (unlabeled): New York Times
• Accuracy: 47.3%
How can the performance be improved?
Solution I: Active Learning
• Training (labeled): New York Times, with labels obtained from a domain expert
• Test (unlabeled): New York Times
• Classifier accuracy: 83.4%
• Drawback: labeling cost ($)
Solution II: Transfer Learning
• Out-of-domain training (labeled): Reuters
• In-domain test (unlabeled): New York Times
• Transfer classifier accuracy: 82.6% or 43.5%? Accuracy drops when the domains differ significantly.
There is no guarantee that transfer learning will help!
Motivation
• Active Learning: labeling cost
• Transfer Learning: domain difference risk
Both have disadvantages; which one should we choose?
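Proposed Solution (AcTraK)
• A transfer classifier is trained with the labeled out-of-domain training data (Reuters).
• The active learner chooses instances from the unlabeled in-domain training data.
• For each chosen instance, the decision function judges the transfer classifier's prediction:
  – Unreliable: the domain expert labels the instance.
  – Reliable: the transfer classifier labels the instance.
• The labeled instances form the training set of the classifier, which is applied to the test data to produce the classification result.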
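The workflow above can be sketched as a labeling loop. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the pool is assumed to be already ordered by the active learner, `should_ask_expert` stands in for the decision function, `ask_expert` stands in for the domain expert, and the transfer classifier's label is assumed to be "+" when T(x) ≥ 0.5. All function names are hypothetical.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, str]  # (instance, label) with label in {"+", "-"}


def actrak_label_pool(
    pool: List[float],
    transfer_prob: Callable[[float], float],                   # T(x) = P(+ | x)
    should_ask_expert: Callable[[float, List[Example]], bool],  # hypothetical decision function
    ask_expert: Callable[[float], str],                         # hypothetical domain-expert oracle
) -> Tuple[List[Example], int]:
    """Label an unlabeled in-domain pool, AcTraK-style (sketch).

    Returns the newly labeled examples and the number of expert queries.
    """
    labeled: List[Example] = []
    queries = 0
    # Assumption: the pool is already in the order chosen by the active learner.
    for x in pool:
        if should_ask_expert(x, labeled):
            # "Unreliable": pay for a label from the domain expert.
            y = ask_expert(x)
            queries += 1
        else:
            # "Reliable": take the transfer classifier's label (0.5 threshold assumed).
            y = "+" if transfer_prob(x) >= 0.5 else "-"
        labeled.append((x, y))
    return labeled, queries
```

The returned list would then serve as the training set of the in-domain classifier; only the instances routed to the expert add to the labeling cost.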
Transfer Classifier
X: an unlabeled in-domain instance
1. Classify X by the out-of-domain classifier Mo: P(L+|X, Mo) and P(L-|X, Mo).
2. Classify X by the mapping classifiers ML+ and ML-: P(+|X, ML+) and P(+|X, ML-).
3. Then the probability for X to be "+" is:
$$T(X) = P(+\mid X) = P(L^+\mid X, M_o)\times P(+\mid X, M_{L^+}) + P(L^-\mid X, M_o)\times P(+\mid X, M_{L^-})$$
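A worked instance of step 3, using made-up probabilities for illustration only, and assuming a binary Mo so that P(L-|X, Mo) = 1 − P(L+|X, Mo). With P(L+|X, Mo) = 0.8, P(+|X, ML+) = 0.9 and P(+|X, ML-) = 0.3:

$$
\begin{aligned}
T(X) &= P(L^+\mid X, M_o)\times P(+\mid X, M_{L^+}) + P(L^-\mid X, M_o)\times P(+\mid X, M_{L^-})\\
&= 0.8 \times 0.9 + 0.2 \times 0.3 = 0.72 + 0.06 = 0.78
\end{aligned}
$$

With a 0.5 threshold (an assumption), the transfer classifier would label this X as "+".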
Training
• Mo is trained on the labeled out-of-domain dataset.
• Mo splits the (very few) labeled in-domain examples into L+ = {(x, y) | Mo(x) = 'L+'} and L- = {(x, y) | Mo(x) = 'L-'}, where y ∈ {+, -}: the true in-domain label of an example in L+ may be either '-' or '+'.
• ML+ is trained on L+ and ML- on L-, so together they map Mo's out-of-domain labels to in-domain labels (+/L+, +/L-, -/L+, -/L-).
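The training and prediction steps of the transfer classifier can be sketched as follows. The slides do not name the base learners, so `train_centroid` is a hypothetical stand-in (a 1-D nearest-centroid model with a logistic score that falls back to a smoothed class frequency when a class is missing). It is also assumed that Mo(x) = 'L+' means P(L+|x, Mo) ≥ 0.5 and that P(L-|x, Mo) = 1 − P(L+|x, Mo). This is a sketch, not the authors' code.

```python
import math
from typing import Callable, Sequence, Tuple

Example = Tuple[float, str]           # (1-D feature, label)
ProbModel = Callable[[float], float]  # x -> probability of the "positive" class


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_centroid(data: Sequence[Example], positive: str) -> ProbModel:
    """Hypothetical stand-in learner: 1-D nearest centroid with a logistic score.

    If one class is absent, return the Laplace-smoothed frequency of `positive`.
    """
    pos = [x for x, y in data if y == positive]
    neg = [x for x, y in data if y != positive]
    if not pos or not neg:
        p = (len(pos) + 1) / (len(data) + 2)
        return lambda x: p
    c_pos, c_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    # Score is positive when x is closer to the positive centroid.
    return lambda x: _sigmoid((x - c_neg) ** 2 - (x - c_pos) ** 2)


def build_transfer_classifier(
    out_domain: Sequence[Example],  # labeled out-of-domain data, labels "L+" / "L-"
    in_domain: Sequence[Example],   # very few labeled in-domain examples, labels "+" / "-"
) -> ProbModel:
    m_o = train_centroid(out_domain, positive="L+")  # P(L+ | x, Mo)
    # Split the in-domain examples by Mo's prediction; their true labels may be "+" or "-".
    l_plus = [(x, y) for x, y in in_domain if m_o(x) >= 0.5]
    l_minus = [(x, y) for x, y in in_domain if m_o(x) < 0.5]
    m_lp = train_centroid(l_plus, positive="+")   # P(+ | x, ML+)
    m_lm = train_centroid(l_minus, positive="+")  # P(+ | x, ML-)

    def t(x: float) -> float:
        p_lp = m_o(x)
        # T(x) = P(L+|x,Mo) P(+|x,ML+) + P(L-|x,Mo) P(+|x,ML-)
        return p_lp * m_lp(x) + (1.0 - p_lp) * m_lm(x)

    return t
```

In a toy setting where Mo's 'L+' region holds only in-domain '-' examples (and vice versa), the mapping classifiers learn the flip, so T(x) falls below 0.5 where Mo alone would say 'L+'.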
Decision Function
When the transfer classifier's prediction is unreliable, ask the domain experts.
• Ask the domain expert, rather than the transfer classifier, to label the instance in the following cases:
  a) Conflict
  b) Low confidence
  c) Few labeled in-domain examples
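The decision function combines three questions: a) Conflict? b) Confidence? c) Size?
Decision function: draw R, a random number in [0, 1]; the instance is labeled by the domain expert if R falls below the query probability, and by the transfer classifier otherwise.
AcTraK asks the domain expert to label the instance with a probability that depends on these criteria, where:
• T(x): the prediction by the transfer classifier
• ML(x): the prediction given by the in-domain classifier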
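The randomized rule can be sketched in a few lines. The slide's formula for the query probability is not reproduced in this text, so the sketch takes it as an input (`query_prob`, a hypothetical parameter); only the comparison with the random number R is shown. Function and parameter names are illustrative.

```python
import random


def decide(query_prob: float, rng: random.Random) -> str:
    """Return who labels the instance, given the query probability.

    `query_prob` stands in for the decision function's output (based on
    conflict, confidence and size), which is not computed here.
    """
    if not 0.0 <= query_prob <= 1.0:
        raise ValueError("query_prob must lie in [0, 1]")
    r = rng.random()  # R: uniform random number in [0, 1)
    # P(r < query_prob) == query_prob, so the expert is asked with exactly that probability.
    return "domain expert" if r < query_prob else "transfer classifier"
```

With `query_prob = 0.3`, roughly 30% of such instances go to the expert; summing the query probabilities over a pool gives the expected number of expert queries, which is the labeling cost the rule trades against reliability.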
Properties
• It can reduce the domain difference risk: according to Theorem 2, the expected error is bounded.
• It can reduce the labeling cost: according to Theorem 3, the query probability is bounded.
Theorems
• Expected error of the transfer classifier
• Maximum size
Experiment Setup
• Data sets
  – Synthetic data sets
  – Remote sensing: training data collected from regions with a specific ground-surface condition; test data collected from a new region
  – Text classification (20 Newsgroups): the same top-level classification problems, with different sub-fields in the training and test sets
• Compared models
  – Inductive learning models: AdaBoost, SVM
  – Transfer learning model: TrAdaBoost (ICML'07)
  – Active learning model: ERS (ICML'01)
Experiments on Synthetic Datasets
• In-domain: 2 labeled for training, plus test data
• Out-of-domain: 4 labeled for training
Experiments on Real-World Datasets
Evaluation metrics:
• Compared with transfer learning on accuracy.
• Compared with active learning on IEA (Integral Evaluation on Accuracy).
Results on 20 Newsgroups
1. Comparison with the transfer learner
[Figure: Accuracy Comparison; x-axis: Datasets (1 to 6); y-axis: Accuracy; series: SVM, TrAdaBoost, AcTraK]
2. Comparison with the active learner ERS
[Figure: IEA(AcTraK, ERS, 250); x-axis: Datasets (1 to 6); y-axis: IEA]
Conclusions
• Actively Transfer Domain Knowledge
  – Reduce domain difference risk: transfer useful knowledge (Theorem 2)
  – Reduce labeling cost: query domain experts only when necessary (Theorem 3)