semi-supervised learning10701/slides/17_ssl.pdfsemi-supervised svm many others 9. notation 10....
TRANSCRIPT
![Page 1: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/1.jpg)
Semi-Supervised Learning
Barnabas Poczos
Slides Courtesy: Jerry Zhu, Aarti Singh
![Page 2: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/2.jpg)
Supervised Learning
Learning algorithm
Labeled
Goal:
Feature Space Label Space
Optimal predictor (Bayes Rule) depends on unknown PXY, so instead
learn a good prediction rule from training data
2
![Page 3: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/3.jpg)
Labeled and Unlabeled data
Human expert/
Special equipment/
Experiment
“Crystal” “Needle” “Empty”
Cheap and abundant ! Expensive and scarce !
“0” “1” “2” …
“Sports”
“News”
“Science”
…
3
![Page 4: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/4.jpg)
Free-of-cost labels?
Luis von Ahn: Games with a purpose (ReCaptcha)
Word challenging to OCR (Optical Character Recognition)
You provide a free label!
4
![Page 5: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/5.jpg)
Semi-Supervised learning
Supervised learning (SL)
Semi-Supervised learning (SSL)
Learning algorithm
Goal: Learn a better prediction rule than based on labeled data alone.
“Crystal”
5
![Page 6: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/6.jpg)
Semi-Supervised learning in Humans
6
![Page 7: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/7.jpg)
Can unlabeled data help?
Assume each class is a coherent group (e.g. Gaussian)
Then unlabeled data can help identify the boundary more accurately.
Positive labeled data
Negative labeled data
Unlabeled data
Supervised Decision Boundary
Semi-Supervised Decision Boundary
7
![Page 8: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/8.jpg)
Can unlabeled data help?
35
8
7
9
4
2
1
53
8
7
9
4
2
1
“0” “1” “2” …
“Similar” data points have “similar” labels8
This embedding can be done by manifold learning algorithms
![Page 9: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/9.jpg)
Some SSL Algorithms
▪ Self-Training
▪ Generative methods, mixture models
▪ Graph-based methods
▪ Co-Training
▪ Semi-supervised SVM
▪ Many others
9
![Page 10: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/10.jpg)
Notation
10
![Page 11: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/11.jpg)
Self-training
11
![Page 12: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/12.jpg)
Self-training Example
Propagating 1-NN
12
![Page 13: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/13.jpg)
![Page 14: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/14.jpg)
![Page 15: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/15.jpg)
Mixture Models for Labeled Data
15
![Page 16: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/16.jpg)
Mixture Models for Labeled Data
> 1/2<
Estimate the
parameters from the
labeled data
Decision for any test
point not in the labeled
dataset
16
![Page 17: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/17.jpg)
Mixture Models for Labeled Data
17
![Page 18: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/18.jpg)
Mixture Models for SSL Data
18
![Page 19: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/19.jpg)
Mixture Models
19
![Page 20: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/20.jpg)
Mixture ModelsSL vs SSL
20
![Page 21: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/21.jpg)
Mixture Models
21
![Page 22: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/22.jpg)
Gaussian Mixture Models
22
![Page 23: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/23.jpg)
EM for Gaussian Mixture Models
23
![Page 24: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/24.jpg)
Assumption for GMMs
24
![Page 25: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/25.jpg)
Assumption for GMMs
25
![Page 26: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/26.jpg)
Assumption for GMMs
26
![Page 27: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/27.jpg)
Related: Cluster and Label
27
![Page 28: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/28.jpg)
28
![Page 29: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/29.jpg)
Graph Based Methods
Assumption: Similar unlabeled data have similar labels.
29
![Page 30: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/30.jpg)
Graph RegularizationSimilarity Graphs: Model local neighborhood relations between data points
30
Assumption:
Nodes connected by heavy edges
tend to have similar label
![Page 31: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/31.jpg)
Graph Regularization
If data points i and j are similar (i.e. weight wij is large), then their labels
are similar fi = fj
Loss on labeled data
(mean square,0-1)
Graph based smoothness prior
on labeled and unlabeled data
31
![Page 32: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/32.jpg)
Co-training
![Page 33: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/33.jpg)
Co-training (Blum & Mitchell, 1998) (Mitchell, 1999) assumes that (i) features can be split into two sets; (ii) each sub-feature set is sufficient to train a good classifier.
• Initially two separate classifiers are trained with the labeled data, on the two sub-feature sets respectively.
• Each classifier then classifies the unlabeled data, and ‘teaches’ the other classifier with the few unlabeled examples (and the predicted labels) they feel most confident.
• Each classifier is retrained with the additional training examples given by the other classifier, and the process repeats.
33
Co-training Algorithm
![Page 34: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/34.jpg)
Co-training AlgorithmBlum & Mitchell’98
![Page 35: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/35.jpg)
Semi-Supervised SVMs
35
![Page 36: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled](https://reader033.vdocuments.us/reader033/viewer/2022043021/5f3cba9fdb9d7b72893bd564/html5/thumbnails/36.jpg)
Semi-Supervised Learning
▪ Generative methods
▪ Graph-based methods
▪ Co-Training
▪ Semi-Supervised SVMs
▪ Many other methods
SSL algorithms can use unlabeled data to help improve prediction accuracy if data satisfies appropriate assumptions
36