Semi-supervised Learning
Rong Jin
Semi-supervised learning
Label propagation Transductive learning Co-training Active learning
Label Propagation: A Toy Problem
Each node in the graph is an example. Two examples are labeled; most examples are unlabeled.
Compute the similarity S_ij between examples.
Connect each example to its most similar examples.
How to predict labels for the unlabeled nodes using this graph?
[Figure: two labeled examples, unlabeled examples, edge weights w_ij]
Label Propagation: Forward Propagation
Propagate labels forward from the labeled nodes, one step at a time, repeating the forward-propagation step across the graph.
How to resolve conflicting cases: what label should be given to a node that receives propagated labels from both labeled examples?
Label Propagation
Let S be the similarity matrix, S = [S_ij]_{n×n}.
Let D be a diagonal matrix with D_ii = Σ_j S_ij.
Compute the normalized similarity matrix S' = D^(-1/2) S D^(-1/2).
Let Y be the initial assignment of class labels: Y_i = 1 when the i-th node is assigned to the positive class, Y_i = -1 when it is assigned to the negative class, and Y_i = 0 when the i-th node is not initially labeled.
Let F be the predicted class labels: the i-th node is assigned to the positive class if F_i > 0 and to the negative class if F_i < 0.
Label Propagation
One iteration: F = Y + αS'Y = (I + αS')Y, where α weights the propagated values.
Two iterations: F = Y + αS'Y + α²S'²Y = (I + αS' + α²S'²)Y.
In the limit of infinitely many iterations: F = (Σ_{k=0}^∞ α^k S'^k) Y = (I − αS')^(-1) Y. The geometric series converges because the eigenvalues of S' lie in [-1, 1] and 0 < α < 1.
Any problems with such an approach?
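The closed-form limit can be checked numerically. Below is a minimal sketch; the 5-node path graph and the choice α = 0.5 are illustrative assumptions, not data from the slides:

```python
import numpy as np

# Toy graph: a 5-node path, node 0 labeled +1, node 4 labeled -1.
S = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
Y = np.array([1.0, 0.0, 0.0, 0.0, -1.0])

# Normalized similarity S' = D^(-1/2) S D^(-1/2)
d = S.sum(axis=1)
S_norm = np.diag(d ** -0.5) @ S @ np.diag(d ** -0.5)

alpha = 0.5  # propagation weight, 0 < alpha < 1 so the series converges
n = len(Y)

# Infinite-iteration limit: F = (I - alpha * S')^(-1) Y
F = np.linalg.solve(np.eye(n) - alpha * S_norm, Y)

# Sanity check: truncate the geometric series after 50 explicit terms.
F_iter = sum((alpha ** k) * (np.linalg.matrix_power(S_norm, k) @ Y)
             for k in range(50))
```

By the symmetry of this toy graph, the middle node ends up exactly at 0 — one of the conflicting cases the slides ask about.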
Label Consistency Problem
The predicted vector F may not be consistent with the initially assigned class labels Y.
Energy Minimization
Using the same notation:
S_ij: similarity between the i-th node and the j-th node
Y: initially assigned class labels; F: predicted class labels
Energy: E(F) = Σ_{i,j} S_ij (F_i − F_j)²
Goal: find a label assignment F that is consistent with the labeled examples Y and meanwhile minimizes the energy function E(F).
Harmonic Function
E(F) = Σ_{i,j} S_ij (F_i − F_j)² = F^T (D − S) F = F^T L F, where L = D − S is the graph Laplacian.
Thus the minimizer of E(F) should satisfy (D − S)F = 0, and meanwhile F should be consistent with Y.
Partition F and Y into labeled and unlabeled parts, F^T = (F_l^T, F_u^T) and Y^T = (Y_l^T, Y_u^T), with the constraint F_l = Y_l, and partition the Laplacian accordingly:
L = D − S = [ L_ll  L_lu ]
            [ L_ul  L_uu ]
The unlabeled rows of (D − S)F = 0 give L_ul Y_l + L_uu F_u = 0, hence
F_u = − L_uu^(-1) L_ul Y_l
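The harmonic solution can be computed directly from the Laplacian blocks. A minimal numpy sketch, reusing an illustrative 5-node path graph with the two endpoints labeled (an assumption for demonstration, not a dataset from the slides):

```python
import numpy as np

# 5-node path graph; nodes 0 and 4 are labeled +1 and -1.
S = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
labeled, unlabeled = [0, 4], [1, 2, 3]
Y_l = np.array([1.0, -1.0])

L = np.diag(S.sum(axis=1)) - S  # graph Laplacian L = D - S

# Harmonic solution on the unlabeled nodes: F_u = -L_uu^(-1) L_ul Y_l
L_uu = L[np.ix_(unlabeled, unlabeled)]
L_ul = L[np.ix_(unlabeled, labeled)]
F_u = -np.linalg.solve(L_uu, L_ul @ Y_l)
print(F_u)  # linear interpolation along the path: [0.5, 0.0, -0.5]
```

Each unlabeled value equals the average of its neighbors' values — exactly the harmonic property LF = 0 on the unlabeled rows.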
Optical Character Recognition
Given an image of a handwritten digit, determine its value (e.g., 1 vs. 2).
Create a graph over the images of the digits.
Optical Character Recognition
#Labeled_Examples + #Unlabeled_Examples = 4000
CMN: label propagation (with class mass normalization)
1NN: for each unlabeled example, use the label of its closest neighbor
Spectral Graph Transducer
Problem with the harmonic function: why could this happen?
The condition (D − S)F = 0 does not hold for the constrained cases. With F_l fixed to Y_l, only the unlabeled rows can be made to vanish:
[ L_ll  L_lu ] [ Y_l ]   [ L_ll Y_l + L_lu F_u ]   [ ≠ 0 ]
[ L_ul  L_uu ] [ F_u ] = [ L_ul Y_l + L_uu F_u ] = [  0  ]
Spectral Graph Transducer
min_F F^T L F + c (F − Y)^T C (F − Y)
s.t. F^T F = n, F^T e = 0
C is the diagonal cost matrix: C_ii = 1 if the i-th node is initially labeled, and zero otherwise.
The parameter c controls the balance between the consistency requirement and the requirement of energy minimization.
The problem can be solved efficiently through the computation of eigenvectors.
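If we drop the two constraints F^T F = n and F^T e = 0, the remaining quadratic has the closed-form minimizer (L + cC) F = c C Y. The sketch below solves this simplified relaxation (not the full eigenvector-based solution) on an illustrative 5-node path graph:

```python
import numpy as np

# Illustrative 5-node path graph; nodes 0 and 4 carry initial labels +1 and -1.
S = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
Y = np.array([1.0, 0.0, 0.0, 0.0, -1.0])
C = np.diag((Y != 0).astype(float))  # C_ii = 1 iff node i is initially labeled
L = np.diag(S.sum(axis=1)) - S      # graph Laplacian

c = 10.0  # balance between label consistency and energy minimization
# Setting the gradient of F^T L F + c (F - Y)^T C (F - Y) to zero gives
#   2 L F + 2 c C (F - Y) = 0  =>  (L + c C) F = c C Y
F = np.linalg.solve(L + c * C, c * (C @ Y))
```

Unlike the hard-constrained harmonic solution, the labeled nodes here are only softly anchored: F[0] lands just below +1, trading a little consistency for lower energy.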
Empirical Studies
Green's Function
The problem of minimizing the energy while staying consistent with the initially assigned class labels can be formulated as a Green's function problem.
Minimizing E(F) = F^T L F leads to LF = 0. It turns out L can be viewed as the Laplacian operator in the discrete case, so LF = 0 corresponds to ∇²F = 0.
Thus our problem is to find a solution F with ∇²F = 0, subject to F = Y for the labeled examples.
We can treat the constraint F = Y on the labeled examples as a boundary condition (a Dirichlet boundary condition, since it fixes the values of F on the boundary), which gives a standard Green's function problem.
Why Energy Minimization?
E(Y) = Σ_{i=1}^n Σ_{j=1}^n w_ij (y_i − y_j)²
[Figure: final classification results]
Cluster Assumption
The decision boundary should pass through low-density areas.
Unlabeled data provide a more accurate estimate of the local density.
Cluster Assumption vs. Maximum Margin
A maximum margin classifier (e.g., SVM) places the decision boundary w·x + b = 0 so that the margin between the +1 and -1 examples is maximized.
Maximum margin implies low density around the decision boundary — which is exactly the cluster assumption.
Any thoughts about utilizing the unlabeled data in a support vector machine?
Transductive SVM
Decision boundary given a small number of labeled examples.
Transductive SVM
Decision boundary given a small number of labeled examples.
How will the decision boundary change given both labeled and unlabeled examples?
Transductive SVM
Decision boundary given a small number of labeled examples.
Move the decision boundary to a place with low local density.
Transductive SVM
Decision boundary given a small number of labeled examples.
Move the decision boundary to a place with low local density.
Classification results: how to formulate this idea?
Transductive SVM: Formulation
Labeled data: L = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}
Unlabeled data: D = {x_{n+1}, x_{n+2}, ..., x_{n+m}}
Maximum margin principle for the mixture of labeled and unlabeled data: for each label assignment of the unlabeled data, compute its maximum margin; find the label assignment whose maximum margin is maximized.
Transductive SVM
Different label assignments for the unlabeled data lead to different maximum margins.
Transductive SVM: Formulation

Original SVM:
{w*, b*} = argmin_{w,b} w·w
s.t. y_i (w·x_i + b) ≥ 1, i = 1, ..., n   (labeled examples)

Transductive SVM:
{w*, b*} = argmin_{y_{n+1},...,y_{n+m}} argmin_{w,b} w·w
s.t. y_i (w·x_i + b) ≥ 1, i = 1, ..., n   (labeled examples)
     y_j (w·x_j + b) ≥ 1, j = n+1, ..., n+m   (unlabeled examples)

The constraints for the unlabeled data introduce a binary variable y_j ∈ {−1, +1} for the label of each unlabeled example.
Computational Issue
This is no longer a convex optimization problem. (Why? The labels y_{n+1}, ..., y_{n+m} are discrete variables, so the joint problem is combinatorial.) How to optimize the transductive SVM? Alternating optimization.

{w*, b*} = argmin_{y_{n+1},...,y_{n+m}} argmin_{w,b} w·w
s.t. y_i (w·x_i + b) ≥ 1, i = 1, ..., n   (labeled examples)
     y_j (w·x_j + b) ≥ 1, j = n+1, ..., n+m   (unlabeled examples)
Alternating Optimization
Step 1: fix y_{n+1}, ..., y_{n+m}; learn the weights w (a standard SVM over the labeled examples and the currently assigned unlabeled labels).
Step 2: fix the weights w; predict y_{n+1}, ..., y_{n+m}. (How? E.g., set y_j = sign(w·x_j + b).)
Repeat the two steps on the same transductive SVM objective as above until the label assignment no longer changes.
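The alternating loop can be sketched end to end. The sketch below substitutes a regularized least-squares fit for the SVM training step — an assumption for brevity; a real implementation would solve the margin problem with an SVM solver — and runs on synthetic two-cluster data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two well-separated 2-D clusters, one labeled point per class.
X = np.vstack([
    rng.normal(loc=[+2.0, +2.0], scale=0.5, size=(20, 2)),   # class +1
    rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2)),   # class -1
])
y_true = np.array([1.0] * 20 + [-1.0] * 20)
labeled = np.array([0, 20])                    # indices of the labeled examples
unlabeled = np.setdiff1d(np.arange(40), labeled)

def fit_linear(X, y, reg=1e-2):
    """Regularized least squares as a stand-in for the SVM step (step 1)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # absorb the bias b
    return np.linalg.solve(Xb.T @ Xb + reg * np.eye(Xb.shape[1]), Xb.T @ y)

def predict(X, w):
    return np.sign(np.hstack([X, np.ones((len(X), 1))]) @ w)

# Initialize the unlabeled labels from the labeled examples alone.
y = y_true.copy()
y[unlabeled] = predict(X[unlabeled], fit_linear(X[labeled], y_true[labeled]))

for _ in range(10):
    w = fit_linear(X, y)                       # step 1: fix labels, learn w
    pred = predict(X[unlabeled], w)            # step 2: fix w, relabel
    if np.array_equal(pred, y[unlabeled]):
        break                                  # label assignment converged
    y[unlabeled] = pred

accuracy = np.mean(y[unlabeled] == y_true[unlabeled])
```

The loop converges quickly here because the clusters are well separated; with overlapping classes, alternating optimization can get stuck in poor local optima, which is exactly the non-convexity issue raised above.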
Empirical Study with Transductive SVM
10 categories from the Reuters collection
3,299 test documents
1,000 informative words selected using the mutual information (MI) criterion
Co-training for Semi-supervised Learning
Consider the task of classifying web pages into two categories: a category for students and a category for professors.
Two aspects of a web page should be considered:
Content of the web page: "I am currently a second-year Ph.D. student ..."
Hyperlinks: "My advisor is ...", "Students: ..."
Co-training for Semi-Supervised Learning
It is easy to classify the type of this web page based on its content.
It is easier to classify this web page using hyperlinks.
Co-training
Two representations for each web page:
Content representation: (doctoral, student, computer, university, ...)
Hyperlink representation: Inlinks: Prof. Cheng; Outlinks: Prof. Cheng
Co-training: Classification Scheme
1. Train a content-based classifier using labeled web pages
2. Apply the content-based classifier to classify unlabeled web pages
3. Label the web pages that have been confidently classified
4. Train a hyperlink-based classifier using the web pages that are initially labeled or were labeled by the content-based classifier
5. Apply the hyperlink-based classifier to classify the unlabeled web pages
6. Label the web pages that have been confidently classified
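The six-step scheme above can be sketched as a loop over two views. The data, the nearest-centroid "classifiers", and the confidence threshold below are all illustrative assumptions, not the classifiers used in the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "web pages" with two redundant views (content and hyperlinks):
# class +1 centered at (+2, +2) in each view, class -1 at (-2, -2).
n = 60
y_true = np.array([1.0] * (n // 2) + [-1.0] * (n // 2))
views = [rng.normal(loc=2.0 * y_true[:, None], scale=0.8, size=(n, 2))
         for _ in range(2)]            # views[0] = content, views[1] = links

y = np.zeros(n)
y[[0, n // 2]] = y_true[[0, n // 2]]   # one labeled page per class

def centroid_scores(view, y):
    """Stand-in per-view classifier: signed margin between class centroids."""
    c_pos = view[y == 1].mean(axis=0)
    c_neg = view[y == -1].mean(axis=0)
    return (np.linalg.norm(view - c_neg, axis=1)
            - np.linalg.norm(view - c_pos, axis=1))

for _ in range(20):
    changed = False
    for view in views:                 # alternate content / hyperlink view
        scores = centroid_scores(view, y)
        confident = (y == 0) & (np.abs(scores) > 2.0)  # "confidently classified"
        if confident.any():
            y[confident] = np.sign(scores[confident])  # feed the other view
            changed = True
    if not changed:
        break

coverage = np.mean(y != 0)
accuracy = np.mean(y[y != 0] == y_true[y != 0])
```

The key design point is that each view's confident predictions become training labels for the other view, so the two classifiers bootstrap each other from just two labeled pages.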
Co-training
Train a content-based classifier.
Co-training
Train a content-based classifier using labeled examples.
Label the unlabeled examples that are confidently classified.
Co-training
Train a content-based classifier using labeled examples.
Label the unlabeled examples that are confidently classified.
Train a hyperlink-based classifier (Prof.: outlinks to students).
Co-training
Train a content-based classifier using labeled examples.
Label the unlabeled examples that are confidently classified.
Train a hyperlink-based classifier (Prof.: outlinks to students).
Label the unlabeled examples that are confidently classified.
Co-training
Train a content-based classifier using labeled examples.
Label the unlabeled examples that are confidently classified.
Train a hyperlink-based classifier (Prof.: outlinks to ...).
Label the unlabeled examples that are confidently classified.