TRANSCRIPT
Page 1

Transductive Rademacher Complexity and its Applications

Ran El-Yaniv and Dmitry Pechyony
Technion – Israel Institute of Technology, Haifa, Israel
24.08.2007
Page 2: Induction vs. Transduction

Inductive learning: an unknown distribution $D$ over examples $(x, y)$ generates a training set $S_m \triangleq \{(x_i, y_i)\}_{i=1}^{m}$; the learning algorithm outputs a hypothesis $h$, which is used to label unlabeled examples. Goal: minimize $\mathbb{E}_{(x,y)\sim D}\{\ell(h(x), y)\}$.

Transductive learning (Vapnik '74, '98): the training set $S_m \triangleq \{(x_i, y_i)\}_{i=1}^{m}$ and the test set $X_u \triangleq \{x_i\}_{i=m+1}^{m+u}$ are given to the learning algorithm, which outputs labels of the test set. Goal: minimize $L_u \triangleq \mathbb{E}_{(x,y)\in X_u}\{\ell(h(x), y)\}$.
Page 3: Distribution-free Model [Vapnik '74, '98]

[Figure: a cloud of points, each marked X — the full sample.]

Given: "full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.
Page 4: Distribution-free Model [Vapnik '74, '98]

[Figure: the same point cloud, now split into two groups.]

Given: "full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.

Full sample is partitioned: training set ($m$ points), test set ($u$ points).
Page 5: Distribution-free Model [Vapnik '74, '98]

[Figure: the partitioned point cloud, with the training points now labeled.]

Given: "full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.

Full sample is partitioned: training set ($m$ points), test set ($u$ points).

Labels of the training examples are revealed.
Page 6: Distribution-free Model [Vapnik '74, '98]

[Figure: the same cloud with labeled training points marked X and unlabeled test points marked "?".]

Given: "full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.

Full sample is partitioned: training set ($m$ points), test set ($u$ points).

Labels of the training points are revealed. Goal: label the test examples.
Page 7: Rademacher complexity

Induction. Hypothesis space $F$: a set of functions $f : D \to \mathbb{R}$. $X_m$ — the training points. $\sigma = \{\sigma_i\}_{i=1}^{m}$ — i.i.d. random variables, $\Pr\{\sigma_i = 1\} = \Pr\{\sigma_i = -1\} = \tfrac{1}{2}$. Rademacher complexity:
$$R_m(F) = \frac{1}{m}\, \mathbb{E}_{X_m} \mathbb{E}_{\sigma} \Big\{ \sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(x_i) \Big\}.$$

Transduction (version 1). Hypothesis space $H$: a set of vectors $h$, $H \subseteq \mathbb{R}^{m+u}$. $X_{m+u}$ — the full sample with $m$ training and $u$ test points. $\sigma = \{\sigma_i\}_{i=1}^{m+u}$ — distributed as in induction. Rademacher complexity:
$$R_{m+u}(H) = \Big(\frac{1}{m} + \frac{1}{u}\Big) \cdot \mathbb{E}_{\sigma} \Big\{ \sup_{h \in H} \sum_{i=1}^{m+u} \sigma_i h_i \Big\}.$$
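For a small finite hypothesis space, the version-1 transductive complexity can be computed exactly by enumerating all $2^{m+u}$ sign vectors. A minimal sketch; the vectors in `H` are hypothetical values chosen only for illustration:

```python
import itertools

import numpy as np

m, u = 2, 2
n = m + u

# Toy finite hypothesis space: a few fixed vectors in R^{m+u}
# (hypothetical values, for illustration only).
H = np.array([
    [ 1.0, -1.0,  1.0,  1.0],
    [-1.0,  1.0,  1.0, -1.0],
    [ 0.5,  0.5, -0.5,  0.5],
])

# Version 1: sigma_i = +/-1, each with probability 1/2.
# Enumerate all 2^n sign vectors to get the exact expectation.
total = 0.0
for signs in itertools.product([-1.0, 1.0], repeat=n):
    sigma = np.array(signs)
    total += max(sigma @ h for h in H)      # sup over the finite H
R = (1.0 / m + 1.0 / u) * total / 2 ** n    # (1/m + 1/u) * E_sigma sup ...
print(f"R_(m+u)(H) = {R:.4f}")
```

For larger samples, enumeration is replaced by Monte Carlo sampling over $\sigma$, as discussed later in the talk.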
Page 8: Transductive Rademacher complexity

Version 1: $X_{m+u}$ — the full sample with $m$ training and $u$ test points. $H$ — the transductive hypothesis space. $\sigma = \{\sigma_i\}_{i=1}^{m+u}$ — i.i.d. random variables distributed by $D_1$: $\Pr\{\sigma_i = 1\} = \Pr\{\sigma_i = -1\} = \tfrac{1}{2}$. Rademacher complexity:
$$R_{m+u}(H, D_1) = \Big(\frac{1}{m} + \frac{1}{u}\Big) \cdot \mathbb{E}_{\sigma \sim D_1} \Big\{ \sup_{h \in H} \sum_{i=1}^{m+u} \sigma_i h_i \Big\}.$$

Version 2: a sparse distribution, $D_s$, of Rademacher variables:
$$\Pr\{\sigma_i = 1\} = \Pr\{\sigma_i = -1\} = \frac{mu}{(m+u)^2}, \qquad \Pr\{\sigma_i = 0\} = 1 - \frac{2mu}{(m+u)^2}.$$

Lemma 1: $R_{m+u}(H, D_s) \le R_{m+u}(H, D_1)$.

We develop risk bounds with $R_{m+u}(H) \triangleq R_{m+u}(H, D_s)$.
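The direction of Lemma 1 can be checked numerically by sampling $\sigma$ from both $D_1$ and the sparse $D_s$ over the same toy hypothesis set. A sketch, with `H` a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)
m, u = 3, 2
n = m + u
p = m * u / (m + u) ** 2          # Pr{sigma_i = +1} = Pr{sigma_i = -1} under D_s

# Toy hypothesis space (hypothetical vectors in R^{m+u}, illustration only).
H = np.array([
    [ 1.0,  1.0, -1.0,  1.0, -1.0],
    [-1.0,  1.0,  1.0,  1.0,  1.0],
    [ 0.5, -0.5,  0.5,  0.5, -0.5],
])

def complexity(sigmas):
    # (1/m + 1/u) * E_sigma sup_{h in H} sigma . h, estimated over the draws
    return (1.0 / m + 1.0 / u) * (sigmas @ H.T).max(axis=1).mean()

N = 200_000
s1 = rng.choice([-1.0, 1.0], size=(N, n))                            # D_1
ss = rng.choice([-1.0, 0.0, 1.0], size=(N, n), p=[p, 1 - 2 * p, p])  # D_s

R1, Rs = complexity(s1), complexity(ss)
print(f"R(H, D_1) ~ {R1:.3f}, R(H, D_s) ~ {Rs:.3f}")
```

With enough draws, the $D_s$ estimate comes out below the $D_1$ estimate, consistent with Lemma 1.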
Page 9: Risk bound

Notation: $L_u(h)$ — the 0/1 error of $h$ on the test examples $X_u$; $L^{\gamma}_m(h)$ — the empirical $\gamma$-margin error of $h$ on the training examples $S_m$.

Theorem: For any $\delta > 0$, $\gamma > 0$, with probability at least $1 - \delta$ over the random partition of the full sample $S_{m+u}$ into $(S_m, X_u)$, for all hypotheses $h \in H$ it holds that
$$L_u(h) \le L^{\gamma}_m(h) + \frac{1}{\gamma} R_{m+u}(H) + O\Big( \sqrt{ \Big(\frac{1}{m} + \frac{1}{u}\Big) \ln \frac{1}{\delta} } \Big).$$

Proof: based on and inspired by the results of [McDiarmid '89], [Bartlett and Mendelson '02] and [Meir and Zhang '03].

Previous results: [Lanckriet et al. '04] — the case of $m = u$.
Page 10: Inductive vs. transductive hypothesis spaces

Induction: to use the risk bounds, the hypothesis space must be defined before observing the training set.

Transduction: the hypothesis space can be defined after observing $X_{m+u}$, but before observing the actual partition $(S_m, X_u)$.

Conclusion: transduction allows choosing a data-dependent hypothesis space; for example, it can be optimized to have low Rademacher complexity. This cannot be done in induction!
Page 11: Another view on transductive algorithms

Unlabeled-Labeled Decomposition (ULD): given $(S_m, X_u)$ and $X_{m+u}$, the learner computes an $(m+u) \times r$ matrix $K$ and an $r \times 1$ vector $\alpha$, and outputs $h = K\alpha$.

Example: $K$ — the inverse of the graph Laplacian; $\alpha_i = y_i$ if $x_i \in S_m$; $\alpha_i = 0$ otherwise.
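The ULD example can be sketched numerically on a small graph. The slide's $K$ is the inverse of the graph Laplacian; since the Laplacian is singular, this sketch inverts a regularized version $L + \epsilon I$ (an assumption made here, not stated on the slide):

```python
import numpy as np

# Hypothetical 5-node chain graph; W is a similarity (adjacency) matrix.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian

# The Laplacian is singular, so invert a regularized version (assumption
# made for this sketch; the slide only says "inverse of graph Laplacian").
eps = 1e-2
K = np.linalg.inv(L + eps * np.eye(n))  # here r = m + u, so K is square

# ULD: alpha_i = y_i for training points, alpha_i = 0 for test points.
alpha = np.zeros(n)
alpha[0], alpha[4] = 1.0, -1.0          # two labeled endpoints, y = +1 and -1

h = K @ alpha                           # soft labels h = K alpha for all points
print(h)
```

The labels diffuse along the chain: points near the $+1$ endpoint get positive soft labels, points near the $-1$ endpoint get negative ones, and the middle point sits at zero by symmetry.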
Page 12: Bounding Rademacher complexity

Hypothesis space $H_A$: the set of all $h = K\alpha$ obtained by operating a transductive algorithm $A$ on all possible partitions $(S_m, X_u)$.

Notation: $T$ — the set of $\alpha$'s generated by $A$; $\mu = \sup_{\alpha \in T} \{\|\alpha\|_2\}$; $\{\omega_i\}_{i=1}^{r}$ — all singular values of $K$.

Lemma 2: $R_{m+u}(H_A) \le \mu \sqrt{ \dfrac{2}{mu} \sum_{i=1}^{r} \omega_i^2 }$.

Lemma 2 justifies the spectral transformations performed to improve the performance of transductive algorithms ([Chapelle et al. '02], [Joachims '03], [Zhang and Ando '05]).
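Given the singular values of $K$ and the coefficient vectors in $T$, the bound of Lemma 2 is straightforward to evaluate. A sketch with hypothetical stand-ins for $K$ and $T$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, u = 6, 4
n = m + u

# Hypothetical stand-ins: a kernel-like matrix K and a set T of coefficient
# vectors as some transductive algorithm A might produce them.
K = rng.standard_normal((n, n))
T = [rng.choice([-1.0, 0.0, 1.0], size=n) for _ in range(10)]

mu_T = max(np.linalg.norm(a) for a in T)      # mu = sup_{alpha in T} ||alpha||_2
omega = np.linalg.svd(K, compute_uv=False)    # all singular values of K

# Lemma 2: R_{m+u}(H_A) <= mu * sqrt( (2 / (m*u)) * sum_i omega_i^2 )
# (note: sum of squared singular values equals ||K||_F^2)
bound = mu_T * np.sqrt(2.0 / (m * u) * np.sum(omega ** 2))
print(f"upper bound on R_(m+u)(H_A): {bound:.3f}")
```

Shrinking the spectrum of $K$ (a spectral transformation) directly shrinks $\sum_i \omega_i^2$ and hence this bound, which is the point of Lemma 2.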
Page 13: Bounds for graph-based algorithms

Consistency Method [Zhou, Bousquet, Lal, Weston, Scholkopf '03]:
$$R_{m+u}(H_A) \le \sqrt{ \frac{2}{u} \sum_{i=1}^{m+u} \omega_i^2 },$$
where $\{\omega_i\}_{i=1}^{m+u}$ are the singular values of $K$.

Similar bounds hold for the algorithms of [Joachims '03], [Belkin et al. '04], etc.
Page 14: Topics not covered

- Bounding the Rademacher complexity when $K$ is a kernel matrix.
- For some algorithms: a data-dependent method of computing probabilistic upper and lower bounds on the Rademacher complexity.
- A risk bound for transductive mixtures.
Page 15: Directions for future research

- Tighten the risk bound to allow effective model selection: a bound depending on the 0/1 empirical error.
- Use variance information to obtain better convergence rates.
- Local transductive Rademacher complexity.
- Clever data-dependent choice of low-Rademacher hypothesis spaces.
Page 17: Monte Carlo estimation of transductive Rademacher complexity

Rademacher complexity:
$$R_{m+u}(H) = \Big(\frac{1}{m} + \frac{1}{u}\Big) \cdot \mathbb{E}_{\sigma} \Big\{ \sup_{h \in H} \sigma \cdot h \Big\}.$$

Draw uniformly $n$ vectors of Rademacher variables, $\sigma^{(1)}, \ldots, \sigma^{(n)}$. By the Hoeffding inequality: for any $\delta > 0$, with probability at least $1 - \delta$,
$$R_{m+u}(H) \le \Big(\frac{1}{m} + \frac{1}{u}\Big) \cdot \frac{1}{n} \sum_{i=1}^{n} \sup_{h \in H} \sigma^{(i)} \cdot h + O\Big( \sqrt{ \frac{1}{n} \ln \frac{1}{\delta} } \Big).$$

How to compute the supremum? For the Consistency Method of [Zhou et al. '03], $\sup_{h \in H} \sigma^{(i)} \cdot h$ can be computed in $O\big((m+u)^2\big)$ time.

The symmetric Hoeffding inequality yields a probabilistic lower bound on the transductive Rademacher complexity.
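The estimation procedure above can be sketched as follows; `H` is a hypothetical finite hypothesis set (on the slide it would come from an actual algorithm such as the Consistency Method), and the $O(\cdot)$ constant in the Hoeffding slack is taken as 1 purely for illustration:

```python
import math

import numpy as np

rng = np.random.default_rng(2)
m, u = 4, 4
N = m + u

# Hypothetical finite hypothesis set, a stand-in for illustration.
H = rng.choice([-1.0, 1.0], size=(6, N))

n = 5000        # number of Monte Carlo draws of sigma
delta = 0.05

sigmas = rng.choice([-1.0, 1.0], size=(n, N))
sups = (sigmas @ H.T).max(axis=1)       # sup_{h in H} sigma^(i) . h per draw

estimate = (1.0 / m + 1.0 / u) * sups.mean()
# Hoeffding slack; the O(.) constant is set to 1 for illustration only.
slack = math.sqrt(math.log(1.0 / delta) / n)
print(f"R_(m+u)(H) <= {estimate + slack:.3f} with prob. >= {1 - delta}")
```

For a finite `H` the supremum per draw is a simple max over rows; the slide's point is that for the Consistency Method the same supremum is computable in $O((m+u)^2)$ time even though $H$ is not enumerated.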
Page 18: Induction vs. Transduction: differences

| Induction | Transduction |
|---|---|
| Unknown underlying distribution. | No unknown distribution; each example has a unique label. |
| Test examples not known; will be sampled from the same distribution. | Test examples are known. |
| Generate a general hypothesis; want generalization! | Only classify the given examples; no generalization! |
| Independent training examples. | Dependent training and test examples. |
Page 19: Justification of spectral transformations

$T$ — the set of $\alpha$'s generated by $A$; $\mu = \sup_{\alpha \in T} \{\|\alpha\|_2\}$; $\{\omega_i\}_{i=1}^{r}$ — all singular values of $K$.

Lemma 2: $R_{m+u}(H_A) \le \mu \sqrt{ \dfrac{2}{mu} \sum_{i=1}^{r} \omega_i^2 }$.

Lemma 2 justifies the spectral transformations performed to improve the performance of transductive algorithms ([Chapelle et al. '02], [Joachims '03], [Zhang and Ando '05]).