Learning Structural SVMs with Latent Variables
Chun-Nam Yu
Dept. of Computer Science, Cornell University
October 8-9, IBM SMiLe Workshop
C.-N. Yu (Cornell) Latent Structural SVMs Oct 8-9, IBM SMiLe Workshop 1 / 21
Structured Output Prediction

Traditional classification and regression
Structured output prediction
Introduction to Structural SVMs

Structural SVM (margin rescaling) [Tsochantaridis et al. '04]

$$\min_{\vec{w},\,\vec{\xi}} \;\; \frac{1}{2}\|\vec{w}\|^2 + C\sum_{i=1}^{n}\xi_i$$

$$\text{s.t. for } 1 \le i \le n, \text{ for all output structures } y \in \mathcal{Y}: \quad \vec{w}\cdot\Phi(x_i, y_i) - \vec{w}\cdot\Phi(x_i, y) \;\ge\; \Delta(y_i, y) - \xi_i$$

Intuition: the score of the correct parse tree, $\vec{w}\cdot\Phi(x_i, y_i)$, must beat the score of every wrong parse tree $\vec{w}\cdot\Phi(x_i, y)$ by a margin.

The loss function $\Delta$ controls the penalty of predicting $y$ instead of $y_i$.
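As a concrete illustration of these constraints (not from the talk: the class-conditional feature map and the 0/1 loss below are illustrative assumptions), the smallest slack $\xi_i$ satisfying all margin-rescaling constraints for one example is the structural hinge loss:

```python
import numpy as np

def joint_feature(x, y, n_classes):
    # Class-conditional copy of x: multiclass prediction viewed as the
    # simplest structured output (illustrative choice of Phi).
    phi = np.zeros(n_classes * len(x))
    phi[y * len(x):(y + 1) * len(x)] = x
    return phi

def slack(w, x, y_true, n_classes, delta):
    # Smallest xi with w.Phi(x,y_true) - w.Phi(x,y) >= delta(y_true,y) - xi
    # for every output y, i.e. the structural hinge loss of this example.
    correct = w @ joint_feature(x, y_true, n_classes)
    return max(delta(y_true, y) - correct + w @ joint_feature(x, y, n_classes)
               for y in range(n_classes))
```

With the 0/1 loss, a weight vector that separates the example with margin at least 1 gives slack 0; the slack grows as competing outputs score closer to the correct one.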
Solving Margin-based Training Problems with the Cutting-Plane Algorithm

Exponentially many constraints, but solvable in polynomial time:
- using the cutting-plane algorithm to speed up training of structural SVMs [Joachims, Finley & Yu, MLJ '09]
- using approximate cutting-plane models to build faster and sparser kernel SVMs [Yu & Joachims, KDD '08], [Joachims & Yu, ECML '09; Best Machine Learning Paper]
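The working-set idea can be sketched as follows. This is a minimal illustration, not the MLJ '09 implementation: the separation oracle finds the most-violated constraint per example, and a plain subgradient loop stands in for the QP solver over the working set.

```python
import numpy as np

def most_violated(w, x, y_true, labels, phi, delta):
    # Separation oracle: the output maximizing delta(y_true,y) + w.Phi(x,y).
    return max(labels, key=lambda y: delta(y_true, y) + w @ phi(x, y))

def cutting_plane_train(data, labels, phi, delta, C=1.0, eps=1e-3,
                        rounds=50, inner_steps=200, lr=0.05):
    w = np.zeros_like(phi(*data[0]))
    working = set()          # working set of cuts: (example index, wrong output)
    for _ in range(rounds):
        added = 0
        for i, (x, y) in enumerate(data):
            ybar = most_violated(w, x, y, labels, phi, delta)
            violation = delta(y, ybar) - w @ (phi(x, y) - phi(x, ybar))
            if violation > eps and (i, ybar) not in working:
                working.add((i, ybar))   # add the new cutting plane
                added += 1
        if added == 0:       # no constraint violated by more than eps: done
            break
        for _ in range(inner_steps):     # re-solve over the working set only
            grad = w.copy()              # gradient of (1/2)||w||^2
            for i, (x, y) in enumerate(data):
                cuts = [yb for j, yb in working if j == i]
                if not cuts:
                    continue
                # most violated cut currently in the working set for example i
                yb = max(cuts, key=lambda v: delta(y, v) - w @ (phi(x, y) - phi(x, v)))
                if delta(y, yb) - w @ (phi(x, y) - phi(x, yb)) > 0:
                    grad -= C * (phi(x, y) - phi(x, yb))
            w -= lr * grad
    return w
```

The key property (from the MLJ '09 analysis) is that only a polynomial number of cuts is ever needed, even though the full constraint set is exponentially large.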
Incomplete Label Information and Latent Variables

Discriminative motif finding
Noun Phrase Coreference
Latent Structural Support Vector Machines

Latent Structural SVM [Yu & Joachims, ICML '09]

$$\min_{\vec{w},\,\vec{\xi}} \;\; \frac{1}{2}\|\vec{w}\|^2 + C\sum_{i=1}^{n}\xi_i$$

$$\text{s.t. for } 1 \le i \le n, \text{ for all outputs } y \in \mathcal{Y}: \quad \max_{h\in\mathcal{H}} \vec{w}\cdot\Phi(x_i, y_i, h) - \max_{h\in\mathcal{H}} \vec{w}\cdot\Phi(x_i, y, h) \;\ge\; \Delta(y_i, y, h) - \xi_i$$

Intuition: the best completion (over latent variables $h$) of the correct output $y_i$ must score higher than the best completion of any wrong output $y$, by a margin.
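The corresponding prediction rule maximizes jointly over outputs and latent variables, then discards $h$. A minimal sketch (brute-force enumeration over small finite $\mathcal{Y}$ and $\mathcal{H}$, and the toy feature map in the usage below, are illustrative assumptions):

```python
import numpy as np
from itertools import product

def predict(w, x, outputs, latents, phi):
    # Prediction rule of a latent structural SVM:
    # (y*, h*) = argmax over (y, h) of w.Phi(x, y, h); only y* is reported.
    y_star, h_star = max(product(outputs, latents),
                         key=lambda yh: w @ phi(x, yh[0], yh[1]))
    return y_star
```

In real applications the joint argmax is computed by a combinatorial inference routine (e.g. dynamic programming) rather than explicit enumeration.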
Solving the Non-Convex Optimization

Concave-Convex Procedure [Yuille & Rangarajan '03]
1. Decompose the objective into convex and concave parts
2. Upper-bound the concave part with a hyperplane
3. Minimize the resulting convex sum; iterate until convergence

Recent work employing the CCCP algorithm: [Collobert et al. '06, Smola et al. '05, Chapelle et al. '08]
Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP)
(1) Decompose the objective into convex and concave parts:

$$\underbrace{\left[\frac{1}{2}\|\vec{w}\|^2 + C\sum_{i=1}^{n}\max_{(y,h)\in\mathcal{Y}\times\mathcal{H}}\left[\vec{w}\cdot\Phi(x_i, y, h) + \Delta(y_i, y, h)\right]\right]}_{\text{convex}} \;-\; \underbrace{\left[C\sum_{i=1}^{n}\max_{h\in\mathcal{H}}\vec{w}\cdot\Phi(x_i, y_i, h)\right]}_{\text{concave}}$$
Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP)
(2) Upper-bound the concave part with a hyperplane at $\vec{w}_t$:

$$\forall\vec{w},\quad -\underbrace{\left[C\sum_{i=1}^{n}\max_{h\in\mathcal{H}}\vec{w}\cdot\Phi(x_i, y_i, h)\right]}_{\text{concave}} \;\le\; -\underbrace{\left[C\sum_{i=1}^{n}\vec{w}\cdot\Phi(x_i, y_i, h_i^*)\right]}_{\text{linear}}$$

where $h_i^* = \operatorname{argmax}_{h\in\mathcal{H}} \vec{w}_t\cdot\Phi(x_i, y_i, h)$.
Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP)
(3) Minimize the resulting convex sum to get $\vec{w}_{t+1}$:

$$\vec{w}_{t+1} = \operatorname*{argmin}_{\vec{w}} \underbrace{\left[\frac{1}{2}\|\vec{w}\|^2 + C\sum_{i=1}^{n}\max_{(y,h)\in\mathcal{Y}\times\mathcal{H}}\left[\vec{w}\cdot\Phi(x_i, y, h) + \Delta(y_i, y, h)\right]\right]}_{\text{convex}} \;-\; \underbrace{\left[C\sum_{i=1}^{n}\vec{w}\cdot\Phi(x_i, y_i, h_i^*)\right]}_{\text{linear}}$$
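The three steps above can be sketched as a loop. This is an illustrative sketch, not the ICML '09 implementation: the loss is simplified to depend on $(y_i, y)$ only, and a subgradient loop stands in for a full structural-SVM solver in step (3).

```python
import numpy as np
from itertools import product

def cccp_train(data, outputs, latents, phi, delta,
               iters=10, inner_steps=400, lr=0.05, C=1.0):
    w = np.zeros_like(phi(data[0][0], data[0][1], latents[0]))
    for _ in range(iters):
        # Step (2): impute h_i* = argmax_h w_t.Phi(x_i, y_i, h),
        # which linearizes (upper-bounds) the concave part at w_t.
        h_star = [max(latents, key=lambda h: w @ phi(x, y, h)) for x, y in data]
        # Step (3): minimize the convex sum with the h_i* held fixed.
        for _ in range(inner_steps):
            grad = w.copy()              # gradient of (1/2)||w||^2
            for (x, y), hs in zip(data, h_star):
                # Loss-augmented inference over both y and h.
                yb, hb = max(product(outputs, latents),
                             key=lambda yh: w @ phi(x, yh[0], yh[1]) + delta(y, yh[0]))
                grad += C * (phi(x, yb, hb) - phi(x, y, hs))
            w -= lr * grad
    return w
```

Each outer iteration re-imputes the latent variables with the current weights and then solves a standard (convex) structural-SVM problem, so any existing structural-SVM solver can be reused for the inner step.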
Analogy to Expectation-Maximization

E-step: equivalent to computing the upper-bounding hyperplane
M-step: equivalent to minimizing the convex sum

Point estimate for latent variables; no normalization with a partition function required.
Discriminative probabilistic models with latent variables: [Gunawardana et al. '05], [Wang et al. '06], [Petrov & Klein '07]
Noun Phrase Coreference

Input x: noun phrases with edge features
Label y: clusters of noun phrases
Latent variable h: 'strong' links as trees
Task: cluster the noun phrases using single-link agglomerative clustering
Inference: minimum spanning tree

[from Cardie & Wagstaff '99]
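The tree inference step can be sketched with Kruskal's algorithm run on edge scores in descending order (the highest-scoring tree of links, equivalent to a minimum spanning tree on negated scores); the `scored_edges` triples in the usage are made-up numbers:

```python
def best_spanning_tree(n_phrases, scored_edges):
    # Kruskal's algorithm over edges sorted by descending score: the latent
    # variable h is the highest-scoring spanning tree of 'strong' links.
    # scored_edges: list of (score, u, v) over noun-phrase indices.
    parent = list(range(n_phrases))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for score, u, v in sorted(scored_edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:                       # keep the edge if it joins two components
            parent[ru] = rv
            tree.append((u, v))
    return tree
```

The resulting tree plays the role of $h$; cutting its weak links recovers the clusters $y$ produced by single-link agglomerative clustering.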
Noun Phrase Coreference: Results

Test on MUC 6 data, using the same features as in [Ng & Cardie '02]
Initialize spanning trees by chronological order

10-fold CV results:

| Algorithm | MITRE loss |
| --- | --- |
| SVMcluster [Finley & Joachims '05] | 41.3 |
| Latent Structural SVM | 35.6 |
Discriminative Motif Finding

Input x: DNA sequences containing ARS from S. cerevisiae and S. kluyveri
Label y: whether the sequence replicates in S. cerevisiae
Latent variable h: position of the motif
Task: find the predictive motif
Inference: enumerate all positions h
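Since $h$ ranges over window positions, inference is a brute-force enumeration; the position-weight-matrix style scoring below is an illustrative assumption, not the feature map used in the talk:

```python
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def best_motif_position(w, seq, width):
    # Latent-variable inference by enumeration: score every window position h
    # with a position-specific weight matrix w (shape: width x 4, columns
    # indexed by A, C, G, T) and return the argmax position.
    def window_score(h):
        return sum(w[j, BASES[seq[h + j]]] for j in range(width))
    return max(range(len(seq) - width + 1), key=window_score)
```

For a sequence of length L and motif width w this costs O(L·w) per example, which is why exact enumeration is feasible here.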
Discriminative Motif Finding: Results

Data: 197 yeast DNA sequences from S. cerevisiae and S. kluyveri; ~6000 intergenic sequences for background estimation
10-fold CV, 10 random restarts for each parameter setting

| Algorithm | Error rate |
| --- | --- |
| Gibbs Sampler (w=11) | 37.9% |
| Gibbs Sampler (w=17) | 35.06% |
| Latent Structural SVM (w=11) | 11.09% |
| Latent Structural SVM (w=17) | 12.00% |
![Page 49: Learning Structural SVMs with Latent VariablesLearning Structural SVMs with Latent Variables Chun-Nam Yu Dept. of Computer Science, Cornell University October 8-9, IBM SMiLe Workshop](https://reader034.vdocuments.us/reader034/viewer/2022042219/5ec4dd10e891f3349e4e6b2d/html5/thumbnails/49.jpg)
Conclusions and Future Directions
A new formulation of Latent Variable Structural SVM with an efficient solution algorithm

A modular algorithm that achieves very good accuracy on two example structured prediction tasks

Potential extensions to semi-supervised settings

Also looking at situations in structured output learning where unlabeled data in the output domain Y are plentiful
![Page 54: Learning Structural SVMs with Latent VariablesLearning Structural SVMs with Latent Variables Chun-Nam Yu Dept. of Computer Science, Cornell University October 8-9, IBM SMiLe Workshop](https://reader034.vdocuments.us/reader034/viewer/2022042219/5ec4dd10e891f3349e4e6b2d/html5/thumbnails/54.jpg)
Discriminative Motif Finding - Formulation

Feature vector Φ: position-specific weight matrix plus parameters for a Markov background model

$$\Phi(x, y, h) \;=\; \underbrace{\sum_{i=1}^{h} \phi_{BG}(x_i)}_{\text{background}} \;+\; \underbrace{\sum_{j=1}^{l} \phi^{(j)}_{PSM}(x_{h+j})}_{\text{motif}} \;+\; \underbrace{\sum_{i=h+l+1}^{n} \phi_{BG}(x_i)}_{\text{background}}$$

[from Wasserman 2004]

Loss function ∆: zero-one loss

Inference: enumeration, as y is binary and h is linear in sequence length
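Since y is binary and h ranges over motif start positions, inference can be carried out by direct enumeration. A minimal Python sketch under that setup; `w_bg` (per-nucleotide background weights) and `w_psm` (position-specific motif weights) are illustrative stand-ins, not names from the talk:

```python
# Inference by enumeration for the motif model: y = 0 means the whole
# sequence is background; y = 1 places a length-l motif at latent
# position h, with background on either side.

def score_background(w_bg, x):
    # Total background score of a (sub)sequence.
    return sum(w_bg[c] for c in x)

def infer(w_bg, w_psm, x, l):
    """Return (y, h) maximizing the joint score w . Phi(x, y, h)."""
    n = len(x)
    # y = 0: no motif, entire sequence explained by the background model.
    best_score, best = score_background(w_bg, x), (0, None)
    # y = 1: enumerate all motif placements h (linear in sequence length).
    for h in range(n - l + 1):
        s = score_background(w_bg, x[:h])                 # before motif
        s += sum(w_psm[j][x[h + j]] for j in range(l))    # motif positions
        s += score_background(w_bg, x[h + l:])            # after motif
        if s > best_score:
            best_score, best = s, (1, h)
    return best
```

With a toy model that rewards "GG" at any position, `infer` returns y = 1 with h at the motif start, and y = 0 when no placement beats the background score.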
![Page 57: Learning Structural SVMs with Latent VariablesLearning Structural SVMs with Latent Variables Chun-Nam Yu Dept. of Computer Science, Cornell University October 8-9, IBM SMiLe Workshop](https://reader034.vdocuments.us/reader034/viewer/2022042219/5ec4dd10e891f3349e4e6b2d/html5/thumbnails/57.jpg)
Noun Phrase Coreference - Formulation

Feature vector Φ: sum of tree edge features:

$$\Phi(x, y, h) = \sum_{(i,j) \in h} x_{ij}$$

Loss function ∆:

$$\Delta(y, \hat{y}, \hat{h}) = \underbrace{n(y)}_{\#\text{nodes}} - \underbrace{k(y)}_{\#\text{components}} + \underbrace{\sum_{(i,j) \in \hat{h}} \ell(y, (i,j))}_{+1/-1}$$

Inference: any maximum spanning tree algorithm
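Since inference here is just a maximum spanning tree over candidate edges, it can be sketched with Kruskal's algorithm and a small union-find. Edge scores below stand in for the learned edge weights w · x_ij; all names are illustrative:

```python
# Maximum spanning tree via Kruskal's algorithm: greedily add the
# highest-scoring edge that does not create a cycle, tracked with a
# union-find (disjoint-set) structure.

def max_spanning_tree(n, edges):
    """edges: list of (score, i, j); returns the set of tree edges (i, j)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = set()
    # Sort by descending score -> maximum (not minimum) spanning tree.
    for score, i, j in sorted(edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:             # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.add((i, j))
    return tree
```

Any other maximum spanning tree routine (e.g. Prim's algorithm) works equally well, as the slide notes.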
![Page 60: Learning Structural SVMs with Latent VariablesLearning Structural SVMs with Latent Variables Chun-Nam Yu Dept. of Computer Science, Cornell University October 8-9, IBM SMiLe Workshop](https://reader034.vdocuments.us/reader034/viewer/2022042219/5ec4dd10e891f3349e4e6b2d/html5/thumbnails/60.jpg)
Optimizing Precision@k

Input x: a query with an associated collection of documents

Label y: relevance judgments for each document

Latent variable h: the top k relevant documents
Query q: ICML 2009
![Page 61: Learning Structural SVMs with Latent VariablesLearning Structural SVMs with Latent Variables Chun-Nam Yu Dept. of Computer Science, Cornell University October 8-9, IBM SMiLe Workshop](https://reader034.vdocuments.us/reader034/viewer/2022042219/5ec4dd10e891f3349e4e6b2d/html5/thumbnails/61.jpg)
Optimizing Precision@k - Formulation
Feature vector Φ: sum of features from the top k documents

$$\Phi(x, y, h) = \sum_{j=1}^{k} x_{h_j}$$

Loss function ∆: one minus precision@k

$$\Delta(y, \hat{y}, \hat{h}) = 1 - \frac{1}{k} \sum_{j=1}^{k} [y_{\hat{h}_j} = 1]$$

Depends only on the top k documents selected by h

Inference: sorting
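Because the loss depends only on the k documents selected by h, inference reduces to sorting documents by score. A minimal Python sketch, with `scores` as a hypothetical stand-in for the per-document scores w · x_j:

```python
# Inference by sorting: the best latent h is simply the k
# highest-scoring documents.

def infer_top_k(scores, k):
    """Return indices h of the k highest-scoring documents."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]

def precision_at_k_loss(y, h, k):
    """One minus precision@k for relevance labels y and selection h."""
    return 1.0 - sum(y[j] == 1 for j in h) / k
```

For example, with scores [0.2, 0.9, 0.5, 0.7] and k = 2, inference selects documents 1 and 3; if only document 1 is relevant, the loss is 1 − 1/2 = 0.5.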
![Page 62: Learning Structural SVMs with Latent VariablesLearning Structural SVMs with Latent Variables Chun-Nam Yu Dept. of Computer Science, Cornell University October 8-9, IBM SMiLe Workshop](https://reader034.vdocuments.us/reader034/viewer/2022042219/5ec4dd10e891f3349e4e6b2d/html5/thumbnails/62.jpg)
Optimizing Precision@k - Results

OHSUMED dataset from the LETOR 3.0 benchmark

Initialize h with a weight vector trained on classification accuracy

5-fold CV results: