Learning theory: generalization and VC dimension
Yifeng Tao, School of Computer Science, Carnegie Mellon University
Slides adapted from Eric Xing
Yifeng Tao, Carnegie Mellon University
Introduction to Machine Learning
Outline
o Computational learning theories
  o PAC framework
  o Agnostic framework
o VC dimension
[Table: PAC and agnostic frameworks, each for |H| finite and for |H| infinite but VC(H) finite]
Generalizability of Learning
o In machine learning, it is really the generalization error that we care about, but most learning algorithms fit their models to the training set.
o Why should doing well on the training set tell us anything about generalization error? Specifically, can we relate error on the training set to generalization error?
o Are there conditions under which we can actually prove that learning algorithms will work well?
o Lecture 1:
[Slide from Eric Xing]
What General Laws Constrain Inductive Learning?
o Want theory to relate:
  o Training examples: m
  o Complexity of hypothesis/concept space: H
  o Accuracy of approximation to target concept: ε
  o Probability of successful learning: 1 − δ
o All the results in O(…)
Prototypical concept learning task
o Binary classification
  o Everything we say here generalizes to other problems, including regression and multi-class classification.
o Given:
  o Instances X: Possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
  o Target function c: EnjoySport : X → {0, 1}
  o Hypothesis space H: Conjunctions of literals, e.g. (?, Cold, High, ?, ?, ¬EnjoySport).
  o Training examples S: iid positive and negative examples of the target function
    o (x1, c(x1)), ..., (xm, c(xm))
o Determine:
  o A hypothesis h in H such that h(x) is "good" w.r.t. c(x) for all x in S?
  o A hypothesis h in H such that h(x) is "good" w.r.t. c(x) for all x in the true distribution D?
Sample Complexity
o How many training examples m are sufficient to learn the target concept?
o Training scenarios:
  o If learner proposes instances, as queries to teacher
    o Learner proposes instance x, teacher provides c(x)
  o If teacher (who knows c) provides training examples
    o Teacher provides sequence of examples of form (x, c(x))
  o If some random process (e.g., nature) proposes instances
    o Instance x generated randomly, teacher provides c(x)
Two Basic Competing Models
Protocol
True error of a hypothesis
o Definition: The true error (denoted ε_D(h)) of hypothesis h with respect to target concept c and distribution D is the probability that h will misclassify an instance drawn at random according to D.
Two notions of error
o Training error (a.k.a. empirical risk or empirical error) of hypothesis h with respect to target concept c
  o How often h(x) ≠ c(x) over training instances from S
o True error (a.k.a. generalization error, test error) of hypothesis h with respect to c
  o How often h(x) ≠ c(x) over future random instances drawn iid from D
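The difference between the two notions of error can be seen in a small simulation. This is only a sketch under stated assumptions: the target concept, the hypothesis, and the distribution D = Uniform[0, 1] below are all hypothetical choices for illustration and do not come from the slides.

```python
import random

def target_concept(x):
    """Hypothetical target concept c: label 1 when x >= 0.5."""
    return int(x >= 0.5)

def hypothesis(x):
    """Hypothetical hypothesis h: a slightly misplaced threshold at 0.6."""
    return int(x >= 0.6)

def error_rate(h, c, instances):
    """Fraction of the given instances on which h disagrees with c."""
    return sum(h(x) != c(x) for x in instances) / len(instances)

rng = random.Random(0)

# Training sample S: m = 20 iid draws from the assumed D = Uniform[0, 1].
S = [rng.random() for _ in range(20)]
training_error = error_rate(hypothesis, target_concept, S)

# The true error is P_{x ~ D}[h(x) != c(x)]. For these choices it is the
# probability mass of [0.5, 0.6), i.e. 0.1. A large fresh sample gives a
# Monte Carlo estimate of it.
fresh = [rng.random() for _ in range(200_000)]
estimated_true_error = error_rate(hypothesis, target_concept, fresh)

print(f"training error on S:  {training_error:.3f}")
print(f"estimated true error: {estimated_true_error:.3f}")
```

With only 20 training instances the training error can land noticeably above or below 0.1, which is the gap that the bounds in this lecture aim to control.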
The Union Bound
Hoeffding inequality
Version Space
Consistent Learner
o A learner is consistent if it outputs a hypothesis that perfectly fits the training data
  o This is a quite reasonable learning strategy
o Every consistent learner outputs a hypothesis belonging to the version space
o We want to know how such a hypothesis generalizes
Probably Approximately Correct
o Double "hedging"
  o Approximately
  o Probably
o Need both!
Exhausting the version space
[Figure: the version space VS_{H,D}]
How many examples will ε-exhaust the VS
Proof
[Slide from Eric Xing and David Sontag]
What it means
o [Haussler, 1988]: probability that the version space is not ε-exhausted after m training examples is at most |H|e^(−εm)
o Suppose we want this probability to be at most δ
o How many training examples suffice?
o If
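Requiring |H|e^(−εm) ≤ δ and solving for m gives m ≥ (1/ε)(ln|H| + ln(1/δ)). A minimal Python sketch of that calculation follows; the function name `pac_sample_bound` and the example values for |H|, ε, and δ are hypothetical and chosen only for illustration.

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Smallest integer m with hypothesis_count * exp(-epsilon * m) <= delta.

    Derived by solving |H| e^{-eps m} <= delta for m:
        m >= (ln|H| + ln(1/delta)) / eps
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# Illustrative values (assumptions): |H| = 1000, epsilon = 0.1, delta = 0.05.
m = pac_sample_bound(hypothesis_count=1000, epsilon=0.1, delta=0.05)
print(m)  # 100
```

The bound grows only logarithmically in |H| and in 1/δ, but linearly in 1/ε.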
Learning Conjunctions of Boolean Literals
PAC Learnability
o A learning algorithm is PAC learnable if it
  o Requires no more than polynomial computation per training example, and
  o no more than polynomial number of samples
o Theorem: conjunctions of Boolean literals are PAC learnable
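Both requirements can be illustrated for conjunctions of Boolean literals. The sketch below uses the classic elimination algorithm as a consistent learner, which is an assumption here since the slides do not name an algorithm. It also assumes the usual count |H| = 3^n for conjunctions over n Boolean variables (each variable appears positively, negated, or not at all). All function names are hypothetical.

```python
import math
from itertools import product

def learn_conjunction(examples, n):
    """Elimination algorithm: start with all 2n literals and drop every
    literal contradicted by a positive example. Each example costs O(n)."""
    literals = {(i, value) for i in range(n) for value in (True, False)}
    for x, label in examples:
        if label:  # negative examples are ignored by this algorithm
            literals -= {(i, not x[i]) for i in range(n)}
    return literals

def predict(literals, x):
    """A conjunction is satisfied when every remaining literal holds."""
    return all(x[i] == value for i, value in literals)

def conjunction_sample_bound(n, epsilon, delta):
    """(1/eps)(n ln 3 + ln(1/delta)), using the assumed |H| = 3^n."""
    return math.ceil((n * math.log(3) + math.log(1 / delta)) / epsilon)

# Hypothetical target concept over n = 3 variables: x0 AND (NOT x2).
n = 3
examples = [(x, x[0] and not x[2]) for x in product((False, True), repeat=n)]
learned = learn_conjunction(examples, n)

print(sorted(learned))                         # [(0, True), (2, False)]
print(conjunction_sample_bound(10, 0.1, 0.05)) # 140
```

The per-example work is linear in n and the sample bound is linear in n, 1/ε, and ln(1/δ), which is the polynomial behavior the definition asks for.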
How about EnjoySport?
PAC-Learning
Agnostic Learning
Empirical Risk Minimization Paradigm
The Case of Finite H
o H = {h1, ..., hk} consisting of k hypotheses.
o We would like to give guarantees on the generalization error of h.
o First, we will show that the training error ε̂(h) is a reliable estimate of ε(h) for all h.
o Second, we will show that this implies an upper bound on the generalization error of h.
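The two steps can be written out with the Hoeffding inequality and the union bound. This is a sketch of the standard argument, not necessarily the exact form on the slides: the symbols ε̂(h) for the training error on m iid examples, γ for the allowed deviation, and ĥ for the hypothesis with the smallest training error are notation assumed here.

```latex
\begin{align*}
% Step 1: Hoeffding inequality for one fixed hypothesis h_i
P\big(|\varepsilon(h_i) - \hat{\varepsilon}(h_i)| > \gamma\big)
  &\le 2\exp(-2\gamma^2 m) \\
% Step 2: union bound over all k hypotheses in H
P\big(\exists\, h \in H : |\varepsilon(h) - \hat{\varepsilon}(h)| > \gamma\big)
  &\le \sum_{i=1}^{k} 2\exp(-2\gamma^2 m) = 2k\exp(-2\gamma^2 m) \\
% Sample complexity: force the right-hand side below \delta
2k\exp(-2\gamma^2 m) \le \delta
  &\iff m \ge \frac{1}{2\gamma^2}\ln\frac{2k}{\delta} \\
% Consequence for \hat{h} = \arg\min_{h \in H} \hat{\varepsilon}(h),
% holding with probability at least 1 - \delta
\varepsilon(\hat{h})
  &\le \min_{h \in H}\varepsilon(h) + 2\gamma
\end{align*}
```

The last line follows because, when every training error is within γ of its true error, ε(ĥ) ≤ ε̂(ĥ) + γ ≤ ε̂(h*) + γ ≤ ε(h*) + 2γ for the best hypothesis h* in H.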
Misclassification Probability
Uniform Convergence
Sample Complexity
Generalization Error Bound
Agnostic framework
What if H is not finite?
o Can’t use our result for infinite H
o Need some other measure of complexity for H
  o Vapnik-Chervonenkis (VC) dimension!
How do we characterize “power”?
o Different machines have different amounts of “power”.
o Tradeoff between:
  o More power: Can model more complex classifiers but might overfit
  o Less power: Not going to overfit, but restricted in what it can model
o How do we characterize the amount of power?
Shattering a Set of Instances
Three Instances Shattered
The Vapnik-Chervonenkis Dimension
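Under the standard definitions, a set of instances is shattered by H when every possible labeling of the set is realized by some h in H, and VC(H) is the size of the largest shattered set. A brute-force sketch of both ideas follows. The two hypothesis classes (thresholds and intervals on the real line, restricted to a small grid of parameters) and the candidate pool of points are hypothetical choices for illustration.

```python
from itertools import combinations

def shatters(hypotheses, points):
    """True if the hypotheses realize all 2^|points| labelings of points."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

def vc_dimension_on_pool(hypotheses, pool):
    """Largest d such that some d-subset of the pool is shattered.

    Searching a finite pool only certifies a lower bound on VC(H) in general.
    Stopping at the first d with no shattered subset is safe because every
    subset of a shattered set is itself shattered.
    """
    best = 0
    for d in range(1, len(pool) + 1):
        if any(shatters(hypotheses, subset) for subset in combinations(pool, d)):
            best = d
        else:
            break
    return best

pool = [1, 2, 3, 4]
cuts = [0.5, 1.5, 2.5, 3.5, 4.5]  # parameter grid that separates the pool points

# Thresholds: h_t(x) = 1 if x >= t.
thresholds = [lambda x, t=t: int(x >= t) for t in cuts]
# Intervals: h_{a,b}(x) = 1 if a <= x <= b (a == b contains no pool point).
intervals = [lambda x, a=a, b=b: int(a <= x <= b)
             for a in cuts for b in cuts if a <= b]

print(vc_dimension_on_pool(thresholds, pool))  # 1
print(vc_dimension_on_pool(intervals, pool))   # 2
```

Thresholds fail on two points because the labeling (1, 0) is unreachable, and intervals fail on three points because (1, 0, 1) is unreachable.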
VC dimension: examples
The VC Dimension and the Number of Parameters
o The VC dimension thus gives concreteness to the notion of the capacity of a given set of h.
o Is it true that learning machines with many parameters would have high VC dimension, while learning machines with few parameters would have low VC dimension?
o An infinite-VC function with just one parameter!
An infinite-VC function with just one parameter
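A well-known example of this kind is the classifier h_α(x) = sign(sin(αx)) with the single parameter α; it is an assumption that this is the example the slide has in mind. For the points x_i = 10^(−i), i = 1..n, and any labels y_i in {−1, +1}, the choice α = π(1 + Σ_i ((1 − y_i)/2) · 10^i) reproduces the labels, so the points are shattered for every n. The sketch below checks this numerically for n = 4; the function names are hypothetical.

```python
import math
from itertools import product

def classify(alpha, x):
    """One-parameter classifier h_alpha(x) = sign(sin(alpha * x))."""
    return 1 if math.sin(alpha * x) > 0 else -1

def alpha_for(labels):
    """alpha realizing the given +/-1 labels on the points x_i = 10**-i."""
    offset = sum((1 - y) // 2 * 10 ** i for i, y in enumerate(labels, start=1))
    return math.pi * (1 + offset)

n_points = 4
points = [10.0 ** -i for i in range(1, n_points + 1)]

# Try every one of the 2^n labelings and check that some alpha realizes it.
all_realized = all(
    [classify(alpha_for(labels), x) for x in points] == list(labels)
    for labels in product((-1, 1), repeat=n_points)
)
print(all_realized)  # True
```

The check is kept to a small n because for large n the required α becomes huge and floating-point error in sin(αx) would eventually dominate; the mathematical argument itself holds for any n.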
Sample Complexity from VC Dimension
Consistency
o A learning process (model) is said to be consistent if the model error measured on new data, sampled from the same underlying probability law as the original sample, converges to the model error measured on the original sample as the original sample size increases.
Vapnik main theorem
Agnostic Learning: VC Bounds
Model convergence speed
How to control model generalization capacity
o Risk Expectation = Empirical Risk + Confidence Interval
o Minimizing Empirical Risk alone will not always give good generalization capacity: one will want to minimize the sum of Empirical Risk and Confidence Interval
o What is important is not the numerical value of the Vapnik limit, which is most often too large to be of any practical use; it is the fact that this limit is a non-decreasing function of the “richness” of the model function family
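The trade-off between the two terms can be sketched numerically. The confidence term below uses one commonly quoted form of the VC bound for 0/1 loss, sqrt((d(ln(2m/d) + 1) + ln(4/δ)) / m) for VC dimension d and sample size m; this specific formula is an assumption, since the transcript does not include the one on the slides. The candidate models and their empirical risks are made-up numbers for illustration.

```python
import math

def vc_confidence(vc_dim, m, delta=0.05):
    """Assumed confidence term: sqrt((d (ln(2m/d) + 1) + ln(4/delta)) / m)."""
    return math.sqrt(
        (vc_dim * (math.log(2 * m / vc_dim) + 1) + math.log(4 / delta)) / m
    )

def select_by_bound(candidates, m, delta=0.05):
    """Pick the (name, vc_dim, empirical_risk) triple that minimizes
    empirical risk + confidence term."""
    return min(candidates, key=lambda c: c[2] + vc_confidence(c[1], m, delta))

m = 1000
# Hypothetical model families: richer families fit the sample better.
candidates = [
    ("small", 2, 0.30),
    ("medium", 10, 0.10),
    ("large", 200, 0.02),
]

for name, d, risk in candidates:
    print(f"{name:6s} bound = {risk + vc_confidence(d, m):.3f}")
print("selected:", select_by_bound(candidates, m)[0])  # selected: medium
```

The richest family has the lowest empirical risk but the largest confidence term, so it is not the one selected; the absolute values of the bound are loose, and only their ordering is used.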
Structural Risk Minimization
SRM strategy
Putting SRM into action: linear models case
o There are many SRM-based strategies to build models:
o In the case of linear models
  y = w^T x + b
  one wants to make ||w|| a controlled parameter: let us call H_C the linear model function family satisfying the constraint: ||w|| < C
o Vapnik Major theorem: When C decreases, d(H_C) decreases
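One way to see ||w|| acting as a controlled parameter is the penalized counterpart of the constraint: minimizing squared error plus λ·w², where a larger λ plays the role of a smaller C. This ridge-style formulation is a swapped-in illustration and an assumption, not something the slides confirm. The sketch below handles a one-dimensional input with a closed-form solution; the data and the function name are hypothetical.

```python
def fit_penalized_line(xs, ys, lam):
    """Minimize sum((y - w*x - b)^2) + lam * w^2 for scalar inputs.

    The bias b is unpenalized, so b = mean(y) - w * mean(x) and
    w = Sxy / (Sxx + lam) with centered sums Sxy and Sxx.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = sxy / (sxx + lam)
    b = y_mean - w * x_mean
    return w, b

# Made-up data lying roughly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

for lam in (0.0, 10.0, 90.0):
    w, b = fit_penalized_line(xs, ys, lam)
    print(f"lambda = {lam:5.1f}  |w| = {abs(w):.3f}  b = {b:.3f}")
```

As λ grows the fitted |w| shrinks toward zero, so the effective family of linear functions the fit can reach becomes smaller, at the cost of a larger training residual.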
[Figure: linear model prediction w^T x_i + b and residual y_i − w^T x_i − b]
Take away message
o Sample complexity varies with the learning setting
  o Learner actively queries trainer
  o Examples provided at random
o Within the PAC learning setting, we can bound the probability that the learner will output a hypothesis with given error
  o For ANY consistent learner (case where c in H)
  o For ANY “best fit” hypothesis (agnostic learning, where perhaps c not in H)
o VC dimension as measure of complexity of H
o Quantitative bounds characterizing bias/variance in choice of H
Take home message
PAC
[Slide from Matt Gormley]
References
o Eric Xing, Ziv Bar-Joseph. 10701 Introduction to Machine Learning. http://www.cs.cmu.edu/~epxing/Class/10701/
o Matt Gormley. 10601 Introduction to Machine Learning. http://www.cs.cmu.edu/~mgormley/courses/10601/index.html
o David Sontag. Introduction to Machine Learning. https://people.csail.mit.edu/dsontag/courses/ml12/slides/lecture14.pdf