Learning From Data: Lecture 4
Real Learning is Feasible
Real Learning vs. Verification
The Two Step Solution to Learning
Closer to Reality: Error and Noise
M. Magdon-Ismail, CSCI 4100/6100
recap: Verification
[Figure: a hypothesis h on the Age-Income plane with its out-of-sample error Eout(h); drawing the data set D gives the in-sample error Ein(h) = 2/9.]
Hoeffding: Eout(h) ≈ Ein(h) (with high probability):
$P[\,|E_{in}(h) - E_{out}(h)| > \epsilon\,] \le 2e^{-2\epsilon^2 N}$.
recap: Selection Bias Illustrated with Coins
Coin tossing example:
• If we toss one coin N times and get no heads, it's very surprising: $P = \frac{1}{2^N}$. We expect the coin is biased: P[heads] ≈ 0.
• Toss 70 coins and find one with no heads. Is that surprising? $P = 1 - \big(1 - \frac{1}{2^N}\big)^{70}$. Do we expect P[heads] ≈ 0 for the selected coin?
Similar to the “birthday problem”: among 30 people, two will likely share the same birthday.
• This is called selection bias. Selection bias is a very serious trap, for example in medical screening.
Search Causes Selection Bias
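
A quick Monte Carlo check of the two probabilities above (a minimal sketch; N = 10 tosses per coin and the trial count are illustrative choices, not from the slide):

    # Selection bias with coins: "no heads" is rare for one fair coin,
    # but common for the best of 70 coins.
    import random

    N, COINS, TRIALS = 10, 70, 20000

    def no_heads(n):
        # P[a fair coin shows no heads in n tosses] = 1/2^n
        return all(random.random() < 0.5 for _ in range(n))

    one = sum(no_heads(N) for _ in range(TRIALS)) / TRIALS
    many = sum(any(no_heads(N) for _ in range(COINS)) for _ in range(TRIALS)) / TRIALS

    print(f"one coin : simulated {one:.4f}, exact {0.5**N:.4f}")
    print(f"70 coins : simulated {many:.4f}, exact {1 - (1 - 0.5**N)**COINS:.4f}")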
Real Learning – Finite Learning Models
[Figure: hypotheses h1, h2, h3, ..., hM on the Age-Income plane, each with its out-of-sample error Eout(hm); drawing the data set D gives the in-sample errors Ein(h1) = 2/9, Ein(h2) = 0, Ein(h3) = 5/9, ..., Ein(hM) = 6/9.]
Pick the hypothesis with minimum Ein; will Eout be small?
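
A minimal sketch of this selection step (the threshold hypotheses and the tiny data set are hypothetical, only to make the finite model concrete):

    # Finite model: evaluate E_in on every h in H and return the minimizer g.
    def E_in(h, data):
        # fraction of training points the hypothesis misclassifies
        return sum(1 for x, y in data if h(x) != y) / len(data)

    # M = 11 one-dimensional threshold classifiers, a stand-in for a finite H
    H = [lambda x, t=t: +1 if x > t else -1 for t in range(-5, 6)]
    data = [(-3, -1), (-1, -1), (0, +1), (2, +1), (4, +1)]

    g = min(H, key=lambda h: E_in(h, data))  # pick the hypothesis with minimum E_in
    print("E_in(g) =", E_in(g, data))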
Hoeffding says that Ein(g) ≈ Eout(g) for Finite H
$P[\,|E_{in}(g) - E_{out}(g)| > \epsilon\,] \le 2|H|\,e^{-2\epsilon^2 N}$, for any $\epsilon > 0$.
$P[\,|E_{in}(g) - E_{out}(g)| \le \epsilon\,] \ge 1 - 2|H|\,e^{-2\epsilon^2 N}$, for any $\epsilon > 0$.
We don't care how g was obtained, as long as it is from H.
Some basic probability, for events A and B:
• Implication: if A ⇒ B (i.e., A ⊆ B), then P[A] ≤ P[B].
• Union bound: P[A or B] = P[A ∪ B] ≤ P[A] + P[B].
• Bayes' rule: $P[A|B] = \frac{P[B|A]\,P[A]}{P[B]}$.
Proof: Let $M = |H|$. The event "$|E_{in}(g) - E_{out}(g)| > \epsilon$" implies
"$|E_{in}(h_1) - E_{out}(h_1)| > \epsilon$" OR $\cdots$ OR "$|E_{in}(h_M) - E_{out}(h_M)| > \epsilon$".
So, by the implication and union bounds,
$P[\,|E_{in}(g) - E_{out}(g)| > \epsilon\,] \le P\big[\textstyle\bigcup_{m=1}^{M} |E_{in}(h_m) - E_{out}(h_m)| > \epsilon\big] \le \sum_{m=1}^{M} P[\,|E_{in}(h_m) - E_{out}(h_m)| > \epsilon\,] \le 2Me^{-2\epsilon^2 N}$.
(The last inequality holds because the Hoeffding bound applies to each individual summand, since each $h_m$ is fixed independently of the data.)
Interpreting the Hoeffding Bound for Finite |H|
$P[\,|E_{in}(g) - E_{out}(g)| > \epsilon\,] \le 2|H|\,e^{-2\epsilon^2 N}$, for any $\epsilon > 0$.
$P[\,|E_{in}(g) - E_{out}(g)| \le \epsilon\,] \ge 1 - 2|H|\,e^{-2\epsilon^2 N}$, for any $\epsilon > 0$.
Theorem. With probability at least $1 - \delta$,
$E_{out}(g) \le E_{in}(g) + \sqrt{\tfrac{1}{2N}\log\tfrac{2|H|}{\delta}}$.
We don't care how g was obtained, as long as g ∈ H.
Proof: Let $\delta = 2|H|e^{-2\epsilon^2 N}$. Then
$P[\,|E_{in}(g) - E_{out}(g)| \le \epsilon\,] \ge 1 - \delta$.
In words, with probability at least $1 - \delta$,
$|E_{in}(g) - E_{out}(g)| \le \epsilon$,
which implies
$E_{out}(g) \le E_{in}(g) + \epsilon$.
From the definition of $\delta$, solving for $\epsilon$ gives
$\epsilon = \sqrt{\tfrac{1}{2N}\log\tfrac{2|H|}{\delta}}$.
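
The theorem is a one-line computation. A minimal sketch (the values of N, |H|, and δ below are illustrative):

    import math

    def error_bar(N, H_size, delta):
        # epsilon = sqrt( (1/(2N)) * log(2|H|/delta) ); with probability >= 1 - delta,
        # E_out(g) <= E_in(g) + epsilon for any g from a finite H of size H_size
        return math.sqrt(math.log(2 * H_size / delta) / (2 * N))

    # e.g. N = 1000 examples, |H| = 100 hypotheses, 95% confidence (delta = 0.05)
    print(error_bar(1000, 100, 0.05))  # about 0.064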
Ein Reaches Outside to Eout when |H| is Small
$E_{out}(g) \le E_{in}(g) + \sqrt{\tfrac{1}{2N}\log\tfrac{2|H|}{\delta}}$.
If N ≫ ln |H|, then Eout(g) ≈ Ein(g).
• Does not depend on X, P(x), f, or how g is found.
• Only requires that P(x) generate the data points independently, and the test point as well.
What about Eout ≈ 0?
The 2 Step Approach to Getting Eout ≈ 0:
(1) Eout(g) ≈ Ein(g).
(2) Ein(g) ≈ 0.
Together, these ensure Eout ≈ 0.
How can we verify (1), since we do not know Eout? We can't; we must ensure it theoretically, via Hoeffding.
We can ensure (2) algorithmically (for example, with the PLA), modulo that we can guarantee (1).
There is a tradeoff:
• Small |H| ⇒ Ein ≈ Eout.
• Large |H| ⇒ Ein ≈ 0 is more likely.
[Figure: Error vs. |H|. The in-sample error decreases as |H| grows while the model-complexity penalty $\sqrt{\tfrac{1}{2N}\log\tfrac{2|H|}{\delta}}$ increases; the bound on the out-of-sample error is minimized at an intermediate size |H|*.]
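
The tradeoff can be seen numerically. In this sketch, the in-sample errors of the growing models are made-up numbers; only the complexity penalty comes from the theorem:

    import math

    N, delta = 1000, 0.05
    # hypothetical E_in values for nested models of increasing size |H|
    E_in = {2: 0.30, 8: 0.20, 32: 0.12, 128: 0.08, 512: 0.06, 2048: 0.058}

    def bound(H_size):
        # E_in plus the model-complexity penalty from the Hoeffding error bar
        return E_in[H_size] + math.sqrt(math.log(2 * H_size / delta) / (2 * N))

    for m in E_in:
        print(f"|H| = {m:4d}: bound on E_out = {bound(m):.3f}")
    print("best |H|* =", min(E_in, key=bound))  # the sweet spot in between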
Feasibility of Learning (Finite Models)
• No Free Lunch: we can't know anything outside D for sure.
• We can "learn" with high probability if D is generated i.i.d. from P(x): Eout ≈ Ein (Ein can reach outside the data set to Eout).
• We want Eout ≈ 0.
• The two step solution. We trade Eout ≈ 0 for two goals:
(i) Eout ≈ Ein;
(ii) Ein ≈ 0.
We know Ein, not Eout, but we can ensure (i) if |H| is small.
This is a big step!
• What about infinite H, like the perceptron?
“Complex” Target Functions are Harder to Learn
What happened to the “difficulty” (complexity) of f?
• Simple f ⇒ can use a small H to get Ein ≈ 0 (need smaller N).
• Complex f ⇒ need a large H to get Ein ≈ 0 (need larger N).
Revising the Learning Problem – Adding in Probability
[Figure: the learning setup with probability added. The unknown target function f : X → Y together with the unknown input distribution P(x) generates the training examples (x1, y1), (x2, y2), ..., (xN, yN), with yn = f(xn); the learning algorithm A uses the data to select a final hypothesis g from the hypothesis set H, with the goal g(x) ≈ f(x).]
Error and Noise
Error Measure: how to quantify that h ≈ f.
Noise: yn ≠ f(xn).
Finger Print Recognition
f(x) = +1: you; f(x) = −1: intruder.
Two types of error.
             f = +1          f = −1
h = +1       no error        false accept
h = −1       false reject    no error
In any application you need to think about how to penalize each type of error.
Supermarket (a false reject annoys a paying customer):

             f = +1    f = −1
h = +1         0          1
h = −1        10          0

CIA (a false accept lets an intruder in):

             f = +1    f = −1
h = +1         0        1000
h = −1         1          0
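
One way to fold such a cost matrix into the error computation; a minimal sketch (the weighted-average form and the names h and data are assumptions for illustration):

    def weighted_error(h, data, c_false_accept, c_false_reject):
        # average cost over the data, weighting each error type per the matrix
        cost = 0.0
        for x, y in data:
            if h(x) == +1 and y == -1:
                cost += c_false_accept   # intruder accepted
            elif h(x) == -1 and y == +1:
                cost += c_false_reject   # true person rejected
        return cost / len(data)

    # supermarket: weighted_error(h, data, 1, 10)
    # CIA:         weighted_error(h, data, 1000, 1)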
Take away: the error measure should be specified by the user.
If that is not possible, choose one that is:
– plausible (conceptually appealing), or
– friendly (practically appealing).
Almost All Error Measures are Pointwise
Compare h and f on individual points x using a pointwise error e(h(x), f(x)):
Binary error: $e(h(x), f(x)) = [\![\,h(x) \ne f(x)\,]\!]$ (classification).
Squared error: $e(h(x), f(x)) = (h(x) - f(x))^2$ (regression).
In-sample error:
$E_{in}(h) = \frac{1}{N}\sum_{n=1}^{N} e(h(x_n), f(x_n))$.
Out-of-sample error:
$E_{out}(h) = \mathbb{E}_x[\,e(h(x), f(x))\,]$.
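
These definitions transcribe directly into code. A minimal sketch (estimating Eout by averaging over a fresh sample from P(x) is an assumption about how one would approximate the expectation):

    def binary_error(hx, fx):
        return 1 if hx != fx else 0      # [[h(x) != f(x)]], for classification

    def squared_error(hx, fx):
        return (hx - fx) ** 2            # for regression

    def E_in(h, f, X, err):
        # in-sample error: average pointwise error over the N training inputs
        return sum(err(h(x), f(x)) for x in X) / len(X)

    def E_out_estimate(h, f, fresh_X, err):
        # Monte Carlo estimate of E_x[ e(h(x), f(x)) ] on points drawn from P(x)
        return sum(err(h(x), f(x)) for x in fresh_X) / len(fresh_X)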
Noisy Targets
age 32 years
gender male
salary 40,000
debt 26,000
years in job 1 year
years at home 3 years
. . . . . .
Approve for credit?
Consider two customers with the same credit data.
They can have different behaviors.
The target "function" is not a deterministic function but a stochastic one:
'f(x)' = P(y | x).
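
A minimal sketch of such a stochastic target (the logistic form chosen for P(y = +1 | x) is purely an illustrative assumption):

    import math, random

    def p_plus(x):
        # hypothetical target distribution: P(y = +1 | x), here a logistic in x
        return 1.0 / (1.0 + math.exp(-x))

    def sample_label(x):
        # two customers with identical x can still receive different labels y
        return +1 if random.random() < p_plus(x) else -1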
Learning Setup with Error Measure and Noisy Targets
[Figure: the final learning setup. The unknown target distribution P(y | x) (target function f plus noise) together with the unknown input distribution P(x) generates the training examples (x1, y1), (x2, y2), ..., (xN, yN), with yn ∼ P(y | xn); the learning algorithm A, guided by the error measure, selects a final hypothesis g from the hypothesis set H, with the goal g(x) ≈ f(x).]