
Hypothesis Testing and Computational Learning Theory Doug Downey EECS 349 Winter 2014 With slides from Bryan Pardo, Tom Mitchell


Page 1: Hypothesis Testing and Computational Learning Theory

Hypothesis Testing and Computational Learning Theory

Doug Downey EECS 349 Winter 2014

With slides from Bryan Pardo, Tom Mitchell

Page 2

Overview

• Hypothesis Testing: How do we know our learners are “good”?
  – What does performance on test data imply/guarantee about future performance?

• Computational Learning Theory: Are there general laws that govern learning?
  – Sample Complexity: How many training examples are needed to learn a successful hypothesis?
  – Computational Complexity: How much computational effort is needed to learn a successful hypothesis?

Page 3

Some terms

• X is the set of all possible instances
• C is the set of all possible concepts c, where c : X → {0,1}
• H is the set of hypotheses considered by a learner, H ⊆ C
• L is the learner
• D is a probability distribution over X that generates observed instances

Page 4

Definition

• The true error of hypothesis h, with respect to the target concept c and observation distribution D, is the probability that h will misclassify an instance drawn according to D:

  errorD(h) = Px∈D[ c(x) ≠ h(x) ]

• In a perfect world, we’d like the true error to be 0
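The true error usually can’t be computed exactly, but it can be approximated by sampling from D. A minimal sketch, where the distribution, target concept, and hypothesis below are all hypothetical stand-ins:

```python
import random

def true_error(h, c, draw, n_samples=100_000, seed=0):
    """Monte Carlo estimate of errorD(h): the probability that h
    misclassifies an instance drawn according to D (here, `draw`)."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_samples):
        x = draw(rng)
        wrong += h(x) != c(x)
    return wrong / n_samples

# Hypothetical stand-ins: D is uniform on [0, 1), the target concept is
# c(x) = 1 iff x >= 0.5, and h uses a slightly-off threshold of 0.4,
# so errorD(h) should come out close to 0.1.
c = lambda x: int(x >= 0.5)
h = lambda x: int(x >= 0.4)
draw = lambda rng: rng.random()
print(round(true_error(h, c, draw), 2))
```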

Page 5

Definition

• The sample error of hypothesis h, with respect to the target concept c and sample S, is the proportion of S that h misclassifies:

  errorS(h) = (1/|S|) Σx∈S δ(c(x), h(x))

  where δ(c(x), h(x)) = 0 if c(x) = h(x), 1 otherwise
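This definition translates directly into code; a small sketch with a hypothetical threshold hypothesis and a hand-labeled sample:

```python
def sample_error(h, labeled_sample):
    """errorS(h): the fraction of the sample S that h misclassifies.
    `labeled_sample` is a list of (x, c(x)) pairs."""
    wrong = sum(1 for x, cx in labeled_sample if h(x) != cx)
    return wrong / len(labeled_sample)

# Hypothetical sample: h (threshold at 0.5) disagrees with the labels
# on the two examples labeled 1 that fall below 0.5.
S = [(0.1, 0), (0.2, 0), (0.3, 0), (0.45, 1),
     (0.6, 1), (0.7, 1), (0.8, 1), (0.48, 1)]
h = lambda x: int(x >= 0.5)
print(sample_error(h, S))  # 2 of 8 misclassified -> 0.25
```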

Page 6

Problems Estimating Error

Page 7

Example on Independent Test Set

Page 8

Estimators

Page 9

Confidence Intervals

• If S contains n examples drawn independently of h and of each other, with n ≥ 30 and n·errorS(h), n·(1 − errorS(h)) each > 5, then with approximately 95% probability, errorD(h) lies in the interval

  errorS(h) ± 1.96 · sqrt( errorS(h)(1 − errorS(h)) / n )
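Assuming the usual normal-approximation interval errorS(h) ± z · sqrt(errorS(h)(1 − errorS(h))/n), the computation is short; a sketch:

```python
import math

def error_ci(error_s, n, z=1.96):
    """Approximate confidence interval for the true error, given the
    sample error on n independently drawn test examples (z = 1.96 gives
    ~95% confidence). Valid roughly when n >= 30 and n*error_s and
    n*(1 - error_s) are each > 5."""
    half = z * math.sqrt(error_s * (1 - error_s) / n)
    return (error_s - half, error_s + half)

# Hypothetical case: 20 mistakes on 200 test examples.
lo, hi = error_ci(0.10, n=200)
print(f"({lo:.3f}, {hi:.3f})")  # -> (0.058, 0.142)
```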

Page 10

Confidence Intervals

• Under same conditions…

Page 11

Life Skills

• Want a “convincing demonstration” that certain enhancements improve performance?

• Use online Fisher Exact or Chi-Square tests to evaluate hypotheses, e.g.:
  – http://people.ku.edu/~preacher/chisq/chisq.htm
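The same kind of comparison can also be run offline. A sketch of the textbook 2×2 chi-square statistic (the counts below are hypothetical; for very small counts the Fisher Exact test is the better choice):

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table
        [[a, b],    e.g. [[right_A, wrong_A],
         [c, d]]          [right_B, wrong_B]]
    A value above 3.84 means the difference between the two systems is
    significant at the p < 0.05 level (1 degree of freedom)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical results: baseline gets 70/100 right, enhanced gets 85/100.
stat = chi_square_2x2(85, 15, 70, 30)
print(round(stat, 2))  # -> 6.45 > 3.84, so the improvement looks significant
```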

Page 12

Overview

• Hypothesis Testing: How do we know our learners are “good”?
  – What does performance on test data imply/guarantee about future performance?

• Computational Learning Theory: Are there general laws that govern learning?
  – Sample Complexity: How many training examples are needed to learn a successful hypothesis?
  – Computational Complexity: How much computational effort is needed to learn a successful hypothesis?

Page 13

Computational Learning Theory

• Are there general laws that govern learning?
  – No Free Lunch Theorem: The expected accuracy of any learning algorithm across all concepts is 50%.

• But can we still say something positive?
  – Yes.
  – Probably Approximately Correct (PAC) learning

Page 14

The world isn’t perfect

• If we can’t provide every instance for training, a consistent hypothesis may have error on unobserved instances.

• How many training examples do we need to bound the likelihood of error to a reasonable level?

When is our hypothesis Probably Approximately Correct (PAC)?

[Figure: instance space X containing the training set, with hypothesis H and concept C drawn as overlapping regions]

Page 15

Definitions

• A hypothesis is consistent if it has zero error on the training examples

• The version space (VSH,T) is the set of all hypotheses in our hypothesis space H that are consistent on training set T
  – (reminder: the hypothesis space is the set of concepts we’re considering, e.g. depth-2 decision trees)

Page 16

Definition: ε-exhausted

IN ENGLISH:
The version space VSH,T (the set of hypotheses consistent with the training data T) is ε-exhausted if, when you test them on the actual distribution of instances, every consistent hypothesis has error below ε.

IN MATH:
VSH,T is ε-exhausted for concept c and sample distribution D if:

  ∀h ∈ VSH,T : errorD(h) < ε

Page 17

A Theorem

If hypothesis space H is finite, and training set T contains m independent randomly drawn examples of concept c, THEN for any ε with 0 ≤ ε ≤ 1:

  P(VSH,T is NOT ε-exhausted) ≤ |H| e^(−εm)

Page 18

Proof of Theorem

If hypothesis h has true error ε, the probability of it getting a single random example right is:

  P(h got 1 example right) = 1 − ε

Ergo, the probability of getting m examples right is:

  P(h got m examples right) = (1 − ε)^m

Page 19

Proof of Theorem

If there are k hypotheses in H with error at least ε, call the probability that at least one of those hypotheses got m instances right P(at least one bad one looks good).

This probability is BOUNDED by the “Union” bound:

  P(at least one bad one looks good) ≤ k(1 − ε)^m

Page 20

Proof of Theorem (continued)

Since k ≤ |H|, it follows that k(1 − ε)^m ≤ |H|(1 − ε)^m

If 0 ≤ ε ≤ 1, then (1 − ε) ≤ e^(−ε)

Therefore:

  P(at least one bad one looks good) ≤ |H|(1 − ε)^m ≤ |H| e^(−εm)

We now have a bound on the likelihood that a hypothesis consistent with the training data will have error ≥ ε.

Proof complete!
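The chain of bounds used in the proof, k(1 − ε)^m ≤ |H|(1 − ε)^m ≤ |H| e^(−εm), can be spot-checked numerically; the values below are arbitrary illustrative choices:

```python
import math

# Hypothetical setting: |H| = 1000 hypotheses, of which k = 40 have
# true error at least eps = 0.1, and m = 50 training examples.
H_size, k, eps, m = 1000, 40, 0.1, 50

union_bound = k * (1 - eps) ** m         # k(1-eps)^m
h_bound = H_size * (1 - eps) ** m        # |H|(1-eps)^m, since k <= |H|
exp_bound = H_size * math.exp(-eps * m)  # |H| e^(-eps*m), since 1-eps <= e^(-eps)

# Each step of the proof only loosens the bound:
assert union_bound <= h_bound <= exp_bound
print(f"{union_bound:.3f} <= {h_bound:.3f} <= {exp_bound:.3f}")
```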

Page 21

Using the theorem

Let’s rearrange to see how many training examples m we need to set a bound δ on the likelihood that our true error is ≥ ε:

  |H| e^(−εm) ≤ δ
  ln|H| − εm ≤ ln δ
  εm ≥ ln|H| − ln δ
  m ≥ (1/ε)(ln|H| + ln(1/δ))

Page 22

Probably Approximately Correct (PAC)

  m ≥ (1/ε)(ln|H| + ln(1/δ))

• m: the number of training examples
• ε: the worst error we’ll tolerate
• |H|: the hypothesis space size
• δ: the likelihood that a hypothesis consistent with the training data will have error ≥ ε

Page 23

Using the bound

  m ≥ (1/ε)(ln|H| + ln(1/δ))

Plug in ε, δ, and |H| to get a number of training examples m that will “guarantee” your learner will generate a hypothesis that is Probably Approximately Correct.

NOTE: This assumes that the concept is actually IN H, that H is finite, and that your training set is drawn using distribution D
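Plugging values into m ≥ (1/ε)(ln|H| + ln(1/δ)) is easy to script. A sketch, using boolean conjunctions over 10 variables as a hypothetical H (each variable appears positive, negated, or not at all, so |H| is roughly 3^10):

```python
import math

def pac_sample_size(eps, delta, h_size):
    """m >= (1/eps)(ln|H| + ln(1/delta)): a number of training examples
    sufficient for a consistent learner over a finite hypothesis space H
    to be Probably Approximately Correct."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# Demand error below 0.1 with probability at least 0.95 (delta = 0.05).
print(pac_sample_size(eps=0.1, delta=0.05, h_size=3 ** 10))  # -> 140
```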

Page 24

Problems with PAC

• The PAC learning framework has 2 disadvantages:
  1) It can lead to weak bounds
  2) A sample complexity bound cannot be established for infinite hypothesis spaces

• We introduce the VC dimension for dealing with these problems

Page 25

Shattering

Def: A set of instances S is shattered by hypothesis set H iff for every possible concept c on S there exists a hypothesis h in H that is consistent with that concept.
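Shattering can be checked by brute force for small cases: enumerate every labeling of S and test whether some hypothesis realizes it. A sketch using 1-D threshold functions as a hypothetical H (thresholds shatter any single point, but no pair of points, since the labeling “left point 1, right point 0” is unreachable):

```python
from itertools import product

def shatters(hypotheses, points):
    """True iff `hypotheses` shatters `points`: every possible 0/1
    labeling of the points is realized by some hypothesis."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return all(labels in realized
               for labels in product((0, 1), repeat=len(points)))

# Hypothetical H: threshold functions h_t(x) = 1 iff x >= t.
thresholds = [lambda x, t=t: int(x >= t) for t in range(-5, 6)]
print(shatters(thresholds, [0]))     # -> True
print(shatters(thresholds, [0, 2]))  # -> False
```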

Page 26

Can a linear separator shatter this?

[Figure: a set of labeled points in the plane]

NO!

The ability of H to shatter a set of instances is a measure of its capacity to represent target concepts defined over those instances.

Page 27

Can a quadratic separator shatter this?

Page 28

Vapnik-Chervonenkis Dimension

Def: The Vapnik-Chervonenkis dimension, VC(H), of hypothesis space H defined over instance space X is the size of the largest finite subset of X shattered by H. If arbitrarily large finite sets can be shattered by H, then VC(H) is infinite.

Page 29

How many training examples needed?

• Upper bound on m using VC(H):

  m ≥ (1/ε)(4 log2(2/δ) + 8 VC(H) log2(13/ε))
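As with the finite-|H| bound, this is easy to evaluate; a sketch using the fact that linear separators in the plane have VC dimension 3:

```python
import math

def vc_sample_size(eps, delta, vc_dim):
    """m >= (1/eps)(4 log2(2/delta) + 8 VC(H) log2(13/eps)):
    a sufficient training-set size in terms of the VC dimension."""
    return math.ceil((4 * math.log2(2 / delta)
                      + 8 * vc_dim * math.log2(13 / eps)) / eps)

# Same targets as before: error below 0.1 with probability >= 0.95.
print(vc_sample_size(eps=0.1, delta=0.05, vc_dim=3))  # -> 1899
```

Note how much looser this is than the finite-|H| bound for comparable settings; the theorem trades tightness for applicability to infinite hypothesis spaces.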