Lecture 6: Bootstraps and Maximum Likelihood Methods


TRANSCRIPT

Page 1:

Lecture 6

Bootstraps

Maximum Likelihood Methods

Page 2:

Bootstrapping

A way to generate empirical probability distributions

Very handy for making estimates of uncertainty

Page 3:

100 realizations of a normal distribution $p(y)$ with

$\bar{y} = 50$, $\sigma_y = 100$

Page 4:

What is the distribution of

$\bar{y}^{est} = N^{-1} \sum_i y_i$ ?

Page 5:

We know this should be a Normal distribution with

expectation $= \bar{y} = 50$ and variance $= \sigma_y^2/N$ (i.e. standard deviation $\sigma_y/\sqrt{N} = 10$)

[Figure: the distribution $p(y)$ of the data, compared with the much narrower distribution $p(\bar{y}^{est})$ of the estimate.]

Page 6:

Here’s an empirical way of determining the distribution

called

bootstrapping

Page 7:

[Diagram: the N original data $y_1, y_2, \ldots, y_N$ are resampled by drawing N random integers in the range 1–N (here 4, 3, 7, 11, 4, 1, 9, 6, …) and using them as indices, producing the N resampled data $y'_1, y'_2, \ldots, y'_N$.]

Compute the estimate $\bar{y}'^{est} = N^{-1} \sum_i y'_i$

Now repeat a gazillion times and examine the resulting distribution of estimates
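As a concrete illustration, here is a minimal MATLAB sketch of the whole procedure (the synthetic data, variable names, and choice of 10^5 repetitions are my own; the resampling one-liner is the one shown on Page 11):

% bootstrap the distribution of the sample mean
N = 100;
y = 50 + 100*randn(N,1);         % synthetic data: normal, ybar = 50, sigma = 100
Nboot = 100000;                  % number of bootstrap realizations
ymean_boot = zeros(Nboot,1);
for k = 1:Nboot
    yprime = y(unidrnd(N,N,1));  % resample with replacement
    ymean_boot(k) = mean(yprime);
end
histogram(ymean_boot);           % empirical distribution of the estimate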

Page 8:

Note that we are doing random sampling with replacement of the original dataset y to create a new dataset y'.

Note: the same datum, $y_i$, may appear several times in the new dataset, y'.

Page 9:

A pot of an infinite number of y's with distribution p(y); a cup of N y's drawn from the pot.

Does a cup drawn from the pot capture the statistical behavior of what's in the pot?

Page 10:

More or less the same thing in the 2 pots?

[Diagram: take 1 cup from the pot with distribution p(y), duplicate the cup an infinite number of times, and pour into a new pot, which again has distribution p(y).]

Page 11:

Random sampling is easy to code in MATLAB:

yprime = y(unidrnd(N,N,1));

Here unidrnd(N,N,1) generates a vector of N random integers between 1 and N, y is the original data, and yprime is the resampled data.

Page 12:

The theoretical and bootstrap results match pretty well!

[Figure: the theoretical distribution overlain on a bootstrap with $10^5$ realizations.]

Page 13:

Obviously, bootstrapping is of limited utility when we know the theoretical distribution (as in the previous example)

Page 14:

but it can be very useful when we don't

for example, what's the distribution of $\sigma_y^{est}$, where

$(\sigma_y^{est})^2 = \frac{1}{N-1} \sum_i (y_i - \bar{y}^{est})^2$ and $\bar{y}^{est} = \frac{1}{N} \sum_i y_i$ ?

(Yes, I know a statistician could write this down in closed form: for normal data it is a scaled chi distribution …)

Page 15:

To do the bootstrap we calculate

$\bar{y}'^{est} = \frac{1}{N} \sum_i y'_i$

$(\sigma_y'^{est})^2 = \frac{1}{N-1} \sum_i (y'_i - \bar{y}'^{est})^2$

and $\sigma_y'^{est} = \sqrt{(\sigma_y'^{est})^2}$

many times – say $10^5$ times
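A minimal MATLAB sketch of this calculation (the synthetic data and variable names are my own):

% bootstrap the distribution of the sample standard deviation
N = 100;
y = 50 + 100*randn(N,1);          % synthetic data as in the earlier example
Nboot = 100000;
sigma_boot = zeros(Nboot,1);
for k = 1:Nboot
    yprime = y(unidrnd(N,N,1));   % resample with replacement
    sigma_boot(k) = std(yprime);  % std() uses the 1/(N-1) normalization
end
fprintf('expected value %.1f, spread %.1f\n', mean(sigma_boot), std(sigma_boot));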

Page 16:

Here's the bootstrap result …

[Figure: histogram of $p(\sigma_y^{est})$ versus $\sigma_y^{est}$ from a bootstrap with $10^5$ realizations, with the true value $\sigma_y^{true}$ marked.]

I numerically calculate an expected value of 92.8 and a standard deviation of 6.2.

Note that the distribution is not quite centered about the true value of 100.

This is random variation. The original N=100 data are not quite representative of an infinite ensemble of normally-distributed values.

Page 17:

So we would be justified in saying

$\sigma_y \approx 92.6 \pm 12.4$

that is, $\pm 2 \times 6.2$, as the 95% confidence interval
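Alternatively, using the sigma_boot vector from the sketch above, the 95% confidence interval can be read directly off the percentiles of the bootstrap distribution (prctile is in the Statistics Toolbox, like unidrnd):

% 95% confidence interval straight from the bootstrap percentiles
ci = prctile(sigma_boot, [2.5 97.5]);
fprintf('sigma_y between %.1f and %.1f with 95%% confidence\n', ci(1), ci(2));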

Page 18:

The Maximum Likelihood Method

A way to fit parameterized probability distributions to data

very handy when you have good reason to believe the data follow a particular distribution

Page 19:

Likelihood Function, L

The logarithm of the probable-ness of a given dataset

Page 20:

N data y are all drawn from the same distribution p(y)

the probable-ness of a single measurement yi is p(yi)

So the probable-ness of the whole dataset is

$p(y_1)\, p(y_2) \cdots p(y_N) = \prod_i p(y_i)$

$L = \ln \prod_i p(y_i) = \sum_i \ln p(y_i)$
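For instance, a minimal MATLAB sketch evaluating this sum for normally distributed data (the data and candidate parameter values are my own illustrative choices; normpdf is the Statistics Toolbox normal density):

% log-likelihood of a dataset under a candidate normal distribution
y = 50 + 100*randn(100,1);              % illustrative data
ybar = 50; sigma = 100;                 % candidate parameter values
L = sum(log(normpdf(y, ybar, sigma)))   % L = sum over i of ln p(y_i)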

Page 21:

Now imagine that the distribution p(y) is known up to a vector $\mathbf{m}$ of unknown parameters

write $p(y; \mathbf{m})$, with the semicolon as a reminder that it's not a joint probability

Then L is a function of $\mathbf{m}$:

$L(\mathbf{m}) = \sum_i \ln p(y_i; \mathbf{m})$

Page 22:

The Principle of Maximum Likelihood

Choose $\mathbf{m}$ so that it maximizes $L(\mathbf{m})$:

$\partial L/\partial m_i = 0$

the dataset that was in fact observed is the most probable one that could have been observed

Page 23:

Example – a normal distribution of unknown mean $\bar{y}$ and variance $\sigma^2$:

$p(y_i) = (2\pi)^{-1/2} \sigma^{-1} \exp\{-\tfrac{1}{2} \sigma^{-2} (y_i-\bar{y})^2\}$

$L = \sum_i \ln p(y_i) = -\tfrac{1}{2} N \ln(2\pi) - N \ln(\sigma) - \tfrac{1}{2} \sigma^{-2} \sum_i (y_i-\bar{y})^2$

$\partial L/\partial \bar{y} = 0 = \sigma^{-2} \sum_i (y_i-\bar{y})$

$\partial L/\partial \sigma = 0 = -N \sigma^{-1} + \sigma^{-3} \sum_i (y_i-\bar{y})^2$

(the N's arise because the sums run from 1 to N)

Page 24:

Solving for $\bar{y}$ and $\sigma$:

$0 = \sigma^{-2} \sum_i (y_i-\bar{y})$ gives $\bar{y} = N^{-1} \sum_i y_i$

$0 = -N \sigma^{-1} + \sigma^{-3} \sum_i (y_i-\bar{y})^2$ gives $\sigma^2 = N^{-1} \sum_i (y_i-\bar{y})^2$

Page 25:

Interpreting the results:

$\bar{y} = N^{-1} \sum_i y_i$: the sample mean is the maximum likelihood estimate of the expected value of the normal distribution.

$\sigma^2 = N^{-1} \sum_i (y_i-\bar{y})^2$: the sample variance (more-or-less*) is the maximum likelihood estimate of the variance of the normal distribution.

*the issue of N vs. N−1 in the formula

Page 26:

Example – 100 data drawn from a normal distribution

true values: $\bar{y} = 50$, $\sigma = 100$

Page 27:

[Figure: the likelihood surface $L(\bar{y}, \sigma)$ as a function of $\bar{y}$ and $\sigma$, with its maximum at $\bar{y} = 62$, $\sigma = 107$.]
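A brute-force way to reproduce such a surface is a grid search over the two parameters. This sketch (the grid ranges and resolution are my own choices) evaluates L on a grid and reports the maximum:

% grid search for the maximum of L(ybar, sigma), normal data
y = 50 + 100*randn(100,1);           % illustrative data
ybars  = linspace(0, 100, 201);      % trial values of the mean
sigmas = linspace(50, 200, 201);     % trial values of sigma
L = zeros(length(ybars), length(sigmas));
for i = 1:length(ybars)
    for j = 1:length(sigmas)
        L(i,j) = sum(log(normpdf(y, ybars(i), sigmas(j))));
    end
end
[~, imax] = max(L(:));               % locate the maximum
[i, j] = ind2sub(size(L), imax);
fprintf('max at ybar = %.1f, sigma = %.1f\n', ybars(i), sigmas(j));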

Page 28:

Another example – the (two-sided) exponential distribution:

$p(y_i) = \tfrac{1}{2} \sigma^{-1} \exp\{-\sigma^{-1} |y_i - \bar{y}|\}$

Check the normalization … use $z = y_i - \bar{y}$:

$\int p(y_i)\, dy_i = \tfrac{1}{2} \sigma^{-1} \int_{-\infty}^{+\infty} \exp\{-\sigma^{-1} |y_i-\bar{y}|\}\, dy_i = \tfrac{1}{2} \sigma^{-1} \cdot 2 \int_0^{+\infty} \exp\{-\sigma^{-1} z\}\, dz = \sigma^{-1} (-\sigma) \exp\{-\sigma^{-1} z\}\big|_0^{+\infty} = 1$

Is the parameter $\bar{y}$ really the expectation? Is the parameter $\sigma^2$ really the variance?

Page 29:

Is $\bar{y}$ the expectation?

$E(y_i) = \int_{-\infty}^{+\infty} y_i\, \tfrac{1}{2} \sigma^{-1} \exp\{-\sigma^{-1} |y_i-\bar{y}|\}\, dy_i$

use $z = y_i - \bar{y}$:

$E(y_i) = \tfrac{1}{2} \sigma^{-1} \int_{-\infty}^{+\infty} (z+\bar{y}) \exp\{-\sigma^{-1} |z|\}\, dz$

The $z \exp\{-\sigma^{-1}|z|\}$ term is an odd function times an even function, so its integral is zero, leaving

$E(y_i) = \tfrac{1}{2} \sigma^{-1} \cdot 2 \bar{y} \int_0^{+\infty} \exp\{-\sigma^{-1} z\}\, dz = -\bar{y} \exp\{-\sigma^{-1} z\}\big|_0^{+\infty} = \bar{y}$

YES!

Page 30:

Is $\sigma^2$ the variance?

$\mathrm{var}(y_i) = \int_{-\infty}^{+\infty} (y_i-\bar{y})^2\, \tfrac{1}{2} \sigma^{-1} \exp\{-\sigma^{-1} |y_i-\bar{y}|\}\, dy_i$

use $z = \sigma^{-1}(y_i-\bar{y})$, so $dy_i = \sigma\, dz$:

$\mathrm{var}(y_i) = \tfrac{1}{2} \sigma^{-1} \int_{-\infty}^{+\infty} \sigma^2 z^2 \exp\{-|z|\}\, \sigma\, dz = \sigma^2 \int_0^{+\infty} z^2 \exp\{-z\}\, dz = 2\sigma^2$

(the CRC Math Handbook gives this last integral as equal to 2)

Not quite … the variance is $2\sigma^2$, not $\sigma^2$.

Page 31:

The maximum likelihood estimate:

$L = N \ln(\tfrac{1}{2}) - N \ln(\sigma) - \sigma^{-1} \sum_i |y_i - \bar{y}|$

$\partial L/\partial \bar{y} = 0 = \sigma^{-1} \sum_i \mathrm{sgn}(y_i - \bar{y})$

$\partial L/\partial \sigma = 0 = -N \sigma^{-1} + \sigma^{-2} \sum_i |y_i - \bar{y}|$

(here we use $d|x|/dx = \mathrm{sgn}(x)$: +1 for $x > 0$ and −1 for $x < 0$)

So $\bar{y}$ is such that $\sum_i \mathrm{sgn}(y_i - \bar{y}) = 0$. The sum is zero when half the $y_i$'s are bigger than $\bar{y}$ and half of them are smaller, i.e. $\bar{y}$ is the median of the $y_i$'s.

Page 32:

Once $\bar{y}$ is known, then …

$\partial L/\partial \sigma = 0 = -N \sigma^{-1} + \sigma^{-2} \sum_i |y_i - \bar{y}|$

$\sigma = N^{-1} \sum_i |y_i - \bar{y}|$ with $\bar{y} = \mathrm{median}(y)$

Note that when N is even, $\bar{y}$ is not unique, but can be anything between the two middle values in a sorted list of the $y_i$'s.
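In MATLAB these two estimates are one-liners; a minimal sketch (the data and variable names are my own):

% maximum likelihood estimates for the two-sided exponential distribution
y = 50 + 100*randn(100,1);            % illustrative data
ybar_est  = median(y);                % ML estimate of the center
sigma_est = mean(abs(y - ybar_est));  % sigma = (1/N) * sum of |y_i - ybar|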

Page 33:

Comparison

Normal distribution:

best estimate of expected value is sample mean

Exponential distribution:

best estimate of expected value is sample median

Page 34:

Comparison

Normal distribution: short-tailed, so outliers are extremely uncommon; the expected value should be chosen to make outliers have as small a deviation as possible.

Exponential distribution: relatively long-tailed, so outliers are relatively common; the expected value should ignore the actual value of outliers.

[Figure: two number lines of $y_i$'s, each marking the median and the mean; in the long-tailed case an outlier pulls the mean away but leaves the median put.]

Page 35:

another important distribution: the Gutenberg-Richter distribution (e.g. earthquake magnitudes)

for earthquakes greater than some threshold magnitude $m_0$, the probability that the earthquake will have a magnitude greater than m is

$P(m) = 10^{-b(m-m_0)}$

or $P(m) = \exp\{-\ln(10)\, b\, (m-m_0)\} = \exp\{-b'(m-m_0)\}$ with $b' = \ln(10)\, b$

Page 36:

This is a cumulative (exceedance) distribution, thus the probability that the magnitude is greater than $m_0$ is unity:

$P(m_0) = \exp\{-b'(m_0-m_0)\} = \exp\{0\} = 1$

The probability density distribution is its (negative) derivative:

$p(m) = b' \exp\{-b'(m-m_0)\}$
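Spelled out, the step from the exceedance probability to the density is the one-line check

$p(m) = -\dfrac{dP}{dm} = -\dfrac{d}{dm} \exp\{-b'(m-m_0)\} = b' \exp\{-b'(m-m_0)\}$

where the minus sign appears because $P(m)$ is the probability of exceeding m, which decreases as m grows.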

Page 37:

The maximum likelihood estimate of b' is:

$L = N \ln(b') - b' \sum_i (m_i - m_0)$

$\partial L/\partial b' = 0 = N/b' - \sum_i (m_i - m_0)$

$b' = N \big/ \sum_i (m_i - m_0)$
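As a sketch in MATLAB (the synthetic catalog with b = 1 and the threshold m0 are my own illustrative choices; exprnd is the Statistics Toolbox exponential generator, and MATLAB's log() is the natural logarithm):

% maximum likelihood estimate of b' from a magnitude catalog
m0 = 4.0;                                    % threshold magnitude
m  = m0 + exprnd(1/(log(10)*1.0), 1000, 1);  % synthetic catalog with b = 1
bprime_est = length(m) / sum(m - m0);        % b' = N / sum of (m_i - m0)
b_est = bprime_est / log(10)                 % convert back to b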

Page 38:

Originally, Gutenberg & Richter made a mistake … by estimating the slope b using least-squares, and not the maximum likelihood formula.

[Figure: $\log_{10} P(m)$ versus magnitude m, with slope = −b, and a least-squares fit to the curve.]

Page 39:

yet another important distribution: the Fisher distribution on a sphere (e.g. paleomagnetic directions)

given unit vectors $\mathbf{x}_i$ that scatter around some mean direction $\bar{\mathbf{x}}$, the probability distribution for the angle $\theta$ between $\mathbf{x}_i$ and $\bar{\mathbf{x}}$ (that is, $\cos(\theta) = \mathbf{x}_i \cdot \bar{\mathbf{x}}$) is

$p(\theta) = \dfrac{\kappa \sin(\theta) \exp\{\kappa \cos(\theta)\}}{2 \sinh(\kappa)}$

where $\kappa$ is called the "precision parameter"

Page 40:

Rationale for the functional form:

$p(\theta) \propto \exp\{\kappa \cos(\theta)\}$

For $\theta$ close to zero, $\cos(\theta) \approx 1 - \tfrac{1}{2}\theta^2$, so

$p(\theta) \propto \exp\{\kappa \cos(\theta)\} \approx \exp\{\kappa\} \exp\{-\tfrac{1}{2} \kappa \theta^2\}$

which is a Gaussian in $\theta$.

Page 41:

I'll let you figure out the maximum likelihood estimates of the central direction, $\bar{\mathbf{x}}$, and the precision parameter, $\kappa$.