Lecture 2 Probability and what it has to do with data analysis

Upload: maya-stokes

Post on 04-Jan-2016


Page 1: Lecture 2 Probability and what it has to do with data analysis

Lecture 2

Probability and what it has to do with data analysis

Page 2: Lecture 2 Probability and what it has to do with data analysis

Abstraction

Random variable, x

it has no set value, until you ‘realize’ it

its properties are described by a probability, P

Page 3: Lecture 2 Probability and what it has to do with data analysis

One way to think about it

pot of an infinite number of x’s

Drawing one x from the pot “realizes” x

[Figure: p(x) plotted against x]

Page 4: Lecture 2 Probability and what it has to do with data analysis

Describing P

If x can take on only discrete values,

say (1, 2, 3, 4, or 5)

then a table would work:

x    1     2     3     4     5
P    10%   30%   40%   15%   5%

Probabilities should sum to 100%

15% probability that x=4

Page 5: Lecture 2 Probability and what it has to do with data analysis

Sometimes you see probabilities written as fractions, instead of percentages

x    1      2      3      4      5
P    0.10   0.30   0.40   0.15   0.05

Probabilities should sum to 1

0.15 probability that x=4

And sometimes you see probabilities plotted as a histogram

[Figure: histogram of P(x) versus x for x = 1 to 5, vertical axis from 0.0 to 0.5; the bar at x=4 has height 0.15]
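A small sketch may help make the "pot of x's" picture concrete. It stores the table of probabilities (10%, 30%, 40%, 15%, 5% for x = 1 to 5), checks that they sum to 1, and then realizes x many times. The function name `realize`, the seed, and the number of draws are assumptions made for illustration only.

```python
import random

# Probability table for x = 1..5, written as fractions rather than percentages.
P = {1: 0.10, 2: 0.30, 3: 0.40, 4: 0.15, 5: 0.05}

# A valid table must sum to 1 (allowing for floating-point round-off).
total = sum(P.values())
assert abs(total - 1.0) < 1e-12

def realize(rng):
    """Draw one x from the 'pot', i.e. realize the random variable once."""
    return rng.choices(list(P.keys()), weights=list(P.values()), k=1)[0]

rng = random.Random(0)  # fixed seed (an arbitrary choice) so the run is repeatable
draws = [realize(rng) for _ in range(10000)]

# Observed frequencies should approach the table as the number of draws grows.
freq = {x: draws.count(x) / len(draws) for x in P}
print(freq)
```

With more draws the printed frequencies should settle closer to the tabulated probabilities, which is the sense in which P describes the properties of x.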

Page 6: Lecture 2 Probability and what it has to do with data analysis

If x can take on any value, then use a smooth function (or “distribution”) p(x) instead of a table

[Figure: p(x) versus x, with the area under the curve between x1 and x2 marked]

probability that x is between x1 and x2 is proportional to this area

mathematically $P(x_1 < x < x_2) = \int_{x_1}^{x_2} p(x)\, dx$

Page 7: Lecture 2 Probability and what it has to do with data analysis

[Figure: p(x) versus x]

Probability that x is between $-\infty$ and $+\infty$ is 100%, so total area = 1

Mathematically $\int_{-\infty}^{+\infty} p(x)\, dx = 1$
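The two area statements can be checked numerically. The sketch below is only an illustration: the density p(x) = exp(-x) for x ≥ 0 is an assumed example (it does not come from the lecture), the helper `area` is a plain trapezoid rule, and the infinite limits are truncated to a finite range where the tail is negligible.

```python
import math

def p(x):
    """Example density (an assumption for illustration): p(x) = exp(-x) for x >= 0."""
    return math.exp(-x) if x >= 0.0 else 0.0

def area(f, a, b, n=100000):
    """Approximate the integral of f from a to b with the trapezoid rule."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        s += f(a + i * h)
    return s * h

# Total area: the infinite limits are truncated to [0, 50], where the tail is negligible.
total_area = area(p, 0.0, 50.0)

# Probability that x lies between x1 = 1 and x2 = 2, and its closed form for comparison.
prob_1_2 = area(p, 1.0, 2.0)
exact_1_2 = math.exp(-1.0) - math.exp(-2.0)

print(total_area, prob_1_2, exact_1_2)
```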

Page 8: Lecture 2 Probability and what it has to do with data analysis

One Reason Why all this is relevant …

Any measurement of data that contains noise is treated as a random variable, d

and …

Page 9: Lecture 2 Probability and what it has to do with data analysis

The distribution p(d) embodies both the ‘true value’ of the datum being measured and the measurement noise

and …

Page 10: Lecture 2 Probability and what it has to do with data analysis

All quantities derived from a random variable are themselves random variables,

so …

Page 11: Lecture 2 Probability and what it has to do with data analysis

The algebra of random variables allows you to understand how …

… measurement noise affects inferences made from the data

Page 12: Lecture 2 Probability and what it has to do with data analysis

Basic Description of Distributions

want two basic numbers

1) something that describes what x’s commonly occur

2) something that describes the variability of the x’s

Page 13: Lecture 2 Probability and what it has to do with data analysis

1) something that describes what x’s commonly occur

that is, where the distribution is centered

Page 14: Lecture 2 Probability and what it has to do with data analysis

Mode

x at which distribution has peak

most-likely value of x

[Figure: p(x) versus x, with the peak marked at x_mode]

Page 15: Lecture 2 Probability and what it has to do with data analysis

The most popular car in the US is the Honda CR-V

But the next car you see on the highway will probably not be a Honda CR-V

Where’s a CR-V?

[Image: Honda CR-V]

Page 16: Lecture 2 Probability and what it has to do with data analysis

But modes can be deceptive …

[Figure: p(x) versus x for x from 0 to 10, with the peak marked at x_mode]

100 realizations of x

x       N
0-1     3
1-2     18
2-3     11
3-4     8
4-5     11
5-6     14
6-7     8
7-8     7
8-9     11
9-10    9

Sure, the 1-2 range has the most counts, but most of the measurements are bigger than 2!

Page 17: Lecture 2 Probability and what it has to do with data analysis

Median

50% chance x is smaller than x_median

50% chance x is bigger than x_median

No special reason the median needs to coincide with the peak

[Figure: p(x) versus x, with x_median splitting the area into 50% on each side]
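The table of 100 realizations makes a handy test case for both ideas. The sketch below finds the modal bin, the fraction of measurements bigger than 2, and the bin that contains the median. It works only at the resolution of the bins, which is an assumption forced by having counts rather than the raw measurements.

```python
# Bin edges and counts from the table of 100 realizations of x.
bins = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, 10)]
counts = [3, 18, 11, 8, 11, 14, 8, 7, 11, 9]

total = sum(counts)

# Mode: the bin with the most counts.
mode_bin = bins[counts.index(max(counts))]

# Fraction of the measurements that are bigger than 2.
above_2 = sum(n for (lo, hi), n in zip(bins, counts) if lo >= 2) / total

# Median bin: the first bin at which the running count reaches half the total.
running = 0
median_bin = None
for b, n in zip(bins, counts):
    running += n
    if running >= total / 2:
        median_bin = b
        break

print(mode_bin, above_2, median_bin)
```

For these counts the mode falls in the 1-2 bin while the median falls in the 4-5 bin, which is the sense in which the mode can be deceptive.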

Page 18: Lecture 2 Probability and what it has to do with data analysis

Expected value or ‘mean’

value you would get if you took the mean of lots of realizations of x

Let’s examine a discrete distribution, for simplicity ...

[Figure: bar chart of P(x) versus x for x = 1, 2, 3]

Page 19: Lecture 2 Probability and what it has to do with data analysis

Hypothetical table of 140 realizations of x

x        N
1        20
2        80
3        40
Total    140

mean = $[\,20 \times 1 + 80 \times 2 + 40 \times 3\,] / 140$

= $(20/140) \times 1 + (80/140) \times 2 + (40/140) \times 3$

= $p(1) \times 1 + p(2) \times 2 + p(3) \times 3$

= $\sum_i p(x_i)\, x_i$
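The chain of equalities for the mean can be verified directly. This sketch computes the mean of the 140 hypothetical realizations twice, once from the raw counts and once as the probability-weighted sum, using exact fractions so the two results can be compared without round-off. The variable names are illustrative choices.

```python
from fractions import Fraction

# Hypothetical table of 140 realizations: value x -> number of times N it occurred.
counts = {1: 20, 2: 80, 3: 40}
total = sum(counts.values())

# Mean computed directly from the counts.
mean_from_counts = Fraction(sum(n * x for x, n in counts.items()), total)

# The same mean computed as sum_i p(x_i) x_i, with p(x_i) = N_i / total.
p = {x: Fraction(n, total) for x, n in counts.items()}
mean_from_p = sum(p[x] * x for x in counts)

print(mean_from_counts, float(mean_from_counts))
```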

Page 20: Lecture 2 Probability and what it has to do with data analysis

by analogy, for a smooth distribution

Expected (or mean) value of x

$E(x) = \int_{-\infty}^{+\infty} x\, p(x)\, dx$

Page 21: Lecture 2 Probability and what it has to do with data analysis

2) something that describes the variability of the x’s

that is, the width of the distribution

Page 22: Lecture 2 Probability and what it has to do with data analysis

Here’s a perfectly sensible way to define the width of a distribution…

[Figure: p(x) versus x; the width W50 spans the central 50% of the area, with 25% in each tail]

… it’s not used much, though

Page 23: Lecture 2 Probability and what it has to do with data analysis

Width of a distribution

Here’s another way…

[Figure: p(x) versus x, together with the parabola $[x-E(x)]^2$ centered on E(x)]

… multiply and integrate

Page 24: Lecture 2 Probability and what it has to do with data analysis

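Variance = $\sigma^2 = \int_{-\infty}^{+\infty} [x-E(x)]^2\, p(x)\, dx$

[Figure: three plots versus x, each marked at E(x): p(x), the parabola $[x-E(x)]^2$, and their product $[x-E(x)]^2\, p(x)$]

Compute this total area …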

Idea is that if distribution is narrow, then most of the probability lines up with the low spot of the parabola

But if it is wide, then some of the probability lines up with the high parts of the parabola
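The narrow-versus-wide argument can be tried out numerically. In this sketch the two densities are assumed examples (uniform on [4, 6] and uniform on [0, 10], both centered on x = 5), and `integrate` is a simple midpoint rule; neither comes from the lecture.

```python
def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def mean_and_variance(p, a, b):
    """Return E(x) and the variance of a density p that is zero outside [a, b]."""
    mean = integrate(lambda x: x * p(x), a, b)
    var = integrate(lambda x: (x - mean) ** 2 * p(x), a, b)
    return mean, var

def narrow(x):
    """Assumed example: uniform density on [4, 6]."""
    return 0.5 if 4.0 <= x <= 6.0 else 0.0

def wide(x):
    """Assumed example: uniform density on [0, 10]."""
    return 0.1 if 0.0 <= x <= 10.0 else 0.0

mean_n, var_n = mean_and_variance(narrow, 4.0, 6.0)
mean_w, var_w = mean_and_variance(wide, 0.0, 10.0)

# For a uniform density of width w the variance is w**2 / 12.
print(mean_n, var_n)
print(mean_w, var_w)
```

Both densities have the same expected value, but the wide one overlaps the high parts of the parabola and so ends up with a much larger variance.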

Page 25: Lecture 2 Probability and what it has to do with data analysis

$\sqrt{\text{variance}} = \sigma$

A measure of width …

we don’t immediately know its relationship to area, though …

[Figure: p(x) versus x, centered on E(x)]

Page 26: Lecture 2 Probability and what it has to do with data analysis

the Gaussian or normal distribution

$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\{ -(x-\bar{x})^2 / (2\sigma^2) \}$

$\bar{x}$ is expected value

$\sigma^2$ is variance

Memorize me !

Page 27: Lecture 2 Probability and what it has to do with data analysis

Examples of Normal Distributions

[Figure: two plots of p(x) versus x, one with $\bar{x} = 1$, $\sigma = 1$ and one with $\bar{x} = 3$, $\sigma = 0.5$]

Page 28: Lecture 2 Probability and what it has to do with data analysis

Properties of the normal distribution

Expectation = Median = Mode = $\bar{x}$

95% of probability within $2\sigma$ of the expected value

[Figure: p(x) versus x, with 95% of the area between $\bar{x}-2\sigma$ and $\bar{x}+2\sigma$]
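The 95% figure can be checked by integrating the normal density numerically. The sketch below is a rough check rather than a definitive routine: the function names are illustrative, the integration is a plain midpoint rule, and the values 3 and 0.5 for the expected value and the width are taken from one of the example plots.

```python
import math

def normal_pdf(x, xbar, sigma):
    """Normal density with expected value xbar and variance sigma**2."""
    return math.exp(-(x - xbar) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def prob_within(k, xbar, sigma, n=20000):
    """Midpoint-rule integral of the density from xbar - k*sigma to xbar + k*sigma."""
    a, b = xbar - k * sigma, xbar + k * sigma
    h = (b - a) / n
    return sum(normal_pdf(a + (i + 0.5) * h, xbar, sigma) for i in range(n)) * h

# Probability within two sigma of the expected value.
p2 = prob_within(2.0, 3.0, 0.5)

# Closed form for comparison: P(|x - xbar| < k*sigma) = erf(k / sqrt(2)).
p2_exact = math.erf(2.0 / math.sqrt(2.0))

print(p2, p2_exact)
```

The result does not depend on the particular expected value or width, only on the number of sigmas, so the same call with other parameters should print the same probability.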

Page 29: Lecture 2 Probability and what it has to do with data analysis

Again, Why all this is relevant …

Inference depends on data …

You use measurement, d, to deduce the values of some underlying parameter of interest, m.

e.g. use measurements of travel time, d, to deduce the seismic velocity, m, of the earth

Page 30: Lecture 2 Probability and what it has to do with data analysis

model parameter, m, depends on measurement, d

so m is a function of d, m(d)

so …

Page 31: Lecture 2 Probability and what it has to do with data analysis

If data, d, is a random variable

then so is model parameter, m

All inferences made from uncertain data are themselves uncertain

Model parameters are described by a distribution, p(m)

Page 32: Lecture 2 Probability and what it has to do with data analysis

Functions of a random variable

any function of a random variable is itself a random variable

Page 33: Lecture 2 Probability and what it has to do with data analysis

Special case of a linear relationship and a normal distribution

Normal p(d) with mean $\bar{d}$ and variance $\sigma_d^2$

Linear relationship m = a d + b

Normal p(m) with mean $a\bar{d}+b$ and variance $a^2 \sigma_d^2$
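A quick simulation gives a feel for the linear special case. The numbers below (mean 10 and standard deviation 2 for d, and a = 3, b = -5) are hypothetical values chosen only for illustration, as are the seed and the sample size. The sketch realizes d many times, applies m = a d + b to each realization, and compares the sample mean and variance of m with the rule.

```python
import random

rng = random.Random(1)  # fixed seed (arbitrary) for repeatability

# Hypothetical numbers, chosen only for illustration.
d_bar, sigma_d = 10.0, 2.0   # mean and standard deviation of the datum d
a, b = 3.0, -5.0             # linear relationship m = a d + b

# Realize d many times and push each realization through the linear function.
d = [rng.gauss(d_bar, sigma_d) for _ in range(100000)]
m = [a * di + b for di in d]

m_mean = sum(m) / len(m)
m_var = sum((mi - m_mean) ** 2 for mi in m) / len(m)

print(m_mean, a * d_bar + b)          # sample mean vs. a*d_bar + b
print(m_var, a ** 2 * sigma_d ** 2)   # sample variance vs. a^2 sigma_d^2
```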

Page 34: Lecture 2 Probability and what it has to do with data analysis

multivariate distributions

Page 35: Lecture 2 Probability and what it has to do with data analysis

Example

Liberty Island is inhabited by both pigeons and seagulls

40% of the birds are pigeons and 60% of the birds are gulls

50% of pigeons are white and 50% are grey

100% of gulls are white

Page 36: Lecture 2 Probability and what it has to do with data analysis

Two variables

species s takes two values

pigeon p

and gull g

color c takes two values

white w

and tan (grey) t

Of 100 birds,

20 are white pigeons

20 are grey pigeons

60 are white gulls

0 are grey gulls

Page 37: Lecture 2 Probability and what it has to do with data analysis

What is the probability that a bird (a random bird, that is) has species s and color c ?

          c = w   c = t
s = p     20%     20%
s = g     60%     0%

Note: sum of all boxes is 100%

Page 38: Lecture 2 Probability and what it has to do with data analysis

This is called the Joint Probability

and is written

P(s,c)

Page 39: Lecture 2 Probability and what it has to do with data analysis

Two continuous variables, say x1 and x2,

have a joint probability distribution, written

p(x1, x2)

with $\iint p(x_1, x_2)\, dx_1\, dx_2 = 1$

Page 40: Lecture 2 Probability and what it has to do with data analysis

You would contour a joint probability distribution

and it would look something like

[Figure: contour plot of p(x1, x2) in the (x1, x2) plane]

Page 41: Lecture 2 Probability and what it has to do with data analysis

What is the probability that a bird has color c ?

start with P(s,c)

          c = w   c = t
s = p     20%     20%
s = g     60%     0%

and sum columns to get P(c)

          c = w   c = t
P(c)      80%     20%

Of 100 birds,

20 are white pigeons

20 are grey pigeons

60 are white gulls

0 are grey gulls

Page 42: Lecture 2 Probability and what it has to do with data analysis

What is the probability that a bird has species s ?

start with P(s,c)

          c = w   c = t
s = p     20%     20%
s = g     60%     0%

and sum rows to get P(s)

          P(s)
s = p     40%
s = g     60%

Of 100 birds,

20 are white pigeons

20 are grey pigeons

60 are white gulls

0 are grey gulls
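Summing columns and summing rows can be written out as a short sketch. The joint table below holds the bird percentages as fractions; the dictionary layout and names are illustrative choices, not anything prescribed by the lecture.

```python
# Joint probability P(s, c) from the bird example, as fractions.
# Keys are (species, color): p = pigeon, g = gull, w = white, t = tan/grey.
P_sc = {("p", "w"): 0.20, ("p", "t"): 0.20,
        ("g", "w"): 0.60, ("g", "t"): 0.00}

species = ["p", "g"]
colors = ["w", "t"]

# Sum down each column (over species) to get P(c).
P_c = {c: sum(P_sc[(s, c)] for s in species) for c in colors}

# Sum along each row (over colors) to get P(s).
P_s = {s: sum(P_sc[(s, c)] for c in colors) for s in species}

print(P_c)  # approximately 0.8 for white and 0.2 for tan/grey
print(P_s)  # approximately 0.4 for pigeons and 0.6 for gulls
```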

Page 43: Lecture 2 Probability and what it has to do with data analysis

These operations make sense with distributions, too

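[Figure: contour plot of p(x1, x2), shown twice; integrating over x2 gives the curve p(x1), and integrating over x1 gives the curve p(x2)]

$p(x_1) = \int p(x_1, x_2)\, dx_2$

distribution of x1 (irrespective of x2)

$p(x_2) = \int p(x_1, x_2)\, dx_1$

distribution of x2 (irrespective of x1)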

Page 44: Lecture 2 Probability and what it has to do with data analysis

Given that a bird is species s, what is the probability that it has color c ?

          c = w   c = t
s = p     50%     50%
s = g     100%    0%

Note, all rows sum to 100%

Of 100 birds,

20 are white pigeons

20 are grey pigeons

60 are white gulls

0 are grey gulls

Page 45: Lecture 2 Probability and what it has to do with data analysis

This is called the Conditional Probability of c given s

and is written P(c|s)

similarly …

Page 46: Lecture 2 Probability and what it has to do with data analysis

Given that a bird is color c, what is the probability that it has species s ?

          c = w   c = t
s = p     25%     100%
s = g     75%     0%

Note, all columns sum to 100%

Of 100 birds,

20 are white pigeons

20 are grey pigeons

60 are white gulls

0 are grey gulls

So 25% of white birds are pigeons

Page 47: Lecture 2 Probability and what it has to do with data analysis

This is called the Conditional Probability of s given c

and is written

P(s|c)
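Both conditional tables can be derived from the joint table by dividing by the appropriate sum. The sketch below is one way to do it, assuming the bird percentages stored as fractions; the key order of each dictionary is an illustrative convention that mirrors the notation P(c|s) and P(s|c).

```python
# Joint probability P(s, c), as fractions (p = pigeon, g = gull, w = white, t = tan/grey).
P_sc = {("p", "w"): 0.20, ("p", "t"): 0.20,
        ("g", "w"): 0.60, ("g", "t"): 0.00}
species, colors = ["p", "g"], ["w", "t"]

P_s = {s: sum(P_sc[(s, c)] for c in colors) for s in species}
P_c = {c: sum(P_sc[(s, c)] for s in species) for c in colors}

# P(c|s), keyed as (c, s): divide each row of the joint table by its row sum P(s).
P_c_given_s = {(c, s): P_sc[(s, c)] / P_s[s] for s in species for c in colors}

# P(s|c), keyed as (s, c): divide each column of the joint table by its column sum P(c).
P_s_given_c = {(s, c): P_sc[(s, c)] / P_c[c] for s in species for c in colors}

print(P_c_given_s[("w", "p")])  # probability that a pigeon is white, about 0.5
print(P_s_given_c[("p", "w")])  # probability that a white bird is a pigeon, about 0.25
```

The two printed numbers differ, which is a reminder that the order of the conditioning matters.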

Page 48: Lecture 2 Probability and what it has to do with data analysis

Beware! P(c|s) ≠ P(s|c)

P(c|s):

          c = w   c = t
s = p     50%     50%
s = g     100%    0%

P(s|c):

          c = w   c = t
s = p     25%     100%
s = g     75%     0%

Page 49: Lecture 2 Probability and what it has to do with data analysis

Lots of errors occur from confusing the two:

Probability that, if you have pancreatic cancer, you will die from it: 90%

Probability that, if you die, you will have died of pancreatic cancer: 1.4%

[Image: actor Patrick Swayze, pancreatic cancer victim]

Page 50: Lecture 2 Probability and what it has to do with data analysis

note

P(s,c) = P(s|c) P(c)

P(s,c), in %:

          c = w   c = t
s = p     20      20
s = g     60      0

= P(s|c), in %:

          c = w   c = t
s = p     25      100
s = g     75      0

times P(c), in %:

          c = w   c = t
          80      20

25% of 80 is 20

Page 51: Lecture 2 Probability and what it has to do with data analysis

and

P(s,c) = P(c|s) P(s)

P(s,c), in %:

          c = w   c = t
s = p     20      20
s = g     60      0

= P(c|s), in %:

          c = w   c = t
s = p     50      50
s = g     100     0

times P(s), in %:

s = p     40
s = g     60

50% of 40 is 20

Page 52: Lecture 2 Probability and what it has to do with data analysis

and if

P(s,c) = P(s|c) P(c) = P(c|s) P(s)

then

P(s|c) = P(c|s) P(s) / P(c)

and

P(c|s) = P(s|c) P(c) / P(s)

… which is called Bayes Theorem
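Bayes Theorem can be exercised on the bird example starting only from what the example states directly: the species fractions P(s) and the color fractions within each species, P(c|s). The sketch below is illustrative; the function name `posterior` is an assumption, and P(c) is obtained by summing P(c|s) P(s) over the species.

```python
# What the example states directly: P(s) and P(c|s).
P_s = {"p": 0.40, "g": 0.60}                         # 40% pigeons, 60% gulls
P_c_given_s = {("w", "p"): 0.50, ("t", "p"): 0.50,   # pigeons: half white, half grey
               ("w", "g"): 1.00, ("t", "g"): 0.00}   # gulls: all white

def posterior(s, c):
    """P(s|c) = P(c|s) P(s) / P(c), with P(c) found by summing P(c|s) P(s) over s."""
    P_c = sum(P_c_given_s[(c, k)] * P_s[k] for k in P_s)
    return P_c_given_s[(c, s)] * P_s[s] / P_c

print(posterior("p", "w"))  # about 0.25: a white bird is a pigeon a quarter of the time
print(posterior("p", "t"))  # about 1.0: every grey bird is a pigeon
```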

Page 53: Lecture 2 Probability and what it has to do with data analysis

In this example

bird color is the observable, the “data”

bird species is the “model parameter”

P(c|s), “color given species”, or P(d|m) is

“making a prediction based on the model”

Given a pigeon, what is the probability that it’s grey?

P(s|c), “species given color”, or P(m|d) is

“making an inference from the data”

Given a grey bird, what is the probability that it’s a pigeon?

Page 54: Lecture 2 Probability and what it has to do with data analysis

Why Bayes Theorem is important

It provides a framework for relating

making a prediction from the model, P(d|m)

to

making an inference from the data, P(m|d)

Page 55: Lecture 2 Probability and what it has to do with data analysis

Bayes Theorem also implies that the joint distribution of data and model parameters

p(d, m)

is the fundamental quantity

If you know p(d, m), you know everything there is to know …

Page 56: Lecture 2 Probability and what it has to do with data analysis

Expectation

Variance

And

Covariance

Of a multivariate distribution

Page 57: Lecture 2 Probability and what it has to do with data analysis

The expectation is computed by first reducing the distribution to one dimension

[Figure: contour plot of p(x1, x2), reduced to the one-dimensional distributions p(x1) and p(x2)]

take the expectation of p(x1) to get $\bar{x}_1$

take the expectation of p(x2) to get $\bar{x}_2$

Page 58: Lecture 2 Probability and what it has to do with data analysis

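The variance is also computed by first reducing the distribution to one dimension

[Figure: contour plot of p(x1, x2), reduced to the one-dimensional distributions p(x1) and p(x2), with the widths $\sigma_1$ and $\sigma_2$ marked]

take the variance of p(x1) to get $\sigma_1^2$

take the variance of p(x2) to get $\sigma_2^2$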

Page 59: Lecture 2 Probability and what it has to do with data analysis

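Note that in this distribution, if x1 is bigger than $\bar{x}_1$, then x2 is bigger than $\bar{x}_2$, and if x1 is smaller than $\bar{x}_1$, then x2 is smaller than $\bar{x}_2$

[Figure: contour plot of p(x1, x2), with the expected value ($\bar{x}_1$, $\bar{x}_2$) marked]

This is a positive correlation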

Page 60: Lecture 2 Probability and what it has to do with data analysis

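Conversely, in this distribution, if x1 is bigger than $\bar{x}_1$, then x2 is smaller than $\bar{x}_2$, and if x1 is smaller than $\bar{x}_1$, then x2 is bigger than $\bar{x}_2$

[Figure: contour plot of p(x1, x2), with the expected value ($\bar{x}_1$, $\bar{x}_2$) marked]

This is a negative correlation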

Page 61: Lecture 2 Probability and what it has to do with data analysis

This correlation can be quantified by multiplying the distribution by a four-quadrant function

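[Figure: the (x1, x2) plane divided at $\bar{x}_1$ and $\bar{x}_2$ into four quadrants, two marked + and two marked -]

And then integrating. The function $(x_1-\bar{x}_1)(x_2-\bar{x}_2)$ works fine

$C = \iint (x_1-\bar{x}_1)(x_2-\bar{x}_2)\, p(x_1,x_2)\, dx_1\, dx_2$

Called the “covariance”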

Page 62: Lecture 2 Probability and what it has to do with data analysis

Note that the matrix C with elements

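$C_{ij} = \iint (x_i-\bar{x}_i)(x_j-\bar{x}_j)\, p(x_1,x_2)\, dx_1\, dx_2$

has diagonal elements of $\sigma_i^2$, the variance of $x_i$

and off-diagonal elements of $\mathrm{cov}(x_i,x_j)$, the covariance of $x_i$ and $x_j$

For example, with three variables:

$C = \begin{pmatrix} \sigma_1^2 & \mathrm{cov}(x_1,x_2) & \mathrm{cov}(x_1,x_3) \\ \mathrm{cov}(x_1,x_2) & \sigma_2^2 & \mathrm{cov}(x_2,x_3) \\ \mathrm{cov}(x_1,x_3) & \mathrm{cov}(x_2,x_3) & \sigma_3^2 \end{pmatrix}$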
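In practice the covariance matrix is often estimated from realizations rather than by integrating a known distribution. The sketch below does that for a hypothetical correlated pair, where x2 is x1 plus independent noise; the construction, the seed, and the helper names are assumptions made for illustration.

```python
import random

rng = random.Random(2)  # fixed seed (arbitrary) for repeatability

# Hypothetical correlated pair: x2 follows x1 plus independent noise.
N = 100000
x1 = [rng.gauss(0.0, 1.0) for _ in range(N)]
x2 = [v + rng.gauss(0.0, 0.5) for v in x1]

def mean(v):
    """Sample mean of a list of numbers."""
    return sum(v) / len(v)

def cov(u, v):
    """Sample estimate of the integral of (u - ubar)(v - vbar) p(u, v) du dv."""
    ub, vb = mean(u), mean(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v)) / len(u)

# Diagonal elements are variances; off-diagonal elements are covariances.
C = [[cov(x1, x1), cov(x1, x2)],
     [cov(x2, x1), cov(x2, x2)]]

print(C)  # expected to be near [[1, 1], [1, 1.25]] for these choices
```

The positive off-diagonal element reflects the positive correlation built into the construction of x2.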

Page 63: Lecture 2 Probability and what it has to do with data analysis

The “vector of means” of a multivariate distribution

$\bar{x}$

and the “Covariance matrix” of a multivariate distribution

$C_x$

summarize a lot, but not everything, about a multivariate distribution

Page 64: Lecture 2 Probability and what it has to do with data analysis

Functions of a set of random variables, x

A set of N random variables in a vector, x

Page 65: Lecture 2 Probability and what it has to do with data analysis

Special Case

linear function y=Mx

the expectation of y is

$\bar{y} = M \bar{x}$

Memorize!

Page 66: Lecture 2 Probability and what it has to do with data analysis

the covariance of y is

So $C_y = M C_x M^T$

Memorize!

Page 67: Lecture 2 Probability and what it has to do with data analysis

Note that these rules work regardless of the distribution of x

if y is linearly related to x, y=Mx then

$\bar{y} = M \bar{x}$ (rule for means)

$C_y = M C_x M^T$ (rule for propagating error)

Memorize!
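Both rules can be tried on simulated data. In the sketch below x is deliberately non-normal (two independent uniform variables) to illustrate that the rules do not rely on a particular distribution. The matrix M, the ranges, the seed, and the helper functions are all hypothetical choices; the matrices are small enough to handle with plain lists.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]

def sample_mean_cov(rows):
    """Sample mean vector and covariance matrix of a list of equal-length vectors."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[i] for r in rows) / n for i in range(d)]
    C = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n for j in range(d)]
         for i in range(d)]
    return mu, C

rng = random.Random(3)  # fixed seed (arbitrary) for repeatability

# Hypothetical x: two independent, deliberately non-normal (uniform) variables.
xs = [[rng.uniform(0.0, 2.0), rng.uniform(-3.0, 3.0)] for _ in range(50000)]
M = [[1.0, 2.0], [0.5, -1.0]]  # hypothetical linear map y = M x

ys = [[sum(M[i][k] * x[k] for k in range(2)) for i in range(2)] for x in xs]

x_bar, Cx = sample_mean_cov(xs)
y_bar, Cy = sample_mean_cov(ys)

# Rule for means and rule for propagating error, applied to the sample statistics of x.
y_bar_rule = [sum(M[i][k] * x_bar[k] for k in range(2)) for i in range(2)]
Cy_rule = matmul(matmul(M, Cx), transpose(M))

print(y_bar, y_bar_rule)
print(Cy, Cy_rule)
```

Because y is an exactly linear function of x, the statistics computed directly from the y samples should match the rule-based values to within round-off, not just to within sampling error.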