
Page 1

Chi Square Distribution (χ²) and Least Squares Fitting

Chi Square Probability Distribution Function (See Taylor Ch 8, 12, App. D or Barlow P106-8, 150-3)

Assume: We have a set of measurements {x₁, x₂, … xₙ}. We know the true value of each xᵢ (xₜ₁, xₜ₂, … xₜₙ).

Obviously the closer the (x₁, x₂, … xₙ)'s are to the (xₜ₁, xₜ₂, … xₜₙ)'s, the better (or more accurate) the measurements.

Can we get more specific? Can we put a number (or probability) on how well they agree? Assume: The measurements are independent of each other.

The measurements come from a Gaussian distribution. Let (σ₁, σ₂ … σₙ) be the standard deviations associated with each measurement.

Consider the following two possible measures of the quality of the data:

$$R = \frac{1}{n}\sum_{i=1}^{n} d_i = \frac{1}{n}\sum_{i=1}^{n} (x_i - x_{ti})$$

$$\chi^2 = \sum_{i=1}^{n} \frac{d_i^2}{\sigma_i^2} = \sum_{i=1}^{n} \frac{(x_i - x_{ti})^2}{\sigma_i^2}$$

Which of the above gives more information on the quality of the data? Both R and χ² are zero if the measurements agree with the true values. R looks good because, via the Central Limit Theorem, as n → ∞ the sum becomes Gaussian. However, χ² is better!

Both measures give zero when the measurements are identical to the true values.

Problem: We have some measurements and would like some way to measure how "good" these measurements really are. Solution: Consider calculating the χ² ("chi-square").

[Figure: two measurements, one at +d and one at −d from the true value, give R = 0 even though neither measurement is accurate.]
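A quick numerical sketch of this point (my addition; the values and variable names are illustrative, not from the lecture): for two measurements that miss the true value by +d and −d, R cancels to zero while χ² does not.

```python
import numpy as np

# Two measurements of the same true value that miss it by +d and -d,
# as in the figure above.
x_true = np.array([10.0, 10.0])
d = 0.5
x_meas = x_true + np.array([+d, -d])
sigma  = np.array([0.1, 0.1])          # measurement uncertainties

# R averages the signed deviations, so the two errors cancel exactly.
R = np.mean(x_meas - x_true)                     # -> 0.0

# chi^2 squares each deviation and weights it by 1/sigma_i^2,
# so the errors cannot cancel.
chi2 = np.sum((x_meas - x_true)**2 / sigma**2)   # -> 2*(0.5/0.1)^2 = 50.0

print(R, chi2)
```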

Page 2

One can show (derive) that the probability distribution function for χ² is exactly:

$$p(\chi^2, n) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\,[\chi^2]^{n/2 - 1}\, e^{-\chi^2/2}, \qquad \chi^2 \ge 0$$

This is called the "Chi Square" (χ²) probability distribution function. Γ is the "Gamma Function":

$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt \;\; (x > 0), \qquad \Gamma(n+1) = n! \;\; (n = 1, 2, 3, \ldots), \qquad \Gamma(1/2) = \sqrt{\pi}$$

This is a continuous probability distribution that is a function of two variables: χ² and n = number of degrees of freedom (DOF).

DOF = n = (# of data points) − (# of parameters calculated from the data points)

[Figure: the χ² distribution p(χ², n) vs. χ² for different degrees of freedom (ν = n).]

For n ≳ 20, P(χ² > y) can be approximated using a Gaussian pdf with y = √(2χ²) − √(2n−1).
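As a cross-check of table lookups, here is a sketch (my addition, assuming NumPy and SciPy are available) that evaluates the exact tail probability with scipy.stats.chi2.sf and tests the Gaussian approximation above:

```python
import numpy as np
from scipy.stats import chi2, norm

# Exact tail probability P(chi^2 >= a) for n degrees of freedom.
a, n = 10.0, 4
print(chi2.sf(a, df=n))          # ~0.040

# Gaussian approximation for large n: treat y = sqrt(2*chi^2) - sqrt(2n-1)
# as a standard normal variable.
a, n = 25.0, 20
p_exact  = chi2.sf(a, df=n)
p_approx = norm.sf(np.sqrt(2 * a) - np.sqrt(2 * n - 1))
print(p_exact, p_approx)         # the two agree to a few percent
```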

Page 3

Example: We collected N events in an experiment and histogrammed the data in m bins before performing a fit to the data points. We have m data points!

Example: We count cosmic ray events in 15-second intervals and sort the data into 5 bins:

Number of counts in 15-second interval:    0   1   2   3   4
Number of intervals with that count:       2   7   6   3   2

Suppose we want to compare our data with the expectations of a Poisson distribution (P):

$$P(n) = A\,\frac{\mu^n e^{-\mu}}{n!}$$

In order to do this comparison we need to know A, the total number of intervals (= 20 for this example). Since A is calculated from the data, we give up one degree of freedom: dof = m - 1 = 5 - 1 = 4. If we also calculate the mean (μ) of our Poisson from the data, we lose another dof: dof = m - 2 = 5 - 2 = 3. (dof = # of degrees of freedom)

max # of degrees of freedom = # of data points

Example: We have 10 data points. How many degrees of freedom (n) do we have? Let μ and σ be the mean and standard deviation of the data.
If we calculate μ and σ from the 10 data points, then n = 8.
If we know μ and calculate σ, then n = 9.
If we know σ and calculate μ, then n = 9.
If we know μ and σ, then n = 10.

A = 2 + 7 + 6 + 3 + 2 = 20

P(0) = # of intervals with 0 counts, P(1) = # of intervals with 1 count, etc.

$$\mu = \frac{2 \cdot 0 + 7 \cdot 1 + 6 \cdot 2 + 3 \cdot 3 + 2 \cdot 4}{2 + 7 + 6 + 3 + 2} = \frac{36}{20} = 1.8$$

Number of counts in 15-second interval:          0    1    2    3    4
Predicted number of intervals with that count:   3.3  6.0  5.3  3.2  1.4

Although there were 36 cosmic rays in our sample we have only m=5 data points.
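In code, the bookkeeping on this page looks like the following sketch (my addition, assuming NumPy and SciPy): compute μ from the histogram, then the predicted bin contents A·P(n).

```python
import numpy as np
from scipy.stats import poisson

counts    = np.array([0, 1, 2, 3, 4])   # counts per 15-second interval
intervals = np.array([2, 7, 6, 3, 2])   # observed number of intervals

A  = intervals.sum()                     # total number of intervals: 20
mu = (counts * intervals).sum() / A      # sample mean: 36/20 = 1.8

# Predicted number of intervals with each count: A * P(n; mu)
predicted = A * poisson.pmf(counts, mu)
print(mu, predicted.round(1))            # 1.8 [3.3 6.0 5.3 3.2 1.4]
```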

Page 4

Like the Gaussian probability distribution, the integral of the χ² pdf cannot be done in closed form:

$$P(\chi^2 \ge a) = \int_a^\infty p(\chi^2, n)\, d\chi^2 = \int_a^\infty \frac{1}{2^{n/2}\,\Gamma(n/2)}\,[\chi^2]^{n/2-1}\, e^{-\chi^2/2}\, d\chi^2$$

We must use a table to find the probability of exceeding a given χ² for a given number of dof.

Example: What's the probability to have χ² ≥ 10 with the number of degrees of freedom n = 4? Using Table D of Taylor or the graph on the right: P(χ² ≥ 10, n = 4) = 0.04. Thus we say that the probability of getting a χ² ≥ 10 with 4 degrees of freedom by chance is 0.04 (4%).

If instead of 4 degrees of freedom we only had one degree of freedom, then the probability of getting a χ² ≥ 10 with 1 degree of freedom by chance is ~0.001 (0.1%). (Low probability of the data being from a Poisson.)

Example: Back to our cosmic ray example from p3. We can calculate the probability that our data agrees with the prediction from the Poisson using:

$$\chi^2 = \sum_{i=1}^{\#\,\text{data points}} \frac{(\text{measured}_i - \text{predicted}_i)^2}{\text{predicted}_i} = 1.03$$

Why is the predicted value in the denominator?? (For Poisson-distributed counts the variance equals the mean, so σᵢ² = predictedᵢ.)

P(χ² ≥ 1.03, n = 3) ≈ 0.79. Thus we say that the probability of getting a χ² ≥ 1.03 with 3 degrees of freedom by chance is 0.79 (79%). Thus the data "looks" Poisson!

(Taylor P292-3: χ²/dof ~ 10/4 = 2.5.)
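The same comparison as a sketch (my addition, assuming NumPy and SciPy): the χ² between measured and predicted bin contents, and its chance probability for 3 dof.

```python
import numpy as np
from scipy.stats import chi2

measured  = np.array([2.0, 7.0, 6.0, 3.0, 2.0])
predicted = np.array([3.3, 6.0, 5.3, 3.2, 1.4])

# sigma_i^2 = predicted_i for Poisson-distributed counts.
chi_sq = np.sum((measured - predicted)**2 / predicted)

# 5 bins - 2 quantities (A and mu) computed from the data -> 3 dof.
p_value = chi2.sf(chi_sq, df=3)
print(chi_sq, p_value)   # ~1.03-1.04 and ~0.79, up to rounding of the predictions
```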

Page 5

Least Squares Fitting (Taylor Ch 8)

Suppose we have n data points (xᵢ, yᵢ, σᵢ). Assume that we know a functional relationship between the points,

$$y = f(x, a, b, \ldots)$$

Assume that for each yᵢ we know xᵢ exactly. The parameters a, b, … are constants that we wish to determine from our data. A procedure to obtain a and b is to minimize the following χ² with respect to a and b:

$$\chi^2 = \sum_{i=1}^{n} \frac{[y_i - f(x_i, a, b)]^2}{\sigma_i^2}$$

This is very similar to the Maximum Likelihood Method. For the Gaussian case MLM and LS are identical. Technically this is a χ² distribution only if the y's are from a Gaussian distribution. Since most of the time the y's are not from a Gaussian, we call it "least squares" rather than χ². (Least Squares Fitting is sometimes called χ² fitting.)

Example: We have a function with one unknown parameter:

$$f(x, b) = 1 + bx$$

Find b using the least squares technique. We need to minimize the following:

$$\chi^2 = \sum_{i=1}^{n} \frac{[y_i - f(x_i, b)]^2}{\sigma_i^2} = \sum_{i=1}^{n} \frac{[y_i - 1 - bx_i]^2}{\sigma_i^2}$$

To find the b that minimizes the above function, we do the following:

$$\frac{\partial \chi^2}{\partial b} = \frac{\partial}{\partial b}\sum_{i=1}^{n} \frac{[y_i - 1 - bx_i]^2}{\sigma_i^2} = \sum_{i=1}^{n} \frac{-2\,[y_i - 1 - bx_i]\,x_i}{\sigma_i^2} = 0$$

which gives

$$\sum_{i=1}^{n} \frac{y_i x_i}{\sigma_i^2} - \sum_{i=1}^{n} \frac{x_i}{\sigma_i^2} - b\sum_{i=1}^{n} \frac{x_i^2}{\sigma_i^2} = 0$$
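As a check on this calculus, the same minimum can be found numerically; here is a sketch (my addition, assuming SciPy), using the data from the worked example on the next page:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Data from the worked example below (x, y, sigma for each point).
x     = np.array([1.0, 2.0, 3.0, 4.0])
y     = np.array([2.2, 2.9, 4.3, 5.2])
sigma = np.array([0.2, 0.4, 0.3, 0.1])

# chi^2 as a function of the single fit parameter b, for f(x,b) = 1 + b*x.
def chi2_of_b(b):
    return np.sum((y - 1.0 - b * x)**2 / sigma**2)

# Numerical minimization should agree with the closed-form solution.
result = minimize_scalar(chi2_of_b)
print(result.x)   # ~1.05
```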

Page 6

Here each measured data point (yᵢ) is allowed to have a different standard deviation (σᵢ). The LS technique can be generalized to two or more parameters for simple and complicated (e.g. non-linear) functions. One especially nice case is a polynomial function that is linear in the unknowns (aᵢ):

$$f(x, a_1 \ldots a_n) = a_1 + a_2 x + a_3 x^2 + \cdots + a_n x^{n-1}$$

We can always recast the problem in terms of solving n simultaneous linear equations. We use the techniques from linear algebra and invert an n × n matrix to find the aᵢ's!

Example: Given the following data, perform a least squares fit to find the value of b in f(x, b) = 1 + bx.

x:  1.0  2.0  3.0  4.0
y:  2.2  2.9  4.3  5.2
σ:  0.2  0.4  0.3  0.1

Solving for b we find:

$$b = \frac{\displaystyle\sum_{i=1}^{n} \frac{y_i x_i}{\sigma_i^2} - \sum_{i=1}^{n} \frac{x_i}{\sigma_i^2}}{\displaystyle\sum_{i=1}^{n} \frac{x_i^2}{\sigma_i^2}}$$

Using the above expression for b we calculate: b = 1.05.

Next is a plot of the data points and the line from the least squares fit:
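The same answer from the closed-form expression, as a sketch (my addition, assuming NumPy), reproducing b ≈ 1.05:

```python
import numpy as np

x     = np.array([1.0, 2.0, 3.0, 4.0])
y     = np.array([2.2, 2.9, 4.3, 5.2])
sigma = np.array([0.2, 0.4, 0.3, 0.1])

w = 1.0 / sigma**2   # weights 1/sigma_i^2

# Closed-form solution from setting d(chi^2)/db = 0:
b = (np.sum(w * y * x) - np.sum(w * x)) / np.sum(w * x**2)
print(b)   # ~1.05
```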

Page 7

If we assume that the data points are from a Gaussian distribution, we can calculate a χ² and the probability associated with the fit.

From Table D of Taylor: the probability to get χ² ≥ 1.04 for 3 degrees of freedom ≈ 80%. We call this a "good" fit since the probability is close to 100%.

If however the χ² was large (e.g. 15), the probability would be small (≈ 0.2% for 3 dof). We would say this was a "bad" fit.

$$\chi^2 = \sum_{i=1}^{n} \frac{[y_i - 1 - 1.05\,x_i]^2}{\sigma_i^2} = \left(\frac{2.2 - 2.05}{0.2}\right)^2 + \left(\frac{2.9 - 3.1}{0.4}\right)^2 + \left(\frac{4.3 - 4.16}{0.3}\right)^2 + \left(\frac{5.2 - 5.2}{0.1}\right)^2 = 1.04$$

RULE OF THUMB: A "good" fit has χ²/dof ≤ 1.

[Figure: the data points and the fitted line f(x, b) = 1 + bx with b = 1.05; y vs. x.]
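The whole fit-quality calculation as a sketch (my addition, assuming NumPy and SciPy), reproducing χ² ≈ 1.04 and the ≈ 80% fit probability:

```python
import numpy as np
from scipy.stats import chi2

x     = np.array([1.0, 2.0, 3.0, 4.0])
y     = np.array([2.2, 2.9, 4.3, 5.2])
sigma = np.array([0.2, 0.4, 0.3, 0.1])

w = 1.0 / sigma**2
b = (np.sum(w * y * x) - np.sum(w * x)) / np.sum(w * x**2)   # ~1.05

chi_sq = np.sum((y - 1.0 - b * x)**2 * w)   # ~1.04
dof = len(x) - 1                            # 4 points - 1 fitted parameter = 3
p_value = chi2.sf(chi_sq, df=dof)           # ~0.79, i.e. a "good" fit
print(b, chi_sq, chi_sq / dof, p_value)
```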

Page 8

Some not-so-good things about the χ² distribution and test:

Given a set of data points, two different functions can have the same value of χ².

The method does not specify a unique form of the solution or function. How many parameters should we fit for?? We will get a different value for the slope (b) if we fit 1 + bx instead of a + bx.

The value of χ² does not look at the order of the data points. The χ² distribution ignores trends in the data points. The distribution ignores the sign of differences between the data points and "true" values; it only uses the square of the differences.

There are other distributions/statistical tests that are more powerful than the χ² test: "run tests" and the "Kolmogorov test". These tests do use the order of the points. These tests can be harder to use or require more numerical processing than the χ² technique.
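For illustration (my addition; the lecture itself does not show code), SciPy provides a one-sample Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy.stats import kstest

# Simulated data; in practice these would be your measurements.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=100)

# Compare the empirical CDF of the data against a hypothesized
# distribution (here the standard Gaussian, 'norm').
statistic, p_value = kstest(data, 'norm')
print(statistic, p_value)   # a large p-value gives no evidence against the hypothesis
```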