Fitting a line to N data points – 1
• If we use
$$y = a x + b,$$
then $a$ and $b$ are not independent.
• To make $a$ and $b$ independent, compute:
$$\hat{x} = \frac{\sum_i x_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}$$
• Then use:
$$y = a\,(x - \hat{x}) + b$$
• Intercept = optimally weighted mean value:
$$\hat{b} = \hat{y} = \frac{\sum_i y_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}$$
• Variance of intercept:
$$\mathrm{Var}(\hat{b}) = \frac{1}{\sum_i 1/\sigma_i^2}$$
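To make the recipe concrete, here is a minimal NumPy sketch of the weighted means; the example arrays x, y, sigma are invented for illustration and are not part of the original slides.

```python
import numpy as np

# Hypothetical example data: abscissae x_i, measurements y_i, errors sigma_i.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
sigma = np.array([0.2, 0.3, 0.2, 0.4, 0.3])

w = 1.0 / sigma**2                 # inverse-variance weights 1/sigma_i^2

x_hat = np.sum(w * x) / np.sum(w)  # weighted mean of the x_i
b_hat = np.sum(w * y) / np.sum(w)  # intercept = weighted mean of the y_i
var_b = 1.0 / np.sum(w)            # Var(b_hat) = 1 / sum(1/sigma_i^2)

print(f"x_hat = {x_hat:.3f}, b_hat = {b_hat:.3f} +/- {np.sqrt(var_b):.3f}")
```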
Fitting a line to N data points – 2
• Slope = optimally weighted mean value:
$$\hat{a} = \frac{\sum_i w_i\,(y_i - \hat{y})/(x_i - \hat{x})}{\sum_i w_i}$$
• Optimal weights:
$$w_i = \frac{1}{\mathrm{Var}\!\left[(y_i - \hat{y})/(x_i - \hat{x})\right]} = \frac{(x_i - \hat{x})^2}{\sigma_i^2}$$
• Hence get the optimal slope and its variance:
$$\hat{a} = \frac{\sum_i (y_i - \hat{y})(x_i - \hat{x})/\sigma_i^2}{\sum_i (x_i - \hat{x})^2/\sigma_i^2}, \qquad \mathrm{Var}(\hat{a}) = \frac{1}{\sum_i (x_i - \hat{x})^2/\sigma_i^2}$$
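Continuing the same sketch, the optimal slope and its variance follow directly from the centered abscissae:

```python
dx = x - x_hat                        # centered abscissae x_i - x_hat
a_hat = np.sum(w * dx * (y - b_hat)) / np.sum(w * dx**2)
var_a = 1.0 / np.sum(w * dx**2)       # Var(a_hat) = 1 / sum((x_i - x_hat)^2 / sigma_i^2)

print(f"a_hat = {a_hat:.3f} +/- {np.sqrt(var_a):.3f}")
```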
Linear regression
• If fitting a straight line, minimize:
$$\chi^2 \equiv \sum_{i=1}^{N} \left[\frac{y_i - (a x_i + b)}{\sigma_i}\right]^2$$
• To minimize, set derivatives to zero:
$$0 = \frac{\partial \chi^2}{\partial b} = -2 \sum_{i=1}^{N} \frac{y_i - a x_i - b}{\sigma_i^2}$$
$$0 = \frac{\partial \chi^2}{\partial a} = -2 \sum_{i=1}^{N} \frac{x_i\,(y_i - a x_i - b)}{\sigma_i^2}$$
• Note that these are a pair of simultaneous linear equations in $a$ and $b$ -- the “normal equations”.
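For reference, the same χ² written as a short function over the example arrays assumed above (it is reused in a later sketch):

```python
def chi2(a, b):
    """Weighted sum of squared residuals for the straight-line model y = a*x + b."""
    return np.sum(((y - (a * x + b)) / sigma) ** 2)
```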
The Normal Equations
• Solve as simultaneous linear equations in matrix form -- the “normal equations”:
$$\begin{bmatrix} \sum_i x_i^2/\sigma_i^2 & \sum_i x_i/\sigma_i^2 \\ \sum_i x_i/\sigma_i^2 & \sum_i 1/\sigma_i^2 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_i x_i y_i/\sigma_i^2 \\ \sum_i y_i/\sigma_i^2 \end{bmatrix}$$
• In vector-matrix notation:
$$\mathbf{M}\,\vec{\alpha} = \vec{C}(\vec{y})$$
• Solve using standard matrix-inversion methods (see Press et al. for implementation).
• Note that the matrix $\mathbf{M}$ is diagonal if:
$$x_i \to (x_i - \hat{x}), \quad \text{since} \quad \sum_i (x_i - \hat{x})/\sigma_i^2 = 0$$
• In this case we have chosen an orthogonal basis.
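A minimal sketch of building and solving this 2×2 system with np.linalg.solve (a standard matrix method, in the spirit of the Press et al. reference), reusing the example arrays:

```python
# Build the 2x2 normal-equations system M [a, b]^T = C.
M = np.array([[np.sum(w * x**2), np.sum(w * x)],
              [np.sum(w * x),    np.sum(w)]])
C = np.array([np.sum(w * x * y), np.sum(w * y)])

a_fit, b_fit = np.linalg.solve(M, C)   # least-squares estimates of a and b
print(f"a = {a_fit:.3f}, b = {b_fit:.3f}")
```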
General linear regression
• Suppose you wish to fit your data points $y_i$ with the sum of several scaled functions of the $x_i$:
$$y(x) = a_1 P_1(x) + a_2 P_2(x) + \ldots = \sum_{k=1}^{K} a_k P_k(x)$$
• Example: fitting a polynomial:
$$y(x) = a_1 + a_2 x + a_3 x^2 + \cdots + a_K x^{K-1}$$
• Goodness of fit to data $x_i, y_i, \sigma_i$:
$$\chi^2 \equiv \sum_{i=1}^{N} \left[\frac{y_i - y(x_i)}{\sigma_i}\right]^2$$
• where:
$$y(x) = \sum_{k=1}^{K} a_k P_k(x)$$
• To minimise $\chi^2$, then for each $k$ we have an equation:
$$0 = \frac{\partial \chi^2}{\partial a_k} = -2 \sum_{i=1}^{N} \frac{P_k(x_i)}{\sigma_i^2}\left(y_i - \sum_{j=1}^{K} a_j P_j(x_i)\right)$$
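A sketch of this general linear model in NumPy, using the polynomial example as the basis, $P_k(x) = x^{k-1}$; the design-matrix name A and the choice K = 3 are illustrative assumptions:

```python
K = 3                                   # fit a quadratic: a1 + a2*x + a3*x^2
A = np.vander(x, K, increasing=True)    # design matrix, A[i, k] = P_k(x_i) = x_i**k

def chi2_general(a):
    """chi^2 for a coefficient vector a of length K."""
    return np.sum(((y - A @ a) / sigma) ** 2)
```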
Normal equations
• Normal equations are constructed as before:
$$\sum_{i=1}^{N} \sum_{j=1}^{K} \frac{a_j\,P_j(x_i)\,P_k(x_i)}{\sigma_i^2} = \sum_{i=1}^{N} \frac{P_k(x_i)\,y_i}{\sigma_i^2}$$
• Or in matrix form:
$$\sum_{j=1}^{K} M_{kj}\,a_j = C_k(\vec{y}), \quad \text{i.e.}$$
$$\begin{bmatrix}
\sum_i \dfrac{[P_1(x_i)]^2}{\sigma_i^2} & \cdots & \cdots \\
\vdots & \sum_i \dfrac{P_j(x_i)\,P_k(x_i)}{\sigma_i^2} & \vdots \\
\sum_i \dfrac{P_K(x_i)\,P_1(x_i)}{\sigma_i^2} & \cdots & \sum_i \dfrac{[P_K(x_i)]^2}{\sigma_i^2}
\end{bmatrix}
\begin{bmatrix} a_1 \\ \vdots \\ a_K \end{bmatrix}
=
\begin{bmatrix} \sum_i \dfrac{y_i\,P_1(x_i)}{\sigma_i^2} \\ \vdots \\ \sum_i \dfrac{y_i\,P_K(x_i)}{\sigma_i^2} \end{bmatrix}$$
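Continuing the sketch, $M_{kj}$ and $C_k$ reduce to weighted products of the design matrix, and np.linalg.solve again handles the inversion:

```python
# M[k, j] = sum_i P_k(x_i) P_j(x_i) / sigma_i^2,  C[k] = sum_i y_i P_k(x_i) / sigma_i^2
Aw = A / sigma[:, None]           # each row of the design matrix divided by sigma_i
M_gen = Aw.T @ Aw
C_gen = Aw.T @ (y / sigma)

a_coeffs = np.linalg.solve(M_gen, C_gen)
print("best-fit coefficients:", a_coeffs)
```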
Uncertainties of the answers
• We want to know the uncertainties of the best-fit values of the parameters $a_j$.
• For a one-parameter fit we’ve seen that:
$$\text{if } \hat{\alpha} \text{ minimizes } \chi^2, \text{ then } \mathrm{Var}(\hat{\alpha}) = \frac{2}{\partial^2 \chi^2/\partial \alpha^2}.$$
• By analogy, for a multi-parameter fit the covariance of any pair of parameters is:
$$\mathrm{Cov}(a_k, a_j) = \left[\frac{1}{2}\,\frac{\partial^2 \chi^2}{\partial a_k\,\partial a_j}\right]^{-1}.$$
• Hence get a local quadratic approximation to the χ² surface using the Hessian matrix $\mathbf{H}$:
$$\chi^2(\vec{a}) = \chi^2(\hat{a}) + (\vec{a} - \hat{a})^{\mathsf{T}}\,\mathbf{H}\,(\vec{a} - \hat{a})$$
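In code, the covariance matrix is just the inverse of the same matrix built for the normal equations (equal to H, as the next slide shows); a sketch using the M_gen from above:

```python
cov = np.linalg.inv(M_gen)        # Cov(a_k, a_j) = (H^-1)_{kj} = (M^-1)_{kj}
errs = np.sqrt(np.diag(cov))      # 1-sigma uncertainties of the coefficients
print("parameter uncertainties:", errs)
```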
The Hessian matrix
• Defined as:
$$H_{jk} \equiv \frac{1}{2}\,\frac{\partial^2 \chi^2}{\partial a_j\,\partial a_k}$$
• It’s the same matrix $\mathbf{M}$ we derived from the normal equations!
• Example: $y = a x + b$:
$$\frac{\partial^2 \chi^2}{\partial a^2} = 2\sum_i \frac{x_i^2}{\sigma_i^2}, \qquad \frac{\partial^2 \chi^2}{\partial a\,\partial b} = 2\sum_i \frac{x_i}{\sigma_i^2}, \qquad \frac{\partial^2 \chi^2}{\partial b^2} = 2\sum_i \frac{1}{\sigma_i^2},$$
so
$$\mathbf{H} = \begin{bmatrix} \sum_i x_i^2/\sigma_i^2 & \sum_i x_i/\sigma_i^2 \\ \sum_i x_i/\sigma_i^2 & \sum_i 1/\sigma_i^2 \end{bmatrix} = \mathbf{M}.$$
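A quick numerical check of this identity, differentiating the chi2 function defined earlier by central finite differences (the step size h is an arbitrary choice; chi2 is quadratic, so the result is exact up to rounding):

```python
def hessian_fd(f, p, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at point p."""
    n = len(p)
    H = np.zeros((n, n))
    for j in range(n):
        for k in range(n):
            def shifted(dj, dk):
                q = np.array(p, dtype=float)
                q[j] += dj * h
                q[k] += dk * h
                return f(*q)
            H[j, k] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h**2)
    return 0.5 * H   # H_jk = (1/2) d^2(chi^2) / (da_j da_k)

# The finite-difference Hessian of chi^2 at the best fit reproduces M:
print(np.allclose(hessian_fd(chi2, [a_fit, b_fit]), M))   # True
```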
Principal axes of χ² ellipsoid
• The eigenvectors of H define the principal axes of the χ² ellipsoid.
• $\mathbf{H}$ is diagonalised by replacing the coordinates $x_i$ with:
$$x_i \to (x_i - \hat{x})$$
• This gives:
$$\mathbf{H} = \begin{bmatrix} \sum_i (x_i - \hat{x})^2/\sigma_i^2 & 0 \\ 0 & \sum_i 1/\sigma_i^2 \end{bmatrix}$$
• And so orthogonalises the parameters.

[Figure: contours of constant χ² in the (b, a) plane, before and after the coordinate change]
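The same centering trick checked numerically with the example arrays; the off-diagonal entries vanish because $\sum_i (x_i - \hat{x})/\sigma_i^2 = 0$:

```python
Mc = np.array([[np.sum(w * dx**2), np.sum(w * dx)],
               [np.sum(w * dx),    np.sum(w)]])
print(Mc)   # off-diagonal entries are ~0 (machine precision)
```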
Principal axes for general linear models
• In the general linear case where we fit K functions $P_k$ with scale factors $a_k$:
$$y(x) = \sum_{k=1}^{K} a_k P_k(x)$$
• The Hessian matrix has elements:
$$H_{jk} \equiv \frac{1}{2}\,\frac{\partial^2 \chi^2}{\partial a_j\,\partial a_k} = \sum_{i=1}^{N} \frac{P_j(x_i)\,P_k(x_i)}{\sigma_i^2}$$
• Normal equations are:
$$\sum_{k=1}^{K} H_{jk}\,a_k = c_j, \quad \text{where} \quad c_j = \sum_{i=1}^{N} \frac{y_i\,P_j(x_i)}{\sigma_i^2}$$
• This gives K-dimensional ellipsoidal surfaces of constant χ² whose principal axes are eigenvectors of the Hessian matrix $\mathbf{H}$.
• Use standard matrix methods to find linear combinations of $x_i$, $y_i$ that diagonalise $\mathbf{H}$.
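Finally, a sketch of extracting the principal axes from the symmetric Hessian with np.linalg.eigh, using the M_gen built above:

```python
eigvals, eigvecs = np.linalg.eigh(M_gen)   # H is symmetric, so eigh applies
# Columns of eigvecs are the principal axes of the chi^2 ellipsoid;
# the semi-axes of the Delta-chi^2 = 1 surface have lengths 1/sqrt(eigvals).
print("principal-axis directions:\n", eigvecs)
print("semi-axis lengths:", 1.0 / np.sqrt(eigvals))
```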