Fitting a line to N data points – 1
• If we use
$$y = a x + b,$$
then $a$ and $b$ are not independent.
• To make $a$ and $b$ independent, compute:
$$\hat{x} = \frac{\sum_i x_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}$$
• Then use:
$$y = a\,(x - \hat{x}) + b$$
• Intercept = optimally weighted mean value:
$$\hat{b} = \hat{y} = \frac{\sum_i y_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}$$
• Variance of intercept:
$$\mathrm{Var}(\hat{b}) = \frac{1}{\sum_i 1/\sigma_i^2}$$
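To make the recipe concrete, here is a minimal NumPy sketch of the weighted means; the example arrays x, y, sigma are invented for illustration and are not part of the original slides.

```python
import numpy as np

# Hypothetical example data: abscissae x_i, measurements y_i, errors sigma_i.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
sigma = np.array([0.2, 0.3, 0.2, 0.4, 0.3])

w = 1.0 / sigma**2                 # inverse-variance weights 1/sigma_i^2

x_hat = np.sum(w * x) / np.sum(w)  # weighted mean of the x_i
b_hat = np.sum(w * y) / np.sum(w)  # intercept = weighted mean of the y_i
var_b = 1.0 / np.sum(w)            # Var(b_hat) = 1 / sum(1/sigma_i^2)

print(f"x_hat = {x_hat:.3f}, b_hat = {b_hat:.3f} +/- {np.sqrt(var_b):.3f}")
```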
Fitting a line to N data points – 2
• Slope = optimally weighted mean value:
$$\hat{a} = \frac{\sum_i w_i\,(y_i - \hat{y})/(x_i - \hat{x})}{\sum_i w_i}$$
• Optimal weights:
$$w_i = \frac{1}{\mathrm{Var}\!\left[(y_i - \hat{y})/(x_i - \hat{x})\right]} = \frac{(x_i - \hat{x})^2}{\sigma_i^2}$$
• Hence get the optimal slope and its variance:
$$\hat{a} = \frac{\sum_i (y_i - \hat{y})(x_i - \hat{x})/\sigma_i^2}{\sum_i (x_i - \hat{x})^2/\sigma_i^2}, \qquad \mathrm{Var}(\hat{a}) = \frac{1}{\sum_i (x_i - \hat{x})^2/\sigma_i^2}$$
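Continuing the same sketch, the optimal slope and its variance follow directly from the centered abscissae:

```python
dx = x - x_hat                        # centered abscissae x_i - x_hat
a_hat = np.sum(w * dx * (y - b_hat)) / np.sum(w * dx**2)
var_a = 1.0 / np.sum(w * dx**2)       # Var(a_hat) = 1 / sum((x_i - x_hat)^2 / sigma_i^2)

print(f"a_hat = {a_hat:.3f} +/- {np.sqrt(var_a):.3f}")
```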
Linear regression
• If fitting a straight line, minimize:
$$\chi^2 \equiv \sum_{i=1}^{N} \left[\frac{y_i - (a x_i + b)}{\sigma_i}\right]^2$$
• To minimize, set derivatives to zero:
$$0 = \frac{\partial \chi^2}{\partial b} = -2 \sum_{i=1}^{N} \frac{y_i - a x_i - b}{\sigma_i^2}$$
$$0 = \frac{\partial \chi^2}{\partial a} = -2 \sum_{i=1}^{N} \frac{x_i\,(y_i - a x_i - b)}{\sigma_i^2}$$
• Note that these are a pair of simultaneous linear equations in $a$ and $b$ -- the “normal equations”.
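For reference, the same χ² written as a short function over the example arrays assumed above (it is reused in a later sketch):

```python
def chi2(a, b):
    """Weighted sum of squared residuals for the straight-line model y = a*x + b."""
    return np.sum(((y - (a * x + b)) / sigma) ** 2)
```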
The Normal Equations
• Solve as simultaneous linear equations in matrix form -- the “normal equations”:
$$\begin{bmatrix} \sum_i x_i^2/\sigma_i^2 & \sum_i x_i/\sigma_i^2 \\ \sum_i x_i/\sigma_i^2 & \sum_i 1/\sigma_i^2 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_i x_i y_i/\sigma_i^2 \\ \sum_i y_i/\sigma_i^2 \end{bmatrix}$$
• In vector-matrix notation:
$$\mathbf{M}\,\vec{\alpha} = \vec{C}(\vec{y})$$
• Solve using standard matrix-inversion methods (see Press et al. for implementation).
• Note that the matrix $\mathbf{M}$ is diagonal if:
$$x_i \to (x_i - \hat{x}), \quad \text{since} \quad \sum_i (x_i - \hat{x})/\sigma_i^2 = 0$$
• In this case we have chosen an orthogonal basis.
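A minimal sketch of building and solving this 2×2 system with np.linalg.solve (a standard matrix method, in the spirit of the Press et al. reference), reusing the example arrays:

```python
# Build the 2x2 normal-equations system M [a, b]^T = C.
M = np.array([[np.sum(w * x**2), np.sum(w * x)],
              [np.sum(w * x),    np.sum(w)]])
C = np.array([np.sum(w * x * y), np.sum(w * y)])

a_fit, b_fit = np.linalg.solve(M, C)   # least-squares estimates of a and b
print(f"a = {a_fit:.3f}, b = {b_fit:.3f}")
```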
General linear regression
• Suppose you wish to fit your data points $y_i$ with the sum of several scaled functions of the $x_i$:
$$y(x) = a_1 P_1(x) + a_2 P_2(x) + \ldots = \sum_{k=1}^{K} a_k P_k(x)$$
• Example: fitting a polynomial:
$$y(x) = a_1 + a_2 x + a_3 x^2 + \cdots + a_K x^{K-1}$$
• Goodness of fit to data $x_i, y_i, \sigma_i$:
$$\chi^2 \equiv \sum_{i=1}^{N} \left[\frac{y_i - y(x_i)}{\sigma_i}\right]^2$$
• where:
$$y(x) = \sum_{k=1}^{K} a_k P_k(x)$$
• To minimise $\chi^2$, then for each $k$ we have an equation:
$$0 = \frac{\partial \chi^2}{\partial a_k} = -2 \sum_{i=1}^{N} \frac{P_k(x_i)}{\sigma_i^2}\left(y_i - \sum_{j=1}^{K} a_j P_j(x_i)\right)$$
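A sketch of this general linear model in NumPy, using the polynomial example as the basis, $P_k(x) = x^{k-1}$; the design-matrix name A and the choice K = 3 are illustrative assumptions:

```python
K = 3                                   # fit a quadratic: a1 + a2*x + a3*x^2
A = np.vander(x, K, increasing=True)    # design matrix, A[i, k] = P_k(x_i) = x_i**k

def chi2_general(a):
    """chi^2 for a coefficient vector a of length K."""
    return np.sum(((y - A @ a) / sigma) ** 2)
```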
Normal equations
• Normal equations are constructed as before:
$$\sum_{i=1}^{N} \sum_{j=1}^{K} \frac{a_j\,P_j(x_i)\,P_k(x_i)}{\sigma_i^2} = \sum_{i=1}^{N} \frac{P_k(x_i)\,y_i}{\sigma_i^2}$$
• Or in matrix form:
$$\sum_{j=1}^{K} M_{kj}\,a_j = C_k(\vec{y}), \quad \text{i.e.}$$
$$\begin{bmatrix}
\sum_i \dfrac{[P_1(x_i)]^2}{\sigma_i^2} & \cdots & \cdots \\
\vdots & \sum_i \dfrac{P_j(x_i)\,P_k(x_i)}{\sigma_i^2} & \vdots \\
\sum_i \dfrac{P_K(x_i)\,P_1(x_i)}{\sigma_i^2} & \cdots & \sum_i \dfrac{[P_K(x_i)]^2}{\sigma_i^2}
\end{bmatrix}
\begin{bmatrix} a_1 \\ \vdots \\ a_K \end{bmatrix}
=
\begin{bmatrix} \sum_i \dfrac{y_i\,P_1(x_i)}{\sigma_i^2} \\ \vdots \\ \sum_i \dfrac{y_i\,P_K(x_i)}{\sigma_i^2} \end{bmatrix}$$
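Continuing the sketch, $M_{kj}$ and $C_k$ reduce to weighted products of the design matrix, and np.linalg.solve again handles the inversion:

```python
# M[k, j] = sum_i P_k(x_i) P_j(x_i) / sigma_i^2,  C[k] = sum_i y_i P_k(x_i) / sigma_i^2
Aw = A / sigma[:, None]           # each row of the design matrix divided by sigma_i
M_gen = Aw.T @ Aw
C_gen = Aw.T @ (y / sigma)

a_coeffs = np.linalg.solve(M_gen, C_gen)
print("best-fit coefficients:", a_coeffs)
```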
Uncertainties of the answers
• We want to know the uncertainties of the best-fit values of the parameters $a_j$.
• For a one-parameter fit we’ve seen that:
$$\text{if } \hat{\alpha} \text{ minimizes } \chi^2, \text{ then } \mathrm{Var}(\hat{\alpha}) = \frac{2}{\partial^2 \chi^2/\partial \alpha^2}.$$
• By analogy, for a multi-parameter fit the covariance of any pair of parameters is:
$$\mathrm{Cov}(a_k, a_j) = \left[\frac{1}{2}\,\frac{\partial^2 \chi^2}{\partial a_k\,\partial a_j}\right]^{-1}.$$
• Hence get a local quadratic approximation to the χ² surface using the Hessian matrix $\mathbf{H}$:
$$\chi^2(\vec{a}) = \chi^2(\hat{a}) + (\vec{a} - \hat{a})^{\mathsf{T}}\,\mathbf{H}\,(\vec{a} - \hat{a})$$
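In code, the covariance matrix is just the inverse of the same matrix built for the normal equations (equal to H, as the next slide shows); a sketch using the M_gen from above:

```python
cov = np.linalg.inv(M_gen)        # Cov(a_k, a_j) = (H^-1)_{kj} = (M^-1)_{kj}
errs = np.sqrt(np.diag(cov))      # 1-sigma uncertainties of the coefficients
print("parameter uncertainties:", errs)
```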
The Hessian matrix
• Defined as:
$$H_{jk} \equiv \frac{1}{2}\,\frac{\partial^2 \chi^2}{\partial a_j\,\partial a_k}$$
• It’s the same matrix $\mathbf{M}$ we derived from the normal equations!
• Example: $y = a x + b$:
$$\frac{\partial^2 \chi^2}{\partial a^2} = 2\sum_i \frac{x_i^2}{\sigma_i^2}, \qquad \frac{\partial^2 \chi^2}{\partial a\,\partial b} = 2\sum_i \frac{x_i}{\sigma_i^2}, \qquad \frac{\partial^2 \chi^2}{\partial b^2} = 2\sum_i \frac{1}{\sigma_i^2},$$
so
$$\mathbf{H} = \begin{bmatrix} \sum_i x_i^2/\sigma_i^2 & \sum_i x_i/\sigma_i^2 \\ \sum_i x_i/\sigma_i^2 & \sum_i 1/\sigma_i^2 \end{bmatrix} = \mathbf{M}.$$
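A quick numerical check of this identity, differentiating the chi2 function defined earlier by central finite differences (the step size h is an arbitrary choice; chi2 is quadratic, so the result is exact up to rounding):

```python
def hessian_fd(f, p, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at point p."""
    n = len(p)
    H = np.zeros((n, n))
    for j in range(n):
        for k in range(n):
            def shifted(dj, dk):
                q = np.array(p, dtype=float)
                q[j] += dj * h
                q[k] += dk * h
                return f(*q)
            H[j, k] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h**2)
    return 0.5 * H   # H_jk = (1/2) d^2(chi^2) / (da_j da_k)

# The finite-difference Hessian of chi^2 at the best fit reproduces M:
print(np.allclose(hessian_fd(chi2, [a_fit, b_fit]), M))   # True
```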
Principal axes of χ² ellipsoid
• The eigenvectors of H define the principal axes of the χ² ellipsoid.
• $\mathbf{H}$ is diagonalised by replacing the coordinates $x_i$ with:
$$x_i \to (x_i - \hat{x})$$
• This gives:
$$\mathbf{H} = \begin{bmatrix} \sum_i (x_i - \hat{x})^2/\sigma_i^2 & 0 \\ 0 & \sum_i 1/\sigma_i^2 \end{bmatrix}$$
• And so orthogonalises the parameters.

[Figure: contours of constant χ² in the (b, a) plane, before and after the coordinate change]
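The same centering trick checked numerically with the example arrays; the off-diagonal entries vanish because $\sum_i (x_i - \hat{x})/\sigma_i^2 = 0$:

```python
Mc = np.array([[np.sum(w * dx**2), np.sum(w * dx)],
               [np.sum(w * dx),    np.sum(w)]])
print(Mc)   # off-diagonal entries are ~0 (machine precision)
```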
Principal axes for general linear models
• In the general linear case where we fit K functions $P_k$ with scale factors $a_k$:
$$y(x) = \sum_{k=1}^{K} a_k P_k(x)$$
• The Hessian matrix has elements:
$$H_{jk} \equiv \frac{1}{2}\,\frac{\partial^2 \chi^2}{\partial a_j\,\partial a_k} = \sum_{i=1}^{N} \frac{P_j(x_i)\,P_k(x_i)}{\sigma_i^2}$$
• Normal equations are:
$$\sum_{k=1}^{K} H_{jk}\,a_k = c_j, \quad \text{where} \quad c_j = \sum_{i=1}^{N} \frac{y_i\,P_j(x_i)}{\sigma_i^2}$$
• This gives K-dimensional ellipsoidal surfaces of constant χ² whose principal axes are eigenvectors of the Hessian matrix $\mathbf{H}$.
• Use standard matrix methods to find linear combinations of $x_i$, $y_i$ that diagonalise $\mathbf{H}$.
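Finally, a sketch of extracting the principal axes from the symmetric Hessian with np.linalg.eigh, using the M_gen built above:

```python
eigvals, eigvecs = np.linalg.eigh(M_gen)   # H is symmetric, so eigh applies
# Columns of eigvecs are the principal axes of the chi^2 ellipsoid;
# the semi-axes of the Delta-chi^2 = 1 surface have lengths 1/sqrt(eigvals).
print("principal-axis directions:\n", eigvecs)
print("semi-axis lengths:", 1.0 / np.sqrt(eigvals))
```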