Chapter 15: Modeling of Data
Statistics of Data
• Mean (or average):

$$\bar{x} = \frac{1}{N}\sum_{j=1}^{N} x_j$$

• Variance:

$$\mathrm{Var}(x_1,\ldots,x_N) = \frac{1}{N-1}\sum_{j=1}^{N}\left(x_j - \bar{x}\right)^2$$

• Median: a value x_j such that half of the data are bigger than it, and half of the data are smaller than it.
• σ = √Var is called the standard deviation.
Least Squares
• Given N data points (x_i, y_i), i = 1, …, N, find the fitting parameters a_j, j = 1, 2, …, M of the function

f(x) = y(x; a_1, a_2, …, a_M)

such that

$$\sum_{i=1}^{N}\left[y_i - y(x_i; a_1,\ldots,a_M)\right]^2$$

is minimized over the parameters a_j.
Why Least Squares
• Given the parameters, what is the probability that the observed data occurred?
• Assuming independent, Gaussian-distributed errors, it is

$$P \propto \prod_{i=1}^{N}\exp\left[-\frac{1}{2}\left(\frac{y_i - y(x_i)}{\sigma_i}\right)^2\right]$$

• The parameters that make the observed data most probable are therefore the ones that minimize the sum of squares in the exponent; hence least squares.
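As a quick check of this equivalence (a worked step, not on the original slide), take the negative logarithm of P:

$$-\ln P = \frac{1}{2}\sum_{i=1}^{N}\left(\frac{y_i - y(x_i)}{\sigma_i}\right)^2 + \text{const},$$

so maximizing the probability of the data is the same as minimizing the weighted sum of squares, which is exactly the χ² of the next slide.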
Chi-Square Fitting
• Minimize the quantity:

$$\chi^2 = \sum_{i=1}^{N}\left(\frac{y_i - y(x_i; a_1,\ldots,a_M)}{\sigma_i}\right)^2$$

• If each term is an independent Gaussian, χ² follows the so-called χ² distribution. Given the value of χ² above, we can compute Q = Prob(a χ² random variable > the observed χ²).
• If Q < 0.001 or Q > 0.999, the model may be rejected.
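A minimal Python sketch of this test (not from the slides; the routine gammq used later is from Numerical Recipes, and scipy.stats.chi2.sf(chi2, nu) computes the same Q = gammq(nu/2, chi2/2)):

```python
import numpy as np
from scipy import stats

def goodness_of_fit(y, y_model, sigma, n_params):
    """Return chi-square and the goodness-of-fit probability Q."""
    chi2 = np.sum(((y - y_model) / sigma) ** 2)   # the merit function above
    nu = len(y) - n_params                        # degrees of freedom, N - M
    Q = stats.chi2.sf(chi2, nu)                   # Prob(chi2_nu > observed chi2)
    return chi2, Q
```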
Meaning of Goodness-of-Fit Q
$$P(\chi^2) \propto \left(\chi^2\right)^{\nu/2 - 1}\exp\left(-\chi^2/2\right)$$

[Figure: the χ² distribution density; the hashed area to the right of the observed value of χ² equals Q.]

If the statistic χ² indeed follows this distribution, the probability that the chi-square value equals the currently computed value of χ², or greater, equals the hashed area Q.
Such an outcome is quite unlikely if Q is very small or very close to 1. If so, we reject the model.
Number of degrees of freedom: ν = N − M.
Fitting to a Straight Line (with known error bars)

[Figure: data points with error bars and the fitted line y = a + bx.]

Given (x_i, y_i ± σ_i), find the intercept a and slope b such that the chi-square merit function

$$\chi^2(a,b) = \sum_{i=1}^{N}\left(\frac{y_i - a - b x_i}{\sigma_i}\right)^2$$

is minimized.
The goodness-of-fit is Q = gammq((N−2)/2, χ²/2). If Q > 0.1, the fit is good; if Q ≈ 0.001, it may be OK; but if Q < 0.001, the fit is questionable.
If Q > 0.999, the fit is too good to be true.
Linear Regression Model
[Figure: data points scattered about the fitted line y = a + bx; ε marks a point's deviation from the line.]

The data do not follow the straight line exactly. The basic assumption in linear regression (least-squares fitting) is that the deviations ε are independent Gaussian random noise.
There is error in y, but no error in x.
Solution of Straight-Line Fit

Setting ∂χ²/∂a = 0 and ∂χ²/∂b = 0, and defining

$$S = \sum_{i=1}^{N}\frac{1}{\sigma_i^2},\quad S_x = \sum_{i=1}^{N}\frac{x_i}{\sigma_i^2},\quad S_y = \sum_{i=1}^{N}\frac{y_i}{\sigma_i^2},\quad S_{xx} = \sum_{i=1}^{N}\frac{x_i^2}{\sigma_i^2},\quad S_{xy} = \sum_{i=1}^{N}\frac{x_i y_i}{\sigma_i^2},$$

the minimization conditions become the linear system

$$a\,S + b\,S_x = S_y, \qquad a\,S_x + b\,S_{xx} = S_{xy},$$

whose solution is

$$\Delta = S\,S_{xx} - S_x^2, \qquad a = \frac{S_{xx}\,S_y - S_x\,S_{xy}}{\Delta}, \qquad b = \frac{S\,S_{xy} - S_x\,S_y}{\Delta}.$$
Error Propagation
• Let z = f(y_1, y_2, …, y_N) be a function of independent random variables y_i. Assuming the variances are small, we have

$$z \approx \langle z\rangle + \sum_{i=1}^{N}\left.\frac{\partial f}{\partial y_i}\right|_{\langle y\rangle}\left(y_i - \langle y_i\rangle\right)$$

• The variance of z is related to the variances of the y_i by

$$\sigma_f^2 = \sum_{i=1}^{N}\sigma_i^2\left(\frac{\partial f}{\partial y_i}\right)^2$$
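For instance (an illustrative example, not on the slide), for a product z = y_1 y_2 the formula gives

$$\sigma_z^2 = y_2^2\,\sigma_1^2 + y_1^2\,\sigma_2^2, \qquad \text{i.e.} \qquad \left(\frac{\sigma_z}{z}\right)^2 = \left(\frac{\sigma_1}{y_1}\right)^2 + \left(\frac{\sigma_2}{y_2}\right)^2,$$

the familiar rule that relative errors add in quadrature.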
Error Estimates on a and b
• Using the error-propagation formula, viewing a as a function of the y_i, we have

$$\frac{\partial a}{\partial y_i} = \frac{S_{xx} - S_x x_i}{\sigma_i^2\,\Delta}$$

• Thus

$$\sigma_a^2 = \sum_{i=1}^{N}\sigma_i^2\left(\frac{\partial a}{\partial y_i}\right)^2 = \frac{S_{xx}}{\Delta}$$

• Similarly

$$\sigma_b^2 = \frac{S}{\Delta}$$
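Collecting the formulas from the last few slides, here is a minimal NumPy/SciPy sketch of the straight-line fit with known error bars (the function and variable names are mine, not from the slides):

```python
import numpy as np
from scipy import stats

def fit_line(x, y, sigma):
    """Fit y = a + b*x to data with known error bars sigma.

    Returns a, b, their standard errors, and the goodness-of-fit Q.
    """
    x, y, sigma = map(np.asarray, (x, y, sigma))
    w = 1.0 / sigma**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x**2).sum(), (w * x * y).sum()
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    sigma_a = np.sqrt(Sxx / delta)
    sigma_b = np.sqrt(S / delta)
    chi2 = np.sum(((y - a - b * x) / sigma) ** 2)
    Q = stats.chi2.sf(chi2, len(x) - 2)   # equals gammq((N-2)/2, chi2/2)
    return a, b, sigma_a, sigma_b, Q
```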
What if the error in y_i is unknown?
• The goodness-of-fit Q can no longer be computed.
• Assuming all data points have the same σ, estimate it from the scatter about the fit:

$$\sigma^2 = \sum_{i=1}^{N}\left(y_i - y(x_i)\right)^2 / (N - M)$$

• The errors in a and b can still be estimated, using σ_i = σ (but less reliably).
Here M is the number of basis functions; M = 2 for a straight-line fit.
General Linear Least-Squares
• Fit to a linear combination of arbitrary functions:

$$y(x) = \sum_{k=1}^{M} a_k X_k(x)$$

• E.g., polynomial fit X_k(x) = x^(k−1), or harmonic series X_k(x) = sin(kx), etc.
• The basis functions X_k(x) can be nonlinear in x; the model is "linear" only in the parameters a_k.
Merit Function & Design Matrix
• Find the a_k that minimize

$$\chi^2 = \sum_{i=1}^{N}\left(\frac{y_i - \sum_{k=1}^{M} a_k X_k(x_i)}{\sigma_i}\right)^2$$

• Define the design matrix and data vector

$$A_{ij} = \frac{X_j(x_i)}{\sigma_i}, \qquad b_i = \frac{y_i}{\sigma_i}$$

• Let a be a column vector, a = (a_1, a_2, …, a_M)^T. The problem can then be stated as

$$\min_{\mathbf{a}} \left\| \mathbf{b} - A\mathbf{a} \right\|^2$$
Normal Equation & Covariance
• The solution of min ‖b − Aa‖² is the normal equation A^T A a = A^T b.
• Let C = (A^T A)^(−1); then a = C A^T b.
• We can view the data y_i as random variables due to random error: y_i = y(x_i) + ε_i with ⟨ε_i⟩ = 0 and ⟨ε_i ε_j⟩ = σ_i² δ_ij. Thus a is also a random variable, and the covariance of a is precisely C:

$$\langle \mathbf{a}\,\mathbf{a}^T\rangle - \langle\mathbf{a}\rangle\langle\mathbf{a}^T\rangle = C$$

• The estimate of the j-th fitting coefficient is

$$a_j = \left(C A^T \mathbf{b}\right)_j \pm \sqrt{C_{jj}}$$
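A short NumPy sketch of this route (not from the slides; the function and basis names are mine). Inverting A^T A directly mirrors the slide, although the SVD approach of the next slides is more robust:

```python
import numpy as np

def linear_lsq(x, y, sigma, basis):
    """General linear least squares via the normal equation.

    basis is a list of functions X_k; the model is y(x) = sum_k a_k X_k(x).
    Returns the best-fit coefficients a and their covariance matrix C.
    """
    x, y, sigma = map(np.asarray, (x, y, sigma))
    A = np.column_stack([X(x) / sigma for X in basis])  # design matrix
    b = y / sigma
    C = np.linalg.inv(A.T @ A)     # covariance of the fitted coefficients
    a = C @ (A.T @ b)              # solution of A^T A a = A^T b
    return a, C

# Example: quadratic fit, X_k(x) = x**(k-1) for k = 1, 2, 3
# a, C = linear_lsq(x, y, sigma, [lambda t: t**0, lambda t: t, lambda t: t**2])
```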
Singular Value Decomposition
• We can factor an arbitrary complex matrix as

A = UΣV†

Written out for an N × M matrix A:

$$\underbrace{A}_{N\times M} = \underbrace{U}_{N\times N}\,\underbrace{\begin{pmatrix} w_1 & & \\ & \ddots & \\ & & w_M \\ 0 & \cdots & 0 \end{pmatrix}}_{N\times M}\,\underbrace{V^\dagger}_{M\times M}$$

• U and V are unitary, i.e., UU† = 1, VV† = 1.
• Σ is diagonal (but need not be square) with real, non-negative entries w_j ≥ 0.
Solve Least-Squares by SVD
• From the normal equation, we have

$$\mathbf{a} = (A^T A)^{-1} A^T \mathbf{b}$$

but A = UΣV^T (taking A real), so

$$A^T A = V\,\Sigma^T U^T U\,\Sigma\,V^T = V\,\Sigma^T\Sigma\,V^T$$

$$\mathbf{a} = \left(V\,\Sigma^T\Sigma\,V^T\right)^{-1} V\,\Sigma^T U^T \mathbf{b} = V\left(\Sigma^T\Sigma\right)^{-1} V^T V\,\Sigma^T U^T \mathbf{b} = V\left(\Sigma^T\Sigma\right)^{-1}\Sigma^T U^T \mathbf{b}$$

(using (AB)^T = B^T A^T and (AB)^(−1) = B^(−1) A^(−1)). Or, in terms of the columns U_(j) and V_(j) of U and V,

$$\mathbf{a} = \sum_{j=1}^{M}\left(\frac{U_{(j)}\cdot\mathbf{b}}{w_j}\right) V_{(j)}$$

Omitting terms with very small w_j gives a robust method.
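A NumPy sketch of this procedure (my own naming; the rcond cutoff implements the "omit small w_j" editing):

```python
import numpy as np

def svd_lsq(A, b, rcond=1e-12):
    """Solve min ||b - A a|| by SVD, dropping tiny singular values."""
    U, w, Vt = np.linalg.svd(A, full_matrices=False)
    # Replace 1/w_j by 0 where w_j is negligible, instead of dividing
    inv_w = np.where(w > rcond * w.max(), 1.0 / w, 0.0)
    return Vt.T @ (inv_w * (U.T @ b))
```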
Nonlinear Models y=y(x; a)
χ² is a nonlinear function of a. Close to the minimum, we have the Taylor expansion

$$\chi^2(\mathbf{a}) = \chi^2(\mathbf{a}_{\min}) + \frac{1}{2}\left(\mathbf{a}-\mathbf{a}_{\min}\right)^T D\left(\mathbf{a}-\mathbf{a}_{\min}\right) + O\!\left(\left|\mathbf{a}-\mathbf{a}_{\min}\right|^3\right)$$

where the Hessian matrix is

$$D_{ij} = \frac{\partial^2\chi^2}{\partial a_i\,\partial a_j}$$

(the linear term vanishes because the gradient is zero at the minimum). Differentiating gives ∇χ²(a) = D(a − a_min), so

$$\mathbf{a}_{\min} = \mathbf{a} - D^{-1}\,\nabla\chi^2(\mathbf{a})$$
Solution Methods
• Knowing the gradient only, use steepest descent:

$$\mathbf{a}_{\text{next}} = \mathbf{a}_{\text{cur}} - \text{constant} \times \nabla\chi^2(\mathbf{a}_{\text{cur}})$$

• Knowing both the gradient and the Hessian matrix:

$$\mathbf{a}_{\min} = \mathbf{a}_{\text{cur}} - D^{-1}\,\nabla\chi^2(\mathbf{a}_{\text{cur}})$$

• Define

$$\beta_k \equiv -\frac{1}{2}\frac{\partial\chi^2}{\partial a_k}, \qquad \alpha_{kl} \equiv \frac{1}{2}\frac{\partial^2\chi^2}{\partial a_k\,\partial a_l} \approx \sum_{i=1}^{N}\frac{1}{\sigma_i^2}\,\frac{\partial y(x_i;\mathbf{a})}{\partial a_k}\,\frac{\partial y(x_i;\mathbf{a})}{\partial a_l}$$
Levenberg-Marquardt Method
• Smoothly interpolate between the two methods with a control parameter λ: for λ = 0, use the more precise Hessian; for very large λ, use steepest descent.
• Define a new matrix A′ with elements:

$$A'_{ij} = \begin{cases} \alpha_{ii}\,(1+\lambda), & i = j \\ \alpha_{ij}, & i \neq j \end{cases}$$
Levenberg-Marquardt Algorithm
• Start with an initial guess of a.
• Compute χ²(a).
• Pick a modest value for λ, say λ = 0.001.
• (†) Solve A′δa = β, and evaluate χ²(a + δa).
• If χ² increases, increase λ by a factor of 10 and go back to (†).
• If χ² decreases, decrease λ by a factor of 10, update a ← a + δa, and go back to (†).
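A compact Python sketch of this loop, assuming NumPy (the function names and the convergence test on the change in χ² are my additions; the slides leave the stopping criterion implicit):

```python
import numpy as np

def levenberg_marquardt(x, y, sigma, model, jac, a0,
                        lam=1e-3, tol=1e-10, max_iter=200):
    """Levenberg-Marquardt loop following the recipe above.

    model(x, a) returns the predicted y; jac(x, a) returns the (N, M)
    matrix of partial derivatives dy(x_i; a)/da_k.
    """
    a = np.asarray(a0, dtype=float)
    w = 1.0 / np.asarray(sigma) ** 2               # weights 1/sigma_i^2

    def chi2(a):
        r = y - model(x, a)
        return np.sum(w * r * r)

    chi2_cur = chi2(a)
    for _ in range(max_iter):
        r = y - model(x, a)
        J = jac(x, a)
        beta = J.T @ (w * r)                       # beta_k = -1/2 dchi2/da_k
        alpha = (J * w[:, None]).T @ J             # alpha_kl (approximate)
        A = alpha + lam * np.diag(np.diag(alpha))  # A'_ii = alpha_ii*(1+lam)
        da = np.linalg.solve(A, beta)              # the (dagger) step
        chi2_new = chi2(a + da)
        if chi2_new >= chi2_cur:                   # chi2 increased: raise lambda
            lam *= 10.0
        else:                                      # chi2 decreased: accept step
            lam /= 10.0
            converged = chi2_cur - chi2_new < tol
            a, chi2_cur = a + da, chi2_new
            if converged:
                break
    return a, chi2_cur
```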