
Chapter 15

Modeling of Data

Statistics of Data

• Mean (or average):

$$\bar{x} = \frac{1}{N}\sum_{j=1}^{N} x_j$$

• Variance:

$$\mathrm{Var}(x_1,\ldots,x_N) = \frac{1}{N-1}\sum_{j=1}^{N}\left(x_j - \bar{x}\right)^2$$

• Median: a value $x_j$ such that half of the data are bigger than it, and half of the data are smaller than it.

$\sigma = \sqrt{\mathrm{Var}}$ is called the standard deviation.
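These definitions translate directly into code. A minimal sketch in Python (NumPy assumed; the function name basic_stats and the sample data are illustrative only):

```python
import numpy as np

def basic_stats(x):
    """Return mean, variance (1/(N-1) normalization), and median."""
    x = np.asarray(x, dtype=float)
    mean = x.sum() / len(x)                       # (1/N) * sum_j x_j
    var = ((x - mean) ** 2).sum() / (len(x) - 1)  # 1/(N-1) * sum_j (x_j - mean)^2
    return mean, var, np.median(x)                # median: middle of sorted data

mean, var, median = basic_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(mean, var, np.sqrt(var), median)            # sqrt(var) = standard deviation
```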

Higher Moments

$$\mathrm{Skew}(x_1,\ldots,x_N) = \frac{1}{N}\sum_{j=1}^{N}\left(\frac{x_j - \bar{x}}{\sigma}\right)^3$$

$$\mathrm{Kurt}(x_1,\ldots,x_N) = \frac{1}{N}\sum_{j=1}^{N}\left(\frac{x_j - \bar{x}}{\sigma}\right)^4 - 3$$
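A corresponding sketch for the higher moments (again, the function name is ours):

```python
import numpy as np

def skew_kurt(x):
    """Skewness and excess kurtosis, following the definitions above."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)  # standardized data; sigma uses 1/(N-1)
    skew = (z ** 3).mean()              # (1/N) * sum_j z_j^3
    kurt = (z ** 4).mean() - 3.0        # subtract 3 so a Gaussian gives 0
    return skew, kurt
```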

Gaussian Distribution

$$N(x; a, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-a)^2}{2\sigma^2}}$$

$$\langle x \rangle = a,\qquad \mathrm{Var}(x) = \sigma^2,\qquad \mathrm{Skew}(x) = 0,\qquad \mathrm{Kurt}(x) = 0$$

Least Squares

• Given N data points $(x_i, y_i)$, $i = 1, \ldots, N$, find the fitting parameters $a_j$, $j = 1, 2, \ldots, M$, of the function

$f(x) = y(x; a_1, a_2, \ldots, a_M)$

such that

$$\sum_{i=1}^{N}\left[y_i - y(x_i; a_1, \ldots, a_M)\right]^2$$

is minimized over the parameters $a_j$.

Why Least Squares

• Given the parameters, what is the probability that the observed data occurred?

• Assuming independent Gaussian errors, that is:

$$P \propto \prod_{i=1}^{N}\exp\!\left[-\frac{1}{2}\left(\frac{y_i - y(x_i)}{\sigma_i}\right)^2\right]$$

• Maximizing this probability over the parameters is equivalent to minimizing the sum of squares in the exponent, hence "least squares".

Chi-Square Fitting

• Minimize the quantity:

$$\chi^2 = \sum_{i=1}^{N}\left[\frac{y_i - y(x_i; a_1, \ldots, a_M)}{\sigma_i}\right]^2$$

• If each term is an independent Gaussian, $\chi^2$ follows the so-called $\chi^2$ distribution. Given the value of $\chi^2$ above, we can compute $Q = \mathrm{Prob}(\text{a } \chi^2 \text{ random variable} \geq \text{the observed } \chi^2)$.

• If Q < 0.001 or Q > 0.999, the model may be rejected.

Meaning of Goodness-of-Fit Q

[Figure: the $\chi^2$ probability density $P(\chi^2) \propto (\chi^2)^{\nu/2-1} e^{-\chi^2/2}$, with the observed value of $\chi^2$ marked and the area Q hatched to its right.]

If the statistic $\chi^2$ indeed follows this distribution, the probability of obtaining the currently computed value of $\chi^2$, or greater, equals the hatched area Q.

Such an outcome is quite unlikely if Q is very small or very close to 1. If so, we reject the model.

Number of degrees of freedom: $\nu = N - M$.
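For reference, Q is the regularized upper incomplete gamma function $Q(\nu/2, \chi^2/2)$; in SciPy this is scipy.special.gammaincc, the counterpart of Numerical Recipes' gammq. A small sketch:

```python
from scipy.special import gammaincc  # regularized upper incomplete gamma

def goodness_of_fit(chi2, nu):
    """Q = Prob(a chi-square variable with nu = N - M dof exceeds chi2)."""
    return gammaincc(nu / 2.0, chi2 / 2.0)

print(goodness_of_fit(12.0, 10))  # ~0.285: acceptable fit
print(goodness_of_fit(40.0, 10))  # ~1.7e-5: model rejected
```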

Fitting to Straight Line (with known error bars)

[Figure: data points with error bars and the fitted line y = a + bx.]

Given $(x_i, y_i \pm \sigma_i)$, find the intercept a and slope b such that the chi-square merit function

$$\chi^2(a, b) = \sum_{i=1}^{N}\left(\frac{y_i - a - b x_i}{\sigma_i}\right)^2$$

is minimized.

Goodness-of-fit is $Q = \mathrm{gammq}\!\left(\frac{N-2}{2}, \frac{\chi^2}{2}\right)$. If Q > 0.1 the fit is good; if Q ≈ 0.001 it may be acceptable; but if Q < 0.001 the fit is questionable.

If Q > 0.999, the fit is too good to be true.

Linear Regression Model

[Figure: data scattered about the fitted line y = a + bx; ε marks the deviation of one point from the line.]

Data do not follow the straight line exactly. The basic assumption in linear regression (least-squares fit) is that the deviations ε are independent Gaussian random noise.

Error in y, but no error in x.

Solution of Straight Line Fit

Setting $\partial\chi^2/\partial a = 0$ and $\partial\chi^2/\partial b = 0$, and defining

$$S = \sum_{i=1}^{N}\frac{1}{\sigma_i^2},\quad S_x = \sum_{i=1}^{N}\frac{x_i}{\sigma_i^2},\quad S_y = \sum_{i=1}^{N}\frac{y_i}{\sigma_i^2},\quad S_{xx} = \sum_{i=1}^{N}\frac{x_i^2}{\sigma_i^2},\quad S_{xy} = \sum_{i=1}^{N}\frac{x_i y_i}{\sigma_i^2},$$

the minimum satisfies the linear equations

$$a\,S + b\,S_x = S_y,\qquad a\,S_x + b\,S_{xx} = S_{xy},$$

with solution

$$\Delta = S\,S_{xx} - S_x^2,\qquad a = \frac{S_{xx} S_y - S_x S_{xy}}{\Delta},\qquad b = \frac{S\,S_{xy} - S_x S_y}{\Delta}.$$

Error Propagation

• Let $z = f(y_1, y_2, \ldots, y_N)$ be a function of independent random variables $y_i$. Assuming the variances are small, we have

$$z \approx z(\bar{y}) + \sum_{i=1}^{N}\left.\frac{\partial f}{\partial y_i}\right|_{\bar{y}}\left(y_i - \bar{y}_i\right)$$

• The variance of z is related to the variances of the $y_i$ by

$$\sigma_f^2 = \sum_{i=1}^{N}\left(\frac{\partial f}{\partial y_i}\right)^2 \sigma_i^2$$

Error Estimates on a and b

• Using the error propagation formula, viewing a as a function of the $y_i$, we have

$$\frac{\partial a}{\partial y_i} = \frac{S_{xx} - S_x x_i}{\sigma_i^2\,\Delta}$$

• Thus

$$\sigma_a^2 = \sum_{i=1}^{N}\sigma_i^2\left(\frac{\partial a}{\partial y_i}\right)^2 = \frac{S_{xx}}{\Delta}$$

• Similarly

$$\sigma_b^2 = \frac{S}{\Delta}$$
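The closed-form solution and its error estimates fit in a few lines. A sketch (the function name fit_line is ours, not from the text):

```python
import numpy as np

def fit_line(x, y, sigma):
    """Straight-line fit y = a + b*x with known error bars sigma_i."""
    x, y, sigma = (np.asarray(v, dtype=float) for v in (x, y, sigma))
    w = 1.0 / sigma**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta      # intercept
    b = (S * Sxy - Sx * Sy) / delta        # slope
    sigma_a = np.sqrt(Sxx / delta)         # sigma_a^2 = Sxx / Delta
    sigma_b = np.sqrt(S / delta)           # sigma_b^2 = S / Delta
    chi2 = (w * (y - a - b * x) ** 2).sum()
    return a, b, sigma_a, sigma_b, chi2
```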

What if the error in $y_i$ is unknown?

• The goodness-of-fit Q can no longer be computed.

• Assuming all data have the same σ, estimate it from the scatter about the fit:

$$\sigma^2 = \frac{1}{N - M}\sum_{i=1}^{N}\left[y_i - y(x_i)\right]^2$$

• Errors in a and b can still be estimated, using $\sigma_i = \sigma$ (but less reliably).

M is the number of basis functions; M = 2 for a straight-line fit.
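Concretely, continuing with the hypothetical fit_line from the sketch above (x and y are the data arrays): fit once with $\sigma_i = 1$, estimate σ from the residuals, then refit to recover error bars on a and b.

```python
import numpy as np

# First pass with sigma_i = 1; chi2 then equals the sum of squared residuals.
a, b, _, _, chi2 = fit_line(x, y, np.ones(len(x)))
sigma = np.sqrt(chi2 / (len(x) - 2))   # sigma^2 = sum(resid^2)/(N - M), M = 2
a, b, sigma_a, sigma_b, _ = fit_line(x, y, np.full(len(x), sigma))
```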

General Linear Least-Squares

• Fit to a linear combination of arbitrary functions:

$$y(x) = \sum_{k=1}^{M} a_k X_k(x)$$

• E.g., polynomial fit $X_k(x) = x^{k-1}$, or harmonic series $X_k(x) = \sin(kx)$, etc.

• The basis functions $X_k(x)$ can be nonlinear in x; the model is "linear" because it is linear in the parameters $a_k$.

Merit Function & Design Matrix

• Find the $a_k$ that minimize

$$\chi^2 = \sum_{i=1}^{N}\left[\frac{y_i - \sum_{k=1}^{M} a_k X_k(x_i)}{\sigma_i}\right]^2$$

• Define the design matrix and right-hand side

$$A_{ij} = \frac{X_j(x_i)}{\sigma_i},\qquad b_i = \frac{y_i}{\sigma_i}$$

• Let a be the column vector $(a_1, a_2, \ldots, a_M)^T$. The problem can be stated as

$$\min_a \|b - A a\|^2$$
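For instance, for the polynomial basis $X_k(x) = x^{k-1}$, the design matrix and right-hand side can be built as follows (a sketch; the names are illustrative):

```python
import numpy as np

def design_matrix(x, y, sigma, M):
    """A_ij = X_j(x_i)/sigma_i and b_i = y_i/sigma_i for X_k(x) = x**(k-1)."""
    x, y, sigma = (np.asarray(v, dtype=float) for v in (x, y, sigma))
    A = np.vander(x, M, increasing=True) / sigma[:, None]  # columns x^0 ... x^(M-1)
    b = y / sigma
    return A, b
```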

Normal Equation & Covariance

• The solution to $\min\|b - Aa\|^2$ is the normal equation $A^T A\, a = A^T b$.

• Let $C = (A^T A)^{-1}$; then $a = C A^T b$.

• We can view the data $y_i$ as random variables due to random error: $y_i = y(x_i) + \varepsilon_i$, with $\langle\varepsilon_i\rangle = 0$, $\langle\varepsilon_i \varepsilon_j\rangle = \sigma_i^2\,\delta_{ij}$. Thus a is also a random variable, and its covariance is precisely C:

$$\langle a a^T\rangle - \langle a\rangle\langle a^T\rangle = C$$

• The estimate of the fitting coefficient is $a_j = (C A^T b)_j$, with variance $C_{jj}$.
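A direct translation, adequate for well-conditioned problems (see the SVD discussion below for the robust alternative):

```python
import numpy as np

def normal_equation_fit(A, b):
    """Solve A^T A a = A^T b; C = (A^T A)^{-1} is the covariance of a."""
    C = np.linalg.inv(A.T @ A)
    a = C @ (A.T @ b)
    return a, C, np.sqrt(np.diag(C))  # coefficients, covariance, one-sigma errors
```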

Singular Value Decomposition

• We can factor an arbitrary complex N×M matrix as

$$A = U\,\Sigma\,V^{\dagger}$$

where U is N×N, Σ is N×M with $\Sigma_{jj} = w_j$ and zeros elsewhere, and V is M×M.

U and V are unitary, i.e., $U U^{\dagger} = 1$, $V V^{\dagger} = 1$.

Σ is diagonal (but need not be square), real, and non-negative: $w_j \geq 0$.
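NumPy computes the (thin) SVD directly; a quick check of the factorization:

```python
import numpy as np

A = np.random.randn(6, 3)                          # N = 6, M = 3
U, w, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: A = U diag(w) V^T
print(w)                                           # w_j >= 0, sorted descending
print(np.allclose(A, U @ np.diag(w) @ Vt))         # True: factorization holds
```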

Solve Least-Squares by SVD

• From the normal equation, we have

$$a = (A^T A)^{-1} A^T b$$

but $A = U\Sigma V^T$, so, using $(AB)^T = B^T A^T$ and $(AB)^{-1} = B^{-1} A^{-1}$,

$$A^T A = V\,\Sigma^T U^T U\,\Sigma\,V^T = V\,(\Sigma^T\Sigma)\,V^T$$

$$a = V\,(\Sigma^T\Sigma)^{-1} V^T\, V\,\Sigma^T U^T b = V\,(\Sigma^T\Sigma)^{-1}\Sigma^T\, U^T b$$

Or, componentwise,

$$a = \sum_{j=1}^{M}\left(\frac{U_{(j)}^T\, b}{w_j}\right) V_{(j)},$$

where $U_{(j)}$ and $V_{(j)}$ denote the j-th columns of U and V.

Omitting terms with very small $w_j$ gives a robust method.

Nonlinear Models y = y(x; a)

$\chi^2$ is a nonlinear function of the parameter vector a. Close to the minimum, we have the Taylor expansion

$$\chi^2(a) = \chi^2(a_{\min}) + \frac{1}{2}\,(a - a_{\min})^T D\,(a - a_{\min}) + O\!\left(\|a - a_{\min}\|^3\right),$$

where D is the Hessian matrix,

$$D_{ij} = \frac{\partial^2 \chi^2}{\partial a_i\, \partial a_j}.$$

Differentiating gives

$$\nabla\chi^2(a) = D\,(a - a_{\min}),\qquad\text{so}\qquad a_{\min} = a - D^{-1}\,\nabla\chi^2(a).$$

Solution Methods

• Know the gradient only: steepest descent,

$$a_{\text{next}} = a_{\text{cur}} - \text{constant} \times \nabla\chi^2(a_{\text{cur}})$$

• Know both the gradient and the Hessian matrix:

$$a_{\min} = a_{\text{cur}} - D^{-1}\,\nabla\chi^2(a_{\text{cur}})$$

• Define

$$\beta_k \equiv -\frac{1}{2}\frac{\partial \chi^2}{\partial a_k},\qquad \alpha_{kl} \equiv \frac{1}{2}\frac{\partial^2 \chi^2}{\partial a_k\,\partial a_l} \approx \sum_{i=1}^{N}\frac{1}{\sigma_i^2}\,\frac{\partial y(x_i; a)}{\partial a_k}\,\frac{\partial y(x_i; a)}{\partial a_l}$$

Levenberg-Marquardt Method

• Smoothly interpolate between the two methods with a control parameter λ: for λ = 0, use the more precise Hessian; for λ very large, use steepest descent.

• Define a new matrix α′ with elements:

$$\alpha'_{ij} = \begin{cases} \alpha_{ii}\,(1 + \lambda), & i = j \\ \alpha_{ij}, & i \neq j \end{cases}$$

Levenberg-Marquardt Algorithm

• Start with an initial guess of a.

• Compute $\chi^2(a)$.

• Pick a modest value for λ, say λ = 0.001.

• (†) Solve $\alpha'\,\delta a = \beta$ for $\delta a$; evaluate $\chi^2(a + \delta a)$.

• If $\chi^2$ increases, increase λ by a factor of 10 and go back to (†).

• If $\chi^2$ decreases, decrease λ by a factor of 10, update $a \leftarrow a + \delta a$, and go back to (†).
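A minimal sketch of this loop, under stated assumptions: y_model(x, a) and grad(x, a) are caller-supplied placeholders for the model and its M partial derivatives $\partial y/\partial a_k$ (shape N×M), inputs are NumPy arrays, and a fixed iteration count stands in for a real convergence test.

```python
import numpy as np

def levenberg_marquardt(y_model, grad, a, x, y, sigma, lam=1e-3, iters=50):
    chi2 = lambda p: np.sum(((y - y_model(x, p)) / sigma) ** 2)
    for _ in range(iters):
        J = grad(x, a) / sigma[:, None]     # J_ik = (1/sigma_i) dy(x_i)/da_k
        r = (y - y_model(x, a)) / sigma
        alpha = J.T @ J                     # alpha_kl from the previous slide
        beta = J.T @ r                      # beta_k
        alpha_p = alpha + lam * np.diag(np.diag(alpha))  # alpha'_jj = alpha_jj(1+lam)
        da = np.linalg.solve(alpha_p, beta)              # solve alpha' da = beta
        if chi2(a + da) >= chi2(a):
            lam *= 10.0                     # chi2 increased: toward steepest descent
        else:
            lam /= 10.0                     # chi2 decreased: accept, more Newton-like
            a = a + da
    return a
```

In practice one would use a library routine such as scipy.optimize.least_squares, which implements this idea with careful stopping criteria.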

Problem Set 9

1. If we use the basis

{1, x, x + 2}

for a linear least-squares fit using the normal-equation method, do we encounter a problem? Why? How about SVD?

2. What happens if we apply the Levenberg-Marquardt method to a linear least-squares problem?