
1

Curve-Fitting

Regression

2

Applications of Curve Fitting

• To fit curves to a collection of discrete points in order to obtain intermediate estimates or to provide trend analysis

3

Applications of Curve Fitting

• To obtain a simplified version of a complicated function
  – e.g.: in computing an integral, if $g(x) \approx f(x)$, then

$$\int_a^b f(x)\,dx \approx \int_a^b g(x)\,dx$$

• Hypothesis testing
  – Compare a theoretical data model against empirical data collected through experiments to test whether they agree.

4

Two Approaches

• Regression – Find the "best" curve to fit the points. The curve does not have to pass through the points. (Fig (a))
  – Used when the data contain noise or when only an approximation is needed.

• Interpolation – Fit a curve or series of curves that pass through every point. (Figs (b) & (c))

5

Curve Fitting

• Regression
  – Linear Regression
  – Polynomial Regression
  – Multiple Linear Regression
  – Non-linear Regression

• Interpolation
  – Newton's Divided-Difference Interpolation
  – Lagrange Interpolating Polynomials
  – Spline Interpolation

6

Linear Regression – Introduction

• Some data exhibit a linear relationship but contain noise.

• A curve that interpolates all the points (which contain errors) would be a poor representation of the behavior of the data set.

• A straight line captures the linear relationship better.

7

Linear Regression

Objective: Want to fit the "best" line to the data points (that exhibit a linear relation).

– How do we define "best"?
  • Pass through as many points as possible
  • Minimize the maximum residual of each point
  • Each point carries the same weight

8

Linear Regression

Objective

• Given a set of points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$

• Want to find a straight line

$$y = a_0 + a_1 x$$

that best fits the points.

The error, or residual, at each given point can be expressed as

$$e_i = y_i - a_0 - a_1 x_i$$

9

Residual (Error) Measurement

10

Criteria for a "Best" Fit• Minimize the sum of residuals

– e.g.: Any line passing through mid-points would satisfy the criteria.

– Inadequate

n

iioi

n

ii xaaye

11

1

)(

• Minimize the sum of the absolute values of residuals ($L_1$-norm)

$$\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$$

  – e.g.: any line staying within the upper and lower points would satisfy this criterion, so the solution is not unique.
  – Inadequate

11

Criteria for a "Best" Fit• Minimax method: Minimize the

largest residuals of all the point (L∞-Norm)

– e.g.: Data set with an outlier. The line is affected strongly by the outlier.

– Inadequate

ioini

ini

xaaye 100maxminmaxmin

Outlier

Note: Minimax method is sometimes well suited for fitting a simple function to a complicated function. (Why?)

12

Least-Squares Regression

• Minimize the sum of the squares of the residuals ($L_2$-norm)

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

• Unique solution

How do we find $a_0$ and $a_1$ that minimize $\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$?
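A minimal Python sketch, assuming hypothetical sample data, that evaluates each of these candidate criteria for a trial line $y = a_0 + a_1 x$:

```python
import numpy as np

# Hypothetical sample points with noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

def criteria(a0, a1):
    """Evaluate the candidate 'best fit' criteria for the line y = a0 + a1*x."""
    e = y - a0 - a1 * x                      # residuals e_i = y_i - a0 - a1*x_i
    return {
        "sum of residuals": e.sum(),         # inadequate: positive/negative errors cancel
        "L1  (sum |e_i|)":  np.abs(e).sum(), # inadequate: solution not unique
        "Linf (max |e_i|)": np.abs(e).max(), # inadequate: dominated by outliers
        "L2  (sum e_i^2)":  (e ** 2).sum(),  # least squares: unique solution
    }

print(criteria(0.0, 1.0))   # try the trial line y = x
```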

13

Least-Squares Fit of a Straight Line

Let

$$S_r(a_0, a_1) = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

To minimize $S_r(a_0, a_1)$, we can find $a_0$ and $a_1$ that satisfy

$$\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) = 0 \quad\Rightarrow\quad \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} x_i (y_i - a_0 - a_1 x_i) = 0 \quad\Rightarrow\quad \sum_{i=1}^{n} (x_i y_i - a_0 x_i - a_1 x_i^2) = 0$$

14

Least-Squares Fit of a Straight Line

Expanding the first condition:

$$\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a_0 - \sum_{i=1}^{n} a_1 x_i = 0 \quad\Rightarrow\quad n a_0 + \left( \sum_{i=1}^{n} x_i \right) a_1 = \sum_{i=1}^{n} y_i$$

Expanding the second condition:

$$\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} a_0 x_i - \sum_{i=1}^{n} a_1 x_i^2 = 0 \quad\Rightarrow\quad \left( \sum_{i=1}^{n} x_i \right) a_0 + \left( \sum_{i=1}^{n} x_i^2 \right) a_1 = \sum_{i=1}^{n} x_i y_i$$

These are called the normal equations.

How do you find $a_0$ and $a_1$?

15

Least-Squares Fit of a Straight Line

Solving the normal equations

$$n a_0 + \left( \sum_{i=1}^{n} x_i \right) a_1 = \sum_{i=1}^{n} y_i, \qquad \left( \sum_{i=1}^{n} x_i \right) a_0 + \left( \sum_{i=1}^{n} x_i^2 \right) a_1 = \sum_{i=1}^{n} x_i y_i$$

yields

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and $y_i$ values.
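A minimal Python sketch of this closed-form solution (the data arrays are hypothetical examples):

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = a0 + a1*x using the closed-form solution."""
    n = len(x)
    a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / \
         (n * np.sum(x ** 2) - np.sum(x) ** 2)
    a0 = np.mean(y) - a1 * np.mean(x)
    return a0, a1

# Hypothetical sample points
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
a0, a1 = fit_line(x, y)
print(f"y = {a0:.4f} + {a1:.4f} x")
```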

16

Quantification of Error of Linear Regression

$S_{y/x}$ is called the standard error of the estimate.

Similar to the standard deviation, $S_{y/x}$ quantifies the spread of the data points around the regression line.

The notation "$y/x$" designates that the error is for a predicted value of $y$ corresponding to a particular value of $x$.

$$S_{y/x} = \sqrt{\frac{S_r}{n-2}}$$

where

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
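A short sketch of $S_r$ and $S_{y/x}$ in Python (hypothetical data; np.polyfit is used only to obtain the least-squares coefficients):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
a1, a0 = np.polyfit(x, y, 1)          # slope and intercept of the fitted line

e = y - a0 - a1 * x                   # residuals
sr = np.sum(e ** 2)                   # S_r: sum of squared residuals
s_yx = np.sqrt(sr / (len(x) - 2))     # standard error of the estimate
print(s_yx)
```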

17

18

"Goodness" of our fit• Let St be the sum of the squares around the

mean for the dependent variable, y

• Let Sr be the sum of the squares of residuals around the regression line

• St - Sr quantifies the improvement or error reduction due to describing data in terms of a straight line rather than as an average value.

n

iit yyS

1

2)(

19

"Goodness" of our fit

• For a perfect fit

Sr=0 and r=r2=1, signifying that the line explains 100 percent of the variability of the data.

• For r=r2=0, Sr=St, the fit represents no improvement.

• e.g.: r2=0.868 means 86.8% of the original uncertainty has been explained by the linear model.

t

rt

S

SSr

2

tcoefficienncorrelatio:

iondeterminatoftcoefficien:2

r

r
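A corresponding sketch for $S_t$, $S_r$, and $r^2$ (same hypothetical data as above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
a1, a0 = np.polyfit(x, y, 1)

st = np.sum((y - np.mean(y)) ** 2)    # S_t: spread around the mean
sr = np.sum((y - a0 - a1 * x) ** 2)   # S_r: spread around the regression line
r2 = (st - sr) / st                   # coefficient of determination
print(r2)
```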

20

Linearization of Nonlinear Relationships

• Linear regression fits a straight line to points with a linear relationship.

• Some non-linear relationships can be transformed so that, in the transformed space, the data exhibit a linear relationship.

• For example:

Exponential equation: $\;y = a_1 e^{b_1 x} \;\Rightarrow\; \ln y = \ln a_1 + b_1 x$

Power equation: $\;y = a_2 x^{b_2} \;\Rightarrow\; \log y = \log a_2 + b_2 \log x$

Saturation-growth-rate equation: $\;y = a_3 \dfrac{x}{b_3 + x} \;\Rightarrow\; \dfrac{1}{y} = \dfrac{1}{a_3} + \dfrac{b_3}{a_3} \cdot \dfrac{1}{x}$
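As a sketch of the exponential case: fit a straight line to $(x, \ln y)$, then transform the intercept back. The data below are hypothetical and assumed to follow $y = a_1 e^{b_1 x}$:

```python
import numpy as np

# Hypothetical data assumed to follow y = a1 * exp(b1 * x)
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([1.7, 2.6, 4.3, 7.1, 11.8])

# In the transformed space, ln y = ln a1 + b1 * x is a straight line
b1, ln_a1 = np.polyfit(x, np.log(y), 1)   # slope = b1, intercept = ln a1
a1 = np.exp(ln_a1)
print(f"y ~ {a1:.3f} * exp({b1:.3f} x)")
```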

21

Fig 17.9

22

Polynomial Regression

Objective

• Given $n$ points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$

• Want to find a polynomial of degree $m$

$$y = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m$$

that best fits the points.

The error, or residual, at each given point can be expressed as

$$e_i = y_i - a_0 - a_1 x_i - a_2 x_i^2 - \dots - a_m x_i^m$$

23

Least-Squares Fit of a Polynomial

Setting $\dfrac{\partial S_r}{\partial a_j} = 0$ for $j = 0, 1, \dots, m$ yields

$$\sum_{i=1}^{n} x_i^j \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 - \dots - a_m x_i^m \right) = 0$$

$$\Rightarrow\quad a_0 \sum_{i=1}^{n} x_i^j + a_1 \sum_{i=1}^{n} x_i^{j+1} + a_2 \sum_{i=1}^{n} x_i^{j+2} + \dots + a_m \sum_{i=1}^{n} x_i^{j+m} = \sum_{i=1}^{n} x_i^j y_i$$

The procedure for finding $a_0, a_1, \dots, a_m$ that minimizes the sum of the squares of the residuals,

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 - \dots - a_m x_i^m \right)^2$$

is the same as that used in linear least-squares regression.

24

Least-Squares Fit of a Polynomial

To find $a_0, a_1, \dots, a_m$ that minimize $S_r$, we can solve this system of linear equations:

$$\begin{bmatrix}
n & \sum x_i & \sum x_i^2 & \cdots & \sum x_i^m \\
\sum x_i & \sum x_i^2 & \sum x_i^3 & \cdots & \sum x_i^{m+1} \\
\sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \cdots & \sum x_i^{m+2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sum x_i^m & \sum x_i^{m+1} & \sum x_i^{m+2} & \cdots & \sum x_i^{2m}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \\ \vdots \\ \sum x_i^m y_i \end{bmatrix}$$

The standard error of the estimate becomes

$$S_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$
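A minimal sketch that builds and solves this system directly (hypothetical data; in practice np.polyfit solves the same least-squares problem):

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Fit y = a0 + a1*x + ... + am*x^m by solving the normal equations."""
    # Matrix entries are sums of powers of x_i, as in the system above
    A = np.array([[np.sum(x ** (j + k)) for k in range(m + 1)]
                  for j in range(m + 1)])
    b = np.array([np.sum(x ** j * y) for j in range(m + 1)])
    return np.linalg.solve(A, b)          # coefficients [a0, a1, ..., am]

# Hypothetical data with mild curvature
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 2.9, 6.2, 11.1, 17.8])
print(fit_polynomial(x, y, 2))            # quadratic fit
```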

25

Multiple Linear Regression

• In linear regression, $y$ is a function of one variable.

• In multiple linear regression, $y$ is a linear function of multiple variables.

• Want to find the best-fitting linear equation

$$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m$$

• The same procedure is used to find $a_0, a_1, a_2, \dots, a_m$ that minimize the sum of squared residuals.

• The standard error of the estimate is

$$S_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$
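A small sketch using a design matrix with two hypothetical predictor variables:

```python
import numpy as np

# Hypothetical data: y depends linearly on two variables x1 and x2
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.5, 2.0, 1.5, 3.0])
y  = np.array([2.1, 3.9, 7.2, 8.1, 12.3])

# Design matrix with a column of ones for the intercept a0
Z = np.column_stack([np.ones_like(x1), x1, x2])
a, *_ = np.linalg.lstsq(Z, y, rcond=None)   # least-squares [a0, a1, a2]
print(a)
```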

26

General Linear Least Squares

• Simple linear, polynomial, and multiple linear regression all belong to the following general linear least-squares model:

$$y = a_0 z_0 + a_1 z_1 + a_2 z_2 + \dots + a_m z_m + e$$

where the $z_i$ are $m+1$ different functions (they can be any kind of functions).

• For polynomial regression: $z_0 = 1,\; z_1 = x,\; z_2 = x^2,\; \dots,\; z_m = x^m$

• For multiple linear regression: $z_0 = 1,\; z_1 = x_1,\; z_2 = x_2,\; \dots,\; z_m = x_m$

27

General Linear Least Squares

• Given $n$ points, we have

$$y_j = a_0 z_{0j} + a_1 z_{1j} + a_2 z_{2j} + \dots + a_m z_{mj} + e_j, \qquad j = 1, \dots, n$$

where $z_{ij}$ represents the value of the $i$-th function at the $j$-th point.

• We can rewrite the above equations in matrix form as

$$\{y\} = [Z]\{a\} + \{e\}$$

or

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
z_{01} & z_{11} & \cdots & z_{m1} \\
z_{02} & z_{12} & \cdots & z_{m2} \\
\vdots & \vdots & & \vdots \\
z_{0n} & z_{1n} & \cdots & z_{mn}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}
+
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$

28

General Linear Least Squares

The sum of the squares of the residuals can be calculated as

$$S_r = \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{m} a_j z_{ji} \right)^2$$

To minimize $S_r$, we can set the partial derivatives of $S_r$ to zero and solve the resulting normal equations.

The normal equations can also be expressed concisely as

$$[Z]^T [Z] \{a\} = [Z]^T \{y\}$$

How should we solve this system?
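A sketch of the normal-equations solution, using a quadratic polynomial as one instance of the general model (hypothetical data). In practice, solving via QR factorization or np.linalg.lstsq is usually preferred to forming $[Z]^T[Z]$ explicitly, since the normal equations can be ill-conditioned:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Basis functions z0 = 1, z1 = x, z2 = x^2 as one instance of the
# general model y = a0*z0 + a1*z1 + a2*z2 + e
Z = np.column_stack([np.ones_like(x), x, x ** 2])

# Normal equations: (Z^T Z) a = Z^T y
a = np.linalg.solve(Z.T @ Z, Z.T @ y)
print(a)
```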

29

Non-Linear Regression

• Some model equations are non-linear and cannot be linearized using a direct method.

• For example,

$$f(x) = a_0 \left( 1 - e^{-a_1 x} \right) + e$$

• Solution: approximation + an iterative approach.

• Gauss-Newton method (see the sketch after this list):
  – Use a Taylor series to express the original non-linear equation in linear form.
  – Use general least-squares theory to obtain new estimates of the parameters that minimize the sum of squared residuals.
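A minimal Gauss-Newton sketch for the example model $f(x) = a_0 (1 - e^{-a_1 x})$, assuming hypothetical data and initial guesses; each iteration solves a linearized least-squares problem for the parameter update:

```python
import numpy as np

def gauss_newton(x, y, a0, a1, iters=10):
    """Gauss-Newton iteration for the model f(x) = a0 * (1 - exp(-a1 * x))."""
    for _ in range(iters):
        f = a0 * (1.0 - np.exp(-a1 * x))
        # Jacobian of f with respect to the parameters [a0, a1]
        J = np.column_stack([1.0 - np.exp(-a1 * x),
                             a0 * x * np.exp(-a1 * x)])
        # Linearized step: solve min || J*da - (y - f) || for the update da
        da, *_ = np.linalg.lstsq(J, y - f, rcond=None)
        a0, a1 = a0 + da[0], a1 + da[1]
    return a0, a1

# Hypothetical data roughly following a0 = 2.5, a1 = 0.8
x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
y = np.array([0.47, 1.10, 1.60, 1.85, 2.12])
print(gauss_newton(x, y, a0=1.0, a1=1.0))
```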
