
NON-LINEAR REGRESSION

Introduction

Quite often in regression a straight line is not the “best” model for explaining the variation in the dependent variable.

A model that includes quadratic or higher order terms may be needed.

The number of possible comparisons is equal to the number of levels of a factor minus one.

For example, if there are three levels of a factor, there are two possible comparisons.

Polynomials are equations in which each term is associated with a power of the independent variable (e.g. X, linear; X², quadratic; X³, cubic, etc.).

1st order comparisons measure linear relationships. 2nd order comparisons measure quadratic relationships. 3rd order comparisons measure cubic relationships.

Example: Effect of row spacing on yield (bu/ac) of soybean.

                    Row spacing (inches)
Block       18      24      30      36      42     ΣY.j
1         33.6    31.1    33.0    28.4    31.4    157.5
2         37.1    34.5    29.5    29.9    28.3    159.3
3         34.1    30.5    29.2    31.6    28.9    154.3
4         34.6    32.7    30.7    32.3    28.6    158.9
5         35.4    30.7    30.7    28.1    29.6    154.5
6         36.1    30.3    27.9    26.9    33.4    154.6
Yi.      210.9   189.8   181.0   177.2   180.2    939.1
Ȳi.      35.15   31.63   30.17   29.53   30.03     31.3

Step 1. Determine the polynomials that can be included in the model

There are 5 row spacings (5 levels), so we can use up to X, X², X³, and X⁴ (levels minus one = 4 polynomial terms).

Step 2. Run the stepwise regression analysis.

data soybean;   /* data set name not given in the original notes */
   input row block yield;
   row2 = row*row;
   row3 = row2*row;
   row4 = row3*row;
   datalines;
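The data lines that follow list one observation per line in the order given on the INPUT statement (row spacing, block, yield), taken from the table above; a partial sketch:

18 1 33.6
24 1 31.1
30 1 33.0
36 1 28.4
42 1 31.4

and so on for blocks 2 through 6, ending with a line containing only a semicolon.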


proc stepwise;
   model yield = row row2 row3 row4;
run;

Step 3: Determine which parameters should remain in the model: row and row2.

Step 4: Rerun the analysis using only the parameters that contributed significantly to the model, using PROC REG, and test for lack of fit.

proc reg;
   model yield = row row2 / lackfit;
run;
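Note: PROC STEPWISE is a legacy procedure; in current SAS releases the stepwise selection of Step 2 can be requested through PROC REG instead. A minimal sketch, assuming the data set created above is named soybean (the name is illustrative):

proc reg data=soybean;
   model yield = row row2 row3 row4
         / selection=stepwise slentry=0.15 slstay=0.15;   /* 0.15 entry/stay levels, matching the output below */
run;
quit;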

[Figure: Effect of row spacing on soybean yield. Treatment means plotted against row spacing with the fitted curve; x-axis: row spacing (inches), y-axis: yield (bu/ac).]


Analysis using stepwise regression

The STEPWISE Procedure

Model: MODEL1

Dependent Variable: yield


Number of Observations Read 30

Number of Observations Used 30

Stepwise Selection: Step 1

Variable row Entered: R-Square = 0.4452 and C(p) = 9.8393

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1          91.26667       91.26667      22.47    <.0001
Error             28         113.72300        4.06154
Corrected Total   29         204.98967

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept              37.47000           1.35192    3120.00200     768.18    <.0001
row                    -0.20556           0.04336      91.26667      22.47    <.0001

Bounds on condition number: 1, 1

Stepwise Selection: Step 2

Variable row2 Entered: R-Square = 0.6096 and C(p) = 1.2210

Stepwise Selection: Step 2

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2         124.96000       62.48000      21.08    <.0001
Error             27          80.02967        2.96406
Corrected Total   29         204.98967

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept              52.03667           4.47218     401.29928     135.39    <.0001
row                    -1.26111           0.31526      47.42972      16.00    0.0004
row2                    0.01759           0.00522      33.69333      11.37    0.0023

Bounds on condition number: 72.429, 289.71

All variables left in the model are significant at the 0.1500 level.

No other variable met the 0.1500 significance level for entry into the model.

Summary of Stepwise Selection

Step    Variable Entered    Variable Removed    Number Vars In    Partial R-Square    Model R-Square      C(p)    F Value    Pr > F
   1    row                                                  1              0.4452            0.4452    9.8393      22.47    <.0001
   2    row2                                                 2              0.1644            0.6096    1.2210      11.37    0.0023


Analysis using Proc REG and LACKFIT

Number of Observations Read 30

Number of Observations Used 30

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              52.03667           4.47218      11.64      <.0001
row           1              -1.26111           0.31526      -4.00      0.0004
row2          1               0.01759           0.00522       3.37      0.0023

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2         124.96000       62.48000      21.08    <.0001
Error             27          80.02967        2.96406
  Lack of Fit      2           0.70133        0.35067       0.11    0.8958
  Pure Error      25          79.32833        3.17313
Corrected Total   29         204.98967

Root MSE 1.72165 R-Square 0.6096

Dependent Mean 31.30333 Adj R-Sq 0.5807

Coeff Var 5.49988

Lack of fit is non-significant (Pr > F = 0.8958), indicating that the quadratic model adequately describes the data.
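For reference, the parameter estimates above give the fitted quadratic

Ŷ = 52.04 - 1.261X + 0.0176X²   (X = row spacing in inches).

Because the coefficient on X² is positive, the fitted curve has its minimum near X = 1.261/(2 × 0.0176) ≈ 36 inches, which matches the pattern in the treatment means.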


RESPONSE SURFACE REGRESSION OR MODELING (RSM)

Introduction

A form of multivariate non-linear regression in which the influences of several independent variables on a dependent or “response” variable are determined.

The goal of RSM is typically to optimize a response.

In most cases the relationship between the response and the independent variables is unknown; thus, it is necessary to obtain an estimate of the effects.

Quite often, a “first order model” of the form

Y = β0 + Σ βiXi + ε

is determined. If there is a non-linear effect, then a “second order model” of the form

Y = β0 + Σ βiXi + Σ βiiXi² + ΣΣ βijXiXj + ε   (i < j in the cross-product sum)

is determined.

The least squares method is used to estimate the parameters in the models.
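In matrix form, the least-squares estimates are b = (X'X)⁻¹X'Y, where the columns of X hold a constant term plus the linear, quadratic, and cross-product terms of the chosen model.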

Common Results Obtained From RSM

1. Simple maximum

[Figure: response surface showing a simple maximum. Image source: http://www.ualberta.ca/~csps/JPPS5(3)/P.Ellaiah/Figure-2.gif]


2. Simple minimum

3. Ridge

[Figures: response surfaces showing a simple minimum and a ridge. Image sources: http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/images/rsrgd.png and http://2.bp.blogspot.com/_uf8HSnevUy8/SKB3i_uk2AI/AAAAAAAAAEE/k05DBmHPGmM/s320/Picture4.jpg]


4. Saddle

Basic Steps in RSM

Step 1: Determine which parameters may have an influence on the variation in the dependent variable.

o You may have known factors determined from previous experiments.

o You may need to conduct additional simple 2ⁿ factorial experiments to identify additional parameters that may be important. These types of 2ⁿ factorial experiments are considered low resolution.

Step 2: Determine which parameters you want to continue with and whether or not you will have a first order (no curvature) or second order (curvature) model.

o First order model: Y = β0 + Σ βiXi + ε

o Second order model: Y = β0 + Σ βiXi + Σ βiiXi² + ΣΣ βijXiXj + ε   (i < j in the cross-product sum)

Step 3: Develop the Response Surface Model.

o This is an optimization process.

[Figure: example response surface plot. Image source: http://www.pqsystems.com/products/sixsigma/DOEpack/images/ResponseSurfacePlotLarge.gif]


o Parameters can be estimated best if proper experimental designs (response surface designs) are used.

o A common design for 2-factor models is the Central Composite Design.

o A common design for the 3-factor model is the Box-Behnken Design.

Step 4: Validate the model.

Cautions on Using Happenstance Data for RSM

It is often very tempting to use data we have already collected from an experiment and perform RSM on these data; however, the results obtained may not be reliable.

Happenstance data are typically obtained from experiments where:

o The process being maximized is typically highly controlled, so inputs and outputs vary little.

o Inputs tend to be highly correlated.

Just as we do in many experiments where we are trying to detect differences between treatment means, choice of experimental design is very important when setting up experiments for RSM.

Experimental Design for RSM Experiments

o Some researchers often use the One-Factor-At-a-Time approach, where they maximize one factor, fix the factor at this level, and then maximize a second variable.

o For example, let's say we wish to conduct an experiment to determine the proper nitrogen (N) fertilizer rate and plant population to maximize sugar yield in sugarbeet.

o The first step would be to determine the N rate that maximizes sugar yield. Using this N rate, conduct another experiment to determine what plant population now maximizes sugar yield.

o The problem with this type of approach is that you often end up with an increasing-ridge type of response. You may be on the top of the ridge, but not at the maximum on the plot.

1. Two-level factorial design with center points

Extremely useful when you believe there are interactions between factors.

Let's say we have a three-factor experiment, with two levels of each factor plus a center point:

o Time (minutes): 80, 90, and 100


o Temperature (°C): 140, 145, and 150

o Rate of adding chemical (mL/min): 4, 5, and 6

o The model would be: Y = β0 + β1x1 + β2x2 + β3x3 + β12x1x2 + β13x1x3 + β23x2x3 + ε

o This first order model will not provide an estimate of curvature.

o A strict factorial would require us to use 27 treatments (3 x 3 x 3), or runs, of the experiment.

o Another option is available where we repeat the center points but not the outside points of the cube.

o This design allows us to use only 12 runs, and experimental error is estimated based on the replicated center points.

o To simplify the analysis, the data are typically coded to ease model fitting and coefficient interpretation.
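For example, with time centered at 90 minutes and a half-range of 10 minutes, the coded value is x1 = (time - 90)/10, so 80 minutes codes to -1, 90 to 0, and 100 to +1. Temperature and rate are coded the same way, using centers of 145 °C and 5 mL/min and half-ranges of 5 and 1; these are the coded values shown in parentheses in the table below.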

[Figure: cube plot of the 2³ factorial with replicated center points. Image source: http://courseware.ee.calpoly.edu/~dbraun/papers/K155Fig1a.gif]


Example of data for a 2³ factorial with center points (treatments are presented before randomization)†.

Treatment Time (min) Temp (°C) Rate (mL/min) Yield (g)

1 80 (-1) 140 (-1) 4 (-1) 72.6

2 100 (+1) 140 (-1) 4 (-1) 82.5

3 80 (-1) 150 (+1) 4 (-1) 86.0

4 100 (+1) 150 (+1) 4 (-1) 75.9

5 80 (-1) 140 (-1) 6 (+1) 79.1

6 100 (+1) 140 (-1) 6 (+1) 82.1

7 80 (-1) 150 (+1) 6 (+1) 88.2

8 100 (+1) 150 (+1) 6 (+1) 79.0

9 90 (0) 145 (0) 5 (0) 87.1

10 90 (0) 145 (0) 5 (0) 85.7

11 90 (0) 145 (0) 5 (0) 87.8

12 90 (0) 145 (0) 5 (0) 84.2

†Data obtained from Anderson, M.J., and P.J. Whitcomb. 2005. RSM simplified: optimizing processes using response surface methods for design of experiments. Productivity Press, New York.
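A quick check for curvature in these data compares the average yield of the eight factorial (corner) runs with the average of the four center points: (72.6 + 82.5 + 86.0 + 75.9 + 79.1 + 82.1 + 88.2 + 79.0)/8 = 80.7 g versus (87.1 + 85.7 + 87.8 + 84.2)/4 = 86.2 g. The gap of about 5.5 g suggests curvature that the first order model cannot describe.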

2. Central Composite Design

This design allows you to get a better estimate of curvature that may be occurring in your model.

[Figure: central composite design layout. Image source: http://www.mathworks.com/help/toolbox/stats/cc1.gif]


The model for a three factor study would be: Y = β0 + β1x1 + β2x2 + β3x3 + β11x1² + β22x2² + β33x3² + β12x1x2 + β13x1x3 + β23x2x3 + ε

Example of data for a 2³ factorial with the Central Composite Design (treatments are presented before randomization)†.

Treatment Time (min) Temp (°C) Rate (mL/min) Yield (g)

1 80 140 4 72.6

2 100 140 4 82.5

3 80 150 4 86.0

4 100 150 4 75.9

5 80 140 6 79.1

6 100 140 6 82.1

7 80 150 6 88.2

8 100 150 6 79.0

9 90 145 5 87.1

10 90 145 5 85.7

11 90 145 5 87.8

12 90 145 5 84.2

13 73 145 5 79.1

14 107 145 5 82.1

15 90 137 5 88.2

16 90 153 5 79.0

17 90 145 3.3 87.1

18 90 145 6.7 85.7

19 90 145 5 87.8

20 90 145 5 84.2

†Data obtained from Anderson, M.J., and P.J. Whitcomb. 2005. RSM simplified: optimizing processes using response surface methods for design of experiments. Productivity Press, New York.
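These CCD data can be analyzed with PROC RSREG in the same way as the two-variable example later in these notes. A minimal sketch, assuming the observations above have been read into a data set named ccd with variables time, temp, rate, and yield (the data set and variable names are illustrative):

proc rsreg data=ccd;
   model yield = time temp rate / lackfit;   /* fits the full second order model and tests lack of fit */
run;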

Desirable Results from RSM

Main effects for the parameters entered in the model are all significant.

o If some parameters are not significant, drop them from the model and reanalyze the data.

Lack of fit is non-significant.

o If lack of fit is significant, you will need to use a more complex model.

A minimum or maximum point is identified.


o If a minimum or maximum value is not obtained, then you will need to redo the experiments with treatment levels that are closer to the optimum.

Using SAS for the Response Surface Analysis of CCD Data

SAS commands

options pageno=1;

data rsreg;
   input v1 v2 x1 x2 yield;
   label v1='original_variable_1' v2='original_variable_2'
         x1='coded_variable_1' x2='coded_variable_2';
   /* CCD runs: 4 factorial points, 5 center points, 4 axial points at +/-1.414 in coded units */
   datalines;
80     170     -1      -1      76.5
80     180     -1       1      77
90     170      1      -1      78
90     180      1       1      79.5
85     175      0       0      79.9
85     175      0       0      80.3
85     175      0       0      80
85     175      0       0      79.7
85     175      0       0      79.8
92.07  175      1.414   0      78.4
77.93  175     -1.414   0      75.6
85     182.07   0       1.414  78.5
85     167.93   0      -1.414  77
;

ods rtf file='rsreg.rtf';
ods graphics on;

proc print label;
   title 'Example of Response Surface Regression with Two Independent Variables';
run;

proc rsreg data=rsreg plots(unpack)=surface(3d);
   model yield = v1 v2 / lackfit;
run;

ods graphics off;
ods rtf close;


Example of Response Surface Regression with Two Independent Variables


Obs original_variable_1 original_variable_2 coded_variable_1 coded_variable_2 yield

1 80.00 170.00 -1.000 -1.000 76.5

2 80.00 180.00 -1.000 1.000 77.0

3 90.00 170.00 1.000 -1.000 78.0

4 90.00 180.00 1.000 1.000 79.5

5 85.00 175.00 0.000 0.000 79.9

6 85.00 175.00 0.000 0.000 80.3

7 85.00 175.00 0.000 0.000 80.0

8 85.00 175.00 0.000 0.000 79.7

9 85.00 175.00 0.000 0.000 79.8

10 92.07 175.00 1.414 0.000 78.4

11 77.93 175.00 -1.414 0.000 75.6

12 85.00 182.07 0.000 1.414 78.5

13 85.00 167.93 0.000 -1.414 77.0

The RSREG Procedure

Coding Coefficients for the Independent Variables

Factor Subtracted off Divided by

v1 85.000000 7.070000

v2 175.000000 7.070000

Response Surface for Variable yield

Response Mean 78.476923

Root MSE 0.266290

R-Square 0.9827

Coefficient of Variation 0.3393

Regression DF Type I Sum of Squares R-Square F Value Pr > F

Linear 2 10.042955 0.3494 70.81 <.0001

Quadratic 2 17.953749 0.6246 126.59 <.0001

Crossproduct 1 0.250000 0.0087 3.53 0.1025

Total Model 5 28.246703 0.9827 79.67 <.0001

Residual       DF    Sum of Squares    Mean Square    F Value    Pr > F
Lack of Fit     3          0.284373       0.094791       1.79    0.2886
Pure Error      4          0.212000       0.053000
Total Error     7          0.496373       0.070910

Collectively, the linear and quadratic components of the model are significant, but the crossproduct (interaction) term is not.

Lack of fit is non-significant, indicating that a more complex model is not needed.


Parameter    DF    Estimate    Standard Error    t Value    Pr > |t|    Parameter Estimate from Coded Data

Intercept 1 -1430.688438 152.851334 -9.36 <.0001 79.939955

v1 1 7.808865 1.157823 6.74 0.0003 1.407001

v2 1 13.271745 1.484606 8.94 <.0001 0.728497

v1*v1 1 -0.055058 0.004039 -13.63 <.0001 -2.752067

v2*v1 1 0.010000 0.005326 1.88 0.1025 0.499849

v2*v2 1 -0.040053 0.004039 -9.92 <.0001 -2.002067

Factor    DF    Sum of Squares    Mean Square    F Value    Pr > F    Label

v1 3 21.344008 7.114669 100.33 <.0001 original_variable_1

v2 3 9.345251 3.115084 43.93 <.0001 original_variable_2

From the uncoded parameter estimates, the fitted second order model is:

Yield = -1430.69 + 7.809(v1) + 13.272(v2) - 0.0551(v1²) + 0.0100(v1·v2) - 0.0401(v2²)

Even though some of the components of the model are non-significant, a common rule of thumb in RSM is to include all parameters.

Canonical Analysis of Response Surface Based on Coded Data

Factor    Critical Value, Coded    Critical Value, Uncoded    Label
v1                     0.275269                  86.946152    original_variable_1
v2                     0.216299                 176.529233    original_variable_2

Predicted value at stationary point: 80.212393

Eigenvalues      Eigenvectors
                       v1            v2
-1.926415        0.289717      0.957112
-2.827719        0.957112     -0.289717

Stationary point is a maximum.

The maximum yield is obtained when v1=86.95 and v2=176.53.
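The stationary point is found by setting the partial derivatives of the fitted model to zero: 7.809 - 2(0.0551)v1 + 0.0100v2 = 0 and 13.272 + 0.0100v1 - 2(0.0401)v2 = 0, which give v1 ≈ 86.9 and v2 ≈ 176.5. Because both eigenvalues (-1.93 and -2.83) are negative, the stationary point is a maximum.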


[Figure: response surface plot from PROC RSREG with all parameters retained in the model]


3. Box-Behnken Design (BBD)

Constructed by combining two-level factorial designs with incomplete block designs, and then adding a specified number of center points.

The BBD is used to fit second order models.

The figure below shows the points in space for the BBD, including replicated center points.

[Figure: Box-Behnken design points for three factors. Image source: http://ars.sciencedirect.com/content/image/1-s2.0-S0584854705000224-gr1.jpg]


An example of coefficients for a three factor BBD

Runs A B C

1 -1 -1 0

2 +1 -1 0

3 -1 +1 0

4 +1 +1 0

5 -1 0 -1

6 +1 0 -1

7 -1 0 +1

8 +1 0 +1

9 0 -1 -1

10 0 +1 -1

11 0 -1 +1

12 0 +1 +1

13 0 0 0

14 0 0 0

15 0 0 0

16 0 0 0

17 0 0 0

The BBD can handle more than three factors, but additional treatments are needed. The option of using blocks can also be incorporated.

Factors BBD Runs (Center points) BBD Blocks

3 17 (5) 1

4 29 (5) 1 or 3

5 46 (6) 1 or 2

6 54 (6) 1 or 2

7 62 (6) 1 or 2
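The run counts follow from the design points plus the center points shown in parentheses: for 3 factors, the 12 edge runs in the table above plus 5 center points give 17 runs; for 4 factors, 24 + 5 = 29; for 5 factors, 40 + 6 = 46; for 6 factors, 48 + 6 = 54; and for 7 factors, 56 + 6 = 62.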