TRANSCRIPT
NON-LINEAR REGRESSION
Introduction
Quite often in regression a straight line is not the “best” model for explaining the
variation in the dependent variable.
A model that includes quadratic or higher order terms may be needed.
The number of possible comparisons is equal to the number of levels of a factor
minus one.
For example, if there are three levels of a factor, there are two possible comparisons.
Polynomials are equations in which each term is associated with a power of the independent variable (e.g., X, linear; X², quadratic; X³, cubic, etc.).
1st order comparisons measure linear relationships, 2nd order comparisons measure quadratic relationships, and 3rd order comparisons measure cubic relationships.
Example: Effect of row spacing on yield (bu/ac) of soybean.

                    Row spacing (inches)
Block      18      24      30      36      42    Block total
1        33.6    31.1    33.0    28.4    31.4        157.5
2        37.1    34.5    29.5    29.9    28.3        159.3
3        34.1    30.5    29.2    31.6    28.9        154.3
4        34.6    32.7    30.7    32.3    28.6        158.9
5        35.4    30.7    30.7    28.1    29.6        154.5
6        36.1    30.3    27.9    26.9    33.4        154.6
Total   210.9   189.8   181.0   177.2   180.2        939.1
Mean    35.15   31.63   30.17   29.53   30.03        31.30
Step 1. Determine the polynomials that can be included in the model
There are 5 row spacings, so we can use X, X², X³, and X⁴.
Step 2. Run the Stepwise Regression analysis.

Data soybean;
Input row block yield;
row2=row*row;
row3=row2*row;
row4=row3*row;
Datalines;
(data from the table above)
;
Proc STEPWISE;
Model yield = row row2 row3 row4;
Run;

Step 3: Determine which parameters should remain in the model (here, row and row2).

Step 4: Rerun the analysis using the parameters that contributed significantly to the model using the Proc REG command and test for lack of fit.

Proc REG;
Model yield = row row2 / LACKFIT;
Run;
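The Proc REG estimates can be checked outside SAS. Below is a minimal numpy sketch (Python, not part of the original SAS code) that fits the same quadratic model, yield = b0 + b1(row) + b2(row²), to the 30 observations in the table above by ordinary least squares:

```python
import numpy as np

# Soybean yield data from the table above: 6 blocks x 5 row spacings.
spacings = [18, 24, 30, 36, 42]
yields = [
    [33.6, 31.1, 33.0, 28.4, 31.4],
    [37.1, 34.5, 29.5, 29.9, 28.3],
    [34.1, 30.5, 29.2, 31.6, 28.9],
    [34.6, 32.7, 30.7, 32.3, 28.6],
    [35.4, 30.7, 30.7, 28.1, 29.6],
    [36.1, 30.3, 27.9, 26.9, 33.4],
]
x = np.array(spacings * 6, dtype=float)      # row spacing for each of the 30 plots
y = np.array(yields, dtype=float).ravel()    # yields in the same block-by-block order

# Fit yield = b0 + b1*row + b2*row^2 by least squares.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # approximately [52.037, -1.261, 0.0176], matching the Proc REG output
```

Because least squares is deterministic, any software fitting the same model to the same data reproduces the SAS parameter estimates.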
Fitted model: Yield = 52.04 - 1.261(row) + 0.0176(row²)
[Figure: Effect of row spacing on soybean yield. Yield (bu/ac, 28-36) plotted against row spacing (inches, 18-42).]
Analysis using stepwise regression
The STEPWISE Procedure
Model: MODEL1
Dependent Variable: yield
07:02 Thursday, June 21, 2012 1
Number of Observations Read 30
Number of Observations Used 30
Stepwise Selection: Step 1
Variable row Entered: R-Square = 0.4452 and C(p) = 9.8393
Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1         91.26667      91.26667     22.47   <.0001
Error             28        113.72300       4.06154
Corrected Total   29        204.98967

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept             37.47000          1.35192   3120.00200    768.18   <.0001
row                   -0.20556          0.04336     91.26667     22.47   <.0001
Bounds on condition number: 1, 1
Stepwise Selection: Step 2
Variable row2 Entered: R-Square = 0.6096 and C(p) = 1.2210
Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2        124.96000      62.48000     21.08   <.0001
Error             27         80.02967       2.96406
Corrected Total   29        204.98967

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept             52.03667          4.47218    401.29928    135.39   <.0001
row                   -1.26111          0.31526     47.42972     16.00   0.0004
row2                   0.01759          0.00522     33.69333     11.37   0.0023
Bounds on condition number: 72.429, 289.71
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.
Summary of Stepwise Selection

Step   Variable Entered   Variable Removed   Number Vars In   Partial R-Square   Model R-Square     C(p)   F Value   Pr > F
1      row                                   1                0.4452             0.4452           9.8393     22.47   <.0001
2      row2                                  2                0.1644             0.6096           1.2210     11.37   0.0023
Analysis using Proc REG and LACKFIT
Number of Observations Read 30
Number of Observations Used 30
Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             52.03667          4.47218     11.64     <.0001
row          1             -1.26111          0.31526     -4.00     0.0004
row2         1              0.01759          0.00522      3.37     0.0023

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2        124.96000      62.48000     21.08   <.0001
Error             27         80.02967       2.96406
  Lack of Fit      2          0.70133       0.35067      0.11   0.8958
  Pure Error      25         79.32833       3.17313
Corrected Total   29        204.98967

Root MSE          1.72165   R-Square   0.6096
Dependent Mean   31.30333   Adj R-Sq   0.5807
Coeff Var         5.49988
Lack of fit is non-significant (NS), indicating that the quadratic model fits the data adequately and is worth considering.
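The lack-of-fit partition can be verified by hand: pure error is the pooled within-treatment variation among the 6 blocks at each spacing, and lack of fit is whatever residual variation remains. A short Python check (not part of the SAS analysis) using the table above:

```python
import numpy as np

# Yields from the table above, one entry per row spacing (18-42 in), 6 blocks each.
data = {
    18: [33.6, 37.1, 34.1, 34.6, 35.4, 36.1],
    24: [31.1, 34.5, 30.5, 32.7, 30.7, 30.3],
    30: [33.0, 29.5, 29.2, 30.7, 30.7, 27.9],
    36: [28.4, 29.9, 31.6, 32.3, 28.1, 26.9],
    42: [31.4, 28.3, 28.9, 28.6, 29.6, 33.4],
}

# Pure error: pooled within-treatment sum of squares (5 spacings x 5 df = 25 df).
ss_pure = sum(np.sum((np.array(v) - np.mean(v)) ** 2) for v in data.values())
print(round(ss_pure, 3))  # ~79.328, matching the Pure Error line in the output

# Lack of fit: quadratic-model residual SS minus pure error (27 - 25 = 2 df).
ss_lof = 80.02967 - ss_pure
f_lof = (ss_lof / 2) / (ss_pure / 25)
print(round(f_lof, 2))  # ~0.11, clearly non-significant
```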
RESPONSE SURFACE REGRESSION OR MODELING (RSM)
Introduction
A form of multivariate non-linear regression in which the influences of several independent
(factor) variables on a dependent, or "response," variable are determined.
The goal of RSM is typically to optimize a response.
In most cases the relationship between the response and the independent variables is unknown;
thus, it is necessary to obtain an estimate of the effects.
Quite often, a "first order model" of the form

Y = β0 + Σ βi xi + ε

is determined.
If there is a non-linear effect, then a "second order model" of the form

Y = β0 + Σ βi xi + Σ βii xi² + ΣΣ βij xi xj + ε

is determined.
The least squares method is used to estimate the parameters in the models.
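Least squares chooses the β's that minimize the sum of squared deviations between observed and predicted responses. A small Python illustration (hypothetical data, generated from a known first-order model purely to show the mechanics):

```python
import numpy as np

# Hypothetical noise-free data from a known first-order model
# y = 2 + 3*x1 - 1.5*x2, used only to illustrate least-squares estimation.
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 10, 20)
x2 = rng.uniform(0, 10, 20)
y = 2 + 3 * x1 - 1.5 * x2

# Least squares: solve min ||Xb - y||^2 for b = (b0, b1, b2).
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # recovers [2.0, 3.0, -1.5]
```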
Common Results Obtained From RSM
1. Simple maximum
2. Simple minimum
3. Ridge
4. Saddle
Basic Steps in RSM
Step 1: Determine which parameters may have an influence on the variation in the
dependent variable.
o You may have known factors determined from previous experiments.
o You may need to conduct additional simple 2ⁿ factorial experiments to identify
additional parameters that may be important.
These types of 2ⁿ factorial experiments are considered low resolution.
Step 2: Determine which parameters you want to continue and whether or not you will
have a first order (no curvature) or second order (curvature) model.
o First order model: Y = β0 + Σ βi xi + ε
o Second order model: Y = β0 + Σ βi xi + Σ βii xi² + ΣΣ βij xi xj + ε
Step 3: Develop the Response Surface Model.
o This is an optimization process.
o Parameters can be best estimated if proper experimental designs (response surface
designs) are used.
o A common design for 2-factor models is the Central Composite Design.
o A common design for the 3-factor model is the Box-Behnken Design.
Step 4: Validate the model.
Cautions on Using Happenstance Data for RSM
It is often very tempting to use data we have already collected from an experiment and
perform RSM on these data; however, the results obtained may not be reliable.
Happenstance data are typically obtained from experiments where:
o The process being maximized is typically highly controlled so inputs and outputs
vary little.
o Inputs tend to be highly correlated.
Just as we do in many experiments where we are trying to detect differences between
treatment means, choice of experimental design is very important when setting up
experiments for RSM.
Experimental Design for RSM Experiments
o Some researchers use the One-Factor-At-a-Time (OFAT) approach, where they maximize
one factor, fix the factor at this level, and then maximize a second variable.
o For example, let's say we wish to conduct an experiment to determine the proper nitrogen
(N) fertilizer rate and plant population to maximize sugar yield in sugarbeet.
o The first step would be to determine the N rate that maximizes sugar yield. Using
this N rate, conduct another experiment to determine what plant population now
maximizes sugar yield.
o The problem with this type of approach is that you often end up with an
increasing ridge type of response. You may be on top of the ridge, but not at
the maximum on the plot.
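The ridge problem is easy to demonstrate numerically. The sketch below uses a hypothetical response surface (not real sugarbeet data) whose ridge runs diagonally, so improving one factor at a time stalls far below the true optimum:

```python
import numpy as np

# A hypothetical response surface with a diagonal ridge: response is highest
# when the two factors rise together (true optimum here is at x = y = 5).
def response(n, pop):
    return 100 - (n - pop) ** 2 - 0.05 * (n + pop - 10) ** 2

grid = np.linspace(0, 10, 101)

# One-factor-at-a-time: optimize factor 1 with factor 2 fixed at 0, then
# optimize factor 2 with factor 1 fixed at that "best" level.
n_best = grid[np.argmax(response(grid, 0))]
p_best = grid[np.argmax(response(n_best, grid))]
print(n_best, p_best)  # stalls near the origin, far from (5, 5)

# Searching both factors jointly finds the true optimum on the ridge.
N, P = np.meshgrid(grid, grid)
i = np.argmax(response(N, P))
print(N.flat[i], P.flat[i])  # (5.0, 5.0)
```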
1. Two-level factorial design with center points
Extremely useful when you believe there are interactions between factors.
Let's say we have a three-factor experiment, with two levels of each factor plus a center point:
o Time (minutes): 80, 90, and 100
o Temperature (°C): 140, 145, and 150
o Rate of adding chemical (mL/min): 4, 5, and 6
o The model would be:
Y = β0 + β1x1 + β2x2 + β3x3 + β12x1x2 + β13x1x3 + β23x2x3 + ε
o This first order model will not provide an estimate of curvature.
o A strict factorial would require us to use 27 treatments (3 x 3 x 3) or runs of
the experiment.
o Another option is available where we repeat the center points but not the
outside points of the cube.
o This design allows us to use only 12 runs, and experimental error is estimated
from the replicated center points.
o To simplify the analysis, the data are typically coded to ease model fitting and
coefficient interpretation.
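The usual coding maps each factor's low, center, and high levels to -1, 0, and +1. A minimal helper (illustrative, not from the course materials):

```python
def code_level(x, low, high):
    """Map a factor level onto the coded -1..+1 scale used in factorial designs."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (x - center) / half_range

# Factor levels from the example above: time 80-100 min, temp 140-150 C.
print(code_level(80, 80, 100))    # -1.0 (low level)
print(code_level(90, 80, 100))    #  0.0 (center point)
print(code_level(150, 140, 150))  # +1.0 (high level)
```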
Example of data for a 2³ factorial with center points (treatments are presented before
randomization)†.

Treatment   Time (min)   Temp (°C)   Rate (mL/min)   Yield (g)
1 80 (-1) 140 (-1) 4 (-1) 72.6
2 100 (+1) 140 (-1) 4 (-1) 82.5
3 80 (-1) 150 (+1) 4 (-1) 86.0
4 100 (+1) 150 (+1) 4 (-1) 75.9
5 80 (-1) 140 (-1) 6 (+1) 79.1
6 100 (+1) 140 (-1) 6 (+1) 82.1
7 80 (-1) 150 (+1) 6 (+1) 88.2
8 100 (+1) 150 (+1) 6 (+1) 79.0
9 90 (0) 145 (0) 5 (0) 87.1
10 90 (0) 145 (0) 5 (0) 85.7
11 90 (0) 145 (0) 5 (0) 87.8
12 90 (0) 145 (0) 5 (0) 84.2
†Data obtained from Anderson, M.J., and P.J. Whitcomb. 2005. RSM simplified, optimizing
processes using response surface methods for design of experiments. Productivity Press, New
York.
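One use of the replicated center points is a quick curvature check: if the first-order model were adequate, the mean of the center runs would fall near the mean of the cube (factorial) runs. A short check on the yields in the table above (a rough screening diagnostic, not the formal test):

```python
# Yields from the 2^3-with-center-points table above.
factorial_yields = [72.6, 82.5, 86.0, 75.9, 79.1, 82.1, 88.2, 79.0]  # cube points
center_yields = [87.1, 85.7, 87.8, 84.2]                             # replicated centers

cube_mean = sum(factorial_yields) / len(factorial_yields)
center_mean = sum(center_yields) / len(center_yields)

# A large gap between the center mean and the cube mean signals curvature
# that a first-order model cannot capture.
print(round(center_mean - cube_mean, 3))  # 5.525 -> appreciable curvature
```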
2. Central Composite Design
This design allows you to get a better estimate of curvature that may be occurring in
your model.
The model for a three factor study would be:
Y = β0 + β1x1 + β2x2 + β3x3 + β11x1² + β22x2² + β33x3² + β12x1x2 + β13x1x3 + β23x2x3 + ε
Example of data for a 2³ factorial with the Central Composite Design (treatments are
presented before randomization)†.

Treatment   Time (min)   Temp (°C)   Rate (mL/min)   Yield (g)
1 80 140 4 72.6
2 100 140 4 82.5
3 80 150 4 86.0
4 100 150 4 75.9
5 80 140 6 79.1
6 100 140 6 82.1
7 80 150 6 88.2
8 100 150 6 79.0
9 90 145 5 87.1
10 90 145 5 85.7
11 90 145 5 87.8
12 90 145 5 84.2
13 73 145 5 79.1
14 107 145 5 82.1
15 90 137 5 88.2
16 90 153 5 79.0
17 90 145 3.3 87.1
18 90 145 6.7 85.7
19 90 145 5 87.8
20 90 145 5 84.2
†Data obtained from Anderson, M.J., and P.J. Whitcomb. 2005. RSM simplified, optimizing
processes using response surface methods for design of experiments. Productivity Press, New
York.
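The axial (star) levels in the table (73/107 min, 137/153 °C, 3.3/6.7 mL/min) follow from the rotatable axial distance α = (2^k)^(1/4). A quick check in Python (assuming the rotatable choice of α, which matches the table's rounded values):

```python
# Rotatable axial distance for a CCD with k factors: alpha = (2**k) ** 0.25.
k = 3
alpha = (2 ** k) ** 0.25
print(round(alpha, 3))  # 1.682

# Axial points sit at center +/- alpha * half-range, e.g. for time
# (center 90 min, half-range 10 min):
low, high = 90 - alpha * 10, 90 + alpha * 10
print(round(low, 1), round(high, 1))  # 73.2 and 106.8 -> listed as 73 and 107
```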
Desirable Results from RSM
Main effects for the parameters entered in the model are all significant.
o If some parameters are not significant, drop them from the model and reanalyze
the data.
Lack of fit is non-significant.
o If lack of fit is significant, you will need to use a more complex model.
A minimum or maximum point is identified.
o If a minimum or maximum value is not obtained, then you will need to redo the
experiments with treatment levels that are closer to the optimum.
Using SAS for the Response Surface Analysis of CCD Data
SAS commands
options pageno=1;
data rsreg;
input v1 v2 x1 x2 yield;
label v1='original_variable_1' v2='original_variable_2' x1='coded_variable_1'
x2='coded_variable_2';
datalines;
80 170 -1 -1 76.5
80 180 -1 1 77
90 170 1 -1 78
90 180 1 1 79.5
85 175 0 0 79.9
85 175 0 0 80.3
85 175 0 0 80
85 175 0 0 79.7
85 175 0 0 79.8
92.07 175 1.414 0 78.4
77.93 175 -1.414 0 75.6
85 182.07 0 1.414 78.5
85 167.93 0 -1.414 77
;
ods rtf file='rsreg.rtf';
ods graphics on;
proc print label;
title 'Example of Response Surface Regression with Two Independent Variables';
run;
proc rsreg data=rsreg
plots(unpack)=surface(3d);
model yield =v1 v2/lackfit;
run;
ods graphics off;
ods rtf close;
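The Proc RSREG coefficients can be reproduced with ordinary least squares on the full second-order model. A numpy sketch (not part of the SAS program), assuming the two low axial runs carry coded value -1.414, since the minus signs appear to have been dropped in the printed datalines:

```python
import numpy as np

# The 13 runs from the datalines above (v1, v2, yield); the last four are the
# axial points of the central composite design.
runs = [
    (80, 170, 76.5), (80, 180, 77.0), (90, 170, 78.0), (90, 180, 79.5),
    (85, 175, 79.9), (85, 175, 80.3), (85, 175, 80.0), (85, 175, 79.7),
    (85, 175, 79.8), (92.07, 175, 78.4), (77.93, 175, 75.6),
    (85, 182.07, 78.5), (85, 167.93, 77.0),
]
v1, v2, y = (np.array(c, dtype=float) for c in zip(*runs))

# Full second-order model in the original (uncoded) units.
X = np.column_stack([np.ones_like(v1), v1, v2, v1**2, v1 * v2, v2**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 4))  # close to the Proc RSREG estimates, e.g. b[4] (v1*v2) ~ 0.01
```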
Example of Response Surface Regression with Two Independent Variables
Obs original_variable_1 original_variable_2 coded_variable_1 coded_variable_2 yield
1 80.00 170.00 -1.000 -1.000 76.5
2 80.00 180.00 -1.000 1.000 77.0
3 90.00 170.00 1.000 -1.000 78.0
4 90.00 180.00 1.000 1.000 79.5
5 85.00 175.00 0.000 0.000 79.9
6 85.00 175.00 0.000 0.000 80.3
7 85.00 175.00 0.000 0.000 80.0
8 85.00 175.00 0.000 0.000 79.7
9 85.00 175.00 0.000 0.000 79.8
10 92.07 175.00 1.414 0.000 78.4
11 77.93 175.00 -1.414 0.000 75.6
12 85.00 182.07 0.000 1.414 78.5
13 85.00 167.93 0.000 -1.414 77.0
The RSREG Procedure
Coding Coefficients for the Independent Variables
Factor Subtracted off Divided by
v1 85.000000 7.070000
v2 175.000000 7.070000
Response Surface for Variable yield

Response Mean                78.476923
Root MSE                      0.266290
R-Square                        0.9827
Coefficient of Variation        0.3393
Regression      DF   Type I Sum of Squares   R-Square   F Value   Pr > F
Linear           2               10.042955     0.3494     70.81   <.0001
Quadratic        2               17.953749     0.6246    126.59   <.0001
Crossproduct     1                0.250000     0.0087      3.53   0.1025
Total Model      5               28.246703     0.9827     79.67   <.0001

Residual        DF   Sum of Squares   Mean Square   F Value   Pr > F
Lack of Fit      3         0.284373      0.094791      1.79   0.2886
Pure Error       4         0.212000      0.053000
Total Error      7         0.496373      0.070910
Collectively, the linear and quadratic components of the model are significant, but the crossproduct (interaction) component is not.
Lack of fit is non-significant, indicating that a more complex model is not needed.
Parameter   DF       Estimate   Standard Error   t Value   Pr > |t|   Estimate from Coded Data
Intercept    1   -1430.688438       152.851334     -9.36     <.0001                  79.939955
v1           1       7.808865         1.157823      6.74     0.0003                   1.407001
v2           1      13.271745         1.484606      8.94     <.0001                   0.728497
v1*v1        1      -0.055058         0.004039    -13.63     <.0001                  -2.752067
v2*v1        1       0.010000         0.005326      1.88     0.1025                   0.499849
v2*v2        1      -0.040053         0.004039     -9.92     <.0001                  -2.002067
Factor   DF   Sum of Squares   Mean Square   F Value   Pr > F   Label
v1        3        21.344008      7.114669    100.33   <.0001   original_variable_1
v2        3         9.345251      3.115084     43.93   <.0001   original_variable_2
Fitted model (coded data): Yield = 79.94 + 1.41(v1) + 0.73(v2) - 2.75(v1²) + 0.50(v1v2) - 2.00(v2²)
Even though some of the components of the model are non-significant, a common rule of thumb in RSM is to include all parameters.
Canonical Analysis of Response Surface Based on Coded Data
Factor   Critical Value (Coded)   Critical Value (Uncoded)   Label
v1                     0.275269                  86.946152   original_variable_1
v2                     0.216299                 176.529233   original_variable_2
Predicted value at stationary point: 80.212393
               Eigenvectors
Eigenvalues         v1          v2
-1.926415      0.289717    0.957112
-2.827719      0.957112   -0.289717
Stationary point is a maximum.
The maximum yield is obtained when v1=86.95 and v2=176.53.
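The stationary point can be recovered directly from the reported uncoded coefficients by setting both partial derivatives of the fitted surface to zero. A short Python check (small differences from the SAS values come from rounding in the printed coefficients):

```python
import numpy as np

# Uncoded second-order coefficients reported by Proc RSREG above.
b1, b2 = 7.808865, 13.271745                    # linear terms (v1, v2)
b11, b12, b22 = -0.055058, 0.010000, -0.040053  # quadratic and crossproduct terms

# Stationary point: solve the gradient equations
#   dY/dv1 = b1 + 2*b11*v1 + b12*v2 = 0
#   dY/dv2 = b2 + b12*v1 + 2*b22*v2 = 0
A = np.array([[2 * b11, b12], [b12, 2 * b22]])
v_star = np.linalg.solve(A, [-b1, -b2])
print(np.round(v_star, 2))  # ~[86.95, 176.53], the critical values in the output

# The Hessian of the fitted surface is A; both eigenvalues are negative,
# confirming the stationary point is a maximum.
print(np.linalg.eigvalsh(A / 2))
```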
3. Box-Behnken Design (BBD)
Constructed by combining two-level factorial designs with incomplete block designs,
and then adding a specified number of center points.
The BBD is used to fit second order models.
The figure below shows the points in space for the BBD, including replicated center
points.
An example of coefficients for a three factor BBD
Runs A B C
1 -1 -1 0
2 +1 -1 0
3 -1 +1 0
4 +1 +1 0
5 -1 0 -1
6 +1 0 -1
7 -1 0 +1
8 +1 0 +1
9 0 -1 -1
10 0 +1 -1
11 0 -1 +1
12 0 +1 +1
13 0 0 0
14 0 0 0
15 0 0 0
16 0 0 0
17 0 0 0
The BBD can handle more than three factors, but additional runs are needed. The
option of using blocks also can be incorporated.
Factors BBD Runs (Center points) BBD Blocks
3 17 (5) 1
4 29 (5) 1 or 3
5 46 (6) 1 or 2
6 54 (6) 1 or 2
7 62 (6) 1 or 2
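For the smaller designs, the run counts in the table follow from the pairwise construction: each pair of factors contributes a 2² factorial (4 runs) while the remaining factors sit at their centers, plus the replicated center points. A sketch of that count (the 6- and 7-factor designs listed above use a more economical incomplete-block construction, so this simple formula applies only through 5 factors):

```python
from math import comb

def bbd_runs(k, center_points):
    """Edge-midpoint count for a Box-Behnken design: each pair of factors
    gets a 2^2 factorial (4 runs) while the other factors are held at 0."""
    return 4 * comb(k, 2) + center_points

# Matches the table above for 3-5 factors.
for k, c in [(3, 5), (4, 5), (5, 6)]:
    print(k, bbd_runs(k, c))  # 17, 29, 46
```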