2014. engineers often: regress data analysis fit to theory data reduction use the regression of...
TRANSCRIPT
Regression Statistics
2014
Engineers and Regression
Engineers often:Regress data
Analysis Fit to theory Data reduction
Use the regression of others
Antoine Equation DIPPR
0 2 4 6 8 1012141618-5
0
5
10
15
20
25
X
Y
We need to be able to report uncertainties associated with regression.
• Do the data fit the model?• What are the errors in the
prediction?• What are the errors in the
parameters?
Linear RegressionThere are two classes of regressions
Linear Non-linear
“Linear” refers to the parameters, not the functional dependence of the independent variable
You can use the Mathcad function “linfit” on linear equations
Linear RegressionThere are two classes of regressions
Linear Non-linear
“Linear” refers to the parameters, not the functional dependence of the independent variable
You can use the Mathcad function “linfit” on linear equations
cbxaxy 2
bxaey
543 T
e
T
d
T
c
T
bay
CT
BAy exp
bmxy
Quiz1.
2.
3.
4.
5.
Straight Line Model
0 2 4 6 8 10 12 14 16 18-5
0
5
10
15
20
25 residual (error)
“X” data“Y”
Measured
slope
Intercept
“Y” Predicted
Straight Line Model
residual (error)
iX iY iY ie“X” data
“Y” data
intercept 0.92291455 slope 0.516173934
1 2.749032178 1.439088483 1.3099436942 3.719910224 2.362003033 1.3579071923 0.925995017 3.284917582 -2.358922574 2.623482686 4.207832132 -1.584349455 6.539797342 5.130746681 1.4090506616 6.779909177 6.053661231 0.7262479467 4.946150401 6.976575781 -2.030425388 9.674178069 7.89949033 1.7746877399 7.61959821 8.82240488 -1.2028066710 7.650020996 9.745319429 -2.0952984311 11.514 10.66823398 0.84576602112 13.18285068 11.59114853 1.59170215213 13.28173635 12.51406308 0.76767327514 13.60444592 13.43697763 0.1674682915 12.79535218 14.35989218 -1.5645416 17.82374778 15.28280673 2.54094105617 14.55068379 16.20572128 -1.65503748
42.76608602 SSE
“Y” predicted
sum squared error
mean squared error
Number of fitted parameters: 2 for a two-parameter model
The R2 StatisticA useful statistic but not
definitive
Tells you how well the data fit the model.
It does not tell you if the model is correct.
How much of the distribution of the data about the mean is
described by the model.
Problems with R2
3 5 7 9 11 13 15 17 193
5
7
9
11
13
f(x) = 0.500090909090909 x + 3.00009090909091R² = 0.666542459508775
X
Y
3 5 7 9 11 13 15 17 193
5
7
9
11
13f(x) = 0.499909090909091 x + 3.00172727272727R² = 0.666707256898465
X
Y
3 5 7 9 11 13 15 17 193
5
7
9
11
13
f(x) = 0.499727272727273 x + 3.00245454545455R² = 0.666324041066559
X
Y
3 5 7 9 11 13 15 17 193
5
7
9
11
13
f(x) = 0.5 x + 3.00090909090909R² = 0.666242033727484
X
Y
Residuals (ei) should be normally distributed
3 5 7 9 11 13 15 17 193
5
7
9
11
13
f(x) = 0.500090909090909 x + 3.00009090909091R² = 0.666542459508775
X
Y1 2 3 4 5 6 7 8 9 10 11
-2.5-2
-1.5-1
-0.50
0.51
1.52
2.5
e
3 5 7 9 11 13 15 17 193
5
7
9
11
13
f(x) = 0.499727272727273 x + 3.00245454545455R² = 0.666324041066559
X
Y
1 2 3 4 5 6 7 8 9 10 11
-1.5-1
-0.50
0.51
1.52
2.53
3.5
e
3 5 7 9 11 13 15 17 193
5
7
9
11
13f(x) = 0.499909090909091 x + 3.00172727272727R² = 0.666707256898465
X
Y1 2 3 4 5 6 7 8 9 10 11
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
2.5
e
3 5 7 9 11 13 15 17 193
5
7
9
11
13
f(x) = 0.5 x + 3.00090909090909R² = 0.666242033727484
X
Y 1 2 3 4 5 6 7 8 9 10 11
-2.5
-2
-1.5
-1
-0.5
0
0.5
1
1.5
e
Residuals (ei) should be normally distributed
Other Questions About Fit
How well does the line fit the data at each point (not just the mean)?
In what range should the data lie (are there outliers)?
What are the confidence intervals on the slope and intercept?
20
15
10
5
020151050
Coefficient values ± 95% Confidence Intervala =3.0001 ± 2.54b =0.50009 ± 0.267
20
15
10
5
020151050
Coefficient values ± 95% Confidence Intervala =3.0017 ± 2.54b =0.49991 ± 0.267
Confidence Intervals from Example
Green is confidence intervalHow well does the model fit the data?
Red is the prediction bandWhere should the data fall? Can I throw out any points?
Statistics on the Slope/Intercept
When you fit data to a straight line, the slope and intercept are only estimates of the true slope and intercept.
(1-a)100%Confidence Intervals
Standard Errors
(slope)
(intercept)
Confidence Interval on the Prediction & Expected Range of Data
0 2 4 6 8 10 12 14 16 18-5
0
5
10
15
20
25
f(x) = 0.922914602046951 x + 0.516173934846863R² = 0.890424511795483
Data
Excel Trendline
Least Squares Fit
95% CI for Prediction
Expected Range of Data (95%)
X
Y
Given a specific value for x, , what is the error in ?
Confidence Interval on the Prediction
If I take more data, where will the data fall? (Where will be found if measured at ?)
Prediction Band:Expected Range of Data
Confidence Interval vs. Prediction Band
CONFIDENCE INTERVAL
“Mean Prediction Band”
Shows possible errors in where the line will go
Interval narrows with increasing number of data points
See equation on previous slide
PREDICTION BAND
“Single (another) Point Prediction Band”Shows where the data
should lie Interval does not
narrow much with increasing number of data points See equation on previous
slide
Good News:IGOR!
Igor has the equations already programmed and will
• Plot the confidence intervals• Plot the prediction bands• Print the confidence intervals on the slope and
intercept
Assignment
Answers
60
55
50
45
40
35
30
25
Dis
tanc
e (c
m)
1086420
Time (s)
Coefficient values ± 95% Confidence Intervala =30.504 ± 0.766b =2.1468 ± 0.143
Data Curve Fit Confidence Intervals Prediction bands
1.20
1.15
1.10
1.05
1.00
Cp
(kJ
kg-1
K-1
)
1000900800700600500400300
Temperature (K)
Coefficient values ± 95% Confidence IntervalK0 =1.0015 ± 0.0509K1 =7.2738e-005 ± 0.000169K2 =9.4048e-008 ± 1.29e-007
Data Curve Fit Confidence Intervals Prediction Bands
Example Using Excel
Generalized Linear Regression
Linear regression can be written in matrix form. X Y21 18624 21432 28847 42550 45559 53968 62274 67562 56250 45341 37030 274
Straight Line Model
Quadratic Model
Statistics with Matrices
ParameterConfidence Intervals
Predicted VariableConfidence Intervals
Standard Error of bi is the square root of the i-th diagonal term of the matrix