12/17/2015 1 linear regression dr. a. emamzadeh. 2 what is regression? what is regression? given n...
TRANSCRIPT
04/21/231
Linear RegressionLinear Regression
Dr. A. Emamzadeh
2
What is Regression?
What is regression? Given n data points
),( , ... ),,(),,( 2211 nn yxyx yx
best fit )(xfy to the data. The best fit is generally based onminimizing the sum of the square of the
residuals, rS
Residual at a point is
)( iii xfy
n
i
iir xfyS1
2))((),( 11 yx
),( nn yx
)(xfy
Figure. Basic model for regression
Sum of the square of the residuals
.
3
Linear Regression-Criterion#1
),( , ... ),,(),,( 2211 nn yxyx yx
Given n data points
best fit xaay 10
to the data.
Does minimizing
n
ii
1
work as a criterion, where )( 10 iii xaay
x
iiixaay
10
11, yx
22, yx
33, yx
nnyx ,
iiyx ,
iiixaay
10
y
Figure. Linear regression of y vs. x data showing residuals at a typical
point, xi .
4
Example for Criterion#1
xy
2.04.0
3.06.0
2.06.0
3.08.0
Example: Given the data points (2,4), (3,6), (2,6) and (3,8), best fit the data to a straight line using Criterion#1
Figure. Data points for y vs. x data.
Table. Data Points
0
2
4
6
8
10
0 1 2 3 4
x
y
5
Linear Regression-Criteria#1
04
1
i
i
xyypredictedε = y - ypredicted
2.04.04.00.0
3.06.08.0-2.0
2.06.04.02.0
3.08.08.00.0
Table. Residuals at each point for regression model y = 4x – 4.
Figure. Regression curve for y=4x-4, y vs. x data
0
2
4
6
8
10
0 1 2 3 4
x
y
Using y=4x-4 as the regression curve
6
Linear Regression-Criteria#1
xyypredictedε = y - ypredicted
2.04.06.0-2.0
3.06.06.00.0
2.06.06.00.0
3.08.06.02.0
04
1
i
i
0
2
4
6
8
10
0 1 2 3 4
x
y
Table. Residuals at each point for y=6
Figure. Regression curve for y=6, y vs. x data
Using y=6 as a regression curve
7
Linear Regression – Criterion #1
04
1
i
i for both regression models of y=4x-4 and y=6.
The sum of the residuals is as small as possible, that is zero, but the regression model is not unique.
Hence the above criterion of minimizing the sum of the residuals is a bad criterion.
8
Linear Regression-Criterion#2
x
iiixaay
10
11, yx
22, yx
33, yx
nnyx ,
iiyx ,
iiixaay
10
y
Figure. Linear regression of y vs. x data showing residuals at a typical
point, xi .
Will minimizing
n
ii
1
work any better?
9
Linear Regression-Criteria 2
xyypredicted|ε| = |y - ypredicted|
2.04.04.00.0
3.06.08.02.0
2.06.04.02.0
3.08.08.00.0
0
2
4
6
8
10
0 1 2 3 4
x
y
Table. The absolute residuals employing the y=4x-4 regression model
Figure. Regression curve for y=4x-4, y vs. x data
44
1
i
i
Using y=4x-4 as the regression curve
10
Linear Regression-Criteria#2
xyypredicted|ε| = |y – ypredicted|
2.04.06.02.0
3.06.06.00.0
2.06.06.00.0
3.08.06.02.0
44
1
i
i
Table. Absolute residuals employing the y=6 model
0
2
4
6
8
10
0 1 2 3 4
x
y
Figure. Regression curve for y=6, y vs. x data
Using y=6 as a regression curve
11
Linear Regression-Criterion#2
Can you find a regression line for which
44
1
i
i and has unique
regression coefficients?
for both regression models of y=4x-4 and y=6.
The sum of the errors has been made as small as possible, that is 4, but the regression model is not unique.
Hence the above criterion of minimizing the sum of the absolute value of the residuals is also a bad criterion.
44
1
i
i
12
Least Squares Criterion
The least squares criterion minimizes the sum of the square of the residuals in the model, and also produces a unique line.
2
110
1
2
n
iii
n
iir xaayS
x
iiixaay
10
11, yx
22, yx
33, yx
nnyx ,
iiyx ,
iiixaay
10
y
Figure. Linear regression of y vs. x data showing residuals at a typical
point, xi .
13
Finding Constants of Linear Model
2
110
1
2
n
iii
n
iir xaayS Minimize the sum of the square of the
residuals:To find
0121
100
n
iii
r xaaya
S
021
101
n
iiii
r xxaaya
S
giving
i
n
iii
n
ii
n
i
xyxaxa
1
2
11
10
0a and
1a we minimize
with respect to
1a 0aandrS
.
n
iii
n
i
n
i
yxaa11
11
0
)( 10 xaya
14
Finding Constants of Linear Model
0a
Solving for
2
11
2
1111
n
ii
n
ii
n
ii
n
ii
n
iii
xxn
yxyxn
a
and
2
11
2
1111
2
0
n
ii
n
ii
n
iii
n
ii
n
ii
n
ii
xxn
yxxyx
a
1aand directly yields,
)( 10 xaya
15
Example 1
The torque, T needed to turn the torsion spring of a mousetrap through an angle, is given below.
Angle, θTorque, T
RadiansN-m
0.6981320.188224
0.9599310.209138
1.1344640.230052
1.5707960.250965
1.9198620.313707
Table: Torque vs Angle for a torsional spring
Find the constants for the model given by21 kkT
Figure. Data points for Angle vs. Torque data
0.1
0.2
0.3
0.4
0.5 1 1.5 2
θ (radians)
To
rqu
e (
N-m
)
16
Example 1 cont.
1a
The following table shows the summations needed for the calculations of the constants in the regression model.
2 TRadiansN-mRadians2N-m-Radians
0.6981320.1882240.4873880.131405
0.9599310.2091380.9214680.200758
1.1344640.2300521.28700.260986
1.5707960.2509652.46740.394215
1.9198620.3137073.68590.602274
6.28311.19218.84911.5896
Table. Tabulation of data for calculation of important
5
1i
5nUsing equations described for
25
1
5
1
2
5
1
5
1
5
12
ii
ii
ii
ii
iii
n
TTnk
2)2831.6()8491.8(5
)1921.1)(2831.6()5896.1(5
-210 6091.9 N-m/rad
summations
0aT
and
with
17
Example 1 cont.
n
TT i
i
5
1_
Use the average torque and average angle to calculate
1k
_
2
_
1 kTk
ni
i
5
1_
5
1921.1
1103842.2
5
2831.6
2566.1
Using,
)2566.1)(106091.9(103842.2 21 1101767.1 N-m
18
Example 1 Results
0.1
0.2
0.3
0.4
0.5 1 1.5 2
(radians)
To
rqu
e (N
-m)
T = 0.11767 + 0.096091
Figure. Linear regression of Torque versus Angle data
Using linear regression, a trend line is found from the data
Can you find the energy in the spring if it is twisted from 0 to 180 degrees?
19
Example 2
StrainStress
)%((MPa)
00
0.183306
0.36612
0.5324917
0.7021223
0.8671529
1.02441835
1.17742140
1.3292446
1.4792752
1.52767
1.562896
To find the longitudinal modulus of composite, the following data is collected. Find the longitudinal modulus, Table. Stress vs. Strain data
E using the regression model E and the sum of the square of
the
0.0E+00
1.0E+09
2.0E+09
3.0E+09
0 0.005 0.01 0.015 0.02
Strain, ε (m/m)
Str
ess,
σ (
Pa)
residuals.
Figure. Data points for Stress vs. Strain data
20
Example 2 cont.
iii E
Residual at each point is given by
The sum of the square of the residuals then is
n
iirS
1
2
n
iii E
1
2
0)(21
i
n
iii
r EE
S
Differentiate with respect to
E
n
ii
n
iii
E
1
2
1
Therefore
21
Example 2 cont.
iεσε 2εσ
10.00000.00000.00000.0000
21.8300x10-33.0600x1083.3489x10-65.5998x105
33.6000x10-36.1200x1081.2960x10-52.2032x106
45.3240x10-39.1700x1082.8345x10-54.8821x106
57.0200x10-31.2230x1094.9280x10-58.5855x106
68.6700x10-31.5290x1097.5169x10-51.3256x107
71.0244x10-21.8350x1091.0494x10-41.8798x107
81.1774x10-22.1400x1091.3863x10-42.5196x107
91.3290x10-22.4460x1091.7662x10-43.2507x107
101.4790x10-22.7520x1092.1874x10-44.0702x107
111.5000x10-22.7670x1092.2500x10-44.1505x107
121.5600x10-22.8960x1092.4336x10-44.5178x107
1.2764x10-32.3337x108
Table. Summation data for regression model
12
1i
12
1
32 102764.1i
i
With
and
12
1
8103337.2i
ii
Using
12
1
2
12
1
ii
iii
E
3
10
102764.1
103337.2
GPa83.182
22
Example 2 Results
823.182The equation
Figure. Linear regression for Stress vs. Strain data
describes the data.
0.00E+00
5.00E+08
1.00E+09
1.50E+09
2.00E+09
2.50E+09
3.00E+09
3.50E+09
0 0.005 0.01 0.015 0.02
Strain ε, (m/m)
Str
ess σ
, (P
a) 823.182
04/21/2323
Nonlinear RegressionNonlinear Regression
Dr. A. Emamzadeh
24
Nonlinear Regression
Given n data points
),( , ... ),,(),,( 2211 nn yxyx yx best fit )(xfy
to the data, where
)(xf is a nonlinear function of
x .
Figure. Nonlinear regression model for discrete y vs. x data
)(xfy
),(nn
yx
),(11
yx
),(22
yx
),(ii
yx
)(ii
xfy
25
Nonlinear Regression
)( bxaey
)( baxy
xb
axy
Some popular nonlinear regression models:
1. Exponential model:2. Power model:
3. Saturation growth model:4. Polynomial model:
)( 10m
mxa...xaay
26
Exponential Model
),( , ... ),,(),,( 2211 nn yxyx yxGiven
best fit
bxaey to the data.
Figure. Exponential model of nonlinear regression for y vs. x data
bxaey
),(nn
yx
),(11
yx
),(22
yx
),(ii
yx
)(ii
xfy
27
and
Finding constants of Exponential Model
n
i
bx
ir iaeyS
1
2
a
The sum of the square of the residuals is defined as
Differentiate with respect to
b
021
ii bxn
i
bxi
r eaeya
S
021
ii bxi
n
i
bxi
r eaxaeyb
S
28
Finding constants of Exponential Model
Rewriting the equations, we obtain
01
2
1
n
i
bxn
i
bxi
ii eaey
01
2
1
n
i
bxi
n
i
bxii
ii exaexy
29
Finding constants of Exponential Model
Substituting
a back into the previous equation
01
2
1
2
1
1
n
i
bxin
i
bx
bxn
ii
bxi
n
ii
i
i
i
i exe
eyexy
The constant
b can be found through numerical methods suchas the bisection method or secant method.
Nonlinear equation in terms of
b
n
i
bx
n
i
bxi
i
i
e
eya
1
2
1
Solving the first equation for
a yields
30
Example 1-Exponential Model
t(hrs)013579
1.0000.8920.7080.5620.4470.355
Many patients get concerned when a test involves injection of a radioactive material. For example for scanning a gallbladder, a few drops of Technetium-99m isotope is used. Half of the techritium-99m would be gone in about 6 hours. It, however, takes about 24 hours for the radiation levels to reach what we are exposed to in day-to-day activities. Below is given the relative intensity of radiation as a function of time.
Table. Relative intensity of radiation as a function
of time
0
0.5
1
0 5 10
Time t, (hours)
Relative intensity of radiation, γ
Figure. Data points of relative radiation intensity vs. time
31
Example 1-Exponential Model cont.
Find: a) The value of the regression
constants A an
db) The half-life of Technium-99m
c) Radiation intensity after 24 hours
The relative intensity is related to time by the equation tAe
32
Example 1-Exponential Model cont.
The value for
is found by solving the nonlinear equation
01
2
1
2
1
1
n
i
tin
i
t
n
i
ti
ti
n
ii
i
i
i
i ete
eetf
The value for
A is then found by solving
n
i
t
n
i
ti
i
i
e
eA
1
2
1
Numerical methods such as bisection method, or secant method are the best method to solve for
33
Example 1-Exponential Model cont.
Using bisection method, we may estimate the initial bracket for ][ 0.0770160.23105,The table below shows
)(f evaluated at
)( 0.23105f .
123456
013579
10.8910.7080.5620.4470.355
0.000000.707191.06200.885090.620870.39937
1.000000.707190.354000.177020.088700.04437
1.000000.629960.250000.099210.039370.01562
0.000000.629960.750000.496060.275600.14062
3.67452.37132.03422.2922
itiet 2
ite 2
Table. Summation value for calculation of constants of
model it
ieit
ii et iii
t
6
1i
to be
34
Example 1-Exponential Model cont.
6745.323105.06
1
it
ii
i et
From Table,
3713.223105.06
1
it
iie
0342.223105.026
1
it
i
e
2922.223105.026
1
it
iiet
2922.20342.2
3713.26745.323105.0 f 0024.1
35
Example 1-Exponential Model cont.
Using the same procedure with
0.077016λ we find
39201.00077016.0 f
Since 00077016.023105.0 ff the value of
λ falls in
the bracket
0.00770160.23105,
Continuing the bisection method, we eventually find that the root of 0f is 11508.0
This was calculated after twenty iterations with an absolute relative approximate error of 0.0002%.
.
36
Example 1-Exponential Model cont.
The value
A can now be calculated
6
1
2
6
1
i
t
i
ti
i
i
e
eA
)9)(11508.0(2)7)(11508.0(2)5)(11508.0(2
)3)(11508.0(2)1)(11508.0(2)0)(11508.0(2
)9(11508.0)7(11508.0)5(11508.0
)3(11508.0)1(11508.0)0(11508.0
355.0447.0562.0
708.0891.01
eee
eee
eee
eee
99983.0
The exponential regression model then is
te 11508.0 99983.0
37
Example 1-Exponential Model cont.
Resulting model
Figure. Relative intensity of radiation as a function of time using an exponential regression model.
te 11508.0 99983.0
0
0.5
1
0 5 10
Time, t (hrs)
Relative Intensity
of Radiation,
te 11508.0 99983.0
38
Example 1-Exponential Model cont.
b) Half life of Technetium 99-m is when 02
1
t
hourst
t
e
ee
t
t
0232.6
)5.0ln(11508.0
5.0
99983.02
199983.0
11508.0
)0(11508.011508.0
c) The relative intensity of radiation after 24 hours 2411508.099983.0 e
2103160.6 This result implies that only
%3171.610099983.0
103160.6 2
of the initial
radioactive intensity is left after 24 hours.
39
Polynomial Model
),( , ... ),,(),,( 2211 nn yxyx yxGiven best fit
m
mxa...xaay
10
)2( nm to a given data set.
Figure. Polynomial model for nonlinear regression of y vs. x data
m
mxaxaay
10
),(nn
yx
),(11
yx
),(22
yx
),(ii
yx
)(ii
xfy
40
Polynomial Model cont.
The residual at each data point is given bymimiii xaxaayE ...10
The sum of the square of the residuals then is
n
i
mimii
n
iir
xaxaay
ES
1
2
10
1
2
...
41
Polynomial Model cont.
To find the constants of the polynomial model, we set the derivatives with respect to ia wher
e
0)(....2
0)(....2
0)1(....2
110
110
1
110
0
mi
n
i
mimii
m
r
i
n
i
mimii
r
n
i
mimii
r
xxaxaaya
S
xxaxaaya
S
xaxaaya
S
,,1 mi equal to zero.
42
Polynomial Model cont.
These equations in matrix form are given by
n
ii
mi
n
iii
n
ii
mn
i
mi
n
i
mi
n
i
mi
n
i
mi
n
ii
n
ii
n
i
mi
n
ii
yx
yx
y
a
a
a
xxx
xxx
xxn
1
1
1
1
0
1
2
1
1
1
1
1
1
2
1
11
......
...
...........
...
...
The above equations are then solved for
maaa ,,, 10
43
Example 2-Polynomial Model
Temperature, T(oF)
Coefficient of thermal
expansion, α (in/in/oF)
806.47x10-6
406.24x10-6
-405.72x10-6
-1205.09x10-6
-2004.30x10-6
-2803.33x10-6
-3402.45x10-6
Regress the thermal expansion coefficient vs. temperature data to a second order polynomial.
1.00E-06
2.00E-06
3.00E-06
4.00E-06
5.00E-06
6.00E-06
7.00E-06
-400 -300 -200 -100 0 100 200
Temperature, oF
Th
erm
al e
xpan
sio
n c
oef
fici
ent,
α
(in
/in
/oF
)
Table. Data points for temperature vs
Figure. Data points for thermal expansion coefficient vs temperature.
α
44
Example 2-Polynomial Model cont.
2210 TaTaaα
We are to fit the data to the polynomial regression model
n
iii
n
iii
n
ii
n
ii
n
ii
n
ii
n
ii
n
ii
n
ii
n
ii
n
ii
T
T
a
a
a
TTT
TTT
TTn
1
2
1
1
2
1
0
1
4
1
3
1
2
1
3
1
2
1
1
2
1
The coefficients
210 , a,aa are found by differentiating the sum of thesquare of the residuals with respect to each variable and
setting thevalues equal to zero to obtain
45
Example 2-Polynomial Model cont.
The necessary summations are as follows
Temperature, T(oF)
Coefficient of thermal expansion,
α (in/in/oF)
806.47x10-6
406.24x10-6
-405.72x10-6
-1205.09x10-6
-2004.30x10-6
-2803.33x10-6
-3402.45x10-6
Table. Data points for temperature vs.
α 57
1
2 105580.2 i
iT
77
1
3 100472.7 i
iT
107
1
4 101363.2
i
iT
57
1
103600.3
i
i
37
1
106978.2
i
iiT
17
1
2 105013.8
i
iiT
46
Example 2-Polynomial Model cont.
1
3
5
2
1
0
1075
752
52
105013.8
106978.2
103600.3
101363.2100472.7105800.2
100472.7105800.210600.8
105800.2106000.80000.7
a
a
a
Using these summations, we can now calculate
210 , a,aa
Solving the above system of simultaneous linear equations we have
11
9
6
2
1
0
102218.1
102782.6
100217.6
a
a
a
The polynomial regression model is then
21196
2210
T101.2218T106.2782106.0217
α
TaTaa
47
Linearization of Data
To find the constants of many nonlinear models, it results in solving simultaneous nonlinear equations. For mathematical convenience, some of the data for such models can be linearized. For example, the data for an exponential model can be linearized.As shown in the previous example, many chemical and physical processes are governed by the equation,
bxaey Taking the natural log of both sides yields, bxay lnln
Let yz ln and
aa ln0
(implying)
oaea with
ba 1
We now have a linear regression model where
xaaz 10
48
Linearization of data cont.
Using linear model regression methods,
_
1
_
0
1
2
1
2
11 11
xaza
xxn
zxzxna
n
i
n
iii
n
ii
n
i
n
iiii
Once 1,aao are found, the original constants of the model are found as
0
1
aea
ab
49
Example 3-Linearization of data
t(hrs)013579
1.0000.8920.7080.5620.4470.355
Many patients get concerned when a test involves injection of a radioactive material. For example for scanning a gallbladder, a few drops of Technetium-99m isotope is used. Half of the techritium-99m would be gone in about 6 hours. It, however, takes about 24 hours for the radiation levels to reach what we are exposed to in day-to-day activities. Below is given the relative intensity of radiation as a function of time.Table. Relative intensity of radiation as a function
of time
0
0.5
1
0 5 10
Time t, (hours)
Relative intensity of radiation, γ
Figure. Data points of relative radiation intensity vs. time
50
Example 3-Linearization of data cont.
Find: a) The value of the regression
constants A an
db) The half-life of Technium-99m
c) Radiation intensity after 24 hours
The relative intensity is related to time by the equation tAe
51
Example 3-Linearization of data cont.
tAe Exponential model given as,
tA lnln
Assuming
lnz , Aao ln and 1a we obtaintaaz
10
This is a linear relationship between
z and t
52
Example 3-Linearization of data cont.
Using this linear relationship, we can calculate
10 , aa
n
i
n
ii
n
ii
n
i
n
iiii
ttn
ztztna
1
2
1
2
1
11 1
1
and
taza 10
where
1a
0a
eA
53
Example 3-Linearization of Data cont.
123456
013579
10.8910.7080.5620.4470.355
0.00000-0.11541-0.34531-0.57625-0.80520-1.0356
0.0000-0.11541-1.0359-2.8813-5.6364-9.3207
0.00001.00009.000025.00049.00081.000
25.000-2.8778-18.990165.00
Summations for data linearization are as follows
Table. Summation data for linearization of data model
i it iii
z ln iizt 2
it
With 6n
000.256
1
i
it
6
1
8778.2i
iz
6
1
990.18i
iizt
00.1656
1
2 i
it
54
Example 3-Linearization of Data cont.
Calculating 10 ,aa
21
2500.1656
8778.225990.186
a 11505.0
6
2511505.0
6
8778.20
a
4106150.2
Since Aa ln0 0aeA
4106150.2 e 99974.0
11505.01 aalso
55
Example 3-Linearization of Data cont.
Resulting model is
te 11505.099974.0
0
0.5
1
0 5 10
Time, t (hrs)
Relative Intensity
of Radiation,
te 11505.099974.0
Figure. Relative intensity of radiation as a function of temperature using linearization of data model.
56
Example 3-Linearization of Data cont.
The regression formula is thente 11505.099974.0
b) Half life of Technetium 99 is when02
1
t
hourst
t
e
ee
t
t
0247.6
)5.0ln(11505.0
5.0
99974.02
199974.0
11508.0
)0(11505.011505.0
57
Example 3-Linearization of Data cont.
c) The relative intensity of radiation after 24 hours is then 2411505.099974.0 e
063200.0This implies that only
%3211.610099983.0
103200.6 2
of the radioactive
material is left after 24 hours.
58
Comparison
Comparison of exponential model with and without data linearization:
With data linearization(Example 3)
Without data linearization(Example 1)
A0.999740.99983
λ-0.11505-0.11508
Half-Life (hrs)6.02476.0232
Relative intensity after 2 hrs.
6.3160x10-26.3200x10-2
Table. Comparison for exponential model with and without data linearization.
The values are very similar so data linearization was suitable to find the constants of the nonlinear exponential model in this case.