![Page 1: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/1.jpg)
Regression: Motivation
One dimensional data
(Summary by Mean)
10 20 30 40 50
![Page 2: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/2.jpg)
| X | (X - a)² |
|---|---|
| 10 | (10 - a)² |
| 20 | (20 - a)² |
| 30 | (30 - a)² |
| 40 | (40 - a)² |
| 50 | (50 - a)² |
| Sum: 150 | T |

T is at its minimum when a = mean = 30.
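The claim that the mean minimizes T can be checked numerically. The following is a minimal sketch, assuming T(a) is the sum of the squared deviations in the table; the function name and the grid of candidate values are illustrative choices, not part of the slides.

```python
# One-dimensional data from the slide.
data = [10, 20, 30, 40, 50]


def sum_squared_deviations(a, values):
    """Return T(a) = sum of (x - a)^2 over all values."""
    return sum((x - a) ** 2 for x in values)


mean = sum(data) / len(data)

# Evaluate T(a) for a few candidate summaries (an arbitrary illustrative grid).
candidates = [20, 25, 30, 35, 40]
for a in candidates:
    print(a, sum_squared_deviations(a, data))

# The candidate with the smallest T coincides with the mean.
best = min(candidates, key=lambda a: sum_squared_deviations(a, data))
print("mean:", mean, "best candidate:", best)
```

Any candidate other than 30 gives a larger T, which is the sense in which the mean is "closest" to the data.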
![Page 3: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/3.jpg)
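Regression

| Estriol (mg/24 hr) | Birth Wt (g/100) | Estriol (mg/24 hr) | Birth Wt (g/100) |
|---|---|---|---|
| 7 | 25 | 17 | 30 |
| 9 | 25 | 17 | 32 |
| 9 | 25 | 17 | 36 |
| 12 | 27 | 18 | 35 |
| 14 | 27 | 18 | 37 |
| 14 | 30 | 19 | 31 |
| 15 | 32 | 19 | 34 |
| 15 | 34 | 20 | 38 |
| 15 | 34 | 21 | 30 |
| 15 | 35 | 22 | 40 |
| 16 | 27 | 24 | 28 |
| 16 | 24 | 24 | 43 |
| 16 | 30 | 25 | 32 |
| 16 | 31 | 25 | 39 |
| 16 | 32 | 27 | 34 |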
![Page 4: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/4.jpg)
Regression
• Concerns
– Data summarization (as in one-dimensional data)
– Prediction of low-birthweight babies (for special prenatal care to those at high risk)
![Page 5: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/5.jpg)
Scatter plot
[Figure: scatter plot of birth weight against estriol]
![Page 6: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/6.jpg)
Lines through scatter plot to represent the data
[Figure: scatter plot of birthweight (g/100) against estriol (mg/24 hr) with candidate lines labelled Line 2, Line 3, Line 4 and Line 5]
![Page 7: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/7.jpg)
Regression line: the best line
The best representation of the data
[Fig Reg 1.6: regression line through the scatter plot of birthweight (g/100) against estriol (mg/24 hr)]
![Page 8: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/8.jpg)
What is this with a line and numbers anyway?
• They can be the same information in two different forms or languages
• But lines require less space to record, are easier to remember, and are easy to comprehend
• A line can be a pictorial or a mathematical representation of numerical data
![Page 9: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/9.jpg)
• A line: Y = 2 + 3X
Slope = 3
Intercept = 2
(interpretation ??)

Numbers generated by the line

| x | y |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 8 |
| … | … |
| 50 | 152 |
| … | … |
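The table of numbers is simply the equation evaluated at chosen x values. A minimal sketch, assuming the line Y = 2 + 3X from the slide; the function name `line` and its parameter names are illustrative.

```python
def line(x, intercept=2, slope=3):
    """Return y on the straight line y = intercept + slope * x."""
    return intercept + slope * x


# Reproduce the rows shown on the slide.
for x in [0, 1, 2, 50]:
    print(x, line(x))
```

Here the intercept is the value of Y when X = 0, and the slope is the change in Y for each one-unit increase in X (from 2 to 5 to 8 in the first three rows).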
![Page 10: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/10.jpg)
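Representation of bivariate measurements in different forms
• Equation: Y = 2 + 3x
• Data/Numbers

| x | y |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 8 |
| … | … |
| 50 | 152 |
| … | … |

• Picture/Graph
[Figure: graph of the line on X and Y axes, with 0 and 3 marked on the X axis and 2 and 11 marked on the Y axis]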
![Page 11: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/11.jpg)
Straight lines
[Figure: a straight line on X and Y axes with its intercept marked]
[Figure: two straight lines with the same slope but different intercepts]
![Page 12: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/12.jpg)
Straight lines
[Figure: two straight lines with the same intercept but different slopes]
[Figure: straight line with zero slope and zero intercept, with the labels "Zero Slope" and "Zero Intercept"]
![Page 13: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/13.jpg)
Regression: what line will generate the data?
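| Estriol (mg/24 hr) | Birth Wt (g/100) | Estriol (mg/24 hr) | Birth Wt (g/100) |
|---|---|---|---|
| 7 | 25 | 17 | 30 |
| 9 | 25 | 17 | 32 |
| 9 | 25 | 17 | 36 |
| 12 | 27 | 18 | 35 |
| 14 | 27 | 18 | 37 |
| 14 | 30 | 19 | 31 |
| 15 | 32 | 19 | 34 |
| 15 | 34 | 20 | 38 |
| 15 | 34 | 21 | 30 |
| 15 | 35 | 22 | 40 |
| 16 | 27 | 24 | 28 |
| 16 | 24 | 24 | 43 |
| 16 | 30 | 25 | 32 |
| 16 | 31 | 25 | 39 |
| 16 | 32 | 27 | 34 |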
![Page 14: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/14.jpg)
Regression: what line will generate the data?
[Figure: scatter plot of birth weight against estriol]
![Page 15: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/15.jpg)
Which is the best line?
[Figure: scatter plot of birthweight (g/100) against estriol (mg/24 hr) with candidate lines labelled Line 1 to Line 5]
![Page 16: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/16.jpg)
The best line
Birthweight = 21.52 + 0.608 Estriol
[Figure: regression line through the scatter plot of birthweight (g/100) against estriol (mg/24 hr)]
![Page 17: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/17.jpg)
Computer output

Coefficients (dependent variable: BWEIGHT)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B, lower bound | 95% CI for B, upper bound |
|---|---|---|---|---|---|---|---|
| (Constant) | 21.523 | 2.620 | | 8.214 | .000 | 16.164 | 26.883 |
| ESTRIOL | .608 | .147 | .610 | 4.143 | .000 | .308 | .908 |
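The B column of the output can be reproduced by hand with the usual least-squares formulas. This is a minimal sketch in plain Python, assuming the 31 (estriol, birthweight) pairs are those listed in the observation table later in the deck; it does not reproduce the standard errors or confidence intervals, and the variable names are illustrative.

```python
# Estriol (mg/24 hr) and birthweight (g/100) for the 31 patients.
estriol = [7, 9, 9, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
           17, 17, 17, 18, 18, 19, 19, 20, 21, 22, 24, 24, 25, 25, 27]
bweight = [25, 25, 25, 27, 27, 30, 32, 34, 34, 35, 27, 24, 30, 31, 32, 35,
           30, 32, 36, 35, 37, 31, 34, 38, 30, 40, 28, 43, 32, 39, 34]

n = len(estriol)
x_bar = sum(estriol) / n
y_bar = sum(bweight) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in estriol)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, bweight))

# Least-squares slope and intercept.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(round(intercept, 3), round(slope, 3))  # 21.523 0.608
```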
![Page 18: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/18.jpg)
Regression
The Saga continues
![Page 19: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/19.jpg)
Out of curiosity
How did this accomplish what we wanted (i.e., data summarization and identifying women who might need special prenatal care)?
![Page 20: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/20.jpg)
1. We end up with the line Birthweight = 21.52 + 0.608 Estriol, hoping that this line will generate the original data.
2. In the univariate case, the mean is closest to the data in the sense that it minimizes the sum of squared deviations. In the same way, the regression line is the closest line to the data. In that sense it summarizes the data.
![Page 21: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/21.jpg)
Recall
One dimensional data
(Summary by Mean)
10 20 30 40 50
![Page 22: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/22.jpg)
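Recall

| X | (X - a)² | Bweight | (Bweight - L)² |
|---|---|---|---|
| 10 | (10 - a)² | 25 | (25 - L)² |
| 20 | (20 - a)² | 25 | (25 - L)² |
| 30 | (30 - a)² | 25 | (25 - L)² |
| 40 | (40 - a)² | 27 | (27 - L)² |
| 50 | (50 - a)² | … | … |

Mean = 30 minimizes the sum of (X - a)². L = 21.52 + 0.608 Estriol minimizes the sum of (Bweight - L)². This is the regression line.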
![Page 23: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/23.jpg)
Prediction
• Women who need special care
• If low birthweight is defined as < 2500 g (that is, 25 in the g/100 units used here), then women with an estriol level < 5.72 mg/24 hr are at high risk of having low-birthweight babies.
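The cutoff of 5.72 follows from setting the fitted birthweight equal to the low-birthweight limit and solving for estriol. The short derivation below assumes birthweight is in g/100, as on the plot axes, so that 2500 g corresponds to 25.

```latex
\begin{aligned}
25 &= 21.52 + 0.608 \times \text{Estriol} \\
\text{Estriol} &= \frac{25 - 21.52}{0.608} = \frac{3.48}{0.608} \approx 5.72
\end{aligned}
```

Because the slope is positive, estriol levels below this value give predicted birthweights below 25.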
![Page 24: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/24.jpg)
• So is everything fine and dandy?
• Not necessarily
– How closely does the regression line generate the data?
– How much is estriol responsible for birthweight?
– Was there something that would have better predicted women at risk?
![Page 25: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/25.jpg)
| Obs. No. (a) | Estriol (b) | Observed birthweight (c) | Birthweight generated from Line 1.1 (d) | Birthweight generated from Line 1.2 (e) | Squared difference from Line 1.1 [(c)-(d)]² | Squared difference from Line 1.2 [(c)-(e)]² |
|---|---|---|---|---|---|---|
| 1 | 7 | 25 | 20.5 | 25.776 | 20.25 | 0.6022 |
| 2 | 9 | 25 | 23.5 | 26.992 | 2.25 | 3.9681 |
| 3 | 9 | 25 | 23.5 | 26.992 | 2.25 | 3.9681 |
| 4 | 12 | 27 | 28.0 | 28.816 | 1.00 | 3.2979 |
| 5 | 14 | 27 | 31.0 | 30.032 | 16.00 | 9.1930 |
| 6 | 14 | 30 | 31.0 | 30.032 | 1.00 | 0.0010 |
| 7 | 15 | 32 | 32.5 | 30.640 | 0.25 | 1.8496 |
| 8 | 15 | 34 | 32.5 | 30.640 | 2.25 | 11.2896 |
| 9 | 15 | 34 | 32.5 | 30.640 | 2.25 | 11.2896 |
| 10 | 15 | 35 | 32.5 | 30.640 | 6.25 | 19.0096 |
| 11 | 16 | 27 | 34.0 | 31.248 | 49.00 | 18.0455 |
| 12 | 16 | 24 | 34.0 | 31.248 | 100.00 | 52.5335 |
| 13 | 16 | 30 | 34.0 | 31.248 | 16.00 | 1.5575 |
| 14 | 16 | 31 | 34.0 | 31.248 | 9.00 | 0.0615 |
| 15 | 16 | 32 | 34.0 | 31.248 | 4.00 | 0.5655 |
| 16 | 16 | 35 | 34.0 | 31.248 | 1.00 | 14.0775 |
| 17 | 17 | 30 | 35.5 | 31.856 | 30.25 | 3.4447 |
| 18 | 17 | 32 | 35.5 | 31.856 | 12.25 | 0.0207 |
| 19 | 17 | 36 | 35.5 | 31.856 | 0.25 | 17.1727 |
| 20 | 18 | 35 | 37.0 | 32.464 | 4.00 | 6.4313 |
| 21 | 18 | 37 | 37.0 | 32.464 | 0.00 | 20.5753 |
| 22 | 19 | 31 | 38.5 | 33.072 | 56.25 | 4.2932 |
| 23 | 19 | 34 | 38.5 | 33.072 | 20.25 | 0.8612 |
| 24 | 20 | 38 | 40.0 | 33.680 | 4.00 | 18.6624 |
| 25 | 21 | 30 | 41.5 | 34.288 | 132.25 | 18.3869 |
| 26 | 22 | 40 | 43.0 | 34.896 | 9.00 | 26.0508 |
| 27 | 24 | 28 | 46.0 | 36.112 | 324.00 | 65.8045 |
| 28 | 24 | 43 | 46.0 | 36.112 | 9.00 | 47.4445 |
| 29 | 25 | 32 | 47.5 | 36.720 | 240.25 | 22.2784 |
| 30 | 25 | 39 | 47.5 | 36.720 | 72.25 | 5.1984 |
| 31 | 27 | 34 | 50.5 | 37.936 | 272.25 | 15.4921 |
| Sum | 534.00 | 992.00 | 1111.00 | 992.00 | 1419.00 | 423.43 |
| Mean | 17.23 | 32.00 | 35.84 | 32.00 | - | - |
| Variance | 22.58 | 22.47 | 50.81 | 8.35 | - | - |
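The two sums of squared differences in the table can be recomputed to compare the lines. In this sketch, Line 1.2 is the regression line 21.52 + 0.608 Estriol from the slides; the equation used for Line 1.1, 10 + 1.5 Estriol, is an assumption inferred from the generated values in column (d) (20.5 at estriol 7, 23.5 at estriol 9) and is not stated in the deck.

```python
estriol = [7, 9, 9, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
           17, 17, 17, 18, 18, 19, 19, 20, 21, 22, 24, 24, 25, 25, 27]
bweight = [25, 25, 25, 27, 27, 30, 32, 34, 34, 35, 27, 24, 30, 31, 32, 35,
           30, 32, 36, 35, 37, 31, 34, 38, 30, 40, 28, 43, 32, 39, 34]


def sum_squared_differences(intercept, slope):
    """Sum of (observed - generated)^2 for the line intercept + slope * estriol."""
    return sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(estriol, bweight))


# Line 1.1: equation inferred from column (d); an assumption, see the lead-in.
sse_line_1_1 = sum_squared_differences(10.0, 1.5)
# Line 1.2: the regression line reported on the slides.
sse_line_1_2 = sum_squared_differences(21.52, 0.608)

print(round(sse_line_1_1, 2), round(sse_line_1_2, 2))  # 1419.0 423.43
```

The regression line has the smaller sum, which is what "closest line to the data" means here.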
![Page 26: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/26.jpg)
| E (estriol) | BW (observed) | Pred (predicted) | Diff (observed - predicted) |
|---|---|---|---|
| 7.00 | 25.00 | 25.78076 | -.78076 |
| 9.00 | 25.00 | 26.99714 | -1.99714 |
| 9.00 | 25.00 | 26.99714 | -1.99714 |
| 12.00 | 27.00 | 28.82171 | -1.82171 |
| 14.00 | 27.00 | 30.03810 | -3.03810 |
| 14.00 | 30.00 | 30.03810 | -.03810 |
| 15.00 | 32.00 | 30.64629 | 1.35371 |
| 15.00 | 34.00 | 30.64629 | 3.35371 |
| 15.00 | 34.00 | 30.64629 | 3.35371 |
| 15.00 | 35.00 | 30.64629 | 4.35371 |
| 16.00 | 27.00 | 31.25448 | -4.25448 |
| 16.00 | 24.00 | 31.25448 | -7.25448 |
| 16.00 | 30.00 | 31.25448 | -1.25448 |
| 16.00 | 31.00 | 31.25448 | -.25448 |
| 16.00 | 32.00 | 31.25448 | .74552 |
| 16.00 | 35.00 | 31.25448 | 3.74552 |
| 17.00 | 30.00 | 31.86267 | -1.86267 |
| 17.00 | 32.00 | 31.86267 | .13733 |
| 17.00 | 36.00 | 31.86267 | 4.13733 |
| 18.00 | 35.00 | 32.47086 | 2.52914 |
| 18.00 | 37.00 | 32.47086 | 4.52914 |
| 19.00 | 31.00 | 33.07905 | -2.07905 |
| 19.00 | 34.00 | 33.07905 | .92095 |
| 20.00 | 38.00 | 33.68724 | 4.31276 |
| 21.00 | 30.00 | 34.29543 | -4.29543 |
| 22.00 | 40.00 | 34.90362 | 5.09638 |
| 24.00 | 28.00 | 36.12000 | -8.12000 |
| 24.00 | 43.00 | 36.12000 | 6.88000 |
| 25.00 | 32.00 | 36.72819 | -4.72819 |
| 25.00 | 39.00 | 36.72819 | 2.27181 |
| 27.00 | 34.00 | 37.94457 | -3.94457 |
![Page 27: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/27.jpg)
How good is the regression
[Fig Reg 1.6: regression line through the scatter plot of birthweight (g/100) against estriol (mg/24 hr)]
![Page 28: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/28.jpg)
How good is the regression
• R² = 0.372
– Estriol explains about 37.2% of the variation in the birthweights. The remaining 62.8% is attributed to other factors.
– At estriol 16, we have several birthweights (24, 27, 30, 31, 32 and 35). If estriol were the only factor for birthweight, we would not see this variation.
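R² can be computed as one minus the ratio of the residual sum of squares to the total sum of squares about the mean. A minimal sketch, assuming the 31 data pairs from the deck and a least-squares fit computed from scratch; variable names are illustrative.

```python
estriol = [7, 9, 9, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
           17, 17, 17, 18, 18, 19, 19, 20, 21, 22, 24, 24, 25, 25, 27]
bweight = [25, 25, 25, 27, 27, 30, 32, 34, 34, 35, 27, 24, 30, 31, 32, 35,
           30, 32, 36, 35, 37, 31, 34, 38, 30, 40, 28, 43, 32, 39, 34]

n = len(estriol)
x_bar = sum(estriol) / n
y_bar = sum(bweight) / n
s_xx = sum((x - x_bar) ** 2 for x in estriol)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, bweight))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Variation left over after fitting the line (residual sum of squares).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(estriol, bweight))
# Total variation of birthweight about its mean.
sst = sum((y - y_bar) ** 2 for y in bweight)

r_squared = 1 - sse / sst
print(round(r_squared, 3))  # 0.372
```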
![Page 29: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/29.jpg)
How good is the regression
Regression line and 95% confidence intervals around predicted values
[Figure: birthweight against estriol, showing the observed birthweights, the regression line, and the upper and lower 95% confidence limits]
![Page 30: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/30.jpg)
Other factors
Multiple Regression
![Page 31: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/31.jpg)
Regression Diagnostics
Residual Analysis
![Page 32: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/32.jpg)
Diagnostics
• Residual for a patient (observation)
– Difference between the observed birthweight and the birthweight the regression line would generate (predict)
• Example (for the first patient)
– Observed birthweight = 25
– Generated = 21.52 + 0.608 × estriol = 21.52 + 0.608(7) = 25.776
– Residual = 25 - 25.776 = -0.776
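The same arithmetic applies to every patient. A minimal sketch using the rounded coefficients from the slide; the function names and the choice of the first three patients are illustrative.

```python
INTERCEPT = 21.52  # rounded coefficients as reported on the slide
SLOPE = 0.608


def predict(estriol):
    """Birthweight generated by the regression line for a given estriol level."""
    return INTERCEPT + SLOPE * estriol


def residual(estriol, observed_bweight):
    """Observed birthweight minus the birthweight generated by the line."""
    return observed_bweight - predict(estriol)


# First three patients from the data: (estriol, observed birthweight).
patients = [(7, 25), (9, 25), (9, 25)]
for e, bw in patients:
    print(e, bw, round(predict(e), 3), round(residual(e, bw), 3))
```

For the first patient this prints a generated value of 25.776 and a residual of -0.776, matching the worked example.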
![Page 33: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/33.jpg)
Diagnostics
• Residual plots
• Plot of residuals against predicted values
• For checking assumptions
– Normality, linearity and homoscedasticity
![Page 34: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/34.jpg)
[Figure: example residual plots showing non-normality, heteroscedasticity and nonlinearity]
![Page 35: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/35.jpg)
Diagnostics
• Residuals for influential patients (observations)
– Influence: the change in the estimated parameters (slope and intercept) when the analysis is redone without the patient in question
– Patients with high leverage and a large residual will have greater influence.
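Influence as described here can be measured directly by refitting the line with each patient left out in turn. This is a minimal leave-one-out sketch, assuming the 31 data pairs from the deck; the helper `fit_line` and the choice to rank patients by change in slope are illustrative, not a method taken from the slides.

```python
estriol = [7, 9, 9, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
           17, 17, 17, 18, 18, 19, 19, 20, 21, 22, 24, 24, 25, 25, 27]
bweight = [25, 25, 25, 27, 27, 30, 32, 34, 34, 35, 27, 24, 30, 31, 32, 35,
           30, 32, 36, 35, 37, 31, 34, 38, 30, 40, 28, 43, 32, 39, 34]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


full_intercept, full_slope = fit_line(estriol, bweight)

# Refit without each patient and record the change in the estimates.
influence = []
for i in range(len(estriol)):
    xs = estriol[:i] + estriol[i + 1:]
    ys = bweight[:i] + bweight[i + 1:]
    b0, b1 = fit_line(xs, ys)
    influence.append((i + 1, b0 - full_intercept, b1 - full_slope))

# Patient whose removal changes the slope the most (in absolute value).
most_influential = max(influence, key=lambda row: abs(row[2]))
print(most_influential)
```

Each tuple holds the observation number, the change in intercept and the change in slope when that patient is removed.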
![Page 36: Regression: Motivation One dimensional data (Summary by Mean) 10 20 30 40 50](https://reader036.vdocuments.us/reader036/viewer/2022062621/551c4ccf5503467b488b50a2/html5/thumbnails/36.jpg)
Diagnostics
• Standardized and studentized (or jackknife) residuals
– Large values of these residuals for a patient indicate an outlier
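These residuals can be computed from the raw residuals, the residual standard deviation and each patient's leverage. The sketch below uses the standard simple-regression formulas; naming conventions differ between packages, so the labels here (standardized = residual / s, studentized = adjusted for leverage, jackknife = studentized with the patient deleted) are an assumption about what the slide intends, and the cutoff of 2 is only a common rule of thumb.

```python
import math

estriol = [7, 9, 9, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
           17, 17, 17, 18, 18, 19, 19, 20, 21, 22, 24, 24, 25, 25, 27]
bweight = [25, 25, 25, 27, 27, 30, 32, 34, 34, 35, 27, 24, 30, 31, 32, 35,
           30, 32, 36, 35, 37, 31, 34, 38, 30, 40, 28, 43, 32, 39, 34]

n = len(estriol)
x_bar = sum(estriol) / n
y_bar = sum(bweight) / n
s_xx = sum((x - x_bar) ** 2 for x in estriol)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, bweight))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

residuals = [y - (intercept + slope * x) for x, y in zip(estriol, bweight)]
# Residual standard deviation with n - 2 degrees of freedom.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
# Leverage of each patient in simple linear regression.
leverage = [1 / n + (x - x_bar) ** 2 / s_xx for x in estriol]

standardized = [e / s for e in residuals]
studentized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
# Jackknife (deleted) residual obtained from the studentized residual.
jackknife = [r * math.sqrt((n - 3) / (n - 2 - r ** 2)) for r in studentized]

# Flag patients whose jackknife residual exceeds 2 in absolute value.
for i, t in enumerate(jackknife, start=1):
    if abs(t) > 2:
        print(i, estriol[i - 1], bweight[i - 1], round(t, 2))
```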