gy2100 geographical data analysis lecture 4 regression analysis and statistical inference department...
![Page 1: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/1.jpg)
[Figure: plot of sorting (y-axis, 0 to 2.5) against mean grain size, phi (x-axis, -1 to 5). Legend: braided stream, point bar, stream mouth bar, tidal flat, inter distrib. beach, mature beach, dune]
![Page 2: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/2.jpg)
GY2100 Geographical Data Analysis
Lecture 4: Regression analysis and statistical inference
DEPARTMENT OF GEOGRAPHY
![Page 3: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/3.jpg)
The statistical utility of a regression line
![Page 4: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/4.jpg)
The regression model and its underlying assumptions
$$Y_i = \alpha + \beta X_i + \delta_i$$
Systematic or deterministic component ($\alpha + \beta X_i$), represented by a straight line
Random or stochastic component ($\delta_i$), represented by the deviations of the observations about the line
$\alpha$ – alpha
$\beta$ – beta
$\delta$ – delta
![Page 5: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/5.jpg)
Illustrating regression using the
fixed X model
![Page 6: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/6.jpg)
Assumptions of the regression model
1. The relationship between X and Y is linear;
2. Values of X are fixed and measured without error;
3. The disturbance terms $\delta_i$ are normally distributed with equal variance about the line $Y = \alpha + \beta X$, and each has an expected value $E(\delta_i) = 0$. This means that the expected value of $Y$ for a given value of $X$ is $E(Y_i \mid X_i) = \alpha + \beta X_i$;
![Page 7: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/7.jpg)
Assumptions of the regression model
4. The $\delta_i$ are statistically uncorrelated:
a. There is no autocorrelation ($\delta_i$ term is uncorrelated with X)
b. There is no spatial autocorrelation ($\delta_i$ are not correlated with another variable).
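The fixed X model and the assumptions listed above can be illustrated with a small simulation. The sketch below is only an illustration: the population values `ALPHA`, `BETA` and `SIGMA`, the 20 fixed X values and the helper `ols` are all hypothetical choices, not taken from the lecture. It draws repeated samples of Y at the same X values, with normally distributed disturbances of equal variance and expected value zero, and looks at how the sample slope b varies from sample to sample.

```python
import random
import statistics


def ols(xs, ys):
    """Return (a, b): the least-squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical population values, chosen only for illustration.
ALPHA, BETA, SIGMA = 895.0, 2.38, 243.0

# Fixed X: the same 20 values are reused in every repeated sample.
xs = [30.0 * k for k in range(20)]

rng = random.Random(0)  # seeded so the sketch is repeatable
slopes = []
for _ in range(2000):
    # Y_i = alpha + beta * X_i + delta_i, with delta_i ~ N(0, sigma^2)
    ys = [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in xs]
    _, b = ols(xs, ys)
    slopes.append(b)

mean_b = statistics.mean(slopes)
sd_b = statistics.stdev(slopes)
print(f"mean of b over repeated samples: {mean_b:.3f}")
print(f"standard deviation of b:         {sd_b:.3f}")
```

If the assumptions hold, the sample slopes should scatter around the population slope, and their spread is what the standard error of b estimates from a single sample.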
![Page 8: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/8.jpg)
Testing the assumptions
1. Specific statistical tests using residuals of the sample regression line as estimates of the error term in the true population regression model
2. Histogram of residuals
3. Examination of residual plots
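As a rough sketch of checks 2 and 3, the residuals of a fitted line can be binned into a text histogram and listed against X. The data, the bin width of 50 and the helper `ols` below are hypothetical and exist only to make the example runnable; in practice the residuals would come from the regression being tested.

```python
import math


def ols(xs, ys):
    """Return (a, b): the least-squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data, invented purely for illustration.
xs = [10, 40, 80, 120, 160, 200, 250, 300, 350, 400, 450, 500]
ys = [950, 1010, 1020, 1230, 1190, 1420, 1440, 1690, 1650, 1900, 1930, 2120]

a, b = ols(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Crude text histogram of the residuals (bin width is an arbitrary choice).
width = 50.0
counts = {}
for r in residuals:
    lower_edge = math.floor(r / width) * width
    counts[lower_edge] = counts.get(lower_edge, 0) + 1
for lower_edge in sorted(counts):
    print(f"{lower_edge:7.0f} to {lower_edge + width:7.0f} | {'*' * counts[lower_edge]}")

# Residuals listed against X, the tabular equivalent of a residual plot.
for x, r in zip(xs, residuals):
    print(f"X = {x:4d}   residual = {r:8.1f}")
```

A roughly symmetric histogram and residuals that show no trend or funnel shape against X are consistent with the assumptions; clear curvature or a changing spread would suggest that linearity or equal variance is violated.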
![Page 9: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/9.jpg)
Examination of residual plots
![Page 10: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/10.jpg)
Recap
$$Y_i = \alpha + \beta X_i + \delta_i$$
![Page 11: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/11.jpg)
Inferences in regression analysis
Slope parameter ($\beta$)
Intercept parameter ($\alpha$)
Precision of estimates derived from the sample regression equation
![Page 12: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/12.jpg)
Testing if $\beta = 0$
[Figure: Elevation above sea-level vs mean annual rainfall for Scotland. x-axis: Elevation (m), 0 to 600; y-axis: Rainfall (mm yr$^{-1}$), 0 to 2500; fitted line $Y' = 2.38X + 895$]
If $\beta = 0$ then $Y'$ = constant for all $X$
![Page 13: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/13.jpg)
Testing $\beta = 0$ using the t test
Hypotheses
$H_0$: $\beta = 0$
$H_1$: $\beta \neq 0$
Under $H_0$, repeated sampling yields a distribution of $b$ which follows a t distribution about an expected value of $\beta = 0$.
![Page 14: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/14.jpg)
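Testing $\beta = 0$ using the t test
The test statistic to be calculated is
$$t_{n-2} = \frac{b - \beta}{s_b}$$
Since we are testing $\beta = 0$, then
$$t_{n-2} = \frac{b}{s_b}$$
where
df = n-2 is the number of degrees of freedom
$s_b$ = estimated standard error of the sampling distribution of $b$
$$s_b = \frac{s_{y \cdot x}}{\sqrt{\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2}}$$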
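The standard error of the slope and the resulting t statistic can be computed directly from raw data. The sketch below uses hypothetical data invented for illustration, and it assumes that $s_{y \cdot x}$ denotes the standard error of the estimate, i.e. the square root of the residual sum of squares divided by n-2. It also checks that the shortcut form of the denominator, the sum of $X_i^2$ minus the squared sum of $X_i$ over n, equals the sum of squared deviations of X about its mean.

```python
import math

# Hypothetical data, invented purely for illustration.
xs = [10, 40, 80, 120, 160, 200, 250, 300, 350, 400, 450, 500]
ys = [950, 1010, 1020, 1230, 1190, 1420, 1440, 1690, 1650, 1900, 1930, 2120]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Two equivalent ways of writing the denominator term.
sxx_deviation = sum((x - x_bar) ** 2 for x in xs)
sxx_shortcut = sum(x * x for x in xs) - sum(xs) ** 2 / n

# Least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx_deviation
a = y_bar - b * x_bar

# Standard error of the estimate (assumed meaning of s_yx), with n - 2 df.
resid_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s_yx = math.sqrt(resid_ss / (n - 2))

# Standard error of the slope and the t statistic for H0: beta = 0.
s_b = s_yx / math.sqrt(sxx_shortcut)
t = b / s_b

print(f"b = {b:.3f}, s_yx = {s_yx:.2f}, s_b = {s_b:.4f}, t = {t:.2f}, df = {n - 2}")
```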
![Page 15: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/15.jpg)
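Regression in EXCEL

Regression Statistics

| | |
|---|---|
| Multiple R | 0.78 |
| R square | 0.61 |
| Adj R square | 0.59 |
| Standard error | 242.79 |
| Observations | 20 |

ANOVA

| | df | SS | MS | F | Sig F |
|---|---|---|---|---|---|
| Regression | 1 | 1691558 | 1691558 | 28.7 | 4.31E-5 |
| Residual | 18 | 1061062 | 58948 | | |
| Total | 19 | 2752620 | | | |

| | Coeffs | Standard Error | t stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 895 | 149.76 | 5.98 | 1.178E-5 | 581 | 1210 |
| Elevation, m | 2.38 | 0.444 | 5.36 | 4.31E-5 | 1.44 | 3.31 |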
![Page 16: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/16.jpg)
Testing $\beta = 0$ using the t test
The test statistic to be calculated is
$$t_{n-2} = \frac{b - \beta}{s_b}$$
Since we are testing $\beta = 0$, then
$$t_{n-2} = \frac{b}{s_b}$$
So...
$$t = \frac{2.38}{0.444} = 5.36$$
With df = n-2 = 18, $t_{crit} = 2.1$ at $\alpha = 0.05$.
![Page 17: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/17.jpg)
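Regression in EXCEL

Regression Statistics

| | |
|---|---|
| Multiple R | 0.78 |
| R square | 0.61 |
| Adj R square | 0.59 |
| Standard error | 242.79 |
| Observations | 20 |

ANOVA

| | df | SS | MS | F | Sig F |
|---|---|---|---|---|---|
| Regression | 1 | 1691558 | 1691558 | 28.7 | 4.31E-5 |
| Residual | 18 | 1061062 | 58948 | | |
| Total | 19 | 2752620 | | | |

| | Coeffs | Standard Error | t stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 895 | 149.76 | 5.98 | 1.178E-5 | 581 | 1210 |
| Elevation, m | 2.38 | 0.444 | 5.36 | 4.31E-5 | 1.44 | 3.31 |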
![Page 18: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/18.jpg)
Testing $\beta = Q$ using the t test ($Q \neq 0$)
The test statistic to be calculated is
$$t_{n-2} = \frac{b - \beta}{s_b}$$
If we are testing $\beta = 4$, then
$$t_{n-2} = \frac{b - \beta}{s_b} = \frac{2.38 - 4}{0.444}$$
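The arithmetic for both hypothesised slopes can be sketched in a few lines. The values of b, $s_b$, n and the critical value 2.1 are the ones quoted in the lecture for the rainfall data; the function name `slope_t` and its parameter `beta_0` are hypothetical names introduced here for illustration.

```python
def slope_t(b, s_b, beta_0=0.0):
    """t statistic for H0: beta = beta_0, with n - 2 degrees of freedom."""
    return (b - beta_0) / s_b


# Values quoted in the lecture for the rainfall data.
B, S_B, N = 2.38, 0.444, 20
T_CRIT = 2.1  # two-tailed critical value at alpha = 0.05 for df = 18

for beta_0 in (0.0, 4.0):
    t = slope_t(B, S_B, beta_0)
    decision = "reject H0" if abs(t) > T_CRIT else "do not reject H0"
    print(f"H0: beta = {beta_0:g}  t = {t:.2f}  df = {N - 2}  -> {decision}")
```

The test is two-tailed, so it is the absolute value of t that is compared with the critical value. With these numbers t is 5.36 for a hypothesised slope of 0 and about -3.65 for a hypothesised slope of 4, so both hypotheses are rejected at the 0.05 level.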
![Page 19: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/19.jpg)
Testing $\beta = 0$ using the F test
Decomposition of the variance using sums of squares
Total sum of squares = Regression sum of squares + Residual sum of squares
$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y'_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - Y'_i)^2$$
![Page 20: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/20.jpg)
Testing $\beta = 0$ using the F test
Decomposition of the variance using sums of squares
Total sum of squares (TSS) is the sum of the squared deviations of the individual observations about their mean
$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
Regression sum of squares (RSS) is the sum of the squared deviations of the predicted Y values ($Y'$) about the mean Y
$$RSS = \sum_{i=1}^{n} (Y'_i - \bar{Y})^2$$
The difference between TSS and RSS is 'unexplained' by the regression line and is therefore the residual sum of squares (Residual SS)
$$\text{Resid SS} = \sum_{i=1}^{n} (Y_i - Y'_i)^2$$
![Page 21: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/21.jpg)
Testing $\beta = 0$ using the F test
Decomposition of the variance using sums of squares
TSS = RSS + Resid SS
$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
$$RSS = \sum_{i=1}^{n} (Y'_i - \bar{Y})^2$$
$$\text{Residual SS} = \sum_{i=1}^{n} (Y_i - Y'_i)^2$$
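The decomposition can be checked numerically on any least-squares fit. The sketch below uses hypothetical data invented for illustration, fits the line by least squares, computes the three sums of squares from their definitions and confirms that TSS equals RSS plus the residual sum of squares.

```python
# Hypothetical data, invented purely for illustration.
xs = [10, 40, 80, 120, 160, 200, 250, 300, 350, 400, 450, 500]
ys = [950, 1010, 1020, 1230, 1190, 1420, 1440, 1690, 1650, 1900, 1930, 2120]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept, then the predicted values Y'.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar
predicted = [a + b * x for x in xs]

# The three sums of squares, straight from their definitions.
tss = sum((y - y_bar) ** 2 for y in ys)                      # total
rss = sum((p - y_bar) ** 2 for p in predicted)               # regression
resid_ss = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # residual

print(f"TSS            = {tss:.1f}")
print(f"RSS + Resid SS = {rss + resid_ss:.1f}")
```

The identity holds exactly only for the least-squares line; for any other line through the data the cross-product term does not vanish and the two sides differ.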
![Page 22: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/22.jpg)
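Regression in EXCEL

Regression Statistics

| | |
|---|---|
| Multiple R | 0.78 |
| R square | 0.61 |
| Adj R square | 0.59 |
| Standard error | 242.79 |
| Observations | 20 |

ANOVA

| | df | SS | MS | F | Sig F |
|---|---|---|---|---|---|
| Regression | 1 | 1691558 | 1691558 | 28.7 | 4.31E-5 |
| Residual | 18 | 1061062 | 58948 | | |
| Total | 19 | 2752620 | | | |

| | Coeffs | Standard Error | t stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 895 | 149.76 | 5.98 | 1.178E-5 | 581 | 1210 |
| Elevation, m | 2.38 | 0.444 | 5.36 | 4.31E-5 | 1.44 | 3.31 |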
![Page 23: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/23.jpg)
Regression in Excel
ANOVA
| | df | SS | MS | F | Sig F |
|---|---|---|---|---|---|
| Regression | 1 | RSS | RSS/df | RMSS/Resid MSS | |
| Residual | n-2 | Resid SS | Resid SS/df | | |
| Total | n-1 | TSS | TSS/df | | |

RMSS = Regression MSS
Resid MSS = Residual MSS
![Page 24: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/24.jpg)
Regression in Excel
ANOVA
| | df | SS | MS | F | Sig F |
|---|---|---|---|---|---|
| Regression | 1 | 1691558 | 1691558 | 28.7 | 4.31E-5 |
| Residual | 18 | 1061062 | 58948 | | |
| Total | 19 | 2752620 | | | |
With 1 and 18 df, $F_{crit} = 4.41$ at $\alpha = 0.05$.
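The entries of the ANOVA table follow from the two sums of squares and the sample size. The sketch below starts from the regression and residual sums of squares quoted in the Excel output for the rainfall data and recomputes the mean squares, F, R square and the standard error; the variable names are illustrative only.

```python
import math

# Sums of squares and sample size quoted in the Excel output.
rss = 1691558.0        # regression sum of squares
resid_ss = 1061062.0   # residual sum of squares
n = 20

tss = rss + resid_ss   # total sum of squares
df_regression = 1
df_residual = n - 2

rmss = rss / df_regression          # regression mean square
resid_mss = resid_ss / df_residual  # residual mean square
f_stat = rmss / resid_mss           # F with 1 and n - 2 df

r_square = rss / tss                # proportion of variance explained
std_error = math.sqrt(resid_mss)    # standard error of the estimate

print(f"Resid MSS = {resid_mss:.0f}")
print(f"F = {f_stat:.1f} with {df_regression} and {df_residual} df")
print(f"R square = {r_square:.2f}, standard error = {std_error:.2f}")
```

These reproduce the F of 28.7, R square of 0.61 and standard error of 242.79 shown in the output. For a regression with a single X variable, F is the square of the t statistic for the slope (5.36 squared is about 28.7), so the F test and the two-tailed t test of a zero slope lead to the same decision.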
![Page 25: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/25.jpg)
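Constructing a confidence interval for $\beta$
In general
$$b - t \cdot s_b \le \beta \le b + t \cdot s_b$$
For the rainfall data
$$2.38 - 2.1 \times 0.444 \le \beta \le 2.38 + 2.1 \times 0.444$$
$$1.44 \le \beta \le 3.31$$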
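The interval can be recomputed from the rounded values quoted in the lecture, as in the sketch below; the variable names are illustrative. Because b, $s_b$ and t are rounded here, the limits may differ from the Excel output in the last decimal place.

```python
# Rounded values quoted in the lecture for the rainfall data.
b = 2.38       # sample slope
s_b = 0.444    # standard error of the slope
t_crit = 2.1   # two-tailed critical t at alpha = 0.05, df = 18

half_width = t_crit * s_b
lower = b - half_width
upper = b + half_width

print(f"95% confidence interval for beta: {lower:.2f} to {upper:.2f}")

# Hypothesised slopes that fall outside the interval are rejected
# by the two-tailed t test at the same significance level.
for beta_0 in (0.0, 4.0):
    inside = lower <= beta_0 <= upper
    print(f"beta = {beta_0:g} inside interval: {inside}")
```

Neither 0 nor 4 lies inside the interval, which agrees with the t tests of those two hypothesised slopes.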
![Page 26: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY](https://reader035.vdocuments.us/reader035/viewer/2022062417/5518c60b550346b31f8b57c0/html5/thumbnails/26.jpg)
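Regression in EXCEL

Regression Statistics

| | |
|---|---|
| Multiple R | 0.78 |
| R square | 0.61 |
| Adj R square | 0.59 |
| Standard error | 242.79 |
| Observations | 20 |

ANOVA

| | df | SS | MS | F | Sig F |
|---|---|---|---|---|---|
| Regression | 1 | 1691558 | 1691558 | 28.7 | 4.31E-5 |
| Residual | 18 | 1061062 | 58948 | | |
| Total | 19 | 2752620 | | | |

| | Coeffs | Standard Error | t stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 895 | 149.76 | 5.98 | 1.178E-5 | 581 | 1210 |
| Elevation, m | 2.38 | 0.444 | 5.36 | 4.31E-5 | 1.44 | 3.31 |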