8/11/2019 Simple Linear Regression in R
Gianni Gorgoglione
[Plot: x axis 0 to 100, y axis 100 to 500]
Figure 1 Eps
Figure 2 Original data x & y
Simple Linear Regression in R
PART 1
1. Start R
2. ## Generate data
x
[Plot: x axis 0 to 100, y axis 100 to 500]
Figure 3 Original data and Linear Regression
4.
## Linear Model lm() with x as predictor and y as response
4.a
lm(y~x)
lm1
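The steps above can be sketched as follows. The original data-generation code is not shown here, so the sample size, intercept, slope, noise level and seed below are assumptions chosen only to resemble the plotted ranges. Only lm(y~x) and the names x, y, lm1 and res come from the text.

```r
# Hypothetical data: the original generating code is not recoverable,
# so the intercept, slope and noise level are assumptions.
set.seed(42)
x   <- 1:100
eps <- rnorm(100, mean = 0, sd = 10)   # random noise term
y   <- 100 + 4 * x + eps

# Linear model with x as predictor and y as response
lm1 <- lm(y ~ x)
coef(lm1)                # estimated intercept and slope

# Residuals: observed minus fitted values
res <- residuals(lm1)
```

With an intercept in the model, the residuals average to zero by construction, so a mean near zero on its own does not show that the model is adequate.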
6.
6.a
plot(x,y)
plot(res,y)
Yes. The predicted and observed values follow a linear function, and the residuals plotted against the
predicted values have a mean of zero, with the other values falling within -10 and 10, which corresponds
to the standard deviation.
6.b
y2 lm2
res2
The histogram of the residuals does not show an even distribution. In other words, the variance does not
appear to be constant. This confirms that the assumptions of linear regression are not met here: by the
second assumption criterion, a linear model is not appropriate for non-linear data.
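A minimal sketch of this check is given here. The y2 below is an assumed quadratic function of x, not the y2 used in the exercise, whose definition is not shown.

```r
# Hypothetical non-linear data: y2 is an assumed quadratic function of x,
# not the y2 from the original exercise.
set.seed(42)
x  <- 1:100
y2 <- x^2 + rnorm(100, mean = 0, sd = 10)

# Fit a straight line anyway and extract the residuals
lm2  <- lm(y2 ~ x)
res2 <- residuals(lm2)

# Bin the residuals; drop plot = FALSE to draw the histogram
h <- hist(res2, plot = FALSE)
h$counts

# The residuals track the curvature that the straight line missed
cor(res2, (x - mean(x))^2)
```

A correlation close to 1 in the last line means the residuals are systematic rather than random, which is the pattern a skewed residual histogram hints at.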
7.
7.a
plot(res,y)
[Figure: Histogram of res2; x axis res2 from -500 to 1000, y axis Frequency from 0 to 50]
The residuals become smaller, approaching 0.
7.b
y3 lm3
res3
[Figure: map panels prec_DJF, prec_JJA, temp_DJF, temp_JJA and presence; colour scale from -20 to 140]
PART 2
1.
install.packages("sp")
library(sp)
install.packages("raster")
library(raster)
install.packages("rasterVis")
library(rasterVis)
2. bird.clim.sp
train.data
train.data$RT90.O->tx
train.data$RT90.N->ty
train.data$temp_JJA->tdjja
temp_jja.trend2
5.
The function predict() takes a fitted model and a set of predictor values and calculates the
predicted response for those values.
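As an illustration of predict(), here is a small sketch on simulated data. The variable names tx, ty and tdjja follow the text; the values, the model name trend1 and the new.points data frame are hypothetical.

```r
# Hypothetical training data: names follow the text, values are simulated.
set.seed(1)
tx    <- runif(50, 0, 10)
ty    <- runif(50, 0, 10)
tdjja <- 15 - 0.3 * ty + 0.1 * tx + rnorm(50, sd = 0.5)

# First-order trend surface: temperature as a linear function of the coordinates
trend1 <- lm(tdjja ~ tx + ty)

# predict() evaluates the fitted model at new coordinates;
# the columns of newdata must use the predictor names from the formula
new.points <- data.frame(tx = c(2, 8), ty = c(3, 7))
pred <- predict(trend1, newdata = new.points)
pred
```

If newdata is omitted, predict() returns the fitted values for the training data instead.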
temp_jja.trend1.predMSE2
MSE2 6.666005
sum((temp.trend3$temp_JJA-temp.trend3$predicted)^2)/179->MSE3
MSE3 6.731361
Table 1 MSE for cross-validation of 1st, 2nd and 3rd order trend surfaces
MSE RESULTS
MSE1 6.958501
MSE2 6.666005
MSE3 6.731361
According to the mean squared error from cross-validation, the 2nd order trend surface fits the data
best, because its MSE is the lowest.
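The comparison can be sketched as below, under stated assumptions. The data are simulated rather than the climate data, the hold-out size of 100 is arbitrary (the text divides by 179 validation points), and the names d, test.data and fits are hypothetical.

```r
# Simulated stand-in for the climate data (all values are assumptions).
set.seed(123)
n <- 400
d <- data.frame(x = runif(n, -1, 1), y = runif(n, -1, 1))
d$temp <- 15 + 2 * d$x - d$y + 1.5 * d$x * d$y - d$y^2 + rnorm(n, sd = 0.5)

# Hold out part of the data for validation
test.idx   <- sample(n, 100)
train.data <- d[-test.idx, ]
test.data  <- d[test.idx, ]

# Trend surfaces of order 1, 2 and 3 in the coordinates
fits <- list(
  lm(temp ~ x + y, data = train.data),
  lm(temp ~ x + y + I(x^2) + I(x * y) + I(y^2), data = train.data),
  lm(temp ~ x + y + I(x^2) + I(x * y) + I(y^2) +
       I(x^3) + I(x^2 * y) + I(x * y^2) + I(y^3), data = train.data)
)

# Validation MSE: mean squared difference between observed and predicted values
mse <- sapply(fits, function(fit) {
  mean((test.data$temp - predict(fit, newdata = test.data))^2)
})
names(mse) <- paste0("MSE", 1:3)
mse
```

A higher order always fits the training data at least as well, so the comparison is only meaningful on data the models did not see.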
PART 3
1.
http://www.oikos.ekol.lu.se/appendixdown/snouterdata.txt
2.
snout
4.
install.packages("pgirmess")
snout.res
5.
5.A
install.packages("spdep")
library(spdep)
dnearneigh(coords, 1, 1.5, row.names = NULL, longlat = NULL)
## Create 4 spatial weights intervals (w0 to w3)
dnearneigh(coords, 0, 1, row.names = NULL, longlat = NULL)->w0
dnearneigh(coords, 1, 1.5, row.names = NULL, longlat = NULL)->w1
dnearneigh(coords, 1.5, 2, row.names = NULL, longlat = NULL)->w2
dnearneigh(coords, 2, 10, row.names = NULL, longlat = NULL)->w3
nb2listw(w0)
nb2listw(w1)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 1108
Number of nonzero links: 4186
Percentage nonzero weights: 0.3409728
Average number of links: 3.777978
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 1108 1227664 1108 599.7917 4461.944
nb2listw(w2)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 1108
Number of nonzero links: 4084
Percentage nonzero weights: 0.3326643
Average number of links: 3.685921
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 1108 1227664 1108 618.75 4472.778
nb2listw(w3)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 1108
Number of nonzero links: 249036
Percentage nonzero weights: 20.28535
Average number of links: 224.7617
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 1108 1227664 1108 10.45354 4468.041
nb2listw(w0)->sar0
nb2listw(w1)->sar1
nb2listw(w2)->sar2
nb2listw(w3)->sar3
errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar0)
Call:
errorsarlm(formula = snout1.1$snouter1.1 ~ snout1.1$rain + snout1.1$djungle,
listw = sar0)
Type: error
Coefficients:
lambda (Intercept) snout1.1$rain snout1.1$djungle
0.87922189 79.91106965 -0.01964262 0.02931540
Log likelihood: -3514.855
errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar1)
Call:
errorsarlm(formula = snout1.1$snouter1.1 ~ snout1.1$rain + snout1.1$djungle,
listw = sar1)
Type: error
Coefficients:
lambda (Intercept) snout1.1$rain snout1.1$djungle
0.81297579 81.12541938 -0.02054332 -0.01565557
Log likelihood: -3702.271
errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar2)
Call:
errorsarlm(formula = snout1.1$snouter1.1 ~ snout1.1$rain + snout1.1$djungle,
listw = sar2)
Type: error
Coefficients:
lambda (Intercept) snout1.1$rain snout1.1$djungle
0.73214746 80.38066616 -0.02022954 0.02246367
Log likelihood: -3853.828
errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar3)
Call:
errorsarlm(formula = snout1.1$snouter1.1 ~ snout1.1$rain + snout1.1$djungle,
listw = sar3)
Type: error
Coefficients:
lambda (Intercept) snout1.1$rain snout1.1$djungle
0.98325221 101.73633693 -0.01773770 0.08609926
Log likelihood: -4034.266
5.B
## Create a SAR linear model for each distance interval w0, w1, w2, w3
errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar0)->SARLM0
errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar1)->SARLM1
errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar2)->SARLM2
errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar3)->SARLM3
AIC(SARLM0)-----7039.71
[Correlogram plot: Moran I statistic = f(distance classes); x axis distance classes 2 to 44, y axis Moran I statistic -0.20 to 0.05]
AIC(SARLM1)-----7414.543
AIC(SARLM2)-----7717.656
AIC(SARLM3)-----8078.533
SARLM0, where the distance between neighbours is 0 to 1, has the lowest AIC value and is therefore the best model.
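The reported AIC values can be checked by hand from the log-likelihoods printed in the errorsarlm output, using AIC = 2k - 2 log-likelihood. The parameter count k = 5 (lambda, intercept, two slopes, residual variance) is an assumption; it reproduces the reported values to within rounding.

```r
# Log-likelihoods as printed in the errorsarlm output
logLik.values <- c(SARLM0 = -3514.855, SARLM1 = -3702.271,
                   SARLM2 = -3853.828, SARLM3 = -4034.266)

# Assumption: 5 estimated parameters per model
# (lambda, intercept, two slopes, residual variance)
k <- 5

# AIC = 2k - 2 * log-likelihood
aic <- 2 * k - 2 * logLik.values
aic

# Name of the model with the lowest AIC
names(which.min(aic))
```

Because all four models have the same number of parameters, ranking them by AIC is equivalent to ranking them by log-likelihood.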
5.C
correlog(coords,SARLM0$residuals,method=c('Moran'))->SARLM0CORR
correlog(coords,SARLM1$residuals,method=c('Moran'))->SARLM1CORR
correlog(coords,SARLM2$residuals,method=c('Moran'))->SARLM2CORR
correlog(coords,SARLM3$residuals,method=c('Moran'))->SARLM3CORR
plot(SARLM0CORR)
plot(SARLM1CORR)
plot(SARLM2CORR)
plot(SARLM3CORR)
Figure 1 SARLM0 correlogram d=0-1.0
Figure 2 SARLM1 correlogram d=1.0-1.5
[Correlogram plot: Moran I statistic = f(distance classes); x axis distance classes 2 to 46, y axis Moran I statistic -0.15 to 0.00]
[Correlogram plot: Moran I statistic = f(distance classes); x axis distance classes 2 to 44, y axis Moran I statistic -0.1 to 0.4]
Figure 3 SARLM2 correlogram d=1.5-2.0
Figure 4 SARLM3 correlogram d=2.0-10.0
Both the AIC results and the SAR correlograms show that the best model is the first one, SARLM0. In
the SARLM0 correlogram (Figure 1) the values of Moran's I are close to 0, which indicates no spatial
pattern in the residuals. In the SARLM3 correlogram (Figure 4) Moran's I of the residuals tends
towards 1 at distance classes between 0 and 6, so the residuals still form a spatial pattern. This is
not acceptable under the assumptions of linear regression, which require independent residuals.
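To make the interpretation of Moran's I concrete, here is a plain base-R sketch for a single distance band. It is a stand-in for illustration, not the implementation used by correlog() in pgirmess or by spdep. The function name moran.i, the grid and both residual vectors are hypothetical.

```r
# Moran's I for values z at coordinates coords, using binary weights for
# pairs whose distance d satisfies d.min < d <= d.max
moran.i <- function(z, coords, d.min, d.max) {
  d  <- as.matrix(dist(coords))
  w  <- (d > d.min & d <= d.max) * 1     # 1 for neighbours, 0 otherwise
  zc <- z - mean(z)                      # centred values
  n  <- length(z)
  (n / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
}

# Hypothetical 10 x 10 grid of locations
set.seed(7)
coords <- as.matrix(expand.grid(x = 1:10, y = 1:10))

random.res  <- rnorm(100)                             # no spatial pattern
pattern.res <- coords[, "x"] + rnorm(100, sd = 0.5)   # smooth gradient in x

moran.i(random.res,  coords, 0, 1)   # expected to be close to 0
moran.i(pattern.res, coords, 0, 1)   # expected to be clearly positive
```

Residuals without spatial structure give values near 0, as in the SARLM0 correlogram, while residuals that vary smoothly in space give clearly positive values at short distances.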
[Correlogram plot: Moran I statistic = f(distance classes); x axis distance classes 2 to 44, y axis Moran I statistic -0.20 to 0.10]