simple linear regression in r

8/11/2019 Simple Linear Regression in R


Gianni Gorgoglione

Figure 1 Eps

Figure 2 Original data x & y

Simple Linear Regression in R

PART 1

1. Start R

2. ## Generate data

x



Figure 3 Original data and Linear Regression (x-axis: x, y-axis: y)

4.

## Linear Model lm() with x as predictor and y as response

4.a

lm(y~x)

lm1
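The model call in step 4.a can be tried on synthetic data. The following is a minimal sketch, not the original script: the names x, eps, y and lm1 follow the report, while the seed, the intercept of 100, the slope of 4 and the noise standard deviation of 10 are illustrative assumptions.

```r
# Hypothetical data: the coefficients and the noise level are assumptions.
set.seed(1)
x   <- 1:100
eps <- rnorm(100, mean = 0, sd = 10)   # random error term
y   <- 100 + 4 * x + eps               # linear response plus noise

# Fit y as a linear function of x and store the fitted model.
lm1 <- lm(y ~ x)

coef(lm1)               # estimated intercept and slope
res <- residuals(lm1)   # observed minus fitted values
```

With an intercept in the model, the least-squares residuals always average to zero, so their mean alone says nothing about whether the linear form is appropriate.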



6.

6.a

plot(x,y)

plot(res,y)

Yes. The predicted and observed values follow a linear function, and the residuals plotted against the predicted values have a mean of zero, with the other values lying within -10 and 10, which corresponds to the standard deviation.
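The check described above can be sketched as a plot of residuals against predicted values. This example is illustrative only: the data are synthetic, and the seed, coefficients and noise standard deviation of 10 are assumptions rather than the values used in the report.

```r
# Hypothetical linear data (assumed coefficients and noise level).
set.seed(1)
x <- 1:100
y <- 100 + 4 * x + rnorm(100, sd = 10)

lm1  <- lm(y ~ x)
res  <- residuals(lm1)   # residuals of the fit
pred <- fitted(lm1)      # predicted (fitted) values

# Residuals versus predicted values: a patternless band around zero
# is what the linear-model assumptions lead us to expect.
plot(pred, res, xlab = "Predicted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Numerical summary of the residual spread.
c(mean = mean(res), sd = sd(res))
```

If the band widens, narrows or bends along the horizontal axis, the constant-variance or linearity assumption is in doubt.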

6.b

y2 lm2

res2



The histogram of the residuals does not show an even distribution. In other words, the variance does not appear to be constant. This confirms that the assumptions of linear regression are not met: the second assumption criterion indicates that a linear model is not appropriate for non-linear data.
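To see how a straight-line fit behaves on non-linear data, the sketch below fits lm() to a quadratic response and inspects the residuals. The names y2, lm2 and res2 follow the report, but the quadratic form 0.5 * x^2, the seed and the noise level are assumptions chosen only for illustration.

```r
# Hypothetical non-linear data: a quadratic response (assumption).
set.seed(1)
x  <- 1:100
y2 <- 0.5 * x^2 + rnorm(100, sd = 10)

# Deliberately fit a straight line to the curved data.
lm2  <- lm(y2 ~ x)
res2 <- residuals(lm2)

# The histogram is asymmetric, with a long tail of large positive residuals.
hist(res2)

# Plotted against x, the residuals trace a U shape instead of a flat band.
plot(x, res2, ylab = "Residuals of lm2")
abline(h = 0, lty = 2)
```

The systematic curve in the residuals is the sign that the model form, rather than random noise, is responsible for the misfit.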

7.

7.a

plot(res,y)

Histogram of res2 (x-axis: res2, y-axis: Frequency)



The residuals get smaller closer to 0.

7.b

y3 lm3

res3



Map panels: prec_DJF, prec_JJA, temp_DJF, temp_JJA, presence

PART 2

1.

install.packages("sp")

library(sp)

install.packages("raster")

library(raster)

install.packages("rasterVis")

library(rasterVis)

2. bird.clim.sp



train.data



train.data$RT90.O->tx

train.data$RT90.N->ty

train.data$temp_JJA->tdjja

temp_jja.trend2



5.

The function predict() takes a fitted model and a set of new predictor values, and computes the predicted response for those values.
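As an illustration of predict(), the sketch below fits a first-order trend surface on a training subset and predicts at held-out points. The column names tx, ty and temp_JJA follow the report. The synthetic coordinates (in assumed kilometre units), the temperature gradient and the 150/50 split are assumptions, not the report's bird.clim.sp data.

```r
# Hypothetical point data: coordinates and temperatures are synthetic.
set.seed(42)
n   <- 200
pts <- data.frame(tx = runif(n, 0, 400),     # easting (assumed km)
                  ty = runif(n, 0, 1000))    # northing (assumed km)
pts$temp_JJA <- 16 - 0.01 * pts$ty + rnorm(n, sd = 1)

# Split into training and validation points (assumed 150/50 split).
train <- pts[1:150, ]
test  <- pts[151:200, ]

# First-order trend surface: temperature as a plane in tx and ty.
trend1 <- lm(temp_JJA ~ tx + ty, data = train)

# predict() evaluates the fitted plane at the validation coordinates.
test$predicted <- predict(trend1, newdata = test)
head(test)
```

The columns in newdata must carry the same names as the predictors in the model formula, otherwise predict() cannot match them.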

temp_jja.trend1.predMSE2

MSE2 6.666005

sum((temp.trend3$temp_JJA-temp.trend3$predicted)^2)/179->MSE3

MSE3 6.731361

Table 1 MSE for cross-validation of 1st, 2nd and 3rd order trend surfaces

MSE RESULTS

MSE1 6.958501

MSE2 6.666005

MSE3 6.731361

According to the mean squared error (MSE) cross-validation method, the 2nd order trend surface fits best, since its MSE is the lowest.
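The comparison in Table 1 can be sketched as a hold-out cross-validation over trend surfaces of order 1 to 3. The MSE is the sum of squared prediction errors divided by the number of validation points (179 in the report). Everything else here is an assumption for illustration: the scaled coordinates u and v, the quadratic surface used to simulate the data, and the 300/100 split.

```r
# Hypothetical data on scaled coordinates u, v in [-1, 1] (assumption).
set.seed(7)
n <- 400
d <- data.frame(u = runif(n, -1, 1), v = runif(n, -1, 1))
d$temp <- 15 + 2 * d$u - 3 * d$v + 4 * d$v^2 + rnorm(n, sd = 1)

# Hold out 100 points for validation.
idx   <- sample(n, 300)
train <- d[idx, ]
test  <- d[-idx, ]

# Trend surfaces of increasing polynomial order in the coordinates.
fits <- list(
  order1 = lm(temp ~ u + v, data = train),
  order2 = lm(temp ~ u + v + I(u^2) + I(u * v) + I(v^2), data = train),
  order3 = lm(temp ~ u + v + I(u^2) + I(u * v) + I(v^2) +
                I(u^3) + I(u^2 * v) + I(u * v^2) + I(v^3), data = train)
)

# MSE on the validation points: mean squared prediction error.
mse <- sapply(fits, function(f) {
  mean((test$temp - predict(f, newdata = test))^2)
})
mse
names(which.min(mse))   # order with the lowest validation MSE
```

Because the error is measured on points that were not used for fitting, a higher order does not automatically give a lower MSE, which is why the 3rd order surface can score worse than the 2nd.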



PART 3

1.

http://www.oikos.ekol.lu.se/appendixdown/snouterdata.txt

2.

snout



4.

install.packages("pgirmess")

snout.res



5.

5.A

install.packages("spdep")

library(spdep)

dnearneigh(coords, 1, 1.5, row.names = NULL, longlat = NULL)

## Create 4 spatial weights intervals (w0 to w3)

dnearneigh(coords, 0, 1, row.names = NULL, longlat = NULL)->w0

dnearneigh(coords, 1, 1.5, row.names = NULL, longlat = NULL)->w1

dnearneigh(coords, 1.5, 2, row.names = NULL, longlat = NULL)->w2

dnearneigh(coords, 2, 10, row.names = NULL, longlat = NULL)->w3

nb2listw(w0)

nb2listw(w1)

Characteristics of weights list object:

Neighbour list object:

Number of regions: 1108

Number of nonzero links: 4186

Percentage nonzero weights: 0.3409728

Average number of links: 3.777978

Weights style: W

Weights constants summary:

n nn S0 S1 S2

W 1108 1227664 1108 599.7917 4461.944
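The quantities in this summary can be reproduced by hand on a small example. The sketch below uses base R only and does not call spdep: the 6 x 6 grid and the helper names band_neighbours and row_standardise are hypothetical, and treating the lower distance bound as exclusive is an assumption that may differ from dnearneigh() depending on the spdep version.

```r
# Hypothetical coordinates: a regular 6 x 6 grid with unit spacing.
coords <- as.matrix(expand.grid(x = 1:6, y = 1:6))
D <- as.matrix(dist(coords))   # pairwise Euclidean distances

# Neighbours within a distance band (d1, d2]; returns a 0/1 matrix.
band_neighbours <- function(D, d1, d2) {
  (D > d1 & D <= d2) * 1
}

A0 <- band_neighbours(D, 0, 1)     # analogous to the band of w0
A1 <- band_neighbours(D, 1, 1.5)   # analogous to the band of w1

# Style "W": each row of weights is scaled to sum to 1.
row_standardise <- function(A) A / rowSums(A)
W0 <- row_standardise(A0)
W1 <- row_standardise(A1)

sum(A0)                          # number of nonzero links
100 * sum(A0) / nrow(A0)^2       # percentage nonzero weights
mean(rowSums(A0))                # average number of links
sum(W0)                          # S0: equals the number of regions
```

The same relations hold in the output above: 4186 links over 1108 regions give an average of 3.777978 links, 4186 / 1227664 gives 0.3409728 percent, and S0 equals 1108 because the weights are row-standardised.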

nb2listw(w2)







Weights style: W




n nn S0 S1 S2

W 1108 1227664 1108 618.75 4472.778

nb2listw(w3)







Weights style: W


n nn S0 S1 S2

W 1108 1227664 1108 10.45354 4468.041

nb2listw(w0)->sar0

nb2listw(w1)->sar1

nb2listw(w2)->sar2

nb2listw(w3)->sar3

errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar0)

Call:

errorsarlm(formula = snout1.1$snouter1.1 ~ snout1.1$rain + snout1.1$djungle,

listw = sar0)

Type: error

Coefficients:

lambda (Intercept) snout1.1$rain snout1.1$djungle

0.87922189 79.91106965 -0.01964262 0.02931540

Log likelihood: -3514.855


Call:

errorsarlm(formula = snout1.1$snouter1.1 ~ snout1.1$rain + snout1.1$djungle,

listw = sar1)

Type: error

Coefficients:




0.81297579 81.12541938 -0.02054332 -0.01565557



Call:


listw = sar2)

Type: error

Coefficients:


0.73214746 80.38066616 -0.02022954 0.02246367


errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar3)

Call:


listw = sar3)

Type: error

Coefficients:


0.98325221 101.73633693 -0.01773770 0.08609926


5.B

## Create a SAR linear model for each distance band w0, w1, w2, w3

errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar0)->SARLM0




AIC(SARLM0)-----7039.71



AIC(SARLM1)-----7414.543

AIC(SARLM2)-----7717.656

AIC(SARLM3) -----8078.533

The lowest AIC value is for SARLM0, where the distance between neighbours is 0 to 1, so it is the best model.
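The selection rule can be written out with the four AIC values reported above. This is a small sketch that hard-codes those numbers for illustration; in a live session they would come from calling AIC() on each fitted model object.

```r
# AIC values as reported for the four SAR error models.
aic <- c(SARLM0 = 7039.71,
         SARLM1 = 7414.543,
         SARLM2 = 7717.656,
         SARLM3 = 8078.533)

# The preferred model is the one with the smallest AIC.
best <- names(which.min(aic))
best

# Differences from the best model show how far the others fall behind.
delta <- aic - min(aic)
round(delta, 3)
```

All four models use the same response and predictors, so the AIC differences reflect only the choice of neighbourhood distance.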

5.C

correlog(coords,SARLM0$residuals,method=c('Moran'))->SARLM0CORR




plot(SARLM0CORR)

plot(SARLM1CORR)

plot(SARLM2CORR)

plot(SARLM3CORR)

Figure 1 SARLM0 correlogram d=0-1.0

Figure 2 SARLM1 correlogram d=1.0-1.5

Figure 3 SARLM2 correlogram d=1.5-2.0

Figure 4 SARLM3 correlogram d=2.0-10.0

(Each correlogram plots the Moran I statistic against distance classes.)

Both the AIC results and the SAR correlograms indicate that the first model, SARLM0, is the best. In the SARLM0 correlogram (Figure 1) the Moran I values are close to 0, which indicates no spatial pattern in the residuals. In the SARLM3 correlogram (Figure 4) the Moran I values of the residuals tend towards 1 at distances between 0 and 6, so the residuals tend to form a spatial pattern. This violates the linear regression assumption of independent residuals.
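To make the interpretation of Moran's I concrete, the sketch below computes the statistic by hand in base R for three synthetic patterns. It is not the correlog() computation used in the report: the helper name moran_i, the 6 x 6 grid and the example patterns are all hypothetical, and only a single distance class (neighbours at distance 1) is evaluated.

```r
# Moran's I for values z and a spatial weights matrix W:
# I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), with z centred.
moran_i <- function(z, W) {
  z  <- z - mean(z)
  n  <- length(z)
  S0 <- sum(W)
  (n / S0) * sum(W * outer(z, z)) / sum(z^2)
}

# Hypothetical 6 x 6 grid with row-standardised nearest-neighbour weights.
coords <- as.matrix(expand.grid(x = 1:6, y = 1:6))
D <- as.matrix(dist(coords))
A <- (D > 0 & D <= 1) * 1
W <- A / rowSums(A)

set.seed(3)
noise   <- rnorm(36)                               # no spatial structure
trend   <- coords[, "x"] + coords[, "y"]           # smooth gradient
checker <- (-1)^(coords[, "x"] + coords[, "y"])    # alternating pattern

moran_i(noise, W)     # typically near 0: no spatial pattern
moran_i(trend, W)     # positive: similar values cluster together
moran_i(checker, W)   # -1: every neighbour has the opposite value
```

Residuals from a well-specified spatial model should behave like the noise case at every distance class, which is the pattern described for SARLM0.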
