simple linear regression in r

Upload: gianni-gorgoglione

Post on 03-Jun-2018


  • 8/11/2019 Simple Linear Regression in R

    1/17

    Gianni Gorgoglione

    [Figure 1 Eps; Figure 2 Original data x & y. Axes: x (0 to 100), y (100 to 500)]

    Simple Linear Regression in R

    PART 1

    1. Start R

    2. ## Generate data

    x
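The data-generation code is cut off in this transcript after `x`. The sketch below is not the original assignment code. It assumes x runs from 1 to 100 and that y is a linear function of x plus normally distributed noise, with a hypothetical intercept of 100, slope of 4 and noise standard deviation of 5, chosen only because they roughly match the axis ranges of the original-data figure.

```r
# Hypothetical parameters chosen for illustration only
set.seed(42)                                      # reproducible simulated noise
x <- 1:100                                        # predictor
y <- 100 + 4 * x + rnorm(100, mean = 0, sd = 5)   # linear signal plus noise

plot(x, y, main = "Original data x & y")
```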


    [Figure 3 Original data and Linear Regression. Axes: x (0 to 100), y (100 to 500)]

    4.

    ## Linear Model lm() with x as predictor and y as response

    4.a

    lm(y~x)

    lm1
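A sketch of steps 4 and 4.a on simulated data (the same hypothetical intercept 100, slope 4 and noise sd 5 as assumed above): `lm(y ~ x)` fits y as the response and x as the predictor, `coef()` extracts the estimated intercept and slope, and `abline()` draws the fitted line over the scatter plot, as in Figure 3.

```r
set.seed(42)
x <- 1:100
y <- 100 + 4 * x + rnorm(100, sd = 5)   # hypothetical simulated data

lm1 <- lm(y ~ x)        # y is the response, x the predictor
summary(lm1)            # coefficients, standard errors, R-squared
coef(lm1)               # estimated intercept and slope

plot(x, y)
abline(lm1, col = "red")   # add the fitted regression line
```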


    6.

    6.a

    plot(x,y)

    plot(res,y)

    Yes. The predicted and observed values follow a linear relationship, and the plot of residuals versus predicted values shows residuals centred on a mean of zero, with most values lying between -10 and 10, a spread that reflects the standard deviation of the residuals.
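The numerical check behind this reading can be sketched as follows, again on hypothetical simulated data (intercept 100, slope 4, noise sd 5, all assumptions). For an ordinary least-squares fit with an intercept the residual mean is zero by construction, and the residual standard deviation estimates the noise spread; with a noise sd of 5, roughly 95% of the residuals should fall within -10 and 10.

```r
set.seed(42)
x <- 1:100
y <- 100 + 4 * x + rnorm(100, sd = 5)   # hypothetical simulated data
lm1 <- lm(y ~ x)

res <- residuals(lm1)    # observed minus fitted values
mean(res)                # zero up to rounding error for a model with an intercept
sd(res)                  # spread of the residuals, close to the simulated noise sd
mean(abs(res) <= 10)     # share of residuals between -10 and 10

plot(fitted(lm1), res, xlab = "Predicted values", ylab = "Residuals")
abline(h = 0, lty = 2)   # reference line at zero
```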

    6.b

    y2 lm2

    res2


    The histogram of the residuals does not show an even, symmetric distribution around zero; in other words, the residual variance does not appear to be constant. This indicates that the assumptions of linear regression are not met: according to the second assumption criterion, the linear model is not appropriate for the non-linear data.
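To see why a straight line fails on non-linear data, the sketch below fits `lm()` to a hypothetical curved relationship (the quadratic coefficient 0.1 and noise sd 5 are assumptions, not the values used for y2). The residuals then carry the unmodelled curvature: they are positive at both ends of x and negative in the middle, so their histogram is not symmetric around zero.

```r
set.seed(42)
x  <- 1:100
y2 <- 100 + 0.1 * x^2 + rnorm(100, sd = 5)   # hypothetical curved relationship

lm2  <- lm(y2 ~ x)          # a straight line fitted to curved data
res2 <- residuals(lm2)

hist(res2, main = "Histogram of res2")   # residual distribution
plot(fitted(lm2), res2)                  # systematic U-shaped pattern
abline(h = 0, lty = 2)
```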

    7.

    7.a

    plot(res,y)

    [Figure: Histogram of res2. Axes: res2 (-500 to 1000), Frequency (0 to 50)]


    The residuals become smaller and lie closer to 0.

    7.b

    y3 lm3

    res3


    [Figure: maps of prec_DJF, prec_JJA, temp_DJF, temp_JJA and presence. Axes: x (1300000 to 1700000), y (6500000 to 7500000); colour scale from -20 to 140]

    PART 2

    1.

    install.packages("sp")

    library(sp)

    install.packages("raster")

    library(raster)

    install.packages("rasterVis")

    library(rasterVis)

    2. bird.clim.sp


    train.data


    train.data$RT90.O->tx

    train.data$RT90.N->ty

    train.data$temp_JJA->tdjja

    temp_jja.trend2
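The trend-surface code is cut off above and the bird.clim data are not reproduced in this transcript, so the following is a sketch on hypothetical simulated data. It assumes that a k-th order trend surface is a polynomial in the two coordinates containing all terms up to total degree k, fitted with `lm()` and `poly(..., raw = TRUE)`. Centring the coordinates and converting them to kilometres is a design choice: raw RT90-style values in metres reach the order of 10^20 when cubed, which makes the fit numerically fragile.

```r
set.seed(1)
n  <- 200
tx <- runif(n, 1300000, 1700000)   # hypothetical easting in metres
ty <- runif(n, 6500000, 7500000)   # hypothetical northing in metres

# Centre the coordinates and convert to km so cubic terms stay well conditioned
ux <- (tx - mean(tx)) / 1000
uy <- (ty - mean(ty)) / 1000

# Hypothetical summer temperature with a north-south gradient plus noise
temp <- 15 - 0.004 * uy - 0.00001 * ux^2 + rnorm(n, sd = 1)
dat  <- data.frame(ux, uy, temp)

# 1st-, 2nd- and 3rd-order trend surfaces: all monomials of ux, uy up to the degree
trend1 <- lm(temp ~ poly(ux, uy, degree = 1, raw = TRUE), data = dat)
trend2 <- lm(temp ~ poly(ux, uy, degree = 2, raw = TRUE), data = dat)
trend3 <- lm(temp ~ poly(ux, uy, degree = 3, raw = TRUE), data = dat)

# Number of coefficients, intercept included: 3, 6 and 10
n_coef <- sapply(list(trend1, trend2, trend3), function(m) length(coef(m)))
n_coef

# The residual sum of squares cannot grow as terms are added to nested models
rss <- sapply(list(trend1, trend2, trend3), deviance)
rss
```

A lower in-sample residual sum of squares for the higher orders is guaranteed, which is why the choice between them is made by cross-validation on held-out points.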


    5.

    The function predict() takes the fitted model and applies its estimated coefficients to the supplied values to calculate the new predicted data.
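A minimal sketch of this step, assuming hypothetical simulated coordinates and temperatures and a random 50-point hold-out set (the real training and test split is not shown in the transcript). `predict(model, newdata = ...)` evaluates the fitted trend surface at the coordinates of the test points; the second-order surface is written out term by term here so that each coefficient is visible.

```r
set.seed(1)
n  <- 200
ux <- runif(n, -200, 200)   # hypothetical centred easting (km)
uy <- runif(n, -500, 500)   # hypothetical centred northing (km)
temp <- 15 - 0.004 * uy - 0.00001 * ux^2 + rnorm(n, sd = 1)   # hypothetical temp_JJA
dat  <- data.frame(ux, uy, temp)

# Hold out 50 random points as test data; fit on the rest
test_idx <- sample(n, 50)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Second-order trend surface, written out term by term
trend2 <- lm(temp ~ ux + uy + I(ux^2) + I(ux * uy) + I(uy^2), data = train)

# predict() evaluates the fitted surface at the coordinates in newdata
test$predicted <- predict(trend2, newdata = test)
head(test)
```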

    temp_jja.trend1.pred

    MSE2

    MSE2 6.666005

    sum((temp.trend3$temp_JJA-temp.trend3$predicted)^2)/179->MSE3

    MSE3 6.731361
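The MSE line above is the mean of the squared prediction errors over the test points, with n = 179, where y_i is the observed temp_JJA and ŷ_i the predicted value:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad n = 179
```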

    Table 1 MSE for cross-validation of 1st-, 2nd- and 3rd-order trend surfaces

    MSE RESULTS
    MSE1 (1st order)   6.958501
    MSE2 (2nd order)   6.666005
    MSE3 (3rd order)   6.731361

    According to the mean squared error (MSE) cross-validation, the 2nd-order trend surface fits best, since it has the lowest MSE.
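The selection rule is to take the order with the smallest cross-validation MSE. Applied to the values reported in Table 1, it can be written as:

```r
# MSE values reported in Table 1 for the 1st-, 2nd- and 3rd-order trend surfaces
mse <- c(order1 = 6.958501, order2 = 6.666005, order3 = 6.731361)

best <- which.min(mse)   # position of the smallest cross-validation error
names(mse)[best]         # "order2"
```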


    PART 3

    1.

    http://www.oikos.ekol.lu.se/appendixdown/snouterdata.txt

    2.

    snout


    4.

    install.packages("pgirmess")

    snout.res


    5.

    5.A

    install.packages("spdep")

    library(spdep)

    dnearneigh(coords, 1, 1.5, row.names = NULL, longlat = NULL)

    ## Create neighbour lists for four distance intervals (w0 to w3)

    dnearneigh(coords, 0, 1, row.names = NULL, longlat = NULL)->w0

    dnearneigh(coords, 1, 1.5, row.names = NULL, longlat = NULL)->w1

    dnearneigh(coords, 1.5, 2, row.names = NULL, longlat = NULL)->w2

    dnearneigh(coords, 2, 10, row.names = NULL, longlat = NULL)->w3
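The distance bands passed to `dnearneigh()` can be illustrated without spdep. The base-R sketch below uses a hypothetical helper, `band_neighbours()`, on a hypothetical 3 x 3 grid of points spaced 1 unit apart. It treats two points as neighbours when their distance d satisfies d1 < d <= d2; this boundary rule is an assumption, and how spdep handles a point lying exactly on a bound should be checked in the documentation of the installed version.

```r
# Hypothetical helper: for each point, the other points whose Euclidean
# distance d satisfies d1 < d <= d2
band_neighbours <- function(coords, d1, d2) {
  D <- as.matrix(dist(coords))
  lapply(seq_len(nrow(coords)), function(i) which(D[i, ] > d1 & D[i, ] <= d2))
}

# Hypothetical example: a 3 x 3 grid of points spaced 1 unit apart
coords <- as.matrix(expand.grid(x = 1:3, y = 1:3))

w0 <- band_neighbours(coords, 0, 1)     # horizontal and vertical neighbours
w1 <- band_neighbours(coords, 1, 1.5)   # diagonal neighbours (distance sqrt(2))

sapply(w0, length)        # 2 3 2 3 4 3 2 3 2
sum(sapply(w0, length))   # 24 links
sum(sapply(w1, length))   # 16 links
```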

    nb2listw(w0)

    nb2listw(w1)

    Characteristics of weights list object:

    Neighbour list object:

    Number of regions: 1108

    Number of nonzero links: 4186

    Percentage nonzero weights: 0.3409728

    Average number of links: 3.777978

    Weights style: W

    Weights constants summary:

    n nn S0 S1 S2

    W 1108 1227664 1108 599.7917 4461.944

    nb2listw(w2)

    Characteristics of weights list object:

    Neighbour list object:

    Number of regions: 1108

    Number of nonzero links: 4084

    Percentage nonzero weights: 0.3326643

    Average number of links: 3.685921

    Weights style: W

    Weights constants summary:


    n nn S0 S1 S2

    W 1108 1227664 1108 618.75 4472.778

    nb2listw(w3)

    Characteristics of weights list object:

    Neighbour list object:

    Number of regions: 1108

    Number of nonzero links: 249036

    Percentage nonzero weights: 20.28535

    Average number of links: 224.7617

    Weights style: W

    Weights constants summary:

    n nn S0 S1 S2

    W 1108 1227664 1108 10.45354 4468.041

    nb2listw(w0)->sar0

    nb2listw(w1)->sar1

    nb2listw(w2)->sar2

    nb2listw(w3)->sar3
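The `Weights style: W` line in the output above means the neighbour lists are row-standardised: each neighbour of a region receives weight 1 divided by that region's number of neighbours, so every row sums to 1 and S0, the sum of all weights, equals the number of regions (1108 above), provided every region has at least one neighbour. The base-R sketch below reproduces this on a hypothetical chain of four points and also computes S1 = ½ Σ_i Σ_j (w_ij + w_ji)², the next constant in the printed summary.

```r
# Hypothetical binary neighbour matrix for four points on a line: 1-2, 2-3, 3-4
A <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Style "W": divide each row by its number of neighbours so every row sums to 1
W <- A / rowSums(A)

rowSums(W)                      # 1 1 1 1
S0 <- sum(W)                    # 4, the number of regions
S1 <- 0.5 * sum((W + t(W))^2)   # 5.5
```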

    errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar0)

    Call:

    errorsarlm(formula = snout1.1$snouter1.1 ~ snout1.1$rain + snout1.1$djungle,

    listw = sar0)

    Type: error

    Coefficients:

    lambda (Intercept) snout1.1$rain snout1.1$djungle

    0.87922189 79.91106965 -0.01964262 0.02931540

    Log likelihood: -3514.855

    errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar1)

    Call:

    errorsarlm(formula = snout1.1$snouter1.1 ~ snout1.1$rain + snout1.1$djungle,

    listw = sar1)

    Type: error

    Coefficients:


    lambda (Intercept) snout1.1$rain snout1.1$djungle

    0.81297579 81.12541938 -0.02054332 -0.01565557

    Log likelihood: -3702.271

    errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar2)

    Call:

    errorsarlm(formula = snout1.1$snouter1.1 ~ snout1.1$rain + snout1.1$djungle,

    listw = sar2)

    Type: error

    Coefficients:

    lambda (Intercept) snout1.1$rain snout1.1$djungle

    0.73214746 80.38066616 -0.02022954 0.02246367

    Log likelihood: -3853.828

    errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar3)

    Call:

    errorsarlm(formula = snout1.1$snouter1.1 ~ snout1.1$rain + snout1.1$djungle,

    listw = sar3)

    Type: error

    Coefficients:

    lambda (Intercept) snout1.1$rain snout1.1$djungle

    0.98325221 101.73633693 -0.01773770 0.08609926

    Log likelihood: -4034.266
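`errorsarlm()` fits a simultaneous autoregressive (SAR) error model (in recent R versions this function is provided by the spatialreg package rather than spdep). In its standard form the regression errors are themselves spatially autocorrelated through the weights matrix W, and λ (lambda in the output) measures the strength of that autocorrelation:

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}, \qquad
\mathbf{u} = \lambda \mathbf{W}\mathbf{u} + \boldsymbol{\varepsilon}, \qquad
\boldsymbol{\varepsilon} \sim N\left(\mathbf{0}, \sigma^2 \mathbf{I}\right)
```

Here y is snouter1.1, X holds the intercept, rain and djungle, and W is one of the row-standardised weights lists sar0 to sar3. A λ near 0 would indicate little residual spatial dependence; the fitted values range from 0.73 (sar2) to 0.98 (sar3).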

    5.B

    ## Create a SAR error model for each distance interval w0, w1, w2, w3

    errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar0)->SARLM0

    errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar1)->SARLM1

    errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar2)->SARLM2

    errorsarlm(snout1.1$snouter1.1~snout1.1$rain+snout1.1$djungle,listw=sar3)->SARLM3

    AIC(SARLM0) = 7039.71


    [Correlogram: Moran I statistic = f(distance classes). Axes: distance classes (2 to 44), Moran I statistic (-0.20 to 0.05)]

    AIC(SARLM1) = 7414.543

    AIC(SARLM2) = 7717.656

    AIC(SARLM3) = 8078.533

    SARLM0, in which neighbours lie within a distance of 0 to 1, has the lowest AIC and is therefore the best model.
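The reported AIC values can be checked from the printed log-likelihoods with AIC = -2 log L + 2k. Taking k = 5 estimated parameters (intercept, rain, djungle, λ and the error variance σ², an assumption about how the parameters are counted) reproduces all four values to within rounding. Because every model has the same number of parameters, the ranking by AIC here is simply the ranking by log-likelihood.

```r
# Log-likelihoods printed above for SARLM0 to SARLM3
loglik <- c(SARLM0 = -3514.855, SARLM1 = -3702.271,
            SARLM2 = -3853.828, SARLM3 = -4034.266)

# Assumed parameter count: intercept, rain, djungle, lambda and sigma^2
k <- 5

aic <- -2 * loglik + 2 * k
round(aic, 2)            # 7039.71 7414.54 7717.66 8078.53
names(which.min(aic))    # "SARLM0"
```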

    5.C

    correlog(coords,SARLM0$residuals,method=c('Moran'))->SARLM0CORR

    correlog(coords,SARLM1$residuals,method=c('Moran'))->SARLM1CORR

    correlog(coords,SARLM2$residuals,method=c('Moran'))->SARLM2CORR

    correlog(coords,SARLM3$residuals,method=c('Moran'))->SARLM3CORR

    plot(SARLM0CORR)

    plot(SARLM1CORR)

    plot(SARLM2CORR)

    plot(SARLM3CORR)

    Figure 1 SARLM0 correlogram d=0-1.0; Figure 2 SARLM1 correlogram d=1.0-1.5

    [Correlogram: Moran I statistic = f(distance classes). Axes: distance classes (2 to 46), Moran I statistic (-0.15 to 0.00)]


    [Correlogram: Moran I statistic = f(distance classes). Axes: distance classes (2 to 44), Moran I statistic (-0.1 to 0.4)]

    Figure 3 SARLM2 correlogram d=1.5-2.0; Figure 4 SARLM3 correlogram d=2.0-10.0

    Both the AIC values and the correlograms of the SAR residuals show that the best model is the first one, SARLM0. In the SARLM0 correlogram (Figure 1) the Moran's I values are close to 0, which indicates no spatial pattern in the residuals. In the SARLM3 correlogram (Figure 4), by contrast, Moran's I of the residuals tends towards positive values at distances between 0 and 6, so the residuals tend to form a spatial pattern. Such residual autocorrelation would not be acceptable under the independence assumption of linear regression.
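Moran's I, the statistic plotted in the correlograms, can be sketched in base R. The function below is a hypothetical helper computing the global Moran's I, I = (n / S0) Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z are the centred values; a correlogram such as the one produced by `correlog()` repeats a calculation of this kind for each distance class. The grid and values are hypothetical: a checkerboard gives I = -1 (perfect negative autocorrelation between adjacent cells), while independent noise gives a value near 0, the no-pattern case described above.

```r
# Hypothetical helper: global Moran's I for values z and a weights matrix W
morans_i <- function(z, W) {
  zc <- z - mean(z)          # centred values
  n  <- length(z)
  S0 <- sum(W)               # sum of all weights
  (n / S0) * as.numeric(t(zc) %*% W %*% zc) / sum(zc^2)
}

# Hypothetical 6 x 6 grid of unit-spaced points
coords <- as.matrix(expand.grid(x = 1:6, y = 1:6))
D <- as.matrix(dist(coords))

# Binary weights for one distance class: 0 < d <= 1 (adjacent cells)
W <- (D > 0 & D <= 1) * 1

# A checkerboard is perfectly negatively autocorrelated between adjacent cells
checker <- ifelse((coords[, "x"] + coords[, "y"]) %% 2 == 0, 1, -1)
morans_i(checker, W)   # -1

# Independent noise: no spatial pattern, so I is close to 0
set.seed(1)
noise_i <- morans_i(rnorm(nrow(coords)), W)
noise_i
```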

    [Correlogram: Moran I statistic = f(distance classes). Axes: distance classes (2 to 44), Moran I statistic (-0.20 to 0.10)]