stat modelling assignment 5

Uploaded by: fariha-ahmad

Posted on: 02-Mar-2018


  • 7/26/2019 Stat Modelling Assignment 5


    STQS 2234

    STATISTICAL MODELLING

    TITLE

    ASSIGNMENT 5

    PREPARED FOR

    DR. MARINA BINTI ZAHARI

    GROUP 7

    NORAINSYIRAH BINTI MOHAMED NORDIN A151588

    NUR SYAHIDAH BINTI KHALAPIAH A150105

    NUR NADIRAH BINTI MOHAMAD YOHYI A151055

    SITI NUR ZAWANIE BINTI MD SOBRI A149121

    NUR AZILA BINTI BAHARUDDIN A148328

    NURFARIHA NADHIRAH BINTI AHMAD A151675

    NOORMARINA BINTI MOKHTAR A151110


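    QUESTION 1:

    1. Consider the multiple linear regression model $y = X\beta + \varepsilon$. If $\hat{\beta}$ denotes the least squares estimator of $\beta$, show that $\hat{\beta} = \beta + [(X'X)^{-1}X']\varepsilon$.

    The least squares estimator minimizes SSE, where $e = y - X\beta$:

    $SSE = \sum_{i=1}^{n} e_i^2 = e'e = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta$

    Setting $\partial SSE / \partial \beta = -2X'y + 2X'X\beta = 0$ gives the normal equations $X'X\hat{\beta} = X'y$, from which the fitted model $\hat{y} = X\hat{\beta}$ is derived with

    $\hat{\beta} = (X'X)^{-1}X'y$   (1)

    From the multiple linear regression model, $y = X\beta + \varepsilon$. Substituting this into (1):

    $\hat{\beta} = (X'X)^{-1}X'(X\beta + \varepsilon) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon = \beta + [(X'X)^{-1}X']\varepsilon$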
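    The identity can be checked numerically. The sketch below is not part of the original assignment: it simulates a hypothetical design matrix, an assumed coefficient vector and an assumed error vector, computes $\hat{\beta}$ from the normal equations, and compares it with $\beta + (X'X)^{-1}X'\varepsilon$.

```python
import numpy as np

# Hypothetical simulated data (not from the assignment): intercept column plus k regressors
rng = np.random.default_rng(0)
n, k = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -0.5, 0.3])   # assumed "true" coefficient vector
eps = rng.normal(scale=0.5, size=n)       # assumed error vector
y = X @ beta + eps                        # the model y = X beta + eps

XtX = X.T @ X
# Least squares estimator from the normal equations (X'X) beta_hat = X'y
beta_hat = np.linalg.solve(XtX, X.T @ y)
# Right-hand side of the identity: beta + (X'X)^{-1} X' eps
rhs = beta + np.linalg.solve(XtX, X.T @ eps)

print(np.allclose(beta_hat, rhs))  # True (equal up to floating-point rounding)
```

    Solving the linear system instead of forming $(X'X)^{-1}$ explicitly is a numerical convenience; algebraically it is the same estimator.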


    3(e) [Figure: residual plots: i. residuals versus $x_1$; ii. residuals versus $x_2$; iii. residuals versus $x_3$; iv. residuals versus $x_4$]


    QUESTION 2:


    QUESTION 3:

    a) $H_0: \beta_1 = \beta_2 = \dots = \beta_5 = 0$

    $H_1: \beta_j \neq 0$ for at least one $j$, $j = 1, 2, \dots, 5$

    Test statistic: $f_0 = 4.81$

    Compare: $f_0 = 4.81 > f_{0.01,5,19} = 4.17$, and p-value $= 0.0052 < \alpha = 0.01$.

    Here, we reject $H_0$. We conclude that $y$ is linearly related to at least one of the regressors $x_1, x_2, x_3, x_4$ and $x_5$.
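    The overall F statistic compares the regression and error mean squares, $f_0 = (SS_R/k)/(SS_E/(n-p))$ with $p = k + 1$. The sketch below computes it for a hypothetical simulated data set with the same shape as part (a) (25 observations, 5 regressors, so 5 and 19 degrees of freedom); the data and coefficients are assumptions, and only the critical value 4.17 is taken from the answer above.

```python
import numpy as np

def overall_f_test(X, y):
    """Return (f0, SSR, SSE) for H0: beta_1 = ... = beta_k = 0.

    X must already contain a leading column of ones (the intercept).
    """
    n, p = X.shape                          # p = k + 1 parameters
    k = p - 1
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ beta_hat
    sse = np.sum((y - y_hat) ** 2)          # error sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
    f0 = (ssr / k) / (sse / (n - p))
    return f0, ssr, sse

# Hypothetical data with n = 25 and k = 5
rng = np.random.default_rng(1)
n, k = 25, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.8, 0.6, 0.5, 0.7, 0.0]) + rng.normal(size=n)

f0, ssr, sse = overall_f_test(X, y)
F_CRIT = 4.17   # f_{0.01,5,19}, as quoted in part (a)
print(f0, f0 > F_CRIT)
```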

    $H_0: \beta_1 = 0$

    $H_1: \beta_1 \neq 0$

    Test statistic: $t_0 = 2.47$

    Compare: $t_0 = 2.47 > t_{0.05/2,19} = 2.093$, and p-value $= 0.023 < \alpha = 0.05$.

    With $\alpha = 0.05$, we reject the null hypothesis. This indicates that $x_1$ contributes significantly to the model.
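    Each individual test uses $t_0 = \hat{\beta}_j / se(\hat{\beta}_j)$, where $se(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 C_{jj}}$, $C = (X'X)^{-1}$ and $\hat{\sigma}^2 = MS_E$. A minimal sketch on hypothetical simulated data (same shape as part (a)); the function name and data are illustrative assumptions.

```python
import numpy as np

def coefficient_t_stats(X, y):
    """t0_j = beta_hat_j / se(beta_hat_j), with se^2 = sigma_hat^2 * C_jj and C = (X'X)^{-1}."""
    n, p = X.shape
    C = np.linalg.inv(X.T @ X)
    beta_hat = C @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)     # MSE, the estimate of sigma^2
    se = np.sqrt(sigma2_hat * np.diag(C))    # standard error of each coefficient
    return beta_hat / se

# Hypothetical data: 25 observations, intercept + 5 regressors -> 19 error df
rng = np.random.default_rng(2)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
y = X @ np.array([1.0, 0.8, 0.6, 0.5, 0.7, 0.0]) + rng.normal(size=n)

t0 = coefficient_t_stats(X, y)
T_CRIT = 2.093                    # t_{0.025,19}, as used in part (a)
print(np.abs(t0[1:]) > T_CRIT)    # one reject / fail-to-reject flag per beta_1..beta_5
```

    With a single regressor, $t_0^2$ for the slope equals the overall $f_0$, which is a convenient sanity check.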

    $H_0: \beta_2 = 0$

    $H_1: \beta_2 \neq 0$

    Test statistic: $t_0 = 2.74$

    Compare: $t_0 = 2.74 > t_{0.05/2,19} = 2.093$, and p-value $= 0.013 < \alpha = 0.05$.

    With $\alpha = 0.05$, we reject the null hypothesis. This indicates that $x_2$ contributes significantly to the model.


    $H_0: \beta_3 = 0$

    $H_1: \beta_3 \neq 0$

    Test statistic: $t_0 = 2.42$

    Compare: $t_0 = 2.42 > t_{0.05/2,19} = 2.093$, and p-value $= 0.026 < \alpha = 0.05$.

    With $\alpha = 0.05$, we reject the null hypothesis. This indicates that $x_3$ contributes significantly to the model.

    $H_0: \beta_4 = 0$

    $H_1: \beta_4 \neq 0$

    Test statistic: $t_0 = 2.79$

    Compare: $t_0 = 2.79 > t_{0.05/2,19} = 2.093$, and p-value $= 0.012 < \alpha = 0.05$.

    With $\alpha = 0.05$, we reject the null hypothesis. This indicates that $x_4$ contributes significantly to the model.

    $H_0: \beta_5 = 0$

    $H_1: \beta_5 \neq 0$

    Test statistic: $t_0 = 0.25$

    Compare: $t_0 = 0.25 < t_{0.05/2,19} = 2.093$, and p-value $= 0.801 > \alpha = 0.05$.

    With $\alpha = 0.05$, we fail to reject the null hypothesis. This indicates that $x_5$ does not contribute significantly and could be deleted from the model.
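    The five decisions above can be summarized in a few lines. The sketch below uses only the $t_0$ values and p-values reported in part (a) and checks that the critical-value rule and the p-value rule agree for every predictor; the helper name is hypothetical.

```python
# t statistics and p-values for beta_1..beta_5 as reported in part (a)
T_CRIT = 2.093   # t_{0.025,19}
ALPHA = 0.05
results = {
    "x1": (2.47, 0.023),
    "x2": (2.74, 0.013),
    "x3": (2.42, 0.026),
    "x4": (2.79, 0.012),
    "x5": (0.25, 0.801),
}

def significant(t0, p_value, t_crit=T_CRIT, alpha=ALPHA):
    """Reject H0: beta_j = 0 when |t0| > t_crit (equivalently, p-value < alpha)."""
    by_t = abs(t0) > t_crit
    by_p = p_value < alpha
    assert by_t == by_p, "critical-value and p-value rules should agree"
    return by_t

keep = [name for name, (t0, p) in results.items() if significant(t0, p)]
drop = [name for name in results if name not in keep]
print("keep:", keep)   # keep: ['x1', 'x2', 'x3', 'x4']
print("drop:", drop)   # drop: ['x5']
```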

    b) When $x_5$ was excluded from the model and the model was refitted, the refitted model was better than the model that included $x_5$ (it has a higher adjusted $R^2$, as noted in (d)).

    c) Some of the residuals are large in magnitude, and the corresponding observations are outliers.
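    One common way to decide which residuals are "large" is to standardize them, $d_i = e_i / \sqrt{MS_E}$, and flag observations with $|d_i| > 3$. This rule of thumb is an assumption here; the assignment does not state which criterion was used. The sketch plants one outlier in hypothetical simulated data.

```python
import numpy as np

def standardized_residuals(X, y):
    """d_i = e_i / sqrt(MSE), where e = y - X beta_hat."""
    n, p = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta_hat
    mse = e @ e / (n - p)
    return e / np.sqrt(mse)

def flag_outliers(d, cutoff=3.0):
    """Indices whose |d_i| exceeds the cutoff (3 is a common rule of thumb)."""
    return np.flatnonzero(np.abs(d) > cutoff)

# Hypothetical data with one planted outlier at index 7
rng = np.random.default_rng(3)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
y[7] += 10.0
d = standardized_residuals(X, y)
print(flag_outliers(d))
```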


    d) The $R^2_{adj}$ in (a) is the lowest compared with the $R^2_{adj}$ in (b) and (c). The $R^2_{adj}$ for (b), where $x_5$ is removed and all 25 observations are used, is higher than in (a) but lower than in (c). For (c), where $x_5$ is removed and 24 observations are used, the $R^2_{adj}$ is the highest. This shows that $R^2_{adj}$ increases from (a) to (b) to (c). This is because the variables remaining in the model are all useful to the model.
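    Adjusted $R^2$ penalizes the number of parameters $p$: $R^2_{adj} = 1 - \frac{SS_E/(n-p)}{SS_T/(n-1)}$. Dropping a regressor that barely reduces $SS_E$ can therefore raise $R^2_{adj}$ even though $R^2$ falls slightly. The sums of squares in the sketch are hypothetical illustration values, not the assignment's output.

```python
def r2_adj(sse, sst, n, p):
    """Adjusted R^2 = 1 - (SSE/(n-p)) / (SST/(n-1)); p counts parameters incl. the intercept."""
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))

# Hypothetical sums of squares for n = 25 observations
SST = 100.0
with_x5 = r2_adj(sse=40.0, sst=SST, n=25, p=6)      # R^2 = 0.600
without_x5 = r2_adj(sse=40.5, sst=SST, n=25, p=5)   # R^2 = 0.595
print(round(with_x5, 4), round(without_x5, 4))      # 0.4947 0.514
```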

    e)

    Residuals versus $x_1$

    - The points are randomly scattered, but more of them lie below zero, in the negative region. A negative residual $e_i = y_i - \hat{y}_i < 0$ means the fitted value exceeds the observed value, so the model over-predicts these observations. There are also some outliers.

    Residuals versus $x_2$

    - The points are randomly scattered, but more of them lie below zero, in the negative region, so the model over-predicts these observations. There are also some outliers. The points at the bottom left lie close to each other.

    Residuals versus $x_3$

    - The points are randomly scattered, but more of them lie below zero, in the negative region, so the model over-predicts these observations. There are also some outliers. Most of the points in the negative region share the same $x_3$ value.

    Residuals versus $x_4$

    - The points are randomly scattered and are spread roughly evenly between the positive and negative regions. There are also some outliers.
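    The over-prediction reading of the plots rests on the sign of $e_i = y_i - \hat{y}_i$. A small sketch that tallies residual signs; the observed and fitted values are hypothetical, not taken from the assignment.

```python
import numpy as np

def residual_sign_summary(y, y_hat):
    """e_i = y_i - yhat_i; e_i < 0 means the model over-predicts observation i."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return {
        "negative (over-predicted)": int(np.sum(e < 0)),
        "positive (under-predicted)": int(np.sum(e > 0)),
        "zero": int(np.sum(e == 0)),
    }

# Hypothetical observed and fitted values
summary = residual_sign_summary([3, 5, 4, 6], [3.5, 5.2, 3.8, 6.4])
print(summary)
# {'negative (over-predicted)': 3, 'positive (under-predicted)': 1, 'zero': 0}
```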


    f) $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$

    $H_1: \beta_j \neq 0$ for at least one $j$, $j = 1, 2, 3, 4$

    Test statistic: $f_0 = 21.79$

    Compare: $f_0 = 21.79 > f_{0.05,4,20} = 2.87$

    Here, we reject $H_0$. We conclude that $y$ is linearly related to at least one of the regressors $x_1, x_2, x_3$ and $x_4$.
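    The overall F statistic can also be written in terms of $R^2$: $f_0 = \frac{R^2/k}{(1-R^2)/(n-k-1)}$, which inverts to $R^2 = \frac{k f_0}{k f_0 + n - k - 1}$. Assuming $n = 25$ (implied by the 20 error degrees of freedom in $f_{0.05,4,20}$) and $k = 4$, the reported $f_0 = 21.79$ corresponds to $R^2 \approx 0.813$; this is derived arithmetic, not a value printed in the assignment.

```python
def f0_from_r2(r2, n, k):
    """Overall F statistic: f0 = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

def r2_from_f0(f0, n, k):
    """Inverse of the formula above: R^2 = k f0 / (k f0 + n - k - 1)."""
    return k * f0 / (k * f0 + n - k - 1)

# Assumed n = 25 and k = 4, with f0 = 21.79 from part (f)
r2 = r2_from_f0(21.79, n=25, k=4)
print(round(r2, 3))   # 0.813
```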

    $H_0: \beta_1 = 0$

    $H_1: \beta_1 \neq 0$

    Test statistic: $t_0 = 5.76$

    Compare: $t_0 = 5.76 > t_{0.05/2,20} = 2.086$

    With $\alpha = 0.05$, we reject the null hypothesis. This indicates that $x_1$ contributes significantly to the model.
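    Because $H_1: \beta_j \neq 0$ is two-sided, the critical value is the upper $\alpha/2$ point, $t_{0.025,20} \approx 2.086$; the one-sided point $t_{0.05,20} \approx 1.725$ is not the right cut-off for these tests. To check table values without a statistics library, the sketch below integrates the Student $t$ density with Simpson's rule and finds the quantile by bisection. It is an illustrative implementation, not the method used in the assignment.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_upper_tail(x, nu, steps=2000):
    """P(T > x) for x >= 0: Simpson's rule on [0, x], then symmetry (P(T > 0) = 0.5)."""
    h = x / steps
    s = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - s * h / 3

def t_critical(alpha, nu):
    """Upper-alpha point t_{alpha,nu}, found by bisection on the tail probability."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, nu) > alpha:   # tail still too large: quantile lies further right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.025, 20), 3))   # two-sided, alpha = 0.05: about 2.086
print(round(t_critical(0.05, 20), 3))    # one-sided, alpha = 0.05: about 1.725
```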

    $H_0: \beta_2 = 0$

    $H_1: \beta_2 \neq 0$

    Test statistic: $t_0 = 5.96$

    Compare: $t_0 = 5.96 > t_{0.05/2,20} = 2.086$

    With $\alpha = 0.05$, we reject the null hypothesis. This indicates that $x_2$ contributes significantly to the model.


    $H_0: \beta_3 = 0$

    $H_1: \beta_3 \neq 0$

    Test statistic: $t_0 = 2.90$

    Compare: $t_0 = 2.90 > t_{0.05/2,20} = 2.086$

    With $\alpha = 0.05$, we reject the null hypothesis. This indicates that $x_3$ contributes significantly to the model.

    $H_0: \beta_4 = 0$

    $H_1: \beta_4 \neq 0$

    Test statistic: $t_0 = 4.99$

    Compare: $t_0 = 4.99 > t_{0.05/2,20} = 2.086$

    With $\alpha = 0.05$, we reject the null hypothesis. This indicates that $x_4$ contributes significantly to the model.

    g) The residual plots against $x_1, x_2, x_3$ and $x_4$ are bounded between -1 and 1 and show no pattern. Each plot contains an outlier. In all plots, the points are distributed symmetrically and most of them lie near zero.


    QUESTION 4:

    Based on the $R^2$ criterion, the best model is the model with the two predictors PctComp and PctTD, because $R^2$ increases substantially, jumping from 64.8 to 85.1, when this model is reached.

    Based on the adjusted $R^2$ and MSE criteria, the best model is the model with the seven predictors Att, PctComp, Yds, YdsperAtt, TD, PctTD and PctInt, as this model has the largest adjusted $R^2$ (100.0) and the smallest value (5.1) under the MSE criterion.

    Based on the $C_p$ criterion, there are eight possible best models:

    i. the model with 6 predictors containing Att, PctComp, Yds, YdsperAtt, PctTD and PctInt;

    ii. the model with 6 predictors containing Att, Comp, PctComp, YdsperAtt, PctTD and PctInt;

    iii. the model with 7 predictors containing Att, PctComp, Yds, YdsperAtt, TD, PctTD and PctInt;

    iv. the model with 7 predictors containing Att, PctComp, Yds, YdsperAtt, PctTD, Int and PctInt;

    v. the model with 8 predictors containing Att, PctComp, Yds, YdsperAtt, TD, PctTD, Int and PctInt;

    vi. the model with 8 predictors containing Att, Comp, PctComp, Yds, YdsperAtt, TD, PctTD and PctInt;

    vii. the model with 9 predictors containing Att, PctComp, Yds, YdsperAtt, TD, PctTD, Lng, Int and PctInt;

    viii. the model with 9 predictors containing Att, Comp, PctComp, Yds, YdsperAtt, TD, PctTD, Int and PctInt.

    All of these models are regarded as unbiased, because their $C_p$ values are equal to (or below) the number of parameters, $p$.
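    The criteria used in Question 4 can be reproduced with a best-subsets sketch. It assumes the usual definitions: $C_p = SS_E(p)/\hat{\sigma}^2 - n + 2p$ with $\hat{\sigma}^2$ taken as the full-model MSE, and $R^2_{adj} = 1 - MS_E/(SS_T/(n-1))$, so the largest adjusted $R^2$ and the smallest MSE always select the same model. The data and the regressor names "a" to "d" are hypothetical; the assignment's own output was presumably produced by a statistics package, and this is not a reconstruction of it.

```python
import itertools
import numpy as np

def sse_of(X, y):
    """Error sum of squares of the least squares fit of y on X."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta_hat
    return float(r @ r)

def best_subsets(Z, y, names):
    """R^2, adjusted R^2, MSE and Mallows' C_p for every non-empty subset of the columns of Z.

    Z holds the candidate regressors (no intercept column); every model gets an intercept.
    sigma^2 is estimated by the MSE of the full model.
    """
    n, K = Z.shape
    ones = np.ones((n, 1))
    sst = float(np.sum((y - y.mean()) ** 2))
    mse_full = sse_of(np.hstack([ones, Z]), y) / (n - K - 1)
    rows = []
    for size in range(1, K + 1):
        for cols in itertools.combinations(range(K), size):
            p = size + 1                      # parameters, including the intercept
            sse = sse_of(np.hstack([ones, Z[:, list(cols)]]), y)
            rows.append({
                "vars": [names[c] for c in cols],
                "p": p,
                "R2": 1 - sse / sst,
                "R2_adj": 1 - (sse / (n - p)) / (sst / (n - 1)),
                "MSE": sse / (n - p),
                "Cp": sse / mse_full - n + 2 * p,
            })
    return rows

# Hypothetical data: y depends on the first two of four candidate regressors
rng = np.random.default_rng(4)
n = 30
Z = rng.normal(size=(n, 4))
y = 2.0 + 1.5 * Z[:, 0] - 1.0 * Z[:, 1] + rng.normal(size=n)
rows = best_subsets(Z, y, ["a", "b", "c", "d"])
for r in rows:
    if r["Cp"] <= r["p"]:                     # candidate "unbiased" models
        print(r["vars"], r["p"], round(r["Cp"], 2))
```

    By construction the full model always has $C_p = p$, so it always passes the $C_p \le p$ screen; the criterion is most informative for the smaller subsets.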


    QUESTION 5: