02 Chapter ALR for Printing: lecture overheads for Chapter 2 (Simple Linear Regression) of Weisberg's Applied Linear Regression (ALR)


Introduction to Simple Linear Regression: I

consider a response Y and a predictor X; the model for simple linear regression assumes a mean function E(Y | X = x) of the form \beta_0 + \beta_1 x and a variance function Var(Y | X = x) that is constant:

    E(Y | X = x) = \beta_0 + \beta_1 x   and   Var(Y | X = x) = \sigma^2

the 3 parameters are \beta_0 (intercept), \beta_1 (slope) & \sigma^2 > 0

to interpret \sigma^2, define the random variable e = Y - E(Y | X = x), so that Y = E(Y | X = x) + e

results in A.2.4 of ALR say that E(e) = 0 and Var(e) = \sigma^2

\sigma^2 tells how close Y is likely to be to its mean value E(Y | X = x)

ALR 21, 292 | II.1

Introduction to Simple Linear Regression: II

let (x_i, y_i), i = 1, ..., n, denote predictor/response pairs (X, Y)

e_i = y_i - E(Y | X = x_i) = y_i - (\beta_0 + \beta_1 x_i) is called a statistical error and represents the distance between y_i and its mean function

we will make two additional assumptions about the errors:

[1] E(e_i | x_i) = 0, implying a scatterplot of e_i versus x_i should resemble a null plot (random deviations about zero)

[2] e_1, ..., e_n are a set of n independent random variables (RVs)

we will make a third additional assumption upon occasion:

[3] conditional on the x_i's, the errors e_i are normally distributed

note: normally distributed is the same as Gaussian distributed (the preferred expression in engineering & the physical sciences)

ALR 21, 29 | II.2

Estimation of Model Parameters: I

Q: given data (x_i, y_i), i = 1, ..., n (n realizations of the RVs X and Y), how should we determine the parameters \beta_0 & \beta_1?

since \beta_0 & \beta_1 determine a line, the question is equivalent to deciding how best to draw a line through a scatterplot of (x_i, y_i)

for n > 2, possibilities for defining "best" (lots more exist!):

hire an expert to eyeball a line (Mosteller et al., 1981)

find the line minimizing distances between the data and all possible lines, with some considerations being direction (vertical, horizontal, perpendicular) and squared difference, absolute difference, etc.

look at all possible lines determined by two distinct points in the scatterplot, and pick the one with the median slope (sounds bizarre, but later on we will discuss why this might be of interest)

ALR 22, 23 | II.3

[Figure II.5: scatterplot of y_i versus x_i illustrating vertical, horizontal & perpendicular least squares.]

Estimation of Model Parameters: II

one strategy: form to-be-defined estimators \hat\beta_0 & \hat\beta_1 of \beta_0 & \beta_1, after which form the residuals (observed errors):

    \hat e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i) = y_i - \hat y_i,  where \hat y_i = \hat\beta_0 + \hat\beta_1 x_i is the fitted value for the ith case

Q: why isn't the residual \hat e_i in general equal to the error e_i = y_i - (\beta_0 + \beta_1 x_i)?

Q: if per chance we had \hat\beta_0 = \beta_0 & \hat\beta_1 = \beta_1, would the fitted value \hat y_i be equal to the actual value y_i?

ALR 22 | II.6

Least Squares Criterion: I

least squares scheme: estimate \beta_0 & \beta_1 such that the sum of squares of the resulting residuals is as small as possible

since the residuals are given by \hat e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i) once \hat\beta_0 & \hat\beta_1 are known, consider

    RSS(b_0, b_1) = \sum_{i=1}^n [y_i - (b_0 + b_1 x_i)]^2,

i.e., the residual sum of squares when we use b_0 for the intercept and b_1 for the slope

the least squares estimators \hat\beta_0 & \hat\beta_1 are such that

    RSS(\hat\beta_0, \hat\beta_1) < RSS(b_0, b_1)

when either b_0 ≠ \hat\beta_0 or b_1 ≠ \hat\beta_1 (or both)

ALR 24 | II.7

Least Squares Criterion: II

Q: how do we set b_0 & b_1 to make RSS(b_0, b_1) the smallest? could try lots of different values (a grid search, a potentially exhausting task!), but we can put calculus to good use here

to motivate how to find \hat\beta_0 & \hat\beta_1, first consider the simpler mean function E(Y | X = x) = \beta_1 x (regression through the origin)

the model is now Y = \beta_1 x + e, and the task is to find the b_1 minimizing

    RSS(b_1) = \sum_{i=1}^n [y_i - b_1 x_i]^2 = \sum_{i=1}^n ( y_i^2 - 2 b_1 x_i y_i + b_1^2 x_i^2 )

Q: why is the b_1 minimizing RSS(b_1) the same as the z minimizing

    f(z) = a z^2 + b z,  where  a = \sum_{i=1}^n x_i^2  and  b = -2 \sum_{i=1}^n x_i y_i ?

ALR 24, 47, 48 | II.8

Least Squares Criterion: III

since a = \sum_i x_i^2 > 0, f(z) = a z^2 + b z → ∞ as z → ±∞

since f(0) = 0, the minimizer z must be such that f(z) ≤ 0

[Figure II.9: plot of f(z) versus z, an upward-opening parabola passing through the origin.]

Least Squares Criterion: IV

the roots of the polynomial a z^2 + b z + c are given by the quadratic formula:

    ( -b ± sqrt(b^2 - 4ac) ) / (2a)

when c = 0, one root is 0 and the nonzero root is -b/a, so the minimizer (the vertex of the upward-opening parabola, midway between the two roots) is

    z = -b/(2a) = \sum_i x_i y_i / \sum_i x_i^2 = \hat\beta_1    (*)

alternative approach to finding the minimizer of RSS(b_1): differentiate with respect to b_1, set the result to 0 and solve for b_1:

    d RSS(b_1)/d b_1 = d/d b_1 \sum_i [y_i - b_1 x_i]^2 = -2 \sum_i x_i (y_i - b_1 x_i) = 0,

which yields the same expression for \hat\beta_1 as stated in (*)

ALR 47, 48 | II.10
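A quick numerical check of (*), written as a minimal R sketch; it assumes numeric vectors x and y holding predictor and response values (generic names, not from the slides):

    # least squares slope for regression through the origin
    b1.origin <- sum(x * y) / sum(x^2)
    # the built-in fit without an intercept gives the same slope:
    coef(lm(y ~ x - 1))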

Least Squares Criterion: V

return now to the mean function E(Y | X = x) = \beta_0 + \beta_1 x, for which RSS(b_0, b_1) = \sum_i [y_i - b_0 - b_1 x_i]^2

a calculus-based approach to get the least squares estimators \hat\beta_0 & \hat\beta_1 follows a path similar to that for E(Y | X = x) = \beta_1 x

it leads to two equations to solve for the two unknowns (\hat\beta_0 & \hat\beta_1)

differentiate RSS(b_0, b_1) with respect to b_0 and set the result to 0:

    -2 \sum_i (y_i - b_0 - b_1 x_i) = 0,  giving  b_0 n + b_1 \sum_i x_i = \sum_i y_i

differentiate RSS(b_0, b_1) with respect to b_1 and set the result to 0:

    -2 \sum_i x_i (y_i - b_0 - b_1 x_i) = 0,  giving  b_0 \sum_i x_i + b_1 \sum_i x_i^2 = \sum_i x_i y_i

ALR 293 | II.11

Least Squares Criterion: VI

the so-called normal equations for simple linear regression are thus

    b_0 n + b_1 \sum_i x_i = \sum_i y_i   and   b_0 \sum_i x_i + b_1 \sum_i x_i^2 = \sum_i x_i y_i

using \bar x = (1/n) \sum_i x_i & \bar y = (1/n) \sum_i y_i, the 1st normal equation gives

    b_0 = \bar y - b_1 \bar x

replace b_0 in the 2nd normal equation with the right-hand side of the above:

    (\bar y - b_1 \bar x) \sum_i x_i + b_1 \sum_i x_i^2 = \sum_i x_i y_i

after a bit of algebra, we get

    b_1 = ( \sum_i x_i y_i - \bar y \sum_i x_i ) / ( \sum_i x_i^2 - \bar x \sum_i x_i ) = ( \sum_i x_i y_i - n \bar x \bar y ) / ( \sum_i x_i^2 - n \bar x^2 ) = \hat\beta_1,

and hence \hat\beta_0 = \bar y - \hat\beta_1 \bar x

ALR 293, 294 | II.12

Sum of Cross Products and Sum of Squares

define the sum of cross products and the sum of squares for the x's:

    SXY = \sum_i (x_i - \bar x)(y_i - \bar y)   &   SXX = \sum_i (x_i - \bar x)^2    (**)

Problem 1: show that

    \sum_i (x_i - \bar x)(y_i - \bar y) = \sum_i x_i y_i - n \bar x \bar y   &   \sum_i (x_i - \bar x)^2 = \sum_i x_i^2 - n \bar x^2

can thus write

    \hat\beta_1 = ( \sum_i x_i y_i - n \bar x \bar y ) / ( \sum_i x_i^2 - n \bar x^2 ) = SXY/SXX

note: should avoid \sum_i x_i y_i - n \bar x \bar y & \sum_i x_i^2 - n \bar x^2 when actually computing \hat\beta_1; use SXY and SXX from (**) instead

ALR 294, 23 | II.13

Sufficient Statistics

since \hat\beta_1 = SXY/SXX and \hat\beta_0 = \bar y - \hat\beta_1 \bar x, we need only know \bar x, \bar y, SXY & SXX to form \hat\beta_0 & \hat\beta_1

since

    \bar x = (1/n) \sum_{i=1}^n x_i,  \bar y = (1/n) \sum_{i=1}^n y_i,  SXY = \sum_{i=1}^n x_i y_i - n \bar x \bar y,  SXX = \sum_{i=1}^n x_i^2 - n \bar x^2,

it follows that \hat\beta_0 & \hat\beta_1 depend only on four sufficient statistics:

    \sum_{i=1}^n x_i,  \sum_{i=1}^n y_i,  \sum_{i=1}^n x_i y_i  and  \sum_{i=1}^n x_i^2

in theory, can dispense with the 2n values (x_i, y_i), i = 1, ..., n, and just keep the 4 sufficient statistics as far as \hat\beta_0 & \hat\beta_1 are concerned

ALR 294, 23 | II.14

Atmospheric Pressure & Boiling Point of Water

as a 1st example, reconsider Forbes's recordings of atmospheric pressure and the boiling point of water, which physics suggests are related by

    log(pressure) = \beta_0 + \beta_1 × boiling point

taking the response Y to be log10(pressure) and the predictor X to be boiling point, we will estimate \beta_0 & \beta_1 for the model

    Y = \beta_0 + \beta_1 X + e

via least squares based upon the data (x_i, y_i), i = 1, ..., 17

taking log to mean log base 10, computations in R yield

    \bar x = 202.9529,  \bar y = 1.396041,  SXY = 4.753781  &  SXX = 530.7824,

from which we get

    \hat\beta_1 = SXY/SXX = 0.008956178   and   \hat\beta_0 = \bar y - \hat\beta_1 \bar x = -0.4216418

ALR 25, 26 | II.15
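The quantities on this slide can be reproduced with a short R sketch. The data frame and column names below (forbes, bp, pres) are illustrative assumptions, not from the slides; any data frame holding Forbes's 17 boiling points and pressures will do. Later sketches in this transcript reuse x, y, n, xbar, ybar, SXX, SXY, beta0.hat, beta1.hat and fit from this block.

    # assumed data frame `forbes` with columns `bp` (boiling point) and `pres` (pressure)
    x <- forbes$bp
    y <- log10(forbes$pres)               # response is log base 10 of pressure
    n <- length(y)
    xbar <- mean(x); ybar <- mean(y)
    SXX <- sum((x - xbar)^2)
    SXY <- sum((x - xbar) * (y - ybar))
    beta1.hat <- SXY / SXX                # slope estimate
    beta0.hat <- ybar - beta1.hat * xbar  # intercept estimate
    # the built-in least squares fit gives the same estimates:
    fit <- lm(y ~ x)
    coef(fit)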

Predicting the Weather

as a 2nd example, reconsider the n = 93 years of measured early/late season snowfalls from Fort Collins, Colorado

taking the response Y and predictor X to be late season (Jan-June) and early season (Sept-Dec) snowfalls, entertain the model

    Y = \beta_0 + \beta_1 X + e

computations in R yield

    \bar x = 16.74409,  \bar y = 32.04301,  SXY = 2229.014  &  SXX = 10954.07,

from which we get

    \hat\beta_1 = SXY/SXX = 0.2034873   and   \hat\beta_0 = \bar y - \hat\beta_1 \bar x = 28.6358

ALR 8, 9 | II.17

[Figure II.18: late snowfall (inches) versus early snowfall (inches), Fort Collins data. ALR 8]

Sample Variances, Covariance and Correlation

define the sample variance and sample standard deviation of the x's:

    SD_x^2 = \sum_i (x_i - \bar x)^2 / (n - 1) = SXX/(n - 1)   &   SD_x = sqrt( SXX/(n - 1) )

note: sometimes n is used in place of n - 1 in defining SD_x^2

after defining SYY = \sum_i (y_i - \bar y)^2 (the sum of squares for the y's), define similar quantities for the y's:

    SD_y^2 = \sum_i (y_i - \bar y)^2 / (n - 1) = SYY/(n - 1)   &   SD_y = sqrt( SYY/(n - 1) )

finally define the sample covariance and then the sample correlation:

    s_xy = \sum_i (x_i - \bar x)(y_i - \bar y) / (n - 1) = SXY/(n - 1)   &   r_xy = s_xy / (SD_x SD_y)

ALR 23 | II.19
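Continuing the R sketch from the Forbes example (reusing x, y, SXY and n), the sample covariance and correlation are:

    sxy <- SXY / (n - 1)
    rxy <- sxy / (sd(x) * sd(y))     # sd() uses the n - 1 divisor
    all.equal(rxy, cor(x, y))        # agrees with the built-in correlation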

Alternative Expression for Slope Estimator

Problem 2: an alternative expression for \hat\beta_1 = SXY/SXX is

    \hat\beta_1 = r_xy SD_y / SD_x

Problem 3: show that -1 ≤ r_xy ≤ 1

note that, if the x_i's & y_i's are such that SD_y = SD_x, then the estimated slope is the same as the sample correlation, as the following set of plots illustrates

ALR 24 | II.20
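A one-line check of Problem 2 in R (continuing the earlier sketch):

    cor(x, y) * sd(y) / sd(x)        # same value as beta1.hat = SXY/SXX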

[Figure II.21: scatterplot illustrating sample correlation r_xy = 0.999.]

Estimating \sigma^2: I

the simple linear regression model has 3 parameters: \beta_0 (intercept), \beta_1 (slope) & \sigma^2 (variance of the errors)

with \beta_0 & \beta_1 estimated by \hat\beta_0 & \hat\beta_1, we will base an estimator for \sigma^2 on the variance of the residuals (observed errors)

recall the definition of the residuals: \hat e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)

in view of, e.g., SD_x^2 = \sum_i (x_i - \bar x)^2 / (n - 1), the obvious estimator of \sigma^2 would appear to be

    \sum_i (\hat e_i - \bar{\hat e})^2 / (n - 1),  where  \bar{\hat e} = (1/n) \sum_{i=1}^n \hat e_i

Problem 4: show that \bar{\hat e} = 0 always for simple linear regression

ALR 26 | II.22

Estimating \sigma^2: II

the obvious estimator thus simplifies to \sum_i \hat e_i^2 / (n - 1)

taking RSS to be shorthand for RSS(\hat\beta_0, \hat\beta_1), we have

    RSS = \sum_{i=1}^n [y_i - (\hat\beta_0 + \hat\beta_1 x_i)]^2 = \sum_{i=1}^n \hat e_i^2,

so the obvious estimator of \sigma^2 is RSS/(n - 1)

can show (e.g., Seber, 1977, p. 51) that an unbiased estimator \hat\sigma^2 of \sigma^2, i.e., E(\hat\sigma^2) = \sigma^2, is

    \hat\sigma^2 = RSS/(n - 2),

where n - 2 = sample size minus # of parameters in the mean function; the obvious estimator divides by n - 1 rather than n - 2 and hence is biased towards zero

ALR 26, 27, 306 | II.23
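Continuing the R sketch (reusing x, y, n, beta0.hat, beta1.hat and fit), the unbiased variance estimate is:

    ehat <- y - (beta0.hat + beta1.hat * x)   # residuals
    RSS <- sum(ehat^2)
    sigma2.hat <- RSS / (n - 2)
    # sigma(fit) returns the residual standard error, so sigma(fit)^2 matches sigma2.hat
    sigma(fit)^2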

[Figure II.25: log10(pressure) versus boiling point, Forbes data. ALR 6]

[Figure II.26: residual plot for the Forbes data, residuals versus boiling point. ALR 6]

Estimating \sigma^2: IV

for the Forbes data, computations in R yield

    SYY = 0.04279135,  SXY = 4.753781  &  SXX = 530.7824,

from which we get

    RSS = SYY - (SXY)^2/SXX = 0.0002156426

since n = 17 for the Forbes data,

    \hat\sigma^2 = RSS/(n - 2) = RSS/15 = 0.00001437617

the standard error of regression is

    \hat\sigma = sqrt(0.00001437617) = 0.003791592

(also called the residual standard error)

red dashed horizontal lines on the residual plot show …

ALR 26 | II.27
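The RSS shortcut used on this slide can be checked in R (continuing the sketch; SYY is introduced here):

    SYY <- sum((y - ybar)^2)
    SYY - SXY^2 / SXX                # equals the RSS computed from the residuals above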

Estimating \sigma^2: V

for the Fort Collins data, computations in R yield

    SYY = 17572.41,  SXY = 2229.014  &  SXX = 10954.07,

from which we get

    RSS = SYY - (SXY)^2/SXX = 17118.83

since n = 93 for the Fort Collins data,

    \hat\sigma^2 = RSS/(n - 2) = RSS/91 = 188.119

the standard error of regression is

    \hat\sigma = sqrt(188.119) = 13.71565

ALR 26 | II.28

[Figure II.29: late snowfall (inches) versus early snowfall (inches), Fort Collins data. ALR 8]

[Figure II.30: residual plot for the Fort Collins data, residuals versus early snowfall (inches).]

Matrix Formulation of Simple Linear Regression: I

matrix theory offers an alternative formulation for simple linear regression, with the advantage that it generalizes readily to handle multiple linear regression

start by defining an n-dimensional column vector Y containing the y_i's; an n × 2 matrix X whose 1st column consists of just 1's, and whose 2nd has the x_i's; a 2-dimensional vector \beta containing \beta_0 and \beta_1; and an n-dimensional vector e containing the e_i's:

    Y = ( y_1 )     X = ( 1  x_1 )     \beta = ( \beta_0 )     e = ( e_1 )
        ( y_2 )         ( 1  x_2 )             ( \beta_1 )         ( e_2 )
        (  ⋮  )         ( ⋮   ⋮  )                                 (  ⋮  )
        ( y_n )         ( 1  x_n )                                 ( e_n )

the matrix version of the simple linear regression model is Y = X\beta + e

ALR 63, 64, 60 | II.31

Matrix Formulation of Simple Linear Regression: II

given the definitions of X and \beta, it follows that

    X\beta = ( \beta_0 + \beta_1 x_1 )
             ( \beta_0 + \beta_1 x_2 )
             (          ⋮            )
             ( \beta_0 + \beta_1 x_n )

hence the ith row of the matrix equation Y = X\beta + e says

    y_i = \beta_0 + \beta_1 x_i + e_i,

which is consistent with the model Y = \beta_0 + \beta_1 X + e; see also II.2

let e' and X' denote the transposes of e and X; i.e., e' is an n-dimensional row vector (e_1, e_2, ..., e_n), while X' is a 2 × n matrix taking the form

    X' = ( 1    1    ...  1   )
         ( x_1  x_2  ...  x_n )

ALR 299, 300, 301 | II.32

Matrix Formulation of Simple Linear Regression: III

since e = Y - X\beta and since e'e = \sum_i e_i^2, can express the sum of squares of the errors as

    e'e = (Y - X\beta)'(Y - X\beta)

if we entertain b = (b_0, b_1)' rather than the unknown \beta = (\beta_0, \beta_1)', the corresponding residuals are given by Y - Xb, so the residual sum of squares can be written as

    RSS(b) = (Y - Xb)'(Y - Xb)
           = (Y' - b'X')(Y - Xb)
           = Y'Y - Y'Xb - b'X'Y + b'X'Xb
           = Y'Y - 2Y'Xb + b'X'Xb,

where we make use of 2 facts: (1) the transpose of a product is the product of the transposes in reverse order & (2) the transpose of a scalar is itself (hence b'X'Y = (b'X'Y)' = Y'Xb)

ALR 61, 62, 300, 301, 304 | II.33

Taking Derivatives with Respect to Vector b: I

suppose f(b) is a scalar-valued function of a vector b (elements b_1, b_2, ..., b_q)

two examples, for which a is a vector (ith element a_i) & A is a q × q matrix (element in the ith row & jth column is A_{i,j}):

    f_1(b) = a'b = \sum_{i=1}^q a_i b_i   and   f_2(b) = b'Ab = \sum_{i=1}^q \sum_{j=1}^q b_i A_{i,j} b_j

define

    df(b)/db = ( df(b)/db_1, df(b)/db_2, ..., df(b)/db_q )'

ALR 301, 304 | II.34

Taking Derivatives with Respect to Vector b: II

can show (see, e.g., Rao, 1973, p. 71) that

    f_1(b) = a'b has derivative df_1(b)/db = a

and that (for symmetric A, as X'X will be)

    f_2(b) = b'Ab has derivative df_2(b)/db = 2Ab

(not hard to show; do it for fun and games!)

Q: what is the derivative of f_3(b) = b'a?

Q: what is the derivative of f_4(b) = b'b = \sum_i b_i^2?

Q: what is the derivative of

    f_5(b) = c'Cb = \sum_{i=1}^p \sum_{j=1}^q c_i C_{i,j} b_j,

where c is a p-dimensional vector and C is a p × q matrix?

II.35

Matrix Formulation of Simple Linear Regression: IV

returning to RSS(b) = Y'Y - 2Y'Xb + b'X'Xb, taking the derivative of f(b) = RSS(b) with respect to b and setting the resulting expression to 0 (a vector of zeros) yields the matrix version of the normal equations:

    X'Xb = X'Y,

where we have made use of the facts

    d(Y'Y)/db = 0,   d(Y'Xb)/db = X'Y   and   d(b'X'Xb)/db = 2X'Xb

the least squares estimator \hat\beta of \beta is the solution to the normal equations:

    X'X\hat\beta = X'Y

ALR 304 | II.36

Matrix Formulation of Simple Linear Regression: V

let's verify that the solution to X'X\hat\beta = X'Y yields the same estimators \hat\beta_1 & \hat\beta_0 as before, namely, SXY/SXX & \bar y - \hat\beta_1 \bar x

now

    X'X = (  n          \sum_i x_i   )  =  (  n        n\bar x       )
          ( \sum_i x_i  \sum_i x_i^2 )     ( n\bar x   \sum_i x_i^2  )

and

    X'Y = ( \sum_i y_i     )  =  ( n\bar y         )
          ( \sum_i x_i y_i )     ( \sum_i x_i y_i  )

Problem 6: finish the verification!

ALR 63, 64 | II.37
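The matrix solution can be checked numerically in R (continuing the sketch; the design-matrix construction below is a standard illustration, not taken from the slides):

    # normal equations in matrix form: (X'X) beta.hat = X'Y
    X <- cbind(1, x)                  # n x 2 design matrix: a column of 1's, then the x_i's
    beta.hat <- solve(t(X) %*% X, t(X) %*% y)
    beta.hat                          # same values as beta0.hat and beta1.hat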

Properties of Least Squares Estimators: I

since E(Y | X = x) = \beta_0 + \beta_1 x, the fitted mean function is

    \hat E(Y | X = x) = \hat\beta_0 + \hat\beta_1 x,    (*)

which is a line with intercept \hat\beta_0 and slope \hat\beta_1

recalling that \hat\beta_0 = \bar y - \hat\beta_1 \bar x, start from the right-hand side of (*) with x set to \bar x to get

    \hat\beta_0 + \hat\beta_1 \bar x = \bar y - \hat\beta_1 \bar x + \hat\beta_1 \bar x = \bar y,

which says the point (\bar x, \bar y) must lie on the fitted mean function

the vertical dashed line on the following plots indicates the value of \bar x, while the horizontal dashed line, the value of \bar y

ALR 27, 28 | II.38

[Figure II.39: log10(pressure) versus boiling point, Forbes data, with dashed lines at \bar x and \bar y. ALR 6]

[Figure II.40: late snowfall (inches) versus early snowfall (inches), Fort Collins data, with dashed lines at \bar x and \bar y. ALR 8]

Properties of Least Squares Estimators: II

both \hat\beta_0 and \hat\beta_1 can be written as a linear combination of the responses y_1, y_2, ..., y_n

since \hat\beta_1 = SXY/SXX and since SXY = \sum_i (x_i - \bar x) y_i (see Problem 1), we have

    \hat\beta_1 = \sum_i (x_i - \bar x) y_i / SXX = \sum_{i=1}^n c_i y_i,  where  c_i = (x_i - \bar x)/SXX

Q: which y_i's will have the most/least influence on \hat\beta_1?

let's look at c_i plotted versus x_i for the Forbes data

ALR 27 | II.41
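The weights c_i are easy to compute in R (continuing the sketch):

    ci <- (x - xbar) / SXX            # weight attached to each response
    sum(ci * y)                       # equals beta1.hat
    plot(x, ci)                       # weights versus the predictor, as on the next overhead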

[Figure II.42: weights c_i versus boiling point, Forbes data.]

[Figure II.43: log10(pressure) versus boiling point, Forbes data. ALR 6]

Random Vectors and Their Properties: I

a column vector U is said to be a random vector if each of its elements U_i is an RV (random variable)

the expected value of a random vector, denoted by E(U), is a vector whose ith element is the expected value of the ith RV U_i in U

for example, if U = (U_1, U_2, U_3)', then E(U) = ( E(U_1), E(U_2), E(U_3) )'

if U has dimension q, if a is a p-dimensional column vector of constants and if A is a p × q matrix of constants, can show (fairly easily; give it a try!) that

    E(a + AU) = a + A E(U)

ALR 303 | II.44

Random Vectors and Their Properties: II

recall that, if U_i and U_j are two RVs, their covariance is defined to be

    Cov(U_i, U_j) = E( [U_i - E(U_i)][U_j - E(U_j)] )

note that Cov(U_j, U_i) = Cov(U_i, U_j) and that

    Cov(U_i, U_i) = E( [U_i - E(U_i)][U_i - E(U_i)] ) = E( [U_i - E(U_i)]^2 ) = Var(U_i)

by definition, the covariance matrix for a q-dimensional random vector U, to be denoted by Var(U), is the q × q matrix whose (i, j)th element is Cov(U_i, U_j)

for example, if U = (U_1, U_2, U_3)', then

    Var(U) = ( Var(U_1)        Cov(U_1, U_2)   Cov(U_1, U_3) )
             ( Cov(U_2, U_1)   Var(U_2)        Cov(U_2, U_3) )
             ( Cov(U_3, U_1)   Cov(U_3, U_2)   Var(U_3)      )

ALR 291, 292, 303 | II.45

Random Vectors and Their Properties: III

if U has dimension q, if a is a p-dimensional column vector of constants and if A is a p × q matrix of constants, can show (a bit more challenging, but still worth a try!) that

    Var(a + AU) = A Var(U) A'

the RVs in U are uncorrelated if Cov(U_i, U_j) = 0 when i ≠ j

if each U_i in U has the same variance (\sigma^2, say) and if the U_i's are uncorrelated, then Var(U) = \sigma^2 I, where I is the q × q identity matrix (1's along its diagonal and 0's elsewhere)

for this special case,

    Var(a + AU) = A(\sigma^2 I)A' = \sigma^2 AA'

ALR 304, 292 | II.46

Properties of Least Squares Estimators: III

recall that the least squares estimator \hat\beta solves the normal equations:

    X'X\hat\beta = X'Y    (*)

Problem 6: in the case of simple linear regression, X'X is an invertible matrix and thus has an inverse (X'X)^{-1} such that

    (X'X)^{-1} X'X = X'X (X'X)^{-1} = I

premultiplication of both sides of (*) by (X'X)^{-1} yields

    (X'X)^{-1} X'X \hat\beta = (X'X)^{-1} X'Y,

from which we get I\hat\beta = (X'X)^{-1} X'Y and hence

    \hat\beta = (X'X)^{-1} X'Y

the above succinctly expresses the fact that \hat\beta_0 & \hat\beta_1 (the elements of \hat\beta) are linear combinations of the y_i's (the elements of Y)

ALR 61, 64, 304 | II.47

Properties of Least Squares Estimators: IV

considering \hat\beta = (X'X)^{-1} X'Y and taking the conditional expectation of both sides yields

    E(\hat\beta | X) = E( (X'X)^{-1} X'Y | X )
                     = (X'X)^{-1} X' E(Y | X)
                     = (X'X)^{-1} X' E(X\beta + e | X)
                     = (X'X)^{-1} X' ( X\beta + E(e | X) )
                     = (X'X)^{-1} X'X \beta = \beta

Q: what's the justification for each step above?

E(\hat\beta | X) = \beta holds for all X and hence E(\hat\beta) = \beta unconditionally, from which we can conclude that \hat\beta_0 & \hat\beta_1 are unbiased estimators of \beta_0 & \beta_1: E(\hat\beta_0) = \beta_0 and E(\hat\beta_1) = \beta_1

ALR 305 | II.48

Properties of Least Squares Estimators: V

since (i) Var(a + AU) = A Var(U) A', (ii) (AB)' = B'A' and (iii) (A^{-1})' = (A')^{-1} for a square matrix, we have

    Var(\hat\beta | X) = Var( (X'X)^{-1} X'Y | X )
                       = (X'X)^{-1} X' Var(Y | X) ( (X'X)^{-1} X' )'
                       = (X'X)^{-1} X' Var(X\beta + e | X) X (X'X)^{-1}
                       = (X'X)^{-1} X' Var(e | X) X (X'X)^{-1}
                       = (X'X)^{-1} X' (\sigma^2 I) X (X'X)^{-1}
                       = \sigma^2 (X'X)^{-1} X'X (X'X)^{-1}
                       = \sigma^2 (X'X)^{-1}

Q: justification for each step above?

ALR 305 | II.49

Properties of Least Squares Estimators: VI

can readily verify that

    ( a  b )^{-1}  =  ( 1/(ad - cb) ) (  d  -b )
    ( c  d )                          ( -c   a )

since X'X = ( n, n\bar x ; n\bar x, \sum_i x_i^2 ), we get

    (X'X)^{-1} = ( 1/(n \sum_i x_i^2 - n^2 \bar x^2) ) ( \sum_i x_i^2   -n\bar x )
                                                       ( -n\bar x        n       )

since

    Var(\hat\beta | X) = ( Var(\hat\beta_0 | X)                Cov(\hat\beta_0, \hat\beta_1 | X) )  =  \sigma^2 (X'X)^{-1},
                         ( Cov(\hat\beta_0, \hat\beta_1 | X)   Var(\hat\beta_1 | X)              )

we find that, e.g.,

    Var(\hat\beta_1 | X) = \sigma^2 / ( \sum_i x_i^2 - n \bar x^2 ) = \sigma^2 / SXX

by making use of \sum_i x_i^2 - n \bar x^2 = SXX (see Problem 1)

ALR 64, 305, 28 | II.50

Properties of Least Squares Estimators: VII

Q: what happens to Var(\hat\beta_1 | X) = \sigma^2/SXX if we have the luxury of making the sample size n as large as we want?

in practice, \sigma^2 is usually unknown and must be estimated via \hat\sigma^2, leading to the following estimator for Var(\hat\beta_1 | X):

    \hat{Var}(\hat\beta_1 | X) = \hat\sigma^2/SXX

the term standard error is sometimes (but not always) used to refer to the square root of an estimated variance

the standard error of \hat\beta_1, denoted by se(\hat\beta_1), is thus \hat\sigma / sqrt(SXX)

ALR 29 | II.51
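Continuing the R sketch, the slope's standard error is:

    se.beta1 <- sqrt(sigma2.hat / SXX)
    se.beta1
    # summary(fit)$coefficients[2, "Std. Error"] reports the same quantity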

Confidence Intervals and Tests for Slope: I

assuming the errors e_i in simple linear regression to be normally distributed, the parameter estimator \hat\beta_1 for the slope \beta_1 is also normally distributed (the same holds for \hat\beta_0 also)

further assuming the errors e_i to have mean 0 and unknown variance \sigma^2, the distribution of \hat\beta_1 also depends upon the unknown \sigma^2

with \sigma^2 estimated by \hat\sigma^2, confidence intervals (CIs) and tests concerning the unknown true \beta_1 need to be based on a t-distribution with degrees of freedom in sync with the divisor used to form \hat\sigma^2

let T be a random variable with a t-distribution with d degrees of freedom, and let t(\alpha/2, d) be the percentage point such that

    Pr( T ≥ t(\alpha/2, d) ) = \alpha/2

ALR 30, 31 | II.52

Confidence Intervals and Tests for Slope: II

the plot below shows the probability density function (PDF) for the t-distribution with d = 15 degrees of freedom, with t(0.05, 15) = 1.753 marked by a vertical dashed line (thus the area under the PDF to the right of the line is 0.05, and the area to the left is 0.95)

[Figure II.53: PDF of the t-distribution with 15 degrees of freedom, vertical dashed line at 1.753.]

Confidence Intervals and Tests for Slope: III

a (1 - \alpha) × 100% CI for the slope \beta_1 is the set of points \beta_1 in the interval

    \hat\beta_1 - t(\alpha/2, n-2) se(\hat\beta_1)  ≤  \beta_1  ≤  \hat\beta_1 + t(\alpha/2, n-2) se(\hat\beta_1)

example: for the Forbes data (n = 17), \hat\beta_1 = 0.008956 and se(\hat\beta_1) = \hat\sigma/sqrt(SXX) = 0.0001646 since \hat\sigma = 0.003792 and sqrt(SXX) = 23.04, so the 90% CI for \beta_1 is

    0.008956 - 1.753 × 0.0001646  ≤  \beta_1  ≤  0.008956 + 1.753 × 0.0001646

because t(0.05, 15) = 1.753, yielding

    0.008668  ≤  \beta_1  ≤  0.009245

ALR 31 | II.54
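The same 90% interval in R (continuing the sketch):

    alpha <- 0.10
    tcrit <- qt(1 - alpha/2, df = n - 2)      # = 1.753 for n = 17
    c(beta1.hat - tcrit * se.beta1, beta1.hat + tcrit * se.beta1)
    # confint(fit, "x", level = 0.90) gives the same interval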

Confidence Intervals and Tests for Slope: IV

can test the null hypothesis \beta_1 = \beta_1^* versus the alternative hypothesis \beta_1 ≠ \beta_1^* by computing the t-statistic

    t = ( \hat\beta_1 - \beta_1^* ) / se(\hat\beta_1)

and comparing it to the percentage points for the t-distribution with n - 2 degrees of freedom

example: for the Fort Collins data (n = 93), \hat\beta_1 = 0.2035 and se(\hat\beta_1) = \hat\sigma/sqrt(SXX) = 0.1310 since \hat\sigma = 13.72 and sqrt(SXX) = 104.7, so the t-statistic for the test of zero slope (\beta_1^* = 0) is

    t = (0.2035 - 0)/0.1310 = 1.553

ALR 31 | II.55

Confidence Intervals and Tests for Slope: V

letting G(x) denote the cumulative probability distribution function for a random variable T with a t(91) distribution, i.e.,

    G(x) = Pr(T ≤ x),

the p-value associated with t is 2(1 - G(|t|)); see the next overhead

the p-value is 0.1239, which is not small by common standards (e.g., 0.05 or 0.01), so there is not much support for rejecting the null hypothesis

ALR 31 | II.56
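The t-statistic and two-sided p-value for a zero-slope test can be computed in R from the quantities in the running sketch (for whichever data set has been fitted; the slide's numbers are for Fort Collins):

    tstat <- beta1.hat / se.beta1
    2 * (1 - pt(abs(tstat), df = n - 2))
    # summary(fit) reports the same t value and Pr(>|t|) for the slope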

Prediction: I

suppose now we want to predict a yet-to-be-observed response y^* given a setting x^* for the predictor

if the assumed-to-be-true linear regression model were known perfectly, the prediction would be \tilde y^* = \beta_0 + \beta_1 x^*, whereas the model says

    y^* = \beta_0 + \beta_1 x^* + e^* = \tilde y^* + e^*

the prediction error would be y^* - \tilde y^* = e^*, which has variance Var(e^* | x^*) = \sigma^2

in general we must be satisfied using the estimators \hat\beta_0 & \hat\beta_1 in lieu of the true values \beta_0 & \beta_1, which intuitively should lead to predictions that are not as good, resulting in a prediction error with a variance inflated above \sigma^2

ALR 32, 33 | II.58

Prediction: II

using the fitted mean function \hat E(Y | X = x) = \hat\beta_0 + \hat\beta_1 x to predict the response y^* for a given x^*, the prediction is now

    \hat y^* = \hat\beta_0 + \hat\beta_1 x^*,

and the prediction error becomes

    y^* - \hat y^* = \beta_0 + \beta_1 x^* + e^* - ( \hat\beta_0 + \hat\beta_1 x^* )

recall that, if U & V are uncorrelated RVs, then Var(U - V) = Var(U) + Var(V) (see Equation (A.2), p. 291, of Weisberg)

assuming that e^* is uncorrelated with the RVs involved in the formation of \hat\beta_0 & \hat\beta_1, we can regard U = \beta_0 + \beta_1 x^* + e^* and V = \hat\beta_0 + \hat\beta_1 x^* as uncorrelated RVs when conditioned on x^* and x_1, ..., x_n

letting x_+ be shorthand for x^*, x_1, ..., x_n, we can write

    Var(y^* - \hat y^* | x_+) = Var(\beta_0 + \beta_1 x^* + e^* | x_+) + Var(\hat\beta_0 + \hat\beta_1 x^* | x_+)

ALR 32, 33, 291 | II.59

Prediction: III

study the pieces Var(\beta_0 + \beta_1 x^* + e^* | x_+) & Var(\hat\beta_0 + \hat\beta_1 x^* | x_+) one at a time

using the fact that Var(c + U) = Var(U) for a constant c, we have

    Var(\beta_0 + \beta_1 x^* + e^* | x_+) = Var(e^* | x_+) = \sigma^2

recall that, if U & V are correlated RVs and c is a constant, then Var(U + cV) = Var(U) + c^2 Var(V) + 2c Cov(U, V) (see Equation (A.3), p. 292, of Weisberg)

hence

    Var(\hat\beta_0 + \hat\beta_1 x^* | x_+) = Var(\hat\beta_0 | x_+) + (x^*)^2 Var(\hat\beta_1 | x_+) + 2 x^* Cov(\hat\beta_0, \hat\beta_1 | x_+)
                                             = Var(\hat\beta_0 | x_1, ..., x_n) + (x^*)^2 Var(\hat\beta_1 | x_1, ..., x_n) + 2 x^* Cov(\hat\beta_0, \hat\beta_1 | x_1, ..., x_n)

under the assumption that x^* is independent of the RVs forming \hat\beta_0 & \hat\beta_1

ALR 32, 33, 292 | II.60

Prediction: IV

expressions for Var(\hat\beta_0 | x_1, ..., x_n), Var(\hat\beta_1 | x_1, ..., x_n) and Cov(\hat\beta_0, \hat\beta_1 | x_1, ..., x_n) can be extracted from the matrix

    Var(\hat\beta | X) = ( Var(\hat\beta_0 | X)                Cov(\hat\beta_0, \hat\beta_1 | X) )  =  \sigma^2 (X'X)^{-1}
                         ( Cov(\hat\beta_0, \hat\beta_1 | X)   Var(\hat\beta_1 | X)              )

Exercise (unassigned): using the elements of (X'X)^{-1}, show that

    Var(\hat\beta_0 + \hat\beta_1 x^* | x_+) = \sigma^2 [ 1/n + (x^* - \bar x)^2/SXX ]

the above represents the increase in the variance of the prediction error due to the necessity of estimating \beta_0 & \beta_1, with the actual variance being

    Var(y^* - \hat y^* | x_+) = \sigma^2 [ 1 + 1/n + (x^* - \bar x)^2/SXX ]

ALR 32, 33, 295 | II.61

Prediction: V

estimating \sigma^2 by \hat\sigma^2 and taking the square root lead to the standard error of prediction (sepred) at x^*:

    sepred(\hat y^* | x_+) = \hat\sigma [ 1 + 1/n + (x^* - \bar x)^2/SXX ]^{1/2}

using the Forbes data as an example, suppose we want to predict log10(pressure) at a hypothetical location for which the boiling point of water x^* is somewhere between 190 and 215

the prediction for log10(pressure) given boiling point x^* is

    \hat y^* = \hat\beta_0 + \hat\beta_1 x^* = -0.4216418 + 0.008956178 x^*

a (1 - \alpha) × 100% prediction interval is the set of points y^* in the interval

    \hat y^* - t(\alpha/2, n-2) sepred(\hat y^* | x_+)  ≤  y^*  ≤  \hat y^* + t(\alpha/2, n-2) sepred(\hat y^* | x_+)

ALR 32, 33 | II.62

Prediction: VI

here n = 17, so, for a 99% prediction interval, we set \alpha = 0.01 and use t(0.005, 15) = 2.947

since \hat\sigma = 0.003792, \bar x = 203.0 and SXX = 530.8, we have

    sepred(\hat y^* | x_+) = 0.003792 [ 1 + 1/17 + (x^* - 203.0)^2/530.8 ]^{1/2}

solid red curves on the following plot depict the 99% prediction interval as x^* sweeps from 190 to 215 (black lines show the intervals assuming, unrealistically, no uncertainty in the parameter estimates)

for x^* = 200, the prediction is \hat y^* = 1.370, and the 99% prediction interval is specified by 1.358 ≤ y^* ≤ 1.381

in the original space, the prediction is 10^{1.370} = 23.42, and the interval is 10^{1.358} ≤ 10^{y^*} ≤ 10^{1.381}, i.e., 22.80 ≤ 10^{y^*} ≤ 24.05

ALR 32, 33 | II.63
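The 99% interval at x^* = 200 can be reproduced in R (continuing the sketch):

    xstar <- 200
    yhat.star <- beta0.hat + beta1.hat * xstar
    sepred <- sqrt(sigma2.hat * (1 + 1/n + (xstar - xbar)^2 / SXX))
    tcrit <- qt(0.995, df = n - 2)            # = 2.947 for n = 17
    c(yhat.star - tcrit * sepred, yhat.star + tcrit * sepred)
    # predict(fit, newdata = data.frame(x = xstar), interval = "prediction", level = 0.99)
    # gives the same interval; 10^(...) maps it back to the pressure scale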

[Figure II.64: log10(pressure) versus boiling point over 190-215, with 99% prediction interval curves.]

[Figure II.65: pressure versus boiling point over 190-215.]

Coefficient of Determination R^2: I

ignoring potential predictors, the best prediction of the response is the sample average \bar y of the observed responses y_1, y_2, ..., y_n

for the Fort Collins data, the total sum of squares SYY = \sum_i (y_i - \bar y)^2 is the sum of squares of the deviations of the data from the horizontal dashed line on the next plot

with the inclusion of predictors, the unexplained variation is RSS

for the Fort Collins data, RSS is the sum of squares of the deviations from the solid line on the next plot

ALR 35, 36 | II.66

[Figure II.67: late snowfall (inches) versus early snowfall (inches), Fort Collins data, with the fitted line and a horizontal dashed line at \bar y. ALR 8]

Coefficient of Determination R^2: II

the difference between SYY and RSS is called the sum of squares due to regression:

    SSreg = SYY - RSS

Problem 5 says that

    RSS = SYY - (SXY)^2/SXX

hence

    SSreg = SYY - [ SYY - (SXY)^2/SXX ] = (SXY)^2/SXX

divide SSreg by SYY to get the definition of the coefficient of determination:

    R^2 = SSreg/SYY = (SXY)^2/(SXX SYY) = 1 - RSS/SYY

ALR 35, 36 | II.68

Coefficient of Determination R^2: III

Exercise (unassigned): R^2 = r_xy^2 (the squared sample correlation)

must have 0 ≤ R^2 ≤ 1

R^2 × 100 gives the percentage of the total sum of squares explained by the regression (the concept of R^2 generalizes to multiple regression)

examples: R^2 = 0.026 for Fort Collins & R^2 = 0.995 for Forbes

ALR 35, 36 | II.69
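In R (continuing the sketch, with SYY from the earlier block):

    SSreg <- SXY^2 / SXX
    SSreg / SYY                      # equals 1 - RSS/SYY and cor(x, y)^2
    # summary(fit)$r.squared reports the same value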

Coefficient of Determination R^2: IV

R and other computer packages report both R^2 and a variation known as the adjusted R^2:

    R^2_adj = 1 - (RSS/df) / ( SYY/(n - 1) )   as compared to   R^2 = 1 - RSS/SYY,

where df is the degrees of freedom

for simple linear regression, df = n - 2, so R^2_adj gets closer and closer to R^2 as n increases

in general, df = n minus the # of parameters in the mean function

R^2_adj is intended to facilitate comparison of models, but Weisberg notes (p. 36) there are better ways of doing so

note: R^2 is useless if the mean function does not have an intercept term (e.g., regression through the origin: E(Y | X = x) = \beta_1 x)

ALR 36 | II.70

Inadequacy of Sufficient Statistics: I

all of the data-dependent variables connected with a simple linear regression (e.g., \hat\beta_0, \hat\beta_1, \hat\sigma^2, SSreg, RSS, R^2, etc.) can be formed using just five fundamental statistics:

    \bar x, \bar y, SXX, SYY and SXY

since

    \bar x = (1/n) \sum_{i=1}^n x_i,  SXX = \sum_{i=1}^n x_i^2 - n \bar x^2  and  SXY = \sum_{i=1}^n x_i y_i - n \bar x \bar y

(with analogous equations for \bar y and SYY), it follows that basic linear regression analysis depends only on five so-called sufficient statistics:

    \sum_{i=1}^n x_i,  \sum_{i=1}^n y_i,  \sum_{i=1}^n x_i^2,  \sum_{i=1}^n y_i^2  and  \sum_{i=1}^n x_i y_i

ALR 293, 294, 23, 24, 25 | II.71

Inadequacy of Sufficient Statistics: II

under the assumptions of normality and correctness of the regression model, we do not in theory lose any probabilistic information by tossing away the original data (x_i, y_i), i = 1, ..., n, and just keeping the five sufficient statistics

reliance on sufficient statistics is dangerous in actual applications, where the adequacy of the basic assumptions (normality, correctness of the model) is always open to question

Anscombe (1973) constructed an example of four data sets (n = 11) with sufficient statistics that are identical (to within rounding error), offering much food for thought

third data set: reconsider the scheme of picking the median slope amongst all possible lines determined by two distinct points

ALR 12, 13 | II.72
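A quick way to see this in R: Anscombe's four data sets ship with base R as the data frame anscombe (columns x1-x4 and y1-y4), so the four fits can be compared directly; a minimal sketch:

    # all four Anscombe data sets give essentially the same least squares fit
    fits <- lapply(1:4, function(k)
      lm(anscombe[[paste0("y", k)]] ~ anscombe[[paste0("x", k)]]))
    t(sapply(fits, coef))                              # nearly identical intercepts and slopes
    sapply(fits, function(f) summary(f)$r.squared)     # nearly identical R^2 values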

[Figure II.73: Anscombe's first data set, response versus predictor. ALR 13]

[Figure II.74: Anscombe's second data set, response versus predictor. ALR 13]

[Figure II.75: Anscombe's third data set, response versus predictor. ALR 13]

[Figure II.76: Anscombe's fourth data set, response versus predictor. ALR 13]

Residuals: I

looking at the residuals \hat e_i is a vital step in regression analysis: we can check assumptions to prevent garbage in/garbage out

the basic tool is a plot of residuals versus other quantities, of which three obvious choices are:

1. residuals versus fitted values \hat y_i
2. residuals versus predictors x_i
3. residuals versus case numbers i

the special nature of certain data might suggest other plots

a useful residual plot resembles a null plot when the assumptions hold, and a non-null plot when some assumption fails

let's look at plots 1 to 3 using Anscombe's data sets as examples

ALR 36, 37, 38 | II.77
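The three plots listed above take one line each in R (continuing the sketch, using the fitted object fit):

    ehat <- resid(fit)
    plot(fitted(fit), ehat)          # 1. residuals versus fitted values
    plot(x, ehat)                    # 2. residuals versus the predictor
    plot(seq_along(ehat), ehat)      # 3. residuals versus case number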

[Figure II.78: residuals versus fitted values, Anscombe data set #1.]

[Figure II.79: residuals versus predictors, Anscombe data set #1.]

[Figure II.80: residuals versus fitted values, Anscombe data set #2.]

[Figure II.81: residuals versus predictors, Anscombe data set #2.]

[Figure II.82: residuals versus fitted values, Anscombe data set #3.]

[Figure II.83: residuals versus predictors, Anscombe data set #3.]

[Figure II.84: residuals versus fitted values, Anscombe data set #4.]

[Figure II.85: residuals versus predictors, Anscombe data set #4.]

Residuals: II

Q: why is a plot of residuals versus \hat y_i identical to a plot of residuals versus x_i after relabeling the horizontal axis?

II.86

[Figure II.87: residuals versus case numbers, Anscombe data set #1.]

[Figure II.88: residuals versus case numbers, Anscombe data set #2.]

[Figure II.89: residuals versus case numbers, Anscombe data set #3.]

[Figure II.90: residuals versus case numbers, Anscombe data set #4.]

Residuals: III

although plots of \hat e_i versus i were not particularly useful for Anscombe's data, this plot is useful for certain other data sets (particularly where cases are collected sequentially in time)

a fourth obvious choice: plot residuals \hat e_i versus responses y_i

this choice is problematic because the relationship y_i = \hat y_i + \hat e_i says that, if the spread of the \hat y_i's is small compared to the spread of the \hat e_i's, large \hat e_i's will correspond to large y_i's even if the model is correct

thus residuals versus responses is not a useful residual plot because it need not resemble a null plot when the assumptions hold

as an example, reconsider the Fort Collins data

II.91

[Figure II.92: late snowfall (inches) versus early snowfall (inches), Fort Collins data. ALR 8]

[Figure II.93: residuals versus fitted values, Fort Collins data.]

[Figure II.94: residuals versus predictors, Fort Collins data.]

[Figure II.95: residuals versus case numbers, Fort Collins data.]

[Figure II.96: residuals versus responses, Fort Collins data.]

Residuals: IV

reconsider the Forbes data, focusing first on the 3 following overheads

red dashed horizontal lines on the residual plot show …

recall the definition of the weights c_i:

    \hat\beta_1 = \sum_i (x_i - \bar x) y_i / SXX = \sum_{i=1}^n c_i y_i,  where  c_i = (x_i - \bar x)/SXX

ALR 36, 37, 38 | II.97

[Figure II.98: log10(pressure) versus boiling point, Forbes data. ALR 6]

[Figure II.99: residuals versus predictors (boiling point), Forbes data. ALR 6]

[Figure II.100: weights c_i versus boiling point, Forbes data.]

Residuals: V

Weisberg notes that Forbes deemed this case "evidently a mistake", but perhaps just because of its appearance as an outlier

Weisberg (p. 38) shows that, if (x_12, y_12) is removed and the regression analysis is redone on the reduced data set, the resulting slope estimate is virtually the same, but \hat\sigma and the quantities that depend upon it change drastically (see the overheads that follow)

to delete or not to delete, that is the question:

if we don't delete, the normality assumption is questionable

if we do delete, the normality assumption is tenable, but there is no real scientific justification for doing so (open to charges of data massaging)

ALR 36, 37, 38 | II.101

[Figure II.102: log10(pressure) versus boiling point, Forbes data. ALR 6]

[Figure II.103: log10(pressure) versus boiling point, Forbes data, with one case marked by an x.]

[Figure II.104: residuals versus predictors (boiling point), Forbes data. ALR 6]

[Figure II.105: residuals versus predictors (boiling point), Forbes data, reduced data set.]

[Figure II.106: log10(pressure) versus boiling point over 190-215.]

[Figure II.107: log10(pressure) versus boiling point over 190-215, with one case marked by an x.]

Main Points: I

given a response Y and a predictor X, simple linear regression assumes (1) a linear mean function

    E(Y | X = x) = \beta_0 + \beta_1 x,

where \beta_0 (intercept term) and \beta_1 (slope term) are unknown parameters (constants), and (2) a constant variance function

    Var(Y | X = x) = \sigma^2,

where \sigma^2 > 0 is a third unknown parameter

the simple linear regression model can also be written as

    Y = E(Y | X = x) + e = \beta_0 + \beta_1 x + e,

where e is a statistical error, a random variable (RV) such that E(e) = 0 and Var(e) = \sigma^2

ALR 21, 292, 293 | II.108

Main Points: II

let (x_i, y_i), i = 1, ..., n, be RVs obeying Y = \beta_0 + \beta_1 x + e (the predictor/response data are realizations of these 2n RVs)

for the ith case, we have y_i = \beta_0 + \beta_1 x_i + e_i

the errors e_1, ..., e_n are independent RVs such that E(e_i | x_i) = 0

the model for the data can also be written in matrix notation as Y = X\beta + e, where

    Y = ( y_1 )     X = ( 1  x_1 )     \beta = ( \beta_0 )     e = ( e_1 )
        ( y_2 )         ( 1  x_2 )             ( \beta_1 )         ( e_2 )
        (  ⋮  )         ( ⋮   ⋮  )                                 (  ⋮  )
        ( y_n )         ( 1  x_n )                                 ( e_n )

ALR 21, 29, 63, 64 | II.109

Main Points: III

given the sample means

    \bar x = (1/n) \sum_{i=1}^n x_i   and   \bar y = (1/n) \sum_{i=1}^n y_i

and the sample cross products and sums of squares

    SXY = \sum_{i=1}^n (x_i - \bar x)(y_i - \bar y),  SXX = \sum_{i=1}^n (x_i - \bar x)^2  &  SYY = \sum_{i=1}^n (y_i - \bar y)^2,

can form the least squares estimators for the parameters \beta_1 and \beta_0:

    \hat\beta_1 = SXY/SXX   and   \hat\beta_0 = \bar y - \hat\beta_1 \bar x

the corresponding estimator for the error variance \sigma^2 is

    \hat\sigma^2 = RSS/(n - 2),  where  RSS = \sum_{i=1}^n [y_i - (\hat\beta_0 + \hat\beta_1 x_i)]^2 = SYY - (SXY)^2/SXX

ALR 293, 294, 24, 25 | II.110

Main Points: IV

letting

    RSS(b_0, b_1) = \sum_{i=1}^n [y_i - (b_0 + b_1 x_i)]^2,

the least squares estimators \hat\beta_0 and \hat\beta_1 are the choices for b_0 and b_1 such that RSS(b_0, b_1) is minimized

fitted values \hat y_i and residuals \hat e_i are defined as

    \hat y_i = \hat\beta_0 + \hat\beta_1 x_i   and   \hat e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i) = y_i - \hat y_i,

in terms of which we have

    RSS = RSS(\hat\beta_0, \hat\beta_1) = \sum_{i=1}^n \hat e_i^2   and   \hat\sigma^2 = \sum_i \hat e_i^2 / (n - 2)

ALR 24, 22, 23 | II.111

Main Points: V

in matrix notation, the least squares estimator \hat\beta of \beta is such that X'X\hat\beta = X'Y, i.e., \hat\beta is the solution to the normal equations X'Xb = X'Y

the 2 × 2 matrix X'X has an inverse as long as SXX ≠ 0, so

    \hat\beta = (X'X)^{-1} X'Y

since E(\hat\beta) = \beta, the estimators \hat\beta_0 & \hat\beta_1 are unbiased, as is \hat\sigma^2 also:

    E(\hat\beta_0) = \beta_0,  E(\hat\beta_1) = \beta_1  and  E(\hat\sigma^2) = \sigma^2

we also have Var(\hat\beta | X) = \sigma^2 (X'X)^{-1}, leading us to deduce

    Var(\hat\beta_1 | X) = \sigma^2/SXX,  which can be estimated by  \hat{Var}(\hat\beta_1 | X) = \hat\sigma^2/SXX,

the square root of which is se(\hat\beta_1), the standard error of \hat\beta_1

ALR 304, 305, 61, 62, 63, 27, 28 | II.112

Main Points: VI

can test the null hypothesis (NH) that \beta_1 = 0 by forming the t-statistic t = \hat\beta_1/se(\hat\beta_1) and comparing it to the percentage points t(·, n-2) for the t-distribution with n - 2 degrees of freedom, with a large value of |t| giving evidence against the NH via a small p-value

a (1 - \alpha) × 100% confidence interval for \beta_1 is the set of points in the interval whose end points are

    \hat\beta_1 - t(\alpha/2, n-2) se(\hat\beta_1)   and   \hat\beta_1 + t(\alpha/2, n-2) se(\hat\beta_1)

ALR 31 | II.113

Main Points: VII

can predict a yet-to-be-observed response y^* given a setting x^* for the predictor using \hat y^* = \hat\beta_0 + \hat\beta_1 x^*, which has a standard error given by

    sepred(\hat y^* | x_+) = \hat\sigma [ 1 + 1/n + (x^* - \bar x)^2/SXX ]^{1/2},

where x_+ denotes x^* along with the original predictors x_1, ..., x_n

a (1 - \alpha) × 100% prediction interval constitutes all values from

    \hat y^* - t(\alpha/2, n-2) sepred(\hat y^* | x_+)   to   \hat y^* + t(\alpha/2, n-2) sepred(\hat y^* | x_+)

ALR 32, 33 | II.114


Main Points: IX

plots of the residuals \hat e_i are invaluable for assessing the reasonableness of a fitted model (a point that cannot be emphasized too much)

the standard plot is residuals \hat e_i versus fitted values \hat y_i, which is equivalent to \hat e_i's versus the predictors x_i

a plot of residuals versus case number i is potentially, but not always, useful

do not plot residuals versus responses y_i (misleading!)

failure to plot residuals is potentially bad for your health!

Thou Shalt Plot Residuals (a proposed 11th commandment!)

ALR 36, 37, 38 | II.116

    Additional References


F. Mosteller, A. F. Siegel, E. Trapido and C. Youtz (1981), "Eye Fitting Straight Lines", The American Statistician, 35, pp. 150-152.

C. R. Rao (1973), Linear Statistical Inference and Its Applications (Second Edition), New York: John Wiley & Sons, Inc.

G. A. F. Seber (1977), Linear Regression Analysis, New York: John Wiley & Sons, Inc.