introduction to econometrics- stock & watson -ch 4 slides.doc

Upload: antonio-alvino

Post on 04-Jun-2018


  • 8/14/2019 Introduction to Econometrics- Stock & Watson -Ch 4 Slides.doc

    Introduction to Linear Regression

    (SW Chapter 4)

    Empirical problem: Class size and educational output

    Policy question: What is the effect of reducing class size by one student per class? By 8 students per class?

    What is the right output (performance) measure?

    parent satisfaction

    student personal development

    future adult welfare

    future adult earnings

    performance on standardized tests

    What do data say about class sizes and test scores?

    The California Test Score Data Set

    All K-6 and K-8 California school districts (n = 420)

    Variables:

      5th grade test scores (Stanford-9 achievement test, combined math and reading), district average

      Student-teacher ratio (STR) = number of students in the district divided by number of full-time equivalent teachers

    An initial look at the California test score data:

    Do districts with smaller classes (lower STR) have higher test scores?

    The class size/test score policy question:

    What is the effect on test scores of reducing STR by one student per class?

    Object of policy interest: $\Delta \text{Test score} / \Delta STR$

    This is the slope of the line relating test score and STR.

    This suggests that we want to draw a line through the Test Score v. STR scatterplot, but how?

    Some Notation and Terminology

    (Sections 4.1 and 4.2)

    The population regression line:

    $\text{Test Score} = \beta_0 + \beta_1 STR$

    $\beta_1$ = slope of population regression line = $\Delta \text{Test score} / \Delta STR$ = change in test score for a unit change in STR

    Why are $\beta_0$ and $\beta_1$ "population" parameters?

    We would like to know the population value of $\beta_1$.

    We don't know $\beta_1$, so we must estimate it using data.

    How can we estimate $\beta_0$ and $\beta_1$ from data?

    Recall that $\bar{Y}$ was the least squares estimator of $\mu_Y$: $\bar{Y}$ solves

    $$\min_m \sum_{i=1}^{n} (Y_i - m)^2$$

    The analogous least squares problem for $\beta_0$ and $\beta_1$ is

    $$\min_{b_0, b_1} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$$

    The OLS estimator solves:

    $$\min_{b_0, b_1} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$$

    The OLS estimator minimizes the average squared difference between the actual values of $Y_i$ and the prediction (predicted value) based on the estimated line.

    This minimization problem can be solved using calculus (App. 4.2).

    The result is the OLS estimators of $\beta_0$ and $\beta_1$.
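    The minimization above has a well-known closed-form solution: the slope is the sample covariance of X and Y divided by the sample variance of X, and the intercept makes the line pass through the point of sample means. The sketch below is only an illustration under that standard result; the function names and the five data points are made up for the example and are not the California data.

```python
def ols_fit(x, y):
    """Return (b0, b1) minimizing sum((y_i - (b0 + b1 * x_i)) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


def sum_sq(x, y, b0, b1):
    """Least squares objective: sum of squared prediction errors."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up illustrative data (hypothetical, not the California data set).
x = [15.0, 18.0, 20.0, 22.0, 25.0]
y = [680.0, 665.0, 655.0, 650.0, 640.0]
b0, b1 = ols_fit(x, y)
print(b0, b1, sum_sq(x, y, b0, b1))
```

    Moving either coefficient away from the returned values increases the sum of squares, which is a quick way to check that the formulas really solve the minimization problem.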

    Why use OLS, rather than some other estimator?

    OLS is a generalization of the sample average: if the "line" is just an intercept (no X), then the OLS estimator is just the sample average of $Y_1, \ldots, Y_n$ ($\bar{Y}$).

    Like $\bar{Y}$, the OLS estimator has some desirable properties: under certain assumptions, it is unbiased (that is, $E(\hat{\beta}_1) = \beta_1$), and it has a tighter sampling distribution than some other candidate estimators of $\beta_1$ (more on this later).

    Importantly, this is what everyone uses: the common "language" of linear regression.


    Estimated regression line: $\widehat{TestScore} = 698.9 - 2.28 \times STR$

    Interpretation of the estimated slope and intercept

    $\widehat{TestScore} = 698.9 - 2.28 \times STR$

    Districts with one more student per teacher on average have test scores that are 2.28 points lower.

    That is, $\Delta \text{Test score} / \Delta STR = -2.28$

    The intercept (taken literally) means that, according to this estimated line, districts with zero students per teacher would have a (predicted) test score of 698.9.

    This interpretation of the intercept makes no sense: it extrapolates the line outside the range of the data. In this application, the intercept is not itself economically meaningful.

    Predicted values & residuals:

    One of the districts in the data set is Antelope, CA, for which STR = 19.33 and Test Score = 657.8

    predicted value: $\hat{Y}_{Antelope} = 698.9 - 2.28 \times 19.33 = 654.8$

    residual: $\hat{u}_{Antelope} = 657.8 - 654.8 = 3.0$
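    The Antelope calculation can be reproduced in a few lines. This is only a sketch of the arithmetic using the rounded coefficients and district values quoted on the slide; the variable names are illustrative.

```python
# Estimated line from the slides: TestScore-hat = 698.9 - 2.28 * STR
b0_hat, b1_hat = 698.9, -2.28

str_antelope = 19.33      # student-teacher ratio for Antelope, CA
score_antelope = 657.8    # observed test score for Antelope, CA

predicted = b0_hat + b1_hat * str_antelope   # OLS predicted value
residual = score_antelope - predicted        # actual minus predicted
print(round(predicted, 1), round(residual, 1))
```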

    OLS regression: STATA output

    regress testscr str, robust

    Regression with robust standard errors            Number of obs =     420
                                                      F(  1,   418) =   19.26
                                                      Prob > F      =  0.0000
                                                      R-squared     =  0.0512
                                                      Root MSE      =  18.581

    -------------------------------------------------------------------------
            |               Robust
    testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------+----------------------------------------------------------------
        str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
      _cons |    698.933   10.36436    67.44   0.000     678.5602    719.3057
    -------------------------------------------------------------------------

    $\widehat{TestScore} = 698.9 - 2.28 \times STR$

    (we'll discuss the rest of this output later)

    The OLS regression line is an estimate, computed using our sample of data; a different sample would have given a different value of $\hat{\beta}_1$.

    How can we:

      quantify the sampling uncertainty associated with $\hat{\beta}_1$?

      use $\hat{\beta}_1$ to test hypotheses such as $\beta_1 = 0$?

      construct a confidence interval for $\beta_1$?

    Like estimation of the mean, we proceed in four steps:

    1. The probability framework for linear regression

    2. Estimation

    3. Hypothesis Testing

    4. Confidence intervals

    1. Probability Framework for Linear Regression

    Population

    population of interest (ex: all possible school districts)

    Random variables: Y, X

    Ex: (Test Score, STR)

    Joint distribution of (Y, X)

    The key feature is that we suppose there is a linear relation in the population that relates X and Y; this linear relation is the "population linear regression".

    The Population Linear Regression Model (Section 4.3)

    $Y_i = \beta_0 + \beta_1 X_i + u_i$, $i = 1, \ldots, n$

    X is the independent variable or regressor

    Y is the dependent variable

    $\beta_0$ = intercept

    $\beta_1$ = slope

    $u_i$ = "error term"

    The error term consists of omitted factors, or possibly measurement error in the measurement of Y. In general, these omitted factors are other factors that influence Y, other than the variable X.

    Ex.: The population regression line and the error term

    What are some of the omitted factors in this example?

    Data and sampling

    The population objects ("parameters") $\beta_0$ and $\beta_1$ are unknown; so to draw inferences about these unknown parameters we must collect relevant data.

    Simple random sampling:

    Choose n entities at random from the population of interest, and observe (record) X and Y for each entity.

    Simple random sampling implies that $\{(X_i, Y_i)\}$, $i = 1, \ldots, n$, are independently and identically distributed (i.i.d.).

    (Note: $(X_i, Y_i)$ are distributed independently of $(X_j, Y_j)$ for different observations i and j.)

    Task at hand: to characterize the sampling distribution of the OLS estimator. To do so, we make three assumptions:

    The Least Squares Assumptions

    1. The conditional distribution of u given X has mean zero, that is, $E(u \mid X = x) = 0$.

    2. $(X_i, Y_i)$, $i = 1, \ldots, n$, are i.i.d.

    3. X and u have four moments, that is: $E(X^4) < \infty$ and $E(u^4) < \infty$.

    We'll discuss these assumptions in order.

    Least squares assumption #1: $E(u \mid X = x) = 0$.

    For any given value of X, the mean of u is zero.

    Example: Assumption #1 and the class size example

    $\text{Test Score}_i = \beta_0 + \beta_1 STR_i + u_i$, $u_i$ = other factors

    "Other factors:"

      parental involvement

      outside learning opportunities (extra math class, ...)

      home environment conducive to reading

      family income is a useful proxy for many such factors

    So $E(u \mid X = x) = 0$ means $E(\text{Family Income} \mid STR) = \text{constant}$ (which implies that family income and STR are uncorrelated). This assumption is not innocuous! We will return to it often.

    Least squares assumption #2:

    $(X_i, Y_i)$, $i = 1, \ldots, n$ are i.i.d.

    This arises automatically if the entity (individual, district) is sampled by simple random sampling: the entity is selected, then, for that entity, X and Y are observed (recorded).

    The main place we will encounter non-i.i.d. sampling is when data are recorded over time ("time series data"); this will introduce some extra complications.

    Least squares assumption #3:

    $E(X^4) < \infty$ and $E(u^4) < \infty$

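    Substituting $Y_i - \bar{Y} = \beta_1 (X_i - \bar{X}) + (u_i - \bar{u})$ into the OLS formula for the slope:

    $$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) [\beta_1 (X_i - \bar{X}) + (u_i - \bar{u})]}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

    $$= \beta_1 \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} + \frac{\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

    so

    $$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$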

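    We can simplify this formula by noting that:

    $$\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n} (X_i - \bar{X}) u_i - \left[ \sum_{i=1}^{n} (X_i - \bar{X}) \right] \bar{u} = \sum_{i=1}^{n} (X_i - \bar{X}) u_i$$

    Thus

    $$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\left( \frac{n-1}{n} \right) s_X^2}$$

    where $v_i = (X_i - \bar{X}) u_i$.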

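    $$\hat{\beta}_1 - \beta_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\left( \frac{n-1}{n} \right) s_X^2}$$, where $v_i = (X_i - \bar{X}) u_i$

    We now can calculate the mean and variance of $\hat{\beta}_1$:

    $$E(\hat{\beta}_1 - \beta_1) = E\left[ \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\left( \frac{n-1}{n} \right) s_X^2} \right] = \left( \frac{n}{n-1} \right) E\left[ \frac{1}{n} \sum_{i=1}^{n} \frac{v_i}{s_X^2} \right] = \left( \frac{n}{n-1} \right) \frac{1}{n} \sum_{i=1}^{n} E\left( \frac{v_i}{s_X^2} \right)$$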

    Now $E(v_i / s_X^2) = E[(X_i - \bar{X}) u_i / s_X^2] = 0$

    because $E(u_i \mid X_i = x) = 0$ (for details see App. 4.3)

    Thus, $$E(\hat{\beta}_1 - \beta_1) = \left( \frac{n}{n-1} \right) \frac{1}{n} \sum_{i=1}^{n} E\left( \frac{v_i}{s_X^2} \right) = 0$$

    so

    $E(\hat{\beta}_1) = \beta_1$

    That is, $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$.

    Calculation of the variance of $\hat{\beta}_1$:

    $$\hat{\beta}_1 - \beta_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\left( \frac{n-1}{n} \right) s_X^2}$$

    This calculation is simplified by supposing that n is large (so that $s_X^2$ can be replaced by $\sigma_X^2$); the result is

    $$\mathrm{var}(\hat{\beta}_1) = \frac{\mathrm{var}(v)}{n \sigma_X^4}$$

    (For details see App. 4.3.)

    The exact sampling distribution is complicated, but when the sample size is large we get some simple (and good) approximations:

    $$\hat{\beta}_1 - \beta_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\left( \frac{n-1}{n} \right) s_X^2}$$

    When n is large:

    $v_i = (X_i - \bar{X}) u_i \approx (X_i - \mu_X) u_i$, which is i.i.d. (why?) and has two moments, that is, $\mathrm{var}(v_i) < \infty$ (why?). Thus $\frac{1}{n} \sum_{i=1}^{n} v_i$ is distributed $N(0, \mathrm{var}(v)/n)$ when n is large

    $s_X^2$ is approximately equal to $\sigma_X^2$ when n is large

    $\frac{n-1}{n} = 1 - \frac{1}{n} \approx 1$ when n is large

    Putting these together we have:

    Large-n approximation to the distribution of $\hat{\beta}_1$:

    $$\hat{\beta}_1 - \beta_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\left( \frac{n-1}{n} \right) s_X^2} \approx \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\sigma_X^2}$$

    which is approximately distributed $N\left( 0, \frac{\sigma_v^2}{n (\sigma_X^2)^2} \right)$.
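    A small Monte Carlo experiment can make the large-n approximation concrete. The sketch below is an illustration only: the data-generating process (X and u independent standard normals, a true slope of 2, n = 100, 1000 replications, seed 0) is an assumption chosen for the example and does not come from the slides. In that design var(v) = 1 and $\sigma_X^4 = 1$, so the approximation predicts a mean of 2 and a variance of 1/n for the simulated slopes.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope: sample covariance of (x, y) over sample variance of x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def simulate_slopes(beta0=1.0, beta1=2.0, n=100, reps=1000, seed=0):
    """Draw `reps` i.i.d. samples of size n and return the OLS slope of each."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0.0, 1.0) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


slopes = simulate_slopes()
# Hypothetical design: var(v) = 1 and sigma_X^4 = 1, so the large-n
# variance formula var(v) / (n * sigma_X^4) reduces to 1 / n.
approx_var = 1.0 / 100
print(statistics.mean(slopes), statistics.pvariance(slopes), approx_var)
```

    The simulated mean should sit close to the true slope (unbiasedness) and the simulated variance close to 1/n; a histogram of `slopes` would look roughly bell-shaped.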

    Recall the summary of the sampling distribution of $\bar{Y}$:

    For $(Y_1, \ldots, Y_n)$ i.i.d. with $0 < \sigma_Y^2 < \infty$:

      The exact (finite sample) sampling distribution of $\bar{Y}$ has mean $\mu_Y$ ("$\bar{Y}$ is an unbiased estimator of $\mu_Y$") and variance $\sigma_Y^2 / n$

      Other than its mean and variance, the exact distribution of $\bar{Y}$ is complicated and depends on the distribution of Y

      $\bar{Y} \xrightarrow{p} \mu_Y$ (law of large numbers)

      $\frac{\bar{Y} - E(\bar{Y})}{\sqrt{\mathrm{var}(\bar{Y})}}$ is approximately distributed $N(0, 1)$ (CLT)

    Parallel conclusions hold for the OLS estimator $\hat{\beta}_1$:

    Under the three Least Squares Assumptions,

      The exact (finite sample) sampling distribution of $\hat{\beta}_1$ has mean $\beta_1$ ("$\hat{\beta}_1$ is an unbiased estimator of $\beta_1$"), and $\mathrm{var}(\hat{\beta}_1)$ is inversely proportional to n.

      Other than its mean and variance, the exact distribution of $\hat{\beta}_1$ is complicated and depends on the distribution of (X, u)

      $\hat{\beta}_1 \xrightarrow{p} \beta_1$ (law of large numbers)

      $\frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sqrt{\mathrm{var}(\hat{\beta}_1)}}$ is approximately distributed $N(0, 1)$ (CLT)


    1. The probability framework for linear regression

    2. Estimation

    3. Hypothesis Testing (Section 4.5)

    4. Confidence intervals

    Suppose a skeptic suggests that reducing the number of students in a class has no effect on learning or, specifically, test scores. The skeptic thus asserts the hypothesis,

    $H_0: \beta_1 = 0$

    We wish to test this hypothesis using data, and reach a tentative conclusion whether it is correct or incorrect.

    $$t = \frac{\bar{Y} - \mu_{Y,0}}{s_Y / \sqrt{n}}$$

    then reject the null hypothesis if $|t| > 1.96$.

    where the SE of the estimator is the square root of an estimator of the variance of the estimator.

    Applied to a hypothesis about $\beta_1$:

    $$t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}}$$

    so

    $$t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}$$

    where $\beta_{1,0}$ is the value of $\beta_1$ hypothesized under the null (for example, if the null value is zero, then $\beta_{1,0} = 0$).

    What is $SE(\hat{\beta}_1)$?

    $SE(\hat{\beta}_1)$ = the square root of an estimator of the variance of the sampling distribution of $\hat{\beta}_1$

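    Recall the expression for the variance of $\hat{\beta}_1$ (large n):

    $$\mathrm{var}(\hat{\beta}_1) = \frac{\mathrm{var}[(X_i - \mu_X) u_i]}{n (\sigma_X^2)^2} = \frac{\sigma_v^2}{n \sigma_X^4}$$

    where $v_i = (X_i - \bar{X}) u_i$. Estimator of the variance of $\hat{\beta}_1$:

    $$\hat{\sigma}_{\hat{\beta}_1}^2 = \frac{1}{n} \times \frac{\text{estimator of } \sigma_v^2}{(\text{estimator of } \sigma_X^2)^2} = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \hat{u}_i^2}{\left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]^2}$$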

    $$\hat{\sigma}_{\hat{\beta}_1}^2 = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \hat{u}_i^2}{\left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]^2}$$

    OK, this is a bit nasty, but:

      There is no reason to memorize this

      It is computed automatically by regression software

      $SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}_{\hat{\beta}_1}^2}$ is reported by regression software

      It is less complicated than it seems. The numerator estimates var(v), the denominator estimates var(X).
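    To show that the variance estimator is less complicated than it looks, here is a direct transcription of the formula into code. It is a sketch for illustration only: the function name and the five-point data set are made up, and real regression software may apply slightly different finite-sample adjustments.

```python
import math


def robust_se_slope(x, y):
    """Return (b1, SE(b1)) using the heteroskedasticity-robust formula above."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]                 # X_i - X-bar
    s_xx = sum(d * d for d in xd)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(xd, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]   # OLS residuals
    numerator = sum(d * d * e * e for d, e in zip(xd, resid)) / (n - 2)
    denominator = (s_xx / n) ** 2
    var_b1 = numerator / denominator / n
    return b1, math.sqrt(var_b1)


# Made-up illustrative data (hypothetical).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b1, se_b1 = robust_se_slope(x, y)
print(b1, se_b1)
```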

    Return to calculation of the t-statistic:

    $$t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)} = \frac{\hat{\beta}_1 - \beta_{1,0}}{\sqrt{\hat{\sigma}_{\hat{\beta}_1}^2}}$$

    Reject at the 5% significance level if $|t| > 1.96$

    The p-value is $p = \Pr[|t| > |t^{act}|]$ = probability in the tails of the normal distribution outside $|t^{act}|$

    Example: Test Scores and STR, California data

    Estimated regression line: $\widehat{TestScore} = 698.9 - 2.28 \times STR$

    Regression software reports the standard errors:

    $SE(\hat{\beta}_0) = 10.4$, $SE(\hat{\beta}_1) = 0.52$

    t-statistic testing $\beta_{1,0} = 0$: $$t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)} = \frac{-2.28 - 0}{0.52} = -4.38$$

    The 1% two-sided critical value is 2.58, so we reject the null at the 1% significance level.

    Alternatively, we can compute the p-value.

    The p-value based on the large-n standard normal approximation to the t-statistic is 0.00001 ($10^{-5}$).
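    The t-statistic and its large-n p-value can be checked with the standard library. This is a sketch using the rounded estimates quoted on the slide; the helper name is illustrative, and the p-value uses the identity Pr(|Z| > a) = erfc(a / sqrt(2)) for a standard normal Z.

```python
import math


def two_sided_p_value(t_stat):
    """Pr(|Z| > |t_stat|) for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


beta1_hat, se_beta1, beta1_null = -2.28, 0.52, 0.0
t_stat = (beta1_hat - beta1_null) / se_beta1
p_value = two_sided_p_value(t_stat)
print(round(t_stat, 2), p_value)
```

    With these inputs the p-value comes out on the order of $10^{-5}$, in line with the figure reported on the slide.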

    1. The probability framework for linear regression

    2. Estimation

    3. Hypothesis Testing

    4. Confidence intervals (Section 4.6)

    In general, if the sampling distribution of an estimator is normal for large n, then a 95% confidence interval can be constructed as estimator $\pm$ 1.96 $\times$ standard error.

    So: a 95% confidence interval for $\beta_1$ is

    $\{\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1)\}$

    Example: Test Scores and STR, California data

    Estimated regression line: $\widehat{TestScore} = 698.9 - 2.28 \times STR$

    $SE(\hat{\beta}_0) = 10.4$, $SE(\hat{\beta}_1) = 0.52$

    95% confidence interval for $\beta_1$:

    $\{\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1)\} = \{-2.28 \pm 1.96 \times 0.52\} = (-3.30, -1.26)$

    Equivalent statements:

      The 95% confidence interval does not include zero;

      The hypothesis $\beta_1 = 0$ is rejected at the 5% level
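    The interval arithmetic is short enough to verify directly. A minimal sketch, again using the rounded slope and standard error from the slide:

```python
beta1_hat, se_beta1 = -2.28, 0.52
z_975 = 1.96  # large-n critical value for a 95% two-sided interval

ci_low = beta1_hat - z_975 * se_beta1
ci_high = beta1_hat + z_975 * se_beta1
excludes_zero = not (ci_low <= 0.0 <= ci_high)   # same as rejecting beta1 = 0 at 5%
print(round(ci_low, 2), round(ci_high, 2), excludes_zero)
```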

    A convention for reporting estimated regressions:

    Put standard errors in parentheses below the estimates

    $\widehat{TestScore} = 698.9 - 2.28 \times STR$

                    (10.4)   (0.52)

    This expression means that:

      The estimated regression line is $\widehat{TestScore} = 698.9 - 2.28 \times STR$

      The standard error of $\hat{\beta}_0$ is 10.4

      The standard error of $\hat{\beta}_1$ is 0.52

    OLS regression: STATA output

    regress testscr str, robust

    Regression with robust standard errors            Number of obs =     420
                                                      F(  1,   418) =   19.26
                                                      Prob > F      =  0.0000
                                                      R-squared     =  0.0512
                                                      Root MSE      =  18.581

    -------------------------------------------------------------------------
            |               Robust
    testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------+----------------------------------------------------------------
        str |  -2.279808   .5194892    -4.38   0.000    -3.300945   -1.258671
      _cons |    698.933   10.36436    67.44   0.000     678.5602    719.3057
    -------------------------------------------------------------------------

    so:

    $\widehat{TestScore} = 698.9 - 2.28 \times STR$

                    (10.4)   (0.52)

    t ($\beta_1 = 0$) = -4.38, p-value = 0.000

    95% conf. interval for $\beta_1$ is (-3.30, -1.26)

    Regression when X is Binary (Section 4.7)

    Sometimes a regressor is binary:

      X = 1 if female, = 0 if male

      X = 1 if treated (experimental drug), = 0 if not

      X = 1 if small class size, = 0 if not

    So far, $\beta_1$ has been called a "slope," but that doesn't make much sense if X is binary.

    How do we interpret regression with a binary regressor?

    $Y_i = \beta_0 + \beta_1 X_i + u_i$, where X is binary ($X_i$ = 0 or 1):

    When $X_i = 0$: $Y_i = \beta_0 + u_i$

    When $X_i = 1$: $Y_i = \beta_0 + \beta_1 + u_i$

    thus:

    When $X_i = 0$, the mean of $Y_i$ is $\beta_0$

    When $X_i = 1$, the mean of $Y_i$ is $\beta_0 + \beta_1$

    that is:

    $E(Y_i \mid X_i = 0) = \beta_0$

    $E(Y_i \mid X_i = 1) = \beta_0 + \beta_1$

    so:

    $\beta_1 = E(Y_i \mid X_i = 1) - E(Y_i \mid X_i = 0)$ = population difference in group means
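    The same identity holds in the sample: with a 0/1 regressor, the OLS intercept equals the mean of Y in the X = 0 group and the OLS slope equals the difference between the two group means. The sketch below checks this on a tiny made-up data set (the numbers are hypothetical and chosen only so the arithmetic is easy to follow).

```python
def ols_fit(x, y):
    """Return (b0, b1) for the least squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


# Made-up data: d is a binary regressor, y is the outcome.
d = [0, 0, 1, 1, 1]
y = [1.0, 3.0, 4.0, 6.0, 8.0]

b0, b1 = ols_fit(d, y)
y0 = [yi for di, yi in zip(d, y) if di == 0]
y1 = [yi for di, yi in zip(d, y) if di == 1]
mean0 = sum(y0) / len(y0)   # group mean when d = 0
mean1 = sum(y1) / len(y1)   # group mean when d = 1
print(b0, b1, mean0, mean1 - mean0)
```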

    Example: TestScore and STR, California data

    Let $D_i = 1$ if $STR_i \le 20$, and $D_i = 0$ if $STR_i > 20$

    The OLS estimate of the regression line relating TestScore to D (with standard errors in parentheses) is:

    $\widehat{TestScore} = 650.0 + 7.4 \times D$

                    (1.3)   (1.8)

    Difference in means between groups = 7.4;

    SE = 1.8, t = 7.4/1.8 = 4.0

    Compare the regression results with the group means, computed directly:

    Class Size          Average score ($\bar{Y}$)   Std. dev. ($s_Y$)    N
    Small (STR ≤ 20)           657.4                    19.4            238
    Large (STR > 20)           650.0                    17.9            182

    Estimation: $\bar{Y}_{small} - \bar{Y}_{large} = 657.4 - 650.0 = 7.4$

    Test of zero difference in means: $$t = \frac{\bar{Y}_s - \bar{Y}_l}{SE(\bar{Y}_s - \bar{Y}_l)} = \frac{7.4}{1.83} = 4.05$$

    95% confidence interval = $\{7.4 \pm 1.96 \times 1.83\} = (3.8, 11.0)$

    This is the same as in the regression!

    $\widehat{TestScore} = 650.0 + 7.4 \times D$

                    (1.3)   (1.8)
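    The standard error of 1.83 is not derived on this slide. One way to reproduce it, assuming the usual unpooled formula for the standard error of a difference in means, $\sqrt{s_s^2/n_s + s_l^2/n_l}$, is sketched below using the group statistics from the table.

```python
import math

# Group summary statistics reported in the table above.
mean_small, sd_small, n_small = 657.4, 19.4, 238
mean_large, sd_large, n_large = 650.0, 17.9, 182

diff = mean_small - mean_large
# Assumed formula: unpooled standard error of a difference in means.
se_diff = math.sqrt(sd_small ** 2 / n_small + sd_large ** 2 / n_large)
t_stat = diff / se_diff
ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
print(round(diff, 1), round(se_diff, 2), round(t_stat, 2),
      round(ci[0], 1), round(ci[1], 1))
```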

    Summary: regression when $X_i$ is binary (0/1)

    $Y_i = \beta_0 + \beta_1 X_i + u_i$

      $\beta_0$ = mean of Y given that X = 0

      $\beta_0 + \beta_1$ = mean of Y given that X = 1

      $\beta_1$ = difference in group means, X = 1 minus X = 0

      $SE(\hat{\beta}_1)$ has the usual interpretation

      t-statistics, confidence intervals constructed as usual

      This is another way to do difference-in-means analysis

      The regression formulation is especially useful when we have additional regressors (coming up soon...)

    Other Regression Statistics (Section 4.8)

    A natural question is how well the regression line "fits" or explains the data. There are two regression statistics that provide complementary measures of the quality of fit:

      The regression $R^2$ measures the fraction of the variance of Y that is explained by X; it is unitless and ranges between zero (no fit) and one (perfect fit)

      The standard error of the regression measures the fit, that is, the typical size of a regression residual, in the units of Y.

    The $R^2$

    Write $Y_i$ as the sum of the OLS prediction + OLS residual:

    $Y_i = \hat{Y}_i + \hat{u}_i$

    The $R^2$ is the fraction of the sample variance of $Y_i$ "explained" by the regression, that is, by $\hat{Y}_i$:

    $$R^2 = \frac{ESS}{TSS}$$

    where $ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{\hat{Y}})^2$ and $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.

    The $R^2$:

      $R^2 = 0$ means ESS = 0, so X explains none of the variation of Y

      $R^2 = 1$ means ESS = TSS, so $Y = \hat{Y}$, so X explains all of the variation of Y

      $0 \le R^2 \le 1$

      For regression with a single regressor (the case here), $R^2$ is the square of the correlation coefficient between X and Y
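    The last point is easy to check numerically. The sketch below computes $R^2$ as ESS/TSS and compares it with the squared sample correlation between X and Y; the function names and the five data points are made up for the illustration.

```python
import math


def r_squared(x, y):
    """R^2 = ESS / TSS for the OLS regression of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    # The mean of the fitted values equals y_bar when an intercept is included.
    ess = sum((f - y_bar) ** 2 for f in fitted)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return ess / tss


def correlation(x, y):
    """Sample correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up illustrative data (hypothetical).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r2 = r_squared(x, y)
r = correlation(x, y)
print(r2, r ** 2)
```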

    The Standard Error of the Regression (SER)

    The standard error of the regression is (almost) the sample standard deviation of the OLS residuals:

    $$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$$

    (the second equality holds because $\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0$).

    $$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$$

    The SER:

      has the units of u, which are the units of Y

      measures the spread of the distribution of u

      measures the average "size" of the OLS residual (the average "mistake" made by the OLS regression line)

      The root mean squared error (RMSE) is closely related to the SER:

    $$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2}$$

    This measures the same thing as the SER; the minor difference is division by n instead of n - 2.

    Technical note: why divide by n - 2 instead of n - 1?

    $$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$$

    Division by n - 2 is a "degrees of freedom" correction, like division by n - 1 in $s_Y^2$; the difference is that, in the SER, two parameters have been estimated ($\beta_0$ and $\beta_1$, by $\hat{\beta}_0$ and $\hat{\beta}_1$), whereas in $s_Y^2$ only one has been estimated ($\mu_Y$, by $\bar{Y}$).

    When n is large, it makes negligible difference whether n, n - 1, or n - 2 is used, although the conventional formula uses n - 2 when there is a single regressor.

    For details, see Section 15.4
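    The two divisors can be compared side by side. The following sketch computes the SER (divisor n - 2) and the RMSE (divisor n) from the same OLS residuals; the function name and data are made up for the illustration, and with only five observations the gap between the two is much larger than it would be in a sample of 420.

```python
import math


def ser_and_rmse(x, y):
    """Return (SER, RMSE) for the OLS regression of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # sum of squared residuals
    ser = math.sqrt(ssr / (n - 2))   # degrees-of-freedom corrected
    rmse = math.sqrt(ssr / n)        # no correction
    return ser, rmse


# Made-up illustrative data (hypothetical).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
ser, rmse = ser_and_rmse(x, y)
print(ser, rmse)
```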

    Example of $R^2$ and SER

    $\widehat{TestScore} = 698.9 - 2.28 \times STR$, $R^2 = .05$, SER = 18.6

                    (10.4)   (0.52)

    The slope coefficient is statistically significant and large in a policy sense, even though STR explains only a small fraction of the variation in test scores.

    A Practical Note: Heteroskedasticity, Homoskedasticity, and the Formula for the Standard Errors of $\hat{\beta}_0$ and $\hat{\beta}_1$ (Section 4.9)

      What do these two terms mean?

      Consequences of homoskedasticity

      Implication for computing standard errors

    What do these two terms mean?

    If $\mathrm{var}(u \mid X = x)$ is constant, that is, the variance of the conditional distribution of u given X does not depend on X, then u is said to be homoskedastic. Otherwise, u is said to be heteroskedastic.

    Homoskedasticity in a picture:

      $E(u \mid X = x) = 0$ (u satisfies Least Squares Assumption #1)

      The variance of u does not change with (depend on) x

    Heteroskedasticity in a picture:

      $E(u \mid X = x) = 0$ (u satisfies Least Squares Assumption #1)

      The variance of u depends on x, so u is heteroskedastic.

    A real-world example of heteroskedasticity from labor economics: average hourly earnings vs. years of education (data source: 1999 Current Population Survey)

    [Figure: Scatterplot and OLS Regression Line. Average hourly earnings and fitted values (vertical axis) against years of education (horizontal axis).]

    Is heteroskedasticity present in the class size data?

    Hard to say... looks nearly homoskedastic, but the spread might be tighter for large values of STR.

    So far we have (without saying so) allowed u to be heteroskedastic:

    Recall the three least squares assumptions:

    1. The conditional distribution of u given X has mean zero, that is, $E(u \mid X = x) = 0$.

    2. $(X_i, Y_i)$, $i = 1, \ldots, n$, are i.i.d.

    3. X and u have four finite moments.

    Heteroskedasticity and homoskedasticity concern $\mathrm{var}(u \mid X = x)$, which none of these three assumptions restricts.

    If the errors are in fact homoskedastic:

      You can prove some theorems about OLS (in particular, the Gauss-Markov theorem, which says that OLS is the estimator with the lowest variance among all unbiased estimators that are linear functions of $(Y_1, \ldots, Y_n)$; see Section 15.5).

      The formula for the variance of $\hat{\beta}_1$ and the OLS standard error simplifies (App. 4.4): If $\mathrm{var}(u_i \mid X_i = x) = \sigma_u^2$, then

    $$\mathrm{var}(\hat{\beta}_1) = \frac{\mathrm{var}[(X_i - \mu_X) u_i]}{n (\sigma_X^2)^2} = \ldots = \frac{\sigma_u^2}{n \sigma_X^2}$$

    Note: $\mathrm{var}(\hat{\beta}_1)$ is inversely proportional to var(X): more spread in X means more information about $\beta_1$.

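    The general formula for the standard error of $\hat{\beta}_1$ is the square root of:

    $$\hat{\sigma}_{\hat{\beta}_1}^2 = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \hat{u}_i^2}{\left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]^2}$$

    Special case under homoskedasticity:

    $$\hat{\sigma}_{\hat{\beta}_1}^2 = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}$$

    Sometimes it is said that the lower formula is simpler.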
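    Putting the two formulas next to each other in code shows how little separates them, and that they generally give different numbers on the same data. This is an illustrative sketch only: the function name and data are made up, and statistical packages may use slightly different small-sample corrections for the robust version.

```python
import math


def slope_standard_errors(x, y):
    """Return (homoskedasticity-only SE, heteroskedasticity-robust SE) of the OLS slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]
    s_xx = sum(d * d for d in xd)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(xd, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Special case: residual variance does not depend on X.
    var_homo = (sum(e * e for e in resid) / (n - 2)) / (s_xx / n) / n
    # General case: each squared residual is weighted by (X_i - X-bar)^2.
    var_robust = (sum(d * d * e * e for d, e in zip(xd, resid)) / (n - 2)) \
        / (s_xx / n) ** 2 / n
    return math.sqrt(var_homo), math.sqrt(var_robust)


# Made-up illustrative data (hypothetical).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
se_homo, se_robust = slope_standard_errors(x, y)
print(se_homo, se_robust)
```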

    The homoskedasticity-only formula for the standard error of $\hat{\beta}_1$ and the "heteroskedasticity-robust" formula (the formula that is valid under heteroskedasticity) differ: in general, you get different standard errors using the different formulas.

    Homoskedasticity-only standard errors are the default setting in regression software, and sometimes the only setting (e.g. Excel). To get the general "heteroskedasticity-robust" standard errors you must override the default.

    If you don't override the default and there is in fact heteroskedasticity, you will get the wrong standard errors (and wrong t-statistics and confidence intervals).

    The critical points:

      If the errors are homoskedastic and you use the heteroskedastic formula for standard errors (the one we derived), you are OK

      If the errors are heteroskedastic and you use the homoskedasticity-only formula for standard errors, the standard errors are wrong.

      The two formulas coincide (when n is large) in the special case of homoskedasticity

      The bottom line: you should always use the heteroskedasticity-based formulas; these are conventionally called the heteroskedasticity-robust standard errors.

    Heteroskedasticity-robust standard errors in STATA

    regress testscr str, robust

    Regression with robust standard errors            Number of obs =     420
                                                      F(  1,   418) =   19.26
                                                      Prob > F      =  0.0000
                                                      R-squared     =  0.0512
                                                      Root MSE      =  18.581

    -------------------------------------------------------------------------
            |               Robust
    testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------+----------------------------------------------------------------
        str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
      _cons |    698.933   10.36436    67.44   0.000     678.5602    719.3057
    -------------------------------------------------------------------------

    Use the ", robust" option!

    Summary and Assessment (Section 4.10)

    The initial policy question:

    Suppose new teachers are hired so the student-teacher ratio falls by one student per class. What is the effect of this policy intervention (this "treatment") on test scores?

    Does our regression analysis give a convincing answer?

    Not really: districts with low STR tend to be ones with lots of other resources and higher income families, which provide kids with more learning opportunities outside school. This suggests that $\mathrm{corr}(u_i, STR_i) \neq 0$, so $E(u_i \mid X_i) \neq 0$.

    Digression on Causality

    The original question (what is the quantitative effect of an intervention that reduces class size?) is a question about a causal effect: the effect on Y of applying a unit of the treatment is $\beta_1$.

    Ideal: subjects all follow the treatment protocol: perfect compliance, no errors in reporting, etc.!

    Randomized: subjects from the population of interest are randomly assigned to a treatment or control group (so there are no confounding factors)

    Controlled: having a control group permits measuring the differential effect of the treatment

    Experiment: the treatment is assigned as part of the experiment: the subjects have no choice, which means that there is no "reverse causality" in which subjects choose the treatment they think will work best.
