3.3 power point

Upload: krothroc

Post on 08-Apr-2018


TRANSCRIPT

  • 8/7/2019 3.3 Power Point

    1/19

    Correlation and Regression Wisdom: Section 3.3

  • 8/7/2019 3.3 Power Point

    2/19

    Cautions

    Correlation and regression describe only linear relationships! You can do the calculations for any relationship between two variables, but the results are useful only if the scatterplot shows a linear pattern.

    Extrapolation often produces unreliable results!

    Correlation is not resistant! Always plot your data and look for unusual observations before you interpret correlation.
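
As a rough illustration of the first caution, the sketch below computes r for points that lie exactly on a curve. The seven points are made up for this example (they are not from the slides), and `correlation` is a hypothetical helper written out by hand.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: y is completely determined by x, but along a parabola.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(f"r = {r:.2f}")
```

Here y depends perfectly on x, yet r comes out as zero because the pattern is not linear. Only a scatterplot would reveal the relationship.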

  • 8/7/2019 3.3 Power Point

    3/19

    Which Points Have the Influence?

    Each point has an influence on the LSR line; some make large contributions and some small. Some make positive contributions and some negative.

    Our goal is to learn to recognize the points in a data set that may have an unusually large influence on where the regression line goes or on the size and sign of the correlation.

  • 8/7/2019 3.3 Power Point

    4/19

    Outliers and Influential Observations in Regression

  • 8/7/2019 3.3 Power Point

    5/19

    Animal Longevity

    [Scatterplot: the relationship between maximum and average life span for mammals. Horizontal axis: Average_Life_Span (0 to 45); vertical axis: maximum life span (0 to 80).]

    Labeled points:

    Beaver: avg. = 5, max = 50

    Elephant: avg. = 35, max = 70

    Hippo: avg. = 41, max = 54

  • 8/7/2019 3.3 Power Point

    6/19

    If you look at the entire sample, the beaver, elephant, and hippo are the oddballs of the bunch.

    The LSR line for all the mammals in the sample is:

    $\hat{M} = 10.53 + 1.58A$

    where $\hat{M}$ is the predicted maximum longevity and $A$ stands for observed average longevity.

    The correlation for the relationship is .77.
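
To see how far each oddball sits from this line, here is a small sketch that applies the fitted line (intercept 10.53, slope 1.58) to the three labeled animals and computes residuals as observed minus predicted. The function and variable names are illustrative choices, not something from the slides.

```python
def predicted_max(average_life_span):
    """Predicted maximum longevity from the LSR line for all mammals."""
    return 10.53 + 1.58 * average_life_span

# (average life span, observed maximum life span) from the labeled points
animals = {
    "beaver": (5, 50),
    "elephant": (35, 70),
    "hippo": (41, 54),
}

residuals = {}
for name, (average, observed_max) in animals.items():
    fitted = predicted_max(average)
    residuals[name] = observed_max - fitted  # residual = observed - predicted
    print(f"{name}: predicted {fitted:.2f}, residual {residuals[name]:+.2f}")
```

The beaver sits about 31.6 above the line and the hippo about 21.3 below it, while the elephant, despite being far to the right, is only about 4.2 above it.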

  • 8/7/2019 3.3 Power Point

    7/19

    Are Outliers Always Influential?

    In Chapter 1, outliers were influential points, because they were far from the other values AND the mean changes drastically when they are removed. In regression, not all outliers are influential. Influential points often have small residuals because they pull the regression line toward themselves, so just looking at a residual plot is not enough.
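
The point about small residuals can be sketched with made-up data. In the hypothetical set below, five points lie exactly on y = 2x and a sixth point sits far to the right, well off that pattern. None of these numbers come from the slides, and `least_squares` is an illustrative helper.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data: the last point (20, 10) is far from the others in x.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 10]

slope, intercept = least_squares(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, y, res in zip(xs, ys, residuals):
    print(f"({x}, {y}): residual {res:+.2f}")
```

In this sketch the far point has pulled the line toward itself, so its residual is smaller in size than the residuals of several points that follow the original pattern. A residual plot alone would not flag it.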

  • 8/7/2019 3.3 Power Point

    8/19

    How Do We Know?

    The surest way to verify that a point is influential is to find the regression line both with and without the suspect point. If the line moves more than a small amount when the point is deleted, the point is influential.
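
The with-and-without check can be sketched as a loop that deletes each point in turn and refits the line. The data are made up for illustration (five points on y = 2x plus one far-off point), and the helper name `least_squares_slope` is an assumption of this sketch.

```python
def least_squares_slope(points):
    """Slope of the least-squares line through a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Made-up data, not from the slides.
points = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (20, 10)]

full_slope = least_squares_slope(points)

# Refit with each point deleted and record the slope of the new line.
slopes_without = []
for i in range(len(points)):
    remaining = points[:i] + points[i + 1:]
    slopes_without.append(least_squares_slope(remaining))

# The point whose deletion moves the slope the most.
most_influential = max(
    range(len(points)),
    key=lambda i: abs(slopes_without[i] - full_slope),
)

print(f"slope with all points: {full_slope:.2f}")
for point, s in zip(points, slopes_without):
    print(f"without {point}: slope {s:.2f}")
print(f"largest shift comes from deleting {points[most_influential]}")
```

Deleting the far-off point moves the slope from roughly 0.31 to 2, while deleting any other point barely changes it, which is the signature of an influential observation.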

  • 8/7/2019 3.3 Power Point

    9/19

    [Four scatterplots (Collection 1) of maximum life span versus average life span, each showing the LSR line fitted to a different subset of the data:]

    With all mammals: Maximum_Life_Span = 1.58 Average_Life_Span + 10.5; r^2 = 0.59; slope = 1.58 and r = .77

    Without hippos: Maximum_Life_Span = 1.96 Average_Life_Span + 6.3; r^2 = 0.64; slope = 1.96 and r = .80

    Without elephant: Maximum_Life_Span = 1.53 Average_Life_Span + 11; r^2 = 0.52; slope = 1.53 and r = .72

    Without beaver: Maximum_Life_Span = 1.69 Average_Life_Span + 8.1; r^2 = 0.69; slope = 1.69 and r = .83
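
The four fits can be compared side by side with a few lines of code. This sketch only tabulates the slopes, intercepts, r, and r^2 values reported on the plots; it does not refit anything, since the full mammal data set is not given here. The dictionary layout and the rounding tolerance are choices made for this example.

```python
# (slope, intercept, r, r_squared) as reported on each plot
fits = {
    "all mammals": (1.58, 10.5, 0.77, 0.59),
    "without hippos": (1.96, 6.3, 0.80, 0.64),
    "without elephant": (1.53, 11.0, 0.72, 0.52),
    "without beaver": (1.69, 8.1, 0.83, 0.69),
}

base_slope, _, base_r, _ = fits["all mammals"]

changes = {}
for label, (slope, intercept, r, r_squared) in fits.items():
    slope_change = slope - base_slope
    r_change = r - base_r
    # The reported r^2 should equal r squared, up to rounding.
    consistent = abs(r ** 2 - r_squared) < 0.01
    changes[label] = (slope_change, r_change, consistent)
    print(f"{label}: slope change {slope_change:+.2f}, "
          f"r change {r_change:+.2f}, r^2 consistent: {consistent}")
```

Read this way, deleting the hippos moves the slope the most (1.58 to 1.96) and deleting the elephant moves it the least (1.58 to 1.53), while deleting the beaver raises r the most (.77 to .83).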

  • 8/7/2019 3.3 Power Point

    10/19

    The Anscombe sets:

    Make a scatterplot, find the LSR equation, and find the correlation:

    Set 1:

     x       y
    10    8.04
     8    6.95
    13    7.58
     9    8.81
    11    8.33
    14    9.96
     6    7.24
     4    4.26
    12   10.84
     7    4.82
     5    5.68

    $\hat{y} = 3 + 0.5x$

    r = .82

  • 8/7/2019 3.3 Power Point

    11/19

    The Anscombe sets:

    Make a scatterplot, find the LSR equation, and find the correlation:

    Set 2:

     x       y
    10    9.14
     8    8.14
    13    8.74
     9    8.77
    11    9.26
    14    8.10
     6    6.13
     4    3.10
    12    9.13
     7    7.26
     5    4.74

    $\hat{y} = 3 + 0.5x$

    r = .82

  • 8/7/2019 3.3 Power Point

    12/19

    The Anscombe sets:

    Make a scatterplot, find the LSR equation, and find the correlation:

    Set 3:

     x       y
    10    7.46
     8    6.77
    13   12.74
     9    7.11
    11    7.81
    14    8.84
     6    6.08
     4    5.39
    12    8.15
     7    6.42
     5    5.73

    $\hat{y} = 3 + 0.5x$

    r = .82

  • 8/7/2019 3.3 Power Point

    13/19

    The Anscombe sets:

    Make a scatterplot, find the LSR equation, and find the correlation:

    Set 4:

     x       y
     8    6.58
     8    5.76
     8    7.71
     8    8.84
     8    8.47
     8    7.04
     8    5.25
    19   12.50
     8    5.56
     8    7.91
     8    6.89

    $\hat{y} = 3 + 0.5x$

    r = .82
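
The shared summary numbers for the four Anscombe sets can be checked with a short sketch like the one below. The helper `fit` is a hand-written illustration, not a library function, and the first y-value of Set 4 is entered as 6.58, the value in Anscombe's published data.

```python
from math import sqrt

def fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)

# Sets 1 to 3 share the same x-values; Set 4 has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

sets = {
    "Set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "Set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                     6.13, 3.10, 9.13, 7.26, 4.74]),
    "Set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                     6.08, 5.39, 8.15, 6.42, 5.73]),
    # 6.58 is the first y-value in Anscombe's published fourth set.
    "Set 4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                   5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {name: fit(xs, ys) for name, (xs, ys) in sets.items()}
for name, (slope, intercept, r) in summary.items():
    print(f"{name}: y-hat = {intercept:.2f} + {slope:.2f}x, r = {r:.2f}")
```

All four sets give essentially the same line and the same correlation, so the numbers alone cannot tell them apart. The scatterplots are what show how different the four data sets are.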

  • 8/7/2019 3.3 Power Point

    14/19

    QuestionsQuestions

    Which plots have a point that is influential with respect to the slope of the LSR line? How would the slope change if the point were removed?

    Which plots have a point that is influential with respect to the correlation? How would the correlation change if the point were removed?

  • 8/7/2019 3.3 Power Point

    15/19

    Matching:

  • 8/7/2019 3.3 Power Point

    16/19

    Beware the Lurking Variable!

  • 8/7/2019 3.3 Power Point

    17/19

    Imported Goods and Spending on Health

    The explanatory variable is the dollar value of goods imported into the US between 1990 and 2001.

    The response variable is private spending on health in these years.

  • 8/7/2019 3.3 Power Point

    18/19

    Are You in a Relationship?

    There is no economic relationship between these variables. The strong association is due entirely to the fact that both imports and health spending grew rapidly in these years. The common year for each point is a lurking variable. Any two variables that both increase over time will show a strong association. This does not mean that one variable explains or influences the other.
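
A small sketch can show how a shared time trend alone produces a strong correlation. The two series below are made up for illustration and are NOT the actual import or health-spending figures. Each is just a steady yearly increase plus a small wiggle, and neither is computed from the other.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

years = list(range(1990, 2002))  # 1990 through 2001

# Hypothetical series: each depends only on the year, not on the other.
imports = [500 + 60 * k + (15 if k % 2 else -15) for k in range(12)]
health = [300 + 25 * k + (8 if k % 3 == 0 else -4) for k in range(12)]

r_imports_health = correlation(imports, health)
r_year_imports = correlation(years, imports)
r_year_health = correlation(years, health)

print(f"imports vs. health spending: r = {r_imports_health:.3f}")
print(f"year vs. imports:            r = {r_year_imports:.3f}")
print(f"year vs. health spending:    r = {r_year_health:.3f}")
```

Both made-up series track the year closely, so they also track each other closely, even though by construction neither one influences the other.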

  • 8/7/2019 3.3 Power Point

    19/19

    Hey, Watch it!

    Association does not imply causation!!!