Correlation Association between 2 variables

Upload: sybil-knight

Post on 31-Dec-2015


Page 1

Correlation

Association between 2 variables

Page 2

Suppose we wished to graph the relationship between foot length and height of 20 subjects.

[Figure: blank scatterplot axes; x-axis: Foot Length (4 to 14 inches), y-axis: Height (58 to 74 inches)]

In order to create the graph, which is called a scatterplot or scattergram, we need the foot length and height for each of our subjects.

Page 3

Assume our first subject had a 12 inch foot and was 70 inches tall.

1. Find 12 inches on the x-axis.
2. Find 70 inches on the y-axis.
3. Locate the intersection of 12 and 70.
4. Place a dot at the intersection of 12 and 70.

[Figure: scatterplot axes; x-axis: Foot Length, y-axis: Height]

Page 4

Assume that our second subject had an 8 inch foot and was 62 inches tall.

5. Find 8 inches on the x-axis.
6. Find 62 inches on the y-axis.
7. Locate the intersection of 8 and 62.
8. Place a dot at the intersection of 8 and 62.
9. Continue to plot points for each pair of scores.

[Figure: scatterplot axes; x-axis: Foot Length, y-axis: Height]
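Steps 1 to 9 can be mimicked in code. The following is a minimal Python sketch, assuming a text grid stands in for graph paper: each column is one inch of foot length and each row one inch of height. Only the two subjects described above are used, because the other 18 pairs are not given in the slides. The function name `text_scatterplot` is hypothetical.

```python
def text_scatterplot(pairs, x_range, y_range, mark="*"):
    """Return the rows of a character grid with one mark per (x, y) pair.

    x_range and y_range are inclusive (low, high) integer bounds; each
    column is one unit of x and each row is one unit of y.
    """
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    width = x_hi - x_lo + 1
    height = y_hi - y_lo + 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:
        # Steps 1-3: find x on the x-axis and y on the y-axis, then locate
        # their intersection (row 0 is the top of the plot, so y is flipped).
        col = round(x) - x_lo
        row = y_hi - round(y)
        if not (0 <= col < width and 0 <= row < height):
            raise ValueError(f"point {(x, y)} is outside the axes")
        # Step 4: place a dot at the intersection.
        grid[row][col] = mark
    return ["".join(cells) for cells in grid]


# (foot length, height) in inches for the two subjects described above.
subjects = [(12, 70), (8, 62)]
rows = text_scatterplot(subjects, x_range=(4, 14), y_range=(58, 74))

# Print the grid with height labels on the left, tallest at the top.
for height_in, row in zip(range(74, 57, -1), rows):
    print(f"{height_in:>3} |{row}")
print("    +" + "-" * 11)
```

Step 9 amounts to appending the remaining (foot length, height) pairs to `subjects`.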

Page 5

[Figure: scatterplot of Foot Length (x-axis) vs. Height (y-axis)]

Notice how the scores cluster to form a pattern.

The more closely they cluster around a line drawn through them, the stronger the linear relationship between the two variables (in this case, foot length and height).

Page 6

If the points on the scatterplot have an upward movement from left to right, we say the relationship between the variables is positive.

If the points on the scatterplot have a downward movement from left to right, we say the relationship between the variables is negative.

[Figure: two scatterplots of Foot Length vs. Height, one rising and one falling from left to right]

Page 7

A positive relationship means that high scores on one variable are associated with high scores on the other variable. It also indicates that low scores on one variable are associated with low scores on the other variable.

[Figure: scatterplot of Foot Length vs. Height]

Page 8

A negative relationship means that high scores on one variable are associated with low scores on the other variable. It also indicates that low scores on one variable are associated with high scores on the other variable.

[Figure: scatterplot of Foot Length vs. Height]

Page 9

Not only do relationships have direction (positive and negative), they also have strength (from 0.00 to 1.00 and from 0.00 to –1.00).

The more closely the points cluster toward a straight line, the stronger the relationship is.

Page 10

A set of scores with r = –0.60 has the same strength as a set of scores with r = 0.60 because both sets cluster equally closely around a straight line.
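The claim that r = –0.60 and r = 0.60 are equally strong can be checked numerically. Here is a minimal Python sketch using a small hypothetical data set (not from the slides). Reversing the direction of one variable flips the sign of r but leaves its size unchanged, because the points cluster around the line just as tightly.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


xs = [1, 2, 3, 4, 5]            # hypothetical scores
ys = [2, 1, 4, 3, 5]            # upward trend: positive relationship
ys_flipped = [-y for y in ys]   # same scatter, mirrored: downward trend

r_pos = pearson_r(xs, ys)
r_neg = pearson_r(xs, ys_flipped)
print(f"r = {r_pos:.2f}, flipped r = {r_neg:.2f}")  # r = 0.80, flipped r = -0.80
```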

Page 11

For this procedure, we use Pearson’s r (also known as the Pearson product-moment correlation coefficient). This statistical procedure can only be used when BOTH variables are measured on a continuous scale and you wish to measure a linear relationship.

[Figure: a linear relationship and a curvilinear relationship; Pearson r is NOT appropriate for the curvilinear one]
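Pearson r is ruled out for curvilinear data because it measures only straight-line association. A perfect but U-shaped relationship can therefore produce r = 0. Here is a minimal Python sketch with hypothetical data, where y is exactly x² over a range symmetric around zero:

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # straight line
curved = [x ** 2 for x in xs]      # U-shaped curve, perfectly predictable

print(f"linear: r = {pearson_r(xs, linear):.2f}")  # linear: r = 1.00
print(f"curved: r = {pearson_r(xs, curved):.2f}")  # curved: r = 0.00
```

Looking at the scatterplot first, as the assumptions slide recommends, is what catches a case like this.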

Page 12

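Formula for correlations

$$r = \frac{\mathrm{Cov}_{xy}}{SD_x\,SD_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})/n}{SD_x\,SD_y}$$

or

$$r = \frac{1}{n-1}\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

In the first form, SD_x and SD_y use the same divisor n as the covariance. In the second, s_x and s_y are sample standard deviations (divisor n − 1). The divisors cancel, so both forms give the same r.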
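The two formulas can be checked against each other. Here is a minimal Python sketch using the standard-library `statistics` module. `pstdev` divides by n to match the covariance form, and `stdev` divides by n − 1 to match the z-score form. The foot-length and height values are hypothetical, not the class data.

```python
import statistics


def r_from_covariance(xs, ys):
    """r = Cov_xy / (SD_x * SD_y), with every divisor equal to n."""
    n = len(xs)
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return cov_xy / (statistics.pstdev(xs) * statistics.pstdev(ys))


def r_from_z_scores(xs, ys):
    """r = 1/(n-1) * sum of z_x * z_y, using sample SDs (divisor n - 1)."""
    n = len(xs)
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


foot = [8, 9, 10, 11, 12]        # hypothetical foot lengths (inches)
height = [62, 65, 66, 68, 70]    # hypothetical heights (inches)
print(round(r_from_covariance(foot, height), 4))
print(round(r_from_z_scores(foot, height), 4))  # same value as the line above
```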

Page 13

Assumptions of the PMCC

1. The measures are approximately normally distributed

2. The variability of one measure is similar across all values of the other (homoscedasticity) -- check with scatterplot

3. The relationship is linear -- check with scatterplot

4. The sample represents the population

5. The variables are measured on an interval or ratio scale

Page 14

Example

• We’ll use data from the class questionnaire in 2005 to see if a relationship exists between the number of times per week respondents eat fast food and their weight

• What’s your guess (hypothesis) about how the results of this test will turn out? .5? .8? ???

Page 15

Example

• To get a correlation coefficient:

• Slide the variables over...

Page 16

Example

• SPSS output

The value in red is our correlation coefficient. The value in blue is the level of significance (p-value) resulting from the test. What does that mean?

Page 17

Digression - Hypotheses

• Many research designs involve statistical tests, which involve accepting or rejecting a hypothesis

• Null (statistical) hypotheses assume no relationship between two or more variables.

• Statistics are used to test null hypotheses
  – E.g., we assume that there is no relationship between weight and fast food consumption until we find statistical evidence that there is

Page 18

Probability

• Probability is the chance that a certain event will occur

• In research, we deal with the chance that patterns in data have emerged by chance vs. that they are representative of a real relationship

• Alpha (α) is the probability level (or significance level) set in advance by the researcher: the risk the researcher accepts of concluding that a relationship exists when the pattern actually arose by chance
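To make "patterns that emerged by chance" concrete, here is a minimal Python sketch of a permutation test. This technique is swapped in for illustration; it is not the t-test SPSS uses to report the significance of r. It shuffles one variable many times to break any real pairing, then counts how often chance alone produces a correlation at least as strong as the observed one. The data and the names `pearson_r` and `permutation_p_value` are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # random re-pairing: any link to xs is broken
        # A tiny tolerance so ties with the observed value are not lost to rounding.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            as_extreme += 1
    # Counting the observed pairing as one more arrangement keeps p above 0.
    return (as_extreme + 1) / (n_shuffles + 1)


xs = list(range(1, 11))
ys = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]  # hypothetical, strongly related to xs
print(f"r = {pearson_r(xs, ys):.2f}, p = {permutation_p_value(xs, ys):.4f}")
```

A small p means random re-pairings almost never match the observed correlation, so chance alone is an unlikely explanation.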

Page 19

Probability

• Alpha levels (cont.)
  – E.g., α = .05 means that, if there is truly no relationship, there is a 5% chance that the test will still produce a significant finding by chance alone
  – The lower the α, the better, but… the α level must be set in advance

Page 20

Probability

• Most statistical tests produce a p-value that is then compared to the α-level to accept or reject the null hypothesis

• E.g., the researcher sets the significance level at .05 a priori; test results show p = .02.

• The researcher can then reject the null hypothesis and conclude that the result is unlikely to be due to chance alone, which supports there being a real relationship in the data

• How about p = .051, when α-level = .05?
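Comparing a p-value with an α level fixed a priori is a one-line rule. Here is a minimal Python sketch. It treats p ≤ α as significant, which is a common convention, although some texts use p < α. It says "fail to reject" where the slides say "accept", which is the more common wording in statistics texts. The function name `decide` is hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with an alpha level chosen before looking at the data."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.02))   # reject the null hypothesis
print(decide(0.051))  # fail to reject the null hypothesis
```

Under this rule p = .051 is not significant at α = .05, because the cutoff was set in advance.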

Page 21

Error

• Significance levels (e.g., α = .05) are set in order to control the risk of error
  – Type I error = rejection of the null hypothesis when it was actually true
    • Conclusion = relationship; there wasn’t one (false positive) (probability = α)
  – Type II error = acceptance of the null hypothesis when it was actually false
    • Conclusion = no relationship; there was one (false negative)
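A simulation shows that the Type I error rate equals α. Here is a minimal Python sketch, assuming normally distributed data. It generates many hypothetical studies in which the two variables are truly unrelated, so the null hypothesis is true. It tests each r with the usual t statistic t = r√(n − 2)/√(1 − r²) and counts how often the test comes out significant anyway. The cutoff 1.984 is an approximate two-tailed .05 critical value of t for n − 2 = 98 degrees of freedom. It is hard-coded because the Python standard library has no t distribution.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def type_i_error_rate(n=100, n_studies=2000, t_critical=1.984, seed=1):
    """Share of simulated null-is-true studies that come out 'significant'."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_studies):
        # Two independent samples: there is no real relationship to find.
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        r = pearson_r(xs, ys)
        t = r * math.sqrt((n - 2) / (1.0 - r * r))
        if abs(t) > t_critical:
            false_positives += 1  # concluded "relationship"; there wasn't one
    return false_positives / n_studies


print(f"Type I error rate: {type_i_error_rate():.3f}")  # should be near .05
```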

Page 22

Error – Truth Table

• Null true, accept null: correct decision
• Null true, reject null: Type I error
• Null false, accept null: Type II error
• Null false, reject null: correct decision
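The truth table can also be written as a small lookup. Here is a minimal Python sketch. The function name `classify_decision` is hypothetical, and "accept" follows the slides' wording for not rejecting the null.

```python
def classify_decision(null_is_true, rejected_null):
    """Map the truth about the null hypothesis and the decision to a table cell."""
    if null_is_true and rejected_null:
        return "Type I error"      # concluded relationship; there wasn't one
    if not null_is_true and not rejected_null:
        return "Type II error"     # concluded no relationship; there was one
    return "correct decision"


for null_is_true in (True, False):
    for rejected_null in (False, True):
        truth = "Null true" if null_is_true else "Null false"
        action = "Reject" if rejected_null else "Accept"
        print(f"{truth:<11}{action:<7}{classify_decision(null_is_true, rejected_null)}")
```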

Page 23

Back to Our Example

• Conclusion: No relationship exists between weight and fast food consumption with this group of respondents

Page 24

Really?

• Conclusion: No relationship exists between weight and fast food consumption with this group of subjects
  – Do you believe this? Can you critique it? Construct validity? External validity?
  – Thinking in this fashion will help you adopt a critical stance when reading research

Page 25

Another Example

• Now let’s see if a relationship exists between weight and the number of piercings a person has
  – What’s your guess (hypothesis) about how the results of this test will turn out?
  – It’s fine to guess, but remember that our null hypothesis is that no relationship exists, until the data shows otherwise

Page 26

Another Example (continued)

• What can we conclude from this test?

• Does this mean that weight causes piercings, or vice versa, or what?

Page 27

Correlations and causality

• Correlations only describe the relationship; they do not prove cause and effect

• Correlation is a necessary but not sufficient condition for determining causality

• There are three requirements for inferring a causal relationship

Page 28

Correlations and causality

1. A statistically significant relationship between the variables

2. The causal variable occurred prior to the other variable

3. There are no other factors that could account for the relationship

Correlation studies do not meet the last requirement and may not meet the second requirement (go back to internal validity – 497)

Page 29

Correlations and causality

If there is a relationship between weight and # piercings, it could be because:

• weight → # piercings
• weight ← # piercings
• weight ← some other factor → # piercings

Which do you think is most likely here?

Page 30

Other Types of Correlations

• Other measures of correlation between two variables:
  – Point-biserial correlation: use when one variable is dichotomous and the other is continuous
    • The formula for computing a PBC is actually just a mathematical simplification of the formula used to compute Pearson’s r, so to compute a PBC in SPSS, just compute r and the result is the same
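The claim that the point-biserial correlation is just Pearson's r can be checked directly. Here is a minimal Python sketch with hypothetical scores. It codes the dichotomous variable as 0 and 1 and computes ordinary Pearson r. The result matches the usual point-biserial formula r_pb = (M₁ − M₀)/s · √(pq). In that formula, M₁ and M₀ are the group means, s is the standard deviation of all the scores with divisor n, and p and q are the proportions in each group.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def point_biserial(groups, scores):
    """Point-biserial formula; groups must be coded 0/1."""
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    if len(ones) + len(zeros) != len(scores):
        raise ValueError("groups must contain only 0 and 1")
    p = len(ones) / len(scores)
    q = 1 - p
    s = statistics.pstdev(scores)  # divisor n
    return (statistics.fmean(ones) - statistics.fmean(zeros)) / s * math.sqrt(p * q)


groups = [0, 0, 0, 0, 1, 1, 1, 1, 1]   # hypothetical dichotomous variable
scores = [3, 5, 4, 6, 7, 6, 9, 8, 7]   # hypothetical continuous variable
print(round(pearson_r(groups, scores), 4))
print(round(point_biserial(groups, scores), 4))  # same value as the line above
```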

Page 31

Other Types of Correlations

• Other measures of correlation between two variables (cont.):
  – Spearman rho correlation: use with ordinal (rank) data
    • Computed in SPSS the same way as Pearson’s r… simply check the Spearman box in the Bivariate Correlations window
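Spearman's rho is Pearson's r computed on ranks. Here is a minimal Python sketch with hypothetical data, assuming tied scores receive the average of the ranks they span (the usual convention). Because rho depends only on order, a relationship that rises steadily but not in a straight line gets rho = 1, even though Pearson's r is below 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def average_ranks(values):
    """Rank from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j (0-based) are tied
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # always rising, but curved
print(f"Pearson r = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")  # Spearman rho = 1.000
```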

Page 32

Coefficient of Determination

• Correlation coefficient squared (r²)

• Percentage of the variability among scores on one variable that can be attributed to differences in the scores on the other variable

The coefficient of determination is useful because it gives the proportion of the variance of one variable that is predictable from the other variable

Next week we will discuss regression, which builds upon correlation and utilizes this coefficient of determination
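The link between r² and "proportion of variance predictable" can be shown directly. For example, r = 0.60 gives r² = 0.36, or 36%. Here is a minimal Python sketch with hypothetical data. It squares r, then fits the least-squares line (a preview of regression) and computes 1 − SS_residual/SS_total, the share of the variance in y accounted for by the line. For a straight line fitted with an intercept, the two numbers agree.

```python
import math


def r_squared_two_ways(xs, ys):
    """Return (r squared, 1 - SS_residual / SS_total) for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)  # SS_total
    r = sum_xy / math.sqrt(sum_xx * sum_yy)
    # Least-squares line y = intercept + slope * x.
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return r * r, 1 - ss_residual / sum_yy


xs = [1, 2, 3, 4, 5]   # hypothetical scores
ys = [2, 1, 4, 3, 5]
squared_r, explained = r_squared_two_ways(xs, ys)
print(f"r squared = {squared_r:.2f}")        # r squared = 0.64
print(f"1 - SSres/SStot = {explained:.2f}")  # 1 - SSres/SStot = 0.64
```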

Page 33

Correlation in Excel

Use the function “CORREL”

The “arguments” (components) of the function are the two arrays

Page 34

Page 35

Applets (see applets page)

• http://www.stat.uiuc.edu/courses/stat100/java/GCApplet/GCAppletFrame.html

• http://www.stat.sc.edu/~west/applets/clicktest.html

• http://www.stat.sc.edu/~west/applets/rplot.html