Correlation

MEASURING ASSOCIATION

• Establishing a degree of association between two or more variables gets at the central objective of the scientific enterprise. Scientists spend most of their time figuring out how one thing relates to another and structuring these relationships into explanatory theories. The question of association comes up in normal discourse as well, as in "like father, like son."

Post on 20-Dec-2015




Scatterplots

A. Scatter diagram

A list of 1,078 pairs of heights would be impossible to grasp, so we need some method that can examine these data and convert them into a more comprehensible format. One method is plotting the data for the two variables (father's height and son's height) in a graph called a scatter diagram.


B. The Correlation Coefficient

This scatter plot looks like a cloud of points, which gives us a nice visual representation and a gut feeling for the strength of the relationship, and is especially useful for spotting outliers or data anomalies. But statistics isn't too fond of simply providing a gut feeling. Statistics is interested in the summary and interpretation of masses of numerical data, so we need to summarize this relationship numerically. How do we do that? With a correlation coefficient.

The correlation coefficient ranges from +1 to -1: +1 is a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship.

[Figures: a series of example scatter diagrams, one per slide, captioned r = 1.0, r = .85, r = .42, r = .17, r = -.94, r = -.54 and r = -.33.]


• Computing the Pearson's r correlation coefficient

• Definitional formula is:

$$r = \frac{\sum z_x z_y}{N}$$

Convert each variable to standard units (z-scores). The average of the products gives the correlation coefficient. But this formula requires you to calculate z-scores for each observation, which means you have to calculate the standard deviations of X and Y before you can get started. For example, look what you have to do for only 5 cases.


Dividing the sum of ZxZy (2.50) by N (5) gets you the correlation coefficient: r = .50.
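The z-score method can be sketched in a few lines of Python. This is only an illustration: the five (X, Y) pairs are made up and are not the five cases from the original slide, so the result differs from the .50 reported there, and the function name `pearson_r_from_z` is an assumption. Standard deviations are computed with N (not N - 1) in the denominator, which is what averaging the products over N requires.

```python
from statistics import mean, pstdev

def pearson_r_from_z(xs, ys):
    """Pearson's r as the average of the products of z-scores."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Population standard deviations (divide by N), to match dividing by N below
    sx, sy = pstdev(xs), pstdev(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / n

# Hypothetical data, NOT the cases shown on the original slide
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r_from_z(x, y)
print(round(r, 2))  # 0.8
```

Even for five cases this takes two means, two standard deviations and ten z-scores, which is the tedium the slide is pointing at.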


• The above formula can also be translated into the following – which is a little easier to decipher but is still tedious to use.

$$r = \frac{SP}{\sqrt{SS_x \, SS_y}}$$

where

$$SP = \sum (X - \bar{X})(Y - \bar{Y})$$

$$SS_x = \sum (X - \bar{X})^2$$

$$SS_y = \sum (Y - \bar{Y})^2$$
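As a rough sketch of the sum-of-products form, the following Python computes SP (the sum of products of deviations from the means) and the two sums of squared deviations, then divides SP by the square root of their product. The data and the function name `sums_of_squares` are illustrative assumptions, not taken from the slides.

```python
from math import sqrt

def sums_of_squares(xs, ys):
    """Return (SP, SS_x, SS_y) computed from deviations about the means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp, ss_x, ss_y

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
sp, ss_x, ss_y = sums_of_squares(x, y)
r = sp / sqrt(ss_x * ss_y)
print(sp, ss_x, ss_y, r)  # 8.0 10.0 10.0 0.8
```

No standard deviations or z-scores are needed here, but every deviation from the mean still has to be computed case by case.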


• Or in other words:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$


• Therefore through some algebraic magic we get the computational formula, which is a bit more manageable.

$$r = \frac{\sum XY - N\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - N\bar{X}^2\right)\left(\sum Y^2 - N\bar{Y}^2\right)}}$$
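A minimal sketch of the computational formula, assuming it takes the usual form built from the raw sums of XY, X squared and Y squared plus the two means. The data and the function name `pearson_r_computational` are hypothetical.

```python
from math import sqrt

def pearson_r_computational(xs, ys):
    """Pearson's r from raw sums, with no per-case deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - n * mean_x * mean_y
    denominator = sqrt((sum_x2 - n * mean_x ** 2) * (sum_y2 - n * mean_y ** 2))
    return numerator / denominator

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r_computational(x, y))  # 0.8
```

This form is convenient by hand because it needs only running totals. In floating-point code it can lose precision when the values are large relative to their spread, so the deviation form is often the safer choice in software.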


Interpreting correlation coefficients

• Strong Association versus Weak Association: Strong: knowing one variable helps a lot in predicting the other. Weak: information about one variable does not help much in guessing the other. Rough guide: 0 = none; .25 = weak; .5 = moderate; .75 or higher = strong.

• Index of Association

• R-squared: defined as the proportion of the variance of one variable accounted for by another variable, a.k.a. a PRE statistic (Proportionate Reduction of Error).
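To make the PRE reading of R-squared concrete, here is a small sketch under stated assumptions: the data are made up, the function name `pre_statistic` is hypothetical, and the "prediction using X" is taken to be the least-squares line. It compares the squared error from guessing the mean of Y for every case with the squared error after using X, and checks that the proportionate reduction equals r squared.

```python
def pre_statistic(xs, ys):
    """Return (PRE, r squared) for predicting ys from xs with a least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    # Error when guessing the mean of Y for every case
    ss_y = sum((y - my) ** 2 for y in ys)
    slope = sp / ss_x
    intercept = my - slope * mx
    # Error that remains after using X to predict Y
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    pre = (ss_y - sse) / ss_y
    r_squared = sp ** 2 / (ss_x * ss_y)
    return pre, r_squared

# Hypothetical data for illustration only
pre, r_squared = pre_statistic([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(pre, 2), round(r_squared, 2))  # 0.64 0.64
```

With r = .8 in this made-up example, knowing X removes 64 percent of the squared prediction error, which is a more interpretable statement than calling the association "strong".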


Significance of the correlation

• Null hypothesis?

• Formula:

$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$$

with N - 2 degrees of freedom.

• Then look to Table C in Appendix B

• Or just look at Table F in Appendix B
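The test statistic can be sketched as follows, assuming the standard form t = r * sqrt(N - 2) / sqrt(1 - r^2) with N - 2 degrees of freedom. The values r = .8 and N = 5 are hypothetical, and the function name is an assumption; the appendix tables referred to on the slide are not reproduced here.

```python
from math import sqrt

def t_for_correlation(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    # Undefined when r is exactly +1 or -1 (division by zero)
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical values for illustration only
t = t_for_correlation(0.8, 5)
print(round(t, 2))  # 2.31

# With df = 3, a standard t table gives a two-tailed .05 critical value
# of 3.182, so this correlation would not be judged significant.
```

The example shows why sample size matters: a correlation as large as .8 is not distinguishable from zero with only five cases.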


Limitations of Pearson's r

• 1) At best, one must speak of "strong" and "weak," "some" and "none": precisely the vagueness statistical work is meant to cure.

• 2) Assumes interval-level data: variables measured at other levels require different statistics to test for association.


• 3) Outliers and nonlinearity

• The correlation coefficient does not always give a true indication of the clustering. There are two main exceptional cases: outliers and nonlinearity.

[Figure: two scatter diagrams, captioned r = .457 and r = .336.]
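The effect of a single outlier can be sketched with made-up numbers (these are not the data behind the two figures above). Five points with a clear positive trend give r = .8; adding one extreme case is enough to flip the sign.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from sums of products and squares of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r_without = pearson_r(x, y)
# Add one extreme case at (10, 0)
r_with = pearson_r(x + [10], y + [0])
print(round(r_without, 2), round(r_with, 2))  # 0.8 -0.32
```

This is one reason to look at the scatter diagram before trusting the coefficient.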


• 4) Assumes a linear relationship

[Figure: scatter plot of Salary (vertical axis, 0 to 60,000) against Education (horizontal axis, 0 to 30).]
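A small sketch of why the linearity assumption matters, using made-up data rather than the salary and education data in the figure. Here Y is completely determined by X (Y is X squared), yet Pearson's r is zero because the relationship is not linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from sums of products and squares of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical data: a perfect but curved (U-shaped) relationship
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
print(pearson_r(x, y))  # 0.0
```

A value of r near zero therefore means "no linear association", not "no association".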


• 5) Christopher Achen (1977) argues (and shows empirically) that two correlations can differ because the variances in the samples differ, not because the underlying relationship has changed.

Solution?

Regression analysis
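The point about sample variance can be illustrated with a constructed example (the data are artificial and are not Achen's). Both samples follow the same rule, Y = 2X plus or minus 1, but one covers a narrow range of X and the other a wide range. The regression slope is the same in both, while the correlation is not.

```python
from math import sqrt

def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson's r) for ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / ss_x, sp / sqrt(ss_x * ss_y)

def make_sample(x_values):
    """Artificial data: two cases per X value, at Y = 2X - 1 and Y = 2X + 1."""
    xs, ys = [], []
    for x in x_values:
        for noise in (-1, 1):
            xs.append(x)
            ys.append(2 * x + noise)
    return xs, ys

slope_narrow, r_narrow = slope_and_r(*make_sample(range(4, 7)))   # X from 4 to 6
slope_wide, r_wide = slope_and_r(*make_sample(range(1, 10)))      # X from 1 to 9
print(slope_narrow, round(r_narrow, 2))  # 2.0 0.85
print(slope_wide, round(r_wide, 2))      # 2.0 0.98
```

The slope describes the relationship itself, while r also reflects how much X happens to vary in the sample, which is the motivation for turning to regression.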


Three Types of Unobtrusive Research

1. Content analysis - examine written documents such as editorials.

2. Analyses of existing statistics.

3. Historical/comparative analysis - historical records.


What is Content Analysis?

• Study of recorded human communication

• Topics Appropriate for CA

– “who says what, to whom, how, and with what”

– Effects of the Media


Example

Investigated the media’s role in framing the welfare privatization debate with a content analysis of ABC, CBS & NBC evening news & special programs from 1/1/94 to 8/22/96. Specials include Nightline, 20/20 and This Week with David Brinkley on ABC; 60 Minutes, 48 Hours and Face the Nation on CBS.

Searched LexisNexis and the Vanderbilt Television News Archive for all transcripts pertaining to the issue of how welfare should be administered, and found 191 stories.

At the time of the study, NBC’s transcripts from before 1997 were not available on LexisNexis. The authors searched for NBC stories using the Vanderbilt Television News Archive and then purchased the pre-1997 transcripts from Burrell’s Transcripts.


Coding, Counting and Record Keeping

• Unit of Analysis

• Manifest vs. Latent Content coding

• Analysis:

– Counting

– Qualitative evaluation
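Counting manifest content can be as simple as tallying how often chosen terms appear in each unit of analysis. The sketch below assumes the unit is a single story transcript; the transcript text, the term list and the function name `count_terms` are all invented for illustration and do not come from the study.

```python
import re
from collections import Counter

def count_terms(text, terms):
    """Count how often each term appears as a whole word (case-insensitive)."""
    words = re.findall(r"[a-z'-]+", text.lower())
    counts = Counter(words)
    return {term: counts[term] for term in terms}

# Invented transcript snippet, for illustration only
transcript = (
    "Critics say welfare dependency is growing. "
    "Supporters of privatization say charities reduce dependency and costs."
)
print(count_terms(transcript, ["dependency", "privatization", "costs"]))
# {'dependency': 2, 'privatization': 1, 'costs': 1}
```

Latent content coding, by contrast, requires a human judgment about the underlying meaning of each story and cannot be reduced to a word count like this.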


Coding: Pro-Privatization Frames

CAUSE OF PROBLEM / PROBLEM / SOLUTION

9. Delivery / dependency / faith-based
10. Delivery / economic costs / faith-based
11. Delivery / dependency / non-profits
12. Delivery / econ. costs / non-profits
13. Delivery / dependency / for-profits
14. Delivery / econ. costs / for-profits
16. Gen govt / dependency / faith-based
17. Gen govt / econ. costs / faith-based
18. Gen govt / dependency / non-profits


Coding: Anti-Privatization Frames

CAUSE OF PROBLEM / PROBLEM / SOLUTION

3. Privatization / job loss / don’t privatize
4. Privatization / job loss / don’t devolve
5. Privatization / accountability / don’t privatize
6. Privatization / accountability / don’t devolve
11. Secular / job loss / don’t privatize
12. Secular / job loss / don’t devolve
13. Secular / accountability / don’t privatize
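One way to keep records for frame coding like this is a small codebook keyed by side and frame number, since numbers such as 11 and 12 appear in both the pro- and anti-privatization lists. The sketch below uses a subset of the frames listed on the slides; the coded stories are invented, and the structure is an assumption, not the authors' actual record keeping.

```python
from collections import Counter

# Subset of the frames from the slides: (cause of problem, problem, solution)
CODEBOOK = {
    ("pro", 9): ("Delivery", "dependency", "faith-based"),
    ("pro", 12): ("Delivery", "econ. costs", "non-profits"),
    ("anti", 3): ("Privatization", "job loss", "don't privatize"),
    ("anti", 5): ("Privatization", "accountability", "don't privatize"),
    ("anti", 11): ("Secular", "job loss", "don't privatize"),
}

# Invented coding results: one (side, frame number) per coded story
coded_stories = [("pro", 9), ("anti", 3), ("anti", 5), ("pro", 12), ("anti", 11)]

# Tally stories by side, and by the "problem" element of each frame
side_counts = Counter(side for side, _ in coded_stories)
problem_counts = Counter(CODEBOOK[key][1] for key in coded_stories)

print(side_counts["pro"], side_counts["anti"])  # 2 3
print(problem_counts["job loss"])               # 2
```

Tallies of this kind are what allow a claim about balance in coverage to be checked against counts rather than impressions.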


Hypothesis & Findings

• The authors hypothesized that mainstream (corporate-owned) media would be biased toward privatization.

• The findings did not support this hypothesis. Media coverage was remarkably balanced (with a slight leaning against privatization).


Strengths of Content Analysis

• Economy of time and money.

• Easy to repeat a portion of the study if necessary.

• Permits study of processes over time.

• Researcher seldom has any effect on the subject being studied.

• Reliability.


Weaknesses of Content Analysis

• Limited to the examination of recorded communications.

• Problems of validity are likely.


Analyzing Existing Statistics

• Can be the main source of data or a supplemental source of data.

• Often existing data doesn't cover the exact question.

• Reliability is dependent on the quality of the statistics.

• Examples: Census data, Crime Stats



Problems with Existing Statistics

• Problems with Validity

– What’s available v. what is needed

• Problems with Reliability

– Moreno Valley Example