Example 3.6 Measures of Association: Covariance and Correlation

Upload: brenda-adams

Post on 19-Jan-2016

Example 3.6

Measures of Association: Covariance and Correlation


EXPENSES.XLS

A survey questions members of 100 households about their spending habits.

The data in this file represent each household's salary and its expenses for cultural activities, sports-related activities, and dining out over the past year.

Do these variables appear to be related linearly?


Covariance and Correlation

When we need to summarize the relationship between two variables, we can use two measures: covariance and correlation. They summarize the type of behavior observed in a scatterplot.

Each measures the strength (and direction) of a linear relationship between two numerical variables.

The relationship is “strong” if the points in a scatterplot cluster tightly around some straight line. If this line rises from left to right, the relationship is “positive”. If it falls from left to right, the relationship is “negative”.
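As a rough illustration of how the two measures capture direction and strength, the sketch below computes them by hand. It assumes the usual sample formulas with n - 1 in the denominator (the slides do not say which form StatPro uses), and the data are made up for illustration, not taken from EXPENSES.XLS.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance: summed products of deviations, divided by n - 1 (assumed form)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # points lie exactly on a rising line
falling = [10, 8, 6, 4, 2]   # points lie exactly on a falling line
noisy = [2, 1, 4, 3, 5]      # rises overall, but less tightly

for label, y in [("rising", rising), ("falling", falling), ("noisy", noisy)]:
    print(f"{label:8s} cov = {covariance(x, y):6.2f}   corr = {correlation(x, y):6.2f}")
```

The sign of either measure gives the direction of the relationship; the correlation's closeness to -1 or +1 indicates how tightly the points cluster around a line.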


Determining Linear Relationships

Scatterplots of each variable against every other would answer the question, but six scatterplots would be required, one for each pair.
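The count of six follows from choosing 2 variables out of 4. A minimal sketch that enumerates the pairs is shown here; the short names are assumptions ("Salary", "Culture" and "Sports" appear later in the slides, "Dining" is a hypothetical label for the dining-out expense).

```python
from itertools import combinations

# Short variable names; "Dining" is an assumed label.
variables = ["Salary", "Culture", "Sports", "Dining"]

# Every unordered pair of distinct variables needs its own scatterplot.
pairs = list(combinations(variables, 2))

for first, second in pairs:
    print(f"{first} versus {second}")

print(f"Scatterplots required: {len(pairs)}")
```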

To get a quick indication of possible linear relationships, we can use StatPro to obtain a table of correlations and/or covariances.


Table of Correlations and Covariances

To get the table, place the cursor anywhere in the data set, use the StatPro/Summary Stats/Correlations, Covariances menu item, and proceed in the obvious way.
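For readers without StatPro, a comparable table can be sketched in plain Python. This is not StatPro's implementation; it assumes the sample (n - 1) formulas, and the five households below are invented values, not the contents of EXPENSES.XLS. The function name `association_table` is hypothetical.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance with n - 1 in the denominator (assumed form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

def association_table(columns, measure):
    """Return {row: {col: measure(row data, col data)}} for every pair of columns."""
    return {r: {c: measure(columns[r], columns[c]) for c in columns} for r in columns}

# Made-up values for five households, for illustration only.
data = {
    "Salary":  [30000, 45000, 52000, 61000, 80000],
    "Culture": [400, 900, 1100, 1500, 2300],
    "Sports":  [1200, 900, 950, 600, 300],
    "Dining":  [800, 1300, 1250, 1700, 2400],
}

corr_table = association_table(data, correlation)
cov_table = association_table(data, covariance)

# Print only the diagonal and the entries below it.
names = list(data)
for i, row in enumerate(names):
    cells = " ".join(f"{corr_table[row][col]:7.3f}" for col in names[: i + 1])
    print(f"{row:<8}{cells}")
```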


Relationships

The only relationships that stand out are the positive relationships between salary and cultural expenses and between salary and dining expenses.

The negative relationship that stands out is between cultural and sports-related expenses.

To confirm these graphically, we show scatterplots of Salary versus Culture and Culture versus Sports.
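One way to make "stand out" concrete is to flag the pairs whose correlation is large in absolute value. The sketch below does this; the correlation values and the 0.3 cutoff are both invented for illustration and are not the numbers from the StatPro table, and the function name `standouts` is hypothetical.

```python
# Hypothetical correlations, illustrative only (not the values from EXPENSES.XLS).
correlations = {
    ("Salary", "Culture"): 0.60,
    ("Salary", "Sports"): 0.10,
    ("Salary", "Dining"): 0.50,
    ("Culture", "Sports"): -0.40,
    ("Culture", "Dining"): 0.10,
    ("Sports", "Dining"): -0.05,
}

THRESHOLD = 0.3  # assumed cutoff; what counts as "standing out" is a judgment call

def standouts(corrs, threshold):
    """Return (pair, r) items with |r| >= threshold, strongest first."""
    flagged = [(pair, r) for pair, r in corrs.items() if abs(r) >= threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)

for (first, second), r in standouts(correlations, THRESHOLD):
    direction = "positive" if r > 0 else "negative"
    print(f"{first} / {second}: r = {r:+.2f} ({direction})")
```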


[Figure: Scatterplot Indicating Positive Relationship]


[Figure: Scatterplot Indicating Negative Relationship]


Correlation and Covariance Properties

In general, the following properties are evident from the table of correlations and covariances.

– The correlation between a variable and itself is 1.

– The correlation between X and Y is the same as the correlation between Y and X. Therefore, it is sufficient to list the correlations below (or above) the diagonal in the table. (The same is true for the covariances.)

– The covariance between a variable and itself is the variance of the variable. We indicate this in the heading of the covariance table.



– It is difficult to interpret the magnitudes of covariances, because they depend on the units of measurement: here the data are measured in dollars rather than, say, thousands of dollars. It is much easier to interpret the magnitudes of the correlations because they are scaled to be between -1 and +1.
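The properties listed above can be checked numerically. The following is a minimal sketch using made-up salary and cultural-expense figures (not the EXPENSES.XLS data) and the sample (n - 1) formulas, which is an assumption about the form used; `statistics.variance` is the standard-library sample variance.

```python
from math import isclose, sqrt
from statistics import variance

def covariance(xs, ys):
    """Sample covariance with n - 1 in the denominator (assumed form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical figures in dollars, for illustration only.
salary = [30000, 45000, 52000, 61000, 80000]
culture = [400, 900, 1100, 1500, 2300]

# The correlation between a variable and itself is 1.
assert isclose(correlation(salary, salary), 1.0)

# Correlation and covariance are symmetric in X and Y.
assert isclose(correlation(salary, culture), correlation(culture, salary))
assert isclose(covariance(salary, culture), covariance(culture, salary))

# The covariance between a variable and itself is its variance.
assert isclose(covariance(salary, salary), variance(salary))

# Re-express both variables in thousands of dollars.
salary_k = [s / 1000 for s in salary]
culture_k = [c / 1000 for c in culture]

cov_dollars = covariance(salary, culture)
cov_thousands = covariance(salary_k, culture_k)
corr_dollars = correlation(salary, culture)
corr_thousands = correlation(salary_k, culture_k)

# The covariance shrinks by a factor of 1000 * 1000; the correlation is unchanged.
assert isclose(cov_dollars, cov_thousands * 1_000_000)
assert isclose(corr_dollars, corr_thousands)

print(f"covariance in dollars^2:       {cov_dollars:,.1f}")
print(f"covariance in (thousand $)^2:  {cov_thousands:,.4f}")
print(f"correlation in either unit:    {corr_dollars:.4f}")
```

Changing units rescales the covariance by the product of the two scale factors, which is why its magnitude alone says little about strength, while the correlation stays the same.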