
Posted on 23-Feb-2016


Overview

4.2 Introduction to Correlation

4.3 Introduction to Regression

Scatterplots

Used to summarize the relationship between two quantitative variables that have been measured on the same element.

A graph of points (x, y), each of which represents one observation from the data set.

One of the variables is measured along the horizontal axis and is called the x variable

The other variable is measured along the vertical axis and is called the y variable

Predictor Variable and Response Variable

The value of the x variable can be used to predict or estimate the value of the y variable.

The x variable is referred to as the predictor variable

The y variable is called the response variable

Scatterplot Terminology

Note the terminology in the caption to Figure 4.2.

When describing a scatterplot, always name the y variable first, then use the term versus (vs.) or against, followed by the x variable.

This terminology reinforces the notion that the y variable depends on the x variable.

FIGURE 4.2 Scatterplot of sales price versus square footage.

Positive relationship

As the x variable increases in value, the y variable also tends to increase.

FIGURE 4.3 (a) Scatterplot of a positive relationship.

Negative relationship

As the x variable increases in value, the y variable tends to decrease.

FIGURE 4.3 (b) Scatterplot of a negative relationship.

No apparent relationship

As the x variable increases in value, the y variable tends to remain unchanged.

FIGURE 4.3 (c) Scatterplot of no apparent relationship.

4.2 Introduction to Correlation

Objective: By the end of this section, I will be able to…

1) Calculate and interpret the value of the correlation coefficient.

Correlation Coefficient r

Measures the strength and direction of the linear relationship between two variables:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} $$

where n is the number of (x, y) data pairs,

s_x is the sample standard deviation of the x data values, and

s_y is the sample standard deviation of the y data values.
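A minimal Python sketch of the definitional formula for r, computing it step by step from deviations about the means. The function names and the five data pairs below are hypothetical, made up for illustration; they are not the Table 4.11 temperatures.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation: divide the squared deviations by n - 1
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    """Correlation coefficient r from sum((x - x_bar)(y - y_bar)) / ((n - 1) s_x s_y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = sample_sd(x), sample_sd(y)
    # Sum of the products of paired deviations from the means
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / ((n - 1) * s_x * s_y)

# Hypothetical low/high temperature pairs (illustration only, not Table 4.11)
low = [30, 35, 40, 45, 50]
high = [41, 44, 52, 53, 61]
print(round(correlation(low, high), 4))
```

Because s_x and s_y divide out the units, r has no units and always lies between –1 and 1.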

Example 4.5 - Calculating the correlation coefficient r

Find the value of the correlation coefficient r for the temperature data in Table 4.11.

Table 4.11 High and low temperatures, in degrees Fahrenheit, of 10 American cities

Interpreting the Correlation Coefficient r

1) Values of r close to 1 indicate a positive linear relationship between the two variables.

The variables are said to be positively correlated.

As x increases, y tends to increase as well.

2) Values of r close to –1 indicate a negative linear relationship between the two variables.

The variables are said to be negatively correlated.

As x increases, y tends to decrease.

3) Values of r near 0 indicate the lack of either a positive or a negative linear relationship between the two variables.

The variables are said to be uncorrelated.

As x increases, y tends to neither increase nor decrease linearly.

Guidelines for Interpreting the Correlation Coefficient r

If the correlation coefficient between two variables is
- greater than 0.7, the variables are positively correlated.
- between 0.33 and 0.7, the variables are mildly positively correlated.
- between –0.33 and 0.33, the variables are not correlated.
- between –0.7 and –0.33, the variables are mildly negatively correlated.
- less than –0.7, the variables are negatively correlated.
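The guidelines can be sketched as a small lookup function. The name `interpret_r` is hypothetical, and the guidelines do not say which category an exact boundary value such as 0.7 or –0.33 belongs to; this sketch assumes boundary values go to the weaker category.

```python
def interpret_r(r):
    """Label r using the guideline thresholds (boundary handling is an assumption)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r > 0.7:
        return "positively correlated"
    if r > 0.33:          # 0.33 < r <= 0.7
        return "mildly positively correlated"
    if r >= -0.33:        # -0.33 <= r <= 0.33
        return "not correlated"
    if r >= -0.7:         # -0.7 <= r < -0.33
        return "mildly negatively correlated"
    return "negatively correlated"

print(interpret_r(0.9761))  # prints: positively correlated
```

The value 0.9761 is the temperature correlation from Example 4.5.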

Example 4.6 - Interpreting the correlation coefficient

Interpret the correlation coefficient found in Example 4.5.

Solution

In Example 4.5, we found the correlation coefficient for the relationship between high and low temperature to be r = 0.9761.

Since r = 0.9761 is very close to 1, we would say that high and low temperatures for these 10 American cities are strongly positively correlated.

As the low temperature increases, the high temperature also tends to increase.

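Equivalent Computational Formula for Calculating the Correlation Coefficient r

$$ r = \frac{\sum xy - \left(\sum x\right)\left(\sum y\right)/n}{\sqrt{\sum x^2 - \left(\sum x\right)^2/n}\;\sqrt{\sum y^2 - \left(\sum y\right)^2/n}} $$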
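A sketch of the computational formula, which needs only five running sums (Σx, Σy, Σxy, Σx², Σy²) instead of deviations from the means. The function names and data are hypothetical (not the Table 4.6 home lots); the cross-check shows that the two formulas agree up to floating-point rounding.

```python
import math

def correlation_computational(x, y):
    """r from the sums formula: (Σxy - ΣxΣy/n) / (sqrt(Σx² - (Σx)²/n) sqrt(Σy² - (Σy)²/n))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(sum_x2 - sum_x ** 2 / n) * math.sqrt(sum_y2 - sum_y ** 2 / n)
    return numerator / denominator

def correlation_definitional(x, y):
    """r from deviations about the means, for cross-checking."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / ((n - 1) * s_x * s_y)

# Hypothetical paired data (illustration only)
x_vals = [30, 35, 40, 45, 50]
y_vals = [41, 44, 52, 53, 61]
print(correlation_computational(x_vals, y_vals))
print(correlation_definitional(x_vals, y_vals))
```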

Example 4.7

Use the computational formula to calculate the correlation coefficient r for the relationship between square footage and sales price of the eight home lots for sale in Glen Ellyn from Table 4.6 (Example 4.3 in Section 4.1).

Summary

Section 4.2 introduces the correlation coefficient r, a measure of the strength of the linear association between two numeric variables.

Values of r close to 1 indicate that the variables are positively correlated.

Values of r close to –1 indicate that the variables are negatively correlated.

Values of r close to 0 indicate that the variables are not correlated.

4.3 Introduction to Regression

Objectives: By the end of this section, I will be able to…

1) Calculate the value and understand the meaning of the slope and the y intercept of the regression line.

2) Predict values of y for given values of x.

Equation of the Regression Line

Approximates the relationship between x and y.

The equation is

$$ \hat{y} = b_0 + b_1 x $$

where the regression coefficients are the slope, b1, and the y intercept, b0.

The “hat” over the y (pronounced “y-hat”) indicates that this is an estimate of y and not necessarily an actual value of y.
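The slides show only the final step of Example 4.8, so its earlier steps are not reproduced here. As a sketch, assuming the usual least-squares formulas b1 = r(s_y/s_x) and b0 = ȳ − b1·x̄, the coefficients can be computed as follows. The function name and the data are hypothetical.

```python
import math

def regression_coefficients(x, y):
    """Least-squares intercept b0 and slope b1, assuming b1 = r*s_y/s_x and b0 = y_bar - b1*x_bar."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        # r (and hence this form of b1) is undefined when either variable is constant
        raise ValueError("x and y must each contain at least two distinct values")
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_x = math.sqrt(ss_x / (n - 1))
    s_y = math.sqrt(ss_y / (n - 1))
    r = ss_xy / math.sqrt(ss_x * ss_y)   # same value as the definitional formula for r
    b1 = r * s_y / s_x                   # slope
    b0 = y_bar - b1 * x_bar              # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical low/high temperature pairs (illustration only, not Table 4.11)
low = [30, 35, 40, 45, 50]
high = [41, 44, 52, 53, 61]
b0, b1 = regression_coefficients(low, high)
print(b0, b1)
```

Writing b1 through r makes the link to Section 4.2 explicit; algebraically it equals Σ(x − x̄)(y − ȳ) / Σ(x − x̄)².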

Example 4.8 - Calculating the regression coefficients b0 and b1

Find the value of the regression coefficients b0 and b1 for the temperature data in Table 4.11.

Table 4.11 High and low temperatures, in degrees Fahrenheit, of 10 American cities

Example 4.8 continued

Step 4:

Thus, the equation of the regression line for the temperature data is

$$ \hat{y} = 10.0533 + 0.9865x $$

Since y and x represent high and low temperatures, respectively, this equation is read as follows:

“The estimated high temperature for an American city is 10.0533 degrees Fahrenheit plus 0.9865 times the low temperature for that city.”

Using the Regression Equation to Make Predictions

For any particular value of x, the predicted value for y lies on the regression line.

Example 4.11

Suppose we are considering moving to a city that has a low temperature of 47 degrees Fahrenheit (°F) on this particular winter’s day. What would the estimated high temperature be for this city?

Solution

Plug the value of 47°F for the variable low into the regression equation from Example 4.8:

$$ \hat{y} = 10.0533 + 0.9865\,(\text{low}) = 10.0533 + 0.9865\,(47) = 56.4188 $$

We would say: “The estimated high temperature for an American city with a low of 47°F is 56.4188°F.”
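The prediction step is a single evaluation of the line. A minimal sketch reproducing the arithmetic of Example 4.11, using the coefficients from Example 4.8; the function name `predict` is hypothetical.

```python
# Regression coefficients from Example 4.8 (high temperature vs. low temperature)
B0 = 10.0533
B1 = 0.9865

def predict(x, b0=B0, b1=B1):
    """Return y-hat = b0 + b1 * x, the point on the regression line above x."""
    return b0 + b1 * x

print(round(predict(47), 4))  # 56.4188
```

The estimate is only as good as the linear fit, so predictions for low temperatures far outside the range of the original 10 cities should be treated with caution.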

Interpreting the Slope

Relationship Between Slope and Correlation Coefficient

The slope b1 of the regression line and the correlation coefficient r always have the same sign.

b1 is positive if and only if r is positive.

b1 is negative if and only if r is negative.
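One way to see why the signs always agree, assuming the standard least-squares slope formula b1 = r(s_y/s_x), which these slides do not show: standard deviations are positive whenever the data vary, so multiplying r by s_y/s_x cannot change its sign.

```latex
b_1 = r\,\frac{s_y}{s_x}, \qquad s_x > 0,\; s_y > 0
\quad\Longrightarrow\quad
\operatorname{sign}(b_1) = \operatorname{sign}(r)
```

For the temperature data, r = 0.9761 (Example 4.5) and b1 = 0.9865 (Example 4.8), both positive.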
