3.1a scatterplots and correlation explanatory and response...

Post on 20-Oct-2019

Page 1: 3.1A Scatterplots and Correlation Explanatory and Response ...jschweigert.weebly.com/uploads/2/9/0/3/29037999/chapter_3.1_notes.pdf · 3.1A Scatterplots and Correlation Explanatory

3.1A Scatterplots and Correlation

Explanatory and Response Variables

We think that smoking causes a decrease in life expectancy. Notice the two variables, smoking and life expectancy. There is a relationship between the two variables, but each plays a different role. Life expectancy is the response variable, while the number of cigarettes smoked is the explanatory variable.

The National Student Loan Survey provides data on the amount of debt for recent college graduates, their income, and

how stressed they feel about college debt. A sociologist looks at the data with the goal of using amount of debt and

income to explain the stress caused by college debt.

Which variable is the explanatory variable?

Which variable is the response variable?

Displaying Relationships: Scatterplots

The most useful graph for displaying the relationship between two ____________________ variables is a scatterplot.

Each individual in the data appears as a point in the graph.

[Figure: blank scatterplot axes, with the response variable on the vertical axis and the explanatory variable on the horizontal axis]

How to Make a Scatterplot:

1.

2.

3.

x-axis:_____________________________ y-axis:_____________________________

How to Interpret a Scatterplot:

As in any graph of data, look for the overall pattern and for striking departures from that pattern.

1. Direction

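a. Positive Association – low values of one variable go with low values of the other, and high with high; the points rise from lower left to upper right (positive slope)

b. Negative Association – low values of one variable go with high values of the other; the points fall from upper left to lower right (negative slope)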

2. Form – look for curves and clusters (is it a line or curve?)

3. Strength – determined by how closely the points follow a clear form (is it an obvious line or curve?)

4. Outliers

**If you are asked to make a scatterplot, be sure to label and scale both axes! Don’t just copy it down.**
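The plotting advice above can be sketched in code. The following is a minimal, standard-library-only illustration, not the method used in class: it places the explanatory variable on the x-axis and the response variable on the y-axis, and prints a label and scale for each axis. The function name `text_scatterplot` and the data values are hypothetical, invented for this example; a plotting library would normally be used instead of text output.

```python
def text_scatterplot(x, y, x_label, y_label, width=40, height=10):
    """Return a rough text scatterplot: explanatory x across, response y up."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Scale each value to a column/row position inside the grid.
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((yi - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    lines = [f"{y_label} (scale: {y_min} to {y_max})"]
    lines += ["|" + "".join(row) for row in grid]
    lines.append("+" + "-" * width)
    lines.append(f"{x_label} (scale: {x_min} to {x_max})")
    return "\n".join(lines)


# Hypothetical data, invented for illustration only.
body_weight = [100, 120, 140, 160, 180]  # explanatory variable, pounds
pack_weight = [22, 26, 28, 26, 28]       # response variable, pounds

plot = text_scatterplot(body_weight, pack_weight,
                        "Body weight (lb)", "Pack weight (lb)")
print(plot)
```

Reading the output from left to right, the points generally rise, which is what a positive association looks like.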

Example: Interpreting a Scatterplot

Interpret the scatterplot you just created for the backpacking data.

Scatterplot on the Calculator

Enter your explanatory data in L1 and your response data in L2.

Press 2nd, then Y= (STAT PLOT)

Press ENTER to select Plot 1

Press ENTER to turn Plot 1 ON

Scroll down and make sure the first graph type (scatterplot) is highlighted

Xlist: L1

Ylist: L2

Choose a mark

Press GRAPH

Press ZOOM, then 9 (ZoomStat)

3.1B Measuring Linear Association: Correlation

Correlation r

The correlation r measures:

Important Stuff About r:

The correlation r is always between ____ and ____

Positive Association: ______________

Negative Association: ______________

Values of r close to _____ indicate a very weak linear relationship.

Values of r close to _____ or _____ indicate a very strong linear relationship.
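These notes do not write out the formula for r. A common textbook definition, consistent with the later remark that r uses standardized values, is r = (1 / (n - 1)) × the sum of (z_x × z_y), where z_x and z_y are the standardized values of each observation. The sketch below assumes that definition; the function name `correlation` and the data are hypothetical, invented for illustration.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Correlation r as the averaged product of standardized values."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    z_x = [(xi - x_bar) / s_x for xi in x]
    z_y = [(yi - y_bar) / s_y for yi in y]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)


# Hypothetical data, invented for illustration (not the class data set).
body_weight = [100, 120, 140, 160, 180]  # explanatory variable, pounds
pack_weight = [22, 26, 28, 26, 28]       # response variable, pounds

r = correlation(body_weight, pack_weight)
print(round(r, 3))  # about 0.775
```

For these made-up values the result is positive and fairly close to 1, matching a scatterplot in which the points rise and follow a line reasonably well.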

A) Interpret the value of r in context.

B) What effect would removing the hiker with the heaviest body

weight from the data have on the correlation? Justify your answer.

A) Make a scatterplot on your calculator.

B) Describe the form of the relationship. Why is it not linear? Explain why the form of the relationship makes sense.

C) It does not make sense to describe the variables as either positively associated or negatively associated. Why?

D) Is the relationship reasonably strong or quite weak? Explain your answer.

Facts about Correlation

• Correlation makes no distinction between explanatory and response variables.

    o It makes no difference which variable you call x and which you call y in calculating the correlation.

• Because r uses the standardized values of the observations, r does not change when we change the units of measurement of x, y, or both.

    o Measuring height in centimeters rather than inches and weight in kilograms rather than pounds does not change the correlation between height and weight.

• The correlation r itself has no unit of measurement; it is just a number.

• Describing the relationship between two variables is more complex than describing the distribution of one variable. Here are some cautions to keep in mind when you use correlation:

    o Correlation requires that both variables be quantitative, so that it makes sense to do the arithmetic indicated by the formula for r.

        ▪ We cannot calculate a correlation between the incomes of a group of people and what city they live in, because city is a categorical variable.

    o Correlation measures the strength of only the linear relationship between two variables.

        ▪ Correlation does not describe curved relationships between variables, no matter how strong the relationship is.

        ▪ A correlation of 0 doesn’t guarantee that there’s no relationship between two variables, just that there’s no linear relationship.

    o Like the mean and standard deviation, the correlation is not resistant.

        ▪ r is strongly affected by a few outlying observations. Use r with caution when outliers appear in the scatterplot.

    o Correlation is not a complete summary of two-variable data, even when the relationship between the variables is linear.
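Several of these facts can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original notes: it computes r from standardized values (the usual textbook definition), and every data set in it is hypothetical and invented for the example.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Correlation r from standardized values (sample standard deviations)."""
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    products = [((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)


# All data below are hypothetical, invented for illustration.
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [115, 120, 140, 155, 160, 180]

# Fact 1: swapping x and y does not change r.
r_xy = correlation(height_in, weight_lb)
r_yx = correlation(weight_lb, height_in)

# Fact 2: changing the units of measurement does not change r.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.4536 for w in weight_lb]
r_metric = correlation(height_cm, weight_kg)

# Caution: r measures only linear association. These points lie exactly
# on the curve y = x^2, yet r comes out as 0.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [xi ** 2 for xi in x_curve]
r_curve = correlation(x_curve, y_curve)

# Caution: r is not resistant. One outlying point added to a perfect
# line changes r from 1 to a negative value.
x_line = [1, 2, 3, 4, 5, 6]
y_line = [2, 3, 4, 5, 6, 7]
r_line = correlation(x_line, y_line)
r_outlier = correlation(x_line + [20], y_line + [0])

print(r_xy, r_yx, r_metric)
print(r_curve, r_line, r_outlier)
```

The first three printed values agree (up to rounding error), the curved data give an r of essentially 0 despite a perfect relationship, and the single outlier is enough to reverse the sign of r.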

State, Plan, Do, Conclude