
Post on 05-Feb-2021


AP Statistics Name: ____________________________

Chapter 12 – Inference for Regression Date: ____________________ Per: _____

Section 12.1 Introduction: Inference for Linear Regression

Activity: Does Seat Location Matter?

Many people believe that students learn better when they sit closer to the front of the classroom. Does sitting closer cause higher achievement, or do better students simply choose to sit in the front? To investigate, we will randomly assign students to seat locations in the classroom for a particular chapter and record the test score for each student at the end of the chapter. The explanatory variable in this experiment is the row to which the student was assigned (Row 1 is closest and Row 7 is farthest away). Here are the results, including a scatterplot and least-squares regression line:

1. Interpret the slope of the least-squares regression line in this context.

2. Explain why it was important to randomly assign the students to seats rather than letting each student choose his or her own seat.

3. Does the negative slope provide convincing evidence that sitting closer causes higher achievement, or is it plausible that the association is due to the chance variation in the random assignment? To investigate, a simulation was performed. Here are the results of 100 trials of this simulation:

Based on the simulation results above, was the observed slope of -1.12 unusual, or is it likely to get a slope this small due to the chance variation in random assignment?

4. What conclusion should you make based on this study?
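The simulation in question 3 can be sketched in a few lines of Python. The idea: if seat location has no effect, each student's score would be the same in any row, so we shuffle the scores across the seats, recompute the slope, and repeat. The scores below are made-up placeholders (the class results are not reproduced in this handout), and the function names `slope` and `simulate_slopes` are illustrative only.

```python
import random

def slope(x, y):
    """Least-squares slope b = Sxy / Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(rows, scores, trials=100, seed=0):
    """Shuffle the scores across the seats and record the slope each time."""
    rng = random.Random(seed)
    shuffled = list(scores)
    slopes = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        slopes.append(slope(rows, shuffled))
    return slopes

# PLACEHOLDER data, NOT the class results: two students in each of rows 1-7.
rows = [r for r in range(1, 8) for _ in range(2)]
scores = [88, 92, 75, 81, 90, 68, 79, 85, 73, 94, 66, 80, 77, 83]

observed = slope(rows, scores)
slopes = simulate_slopes(rows, scores)

# Proportion of shuffles whose slope is at least as negative as the observed one
p_sim = sum(s <= observed for s in slopes) / len(slopes)
print(f"observed slope = {observed:.3f}")
print(f"proportion of simulated slopes <= observed: {p_sim:.2f}")
```

A large proportion means a slope like the observed one happens often by chance alone; a small proportion means the observed slope would be unusual if seat location had no effect.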

Introduction

· The population regression line (also called the true regression line) is calculated from all observations in a population.

· The sample regression line is calculated from a sample of observations from the population.

· How does the slope b of the sample regression line relate to the slope of the population regression line? You will notice in this section that the slopes of the sample regression lines vary quite a bit from the slope of the population regression line. The pattern of variation in the slope b is described by its sampling distribution.

Sampling Distribution of b

Confidence intervals and significance tests about the population regression line are based on the sampling distribution of b, the slope of the sample regression line. When certain conditions are met, the sampling distribution of b will be approximately Normal, with mean $\mu_b = \beta$ and standard deviation $\sigma_b = \dfrac{\sigma}{\sigma_x \sqrt{n}}$.

Here is a summary of the important facts about the sampling distribution of b:

What’s with the formula for $\sigma_b$?

There are 3 factors that affect the standard deviation of b:

· $\sigma$, the standard deviation of the residuals for the population regression line. When $\sigma$ is larger, so is $\sigma_b$.

· $\sigma_x$, the standard deviation of the explanatory variable. When $\sigma_x$ is larger, $\sigma_b$ is smaller.

· $n$, the sample size. A larger sample size n reduces variability, so $\sigma_b$ is smaller.
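To see the three factors at work, here is a small Python sketch. It assumes the standard deviation of b has the form σ_b = σ / (σ_x · √n). The numbers plugged in are arbitrary illustration values, not data from this chapter.

```python
import math

def sigma_b(sigma, sigma_x, n):
    """Standard deviation of the sampling distribution of the slope b."""
    return sigma / (sigma_x * math.sqrt(n))

# Arbitrary illustration values (hypothetical, not from the worksheet)
base = sigma_b(sigma=10.0, sigma_x=2.0, n=25)
more_scatter = sigma_b(sigma=20.0, sigma_x=2.0, n=25)   # larger sigma
wider_x = sigma_b(sigma=10.0, sigma_x=4.0, n=25)        # larger sigma_x
bigger_n = sigma_b(sigma=10.0, sigma_x=2.0, n=100)      # larger sample size

print("baseline:       ", base)
print("double sigma:   ", more_scatter)
print("double sigma_x: ", wider_x)
print("quadruple n:    ", bigger_n)
```

Doubling σ doubles σ_b, doubling σ_x cuts it in half, and n has to be quadrupled to cut it in half because n sits under a square root.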

Here are the conditions for performing inference about the linear regression model. Read pg. 743 – 745.

EXAMPLE 1: Checking Conditions

We used Fathom to carry out a least-squares regression analysis for the “Does Seat Location Matter?” Activity. A scatterplot, residual plot, histogram, and Normal probability plot of the residuals are shown:

Check whether the conditions for performing inference about the regression model are met.

Estimating the Parameters

When conditions are met, we can do inference about the regression model $\mu_y = \alpha + \beta x$. The first step is to estimate the unknown parameters. If we calculate the sample regression line $\hat{y} = a + bx$, the slope b is an unbiased estimator of the true slope $\beta$, and the y-intercept a is an unbiased estimator of the true y-intercept $\alpha$. The remaining parameter is the standard deviation $\sigma$, which is estimated by the standard deviation of the residuals, $s = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$. Remember that s describes the size of a “typical” prediction error.

EXAMPLE 2: Estimating Parameters

In a previous alternate example, we looked at the results of an experiment designed to see if sitting closer to the front of a classroom causes higher achievement. We checked the conditions for inference and there were no major violations. Here is a scatterplot of the data and some output from a regression analysis.

Regression Analysis: Score versus Row

Predictor Coef SE Coef T P

Constant 85.706 4.239 20.22 0.000

Row -1.1171 0.9472 -1.18 0.248

S = 10.0673 R-Sq = 4.7% R-Sq(adj) = 1.3%

(a) Identify the standard error of the slope $SE_b$ from the computer output. Interpret this value in context.

(b) Calculate the 95% confidence interval for the true slope. Show your work.

(c) Interpret the interval from part (b) in context.

(d) Based on your interval, is there convincing evidence that seat location affects scores?
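For parts (b) and (d), the interval has the form b ± t* · SE_b with df = n − 2. The sketch below plugs in the values from the computer output. The class size is not stated in the text above, so the critical value t* = 2.048 is an assumption (it corresponds to df = 28, that is, 30 students); replace it with the table value for the actual degrees of freedom. The function name `slope_interval` is illustrative.

```python
def slope_interval(b, se_b, t_star):
    """Confidence interval for the true slope: b +/- t* x SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

b = -1.1171      # "Coef" for Row in the output
se_b = 0.9472    # "SE Coef" for Row in the output
t_star = 2.048   # ASSUMPTION: 95% critical value for df = 28 (n = 30 students)

low, high = slope_interval(b, se_b, t_star)
print(f"95% CI for the slope: ({low:.3f}, {high:.3f})")
print("Is 0 a plausible value for the slope?", low < 0 < high)
```

If 0 falls inside the interval, a true slope of 0 (no linear relationship between row and score) is plausible, which is the key to answering part (d).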

Section 12.1 Part 2: Confidence Intervals and Significance Tests for Slope

EXAMPLE 1 (Confidence Interval for $\beta$): More Mentos, More Mess?

When Mentos are dropped into a newly opened bottle of Diet Coke, carbon dioxide is released from the Diet Coke very rapidly, causing the Diet Coke to be expelled from the bottle. Will more Diet Coke be expelled when a larger number of Mentos is dropped in the bottle? How much more? Two statistics students, Brittany and Allie, decided to find out. Using 16-ounce (2-cup) bottles of Diet Coke, they dropped either 2, 3, 4, or 5 Mentos into a randomly selected bottle, waited for the fizzing to die down, and measured the number of cups remaining in the bottle. Then, they subtracted this measurement from the original amount in the bottle to calculate the amount of Diet Coke expelled (in cups). Here are the data:

Mentos   Amount Expelled (cups)   Mentos   Amount Expelled (cups)
2        1.1250                   4        1.2500
2        1.2500                   4        1.3125
2        1.0625                   4        1.2500
2        1.2500                   4        1.3750
2        1.1250                   4        1.3125
2        1.0625                   4        1.2500
3        1.1875                   5        1.2500
3        1.1250                   5        1.4375
3        1.2500                   5        1.3125
3        1.1875                   5        1.3125
3        1.3125                   5        1.3750
3        1.1875                   5        1.4375

Output from a regression analysis is shown below:

Predictor Coef SE Coef T P

Constant 1.00208 0.04511 22.21 0.000

Mentos 0.07083 0.01228 5.77 0.000

S = 0.0672442 R-Sq = 60.2% R-Sq(adj)
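The numbers in this output can be reproduced from the 24 data values. The Python sketch below uses the usual least-squares formulas (b = Sxy / Sxx, a = ȳ − b·x̄, s = √(SSE / (n − 2)), SE_b = s / √Sxx); the function name `fit_line` is illustrative, and this is only one way to organize the arithmetic.

```python
import math

# Data from the table: six bottles each with 2, 3, 4, and 5 Mentos
mentos = [2] * 6 + [3] * 6 + [4] * 6 + [5] * 6
expelled = [
    1.1250, 1.2500, 1.0625, 1.2500, 1.1250, 1.0625,  # 2 Mentos
    1.1875, 1.1250, 1.2500, 1.1875, 1.3125, 1.1875,  # 3 Mentos
    1.2500, 1.3125, 1.2500, 1.3750, 1.3125, 1.2500,  # 4 Mentos
    1.2500, 1.4375, 1.3125, 1.3125, 1.3750, 1.4375,  # 5 Mentos
]

def fit_line(x, y):
    """Return intercept a, slope b, residual SD s, SE of slope, and r-squared."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # n - 2 degrees of freedom
    se_b = s / math.sqrt(sxx)
    r_sq = 1 - sse / syy
    return a, b, s, se_b, r_sq

a, b, s, se_b, r_sq = fit_line(mentos, expelled)
print(f"Constant = {a:.5f}, Mentos = {b:.5f}")
print(f"S = {s:.5f}, SE Coef (slope) = {se_b:.5f}, T = {b / se_b:.2f}")
print(f"R-Sq = {100 * r_sq:.1f}%")
```

The printed values agree with the Coef, SE Coef, T, S, and R-Sq entries in the output above, up to rounding.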

Problem: Construct and interpret a 99% confidence interval for the slope of the true regression line.

State:

Plan:

Do:

Conclude:

EXAMPLE 2 (Significance Test for $\beta$): Tipping at a Buffet

Do customers who stay longer at buffets give larger tips? Charlotte, an AP® Statistics student who worked at an Asian buffet, decided to investigate this question for her second-semester project. While she was doing her job as a hostess, she obtained a random sample of receipts, which included the length of time (in minutes) the party was in the restaurant and the amount of the tip (in dollars). Here are the data, along with some output from a least-squares regression analysis.

Time (minutes)   Tip (dollars)
23               5.00
39               2.75
44               7.75
55               5.00
61               7.00
65               8.88
67               9.01
70               5.00
74               7.29
85               7.50
90               6.00
99               6.50

Predictor Coef SE Coef T P

Constant 4.535 1.657 2.74 0.021

Time (minutes) 0.03013 0.02448 1.23 0.247

S = 1.77931 R-Sq = 13.2% R-Sq(adj) = 4.5%
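The T and P entries for the slope can be checked from the output alone: t = b / SE_b with df = n − 2, where n = 12 receipts. Python's standard library has no t distribution, so the sketch below approximates the tail area by numerically integrating the t density with Simpson's rule. That is a stand-in for a t table or calculator, not how the software computed the output, and the function names are illustrative.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

b = 0.03013      # "Coef" for Time in the output
se_b = 0.02448   # "SE Coef" for Time in the output
n = 12           # number of receipts in the data table

t = b / se_b
df = n - 2
p_one_sided = upper_tail(t, df)

print(f"t = {t:.2f} with df = {df}")
print(f"two-sided P = {2 * p_one_sided:.3f}")
print(f"one-sided P = {p_one_sided:.3f}")
```

The P-value printed in the computer output is two-sided. A test for a positive slope, as in part (d), uses the one-sided value, which is half of it.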

Problem:

(a) What is the equation of the least-squares regression line for predicting tip from the amount of time spent at the buffet? Define any variables you use.

(b) Interpret the slope of the least-squares regression line.

(c) Explain what the value of $r^2$ means in this setting.

(d) Do these data provide convincing evidence of a positive linear relationship between the amount of time and amount of tip for customers at this Asian buffet?

State:

Plan:

Do:

Conclude:
