

Upload: cormac

Post on 05-May-2017


Activities—Notes for the Instructor

Activity 1.1 Head Sizes—Understanding Variability

In this activity, students should be able to see the difference between variability due to “measurement error” and person-to-person variability. In this activity, person-to-person variability will likely be larger than variability introduced by having different people taking measurements.

In Step 9, students should speculate that the scheme proposed would result in more variability since the measurements will reflect both person-to-person variability and variability introduced by having different people doing the measuring.

Activity 1.2 Estimating Sizes

Actual sizes for the ten shapes are

Shape        1   2   3   4   5   6   7   8   9   10
Actual size  44  20  16  40  41  47  66  14  48  33

Don’t give the students very much time to estimate the sizes, and be sure to remind them not to draw on the figure. This activity introduces the idea of deviations (here estimated – actual) and leads students to consider the sum of squared deviations as a measure of overall error.
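A minimal Python sketch of the error measure, using the actual sizes listed above. The `estimates` list is hypothetical and stands in for one student's guesses.

```python
# Actual sizes of the ten shapes (from the table above).
ACTUAL = [44, 20, 16, 40, 41, 47, 66, 14, 48, 33]

def deviations(estimates, actual=ACTUAL):
    """Deviations estimated - actual, one per shape."""
    return [e - a for e, a in zip(estimates, actual)]

def sum_squared_deviations(estimates, actual=ACTUAL):
    """Overall error: squaring keeps over- and under-estimates from cancelling."""
    return sum(d * d for d in deviations(estimates, actual))

# Hypothetical estimates from one student (illustration only).
estimates = [40, 25, 16, 35, 45, 50, 60, 12, 50, 30]
print(deviations(estimates))
print(sum_squared_deviations(estimates))
```

For these made-up estimates the raw deviations sum to only -6, while the sum of squared deviations is 144, which is why squaring gives a more honest picture of overall error.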

Activity 1.4 Egg Variability

This activity asks students to consider the issue of measurement error and to think about variability.

Activity 1.5 Big Feet, Little Feet

This activity examines variability and asks students to informally compare two groups on the basis of variability and center.


Activity 2.1 Designing a Sampling Plan

It is often very difficult or impractical to select a simple random sample from a population. In particular, selecting a simple random sample of students from a large school could prove difficult because it is unlikely that a student could get access to a reasonable sampling frame. This activity asks students to consider how they might select a sample that, while not necessarily a simple random sample, might still be considered representative of the students at the school. Encourage students to consider the need to vary location, day of the week, time of day, etc., and to think about how they will decide which students in a proposed location will actually be asked to participate.

Activity 2.2 An Experiment to Test for the Stroop Effect

Students generally find this to be an interesting experiment. Two reasonable designs are:

1. Assign volunteer subjects at random to one of the two experimental conditions (text or colored rectangles).
2. Have all subjects process both the text and the rectangle lists. For this design, it is important that the order of the two experimental conditions be determined at random for each subject.

Have the class talk about the difference between these two designs and note that although it takes different forms in the two designs, randomization is critical in both designs.

This activity also provides a good forum for talking about extraneous variables and how the proposed design(s) address them.
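The two forms of randomization can be sketched in Python. This is a minimal illustration, assuming equal group sizes for the first design; the subject labels are hypothetical.

```python
import random

CONDITIONS = ["text", "rectangles"]

def completely_randomized(subjects, rng):
    """Design 1: each subject gets ONE condition; shuffle, then split into two equal halves."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {s: ("text" if i < half else "rectangles") for i, s in enumerate(shuffled)}

def randomized_order(subjects, rng):
    """Design 2: every subject does BOTH lists; the order is chosen at random per subject."""
    return {s: rng.sample(CONDITIONS, k=2) for s in subjects}

rng = random.Random(2017)  # fixed seed so the assignment can be reproduced
subjects = [f"S{i}" for i in range(1, 11)]  # hypothetical subject labels
print(completely_randomized(subjects, rng))
print(randomized_order(subjects, rng))
```

In the first design chance decides who sees which list; in the second it decides which list each subject sees first.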

Activity 2.5 Cluster Sampling

This activity is fairly straightforward and introduces students to cluster sampling. The last step explores the idea of sampling variability, an important concept in Chapter 8. It is worth spending some time with the class discussing the responses to this step.
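In cluster sampling, whole groups (clusters) are chosen at random and every individual in a chosen cluster is included. The sketch below assumes a hypothetical population organized into eight classrooms; repeating the draw gives a quick look at sampling variability.

```python
import random
from statistics import mean

def cluster_sample(clusters, k, rng):
    """Pick k clusters at random and return every individual value in them."""
    chosen = rng.sample(list(clusters), k)
    return [value for name in chosen for value in clusters[name]]

# Hypothetical population: 8 classrooms of 25 students, one measurement each.
rng = random.Random(8)
clusters = {f"room{i}": [rng.randint(1, 20) for _ in range(25)] for i in range(8)}

# Repeat the sampling to see how the sample mean changes from sample to sample.
sample_means = [mean(cluster_sample(clusters, 2, rng)) for _ in range(5)]
print(sample_means)
```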

Activity 2.6 Speed Sorting

In this activity, students design and carry out an experiment and then informally look for differences among the three treatments/experimental conditions. It also asks students to articulate why random assignment of subjects to treatments is important.


Activity 3.1 Locating States

This activity bridges Chapters 2 and 3, asking students to develop a sampling plan and then implement it to collect data. The graphical displays of Chapter 3 are then used to summarize the resulting data.

Depending on where you are located, you may need to bring in a US map to help decide whether you are closer to Nebraska or Vermont in Step 8!

If the class comes up with a reasonable sampling plan, they should feel comfortable generalizing the results to the population of students at your school in Step 10. This would be an appropriate place to talk about why it would not be a good idea to generalize beyond your school—to the district, county, state, etc., or to other age groups.

Activity 3.2 Bean Counters!

This activity has students collect data that is paired in nature. Graphical displays are used to summarize the resulting data and students are led to consider looking at differences. Hopefully, students see that looking at the distribution of differences provides information that is not apparent in the two individual distributions. This is an important idea in Chapter 11, where the distinction between independent and paired samples determines the appropriate method of analysis.

You can come back to this data in Chapter 11 if you need a good data set to illustrate the paired t test.
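When you return to this data for the paired t test, the calculation works on the differences alone. A minimal sketch with two hypothetical counts per student; it stops at the test statistic, since a P-value needs a t table or software.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t statistic for the mean of the differences (first - second), with df = n - 1."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Hypothetical paired counts for 6 students (illustration only).
first = [24, 30, 18, 27, 22, 25]
second = [20, 26, 17, 22, 21, 20]
t, df = paired_t(first, second)
print(round(t, 3), df)
```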


Activity 3.3 M&M Marginal Plots

A marginal plot is a scatter plot with added univariate plots for the two variables used to construct the scatter plot. MINITAB can construct marginal plots. Two common choices for the univariate display are a histogram and a dot plot.

You will need a fairly sensitive scale for this activity. Biology, Chemistry and Physics faculty are likely to have a good scale that you might be able to borrow.

In Step 8, have students think about which type of marginal plot would be best if the sample size is very large (histogram marginal plot).


Activity 3.4 Stretchability of Rubber Bands

This activity asks students to collect data, summarize it using graphical displays, and to think critically about graphical displays. Before attempting this activity, make sure that you have the appropriate supplies—THIN rubber bands and BIG, HEAVY nuts for weights. If the nuts are not heavy enough, the stretch will not be very noticeable and it will be hard to measure. Try this one out before attempting it in class!

Activity 4.1 Collecting and Summarizing Numerical Data

This activity has students implement the sampling plan developed in Activity 2.1. If you didn't do Activity 2.1, you will need to integrate the sampling design into this activity.

Activity 4.3 Boxplot Shapes

Solutions:

Boxplot  Five-Number Summary  Reasoning
A        III                  This is the only five-number summary consistent with an outlier on the high side.
B        I                    Median closer to the upper quartile, short lower whisker.
C        II                   Median closer to the lower quartile, but nearer the center of the box than IV.
D        IV                   Median closer to the lower quartile, and upper whisker longer than lower whisker.
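To connect five-number summaries with the outlier reasoning for Boxplot A, here is a minimal Python sketch. It assumes one common textbook convention (quartiles are medians of the lower and upper halves, with the overall median left out of both halves when n is odd) and the usual 1.5 × IQR outlier rule; other software may compute quartiles slightly differently.

```python
from statistics import median

def five_number_summary(data):
    """Min, lower quartile, median, upper quartile, max (median-of-halves quartiles)."""
    xs = sorted(data)
    n = len(xs)
    lower, upper = xs[: n // 2], xs[(n + 1) // 2 :]  # drop the middle value when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outliers(data):
    """Values more than 1.5 * IQR below the lower quartile or above the upper quartile."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
print(outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```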

Activity 4.4 Understanding Variability and Numerical Measures of Variability

A favorite activity—gets students thinking about variability visually and then numerically.


Activity 4.5 Comparing Brands of Chocolate Chip Cookies

This activity has students compare two different brands of chocolate chip cookies and then use post-it notes to create a graphical display that is equivalent to a back-to-back stem-and-leaf plot. In Step 7, students should describe how the two distributions are similar or different with respect to center, spread and shape.

In Step 11, students may want to consider other factors (price, nutritional information, appearance, etc.) in addition to the number of chocolate chips in making their recommendations.

Activity 5.1 Exploring Correlation and Regression

This activity has students use two applets (on the text CD) to explore correlation (Steps 1 and 2) and how the sum of squared errors is used in fitting the least squares line (Step 3).
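The idea behind Step 3 can be sketched in a few lines of Python (this is not the applet itself). The data are hypothetical; the sketch computes the sum of squared errors for any candidate line and shows that the least squares line has a smaller SSE than an arbitrary competitor.

```python
def sse(xs, ys, a, b):
    """Sum of squared errors for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Intercept and slope that minimize the sum of squared errors."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(a, b, sse(xs, ys, a, b))
print(sse(xs, ys, 1.0, 1.0))  # a competing line, y-hat = 1 + x
```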

Activity 5.2 Age and Flexibility

In this activity, students will collect data on age and a measure of flexibility and then summarize the data using a scatter plot and a least squares line. It is important to have a wide range of ages represented in the data set.

You may want to spend a bit of time talking about the proposed measure of flexibility—sometimes students worry at first that it will depend on height, but usually they are convinced that the measure is reasonable after discussion. If students are concerned, have them also record height, and they can then use the data to see if there does appear to be a relationship between height and the measure of flexibility.

Before Step 3, it may be useful to have the class discuss which of the two variables (age and flexibility) is the response variable.

Activity 5.3 Paper Towels—A Nonlinear Relationship

Prior to carrying out this activity, ask students to bring in a partially used roll of paper towels. You will want to only use rolls that have standard size paper towels (some rolls now on the market have much smaller towel sheets). If you have a large class, you can just have a subset of the class bring in rolls—10 to 20 in total would be enough. If you have storage space, you can save the towels once they have been taken off the rolls and counted. Some of the later activities use paper towel sheets.

The scatterplot in Step 4 should exhibit a nonlinear pattern. One practical situation where a model like this might be useful (see Step 7) is estimating the length of fabric remaining on a bolt.


Activity 5.4 Exponential Decay

This activity has students generate bivariate data that illustrates a nonlinear relationship. Students then recommend a transformation of the data. You can extend this activity by asking students to actually fit a nonlinear model to the data.
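One transformation students may recommend for exponential decay is taking ln(y). If y = a·e^(bx), then ln(y) = ln(a) + bx, which is linear in x. A minimal sketch, assuming hypothetical data with all y values positive:

```python
from math import exp, log

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on (x, ln y); every y must be positive."""
    logs = [log(y) for y in ys]
    n = len(xs)
    x_bar, l_bar = sum(xs) / n, sum(logs) / n
    b = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs)) / sum((x - x_bar) ** 2 for x in xs)
    a = exp(l_bar - b * x_bar)  # back-transform the intercept
    return a, b

xs = [0, 1, 2, 3, 4, 5]        # hypothetical trial numbers
ys = [100, 52, 27, 13, 7, 3]   # hypothetical counts remaining
a, b = fit_exponential(xs, ys)
print(f"y-hat = {a:.1f} * exp({b:.3f} x)")
```

Note that this minimizes squared errors on the log scale, not the original scale; fitting the nonlinear model directly (the extension mentioned above) can give slightly different estimates.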

Activity 6.1 Kisses

This is a simple (and sweet) activity that uses simulation to estimate a probability.

Activity 6.2 A Crisis for European Sports Fans??

This activity uses simulation to approximate a probability distribution. The distribution obtained is actually the sampling distribution of a sample proportion, and Step 4 introduces the idea of using the information provided by the sampling distribution to reach a conclusion.
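A class simulation like this can be mimicked in Python. The sketch below uses hypothetical values (n = 50 trials per student, p = 0.5, an observed proportion of 0.62); substitute the values from the activity.

```python
import random

def simulate_proportions(n, p, reps, rng):
    """Approximate the sampling distribution of a sample proportion by simulation."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(42)
props = simulate_proportions(n=50, p=0.5, reps=1000, rng=rng)  # hypothetical n and p
observed = 0.62                                               # hypothetical observed proportion
share = sum(phat >= observed for phat in props) / len(props)
print(f"Proportion of simulated samples with p-hat >= {observed}: {share:.3f}")
```

The final line is the same reasoning Step 4 asks for: how often does chance alone produce a result at least as extreme as the one observed?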

If you have a small class, you may need to have each student conduct more than one trial in order to have a sufficient number of trials.

You may want to revisit this activity in Chapter 8, when sampling distributions are formally introduced.

Activity 6.3 The "Hot Hand" in Basketball

In this activity, simulation is used to approximate the distribution of the longest run in a sequence of trials. Students are often surprised to find that long runs are not as uncommon as they expected. You might want to ask students to predict the length of the longest run of heads that would be observed before actually performing the simulation.
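A minimal Python version of the longest-run simulation, assuming a hypothetical 50% shooter taking 20 shots (equivalent to 20 fair coin flips):

```python
import random

def longest_run(outcomes, value=True):
    """Length of the longest streak of consecutive entries equal to value."""
    best = current = 0
    for o in outcomes:
        current = current + 1 if o == value else 0
        best = max(best, current)
    return best

def simulate_longest_runs(n_trials, p, reps, rng):
    """Longest run of successes in each of reps simulated sequences of n_trials."""
    return [longest_run([rng.random() < p for _ in range(n_trials)]) for _ in range(reps)]

rng = random.Random(7)
runs = simulate_longest_runs(n_trials=20, p=0.5, reps=1000, rng=rng)  # hypothetical settings
print(sum(r >= 4 for r in runs) / len(runs))  # share of sequences with a streak of 4 or more
```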

If you have a small class, you may need to have each student conduct more than one trial in order to have a sufficient number of trials.

There is no right or wrong answer for the question posed in Step 7, but it can be an interesting discussion!

Activity 6.4 The Monty Hall Problem

This is the now famous (but sometimes unintuitive) Monty Hall problem. The activity is designed to help students understand the reasoning that leads to the correct answer/strategy.
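If students remain unconvinced after the activity, a quick computer simulation can back up the reasoning. This sketch assumes the standard rules: the host always opens a losing door that the contestant did not pick.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(3)
reps = 10_000
stay = sum(play(False, rng) for _ in range(reps)) / reps
swap = sum(play(True, rng) for _ in range(reps)) / reps
print(f"stay: {stay:.3f}  switch: {swap:.3f}")  # close to 1/3 and 2/3 in the long run
```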


Activity 6.5 Efron's Dice: An Unintuitive Example

An unintuitive but interesting example. You may need to remind students what "transitive" means!
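The exact probabilities can be checked by listing all 36 face pairs for each matchup. This sketch assumes the standard set of Efron's dice; if the activity uses different faces, edit `DICE`.

```python
from fractions import Fraction
from itertools import product

# Standard Efron's dice (assumed to match the activity).
DICE = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}

def p_beats(first, second):
    """Exact probability that a roll of die `first` is higher than a roll of die `second`."""
    wins = sum(a > b for a, b in product(DICE[first], DICE[second]))
    return Fraction(wins, 36)

for x, y in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    print(f"P({x} beats {y}) = {p_beats(x, y)}")
```

Each die beats the next with probability 2/3, yet D beats A, so "beats" is not transitive.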

Activity 7.1 Rotten Eggs??

You have to chuckle at the newspaper clip that is the basis for this activity!

In Step 2, the probability calculations shown depend on the assumption of independence. This may not be reasonable if eggs in the same carton come from the same farm or the same chicken, were processed in the same batch, etc.

Step 3 has students carry out a simulation of the strategy proposed by the restaurant manager. If you have a small class, you may need to have each pair of students perform more than 10 trials to obtain a sufficient number of trials.

Activity 7.2 The Sound of the Normal Distribution

Another one of my favorite activities! You really can "hear" the normal distribution.

The variable of interest is x = time for a kernel to pop, and the distribution is formed by the x values for the collection of kernels in the bag.

Activity 7.3 Pass the Message

This activity is a variation on the old children's pass the message game, where someone whispers a message to a second person, who then passes it to a third and so on. In that game, the message that emerges at the end can be quite different from the original message.

The simulation is straightforward. The key thing to recognize is that the end result will be correct any time there is an even number of transmission errors.
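A minimal sketch of the simulation, assuming a yes/no message where each transmission error flips the message (so two errors cancel). The chain length and error rate are hypothetical. Under independence there is also a closed form: P(even number of errors) = (1 + (1 − 2p)^n) / 2.

```python
import random

def message_correct(n_transmissions, p_error, rng):
    """Simulate one chain; each transmission flips a yes/no message with probability p_error."""
    message = True  # the original (correct) message
    for _ in range(n_transmissions):
        if rng.random() < p_error:
            message = not message
    return message

def exact_p_correct(n, p):
    """P(even number of errors) for n independent transmissions."""
    return (1 + (1 - 2 * p) ** n) / 2

rng = random.Random(11)
n, p, reps = 5, 0.1, 10_000  # hypothetical chain length and error rate
sim = sum(message_correct(n, p, rng) for _ in range(reps)) / reps
print(f"simulated {sim:.3f}, exact {exact_p_correct(n, p):.3f}")
```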


Activity 8.1 Do Students Who Take the SAT Multiple Times Have an Advantage in College Admissions?

This activity lets students simulate sampling distributions and then use them to make a policy recommendation regarding whether admissions at selective universities should consider the highest, mean or most recent score for a student who takes the exam multiple times.

Sample results from Parts 1 and 2 are shown below (of course results will vary somewhat from student to student).

[Figures: simulated distributions of Max2, Mean2, Recent2, Max5, Mean5, and Recent5]

Things to note: There is a clear advantage to students taking the test multiple times when the highest score is used, and the advantage is greater when the test is taken 5 times than when it is taken 2 times. This is not surprising, but the graphs allow students to get a sense of the magnitude of the advantage. The distribution of the most recent score looks similar no matter how many times the test is taken and remains centered at about 1200, the student's true ability.

The distributions of mean2 and mean5 are centered at about 1200 but are less spread out than the distributions of recent2 and recent5. Students should be able to relate what they see in the mean histograms to what they know about the sampling distribution of the sample mean.
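If you want to reproduce these patterns without MINITAB, here is a minimal Python sketch. The true ability of 1200 comes from the notes; the normal score model and its standard deviation of 50 are assumptions, and real scores are rounded, which this sketch ignores.

```python
import random
from statistics import mean, stdev

TRUE_ABILITY = 1200  # from the notes
SCORE_SD = 50        # hypothetical test-to-test standard deviation

def simulate(k, reps, rng):
    """For reps students taking the test k times, collect max, mean, and most recent scores."""
    results = {"max": [], "mean": [], "recent": []}
    for _ in range(reps):
        scores = [rng.gauss(TRUE_ABILITY, SCORE_SD) for _ in range(k)]
        results["max"].append(max(scores))
        results["mean"].append(mean(scores))
        results["recent"].append(scores[-1])
    return results

rng = random.Random(1)
for k in (2, 5):
    r = simulate(k, 2000, rng)
    for name, values in r.items():
        print(f"{name}{k}: center {mean(values):7.1f}  spread {stdev(values):5.1f}")
```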

If your students don't have access to MINITAB, this activity can easily be adapted to a demonstration, where you show the computer output (or provide it in handouts).


Activity 8.2 Defective M&Ms

This activity has students use what they know about the sampling distribution of a sample proportion to decide if the number of defective M&Ms in a sample of 100 is consistent with a claim of 10% defective.
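The key calculation can be sketched in Python. Under the claim p = 0.10 with n = 100, the sample proportion has mean 0.10 and standard deviation √(0.10 × 0.90 / 100) = 0.03; np = 10 and n(1 − p) = 90, so a normal approximation is commonly considered reasonable. The observed count of 16 is hypothetical.

```python
from math import sqrt

def z_for_sample_proportion(count, n, p_claimed):
    """Standardize an observed sample proportion using its sampling distribution under the claim."""
    p_hat = count / n
    sd = sqrt(p_claimed * (1 - p_claimed) / n)  # standard deviation of p-hat
    return p_hat, sd, (p_hat - p_claimed) / sd

p_hat, sd, z = z_for_sample_proportion(count=16, n=100, p_claimed=0.10)  # 16 is hypothetical
print(p_hat, round(sd, 3), round(z, 2))
```

A sample proportion about 2 standard deviations above the claimed value would be fairly unusual if the claim were true.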

This activity can be revisited when P-values are introduced in Chapter 10.

Activity 9.1 Getting a Feel for Confidence Level

Part 1 of this activity has students use an applet to explore the meaning of confidence level. The main point is that the confidence level describes the long-run success rate of the method: it is the proportion of intervals constructed in repeated sampling that include the true value of the population characteristic being estimated. Point out that the value of the population characteristic is fixed and does not change. What changes from sample to sample is the interval itself.

Part 2 of this activity looks at why the t distribution is used to construct confidence intervals when the population standard deviation is not known. For small sample sizes, if the z interval is used when the population standard deviation is unknown, the long-run proportion of the time that the resulting interval includes the population mean is too small, that is, less than the stated confidence level.

As the sample size increases, the proportion containing the population value gets much closer to the desired confidence level. This is why some texts say it is OK to use a z interval even if the population standard deviation is unknown, as long as the sample size is large.
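The Part 2 comparison can also be simulated outside the applet. This sketch assumes a hypothetical normal population (mean 50, standard deviation 10) and n = 5; the t critical value 2.776 (95%, df = 4) is taken from a t table and must be changed for other sample sizes.

```python
import random
from math import sqrt
from statistics import mean, stdev

Z_STAR = 1.96  # 95% standard normal critical value

def capture_rates(n, t_star, reps, rng, mu=50.0, sigma=10.0):
    """Share of intervals x-bar +/- crit * s/sqrt(n) that contain mu, using z* and t* critical values."""
    hits_z = hits_t = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar, se = mean(sample), stdev(sample) / sqrt(n)
        hits_z += abs(x_bar - mu) <= Z_STAR * se
        hits_t += abs(x_bar - mu) <= t_star * se
    return hits_z / reps, hits_t / reps

rng = random.Random(5)
z_rate, t_rate = capture_rates(n=5, t_star=2.776, reps=10_000, rng=rng)  # t* for df = 4
print(f"z interval with s: {z_rate:.3f}   t interval: {t_rate:.3f}")
```

With n = 5 the z interval's capture rate falls noticeably short of 95%, while the t interval stays close to it.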

Activity 9.2 An Alternative Confidence Interval for a Population Proportion

Section 9.2 mentions that the "usual" large-sample confidence interval for a population proportion is often not very good, in the sense that the actual long-run proportion of intervals that include the value of the population proportion may differ substantially from the stated confidence level. This activity has students compare the "capture rate" for the usual large-sample confidence interval with that of the alternative interval proposed in the text.
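Capture rates can also be computed exactly by summing binomial probabilities over every possible sample count. These notes do not reproduce the text's alternative formula, so this sketch uses the Wilson score interval as a stand-in (an assumption; substitute the interval from Section 9.2). It requires Python 3.8+ for `math.comb`.

```python
from math import comb, sqrt

Z = 1.96  # 95% confidence

def wald(x, n, z=Z):
    """Usual large-sample interval: p-hat +/- z * sqrt(p-hat(1 - p-hat)/n)."""
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson(x, n, z=Z):
    """Wilson score interval, a stand-in for the text's alternative interval."""
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def capture_rate(interval, n, p_true):
    """Exact long-run proportion of intervals containing p_true."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo <= p_true <= hi:
            total += comb(n, x) * p_true**x * (1 - p_true) ** (n - x)
    return total

for p_true in (0.1, 0.3, 0.5):
    print(p_true, round(capture_rate(wald, 20, p_true), 3), round(capture_rate(wilson, 20, p_true), 3))
```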


Activity 9.5 Water Stains

This activity has students collect data and construct confidence intervals. Data is collected on the width of the water stain (at the widest part of the stain) when ¼ tsp of water is spilled onto a paper towel and when ½ tsp. of water is spilled onto a paper towel. (Here is a place you can use those leftover paper towels from Activity 5.3.)

Many students are surprised that the mean width of the stain for ½ tsp of water does not appear to be twice the mean width for ¼ tsp of water. What is happening is that the AREA of the stain is about twice as large, but this does not mean that the width of the stain doubles. (Think of a rectangle: if both the width and length of the rectangle double, the area increases by a factor of 4. Working backward, if the area only doubles and the shape stays the same, each linear dimension grows by a factor of about √2 ≈ 1.41.)

Activity 10.1 Comparing the t and z Distributions

This activity uses simulation to show why the statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

is not well described by a standard normal distribution when n is small and the population standard deviation is unknown, even if the population distribution is normal. Students generate 200 samples of size 5 from a normal distribution and then compute both the z statistic (using the known population standard deviation) and the t statistic for each sample. Students see that there is more variability in the t values than in the z values and that, while the histogram of the z values looks like the standard normal curve, the histogram of the t values is more spread out.

Typical graphs (results will vary from student to student) are shown below.

[Figures: histograms of the z values and of the t values]
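A minimal Python version of the simulation, assuming a hypothetical normal population with mean 100 and standard deviation 15. Each sample yields a z statistic (known σ) and a t statistic (σ estimated by s).

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate(reps, rng, mu=100.0, sigma=15.0, n=5):
    """Return lists of z and t statistics from reps samples of size n from a normal population."""
    z_values, t_values = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        z_values.append((x_bar - mu) / (sigma / sqrt(n)))          # uses the known sigma
        t_values.append((x_bar - mu) / (stdev(sample) / sqrt(n)))  # estimates sigma with s
    return z_values, t_values

z_values, t_values = simulate(200, random.Random(10))  # 200 samples of size 5, as in the activity
print(f"sd of z values: {stdev(z_values):.2f}")
print(f"sd of t values: {stdev(t_values):.2f}")
```

With only 200 samples the spreads vary from run to run, but the t values are typically more spread out; the t distribution with 4 df has variance 2.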

Activity 10.3 Lapsed Time

This activity has students collect data and use it to perform a one-sample hypothesis test.

If you have a small class size, you may need to recruit some additional subjects to ensure an adequate sample size. In my experience, the distribution of times appears skewed rather than normal, so it is best to have a sample size of 30 or more.


Activity 11.1 Helium Filled Footballs??

This activity uses data from the Data and Story Library (DASL). Before looking at the data, have your students answer the question posed in Step 1. This can lead to an interesting discussion!

If you prefer to provide the data rather than have your students go to the Internet to retrieve it, the data follows. If you haven't seen the DASL web site, you might want to spend a bit of time exploring it. It is a nice source of data and examples.

Trial  Air  Helium
1      25   25
2      23   16
3      18   25
4      16   14
5      35   23
6      15   29
7      26   25
8      24   26
9      24   22
10     28   26
11     25   12
12     19   28
13     27   28
14     25   31
15     34   22
16     26   29
17     20   23
18     22   26
19     33   35
20     29   24
21     31   31
22     27   34
23     22   39
24     29   32
25     28   14
26     29   28
27     22   30
28     31   27
29     25   33
30     20   11
31     27   26
32     26   32
33     28   30
34     32   29
35     28   30
36     25   29
37     31   29
38     28   30
39     28   26
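If the class treats each trial as a pair (one air-filled and one helium-filled kick; an assumption about how the analysis is framed), the paired t statistic can be computed from the table above with the standard library:

```python
from math import sqrt
from statistics import mean, stdev

# Distances from the table above, in trial order.
AIR = [25, 23, 18, 16, 35, 15, 26, 24, 24, 28, 25, 19, 27, 25, 34, 26, 20, 22, 33, 29,
       31, 27, 22, 29, 28, 29, 22, 31, 25, 20, 27, 26, 28, 32, 28, 25, 31, 28, 28]
HELIUM = [25, 16, 25, 14, 23, 29, 25, 26, 22, 26, 12, 28, 28, 31, 22, 29, 23, 26, 35, 24,
          31, 34, 39, 32, 14, 28, 30, 27, 33, 11, 26, 32, 30, 29, 30, 29, 29, 30, 26]

diffs = [h - a for h, a in zip(HELIUM, AIR)]  # helium minus air for each trial
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / sqrt(n))
print(f"n = {n}, mean difference = {mean(diffs):.2f}, t = {t:.2f}, df = {n - 1}")
```

A t value this small gives students little reason to believe helium adds distance, which makes a nice contrast with the Step 1 predictions.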


Activity 11.2 Thinking About Data Collection

This activity focuses on the difference between independent samples and paired samples, and also provides students with a chance to review some of the material on experimental design. The activity can consist of just Steps 1 – 3, which focus on data collection issues, or you can include Step 4, which asks students to actually collect the data and perform an appropriate hypothesis test.

Activity 11.4 More on Defective M&Ms

In this activity, students inspect M&Ms for defects, comparing the proportion defective for plain and peanut M&Ms. Students should also classify defective M&Ms by type of defect so that this data can be used again in Activity 12.3.

Activity 11.5 Which Weighs More—Coke or Diet Coke?

This is a fun activity—you will be surprised at the student answers to the question in Step 1. Students seem to be evenly split between the three possible responses, and so the discussion gets the students interested in collecting data to answer the question. (It turns out that Coke weighs more, due to the sugar that is dissolved in the solution.)

To carry out this activity you will need a fairly sensitive scale—science departments usually have these, so you might ask a science faculty member if you can borrow one for this activity.

Activity 12.1 Pick a Number, Any Number…

People are notoriously poor random number generators. This activity gives students practice with the Chi-square goodness-of-fit test and also reminds them that they should use random number tables or a random number generator when they need random numbers.
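A minimal sketch of the goodness-of-fit calculation, assuming students pick from 1 to 5 and using hypothetical counts from 50 students. The critical value 9.488 (upper-tail area 0.05, df = 4) is from a chi-square table.

```python
def chi_square_gof(observed, expected_props):
    """Goodness-of-fit statistic sum((obs - exp)^2 / exp) and its degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical counts of students choosing 1, 2, 3, 4, 5 (50 students).
observed = [4, 9, 18, 12, 7]
stat, df = chi_square_gof(observed, [1 / 5] * 5)
CRITICAL_05_DF4 = 9.488  # from a chi-square table
print(round(stat, 2), df, stat > CRITICAL_05_DF4)
```

Every expected count here is 10, so the usual requirement of expected counts of at least 5 is met.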

Activity 12.2 Color and Perceived Taste

This activity has students design an experiment to see if the color of a food item changes the way people perceive its taste. Collecting the data can be a bit time consuming, so if you choose not to have the students actually collect the data, an alternate approach would be to complete Step 1 (the design part) and then provide students with hypothetical data that can be used to complete the table in Step 3 and the test in Step 4.


Activity 12.3 Peanut and Plain M&M Defects

This activity uses data collected in Activity 11.4 to perform a Chi-square test. If you didn't do Activity 11.4, you will need to do the data collection part of that activity at this time.

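This activity also provides an opportunity to talk about the distinction between the Chi-square test of homogeneity and the Chi-square test of independence. This is a test of homogeneity, because separate samples of plain and peanut M&Ms are compared on a single categorical variable (type of defect).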
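The arithmetic is the same for both tests; only the way the data were collected differs. A minimal sketch with hypothetical counts (rows: plain, peanut; columns: three hypothetical defect types):

```python
def chi_square_two_way(table):
    """X^2 = sum((obs - exp)^2 / exp), with exp = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical defect counts: rows are plain and peanut M&Ms.
table = [[20, 10, 10],
         [10, 20, 10]]
stat, df = chi_square_two_way(table)
print(round(stat, 3), df)
```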

Activity 13.1 Are Tall Women from "Big" Families?

This activity primarily focuses on the model utility test in simple linear regression. MINITAB output for the data of this example follows.

Pearson correlation of Height and Siblings = 0.396
P-Value = 0.257

The regression equation is
Height = 64.3 + 0.366 Siblings

Predictor  Coef     SE Coef  T      P
Constant   64.2543  0.7532   85.31  0.000
Siblings   0.3662   0.3001   1.22   0.257

S = 1.55637   R-Sq = 15.7%   R-Sq(adj) = 5.2%
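The output illustrates that the model utility test for the slope is equivalent to testing for zero correlation: t = r√(n − 2)/√(1 − r²) reproduces the slope's T = 1.22 and P = 0.257. The sample size is not printed; n = 10 is inferred because it makes R-Sq(adj) and T consistent with the output.

```python
from math import sqrt

def t_from_r(r, n):
    """Test statistic for H0: rho = 0, t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

r = 0.396
n = 10  # inferred from the output, not stated in it
print(round(t_from_r(r, n), 2), round(r * r, 3))  # compare with T and R-Sq above
```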


Activity 13.2 Golden Rectangles

This activity introduces the idea of fitting a linear model that goes through the origin (i.e. has intercept = 0). MINITAB will fit such a model, but if you don't have access to MINITAB or a software program that will fit such a model, you can have students fit the model by hand.

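The model is

$$y = \beta x + e$$

and the least squares estimate of $\beta$ is

$$b = \frac{\sum xy}{\sum x^2}$$

The estimated variance of the residuals is

$$s_e^2 = \frac{\sum (y - bx)^2}{n - 1}$$

and the test statistic for testing $H_0\colon \beta = \beta_0$ is

$$t = \frac{b - \beta_0}{s_e / \sqrt{\sum x^2}}$$

which is compared to a t distribution with n − 1 degrees of freedom.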
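For students fitting the model by hand, a short Python sketch of the same calculations (slope b = Σxy/Σx², residual variance with n − 1 df, and the t statistic) can serve as a check. The rectangle measurements are hypothetical, and the hypothesized slope 0.618 is an assumption about the golden-ratio value the activity tests.

```python
from math import sqrt

def fit_through_origin(xs, ys):
    """Slope b = sum(xy) / sum(x^2), residual variance with n - 1 df, and sum(x^2)."""
    sxx = sum(x * x for x in xs)
    b = sum(x * y for x, y in zip(xs, ys)) / sxx
    sse = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (len(xs) - 1)
    return b, s2, sxx

def t_statistic(b, s2, sxx, beta0):
    """t = (b - beta0) / (s / sqrt(sum x^2)), with n - 1 df."""
    return (b - beta0) / (sqrt(s2) / sqrt(sxx))

# Hypothetical rectangle measurements: x = length, y = width.
lengths = [10.0, 12.5, 8.0, 15.0, 20.0, 6.0]
widths = [6.3, 7.6, 5.1, 9.1, 12.6, 3.8]
b, s2, sxx = fit_through_origin(lengths, widths)
beta0 = 0.618  # assumed hypothesized slope (golden-ratio value)
print(round(b, 3), round(t_statistic(b, s2, sxx, beta0), 2))
```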

Activity 13.3 Name Lengths

This activity is straightforward. Students use data on name lengths to carry out a test to determine if there is a positive correlation between first name length and last name length.


Activity 14.1 Exploring the Relationship Between Number of Predictors and Sample Size

This activity shows that if enough variables are included in a multiple regression model, it is possible to achieve an R² of 100% and a residual sum of squares (SSResid) of 0 even when there is no relationship between y and any of the predictors. MINITAB output follows:

Regression Analysis: y versus x1
The regression equation is
y = 24.6 - 0.195 x1

Predictor  Coef     SE Coef  T      P
Constant   24.622   2.925    8.42   0.004
x1         -0.1949  0.1411   -1.38  0.261

S = 0.612314   R-Sq = 38.9%   R-Sq(adj) = 18.5%

Regression Analysis: y versus x1, x2
The regression equation is
y = 29.9 - 0.224 x1 - 0.232 x2

Predictor  Coef     SE Coef  T      P
Constant   29.939   4.336    6.90   0.020
x1         -0.2240  0.1205   -1.86  0.204
x2         -0.2321  0.1557   -1.49  0.275

S = 0.516166   R-Sq = 71.0%   R-Sq(adj) = 42.1%

Regression Analysis: y versus x1, x2, x3
The regression equation is
y = 27.7 - 0.294 x1 - 0.203 x2 + 0.150 x3

Predictor  Coef      SE Coef  T      P
Constant   27.728    3.116    8.90   0.071
x1         -0.29430  0.08850  -3.33  0.186
x2         -0.2025   0.1047   -1.93  0.304
x3         0.14981   0.07984  1.88   0.312

S = 0.343323   R-Sq = 93.6%   R-Sq(adj) = 74.4%

Regression Analysis: y versus x1, x2, x3, x4
The regression equation is
y = 35.5 - 0.0278 x1 - 0.325 x2 - 0.0535 x3 - 0.350 x4

Predictor  Coef        SE Coef  T  P
Constant   35.5481     *        *  *
x1         -0.0278208  *        *  *
x2         -0.324598   *        *  *
x3         -0.0534616  *        *  *
x4         -0.350013   *        *  *

S = *
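The same effect can be reproduced with pure noise. This sketch assumes n = 5 observations (an intercept plus four predictors then fits exactly, matching the last output's missing standard errors) and hypothetical random predictors unrelated to y. It uses a small pure-Python solver to stay dependency-free; in practice a library routine would normally be used.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(y, predictors):
    """R^2 for least squares regression of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)  # normal equations
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

rng = random.Random(14)
n = 5
y = [rng.gauss(20, 1) for _ in range(n)]
xs = [[rng.uniform(0, 30) for _ in range(n)] for _ in range(4)]  # noise, unrelated to y
for k in range(1, 5):
    print(k, round(r_squared(y, xs[:k]), 4))
```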


Activity 15.1 Exploring Single-Factor ANOVA

Step 1:

Case 1: Looks like the variance may not be the same for all three populations
Case 2: Consistent with assumptions
Case 3: Consistent with assumptions
Case 4: May worry about assumption of normality for population 3

Step 2:

Case A: Probably not all the same
Case B: Unsure
Case C: Could be the same

Step 3: MINITAB output follows:

Case A
One-way ANOVA: Sample 1, Sample 2, Sample 3

Source  DF  SS       MS       F       P
Factor  2   3313.69  1656.84  259.60  0.000
Error   21  134.03   6.38
Total   23  3447.72

Case B
One-way ANOVA: Sample 1, Sample 2, Sample 3

Source  DF  SS    MS    F      P
Factor  2   2918  1459  12.55  0.000
Error   21  2442  116
Total   23  5359

Case C
One-way ANOVA: Sample 1, Sample 2, Sample 3

Source  DF  SS      MS     F     P
Factor  2   237.0   118.5  1.55  0.235
Error   21  1604.0  76.4
Total   23  1841.0
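Students can check how each table fits together: MS = SS/DF and F = MSTr/MSE (for Case A, 24 observations in 3 groups give DF = 2 and 21). A minimal sketch of single-factor ANOVA from raw samples; the three tiny samples are hypothetical.

```python
from statistics import mean

def one_way_anova(*samples):
    """Return (F, df_treatment, df_error) for single-factor ANOVA."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n
    sstr = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)  # treatment sum of squares
    sse = sum((x - mean(s)) ** 2 for s in samples for x in s)     # error sum of squares
    mstr, mse = sstr / (k - 1), sse / (n - k)
    return mstr / mse, k - 1, n - k

# Case A's F from its table: (SS Factor / DF Factor) / (SS Error / DF Error).
print(round((3313.69 / 2) / (134.03 / 21), 2))
print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # hypothetical samples
```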