chapter 3 generalization: how broadly do the results apply?
![Page 1: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/1.jpg)
Chapter 3 Generalization: How broadly do the results apply?
![Page 2: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/2.jpg)
Generalization
So far we’ve studied significance and estimation. Once we draw a conclusion from a test of significance or construct a confidence interval, how broadly do the results apply? To what population can we generalize them?
This generalization is the topic of this section.
![Page 3: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/3.jpg)
Sometimes this generalization is difficult and sometimes it is not.
Generalizing to a larger population is valid only when the sample is representative.
Unfortunately, biased sampling methods are common.
Generalization
![Page 4: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/4.jpg)
Section 3.1: Introduction to sampling from a finite population
![Page 5: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/5.jpg)
Notation Check
Statistics
◦ x̄ (x-bar): sample average or mean
◦ p̂ (p-hat): sample proportion
Parameters
◦ μ (mu): population average or mean
◦ π (pi): population proportion
Statistics summarize a sample and parameters summarize a population.
![Page 6: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/6.jpg)
Sampling Hope College students
Suppose we want to know the proportion of Hope students who watched the Super Bowl, or the average number of traffic tickets Hope students have received.
The population of interest is all Hope students.
A census will get this information from all Hope students.
What if you don’t have time/money to interview all students?
![Page 7: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/7.jpg)
Sampling
We can take a sample of Hope students and find the proportion of those in our sample who watched the Super Bowl, or the mean number of traffic tickets they have received.
Using these statistics, we can make inferences about the parameters.
How well will these statistics represent our parameters of interest?
The key to this question is how the sample is selected from the population.
![Page 8: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/8.jpg)
Random Sampling
Getting a random sample is key to making a good inference. This can be tough; we don’t live in a random world. For example, the people you see on a daily basis can be very different from the people others near to you see on a daily basis.
When samples are not random or representative their results can be misleading.
![Page 9: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/9.jpg)
Biased Sampling
![Page 10: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/10.jpg)
ESPN Top 10: What is college basketball's fiercest rivalry?
Connecticut vs. Tennessee (Women)
Duke vs. North Carolina
Hope vs. Calvin
Illinois vs. Missouri
Indiana vs. Purdue
Louisville vs. Kentucky
Penn vs. Princeton
Philadelphia's Big 5
Oklahoma vs. Oklahoma State
Xavier vs. Cincinnati
http://proxy.espn.go.com/chat/sportsnation/polling?event_id=1194
![Page 11: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/11.jpg)
ESPN Top 10: What is college basketball's fiercest rivalry?
75.1% Hope vs. Calvin
9.3% Duke vs. North Carolina
5.4% Indiana vs. Purdue
5.2% Philadelphia's Big 5
1.7% Penn vs. Princeton
1.5% Oklahoma vs. Oklahoma State
0.7% Louisville vs. Kentucky
0.6% Connecticut vs. Tennessee (Women)
0.3% Illinois vs. Missouri
0.3% Xavier vs. Cincinnati
Total Votes: 46,084
![Page 12: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/12.jpg)
2012 State ACT Results
New York ranked 6th with an average of 23.3.
Michigan ranked 45th with an average of 20.1.
![Page 13: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/13.jpg)
2011 State SAT Results
New York ranked 45th with an average of 1466.
Michigan ranked 6th with an average of 1762.
???
Percentage of students taking each test:
      MI     NY
ACT   100%   29%
SAT   4%     90%
![Page 14: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/14.jpg)
Random Sample
To have a random sample, you can’t let people self-select into the sample. (Basketball poll)
You can’t choose a convenience sample that is clearly not representative of the population. (ACT vs. SAT)
![Page 15: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/15.jpg)
Random Sample
A simple random sample is the easiest way to ensure that your sample is unbiased.
A sampling method is biased if statistics from samples consistently over- or underestimate the population parameter.
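The definition of bias can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of word lengths is made up, and the "eye-catching" method (longer words are more likely to be picked) is a hypothetical stand-in for any biased sampling method.

```python
import random

random.seed(1)

# Hypothetical population: lengths (in letters) of 250 words in a made-up text.
population = [random.choice([1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9, 11])
              for _ in range(250)]
true_mean = sum(population) / len(population)

def srs_mean(n):
    """Mean length in a simple random sample of n words (no replacement)."""
    return sum(random.sample(population, n)) / n

def eye_catching_mean(n):
    """Mean length when longer words are more likely to be chosen.
    (random.choices draws with replacement, weighted by word length.)"""
    return sum(random.choices(population, weights=population, k=n)) / n

reps = 2000
srs_avg = sum(srs_mean(10) for _ in range(reps)) / reps
biased_avg = sum(eye_catching_mean(10) for _ in range(reps)) / reps

print(f"population mean length:           {true_mean:.2f}")
print(f"average of SRS sample means:      {srs_avg:.2f}")
print(f"average of biased sample means:   {biased_avg:.2f}")
```

Across many repetitions, the simple random sample means should center near the population mean, while the length-weighted method should consistently overestimate it.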
![Page 16: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/16.jpg)
Simple Random Sample
A simple random sample is like drawing names out of a hat.
Technically, a simple random sample is a way of randomly selecting members of a population so that every sample of a certain size from a population has the same chance of being chosen.
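In software, "drawing names out of a hat" might look like the sketch below. The roster is hypothetical (a real study would need the actual list of every population member), and Python's `random.sample` is used because it selects without replacement.

```python
import random

random.seed(3)

# Hypothetical roster standing in for the full list of population members.
roster = [f"student_{i:04d}" for i in range(1, 3001)]

# Draw 25 names without replacement.
sample = random.sample(roster, 25)

print(sample[:5])
```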
![Page 17: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/17.jpg)
Sampling
Every simple random sample gives us different values for the statistics.
There is variability from sample to sample (sampling variability).
If we take repeated simple random samples of Hope students, each sample will consist of different students. We will get different means or proportions each time we do this. However …
![Page 18: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/18.jpg)
Sampling
The sample means or proportions will center around the population mean or proportion if the sampling method is unbiased (like a simple random sample).
Our sampling variability will decrease when we take larger and larger sample sizes.
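Both points can be checked with a short simulation. This is a sketch with assumed numbers: a hypothetical population of 3,000 students in which 60% watched the Super Bowl, and three arbitrary sample sizes.

```python
import random
import statistics

random.seed(2)

# Hypothetical population: 1 = watched the Super Bowl, 0 = did not.
population = [1] * 1800 + [0] * 1200   # population proportion pi = 0.60

def sample_proportions(n, reps=2000):
    """Sample proportions (p-hat) from repeated simple random samples of size n."""
    return [sum(random.sample(population, n)) / n for _ in range(reps)]

results = {}
for n in (25, 100, 400):
    props = sample_proportions(n)
    results[n] = (statistics.mean(props), statistics.stdev(props))
    print(f"n = {n:3d}: center = {results[n][0]:.3f}, spread (SD) = {results[n][1]:.3f}")
```

For each sample size the center of the sample proportions should sit near 0.60, and the spread should shrink as n grows.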
![Page 19: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/19.jpg)
![Page 20: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/20.jpg)
Exploration 3.1A: Sampling Words
We need to sample from a population of interest if it is very large or if it is difficult to measure every single member of the population.
If we were interested in High School GPA for Hope students we would not need to sample. The registrar’s office has all that information. If we were interested in something that has not already been collected, we might want to sample.
![Page 21: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/21.jpg)
Exploration 3.1A: Sampling Words
That being said, in this activity we will be using the words in the Gettysburg Address as our population.
There are fewer than 300 words in this speech, and we could easily look at the entire speech to find the average word length, the proportion of words that contain an e, etc.
We will be sampling from this speech not to get information about the population, but to help us learn some things about sampling.
![Page 22: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/22.jpg)
Only picture of Lincoln at Gettysburg (Edward Everett spoke for over two hours. Lincoln followed with his two-minute speech.)
![Page 23: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/23.jpg)
Exploration 3.1A
Select what you think is a representative sample of 10 words from the Gettysburg Address (pg. 3-10). Record your words in the table in question 2.
Make dotplots of both average length and proportion containing e on the board.
Only work through question 22.
HW: Exercises 3.1.3 and 3.1.4
![Page 24: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/24.jpg)
Review of Section 3.1
A sampling method is biased if statistics from samples consistently over- or underestimate the population parameter.
A simple random sample is the easiest way to ensure that your sample is unbiased.
Therefore, if we have a simple random sample, we can generalize our results to the population from which it was drawn.
![Page 25: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/25.jpg)
Review of Section 3.1
We saw biased and unbiased sampling in the Gettysburg Address exploration. We also saw that:
◦ When we increase sample size, the variability of our sampling distribution decreases.
◦ This variability can be predicted.
◦ Changing the population size has no effect on variability.
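The last two points can be sketched in code. The assumptions here are all illustrative: the word lengths are made up, the prediction used is the standard formula SD of sample means ≈ σ/√n (not stated on the slides), and the two populations have the same distribution of lengths but very different sizes.

```python
import math
import random
import statistics

random.seed(4)

# Hypothetical word lengths; both populations repeat the same values,
# so they share one distribution but differ greatly in size.
LENGTHS = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9, 11]
small_pop = LENGTHS * 100       # 1,300 words
large_pop = LENGTHS * 10000     # 130,000 words

def sd_of_sample_means(pop, n=20, reps=2000):
    """SD of the sample means from repeated simple random samples of size n."""
    means = [sum(random.sample(pop, n)) / n for _ in range(reps)]
    return statistics.stdev(means)

sigma = statistics.pstdev(LENGTHS)      # population standard deviation
predicted = sigma / math.sqrt(20)       # predicted SD of sample means, n = 20

sd_small = sd_of_sample_means(small_pop)
sd_large = sd_of_sample_means(large_pop)

print(f"predicted SD of sample means:   {predicted:.3f}")
print(f"simulated, population of 1,300:   {sd_small:.3f}")
print(f"simulated, population of 130,000: {sd_large:.3f}")
```

Both simulated values should land close to the prediction. The claim about population size holds when the population is much larger than the sample, as it is in both cases here.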
![Page 26: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/26.jpg)
Population distribution of word lengths
Distribution of average word length from samples of size 20
![Page 27: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/27.jpg)
Section 3.2: Inference for a Single Quantitative Variable
Using methods similar to what we did in the last section, we will see how a null distribution for a single quantitative variable can be obtained and even predicted.
![Page 28: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/28.jpg)
Example 3.2: Estimating Elapsed Time
![Page 29: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/29.jpg)
Does it ever seem that time drags or flies by?
Students in a stats class (for their Project 2) collected data on students’ perception of time.
Subjects were told that they would listen to music and be asked questions when it was over.
The researchers played 10 seconds of the Jackson 5’s “ABC” and asked subjects how long they thought it lasted.
Can students accurately estimate the length?
Estimating Time
![Page 30: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/30.jpg)
Hypotheses
Null Hypothesis: People will accurately estimate the length of a 10-second song snippet, on average. (μ = 10 seconds)
Alternative Hypothesis: People will not accurately estimate the length of a 10-second song snippet, on average. (μ ≠ 10 seconds)
![Page 31: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/31.jpg)
A convenience sample of 48 students on campus served as subjects, and their estimates of the song length were recorded.
The average estimate was 13.71 sec and the standard deviation was 6.50 sec.
Estimating Time
[Figure: distribution of the time estimates; axis: Estimate (sec), 5 to 30]
![Page 32: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/32.jpg)
Skewed, mean, median
The distribution obtained is not symmetric, but is right skewed.
When data are skewed right, the mean gets pulled out to the right while the median is more resistant to this.
[Figure: distribution of the time estimates; axis: Estimate (sec), 5 to 30]
![Page 33: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/33.jpg)
Mean v Median
The mean is 13.71 and the median is 12.
How would these numbers change if one of the people who gave an answer of 30 seconds had actually said 300 seconds?
The standard deviation is 6.5 sec. Is it resistant to outliers?
[Figure: distribution of the time estimates; axis: Estimate (sec), 5 to 30]
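One way to explore these questions is to try them on a small data set. The ten estimates below are hypothetical (they are not the 48 values collected in the study); the sketch changes a single 30 into 300 and compares the summaries before and after.

```python
import statistics

# Hypothetical time estimates in seconds (NOT the actual study data).
estimates = [5, 8, 10, 10, 12, 12, 15, 15, 20, 30]

# Same data, but the 30-second answer is replaced by 300 seconds.
with_outlier = estimates[:-1] + [300]

def summarize(data):
    """Return (mean, median, standard deviation) of the data."""
    return (statistics.mean(data),
            statistics.median(data),
            statistics.stdev(data))

before = summarize(estimates)
after = summarize(with_outlier)

print("before: mean = %.2f, median = %.1f, SD = %.2f" % before)
print("after:  mean = %.2f, median = %.1f, SD = %.2f" % after)
```

In this sketch the median does not move, while the mean and the standard deviation both increase sharply, which suggests that neither of those two is resistant to outliers.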
![Page 34: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/34.jpg)
Population?
One way to develop a null distribution is to draw samples from some population that we think our population of time estimates might look like under a true null.
Under the null the mean is 10 sec.
We might assume the population is skewed and has a standard deviation similar to what we found.
![Page 35: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/35.jpg)
Simulation-based Inference
We have a possible population data set similar to what we need.
Let’s go and get that data.
Then go to the One Mean applet and develop a null distribution.
Find out where our actual mean of 13.71 sec is located.
And finally see how a t-distribution could predict all this.
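The same idea can be sketched outside the applet. This is not the applet's data set or method: as an assumption, the hypothetical population is modeled by a right-skewed gamma distribution tuned to have mean 10 sec and standard deviation 6.5 sec, and samples of 48 are drawn from it repeatedly.

```python
import random

random.seed(5)

NULL_MEAN = 10.0       # seconds, from the null hypothesis
ASSUMED_SD = 6.5       # seconds, borrowed from the sample
N = 48                 # sample size in the study
OBSERVED_MEAN = 13.71  # seconds

# Assumed model: a gamma distribution has mean = shape * scale
# and SD = sqrt(shape) * scale, and is skewed to the right.
scale = ASSUMED_SD ** 2 / NULL_MEAN
shape = NULL_MEAN / scale

def one_sample_mean():
    """Mean of one simulated sample of N time estimates under the null."""
    return sum(random.gammavariate(shape, scale) for _ in range(N)) / N

reps = 5000
null_means = [one_sample_mean() for _ in range(reps)]

# Two-sided p-value: how often is a simulated mean at least as far
# from 10 as the observed mean of 13.71?
distance = abs(OBSERVED_MEAN - NULL_MEAN)
p_value = sum(abs(m - NULL_MEAN) >= distance for m in null_means) / reps

print(f"center of null distribution: {sum(null_means) / reps:.2f}")
print(f"approximate p-value: {p_value:.4f}")
```

Under these assumptions the observed mean should fall far out in the tail of the simulated null distribution, giving a very small p-value.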
![Page 36: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/36.jpg)
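T-distribution
The t-distribution is very similar to a normal distribution, but with slightly “heavier” tails.
The t-statistic is the standardized statistic we use with a single quantitative variable and can be found using the formula:
t = (x̄ − μ₀) / (s / √n)
where x̄ is the sample mean, μ₀ is the mean under the null hypothesis, s is the sample standard deviation, and n is the sample size.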
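As a quick check, the standardized statistic for the time-estimation example can be computed from the summary values reported earlier (mean 13.71 sec, standard deviation 6.50 sec, 48 subjects, null mean 10 sec). The variable names are illustrative only.

```python
import math

x_bar = 13.71   # sample mean (sec)
mu_0 = 10.0     # mean under the null hypothesis (sec)
s = 6.50        # sample standard deviation (sec)
n = 48          # sample size

standard_error = s / math.sqrt(n)        # estimated SD of the sample mean
t_stat = (x_bar - mu_0) / standard_error

print(f"standard error = {standard_error:.3f}")
print(f"t = {t_stat:.2f}")
```

The result is a t-statistic of about 3.95, meaning the observed mean lies nearly four standard errors above the hypothesized 10 seconds.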
![Page 37: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/37.jpg)
Validity Conditions
The theory-based test for a single mean requires either:
◦ The sample size is at least 20.
◦ If the sample size is less than 20, the sample distribution is not skewed.
Let’s use the theory-based applet to run this test and find a confidence interval. (We first need to get the data.)
![Page 38: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/38.jpg)
Formulate Conclusions. Based on our small p-value, we can conclude that people don’t accurately estimate the length of a 10-second song snippet, and in fact they overestimate it.
To what larger population can we make our inference?
Estimating Time
![Page 39: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/39.jpg)
Estimating Time
We are 95% confident that the average estimate of a 10-second song is between 11.823 and 15.597 seconds.
[Figure: distribution of the time estimates; axis: Estimate (sec), 5 to 30]
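The interval can be reproduced from the summary statistics as sample mean ± t* × s/√n. One assumption is needed that the slides do not state: the multiplier t* ≈ 2.0117, taken here from a t-table for 95% confidence with 47 degrees of freedom.

```python
import math

x_bar = 13.71     # sample mean (sec)
s = 6.50          # sample standard deviation (sec)
n = 48            # sample size
T_STAR = 2.0117   # assumed: 95% multiplier from a t-table, 47 degrees of freedom

margin_of_error = T_STAR * s / math.sqrt(n)
lower = x_bar - margin_of_error
upper = x_bar + margin_of_error

print(f"margin of error = {margin_of_error:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

With this multiplier the endpoints agree with the interval reported above to three decimal places.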
![Page 40: Chapter 3 Generalization: How broadly do the results apply?](https://reader038.vdocuments.us/reader038/viewer/2022102906/56649c915503460f9494bcf9/html5/thumbnails/40.jpg)
Exploration 3.2: Sleepless Nights?
Page 3-32