Collecting Data Understanding Random Sampling

Upload: elijah-stevens

Post on 19-Jan-2016


Page 1: Collecting Data Understanding Random Sampling. Objectives: To develop the basic properties of collecting an unbiased sample. To learn to recognize flaws

Collecting Data

Understanding Random Sampling

Page 2:

Objectives:

To develop the basic properties of collecting an unbiased sample.

To learn to recognize flaws in biased sampling.

Page 3:

Intro…

Do you know what it means when something occurs randomly?

Randomly select a number from the next slide. Ready…

Page 4:

1 2 3 4

Page 5:

Question:

What would you expect to happen if we collected data on this simple task?

Page 6:

How do we gather data?

• Surveys

• Opinion polls

• Interviews

• Studies: Observational, Retrospective (past), Prospective (future)

• Experiments

Page 7:

Population

Population – the entire group of individuals we want information about.

Census – a complete count of the entire population.

Page 8:

Why would we not use a census all the time?

1) Not accurate

2) Very expensive

3) Perhaps impossible

4) If using destructive sampling, you would destroy the population

• Breaking strength of soda bottles

• Lifetime of flashlight batteries

• Safety ratings for cars

Page 9:

Sample

A part of the population that we examine in order to gather information.

Used to generalize information about a population.

Page 10:

Sampling design

refers to the method used to choose the sample from the population

Sampling frame

a list of every individual in the population

Page 11:

Simple Random Sample (SRS)

consists of n individuals from the population chosen in such a way that

• every individual has an equal chance of being selected

• every set of n individuals has an equal chance of being selected
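
The SRS definition can be sketched in a few lines of Python. This is only an illustration: the 20-person frame and the helper name `simple_random_sample` are made up for the example, and the seed is fixed just so the run is repeatable. `random.sample` draws without replacement, so every set of n individuals is equally likely to be chosen.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct individuals from the sampling frame, without replacement."""
    rng = random.Random(seed)  # seeding is optional; used here for repeatability
    return rng.sample(list(frame), n)


# Hypothetical sampling frame: a list of every individual in the population.
frame = [f"person_{i}" for i in range(1, 21)]

sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```

Note that the function needs the full sampling frame as input, which matches the requirement that an SRS must have a list of the population.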

Page 12:

SRS

Advantages

• Unbiased

• Easy

Disadvantages

• Large variance

• May not be representative

• Must have sampling frame (list of population)

Page 13:

Systematic random sample

select sample by following a systematic approach

randomly select where to begin

Page 14:

Systematic Random Sample

Advantages

• Unbiased

• Ensure that the sample is distributed across population

• More efficient, cheaper, etc.

Disadvantages

• Large variance

• Can be confounded by trend or cycle

• Formulas are complicated

Page 15:

A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 and 10. He then gives a survey to that customer, and to every 10th customer after them, to fill out before they leave.

Identify the sampling design

Systematic random sampling
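
The restaurant example can be sketched as follows. The figure of 95 customers in one night and the function name `systematic_sample` are assumptions made for illustration; only the "random start between 1 and 10, then every 10th customer" rule comes from the slide.

```python
import random


def systematic_sample(items, k, seed=None):
    """Pick a random start among the first k items, then take every k-th item after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # 0-based index, i.e. customer number 1..k
    return list(items[start::k])


# Hypothetical night: 95 customers, numbered in order of arrival.
customers = list(range(1, 96))

surveyed = systematic_sample(customers, 10, seed=3)
print(surveyed)
```

Only the starting point is random. Once it is fixed, the rest of the sample is determined, which is why a trend or cycle in arrival order can confound the result.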

Page 16:

Bias

ERROR

favors certain outcomes

Note: We can never draw conclusions from biased data. Throw it out and start over!

Page 17:

Voluntary response

People choose to respond

Usually only people with very strong opinions respond

Produces biased results

Page 18:

Convenience sampling

Ask people who are easy to ask

Produces biased results

Page 19:

Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at Rice. You collect register receipts for students as they leave the bookstore during lunch one day.

Source of bias?

Convenience sampling – easy way to collect data

Page 20:

1970 Draft Lottery and the Role of Randomization

In that first draft lottery (conducted on December 1, 1969), a large, deep, cylindrical bowl was filled with 366 dates, one for each day of the year (including February 29, of course). The dates were placed inside small capsules (balls about the size of a pecan), added to the bowl, and then mixed. After mixing, the capsules were selected, one by one, and assigned a draft priority. Draft registrants whose birthdays matched the first 100 or so dates selected were likely to be called for induction. However, the bowl's small diameter and height (nearly arm's length) made the mixing less than random because each month's dates had been added sequentially in the yearly order of months.

January's capsules were dumped in first, followed by February's and so on until December.

Set of Data for 1970 Draft Lottery

Page 21:

1970 Draft Lottery

Page 22:

[Scatter plot: 1970 Draft Number by Day of Year (draft_70); x-axis DayOfYear, y-axis DraftNo]

Correlation of DayOfYear and DraftNo = -0.197831

DraftNo = -0.197 DayOfYear + 220

Page 23:

[Scatter plot: Mean Draft Number by Month (draft_70); x-axis Month_No, y-axis Mean_Draft_No]

Mean_Draft_No = -7.06 Month_No + 230; r² = 0.75
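
To see what the fitted line for mean draft number by month implies, it can be evaluated at the first and last months. This is a small sketch that only plugs month numbers into the equation reported on the slide; the function name is made up for the example.

```python
def predicted_mean_draft_no(month_no):
    """Evaluate the fitted line from the slide: -7.06 * Month_No + 230."""
    return -7.06 * month_no + 230


january = predicted_mean_draft_no(1)
december = predicted_mean_draft_no(12)

print(round(january, 2), round(december, 2), round(january - december, 2))
```

The line predicts a mean draft number of about 223 for January birthdays and about 145 for December birthdays. With 366 numbers assigned at random, every month's mean would be expected to sit near 183.5, so the downward slope reflects the poor mixing: later months tended to receive lower numbers, which were called first.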

Page 24:

How did the nonrandomness of the draft affect the casualties (deaths) during the Vietnam War?

This was recently studied by Paul Sommers in "The Writing on the Wall", Chance, Vol, 1, 2003, p35-38.

He examined the names of the casualties on the Vietnam Memorial (available online at thewall-usa.com) together with other sources and found the number of casualties by birth month:

Page 25:

Page 26:

Selecting an SRS

For the AP exam: “Knowledgeable users of statistics need to be able to perform your sample exactly using the described method.”

Methods: we can “pick samples from a hat”, use a random number generator, or use a table of random digits to derive our sample

Page 27:

SRS by picking out of a hat

Say items in hat are “mixed thoroughly” and state whether or not slips of paper are replaced in the hat (yes if stratified sampling).

Page 28:

Random digit table

each entry is equally likely to be any of the 10 digits

digits are independent of each other

Page 29:

Suppose your population consisted of these 20 people:

1) Aidan     6) Fred      11) Kathy      16) Paul
2) Bob       7) Gloria    12) Lori       17) Shawnie
3) Chico     8) Hannah    13) Matthew    18) Tracy
4) Doug      9) Israel    14) Nancy      19) Uncle Sam
5) Edward    10) Jung     15) Opus       20) Vernon

Use the following random digits to select a sample of five from these people.

We will need to use double digit random numbers, ignoring any number greater than 20. Start with Row 1 and read across.

Row 1: 4 5 1 8 0 5 1 3 7 1
Row 2: 0 1 5 5 8 0 1 5 7 0
Row 3: 8 9 9 3 4 3 5 0 6 3

45: Ignore.

18: 18) Tracy

05: 5) Edward

13: 13) Matthew

71: Ignore.

01: 1) Aidan

55: Ignore.

80: Ignore.

15: 15) Opus

Stop when five people are selected. So my sample would consist of:

Aidan, Edward, Matthew, Opus, and Tracy
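
The selection above can be reproduced with a short script. It is a sketch under two assumptions: the flattened digits are read as three rows of ten digits each, and the label 00 and any repeated label are skipped in addition to numbers greater than 20 (the slide only mentions the last rule). The function name is made up for the example.

```python
def select_from_digit_table(digits, population_size, sample_size):
    """Read two-digit labels left to right; skip 00, labels above population_size, and repeats."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break  # stop as soon as the sample is complete
    return chosen


people = [
    "Aidan", "Bob", "Chico", "Doug", "Edward", "Fred", "Gloria", "Hannah",
    "Israel", "Jung", "Kathy", "Lori", "Matthew", "Nancy", "Opus", "Paul",
    "Shawnie", "Tracy", "Uncle Sam", "Vernon",
]

# Rows of the random digit table, as read from the slide (assumed row breaks).
rows = ["4518051371", "0155801570", "8993435063"]

labels = select_from_digit_table("".join(rows), len(people), 5)
sample = sorted(people[label - 1] for label in labels)
print(labels)
print(sample)
```

Reading pairs across Row 1 and then Row 2 gives the labels 18, 5, 13, 1, and 15, which is the same sample as above: Aidan, Edward, Matthew, Opus, and Tracy.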