BPS - 3rd Ed. Chapter 7 Producing Data: Sampling

Upload: andrew-knight

Post on 28-Dec-2015


Page 1

BPS - 3rd Ed.

Chapter 7

Producing Data: Sampling

Page 2

Population and Sample

Researchers often want to answer questions about some large group of individuals (this group is called the population).

Often the researchers cannot measure (or survey) all individuals in the population, so they measure a subset of individuals chosen to represent the entire population (this subset is called a sample).

The researchers then use statistical techniques to draw conclusions about the population based on the sample.

Page 3

How Data are Obtained

Observational Study
– Observes individuals and measures variables of interest but does not attempt to influence the responses
– Describes some group or situation
– Sample surveys are observational studies

Experiment
– Deliberately imposes some treatment on individuals in order to observe their responses
– Studies whether the treatment causes change in the response

Page 4

Experiment versus Observational Study

Both typically have the goal of detecting a relationship between the explanatory and response variables.

Experiment
– create differences in the explanatory variable and examine any resulting changes in the response variable (cause-and-effect conclusion)

Observational Study
– observe differences in the explanatory variable and notice any related differences in the response variable (association between variables)

Page 5

Why Not Always Use an Experiment?

Sometimes it is unethical or impossible to assign people to receive a specific treatment.

Certain explanatory variables, such as handedness or gender, are inherent traits and cannot be randomly assigned.

Page 6

Confounding

The problem:
– in addition to the explanatory variable of interest, there may be other variables (explanatory or lurking) that make the groups being studied different from each other
– the impact of these variables cannot be separated from the impact of the explanatory variable on the response

Page 7

Confounding

The solution:
– Experiment: randomize experimental units to receive different treatments (possible confounding variables should “even out” across groups)
– Observational Study: measure potential confounding variables and determine if they have an impact on the response (may then adjust for these variables in the statistical analysis)

Page 8

Question

A recent newspaper article concluded that smoking marijuana at least three times a week resulted in lower grades in college. How do you think the researchers came to this conclusion? Do you believe it? Is there a more reasonable conclusion?

Page 9

Case Study

The Effect of Hypnosis on the Immune System

reported in Science News, Sept. 4, 1993, p. 153

Page 10

Case Study

The Effect of Hypnosis on the Immune System

Objective: To determine if hypnosis strengthens the disease-fighting capacity of immune cells.

Page 11

Case Study

65 college students
– 33 easily hypnotized
– 32 not easily hypnotized

white blood cell counts measured

all students viewed a brief video about the immune system

Page 12

Case Study

Students randomly assigned to one of three conditions
– subjects hypnotized, given mental exercise
– subjects relaxed in sensory deprivation tank
– control group (no treatment)
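A minimal sketch of how such a random assignment could be carried out in software. The slides do not say how the 65 students were actually allocated or how large each group was, so the student labels 1 to 65, the condition names, the near-equal group sizes, and the helper `randomly_assign` are all assumptions made for illustration.

```python
import random

def randomly_assign(subjects, conditions, seed=None):
    """Shuffle the subjects, then deal them out to the conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for position, subject in enumerate(shuffled):
        groups[conditions[position % len(conditions)]].append(subject)
    return groups

# Hypothetical labels 1..65 for the 65 students in the case study.
students = list(range(1, 66))
conditions = ["hypnosis", "relaxation", "control"]
groups = randomly_assign(students, conditions, seed=7)

# Group sizes differ by at most one: 22, 22 and 21 for 65 subjects.
for condition, members in groups.items():
    print(condition, len(members))
```

Because chance alone decides who lands in which group, traits such as being easily hypnotized should be spread roughly evenly across the three conditions.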

Page 13

Case Study

white blood cell counts re-measured after one week

the two white blood cell counts are compared for each group

results
– hypnotized group showed larger jump in white blood cells
– “easily hypnotized” group showed largest immune enhancement

Page 14

Case Study

The Effect of Hypnosis on the Immune System

What is the population?

What is the sample?

Page 15

Case Study

The Effect of Hypnosis on the Immune System

Is this an experiment or an observational study?

Page 16

Case Study

The Effect of Hypnosis on the Immune System

Do hypnosis and mental exercise affect the immune system?

Page 17

Case Study

Weight Gain Spells Heart Risk for Women

“Weight, weight change, and coronary heart disease in women.” W.C. Willett, et al., vol. 273(6), Journal of the American Medical Association, Feb. 8, 1995.

(Reported in Science News, Feb. 4, 1995, p. 108)

Page 18

Case Study

Weight Gain Spells Heart Risk for Women

Objective: To recommend a range of body mass index (a function of weight and height) in terms of coronary heart disease (CHD) risk in women.

Page 19

Case Study

Study started in 1976 with 115,818 women aged 30 to 55 years and without a history of previous CHD.

Each woman’s weight (body mass) was determined

Each woman was asked her weight at age 18.

Page 20

Case Study

The cohort of women was followed for 14 years.

The number of CHD cases (fatal and nonfatal) was counted (1292 cases).

Results were adjusted for other variables (smoking, family history, menopausal status, post-menopausal hormone use).

Page 21

Case Study

Results: compare those who gained less than 11 pounds (from age 18 to current age) to the others.
– 11 to 17 lbs: 25% more likely to develop heart disease
– 17 to 24 lbs: 64% more likely
– 24 to 44 lbs: 92% more likely
– more than 44 lbs: 165% more likely
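One way to read these percentages is as risk ratios against the reference group (women who gained less than 11 pounds): "X% more likely" corresponds to a ratio of 1 + X/100. The short sketch below does that arithmetic; the function name `relative_risk` is an illustrative choice, and the slide does not state exactly which risk measure the study reported.

```python
# Percent increases in CHD risk from the slide, relative to women who
# gained less than 11 pounds since age 18 (the reference group).
percent_more_likely = {
    "11 to 17 lbs": 25,
    "17 to 24 lbs": 64,
    "24 to 44 lbs": 92,
    "more than 44 lbs": 165,
}

def relative_risk(percent_increase):
    """Convert 'X% more likely' into a risk ratio against the reference group."""
    return (100 + percent_increase) / 100

risk_ratios = {gain: relative_risk(p) for gain, p in percent_more_likely.items()}
for gain, ratio in risk_ratios.items():
    print(f"{gain}: {ratio:.2f} times the reference risk")
```

So "165% more likely" means about 2.65 times the risk of the reference group, not 1.65 times.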

Page 22

Case Study

Weight Gain Spells Heart Risk for Women

What is the population?

What is the sample?

Page 23

Case Study

Weight Gain Spells Heart Risk for Women

Is this an experiment or an observational study?

Page 24

Case Study

Weight Gain Spells Heart Risk for Women

Does weight gain in women increase their risk for CHD?

Page 25

Bad Sampling Designs

Voluntary response sampling
– allowing individuals to choose to be in the sample

Convenience sampling
– selecting individuals that are easiest to reach

Both of these techniques are biased
– systematically favor certain outcomes

Page 26

Voluntary Response

To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.
– 4.5% responded
– Hite used those responses to write her book

Moore (Statistics: Concepts and Controversies, 1997) noted:
– respondents “were fed up with men and eager to fight them…”
– “the anger became the theme of the book…”
– “but angry women are more likely” to respond
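To see how unequal willingness to respond can distort a voluntary response sample, here is a small worked calculation. Only the 100,000 questionnaires and the 4.5% overall response rate come from the slide; the 20% share of "angry" recipients and the two response rates are invented numbers, chosen only so that the total matches 4.5%.

```python
MAILED = 100_000                # questionnaires sent (from the slide)

# Everything below is invented for illustration only.
ANGRY_SHARE_PER_1000 = 200      # assume 20% of recipients are "angry"
ANGRY_RESPONSE_PER_1000 = 145   # assume 14.5% of them reply
OTHER_RESPONSE_PER_1000 = 20    # assume 2% of everyone else replies

angry = MAILED * ANGRY_SHARE_PER_1000 // 1000
others = MAILED - angry
angry_replies = angry * ANGRY_RESPONSE_PER_1000 // 1000
other_replies = others * OTHER_RESPONSE_PER_1000 // 1000
total_replies = angry_replies + other_replies

response_rate = total_replies / MAILED
angry_share_among_replies = angry_replies / total_replies

print(f"overall response rate: {response_rate:.1%}")
print(f"angry share of population: {angry / MAILED:.0%}")
print(f"angry share of respondents: {angry_share_among_replies:.0%}")
```

Under these made-up rates, a group that is 20% of the population supplies about 64% of the returned questionnaires, which is the kind of systematic distortion the slide describes.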

Page 27

Convenience Sampling

Sampling mice from a large cage to study how a drug affects physical activity
– lab assistant reaches into the cage to select the mice one at a time until 10 are chosen

Which mice will likely be chosen?
– could this sample yield biased results?

Page 28

Simple Random Sampling

Each individual in the population has the same chance of being chosen for the sample

Each group of individuals (in the population) of the required size (n) has the same chance of being the sample actually selected

Random selection:
– “drawing names out of a hat”
– table of random digits
– computer software
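As an illustration of the "computer software" option, the sketch below draws a simple random sample with Python's standard `random` module, whose `sample` method selects without replacement. The population of 30 labeled individuals, the sample size of 5, and the helper name `simple_random_sample` are assumptions for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct individuals; every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical population: 30 individuals labeled 1..30.
population = range(1, 31)
sample = simple_random_sample(population, n=5, seed=2015)
print(sorted(sample))
```

Passing a seed makes the draw repeatable, which is convenient for checking work; leaving it out gives a fresh sample each time.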

Page 29

Table of Random Digits

Table B on pg. 654 of text
– each entry is equally likely to be any of the 10 digits 0 through 9
– entries are independent of each other (knowledge of one entry gives no information about any other entries)
– each pair of entries is equally likely to be any of the 100 pairs 00, 01, …, 99
– each triple of entries is equally likely to be any of the 1000 values 000, 001, …, 999

Page 30

Choosing a Simple Random Sample (SRS)

STEP 1: Label each individual in the population

STEP 2: Use Table B to select labels at random
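The two steps can be sketched in code as follows. Table B itself is not reproduced in these slides, so the digit string below is a made-up stand-in rather than an actual row of the table, and the population of 30 and the function name `srs_from_digits` are likewise assumptions. The sketch labels individuals 01 to 30, reads the digits in two-digit groups, and skips any group that is not a valid label or repeats an earlier choice.

```python
def srs_from_digits(digits, population_size, n):
    """Pick n distinct labels by reading fixed-width groups from a digit string."""
    width = len(str(population_size))          # digits per label, e.g. 2 for 30
    stream = "".join(ch for ch in digits if ch.isdigit())
    chosen = []
    for start in range(0, len(stream) - width + 1, width):
        label = int(stream[start:start + width])
        # Skip groups that are not valid labels or were already chosen.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of digits before the sample was complete")

# STEP 1: label the individuals 01..30 (a hypothetical population of 30).
# STEP 2: read two-digit groups from a made-up line of digits standing in
# for a row of Table B.
made_up_row = "40312 87965 01524 77290 31846 05519 62037 18454"
print(srs_from_digits(made_up_row, population_size=30, n=4))  # [28, 1, 19, 3]
```

Reading the row as 40, 31, 28, 79, 65, 01, … the first valid labels are 28, 01, 19 and 03, so those four individuals form the sample.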

Page 31

Probability Sample

a sample chosen by chance

must know what samples are possible and what chance, or probability, each possible sample has of being selected

a SRS gives each member of the population an equal chance to be selected

Page 32

Stratified Random Sample

first divide the population into groups of similar individuals, called strata

second, choose a separate SRS in each stratum

third, combine these SRSs to form the full sample

Page 33

Stratified Random Sample: Example

Suppose a university has the following student demographics:

Undergraduate   Graduate   First Professional   Special
55%             20%        5%                   20%

A stratified random sample of 100 students could be chosen as follows: select a SRS of 55 undergraduates, a SRS of 20 graduates, a SRS of 5 first professional students, and a SRS of 20 special students; combine these 100 students.
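The example can be sketched in code as below. The percentages and the sample size of 100 come from the slide; the roster of 10,000 students, the stratum keys, and the helper `stratified_sample` are hypothetical. Each stratum gets its own simple random sample, sized in proportion to its share of the population, and the pieces are then combined.

```python
import random

def stratified_sample(strata, allocation, seed=None):
    """Draw a separate SRS from each stratum and combine the results."""
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        combined.extend(rng.sample(members, allocation[name]))
    return combined

# Hypothetical roster of 10,000 students split 55/20/5/20 as on the slide.
sizes = {"undergraduate": 5500, "graduate": 2000,
         "first professional": 500, "special": 2000}
strata = {name: [f"{name}-{i}" for i in range(size)]
          for name, size in sizes.items()}

# Proportional allocation for a total sample of 100 students.
total = sum(sizes.values())
allocation = {name: 100 * size // total for name, size in sizes.items()}

sample = stratified_sample(strata, allocation, seed=1)
print(allocation)
print(len(sample))
```

Integer division works cleanly here because the shares divide 100 evenly; with other percentages the allocation would need a rounding rule.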

Page 34

Multistage Sample

several stages of sampling are carried out

useful for large-scale sample surveys

samples at each stage may be SRSs, but are often stratified

stages may involve other random sampling techniques as well (cluster, systematic, random digit dialing, …)

Page 35

Cautions about Sample Surveys

Undercoverage
– some individuals or groups in the population are left out of the process of choosing the sample

Nonresponse
– individuals chosen for the sample cannot be contacted or refuse to cooperate/respond

Response bias
– behavior of respondent or interviewer may lead to inaccurate answers or measurements

Wording of questions
– confusing or leading (biased) questions; words with different meanings

Page 36

Nonresponse

To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.
– 4.5% responded
– Hite used those responses to write her book
– angry women are more likely to respond

Page 37

Response Bias

A door-to-door survey is being conducted to determine drug use (past or present) of members of the community. Respondents may give socially acceptable answers (maybe not the truth!)

For this survey on drug use, would it matter if a police officer is conducting the interview? (bias from interviewer)

Page 38

Response Bias

Asking the Uninformed
Washington Post National Weekly Edition (April 10-16, 1995, p. 36)

A 1978 poll done in Cincinnati asked people whether they “favored or opposed repealing the 1975 Public Affairs Act.”
– There was no such act!
– About one third of those asked expressed an opinion about it.

Page 39

Wording of Questions

A newsletter distributed by a politician to his constituents gave the results of a “nationwide survey on Americans’ attitudes about a variety of educational issues.” One of the questions asked was, “Should your legislature adopt a policy to assist children in failing schools to opt out of that school and attend an alternative school--public, private, or parochial--of the parents’ choosing?” From the wording of this question, can you speculate on what answer was desired? Explain.

Page 40

Wording: Deliberate Bias

“If you found a wallet with $20 in it, would you return the money?”

“If you found a wallet with $20 in it, would you do the right thing and return the money?”

Page 41

Wording: Unintentional Bias

“I have taught several students over the past few years.”
– How many students do you think I have taught?
– How many years am I referring to?

“Over the past few days, how many servings of fruit have you eaten?”
– How many days are you considering?
– What constitutes a serving?

Page 42

Wording: Unnecessary Complexity

“Do you sometimes find that you have arguments with your family members and co-workers?”
– Arguments with family members
– Arguments with co-workers

Page 43

Wording: Ordering of Questions

“How often do you normally go out on a date? about ___ times a month.”

“How happy are you with life in general?”
– Strong association between the answers when the questions are asked in this order.
– If the ordering is reversed, then there would be no strong association between these questions

Page 44

Inferences about the Population

Values calculated from samples are used to draw conclusions (inferences) about unknown values in the population

Variability
– different samples from the same population may yield different results for a particular value of interest
– estimates from random samples tend to be closer to the true values in the population when the samples are larger
– how close the estimates will likely be to the true values can be calculated; this is called the margin of error
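The slides do not give a formula for the margin of error. As a hedged illustration, the sketch below uses the usual large-sample approximation for a sample proportion at 95% confidence, 1.96 × sqrt(p(1 − p)/n), and pairs it with a small simulation of repeated random samples. The true proportion of 0.5, the sample sizes, the 200 repetitions, and the function names are all assumptions chosen for the example.

```python
import math
import random

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion from an SRS."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def sample_proportion(true_p, n, rng):
    """Proportion of 'yes' answers in one simulated random sample of size n."""
    return sum(rng.random() < true_p for _ in range(n)) / n

rng = random.Random(44)
TRUE_P = 0.5   # hypothetical population proportion
for n in (100, 400, 1600):
    # Draw 200 samples of size n and see how widely the estimates vary.
    estimates = [sample_proportion(TRUE_P, n, rng) for _ in range(200)]
    spread = max(estimates) - min(estimates)
    print(n, round(margin_of_error(0.5, n), 3), round(spread, 3))
```

Under this approximation, quadrupling the sample size halves the margin of error (about 0.098 at n = 100 and 0.049 at n = 400), and the simulated estimates should cluster more tightly around the true value as n grows.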