chapter 4: gathering data section 4.1 should we experiment or should we merely observe?

Chapter 4: Gathering Data

Section 4.1

Should We Experiment or Should We Merely Observe?

Learning Objectives:

1. Population versus Sample

2. Types of Studies: Experimental and Observational

3. Comparing Experimental and Observational Studies

Learning Objective 1: Population and Sample

Population: all the subjects of interest. We use statistics to learn about the population, the entire group of interest.

Sample: a subset of the population.

Data are collected for the sample because we typically cannot measure all subjects in the population.

Learning Objective 2: Type of Study: Observational Study

In an observational study, the researcher observes values of the response variable and explanatory variables for the sampled subjects, without anything being done to the subjects (such as imposing a treatment).

Learning Objective 2: Observational Study: Sample Survey

A sample survey selects a sample of people from a population and interviews them to collect data.

A sample survey is a type of observational study.

A census is a survey that attempts to count the number of people in the population and to measure certain characteristics about them.

Learning Objective 2: Type of Study: Experiment

A researcher conducts an experiment by assigning subjects to certain experimental conditions and then observing outcomes on the response variable.

The experimental conditions, which correspond to assigned values of the explanatory variable, are called treatments.

Learning Objective 2: Example

Headline: “Student Drug Testing Not Effective in Reducing Drug Use”

Facts about the study:

• 76,000 students nationwide.

• Schools selected for the study included schools that tested for drugs and schools that did not test for drugs.

• Each student filled out a questionnaire asking about his/her drug use.

Conclusion: Drug use was similar in schools that tested for drugs and schools that did not test for drugs.

This study was an observational study.

For it to be an experiment, the researcher would have had to assign each school to use or not use drug testing, rather than leaving this decision to the school.

Learning Objective 3: Comparing Experiments and Observational Studies

An experiment reduces the potential for lurking variables to affect the result. Thus, an experiment gives the researcher more control over outside influences.

Only an experiment can establish cause and effect. Observational studies cannot.

Experiments are not always possible due to ethical reasons, time considerations, and other factors.

Chapter 4: Gathering Data

Section 4.2

What Are Good Ways and Poor Ways to Sample?


1. Sampling Frame & Sampling Design

2. Simple Random Sample (SRS)

3. Random number table

4. Margin of Error

5. Convenience Samples

6. Types of Bias in Sample Surveys

7. Key Parts of a Sample Survey

Learning Objective 1: Sampling Frame & Sampling Design

The sampling frame is the list of subjects in the population from which the sample is taken; ideally, it lists the entire population of interest.

The sampling design determines how the sample is selected. Ideally, it should give each subject an equal chance of being selected to be in the sample.

Learning Objective 2: Simple Random Sampling (SRS)

Random sampling is the best way of obtaining a sample that is representative of the population.

A simple random sample of ‘n’ subjects from a population is one in which each possible sample of that size has the same chance of being selected.

Learning Objective 2: SRS Example

Two club officers are to be chosen for a New Orleans trip.

There are 5 officers: President, Vice-President, Secretary, Treasurer, and Activity Coordinator.

The 10 possible samples are:

(P,V) (P,S) (P,T) (P,A) (V,S)

(V,T) (V,A) (S,T) (S,A) (T,A)

For an SRS, each of the ten possible samples has an equal chance of being selected. Thus, each sample has a 1 in 10 chance of being selected. Each officer appears in 4 of the 10 samples, so each officer has a 4 in 10 (that is, 2 in 5) chance of being selected.
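Both probabilities can be checked by listing every possible sample. The short Python sketch below is only an illustration; the single-letter labels follow the slide's abbreviations.

```python
from fractions import Fraction
from itertools import combinations

# Officers, abbreviated as on the slide
officers = ["P", "V", "S", "T", "A"]

# Every possible sample of 2 officers (order does not matter)
samples = list(combinations(officers, 2))

# In an SRS, each possible sample is equally likely
p_sample = Fraction(1, len(samples))

# An officer's chance = (number of samples containing that officer) x p_sample
p_officer = {o: sum(o in s for s in samples) * p_sample for o in officers}

print(len(samples))    # 10
print(p_sample)        # 1/10
print(p_officer["P"])  # 2/5
```

Using `Fraction` keeps the answers exact, so 4 × 1/10 prints as 2/5 rather than a rounded decimal.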

Learning Objective 3: SRS: Table of Random Numbers

Table E on p. A6 of the text is a table of random numbers.

Learning Objective 3: Using Random Numbers to Select an SRS

To select a simple random sample:

• Number the subjects in the sampling frame using numbers of the same length (number of digits).

• Select numbers of that length from a table of random numbers or by using a random number generator.

• Include in the sample those subjects whose numbers equal the random numbers selected.

We need to select a random sample of 5 from a class of 20 students.

1) List and number all members of the population, which is the class of 20.

2) The number 20 is two digits long.

3) Parse the list of random digits into numbers that are two digits long. Here we chose to start with line 103, for no particular reason.

Learning Objective 3: Choosing a Simple Random Sample

Class roster:

01 Alison, 02 Amy, 03 Brigitte, 04 Darwin, 05 Emily, 06 Fernando, 07 George, 08 Harry, 09 Henry, 10 John, 11 Kate, 12 Max, 13 Moe, 14 Nancy, 15 Ned, 16 Paul, 17 Ramon, 18 Rupert, 19 Tom, 20 Victoria

• Remember that 1 is 01, 2 is 02, etc.

• Numbers 00 and 21 through 99 are not assigned to anyone, so skip them.

• If you were to hit 09 again before getting five people, don’t sample Henry twice; just keep going.

4) Choose a random sample of size 5 by reading through the list of two-digit random numbers, starting with line 2 and continuing onward.

5) The first five random numbers that match numbers assigned to people make up the SRS.

Line 2: 22 36 84 65 73 25 59 58 53 93 30 99 58 91 98 27 98 25 34 02

The first individual selected is Amy, number 02. That is the only match in line 2, so move to line 3.

Line 3: 24 13 04 83 60 22 52 79 72 65 76 39 36 48 09 15 17 92 48 30

The next matches are Moe (13), Darwin (04), Henry (09), and Ned (15), which completes the sample.
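The table-reading procedure is mechanical enough to write as a short program. This Python sketch uses a hypothetical helper, `select_srs`, that reads the two digit lines from the example two digits at a time, skips numbers that are unassigned or already used, and stops after five matches.

```python
# Lines 2 and 3 of the random number table used in the example
line_2 = "22 36 84 65 73 25 59 58 53 93 30 99 58 91 98 27 98 25 34 02"
line_3 = "24 13 04 83 60 22 52 79 72 65 76 39 36 48 09 15 17 92 48 30"

students = ["Alison", "Amy", "Brigitte", "Darwin", "Emily", "Fernando", "George",
            "Harry", "Henry", "John", "Kate", "Max", "Moe", "Nancy", "Ned", "Paul",
            "Ramon", "Rupert", "Tom", "Victoria"]

# Two-digit labels of the same length: "01" ... "20"
labels = {f"{i:02d}": name for i, name in enumerate(students, start=1)}

def select_srs(digit_lines, labels, n):
    """Hypothetical helper: read two-digit numbers in order and keep the first n distinct matches."""
    chosen = []
    for line in digit_lines:
        for number in line.split():
            # Skip numbers assigned to no one (00, 21-99) and people already chosen
            if number in labels and labels[number] not in chosen:
                chosen.append(labels[number])
                if len(chosen) == n:
                    return chosen
    raise ValueError("ran out of random digits before reaching n")

sample = select_srs([line_2, line_3], labels, 5)
print(sample)  # ['Amy', 'Moe', 'Darwin', 'Henry', 'Ned']
```

With a software random number generator instead of a printed table, Python's `random.sample(students, 5)` draws an SRS of 5 directly.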

Learning Objective 4: Margin of Error

Sample surveys are commonly used to estimate population percentages.

These estimates include a margin of error, which tells us how close the sample estimate is likely to be to the population percentage.

When an SRS of n subjects is used, the margin of error is approximately

$$\frac{1}{\sqrt{n}} \times 100\%$$
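To see how the approximation 100%/√n behaves, the sketch below computes it for a few sample sizes. The function name is hypothetical. Quadrupling n only halves the margin of error.

```python
import math

def approx_margin_of_error(n):
    """Approximate margin of error, in percentage points, for an SRS of size n: 100% / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 100 / math.sqrt(n)

for n in (100, 400, 1000, 2500):
    print(n, round(approx_margin_of_error(n), 1))
# 100 10.0
# 400 5.0
# 1000 3.2
# 2500 2.0
```

Working backwards, a margin of error of about 3 percentage points corresponds to n ≈ (100/3)² ≈ 1,111 subjects.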

Learning Objective 4: Example: Margin of Error

A survey result states: “The margin of error is plus or minus 3 percentage points.”

This means: “It is very likely that the reported sample percentage is no more than 3 percentage points lower or 3 percentage points higher than the population percentage.”

Learning Objective 5: Convenience Samples: Poor Ways to Sample

Convenience sample: a type of survey sample that is easy to obtain.

• Unlikely to be representative of the population.

• Often results in severe biases.

• Results apply ONLY to the observed subjects.

Learning Objective 5: Convenience Samples: Poor Ways to Sample

Volunteer sample: the most common form of convenience sample; subjects volunteer for the sample.

Volunteers do not tend to be representative of the entire population.

Learning Objective 6: Types of Bias in Sample Surveys

Bias: the tendency to systematically favor certain parts of the population over others.

Sampling bias: bias resulting from the sampling method, such as using nonrandom samples or having undercoverage.

Nonresponse bias: occurs when some sampled subjects cannot be reached, refuse to participate, or fail to answer some questions.

Response bias: occurs when the subject gives an incorrect response or the question is misleading.

A Large Sample Does Not Guarantee An Unbiased Sample!
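A small simulation shows why sample size cannot fix bias. The numbers are made up for illustration: 40% of a very large population is assumed to favor a proposal, and people in favor are assumed to volunteer three times as often as others (0.6 versus 0.2). Under these assumptions the volunteer-sample estimate settles near 0.24/0.36 ≈ 67% instead of the true 40%, however large the sample gets.

```python
import random

def volunteer_estimate(sample_size, rng):
    """Proportion in favor among a volunteer (self-selected) sample.

    Hypothetical assumptions: 40% of a very large population is in favor;
    people in favor volunteer with probability 0.6, everyone else with 0.2.
    """
    in_favor = 0
    collected = 0
    while collected < sample_size:
        favors = rng.random() < 0.40                          # a random member of the population
        volunteers = rng.random() < (0.6 if favors else 0.2)
        if volunteers:                                        # only volunteers end up in the sample
            collected += 1
            in_favor += favors
    return in_favor / collected

rng = random.Random(1)
for n in (100, 10_000, 100_000):
    print(n, round(volunteer_estimate(n, rng), 3))   # stays near 0.667, not 0.40
```

A larger sample only makes the estimate settle more precisely on the wrong value.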

Learning Objective 7: Key Parts of a Sample Survey

• Identify the population of all subjects of interest.

• Construct a sampling frame that attempts to list all subjects in the population.

• Use a random sampling design to select n subjects from the sampling frame.

• Be cautious of sampling bias due to nonrandom samples.

We can make inferences about the population of interest when sample surveys that use random sampling are employed.


Section 4.3

What Are Good Ways and Poor Ways to Experiment?


1. Identify the elements of an experiment

2. Experiments

3. Three components of a good experiment

4. Blinding the Study

5. Define Statistical Significance

6. Generalizing Results of the Study

Learning Objective 1: Elements of an Experiment

Experimental units: the subjects of an experiment; the entities that we measure in an experiment.

Treatment: a specific experimental condition imposed on the subjects of the study; the treatments correspond to assigned values of the explanatory variable.

Explanatory variable: defines the groups to be compared with respect to values on the response variable.

Response variable: the outcome measured on the subjects to reveal the effect of the treatment(s).

Learning Objective 2: Experiments

An experiment deliberately imposes treatments on the experimental units in order to observe their responses.

The goal of an experiment is to compare the effects of the treatments on the response.

In a randomized experiment, the subjects are randomly assigned to the treatments; randomization helps to eliminate the effects of lurking variables.

Learning Objective 3: Three Components of a Good Experiment

Control/comparison group: allows the researcher to analyze the effectiveness of the primary treatment.

Randomization: eliminates possible researcher bias and balances the comparison groups on known as well as lurking variables.

Replication: allows us to attribute observed effects to the treatments rather than to ordinary variability.

Learning Objective 3: Principle 1: Control or Comparison Group

A placebo is a dummy treatment, such as a sugar pill. Many subjects respond favorably to any treatment, even a placebo.

A control group typically receives a placebo and allows us to analyze the effectiveness of the primary treatment. However, a control group need not receive a placebo: clinical trials often compare a new treatment for a medical condition not with a placebo but with a treatment that is already on the market.

Learning Objective 3: Principle 1: Control or Comparison Group

Experiments should compare treatments rather than attempt to assess the effect of a single treatment in isolation. Is the treatment group better, worse, or no different than the control group?

Example: 400 volunteers are asked to quit smoking, and each starts taking an antidepressant. In 1 year, how many have relapsed? Without a control group (individuals who are not on the antidepressant), it is not possible to gauge the effectiveness of the antidepressant.

Learning Objective 3: Placebo Effect

Placebo effect (power of suggestion): an improvement in health due not to any treatment but only to the patient’s belief that he or she will improve.

Learning Objective 3: Principle 2: Randomization

To have confidence in our results, we should randomly assign subjects to the treatments. In doing so, we:

• Eliminate bias that may result from the researcher assigning the subjects.

• Balance the groups on variables known to affect the response.

• Balance the groups on lurking variables that may be unknown to the researcher.
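Random assignment itself takes only a few lines of code. The sketch below uses a hypothetical `randomize` helper that shuffles the subjects and deals them into treatment groups of equal size, applied to the 400-volunteer smoking example; the volunteer labels, group names, and seed are illustrative.

```python
import random

def randomize(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments in groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)                  # chance, not the researcher, decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)  # deal out like cards
    return groups

volunteers = [f"volunteer_{i:03d}" for i in range(1, 401)]
groups = randomize(volunteers, ["antidepressant", "control"], seed=2024)
for treatment, members in groups.items():
    print(treatment, len(members))
# antidepressant 200
# control 200
```

Because the shuffle ignores every characteristic of the volunteers, known and lurking variables alike tend to be balanced across the two groups.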

Learning Objective 3: Principle 3: Replication

Replication is the process of assigning several experimental units to each treatment.

• The difference due to ordinary variation is smaller with larger samples.

• We have more confidence that the sample results reflect a true difference due to treatments when the sample size is large.

Since it is always possible that the observed effects were due to chance alone, replicating the experiment also builds confidence in our conclusions.

Learning Objective 4: Blinding the Experiment

Ideally, subjects are unaware of, or blind to, the treatment they are receiving.

If an experiment is conducted in such a way that neither the subjects nor the investigators working with them know which treatment each subject is receiving, then the experiment is double-blinded.

A double-blinded experiment controls response bias from both the respondents and the experimenters.

Learning Objective 5: Define Statistical Significance

If an experiment (or other study) finds a difference between two (or more) groups, is this difference really important?

If the observed difference is larger than what would be expected just by chance, then it is labeled statistically significant.

Rather than relying solely on the label of statistical significance, also look at the actual results to determine whether they are practically significant.

Learning Objective 6: Generalizing Results

Recall that the goal of experimentation is to analyze the association between the treatment and the response for the population, not just the sample.

However, care should be taken to generalize the results of a study only to the population that is represented by the study.


Section 4.4

What Are Other Ways to Conduct Experimental and Observational Studies?

Learning Objectives:

1. Sample Surveys: Other Random Sampling Designs

2. Types of Observational Studies: Prospective and Retrospective

3. Multifactor Experiment

4. Matched pairs design

5. Randomized block design

Learning Objective 1: Sample Surveys: Random Sampling Designs

It is not always possible to conduct an experiment, so it is necessary to have well-designed, informative studies that are not experimental, e.g., sample surveys that use randomization:

• Simple random sampling

• Cluster sampling

• Stratified random sampling

Learning Objective 1: Sample Surveys: Cluster Random Sample

Cluster Random Sample Steps:

• Divide the population into a large number of clusters, such as city blocks.

• Select a simple random sample of the clusters.

• Use the subjects in those clusters as the sample.
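The cluster sampling steps can be sketched in Python as follows. The `cluster_sample` helper and the population of 50 city blocks with 4 households each are hypothetical. Only the list of clusters needs to be known in advance, not a full list of households.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of clusters, then keep every subject in the chosen clusters.

    clusters: dict mapping a cluster id (e.g. a city block) to its list of subjects.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # SRS of the clusters
    return [s for c in chosen for s in clusters[c]]      # all subjects in those clusters

# Hypothetical population: 50 city blocks with 4 households each
blocks = {
    f"block{b:02d}": [f"block{b:02d}_house{h}" for h in range(1, 5)]
    for b in range(1, 51)
}
sample = cluster_sample(blocks, n_clusters=5, seed=7)
print(len(sample))  # 20
```
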

Learning Objective 1: Sample Surveys: Cluster Random Sample

A cluster random sample is preferable when:

• A reliable sampling frame is unavailable.

• The cost of selecting an SRS is excessive.

Disadvantage: a cluster random sample usually needs a larger sample size than an SRS to achieve a particular margin of error.

Learning Objective 1: Sample Surveys: Stratified Random Sample

Stratified Random Sample Steps:

• Divide the population into separate groups, called strata.

• Select a simple random sample from each stratum.

• Combine the samples from all strata to form the complete sample.

Learning Objective 1: Sample Surveys: Stratified Random Sample

Advantage: you can include in your sample enough subjects from each stratum you want to evaluate.

Disadvantage: you must have a sampling frame and know the stratum to which each subject belongs.

Learning Objective 1: Stratified Random Sample: Example

Suppose a university has the following student demographics:

Undergraduate   Graduate   First Professional   Special
55%             20%        5%                   20%

To ensure proper coverage of each demographic, a stratified random sample of 100 students could be chosen as follows: select an SRS of 55 undergraduates, an SRS of 20 graduate students, an SRS of 5 first professional students, and an SRS of 20 special students; then combine these 100 students.
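The university example can be sketched in Python. The `stratified_sample` helper is hypothetical, and so is the enrollment of 1,000 students (550 / 200 / 50 / 200), chosen only to match the percentages. Each stratum gets its own SRS, sized in proportion to its share of the population.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS of the requested size within each stratum, then combine them."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))  # SRS within this stratum
    return sample

# Hypothetical university of 1,000 students split 55% / 20% / 5% / 20%
counts = {"Undergraduate": 550, "Graduate": 200, "First Professional": 50, "Special": 200}
strata = {name: [f"{name}_{i}" for i in range(count)] for name, count in counts.items()}

n = 100
total = sum(counts.values())
# Proportional allocation; floor division can leave the total a little under n
# for other percentages, but here it gives exactly 55 + 20 + 5 + 20 = 100.
sizes = {name: n * count // total for name, count in counts.items()}

sample = stratified_sample(strata, sizes, seed=3)
print(sizes)
print(len(sample))  # 100
```
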

Learning Objective 1: Comparing Random Sampling Methods

Learning Objective 2: Types of Observational Studies

An observational study can yield useful information when an experiment is not practical.

Types of observational studies:

• Sample survey: attempts to take a cross section of a population at the current time.

• Retrospective study: looks into the past.

• Prospective study: follows its subjects into the future.

Causation can never be definitively established with an observational study, but well-designed studies can provide supporting evidence for the researcher’s beliefs.

Learning Objective 2: Retrospective Case-Control Study

A case-control study is a retrospective observational study in which subjects who have a response outcome of interest (the cases) and subjects who have the other response outcome (the controls) are compared on an explanatory variable.

Learning Objective 2: Example: Case-Control Study

Response outcome of interest: lung cancer. The cases have lung cancer; the controls did not have lung cancer.

The two groups were compared on the explanatory variable smoker/nonsmoker.

Smoker          Cases (lung cancer)   Controls (no lung cancer)
Yes             688                   650
No              21                    59
Total           709                   709
Prob(smoker)    97%                   92%
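The Prob(smoker) row is the proportion of smokers within each column. A quick Python check reproduces it from the counts in the table; nothing here goes beyond the table itself.

```python
# Counts from the lung cancer case-control table
counts = {
    "cases":    {"smoker": 688, "nonsmoker": 21},
    "controls": {"smoker": 650, "nonsmoker": 59},
}

prop_smokers = {}
for group, c in counts.items():
    total = c["smoker"] + c["nonsmoker"]
    prop_smokers[group] = c["smoker"] / total
    print(f"{group}: n = {total}, proportion of smokers = {prop_smokers[group]:.1%}")
# cases: n = 709, proportion of smokers = 97.0%
# controls: n = 709, proportion of smokers = 91.7%
```

Because the researcher chose how many cases and controls to include (709 of each), the table supports comparing the proportion of smokers among cases and controls, not the proportion of smokers who develop lung cancer.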

Learning Objective 2: Example: Prospective Study

Nurses’ Health Study: began in 1976 with 121,700 female nurses aged 30 to 55; questionnaires are filled out every two years.

Its purpose was to explore the relationships among diet, hormonal factors, smoking habits, and exercise habits and the risk of coronary heart disease, pulmonary disease, and stroke.

Nurses are followed into the future to determine whether they eventually develop an outcome such as lung cancer and whether certain explanatory variables are associated with it.

Learning Objective 3: Multifactor Experiments

A multifactor experiment uses a single experiment to analyze the effects of two or more explanatory variables on the response.

Categorical explanatory variables in an experiment are often called factors.

We are often able to learn more from a multifactor experiment than from separate one-factor experiments, since the response may vary for different factor combinations.

Learning Objective 3: Example: Multifactor Experiment

Examine the effectiveness of both Zyban and nicotine patches on quitting smoking:

• Two-factor experiment

• 4 treatments (Zyban or not, crossed with nicotine patch or not)


Another multifactor example (television commercials):

• Subjects: a certain number of undergraduate students.

• All subjects viewed a 40-minute television program that included ads for a digital camera.

• Some subjects saw a 30-second commercial; others saw a 90-second version.

• The same commercial was shown either 1, 3, or 5 times during the program.

• There were two factors: length of the commercial (2 values) and number of repetitions (3 values).

• The 6 combinations of one value of each factor form six treatments.

                      Factor B: Repetitions
Factor A: Length      1 time    3 times    5 times
30 seconds            1         2          3
90 seconds            4         5          6

(Each cell gives the treatment number for that combination of factor values.)

Subjects assigned to Treatment 3 see a 30-second ad five times during the program.

After viewing, all subjects answered questions about their recall of the ad, their attitude toward the commercial, and their intention to purchase the product; these were the response variables.
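Crossing the factor values is what produces the treatments. In Python, `itertools.product` lists the combinations in the same row-by-row order as the table above, so the numbering in this sketch matches Treatments 1 to 6.

```python
from itertools import product

lengths = ["30 seconds", "90 seconds"]   # Factor A: length of the commercial
repetitions = [1, 3, 5]                  # Factor B: number of times shown

# Every combination of one value of each factor is a treatment
treatments = {i: combo for i, combo in enumerate(product(lengths, repetitions), start=1)}

for number, (length, reps) in treatments.items():
    print(f"Treatment {number}: {length} ad shown {reps} time(s)")
```

With 2 × 3 = 6 treatments, each subject would then be randomly assigned to one of them.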

Learning Objective 4: Matched Pairs Design

In a matched pairs design, the subjects receiving the two treatments are somehow matched (same person, husband/wife, two plots in the same field, etc.).

• In a crossover design, the same individual is used for both treatments.

• Randomly assign the two treatments to the two matched subjects, or randomize the order of applying the treatments in a crossover design.

• The number of replicates equals the number of pairs.

• Matching helps to reduce the effects of lurking variables.
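Both randomization options can be sketched in Python. The helper names, the treatment labels "A" and "B", and the units (two plots in each of four fields, three subjects) are hypothetical. Within each pair a fair coin decides who gets which treatment. In a crossover design, each subject's treatment order is shuffled instead.

```python
import random

def assign_within_pairs(pairs, treatments=("A", "B"), seed=None):
    """For each matched pair, flip a fair coin to decide which member gets which treatment."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:
            plan.append({treatments[0]: first, treatments[1]: second})
        else:
            plan.append({treatments[0]: second, treatments[1]: first})
    return plan

def crossover_orders(subjects, treatments=("A", "B"), seed=None):
    """Crossover design: randomize the order in which each subject receives the treatments."""
    rng = random.Random(seed)
    return {s: tuple(rng.sample(treatments, len(treatments))) for s in subjects}

# Hypothetical units: two plots in each of four fields
plots = [(f"field{i}_plot1", f"field{i}_plot2") for i in range(1, 5)]
for row in assign_within_pairs(plots, seed=11):
    print(row)

print(crossover_orders(["subject1", "subject2", "subject3"], seed=5))
```
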

Learning Objective 5: Randomized Block Design

A block is a set of experimental units that are matched with respect to one or more characteristics.

A randomized block design (RBD) is one in which the random assignment of experimental units to treatments is carried out separately within each block.

Learning Objective 5: Example: Randomized Block Design

Block = gender; 3 treatments = 3 types of therapy. The men (as well as the women) are randomly assigned to the 3 treatments; differences can be compared with respect to gender as well as therapy type.
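The defining feature of an RBD is a separate randomization inside each block. This Python sketch uses a hypothetical helper with 6 men and 6 women as the two blocks and placeholder therapy names. Each block is shuffled on its own and dealt evenly across the three therapies, so every therapy gets 2 men and 2 women.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Randomly assign units to treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)                  # a separate randomization for each block
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block, treatments[i % len(treatments)])
    return assignment

# Hypothetical subjects: 6 men and 6 women; three therapy types
blocks = {
    "men":   [f"man_{i}" for i in range(1, 7)],
    "women": [f"woman_{i}" for i in range(1, 7)],
}
therapies = ["therapy_1", "therapy_2", "therapy_3"]
assignment = randomized_block_design(blocks, therapies, seed=8)
for unit, (block, therapy) in sorted(assignment.items()):
    print(block, unit, therapy)
```

Because gender is balanced across the therapies by construction, differences between therapies cannot be explained by gender.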

Learning Objective 5: Randomized Block Design

An RBD eliminates variability in the response due to the blocking variable, which allows better comparisons to be made among the treatments of interest.

A matched pairs design is a special case of an RBD with two observations in each block.
