stats chapter 5

Chapter 5 Producing Data

Upload: richard-ferreria

Post on 03-Dec-2014

Page 1: Stats chapter 5

Chapter 5

Producing Data

Page 2: Stats chapter 5

5.1 DESIGNING SAMPLES

Page 3: Stats chapter 5

Some notes before we begin

• We are entering the second part of the statistics course: “Experimental Design”

• In most real-life applications, experimental design is where the process of statistics begins

• Provided experiments (and surveys) are carefully designed, we can use the techniques of statistics to analyze the results with increased “significance”

• Much of this material is covered in social science courses (e.g., psychology)

Page 4: Stats chapter 5

Population and Sample

Population

• The entire group of individuals about which we want information

Sample

• A subset of the population that is examined in greater detail

• Results of the sample are generalized to the population.

Page 5: Stats chapter 5

Sample vs. Census

Census

• Information gathered from the entire population (no exceptions!)

• Produces the most accurate description of the population

• Usually expensive or impossible

Page 6: Stats chapter 5

Samples

• The success or failure of a study or experiment depends on good sampling technique

• We want our sample to “look like” our population

– We would like to minimize the effect of outlier observations

– We would like to decrease ‘variability’ in our sample

– We would like to decrease ‘bias’

Page 7: Stats chapter 5

Some ‘bad’ sampling techniques

Voluntary Response Sampling

• Most often seen as a ‘call-in’ poll or an ‘internet poll’

• People with strong, often negative opinions are most likely to respond

• Polls are easily “fixed”

• This sampling technique and its results are not to be trusted!

Page 8: Stats chapter 5

Some ‘bad’ sampling techniques

Convenience Sampling

• Individuals in the sample consist of those who are easiest to reach

• Mall interviews

– The sample is only valid for people who visit the mall (this is not everyone!)

– The sample tends to consist of the “easiest targets”

• Some telephone studies

• This is not to say that samples must be difficult to construct; they just cannot consist of only the easiest individuals to sample

Page 9: Stats chapter 5

Bias

• In statistics, bias refers to the systematic favoring of one outcome over another

• Try not to confuse this definition with a non-statistical definition

• Bias is enemy #1 for sampling technique

Page 10: Stats chapter 5

Some notation

• The lowercase script ‘n’ always denotes the number of individuals in a sample

• The capital ‘N’ denotes the size of the population

• ‘Table B’ (inside back cover) is the table of random digits

• A random integer can be produced on a TI calculator with the command “randInt(a, b, n)”

– a = smallest number, b = largest number, n = number of random integers to produce (optional)
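
For readers without a TI calculator, the same idea can be sketched in Python. This is only an illustrative equivalent, not the calculator's own implementation; the helper name `rand_int` and the `seed` argument are assumptions added so the result can be reproduced.

```python
import random

def rand_int(a, b, n=1, seed=None):
    """Return n random integers from a to b inclusive, like randInt(a, b, n)."""
    rng = random.Random(seed)
    return [rng.randint(a, b) for _ in range(n)]

# Five random IDs for a population labeled 1 through 150
draws = rand_int(1, 150, n=5, seed=1)
print(draws)  # repeats are possible, just as on the calculator
```

Because repeats can occur, a sampling procedure built on this needs a rule for skipping IDs that were already chosen.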

Page 11: Stats chapter 5

Simple Random Samples

• This is THE sampling technique for this statistics course

– Other sampling techniques exist, but our course is focused on the results of an SRS

• Every possible sample of size n has an equal chance of being selected

• This is analogous to placing “names in a hat” or “drawing straws”

Page 12: Stats chapter 5

Choosing an SRS

1. Label Individuals

Assign each individual in the population a unique “ID”

Each ID should have the same # of digits

2. Select Individuals

Use Table B or your calculator to select individuals

3. Stopping rule

Indicate when you will stop sampling

4. Identify Sample

Indicate which individuals/ID#’s are included in your sample
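
The four steps can be sketched in Python as follows. This is a minimal illustration under assumptions: the function name `choose_srs`, the made-up `students` list, and the rule "skip repeated IDs, stop at n distinct IDs" are choices made for the example, with the random number generator standing in for Table B.

```python
import random

def choose_srs(population, n, seed=None):
    """Pick a simple random sample of size n using equal-width ID labels."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    width = len(str(len(population)))
    # Step 1: label every individual with a unique ID of the same width
    labels = {str(i + 1).zfill(width): person
              for i, person in enumerate(population)}
    chosen_ids = []
    # Steps 2 and 3: draw random IDs, skip repeats, stop at n distinct IDs
    while len(chosen_ids) < n:
        candidate = str(rng.randint(1, len(population))).zfill(width)
        if candidate not in chosen_ids:
            chosen_ids.append(candidate)
    # Step 4: report which IDs (and individuals) are in the sample
    return {cid: labels[cid] for cid in chosen_ids}

students = [f"student_{k}" for k in range(1, 31)]  # hypothetical population
sample = choose_srs(students, n=5, seed=42)
print(sample)
```

Skipping repeated IDs keeps every set of 5 individuals equally likely, which is what makes the result an SRS.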

Page 13: Stats chapter 5

Probability Samples

• Samples are chosen by chance

• All possible samples are known

• The probability of choosing each sample is known

• SRS is one example of a probability sample
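
For a very small population, "all possible samples are known" can be made concrete by listing them. The sketch below assumes a made-up population of five individuals and an SRS of size 2; under an SRS each listed sample has the same probability.

```python
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D", "E"]  # hypothetical individuals
n = 2

# Every possible sample of size n
all_samples = list(combinations(population, n))

# Under an SRS, each sample is equally likely
probability = 1 / comb(len(population), n)

print(len(all_samples), probability)  # 10 0.1
```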

Page 14: Stats chapter 5

Stratified Random Sample

• Population is divided into strata

– These strata are segments of the population that are similar in an important way

• Each stratum undergoes an SRS

• The samples from each stratum are combined to form the full sample

• A stratified sample ensures that all groups are represented at the appropriate proportion

– Would a sample that consists of 50% boys and 50% girls make sense for a population of IT consultants?

Page 15: Stats chapter 5

Stratified Random Sample

Suppose the population contains 100 juniors and 50 seniors

• We would like our samples to reflect this proportion between juniors and seniors

1. Choose an SRS of size n = 10 from the juniors

2. Choose an SRS of size n = 5 from the seniors

3. The 15 individuals chosen will be the sample for our Stratified Random Sample
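
The juniors and seniors example can be sketched in Python. The function name `stratified_sample`, the ID strings, and the 10% sampling fraction are assumptions chosen so that the stratum samples come out to 10 and 5.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Take an SRS of the same fraction from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)   # size of this stratum's SRS
        sample.extend(rng.sample(members, k))
    return sample

strata = {
    "juniors": [f"junior_{i}" for i in range(1, 101)],  # 100 juniors
    "seniors": [f"senior_{i}" for i in range(1, 51)],   # 50 seniors
}
sample = stratified_sample(strata, fraction=0.10, seed=0)
print(len(sample))  # 15
```

Using the same fraction in every stratum is what keeps the 2:1 ratio of juniors to seniors in the final sample.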

Page 16: Stats chapter 5

Cluster Sampling

1. The population is divided into clusters or groups

Each cluster must be representative of the population (no bias!)

2. One cluster is randomly chosen

Random ID selection (Table B, names in a hat, calculator)

3. The entire cluster that is chosen becomes the sample
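
A minimal Python sketch of these three steps, assuming made-up homerooms as the clusters; the function name `cluster_sample` and all names in the data are hypothetical.

```python
import random

def cluster_sample(clusters, seed=None):
    """Randomly choose one cluster; everyone in it becomes the sample."""
    rng = random.Random(seed)
    chosen = rng.choice(sorted(clusters))   # step 2: pick one cluster ID
    return chosen, list(clusters[chosen])   # step 3: the whole cluster

# Step 1: the population already divided into clusters
clusters = {
    "homeroom_A": ["Ana", "Ben", "Cy"],
    "homeroom_B": ["Dee", "Eli", "Fay"],
    "homeroom_C": ["Gus", "Hal", "Ivy"],
}
chosen, sample = cluster_sample(clusters, seed=2)
print(chosen, sample)
```

Note the contrast with a stratified sample: there, every stratum contributes a few individuals; here, one cluster contributes all of its individuals.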

Page 17: Stats chapter 5

Multistage sampling

• Used when the population is very large

• Take samples from the samples repeatedly until the sample size is “manageable”

• Refer to pg 341
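
One way to picture "samples from the samples" is a two-stage design: first sample groups, then sample individuals inside the chosen groups. The sketch below is an assumption-laden illustration (20 made-up schools of 200 students each, hypothetical function name `two_stage_sample`), not the textbook's example.

```python
import random

def two_stage_sample(groups, n_groups, n_per_group, seed=None):
    """Stage 1: SRS of groups. Stage 2: SRS of individuals in each chosen group."""
    rng = random.Random(seed)
    chosen_groups = rng.sample(sorted(groups), n_groups)
    sample = []
    for g in chosen_groups:
        sample.extend(rng.sample(groups[g], n_per_group))
    return chosen_groups, sample

schools = {
    f"school_{s}": [f"s{s}_student_{k}" for k in range(1, 201)]
    for s in range(1, 21)
}
chosen, sample = two_stage_sample(schools, n_groups=4, n_per_group=25, seed=3)
print(chosen, len(sample))
```

Here 4,000 students are reduced to a manageable sample of 100 while only four schools need to be visited.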

Page 18: Stats chapter 5

Cautions about Sample Surveys

Undercoverage

• Sample does not include all segments of the population, or systematically favors one segment of the population

• Many telephone samples will contain an undercoverage bias simply because many people do not have telephones (yes, it’s true)

• This is most serious when the “undercovered” individuals differ significantly from the rest of the population.

Page 19: Stats chapter 5

Cautions about Sample Surveys

Nonresponse

• Many people contacted for a survey choose not to participate

• Extremely significant if the non-responders differ from the responders

• Simply “sampling more people” will not eliminate bias, esp. if the bias is systematically linked to the nonresponse

– We are likely to get more nonresponse!

• We should either: (1) redesign the survey, or (2) follow up on the nonresponders

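
A small simulation can show why contacting more people does not cure nonresponse bias. All numbers here are assumptions made up for illustration: the true support is 50%, supporters respond 20% of the time, and opponents respond 60% of the time.

```python
import random

def survey_estimate(n_contacted, seed):
    """Estimate the proportion of supporters from those who respond."""
    rng = random.Random(seed)
    responses = []
    for _ in range(n_contacted):
        supports = rng.random() < 0.50           # true proportion is 50%
        p_respond = 0.20 if supports else 0.60   # assumed response rates
        if rng.random() < p_respond:
            responses.append(supports)
    return sum(responses) / len(responses)

small = survey_estimate(500, seed=1)
large = survey_estimate(50_000, seed=1)
print(round(small, 3), round(large, 3))
```

Under these assumptions the responders are about 25% supporters, so both estimates land far below the true 50%; the larger survey is simply more precisely wrong.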

Page 20: Stats chapter 5

Cautions about Sample Surveys

Response Bias

• Respondents answer in a way that is different from their actual opinion

• Can be caused by the interviewer

– Appearance- and gender-sensitive questions can be influenced by the appearance and gender of the interviewer

Page 21: Stats chapter 5

Cautions about Sample Surveys

Wording of Questions

• Questions that are “confusing”

– Complicated wording affects responses

• Questions that are “leading”

– Present a scenario that can influence a response before prompting for a response

– Use words that color the respondent's opinions

Page 22: Stats chapter 5

Sample Survey Wisdom

• Insist on knowing the following before trusting results:

1. The exact questions asked

2. Rate of nonresponse

3. Date and method of survey

• Larger random samples produce more accurate results than smaller random samples (a larger sample reduces variability, not bias)
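
The effect of sample size can be checked with a quick simulation. The sketch assumes a made-up population in which 60% hold some opinion, takes many random samples of each size, and measures how much the sample proportions vary from sample to sample.

```python
import random
import statistics

def spread_of_estimates(n, true_p=0.6, trials=1000, seed=0):
    """Standard deviation of the sample proportion across many samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        hits = sum(rng.random() < true_p for _ in range(n))
        estimates.append(hits / n)
    return statistics.stdev(estimates)

spread_small = spread_of_estimates(25)
spread_large = spread_of_estimates(400)
print(round(spread_small, 3), round(spread_large, 3))
```

The estimates from samples of 400 cluster much more tightly around 0.6 than those from samples of 25. This only helps when the sampling method itself is unbiased.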

Page 23: Stats chapter 5

Assignment 5.1A

#2, 6, 7, 9, 11, 24, 26, 32

Page 24: Stats chapter 5

5.2 DESIGNING EXPERIMENTS

Page 25: Stats chapter 5

Definitions

An experiment is conducted to reveal the response of one variable (response variable) to changes in other variables (explanatory variable/s)

Page 26: Stats chapter 5

Definitions

Experimental Units

• The individuals upon whom the experiment is conducted

• Human experimental units are called “subjects”

Treatment

• The specific experimental condition applied to the experimental units

Page 27: Stats chapter 5

Definitions

Factors

• Another term for explanatory variables in an experiment

• An experiment can examine the effects of multiple factors

Levels

• Factors can be applied to experimental units in different amounts or levels
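
When an experiment has several factors, each combination of one level per factor is a treatment. The sketch below lists those combinations; the factor names `dose_mg` and `schedule` and their levels are hypothetical.

```python
from itertools import product

# Two hypothetical factors with 3 and 2 levels
factors = {
    "dose_mg": [0, 50, 100],
    "schedule": ["morning", "evening"],
}

# Each treatment is one level of every factor
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for t in treatments:
    print(t)
print(len(treatments))  # 3 levels x 2 levels = 6 treatments
```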

Page 28: Stats chapter 5

Principles of Design

• Control

– Minimize the effect of confounding variables

– Obtain and apply treatments to exp. units

• Replication

– Minimize effects of outlier observations

– Use multiple exp. units

• Randomization

– Minimize effects of variability from individual responses

Page 29: Stats chapter 5

Control

• Try to separate the effects of the treatment from the effects of other variables

• Control Group

– Represents the population with no treatment

– Often given a placebo treatment

– Provides a “baseline” for comparison

• Don’t confuse “Control” (the principle) with “Control Group” (the treatment group)

Page 30: Stats chapter 5

Replication

• We would like exp. units within each treatment group to respond similarly to the treatment, and differently from exp. units in other treatment groups

• BUT variability (and outliers) exists throughout each treatment group

• If the experiment is replicated many times (many exp. units), the effects of variability (and outliers) will “average out”

Page 31: Stats chapter 5

Replication

• Use enough experimental units to reduce “chance variation”

• Replication (in terms of experimental design) does not mean “repeat the entire experiment”

• Remember: larger samples produce more accurate results than smaller samples, because chance variation averages out

Page 32: Stats chapter 5

Randomization

• Assign experimental units to treatments using a randomized design (SRS)

• Minimize bias due to individuals’ differing response levels to the treatments

Page 33: Stats chapter 5

Statistical Significance

• After experimentation, we hope to see a difference in response level that is large/measurable

• A difference so large that it would rarely happen “by chance” is called statistically significant

• We try to produce statistically significant results!

• We will discuss how large the difference must be in future chapters.
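
Ahead of the formal methods, the idea of a difference arising "by chance" can be previewed with a simulation. This sketch is not the method of the later chapters: it uses made-up response values and simply reshuffles the group labels many times to see how often chance alone produces a difference at least as large as the observed one.

```python
import random
import statistics

# Hypothetical responses, invented for illustration only
treatment = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [11.0, 12.2, 11.8, 12.9, 11.5, 12.4]
observed = statistics.mean(treatment) - statistics.mean(control)

rng = random.Random(0)
pooled = treatment + control
trials = 5000
count = 0
for _ in range(trials):
    rng.shuffle(pooled)  # pretend the group labels were assigned by chance
    diff = statistics.mean(pooled[:6]) - statistics.mean(pooled[6:])
    if diff >= observed - 1e-9:
        count += 1

chance_fraction = count / trials
print(round(observed, 2), chance_fraction)
```

A very small `chance_fraction` means random relabeling almost never matches the observed difference, which is the intuition behind calling a result statistically significant.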

Page 34: Stats chapter 5

Assignment 5.2A

• Pg 357 #33, 35, 37, 39, 40, 43, 45, 46, 67

Page 35: Stats chapter 5

Randomized Comparative Experiments

• Completely Randomized Design

– Most basic

• Block Design

– Used when we believe there is a difference in response levels of different groups

• Matched Pairs Design

– Compares only two treatments

– Measures effect of treatment on two very similar exp units

Page 36: Stats chapter 5

Completely Randomized Design

• Can be used for many treatments

• Exp units assigned to treatment group randomly

• Response in each treatment group is averaged

• Average of each treatment group is compared
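
A completely randomized design can be sketched in Python as a shuffle followed by dealing units into groups. The function names, the 12 made-up units, and the placeholder responses below are all assumptions for illustration.

```python
import random
import statistics

def completely_randomized(units, treatments, seed=None):
    """Randomly split units into equal-sized treatment groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the treatments in turn
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

def group_means(groups, responses):
    """Average the measured response within each treatment group."""
    return {t: statistics.mean(responses[u] for u in members)
            for t, members in groups.items()}

units = [f"unit_{k}" for k in range(1, 13)]
groups = completely_randomized(units, ["A", "B", "control"], seed=7)

# Placeholder responses, made up purely so the example runs
responses = {u: 10.0 + (k % 4) for k, u in enumerate(units)}
means = group_means(groups, responses)
print(groups)
print(means)
```

The final comparison is between the entries of `means`, one average per treatment group.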

Page 37: Stats chapter 5

Completely Randomized Design (Example Diagram)

Page 38: Stats chapter 5

Block Design

• This is an instance of control

• Exp units are known to fall into groups with similar response levels (e.g., gender differences)

• Exp units are “blocked” according to these groups

• Each block undergoes an SRS into treatment groups

Page 39: Stats chapter 5

Block Design

• Each treatment group is averaged and compared within the block

• Each block may (or may not) have a control group

• Form blocks based on the most important unavoidable sources of variability among exp units

• “Control what you can, block what you can’t control, randomize the rest”
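
A block design repeats the random assignment separately inside each block. The sketch below assumes 12 made-up units tagged "f" or "m" and two treatments; the function name `block_design` and the `block_of` argument are hypothetical.

```python
import random
from collections import Counter

def block_design(units, block_of, treatments, seed=None):
    """Group units into blocks, then randomize treatments within each block."""
    rng = random.Random(seed)
    blocks = {}
    for u in units:
        blocks.setdefault(block_of(u), []).append(u)
    assignment = {}
    for block, members in blocks.items():
        members = list(members)
        rng.shuffle(members)  # randomization happens inside the block
        for i, u in enumerate(members):
            assignment[u] = (block, treatments[i % len(treatments)])
    return assignment

units = [("f", k) for k in range(6)] + [("m", k) for k in range(6)]
assignment = block_design(units, block_of=lambda u: u[0],
                          treatments=["treatment", "control"], seed=5)
counts = Counter(assignment.values())
print(counts)
```

Each block ends up with the same number of units in every treatment group, so comparisons within a block are not distorted by the blocking variable.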

Page 40: Stats chapter 5

Block Design (Example Diagram)

Page 41: Stats chapter 5

Matched Pairs Design

• Exp units are matched into pairs that are similar in terms of the experiment

• Each of two experimental units will receive a different treatment

• Many times, the subjects in the pair are the same person

• The treatment effect within each matched pair is measured by a simple subtraction of the two responses

Page 42: Stats chapter 5

Matched Pairs Design

• Randomization

– Randomize which member of the pair receives which treatment

– Randomize the order the treatments are applied

– Often randomization can be done with a coin flip!

– Sometimes, it is important to have a length of time between treatment applications
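
The coin flip and the subtraction can be sketched together in Python. The subject labels and response values are made up, and the function name `matched_pairs` is hypothetical.

```python
import random
import statistics

def matched_pairs(pairs, seed=None):
    """Flip a coin for each pair to decide who gets the treatment."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:  # the coin flip
            plan.append({"treatment": first, "control": second})
        else:
            plan.append({"treatment": second, "control": first})
    return plan

pairs = [("s1", "s2"), ("s3", "s4"), ("s5", "s6")]  # already matched
plan = matched_pairs(pairs, seed=11)

# Hypothetical responses, invented so the subtraction can be shown
responses = {"s1": 8.0, "s2": 7.0, "s3": 6.5, "s4": 6.0, "s5": 9.0, "s6": 7.5}
differences = [responses[p["treatment"]] - responses[p["control"]] for p in plan]
mean_difference = statistics.mean(differences)
print(plan)
print(differences, mean_difference)
```

When each "pair" is a single subject measured twice, the same coin flip decides the order in which that subject receives the two treatments.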

Page 43: Stats chapter 5

Matched Pair Design (example diagram – single subject)

[Diagram: each subject (#1, #2, #3, …, #n) receives both the treatment and the control in randomized order, and the two responses are compared for each subject.]

Page 44: Stats chapter 5

Matched Pair Design (example diagram – paired subjects)

[Diagram: subjects are matched into pairs (#1 with #2, #3 with #4, …, #n-1 with #n); within each pair the treatment is randomly assigned to one member and the control to the other, and the two responses are compared.]

Page 45: Stats chapter 5

Cautions about Experimentation

Double Blind Experiment

• Sometimes bias is produced unconsciously

• Sometimes a subject will produce bias if he knows he is receiving a placebo treatment

• Effects can be controlled if neither the experimenter nor the subject knows which treatment was administered

• Typically, the treatment is given an ID number and only a researcher who does not work directly with the subjects knows which treatment corresponds to which ID.

• Controls the placebo effect
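
The ID-number idea can be sketched in Python as two separate tables: one that subjects and the administering experimenter see, and a key that stays sealed until the analysis. The function name `blind_labels`, the four-digit codes, and the even split between treatment and placebo are assumptions made for this illustration.

```python
import random

def blind_labels(subjects, seed=None):
    """Assign each subject a coded ID; keep the code-to-treatment key separate."""
    if len(subjects) % 2:
        raise ValueError("this sketch assumes an even number of subjects")
    rng = random.Random(seed)
    arms = ["treatment", "placebo"] * (len(subjects) // 2)
    rng.shuffle(arms)
    codes = rng.sample(range(1000, 10000), len(subjects))  # unique codes
    visible = {}  # what the subject and the administering experimenter see
    key = {}      # sealed until the responses have been recorded
    for subject, arm, code in zip(subjects, arms, codes):
        visible[subject] = code
        key[code] = arm
    return visible, key

subjects = [f"subject_{k}" for k in range(1, 7)]
visible, key = blind_labels(subjects, seed=4)
print(visible)
```

Only the codes appear in `visible`, so nothing handed to the subject or the experimenter reveals who received the placebo.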

Page 46: Stats chapter 5

Cautions about Experimentation

Lack of realism

• Experimental results are produced under conditions that cannot be realistically duplicated

• Subjects who know they are exp units may behave differently than the population

• The laboratory setting itself may be a variable of the experiment!

Page 47: Stats chapter 5

Assignment 5.2B

• #45-49, 55, 57, 62, 63, 67, 68