Introduction to Hypothesis Testing Raymond J. Carroll Department of Statistics Faculty of Nutrition Texas A&M University http://stat.tamu.edu/~carroll

Post on 15-Jan-2016


Introduction to Hypothesis Testing

Raymond J. Carroll

Department of Statistics

Faculty of Nutrition

Texas A&M University

http://stat.tamu.edu/~carroll


Outline

• Series of Examples

• Data Collection for Examples


Example #1

• My Hypothesis: Texas A&M Students simply guess when they are asked whether they are drinking diet Pepsi or diet Coke

• The Experiment: Blind taste test. You drink from two cups and are asked which one contains diet Coke

• Our Goal: Test this hypothesis, using statistical principles and probability statements


A Warning

• Yes or No: No statistician will ever answer a question “yes” or “no”

• Probabilities: We always say things like “if your hypothesis were correct, the chance of seeing data at least this extreme would be less than 5%”


Example #1

• Data Model: The data model is
  – Normal?
  – Gamma?
  – Binomial? Because each outcome is yes or no, success or failure
  – Poisson?


Example #1

• My Hypothesis in terms of population parameters: I have claimed that you can do no better than guess

• Each of you is a Binomial(1, p) outcome, where p is the population proportion of successes

• When I say you are guessing, what am I saying about the population?

• That the population proportion of successes is p = ½
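As a rough illustration of how the guessing hypothesis p = ½ could be checked, here is a minimal Python sketch of an exact two-sided binomial test. The function name `binomial_two_sided_p` and the class result (15 correct out of 20 tasters) are made up for illustration; they are not data from the lecture.

```python
from math import comb

def binomial_two_sided_p(successes, n, p=0.5):
    """Exact two-sided p-value: total probability, under the null value p,
    of every outcome that is no more likely than the observed one."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small tolerance so ties are counted despite floating-point rounding.
    return sum(q for q in pmf if q <= observed * (1 + 1e-12))

# Hypothetical result (an assumption, not lecture data): 15 of 20 correct.
p_value = binomial_two_sided_p(15, 20)
print(round(p_value, 4))
```

A small p-value would say that a result this lopsided is unlikely if everyone is simply guessing; it does not by itself say how well the tasters can tell the drinks apart.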


Example #2

• My Hypotheses: Keebler used to advertise
  – 17 chocolate chips per cookie
  – More chocolate chips than another brand

• The Experiment: Get a cookie of each type, count the number of chips, criticize the experiment

• Our Goal: Test these hypotheses, using statistical principles and probability statements


Example #2

• Data Model: The data model is
  – Normal?
  – Gamma?
  – Binomial?
  – Poisson?

• It could be Poisson or normal. Poisson is the better choice, because each observation is a count

• We’ll use the central limit theorem to make inferences


Example #2

• My Hypothesis in terms of population parameters: Keebler has claimed that it gives you 17 chips per cookie, on average

• Each of you has a chip count that is Poisson with some unknown population mean

• When I say Keebler is correct, what am I saying about the population?

• That the population mean number of chips is 17
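The claim that the population mean is 17 can be examined with the central limit theorem, as the earlier slide suggests. The sketch below is one possible way to do that in Python, assuming a Poisson model (so the variance under the hypothesis equals the mean). The function name `poisson_mean_z_test` and the ten chip counts are hypothetical, not data collected in class.

```python
from math import erf, sqrt

def poisson_mean_z_test(counts, null_mean):
    """Large-sample z-test of a Poisson mean, using the normal approximation."""
    n = len(counts)
    xbar = sum(counts) / n
    # Under the hypothesis, each Poisson count has variance equal to null_mean.
    z = (xbar - null_mean) / sqrt(null_mean / n)
    # Two-sided p-value from the standard normal distribution.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return xbar, z, 2 * (1 - normal_cdf)

# Hypothetical chip counts for ten cookies (an assumption, not lecture data).
counts = [15, 18, 14, 16, 17, 13, 19, 15, 14, 16]
xbar, z, p_value = poisson_mean_z_test(counts, null_mean=17)
print(xbar, round(z, 2), round(p_value, 2))
```

With only ten cookies the normal approximation is crude, so this is best read as a sketch of the logic rather than a careful analysis.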


Example #3

• My Hypotheses: The percentage of regular M&M’s that are green is the same as the percentage of peanut M&M’s that are green

• The Experiment: Compute the percentage of green M&M’s in each bag

• Our Goal: Test these hypotheses, using statistical principles and probability statements


Example #3

• Data Model: The data model is
  – Normal?
  – Gamma?
  – Binomial?
  – Poisson?

• Roughly normal, since each data point is a percentage

• We’ll use the central limit theorem to make inferences


Example #3

• My Hypothesis in terms of population parameters: The percentage of green M&M’s does not depend on the type of M&M’s

• What am I saying about the two populations?

• That they have the same population mean.
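One way to examine whether two populations share the same mean is to build a confidence interval for the difference of the two means and see whether it contains 0. The Python sketch below assumes a roughly normal model for per-bag percentages and uses the usual large-sample standard error. The function name `diff_of_means_ci` and both lists of percentages are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def diff_of_means_ci(a, b, z=1.96):
    """Approximate 95% confidence interval for mean(a) - mean(b)."""
    diff = mean(a) - mean(b)
    # Standard error of a difference of two independent sample means.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff - z * se, diff + z * se

# Hypothetical percent-green values per bag (assumptions, not lecture data).
regular = [16.0, 12.5, 18.0, 14.0, 15.5, 13.0, 17.0, 14.0]
peanut = [15.0, 19.0, 13.5, 17.0, 16.0, 14.5, 18.0, 15.0]

low, high = diff_of_means_ci(regular, peanut)
print(round(low, 2), round(high, 2))
print("0 inside interval:", low <= 0 <= high)
```

If 0 falls inside the interval, these made-up data would be compatible with equal population means; with so few bags a t-based multiplier would be more appropriate than 1.96.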


Example #4

• My Hypotheses: Women who keep track of their diet by diaries or PDA do not lower their caloric intake in a 6-day period

• The Experiment: The WISH Study at the National Cancer Institute, with 400 women

• The data appear to contradict my hypothesis


[Figure: Typical (Median) Values of Reported Caloric Intake Over 6 Diary Days: WISH Study. Categories shown: FFQ and Diary 1 through Diary 6; axis ticks run from 1350 to 1800 in steps of 50.]

A major point of STAT211 is to prepare you to answer the question as to whether these data, which look convincing, really are convincing in terms of probability statements.


Example #4

• Data Model: The data model is
  – Normal?
  – Gamma?
  – Binomial?
  – Poisson?

• Lognormal, so most people take logarithms of caloric intake and analyze them as normal


Example #4

• Data Model: The data that we use are the differences in reported caloric intake between Day 1 and Day 6, i.e., Day 1 – Day 6


Example #4

• My Hypothesis in terms of population parameters:

• What am I saying about the population, when I claim that writing down diets will not lead to a change in reported caloric intake?

• That the population mean difference between Day 1 and Day 6 is 0


Some Final Comments

• Formulating a statistical hypothesis test is really intuitive

• Don’t let the formulae obscure the fact that all we are doing is
  – Asking questions about population parameters
  – Constructing confidence intervals for population parameters
  – Using these confidence intervals to answer the question
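The three steps above can be sketched in a few lines of Python for paired data in the style of Example #4. This is only an illustration under assumptions: the helper `mean_ci`, the normal multiplier 2.576 for 99% confidence, and the eight pairs of caloric intakes are all made up, and the sketch works on raw calories rather than logarithms.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci(values, z=2.576):
    """Approximate 99% confidence interval for a population mean."""
    center = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return center - half_width, center + half_width

# Hypothetical reported intakes (assumptions, not WISH data).
day1 = [1800, 1650, 1900, 1700, 1750, 1600, 1850, 1720]
day6 = [1500, 1600, 1550, 1480, 1700, 1450, 1500, 1530]

# Step 1: the question is whether the population mean of Day 1 - Day 6 is 0.
diffs = [a - b for a, b in zip(day1, day6)]
# Step 2: construct a confidence interval for that population mean.
low, high = mean_ci(diffs)
# Step 3: use the interval to answer the question.
consistent_with_no_change = low <= 0 <= high
print(round(low, 1), round(high, 1), consistent_with_no_change)
```

Working with the paired differences reduces the problem to a single population mean, which is why one interval is enough to address the hypothesis.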


The WISH Data

• I computed a 99% confidence interval for the population mean change in the WISH data.

• This interval was entirely above 0, and ranged roughly from 75 to 375

• In other words, with 99% confidence, the mean reported intake on Day 1 was between 75 and 375 calories higher than on Day 6.

• Is the hypothesis true?
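A back-of-the-envelope reading of the reported interval can be sketched as follows. This assumes the interval was a symmetric, normal-based one of the form mean ± 2.576 standard errors and treats the rounded endpoints 75 and 375 as exact; neither assumption is stated in the slides, and the function name `implied_z` is hypothetical.

```python
def implied_z(low, high, null_value=0.0, z_crit=2.576):
    """Recover the center, standard error, and z-statistic implied by a
    symmetric normal-based confidence interval (an assumed form)."""
    center = (low + high) / 2
    standard_error = (high - low) / 2 / z_crit
    z = (center - null_value) / standard_error
    return center, standard_error, z

center, standard_error, z = implied_z(75, 375)
print(center, round(standard_error, 1), round(z, 2))
```

Under these assumptions the estimated mean change sits several standard errors above 0, which is why an interval lying entirely above 0 is hard to reconcile with the hypothesis of no change.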