basic statistics: mbi-540 · basic statistics: mbi-540 sally w. thurston, phd associate professor...

173
Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January 2011 Sally W. Thurston, PhD Basic Statistics: MBI-540

Upload: others

Post on 04-Mar-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Basic Statistics: MBI-540

Sally W. Thurston, PhD

Associate ProfessorDepartment of Biostatistics and Computational Biology

University of Rochester

January 2011

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 2: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Why statistics?

Allows us to say something (“make inference”) about apopulation based on data from a sample.Example: measure effect of drug on weight loss in 20BALB/c mice

Sample: the 20 micePopulation: all BALB/c mice (that could be subject to thesame condition)Interested in weight loss in BALB/c mice, not just in the 20mice.Statistics allows us to estimate weight loss in the populationusing data from a sample.

Many journal articles ask for statistics

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 3: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Some comments

Design of experiment is keyNo statistical method can fix bad data!Essential to think about design before data is collected

Be clear about research question(s) before collecting data.Includes: what data to collect to answer question(s).

Know your dataHow the measurements relate to research questionPlots to visualize your data: helpful!

Most statistical models have assumptionsIf violated, statistical “answer” not correct.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 4: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How can a statistician help?

Design of experimentTo help, statistician must understand aims, background

AnalysisStatistician can figure out appropriate model, checkassumptions, fit model to data

Research?Talk with statistician early: help with design to prevent biasIf small project, consultation possibleIf detailed project, statistician should be collaborator, notconsultant

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 5: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Outline

1 Design of experiments: examples

2 Mean, standard deviation, standard error

3 M&M data

4 t-test

5 2-sample t-test

6 Importance of plotting the data

7 1-way ANOVA

8 Summary remarks

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 6: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Tuba’s question

Dr. Tuba is interested in the effect of drugs on weight loss inmice after infection.

Drugs are given right after infection6 mice were given the standard drug6 mice were given the new drugDr. Tuba reports p < 0.05 for a t-test of the difference inmean weight loss across the two groups after day 5.

Should Dr. Tuba publish?

Your thoughts?What additional information might you need beforedeciding?

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 7: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Tuba’s question

Dr. Tuba is interested in the effect of drugs on weight loss inmice after infection.

Drugs are given right after infection6 mice were given the standard drug6 mice were given the new drugDr. Tuba reports p < 0.05 for a t-test of the difference inmean weight loss across the two groups after day 5.

Should Dr. Tuba publish?

Your thoughts?What additional information might you need beforedeciding?

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 8: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Tuba’s question

Dr. Tuba is interested in the effect of drugs on weight loss inmice after infection.

Drugs are given right after infection6 mice were given the standard drug6 mice were given the new drugDr. Tuba reports p < 0.05 for a t-test of the difference inmean weight loss across the two groups after day 5.

Should Dr. Tuba publish?

Your thoughts?What additional information might you need beforedeciding?

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 9: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Some things to consider

How did Dr. Tuba choose which mice received each drug?Are the mice independent?Plot the data - why?Did he choose to present results only from the time periodthat showed the most significant effect?

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 10: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario AThe 12 mice were running around in a pen.Dr. Tuba captured the mice one at a time.The first 6 mice captured were given the standard drug.

Problems?Ease of capture might be associated with propensity tolose weight.

Fastest mice might be the most healthyFastest mice might be the lightest

So - mouse selection could be the cause of an apparentdrug effect.

Important to randomize treatment assignment.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 11: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario AThe 12 mice were running around in a pen.Dr. Tuba captured the mice one at a time.The first 6 mice captured were given the standard drug.

Problems?

Ease of capture might be associated with propensity tolose weight.

Fastest mice might be the most healthyFastest mice might be the lightest

So - mouse selection could be the cause of an apparentdrug effect.

Important to randomize treatment assignment.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 12: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario AThe 12 mice were running around in a pen.Dr. Tuba captured the mice one at a time.The first 6 mice captured were given the standard drug.

Problems?Ease of capture might be associated with propensity tolose weight.

Fastest mice might be the most healthyFastest mice might be the lightest

So - mouse selection could be the cause of an apparentdrug effect.

Important to randomize treatment assignment.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 13: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario AThe 12 mice were running around in a pen.Dr. Tuba captured the mice one at a time.The first 6 mice captured were given the standard drug.

Problems?Ease of capture might be associated with propensity tolose weight.

Fastest mice might be the most healthyFastest mice might be the lightest

So - mouse selection could be the cause of an apparentdrug effect.

Important to randomize treatment assignment.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 14: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario BThe mice came from 2 dams, treatment was randomized.Pups from the first dam were given the standard drug.Pups from the second dam were given the new drug.

Problems?The effect of drug is totally confounded with litterTranslation: If we see a difference in outcome between thetwo groups, we can’t tell if it was

caused by the drug - ORdue to a litter effect

Mice in one litter might have a different propensity to loseweight than mice in another litter.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 15: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario BThe mice came from 2 dams, treatment was randomized.Pups from the first dam were given the standard drug.Pups from the second dam were given the new drug.

Problems?

The effect of drug is totally confounded with litterTranslation: If we see a difference in outcome between thetwo groups, we can’t tell if it was

caused by the drug - ORdue to a litter effect

Mice in one litter might have a different propensity to loseweight than mice in another litter.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 16: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario BThe mice came from 2 dams, treatment was randomized.Pups from the first dam were given the standard drug.Pups from the second dam were given the new drug.

Problems?The effect of drug is totally confounded with litterTranslation: If we see a difference in outcome between thetwo groups, we can’t tell if it was

caused by the drug - ORdue to a litter effect

Mice in one litter might have a different propensity to loseweight than mice in another litter.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 17: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario CThe mice came from 6 dams.The first pup in each dam was given the standard drug.The second pup in each dam was given the new drug.

Problems?Birth order may be associated with the propensity to loseweight.Birth order is confounded with drug.

Better idea: within a litter, randomize first/second born pups todrug.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 18: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario CThe mice came from 6 dams.The first pup in each dam was given the standard drug.The second pup in each dam was given the new drug.

Problems?

Birth order may be associated with the propensity to loseweight.Birth order is confounded with drug.

Better idea: within a litter, randomize first/second born pups todrug.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 19: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario CThe mice came from 6 dams.The first pup in each dam was given the standard drug.The second pup in each dam was given the new drug.

Problems?Birth order may be associated with the propensity to loseweight.Birth order is confounded with drug.

Better idea: within a litter, randomize first/second born pups todrug.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 20: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario CThe mice came from 6 dams.The first pup in each dam was given the standard drug.The second pup in each dam was given the new drug.

Problems?Birth order may be associated with the propensity to loseweight.Birth order is confounded with drug.

Better idea: within a litter, randomize first/second born pups todrug.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 21: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario DMice were all first-born pups from different dams.Mice were randomized to treatmentAll mice which received the standard drug were housed inone cage.All mice which received the new drug were housed in one(different) cage.

Problems?There might be a cage effect - viral infection, hotterlocation, less water, · · · .

Better idea: divide animals from a drug group into > 1 cage - orif necessary, house all in same cage.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 22: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario DMice were all first-born pups from different dams.Mice were randomized to treatmentAll mice which received the standard drug were housed inone cage.All mice which received the new drug were housed in one(different) cage.

Problems?

There might be a cage effect - viral infection, hotterlocation, less water, · · · .

Better idea: divide animals from a drug group into > 1 cage - orif necessary, house all in same cage.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 23: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How did Dr. Tuba choose the mice?

Scenario DMice were all first-born pups from different dams.Mice were randomized to treatmentAll mice which received the standard drug were housed inone cage.All mice which received the new drug were housed in one(different) cage.

Problems?There might be a cage effect - viral infection, hotterlocation, less water, · · · .

Better idea: divide animals from a drug group into > 1 cage - orif necessary, house all in same cage.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 24: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Was day 5 one of many days he considered?

1 2 3 4 5

−4

−3

−2

−1

0

Days

Wei

ght l

oss

●●

● group 1group 2

Comments?

Simulated data: the mean isthe same for both groups, atall timesLooking for the biggestdifference = multiplecomparisons

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 25: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Was day 5 one of many days he considered?

1 2 3 4 5

−4

−3

−2

−1

0

Days

Wei

ght l

oss

●●

● group 1group 2

Comments?Simulated data: the mean isthe same for both groups, atall timesLooking for the biggestdifference = multiplecomparisons

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 26: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Was day 5 one of many days he considered?

1 2 3 4 5

−8

−6

−4

−2

0

Days

Wei

ght l

oss

●●

●●

●●

● group 1group 2

Comments?

Simulated data: the meandeclines with time for group2, not group 1If expect this, could chooseday 5 a prioriCould use other statisticalmethods to compare slopeover time in the 2 groups.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 27: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Was day 5 one of many days he considered?

1 2 3 4 5

−8

−6

−4

−2

0

Days

Wei

ght l

oss

●●

●●

●●

● group 1group 2

Comments?Simulated data: the meandeclines with time for group2, not group 1If expect this, could chooseday 5 a prioriCould use other statisticalmethods to compare slopeover time in the 2 groups.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 28: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Are the mice independent?

Mice from the same litter are more like each other.Mice housed in the same cage might respond moresimilarlyWe are not interested in the effect of litter (or cage)

If appropriate experimental design, can control for correlatedobservations statistically.Need to record and save data on the litter (or cage)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 29: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Are the mice independent?

Mice from the same litter are more like each other.Mice housed in the same cage might respond moresimilarlyWe are not interested in the effect of litter (or cage)

If appropriate experimental design, can control for correlatedobservations statistically.Need to record and save data on the litter (or cage)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 30: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Other scenarios (1)

An investigator is comparing response to different doses of 3drugs on 96-well plates.

Each drug is tested at 5 doses.Plate 1 has multiple wells of drug A at each dose level.Plate 2 has multiple wells of drug B at each dose level.Plate 3 has multiple wells of drug C at each dose level.

Is this a good experimental design? Why / why not?

If one drug appears to be better, it might be due to a differencein the plates.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 31: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Other scenarios (1)

An investigator is comparing response to different doses of 3drugs on 96-well plates.

Each drug is tested at 5 doses.Plate 1 has multiple wells of drug A at each dose level.Plate 2 has multiple wells of drug B at each dose level.Plate 3 has multiple wells of drug C at each dose level.

Is this a good experimental design? Why / why not?

If one drug appears to be better, it might be due to a differencein the plates.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 32: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Other scenarios (1)

An investigator is comparing response to different doses of 3drugs on 96-well plates.

Each drug is tested at 5 doses.Plate 1 has multiple wells of drug A at each dose level.Plate 2 has multiple wells of drug B at each dose level.Plate 3 has multiple wells of drug C at each dose level.

Is this a good experimental design? Why / why not?

If one drug appears to be better, it might be due to a differencein the plates.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 33: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Other scenarios (2)

Dr. Busy scores mouse activity in a 1-minute period on a scaleof 1-10.

All mice are measured 6 hours after treatment10 mice receive a standard treatment, 10 a new treatment.Treatment assignment is randomizedThe new treatment is expected to be better.The investigator knows the treatment assignment of eachmouse.

Is this a good experimental design? Why / why not?

Expectation that new treatment is better could unconsciouslybias the investigator’s measurements

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 34: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Other scenarios (2)

Dr. Busy scores mouse activity in a 1-minute period on a scaleof 1-10.

All mice are measured 6 hours after treatment10 mice receive a standard treatment, 10 a new treatment.Treatment assignment is randomizedThe new treatment is expected to be better.The investigator knows the treatment assignment of eachmouse.

Is this a good experimental design? Why / why not?

Expectation that new treatment is better could unconsciouslybias the investigator’s measurements

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 35: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Other scenarios (2)

Dr. Busy scores mouse activity in a 1-minute period on a scaleof 1-10.

All mice are measured 6 hours after treatment10 mice receive a standard treatment, 10 a new treatment.Treatment assignment is randomizedThe new treatment is expected to be better.The investigator knows the treatment assignment of eachmouse.

Is this a good experimental design? Why / why not?

Expectation that new treatment is better could unconsciouslybias the investigator’s measurements

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 36: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Comments on experimental designs

In the scenarios just described, there is a possibility thatthe intended effect cannot be distinguished fromsomething else that is not of interest.Estimate of intended effect may be biased.

Unless the unintended effect can be estimated(impossible?)

No statistical analysis will fix this problem.Often, the problem can be avoided by good experimentaldesign.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 37: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Comments on experimental designs

In the scenarios just described, there is a possibility thatthe intended effect cannot be distinguished fromsomething else that is not of interest.Estimate of intended effect may be biased.Unless the unintended effect can be estimated(impossible?)

No statistical analysis will fix this problem.

Often, the problem can be avoided by good experimentaldesign.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 38: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Comments on experimental designs

In the scenarios just described, there is a possibility thatthe intended effect cannot be distinguished fromsomething else that is not of interest.Estimate of intended effect may be biased.Unless the unintended effect can be estimated(impossible?)

No statistical analysis will fix this problem.Often, the problem can be avoided by good experimentaldesign.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 39: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Summary: experimental design

Data should be a random sample from populationIf treatment is assigned, assignment should be randomlydetermined.

Can use table of random numbersBlinding

Person measuring endpoints should not know treatmentassignmentPerson receiving treatment should not know his/hertreatmentParticularly an issue if measurements can be subjective

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 40: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Outline

1 Design of experiments: examples

2 Mean, standard deviation, standard error

3 M&M data

4 t-test

5 2-sample t-test

6 Importance of plotting the data

7 1-way ANOVA

8 Summary remarks

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 41: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Forte’s question

A well-known journal article reports that after infection,BALB/c mice given ’superdrug’ lost 4 ounces on averageafter one week.Dr. Forte wants to compare this to ’superdrug’ effects oninfected C57BL/6 mice.Dr. Forte infects 4 C57BL/6 mice, treats with ’superdrug’,and measures weight loss after one week.

His measurements of weight loss: 5, 8, 6, 3 ounces.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 42: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

The mean

Data (weight loss) represented by x1, x2, · · · , xnn is sample size (number observations)Dr. Forte’s data: x1 = 5, x2 = 8, x3 = 6, x4 = 3.

Population mean (µ): the average xi over all observationsin population

Cannot measure data from all observations in population.µ is unknown (a population parameter)

Estimate µ by µ̂. Here µ̂ = x̄ = sample mean.

For Dr. Forte’s experimentWhat is n?What is the sample?What is the population?

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 43: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

The mean

Data (weight loss) represented by x1, x2, · · · , xnn is sample size (number observations)Dr. Forte’s data: x1 = 5, x2 = 8, x3 = 6, x4 = 3.

Population mean (µ): the average xi over all observationsin population

Cannot measure data from all observations in population.µ is unknown (a population parameter)

Estimate µ by µ̂. Here µ̂ = x̄ = sample mean.

For Dr. Forte’s experimentWhat is n?What is the sample?What is the population?

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 44: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

The mean (cont.)

Dr. Forte’s data: 3, 5, 6, 8.Sample mean, x̄ = 1

n∑n

i=1 xi

What is Dr. Forte’s sample mean?

x̄ = 14(3 + 5 + 6 + 8) = 5.5

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 45: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

The mean (cont.)

Dr. Forte’s data: 3, 5, 6, 8.Sample mean, x̄ = 1

n∑n

i=1 xi

What is Dr. Forte’s sample mean?

x̄ = 14(3 + 5 + 6 + 8) = 5.5

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 46: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Variability

Consider measurements from 2 researchers:Dr. Forte’s measurements: 3, 5, 6, 8.Dr. Pianissimo’s measurements: 5.1, 5.3, 5.7, 5.9Both have x̄ = 5.5.

Whose measurements give you more confidence about yourknowledge of the population mean?

Dr. Pianissimo: data closer together, plausible values for µ insmaller interval.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 47: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Variability

Consider measurements from 2 researchers:Dr. Forte’s measurements: 3, 5, 6, 8.Dr. Pianissimo’s measurements: 5.1, 5.3, 5.7, 5.9Both have x̄ = 5.5.

Whose measurements give you more confidence about yourknowledge of the population mean?

Dr. Pianissimo: data closer together, plausible values for µ insmaller interval.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 48: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Variance (σ2 = population variance)

Population variance: σ2 = E[(xi − µ)2]

E means “expectation”: average over all observations inpopulation.Translation: σ2 is expected squared difference betweenindividual values and population meanσ = population SD (of xi )

Sample variance, σ̂2

Estimate σ2 by σ̂2.If µ known: σ̂2 = 1

n

∑ni=1(xi − µ)2

If µ unknown: σ̂2 = s2 = 1n−1

∑ni=1(xi − x̄)2

Dr. Forte’s data: 3, 5, 6, 8, x̄ = 5.5. What is s2?s2 = (3−5.5)2+(5−5.5)2+(6−5.5)2+(8−5.5)2

3 = 4.333

Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9, s2 = 0.133.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 49: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Variance (σ2 = population variance)

Population variance: σ2 = E[(xi − µ)2]

E means “expectation”: average over all observations inpopulation.Translation: σ2 is expected squared difference betweenindividual values and population meanσ = population SD (of xi )

Sample variance, σ̂2

Estimate σ2 by σ̂2.If µ known: σ̂2 = 1

n

∑ni=1(xi − µ)2

If µ unknown: σ̂2 = s2 = 1n−1

∑ni=1(xi − x̄)2

Dr. Forte’s data: 3, 5, 6, 8, x̄ = 5.5. What is s2?

s2 = (3−5.5)2+(5−5.5)2+(6−5.5)2+(8−5.5)2

3 = 4.333

Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9, s2 = 0.133.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 50: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Variance (σ2 = population variance)

Population variance: σ2 = E[(xi − µ)2]

E means “expectation”: average over all observations inpopulation.Translation: σ2 is expected squared difference betweenindividual values and population meanσ = population SD (of xi )

Sample variance, σ̂2

Estimate σ2 by σ̂2.If µ known: σ̂2 = 1

n

∑ni=1(xi − µ)2

If µ unknown: σ̂2 = s2 = 1n−1

∑ni=1(xi − x̄)2

Dr. Forte’s data: 3, 5, 6, 8, x̄ = 5.5. What is s2?s2 = (3−5.5)2+(5−5.5)2+(6−5.5)2+(8−5.5)2

3 = 4.333

Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9, s2 = 0.133.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 51: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Variance (σ2 = population variance)

Population variance: σ2 = E[(xi − µ)2]

E means “expectation”: average over all observations inpopulation.Translation: σ2 is expected squared difference betweenindividual values and population meanσ = population SD (of xi )

Sample variance, σ̂2

Estimate σ2 by σ̂2.If µ known: σ̂2 = 1

n

∑ni=1(xi − µ)2

If µ unknown: σ̂2 = s2 = 1n−1

∑ni=1(xi − x̄)2

Dr. Forte’s data: 3, 5, 6, 8, x̄ = 5.5. What is s2?s2 = (3−5.5)2+(5−5.5)2+(6−5.5)2+(8−5.5)2

3 = 4.333

Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9, s2 = 0.133.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 52: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

SD, SE

Standard deviation (of x), SD(x) is s =√

s2

Can also write s as sx

Standard error (of x̄), SE(x̄) is√

s2

n = s√n

In other contexts, can speak of standard error of otherquantities, e.g. SE(β̂)

Dr. Forte’s data: 3, 5, 6, 8,x̄ = 5.5, s2 = 4.333.

What are SD(x) and SE(x̄)?SD(x) =

√4.333 = 2.08

SE(x̄) = 2.08√4

= 1.04

Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9x̄ = 5.5, s2 = 0.133.

What are SD(x) and SE(x̄)?SD(x) =

√0.133 = 0.365

SE(x̄) = 0.365√4

= 0.183

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 53: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

SD, SE

Standard deviation (of x), SD(x) is s =√

s2

Can also write s as sx

Standard error (of x̄), SE(x̄) is√

s2

n = s√n

In other contexts, can speak of standard error of otherquantities, e.g. SE(β̂)

Dr. Forte’s data: 3, 5, 6, 8,x̄ = 5.5, s2 = 4.333.

What are SD(x) and SE(x̄)?

SD(x) =√

4.333 = 2.08SE(x̄) = 2.08√

4= 1.04

Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9x̄ = 5.5, s2 = 0.133.

What are SD(x) and SE(x̄)?SD(x) =

√0.133 = 0.365

SE(x̄) = 0.365√4

= 0.183

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 54: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

SD, SE

Standard deviation (of x), SD(x) is s =√

s2

Can also write s as sx

Standard error (of x̄), SE(x̄) is√

s2

n = s√n

In other contexts, can speak of standard error of otherquantities, e.g. SE(β̂)

Dr. Forte’s data: 3, 5, 6, 8,x̄ = 5.5, s2 = 4.333.

What are SD(x) and SE(x̄)?SD(x) =

√4.333 = 2.08

SE(x̄) = 2.08√4

= 1.04

Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9x̄ = 5.5, s2 = 0.133.

What are SD(x) and SE(x̄)?

SD(x) =√

0.133 = 0.365SE(x̄) = 0.365√

4= 0.183

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 55: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

SD, SE

Standard deviation (of x), SD(x) is s =√

s2

Can also write s as sx

Standard error (of x̄), SE(x̄) is√

s2

n = s√n

In other contexts, can speak of standard error of otherquantities, e.g. SE(β̂)

Dr. Forte’s data: 3, 5, 6, 8,x̄ = 5.5, s2 = 4.333.

What are SD(x) and SE(x̄)?SD(x) =

√4.333 = 2.08

SE(x̄) = 2.08√4

= 1.04

Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9x̄ = 5.5, s2 = 0.133.

What are SD(x) and SE(x̄)?SD(x) =

√0.133 = 0.365

SE(x̄) = 0.365√4

= 0.183

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 56: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

SD, SE (cont.)

Recall SD(x) is sx =√

1n−1

∑ni=1(xi − x̄)2, SE(x̄) = sx√

n

If Dr. Forte had collected data from 20 mice instead of 4, howwould SD(x) and SE(x̄) be affected?

SD(x) would be about the same (estimated more precisely)SE(x̄) would decrease (we have more confidence about x̄).

What do SD(x) and SE(x̄) mean?SD(x) is our estimate of σ, the SD of x in the population.SE(x̄) is a measure of our uncertainty about µ.

Can form confidence interval (CI) for µ: x̄ ± tcritSE(x̄)tcrit discussed later (usually ≈ 2)Approximately 95% of such CIs include µ.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 57: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

SD, SE (cont.)

Recall SD(x) is sx =√

1n−1

∑ni=1(xi − x̄)2, SE(x̄) = sx√

n

If Dr. Forte had collected data from 20 mice instead of 4, howwould SD(x) and SE(x̄) be affected?

SD(x) would be about the same (estimated more precisely)SE(x̄) would decrease (we have more confidence about x̄).

What do SD(x) and SE(x̄) mean?SD(x) is our estimate of σ, the SD of x in the population.SE(x̄) is a measure of our uncertainty about µ.

Can form confidence interval (CI) for µ: x̄ ± tcritSE(x̄)tcrit discussed later (usually ≈ 2)Approximately 95% of such CIs include µ.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 58: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

SD, SE (cont.)

Recall SD(x) is sx =√

1n−1

∑ni=1(xi − x̄)2, SE(x̄) = sx√

n

If Dr. Forte had collected data from 20 mice instead of 4, howwould SD(x) and SE(x̄) be affected?

SD(x) would be about the same (estimated more precisely)SE(x̄) would decrease (we have more confidence about x̄).

What do SD(x) and SE(x̄) mean?SD(x) is our estimate of σ, the SD of x in the population.SE(x̄) is a measure of our uncertainty about µ.

Can form confidence interval (CI) for µ: x̄ ± tcritSE(x̄)tcrit discussed later (usually ≈ 2)Approximately 95% of such CIs include µ.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 59: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Why divide by n − 1 in s2?

If we knew µ, σ̂2 =Pn

i=1(xi−µ)2

n

We don’t know µ, so σ̂2 = s2 =Pn

i=1(xi−x̄)2

n−1

For a particular sample, x̄ minimizes∑

(xi − µ̂)2 for any µ̂

Using x̄ to estimate µ:Numerate of s2 is (on average) smaller than

∑(xi − µ)2

Need to divide by smaller number to compensate; n − 1 isthe right number

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 60: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Why divide by n − 1 in s2?

Examine by simulationTake n = 5 observations fromnormal, µ = 0, σ2 = 1.

Calculate s2 =P

(xi−x̄)2

4 andP(xi−x̄)2

5

Repeat multiple timesNote that true value of σ is 1.

Dividing by n − 1 does wellDividing by n underestimates σ

SD(x): divide by (n−1)

Fre

quen

cy

0.5 1.0 1.5 2.0

020

4060

8010

0

Not SD(x): divide by n

Fre

quen

cy

0.0 0.5 1.0 1.5

020

4060

8012

0

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 61: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Why divide by n − 1 in s2?

Examine by simulationTake n = 5 observations fromnormal, µ = 0, σ2 = 1.

Calculate s2 =P

(xi−x̄)2

4 andP(xi−x̄)2

5

Repeat multiple timesNote that true value of σ is 1.Dividing by n − 1 does wellDividing by n underestimates σ

SD(x): divide by (n−1)

Fre

quen

cy

0.5 1.0 1.5 2.0

020

4060

8010

0

Not SD(x): divide by n

Fre

quen

cy

0.0 0.5 1.0 1.5

020

4060

8012

0

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 62: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Outline

1 Design of experiments: examples

2 Mean, standard deviation, standard error

3 M&M data

4 t-test

5 2-sample t-test

6 Importance of plotting the data

7 1-way ANOVA

8 Summary remarks

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 63: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

M&M data

We collected data on the number of M&M of each color inpackages of mini M&M’s

# brown

Fre

quen

cy

5 10 15 20

02

46

8

# orange

Fre

quen

cy

4 6 8 10 12 14 16

02

46

8

# yellow

Fre

quen

cy

2 4 6 8 10 12 14

02

46

810

12

# green

Fre

quen

cy

6 8 10 12

01

23

45

67

# blue

Fre

quen

cy

4 6 8 10 12 14 16 18

05

1015

# red

Fre

quen

cy

4 6 8 10 12 14 16

05

1015

32 people totalHistograms show number ofeach color

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 64: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

M&M data

Aside: did everyone get same # M&M’s? NO

Number MM: 50 51 55 56 57 58 59 60 61 62 66 73People: 1 1 3 2 9 2 4 5 1 2 1 1

%(blue + red): All

Fre

quen

cy

0.30 0.35 0.40 0.45

01

23

45

67 Focus on % (red + blue)

x1 is % (red + blue) for firstperson, x2 same for secondperson, etc.What is the sample?What is the population?

We could approximate xi with anormal distribution, mean µ,variance σ2

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 65: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

M&M data

Aside: did everyone get same # M&M’s? NO

Number MM: 50 51 55 56 57 58 59 60 61 62 66 73People: 1 1 3 2 9 2 4 5 1 2 1 1

%(blue + red): All

Fre

quen

cy

0.30 0.35 0.40 0.45

01

23

45

67 Focus on % (red + blue)

x1 is % (red + blue) for firstperson, x2 same for secondperson, etc.What is the sample?What is the population?

We could approximate xi with anormal distribution, mean µ,variance σ2

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 66: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

M&M data

Aside: did everyone get same # M&M’s? NO

Number MM: 50 51 55 56 57 58 59 60 61 62 66 73People: 1 1 3 2 9 2 4 5 1 2 1 1

%(blue + red): All

Fre

quen

cy

0.30 0.35 0.40 0.45

01

23

45

67 Focus on % (red + blue)

x1 is % (red + blue) for firstperson, x2 same for secondperson, etc.What is the sample?What is the population?

We could approximate xi with anormal distribution, mean µ,variance σ2

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 67: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

M&M data

sample 1

Fre

quen

cy

0.30 0.35 0.40 0.45

0.0

0.5

1.0

1.5

2.0

2.5

3.0

sample 2

Fre

quen

cy

0.30 0.35 0.40 0.45

0.0

0.5

1.0

1.5

2.0

sample 3

Fre

quen

cy

0.30 0.32 0.34 0.36 0.38 0.40

0.0

0.5

1.0

1.5

2.0

sample 4

Fre

quen

cy

0.30 0.32 0.34 0.36 0.38 0.40

0.0

0.5

1.0

1.5

2.0

%(blue + red): samples

Interest: % (red + blue)Divide data into 4 groups of 8observationsCalculate mean, SD, SE(x̄) ineach group

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 68: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

M&M data

Each group has n = 8 observations (32 observations total)Group 1: % (red + blue): Data (x1, · · · , x8) is

0.45, 0.38, 0.39, 0.38, 0.35, 0.33, 0.41, 0.29

Calculate x̄

x̄ = (0.45+0.38+0.39+0.38+0.35+0.33+0.41+0.29)8 = 0.374

Calculate sx (standard deviation)

sx =

√(0.45−0.374)2+(0.38−0.374)2+···+(0.29−0.374)2

7 = 0.048

Calculate SE(x̄) = sx√n = 0.048√

8= 0.017

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 69: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

M&M data

Each group has n = 8 observations (32 observations total)Group 1: % (red + blue): Data (x1, · · · , x8) is

0.45, 0.38, 0.39, 0.38, 0.35, 0.33, 0.41, 0.29

Calculate x̄x̄ = (0.45+0.38+0.39+0.38+0.35+0.33+0.41+0.29)

8 = 0.374Calculate sx (standard deviation)

sx =

√(0.45−0.374)2+(0.38−0.374)2+···+(0.29−0.374)2

7 = 0.048

Calculate SE(x̄) = sx√n = 0.048√

8= 0.017

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 70: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

M&M data

Each group has n = 8 observations (32 observations total)Group 1: % (red + blue): Data (x1, · · · , x8) is

0.45, 0.38, 0.39, 0.38, 0.35, 0.33, 0.41, 0.29

Calculate x̄x̄ = (0.45+0.38+0.39+0.38+0.35+0.33+0.41+0.29)

8 = 0.374Calculate sx (standard deviation)

sx =

√(0.45−0.374)2+(0.38−0.374)2+···+(0.29−0.374)2

7 = 0.048

Calculate SE(x̄) = sx√n = 0.048√

8= 0.017

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 71: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

M&M data

Each group has n = 8 observations (32 observations total)Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121

Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077

We might want to know if % red + blue in the population = 13

(later: t-test)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 72: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

M&M data

Each group has n = 8 observations (32 observations total)Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121

Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077

We might want to know if % red + blue in the population = 13

(later: t-test)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 73: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

M&M data

Each group has n = 8 observations (32 observations total)Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121

Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077

We might want to know if % red + blue in the population = 13

(later: t-test)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 74: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Outline

1 Design of experiments: examples

2 Mean, standard deviation, standard error

3 M&M data

4 t-test

5 2-sample t-test

6 Importance of plotting the data

7 1-way ANOVA

8 Summary remarks

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 75: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test

Recall Dr. Forte’s question: is weight loss in C57BL/6 mice(his data) different than in BALB/c mice (published data,weight loss = 4 ounces)?

H0 is the null hypothesis (“no effect”, “same as before”)HA is the alternative hypothesis (“some effect”, “not H0”)

Example: H0 : µ = µ0 versus HA : µ 6= µ0.µ is mean weight loss for C57Bl/6 mice in populationµ is unknown, but we have an estimate of it (x̄)µ0 is some specific number (here: 4).

How to evaluate H0, HA?Compare x̄ to µ0If x̄ is far from µ0, suggests H0 not true.Need SE(x̄) to evaluate how different x̄ and µ0 are.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 76: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test

Recall Dr. Forte’s question: is weight loss in C57BL/6 mice(his data) different than in BALB/c mice (published data,weight loss = 4 ounces)?

H0 is the null hypothesis (“no effect”, “same as before”)HA is the alternative hypothesis (“some effect”, “not H0”)Example: H0 : µ = µ0 versus HA : µ 6= µ0.

µ is mean weight loss for C57Bl/6 mice in populationµ is unknown, but we have an estimate of it (x̄)µ0 is some specific number (here: 4).

How to evaluate H0, HA?Compare x̄ to µ0If x̄ is far from µ0, suggests H0 not true.Need SE(x̄) to evaluate how different x̄ and µ0 are.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 77: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test

Recall Dr. Forte’s question: is weight loss in C57BL/6 mice(his data) different than in BALB/c mice (published data,weight loss = 4 ounces)?

H0 is the null hypothesis (“no effect”, “same as before”)HA is the alternative hypothesis (“some effect”, “not H0”)Example: H0 : µ = µ0 versus HA : µ 6= µ0.

µ is mean weight loss for C57Bl/6 mice in populationµ is unknown, but we have an estimate of it (x̄)µ0 is some specific number (here: 4).

How to evaluate H0, HA?Compare x̄ to µ0If x̄ is far from µ0, suggests H0 not true.Need SE(x̄) to evaluate how different x̄ and µ0 are.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 78: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test assumptions

T-test is based on tcalc = x̄−µ0SE(x̄)

Assumptions: xi ∼ independent N(µ, σ2)

Interpretation: ∼ means “distributed as”Says data are independent and have a normal distributionµ = population mean (of the xi ’s)σ2 = population variance (of the xi ’s) (σ = SD)

“How can I tell if my data have a normal distribution?”Assumption is about distribution of data from populationt-test robust to moderate departure from normalityImportant to plot data to check for gross violation fromnormality

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 79: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test assumptions

T-test is based on tcalc = x̄−µ0SE(x̄)

Assumptions: xi ∼ independent N(µ, σ2)

Interpretation: ∼ means “distributed as”Says data are independent and have a normal distributionµ = population mean (of the xi ’s)σ2 = population variance (of the xi ’s) (σ = SD)

“How can I tell if my data have a normal distribution?”Assumption is about distribution of data from populationt-test robust to moderate departure from normalityImportant to plot data to check for gross violation fromnormality

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 80: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Samples of n=20 from a normal distribution

normsamp[ctr]

−3 −1 0 1 2 3

01

23

4

normsamp[ctr]

Fre

quen

cy

−3 −1 0 1 2 3

01

23

45

normsamp[ctr]

Fre

quen

cy

−3 −1 0 1 2 3

0.0

1.0

2.0

3.0

normsamp[ctr]

−3 −1 0 1 2 3

01

23

45

normsamp[ctr]

Fre

quen

cy

−3 −1 0 1 2 3

01

23

45

67

normsamp[ctr]

Fre

quen

cy

−3 −1 0 1 2 3

01

23

4

−3 −1 0 1 2 3

01

23

45

67

Fre

quen

cy

−3 −1 0 1 2 3

01

23

45

67

Fre

quen

cy

−3 −1 0 1 2 3

01

23

45

6

Normal

normsamp[ctr]

Fre

quen

cy

−3 −2 −1 0 1 2 3

01

23

4

One mistake

c(normsamp[1:(nper.plot − 1)], 10.25)

Fre

quen

cy

−4 −2 0 2 4 6 8 10

01

23

4

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 81: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Samples of n=20 from a normal distribution

normsamp[ctr]

−3 −1 0 1 2 3

01

23

4

normsamp[ctr]

Fre

quen

cy

−3 −1 0 1 2 3

01

23

45

normsamp[ctr]

Fre

quen

cy

−3 −1 0 1 2 3

0.0

1.0

2.0

3.0

normsamp[ctr]

−3 −1 0 1 2 3

01

23

45

normsamp[ctr]

Fre

quen

cy

−3 −1 0 1 2 3

01

23

45

67

normsamp[ctr]

Fre

quen

cy

−3 −1 0 1 2 3

01

23

4

−3 −1 0 1 2 3

01

23

45

67

Fre

quen

cy

−3 −1 0 1 2 3

01

23

45

67

Fre

quen

cy

−3 −1 0 1 2 3

01

23

45

6

Normal

normsamp[ctr]

Fre

quen

cy

−3 −2 −1 0 1 2 3

01

23

4

One mistake

c(normsamp[1:(nper.plot − 1)], 10.25)

Fre

quen

cy

−4 −2 0 2 4 6 8 10

01

23

4

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 82: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Outliers

“My data has one very unusual value. Should I delete it?”

Step 1: check to see if the value is a mistake! If so, fix.Step 2: is there something unusual about thatobservation?

animal was sicka different person took the measurementexperimental conditions different from others

If clear reason why unusual value differs from otherobservations

Value represents different condition: OK to delete (explainin paper)

If cannot determine any reason why value is unusualDeleting observation underestimates population variability:don’t deleteAlternative: report results both with and without the value(explain in paper)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 83: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Outliers

“My data has one very unusual value. Should I delete it?”Step 1: check to see if the value is a mistake! If so, fix.Step 2: is there something unusual about thatobservation?

animal was sicka different person took the measurementexperimental conditions different from others

If clear reason why unusual value differs from otherobservations

Value represents different condition: OK to delete (explainin paper)

If cannot determine any reason why value is unusualDeleting observation underestimates population variability:don’t deleteAlternative: report results both with and without the value(explain in paper)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 84: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Outliers

“My data has one very unusual value. Should I delete it?”Step 1: check to see if the value is a mistake! If so, fix.Step 2: is there something unusual about thatobservation?

animal was sicka different person took the measurementexperimental conditions different from others

If clear reason why unusual value differs from otherobservations

Value represents different condition: OK to delete (explainin paper)

If cannot determine any reason why value is unusualDeleting observation underestimates population variability:don’t deleteAlternative: report results both with and without the value(explain in paper)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 85: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Parametric or non-parametric?

“When should I use a non-parametric version of a t-test?”Some statisticians (nearly) always use non-parametrictests (such as Mann-Whitney)Some statisticians (nearly) always use parametric tests(such as t-test)Knowledge of type of data can help determine if normalityis reasonable

Sometimes data approximately normal after transformation

If no severe departure from normality, either parametric ornonparametric tests probably fine.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 86: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test example

T-test is based on tcalc = x̄−µ0SE(x̄)

Dr. Forte: H0 : µ = 4, HA : µ 6= 4.We calculated x̄ = 5.5, SE(x̄) = 1.04.What is tcalc?

Here tcalc = (5.5− 4)/1.04 = 1.44

Under H0, tcalc ∼ tn−1

Here: tcalc ∼ t3If H0 true, tcalc centered around 0

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 87: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test example

T-test is based on tcalc = x̄−µ0SE(x̄)

Dr. Forte: H0 : µ = 4, HA : µ 6= 4.We calculated x̄ = 5.5, SE(x̄) = 1.04.What is tcalc?

Here tcalc = (5.5− 4)/1.04 = 1.44

Under H0, tcalc ∼ tn−1

Here: tcalc ∼ t3If H0 true, tcalc centered around 0

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 88: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-distribution

Recall tcalc = x̄−µ0SE(x̄)

“How can tcalc have a distribution? It’s just a number! ”Your thoughts?

Any single experiment gives a single tcalc .Could take another sample of 4 C57BL/6 mice, measureweight loss, get tcalc for that sample.This could theoretically be repeated multiple times withdifferent miceEach experiment would give a tcalc

A histogram of the tcalc values would look like the t3distribution (assuming enough experiments)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 89: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-distribution

Recall tcalc = x̄−µ0SE(x̄)

“How can tcalc have a distribution? It’s just a number! ”Your thoughts?

Any single experiment gives a single tcalc .Could take another sample of 4 C57BL/6 mice, measureweight loss, get tcalc for that sample.This could theoretically be repeated multiple times withdifferent miceEach experiment would give a tcalc

A histogram of the tcalc values would look like the t3distribution (assuming enough experiments)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 90: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Samples from t-distributions with 3 df

n= 10

−1.0 −0.5 0.0 0.5 1.0 1.5 2.0

0.0

0.5

1.0

1.5

2.0

2.5

3.0

n= 100

−6 −4 −2 0 2 4

05

1015

2025

30

n= 1000

−4 −2 0 2 4 6

050

100

150

n= 10000

−6 −4 −2 0 2 4 6

020

040

060

0

n=10: First fewobservations: -0.69,1.73, 0.55, 0.96,-0.79, · · ·

n=100: Starting tolook roughly “normal”n=1000: (12 obsoutside plot. Rangewas: -12.8 to 8.3)n=10,000: (85 obsoutside plot. Rangewas: -24.3 to 35.7)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 91: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Samples from t-distributions with 3 df

n= 10

−1.0 −0.5 0.0 0.5 1.0 1.5 2.0

0.0

0.5

1.0

1.5

2.0

2.5

3.0

n= 100

−6 −4 −2 0 2 4

05

1015

2025

30

n= 1000

−4 −2 0 2 4 6

050

100

150

n= 10000

−6 −4 −2 0 2 4 6

020

040

060

0

n=10: First fewobservations: -0.69,1.73, 0.55, 0.96,-0.79, · · ·n=100: Starting tolook roughly “normal”

n=1000: (12 obsoutside plot. Rangewas: -12.8 to 8.3)n=10,000: (85 obsoutside plot. Rangewas: -24.3 to 35.7)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 92: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Samples from t-distributions with 3 df

n= 10

−1.0 −0.5 0.0 0.5 1.0 1.5 2.0

0.0

0.5

1.0

1.5

2.0

2.5

3.0

n= 100

−6 −4 −2 0 2 4

05

1015

2025

30

n= 1000

−4 −2 0 2 4 6

050

100

150

n= 10000

−6 −4 −2 0 2 4 6

020

040

060

0

n=10: First fewobservations: -0.69,1.73, 0.55, 0.96,-0.79, · · ·n=100: Starting tolook roughly “normal”n=1000: (12 obsoutside plot. Rangewas: -12.8 to 8.3)

n=10,000: (85 obsoutside plot. Rangewas: -24.3 to 35.7)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 93: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Samples from t-distributions with 3 df

n= 10

−1.0 −0.5 0.0 0.5 1.0 1.5 2.0

0.0

0.5

1.0

1.5

2.0

2.5

3.0

n= 100

−6 −4 −2 0 2 4

05

1015

2025

30

n= 1000

−4 −2 0 2 4 6

050

100

150

n= 10000

−6 −4 −2 0 2 4 6

020

040

060

0

n=10: First fewobservations: -0.69,1.73, 0.55, 0.96,-0.79, · · ·n=100: Starting tolook roughly “normal”n=1000: (12 obsoutside plot. Rangewas: -12.8 to 8.3)n=10,000: (85 obsoutside plot. Rangewas: -24.3 to 35.7)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 94: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Normal, t distributions: differences?

Normal distributionn= 10000

−6 −4 −2 0 2 4 6

020

040

060

080

0

Normal: range was: -3.6 to 3.9

t3 distributionn= 10000

−6 −4 −2 0 2 4 6

020

040

060

0

t3: range was: -24.3 to 35.7

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 95: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Different t-distributions

−6 −4 −2 0 2 4 6

0.0

0.1

0.2

0.3

0.4

df=2df=8normal

With more data, larger ndegrees of freedom (df) =n − 1 increasessmaller area in tails of thedistributionapproaches normaldistribution (which is t with∞ df)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 96: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Normal, t distributions

“Why doesn’t tcalc = x̄−µ0SE(x̄) have a normal distribution?”

Recall we assumed xi ∼ N(µ, σ2)

x̄ has a normal distributionIf we knew σ2

SE(x̄) would be σ2/n, a fixed number“tcalc” would have a normal distribution

When σ2 not known, use s2 to estimate itOur estimate of σ2 isn’t perfectResult: tcalc more variable

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 97: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Normal, t distributions

“Why doesn’t tcalc = x̄−µ0SE(x̄) have a normal distribution?”

Recall we assumed xi ∼ N(µ, σ2)

x̄ has a normal distributionIf we knew σ2

SE(x̄) would be σ2/n, a fixed number“tcalc” would have a normal distribution

When σ2 not known, use s2 to estimate itOur estimate of σ2 isn’t perfectResult: tcalc more variable

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 98: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-distribution (3 df)

−6 −4 −2 0 2 4 6

0.0

0.1

0.2

0.3

t dist, 3 df2.5% cutpt = 3.18

If H0 trueExpect values nearcenterRed: unusual

If H0 true, control Pr(reject H0) to α(α often chosen as 0.05)Reject H0 if tcalc in shaded area

Each shaded area has 2.5% of thedistributionShaded area defined by tcrit

tcrit is number for which 2.5% ofdistribution has larger values(−tcrit : 2.5% are smaller)tcrit depends on df and α

With 3 df, α = 0.05, tcrit = 3.18Reject H0 if |tcalc | > tcrit

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 99: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-distribution (3 df)

−6 −4 −2 0 2 4 6

0.0

0.1

0.2

0.3

t dist, 3 df2.5% cutpt = 3.18

If H0 trueExpect values nearcenterRed: unusual

If H0 true, control Pr(reject H0) to α(α often chosen as 0.05)Reject H0 if tcalc in shaded area

Each shaded area has 2.5% of thedistributionShaded area defined by tcrit

tcrit is number for which 2.5% ofdistribution has larger values(−tcrit : 2.5% are smaller)tcrit depends on df and α

With 3 df, α = 0.05, tcrit = 3.18Reject H0 if |tcalc | > tcrit

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 100: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-distribution (3 df)

−6 −4 −2 0 2 4 6

0.0

0.1

0.2

0.3

t dist, 3 df2.5% cutpt = 3.18

If H0 trueExpect values nearcenterRed: unusual

If H0 true, control Pr(reject H0) to α(α often chosen as 0.05)Reject H0 if tcalc in shaded area

Each shaded area has 2.5% of thedistributionShaded area defined by tcrit

tcrit is number for which 2.5% ofdistribution has larger values(−tcrit : 2.5% are smaller)tcrit depends on df and α

With 3 df, α = 0.05, tcrit = 3.18Reject H0 if |tcalc | > tcrit

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 101: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Sample from t-distribution: how good is tcrit?

n= 10000

−6 −4 −2 0 2 4 6

020

040

060

0

How good is tcrit?Do we really get 2.5% in each “tail”?For t3, α = 0.05, tcrit = 3.18.

From sample of 10,000 observation from t3247 observations were < −3.18 (2.47%)258 observations were > 3.18 (2.58%)

If H0 true and α = 0.05, approximately 5% of the time wewould reject H0 (“type I error”)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 102: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Sample from t-distribution: how good is tcrit?

n= 10000

−6 −4 −2 0 2 4 6

020

040

060

0

How good is tcrit?Do we really get 2.5% in each “tail”?For t3, α = 0.05, tcrit = 3.18.

From sample of 10,000 observation from t3247 observations were < −3.18 (2.47%)258 observations were > 3.18 (2.58%)

If H0 true and α = 0.05, approximately 5% of the time wewould reject H0 (“type I error”)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 103: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test for Dr. Forte

Recall Dr. Forte’s question: is weight loss in C57BL/6mice (his data) different than in BALB/c mice(published data, weight loss = 4 ounces)?His hypotheses: H0 : µ = 4, HA : µ 6= 4.µ: weight loss in population of C57BL/6 miceDr. Forte: tcalc = x̄−µ0

SE(x̄) = (5.5− 4)/1.04 = 1.44

t-test rule: Reject H0 if |tcalc | > tcrit

With 3 df, α = 0.05, tcrit = 3.18What should Dr. Forte conclude?

Fail to reject H0Dr. Forte’s data are consistent with H0 (p = 0.25)No evidence that weight loss in C57BL/6 mice isdifferent from 4 ounces.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 104: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test for Dr. Forte

Recall Dr. Forte’s question: is weight loss in C57BL/6mice (his data) different than in BALB/c mice(published data, weight loss = 4 ounces)?His hypotheses: H0 : µ = 4, HA : µ 6= 4.µ: weight loss in population of C57BL/6 miceDr. Forte: tcalc = x̄−µ0

SE(x̄) = (5.5− 4)/1.04 = 1.44

t-test rule: Reject H0 if |tcalc | > tcrit

With 3 df, α = 0.05, tcrit = 3.18What should Dr. Forte conclude?Fail to reject H0

Dr. Forte’s data are consistent with H0 (p = 0.25)No evidence that weight loss in C57BL/6 mice isdifferent from 4 ounces.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 105: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-distribution (3df)

−6 −4 −2 0 2 4 6

0.0

0.1

0.2

0.3

t dist, 3 df2.5% cutpt = 3.18

Dr. Forte’s value: blue line(1.44)Compare Dr. Forte’s value toa t distribution with 3 dfReject H0 if tcalc in shadedareaFail to reject H0

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 106: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Confidence interval

95% confidence interval (CI): x̄ ± tcrit × SE(x̄)

(Assumes tcrit calculated using α = 0.05)

This example: tcrit = 3.18We calculated x̄ = 5.5, SE(x̄) = 1.04What is Dr. Forte’s 95% CI?

95% CI is 5.5± 3.18× 1.04 = (2.19, 8.81)

This is a 95% CI for µFail to reject H0 because interval includes µ0 = 4.Confidence interval more informative than “fail to reject H0”

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 107: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Confidence interval

95% confidence interval (CI): x̄ ± tcrit × SE(x̄)

(Assumes tcrit calculated using α = 0.05)

This example: tcrit = 3.18We calculated x̄ = 5.5, SE(x̄) = 1.04What is Dr. Forte’s 95% CI?95% CI is 5.5± 3.18× 1.04 = (2.19, 8.81)

This is a 95% CI for µFail to reject H0 because interval includes µ0 = 4.Confidence interval more informative than “fail to reject H0”

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 108: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Classical statistics

µ considered fixed.Observed data (xi ) are considered random

Could collect new data: different xiImplies calculated confidence interval is random.

If H0 is true, on average 95% of the CIs will include µ.Can never answer “Is H0 true?”

Can only say something about how unusual our teststatistic is if H0 is true

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 109: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

p-value

−6 −4 −2 0 2 4 6

0.0

0.1

0.2

0.3

t dist, 3 df12.5% cutpt = 1.44 p-value = probability of observing

a statistic as extreme or moreextreme as we observed, if H0 istrue

Curve: total area=1Dr. Forte’s tcalc = 1.44.p-value = shaded area = 0.25

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 110: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

p-value

We calculated p = 0.25 for Dr. Forte’s data.

Is this the same as the probability that H0 is true?

Nop-value is calculated assuming H0 is true.

Recall: p-value is probability of observing a statistic as or moreextreme as we observed, if H0 is true

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 111: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

p-value

We calculated p = 0.25 for Dr. Forte’s data.

Is this the same as the probability that H0 is true?

Nop-value is calculated assuming H0 is true.

Recall: p-value is probability of observing a statistic as or moreextreme as we observed, if H0 is true

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 112: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

p-value questions

What do you think of these statements?“The p-value tells you whether or not the observed effect isreal.”

Not true“If the result is not statistically significant, that proves thereis no difference.” Not true

Better:Use knowledge of the magnitude of the estimated effect(and the CI) to help interpret the results.Suppose comparing treatment to control and p = 0.07

If estimated effect large, “Our study suggests an importantbenefit of treatment, but this did not reach statisticalsignificance.”

A p-value is not an end in itself.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 113: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

p-value questions

What do you think of these statements?“The p-value tells you whether or not the observed effect isreal.” Not true

“If the result is not statistically significant, that proves thereis no difference.” Not true

Better:Use knowledge of the magnitude of the estimated effect(and the CI) to help interpret the results.Suppose comparing treatment to control and p = 0.07

If estimated effect large, “Our study suggests an importantbenefit of treatment, but this did not reach statisticalsignificance.”

A p-value is not an end in itself.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 114: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

p-value questions

What do you think of these statements?“The p-value tells you whether or not the observed effect isreal.” Not true“If the result is not statistically significant, that proves thereis no difference.”

Not trueBetter:

Use knowledge of the magnitude of the estimated effect(and the CI) to help interpret the results.Suppose comparing treatment to control and p = 0.07

If estimated effect large, “Our study suggests an importantbenefit of treatment, but this did not reach statisticalsignificance.”

A p-value is not an end in itself.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 115: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

p-value questions

What do you think of these statements?“The p-value tells you whether or not the observed effect isreal.” Not true“If the result is not statistically significant, that proves thereis no difference.” Not true

Better:Use knowledge of the magnitude of the estimated effect(and the CI) to help interpret the results.Suppose comparing treatment to control and p = 0.07

If estimated effect large, “Our study suggests an importantbenefit of treatment, but this did not reach statisticalsignificance.”

A p-value is not an end in itself.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 116: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

p-value questions

What do you think of these statements?“The p-value tells you whether or not the observed effect isreal.” Not true“If the result is not statistically significant, that proves thereis no difference.” Not true

Better:Use knowledge of the magnitude of the estimated effect(and the CI) to help interpret the results.Suppose comparing treatment to control and p = 0.07

If estimated effect large, “Our study suggests an importantbenefit of treatment, but this did not reach statisticalsignificance.”

A p-value is not an end in itself.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 117: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test for M&M’s

Consider M&M data from 1st sample of 8 peopleGroup 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169

Recall xi is % (red + blue) M&M’sTest H0 : µ = 1/3, versus HA : µ 6= 1/3.What is tcalc?

0.374− 0.3330.0169

= 2.41

Compare to tcrit = 2.36 (based on (n-1) df, α = 0.05)Conclusion? Reject H0 (also p = 0.047).

For first 8 people, mean % (red + blue) M&M’s significantlydifferent than 1/3.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 118: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test for M&M’s

Consider M&M data from 1st sample of 8 peopleGroup 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169

Recall xi is % (red + blue) M&M’sTest H0 : µ = 1/3, versus HA : µ 6= 1/3.What is tcalc?

0.374− 0.3330.0169

= 2.41

Compare to tcrit = 2.36 (based on (n-1) df, α = 0.05)Conclusion?

Reject H0 (also p = 0.047).For first 8 people, mean % (red + blue) M&M’s significantlydifferent than 1/3.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 119: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test for M&M’s

Consider M&M data from 1st sample of 8 peopleGroup 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169

Recall xi is % (red + blue) M&M’sTest H0 : µ = 1/3, versus HA : µ 6= 1/3.What is tcalc?

0.374− 0.3330.0169

= 2.41

Compare to tcrit = 2.36 (based on (n-1) df, α = 0.05)Conclusion? Reject H0 (also p = 0.047).

For first 8 people, mean % (red + blue) M&M’s significantlydifferent than 1/3.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 120: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test for M&M’s

For groups of 8 people, tcrit = 2.36Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169

tcalc = 2.41, p = 0.047Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203

tcalc = 1.138, p = 0.293Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123

tcalc = 1.175, p = 0.123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121

tcalc = 1.084, p = 0.314

For entire sample (n=32), tcrit = 2.04Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077

tcalc = 3.19, p = 0.003

Larger n: smaller SE(x̄), smaller tcrit , easier to reject H0 if H0not true.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 121: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test for M&M’s

For groups of 8 people, tcrit = 2.36Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169

tcalc = 2.41, p = 0.047Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203

tcalc = 1.138, p = 0.293Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123

tcalc = 1.175, p = 0.123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121

tcalc = 1.084, p = 0.314

For entire sample (n=32), tcrit = 2.04Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077

tcalc = 3.19, p = 0.003

Larger n: smaller SE(x̄), smaller tcrit , easier to reject H0 if H0not true.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 122: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

t-test for M&M’s

For groups of 8 people, tcrit = 2.36Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169

tcalc = 2.41, p = 0.047Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203

tcalc = 1.138, p = 0.293Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123

tcalc = 1.175, p = 0.123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121

tcalc = 1.084, p = 0.314

For entire sample (n=32), tcrit = 2.04Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077

tcalc = 3.19, p = 0.003

Larger n: smaller SE(x̄), smaller tcrit , easier to reject H0 if H0not true.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 123: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Outline

1 Design of experiments: examples

2 Mean, standard deviation, standard error

3 M&M data

4 t-test

5 2-sample t-test

6 Importance of plotting the data

7 1-way ANOVA

8 Summary remarks

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 124: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

2-sample t-test

Dr. Forte now wants to compare the effect of ’superdrug’on weight loss in knockout mice (new) versus C57BL/6mice (just measured)

NOTE: Better to collect data on both species in the sametime period.

Let x̄1 and s1 be mean and SD of weight loss in 1st sample(C57BL/6), n1 = 4 observations.Let x̄2 and s2 be mean and SD of weight loss in 2ndsample (knockout), n2 observationsµ1 is population mean weight loss in C57BL/6 mice, µ2 inknockout mice.Null hypothesis of “no effect” or “no difference” isH0 : µ1 = µ2, so HA : µ1 6= µ2.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 125: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

2-sample t-test (cont.)

H0 : µ1 = µ2, HA : µ1 6= µ2

Same as H0 : µ1 − µ2 = 0 and HA : µ1 − µ2 6= 0.Estimate µ1 − µ2 by x̄1 − x̄2.tcalc = x̄1−x̄2

SE(x̄1−x̄2)

Assumptions of the 2-sample test: data from 2 populationsAre independentAre normally distributed (within population)Have the same variance (or: use modified estimate ofstandard error)

Reject H0 if |tcalc | > tcrit (tcrit depends on n1 + n2 and α)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 126: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Scenario 1

Pairs of mice are housed in the same cage with same foodsourceInterest is in effect of restricted food intakeMice fed 2x/day, only 1 mouse can eat at once

Is this a reasonable design?

Mice are not independent: if one mouse in a pair eatsaggressively, the other mouse cannot eat as much

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 127: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Scenario 1

Pairs of mice are housed in the same cage with same foodsourceInterest is in effect of restricted food intakeMice fed 2x/day, only 1 mouse can eat at once

Is this a reasonable design?

Mice are not independent: if one mouse in a pair eatsaggressively, the other mouse cannot eat as much

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 128: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Scenario 1

Pairs of mice are housed in the same cage with same foodsourceInterest is in effect of restricted food intakeMice fed 2x/day, only 1 mouse can eat at once

Is this a reasonable design?

Mice are not independent: if one mouse in a pair eatsaggressively, the other mouse cannot eat as much

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 129: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Scenario 2

12 mice receive standard food, 12 a new food.Interest: gain in head circumference, measured at end of 2weeksBob measures mice receiving standard food, Harrymeasures the othersBob and Harry do not know which mice received whichfood source

Is this a reasonable design? No

Food source is totally confounded with person who takesmeasurementsBob might tend to get smaller measurements on sameanimal (bias)Harry might tend to measure less accurately (largervariance)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 130: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Scenario 2

12 mice receive standard food, 12 a new food.Interest: gain in head circumference, measured at end of 2weeksBob measures mice receiving standard food, Harrymeasures the othersBob and Harry do not know which mice received whichfood source

Is this a reasonable design?

No

Food source is totally confounded with person who takesmeasurementsBob might tend to get smaller measurements on sameanimal (bias)Harry might tend to measure less accurately (largervariance)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 131: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Scenario 2

12 mice receive standard food, 12 a new food.Interest: gain in head circumference, measured at end of 2weeksBob measures mice receiving standard food, Harrymeasures the othersBob and Harry do not know which mice received whichfood source

Is this a reasonable design? No

Food source is totally confounded with person who takesmeasurementsBob might tend to get smaller measurements on sameanimal (bias)Harry might tend to measure less accurately (largervariance)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 132: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Forte’s 2-sample data

1 2

12

34

56

78

group

y

Do you think assumptions arereasonable?

(OK)H0 : µ1 − µ2 = 0, HA : µ1 − µ2 6= 0.tcalc = x̄1−x̄2

SE(x̄1−x̄2)

x̄1 = 5.5, x̄2 = 2.1SE(x̄1 − x̄2) = 1.19 (assumes samevariance in 2 groups)What is tcalc? 5.5−2.1

1.19 = 2.86tcrit = 2.45, from 6 df, α = 0.05What is your conclusion? (Reject H0)

95% CI for µ1 − µ2 is x̄1 − x̄2 ± tcrit × SE(x̄1 − x̄2) = (0.50, 6.30)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 133: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Forte’s 2-sample data

1 2

12

34

56

78

group

y

Do you think assumptions arereasonable? (OK)H0 : µ1 − µ2 = 0, HA : µ1 − µ2 6= 0.tcalc = x̄1−x̄2

SE(x̄1−x̄2)

x̄1 = 5.5, x̄2 = 2.1SE(x̄1 − x̄2) = 1.19 (assumes samevariance in 2 groups)What is tcalc?

5.5−2.11.19 = 2.86

tcrit = 2.45, from 6 df, α = 0.05What is your conclusion? (Reject H0)

95% CI for µ1 − µ2 is x̄1 − x̄2 ± tcrit × SE(x̄1 − x̄2) = (0.50, 6.30)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 134: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Forte’s 2-sample data

1 2

12

34

56

78

group

y

Do you think assumptions arereasonable? (OK)H0 : µ1 − µ2 = 0, HA : µ1 − µ2 6= 0.tcalc = x̄1−x̄2

SE(x̄1−x̄2)

x̄1 = 5.5, x̄2 = 2.1SE(x̄1 − x̄2) = 1.19 (assumes samevariance in 2 groups)What is tcalc? 5.5−2.1

1.19 = 2.86tcrit = 2.45, from 6 df, α = 0.05What is your conclusion?

(Reject H0)

95% CI for µ1 − µ2 is x̄1 − x̄2 ± tcrit × SE(x̄1 − x̄2) = (0.50, 6.30)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 135: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Forte’s 2-sample data

1 2

12

34

56

78

group

y

Do you think assumptions arereasonable? (OK)H0 : µ1 − µ2 = 0, HA : µ1 − µ2 6= 0.tcalc = x̄1−x̄2

SE(x̄1−x̄2)

x̄1 = 5.5, x̄2 = 2.1SE(x̄1 − x̄2) = 1.19 (assumes samevariance in 2 groups)What is tcalc? 5.5−2.1

1.19 = 2.86tcrit = 2.45, from 6 df, α = 0.05What is your conclusion? (Reject H0)

95% CI for µ1 − µ2 is x̄1 − x̄2 ± tcrit × SE(x̄1 − x̄2) = (0.50, 6.30)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 136: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Forte’s conclusions

Recall: Dr. Forte wants to compare the effect of’superdrug’ on weight loss in knockout vs C57BL/6mice.C57BL/6 mice: (x̄1 = 5.5), knockout mice (x̄2 = 2.1)We rejected H0, p = 0.03

What does “reject H0” mean here?

Data suggests that theaverage weight loss in C57BL/6 mice is significantlygreater than in knockout mice.

What does the p-value mean?If the 2 species had the same mean weight loss, we wouldhave observed a difference as great or greater 3% of thetime.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 137: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Forte’s conclusions

Recall: Dr. Forte wants to compare the effect of’superdrug’ on weight loss in knockout vs C57BL/6mice.C57BL/6 mice: (x̄1 = 5.5), knockout mice (x̄2 = 2.1)We rejected H0, p = 0.03

What does “reject H0” mean here? Data suggests that theaverage weight loss in C57BL/6 mice is significantlygreater than in knockout mice.

What does the p-value mean?

If the 2 species had the same mean weight loss, we wouldhave observed a difference as great or greater 3% of thetime.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 138: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Forte’s conclusions

Recall: Dr. Forte wants to compare the effect of’superdrug’ on weight loss in knockout vs C57BL/6mice.C57BL/6 mice: (x̄1 = 5.5), knockout mice (x̄2 = 2.1)We rejected H0, p = 0.03

What does “reject H0” mean here? Data suggests that theaverage weight loss in C57BL/6 mice is significantlygreater than in knockout mice.

What does the p-value mean?If the 2 species had the same mean weight loss, we wouldhave observed a difference as great or greater 3% of thetime.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 139: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Outline

1 Design of experiments: examples

2 Mean, standard deviation, standard error

3 M&M data

4 t-test

5 2-sample t-test

6 Importance of plotting the data

7 1-way ANOVA

8 Summary remarks

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 140: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Plot the data: why?

1 2

5.6

5.8

6.0

6.2

6.4

6.6

6.8

group

y

Comments?

The data distribution looksreasonable

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 141: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Plot the data: why?

1 2

5.6

5.8

6.0

6.2

6.4

6.6

6.8

group

y

Comments?The data distribution looksreasonable

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 142: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Plot the data: why?

1 2

−4

−2

02

46

group

y

●●●●●●

Comments?

One large outlying observationA mistake?Non-normal distribution?

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 143: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Plot the data: why?

1 2

−4

−2

02

46

group

y

●●●●●●

Comments?One large outlying observation

A mistake?Non-normal distribution?

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 144: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Plot the data: why?

1 2

300

400

500

600

700

800

900

group

y

Comments?

Variability within groups notsimilarSame data as first plot -EXCEPT data isexponentiatedSuggest: use logtransformationKnowledge of what theoutcomes are can guidedecision about whether totransform

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 145: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Plot the data: why?

1 2

300

400

500

600

700

800

900

group

y

Comments?Variability within groups notsimilarSame data as first plot -EXCEPT data isexponentiatedSuggest: use logtransformationKnowledge of what theoutcomes are can guidedecision about whether totransform

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 146: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Outline

1 Design of experiments: examples

2 Mean, standard deviation, standard error

3 M&M data

4 t-test

5 2-sample t-test

6 Importance of plotting the data

7 1-way ANOVA

8 Summary remarks

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 147: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Crescendo

Dr. Crescendo wants to test the effects of drugs A, B, andC on cell counts in 96-well platesSome things to consider

Is each drug tested on multiple plates? (plate effect)Are the wells at the edge of the plate different from others?(edge effect)Are the wells for drug A always read first? (order effect)

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 148: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Set up for 1-way ANOVA

Dr. Crescendo took 5 measurements from each of the 3treatments (A,B,C)µ1, µ2, µ3 are population means of treatment effects A, B,and C.Hypotheses:

H0 : µ1 = µ2 = µ3HA : at least 2 means are not equal.

How to test H0, HA?

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 149: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Pictures for 1-way ANOVA

1 2 3

45

67

8

group

y

1 2 3

45

67

8group

y

How do the 2 pictures differ?

Right: 3 means are more different from each otherRight: Variability between means is also more differentANOVA compares variability between treatments tovariability within treatments

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 150: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Pictures for 1-way ANOVA

1 2 3

45

67

8

group

y

1 2 3

45

67

8group

y

How do the 2 pictures differ?Right: 3 means are more different from each other

Right: Variability between means is also more differentANOVA compares variability between treatments tovariability within treatments

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 151: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Pictures for 1-way ANOVA

1 2 3

45

67

8

group

y

1 2 3

45

67

8group

y

How do the 2 pictures differ?Right: 3 means are more different from each otherRight: Variability between means is also more different

ANOVA compares variability between treatments tovariability within treatments

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 152: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Pictures for 1-way ANOVA

1 2 3

45

67

8

group

y

1 2 3

45

67

8group

y

How do the 2 pictures differ?Right: 3 means are more different from each otherRight: Variability between means is also more differentANOVA compares variability between treatments tovariability within treatments

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 153: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr Crescendo’s data

1 2 3

5.5

6.0

6.5

7.0

group

y

●●

Dr. Crescendo's data

Sample sizes:n1 = n2 = n3 = 5.Sample means: x̄1 =5.49, x̄2 = 6.07, x̄3 = 6.38.Sample variances: s2

1 =0.29, s2

2 = 0.48, s23 = 0.66

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 154: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

1-way ANOVA: basics

ANOVA assumes Var(x1) = Var(x2) = Var(x3)

Call this σ2

Variability within treatments is weighted average ofs2

1, s22, s2

3.degrees of freedom (df) = n1 + n2 + n3 − 3 (here: df=12)Dr. Crescendo: within treatment variance is3.016/12 = 0.25

Variability between treatments is weighted average of howfar apart group means are from overall mean

df = number groups - 1 (here: df=2)Dr. Crescendo: between treatment variance is2.063/2 = 1.03

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 155: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

1-way ANOVA: basics

ANOVA assumes Var(x1) = Var(x2) = Var(x3)

Call this σ2

Variability within treatments is weighted average ofs2

1, s22, s2

3.degrees of freedom (df) = n1 + n2 + n3 − 3 (here: df=12)Dr. Crescendo: within treatment variance is3.016/12 = 0.25

Variability between treatments is weighted average of howfar apart group means are from overall mean

df = number groups - 1 (here: df=2)Dr. Crescendo: between treatment variance is2.063/2 = 1.03

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 156: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

1-way ANOVA: basics

ANOVA assumes Var(x1) = Var(x2) = Var(x3)

Call this σ2

Variability within treatments is weighted average ofs2

1, s22, s2

3.degrees of freedom (df) = n1 + n2 + n3 − 3 (here: df=12)Dr. Crescendo: within treatment variance is3.016/12 = 0.25

Variability between treatments is weighted average of howfar apart group means are from overall mean

df = number groups - 1 (here: df=2)Dr. Crescendo: between treatment variance is2.063/2 = 1.03

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 157: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Testing in 1-way ANOVA (3 groups)

Recall: 1-way ANOVA comparesVariability within treatment: MS(Within) = MS(Error)Variability between treatment: MS(Between) = MS(Group)(Note: MS = mean square)

Dr. Crescendo:MS(Error) = 3.016

12 = 0.25MS(Between) = 2.063

2 = 1.03

ANOVA tableDf Sum Sq Mean Sq

Between 2 2.06337 1.03169Error 12 3.01612 0.25134

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 158: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Testing in 1-way ANOVA (3 groups)

Recall: 1-way ANOVA comparesVariability within treatment: MS(Within) = MS(Error)Variability between treatment: MS(Between) = MS(Group)(Note: MS = mean square)

Dr. Crescendo:MS(Error) = 3.016

12 = 0.25MS(Between) = 2.063

2 = 1.03

ANOVA tableDf Sum Sq Mean Sq

Between 2 2.06337 1.03169Error 12 3.01612 0.25134

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 159: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

1-way ANOVA: basics

If H0 trueHow do MS(Within) and MS(Between) compare?How could we estimate σ2?

MS(Between) ≈ MS(Within), and both estimate σ2

MS(Between)MS(Within) close to 1.

If H0 not true, same questionsOnly MS(Within) estimates σ2.

MS(Between) > MS(Within) so MS(Between)MS(Within) > 1

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 160: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

1-way ANOVA: basics

If H0 trueHow do MS(Within) and MS(Between) compare?How could we estimate σ2?MS(Between) ≈ MS(Within), and both estimate σ2

MS(Between)MS(Within) close to 1.

If H0 not true, same questionsOnly MS(Within) estimates σ2.

MS(Between) > MS(Within) so MS(Between)MS(Within) > 1

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 161: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

1-way ANOVA: basics

If H0 trueHow do MS(Within) and MS(Between) compare?How could we estimate σ2?MS(Between) ≈ MS(Within), and both estimate σ2

MS(Between)MS(Within) close to 1.

If H0 not true, same questions

Only MS(Within) estimates σ2.

MS(Between) > MS(Within) so MS(Between)MS(Within) > 1

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 162: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

1-way ANOVA: basics

If H0 trueHow do MS(Within) and MS(Between) compare?How could we estimate σ2?MS(Between) ≈ MS(Within), and both estimate σ2

MS(Between)MS(Within) close to 1.

If H0 not true, same questionsOnly MS(Within) estimates σ2.

MS(Between) > MS(Within) so MS(Between)MS(Within) > 1

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 163: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Testing in 1-way ANOVA (3 groups)

H0 : µ1 = µ2 = µ3, HA : at least 2 means are not equal.

ANOVA tableDf Sum Sq Mean Sq F value Pr(>F)

Between 2 2.06337 1.03169 4.1047 0.04383Error 12 3.01612 0.25134

Test of H0 is Fcalc =MS(Between)

MS(Error) = 4.10.

If H0 true (and assumptions met!), Fcalc ∼ F2,12

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 164: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Testing in 1-way ANOVA (3 groups)

H0 : µ1 = µ2 = µ3, HA : at least 2 means are not equal.

ANOVA tableDf Sum Sq Mean Sq F value Pr(>F)

Between 2 2.06337 1.03169 4.1047 0.04383Error 12 3.01612 0.25134

Test of H0 is Fcalc =MS(Between)

MS(Error) = 4.10.

If H0 true (and assumptions met!), Fcalc ∼ F2,12

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 165: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Testing in 1-way ANOVA (3 groups)

H0 : µ1 = µ2 = µ3, HA : at least 2 means are not equal.

ANOVA tableDf Sum Sq Mean Sq F value Pr(>F)

Between 2 2.06337 1.03169 4.1047 0.04383Error 12 3.01612 0.25134

Test of H0 is Fcalc =MS(Between)

MS(Error) = 4.10.

If H0 true (and assumptions met!), Fcalc ∼ F2,12

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 166: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Some F distributions

0 1 2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

df=2,12df=3,12df=8,12df=8,100

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 167: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Dr. Crescendo’s Fcalc

0 1 2 3 4 5 6

0.0

0.2

0.4

0.6

0.8

1.0

F dist, 2 and 12 df5% cutpt = 3.89

Generally: only large valuesof Fcalc considered unusual(1-tail)Reject H0 ifFcalc > F2,12,.95 = 3.89.Dr. Crescendo:Fcalc = 1.03/0.25 = 4.10,p = 0.044. Reject H0.

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 168: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Outline

1 Design of experiments: examples

2 Mean, standard deviation, standard error

3 M&M data

4 t-test

5 2-sample t-test

6 Importance of plotting the data

7 1-way ANOVA

8 Summary remarks

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 169: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Interacting with statistician

“What questions should I ask the statistician?”

Who asks the first questions?I ask: “Tell me about your experiment and your question.”We need some background

If designing an experiment:I ask: “What is your research question (explain in terms ofdata you plan to collect)?”We can help you

Decide what data to collect to answer questionHow to design experimentFigure out reasonable sample size (but need furtherinformation from you: estimated effect)

For data already collectedI ask: “What is your research question (explained in termsof data)?”Plots of data helpful

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 170: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Interacting with statistician

“What questions should I ask the statistician?”Who asks the first questions?

I ask: “Tell me about your experiment and your question.”We need some background

If designing an experiment:I ask: “What is your research question (explain in terms ofdata you plan to collect)?”We can help you

Decide what data to collect to answer questionHow to design experimentFigure out reasonable sample size (but need furtherinformation from you: estimated effect)

For data already collectedI ask: “What is your research question (explained in termsof data)?”Plots of data helpful

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 171: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

How a statistician can help

Design of experimentStatistician must understand study aims and backgroundCan help investigator decide

what data to collecthow to design data collection processhow to use data to answer research questions

AnalysisDetermine appropriate modelUnderstand and check assumptions (use different model ifviolated)Carry out analysis

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 172: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

What does Biostatistics Dept offer?

CoursesBST463 = Introduction to Biostatistics (fall semester)

Consulting serviceRotation system: faculty and PhD student or postdocFree assistance with grant preparation if appropriate %effort for BiostatisticsFree 1-hour consultation on anythingIf related to CTSI, voucher possible for up to 10 free hoursStudent projects: faculty mentor must attend initial meetingand most other meetings with BiostatisticiansFaculty advisor can contact Susan Messing([email protected]) for appointment,see http://www.urmc.rochester.edu/biostat/consulting/

Sally W. Thurston, PhD Basic Statistics: MBI-540

Page 173: Basic Statistics: MBI-540 · Basic Statistics: MBI-540 Sally W. Thurston, PhD Associate Professor Department of Biostatistics and Computational Biology University of Rochester January

Parting remarks

Statistics is not just “crunching numbers”Requires understanding of model and assumptions.

Good experimental design and data collection is keyPoor design: may be impossible to make valid inferenceabout population.Biased data collection: may never be able to adjust for bias.Good design: enables statistical inference about populationof inference based on data collected.

Sally W. Thurston, PhD Basic Statistics: MBI-540