Standard Deviation and Standard Error Tutorial This is significantly important. Get your AP Equations and Formulas sheet

Upload: erika-short

Post on 17-Dec-2015



Standard Deviation and Standard Error Tutorial

This is significantly important. Get your AP Equations and Formulas sheet.


The Basics

• Let’s start with a review of the basics of statistics.
• Mean: What most people consider “average.”
– The sum of all scores divided by the number of scores.
• The mean is a good measure of the average for normally distributed data.
• Median: The middle number when data is ordered.
– If you have an even number of data points, it’s the mean of the two middle points.
• The median is a good measure of the average for data that is not normally distributed.
• Mode: The most frequently seen value in the data.
– If no data points repeat, there is no mode (zero modes, not a mode of 0).
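As a quick illustration of the three definitions above, here is a minimal Python sketch. The scores are made up for this example and are not from the slides.

```python
from statistics import mean, median, multimode

# Hypothetical test scores, invented for illustration only
scores = [70, 80, 80, 90, 100]

score_mean = mean(scores)        # sum of all scores divided by the number of scores
score_median = median(scores)    # middle value once the data is ordered
score_modes = multimode(scores)  # list of the most frequently seen value(s)

print("mean:", score_mean)
print("median:", score_median)
print("mode(s):", score_modes)
```

Note that `multimode` returns every value when nothing repeats, so a result as long as the data set is a sign that there is no real mode.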


Data Distribution

• Feast your eyes on this data and try to get a rough sense of how a histogram (frequency chart) would look. Where would the peak be?

Distribution Chart of Heights of 100 Control Plants

Height of plants (cm)    # of Plants
0.0-0.9                  3
1.0-1.9                  10
2.0-2.9                  21
3.0-3.9                  30
4.0-4.9                  20
5.0-5.9                  14
6.0-6.9                  2


Data Distribution

• This is a normal distribution, also known as a bell curve.
– The majority of individuals are “medium.”

[Figure: histogram titled “Number of Plants in each Class”; x-axis: height classes from 0.0-0.9 to 6.0-6.9; y-axis: 0 to 35.]
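To see the shape without the chart, a rough text histogram can be sketched from the frequency table of the 100 control plants. The variable names are chosen just for this example.

```python
# Frequency table from the slides: height class (cm) -> number of plants
plant_counts = {
    "0.0-0.9": 3,
    "1.0-1.9": 10,
    "2.0-2.9": 21,
    "3.0-3.9": 30,
    "4.0-4.9": 20,
    "5.0-5.9": 14,
    "6.0-6.9": 2,
}

total_plants = sum(plant_counts.values())
# The peak is the class with the largest count
peak_class = max(plant_counts, key=plant_counts.get)

# One '#' per plant gives a sideways bar chart
for height_class, count in plant_counts.items():
    print(f"{height_class} | {'#' * count} {count}")

print("total plants:", total_plants)
print("peak class:", peak_class)
```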


Abnormal Distribution?

• Human height is a fairly normal distribution.
– Average U.S. woman (age 20+) is 5’ 4”.
– Average U.S. man (age 20+) is 5’ 9.5”.
– About 50% of people are at or above average and 50% are at or below average.
• What, then, is not a normal distribution?
• Imagine if most women are 5’ 4”, but no one is taller.
– That’s not a normal distribution, and it won’t be a bell curve.


Abnormal Distribution

• The same goes for test scores.
• If we get an average of 80% on a test, we don’t necessarily have a normal distribution.
– That’s why the median is better than the mean for test scores.
• Imagine if the average were 100%: definitely not a normal distribution.
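A small sketch of why the median can describe lopsided test scores better than the mean. The scores below are hypothetical, chosen so that most students sit near the 100% ceiling and one student scores far lower.

```python
from statistics import mean, median

# Hypothetical, non-normal test scores: bunched near the top with one low score
skewed_scores = [100, 100, 95, 95, 90, 90, 85, 40]

skewed_mean = mean(skewed_scores)      # dragged down by the single 40
skewed_median = median(skewed_scores)  # stays with the bulk of the class

print("mean:", skewed_mean)
print("median:", skewed_median)
```

Here the one low score pulls the mean below what most students earned, while the median stays close to the typical score.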


Back to Standard Deviation/Error

• Suppose two students take a test.
– One gets a 100%, one gets a 0%.
– What’s the mean?
• 50%.
• Suppose two students take a test.
– One gets a 50%, one gets a 50%.
– What’s the mean?
• 50%.
• So it’s the same mean, but we got there very differently. This could mean a lot about the test.
• Variance measures the average squared “difference” from the mean in a set of data.


Variance

• Variance is given by the symbol s².
• A high variance is indicative of a lot of deviation from the mean.
• A low variance is indicative of relatively stable values.
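The two-student tests from the earlier slide make a handy check on this idea. The sketch below assumes the sample variance (dividing by n − 1), which is what Python's `statistics.variance` computes.

```python
from statistics import mean, variance

test_a = [100, 0]  # one student gets 100%, one gets 0%
test_b = [50, 50]  # both students get 50%

# Same mean, very different spread around it
for label, scores in (("Test A", test_a), ("Test B", test_b)):
    print(label, "mean:", mean(scores), "variance:", variance(scores))
```

Both tests share a mean of 50, but only the variance shows that the first test split the students and the second did not.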


Calculating Variance

• Σ is “sum of”: you need to perform the numerator operation for each number in the data set.
• xᵢ is an individual number in your data set.
• x̄ (read: “x bar”) is the mean for your data.
• n is your sample size.
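The formula these symbols belong to did not survive in this transcript. A reconstruction consistent with the symbols above, and with the worked example in this tutorial (which divides by n − 1), would be the sample variance:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

The numerator is the sum of squared deviations from the mean; dividing by n − 1 rather than n is the usual choice when the data is a sample rather than the whole population.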


Sample Samples

• Let’s try calculating the variance:

Plant    Height (cm)    Deviation from mean    Square of deviation from mean
         (xᵢ)           (xᵢ − x̄)               (xᵢ − x̄)²
A        10             2                      4
B        7              -1                     1
C        6              -2                     4
D        8              0                      0
E        9              1                      1

Mean = 8
Σ (xᵢ − x̄)² = 10
Divided by n − 1: 10 / (5 − 1) = 2.5
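The table can be reproduced column by column in a few lines of Python. This is only a sketch of the same arithmetic, with `statistics.variance` used as a cross-check.

```python
from statistics import mean, variance

heights_cm = [10, 7, 6, 8, 9]  # plants A through E

x_bar = mean(heights_cm)                                     # mean of the data
deviations = [x - x_bar for x in heights_cm]                 # (x_i - x_bar)
squared_deviations = [d ** 2 for d in deviations]            # (x_i - x_bar)^2
sample_variance = sum(squared_deviations) / (len(heights_cm) - 1)  # divide by n - 1

print("mean:", x_bar)
print("sum of squared deviations:", sum(squared_deviations))
print("variance by hand:", sample_variance)
print("variance from the statistics module:", variance(heights_cm))
```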


Whoo, variance! Now what?

• The standard deviation is simply the square root of the variance.
– So its symbol is s.
• In our example, s² (variance) is 2.5, so s (standard deviation) is 1.58.
• Now, you may be asking why we bother taking this statistic, if variance seems to do the same thing.
– The reason is that we can make some inferences and statements about the data in the same way we used chi-squared tables to make inferences about the role of chance.

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
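A short check of the 1.58 figure, taking the square root of the variance from the plant example and comparing it with `statistics.stdev` on the raw heights:

```python
import math
from statistics import stdev

heights_cm = [10, 7, 6, 8, 9]  # plants A through E from the variance example

s_from_variance = math.sqrt(2.5)  # square root of the variance found earlier
s_from_data = stdev(heights_cm)   # sample standard deviation computed directly

print("s from the variance:", round(s_from_variance, 2))
print("s from the raw data:", round(s_from_data, 2))
```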


Standard Deviation (SD) Inferences

• If you assume a normal distribution of data, 68.27% of data is within 1 SD of the mean.
– No real difference.
• 95.45% of the data is within 2 SD.
– Anything outside is probably an outlier.
• 99.73% of the data is within 3 SD.
– Anything outside is almost definitely an outlier.


Standard Deviation (SD) Inferences

• Suppose the average height of a population is 6 feet (SD = 0.5 feet).
• If the population is normally distributed:
• 68.27% of the population is between 5.5’ and 6.5’.
• 95.45% of the population is between 5’ and 7’.
• 99.73% of the population is between 4.5’ and 7.5’.
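The ranges and percentages above can be recomputed with Python's `statistics.NormalDist`. This sketch assumes a perfectly normal population, as the slide does.

```python
from statistics import NormalDist

mean_height_ft = 6.0
sd_ft = 0.5
standard_normal = NormalDist()  # mean 0, SD 1

coverage = {}
for k in (1, 2, 3):
    low = mean_height_ft - k * sd_ft
    high = mean_height_ft + k * sd_ft
    # Share of a normal population lying within k SDs of the mean
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    coverage[k] = (low, high, share)
    print(f"within {k} SD: {low} ft to {high} ft, {share:.2%} of the population")
```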


Standard Deviation

• The standard deviation (and mean/variance) allow us to learn something about an entire population from just a sample.
– Assuming a normal distribution.
– For example, if we took a sample of pro basketball players’ heights, we could generalize the raw data of our sample to the entire NBA.
• Key: The more samples we take, and therefore the more “means” we determine, the closer we’ll get to the actual mean of the entire league.


Standard Error

• The standard error of the means (SEM), or just plain standard error, is a way to determine how likely it is that our data is off from reality due to chance.
– Oddly a little like χ².
• Example: Consider the NBA player height survey.
• We could sample 10 players and get the average height, and get the standard deviation from that.
• However, if we continued to sample 10 players over and over and over again, the mean of our calculated means would start to become more like the true mean.
– Standard error of the means helps us figure out how close our calculated mean is to the true mean, even without knowing it.
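The repeated-sampling idea can be tried out in a small simulation. Everything numeric here is an assumption made for illustration: the league mean of 79 in, the SD of 3.5 in, and the 2,000 repeated surveys are not from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical league heights in inches (assumed values, not real NBA data)
TRUE_MEAN = 79.0
TRUE_SD = 3.5
SAMPLE_SIZE = 10    # players per survey, as in the slide
NUM_SAMPLES = 2000  # how many times the survey is repeated

sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)     # should sit near the true mean
spread_of_means = statistics.stdev(sample_means)  # how much the sample means vary
predicted_sem = TRUE_SD / SAMPLE_SIZE ** 0.5      # SD divided by the square root of n

print("mean of the sample means:", round(mean_of_means, 2))
print("spread of the sample means:", round(spread_of_means, 2))
print("SD / sqrt(n):", round(predicted_sem, 2))
```

The spread of the sample means should land close to SD / √n, which is the quantity the standard error estimates from a single sample.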


Standard Error

• Put it another way:
– If we survey 10 players, that’s a low number.
– Is it likely that those 10 players perfectly represent the league?
• Probably not.
– If we survey 300 players, that’s a high number.
– Is it likely that those 300 players perfectly represent the league?
• Probably.


Standard Error

• The formula for standard error should now make sense:
• s = standard deviation
• n = sample size
• The standard error is best when it is closest to 0.
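The formula itself is missing from this transcript. The usual definition, which also matches the SD and SEM values quoted later in this tutorial, is SEM = s / √n. A minimal sketch, applied to the five plant heights from the variance example (the function name is chosen just for this illustration):

```python
import math
from statistics import stdev

def standard_error(values):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return stdev(values) / math.sqrt(len(values))

heights_cm = [10, 7, 6, 8, 9]  # plants A through E
sem_cm = standard_error(heights_cm)

print("SD:", round(stdev(heights_cm), 2), "cm")
print("SEM:", round(sem_cm, 3), "cm")
```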


Standard Error vs. Standard Deviation

• Key: Standard deviation is the deviation of the raw data from the sample’s mean.
– Think of the deviation of an NBA player’s height from the average of a surveyed population.
• Key: Standard error is the deviation of the sample’s mean from the actual population’s mean.
– Think of the deviation of our surveyed population’s mean height from the true mean height of an NBA player from the entire league.


One last way to understand this…

• Remember the potato cores?
• You can calculate the average potato core mass, but that doesn’t tell us how consistent the mass was.
– That’s why we have standard deviation.
• Once you get a mean for your samples, it also doesn’t tell us if your set of potato cores was representative of all the cores I was slicing.
– That’s why we have standard error.


Standard Error vs. Standard Deviation

• Interpreting data:
– Generally you want standard deviation low.
– This means your underlying data set is more consistent.
• Why is that important?
– You definitely want standard error low.
– How can we minimize standard error?
• Have a low standard deviation (out of our control).
• Have a large sample size (in our control).


Confidence Intervals & Error Bars

• In addition to the inferences about data from before (68% within one SD, et cetera), we also can make inferences using SEM.
– These are more important for biology.
• Traditionally, 95% is the confidence we need in our data (just like in chi-squared analyses).
• For SEM, 95% confidence is a confidence interval represented on a graph as error bars.
– Let’s take a closer look.


Confidence Intervals & Error Bars

• Suppose you want to see if Central Bucks HS students are significantly taller than Council Rock HS students.
– You can’t do a χ² analysis because there’s no “expected.”
• So, you take the mean of some of the students from each district.
– You can’t measure all of them – that’d take forever.
• You get the SD and SEM as shown:

Team             Mean      Standard Deviation    Standard Error
Council Rock     72 in.    6 in.                 1.90 in.
Central Bucks    80 in.    4 in.                 1.26 in.

• Let’s graph the means.


[Figure: “Mean Height of High School Students”; y-axis: Height (in), 66 to 86; x-axis: District (Council Rock, Central Bucks).]


Confidence Intervals & Error Bars

• Okay, now let’s figure out a 95% confidence interval.
• The 95% confidence interval is traditionally ± 2 SEM about the mean.
• In this case:
– C. Rock = 72 in ± 3.80 in (since 1.90 in * 2 = 3.80 in)
– C. Bucks = 80 in ± 2.52 in (since 1.26 in * 2 = 2.52 in)
• Now let’s draw the intervals on the graph.

Team             Mean      Standard Deviation    Standard Error
Council Rock     72 in.    6 in.                 1.90 in.
Central Bucks    80 in.    4 in.                 1.26 in.
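The same ± 2 SEM arithmetic, written out as a short Python sketch using the means and standard errors from the table (the dictionary layout is just one convenient way to hold them):

```python
# District -> (mean height in inches, standard error in inches), from the table
districts = {
    "Council Rock": (72.0, 1.90),
    "Central Bucks": (80.0, 1.26),
}

intervals = {}
for name, (mean_in, sem_in) in districts.items():
    margin = 2 * sem_in  # the traditional +/- 2 SEM for 95% confidence
    intervals[name] = (mean_in - margin, mean_in + margin)
    low, high = intervals[name]
    print(f"{name}: {mean_in} in +/- {margin:.2f} in -> {low:.2f} to {high:.2f} in")
```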


[Figure: “Mean Height of High School Students” with 95% confidence intervals drawn as error bars; y-axis: Height (in), 66 to 86; x-axis: District (Council Rock, Central Bucks).]

The shapes are the 95% confidence intervals. Since they don’t overlap between the districts, there is probably a significant difference between the heights of the two.


Confidence Interval “Frame of Mind”

• When you construct a graph with confidence intervals and find they do overlap, it suggests insignificant (null) results.
– It’s possible that the real average height of ALL Council Rock students is actually equal to that of ALL Central Bucks students.
• This is also known as sampling error.
– In other words, there is some average height, within both confidence intervals, that could make the two teams equal.
• If there is no overlap, it suggests significance.
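The overlap check can be sketched as a tiny function. The first pair of intervals is the one worked out for the two districts (mean ± 2 SEM); the second pair is hypothetical, invented to show an overlapping case. Treat this as the rule of thumb from the slides, not a formal significance test.

```python
def intervals_overlap(a, b):
    """Return True if intervals a = (low, high) and b = (low, high) share any value."""
    return a[0] <= b[1] and b[0] <= a[1]

# 95% confidence intervals from the district example (mean +/- 2 SEM), in inches
council_rock = (68.20, 75.80)
central_bucks = (77.48, 82.52)
districts_overlap = intervals_overlap(council_rock, central_bucks)

# Hypothetical pair that does overlap, for contrast
hypothetical_overlap = intervals_overlap((70.0, 78.0), (76.0, 84.0))

print("districts overlap:", districts_overlap)
print("hypothetical pair overlaps:", hypothetical_overlap)
```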


Practice

• Standard Deviation and Standard Error Procedural Practice


Practice

• How else are we going to practice standard deviation and standard error?
– With your data!
• Find in your lab notebooks the measurements you took on potato core size.
• Calculate the standard deviation and standard error for your data set with your lab group.
– See why I had you take their masses individually?


Practice

• Calculate standard deviation:
– What is the SD of your set of three cores before the study and the SD of your three cores afterward?
• Calculate the standard error:
– For each set of data, how likely is it that our average potato mass was close to the actual average potato mass of all the slices I cut for our lab?
• No error bars needed.
• Last Key Note: Your units for SD and SE match the units of the mean (here it’s grams).