lecture 4 variability: standard deviation. variability reminder - how spread out the scores...

Lecture 4

Variability: Standard Deviation

Variability

Reminder: variability is how spread out the scores are.
Range: how does the range of each of these distributions vary? What about the interquartile range?

Measure of error: is our sample similar to the population, OR is an individual score representative of its sample?

Standard Deviation
Standard deviation: the average distance on either side of the mean. The goal of the SD is to measure the standard, or typical, distance from the mean.
– Measuring each distance by eye is not practical with a large N, so we compute the variance and standard deviation using equations.

[Figure: bar chart of height (in.) for Ben, Tom, Bill, James, and Matt]

• Mean = 70.8

•Ben is 66 in. tall. His deviation from the mean is -4.8.

•James is 75 in. tall. His deviation from the mean is +4.2.

How much scores typically vary around the mean; a measure of dispersion.
Usually about 1/5 to 1/6 of the range.
Based on the mean, therefore it:
– Requires at least interval data
– Is sensitive to outliers
– Accounts for all scores in a distribution

Standard Deviation

[Figure: frequency distribution (f) of X = 1 to 9, with the mean (M) marked]

Logic of the Standard Deviation: let's start by looking at the population.
Step 1: Find the deviation of each score from the mean, $X - \mu$. Be sure to include both the sign (+/−) and the number.

X | X − μ
--- | ---
65 | −14
90 | +11
84 | +5
76 | −3
81 | +2
98 | +19
82 | +3
56 | −23
μ = 79 | Σ = 0

* Notice that the sum of the deviations = 0. This reflects the fact that the mean is the balancing point of the distribution.

* Bonus: you can use this fact to check your work.
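A minimal Python sketch of Step 1 for the eight scores in the table above (μ = 79): it computes each signed deviation and confirms that they sum to zero, which is the self-check the bonus note describes. The variable names are illustrative only.

```python
# Step 1 sketch: signed deviations from the population mean.
scores = [65, 90, 84, 76, 81, 98, 82, 56]

mu = sum(scores) / len(scores)          # population mean
deviations = [x - mu for x in scores]   # keep the sign (+/-)

print(mu)               # 79.0
print(deviations)
# The mean is the balancing point, so the deviations cancel out.
print(sum(deviations))  # 0.0
```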

Step 2: Remember that the standard deviation is meant to be the average distance from the mean, but simply averaging the deviations won't work because the sum of the deviations is always 0.
– Solution: get rid of the signs (+/−)
– Square each deviation

Square each deviation and sum them. The result is the Sum of Squared Deviations, $SS = \sum (X - \mu)^2$.

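X | X − μ | (X − μ)²
--- | --- | ---
65 | −14.4 | 207.4
90 | +10.6 | 112.4
84 | +4.6 | 21.2
76 | −3.4 | 11.6
81 | +1.6 | 2.6
98 | +18.6 | 346.0
82 | +2.6 | 6.8
59 | −20.4 | 416.2
μ = 79.4 | Σ = 0 | SS = 1123.9

* Sum of Squared Deviations = SS. (μ = 635 / 8 = 79.375; the totals use the unrounded mean, so the rounded entries do not add up exactly.)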
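A short Python sketch of Step 2 for the table above: square each deviation and add them up. It uses the unrounded mean (79.375), so the individual squares differ slightly from the rounded table entries while the total matches SS = 1123.9.

```python
# Step 2 sketch: square each deviation, then sum them to get SS.
scores = [65, 90, 84, 76, 81, 98, 82, 59]

mu = sum(scores) / len(scores)                  # 79.375 (shown as 79.4)
squared_devs = [(x - mu) ** 2 for x in scores]  # squaring removes the signs
ss = sum(squared_devs)                          # sum of squared deviations

print(round(mu, 1), round(ss, 1))               # 79.4 1123.9
```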

Step 3: Calculate the mean squared deviation, SS / N.

This value is called the variance and is represented with the symbol MS or $\sigma^2$.

Variance will be important for inferential statistics methods, but it isn't the best descriptive statistic:

-- it's hard to visualize variability with the variance alone, because it is expressed in squared units.


MS = 1123.9 / 8 = 140.5


Step 4: Correct for having squared all the deviations, because we want a value in the same units as the mean that we can easily visualize:
– Standard deviation = $\sqrt{\text{variance}}$


$\sigma = \sqrt{140.5} = 11.9$. Standard deviation = the square root of the mean squared deviation.

Conceptually, it is the typical distance from the mean: a score drawn at random from this distribution will typically lie about 11.9 points from the mean.
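Steps 3 and 4 can be sketched together in Python: divide SS by N to get the variance, then take the square root. As an assumption-free cross-check, the result is compared with `statistics.pvariance` and `statistics.pstdev` from the standard library, which use the same population (divide-by-N) definitions.

```python
import math
import statistics

# Steps 3-4 sketch: variance = SS / N, standard deviation = sqrt(variance).
scores = [65, 90, 84, 76, 81, 98, 82, 59]
n = len(scores)

mu = sum(scores) / n
ss = sum((x - mu) ** 2 for x in scores)
variance = ss / n                 # mean squared deviation (sigma squared)
sd = math.sqrt(variance)          # back in the original units

print(round(variance, 1), round(sd, 1))   # 140.5 11.9

# Cross-check against the standard library's population versions.
assert math.isclose(variance, statistics.pvariance(scores))
assert math.isclose(sd, statistics.pstdev(scores))
```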

Putting it Together


$\sigma = 11.9$. What can we say about a score that lies 12 points from the mean (e.g., a score of 91)?

What about a score that lies 30 points from the mean (e.g., a score of 49)?

REVIEW: variance = mean squared deviation = $\sigma^2$ (Greek lowercase sigma, squared) $= SS / N$

Standard deviation $= \sigma = \sqrt{SS / N}$

Computing SS:

– Definitional formula: $SS = \sum (X - \mu)^2$

Shows exactly how scores vary about the mean (as we just did). Works best when the mean is a whole number.

– Computational formula: $SS = \sum X^2 - \frac{(\sum X)^2}{N}$

Easier for calculations because it works directly with the scores, but it is less intuitive about how the scores relate to the mean.
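A small Python sketch showing that the definitional and computational formulas give the same SS for the eight-score population example. The function names are hypothetical, chosen only for this illustration.

```python
def ss_definitional(scores):
    """SS = sum of (X - mu)^2: subtract the mean first, then square."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)


def ss_computational(scores):
    """SS = sum(X^2) - (sum X)^2 / N: works directly with the raw scores."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


scores = [65, 90, 84, 76, 81, 98, 82, 59]
print(ss_definitional(scores))   # 1123.875
print(ss_computational(scores))  # 1123.875
```

On paper the computational form saves work; on a computer, though, it can lose precision when the scores are large relative to their spread, because it subtracts two large, nearly equal sums.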

Population Standard Deviation

Formulas for Pop. SD and Variance

Variance $= \sigma^2 = SS / N$ (mean squared deviation)

Standard deviation $= \sigma = \sqrt{SS / N}$

Denoted by the Greek letters $\sigma$ and $\sigma^2$

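Let's Do It Together

X | X − μ | (X − μ)² | X²
--- | --- | --- | ---
24 | −17.4 | 302.8 | 576
24 | −17.4 | 302.8 | 576
28 | −13.4 | 179.6 | 784
32 | −9.4 | 88.4 | 1024
33 | −8.4 | 70.6 | 1089
48 | +6.6 | 43.6 | 2304
64 | +22.6 | 510.8 | 4096
42 | +0.6 | 0.36 | 1764
38 | −3.4 | 11.6 | 1444
67 | +25.6 | 655.4 | 4489
55 | +13.6 | 185 | 3025
ΣX = 455 | Σ = 0 | SS = 2351 | ΣX² = 21171

N = 11, μ = 455 / 11 = 41.4
(ΣX)² = 207025
σ² = 213.7
σ = 14.6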

Definitional: $SS = \sum (X - \mu)^2$

Computational: $SS = \sum X^2 - \frac{(\sum X)^2}{N}$
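A Python sketch that rebuilds the "Let's Do It Together" worksheet row by row and then applies the computational formula. Note that the computational formula gives SS ≈ 2350.5; the table's 2351 comes from adding the rounded squared deviations. Both lead to σ² = 213.7 and σ = 14.6.

```python
import math

# Sketch: rebuild the worked table (population formulas, divide by N).
scores = [24, 24, 28, 32, 33, 48, 64, 42, 38, 67, 55]
n = len(scores)
mu = sum(scores) / n

print("X    X-mu    (X-mu)^2    X^2")
for x in scores:
    d = x - mu
    print(f"{x:<4} {d:>6.1f} {d * d:>10.1f} {x * x:>7}")

sum_x = sum(scores)                  # sum of X
sum_x2 = sum(x * x for x in scores)  # sum of X squared
ss = sum_x2 - sum_x ** 2 / n         # computational formula
variance = ss / n
sd = math.sqrt(variance)
print(sum_x, sum_x2, round(variance, 1), round(sd, 1))  # 455 21171 213.7 14.6
```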

Another Example… Find $\sigma$ for the following sets of numbers:

X = 1, 7, 7, 9

X = 1, 6, 1, 1, 1, 1

Worksheet columns: X, X², (ΣX)², σ², σ

X: 10, 15, 17, 21, 24, 31, 13

Definitional: $SS = \sum (X - \mu)^2$

Computational: $SS = \sum X^2 - \frac{(\sum X)^2}{N}$
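One way to check your hand calculations for these exercises is Python's `statistics` module, whose `pvariance` and `pstdev` divide by N just like the population formulas here. The labels "set A/B/C" are hypothetical, and reading the worksheet column (10, 15, 17, ...) as a third data set is an assumption. No answers are printed in the comments, so work the problems by hand first.

```python
import statistics

# Sketch for checking the exercise answers (population formulas, divide by N).
exercise_sets = {
    "set A": [1, 7, 7, 9],
    "set B": [1, 6, 1, 1, 1, 1],
    "set C": [10, 15, 17, 21, 24, 31, 13],   # assumed from the worksheet column
}

for name, xs in exercise_sets.items():
    var = statistics.pvariance(xs)   # sigma squared = SS / N
    sd = statistics.pstdev(xs)       # sigma
    print(name, round(var, 2), round(sd, 2))
```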

Samples vs. Populations

Rationale: Inferential statistics rely on samples to draw general conclusions about the population.

– PROBLEM: sample variability tends to be less than population variability.

– Thus, this variability is biased; that is, it underestimates the population variability.

[Figure: the spread of a population (pop. variability) compared with the smaller spread of a sample drawn from it (sample variability)]

Terms

Biased: a sample statistic is said to be biased if, on average, it consistently underestimates or overestimates the population parameter.

Unbiased: a sample statistic is said to be unbiased if, on average, it is equal to the population parameter.

An Analogy for a Biased Stat

Imagine you were interested in studying learning in elementary school children.

– What if you chose as your sample child geniuses from a computer and science camp?

– Could you generalize from your sample to the population of elementary school children?

Unlike this analogy, the sample variability computed with the population formula is biased even with a representative sample, so we have to perform a correction.

Samples: s and s²

Changes in notation to reflect a sample:

– To calculate SS (same as for a population):
(1) Find each deviation: $X - M$
(2) Square each deviation: $(X - M)^2$
(3) Sum the squared deviations: $SS = \sum (X - M)^2$

Correcting for the bias is done in the calculation of the mean squared deviation, or variance:
– Sample variance: $s^2 = \frac{SS}{n - 1}$
– Sample standard deviation: $s = \sqrt{\frac{SS}{n - 1}}$, or $s = \sqrt{s^2}$
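A Python sketch of the sample formulas: SS is computed exactly as for a population, and the n − 1 correction is applied only when dividing. The sample {5, 10, 15} (M = 10) is used for illustration. The hand-written result is checked against `statistics.variance` and `statistics.stdev`, which use n − 1.

```python
import math
import statistics

def sample_variance(scores):
    """s^2 = SS / (n - 1); SS itself is computed exactly as for a population."""
    n = len(scores)
    m = sum(scores) / n                       # sample mean M
    ss = sum((x - m) ** 2 for x in scores)    # no correction inside SS
    return ss / (n - 1)                       # the correction happens here

sample = [5, 10, 15]                          # illustrative sample, M = 10
s2 = sample_variance(sample)
s = math.sqrt(s2)
print(s2, s)                                  # 25.0 5.0
print(statistics.pvariance(sample))           # dividing by n instead: smaller

assert math.isclose(s2, statistics.variance(sample))
assert math.isclose(s, statistics.stdev(sample))
```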

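Let's Do It Together

[Figure: frequency distribution (f) of the sample scores on an X axis from 1 to 9]

X | X²
--- | ---
4 | 16
5 | 25
6 | 36
6 | 36
6 | 36
7 | 49
7 | 49
7 | 49
8 | 64
8 | 64
8 | 64
8 | 64
9 | 81
9 | 81
ΣX = 98 | ΣX² = 714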

M = 98 / 14 = 7. Apart from the scores that sit exactly at the mean, the smallest distance from the mean is 1 and the largest is 3, so s should be somewhere in between.

$SS = \sum X^2 - \frac{(\sum X)^2}{n} = 714 - \frac{98^2}{14} = 714 - 686 = 28$

* NOTE: do not correct for bias in SS.

$s^2 = MS = \frac{SS}{n - 1} = \frac{28}{13} = 2.2$

$s = \sqrt{2.2} = 1.5$
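The same sample calculation can be sketched in Python directly from a frequency table (X: f), reading the frequencies off the worked table above. ΣX and ΣX² are accumulated as f·X and f·X². The slide's 2.2 and 1.5 are these values rounded to one decimal.

```python
import math

# Sketch: sample variance and SD from a frequency table {X: f}.
freq = {4: 1, 5: 1, 6: 3, 7: 3, 8: 4, 9: 2}

n = sum(freq.values())                            # number of scores
sum_x = sum(x * f for x, f in freq.items())       # sum of X
sum_x2 = sum(x * x * f for x, f in freq.items())  # sum of X squared

ss = sum_x2 - sum_x ** 2 / n      # computational formula, no bias correction
s2 = ss / (n - 1)                 # correction for bias happens here
s = math.sqrt(s2)
print(n, sum_x, sum_x2, ss)       # 14 98 714 28.0
print(round(s2, 2), round(s, 2))  # 2.15 1.47
```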

Start Easy: Find s

X = 5, 1, 5, 5

X = 1, 7, 1, 1

• NOTE: do not correct for bias in SS.

$SS = \sum X^2 - \frac{(\sum X)^2}{n}$

$s^2 = MS = \frac{SS}{n - 1}$

$s = \sqrt{s^2}$

A little more complex

X | X²
--- | ---
322.84 | 104223.10
336.63 | 113319.68
368.80 | 136011.73
276.84 | 76638.36
512.20 | 262348.68
285.05 | 81251.07
239.68 | 57446.63
262.86 | 69094.12
302.13 | 91283.72
300.12 | 90071.31
326.62 | 106683.06
257.65 | 66383.09
429.81 | 184733.66
291.71 | 85093.66
263.15 | 69247.72
323.49 | 104644.41
ΣX = 5099.6 | ΣX² = 1698474.01

$SS = \sum X^2 - \frac{(\sum X)^2}{n} = 1698474.01 - \frac{26005920.2}{16} = 73104$

$MS = s^2 = \frac{SS}{n - 1} = \frac{73104}{15} = 4873.6$

$s = \sqrt{\frac{SS}{n - 1}} = 69.8$
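A Python sketch for this larger example, using the X values exactly as printed (two decimals). The SS it produces (about 73130) is slightly above the slide's 73104, apparently because the slide's X² column was computed from unrounded scores and ΣX was rounded to 5099.6. The standard deviation still rounds to 69.8.

```python
import math
import statistics

# Sketch: sample SD for the 16 scores listed in the table.
scores = [322.84, 336.63, 368.80, 276.84, 512.20, 285.05, 239.68, 262.86,
          302.13, 300.12, 326.62, 257.65, 429.81, 291.71, 263.15, 323.49]
n = len(scores)

m = sum(scores) / n
# Definitional (two-pass) form: avoids subtracting two very large sums.
ss = sum((x - m) ** 2 for x in scores)
s = math.sqrt(ss / (n - 1))
print(round(s, 1))                       # 69.8

assert math.isclose(s, statistics.stdev(scores))
```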

Sample Variability and Degrees of Freedom: Why do we correct with n − 1?

(1) The deviations computed from a sample are not "real" deviations: they are measured from M, not from μ.

– Sampling error: the sample and population are close, but not exact.
– SS is smaller for the sample (this can be shown mathematically).
– Using a sample mean places a restriction on the variability.

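Deviations from the population mean (where μ = 8):

X | X − μ | (X − μ)²
--- | --- | ---
12 | +4 | 16
8 | 0 | 0
10 | +2 | 4
 | | SS = 20

Deviations from the sample mean (where M = 10):

X | X − M | (X − M)²
--- | --- | ---
12 | +2 | 4
8 | −2 | 4
10 | 0 | 0
 | | SS = 8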
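A Python sketch of the comparison in these tables: the same three scores give a smaller SS when deviations are measured from the sample mean M rather than from μ. The helper `ss_about` is a hypothetical name. The loop also illustrates that no other center gives a smaller SS than M, which is why SS computed around M tends to understate the true spread.

```python
# Sketch: SS measured around mu versus around the sample mean M.
scores = [12, 8, 10]
mu = 8                                   # population mean from the example
m = sum(scores) / len(scores)            # sample mean M = 10

def ss_about(center, xs):
    """Sum of squared deviations of xs around an arbitrary center."""
    return sum((x - center) ** 2 for x in xs)

print(ss_about(mu, scores))   # 20
print(ss_about(m, scores))    # 8.0

# M minimizes SS: any other center gives a larger value.
for c in (9, 10.5, 11):
    assert ss_about(c, scores) > ss_about(m, scores)
```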

More about n − 1

The sample mean must be known before deviations and SS can be computed.

A sample of n = 3 has M = 10. Therefore, as soon as the first two values are given (X = 12, 8), you know the last value is 10.

n − 1 scores can vary; the last score is not free to vary.


Degrees of Freedom

df is commonly encountered as n − 1, where n is the number of scores in the sample.

It refers to the number of scores in a distribution that are free to vary once M and n are set.

Example: {5, 10, 15}; n = 3; M = 10

How many scores could you change and still have n = 3 and M = 10?

n − 1, or 2

So, $s^2 = \frac{SS}{n - 1} = \frac{SS}{df}$
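A tiny Python sketch of "free to vary": once n and M are fixed, choosing any n − 1 scores forces the last one, because the scores must total n × M. The function `last_score` is a hypothetical helper for this illustration.

```python
# Sketch: with n and M fixed, only n - 1 scores are free to vary.
n, M = 3, 10

def last_score(free_scores, n, M):
    """Given the first n - 1 scores, the final one is forced: n*M - sum."""
    assert len(free_scores) == n - 1
    return n * M - sum(free_scores)

print(last_score([5, 10], n, M))   # 15
print(last_score([12, 8], n, M))   # 10
print(last_score([0, 0], n, M))    # 30
```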

Cafeteria degrees of freedom: An analogy

You are 4th in line at the cafeteria to choose your dessert. The choices are a cheesecake, a piece of fruit (an apple), pumpkin pie, and a stale cookie.
– The first person chooses the cheesecake
– The next person takes the apple
– Then someone takes the pumpkin pie
– The last choice is restricted and can't vary.

You are stuck with the stale cookie.

Degrees of Freedom: Why n − 1?

– Because you are estimating μ from M. Once this is done, the estimate is fixed and cannot be changed. Therefore, only n − 1 scores can vary with this fixed value.

This is the case whenever we estimate a parameter from a statistic.

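A little more about biased stats

Population, N = 6: (0, 0, 3, 3, 9, 9); μ = 4, σ² = 14. Take all possible n = 2 samples (sampling with replacement from the values 0, 3, and 9).

Sample | First score | Second score | Mean | Biased variance (SS / n) | Unbiased variance (SS / (n − 1))
--- | --- | --- | --- | --- | ---
1 | 0 | 0 | 0 | 0 | 0
2 | 0 | 3 | 1.5 | 2.25 | 4.5
3 | 0 | 9 | 4.5 | 20.25 | 40.5
4 | 3 | 0 | 1.5 | 2.25 | 4.5
5 | 3 | 3 | 3 | 0 | 0
6 | 3 | 9 | 6 | 9 | 18
7 | 9 | 0 | 4.5 | 20.25 | 40.5
8 | 9 | 3 | 6 | 9 | 18
9 | 9 | 9 | 9 | 0 | 0
Total | | | 36 | 63 | 126
Average (÷ 9) | | | 4 | 7 | 14

The sample means average out to μ = 4, and the unbiased variances average out to σ² = 14; the biased variances average only 7.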
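A Python sketch that enumerates the same nine n = 2 samples with `itertools.product` and averages the sample means, the divide-by-n variances, and the divide-by-(n − 1) variances. It reproduces the table's conclusion: only the n − 1 version averages out to σ² = 14.

```python
from itertools import product

# Sketch: all ordered n = 2 samples (with replacement) from 0, 3, 9.
values = [0, 3, 9]
n = 2

means, biased, unbiased = [], [], []
for sample in product(values, repeat=n):
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    means.append(m)
    biased.append(ss / n)          # divides by n
    unbiased.append(ss / (n - 1))  # divides by n - 1

k = len(means)                     # 9 samples
print(sum(means) / k, sum(biased) / k, sum(unbiased) / k)  # 4.0 7.0 14.0
```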

Properties of the Standard Deviation

Distribution:
– Homogeneous sample: data values are very similar = small s² and s.
– Heterogeneous sample: data values are dissimilar = large s² and s.

Helps make predictions about the amount of error in your sample: how close is your sample to the population?

Properties of the Standard Deviation: Transforming scores

Adding or subtracting a constant does not change the SD.

[Figure: a frequency distribution before and after a constant is added to every score; the distribution shifts, but its spread is unchanged]

Another way to determine whether the SD is affected by a constant is to pick any two scores and calculate the distance between them both before and after applying the constant.

e.g., you and a friend compare scores on an exam: your friend earned an 85 and you earned a 90. Later you find out that a 5-point curve was added to everyone's score. The scores become 90 and 95, and the distance between them is still 5 points.

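Properties of the Standard Deviation: Transforming scores

Multiplying or dividing every score by a constant multiplies or divides the SD by that same constant.

Another way to determine whether the SD is affected by a constant is to pick any two scores and calculate the distance between them both before and after applying the constant.

[Figure: a frequency distribution on X = 1 to 4 and the same distribution multiplied by 10 (X = 10 to 40), whose spread is 10 times larger]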
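A Python sketch of both transformation rules, using illustrative scores 1 to 4 (an assumption, echoing the figure's axis) and the exam example. `statistics.pstdev` computes the population SD.

```python
import statistics

# Sketch: how transforming every score affects the SD.
scores = [1, 2, 3, 4]                                  # illustrative scores

sd = statistics.pstdev(scores)
shifted = statistics.pstdev([x + 5 for x in scores])   # add a constant
scaled = statistics.pstdev([x * 10 for x in scores])   # multiply by a constant

print(round(sd, 3), round(shifted, 3), round(scaled, 3))  # 1.118 1.118 11.18

# Exam example: a 5-point curve leaves the gap between two scores unchanged.
you, friend = 90, 85
print(you - friend, (you + 5) - (friend + 5))   # 5 5
```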

Factors that affect Variability

Extreme Scores:
– Range is most affected
– SD and variance are somewhat affected
– SIR (semi-interquartile range) is not affected

Sample Size:
– Range is directly related to sample size: larger samples tend to have larger ranges. This is unacceptable for a measure of variability.
– SD, variance, and SIR are relatively unaffected by sample size

Open-ended Distributions:
– Cannot compute range, SD, or variance
– SIR is your only option

Relationship with other Statistics

SD is derived using information about the mean (distances from it), so the two go hand in hand.

Interquartile range (and SIR) are based on percentiles, and so is the median (the median is the 50th percentile).

Range has no direct relationship with any other statistical measure.

Why we need to know this information

Variability influences how easy it is to see patterns in our data….

Estimate M for each sample:

Sample 1 | Sample 2
--- | ---
34 | 26
35 | 10
36 | 64
35 | 40
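A Python sketch comparing the two samples above: they have the same mean, but very different sample standard deviations, which is why the mean of Sample 2 is a much shakier estimate.

```python
import statistics

# Sketch: same mean, very different spread.
sample_1 = [34, 35, 36, 35]
sample_2 = [26, 10, 64, 40]

for name, xs in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    m = sum(xs) / len(xs)
    s = statistics.stdev(xs)        # sample SD (divides by n - 1)
    print(name, m, round(s, 2))
# Sample 1 35.0 0.82
# Sample 2 35.0 22.89
```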

Why we need to know this information

Keep the goal in mind:

– Research uses samples to deduce information about the population

– Consider the data from two experiments and determine whether or not there appears to be a consistent difference

[Figure: Experiment 1 and Experiment 2, each showing frequency distributions (f, scores 5 to 60) for Talk therapy (M = 20) and Meditation (M = 40)]

Graphical Representation of σ

[Figure: frequency distribution (f) of X = 1 to 9 with σ = 1.58 marked on either side of the mean]

Graphic Representation: Box Plots

Also called box-and-whisker plots. Useful for:
– comparing distributions
– displaying variability

The box defines the interquartile range:
– The top line defines the third quartile
– The bottom line defines the first quartile

Whiskers extend out to the highest and lowest scores.

The median is often displayed as a line inside the box.
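A Python sketch of the five numbers a box plot like this is drawn from, using the 14-score sample from the earlier worked example. Textbooks and software compute quartiles in slightly different ways; `statistics.quantiles(..., method="inclusive")` is one standard choice and is an assumption here, not necessarily the method your textbook uses.

```python
import statistics

# Sketch: five-number summary behind a box plot (whiskers at min and max).
scores = [4, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9]

q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
summary = {
    "lowest (whisker)": min(scores),
    "Q1 (bottom of box)": q1,
    "median (line in box)": median,
    "Q3 (top of box)": q3,
    "highest (whisker)": max(scores),
}
for label, value in summary.items():
    print(f"{label}: {value}")
print("IQR:", q3 - q1)
```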

Graphic Representation - Boxplots

Pearson's Coefficient of Skew

Pearson's coefficient of skew tells us whether a distribution is positively or negatively skewed, and by how much (values between −0.5 and +0.5 indicate an approximately symmetric/normal distribution).

s3 = [3(M − Mdn)] / s

M = 20, s = 5, Mdn = 24

s3 = [3(20 − 24)] / 5 = −12 / 5 = −2.4

Negatively skewed
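A Python sketch of Pearson's coefficient of skew together with the ±0.5 rule of thumb from above. The function names `pearson_skew` and `describe` are hypothetical.

```python
def pearson_skew(mean, median, sd):
    """Pearson's coefficient of skew: 3(M - Mdn) / s."""
    return 3 * (mean - median) / sd

def describe(skew, cutoff=0.5):
    """Label a skew value using the +/-0.5 rule of thumb."""
    if skew > cutoff:
        return "positively skewed"
    if skew < -cutoff:
        return "negatively skewed"
    return "approximately symmetric"

sk = pearson_skew(20, 24, 5)
print(sk, describe(sk))   # -2.4 negatively skewed
```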

Try one: M = 50, Mdn = 30, s = 7

s3 = [3(M − Mdn)] / s

Putting it all together…

Find Pearson's coefficient of skew for this frequency table:

X | f
--- | ---
1 | 1
2 | 1
3 | 1
4 | 1
5 | 1
6 | 2
7 | 4
8 | 5
9 | 6
10 | 9
11 | 11
12 | 6
13 | 2

s3 = [3(M − Mdn)] / s

For this table, s = 2.74
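A Python sketch for checking this exercise: expand the frequency table into individual scores, compute M and the median, and apply the formula with s = 2.74 as given. It also confirms that 2.74 matches the sample SD of the table. Taking the median from the ungrouped scores with `statistics.median` is an assumption; an interpolated median for grouped data would give a somewhat different coefficient. The result is left unprinted in the comments so you can work it by hand first.

```python
import statistics

# Sketch for the frequency-table exercise (median from ungrouped scores).
freq = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 4,
        8: 5, 9: 6, 10: 9, 11: 11, 12: 6, 13: 2}

scores = [x for x, f in freq.items() for _ in range(f)]   # expand the table
m = sum(scores) / len(scores)
mdn = statistics.median(scores)
s = 2.74                                                  # as given

print("sample SD check:", round(statistics.stdev(scores), 2))
skew = 3 * (m - mdn) / s
print(len(scores), m, mdn, round(skew, 2))
```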

Homework: Chapter 4

1, 3, 4, 6, 8, 11, 12, 14, 19, 20, 23, 24, 25

Read IN THE LITERATURE pg 122-123.

Skim Chapter 6 pages 161 - 166; section on Probability.

** BRING YOUR TEXTBOOKS TO CLASS TOMORROW **
