1.2: describing distributions with numbers


Upload: nolan-copeland

Post on 01-Jan-2016


DESCRIPTION

1.2: Describing Distributions with Numbers. A. Measures of Location (Center): Mean, Median. B. Measures of Spread (Variability): Quartiles (Quantiles), Variance and Standard Deviation.

TRANSCRIPT

Page 1: 1.2: Describing Distributions              with Numbers
Page 2: 1.2: Describing Distributions              with Numbers

1.2: Describing Distributions with Numbers

A. Measures of Location (Center)
   1. Mean
   2. Median

B. Measures of Spread (Variability)
   1. Quartiles (Quantiles)
   2. Variance and Standard Deviation

Page 3: 1.2: Describing Distributions              with Numbers

Measures of Location

1. Mean (Average)

How to find the mean (average):

1) Add the values together

2) Divide the total by the number of observations

• Example: Test Scores: 56, 65, 54, 55, 57, 54, 61, 62, 60, 55, 57, 56, 57, 61, 62, 60, 49, 66, 59, 80

Step 1: 56 + 65 + 54 + … + 59 + 80 = 1186

Step 2: 1186 / 20 = 59.3

Page 4: 1.2: Describing Distributions              with Numbers

Mean

To find the mean $\bar{x}$ of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, x_3, \ldots, x_n$, their mean is:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

Or, in more compact notation:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
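The two steps for the mean can be sketched in a few lines of Python. This is only an illustration using the test scores from the example; the function name `mean` is a label chosen here, not something from the slides.

```python
# Test scores from the slide example.
scores = [56, 65, 54, 55, 57, 54, 61, 62, 60, 55,
          57, 56, 57, 61, 62, 60, 49, 66, 59, 80]


def mean(values):
    """Add the values together, then divide by the number of observations."""
    return sum(values) / len(values)


total = sum(scores)    # Step 1: the sum of all 20 scores
x_bar = mean(scores)   # Step 2: the sum divided by n = 20
print(total, x_bar)
```

Running it reproduces the slide's figures: a total of 1186 and a mean of 59.3.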

Page 5: 1.2: Describing Distributions              with Numbers

2. Median

How to find the median M:

1) Arrange the observations in order from smallest to largest.

2) If the number of observations is odd, then the median is located at the center of the list. So, if there are n observations, then the median is located in spot (n + 1) / 2.

3) If the number of observations is even, then the median is the average of the two terms in the middle spots. These are located in spots (n / 2) and (n / 2) + 1.

Page 6: 1.2: Describing Distributions              with Numbers

Median

Example of finding a Median:

List 1: 2, 4, 6, 3, 5, 2, 6, 8, 10, 11, 1

Step 1: Order the list:

1, 2, 2, 3, 4, 5, 6, 6, 8, 10, 11

Step 2: Find the middle term: (n + 1) / 2 = (11 + 1) / 2 = 6, so the median is the sixth term in the ordered list.

Median = 5

Page 7: 1.2: Describing Distributions              with Numbers

Median

Example of finding a Median:

List: 2, 4, 6, 3, 5, 2, 6, 8, 10, 11, 1, 12

Step 1: Order the list:

1, 2, 2, 3, 4, 5, 6, 6, 8, 10, 11, 12

Step 2: Find the two middle terms:

n / 2 = 12 / 2 = 6 and (n / 2) + 1 = (12 / 2) + 1 = 7

Step 3: Average the sixth and seventh terms:

Median = (5 + 6) / 2 = 5.5
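The odd and even cases of the median rule can be written out as a short Python sketch. The function name `median` and the 0-based index arithmetic are choices made for this illustration; the spots quoted in the comments are the 1-based positions used on the slides.

```python
def median(values):
    """Median by the slide's rule: sort, then take the middle spot(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the median sits in spot (n + 1) / 2 (1-based).
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the terms in spots n / 2 and (n / 2) + 1 (1-based).
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


list_odd = [2, 4, 6, 3, 5, 2, 6, 8, 10, 11, 1]
list_even = list_odd + [12]
print(median(list_odd))   # 5
print(median(list_even))  # 5.5
```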

Page 8: 1.2: Describing Distributions              with Numbers

In The Presence Of Outliers

Q: Do outliers affect the Mean and Median?

Consider the list of numbers from 1 through 9:

1, 2, 3, 4, 5, 6, 7, 8, 9

The Mean is: 5 The Median is: 5

What if we put the number 100 at the end of the list?

1, 2, 3, 4, 5, 6, 7, 8, 9, 100

The Mean is: 14.5 The Median is: 5.5

A: Outliers affect the Mean much more than the Median !
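The comparison can be checked with Python's standard `statistics` module. This is a small sketch of the same experiment; the variable names are chosen here for illustration.

```python
from statistics import mean, median

base = list(range(1, 10))      # 1, 2, ..., 9
with_outlier = base + [100]    # same list with the outlier 100 appended

# How far each measure moves when the outlier is added.
mean_shift = mean(with_outlier) - mean(base)        # 14.5 - 5
median_shift = median(with_outlier) - median(base)  # 5.5 - 5

print(mean(base), median(base))
print(mean(with_outlier), median(with_outlier))
print(mean_shift, median_shift)
```

The mean moves by 9.5 while the median moves by only 0.5, which is the sense in which the median is resistant.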

Page 9: 1.2: Describing Distributions              with Numbers

Distributions

The mean is the point at which a histogram balances. For symmetric distributions the mean and median will be nearly the same.

However, since the mean is influenced by outliers, for skewed distributions the mean will be pulled in the direction of the long tail while the median will be resistant to the outliers and remain in nearly the same place.

Page 10: 1.2: Describing Distributions              with Numbers

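Skewed Right

[Figure: right-skewed distribution; the median M lies to the left of the mean $\bar{x}$, which is pulled toward the long right tail.]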

Page 11: 1.2: Describing Distributions              with Numbers

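Skewed Left

[Figure: left-skewed distribution; the mean $\bar{x}$ lies to the left of the median M, pulled toward the long left tail.]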

Page 12: 1.2: Describing Distributions              with Numbers

Describing Spread

The Five Number Summary:

1) The Median

2) First Quartile : 25% of the observations lie below the First Quartile

3) Third Quartile : 75% of the observations lie below the third quartile

4) Lowest Individual Observation (Minimum)

5) Highest Individual Observation (Maximum)

Page 13: 1.2: Describing Distributions              with Numbers

Quartiles

Calculating the Quartiles:

1) Arrange the observations in increasing order and locate the Median M in the ordered list of observations.

2) The First Quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.

3) The Third Quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.

Page 14: 1.2: Describing Distributions              with Numbers

Quartiles

Example of calculating First Quartile :

List of quiz scores: 10, 8, 9, 4, 6, 6, 8, 9, 2, 7

1) Order the list: 2, 4, 6, 6, 7, 8, 8, 9, 9, 10

Find the median: (7 + 8) / 2 = 7.5

2) Find all the observations whose position in the list is to the left of the median: 2, 4, 6, 6, 7

Find the median of these values : 6

Page 15: 1.2: Describing Distributions              with Numbers

Quartiles

Example of calculating Third Quartile :

List of quiz scores: 10, 8, 9, 4, 6, 6, 8, 9, 2, 7, 11

1) Order the list: 2, 4, 6, 6, 7, 8, 8, 9, 9, 10, 11

Find the median: 8

2) Find all the observations whose position in the list is to the right of the median: 8, 9, 9, 10, 11

Find the median of these values : 9
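The quartile rule used in these two examples (the median of the observations on each side of the overall median's location) can be sketched as follows. The function names are hypothetical, and library routines such as `statistics.quantiles` use other interpolation conventions, so they may not match these hand calculations exactly.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves rule from the slides."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the median itself is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (median_of_sorted(lower),
            median_of_sorted(ordered),
            median_of_sorted(upper))


quiz_10 = [10, 8, 9, 4, 6, 6, 8, 9, 2, 7]
quiz_11 = quiz_10 + [11]
print(quartiles(quiz_10))  # (6, 7.5, 9)
print(quartiles(quiz_11))  # (6, 8, 9)
```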

Page 16: 1.2: Describing Distributions              with Numbers

Interquartile Range

The interquartile range, IQR, is the distance between the first quartile and the third quartile: IQR = Q3 - Q1.

Determining Outliers

Call an observation a suspected outlier if it falls more than 1.5 * IQR above the third quartile or below the first quartile.

Example: Imagine we have a bunch of test scores with Q1 = 50 and Q3 = 80.

The IQR = 80 - 50 = 30. So, 1.5 * IQR = 1.5 * 30 = 45.

This means that if there are any scores above Q3 + 45 = 125 or any scores below Q1 - 45 = 5, then these scores are suspected outliers.
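The 1.5 * IQR rule translates directly into code. The sketch below uses the Q1 = 50 and Q3 = 80 from the example; the function names and the sample scores passed to them are illustrative only.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_suspected_outlier(x, q1, q3):
    """True if x falls more than 1.5 * IQR below Q1 or above Q3."""
    low, high = outlier_fences(q1, q3)
    return x < low or x > high


low, high = outlier_fences(50, 80)
print(low, high)                          # 5.0 125.0
print(is_suspected_outlier(3, 50, 80))    # True
print(is_suspected_outlier(100, 50, 80))  # False
```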

Page 17: 1.2: Describing Distributions              with Numbers

Boxplot

• A Boxplot is a graph of the five number summary. A central box spans the quartiles, with a line marking the median. Whiskers extend out from the box to the extremes.

Example: Low = 47, High = 98, Median = 77, Q1 = 65, Q3 = 85

[Figure: boxplot drawn against a numeric scale, labeled with Lowest Observation (47), Q1 (65), Median (77), Q3 (85), and Highest Observation (98).]

Page 18: 1.2: Describing Distributions              with Numbers

Describing Spread

2. The Standard Deviation

• Variance: The variance of a set of observations is an “average” of the squared deviations of the observations from the mean.

• Standard Deviation: The SD is the square root of the variance.

• Note: When averaging the squared deviations, you divide by (n - 1) instead of n.

Page 19: 1.2: Describing Distributions              with Numbers

Describing Spread

The Standard Deviation

Example: Test Scores: 65, 77, 83, 80, 95

1) Find the average: 80

2) Find the deviations from the mean, and their squares

Obs    Deviation from Mean    Deviation Squared
65     -15                    225
77      -3                      9
83       3                      9
80       0                      0
95      15                    225

Page 20: 1.2: Describing Distributions              with Numbers

Describing Spread

The Standard Deviation

3) Determine the “mean” of the squares (dividing by n - 1):

Variance = (225 + 9 + 9 + 0 + 225) / (5 - 1) = 468 / 4 = 117

4) Determine the Standard Deviation:

√117 = 10.8
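The four steps of the worked example can be followed in a short Python sketch. The helper names `sample_variance` and `sample_sd` are chosen here for illustration; the standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))


scores = [65, 77, 83, 80, 95]
variance = sample_variance(scores)   # 468 / 4
sd = sample_sd(scores)               # square root of 117
print(variance, round(sd, 1))        # 117.0 10.8
```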

Page 21: 1.2: Describing Distributions              with Numbers

More Fancy Notation

The variance of a set of observations is the average of the squares of the deviations of the observations from their mean. In symbols, the variance $s^2$ of n observations $x_1, x_2, \ldots, x_n$ is:

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

or, in more compact notation:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

The standard deviation s is the square root of the variance $s^2$:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Page 22: 1.2: Describing Distributions              with Numbers

Another Example of Standard Deviation

Consider the following years in our past:

1792, 1666, 1362, 1614, 1460, 1867, 1439

Find the standard deviation of these years.

The Mean = 1600

$x_i$     $x_i - \bar{x}$    $(x_i - \bar{x})^2$
1792       192               36864
1666        66                4356
1362      -238               56644
1614        14                 196
1460      -140               19600
1867       267               71289
1439      -161               25921

The squared deviations sum to 214870, so:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{6}(214870) \approx 35811.67$$

s = 189.2

Page 23: 1.2: Describing Distributions              with Numbers

Why Do We Square The Deviations?

1) The sum of the squared deviations of any set of observations from their mean is the smallest that the sum of squared deviations from any number can possibly be.

2) The standard deviation is the measure of spread for an important class of symmetric unimodal distributions called the normal distributions.

Why use the Standard Deviation and not the Variance?

1) The standard deviation is used by the normal distribution.

2) The variance uses squared deviations, which gives a different unit from the original data.

Why use n - 1?

1) The sum of the deviations is *always* zero. So, if we know n - 1 of the deviations, then the last deviation can be calculated. So, only n - 1 of the deviations can vary freely. These are called degrees of freedom.
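Two of these claims are easy to check numerically. The sketch below reuses the test scores 65, 77, 83, 80, 95 from the earlier example (an arbitrary choice made here) to show that the deviations sum to zero and that the sum of squared deviations is smaller about the mean than about nearby values.

```python
scores = [65, 77, 83, 80, 95]
x_bar = sum(scores) / len(scores)   # 80.0

deviations = [x - x_bar for x in scores]
# The deviations sum to zero, so the last one is determined by the other n - 1.
last_from_others = -sum(deviations[:-1])


def sum_of_squares(center):
    """Sum of squared deviations of the scores from an arbitrary center."""
    return sum((x - center) ** 2 for x in scores)


at_mean = sum_of_squares(x_bar)   # 468.0
at_79 = sum_of_squares(79)        # 473
at_81 = sum_of_squares(81)        # 473

print(sum(deviations), last_from_others, deviations[-1])
print(at_mean, at_79, at_81)
```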

Page 24: 1.2: Describing Distributions              with Numbers

Properties of Standard Deviations

1) The standard deviation measures spread about the mean and should be used only when the mean is chosen as the measure of center.

2) s = 0 only when there is no spread. This happens only when all observations have the same value. Otherwise, s > 0. As the observations get more spread out from the mean, s gets larger.

3) s, like the mean, is not resistant. A few outliers can make s very large.

Page 25: 1.2: Describing Distributions              with Numbers

Which Measure To Use ?

Q: When is the mean better than median? When is the five number summary better than the standard deviation?

Rules Of Thumb

A1: If outliers appear, or if your distribution is skewed, then the mean could be affected, so use the median and the five number summary.

A2: If the distribution is reasonably symmetric and is free of outliers, then the mean and standard deviation should be used.

Page 26: 1.2: Describing Distributions              with Numbers

Changing Units

Consider the following values: 30, 40, 50, 60, 70

The mean is 50 and the standard deviation is 15.8

What happens to these if we take every score, multiply it by 2 and add 10?

We get these values : 70, 90, 110, 130, 150

The mean is 110 and the standard deviation is 31.6

Page 27: 1.2: Describing Distributions              with Numbers

Changing Units

Old values: 30, 40, 50, 60, 70; mean = 50 and s = 15.8

What happens to these if we take every score, multiply it by 2 and add 10?

New values: 70, 90, 110, 130, 150; mean = 110 and s = 31.6

[Figure: number lines with ticks from 30 to 150 comparing the old and new values.]

Page 28: 1.2: Describing Distributions              with Numbers

Linear Transformations

A linear transformation changes the original variable x into the new variable $x_{new}$ given by an equation of the form:

$$x_{new} = bx + a$$

Note: The constant a shifts all values of x up or down by the value a. The constant b changes the size of the unit of the distribution.

Effects of Linear Transformations

1) To get the new spread (standard deviation or IQR), multiply the old spread by |b|.

2) To get the new mean, multiply the old mean by b and add the constant a.
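Both effects can be verified on the values from the Changing Units example. This is a sketch using the standard `statistics` module with b = 2 and a = 10; the extra case with b = -2 is added here only to show why the rule uses |b|.

```python
import math
from statistics import mean, stdev

old = [30, 40, 50, 60, 70]
a, b = 10, 2
new = [b * x + a for x in old]          # 70, 90, 110, 130, 150

old_mean, old_sd = mean(old), stdev(old)
new_mean, new_sd = mean(new), stdev(new)

# A negative multiplier flips the data but leaves the spread positive.
flipped = [-b * x + a for x in old]
flipped_sd = stdev(flipped)

print(old_mean, round(old_sd, 1))   # 50 15.8
print(new_mean, round(new_sd, 1))   # 110 31.6
```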

Page 29: 1.2: Describing Distributions              with Numbers

1.3: The Normal Distributions

Density Curves

A density curve is a curve that:

1) is always on or above the horizontal axis, and

2) has area exactly 1 underneath it.

A density curve describes the overall pattern of a distribution. The area under the curve and above any range of values is the relative frequency of all observations that fall in that range.

Page 30: 1.2: Describing Distributions              with Numbers

Density Curves

Page 31: 1.2: Describing Distributions              with Numbers

Normal and Skewed Curves

[Figure: a normal curve and a skewed curve, with the median and mean marked.]

Page 32: 1.2: Describing Distributions              with Numbers

Why are Normal Distributions important in stats?

1) Normal distributions are good descriptions for some distributions of real data.

2) Normal distributions are good approximations to the results of many kinds of chance outcomes.

3) Many statistical inference procedures based on normal distributions work well for other roughly symmetric distributions.

Page 33: 1.2: Describing Distributions              with Numbers

The 68 - 95 - 99.7 Rule

In the normal distribution with mean $\mu$ and standard deviation $\sigma$:

• About 68% of the observations fall within $\sigma$ of the mean

• About 95% of the observations fall within $2\sigma$ of the mean

• About 99.7% of the observations fall within $3\sigma$ of the mean

Page 34: 1.2: Describing Distributions              with Numbers

Normal Curve Example

John collected data on the heights of women ages 18 to 24. He found that the distribution was roughly normal, with a mean of 64.5 inches and a standard deviation of 2.5 inches.
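The 68 - 95 - 99.7 rule can be applied to the heights example with Python's `statistics.NormalDist` (available from Python 3.8). This is a sketch that treats the heights as exactly normal, which the slide only states roughly; the dictionary names are chosen here.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)

intervals = {}
shares = {}
for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    intervals[k] = (low, high)
    # Area under the normal curve between low and high.
    shares[k] = heights.cdf(high) - heights.cdf(low)
    print(f"within {k} SD: {low} to {high} inches, share = {shares[k]:.4f}")
```

Under that assumption, about 68% of the women are between 62 and 67 inches tall, about 95% between 59.5 and 69.5 inches, and about 99.7% between 57 and 72 inches.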

Page 35: 1.2: Describing Distributions              with Numbers

Standardizing Observations

If x is an observation from a roughly symmetric distribution that has mean $\mu$ and standard deviation $\sigma$, then the standardized value of x is:

$$z = \frac{x - \mu}{\sigma}$$

Note: A standardized score is often called a z-score.

Example: Women’s IQs have a symmetric distribution with a mean of 97 and a standard deviation of 6.

What is the standard score for a woman with an IQ of 106?

z = (106 - 97) / 6 = 9 / 6 = 1.5

Page 36: 1.2: Describing Distributions              with Numbers

Standardizing Observations

If x is an observation from a roughly symmetric distribution that has mean $\mu$ and standard deviation $\sigma$, then the standardized value of x is:

$$z = \frac{x - \mu}{\sigma}$$

Note: A standardized score is often called a z-score.

Example: Men’s IQs have a roughly symmetric distribution with a mean of 72 and a standard deviation of 8.

What is the standard score for a man with an IQ of 66?

z = (66 - 72) / 8 = -6 / 8 = -0.75

Page 37: 1.2: Describing Distributions              with Numbers

The Standard Normal Distribution

In the men’s IQ example (mean 72, standard deviation 8), an IQ of 66 has the standard score z = (66 - 72) / 8 = -0.75.

Q: What percentage of people have a score below 66?

Page 38: 1.2: Describing Distributions              with Numbers

The Standard Normal Table

Table A is a table of areas under the standard normal curve. The table entry for each z value is the area under the curve to the left of z.

Page 39: 1.2: Describing Distributions              with Numbers

The Standard Normal Table

Example: Imagine we have done an experiment, and we want to find what percentage of people fell under a score, namely x.

We then proceed to find that the z-score for the value x is -1.10.

The table entry for z = -1.10 is .1357, so about 13.57% of people fell under x.
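A table lookup can be mimicked in code with `statistics.NormalDist`, whose `cdf` method returns the area to the left of a value. The sketch below is an illustration, not a replacement for Table A; the helper name `z_score` is chosen here, and the last line applies the same lookup to the earlier men's IQ question, assuming those scores are normal.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma


std_normal = NormalDist()            # mean 0, standard deviation 1

area = std_normal.cdf(-1.10)         # area to the left of z = -1.10
print(round(area, 4))                # 0.1357

z_man = z_score(66, 72, 8)           # -0.75
below_66 = std_normal.cdf(z_man)     # area to the left of z = -0.75
print(z_man, round(below_66, 4))
```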

Page 40: 1.2: Describing Distributions              with Numbers

The Standard Normal Table

Example: The Graduate Record Examinations (GRE) are widely used to help predict the performance of applicants to graduate schools. The range of possible scores on a GRE is 200 to 900. The psychology department at a university finds the scores of its applicants on the quantitative GRE are approximately normal with mean = 544 and standard deviation = 103. Answer the following:

1) Find the percentage of people who scored 700 or higher on the test.

2) Find the percentage of people who scored below 500 on the test.

3) Find the percentage of people who scored between 500 and 800 on the test.

Page 41: 1.2: Describing Distributions              with Numbers

1) Find the percentage of people who scored 700 or higher on the test.

Find the percentage to the right of the 700 marker.

Page 42: 1.2: Describing Distributions              with Numbers

1) Find the percentage of people who scored 700 or higher on the test.

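Find the z-score: z = (700 - 544) / 103 = 156 / 103 = 1.51

P(X > 700) = P(Z > 1.51) = 1 - P(Z < 1.51) = 1 - .9345 = .0655

[Figure: normal curve with area .9345 to the left of z = 1.51 and area .0655 to the right.]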

Page 43: 1.2: Describing Distributions              with Numbers

2) Find the percentage of people who scored below 500 on the test.

Find the percentage to the left of 500

Page 44: 1.2: Describing Distributions              with Numbers

2) Find the percentage of people who scored below 500 on the test.

Find the z-score: z = (500 - 544) / 103 = -44 / 103 = -0.43

P(X < 500) = P(Z < -0.43) = 0.3336

Answer: 0.3336

Page 45: 1.2: Describing Distributions              with Numbers

3) Find the percentage of people who scored between 500 and 800 on the test.

Find the percentage between 500 and 800

Page 46: 1.2: Describing Distributions              with Numbers

3) Find the percentage of people who scored between 500 and 800 on the test.

Find the first z-score: z = (500 - 544) / 103 = -44 / 103 = -0.43

Find the second z-score: z = (800 - 544) / 103 = 256 / 103 = 2.49

The table gives P(Z < -0.43) = 0.3336 and P(Z < 2.49) = 0.9936.

Area = .9936 - .3336 = 0.66
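The three GRE questions can also be answered without a table. The sketch below uses `statistics.NormalDist` with the stated mean 544 and standard deviation 103; because it does not round z to two decimals first, its answers differ slightly (in the third decimal place) from the table-based values .0655, .3336 and .66.

```python
from statistics import NormalDist

gre = NormalDist(mu=544, sigma=103)

# 1) Share scoring 700 or higher: area to the right of 700.
p_700_or_higher = 1 - gre.cdf(700)

# 2) Share scoring below 500: area to the left of 500.
p_below_500 = gre.cdf(500)

# 3) Share scoring between 500 and 800: difference of two left areas.
p_between = gre.cdf(800) - gre.cdf(500)

print(round(p_700_or_higher, 3), round(p_below_500, 3), round(p_between, 3))
```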

Page 47: 1.2: Describing Distributions              with Numbers

Example: The Soup Nazi charges, on average, $4.50 for a cup of soup and, if you’re lucky, some bread, with a standard deviation of $0.45.

What is the probability that our check will be more than $5.00?

Page 48: 1.2: Describing Distributions              with Numbers

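What is the probability that our check will be more than $5.00?

Assuming the check amount is roughly normally distributed:

Z = (5.00 - 4.50) / 0.45 = 1.11

P(X > 5) = P(Z > 1.11) = 1 - 0.8665 = 0.1335

So the probability is about 13.35%.

[Figure: normal curve centered at 4.50; the area to the left of 5.00 is 0.8665 and the area to the right is 0.1335.]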

Page 49: 1.2: Describing Distributions              with Numbers

“Backward” Normal Calculations

• We can find the observed value (x) corresponding to a given proportion in $N(\mu, \sigma)$ by unstandardizing the z-score.

1) State the problem

2) Draw a picture

3) Use the normal table to find the proportion closest to the one you need

4) Read off the z-value

5) Unstandardize: $x = \mu + z\sigma$

Page 50: 1.2: Describing Distributions              with Numbers

Example

Find the value of z such that the probability of being less than z is 0.10.

1. z: P(Z < z) = .10

[Figure: standard normal curve centered at 0.]

Page 51: 1.2: Describing Distributions              with Numbers

Example

Find the value of z such that the probability of being less than z is .10.

1. z: P(Z < z) = .10

2. [Figure: standard normal curve centered at 0, with area .10 to the left of z.]

3. In the body of the normal table, find the closest value to .10. Once found, determine the z value.

The closest is .1003: P(Z < -1.28) = .1003

So z = -1.28

Page 52: 1.2: Describing Distributions              with Numbers

Example

Find the value of z such that the probability of being greater than z is .33.

1. z: P(Z > z) = .33

2. z: P(Z < z) = 1 - .33 = .67

[Figure: standard normal curve centered at 0, with area .67 to the left of z and area .33 to the right.]

Page 53: 1.2: Describing Distributions              with Numbers

Example

Find the value of z such that the probability of being greater than z is .33.

1. z: P(Z > z) = .33

2. z: P(Z < z) = 1 - .33 = .67

[Figure: standard normal curve centered at 0, with area .67 to the left of z and area .33 to the right.]

3. In the body of the normal table, find the closest value to .67. Once found, determine the z value.

The table contains .6700 exactly, at z = .44, so P(Z > .44) = .33

So z = .44
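Reading the table "backward" corresponds to an inverse CDF. As a sketch, `statistics.NormalDist.inv_cdf` returns the z-value with a given area to its left, so both examples above can be checked; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: mean 0, standard deviation 1

# z with area .10 to its left.
z_low = std_normal.inv_cdf(0.10)

# z with area .33 to its right, i.e. area 1 - .33 = .67 to its left.
z_high = std_normal.inv_cdf(1 - 0.33)

print(round(z_low, 2), round(z_high, 2))   # -1.28 0.44
```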

Page 54: 1.2: Describing Distributions              with Numbers

Example

X = time Americans stir sugar into their iced tea X ~ N(12.3, 3.1) seconds

(1) Find the percent of Americans who spend between 20 and 22 seconds stirring sugar into their iced tea, i.e. P(20 < X < 22).

Page 55: 1.2: Describing Distributions              with Numbers

Example

X = time Americans stir sugar into their iced tea, X ~ N(12.3, 3.1)

Find P(20 < X < 22) = P((20 - 12.3) / 3.1 < Z < (22 - 12.3) / 3.1)

= P(2.48 < Z < 3.13)

= P(Z < 3.13) - P(Z < 2.48)

= .9991 - .9934

= .0057

Page 56: 1.2: Describing Distributions              with Numbers

Example

X = time Americans stir sugar into their iced tea, X ~ N(12.3, 3.1)

(2) About 18.4% of Americans spend more than how many seconds stirring sugar into their iced tea? i.e. Find the value of X such that the probability of being greater than this value is .184.

(1) z: P(Z > z) = .184

(2) z: P(Z < z) = 1 - .184 = .816

(3) From the normal table, z = 0.90

(4) So $x = \mu + z\sigma$ = 12.3 + 0.90(3.1) = 12.3 + 2.79 = 15.09

The person would have to stir for 15.09 seconds.

Page 57: 1.2: Describing Distributions              with Numbers

Example

X = IQ scores X ~ N(112, 9)

Find the IQ score that places you in the top 2% of all scores.

1. z: P(Z > z) = .02

2. z: P(Z < z) = 1 - .02 = .98

3. From the normal table, z = 2.05

4. $x = \mu + z\sigma$ = 112 + 2.05(9) = 130.45
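The unstandardizing step $x = \mu + z\sigma$ from the iced tea and IQ examples can be sketched as below. The helper name `unstandardize` is hypothetical. The table-based z-values 0.90 and 2.05 are used first; the `inv_cdf` lines show the same answers without rounding z, which is why they differ slightly.

```python
from statistics import NormalDist


def unstandardize(z, mu, sigma):
    """Convert a z-score back to the original scale: x = mu + z * sigma."""
    return mu + z * sigma


# Using the z-values read from the normal table.
tea_x = unstandardize(0.90, 12.3, 3.1)    # 12.3 + 2.79
iq_x = unstandardize(2.05, 112, 9)        # 112 + 18.45

# The same questions answered directly with the inverse CDF.
tea_exact = NormalDist(12.3, 3.1).inv_cdf(1 - 0.184)
iq_exact = NormalDist(112, 9).inv_cdf(1 - 0.02)

print(round(tea_x, 2), round(iq_x, 2))    # 15.09 130.45
print(round(tea_exact, 2), round(iq_exact, 2))
```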

Page 58: 1.2: Describing Distributions              with Numbers

Exercise

The distribution of SAT Math scores is approximately normal with mean 500 and standard deviation 100.

1. In what range do the middle 95% of all SAT Math scores lie?

2. What proportion of SAT Math scores are between 450 and 650?

3. If high school students having SAT Math scores in the top 10% of all scores are eligible for a certain scholarship, what is the lowest score a person eligible for the scholarship can have?