lesson 1 - 2

23
Lesson 1 - 2 Describing Distributions with Numbers apted from Mr. Molesky’s Statmonkey website

Upload: ull

Post on 28-Jan-2016

24 views

Category:

Documents


0 download

DESCRIPTION

Lesson 1 - 2. Describing Distributions with Numbers. adapted from Mr. Molesky’s Statmonkey website. Measures of Spread. Variability is the key to Statistics. Without variability, there would be no need for the subject. When describing data, never rely on center alone. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Lesson 1 - 2

Lesson 1 - 2

Describing Distributions with Numbers

adapted from Mr. Molesky’s Statmonkey website

Page 2: Lesson 1 - 2

Measures of Spread

Variability is the key to Statistics. Without variability, there would be no need for the subject.

When describing data, never rely on center alone.

Measures of Spread:Range - {rarely used ... why?}

Quartiles - InterQuartile Range {IQR=Q3-Q1}

Variance and Standard Deviation {var and sx}

Like Measures of Center, you must choose the most appropriate measure of spread.

Page 3: Lesson 1 - 2

Standard Deviation

Another common measure of spread is the Standard Deviation: a measure of the “average” deviation of all observations from the mean.

To calculate Standard Deviation:Calculate the mean.Determine each observation’s deviation (x - xbar).“Average” the squared-deviations by dividing the total squared deviation by (n-1).This quantity is the Variance.Square root the result to determine the Standard Deviation.

Page 4: Lesson 1 - 2

Standard Deviation Properties

s measures spread about the mean and should be used only when the mean is used as the measure of center

s = 0 only when there is no spread/variability. This happens only when all observations have the same value. Otherwise, s > 0. As the observations become more spread out about their mean, s gets larger

s, like the mean x-bar, is not resistant. A few outliers can make s very large

Page 5: Lesson 1 - 2

Standard Deviation

Variance:

Standard Deviation:

Example 1.16 (p.85): Metabolic Rates

var (x1 x )2 (x2 x )2 ... (xn x )2

n 1

sx (xi x )2n 1

1792 1666 1362 1614 1460 1867 1439

Page 6: Lesson 1 - 2

Standard Deviation

1792 1666 1362 1614 1460 1867 1439

x (x - x) (x - x)2

1792 192 36864

1666 66 4356

1362 -238 56644

1614 14 196

1460 -140 19600

1867 267 71289

1439 -161 25921

Totals: 0 214870

Metabolic Rates: mean=1600

Total Squared Deviation

214870

Variance

var=214870/6

var=35811.66

Standard Deviation

s=√35811.66

s=189.24 cal

What does this value, s, mean?

Page 7: Lesson 1 - 2

Example 1

Which of the following measures of spread are resistant?

1. Range

2. Variance

3. Standard Deviation

Not Resistant

Not Resistant

Not Resistant

Page 8: Lesson 1 - 2

Example 2

Given the following set of data:

70, 56, 48, 48, 53, 52, 66, 48, 36, 49, 28, 35, 58, 62, 45, 60, 38, 73, 45, 51,56, 51, 46, 39, 56, 32, 44, 60, 51, 44, 63, 50, 46, 69, 53, 70, 33, 54, 55, 52

What is the range?

What is the variance?

What is the standard deviation?

73-28 = 45

117.958

10.861

Page 9: Lesson 1 - 2

QuartilesQuartiles Q1 and Q3 represent the 25th and 75th percentiles.

To find them, order data from min to max.

Determine the median - average if necessary.

The first quartile is the middle of the ‘bottom half’.

The third quartile is the middle of the ‘top half’.

19 22 23 23 23 26 26 27 28 29 30 31 32

45 68 74 75 76 82 82 91 93 98

med Q3=29.5Q1=23

med=79Q1 Q3

Page 10: Lesson 1 - 2

Using the TI-83

• Enter the test data into List, L1– STAT, EDIT enter data into L1

• Calculate 5 Number Summary– Hit STAT go over to CALC

and select 1-Var Stats and hitt 2nd 1 (L1)

• Use 2nd Y= (STAT PLOT) to graph the box plot– Turn plot1 ON– Select BOX PLOT (4th option, first in second row)– Xlist: L1– Freq: 1– Hit ZOOM 9:ZoomStat to graph the box plot

• Copy graph with appropriate labels and titles

Page 11: Lesson 1 - 2

5-Number Summary, Boxplots

The 5 Number Summary provides a reasonably complete description of the center and spread of distribution

We can visualize the 5 Number Summary with a boxplot.

MIN Q1 MED Q3 MAX

min=45 Q1=74 med=79 Q3=91 max=98

45 50 55 60 65 70 75 80 85 90 95 100

Quiz ScoresOutlier?Outlier?

Page 12: Lesson 1 - 2

Determining Outliers

InterQuartile Range “IQR”: Distance between Q1 and Q3. Resistant measure of spread...only measures middle 50% of data.

IQR = Q3 - Q1 {width of the “box” in a boxplot}

1.5 IQR Rule: If an observation falls more than 1.5 IQRs above Q3 or below Q1, it is an outlier.

“1.5 • IQR Rule”“1.5 • IQR Rule”

Why 1.5? According to John Tukey, 1 IQR seemed like too little and 2 IQRs Why 1.5? According to John Tukey, 1 IQR seemed like too little and 2 IQRs seemed like too much...seemed like too much...

Page 13: Lesson 1 - 2

Outliers: 1.5 • IQR Rule

To determine outliers:

1. Find 5 Number Summary

2. Determine IQR

3. Multiply 1.5xIQR

4. Set up “fences” A. Lower Fence: Q1-(1.5∙IQR)

B. Upper Fence: Q3+(1.5∙IQR)

5. Observations “outside” the fences are outliers.

Page 14: Lesson 1 - 2

Outlier Example

0 10 20 30 40 50 60 70 80 90 100Spending ($)

IQR=45.72-19.06IQR=26.66IQR=45.72-19.06IQR=26.66

1.5IQR=1.5(26.66)1.5IQR=39.991.5IQR=1.5(26.66)1.5IQR=39.99

All data on pg 48,#1.6

outliers}

fence: 45.72+39.99= 85.71

fence: 19.06-39.99= -20.93

{

Page 15: Lesson 1 - 2

Example 4Consumer Reports did a study of ice cream bars (sigh, only

vanilla flavored) in their August 1989 issue. Twenty-seven bars having a taste-test rating of at least “fair” were listed, and calories per bar was included. Calories vary quite a bit partly because bars are not of uniform size. Just how many calories should an ice cream bar contain?

 

Construct a boxplot for the data above.

342 377 319 353 295 234 294 286

377 182 310 439 111 201 182 197

209 147 190 151 131 151

Page 16: Lesson 1 - 2

Example 4 - Answer

Q1 = 182 Q2 = 221.5 Q3 = 319

Min = 111 Max = 439 Range = 328

IQR = 137 UF = 524.5 LF = -23.5

Calories

100 125 150 175 200 225 250 275 300 325 350 375 400 425 450 475 500

Page 17: Lesson 1 - 2

Example 5

The weights of 20 randomly selected juniors at MSHS are recorded below:

 

 

a) Construct a boxplot of the data

b) Determine if there are any mild or extreme outliers

c) Comment on the distribution

121 126 130 132 143 137 141 144 148 205

125 128 131 133 135 139 141 147 153 213

Page 18: Lesson 1 - 2

Example 5 - Answer

Q1 = 130.5 Q2 = 138 Q3 = 145.5

Min = 121 Max = 213 Range = 92

IQR = 15 UF = 168 LF = 108

Mean = 143.6

StDev = 23.91

Weight (lbs)

100 110 120 130 140 150 160 170 180 190 200 210 220

**

Extreme Outliers( > 3 IQR from Q3)

Shape: somewhat symmetric Outliers: 2 extreme outliersCenter: Median = 138 Spread: IQR = 15

Page 19: Lesson 1 - 2

Linear Transformations

Variables can be measured in different units (feet vs meters, pounds vs kilograms, etc)

When converting units, the measures of center and spread will change

Linear Transformations (xnew = a+bx) do not change the overall shape of a distribution

Multiplying each observation by b multiplies both the measure of center and spread by b

Adding a to each observation adds a to the measure of center, but does not affect spread

If the distribution was symmetric, its transformation is symmetric. If the distribution was skewed, its transformation maintains the same skewness

Page 20: Lesson 1 - 2

Transformation Example 6

• Using the data from example #5– a) Change the weight from pounds to kilograms

and add 2 kg (for a special band uniform)– b) Get summary statistics and compare with example 5– c) Draw a box plot

121 126 130 132 143 137 141 144 148 205

125 128 131 133 135 139 141 147 153 213

Page 21: Lesson 1 - 2

Example 6 - Answer• Convert Pounds to Kg ( 0.4536 ) and add 2

Q1 = 61.19 Q2 = 64.60 Q3 = 68.00

Min = 56.89 Max = 98.62 Range = 41.73

IQR = 6.81 UF = 78.22 LF = 50.98

Mean = 67.14 (143.6 0.4536 + 2)

StDev = 10.84 (23.91 0.4536)

121 126 130 132 143 137 141 144 148 205

125 128 131 133 135 139 141 147 153 213

56.88 59.15 60.97 61.88 66.87 64.14 65.96 67.32 69.13 94.99

58.7 60.06 61.42 62.33 63.24 65.05 65.96 68.68 71.40 98.62

Page 22: Lesson 1 - 2

Example 6 – Answer cont

Transformation follows what we expect:

Multiplying each observation by b multiplies both the measure of center and spread by b

Adding a to each observation adds a to the measure of center, but does not affect spread

If the distribution was symmetric, its transformation is symmetric. If the distribution was skewed, its transformation maintains the same skewness

Weight (in Kg)

45 50 55 60 65 70 75 80 85 90 95 100 105

**

Extreme Outliers( > 3 IQR from Q3)

Page 23: Lesson 1 - 2

Day 2 Summary and Homework

• Summary– Sample variance is found by dividing by (n – 1) to keep it an

unbiased (since we estimate the population mean, μ, by using the sample mean, x‾) estimator of population variance

– The larger the standard deviation, the more dispersion the distribution has

– Boxplots can be used to check outliers and distributions– Use comparative boxplots for two datasets– Identifying a distribution from boxplots or histograms is

subjective!

• Homework– pg 82: prob 33; pg 89 probs 40, 41;

pg 97 probs 45, 46