Frequency Distribution and Summary Statistics Dongmei Li Department of Public Health Sciences Office of Public Health Studies University of Hawai’i at Mānoa


Post on 05-Sep-2020


Page 1: Frequency Distributions and Summary Statisticsrmatrix2.jabsom.hawaii.edu/cbrtap/2012-10-25-resources/...2012/10/25  · Stemplot Illustrative Example Select an SRS of 10 ages List

Frequency Distribution and

Summary Statistics

Dongmei Li

Department of Public Health Sciences Office of Public Health Studies University of Hawai’i at Mānoa

Page 2: Frequency Distributions and Summary Statisticsrmatrix2.jabsom.hawaii.edu/cbrtap/2012-10-25-resources/...2012/10/25  · Stemplot Illustrative Example Select an SRS of 10 ages List

Outline

1. Stemplot

2. Frequency table

3. Summary statistics


1. Stem-and-leaf plots (stemplots)

Always start by looking at the data with graphs and plots

Our favorite technique for looking at a single variable is the stemplot

A stemplot is a graphical technique that organizes data into a histogram-like display

You can observe a lot by looking – Yogi Berra


Stemplot Illustrative Example

Select an SRS of 10 ages

List data as an ordered array

05 11 21 24 27 28 30 42 50 52

Divide each data point into a stem-value and leaf-value

In this example the “tens place” will be the stem-value and the “ones place” will be the leaf-value, e.g., 21 has a stem value of 2 and leaf value of 1


Stemplot illustration (cont.)

Draw an axis for the stem-values:

0|
1|
2|1
3|
4|
5|
×10 (axis multiplier; important!)

Place leaves next to their stem value; for example, 21 is plotted as leaf 1 on stem 2


Stemplot illustration continued …

Plot all data points and rearrange in rank order:

0|5

1|1

2|1478

3|0

4|2

5|02

×10

Here is the plot horizontally (for demonstration purposes):

    8
    7
    4     2
5 1 1 0 2 0
------------
0 1 2 3 4 5
------------
Rotated stemplot
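The stem-and-leaf steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by JMP or any other package; the function name `stemplot` and its `leaf_unit` parameter are assumptions made for this example.

```python
from collections import defaultdict

def stemplot(values, leaf_unit=1):
    """Return stemplot rows; leaf_unit is the place value of the leaf digit."""
    rows = defaultdict(list)
    for v in sorted(values):
        scaled = round(v / leaf_unit)        # e.g. 21 -> 21, or 1.47 -> 15 when leaf_unit=0.1
        stem, leaf = divmod(scaled, 10)      # last digit is the leaf, the rest is the stem
        rows[stem].append(str(leaf))
    lo, hi = min(rows), max(rows)
    # Empty stems are kept so that gaps in the data stay visible.
    return [f"{stem}|{''.join(rows[stem])}" for stem in range(lo, hi + 1)]

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
for line in stemplot(ages):
    print(line)
```

With `leaf_unit=0.1` the same sketch rounds a value such as 1.47 to leaf 5 on stem 1, which matches the rounding convention used for decimal data.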


Interpreting Stemplots

Shape

◦ Symmetry

◦ Modality (number of peaks)

◦ Kurtosis (width of tails)

◦ Departures (outliers)

Location

◦ Gravitational center: mean

◦ Middle value: median

Spread

◦ Range and inter-quartile range

◦ Standard deviation and variance


Shape

“Shape” refers to the pattern when plotted

Here’s the silhouette of our data:

    X
    X
    X     X
X X X X X X
-----------
0 1 2 3 4 5
-----------

Consider: symmetry, modality, kurtosis

Shape: Idealized Density Curve

A large dataset is introduced

A density curve is superimposed to better discuss shape

Symmetrical Shapes


Asymmetrical shapes


Modality (no. of peaks)


Kurtosis (width of tails)

Mesokurtic (medium)

Platykurtic (flat): skinny tails

Leptokurtic (steep): fat tails

Kurtosis is not easily judged by eye

Stemplot – Second Example

Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42

Stem = ones-place

Leaves = tenths-place

Round to keep one digit after the decimal point (e.g., 1.47 → 1.5)

Do not plot the decimal point

1|5
2|14
3|4789
4|4
(×1)

Shape: asymmetric, skewed to the left, unimodal, no outliers


Draw a stemplot using JMP

Open the JMP data set named Stem_and_leaf_plot.jmp

Analyze → Distribution → Data → Stem and Leaf

Third Illustrative Example (n = 26)

Age data set from 26 subjects

{14, 17, 18, 19, 22, 22, 23, 24, 24, 26, 26, 27, 28, 29, 30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38}

Data set: Stem_and_leaf_plot_example2.jmp

Distribution of the age variable?

2. Frequency Table

Frequency = count

Relative frequency = proportion or %

Cumulative frequency = % less than or equal to level

  AGE | Freq  Rel.Freq  Cum.Freq.
------+--------------------------
    3 |    2     0.3%      0.3%
    4 |    9     1.4%      1.7%
    5 |   28     4.3%      6.0%
    6 |   37     5.7%     11.6%
    7 |   54     8.3%     19.9%
    8 |   85    13.0%     32.9%
    9 |   94    14.4%     47.2%
   10 |   81    12.4%     59.6%
   11 |   90    13.8%     73.4%
   12 |   57     8.7%     82.1%
   13 |   43     6.6%     88.7%
   14 |   25     3.8%     92.5%
   15 |   19     2.9%     95.4%
   16 |   13     2.0%     97.4%
   17 |    8     1.2%     98.6%
   18 |    6     0.9%     99.5%
   19 |    3     0.5%    100.0%
------+--------------------------
Total |  654   100.0%
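The relative and cumulative columns follow directly from the counts. The sketch below recomputes them from the frequencies in the AGE table; the function name `frequency_table` is hypothetical and chosen only for this illustration.

```python
from itertools import accumulate

# Counts by age, copied from the AGE frequency table.
counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81, 11: 90,
          12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8, 18: 6, 19: 3}

def frequency_table(counts):
    """Return (level, freq, relative %, cumulative %) rows in ascending order of level."""
    total = sum(counts.values())
    levels = sorted(counts)
    running = accumulate(counts[level] for level in levels)   # running totals of the counts
    return [(level, counts[level], 100 * counts[level] / total, 100 * cum / total)
            for level, cum in zip(levels, running)]

for level, freq, rel, cum in frequency_table(counts):
    print(f"{level:>5} | {freq:>4} {rel:6.1f}% {cum:7.1f}%")
```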


Frequency Table with Class Intervals

When data are sparse, group data into class intervals

Create 4 to 12 class intervals

Classes can be uniform or non-uniform

End-point convention: e.g., a first class interval of 0 to 10 will include 0 but exclude 10 (0 to 9.99)

Tally frequencies

Calculate relative frequency

Calculate cumulative frequency


Class Intervals

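Class   | Freq | Relative Freq. (%) | Cumulative Freq. (%)
--------+------+--------------------+---------------------
0 – 9   |   1  |         10         |          10
10 – 19 |   1  |         10         |          20
20 – 29 |   4  |         40         |          60
30 – 39 |   1  |         10         |          70
40 – 49 |   1  |         10         |          80
50 – 59 |   2  |         20         |         100
--------+------+--------------------+---------------------
Total   |  10  |        100         |          --

Uniform class intervals table (width 10) for data:

05 11 21 24 27 28 30 42 50 52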
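As a rough check on the class-interval table, the tallying can be sketched as follows. The helper `class_interval_table` is an illustrative name, and the sketch assumes uniform classes that include the lower end-point and exclude the upper one, as in the end-point convention described earlier.

```python
def class_interval_table(values, width=10):
    """Tally values into uniform classes [k*width, (k+1)*width) and return table rows."""
    n = len(values)
    first = min(values) // width
    last = max(values) // width
    rows, cumulative = [], 0
    for k in range(first, last + 1):
        lower, upper = k * width, (k + 1) * width
        freq = sum(lower <= v < upper for v in values)   # lower end included, upper end excluded
        cumulative += freq
        rows.append((lower, upper, freq, 100 * freq / n, 100 * cumulative / n))
    return rows

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
for lower, upper, freq, rel, cum in class_interval_table(ages):
    print(f"{lower:>2} to <{upper:<3} freq={freq}  rel={rel:.0f}%  cum={cum:.0f}%")
```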


Histogram

[Figure: histogram of frequency (vertical axis, 0 to 5) by age class (0-9, 10-19, 20-29, 30-39, 40-49, 50-59)]

A histogram is a frequency chart for a quantitative measurement. Notice how the bars touch.


Bar Chart

[Figure: bar chart of counts (vertical axis, 0 to 500) by school level (Pre-, Elem., Middle, High)]

A bar chart with non-touching bars is reserved for categorical measurements and non-uniform class intervals


3. Summary Statistics

Central location

◦ Mean

◦ Median

◦ Mode

Spread

◦ Range and interquartile range (IQR)

◦ Variance and standard deviation


Location: Mean

“Eye-ball method”: visualize where the plot would balance

Arithmetic method: sum the values and divide by n

    8
    7
    4     2
5 1 1 0 2 0
------------
0 1 2 3 4 5
------------
     ^
Grav.Center

Eye-ball method: around 25 to 30 (takes practice)

Arithmetic method: mean = 290 / 10 = 29


Notation

$n$: sample size

$X$: the variable (e.g., ages of subjects)

$x_i$: the value of individual $i$ for variable $X$

$\sum$: sum all values (capital sigma)

Illustrative data (ages of participants):

21 42 5 11 30 50 28 27 24 52

$n = 10$

$X$ = AGE variable

$x_1 = 21,\ x_2 = 42,\ \ldots,\ x_{10} = 52$

$\sum x_i = x_1 + x_2 + \cdots + x_{10} = 21 + 42 + \cdots + 52 = 290$


Central Location: Sample Mean

“Arithmetic average”

Traditional measure of central location

Sum the values and divide by n

“xbar” refers to the sample mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$$

Example: Sample Mean

Ten individuals selected at random have the following ages:

21 42 5 11 30 50 28 27 24 52

Note that $n = 10$, $\sum x_i = 21 + 42 + \cdots + 52 = 290$, and

$$\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{10}(290) = 29.0$$

[Figure: number line from 0 to 60 with the mean marked at 29]

The sample mean is the gravitational center of a distribution


Uses of the Sample Mean

The sample mean can be used to predict:

The value of an observation drawn at random from the sample

The value of an observation drawn at random from the population

The population mean


Population Mean

Same operation as sample mean except based on entire population (N ≡ population size)

Conceptually important

Usually not available in practice

Sometimes referred to as the expected value

$$\mu = \frac{\sum x_i}{N} = \frac{1}{N}\sum x_i$$

Central Location: Median

Ordered array:

05 11 21 24 27 28 30 42 50 52

When n is even, the median is the average of the (n ÷ 2)th and the (n ÷ 2 + 1)th values.

When n is odd, the median is the ((n + 1) ÷ 2)th value.

For the illustrative data: n = 10 → the median falls between the 5th and 6th values, 27 and 28. Average the adjacent values: M = (27 + 28) ÷ 2 = 27.5

More Examples of Medians

Example A: 2 4 6

Median = 4

Example B: 2 4 6 8

Median = 5 (average of 4 and 6)

Example C: 6 2 4

Median ≠ 2 (values must be ordered first: 2 4 6 gives a median of 4)


The Median is Robust

The median is more resistant to skews and outliers than the mean; it is more robust.

This data set has a mean of 1600:

1362 1439 1460 1614 1666 1792 1867

Here’s the same data set with a data entry error “outlier” (the last value). This data set has a mean of 2743:

1362 1439 1460 1614 1666 1792 9867

The median is 1614 in both instances, demonstrating its robustness in the face of outliers.
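The effect of the data entry error on the two measures can be checked with Python's standard `statistics` module. This is only a small sketch; the variable names are illustrative.

```python
from statistics import mean, median

rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_error = rates[:-1] + [9867]   # last value mistyped as 9867

for label, data in [("original", rates), ("with entry error", with_error)]:
    print(f"{label}: mean = {mean(data):.0f}, median = {median(data)}")
```

The mean moves by more than a thousand units while the median does not move at all, which is the sense in which the median is called robust.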


Mode

The mode is the most commonly encountered value in the dataset

This data set has a mode of 7 {4, 7, 7, 7, 8, 8, 9}

This data set has no mode {4, 6, 7, 8} (each point appears only once)

The mode is useful only in large data sets with repeating values
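A short sketch of finding the mode, assuming the convention used here that a data set in which every value appears once has no mode. The function name `modes` is hypothetical, and the sketch assumes a non-empty list.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                      # every value appears once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([4, 7, 7, 7, 8, 8, 9]))   # [7]
print(modes([4, 6, 7, 8]))            # []
```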


Comparison of Mean, Median, Mode

Note how the mean gets pulled toward the longer tail more than the median

mean = median → symmetrical distribution

mean > median → positive skew

mean < median → negative skew


Spread: Quartiles

Two distributions can be quite different yet have the same mean

These data compare particulate matter in air samples (μg/m³) at two sites. Both sites have a mean of 36, but Site 1 exhibits much greater variability. We would miss the high pollution days if we relied solely on the mean.

Site 1| |Site 2
---------------
    42|2|
     8|2|
     2|3|234
    86|3|6689
     2|4|0
      |4|
      |5|
      |5|
      |6|
     8|6|
      ×10


Spread: Range

Range = maximum – minimum

Illustrative example:

Site 1 range = 68 – 22 = 46

Site 2 range = 40 – 32 = 8

Beware: the sample range will tend to underestimate the population range.

Always supplement the range with at least one additional measure of spread

Site 1| |Site 2
---------------
    42|2|
     8|2|
     2|3|234
    86|3|6689
     2|4|0
      |4|
      |5|
      |5|
      |6|
     8|6|
      ×10

Spread: Quartiles

Quartile 1 (Q1): cuts off bottom quarter of data = median of the lower half of the data set

Quartile 3 (Q3): cuts off top quarter of data = median of the upper half of the data set

Interquartile Range (IQR) = Q3 – Q1 covers the middle 50% of the distribution

05 11 21 24 27 28 30 42 50 52
      Q1   median    Q3

Q1 = 21, Q3 = 42, and IQR = 42 – 21 = 21

Quartiles (Tukey’s Hinges) – Example 2

Data are metabolic rates (cal/day), n = 7

1362 1439 1460 1614 1666 1792 1867 (median = 1614)

When n is odd, include the median in both halves of the data set.

Bottom half: 1362 1439 1460 1614, which has a median of 1449.5 (Q1)

Top half: 1614 1666 1792 1867, which has a median of 1729 (Q3)
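The hinge rule (median of each half, with the median shared by both halves when n is odd) can be sketched as below. The function name `tukey_hinges` is an assumption for this example; note that statistical software often uses other quartile definitions, so results from a package may differ slightly.

```python
from statistics import median

def tukey_hinges(values):
    """Return (Q1, median, Q3); the median is included in both halves when n is odd."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2          # size of each half; the halves overlap by one value when n is odd
    return median(data[:half]), median(data), median(data[n - half:])

print(tukey_hinges([5, 11, 21, 24, 27, 28, 30, 42, 50, 52]))      # (21, 27.5, 42)
print(tukey_hinges([1362, 1439, 1460, 1614, 1666, 1792, 1867]))  # (1449.5, 1614, 1729.0)
```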


Five-Point Summary

Q0 (the minimum)

Q1 (25th percentile)

Q2 (median)

Q3 (75th percentile)

Q4 (the maximum)


Boxplots

1. Calculate the 5-point summary. Draw a box from Q1 to Q3 with a line at the median.

2. Calculate the IQR and fences as follows:

Fence(Lower) = Q1 – 1.5(IQR)

Fence(Upper) = Q3 + 1.5(IQR)

Do not draw the fences.

3. Determine if any values lie outside the fences (outside values). If so, plot these separately.

4. Determine the values inside the fences (inside values). Draw a whisker from Q3 to the upper inside value. Draw a whisker from Q1 to the lower inside value.


Illustrative Example: Boxplot

Data: 05 11 21 24 27 28 30 42 50 52

1. 5-point summary: {5, 21, 27.5, 42, 52}; box from 21 to 42 with line at 27.5

2. IQR = 42 – 21 = 21
FU = Q3 + 1.5(IQR) = 42 + (1.5)(21) = 73.5
FL = Q1 – 1.5(IQR) = 21 – (1.5)(21) = –10.5

3. No values above upper fence; no values below lower fence

4. Upper inside value = 52; lower inside value = 5; draw whiskers

[Figure: boxplot on a 0 to 60 axis with lower inside value = 5, Q1 = 21, Q2 = 27.5, Q3 = 42, upper inside value = 52]


Illustrative Example: Boxplot 2

Data: 3 21 22 24 25 26 28 29 31 51

[Figure: boxplot on a 0 to 60 axis with outside value (3), inside value (21), lower hinge (22), median (25.5), upper hinge (29), inside value (31), outside value (51)]

1. 5-point summary: 3, 22, 25.5, 29, 51: draw box

2. IQR = 29 – 22 = 7
FU = Q3 + 1.5(IQR) = 29 + (1.5)(7) = 39.5
FL = Q1 – 1.5(IQR) = 22 – (1.5)(7) = 11.5

3. One above top fence (51); one below bottom fence (3)

4. Upper inside value is 31; lower inside value is 21; draw whiskers
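The four boxplot steps can be traced numerically for this data set. The sketch below is one possible arrangement under the hinge and 1.5(IQR) fence rules described in these slides; `boxplot_stats` and its dictionary keys are names invented for the illustration.

```python
from statistics import median

def boxplot_stats(values, k=1.5):
    """Return hinges, fences, whisker ends (inside values), and outside values."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2                          # halves share the median when n is odd
    q1, q3 = median(data[:half]), median(data[n - half:])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outside = [v for v in data if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median(data), "q3": q3,
            "fences": (lower_fence, upper_fence),
            "whiskers": (inside[0], inside[-1]),   # most extreme values still inside the fences
            "outside": outside}

stats = boxplot_stats([3, 21, 22, 24, 25, 26, 28, 29, 31, 51])
print(stats)
```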


Illustrative Example: Boxplot 3

Seven metabolic rates:

1362 1439 1460 1614 1666 1792 1867

N = 7

Data source: Moore, 2000

[Figure: boxplot of the seven metabolic rates]

1. 5-point summary: 1362, 1449.5, 1614, 1729, 1867

2. IQR = 1729 – 1449.5 = 279.5

FU = Q3 + 1.5(IQR) = 1729 + (1.5)(279.5) = 2148.25

FL = Q1 – 1.5(IQR) = 1449.5 – (1.5)(279.5) = 1030.25

3. None outside

4. Whiskers end @ 1867 and 1362


Boxplots: Interpretation

Location

◦ Position of median

◦ Position of box

Spread

◦ Hinge-spread (IQR)

◦ Whisker-to-whisker spread

◦ Range

Shape

◦ Symmetry or direction of skew

◦ Long whiskers (tails) indicate leptokurtosis (long tails?)


Side-by-side boxplots

Boxplots are especially useful when comparing groups


Spread: Standard Deviation

Most common descriptive measures of spread

Based on deviations around the mean.

This figure demonstrates the deviations of two of its values

This data set has a mean of 36.

The data point 33 has a deviation of 33 – 36 = −3.

The data point 40 has a deviation of 40 – 36 = 4.


Variance and Standard Deviation

Deviation = $x_i - \bar{x}$

Sum of squared deviations = $SS = \sum (x_i - \bar{x})^2$

Sample variance = $s^2 = \dfrac{SS}{n - 1}$

Sample standard deviation = $s = \sqrt{s^2}$

Standard deviation (formula)

$$s = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$

The sum $\sum (x_i - \bar{x})^2$ is the sum of squares.

Sample standard deviation $s$ is the estimator of population standard deviation $\sigma$.

Illustrative Example: Standard Deviation

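Observation | Deviation        | Squared deviation
$x_i$       | $x_i - \bar{x}$  | $(x_i - \bar{x})^2$
------------+------------------+--------------------
36          | 36 − 36 = 0      | 0² = 0
38          | 38 − 36 = 2      | 2² = 4
39          | 39 − 36 = 3      | 3² = 9
40          | 40 − 36 = 4      | 4² = 16
36          | 36 − 36 = 0      | 0² = 0
34          | 34 − 36 = −2     | (−2)² = 4
33          | 33 − 36 = −3     | (−3)² = 9
32          | 32 − 36 = −4     | (−4)² = 16
------------+------------------+--------------------
SUMS        | 0*               | SS = 58

* Sum of deviations always equals zero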


Illustrative Example (cont.)

Sample variance ($s^2$):

$$s^2 = \frac{SS}{n - 1} = \frac{58}{8 - 1} = 8.286\ (\mu\text{g/m}^3)^2$$

Standard deviation ($s$):

$$s = \sqrt{s^2} = \sqrt{8.286} = 2.88\ \mu\text{g/m}^3$$
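The hand calculation can be reproduced step by step in Python and compared with the standard library's `statistics.stdev`, which also divides by n − 1. The variable names below are illustrative only.

```python
from math import sqrt
from statistics import stdev

observations = [36, 38, 39, 40, 36, 34, 33, 32]   # the eight values from the worked example
n = len(observations)
xbar = sum(observations) / n
deviations = [x - xbar for x in observations]
ss = sum(d ** 2 for d in deviations)               # sum of squared deviations
variance = ss / (n - 1)                            # sample variance uses n - 1
sd = sqrt(variance)

print(f"mean = {xbar}, SS = {ss}, s^2 = {variance:.3f}, s = {sd:.2f}")
print(f"statistics.stdev gives {stdev(observations):.2f}")
```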


Interpretation of Standard Deviation

Measure of spread (e.g., if group 1 has s1 = 15 and group 2 has s2 = 10, group 1 has more spread, i.e., variability)

68-95-99.7 rule (next slide)

Chebychev’s rule (two slides hence)


68-95-99.7 Rule

Normal Distributions Only!

68% of data in the range μ ± σ

95% of data in the range μ ± 2σ

99.7% of data in the range μ ± 3σ

Example. Suppose a variable has a Normal distribution with μ = 30 and σ = 10. Then:

68% of values are between 30 ± 10 = 20 to 40

95% are between 30 ± (2)(10) = 30 ± 20 = 10 to 50

99.7% are between 30 ± (3)(10) = 30 ± 30 = 0 to 60
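The three percentages are rounded values of Normal probabilities. As a quick check, the sketch below evaluates them for the example with μ = 30 and σ = 10 using `statistics.NormalDist` from the Python standard library (available from Python 3.8); the dictionary name `shares` is illustrative.

```python
from statistics import NormalDist

dist = NormalDist(mu=30, sigma=10)
shares = {}
for k in (1, 2, 3):
    lo, hi = dist.mean - k * dist.stdev, dist.mean + k * dist.stdev
    shares[k] = dist.cdf(hi) - dist.cdf(lo)      # probability of falling within k standard deviations
    print(f"{lo:.0f} to {hi:.0f}: {100 * shares[k]:.1f}%")
```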


Chebychev’s Rule

All Distributions

Chebychev’s rule says that at least 75% of the values will fall in the range μ ± 2σ (for any shaped distribution)

Example: A distribution with μ = 30 and σ = 10 has at least 75% of the values in the range 30 ± (2)(10) = 10 to 50
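The 75% figure is the k = 2 case of a more general bound, stated here for reference in the notation of these slides (the symbol k, the number of standard deviations, is introduced only for this illustration):

```latex
P\left(|X - \mu| < k\sigma\right) \;\ge\; 1 - \frac{1}{k^2}, \qquad k > 1
% k = 2:  1 - 1/4 = 0.75   (at least 75% within \mu \pm 2\sigma)
% k = 3:  1 - 1/9 \approx 0.889  (at least about 89% within \mu \pm 3\sigma)
```

Because the bound holds for any distribution, it is much weaker than the 68-95-99.7 rule, which applies only to Normal distributions.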


Rules for Rounding

Carry at least four significant digits during calculations.

Round at last step of operation

Always report units

Always use common sense and good judgment.


Choosing Summary Statistics

Always report a measure of central location, a measure of spread, and the sample size

Symmetrical mound-shaped distributions → report mean and standard deviation

Odd shaped distributions → report 5-point summaries (or median and IQR)


Software and Calculators

Use software and calculators to check work.


Excel Data Analysis ToolPak

Data set: Boxplot.xlsx

Summary statistics using Excel


Excel Data Analysis ToolPak

Get summary statistics using Excel


Boxplot using Excel

YouTube link for drawing a boxplot in Excel:

http://www.youtube.com/watch?v=s8ZW4PVarwE&feature=related

[Figure: side-by-side boxplots of female age and male age on a 0 to 70 axis]

Boxplot and summary statistics by JMP

Data set: Boxplot.jmp

What do you say about the comparison of distributions of ages for females and males?

Exercise

Surgical times. Durations of surgeries (hours) for 15 patients receiving artificial hearts are shown here. Create a stem plot of these data. Describe the distribution. Are there any outliers? What is the standard deviation of this data set? Draw a boxplot based on this data set.

7.0 6.5 3.5 3.1 2.8 2.5 3.8 2.6 2.4 2.1 1.8 2.3 3.1 3.0 2.5

Data sets: Presentation2_Exercise.xlsx, Presentation2_exercise.jmp