statistic frequency distribution

Chapter 2 Descriptive Statistics 1 Larson/Farber 4th ed.

Upload: ellina-james

Post on 12-Feb-2016


DESCRIPTION

notes in statistic, basic stat

TRANSCRIPT

Page 1: Statistic Frequency Distribution

Chapter 2

Descriptive Statistics


Page 2: Statistic Frequency Distribution

Chapter Outline

• 2.1 Frequency Distributions and Their Graphs

• 2.2 More Graphs and Displays

• 2.3 Measures of Central Tendency

• 2.4 Measures of Variation

• 2.5 Measures of Position


Page 3: Statistic Frequency Distribution

Overview

Descriptive Statistics

• Describes the important characteristics of a set of data.

• Organize, present, and summarize data:

1. Graphically

2. Numerically


Page 4: Statistic Frequency Distribution

Important Characteristics of Quantitative Data

“Shape, Center, and Spread”

• Center: A representative or average value that indicates where the middle of the data set is located.

• Variation: A measure of the amount that the values vary among themselves.

• Distribution: The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed).

Page 5: Statistic Frequency Distribution

Overview

• 2.1 Frequency Distributions and Their Graphs

• 2.2 More Graphs and Displays

• 2.3 Measures of Central Tendency

• 2.4 Measures of Variation

• 2.5 Measures of Position


Page 6: Statistic Frequency Distribution

Section 2.1

Frequency Distributions

and Their Graphs


Page 7: Statistic Frequency Distribution

Frequency Distributions

Frequency Distribution

• A table that organizes data values into classes or intervals along with the number of values that fall in each class (frequency, f).

1. Ungrouped Frequency Distribution: for data sets with few different values. Each value is in its own class.

2. Grouped Frequency Distribution: for data sets with many different values, which are grouped together in the classes.

Page 8: Statistic Frequency Distribution

Grouped and Ungrouped Frequency Distributions

Ungrouped

Courses Taken    Frequency, f
1                25
2                38
3                217
4                1462
5                932
6                15

Grouped

Age of Voters    Frequency, f
18-30            202
31-42            508
43-54            620
55-66            413
67-78            158
79-90            32

Page 9: Statistic Frequency Distribution

Ungrouped Frequency Distributions

Number of Peas in a Pea Pod

Sample Size: 50

5 5 4 6 4

3 7 6 3 5

6 5 4 5 5

6 2 3 5 5

5 5 7 4 3

4 5 4 5 6

5 1 6 2 6

6 6 6 6 4

4 5 4 5 3

5 5 7 6 5

Peas per pod    Freq, f
1               1
2               2
3               5
4               9
5               18
6               12
7               3
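The tally for the pea-pod sample can be checked with a short script. This is a minimal sketch, not part of the original slides; the helper name `ungrouped_frequencies` is made up for illustration, and the data are the 50 values listed on this slide.

```python
from collections import Counter

# Pea counts from the 50 sampled pods, row by row as listed on the slide.
peas = [
    5, 5, 4, 6, 4,
    3, 7, 6, 3, 5,
    6, 5, 4, 5, 5,
    6, 2, 3, 5, 5,
    5, 5, 7, 4, 3,
    4, 5, 4, 5, 6,
    5, 1, 6, 2, 6,
    6, 6, 6, 6, 4,
    4, 5, 4, 5, 3,
    5, 5, 7, 6, 5,
]

def ungrouped_frequencies(data):
    """Return {value: frequency}, one class per distinct value, sorted by value."""
    counts = Counter(data)
    return {value: counts[value] for value in sorted(counts)}

freq = ungrouped_frequencies(peas)
print("Peas per pod  Freq, f")
for value, f in freq.items():
    print(f"{value:>12}  {f:>7}")
```

The frequencies should add up to the sample size, which is a quick way to catch a tally mistake.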

Page 10: Statistic Frequency Distribution

Graphs of Frequency Distributions: Frequency Histograms

Frequency Histogram

• A bar graph that represents the frequency distribution.

• The horizontal scale is quantitative and measures the data values.

• The vertical scale measures the frequencies of the classes.

• Consecutive bars must touch.

[Figure: histogram sketch; horizontal axis: data values, vertical axis: frequency]

Page 11: Statistic Frequency Distribution

Frequency Histogram

Ex. Peas per Pod

Peas per pod    Freq, f
1               1
2               2
3               5
4               9
5               18
6               12
7               3

[Figure: frequency histogram "Number of Peas in a Pod"; horizontal axis: Number of Peas (1 to 7), vertical axis: Frequency, f (0 to 20)]

Page 12: Statistic Frequency Distribution

Relative Frequency Distributions and Relative Frequency Histograms

Relative Frequency Distribution

• Shows the portion or percentage of the data that falls in a particular class.

• $\text{relative frequency} = \dfrac{\text{class frequency}}{\text{sample size}} = \dfrac{f}{n}$

Relative Frequency Histogram

• Has the same shape and the same horizontal scale as the corresponding frequency histogram.

• The vertical scale measures the relative frequencies, not frequencies.
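As a rough illustration of the f / n calculation, the sketch below converts the peas-per-pod frequencies into relative frequencies. The function name `relative_frequencies` is an assumption made for this example.

```python
# Frequencies from the peas-per-pod table (value -> f).
freq = {1: 1, 2: 2, 3: 5, 4: 9, 5: 18, 6: 12, 7: 3}

def relative_frequencies(freq):
    """Divide each class frequency f by the sample size n."""
    n = sum(freq.values())
    return {cls: f / n for cls, f in freq.items()}

rel = relative_frequencies(freq)
for cls, r in rel.items():
    print(f"{cls}: {r:.2f} ({r:.0%})")
```

The relative frequencies sum to 1 (100%), so only the vertical scale of the histogram changes, not its shape.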

Page 13: Statistic Frequency Distribution

Relative Frequency Histogram

Has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies.

Page 14: Statistic Frequency Distribution

Grouped Frequency Distributions

Grouped Frequency Distribution

• For data sets with many different values.

• Groups data into 5-20 classes of equal width.

Exam Scores    Freq, f
30-39          1
40-49          0
50-59          4
60-69          9
70-79          13
80-89          10
90-99          3

Page 15: Statistic Frequency Distribution

Grouped Frequency Distribution Terms

• Lower class limits: the smallest numbers that can actually belong to the different classes.

• Upper class limits: the largest numbers that can actually belong to the different classes.

• Class width: the difference between two consecutive lower class limits.

Page 16: Statistic Frequency Distribution

Labeling Grouped Frequency Distributions

• Class midpoints: the value halfway between the LCL and UCL of a class:

$\text{midpoint} = \dfrac{(\text{Lower class limit}) + (\text{Upper class limit})}{2}$

• Class boundaries: the value halfway between a UCL and the next LCL:

$\text{boundary} = \dfrac{(\text{Upper class limit}) + (\text{next Lower class limit})}{2}$
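To make the two formulas concrete, here is a small sketch applied to the exam-score classes (30-39 through 90-99) from the earlier slide. The function names are illustrative assumptions, not terms from the slides.

```python
# Class limits (LCL, UCL) from the exam-score table.
classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]

def midpoints(classes):
    """Midpoint of each class: (LCL + UCL) / 2."""
    return [(lcl + ucl) / 2 for lcl, ucl in classes]

def inner_boundaries(classes):
    """Boundary between consecutive classes: (UCL + next LCL) / 2."""
    return [
        (classes[i][1] + classes[i + 1][0]) / 2
        for i in range(len(classes) - 1)
    ]

def class_width(classes):
    """Difference between two consecutive lower class limits."""
    return classes[1][0] - classes[0][0]

print(midpoints(classes))         # first midpoint is 34.5
print(inner_boundaries(classes))  # first boundary is 39.5
print(class_width(classes))       # 10
```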

Page 17: Statistic Frequency Distribution

Constructing a Grouped Frequency Distribution

1. Determine the range of the data: Range = highest data value – lowest data value. May round up to the next convenient number.

2. Decide on the number of classes. Usually between 5 and 20; otherwise, it may be difficult to detect any patterns.

3. Find the class width: $\text{class width} = \dfrac{\text{range}}{\text{number of classes}}$. Round up to the next convenient number.

Page 18: Statistic Frequency Distribution

Constructing a Frequency Distribution

4. Find the class limits.
   • Choose the first LCL: use the minimum data entry or something smaller that is convenient.
   • Find the remaining LCLs: add the class width to the lower limit of the preceding class.
   • Find the UCLs: remember that classes must cover all data values and cannot overlap.

5. Find the frequencies for each class. (You may add a tally column first and make a tally mark for each data value in the class.)
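The five construction steps can be sketched in code for integer data. This is one possible reading of the procedure, not the textbook's own algorithm: the scores are hypothetical values made up for illustration, the first LCL is taken to be the minimum entry, and the extra widening when the range divides evenly is an assumption added so that the maximum value is always covered.

```python
import math

def grouped_frequencies(data, num_classes):
    """Build a grouped frequency distribution for integer data.

    Returns a list of ((LCL, UCL), frequency) pairs.
    """
    low, high = min(data), max(data)
    data_range = high - low                      # step 1: range
    width = math.ceil(data_range / num_classes)  # step 3: class width, rounded up
    # Assumption: if the classes would stop short of the maximum, widen by one.
    if low + width * num_classes <= high:
        width += 1
    table = []
    for i in range(num_classes):
        lcl = low + i * width                    # step 4: lower class limits
        ucl = lcl + width - 1                    # integer data: one below the next LCL
        f = sum(lcl <= x <= ucl for x in data)   # step 5: tally the class
        table.append(((lcl, ucl), f))
    return table

# Hypothetical scores, made up for illustration only.
scores = [52, 67, 71, 75, 78, 80, 83, 85, 88, 91, 94, 99]
table = grouped_frequencies(scores, 5)           # step 2: five classes chosen
for (lcl, ucl), f in table:
    print(f"{lcl}-{ucl}: {f}")
```

Because each UCL is one less than the next LCL, the classes cover every value without overlapping.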

Page 19: Statistic Frequency Distribution

“Shape” of Distributions

Symmetric

• Data is symmetric if the left half of its histogram is roughly a mirror image of its right half.

Skewed

• Data is skewed if it is not symmetric and if it extends more to one side than the other.

Uniform

• Data is uniform if it is equally distributed (on a histogram, all the bars are the same height or approximately the same height).

Page 20: Statistic Frequency Distribution

The Shape of Distributions

[Figure: four histogram panels: Symmetric, Skewed Left, Skewed Right, Uniform]

Page 21: Statistic Frequency Distribution

Outliers

• Unusual data values as compared to the rest of the set. They may be distinguished by gaps in a histogram.

[Figure: histogram with the outliers labeled]

Page 22: Statistic Frequency Distribution

Section 2.2

More Graphs and Displays


Page 23: Statistic Frequency Distribution

Other Graphs

Besides Histograms, there are other methods of graphing quantitative data:

• Stem and Leaf Plots

• Dot Plots

• Time Series

Page 24: Statistic Frequency Distribution

Stem and Leaf Plots

Represents data by separating each data value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit)


Page 25: Statistic Frequency Distribution

Constructing Stem and Leaf Plots

• Split each data value at the same place value to form the stem and a leaf. (Want 5-20 stems.)

• Arrange all possible stems vertically so there are no missing stems.

• Write each leaf to the right of its stem, in order.

• Create a key to recreate the data.

• Variations of stem plots:

1. Split stems

2. Back to back stem plots.
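A basic (unsplit) stem-and-leaf plot following these steps might be built as below. The data values are hypothetical, made up for illustration, and the sketch assumes non-negative integers split at the tens place; with so few values it produces fewer stems than the 5-20 suggested above.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: sorted leaves} for non-negative integers, split at the tens place."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)
    # List every stem between the smallest and largest so no stem is missing.
    return {stem: leaves.get(stem, []) for stem in range(min(leaves), max(leaves) + 1)}

# Hypothetical values, made up for illustration only.
values = [12, 15, 21, 21, 27, 40, 43, 48, 48, 49]
plot = stem_and_leaf(values)
for stem, stem_leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in stem_leaves)}")
print("Key: 1 | 2 = 12")
```

Stem 3 is printed with no leaves, which keeps the gap in the data visible.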

Page 26: Statistic Frequency Distribution

Constructing a Stem-and-Leaf Plot


Include a key to identify the values of the data.

Page 27: Statistic Frequency Distribution

Dot Plots

Dot plot

• Consists of a graph in which each data value is plotted as a point along a scale of values

Figure 2-5

Page 28: Statistic Frequency Distribution

Time Series (Paired data)

Time Series

• Data set is composed of quantitative entries taken at regular intervals over a period of time, e.g., the amount of precipitation measured each day for one month.

• Use a time series chart to graph.

[Figure: time series chart sketch; horizontal axis: time, vertical axis: quantitative data]

Page 29: Statistic Frequency Distribution

Time-Series Graph

Number of Screens at Drive-In Movie Theaters

Figure 2-8

Ex. www.eia.doe.gov/oil_gas/petroleum/

Page 30: Statistic Frequency Distribution

Graphing Qualitative Data Sets

Pie Chart

• A circle is divided into sectors that represent categories.


Pareto Chart

• A vertical bar graph in which the height of each bar represents frequency or relative frequency.

[Figure: Pareto chart sketch; horizontal axis: categories, vertical axis: frequency]

Page 31: Statistic Frequency Distribution

Constructing a Pie Chart

• Find the total sample size.

• Convert the frequencies to relative frequencies (percent).

Marital Status    Frequency, f (in millions)    Relative frequency (%)
Never Married     55.3                          55.3 / 219.7 ≈ 0.25 or 25%
Married           127.7                         127.7 / 219.7
Widowed           13.9                          13.9 / 219.7
Divorced          22.8                          22.8 / 219.7
Total:            219.7
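The remaining relative frequencies can be worked out the same way as the first row. The sketch below does the division for every category; the function name `percent_shares` is an illustrative assumption.

```python
# Frequencies in millions, from the marital status table.
marital = {
    "Never Married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

def percent_shares(freq):
    """Convert frequencies to relative frequencies expressed in percent."""
    total = sum(freq.values())
    return {category: 100 * f / total for category, f in freq.items()}

shares = percent_shares(marital)
for category, pct in shares.items():
    print(f"{category:<14} {pct:5.1f}%")
```

Each percentage gives the share of the circle that the category's sector should occupy.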

Page 32: Statistic Frequency Distribution

Constructing Pareto Charts

• Create a bar for each category, where the height of the bar can represent frequency or relative frequency.

• The bars are often positioned in order of decreasing height, with the tallest bar positioned at the left.

Figure 2-6
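The ordering step for a Pareto chart amounts to sorting the categories by decreasing frequency. As a sketch, the marital status frequencies from the pie chart slide are reused here purely for illustration; the text-based bars and the name `pareto_order` are assumptions of this example.

```python
# Frequencies in millions, reused from the marital status table.
marital = {
    "Never Married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

def pareto_order(freq):
    """Return (category, frequency) pairs sorted from tallest bar to shortest."""
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)

ordered = pareto_order(marital)
for category, f in ordered:
    bar = "#" * round(f / 5)  # crude text bar: one '#' per 5 million
    print(f"{category:<14} {bar} {f}")
```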

Page 33: Statistic Frequency Distribution

Section 2.3

Measures of Central Tendency


Page 34: Statistic Frequency Distribution

Measures of Central Tendency

Measure of central tendency

• A value that represents a typical, or central, entry of a data set.

• Most common measures of central tendency: mean, median, and mode.

Page 35: Statistic Frequency Distribution

Measure of Central Tendency: Mean

Mean: The sum of all the data entries divided by the number of entries.

• Population mean: $\mu = \dfrac{\sum x}{N}$

• Sample mean: $\bar{x} = \dfrac{\sum x}{n}$

• Round-off rule for measures of center: Carry one more decimal place than is in the original values. Do not round until the last step.

Page 36: Statistic Frequency Distribution

Measure of Central Tendency: Median

Median

• The value that lies in the middle of the data when the data set is arranged in order from lowest to highest.

• Measures the center of an ordered data set by dividing it into two equal parts.

• A sample median is often denoted $\tilde{x}$.

• If the data set has an
  odd number of entries: median is the middle data entry.
  even number of entries: median is the mean of the two middle data entries.

Page 37: Statistic Frequency Distribution

Computing the Median

If the data set has an:

• odd number of entries: median is the middle data entry:

  2 5 6 11 13    median is the exact middle value: $\tilde{x} = 6$

• even number of entries: median is the mean of the two middle data entries:

  2 5 6 7 11 13    median is the mean of the two middle numbers: $\tilde{x} = \dfrac{6 + 7}{2} = 6.5$
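The odd/even rule translates directly into code. A minimal sketch, assuming a non-empty list of numbers (Python's standard library also provides `statistics.median`, which gives the same results):

```python
def median(data):
    """Middle entry of the sorted data, or the mean of the two middle entries."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd: exact middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: mean of the two middle values

print(median([2, 5, 6, 11, 13]))     # 6
print(median([2, 5, 6, 7, 11, 13]))  # 6.5
```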

Page 38: Statistic Frequency Distribution

Measure of Central Tendency: Mode

Mode

• The data entry that occurs with the greatest frequency.

• If no entry is repeated the data set has no mode.

• If two entries occur with the same greatest frequency, each entry is a mode (bimodal).

a) 5.40 1.10 0.42 0.73 0.48 1.10    Mode is 1.10

b) 27 27 27 55 55 55 88 88 99    Bimodal: 27 and 55

c) 1 2 3 6 7 8 9 10    No mode
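The three cases (one mode, bimodal, no mode) can be reproduced with a frequency count. This is a sketch assuming a non-empty data set; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(data):
    """Return every entry tied for the greatest frequency; empty if nothing repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no entry is repeated, so there is no mode
    return sorted(value for value, c in counts.items() if c == top)

print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))    # [1.1]
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))    # [27, 55]
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))               # []
```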

Page 39: Statistic Frequency Distribution

Comparing the Mean, Median, and Mode

• All three measures describe an “average”. Choose the one that best represents a “typical” value in the set.

• Mean: The most familiar average. A reliable measure because it takes into account every entry of a data set. May be greatly affected by outliers or skew.

• Median: A common average. Not as affected by skew or outliers.

• Mode: May be used if one value repeats overwhelmingly often.

Page 40: Statistic Frequency Distribution

Choosing the “Best Average”

• The shape of your data and the existence of any outliers may help you choose the best average:

Page 41: Statistic Frequency Distribution

Section 2.4

Measures of Variation


Page 42: Statistic Frequency Distribution

Measures of Variation (“Spread”)

Another important characteristic of quantitative data is how much the data varies, or is spread out.

The two most common methods of measuring spread are:

1. Range

2. Standard deviation and Variance


Page 43: Statistic Frequency Distribution

Range

Range

• The difference between the maximum and minimum data entries in the set.

• The data must be quantitative.

• Range = (Max. data entry) – (Min. data entry)


Page 44: Statistic Frequency Distribution

Example: Finding the Range

The wait time to see a bank teller is studied at 2 banks.

Bank A has multiple lines, one for each teller.

Bank B has a single wait line for 1st available teller.

5 wait times (in minutes) are sampled from each bank:

Bank A: 5.2 6.2 7.5 8.4 9.2

Bank B: 6.6 6.8 7.5 7.7 7.9

Find the mean, median, and range for each bank.

Page 45: Statistic Frequency Distribution

Solution: Finding the Range

• Bank A: Range = ?

• Bank B: Range = ?

• Note: The range is easy to compute, but only uses 2 values. Do the following 2 sets vary the same?

Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10
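One way to check the bank example is with Python's `statistics` module. The sketch below computes the mean, median, and range for the two samples of wait times; the helper name `summarize` is an assumption for illustration.

```python
from statistics import mean, median

# Wait times in minutes, from the bank example.
bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]

def summarize(data):
    """Mean, median, and range (max entry minus min entry) of a data set."""
    return {
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
    }

for name, data in (("Bank A", bank_a), ("Bank B", bank_b)):
    s = summarize(data)
    print(f"{name}: mean = {s['mean']:.1f}, median = {s['median']:.1f}, range = {s['range']:.1f}")
```

Both banks share the same mean and median, so only the range separates them here.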

Page 46: Statistic Frequency Distribution

Standard Deviation and Variance

Measures the typical amount data deviates from the mean.

Sample Variance, $s^2$:

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Sample Standard Deviation, $s$:

$s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Page 47: Statistic Frequency Distribution

Finding Sample Variance & Standard Deviation

1. Find the mean of the sample data set: $\bar{x} = \dfrac{\sum x}{n}$

2. Find the deviation of each entry: $x - \bar{x}$

3. Square each deviation: $(x - \bar{x})^2$

4. Add to get the sum of the deviations squared: $\sum (x - \bar{x})^2$

5. Divide by n – 1 to get the sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

6. Find the square root to get the sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
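The six steps can be followed line by line in code. As a sketch, the function below is applied to Set A and Set B from the range slide, which have the same range but different spreads; the function name is an assumption for illustration.

```python
import math

def sample_variance_and_sd(data):
    """Follow the six steps: returns (sample variance, sample standard deviation)."""
    n = len(data)
    x_bar = sum(data) / n                      # step 1: mean
    deviations = [x - x_bar for x in data]     # step 2: deviation of each entry
    squares = [d ** 2 for d in deviations]     # step 3: square each deviation
    sum_of_squares = sum(squares)              # step 4: sum of the squared deviations
    variance = sum_of_squares / (n - 1)        # step 5: divide by n - 1
    return variance, math.sqrt(variance)       # step 6: square root

set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
set_b = [1, 10, 10, 10, 10, 10, 10, 10, 10, 10]

for name, data in (("Set A", set_a), ("Set B", set_b)):
    var, sd = sample_variance_and_sd(data)
    print(f"{name}: range = {max(data) - min(data)}, s^2 = {var:.2f}, s = {sd:.2f}")
```

The two sets share a range of 9, yet their standard deviations differ, which is the point of the question on the range slide.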

Page 48: Statistic Frequency Distribution

Find the Standard Deviation and Variance for Bank A (multi-line)

Wait time, x (in min)    Deviation: x – x̄     Squares: (x – x̄)²
5.2                      5.2 – 7.3 = –2.1     (–2.1)² = 4.41
6.2                      6.2 – 7.3 =          (    )² =
7.5                      7.5 – 7.3 =          (    )² =
8.4                      8.4 – 7.3 =          (    )² =
9.2                      9.2 – 7.3 =          (    )² =
Σx = 36.5                                     Σ(x – x̄)² =

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36.5}{5} = 7.3$ min

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} =$

$s = \sqrt{s^2} =$

• Round to one more decimal than the data.

• Don’t round until the end.

• Include the appropriate units.

Page 49: Statistic Frequency Distribution

Find the Standard Deviation and Variance for Bank B (1 wait line)

Wait time, x (in min)    Deviation: x – x̄     Squares: (x – x̄)²
6.6
6.8
7.5
7.7
7.9
Σx = 36.5                                     Σ(x – x̄)² =

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36.5}{5} = 7.3$ min

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} =$

$s = \sqrt{s^2} =$

• Round to one more decimal than the data.

• Don’t round until the end.

• Include the appropriate units.
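Once the two worksheets have been filled in by hand, the entries can be checked with a short script. The sketch below rebuilds the deviation and squared-deviation columns for both banks; the function names are illustrative assumptions.

```python
import math

# Wait times in minutes, from the bank example.
bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]

def deviation_table(data):
    """Return (x_bar, rows), where each row is (x, x - x_bar, (x - x_bar)^2)."""
    x_bar = sum(data) / len(data)
    rows = [(x, x - x_bar, (x - x_bar) ** 2) for x in data]
    return x_bar, rows

def sample_sd(data):
    """Sample standard deviation: sqrt of the sum of squares divided by n - 1."""
    _, rows = deviation_table(data)
    variance = sum(sq for _, _, sq in rows) / (len(data) - 1)
    return math.sqrt(variance)

for name, data in (("Bank A", bank_a), ("Bank B", bank_b)):
    x_bar, rows = deviation_table(data)
    print(f"{name}: mean = {x_bar:.1f} min")
    for x, dev, sq in rows:
        print(f"  {x:4.1f}  {dev:5.1f}  {sq:5.2f}")
    print(f"  s = {sample_sd(data):.2f} min")
```

Rounding happens only in the print statements, in line with the "don't round until the end" rule.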

Page 50: Statistic Frequency Distribution

Sample versus Population Standard Deviation and Variance

                      Sample Statistics    Population Parameters
Mean                  x̄                    µ
Standard Deviation    s                    σ
Variance              s²                   σ²

Page 51: Statistic Frequency Distribution

Sample versus Population Standard Deviation

Sample Standard Deviation

$s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Population Standard Deviation

$\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

Note: Unlike x̄ and µ, the formulas for s and σ are not mathematically the same.
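Python's standard `statistics` module exposes both versions: `stdev` divides by n – 1 (sample) and `pstdev` divides by N (population). The sketch below applies both to the Bank A wait times purely to show that the two formulas give different numbers on the same data.

```python
from statistics import pstdev, stdev

# Bank A wait times in minutes, from the earlier example.
bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]

s = stdev(bank_a)       # sample formula: divides by n - 1
sigma = pstdev(bank_a)  # population formula: divides by N

print(f"s = {s:.2f} min, sigma = {sigma:.2f} min")
```

Dividing by the smaller number n – 1 always makes s larger than the value the population formula would give for the same entries.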

Page 52: Statistic Frequency Distribution

Standard Deviation: Key Points

The standard deviation is a measure of variation of all values from the mean. The larger s is, the more the data varies.

The standard deviation is never negative: s ≥ 0. (When would s = 0?)

The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).

The units of the standard deviation s are the same as the units of the original data values. (The variance has units².)

Page 53: Statistic Frequency Distribution

Interpreting Standard Deviation

• Standard deviation is a measure of the typical amount an entry deviates from the mean.

• The more the entries are spread out, the greater the standard deviation.


Page 54: Statistic Frequency Distribution

Solution: Using Technology to Find the Standard Deviation


Sample Mean

Sample Standard Deviation

Page 55: Statistic Frequency Distribution

Using Technology

The gas mileage of 2 cars is sampled over various conditions:

Car A: 21.1 21.2 20.8 19.8 23.8 (mpg)

Car B: 25.2 19.1 18.0 24.4 20.3 (mpg)

Which car do you think gets “better” mpg?

Use a calculator to find the mean and standard deviation for each to justify your choice.
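In place of a calculator, the same two statistics can be found with Python's `statistics` module. This is a sketch of the computation only; which car is "better" is still a judgment about mean versus consistency.

```python
from statistics import mean, stdev

# Gas mileage samples in mpg, from the slide.
car_a = [21.1, 21.2, 20.8, 19.8, 23.8]
car_b = [25.2, 19.1, 18.0, 24.4, 20.3]

for name, mpg in (("Car A", car_a), ("Car B", car_b)):
    # stdev uses the sample formula (divides by n - 1).
    print(f"{name}: mean = {mean(mpg):.2f} mpg, s = {stdev(mpg):.2f} mpg")
```

The means come out nearly equal, so the standard deviations carry most of the comparison.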

Page 56: Statistic Frequency Distribution

Standard Deviation and “Spread”

How does “s” show how much the data varies?

Three methods:

1. Range Rule of Thumb

2. Chebyshev’s Theorem

3. The Empirical Rule

Page 57: Statistic Frequency Distribution

The Range Rule of Thumb

Range Rule: For most data sets, the majority of the data lies within 2 standard deviations of the mean.

Recall: Range = High – Low

Estimate: Range ≈ 4s

Alternatively, if the range is known, you can use the range rule to estimate the standard deviation:

$s \approx \dfrac{\text{Range}}{4}$

Page 58: Statistic Frequency Distribution

Using the Range Rule of Thumb

A sample of women’s heights has a mean of 64 inches and a standard deviation of 2.5 inches.

Using the range rule, “most” women fall within what heights?

What would be an “unusual” height?
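A sketch of the range rule applied to the heights example: "most" values are taken to lie within 2 standard deviations of the mean, and anything outside that interval is treated as unusual. The function names and the test heights of 70 and 66 inches are assumptions made for illustration.

```python
def usual_range(mean, s):
    """Interval mean ± 2s that holds 'most' values under the range rule."""
    return mean - 2 * s, mean + 2 * s

def estimate_sd(high, low):
    """Range rule estimate of the standard deviation: s ≈ range / 4."""
    return (high - low) / 4

def is_unusual(x, mean, s):
    """True when x falls outside mean ± 2s."""
    low, high = usual_range(mean, s)
    return x < low or x > high

low, high = usual_range(64, 2.5)
print(f"Most heights fall between {low} and {high} inches")
print(is_unusual(70, 64, 2.5))  # True: above the upper limit
print(is_unusual(66, 64, 2.5))  # False: inside the interval
```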

Page 59: Statistic Frequency Distribution

Using the Range Rule of Thumb

The sample of Exam Scores used in the class handout had a mean of 73.6. Which of the following is most likely the standard deviation of the sample?

s = 3.6 s = 12.8 s = 74.5

Use the range rule to help justify your choice.

Page 60: Statistic Frequency Distribution

Chebyshev’s Theorem

Chebyshev’s Theorem

For data with any distribution, the proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - \dfrac{1}{K^2}$, where K is any positive number greater than 1.

For K = 2, at least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean

For K = 3, at least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean

Page 61: Statistic Frequency Distribution

Using Chebyshev’s Theorem

A sample of salaries at an elementary school has a mean of $32,000 and a standard deviation of $3000.

Use Chebyshev’s Theorem to describe how the salaries are spread out.

Would a salary of $28,000 be “unusual?”

Would a salary of $45,000 be “unusual”?
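The bound and the salary questions can be explored with a few lines of code. This is a sketch under the slide's numbers (mean $32,000, standard deviation $3000); the function names are illustrative assumptions.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2

def deviations_from_mean(x, mean, s):
    """How many standard deviations x lies from the mean (signed)."""
    return (x - mean) / s

mean, s = 32_000, 3_000
for k in (2, 3):
    low, high = mean - k * s, mean + k * s
    print(f"At least {chebyshev_min_fraction(k):.0%} of salaries lie between ${low:,} and ${high:,}")

for salary in (28_000, 45_000):
    print(f"${salary:,} is {deviations_from_mean(salary, mean, s):+.2f} standard deviations from the mean")
```

A salary within 2 standard deviations sits inside the interval that holds at least 75% of the data, while one more than 4 standard deviations away does not.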

Page 62: Statistic Frequency Distribution

The Empirical Rule

Empirical (68-95-99.7) Rule

For data sets having a symmetric, bell-shaped distribution:

About 68% of all values fall within 1 standard deviation of the mean

About 95% of all values fall within 2 standard deviations of the mean

About 99.7% of all values fall within 3 standard deviations of the mean

Page 63: Statistic Frequency Distribution

The Empirical Rule

Page 64: Statistic Frequency Distribution

The Empirical Rule

Page 65: Statistic Frequency Distribution

The Empirical Rule

Page 66: Statistic Frequency Distribution

Example: Using the Empirical Rule

A sample of IQs has a symmetric distribution with a mean of 100 and a standard deviation of 15.

1. Sketch the distribution.

2. 68% of people have an IQ between what 2 values?

3. What percent of people have an IQ between 70 and 130?

4. What percent of people have an IQ between 100 and 115?

5. What percent of people have an IQ above 145?
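Questions 2 through 5 can be reasoned out from the three percentages of the Empirical Rule and the symmetry of the distribution. The sketch below records that reasoning; the names `WITHIN` and `interval` are assumptions for illustration, and the results are approximations, as the rule itself is.

```python
# Approximate percent of values within k standard deviations (Empirical Rule).
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

def interval(mean, s, k):
    """Interval mean ± k standard deviations."""
    return mean - k * s, mean + k * s

mean, s = 100, 15

low_1, high_1 = interval(mean, s, 1)   # question 2: 68% lie in this interval
pct_70_130 = WITHIN[2]                 # question 3: 70 and 130 are mean ± 2s
pct_100_115 = WITHIN[1] / 2            # question 4: half of the ± 1s band, by symmetry
pct_above_145 = (100 - WITHIN[3]) / 2  # question 5: half of what lies outside ± 3s

print(f"68% of IQs fall between {low_1} and {high_1}")
print(f"Between 70 and 130: about {pct_70_130:.0f}%")
print(f"Between 100 and 115: about {pct_100_115:.0f}%")
print(f"Above 145: about {pct_above_145:.2f}%")
```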
