chapter 5: understanding and comparing distributions

24
Chapter 5: Understanding and Comparing Distributions AP Statistics

Upload: aulii

Post on 23-Feb-2016

57 views

Category:

Documents


0 download

DESCRIPTION

Chapter 5: Understanding and Comparing Distributions. AP Statistics. Understanding and Comparing Distributions . In Chapter 3 we compared the associations between two categorical variables through contingency tables and displays. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Chapter 5:  Understanding and Comparing Distributions

Chapter 5: Understanding and

Comparing Distributions

AP Statistics

Page 2: Chapter 5:  Understanding and Comparing Distributions

In Chapter 3 we compared the associations

between two categorical variables through contingency tables and displays.

Now we will look at the different ways of examining the relationships between two variables (one quantitative and one categorical).

Understanding and Comparing Distributions

Page 3: Chapter 5:  Understanding and Comparing Distributions

We can answer much more interesting

questions about variables when we compare distributions for different groups.

Below is a histogram of the Average Wind Speed for every day in 1989.

The Big Picture

Page 4: Chapter 5:  Understanding and Comparing Distributions

The distribution is unimodal and skewed to

the right. The high value may be an outlier

The Big Picture (cont.)

Median daily wind speed is about 1.90 mph and the IQR is reported to be 1.78 mph.

Can we say more?

Page 5: Chapter 5:  Understanding and Comparing Distributions

The five-number

summary of a distribution reports its median, quartiles, and extremes (maximum and minimum). Example: The five-

number summary for the daily wind speed is:

Review: The Five-Number Summary

Max 8.67

Q3 2.93

Median 1.90

Q1 1.15

Min 0.20

Page 6: Chapter 5:  Understanding and Comparing Distributions

A boxplot is a graphical display of the five-

number summary.

Boxplots are useful when comparing groups.

Boxplots are particularly good at pointing out outliers.

Making Boxplots

Page 7: Chapter 5:  Understanding and Comparing Distributions

1. Draw a single

vertical axis spanning the range of the data. Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box.

Constructing Boxplots

Page 8: Chapter 5:  Understanding and Comparing Distributions

2. Draw “fences” around

the main part of the data.

The upper fence is 1.5*(IQR) above the upper quartile.

The lower fence is 1.5*(IQR) below the lower quartile.

Note: the fences only help with constructing the boxplot and should not appear in the final display.

Constructing Boxplots (cont.)

Page 9: Chapter 5:  Understanding and Comparing Distributions

3. Use the fences to

grow “whiskers.” Draw lines from

the ends of the box up and down to the most extreme data values found within the fences.

If a data value falls outside one of the fences, we do not connect it with a whisker.

Constructing Boxplots (cont.)

Page 10: Chapter 5:  Understanding and Comparing Distributions

4. Add the outliers

by displaying any data values beyond the fences with special symbols. We often use a

different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.

Constructing Boxplots (cont.)

Page 11: Chapter 5:  Understanding and Comparing Distributions

Compare the histogram and boxplot for daily wind

speeds:

How does each display represent the distribution?

Wind Speed: Making Boxplots

Page 12: Chapter 5:  Understanding and Comparing Distributions

The center of the boxplot shows us the

middle half of the data between the quartiles.

The height of the box is equal to the IQR. If the median is roughly centered

between the quartiles, then the middle half of the data is roughly symmetric. Thus, if the median is not centered, the distribution is skewed.

The whiskers also show the skewness if they are not the same length.

Outliers are out of the way to keep you from judging skewness, but give them special attention.

What Do Boxplots Tell Us?

Page 13: Chapter 5:  Understanding and Comparing Distributions

It is almost always more interesting to compare groups. With histograms, note the shapes, centers, and

spreads of the two distributions.

What does this graphical display tell you?

Comparing Groups

Page 14: Chapter 5:  Understanding and Comparing Distributions

In 2004, the average number of SV students to pass the AP Statistics exam was 2.9 out of 15 students in a class. Ms. Halliday collected class data from the past AP Statistics classes. Since there were so many, she broke them off into males and females to compare. The results are to the right.

How does the number of students who pass the AP Statistics exam compare to males and females?

Comparing Groups (Stem-and-Leaf Displays)

Key: 1|3| = 3.1

Page 15: Chapter 5:  Understanding and Comparing Distributions

Boxplots offer an ideal balance of information and

simplicity, hiding the details while displaying the overall summary information.

We often plot them side by side for groups or categories we wish to compare.

What do these boxplots tell you?

Comparing Groups (cont.)

Page 16: Chapter 5:  Understanding and Comparing Distributions

If there is a legitimate reason for a mistake in

the data, disregard the outlier and give your reason for doing so.

If there are any clear outliers and you cant tell if they are errors or not, it is best to report the mean and standard deviation with the outliers present and with the outliers removed. The differences may be quite revealing.

Note: Remember that the median and IQR are not likely to be affected by the outliers.

What About Outliers?Day 2

Page 17: Chapter 5:  Understanding and Comparing Distributions

For some data sets, we are interested in how

the data behave over time. In these cases, we construct timeplots of the data.

NOTE: Time is ALWAYS on the horizontal axis.

Timeplots

Page 18: Chapter 5:  Understanding and Comparing Distributions

When the data are skewed it can be hard to summarize them simply with a center and spread, and hard to decide whether the most extreme values are outliers or just part of a stretched out tail.

How can we say anything useful about such data?

*Re-expressing Skewed Data to Improve Symmetry

Page 19: Chapter 5:  Understanding and Comparing Distributions

One way to make a skewed

distribution more symmetric is to re-express or transform the data by applying a simple function (e.g., logarithmic function).

Note the change in skewness from the raw data (previous slide) to the transformed data (right):

*Re-expressing Skewed Data to Improve Symmetry (cont.)

Page 20: Chapter 5:  Understanding and Comparing Distributions

Avoid inconsistent

scales, either within the display or when comparing two displays.

Label clearly so a reader knows what the plot displays. Good intentions, bad

plot:

What Can Go Wrong?

Page 21: Chapter 5:  Understanding and Comparing Distributions

Beware of outliers

Be careful when comparing groups that have very different spreads. Consider these side-

by-side boxplots of cotinine levels:

Re-express . . .

What Can Go Wrong? (cont.)

Page 22: Chapter 5:  Understanding and Comparing Distributions

Imagine we were to examine data for the

number of inches of snowfall in December in Bismarck, ND, for the past 100 years. What information could we learn from looking at a histogram that would be difficult to see in a timeplot? What information could we see in a timeplot that would not show up in a histogram?

Histogram vs. Timeplot

Page 23: Chapter 5:  Understanding and Comparing Distributions

We’ve learned the value of comparing data groups

and looking for patterns among groups and over time.

We’ve seen that boxplots are very effective for comparing groups graphically.

We’ve experienced the value of identifying and investigating outliers.

We’ve graphed data that has been measured over time against a time axis and looked for long-term trends both by eye and with a data smoother.

Recap

Page 24: Chapter 5:  Understanding and Comparing Distributions

Day 1: # 5, 7, 10, 11, 12, 15 Day 2: # 13, 18, 20, 21, 23 Day 3: # 27, 3, 35, 36b, 39, 40

Reading: Chapter 6

Assignments: pp. 95 – 103