Statistics 12 Chapter 1 – Exploring Data Dr. John Lo Royal Canadian College 2020-2021



0. Real-life Case Study


› Question: Do pets or friends help reduce stress?


1. Introduction: Making sense of data


› Statistics: The science of data

› In order to understand what data tell us, we need to perform data analysis.

› Data analysis: The process of organizing, displaying, summarizing, and asking questions about data.


› Example: Imagine that we are developing a student database for RCC.

› Questions to ask:

1. What are the individuals?

› The students of RCC

2. What are the variables?

› For example, gender, age, grade level, address, phone numbers.


› Depending on the nature of variables, we can group them into two major types:

› Examples:

• Categorical – gender, race, occupation

• Quantitative – grade point average, age


› Example: Is there anything suspicious?


› Another key description that we should know regarding data is distribution.

› By definition:

› Can you describe the distribution of data in the previous example?


› Practice: The following table includes data for 10 people chosen at random from more than 1 million people in households.


2. Analyzing categorical data


A. Distribution of a single categorical variable

› Recall that categorical variables place individuals into one of several groups or categories.

➢ Note that the values of a categorical variable are labels for the different categories.

➢Note also that the distribution of a categorical variable lists the count or percent of individuals who fall into each category.
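The count-and-percent idea above can be sketched in a few lines of Python. This is only a minimal illustration: the list `formats` and the helper `frequency_table` are hypothetical names and values, not data from the radio survey.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical labels for 10 individuals (not the survey data).
formats = ["News", "Rock", "Country", "Rock", "News",
           "Country", "Country", "Rock", "Country", "News"]

table = frequency_table(formats)
for cat, (n, pct) in sorted(table.items()):
    # One row per category: label, count, percent.
    print(f"{cat:<10}{n:>5}{pct:>8.1f}%")
```

The percents always add up to 100 because every individual falls into exactly one category.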


› Example: A survey of radio audience rating of US radio stations

[Table: frequency table annotated with its variable, values, counts, and percents]

› Frequency tables are not always easy to read and analyze.

› To make the analysis easier, we can display the distribution with a pie chart or a bar graph.

i. Bar graphs

› A bar graph displays the distribution of a categorical variable, showing the counts for each category for easy comparison.


› Two ways of showing bar charts: horizontal vs. vertical


› Question: Should a bar chart be horizontal or vertical?

› Answer: It depends on whether the variable is nominal or ordinal.


› Bar charts may be deceiving if not well designed.


• Example: How many people were in each class on the Titanic?


› The best data displays observe a fundamental principle of graphing data called the area principle.

› This principle states that the area occupied by a part of the graph should correspond to the magnitude of the value it represents.

› Violations of the area principle are therefore a common way to lie with statistics, whether intentionally or not.


› Question: What’s wrong with this bar chart?


› Question: 500 random customers who bought the new iMac computer were asked if their previous computer had been another Mac or a Windows computer. The results are found in this table:

› Why is the pictograph misleading?


› Two possible bar graphs of the data are shown below. Which one could be considered deceptive? Why?


ii. Pie charts

› A pie chart displays all the cases as a circle whose slices have areas proportional to each category’s fraction of the whole.

› Pie charts give a quick impression of the distribution, and are particularly good for showing relative frequencies of ½, ¼, or ⅛.


› There are many varieties of pie charts:


› Note that pie charts may be attractive, but it can be hard to see patterns in them.

› Can you tell the differences in distributions depicted by these three pie charts?


› The bar charts of the same values look like:

› Bar charts are almost always better than pie charts for comparing the relative frequencies of categories.


B. Distribution with more than one categorical variable

› We have learned to analyze the distribution of a single categorical variable using frequency tables, bar charts, and pie charts.

› But what should we do if the data involve two categorical variables?

› We can make use of a two-way table!


› Example: A survey of 4826 randomly selected young adults (aged 19 to 25) asked, “What do you think the chances are you will have much more than a middle-class income at age 30?”. The table below shows the responses.


› To analyze data presented in a two-way table, we look at the marginal distributions, which are defined as follows:

› In typical two-way tables, these distributions are listed on the right and bottom margins.


› The marginal distributions in the two-way table shown previously can be examined by converting the counts into percents:
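As a rough sketch of how marginal distributions are obtained, the snippet below totals the rows and columns of a small two-way table and converts the totals to percents. The counts, the category labels, and the function names are hypothetical and are not the survey results.

```python
# Hypothetical two-way table: rows are opinions, columns are gender.
two_way = {
    "Almost no chance": {"Female": 10, "Male": 10},
    "A 50-50 chance":   {"Female": 30, "Male": 20},
    "Almost certain":   {"Female": 10, "Male": 20},
}

def grand_total(table):
    """Total number of individuals in the table."""
    return sum(sum(row.values()) for row in table.values())

def marginal_rows(table):
    """Percent of all individuals in each row category (right margin)."""
    total = grand_total(table)
    return {label: 100 * sum(row.values()) / total
            for label, row in table.items()}

def marginal_columns(table):
    """Percent of all individuals in each column category (bottom margin)."""
    total = grand_total(table)
    col_totals = {}
    for row in table.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {col: 100 * n / total for col, n in col_totals.items()}

print(marginal_rows(two_way))
print(marginal_columns(two_way))
```

Each marginal distribution uses the grand total as its denominator, which is why it describes one variable at a time and ignores the other.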


› Each marginal distribution from a two-way table is the distribution of a single categorical variable. It tells us nothing about the relationship between the two variables.

› The opinions of women and men can be analyzed separately by looking only at the “Female” or the “Male” column, respectively.

› The resulting distributions are called conditional distributions:


› The conditional distributions of opinions among women and men:
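A conditional distribution can be sketched the same way as a marginal one, except that the denominator is the total of a single column rather than the grand total. The counts and the helper name `conditional_on_column` below are hypothetical, chosen only to illustrate the calculation.

```python
# Hypothetical two-way table: rows are opinions, columns are gender.
two_way = {
    "Almost no chance": {"Female": 10, "Male": 10},
    "A 50-50 chance":   {"Female": 30, "Male": 20},
    "Almost certain":   {"Female": 10, "Male": 20},
}

def conditional_on_column(table, column):
    """Percent of each row category among individuals in one column."""
    col_total = sum(row[column] for row in table.values())
    return {label: 100 * row[column] / col_total
            for label, row in table.items()}

women = conditional_on_column(two_way, "Female")
men = conditional_on_column(two_way, "Male")
for label in two_way:
    print(f"{label:<18}{women[label]:>6.1f}%{men[label]:>8.1f}%")
```

If the two columns of percents differ noticeably, as they do in this made-up table, that is evidence of an association between the two variables.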


› Conditional distributions can be displayed in the form of a segmented bar graph or a side-by-side bar graph:


› Both graphs provide evidence of an association between gender and opinion about future wealth in this sample of young adults.

› The concept of association is important in statistics, but we should not overemphasize it!


3. Displaying quantitative data with graphs


› Bar graphs or pie charts are good at showing the distributions of categorical variables. However, they can’t be used for quantitative variables.

› The following types of graphs are commonly used to display the distributions of quantitative variables:

a) Dotplots

b) Stem-and-leaf plots

c) Histograms

d) Density plots


› Always remember that the purpose of a graph is to help us understand the data. To interpret a graph of quantitative data, follow the SOCS strategy (shape, outliers, center, spread) described below:


A. Dotplots

› Each data value is shown as a dot above its location on a number line.


Dotplot of the Kentucky Derby race winning times


› Example: The following table displays the US Environmental Protection Agency (EPA) estimates of highway gas mileage in miles per gallon (mpg) for a sample of 24 model year 2012 midsize cars.


› A dotplot of the data is shown below:

› Can you describe the shape, center, and spread of the data? Are there any outliers?


› The shape of a distribution can be described using the following terms:

• Symmetric: data clustered at the center

• Skewed to the left (or left-skewed): data clustered on the right

• Skewed to the right (or right-skewed): data clustered on the left

› Besides skewness, the shape of a distribution can be described in terms of modes (or local maxima):


› Example: Compare the distributions of household size for the UK and South Africa.


B. Stem-and-leaf plots

› Also known as stemplots, they give a quick picture of the shape of a distribution while including the actual numerical values in the graph.

› Example:


› A stem-and-leaf plot can be made by following the steps below:

1. Separate each observation into a stem (i.e., all but the final digit) and a leaf (i.e., the last digit).

2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right. Don’t skip any stem.

3. Write each leaf in the row to the right of its stem. Arrange the leaves in ascending order.

4. Provide a key that explains the meaning of the stems and leaves.
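The four steps above can be sketched in Python as follows. This is a minimal illustration for non-negative whole numbers; the function name `stemplot` and the sample values are hypothetical and do not come from the slides.

```python
from collections import defaultdict

def stemplot(data):
    """Return the lines of a stem-and-leaf plot for non-negative integers."""
    leaves = defaultdict(list)
    # Step 1: split each observation into a stem and a leaf (last digit).
    # Sorting first means the leaves arrive in ascending order (step 3).
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    lines = []
    # Step 2: list every stem from smallest to largest, skipping none.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines

# Hypothetical observations.
values = [12, 25, 31, 18, 47, 22, 29, 14, 33, 21]
for line in stemplot(values):
    print(line)
# Step 4: provide a key.
print("Key: 2 | 5 means 25")
```

Stems with no observations are still printed with an empty row, so gaps in the data remain visible in the plot.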


› Example: 20 female students from a school were randomly chosen and asked how many pairs of shoes they have. Here are the data:

› Present the results in a stem-and-leaf plot.


› Problem: 20 male students from the same school were also selected randomly for the survey. The data are shown below. Construct the corresponding stemplot.


› Sometimes, for clarity, we construct stem-and-leaf plots with split stems.

[Figure: stemplot with split stems, annotated with a cluster of data and a large gap between 22 and 35]

› The data for males and females can be compared using a back-to-back stem-and-leaf plot with common stems.


› Some tips to consider when making stemplots:

1. Stemplots do not work well for large data sets, where each stem must hold a large number of leaves.

2. There is no magic number of stems to use, but five is a good minimum.

3. If you split stems, be sure that each stem is assigned an equal number of possible leaf digits (two stems, each with five possible leaves; or five stems, each with two possible leaves).

4. Round the data so that the final digit after rounding is suitable as a leaf. Do this when the data have too many digits.


› Practice: Here are the numbers of points scored by teams in the California Division I-AAA high school basketball playoffs in a single day’s games:

Construct a stemplot for the data.


C. Histograms

› Quantitative variables often take many values. A graph of the distribution is clearer if nearby values are grouped together. The resulting graph is called a histogram.

› The steps in making a histogram are as follows:

1. Divide the data into classes of equal width.

2. Find the count (frequency) or percent (relative frequency) of individuals in each class.

3. Label and scale the axes and draw the histogram.
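The steps above can be sketched as a small Python function that builds the frequency and relative frequency table behind a histogram. The values, the class width of 5, and the convention that each class includes its lower endpoint but not its upper endpoint are assumptions made for this illustration.

```python
def histogram_table(data, start, width, n_classes):
    """Return (low, high, count, percent) for equal-width classes [low, high)."""
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)   # index of the class containing x
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[k] += 1
    n = len(data)
    return [(start + k * width, start + (k + 1) * width, c, 100 * c / n)
            for k, c in enumerate(counts)]

# Hypothetical percents (not the state data from the slides).
values = [1.2, 3.8, 4.1, 6.5, 7.0, 9.9, 12.3, 13.7, 20.1, 27.2]

rows = histogram_table(values, start=0, width=5, n_classes=6)
for low, high, count, pct in rows:
    # A row of asterisks stands in for the bar of the histogram.
    print(f"{low:>2} to <{high:<3}{count:>3}{pct:>7.1f}%  {'*' * count}")
```

Calling the function again with a smaller `width` and more classes shows how the choice of classes changes the picture of the same data.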


› Example: The following table presents the percent of each state’s residents who were born outside the US, for all 50 states.


› The range is 1.2 to 27.2. Hence

› The frequency and relative frequency tables:


› Using these data, the histograms can be made:


› To see more subtle details of the distribution, more classes (i.e., smaller width) can be used:

› What is the difference between these histograms and the previous ones?


› Be careful when making histograms:

i. Don’t confuse histograms with bar graphs.


ii. Use percents instead of counts on the vertical axis when comparing distributions with different numbers of observations.

› Why is the first one misleading?

iii. Just because a graph looks nice doesn’t make it a meaningful display of data.

› Which one is a better display?


D. Density plots

› The size of the bins in a histogram can influence its appearance and the interpretation of the distribution.

› Density plots smooth the bins in a histogram to reduce the effect of the choice of the size of bins.

› Example: Ages of those aboard the Titanic.
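The slides do not say how the smooth curve is computed. One common technique is kernel density estimation, sketched below with Gaussian kernels. The ages are hypothetical (not the Titanic data), and the bandwidth of 5 is an arbitrary assumption that plays a role similar to bin width.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x with Gaussian kernels."""
    n = len(data)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Each observation contributes a small bell curve centred on itself.
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                   for xi in data) / norm

    return density

# Hypothetical ages in years.
ages = [2, 19, 22, 24, 25, 28, 30, 31, 36, 45, 58]
density = gaussian_kde(ages, bandwidth=5.0)

for x in range(0, 71, 10):
    # Scale the density so the text bars are visible.
    print(f"{x:>3} {'#' * round(density(x) * 1000)}")
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, so the choice still matters, just less abruptly than the choice of bins.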


4. Describing quantitative data with numbers


› Consider the following survey of travel times to work, in minutes, for 15 randomly chosen workers in North Carolina:

› How do we describe the distribution?


› The stemplot of these data:

› Conclusions:

▪ Unimodal and right-skewed

▪ Center around 20 (Where is the exact center?)

▪ A possible outlier at 60 (Is it really an outlier?)

▪ Wide spread, from 5 to 60 (Is the spread reasonable?)

A. Measuring center

i. Mean

› The arithmetic average that measures the center of data


› Example: Here is the stemplot of the travel times for work for the sample of 15 North Carolinians.

a) Find the mean travel time for all 15 workers.

b) Calculate the mean again excluding the person who reported a 60-minute travel time to work. What do you notice?
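The effect asked about in part (b) can be illustrated with a short Python sketch. The five travel times below are hypothetical and are not the North Carolina data; they only show how a single large value pulls the mean upward.

```python
def mean(values):
    """Arithmetic average: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Hypothetical travel times in minutes, including one long commute.
times = [10, 15, 20, 35, 60]

with_extreme = mean(times)
without_extreme = mean([t for t in times if t != 60])

print(with_extreme)     # 28.0
print(without_extreme)  # 20.0
```

Removing one observation changes the mean by 8 minutes here, which is why the mean is described as sensitive to extreme values.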


ii. Median

› Another common measure of center is the median which describes the midpoint of a distribution.


› Example: Here are the travel times in minutes of 20 randomly chosen New York workers:

a) Make a stemplot of the data. Be sure to include the key.

b) Find the median. Show your work.
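Finding a median by hand amounts to sorting the values and locating the middle, which the sketch below mirrors. The list `times` is an illustrative set of 20 values assumed for this example, because the data table from the slide is not reproduced in this transcript.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle pair

# Illustrative travel times in minutes (assumed, not copied from the slide).
times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
         15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

print(median(times))      # 22.5
print(median([3, 1, 2]))  # 2
```

With 20 observations the median is the average of the 10th and 11th values in the ordered list.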


› What is the difference between mean and median if they both measure the center of data?

• Mean: the arithmetic average of a set of data. It gives the “average” value of a variable.

• Median: the midpoint of a set of data. It gives the “typical” value of a variable.

› Comparing the mean and median of a distribution:

› Note that in a skewed distribution, the mean is usually farther out in the long tail than the median is. Therefore, the median is often reported for strongly skewed distributions such as incomes.


B. Measuring spread

i. Interquartile range (IQR)

› A measure of center alone can be misleading


› A useful numerical description of a distribution requires both a measure of center and a measure of spread.

› How can we measure spread?


› Example: Calculate the quartiles for 15 workers’ travel times sampled in North Carolina.

› Arrange the times in increasing order:


› Example: Find and interpret IQR for the data on travel times to work for 20 randomly selected New Yorkers.

› Rewrite the list of values in increasing order:


› The first quartile is: $Q_1 = \frac{15 + 15}{2} = 15$

› The third quartile is: $Q_3 = \frac{40 + 45}{2} = 42.5$

› Therefore the interquartile range is: $\mathrm{IQR} = Q_3 - Q_1 = 42.5 - 15 = 27.5$

› Interpretation: The range of the middle half of travel times for New Yorkers in the sample is 27.5 min.
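The quartile calculation can be sketched in Python by taking the median of each half of the ordered data. The list `times` is an illustrative set of 20 values chosen so that the quartiles agree with those shown above; it is an assumption, not a copy of the slide's table. For an odd number of observations this sketch leaves the overall median out of both halves, which is one common convention and also an assumption here.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle observation (assumed convention).
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(upper)

# Illustrative travel times in minutes (assumed values).
times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
         15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

q1, q3 = quartiles(times)
iqr = q3 - q1
print(q1, q3, iqr)  # 15.0 42.5 27.5
```

Statistical software often uses slightly different quartile rules, so its results may not match a hand calculation exactly.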


› A useful application of the IQR is to identify possible outliers in a data set.

› By definition:

› In the previous example, the IQR is 27.5. Hence, values are flagged as outliers if they do not fall between:

$Q_1 - 1.5 \times \mathrm{IQR} = 15 - 1.5 \times 27.5 = -26.25$

$Q_3 + 1.5 \times \mathrm{IQR} = 42.5 + 1.5 \times 27.5 = 83.75$
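The two bounds above can be computed with a short Python sketch of the 1.5 × IQR rule. The function names are hypothetical, and the list `times` is an illustrative set of values assumed for this example rather than the slide's data table.

```python
def outlier_fences(q1, q3):
    """Lower and upper bounds of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(values, q1, q3):
    """Values lying below the lower bound or above the upper bound."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

# Quartiles from the example above.
print(outlier_fences(15, 42.5))  # (-26.25, 83.75)

# Illustrative travel times in minutes (assumed values).
times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
         15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
print(flag_outliers(times, 15, 42.5))  # [85]
```

A flagged value is only a candidate outlier; whether it is an error or a genuine observation still has to be judged from context.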


› The center and spread of a data set can be described well using a five-number summary:

› Note that approximately 25% of data fall between each pair of successive numbers in the summary.
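A five-number summary consists of the minimum, first quartile, median, third quartile, and maximum. The sketch below computes it in Python for an illustrative list of 20 values; the data and the quartile convention (medians of the lower and upper halves, leaving out the middle value when the count is odd) are assumptions made for this example.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

# Illustrative travel times in minutes (assumed values).
times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
         15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

print(five_number_summary(times))  # (5, 15.0, 22.5, 42.5, 85)
```

These five numbers are exactly the positions a boxplot marks, before any outliers are drawn separately.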


› The five-number summary is usually represented by the boxplot (or box-and-whisker plot) which is constructed based on the following steps:


› Example: The following data are the numbers of home runs that Barry Bonds hit in each of his 21 complete seasons before his retirement in 2007.

› Make a boxplot for these data.


› Based on the data, we have

› The IQR is 45 − 25.5 = 19.5.

› The 1.5×IQR rule gives the range from −3.75 to 74.25. Hence, there are no outliers in the data set.

› The resulting boxplot is thus:

› Technically, what we are doing when making a boxplot is as shown:
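
› As a rough sketch of the numbers behind a boxplot, the Python function below computes the box (Q1, median, Q3), the whisker ends, and any flagged outliers. It assumes the median-of-halves quartile method and the common convention that whiskers stop at the most extreme observations that are not outliers. The function name and the data are hypothetical.

```python
from statistics import median


def boxplot_stats(data):
    """Compute the pieces needed to draw a boxplot with the 1.5 x IQR rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Assumption: quartiles are the medians of the lower and upper halves.
    q1 = median(xs[:half])
    q3 = median(xs[half + 1:] if n % 2 else xs[half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "box": (q1, median(xs), q3),
        # Whiskers reach the smallest and largest non-outlying observations.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }


# Hypothetical data, not taken from the slides.
print(boxplot_stats([2, 4, 5, 6, 7, 8, 9, 30]))
```

For this made-up data set the value 30 lies above the upper cut-off of 14.5, so it is plotted as a separate point and the upper whisker stops at 9.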

ii. Standard deviation and variance

› Another way of measuring the spread of data is to use the standard deviation and its close relative, the variance.

› The value of the standard deviation can be determined by the following procedure:
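
› In symbols, for observations x_1, ..., x_n, the usual computation is: find the mean, take the deviation of each observation from the mean, square the deviations, add them up and divide by n − 1 to get the variance, and take the square root. The notation below is a standard summary, written to agree with the worked example in this section, which divides by 9 − 1:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad s_x = \sqrt{s_x^2}$$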

› Example: Consider the following data on the number of pets owned by a group of 9 children. Determine the variance and standard deviation.

› We can construct the dotplot for the data set:

› The deviation of each observation from the mean and the corresponding squared value are listed below:

› Based on this table, we can compute the variance:

$$s_x^2 = \frac{16 + 4 + 1 + 1 + 1 + 0 + 4 + 9 + 16}{9 - 1} = \frac{52}{8} = 6.5$$

› The standard deviation is thus given by:

$$s_x = \sqrt{6.5} \approx 2.55$$

› It means that the number of pets typically varies from the average (i.e., 5 pets) by about 2.55 pets.
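
› The calculation can be retraced with a short Python sketch. The data list is an assumption: the original table is not reproduced here, so the nine values were reconstructed from the mean of 5 and the squared deviations listed above. The function name `sample_variance` is hypothetical.

```python
from math import sqrt


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)


# Assumed data: reconstructed from the mean (5) and the squared deviations
# 16, 4, 1, 1, 1, 0, 4, 9, 16 used in the example.
pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]

variance = sample_variance(pets)
print(variance, round(sqrt(variance), 2))  # 6.5 2.55
```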

› There are some properties of standard deviations that are worth noting:

1. Standard deviation measures spread about the mean and should be used only when the mean is chosen as the measure of center.

2. Standard deviation is always greater than or equal to 0. It gets larger when observations become more spread out about the mean.

3. Standard deviation has the same units of measurement as the original observations.

4. Standard deviation is not resistant. A few outliers can make it very large. It is even more sensitive than the mean to a few extreme observations.
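
› Property 4 can be illustrated with a small experiment. The sketch below uses hypothetical data and Python's built-in `statistics` module to compare the mean, median, and standard deviation before and after one extreme value is added.

```python
from statistics import mean, median, stdev

# Hypothetical data, not taken from the slides.
base = [4, 5, 5, 6, 6, 7, 7, 8]
with_outlier = base + [40]  # the same data plus one extreme observation

for label, xs in (("without outlier", base), ("with outlier", with_outlier)):
    print(
        label,
        "| mean:", round(mean(xs), 2),
        "| median:", median(xs),
        "| sd:", round(stdev(xs), 2),
    )
```

In this example the median stays at 6, the mean moves by a few units, and the standard deviation grows several times over.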

› There are two choices that can be used to describe center and spread: median/IQR and mean/standard deviation. Which one should we choose?

Choice | Reasons
--- | ---
Median and IQR | They are resistant to extreme values. Therefore, they are suitable for skewed distributions and distributions with strong outliers.
Mean and standard deviation | They are more sensitive to the presence of outliers and strong skewness. Therefore, they are suitable for symmetric distributions without outliers.

› A rule of thumb that describes how the standard deviation measures the variation in a data set, or the spread of a relative frequency distribution, is the empirical rule (also called the 68-95-99.7 rule).

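› For distributions that are approximately bell-shaped (normal), about 68% of the measurements lie within one standard deviation of the mean, about 95% lie within two standard deviations, and about 99.7% lie within three standard deviations.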
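
› The three percentages can be checked against an exact normal distribution. This sketch assumes a perfectly normal model, which real data only approximate, and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Assumption: the data follow an exact normal model (mean 0, sd 1).
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```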

› Practice: The following are the results of a survey on the texting habits of males and females. A random sample of students was asked to record the number of text messages sent and received over a two-day period.

› What conclusion can you draw? Give appropriate evidence to support your answer.
