one quantitative variable: shape and...
TRANSCRIPT
Statistics: Unlocking the Power of Data Lock5
Section 2.2
One Quantitative Variable:
Shape and Center
Statistics: Unlocking the Power of Data Lock5
Outline One Quantitative Variable
Visualization: dotplot and histogram
Shape: symmetric, skewed
Measures of center: mean and median
Outliers and resistance
Statistics: Unlocking the Power of Data Lock5
One Quantitative Variable
World gross for all 2011 Hollywood movies HollywoodMovies2011
More graphics on profits for Hollywood movies
Statistics: Unlocking the Power of Data Lock5
HollywoodMovies2011
Statistics: Unlocking the Power of Data Lock5
Dotplot In a dotplot, each case is represented by a dot and dots are stacked. Easy way to see each case
Statistics: Unlocking the Power of Data Lock5
Histogram The height of the each bar corresponds to the number of cases within that range of the variable
Statistics: Unlocking the Power of Data Lock5
Histogram vs Bar Chart A bar chart is for categorical data, and the x-axis has no numeric scale
A histogram is for quantitative data, and the x-axis is numeric
For a categorical variable, the number of bars equals the number of categories, and the number in each category is fixed
For a quantitative variable, the number of bars in a histogram is up to you (or your software), and the appearance can differ with different number of bars
Statistics: Unlocking the Power of Data Lock5
Shape
Symmetric Left-Skewed Right-Skewed
Long right tail
Statistics: Unlocking the Power of Data Lock5
Bell-Shaped Fr
eque
ncy
-15 -10 -5 0 5 10 15
050
150
Freq
uenc
y
-15 -10 -5 0 5 10 15
050
150
Statistics: Unlocking the Power of Data Lock5
Notation The sample size, the number of cases in the sample, is denoted by n
We often let x or y stand for any variable, and x1 , x2 , …, xn represent the n values of the variable x
x1 = 97.009, x2 = 201.897, x3 = 216.196, …
Statistics: Unlocking the Power of Data Lock5
Mean
The mean or average of the data values is
𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚 = 𝑠𝑠𝑠𝑠𝑠𝑠 𝑜𝑜𝑜𝑜 𝑎𝑎𝑎𝑎𝑎𝑎 𝑑𝑑𝑎𝑎𝑑𝑑𝑎𝑎 𝑣𝑣𝑎𝑎𝑎𝑎𝑠𝑠𝑣𝑣𝑠𝑠𝑛𝑛𝑠𝑠𝑠𝑠𝑛𝑛𝑣𝑣𝑛𝑛 𝑜𝑜𝑜𝑜 𝑑𝑑𝑎𝑎𝑑𝑑𝑎𝑎 𝑣𝑣𝑎𝑎𝑎𝑎𝑠𝑠𝑣𝑣𝑠𝑠
Sample mean: �̅�𝑥 Population mean: µ (“mu”)
𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚 =𝑥𝑥1 + 𝑥𝑥2 + ⋯+ 𝑥𝑥𝑛𝑛
𝑚𝑚=∑𝑥𝑥𝑚𝑚
R: mean(x)
Statistics: Unlocking the Power of Data Lock5
Median
The median, m, is the middle value when the data are ordered.
If there are an even number of values, the median is the average of the two middle values.
The median splits the data in half.
Statistics: Unlocking the Power of Data Lock5
Measures of Center
For each of the following variables: Find the mean Find the median Identify any outliers
1. 8, 12, 3, 18, 15 2. 41, 53, 38, 12, 115, 47, 50 3. 15, 22, 12, 28, 58, 18, 25, 18 4. 110, 112, 118, 119, 122, 125, 129, 135, 138, 140
Statistics: Unlocking the Power of Data Lock5
m = 76.66
µ =150.74 Mean is “pulled” in the direction of skewness
Measures of Center
World Gross (in millions)
Statistics: Unlocking the Power of Data Lock5
Skewness and Center
A distribution is left-skewed. Which measure of center would you expect to be higher?
Median. The mean will be pulled down towards the skewness (towards the long tail).
Statistics: Unlocking the Power of Data Lock5
Outlier
An outlier is an observed value that is notably distinct from the other
values in a dataset.
Statistics: Unlocking the Power of Data Lock5
Outliers
World Gross (in millions)
Harry Potter
Transformers Pirates of the Caribbean
Statistics: Unlocking the Power of Data Lock5
Resistance
A statistic is resistant if it is relatively unaffected by extreme
values.
The median is resistant while the mean is not.
Mean Median With Harry Potter $150,742,300 $76,658,500
Without Harry Potter $141,889,900 $75,009,000
Statistics: Unlocking the Power of Data Lock5
Outliers When using statistics that are not resistant to outliers, stop and think about whether the outlier is a mistake
If not, you have to decide whether the outlier is part of your population of interest or not
Usually, for outliers that are not a mistake, it’s best to run the analysis twice, once with the outlier(s) and once without, to see how much the outlier(s) are affecting the results
Statistics: Unlocking the Power of Data Lock5
Summary Visualizing one quantitative variable: Dotplot Histogram
Shape: Symmetric Skewed
Measures of center: Mean (not resistant to outliers) Median (resistant to outliers)