TRANSCRIPT
Slide 1
Statistics Workshop Tutorial 6
• Measures of Relative Standing
• Exploratory Data Analysis
Copyright © 2004 Pearson Education, Inc.
Slide 2
Created by Tom Wegleitner, Centreville, Virginia
Section 2-6 Measures of Relative Standing
Slide 3
Definition
z Score (or standard score): the number of standard deviations that a given value x is above or below the mean.
Slide 4
Measures of Position: z score
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
Round to 2 decimal places
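The two formulas can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names and the `heights` values are made up for the example, and the sample version assumes the standard deviation uses the n - 1 divisor (`statistics.stdev`).

```python
from statistics import mean, stdev

def sample_z_score(x, data):
    """z = (x - sample mean) / sample standard deviation, rounded to 2 places."""
    return round((x - mean(data)) / stdev(data), 2)

def population_z_score(x, mu, sigma):
    """z = (x - mu) / sigma for a known population mean and standard deviation."""
    return round((x - mu) / sigma, 2)

# Hypothetical sample, used only for illustration.
heights = [60, 62, 64, 66, 68]
print(sample_z_score(68, heights))      # value above the mean: positive z
print(sample_z_score(60, heights))      # value below the mean: negative z
print(population_z_score(130, 100, 15)) # assumed population parameters
```

A value equal to the mean gives z = 0, and the sign of z shows on which side of the mean the value lies.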
Slide 5
Interpreting Z Scores
Whenever a value is less than the mean, its corresponding z score is negative
Ordinary values: z score between -2 and 2 (within 2 standard deviations of the mean)
Unusual values: z score < -2 or z score > 2 (more than 2 standard deviations from the mean)
FIGURE 2-14
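The ordinary/unusual rule can be written as a small check. This is a sketch under the assumption that the boundary values z = -2 and z = 2 count as ordinary, which is how the inequalities on the slide read; the function name is hypothetical.

```python
def classify_z(z):
    """Label a z score as 'ordinary' (-2 <= z <= 2) or 'unusual' (otherwise)."""
    return "ordinary" if -2 <= z <= 2 else "unusual"

for z in (-2.5, -1.0, 0.0, 2.0, 3.1):
    print(z, classify_z(z))
```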
Slide 6
Definition
Q1 (First Quartile) separates the bottom 25% of sorted values from the top 75%.
Q2 (Second Quartile) same as the median; separates the bottom 50% of sorted values from the top 50%.
Q3 (Third Quartile) separates the bottom 75% of sorted values from the top 25%.
Slide 7
Quartiles
Q1, Q2, Q3 divide ranked scores into four equal parts
(minimum) 25% Q1 25% Q2 (median) 25% Q3 25% (maximum)
Slide 8
Percentiles
Just as there are quartiles separating data into four parts, there are 99 percentiles denoted P1, P2, . . . P99, which partition the data into 100 groups.
Slide 9
Finding the Percentile of a Given Score
$\text{Percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$
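The percentile formula translates directly into code. The sketch below is illustrative only; the function name and the `scores` list are assumptions made up for the example.

```python
def percentile_of(x, data):
    """Percentile of x = (number of values less than x) / (total number of values) * 100."""
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)

# Hypothetical data set of 10 scores.
scores = [2, 4, 5, 7, 8, 9, 10, 12, 15, 20]
print(percentile_of(9, scores))   # 5 of the 10 values are less than 9
print(percentile_of(20, scores))  # 9 of the 10 values are less than 20
```

Note that only values strictly less than x are counted, so the smallest value in the data is always at percentile 0.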
From Percentile to Data Value
• What score is at the kth percentile?
• (1) Rank the data from lowest to highest
• (2) Find the locator L = (k/100) · n, where n is the number of values
• a) If L is not a whole number, round up to the next whole number and find the score in that position
• b) If L is a whole number, find the average of the scores in positions L and L+1
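The locator procedure above can be sketched as follows. This is one possible implementation, assuming k is a whole number from 1 to 99; it uses integer arithmetic (`divmod`) to decide whether L is a whole number so that floating-point rounding cannot interfere. The function name and the `scores` list are hypothetical.

```python
def percentile_value(k, data):
    """Return the score at the kth percentile using the locator L = (k/100) * n."""
    ranked = sorted(data)                       # step 1: rank lowest to highest
    whole, remainder = divmod(k * len(ranked), 100)
    if remainder:                               # L is not a whole number
        position = whole + 1                    # round L up
        return ranked[position - 1]             # positions are 1-based
    # L is a whole number: average the scores in positions L and L+1
    return (ranked[whole - 1] + ranked[whole]) / 2

scores = [2, 4, 5, 7, 8, 9, 10, 12, 15, 20]
print(percentile_value(25, scores))  # L = 2.5, round up to position 3
print(percentile_value(50, scores))  # L = 5, average of positions 5 and 6
print(percentile_value(90, scores))  # L = 9, average of positions 9 and 10
```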
Slide 11
Some Other Statistics
Interquartile Range (or IQR): $Q_3 - Q_1$
10 - 90 Percentile Range: $P_{90} - P_{10}$
Semi-interquartile Range: $\frac{Q_3 - Q_1}{2}$
Midquartile: $\frac{Q_3 + Q_1}{2}$
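As a quick arithmetic check, the four statistics on this slide can be computed from already-known quartiles and percentiles. The function name and the input values below are assumptions chosen for illustration.

```python
def other_statistics(q1, q3, p10, p90):
    """Compute the spread and position statistics built from quartiles and percentiles."""
    return {
        "iqr": q3 - q1,                  # interquartile range
        "range_10_90": p90 - p10,        # 10 - 90 percentile range
        "semi_iqr": (q3 - q1) / 2,       # semi-interquartile range
        "midquartile": (q3 + q1) / 2,    # midpoint of the two quartiles
    }

# Hypothetical inputs: Q1 = 5, Q3 = 12, P10 = 3, P90 = 17.5
stats = other_statistics(q1=5, q3=12, p10=3, p90=17.5)
for name, value in stats.items():
    print(name, value)
```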
Slide 13
Section 2-7 Exploratory Data Analysis (EDA)
Slide 14
Definition
Exploratory Data Analysis is the process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate data sets in order to understand their important characteristics.
Outliers
• An outlier is a very high or very low value that stands apart from the rest of the data
• Outliers may come from data collection errors, data entry errors, or simply valid but unusual data values
• Always identify and examine outliers to determine if they are in error
Slide 16
Important Principles
An outlier can have a dramatic effect on the mean
An outlier can have a dramatic effect on the standard deviation
An outlier can have a dramatic effect on the scale of the histogram so that the true nature of the distribution is totally obscured
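A small numeric experiment illustrates the first two principles. The data values are made up for the example, and the sample standard deviation (`statistics.stdev`) is assumed.

```python
from statistics import mean, stdev

# Hypothetical data: five similar values, then the same values plus one outlier.
data = [10, 12, 11, 13, 12]
with_outlier = data + [100]

print("mean without outlier:", round(mean(data), 2))
print("mean with outlier:   ", round(mean(with_outlier), 2))
print("sd without outlier:  ", round(stdev(data), 2))
print("sd with outlier:     ", round(stdev(with_outlier), 2))
```

In this example a single extreme value more than doubles the mean and inflates the standard deviation many times over.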
Slide 17
Definitions
For a set of data, the 5-number summary consists of the minimum value; the first quartile, Q1; the median (or second quartile, Q2); the third quartile, Q3; and the maximum value
A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3
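The 5-number summary can be sketched in code. This example assumes the quartiles are taken as the 25th, 50th, and 75th percentiles and located with the rule L = (k/100) · n (round up if L is not whole, otherwise average positions L and L+1); other software may use slightly different quartile conventions. The function names and data are hypothetical.

```python
def percentile_value(k, data):
    """Score at the kth percentile via the locator L = (k/100) * n (k a whole number, 1 to 99)."""
    ranked = sorted(data)
    whole, remainder = divmod(k * len(ranked), 100)
    if remainder:                        # L not whole: round up to position whole + 1
        return ranked[whole]             # 0-based index of that position
    return (ranked[whole - 1] + ranked[whole]) / 2   # average positions L and L+1

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    return (
        min(data),
        percentile_value(25, data),
        percentile_value(50, data),
        percentile_value(75, data),
        max(data),
    )

scores = [2, 4, 5, 7, 8, 9, 10, 12, 15, 20]
print(five_number_summary(scores))
```

These five values are exactly the points a boxplot draws: the ends of the line, the edges of the box, and the line inside the box.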
Slide 18
Boxplots
Figure 2-16
Outliers
• A data point is considered an outlier if it lies more than 1.5 times the interquartile range above the 75th percentile (Q3) or more than 1.5 times the interquartile range below the 25th percentile (Q1)
• In other words, outliers are values outside the interval [Q1 - 1.5·IQR, Q3 + 1.5·IQR]
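The 1.5 · IQR rule can be sketched as a filter over the data. The quartiles are passed in as arguments here, so any quartile convention can be used; the function name and the example values are assumptions for illustration.

```python
def find_outliers(data, q1, q3):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [value for value in data if value < lower_fence or value > upper_fence]

# Hypothetical data with assumed quartiles Q1 = 5 and Q3 = 12,
# giving IQR = 7 and the interval [-5.5, 22.5].
values = [2, 4, 5, 7, 8, 9, 10, 12, 15, 40]
print(find_outliers(values, q1=5, q3=12))
```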
Box Plots and Histograms
• When looking at one variable, it’s a good idea to look at the box plot and histogram together
• Box plots complement histograms by providing more specific information about the center, the quartiles, and outliers
Slide 21
Figure 2-17
Boxplots
Shape, Center and Spread
• What should you report about a quantitative variable?
• Always report the shape, center and spread
• If the distribution is skewed, report the median and IQR
• In a symmetric distribution, report the mean and standard deviation
• If there are any clear outliers and you are reporting the mean and the standard deviation, report them with the outliers and without them
Slide 23
Now we are ready for
Part 21 of Day 1