MARE 250 Dr. Jason Turner
![Page 1: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/1.jpg)
MARE 250 Dr. Jason Turner
Descriptive Measures
![Page 2: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/2.jpg)
Descriptive Measures
Descriptive Measures – numbers that are used to describe datasets
Parts of Descriptive Statistics
Used to summarize raw data
![Page 3: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/3.jpg)
Descriptive Measures
Measures of Center
Measures of Variation – how data are distributed around center
5-number summary – used to construct visual representation - Boxplot
![Page 4: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/4.jpg)
Measures of Center
Measures of Central Tendency – indicate where the center or most typical value of a data set lies
Mean, Median, Mode
![Page 5: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/5.jpg)
Measures of Center
Mean – of a dataset is the sum of the observations divided by the number of observations; Arithmetic Average
10 + 20 + 30 + 40 + 50 + 60 + 70 + 80 + 90 + 100 = 550
550 / 10 = 55
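A minimal Python sketch of the same arithmetic, using the ten observations from the slide (the variable names are illustrative, not part of the course material):

```python
# Observations from the slide's example.
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

total = sum(data)   # sum of the observations
n = len(data)       # number of observations
mean = total / n    # arithmetic average

print(total, n, mean)
```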
![Page 6: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/6.jpg)
Measures of Center
Median – the number that divides the bottom 50% of the data from the top 50%
1) Arrange data in increasing order
2) If the number of observations is ODD, the median is the observation exactly in the middle
3) If the number of observations is EVEN, the median is the mean of the middle two observations
![Page 7: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/7.jpg)
Measures of Center
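Position of the median in the ordered data = (n+1)/2
10,20,30,40,50,60,70,80,90,100,110 (ODD, n = 11); position = (11+1)/2 = 6th observation; Median = 60
10,20,30,40,50,60,70,80,90,100 (EVEN, n = 10); position = (10+1)/2 = 5.5, so average the 5th and 6th observations; Median = (50+60)/2 = 55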
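The three steps can be sketched in Python as follows. The function name `median_of` is made up for this illustration; Python's standard library also offers `statistics.median`.

```python
def median_of(values):
    """Median following the slide's steps: sort, then pick or average the middle."""
    ordered = sorted(values)              # step 1: increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # step 2: odd count, take the middle value
        return ordered[mid]
    # step 3: even count, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
even_data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(median_of(odd_data))   # middle (6th) observation
print(median_of(even_data))  # mean of the 5th and 6th observations
```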
![Page 8: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/8.jpg)
Measures of Center
Mode – the value that occurs most frequently in the data set; found by obtaining the frequency of each value
If no value occurs more than once – No Mode; 10,20,30,40,50,60,70,80,90,100
Otherwise – any value with greatest frequency is Mode; 10,20,30,40,50,50, 60,70,80,90,100…Mode is 50
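A short sketch of this rule in Python, assuming a hypothetical helper named `modes` that returns an empty list when no value repeats:

```python
from collections import Counter

def modes(values):
    """Return every value tied for the greatest frequency, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)              # frequency of each value
    highest = max(counts.values())
    if highest == 1:                      # no value occurs more than once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

no_repeat = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
one_repeat = [10, 20, 30, 40, 50, 50, 60, 70, 80, 90, 100]

print(modes(no_repeat))
print(modes(one_repeat))
```

Returning a list allows for data sets with more than one mode, such as the bimodal case.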
![Page 9: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/9.jpg)
Measures of Center
The mode is useful if the distribution is skewed or bimodal (having two very pronounced values around which data are concentrated)
[Figure: y-axis "Number of Individuals", scale 0 to 30]
![Page 10: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/10.jpg)
You are so totally skewed!
The mean is sensitive to extreme (very large or small) observations and the median is not
Therefore – you can judge the direction in which your data are skewed by comparing the median and the mean
Mean is Greater than the Median – right (positively) skewed
Mean and Median are Equal – symmetric
Mean is Less Than the Median – left (negatively) skewed
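The comparison can be sketched in Python as a rough rule of thumb, not a formal test of skewness. The function name `skew_direction` and the three small data sets are invented for this illustration.

```python
from statistics import mean, median

def skew_direction(values):
    """Compare mean and median to suggest the direction of skew."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"     # large values pull the mean above the median
    if m < med:
        return "left-skewed"      # small values pull the mean below the median
    return "roughly symmetric"    # mean and median agree

# Hypothetical data sets (not from the slides).
print(skew_direction([1, 2, 3, 4, 100]))
print(skew_direction([1, 2, 3, 4, 5]))
print(skew_direction([1, 97, 98, 99, 100]))
```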
![Page 11: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/11.jpg)
Resistant Measures
A resistant measure is not sensitive to the influence of a few extreme observations
Median – resistant measure of center
Mean – not resistant
Outliers have little or no effect on the Median
Outliers DO affect the Mean
![Page 12: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/12.jpg)
Resistant Measures
The resistance of the Mean can be improved by using Trimmed Means – a specified percentage of the smallest and largest observations is removed before computing the mean
We will do something like this later when exploring the data and evaluating outliers (and their effects upon the mean)
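A minimal sketch of a trimmed mean in Python. It assumes the stated percentage is removed from each end of the sorted data and that the count removed is rounded down; the function name and the data set with one extreme value are invented for this illustration.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` percent of the observations from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)      # observations removed per end (rounded down)
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)

# Hypothetical data: nine ordinary values plus one extreme observation.
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]

plain_mean = sum(data) / len(data)     # pulled upward by the value 1000
trimmed = trimmed_mean(data, 10)       # drops 10 and 1000 before averaging

print(plain_mean, trimmed)
```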
![Page 13: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/13.jpg)
Measures of Variation
Measures of Variation (Spread) – amount of variability in the data set
Range, Standard Deviation, Variance
Range = Maximum Observation – Minimum Observation
10,20,30,40,50,60,70,80,90,100; Range = 100 - 10 = 90
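The same calculation as a brief Python sketch (variable names are illustrative):

```python
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Range = largest observation minus smallest observation.
data_range = max(data) - min(data)

print(data_range)
```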
![Page 14: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/14.jpg)
Measures of Variation
Standard Deviation - (±SD) measures the variation by indicating how far (on average) the observations are from the mean
Large Dev. – Far from mean
Small Dev. – Close to mean
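A step-by-step sketch of the calculation in Python. The slides do not say whether the sample (n - 1) or population (n) divisor is intended; the sample version is assumed here.

```python
import math

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

n = len(data)
mean = sum(data) / n

# Squared distance of each observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample standard deviation: divide by n - 1 (an assumption, see above).
sample_sd = math.sqrt(sum(squared_deviations) / (n - 1))

print(round(sample_sd, 2))
```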
![Page 15: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/15.jpg)
Measures of Variation
Variance - (measure used by statistical formulas) square of the standard deviation
“Equal Variance” is one of the assumptions of parametric means testing…(we will learn this later)
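The relationship between the two measures can be checked with Python's `statistics` module. Both functions used here apply the sample (n - 1) divisor, which is an assumption about the convention intended by the slides.

```python
import math
from statistics import stdev, variance

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

sd = stdev(data)        # sample standard deviation
var = variance(data)    # sample variance

# The variance is the square of the standard deviation.
print(math.isclose(var, sd ** 2))
```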
![Page 16: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/16.jpg)
Measures of Variation
Three Standard Deviations Rule – almost all observations in any data set lie within three standard deviations to either side of the mean; “almost all” is defined in 2 ways by stats nerds…
![Page 17: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/17.jpg)
Measures of Variation
Three Standard Deviations Rule –
Chebychev’s Rule – at least 89% of the observations in any data set lie within 3 Standard Deviations of the mean
Empirical Rule – about 99.7% of the observations lie within 3 Standard Deviations of the mean, if the data are approximately bell-shaped
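Both rules can be explored with a small Python sketch. The helper `fraction_within`, the seed, and the simulated bell-shaped data are all assumptions made for this illustration. Chebychev's bound for k standard deviations is 1 - 1/k², which for k = 3 is 8/9, or about 89%.

```python
import random
from statistics import mean, pstdev

def fraction_within(values, k):
    """Fraction of observations within k standard deviations of the mean."""
    center = mean(values)
    spread = pstdev(values)    # population SD of the data set itself
    inside = [x for x in values if abs(x - center) <= k * spread]
    return len(inside) / len(values)

# Simulated, approximately bell-shaped data (hypothetical, not course data).
random.seed(250)
bell = [random.gauss(50, 10) for _ in range(10_000)]

chebychev_floor = 1 - 1 / 3 ** 2    # guaranteed minimum for any data set

print(chebychev_floor)
print(fraction_within(bell, 3))     # should land near 0.997 for bell-shaped data
```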
![Page 18: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/18.jpg)
5 Number Summary
Percentiles – data set is divided into hundredths (100 equal parts)
Why?..Percentiles are not sensitive to the influence of a few extreme observations (outliers)
![Page 19: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/19.jpg)
5 Number Summary
Quartiles – data set is divided into quarters (4 equal parts); most typically used
Data set has 3 Quartiles: Q1, Q2, Q3
Q1 – is the number that divides the bottom 25% from top 75%
Q2 – is the median; divides the bottom 50% from the top 50%
Q3 – is the number that divides the bottom 75% from top 25%
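One way to compute the quartiles is sketched below in Python. Several conventions exist and software packages may give slightly different values; this sketch assumes the "median of each half" approach, with the middle observation left out of both halves when the count is odd.

```python
from statistics import median

data = sorted([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

half = len(data) // 2
lower_half = data[:half]                 # bottom half of the ordered data
upper_half = data[len(data) - half:]     # top half of the ordered data

q1 = median(lower_half)   # divides bottom 25% from top 75%
q2 = median(data)         # the median of the whole data set
q3 = median(upper_half)   # divides bottom 75% from top 25%

print(q1, q2, q3)
```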
![Page 20: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/20.jpg)
5 Number Summary
Quartiles – data set is divided into quarters (4 equal parts); most typically used
![Page 21: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/21.jpg)
5 Number Summary
Interquartile Range (IQR) – the difference between the first and third quartiles
IQR = Q3 – Q1
The IQR gives you the range of the middle 50% of the data
![Page 22: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/22.jpg)
Outlier, Outlier
Outliers – observations that fall well outside the overall pattern of the data
Require special attention
May be the result of:
Measurement or Recording Error
Observation from a different population
Unusual Extreme observation
![Page 23: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/23.jpg)
Pants on Fire!
Must deal with outliers: (Yes, really!)
If error – can delete; otherwise judgment call
Can use quartiles and IQR to identify potential outliers
![Page 24: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/24.jpg)
The Outer Limits
Lower and Upper Limits:
Lower limit – is the number that lies 1.5 IQR’s below the first quartile
Lower Limit = Q1 - 1.5 * IQR
Upper limit – is the number that lies 1.5 IQR’s above the third quartile
Upper Limit = Q3 + 1.5 * IQR
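The two formulas can be sketched in Python as follows. The function names and the example quartiles (Q1 = 30, Q3 = 90) are hypothetical and assumed to have been computed beforehand.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_potential_outlier(value, q1, q3):
    """True when the value lies outside the lower or upper limit."""
    lower, upper = outlier_limits(q1, q3)
    return value < lower or value > upper

# Hypothetical quartiles for illustration: IQR = 60.
print(outlier_limits(30, 90))
print(is_potential_outlier(300, 30, 90))
print(is_potential_outlier(100, 30, 90))
```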
![Page 25: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/25.jpg)
The Outer Limits
If a value is outside the “Outer Limits” of a dataset it is an…
OUTLIER! OUTLIER!
![Page 26: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/26.jpg)
5 Number Summary
5-Number Summary: Min, Q1, Q2, Q3, Max
Written in increasing order
Provides information on Center and Variation
Is used to construct Boxplots
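A compact Python sketch that gathers the five numbers. The function name is invented for this illustration, and the quartiles are taken as medians of the lower and upper halves of the ordered data, which is only one of several conventions in use.

```python
from statistics import median

def five_number_summary(values):
    """Return (Min, Q1, Q2, Q3, Max) in increasing order."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                   # median of the lower half
    q3 = median(ordered[len(ordered) - half:])    # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
summary = five_number_summary(data)

print(summary)
```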
![Page 27: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/27.jpg)
Boxplot
Boxplot (Box-and-Whisker-Design): based on the 5-number summary; provides a graphic display of the center and variation
[Figure: boxplot on an axis from 0 to 70, labeled Min, Q1, Q2, Q3, Max]
![Page 28: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/28.jpg)
Boxplot
[Figure: boxplot on an axis from 0 to 70, with a potential outlier marked by *]
Modified Boxplot – includes outliers
Note that Min & Max are determined after outliers are removed!
![Page 29: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/29.jpg)
Boxplot
![Page 30: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/30.jpg)
Boxplot
Boxplots summarize information about the shape, dispersion, and center of your data
They can also help you spot outliers
![Page 31: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/31.jpg)
Boxplot
Left edge of the box represents the first quartile (Q1), while the right edge represents the third quartile (Q3)
Box portion of the plot represents the interquartile range (IQR) - middle 50% of data
[Figure: boxplot on an axis from 0 to 70, labeled Lower Limit, Q1, Q2, Q3, Upper Limit]
![Page 32: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/32.jpg)
Boxplot
The line drawn through the box represents the median of the data
The lines extending from the box are called whiskers
The whiskers extend outward to the lowest and highest observations that lie within the Lower and Upper Limits (that is, excluding outliers)
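A short Python sketch of how the whisker ends of a modified boxplot might be found. It assumes the whiskers stop at the most extreme observations inside the 1.5 * IQR limits; the function name, data, and quartile values are hypothetical.

```python
def whisker_ends(values, q1, q3):
    """Return (lower whisker end, upper whisker end, outliers)."""
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Observations inside the limits determine where the whiskers stop.
    inside = [x for x in values if lower_limit <= x <= upper_limit]
    # Observations outside the limits are plotted individually.
    outliers = [x for x in values if x < lower_limit or x > upper_limit]
    return min(inside), max(inside), outliers

# Hypothetical data with one extreme value; Q1 = 30 and Q3 = 90 are assumed given.
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 300]

print(whisker_ends(data, 30, 90))
```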
![Page 33: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/33.jpg)
Boxplot
Extreme values, or outliers, are plotted as individual points (shown here as *)
A value is considered an outlier if it is outside of the box (greater than Q3 or less than Q1) by more than 1.5 times the IQR
[Figure: boxplot on an axis from 0 to 70, with a potential outlier marked by *]
![Page 34: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/34.jpg)
Boxplot
Use the boxplot to assess the symmetry of the data:
If the data are fairly symmetric, the median line will be roughly in the middle of the IQR box and the whiskers will be similar in length
[Figure: boxplot on an axis from 0 to 70]
![Page 35: MARE 250 Dr. Jason Turner](https://reader035.vdocuments.us/reader035/viewer/2022081517/56816833550346895ddde6fa/html5/thumbnails/35.jpg)
Boxplot
Use the boxplot to assess the symmetry of the data:
If the data are skewed, the median may not fall in the middle of the IQR box, and one whisker will likely be noticeably longer than the other
[Figure: boxplot on an axis from 0 to 70]