Data Summary Using Descriptive Measures

Introduction to Business Statistics, 5e

Kvanli/Guynes/Pavur

(c)2000 South-Western College Publishing

Types of Descriptive Measures

• Central Tendency

• Variation

• Position

• Shape




Measures of Central Tendency

• Mean

• Median

• Midrange

• Mode




The Mean

The Mean is simply the average of the data.




Sample Mean

$\bar{x} = \frac{\sum x}{n}$

Each value in the sample is represented by x. Thus, to get the mean, simply add all the values in the sample and divide by the number of values in the sample.


Accident Data Set

$\bar{x} = \frac{6 + 9 + 7 + 23 + 5}{5} = \frac{50}{5} = 10.0$
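
The sample mean calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `sample_mean` is an assumption chosen for this example.

```python
def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("sample_mean requires at least one value")
    return sum(values) / len(values)


accidents = [6, 9, 7, 23, 5]  # accident data set from the slides
print(sample_mean(accidents))  # 50 / 5 = 10.0
```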


The Median

The Median (Md) of a set of data is the value in the center of the data values when they are arranged from lowest to highest.




Accident Data

Ordered array: 5, 6, 7, 9, 23

The value that has an equal number of items to the right and left is the median. Thus Md = 7




The Median

$\mathrm{Md} = \left(\frac{n + 1}{2}\right)$st ordered value

In general, if n is odd, Md is the center data value of the ordered data set.


Accident Data

Ordered array: 5, 6, 7, 9, 23

$\mathrm{Md} = \left(\frac{5 + 1}{2}\right)$st ordered value = 3rd value = 7


The Median

If n is even, Md is the average of the two center values of the ordered data set.

For the ordered data set: 3, 8, 12, 14

$\mathrm{Md} = \frac{8 + 12}{2} = 10.0$
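
The odd and even cases of the median rule can be combined in one small function. The sketch below is illustrative only; the name `median` and the zero-based indexing are choices made for this example, not something the slides prescribe.

```python
def median(values):
    """Return the center value (odd n) or the average of the two center values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        # The ((n + 1) / 2)th ordered value sits at zero-based index n // 2.
        return ordered[mid]
    # Even n: average the two center values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([6, 9, 7, 23, 5]))  # ordered array 5, 6, 7, 9, 23 -> 7
print(median([3, 8, 12, 14]))    # (8 + 12) / 2 = 10.0
```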


The Midrange

The Midrange (Mr) provides an easy-to-grasp measure of central tendency.

$\mathrm{Mr} = \frac{L + H}{2}$

where L is the lowest and H is the highest value in the data set.


Accident Data

$\mathrm{Mr} = \frac{L + H}{2} = \frac{5 + 23}{2} = 14.0$

Compare: $\bar{x} = 10.0$ and Md = 7.

Note that the midrange is severely affected by outliers.
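
A short sketch makes the outlier sensitivity of the midrange concrete. The function name `midrange` is an assumption, and dropping the value 23 from the accident data is a hypothetical comparison added here for illustration; it does not appear in the slides.

```python
def midrange(values):
    """Return the average of the lowest (L) and highest (H) values."""
    if not values:
        raise ValueError("midrange requires at least one value")
    return (min(values) + max(values)) / 2


accidents = [6, 9, 7, 23, 5]
without_outlier = [6, 9, 7, 5]  # hypothetical: same data with the value 23 removed

print(midrange(accidents))        # (5 + 23) / 2 = 14.0
print(midrange(without_outlier))  # (5 + 9) / 2 = 7.0
```

Removing a single extreme value halves the midrange in this example, which is why it is a fragile measure of central tendency.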


The Mode

The Mode (Mo) of a data set is the value that occurs most often, provided that it occurs more than once.

The Mode is not always a measure of central tendency; this value need not occur in the center of the data.
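
The mode definition can be sketched with a frequency count. Returning every tied value as a list, and returning an empty list when no value repeats, are assumptions made for this example; the slides do not say how ties should be handled.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if no value occurs more than once."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top < 2:
        return []  # every value occurs exactly once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([5, 6, 7, 9, 23]))  # [] because no value repeats
print(modes([1, 2, 2, 3]))      # [2]
```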




Level of Measurement and Measure of Central Tendency




Measures of Variation

• Homogeneity refers to the degree of similarity within a set of data.

• The more Homogeneous a set of data is, the better the mean will represent a typical value.

• Variation is the tendency of data values to scatter about the mean, $\bar{x}$.


Common Measures of Variation

• Range

• Variance

• Standard Deviation

• Coefficient of Variation




The Range

For the Accident data:

Range = H - L = 23 - 5 = 18




The Variance and Standard Deviation

Both measures describe the variation of the values about the mean.




Accident Data

Data Value    $x - \bar{x}$    $(x - \bar{x})^2$
5             -5               25
6             -4               16
7             -3               9
9             -1               1
23            13               169

$\sum (x - \bar{x})^2 = 220$


Definition: Sample Variance

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

$s^2 = \frac{220}{5 - 1} = \frac{220}{4} = 55.0$


Definition: Sample Standard Deviation

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

$s = \sqrt{55.0} = 7.416$
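
The sample variance and standard deviation of the accident data can be reproduced step by step. This is a minimal sketch; the function names `sample_variance` and `sample_std` are assumptions chosen for this example.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_variance requires at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


accidents = [6, 9, 7, 23, 5]
print(sample_variance(accidents))       # 220 / 4 = 55.0
print(round(sample_std(accidents), 3))  # 7.416
```

Python's standard `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and can serve as a cross-check.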


Definition: Population Variance

$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$


Definition: Population Standard Deviation

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$


The Coefficient of Variation

The Coefficient of Variation (CV) is used to compare the variation of two or more data sets where the values of the data differ greatly.

$\mathrm{CV} = \frac{s}{\bar{x}} \cdot 100$


Example

Data Set 1: 5, 6, 7, 9, 23

Data Set 2: 5000, 6000, 7000, 9000, 23000

Data Set 1: $\mathrm{CV} = \frac{7.416}{10} \cdot 100 = 74.16$

Data Set 2: $\mathrm{CV} = \frac{7416}{10000} \cdot 100 = 74.16$

Thus both data sets exhibit the same relative variation.
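
The comparison of the two data sets can be checked with Python's standard `statistics` module. This is a hedged sketch: the function name `coefficient_of_variation` is an assumption, and the code presumes a nonzero mean.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (mean must be nonzero)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


data_set_1 = [5, 6, 7, 9, 23]
data_set_2 = [5000, 6000, 7000, 9000, 23000]

print(round(coefficient_of_variation(data_set_1), 2))  # 74.16
print(round(coefficient_of_variation(data_set_2), 2))  # 74.16
```

Multiplying every value by 1000 scales both the standard deviation and the mean by 1000, so the ratio is unchanged.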


Measures of Position

• Percentile (Quartile)

• Z Score




Percentile

The 35th Percentile (P35) is that value such that at most 35% of the data values are less than P35 and at most 65% of the data values are greater than P35.


Percentile: Texon Industries Data

$n \cdot \frac{P}{100} = 50(0.35) = 17.5$

17.5 represents the position of the 35th percentile.


Percentile: Location Rules

• If $n \cdot P/100$ is not a counting number, round it up; the Pth percentile is the value in this position of the ordered data.

• If $n \cdot P/100$ is a counting number, the Pth percentile is the average of the value in this position (of the ordered data) and the value in the next larger position.
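
The two location rules translate directly into code. The sketch below is one possible reading of the rules, assuming an integer P with 0 < P < 100; the function name `percentile` is an assumption. Statistical libraries often use other interpolation conventions, so their results may differ. The list of values 1 to 50 is a hypothetical stand-in with n = 50, not the Texon Industries data.

```python
import math


def percentile(values, p):
    """Pth percentile by the location rules; p is an integer with 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    if (n * p) % 100 == 0:
        # n * P / 100 is a counting number: average this position and the next.
        k = n * p // 100
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise round the position up and take the value there (1-based position).
    k = math.ceil(n * p / 100)
    return ordered[k - 1]


print(percentile([6, 9, 7, 23, 5], 50))      # position 2.5 -> 3rd value = 7
print(percentile([3, 8, 12, 14], 50))        # position 2 -> (8 + 12) / 2 = 10.0
print(percentile(list(range(1, 51)), 35))    # position 17.5 -> 18th value = 18
```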


Quartiles

Quartiles are merely particular percentiles that divide the data into quarters, namely:

• Q1 = 1st quartile = 25th percentile (P25)

• Q2 = 2nd quartile = 50th percentile (P50)

• Q3 = 3rd quartile = 75th percentile (P75)




Z Scores

• A Z score determines the relative position of any particular data value x and is based on the mean and standard deviation of the data set.

• The Z score expresses the number of standard deviations the value x is from the mean.

• A negative Z score implies that x is to the left of the mean, and a positive Z score implies that x is to the right of the mean.


Z Score Equation

$z = \frac{x - \bar{x}}{s}$
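
Applying the Z score equation to the accident data gives a feel for the sign and size of the result. This is a minimal sketch using Python's standard `statistics` module; the function name `z_score` is an assumption for this example.

```python
import statistics


def z_score(x, values):
    """Number of sample standard deviations that x lies from the sample mean."""
    return (x - statistics.mean(values)) / statistics.stdev(values)


accidents = [6, 9, 7, 23, 5]  # mean 10, s about 7.416
print(round(z_score(23, accidents), 2))  # 1.75, to the right of the mean
print(round(z_score(5, accidents), 2))   # -0.67, to the left of the mean
```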


Measures of Shape

• Skewness

• Kurtosis




Skewness

Skewness measures the tendency of a distribution to stretch out in a particular direction.


Skewness

• In a symmetrical distribution the mean, median, and mode would all be the same value, and Sk = 0 (fig. 3.7).

• A positive Sk value implies a shape that is skewed right (fig. 3.8), where mode < median < mean.

• In a data set with a negative Sk value (fig. 3.9), mean < median < mode.


Figure 3.7: symmetrical distribution (Sk = 0)

Figure 3.8: distribution skewed right (Sk > 0)

Figure 3.9: distribution skewed left (Sk < 0)


Skewness Calculation

$\mathrm{Sk} = \frac{3(\bar{x} - \mathrm{Md})}{s}$
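
The skewness calculation combines the mean, median, and standard deviation introduced earlier. The sketch below uses Python's standard `statistics` module; the function name `skewness` is an assumption, and the symmetric list 1 to 5 is a hypothetical example added for contrast.

```python
import statistics


def skewness(values):
    """Sk = 3 * (mean - median) / s, using the sample standard deviation."""
    mean = statistics.mean(values)
    md = statistics.median(values)
    return 3 * (mean - md) / statistics.stdev(values)


accidents = [6, 9, 7, 23, 5]  # mean 10, Md 7, s about 7.416
print(round(skewness(accidents), 2))  # 1.21, positive, so skewed right
print(skewness([1, 2, 3, 4, 5]))      # 0.0 for this symmetric data
```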


Kurtosis

Kurtosis measures the peakedness of the distribution.




Chebyshev’s Inequality

• At least 75% of the data values are between $\bar{x} - 2s$ and $\bar{x} + 2s$, or equivalently, at least 75% of the data values have a Z score between -2 and +2.

• At least 89% of the data values are between $\bar{x} - 3s$ and $\bar{x} + 3s$.

• In general, at least $(1 - 1/k^2) \times 100\%$ of the data values lie between $\bar{x} - ks$ and $\bar{x} + ks$ for any k > 1.
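
The general bound and an empirical check against a data set can be sketched as follows. The function names `chebyshev_bound` and `share_within` are assumptions for this example, and the check on the accident data is an illustration added here rather than part of the slides.

```python
import statistics


def chebyshev_bound(k):
    """Minimum percentage of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return (1 - 1 / k**2) * 100


def share_within(values, k):
    """Observed percentage of values between mean - k*s and mean + k*s."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    inside = sum(1 for x in values if mean - k * s <= x <= mean + k * s)
    return inside / len(values) * 100


accidents = [6, 9, 7, 23, 5]
print(chebyshev_bound(2))            # 75.0
print(round(chebyshev_bound(3), 1))  # 88.9, quoted on the slide as 89%
print(share_within(accidents, 2))    # 100.0, which satisfies the 75% bound
```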


Empirical Rule

Under the assumption of a bell-shaped population:

• Approximately 68% of the data values lie between $\bar{x} - s$ and $\bar{x} + s$

• Approximately 95% of the data values lie between $\bar{x} - 2s$ and $\bar{x} + 2s$

• Approximately 99.7% of the data values lie between $\bar{x} - 3s$ and $\bar{x} + 3s$


Chebyshev’s versus Empirical




Grouped Data Approximations

$\bar{x} \approx \frac{\sum f \cdot m}{n}$

$s^2 \approx \frac{\sum f \cdot m^2 - (\sum f \cdot m)^2 / n}{n - 1}$

where f is the frequency of the class and m is the midpoint of the class.
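
The grouped-data approximations can be sketched for a small frequency table. The function name `grouped_mean_and_variance` is an assumption, the code takes n to be the total of the class frequencies, and the three-class table is hypothetical data invented for this example.

```python
def grouped_mean_and_variance(classes):
    """Approximate mean and sample variance from (frequency, midpoint) pairs."""
    n = sum(f for f, _ in classes)  # total number of observations
    if n < 2:
        raise ValueError("need at least two observations")
    sum_fm = sum(f * m for f, m in classes)
    sum_fm2 = sum(f * m**2 for f, m in classes)
    mean = sum_fm / n
    variance = (sum_fm2 - sum_fm**2 / n) / (n - 1)
    return mean, variance


# Hypothetical frequency table: (frequency f, class midpoint m)
table = [(2, 5), (5, 15), (3, 25)]
mean, variance = grouped_mean_and_variance(table)
print(mean)                # 160 / 10 = 16.0
print(round(variance, 2))  # (3050 - 2560) / 9 = 54.44
```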

