min mod median
TRANSCRIPT
1
Basic Business Statistics: Concepts & Applications
Activity 4+ 5 + 6Descriptive Statistics
and Graphical Analysis
2
Learning Objectives
• Measures of Center and Location: Mean (Arithmetic Average,Weighted Mean, Geometric Mean), Mode, Median
• Other measures of Location: Percentiles, Quartiles
• Measures of Variation: Range, Interquartile range, Variance and Standard deviation, Coefficient of variation
3
Contents
• 1. Measures of Central Tendency
• 1.1. Mean (Arithmetic Mean)
• 1.2. Mode
• 1.3. Median
• 1.4. Shape of a Distribution
• 2. Other Location Measures: Percentiles and Quartiles
• 2.1. Quartiles
• 2.2. Box-and-Whisker Plot
• 2.3. Distribution Shape and Box-and-Whisker Plot
4
Contents (continued)
• 3. Measures of Variation • 3.1. Range• 3.2. Interquartile Range• 3.3. Variance• 3.4. Standard Deviation• 3.5. Coefficient of Variation• 3.6. The Empirical Rule and Tchebysheff’s Theorem• 3.7. Tchebysheff’s Theorem• Using Microsoft Excel• Exercices
5
Summary Measures
Center and Location
Mean
Median
Mode
Other Measures of Location
Weighted Mean
Describing Data Numerically
Variation
Variance
Standard Deviation
Range
Percentiles
Interquartile Range
Quartiles
6
1. Measures of Central Tendency
Center and Location
Mean Median Mode Weighted Mean
N
x
n
xx
N
ii
n
ii
1
1
i
iiW
i
iiW
w
xw
w
xwX
7
1.1. Mean (Arithmetic Mean)
Mean (arithmetic mean) of data values
Sample mean
Population mean
1 1 2
n
ii n
XX X X
Xn n
1 1 2
N
ii N
XX X X
N N
Sample Size
Population Size
8
1.1 Mean (continued) (Weighted Mean)
• Used when values are grouped by frequency or relative importance
28
87
126
45
FrequencyDays to Complete
28
87
126
45
FrequencyDays to Complete
Example: Sample of 26 Repair Projects Weighted Mean Days
to Complete:
days 6.31 26
164
28124
8)(27)(86)(125)(4
w
xwX
i
iiW
9
1.2. Mode• A measure of central tendency• Value that occurs most often• Not affected by extreme values• Used for either numerical or categorical data• There may be no mode or several modes
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Mode = 9
0 1 2 3 4 5 6
No Mode
10
1.3. Median• Robust measure of central tendency• Not affected by extreme values
• In an ordered array, the median is the “middle” numberIf n or N is odd, the median is the middle numberIf n or N is even, the median is the average of the two
middle numbers
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 12 14
Median = 5 Median = 5
11
1.4. Shape of a Distribution
• Describes how data is distributed
• Symmetric or skewed
Mean = Median = ModeMean < Median < Mode Mode < Median < Mean
Right-SkewedLeft-Skewed Symmetric
(Longer tail extends to left) (Longer tail extends to right)
12
2. Other Location Measures
Other Measures of Location
Percentiles Quartiles
1st quartile = 25th percentile
2nd quartile = 50th percentile
= median
3rd quartile = 75th percentile
The pth percentile in a data array:
p% are less than or equal to this value
(100 – p)% are greater than or equal to this value
(where 0 ≤ p ≤ 100)
13
2.1. Quartiles
• Split Ordered Data into 4 Quarters
• Position of ith Quartile
25% 25% 25% 25%
1Q 2Q 3Q
1
4i
i nQ
14
2.1. Quartiles (continued)
Data in Ordered Array: 11 12 13 16 16 17 17 18 21
11Position of 2.
1 9 1 12 13
4 25 12.5Q Q
= Median = 162Q Q3 = 17,5
15
2.2. Box-and-Whisker Plot
Graphical Display of Data Using 5-Number Summary:
Median
4 6 8 10 12
Q3Q1 XlargestXsmallest
16
2.3. Distribution Shape and Box-and-Whisker Plot
Right-SkewedLeft-Skewed Symmetric
1Q 1Q 1Q2Q 2Q 2Q3Q 3Q3Q
Refer to summary on Exhibit 3.4 (p.114), Fig. 3.13 (p.115)
17
3. Measures of Variation
Variation
Variance Standard Deviation Coefficient of Variation
PopulationVariance
Sample
Variance
PopulationStandardDeviationSample
Standard
Deviation
Range
Interquartile Range
18
3.1. Range
• Measure of variation
• Difference between the largest and the smallest observations:
• Ignore the way in which data are distributed
Largest SmallestRange X X
7 8 9 10 11 12
Range = 12 - 7 = 5
7 8 9 10 11 12
Range = 12 - 7 = 5
19
3.2. Interquartile Range
• Interquartile range = 3rd quartile – 1st quartile
3 1Interquartile Range 17.5 12.5 5Q Q
Data in Ordered Array: 11 12 13 16 16 17 17 18 21
Median(Q2)
XmaximumX
minimum Q1 Q3
Example:
25% 25% 25% 25%
12 30 45 57 70
Interquartile range = 57 – 30 = 27
20
2
2 1
N
ii
X
N
• Important measure of variation
• Shows variation about the mean
Sample variance:
Population variance:
2
2 1
1
n
ii
X XS
n
3.3. Variance
21
3.4. Standard Deviation
• Most important measure of variation
• Shows variation about the mean
Sample standard deviation:
Population standard deviation:
2
1
1
n
ii
X XS
n
2
1
N
ii
X
N
22
Comparing Standard Deviations
Mean = 15.5s = 3.33811 12 13 14 15 16 17 18 19 20 21
11 12 13 14 15 16 17 18 19 20 21
Data B
Data A
Mean = 15.5
s = .9258
11 12 13 14 15 16 17 18 19 20 21
Mean = 15.5
s = 4.57
Data C
23
3.5. Coefficient of Variation
• Measure of Relative Dispersion
• Always in %
• Shows Variation Relative to Mean
• Used to Compare 2 or More Groups
• Formula:
100%μ
σCV
Population Sample
100%x
sCV
24
3.6. The Empirical Rule and Tchebysheff’s Theorem
• 3.6.1. The Empirical Rule
◘ contains about 68% of the values in the population or the sample
1σμ
μ
68%
1σμ
25
3.6.1. The Empirical Rule (continued)
3σμ
99.7%95%
2σμ
26
3.7. Tchebysheff’s Theorem
• Regardless of how the data are distributed, at least (1 - 1/k2) of the values will fall within k standard deviations of the mean
• Examples:
(1 - 1/12) = 0% ……..... k=1 (μ ± 1σ)
(1 - 1/22) = 75% …........ k=2 (μ ± 2σ)
(1 - 1/32) = 89% ……… k=3 (μ ± 3σ)
27
Using Microsoft Excel
• Use menu choice:
tools / data analysis / descriptive statistics
• Enter details in dialog box
28
Using Microsoft Excel (continued)
Use menu choice:
tools / data analysis /
descriptive statistics
29
Using Microsoft Excel (continued)
Enter dialog box details
Check box for summary statistics
Click OK
30
Using Microsoft Excel (continued)
Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:
$2,000,000500,000300,000100,000100,000
31
Froblem for Activity 6
• The following data represent the waiting time in minutes (definded as the time the customer enters the line until he or she reaches the teller window) of 15 customers at two bank branches.
32
Froblem for Activity 6 (continued)Bank 1 Bank 2
4.21
5.55
3.02
5.13
4.77
2.34
3.54
3.2
4.5
6.1
0.38
5.12
6.46
6.19
3.79
9.66
5.90
8.02
5.79
8.73
3.82
8.01
8.35
10.49
6.68
5.64
4.08
6.17
9.91
5.47
33
Froblem for Activity 6 (continued)
• For each of the waiting time at the two bank branches :
1. Using Exel: List the five-number summary (Xmin – First quartile – Median – Third quartile – Xmax)
2. Using SPSS for window: Compute the range, interquartile range, variance, standard deviation, coefficient of variation.