statistics for business and economics - faculty and...
TRANSCRIPT
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-1
Statistics for Business and Economics
Chapter 2
Describing Data: Numerical
Measures of Central Tendency
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Central Tendency
Mean Median Mode
n
xx
n
1ii∑
==
Overview
Midpoint of ranked values
Most frequently observed value
Arithmetic average
Ch. 2-2
2.1
Arithmetic Mean
n The arithmetic mean (mean) is the most common measure of central tendency
n For a population of N values:
n For a sample of size n:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Sample size
nxxx
n
xx n21
n
1ii +++==
∑= ! Observed
values
Nxxx
N
xµ N21
N
1ii +++==
∑= !
Population size
Population values
Ch. 2-3
Arithmetic Mean
n The most common measure of central tendency n Mean = sum of values divided by the number of values n Affected by extreme values (outliers)
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
(continued)
0 1 2 3 4 5 6 7 8 9 10
Mean = 3
0 1 2 3 4 5 6 7 8 9 10
Mean = 4
3515
554321
==++++ 4
520
5104321
==++++
Ch. 2-4
Median
n In an ordered list, the median is the “middle” number (50% above, 50% below)
n Not affected by extreme values
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
0 1 2 3 4 5 6 7 8 9 10
Median = 3
0 1 2 3 4 5 6 7 8 9 10
Median = 3
Ch. 2-5
Finding the Median
n The location of the median:
n If the number of values is odd, the median is the middle number n If the number of values is even, the median is the average of
the two middle numbers
n Note that is not the value of the median, only the
position of the median in the ranked data
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
dataorderedtheinposition21npositionMedian +
=
21n +
Ch. 2-6
Mode
n A measure of central tendency n Value that occurs most often n Not affected by extreme values n Used for either numerical or categorical data n There may may be no mode n There may be several modes
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Mode = 9
0 1 2 3 4 5 6
No Mode Ch. 2-7
Review Example
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
n Five houses on a hill by the beach $2,000 K
$500 K
$300 K
$100 K
$100 K
House Prices: $2,000,000 500,000 300,000 100,000 100,000
Ch. 2-8
Review Example: Summary Statistics
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
n Mean: ($3,000,000/5) = $600,000
n Median: middle value of ranked data = $300,000
n Mode: most frequent value = $100,000
House Prices: $2,000,000 500,000 300,000 100,000 100,000
Sum 3,000,000
Ch. 2-9
Which measure of location is the “best”?
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
n Mean is generally used, unless extreme values (outliers) exist . . .
n Then median is often used, since the median is not sensitive to extreme values. n Example: Median home prices may be reported for
a region – less sensitive to outliers
Ch. 2-10
Shape of a Distribution
n Describes how data are distributed n Measures of shape
n Symmetric or skewed
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Mean = Median Mean < Median Median < Mean
Right-Skewed Left-Skewed Symmetric
Ch. 2-11
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Geometric Mean
n Geometric mean
n Geometric mean rate of return
1/nn21
nn21g )xx(x)xx(xx ×××=×××= !!
1)x...x(xr 1/nn21g −×××=
Ch. 2-12
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Example Initial Investment: $100 After 5 years: $125 Question: the average annual rate of returns? • 25 % divided by 5 years = 5%? This is Wrong!
• ``Compound Interest’’ over 5 years $100 x (1+0.05)^5 = $127.63 > $125 after 5 years • Answer: $100 x (1+r)^5 = $125 è
Ch. 2-13
Measures of Variability
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Same center, different variation
Variation
Variance Standard Deviation
Coefficient of Variation
Range Interquartile Range
n Measures of variation give information on the spread or variability of the data values.
Ch. 2-14
2.2
Range
n Simplest measure of variation n Difference between the largest and the smallest
observations:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Range = Xlargest – Xsmallest
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Range = 14 - 1 = 13
Example:
Ch. 2-15
Disadvantages of the Range
n Ignores the way in which data are distributed
n Sensitive to outliers
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
7 8 9 10 11 12 Range = 12 - 7 = 5
7 8 9 10 11 12 Range = 12 - 7 = 5
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 5 - 1 = 4
Range = 120 - 1 = 119
Ch. 2-16
Interquartile Range
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Median (Q2)
X maximum X
minimum Q1 Q3
Example:
25% 25% 25% 25%
12 30 45 57 70
Interquartile range = 57 – 30 = 27
Ch. 2-17
STATA Example
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-18
2040
6080
100
High HW Group Low HW Group
Fina
l Gra
deInterquartile Rage by Low vs. High HW Group
Population Variance
n Average of squared deviations of values from the mean
n Population variance:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
N
µ)(xσ
N
1i
2i
2∑=
−=
Where = population mean
N = population size
xi = ith value of the variable x
µ
Ch. 2-19
Sample Variance
n Average (approximately) of squared deviations of values from the mean
n Sample variance:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
1-n
)x(xs
n
1i
2i
2∑=
−=
Where = arithmetic mean
n = sample size
Xi = ith value of the variable X
X
Ch. 2-20
Population Standard Deviation
n Most commonly used measure of variation n Shows variation about the mean n Has the same units as the original data
n Population standard deviation:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
N
µ)(xσ
N
1i
2i∑
=
−=
Ch. 2-21
Sample Standard Deviation
n Most commonly used measure of variation n Shows variation about the mean n Has the same units as the original data
n Sample standard deviation:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
1-n
)x(xS
n
1i
2i∑
=
−=
Ch. 2-22
Calculation Example: Sample Standard Deviation
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Sample Data (xi) : 10 12 14 15 17 18 18 24
n = 8 Mean = x = 16
4.24267
126
18
16)(2416)(1416)(1216)(10
1n
)x(24)x(14)x(12)X(10s
2222
2222
==
−
−++−+−+−=
−
−++−+−+−=
!
!
A measure of the “average” scatter around the mean
Ch. 2-23
Measuring variation
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Small standard deviation
Large standard deviation
Ch. 2-24
Comparing Standard Deviations
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Mean = 15.5 s = 3.338 11 12 13 14 15 16 17 18 19 20 21
11 12 13 14 15 16 17 18 19 20 21
Data B
Data A
Mean = 15.5 s = 0.926
11 12 13 14 15 16 17 18 19 20 21
Mean = 15.5 s = 4.570
Data C
Ch. 2-25
Advantages of Variance and Standard Deviation
n Each value in the data set is used in the calculation
n Values far from the mean are given extra weight (because deviations from the mean are squared)
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-26
Coefficient of Variation
n Measures relative variation n Always in percentage (%) n Shows variation relative to mean n Can be used to compare two or more sets of
data measured in different units
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
100%x
sCV ⋅⎟⎟
⎠
⎞⎜⎜⎝
⎛=
Ch. 2-27
Comparing Coefficient of Variation
n Stock A: n Average price last year = $50 n Standard deviation = $5
n Stock B: n Average price last year = $100 n Standard deviation = $5
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Both stocks have the same standard deviation, but stock B is less variable relative to its price
10%100%$50
$5100%
x
sCVA =⋅=⋅⎟⎟
⎠
⎞⎜⎜⎝
⎛=
5%100%$100
$5100%
x
sCVB =⋅=⋅⎟⎟
⎠
⎞⎜⎜⎝
⎛=
Ch. 2-28
n If the data distribution is approximated by normal distribution, then the interval:
n contains about 68% of the values in the population or the sample
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
The Empirical Rule
1σµ ±
µ
68%
1σµ±Ch. 2-29
n contains about 95% of the values in the population or the sample
n contains almost all (about 99.7%) of the values in the population or the sample
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
The Empirical Rule
2σµ ±
3σµ ±
3σµ ±
99.7% 95%
2σµ ±
Ch. 2-30
Weighted Mean
n The weighted mean of a set of data is
n Where wi is the weight of the ith observation and
n Use when data is already grouped into n classes, with wi values in the ith class
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
x = wixii=1
n
∑ =w1x1 +w2x2 +!+wnxn
Ch. 2-31
wi∑ =1
2.3
Example
n Consider a student with the scores of assignment (x1), midterm (x2), and final exam (x3) given by
x1 = 90, x2 = 90, and x3 = 46. n The weights: n w1 = 0.1, w2 =0.3, and w3 = 0.6. n The final grade for this student is
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
wixii=1
3
∑ = 0.1×90+0.3×90+0.6× 46=63.6
Ch. 2-32
2.3
The Sample Covariance n The covariance measures the strength of the linear relationship
between two variables n The population covariance:
n The sample covariance:
n Only concerned with the strength of the relationship n No causal effect is implied n Depends on the unit of measurement
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
N
))(y(xy),(xCov
N
1iyixi
xy
∑=
−−==
µµσ
1n
)y)(yx(xsy),(xCov
n
1iii
xy −
−−==∑=
Ch. 2-33
2.4
Interpreting Covariance
n Covariance between two variables:
Cov(x,y) > 0 x and y tend to move in the same direction
Cov(x,y) < 0 x and y tend to move in opposite directions
Cov(x,y) = 0 x and y are independent
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-34
Coefficient of Correlation
n Measures the relative strength of the linear relationship between two variables
n Population correlation coefficient:
n Sample correlation coefficient:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
YXssy),(xCovr =
YXσσy),(xCovρ =
Ch. 2-35
Features of Correlation Coefficient, r
n Unit free n Ranges between –1 and 1
n The closer to –1, the stronger the negative linear relationship
n The closer to 1, the stronger the positive linear relationship
n The closer to 0, the weaker any positive linear relationship
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-36
Scatter Plots of Data with Various Correlation Coefficients
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Y
X
Y
X
Y
X
Y
X
Y
X
r = -1 r = -.6 r = 0
r = +.3 r = +1
Y
X r = 0
Ch. 2-37
Example: HW and Final Grade
n r = .0.499
n There is a relatively strong positive linear relationship between HW scores and
Final Grades n Students who scored high on HW assignment
tended to have high final grades
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-38
2040
6080
100
Fina
l Gra
de
0 2 4 6 8 10Homework
Grade Fitted values
Correlation = 0.499HW and Final Grade