statistics for business and economics - faculty and...

Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-1

Statistics for Business and Economics

Chapter 2

Describing Data: Numerical

Measures of Central Tendency

Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall

Central Tendency

Mean Median Mode

n

xx

n

1ii∑

==

Overview

Midpoint of ranked values

Most frequently observed value

Arithmetic average

Ch. 2-2

2.1

Arithmetic Mean

n  The arithmetic mean (mean) is the most common measure of central tendency

n  For a population of N values:

n  For a sample of size n:

Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Sample size

nxxx

n

xx n21

n

1ii +++==

∑= ! Observed

values

Nxxx

N

xµ N21

N

1ii +++==

∑= !

Population size

Population values

Ch. 2-3

Arithmetic Mean

n  The most common measure of central tendency n  Mean = sum of values divided by the number of values n  Affected by extreme values (outliers)


(continued)

0 1 2 3 4 5 6 7 8 9 10

Mean = 3

0 1 2 3 4 5 6 7 8 9 10

Mean = 4

3515

554321

==++++ 4

520

5104321

==++++

Ch. 2-4

Median

n  In an ordered list, the median is the “middle” number (50% above, 50% below)

n  Not affected by extreme values


0 1 2 3 4 5 6 7 8 9 10

Median = 3

0 1 2 3 4 5 6 7 8 9 10

Median = 3

Ch. 2-5

Finding the Median

n  The location of the median:

n  If the number of values is odd, the median is the middle number n  If the number of values is even, the median is the average of

the two middle numbers

n  Note that is not the value of the median, only the

position of the median in the ranked data


dataorderedtheinposition21npositionMedian +

=

21n +

Ch. 2-6

Mode

n  A measure of central tendency n  Value that occurs most often n  Not affected by extreme values n  Used for either numerical or categorical data n  There may may be no mode n  There may be several modes


0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Mode = 9

0 1 2 3 4 5 6

No Mode Ch. 2-7

Review Example


n  Five houses on a hill by the beach $2,000 K

$500 K

$300 K

$100 K

$100 K

House Prices: $2,000,000 500,000 300,000 100,000 100,000

Ch. 2-8

Review Example: Summary Statistics


n  Mean: ($3,000,000/5) = $600,000

n  Median: middle value of ranked data = $300,000

n  Mode: most frequent value = $100,000

House Prices: $2,000,000 500,000 300,000 100,000 100,000

Sum 3,000,000

Ch. 2-9

Which measure of location is the “best”?


n  Mean is generally used, unless extreme values (outliers) exist . . .

n  Then median is often used, since the median is not sensitive to extreme values. n  Example: Median home prices may be reported for

a region – less sensitive to outliers

Ch. 2-10

Shape of a Distribution

n  Describes how data are distributed n  Measures of shape

n  Symmetric or skewed


Mean = Median Mean < Median Median < Mean

Right-Skewed Left-Skewed Symmetric

Ch. 2-11


Geometric Mean

n  Geometric mean

n  Geometric mean rate of return

1/nn21

nn21g )xx(x)xx(xx ×××=×××= !!

1)x...x(xr 1/nn21g −×××=

Ch. 2-12


Example Initial Investment: $100 After 5 years: $125 Question: the average annual rate of returns? •  25 % divided by 5 years = 5%? This is Wrong!

•  ``Compound Interest’’ over 5 years $100 x (1+0.05)^5 = $127.63 > $125 after 5 years •  Answer: $100 x (1+r)^5 = $125 è

Ch. 2-13

Measures of Variability


Same center, different variation

Variation

Variance Standard Deviation

Coefficient of Variation

Range Interquartile Range

n  Measures of variation give information on the spread or variability of the data values.

Ch. 2-14

2.2

Range

n  Simplest measure of variation n  Difference between the largest and the smallest

observations:


Range = Xlargest – Xsmallest

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Range = 14 - 1 = 13

Example:

Ch. 2-15

Disadvantages of the Range

n  Ignores the way in which data are distributed

n  Sensitive to outliers


7 8 9 10 11 12 Range = 12 - 7 = 5

7 8 9 10 11 12 Range = 12 - 7 = 5

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 5 - 1 = 4

Range = 120 - 1 = 119

Ch. 2-16

Interquartile Range


Median (Q2)

X maximum X

minimum Q1 Q3

Example:

25% 25% 25% 25%

12 30 45 57 70

Interquartile range = 57 – 30 = 27

Ch. 2-17

STATA Example


2040

6080

100

High HW Group Low HW Group

Fina

l Gra

deInterquartile Rage by Low vs. High HW Group

Population Variance

n  Average of squared deviations of values from the mean

n  Population variance:


N

µ)(xσ

N

1i

2i

2∑=

−=

Where = population mean

N = population size

xi = ith value of the variable x

µ

Ch. 2-19

Sample Variance

n  Average (approximately) of squared deviations of values from the mean

n  Sample variance:


1-n

)x(xs

n

1i

2i

2∑=

−=

Where = arithmetic mean

n = sample size

Xi = ith value of the variable X

X

Ch. 2-20

Population Standard Deviation

n  Most commonly used measure of variation n  Shows variation about the mean n  Has the same units as the original data

n  Population standard deviation:


N

µ)(xσ

N

1i

2i∑

=

−=

Ch. 2-21

Sample Standard Deviation

n  Most commonly used measure of variation n  Shows variation about the mean n  Has the same units as the original data

n  Sample standard deviation:


1-n

)x(xS

n

1i

2i∑

=

−=

Ch. 2-22

Calculation Example: Sample Standard Deviation


Sample Data (xi) : 10 12 14 15 17 18 18 24

n = 8 Mean = x = 16

4.24267

126

18

16)(2416)(1416)(1216)(10

1n

)x(24)x(14)x(12)X(10s

2222

2222

==

−

−++−+−+−=

−

−++−+−+−=

!

!

A measure of the “average” scatter around the mean

Ch. 2-23

Measuring variation


Small standard deviation

Large standard deviation

Ch. 2-24

Comparing Standard Deviations


Mean = 15.5 s = 3.338 11 12 13 14 15 16 17 18 19 20 21

11 12 13 14 15 16 17 18 19 20 21

Data B

Data A

Mean = 15.5 s = 0.926

11 12 13 14 15 16 17 18 19 20 21

Mean = 15.5 s = 4.570

Data C

Ch. 2-25

Advantages of Variance and Standard Deviation

n  Each value in the data set is used in the calculation

n  Values far from the mean are given extra weight (because deviations from the mean are squared)


Coefficient of Variation

n  Measures relative variation n  Always in percentage (%) n  Shows variation relative to mean n  Can be used to compare two or more sets of

data measured in different units


100%x

sCV ⋅⎟⎟

⎠

⎞⎜⎜⎝

⎛=

Ch. 2-27

Comparing Coefficient of Variation

n  Stock A: n  Average price last year = $50 n  Standard deviation = $5

n  Stock B: n  Average price last year = $100 n  Standard deviation = $5


Both stocks have the same standard deviation, but stock B is less variable relative to its price

10%100%$50

$5100%

x

sCVA =⋅=⋅⎟⎟

⎠

⎞⎜⎜⎝

⎛=

5%100%$100

$5100%

x

sCVB =⋅=⋅⎟⎟

⎠

⎞⎜⎜⎝

⎛=

Ch. 2-28

n  If the data distribution is approximated by normal distribution, then the interval:

n  contains about 68% of the values in the population or the sample


The Empirical Rule

1σµ ±

µ

68%

1σµ±Ch. 2-29

n  contains about 95% of the values in the population or the sample

n  contains almost all (about 99.7%) of the values in the population or the sample


The Empirical Rule

2σµ ±

3σµ ±

3σµ ±

99.7% 95%

2σµ ±

Ch. 2-30

Weighted Mean

n  The weighted mean of a set of data is

n  Where wi is the weight of the ith observation and

n  Use when data is already grouped into n classes, with wi values in the ith class


x = wixii=1

n

∑ =w1x1 +w2x2 +!+wnxn

Ch. 2-31

wi∑ =1

2.3

Example

n  Consider a student with the scores of assignment (x1), midterm (x2), and final exam (x3) given by

x1 = 90, x2 = 90, and x3 = 46. n  The weights: n  w1 = 0.1, w2 =0.3, and w3 = 0.6. n  The final grade for this student is


wixii=1

3

∑ = 0.1×90+0.3×90+0.6× 46=63.6

Ch. 2-32

2.3

The Sample Covariance n  The covariance measures the strength of the linear relationship

between two variables n  The population covariance:

n  The sample covariance:

n  Only concerned with the strength of the relationship n  No causal effect is implied n  Depends on the unit of measurement


N

))(y(xy),(xCov

N

1iyixi

xy

∑=

−−==

µµσ

1n

)y)(yx(xsy),(xCov

n

1iii

xy −

−−==∑=

Ch. 2-33

2.4

Interpreting Covariance

n  Covariance between two variables:

Cov(x,y) > 0 x and y tend to move in the same direction

Cov(x,y) < 0 x and y tend to move in opposite directions

Cov(x,y) = 0 x and y are independent


Coefficient of Correlation

n  Measures the relative strength of the linear relationship between two variables

n  Population correlation coefficient:

n  Sample correlation coefficient:


YXssy),(xCovr =

YXσσy),(xCovρ =

Ch. 2-35

Features of Correlation Coefficient, r

n  Unit free n  Ranges between –1 and 1

n  The closer to –1, the stronger the negative linear relationship

n  The closer to 1, the stronger the positive linear relationship

n  The closer to 0, the weaker any positive linear relationship


Scatter Plots of Data with Various Correlation Coefficients


Y

X

Y

X

Y

X

Y

X

Y

X

r = -1 r = -.6 r = 0

r = +.3 r = +1

Y

X r = 0

Ch. 2-37

Example: HW and Final Grade

n  r = .0.499

n  There is a relatively strong positive linear relationship between HW scores and

Final Grades n  Students who scored high on HW assignment

tended to have high final grades


2040

6080

100

Fina

l Gra

de

0 2 4 6 8 10Homework

Grade Fitted values

Correlation = 0.499HW and Final Grade

statistics for business and economics - faculty and...

Documents