Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson 2: Descriptive Statistics

Post on 21-Dec-2015



Lesson 2:

Descriptive Statistics


Outline

Mean

Median

Mode

Measures of dispersion

Variance

Interpretation and uses of standard deviation

Working with mean and standard deviation


Population Parameters and Sample Statistics

A population parameter is a number calculated from all the population measurements that describes some aspect of the population.

The population mean, denoted μ, is a population parameter and is the average of the population measurements.

A point estimate is a one-number estimate of the value of a population parameter.

A sample statistic is a number calculated using sample measurements that describes some aspect of the sample.


The Mean

Population: X1, X2, …, XN

Population Mean

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Sample: x1, x2, …, xn

Sample Mean

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$


Population Mean

For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:

$$\mu = \frac{\sum X}{N}$$

where μ is the population mean, N is the total number of observations, X is a particular value, and Σ indicates the operation of adding.


Sample Mean

For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:

$$\bar{X} = \frac{\sum X}{n}$$

where n is the total number of values in the sample.

The sample mean is also referred to as the arithmetic mean, the simple mean, or simply the sample average.


EXAMPLE

A sample of five executives received the following bonuses last year ($000):

14.0, 15.0, 17.0, 16.0, 15.0

$$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$$
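The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch; the function name `sample_mean` is chosen here for illustration and is not part of the lesson.

```python
# Bonuses of the five executives, in thousands of dollars (from the example).
bonuses = [14.0, 15.0, 17.0, 16.0, 15.0]

def sample_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

mean_bonus = sample_mean(bonuses)
print(mean_bonus)  # 15.4
```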


Population and Sample Proportions

Population: X1, X2, …, XN

Population Proportion: p

Sample: x1, x2, …, xn

Sample Proportion

$$\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where x_i = 1 if the characteristic is present, 0 if not.


EXAMPLE

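A sample of five executives received the following bonuses last year ($000):

7.0, 15.0, 17.0, 16.0, 15.0

Changing the first observation from 14.0 to 7.0 will change the sample mean:

$$\bar{X} = \frac{\sum X}{n} = \frac{7.0 + \dots + 15.0}{5} = \frac{70}{5} = 14$$

compared with the original sample:

$$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$$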


Weighted Mean

The weighted mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$$


EXAMPLE

During a one-hour period on a hot Saturday afternoon, cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks sold.

$$\bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89$$
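A small Python sketch of the same weighted mean follows. It uses the prices and quantities that appear in the calculation above (including the $1.15 price for the last group of drinks); the helper name `weighted_mean` is an illustrative choice, not something defined in the lesson.

```python
# Drink prices in dollars and the number of drinks sold at each price,
# taken from the calculation in the example.
prices = [0.50, 0.75, 0.90, 1.15]
counts = [5, 15, 15, 15]

def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    total = sum(w * x for x, w in zip(values, weights))
    return total / sum(weights)

wm = weighted_mean(prices, counts)
print(round(wm, 2))  # 0.89
```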


The Median

The Median is the midpoint of the values after they have been ordered from the smallest to the largest.

There are as many values above the median as below it in the data array.

For an even number of values, the median is the arithmetic average of the two middle values.


EXAMPLE

The ages for a sample of five college students are: 21, 25, 19, 20, 22

Arranging the data in ascending order gives: 19, 20, 21, 22, 25. Thus the median is 21.

The heights of four basketball players, in inches, are: 76, 73, 80, 75

Arranging the data in ascending order gives: 73, 75, 76, 80. Thus the median is 75.5.
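Both cases (odd and even number of values) can be sketched in Python as below. The function is written out by hand to mirror the steps in the examples; the name `median` is simply a convenient label here.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle
    values when the number of observations is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [21, 25, 19, 20, 22]
heights = [76, 73, 80, 75]
print(median(ages))     # 21
print(median(heights))  # 75.5
```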


The Mode

The mode is the value of the observation that appears most frequently.

EXAMPLE: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87.

Because the score of 81 occurs the most often, it is the mode.
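Counting how often each score appears can be sketched with Python's `collections.Counter`. Because a data set may have more than one mode, the illustrative helper `modes` below returns a list rather than a single value.

```python
from collections import Counter

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes(scores))  # [81]
```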


Properties of Mean and Median

Property                   Mean     Median   Mode
Uniqueness                 Yes      Yes      No
Effect of extreme values   Strong   Small    Maybe


Measures of dispersion

1. Range

2. Mean Deviation

3. Variance and standard deviation

4. Coefficient of variation


Range

The range is the difference between the largest and the smallest value.

Only two values are used in its calculation. It is influenced by an extreme value. It is easy to compute and understand.


Mean Deviation

The Mean Deviation is the arithmetic mean of the absolute values of the deviations from the arithmetic mean.

All values are used in the calculation. It is not influenced too much by large or small values. The absolute values are difficult to manipulate.

$$MD = \frac{\sum |X - \bar{X}|}{n}$$

Mean deviation is also known as Mean Absolute Deviation (MAD).


EXAMPLE: Range and Mean Deviation

The weights of a sample of crates containing books for the bookstore (in pounds) are:

103, 97, 101, 106, 103

Find the range and the mean deviation.

Range = 106 – 97 = 9


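The first step is to find the mean weight:

$$\bar{X} = \frac{\sum X}{n} = \frac{510}{5} = 102$$

The mean deviation is:

$$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4$$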
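The range and mean deviation for the crate weights can be reproduced with the short Python sketch below. The function names `value_range` and `mean_deviation` are illustrative only.

```python
# Crate weights in pounds (from the example).
weights = [103, 97, 101, 106, 103]

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

print(value_range(weights))     # 9
print(mean_deviation(weights))  # 2.4
```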


Population Variance

The population variance is the arithmetic mean of the squared deviations from the population mean.

All values are used in the calculation. It is more likely to be influenced by extreme values than the mean deviation. The units are awkward: the square of the original units.


The Variance

Population: X1, X2, …, XN

Population Variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Sample: x1, x2, …, xn

Sample Variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Note that in the sample variance formula the sum of squared deviations is divided by (n-1) instead of n in order to yield an unbiased estimator of the population variance.


EXAMPLE: Population variance

The ages of the Dunn family are: 2, 18, 34, 42

What is the population variance?

$$\mu = \frac{\sum X}{N} = \frac{96}{4} = 24$$

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{(2 - 24)^2 + \dots + (42 - 24)^2}{4} = \frac{944}{4} = 236$$


EXAMPLE: Population Standard Deviation

The population standard deviation (σ) is the square root of the population variance.

In the last example, the population variance is 236. Hence, the population standard deviation is 15.36, found by

$$\sigma = \sqrt{\sigma^2} = \sqrt{236} = 15.36$$
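The population variance and standard deviation of the Dunn family ages can be verified with a few lines of Python. This is a minimal sketch that divides by N, as the population formula requires; the function name `population_variance` is an illustrative choice.

```python
import math

ages = [2, 18, 34, 42]

def population_variance(values):
    """Mean of the squared deviations from the population mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

sigma2 = population_variance(ages)
sigma = math.sqrt(sigma2)
print(sigma2)           # 236.0
print(round(sigma, 2))  # 15.36
```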


EXAMPLE: Sample variance

The hourly wages earned by a sample of five students are:

$7, $5, $11, $8, $6. Find the variance.

$$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$$

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30$$


EXAMPLE: Sample Standard Deviation

The sample standard deviation is the square root of the sample variance.

In the last example, the sample variance is 5.30. Hence, the sample standard deviation is 2.30:

$$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$$
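The sample variance and standard deviation of the hourly wages can be checked with the sketch below. It divides by n - 1, as in the sample formula; the helper name `sample_variance` is illustrative.

```python
import math

# Hourly wages in dollars for the sample of five students.
wages = [7, 5, 11, 8, 6]

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

s2 = sample_variance(wages)
s = math.sqrt(s2)
print(round(s2, 2))  # 5.3
print(round(s, 2))   # 2.3
```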


Sample Variance For Grouped Data

The formula for the sample variance for grouped data is:

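$$
\begin{aligned}
s^2 &= \frac{\sum f (x - \bar{x})^2}{\sum f - 1} \\
    &= \frac{\sum f (x^2 - 2 x \bar{x} + \bar{x}^2)}{n - 1} \\
    &= \frac{\sum f x^2 - 2 \bar{x} \sum f x + \bar{x}^2 \sum f}{n - 1} \\
    &= \frac{\sum f x^2 - 2 n \bar{x}^2 + n \bar{x}^2}{n - 1} \\
    &= \frac{\sum f x^2 - n \bar{x}^2}{n - 1}
\end{aligned}
$$

where f is the frequency of the value x, n = Σf, and x̄ = Σfx / n.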


EXAMPLE: Sample Variance For Grouped Data

During a one-hour period on a hot Saturday afternoon, cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the variance of the price of the drinks.

$$s^2 = \frac{\sum f (x - \bar{x})^2}{\sum f - 1} = \frac{5(0.50 - 0.89)^2 + 15(0.75 - 0.89)^2 + 15(0.90 - 0.89)^2 + 15(1.15 - 0.89)^2}{(5 + 15 + 15 + 15) - 1} = \frac{2.07}{50 - 1} = 0.042$$
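As a rough check, the sketch below computes the grouped-data sample variance two ways: from the definition, and from the shortcut form (Σfx² - n·x̄²)/(n - 1). It uses the prices and frequencies that appear in the calculation (including the $1.15 price); the function name `grouped_sample_variance` is an illustrative choice.

```python
# Prices in dollars and how many drinks were sold at each price.
prices = [0.50, 0.75, 0.90, 1.15]
freqs = [5, 15, 15, 15]

def grouped_sample_variance(values, freqs):
    """Return (definitional, shortcut) sample variance for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    # Definitional form: sum of f * (x - mean)^2 over n - 1.
    direct = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / (n - 1)
    # Shortcut form: (sum of f * x^2 - n * mean^2) over n - 1.
    shortcut = (sum(f * x ** 2 for x, f in zip(values, freqs)) - n * mean ** 2) / (n - 1)
    return direct, shortcut

direct, shortcut = grouped_sample_variance(prices, freqs)
print(round(direct, 3))    # 0.042
print(round(shortcut, 3))  # 0.042
```

The two forms agree up to floating-point rounding, which is the point of the algebraic simplification.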


Interpretation and Uses of the Standard Deviation

Chebyshev’s theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least

$$1 - \frac{1}{k^2}$$

where k is any constant greater than 1.


Chebyshev’s theorem

k   Coverage
1   0%
2   75.00%
3   88.89%
4   93.75%
5   96.00%
6   97.22%

Chebyshev’s theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least $1 - 1/k^2$.
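The coverage column can be regenerated from the bound with a short Python loop. This is a sketch only; `chebyshev_bound` is a name chosen here for illustration, and k = 1 is included merely because the table lists it (the bound is trivial there).

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1 - 1 / k ** 2

for k in range(1, 7):
    print(k, f"{chebyshev_bound(k):.2%}")
```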


Interpretation and Uses of the Standard Deviation

Empirical Rule: For any symmetrical, bell-shaped distribution:

About 68% of the observations will lie within 1s of the mean.
About 95% of the observations will lie within 2s of the mean.
Virtually all the observations will be within 3s of the mean.

The Empirical Rule is also known as the normal rule.


Bell-shaped Curve showing the relationship between σ and μ


Why are we concerned about dispersion?

Dispersion is used as a measure of risk. Consider two assets with the same expected (mean) returns:

Asset 1: -2%, 0%, +2%
Asset 2: -4%, 0%, +4%

The dispersion of returns of the second asset is larger than that of the first. Thus, the second asset is more risky.

Thus, knowledge of dispersion is essential for investment decisions. And so is knowledge of expected (mean) returns.
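The comparison of the two assets can be made concrete with a small Python sketch. It assumes, for illustration only, that the three listed returns of each asset are equally likely outcomes, so the population standard deviation (`statistics.pstdev`) is used.

```python
import statistics

# Possible returns in percent for the two assets in the example.
asset_1 = [-2, 0, 2]
asset_2 = [-4, 0, 4]

for name, returns in [("asset 1", asset_1), ("asset 2", asset_2)]:
    mean = statistics.mean(returns)
    sd = statistics.pstdev(returns)  # population standard deviation
    print(name, "mean:", mean, "st dev:", round(sd, 2))
```

Both assets have a mean return of 0, but the second has twice the standard deviation of the first.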


Relative Dispersion

The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

$$CV = \frac{s}{\bar{X}} (100\%)$$
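As an illustration, the sketch below computes the coefficient of variation for the hourly wage sample used earlier in the sample variance example ($7, $5, $11, $8, $6). Reusing that data set here is a choice made for this sketch, not part of the original slide.

```python
import statistics

wages = [7, 5, 11, 8, 6]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv = coefficient_of_variation(wages)
print(round(cv, 1))  # 31.1
```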


Sharpe Ratio and Relative Dispersion

The Sharpe Ratio is often used to measure the performance of investment strategies, with an adjustment for risk.

If X is the return of an investment strategy in excess of the market portfolio, the inverse of the CV is the Sharpe Ratio.

An investment strategy with a higher Sharpe Ratio is preferred.

http://www.stanford.edu/~wfsharpe/art/sr/sr.htm


Skewness

Skewness is the measurement of the lack of symmetry of the distribution.

The coefficient of skewness can range from -3.00 up to 3.00.

A value of 0 indicates a symmetric distribution. It is computed as follows:

$$sk = \frac{3(\bar{x} - \text{median})}{s}$$

Or

$$sk = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x - \bar{x}}{s} \right)^3$$
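Both skewness measures can be sketched in Python as follows. The first function is the coefficient based on the mean and median, and the second is the sum of cubed standardized deviations scaled by n / ((n - 1)(n - 2)); the function names are illustrative, s is taken to be the sample standard deviation, and the wage data from the earlier sample variance example is reused purely as test input.

```python
import statistics

def pearson_skewness(values):
    """3 * (mean - median) / s, with s the sample standard deviation."""
    s = statistics.stdev(values)
    return 3 * (statistics.mean(values) - statistics.median(values)) / s

def moment_skewness(values):
    """n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3); needs n >= 3."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

wages = [7, 5, 11, 8, 6]
print(pearson_skewness(wages))  # positive: mean lies above the median
print(moment_skewness(wages))   # positive: longer right tail
```

The two formulas measure the same idea on different scales, so their values generally differ even though their signs usually agree.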


Why are we concerned about skewness?

Skewness measures the degree of asymmetry in risk: upside risk versus downside risk.

Consider the distribution of asset returns:

Right skewed implies higher upside risk than downside risk.
Left skewed implies higher downside risk than upside risk.


Symmetric Distribution

zero skewness: mode = median = mean

Density Distribution (the height may be interpreted as relative frequency)

The area under the density distribution is 1. The sum of relative frequencies is 1. Thus the median always splits the density distribution into two equal areas.


Right Skewed Distribution

Positively skewed: (Skew to the right)

Mean and Median are to the right of the Mode.

Mode<Median<Mean


Left Skewed Distribution

Negatively skewed: (Skew to the left)

Mean and Median are to the left of the Mode.

Mean<Median<Mode


Working with mean and Standard Deviation

Set    Data                         Mean    St Dev
(1)    19 20 21                     20.00   0.82
(2)    -1 0 1                       0.00    0.82
(3)    19 20 20 21                  20.00   0.71
(4)    38 40 42                     40.00   1.63
(5)    57 60 63                     60.00   2.45
(6)    19 19 20 20 21 21            20.00   0.82
(7)    3 5 8                        5.33    2.05
(8)    4 7 9                        6.67    2.05
(9)    7 12 17                      12.00   4.08
(10)   12 20 21 27 32 35 45 56 72   35.56   18.04


Working with mean and Standard Deviation

Set    Data          Mean    St Dev
(1)    19 20 21      20.00   0.82
(2)    -1 0 1        0.00    0.82
(3)    19 20 20 21   20.00   0.71
(4)    38 40 42      40.00   1.63
(5)    57 60 63      60.00   2.45

(2) = (1) – mean(1): Mean(2) = 0; Stdev(2) = Stdev(1).

(3) = (1) with mean(1) added as an extra observation: Mean(3) = Mean(1); Stdev(3) < Stdev(1).

(4) = (1)*2; (5) = (1)*3: Mean(4) = Mean(1)*2; Mean(5) = Mean(1)*3; Stdev(4) = Stdev(1)*2; Stdev(5) = Stdev(1)*3.


Working with mean and Standard Deviation

Set    Data                         Mean    St Dev
(1)    19 20 21                     20.00   0.82
(6)    19 19 20 20 21 21            20.00   0.82
(7)    3 5 8                        5.33    2.05
(8)    4 7 9                        6.67    2.05
(9)    7 12 17                      12.00   4.08
(10)   12 20 21 27 32 35 45 56 72   35.56   18.04

(6) = (1) with each observation repeated the same number of times: Mean(6) = Mean(1); Stdev(6) = Stdev(1).

(9) = (7) + (8), added element by element: Mean(9) = Mean(7) + Mean(8).

(10) = (7) * (8), all pairwise products: Mean(10) = Mean(7) * Mean(8).
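The relationships among the data sets can be checked numerically with the sketch below. Two points are assumptions inferred from the tabulated values rather than stated on the slides: the standard deviations match the population formula (divide by N, `statistics.pstdev`), and set (10) consists of all pairwise products of sets (7) and (8).

```python
from itertools import product
from statistics import mean, pstdev

s1 = [19, 20, 21]
s2 = [x - mean(s1) for x in s1]   # subtract the mean: a shift
s3 = s1 + [mean(s1)]              # append the mean as an extra observation
s4 = [2 * x for x in s1]          # rescale by 2
s6 = s1 * 2                       # every observation appears twice
s7 = [3, 5, 8]
s8 = [4, 7, 9]
s9 = [a + b for a, b in zip(s7, s8)]      # element-by-element sums
s10 = [a * b for a, b in product(s7, s8)] # all pairwise products

for name, data in [("(1)", s1), ("(2)", s2), ("(3)", s3), ("(4)", s4),
                   ("(6)", s6), ("(9)", s9), ("(10)", s10)]:
    print(name, round(mean(data), 2), round(pstdev(data), 2))
```

Shifting leaves the standard deviation unchanged, rescaling multiplies both the mean and the standard deviation by the same factor, and adding an observation equal to the mean keeps the mean but shrinks the standard deviation.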


- END -
