introduction to statistics


EC114 Introduction to Quantitative Economics
1. Introduction to Statistics

Department of Economics
University of Essex

11/13 October 2011


Outline

Reference: R. L. Thomas, Using Statistics in Economics, McGraw-Hill, 2005, Prerequisites 1 and 2.

1 Statistics in Economics

2 Descriptive Statistics


Statistics in Economics

Why Study Quantitative Economics?

In Economics, we make an argument using quantitative evidence.

• A historian might defend an argument using historical quotations.
• Economists make arguments using quantities, e.g. “Unemployment rose last year by 1 million because GDP fell by 0.5%”.

We use statistics to interpret quantitative evidence.

The good news is that knowledge of statistics pays very well!



In Economics we use Statistics (and Econometrics) to analyse and interpret economic data with a view to:

1 modelling economic relationships;
  - What determines wages? Age, experience, occupation, education?

2 testing economic theories;
  - Are share price movements unpredictable?

3 identifying trends;
  - Are global air temperatures rising?

4 forecasting/prediction;
  - We predict GDP next year given different government tax policies.

5 making better decisions.
  - Which assets should we buy?



• The data we observe can be for different types of variable:
  1 aggregate: relating to the whole economy or specific sectors/regions, e.g. consumers’ expenditure, producer price inflation.
  2 individual: relating to individual firms or households, e.g. household expenditure, firms’ investment expenditure.

• The data can be observed in different ways:
  1 time series, i.e. for a given variable over time, e.g. consumers’ expenditure from 1955–2009;
  2 cross section, i.e. for a given variable at a particular point in time, e.g. car firms’ investment in 2005;
  3 panel data, i.e. for a variable on individual units over time, e.g. households’ expenditure in the U.K. from 1990–2009.


Descriptive Statistics

• What can we learn from data? We learn little from looking at a large set of numbers, so we attempt to summarise the data using descriptive statistics.
• Table P.1 in Thomas reports the yearly clothing expenditure of 373 families and uses this data set to illustrate the use of descriptive statistics – try to follow what he does.
• We shall use a smaller data set of 10 observations picked from Table P.1, these being

2806, 1743, 3201, 2401, 3567, 1666, 2111, 2848, 1572, 2651

• These are observations on a variable we shall denote $X$.
• We shall use the index $i$ to denote a generic observation $X_i$, where $i$ takes on the values 1, 2, 3, ..., 9, 10.
• Hence $X_1 = 2806$, $X_2 = 1743$, ..., $X_9 = 1572$, $X_{10} = 2651$.



Measures of Central Tendency

• What is a typical value for $X$ in the data set?
• The most common answer is to compute the (arithmetic) mean, or average, of the values for $X$:

\[ \bar{X} = \frac{X_1 + X_2 + \ldots + X_9 + X_{10}}{10} = \frac{24{,}566}{10} = 2{,}456.6; \]

the typical clothing expenditure is £2,456.60.
• In general, if we have $n$ observations, we would write

\[ \bar{X} = \frac{X_1 + X_2 + \ldots + X_{n-1} + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}. \]
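The mean calculation above can be reproduced in a few lines. This is a minimal sketch using only built-in Python; the variable names `X`, `n`, `total` and `x_bar` are illustrative choices, not notation from Thomas.

```python
# The ten clothing-expenditure observations from the text (in pounds).
X = [2806, 1743, 3201, 2401, 3567, 1666, 2111, 2848, 1572, 2651]

n = len(X)          # number of observations
total = sum(X)      # X_1 + X_2 + ... + X_n
x_bar = total / n   # arithmetic mean

print(total)  # 24566
print(x_bar)  # 2456.6
```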



The summation notation is very useful and the following properties are worth learning:

1 If $\alpha$ is a constant, i.e. it does not vary and does not depend on $i$, then

\[ \sum_{i=1}^{n} \alpha X_i = \alpha X_1 + \ldots + \alpha X_n = \alpha \sum_{i=1}^{n} X_i, \]

\[ \sum_{i=1}^{n} \alpha = \alpha + \ldots + \alpha \ (n \text{ times}) = n\alpha. \]

2 If $X$ and $Y$ are two variables, with $n$ observations on each, then

\[ \sum_{i=1}^{n} (X_i + Y_i) = (X_1 + Y_1) + \ldots + (X_n + Y_n) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i. \]



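The above properties can also be combined: if $\alpha$ and $\beta$ are two constants, then

\[ \sum_{i=1}^{n} (\alpha X_i + \beta Y_i) = \sum_{i=1}^{n} \alpha X_i + \sum_{i=1}^{n} \beta Y_i = \alpha \sum_{i=1}^{n} X_i + \beta \sum_{i=1}^{n} Y_i. \]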
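The summation properties can be checked numerically. In the sketch below, `X` is the data set from the text, while `Y`, `alpha` and `beta` are arbitrary illustrative choices that do not come from the source. Integer values are used so that every equality holds exactly.

```python
X = [2806, 1743, 3201, 2401, 3567, 1666, 2111, 2848, 1572, 2651]
# Y is NOT from the text: an arbitrary second variable (X in reverse order).
Y = X[::-1]
alpha, beta = 3, -2  # arbitrary integer constants, chosen for illustration
n = len(X)

# Property 1: a constant factor can be taken outside the sum.
assert sum(alpha * x for x in X) == alpha * sum(X)
# Property 1 (second part): summing a constant n times gives n * alpha.
assert sum(alpha for _ in range(n)) == n * alpha
# Property 2: the sum of (X_i + Y_i) splits into two separate sums.
assert sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)

# Combined property: summation is linear.
lhs = sum(alpha * x + beta * y for x, y in zip(X, Y))
rhs = alpha * sum(X) + beta * sum(Y)
assert lhs == rhs
print(lhs, rhs)
```

A numerical check of this kind illustrates the identities for one data set; it does not prove them in general.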

Sometimes the summation is written $\sum_{i=1}^{n}$ or $\sum_i$ or simply $\sum$ when the range of summation is obvious.
For example, the sample mean may be written

\[ \bar{X} = \frac{\sum_i X_i}{n} = \frac{\sum X_i}{n}. \]



• Another measure of the typical value is the median, $M$.
• It is obtained by arranging the data in ascending order and choosing the value in the middle.
• If $n$ is odd, then, with the observations indexed in ascending order,

\[ M = X_{(n+1)/2} \]

e.g. if $n = 125$ then $(n+1)/2 = (125+1)/2 = 63$ so that $M$ is the 63rd observation:

\[ M = X_{63}. \]

• Note that there are 62 observations below $M$ and 62 observations above $M$.



• If $n$ is even, $M$ is the average of the two middle numbers:

\[ M = \frac{X_{n/2} + X_{n/2+1}}{2} \]

e.g. if $n = 126$ then $n/2 = 63$ and $n/2 + 1 = 64$ so that

\[ M = \frac{X_{63} + X_{64}}{2}. \]

• Putting our sample of 10 observations in ascending order:

1572 1666 1743 2111 2401 2651 2806 2848 3201 3567

• Here, $n = 10$ and so $n/2 = 5$ and $n/2 + 1 = 6$; hence

\[ M = \frac{X_5 + X_6}{2} = \frac{2401 + 2651}{2} = \frac{5052}{2} = 2526. \]
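The odd and even cases of the median rule can be sketched as a single function. The function name `median` and the nine-observation subsample used for the odd case are illustrative assumptions. Note that Python lists are indexed from 0, so the $k$-th ordered value sits at index $k - 1$.

```python
def median(values):
    """Median as described in the text: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th ordered value, i.e. 0-based index mid.
        return ordered[mid]
    # n even: average of the (n / 2)-th and (n / 2 + 1)-th ordered values.
    return (ordered[mid - 1] + ordered[mid]) / 2


X = [2806, 1743, 3201, 2401, 3567, 1666, 2111, 2848, 1572, 2651]
print(median(X))      # 2526.0 (n = 10, even)
# Dropping the last observation is only a device to illustrate the odd case.
print(median(X[:9]))  # 2401 (n = 9, odd)
```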



• A third measure of central tendency is the mode, which is the most frequent observation.
• In our sample of 10 clothing expenditures the mode has little meaning because all the observations are different!
• But when values are repeated in a data set the mode depicts the most common value.

• The mean is used most widely but can be distorted by extreme values, in which case the median is more meaningful.
• Suppose the largest observation in our data set was not 3567 but 13567, which is an extreme value compared to the other nine observations.
• In this case the mean becomes 34,566/10 = 3,456.6 (larger than all but the largest, extreme, observation) but the median remains unchanged at 2526.
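The effect of the extreme value on the mean and the median can be checked with Python's standard `statistics` module. This is a sketch: the name `X_extreme` is an illustrative choice, and `collections.Counter` is used here only to show that no value in the sample is repeated.

```python
from collections import Counter
from statistics import mean, median

X = [2806, 1743, 3201, 2401, 3567, 1666, 2111, 2848, 1572, 2651]
# Replace the largest observation, 3567, with the extreme value 13567.
X_extreme = [13567 if x == 3567 else x for x in X]

print(mean(X), median(X))                  # 2456.6 2526.0
print(mean(X_extreme), median(X_extreme))  # 3456.6 2526.0

# Every value occurs exactly once, so no single value stands out as the mode.
counts = Counter(X)
print(max(counts.values()))  # 1
```

The mean shifts by 1,000 while the median is unaffected, because the median depends only on the order of the observations around the middle.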



Measures of Variation

• By how much do the observations vary around their central value?
• The variance is the mean squared deviation around the mean.
• Let $x_i$ denote the deviation of observation $i$ from the mean, $\bar{X}$, i.e. $x_i = X_i - \bar{X}$.
• The squared deviation is $x_i^2 = (X_i - \bar{X})^2$, and the mean of these (the variance) is

\[ v^2 = \frac{\sum_i x_i^2}{n} = \frac{\sum_i (X_i - \bar{X})^2}{n}. \]

• It is often easier to calculate

\[ v^2 = \frac{\sum_i X_i^2}{n} - \bar{X}^2. \]
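A sketch of why the two expressions for $v^2$ agree, assuming only the summation properties given earlier and the fact that $\sum_i X_i = n\bar{X}$:

```latex
\begin{align*}
v^2 &= \frac{\sum_i (X_i - \bar{X})^2}{n}
     = \frac{\sum_i \left( X_i^2 - 2\bar{X} X_i + \bar{X}^2 \right)}{n} \\
    &= \frac{\sum_i X_i^2}{n} - 2\bar{X}\,\frac{\sum_i X_i}{n} + \frac{n\bar{X}^2}{n} \\
    &= \frac{\sum_i X_i^2}{n} - 2\bar{X}^2 + \bar{X}^2
     = \frac{\sum_i X_i^2}{n} - \bar{X}^2.
\end{align*}
```

The second line uses the fact that $\bar{X}$ is a constant with respect to $i$, so it can be taken outside the sum, and that summing the constant $\bar{X}^2$ over $n$ terms gives $n\bar{X}^2$.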



Returning to our 10 observations on clothing expenditure:

   i      X_i    X_i − X̄    (X_i − X̄)^2        X_i^2
   1     2806      349.4      122,080.36    7,873,636
   2     1743     −713.6      509,224.96    3,038,049
 ...      ...        ...             ...          ...
   9     1572     −884.6      782,517.16    2,471,184
  10     2651      194.4       37,791.36    7,027,801
Sums   24,566        0.0    4,139,506.40   64,488,342

Hence

\[ v^2 = \frac{4{,}139{,}506.40}{10} = 413{,}950.64 \]

or

\[ v^2 = \frac{64{,}488{,}342}{10} - (2{,}456.6)^2 = 6{,}448{,}834.2 - 6{,}034{,}883.56 = 413{,}950.64. \]
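Both routes to the variance can be reproduced as follows. This is a minimal sketch; the names `v2_def` and `v2_short` are illustrative. The slides divide by $n$, which corresponds to `statistics.pvariance` in Python's standard library; `statistics.variance` divides by $n - 1$ instead and would give a different number.

```python
from statistics import pvariance

X = [2806, 1743, 3201, 2401, 3567, 1666, 2111, 2848, 1572, 2651]
n = len(X)
x_bar = sum(X) / n

# Definition: the mean of the squared deviations from the mean.
v2_def = sum((x - x_bar) ** 2 for x in X) / n
# Shortcut: the mean of the squares minus the square of the mean.
v2_short = sum(x ** 2 for x in X) / n - x_bar ** 2

print(sum(x ** 2 for x in X))  # 64488342
print(round(v2_def, 2))        # 413950.64
print(round(v2_short, 2))      # 413950.64
print(round(pvariance(X), 2))  # 413950.64
```

The results are rounded before printing because floating-point arithmetic can leave the two routes differing in the last few binary digits.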



• What does a variance of 413,950.64 actually mean?
• We can make relative statements, e.g. it is larger than a variance of 10 and smaller than a variance of one million, but can we say any more about the variation about the mean?
• It is common to consider the standard deviation, the positive square root of $v^2$:

\[ v = \sqrt{\frac{\sum_i x_i^2}{n}}. \]

• For our data set we find that $v = \sqrt{413{,}950.64} = 643.39$, which we interpret as being approximately the average deviation of observations from their mean.
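The standard deviation follows directly from the variance. The sketch below computes it by hand and compares it with `statistics.pstdev`, the standard-library function that uses the divisor $n$ as in the text (`statistics.stdev` uses $n - 1$). Variable names are illustrative.

```python
import math
from statistics import pstdev

X = [2806, 1743, 3201, 2401, 3567, 1666, 2111, 2848, 1572, 2651]
n = len(X)
x_bar = sum(X) / n

v2 = sum((x - x_bar) ** 2 for x in X) / n  # variance with divisor n
v = math.sqrt(v2)                          # positive square root

print(round(v, 2))          # 643.39
print(round(pstdev(X), 2))  # 643.39
```

Unlike the variance, $v$ is measured in the same units as the data (here pounds), which is what makes it easier to interpret.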



• An alternative measure of variation is to take the mean absolute deviation rather than the mean squared deviation from the mean:

\[ \mathrm{mdev} = \frac{\sum_i |X_i - \bar{X}|}{n}. \]

• From our previous table we find:

   i      X_i    X_i − X̄    |X_i − X̄|
   1     2806      349.4        349.4
   2     1743     −713.6        713.6
 ...      ...        ...          ...
   9     1572     −884.6        884.6
  10     2651      194.4        194.4
Sums   24,566        0.0      5,580.0



• Hence we find that mdev = 5580/10 = 558.
• Although mdev may seem the natural measure of the average variation from the mean, it is used less widely than the standard deviation, mainly because:
  1 mathematically it is easier to analyse squared values than absolute values, and therefore...
  2 the theory about variance and standard deviation is more developed.
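The mean absolute deviation can be computed in the same way as the variance, replacing the square with an absolute value. This is a minimal sketch with illustrative variable names (`abs_dev`, `mdev`).

```python
X = [2806, 1743, 3201, 2401, 3567, 1666, 2111, 2848, 1572, 2651]
n = len(X)
x_bar = sum(X) / n

# Absolute deviation of each observation from the mean.
abs_dev = [abs(x - x_bar) for x in X]
mdev = sum(abs_dev) / n

print(round(sum(abs_dev), 1))  # 5580.0
print(round(mdev, 1))          # 558.0
```

For this data set mdev (558) is smaller than the standard deviation (643.39); squaring gives more weight to the larger deviations.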


Summary

• Measures of central tendency (mean, mode, median)
• Measures of variation (variance, standard deviation, mean deviation)

Next week:

• Frequency distributions
• Mutually exclusive and independent events
• Conditional probabilities

