st102 michaelmas term revision lse london school of economics

ST102 Elementary Statistical Theory Revision lectures – Michaelmas Term material Dr James Abdey Department of Statistics London School of Economics and Political Science ST102 Elementary Statistical Theory Dr James Abdey ST 2014 Revision lectures – MT material 1

Upload: aanthonykiedis

Post on 28-Dec-2015


Page 1: ST102 Michaelmas Term Revision LSE London School of Economics

ST102 Elementary Statistical Theory

Revision lectures – Michaelmas Term material

Dr James Abdey†

†Department of Statistics, London School of Economics and Political Science

ST102 Elementary Statistical Theory Dr James Abdey ST 2014 Revision lectures – MT material 1


Examination arrangements

Thursday, May 22nd 2014, 10:00–13:00.
- Please double-check the time in your examination timetable on LSE for You (published by week 1 of ST), in case of (extremely unlikely) changes to the date and time.

In the examination, you will be provided with:
- Murdoch and Barnes: Statistical Tables, 4th edition.
  - The only tables from this that you will (may) need are for the standard normal, t, χ², F and Wilcoxon distributions. These tables are also on the ST102 Moodle site, so make sure you are familiar with their layout.
- A formula sheet (at the end of the examination paper). This is also on the ST102 Moodle site.

For general administrative matters on examinations see:
http://www2.lse.ac.uk/intranet/students/registrationTimetablesAssessment/examinationsAndResults/examTimetables/ExamTimetable.aspx


A word on calculators

You can also use a scientific calculator, as prescribed by examination procedures.

The rubric on the front of the examination paper will say: ‘Scientific calculators are permitted in the examination, as prescribed by the School’s regulations. If you have a programmable calculator, you must delete anything stored in the memory in the presence of an invigilator at the start of the examination.’

In short, graphics calculators are permitted (the graphics capability will be of no benefit in the examination). However, any programmable memory must be re-set in the presence of an invigilator at the start of the examination.

Although many statistical calculations can be performed on scientific calculators, you must still show all your working in your answer booklet.


Structure of the examination paper

The question paper contains seven questions, all given equal weight (20 marks each).

Section A: two compulsory questions; Section B: five questions.

Answer both questions from Section A, and three questions from Section B.

If you answer more than three questions from Section B, only your best 3 answers will count towards the final mark. However, you are strongly advised to attempt only 3 questions from Section B to make efficient use of your time.

Each question in each section could cover any part of the syllabus.


Structure of the examination paper

The final mark is out of 100.

Pass mark for the examination is 40.

Important note: Re-sit candidates only will sit the ‘old’ examination paper structure which was in place in the 2012–13 academic year. All past examination papers on Moodle have the ‘old’ structure.

A specimen 2014 examination paper with the ‘new’ structure is also available on Moodle.


Notes on examination ‘tactics’

Make sure you do not miss out on any marks that you can get!
- i.e. try to give time to all questions you attempt, and not to get stuck on any single question.

Some (parts of) questions are entirely standard and straightforward, some are more challenging. So try to make sure you do not miss out on the standard ones at least, bearing in mind that:
- The questions are not in order of difficulty.
- Parts of questions (e.g. 1(a), 1(b), ...) are not in order of difficulty.
- Questions may be answered in any order but keep answers to each question in one place in your answer booklet!

Remember that partial credit is given for partially correct answers. The only guaranteed way to get a 0 is an empty answer book!


Preparing for the examination

Only the topics covered in the lecture notes are included in the examination
- ... except for the ‡ topics in MT, and others (if any) that may be explicitly stated as not examinable, mainly LT after Section 6.10 and Section 7 on ANOVA.

Among these topics, all are potentially examinable.
- However, some are, of course, more central than others, and more likely to turn up in the examination.
- Use the lecture notes, exercises and (especially) recent examination papers to form an idea of which topics are most prominent in this respect, and to decide which ones to give most weight to in your preparation.

Also read the textbook on these topics, if it helps you.

Substantive queries about the material are best posted in the Moodle Q&A forum (so everyone can read the response).


Past examination papers

Most relevant are papers from 2008 onwards (including the 2008 mock).

Older examination papers exist, but...
- they include further topics that are no longer covered
- the solutions are not always complete and contain some errors
- the style of the questions is generally different from more recent ones.

These can be accessed (with LSE username and password) at:
https://library-2.lse.ac.uk/protected-exam/index.html


The rest of today

Outline of the most important topics and results from MT.
- You should definitely at least remember these!
- In the examination you can take these results as known and use them, unless told otherwise (i.e. unless a question explicitly asks you to prove some of them).

Examples of common types of questions, from past examinations.
- If you suspect any typos or errors in the solutions, please query them in the Moodle Q&A forum.


Key topics covered in MT

1. Descriptive statistics
   - Not often separate examination questions, but some of these (e.g. X̄ and S²) appear in other questions.

2. Set theory and counting rules (used in probability questions).

3. Probability: definition, classical probability, independence, conditional probability, Bayes’ theorem.

4. Random variables: definition, pf/pdf and cdf, expected values and variances, medians, moment generating functions.

5. Common probability distributions: discrete and continuous uniform, Poisson, binomial, exponential and normal.

6. Multivariate probability distributions: independence of random variables, conditional and marginal distributions, sums and products of random variables, covariance and correlation.

7. Sampling distributions: random (IID) samples, statistics and their sampling distributions, the central limit theorem.


Set theory

Basic rules of set-theoretic operations:

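$A \cap B = B \cap A$ and $A \cup B = B \cup A$

$A \cap (B \cap C) = (A \cap B) \cap C$ and $A \cup (B \cup C) = (A \cup B) \cup C$

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ and $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$

$(A \cap B)^c = A^c \cup B^c$ and $(A \cup B)^c = A^c \cap B^c$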
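The distributive and De Morgan rules can be checked on a small concrete case. The sketch below uses Python's built-in sets; the sample space `S`, the events `A`, `B`, `C` and the helper `complement` are hypothetical choices for illustration only, and a check on one example is of course not a proof.

```python
# Hypothetical sample space and events, chosen only for illustration.
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8, 10}


def complement(event, sample_space=S):
    """Return the complement of an event within the sample space."""
    return sample_space - event


# Distributive laws: & is intersection, | is union.
distributive_1 = (A & (B | C)) == ((A & B) | (A & C))
distributive_2 = (A | (B & C)) == ((A | B) & (A | C))

# De Morgan's laws.
de_morgan_1 = complement(A & B) == (complement(A) | complement(B))
de_morgan_2 = complement(A | B) == (complement(A) & complement(B))

print(distributive_1, distributive_2, de_morgan_1, de_morgan_2)
```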


Set theory

Let S be the sample space and A ⊂ S . Then:

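$\emptyset \subset A$ and $\emptyset^c = S$

$A \cap \emptyset = \emptyset$ and $A \cup \emptyset = A$

$A \cap S = A$ and $A \cup S = S$

$A \cap A^c = \emptyset$ and $A \cup A^c = S$

$A \cap A = A$ and $A \cup A = A$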

(For these and other similar results, see slides 122–123.)


Counting rules

Remember that in ‘classical probability’ problems, where all outcomes are equally likely, probability calculations involve counting outcomes (see slides 141–142).

See slide 155 for the basic counting formulae.

Counting possibilities directly (without the formulae) is also fine, if you can do it (i.e. in small problems).


Probability: definition and key properties

See slide 128 for the axioms of probability.

The basic properties of the probability function P (slide 136):

P(S) = 1 and P(∅) = 0.

0 ≤ P(A) ≤ 1 for all events A.

$P(A^c) = 1 - P(A)$.

P(A ∪ B) = P(A) + P(B)− P(A ∩ B).


Some crucial results and definitions

Independence: $A$ and $B$ are independent if $P(A \cap B) = P(A)\,P(B)$
- and if $A_1, A_2, \ldots, A_n$ are independent, then $P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2) \cdots P(A_n)$.

Conditional probability:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) > 0.$$

Multiplication rule: $P(A \cap B) = P(A \mid B)\,P(B)$ and its extensions (see slides 181–182).

If $A_1, A_2, \ldots, A_n$ form a partition of the sample space (see slide 124):

- Total probability formula:

$$P(B) = \sum_{j=1}^{n} P(B \mid A_j)\,P(A_j)$$

- Bayes’ theorem:

$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{P(B)} = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\,P(A_j)}$$


Example question

Question: A, B and C are independent events. Prove that A and (B ∪ C) are independent.

Solution: Using rules for set-theoretic operations and basic properties of probability:

P[A ∩ (B ∪ C )]

= P[(A ∩ B) ∪ (A ∩ C )]

= P(A ∩ B) + P(A ∩ C )− P[(A ∩ B) ∩ (A ∩ C )]

= P(A ∩ B) + P(A ∩ C )− P(A ∩ B ∩ C )

= P(A)P(B) + P(A)P(C )− P(A)P(B)P(C )

= P(A)[P(B) + P(C )− P(B)P(C )]

= P(A)[P(B) + P(C )− P(B ∩ C )] = P(A)P(B ∪ C )

and thus A ⊥⊥ (B ∪ C ).
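As a sanity check on the result (not a substitute for the proof), the identity can be verified by enumeration on one concrete case. The sketch below assumes a hypothetical sample space of three fair coin flips, with A, B and C being "heads on the first, second and third flip" respectively; the function `prob` is an illustrative helper.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: three fair coin flips, 8 equally likely outcomes.
outcomes = list(product("HT", repeat=3))


def prob(event):
    """Classical probability: favourable outcomes / all outcomes."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))


def in_a(w):
    return w[0] == "H"  # heads on the first flip


def in_b(w):
    return w[1] == "H"  # heads on the second flip


def in_c(w):
    return w[2] == "H"  # heads on the third flip


lhs = prob(lambda w: in_a(w) and (in_b(w) or in_c(w)))  # P[A ∩ (B ∪ C)]
rhs = prob(in_a) * prob(lambda w: in_b(w) or in_c(w))   # P(A) P(B ∪ C)

print(lhs, rhs)  # 3/8 3/8
```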


Example question

Question: If a committee of three persons is to be formed from a group of five men and four women, what is the probability that at least two of the committee are women, given that there is at least one woman on the committee?

Solution: Let A = ‘There are at least two women on the committee’ and B = ‘There is at least one woman on the committee’. Note that A ⊂ B, so A ∩ B = A. Calculate first:

$$P(\text{No women on the committee}) = \binom{5}{3} \Big/ \binom{9}{3} = 10/84$$

$$P(\text{One woman on the committee}) = \left[\binom{5}{2}\binom{4}{1}\right] \Big/ \binom{9}{3} = 40/84$$

Then P(B) = 1 − 10/84 = 74/84, P(A) = 1 − [10/84 + 40/84] = 34/84, and

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(B)} = \frac{34}{74} = 0.4595.$$
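The counting in this solution can be reproduced with exact arithmetic. This is a minimal sketch using `math.comb` and `fractions.Fraction` from the standard library; the helper name `p_women` is hypothetical.

```python
from fractions import Fraction
from math import comb

men, women, size = 5, 4, 3
total = comb(men + women, size)  # number of possible committees: 84


def p_women(k):
    """P(exactly k women on the committee), by counting committees."""
    return Fraction(comb(women, k) * comb(men, size - k), total)


p_b = 1 - p_women(0)               # at least one woman: 74/84
p_a = 1 - p_women(0) - p_women(1)  # at least two women: 34/84
p_a_given_b = p_a / p_b            # A is a subset of B, so P(A ∩ B) = P(A)

print(p_a_given_b, round(float(p_a_given_b), 4))  # 17/37 0.4595
```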


Example question

Question: We know that 3% of the population have a particular heart condition. A screening test has a 70% chance of identifying the condition if it is present, but has a 20% chance of recording a positive result when it is not. Evaluate the probability that a patient who gets a positive test result actually has the condition.

Solution: Let H = ‘Person has the condition’ and D = ‘Test is positive’. Then $P(H) = 0.03$, $P(H^c) = 0.97$, $P(D \mid H) = 0.7$ and $P(D \mid H^c) = 0.2$. Using Bayes’ theorem, we get:

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)} = \frac{P(D \mid H)\,P(H)}{P(D \mid H)\,P(H) + P(D \mid H^c)\,P(H^c)}$$

$$= \frac{0.7 \times 0.03}{0.7 \times 0.03 + 0.2 \times 0.97} = \frac{0.021}{0.021 + 0.194} = 0.098.$$
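The same two-event form of Bayes’ theorem can be written as a small function. This is a sketch only; the function name `posterior` and its parameter names are hypothetical, and it covers just the partition {H, Hᶜ} used in this question.

```python
def posterior(prior, p_pos_given_cond, p_pos_given_no_cond):
    """P(condition | positive test) via Bayes' theorem with partition {H, H^c}."""
    joint_with = p_pos_given_cond * prior              # P(D | H) P(H)
    joint_without = p_pos_given_no_cond * (1 - prior)  # P(D | H^c) P(H^c)
    return joint_with / (joint_with + joint_without)   # divide by P(D)


p_h_given_d = posterior(prior=0.03, p_pos_given_cond=0.7, p_pos_given_no_cond=0.2)
print(round(p_h_given_d, 3))  # 0.098
```

Even with a positive result, the probability of having the condition stays below 10%, because the condition is rare and false positives dominate.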


Probability distributions of random variables

Suppose X is a continuous random variable, f(x) is its probability density function (pdf) and F(x) is its cumulative distribution function (cdf). Key results for these:

The pdf must satisfy (i) $f(x) \ge 0$ for all $x$, and (ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$.

$P(a < X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx$ for any $a \le b$.

$F'(x) = f(x)$.

Similar results, except for the last one, also hold for the cdf and probability function p(x) = P(X = x) of a discrete random variable (with integration replaced by summation over the possible values of X).


Expected values, variances and medians

For a continuous random variable X , we have:

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx \quad \text{for any function } g$$

$$\mathrm{Var}(X) = E[(X - E(X))^2] = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx = E(X^2) - (E(X))^2$$

$$F(m) = 0.5$$

where m denotes the median of X.

Similar definitions for a discrete random variable, with sums instead of integrals (and a modified definition of the median, see slide 288).

Expected values and variances of sums and products of random variables: see slide 433.

Moment generating functions

The moment generating function (mgf) of a continuous r.v. X is:

$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx.$$

For discrete random variables, integration is replaced by summation, and f(x) by p(x).

In both cases:

$$M'_X(0) = E(X) \quad \text{and} \quad M''_X(0) = E(X^2)$$

which also gives:

$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = M''_X(0) - (M'_X(0))^2.$$
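As a worked illustration of these results, consider the exponential distribution, assuming the rate parameterisation $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (the lecture notes may use a different symbol for the parameter):

```latex
\begin{align*}
M_X(t) &= \int_0^{\infty} e^{tx}\,\lambda e^{-\lambda x}\,dx
        = \frac{\lambda}{\lambda - t}, \qquad t < \lambda \\
M'_X(t) &= \frac{\lambda}{(\lambda - t)^2}
  \;\Rightarrow\; E(X) = M'_X(0) = \frac{1}{\lambda} \\
M''_X(t) &= \frac{2\lambda}{(\lambda - t)^3}
  \;\Rightarrow\; E(X^2) = M''_X(0) = \frac{2}{\lambda^2} \\
\mathrm{Var}(X) &= M''_X(0) - (M'_X(0))^2
  = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}
\end{align*}
```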


Example question

Question: The weights, in kilograms, of a certain species of fish caught off the coast of Cornwall have a continuous distribution well-described by the probability density function:

$$f(x) = \begin{cases} c(6x - x^2 - 5) & 1 \le x \le 5 \\ 0 & \text{otherwise.} \end{cases}$$

(a) Determine the value of the constant c; (b) Derive the cumulative distribution function and evaluate the median and expected value of X.

Solution: This very common type of question requires integration. In an examination answer you must show the intermediate steps of the integration, even though they are omitted here for brevity.


Example question

Solution continued: (a) Here:

$$\int_{-\infty}^{\infty} f(x)\,dx = c \int_{1}^{5} (6x - x^2 - 5)\,dx = c \times 32/3$$

and since the integral must be 1, we have c = 3/32.


Example question

Solution continued: (b) We have:

$$\int_{1}^{x} (3/32)(6t - t^2 - 5)\,dt = (9x^2 - x^3 - 15x + 7)/32$$

so:

$$F(x) = \begin{cases} 0 & \text{for } x < 1 \\ (9x^2 - x^3 - 15x + 7)/32 & \text{for } 1 \le x \le 5 \\ 1 & \text{for } x > 5. \end{cases}$$


Example question

Solution (b) continued:

The median m is the solution to F (m) = 0.5.

Since you cannot solve this third-degree equation directly, there must be another way.

Since the expected value E(X) = 3 is exactly half-way between 1 and 5, you might guess that this is because the distribution is symmetric around 3. If this is the case, the median is also equal to 3. Direct calculation then shows that indeed F(3) = 0.5, so m = 3.

(This reminds us that in any question some parts may be routine, while others may involve a twist!)


Example question

Solution (b) continued:

The expected value is given by:

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = (3/32) \int_{1}^{5} x(6x - x^2 - 5)\,dx = 3$$
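The three numerical results in this example (c = 3/32, F(3) = 0.5 and E(X) = 3) can be checked with exact polynomial integration. The sketch below is only a check on the arithmetic; the helper `integrate_poly` and the variable names are hypothetical, and in the examination the integration steps must still be written out.

```python
from fractions import Fraction


def integrate_poly(coeffs, lower, upper):
    """Exact integral of sum(coeffs[k] * x**k) over [lower, upper]."""
    def antiderivative(x):
        return sum(Fraction(c) * Fraction(x) ** (k + 1) / (k + 1)
                   for k, c in enumerate(coeffs))
    return antiderivative(upper) - antiderivative(lower)


kernel = [-5, 6, -1]       # 6x - x^2 - 5, coefficients in ascending powers
x_kernel = [0, -5, 6, -1]  # x * (6x - x^2 - 5)

c = 1 / integrate_poly(kernel, 1, 5)  # normalising constant


def cdf(x):
    """F(x) on the support 1 <= x <= 5."""
    return c * integrate_poly(kernel, 1, x)


mean = c * integrate_poly(x_kernel, 1, 5)

print(c, cdf(3), mean)  # 3/32 1/2 3
```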


Common probability distributions

You should memorise, and can then use, the pf/pdf, cdf, mean, variance and median (if given) of the following distributions:

Binomial

Poisson

Discrete and continuous uniform

Exponential

Normal.

If any other distribution is used in a question, you will be given formulae for them (or asked to derive them, as part of the question).


Common discrete distributions

Poisson distribution for counts x = 0, 1, 2, . . . .

Binomial distribution for the number x of ‘successes’ out of n trials.

Common type of question: calculate probabilities or expected values for these distributions, given some value of their parameters.

For the binomial distribution with large n, the normal approximation is often used:
- i.e. Bin(n, π) is approximately N(nπ, nπ(1 − π)) (see slide 377)
- the table of the standard normal distribution is then used
- then remember to include the continuity correction (see slide 379).

Remember also results for sums of independent binomial and Poisson random variables: see slide 443.


The normal distribution

For a normal distribution $X \sim N(\mu, \sigma^2)$, it is important to remember (in addition to the pdf, mean $E(X) = \mu$ and variance $\mathrm{Var}(X) = \sigma^2$) that:

Linear combinations and sums of normally distributed random variables are also normally distributed (see slide 445).

In particular, the standardised variable

$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1).$$

With standardisation, calculations of probabilities for any normal distribution can be transformed into calculations for a standard normal [N(0, 1)] distribution.

The normal distribution tables that you have in the examination show values of $1 - \Phi(z) = P(Z > z)$ of the standard normal distribution.
- You should know how to do these calculations (see slides 365–376 for the rules and examples).


Example question

Question: In the construction of a certain assembly, four rods of length X1, X2, X3 and X4 are connected end-to-end to form a composite rod to span a gap of width Y. To function satisfactorily the length of the composite rod must exceed the size of the gap by not less than 0.10 cm. The lengths X1, X2, X3 and X4 are independently normally distributed with mean 4.0 cm and variance 0.015 cm². Y is also normally distributed with mean 15.94 cm and variance 0.024 cm², independently of the lengths of the rods.

Find the probability that the assembly is satisfactorily formed at the first attempt. Out of 10 independent composite rods, what is the probability that two and only two are satisfactory?


Example question

Solution: Let $C = X_1 + X_2 + X_3 + X_4$ be the length of the composite rod.

Then C is also normally distributed with mean 4 × 4.0 = 16.0 and variance 4 × 0.015 = 0.06.

Since $Y \sim N(15.94, 0.024)$ independently of C, the difference is also normally distributed with:

$$D = C - Y \sim N(16.0 - 15.94,\; 0.06 + 0.024) = N(0.06, 0.084).$$

The probability we need is:

$$P(D \ge 0.1) = P\left(\frac{D - 0.06}{\sqrt{0.084}} \ge \frac{0.1 - 0.06}{\sqrt{0.084}}\right) = P(Z \ge 0.14) = 1 - \Phi(0.14) = 0.4443$$

where $Z \sim N(0, 1)$.


Example question

Solution continued: For the second part of the question, let X now denote the number of satisfactory rods out of 10 independent rods.

Then $X \sim \mathrm{Bin}(10, 0.4443)$. The probability we need is

$$P(X = 2) = \binom{10}{2} \times (0.4443)^2 \times (1 - 0.4443)^8 = 45 \times (0.4443)^2 \times (1 - 0.4443)^8 = 0.081.$$
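Both parts of this example can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed tables, and rounds z to two decimal places to mimic a table look-up; the variable names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

# D = C - Y: means subtract, variances add (C and Y are independent).
mean_d = 4 * 4.0 - 15.94   # 0.06
var_d = 4 * 0.015 + 0.024  # 0.084

z = (0.10 - mean_d) / sqrt(var_d)     # about 0.138
z_table = round(z, 2)                 # 0.14, as read from a two-decimal table
p_ok = 1 - NormalDist().cdf(z_table)  # P(Z >= 0.14)

# Number of satisfactory rods out of 10 is Bin(10, p_ok).
n, k = 10, 2
p_two = comb(n, k) * p_ok**k * (1 - p_ok) ** (n - k)

print(round(p_ok, 4), round(p_two, 3))  # 0.4443 0.081
```

Without the table rounding, z ≈ 0.138 gives a slightly larger first probability (about 0.445); the small difference comes only from rounding z.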


Multivariate distributions

If random variables $X_1, X_2, \ldots, X_n$ are independent, the pf/pdf of their joint distribution is the product of their univariate marginal pfs/pdfs (see slides 426–430).

Key concepts for the general (possibly non-independent) case were introduced mainly in the context of a bivariate discrete random variable (X, Y):

- Marginal distributions (see slides 395–396):

$$p_X(x) = \sum_y p(x, y) \quad \text{and} \quad p_Y(y) = \sum_x p(x, y)$$

- Conditional distributions, for example (see slide 404):

$$p_{Y \mid X}(y \mid x) = P(Y = y \mid X = x) = \frac{P(X = x \text{ and } Y = y)}{P(X = x)} = \frac{p(x, y)}{p_X(x)}$$

Covariance and correlation: measures of association between any two random variables (see slides 415–419).


Example question

Question: The table below specifies the joint probability distribution of the random variables X and Y:

                      X
              −1      0       1      Total
Y     −1     0.05    0.15    0.10    0.30
       0     0.10    0.05    0.25    0.40
       1     0.10    0.05    0.15    0.30
Total        0.25    0.25    0.50    1

(a) Identify the marginal distribution of Y, and the conditional distribution of X | Y = 1.

(b) Evaluate the covariance of X and Y.

(c) Are X and Y independent?


Example question

Solution: (a) Conveniently, the summation to give the marginal distribution $p_Y(y)$ is already included in the table. So:

$$p_Y(-1) = 0.30, \quad p_Y(0) = 0.40, \quad p_Y(1) = 0.30$$

and $p_Y(y) = 0$ for all other y.

The conditional pf is:

$$p_{X \mid Y}(x \mid Y = 1) = p_{X,Y}(x, 1)/p_Y(1) = p_{X,Y}(x, 1)/0.30$$

i.e.

$$p_{X \mid Y}(-1 \mid Y = 1) = 0.10/0.30 = 0.33$$

$$p_{X \mid Y}(0 \mid Y = 1) = 0.05/0.30 = 0.17$$

$$p_{X \mid Y}(1 \mid Y = 1) = 0.15/0.30 = 0.50$$

and $p_{X \mid Y}(x \mid Y = 1) = 0$ for all other x.


Example question

Solution: (b) First we need the marginal expected values:

$$E(Y) = \sum_y y\,p_Y(y) = -1 \times 0.30 + 0 \times 0.40 + 1 \times 0.30 = 0$$

and E(X) = 0.25, similarly. We also need:

$$E(XY) = \sum_x \sum_y xy\,p_{X,Y}(x, y) = -1 \times (0.10 + 0.10) + 1 \times (0.05 + 0.15) + 0 = 0$$

so

$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0 - 0.25 \times 0 = 0.$$

(c) Even though the covariance is 0, X and Y are not independent. For example, $p_X(1)\,p_Y(0) = 0.20 \ne 0.25 = p_{X,Y}(1, 0)$. Therefore it is not the case that $p_{X,Y}(x, y) = p_X(x)\,p_Y(y)$ for all x, y.
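The calculations in parts (a) to (c) can be reproduced directly from the joint table. This is a sketch only: the dictionary layout keyed by `(x, y)` and all variable names are hypothetical, and floating-point comparisons use a small tolerance.

```python
from math import isclose

# Joint pf p(x, y) from the table, keyed by (x, y).
joint = {
    (-1, -1): 0.05, (0, -1): 0.15, (1, -1): 0.10,
    (-1, 0): 0.10, (0, 0): 0.05, (1, 0): 0.25,
    (-1, 1): 0.10, (0, 1): 0.05, (1, 1): 0.15,
}
values = (-1, 0, 1)

# (a) Marginals by summing over the other variable; conditional of X given Y = 1.
p_x = {x: sum(joint[(x, y)] for y in values) for x in values}
p_y = {y: sum(joint[(x, y)] for x in values) for y in values}
cond_x_given_y1 = {x: joint[(x, 1)] / p_y[1] for x in values}

# (b) Covariance via E(XY) - E(X)E(Y).
e_x = sum(x * p for x, p in p_x.items())
e_y = sum(y * p for y, p in p_y.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov = e_xy - e_x * e_y

# (c) Independence requires p(x, y) = pX(x) pY(y) for every cell.
independent = all(
    isclose(joint[(x, y)], p_x[x] * p_y[y], abs_tol=1e-12)
    for x in values for y in values
)

print(round(e_x, 2), round(abs(cov), 10), independent)  # 0.25 0.0 False
```

Zero covariance with `independent` equal to `False` illustrates that uncorrelated random variables need not be independent.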


Sampling distributions

Random sample from a distribution f(x; θ): Random variables $X_1, X_2, \ldots, X_n$ which are independent and each has the same distribution f(x; θ) (see slide 454)
- i.e. n independent and identically distributed (IID) random variables.

A statistic is a function of the variables in the sample which does not depend on unknown parameters (i.e. its value in a sample can be calculated when a sample is observed) (see slide 458).

A statistic is a random variable. Its distribution is the sampling distribution of the statistic (see slide 459).


Sampling distribution of the sample mean

Consider a random sample $X_1, \ldots, X_n$ from a distribution with mean $E(X_i) = \mu$ and variance $\mathrm{Var}(X_i) = \sigma^2$.

For the sampling distribution of $\bar{X} = \sum_{i=1}^{n} X_i / n$, the mean and variance are always $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$, respectively.

About the shape of the sampling distribution we know the following:

- If the $X_i$s are normally distributed:

$$\bar{X} \sim N(\mu, \sigma^2/n) \qquad (1)$$

- Even when the $X_i$ are not normally distributed, (1) holds approximately when n is large enough. This is the central limit theorem (CLT) (see slides 475–477).


Sampling distribution of the sample mean

A common application is when $X_i \sim \mathrm{Bin}(1, \pi)$.

Let $S = \sum_{i=1}^{n} X_i$, which is distributed as $S \sim \mathrm{Bin}(n, \pi)$.

Then $\bar{X} = S/n = \hat{\pi}$ is the sample proportion of observations with value $X_i = 1$.

The CLT then says that approximately:

$$\hat{\pi} \sim N(\pi, \pi(1 - \pi)/n).$$

This also implies that approximately:

$$S = n\hat{\pi} \sim N(n\pi, n\pi(1 - \pi)).$$


Example question

Question: A random sample of 100 individuals is telephoned and asked various questions. One of the questions was ‘Do you consider that in general the products for sale in AB Stores are of high quality?’. Suppose the percentage in the general population who believe that the products are of high quality is 30%. Let S denote the number of people in the sample who answer ‘Yes’ to the question. Evaluate P(S ≤ 25).

Solution: Here the 100 individual responses are a random sample from the distribution $X_i \sim \mathrm{Bin}(1, 0.3)$, so $S \sim \mathrm{Bin}(100, 0.3)$.
Let $Y \sim N(100 \times 0.3,\; 100 \times 0.3 \times 0.7) = N(30, 21)$.
Here it is useful to use a continuity correction in the calculation, so:

$$P(S \le 25) \approx P(Y \le 25.5) = P\left(\frac{Y - 30}{\sqrt{21}} \le \frac{25.5 - 30}{\sqrt{21}}\right) = P(Z \le -0.98) = 0.1635$$

(where $Z \sim N(0, 1)$), using the table of the standard normal distribution.
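To see how good the approximation is, the continuity-corrected normal probability can be compared with the exact binomial sum. This is a sketch using only the Python standard library; the variable names are hypothetical, and z is left unrounded here, so the normal value differs slightly from the table-based 0.1635.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi, s = 100, 0.3, 25

# Normal approximation N(n*pi, n*pi*(1 - pi)) with continuity correction: P(Y <= s + 0.5).
approx = NormalDist(mu=n * pi, sigma=sqrt(n * pi * (1 - pi))).cdf(s + 0.5)

# Exact binomial probability P(S <= s) by summing the probability function.
exact = sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(s + 1))

print(round(approx, 4), round(exact, 4))
```

The two printed values should agree to about three decimal places, which is typical for the continuity-corrected approximation at this sample size.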
