
M2S1: Probability and Statistics II
Problems

Professor David A. van Dyk
Statistics Section, Imperial College London
[email protected]
http://www2.imperial.ac.uk/~dvandyk

October 2013

(?) You may find starred problems more challenging.
(†) Problems marked with a (†) review material from M1S.

1 Introduction and Motivation

1.1 Course Administration and Syllabus

1.2 Randomized Controlled Experiments

1.2.1 Suppose you are interested in the effects of caffeine on the concentration of sleep-deprived undergraduates, in particular on their performance on exam-like tasks. You are planning an experiment and have identified 100 of your classmates who are willing to participate as subjects in the experiment. You plan to have a test that involves reading comprehension, mathematical skills, and analytical abilities. You plan to administer the test at 14.00 on a day when your 100 subjects have had at most 4 hours of sleep in the last 24 hours and have had no sleep since 7.00. Describe how you will design your experiment. Be sure to consider issues such as blinding, control and treatment groups, randomization, dose, and the placebo effect. You might want to read
http://news.bbc.co.uk/1/hi/health/1142492.stm and
http://en.wikipedia.org/wiki/Effect_of_psychoactive_drugs_on_animals
for background.
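[Computational note: the randomization step is usually implemented by permuting the subject list; a minimal Python sketch follows. The two-group caffeine/placebo design, the 50/50 split, and the seed are illustrative assumptions, not part of the problem.]

    import random

    # Hypothetical identifiers for the 100 subjects.
    subjects = [f"subject_{i:03d}" for i in range(100)]

    random.seed(1)            # fix the seed so the assignment is reproducible
    random.shuffle(subjects)  # a uniformly random permutation of the subjects

    # Assign half to caffeine (treatment) and half to a placebo (control).
    treatment = subjects[:50]
    control = subjects[50:]

    print(len(treatment), len(control))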

2 Probability Spaces

2.1 Definitions of Probability

2.1.1 Suppose your friend has a 50p coin and is going to flip it into the air. What probability would you assign that it will come up heads in each of the following situations:

(a) You have no information about the coin or its history of coming up heads when flipped by your friend.

(b) Your friend has already flipped this coin twice, and it came up heads both times.

(c) Your friend has already flipped this coin 10 times, and it came up heads each time.

(d) Your friend has already flipped this coin 1000 times, and it came up heads each time.

What type of probability are you using? [Hint: No calculations are required for this question.]

2.2 Basic Probability

2.2.1 (†) Describe the sample spaces for each of the following experiments:

(a) A coin is tossed five times.

(b) The number of people in the queue ahead of you when you arrive in the junior common room to purchase a decaf(!) skinny latte.

(c) Measure the life-time of a particular type of light bulb.

(d) Measure the time it takes a Piccadilly line train to go from South Kensington Station to Heathrow Terminal 4.

2.2.2 (†) Find formulas for each of the following events in terms of α = Pr(A), β = Pr(B), and γ = Pr(A ∩ B).


(a) either A or B or both.

(b) either A or B but not both.

(c) at least one of A or B.

(d) at most one of A or B.

2.2.3 (†) A couple plans to have three children. There are 8 possible arrangements of girls and boys. For example, GGB means the first two children are girls and the third child is a boy. All 8 arrangements are (approximately) equally likely.

(a) Write down all 8 arrangements of the sexes of the three children. What is the probability of any one of these arrangements?

(b) Let X be the number of girls the couple has. What is the probability that X = 2?

(c) Starting from your work in (a), find the distribution of X. That is, what values can X take, and what are the probabilities for each value?

(d) What named distribution does X follow? What are the mean and variance of X?

2.2.4 (†) If Pr(A) = 1/4 and Pr(Bc) = 1/5, can A and B be disjoint? Explain.

2.2.5 (†) Consider the probability space (S, B, Pr) with A, B ∈ B. Using only the Kolmogorov axioms prove

(a) Pr(A) ≤ 1,

(b) If A ⊂ B, then Pr(A) ≤ Pr(B), and

(c) Pr(A ∪B) = Pr(A) + Pr(B)− Pr(A ∩B).

2.2.6 Let Ω be a sample space.

(a) Show that the collection B = {∅, Ω} is a sigma algebra.

(b) Let B be the collection of all subsets of Ω, including Ω itself. Show B is a sigma algebra.

(c) Show that the intersection of two sigma algebras is a sigma algebra.

2.2.7 (?) A slightly more general definition of a probability function than what we consider in class allows a probability function to be defined on a field (rather than only on a sigma algebra). In particular, let F be a field composed of subsets of Ω and define a probability function on F as a function Pr such that (i) Pr(A) ≥ 0 for all A ∈ F, (ii) Pr(Ω) = 1, and (iii) Pr is countably additive: for a disjoint sequence Ak, k = 1, 2, . . ., we require Pr(∪_{k=1}^∞ Ak) = ∑_{k=1}^∞ Pr(Ak) only if ∪_{k=1}^∞ Ak ∈ F.

Suppose Bk ∈ F for k = 1, 2, . . . , and consider the following:

AXIOM: If B1 ⊃ B2 ⊃ B3 ⊃ . . . and ∩_{k=1}^∞ Bk = ∅, then Pr(Bk) ↓ 0 as k → ∞.

Now let Ω = {1, 2, . . .} and let F be the collection of finite and cofinite subsets of Ω. Show that the function

Pr(φ) = 0 if φ is finite, and Pr(φ) = 1 if φ is cofinite,

does not satisfy this AXIOM. Given what we have shown about this function in class, comment on why this result is not surprising.

2.2.8 (?) Let B be the set of countable and cocountable subsets of the real numbers, where a set is cocountable if its complement is countable.


(a) Show that B is a sigma algebra. [Hint: Is a countable set of countable sets countable?]

(b) Let P(φ) = 0 if φ is countable, and P(φ) = 1 if φ is cocountable. Is this function finitely additive? Is it countably additive?

2.3 Conditional Probability and Independence

2.3.1 (†) Suppose Pr(A) > 0 and Pr(B) > 0. Show that

(a) if A and B are disjoint they are not independent, and

(b) if A and B are independent they are not disjoint.

2.3.2 (†) About one in three human twins is identical (one egg) and two in three are fraternal (two eggs). Identical twins are always the same sex, with boys and girls being equally likely. About one quarter of fraternal twin pairs are both boys, another quarter are both girls, and the rest are one boy and one girl. In England and Wales about one in sixty-five births is a twin birth (http://www.multiplebirths.org.uk/media.asp). Let

A = {A birth in England/Wales results in twin boys}
B = {A birth in England/Wales results in fraternal twins}
C = {A birth in England/Wales results in twins}

(a) Describe the event A ∩B ∩ C in words.

(b) Find Pr(A ∩B ∩ C).

2.3.3 For events A and B in a sample space Ω, under what conditions does the following hold:

Pr(A) = Pr(A|B) + Pr(A|Bc)?

2.3.4 A biased coin is tossed repeatedly, with tosses mutually independent; the probability of the coin showing Heads on any toss is p. Let Hn be the event that an even number of Heads have been obtained after n tosses, let pn = Pr(Hn), and define p0 = 1. By conditioning on Hn−1 and using the LAW OF TOTAL PROBABILITY, show that, for n ≥ 1,

pn = (1 − 2p)pn−1 + p. (1)

Find a solution to this difference equation, valid for all n ≥ 0, of the form pn = A + Bλ^n, where A, B and λ are constants to be identified. Prove that if p < 1/2, then pn > 1/2 for all n ≥ 1, and find the limiting value of pn as n → ∞. Is this limit intuitively reasonable?
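[Computational note: a quick numerical check of equation (1), not a proof; the values of p and n below are arbitrary.]

    import random

    p, n = 0.3, 10

    # Iterate the difference equation p_k = (1 - 2p) p_{k-1} + p from p_0 = 1.
    pk = 1.0
    for _ in range(n):
        pk = (1 - 2 * p) * pk + p

    # Monte Carlo estimate of Pr(even number of Heads in n tosses).
    random.seed(1)
    trials = 100_000
    even = sum(
        sum(random.random() < p for _ in range(n)) % 2 == 0
        for _ in range(trials)
    )

    print(pk, even / trials)  # the two values should agree to about 2 decimals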

2.3.5 (?) A simple model for weather forecasting involves classifying days as either Fine or Wet, and then assuming that the weather on a given day will be the same as the weather on the preceding day with probability p, with 0 < p < 1. Suppose that the probability of fine weather on the day indexed 1 (say Jan 1st) is denoted by θ. Let θn denote the probability that the day indexed n is Fine. For n = 2, 3, . . ., find a difference equation for θn similar to that in equation (1) in Problem 2.3.4, and use this difference equation to find θn explicitly as a function of n, p and θ. Find the limiting value of θn as n → ∞.

2.3.6 (†) Consider two coins, of which one is a normal fair coin and the other is biased so that the probability of obtaining a Head is p > 1/2.


(a) Suppose p = 1 and a coin is selected at random and tossed n times, with tosses mutually independent. Evaluate the conditional probability that the selected coin is the normal one, given that the first n tosses are all Heads. [Hint: You will need to use the Binomial distribution from M1S.]

(b) Now suppose 1/2 < p < 1 and that again, one of the coins is selected randomly and tossed n times. Let E be the event that the n tosses result in k Heads and n − k Tails, and let F be the event that the coin is fair. Find Pr(F|E).

2.3.7 (†) A company is to introduce mandatory drug testing for its employees. The test used is very accurate, in that it gives a correct positive test (detects drugs when they are actually present in a blood sample) with probability 0.99, and a correct negative test (does not detect drugs when they are not present) with probability 0.98. If an individual tests positive on the first test, a second blood sample is tested. It is assumed that only 1 in 5000 employees actually does provide a blood sample with drugs present.

Calculate the probability that the presence of drugs in a blood sample is detected correctly, given

(a) a positive result on the first test (before the second test is carried out),

(b) a positive result on both first and second tests.

Assume that the results of tests are conditionally independent, that is, independent given the presence or absence of drugs in the sample.
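[Computational note: a sketch of the Bayes' theorem arithmetic for parts (a) and (b), using the conditional independence assumed above; carrying out the derivation by hand is the point of the problem.]

    # Test characteristics and prevalence from the problem statement.
    sens = 0.99        # Pr(positive test | drugs present)
    spec = 0.98        # Pr(negative test | drugs absent)
    prior = 1 / 5000   # Pr(drugs present)

    # (a) Posterior probability of drugs given one positive test.
    post1 = sens * prior / (sens * prior + (1 - spec) * (1 - prior))

    # (b) Two positive tests, assumed conditionally independent.
    post2 = sens**2 * prior / (sens**2 * prior + (1 - spec)**2 * (1 - prior))

    print(post1, post2)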

3 Univariate Random Variables and Probability Distributions

3.1 Probability Distribution, Density and Mass Functions

3.1.1 (†) Determine for which values of the constant c the following functions define valid probability mass functions for a discrete random variable X, taking values on the range X = {1, 2, 3, . . .}. For parts (a) and (d) the value of c will depend on λ. In this case, specify the range of λ resulting in a valid probability mass function.

(a) fX(x) = cλ^x, (b) fX(x) = c/(x 2^x),

(c) fX(x) = c/x^2, (d) fX(x) = cλ^x/x!.

In each case, calculate Pr(X > 1).

3.1.2 (†) Suppose n identical fair coins are tossed. Those that show Heads are tossed again, and the number of Heads obtained on the second set of tosses defines a discrete random variable X. Assuming that all tosses are independent, find the range and probability mass function of X.

3.1.3 (†) A continuous random variable X has pdf given by

fX(x) = c(1 − x)x^2, for 0 < x < 1,

and zero otherwise. Find the value of c, the cdf of X , FX , and Pr(X > 1/2).

3.1.4 (†) A function f is defined by

f(x) = k/x^{k+1}, for x > 1,

and zero otherwise. For what values of k is f a valid pdf? Find the cdf of X .


3.1.5 (†) A continuous random variable X has pdf given by

fX(x) = x, for 0 < x < 1,
fX(x) = 2 − x, for 1 ≤ x < 2,

and zero otherwise. Plot fX, find the cdf FX, and plot FX.

3.1.6 (†) Show that the function, FX, defined for x ∈ R by

FX(x) = c exp{−e^{−λx}}

is a valid cdf for a continuous random variable X for a specific choice of constant c, where λ > 0. Find the pdf fX associated with this cdf.

3.1.7 Evaluate

∫_0^∞ e^{−4x^2} dx.

[Hint: Relate the integrand to a well-known pdf.]
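[Computational note: a numerical sanity check, not a substitute for the pdf argument the hint asks for; it compares quadrature with the value implied by a normal pdf whose variance makes the kernels match.]

    import math
    from scipy.integrate import quad

    # Numerical value of the integral.
    value, _ = quad(lambda x: math.exp(-4 * x**2), 0, math.inf)

    # e^{-4x^2} is a normal kernel with variance sigma^2 = 1/8; the full-line
    # integral of that kernel is sqrt(2*pi*sigma^2), and by symmetry the
    # half-line integral is half of it.
    sigma2 = 1 / 8
    print(value, 0.5 * math.sqrt(2 * math.pi * sigma2))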

3.1.8 A point is to be selected randomly from an integer lattice restricted to the triangle with corners at (1, 1), (n, 1) and (n, n) for positive integer n. If all points are equally likely to be selected, find the probability mass functions for the two discrete random variables X and Y corresponding to the x- and y-coordinates of the selected point respectively.

3.1.9 Consider two random variables, X with cdf FX and Y with cdf FY. We say that Y is stochastically greater than X if FY(u) ≤ FX(u) for all u and FY(u) < FX(u) for some u. If Y is stochastically greater than X, prove that

Pr(Y > u) ≥ Pr(X > u) for every u

and

Pr(Y > u) > Pr(X > u) for some u.

Qualitatively compare the distributions of X and Y .

3.2 Transformations of Univariate Random Variables

3.2.1 (†) Suppose that X is a continuous random variable with density function given by

fX(x) = 4x^3, for 0 < x < 1,

and zero otherwise. Find the density functions of the following random variables:

(a) Y = X^4, (b) W = e^X, (c) Z = log X, (d) U = (X − 0.5)^2.

3.2.2 Again suppose that X is a continuous random variable with density function given by

fX(x) = 4x^3, for 0 < x < 1,

and zero otherwise. Find the monotonic decreasing function H such that the random variable V, defined by V = H(X), has a density function that is constant on the interval (0, 1), and zero otherwise.


3.2.3 The measured radius of a circle, R, is a continuous random variable with density function given by

fR(r) = 6r(1 − r), for 0 < r < 1,

and zero otherwise. Find the density functions of (a) the circumference and (b) the area of the circle.

3.2.4 Suppose that X is a continuous random variable with density function given by

fX(x) = (α/β)(1 + x/β)^{−(α+1)}, for x > 0,

and zero elsewhere, with α and β non-negative parameters.

(a) Find the density function and cdf of the random variable defined by Y = logX .

(b) Find the density function of the random variable defined by Z = ξ + θY .

3.3 Expected Values

3.3.1 (†) Consider the function fX(x) = cg(x) for some constant c > 0, with g defined by

g(x) = |x|/(1 + x^2)^2, for x ∈ R.

Show that fX(x) is a valid pdf for a continuous random variable X with range X = R, and find the cdf, FX, and the expected value of X, E(X), associated with this pdf.

3.3.2 Suppose that X is a continuous random variable with support R+. Its pdf is given by

fX(x) = α^2 x exp{−αx}, for x ≥ 0,

and is zero otherwise, with parameter α > 0.

(a) Find the cdf of X .

(b) Show that, for any positive value m, Pr(X ≥ m) = (1 + αm) exp{−αm}.

(c) Find E(X).

(d) Now consider the pdf

gX(x) = kx^2 exp{−αx}, for x ≥ 0, (2)

and zero elsewhere. Find the value of k for which (2) is a valid pdf.

3.3.3 The solid line in the left panel of Figure 1 is an unnormalized density function, fX(x). It does not integrate to one and its integral is unknown. It is superimposed on a standard normal pdf, φ(x) (dashed line). The ratio fX(x)/φ(x) is plotted in the right panel. Suppose you obtain a large random sample from the standard normal distribution, (X1, X2, . . . , Xn). How might you use this sample to approximate the expectation under the properly normalized version of fX(x)? Based on the plot in Figure 1, how well do you expect your method to work? Why?
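[Computational note: one method the problem gestures at is self-normalized importance sampling with φ as the proposal. Since fX is not given explicitly, the sketch below substitutes a hypothetical unnormalized density whose ratio to φ stays bounded, as in the right panel of Figure 1.]

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)

    def f_unnorm(x):
        # Hypothetical stand-in for the unnormalized fX(x) of Figure 1.
        return norm.pdf(x) * (1.0 + 0.2 * np.sin(x))

    # Draw from the standard normal and form the importance weights
    # w_i = fX(x_i) / phi(x_i); normalizing by their sum removes the
    # unknown normalizing constant of fX.
    x = rng.standard_normal(100_000)
    w = f_unnorm(x) / norm.pdf(x)

    # Approximate E(X) under the properly normalized version of fX.
    print(np.sum(w * x) / np.sum(w))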


[Figure 1: Plots for Problem 3.3.3. Left panel: the unnormalized density fX(x) (solid) and the standard normal pdf (dashed) against x. Right panel: the ratio fX(x)/φ(x) against x, ranging between about 0.6 and 1.2.]

3.4 Higher Moments

3.4.1 Suppose that X is a continuous random variable with pdf

fX(x) = (1/β) e^{−x/β}, for x > 0,

and zero elsewhere, with β a positive parameter. Let Y = √(2X/β). Find the pdf, mean, and variance of Y.

3.4.2 A continuous random variable X has cdf given by

FX(x) = c(αx^β − βx^α), for 0 ≤ x ≤ 1,

with FX(x) = 0 for x < 0 and FX(x) = 1 for x > 1, for constants 1 ≤ β < α. Find the value of the constant c, and evaluate the rth moment of X.

3.4.3 Let X be a random variable with kth non-central moment, αk = E(X^k), and kth central moment, µk = E[(X − α1)^k]. In addition to the mean and variance, two quantities sometimes used to summarize a distribution are the

skewness: ξ3 = µ3/(µ2)^{3/2} and kurtosis: ξ4 = µ4/(µ2)^2.

The skewness measures asymmetry in the pdf, while the kurtosis measures "peakedness". A pdf is said to be symmetric around a point m if fX(m − δ) = fX(m + δ) for every positive δ.

(a) Show that the skewness of a symmetric distribution is zero.

(b) Show µ3 = α3 − 3α2α1 + 2α1^3.

(c) Suppose X = 0 with probability 1 − p, and X = θ with probability p, where θ > 0. Find and plot the skewness of X as a function of p and θ.


(d) Calculate the skewness of fX(x) = e^{−x}, for x ≥ 0 and 0 elsewhere. This is a right-skewed distribution.

(e) Calculate the kurtosis for each of the following pdfs and comment on the peakedness of each.

(i) fX(x) = (1/√(2π)) e^{−x^2/2}, for x ∈ R

(ii) fX(x) = 1/2, for −1 < x < 1

(iii) fX(x) = (1/2) e^{−|x|}, for x ∈ R

3.4.4 (?) Let X be a continuous random variable with range X = R+, pdf fX and cdf FX .

(a) Show that

E(X) = ∫_0^∞ [1 − FX(x)] dx.

(b) Show also that for integer r ≥ 1,

E(X^r) = ∫_0^∞ r x^{r−1} [1 − FX(x)] dx.

(c) Find a similar expression for random variables for which X = R.

4 Univariate Families of Distributions

4.1 Standard Univariate Parametric Families

4.1.1 (†)

(a) At a certain London university, the average overall grade of students in their first year is 65 with a standard deviation of 15. Under a normal model, what proportion of students have grades over 80? Under 40? Find the sixtieth percentile of the overall grades of first year students. If there are 250 first year students, find the mean and standard deviation of the number who score above 80.

(b) About 75% of 20 year old women weigh between 103.5 and 148.5 lb. Using a normal model, and assuming that 103.5 and 148.5 are equidistant from the mean, µ, calculate σ.
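[Computational note: a sketch of the part (a) calculations using scipy; assuming grades are independent across students, the number scoring above 80 is Binomial(250, p).]

    from scipy.stats import binom, norm

    mu, sigma = 65, 15

    # Proportions over 80 and under 40 under the normal model.
    p80 = 1 - norm.cdf(80, mu, sigma)
    p40 = norm.cdf(40, mu, sigma)

    # Sixtieth percentile of the overall grades.
    q60 = norm.ppf(0.60, mu, sigma)

    # Mean and standard deviation of the number of the 250 students above 80.
    n = 250
    print(p80, p40, q60, binom.mean(n, p80), binom.std(n, p80))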

4.1.2 One reason cited for the mental deterioration so often seen in the very elderly is the reduction in cerebral blood flow that accompanies the aging process. Addressing itself to this notion, a study was done (Ball and Taylor, 1969) to see whether cyclandelate, a vasodilator, might be able to stimulate the cerebral circulation and thereby slow the rate of mental deterioration. Blood circulation time can be measured using a radioactive tracer. Let X and Y be the mean blood circulation time before treatment and after treatment respectively, for a randomly selected elderly patient.

(a) Let D = 0 if Y < X, and D = 1 if Y > X. What is the distribution of D?


(b) Consider the skeptical hypothesis that cyclandelate has no effect on mean circulation time and any differences that are observed before and after treatment are due to chance. What is the distribution of D under this hypothesis?

(c) Now suppose we select a random sample of n patients, and let ∆ = ∑_{i=1}^n Di. What is the distribution of ∆ under the skeptical hypothesis?

(d) The drug was given to eleven subjects and blood flow was measured before and after treatment as described above. The data appear below. What is the observed value, δ, of the random variable ∆? How likely is it that we would see a value as extreme or more extreme under the skeptical hypothesis? What do you conclude about the skeptical hypothesis?

CEREBRAL CIRCULATION EXPERIMENT
Mean Circulation Time (seconds)

Before, xi    After, yi
15            13
12            8
12            12.5
14            12
13            12
13            12.5
13            12.5
12            14
12.5          12
12            11
12.5          10
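[Computational note: a sketch of the part (d) calculation, assuming the Binomial(n, 1/2) null distribution that parts (a)-(c) ask you to justify; a one-sided tail probability is shown.]

    from scipy.stats import binom

    before = [15, 12, 12, 14, 13, 13, 13, 12, 12.5, 12, 12.5]
    after = [13, 8, 12.5, 12, 12, 12.5, 12.5, 14, 12, 11, 10]

    # Observed delta = number of subjects whose circulation time increased.
    delta = sum(y > x for x, y in zip(before, after))
    n = len(before)

    # Under the skeptical hypothesis Delta ~ Binomial(n, 1/2); a small tail
    # probability casts doubt on that hypothesis.
    print(delta, binom.cdf(delta, n, 0.5))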

4.1.3 The drug enforcement team in a large city seized a stash of 496 small packets containing what appeared to be illegal narcotics. They were packaged for resale on the street. Four of the packets were randomly sampled, tested, and found to contain prohibited substances. Undercover police officers took two more of the (untested) packets and sold them to a defendant later accused of purchasing illegal drugs. Unfortunately, these last two packets were lost before they could be tested for narcotics. [This question is based on actual events as described in Shuster (1991) and reported in Casella and Berger (2002).]

(a) Let N be the number of the original 496 packets that contained prohibited substances and let M = 496 − N be the number that did not. Compute the probability that the first four randomly selected packets contained narcotics and the next two randomly selected packets did not. (You should report your answer as a function of N and M.)

(b) Maximize the probability that you found in part (a) as a function of N and M = 496 − N. This is the defendant's maximum probability of innocence.
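[Computational note: a brute-force sketch for part (b); the prob function encodes one natural answer to part (a), the ordered sampling-without-replacement product.]

    from math import perm

    # Probability that the first 4 packets sampled contain narcotics and the
    # next 2 do not, when N of the 496 packets contain narcotics (M = 496 - N).
    def prob(N):
        M = 496 - N
        return perm(N, 4) * perm(M, 2) / perm(496, 6)

    # Brute-force maximization over feasible N (need N >= 4 and M >= 2).
    best = max(range(4, 495), key=prob)
    print(best, prob(best))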

4.1.4 Show that the binomial distribution converges to the Poisson distribution as n → ∞ with λ = pn held fixed. That is,

lim_{n→∞} (n choose x) p^x (1 − p)^{n−x} = e^{−λ} λ^x / x!,

where p = λ/n.

4.2 Classes of Parametric Families

4.2.1 Consider the Poisson distribution with expectation λ.


(a) Show that the Poisson distribution belongs to the exponential family. What is the canonical parameterization?

(b) Now suppose λ = exp{∑_{j=1}^J xj βj}, where (x1, . . . , xJ) are known predictor variables, and (β1, . . . , βJ) are unknown parameters. (Here β replaces λ as the unknown parameter.) Show that this new distribution is also a member of the exponential family.

4.2.2 Consider the probability density function fX(x) = e^{−x}, for x > 0.

(a) Construct a location-scale family fX(x|γ, β) from fX(x), where γ is the location parameter and β is the scale parameter.

(b) Derive the non-central moments, αn = E(Y^n), where Y follows the location-scale family.

5 Multivariate Random Variables and Probability Distributions

5.1 Multivariate Random Variables

5.1.1 Suppose that X and Y are discrete random variables with joint mass function given by

fX,Y(x, y) = c 2^{x+y}/(x! y!), for x, y = 0, 1, 2, . . . ,

and zero otherwise, for some constant c.

(a) Find the value of c, and the marginal mass functions of X and Y.

(b) Prove that X and Y are independent random variables, that is,

fX,Y(x, y) = fX(x) fY(y), for all x, y = 0, 1, . . . .

5.1.2 Continuous random variables X and Y have joint cdf FX,Y defined by

FX,Y(x, y) = (1 − e^{−x})(1/2 + (1/π) tan^{−1} y), for x > 0 and −∞ < y < ∞,

with FX,Y(x, y) = 0, for x ≤ 0. Find the joint pdf, fX,Y. Are X and Y independent? Justify your answer.

5.1.3 Suppose that X and Y are continuous random variables with joint pdf given by

fX,Y (x, y) = cx(1− y), for 0 < x < 1 and 0 < y < 1,

and zero otherwise, for some constant c.

(a) Are X and Y independent random variables?

(b) Find the value of c.

(c) Find Pr(X < Y).

5.1.4 Suppose that the joint pdf of X and Y is given by

fX,Y (x, y) = 24xy, for x > 0, y > 0, and x+ y < 1,

and zero otherwise. Find (a) the marginal pdf of X, fX, (b) the marginal pdf of Y, fY, (c) the conditional pdf of X given Y = y, fX|Y, (d) the conditional pdf of Y given X = x, fY|X, (e) the expected value of X, and (f) the expected value of Y.

[Hint: Sketch the region on which the joint density is non-zero; remember that the integrand is only non-zero for some part of the integral range.]


5.1.5 Suppose that X and Y are continuous random variables with joint pdf given by

fX,Y(x, y) = 1/(2x^2 y), for 1 ≤ x < ∞ and 1/x ≤ y ≤ x,

and zero otherwise.

(a) Find the marginal pdf of X .

(b) Find the marginal pdf of Y .

(c) Find the conditional pdf of X given Y = y.

(d) Find the conditional pdf of Y given X = x.

(e) Find the expectation of Y , E(Y ).

5.1.6 Suppose X is a random variable with pdf proportional to the normal pdf (mean µ and variance σ^2) for positive x and zero elsewhere,

fX(x) = (k/√(2πσ^2)) exp{−(x − µ)^2/(2σ^2)}, for x ≥ 0,

and fX(x) = 0 elsewhere. Derive the mean and variance of X.

5.1.7 A critical component in an experimental rocket can withstand temperatures up to t0 degrees Centigrade. If the maximum temperature, T, of this component exceeds t0, the rocket will fail. Preliminary tests indicate that there is some variability in the maximum temperature that the component is likely to reach when the rocket is launched; the pdf of this temperature is given by fT(t). Engineers are anxious about an upcoming test launch because, although ∫_{t0}^∞ fT(t) dt is near zero, it is greater than zero. There are sensors in the rocket that will record T, but it will take some time to recover the rocket and analyze the data from the sensors. (This, of course, is assuming that the rocket does not fail.) Suppose that the test launch goes smoothly and the rocket does not fail.

(a) Carefully derive the pdf of the maximum temperature of the critical component after the engineers observe the successful launch of the rocket, but before they are able to analyze the sensor data.

(b) Verify that the pdf you gave in part (a) is a valid pdf.

5.2 Multivariate Transformations

5.2.1 Suppose Xi ∼ Gamma(αi, β) independently for i = 1, . . . , n.

(a) Use the convolution theorem to show that Y = X1 + X2 ∼ Gamma(α1 + α2, β).

(b) Prove that Zn = ∑_{i=1}^n Xi ∼ Gamma(∑_{i=1}^n αi, β).

5.2.2 (?) Suppose that X and Y have joint pdf that is constant on the range X(2) ≡ (0, 1) × (0, 1), and zero otherwise. Find the marginal pdf of the random variables U = X/Y and V = − log(XY), stating clearly the range of the transformed random variable in each case.

[Hint: For U, you might consider first the joint pdf of (U, X), then obtain the marginal pdf of U. For V, consider the joint pdf of (V, − log X), then obtain the marginal pdf of V. Compare the ease of these calculations with those required by the joint transformation from (X, Y) to (U, V).]


5.2.3 (?) Suppose that continuous random variables X1, X2, X3 are independent, and have marginal pdfs specified by

fXi(xi) = ci xi^i e^{−xi}, for xi > 0,

for i = 1, 2, 3, where c1, c2 and c3 are normalizing constants. Find the joint pdf of the random variables Y1, Y2, Y3 defined by

Y1 = X1/(X1 +X2 +X3), Y2 = X2/(X1 +X2 +X3), Y3 = X1 +X2 +X3,

and evaluate the (marginal) expectation of Y1.

5.2.4 Suppose that X and Y are continuous random variables with pdf given by

fX,Y(x, y) = (1/(2π)) exp{−(x^2 + y^2)/2}, for x, y ∈ R.

(a) Let the random variable U be defined by U = X/Y. Find the pdf of U.

(b) Suppose now that S ∼ χ²_ν is independent of X and Y. (The pdf of S is given by

fS(s) = c(ν) s^{ν/2−1} e^{−s/2}, for s > 0,

where ν is a positive integer and c(ν) is a normalizing constant depending on ν.) Find the pdf of the random variable T defined by

T = X/√(S/ν).

This is the pdf of a t random variable with ν degrees of freedom.

5.2.5 Suppose (X1, . . . , Xn) is a collection of independent and identically distributed random variables taking values on X with pmf/pdf fX and cdf FX. Let Yn and Zn correspond to the maximum and minimum order statistics derived from (X1, . . . , Xn), that is

Yn = max{X1, . . . , Xn}, Zn = min{X1, . . . , Xn}.

(a) Show that the cdfs of Yn and Zn are given by

FYn(y) = [FX(y)]^n, FZn(z) = 1 − [1 − FX(z)]^n.

(b) Suppose X1, . . . , Xn ∼ Unif(0, 1), that is,

FX(x) = x, for 0 ≤ x ≤ 1.

Find the cdfs of Yn and Zn.

(c) Suppose X1, . . . , Xn have cdf

FX(x) = 1 − x^{−1}, for x ≥ 1.

Find the cdfs of Zn and Un = Zn^n.

(d) Suppose X1, . . . , Xn have cdf

FX(x) = 1/(1 + e^{−x}), for x ∈ R.

Find the cdfs of Yn and Un = Yn − log n.

(e) Suppose X1, . . . , Xn have cdf

FX(x) = 1 − 1/(1 + λx), for x > 0.

Find the cdfs of Yn, Zn, Un = Yn/n, and Vn = nZn.
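[Computational note: a Monte Carlo check of the part (a) formulas in the Unif(0, 1) setting of part (b); the sample size and evaluation points are arbitrary.]

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 5, 200_000

    # Empirical cdf of the maximum at y versus F_{Yn}(y) = [FX(y)]^n = y^n.
    y = 0.7
    ymax = rng.random((reps, n)).max(axis=1)
    print((ymax <= y).mean(), y**n)

    # Empirical cdf of the minimum at z versus 1 - [1 - FX(z)]^n.
    z = 0.2
    zmin = rng.random((reps, n)).min(axis=1)
    print((zmin <= z).mean(), 1 - (1 - z)**n)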


5.3 Covariance and Correlation

5.3.1 Suppose X and Y are two random variables each with finite mean and variance. Prove −1 ≤ ρXY ≤ 1 by using the fact that

Var(X/σX + Y/σY) and Var(X/σX − Y/σY)

are both positive quantities.

5.3.2 Suppose that X and Y have joint pdf given by

fX,Y (x, y) = cxy(1− x− y), for 0 < x < 1, 0 < y < 1, and 0 < x+ y < 1,

for some constant c > 0. Find the covariance of X and Y .

5.4 Hierarchical and Mixture Models

5.4.1 Suppose X|Y ∼ Exponential(Y) and Y ∼ Gamma(α, β), using the parameterization on the formula sheet.

(a) Find the mean and variance of X .

(b) Find the marginal distribution of X .

5.4.2 The number of daughters of an organism is a discrete random variable with mean µ and variance σ^2. Each of its daughters reproduces in the same manner. Find the expectation and variance of the number of granddaughters.

5.4.3 Suppose that the joint pdf of random variables X and Y is specified via the conditional density fX|Y and the marginal density fY as

fX|Y(x|y) = √(y/(2π)) exp{−y x^2/2}, for x ∈ R;   fY(y) = c(ν) y^{ν/2−1} e^{−νy/2}, for y > 0,

where ν is a positive integer. Find the marginal pdf of X.

6 Multivariate Families of Distributions

6.1 Multinomial and Dirichlet Distributions

6.1.1 Suppose N ∼ Poisson(λ) and X|N ∼ Multinomial(N, p), where N is univariate, X = (X1, . . . , Xk)^⊤ is a (k × 1) random vector, and p = (p1, . . . , pk)^⊤ is a (k × 1) probability vector.

(a) Write the joint pmf of N and X.

(b) (?) Rearrange the terms in the joint pmf and its support to show that the Xi are independent with Xi ∼ Poisson(λpi) and N = ∑_{i=1}^k Xi.


6.2 Multivariate Normal Distribution

6.2.1 (?) The Bivariate Normal Distribution: Suppose that X1 and X2 are independent and identically distributed N(0, 1) random variables. Let random variables Y1 and Y2 be defined by

Y1 = µ1 + σ1√(1 − ρ^2) X1 + σ1ρ X2 and Y2 = µ2 + σ2 X2,

or, equivalently,

(Y1, Y2)^⊤ = (µ1, µ2)^⊤ + ( σ1√(1 − ρ^2), σ1ρ; 0, σ2 ) (X1, X2)^⊤,

for positive constants σ1 and σ2, and |ρ| < 1.

(a) Find the joint pdf of (Y1, Y2).

(b) Show that, marginally for i = 1, 2, Yi ∼ N(µi, σi^2), and that conditionally

Y1|Y2 = y2 ∼ N[µ1 + ρ(σ1/σ2)(y2 − µ2), σ1^2(1 − ρ^2)],
Y2|Y1 = y1 ∼ N[µ2 + ρ(σ2/σ1)(y1 − µ1), σ2^2(1 − ρ^2)].

(c) Find the correlation between Y1 and Y2.

6.2.2 Suppose

(X1, X2)^⊤ ∼ N2[ µ = (2, −5)^⊤, Σ = ( 1, −0.5; −0.5, 4 ) ].

Compute Pr(X1 > 0) and Pr(X2 < −6).

[Hint: You may use the result of Problem 6.2.1.]
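[Computational note: a numerical check using the marginal distributions implied by Problem 6.2.1(b), X1 ∼ N(2, 1) and X2 ∼ N(−5, 4); the covariance does not enter these univariate probabilities.]

    from math import sqrt
    from scipy.stats import norm

    # Marginals: X1 ~ N(2, 1) and X2 ~ N(-5, 4) (variance 4, so scale 2).
    p1 = 1 - norm.cdf(0, loc=2, scale=1)
    p2 = norm.cdf(-6, loc=-5, scale=sqrt(4))
    print(p1, p2)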

6.2.3 The joint pdf of the random variables X1 and X2 is

fX1,X2(x1, x2) = k exp{−(x1^2/6 − x1x2/3 + 2x2^2/3)}, for −∞ < x1, x2 < ∞.

Find E(X1), E(X2), Var(X1), Var(X2), Cov(X1, X2) and k.

[Hint: You may use the result of Problem 6.2.1.]

6.2.4 (?) [Warning: If you have an aversion to vector notation, you may find this question challenging!] Suppose Y and X = (X1, X2)^⊤ jointly follow a trivariate normal distribution. Here Y is a univariate random variable and Z = (Y, X1, X2)^⊤ is a (3 × 1) trivariate normal random vector with mean

µ = (µY, µX^⊤)^⊤ and variance-covariance matrix M^{−1} = ( mYY, MYX; MYX^⊤, MXX )^{−1},

where µY is the univariate mean of Y, µX is the (2 × 1) mean vector of X, µ is the (3 × 1) mean vector of both X and Y, mYY is the first diagonal element of M, MXX is the lower-right (2 × 2) submatrix of M, and MYX is the remaining off-diagonal (1 × 2) submatrix of M. (Note that we parameterize the multivariate normal in terms of the inverse of its variance-covariance matrix. This will significantly simplify calculations!)

(a) Derive the conditional distribution of Y given both X1 and X2. [Hint: Use vector/matrixnotation.]


(b) Now suppose Y and X = (X1, . . . , Xn)^⊤ jointly follow a multivariate normal distribution. Here Y remains a univariate random variable and Z = (Y, X1, . . . , Xn)^⊤ is an [(n + 1) × 1] multivariate normal random vector. Use the same notation for the mean and the inverse of the variance-covariance matrix, but with appropriately adjusted dimensions. Derive the conditional distribution of Y given X1, . . . , Xn. [Hint: If you used vector/matrix notation in part (a), this problem will be very easy. If you did not, it will be very hard!]

(c) Set n = 1 and check that your answer is the same as the conditional distribution for the bivariate normal derived in lecture and in Problem 6.2.1.

6.3 Connections between the Distributions

6.3.1 Suppose that U1 and U2 are independent and identically distributed Unif(0, 1) random variables. Let random variables Z1 and Z2 be defined by

Z1 = √(−2 log U1) cos(2πU2),

Z2 = √(−2 log U1) sin(2πU2).

Find the joint pdf of (Z1, Z2).
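[Computational note: a minimal implementation of the transformation with a crude moment check; deriving the joint pdf is the exercise.]

    import math
    import random

    random.seed(1)

    def box_muller():
        # Transform two Unif(0,1) draws into the pair (Z1, Z2).
        u1, u2 = random.random(), random.random()
        r = math.sqrt(-2 * math.log(u1))
        return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

    zs = [z for _ in range(50_000) for z in box_muller()]
    mean = sum(zs) / len(zs)
    var = sum((z - mean) ** 2 for z in zs) / len(zs)
    print(mean, var)  # near 0 and 1 if the Zi are standard normal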

6.3.2 Suppose that U is a Unif(0, 1) random variable. Find the distribution of

X = −β logU.

6.3.3 Suppose that an unlimited sequence of Unif(0, 1) random variables is available. Using the results of Problems 6.3.1 and 6.3.2, and results discussed earlier this term, describe how to generate:

(a) a Gamma(k, λ) random variable, for integer k ≥ 1;

(b) a realization of a Poisson process with rate µ;

(c) a χ²ν ≡ Gamma(ν/2, 1/2) random variable, where ν is a positive integer parameter;

(d) a tn random variable, where n is a positive integer parameter.
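[Computational note: a sketch for part (a), combining Problem 6.3.2 with the convolution result of Problem 5.2.1. It assumes λ is a rate parameter, so each summand uses β = 1/λ in the notation of Problem 6.3.2; adjust if the formula sheet takes λ to be a scale.]

    import math
    import random

    random.seed(1)

    def gamma_rv(k, lam):
        # Sum of k independent Exponential(rate lam) variables, each obtained
        # from a Unif(0,1) draw by the inverse transform of Problem 6.3.2.
        return sum(-math.log(random.random()) / lam for _ in range(k))

    draws = [gamma_rv(3, 2.0) for _ in range(100_000)]
    print(sum(draws) / len(draws))  # should be near k / lam = 1.5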

7 Sampling Distributions and Statistical Inference

7.1 Background

7.2 Statistics and Their Sampling Distributions

7.2.1 Suppose that (X1, . . . , Xn) is a random sample from a Poisson(λ) distribution. Define the statistics

T1 = X̄ = (1/n) ∑_{i=1}^n Xi and T2 = S^2 = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄)^2.

Show that E(T1) = E(T2) = λ.

7.2.2 Suppose that (X1, . . . , Xn) is a random sample from the probability distribution with pdf

fX(x; θ) = (1/θ) e^{−x/θ}, for x > 0.


(a) Show that the sample mean X̄ is an unbiased estimator of θ.

(b) Set Y1 = min{X1, . . . , Xn} and show that Z = nY1 is also unbiased for θ.

7.2.3 Suppose that (X1, . . . , Xn) is a random sample from the uniform distribution on (θ − 1, θ + 1).

(a) Show that the sample mean X̄ is an unbiased estimator of θ.

(b) Let Y1 and Yn be the smallest and largest order statistics derived from (X1, . . . , Xn). Show also that the random variable M = (Y1 + Yn)/2 is an unbiased estimator of θ.

7.3 The Method of Moments

7.3.1 Method of moments.

(a) Suppose (X1, . . . , Xn) is a random sample from a gamma distribution, having pdf

fX(x) = (1/(β^α Γ(α))) x^{α−1} exp{−x/β}, for x > 0,

where α, β > 0. Find the method of moments estimators of α and β. Can the corresponding estimates ever be outside the parameter space?

(b) Suppose (X1, . . . , Xn) is a random sample from a beta distribution, having pdf

fX(x) = (1/B(α, β)) x^{α−1}(1 − x)^{β−1}, for 0 < x < 1,

where α, β > 0. Find the method of moments estimators of α and β. Can the corresponding estimates ever be outside the parameter space?
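[Computational note: a numerical illustration of the method of moments for part (a), solving the moment equations with a generic root-finder rather than in closed form; the synthetic data and starting values are arbitrary.]

    import numpy as np
    from scipy.optimize import fsolve

    rng = np.random.default_rng(1)
    x = rng.gamma(shape=2.0, scale=3.0, size=10_000)  # data with alpha=2, beta=3

    # Match E(X) = alpha*beta and E(X^2) = alpha*(alpha + 1)*beta^2 to the
    # first two sample moments; a closed form exists, but is left to you.
    def equations(params):
        a, b = params
        return [a * b - x.mean(), a * (a + 1) * b**2 - (x**2).mean()]

    print(fsolve(equations, [1.0, 1.0]))  # should be near (2, 3)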

7.4 Maximum Likelihood Estimation

7.4.1 [Problem 4.2.2 continued] Consider the probability density function fX(x) = e^{−x} for x > 0. In Problem 4.2.2(a) you derived a location-scale family fX(x|γ, β) from fX(x), where γ is the location parameter and β is the scale parameter. Now suppose β = 1. Report the loglikelihood function for γ and compute its maximum likelihood estimator, γ̂.

7.4.2 [Problem 6.1.1 continued] Suppose N ∼ Poisson(λ) and X|N ∼ Multinomial(N, p), where N is univariate, X = (X1, . . . , Xk)^⊤ is a (k × 1) random vector, and p = (p1, . . . , pk)^⊤ is a (k × 1) probability vector. In Problem 6.1.1, you showed that the Xi are independent with

Xi ∼ Poisson(λpi) and N = ∑_{i=1}^k Xi. (3)

(a) Let ξ = λp. Note that ξ = (ξ1, . . . , ξk) is a (k × 1) vector. Using (3), find the maximum likelihood estimator, ξ̂, of ξ.

(b) Using ξ̂ derived in part (a), derive formulas for λ̂ and p̂ that satisfy ξ̂ = λ̂p̂, such that p̂ is a probability vector. You do not need to show it, but maximum likelihood estimators are invariant to transformations, so that λ̂ and p̂ are the maximum likelihood estimators of λ and p, respectively.

7.4.3 Suppose that (X1, . . . , Xn) is a random sample from a Poisson(λ) distribution.

(a) Find the maximum likelihood estimator of λ and show that this estimator is unbiased.


(b) Find the maximum likelihood estimator of τ(λ) = e^{−λ} = Pr(X = 0).

7.4.4 Find the maximum likelihood estimators of the unknown parameters in the following probability densities on the basis of a random sample of size n.

(a) fX(x; θ) = θx^{θ−1}, for 0 < x < 1 and θ > 0.

(b) fX(x; θ) = (θ + 1)x^{−θ−2}, for 1 < x and θ > 0.

(c) fX(x; θ) = θ^2 x exp{−θx}, for 0 < x and θ > 0.

(d) fX(x; θ) = 2θ^2 x^{−3}, for θ ≤ x and θ > 0.

(e) fX(x; θ1, θ2) = θ1 θ2^{θ1} x^{−θ1−1}, for θ2 ≤ x and θ1, θ2 > 0.

7.5 Random Intervals and Confidence Intervals

7.5.1 Suppose you observe a single observation from a normal distribution with unit variance and unknown mean, µ. Specifically, X ∼ N(µ, 1).

(a) Report the loglikelihood function for µ and compute its maximum likelihood estimator, µ̂.

(b) What is the distribution of µ̂?

(c) Derive an interval I(µ) that has a 95% chance of containing µ̂. For definiteness, choose the shortest possible interval with this property. Make a plot of your interval, with µ plotted on the horizontal axis, and the lower and upper bounds of I(µ) plotted on the vertical axis.

(d) Now consider the interval J(µ̂) = {µ : µ̂ ∈ I(µ)}. Identify this interval on your plot from your answer to part (c). Give formulas for the lower and upper bounds of J(µ̂). Notice that J can be computed from data, whereas I cannot. (I depends on the unknown mean, µ, while J depends only on the maximum likelihood estimator.)

(e) Show that Pr[µ ∈ J(µ̂)] = 95%. [Hint: What is the random quantity in this expression?] An interval with this property is called a 95% confidence interval.
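[Computational note: a coverage simulation for part (e), assuming the familiar interval J(µ̂) = (µ̂ − 1.96, µ̂ + 1.96) to which parts (c) and (d) lead; the true µ and the replication count are arbitrary.]

    import numpy as np

    rng = np.random.default_rng(1)
    mu, reps = 3.7, 200_000

    # With a single observation X ~ N(mu, 1), the MLE is X itself.
    mu_hat = rng.normal(mu, 1.0, size=reps)
    covered = (mu_hat - 1.96 < mu) & (mu < mu_hat + 1.96)
    print(covered.mean())  # should be close to 0.95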

7.5.2 Suppose you observe a binomial random variable, X ∼ Bin(n, p), where n is known, but p is not.

(a) Report the loglikelihood function for p and compute its maximum likelihood estimator, p̂.

(b) What is the distribution of p̂?

(c) Suppose n = 10 and, for p on the grid of values (0, 0.1, 0.2, 0.3, . . . , 1.0), derive an interval I(p) that has at least a 95% chance of containing p̂. For definiteness, choose the shortest possible interval with this property. Make a plot of your interval, interpolating linearly between grid points, with p plotted on the horizontal axis, and the lower and upper bounds of I(p) plotted on the vertical axis.

(d) Now consider the interval J(p̂) = {p : p̂ ∈ I(p)}. Identify this interval on your plot from your answer to part (c). Compute J(p̂) for each possible value of p̂.

(e) Show that Pr[p ∈ J(p̂)] ≥ 95%, at least for p ∈ (0, 0.1, 0.2, 0.3, . . . , 1.0).

(f) Qualitatively, how will the intervals change as n increases?


7.5.3 Suppose that (X1, . . . , Xn) is a random sample from the probability distribution with pdf

fX(x; θ) = θe^{−θx}, for x > 0.

(a) Find the maximum likelihood estimator of θ and show that it is biased as an estimator of θ, but that some multiple of it is not.

(b) Show that 2θ ∑_{i=1}^n Xi is a pivotal quantity. Describe briefly how to use this to construct a 100(1 − α)% confidence interval for θ, α ∈ (0, 1).

7.5.4 Let (X1, . . . , Xn) be a random sample from the uniform distribution on (0, θ).

(a) Find the maximum likelihood estimator, θ̂, of θ.

(b) By considering the distribution of θ̂/θ, show that for α ∈ (0, 1), a 100(1 − α)% confidence interval for θ based on θ̂ is given by (θ̂, θ̂/α^{1/n}).

7.5.5 Suppose X1 ∼ N(θ1, 1) and X2 ∼ N(θ2, 1), with X1 and X2 independent and θ1 and θ2 both unknown. Show that both the square S and circle C given by

S = {(θ1, θ2) : |X1 − θ1| ≤ 2.236, |X2 − θ2| ≤ 2.236} and
C = {(θ1, θ2) : (X1 − θ1)^2 + (X2 − θ2)^2 ≤ 5.991}

are 95% confidence sets for (θ1, θ2). What is a sensible criterion for choosing between S and C?

7.5.6 The following data are sampled from a N(µ, σ^2) distribution, where both µ and σ^2 are unknown: 6.82, 6.07, 3.74, 6.87, 5.92. For these data, ∑ xi = 29.42 and ∑ xi^2 = 179.588.

(a) Find a 95% confidence interval for µ and show that its width is about 3.16.

(b) Suppose it becomes known that the true value of σ^2 is 1. Show that a 95% confidence interval for µ now has width about 1.75. The confidence interval is narrower when the true value of σ^2 is known. Will this always happen?

(c) (?) Consider the event that the 95% confidence interval for µ is narrower when σ^2 is known than when it is unknown, with both intervals computed from the same random sample (X1, . . . , Xn) from N(µ, σ^2). Show that for n = 5 this event has probability a bit less than 0.75. [Hint: you will need to refer to tables of the quantiles of the χ²_4 distribution.]

7.6 Bayesian Statistical Inference

7.6.1 Suppose Y|Λ = λ ∼ Poisson(λ) and we wish to estimate λ using Bayesian methods, specifying a gamma prior distribution, Λ ∼ Gamma(r, β). Use the parameterization of the gamma distribution on the formula sheet and assume that r is a positive integer.

(a) Derive the posterior distribution of Λ given Y . What named distribution is this?

(b) What are the posterior mean and variance of Λ?

(c) Find the marginal distribution of Y . What named distribution is this?

(d) Compute the maximum likelihood estimator, λ̂, of λ. How does the maximum likelihood estimate compare with the posterior mean? [Hint: Ignore the prior distribution when computing the maximum likelihood estimate/estimator!]
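[Computational note: a Monte Carlo approximation of the posterior in part (a), assuming the shape-scale parameterization Gamma(r, β) with prior mean rβ; compare the printed moments with your answers to part (b).]

    import numpy as np

    rng = np.random.default_rng(1)
    r, beta, y = 3, 2.0, 4

    lam = rng.gamma(shape=r, scale=beta, size=1_000_000)  # draws from the prior
    ys = rng.poisson(lam)                                 # Y | Lambda = lam

    # Keeping the prior draws for which Y equals the observed y approximates
    # the posterior distribution of Lambda.
    post = lam[ys == y]
    print(post.mean(), post.var())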

7.6.2 Suppose Y |Θ ∼ Binomial(n,Θ) and show:

(a) If Θ ∼ Unif(0, 1), then Var(Θ|Y ) < Var(Θ).


(b) If Θ ∼ Beta(α, β), Var(Θ|Y) may be larger than Var(Θ).

7.6.3 Suppose you are given the choice between two envelopes, one containing θ pounds and the other with 2θ pounds. The envelopes are sealed and shuffled so that you do not know which one contains more money. One of the envelopes is opened and found to have x pounds. You can either take the x pounds or take the other envelope and the money that it contains. You are not, however, allowed to open the second envelope before you make your decision.

(a) Suppose that your subjective pdf for θ is uniform on the interval (0, M) for some positive M. What is your optimal strategy given x?

(b) Now suppose that your subjective pmf for θ is fΘ(θ) = 1/10 for θ = 1, 2, . . . , 10. What is your optimal strategy given x?

(c) Finally, suppose that your subjective pdf for θ is fΘ(θ) = (1/β) e^{−θ/β} for θ > 0 and 0 elsewhere, where β is some positive number. Show how the optimal strategy depends on x. [Hint: Let W be the expected value of the money in the sealed envelope and let ξ = W/x. When is ξ > 1?]

8 Convergence Concepts

8.1 Convergence in Distribution and the Central Limit Theorem

8.1.1 Suppose that random variable X has mgf MX(t) given by

MX(t) = (1/8)e^t + (2/8)e^{2t} + (5/8)e^{3t}.

Find the probability distribution, expectation, and variance of X .

[Hint: Consider MX and its definition.]

8.1.2 Suppose that X is a continuous random variable with pdf

fX(x) = exp{−(x + 2)}, for −2 < x < ∞.

Find the mgf of X , and hence find the expectation and variance of X .

8.1.3 Suppose Z ∼ N(0, 1).

(a) Find the mgf of Z, and also the pdf and the mgf of the random variable X, where

X = µ + (1/λ)Z,

for parameters µ and λ > 0.

(b) Find the expectation of X, and the expectation of the function g(X), where g(x) = e^x. Use both the definition of the expectation directly and the mgf, and compare the complexity of your calculations.

(c) Suppose now Y is the random variable defined in terms of X by Y = e^X. Find the pdf of Y, and show that the expectation of Y is

exp{µ + 1/(2λ^2)}.


(d) Finally, let random variable T be defined by T = Z^2. Find the pdf and mgf of T.

8.1.4 Suppose that X is a random variable with pmf/pdf fX and mgf MX. The cumulant generating function of X, KX, is defined by KX(t) = log[MX(t)]. Prove that

(d/dt) KX(t)|_{t=0} = E(X) and (d^2/dt^2) KX(t)|_{t=0} = Var(X).

8.1.5 Using the CENTRAL LIMIT THEOREM, construct Normal approximations to each of the following random variables:

(a) a Binomial distribution, X ∼ Binomial(n, θ);

(b) a Poisson distribution, X ∼ Poisson(λ);

(c) a Negative Binomial distribution, X ∼ Negative Binomial(n, θ).

8.1.6 Let Sn^2 denote the sample variance of a random sample of size n from N(µ, σ^2), so that

Vn = (n − 1)Sn^2/σ^2 ∼ χ²_{n−1}.

Show, using the CENTRAL LIMIT THEOREM, that

√(n − 1) (Sn^2 − σ^2) / (σ^2 √2) →D Z ∼ N(0, 1),

so that, for large n, Sn^2 is approximately distributed as N(σ^2, 2σ^4/(n − 1)).

8.1.7 [Problem 5.2.5 continued] In Problem 5.2.5, you derived the cdfs of a number of random variables involving the minimum or maximum of a random sample. In this problem we will derive the limiting distributions of these same random variables.

Suppose (X1, . . . , Xn) is a collection of independent and identically distributed random variables taking values on X with pmf/pdf fX and cdf FX, and let Yn and Zn correspond to the maximum and minimum order statistics derived from X1, . . . , Xn.

(a) Suppose X1, . . . , Xn ∼ Unif(0, 1), that is,

FX(x) = x, for 0 ≤ x ≤ 1.

Find the limiting distributions of Yn and Zn as n → ∞.

(b) Suppose X1, . . . , Xn have cdf

FX(x) = 1 − x^{−1}, for x ≥ 1.

Find the limiting distributions of Zn and Un = Zn^n as n → ∞.

(c) Suppose X1, . . . , Xn have cdf

FX(x) = 1/(1 + e^{−x}), for x ∈ R.

Find the limiting distributions of Yn and Un = Yn − log n, as n → ∞.

(d) Suppose X1, . . . , Xn have cdf

FX(x) = 1 − 1/(1 + λx), for x > 0.

Let Un = Yn/n and Vn = nZn. Find the limiting distributions of Yn, Zn, Un, and Vn as n → ∞.


8.2 Convergence in Probability, the Law of Large Numbers, and Inequalities

8.2.1 Convergence in Probability. Suppose X1, . . . , Xn ∼ Poisson(λ). Let

X̄ = (1/n) ∑_{i=1}^n Xi.

(a) Show that X̄ →p λ as n → ∞.

(b) (?) Suppose Tn = e^{−X̄}; show that Tn →p e^{−λ}.
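[Computational note: a simulation illustrating part (a): the frequency with which |X̄ − λ| exceeds a fixed ε should shrink as n grows; λ, ε, and the sample sizes below are arbitrary.]

    import numpy as np

    rng = np.random.default_rng(1)
    lam, eps, reps = 2.0, 0.1, 20_000

    # Estimate Pr(|Xbar - lambda| > eps) for increasing n.
    for n in (10, 100, 1000):
        xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
        print(n, (np.abs(xbar - lam) > eps).mean())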

8.2.2 Suppose S^2 is computed from a random sample, (X1, . . . , Xn), from a distribution with finite variance, σ^2. Letting S = √(S^2), show that E(S) ≤ σ and that, if σ^2 > 0, then E(S) < σ.
