
Topic 10: The Law of Large Numbers∗

October 6, 2011

If we choose adult European males independently and measure their heights, keeping a running average, then at the beginning we might see some large fluctuations, but as we continue to make measurements, we expect to see this running average settle down and converge to the true mean height of European males. This phenomenon is informally known as the law of averages. In probability theory, we call it the law of large numbers.

Example 1. We can simulate men's heights with independent normal random variables, mean 68 and standard deviation 3.5. The following R commands perform this simulation and compute a running average of the heights. The results are displayed in Figure 1.

> n<-c(1:100)
> x<-rnorm(100,68,3.5)
> s<-cumsum(x)
> plot(s/n,xlab="n",ylim=c(60,70),type="l")

Here, we begin with a sequence X_1, X_2, \ldots of random variables having a common distribution. Their average,

\frac{1}{n}S_n = \frac{1}{n}(X_1 + X_2 + \cdots + X_n),

is itself a random variable. If the common mean of the X_i is \mu, then by the linearity property of expectation, the mean of the average is

E\left[\frac{1}{n}S_n\right] = \frac{1}{n}(EX_1 + EX_2 + \cdots + EX_n) = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \frac{1}{n}\,n\mu = \mu.   (1)

If, in addition, the X_i are independent with common variance \sigma^2, then by the Pythagorean theorem for the variance of independent random variables,

\mathrm{Var}\left(\frac{1}{n}S_n\right) = \frac{1}{n^2}(\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)) = \frac{1}{n^2}(\sigma^2 + \sigma^2 + \cdots + \sigma^2) = \frac{1}{n^2}\,n\sigma^2 = \frac{1}{n}\sigma^2.   (2)

So the mean of these running averages remains at \mu, but the variance decreases to 0 at a rate inversely proportional to the number of terms in the sum. For example, the mean of the average height of 100 European males is 68 inches, and the standard deviation is \sigma/\sqrt{n} = 3.5/\sqrt{100} = 0.35 inches. For 10,000 males, the mean remains 68 inches, while the standard deviation is \sigma/\sqrt{n} = 3.5/\sqrt{10000} = 0.035 inches.

The mathematical result, the law of large numbers, tells us that the results of these simulations could have been anticipated.
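These rates can be checked empirically. The following Python sketch (an alternative to the R workflow used in these notes; the function name and the replicate count of 2000 are our own choices) simulates many independent averages of n heights and compares their empirical standard deviation with the prediction \sigma/\sqrt{n} from equation (2).

```python
import numpy as np

def sd_of_average(n, reps=2000, mu=68.0, sigma=3.5, seed=0):
    """Empirical standard deviation of the average of n simulated heights,
    computed over `reps` independent replications."""
    rng = np.random.default_rng(seed)
    averages = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    return averages.std()

# Equation (2) predicts sigma / sqrt(n): 0.35 for n = 100, 0.035 for n = 10000.
print(sd_of_average(100))
print(sd_of_average(10000))
```

The empirical values land close to 0.35 and 0.035, matching the computation in the text.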

Theorem 2. For a sequence of independent random variables X_1, X_2, \ldots having a common distribution, their running average

\frac{1}{n}S_n = \frac{1}{n}(X_1 + \cdots + X_n)

has a limit as n \to \infty if and only if this sequence of random variables has a common mean \mu. In this case the limit is \mu.

* © 2011 Joseph C. Watkins



Introduction to Statistical Methodology The Law of Large Numbers

Figure 1: Four simulations of the running average S_n/n, n = 1, 2, \ldots, 100, for independent normal random variables, mean 68 and standard deviation 3.5.

The theorem also states that if the random variables do not have a mean, then the limit will fail to exist. We show this with the following example.

Example 3. The standard Cauchy random variable X has density function

f_X(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}.

Let Y = |X|. In an attempt to compute EY, note that

\int_{-b}^{b} |x| f_X(x)\,dx = \frac{2}{\pi}\int_0^b \frac{x}{1 + x^2}\,dx = \frac{1}{\pi}\ln(1 + x^2)\Big|_0^b = \frac{1}{\pi}\ln(1 + b^2) \to \infty

as b \to \infty. Thus, Y has infinite mean.
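We can check this divergence numerically. This short Python sketch (the function name and grid size are our own choices; the integral is approximated by the midpoint rule) computes the truncated mean \int_{-b}^{b} |x| f_X(x)\,dx and compares it with \ln(1+b^2)/\pi, which grows without bound as b increases.

```python
import numpy as np

def truncated_mean(b, num=200000):
    """Midpoint-rule approximation of the integral of |x| f_X(x) over [-b, b],
    where f_X is the standard Cauchy density."""
    edges = np.linspace(-b, b, num + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    fx = 1.0 / (np.pi * (1.0 + mid**2))
    return np.sum(np.abs(mid) * fx) * (2.0 * b / num)

# The truncated mean keeps growing with b, tracking log(1 + b^2) / pi.
for b in (10, 100, 1000):
    print(b, truncated_mean(b), np.log(1 + b**2) / np.pi)
```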



We now simulate 1000 independent Cauchy random variables.

> n<-c(1:1000)
> y<-abs(rcauchy(1000))
> s<-cumsum(y)
> plot(s/n,xlab="n",ylim=c(-6,6),type="l")

These random variables do not have a finite mean. As you can see in Figure 2, their running averages do not seem to be converging. Thus, if we are using a simulation strategy that depends on the law of large numbers, we need to check that the random variables have a mean.

Exercise 4. Check the failure of the law of large numbers for Cauchy random variables via simulation. Note that the shocks can jump either up or down in this case.

1 Monte Carlo Integration

Monte Carlo methods use stochastic simulations to approximate solutions to questions that are too difficult to solve analytically. This approach has seen widespread use in fields as diverse as statistical physics, astronomy, population genetics, protein chemistry, and finance. Our introduction will focus on examples having relatively rapid computations. However, many research groups routinely use Monte Carlo simulations that can take weeks of computer time to perform.

For example, let X_1, X_2, \ldots be independent random variables uniformly distributed on the interval [a, b], and write f_X for the density of a uniform random variable on the interval [a, b].

Then, by the law of large numbers, for n large,

\overline{g(X)}_n = \frac{1}{n}\sum_{i=1}^{n} g(X_i) \approx Eg(X_1) = \int_a^b g(x) f_X(x)\,dx = \frac{1}{b-a}\int_a^b g(x)\,dx.

Let's denote by I(g) the value of this integral. Recall that in calculus, we defined the average of g to be

\frac{1}{b-a}\int_a^b g(x)\,dx.

We can also interpret this integral as the expected value of g(X1).

Thus, Monte Carlo integration leads to the following procedure for estimating integrals:

• simulating uniform random variables X1, X2, . . . , Xn on the interval [a, b],

• computing the average of g(X1), g(X2), . . . , g(Xn) to estimate the average of g, and

• multiplying by b− a to estimate the integral.
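The steps above translate directly into code. Here is a minimal Python sketch of the same procedure (the function name and sample size are our own choices), checked on an integral with a known value:

```python
import numpy as np

def mc_integral(g, a, b, n=100000, seed=0):
    """Monte Carlo integration on [a, b]:
    simulate uniforms, average g over them, multiply by (b - a)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, n)
    return (b - a) * g(x).mean()

# Known value: the integral of x^2 over [0, 1] is 1/3.
print(mc_integral(lambda x: x**2, 0.0, 1.0))
```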

Example 5. Let g(x) = \sqrt{1 + \cos^3(x)} for x \in [0, \pi]; we want to find \int_0^\pi g(x)\,dx. The three steps above become the following R code.

> x<-runif(1000,0,pi)
> g<-sqrt(1+cos(x)^3)
> pi*mean(g)
[1] 2.991057


Figure 2: Four simulations of the running average S_n/n, n = 1, 2, \ldots, 1000, for the absolute value of independent Cauchy random variables. Note that the running average does not seem to be settling down and is subject to "shocks".

The error in the estimate of the integral can be estimated by the variance as given in equation (2),

\mathrm{Var}(\overline{g(X)}_n) = \frac{1}{n}\mathrm{Var}(g(X_1)),

where \sigma^2 = \mathrm{Var}(g(X_1)) = E(g(X_1) - \mu_{g(X_1)})^2 = \int_a^b (g(x) - \mu_{g(X_1)})^2 f_X(x)\,dx. Typically this integral is more difficult to estimate than \int_a^b g(x)\,dx, our original integral of interest. However, we can see that the variance of the estimator is inversely proportional to n, the number of random numbers in the simulation. Thus, the standard deviation is inversely proportional to \sqrt{n}.
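In practice, \mathrm{Var}(g(X_1)) is replaced by the sample variance of the simulated values g(X_i), which yields a standard error alongside the estimate from a single run. A Python sketch of this idea (names are our own; ddof=1 gives the usual sample standard deviation):

```python
import numpy as np

def mc_with_se(g, a, b, n=1000, seed=0):
    """Monte Carlo estimate of the integral of g over [a, b], together with
    an estimated standard error (b - a) * sd(g(X)) / sqrt(n)."""
    rng = np.random.default_rng(seed)
    gx = g(rng.uniform(a, b, n))
    estimate = (b - a) * gx.mean()
    se = (b - a) * gx.std(ddof=1) / np.sqrt(n)
    return estimate, se

# The integral of x over [0, 1] is 1/2; a uniform has sd 1/sqrt(12) = 0.2887,
# so the standard error should be near 0.2887 / sqrt(1000) = 0.0091.
est, se = mc_with_se(lambda x: x, 0.0, 1.0)
print(est, se)
```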

Monte Carlo techniques are rarely the best strategy for estimating one- or even very low-dimensional integrals. R does integration numerically using the function and integrate commands. For example,

> g<-function(x){sqrt(1+cos(x)^3)}
> integrate(g,0,pi)
2.949644 with absolute error < 3.8e-06

However, with only a small change in the algorithm, we can also use this approach to evaluate high-dimensional multivariate integrals. For example, in three dimensions, the integral

I(g) = \int_0^1 \int_0^1 \int_0^1 g(x, y, z)\,dx\,dy\,dz

can be estimated using Monte Carlo integration by generating three sequences of uniform random variables,

X_1, X_2, \ldots, X_n, \quad Y_1, Y_2, \ldots, Y_n, \quad\text{and}\quad Z_1, Z_2, \ldots, Z_n.

Then,

I(g) \approx \frac{1}{n}\sum_{i=1}^{n} g(X_i, Y_i, Z_i).   (3)
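A Python sketch of this three-sequence estimator on the unit cube (function name is our own choice), checked against an integrand whose triple integral is known exactly:

```python
import numpy as np

def mc_triple_unit_cube(g, n=100000, seed=0):
    """Estimate the triple integral of g over [0,1]^3 by averaging g
    at n independent uniform points (X_i, Y_i, Z_i)."""
    rng = np.random.default_rng(seed)
    x, y, z = rng.uniform(size=(3, n))
    return g(x, y, z).mean()

# Known value: the integral of xyz over the unit cube is (1/2)^3 = 1/8.
print(mc_triple_unit_cube(lambda x, y, z: x * y * z))
```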

Example 6. To obtain a sense of the distribution of the approximations to I(g), we perform 1000 simulations, using 100 uniform random variables for each of the three coordinates, to carry out the Monte Carlo integration. The command Ig<-rep(0,1000) creates a vector of 1000 zeros. This is added so that R allocates a place ahead of the simulations to store the results.

> Ig<-rep(0,1000)
> for(i in 1:1000){x<-runif(100);y<-runif(100);z<-runif(100);
  g<-32*x^3/(3*(y+z^4+1)); Ig[i]<-mean(g)}
> hist(Ig)
> summary(Ig)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.045   1.507   1.644   1.650   1.788   2.284
> var(Ig)
[1] 0.03524665
> sd(Ig)
[1] 0.1877409

Exercise 7. Estimate the variance and standard deviation of the Monte Carlo estimator for the integral in (3) based on n = 500 and n = 1000 random numbers.

Exercise 8. How many observations are needed in estimating (3) so that the standard deviation of the average is 0.05?

To modify this technique for a region [a_1, b_1] \times [a_2, b_2] \times [a_3, b_3], use independent uniform random variables X_i, Y_i, and Z_i on the respective intervals; then

\frac{1}{n}\sum_{i=1}^{n} g(X_i, Y_i, Z_i) \approx Eg(X_1, Y_1, Z_1) = \frac{1}{b_1 - a_1}\,\frac{1}{b_2 - a_2}\,\frac{1}{b_3 - a_3} \int_{a_1}^{b_1}\int_{a_2}^{b_2}\int_{a_3}^{b_3} g(x, y, z)\,dz\,dy\,dx.

Thus, the estimate for the integral is

\frac{(b_1 - a_1)(b_2 - a_2)(b_3 - a_3)}{n} \sum_{i=1}^{n} g(X_i, Y_i, Z_i).
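The general-box version differs from the unit-cube case only by the volume factor. A Python sketch under the same conventions (names are our own):

```python
import numpy as np

def mc_triple_box(g, box, n=100000, seed=0):
    """Monte Carlo estimate of the integral of g over a box
    [a1,b1] x [a2,b2] x [a3,b3]: average g at uniform points,
    then multiply by the box volume."""
    rng = np.random.default_rng(seed)
    (a1, b1), (a2, b2), (a3, b3) = box
    x = rng.uniform(a1, b1, n)
    y = rng.uniform(a2, b2, n)
    z = rng.uniform(a3, b3, n)
    volume = (b1 - a1) * (b2 - a2) * (b3 - a3)
    return volume * g(x, y, z).mean()

# Known value: the integral of x + y + z over [0,2] x [0,1] x [0,3] is 18.
print(mc_triple_box(lambda x, y, z: x + y + z, [(0, 2), (0, 1), (0, 3)]))
```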


Figure 3: Histogram of 1000 Monte Carlo estimates for the integral \int_0^1\int_0^1\int_0^1 32x^3/(3(y + z^4 + 1))\,dx\,dy\,dz. The sample standard deviation is approximately 0.187.

2 Importance Sampling

In many large simulations, the dimension of the integral can be in the hundreds and the function g can be very close to zero on large regions of its domain. Simple Monte Carlo simulation will then frequently choose values for g that are close to zero, and these values contribute very little to the average. Due to this inefficiency, a more sophisticated strategy is employed. Importance sampling methods begin with the observation that a better strategy may be to concentrate the random points where the integrand g is large in absolute value.

For example, for the integral

\int_0^1 \frac{e^{-x/2}}{\sqrt{x(1-x)}}\,dx,   (4)

the integrand is much bigger for values near x = 0 or x = 1 (see Figure 4). Thus, we can hope to have a more accurate estimate by concentrating our sample points in these places.

With this in mind, we perform the Monte Carlo integration beginning with Y_1, Y_2, \ldots, independent random variables with common density f_Y. The goal is to find a density f_Y that is large when |g| is large and small when |g| is small. The density f_Y is called the importance sampling function or the proposal density. With this choice of density, we define the importance sampling weights

w(y) = \frac{g(y)}{f_Y(y)}.   (5)

To justify this choice, note that the sample mean

\overline{w(Y)}_n = \frac{1}{n}\sum_{i=1}^{n} w(Y_i) \approx \int_{-\infty}^{\infty} w(y) f_Y(y)\,dy = \int_{-\infty}^{\infty} \frac{g(y)}{f_Y(y)}\,f_Y(y)\,dy = \int_{-\infty}^{\infty} g(y)\,dy = I(g).

Thus, by the strong law of large numbers, the average of the importance sampling weights still approximates the integral of g. This is an improvement over simple Monte Carlo integration if the variance decreases, i.e.,

\mathrm{Var}(w(Y_1)) = \int_{-\infty}^{\infty} (w(y) - I(g))^2 f_Y(y)\,dy = \sigma_f^2 \ll \sigma^2.


As the formula shows, this is best achieved by having the weight w(y) be close to the integral I(g). Referring to equation (5), we can now see that we should endeavor to have f_Y proportional to g.
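To see why proportionality is the right target, consider the ideal (usually unattainable) case. The following short derivation assumes g \ge 0 with finite integral:

```latex
\text{If } f_Y(y) = \frac{g(y)}{I(g)}, \text{ then }
w(y) = \frac{g(y)}{f_Y(y)} = I(g) \text{ for every } y,
\quad\text{so}\quad \mathrm{Var}(w(Y_1)) = 0 .
```

In this case a single sample would give the exact answer. Of course, normalizing g this way requires knowing I(g), the very quantity we are estimating, so in practice we settle for a proposal density that is roughly proportional to |g| and easy to sample from.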

Example 9. For the integral (4) we can use Monte Carlo simulation based on uniform random variables.

> Ig<-rep(0,1000)
> for(i in 1:1000){x<-runif(100);g<-exp(-x/2)*1/sqrt(x*(1-x));Ig[i]<-mean(g)}
> summary(Ig)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.970   2.277   2.425   2.484   2.583   8.586
> sd(Ig)
[1] 0.3938047

Based on 1000 simulations, we find a sample mean value of 2.484 and a sample standard deviation of 0.394. Because the integrand is very large near x = 0 and x = 1, we look for a density f_Y that concentrates the random samples near the ends of the interval.

Our choice for the proposal density is a Beta(1/2, 1/2); then

f_Y(y) = \frac{1}{\pi}\,y^{1/2 - 1}(1 - y)^{1/2 - 1} = \frac{1}{\pi\sqrt{y(1-y)}}

on the interval [0, 1]. Thus the weight is

w(y) = \pi e^{-y/2}.

Figure 4: Importance sampling using the density function f_Y(x) = 1/(\pi\sqrt{x(1-x)}) to estimate \int_0^1 g(x)\,dx for g(x) = e^{-x/2}/\sqrt{x(1-x)}. The weight w(x) = \pi e^{-x/2} is the ratio g(x)/f_Y(x).

Again, we perform the simulation multiple times.

> IS<-rep(0,1000)
> for(i in 1:1000){y<-rbeta(100,1/2,1/2);w<-pi*exp(-y/2);IS[i]<-mean(w)}
> summary(IS)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.321   2.455   2.483   2.484   2.515   2.609
> var(IS)
[1] 0.0002105915
> sd(IS)
[1] 0.04377021

Based on 1000 simulations, we find a sample mean value of 2.484 and a sample standard deviation of 0.044, about 1/9 the size of the simple Monte Carlo standard deviation. Part of the gain is illusory: Beta random variables take longer to simulate. If they require more than 81 times longer to simulate, then the extra work needed to create a good importance sample is not helpful in producing a more accurate estimate for the integral.
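The factor 81 comes from the 1/\sqrt{n} scaling of the standard deviation: to match the importance sampler's accuracy, simple Monte Carlo would need roughly

```latex
\frac{n_{\text{MC}}}{n_{\text{IS}}}
= \left(\frac{\sigma_{\text{MC}}}{\sigma_{\text{IS}}}\right)^2
\approx \left(\frac{0.394}{0.044}\right)^2 \approx 9^2 = 81
```

times as many samples, so a per-sample cost ratio above 81 would erase the advantage.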

Simpson’s rule can also be used in this case and we obtain the answer 2.48505.

Exercise 10. Evaluate the integral

\int_0^1 \frac{e^{-x}}{\sqrt[3]{x}}\,dx

1000 times, using n = 200 sample points each time, both directly by Monte Carlo integration and by importance sampling with random variables having density

f_X(x) = \frac{2}{3\sqrt[3]{x}}

on the interval [0, 1]. For the second part, you will need to use the probability transform. Compare the means and standard deviations of the 1000 estimates for the integral. The integral is approximately 1.04969.

3 Answers to Selected Exercises

4. Here are the R commands:

> par(mfrow=c(2,2))
> x<-rcauchy(1000)
> s<-cumsum(x)
> plot(n,s/n,type="l")
> x<-rcauchy(1000)
> s<-cumsum(x)
> plot(n,s/n,type="l")
> x<-rcauchy(1000)
> s<-cumsum(x)
> plot(n,s/n,type="l")
> x<-rcauchy(1000)
> s<-cumsum(x)
> plot(n,s/n,type="l")

This produces the plots below. Notice how the range of values on the vertical axis differs from plot to plot.

7. The standard deviation for the average of n observations is \sigma/\sqrt{n}, where \sigma is the standard deviation for a single observation. From the output

> sd(Ig)
[1] 0.1877409

we have that 0.1877409 \approx \sigma/\sqrt{100} = \sigma/10. Thus, \sigma \approx 1.877409. Consequently, for 500 observations, \sigma/\sqrt{500} \approx 0.08396028. For 1000 observations, \sigma/\sqrt{1000} \approx 0.05936889.

8. For \sigma/\sqrt{n} = 0.05, we have that n = (\sigma/0.05)^2 \approx 1409.866. So we need approximately 1410 observations.

10. For the direct Monte Carlo simulation, we have



> Ig<-rep(0,1000)
> for (i in 1:1000){x<-runif(200);g<-exp(-x)/x^(1/3);Ig[i]<-mean(g)}
> mean(Ig)
[1] 1.048734
> sd(Ig)
[1] 0.07062628

For the importance sampler, the integral is

\frac{3}{2}\int_0^1 e^{-x} f_X(x)\,dx.


To simulate independent random variables with density f_X, we first need the cumulative distribution function for X,

F_X(x) = \int_0^x \frac{2}{3\sqrt[3]{t}}\,dt = t^{2/3}\Big|_0^x = x^{2/3}.

Then, to find the probability transform, note that

u = F_X(x) = x^{2/3} \quad\text{and}\quad x = F_X^{-1}(u) = u^{3/2}.

Thus, to simulate X, we simulate a uniform random variable U on the interval [0, 1] and evaluate U^{3/2}. This leads to the following R commands for both the simple Monte Carlo and the importance sampling simulation:

> Ig<-rep(0,1000)
> for (i in 1:1000){u<-runif(200);g<-exp(-u)/u^(1/3);Ig[i]<-mean(g)}
> mean(Ig)
[1] 1.049716
> sd(Ig)
[1] 0.07173335
> ISg<-rep(0,1000)
> for (i in 1:1000){u<-runif(200);x<-u^(3/2); w<-3*exp(-x)/2;ISg[i]<-mean(w)}
> mean(ISg)
[1] 1.048415
> sd(ISg)
[1] 0.02010032
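The probability-transform step can also be sketched in Python (the function name is our own): simulate U uniform on [0, 1] and set X = U^{3/2}. As a quick sanity check, E[X] = \int_0^1 x \cdot 2/(3\sqrt[3]{x})\,dx = (2/3)(3/5) = 2/5.

```python
import numpy as np

def sample_x(n, seed=0):
    """Inverse-CDF sampling: if U is uniform on [0,1], then X = U^(3/2)
    has density f_X(x) = 2/(3 x^(1/3)) on [0,1]."""
    rng = np.random.default_rng(seed)
    return rng.uniform(size=n) ** 1.5

# The sample mean should be close to E[X] = 2/5.
x = sample_x(100000)
print(x.mean())
```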

Thus, the standard deviation using importance sampling is about 2/7 of the standard deviation using simple Monte Carlo simulation. Consequently, we can decrease the number of samples needed using importance sampling by a factor of (2/7)^2 \approx 0.08.
