Monday’s lecture Chapter 5: Functions of random variables


Page 1: Monday’s lecture Chapter 5: Functions of random variables

Monday’s lecture

Long run relative frequencies
The law of large numbers (4.5)
The Cauchy distribution (and why it does not follow the law of large numbers)

Chapter 5: Functions of random variables

Finding distributions using the cdf

Temperature

A certain chemical reaction achieves a temperature, X, varying from experiment to experiment according to the pdf f_X(x) = x e^(−x²/2), x ≥ 0,

where X is measured in degrees Celsius. Let Y be the temperature in degrees Fahrenheit. What is the density of Y?

So

P(Y ≤ y) = P(1.8X + 32 ≤ y) = P(X ≤ (y − 32)/1.8),

whence, since

F_X(x) = ∫_0^x u e^(−u²/2) du = 1 − e^(−x²/2),

we get

f_Y(y) = (d/dy) P(Y ≤ y) = (1/1.8) f_X((y − 32)/1.8) = ((y − 32)/3.24) e^(−(y − 32)²/6.48).

Friday’s problems

1. If we know the value of X = x, then Y can take on only the two values ±√(1 − x²). Hence X and Y are not independent. Nevertheless,

E(XY) = ∫_0^1 sin(2πu) cos(2πu) du = (1/2) ∫_0^1 sin(4πu) du = 0

and

E(X) = ∫_0^1 sin(2πu) du = 0 = E(Y),

so X and Y are uncorrelated.

2. Cov(aX + b, cY + d) = ac Cov(X,Y). sd(aX + b) = |a| sd(X) and sd(cY + d) = |c| sd(Y), so Corr(aX + b, cY + d) = sgn(a) sgn(c) Corr(X,Y).

3. The combined resistance is the sum of the six resistances, so its variance is 6 times the variance of each (assuming independence). We need 6s² ≤ 0.4², or s ≤ 0.4/√6 ≈ 0.163.

4. T = ∑_{i=1}^n Xi, where Xi is the winning in round i. Here E(Xi) = pi and E(Xi²) = pi², so Var(Xi) = p(1 − p)i². Hence E(T) = n(n + 1)p/2 and Var(T) = p(1 − p)n(n + 1)(2n + 1)/6. A fair price is the expected winnings, i.e. n(n + 1)p/2.

5. P(min Xi > x) = P(all Xi > x) = (1 − F(x))^n, so F_min Xi(x) = 1 − (1 − F(x))^n = 1 − exp(−nx) for Exp(1) variables, so the density is n exp(−nx), and

E((min Xi)²) = ∫_0^∞ x² n e^(−nx) dx = (1/n²) ∫_0^∞ u² e^(−u) du = Γ(3)/n² = 2/n².

6. Y = max(U, 1 − U), so P(Y ≤ y) = P(U ≤ y, U > 1 − U) + P(1 − U ≤ y, U ≤ 1 − U) = (y − 1/2) + (1/2 − (1 − y)) = 2y − 1 for y ≥ 1/2. Hence Y ~ U(1/2, 1) and E(Y) = 3/4, so the larger piece is larger by 1/2 on average (using symmetry, the smaller piece will be U(0, 1/2), with mean 1/4).

7. Assuming bricks and mortar are independent, we have E(length) = 50 × 10 + 49 × 1/2 and Var(length) = 50/32² + 49/16² = 0.24, so sd(length) = 0.49.

8. E(aX + b) = aE(X) + b, so E(aX + b − aE(X) − b)³ = a³E(X − E(X))³. sd(aX + b) = |a| sd(X), so the skewness coefficient is

a³E(X − E(X))³ / (|a|³ sd(X)³) = sgn(a) × skewness(X).

The cdf method

If we want to find the density of Y = g(X), we do this in two steps:

1. Find the cdf of Y by computing P(g(X) ≤ y). If g is monotone increasing, this is F_X(g⁻¹(y)).
2. Now differentiate the cdf to get the density.

Step 1 also works for discrete cdfs, with a different step 2.
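As an illustration (a Python sketch added here, not part of the original slides), the two-step cdf method can be checked numerically on the temperature example: build the cdf of Y = 1.8X + 32 from F_X, differentiate it numerically, and compare with the closed-form density.

```python
import math

def F_X(x):
    # cdf of the Celsius temperature: F_X(x) = 1 - exp(-x^2/2) for x > 0
    return 1.0 - math.exp(-x * x / 2.0) if x > 0 else 0.0

def F_Y(y):
    # step 1: cdf of Y = 1.8 X + 32 via P(X <= (y - 32)/1.8)
    return F_X((y - 32.0) / 1.8)

def f_Y(y, h=1e-5):
    # step 2: differentiate the cdf (central difference)
    return (F_Y(y + h) - F_Y(y - h)) / (2.0 * h)

def f_Y_exact(y):
    # closed form derived on the slide
    return (y - 32.0) / 3.24 * math.exp(-((y - 32.0) ** 2) / 6.48)

# the numerical derivative matches the closed-form density
for y in (33.0, 34.0, 36.0, 40.0):
    assert abs(f_Y(y) - f_Y_exact(y)) < 1e-6
```

The same two-step recipe works for any monotone (or piecewise monotone) transformation of X.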

The chi-square density

Let X ~ N(0,1), i.e., X has density f(x) = (2π)^(−1/2) exp(−x²/2). What is the density of X²? First compute

P(X² ≤ x) = P(−√x ≤ X ≤ √x) = F(√x) − F(−√x).

Now differentiate with respect to x to get

f_X²(x) = (1/(2√x)) (f(√x) + f(−√x)).

Plug in the normal density to see that we get

f_X²(x) = (1/√(2πx)) e^(−x/2), x > 0.

This is called the chi-square density with one degree of freedom.
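As a numerical sanity check (an added sketch, not from the slides), the derived chi-square(1) density should be the derivative of P(X² ≤ x) = Φ(√x) − Φ(−√x), with Φ written via the standard error function.

```python
import math

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def chi2_1_cdf(x):
    # P(X^2 <= x) = Phi(sqrt(x)) - Phi(-sqrt(x))
    r = math.sqrt(x)
    return Phi(r) - Phi(-r)

def chi2_1_pdf(x):
    # the chi-square(1) density derived on the slide
    return math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x)

# the density is the derivative of the cdf
for x in (0.5, 1.0, 2.0):
    h = 1e-6
    deriv = (chi2_1_cdf(x + h) - chi2_1_cdf(x - h)) / (2 * h)
    assert abs(deriv - chi2_1_pdf(x)) < 1e-6
```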

Monte Carlo simulation

The method of computer simulation has changed the way science and engineering are done. Simulation of random processes is usually done using the Monte Carlo method, where the computer selects (quasi)random numbers which are used to construct the random process. A typical random number generator generates a number uniformly distributed between 0 and 1. Suppose we want a binomial random variable X with n = 8 and p = 1/3. The cdf of X is

x  0     1     2     3     4     5     6     7     8
F  .039  .195  .468  .741  .912  .980  .997  .9998 1.0

U = 0.1509110 → X = 1
U = 0.7236689 → X = 3
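The discrete inverse-cdf idea can be sketched in a few lines of Python (an added illustration using the table above): return the smallest x with F(x) ≥ U.

```python
# tabled cdf of Bin(8, 1/3) from the slide
cdf = [0.039, 0.195, 0.468, 0.741, 0.912, 0.980, 0.997, 0.9998, 1.0]

def binomial_from_uniform(u):
    # inverse-cdf method: smallest x with F(x) >= u
    for x, F in enumerate(cdf):
        if u <= F:
            return x
    return len(cdf) - 1

# the two uniforms from the slide
assert binomial_from_uniform(0.1509110) == 1
assert binomial_from_uniform(0.7236689) == 3
```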


The continuous case

Suppose that we want to generate an exponential random variable with pdf f_X(x) = α e^(−αx), x > 0. Again, we want to use a uniform random variable to construct the exponential one. In the discrete case we used the inverse of the cdf. Will that work here as well? We have F_X(x) = 1 − e^(−αx), so the inverse function is F_X⁻¹(u) = −(1/α) log(1 − u). Hence

P(−(1/α) log(1 − U) ≤ y) = P(U ≤ 1 − e^(−αy)) = 1 − e^(−αy).
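A quick simulation check of the inverse-cdf construction (an added Python sketch, not from the slides): transformed uniforms should have the exponential mean 1/α.

```python
import math
import random

def exp_from_uniform(u, alpha):
    # inverse cdf: F^{-1}(u) = -(1/alpha) log(1 - u)
    return -math.log(1.0 - u) / alpha

random.seed(0)
alpha = 2.0
n = 200_000
sample = [exp_from_uniform(random.random(), alpha) for _ in range(n)]

# mean of Exp(alpha) is 1/alpha = 0.5
mean = sum(sample) / n
assert abs(mean - 0.5) < 0.01
```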

Wednesday’s lecture

The cdf method for change of variables (5.2)
The inverse cdf method for generating random numbers with a given distribution (8.5)

Some more examples

Let X ~ U(0,1). What is the density of X2?

What if X ~ U(-1,1)?

The cumulative distribution function

F_X(x) = P(X ≤ x) (cdf)

P(X > x) = 1 − F_X(x)

P(a < X ≤ b) = F_X(b) − F_X(a)

P(X < x) = F_X(x−)


Properties of the cdf

Let F(x) be a cdf. Then
(a) F is nondecreasing
(b) F(∞) = lim_{x→∞} F(x) = 1
(c) F(−∞) = lim_{x→−∞} F(x) = 0

Some examples of cumulative distribution functions ... or ... ?

Going from the cdf to the pmf/pdf

Let X be a discrete random variable. Then

p_X(x) = F_X(x) − F_X(x−)

If X is continuous, then

f_X(x) = (d/dx) F_X(x)

Joint distribution

Joint cdf: F_X,Y(x,y) = P(X ≤ x, Y ≤ y)

Joint pmf: p_X,Y(x,y) = P(X = x, Y = y)

Joint pdf: f_X,Y(x,y) = ∂²/∂x∂y F_X,Y(x,y)


Computing probabilities from the joint cdf

Given FX,Y(x,y) , how do we compute P(a<X≤b, c<Y≤d)?

We see that the sought probability is given by

FX,Y(b,d) - FX,Y(b,c) - FX,Y(a,d) + FX,Y(a,c)

[Figure: the rectangle (a, b] × (c, d] in the (x, y)-plane]

Marginal distributions

If (X,Y) is discrete, the marginal distribution of X is given by

p_X(x) = ∑_y p_X,Y(x,y)

while if they are continuous it is given by

f_X(x) = ∫_{−∞}^{∞} f_X,Y(x,y) dy

The marginal distribution contains no information about the joint behavior of X and Y.

The discrete case

Let X ~ Bin(n,p). What is the pmf of 2X − n (a random walk)?

Sums of random variables

Let X and Y be independent random variables, and consider Z = X + Y. How do we find the distribution of Z?

Discrete case, X, Y ≥ 0. Then

P(Z = z) = P(X = z, Y = 0) + P(X = z − 1, Y = 1) + ... + P(X = 0, Y = z) = ∑_{k=0}^{z} p_X(k) p_Y(z − k)

Continuous case:

P(Z ≤ z) = ∬_{x+y≤z} f_X(x) f_Y(y) dy dx = ∫_{−∞}^{∞} f_X(x) ∫_{−∞}^{z−x} f_Y(y) dy dx = ∫_{−∞}^{∞} f_X(x) F_Y(z − x) dx

so

f_Z(z) = (d/dz) P(Z ≤ z) = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx
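The discrete convolution formula can be written directly in Python (an added illustration, not from the slides); summing two fair coin indicators should reproduce the Bin(2, 1/2) pmf.

```python
def convolve_pmf(pX, pY):
    # pmf of Z = X + Y for independent nonnegative integer-valued X, Y:
    # p_Z(z) = sum_k p_X(k) p_Y(z - k)
    pZ = [0.0] * (len(pX) + len(pY) - 1)
    for i, a in enumerate(pX):
        for j, b in enumerate(pY):
            pZ[i + j] += a * b
    return pZ

# example: two Bin(1, 1/2) variables give Bin(2, 1/2)
pZ = convolve_pmf([0.5, 0.5], [0.5, 0.5])
assert pZ == [0.25, 0.5, 0.25]
```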


Wednesday’s lecture

Review of concepts of cdf, pmf and pdf, joint, marginal and conditional distributions (ch. 3.2-5)
Distribution of sum of independent rvs. (5.2)

Friday solutions

1. P(Y ≤ y) = P(F(X) ≤ y) = P(X ≤ F⁻¹(y)) = F(F⁻¹(y)) = y.

2. (a) X + Y ~ Bin(n+m, p) (each is a sum of indicators of independent Bernoulli events), so

P(X = l | X + Y = k) = P(X = l, Y = k − l) / P(X + Y = k)
= [C(n,l) p^l q^(n−l) × C(m, k−l) p^(k−l) q^(m−k+l)] / [C(n+m, k) p^k q^(n+m−k)]
= C(n,l) C(m, k−l) / C(n+m, k)

(b) E(X | X + Y = k) = kn/(n + m).
(c) Once you know X + Y = k you are just picking the X-events without replacement, and the success probability is just the probability of X-events out of the k.

3. (a) The conditional density is f_X|Y(x|y) = f_X,Y(x,y)/f_Y(y), so

E(X | Y) = ∫ x f_X,Y(x,y)/f_Y(y) dx

(b) E(E(X | Y)) = ∫ [∫ x f_X,Y(x,y) dx / f_Y(y)] f_Y(y) dy = ∬ x f_X,Y(x,y) dx dy = ∫ x f_X(x) dx = E(X).

4. E(X1/Sn) + E(X2/Sn) + ... + E(Xn/Sn) = E(Sn/Sn) = 1, so by symmetry each E(Xi/Sn) must be 1/n; hence E(Sm/Sn) = m/n.

5. (a) Given X = x, we have Y ≤ f(x) with probability f(x), so P(Y ≤ f(X) | X = x) = f(x). Hence P(Y ≤ f(X)) = ∫_0^1 f(x) dx.
(b) Draw pairs of independent random numbers, and for each pair record whether it is a hit (Y ≤ f(X)) or not. The frequency of hits will converge to the integral of the function.

6. (a) P(black on draw 2) = r/(r+b) × b/(r+b+1) + b/(r+b) × (b+1)/(r+b+1) = b/(r+b), and the same holds for any draw (by symmetry).
(b) E(Xi) = P(Xi = 1) = b/(r+b), so E(Sn) = nb/(r+b).
(c) E(X1X2) = P(X1 = X2 = 1) = b/(r+b) × (b+1)/(r+b+1), so Cov(X1,X2) = b(b+1)/((r+b)(r+b+1)) − (b/(r+b))² = rb/((r+b)²(r+b+1)). Now Var(X1) = b/(r+b) − (b/(r+b))² = rb/(r+b)², so Corr(X1,X2) = [rb/((r+b)²(r+b+1))] / [rb/(r+b)²] = 1/(r+b+1).
(d) Var(Sn) = nrb/(r+b)² + n(n−1)rb/((r+b)²(r+b+1)).
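Solution 5(b) is exactly hit-or-miss Monte Carlo integration, which can be sketched in Python (an added illustration; the test function x² is a hypothetical choice):

```python
import random

def hit_or_miss(f, n=100_000, seed=1):
    # draw pairs (X, Y) of independent U(0,1) numbers and record a "hit"
    # when Y <= f(X); the hit frequency estimates the integral of f on [0,1]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if y <= f(x):
            hits += 1
    return hits / n

estimate = hit_or_miss(lambda x: x * x)
# the true integral of x^2 over [0, 1] is 1/3
assert abs(estimate - 1 / 3) < 0.01
```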

Uniform case

Let X and Y be independent U(0,1), and Z = X + Y. Then P(Z ≤ z) is the area of the shaded portion below; the two cases z ≤ 1 and z > 1 look different.


Exponential

f_X+Y(z) = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx = ∫_0^z λe^(−λx) λe^(−λ(z−x)) dx = λ²e^(−λz) ∫_0^z dx = λ²z e^(−λz)

This is called a gamma distribution with parameters 2 and λ, or Γ(2,λ).
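A simulation check of this convolution (an added Python sketch, with λ = 1.5 chosen arbitrarily): a sum of two Exp(λ) draws should match the Γ(2,λ) mean 2/λ and cdf 1 − e^(−λz)(1 + λz).

```python
import math
import random

random.seed(2)
lam = 1.5
n = 200_000
# sum of two independent Exp(lam) variables
z = [random.expovariate(lam) + random.expovariate(lam) for _ in range(n)]

# Gamma(2, lam) has mean 2/lam
mean = sum(z) / n
assert abs(mean - 2 / lam) < 0.02

# and cdf 1 - exp(-lam*t) * (1 + lam*t); check at t = 1
frac = sum(1 for v in z if v <= 1.0) / n
exact = 1 - math.exp(-lam) * (1 + lam)
assert abs(frac - exact) < 0.01
```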

Binomial

X ~ Bin(n,p) Y ~ Bin(m,p) (independent) count the number of rainstorms in January and November, respectively. What is the distribution of X+Y?

Poisson

Let X ~ Po(λ) count the large earthquakes in the western hemisphere in a 20-year interval and Y ~ Po(μ) those in the eastern hemisphere in the same interval. Based on plate tectonics, it may be reasonable to assume that X and Y are independent. What is the distribution of earthquakes worldwide in this time interval?

Conditional distribution

Conditional pmf:

p_X|Y(x|y) = P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y) = p_X,Y(x,y)/p_Y(y)

Conditional pdf:

f_X|Y(x|y) = f_X,Y(x,y)/f_Y(y)


Examples

X, Y independent Po(m), Z = X + Y:
p_X|Z(x|z) = C(z,x)(1/2)^z, i.e., X | Z = z ~ Bin(z, 1/2)

X, Y independent Exp(λ), Z = X + Y:
f_X|Z(x|z) = 1/z for 0 < x < z, i.e., X | Z = z ~ U(0, z)

The product of two uniforms

(X,Y) uniform on unit square, and Z=XY. What region do we need to compute the area of to find the cdf of Z?

The larger of two random variables

Let X and Y be independent with the same density f(x). In order to find the distribution of Z = max(X,Y), note that Z≤z if and only if both X≤z and Y≤z.

What if we want V = min(X,Y)?
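Both answers can be verified by simulation (an added Python sketch, using U(0,1) variables so that F(z) = z): P(max ≤ z) = F(z)² and P(min ≤ z) = 1 − (1 − F(z))².

```python
import random

random.seed(3)
n = 200_000
pairs = [(random.random(), random.random()) for _ in range(n)]

z = 0.7
# P(max(X,Y) <= z) = F(z)^2 = 0.49 and
# P(min(X,Y) <= z) = 1 - (1 - F(z))^2 = 0.91 for U(0,1) at z = 0.7
max_frac = sum(1 for x, y in pairs if max(x, y) <= z) / n
min_frac = sum(1 for x, y in pairs if min(x, y) <= z) / n
assert abs(max_frac - 0.49) < 0.01
assert abs(min_frac - 0.91) < 0.01
```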


Chapter 6 The normal distribution

• Some history
• The central limit theorem
• Normal approximations

The Scottish chest measurements

VOL. 79, NO. 2, APRIL 2006 109

[Figure 8: histogram of relative frequency versus chest girth, 33–47 inches. Is this data normally distributed?]

not group themselves with more regularity, as to the order of magnitude, than the 5,738 measurements made on the scotch [sic] soldiers; and if the two series were given to us without their being particularly designated, we should be much embarrassed to state which series was taken from 5,738 different soldiers, and which was obtained from one individual with less skill and ruder means of appreciation.

This argument, too, is unconvincing. It would have to be a strange person indeed who could produce results that diverge by 15′′ while measuring a chest of girth 40′′. Any idiosyncrasy or unusual conditions (fatigue, for example) that would produce such unreasonable girths is more than likely to skew the entire measurement process to the point that the data would fail to be normal.

It is interesting to note that Quetelet was the man who coined the phrase the average man. In fact, he went so far as to view this mythical being as an ideal form whose various corporeal manifestations were to be construed as measurements that are beset with errors [34, p. 99]:

If the average man were completely determined, we might . . . consider him as the type of perfection; and everything differing from his proportions or condition, would constitute deformity and disease; everything found dissimilar, not only as regarded proportion or form, but as exceeding the observed limits, would constitute a monstrosity.

Quetelet was quite explicit about the application of this, now discredited, principle to the Scottish soldiers. He takes the liberty of viewing the measurements of the soldiers’ chests as a repeated estimation of the chest of the average soldier:

I will astonish you in saying that the experiment has been done. Yes, truly, more than a thousand copies of a statue have been measured, and though I will not assert it to be that of the Gladiator, it differs, in any event, only slightly from it: these copies were even living ones, so that the measures were taken with all possible chances of error. I will add, moreover, that the copies were subject to deformity by a host of accidental causes. One may expect to find here a considerable probable error [32, p. 136].

The normal distribution

We write X ~ N(µ,σ²) if

f_X(x; µ, σ²) = (1/σ) φ((x − µ)/σ) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)),

where φ(x) = (1/√(2π)) e^(−x²/2) is called the standard normal density.

This distribution has mean µ and variance σ². Some history:
• de Moivre (1667–1754)
• Gauss (1777–1855)
• Laplace (1749–1827)
• Quetelet (1796–1874)

Calculation of normal probabilities

The cdf for a standard normal random variable (i.e., N(0,1)) is denoted Φ(x), and tabled in the back of the book. If X ~ N(µ,σ²) the corresponding cdf is Φ((x − µ)/σ).

Examples:

X ~ N(3,1): P(X > 4) = 1 − Φ(4 − 3) = 1 − Φ(1) = 0.1587

X ~ N(−2,4): P(|X| < 3) = Φ((3 + 2)/2) − Φ((−3 + 2)/2) = Φ(2.5) − Φ(−0.5) = 0.6853
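When no table is at hand, Φ can be computed from the error function; here is an added Python sketch reproducing the two example calculations.

```python
import math

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# X ~ N(3,1): P(X > 4) = 1 - Phi(1), about 0.1587
assert abs((1 - Phi(1.0)) - 0.1587) < 1e-3

# X ~ N(-2,4): P(|X| < 3) = Phi(2.5) - Phi(-0.5), about 0.6853
assert abs((Phi(2.5) - Phi(-0.5)) - 0.6853) < 1e-3
```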


French conscripts

Quetelet studied the heights of conscripts to the French army in the early 1800’s, and found an interesting anomaly: “too many” individuals were below the minimum height for soldiers, 1.57 m (5’1.8”), and “too few” just above that height.

Normal percentiles

The pth percentile l_p of a distribution with cdf F(x) is a number such that F(l_p) = p. If X has cdf F then 95% of the time X will be smaller than l_.95.

For the standard normal distribution, here are some values:

p    .5   .75  .9   .95  .975 .99  .995
l_p  0.0  0.67 1.28 1.65 1.96 2.33 2.57

If X ~ N(2,4), find x, y so that P(x < X < y) = 0.95.
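One natural answer (a symmetric interval; this added sketch uses the tabled 97.5th percentile 1.96): since (X − 2)/2 is standard normal, take x and y to be 2 ∓ 1.96 × 2.

```python
# X ~ N(2, 4) has mu = 2 and sigma = 2, so a central 95% interval
# is mu +/- 1.96 * sigma (1.96 is the 97.5th standard normal percentile)
mu, sigma = 2.0, 2.0
x = mu - 1.96 * sigma
y = mu + 1.96 * sigma
assert abs(x - (-1.92)) < 1e-9
assert abs(y - 5.92) < 1e-9
```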

Monday’s lecture

Normal distribution
Standard normal
General normal
Using normal table
Quantiles

Learning resources

Ask questions in class
Text book
Office hours
Study center
Class web page
Link to last quarter’s class web page
Links to other free texts
Email instructor
Talk to other students


Traffic deaths


The histogram below depicts the traffic-death rates (fatalities per 100 million motor vehicle miles) for each of the 50 states in 1971. Does the normal curve depicted look like a good fit?

The mean rate is 5.33, and the sd is 1.26. Suppose we want to estimate the probability that a state has traffic-death rate below 4. What would be a reasonable estimate?
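One reasonable estimate, if the normal curve fits, is Φ((4 − mean)/sd); this added Python sketch computes it from the stated mean and sd.

```python
import math

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# normal estimate of P(rate < 4), using mean 5.33 and sd 1.26
estimate = Phi((4.0 - 5.33) / 1.26)
# roughly 0.15, i.e. about 7 or 8 of the 50 states
assert abs(estimate - 0.146) < 0.005
```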

Sums of normal random variables

Let X ~ N(µ,σ²) and Y ~ N(ν,τ²) be independent.

Fact: Linear combinations of normal random variables are normal.

To figure out the distribution of Z = aX + bY we need only determine the mean and variance of Z. But

E(Z) = aµ + bν

Var(Z) = a²σ² + b²τ²

What changes if X and Y are correlated?

Central Limit Theorem

The normal distribution arises naturally whenever we look at a sum of a large number of iid random variables.

Theorem 5.3.B: Let X1,…,Xn be iid random variables with expected value µ and variance σ². For any real number x,

P((X1 + … + Xn − nµ)/(σ√n) ≤ x) → Φ(x) as n → ∞.

Some examples
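For instance (an added simulation sketch, using U(0,1) summands as a hypothetical choice): standardized sums of 30 uniforms already track Φ closely.

```python
import math
import random

random.seed(4)

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# standardized sums of n = 30 U(0,1) variables (mu = 1/2, sigma^2 = 1/12)
n, reps = 30, 100_000
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)

def standardized_sum():
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

vals = [standardized_sum() for _ in range(reps)]
frac = sum(1 for v in vals if v <= 1.0) / reps
# should be close to Phi(1), about 0.8413
assert abs(frac - Phi(1.0)) < 0.01
```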


Wednesday’s lecture

Normal distribution
Central limit theorem

Cauchy-Schwarz inequality

If S and T are random variables with finite variance, then

(E(ST))² ≤ E(S²) E(T²)

Corollary |Corr(S,T)| ≤ 1.


A binomial approximation

When n is small it is easy to compute binomial probabilities exactly. When n is large and p small (or near 1) we can use the Poisson approximation. But what about n large? De Moivre showed the normal approximation to the binomial (what we now would use the CLT for). Writing X = ∑_{i=1}^n Yi as a sum of indicators,

E(X) = np, Var(X) = np(1 − p)

and, with the continuity correction, the binomial pmf satisfies

f(k) ≈ F(k + 1/2) − F(k − 1/2)

where F is the cdf of the approximating N(np, np(1 − p)) distribution.
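The continuity correction can be checked numerically (an added sketch with the hypothetical choice n = 100, p = 1/2):

```python
import math
from math import comb

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 100, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))  # E(X) = np, Var(X) = np(1-p)

k = 50
exact = comb(n, k) * p**k * (1 - p)**(n - k)
# continuity correction: P(X = k) ~ Phi((k + 1/2 - mu)/sd) - Phi((k - 1/2 - mu)/sd)
approx = Phi((k + 0.5 - mu) / sd) - Phi((k - 0.5 - mu) / sd)
assert abs(exact - approx) < 1e-3
```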


Roundoff error

Transactions in banks are sometimes recorded only to the nearest cent. Assume that the rounding errors (actual value − recorded value) are independent uniform random variables on (−0.005, 0.005). For 48 transactions, the maximal possible error is 0.24. However, by the central limit theorem the distribution of the total error X is approximately normal with mean 0 and variance 48/12 × 10⁻⁴ = 4 × 10⁻⁴, i.e. sd 0.02. Hence

P(|X| > 0.06) = P(|X|/0.02 > 0.06/0.02) = 1 − P(−3 ≤ X/0.02 ≤ 3) ≈ 1 − (Φ(3) − Φ(−3)) = 0.003.
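The arithmetic above can be verified with a short added Python sketch:

```python
import math

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# 48 independent U(-0.005, 0.005) errors: mean 0, variance 48 * 0.01^2 / 12
var = 48 * (0.01 ** 2) / 12.0
sd = math.sqrt(var)  # = 0.02

# P(|X| > 0.06) = 1 - (Phi(3) - Phi(-3)), about 0.0027
p = 1 - (Phi(0.06 / sd) - Phi(-0.06 / sd))
assert abs(sd - 0.02) < 1e-12
assert abs(p - 0.0027) < 1e-3
```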

Production control

Assembly-line processes do not yield constant output. There is always variation in the quality of items produced. Control charts are used to monitor whether quality variation can reasonably be assigned to chance or there are other sources of variation to look for.

A mean chart for process control looks at successive averages of quality measures (such as measures of size). Based on previous data, control limits are determined. If the plot of successive averages falls outside the control limits, the process is out of control, and corrections must be made.

Production control, cont.

By the central limit theorem, the averages are approximately normally distributed, with µ = process mean and σ = process standard deviation / n^(1/2), where n is the number of measures averaged. Thus we expect the averages to fall outside µ ± 3σ only rarely. If we do not know the process standard deviation it must be estimated from the data, and the number 3 will be replaced by a slightly larger number.

A control chart


What causes lack of control?

Change in level:
different supplier
different operator

Trend:
machine moving out of alignment
deterioration in quality of raw materials

Change in dispersion:
different supplier; not detectable by a mean chart

Food processing

An ice cream company produces 200-ounce ice cream containers. The process is designed to produce actual fill amounts between 196 and 204 ounces. At 10-minute intervals, four containers were taken from the production line and the average weight computed. The overall mean of 24 such samples (96 containers) was 203.95 ounces, with an estimated standard error for each average of four of 1.36. Ignoring the estimation of the standard error, the control limits are 204 ± 4.08.
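The three-sigma limits in this example are just mean ± 3 × standard error; an added Python sketch with the stated numbers:

```python
# numbers from the ice-cream example: 24 samples of four containers,
# overall mean 203.95 oz, estimated standard error of each average 1.36 oz
mean, se = 203.95, 1.36

# three-sigma control limits: mean +/- 3 * se, roughly 204 +/- 4.08
lower = mean - 3 * se
upper = mean + 3 * se
assert abs(lower - 199.87) < 1e-9
assert abs(upper - 208.03) < 1e-9
```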