
Chapter Four –

Continuous Random Variables and Probability Distributions

4.1 – Continuous Random Variables and Probability Density Functions

For a discrete random variable X, one can assign a probability to each value that X can take (i.e., through the probability mass function). Random variables such as heights, weights, lifetimes, and measurement error can all assume an infinite number of values. As a result, we need a different mechanism for understanding the probability distribution.

For a continuous random variable, any real number x is possible between A and B (for A < B). The number of values is theoretically infinite, but one won’t have such precision in practice. Representing such a random variable using a continuous model is still appropriate. Note that a continuous random variable can still have specific endpoints to its range (e.g., x > 0, 0 < x < 1, or x > 1), but the number of values possible is still infinite.

As discussed earlier, we can use histograms to represent the relative frequency of a random variable X. Suppose X is the depth of a lake at a randomly chosen point on the surface. X has a range from 0 to M, where M is the maximum depth of the lake. The relative frequency histogram can represent the probability distribution for X. The finer the discretization of the X axis (i.e., the precision of the measurement), the smaller the subintervals for the histogram, as seen below.

Infinitely small subintervals lead to the continuous probability distribution being represented as a smooth curve. As with the relative frequency histograms, the probability associated with the values of X between any two values a and b is the area under the smooth curve between a and b. For functions consisting of straight line segments, geometry and simple relationships for known areas (shapes) can be used. For more complex functions, we may be able to subdivide the area under the curve into smaller rectangles and sum their areas. However, calculus and integration techniques solve this for us.

If X is a continuous random variable, the probability distribution or the probability density function (pdf) of X is a function f (x) such that for any two numbers a and b with a ≤ b:

P (a ≤ X ≤ b) = ∫_a^b f (x) dx.

This probability is the area under the curve f (x) between the points a and b. For f (x) to be a legitimate density function, the following two conditions must hold:

• f (x) ≥ 0 for all x, and


• ∫_{−∞}^{∞} f (x) dx = 1.

With discrete random variables, we only had a finite (or countably infinite) set of values for X, each with probability mass p (x) at x and the sum of the possible p (x) values equal to 1 (the equivalent of the two conditions above).
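For readers without Maple, the two conditions can be sanity-checked numerically. The sketch below (an added illustration, not part of the original notes) applies a midpoint Riemann sum to the density f(x) = 0.5x on [0, 2], which appears in a later example.

```python
# Check the two pdf conditions for f(x) = 0.5x on [0, 2] numerically.
def f(x):
    return 0.5 * x if 0 <= x <= 2 else 0.0

n = 100_000
a, b = 0.0, 2.0
dx = (b - a) / n
# Midpoint Riemann sum approximating the integral of f over [a, b].
total = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

nonneg = all(f(a + i * dx) >= 0 for i in range(n + 1))
print(nonneg, round(total, 6))   # True 1.0
```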

Since integration is fundamental to the understanding of continuous random variables, some integration review is suggested. Some basic indefinite integrals you should know are:

∫ dx = x + C

∫ c dx = c ∫ dx

∫ (f + g + · · · ) dx = ∫ f dx + ∫ g dx + · · ·

∫ x^r dx = x^{r+1}/(r + 1) + C   (for r ≠ −1)

∫ exp (x) dx = exp (x) + C

Of course, the same rules apply when the integrals are definite rather than indefinite. Hence, suppose that you want the area under the parabola y = x^2 for x ∈ [0, 1]. Then:

∫_0^1 x^2 dx = [x^3/3]_0^1 = 1/3 − 0 = 1/3.

Maple will be used extensively, particularly for more complicated functions. The equivalent differentiation rules for the antiderivatives (integrals) given above should also be known. For example:

d/dx exp (2x) = 2 exp (2x).

Corresponding derivative rules for the above integration rules are:

d/dx x = 1

d/dx (cx) = c

d/dx [f + g + · · · ] = df/dx + dg/dx + · · ·

d/dx x^r = rx^{r−1}

d/dx exp (x) = exp (x)

The Uniform Distribution A simple but commonly occurring continuous distribution is the uniform distribution on the interval [A, B]. If a continuous random variable X follows a uniform distribution, the pdf of X is:

f (x; A, B) = { 1/(B − A)   A ≤ x ≤ B,
             { 0           otherwise.


To verify that this is a valid pdf, note that f (x) ≥ 0 for all x ∈ [A,B] and that:

∫_{−∞}^{∞} f (x) dx = ∫_A^B 1/(B − A) dx = [x/(B − A)]_A^B = B/(B − A) − A/(B − A) = (B − A)/(B − A) = 1.

Example (Devore, Page 143, Exercise 1) Let X denote the amount of time for which a book on two-hour reserve at a college library is checked out by a randomly selected student and suppose that X has density function:

f (x) = { 0.5x   0 ≤ x ≤ 2,
        { 0      otherwise.

Calculate the following probabilities: P (X ≤ 1), P (0.5 ≤ X ≤ 1.5), and P (1.5 < X).

Solution The desired probabilities are:

P (X ≤ 1) = ∫_0^1 0.5x dx = [0.25x^2]_0^1 = 0.2500

P (0.5 ≤ X ≤ 1.5) = ∫_{0.5}^{1.5} 0.5x dx = [0.25x^2]_{0.5}^{1.5} = 0.5000

P (1.5 < X) = ∫_{1.5}^{2} 0.5x dx = [0.25x^2]_{1.5}^{2} = 0.4375.


¨

Example (Devore, Page 144, Exercise 5) A college professor never finishes his lecture before the bell rings to end the period and always finishes his lecture within one minute after the bell rings. Let X = the time that elapses between the bell and the end of the lecture and suppose the pdf of X is:

f (x) = { kx^2   0 ≤ x ≤ 1,
        { 0      otherwise.

a. Find the value of k.

Solution For f (x) to be a valid pdf, we must have ∫_{−∞}^{∞} f (x) dx = 1. Hence:

∫_{−∞}^{∞} f (x) dx = ∫_0^1 kx^2 dx = [kx^3/3]_0^1 = k [1/3 − 0] = k/3 = 1 ⇒ k = 3.

b. What is the probability that the lecture ends within 1/2 minute of the bell ringing?

Solution P (X ≤ 1/2) = ∫_0^{1/2} 3x^2 dx = [x^3]_0^{1/2} = (1/2)^3 = 1/8.

c. What is the probability that the lecture continues beyond the bell for between 15 and 30 seconds?

Solution P (1/4 ≤ X ≤ 1/2) = ∫_{1/4}^{1/2} 3x^2 dx = [x^3]_{1/4}^{1/2} = (1/2)^3 − (1/4)^3 = 7/64.

d. What is the probability that the lecture continues for at least 40 seconds beyond the bell?

Solution P (X ≥ 2/3) = ∫_{2/3}^{1} 3x^2 dx = [x^3]_{2/3}^{1} = (1)^3 − (2/3)^3 = 19/27.

¨

The “area” under a single point is zero: ∫_a^a f (x) dx = 0. Although in practice a single real number may seem to be a likely value for x, a continuous model for X assigns zero probability to any single value. This implies that for a continuous random variable X which has 1 within its range of possible values, P (X ≤ 1) ≡ P (X < 1). Similarly, P (1 ≤ X ≤ 2) ≡ P (1 < X < 2). Of course, for an integer-valued discrete random variable X, this is not true.

4.2 – Cumulative Distribution Functions and Expected Values

The Cumulative Distribution Function The cumulative distribution function (cdf), F (x), for a continuous random variable X is defined for every real number x by:

F (x) = P (X ≤ x) = ∫_{−∞}^{x} f (y) dy.

Hence, f is the pdf but, notationally, y is used as the variable of integration to avoid confusion, because the upper limit of integration is x. The cdf F (x) is the area under the density curve that lies to the left of x.

Example (Devore, Page 145, Example 4.5) Suppose X has a uniform distribution on [A, B]. To compute the cdf of X, we integrate f (x) = 1/(B − A) over the range of X up to x:

F (x) = ∫_{−∞}^{x} f (y) dy = ∫_A^x 1/(B − A) dy = 1/(B − A) ∫_A^x dy = 1/(B − A) [y]_A^x = (x − A)/(B − A),

for A ≤ x ≤ B. More precisely, the cdf is defined to be:

F (x) = { 0                  x < A,
        { (x − A)/(B − A)   A ≤ x ≤ B,
        { 1                  x ≥ B.
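As a quick numerical illustration (added here; A = 2 and B = 10 are chosen arbitrarily), the closed-form cdf agrees with direct integration of the flat density:

```python
# Uniform[A, B]: compare F(x) = (x - A)/(B - A) with a Riemann sum of the pdf.
A, B = 2.0, 10.0

def F(x):
    if x < A:
        return 0.0
    if x >= B:
        return 1.0
    return (x - A) / (B - A)

x = 7.0
n = 100_000
dx = (x - A) / n
riemann = sum(1.0 / (B - A) for _ in range(n)) * dx  # integral of f from A to x

print(round(F(x), 6), round(riemann, 6))   # 0.625 0.625
```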

Example (Devore, Page 143, Exercise 3) Suppose the distance X between a point target and a shot aimed at the point in a coin-operated target game is a continuous random variable with pdf:

f (x) = { (3/4)(1 − x^2)   −1 ≤ x ≤ 1,
        { 0                otherwise.

Note that you should be able to sketch the graph of this pdf. It is a parabola opening downward with a y-intercept of 3/4. For −1 < x < 1, the cdf of X is then:

F (x) = ∫_{−∞}^{x} f (t) dt = ∫_{−1}^{x} (3/4)(1 − t^2) dt = (3/4)[t − t^3/3]_{−1}^{x}
      = (3/4)[x − x^3/3 − (−1 − (−1)/3)] = (3/4)(x − x^3/3 + 2/3)
      = 1/2 + (3/4)(x − x^3/3).

Hence, the cdf of X is:

F (x) = { 0                         x < −1,
        { 1/2 + (3/4)(x − x^3/3)   −1 ≤ x ≤ 1,
        { 1                         x > 1.  ¨


Using F (x) to Compute Probabilities Let X be a continuous random variable with pdf f (x) and cdf F (x). Then for any two numbers a and b with a < b:

P (a ≤ X ≤ b) = F (b)− F (a) .

The following figure illustrates this proposition; the desired probability is the shaded area under the density curve between a and b and equals the difference between the two shaded cumulative areas.

Example (Devore, Page 146, Example 4.6) Suppose the pdf of the magnitude X of a dynamic load on a bridge (in newtons) is given by:

f (x) = { 1/8 + (3/8)x   0 ≤ x ≤ 2,
        { 0              otherwise.

Find the cdf of X and use it to find P (1 ≤ X ≤ 1.5).

Solution For 0 ≤ x ≤ 2, the cdf is:

F (x) = ∫_{−∞}^{x} f (y) dy = ∫_0^x (1/8 + (3/8)y) dy = x/8 + (3/16)x^2.

Hence, the cdf is:

F (x) = { 0                   x < 0,
        { x/8 + (3/16)x^2    0 ≤ x ≤ 2,
        { 1                   x > 2.

Therefore:

P (1 ≤ X ≤ 1.5) = F (1.5) − F (1) = 1.5/8 + (3/16)(1.5)^2 − [1/8 + (3/16)(1)^2] = 19/64 = 0.2969. ¨

Obtaining f (x) from F (x) If X is a continuous random variable with pdf f (x) and cdf F (x), then at every x at which the derivative F ′ (x) exists:

F ′ (x) = d/dx F (x) = f (x).


Percentiles of a Continuous Distribution Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous random variable X, denoted by η (p), is defined by:

p = F (η (p)) = ∫_{−∞}^{η(p)} f (y) dy.


Therefore, (100p) % of the area under f (x) lies to the left of η (p) and 100 (1− p) % lies to the right.

Example (Devore, Page 143, Exercise 1) Let X denote the amount of time for which a book on two-hour reserve at a college library is checked out by a randomly selected student and suppose that X has density function:

f (x) = { x/2   0 ≤ x ≤ 2,
        { 0     otherwise.

Find the 75th percentile.

Solution The cdf is:

F (x) = ∫_0^x (y/2) dy = [y^2/4]_0^x = x^2/4

for 0 < x < 2. The 75th percentile, η (0.75), is found as follows:

0.75 = F (η (0.75)) = η (0.75)^2/4 ⇒ 3 = η (0.75)^2 ⇒ η (0.75) = ±√3.

But since X is a measure of time, −√3 is not a possible value. Therefore, the 75th percentile is η (0.75) = √3. ¨
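Percentile equations like this one can also be solved numerically when no closed form is available. The bisection sketch below (an added illustration) recovers η(0.75) = √3 from F(x) = x^2/4.

```python
import math

# Solve F(x) = 0.75 on [0, 2] by bisection, where F(x) = x^2 / 4.
def F(x):
    return x * x / 4.0

lo, hi = 0.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if F(mid) < 0.75:
        lo = mid
    else:
        hi = mid

print(round(lo, 6), round(math.sqrt(3.0), 6))   # 1.732051 1.732051
```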

The median of a continuous distribution, denoted by µ̃, is the 50th percentile, so µ̃ satisfies 0.5 = F (µ̃). That is, half the area under the density curve is to the left of µ̃ and half is to the right of µ̃.

Expected Values for Continuous Random Variables The expected or mean value of a continuous random variable X with pdf f (x) is:

µ_X = E (X) = ∫_{−∞}^{∞} x · f (x) dx.

If X is a continuous random variable with pdf f (x) and h (X) is any function of X, then:

E [h (X)] = µ_{h(X)} = ∫_{−∞}^{∞} h (x) · f (x) dx.


The Variance of a Continuous Random Variable The variance of a continuous random variable X is:

σ_X^2 = V (X) = ∫_{−∞}^{∞} (x − µ)^2 · f (x) dx = E[(X − µ)^2]

and the standard deviation of X is σ_X = √V (X).

It is usually more convenient to compute V (X) as:

V (X) = E(X^2) − [E (X)]^2,

but keep in mind that the variance is a measure of spread about the mean. E(X^2) is called the second moment of X (so the variance is the second moment about the mean), and the mean is the first moment.

Example (Devore, Page 153, Exercise 22) The weekly demand for propane gas (in 1000s of gallons) from a particular facility is a random variable X with pdf:

f (x) = { 2(1 − 1/x^2)   1 ≤ x ≤ 2,
        { 0              otherwise.

a. Compute the cdf of X.

Solution For 1 ≤ x ≤ 2:

F (x) = ∫_{−∞}^{x} f (y) dy = 2 ∫_1^x (1 − 1/y^2) dy = 2[y + 1/y]_1^x = 2(x + 1/x) − 4,

so:

F (x) = { 0                 x < 1,
        { 2(x + 1/x) − 4   1 ≤ x ≤ 2,
        { 1                 x > 2.

b. Obtain an expression for the (100p)th percentile. What is the value of µ̃?

Solution Let x_p be the (100p)th percentile. Then:

p = 2(x_p + 1/x_p) − 4
⇒ 2x_p^2 − (4 + p)x_p + 2 = 0
⇒ x_p = (1/4)(4 + p ± √(p^2 + 8p)),

but since 0 < p < 1 and 1 ≤ x_p ≤ 2:

x_p = (1/4)(4 + p + √(p^2 + 8p)).

To find µ̃, set p = 0.5 to obtain:

µ̃ = (1/4)(4 + 0.5 + √((0.5)^2 + 8 (0.5))) = 1.6404.


c. Compute E (X) and V (X).

Solution Using the definitions:

E (X) = ∫_1^2 2x(1 − 1/x^2) dx = 2 ∫_1^2 (x − 1/x) dx = 2[x^2/2 − ln x]_1^2 = 1.6137

E(X^2) = ∫_1^2 2x^2(1 − 1/x^2) dx = 2 ∫_1^2 (x^2 − 1) dx = 2[x^3/3 − x]_1^2 = 2.6667

V (X) = E(X^2) − [E (X)]^2 = 2.6667 − (1.6137)^2 = 0.0627.

d. If 1.5 thousand gallons is in stock at the beginning of the week and no new supply is due in during the week, how much of the 1.5 thousand gallons is expected to be left at the end of the week?

Solution The amount left is equal to max (1.5 − X, 0). Therefore:

E (Amount Left) = ∫_1^2 max (1.5 − x, 0) · f (x) dx = 2 ∫_1^{1.5} (1.5 − x)(1 − 1/x^2) dx = 0.0609.
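All four parts can be re-checked by numerical integration. The sketch below (added, not from the notes) applies a midpoint rule to f(x) = 2(1 − 1/x^2). Note the variance comes out as 0.0626 here; the 0.0627 above reflects rounding E(X) to four decimals before squaring.

```python
# Numerical re-check of the propane example on [1, 2].
def f(x):
    return 2.0 * (1.0 - 1.0 / (x * x))

def integrate(g, a, b, n=100_000):
    # Midpoint Riemann sum.
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

EX = integrate(lambda x: x * f(x), 1.0, 2.0)
EX2 = integrate(lambda x: x * x * f(x), 1.0, 2.0)
VX = EX2 - EX ** 2
left = integrate(lambda x: max(1.5 - x, 0.0) * f(x), 1.0, 2.0)

print(round(EX, 4), round(VX, 4), round(left, 4))
```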


¨

Example (Devore, Page 152, Exercise 15) Suppose the pdf of weekly gravel sales X (in tons) is:

f (x) = { 2 (1 − x)   0 ≤ x ≤ 1,
        { 0           otherwise.

a. Obtain the cdf of X.


Solution For 0 ≤ x ≤ 1:

F (x) = ∫_{−∞}^{x} f (y) dy = ∫_0^x 2 (1 − y) dy = 2[y − y^2/2]_0^x = 2(x − x^2/2).

Thus:

F (x) = { 0               x < 0,
        { 2(x − x^2/2)   0 ≤ x ≤ 1,
        { 1               x > 1.

b. What is P (X ≤ 1/2)?

Solution P (X ≤ 1/2) = F (1/2) = 2[1/2 − (1/2)^2/2] = 3/4 = 0.75.

c. Using part (a), what is P (1/4 < X ≤ 1/2)? What is P (1/4 ≤ X ≤ 1/2)?

Solution P (1/4 < X ≤ 1/2) = P (1/4 ≤ X ≤ 1/2) = F (1/2) − F (1/4) = 3/4 − 7/16 = 5/16 = 0.3125.

d. What is the 75th percentile of the sales distribution?

Solution We must solve F (η (0.75)) = 0.75 for η (0.75):

0.75 = 2(η (0.75) − η (0.75)^2/2) ⇒ η (0.75) − η (0.75)^2/2 = 3/8
⇒ η (0.75)^2 − 2η (0.75) + 3/4 = 0 ⇒ η (0.75) ∈ {1/2, 3/2}.

Since 3/2 is outside the range of X, η (0.75) = 1/2.

e. What is the median µ̃ of the sales distribution?

Solution 0.5 = F (µ̃) = 2(µ̃ − µ̃^2/2) ⇒ µ̃^2 − 2µ̃ + 1/2 = 0 ⇒ µ̃ = (2 − √2)/2 = 0.2929.

f. Compute E (X) and σX .

Solution From the definitions:

E (X) = ∫_0^1 2x (1 − x) dx = 2[x^2/2 − x^3/3]_0^1 = 1/3 = 0.3333

E(X^2) = ∫_0^1 2x^2 (1 − x) dx = 2[x^3/3 − x^4/4]_0^1 = 1/6 = 0.1667

V (X) = 1/6 − (1/3)^2 = 1/6 − 1/9 = 1/18 = 0.0556

σ_X = √V (X) = √(1/18) = √2/6 = 0.2357. ¨

4.3 – The Normal Distribution

The normal distribution is the most common and important distribution in statistics and applications. Many random variables (e.g., heights, weights, reaction times) follow a normal distribution.


Even when the underlying distribution is discrete, the normal distribution is often a very good approximation. Sums and averages of non-normal variables will usually be approximately normally distributed.

A continuous random variable X is said to have a normal distribution with parameters µ and σ if the pdf of X is:

f (x; µ, σ) = 1/(σ√(2π)) exp[−(x − µ)^2/(2σ^2)]

for −∞ < x < ∞. The statement that X is normally distributed with parameters µ and σ^2 is often abbreviated as X ∼ N (µ, σ^2). The parameters µ and σ define the normal family. Showing that the pdf integrates to 1 is not simple by hand or even with Maple, because no closed form exists for the antiderivative.

If X ∼ N (µ, σ^2), then the mean and variance are:

E (X) = µ
V (X) = σ^2.

Since the normal density curve is symmetric, the center of the single peak of the bell-shaped curve represents both the median and the mean. The spread of the distribution as given by σ is reflected in the shape of the bell and the amount of area in the tails. The smaller the value of σ, the higher the peak and the greater the area around µ. The area under the normal density curve will be approximately 0.68 within one standard deviation of the mean, and P (|X − µ| ≤ 2σ) ≈ 0.95.

The Standard Normal Distribution Rather than compute probabilities by numerically evaluating:

P (a ≤ X ≤ b) = ∫_a^b 1/(σ√(2π)) exp[−(x − µ)^2/(2σ^2)] dx,

we “standardize” to the random variable:

Z = (X − µ)/σ.

The standard normal random variable Z reflects how many standard deviations we are from the mean. In other words, X = µ + σZ.

The standard normal random variable Z has parameters µ = 0 and σ = 1 and has pdf:

f (z; 0, 1) = 1/√(2π) exp[−z^2/2]

for −∞ < z < ∞. Notationally, we write Z ∼ N (0, 1). The cdf of Z, denoted by Φ (z), is:

Φ (z) = P (Z ≤ z) = ∫_{−∞}^{z} f (y; 0, 1) dy.


For any random variable X ∼ N (µ, σ^2), we standardize to Z via Z = (X − µ)/σ and compute any probability as:

P (a ≤ X ≤ b) = P ((a − µ)/σ ≤ Z ≤ (b − µ)/σ)

according to standard normal tables such as Table A.3 in Devore (pp. 704-705). More formally, any probability involving X can be expressed as a probability involving a standard normal random variable Z. Hence:

P (X ≤ x) = P (Z ≤ (x − µ)/σ) = Φ ((x − µ)/σ).

So, assuming a handy method of integration such as Maple is not available, we can always convert any probability statement involving X into one in terms of Z. These should already be familiar calculations from other courses.
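When neither a table nor Maple is at hand, Φ can be evaluated with the standard library's error function via the standard identity Φ(z) = [1 + erf(z/√2)]/2. The snippet below (an added illustration) reproduces two table entries.

```python
import math

# Standard normal cdf via the stdlib error function.
def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(phi(1.25), 4))    # 0.8944
print(round(phi(-0.38), 4))   # 0.352
```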

Example If X ∼ N (0.5, 1), compute P (0.28 < X < 1.75).

Solution P (0.28 < X < 1.75) = P ((0.28 − 0.5)/1 < Z < (1.75 − 0.5)/1) = P (−0.22 < Z < 1.25) = Φ (1.25) − Φ (−0.22) = 0.8944 − 0.4129 = 0.4815. ¨


Percentiles of an Arbitrary Normal Distribution In statistical inference, percentiles of the standard normal distribution which have area α to the right are denoted by z_α, the 100 (1 − α)th percentile of the standard normal distribution. The (100α)th percentile is −z_α. These “critical values” given in the table below are seen quite often in applications and hypothesis testing.

The (100p)th percentile of a normal distribution with mean µ and standard deviation σ can be obtained by finding the (100p)th percentile of the standard normal distribution and reversing the standardization. Hence, if X has a normal distribution with mean µ and standard deviation σ, the (100p)th percentile for X is given by:

[(100p)th percentile for N (µ, σ)] = µ + [(100p)th percentile for Z] · σ.

Example (Devore, Page 165, Exercise 39) The distribution of resistance for resistors of a certain type is known to be normal. 10% of all resistors have a resistance exceeding 10.256 ohms and 5% have resistance smaller than 9.671 ohms. What are the mean value and standard deviation of the resistance distribution?

Solution We are given that P (X > 10.256) = 0.10 and P (X < 9.671) = 0.05. Hence:

P (Z > (10.256 − µ)/σ) = 0.10 and P (Z < (9.671 − µ)/σ) = 0.05.

From the standard normal tables, we see that P (Z > 1.28) = 0.10 and P (Z < −1.65) = 0.05. Therefore we have two equations in two unknowns:

(10.256 − µ)/σ = 1.28 ⇒ µ + 1.28σ = 10.256
(9.671 − µ)/σ = −1.65 ⇒ µ − 1.65σ = 9.671.

Subtracting the bottom equation from the top equation gives:

µ + 1.28σ − (µ− 1.65σ) = 10.256− 9.671

1.28σ + 1.65σ = 10.256− 9.671

2.93σ = 0.585

σ = 0.1997.

Substitution gives µ = 10. Hence the distribution of the resistance is normal with mean 10 ohms and standard deviation 0.1997 ohms. ¨
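The little linear system can also be solved mechanically; this added snippet just repeats the elimination in code.

```python
# Solve  mu + 1.28*sigma = 10.256  and  mu - 1.65*sigma = 9.671.
sigma = (10.256 - 9.671) / (1.28 + 1.65)
mu = 10.256 - 1.28 * sigma
print(round(mu, 2), round(sigma, 4))   # 10.0 0.1997
```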

Example (Devore, Page 165, Exercise 35a) If a normal distribution has µ = 25 and σ = 5, what is the 91st percentile of the distribution?

Solution For the standard normal random variable Z ∼ N (0, 1), z_{0.09} = 1.34. Hence:

η (0.91) = 25 + (1.34) (5) = 31.7. ¨

Example (Devore, Page 165, Exercise 35c) The width of a line etched on an integrated circuit chip is normally distributed with mean 3.000 µm and standard deviation 0.150. What width value separates the widest 10% of all such lines from the other 90%?

Solution Let X = width of a line etched on an integrated circuit chip. Then X ∼ N (3.000, 0.150^2). The widest 10% corresponds to the 90th percentile and z_{0.10} = 1.28. Hence:

η (0.90) = 3.000 + (1.28) (0.150) = 3.192. ¨


The Normal Approximation to the Binomial Distribution The normal distribution is often used as an approximation to the distribution of values in a discrete population. The point mass P (X = x) from the discrete distribution is replaced with the interval probability P (x − 1/2 ≤ X ≤ x + 1/2).

Devore notes in Example 4.19 that IQ scores are usually assumed to be normally distributed although the score is an integer. The histogram of scores has rectangles which are centered at integers. In using the normal approximation, the area under the normal density curve between 124.5 and 125.5 would approximate the point mass P (X = 125). The correction of 0.5 is called the continuity correction.

The binomial distribution is often approximated by the normal distribution. This approximation is considered reasonable if np ≥ 5 and n (1 − p) ≥ 5. If these two conditions hold, then X ∼ BIN(n, p) is approximately normal with mean µ = np and variance σ^2 = np (1 − p).

More formally, if X ∼ BIN(n, p) with np ≥ 5 and n (1 − p) ≥ 5:

P (X ≤ x) ≈ Φ ((x + 0.5 − np)/√(np (1 − p))),

where Φ is the standard normal cdf.

The following two figures illustrate this approximation. The figure on the left is a histogram of 5000 samples from a BIN(20, 0.1) distribution and the figure on the right is a histogram of 5000 samples from a BIN(20, 0.5) distribution. Note that the first figure has np = 2 < 5, and the distribution is still rather skewed. However, the figure on the right satisfies the necessary conditions and appears to be approximately normal.

The 0.5 is a continuity correction which ensures that all of the probability to the left of the original x is included. If we are interested in the binomial probability P (a ≤ X ≤ b), then:

P (a ≤ X ≤ b) ≈ Φ ((b + 0.5 − np)/√(np (1 − p))) − Φ ((a − 0.5 − np)/√(np (1 − p))).


Example (Devore, Page 166, Exercise 49) Suppose only 40% of all drivers in a certain state regularly wear a seatbelt. A random sample of 500 drivers is selected. What is the probability that:

a. Between 180 and 230 (inclusive) of the drivers in the sample regularly wear their seatbelt?

Solution Let X = the number of drivers in the sample who regularly wear their seatbelt. Then X ∼ BIN(500, 0.40). Since np = 200 and n (1 − p) = 300 (both much larger than 5), the normal approximation should be very good here, i.e., X is approximately N (200, 120). Therefore:

P (180 ≤ X ≤ 230) ≈ Φ ((230.5 − 200)/√120) − Φ ((179.5 − 200)/√120)
= Φ (2.78) − Φ (−1.87) = 0.9973 − 0.0307 = 0.9666.

b. Fewer than 175 of those in the sample regularly wear a seatbelt?

Solution P (X < 175) ≈ Φ ((174.5 − 200)/√120) = Φ (−2.33) = 0.0099. ¨

4.4 and 4.5 – The Gamma Family and Other Continuous Distributions

Many situations exist where the normal family of distributions is not applicable. Other models for continuous variables are used in many practical situations. One such application is in reliability or survival analysis. Survival times or times to failure are typically not normally distributed. Although specification of a distribution is not always necessary, it is often done effectively.


The Lognormal Distribution The lognormal distribution may be specified for a random variable whose natural logarithm is normally distributed. A continuous random variable X is said to have a lognormal distribution if Y = ln (X) has a normal distribution. Since Y ∼ N (µ, σ^2), X = exp (Y) is also specified in terms of µ and σ. The pdf of X is:

f (x; µ, σ) = { 1/(σx√(2π)) exp(−[ln (x) − µ]^2/(2σ^2))   x ≥ 0,
             { 0                                          x < 0.

Note that X must be non-negative and that:

E (X) = exp(µ + σ^2/2)
V (X) = exp(2µ + σ^2) · [exp(σ^2) − 1].

Also note that E (X) ≠ exp (µ) and V (X) ≠ exp (σ^2). In other words, we cannot simply transform the parameters via exponentiation. The following figure illustrates the graphs of several lognormal density functions.

Because ln (X) has a normal distribution, the cdf of X can be expressed in terms of the cdf Φ (z) of a standard normal random variable Z. For x ≥ 0:

F (x; µ, σ) = P (X ≤ x) = P [ln (X) ≤ ln (x)] = P (Z ≤ (ln (x) − µ)/σ) = Φ ((ln (x) − µ)/σ).

Example (Devore, Page 178, Example 4.27) Let X = the hourly median power (in decibels) of received radio signals transmitted between two cities. The authors of a journal article argue that the lognormal distribution provides a reasonable probability model for X. Suppose the parameter values are µ = 3.5 and σ = 1.2.

a. What are E (X) and V (X)?

Solution Using the proposition above:

E (X) = exp(3.5 + (1.2)^2/2) = 68.0335

V (X) = exp(2 (3.5) + (1.2)^2) · [exp(1.2^2) − 1] = 14907.1677.


b. What is the probability that received power is between 50 and 250 dB?

Solution Using the standard normal cdf:

P (50 ≤ X ≤ 250) = F (250; 3.5, 1.2) − F (50; 3.5, 1.2)
= Φ ((ln (250) − 3.5)/1.2) − Φ ((ln (50) − 3.5)/1.2)
= Φ (1.68) − Φ (0.34) = 0.9535 − 0.6331 = 0.3204.

c. What is the probability that X does not exceed its mean?

Solution Using the standard normal cdf:

P (X ≤ 68.0) = Φ ((ln (68.0) − 3.5)/1.2) = Φ (0.60) = 0.7257.

If the distribution were symmetric, this probability would equal 0.5; it is much larger because of the positive skew (long upper tail) of the distribution, which pulls µ out past the median. ¨
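Both parts of this example can be reproduced with the error-function form of Φ; the added check below computes the moments and the probability directly (small differences in the last digit come from table rounding).

```python
import math

# Lognormal with mu = 3.5, sigma = 1.2: moments and P(50 <= X <= 250).
def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 3.5, 1.2
EX = math.exp(mu + sigma ** 2 / 2.0)
VX = math.exp(2.0 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1.0)
p = phi((math.log(250.0) - mu) / sigma) - phi((math.log(50.0) - mu) / sigma)

print(round(EX, 4), round(VX, 4), round(p, 4))
```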

Example (Devore, Page 181, Exercise 73) A theoretical justification based on a certain material failure mechanism underlies the assumption that ductile strength X of a material has a lognormal distribution. Suppose the parameters are µ = 5 and σ = 0.1.

a. Compute E (X) and V (X).

Solution Using the proposition above:

E (X) = exp(5 + (0.1)^2/2) = 149.1571

V (X) = exp(2 (5) + (0.1)^2) · [exp(0.1^2) − 1] = 223.5945.

b. Compute P (X > 120) and P (110 ≤ X ≤ 130).

Solution Using the standard normal cdf:

P (X > 120) = P (Z > (ln (120) − 5)/0.1) = 1 − Φ (−2.12) = 0.9830

P (110 ≤ X ≤ 130) = P ((ln (110) − 5)/0.1 ≤ Z ≤ (ln (130) − 5)/0.1)
= Φ (−1.32) − Φ (−3.00) = 0.0921.

c. What is the value of median ductile strength?

Solution µ̃ = exp (5) = 148.4132.

d. If ten different samples of an alloy steel of this type were subjected to a strength test, how many would you expect to have strength at least 120?

Solution Let Y = the number out of 10 which have strength at least 120. Then:

E (Y ) = 10P (X > 120) = 10 (0.9830) = 9.8300.


e. If the smallest 5% of strength values were unacceptable, what would the minimum acceptable strength be?

Solution We want the 5th percentile. Hence:

0.05 = Φ ((ln (η (0.05)) − 5)/0.1) ⇒ η (0.05) = exp[5 + (0.1)(−1.645)] = 125.9015.


¨


The Weibull Distribution The Weibull distribution is a very flexible distributional family which is very applicable in reliability and survival analysis. Its flexibility allows it to be a reasonable model for a variety of random variables with skewed probability distributions.

A random variable X is said to follow a Weibull distribution with parameters α and β (α > 0, β > 0) if the pdf of X is:

f (x; α, β) = { (α/β^α) x^{α−1} exp[−(x/β)^α]   x ≥ 0,
             { 0                               x < 0.

A variety of shapes are possible. As the figures below show, changing the scale parameter β for a fixed α stretches the pdf.

Integrating to obtain E (X) and E (X^2) yields:

E (X) = βΓ(1 + 1/α)

V (X) = β^2 {Γ(1 + 2/α) − [Γ(1 + 1/α)]^2},

where Γ (α) is the gamma function:

Γ (α) = ∫_0^∞ x^{α−1} exp (−x) dx.

The most important properties of the gamma function are:

• For any α > 1, Γ (α) = (α − 1) Γ (α − 1),

• For any positive integer n, Γ (n) = (n − 1)!, and

• Γ(1/2) = √π.

One of the most useful features of the Weibull distribution is the simple form of the cdf:

F (x; α, β) = { 1 − exp[−(x/β)^α]   x ≥ 0,
             { 0                    x < 0.

Note that X may be shifted such that the minimum value is not 0 as defined in the pdf. If this minimum value is some unknown but positive γ, the cdf of X is obtained by replacing x with x − γ.

Example (Devore, Page 180, Exercise 67) The authors of a paper state that “the Weibull distribution is widely used in statistical problems relating to aging of solid insulating materials subjected to aging and stress.” They propose the use of the distribution as a model for time (in hours) to failure of solid insulating specimens subjected to AC voltage. The values of the parameters depend on the voltage and temperature; suppose that α = 2.5 and β = 200 (values suggested by data in the paper).

a. What is the probability that a specimen’s lifetime is at most 200? Less than 200? More than 300?

Solution Let X = a specimen’s time to failure. Then the desired probabilities are:

P (X ≤ 200) = 1 − exp[−(200/200)^{2.5}] = 1 − exp (−1) = 0.6321

P (X < 200) = P (X ≤ 200) = 0.6321

P (X > 300) = 1 − P (X ≤ 300) = exp[−(300/200)^{2.5}] = 0.0636.

b. What is the probability that a specimen’s lifetime is between 100 and 200?

Solution P (100 ≤ X ≤ 200) = exp[−(100/200)^{2.5}] − exp[−(200/200)^{2.5}] = 0.4701.

c. What value is such that exactly 50% of all specimens have lifetimes exceeding that value?

Solution We want the median. The equation F (µ̃) = 0.5 reduces to:

0.5 = exp[−(µ̃/200)^{2.5}].

Therefore µ̃ = 172.7223.
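The closed-form cdf makes these answers easy to re-derive; the added sketch below also computes the median directly as β(ln 2)^{1/α}.

```python
import math

# Weibull(alpha = 2.5, beta = 200) checks via the closed-form cdf.
alpha, beta = 2.5, 200.0

def F(x):
    return 1.0 - math.exp(-((x / beta) ** alpha)) if x >= 0 else 0.0

p_at_most_200 = F(200.0)
p_100_to_200 = F(200.0) - F(100.0)
median = beta * math.log(2.0) ** (1.0 / alpha)   # solves F(m) = 0.5

print(round(p_at_most_200, 4), round(p_100_to_200, 4), round(median, 2))
```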


¨


The Beta Distribution We often need to model the distribution of a proportion (i.e., 0 < X < 1), where X is a continuous random variable. The beta distribution is often used in this framework. X is said to have a beta distribution with parameters α, β, A, and B if the pdf of X is:

f (x; α, β, A, B) = { [1/(B − A)] · [Γ(α + β)/(Γ(α)Γ(β))] · ((x − A)/(B − A))^{α−1} ((B − x)/(B − A))^{β−1}   A ≤ x ≤ B,
                    { 0   otherwise.

The case A = 0 and B = 1 gives the standard beta distribution. The mean and variance for a beta distribution are:

E (X) = A + (B − A) · α/(α + β)

V (X) = (B − A)^2 αβ / [(α + β)^2 (α + β + 1)].

The Family of Gamma Distributions Another widely used distributional model for skewed data is the gamma family of distributions.

A continuous random variable X is said to have a gamma distribution if the pdf of X is:

f (x; α, β) = { [1/(β^α Γ (α))] x^{α−1} exp(−x/β)   x ≥ 0,
             { 0                                   otherwise,

where α > 0, β > 0, and Γ (α) is the gamma function defined above. The standard gamma distribution has β = 1.

For the standard gamma distribution, the pdf is f (x; α) = [1/Γ(α)] x^(α−1) exp(−x). The pdf is strictly decreasing for any α ≤ 1 and has a skewed shape (with a definite maximum) for any α > 1. The figures below illustrate some gamma density functions and standard gamma density functions.

The mean and variance for a gamma distribution are:

E (X) = αβ

V (X) = αβ².


The proof for E (X) = αβ is:

E (X) = ∫_{−∞}^{∞} x · f (x) dx = ∫_0^∞ [x · x^(α−1) exp(−x/β)] / [β^α Γ(α)] dx = ∫_0^∞ (αβ/αβ) · [x · x^(α−1) exp(−x/β)] / [β^α Γ(α)] dx

= αβ ∫_0^∞ [x^α exp(−x/β)] / [β^(α+1) Γ(α + 1)] dx = αβ (1) = αβ,

since Γ(α + 1) = αΓ(α), and the final integrand is the GAMMA(α + 1, β) pdf, which integrates to 1.

The parameter α is the shape parameter. For α ≤ 1, the pdf is strictly decreasing. The parameter β is a scale parameter. It stretches (β > 1) or compresses (β < 1) the pdf for a constant α. Therefore, the gamma pdf can be very flexible for representing many skewed distributions.

The cdf of the standard gamma distribution (β = 1) is given by the incomplete gamma function:

F (x; α) = ∫_0^x [y^(α−1) exp(−y)] / Γ(α) dy,

for x > 0. Table A.4 in Devore (p. 706) gives values for the incomplete gamma function for various values of x and α.

The incomplete gamma function can also be used for a general β. Let X ∼ GAMMA(α, β). Then:

P (X ≤ x) = F (x/β; α),

where F is the incomplete gamma function.
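This rescaling identity can be checked numerically: SciPy's scipy.special.gammainc is the regularized lower incomplete gamma function, i.e., F(x; α) in the notation above (parameter values here are illustrative):

```python
# Check that P(X <= x) = F(x/beta; alpha), assuming scipy is available.
from scipy.special import gammainc
from scipy.stats import gamma

alpha, beta = 4.0, 6.0    # illustrative parameter values
x = 24.0

lhs = gamma.cdf(x, a=alpha, scale=beta)   # P(X <= x) for X ~ GAMMA(alpha, beta)
rhs = gammainc(alpha, x / beta)           # incomplete gamma F(x/beta; alpha)
print(round(lhs, 3), round(rhs, 3))       # both 0.567
```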

Example (Devore, Page 173, Exercise 57) Suppose that when a transistor of a certain type is subjected to an accelerated life test, the lifetime X (in weeks) has a gamma distribution with mean 24 weeks and standard deviation 12 weeks.

a. What is the probability that a transistor will last between 12 and 24 weeks?

Solution We are given that αβ = 24 and αβ² = 12² = 144. Therefore:

β = αβ²/(αβ) = 144/24 = 6

α = 24/β = 24/6 = 4.

Hence:

P (12 ≤ X ≤ 24) = F (24/6; 4) − F (12/6; 4) = F (4; 4) − F (2; 4) = 0.567 − 0.143 = 0.424.

b. What is the probability that a transistor will last at most 24 weeks? Is the median of the lifetime distribution less than 24? Why or why not?


Solution The desired probability is:

P (X ≤ 24) = F (4; 4) = 0.567,

so while the mean is 24, the median is less than 24. This is a result of the positive skew of the gamma distribution.

c. What is the 99th percentile of the lifetime distribution?

Solution We need c such that 0.99 = P (X ≤ c) = F (c/6; 4). From Table A.4, we see that c/6 = 10. Hence c = 60.

d. Suppose the test will actually be terminated after t weeks. What value of t is such that only one-half of 1% of all transistors would still be operating at termination?

Solution The desired value of t is the 99.5th percentile, so:

0.995 = F (t/6; 4) ⇒ t/6 = 11 ⇒ t = 66.
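If SciPy is available, the table lookups in this example can be reproduced directly; because Table A.4 is coarse, the percentile answers differ slightly from the tabled values of 60 and 66:

```python
# Reproducing the transistor example without Table A.4, assuming scipy.
from scipy.stats import gamma

alpha, beta = 4, 6                        # solved from mean 24 and sd 12
X = gamma(a=alpha, scale=beta)

print(round(X.cdf(24) - X.cdf(12), 3))    # part a -> 0.424
print(round(X.cdf(24), 3))                # part b -> 0.567
print(round(X.ppf(0.99), 1))              # part c: near 60
print(round(X.ppf(0.995), 1))             # part d: near 66
```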


The Exponential Distribution A special case of the gamma distribution is the exponential distribution. Taking α = 1 and β = 1/λ gives the exponential pdf:

f (x; λ) = λ exp(−λx) for x ≥ 0, and 0 for x < 0,

for λ > 0. The expected value and variance of an exponential random variable X are:

E (X) = αβ = (1)(1/λ) = 1/λ

V (X) = αβ² = (1)(1/λ)² = 1/λ².

The cdf of an exponential random variable X is:

F (x; λ) = 1 − exp(−λx) for x ≥ 0, and 0 for x < 0.

The exponential distribution is also a special case of the Weibull distribution with α = 1 and β = 1/λ.

The exponential distribution is frequently used as the model for the distribution of inter-arrival times, or the times between successive events.

We have already discussed the Poisson distribution as a possible model for the number of events occurring in a time interval of length t. If the occurrence of such events is distributed as Poisson with parameter αt, then the elapsed time between successive events is distributed as exponential with parameter λ = α.

Another application of the exponential distribution is to model the distribution of component lifetime. This application relies on the "memoryless" property:

P (X ≥ t2 | X ≥ t1) = P (X ≥ t) = exp (−λt) ,

for t2 = t1 + t. Although we haven't discussed conditional probability in much detail, this property says that the probability of surviving an additional t units of time is the same as the probability of surviving any t units of time, irrespective of the fact that we know the component didn't fail during the first t1 units.
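The memoryless property is easy to illustrate numerically; in the sketch below the rate λ and the times t1, t are arbitrary illustrative choices:

```python
# Numeric illustration of the memoryless property:
# P(X >= t1 + t | X >= t1) equals P(X >= t) for any t1, t.
import math

lam = 0.25            # illustrative rate lambda
t1, t = 8.0, 5.0      # illustrative times

def sf(x):
    """P(X >= x) for an exponential random variable with rate lam."""
    return math.exp(-lam * x)

conditional = sf(t1 + t) / sf(t1)   # P(X >= t1 + t) / P(X >= t1)
print(conditional, sf(t))           # both equal exp(-lam * t)
```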

Example (Devore, Page 173, Exercise 59) The time X (in seconds) that it takes a librarian to locate an entry in a file of records on checked-out books has an exponential distribution with expected time 20 seconds. Calculate the following probabilities: P (X ≤ 30), P (X ≥ 20), and P (20 ≤ X ≤ 30).

Solution The value of λ is λ = 1/E(X) = 1/20. The desired probabilities are then:

P (X ≤ 30) = 1 − exp(−30/20) = 0.7769

P (X ≥ 20) = 1 − [1 − exp(−20/20)] = exp(−1) = 0.3679

P (20 ≤ X ≤ 30) = [1 − exp(−30/20)] − [1 − exp(−20/20)] = 0.1447.

Note that for this non-symmetric distribution, the probability of an observed X being greater than the mean is not 0.5.
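The librarian probabilities can also be computed with SciPy, which parameterizes the exponential by scale = 1/λ = E(X):

```python
# The librarian example via scipy.stats.expon (scale = mean = 1/lambda).
from scipy.stats import expon

X = expon(scale=20)   # E(X) = 20 seconds, so lambda = 1/20

print(round(X.cdf(30), 4))               # P(X <= 30) -> 0.7769
print(round(X.sf(20), 4))                # P(X >= 20) -> 0.3679
print(round(X.cdf(30) - X.cdf(20), 4))   # P(20 <= X <= 30) -> 0.1447
```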


The Chi-Squared Distribution The chi-squared distribution is another member of the gamma family with α = ν/2 and β = 2. A random variable X has a chi-squared distribution with ν degrees of freedom, denoted X ∼ χ²_ν, if it has pdf:

f (x; ν) = [1/(2^(ν/2) Γ(ν/2))] x^((ν/2)−1) exp(−x/2) for x ≥ 0, and 0 for x < 0,

for ν = 1, 2, 3, ... . If X ∼ χ²_ν, then:

E (X) = ν

V (X) = 2ν.

The chi-squared distribution is widely used in statistical inference. In fact, if X ∼ N (0, 1), then X² ∼ χ²_1. The most commonly used procedure in inference which involves the chi-squared distribution is the Pearson statistic:

Σ_{i=1}^{k} (O_i − E_i)²/E_i,

used to see whether categorical data consisting of counts y_1, y_2, ..., y_k in k cells fits pre-specified probabilities of falling in these cells. Under fairly general conditions, this statistic will (in the long run for many samples) follow a χ²_{k−1} distribution.
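As a sketch of how the Pearson statistic is computed and compared to the χ²_{k−1} reference distribution, the counts below are hypothetical (not from the text) and equal cell probabilities are assumed:

```python
# Pearson statistic for hypothetical counts in k = 4 cells against equal
# cell probabilities, compared to chi^2_{k-1} (assuming scipy is available).
from scipy.stats import chi2

observed = [18, 25, 30, 27]            # hypothetical cell counts
n, k = sum(observed), len(observed)
expected = [n / k] * k                 # E_i under equal probabilities

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi2.sf(stat, df=k - 1)      # tail area under chi^2_{k-1}
print(round(stat, 2))                  # -> 3.12
print(round(p_value, 3))
```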


4.6 – Probability Plots

Given a sample of size n (x_1, x_2, ..., x_n), how do we decide whether or not a particular distribution is appropriate for modelling? Assuming we have a continuous random variable, a probability plot is a mechanism for comparing the distribution of the observed data with a reference distribution in order to determine if the family of the reference distribution is a reasonable model for the data. The method makes use of sample quantiles. The pth quantile is the (100p)th percentile. Recall that the (100p)th percentile is the number η(p) such that F (η(p)) = p.

We order the data set from smallest to largest to obtain the sample order statistics, x(1), x(2), ..., x(n), where x(i) is the ith smallest observation. Because the position of a percentile may fall between two observations (for instance, when n is even), we define x(i) to be the [100(i − 0.5)/n]th sample percentile. A probability plot uses the ordered pairs:

([100(i − 0.5)/n]th Reference Percentile, ith Smallest Sample Observation).

In other words, we're looking to see how closely the ith smallest sample observation matches the [100(i − 0.5)/n]th percentile of the reference distribution. A plot close to a straight line indicates a good match.
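The construction above can be sketched directly for a standard normal reference distribution; the data below are hypothetical, and scipy is assumed available for the normal quantile function:

```python
# Building the (reference percentile, order statistic) pairs for a normal
# probability plot by hand.
from scipy.stats import norm

data = [2.1, 3.4, 1.8, 2.9, 4.2, 3.1, 2.5, 3.8]   # hypothetical sample
n = len(data)
order_stats = sorted(data)                         # x(1), ..., x(n)

# The [100(i - 0.5)/n]th percentiles of the standard normal reference.
ref = [norm.ppf((i - 0.5) / n) for i in range(1, n + 1)]

pairs = list(zip(ref, order_stats))
print(pairs[0], pairs[-1])   # points to plot; a straight line indicates fit
```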

Example (Devore, Page 183, Example 4.29) The value of a certain physical constant is known to an experimenter. The experimenter makes n = 10 independent measurements of this value using a particular measurement device and records the resulting measurement errors. These observations appear in the following table (ordered with the corresponding reference percentiles).

Is it plausible that the random variable measurement error has a standard normal distribution? The needed standard normal percentiles are also displayed in the table. Thus, the points in the probability plot are (−1.645, −1.91), (−1.037, −1.25), ..., and (1.645, 1.56). The figure below shows the resulting plot. Although the points deviate a bit from the 45° line, the predominant impression is that this line fits the points very well. The plot suggests that the standard normal distribution is a reasonable probability model for measurement error.


Families other than the normal family can also be compared to a standard distribution. For two-parameter families with location and scale parameters θ1 and θ2, this corresponds to θ1 = 0 and θ2 = 1. Hence the percentiles of the standard reference distribution corresponding to 100(i − 0.5)/n for i = 1, 2, ..., n are plotted against the sorted data set, as with the normal distribution. Again, plots close to a straight line indicate a good fit.

Another distribution which is useful when looking at probability or quantile-quantile (QQ) plots is the extreme value distribution with parameters θ1 and θ2. The cdf of the extreme value distribution is:

F (x; θ1, θ2) = 1 − exp[−exp((x − θ1)/θ2)].

When considering a Weibull distribution for X, we know that for X ∼ WEIB(α, β), ln(X) has an extreme value distribution with θ1 = ln(β) and θ2 = 1/α. Because the Weibull family has a shape parameter, it is not a location-scale family and has no single standard reference distribution; transforming to the extreme value distribution, which is a location-scale family, is what makes this result beneficial. Since an exponential distribution is a special case of the Weibull distribution, it can also be checked via this method.
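The log-Weibull relation can be checked numerically: SciPy's gumbel_l has exactly the extreme value cdf 1 − exp[−exp((x − loc)/scale)], and in that parameterization the scale works out to 1/α. The parameter values below are illustrative assumptions:

```python
# Check that ln(X) for X ~ WEIB(alpha, beta) follows the extreme value cdf
# with theta1 = ln(beta) and theta2 = 1/alpha, assuming scipy is available.
import math
from scipy.stats import weibull_min, gumbel_l

alpha, beta = 2.0, 150.0    # illustrative Weibull shape and scale
x = 100.0

lhs = weibull_min.cdf(x, c=alpha, scale=beta)       # P(X <= x)
rhs = gumbel_l.cdf(math.log(x),                     # P(ln X <= ln x) under the
                   loc=math.log(beta),              # extreme value distribution
                   scale=1 / alpha)
print(round(lhs, 4), round(rhs, 4))                 # both 0.3588
```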
