TRANSCRIPT
UNENE Math Refresher Course
Probability and Statistics
Probability and Statistics — Statistical Measures © Wei-Chau Xie
Statistical Measures
Given a sample of n observations x1, x2, . . . , xn of random variable X.
Location:
Central tendency: mean, median, mode
Position: percentiles
Sample Mean (Average, Expected Value): X̄ = (1/n) ∑_{i=1}^{n} xi
Median m is the value above and below which an equal number of observations lie, i.e., P{X < m} = P{X > m} = 0.5
Mode is the most frequently occurring value.
[Figure: uni-modal and bi-modal relative-frequency distributions]
pth percentile Qp is the point below which p-percent of the observations lie.
Median is the 50th percentile, m = Q0.5
Procedure for finding Qp (for 1/n < p < (n−1)/n):
Sort the data in ascending order: x1, x2, . . . , xn
k = integral part of (n+1)p
d = decimal part of (n+1)p
Qp lies between xk and xk+1, i.e., Qp = xk + d (xk+1 − xk)
Variability (dispersion) characterizes individual differences.
Variance measures variation or spread in the data
Var(X) = s² = (1/(n−1)) ∑_{i=1}^{n} (xi − X̄)²
Standard deviation s = √Var(X)
[Figure: two frequency distributions with the same mean X̄1 = X̄2 but different spreads, s2 > s1]
Coefficient of variation δ expresses the standard deviation in relative terms
δ = s/|X̄|, X̄ ≠ 0
The coefficient of variation is useful because the standard deviation of data
must always be understood in the context of the mean of the data.
When comparing data sets with different units or widely different means, use the coefficient of variation for comparison instead of the standard deviation.
Example
We have a batch of 1000 I-beams used for building construction. A sample of 10
tensile strength measurements is obtained:
126, 128, 135, 146, 137, 142, 125, 131, 139, 141
Sample mean X̄ = (1/10) ∑_{i=1}^{10} xi = (1/10)(126 + 128 + 135 + ··· + 141) = 135.0
Sample standard deviation
s² = (1/9) ∑_{i=1}^{10} (xi − X̄)²
= (1/9)[(126−135)² + (128−135)² + (135−135)² + ··· + (141−135)²]
= 472/9 = 52.44 ⟹ s = 7.24
Coefficient of variation δ = s/X̄ = 7.24/135.0 = 0.054
Sort the data in ascending order
125, 126, 128, 131, 135, 137, 139, 141, 142, 146
25th percentile Q0.25
p = 0.25, 0.1< p<0.9
(n+1)p = 11×0.25 = 2.75 =⇒ k = 2, d = 0.75 =⇒ x2<Q0.25<x3
Q0.25 = x2 + d (x3−x2) = 126 + 0.75×(128 − 126) = 127.5
50th percentile Q0.50 = Median
p = 0.50, 0.1< p<0.9
(n+1)p = 11×0.50 = 5.5 =⇒ k = 5, d = 0.5 =⇒ x5<Q0.50<x6
Q0.50 = x5 + d (x6−x5) = 135 + 0.50×(137 − 135) = 136
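These two slides can be cross-checked with a short Python sketch using only the standard library; the `percentile` helper below implements the (n+1)p rule above, with k the integer part and d the decimal part (variable names are ours):

```python
import math

def percentile(data, p):
    """p-th percentile by the (n+1)p rule: k = integer part, d = decimal part."""
    x = sorted(data)
    k, d = divmod((len(x) + 1) * p, 1)
    k = int(k)
    return x[k - 1] + d * (x[k] - x[k - 1])  # x_k + d (x_{k+1} - x_k), 1-based

data = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]
n = len(data)
mean = sum(data) / n                                 # 135.0
s2 = sum((xi - mean) ** 2 for xi in data) / (n - 1)  # 472/9
s = math.sqrt(s2)                                    # ~7.24
delta = s / abs(mean)                                # coefficient of variation
q25 = percentile(data, 0.25)                         # 127.5
q50 = percentile(data, 0.50)                         # 136.0
```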
Probability and Statistics — Probability © Wei-Chau Xie
Probability
Consider an experiment with a finite number of mutually exclusive outcomes
which are equiprobable (equally likely due to the nature of the experiment).
The probability of event A is the fraction of outcomes in which A occurs.
P(A) = N(A)/N
where N = total number of outcomes, and N(A) = number of outcomes leading to the occurrence of event A.
Example
In rolling a single unbiased dice, what is the probability of getting an even
number of spots?
N = 6 mutually exclusive equiprobable outcomes (getting 1, 2, 3, 4, 5, 6)
A = getting an even number of spots, N(A) = 3 (getting 2, 4, 6)
∴ P(A) = N(A)/N = 3/6 = 1/2
Basics of Combinatorial Analysis
Given n1 elements a(1)_1, a(1)_2, . . . , a(1)_{n1},
n2 elements a(2)_1, a(2)_2, . . . , a(2)_{n2},
. . .,
nr elements a(r)_1, a(r)_2, . . . , a(r)_{nr},
there are precisely n1 n2 ··· nr distinct ordered r-tuples (a(1)_{i1}, a(2)_{i2}, . . . , a(r)_{ir}) containing one element of each kind.
A population of n elements has precisely
C(n, r) = n!/(r!(n−r)!)
sub-populations of size r ≤ n.
C(n, r) is called the number of combinations of n things taken r at a time (without regard for order).
Example
In rolling a pair of unbiased dice, what is the probability that both dice show the
same number of spots?
In rolling a pair of dice, each outcome is an ordered pair (a, b),
where a is the number on the first dice, n1 = 6
b is the number on the second dice, n2 = 6
∴ There are N = n1n2 = 6×6 = 36 mutually exclusive equiprobable events.
A = both dice show the same number of spots
∴ A occurs whenever a = b =⇒ N(A) = 6
P(A) = N(A)/N = 6/36 = 1/6
Example (Quality Control)
A batch of 100 manufactured items is checked by an inspector, who examines
10 items selected at random. If none of the 10 items is defective, he accepts the
whole batch. Otherwise, the batch is subjected to further inspection. What is
the probability that a batch containing 10 defective items will be accepted?
Draw 10 items from 100 items: N = C(100, 10) = 100!/(10! 90!)
A = the batch is accepted ⟹ draw 10 items from the 90 non-defective items:
N(A) = C(90, 10) = 90!/(10! 80!)
∴ P(A) = N(A)/N = [90!/(10! 80!)] / [100!/(10! 90!)] = (90! 90!)/(80! 100!) = (81·82···90)/(91·92···100) = 0.3305
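A quick numerical check of this count-based argument, using `math.comb` from the Python standard library:

```python
from math import comb

# Accept the batch iff all 10 drawn items come from the 90 non-defective ones.
N = comb(100, 10)   # total ways to draw 10 items from 100
NA = comb(90, 10)   # draws that avoid all 10 defective items
p_accept = NA / N   # ~0.3305
```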
The mutually exclusive outcomes of an experiment are called elementary
events (sample points).
The set of all elementary events associated with a given experiment is the sample space, denoted by Ω.
Venn Diagram
A sample space Ω is represented by a rectangle; an event A is then represented symbolically by a closed region within this rectangle.
If the area of the rectangle is taken as 1, the area of the closed region A is the probability that event A will occur.
[Figure: rectangle Ω containing a closed region A]
Rolling a balanced dice, Ω = {1, 2, 3, 4, 5, 6}
The Venn diagram is a rectangle of area 1, divided into six equal rectangles of area 1/6.
P{Getting 1} = 1/6
P{Getting an odd number of spots} = P{Getting 1 or 3 or 5} = 1/2
[Figure: Venn diagrams shading the corresponding cells among 1–6]
Combination of Events
A1 and A2 are mutually exclusive if A1 and A2 cannot occur simultaneously.
The union of events A1 and A2, A1 ∪ A2, is the event consisting of the
occurrence of at least one of the events A1 and A2.
The intersection of events A1 and A2, A1∩A2 or A1A2, is the event consisting
of the occurrence of both events A1 and A2.
The difference of events A1 and A2, A1−A2, is the event in which A1 occurs
but not A2.
The complementary event of A, denoted Ā, is the event that A does not occur, i.e., Ā = Ω − A.
If the occurrence of event A1 implies the occurrence of event A2, then
A1 ⊂ A2 (event A1 is contained in event A2)
[Figure: Venn diagrams of mutually exclusive events A1 and A2, A1 ∪ A2, A1 ∩ A2, A1 − A2, A1 ⊂ A2, and the complement Ā = Ω − A]
Statistical Independence
If the outcome of one experiment has no influence on the outcome of the other,
these two experiments are called statistically independent.
Let
A1 = an event associated with only the first experiment
A2 = an event associated with only the second experiment
Then the occurrence of A1 has no influence on the occurrence of A2, and
conversely.
Two events A1 and A2 are said to be (statistically) independent if they satisfy
P(A1 ∩ A2) = P(A1 A2) = P(A1) P(A2)
The Addition Law for Probabilities
If A1, A2, . . . , An are mutually exclusive, then
P(A1 ∪ A2 ∪ · · · ∪ An) = P(A1) + P(A2) + · · · + P(An)
[Figure: Venn diagrams for mutually exclusive A1, A2 and for the complement Ā = Ω − A]
P(A1 ∪ A2) = P(A1) + P(A2), P(Ā) = 1 − P(A)
A ∪ Ā = Ω, and A and Ā are mutually exclusive ⟹ P(A) + P(Ā) = 1
P(Ā) = 1 − P(A) or P(A) = 1 − P(Ā)
The Addition Law for Probabilities
For arbitrary A1 and A2: P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 A2)
[Figure: Venn diagrams illustrating P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2)]
For arbitrary A1, A2, and A3,
P(A1 ∪ A2 ∪ A3) = P(A1) + P(A2) + P(A3) − [P(A1 A2) + P(A2 A3) + P(A3 A1)] + P(A1 A2 A3)
Example
The water supply for a city C comes from two sources A and B. The water
is transported by a pipeline consisting of branches 1, 2, and 3; the probability
of failure of each branch is 0.01 and the failures of the individual branches are
statistically independent. Assume that either source alone is sufficient to supply
the water for the city. What is the probability of water shortage for city C?
[Figure: branches 1 (from source A) and 2 (from source B) operate in parallel and feed branch 3, which supplies city C]
S = Water shortage in city C; Ai = Branch i fails, P(Ai) = 0.01, i = 1, 2, 3
Method 1: S = A3 ∪ A1 A2
P(S) = P(A3 ∪ A1 A2) = P(A3) + P(A1 A2) − P(A3 A1 A2)
= P(A3) + P(A1)P(A2) − P(A3)P(A1)P(A2)
= 0.01 + 0.01² − 0.01³ = 0.010099
Method 2: S̄ = A1 Ā2 Ā3 ∪ Ā1 A2 Ā3 ∪ Ā1 Ā2 Ā3
These events are mutually exclusive.
P(S̄) = P(A1 Ā2 Ā3) + P(Ā1 A2 Ā3) + P(Ā1 Ā2 Ā3)
= P(A1)P(Ā2)P(Ā3) + P(Ā1)P(A2)P(Ā3) + P(Ā1)P(Ā2)P(Ā3)
= 0.01×0.99×0.99 + 0.99×0.01×0.99 + 0.99×0.99×0.99 = 0.989901
∴ P(S) = 1 − P(S̄) = 1 − 0.989901 = 0.010099
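Both methods can be verified numerically; a minimal Python sketch (variable names are ours):

```python
p = 0.01  # failure probability of each branch
q = 1 - p

# Method 1: shortage iff branch 3 fails, or branches 1 and 2 both fail
p_m1 = p + p * p - p * p * p  # P(A3) + P(A1 A2) - P(A3 A1 A2)

# Method 2: complement over the three mutually exclusive "no shortage" cases
p_no_shortage = p * q * q + q * p * q + q * q * q
p_m2 = 1 - p_no_shortage
```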
Dependent Events
P(A | B) = conditional probability of A given that event B has occurred.
P(A | B) = P(AB)/P(B) ⟹ P(AB) = P(A | B) P(B) = P(B | A) P(A)
[Figure: Venn diagram — A, B, and their overlap AB; P(A | B) = P(AB)/P(B)]
Dependent Events
P(A | B) = P(AB)/P(B) ⟹ P(AB) = P(A | B) P(B) = P(B | A) P(A)
If A and B are statistically independent,
P(A | B) = P(A), P(B | A) = P(B) ⟹ P(AB) = P(A)P(B)
If B implies A, B ⊂ A ⟹ AB = B, P(AB) = P(B) ⟹ P(A | B) = 1.
If A1, A2, . . . , An are mutually exclusive events with A = ∪_{k=1}^{n} Ak, then
P(A | B) = ∑_{k=1}^{n} P(Ak | B)
P(A | B) + P(Ā | B) = 1
Example
The foundation of a building may fail either from bearing capacity or by
excessive settlement, with B and S denoting the respective failure mode. If
P(B) = 0.001, P(S) = 0.008, and P(B | S) = 0.1, determine
1. the probability of failure of the foundation;
2. the probability that the building has excessive settlement but no failure in
bearing capacity.
1. F = failure of the foundation = B ∪ S
P(F) = P(B ∪ S) = P(B) + P(S) − P(BS) = P(B) + P(S) − P(B | S)P(S)
= 0.001 + 0.008 − 0.1×0.008 = 0.0082
2. P(B̄S) = P(B̄ | S) P(S) = [1 − P(B | S)] P(S) = (1−0.1)×0.008 = 0.0072
Example
Two power generating units a and b operate in parallel to supply the power
requirements of a small city. The demand for power is subject to considerable
fluctuation, and it is known that each unit has a capacity so that it can supply
the city’s full power requirement 75% of the time in case the other unit fails. The
probability of failure of each unit is 0.1, whereas the probability that both units
will fail is 0.02. If there is failure in the power generation, what is the probability
that the city will have its supply of full power?
A = unit a fails, P(A) = 0.1; B = unit b fails, P(B) = 0.1; P(AB) = 0.02
E = only one unit fails, given that there is a failure = (AB̄ ∪ ĀB) | (A ∪ B)
S = the city has its full power supply given that there is a failure, P(S) = 0.75 P(E)
P(E) = P[(AB̄ ∪ ĀB) | (A ∪ B)] = P[AB̄ | (A ∪ B)] + P[ĀB | (A ∪ B)]
= P[AB̄(A ∪ B)]/P(A ∪ B) + P[ĀB(A ∪ B)]/P(A ∪ B)
∵ P[AB̄(A ∪ B)] = P[(A ∪ B) | AB̄] P(AB̄) = 1·P(AB̄)
P[ĀB(A ∪ B)] = P[(A ∪ B) | ĀB] P(ĀB) = 1·P(ĀB)
∴ P(E) = [P(AB̄) + P(ĀB)]/P(A ∪ B) = [P(B̄ | A)P(A) + P(Ā | B)P(B)]/[P(A) + P(B) − P(AB)]
P(A | B) = P(AB)/P(B) = 0.02/0.1 = 0.2, P(Ā | B) = 1 − P(A | B) = 0.8
P(B | A) = P(BA)/P(A) = 0.02/0.1 = 0.2, P(B̄ | A) = 1 − P(B | A) = 0.8
∴ P(E) = (0.8×0.1 + 0.8×0.1)/(0.1 + 0.1 − 0.02) = 0.889
∴ P(S) = 0.75 P(E) = 0.75×0.889 = 0.667
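This result can be checked numerically. The sketch below uses the equivalent identity P(AB̄) + P(ĀB) = P(A ∪ B) − P(AB) rather than expanding each conditional term as the slide does (variable names are ours):

```python
pA, pB, pAB = 0.1, 0.1, 0.02  # P(A), P(B), P(A and B)

p_union = pA + pB - pAB       # P(A or B): at least one unit fails
# P(exactly one fails) = P(A or B) - P(A and B); condition on some failure
p_one_given_failure = (p_union - pAB) / p_union  # = P(E) ~ 0.889
p_supply = 0.75 * p_one_given_failure            # ~ 0.667
```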
Total Probability Formula
Suppose B1, B2, . . . , Bn are mutually exclusive and collectively exhaustive (i.e., B1 ∪ B2 ∪ ··· ∪ Bn = Ω) events, in the sense that one and only one of the events B1, B2, . . . , Bn always occurs. Then
P(A) = P(AB1) + P(AB2) + ··· + P(ABn) = ∑_{k=1}^{n} P(ABk)
P(A) = P(A | B1)P(B1) + P(A | B2)P(B2) + ··· + P(A | Bn)P(Bn) = ∑_{k=1}^{n} P(A | Bk)P(Bk)
[Figure: partition of Ω into B1, B2, B3, B4, with event A overlapping each]
P(A) = P(AB1) + P(AB2) + P(AB3) + P(AB4)
= P(A | B1)P(B1) + P(A | B2)P(B2) + P(A | B3)P(B3) + P(A | B4)P(B4)
Example
Suppose that in any given year, the probability of damaging storms in the city
of Toronto is 0.20. During such a storm, if not accompanied by tornadoes, the
probability of structural failures in the city of Toronto is 0.10. When a storm
occurs in the region, the probability that it will be accompanied by a tornado
is 0.25, and the probability that this tornado will hit the city of Toronto is 0.05.
Assume that tornadoes occur only during a storm, and when the city is hit by
a tornado it is certain to cause structural failures, whereas the probability of
structural failure in the city when a tornado occurs in the region but does not hit
the city is 0.10. What is the probability of structural failure in the city of Toronto
in a period of one year?
S = having storm, P(S) = 0.2;
T = having tornado; H = tornado hits the city;
F = having structural failure
Since ST, ST̄, S̄T, S̄T̄ are mutually exclusive and collectively exhaustive, apply the Total Probability Formula:
P(F) = P(F | ST)P(ST) + P(F | ST̄)P(ST̄) + P(F | S̄T)P(S̄T) + P(F | S̄T̄)P(S̄T̄)
The last two terms vanish: tornadoes occur only during a storm, and there is no structural failure without a storm. Hence
P(F) = P(F | ST)P(ST) + P(F | ST̄)P(ST̄)
Again, since H and H̄ are mutually exclusive and collectively exhaustive,
P(F | ST) = P[(F | ST) | H] P(H) + P[(F | ST) | H̄] P(H̄) = 1.0×0.05 + 0.1×0.95 = 0.145
P(ST) = P(T | S) P(S) = 0.25×0.2 = 0.05
P(ST̄) = P(T̄ | S) P(S) = 0.75×0.2 = 0.15
P(F | ST̄) = 0.1
∴ P(F) = 0.145×0.05 + 0.1×0.15 = 0.02225
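The same computation in a short Python sketch (variable names are ours; the probabilities are those given in the problem statement):

```python
p_storm = 0.20
p_tornado_given_storm = 0.25
p_hit_given_tornado = 0.05
p_fail_hit = 1.0             # a tornado hitting the city surely causes failure
p_fail_tornado_miss = 0.10
p_fail_storm_only = 0.10

# Condition on the tornado hitting or missing the city
p_fail_given_ST = (p_fail_hit * p_hit_given_tornado
                   + p_fail_tornado_miss * (1 - p_hit_given_tornado))  # 0.145
p_ST = p_tornado_given_storm * p_storm             # storm with tornado: 0.05
p_STbar = (1 - p_tornado_given_storm) * p_storm    # storm, no tornado: 0.15
p_fail = p_fail_given_ST * p_ST + p_fail_storm_only * p_STbar  # 0.02225
```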
Bayes’s Theorem
∵ P(AB) = P(A | B)P(B) = P(B | A)P(A)
∴ P(B | A) = P(A | B)P(B)/P(A)
If B1, B2, . . . , Bn are mutually exclusive and collectively exhaustive events of which one must occur, then
∵ P(ABi) = P(A | Bi)P(Bi) = P(Bi | A)P(A)
∴ P(Bi | A) = P(A | Bi)P(Bi)/P(A) = P(A | Bi)P(Bi) / ∑_{k=1}^{n} P(A | Bk)P(Bk)
Bayes's theorem provides a formula for finding the probability that the "effect" A was "caused" by the event Bi.
Example
The computing system at the University of Waterloo is currently undergoing
shutdown for repairs. Previous shutdowns have been due to hardware failure,
software failure, or power failure. The system is forced to shut down 73%
of the time when it experiences hardware problems, 12% of the time when it
experiences software problems, and 88% of the time when it experiences power
problems. Maintenance engineers have determined that the probabilities of
hardware, software, and power problems are 0.01, 0.05, and 0.02, respectively.
The probability that any two types of failure occur simultaneously is negligible.
1. What is the probability that the current shutdown is due to hardware failure?
2. What is the probability that the current shutdown is due to software failure?
3. What is the probability that the current shutdown is due to power failure?
F = Computing system shutdown
H = hardware failure, P(H) = 0.01, P(F | H) = 0.73
S = software failure, P(S) = 0.05, P(F | S) = 0.12
P = power failure, P(P) = 0.02, P(F | P) = 0.88
When computing system shutdown occurs, one and only one of H, S, P must
occur. From the Total Probability Formula:
P(F) = P(F | H) P(H) + P(F | S) P(S) + P(F | P) P(P)
= 0.73×0.01 + 0.12×0.05 + 0.88×0.02 = 0.0309
1. P(H | F) = P(HF)/P(F) = 0.0073/0.0309 = 0.236
2. P(S | F) = P(SF)/P(F) = 0.0060/0.0309 = 0.194
3. P(P | F) = P(PF)/P(F) = 0.0176/0.0309 = 0.570
Hence, if there is a computing system failure, the maintenance engineers should check the power system first, the hardware second, and the software last.
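The total-probability and Bayes steps can be reproduced in a few lines of Python (the dictionary keys are our labels for the three causes):

```python
priors = {"hardware": 0.01, "software": 0.05, "power": 0.02}
p_shutdown_given = {"hardware": 0.73, "software": 0.12, "power": 0.88}

# Total probability of a shutdown
p_shutdown = sum(p_shutdown_given[c] * priors[c] for c in priors)  # 0.0309

# Bayes: posterior probability of each cause, given the shutdown
posterior = {c: p_shutdown_given[c] * priors[c] / p_shutdown for c in priors}
```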
Probability and Statistics — Discrete Probability Distributions © Wei-Chau Xie
Discrete Probability Distributions
A random variable X is discrete if X takes only a finite or countably infinite number of distinct values x, with corresponding probabilities (probability mass function)
pX(x) = P(X = x), 0 ≤ pX(x) ≤ 1
Normalization condition: ∑ pX(x) = 1
Expected value: E[X] = ∑ x pX(x)
Variance: Var(X) = ∑ (x − E[X])² pX(x) = E[X²] − {E[X]}²
Standard deviation: σ = √Var(X)
Some Properties:
E[c] = c, Var(c) = 0, where c is an arbitrary constant.
E[c1 X1 + c2 X2] = c1 E[X1] + c2 E[X2]
If X1 and X2 are independent random variables:
E[X1 X2] = E[X1] E[X2], Var(c1 X1 + c2 X2) = c1² Var(X1) + c2² Var(X2)
Bernoulli Trials and the Binomial Distribution
Bernoulli trials are identical, independent experiments, in each of which
an event A may occur with probability p or fail to occur with probability
q = 1−p.
In n consecutive Bernoulli trials, each elementary event ω can be described
by a sequence like 1011· · · 01001 (n digits), where success (or failure) in the
ith trial is denoted by 1 (or 0) in the ith digit.
The probability of an elementary event ω in which there are precisely k successes and (n−k) failures is P(ω) = p^k q^{n−k}.
X = total number of successes in n Bernoulli trials
X = k if there are precisely k successes in the elementary event ω.
The number of distinct such sequences is C(n, k) (selecting k of the n trials for successes).
P(X = k) = probability of having k successes (n−k failures) in n Bernoulli trials:
P(X = k) = C(n, k) p^k q^{n−k}, k = 0, 1, . . . , n — the Binomial Distribution
Example
Tossing a coin, X = Getting heads
1: Heads, p; 0: Tails, q = 1 − p
Tossing a coin three times, what is the probability of getting one heads?
Sequence 1 0 0: probability = p·q·q = p¹q²
Sequence 0 1 0: probability = q·p·q = p¹q²
Sequence 0 0 1: probability = q·q·p = p¹q²
Number of sequences = C(3, 1) = 3
P(X = 1) = P{Getting one heads in three tosses} = p·q·q + q·p·q + q·q·p = 3p¹q² = C(3, 1) p¹q²
Example
Suppose the probability of hitting a target with a single shot is 0.001. What is the
probability of hitting the target 2 or more times in 5000 shots?
p = probability of hitting a target with a single shot = 0.001, q = 1−p = 0.999
Hk = hitting the target k times in 5000 shots
A = hitting the target 2 or more times in 5000 shots = H2 ∪ H3 ∪ · · · ∪ H5000
Ā = H0 ∪ H1
P(Ā) = P(H0 ∪ H1) = P(H0) + P(H1)
= C(5000, 0) p⁰ q^5000 + C(5000, 1) p¹ q^4999 = 1·1·q^5000 + 5000·p·q^4999
= q^4999 (q + 5000p)
= 0.999^4999 (0.999 + 5000×0.001) = 0.0404
∴ P(A) = 1 − P(Ā) = 0.9596
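A direct numerical check with `math.comb` (exact binomial, no approximation):

```python
from math import comb

p, q, n = 0.001, 0.999, 5000

# P(0 or 1 hits), then the complement
p0 = comb(n, 0) * p**0 * q**n
p1 = comb(n, 1) * p**1 * q**(n - 1)
p_two_or_more = 1 - (p0 + p1)  # ~0.9596
```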
Poisson Distribution
Consider an event A with the following assumptions
The event can occur at random at any time.
The occurrence of an event in a given time interval is independent of that in
any other non-overlapping intervals.
λ = mean rate of occurrence of event A
a = λt = average number of occurrences of event A in time interval t
Random variable X = number of occurrences of event A in time interval t
P(X = k) = a^k e^{−a}/k!, k = 0, 1, 2, . . . — the Poisson Distribution
Expected value: E[X] = a = λt
Variance: Var(X) = a = λt
Return period = 1/λ = mean time between two consecutive occurrences
Example
A building is located in a region where earthquakes may occur, which may be
modelled as a Poisson process. From past record, the mean rate of occurrence of
a large earthquake that may cause damage to the building is 1 in 50 years. Assume
that during a strong earthquake, the probability of damage to the building is 0.1.
Consider a 10-year period.
1. Determine the probability of the building being subjected to earthquakes.
2. What is the probability that the building will be damaged?
Q = number of earthquakes in a 10-year period
t = 10, λ = 1/50, a = λt = 0.2
P(Q = 0) = a⁰e^{−a}/0! = e^{−a} = e^{−0.2} = 0.819
∴ Probability of the building being subjected to earthquakes = P(Q ≥ 1) = 1 − P(Q = 0) = 1 − 0.819 = 0.181
D = the building will be damaged by earthquakes in a 10-year period
p = probability of survival of the building in an earthquake = 0.9
Since Q = 0, 1, 2, . . . are mutually exclusive and collectively exhaustive, from the Total Probability Formula
P(D̄) = P(D̄ | Q = 0) P(Q = 0) + P(D̄ | Q = 1) P(Q = 1) + ··· + P(D̄ | Q = n) P(Q = n) + ···
= 1×(0.2⁰e^{−0.2}/0!) + 0.9¹×(0.2¹e^{−0.2}/1!) + ··· + 0.9ⁿ×(0.2ⁿe^{−0.2}/n!) + ···
= e^{−0.2}[1 + (0.9×0.2)¹/1! + ··· + (0.9×0.2)ⁿ/n! + ···]
= e^{−0.2} e^{0.9×0.2} = e^{−0.02} = 0.9802
P(D) = 1 − P(D̄) = 0.0198
(Using the series e^x = 1 + x/1! + x²/2! + ··· + xⁿ/n! + ···)
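The series can be summed numerically and compared with the closed form e^{−(1−p)λt}; a short Python sketch (truncating the series at k = 50, far beyond any significant term):

```python
from math import exp, factorial

lam_t = 0.2      # a = λt, expected number of earthquakes in 10 years
p_survive = 0.9  # building survives a given earthquake

# P(no damage) = sum over k of P(survive all k quakes) * P(Q = k)
p_no_damage = sum(p_survive**k * lam_t**k * exp(-lam_t) / factorial(k)
                  for k in range(50))
p_damage = 1 - p_no_damage                         # ~0.0198
closed_form = 1 - exp(-(1 - p_survive) * lam_t)    # 1 - e^{-0.02}
```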
Probability and Statistics — Continuous Probability Distributions © Wei-Chau Xie
Continuous Probability Distributions
A random variable X is continuous if P(x1 < X ≤ x2) = ∫_{x1}^{x2} fX(x) dx.
fX(x) ≥ 0 is the probability density function (PDF).
Normalization condition: ∫_{−∞}^{+∞} fX(x) dx = 1
If X is a continuous random variable, P(X = x) = 0.
[Figure: PDF fX(x) with shaded area P(x1 < X < x2), and the distribution function FX(x)]
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} fX(x) dx, −∞ < x < +∞
is the probability distribution function.
P(x1 < X ≤ x2) = FX(x2) − FX(x1)
fX(x) = dFX(x)/dx
Expected value: E[X] = ∫_{−∞}^{+∞} x fX(x) dx
Variance: Var(X) = E[{X − E[X]}²] = E[X²] − {E[X]}² = ∫_{−∞}^{+∞} {x − E[X]}² fX(x) dx
Normal Distribution
Normal distribution is the most important distribution in statistical analysis.
Probability density function of the normally distributed random variable X:
fX(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}, −∞ < x < +∞
The expected value (mean) of X is µ and the standard deviation is σ; X ∼ N(µ, σ²).
[Figure: normal PDFs for (µ=0, σ=1), (µ=0, σ=2), (µ=0, σ=3), (µ=2, σ=2.5), (µ=−4, σ=1.5)]
Standard Normal Random Variable
A normally distributed random variable with µ = 0, σ = 1 is called a standard normal random variable, Z ∼ N(0, 1²).
Probability density function: fZ(z) = (1/√(2π)) e^{−z²/2}
Probability distribution function: Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) e^{−y²/2} dy
P(Z < z) = Φ(z), P(Z > z) = 1 − Φ(z), P(z1 < Z < z2) = Φ(z2) − Φ(z1)
[Figure: standard normal PDF with the corresponding shaded areas]
Φ(z) is tabulated in Table 4.
For any normal random variable X ∼ N(µ, σ²):
P(X ≤ x) = Φ((x−µ)/σ), P(X > x) = 1 − Φ((x−µ)/σ)
P(x1 < X ≤ x2) = Φ((x2−µ)/σ) − Φ((x1−µ)/σ)
Example
A structure is resting on three supports, A, B, and C. Although the loads on
the three supports can be estimated accurately, the soil conditions under A, B,
and C are not completely predictable. Assume that the settlements SA, SB, and
SC are independent normally distributed random variables with means 2, 2.5,
3 cm and coefficients of variation 20%, 20%, 25%, respectively. What is the
probability that the maximum settlement will exceed 4 cm?
µA = 2, σA = 0.2 µA = 0.4
µB = 2.5, σB = 0.2 µB = 0.5
µC = 3, σC = 0.25 µC = 0.75
P(max S > 4) = 1 − P(max S ≤ 4) = 1 − P(SA ≤ 4 ∩ SB ≤ 4 ∩ SC ≤ 4)
= 1 − P(SA ≤ 4) P(SB ≤ 4) P(SC ≤ 4)
= 1 − Φ((4−2)/0.4) Φ((4−2.5)/0.5) Φ((4−3)/0.75)
= 1 − Φ(5.0) Φ(3.0) Φ(1.33) = 1 − 1.0×0.9986×0.9082 = 0.093
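Since Φ(z) = (1/2)(1 + erf(z/√2)), the result can be checked with the Python standard library alone. The slide rounds table values; the unrounded answer is ≈ 0.0924 (variable names are ours):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal distribution function Φ(z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

supports = [(2.0, 0.4), (2.5, 0.5), (3.0, 0.75)]  # (mean, std dev) of SA, SB, SC

p_all_below = 1.0
for mu, sigma in supports:
    p_all_below *= phi((4.0 - mu) / sigma)  # independence: probabilities multiply

p_max_exceeds_4 = 1 - p_all_below  # ~0.092-0.093
```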
Example
The diameters of rotor shafts DS in a lot and the inner diameters of bearings DB in another lot are DS ∼ N(0.249, 0.003²) and DB ∼ N(0.255, 0.002²).
1. Determine the mean and standard deviation of the clearance c = (1/2)DB − (1/2)DS between shafts and bearings selected from these lots.
2. If a shaft and a bearing are selected at random, what is the probability that the shaft will not fit inside the bearing?
1. E[c] = E[(1/2)DB − (1/2)DS] = (1/2)E[DB] − (1/2)E[DS] = (1/2)(0.255) − (1/2)(0.249) = 0.003
Var(c) = Var((1/2)DB − (1/2)DS) = (1/2)² Var(DB) + (−1/2)² Var(DS)
= (1/4)(0.002²) + (1/4)(0.003²) = 3.25×10⁻⁶
σc = √Var(c) = √(3.25×10⁻⁶) = 0.0018
2. P(DS > DB) = P(c < 0) = Φ((0 − 0.003)/0.0018) = Φ(−1.67) = 0.0475
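A numerical check (the slide rounds z to −1.67, giving 0.0475; the unrounded value is ≈ 0.048):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal distribution function Φ(z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu_B, sd_B = 0.255, 0.002  # bearing inner diameter
mu_S, sd_S = 0.249, 0.003  # shaft diameter

# c = DB/2 - DS/2: means subtract, variances add with coefficient (1/2)^2
mu_c = 0.5 * mu_B - 0.5 * mu_S                # 0.003
sd_c = sqrt(0.25 * sd_B**2 + 0.25 * sd_S**2)  # ~0.0018
p_no_fit = phi((0 - mu_c) / sd_c)             # P(c < 0)
```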
Lognormal Distribution
If X ∼ N(µ, σ²), then Y = e^X is lognormal, Y ∼ LN(µ, σ²), with PDF
fY(y) = (1/(√(2π) σ y)) e^{−(ln y − µ)²/(2σ²)} for y > 0, and fY(y) = 0 for y ≤ 0
A random variable Y has a lognormal probability distribution if ln Y is normal.
[Figure: lognormal PDFs for µ = 1 and σ = 2, 1, 0.5, 0.25]
µ and σ are not the mean and standard deviation of the lognormal distribution Y.
Expected value: E[Y] = e^{µ+σ²/2}
Variance: Var(Y) = e^{2µ+σ²}(e^{σ²} − 1)
Given the mean E[Y] and the variance Var(Y) of the lognormal distribution, the parameters µ and σ are
σ² = ln[1 + Var(Y)/{E[Y]}²] = ln(1 + δY²), µ = ln{E[Y]} − (1/2)σ²
P(Y ≤ y) = Φ((ln y − µ)/σ)
The lognormal distribution is useful in those applications where the values of the
random variable are known to be strictly positive; for example, the strength and
fatigue life of material, the intensity of rainfall, the time for project completion, and
the volume of air traffic.
Example
The operational life T of a certain equipment is lognormally distributed with
a mean life of 1500 hours and a coefficient of variation of 30%. What is
the probability that two of the five equipments used, which are statistically
independent, will malfunction in less than 900 hours of operation?
E[T] = 1500, δT = 0.3
σ² = ln[1 + Var(T)/{E[T]}²] = ln(1 + δT²) = ln(1 + 0.3²) = 0.086, σ = 0.294
µ = ln{E[T]} − (1/2)σ² = ln 1500 − (1/2)(0.086) = 7.270
p = P(T < 900) = Φ((ln 900 − µ)/σ) = Φ((ln 900 − 7.270)/0.294) = Φ(−1.59) = 0.0559
P(2 of the 5 equipments will malfunction in less than 900 hours of operation)
= C(5, 2) p²(1−p)³ = 10×0.0559²×(1−0.0559)³ = 0.026
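The whole chain — lognormal parameters, Φ, then the binomial count — in one short Python sketch (variable names are ours):

```python
from math import comb, erf, log, sqrt

def phi(z):
    """Standard normal distribution function Φ(z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mean_T, delta_T = 1500.0, 0.3

sigma2 = log(1 + delta_T**2)      # ~0.086
sigma = sqrt(sigma2)              # ~0.294
mu = log(mean_T) - 0.5 * sigma2   # ~7.270

p = phi((log(900) - mu) / sigma)           # P(T < 900), ~0.056
p_2_of_5 = comb(5, 2) * p**2 * (1 - p)**3  # binomial: exactly 2 of 5 fail
```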
Exponential Distribution
If events occur according to a Poisson process with mean rate of occurrence λ, then the time T between two consecutive occurrences of the event has an exponential distribution.
fT(t) = λe^{−λt} for t ≥ 0, and fT(t) = 0 for t < 0
FT(t) = P(T ≤ t) = 1 − e^{−λt} ⟹ P(T > t) = e^{−λt}
[Figure: exponential PDF fT(t) and distribution function FT(t)]
Expected value: E[T] = 1/λ = mean recurrence time = return period
Variance: Var(T) = 1/λ²
Example
Historical records of earthquakes in San Francisco show that, during the period
1836-1961, there were 16 earthquakes of intensity VI or more. Assume that the
occurrence of such earthquakes in this region follows a Poisson process.
1. What is the probability that such earthquakes will occur in the next 2 years?
2. What is the probability that no earthquake of this high intensity will occur
in the next 10 years?
3. What are the return period of an intensity VI or higher earthquake and the
probability of such earthquakes occurring within the return period?
Mean rate of occurrence λ = 16/126 = 0.127 earthquakes per year
(1) P(T ≤ 2) = 1 − e^{−λt} = 1 − e^{−0.127×2} = 0.224
(2) P(no earthquake of this high intensity will occur in the next 10 years) = P(T > 10) = e^{−λt} = e^{−0.127×10} = 0.281
(3) Return period = 1/λ = 126/16 = 7.875 years
P(T ≤ 7.875) = 1 − e^{−λt} = 1 − e^{−0.127×7.875} = 1 − e^{−1} = 0.632
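The three parts in a few lines of Python:

```python
from math import exp

lam = 16 / 126  # mean rate: 16 earthquakes in 126 years

p_within_2yr = 1 - exp(-lam * 2)   # ~0.224
p_none_10yr = exp(-lam * 10)       # ~0.281
return_period = 1 / lam            # 7.875 years
p_within_return = 1 - exp(-lam * return_period)  # 1 - e^{-1} ~ 0.632
```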
Probability and Statistics — Sampling Distribution © Wei-Chau Xie
Consider a population with mean µ and standard deviation σ.
Take a sample of size n: x1, x2, . . . , xn.
Sample mean: X̄ = (1/n) ∑_{i=1}^{n} xi
Central Limit Theorem
If X̄ is the mean of a random sample of size n taken from a population having mean µ and standard deviation σ, then
z = (X̄ − µ)/(σ/√n)
is the value of a random variable whose distribution function approaches that of the standard normal distribution as n → ∞.
For n ≥ 30, X̄ is a value of a random variable approximately N(µ, σX̄²).
σX̄ = σ/√n is called the standard error.
Probability and Statistics — Confidence Intervals © Wei-Chau Xie
Confidence Intervals
Confidence interval estimation of µ (σ known)
X − zα/2 · σ√n
< µ < X + zα/2 · σ√n
with (1−α)100% confidence
Confidence interval estimation of µ (σ unknown)
X − zα/2 · s√n
< µ < X + zα/2 · s√n
, Large sample (n>30)
X − tα/2, (n−1)
· s√n
< µ < X + tα/2, (n−1)
· s√n
, Small sample
with (1−α)100% confidence
Confidence interval for σ 2
(n−1)s2
χ2α/2, (n−1)
< σ 2 <(n−1)s2
χ21−α/2, (n−1)
with (1−α)100% confidence
Confidence interval estimation of the difference between means (µ1 − µ2):
Large sample (n1, n2 ≥ 30), with (1−α)100% confidence:
X̄1 − X̄2 ± z_{α/2} √(s1²/n1 + s2²/n2)
Small sample, with (1−α)100% confidence:
X̄1 − X̄2 ± t_{α/2,(n1+n2−2)} √{[(n1−1)s1² + (n2−1)s2²]/(n1 + n2 − 2) · (n1 + n2)/(n1 n2)}
Example
The daily dissolved oxygen (DO) concentration for a stream at a station has been recorded for 30 days, giving a sample mean X̄ = 2.52 mg/L and sample standard deviation s = 4.2 mg/L. Determine the 95% and 99% confidence intervals for the population mean µ.
95% confidence interval ⟹ 1−α = 0.95, α = 0.05
t_{α/2, n−1} = t_{0.025, 29} = 2.045
µ = X̄ ± t_{0.025, 29} s/√n = 2.52 ± 2.045×4.2/√30 = 2.52 ± 1.57
∴ 0.95 < µ < 4.09
99% confidence interval ⟹ 1−α = 0.99, α = 0.01
t_{α/2, n−1} = t_{0.005, 29} = 2.756
µ = X̄ ± t_{0.005, 29} s/√n = 2.52 ± 2.756×4.2/√30 = 2.52 ± 2.11
∴ 0.41 < µ < 4.63
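A sketch of the interval computation; the t-values are taken from the slide's t-table rather than computed (Python's standard library has no t-distribution):

```python
from math import sqrt

n, xbar, s = 30, 2.52, 4.2            # sample size, sample mean, sample std dev
t_table = {0.05: 2.045, 0.01: 2.756}  # t_{alpha/2, 29} from a t-table

cis = {}
for alpha, t in t_table.items():
    hw = t * s / sqrt(n)              # half-width of the interval
    cis[alpha] = (xbar - hw, xbar + hw)
# cis[0.05] ~ (0.95, 4.09), cis[0.01] ~ (0.41, 4.63)
```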
Probability and Statistics — Linear Regression © Wei-Chau Xie
Linear Regression
The fitting of a straight line through a given set of points according to some
specified goodness-of-fit criterion is called linear regression.
The Method of Least Squares
Suppose the true linear relationship between X and Y is Y(X) = A + BX.
We have n paired observations (xi, yi), i = 1, 2, . . . , n, which are used to obtain a and b, estimated values of A and B.
For the ith observation, the independent variable is xi, and the predicted value of Y is Y(xi) = a + b xi.
The goal of least-squares regression is to find the values of a and b that minimize the error
E = ∑_{i=1}^{n} [yi − Y(xi)]² = ∑_{i=1}^{n} [yi − (a + b xi)]²
Estimated Linear Regression Coefficients
a = Ȳ − bX̄, b = SXY/SXX
where
X̄ = (1/n) ∑_{i=1}^{n} xi, Ȳ = (1/n) ∑_{i=1}^{n} yi
SXY = ∑_{i=1}^{n} (xi − X̄)(yi − Ȳ) = ∑_{i=1}^{n} xi yi − n X̄Ȳ
SXX = ∑_{i=1}^{n} (xi − X̄)² = ∑_{i=1}^{n} xi² − n X̄²
SYY = ∑_{i=1}^{n} (yi − Ȳ)² = ∑_{i=1}^{n} yi² − n Ȳ²
[Figure: scatter of Y about the sample mean Ȳ and about the estimated regression line Y(X)]
sY = √(SYY/(n−1)), sY|X = √((SYY − b SXY)/(n−2))
Total variability sY measures the total variability in Y about the sample mean Ȳ without considering the effect of X.
The standard error of estimate sY|X measures the variability about the estimated regression line Y(X) = a + bX.
With (1−α)100% Confidence
Confidence interval for the conditional mean given X:
µY|X = (a + bX) ± t_{α/2,(n−2)}·sY|X √(1/n + (X − X̄)²/SXX)
[Figure: (1−α)100% confidence band for µY|X about the regression line Y(X) = a + bX]
Confidence interval of prediction for an individual Y given X:
Y(X) = (a + bX) ± t_{α/2,(n−2)}·sY|X √(1/n + (X − X̄)²/SXX + 1)
Correlation
Regression analysis is concerned with predicting the value of the dependent
variable Y for a known value of the independent variable X.
Correlation analysis is concerned with the nature of the relationship between
the two variables, focusing on the strength of that relationship.
The magnitude of the correlation coefficient ρ (−1 ≤ ρ ≤ +1) is a measure of the degree of linear inter-relationship between two variables.
When ρ = ±1.0, X and Y are linearly related.
When ρ = 0, there is no linear relationship between X and Y.
Sample correlation coefficient: r = SXY/√(SXX·SYY)
[Figure: scatter plots illustrating positive correlation (0 < ρ < 1), negative correlation (−1 < ρ < 0), perfect positive correlation (ρ = +1), and perfect negative correlation (ρ = −1)]
Example
A stress-strain relationship is to be established for steel beams by measuring
the strain of a specimen Y under the given stress X. The following sample
measurements are obtained:
X 14 23 9 17 10 22 5 12 6 16
Y 68 105 40 79 81 95 31 72 45 93
∑_{i=1}^{10} xi = 134, ∑_{i=1}^{10} yi = 709, ∑_{i=1}^{10} xi yi = 10747, ∑_{i=1}^{10} xi² = 2140, ∑_{i=1}^{10} yi² = 55895
X̄ = (1/n) ∑ xi = (1/10)×134 = 13.4, Ȳ = (1/n) ∑ yi = (1/10)×709 = 70.9
SXX = ∑ xi² − nX̄² = 2140 − 10×13.4² = 344.4
SYY = ∑ yi² − nȲ² = 55895 − 10×70.9² = 5626.9
SXY = ∑ xi yi − nX̄Ȳ = 10747 − 10×13.4×70.9 = 1246.4
Use the method of least squares to determine the expression for the estimated
regression line.
b = SXY/SXX = 1246.4/344.4 = 3.619
a = Ȳ − bX̄ = 70.9 − 3.619×13.4 = 22.405
∴ Y(x0) = a + b x0 = 22.405 + 3.619 x0
Compute the sample standard deviation for the strain Y.
sY = √(SYY/(n−1)) = √(5626.9/9) = 25.004
Compute the standard error of the estimate for strain Y.
sY|X = √((SYY − b SXY)/(n−2)) = √((5626.9 − 3.619×1246.4)/8) = 11.812
Determine the predicted strain when the stress is 10.
Y(x0) = 22.405 + 3.619 x0 = 22.405 + 3.619×10 = 58.595
Find a 95% confidence interval for the predicted strain when the stress is 10.
Y(10) = (a + bX) ± t_{α/2,(n−2)}·sY|X √(1/n + (X − X̄)²/SXX + 1)
From Table 6, t_{0.025, 8} = 2.306
Y(10) = 58.595 ± 2.306×11.812×√(1/10 + (10−13.4)²/344.4 + 1) = 58.595 ± 29.009
∴ 29.586 ≤ Y(10) ≤ 87.604 with 95% confidence
Find a 95% confidence interval for the mean strain when the stress is 10.
µY|X = (a + bX) ± t_{α/2,(n−2)}·sY|X √(1/n + (X − X̄)²/SXX)
= 58.595 ± 2.306×11.812×√(1/10 + (10−13.4)²/344.4) = 58.595 ± 9.956
∴ 48.639 ≤ µY|X ≤ 68.551 with 95% confidence
Determine the sample correlation coefficient.
r = SXY/√(SXX·SYY) = 1246.4/√(344.4×5626.9) = 0.895
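The entire regression example can be reproduced from the raw data in a short Python sketch (standard library only; variable names are ours):

```python
from math import sqrt

X = [14, 23, 9, 17, 10, 22, 5, 12, 6, 16]   # stress
Y = [68, 105, 40, 79, 81, 95, 31, 72, 45, 93]  # strain
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
Sxx = sum(x * x for x in X) - n * xbar**2
Syy = sum(y * y for y in Y) - n * ybar**2
Sxy = sum(x * y for x, y in zip(X, Y)) - n * xbar * ybar

b = Sxy / Sxx                            # slope, ~3.619
a = ybar - b * xbar                      # intercept, ~22.405
s_y = sqrt(Syy / (n - 1))                # ~25.004
s_yx = sqrt((Syy - b * Sxy) / (n - 2))   # standard error of estimate, ~11.812
r = Sxy / sqrt(Sxx * Syy)                # ~0.895

# 95% prediction interval at x0 = 10 (t_{0.025, 8} = 2.306 from a t-table)
x0, t = 10.0, 2.306
y0 = a + b * x0                                         # ~58.595
hw = t * s_yx * sqrt(1 / n + (x0 - xbar)**2 / Sxx + 1)  # half-width, ~29
```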
[Figure: scatter plot of strain Y versus stress X with the regression line Y(X) = 22.405 + 3.619X, showing the predicted strain at stress 10 and the 95% confidence intervals for the mean strain and for the predicted strain at stress 10]