NPTEL Course - Reliability Engineering

RELIABILITY ANALYSIS IN CIVIL ENGINEERING SYSTEMS

Chapter 1: Reliability-based Methods in Civil Engineering
Risk, terminology, randomness, uncertainty, modeling uncertainty, engineering judgment, introduction to probability, and examples

Chapter 2: Statistics and Probability
Histogram and frequency diagram, measures of variability, probability theory, conditional probability, random variables, probability mass and density functions, moments of distribution, Bayes theorem, examples

Chapter 3: Random Field Theory
Stationary processes, autocovariance functions, functions of random fields

Chapter 4: Sampling
Sampling techniques, concepts of sampling, sampling plans, decisions based on sampling

Chapter 5: Reliability Analysis
Levels of reliability, loads and resistances, reliability methods, first-order second-moment (FOSM) method, Hasofer-Lind approach, comparative discussion

Chapter 6: Simulation Methods
Basis of simulation methods, random number generation

Chapter 7: Fault Tree Analysis
Decision making, branching, use of fault tree and event tree analysis

Chapter 8: System Reliability
System analysis, theoretical models, use of event tree and fault tree analyses, and examples


Learning objectives

1. Understanding the various terms connected with uncertainties in engineering design
2. Understanding the significance of incorporating uncertainties in design
3. Understanding the use of probability considerations to solve some problems


Reliability-based methods in Civil Engineering

Motivation and Highlights

Motivation: Importance of understanding the role of uncertainties in engineering design

Highlights: Knowledge of terms such as uncertainty, randomness, modeling, probability concepts, and Bayes theorem


1.1 Introduction

The trend in civil engineering today, more than ever before, is to

1. provide economical and robust design at certain levels of safety,
2. use new materials in construction. When newer materials are used in civil engineering design, there is a need to understand to what extent the structure is safe, and
3. consider uncertainties in design. One has to recognize that many processes in civil engineering systems, such as data collection, analysis and design, are random in nature. The design of facilities such as buildings, foundations, bridges, dams, highways, airports, seaports, offshore structures, tunnels, sanitary landfills and excavations needs to address these design issues rationally.

The loadings on civil engineering systems are never completely known; only some of their features can be characterized. Examples of such loadings are the frequency and occurrence of earthquakes, movement of ground water, rainfall patterns, and wind and ice loadings. All these loadings are random in nature, and at times they create overloading situations. What we have been doing so far can be shown schematically as follows:

Sampling -> Testing -> Formula -> Experience -> Build with confidence

At each step, questions arise: What is the extent of sampling? To what extent do the methods represent the actual field condition? What assumptions are made?

At all the stages indicated above, there is an element of uncertainty with regard to the suitability of the site in terms of soils and construction materials, which we transfer to a different level using a set of expressions to obtain the desired quantities, such as the floor capacity and allowable loads in buildings.


1.2 Probability of failure and reliability

The failure of civil engineering systems is a consequence of decision making under uncertain conditions, and different types of failure, such as temporary failures, maintenance failures, failures in design and failures due to natural hazards, need to be addressed. For example, if a bridge collapses, that is a permanent failure; if there is a traffic jam on the bridge, that is a temporary failure. If there is overflow in a filter or a pipe due to heavy rainfall, it is a temporary failure. Thus the definition of failure is important. Failure is expressed in terms of the probability of failure and is assessed by the system's inability to perform its intended function adequately on demand for a period of time under specific conditions.

The converse of the probability of failure is called reliability: the reliability of a system is the probability of the system performing its required function adequately for a specified period of time under stated conditions.

1. Reliability is expressed as a probability
2. A quality of performance is expected
3. It is expected over a period of time
4. It is expected to perform under specified conditions

1.2.1 Uncertainties in Civil Engineering

In design, uncertainties are unavoidable. Uncertainties are classified into two broad types: those associated with natural randomness and those associated with inaccuracies in our prediction and estimation of reality. The former type is called aleatory, whereas the latter is called epistemic. Irrespective of the classification, understanding the nature of the randomness is necessary. The first type, arising from nature (for example, earthquake and rainfall effects), needs to be handled rationally in design because it cannot be altered, while the second needs to be reduced using appropriate prediction models and sampling techniques.

The response of materials such as concrete, soil and rock to loading and unloading is of primary concern to the civil engineer. In all types of problems, the engineer often deals with incomplete information or uncertain conditions. It is necessary for the engineer to be aware of the many assumptions and idealizations on which methods of analysis and design are based. The use of analytical tools must be combined with sound engineering judgment based on experience and observation.

In the last two decades, the need to solve complex problems has led to the development and use of advanced quantitative methods of modeling and analysis. For example, the versatile finite element method has proved valuable in problems of stability, deformation and earthquake response analysis. The rapid development of computers and computing methods has facilitated the use of such methods. However, it is well known that the information derived from sophisticated methods of analysis is useful only if comprehensive input data are available and only if the data are reliable.

Thus, the question of uncertainty and randomness of data is central to design and analysis in civil engineering.

Decisions have to be made on the basis of information which is limited or incomplete. It is therefore desirable to use methods and concepts in engineering planning and design which facilitate the evaluation and analysis of uncertainty. Traditional deterministic methods of analysis must be supplemented by methods which use the principles of statistics and probability. These latter methods, often called probabilistic methods, enable a logical analysis of uncertainty to be made and provide a quantitative basis for assessing the reliability of foundations and structures. Consequently, these methods provide a sound basis for the development and exercise of engineering judgment. Practical experience is always important, and the observational approach can prove valuable; yet the capacity to benefit from these is greatly enhanced by rational analysis of uncertainty.

1.2.2 Types of uncertainty

There are many uncertainties in civil geotechnical engineering and these may be

classified into three main groups as follows:

(a) The first group consists of uncertainties in material parameters such as modulus of

concrete, steel stability of concrete and steel in different condition such as tension and

3

Page 7: CURSO NPTEL - Reliability Engineering

flexure, soil unit weight, cohesion, angle of internal friction, pore water pressure,

compressibility and permeability.

For example in a so-called homogeneous soil, each parameter may vary significantly.

Moreover, natural media, i.e. earth masses are often heterogeneous and an isotropic and

the soil profile is complex due to discontinuities and minor geological details.

(b) The second group consists of uncertainties in loads. Under static loading conditions, one is concerned with dead and live loads, and there are usually more uncertainties in relation to live loads. Structures and soil masses may also be subjected to dynamic loads from earthquakes, wind and waves. Significant uncertainties are associated with such random loads. Often the uncertainties associated with static loads are negligible in comparison to those associated with material parameters. On the other hand, uncertainties associated with dynamic loads may be of the same order of magnitude as, or even greater than, those associated with material parameters. It should also be noted that under dynamic loads the magnitude of material parameters may change significantly. For example, the shear strength of a soil decreases during cyclic loading and, as such, there are additional uncertainties concerning geotechnical performance.

(c) The third group consists of uncertainties in mathematical modeling and methods of analysis. Each model of soil behavior is based on some idealization of real situations. Each method of analysis or design is based on simplifying assumptions, and arbitrary factors of safety are often used.

1.3 Deterministic and probabilistic approaches

1.3.1 Deterministic approach

An approach based on the premise that a given problem can be stated in the form of a question, or a set of questions, to which there is an explicit and unique answer is a deterministic approach. An example is the concept that unique mathematical relationships govern the mechanical behavior of a soil mass or a soil-structure system.


In this method of analysis or design one is concerned with relatively simple cause-and-effect relationships. For each situation it is assumed that there is a single outcome; for each problem, a single and unique solution. Of course, one may not be able to arrive at the exact solution, and a unique solution may not even exist. In such circumstances a deterministic approach aims at obtaining an approximate solution. Empirical and semi-empirical methods have always been used in civil engineering, although with varying degrees of success. Finally, in a deterministic method of analysis, uncertainty is not formally recognized or accounted for; one is not concerned with probabilistic outcomes but with well-defined outcomes which may or may not occur, that is, either a 100% probability of occurrence or 0%, without intermediate values.

For example, one may conclude that a foundation will be safe on the basis that the safety factor F has a magnitude greater than one. On the other hand, one may conclude that a foundation or a slope is not safe on the basis that the magnitude of F is less than one. A given magnitude of F, e.g. F = 2.5, represents a unique answer to a problem posed in specific terms with certain unique values of loads and of shear strength parameters. In conventional analysis one is not concerned with the reliability associated with this unique value.

1.3.2 Probabilistic approach

A probabilistic approach is based on the concept that several or varied outcomes of a situation are possible. In this approach uncertainty is recognized, and a yes/no type of answer to a question concerning geotechnical performance is considered simplistic. Probabilistic modeling aims at studying a range of outcomes given the input data. Accordingly, the description of a physical situation or system includes the randomness of the data and other uncertainties. The data selected for a deterministic approach would, in general, not be sufficient for a probabilistic study of the same problem. The raw data would have to be organized in a more logical way, and often additional data would be needed for a meaningful probabilistic analysis.

A probabilistic approach aims at determining the probability p of an outcome, one of many that may occur. The probability may be any percentage between p = 0% and p = 100%, or any fraction between p = 0 and p = 1. In a specific problem the number of likely outcomes may be limited, and it may be possible to consider the probability of each outcome.

1.4 Risk and reliability

In engineering practice, we routinely encounter situations that involve some event that might occur and that, if it did, would bring with it some adverse consequence. We might be able to assign a probability to the occurrence of the event and some quantified magnitude or cost to the adversity associated with its occurrence. This combination of uncertain event and adverse consequence is the determinant of risk. To assess risk in engineering practice, three things need to be defined:

1. The scenario,
2. The range of consequences,
3. The probability of the event leading to the consequences.

Based on the above, a risk analysis attempts to answer three questions:

1. What can happen?
2. How likely is it to happen?
3. Given that it occurs, what are the consequences?

Thus, in engineering, risk is usually defined as comprising:

a set of scenarios (or events) Ei, i = 1,....., n;
probabilities pi associated with each event; and
consequences ci associated with each event.

The quantitative measure of this risk might be defined in a number of ways. In an engineering context, risk is commonly defined as the product of the probability of failure and the consequence; expressed another way, risk is taken as the expectation of adverse outcome:

Risk = (probability of failure x consequence) = pc -------------------(1)


The term risk is used when more than one event may lead to an adverse outcome; the above equation is then extended to be the expectation of consequence over that set of events:

Risk = Σi pi ci -------------------(2)

in which pi is the probability of the ith event and ci is the corresponding consequence.
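As a minimal illustrative sketch of Eq. (2), the expected risk is the sum of probability-consequence products over the scenario set. The scenarios and numbers below are hypothetical, not taken from the text.

    # Sketch of Eq. (2): risk as the expectation of consequence over a
    # set of scenarios. All probabilities and costs are hypothetical.
    scenarios = [
        (1e-3, 5_000_000),  # p_i, c_i: rare failure, large consequence
        (1e-2, 200_000),    # more frequent event, moderate consequence
        (5e-2, 10_000),     # frequent event, minor consequence
    ]
    risk = sum(p * c for p, c in scenarios)      # Risk = sum_i p_i * c_i
    print(f"Expected annual risk: {risk:,.0f}")  # 5000 + 2000 + 500 = 7500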

Another measure is reliability, defined as (1 - probability of failure), which expresses the probability of safety. Reliability is related to the reliability index (β), which is a useful way of describing the boundary between the safe and unsafe states.

1.4.1 Acceptable risks

In engineering, as elsewhere, lower risk usually means higher cost. Thus we are faced with the questions "how safe is safe enough?" or "what risk is acceptable?". In the United States, the government, acting through Congress, has not defined acceptable levels of risk for civil infrastructure, or indeed for most regulated activities. The setting of 'reasonable' risk levels, or at least the prohibition of 'unreasonable' risks, is left to regulatory agencies such as the Environmental Protection Agency, the Nuclear Regulatory Commission, or the Federal Energy Regulatory Commission. The procedures these regulatory agencies use to separate reasonable from unreasonable risks vary from highly analytical to qualitatively procedural.

People face individual risks to health and safety every day, from the risk of catching a dread disease to the risk of being seriously injured in a car crash (Table 1). Society faces risks that large numbers of individuals are injured or killed in major catastrophes (Table 2). We face financial risks every day too, from the calamities mentioned to the risk of losses or gains in investments. Some risks we take on voluntarily, like participating in sports or driving an automobile. Others we are exposed to involuntarily, like a dam failing upstream of our home, or disease.


Table 1 - Average risk of death of an individual from various human-caused and natural accidents (US Nuclear Regulatory Commission 1975)

Accident Type              Total Number   Individual Chance Per Year
Motor Vehicles             55,791         1 in 4,000
Falls                      17,827         1 in 10,000
Fires and Hot Substances   7,451          1 in 25,000
Drowning                   6,181          1 in 30,000
Firearms                   2,309          1 in 100,000
Air Travel                 1,778          1 in 100,000
Falling Objects            1,271          1 in 160,000
Electrocution              1,148          1 in 160,000
Lightning                  160            1 in 2,500,000
Tornadoes                  91             1 in 2,500,000
Hurricanes                 93             1 in 2,500,000
All Accidents              111,992        1 in 1,600

Table 2 - Average risk to society of multiple injuries or deaths from various human-caused and natural accidents (US Nuclear Regulatory Commission 1975)

Type of event      Probability of 100 or more fatalities   Probability of 1000 or more fatalities

Human-caused
Airplane crash     1 in 2 yrs         1 in 2000 yrs
Fire               1 in 7 yrs         1 in 200 yrs
Explosion          1 in 16 yrs        1 in 120 yrs
Toxic gas          1 in 100 yrs       1 in 1000 yrs

Natural
Tornado            1 in 5 yrs         Very small
Hurricane          1 in 5 yrs         1 in 25 yrs
Earthquake         1 in 20 yrs        1 in 50 yrs
Meteorite impact   1 in 100,000 yrs   1 in 1 million yrs


Four observations made in the literature on acceptable risk are:

1. The public is willing to accept 'voluntary' risks roughly 1000 times greater than 'involuntary' risks.
2. The statistical risk of death from disease appears to be a psychological yardstick for establishing the level of acceptability of other risks.
3. The acceptability of risk appears to be proportional to the third power of the benefits.
4. The societal acceptance of risk is influenced by public awareness of the benefits of an activity, as determined by advertising, usefulness and the number of people participating.

The exactness of these conclusions has been criticized, but the insight that acceptable risk exhibits regularities is important.

1.4.2 Risk perception

People view risks not only by whether those risks are voluntary or involuntary, or by whether the associated benefits outweigh the dangers, but also along other dimensions. Over the past twenty years researchers have attempted to determine how average citizens perceive technological risks. A better understanding of the way people perceive risk may help in planning projects and in communication. The public's perception of risk is more subtle than the engineer's.

Table 3. Risk perception: separation of risk perception along two factor dimensions

Factor I: Controllable vs. Uncontrollable
Controllable                       Uncontrollable
Not dread                          Dread
Local                              Global
Consequences not fatal             Consequences fatal
Equitable                          Not equitable
Individual                         Catastrophic
Low risk to future generations     High risk to future generations
Easily reduced                     Not easily reduced
Risk decreasing                    Risk increasing
Voluntary                          Involuntary

Factor II: Observable vs. Unobservable
Observable                         Unobservable
Known to those exposed             Unknown to those exposed
Effect immediate                   Effect delayed
Old risk                           New risk
Risk known to science              Risk unknown to science

1.5 F-N Charts

In a simple form, quantitative risk analysis involves the identification of risks and of damages/fatalities. It is recognized that in many cases the idea of an annual probability of failure, based on F-N relationships (frequency of fatalities F and number of fatalities N), is a useful basis. In the UK, risk criteria for land-use planning based on annual F-N curves suggest lower and upper limits of 10^-6 and 10^-4 per annum for the probability of failure or risk. Guidance on risk assessment is reasonably well developed in many countries, such as the USA, Canada and Hong Kong. Whitman (1984), based on collected data pertaining to the performance of different engineering systems, categorized these systems in terms of annual probability of failure and the associated failure consequences, as shown in Fig. 1.


Figure 1. Annual probabilities of failure and consequence of failure for various engineering projects (Whitman 1984)

Some guidelines on tolerable risk criteria have been formulated by a number of researchers and engineers involved in risk assessment. They indicate that the incremental risk should not be significant compared to other risks, and that risks should be reduced to "As Low As Reasonably Practicable" (ALARP), as indicated in Fig. 2. Figure 2 shows a typical F-N diagram adopted by the Hong Kong Planning Department (Hong Kong Government Planning Department 1994).

Figure 2. F-N diagram adopted by the Hong Kong Planning Department for planning purposes (number of fatalities N, from 1 to 10000, versus annual frequency of accidents with N or more fatalities, from 10^-9 to 10^-2; the diagram is divided into Acceptable, ALARP and Unacceptable regions).

The slope of the lines dividing the regions of acceptability expresses a policy decision between the relative acceptability of low-probability/high-consequence risks and high-probability/low-consequence risks. The steeper the boundary lines, the more averse the policy is to the former. The boundary lines in the Hong Kong guidelines are twice as steep (in log-log space) as the slopes in the Dutch case. Note also that in the Hong Kong case there is an absolute upper bound of 1000 on the number of deaths, no matter how low the corresponding probability.

The role of probabilistic considerations is recognized by the US Army Corps of Engineers, and guidelines on reliability-based design of structures have been suggested; Fig. 3 presents the classification. The annual probability of failure corresponds to an expected factor of safety E(F), which is variable, the variability being expressed in terms of the standard deviation of the factor of safety, σF. If the factor of safety is assumed to be normally distributed, the reliability index (β) is expressed by

β = (E(F) - 1.0) / σF (3)

The guidelines present the recommendations in terms of the probability of failure pf, or the reliability index (β).

Fig. 3 Relationship between reliability index (β) and probability of failure (pf) (Phoon 2002) (adapted from US Army Corps of Engineers 1997).
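The relation between E(F), σF, β and pf can be sketched in a few lines, under the normality assumption stated above; the input values below are hypothetical, and the failure probability is evaluated with the standard normal CDF.

    # Sketch of Eq. (3) and the beta - p_f relationship for a normally
    # distributed factor of safety. E_F and sigma_F are hypothetical.
    from scipy.stats import norm

    E_F = 2.0        # expected factor of safety (hypothetical)
    sigma_F = 0.4    # standard deviation of F (hypothetical)

    beta = (E_F - 1.0) / sigma_F   # Eq. (3): distance of E(F) from failure (F = 1)
    p_f = norm.cdf(-beta)          # P[F < 1] for normal F
    print(f"beta = {beta:.2f}, p_f = {p_f:.2e}")  # beta = 2.50, p_f = 6.21e-03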

1.6 Role of consequence cost

The role of consequence costs has been recognized in recent times and has been receiving considerable attention in the geotechnical profession. Recently, the Joint Committee on Structural Safety (JCSS 2000) presented relationships between the reliability index (β), the importance of the structure and the consequences of failure. The committee divided consequences into 3 classes based on risk to life and/or economic loss, and these are presented in Tables 4 and 5 respectively.

From these tables it can be inferred that if the failure of a structure is of minor consequence (i.e., C* ≤ 2), then a lower reliability index may be chosen. On the other hand, if the consequence costs are higher (i.e., C* = 5 to 10) and the relative cost of safety measures is small, higher reliability index values can be chosen. It can also be noted from the tables that a reliability index in the range of 3 to 5 can be considered acceptable in design practice.

Table 4. Relationship between reliability index (β), importance of structure and consequences (JCSS 2000)

Relative cost of safety measure   Minor consequence of failure   Moderate consequence of failure   Large consequence of failure
Large                             β = 3.1                        β = 3.3                           β = 3.7
Normal                            β = 3.7                        β = 4.2                           β = 4.4
Small                             β = 4.2                        β = 4.4                           β = 4.7

Table 5. Classification of consequences (JCSS 2000)

Class   Consequences   C*            Risk to life and/or economic consequences
1       Minor          C* ≤ 2        Small to negligible, and small to negligible
2       Moderate       2 < C* ≤ 5    Medium, or considerable
3       Large          5 < C* ≤ 10   High, or significant

where C* is the normalized consequence cost (normalized with respect to the initial cost).

From Tables 4 and 5 the following points are clear.


1. The targeted reliability indices vary from 3 to 5, depending on the expected level of performance.

2. Consequence costs can also be considered in the analysis. If the consequence costs are not significant compared to the initial costs (C* ≤ 2), for example a slope design in a remote area, a lower reliability index can be allowed, whereas a higher reliability index is required where the consequence costs are high, for example a slope in an urban locality.

Axioms of probability

A popular definition of probability is in terms of relative frequency: if an outcome A occurs T times in N equally likely trials,

P[A] = T / N

The implication is that if a large number of trials were conducted, this is the frequency with which A would occur. As the concept of repeated trials often does not apply in civil engineering, a subjective interpretation is also considered, in which probability is a measure of information as to the likelihood of occurrence of an outcome. The three axioms of probability are:

Axiom I: 0 ≤ P[A] ≤ 1

Axiom II: The probability of a certain outcome is unity, i.e. P[A] = 1 if A is certain

Axiom III: This axiom requires the concept of mutually exclusive outcomes. Two outcomes are mutually exclusive if they cannot occur simultaneously. The axiom states that

P[A1 + A2 + ........ + AN] = P[A1] + P[A2] + P[A3] + ....... + P[AN] (4)


As an example, consider the design of a structure. After construction, only two outcomes are possible: success or failure. They are mutually exclusive, and since no other outcome is possible they are also called exhaustive. As per Axiom III,

P[Success] + P[Failure] = 1

The probability of success of the structure is its reliability R, given by

R + P[Failure] = 1, or R = 1 - P[Failure] (5)

Basic Probability Concept

By probability we refer to the number of possibilities in a given situation, identifying an event relative to other events. Probability can be considered as a numerical measure of the likelihood of occurrence of an event, relative to a set of alternatives. The first requirements are to

1. Identify all the possibilities as a set
2. Identify the event of interest

In this context, elements of set theory are very useful.

Elements of set theory

Many characteristics of probability can be understood more clearly from the notions of sets and sample spaces.

Sample space

The sample space is the set of all possibilities in a probabilistic problem. It can be classified as either a continuous sample space or a discrete sample space, and a discrete sample space can be further classified into finite and infinite cases.


Discrete Sample Space

Examples of the finite case:

1. The winner in a competitive bidding
2. The number of rainy days in a year

Examples of the infinite case:

1. The number of flaws in a road
2. The number of cars crossing a bridge

Each of the individual possibilities is called a sample point.

Continuous Sample Space

If the number of sample points is effectively infinite, forming a continuum, the sample space is called continuous. For example, the bearing capacity of a clay deposit may vary from 150 to 400 kPa, and any value between these limits is a sample point.

Venn diagram

A sample space is represented by a rectangle, and an event E is represented by a closed region within it. The part outside E is the complementary event, Ē.

Combinations of events

Example: let E1 denote the occurrence of rainfall and E2 the occurrence of an earthquake. The joint occurrence of both events is the intersection E1E2.

If an event E1 occurs n1 times out of n trials, it does not occur n2 = n - n1 times, so the probability of non-occurrence is n2/n. For two mutually exclusive events E1 and E2 occurring n1 and n2 times respectively in n trials,

P[E1 ∪ E2] = (n1 + n2) / n = P[E1] + P[E2] (6)

Multiplication rule and statistical independence

If the occurrence (or non-occurrence) of one event does not affect the probability of the other event, the two events are statistically independent.

If they are dependent, then

P[E1 E2] = P[E1 | E2] P[E2] = P[E2 | E1] P[E1]

If they are independent, then

P[E1 E2] = P[E1] P[E2] (7)

By the multiplication rule,

P[A Ei] = P[Ei] P[A | Ei] = P[A] P[Ei | A] (8)

The above equation gives the probability of the joint occurrence A Ei. If P[A | Ei] is known,

P[Ei | A] = P[A | Ei] P[Ei] / P[A] (9)

Hence

P[E1 | E2] = P[E1 E2] / P[E2] ---- multiplication rule

and, for independent events, P[E1 E2] = P[E1] P[E2]. A generalized multiplication rule for independent events is

P[A1 A2 A3 ........ AN] = P[A1] P[A2] P[A3] ......... P[AN] (10)

Conditional Probability

The occurrence of an event may depend on the occurrence (or non-occurrence) of another event. If this dependence exists, the associated probability is called a conditional probability. The conditional probability of E1 given that E2 has occurred, P(E1 | E2), means the likelihood of realizing a sample point in E1 assuming it belongs to E2 (we are interested in the event E1 within the new sample space E2):

P(E1 | E2) = P(E1 E2) / P(E2)

Total probability theorem and Bayesian probability

Let there be N outcomes of an experiment, A1, A2, A3, ....., AN, which are mutually exclusive and collectively exhaustive, such that

Σ(i=1 to N) P[Ai] = 1

Consider a sample space with N = 5 and another event B which intersects A2, A3, A4 and A5, but not A1. For example, the probability of the joint occurrence of B and A2 is

P[B A2] = P[A2] P[B | A2]

The probability of the joint occurrence with B depends on the outcome A2 having occurred. Since A2 precipitates the part of B that it overlaps, it is said to be a prior event; the occurrence of the part of B that overlaps A2 is called posterior. Now, considering that we need to determine the occurrence of B as a joint event with A2, A3, A4 and A5, one can write

P[B] = Σ(i=1 to N) P[Ai] P[B | Ai] (11)

(here B intersects only A2 to A5, so only the terms with i = 2 to 5 contribute).

The above equation is called the total probability theorem.

We have already seen that

P[B | Ai] = P[Ai B] / P[Ai]

Hence

P[Ai B] = P[Ai] P[B | Ai] = P[B] P[Ai | B]

so that

P[Ai | B] = P[Ai] P[B | Ai] / P[B]

Using the total probability theorem, which states that P[B] = Σ P[Ai] P[B | Ai], we obtain

P[Ai | B] = P[Ai] P[B | Ai] / Σ(i=1 to N) P[Ai] P[B | Ai] (12)

This is called the Bayesian theorem (Bayes theorem). The equation is very useful in civil engineering and science: based on initial estimates, an estimate of the outcome of an event can be made, and once the results of the outcome are known, they can be used to determine revised estimates. Here the probability of the event B is estimated knowing that its signatures are available in the events Ai.
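A minimal sketch of Eq. (12) in code form is given below; the priors and likelihoods are hypothetical and serve only to show how initial estimates P[Ai] are revised once B is observed.

    # Sketch of Bayes theorem, Eq. (12): posterior P[A_i | B] from priors
    # P[A_i] and likelihoods P[B | A_i]. The numbers are hypothetical.
    def bayes_update(priors, likelihoods):
        joint = [p * l for p, l in zip(priors, likelihoods)]  # P[A_i] P[B|A_i]
        total = sum(joint)                                    # P[B], Eq. (11)
        return [j / total for j in joint]

    priors = [0.5, 0.3, 0.2]        # initial estimates P[A_i]
    likelihoods = [0.1, 0.4, 0.8]   # P[B | A_i]
    print(bayes_update(priors, likelihoods))  # [0.1515..., 0.3636..., 0.4848...]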


Example 1

Three contractors A, B and C are bidding for a project. A has half the chance that B has, and B is two-thirds as likely as C to be awarded the contract. What is the probability of each contractor getting the contract?

Answer:

There are three contractors and one will be successful, so P[A + B + C] = 1. As the outcomes are mutually exclusive, P[A] + P[B] + P[C] = 1.

But P[A] = (1/2) P[B] and P[B] = (2/3) P[C], i.e. P[C] = (3/2) P[B].

∴ (1/2) P[B] + P[B] + (3/2) P[B] = 3 P[B] = 1

P[B] = 1/3, P[A] = (1/2)(1/3) = 1/6, and P[C] = (3/2)(1/3) = 1/2.

Example 2

A load of 100 kg can be placed anywhere on a beam; consequently the reactions RA and RB can be anywhere between 0 and 100 kg, depending on the position of the load.

1. Identify the possibilities (10 ≤ RA ≤ 20 kg) and RA > 50 kg
2. Identify the probability of each event of interest

Answer:

The possibilities are: 1. (10 ≤ RA ≤ 20 kg), 2. RA > 50 kg

Events of interest: 1. P(10 ≤ RA ≤ 20 kg) = 10/100 = 0.1, 2. P(RA > 50) = 50/100 = 0.5

Example 3

A contractor is planning the purchase of equipment, including bulldozers, needed for a project in a remote area. A bulldozer may or may not last six months without any breakdown. If he has 3 bulldozers, what is the probability that there will be at least one working bulldozer after six months?

Answer:

Denote the condition of each bulldozer after six months as G (good) or B (bad), and assume the two conditions are equally likely for each machine. The possible combinations are:

1 2 3
G G G
G G B
G B G
G B B
B G G
B B G
B G B
B B B

Seven of these eight equally likely combinations include at least one bulldozer in good condition.

∴ The probability of having at least one bulldozer in good condition after six months is 7/8. (Exactly one good bulldozer, by contrast, occurs in only 3 of the 8 combinations, with probability 3/8.)

Example 4

Bore logs at many locations give the following soil profile composition: sand = 30%, silty sand = 25%, silty clay = 25% and clay = 20%. However, there is some doubt that not all of the soil sampling is reliable. The reliability of sampling in each material is estimated as: sand = 27%, silty sand = 10%, silty clay = 30% and clay = 50%. What is the reliability of the information from sampling at a random point?

Answer

Let AS, ASS, ASC and AC denote encountering sand, silty sand, silty clay and clay, with

P[AS] = 0.30, P[ASS] = 0.25, P[ASC] = 0.25, P[AC] = 0.20

If B denotes the event that the sample is reliable, the conditional probabilities are

P[B | AS] = 0.27, P[B | ASS] = 0.10, P[B | ASC] = 0.30, P[B | AC] = 0.50

Using the total probability theorem, P[B] is given by

P[B] = 0.30 x 0.27 + 0.25 x 0.10 + 0.25 x 0.30 + 0.20 x 0.50 = 0.28
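The same result can be checked with a short script applying the total probability theorem to the values given in the example:

    # Example 4 re-computed with the total probability theorem, Eq. (11).
    fraction = {"sand": 0.30, "silty sand": 0.25, "silty clay": 0.25, "clay": 0.20}
    p_reliable = {"sand": 0.27, "silty sand": 0.10, "silty clay": 0.30, "clay": 0.50}

    p_B = sum(fraction[s] * p_reliable[s] for s in fraction)
    print(f"P[B] = {p_B:.3f}")  # 0.281, i.e. about 0.28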

Example 5

A fair coin (P[Head] = P[Tail] = 1/2) is tossed three times. What is the probability of getting three heads?

Answer

The three tosses are independent events; the eight equally likely outcomes are

HHH, HHT, HTH, HTT, THH, THT, TTH, TTT

1. The probability of getting three heads is P[three heads] = 1/2 x 1/2 x 1/2 = 1/8
2. The probability of getting two heads and one tail is P[two heads + one tail] = 3/8

Example 6

A sample from a pit containing sand is to be examined to determine whether the pit can furnish adequate fine aggregate for making a road. Specifications suggest that the pit should be rejected if at least one of four lots sampled from the pit is not satisfactory. What is the probability that sand from the pit will be accepted if 100 bags are available, of which 5 contain fine aggregate of poor quality (the same number of poor bags is assumed for the smaller stocks below)?

Answer

Each time a lot is sampled it is not replaced. After four trials,

P[acceptance] = (95/100) x (94/99) x (93/98) x (92/97) = 0.812

If there are 50 bags,

P[acceptance] = (45/50) x (44/49) x (43/48) x (42/47) = 0.647

If there are only 25 bags available,

P[acceptance] = (20/25) x (19/24) x (18/23) x (17/22) = 0.383

Hence it can be noted that the sample size influences the acceptance probability.
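The calculation generalizes to any stock size; the sketch below reproduces the three results, assuming 5 poor bags in each stock as in the example:

    # Example 6: probability that four bags drawn without replacement are
    # all satisfactory, with 5 poor bags in a stock of n_bags.
    def p_acceptance(n_bags, n_poor=5, n_lots=4):
        good = n_bags - n_poor
        p = 1.0
        for draw in range(n_lots):
            p *= (good - draw) / (n_bags - draw)  # good left / bags left
        return p

    for n in (100, 50, 25):
        print(n, round(p_acceptance(n), 3))  # 0.812, 0.647, 0.383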

Example 7

Consider a 100 km highway, and assume that the road and traffic conditions are uniform throughout, so that accidents are equally likely anywhere on the highway.

(a) Find the probabilities of events A and B if A denotes an accident between km 0 and km 30, and B denotes an accident between km 20 and km 60.

Answer

Since accidents are equally likely along the highway, the probability of an accident in a given interval is proportional to the length of that interval. For an accident along the 100 km highway,

P[A] = 30/100 and P[B] = 40/100

(b) If an accident occurs in the interval (20, 60), what is the probability that it occurs between km 20 and km 30, the overlap of A and B?

P(A | B) = P(A ∩ B) / P(B) = (10/100) / (40/100) = 10/40 = 0.25

Example 8

Consider a chain system of two links. If the applied force is 100 kg and the strength of a link is less than 100 kg, the chain will fail. The probability of this happening to any one link is 0.05. What is the probability of failure of the chain?

Answer

If E1 and E2 denote the failure of links 1 and 2, then failure of the chain is

P[E1 ∪ E2] = P[E1] + P[E2] - P[E1 E2]
= 0.05 + 0.05 - P(E2 | E1) P[E1]
= 0.1 - 0.05 P(E2 | E1)

This depends on the mutual dependence of E1 and E2. For example, if the links are randomly selected from two suppliers, then E1 and E2 may be assumed to be statistically independent. In such a case,

P[E2 | E1] = P[E2] = 0.05 and P[E1 | E2] = P[E1] = 0.05

P[E1 ∪ E2] = 0.1 - 0.05 x 0.05 = 0.0975


If the links are from the same manufacturer, one may take P[E1 | E2] = 1.0, so that P[E1 E2] = 0.05 and

P[E1 ∪ E2] = 0.05 + 0.05 - 0.05 = 0.05, the failure probability of a single link.

Hence the failure probability of the chain system ranges from 0.05 to 0.0975, depending on the conditional probability P[E2 | E1].
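The bounds can be verified with a short sketch that evaluates the two limiting cases of the conditional probability:

    # Example 8: chain failure probability for the two limiting cases of
    # the conditional probability P[E2 | E1].
    p = 0.05  # failure probability of a single link

    for p_e2_given_e1 in (p, 1.0):        # independent vs. fully dependent
        p_joint = p_e2_given_e1 * p       # P[E1 E2] = P[E2 | E1] P[E1]
        p_union = 2 * p - p_joint         # P[E1 U E2]
        print(p_e2_given_e1, p_union)     # 0.05 -> 0.0975, 1.0 -> 0.05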

Example 9

The following problem demonstrates the use of Bayes theorem in updating information. Consider that Mahmud of Ghazni, while planning his attacks on India from the west, assumed that he had a 1% chance of winning (probability of a win) if he attacked. This represents the level of confidence in the first trial. The probability of a win in the subsequent trials can be obtained from Bayes theorem: it improves in the second and subsequent attacks, and finally, at the end of 17 trials, the probability of a win exceeds 0.99, as given in the following.

Answer:

P(A | T) = P(T | A) P(A) / [P(T | A) P(A) + P(T | Ā) P(Ā)]

with P(T | A) = 1 and P(T | Ā) = 0.5:

Trial   Computation                                  P(A | T)
1       (1 x 0.01) / (1 x 0.01 + 0.5 x 0.99)         0.02
2       0.02 / (0.02 + 0.5 x 0.98)                   0.04
3       0.04 / (0.04 + 0.5 x 0.96)                   0.08
4       0.08 / (0.08 + 0.5 x 0.92)                   0.15
5       0.15 / (0.15 + 0.5 x 0.85)                   0.26
6       0.26 / (0.26 + 0.5 x 0.74)                   0.41
7       0.41 / (0.41 + 0.5 x 0.59)                   0.58
8       0.58 / (0.58 + 0.5 x 0.42)                   0.73
9       0.73 / (0.73 + 0.5 x 0.27)                   0.84
10      0.84 / (0.84 + 0.5 x 0.16)                   0.91
11      0.91 / (0.91 + 0.5 x 0.09)                   0.95
12      0.95 / (0.95 + 0.5 x 0.05)                   0.974
13      0.974 / (0.974 + 0.5 x 0.026)                0.987
14      0.987 / (0.987 + 0.5 x 0.013)                0.993
15      0.993 / (0.993 + 0.5 x 0.007)                0.997
16      0.997 / (0.997 + 0.5 x 0.003)                0.998
17      0.998 / (0.998 + 0.5 x 0.002)                0.999
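The table is generated by repeatedly applying the same update; a few lines of code reproduce it (with P(T | A) = 1 and P(T | Ā) = 0.5, as above):

    # Example 9: iterative Bayesian updating of the probability of a win.
    p = 0.01                 # initial confidence before the first trial
    for trial in range(1, 18):
        p = p / (p + 0.5 * (1.0 - p))   # Bayes update after each trial
        print(trial, round(p, 3))       # 0.02, 0.04, 0.08, ... -> 0.999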


Learning objectives

1. Understanding various graphical methods of analysis, such as the histogram and frequency diagram
2. Understanding the significance of probability distributions
3. Understanding the importance of moments of a distribution

Motivation and Highlights

Motivation: Importance of understanding the role of probability theory and conditional probability in engineering design

Highlights: Knowledge of graphical representations such as the histogram and frequency diagram, moments of a distribution, and the Bayesian theorem

2.1 Introduction

Civil engineering systems deal with variable quantities, which are described in terms of random variables and random processes. Once the system design, such as the design of a building, is identified in terms of providing a safe and economical building, the variable resistances of different members such as columns, beams and foundations, which must sustain variable loads, can be analyzed within the framework of probability theory. The probabilistic description of variable phenomena requires the use of appropriate measures. In this module, various measures for the description of variability are presented.

2.2 Histogram and frequency diagram

Five graphical methods for analyzing variability are:

1. Histograms,
2. Frequency plots,
3. Frequency density plots,
4. Cumulative frequency plots, and
5. Scatter plots.

2.2.1 Histograms

A histogram is obtained by dividing the data range into bins and then counting the number of values in each bin. The unit weight data are divided into intervals approximately 64 kg/m3 wide, from 1441.66 to 2082.40 kg/m3, in Table 2. For example, there are zero values between 1441.66 and 1505.74 kg/m3, two values between 1505.74 and 1569.81 kg/m3, etc. A bar-chart plot of the number of occurrences in each interval is called a histogram. The histogram for unit weight is shown in Fig. 1.

Table 1. Total unit weight data from an offshore boring. Columns: sample number, depth (m), total unit weight x (kg/m3), (x - μx)^2 in (kg/m3)^2, (x - μx)^3 in (kg/m3)^3; the last two columns repeat depth (m) and total unit weight (kg/m3) with the rows sorted by unit weight.

No.  Depth    Unit wt   (x-μx)^2   (x-μx)^3     Depth    Unit wt (sorted)
1    0.15     1681.94   115.33     -310.76      52.43    1521.75
2    0.30     1906.20   2050.36    23189.92     2.29     1537.77
3    0.46     1874.16   1388.80    12936.51     1.52     1585.83
4    1.52     1585.83   1209.39    -10503.30    6.71     1585.83
5    1.98     1617.86   716.03     -4791.12     13.72    1585.83
6    2.29     1537.77   2188.12    -25573.47    31.09    1585.83
7    5.03     1826.10   637.53     4028.64      5.79     1601.85
8    5.79     1601.85   946.69     -7277.19     8.38     1601.85
9    6.71     1585.83   1209.39    -10503.30    11.43    1601.85
10   7.62     1633.88   517.40     -2947.40     15.24    1601.85
11   8.38     1601.85   946.69     -7277.19     24.84    1601.85
12   9.45     1617.86   716.03     -4791.12     37.03    1601.85
13   10.52    1617.86   716.03     -4791.12     1.98     1617.86
14   11.43    1601.85   946.69     -7277.19     9.45     1617.86
15   12.19    1617.86   716.03     -4791.12     10.52    1617.86
16   13.72    1585.83   1209.39    -10503.30    12.19    1617.86
17   15.24    1601.85   946.69     -7277.19     18.90    1617.86
18   18.44    1649.90   352.41     -1649.90     37.19    1617.86
19   18.90    1617.86   716.03     -4791.12     40.23    1617.86
20   21.79    1697.96   44.85      -76.89       7.62     1633.88
21   21.95    1746.01   27.23      36.84        27.89    1633.88
22   24.84    1601.85   946.69     -7277.19     34.14    1633.88
23   24.99    1665.92   217.85     -802.52      46.48    1633.88
24   27.89    1633.88   517.40     -2947.40     18.44    1649.90
25   30.94    1697.96   44.85      -76.89       24.99    1665.92
26   31.09    1585.83   1209.39    -10503.30    43.43    1665.92
27   37.03    1633.88   517.40     -2947.40     98.15    1665.92
28   37.19    1601.85   946.69     -7277.19     0.15     1681.94
29   40.23    1617.86   716.03     -4791.12     49.38    1681.94
30   43.43    1617.86   716.03     -4791.12     21.79    1697.96
31   46.48    1665.92   217.85     -802.52      30.94    1697.96
32   49.38    1633.88   517.40     -2947.40     82.91    1697.96
33   52.43    1681.94   115.33     -310.76      61.42    1713.98
34   58.37    1521.75   2578.97    -32714.50    85.80    1729.99
35   61.42    1858.14   1106.88    9201.00      21.95    1746.01
36   64.47    1713.98   8.01       -4.81        76.66    1746.01
37   73.61    1794.07   297.94     1284.68      82.75    1746.01
38   76.66    1826.10   637.53     4028.64      79.71    1762.03
39   79.80    1746.01   27.23      36.84        89.00    1778.05
40   82.75    1762.03   84.90      198.63       64.47    1794.07
41   82.91    1746.01   27.23      36.84        94.95    1794.07
42   85.80    1697.96   44.85      -76.89       104.09   1794.07
43   89.00    1729.99   1.60       0.00         125.43   1794.07
44   91.90    1778.05   176.20     581.47       131.67   1794.07
45   94.95    2002.31   4800.73    83118.19     101.04   1810.09
46   98.15    1794.07   297.94     1284.68      104.24   1810.09
47   101.04   1665.92   217.85     -802.52      5.03     1826.10
48   104.09   1810.09   451.72     2401.17      73.61    1826.10
49   104.24   1794.07   297.94     1284.68      113.23   1826.10
50   107.29   1810.09   451.72     2401.17      119.33   1826.10
51   110.19   1858.14   1106.88    9201.00      122.53   1826.10
52   110.34   1986.29   4262.51    69531.33     116.28   1842.12
53   113.23   1874.16   1388.80    12936.51     119.48   1842.12
54   116.28   1826.10   637.53     4028.64      125.58   1842.12
55   119.33   1842.12   856.99     6263.22      128.47   1842.12
56   119.48   1826.10   637.53     4028.64      134.72   1842.12
57   122.53   1842.12   856.99     6263.22      58.37    1858.14
58   125.43   1826.10   637.53     4028.64      107.59   1858.14
59   125.58   1794.07   297.94     1284.68      0.46     1874.16
60   128.47   1842.12   856.99     6263.22      110.34   1874.16
61   131.67   1842.12   856.99     6263.22      0.30     1906.20
62   131.67   1794.07   297.94     1284.68      137.62   1906.20
63   134.72   1842.12   856.99     6263.22      110.19   1986.29
64   137.62   1906.20   2050.36    23189.92     91.90    2002.31

The histogram conveys important information about variability in the data set. It shows the range of the data, the most frequently occurring values, and the amount of scatter about the middle values in the set.

There are several issues to consider in determining the number of intervals for a histogram:

1. The number of intervals should depend on the number of data points. As the number of data points increases, the number of intervals should also increase.
2. The number of intervals can affect how the data are perceived. If too few or too many intervals are used, then the distribution of scatter in the data will not be clear.

Experimentation with different intervals is one approach; in addition, the following equation provides an empirical guide:

k = 1 + 3.3 log10(n)

where k is the number of intervals and n is the number of data points. As an example, k is equal to 7 for the unit weight data set, with n equal to 64.
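This empirical guide is easy to evaluate; the sketch below confirms k ≈ 7 for n = 64:

    # Number of histogram intervals from k = 1 + 3.3 log10(n).
    import math

    n = 64                          # data points in Table 1
    k = 1 + 3.3 * math.log10(n)     # = 6.96
    print(round(k))                 # 7 intervals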

Table 2 - Frequency plot data for total unit weight

Lower bound (a)   Upper bound (b)   Number of         Frequency of          Frequency density   Cumulative
(kg/m3)           (kg/m3)           occurrences (c)   occurrences (%) (d)   (% per kg/m3) (e)   frequency (%) (f)
1441.66           1505.74           0                 0                     0                   0
1505.74           1569.81           2                 3                     0.78                3
1569.81           1633.88           21                33                    8.20                36
1633.88           1697.96           9                 14                    3.52                50
1697.96           1762.03           6                 9                     2.34                59
1762.03           1826.10           13                20                    5.08                80
1826.10           1890.18           9                 14                    3.52                94
1890.18           1954.25           2                 3                     0.78                97
1954.25           2018.33           2                 3                     0.78                100
2018.33           2082.40           0                 0                     0                   100
Σ                                   64                100                   25

Column d = Column c / Σ(Column c)
Column e = Column d / (Column b - Column a)
Column f = running total of Column d
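Columns (d), (e) and (f) can be reproduced from the counts with the sketch below. Note that the densities in column (e) correspond to an interval width of 4; the data appear to have been binned originally in 4-pcf intervals (about 64 kg/m3), which is an inference from the numbers rather than a statement in the text.

    # Reproducing Table 2 columns (d), (e), (f) from the interval counts.
    counts = [0, 2, 21, 9, 6, 13, 9, 2, 2, 0]   # column (c), n = 64
    width = 4.0                                  # interval width implied by column (e)

    n = sum(counts)
    freq = [100.0 * c / n for c in counts]       # column (d), %
    density = [f / width for f in freq]          # column (e)

    cumulative, running = [], 0.0                # column (f): running total of (d)
    for f in freq:
        running += f
        cumulative.append(running)

    print([round(d, 2) for d in density])   # [0.0, 0.78, 8.2, 3.52, 2.34, 5.08, ...]
    print([round(c) for c in cumulative])   # [0, 3, 36, 50, 59, 80, 94, 97, 100, 100]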

Figure 1: Histogram of total unit weight (kg/m3); number of occurrences per interval, n = 64.


2.2.2 Frequency plot

The frequency of occurrence in each histogram interval is obtained by dividing the number of occurrences by the total number of data points. A bar-chart plot of the frequency of occurrence in each interval is called a frequency plot. The interval frequencies for the unit weight data are calculated in Table 2, and the resulting frequency plot is shown in Fig. 2.

Figure 2: Frequency plot of total unit weight (kg/m3); frequency of occurrence (%), n = 64.

Note that the histogram and the frequency plot have the same shape and convey the same information; the frequency plot is simply a normalized version of the histogram. Because it is normalized, the frequency plot is useful in comparing different data sets. Example frequency plots are shown in Figs. 2 through 5. Fig. 2 shows the unit weight data, which vary spatially. Fig. 3 shows an example of data that vary with time. The data are monthly average pumping rate measurements versus time for the leak detection system in a hazardous waste landfill. The data vary from month to month due to varying rates of leachate generation and waste placement.

Figure 3: Frequency plot of monthly average flow rate for a leak detection system (cc/sec); frequency of occurrence (%), n = 23.


Fig. 4 shows an example of data that vary between construction projects. The data are the ratios of actual to estimated cost for the remediation of superfund (environmentally contaminated) sites. The data vary between sites due to variations in site conditions, weather, contractors, technology and regulatory constraints. Note that the majority of projects have cost ratios greater than 1.0.

Figure 4: Frequency plot of cost-growth factor; frequency of occurrence (%), n = 102.

Fig. 5 shows an example of data that vary between geotechnical testing laboratories. The data are the measured friction angles for specimens of loose Ottawa sand. Although Ottawa sand is a uniform material and there were only minor variations in the specimen densities, there is significant variability in the test results. Most of this variability is attributed to differences in test equipment and procedures between the various laboratories.

Figure 5: Frequency plot of friction angle (degrees); frequency of occurrence (%), n = 28.

2.2.3 Frequency density plot

Another plot related to the histogram is the frequency density plot. The frequency density is obtained by dividing the interval frequencies by the interval widths, and a bar-chart plot of the frequency density is called the frequency density plot. The objective in dividing the frequency by the interval width is to normalize the histogram further: the area below the frequency density plot (obtained by multiplying the bar heights by their widths) is equal to 100%. This normalization is useful in fitting theoretical random variable models to the data.

The frequency densities for the unit weight data are calculated in Table 2; the units of frequency density are percent per unit of the data, i.e. % per kg/m3 for the unit weight data. The resulting frequency density plot is shown in Fig. 6.

Figure 6: Frequency density plot of total unit weight (kg/m3); frequency density (% per kg/m3), n = 64.

2.2.4 Cumulative frequency plot

The cumulative frequency plot is the final graphical tool that we will present for variability analysis. The cumulative frequency is the frequency of data points that have values less than or equal to the upper bound of an interval in the frequency plot. It is obtained by summing up (or accumulating) the interval frequencies for all intervals below the upper bound. A plot of cumulative frequency versus the upper bound is called the cumulative frequency plot.

The cumulative frequencies for the unit weight data are calculated in Table 2. For example, the cumulative frequency for an upper bound of 1634 kg/m3 is equal to 0% + 3% + 33% = 36%. The resulting cumulative frequency plot is shown in Fig. 7.

A percentile value for the data set is the value with that cumulative frequency. For example, the 50th percentile value for the unit weight data set is 1698 kg/m3 (50 percent of the values are less than or equal to 1698 kg/m3), while the 90th percentile value is equal to 1874 kg/m3 (Fig. 7).

Figure 7: Cumulative frequency plot of total unit weight (kg/m3); cumulative frequency (%), with the 50th and 90th percentiles marked.



2.3 Data transformations

In some cases it is useful to transform the data before plotting them. One example is a data set of measured hydraulic conductivity values for a compacted clay liner. The frequency plot for these data, shown in Fig. 8, does not convey much about the data set because the hydraulic conductivity values range over several orders of magnitude. A more useful representation of the data is a frequency plot of the logarithm of hydraulic conductivity, as shown in Fig. 9. Now it can be seen that the most likely interval is between 10^-8.4 and 10^-8.2 cm/s, and that most of the data are less than or equal to 10^-7 cm/s.

Figure 8: Frequency plot of hydraulic conductivity (m/sec); frequency of occurrence (%).

A second example of data for which a transformation is useful is a set of undrained shear strength data for a normally consolidated clay. A frequency plot of these data from an offshore boring in the


Gulf of Mexico is shown in Fig. 10. The data exhibit substantial variability with depth, ranging from 2000 to 20,000 kg/m2; however, this frequency plot is misleading because much of the variability can be attributed to the shear strength increasing with depth. In order to demonstrate this trend, a scatter plot of the undrained

Figure 9: Frequency plot of the logarithm of hydraulic conductivity (m/s); frequency of occurrence (%).

shear strength versus depth is shown in Fig. 10. A more useful measure of undrained strength is obtained by normalizing it by depth, as shown in Fig. 11. This scatter plot shows that the trend with depth has now been removed from the data, and the variability in the shear strength to depth ratio is much smaller than that in the undrained shear strength alone. A frequency plot of the shear strength to depth ratio is shown in Fig. 12.

Figure 10: Frequency plot of undrained shear strength (kg/m2); frequency of occurrence (%).


2.4 Description of random variables

The probability characteristics of a variable could be described completely if the form of the distribution and the associated parameters were specified. However, in practice the form of the distribution function may not be known, and consequently an approximate description is often necessary. The key descriptors of a random variable are the central value, or mean, of the random variable and a measure of dispersion, represented by the variance. A measure of asymmetry is also important when the distribution is unsymmetrical.

2.5 Mean or average value

E(X) = Σ xi pX(xi) for a discrete random variable

E(X) = ∫ x fX(x) dx (from -∞ to +∞) for a continuous random variable

This is essentially a weighted average. Other quantities used to denote the central tendency include the mode and the median.

The mode is the most probable value of a random variable, i.e. the value that has the maximum probability or the highest probability density.

The median is the value of the random variable at which values above and below it are equally probable.

In general, the mean, median and mode of a random variable are different if the density function is not symmetric. However, if the probability density function (PDF) is symmetric and unimodal, then these quantities coincide.

2.6 Variance and standard deviation

The variance gives a measure of dispersion around the central value.

For a discrete random variable,

Var(X) = Σ (xi - μX)^2 pX(xi)

For a continuous random variable,

Var(X) = ∫ (x - μX)^2 fX(x) dx
       = ∫ (x^2 - 2 μX x + μX^2) fX(x) dx
       = E(X^2) - 2 μX E(X) + μX^2
       = E(X^2) - μX^2

Dimensionally, a convenient measure of dispersion is the standard deviation, SD = σX = sqrt(Var(X)). To define the measure of dispersion relative to the central value, we define the

Coefficient of Variation (CoV): δX = σX / μX

The mode is the value which occurs most frequently. A sample may have more than one mode, in which case it is said to be multimodal; a cumulative distribution may then show points of inflexion.
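These descriptors are straightforward to compute for a sample; the values below are a small hypothetical set of unit weights, used only for illustration:

    # Mean, variance, standard deviation and coefficient of variation for
    # a small hypothetical sample (kg/m3).
    import math

    x = [1585.8, 1617.9, 1601.9, 1697.9, 1826.1]

    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)  # dispersion about the mean
    sd = math.sqrt(var)                               # sigma_X
    cov = sd / mean                                   # delta_X = sigma_X / mu_X

    print(round(mean, 1), round(sd, 1), round(cov, 3))  # 1665.9 88.9 0.053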

2.7 Moments

Consider a system of discrete parallel forces f1, f2, ....., fn acting at positions x1, x2, ....., xn. From statics, the centroid is located at

x̄ = Σ xi fi / Σ fi

Suppose the discrete forces represent the probabilities of all possible occurrences, so that Σ fi = 1. Then

E[x] = Σ(i=1 to N) xi fi

is called the expected value, or expectation, of x and provides a measure of the central tendency of the distribution. It is different from the arithmetic mean (the two may be equal in the case of the normal distribution, where the expected value is obtained from the probabilities of the random variable). The following rules apply to the operation of expectation, since expectation is a linear operator:

1. E[ax + b] = a E[x] + b
2. If x = x1 + x2 + x3 + ............. + xn, then E[x] = E[x1] + E[x2] + ........ + E[xn]
3. If f1(x) and f2(y) are functions of two random variables x and y, then E[f1(x) + f2(y)] = E[f1(x)] + E[f2(y)]

Again from statics, the moment of inertia about the centroid is

I = Σ(i=1 to N) (xi - x̄)^2 fi

and correspondingly the variance is

Var[x] = Σ(i=1 to N) (xi - μ)^2 fi

2.8. Random variables and probability Distributions

To use probability or a probabilistic model for formulating and solving a given problem,

one accepts the view that the problem is concerned with a random phenomenon or phenomena.

Significant parameters influencing the problem are random variables or are regarded as such.

A random variable is a function of the value(s) which identify an outcome or an event. A random variable may be discrete or continuous, or a combination of the two. Each numerical value of a random variable is associated with a probability measure. For example, if A is a discrete random variable with values 1, 2 and 3, then the value A = 1 may have a probability of 0.2, the value A = 2 may have a probability of 0.3, and the value A = 3 would have a probability of 0.5. (Note that the sum of these probabilities is unity.)

For a continuous random variable X, probabilities are associated with intervals on the real

line (abscissa). At a specific value of X (say X = x) only the density of the probability is defined.

The probability law or probability distribution is therefore defined in terms of a probability density function, denoted 'PDF'. Let $f_X(x)$ be the PDF of X; then the probability that X lies in the interval (a, b] is
$$P(a < X \le b) = \int_a^b f_X(x)\,dx$$

However, the probability distribution can also be defined by a cumulative distribution function, denoted CDF:
$$F_X(x) = P(X \le x)$$

The CDF is extremely useful as we obtain a measure of probability directly, whereas to

obtain the probability measure from the PDF the area under the PDF has to be calculated. For

the continuous random variable we can write:

$$F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy$$

Assuming that $F_X(x)$ has a first derivative,
$$f_X(x) = \frac{dF_X(x)}{dx}$$

The probability that the value of X lies in the interval (x, x + dx] is given by $f_X(x)\,dx$, that is,
$$P(x < X \le x + dx) = f_X(x)\,dx = dF_X(x)$$

Figure 2 below shows an example of a continuous random variable with its PDF and CDF.

A function used to describe a probability distribution must be positive and the probabilities

associated with all possible values of the random variable must add up to unity. Therefore

$$F_X(-\infty) = 0, \qquad F_X(+\infty) = 1.0, \qquad f_X(x) \ge 0$$

Note also that Fx(x) will never decrease with increasing x and that it is continuous with x.

Obviously the CDF is a continuous curve, the magnitude of the ordinate at the end of the curve

being unity. This represents the total area under PDF which is also unity, the total probability

associated with a random variable.

Figure 2 – A continuous random variable X showing PDF and CDF

Consider now the corresponding terms with respect to a discrete random variable. The CDF has the same meaning as for a continuous variable and the

same equation applies. However, instead of the PDF, the alternative to CDF is a probability

mass function denoted by PMF. The PMF gives the probability of the random variable for all its

discrete values (as stated for the variable A earlier in this section).

Let X be a discrete random variable with PMF $p_X(x_i) = P(X = x_i)$, in which $x_i$ represents the discrete values of X, that is $x_1, x_2, x_3$, etc. Then its CDF $F_X(x)$ is given by
$$F_X(x) = P(X \le x) = \sum_{\text{all } x_i \le x} p_X(x_i)$$

It is easy to show that for the interval $a < X \le b$,
$$P(a < X \le b) = F_X(b) - F_X(a)$$

The PMF is not a curve but a series of vertical lines or ordinates, as shown in the figure below, with heights representing probability measures (not probability densities as in the continuous case, i.e. the PDF). The sum of the ordinates must be unity. A bona fide cumulative distribution function for the discrete case must satisfy the same conditions as for a continuous random variable; for a discrete variable the CDF is a non-decreasing step function of x.

Figure 3 – A discrete random variable X showing PMF and CDF

The simplest continuous distribution is the uniform distribution, whose density is a line parallel to the horizontal axis (abscissa), as shown in the figure above. Another relatively simple distribution is the triangular distribution; a modification of the triangular distribution is the trapezoidal distribution. It is useful to consider some of these as examples before proceeding to well-known and widely used distributions such as the normal (or Gaussian) distribution, the lognormal distribution, the beta distribution and others.


2.9. Moments of a random variable

Before proceeding to more sophisticated distributions, it is necessary to consider

important descriptors of a distribution. A random variable may be described in terms of its mean

value called the ‘mean’ and its ‘variance’ (the ‘standard deviation’, which is the square root of

the variance, is often used instead of the variance). The use of these parameters with a known or

assumed distribution is very convenient. The mean and the standard deviation are generally the

main descriptors of a random variable; however, other parameters may have to be used to

describe a distribution properly. Reference is made later to another descriptor or parameter of a

distribution, called the 'skewness'. Often a distribution is not known, but estimates of the mean and the standard deviation (or the variance) can be made; it is then possible to solve problems on the basis of an appropriate assumption concerning the distribution. In other words, one tries to fit a distribution to the known values of these descriptors.

The mean value is a central value representing the weighted average of the values of the random variable, where the weight for each value is its probability density for a continuous distribution and its probability for a discrete distribution. The mean value is called the expected value, and it is also referred to as the first moment of a random variable. The mean value of X is denoted by E(X), $\bar{x}$ or $\mu_x$. For a continuous random variable with PDF $f_X(x)$ we have
$$\bar{x} = E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx$$
For a discrete random variable with PMF $p_X(x_i)$,
$$\bar{x} = E(X) = \sum_{\text{all } x_i} x_i\, p_X(x_i)$$

Other descriptors such as the 'mode' and the 'median' may also be used to designate the central value of a random variable. The mode is the most probable value of a random variable, and the median is the value of the random variable at which the cumulative probability is 0.50 (i.e. 50%). For a symmetric PDF with a single mode, the mean, the median and the mode are identical. But, in general, the values of all three may be different from one another.

The variance of a random variable is a measure of its dispersion and is defined as follows for a continuous random variable:
$$V(x) = \int_{-\infty}^{\infty} (x - \bar{x})^2 f_X(x)\,dx$$
Noting the form of the expression, the variance is also called the 'second central moment' of a random variable, as it is the expectation of $(x-\bar{x})^2$, i.e. $E[(x-\bar{x})^2]$. By expanding the right-hand side it can be shown that
$$V(X) = E(X^2) - \bar{x}^2$$
In practice the standard deviation $S_x$ (also denoted $\sigma_x$) of a random variable is used in preference to the variance, primarily because it has the same units as the mean. We recall that
$$S_x = \sqrt{V(x)}$$
A similar equation may be written for a discrete random variable.

A relative measure of dispersion of a random variable is its coefficient of variation $V_x$, which is the ratio of the standard deviation to the mean, that is,
$$V_x = \frac{S_x}{\bar{x}}$$
The coefficient of variation is a good parameter for comparing the spread or dispersion of different random variables. In other words, it is useful for comparing the variability or uncertainty associated with different quantities.

The 'third central moment' of a random variable is a measure of the asymmetry or skewness of its distribution; it may be negative or positive. For a continuous random variable, the expression is
$$E[(x-\bar{x})^3] = \int_{-\infty}^{\infty} (x-\bar{x})^3 f_X(x)\,dx$$
A similar expression may be written for a discrete variable. The skewness coefficient is the ratio of the skewness to the cube of the standard deviation.

2.10. The normal distribution

Introduction

Perhaps the best known and most used distribution is the normal distribution, also known as the Gaussian distribution. The normal distribution has the probability density function
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty$$
where µ = mean and σ = standard deviation of the variate are the parameters of the distribution.

The standard normal distribution: a Gaussian distribution with parameters µ = 0 and σ = 1.0 is known as the standard normal distribution, denoted N(0, 1). The density function accordingly becomes
$$f_S(s) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}s^2\right)$$

Models from limiting cases

Models arise as a result of relationships between the phenomenon of interest and its many causes. The uncertainty in a physical variable may be the result of the combined effects of many contributing causes. The individual small contributing factors are difficult to quantify, yet the overall behaviour can be studied. The ability of this bell shape to approximate the distribution of the sum of a number of uniformly distributed random variables is not coincidental: it is due to the central limit theorem. Under very general conditions, as the number of variables in the sum becomes very large, the distribution of the sum of random variables approaches the normal distribution.
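This limiting behaviour is easy to check numerically. The following minimal sketch (not part of the original notes; it assumes numpy is installed, and the sample sizes are arbitrary) sums uniformly distributed variates and compares the sample moments of the sum with the theoretical values.

```python
# Minimal sketch of the central limit theorem: the sum of many uniform
# variates approaches a normal distribution. Assumes numpy.
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_samples = 12, 100_000          # assumed sizes for illustration

# Sum 12 independent U(0,1) variates per sample.
sums = rng.uniform(0.0, 1.0, size=(n_samples, n_vars)).sum(axis=1)

# Theoretical moments of the sum: mean = n/2, variance = n/12.
print("sample mean %.3f vs theory %.3f" % (sums.mean(), n_vars / 2))
print("sample var  %.3f vs theory %.3f" % (sums.var(), n_vars / 12))
```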

Continuous distribution

For a continuous random variable defined between limits $x_a$ and $x_b$,
$$E[x] = \int_{x_a}^{x_b} x f(x)\,dx, \qquad V[x] = \int_{x_a}^{x_b} (x-\bar{x})^2 f(x)\,dx$$
(this is the same as the discrete formulation). Expanding,
$$V[x] = E[(x-\bar{x})^2] = E[x^2] - 2\bar{x}\,E[x] + \bar{x}^2 = E[x^2] - \bar{x}^2$$

Uniform distribution

For a uniform density $f(y) = c$ between a and b (a rectangle of height $c = 1/(b-a)$),
$$E[y] = \frac{a+b}{2}, \qquad \mathrm{Var}[y] = \frac{(b-a)^2}{12}$$
The uniform distribution is useful when all outcomes are equally likely and no other information is available.

Cumulative distribution

The cumulative distribution is helpful in determining the probability that a random variable will take a value less than or equal to a particular numerical value, or within a range of values.

2.10.1. The standard normal variate

The normal or Gaussian distribution is represented by a continuous, symmetric PDF given by the following equation:
$$f_X(x) = \frac{1}{S_x\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\bar{x}}{S_x}\right)^2\right], \quad -\infty < x < \infty$$

Figure 4 – A normal distribution $N(\bar{x}, S_x)$ with mean $\bar{x}$ and standard deviation $S_x$; the area between a and b equals P(a < X < b)

A short notation $N(\bar{x}, S_x)$ is often used for a normal distribution (Figure 4).

A very useful form of the distribution is one with zero mean and unit standard deviation, referred to as the 'standard' normal distribution. Thus if S is the standard normal random variable (or simply variate), its PDF is (Figure 5)
$$f_S(s) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}s^2\right), \quad -\infty < s < \infty$$
This is also denoted N(0, 1) and is symmetrical about zero. Its cumulative distribution function or CDF is often denoted by $\Phi(s)$, that is,
$$\Phi(s) = F_S(s) = p$$

Figure 5 – A standard normal distribution

where p is the probability $p = P(S \le s)$. This probability represents the area under the standard normal distribution to the left of s, that is, from $-\infty$ to s. This distribution is available in tables, and often values are given only for positive values of the standard normal variate. Thus values start from 0.5 for s = 0 and approach unity for increasing positive values of s. For negative values of s, the probability is obtained by subtraction from unity (the total area under the distribution being unity). Hence we have
$$\Phi(-s) = 1 - \Phi(s)$$
This is obviously correct because the standard distribution is symmetrical about s = 0.

The reverse calculation, that is, determination of the value of the variate s for a given cumulative probability p, is often important, and one may write
$$s = \Phi^{-1}(p)$$

Figure 5 – A standard normal distribution N(0, 1) (zero mean and unit standard deviation; the shaded area to the left of s equals the probability p)

Returning to tabulated values: as noted earlier, the tables usually contain the CDF for positive values of the variate s. Because of symmetry, the CDF for negative values of s can simply be obtained using the equation $\Phi(-s) = 1 - \Phi(s)$. Positive values of the variate are associated with CDF > 0.5, i.e. p > 0.5. For values of p < 0.5, the variate is given by
$$s = \Phi^{-1}(p) = -\Phi^{-1}(1-p)$$
[Note: in some tables the values of cumulative probability start from zero even though only positive values of the variate are considered. In such cases the user should add 0.5 to the tabulated value for the left symmetrical half of the area.]

2.10.2. Application of standard normal variate

The first step is to obtain the standard variate s from the given mean and standard deviation of the random variable X. The relationship between x and s is obvious from the corresponding expressions for the PDF, and we have
$$s = \frac{x - \bar{x}}{S_x}$$
The probability that the random variable X lies between two limits a and b is given by the probability that the standard normal variate lies between the corresponding values $s_1$ and $s_2$:
$$P(a \le X \le b) = \Phi(s_2) - \Phi(s_1) = \Phi\left(\frac{b-\bar{x}}{S_x}\right) - \Phi\left(\frac{a-\bar{x}}{S_x}\right)$$

2.10.3. Logarithmic normal distribution

Consider a random variable X which does not follow a normal distribution but whose natural logarithm (ln X) has a normal distribution. The variable X is then said to have a logarithmic normal or lognormal probability distribution, and its density function is
$$f_X(x) = \frac{1}{x\beta\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln x - \alpha}{\beta}\right)^2\right], \quad 0 \le x < \infty$$
in which

$$\alpha = E(\ln X) = \overline{\ln X} \qquad \text{and} \qquad \beta = \sqrt{\mathrm{Var}(\ln X)} = S_{\ln X}$$
are respectively the mean and standard deviation of ln X and are the parameters of the distribution. The assumption of a lognormal distribution is often preferred to that of a normal distribution for random variables which must have a positive value. For instance, the factor of safety F is by definition a positive quantity; therefore it appears desirable to adopt F as a lognormal variate rather than as a normal variate. Figure 6 shows a typical lognormal distribution.

It is easy to show that the tabulated values of the CDF of a standard normal distribution can be used for a lognormal distribution as well. The probability of X being in the interval (a, b] is
$$P(a < X \le b) = \Phi\left(\frac{\ln b - \alpha}{\beta}\right) - \Phi\left(\frac{\ln a - \alpha}{\beta}\right)$$

The probability of X being less than unity is
$$P(X < 1) = \Phi\left(\frac{\ln 1 - \alpha}{\beta}\right) = \Phi\left(-\frac{\alpha}{\beta}\right)$$

It can be shown that α and β can be expressed in terms of $\bar{x}$ and $S_x$, as given below.

Figure 6 : Lognormal distribution showing typical shape of PDF

A random variable X has a logarithmic normal probability distribution if ln X is normal; the density function, similar in form to the normal distribution, is written as
$$f_X(x) = \frac{1}{xG\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln x - \lambda}{G}\right)^2\right], \quad 0 \le x < \infty$$
where λ and G are the parameters of the distribution:
$$\lambda = E(\ln X) \ (\text{the mean of } \ln X), \qquad G = \sqrt{\mathrm{Var}(\ln X)}$$
Probabilities associated with a lognormal variate can be determined from standard normal probabilities.

The probability that the variable X will assume values in an interval (a, b] is
$$P(a < X \le b) = \int_a^b \frac{1}{xG\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln x - \lambda}{G}\right)^2\right] dx$$
Letting $s = \dfrac{\ln x - \lambda}{G}$, so that $dx = xG\,ds$, the integral becomes a standard normal one:
$$P(a < X \le b) = \frac{1}{\sqrt{2\pi}} \int_{(\ln a - \lambda)/G}^{(\ln b - \lambda)/G} e^{-s^2/2}\,ds = \Phi\left(\frac{\ln b - \lambda}{G}\right) - \Phi\left(\frac{\ln a - \lambda}{G}\right)$$

Since the lognormal distribution can be evaluated using the normal distribution itself, and since the values of the random variable are always positive, the lognormal distribution is useful where the values of the variate are strictly positive, e.g. the intensity of rainfall.

The lognormal variate is bounded between $y_a = \exp(-\infty) = 0$ and $y_b = \exp(+\infty) = \infty$.

If the mean µ and CoV of a lognormal variate X are given, then
$$\mu = \exp\left(\lambda + \frac{1}{2}\xi^2\right), \qquad \lambda = \ln\mu - \frac{1}{2}\xi^2$$
where ξ represents the CoV, since
$$\xi \approx \frac{\sigma}{\mu} = \delta = \mathrm{CoV}(X) \quad \text{if CoV} \le 0.3$$
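A minimal numerical sketch of these relations (not from the notes; the mean and CoV are assumed values) computes λ and ξ and then an interval probability through the standard normal CDF:

```python
# Lognormal parameters from mean and CoV, and an interval probability.
import math
from scipy.stats import norm

mu, cov = 60.0, 0.25                   # assumed mean and CoV of X
xi = math.sqrt(math.log(1 + cov**2))   # exact; ~= cov when cov <= 0.3
lam = math.log(mu) - 0.5 * xi**2       # lambda = ln(mu) - xi^2/2

a, b = 40.0, 70.0
p = norm.cdf((math.log(b) - lam) / xi) - norm.cdf((math.log(a) - lam) / xi)
print(round(p, 3))                     # P(40 < X <= 70), about 0.71
```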

2.11. Beta distribution

In geotechnical engineering one is often concerned with random variables whose values are bounded between finite limits. For example, the angle of internal friction for a given sand has specific limits to its value, depending largely on the relative density (density index) of that sand. Similarly, values of unit weight γ and undrained shear strength c_u lie between finite limits. In such cases it is generally unrealistic to assume variation from −∞ to +∞ (as in the normal distribution) or from 0 to ∞ (as in the lognormal distribution). Certainly it is true that all distributions lead to results of similar accuracy for values of the random variable in the central region of a distribution. Yet when one is concerned primarily with the tails of a distribution, significant differences in results are obtained depending on the choice of distribution. Putting it differently, if one is concerned with relatively high probabilities, the choice of a distribution may not be critical. However, if one is concerned with relatively low probabilities (say < 10⁻²), the choice of a distribution determines the accuracy and even the order of magnitude of the answer. In some civil engineering problems, for instance highway cuttings, high failure probabilities may be acceptable. In others, for example earth dams and multi-storeyed buildings, low failure probabilities must be ensured. In special cases, for instance foundations of nuclear power plants and other very sensitive structures, extremely low failure probabilities and low probabilities against settlement or differential settlement of a certain magnitude must be ensured. Therefore in some cases the choice of distribution is critical.

A probability distribution appropriate for a random variable whose values are bounded between finite limits a and b is the beta distribution. The density function of this distribution is
$$f_X(x) = \frac{1}{B(q,r)} \frac{(x-a)^{q-1}(b-x)^{r-1}}{(b-a)^{q+r-1}}, \quad a \le x \le b$$

in which q and r are the parameters of the distribution and B(q, r) is the beta function
$$B(q,r) = \int_0^1 x^{q-1}(1-x)^{r-1}\,dx$$
which is related to the gamma function as follows:
$$B(q,r) = \frac{\Gamma(q)\,\Gamma(r)}{\Gamma(q+r)}$$

Depending on the parameters q and r, the density function of the beta distribution takes different shapes.

If the values of the variate are limited between 0 and 1.0 (i.e. a = 0 and b = 1.0), the above equation for $f_X(x)$ becomes
$$f_X(x) = \frac{1}{B(q,r)}\, x^{q-1}(1-x)^{r-1}, \quad 0 \le x \le 1.0$$

The shape of the distribution becomes important when one is concerned with a very low probability of failure. For example, when one is concerned with high probabilities of safety, or high reliability (> 0.99), the choice of distribution plays a dominating role. It also depends on the type of problem: for example, for shallow highway cuttings and mine works a reliability of 0.90 to 0.95 is acceptable, whereas for earth dams and multi-storeyed buildings the reliability should be in the range of 0.999 to 0.9999.


The mean and the variance of the beta distribution are given by
$$\mu_X = a + \frac{q}{q+r}(b-a)$$
$$\sigma_X^2 = \frac{qr}{(q+r)^2(q+r+1)}(b-a)^2$$

Solving these two equations simultaneously for a particular data set gives, for example,
$$q = 3.26 \quad \text{and} \quad r = 4.89$$

The beta distribution (Figure 7) is appropriate for random variables which have a finite range. The uniform distribution, already considered earlier, is the simplest example of a beta distribution. Considering the limits a and b for a random variable X, the PDF of a beta distribution may be written in the form
$$f_X(x) = \frac{1}{B(q,r)} \frac{(x-a)^{q-1}(b-x)^{r-1}}{(b-a)^{q+r-1}}, \quad a \le x \le b$$
in which q and r are the two parameters which determine the shape of the distribution and B(q, r) is the beta function

Figure 7 – Beta distribution (example with q = 2.0, r = 6.0)

$$B(q,r) = \int_0^1 x^{q-1}(1-x)^{r-1}\,dx$$

The mean and variance of the beta distribution are
$$\bar{x} = a + \frac{q}{q+r}(b-a), \qquad S_x^2 = \frac{qr}{(q+r)^2(q+r+1)}(b-a)^2$$

The standard beta distribution may be defined as one which has a = 0 and b = 1. The standard PDF is symmetrical for q = r = 3, and it is uniform with a density of unity for q = r = 1.0 (Figure 8).

Figure 8 – Various shapes of standard beta distribution

(Shapes shown for q = 1.0, r = 4.0; q = r = 3.0; q = 4.0, r = 2.0; and q = r = 1.0.)

Probability calculations are facilitated by the incomplete beta function, which is defined as
$$B_x(q,r) = \int_0^x y^{q-1}(1-y)^{r-1}\,dy, \quad 0 < x < 1.0$$

The probability of X being between limits c and d is given by
$$P(c \le X \le d) = \frac{1}{B(q,r)}\left[B_u(q,r) - B_v(q,r)\right]$$
where
$$u = \frac{d-a}{b-a} \quad \text{and} \quad v = \frac{c-a}{b-a}$$

2.12. Binomial and geometric distributions

Assume that repeated trials of an event are made, that the probability p of occurrence in each trial is constant, and that the trials are statistically independent. Then, considering that each trial has only two outcomes, occurrence or non-occurrence, the problem may be modeled as a Bernoulli sequence. One may apply this to rainfall, flooding, earthquakes, etc., which affect the performance of geotechnical structures.

The probability of exactly x occurrences among n trials in a Bernoulli sequence is given by the binomial distribution, the equation for the PMF being
$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \dots, n$$
where
$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$
is the binomial coefficient, and n and p are the parameters.

The probability of realizing one particular sequence of exactly x occurrences of the event among n trials is $p^x(1-p)^{n-x}$. However, the sequence of trials can be permuted; therefore, the number of distinct sequences with exactly x occurrences is $\binom{n}{x}$.
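A minimal numerical sketch (not from the notes; n and p are illustrative values) of the binomial PMF and CDF:

```python
# Binomial probabilities with scipy.stats.binom.
from scipy.stats import binom

n, p = 10, 0.4                 # 10 trials, occurrence probability 0.4
print(binom.pmf(2, n, p))      # P(exactly 2 occurrences) ~= 0.121
print(binom.cdf(5, n, p))      # P(5 or fewer occurrences) ~= 0.834
```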


The number of trials until an event occurs for the first time is governed by the geometric distribution. Let T be the random variable denoting the number of trials to first occurrence, and let this occur at T = t. Then we have
$$P(T = t) = p\,q^{t-1}, \quad t = 1, 2, \dots$$
where q = 1 − p. This is known as the geometric distribution; it corresponds to t − 1 non-occurrences followed by one occurrence.

The first occurrence time is considered equal to the recurrence time, and the latter also has a geometric distribution; the mean recurrence time, also known as the return period, is
$$E(T) = \bar{T} = \sum_{t=1}^{\infty} t\,p\,q^{t-1} = p\,(1 + 2q + 3q^2 + \dots)$$
Since q < 1.0, summation of the series gives $\bar{T} = 1/p$: the average return period is the reciprocal of the probability of the event within one time unit.

So far we have considered the number of trials or time units until the first occurrence. The time until the kth occurrence is governed by the negative binomial distribution. The probability that $T_k = t$, where $T_k$ is the number of trials until the kth occurrence, is
$$P(T_k = t) = \binom{t-1}{k-1} p^k q^{t-k}, \quad t = k, k+1, \dots \quad (= 0 \text{ for } t < k)$$

Models for simple discrete random trials

A basic situation is one in which the outcomes of experiments can be separated into two exclusive categories: good or bad, failure or success, etc. We are interested in the simplest kind of experiment, the outcome of which can be either failure or success. The two events are mutually exclusive and collectively exhaustive possible outcomes. This is called a Bernoulli trial.

The binomial distribution is given by
$$b(x, N, R) = \binom{N}{x} R^x p^{N-x} = \frac{N!}{x!\,(N-x)!} R^x p^{N-x}$$

The binomial distribution models the outcomes of experiments for which the following properties hold:

1. An experiment is repeated N times, with the outcome of each trial being independent of the others.
2. Only two mutually exclusive outcomes are possible, called success or failure; x is the number of successes.
3. The probability of success in each trial, denoted by R, remains the same for all trials; p is the probability of non-occurrence, with R + p = 1.
4. The experiment is performed under the same conditions for all N trials.
5. The interest is in the number of successes x in the N trials, and not in the order in which they occur.

Geometric distribution

Assuming independence of trials and a constant value of P, the distribution of N, the number of trials to the first success, can be found. The first success will occur on the nth trial if and only if the first (n − 1) trials are failures:
$$P(N = n) = (1-P)^{n-1} P$$
This is called the geometric distribution.

The probability that there is at least one occurrence in n trials is
$$1 - P[\text{no occurrence in } n \text{ trials}] = 1 - (1-P)^n$$

Moments of the geometric distribution:
$$E(N) = \frac{1}{P}, \qquad \mathrm{Var}(N) = E[N^2] - (E[N])^2 = \frac{1-P}{P^2}$$

Design values and return periods: civil engineering systems must withstand the effects of rare events such as large floods or high winds, so it is necessary to consider the risks involved in the choice of design capacity. In a design, one can estimate the maximum magnitude of rare events which the structure can withstand (e.g. maximum wind velocity).

Average return period

The expected value of the geometric distribution is
$$E[N] = \sum_{n=1}^{\infty} n\,P\,(1-P)^{n-1} = P\left[1 + 2(1-P) + 3(1-P)^2 + \dots\right] = \frac{P}{[1-(1-P)]^2} = \frac{1}{P}$$

The probability that there will be no occurrence of the m-year flood in m years is
$$\left(1-\frac{1}{m}\right)^m = 1 - m\cdot\frac{1}{m} + \frac{m(m-1)}{2!}\cdot\frac{1}{m^2} - \dots \approx e^{-1} = 0.368 \quad \text{for large } m$$

The probability that one or more such events will occur in m years is therefore $(1 - e^{-1}) = 0.632$; thus the probability that a system is affected by a rare event within its return period is 0.632.
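A minimal check of this classical result (not from the notes; m is an assumed return period):

```python
# Chance of no m-year event in m years approaches e^-1 = 0.368.
m = 100                       # assumed return period in years
p = 1.0 / m                   # annual occurrence probability

p_none = (1 - p) ** m         # no occurrence in m years
print(round(p_none, 3))       # ~0.366, close to e^-1 = 0.368
print(round(1 - p_none, 3))   # at least one occurrence: ~0.634
```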


2.13. Poisson, exponential and gamma distributions

Many physical processes — the occurrence of fatigue cracks, earthquakes occurring at any time in an earthquake-prone region, the occurrence of accidents on a highway — may be modeled using a Bernoulli sequence, dividing the time interval or space into smaller intervals and considering whether the event will or will not occur in each. If the event can occur at any instant, and can occur repeatedly, it may be modeled as a Poisson process.

Assumptions:

1. An event can occur at random at any time or at any point in space.
2. The occurrence of an event in a given time (or space) interval is independent of its occurrence in any other interval.
3. The probability of occurrence of an event in a small interval Δt is proportional to Δt and can be written ν Δt, where ν is the mean rate of occurrence.

The number of occurrences of an event in time t is given by the Poisson distribution, i.e. if $X_t$ is the number of occurrences in time (or space) t, then
$$P(X_t = x) = \frac{(\nu t)^x}{x!} e^{-\nu t}, \quad x = 0, 1, 2, \dots$$
where ν is the mean occurrence rate, and
$$E(X_t) = \nu t = \mathrm{Var}(X_t)$$

Poisson distribution (derivation from the binomial distribution):
$$P_X(x) = \binom{n}{x} P^x (1-P)^{n-x}$$
When the time intervals are made smaller and smaller, the number of trials n increases and the probability P of success in each trial decreases, but the expected number of events is nP. Put nP = γ and hold it fixed as n → ∞ and P → 0.

Substituting in the above equation,
$$P_X(x) = \frac{n!}{x!\,(n-x)!}\left(\frac{\gamma}{n}\right)^x \left(1-\frac{\gamma}{n}\right)^{n-x} = \frac{\gamma^x}{x!}\,\frac{n(n-1)(n-2)\dots(n-x+1)}{n^x}\left(1-\frac{\gamma}{n}\right)^{n-x}$$
For large values of n the middle factor is nearly 1, and
$$\left(1-\frac{\gamma}{n}\right)^n \to e^{-\gamma}$$

Hence
$$P_X(x) = \frac{\gamma^x}{x!}\,e^{-\gamma}, \quad x = 0, 1, 2, \dots$$
This is known as the Poisson distribution.
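The limit can be observed numerically. This minimal sketch (not from the notes) holds nP = γ fixed while n grows, and compares the binomial PMF with the Poisson PMF:

```python
# Binomial -> Poisson convergence with n*P = gamma held fixed.
from scipy.stats import binom, poisson

gamma = 4.0
for n in (10, 100, 1000):
    P = gamma / n             # P shrinks as n grows
    print(n, round(binom.pmf(2, n, P), 4), round(poisson.pmf(2, gamma), 4))
```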

The mean of the Poisson distribution follows directly: putting y = x − 1,
$$E[X] = \sum_{x=0}^{\infty} x\,\frac{\gamma^x}{x!}\,e^{-\gamma} = \gamma\sum_{y=0}^{\infty} \frac{\gamma^{y}}{y!}\,e^{-\gamma} = \gamma$$
and similarly $V[X] = \gamma$. We can also put $\gamma = \lambda t$, where λ is the rate, so that
$$P(X_t = x) = \frac{(\lambda t)^x}{x!}\,e^{-\lambda t}$$
This is called a Poisson process.

For a Poisson process:

1. The probability of an incident in a short interval of time t to t + h is approximately λh, for any t.
2. The probability of two or more events in a short interval of time is negligible.
3. The number of incidents in any interval of time is independent of the number in any non-overlapping interval.

Properties of the Poisson distribution:
$$E[x] = \mu, \qquad V[x] = \mu, \qquad \beta_{(1)} = \frac{1}{\sqrt{\mu}}, \qquad \beta_{(2)} = 3 + \frac{1}{\mu}$$

The Poisson process is useful where an event may occur anywhere in a space and time framework, and is based on the following assumptions:

(1) The occurrence of an event in a given interval is independent of its occurrence in other intervals.
(2) The probability of occurrence of an event in a small interval is proportional to that interval, and the proportionality constant ν describes the mean rate of occurrence.
(3) The mean rate of occurrence ν is assumed constant.
(4) The probability of two or more occurrences in the chosen small time interval Δt is negligible.

If $X_t$ is the number of occurrences in a time (or space) interval t,
$$P(X_t = x) = \frac{(\nu t)^x}{x!}\,e^{-\nu t}, \quad x = 0, 1, 2, \dots$$
It is obvious that
$$E(X_t) = \bar{X}_t = \nu t$$
It can be shown that the variance is the same as the expectation, i.e.
$$S_{X_t}^2 = \nu t$$

The occurrences of an event between intervals are statistically independent as are

the occurrence of an event between trials in the case of the Bernoulli sequence.

An extension of the Poisson process is the important case where the occurrence of an event is

influenced by the occurrence in the previous time interval. Thus the probability of occurrence is

a conditional one and the model used for determining it is called the Markov process or Markov

chain.

If the Poisson process governs the occurrence of an event, then the time $T_1$ to first occurrence has an exponential distribution, also referred to as the negative exponential. For the probability that no event occurs in time t we have
$$P(T_1 > t) = P(X_t = 0) = e^{-\nu t}$$
The PDF is
$$f_{T_1}(t) = \nu e^{-\nu t}, \quad t \ge 0$$
The CDF is
$$F_{T_1}(t) = P(T_1 \le t) = 1 - e^{-\nu t}$$
If ν is independent of t, and hence constant, the mean of $T_1$ is
$$E(T_1) = \bar{T}_1 = \frac{1}{\nu}$$
This is the mean recurrence time or return period, and may be compared with 1/p for the Bernoulli sequence. For small time intervals the two are nearly equal.

If it is desired that the PDF for an exponential distribution should start at a value greater than zero (i.e. not pass through the origin), we may use the shifted exponential distribution.

The time until the kth occurrence is described by the gamma distribution. If $T_k$ denotes the time until the kth event,
$$f_{T_k}(t) = \frac{\nu(\nu t)^{k-1}}{(k-1)!}\,e^{-\nu t}$$
The exponential and gamma distributions are related to the Poisson process in the same way that the geometric and negative binomial distributions are related to the Bernoulli sequence.

Exponential distribution

The exponential distribution is related to the Poisson distribution as follows. If events occur as a Poisson process, then the time $T_1$ until the first occurrence of the event has an exponential distribution. The event $(T_1 > t)$ means that no event occurs in the interval (0, t):
$$P(T_1 > t) = P(X_t = 0) = e^{-\gamma t}$$
This is the first occurrence time in a Poisson process. The distribution function of $T_1$ is
$$F_{T_1}(t) = P(T_1 \le t) = 1 - e^{-\gamma t}$$
$$f_{T_1}(t) = \frac{dF_{T_1}}{dt} = \gamma e^{-\gamma t}, \quad t \ge 0$$
If γ is constant, the mean value of $T_1$ is $\bar{T}_1 = 1/\gamma$; the mean recurrence time is $1/\gamma$.

Gamma distribution

If the occurrence of an event constitutes a Poisson process, then the time until the kth occurrence of the event is described by the gamma distribution. Let $T_k$ denote the time until the kth event; then $(T_k \le t)$ means that k or more events occur in time t. The corresponding density function is
$$f_{T_k}(t) = \frac{\gamma(\gamma t)^{k-1}}{(k-1)!}\,e^{-\gamma t}$$
The mean time until the occurrence of the kth event, and its variance, are
$$E(T_k) = \frac{k}{\gamma}, \qquad \mathrm{Var}(T_k) = \frac{k}{\gamma^2}$$
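A minimal sketch (not from the notes; the rate, k and t are assumed values) of these Poisson-process waiting-time relations using scipy.stats:

```python
# First-occurrence time is exponential; time to the k-th event is gamma.
from scipy.stats import expon, gamma as gamma_dist

rate, k, t = 0.5, 3, 2.0            # assumed occurrence rate, k and time

# P(T1 > t) = e^(-rate*t): survival function of the exponential.
print(expon.sf(t, scale=1.0 / rate))

# E(Tk) = k/rate and Var(Tk) = k/rate^2 for the gamma distribution.
tk = gamma_dist(a=k, scale=1.0 / rate)
print(tk.mean(), tk.var())          # 6.0 and 12.0
```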

2.14. Hypergeometric distribution

In quality control, the use of a distribution for separating acceptable from unacceptable items by sampling is desirable. Let m elements be defective among N elements; then, if a sample of n items is taken randomly, the probability of x defective items in the sample is given by the hypergeometric distribution. This is written as follows:
$$P(X = x) = \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \dots, m$$

The number of samples of size n in the finite population N is $\binom{N}{n}$, and the number of samples with x defective elements is $\binom{m}{x}\binom{N-m}{n-x}$.

Assuming that all samples of size n are equally likely to be chosen, the above equation is obtained. The hypergeometric distribution arises when samples from a finite population consisting of two types of elements (for example, good or bad) are examined. Consider a lot of N items, m of which are defective and (N − m) of which are good. If a sample of n items is taken at random from this lot, the probability of x defective items in the sample is given by the hypergeometric distribution:

$$P(X = x) = \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \dots, m$$
In the lot, the number of samples of size n is $\binom{N}{n}$, and the number of ways in which x defective items can be chosen is $\binom{m}{x}\binom{N-m}{n-x}$.
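A minimal sketch (not from the notes; the lot size, defect count and sample size are assumed) of this PMF via scipy.stats.hypergeom, whose arguments are (population size, number of defectives, number drawn):

```python
# P(x defectives in a random sample drawn without replacement).
from scipy.stats import hypergeom

N_pop, m_def, n_draw = 50, 5, 10    # assumed lot, defectives, sample size
print(hypergeom.pmf(1, N_pop, m_def, n_draw))  # P(exactly 1 defective)
```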

2.15. Joint distribution, covariance and correlation

If X and Y are two random variables, then probabilities associated with any pair of values x and y may be described by a joint distribution function, e.g.
$$F_{X,Y}(x,y) = P(X \le x, Y \le y)$$
For discrete random variables the joint PMF (Figure 9a) may be used:
$$p_{X,Y}(x,y) = P(X = x, Y = y)$$

Figure 9 – (a) Joint PMF of X and Y (b) Joint PDF of X and Y


For continuous random variables the joint PDF (Figure 9b) may be defined by
$$f_{X,Y}(x,y)\,dx\,dy = P(x < X \le x+dx,\; y < Y \le y+dy)$$
The marginal density functions of this joint distribution are
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx$$

The CDF is given by the volume under the surface $f_{X,Y}(x,y)$:
$$F_{X,Y}(x,y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f_{X,Y}(u,v)\,du\,dv$$

Description of the joint distribution of two random variables requires five statistical parameters, namely the mean and standard deviation of each variable and the correlation coefficient between them. This coefficient is denoted by ρ and is the ratio of the covariance, denoted cov(X, Y), to the product of the standard deviations:
$$\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{S_x S_y}$$

The covariance itself is defined as the joint central second moment, that is, the expectation of the product $(X-\bar{x})(Y-\bar{y})$, and hence
$$\mathrm{cov}(X,Y) = E[(X-\bar{x})(Y-\bar{y})] = E(XY) - E(X)E(Y)$$
If X and Y are statistically independent, then E(XY) = E(X)E(Y) and cov(X, Y) = 0.

If two variables are statistically independent then they are uncorrelated; however, the reverse is not true: if the variables are uncorrelated, they may not be statistically independent. The correlation coefficient ρ may vary from −1 to +1 and may be regarded as a normalized covariance. It is a measure of the linear relationship between the two random variables (Figure 10).
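A minimal numerical sketch (not from the notes; the paired samples are invented) showing that the definition of ρ above agrees with numpy's built-in correlation:

```python
# Covariance and correlation coefficient from paired data.
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 11.0])   # assumed sample of X
y = np.array([1.0, 2.0, 4.0, 8.0, 12.0])   # assumed sample of Y

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))  # E[(X-x)(Y-y)]
rho = cov_xy / (x.std() * y.std())                 # normalised covariance
print(round(rho, 3), round(np.corrcoef(x, y)[0, 1], 3))  # should agree
```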

If X and Y (e.g. cohesion and friction) are two random variables, then the probability associated with any pair of values x and y may be described by a joint distribution function, e.g.
$$F_{X,Y}(x,y) = P(X \le x, Y \le y)$$
For discrete random variables the joint PMF is defined as
$$P_{X,Y}(x,y) = P(X = x, Y = y)$$
For continuous random variables the joint PDF is defined as
$$f_{X,Y}(x,y)\,dx\,dy = P(x \le X \le x+dx,\; y \le Y \le y+dy)$$
The marginal density functions of the joint distribution are
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy \ \text{ at any given } x, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx \ \text{ at any given } y$$

The CDF of X and Y is the total volume under the joint PDF. When the variables are dependent on each other, another quantity linking the two comes into the picture, the covariance:
$$\rho_{x,y} = \frac{\mathrm{Cov}(x,y)}{\sigma_x\sigma_y}, \qquad \mathrm{Cov}(x,y) = E[(X-\bar{x})(Y-\bar{y})] = E(XY) - E(X)E(Y)$$
This is similar to the parallel-axis theorem.

Given constraints                                                            Assigned probability distribution
∫ₐᵇ f(x) dx = 1                                                              Uniform
∫ₐᵇ f(x) dx = 1; expected value                                              Exponential
∫ₐᵇ f(x) dx = 1; expected value, SD                                          Normal
∫ₐᵇ f(x) dx = 1; expected value, SD, min, max                                Beta
∫ₐᵇ f(x) dx = 1; mean rate of occurrence between two independent events      Poisson

Pearson's system

In the binomial model it is assumed that all outcomes are equally likely, which means that sampling is done with replacement from a finite population of N:
$$b(x, N, R) = \binom{N}{x} R^x P^{N-x} = \frac{N!}{x!\,(N-x)!}\,R^x P^{N-x}, \qquad P = 1 - R$$

Suppose instead that sampling is done without replacement from a collection of N items, of which k have a particular characteristic and (N − k) do not. If a sample is selected from this collection, it is either from the k or from the (N − k). Suppose r random samples are drawn without replacement from the N items.

( )⎟⎟⎠

⎞⎜⎜⎝

⎟⎟⎠

⎞⎜⎜⎝

⎛−−

⎟⎟⎠

⎞⎜⎜⎝

=

rN

xrkN

xk

krNxh ,,,

This is called hyper geometric distribution

In the limit, the binomial distribution yields the normal distribution; the limit of the hypergeometric distribution must therefore produce a more flexible continuous probability distribution capable of better representing skewed variation. For symmetrical distributions all moments of odd order about the mean are zero, which means that any odd-ordered central moment may be used as a measure of the degree of skewness.

Moments

Consider a system of discrete parallel forces f₁, f₂, …, f_N acting on a rigid beam at the respective distances x₁, x₂, …, x_N. From statics, the resultant is
$$M = \sum_{i=1}^{N} f_i$$

and its point of application $\bar{x}$ is given by
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i f_i}{M}$$

We can consider that f₁, f₂, …, f_N represent the probabilities of all possible occurrences of the N outcomes x₁, x₂, …, x_N. Since the distribution is exhaustive, M = 1 and
$$\bar{x} = E[x] = \sum_{i=1}^{N} x_i f_i$$

where E[x] is the expected value of x; in general, it denotes the central tendency of the distribution. The expected value can be considered the first moment of the distribution, and the concept can be generalized to the kth moment as follows:
$$E[x^k] = \sum_{i=1}^{N} x_i^k f_i$$

We know from statics that the moment of inertia (MI) is
$$I_y = \sum (x_i - \bar{x})^2 f_i$$
and we can use the analogous concept
$$V[x] = \sum (x_i - \bar{x})^2 f_i$$

Expanding,
$$V[x] = E[(x_i-\bar{x})^2] = E[x_i^2 - 2\bar{x}x_i + \bar{x}^2] = E[x_i^2] - 2\bar{x}E[x_i] + \bar{x}^2 = E[x_i^2] - (E[x_i])^2$$

1. If x is a random variable and a and b are constants, $E[ax+b] = aE[x] + b$.
2. If $x = x_1 + x_2 + x_3 + \dots + x_N$, then $E[x] = E[x_1] + E[x_2] + E[x_3] + \dots + E[x_N]$.
3. If $f_1(x)$ and $f_2(y)$ are functions of two random variables, $E[f_1(x) + f_2(y)] = E[f_1(x)] + E[f_2(y)]$.
4. The variance has the dimensions of the square of the random variable, so the standard deviation is $\sigma[x_i] = \sqrt{V[x_i]}$.
5. If $x_i$ is a random variable and a and b are constants, $V[ax_i + b] = a^2 V[x_i] = (a\,\sigma[x_i])^2$.

For a symmetrical distribution all moments of odd order about the mean must be zero; consequently an odd-ordered moment may be used as a measure of the degree of skewness. The third central moment $E[(x_i-\bar{x})^3]$ of a probability distribution can be taken to represent the skewness, but its units are the cube of the units of the variable. To provide a dimensionless measure of skewness, Pearson defined the coefficient of skewness
$$\beta_{(1)} = \frac{E[(x_i-\bar{x})^3]}{(\sigma[x_i])^3}$$

If β(1) is positive, the long tail of the distribution lies to the right of the mean; if negative, to the left.

Pearson also proposed a dimensionless coefficient of kurtosis,
$$\beta_{(2)} = \frac{E[(x_i-\bar{x})^4]}{(\sigma[x_i])^4}$$
This is a measure of peakedness. A distribution is said to be flat if β(2) < 3.

The standard normal density function is
$$f_S(s) = \frac{1}{\sqrt{2\pi}}\,e^{-s^2/2}, \quad -\infty < s < \infty$$
with
$$\Phi(S) = P, \qquad S = \Phi^{-1}(P)$$

$$\Phi(-S) = 1 - \Phi(S)$$

Reduction of data to standard normal variate form

Suppose we have a normal variate X with distribution N(µ, σ). Then
$$P(a \le X \le b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] dx$$
Substituting $s = (x-\mu)/\sigma$, so that $dx = \sigma\,ds$,
$$P(a \le X \le b) = \frac{1}{\sqrt{2\pi}} \int_{(a-\mu)/\sigma}^{(b-\mu)/\sigma} e^{-s^2/2}\,ds$$

The values of s corresponding to probabilities P < 0.5 may be obtained as
$$s = \Phi^{-1}(P) = -\Phi^{-1}(1-P)$$

Standard normal function: the density function is given by
$$f_S(s) = \frac{1}{\sqrt{2\pi}}\,e^{-s^2/2}, \quad -\infty < s < \infty$$
Because of its wide usage, a special notation Φ(s) is commonly used to designate the distribution function of the standard normal variate:


$$\Phi(s) = P, \quad \text{where } s = \frac{x-\mu}{\sigma}$$

The tables give the probability for only positive values of the variate. However, the probability for negative values of the variate can be obtained as
$$\Phi(-s) = 1 - \Phi(s), \qquad \Phi^{-1}(P) = -\Phi^{-1}(1-P)$$

2.16. Moments of functions of random variables

2.16.1. Sum of variates x1,x2 etc

Consider a function Y which depends on two random variables X₁ and X₂ (a₁ and a₂ being constants):
$$Y = a_1 X_1 + a_2 X_2$$
Then it can be shown that
$$\bar{y} = E(Y) = a_1\bar{x}_1 + a_2\bar{x}_2$$
and
$$V(Y) = a_1^2 V(x_1) + a_2^2 V(x_2) + 2a_1 a_2\,\mathrm{cov}(x_1, x_2)$$
Now if
$$Y = a_1 X_1 - a_2 X_2, \qquad \bar{y} = a_1\bar{x}_1 - a_2\bar{x}_2$$
then
$$V(Y) = a_1^2 V(x_1) + a_2^2 V(x_2) - 2a_1 a_2\,\mathrm{cov}(x_1, x_2)$$


Figure 10 – Coefficient of correlation ρ

In general, if
$$Y = \sum_{i=1}^{n} a_i X_i, \qquad \bar{y} = \sum_{i=1}^{n} a_i \bar{x}_i$$
then
$$V(Y) = \sum_{i=1}^{n} a_i^2 V(x_i) + \mathop{\sum\sum}_{i \ne j} a_i a_j\,\mathrm{cov}(x_i, x_j)$$

Suppose Z is another linear function of the random variables $X_i$, i.e.
$$Z = \sum_{i=1}^{n} b_i X_i$$
Then
$$\mathrm{cov}(Y, Z) = \sum_{i=1}^{n} a_i b_i V(x_i) + \mathop{\sum\sum}_{i \ne j} a_i b_j\,\mathrm{cov}(x_i, x_j)$$

Product of independent variates x₁, x₂, x₃, etc.

If
$$Z = X_1 X_2 X_3 \dots X_n$$
then
$$\bar{z} = E(Z) = \bar{x}_1\bar{x}_2\bar{x}_3 \dots \bar{x}_n$$
$$S_z^2 = E(X_1^2)\,E(X_2^2)\dots E(X_n^2) - (\bar{x}_1\bar{x}_2\dots\bar{x}_n)^2$$

First order approximation for general functions

Let Y=g(X)

Then, by expanding g(X) in a Taylor series about the mean value $\bar{x}$, the following first-order approximations can be made:
$$\bar{y} = E(Y) \approx g(\bar{x}) + \frac{1}{2}V(x)\frac{d^2g}{dx^2}, \qquad V(Y) \approx V(x)\left(\frac{dg}{dx}\right)^2$$
(the derivatives being evaluated at $\bar{x}$). A good approximation of the exact moments is obtained if g(X) is approximately linear over the entire range of values of X (even if the second term in the expression for the mean is neglected, as is generally done).

Now if Y is a function of several variables X₁, X₂, X₃, etc.,
$$Y = g(X_1, X_2, X_3, \dots, X_n)$$
the corresponding first-order approximations are
$$\bar{y} = E(Y) \approx g(\bar{x}_1, \bar{x}_2, \bar{x}_3, \dots, \bar{x}_n) + \frac{1}{2}\mathop{\sum\sum}\limits_{i,j}\frac{\partial^2 g}{\partial x_i\,\partial x_j}\,\mathrm{cov}(x_i, x_j)$$
$$V(Y) \approx \sum_{i=1}^{n} c_i^2 V(x_i) + \mathop{\sum\sum}_{i \ne j} c_i c_j\,\mathrm{cov}(x_i, x_j)$$
where $c_i$ and $c_j$ are the partial derivatives $(\partial g/\partial x_i)$ and $(\partial g/\partial x_j)$ evaluated at $\bar{x}_1, \bar{x}_2, \bar{x}_3, \dots, \bar{x}_n$. The second term of the first equation is generally omitted. The second term of the second equation is not omitted, but it vanishes if $x_1, x_2, \dots, x_n$ are uncorrelated or statistically independent.
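A minimal sketch (not from the notes; the function g and the input moments are invented for illustration) of this first-order approximation for two uncorrelated variables, checked against Monte Carlo sampling:

```python
# First-order (FOSM-style) mean and variance of Y = g(X1, X2).
import numpy as np

def g(x1, x2):
    return x1 * x2 ** 2            # an arbitrary example function

m1, s1 = 10.0, 1.0                 # assumed mean and SD of X1
m2, s2 = 5.0, 0.5                  # assumed mean and SD of X2

# Partial derivatives at the means: dg/dx1 = x2^2, dg/dx2 = 2*x1*x2.
c1, c2 = m2 ** 2, 2 * m1 * m2
mean_y = g(m1, m2)                         # first-order mean: 250
var_y = (c1 * s1) ** 2 + (c2 * s2) ** 2    # covariance terms vanish

rng = np.random.default_rng(1)
ys = g(rng.normal(m1, s1, 200_000), rng.normal(m2, s2, 200_000))
print(mean_y, round(ys.mean(), 1))  # MC mean is slightly higher: the
print(var_y, round(ys.var()))       # neglected second-order term
```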

Example on PDF and CDF

The undrained shear strength c_u of a stratum of clay has a uniform probability distribution, the minimum and maximum values of the uniform distribution being 25 kN/m² and 50 kN/m². What is the probability that the undrained shear strength has a magnitude (a) less than 40 kN/m², (b) less than 30 kN/m², (c) less than 10 kN/m², and (d) greater than 55 kN/m²?

Figure 1 – Uniform probability density function with associated CDF

The area under the probability density function must be unity. In this case the base of the rectangle is (50 − 25) = 25. Therefore the height of the rectangle (i.e. the uniform probability density $p_{c_u}$) is given by equating the area to 1:
$$p_{c_u} \times 25 = 1, \qquad \therefore\; p_{c_u} = \frac{1}{25}$$

We use this value as follows:

(a) $p_1 = P(c_u \le 40)$

This probability is the area of the rectangle between the ordinates at c_u = 25 (the minimum value) and c_u = 40:
$$\therefore\; p_1 = (40 - 25) \times \frac{1}{25} = 0.6$$

(b) $p_2 = P(c_u \le 30) = (30 - 25) \times \dfrac{1}{25} = 0.2$

(c) 10 kN/m² is below the range 25–50; accordingly, $p_3 = P(c_u \le 10) = 0$

(d) 55 kN/m² is above the range 25–50; accordingly, $p_4 = P(c_u > 55) = 0$

Example on normal distribution

Example 1

From records, the total annual rainfall in a catchment area is estimated to be normal with a mean of 60 inches and a standard deviation of 15 inches.

a. What is the probability that in future years the annual rainfall will be between 40 and 70 inches?
b. What is the probability that the annual rainfall will be at least 30 inches?
c. What is the 10-percentile annual rainfall?

a.
$$P(40 \le X \le 70) = \Phi\left(\frac{70-60}{15}\right) - \Phi\left(\frac{40-60}{15}\right) = \Phi(0.67) - \Phi(-1.33)$$
$$= 0.748571 - [1 - 0.908241] = 0.6568 \quad \text{(from tables)}$$

b.
$$P(X \ge 30) = \Phi(\infty) - \Phi\left(\frac{30-60}{15}\right) = \Phi(\infty) - \Phi(-2.0) = 1.0 - [1 - \Phi(2.0)] = 0.9772$$

c.
$$P(X \le x_{0.1}) = 0.10 \;\Rightarrow\; \Phi\left(\frac{x_{0.1}-60}{15}\right) = 0.10$$
$$\frac{x_{0.1}-60}{15} = \Phi^{-1}(0.10) = -\Phi^{-1}(0.90) = -1.28$$
$$x_{0.1} = 60 - 1.28 \times 15 = 60 - 19.2 = 40.8 \text{ inches}$$
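The three answers can be checked without tables (a sketch, not part of the original solution, assuming scipy is available):

```python
# Verify Example 1 with scipy.stats.norm instead of tables.
from scipy.stats import norm

rain = norm(loc=60, scale=15)
print(rain.cdf(70) - rain.cdf(40))   # (a) ~0.657
print(rain.sf(30))                   # (b) ~0.977
print(rain.ppf(0.10))                # (c) ~40.8 inches
```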

Example 2

A structure rests on three supports A, B and C. Even though the loads from the roof supports can be estimated accurately, the soil conditions under A, B and C are not completely predictable. Assume that the settlements ρ_A, ρ_B and ρ_C are independent normal variates with means of 2, 2.5 and 3 cm and CoVs of 20%, 20% and 25% respectively.

a. What is the probability that the maximum settlement will exceed 4 cm?
b. If it is known that A and B have settled 2.5 cm and 3.5 cm respectively, what is the probability that the maximum differential settlement will not exceed 0.8 cm, and what is it if the limit is 1.5 cm?

Answer

a. The standard deviations are SD = mean × CoV:
$$\sigma_A = 2 \times 0.2 = 0.4, \qquad \sigma_B = 2.5 \times 0.2 = 0.5, \qquad \sigma_C = 3 \times 0.25 = 0.75$$
Since the settlements are independent,
$$P(\rho_{\max} > 4) = 1 - P(\rho_{\max} \le 4) = 1 - P(\rho_A \le 4)\,P(\rho_B \le 4)\,P(\rho_C \le 4)$$
$$= 1 - \Phi\left(\frac{4-2}{0.4}\right)\Phi\left(\frac{4-2.5}{0.5}\right)\Phi\left(\frac{4-3}{0.75}\right) = 1 - \Phi(5)\,\Phi(3)\,\Phi(1.333)$$
$$= 1 - 1.0 \times 0.9986 \times 0.9088 = 0.0925$$

Similarly, the probability that the maximum settlement exceeds 3 cm is
$$P(\rho_{\max} > 3) = 1 - P(\rho_A \le 3)\,P(\rho_B \le 3)\,P(\rho_C \le 3) = 1 - \Phi\left(\frac{3-2}{0.4}\right)\Phi\left(\frac{3-2.5}{0.5}\right)\Phi\left(\frac{3-3}{0.75}\right)$$
$$= 1 - 0.994 \times 0.84 \times 0.5 = 0.582$$

b. The differential settlement between A and B, Δ_AB = 3.5 cm − 2.5 cm = 1 cm, has already occurred and exceeds 0.8 cm. Hence $P(\Delta_{\max} \le 0.8\text{ cm}) = 0$, irrespective of ρ_C. However, ρ_C matters for $P(\Delta_{\max} \le 1.5\text{ cm})$. (If data with 95%, 99% or 99.9% probability of occurrence are needed, the following intervals apply:
$$P(\mu - 1.96\sigma < X \le \mu + 1.96\sigma) = 95\%$$
$$P(\mu - 2.58\sigma < X \le \mu + 2.58\sigma) = 99\%$$
$$P(\mu - 3.29\sigma < X \le \mu + 3.29\sigma) = 99.9\%\,.)$$
For Δ_max to exceed 1.5 cm with ρ_A = 2.5 cm, ρ_C would have to be less than 1 cm or more than 4 cm; with ρ_B = 3.5 cm, ρ_C would have to be less than 2 cm or more than 5 cm. The acceptable region is therefore ρ_C between 2 and 4 cm.

$$P(\Delta_{\max} \le 1.5\text{ cm}) = P(2 \le \rho_C \le 4) = \Phi\left(\frac{4-3}{0.75}\right) - \Phi\left(\frac{2-3}{0.75}\right)$$
$$= \Phi(1.333) - \Phi(-1.333) = 0.9088 - 0.0912 = 0.8176$$

Example 3

The total load on a footing is the sum of the dead load of the structure and the live load. Since each load is the sum of various components, the total dead load (X) and total live load (Y) can be considered normally distributed. Data from buildings suggest that µ_x = 100 ton, σ_x = 10 ton and µ_y = 40 ton, σ_y = 10 ton, and that X and Y are uncorrelated. What is the design load that has only a 5% probability of being exceeded?

The mean total load is $\bar{z} = \bar{x} + \bar{y} = 100 + 40 = 140$ ton, and
$$\sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2} = \sqrt{10^2 + 10^2} = 14.1 \text{ ton}$$

( )( )

tonz

z

z

zz

z

3.163

65.11.14

140

95.01.14

140

95.0

95.005.01

=

=−

=⎟⎠⎞

⎜⎝⎛ −

=⎟⎠⎞

⎜⎝⎛ −

==−

φ

σµφ

φφ

Example 4

The mean and coefficient of variation of the angle of internal friction φ of a soil supporting a multi-storeyed structure are $\bar{\varphi} = 20°$ and $V_\varphi = 30\%$. What is the probability that φ will be less than (a) 16°, (b) 10°, (c) 5°? Assume that φ has a normal distribution.

Solution

(a)
$$S_\varphi = V_\varphi\,\bar{\varphi} = 0.3 \times 20 = 6°$$
$$s = \frac{16-20}{6} = -\frac{4}{6} = -0.666$$
$$\Phi(-s) = 1 - \Phi(s), \ \text{where } \Phi(s) \text{ is obtained from tabulated values}$$
$$\therefore\; P(\varphi \le 16°) = 1 - \Phi(0.666) = 0.253$$

(b)
$$s = \frac{10-20}{6} = -1.666$$
$$\therefore\; \Phi(-1.666) = 1 - \Phi(1.666) = 1 - 0.952 = 0.048, \qquad P(\varphi \le 10°) = 0.048$$

(c)
$$s = \frac{5-20}{6} = -2.5$$
$$\therefore\; \Phi(-2.5) = 1 - \Phi(2.5) = 1 - 0.994 = 0.006 = 6 \times 10^{-3}$$

It should be noted that φ denotes the friction angle, while Φ(s) is the cumulative distribution of the standard normal variate.

Example on cumulative distribution

Example 1

In the case of the previous car problem,

the sum 0.006 + 0.04 + 0.12 + 0.21 + 0.25 + 0.20 = 0.826 becomes the probability that 5 or fewer cars take a left turn.

If x is a random variable and r is a real number, then the CDF, designated F(r), is the probability that x will take a value equal to or less than r:
$$F(r) = P[x \le r]$$

For the binomial distribution,
$$F(r) = P[x \le r] = \sum_{\text{all } x_i \le r} b(x_i, N, R)$$

Though this distribution is quite simple, the model as such is quite useful in many engineering problems. For example, in a series of soil borings, boulders may or may not be present. Though the distribution is for discrete variables, it can also be applied to continuous variables in space and time with discretisation.

Example on lognormal distribution

Example 1

The rainfall has a lognormal distribution with mean and SD of 60 inches and 15 inches.

a. Calculate the probability that in future the annual rainfall is between 40 and 70 inches.
b. Calculate the probability that the annual rainfall is at least 30 inches.
c. What is the 10-percentile annual rainfall?

The lognormal parameters are
$$\xi = \frac{15}{60} = 0.25, \qquad \lambda = \ln 60 - \frac{1}{2}(0.25)^2 = 4.09 - 0.03 = 4.06$$

a. The probability that the annual rainfall is between 40 and 70 inches is
$$P(40 < x \le 70) = \Phi\left(\frac{\ln 70 - 4.06}{0.25}\right) - \Phi\left(\frac{\ln 40 - 4.06}{0.25}\right) = \Phi(0.75) - \Phi(-1.48)$$
$$= 0.773 - 0.069 = 0.704$$

b. The probability that the annual rainfall is at least 30 inches is
$$P(x \ge 30) = 1 - \Phi\left(\frac{\ln 30 - 4.06}{0.25}\right) = 1 - \Phi(-2.64) = 1 - 0.0041 = 0.9959$$

c. For the 10-percentile rainfall,
$$\Phi\left(\frac{\ln x_{0.1} - 4.06}{0.25}\right) = 0.10 \;\Rightarrow\; \frac{\ln x_{0.1} - 4.06}{0.25} = -1.28$$
$$\ln x_{0.1} = 4.06 - 1.28 \times 0.25 = 3.74 \;\Rightarrow\; x_{0.1} = e^{3.74} = 42.10 \text{ inches}$$

Example 2

A live load with a mean of 20 ton acts on a structure, and the loading is assumed to follow a lognormal distribution. Estimate the probability that a load of 40 ton will be exceeded. Assume a CoV for the live load of 25%.

We have
$$\sigma_{\ln x}^2 = \ln(1 + \mathrm{CoV}_x^2) = \ln(1 + 0.25^2) \approx (0.25)^2$$
$$E[\ln x] = \ln 20 - \frac{(0.25)^2}{2} = 2.96$$

As x = ln L, the value of the normal variate corresponding to L = 40 is ln 40 = 3.69. Hence
$$s = \frac{3.69 - 2.96}{0.25} = 2.92$$
$$P[L \ge 40] = 0.5 - \Phi_0(2.92) = 0.5 - 0.498 = 0.002$$
(where $\Phi_0(s)$ is the tabulated area between 0 and s).

Example on beta distribution

The friction angle φ of the soil samples in a locality varies between 20° and 40°, with a mean value of 30° and a standard deviation of 3°. The design value is 35°. What is the probability that φ ≥ 35°?

From the mean of the beta distribution,
$$\mu_X = a + \frac{q}{q+r}(b-a): \quad 30 = 20 + \frac{q}{q+r}\times 20 \;\Rightarrow\; 10(q+r) = 20q \;\Rightarrow\; q = r$$

From the variance, with q = r (so that $qr/(q+r)^2 = 1/4$),
$$\sigma_X^2 = \frac{qr}{(q+r)^2(q+r+1)}(b-a)^2 = \frac{(b-a)^2}{4(q+r+1)}$$
$$\frac{1}{q+r+1} = \frac{4\sigma_X^2}{(b-a)^2} = \frac{36}{400} = 0.09 \;\Rightarrow\; q+r+1 = 11.1$$
$$\therefore\; q = r = 5.05$$

The mode is
$$\tilde{x} = a + \frac{q-1}{q+r-2}(b-a) = 20 + \frac{5.05-1}{10.1-2}\times 20 = 20 + \frac{4.05}{8.1}\times 20 = 30$$

The coefficient of skewness of the beta distribution is
$$\gamma_1 = \frac{2(r-q)\sqrt{q+r+1}}{(q+r+2)\sqrt{qr}}$$
If q < r, the skewness is positive and the peak of the distribution lies to the left;

if q > r, the skewness is negative and the peak lies to the right. (The figure shows the CDF between a = 20 and b = 40.)

Repeating the calculation for a mean of 26° with CoV = 20% (so that $\sigma_x = 0.2 \times 26 = 5.2$ and $\sigma_x^2 = 27.04$), again with a = 20 and b = 40: from the mean,
$$26 = 20 + \frac{q}{q+r}\times 20 \;\Rightarrow\; \frac{q}{q+r} = 0.3$$
and from the variance,
$$\frac{qr}{(q+r)^2(q+r+1)}(b-a)^2 = \frac{(0.3)(0.7)\times 400}{q+r+1} = \frac{84}{q+r+1} = 27.04 \;\Rightarrow\; q+r+1 \approx 3.1$$
giving q + r ≈ 2.1, i.e. q ≈ 0.63 and r ≈ 1.47; since q < r, this distribution is positively skewed.

(Figure: standard beta PDFs for q = 1.0, r = 4.0; q = r = 3.0; q = 4, r = 2; and q = r = 1.0.)

Example on binomial distribution

Example 1

Over a period of time it is observed that 40% of the automobiles traveling along a road take a left turn at a given intersection. What is the probability that, in a traffic stream of 10 automobiles, 2 will take a left turn?

N = 10; x = 2; R = 0.40. From the binomial PMF, b(2, 10, 0.4) = 0.12; the full set of probabilities is tabulated below.

Number of left turns x    Probability
0                         0.006
1                         0.04
2                         0.12
3                         0.21
4                         0.25
5                         0.20
6                         0.11
7                         0.04
8                         0.01
9                         0.002
10                        0.0001

(Bar chart: probability versus number of left turns x.)

Example 2

A dam has a projected life of 50 years. What is the probability that a 100-year flood will occur during the lifetime of the dam?

R = 1/100 = 0.01, N = 50 years:
$$b(1, 50, 0.01) = 50\,(0.01)^1(0.99)^{49} = 0.31$$

N      b(1, N, 0.01)                 Probability
10     10 × (0.01)¹ × (0.99)⁹        0.09
20     20 × (0.01)¹ × (0.99)¹⁹       0.165
30     30 × (0.01)¹ × (0.99)²⁹       0.224
40     40 × (0.01)¹ × (0.99)³⁹       0.270
50     50 × (0.01)¹ × (0.99)⁴⁹       0.305
60     60 × (0.01)¹ × (0.99)⁵⁹       0.332
70     70 × (0.01)¹ × (0.99)⁶⁹       0.350
80     80 × (0.01)¹ × (0.99)⁷⁹       0.362
90     90 × (0.01)¹ × (0.99)⁸⁹       0.368
100    100 × (0.01)¹ × (0.99)⁹⁹      0.370

Example 3

In a flood control system for a river, the yearly maximum flood of the river is of concern. The probability of the annual maximum flood exceeding some specified design level n₀ is 0.1 in any year. What is the probability that the level n₀ will be exceeded exactly once in five years?

Assuming a binomial distribution means that in each year the event either occurs once or does not occur, each occurrence or non-occurrence is independent of the other events, and the probability of occurrence in each trial is constant. Hence

$$b(1, 5, 0.1) = \frac{5!}{1!\,4!}(0.1)^1(0.9)^4 = 5 \times (0.1)(0.9)^4 = 0.328$$

The probability of at least one exceedance of level n₀ in five years is $1 - P(x = 0) = 1 - (0.9)^5 = 1 - 0.590 = 0.41$.

Example on binomial distribution

Example 1

Find the mean and variance of the binomial distribution b(x_i, N, R) = b(x_i, 5, 0.1).
$$E[x] = NR = 5 \times 0.1 = 0.5$$
$$V[x] = E[(x_i - \bar{x})^2] = E[x_i^2 - 2\bar{x}x_i + \bar{x}^2] = E[x_i^2] - 2\bar{x}E[x_i] + \bar{x}^2 = E[x_i^2] - (E[x_i])^2$$

Variance has the dimension of the square of the random variable; a more meaningful measure of dispersion is the positive square root of the variance, called the standard deviation:

σ[x_i] = √V[x_i]

The equivalent concept to the standard deviation in statics is the radius of gyration. Another useful relative measure of scatter is the coefficient of variation:

CoV(x) = (σ[x]/E[x]) × 100%

The coefficient of variation expresses dispersion relative to the mean. For example, a mean value of 10 and an SD of 1 indicate a CoV of 10%.

< 10%: low
15–30%: moderate
> 30%: high

For a symmetrical distribution, all central moments of odd order are zero. The third central moment E[(x_i − x̄)³] represents the degree of skewness. The fourth moment provides a measure of peakedness (kurtosis) of the distribution. To make these non-dimensional, they are divided by the standard deviation raised to the third or fourth power, respectively.

Coefficient of skewness: β(1) = E[(x_i − x̄)³] / σ³

If β(1) is positive, the long tail is on the right side of the mean; if β(1) is negative, the long tail is on the left side.


Coefficient of kurtosis (peakedness):

β(2) = E[(x_i − x̄)⁴] / σ⁴

A distribution is said to be flat if β(2) < 3.
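These sample moments are straightforward to compute (a minimal sketch; the data values are illustrative):

    # Mean, SD, CoV, skewness and kurtosis of a small sample.
    import statistics as st

    data = [8.2, 9.5, 10.1, 10.4, 11.8, 9.9, 10.6]
    mean, sd = st.mean(data), st.pstdev(data)
    cov = sd / mean * 100
    skew = sum((x - mean)**3 for x in data) / len(data) / sd**3
    kurt = sum((x - mean)**4 for x in data) / len(data) / sd**4
    print(round(cov, 1), round(skew, 2), round(kurt, 2))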

Example on geometric distribution

Example 1

A structure is designed for a height of 8 m above mean sea level. This height corresponds to a 10% probability of being exceeded by sea waves in a year. What is the probability that the structure will be subjected to waves exceeding 8 m within the return period of the design wave?

T = 1/P = 1/0.1 = 10 years

P[H > 8 m within 10 years] = 1 − (1 − 0.1)^10 = 1 − (0.9)^10 = 1 − 0.3487 = 0.6513

The number of trials (t) until the specified event occurs for the first time is given by the geometric distribution.

(Figure: structure founded 8 m above mean sea level.)


P(T = t) = [1!/(0! 1!)] p^1 q^(t−1) = p q^(t−1)

Example 2

A transmission tower is designed for a 50-year period, i.e. for a wind velocity having a return period of 50 years. What is the probability that the design wind velocity will be exceeded for the first time in the 5th year after the completion of the structure?

Every year is considered as a trial, and the probability of the 50-year wind occurring in any year is p = 1/50 = 0.02.

P(T = 5) = p q^(t−1) = 0.02 × (0.98)^4 = 0.0184

What is the probability that the first such wind velocity will occur within 5 years after the completion of the structure?

P(T ≤ 5) = Σ_{t=1}^{5} 0.02 (0.98)^(t−1) = 0.02 + 0.0196 + 0.0192 + 0.0188 + 0.0184 = 0.096
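Both geometric-distribution results can be verified directly (a minimal sketch):

    # First occurrence of the 50-year wind, p = 0.02 per year.
    p = 1 / 50
    fifth_year = p * (1 - p)**4                                # 0.0184
    within_5 = sum(p * (1 - p)**(t - 1) for t in range(1, 6))  # 0.096
    print(round(fifth_year, 4), round(within_5, 3))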

Example on Poisson’s distribution

Example 1

Records of rainstorms in a town indicate that on average there have been four rainstorms per year over the last several years. If the occurrence is assumed to follow a Poisson process, what is the probability that there is no rainstorm next year?


P(X = 0) = e^−4 · 4⁰/0! = e^−4 = 0.018

P(X = 4) = e^−4 · 4⁴/4! = 0.195

Although the average yearly occurrence of rainstorms is 4, the probability of observing exactly 4 storms in a year is only 0.195.

The probability of 2 or more rainstorms in a year is

P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 1 − e^−4 − 4 e^−4 = 1 − 0.018 − 0.073 = 0.908
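A minimal sketch of these Poisson computations:

    # Poisson probabilities for a mean rate of 4 storms per year.
    from math import exp, factorial

    lam = 4.0
    pois = lambda k: exp(-lam) * lam**k / factorial(k)
    print(round(pois(0), 3))                  # 0.018
    print(round(pois(4), 3))                  # 0.195
    print(round(1 - pois(0) - pois(1), 3))    # 0.908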

Example 2

The probability that a structure fails is P(f) = 0.001. Of 1000 such structures built, what is the probability that exactly two will fail?

b(x; N, P(f)) = [N!/(x!(N − x)!)] P(f)^x (1 − P(f))^(N−x)

Stirling's formula: N! ≈ √(2πN) · N^N · e^−N, i.e. ln N! ≈ ½ ln(2πN) + N ln N − N

b(2; 1000, 0.001) = [1000!/(2! 998!)] (0.001)^2 (0.999)^998 = [(1000 × 999)/2] (0.001)^2 (0.999)^998 = 0.184

To use the Poisson approximation we need the expected value of the binomial distribution: λ = n·p = 1000 × 0.001 = 1


f(2) = e^−1 · 1²/2! = 0.184

f(0) = 0.37; f(1) = 0.37; f(2) = 0.18; f(3) = 0.06; f(4) = 0.02
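The agreement between the exact binomial and the Poisson approximation can be seen numerically (a minimal sketch):

    # Binomial vs. Poisson approximation, N = 1000, p = 0.001 (lambda = 1).
    from math import comb, exp, factorial

    N, p, lam = 1000, 0.001, 1.0
    for k in range(5):
        b = comb(N, k) * p**k * (1 - p)**(N - k)
        f = exp(-lam) * lam**k / factorial(k)
        print(k, round(b, 3), round(f, 3))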

Example 3

During World War II the Germans dropped 547 bombs on London, and an analysis was conducted to determine whether the bombs were guided. It was reasoned that if the bombs lacked guidance, the hits should follow a Poisson distribution. To check this, London was divided into 580 regions of approximately equal area and the number of hits in each region was recorded.

No. of hits x_i | Observed no. of areas | Poisson f(x_i) with µ = 0.943 | Theoretical no. of areas
0 | 229 | 0.389 | 226
1 | 211 | 0.367 | 213
2 | 93 | 0.173 | 100
3 | 39 | 0.054 | 32
4 | 7 | 0.013 | 7
5 or more | 1 | 0.004 | 2
Total | 580 | 1.00 | 580

The expected value per area = 547/580 = 0.943.

From the close agreement between the observed and theoretical numbers of areas, one can say that the bombing followed a Poisson distribution, i.e. the bombs were not guided.
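The theoretical column can be reproduced as follows (a minimal sketch):

    # Poisson fit to the London bombing data, mu = 547/580.
    from math import exp, factorial

    mu, areas = 547 / 580, 580
    probs = [exp(-mu) * mu**k / factorial(k) for k in range(5)]
    probs.append(1 - sum(probs))              # 5 or more hits
    for k, p in enumerate(probs):
        print(k, round(p, 3), round(p * areas))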

Example on exponential distribution

Example 1

Historical records of earthquakes in San Francisco, California show that during the period 1836–1961 there were 16 earthquakes of intensity VI or more. If the occurrence of such


high-intensity earthquakes in the region is assumed to follow a Poisson process, what is the probability that such an earthquake will occur within the next 2 years?

γ = 16/125 = 0.128 quakes per year

P(T ≤ 2) = 1 − e^(−0.128 × 2) = 1 − 0.774 = 0.226

The probability that no earthquake of this high intensity will occur in the next 10 years is

P(X = 0) = e^(−0.128 × 10) = e^−1.28 = 0.278

E[T] = 1/γ = 1/0.128 = 7.8 years

Hence an earthquake of at least intensity VI can be expected on average once every 7.8 years.

Hence, the general model in the area is

P(T ≤ t) = 1 − e^(−0.128 t)

P(T ≤ 7.8) = 1 − e^(−0.128 × 7.8) = 1 − e^−1 = 0.632
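A minimal sketch of the exponential model:

    # Exponential inter-arrival times for intensity-VI earthquakes.
    from math import exp

    rate = 16 / 125                        # 0.128 per year
    print(round(1 - exp(-rate * 2), 3))    # within 2 years: 0.226
    print(round(exp(-rate * 10), 3))       # none in 10 years: 0.278
    print(round(1 - exp(-1), 3))           # within one mean return period: 0.632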

Example on hypergeometric distribution

Example 1

A box contains 25 strain gauges, of which 4 are known to be defective. If 6 gauges are drawn for the experiment, what is the probability that exactly one defective gauge is among them?

N = 25, m = 4, n = 6

The probability that one gauge is defective:


P(X = 1) = [C(4,1) · C(21,5)] / C(25,6) = 0.46, where C(n, k) = n!/(k!(n − k)!)

The probability that no gauge is defective:

P(X = 0) = C(21,6) / C(25,6) = 0.31

Example 2

An inspector on a highway project finds that two substandard tests per 10 samples are a good measure of a contractor's ability to perform. Find the probability that, among five samples selected at random:

a. there is one substandard test
b. there are two such results

We have

P(X = x) = [C(m, x) · C(N − m, n − x)] / C(N, n)

a. We have N = 10, m = 2, x = 1, n = 5

P(X = 1) = [C(2,1) · C(8,4)] / C(10,5) = (2 × 70) / 252 = 0.55

b. We have N = 50, m = 5, n = 10, x = 1


P(X = 1) = [C(5,1) · C(45,9)] / C(50,10) = [5!/(1! 4!)] [45!/(9! 36!)] / [50!/(10! 40!)] = 0.43
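The hypergeometric probabilities can be evaluated directly from binomial coefficients (a minimal sketch):

    # Hypergeometric pmf: m marked items in a population of N, n drawn.
    from math import comb

    def hypergeom(N, m, n, x):
        return comb(m, x) * comb(N - m, n - x) / comb(N, n)

    print(round(hypergeom(25, 4, 6, 1), 2))    # strain gauges: 0.46
    print(round(hypergeom(10, 2, 5, 1), 3))    # part a: 0.556
    print(round(hypergeom(50, 5, 10, 1), 2))   # part b: 0.43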

Example 3

Soil samples were collected at 50 locations in an area chosen for a foundation, and their shear strengths were determined. Of the 50 locations, 10 are unsuitable from shear-strength considerations. To improve the shear strength, in-situ grouting is proposed at 10 locations. What is the probability that

a. one of the grouted locations was initially unsuitable
b. two of the grouted locations were initially unsuitable

Answer

N = 50, m = 10, n = 10, x = 1 and 2

f(x) = [C(m, x) · C(N − m, n − x)] / C(N, n)

a. The probability that one location was initially unsuitable:


f(1) = [C(10,1) · C(40,9)] / C(50,10) = 0.267

b. The probability that two locations were initially unsuitable:

f(2) = [C(10,2) · C(40,8)] / C(50,10) = 0.34

If the situation is such that sampling is effectively with replacement, the binomial distribution can be chosen for the purpose:

f(x) = C(n, x) (M/N)^x (1 − M/N)^(n−x)

f(1) = C(10,1) (10/50)^1 (1 − 10/50)^9 = 0.268

f(2) = C(10,2) (10/50)^2 (1 − 10/50)^8 = 0.30

Example 4

Records collected by a contractor over a 40-year period indicate that there have been 240 days of inclement weather, because of which operations were closed down. On the basis of the past record, what is the probability that no days will be lost next year?

Answer

Mean value of occurrence λ = 240/40 = 6 per year


f(x) = λ^x e^−λ / x!

f(0) = 6⁰ e^−6 / 0! = e^−6 = 2.48 × 10⁻³

No. of days Probability of occurrence (λ = 6)

0 0.0025

1 0.0149

2 0.0446

3 0.0892

4 0.1339

5 0.1606

6 0.1606

7 0.1377

8 0.1033

9 0.0688

10 0.0413

11 0.0225

12 0.0113

The life periods of materials and radioactive decay are normally characterized in terms of the exponential distribution:

f_T(t) = λ e^(−λt), 0 ≤ t < ∞

The corresponding distribution function is

F_T(t) = ∫₀^t λ e^(−λτ) dτ


= λ [−e^(−λτ)/λ]₀^t = 1 − e^(−λt)

Mean or expected value:

E[x] = ∫ x f(x) dx, Var(x) = ∫ (x − µ)² f(x) dx

hence

E[T] = ∫₀^∞ t λ e^(−λt) dt = 1/λ

Example 5

A storm event occurring at a point in space is characterized by two variables, the duration of the storm X and its intensity Y. Both variables are exponentially distributed:

F_X(x) = 1 − e^(−ax), x ≥ 0, a > 0
F_Y(y) = 1 − e^(−by), y ≥ 0, b > 0

Assuming that the joint distribution of X and Y is also exponential,

F_{X,Y}(x, y) = (1 − e^(−ax))(1 − e^(−by)) = 1 − e^(−ax) − e^(−by) + e^(−(ax+by))

and the joint density is

f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y)/∂x∂y = ab e^(−(ax+by))


The marginal density of X is obtained by integrating out y:

f_X(x) = ∫₀^∞ ab e^(−(ax+by)) dy = ab e^(−ax) [−e^(−by)/b]₀^∞ = a e^(−ax)

and similarly f_Y(y) = b e^(−by).

The joint PDF for the concentrations of two pollutants (x, y), in ppm, is

f(x, y) = 2 − x − y, 0 ≤ x, y ≤ 1

Show that:

a. f(x, y) is a probability distribution
b. Determine the CDF
c. Find the joint probability that x ≤ 1/2 and y ≤ 3/4
d. Find the marginal distributions of x and y
e. Are they independent?
f. If the concentration of y is 0.5 ppm, determine the probability that x ≤ 0.25

Answer

a.

If f(x, y) is a probability density, the volume under it has to be 1:


∫₀¹ ∫₀¹ (2 − x − y) dy dx = ∫₀¹ [2y − xy − y²/2]₀¹ dx = ∫₀¹ (2 − x − 1/2) dx = 2 − 1/2 − 1/2 = 1

b.

F_{X,Y}(x, y) = ∫₀^x ∫₀^y (2 − u − v) dv du = 2xy − x²y/2 − xy²/2

c.

F(1/2, 3/4) = 2(1/2)(3/4) − (1/2)²(3/4)/2 − (1/2)(3/4)²/2 = 0.52

d.

f_X(x) = ∫₀¹ (2 − x − y) dy = [2y − xy − y²/2]₀¹ = 1.5 − x

f_Y(y) = ∫₀¹ (2 − x − y) dx = [2x − x²/2 − xy]₀¹ = 1.5 − y


e.

For independence, P(A ∩ B) = P(A) P(B), i.e. f_X(x) · f_Y(y) should equal f(x, y):

f_X(x) · f_Y(y) = (1.5 − x)(1.5 − y) = 2.25 − 1.5x − 1.5y + xy ≠ 2 − x − y

so x and y are not independent. (By contrast, for the storm example above, f_X(x) · f_Y(y) = a e^(−ax) · b e^(−by) = ab e^(−(ax+by)) = f_{X,Y}(x, y), so X and Y there are independent.)
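Parts (a) and (c) can be checked by simple numerical integration (a minimal sketch):

    # Numerical check for f(x, y) = 2 - x - y on the unit square.
    n = 400
    d = 1.0 / n
    f = lambda x, y: 2 - x - y
    total = sum(f((i + .5) * d, (j + .5) * d) * d * d
                for i in range(n) for j in range(n))
    part_c = sum(f((i + .5) * d, (j + .5) * d) * d * d
                 for i in range(n // 2) for j in range(3 * n // 4))
    print(round(total, 3), round(part_c, 3))   # about 1.0 and 0.52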


Learning objectives

1. Understanding the concept of random field theory.

2. Understanding the significance of Auto covariance function.

3. Understanding the various functions of random fields.


MODULE - 3

Motivation and Highlights

Motivation:

Importance of understanding the random field theory

Highlights:

Knowledge of Stationary process, Auto covariance function


3 SPATIAL VARIABILITY USING RANDOM FIELDS

3. 1 Need for spatial variability characterization in design

Many quantities in civil engineering, such as properties of materials, concentrations of pollutants, and loads, have spatial variations. Variations are expressed in terms of mean or average

values and the coefficients of variation defined in terms of the ratio of standard deviation and

mean value expressed as percentage. In addition, the distance over which the variations are

well correlated also plays a significant role.

A successful design depends largely on how best the designer selects the basic parameters of

the loading/site under consideration from in-situ and/or laboratory test results. Probabilistic

methods in civil engineering have received considerable attention in the recent years and the

incorporation of soil variability in civil/geotechnical designs has become important.

Considerable work was carried out in the area of geotechnical engineering. Guidelines such

as those of JCSS (2000) have also been developed in this context. Dasaka (2005) presented a

comprehensive compilation on spatial variability of soils. In the following sections, spatial variability of soils is addressed, and the concepts are applicable to any other property

variations as well. Soil has high variability compared to manufactured materials like steel or

cement, where variability in material properties is less, as they are produced under high

quality control.

3.2 Characterization of variability of design parameters

It is generally agreed that the variability associated with geotechnical properties should be

divided into three main sources, viz., inherent variability, measurement uncertainty, and

transformation uncertainty (Baecher and Christian 2003; Ang and Tang 1984).

3.2.1 Inherent variability


The inherent variability of a soil parameter is attributed to the natural geological processes,

which are responsible for depositional behaviour and stress history of soil under

consideration. The fluctuations of soil property about the mean can be modelled using a zero-

mean stationary random field (Vanmarcke 1977). A detailed list of the fluctuations in terms

of coefficients of variation for some of the laboratory and in-situ soil parameters, along with

the respective scales of fluctuation in horizontal and vertical directions are presented in

Baecher and Christian (2003).

3.2.2 Measurement uncertainty

Measurement uncertainty is described in terms of accuracy and is affected by bias (systematic

error) and precision (random error). It arises mainly from three sources, viz., equipment

errors, procedural-operator errors, and random testing effects, and can be evaluated from data

provided by the manufacturer, operator responsible for laboratory tests and/or scaled tests.

Notwithstanding the recommendations from regulatory authorities regarding the quality of produced data, the measuring equipment and other devices responsible for the measurement of in-situ or laboratory soil properties often show variations in their geometry, however small these may be. There may be many limitations in the formulation of guidelines for testing, and the

understanding and implementation of these guidelines vary from operator to operator and

contribute to procedural-operator errors in the measurement. The third factor, which

contributes to the measurement uncertainty, random testing error, refers to the remaining

scatter in the test results that is not assignable to specific testing parameters and is not caused

by inherent soil variability.

3.2.3 Transformation uncertainty


Computation models, especially in the geotechnical field contain considerable uncertainties

due to various reasons, e.g. simplification of the equilibrium or deformation analysis,

ignoring 3-D effects etc. Expected mean values and standard deviations of these factors may

be assessed on the basis of empirical or experimental data, on comparison with more

advanced computation models. Many design parameters used in geotechnical engineering are

obtained from in-situ and laboratory test results. To account for this uncertainty, the model or

transformation uncertainty parameter is used.

3.2.4 Evaluation of design parameter uncertainty

The total uncertainty of the design parameter from the above three sources of uncertainty is

combined in a consistent and logical manner using a simple second-moment probabilistic

method. The design parameter may be represented as

ξ_d = T(ξ_m, ε) (1)

where ξ_m is the measured property of the soil parameter obtained from either a laboratory or in-situ test. The measured property can be represented as the algebraic sum of a non-

stationary trend, t, stationary fluctuating component, w, and measurement uncertainty, e. ε is

the transformation uncertainty, which arises due to the uncertainty in transforming the in-situ

or laboratory measured soil property to the design parameter using a transformation equation

of the form shown in Equation 1. Hence, the design property can be represented by Equation

2.

ξ_d = T(t + w + e, ε) (2)

Phoon and Kulhawy (1999b) expressed the above equation in terms of a Taylor series. Linearizing the Taylor series by truncating the higher-order terms at the mean values of the soil


parameters leads to Equation 3 for the soil design property; subsequently the mean and variance of the design property are expressed as given in Equations 4 and 5.

ξ_d ≈ T(t, 0) + (∂T/∂w)|_(t,0) w + (∂T/∂e)|_(t,0) e + (∂T/∂ε)|_(t,0) ε (3)

m_ξd ≈ T(t, 0) (4)

SD²_ξd = (∂T/∂w)² SD²_w + (∂T/∂e)² SD²_e + (∂T/∂ε)² SD²_ε (5)

The resulting variance of the design parameter after incorporating the spatial average is given by

SD²_ξa = (∂T/∂w)² Γ²(L) SD²_w + (∂T/∂e)² SD²_e + (∂T/∂ε)² SD²_ε (6)
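A minimal sketch of this second-moment combination of the three uncertainty sources (the derivative and SD values below are illustrative placeholders, not values from any cited study):

    # First-order (Eq. 5) combination of inherent, measurement and
    # transformation uncertainty for a design parameter.
    dT_dw, dT_de, dT_deps = 1.0, 1.0, 0.8     # partial derivatives of T
    sd_w, sd_e, sd_eps = 0.15, 0.05, 0.10     # SDs of w, e and epsilon
    var_design = (dT_dw * sd_w)**2 + (dT_de * sd_e)**2 + (dT_deps * sd_eps)**2
    print(var_design ** 0.5)                  # SD of the design parameter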

Of the above, the treatment and evaluation of inherent soil variability assumes considerable

importance as the uncertainties from measurements and transformation process can be

handled if proper testing methods are adopted and transformation errors are quantified.

Approaches for evaluation of inherent soil variability are developed based on random fields

and a brief description of the theory and its relevance to characterisation of soil spatial

variability is described in the following sections.

3.4 Random field Theory

Soil properties exhibit an inherent spatial variation, i.e., its value changes from point to point.

Vanmarcke (1977a; 1983) provided a major contribution to the study of spatial variability of

geotechnical materials using random field theory. In order to describe a soil property

stochastically, Vanmarcke (1977a) stated that three parameters are needed to be described: (i)

the mean (ii) the standard deviation (or the variance, or the coefficient of variation); and (iii)

the scale of fluctuation. He introduced the new parameter, scale of fluctuation, which


accounts for the distance within which the soil property shows relatively strong correlation

from point-to-point.

Figure 3.1(a) shows a typical spatially variable soil profile showing the trend, fluctuating

component, and vertical scale of fluctuation. Small values of scale of fluctuation imply rapid

fluctuations about the mean, whereas large values suggest a slowly varying property, with

respect to the average.


Figure 3.1(a). Definition of various statistical parameters of a soil property (Phoon and Kulhawy 1999a); (b) approximate definition of the scale of fluctuation (Vanmarcke 1977a)

Vanmarcke (1977a) demonstrated a simple procedure to evaluate an approximate value of the

vertical scale of fluctuation, as shown in Figure 3.1(b), which shows that the scale of

fluctuation is related to the average distance between intersections, or crossings, of the soil

property and the mean.

A random field is a conceivable model to characterize continuous spatial fluctuations of a soil

property within a soil unit. In this concept, the actual value of a soil property at each location

within the unit is assumed to be a realization of a random variable. Usually, parameters of the

random field model have to be determined from only one realization. Therefore the random


field model should satisfy certain ergodicity conditions, at least locally. If a time average does not give a complete representation of the full ensemble, the system is non-ergodic. The random field is

fully described by the autocovariance function, which can be estimated by fitting empirical

autocovariance data using a simple one-parameter theoretical model. This function is

commonly normalized by the variance to form the autocorrelation function. Conventionally,

the trend function is approximately removed by least square regression analysis. The

remaining fluctuating component, x(z), is then assumed to be a zero-mean stationary random

field. When the spacing between two sample points exceeds the scale of fluctuation, it can be

assumed that little correlation exists between the fluctuations in the measurements. Fenton

(1999a & b) observed that the scale of fluctuation often appears to increase with sampling

domain.

3.4.1 Statistical homogeneity

Statistical homogeneity in a strict sense means that the entire joint probability density

function (joint pdf) of soil property values at an arbitrary number of locations within the soil

unit is invariable under an arbitrary common translation of the locations. A more relaxed

criterion is that expected mean value and variance of the soil property is constant throughout

the soil unit and that the covariance of the soil property values at two locations is a function

of the separation distance. Random fields satisfying only the relaxed criteria are called

stationary in a weak sense.

Statistical homogeneity (or stationarity) of a data set is an important prerequisite for

statistical treatment of geotechnical data and subsequent analysis and design of foundations.

In physical sense, stationarity arises in soils, which are formed with similar material type and

under similar geological processes. Improper qualification of a soil profile in terms of the

statistical homogeneity leads to biased estimate of variance of the mean observation in the


soil data. The entire soil profile within the zone of influence is divided into a number of

statistically homogeneous or stationary sections, and the data within each layer has to be

analysed separately for further statistical analysis. Hence, the partition of the soil profile into

stationary sections plays a crucial role in the evaluation of soil statistical parameters such as

variance.

3.4.2 Tests for statistical homogeneity

The methods available for statistical homogeneity are broadly categorised as parametric tests

and non-parametric tests. The parametric tests require assumptions about the underlying

population distribution. These tests give a precise picture about the stationarity (Phoon et al.

2003a).

In geostatistical literature, many classical tests for verification of stationarity have been

developed, such as Kendall’s τ test, Statistical run test (Phoon et al. 2003a). Invariably, all

these classical tests are based on the important assumption that the data are independent.

When these tests are used to verify the spatially correlated data, a large amount of bias

appears in the evaluation of statistical parameters, and misleads the results of the analysis. To

overcome this deficiency, Kulathilake and Ghosh (1988), Kulathilake and Um (2003), and

Phoon et al. (2003a) proposed advanced methods to evaluate the statistical homogeneous

layers in a given soil profile. The method proposed by Kulathilake and Ghosh (1988),

Kulathilake and Um (2003) is semi-empirical window based method, and the method

proposed by Phoon et al. (2003a) is an extension of the Bartlett test.

3.4.2.1 Kendall’s τ test

The Kendall τ̂ statistic is frequently used to test whether a data set follows a trend. Kendall's τ̂ is based on the ranks of observations. The test statistic, which is also the measure of association in the sample, is given by


τ̂ = S / [n(n − 1)/2] (7)

where n is the number of (X, Y) observations. To obtain S, and consequently τ̂, the following procedure is followed.

1. Arrange the observations (Xi, Yi) in a column according to the magnitude of the X’s,

with the smallest X first, the second smallest second, and so on. Then the X’s are said

to be in natural order.

2. Compare each Y value, one at a time, with each Y value appearing below it. In

making these comparisons, it is said that a pair of Y values (a Y being compared and

the Y below it) is in natural order if the Y below is larger than the Y above.

Conversely, a pair or Y values is in reverse natural order if the Y below is smaller

than the Y above.

3. Let P be the number of pairs in natural order and Q the number of pairs in reverse

natural order.

4. S is equal to the difference between P and Q: S = P − Q.

A total of C(n, 2) = n(n − 1)/2 possible comparisons of Y values can be made in this manner. If all the Y pairs are in natural order, then P = n(n − 1)/2, Q = 0,

S = n(n − 1)/2 − 0 = n(n − 1)/2, and hence τ̂ = [n(n − 1)/2] / [n(n − 1)/2] = 1,

indicating perfect direct correlation between the observations of X and Y. On the other hand, if all the Y pairs are in reverse natural order, we have P = 0, Q = n(n − 1)/2,

S = 0 − n(n − 1)/2 = −n(n − 1)/2, and τ̂ = −[n(n − 1)/2] / [n(n − 1)/2] = −1,

indicating a perfect inverse correlation between the X and Y observations.


Hence τ̂ cannot be greater than +1 or smaller than −1; thus, τ̂ can be taken as a relative measure of the extent of agreement or disagreement between the observed orders of the Y values. The strength of the correlation is indicated by the magnitude of the absolute value of τ̂.
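The pair-counting procedure above translates directly into code (a minimal sketch):

    # Kendall's tau-hat from the P/Q counting procedure (Eq. 7).
    def kendall_tau(xs, ys):
        pairs = sorted(zip(xs, ys))                # step 1: natural order of X
        P = Q = 0
        for i in range(len(pairs)):
            for j in range(i + 1, len(pairs)):     # step 2: compare each Y below
                if pairs[j][1] > pairs[i][1]:
                    P += 1                         # natural order
                elif pairs[j][1] < pairs[i][1]:
                    Q += 1                         # reverse natural order
        n = len(pairs)
        return (P - Q) / (n * (n - 1) / 2)

    print(kendall_tau([1, 2, 3, 4], [2, 4, 5, 9]))   # 1.0: perfect direct correlation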

3.4.2.2 Statistical run test

In this procedure, a run is defined as a sequence of identical observations that is followed and

preceded by a different observation or no observation at all. The number of runs that occur in

a sequence of observations gives an indication as to whether or not results are independent

random observations of the same random variable. In this the hypothesis of statistical

homogeneity, i.e., trend-free data, is tested at any desired level of significance, α, by

comparing the observed number of runs to the interval between r_{n; 1−α/2} and r_{n; α/2}. Here, n = N/2, N being the total number of data points within a soil record. If the observed number of runs falls outside the interval, the hypothesis would be rejected at the α level of significance. Otherwise, the hypothesis would be accepted.

For testing a soil record with run test, the soil record is first divided into number of sections,

and variance of the data in each section is computed separately. The computed variance in

each section is compared with the median of the variances in all sections, and the number of

runs (r) is obtained. The record is said to be stationary or statistically homogeneous at

significance level of α, if the condition given below is satisfied.

r_{n; 1−α/2} < r ≤ r_{n; α/2} (8)

3.4.2.3 Bartlett’s approach

The classical Bartlett test is one of the important tests, which examines the equality of two or

multiple variances of independent data sets. The following steps are involved in the Bartlett’s

test.


The sampling window is divided into two equal segments and the sample variance (s₁² or s₂²) is calculated from the data within each segment separately. For the case of two sample variances, s₁² and s₂², the Bartlett test statistic is calculated as

B_stat = [2.30259 (m − 1)/C] [2 log s² − (log s₁² + log s₂²)] (9)

where m = number of data points used to evaluate s₁² or s₂². The total variance, s², is defined as

s² = (s₁² + s₂²)/2 (10)

The constant C is given by

C = 1 + 1/[2(m − 1)] (11)

While choosing the segment length, it should be remembered that m≥10 (Lacasse and Nadim

1996). In this technique, the Bartlett statistic profile for the whole data within the zone of

influence is generated by moving sampling window over the soil profile under consideration.

In the continuous Bartlett statistic profile, the sections between the significant peaks are

treated as statistically homogeneous or stationary layers, and each layer is treated separately

for further analysis.
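A minimal sketch of the two-segment Bartlett statistic, using Equations 9–11 as reconstructed above and illustrative data:

    # Bartlett statistic for two contiguous segments of equal length m.
    from math import log10
    import statistics as st

    def bartlett_stat(seg1, seg2):
        m = len(seg1)                        # points per segment, m >= 10
        s1, s2 = st.variance(seg1), st.variance(seg2)
        s_tot = (s1 + s2) / 2                # Eq. (10)
        C = 1 + 1 / (2 * (m - 1))            # Eq. (11)
        return 2.30259 * (m - 1) / C * (2 * log10(s_tot) - log10(s1) - log10(s2))

    print(bartlett_stat([1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0, 0.9, 1.1],
                        [2.0, 2.5, 1.5, 2.2, 1.8, 2.6, 1.4, 2.1, 2.3, 1.7]))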

3.4.2.4 Modified Bartlett technique

Phoon et al. (2003a, 2004) developed the Modified Bartlett technique to test the condition of

null hypothesis of stationarity of variance for correlated profiles suggested by conventional

statistical tests such as Bartlett test, Kendall’s test etc, and to decide whether to accept or

reject the null hypothesis of stationarity for the correlated case. The modified Bartlett test

statistic can also be used advantageously to identify the potentially stationary layers within a

soil profile. This procedure was formulated using a set of numerically simulated correlated


soil profiles covering all the possible ranges of autocorrelation functions applicable to soil. In

this procedure, the test statistic to reject the null hypothesis of stationarity is taken as the peak

value of Bartlett statistic profile. The critical value of modified Bartlett statistic is chosen at

5% significance level, which is calculated from simulated soil profiles using multiple

regression approach, following five different autocorrelation functions, viz., single

exponential, double exponential, triangular, cosine exponential, and second-order Markov.

The data within each layer between the peaks in the Bartlett statistic profile are checked for

existence of trend. A particular trend is decided comparing the correlation length obtained by

fitting a theoretical function to sample autocorrelation data. If the correlation lengths of two

trends of consecutive order are identical, it is not required to go for higher order detrending

process. However, it is suggested that no more than quadratic trend is generally required to be

removed to transform a non-stationary data set to stationary data set (Jaksa et al. 1999).

The following dimensionless factors are obtained from the data within each layer.

Number of data points in one scale of fluctuation: k = δ/Δz (12)

Normalized sampling length: I₁ = T/δ = nΔz/(kΔz) = n/k (13)

Normalized segment length: I₂ = W/δ = mΔz/(kΔz) = m/k (14)

where δ is the evaluated scale of fluctuation and ‘n’ is the total number of data points in a soil record of length T. The Bartlett statistic profile is computed from the sample variances computed in two

contiguous windows. Hence, the total soil record length, T, should be greater than 2W. To

ensure that m≥10, the normalized segment length should be chosen as I2=1 for k≥10 and I2=2

for 5≤k<10 (Phoon et al. 2003a).


Equations 15 and 16 show the typical results obtained from regression analysis for I2 equals

to 1 and 2 respectively for the single exponential simulated profiles. Similar formulations

have also been developed for other commonly encountered autocorrelation functions and

reported in Phoon et al. (2003a).

Bcrit = (0.23k + 0.71) ln(I₁) + 0.91k + 0.23 for I₂ = 1 (15)

Bcrit = (0.36k + 0.66) ln(I₁) + 1.31k − 1.77 for I₂ = 2 (16)

A comparison is made between the peaks of the Bartlett statistic within each layer with Bcrit

obtained from the respective layer. If Bmax<Bcrit, the layer can be treated as statistically

homogeneous and hence, accept the null hypothesis of stationarity. Otherwise, if Bmax>Bcrit,

reject the null hypothesis of stationarity, and treat the sections on either side of the peaks in

the Bartlett statistic profile as stationary and repeat the above steps and evaluate whether

these sections satisfy the null hypothesis of stationarity. However, while dividing the sections

on either side of the peaks in the Bartlett statistic profile, it should be checked for m≥10,

where ‘m’ is the number of data points in a segment.

3.4.2.5 Dual-window based method

Kulathilake and Ghosh (1988) and Kulathilake and Um (2003) proposed a simple window

based method to verify the statistical homogeneity of the soil profile using cone tip resistance

data. In this method, a continuous profile of ‘BC’ distance is generated by moving two

contiguous sub-windows throughout the cone tip resistance profile. The distance ‘BC’, whose

units are same as qc, is the difference of the means at the interface between two contiguous

windows. In this method it is verified whether the mean of the soil property is constant with

depth, which is a prerequisite to satisfy the weak stationarity. At first, the elevation of the

window is taken at a level that coincides with the level of first data point in the qc profile.

After evaluating the BC distance, the whole window is moved down by one shift at a time. The


computed distance ‘BC’ is noted each time at the elevation coinciding the centre of the

window (i.e., the intersection of two contiguous sub-windows). This length of sub-window is

selected based on the premise that at least 10 data points are available within the sub-window.

The data within the two sub-windows is treated separately, and checked for linear trend in the

data of 10 points. The reason behind verifying the data with only linear trend is that within

0.2 m profile, higher-order trends are rarely encountered. In addition, in normally

consolidated soils, the overburden stress follows a linear trend with depth. Kulathilake and

Um (2003) suggested that the demarcation between existence and non-existence of a linear

trend in the data be assumed at a determination coefficient (R2) of 0.9. It means that if the R2

value of theoretical linear fit is greater than 0.9, then the data set is said to be having a

linearly trend in it, if not the mean value is said to be constant throughout the sub-window.

Hence, within a window length (i.e., two contiguous windows) there exist four sets of

possibility of trend in the mean values. They are

1. Constant trend in both the contiguous sub-windows

2. Constant trend in upper sub-window and a linear trend in the lower sub-window

3. Linear trend in the upper sub-window and constant trend in the lower sub-window,

and

4. Linear trend in both the contiguous sub-windows.

The above four possible sets of trends within the contiguous windows are shown in Figure 3.2. As the distance ‘BC’ increases, the heterogeneity of the qc at the intersection between

two sub-sections increases.


3.4.3 Trend removal

Once the statistically homogeneous layers are identified within a soil profile, the individual

statistical layers are checked for the existence of trend, and the same is removed before

evaluating the variance and autocorrelation characteristics of the data. In general, all soil

properties exhibit a trend with depth. The deterministic trend in the vertical soil profile may

be attributed to overburden stress, confining pressure and stress history of soil under study.

Generally, a smooth curve can be fitted using the Ordinary Least Square (OLS) method,

except in special cases such as varved clays, where periodic trends are clearly visible (Phoon

et al. 2003a). In most of the studies, the trend line is simply estimated by regression analysis

using either linear or polynomial curve fittings.

Other methods have also been applied, such as normalization with respect to some important

physical variables, differencing technique, which is routinely used by statisticians for

transforming a non-stationary time series to a stationary one. The normalization method of

trend removal with respect to a physical quantity accounts for systematic physical effects on

the soil profiles. In general, the detrending process is not unique. Different trend removal


procedures will in most cases result in different values of the random fluctuating components and different shapes of the autocorrelation function.

(Figure 3.2. Evaluation of ‘BC’ distance in various possible combinations, distinguished by whether R² for a linear fit exceeds 0.9 in the upper and lower sub-windows.)

Baecher (1987) commented that the selection of a particular trend function is a decision on

how much of the spatial variability in the measurements is treated as a deterministic function

of space (i.e., trend) and how much is treated statistically and modelled as random processes.

However, the detrending process cannot be entirely arbitrary. After all, the fluctuating

components remaining in the detrended soil records must be stationary for meaningful

statistical analyses to be undertaken on limited data points. Clearly, the chosen trend function

should be reasonable in view of this stationary constraint. The scale of fluctuation or

autocorrelation distance evaluated from the non-stationary data is always higher than the

corresponding stationary data. In other words, the trend removal invariably reduces the scale

of fluctuation of the soil properties. One of the simplest methods to evaluate whether a linear

or 2nd order polynomial trend is sufficient to be removed from the experimental data is to

calculate the scale of fluctuation for the above both detrended data. If the evaluated scales of

fluctuation are closer to each other, a detrending process using the lesser degree polynomial

is chosen. In the limit, the scale of fluctuation is zero when the entire profile is treated as a

‘‘trend’’ with zero ‘‘random’’ variation (Phoon et al. 2003a).

If a trend is evident in the measurements, it should be decided whether or not it should be

removed before statistical analysis of a set of raw data. An observed trend that has no

physical or geological basis or is not predictable must not be removed prior to statistical

analysis, since it is a part of the uncertainty to be characterized (Fenton 1999b). After

selecting a proper trend function for the data, the residuals off the trend are calculated. Phoon

et al. (2004) pointed out that trend removal is a complex problem, and there is at present no

fully satisfactory solution to it. The identified trend in the data is removed by employing any

of the following three widely used detrending methods.


3.4.3.1 Decomposition technique

In this method the data set is divided into stationary random field and nonstationary trend, by

using the results obtained from either a non-parametric test or a parametric test discussed in

the last section. Initially a linear trend is selected and removed from the original data. The

linearly detrended data is tested for the weak stationarity. If the residuals off the linear trend

do not satisfy the stationarity hypothesis, the above procedure is repeated by choosing a

higher order polynomial. However, it is suggested that no more than quadratic trend is

normally sufficient to transform a non-stationary data set to stationary data set (Jaksa et al.

1999), and keep them fairly stationary, as complete removal of the trend in the data is rarely

achieved.

3.4.3.2 Normalization technique

Normalisation of the data set with respect to a dominant parameter, such as cone tip resistance, qc, or effective overburden pressure, σ′vo, is also used in geotechnical engineering to make the data trend-free.

3.4.3.3 Differencing technique

In this method, a nonstationary data set is made stationary by using first, second or higher

order differencing technique. This method of testing a time series is suggested by Bowerman

and O'Connell (1983), which is suitable for data containing no seasonal variations. According

to Bowerman and O'Connell (1983) if the sample autocorrelation function for experimental

data dies down fairly quickly, the original data set can be treated as stationary. However, if

the sample autocorrelation function dies down extremely slow, then the original data set can

be transformed to a stationary set by taking first or second difference of original data set.

However, the term “fairly quickly” is rather subjective and extensive judgment is involved in

it. Moreover, it is observed that if no seasonal variations exist in the data, no more than


second difference is rarely needed to transform a nonstationary data to stationary data (Jaksa

et al. 1999).

3.4.4 Estimation of autocorrelation

Available methods for estimating the sample autocorrelation functions differ in their

statistical properties such as the degree of bias, sampling variability, ease of use,

computational requirements, etc. The methods that are commonly used for this purpose are the method of moments, Bartlett’s approach, methods based on the maximum likelihood principle, geostatistics, etc. However, the method of moments is the one most commonly used to estimate the sample correlation function of soil properties.

3.4.4.1 Method of moments

A classical way of describing random functions is through the autocorrelation function,

ρ(Δz). It is the coefficient of correlation between values of a random function at separation

of k. The spatial correlation of a soil property can be modelled as the sum of a trend

component and a residual term (Vanmarcke 1977a), as shown in Equation 2.17.

x=z+e 17)

where x is the measurement at a given location, z is the trend component, and e is the residual

(deviation about the trend). The residuals off the trend tend to exhibit spatial correlation. The

degree of spatial correlation among the residuals can be expressed through an auto-

covariance function.

c(k) = E{[P(Z_i) − t(Z_i)] [P(Z_j) − t(Z_j)]} (18)

where k is the vector of separation distance between point i and j, E[.] is the expectation

operator, P(Zi) is the data taken at location i, and t(Zi) is the value of the trend at location i.


The normalized form of the autocovariance function given in Equation 19 is known as the

autocorrelation function.

ρ(k)= c[k]/c[0] (19)

where c[0] is the autocovariance function at zero separation distance, which is nothing but the variance of the data.

It is not possible to evaluate ‘ck’ or ‘ρk’ with any certainty, but only to estimate them from samples obtained from a population. As a result, the sample autocovariance at lag k, c*k, and the sample autocorrelation at lag k, rk, are generally evaluated. The sample autocorrelation

function (ACF) is the graph of rk for lags k=0,1,2, …h, where ‘h’ is the maximum number of

lags allowable. Generally, ‘h’ is taken as a quarter of total number of data points in time

series analysis of geotechnical data (Box and Jenkins 1970; Lumb 1975a). Beyond this

number, the number of pairs contributing to the autocorrelation function diminishes and

produces unreliable results. The sample ACF at lag k, rk, is generally evaluated using

r_k = { [1/(N − k)] Σ_{i=1}^{N−k} (X_i − X̄)(X_{i+k} − X̄) } / { (1/N) Σ_{i=1}^{N} (X_i − X̄)² } (20)

If no measurement error or noise is present, r becomes equal to 1 at a lag distance of zero.

Statistically homogeneous data are used to evaluate the sample autocorrelation functions.
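A minimal sketch of Equation 20 (the data values are illustrative):

    # Sample autocorrelation function r_k of Eq. (20).
    def sample_acf(xs, max_lag):
        n = len(xs)
        mean = sum(xs) / n
        var = sum((x - mean)**2 for x in xs) / n
        acf = []
        for k in range(max_lag + 1):
            cov = sum((xs[i] - mean) * (xs[i + k] - mean)
                      for i in range(n - k)) / (n - k)
            acf.append(cov / var)
        return acf

    data = [1.2, 1.5, 1.1, 0.9, 1.3, 1.6, 1.4, 1.0, 0.8, 1.2, 1.5, 1.3]
    print(sample_acf(data, len(data) // 4))   # lags up to N/4, as recommended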

The autocorrelation characteristics of soil properties can be characterized either by

autocorrelation distance, or scale of fluctuation, which is theoretically equal to the area under

the correlation function. The scale of fluctuation (or correlation radius) for one dimensional

real field is defined as shown in Equation 21 (Vanmarcke 1977a).

δ = 2 ∫₀^∞ ρ(τ) dτ (21)


More generally, the scale of fluctuation δ is defined as the radius of an equivalent “unit step”

correlation function, i.e., ρ(τ)=1 for τ≤δ and =0 for τ>δ , τ being the Euclidian lag (JCSS

2000). The autocorrelation distance (or scale of fluctuation) is evaluated from the sample

autocorrelation function using method of fitting or based on Bartlett limits, which are

described in the following sections.

3.4.4.1.1 Method of fitting

Analytical expressions are fitted to the sample autocorrelation functions using regression

analysis based on least square error approach. The least square error is generally

characterised by the determination coefficient of the fit. Frequently used single-parameter

theoretical auto-correlation functions are exponential, squared exponential, though models

such as triangular, second order auto-regressive, spherical, etc. are also not uncommon to fit

the sample autocorrelation data in geotechnical engineering. Some of these models are given

in Table 3.1.

Table 3.1. Theoretical autocorrelation functions used to determine the autocorrelation distance and scale of fluctuation, δ (Jaksa et al. 1999)

1. Triangular: ρ(Δz) = 1 − |Δz|/a for |Δz| ≤ a, ρ(Δz) = 0 for |Δz| ≥ a; autocorrelation distance a; δ = a
2. Single exponential: ρ(Δz) = exp(−Δz/b); autocorrelation distance b; δ = 2b
3. Double exponential: ρ(Δz) = exp[−(Δz/c)²]; autocorrelation distance c; δ = √π c
4. Second-order Markov: ρ(Δz) = exp(−Δz/d)(1 + Δz/d); autocorrelation distance d; δ = 4d
5. Cosine exponential: ρ(Δz) = exp(−Δz/e) cos(Δz/e); autocorrelation distance e; δ = e


Table 3.1 shows the autocorrelation distance and corresponding scale of fluctuation for the theoretical autocorrelation functions. A small scale of fluctuation (δ) implies rapid fluctuations about the mean and a large reduction in variance over any failure plane; this results in a small “spread” of the performance function. Conversely, a large δ means much longer variations about the mean and results in a smaller reduction in variance over a failure plane (Mostyn and Soo 1992).

3.4.4.1. 2 Bartlett limits

In the field of time series analysis, the most commonly used method of computing the autocorrelation distance is Bartlett’s approximation. In this method the computed scale of fluctuation corresponds to two standard errors of the estimate, i.e., the lag distance at which the positive Bartlett limit given by Equation 22, superimposed on the autocorrelation plot, crosses the autocorrelation function (Jaksa et al. 1999).

r_h = ±1.96/√N (22)

The scale of fluctuation of cone tip resistance varies from site to site. Moreover, it also varies

with type of soil, as Jaksa et al. (2004) reports smaller scales of fluctuation in sands than

clays due to their nature of formation. Further, Fenton and Vanmarcke (1998) argue that the

scale of fluctuation depends largely on the geological processes of transport of raw materials,

layer deposition, and common weathering rather than on the actual property studied.

Nonetheless, DeGroot and Baecher (1993) observed that the scale of fluctuation is also a function of the sampling interval of the in-situ measured property.

3.4.5 Effect of anisotropy in correlation scales

Most soils in nature are usually anisotropic due to their mode of sedimentation and

consolidation that cause preferred particle orientations. There are generally two types of


anisotropy. Inherent or initial anisotropy manifests itself in soil deposits as a result of applied stresses at the time of formation, in the form of structure on a macroscopic

scale or as a fabric orientation on the microscopic scale. Stress or induced anisotropy arises

from changes in the effective stress state produced by subsequent loading history. This

anisotropy can cause the elastic, strength and compressibility parameters of the soil deposits

to vary with direction, and hence cannot be ignored.

The soil properties exhibit large variations and their directional behaviour is observed by

many researchers (Vanmarcke 1983; Jaksa et al. 1999; Phoon and Kulhawy 1999a; Griffiths

and Fenton 2000; Nobahar and Popescu 2002; Fenton and Griffiths 2003; Jaksa et al. 2004;

Sivakumar Babu and Mukesh 2004; and Uzielli et al. 2005; Wei et al. 2005). The

autocorrelation distances in vertical and horizontal directions are never the same, but in

general, differ by an order of magnitude, with horizontal scale of fluctuation being higher

than that in the vertical (Uzielli et al. 2005). Attempts have been made in the literature to

formulate autocorrelation models for 1, 2, and 3-dimensional soil space (Vanmarcke 1977a;

and Kulathilake and Miller 1987). The effect of anisotropy of soil properties on the bearing

capacity in a probabilistic framework has not been studied extensively in the literature. Many

times, due to economic feasibility, speed of exploration, availability of equipment and time

constraints vertical cone penetration data alone is obtained and used in the evaluation of

strength properties (Wei et al. 2005).

The autocovariance structure is called isotropic if the normalized autocovariance depends only on the Euclidian distance between field points, instead of on the axis-directional distance components, i.e.,

ρ(Δx, Δy, Δz) = ρ(√(Δx² + Δy² + Δz²)) (23)


Isotropy implies that the autocorrelation function is invariant to orthonormal transformation

of the field coordinates. Also the autocorrelation structure may be partly isotropic, for

example with respect to horizontal field directions:

ρ(Δx, Δy, Δz) = ρ(√(Δx² + Δy²), Δz) (24)

For complete anisotropy, the exponential correlation function in 3-D space is

ρ(Δx, Δy, Δz) = exp(−Δx/D_x − Δy/D_y − Δz/D_z) (25)

If isotropy in the horizontal direction is assumed, then the exponential correlation function shown in Equation 25 reduces to

ρ(Δx, Δy, Δz) = exp(−√(Δx² + Δy²)/D_h − Δz/D_z) (26)

Similar theoretical autocorrelation functions in 3-D field for other distributions can also be

formulated on the similar lines shown above.

3.4.6 Spatial averaging

Parameters in geotechnical analyses usually refer to averages of a soil property over a sliding

surface or a rupture zone in an ultimate failure analysis or significantly strained volumes in a

deformation analysis. If the dimensions of such surfaces or volumes exceed the scales of

fluctuation of the soil property, spatial averaging of fluctuations is substantial. This implies

that the variance of an averaged soil property over a sliding surface or affected volume is

likely to be substantially less than the field variance, which is mainly based on small sample

tests (e.g. triaxial tests) or small affected volumes in insitu tests (JCSS 2002).

Because of the spatial variability of soil properties, encountering a sufficiently low strength to

induce failure in localized areas is more likely than such an encounter over the entire zone of


influence. Both the conventional analyses based on the factor of safety and the simplified

probabilistic analyses fail to address this issue of scale of failure. Over the depth interval ΔZ

the spatial average soil property is given as

∫Δ

u(ΔZ) = (1/ΔZ) ∫_{ΔZ} u(z) dz (27)

∫∫∫=V

v dxdydzzyxuV

u ),,(1 (28)

Averaging distance depends on the nature of the problem in hand. For design of shallow

foundations in shear criterion, this distance is equal to the extent of shear failure zone within

the soil mass (Cherubini 2000). This distance for shallow foundations in cohesionless soil

subjected to vertical loading is approximately taken as 2B below the base of footing in the

vertical direction and 3.5B from the centre of footing in the horizontal direction, where B is

the width of the footing.

3.4.7 Evaluation of variance reduction function

The combined effect of spatial correlation and spatial averaging of soil properties over the

failure domain are beneficially utilized to reduce the variance of the measured data within the

zone of interest. The derivation of the variance reduction functions in terms of spatial

correlation and spatial average is described in the following section. JCSS (2002) presents the

evaluation of variance reduction function by both exact approach and simplified approach.

3.4.7.1 Variance reduction for data in 1-D space

The variability of a soil property u_i from point to point is measured by the standard deviation σ_i, and the standard deviation of the spatial average property u_ΔZ by σ_ΔZ. The larger the length (or volume) over which the property is averaged, the more the fluctuations of u_i tend to cancel out in the process of spatial averaging. This causes a reduction in standard deviation as the size of the averaging length or volume increases, which is given by

Γ(ΔZ) = σ_ΔZ / σ_i (29)

A simple relationship for the variance reduction function in terms of the scale of fluctuation and the averaging distance is given in Equation 30 (Vanmarcke 1977a).

Γ²(ΔZ) = 1.0 for ΔZ ≤ δ
Γ²(ΔZ) = δ/ΔZ for ΔZ ≥ δ (30)

Equation 30 indicates that with a decrease in the scale of fluctuation and an increase in the averaging distance, the value of the variance reduction function reduces, which in turn reduces

standard deviation of the spatially averaged soil property. In other words, the more erratic the

variation (i.e., less correlated the soil property) of the soil property with distance and larger

the soil domain considered, larger will be the reduction in variability of the average property.

This phenomenon is a result of the increasing likelihood that unusually high property values

at some point will be balanced by low values at other point (Vanmarcke 1977a). However,

Vanmarcke (1983) emphasized that the variance reduction function γ(T) is related to the

autocorrelation function ρ(τ) as given by.

γ(T) = (1/T²) ∫₀^T ∫₀^T ρ(t₁ − t₂) dt₁ dt₂ (31)

which reduces to

γ(T) = (2/T) ∫₀^T (1 − τ/T) ρ(τ) dτ (32)


From Equation 32, the variance reduction functions for the triangular, exponential, and squared exponential autocorrelation functions can be worked out as given in Equations 33 to 35, respectively.

γ(T) = 1 − T/(3a) for T ≤ a
γ(T) = (a/T)[1 − a/(3T)] for T ≥ a (33)

γ(T) = 2 (b/T)² [T/b − 1 + exp(−T/b)] (34)

γ(T) = (d/T)² [√π (T/d) E(T/d) + exp(−(T/d)²) − 1] (35)

where a, b, d are referred to as the autocorrelation distances, T is the averaging length, the

distance over which the geotechnical properties are averaged over a failure surface, and E(·)

is the error function, which increases from 0 to 1 as its argument increases from 0 to ∞. In

terms of standard Gaussian cumulative distribution function E(u)=2[FU(u)-0.5].

As the averaging length T → ∞, the variance reduction function γ(T) → 0. In other words, the chances associated with failure of a huge volume of soil are very rare. In addition, γ(T) is

inversely proportional to T at very large values of T.
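The exponential case, Equation 34, is easy to tabulate (a minimal sketch with an illustrative autocorrelation distance b = 1.0 m):

    # Variance reduction function for an exponential autocorrelation (Eq. 34).
    from math import exp

    def gamma_exponential(T, b):
        return 2 * (b / T)**2 * (T / b - 1 + exp(-T / b))

    for T in (0.5, 1.0, 2.0, 5.0, 10.0):      # averaging lengths
        print(T, round(gamma_exponential(T, b=1.0), 3))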

The variance reduction factor for averaging in a one-, two- or three-dimensional random field may be approximated as given in Equation 36.

Γ²(L₁ ⋯ L_n) = 1 for L_n ≤ α_n
Γ²(L₁ ⋯ L_n) = (α₁/L₁) ⋯ (α_n/L_n) for L_n ≥ α_n (36)

where n=1, 2, 3, and L1, L2 and L3 are the lengths over which averaging takes place and α1,

α2, α3 are the correlation radii. In case of “separable” autocorrelation functions, i.e. which can


be written as a multiplication of factors for each of the dimensions of a 2- or 3-D surface or

volume, the total variance reduction factor can, for the 3-D case be written as:

Γ²(L₁ L₂ L₃) = Γ²(L₁) Γ²(L₂) Γ²(L₃) (37)

Similar to the above, Vanmarcke (1977a) also proposed an approximate and simplified resultant variance reduction factor in 2-D space as the product of the individual variance reduction factors in the vertical and horizontal directions, in terms of the scale of fluctuation (δ) and the spatial averaging distance (L) in the respective directions, as shown in Equation 38.

Γ²_A = Γ²_v × Γ²_h (38)

The above propositions have been used in the analysis of spatial variability of soils, and the influence of spatial variability in foundation design is presented in Sivakumar Babu et al. (2005), Dasaka et al. (2005), and Sumanta Haldar and Sivakumar Babu (2006).


Learning objectives

1. Understanding the concept of sampling

2. Understand the various sampling techniques, sampling

plans.

3. Understanding the significance of making decisions based on sampling.


MODULE – 4

Motivation and Highlights

Motivation:

Importance of understanding the various sampling techniques

Highlights:

Knowledge of sampling techniques, decision based on sampling


4. Sampling

4.1. Concepts of Sampling

Many variables in civil engineering are spatially distributed. For example, concentrations

of pollutants and the variation of material properties, such as the strength and stiffness of

concrete and soils, are spatially distributed. The purpose of sampling is to obtain estimates

of population parameters (e.g., means, variances, covariances) to characterize the entire

population distribution without observing and measuring every element in the sampled

population. Sampling theory for spatial processes principally involves evaluation of

estimators’ sampling distributions and confidence limits. A very good introduction to

these methods and the uses and advantages of sampling is provided by Cochran (1977)

and Baecher and Christian (2003).

An estimate is the realization of a particular sample statistic for a specific set of sample

observations. Estimates are not exact and uncertainty is reflected in the variance of their

distribution about the true parameter value they estimate. This variance is, in turn, a

function of both the sampling plan and the sampled population. By knowing this variance

and making assumptions about the distribution shape, confidence limits on the true

population parameters can be set.

A sampling plan is a program of action for collecting data from a sampled population.

Common plans are grouped into many types: for example, simple random, systematic,

stratified random, cluster, traverse, line intersect, and so on. In deciding among plans, or

in designing a specific program once the type of plan has been chosen, one attempts to

obtain the highest precision for a fixed sampling cost or the lowest sampling cost for a

fixed precision or a specified confidence interval.

4.2. Common Spatial Sampling Plans

Statistical sampling is a common activity in many human enterprises, from the national

census, to market research, to scientific research. As a result, common situations are

encountered in many different endeavors, and a family of sampling plans has grown up to


handle these situations. Simple random sampling, systematic sampling, stratified random

sampling, and cluster sampling are considered in the following section.

4.2.1. Simple random sampling

The characteristic property of simple random sampling is that individual elements are chosen at

random from the sampled population, and each element of the population has an equal

probability of being observed. An unbiased estimator of the population mean from a

simple random sample x = {x₁, …, xₙ} is the sample mean

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i    (1)

This estimator has sampling variance

\operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n}\,\frac{N-n}{N}    (2)

where σ² is the (true) variance of the sampled population and N is the total sampled

population size. The term (N−n)/N is called the finite population factor, which for n less

than about 10% of N can safely be ignored. However, since σ² is usually unknown, it is

estimated by the sample variance

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2    (3)

in which the denominator is taken as n−1 rather than n, reflecting the loss of a degree of

freedom due to estimating the mean from the same data. The estimator is unbiased but

does not have minimum variance. The only choice (i.e., allocation) to be made in simple

random sampling is the sample size n. Since the sampling variance of the mean is

inversely proportional to the sample size, Var(x̄) ∝ n⁻¹, a given estimator precision can be

obtained by adjusting the sample size, if σ is known or assumed. A sampling plan can be

optimized for total cost by assuming some relationship between Var(x̄) and cost in

construction or design. A common assumption is that this cost is proportional to the

square root of the variance, usually called the standard error of the mean, σ_x̄ = [Var(x̄)]^{1/2}.

It is usually assumed that the estimates ȳ and Ŷ are normally distributed about the

corresponding population values. If the assumption holds, lower and upper confidence

limits for the population mean and total are as follows (with f = n/N the sampling fraction):

Mean:

\hat{Y}_L = \bar{y} - \frac{t\,s}{\sqrt{n}}\sqrt{1-f}, \qquad \hat{Y}_U = \bar{y} + \frac{t\,s}{\sqrt{n}}\sqrt{1-f}

Total:

\hat{Y}_L = N\bar{y} - \frac{t\,N\,s}{\sqrt{n}}\sqrt{1-f}, \qquad \hat{Y}_U = N\bar{y} + \frac{t\,N\,s}{\sqrt{n}}\sqrt{1-f}

The symbol t is the value of the normal deviate corresponding to the desired confidence

probability. The most common values are tabulated below:

Confidence probability (%)   50     80     90     95     99
Normal deviate, t            0.67   1.28   1.64   1.96   2.58

If the sample size is less than 60, the percentage points may be taken from Student’s

t table with (n-1) degrees of freedom, these being the degrees of freedom in the estimated

s2. The t distribution holds exactly only if the observations yi are themselves normally

distributed and N is infinite. Moderate departures from normality do not affect it greatly.

For small samples with very skewed distributions, special methods are needed. An example

of the application is as follows.

Example.

In a site, the number of borehole data sheets available to characterize the substrata and obtain

design parameters is 676. In each borehole data sheet, 42 entries reflecting the various

characteristics of the soils, viz. compressibility, shear strength, compaction control,

permeability, etc., are to be indicated. An audit revealed that in some data sheets not all the

data are entered. The audit party verified a random sample of 50 sheets (a 7% sample) and

the results are indicated in Table 1.

Table 1 - Results for a sample of 50 data sheets

Number of entries, yi   42  41  36  32  29  27  23  19  16  15  14  11  10   9   7   6   5   4   3
Frequency, fi           23   4   1   1   1   2   1   1   2   2   1   1   1   1   1   3   2   1   1

We find

n = Σfᵢ = 50,   Σfᵢyᵢ = 1471,   Σfᵢyᵢ² = 54,497

Hence the estimated total number of entries is

\hat{Y} = N\bar{y} = \frac{676 \times 1471}{50} = 19{,}888

For the sample variance s² we have

s^2 = \frac{1}{n-1}\left[\sum f_i y_i^2 - \frac{\left(\sum f_i y_i\right)^2}{n}\right] = \frac{1}{49}\left[54{,}497 - \frac{(1471)^2}{50}\right] = 229.0

The 80% confidence limits are given by

\hat{Y} \pm \frac{t\,N\,s}{\sqrt{n}}\sqrt{1-f} = 19{,}888 \pm \frac{(1.28)(676)(15.13)}{\sqrt{50}}\sqrt{1-0.0740}

This gives 18,107 and 21,669 for the 80% limits. A complete count showed 21,045

entries, which is close to the upper limit.
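The computations of this example can be reproduced with a short script; the following sketch (not part of the original notes) uses only the values from Table 1.

```python
# Sketch reproducing the simple-random-sampling example above
# (values taken from Table 1; nothing here is new data).
import math

N = 676  # total number of data sheets
y = [42, 41, 36, 32, 29, 27, 23, 19, 16, 15, 14, 11, 10, 9, 7, 6, 5, 4, 3]
f = [23, 4, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 2, 1, 1]

n = sum(f)                                           # 50
total = sum(fi * yi for fi, yi in zip(f, y))         # 1471
sum_sq = sum(fi * yi ** 2 for fi, yi in zip(f, y))   # 54,497

Y_hat = N * total / n                                # estimated total, ~19,888
s2 = (sum_sq - total ** 2 / n) / (n - 1)             # sample variance, ~229
fpc = 1 - n / N                                      # finite population factor

t = 1.28                                             # normal deviate, 80% level
half_width = t * N * math.sqrt(s2) / math.sqrt(n) * math.sqrt(fpc)
print(Y_hat - half_width, Y_hat + half_width)        # ~18,107 and ~21,669
```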

4.2.2. Systematic sampling

In systematic sampling the first observation is chosen at random and subsequent

observations are chosen periodically throughout the population. To select a sample of n

units, we take a unit at random from the first k units and every kth unit thereafter. The

method involves the selection of every kth element from a sampling frame, where k, the

sampling interval, is calculated as:

k = population size (N) / sample size (n)

Using this procedure, each element in the population has a known and equal probability of

selection. This makes systematic sampling functionally similar to simple random

sampling. It is, however, much more efficient (when the variance within the systematic

sample is more than the variance of the population) and much less expensive to carry out.

The advantages of this approach are that (1) mistakes in sampling are minimized and the

operation is speedy, and (2) the sample is spread uniformly over the population and is

likely to be more precise than simple random sampling.

An unbiased estimate of the mean from a systematic sample is the same as in Equation (1).

The sampling variance of this estimate is


\operatorname{Var}(\bar{x}) = \frac{N-1}{N}\,\sigma^2 - \frac{k(n-1)}{N}\,\sigma_w^2    (4)

where k is the interval between samples (k = N/n) and σw² is the variance of elements

within the same systematic sample, estimated by

s_w^2 = \frac{1}{k(n-1)}\sum_{i=1}^{k}\sum_{j=1}^{n}\left(x_{ij} - \bar{x}_i\right)^2    (5)

in which xij is the jth member of the ith interval of the sample. When only one systematic

sample has been taken (i.e. one set of n observations at spacing k) the variance of the

mean cannot be evaluated unless an assumption is made about the nature of the sampled

population. The conventional assumption is that the population can be modeled by a

linear expression of the form x_i = μ + e_i, in which μ is the mean and e_i is a zero-mean

random perturbation. For a constant mean, this leads to (Cochran 1977)

\operatorname{Var}(\bar{x}) = \frac{N-n}{Nn}\,\frac{\sum_i \left(x_{i+1} - x_i\right)^2}{2(n-1)}    (6)

and for a linearly trending mean, μ = μ₀ + bi,

\operatorname{Var}(\bar{x}) = \frac{N-n}{Nn}\,\frac{\sum_i \left(x_i - 2x_{i+k} + x_{i+2k}\right)^2}{6(n-2)}    (7)

One must ensure that the chosen sampling interval does not hide a pattern. Any pattern

would threaten randomness. A random starting point must also be selected. Systematic

sampling is to be applied only if the given population is logically homogeneous, because

systematic sample units are uniformly distributed over the population.

Example: Suppose the auditor in the previous example wants to use systematic sampling.

A starting point is chosen at random, and every kth sheet thereafter is selected. For example,

suppose you want a sample of 25 sheets from the 676 sheets: k = 676/25 ≈ 27, so every 27th

sheet is chosen after a random starting point between 1 and 27. If the random starting point

is 11, then the sheets selected are 11, 38, 65, 92, etc.
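A minimal sketch of this selection rule (not from the original notes; the population and sample sizes are those of the audit example):

```python
# Sketch: selecting a systematic sample of the borehole data sheets.
import random

N, n = 676, 25                       # population size and desired sample size
k = N // n                           # sampling interval, 676 // 25 = 27
start = random.randint(1, k)         # random starting point in [1, k]
sample_ids = list(range(start, N + 1, k))
print(start, sample_ids[:4])         # e.g. start 11 -> 11, 38, 65, 92
```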

4.2.3. Stratified random sampling

When sub-populations vary considerably, it is advantageous to sample each

subpopulation (stratum) independently. Stratification is the process of grouping members

of the population into relatively homogeneous subgroups before sampling. The strata

should be mutually exclusive: every element in the population must be assigned to only

one stratum. The strata should also be collectively exhaustive: no population element can

be excluded. Then random or systematic sampling is applied within each stratum. This

often improves the representativeness of the sample by reducing sampling error. It can

produce a weighted mean that has less variability than the arithmetic mean of a simple

random sample of the population.

There are several possible strategies:

1) Proportionate allocation uses a sampling fraction in each of the strata that is

proportional to that of the total population. If the soil sample consists of 60% of boulders

(boulder stratum) and 40% sand (sand stratum), then the relative size of the two types of

samples should reflect this proportion.

2) Optimum allocation (or disproportionate allocation) - the sampling fraction of each

stratum is made proportionate to the standard deviation of the distribution of the variable;

larger samples are taken in the strata with the greatest variability to generate the least

possible sampling variance.

Estimates of the total population characteristics can be made by combining the individual

stratum estimates. For certain populations, stratifying before sampling is more efficient

than taking samples directly from the total population. Sampling plans that specify a

simple random sample in each stratum are called stratified random sampling plans.


An unbiased estimator of the mean of the total sampled population is

\bar{x} = \frac{1}{N}\sum_{h=1}^{m} N_h\,\bar{x}_h    (8)

where x̄ is the estimated population mean, m is the number of strata, and h denotes the

stratum (i.e., N_h is the size of the hth stratum and x̄_h is the corresponding stratum mean).

The variance of this estimate is

\operatorname{Var}(\bar{x}) = \sum_h w_h^2\,\frac{s_h^2}{n_h}\left(1 - f_h\right)    (9)

where w_h = N_h/N and f_h = n_h/N_h. Since the sample from each stratum is simple

random, the estimate of the variance within each stratum can be taken from Equation (3).

Then an estimate of the variance of the total population is

s^2 = \sum_h w_h\,s_h^2 + s_{\text{among means}}^2    (10)
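As an illustration of Equations (8) and (9), the following sketch (not part of the original notes) computes the stratified estimate for two assumed strata; all numbers are hypothetical.

```python
# Sketch: stratified mean and its variance per Eqs. (8)-(9).
import numpy as np

# Per-stratum data: size N_h, sample size n_h, sample mean, sample variance
N_h = np.array([400, 276])        # e.g. boulder and sand strata (assumed)
n_h = np.array([30, 20])
xbar_h = np.array([25.0, 18.0])
s2_h = np.array([16.0, 9.0])

N = N_h.sum()
w_h = N_h / N                     # stratum weights
f_h = n_h / N_h                   # sampling fractions

x_bar = np.sum(w_h * xbar_h)                        # Eq. (8)
var_x = np.sum(w_h**2 * s2_h / n_h * (1 - f_h))     # Eq. (9)
print(x_bar, var_x)
```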

4.2.4. Cluster sampling

In cluster sampling, aggregates or clusters of elements are selected from the sampled

population as units rather than as individual elements, and properties of the clusters are

determined. From the properties of the clusters, inferences can be made on the total

sampled population. Plans that specify to measure every element within clusters are

called single-staged-cluster plans, since they specify only one level of sampling: plans

that specify that cluster properties be estimated by simple random sampling are called

two-staged cluster plans, since they specify two levels of sampling. Higher order cluster

plans are sometimes used.

We consider the simplest case first: of M possible clusters, m are selected, and the ratio

f₁ = m/M is called, as before, the sampling fraction. Each cluster contains the same

number of elements, N, some number n of which are selected for measurement (f₂ = n/N).

An unbiased estimate of the average of each cluster is

\bar{x}_i = \frac{1}{n}\sum_j x_{ij}    (11)

where x_{ij} is the jth element of the ith cluster. An unbiased estimate of the average of the

total population is

\bar{x} = \frac{1}{m}\sum_i \bar{x}_i = \frac{1}{mn}\sum_i\sum_j x_{ij}

and the variance of this estimator is

\operatorname{Var}(\bar{x}) = \frac{1-f_1}{m}\,s_1^2 + \frac{f_1(1-f_2)}{mn}\,s_2^2

in which s₁² is the estimated variance among cluster means,

s_1^2 = \frac{\sum_i \left(\bar{x}_i - \bar{x}\right)^2}{m-1}

and s₂² is the estimated variance within clusters,

s_2^2 = \frac{\sum_i\sum_j \left(x_{ij} - \bar{x}_i\right)^2}{m(n-1)}    (12)
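The estimates of Equations (11) and (12) can be coded directly. The sketch below (not from the original notes) uses simulated data for m equal-sized clusters; the population sizes M and N and the data are illustrative assumptions.

```python
# Sketch: two-stage cluster estimates per Eqs. (11)-(12),
# for m clusters of equal size N with n elements measured in each.
import numpy as np

M, N = 40, 30                                  # clusters in population, elements per cluster
rng = np.random.default_rng(0)
data = rng.normal(20.0, 4.0, size=(5, 10))     # m=5 clusters sampled, n=10 measured

m, n = data.shape
f1, f2 = m / M, n / N

xbar_i = data.mean(axis=1)                     # Eq. (11), per-cluster means
x_bar = xbar_i.mean()                          # overall estimate
s1_sq = np.sum((xbar_i - x_bar) ** 2) / (m - 1)                 # among clusters
s2_sq = np.sum((data - xbar_i[:, None]) ** 2) / (m * (n - 1))   # within clusters
var_xbar = (1 - f1) * s1_sq / m + f1 * (1 - f2) * s2_sq / (m * n)
print(x_bar, var_xbar)
```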

In the more general case, not all of the clusters are of equal size. For example, the

numbers of joints appearing in different outcrops are different. With unequal sized

clusters, the selection plan for clusters is not as obvious as it was previously. The relative

probability, z_j, of selecting clusters of different sizes is now a parameter of the plan.

Commonly, the z_j are either taken all equal (simple random sampling of the clusters) or

proportional to cluster size. The precisions of these two plans are different. For selection

with equal probability, an unbiased estimate of the true total population mean is


\bar{x} = \frac{\sum_i N_i\,\bar{x}_i}{\sum_i N_i}

and the variance of this estimator is

\operatorname{Var}(\bar{x}) = \frac{1-f_1}{m\bar{N}^2}\,\frac{\sum_i N_i^2\left(\bar{x}_i - \bar{x}_m\right)^2}{m-1} + \frac{f_1}{m^2\bar{N}^2}\sum_i \frac{N_i^2\left(1-f_{2i}\right)s_i^2}{n_i}    (13)

in which N̄ is the average value of the N_i and x̄_m = Σᵢ N_i x̄_i / m. If the assumption is made

that n_i ∝ N_i, then the plan is self-weighting and this simplifies to

\operatorname{Var}(\bar{x}) = \frac{1-f_1}{m\bar{N}^2}\,\frac{\sum_i N_i^2\left(\bar{x}_i - \bar{x}_m\right)^2}{m-1} + \frac{f_1(1-f_2)}{m^2 n \bar{N}^2}\sum_i N_i^2 s_i^2    (14)

This assumption (n_i ∝ N_i) is frequently valid: for example, proportionally more joints are

typically sampled from larger outcrops than from smaller outcrops.

For selection with probability proportional to size, an unbiased estimate of the total

population mean is

\bar{x} = \frac{1}{m}\sum_i \bar{x}_i

and the variance is

\operatorname{Var}(\bar{x}) = \frac{\sum_i \left(\bar{x}_i - \bar{x}\right)^2}{m(m-1)}    (15)

In all cases, the variance of the total population can be estimated from the variance between

elements within clusters and the variance between the means of the clusters:

\sigma^2 = \sigma^2_{\text{among means}} + \sigma^2_{\text{within clusters}}


Joint surveys and many other types of civil engineering sampling may be based on cluster

plans because the cost of sampling many individual joints on one outcrop (i.e. a cluster)

is less than the cost of traveling between outcrops.

The variance of geological populations is often a function of spatial extent. Indeed, this is

the principal argument in the geostatistical literature for favoring variograms over

autocovariance functions. If we consider the strength of soil specimens taken close together,

the variance among specimens is usually smaller than the variance among specimens

taken from many locations in one area of the site, which, in turn, is smaller than the

variance among specimens taken from all across the site. Cluster techniques allow us to

evaluate variance as a function of the "extent" of the distribution in space by nesting the

variances.


Learning objectives

1. Understanding the importance of reliability analysis.

2. Understanding the different levels of reliability.

3. Understanding the various methods adopted, such as the first order second moment (FOSM) method and the Hasofer-Lind approach.


MODULE -5

Motivation and Highlights

Motivation:

Importance of understanding the concept of reliability analysis

Highlights

Knowledge of terms such as levels of reliability and FOSM.


5. Levels of reliability methods

There are different levels of reliability analysis, which can be used in any design

methodology depending on the importance of the structure. The term 'level' is

characterized by the extent of information about the problem that is used and provided.

The methods of safety analysis proposed currently for the attainment of a given limit state

can be grouped under four basic “levels” (namely levels IV, III, II, and I ) depending

upon the degree of sophistication applied to the treatment of the various problems.

1. In level I methods, the probabilistic aspect of the problem is taken into account by

introducing into the safety analysis suitable "characteristic values" of the random

variables, conceived as fractiles of a predefined order of the statistical distributions

concerned. These characteristic values are associated with partial safety factors

that should be deduced from probabilistic considerations so as to ensure

appropriate levels of reliability in the design. In this method, the reliability of the

design deviates from the target value, and the objective is to minimize such an

error. The Load and Resistance Factor Design (LRFD) method comes under this

category.

2. Reliability methods, which employ two values of each uncertain parameter (i.e.,

mean and variance), supplemented with a measure of the correlation between

parameters, are classified as level II methods.

3. Level III methods encompass complete analysis of the problem and also involve

integration of the multidimensional joint probability density function of the


random variables extended over the safety domain. Reliability is expressed in

terms of suitable safety indices, viz., reliability index, β and failure probabilities.

4. Level IV methods are appropriate for structures that are of major economic

importance, involve the principles of engineering economic analysis under

uncertainty, and consider costs and benefits of construction, maintenance, repair,

consequences of failure, and interest on capital, etc. Foundations for sensitive

projects like nuclear power plants, transmission towers, and highway bridges are

suitable objects of level IV design.

5.1. Space of State Variables

For analysis, we need to define the state variables of the problem. The state variables are

the basic load and resistance parameters used to formulate the performance function. For

‘n’ state variables, the limit state function is a function of ‘n’ parameters.

If all loads (or load effects) are represented by the variable Q and total resistance (or

capacity) by R, then the space of state variables is a two-dimensional space as shown in

Figure 1. Within this space, we can separate the “safe domain” from the “failure

domain”; the boundary between the two domains is described by the limit state function

g(R,Q)=0.

Since both R and Q are random variables, we can define a joint density function fRQ (r,

q). A general joint density function is plotted in Figure 2. Again, the limit state function

separates the safe and failure domains. The probability of failure is calculated by

integration of the joint density function over the failure domain [i.e., the region in which

g(R, Q) <0]. As noted earlier, this probability is often very difficult to evaluate, so the

concept of a reliability index is used to quantify structural reliability.


Figure 1 - Safe domain and failure domain in two dimensional state spaces.

Figure 2 - Three-dimensional representation of a possible joint density function fRQ

5.3. RELIABILITY INDEX

Reduced Variables

It is convenient to convert all random variables to their "standard form", which is a

nondimensional form of the variables. For the basic variables R and Q, the standard forms

can be expressed as


Z_R = \frac{R - \mu_R}{\sigma_R}, \qquad Z_Q = \frac{Q - \mu_Q}{\sigma_Q}    (1)

The variables Z_R and Z_Q are sometimes called reduced variables. By rearranging

Equation (1), the resistance R and the load Q can be expressed in terms of the reduced

variables as follows:

R = \mu_R + Z_R\,\sigma_R, \qquad Q = \mu_Q + Z_Q\,\sigma_Q    (2)

The limit state function g(R, Q) = R − Q can be expressed in terms of the reduced variables

by using Equation (2). The result is

g(Z_R, Z_Q) = \mu_R + Z_R\sigma_R - \mu_Q - Z_Q\sigma_Q = \left(\mu_R - \mu_Q\right) + Z_R\sigma_R - Z_Q\sigma_Q    (3)

For any specific value of g(Z_R, Z_Q), Equation (3) represents a straight line in the space of

the reduced variables Z_R and Z_Q. The line corresponding to g(Z_R, Z_Q) = 0 separates the

safe and failure domains in the space of reduced variables. The loads Q and resistances R

are sometimes indicated in terms of capacity C and demand D in the literature.

Design for a given reliability index

If the capacity C and demand D are both normal, the probability of survival P_s = 1 − P_f is

P_s = P(C - D \ge 0) = \Phi\!\left(\frac{\mu_C - \mu_D}{\sqrt{\sigma_C^2 + \sigma_D^2}}\right)    (a)

and if C and D are both lognormal,

P_s = \Phi\!\left(\frac{\ln\!\left[\dfrac{\mu_C}{\mu_D}\sqrt{\dfrac{1+V_D^2}{1+V_C^2}}\right]}{\sqrt{\ln\!\left[(1+V_C^2)(1+V_D^2)\right]}}\right)    (b)

where V_C and V_D are the coefficients of variation of C and D. We define the central factor

of safety CFS = μ_C/μ_D and write (a) and (b) in terms of CFS, so that the reliability index

can be related to CFS directly:

\beta = \frac{CFS - 1}{\sqrt{CFS^2\,V_C^2 + V_D^2}} \qquad \text{(C and D normal)}

\beta = \frac{\ln\!\left[CFS\sqrt{\dfrac{1+V_D^2}{1+V_C^2}}\right]}{\sqrt{\ln\!\left[(1+V_C^2)(1+V_D^2)\right]}} \qquad \text{(C and D lognormal)}
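The two relations above are easy to evaluate for a trial design. The following sketch (not part of the original notes) computes β and P_s for both models; the capacity and demand statistics are illustrative assumptions.

```python
# Sketch: reliability index and survival probability for capacity C and
# demand D under the normal and lognormal models above (assumed values).
import math
from scipy.stats import norm

mu_C, V_C = 858.0, 0.25      # capacity mean and coefficient of variation
mu_D, V_D = 600.0, 0.10      # demand mean and coefficient of variation
CFS = mu_C / mu_D            # central factor of safety

# C and D normal
beta_n = (CFS - 1) / math.sqrt(CFS**2 * V_C**2 + V_D**2)

# C and D lognormal
num = math.log(CFS * math.sqrt((1 + V_D**2) / (1 + V_C**2)))
den = math.sqrt(math.log((1 + V_C**2) * (1 + V_D**2)))
beta_ln = num / den

print(beta_n, norm.cdf(beta_n))      # beta and P_s, normal model
print(beta_ln, norm.cdf(beta_ln))    # beta and P_s, lognormal model
```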

5.3.1. General Definition of the Reliability Index

A version of the reliability index was defined earlier as the inverse of the coefficient of

variation. More generally, the reliability index is the shortest distance from the origin of the

reduced variables to the line g(Z_R, Z_Q) = 0, as illustrated in Figure 3. This definition was

introduced by Hasofer and Lind (1974). Using geometry, we can calculate the reliability

index (the shortest distance) from the following formula:

\beta = \frac{\mu_R - \mu_Q}{\sqrt{\sigma_R^2 + \sigma_Q^2}}    (4)

where β is the inverse of the coefficient of variation of the function g(R, Q) = R − Q.

When R and Q are uncorrelated, normally distributed random variables, it can be shown

that the reliability index is related to the probability of failure by

P_f = 1 - \Phi(\beta) \quad \text{or} \quad P_f = \Phi(-\beta)


Figure 3 - Reliability index defined as the shortest distance in the space of reduced variables.

Table 1 provides an indication of how β varies with Pf.

Table 1 - Reliability index β and probability of failure P_f

P_f       β
10⁻¹     1.28
10⁻²     2.33
10⁻³     3.09
10⁻⁴     3.71
10⁻⁵     4.26
10⁻⁶     4.75
10⁻⁷     5.19
10⁻⁸     5.62
10⁻⁹     5.99
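Table 1 follows directly from P_f = Φ(−β), i.e. β = −Φ⁻¹(P_f). A short check (not part of the original notes), using scipy:

```python
# Sketch: reproducing Table 1 from the relation beta = -Phi^{-1}(P_f).
from scipy.stats import norm

for k in range(1, 10):
    pf = 10.0 ** (-k)
    beta = -norm.ppf(pf)
    print(f"Pf = 1e-{k}:  beta = {beta:.2f}")
```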

The definition for the two-variable case can be generalized for n variables as follows.

Consider a limit state function g(X1, X2, …, Xn), where the Xi variables are all

uncorrelated. The Hasofer-Lind reliability index is defined as follows:

1. Define the set of reduced variables {Z1, Z2, . . . , Zn} using


Z_i = \frac{X_i - \mu_{X_i}}{\sigma_{X_i}}    (5)

2. Redefine the limit state function by expressing it in terms of the reduced variables

(Z1, Z2, …, Zn).

3. The reliability index is the shortest distance from the origin in the n-dimensional space

of reduced variables to the curve described by g(Z1, Z2, . . . , Zn) = 0.

5.4. First-order second moment method (FOSM)

This method is also referred to as mean value first-order second moment (MVFOSM)

method, and it is based on the first order Taylor series approximation of the performance

function linearized at the mean values of the random variables. It uses only second-

moment statistics (mean and variance) of the random variables. Originally, Cornell

(1969) used a simple two-variable approach. On the assumption that the resulting

distribution of Z is normal, by virtue of the central limit theorem, Cornell (1969)

defined the reliability index as the ratio of the expected value of Z to its standard

deviation. The Cornell reliability index (βc) is the absolute value of the ordinate of the

point corresponding to Z = 0 on the standardized normal probability plot, as given in

Figure 4 and the equation below.


Figure 4 - Definition of limit state and reliability index (the density f_Z(z) with mean μ_Z, standard deviation σ_Z, and the limit surface z = 0 at distance β_c σ_Z from the mean)

\beta_c = \frac{\mu_Z}{\sigma_Z} = \frac{\mu_R - \mu_S}{\sqrt{\sigma_R^2 + \sigma_S^2}}    (6)

On the other hand, if the joint probability density function f_X(x) is known for the

multivariable case, then the probability of failure p_f is given by

p_f = \int_L f_X(x)\,dx    (7)

where L is the domain of X in which g(X) < 0.

In general, the above integral cannot be solved analytically, and an approximation is

obtained by the FORM approach. In this approach, the general case is approximated by an

ideal situation where X is a vector of independent Gaussian variables with zero mean and

unit standard deviation, and where g(X) is a linear function. The probability of failure p_f

is then

p_f = P\left(g(X) \le 0\right) = P\!\left(\beta + \sum_{i=1}^{n}\alpha_i X_i \le 0\right) = \Phi(-\beta)    (8)


where αi is the direction cosine of random variable Xi, β is the distance between the

origin and the hyperplane g(X) = 0, n is the number of basic random variables X, and Φ is

the standard normal distribution function.

The above formulations can be generalized for many random variables denoted by the

vector X. Let the performance function be of the form

Z = g(X) = g(X_1, X_2, \ldots, X_n)    (9)

A Taylor series expansion of the performance function about the mean values is

Z = g(\mu_{X_1}, \ldots, \mu_{X_n}) + \sum_{i=1}^{n}\frac{\partial g}{\partial X_i}\left(X_i - \mu_{X_i}\right) + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^2 g}{\partial X_i \partial X_j}\left(X_i - \mu_{X_i}\right)\left(X_j - \mu_{X_j}\right) + \cdots    (10)

where the derivatives are evaluated at the mean values of the random variables and μ_{X_i}

is the mean value of X_i. Truncating the series at the linear terms, the first-order mean and

variance of Z are obtained as

\mu_Z \approx g(\mu_{X_1}, \mu_{X_2}, \ldots, \mu_{X_n})

and

\sigma_Z^2 \approx \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial g}{\partial X_i}\frac{\partial g}{\partial X_j}\operatorname{cov}\!\left(X_i, X_j\right)    (11)

where cov(X_i, X_j) is the covariance of X_i and X_j. If the variables are uncorrelated, then

the variance of Z is given as

\sigma_Z^2 \approx \sum_{i=1}^{n}\left(\frac{\partial g}{\partial X_i}\right)^2 \operatorname{var}\!\left(X_i\right)    (12)

The reliability index is calculated as the ratio of the mean μ_Z to the standard deviation σ_Z of Z:

\beta = \frac{\mu_Z}{\sigma_Z}
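The following sketch (not part of the original notes) implements the MVFOSM estimate of β for a general performance function, using a numerical gradient at the mean values per Equations (11) and (12); the example g and its statistics are illustrative assumptions.

```python
# Sketch: MVFOSM reliability index via a numerical gradient at the means.
import numpy as np

def mvfosm_beta(g, mu, sigma, h=1e-6):
    """First-order beta of Z = g(X) for uncorrelated X (Eq. 12)."""
    mu = np.asarray(mu, dtype=float)
    mu_Z = g(mu)
    grad = np.zeros_like(mu)
    for i in range(mu.size):                 # central finite differences
        d = np.zeros_like(mu); d[i] = h
        grad[i] = (g(mu + d) - g(mu - d)) / (2 * h)
    sigma_Z = np.sqrt(np.sum((grad * np.asarray(sigma)) ** 2))
    return mu_Z / sigma_Z

g = lambda x: x[0] - x[1]                    # g(R, S) = R - S
print(mvfosm_beta(g, mu=[150.0, 100.0], sigma=[20.0, 15.0]))  # beta = 2.0
```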

5.4.1. Linear limit state functions

Consider a linear limit state function of the form

g(X_1, X_2, \ldots, X_n) = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_n X_n = a_0 + \sum_{i=1}^{n} a_i X_i

where the a_i terms (i = 0, 1, 2, …, n) are constants and the X_i terms are uncorrelated

random variables. If we apply the three-step procedure outlined above for determining

the Hasofer-Lind reliability index, we obtain the following expression for β:

\beta = \frac{a_0 + \sum_{i=1}^{n} a_i\,\mu_{X_i}}{\sqrt{\sum_{i=1}^{n}\left(a_i\,\sigma_{X_i}\right)^2}}    (13)

Observe that the reliability index β in Equation (13) depends only on the means and

standard deviations of the random variables. It is therefore called a second-moment

measure of structural safety, because only the first two moments (mean and variance)

are required to calculate β. There is no explicit relationship between β and the type of

probability distribution of the variables. If the random variables are all normally distributed

and uncorrelated, then this formula is exact in the sense that β and P_f are related by

P_f = Φ(−β). Otherwise, this relation provides only an approximate estimate of the

probability of failure.

The method discussed above has some limitations and deficiencies. It does not use the

distribution information about the variables, and the function g( ) is linearized at the mean

values of the X_i variables. If g( ) is nonlinear, neglecting the higher-order terms in the Taylor

series expansion introduces significant error in the calculation of the reliability index. The

more important observation is that the above equations do not give a constant value of the

reliability index for mechanically equivalent formulations of the same performance

function. For example, the safety margins R − S < 0 and R/S < 1 are mechanically

equivalent, yet they will not lead to the same value of the probability of failure. Moreover,

the MVFOSM approach does not use the distribution information about the variables even

when it is available.
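This lack of invariance is easy to demonstrate numerically. The sketch below (not from the original notes) applies MVFOSM to g₁ = R − S and to the mechanically equivalent g₂ = R/S − 1; the statistics are illustrative assumptions.

```python
# Sketch: the invariance problem of MVFOSM. g1 and g2 describe the same
# failure event but give different beta values.
import math

mu_R, sd_R = 150.0, 20.0
mu_S, sd_S = 100.0, 15.0

# g1 = R - S:  mu_Z = mu_R - mu_S, sigma_Z^2 = sd_R^2 + sd_S^2
beta1 = (mu_R - mu_S) / math.sqrt(sd_R**2 + sd_S**2)

# g2 = R/S - 1, linearized at the means:
# dg/dR = 1/mu_S,  dg/dS = -mu_R/mu_S**2
mu_Z2 = mu_R / mu_S - 1.0
sigma_Z2 = math.sqrt((sd_R / mu_S)**2 + (mu_R * sd_S / mu_S**2)**2)
beta2 = mu_Z2 / sigma_Z2

print(beta1, beta2)   # 2.00 vs ~1.66 -- not the same
```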

Nonlinear limit state functions

Now consider the case of a nonlinear limit state function. When the function is nonlinear,

we can obtain an approximate answer by linearizing it with a first-order Taylor series

expansion. The result is

g(X_1, X_2, \ldots, X_n) \approx g(x_1^*, x_2^*, \ldots, x_n^*) + \sum_{i=1}^{n}\left(X_i - x_i^*\right)\left.\frac{\partial g}{\partial X_i}\right|_{(x_1^*, \ldots, x_n^*)}

where (x₁*, x₂*, …, xₙ*) is the point about which the expansion is performed. One choice

for this linearization point is the point corresponding to the mean values of the random

variables, which gives

g(X_1, X_2, \ldots, X_n) \approx g(\mu_{X_1}, \ldots, \mu_{X_n}) + \sum_{i=1}^{n}\left(X_i - \mu_{X_i}\right)\left.\frac{\partial g}{\partial X_i}\right|_{(\mu_{X_1}, \ldots, \mu_{X_n})}


Since this is a linear function of the X_i variables, it can be rewritten exactly like the linear

case above, and Equation (13) can be used as an approximate solution for the reliability

index. After some algebraic manipulation, the following expression results:

\beta = \frac{g(\mu_{X_1}, \mu_{X_2}, \ldots, \mu_{X_n})}{\sqrt{\sum_{i=1}^{n}\left(a_i\,\sigma_{X_i}\right)^2}}, \qquad a_i = \left.\frac{\partial g}{\partial X_i}\right|_{(x_1^*, \ldots, x_n^*)}

The reliability index defined above is called a first-order second-moment mean value

reliability index. It is a long name, but the underlying meaning of each part is important:

first order, because we use first-order terms in the Taylor series expansion; second

moment, because only means and variances are needed; mean value, because the Taylor

series expansion is about the mean values.

5.4.2. Comments on the First-Order Second-Moment Mean Value Index

The first-order second-moment mean value method is based on approximating the

non-normal CDFs of the state variables by normal variables, as shown in Figure 5 for the


Figure 5 - Mean value second-moment formulation.

simple case in which g(R, Q) = R - Q. The method has both advantages and

disadvantages in structural reliability analysis.

Among its advantages:

1. It is easy to use.

2. It does not require knowledge of the distributions of the random variables.

Among its disadvantages:

1. Results are inaccurate if the tails of the distribution functions cannot be

approximate by a normal distribution.

2. 2. There is an invariance problem: the value of the reliability index depends on

the specific form of the limit state function.

The invariance problem is best clarified by an example.


5.5. Advanced first-order second moment method (AFOSM)

It is essential that, irrespective of the method of evaluating the reliability of a limit state, all

mechanically equivalent performance functions produce the same safety index. However,

the MVFOSM method fails to satisfy this condition in some cases, such as correlated

variables and nonlinear limit state formulations. Hence a new approach, the Hasofer-Lind

reliability index (Hasofer and Lind 1974), was developed to tackle the problem of variant

reliability indices produced by the Cornell index. In this method the reduced variables are

defined as given in Equation (14).

X_i' = \frac{X_i - \mu_{X_i}}{\sigma_{X_i}}, \qquad i = 1, 2, \ldots, n    (14)

where X_i' is a random variable with zero mean and unit standard deviation. The above

equation is used to transform the original limit state g(X) = 0 to the reduced limit state

g'(X') = 0. X is referred to as the original coordinate system and X' as the reduced

coordinate system. Note that if X_i is normal in the original coordinate system, it will be

standard normal in the reduced coordinate system.

The Hasofer-Lind reliability index (βHL) can be defined as the minimum distance from

the origin of the axes in the reduced co-ordinate system to the limit state surface. The

minimum distance point on the limit state surface is called the design point or checking

point. Considering the limit state function in two variables given below, wherein R and S

should be normal variables, the limit state and the reduced variables can be written as

follows:


Z = R - S = 0

R' = \frac{R - \mu_R}{\sigma_R}, \qquad S' = \frac{S - \mu_S}{\sigma_S}

Substituting the values of R′ and S′ into the above equation, the limit state equation in the

reduced coordinate system can be written as

g(R', S') = \sigma_R R' - \sigma_S S' + \mu_R - \mu_S = 0    (15)

The position of the limit state surface relative to the origin in the reduced coordinate

system is a measure of the reliability of the system. By simple trigonometry, the distance

of the limit state line from the origin can be calculated and it will give the reliability

index value.

\beta_{HL} = \frac{\mu_R - \mu_S}{\sqrt{\sigma_R^2 + \sigma_S^2}}    (16)

This is the same as the reliability index defined by the MVFOSM method if both R and S

are normal. In this definition the reliability index is invariant because, regardless of the

form in which the limit state equation is written, its geometric shape and the distance from

the origin remain constant.

To be specific, β is the first-order second-moment reliability index, defined as the

minimum distance from the origin of the standard, independent normal variable space to

the failure surface, as discussed in detail by Hasofer and Lind (1974). The functional

relationship between the probability of failure (p_f) and the reliability index (β) classifies

the performance of designs based on these two values; for example, the performance is

high if the reliability index is equal to 5, which corresponds to a probability of failure of

approximately 3×10⁻⁷.

5.5.1. Hasofer-Lind Reliability Index

Hasofer and Lind proposed a modified reliability index that does not exhibit the invariance

problem illustrated above. The "correction" is to evaluate the limit state function at a point

known as the "design point" instead of at the mean values. The design point is a point on

the failure surface g = 0. Since this design point is generally not known a priori, an

iteration technique must in general be used to solve for the reliability index.

5.5.2. AFOSM Method for Normal Variables

The Hasofer-Lind (H-L) method is applicable to normal random variables. It first

defines the reduced variables as

X_i' = \frac{X_i - \mu_{X_i}}{\sigma_{X_i}}, \qquad i = 1, 2, \ldots, n    (17)

where X_i' is a random variable with zero mean and unit standard deviation. The above

equation is used to transform the original limit state g(X) = 0 to the reduced limit state

g(X') = 0. The X coordinate system is referred to as the original coordinate system and

the X' coordinate system as the transformed or reduced coordinate system. Note that if

X_i is normal, X_i' is standard normal. The safety index β_HL is defined as the minimum

distance from the origin of the axes in the reduced coordinate system to the limit state

surface (failure surface). It can be expressed as

\beta_{HL} = \sqrt{\left(x'^*\right)^T \left(x'^*\right)}    (18)


The minimum distance point on the limit state surface is called the design point or

checking point. It is denoted by vector x* in the original coordinate system and by vector

x`* in the reduced coordinate system. These vectors represent the values of all the

random variables, that is, X1, X2, …, Xn, at the design point corresponding to the

coordinate system being used.

This method can be explained with the help of the figure shown below. Consider the linear

limit state equation in two variables,

Z = R - S = 0

This equation is similar to the one above; note that here R and S need not be normal

variables. A set of reduced variables is introduced as

R' = \frac{R - \mu_R}{\sigma_R}    (19)

and

S' = \frac{S - \mu_S}{\sigma_S}    (20)

Figure 6 - Hasofer-Lind Reliability Index: Linear Performance Function


If we substitute these into the limit state equation, the limit state equation in the reduced

coordinate system becomes

g(R', S') = \sigma_R R' - \sigma_S S' + \mu_R - \mu_S = 0    (21)

The transformation of the limit state equation from the original to the reduced coordinate

system is shown in the figure above. The safe and failure regions are also shown. From the

figure, it is apparent that if the failure line (limit state line) is closer to the origin in the

reduced coordinate system the failure region is larger and if it is farther away from the

origin, the failure region is smaller. Thus, the position of the limit state surface relative to

the origin in the reduced coordinate system is a measure of the reliability of the system.

The coordinates of the intercepts of the limit state line on the R' and S' axes can be shown

to be (−(μ_R − μ_S)/σ_R, 0) and (0, (μ_R − μ_S)/σ_S), respectively. Using simple

trigonometry, we can calculate the distance of the limit state line from the origin as

\beta_{HL} = \frac{\mu_R - \mu_S}{\sqrt{\sigma_R^2 + \sigma_S^2}}    (22)

This distance is referred to as the reliability index or safety index. It is the same as the

reliability index defined by the MVFOSM method in the above equation if both R and S

are normal variables. However, it is obtained in a completely different way, based on

geometry. It indicates that if the limit state is linear and the random variables R and S

are normal, both methods will give an identical reliability or safety index.

In general, for many random variables represented by the vector X = (X₁, X₂, …, Xₙ) in

the original coordinate system and X' = (X₁', X₂', …, Xₙ') in the reduced coordinate

system, the limit state g(X') = 0 is a nonlinear function, shown in the reduced coordinates

for two variables in Figure 7. At this stage the Xᵢ' are assumed to be uncorrelated. Here

g(X') > 0 denotes the safe state and g(X') < 0 denotes the failure state. Again, the

Hasofer-Lind reliability index is defined as the minimum distance from the origin to the

design point on the limit state in the reduced coordinates, where x'* represents the

coordinates of the design point, the point of minimum distance from the origin to the limit

state. In this definition the reliability index is invariant because, regardless of the form in

which the limit state equation is written, its geometric shape and the distance from the

origin remain constant.

Figure 7 - Hasofer-Lind Reliability Index: Nonlinear Performance Function

It is easy to see from the figure that x'* is the most probable failure point. The

Hasofer-Lind reliability index can be used to calculate a first-order approximation of the

failure probability as p_f = Φ(−β_HL). This is the integral of the standard normal density

function along the ray joining the origin and x'*; it is obvious that the nearer x'* is to the

origin, the larger is the failure probability. Thus the minimum distance point on the limit

state surface is also the most probable failure point. The point of minimum distance from

the origin to the limit state surface, x'*, represents the worst combination of the stochastic

variables and is appropriately named the design point or the most probable point (MPP)

of failure.

For nonlinear limit states, the computation of the minimum distance becomes an

optimization problem:

Minimize D = \sqrt{x'^T x'}

subject to the constraint g(x') = 0

where x' represents the coordinates of the checking point on the limit state equation in the

reduced coordinates, to be estimated. Using the method of Lagrange multipliers, we can

obtain the minimum distance as


\beta_{HL} = -\frac{\sum_{i=1}^{n} x_i'^*\left(\dfrac{\partial g}{\partial X_i'}\right)^*}{\sqrt{\sum_{i=1}^{n}\left(\dfrac{\partial g}{\partial X_i'}\right)^{*2}}}    (23)

where (∂g/∂Xᵢ')* is the ith partial derivative evaluated at the design point with

coordinates (x₁'*, x₂'*, …, xₙ'*); the asterisk indicates that the derivative is evaluated at

the design point. The design point in the reduced coordinates is given by

x_i'^* = -\alpha_i\,\beta_{HL}, \qquad i = 1, 2, \ldots, n

where

\alpha_i = \frac{\left(\dfrac{\partial g}{\partial X_i'}\right)^*}{\sqrt{\sum_{k=1}^{n}\left(\dfrac{\partial g}{\partial X_k'}\right)^{*2}}}    (24)

are the direction cosines along the coordinate axes Xᵢ'. In the space of the original

coordinates, the design point is

x_i^* = \mu_{X_i} - \alpha_i\,\sigma_{X_i}\,\beta_{HL}

An algorithm was formulated by Rackwitz (1976) to compute β_HL and x_i'^* as follows:

• Step 1. Define the appropriate limit state equation.

• Step 2. Assume initial values of the design point x_i^*, i = 1, 2, …, n. Typically, the

initial design point may be assumed to be at the mean values of the random variables.

Obtain the reduced variates x_i'^* = (x_i^* − μ_{X_i})/σ_{X_i}.

• Step 3. Evaluate (∂g/∂X_i')* and α_i at x_i'^*.

• Step 4. Obtain the new design point in terms of β_HL as x_i'^* = −α_i β_HL.

• Step 5. Substitute the new x_i'^* in the limit state equation g(x'^*) = 0 and solve for β_HL.

• Step 6. Using the β_HL value obtained in Step 5, re-evaluate x_i'^* = −α_i β_HL.

• Step 7. Repeat Steps 3 through 6 until β_HL converges.

This algorithm is shown geometrically in Figure 8. The algorithm constructs a linear

approximation to the limit state at every search point and finds the distance from the

origin to the limit state. In the figure, point B represents the initial design point, usually

assumed to be at the mean values of the random variables, as noted in Step 2. Note that B

is not on the limit state g(X') = 0; the tangent to the limit state at B is represented by the

line BC. Then AD gives an estimate of β_HL in the first iteration, as noted in Step 5. As the

iteration continues, the β_HL value converges.

Ditlevsen (1979a) showed that for a nonlinear limit state surface β_HL lacks comparability:

the ordering of β_HL values may not be consistent with the ordering of the actual

reliabilities. An example of this is shown in the figure with two limit state surfaces, one

flat and the other curved. The shaded region to the right of each limit state represents the

corresponding failure region. Clearly, the structure with the flat limit state surface has a

different reliability than the one with the curved limit state surface; however, the β_HL

values are identical for both surfaces and suggest equal reliability. To overcome this

inconsistency, Ditlevsen (1979a) introduced the generalized reliability index, β_g, defined as

\beta_g = \Phi^{-1}\!\left[\int\!\cdots\!\int_{g(x')>0} \varphi(x_1')\,\varphi(x_2')\cdots\varphi(x_n')\,dx_1'\cdots dx_n'\right]    (25)

where Φ and φ are the cumulative distribution function and the probability density

function of a standard normal variable, respectively. Because the reliability index in this

definition includes the entire safe region, it provides a consistent ordering of second-moment

reliability. The integral in this equation looks similar to that in Equation (7) and is

difficult to compute directly. Hence, Ditlevsen (1979a) proposed approximating the

nonlinear limit state by a polyhedral surface consisting of tangent hyperplanes at

selected points on the surface.


Figure 8 - Algorithm for finding βHL. Note: a number in parentheses indicates the iteration number.

Consider a limit state function g(X1, X2, …, Xn) where the random variables Xi are all

uncorrelated. (If the variables are correlated, a transformation can be used to obtain

uncorrelated variables.) The limit state function is rewritten in terms of the standard form

of the variables (reduced variables) using

Z_i = \frac{X_i - \mu_{X_i}}{\sigma_{X_i}}

As before, the Hasofer-Lind reliability index is defined as the shortest distance from the

origin of the reduced variable space to the limit state function g = 0.

Thus far nothing has changed from the previous presentation of the reliability index. In

fact, if the limit state function is linear, then the reliability index is still calculated as

\beta = \frac{a_0 + \sum_{i=1}^{n} a_i\,\mu_{X_i}}{\sqrt{\sum_{i=1}^{n}\left(a_i\,\sigma_{X_i}\right)^2}}    (26)


If the limit state function is nonlinear, however, iteration is required to find the design

point {Z₁*, Z₂*, …, Zₙ*} in reduced variable space such that β still corresponds to the

shortest distance. This concept is illustrated in Figures 9 through 11 for the case of

two random variables.

Figure 9 - Hasofer-Lind reliability index

Figure 10 - Design point on the failure boundary for the linear limit state function g = R – Q


Figure 11 - Design point and reliability index for highly nonlinear limit state function.

The iterative procedure requires us to solve a set of (2n + 1) simultaneous equations with

(2n + 1) unknowns: β, α₁, α₂, …, αₙ, Z₁*, Z₂*, …, Zₙ*, where

\alpha_i = \frac{\left.\dfrac{\partial g}{\partial Z_i}\right|_{\text{design point}}}{\sqrt{\sum_{k=1}^{n}\left(\left.\dfrac{\partial g}{\partial Z_k}\right|_{\text{design point}}\right)^2}}

\frac{\partial g}{\partial Z_i} = \frac{\partial g}{\partial X_i}\,\frac{\partial X_i}{\partial Z_i} = \sigma_{X_i}\,\frac{\partial g}{\partial X_i}

\sum_{i=1}^{n}\alpha_i^2 = 1

z_i^* = -\alpha_i\,\beta

g(z_1^*, z_2^*, \ldots, z_n^*) = 0

The second equation is just an application of the chain rule of differentiation. The third is

a requirement on the values of the α_i variables, which can be confirmed by examining the

definition of α_i in the first equation. The last equation is a mathematical statement of the

requirement that the design point must be on the failure boundary.

There are two alternative procedures for performing the iterative analysis: the

simultaneous equation procedure and the matrix procedure. The steps in the simultaneous

equation procedure are as follows:

1. Formulate the limit state function and appropriate parameters for all random variables

involved.

2. Express the limit state function in terms of reduced variates z_i.

3. Use the relation z_i^* = −α_iβ to express the limit state function in terms of β and the α_i.

4. Calculate the α_i values; use the same relation here also to express each α_i as a

function of β and all the α_i.

5. Conduct the initial cycle: assume numerical values of β and the α_i, noting that the α_i

values must satisfy Σα_i² = 1.

6. Use the numerical values of β and α_i on the right-hand sides of the equations formed in

Steps 3 and 4 above.

7. Solve the n + 1 simultaneous equations in Step 6 for β and the α_i.

8. Go back to Step 6 and repeat. Iterate until the β and α_i values converge.

The matrix procedure consists of the following steps:

1. Formulate the limit state function and appropriate parameters for all random variables

X_i (i = 1, 2, …, n) involved.

2. Obtain an initial design point {x_i^*} by assuming values for n−1 of the random

variables X_i. (Mean values are often a reasonable initial choice.) Solve the limit state

equation g = 0 for the remaining random variable. This ensures that the design point is on

the failure boundary.


3. Determine the reduced variates {Z_i^*} corresponding to the design point {x_i^*} using

Z_i^* = \frac{x_i^* - \mu_{X_i}}{\sigma_{X_i}}

4. Determine the partial derivatives of the limit state function with respect to the reduced

variates. For convenience, define a column vector {G} as the vector whose elements are

these partial derivatives multiplied by −1:

\{G\} = \begin{Bmatrix} G_1 \\ G_2 \\ \vdots \\ G_n \end{Bmatrix}, \qquad G_i = -\left.\frac{\partial g}{\partial Z_i}\right|_{\text{design point}}    (27)

5. Calculate an estimate of β using the following formula:

\beta = \frac{\{G\}^T\{z^*\}}{\sqrt{\{G\}^T\{G\}}}, \qquad \{z^*\} = \begin{Bmatrix} z_1^* \\ z_2^* \\ \vdots \\ z_n^* \end{Bmatrix}    (28)

The superscript T denotes transpose. If the limit state equation is linear, Equation (28)

reduces to Equation (26).

6. Calculate a column vector containing the sensitivity factors using

\{\alpha\} = \frac{\{G\}}{\sqrt{\{G\}^T\{G\}}}    (29)

7. Determine a new design point in reduced variates for n−1 of the variables using

Z_i^* = \alpha_i\,\beta


8. Determine the corresponding design point values in original coordinates for the n-1

values in Step 7 using

x_i^* = \mu_{X_i} + Z_i^*\,\sigma_{X_i}

9. Determine the value of the remaining random variable (i.e., the one not found in Steps

7 and 8) by solving the limit state function g = 0.

10. Repeat Steps 3 to 9 until β and the design point {x_i^*} converge.
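A compact implementation of this matrix procedure is sketched below (not part of the original notes). The limit state g = X1·X2 − 1500 and its statistics are illustrative assumptions; the step numbers in the comments refer to the list above.

```python
# Sketch: the matrix procedure for the nonlinear limit state
# g = X1*X2 - 1500, with assumed statistics.
import numpy as np

mu = np.array([38.0, 54.0])
sigma = np.array([3.8, 2.7])

def dg_dx(x):                      # partial derivatives of g w.r.t. X
    return np.array([x[1], x[0]])

# Step 2: initial design point; x1 at its mean, solve g = 0 for x2
x = np.array([mu[0], 1500.0 / mu[0]])

for _ in range(20):
    z = (x - mu) / sigma                     # Step 3: reduced variates
    G = -dg_dx(x) * sigma                    # Step 4: Eq. (27), -dg/dZ_i
    beta = G @ z / np.sqrt(G @ G)            # Step 5: Eq. (28)
    alpha = G / np.sqrt(G @ G)               # Step 6: Eq. (29)
    z_new = alpha * beta                     # Step 7: n-1 variables
    x1 = mu[0] + z_new[0] * sigma[0]         # Step 8: back to original space
    x2 = 1500.0 / x1                         # Step 9: solve g = 0 for X2
    x = np.array([x1, x2])                   # Step 10: iterate to convergence

print(beta, x)                               # converged beta and design point
```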


Problems

1. Most phytoplankton in lakes are too small to be individually seen with the unaided eye.

However, when present in high enough numbers, they may appear as a green

discoloration of the water due to the presence of chlorophyll within their cells. Climate

and water quality are among the factors influencing the quantity of phytoplankton in

shallow lakes. Assume that the rate of increase of phytoplankton can be expressed as a

linear function g(X1, X2, X3) of three variables, namely X1, the temperature of the water,

X2, the global radiation, and X3, the concentration of nutrients. X1, X2, X3 can be modeled as

normal random variables. Positive growth rates must be avoided.

It is observed that temperature and radiation have no effect on the concentration of

nutrients, so that ρ13 = ρ23 = 0, but temperature and radiation are highly correlated with

each other, with ρ12 = 0.8. The equilibrium function is given by

g(X1, X2, X3) = a0 + a1·X1 + a2·X2 + a3·X3,

where a0 = -1.5 mg/cubic m, a1 = 0.08 mg/ (cubic m. degree Celsius), a2 = 0.01 mg/mW

and a3 = 0.05

Solution. Equilibrium (limit state): g(X1, X2, X3) = 0.

Random variable X     Mean, µ   Coefficient of variation, V   Standard deviation, σ
X1, degree Celsius       16            0.5                          8
X2, W/square m          150            0.3                         45
X3, mg/cubic m          100            0.7                         70

Other variables are included in a0 because of the difficulty of computing them separately.

The reliability index, including the correlation term between X1 and X2, is

\beta = \frac{a_0 + \sum_i a_i\,\mu_i}{\sqrt{\sum_i \left(a_i\,\sigma_i\right)^2 + 2\rho_{12}\,a_1 a_2\,\sigma_1\sigma_2}} = \frac{6.28}{\sqrt{13.32}} = 1.72

\gamma = \Phi(1.72) = 0.957

There is thus a 96 percent chance that the equilibrium situation prevails; hence the risk

that the algal biomass will increase is only about 4 percent.
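The result can be verified with a few lines of code (not part of the original notes), using the covariance form of Equation (11) of Module 5 to include the correlation ρ12 = 0.8:

```python
# Sketch: checking Problem 1 with var(Z) = sum_i sum_j a_i a_j cov(X_i, X_j).
import numpy as np
from scipy.stats import norm

a0, a = -1.5, np.array([0.08, 0.01, 0.05])
mu = np.array([16.0, 150.0, 100.0])
sigma = np.array([8.0, 45.0, 70.0])

rho = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
cov = rho * np.outer(sigma, sigma)

mu_Z = a0 + a @ mu                 # 6.28
sigma_Z = np.sqrt(a @ cov @ a)     # sqrt(13.32)
beta = mu_Z / sigma_Z              # 1.72
print(beta, norm.cdf(beta))        # 1.72, ~0.957
```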


2. The economic performance of an irrigation barrage located along a river could be

improved by installing a hydropower station to meet the local energy demand. An engineer

estimates the power demand X3 to be 600 kW on average, with a coefficient of variation of

0.10. If a standard turbo-axial turbine is installed, the power output can be estimated as

7.5X1X2, where the discharge X1 is measured in cubic m/s and the hydraulic head X2 in m;

7.5 is a coefficient accounting for gravity, the density of water, and the overall efficiency of

the installed equipment. Accordingly, the power is given in units of kW. Although an

average discharge of 22 cubic m/s and an average head of 5.2 m are available, discharge and

head availability depend on the natural flow variability; they are also subject to the

operation of the barrage, which is run with priority for the irrigation demand. Discharge and

head can be assumed to be independent normal variables, X1 and X2, with coefficients of

variation 0.2 and 0.15 respectively. Assuming that the demand X3 is normal and

independent of discharge and head, evaluate the reliability of the plant.

Performance function: g(X1, X2, X3) = 7.5 X1 X2 − X3

Random variable              Unit        Mean   Coefficient of variation   Standard deviation
Normal discharge X1          cubic m/s     22          0.20                      4.4
Normal hydraulic head X2     m            5.2          0.15                      0.78
Normal power demand X3       kW           600          0.10                       60


Step 1. Partial differentiation of the performance function with respect to each random

variable, evaluated at the failure point and scaled by the standard deviations:

\left(\frac{\partial g}{\partial X_1}\right)_f = 7.5\,x_{2f}\,\sigma_1, \qquad \left(\frac{\partial g}{\partial X_2}\right)_f = 7.5\,x_{1f}\,\sigma_2, \qquad \left(\frac{\partial g}{\partial X_3}\right)_f = -\sigma_3

Step 2. Computation of the direction cosines:

\alpha_{if} = \frac{\left(\partial g/\partial X_i\right)_f}{\sqrt{\sum_i \left(\partial g/\partial X_i\right)_f^2}}

Step 3. Calculation of the new checking point in reduced coordinates, x_{if}' = −α_{if} β.

Step 4. Using the estimated values of the mean and standard deviation, calculation of the

new failure point, x_{if,new} = µ_i − α_{if} σ_i β.

Step 5. Repetition of the iteration until the reliability index converges to a single value.

Evaluation of reliability: the limiting state of interest is g(X1, X2, X3) = 7.5 X1 X2 − X3 = 0.

The iteration process is illustrated below.

                          Iteration 1   Iteration 2   Iteration 3   Iteration 4
Initial x1f                  22.0          17.8          17.7          17.7
Initial x2f                   5.2           4.6           4.7           4.7
Initial x3f                   600           620           623           623
(∂g/∂X1)f                   171.6         153.2         154.6         154.8
(∂g/∂X2)f                   128.7         104.2         103.7         103.6
(∂g/∂X3)f                   −60.0         −60.0         −60.0         −60.0
Σ(∂g/∂Xi)f²                49,610        37,922        38,254        38,276
α1f                         0.770         0.787         0.790         0.791
α2f                         0.578         0.535         0.530         0.529
α3f                        −0.269        −0.308        −0.307        −0.307
New x1f                      17.8          17.7          17.7          17.7
New x2f                       4.6           4.7           4.7           4.7
New x3f                       620         622.8         622.7         622.7
g = 7.5 x1f x2f − x3f    −4.5×10⁻⁵      7.5×10⁻⁶      1.7×10⁻⁵      1.8×10⁻⁵
β                            1.24          1.23          1.23          1.23

It can be noted that the last two iterations give identical values. This corresponds to a

reliability of 89% and a risk of failure of 11%.
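The iteration in the table can be reproduced with the following sketch (not part of the original notes); at each cycle the limit state g(x(β)) = 0 is a quadratic in β, whose smaller root is retained. It converges to β ≈ 1.23, as in the table above.

```python
# Sketch: reproducing the iteration for g = 7.5*X1*X2 - X3.
import numpy as np

mu = np.array([22.0, 5.2, 600.0])
sigma = np.array([4.4, 0.78, 60.0])

x = mu.copy()                          # start at the mean values
for it in range(10):
    # scaled partial derivatives at the current failure point
    G = np.array([7.5 * x[1] * sigma[0],
                  7.5 * x[0] * sigma[1],
                  -sigma[2]])
    alpha = G / np.sqrt(np.sum(G ** 2))            # direction cosines
    # with x = mu - alpha*sigma*beta, g(x(beta)) = 0 is quadratic in beta:
    c1, c2, c3 = alpha * sigma
    A = 7.5 * c1 * c2
    B = -7.5 * (mu[0] * c2 + mu[1] * c1) + c3
    C = 7.5 * mu[0] * mu[1] - mu[2]
    beta = min(np.roots([A, B, C]).real)           # smaller (physical) root
    x = mu - alpha * sigma * beta                  # new failure point
    print(it + 1, round(beta, 3), np.round(x, 1))
```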

3. Consider a harbor breakwater constructed with massive concrete tanks filled with sand.

It is necessary to evaluate the risk that the breakwater will slide under the pressure of a

large wave during a major storm.

The following information/data is necessary for the analysis.


The resultant horizontal force, Rh, depends on the balance between the static and dynamic

pressure components, and it can be taken as a quadratic function of Hb under a simplified

hypothesis on the depth of the breakwater.

The random deepwater value is X4 = Hs, which is found from frequency analysis of

extreme storms in the area.

The resultant vertical force is Rv = X2 − FV, where X2 is the weight of the tank reduced

for buoyancy and FV is the vertical component of the dynamic uplift pressure due to the

breaking wave; FV is proportional to the height of the design wave, Hb, when the slope of

the sea bottom is known.

The coefficient of friction, cf, can be interpreted as a random variable, X1, which

represents the inherent uncertainty associated with its field evaluation. If Rh/Rv < cf,

stability against sliding will exist.

An additional variate, X3, is introduced to represent the uncertainties caused by the

simplifications adopted to model the dynamic forces FV and Rh. Simplification of the

shoaling effects indicates that the height Hb of the design wave is proportional to the

random deepwater value X4. All random variables are assumed to be independent.

The constants a1, a2, a3 depend on the geometry of the system. Accounting for the

sea-bottom profile and the geometry, one estimates the constants a1 = 7, a2 = 17 m/kN,

a3 = 145.

The limiting state equation is

g(X_1, X_2, X_3, X_4) = X_1 X_2 - 70\,X_1 X_3 X_4 - 17\,X_3 X_4^2 - 145\,X_3 X_4 = 0    (1)


Random variable   Mean         Coefficient of variation   Standard deviation
X1                0.64               0.15                     0.096
X2                3400 kN/m          0.05                     170
X3                1                  0.20                     0.2
X4                5.16               0.18                     0.93

The partial derivatives of the performance function with respect to each random variable,

evaluated at the failure point, are:

(∂g/∂X₁)_f = (x₂ − 70 x₃ x₄) σ₁

(∂g/∂X₂)_f = x₁ σ₂

(∂g/∂X₃)_f = −(70 x₁ x₄ + 17 x₄² + 145 x₄) σ₃

(∂g/∂X₄)_f = −(70 x₁ x₃ + 34 x₃ x₄ + 145 x₃) σ₄

For first iteration, we should take expectations as the initial values.

(∂g/∂X₁)_f = (3400 − 70×1×5.09)(0.096) = 292.19

(∂g/∂X₂)_f = 0.64×170 = 108.80

(∂g/∂X₃)_f = −(70×0.64×5.09 + 17×5.09² + 145×5.09)(0.2) = −283.97

(∂g/∂X₄)_f = −(70×0.64×1.0 + 34×1.0×5.09 + 145×1.0)(0.889) = −322.67

(Here 5.09 and 0.889 are the equivalent normal mean and standard deviation of the

non-normal variate X4 at the initial point; they are updated at each iteration in the table below.)


Direction cosines αi:

\alpha_i = \frac{(\partial g/\partial X'_i)^*}{\sqrt{\sum_j \left[(\partial g/\partial X'_j)^*\right]^2}}

α1 = 292.19 / √(2.8 × 10^5) = 0.550

α2 = 108.80 / √(2.8 × 10^5) = 0.205

α3 = -283.97 / √(2.8 × 10^5) = -0.535

α4 = -322.67 / √(2.8 × 10^5) = -0.608

The new failure point is given by:

x1(new) = µ1 - α1 σ1 β = 0.64 - 0.053 β    (2)
x2(new) = µ2 - α2 σ2 β = 3400 - 34.85 β    (3)
x3(new) = µ3 - α3 σ3 β = 1 + 0.107 β       (4)
x4(new) = µ4 - α4 σ4 β = 5.09 + 0.541 β    (5)

By substituting (2), (3), (4) and (5) into the limit state equation (1), we get the solution for the reliability index, β = 1.379.

Iteration process:

Quantity                   Iter. I    Iter. II   Iter. III  Iter. IV   Iter. V
Initial x1*                0.64       0.576      0.603      0.594      0.597
Initial x2*                3400       3352       3378       3370       3373
Initial x3*                1.00       1.147      1.088      1.105      1.099
Initial x4*                5.16       5.825      5.637      5.704      5.681
F(x4*)                     0.570      0.799      0.784      0.767      0.760
f(x4*)                     4.4e-1     2.5e-1     3.0e-1     2.8e-1     2.9e-1
Φ^-1[F(x4*)]               0.177      0.838      0.668      0.729      0.708
φ{Φ^-1[F(x4*)]}            0.393      0.281      0.319      0.306      0.311
Equiv. normal mean of X4   5.090      5.590      5.424      5.481      5.461
Equiv. normal std of X4    0.899      1.136      1.065      1.090      1.081
(∂g/∂X'1)*                 292.19     278.69     284.59     282.86     283.47
(∂g/∂X'2)*                 108.80     96.42      102.58     100.94     101.53
(∂g/∂X'3)*                 -283.97    -312.40    -311.15    -314.95    -313.6
(∂g/∂X'4)*                 -322.67    -488.34    -430.68    -449.30    -442.7
Σ (∂g/∂X'i)*^2             2.8e5      2.8e5      2.8e5      2.8e5      2.8e5
α1                         0.550      0.525      0.536      0.533      0.534
α2                         0.205      0.182      0.193      0.190      0.191
α3                         -0.535     -0.605     -0.586     -0.593     -0.590
α4                         -0.608     -0.920     -0.811     -0.846     -0.831
New x1*                    0.567      0.603      0.594      0.597      0.591
New x2*                    3352       3378       3370       3373       3372
New x3*                    1.147      1.088      1.105      1.099      1.101
New x4*                    5.836      6.348      6.201      6.525      6.234
g                          4e-5       6.3e-5     -3e-5      -2.1e-5    -2.5e-5
β                          1.379      0.726      0.899      0.837      0.83

Reliability: Φ(β) = 0.805. Risk: 1 - Φ(β) = 0.195.
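The rows for X4 implement the equivalent-normal (Rackwitz-Fiessler) transformation of a non-normal variable: σ' = φ(Φ^-1(F(x*)))/f(x*) and µ' = x* - Φ^-1(F(x*)) σ'. The tabulated values F(5.16) = 0.570 and f(5.16) = 0.44 are consistent with a Type I (Gumbel) extreme-value distribution for X4; the following is a minimal MATLAB sketch under that assumption (the source does not state the distribution explicitly):

xs = 5.16;                               % current design-point value of X4
a = 1.2825/0.93;                         % assumed Gumbel scale from sigma = 0.93
u = 5.16 - 0.5772/a;                     % assumed Gumbel location from mean = 5.16
F = exp(-exp(-a*(xs-u)));                % CDF at x*: 0.570, as tabulated
f = a*exp(-a*(xs-u) - exp(-a*(xs-u)));   % PDF at x*: 0.44, as tabulated
z = norminv(F);                          % invPhi[F(x*)] = 0.177
sigN = normpdf(z)/f;                     % equivalent normal std, about 0.89
muN = xs - z*sigN;                       % equivalent normal mean, about 5.0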


Learning objectives

1. Understanding the importance of simulation methods.

2. Understanding the method of generation of random numbers.

3. Understanding various variance reduction methods.


MODULE – 6

Motivation and Highlights

Motivation:

Importance of understanding simulation methods

Highlights

Knowledge of terms such as simulation methods, random numbers and variance reduction methods.


6. Simulation Methods

It is well established that simulation techniques have proven their value especially for

problems where the representation of the state function is associated with difficulties.

Such cases are e.g. when the limit state function is not differentiable or when several

design points contribute to the failure probability.

The basis for simulation techniques is well illustrated by rewriting the probability integral

P_f = \int_{g(x) \le 0} f_X(x) \, dx

by means of an indicator function:

P_f = \int_{g(x) \le 0} f_X(x) \, dx = \int I[g(x) \le 0] \, f_X(x) \, dx

where the integration domain is changed from the part of the sample space of the vector X = (X1, X2, ..., Xn)^T for which g(x) ≤ 0 to the entire sample space of X, and where I[g(x) ≤ 0] is an indicator function equal to 1 if g(x) ≤ 0 and equal to zero otherwise. The above equation is in this way seen to yield the expected value of the indicator function I[g(x) ≤ 0]. Therefore, if N realizations of the vector X, i.e. \hat{x}_j, j = 1, 2, ..., N, are sampled, it follows from sample statistics that

\hat{P}_f = \frac{1}{N} \sum_{j=1}^{N} I[g(\hat{x}_j) \le 0]

is an unbiased estimator of the failure probability P_f.

6.1.1. Crude Monte Carlo Simulation

The principle of the crude Monte Carlo simulation technique rests directly on the application of the above equation. A large number of realisations of the basic random variables X, i.e. \hat{x}_j, j = 1, 2, ..., N, are generated (or simulated), and for each of the outcomes \hat{x}_j it is checked whether or not the limit state function evaluated at \hat{x}_j is positive. All the simulations for which this is not the case are counted (n_f), and after N simulations the failure probability p_f may be estimated through

\hat{p}_f = \frac{n_f}{N}

which may be considered a sample expected value of the probability of failure. In fact, for N → ∞ the estimate of the failure probability becomes exact. However, simulations are often costly in computation time, and the uncertainty of the estimate is thus of interest. It is easily realized that the coefficient of variation of the estimate is proportional to 1/\sqrt{n_f}, meaning that if Monte Carlo simulation is pursued to estimate a probability in the order of 10^-6, it must be expected that approximately 10^8 simulations are necessary to achieve an estimate with a coefficient of variation in the order of 10%. A large number of simulations are thus required using crude Monte Carlo simulation, and all refinements of this crude technique have the purpose of reducing the variance of the estimate. Such methods are for this reason often referred to as variance reduction methods.
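A minimal MATLAB sketch of this counting estimator, using the dam problem g = 7.5 X1 X2 - X3 treated later in this module (the parameter values are assumed from that example):

N = 1e6;                             % number of simulations
x1 = 22 + 4.4*randn(N,1);            % discharge
x2 = 5.2 + 0.78*randn(N,1);          % hydraulic head
x3 = 600 + 60*randn(N,1);            % power demand
nf = sum(7.5*x1.*x2 - x3 <= 0);      % count of outcomes with g <= 0
pf = nf/N                            % estimate of Pf
cov_pf = sqrt((1-pf)/(N*pf))         % coefficient of variation, about 1/sqrt(nf)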

Simulation of the N outcomes of the joint density function in the above equation is in principle simple and may be seen as consisting of two steps. Here we will illustrate the steps assuming that the n components of the random vector X are independent.

In the first step a "pseudo random" number between 0 and 1 is generated for each of the components of \hat{x}_j, i.e. \hat{z}_{ji}, i = 1, ..., n. The generation of such numbers may be facilitated by built-in functions of basically all programming languages and spreadsheet software.

In the second step the outcomes of the "pseudo random" numbers \hat{z}_{ji} are transformed to outcomes of \hat{x}_j by

\hat{x}_{ji} = F_{X_i}^{-1}(\hat{z}_{ji})

where F_{X_i} is the probability distribution function for the random variable X_i; the principle is also illustrated in Figure. This process is continued until all components of the vector \hat{x}_j have been generated.
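As a minimal sketch of the two steps, assume for illustration that X_i is exponentially distributed with rate λ, for which the inverse distribution function is F^-1(z) = -ln(1 - z)/λ:

lambda = 0.5;             % assumed rate parameter, illustrative only
z = rand(1000,1);         % step 1: pseudo-random uniform(0,1) numbers
x = -log(1-z)/lambda;     % step 2: transform through the inverse CDF
hist(x,50);               % outcomes follow the exponential distribution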

6.1.2. Importance Sampling Simulation Method

As already mentioned, the problem in using the above equation is that the sampling density f(x) typically is located in a region far away from the region where the indicator function I[g(x) ≤ 0] attains its contributions. The success rate in the performed simulations is thus low. In practical reliability assessment problems, where typical failure probabilities are in the order of 10^-3 to 10^-6, this in turn leads to the effect that the variance of the estimate of the failure probability will be rather large unless a substantial number of simulations is performed.

To overcome this problem, different variance reduction techniques have been proposed, aiming at reducing the variance of the probability estimate for the same number of simulations.

The importance sampling method takes its basis in the utilization of prior information about the domain of contribution to the probability integral, i.e. the region that contributes to the indicator function. Let us first assume that we know which point in the sample space, x*, contributes the most to the failure probability. Then by centering the simulations on this point, the important point, we would obtain a higher success rate in the simulations, and the variance of the estimated failure probability would be reduced. Sampling centered on an important point may be accomplished by rewriting the above equation in the following way:

P_f = \int I[g(x) \le 0] \, f_X(x) \, dx = \int I[g(x) \le 0] \, \frac{f_X(x)}{f_S(x)} \, f_S(x) \, dx

where f_S(x) is denoted the importance sampling density function. It is seen that the integral in the above equation represents the expected value of the term I[g(s) ≤ 0] f_X(s)/f_S(s), where the components of S are distributed according to f_S(x). The question in regard to the choice of an appropriate importance sampling function f_S(x), however, remains open.

One approach to the selection of an importance sampling density function f_S(x) is to select an n-dimensional joint normal probability density function with uncorrelated components, mean values equal to the design point as obtained from a FORM analysis, i.e. µ_S = x*, and standard deviations equal to the standard deviations of the components of X, i.e. σ_S = σ_X. In this case the above equation may be written as

P_f = \int I[g(x) \le 0] \, f_X(x) \, dx = \int I[g(x) \le 0] \, \frac{f_X(x)}{\varphi_S(x)} \, \varphi_S(x) \, dx

leading, in equivalence with the crude Monte Carlo estimator, to

\hat{P}_f = \frac{1}{N} \sum_{j=1}^{N} I[g(\hat{s}_j) \le 0] \, \frac{f_X(\hat{s}_j)}{\varphi_S(\hat{s}_j)}

which may be assessed by sampling over realizations of \hat{s}_j as described in the above.

Application of these equations greatly enhances the efficiency of the simulations. If the limit state function is not too non-linear around the design point x*, the success rate of the simulations will be close to 50%. If the design point is known in advance, in a reliability problem where the probability of failure is in the order of 10^-6 the number of simulations required to achieve a coefficient of variation in the order of 10% is thus around 200. This number stands in strong contrast to the 10^8 simulations required using the crude Monte Carlo method discussed before, but of course it also requires knowledge about the design point.
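A minimal sketch of this estimator for the dam problem, centering the sampling density on the design point found in the Hasofer-Lind iteration, x* ≈ (17.7, 4.7, 622.7); all values are assumed from that example:

mu = [22 5.2 600]; sig = [4.4 0.78 60];
xstar = [17.7 4.7 622.7];                 % design point from the FORM analysis
N = 1e4;
s = repmat(xstar,N,1) + randn(N,3).*repmat(sig,N,1);   % sample around x*
g = 7.5*s(:,1).*s(:,2) - s(:,3);          % limit state at each sample
w = prod(normpdf(s,repmat(mu,N,1),repmat(sig,N,1)) ./ ...
    normpdf(s,repmat(xstar,N,1),repmat(sig,N,1)), 2);  % weights fX(s)/phiS(s)
pf = mean((g <= 0).*w)                    % importance sampling estimate of Pf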


6.2. GENERATION OF RANDOM NUMBERS

6.2.1. Random Outcomes from Standard Uniform Variates

The probability integral transform indicates that generation of uniform (0, 1) random numbers is the basic generation process used to derive the outcomes from a variate with known probability distribution. Current methods of generating standard uniform variates are deterministic, in the sense that systematic procedures are used after one or more initial values are randomly selected. For example, system-supplied random number generators in most digital computers are almost always linear congruential generators. This algorithm is based on the recursive calculation of a sequence of integers k1, k2, k3, ..., each between 0 and m - 1 (a large number), from the linear transformation

k_{i+1} = (a k_i + c) \;(\text{modulo } m)

Here a and c are positive integers called the multiplier and the increment, respectively, and the notation (modulo m) signifies that k_{i+1} is the remainder obtained after dividing (a k_i + c) by m, where m denotes a (large) positive integer. Hence, denoting by η_i = Int[(a k_i + c)/m] the integer part of the quotient, the corresponding residual is defined as

k_{i+1} = a k_i + c - m \eta_i

Hence,

u_{i+1} = \frac{k_{i+1}}{m} = \frac{a k_i + c}{m} - \text{Int}\left[\frac{a k_i + c}{m}\right]

where the u_i are uniform (0, 1). Because these numbers are repeated with a given period, they are usually called pseudo-random numbers. The quality of the results depends on the magnitudes of the constants a, c and m and their relationships, but the particular computer used will impose constraints. Because the period of the cycle is not greater than m and it increases with m, the main criterion is that the period after which the original numbers are unavoidably repeated should be as long as possible. In practice, m is set according to the word length, that is, the number of bits retained as a unit in the computer. Moreover, the constants c and m should not have any common factors, and the value of a should be sufficiently high. Because all possible integers between 0 and m - 1 occur after some interval of time, regardless of the generator used, any initial choice of the seed k0 is as good as any other.


Linear congruential algorithm

Suppose we assume low values for the constants in the equation: a = 5, c = 1 and m = 8. Let k0 = 1 be the seed for generating a sequence of random integers k_i, i = 1, 2, 3, .... For i = 1 one has, from the equation,

k_1 = 5 \times 1 + 1 - 8 \,\text{Int}[(5 \times 1 + 1)/8] = 6 - 8 \times 0 = 6, \quad u_1 = 6/8 = 0.75

The second iteration yields

k_2 = 5 \times 6 + 1 - 8 \,\text{Int}[(5 \times 6 + 1)/8] = 31 - 8 \times 3 = 7, \quad u_2 = 7/8 = 0.875

The subsequent iterations yield the sequence 0.5, 0.625, 0.25, 0.375, 0, 0.125, 0.75, 0.875, 0.5, 0.625, 0.25, 0.375, 0, 0.125, ..., which is seen to be cyclic with a period of 8, because the same sequence of 8 values is repeated indefinitely. This is clearly shown by plotting u_{i+1} against u_i in the figure.

[Figure: trajectory of 100 sequentially generated standard uniform random numbers with a = 5, c = 1 and m = 8 (solid line) and with a = 2^7 + 1, c = 1 and m = 2^35 (dotted line).]

Also shown in the figure are results from the generator a = 2^7 + 1, c = 1 and m = 2^35, which yields a much larger period of cyclicity. This choice gives satisfactory results for binary computers; a = 101, c = 1 and m = 10^b are suitable for a decimal computer with word length b.
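The recursion is a one-liner in MATLAB; a minimal sketch with the small constants of the example (a = 5, c = 1, m = 8, seed k0 = 1):

a = 5; c = 1; m = 8;
k = 1;                        % seed k0
u = zeros(1,16);
for i = 1:16
    k = mod(a*k + c, m);      % k(i+1) = (a*k(i) + c) modulo m
    u(i) = k/m;               % standard uniform value in [0,1)
end
disp(u)                       % 0.75 0.875 0.5 0.625 ... repeats with period 8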


The advantage of the linear congruential method when applied through a digital computer is the speed of implementation. Because only a few operations are required each time, its use has become widespread. A disadvantage is that once the seed is specified, the entire series is predictable.

The pseudo-random numbers generated by these procedures can be tested for uniform distribution and for statistical independence. Goodness-of-fit tests, such as the chi-squared and the Kolmogorov-Smirnov tests, can be used to verify that these numbers are uniformly distributed. Both parametric and nonparametric methods, such as the runs test, can be used to check for randomness between successive numbers in a sequence, in spite of the fact that these procedures are essentially deterministic. Pseudo-random numbers generated with large m and accurate choices of a and c generally appear to be uniformly distributed and stochastically independent, so that they can be properly used to perform Monte Carlo simulations. Algorithms to generate pseudo-random numbers which closely approximate mutually independent standard uniform variates are a standard feature in statistical software. Standard uniform random numbers are available as a system-supplied function in digital computers, as well as in most customary computational and data management facilities such as spreadsheets and databases. Nowadays, software tools such as MATLAB, MS Excel and Mathcad have built-in options to generate random numbers and to evaluate the probability of failure. The problems solved earlier using the Hasofer-Lind approach and FORM methods can be reworked using simulation procedures.


1. The problem solved earlier by reliability methods can be solved using simulations as follows. The following are the variables with the values of their parameters, along with the performance function g(X1, X2, X3) = 7.5 X1 X2 - X3.

Random variable             Unit         Mean    Coefficient of variation    Standard deviation
Normal Discharge X1         cubic m/s    22      0.22                        4.4
Normal Hydraulic Head X2    m            5.2     0.15                        0.78
Normal Power Demand X3      kW           600     0.10                        60

A MATLAB program with the following commands is used to evaluate the probability of failure. The number of simulations is taken as 100000 and all three variables are normally distributed (the original listing drew X2 and X3 with rand, i.e. from uniform distributions; randn is used here to match the stated problem):

n = 100000;
x1 = (randn(n,1)*4.4) + 22;
x2 = (randn(n,1)*0.78) + 5.2;
x3 = (randn(n,1)*60) + 600;
y = 7.5.*x1.*x2 - x3;
hist(y,100);
y_mean = mean(y)
y_std = std(y)
beta1 = y_mean/y_std
pf = normcdf(-beta1,0,1)

Answer: The probability density distribution of G is shown in the Figure. The source reports G_mean = 292.8984, G_std = 188.5695, and a reliability index and probability of failure of 1.5533 and 0.0602, respectively; these figures correspond to the uniform sampling of X2 and X3 in the original listing. With all three variables normal, as stated, the mean of the performance function is 7.5 × 22 × 5.2 - 600 = 258, and the simulated reliability index falls close to the value β = 1.23 obtained in the previous case.


[Figure: histogram of the simulated performance function G.]

Consider the harbor breakwater problem in the previous case. It is necessary to evaluate the risk that the breakwater will slide under the pressure of a large wave during a major storm. The resultant horizontal force, Rh, depends on the balance between the static and dynamic pressure components, and it can be taken as a quadratic function of Hb under a simplified hypothesis on the depth of the breakwater. The random deep-water value is X4 = Hs, which is found from frequency analysis of extreme storms in the area. The resultant vertical force is Rv = X2 - FV, where X2 is the weight of the tank reduced for buoyancy and FV is the vertical component of the dynamic uplift pressure due to the breaking wave. It is proportional to the height of the design wave, Hb, when the slope of the sea bottom is known. The coefficient of friction, cf, can be interpreted as a random variable, X1, which represents the inherent uncertainty associated with its field evaluation.


If Rh/Rv < cf, stability against sliding will exist.

An additional variate X3 is introduced to represent the uncertainties caused by the simplifications adopted to model the dynamic forces FV and Rh. Simplification of the shoaling effects indicates that the height Hb of the design wave is proportional to the random deep-water value X4. All random variables are assumed to be independent. The constants a1, a2, a3 depend on the geometry of the system. Accounting for the sea-bottom profile and the geometry, one estimates the constants as a1 = 7, a2 = 17 m/kN, a3 = 145.

Limit state equation:

g(X1, X2, X3, X4) = X1 X2 - 70 X1 X3 X4 - 17 X3 X4^2 - 145 X3 X4 = 0    (1)

Random variable    Mean         Coefficient of variation    Standard deviation
X1                 0.64         0.15                        0.096
X2                 3400 kN/m    0.05                        170
X3                 1            0.20                        0.2
X4                 5.16         0.18                        0.93

Solution: calculation of the reliability index using Monte Carlo simulation, with all random variables taken as normally distributed:

Mean of y = 340.4513
Standard deviation of y = 327.4135
Reliability index = 1.0398
Risk = 0.1492
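A minimal sketch of the simulation, treating all four variables as normal with the moments tabulated above; given the simplifications involved, the estimates produced need not reproduce the exact figures quoted, which depend on the sampling details of the original program:

n = 1e5;
x1 = 0.64 + 0.096*randn(n,1);     % coefficient of friction
x2 = 3400 + 170*randn(n,1);       % tank weight reduced for buoyancy, kN/m
x3 = 1 + 0.2*randn(n,1);          % model uncertainty
x4 = 5.16 + 0.93*randn(n,1);      % deep-water wave height
y = x1.*x2 - 70*x1.*x3.*x4 - 17*x3.*x4.^2 - 145*x3.*x4;
beta = mean(y)/std(y)             % reliability index estimate
risk = normcdf(-beta)             % probability of failure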


Learning objectives

1. Understanding the importance of logical trees, fault tree analysis and event tree analysis.

2. Understanding the symbols used in fault tree analysis and fault tree representation.

3. Understanding the symbols used in event tree analysis and event tree representation.

4. Understanding the representation of decision trees.


MODULE – 7

Motivation and Highlights

Motivation:

Importance of understanding the use of fault tree analysis.

Highlights

Knowledge of terms such as logical tree, fault tree analysis,

event tree, decision tree and cause consequence charts.



Module 7

Logical Trees

Different sources of risk for an engineering system and/or activity can be analyzed with respect to their chronological and causal components using logical trees. They are useful for the analysis of the overall risk as well as for the assessment of the risk contribution from the individual components.

Fault tree and event tree diagrams are the most well known and most widely applied types of logical trees in both qualitative and quantitative risk analysis. Even though more modern risk analysis techniques, such as e.g. Bayesian probabilistic nets, have been developed over the last years, fault trees and event trees are still the main methods recommended (e.g. for US nuclear safety studies).

Fault trees and event trees are in many ways similar, and the choice of using one or the other, or a combination of both, in reality depends more on the traditions/preferences within a given industry than on the specific characteristics of the logical tree.

A significant difference between the two types of trees is, though, that fault trees take basis in deductive (looking backwards) logic, while event trees are inductive (looking forward). In practical applications a combination of fault trees and event trees is typically used. In this case the fault tree part of the analysis is concerned with the representation of the sequences of failures which may lead to events with consequences, and the event tree part of the analysis is concerned with the representation of the subsequent evolution of the consequence-inducing events.

The intersection between the fault tree and the event tree is in reality a matter of preference of the engineer performing the study. Small event tree / large fault tree and large event tree / small fault tree techniques may be applied to the same problem to supplement each other and provide additional insight with regard to the reliability of the considered system.

Decision trees are often seen as a special type of event tree, but they may be seen in a much wider perspective: if applied consistently within the framework of decision theory, they provide the theoretical basis for risk analysis. The detailed analysis of the various types of logical trees requires that the performance of the individual components of the trees has already been assessed in terms of failure rates and/or failure probabilities.

7.1. Fault Tree Analysis

As mentioned earlier, a fault tree is based on deductive logic, starting by considering an event of system failure and then aiming to deduce which causal sequences of component failures could lead to the system failure. The system failure event is thus often referred to as the top event.

The logical interrelation of the sequences of component failures is represented through logical connections (logical gates), and the fault tree in effect forms a tree-like structure with the top event at the top and basic events at its extremities. The basic events are those events for which failure rate data or failure probabilities are available and which cannot be dissected further. Sometimes the events are differentiated into initiating (or triggering) events and enabling events, where the initiating events are always the first events in a sequence, and the enabling events are events which may increase the severity of the initiated failure.

The fault tree is a Boolean logical diagram comprised primarily of AND and OR gates. The output event of an AND gate occurs only if all of the input events occur simultaneously, and the output event of an OR gate occurs if any one of the input events occurs. Figure 1 illustrates different commonly used symbols for AND and OR gates.

Figure 1 – Illustration of commonly used symbols for AND and OR gates


Several other types of logical gates exist, such as QUANTIFICATION and COMPARISON; however, these will not be elaborated in the present text.

Top events and basic events also have their specific symbols, as shown in the figure below.

Figure 2 – Symbols commonly used in fault tree representations

In Figure 2 the diamond-shaped symbol represents an undeveloped scenario which has not been developed into a system of sub-events due to lack of information and data.

Figure 3 – Principal shape of a fault tree


It is noted that a fault tree comprising an AND gate represents a parallel system, i.e. all components must fail for the system to fail. Such a system thus represents some degree of redundancy, because the system will still function after one component has failed. Fault trees comprising an OR gate, on the other hand, represent a series system, i.e. a system without any redundancy, in the sense that it fails as soon as any one of its components has failed. Such a system is often denoted a weakest-component system. Systems may alternatively be represented by reliability block diagrams, see Figure 4.

Figure 4 – Reliability block diagrams for OR and AND gates

In accordance with the rules of probability theory, the probability of the output event of an AND gate is evaluated by

P = \prod_{i=1}^{n} p_i

and for an OR gate by

P = 1 - \prod_{i=1}^{n} (1 - p_i)

where n is the number of ingoing events to the gate, the p_i are the probabilities of failure of the ingoing events, and it is assumed that the ingoing events are independent.
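These two rules are directly computable; a minimal MATLAB sketch for a gate with n independent ingoing events (the probabilities here are illustrative):

p = [0.0001 0.01];       % failure probabilities of the ingoing events
P_and = prod(p)          % AND gate: all events must occur
P_or = 1 - prod(1-p)     % OR gate: at least one event occurs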

System failure modes are defined by so-called cut sets, i.e. combinations of basic events which with certainty will lead to the top event. The number of such combinations can be rather large - several hundreds for a logical tree with about 50 basic events. It is important to note that the top event may still occur even though not all basic events in a cut set occur. A minimal cut set is the cut set that represents the smallest combination of basic events leading to the top event, sometimes denoted the critical path. The top event will only occur if all events in the minimal cut set occur. An important aspect of fault tree analysis is the identification of the minimal cut sets, as this greatly facilitates the numerical evaluations involved.

7.2. Event Trees

An event tree is a representation of the logical order of events leading to some (normally adverse) condition of interest for a considered system. It should be noted that several different states for the considered system could be associated with the important consequences.

In contrast to the fault tree, it starts from a basic initiating event and develops from there in time until all possible states with adverse consequences have been reached. The initiating events may typically arise as top events from fault tree analyses. The event tree is constructed from event definitions and logical vertices (outcomes of events), which may have a discrete sample space as well as a continuous sample space. Typical graphical representations of event trees are shown in Figure 5.

Figure 5 – Illustration of the principal appearance of an event tree

Event trees can become rather complex to analyze. This is easily realized by noting that for a system with n two-state components the total number of paths is 2^n; for example, 10 two-state components already give 2^10 = 1024 paths. If each component has m states, the total number of branches is m^n.


7.3. Cause Consequence Charts

Cause consequence charts are in essence yet another representation of combined fault trees and event trees, in the sense that the interrelation between the fault tree and the event tree, namely the top event of the fault tree (or the initiating event of the event tree), is represented by a rectangular gate with the output event being either YES or NO, each of which will lead to different consequences. The benefit of the cause consequence chart is that the fault tree need not be expanded in the representation, which enhances the overview of the risk analysis greatly. An example of a gate in a cause consequence chart is shown in Figure 6.

Figure 6 – Gate in a cause consequence chart

7.4. Decision Trees

As already indicated, decision trees applied within the framework of decision theory form the basic framework for risk analysis. This may be realized by recognizing that risk analysis serves the purpose of decision making: either the risk analysis shows that the risks are acceptable and one does nothing, or it is found that the risks are not acceptable and one has to do something. The decision analysis is the framework for the assessment of the risks as well as for the evaluation of how to reduce the risks most efficiently. An example of a decision tree is shown in Figure 7.

Figure 7 – Principal representation of a decision tree

The decision tree is constructed as a consecutive row of decisions followed by uncertain events, thus reflecting the uncertain outcomes of the possible actions that may follow from the decisions. At the end of the decision tree, consequences (or utilities) are assigned in accordance with the decisions and the outcomes of the uncertain events. Depending on the number of decisions and/or actions involved in the decision analysis, and thus represented in the decision tree, various types of decision analysis are required, ranging from the simplest, so-called prior decision analysis, to the most advanced pre-posterior analysis.

It is important to note that the probabilities for the different events represented in the decision tree may be assessed by fault tree analysis, event tree analysis, reliability analysis or any combination of these; thus the decision tree in effect includes all these aspects of system and component modeling, in addition to providing a framework for the decision making.
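As a minimal sketch of a prior decision analysis, consider two actions with hypothetical costs and failure probabilities (none of these numbers come from the source); the preferred action is the one minimizing the expected total cost:

p_fail = [0.10 0.02];                 % assumed failure probability: do nothing vs. strengthen
cost = [0 500];                       % assumed cost of each action
loss = 10000;                         % assumed consequence of failure
expected_cost = cost + p_fail*loss    % = [1000 700], so strengthening is preferred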


Example 1

A power supply system is composed of an engine, a main fuel supply for the engine and electrical cables distributing the power to the consumers. Furthermore, as a backup fuel supply, a reserve fuel supply with limited capacity is installed. The power supply system fails if the consumer is cut off from the power supply. This in turn will happen if either the power supply cables fail or the engine stops, which in turn is assumed to occur only if the fuel supply to the engine fails.

Solution

A fault tree system model for the power supply is illustrated in the figure below, where the probabilities for the basic events are also indicated.

Figure 1 – Illustration of the fault tree for a power supply system

Using the rules of probability calculus, we obtain that the probability of engine failure P_EF is equal to (AND gate)

P_EF = 0.01 × 0.01 = 0.0001

Along the same lines we obtain that the probability of power supply failure P_F is equal to (OR gate)

P_F = 0.0001 + 0.01 - 0.0001 × 0.01 = 0.0101
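The same calculation expressed in MATLAB, with the basic-event probabilities implied by the numbers above (main and reserve fuel supply 0.01 each, cable failure 0.01):

p_main = 0.01; p_reserve = 0.01; p_cable = 0.01;
p_engine = p_main*p_reserve                 % AND gate: both fuel supplies must fail, = 0.0001
p_system = 1 - (1-p_cable)*(1-p_engine)     % OR gate: cables fail or engine stops, = 0.0101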

Example 2

The event tree in the figure below models the event scenarios in connection with non-destructive testing of a concrete structure. Corrosion of the reinforcement may be present, and the inspection method applied may or may not detect the corrosion, given that corrosion is present and given that corrosion is not present.

Figure 2 – Illustration of the event tree for the modeling of inspection of a reinforced concrete structure

In Figure 2 the event C denotes that corrosion is present, and the event I that the corrosion is found by inspection. The bars over the events denote the complementary events. On the basis of such event trees, e.g. the probability that corrosion is present given that it is found by inspection may be evaluated.
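A minimal sketch of that evaluation via Bayes' rule, with hypothetical probabilities (the source gives no numbers for this example):

P_C = 0.10;             % assumed prior probability of corrosion
P_I_given_C = 0.80;     % assumed detection probability given corrosion
P_I_given_notC = 0.05;  % assumed false-indication probability
P_C_given_I = P_I_given_C*P_C / ...
    (P_I_given_C*P_C + P_I_given_notC*(1-P_C))   % = 0.64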

In many cases the event trees may be reduced significantly after some preliminary evaluations. This is e.g. the case when it can be shown that the branching probabilities become negligible. This is often utilized when event trees are used in connection with inspection and maintenance planning. In such cases the branches corresponding to failure events after repair events may often be omitted, at least for systems with highly reliable components.

In Figure 3 a combined fault tree and event tree is illustrated, showing how fault trees often constitute the modeling of the initiating event for the event tree.

Figure 3 – Illustration of a combined fault tree and event tree


Learning objectives

1. Understanding the importance of system reliability assessment.

2. Understanding the various steps involved in system reliability assessment.

3. Understanding the methods adopted in system reliability assessment.


MODULE – 8

Motivation and Highlights

Motivation:

Importance of understanding the concept of system reliability.

Highlights

Knowledge of terms such as system reliability.

Page 217: CURSO NPTEL - Reliability Engineering

Ang, A.H.-S. and Tang, W.H. (1975). Probability Concepts in Engineering Planning

and Design, Vol. 1, Basic Principles, John Wiley, New York.

Ang, A.H.-S. and Tang, W.H. (1984). Probability concepts in engineering planning

and design. Volume II – decision, risk and reliability, John Wiley & Sons, Inc., New

York.

Baecher, G.B. and Christian, J.T. (2003). Reliability and Statistics in Geotechnical

Engineering, John Wiley and Sons, London and New York.

Becker, D.E. (1996a). Eighteenth Canadian Geotechnical Colloquium: Limit states

Design for foundations, Part I. An overview of the foundation design process,

Canadian Geotechnical Journal, Vol. 33, pp.956-983.

Becker, D.E. (1996b). Eighteenth Canadian Geotechnical Colloquium: Limit States

Design for Foundations. Part II. Development for the National Building Code of

Canada, Canadian Geotechnical Journal, Vol. 33, pp. 984-1007.

Benjamin, J. R. and Cornell, C. A. (1970). Probability, Statistics, and Decision for

Civil Engineers. McGraw-HillBook Co., New York.

Cherubini, C. (2000). Reliability evaluation of shallow foundation bearing capacity on

c�, �� soils, Canadian Geotechnical Journal, No. 37, pp. 264-269.

Christian, J.T., Ladd, C.C., and Baecher, G.B. (1992). Reliability and probability in

stability analysis, In Proceedings of Stability and Performance of Slopes and

Embankments, ASCE, GSP No.31, Vol. 2, pp. 1071-1111.

Christian, J. T., Ladd, C. C., and Baecher, G. B. (1994). Reliability applied to slope

stability analysis. Journal of Geotechnical Engineering, ASCE, Vol. 120, No. 12, pp.

2180-2207.

Christian, J.T. and Urzua, A. (1998). Probabilistic evaluation of earthquake-induced

slope failure, Journal of Geotechnical and Geoenvironmental Engineering, ASCE,

Vol. 124, No.11, pp.1140-1143.

Page 218: CURSO NPTEL - Reliability Engineering

References

Christian, J.T. (2004). Geotechnical Engineering Reliability: How well do we know

what we are doing?, Geotechnical and Geoenvironmental Engineering, ASCE, Vol.

130, No.10, pp. 985-1003.

William G. Cochran: Sampling Techniques, 3rd Edition. John Wiley 1977.

Dasaka S N Murthy (2006) Probabilistic site characterization and reliability analysis

of shallow foundations and slopes (Doctoral thesis, unpublished) Indian Institute of

Science, Bangalore.

Dasaka, Satyanarayana Murthy and Rao, Rajaparthy Seshagiri and Babu, Sivakumar

GL (2005) Reliability Analysis of Allowable Pressure of Strip Footing in Spatially

Varying Cohesionless Soil. In Augusti, G and Schueller, GI and Ciampoli, M, Eds.

Proceedings International Conference on Structural Safety and Reliability, pages 909-

915, Rome, Italy.

De Mello, V.F.B. (1977). Reflections on design decisions of practical significance to

embankment dams, Geotechnique, Vol. 27, No. 3, pp. 279-355.

Duncan, J.M. (2000). Factors of safety and reliability in geotechnical engineering,

Journal of Geotechnical and Geoenvironmental Engineering, ASCE, Vol.126, No.4,

pp. 307-316.

Fenton, G.A. (1999a). Estimation for Stochastic Soil Models, Journal of Geotechnical

and Geoenvironmental Engineering, ASCE, Vol. 125, No.6, pp. 470-485.

Fenton, G.A. (1999b). Random Field Modeling of CPT Data, Journal of Geotechnical

and Geoenvironmental Engineering, ASCE, Vol. 125, No.6, pp. 486-498.

Haldar, A. and Mahadevan, S. (2000). Probability, Reliability and Statistical Methods

in Engineering Design, John Wiley & Sons, 304 p.

Harr, M.E. (1987). Reliability based Design in Civil Engineering, McGraw-Hill

Publishers, New York.

JCSS (2000). Probabilistic model code, part 1: basis of design, 12th draft, Joint

Committee on Structural Safety.

250

Page 219: CURSO NPTEL - Reliability Engineering

References

Kulatilake, P.H.S.W. and Ghosh, A. (1988). An investigation into accuracy of spatial

variation estimation using static cone penetrometer data, In Proceedings of First

International Symposium on Penetration Testing, Orlando, Fla., pp. 815–821.

Kulathilake, P.H.S.W. and Um, J.G. (2003). Spatial variation of cone tip resistance

for the clay site at Texas A&M University, Geotechnical and Geological Engineering,

Vol. 21, pp. 149–165.

Lacasse, S. and Nadim, F. (1996). Uncertainties in characterising soil properties,

Plenary paper for ASCE Conference on Uncertainties, Madison, Wisconsin, USA.

Lacasse, S. and Nadim, F. (1998). Risk and reliability in geotechnical engineering, In

Proceedings 4th International Conference on Case Histories in Geotechnical

Engineering, St. Louis, MO.

Nathabandu T. Kottegoda and Renzo Rosso (1998) Statistics, Probability, and

Reliability for Civil and Environmental Engineers, McGraw-Hill International edition.

Phoon, K. K., Quek, S. T., Chow, Y. K., and Lee, S. L. (1990). Reliability analysis of

pile Settlement. Journal of Geotechnical Engineering, ASCE, Vol. 116, No. 11, pp.

1717–1735.

Phoon, K.K. and Kulhawy, F.H. (1999a). Characterization of geotechnical variability,

Canadian Geotechnical Journal, Vol. 36, pp. 612-624.

Phoon, K.K. and Kulhawy, F.H. (1999b). Evaluation of geotechnical property

variability, Canadian Geotechnical Journal, Vol. 36, pp. 625-639.

Phoon, K.K. (2002). Potential application of reliability-based design to geotechnical

engineering, In Proceedings of 4th Colombian Geotechnical Seminar, Medellin, pp. 1-

22.

Phoon, K.K., Quek, S.T., and An, P. (2003a). Identification of Statistically

Homogeneous Soil Layers using Modified Bartlett Statistics, Journal of Geotechnical

and Geoenvironmental Engineering, ASCE, Vol. 129, No. 7, pp. 649-659.

251

Page 220: CURSO NPTEL - Reliability Engineering

References

USACE (1997). Risk-based analysis in Geotechnical Engineering for Support of

Planning Studies, Engineering and Design. US Army Corps of Engineers, Department

of Army, Washington D. C., 20314-100.

Vanmarcke, E.H. (1977). Probabilistic modeling of soil profiles, Journal of the

Geotechnical Engineering Division, ASCE, Vol.103, No.GT11, pp.1227-1246.

Vanmarcke, E.H. (1983). Random fields: analysis and synthesis. MIT Press,

Cambridge.

Whitman, R.V. (1984). Evaluating calculated risk in Geotechnical Engineering, The

Seventeenth Terzaghi Lecture, Journal of Geotechnical Engineering, ASCE, Vol. 110,

No. 2, pp. 145-188.

Whitman, R. V. (2000). Organizing and evaluating uncertainty in geotechnical

engineering, Journal of Geotechnical and Geoenvironmental Engineering, ASCE,

Vol. 126, No.7, pp. 583-593.

252