cs433: modeling and simulation

54
CS433: Modeling and Simulation Dr. Anis Koubâa Al-Imam Mohammad bin Saud University Al-Imam Mohammad bin Saud University 09 October 2010 Lecture 04: Statistical Models

Upload: abba

Post on 02-Feb-2016

117 views

Category:

Documents


4 download

DESCRIPTION

Al-Imam Mohammad bin Saud University. Lecture 04: Statistical Models. CS433: Modeling and Simulation. Dr. Anis Koubâa. 09 October 2010. Goals for Today. Review of the most common statistical models Understand how to determine the empirical distribution from a statistical sample. Topics. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CS433: Modeling and Simulation

CS433: Modeling and Simulation

Dr. Anis Koubâa

Al-Imam Mohammad bin Saud UniversityAl-Imam Mohammad bin Saud University

09 October 2010

Lecture 04: Statistical Models

Page 2: CS433: Modeling and Simulation

2

Goals for Today

2

Review of the most common statistical models Understand how to determine the empirical

distribution from a statistical sample

Page 3: CS433: Modeling and Simulation

3

Topics

3

Page 4: CS433: Modeling and Simulation

4

Discrete and Continuous Random Variables

Page 5: CS433: Modeling and Simulation

5

Discrete versus Continuous Random Variables

Discrete Random Variable Continuous Random Variable

Finite Sample Space

e.g. {0, 1, 2, 3}

Infinite Sample Space

e.g. [0,1], [2.1, 5.3]

1

1. ( ) 0, for all

2. ( ) 1

i

ii

p x i

p x

i ip x P X x

Cumulative Distribution Function (CDF)

f xProbability Density Function

(PDF)

X

R

X

Rxxf

dxxf

Rxxf

X

in not is if ,0)( 3.

1)( 2.

in allfor , 0)( 1.

p X x

( )i

ix xp X x p x

0

xp X x f t dt

b

ap a X b f x dx

Probability Mass Function (PMF)

Page 6: CS433: Modeling and Simulation

6

Expectation

i

ii xpxxE all

)()(

dxxxfxE )()(

2 2 2x xV X x f x dx x f x dx

2

2 2

1 1 1

n n n

i x i i i i ii i i

V X x p x x p x x p x

Page 7: CS433: Modeling and Simulation

7

Statistical Models : Properties

Page 8: CS433: Modeling and Simulation

8

Discrete Probability Distributions

Bernoulli Trials Binomial Distribution Geometric Distribution Poisson Distribution Poisson Process

Page 9: CS433: Modeling and Simulation

9

Modeling of Random Events with Two-States

Bernoulli Trials Binomial Distribution

Page 10: CS433: Modeling and Simulation

10

Bernoulli Trials Context: Random events with two possible values

Two events: Yes/No, True/False, Success/Failure Two possible values: 1 for success, 0 for failure. Example: Tossing a coin, Packet Transmission Status,

An experiment comprises n trial. Probability Mass Function (PMF): Probability in one trial

Bernoulli Process: n trials

, 1, 1, 2,...,

PMF: one trial ( ) 1 , 0 1 2

0, otherwise

j

j j

p x j n

p p x p q x , j , ,..., n

Expected Value: jE X p 2Variance : 1jV X p p

1 1 1 2, ,..., , ...n np X X X p X p X p X

Page 11: CS433: Modeling and Simulation

11

Binomial Distribution Context: Number of successes in a series of n trials.

Example: the number of 'heads' occurring when a coin is tossed 50 times.

The number of successful transmissions of 100 packets Probability Mass Function (PMF)

Binomial distribution with parameters n and p, X ~ B(n,p)Probability of having k successes in n trial

where k = 0, 1, 2, ......., n k: number of success n = 1, 2, 3, ....... n: number of trials 0 < p < 1 p = success probability

1n kkn

P X k p pk

!

! !

n n

k k n k

with

Expected Value: E X n p 2Variance : 1V X n p p

Page 12: CS433: Modeling and Simulation

12

Binomial Distribution

2

2 2

2 1 2 3 3 1

3! 61 1

2! 3 2 ! 2

p X p E p E p E p p

p p p p

!

! !

n n

k k n k

Probability of k=2 successes in n=3 trials?

E1: 1 1 0

E2: 1 0 1

E3: 0 1 1

P(E1)= p(1 and 1 and 0) = p p (1-p) = p2 (1-p)

P(E2)= p(1 and 0 and 1) = p (1-p) p = p2 (1-p)

P(E3)= p(1 and 1 and 0) = p p (1-p) = p2 (1-p)

Probability of k=2 successes in n=3 trials?

or

or

Page 13: CS433: Modeling and Simulation

13

End of Part 01

Administrative issues• Groups Formation

Page 14: CS433: Modeling and Simulation

14

Modeling of Discrete Random Time

Geometric Distribution

Page 15: CS433: Modeling and Simulation

15

Context: the number of Bernoulli trials until achieving the FIRST SUCCESS. It is used to represent random time until a first transition occurs.

Geometric Distribution

1 , 0,1,2,...,PMF: ( )

0, otherwise

kq p k np X k

1Expected Value : E X

p

22 2

1Variance :

q pV X

p p

PMF

k

CDF: F 1 1k

X p X k p

Page 16: CS433: Modeling and Simulation

16

Modeling of Random Number of Arrivals/Events

Poisson Distribution Poisson Process

Page 17: CS433: Modeling and Simulation

17

Poisson Distribution Context: number of events occurring in a fixed period

of time Events occur with a known average rate and are independent

Possion distribution is characterized by the average rate The average number of arrival in the fixed time period.

Examples The number of cars passing a fixed point in a 5 minute

interval. Average rate: = 3 cars/5 minutes The number of calls received by a switchboard during

a given period of time. Average rate: =3 call/minutes The number of message coming to a router per second The number of travelers arriving to the airport for

flight registration

Page 18: CS433: Modeling and Simulation

18

Poisson Distribution The Poisson distribution with the average rate

parameter

exp for 0,1, 2, ....PMF: !

0, otherwise

k

kp k P X k k

0

CDF: exp!

k i

i

F k p X ki

Expected value: E X

Variance: V X

PMF

CDF

the probability that there are exactly k arrivals in a certain period of time.

the probability that there are at least k arrivals in a certain period of time.

Page 19: CS433: Modeling and Simulation

19

Example: Poisson Distribution

The number of cars that enter the parking follows a Poisson distribution with a mean rate equal to = 20 cars/hour

The probability of having exactly 15 cars entering the parking in one hour:

or

The probability of having more than 3 cars entering the parking in one hour:

1520

15 15 exp 20 0.05164915!

p P X

15 15 14 0.156513 0.104864 0.051649p F F

3 1 3 1 3

1 0 1 2 3

0.9999967

p X p X F

p p p p

USE EXCEL/MATLABFOR COMPUTATIONS

Page 20: CS433: Modeling and Simulation

20

Example: Poisson Distribution

Probability Mass FunctionPoisson ( = 20 cars/hour)

Cumulative Distribution FunctionPoisson ( = 20 cars/hour)

0

20 exp 20

!

k i

i

F k p X ki

20exp 20

!

k

p X kk

Page 21: CS433: Modeling and Simulation

21

Five Minutes Break

You are free to discuss with your classmates about the previous slides, or to refresh a bit, or to ask questions.

Administrative issues• Groups Formation

Page 22: CS433: Modeling and Simulation

22

Modeling of Random Number of Arrivals/Events

Poisson Distribution Poisson Process

Page 23: CS433: Modeling and Simulation

23

Poisson Process

Process N(t) random variable N that depends on time t

Poisson Process: a Process that has a Poisson Distribution N(t) is called a counting poisson process

There are two types: Homogenous Poisson Process: constant average

rate Non-Homogenous Poissoon Process: variable

average rate

Page 24: CS433: Modeling and Simulation

24

What is “Stochastic Process”?

Day 1

Day

Day 2 Day 3 Day 4 Day 5 Day 6 Day 7

THU FRI SAT SUN MON TUE WED

X(dayi): Status of the weather observed each DAY

State Space = {SUNNY, RAINNY}

" "1X Sday " "3X Rday

" "2X Sday

" "5X Rday

" "4X Sday

" "7X Sday

" "6X Sday

" " or " " : RANDOM VARIABLE that varies with the DAYiX S Rday

IS A STOCHASTIC PROCESSiX day

Page 25: CS433: Modeling and Simulation

25

Examples of using Poisson Process

The number of web page requests arriving at a server may be characterized by a Poisson process except for unusual circumstances such as coordinated denial of service attacks.

The number of telephone calls arriving at a switchboard, or at an automatic phone-switching system, may be characterized by a Poisson process.

The arrival of "customers" is commonly modelled as a Poisson process in the study of simple queueing systems.

The execution of trades on a stock exchange, as viewed on a tick by tick basis, is a Poisson process.

Page 26: CS433: Modeling and Simulation

26

(Homogenous) Poisson Process

Formally, a counting process is a (homogenous) Poisson process with constant average rate if:

describes the number of events in time interval

The mean and the variance are equal

for 0 and 0,1,2,...

( )PMF: ( ) exp

!

n

t n

p N t N t n p N nn

E N V N

, 0N

N t N t ,t t

t t+

( )N

t++

( )N

Page 27: CS433: Modeling and Simulation

27

(Homogenous) Poisson Process Properties of Poisson process

Arrivals occur one at a time (not simultaneous)

has stationary increments, which means

The number of arrivals in time s to t is also

Poisson-distributed with mean has independent

increments

N t N s N t s

t s

, 0N

, 0N

Page 28: CS433: Modeling and Simulation

28

Inter-Arrival Times of a Poisson Process Inter-arrival time: time between two consecutive arrivals

The inter-arrival times of a Poisson process are random. What is its distribution?

Consider the inter-arrival times of a Poisson process (A1, A2, …), where Ai is the elapsed time between arrival i and arrival i+1

The first arrival occurs after time t MEANS that there are no arrivals in the interval [0,t], As a consequence:

28

1 0 expp A t p N t t

1 11 1 expp A t p A t t

The Inter-arrival times of a Poisson process are exponentially distributed and independent with

mean 1/

Brian
Poi is not an abbreviation of Poisson that I have ever seen
Page 29: CS433: Modeling and Simulation

29

Splitting and Pooling

Splitting

Pooling

N(t) ~ Poi()

N1(t) ~ Poi[p]

N2(t) ~ Poi[(1-p)]

p

(1-p)

N(t) ~ Poi(12)

N1(t) ~ Poi[]

N2(t) ~ Poi[]

1

2

Page 30: CS433: Modeling and Simulation

30

Modeling of Random Number of Arrivals/Events

Poisson Distribution Non Homogenous Poisson Process

Page 31: CS433: Modeling and Simulation

31

Non Homogenous (Non-stationary) Poisson Process (NSPP)

The non homogeneous Poisson process is characterized by a VARIABLE rate parameter (t)λ , the arrival rate at time t. In general, the rate parameter may change over time.

The stationary increments, property is not satisfied

The expected number of events (e.g. arrival) between time s and time t is,

t

s t sλ(u) du

, : s t N t N s N t s

1 2 3

Page 32: CS433: Modeling and Simulation

32

Example: Non-stationary Poisson Process (NSPP)

The number of cars that cross the intersection of King Fahd Road and Al-Ourouba Road is distributed according to a non homogenous Poisson process with a mean (t) defined as follows:

Let us consider the time 8 am as t=0. Q1. Compute the average arrival number of cars at 11H30? Q2. Determine the equation that gives the probability of having

only 10000 car arrivals between 12 pm and 16 pm. Q3. What is the distribution and the average (in seconds) of the

inter-arrival time of two cars between 8 am and 9 am?

80 cars/mn 8 960 cars/mn 9 1150 car/mn 11 1570 car/mn 15 17

if am t amif am t pm

tif am t pmif pm t pm

Page 33: CS433: Modeling and Simulation

33

Example: Non-stationary Poisson Process (NSPP)

Q1. Compute the average arrival number of cars at 11H30?

Q2. Determine the equation that gives the probability of having only 10000 car arrivals between 12 pm and 16 pm.

We know that the number of cars between 12 pm and 16 pm, i.e. follows a Poisson distribution. During 12 pm and 16pm, the average number of cars is

Thus,

Q3. What is the distribution and the average (in seconds) of the inter-arrival time of two cars between 8 am and 9 am? (Homework)

11:30

8:00,11:30 8:00

9:00 11:00 11:30

8:00 9:00 11:00

80cars/mn 60mn 60cars/mn 120mn 50cars/mn 30mn 13500 cars

λ(u) du

λ(u) du λ(u) du λ(u) du

10000

1320016 12 10000 exp 13200

10000!p N N

16 12N N

12:00 16:00 180 50 60 70 13200 cars

Page 34: CS433: Modeling and Simulation

34

Two Minutes Break

You are free to discuss with your classmates about the previous slides, or to refresh a bit, or to ask questions.

Administrative issues• Groups Formation

Page 35: CS433: Modeling and Simulation

35

Continuous Probability Distributions

Uniform Distribution Exponential Distribution Normal (Gaussian) Distribution

Page 36: CS433: Modeling and Simulation

36

Uniform Distribution

Page 37: CS433: Modeling and Simulation

37

Continuous Uniform Distribution

The continuous uniform distribution is a family of probability distributions such that for each member of the family, all intervals of the same length on the distribution's support are equally probable

A random variable X is uniformly distributed on the interval [a,b], U(a,b), if its PDF and CDF are:

1,

PDF: ( )0, otherwise

a x bf x b a

0,

CDF: ( ) ,

1,

x a

x aF x a x b

b ax b

Expected value: 2

a bE X

2

Variance: 12

a bV X

Page 38: CS433: Modeling and Simulation

38

Uniform Distribution

Properties is proportional to

the length of the interval

Special case: a standard uniform distribution U(0,1). Very useful for random

number generators in simulators

1 2p x X x

2 12 1

X XF X F X

b a

CDF

PDF

Page 39: CS433: Modeling and Simulation

39

Exponential Distribution

Modeling Random Time

Page 40: CS433: Modeling and Simulation

40

Exponential Distribution

The exponential distribution describes the times between events in a Poisson process, in which events occur continuously and independently at a constant average rate.

A random variable X is exponentially distributed with parameter > 0 if its PDF and CDF are: exp , 0PDF: ( )

0, otherwise

x xf x

0

0, 0CDF: ( )

1 , 0x t x

xF x

e dt e x

1Expected value: E X

2

2

1Variance: V X

1exp , 0

( )

0, otherwise

xx

f x

0, 0

( )1 exp , 0

x

F x xx

Page 41: CS433: Modeling and Simulation

41

Exponential Distribution

1exp , 0

( ) 20 20

0, otherwise

xx

f x

0, 0

( )1 exp , 0

20

x

F x xx

µ=20 µ=20

Page 42: CS433: Modeling and Simulation

42

Exponential Distribution The memoryless property: In probability theory,

memoryless is a property of certain probability distributions: the exponential distributions and the geometric distributions, wherein any derived probability from a set of random samples is distinct and has no information (i.e. "memory") of earlier samples.

Formally, the memoryless property is:

For all s and t greater or equal to 0:

This means that the future event do not depend on the past event, but only on the present event The fact that Pr(X > 40 | X > 30) = Pr(X > 10) does not mean that the events X

> 40 and X > 30 are independent; i.e. it does not mean that Pr(X > 40 | X > 30) = Pr(X > 40).

|p X s t X s p X t

Page 43: CS433: Modeling and Simulation

43

Exponential Distribution

The memoryless property: can be read as “the probability that you will wait more than s+t minutes given that you have already been waiting s minutes is equal to the probability that you will wait t minutes.”

In other words “The probability that you will wait s more minutes given that you have already been waiting t minutes is the same as the probability that you had wait for more than s minutes from the beginning.”

The fact that Pr(X > 40 | X > 30) = Pr(X > 10) does not mean that the events X > 40 and X > 30 are independent; i.e. it does not mean that Pr(X > 40 | X > 30) = Pr(X > 40).

|p X s t X s p X t

Page 44: CS433: Modeling and Simulation

44

Example: Exponential Distribution

The time needed to repair the engine of a car is exponentially distributed with a mean time equal to 3 hours. The probability that the car spends more than 3 hours in

reparation

The probability that the car repair time lasts between 2 to 3 hours is:

The probability that the repair time lasts for another hour given it has been operating for 2.5 hours:

Using the memoryless property of the exponential distribution, we have:

33 1 3 1 3 1 1 exp 0.368

3p X p X F

3 3 2 0.145p X F F

12.5 1 | 2.5 1 1 1 exp 0.717

3p X X p X p X

Page 45: CS433: Modeling and Simulation

45

Normal (Gaussian) Distribution

Page 46: CS433: Modeling and Simulation

46

Normal Distribution

The Normal distribution, also called the Gaussian distribution, is an important family of continuous probability distributions, applicable in many fields.

Each member of the family may be defined by two parameters, location and scale: the mean ("average", μ) and variance (standard deviation squared, σ2) respectively.

The importance of the normal distribution as a model of quantitative phenomena in the natural and behavioral sciences is due in part to the Central Limit Theorem.

It is usually used to model system error (e.g. channel error), the distribution of natural phenomena, height, weight, etc.

Page 47: CS433: Modeling and Simulation

47

Normal or Gaussian Distribution

A continuous random variable X, taking all real values in the range (-∞,+∞) is said to follow a Normal distribution with parameters µ and σ if it has the following PDF and CDF:

where

The Normal distribution is denoted as This probability density function (PDF) is

a symmetrical, bell-shaped curve, centered at its expected value µ. The variance is 2.

2

1 1PDF: exp

22

xf x

2~ ,X N

1CDF: 1

2 2

xF x erf

2

0

2Error Function: exp

x

erf x t

Page 48: CS433: Modeling and Simulation

48

Normal distribution

Example The simplest case of the normal distribution, known as the

Standard Normal Distribution, has expected value zero and variance one. This is written as N(0,1).

Page 49: CS433: Modeling and Simulation

49

Normal Distribution

Evaluating the distribution: Independent of and using the standard normal

distribution:

Transformation of variables: let

z t dtez 2/2

2

1)( where,

)()(

2

1

)(

/)(

/)( 2/2

xx

x z

dzz

dze

xZPxXPxF

~ 0,1Z N

XZ

Page 50: CS433: Modeling and Simulation

50

Normal Distribution

Example: The time required to load an oceangoing vessel, X, is distributed as N(12,4) The probability that the vessel is loaded in less than 10

hours:

Using the symmetry property, (1) is the complement of (-1)

1587.0)1(2

1210)10(

F

Page 51: CS433: Modeling and Simulation

51

Empirical Distribution

Page 52: CS433: Modeling and Simulation

52

Empirical Distributions

An Empirical Distribution is a distribution whose parameters are the observed values in a sample of data. May be used when it is impossible or unnecessary

to establish that a random variable has any particular parametric distribution.

Advantage: no assumption beyond the observed values in the sample.

Disadvantage: sample might not cover the entire range of possible values.

Page 53: CS433: Modeling and Simulation

53

Empirical Distributions

In statistics, an empirical distribution function is a cumulative probability distribution function that concentrates probability 1/n at each of the n numbers in a sample.

Let X1, X2, …, Xn be iid random variables in with the CDF equal to F(x).

The empirical distribution function Fn(x) based on sample X1, X2, …, Xn is a step function defined by

where I(A) is the indicator of event A. For a fixed value x, I(Xi≤x) is a Bernoulli random variable

with parameter p=F(x), hence nFn(x) is a binomial random variable with mean nF(x) and variance nF(x)(1-F(x)).

1

number of element in the sample 1 n

n ii

xF x I X x

n n

1 if

0 otherwisei

i

X xI X x

Page 54: CS433: Modeling and Simulation

54

End of Chapter 4