7/27/2019 Bus005Week3 HO (1)
http://slidepdf.com/reader/full/bus005week3-ho-1 1/24
1/22/2013
BUS 005: Quantitative Research Methods for Business
Lecture 3: The random variable and discrete probability distributions
Sanghamitra Bandyopadhyay
School of Business and Management
Quantitative Research Methods
Describe the data
1. Displaying graphs
2. Descriptive statistics
3. Probability
4. Sampling
Inference
5. Confidence intervals
6. Test of hypothesis
7. Modelling
Random Variables: used to describe the
outcomes of an experiment
A random variable takes a value for each possible event of an experiment.
• It is random, indicating the uncertainty of its value, which we don’t know until the experiment has taken place.
• It is usually represented by uppercase letters: X, Y, Z.
Example. Experiment: flip a coin {heads, tails}; random variable: X = {1, 0}, or {head, tail}.
Example. Experiment: flip a coin repeatedly, 10 times. The events are now “the number of heads when flipping a coin 10 times”. Random variable: X = {0, 1, 2, 3, …, 10}
Random Variable types
• Discrete random variables produce outcomes that come from a counting process (e.g. number of classes you are taking).
• Continuous random variables produce outcomes that come from a measurement (e.g. annual salary, weight). They can take any value in a range of values.
Analogy: Integers are Discrete, while Real Numbers are Continuous
Discrete Random Variables: Examples
Experiment                                 Random Variable    Possible Values
Count cars at toll between 11:00 & 1:00    # cars arriving    0, 1, 2, ..., ∞
Make 100 sales calls                       # sales            0, 1, 2, ..., 100
Inspect 70 radios                          # defective        0, 1, 2, ..., 70
Answer 33 questions                        # correct          0, 1, 2, ..., 33

Continuous Random Variables: Examples
Experiment                                 Random Variable       Possible Values
Weigh 100 people                           Weight                45.1, 78, ...
Measure part life                          Hours                 900, 875.9, ...
Amount spent on food                       £ amount              54.12, 42, ...
Measure time between arrivals              Inter-arrival time    0, 1.3, 2.78, ...
How to describe an experiment and its possible results?
[Diagram: Probability Distributions, split into Discrete Probability Distributions and Continuous Probability Distributions]
Describing an experiment
Probability Distributions
A probability distribution consists of the values of a random variable and the probability associated with each of these values.
Corresponding to the two types of random variables (discrete or continuous), we have two types of probability distributions:
– Discrete Probability Distribution
– Continuous Probability Distribution
Probability Notation…
An upper-case letter (X) denotes the random variable. When we use its lower-case counterpart (x), we are representing a theoretical value of the random variable.
The probability that the random variable X will equal x is: P(X = x) or just P(x)
E.g.: P(Achieving a 2B) = P(X=2B) = P(2) = P(49<mark<59)
Example 3. TV per household
Probability distributions can be estimated from relative frequencies.
E.g. P(X=0) = P(0) = 1,218 ÷ 101,501 = 0.012
E.g. P(X=4) = P(4) = 0.076 = 7.6%
[Bar chart: P(x) against X = 0, 1, 2, 3, 4, 5]
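E.g. what is the probability that there is at least one television but no more than three in any given household? These events are mutually exclusive, so their probabilities add:
$P(1 \le X \le 3) = P(1) + P(2) + P(3)$
Discrete Probability Distributions…
Remember the rules of probability:
$0 \le P(x_i) \le 1$ for all $x_i$
$\sum_{\text{all } x_i} P(x_i) = 1$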
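As a minimal sketch (not part of the original handout), the relative-frequency estimate and the addition rule for mutually exclusive events can be checked in Python. The household counts are those of the TV example; the count for three TVs is taken as 19,387, the value for which the six counts sum to 101,501. The names `counts`, `dist` and `p_one_to_three` are illustrative only.

```python
# Household counts by number of TVs (from the TV-per-household example).
# Assumption: 19,387 households own three TVs, so the counts sum to 101,501.
counts = {0: 1218, 1: 32379, 2: 37961, 3: 19387, 4: 7714, 5: 2842}

total = sum(counts.values())

# Estimate P(x) by the relative frequency of each value.
dist = {x: n / total for x, n in counts.items()}

# Rules of probability: each P(x) lies in [0, 1] and all P(x) sum to 1.
assert all(0.0 <= p <= 1.0 for p in dist.values())
assert abs(sum(dist.values()) - 1.0) < 1e-12

# X=1, X=2 and X=3 are mutually exclusive, so their probabilities add.
p_one_to_three = dist[1] + dist[2] + dist[3]

print(round(dist[4], 3))         # P(4)
print(round(p_one_to_three, 3))  # P(1 <= X <= 3)
```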
Population/Probability Distribution…
When we calculate proportions using sample data we do not call them probabilities but just relative frequencies.
Population features are described by computing parameters.
E.g. the population mean and population variance.
Population Mean (Expected Value)
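The population mean is the weighted average of all of its values. The weights are the probabilities.
This parameter is also called the expected value of X and is represented by E(X).
Example 3b: How many TVs in a typical US household?
The population is 0,0,…,0, 1,1,…,1, 2,2,…,2, 3,3,…,3, 4,4,…,4, 5,5,…,5, with:
x        0        1         2         3         4        5
Cases    1,218    32,379    37,961    19,387    7,714    2,842
$$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{0 + \dots + 0 + 1 + \dots + 1 + 2 + \dots + 2 + \dots + 5 + \dots + 5}{101{,}501}$$
$$= \frac{0(1{,}218) + 1(32{,}379) + 2(37{,}961) + 3(19{,}387) + 4(7{,}714) + 5(2{,}842)}{101{,}501}$$
$$= 0\cdot\frac{1{,}218}{101{,}501} + 1\cdot\frac{32{,}379}{101{,}501} + 2\cdot\frac{37{,}961}{101{,}501} + 3\cdot\frac{19{,}387}{101{,}501} + 4\cdot\frac{7{,}714}{101{,}501} + 5\cdot\frac{2{,}842}{101{,}501}$$
$$= 0\,P(0) + 1\,P(1) + 2\,P(2) + 3\,P(3) + 4\,P(4) + 5\,P(5) \approx 2.084$$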
Population Variance…
The population variance is calculated similarly. It is the weighted average of the squared deviations from the mean:
$\sigma^2 = V(X) = \sum_{\text{all } x} (x - \mu)^2 P(x)$
The standard deviation is the same as before:
$\sigma = \sqrt{\sigma^2}$
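The weighted-average definitions of the mean and variance translate directly into a few lines of Python. This is an illustrative sketch, not part of the handout: the helper name `mean_var` is hypothetical, and the TV counts assume 19,387 households with three TVs (the value for which the counts sum to 101,501).

```python
import math

def mean_var(dist):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

# TV-per-household counts (assumed: 19,387 households with three TVs).
counts = {0: 1218, 1: 32379, 2: 37961, 3: 19387, 4: 7714, 5: 2842}
total = sum(counts.values())
tv_dist = {x: n / total for x, n in counts.items()}

mu, var = mean_var(tv_dist)
sigma = math.sqrt(var)  # standard deviation = square root of the variance
print(round(mu, 3), round(var, 3), round(sigma, 3))
```

The same helper applies to any discrete distribution given as a value-to-probability mapping, for example the number of heads in two coin tosses, `{0: 0.25, 1: 0.50, 2: 0.25}`.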
Example 2b. Discrete Random Variable Probability Distribution
Experiment: Toss 2 coins. Let X = # heads.
4 possible outcomes: TT, TH, HT, HH
3 possible events: X = 0, 1, 2
Probability Distribution
X Value    Probability
0          1/4 = 0.25
1          2/4 = 0.50
2          1/4 = 0.25
[Bar chart: Probability against X = 0, 1, 2]
Example 2b: Summary Measures Calculation Table
x        p(x)    x p(x)    x – μ    (x – μ)²    (x – μ)² p(x)
0        0.25    0         -1       1           0.25
1        0.50    0.50      0        0           0
2        0.25    0.50      1        1           0.25
Total            1                              0.50
Mean: μ = Σ x p(x) = 1
Variance: σ² = Σ (x – μ)² p(x) = 0.50
Laws of Expected Value…
E(c) = c
The expected value of a constant (c) is just the value of the constant.
E(X + c) = E(X) + c
E(b . X) = b . E(X)
We can “pull” a constant out of the expected value expression (either as part of a sum with a random variable X or as a coefficient of random variable X).
Example 5a
Monthly sales in a shop have a mean of £25,000 and a standard deviation of £4,000. Variable costs represent 70% of sales; fixed monthly costs are £6,000.
Find the mean monthly profit.
1) Describe the problem statement in algebraic terms:
Profit = Sales – variable costs – fixed costs
Profit = Sales – 0.70 Sales – 6,000
Profit (= Y) = 0.30(Sales) – 6,000
Sales have a mean of £25,000, so E(Sales) = 25,000.
2) Apply the laws of expected value:
if: Y = b . X + c
then: E(Y) = b . E(X) + c
Note that here c = –6,000 is a negative number.
E(Profit) = E[0.30 . (Sales) – 6,000] = 0.30 . E(Sales) – 6,000 = 0.30 . (25,000) – 6,000 = 1,500
Thus, the mean monthly profit is £1,500.
Laws of Variance: linear transformation
V(c) = 0
The variance of a constant (c) is zero.
V(X + c) = V(X)
V(b . X) = b² . V(X)
Example 5b. Find the standard deviation of monthly profits.
E(Sales) = £25,000; standard deviation of sales = £4,000. Profits are calculated as in Example 5a.
1) Describe the problem statement in algebraic terms:
Sales have a standard deviation of £4,000. Remember: $\sigma = \sqrt{V(X)}$; then V(Sales) = 4,000² = 16,000,000
Profits are calculated by Profit = 0.30(Sales) – 6,000, and $\sigma_{\text{Profit}} = \sqrt{V(\text{Profit})}$
2) The variance of profit is V(Profit) = V[0.30(Sales) – 6,000]
if Y = b . X + c, then V(Y) = b² . V(X)
V(Profit) = V[0.30 . (Sales) – 6,000] = 0.30² . V(Sales) = 0.09 . (16,000,000) = 1,440,000
Again, the standard deviation is the square root of the variance, so the standard deviation of Profit = (1,440,000)^(1/2) = £1,200
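The two laws for a linear transformation Y = b·X + c can be sketched together in Python, using the shop example's figures. The function name `linear_transform_moments` is hypothetical and chosen only for illustration.

```python
import math

def linear_transform_moments(mean_x, sd_x, b, c):
    """Mean, variance and standard deviation of Y = b*X + c.

    E(Y) = b*E(X) + c   (the constant c shifts the mean)
    V(Y) = b**2 * V(X)  (the constant c does not affect the variance)
    """
    mean_y = b * mean_x + c
    var_y = b ** 2 * sd_x ** 2
    return mean_y, var_y, math.sqrt(var_y)

# Profit = 0.30 * Sales - 6,000, with E(Sales) = 25,000 and sd(Sales) = 4,000.
mean_profit, var_profit, sd_profit = linear_transform_moments(25_000, 4_000, 0.30, -6_000)
print(round(mean_profit), round(var_profit), round(sd_profit))
```

Note that the coefficient is squared in the variance (0.30² = 0.09), which is why the standard deviation of profit is 0.30 × 4,000.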
Bivariate Distributions…
Up to now, we have looked at univariate distributions, i.e. probability distributions in one variable.
As you might guess, bivariate distributions are probabilities of combinations of two variables. They are also called joint probability distributions.
A joint probability distribution of X and Y is a table or formula that lists the joint probabilities for all pairs of values x and y, and is denoted P(x,y).
P(x,y) = P(X=x and Y=y)
Discrete Bivariate Distribution…
As you might expect, the requirements for a bivariate distribution are similar to those for a univariate distribution, with only minor changes to the notation:
$0 \le P(x,y) \le 1$ for all pairs (x,y)
$\sum_{\text{all } x} \sum_{\text{all } y} P(x,y) = 1$
Example 6
Xavier and Yvette are real estate agents.
X = number of houses sold by Xavier in a month;
Y = number of houses sold by Yvette in a month.
An analysis of their past monthly performances has the following joint probabilities (bivariate probability distribution):
P(x,y)    x = 0    x = 1    x = 2
y = 0     .12      .42      .06
y = 1     .21      .06      .03
y = 2     .07      .02      .01
Marginal Probabilities…
We calculate the marginal probabilities by summing across rows and down columns of the joint probability table to determine the probabilities of X and Y individually:
x         0      1      2
P(X=x)    0.4    0.5    0.1
y         0      1      2
P(Y=y)    0.6    0.3    0.1
E.g. the probability that Xavier sells 1 house = P(X=1) = 0.50
Describing the Bivariate Distribution…
We can describe the mean, variance, and standard deviation of each variable in a bivariate distribution by working with the marginal probabilities (same formulae as for univariate distributions):
x . P(x)
0 x 0.4 = 0
1 x 0.5 = 0.5
2 x 0.1 = 0.2
E(X) = 0.7
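Summing a joint table across rows and down columns is easy to sketch in Python. The joint probabilities below are the values that appear in the handout's covariance computation for Xavier and Yvette; the dictionary layout and the helper name `marginals` are assumptions made for illustration.

```python
# Joint probabilities P(x, y), keyed by (x, y):
# x = houses sold by Xavier, y = houses sold by Yvette.
joint = {
    (0, 0): 0.12, (1, 0): 0.42, (2, 0): 0.06,
    (0, 1): 0.21, (1, 1): 0.06, (2, 1): 0.03,
    (0, 2): 0.07, (1, 2): 0.02, (2, 2): 0.01,
}

def marginals(joint):
    """Return the marginal distributions (P(X=x), P(Y=y)) of a joint table."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p  # sum over all y for each x
        py[y] = py.get(y, 0.0) + p  # sum over all x for each y
    return px, py

px, py = marginals(joint)
mean_x = sum(x * p for x, p in px.items())
mean_y = sum(y * p for y, p in py.items())
print({x: round(p, 2) for x, p in px.items()}, round(mean_x, 2))
print({y: round(p, 2) for y, p in py.items()}, round(mean_y, 2))
```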
Covariance…
Definition. Covariance of two discrete variables:
$\mathrm{COV}(X,Y) = \sigma_{XY} = \sum_{\text{all } x} \sum_{\text{all } y} (x - \mu_X)(y - \mu_Y)\,P(x,y)$
• The covariance measures the strength of the linear relationship between two discrete random variables X and Y.
• A positive covariance indicates a positive relationship.
• A negative covariance indicates a negative relationship.
It depends on the values and units of measurement of X and Y and it is not constrained to be between -1 and 1.
Coefficient of Correlation (rho)
$\rho = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$, with $-1 \le \rho \le 1$
Example 6b (cont.)
Compute the covariance and the coefficient of correlation between the numbers of houses sold by Xavier and Yvette.
COV(X,Y) = (0 – .7)(0 – .5)(.12) + (1 – .7)(0 – .5)(.42) + (2 – .7)(0 – .5)(.06)
+ (0 – .7)(1 – .5)(.21) + (1 – .7)(1 – .5)(.06) + (2 – .7)(1 – .5)(.03)
+ (0 – .7)(2 – .5)(.07) + (1 – .7)(2 – .5)(.02) + (2 – .7)(2 – .5)(.01) = –.15
σ_XY = –0.15
ρ = –0.15 ÷ [(.64)(.67)] = –.35
There is a weak, negative relationship between the two variables.
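The covariance and correlation calculation can be reproduced with a short Python sketch. The joint probabilities are those appearing in the covariance sum for Xavier and Yvette; the function name `covariance_and_correlation` is hypothetical.

```python
import math

# Joint probabilities P(x, y), keyed by (x, y), as used in the covariance sum.
joint = {
    (0, 0): 0.12, (1, 0): 0.42, (2, 0): 0.06,
    (0, 1): 0.21, (1, 1): 0.06, (2, 1): 0.03,
    (0, 2): 0.07, (1, 2): 0.02, (2, 2): 0.01,
}

def covariance_and_correlation(joint):
    """Return (COV(X, Y), rho) for a discrete joint distribution."""
    mu_x = sum(x * p for (x, _), p in joint.items())
    mu_y = sum(y * p for (_, y), p in joint.items())
    var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())
    var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())
    # Probability-weighted average of the products of deviations.
    cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
    rho = cov / math.sqrt(var_x * var_y)
    return cov, rho

cov, rho = covariance_and_correlation(joint)
print(round(cov, 2), round(rho, 2))
```

Unlike the covariance, the value of `rho` does not change if X or Y is measured in different units, which is why it is the easier of the two to interpret.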
Probability Distribution of the Sum of Two Variables…
The bivariate distribution allows us to develop the probability distribution of any combination of the two variables. Of particular interest is the sum of the two variables (z = x + y = total houses sold).
z       0      1      2      3      4
P(z)    .12    .63    .19    .05    .01
“What is the probability that three houses are sold?”
P(X+Y=3) = P(2,1) + P(1,2) = 0.03 + 0.02 = 0.05
Likewise, we can compute the expected value, variance, and standard deviation of X+Y in the usual way…
E(X + Y) = 0(.12) + 1(.63) + 2(.19) + 3(.05) + 4(.01) = 1.2
V(X + Y) = (0 – 1.2)²(.12) + … + (4 – 1.2)²(.01) = .56
$\sigma_{X+Y} = \sqrt{V(X+Y)} = \sqrt{.56} = .75$
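A sketch of how the distribution of the sum Z = X + Y can be built from a joint table: every pair (x, y) contributes its probability to the value x + y. The joint probabilities are those used in the covariance computation for Xavier and Yvette; `sum_distribution` is a hypothetical helper name.

```python
import math
from collections import defaultdict

joint = {
    (0, 0): 0.12, (1, 0): 0.42, (2, 0): 0.06,
    (0, 1): 0.21, (1, 1): 0.06, (2, 1): 0.03,
    (0, 2): 0.07, (1, 2): 0.02, (2, 2): 0.01,
}

def sum_distribution(joint):
    """Return the distribution {z: P(Z=z)} of Z = X + Y."""
    pz = defaultdict(float)
    for (x, y), p in joint.items():
        pz[x + y] += p  # pairs with the same total are mutually exclusive
    return dict(pz)

pz = sum_distribution(joint)
mean_z = sum(z * p for z, p in pz.items())
var_z = sum((z - mean_z) ** 2 * p for z, p in pz.items())
sd_z = math.sqrt(var_z)
print({z: round(p, 2) for z, p in sorted(pz.items())})
print(round(mean_z, 2), round(var_z, 2), round(sd_z, 2))
```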
• A probability distribution is an equation that
1. associates a particular probability of occurrence with each outcome in the sample space.
2. measures outcomes and assigns values of X to the simple events.
3. assigns a value to the variability in the sample space.
4. assigns a value to the center of the sample space.
• The covariance
1. must be between -1 and +1.
2. must be positive.
3. can be positive or negative.
4. must be less than +1.
Laws of expectation and variance of the sum
We can derive laws of expected value and variance for the sum of two variables as follows…
E(X + Y) = E(X) + E(Y) = 0.7 + 0.5 = 1.2
V(X + Y) = V(X) + V(Y) + 2COV(X, Y) = 0.41 + 0.45 + 2(-0.15) = 0.56
$\sigma_{X+Y} = \sqrt{\sigma^2_{X+Y}}$
If X and Y are independent, COV(X, Y) = 0, then V(X + Y) = V(X) + V(Y)
Generalization: Laws of expectation and variance of the linear combination
We can derive laws of expected value and variance for a linear combination of two variables as follows…
E(aX + bY) = a . E(X) + b . E(Y)
V(aX + bY) = a² . V(X) + b² . V(Y) + 2ab COV(X, Y)
$\sigma_{aX+bY} = \sqrt{\sigma^2_{aX+bY}}$
Portfolio Expected Return and Portfolio Risk
• Two assets X and Y, with a proportion w invested in X.
• Portfolio expected return (weighted average return):
$E(P) = w\,E(X) + (1 - w)\,E(Y)$
• Portfolio risk (weighted variability):
$\sigma_P = \sqrt{w^2 \sigma_X^2 + (1 - w)^2 \sigma_Y^2 + 2w(1 - w)\,\sigma_{XY}}$
where w = proportion of portfolio value in asset X and (1 - w) = proportion of portfolio value in asset Y.
Portfolio Example
Investment X: μ_X = 50, σ_X = 43.30
Investment Y: μ_Y = 95, σ_Y = 193.71
σ_XY = 8250
Suppose 40% of the portfolio is in Investment X and 60% is in Investment Y:
$E(P) = 0.4(50) + 0.6(95) = 77$
$\sigma_P = \sqrt{(0.4)^2 (43.30)^2 + (0.6)^2 (193.71)^2 + 2(0.4)(0.6)(8250)} = 133.30$
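The portfolio formulas are the linear-combination laws with a = w and b = 1 - w. A minimal Python sketch follows, assuming σ_Y = 193.71 as in the risk calculation; the function name `portfolio_return_and_risk` is illustrative only.

```python
import math

def portfolio_return_and_risk(w, mean_x, mean_y, sd_x, sd_y, cov_xy):
    """Expected return and risk (standard deviation) of a two-asset portfolio.

    w is the proportion of the portfolio value held in asset X;
    the remaining 1 - w is held in asset Y.
    """
    expected = w * mean_x + (1 - w) * mean_y
    variance = (w ** 2 * sd_x ** 2
                + (1 - w) ** 2 * sd_y ** 2
                + 2 * w * (1 - w) * cov_xy)
    return expected, math.sqrt(variance)

# 40% in Investment X, 60% in Investment Y.
expected_return, risk = portfolio_return_and_risk(
    w=0.4, mean_x=50, mean_y=95, sd_x=43.30, sd_y=193.71, cov_xy=8250
)
print(round(expected_return, 2), round(risk, 2))
```

Calling the function over a range of `w` values is a simple way to see how the weighting trades expected return against risk.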
Probability Distributions
[Diagram: Probability Distributions, split into Discrete Probability Distributions (Binomial, Poisson) and Continuous Probability Distributions (Normal, Uniform, Exponential); slide labels: Lec. 5, Lec. 6]
Mathematical Models of Probability Distributions
Binomial Distribution Properties
The binomial distribution is the probability distribution that results from doing a “binomial experiment”. Binomial experiments have the following properties:
1. Two different sampling methods
• Infinite population without replacement
• Finite population with replacement
2. Sequence of n identical trials
3. Each trial has 2 outcomes
• ‘Success’ (either of the two) or ‘Failure’
4. Constant trial probability of success = π; P(failure) = 1 - π
5. Trials are independent: the outcome of one trial does not affect the outcomes of any other trials.
Binomial: Possible Applications
• A manufacturing plant labels items as either defective or acceptable
• A firm bidding for contracts will either get a contract or not
• A marketing research firm receives survey responses of “yes I will buy” or “no I will not”
• New job applicants either accept the offer or reject it
Binomial Random Variable…
The binomial random variable counts the number of successes (X) in n trials of the binomial experiment. It can take on the values 0, 1, 2, …, n. Thus, it is a discrete random variable.
To calculate the probability associated with each value we use this formula:
$P(x) = \dfrac{n!}{x!\,(n - x)!}\,\pi^x (1 - \pi)^{n - x}$ for x = 0, 1, 2, …, n
where n! = 1 × 2 × 3 × … × n (and 0! = 1)
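The binomial probability formula can be written as a small Python function. This is a sketch using only the standard library (`math.comb` computes n! / (x!(n - x)!) and needs Python 3.8 or later); the name `binomial_pmf` and the argument name `pi` for the success probability are choices made here, not the handout's.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) for a binomial random variable with n trials and success probability pi."""
    if not 0 <= x <= n:
        return 0.0  # values outside 0..n are impossible
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

# Example: number of heads in two tosses of a fair coin.
coin = [binomial_pmf(x, 2, 0.5) for x in range(3)]
print(coin)  # probabilities for 0, 1 and 2 heads
```

With n = 2 and π = 0.5 this reproduces the two-coin distribution tabulated earlier in the lecture (0.25, 0.50, 0.25).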
Example: Don Qi
Don Qi’s exam strategy is to rely on luck for the next quiz. The quiz consists of 10 multiple-choice questions. Each question has five possible answers, only one of which is correct. Don plans to merely guess the answer to each question.
Algebraically then: n = 10, and P(success) = 1/5 = 0.20
Is this a binomial experiment? Check the conditions:
• There is a fixed finite number of trials (n = 10).
• An answer can be either correct or incorrect.
• The probability of a correct answer (P(success) = .20) does not change from question to question.
• Each answer is independent of the others.
Find out:
1. The probability that Don gets no answers correct.
2. The probability that Don gets at least two answers correct.
3. The probability that Don fails the quiz, which demands a minimum of 5 answers correct.
1. Probability that Don gets no answers correct?
n = 10, and P(success) = .20
i.e. # successes, x, = 0; hence we want to know P(x = 0):
$P(0) = \dfrac{10!}{0!\,10!}\,(.20)^0 (.80)^{10} = .1074$
Don has about an 11% chance of getting no answers correct using the guessing strategy.
In Excel:
=BINOMDIST(X, n, P, cumulative)
=BINOMDIST(0, 10, 0.20, FALSE)
X = # successes; n = # trials; P = P(success); cumulative: TRUE = cumulative probability, i.e. P(X ≤ x); FALSE = probability distribution, i.e. P(X = x). (In Excel 2010 and later the same function is also available as BINOM.DIST.)
2. What is the probability that Don gets at least 2 answers correct?
n = 10, and P(success) = .20
i.e. # successes, x, ≥ 2; that is: P(x ≥ 2) = 1 - P(0) - P(1) = 1 - .1074 - .2684 = 0.6242
or: P(x ≥ 2) = 1 - P(x ≤ 1) = 1 – BINOMDIST(1, 10, 0.20, TRUE)
3. Probability that Don fails the quiz (minimum to pass: 5 answers correct)
Cumulative Probability…
Don fails if he gets at most 4 answers correct. This requires a cumulative probability, that is,
P(X at most 4) = P(X ≤ 4) = F(4) = P(0) + P(1) + P(2) + P(3) + P(4)
We already know P(0) = .1074. Using the binomial formula to calculate: P(1) = .2684, P(2) = .3020, P(3) = .2013, and P(4) = .0881
P(X ≤ 4) = .1074 + .2684 + .3020 + .2013 + .0881 = .9672
Thus, it is about 97% probable that Don will fail the test using the luck strategy and guessing at answers…
[Chart: Don’s density function, Probability against x = 0, 1, …, 10]
[Chart: Cumulative distribution function, Probability (up to 1) against x = 0, 1, …, 10]
Binomial cdf
The binomial cdf gives cumulative probabilities P(X ≤ k), but as we’ve seen in the last example,
P(X = k) = P(X ≤ k) – P(X ≤ [k–1])
Likewise, for probabilities given as P(X ≥ k), we have:
P(X ≥ k) = 1 – P(X ≤ [k–1])
3b. Probability that Don gets at least 5 answers correct?
P(X ≥ 5) = 1 – P(X ≤ 4) = 1 – .9672 = .0328
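Don's quiz probabilities can be checked with a short Python sketch built on the binomial formula and the cdf identities above. The helper names `binomial_pmf` and `binomial_cdf` are hypothetical; only the standard library is used.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) for n trials with success probability pi."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

def binomial_cdf(k, n, pi):
    """Cumulative probability P(X <= k)."""
    return sum(binomial_pmf(x, n, pi) for x in range(k + 1))

n, pi = 10, 0.20  # 10 questions, 1-in-5 chance of guessing each correctly

p_none = binomial_pmf(0, n, pi)              # 1. no answers correct
p_at_least_2 = 1 - binomial_cdf(1, n, pi)    # 2. P(X >= 2) = 1 - P(X <= 1)
p_fail = binomial_cdf(4, n, pi)              # 3. P(X <= 4)
p_pass = 1 - p_fail                          # 3b. P(X >= 5)

for label, p in [("P(0)", p_none), ("P(X>=2)", p_at_least_2),
                 ("P(X<=4)", p_fail), ("P(X>=5)", p_pass)]:
    print(label, round(p, 4))
```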
DNA fingerprinting
• 1985: Prof. Alec Jeffreys (Leicester) proposes a procedure to produce pictures of an individual’s DNA which may be compared across individuals. Only twins may have similar profiles.
• DNA fingerprints are now being used in courts for forensic and paternity tests.
• Blood samples are treated with enzymes and exposed to an electric field to produce unique fragment sequences.
DNA fingerprinting
• A child’s bands match those of one of the parents, except for rare mutations, which occur with probability 1/300.
• Example: # of bands = 30; X = # of mutations. X ~ B(n=30, p=1/300)
The questions to ask are:
1. How many bands do not come from the mother?
2. How many of these are different from those of the alleged father?
3. What is the probability that the bands not coming from the alleged father are mutations? If this probability is too low (say below 5%), then we should reject that the alleged father is the father.
With X ~ B(n=30, p=1/300): P(X ≥ 2) = 1 - P(0) - P(1) = 0.0045; P(X ≥ 3) = 0.0001.
So if there are two or more bands with no match we may reject that the alleged father is the father.
Some drawbacks
1. It’s not always straightforward to identify band matches.
2. The New York Times has reported gross discrepancies between different laboratories.
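The tail probabilities for X ~ B(n=30, p=1/300) can be computed directly from the binomial formula. The sketch below is illustrative (the helper name `binomial_pmf` is hypothetical) and uses only the standard library.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 30, 1 / 300  # 30 bands, each mutating with probability 1/300

# P(X >= 2) = 1 - P(0) - P(1); P(X >= 3) removes P(2) as well.
p_at_least_2 = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)
p_at_least_3 = p_at_least_2 - binomial_pmf(2, n, p)

print(round(p_at_least_2, 4))
print(round(p_at_least_3, 4))

# Decision rule from the slides: reject paternity if the probability
# of seeing that many mutations is below 5%.
reject_with_two_mismatches = p_at_least_2 < 0.05
```

Both tail probabilities fall well under the 5% threshold, which is what supports rejecting paternity when two or more bands have no match.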
Poisson Distribution
1. Number of events that occur in an interval (or ‘area of opportunity’)
• events per unit: time, length, area, space
2. Examples
• Number of customers arriving in 20 minutes
• Number of strikes per year in the U.S.
• Number of defects per lot (group) of DVDs
• Number of exits per mile on a motorway
Note: the difference with the binomial is that now X is not defined as successes in a number n of trials, but in an ‘area of opportunity’.
Poisson Process
1. Constant event probability
• An average of 60/hr is 1/min for 60 1-minute intervals
2. One event per interval
• Events don’t arrive together
3. Independent events
• The arrival of 1 person does not affect another’s arrival
Poisson Distribution…
The Poisson random variable is the number of successes that occur in a period of time or an interval of space in a Poisson experiment.
E.g. On average, 96 trucks (successes) arrive at a border crossing every hour (time period).
E.g. The number of typographic errors (successes?!) in a new textbook edition averages 1.5 per 100 pages (interval).
Poisson Distribution Example
Customers arrive at a rate of 72 per hour. What is the probability of 4 customers arriving in 3 minutes?
Poisson Distribution Solution
First, lambda and x must be defined for the same time period. x is defined per 3 minutes, so:
72 per hr = 1.2 per min = 3.6 per 3-min interval
$p(x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$
$p(4) = \dfrac{e^{-3.6}\,(3.6)^4}{4!} = .1912$
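The rate conversion and the Poisson formula can be sketched in Python as follows. The function name `poisson_pmf` and the variable names are illustrative assumptions; only the standard library is used.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Express lambda for the same interval as x: 72 per hour -> per 3 minutes.
rate_per_hour = 72
interval_minutes = 3
lam = rate_per_hour / 60 * interval_minutes

p_four = poisson_pmf(4, lam)
print(round(lam, 1), round(p_four, 4))
```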
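Using Excel For The Poisson Distribution
How to calculate the probability of exactly 4 events in a period of time?
=POISSON(X, λ, cumulative)
=POISSON(4, 3.6, FALSE)
X = # successes; λ = mean/expected number of events; cumulative: TRUE = cumulative probability, i.e. P(X ≤ x); FALSE = probability distribution, i.e. P(X = x).
For the probability of at most 4 events, set cumulative to TRUE: =POISSON(4, 3.6, TRUE)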
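For comparison with the spreadsheet's cumulative option, here is a minimal Python sketch of the cumulative Poisson probability P(X ≤ x), obtained by summing the individual probabilities. The helper names `poisson_pmf` and `poisson_cdf` are hypothetical.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def poisson_cdf(x, lam):
    """Cumulative probability P(X <= x): the 'cumulative = TRUE' case."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 3.6  # expected customers per 3-minute interval

p_exactly_4 = poisson_pmf(4, lam)  # 'cumulative = FALSE'
p_at_most_4 = poisson_cdf(4, lam)  # 'cumulative = TRUE'
print(round(p_exactly_4, 4), round(p_at_most_4, 4))
```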
Recommended readings
• Chapter 5, except Section 5.6, of Berenson, Levine and Krehbiel, or of the Pearson custom textbook