![Page 1: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/1.jpg)
1
Lecture Eleven
Probability Models
![Page 2: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/2.jpg)
2
Outline
• Bayesian Probability
• Duration Models
![Page 3: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/3.jpg)
3
Bayesian Probability
• Facts
• Incidence of the disease in the population is one in a thousand
• The probability of testing positive if you have the disease is 99 out of 100
• The probability of testing positive if you do not have the disease is 2 out of 100
![Page 4: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/4.jpg)
4
Joint and Marginal Probabilities
| | Sick: S | Healthy: H | Marginal |
|---|---|---|---|
| Test + | Pr(+ ∩ S) | Pr(+ ∩ H) | Pr(+) |
| Test - | Pr(- ∩ S) | Pr(- ∩ H) | Pr(-) |
| Marginal | Pr(S) | Pr(H) | |
![Page 5: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/5.jpg)
5
Filling In Our Facts
| | Sick: S | Healthy: H |
|---|---|---|
| Test + | | |
| Test - | | |
| Marginal | Pr(S) = 0.001 | Pr(H) = 0.999 |
![Page 6: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/6.jpg)
Using Conditional Probability
• Pr(+ ∩ H) = Pr(+/H)*Pr(H) = 0.02*0.999 = 0.01998
• Pr(+ ∩ S) = Pr(+/S)*Pr(S) = 0.99*0.001 = 0.00099
![Page 7: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/7.jpg)
7
Filling In Our Facts
| | Sick: S | Healthy: H |
|---|---|---|
| Test + | Pr(+ ∩ S) = 0.00099 | Pr(+ ∩ H) = 0.01998 |
| Test - | | |
| Marginal | Pr(S) = 0.001 | Pr(H) = 0.999 |
![Page 8: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/8.jpg)
8
By Sum and By Difference
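| | Sick: S | Healthy: H | Marginal |
|---|---|---|---|
| Test + | Pr(+ ∩ S) = 0.00099 | Pr(+ ∩ H) = 0.01998 | Pr(+) = 0.02097 |
| Test - | Pr(- ∩ S) = 0.00001 | Pr(- ∩ H) = 0.97902 | Pr(-) = 0.97903 |
| Marginal | Pr(S) = 0.001 | Pr(H) = 0.999 | 1 |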
![Page 9: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/9.jpg)
False Positive Paradox
• Probability of Being Sick If You Test +
• Pr(S/+) ?
• From Conditional Probability:
• Pr(S/+) = Pr(S ∩ +)/Pr(+) = 0.00099/0.02097
• Pr(S/+) = 0.0472
![Page 10: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/10.jpg)
Bayesian Probability By Formula
• Pr(S/+) = Pr(S ∩ +)/Pr(+) = Pr(+/S)*Pr(S)/Pr(+)
• where Pr(+) = Pr(+/S)*Pr(S) + Pr(+/H)*Pr(H)
• Using our facts: Pr(S/+) = 0.99*0.001/[0.99*0.001 + 0.02*0.999]
• Pr(S/+) = 0.00099/[0.00099 + 0.01998]
• Pr(S/+) = 0.00099/0.02097 = 0.0472
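The Bayes calculation above can be checked with a few lines of Python. This is a minimal sketch, not part of the original lecture; the function name and argument names are illustrative assumptions, and the three inputs are the facts stated on the Bayesian Probability slide.

```python
def posterior_sick_given_positive(prior_sick, p_pos_given_sick, p_pos_given_healthy):
    """Return Pr(S/+), the probability of being sick given a positive test."""
    # Joint probabilities: Pr(+ and S), Pr(+ and H)
    joint_pos_sick = p_pos_given_sick * prior_sick
    joint_pos_healthy = p_pos_given_healthy * (1.0 - prior_sick)
    # Marginal probability of a positive test, by sum
    p_pos = joint_pos_sick + joint_pos_healthy
    return joint_pos_sick / p_pos


# Facts from the slides: incidence 1 in 1000, Pr(+/S) = 0.99, Pr(+/H) = 0.02
posterior = posterior_sick_given_positive(0.001, 0.99, 0.02)
print(round(posterior, 4))  # 0.0472
```

Even with a 99% detection rate, fewer than 5% of positive tests come from sick people, because healthy people vastly outnumber sick ones.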
![Page 11: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/11.jpg)
11
Duration Models
• Exploratory (Graphical) Estimates
– Kaplan-Meier
• Functional Form Estimates
– Exponential Distribution
![Page 12: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/12.jpg)
12
Duration of Post-War Economic Expansions in Months
![Page 13: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/13.jpg)
13
| Trough | Peak | Duration (months) |
|---|---|---|
| Oct. 1945 | Nov. 1948 | 37 |
| Oct. 1949 | July 1953 | 45 |
| May 1954 | August 1957 | 39 |
| April 1958 | April 1960 | 24 |
| Feb. 1961 | Dec. 1969 | 106 |
| Nov. 1970 | Nov. 1973 | 36 |
| March 1975 | January 1980 | 58 |
| July 1980 | July 1981 | 12 |
| Nov. 1982 | July 1990 | 92 |
| March 1991 | March 2000 | 120 |
![Page 14: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/14.jpg)
14
Estimated Survivor Function for Ten Post-War Expansions
![Page 15: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/15.jpg)
15
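Kaplan-Meier Estimate of Survivor Function
• At each duration, the fraction surviving = (# at risk - # ending)/# at risk
• Survivor Function = running product of these fractions over all durations up to and including the current one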
![Page 16: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/16.jpg)
16
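| Duration | # Ending | # At Risk | Survivor |
|---|---|---|---|
| 0 | 0 | 10 | 1 |
| 12 | 1 | 10 | 0.9 |
| 24 | 1 | 9 | 0.8 |
| 36 | 1 | 8 | 0.7 |
| 37 | 1 | 7 | 0.6 |
| 39 | 1 | 6 | 0.5 |
| 45 | 1 | 5 | 0.4 |
| 58 | 1 | 4 | 0.3 |
| 92 | 1 | 3 | 0.2 |
| 106 | 1 | 2 | 0.1 |
| 120 | 1 | 1 | 0 |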
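The Kaplan-Meier table can be reproduced from the ten expansion durations. The sketch below assumes no censoring (every expansion in the sample has ended), which is how the slide's table is built; the function name `kaplan_meier` is an illustrative choice, not a library call.

```python
from collections import Counter

# Durations (months) of the ten post-war expansions listed in the lecture
durations = [37, 45, 39, 24, 106, 36, 58, 12, 92, 120]


def kaplan_meier(durations):
    """Return rows (duration, n_ending, n_at_risk, survivor), assuming no censoring."""
    counts = Counter(durations)
    at_risk = len(durations)
    survivor = 1.0
    rows = [(0, 0, at_risk, survivor)]
    for t in sorted(counts):
        ending = counts[t]
        # Multiply by the fraction of those at risk that survive past t
        survivor *= (at_risk - ending) / at_risk
        rows.append((t, ending, at_risk, survivor))
        at_risk -= ending
    return rows


km_rows = kaplan_meier(durations)
for t, ending, at_risk, s in km_rows:
    print(f"{t:>4} {ending:>2} {at_risk:>3} {s:.1f}")
```

With one expansion ending at each distinct duration, the survivor function falls by 0.1 at every step, matching the table.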
![Page 17: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/17.jpg)
17
Figure 2: Estimated Survivor Function for Post-War Expansions (x-axis: Duration in Months, 0 to 140; y-axis: Survivor Function, 0 to 1.2)
![Page 18: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/18.jpg)
Exponential Distribution
• Density: $f(t) = \lambda \exp[-\lambda t]$, $t \ge 0$
• Cumulative Distribution Function F(t)
• $F(t) = \int_0^t f(u)\,du = \int_0^t \lambda \exp[-\lambda u]\,du$
• $F(t) = -\exp[-\lambda u]\,\big|_0^t$
• $F(t) = -1\,\{\exp[-\lambda t] - \exp[0]\}$
• $F(t) = 1 - \exp[-\lambda t]$
• Survivor Function, $S(t) = 1 - F(t) = \exp[-\lambda t]$
• Taking logarithms, $\ln S(t) = -\lambda t$
![Page 19: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/19.jpg)
19
Postwar Expansions
[Figure: Ln Survivor Function plotted against Duration (Months), with fitted line y = -0.0217x + 0.1799, R² = 0.9533]
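Since the log of an exponential survivor function is linear in duration with slope equal to minus the rate parameter, the fitted line on this slide can be sketched as an ordinary least squares fit of ln S on duration. The choice of points below is an assumption: it includes the starting point (0, 1) and drops the 120-month expansion, where the estimated survivor function is 0 and its log is undefined. That choice appears to reproduce the slope and intercept shown on the slide.

```python
import math

# Kaplan-Meier points from the lecture table (the 120-month point has S = 0 and is omitted)
durations = [0, 12, 24, 36, 37, 39, 45, 58, 92, 106]
survivor = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]


def ols_line(x, y):
    """Return (slope, intercept) of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


log_survivor = [math.log(s) for s in survivor]
slope, intercept = ols_line(durations, log_survivor)

hazard_estimate = -slope              # estimate of the exponential rate parameter
implied_mean = 1.0 / hazard_estimate  # implied mean duration in months
print(round(slope, 4), round(intercept, 4))
```

The negative of the slope is an estimate of the rate parameter, so its reciprocal gives the mean expansion length implied by the exponential model.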
![Page 20: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/20.jpg)
Exponential Distribution (Cont.)
• Mean $= 1/\lambda = \int_0^\infty t\, f(t)\,dt$
• Memoryless feature:
• Duration conditional on surviving until $t = \tau$:
• $\mathrm{DURC}(\tau) = \int_\tau^\infty t\, f(t)\,dt \,/\, S(\tau) = \tau + 1/\lambda$
• Expected remaining duration = duration conditional on surviving until time $\tau$, i.e. $\mathrm{DURC}(\tau)$, minus $\tau$
• Or $1/\lambda$, which is equal to the overall mean, so the distribution is memoryless
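The conditional duration result can be verified with one integration by parts. In this worked step, $\lambda$ denotes the rate parameter of the exponential density and $\tau$ the time survived so far; these symbol names are assumptions, since the slide's own symbols did not survive extraction.

```latex
\begin{aligned}
\int_\tau^\infty t\,\lambda e^{-\lambda t}\,dt
  &= \Big[-t\,e^{-\lambda t}\Big]_\tau^\infty + \int_\tau^\infty e^{-\lambda t}\,dt \\
  &= \tau e^{-\lambda\tau} + \frac{1}{\lambda}\,e^{-\lambda\tau}, \\
\mathrm{DURC}(\tau)
  &= \frac{\int_\tau^\infty t\,\lambda e^{-\lambda t}\,dt}{S(\tau)}
   = \frac{e^{-\lambda\tau}\,(\tau + 1/\lambda)}{e^{-\lambda\tau}}
   = \tau + \frac{1}{\lambda}.
\end{aligned}
```

Subtracting $\tau$ leaves $1/\lambda$, the unconditional mean, whatever the value of $\tau$.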
![Page 21: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/21.jpg)
Exponential Distribution (Cont.)
• The hazard rate or function, h(t), is the instantaneous rate of failure conditional on survival until that time, and is the ratio of the density function to the survivor function. It is a constant for the exponential.
• $h(t) = f(t)/S(t) = \lambda \exp[-\lambda t] / \exp[-\lambda t] = \lambda$
![Page 22: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/22.jpg)
22
Model Building
• Reference: Ch 20
![Page 23: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/23.jpg)
23
20.2 Polynomial Models
• There are models where the independent variables (xi) may appear as functions of a smaller number of predictor variables.
• Polynomial models are one such example.
![Page 24: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/24.jpg)
24
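Polynomial Models with One Predictor Variable
• Multiple regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$
• Polynomial model of order p: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_p x^p + \varepsilon$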
![Page 25: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/25.jpg)
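Polynomial Models with One Predictor Variable
• First order model (p = 1): $y = \beta_0 + \beta_1 x + \varepsilon$
• Second order model (p = 2): $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
[Figure: second order curves for $\beta_2 < 0$ and $\beta_2 > 0$]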
![Page 26: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/26.jpg)
Polynomial Models with One Predictor Variable
• Third order model (p = 3): $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$
[Figure: third order curves for $\beta_3 < 0$ and $\beta_3 > 0$]
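A polynomial model is still linear in its coefficients: the regressors are just powers of a single predictor. The sketch below builds those regressors and evaluates the systematic part of the model (error term omitted). The function names and the coefficient values are hypothetical, chosen only for illustration.

```python
def polynomial_terms(x, order):
    """Return the regressors [1, x, x**2, ..., x**order] for one predictor value."""
    return [x ** k for k in range(order + 1)]


def polynomial_mean(x, betas):
    """Evaluate beta_0 + beta_1*x + ... + beta_p*x**p for coefficients betas."""
    terms = polynomial_terms(x, len(betas) - 1)
    return sum(b * t for b, t in zip(betas, terms))


# Hypothetical second order model with beta_2 < 0 (curve opens downward)
betas = [1.0, 2.0, -0.5]
print(polynomial_terms(2.0, 2))     # [1.0, 2.0, 4.0]
print(polynomial_mean(2.0, betas))  # 3.0
```

Feeding the columns produced by `polynomial_terms` into any multiple regression routine fits the polynomial, which is why these models are handled with the same tools as other multiple regressions.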
![Page 27: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/27.jpg)
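Polynomial Models with Two Predictor Variables
• First order model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
[Figure: two plots of y against $x_1$ and $x_2$, one showing $\beta_1 < 0$ versus $\beta_1 > 0$, the other showing $\beta_2 > 0$ versus $\beta_2 < 0$]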
![Page 28: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/28.jpg)
28
20.3 Nominal Independent Variables
• In many real-life situations one or more independent variables are nominal.
• Including nominal variables in a regression analysis model is done via indicator variables.
• An indicator variable (I) can assume one out of two values, “zero” or “one”:
– I = 1 if a first condition out of two is met; 0 if a second condition out of two is met
– I = 1 if data were collected before 1980; 0 if data were collected after 1980
– I = 1 if the temperature was below 50°; 0 if the temperature was 50° or more
– I = 1 if a degree earned is in Finance; 0 if a degree earned is not in Finance
![Page 29: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/29.jpg)
29
Nominal Independent Variables; Example: Auction Car Price (II)
• Example 18.2 - revised (Xm18-02a)
– Recall: A car dealer wants to predict the auction price of a car.
– The dealer now believes that odometer reading and car color are variables that affect a car’s price.
– Three color categories are considered:
  • White
  • Silver
  • Other colors
Note: Color is a nominal variable.
![Page 30: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/30.jpg)
30
Nominal Independent Variables; Example: Auction Car Price (II)
• Example 18.2 - revised (Xm18-02b)
– I1 = 1 if the color is white; 0 if the color is not white
– I2 = 1 if the color is silver; 0 if the color is not silver
The category “Other colors” is defined by: I1 = 0; I2 = 0
![Page 31: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/31.jpg)
31
• Note: To represent the situation of three possible colors we need only two indicator variables.
• Conclusion: To represent a nominal variable with m possible categories, we must create m-1 indicator variables.
How Many Indicator Variables?
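The m-1 rule can be illustrated with a small encoder: one category is chosen as the baseline and gets all zeros, and every other category gets its own indicator. This is a sketch under stated assumptions; the function name `indicator_variables` and the lowercase category labels are illustrative, not taken from the textbook data file.

```python
def indicator_variables(value, categories, baseline):
    """Encode value as m-1 zero/one indicators, omitting the baseline category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    if baseline not in categories:
        raise ValueError(f"unknown baseline: {baseline!r}")
    return [1 if value == c else 0 for c in categories if c != baseline]


colors = ["white", "silver", "other"]  # m = 3 categories
for color in colors:
    # Prints [1, 0] for white, [0, 1] for silver, [0, 0] for other
    print(color, indicator_variables(color, colors, baseline="other"))
```

A third indicator for “other” would always equal 1 - I1 - I2, so it would add no information and would make the regressors perfectly collinear with the intercept.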
![Page 32: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/32.jpg)
32
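Nominal Independent Variables; Example: Auction Car Price
• Solution
– The proposed model is $y = \beta_0 + \beta_1(\text{Odometer}) + \beta_2 I_1 + \beta_3 I_2 + \varepsilon$
– The data

| Price | Odometer | I-1 | I-2 | Category |
|---|---|---|---|---|
| 14636 | 37388 | 1 | 0 | White car |
| 14122 | 44758 | 1 | 0 | White car |
| 14016 | 45833 | 0 | 0 | Other color |
| 15590 | 30862 | 0 | 0 | Other color |
| 15568 | 31705 | 0 | 1 | Silver color |
| 14718 | 34010 | 0 | 1 | Silver color |
| . | . | . | . | |
| . | . | . | . | |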
![Page 33: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/33.jpg)
33
Example: Auction Car Price The Regression Equation
From Excel (Xm18-02b) we get the regression equation
PRICE = 16701 - .0555(Odometer) + 90.48(I-1) + 295.48(I-2)
• The equation for an “other color” car: Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(0) = 16701 - .0555(Odometer)
• The equation for a white color car: Price = 16701 - .0555(Odometer) + 90.48(1) + 295.48(0) = 16791.48 - .0555(Odometer)
• The equation for a silver color car: Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(1) = 16996.48 - .0555(Odometer)
[Figure: Price plotted against Odometer, showing the three parallel lines above]
![Page 34: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/34.jpg)
34
Example: Auction Car Price The Regression Equation
From Excel we get the regression equation
PRICE = 16701 - .0555(Odometer) + 90.48(I-1) + 295.48(I-2)
• A white car sells, on average, for $90.48 more than a car of the “Other color” category.
• A silver color car sells, on average, for $295.48 more than a car of the “Other color” category.
• For one additional mile the auction price decreases, on average, by 5.55 cents.
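The interpretation of the coefficients can be made concrete by evaluating the fitted equation for each color at the same odometer reading. The coefficients below are the ones reported on the slide; the function name and the 40,000-mile reading are hypothetical, chosen only to illustrate the arithmetic.

```python
# Coefficients reported in the lecture's regression equation
INTERCEPT = 16701.0
B_ODOMETER = -0.0555
B_WHITE = 90.48    # coefficient on I-1
B_SILVER = 295.48  # coefficient on I-2


def predicted_price(odometer, color):
    """Predicted auction price for a car with the given odometer reading and color."""
    i1 = 1 if color == "white" else 0
    i2 = 1 if color == "silver" else 0
    return INTERCEPT + B_ODOMETER * odometer + B_WHITE * i1 + B_SILVER * i2


# Hypothetical odometer reading of 40,000 miles
for color in ("other", "white", "silver"):
    print(color, round(predicted_price(40000, color), 2))
```

At any fixed odometer reading the three predictions differ only by the indicator coefficients, which is why the three lines are parallel.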
![Page 35: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/35.jpg)
Example: Auction Car Price The Regression Equation
SUMMARY OUTPUT (Xm18-02b)

| Regression Statistics | |
|---|---|
| Multiple R | 0.8355 |
| R Square | 0.6980 |
| Adjusted R Square | 0.6886 |
| Standard Error | 284.5 |
| Observations | 100 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 17966997 | 5988999 | 73.97 | 0.0000 |
| Residual | 96 | 7772564 | 80964 | | |
| Total | 99 | 25739561 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 16701 | 184.3330576 | 90.60 | 0.0000 |
| Odometer | -0.0555 | 0.0047 | -11.72 | 0.0000 |
| I-1 | 90.48 | 68.17 | 1.33 | 0.1876 |
| I-2 | 295.48 | 76.37 | 3.87 | 0.0002 |

• I-1: There is insufficient evidence to infer that a white color car and a car of “other color” sell for a different auction price.
• I-2: There is sufficient evidence to infer that a silver color car sells for a larger price than a car of the “other color” category.
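The entries of the summary output are linked by simple arithmetic, which the following sketch reproduces from the sums of squares and degrees of freedom reported above. Variable names are illustrative. Small differences from the printed values are expected because the slide shows rounded numbers; for example, the displayed Odometer coefficient and standard error give a ratio near -11.8 rather than the printed -11.72.

```python
import math

# Values read from the ANOVA table on the slide
ss_regression = 17966997
ss_residual = 7772564
df_regression = 3
df_residual = 96

ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total
n_minus_1 = df_regression + df_residual
adjusted_r_square = 1 - (1 - r_square) * n_minus_1 / df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)

# t statistics are coefficient / standard error
t_white = 90.48 / 68.17
t_silver = 295.48 / 76.37

print(round(r_square, 4), round(adjusted_r_square, 4))
print(round(f_statistic, 2), round(standard_error, 1))
print(round(t_white, 2), round(t_silver, 2))
```

The white indicator's t statistic is small relative to its standard error, while the silver indicator's is large, which is the basis for the two conclusions on the slide.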
![Page 36: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/36.jpg)
36
Nominal Independent Variables; Example: MBA Program Admission (MBA II)
• Recall: The Dean wanted to evaluate applications for the MBA program by predicting future performance of the applicants.
• The following three predictors were suggested:
– Undergraduate GPA
– GMAT score
– Years of work experience
• It is now believed that the type of undergraduate degree should be included in the model.
Note: The undergraduate degree is nominal data.
![Page 37: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/37.jpg)
37
Nominal Independent Variables; Example: MBA Program Admission (II)
– I1 = 1 if B.A.; 0 otherwise
– I2 = 1 if B.B.A.; 0 otherwise
– I3 = 1 if B.Sc. or B.Eng.; 0 otherwise
The category “Other group” is defined by: I1 = 0; I2 = 0; I3 = 0
![Page 38: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/38.jpg)
38
Nominal Independent Variables; Example: MBA Program Admission (II)
SUMMARY OUTPUT (MBA-II)

| Regression Statistics | |
|---|---|
| Multiple R | 0.7461 |
| R Square | 0.5566 |
| Adjusted R Square | 0.5242 |
| Standard Error | 0.729 |
| Observations | 89 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 6 | 54.75 | 9.13 | 17.16 | 0.0000 |
| Residual | 82 | 43.62 | 0.53 | | |
| Total | 88 | 98.37 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 0.19 | 1.41 | 0.13 | 0.8930 |
| UnderGPA | -0.0061 | 0.114 | -0.05 | 0.9577 |
| GMAT | 0.0128 | 0.0014 | 9.43 | 0.0000 |
| Work | 0.098 | 0.030 | 3.24 | 0.0017 |
| I-1 | -0.34 | 0.22 | -1.54 | 0.1269 |
| I-2 | 0.71 | 0.24 | 2.93 | 0.0043 |
| I-3 | 0.03 | 0.21 | 0.17 | 0.8684 |
![Page 39: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/39.jpg)
39
20.4 Applications in Human Resources Management: Pay-Equity
• Pay-equity can be handled in two different forms:
– Equal pay for equal work
– Equal pay for work of equal value.
• Regression analysis is extensively employed in cases of equal pay for equal work.
![Page 40: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/40.jpg)
40
Human Resources Management: Pay-Equity
• Solution
– Construct the following multiple regression model: $y = \beta_0 + \beta_1\,\text{Education} + \beta_2\,\text{Experience} + \beta_3\,\text{Gender} + \varepsilon$
– Note the nature of the variables:
  • Education – Interval
  • Experience – Interval
  • Gender – Nominal (Gender = 1 if male; = 0 otherwise).
![Page 41: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/41.jpg)
41
Human Resources Management: Pay-Equity
• Solution – Continued (Xm20-03)
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.8326 |
| R Square | 0.6932 |
| Adjusted R Square | 0.6836 |
| Standard Error | 16274 |
| Observations | 100 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 57434095083 | 19144698361 | 72.29 | 0.0000 |
| Residual | 96 | 25424794888 | 264841613.4 | | |
| Total | 99 | 82858889971 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | -5835.1 | 16082.8 | -0.36 | 0.7175 |
| Education | 2118.9 | 1018.5 | 2.08 | 0.0401 |
| Experience | 4099.3 | 317.2 | 12.92 | 0.0000 |
| Gender | 1851.0 | 3703.1 | 0.50 | 0.6183 |

Analysis and Interpretation
• The model fits the data quite well.
• The model is very useful.
• Experience is a variable strongly related to salary.
• There is no evidence of sex discrimination.
![Page 42: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/42.jpg)
42
• Solution – Continued (Xm20-03)
Human Resources Management: Pay-Equity
Analysis and Interpretation
• Further studying the data we find:
– Average experience (years) for women is 12.
– Average experience (years) for men is 17.
– Average salary for female managers is $76,189.
– Average salary for male managers is $97,832.
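A rough back-of-the-envelope check shows how much of the raw salary gap the experience difference could account for, using the Experience coefficient from the regression output. This is a partial sketch only: the slides do not report average education by gender, so the education term is left out, and the variable names are illustrative.

```python
# Figures reported on the slides
B_EXPERIENCE = 4099.3  # estimated salary increase per year of experience
avg_experience_women = 12
avg_experience_men = 17
avg_salary_women = 76189
avg_salary_men = 97832

raw_gap = avg_salary_men - avg_salary_women
experience_gap_years = avg_experience_men - avg_experience_women
gap_from_experience = B_EXPERIENCE * experience_gap_years
share_explained = gap_from_experience / raw_gap

print(raw_gap)                    # 21643
print(round(gap_from_experience, 1))
print(round(share_explained, 3))
```

Under these assumptions, the five-year experience difference alone accounts for most of the raw gap, which is consistent with the Gender coefficient being statistically insignificant once experience is controlled for.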
![Page 43: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/43.jpg)
43
Midterm Grade Distribution
| Grade | Score | Number of students |
|---|---|---|
| A | 68 and above | 7 |
| A- | 65-67 | 7 |
| B+ | 61-64 | 9 |
| B | 59 and below | 7 |
| Total | | 30 |
![Page 44: Lecture Eleven](https://reader036.vdocuments.us/reader036/viewer/2022070414/56814df9550346895dbb6677/html5/thumbnails/44.jpg)
44
Midterm Grade Distribution: Normal Distribution
[Histogram of MIDTERM scores; x-axis: score, 50 to 75; y-axis: frequency, 0 to 8]
Series: MIDTERM, Sample 1 30, Observations 30

| Statistic | Value |
|---|---|
| Mean | 63.40000 |
| Median | 64.00000 |
| Maximum | 73.00000 |
| Minimum | 49.00000 |
| Std. Dev. | 5.887450 |
| Skewness | -0.666511 |
| Kurtosis | 2.870864 |
| Jarque-Bera | 2.242033 |
| Probability | 0.325948 |

If you scored above the median, A- or A; otherwise B or B+.
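The Jarque-Bera statistic and its probability can be recomputed from the reported skewness and kurtosis. The sketch below uses the standard formula JB = (n/6)[S² + (K - 3)²/4] and the fact that, under normality, JB is approximately chi-square with 2 degrees of freedom, whose upper tail probability is exp(-JB/2). Variable names are illustrative.

```python
import math

# Summary statistics reported for the MIDTERM series
n = 30
skewness = -0.666511
kurtosis = 2.870864

# Jarque-Bera statistic: penalizes skewness and excess kurtosis
jarque_bera = (n / 6) * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)

# Upper tail probability of a chi-square with 2 degrees of freedom
p_value = math.exp(-jarque_bera / 2)

print(round(jarque_bera, 4), round(p_value, 4))
```

A probability of about 0.33 means the hypothesis of normality is not rejected at conventional significance levels, which supports describing the grade distribution as approximately normal.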