applications of regression analysis


Post on 07-May-2022


Page 1: Applications of Regression Analysis

Reading: “ACME Clinic” (ECON 4550 Coursepak, Page 47) and “Big Suzy’s Snack Cakes” (ECON 4550 Coursepak, Page 51)

Within this topic we are going to analyze several different scenarios in which the previously developed basic tools of Regression Analysis can be applied…

Motivation for first two examples…

Recall, primary objective of most private firms: maximize profit

(Profit) = (Revenue) – (Costs): $\pi(q) = q P_D(q) - C(q)$

In practice, how does a firm “know” $P_D(q)$ or even $C(q)$?

1. Estimating an Average Cost Function

• Consider an automobile manufacturer trying to estimate a functional form of $ATC(q)$, based on past realizations of Average Total Costs for different levels of Output
• Assume $ATC(q) = b_0 + b_1 q + b_2 q^2$
• We have data on “Average Costs” and “Quantity of Output” for each of the past 26 weeks as follows:

Average Costs   Quantity      Average Costs   Quantity
39,380          758           32,655          641
36,580          114           59,210          752
29,120          100           31,240          431
33,980          625           41,215          602
51,200          629           18,250          142
71,560          800           12,990          300
34,500          571           17,720          427
36,900          424           17,450          285
32,980          584           19,980          620
24,900          428           51,985          796
18,790          576           33,450          150
56,290          804           24,500          308
18,200          434           14,210          420

• Start by computing some “descriptive statistics” for the variables in our data set: sample mean, sample standard deviation, sample maximum, and sample minimum

• In practice, this partly serves as a “check” to potentially identify any errors in the dataset

Sample Maximum – the largest realized value of a variable

Sample Minimum – the smallest realized value of a variable
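As a minimal sketch of this check in Python (standard library only), the four summary measures can be computed for each column as follows. The names `avg_costs`, `quantity` and `describe` are illustrative, and the values are keyed in from the table above in the order printed.

```python
import statistics

# Weekly data keyed in from the table above (26 observations per column).
avg_costs = [39380, 36580, 29120, 33980, 51200, 71560, 34500, 36900, 32980,
             24900, 18790, 56290, 18200, 32655, 59210, 31240, 41215, 18250,
             12990, 17720, 17450, 19980, 51985, 33450, 24500, 14210]
quantity = [758, 114, 100, 625, 629, 800, 571, 424, 584,
            428, 576, 804, 434, 641, 752, 431, 602, 142,
            300, 427, 285, 620, 796, 150, 308, 420]

def describe(values):
    """Return the four descriptive statistics used in the notes."""
    return {
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "max": max(values),
        "min": min(values),
    }

for name, column in [("Average Costs", avg_costs), ("Quantity", quantity)]:
    print(name, describe(column))
```

A maximum or minimum that looks implausible (a negative quantity, say) would point to a data-entry error before any regression is run.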

Descriptive Statistics:

            Average Costs   Quantity
Mean        33,048          489.27
Std Dev     15,153.12       219.68
Maximum     71,560          804
Minimum     12,990          100

• In order for our data to match the assumed functional form for Average Costs, we need to do a non-linear transformation of “Quantity” (i.e., compute “Quantity Squared” for each observation)
• Regression results from Excel…

Example 1 – Estimating an Average Cost Function

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.826186414
R Square            0.682583991
Adjusted R Square   0.654982599
Standard Error      8900.667707
Observations        26

ANOVA
             df   SS           MS            F             Significance F
Regression   2    3918323443   1959161722    24.73005667   1.85666E-06
Residual     23   1822103369   79221885.63
Total        25   5740426813

                     Coefficients    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept            41208.54946     7596.660953      5.424560833    1.63735E-05   25493.65897    56923.43996
X Variable 1 (q)     -120.6847582    36.01458995      -3.350996315   0.002767997   -195.1866138   -46.18290263
X Variable 2 (q^2)   0.17805629      0.038240066      4.656275685    0.000109633   0.098950686    0.257161894

Estimated equation: $\hat{b}_0 + \hat{b}_1 q + \hat{b}_2 q^2 = 41,208.55 - 120.68q + (0.1781)q^2$

Note, all “p-values” are small enough so that each estimated coefficient is statistically significant at a 1% error level

$R^2 \approx .68258$
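The “Quantity Squared” transformation and the fit itself can be sketched in plain Python. This is an illustration under assumptions, not the Excel routine: the helper names `solve_linear_system` and `fit_quadratic_atc` are hypothetical, the fit solves the normal equations directly, and quantity is rescaled by 100 only to keep the arithmetic well conditioned. The check uses synthetic data generated exactly from a known quadratic, not the 26-week sample.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_quadratic_atc(quantity, avg_cost, scale=100.0):
    """OLS fit of ATC(q) = b0 + b1*q + b2*q^2; returns (b0, b1, b2)."""
    # Non-linear transformation: each row holds 1, q and q squared (rescaled).
    rows = [[1.0, q / scale, (q / scale) ** 2] for q in quantity]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, avg_cost)) for i in range(3)]
    c0, c1, c2 = solve_linear_system(xtx, xty)
    return c0, c1 / scale, c2 / scale ** 2  # undo the rescaling

# Synthetic check: average costs generated exactly from a known quadratic.
qs = [100, 200, 300, 400, 500, 600, 700, 800]
atc = [41208.55 - 120.68 * q + 0.1781 * q ** 2 for q in qs]
b0, b1, b2 = fit_quadratic_atc(qs, atc)
print(b0, b1, b2)
```

Passing the two columns of the 26-week table to `fit_quadratic_atc` is the analogue of the Excel run, provided the cost and quantity values are paired as printed.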


(?) What is the “Efficient Scale of Production” for this firm?

(A) Recall, the Efficient Scale of Production is the quantity of output that minimizes Average Total Costs of Production.

• We have estimated Average Total Costs of Production to be: $ATC(q) = 41,208.55 - 120.68q + (0.1781)q^2$
• From here, we have: $ATC'(q) = -120.68 + (0.3562)q$ and $ATC''(q) = 0.3562$
• $ATC'(q) < 0$ for “small quantities” and $ATC'(q) > 0$ for “large quantities”
• Average Total Costs are minimized where:

$ATC'(q) = 0$

$-120.68 + (0.3562)q = 0$

$(0.3562)q = 120.68$

$q = \frac{120.68}{0.3562} \approx 338.80$

• Thus, the “Efficient Scale of Production” is roughly 339 units of output
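The first-order and second-order conditions above can be checked numerically. The short sketch below uses the rounded coefficient estimates from the regression output; the names `atc` and `q_star` are illustrative.

```python
# Rounded coefficient estimates taken from the regression output above.
b0, b1, b2 = 41208.55, -120.68, 0.1781

def atc(q):
    """Estimated average total cost at output level q."""
    return b0 + b1 * q + b2 * q ** 2

# First-order condition ATC'(q) = b1 + 2*b2*q = 0  =>  q* = -b1 / (2*b2).
q_star = -b1 / (2 * b2)

# Second-order condition: ATC''(q) = 2*b2 must be positive for a minimum.
is_minimum = 2 * b2 > 0

print(round(q_star, 2), is_minimum, round(atc(q_star), 2))
```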

2. Estimating Demand

• Consider a coffee house with retail outlets in 32 markets
• For each market they have data on annual quantity sold, price per unit, average income, and price set by a rival.

Store Number   Quantity Sold   Price   Average Income   Rival Price
1              476,500         2.10    33,560           1.85
2              358,750         2.15    30,120           1.90
3              443,900         2.05    34,250           1.80
4              524,450         2.20    32,340           2.05
5              433,575         2.20    41,750           2.15
6              498,790         1.65    34,250           1.45
7              389,670         2.45    25,690           2.25
8              430,560         2.40    33,240           2.10
9              575,690         2.20    37,800           2.00
10             420,350         2.15    28,900           2.05
11             430,150         2.65    32,450           2.25
12             470,200         2.20    34,150           1.95
13             324,175         2.25    33,225           1.80
14             530,210         1.90    43,750           1.60
15             638,900         1.95    42,990           1.75
16             672,340         1.75    32,785           1.55
17             609,510         2.05    31,140           2.00
18             410,210         2.25    25,670           1.90
19             410,450         2.30    29,310           2.05
20             575,750         1.80    38,800           1.75
21             452,790         2.25    37,725           1.80
22             624,900         1.85    40,050           1.75
23             432,910         2.05    34,800           1.95
24             579,800         2.10    42,500           1.55
25             388,750         1.70    26,700           1.40
26             505,675         1.85    29,750           1.75
27             575,680         2.10    32,000           2.15
28             517,750         1.95    33,540           1.80
29             572,250         1.90    38,765           1.70
30             540,000         2.15    39,975           1.95
31             540,825         1.95    41,200           1.50
32             480,100         2.40    35,800           2.20

Descriptive Statistics:

          Quantity    Price   Income      Rival Price
Mean      494,861     2.09    34,655.47   1.865625
Std Dev   86,754.68   0.23    5,047.94    0.2326001
Maximum   672,340     2.65    43,750      2.25
Minimum   324,175     1.65    25,670      1.4

• Suppose they conjecture that:

$\text{quantity} = A\,(\text{price})^{B_1}(\text{income})^{B_2}(\text{rival\_price})^{B_3}$, where $A$ is a constant

• Note: $\ln(x^a y^b z^c) = a\ln(x) + b\ln(y) + c\ln(z)$
• Thus, the demand relation above can be expressed as:

$\ln(\text{quantity}) = \ln(A) + B_1\ln(\text{price}) + B_2\ln(\text{income}) + B_3\ln(\text{rival\_price})$

$\ln(\text{quantity}) = B_0 + B_1\ln(\text{price}) + B_2\ln(\text{income}) + B_3\ln(\text{rival\_price})$, where $B_0 = \ln(A)$

• We can do a transformation of variables and run a linear regression!
• Regression results from Excel…

Example 2 – Estimating Demand

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.714436775
R Square            0.510419906
Adjusted R Square   0.457964895
Standard Error      0.131195
Observations        32

ANOVA
             df   SS            MS            F             Significance F
Regression   3    0.502454162   0.167484721   9.730622574   0.000145102
Residual     28   0.481939582   0.017212128
Total        31   0.984393744

                                 Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept                        7.001467495    1.764209634      3.96861425     0.000457071   3.38764788     10.61528711
X Variable 1 (ln price)          -1.268085531   0.381958721      -3.319954385   0.002509044   -2.050492504   -0.485678559
X Variable 2 (ln income)         0.630866119    0.165579736      3.810044245    0.000697642   0.291691406    0.970040831
X Variable 3 (ln rival_price)    0.706725885    0.331927927      2.129154635    0.04217271    0.026802348    1.386649421

Estimated equation of $\hat{B}_0 + \hat{B}_1\ln(\text{price}) + \hat{B}_2\ln(\text{income}) + \hat{B}_3\ln(\text{rival\_price})$ is $7.0015 - 1.2681\ln(\text{price}) + 0.6309\ln(\text{income}) + 0.7067\ln(\text{rival\_price})$

Note, all “p-values” are small enough so that each estimated coefficient is statistically significant at a 5% error level; $R^2 \approx .51042$.
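The “take logs, then run a linear regression” step can be sketched as follows. This is a hedged illustration in plain Python rather than the Excel procedure: `solve_linear_system` and `fit_log_log_demand` are hypothetical helper names, and the check uses synthetic quantities generated exactly from a known power law (not the coffee-house data), so the fit should return the exponents that were put in.

```python
import math
from itertools import product

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_log_log_demand(quantity, price, income, rival_price):
    """OLS on the log-transformed variables; returns (B0, B1, B2, B3)."""
    rows = [[1.0, math.log(p), math.log(inc), math.log(r)]
            for p, inc, r in zip(price, income, rival_price)]
    y = [math.log(q) for q in quantity]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic check: eight markets on a 2 x 2 x 2 grid of made-up values.
design = list(product([1.5, 2.5], [25000.0, 45000.0], [1.4, 2.2]))
price, income, rival_price = zip(*design)
true_A, true_B1, true_B2, true_B3 = 1000.0, -1.2, 0.6, 0.7
quantity = [true_A * p ** true_B1 * inc ** true_B2 * r ** true_B3
            for p, inc, r in design]

B0, B1, B2, B3 = fit_log_log_demand(quantity, price, income, rival_price)
print(B0, B1, B2, B3)
```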


• From here, we can essentially “undo” the previous transformation of variables
• Note, since $B_0 = \ln(A)$ and $\hat{B}_0 \approx 7.001467$, it follows that $\hat{A} \approx \exp\{7.001467\} \approx 1,098.24$
• So, our estimated equation is:

$\text{quantity} = A\,(\text{price})^{B_1}(\text{income})^{B_2}(\text{rival\_price})^{B_3}$

$\text{quantity} = 1,098.24\,(\text{price})^{-1.2681}(\text{income})^{0.6309}(\text{rival\_price})^{0.7067}$

• Recognize that “fixing income and rival price,” this demand function is of the “constant elasticity form” => price elasticity of demand is $\varepsilon_p \approx -1.2681$ => Elastic Demand
• Further, Income Elasticity of Demand is $\varepsilon_I \approx 0.6309$ => Normal Good
• And Cross Price Elasticity of Demand (with respect to rival price) is $\varepsilon_{X,p_Y} \approx 0.7067$ => good in question is a Substitute for the good being sold by the rival firm
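The back-transformation and the constant-elasticity property can be verified with a few lines of Python. The sketch below uses the rounded estimates from the regression output; `predicted_quantity` is an illustrative name, and the sample means are used only as a convenient evaluation point.

```python
import math

# Rounded estimates from the log-linear regression above.
B0, B1, B2, B3 = 7.001467, -1.2681, 0.6309, 0.7067
A_hat = math.exp(B0)  # undo the log transform of the intercept

def predicted_quantity(price, income, rival_price):
    """Estimated demand in its original (multiplicative) form."""
    return A_hat * price ** B1 * income ** B2 * rival_price ** B3

# Raise price by 1% with income and rival price held fixed at their sample means.
base = predicted_quantity(2.09, 34655.47, 1.865625)
bumped = predicted_quantity(2.09 * 1.01, 34655.47, 1.865625)

# Elasticity measured as change in ln(quantity) per change in ln(price).
price_elasticity = math.log(bumped / base) / math.log(1.01)
print(round(A_hat, 2), round(price_elasticity, 4))
```

Repeating the calculation at any other price, income or rival price gives the same elasticity, which is what “constant elasticity form” means.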

3. ACME Clinic – Page 47 in Coursepak

ACME Clinic is a health care facility based in a major Midwestern area. It employs a variety of health care professionals including physicians and nurses. In December 2006, one of its nurses, Mr. Jones, filed a lawsuit claiming salary discrimination based on gender. Specifically, he claims that “I am paid less than my peers.” That is, Mr. Jones believes that he is paid less than comparably positioned employees.

In response, the clinic explained that the lower compensation resulted from the claimant’s lack of education. After all, the clinic responds that “Mr. Jones is in the bottom half of the nurses in terms of educational attainment.” They added that “when it comes to compensation, educational attainment matters.” Mr. Jones responded that even if you account for education, he is still underpaid.

You have been retained as a consultant to conduct analysis of the compensation patterns of ACME-employed nurses to find if any such discrepancies exist. You request personnel data and are provided with the attached data set on 20 nurses (including Mr. Jones who appears as “observation 20”). That dataset is attached as “Exhibit A.” You are asked to be prepared to make a presentation on the statistical validity of Mr. Jones’ claim. With that in mind, consider the following questions.

1. Based upon “Exhibit A,” are male nurses paid less than female nurses? If so, by how much? Is that difference statistically significant?

2. What about the clinic’s claim that Mr. Jones is appropriately paid if you account for his below average education? Is that supported by the data? If education is the only determinant of compensation, what is a fair estimate of what Mr. Jones’ salary should be?

3. After conducting your preliminary analysis, you interview supervisors in the clinic and find that years of experience are also highly valued by the clinic. Based on that observation, you request data on the experience of the nurses and receive data contained in “Exhibit B.” How is your analysis altered if you consider experience as a factor that determines compensation? Is Mr. Jones underpaid according to this analysis? Why not?

4. How do you reconcile the apparent contradiction between your answers above?

“Exhibit A”

ID #   Salary   Education   Gender
1      49,380   4           Female
2      33,400   3           Male
3      40,940   1           Female
4      43,440   4           Female
5      24,960   0           Male
6      47,580   5           Female
7      33,400   3           Male
8      37,520   4           Male
9      45,080   2           Female
10     36,820   0           Female
11     43,100   2           Female
12     64,820   2           Female
13     33,980   4           Male
14     43,440   4           Female
15     29,260   2           Male
16     26,940   0           Male
17     53,300   4           Female
18     61,250   5           Female
19     60,750   5           Female
20     29,980   2           Male

(?) How do you evaluate an equation at (Salary)=(49,380), (Education)=(4), (Gender)=(Female)?

(A) Create a “Dummy Variable” which indicates “Gender” for each person in the sample.

Dummy Variable – a variable that indicates whether an observation is characterized by a particular attribute (typically equal to 1 if the attribute is true and equal to 0 otherwise)

Define a “dummy variable” as follows:

$x_{3i} = 1$, if individual i is female

$x_{3i} = 0$, if individual i is not female (i.e., if i is male)

This gives us “Exhibit A” with Gender coded as a dummy variable:

ID #   Salary   Education   Gender
1      49,380   4           1
2      33,400   3           0
3      40,940   1           1
4      43,440   4           1
5      24,960   0           0
6      47,580   5           1
7      33,400   3           0
8      37,520   4           0
9      45,080   2           1
10     36,820   0           1
11     43,100   2           1
12     64,820   2           1
13     33,980   4           0
14     43,440   4           1
15     29,260   2           0
16     26,940   0           0
17     53,300   4           1
18     61,250   5           1
19     60,750   5           1
20     29,980   2           0

Descriptive Statistics:

          Salary      Education   Gender
Mean      41,967      2.8         0.6
Std Dev   11,596.76   1.67        0.50
Maximum   64,820      5           1
Minimum   24,960      0           0

Note: for a dummy variable if we have either a Maximum or Minimum equal to something other than (0) or (1), then we know there is an error in our dataset!
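A small Python sketch of the recoding step, including the 0/1 sanity check described in the note. The list name `gender` and the choice of “Female” as the indicated attribute follow Exhibit A; everything else is illustrative.

```python
# Gender column of Exhibit A, in ID order 1..20.
gender = ["Female", "Male", "Female", "Female", "Male", "Female", "Male",
          "Male", "Female", "Female", "Female", "Female", "Male", "Female",
          "Male", "Male", "Female", "Female", "Female", "Male"]

# Dummy variable: 1 if the attribute ("Female") holds, 0 otherwise.
female = [1 if g == "Female" else 0 for g in gender]

# Data-entry check from the notes: a dummy may only take the values 0 and 1.
assert min(female) in (0, 1) and max(female) in (0, 1), "dummy must be coded 0/1"

# The mean of a 0/1 dummy is the share of observations with the attribute.
share_female = sum(female) / len(female)
print(female, share_female)
```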

1. Based upon “Exhibit A,” are male nurses paid less than female nurses? If so, by how much? Is that difference statistically significant?

• Observe that from the dataset we can compute that the Average Salary of Female nurses is $49,158.33, while the Average Salary of Male nurses is only $31,180.00 => Male nurses are paid $17,978.33 less!

• If we run a regression to estimate the equation salary = b0 + b1(female), we get…

Example 3 – ACME Clinic [Regression (i)]

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.779213999
R Square            0.607174456
Adjusted R Square   0.585350814
Standard Error      7467.528844
Observations        20

ANOVA
             df   SS           MS            F             Significance F
Regression   1    1551458253   1551458253    27.82186741   5.14359E-05
Residual     18   1003751767   55763987.04
Total        19   2555210020

                        Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept               31180          2640.170142      11.80984494   6.52351E-10   25633.20836   36726.79164
X Variable 1 (female)   17978.33333    3408.444997      5.274643818   5.14359E-05   10817.45612   25139.21055

Estimated equation: $\hat{b}_0 + \hat{b}_1(\text{female}) = 31,180 + 17,978.33(\text{female})$

=> if we run a regression with only one “X” variable that happens to be a “dummy,” then $\hat{b}_0$ is equal to “the average value of the observations with (dummy)=(0)” and $\hat{b}_1$ is equal to “the difference between average value of the observations with (dummy)=(1) and average value of the observations with (dummy)=(0)”

Each estimated coefficient is significant at a 1% error level; $R^2 \approx .60717$

• So, based upon the results of this regression, it appears as if Male nurses are paid less ($17,978.33 less!) than Female nurses

• Further, this difference is statistically significant at a .01% error level
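The link between a one-dummy regression and the two group averages can be confirmed directly from Exhibit A. The sketch below uses the textbook closed-form least-squares formulas; `simple_ols` is an illustrative helper name.

```python
# Exhibit A, in ID order 1..20 (female = 1, male = 0).
salary = [49380, 33400, 40940, 43440, 24960, 47580, 33400, 37520, 45080, 36820,
          43100, 64820, 33980, 43440, 29260, 26940, 53300, 61250, 60750, 29980]
female = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

b0, b1 = simple_ols(female, salary)

# Group averages computed without any regression.
mean_male = sum(s for s, f in zip(salary, female) if f == 0) / female.count(0)
mean_female = sum(s for s, f in zip(salary, female) if f == 1) / female.count(1)

print(round(b0, 2), round(b1, 2), round(mean_male, 2), round(mean_female, 2))
```

The intercept matches the male average and the slope matches the female-minus-male gap, which is the result stated after Regression (i).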

2. What about the clinic’s claim that Mr. Jones is appropriately paid if you account for his below average education? Is that supported by the data? If education is the only determinant of compensation, what is a fair estimate of what Mr. Jones’ salary should be?

• To determine the relation between education and salary (assuming education is the only determinant of salary), run a regression on the equation $\text{salary} = b_0 + b_1(\text{education})$. Doing so, we get…

Example 3 – ACME Clinic [Regression (ii)]

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.56362876
R Square            0.317677379
Adjusted R Square   0.279770567
Standard Error      9841.741032
Observations        20

ANOVA
             df   SS            MS            F             Significance F
Regression   1    811732422.3   811732422.3   8.380482559   0.009650965
Residual     18   1743477598    96859866.54
Total        19   2555210020

                           Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept                  31029.73684    4372.308192      7.096877778   1.29141E-06   21843.8582    40215.61549
X Variable 1 (education)   3906.165414    1349.323602      2.894906313   0.009650965   1071.341719   6740.989109

Estimated equation: $\hat{b}_0 + \hat{b}_1(\text{education}) = 31,029.74 + 3,906.17(\text{education})$

Each estimated coefficient is significant at a 1% error level; $R^2 \approx .31768$

• So, based upon the results of this regression, it appears that nurses with more education are paid higher salaries

• Further, Mr. Jones’ education level (of only 2 years) is slightly below the sample mean of (2.8)

• But, by the estimated equation 31,029.74 + 3,906.17(education), the expected salary of a nurse with 2 years of education should be 31,029.74 + 3,906.17(2) = 38,842.08 => Mr. Jones’ salary of only $29,980 is well below this amount

• Thus, the Clinic’s claim that Mr. Jones’ low salary is accounted for by his below average education is not supported by the data
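The education-only regression and the resulting “fair estimate” for Mr. Jones can be reproduced from Exhibit A with the closed-form least-squares formulas. Variable names are illustrative; observation 20 is Mr. Jones, as stated in the case.

```python
# Exhibit A, in ID order 1..20.
salary = [49380, 33400, 40940, 43440, 24960, 47580, 33400, 37520, 45080, 36820,
          43100, 64820, 33980, 43440, 29260, 26940, 53300, 61250, 60750, 29980]
education = [4, 3, 1, 4, 0, 5, 3, 4, 2, 0, 2, 2, 4, 4, 2, 0, 4, 5, 5, 2]

n = len(salary)
x_bar, y_bar = sum(education) / n, sum(salary) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(education, salary))
sxx = sum((x - x_bar) ** 2 for x in education)

slope = sxy / sxx                  # estimated salary gain per year of education
intercept = y_bar - slope * x_bar  # estimated salary at zero years of education

# Mr. Jones is observation 20 (index 19).
jones_education, jones_salary = education[19], salary[19]
expected = intercept + slope * jones_education
shortfall = expected - jones_salary  # positive => paid less than predicted

print(round(intercept, 2), round(slope, 2), round(expected, 2), round(shortfall, 2))
```

With unrounded coefficients the prediction comes out one cent below the 38,842.08 in the notes, which was computed from coefficients rounded to two decimals.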

3. After conducting your preliminary analysis, you interview supervisors in the clinic and find that years of experience are also highly valued by the clinic. Based on that observation, you request data on the experience of the nurses and receive data contained in “Exhibit B.” How is your analysis altered if you consider experience as a factor that determines compensation? Is Mr. Jones underpaid according to this analysis? Why not?

So, we now have “Exhibit B”

ID #   Salary   Education   Female   Experience
1      49,380   4           1        11
2      33,400   3           0        4
3      40,940   1           1        10
4      43,440   4           1        8
5      24,960   0           0        3
6      47,580   5           1        9
7      33,400   3           0        4
8      37,520   4           0        5
9      45,080   2           1        11
10     36,820   0           1        9
11     43,100   2           1        10
12     64,820   2           1        21
13     33,980   4           0        3
14     43,440   4           1        8
15     29,260   2           0        3
16     26,940   0           0        4
17     53,300   4           1        12
18     61,250   5           1        17
19     60,750   5           1        17
20     29,980   2           0        3

Descriptive Statistics:

          Salary      Education   Gender   Experience
Mean      41,967      2.8         0.6      8.6
Std Dev   11,596.76   1.67        0.50     5.26
Max       64,820      5           1        21
Min       24,960      0           0        3

• To determine the relation between salary and all three of the available independent variables, run a regression on the equation $\text{salary} = b_0 + b_1(\text{education}) + b_2(\text{female}) + b_3(\text{experience})$. Doing so, we get…

Example 3 – ACME Clinic [Regression (iii)]

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.997744418
R Square            0.995493923
Adjusted R Square   0.994649034
Standard Error      848.3061073
Observations        20

ANOVA
             df   SS            MS            F             Significance F
Regression   3    2543696048    847898682.7   1178.253594   5.66323E-19
Residual     16   11513972.03   719623.2516
Total        19   2555210020

                            Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept                   19829.49549    443.6150013      44.69978569   3.12608E-18   18889.0737     20769.91728
X Variable 1 (education)    2054.6398      122.41135        16.78471645   1.40007E-11   1795.13933     2314.140269
X Variable 2 (female)       706.5752692    636.476074       1.110136419   0.283346397   -642.6937328   2055.844271
X Variable 3 (experience)   1855.87999     61.49954229      30.17713499   1.56335E-15   1725.506784    1986.253195

Estimated equation: $\hat{b}_0 + \hat{b}_1(\text{education}) + \hat{b}_2(\text{female}) + \hat{b}_3(\text{experience}) = 19,829.50 + 2,054.64(\text{education}) + 706.58(\text{female}) + 1,855.88(\text{experience})$

$R^2 \approx .99549$

However, the coefficient for the “Female dummy variable” is no longer statistically significant (“p-value” of .28335)

• So, based upon the results of this regression, there is no statistically significant difference in salaries of females versus males

• Further, accounting for Mr. Jones’ education level (of only 2 years) and experience (of only 3 years), his expected salary is 19,829.50 + 2,054.64(2) + 706.58(0) + 1,855.88(3) = 29,506.42

• His actual salary of $29,980 is greater than this estimated expected salary (an estimate that takes into account his level of education and experience) => if anything, he’s slightly overpaid
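The three-variable regression and Mr. Jones’ expected salary can be recomputed from Exhibit B. The sketch below is plain Python solving the normal equations, offered as an illustration rather than a replica of the Excel tool; `solve_linear_system` is a hypothetical helper name.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

# Exhibit B, in ID order 1..20.
salary = [49380, 33400, 40940, 43440, 24960, 47580, 33400, 37520, 45080, 36820,
          43100, 64820, 33980, 43440, 29260, 26940, 53300, 61250, 60750, 29980]
education = [4, 3, 1, 4, 0, 5, 3, 4, 2, 0, 2, 2, 4, 4, 2, 0, 4, 5, 5, 2]
female = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
experience = [11, 4, 10, 8, 3, 9, 4, 5, 11, 9, 10, 21, 3, 8, 3, 4, 12, 17, 17, 3]

# Each row: constant term, education, female dummy, experience.
rows = [[1.0, e, f, x] for e, f, x in zip(education, female, experience)]
k = 4
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * y for r, y in zip(rows, salary)) for i in range(k)]
coefs = solve_linear_system(xtx, xty)
b0, b_edu, b_female, b_exp = coefs

# Expected salary for Mr. Jones (observation 20): 2 years education, male, 3 years experience.
expected_jones = sum(c * v for c, v in zip(coefs, rows[19]))
print([round(c, 2) for c in coefs], round(expected_jones, 2), salary[19])
```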

4. How do you reconcile the apparent contradiction between your answers above?

• To answer Question (1) we ran a regression for the equation $\text{salary} = b_0 + b_1(\text{female})$ and found the impact of “female” to be statistically significant
• To answer Question (3) we ran a regression for $\text{salary} = b_0 + b_1(\text{education}) + b_2(\text{female}) + b_3(\text{experience})$ and found the impact of “education” and “experience” to be statistically significant but the impact of “female” to not be statistically significant
• When running this latter regression, we are determining the impact of changes in each independent variable, controlling for differences in each of the other independent variables (recall, for multiple regression the interpretation of each coefficient is along the lines of “all other factors fixed”)

The regression we ran to answer Question (1) suffers from an Omitted Variable Bias, due to the fact that for this population there is a strong, positive correlation between “Female” and “Experience”

• Recall the definition of the Correlation Coefficient between two variables (X and Y): $r_{XY} = \frac{\mathrm{cov}(X,Y)}{s_X s_Y}$
• Calculate the value of the correlation coefficient between each pair of independent variables:

             Education     Female        Experience
Education    1
Female       0.275344396   1
Experience   0.307617041   0.792986037   1

• Correlation Coefficient between “Experience” and “Female” is (.79299), which is “fairly close to the upper bound of (1)”
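The correlation matrix can be recomputed from Exhibit B by applying the definition directly. The function name `correlation` is illustrative; sample (n − 1) versions of the covariance and standard deviations are used.

```python
import math

def correlation(x, y):
    """Sample correlation coefficient r_XY = cov(X, Y) / (s_X * s_Y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)

# Independent variables from Exhibit B, in ID order 1..20.
education = [4, 3, 1, 4, 0, 5, 3, 4, 2, 0, 2, 2, 4, 4, 2, 0, 4, 5, 5, 2]
female = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
experience = [11, 4, 10, 8, 3, 9, 4, 5, 11, 9, 10, 21, 3, 8, 3, 4, 12, 17, 17, 3]

r_edu_female = correlation(education, female)
r_edu_exp = correlation(education, experience)
r_female_exp = correlation(female, experience)
print(r_edu_female, r_edu_exp, r_female_exp)
```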

Omitted Variable Bias – a problem of distorted regression results arising from specifying a model which leaves out one or more important independent variables (i.e., a specification of the true model which is “wrong” because all of the relevant “X” variables were not included)

For such a bias to arise in linear regression, the “omitted variable” must (i) be a true determinant of the dependent variable and (ii) be strongly correlated with one or more of the included independent variables

If such a relevant independent variable is omitted, then the estimated coefficient on the strongly correlated (included) independent variable is partly measuring the impact of the highly correlated omitted variable

For the regression we ran to answer Question (1), this was precisely the case…

• Recall, the specified equation for this regression was $\text{salary} = b_0 + b_1(\text{female})$
• We omitted “Experience,” which is highly correlated with “Female” => when doing so, the estimated coefficient for “Female” is actually providing a measure of both “gender” and the highly correlated “experience”
• Once we include both “Female” and “Experience,” the coefficient on “Female” only measures the impact of “gender” and not the impact of “experience” => from these results we see that “experience” has a statistically significant impact on salary, while “gender” does not
• Thus, the better results in this case are those from the regression which includes all three potential determinants of salary (i.e., the results for the estimation of the equation $\text{salary} = b_0 + b_1(\text{education}) + b_2(\text{female}) + b_3(\text{experience})$, as estimated within our answer to Question 3) => these results do NOT suffer from any “Omitted Variable Bias”
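One way to see how much of the Question (1) coefficient on “Female” is really picking up the omitted variables is a standard least-squares identity, which is not spelled out in the notes: the coefficient from the short regression equals the “Female” coefficient from the full regression plus each omitted variable’s coefficient times the slope from regressing that omitted variable on “Female.” The sketch below applies it to Exhibit B using the coefficients reported for Regression (iii); `slope_on` is an illustrative helper name.

```python
# Coefficients from the full regression (iii), as reported in the Excel output.
b_edu, b_female, b_exp = 2054.6398, 706.5752692, 1855.87999

# Independent variables from Exhibit B, in ID order 1..20.
education = [4, 3, 1, 4, 0, 5, 3, 4, 2, 0, 2, 2, 4, 4, 2, 0, 4, 5, 5, 2]
female = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
experience = [11, 4, 10, 8, 3, 9, 4, 5, 11, 9, 10, 21, 3, 8, 3, 4, 12, 17, 17, 3]

def slope_on(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

# Average extra years of education / experience of female nurses over male nurses.
delta_edu = slope_on(female, education)
delta_exp = slope_on(female, experience)

# Decomposition of the coefficient on "female" from the short regression (i).
via_education = b_edu * delta_edu
via_experience = b_exp * delta_exp
short_coef = b_female + via_education + via_experience

print(round(b_female, 2), round(via_education, 2), round(via_experience, 2),
      round(short_coef, 2))
```

The three pieces add up to the 17,978.33 gap found in Regression (i), and most of that gap runs through experience rather than through gender itself.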