multi col linearity

22
Intro to Multicollinearity.xls This workbook demonstrates perfect and near multicollinearity between two independ It uses a subset of the data from Multireg.xls Excel 2003's LINEST function finds one solution, but doesn't directly warn the use Both the Example1 and the Example2 sheets demonstrate perfect multicollinearity. In Example1, one variable is directly proportional to a second variable. In Example2, there is a slightly more complicated linear relationship between the t The Table sheet uses the data from Example 1 to graphically demonstrate that many d b1 and b2 find the exact same, minimum sum of squared residuals. The NearMulti sheet demonstrates a case of near multicollinearity. The Q&A sheet has questions pertaining to multicollinearity.

Upload: herick-alvenus-willim

Post on 11-Nov-2015

219 views

Category:

Documents


0 download

DESCRIPTION

sdzsdszcdzfd

TRANSCRIPT

Explanation of Multicollinearity

IntroIntro to Multicollinearity.xlsThis workbook demonstrates perfect and near multicollinearity between two independent variables.It uses a subset of the data from Multireg.xlsBoth the Example1 and the Example2 sheets demonstrate perfect multicollinearity.In Example1, one variable is directly proportional to a second variable.In Example2, there is a slightly more complicated linear relationship between the two X variables.The Table sheet uses the data from Example 1 to graphically demonstrate that many different combinations ofb1 and b2 find the exact same, minimum sum of squared residuals.The NearMulti sheet demonstrates a case of near multicollinearity.The Q&A sheet has questions pertaining to multicollinearity.Excel 2003's LINEST function finds one solution, but doesn't directly warn the user about the multicollinearity.

Example1The Data Set, Summary Statistics, and Bivariate Regression AnalysesMultivariate Analysis: A Pivot TableMultivariate Analysis: Multiple RegressionThe Cause of Multicollinearity: A Perfectly Linear Relationship BetweenPriceIncomeQuantity DemandedLinEst QD= f(Price)Change b1 and then use Solver to find a least squares solutionPrice and Income in the Data Set501010.3SlopeInterceptAverage of Quantity DemandedIncomeb0 (Intercept)3.405501011.8Coefficients0.1463.405Price10121416Grand Totalb1 (Price)1LinEst Output601211.50.0674.4335011.0511.05b2 (Income)-4.273SSR27.109b2b1b0601212.6R-Squared0.4392.126RMSE6012.0512.0500.1463.405701416.04.6866df7012.6512.65Price of Heating OilIncome Per CapitaQuantity Demanded of Heating OilPredicted YResidualSquared Residual00.0674.43370149.3Reg SS21.17027.109SSR8015.715.7501010.310.68-0.380.140.4392.1260801615.6Grand Total11.0512.0512.6515.712.8625501011.810.681.121.254.68660801615.8601211.512.14-0.640.4021.17027.1090Summary Statistics601212.612.140.460.22Price (cents /gal.)Income ($1000s/ person)QD (100s gals. /Month)LinEst QD= f(Income)701416.013.592.415.81Mean65.0013.0012.86SlopeIntercept70149.313.59-4.2918.40SD11.952.392.63Coefficients0.7283.405801615.615.040.560.310.3364.433801615.815.040.760.57R-Squared0.4392.126RMSE4.6866dfReg SS21.17027.109SSRExcel's Output In this Example Depends on the VersionScroll Right for Multivariate AnalysesThe bolded zeros tell the user thatExcel 2000 or less can't deal withExcel has "zeroed out" Income.multicollinearity.

Example1

PriceQ=f(P): A Demand Curve?

Table

Quantity DemandedQ=f(Income): An Engel Curve?

Example2

Price (cents/gallon)Income ($1000 p.c./per year)Price and IncomeIncome = 0.2Price

NearMulti

Price (cents/gallon)Income ($1000 p.h./per year)Price and IncomeIncome = 0.2Price

Q&Ab0 (Intercept)3.405b1 (Price)2b2 (Income)-9.27PriceIncomeYFitted YResidualsSquared Residuals501010.310.7-0.40.1501011.810.71.11.3601211.512.1-0.60.4601212.612.10.50.2701416.013.62.45.870149.313.6-4.318.4801615.615.00.60.3801615.815.00.80.6SSR27.1085000036b2 (Income)b1 (Price)1161-4-9-14-19-24-29-22734,827139,227313,227556,827870,0271,252,8271,705,2272,227,227-134,8272734,827139,227313,227556,827870,0271,252,8271,705,2270139,22734,8272734,827139,227313,227556,827870,0271,252,8271313,227139,22734,8272734,827139,227313,227556,827870,0272556,827313,227139,22734,8272734,827139,227313,227556,8273870,027556,827313,227139,22734,8272734,827139,227313,22741,252,827870,027556,827313,227139,22734,8272734,827139,22751,705,2271,252,827870,027556,827313,227139,22734,8272734,82762,227,2271,705,2271,252,827870,027556,827313,227139,22734,82727-2-10123456intercept3.4slope15xyfitted lineresidualsresiduals squared187.315252634218.468.91525263424749.31204564232133.352318783533.499.95231878359990.46603020753115.388173199248.466.98817319924487.4153485622489.514389062663.426.1143890626681.96131611515183.0803842178.4104.6803842110957.982838347694.718726119193.41.31872611911.7390385772794.8746156742108.4-13.5253843258182.93602115978225.8659366572123.4102.465936657210499.2681750394991.6293104605138.4-46.77068953952187.497399998310153.7234327776153.40.32343277760.104608761611180.0987083597168.411.6987083597136.859777286412211.6802010706183.428.2802010706799.769772594313173.5201144842198.4-24.8798855158619.008703278414203.3097495013213.4-10.0902504987101.8131551258SUM45396.134230695

&APage &P

Q&A

1161-4-9-14-19-24-29b1 Priceb2 Income

The Data Set, Summary Statistics, and Bivariate Regression AnalysesMultivariate Analysis: A Pivot TableMultivariate Analysis: Multiple RegressionThe Cause of Multicollinearity: A Perfectly Linear Relationship BetweenPriceIncomeQuantity DemandedLinEst QD= f(Price)Change b1 and then use Solver to find a least squares solutionPrice and Income in the Data Set5097.1SlopeInterceptAverage of Quantity DemandedIncomeb0 (Intercept)-26.5075095.7Coefficients0.265-7.830Price9111315Grand Totalb1 (Price)4.00060115.70.0674.4345011.0511.05b2 (Income)-18.678SSR27.11960116.6R-Squared0.7212.126RMSE6012.0512.05701313.215.4796df7012.6512.65Price of Heating OilIncome Per CapitaQuantity Demanded of Heating OilPredicted YResidualSquared Residual70139.7Reg SS69.96027.119SSR8015.715.75097.15.401.702.91801511.3Grand Total11.0512.0512.6515.712.86255095.75.400.300.09801515.660115.78.04-2.345.48Summary StatisticsLinEst QD= f(Income)60116.68.04-1.442.07Price (cents /gal.)Income ($1000s/ person)QD (100s gals. /Month)SlopeIntercept701313.210.682.526.33Mean65.0012.009.36Coefficients1.323-6.50870139.710.68-0.980.97SD11.952.393.720.3364.103801511.313.33-2.034.12R-Squared0.7212.126RMSE801515.613.332.275.1515.4796dfLINEST results depend on the version of ExcelReg SS69.96027.119SSRLINESTExcel 2003b2b1InterceptCoefficients0.0000.265-7.830Estimated SE0.0000.0674.434R-Squared0.7212.1260.000RMSE15.478787543660dfReg SS69.96027.1190SSRIncome is zeroed out.Earlier Versions of ExcelLINEST gives no explicit warning that there is a problem.

Adjust b2 to Minimize the SSR

PriceQ=f(P): A Demand Curve?

Quantity DemandedQ=f(Income): An Engel Curve?

Price (cents/gallon)Income ($1000 p.c./per year)Price and IncomeIncome = 0.2 Price - 1

Near-multicollinearity and the 3-D Graph of the Sum of Squared Residualsb0 (Intercept)4.3896b1 (Price)-2.1856b2 (Income)11.5689PriceIncomeYFitted YResidualsSquared Residuals501010.310.8-0.50.2501011.810.81.01.0601211.512.1-0.60.3601212.612.10.50.3701416.013.42.67.070149.313.4-4.116.5801615.614.61.00.98016.115.815.80.00.0SSR26.232b2 (Income)b1 (Price)22171272-3-8-4.22734,747139,228313,469557,470871,2321,254,755-3.234,9872634,827139,388313,709557,791871,634-2.2139,54634,9062634,907139,549313,951558,113-1.2313,705139,38534,8262634,988139,710314,192-0.2557,464313,464139,22534,7462735,069139,8720.8870,824557,144313,224139,06534,6672835,1511.81,253,783870,423556,823312,984138,90634,58830We used the data below to construct the 3D graph of SSR's.-4.1856437237-3.1856437237-2.1856437237-1.1856437237-0.18564372370.81435627631.8143562763intercept3.4slope15xyfitted lineresidualsresiduals squared187.315252634218.468.91525263424749.31204564232133.352318783533.499.95231878359990.46603020753115.388173199248.466.98817319924487.4153485622489.514389062663.426.1143890626681.96131611515183.0803842178.4104.6803842110957.982838347694.718726119193.41.31872611911.7390385772794.8746156742108.4-13.5253843258182.93602115978225.8659366572123.4102.465936657210499.2681750394991.6293104605138.4-46.77068953952187.497399998310153.7234327776153.40.32343277760.104608761611180.0987083597168.411.6987083597136.859777286412211.6802010706183.428.2802010706799.769772594313173.5201144842198.4-24.8798855158619.008703278414203.3097495013213.4-10.0902504987101.8131551258SUM45396.134230695

&APage &P

-4.2-3.2-2.2-1.2-0.20.81.8b2 Incomeb1 Price

Q&A for Multicollinearity.xls1. Explain how the two bivariate regression results in Example1 are related to each other. Do the same for Example 2.Using the data below, LINEST reports the regression results below. Answer the questions that follow.X1X2YLINEST ResultsNote: If you are not using Excel 2003, you may get different results224734570162862450000In Excel 2003, the results look like this:28624501002964.5467.5596497033652577600000000000000000180224734552430002144.5327.52759.5432.52759.5432.524523802349.5362.52964.5467.52144.5327.526574152349.5362.52759.5432.5204231024523802144.5327.52759.5432.52759.5432.52. What does the zero mean in the top row (cell F7)?3. Use the regression results to predict the value of Y for the first observation. Show your work and comment on how well the prediction did.4. Does R2 = 1 mean that there is perfect multicollinearity between the X's? Explain.5. Can you recover the linear relationship between X1 and X2? If so, what is it?