
Sociology 7704: Regression Models for Categorical Data
Instructor: Natasha Sarkisian

OLS Regression Assumptions

A1. All independent variables are quantitative or dichotomous, and the dependent variable is quantitative, continuous, and unbounded. All variables are measured without error.
A2. All independent variables have some variation in value (non-zero variance).
A3. There is no exact linear relationship between two or more independent variables (no perfect multicollinearity).
A4. At each set of values of the independent variables, the mean of the error term is zero.
A5. Each independent variable is uncorrelated with the error term.
A6. At each set of values of the independent variables, the variance of the error term is the same (homoscedasticity).
A7. For any two observations, their error terms are not correlated (lack of autocorrelation).
A8. At each set of values of the independent variables, the error term is normally distributed.
A9. The change in the expected value of the dependent variable associated with a unit increase in an independent variable is the same regardless of the specific values of the other independent variables (additivity assumption).
A10. The change in the expected value of the dependent variable associated with a unit increase in an independent variable is the same regardless of the specific values of this independent variable (linearity assumption).

A1-A7 are the Gauss-Markov assumptions: if these assumptions hold, the resulting regression estimates are BLUE (Best Linear Unbiased Estimates).

Unbiased: if we were to calculate that estimate over many samples, the mean of these estimates would be equal to the true population value (i.e., on average we are on target).

Best (also known as efficient): the standard deviation of the estimate is the smallest possible (i.e., not only are we on target on average, but we don’t deviate too far from it).

If A8-A10 also hold, the results can be used appropriately for statistical inference (i.e., significance tests, confidence intervals).

OLS Regression Diagnostics and Remedies

1. Multivariate Normality

OLS is not very sensitive to non-normally distributed errors, but the efficiency of the estimators decreases as the distribution deviates substantially from normal (especially if there are heavy tails). Further, heavily skewed distributions are problematic because they call into question the validity of the mean as a measure of central tendency, and OLS relies on means. Therefore, we usually test whether the distribution of the residuals departs from normality and, if it does, attempt to use transformations to remedy the problem.
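(As a reminder of the transformation step from the screening phase; this snippet is not part of the original example. Stata's ladder and gladder commands compare common power transformations of a variable and can help suggest a candidate transformation for the dependent variable:)

. ladder agekdbrn
. gladder agekdbrn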

To test the normality of the error term's distribution, we first generate a variable containing the residuals:

. reg agekdbrn educ born sex mapres80 age


      Source |       SS       df       MS              Number of obs =    1089
-------------+------------------------------           F(  5,  1083) =   49.10
       Model |  5760.17098     5   1152.0342           Prob > F      =  0.0000
    Residual |   25412.492  1083  23.4649049           R-squared     =  0.1848
-------------+------------------------------           Adj R-squared =  0.1810
       Total |   31172.663  1088  28.6513447           Root MSE      =  4.8441

------------------------------------------------------------------------------
    agekdbrn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .6158833   .0561099    10.98   0.000     .5057869    .7259797
        born |   1.679078   .5757599     2.92   0.004     .5493468    2.808809
         sex |  -2.217823   .3043625    -7.29   0.000     -2.81503   -1.620616
    mapres80 |   .0331945   .0118728     2.80   0.005     .0098982    .0564909
         age |   .0582643   .0099202     5.87   0.000     .0387993    .0777293
       _cons |   13.27142   1.252294    10.60   0.000     10.81422    15.72861
------------------------------------------------------------------------------

. predict resid1, resid
(1676 missing values generated)

Next, we can use any of the tools we used above to evaluate the normality of this variable's distribution. For example, we can construct the qnorm plot:

. qnorm resid1

[Figure: qnorm plot of resid1 (Residuals vs. Inverse Normal)]

In this case, the residuals deviate from normality quite substantially. We could check whether transforming the dependent variable using the transformation we identified above would help us:

. quietly reg agekdbrnrr educ born sex mapres80 age
. predict resid2, resid
(1676 missing values generated)
. qnorm resid2

[Figure: qnorm plot of resid2 (Residuals vs. Inverse Normal)]


Looks much better – the residuals are essentially normally distributed although it looks like there are a few outliers in the tails. We could further examine the outliers and influential observations; we’ll discuss that later.
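In addition to this visual check, a formal test of the residuals can supplement the qnorm plot (these commands are not part of the original example; sktest reports skewness and kurtosis tests for normality, and swilk reports the Shapiro-Wilk test):

. sktest resid2
. swilk resid2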

2. Linearity

We looked at bivariate plots to assess linearity during the screening phase, but bivariate plots do not tell the whole story - we are interested in partial relationships, controlling for all other regressors. We can examine plots of such partial relationships using the mrunning command. Let's download that first:

. search mrunning

Keyword search

        Keywords:  mrunning
          Search:  (1) Official help files, FAQs, Examples, SJs, and STBs

Search of official help files, FAQs, Examples, SJs, and STBs

SJ-5-3   gr0017  . . . . . . . . . . . . .  A multivariable scatterplot smoother
        (help mrunning, running if installed) . . . . . .  P. Royston and N. J. Cox
        Q3/05   SJ 5(3):405--412
        presents an extension to running for use in a multivariable context

Click on gr0017 to install the program. Now we can use it:

. mrunning agekdbrn educ born sex mapres80 age

1089 observations, R-sq = 0.2768

[Figure: mrunning smoothed scatterplots of r's age when 1st child born against each regressor: highest year of school completed, was r born in this country, respondent's sex, mother's occupational prestige score (1980), and age of respondent]

We can clearly see some substantial nonlinearity for educ and age; mapres80 doesn't look quite linear either. We can also run our regression model and examine the residuals. One way to do so would be to plot the residuals against each continuous independent variable:

. lowess resid1 age


[Figure: lowess smoother of the residuals against age of respondent (bandwidth = .8)]

We can detect some nonlinearity in this graph. A more effective tool for detecting nonlinearity in such a multivariate context is the so-called augmented component-plus-residual plot, usually drawn with a lowess curve:

. acprplot age, lowess mcolor(yellow)

[Figure: augmented component-plus-residual plot for age of respondent, with lowess curve]
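The same kind of plot can be produced for the other continuous predictors; for example (illustrative commands, not shown in the original handout):

. acprplot educ, lowess
. acprplot mapres80, lowess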

In addition to these graphical tools, there are also a few tests we can run. One way to diagnose nonlinearities is the so-called omitted variables test. It searches for a pattern in the residuals that could suggest that a power transformation of one of the variables in the model was omitted. To find such factors, it uses either the powers of the fitted values (which means, in essence, powers of the linear combination of all regressors) or the powers of individual regressors in predicting Y. If it finds a significant relationship, this suggests that we probably overlooked some nonlinear relationship.

. ovtest

Ramsey RESET test using powers of the fitted values of agekdbrn
       Ho:  model has no omitted variables
                  F(3, 1080) =      2.74
                  Prob > F   =      0.0423


. ovtest, rhs
(note: born dropped due to collinearity)
(note: sex dropped due to collinearity)
(note: born^3 dropped due to collinearity)
(note: born^4 dropped due to collinearity)
(note: sex^3 dropped due to collinearity)
(note: sex^4 dropped due to collinearity)

Ramsey RESET test using powers of the independent variables
       Ho:  model has no omitted variables
                 F(11, 1074) =     14.84
                  Prob > F   =      0.0000

Looks like we might be missing some nonlinear relationships. We will, however, also explicitly check for linearity for each independent variable. We can do so using the Box-Tidwell test. First, we need to download it:

. net search boxtid
(contacting http://www.stata.com)

3 packages found (Stata Journal and STB listed first)
-----------------------------------------------------

sg112_1 from http://www.stata.com/stb/stb50
    STB-50 sg112_1.  Nonlin. reg. models with power or exp. func. of covar. /
    STB insert by / Patrick Royston, Imperial College School of Medicine, UK; /
    Gareth Ambler, Imperial College School of Medicine, UK. /
    Support: [email protected] and [email protected] /
    After installation, see

We select this first one, sg112_1, and install it. Now use it:

. boxtid reg agekdbrn educ born sex mapres80 age
Iteration 0:  Deviance = 6483.522
Iteration 1:  Deviance = 6470.107 (change = -13.41466)
Iteration 2:  Deviance = 6469.55  (change = -.5577601)
Iteration 3:  Deviance = 6468.783 (change = -.7663782)
Iteration 4:  Deviance = 6468.6   (change = -.1832873)
Iteration 5:  Deviance = 6468.496 (change = -.103788)
Iteration 6:  Deviance = 6468.456 (change = -.0399491)
Iteration 7:  Deviance = 6468.438 (change = -.0177698)
Iteration 8:  Deviance = 6468.43  (change = -.0082658)
Iteration 9:  Deviance = 6468.427 (change = -.0035944)
Iteration 10: Deviance = 6468.425 (change = -.0018104)
Iteration 11: Deviance = 6468.424 (change = -.0008303)
-> gen double Ieduc__1 = X^2.6408-2.579607814 if e(sample)
-> gen double Ieduc__2 = X^2.6408*ln(X)-.9256893949 if e(sample)
   (where: X = (educ+1)/10)
-> gen double Imapr__1 = X^0.4799-1.931881531 if e(sample)
-> gen double Imapr__2 = X^0.4799*ln(X)-2.650956804 if e(sample)
   (where: X = mapres80/10)
-> gen double Iage__1 = X^-3.2902-.0065387933 if e(sample)
-> gen double Iage__2 = X^-3.2902*ln(X)-.009996425 if e(sample)
   (where: X = age/10)
-> gen double Iborn__1 = born-1 if e(sample)
-> gen double Isex__1 = sex-1 if e(sample)
[Total iterations: 33]

Box-Tidwell regression model

      Source |       SS       df       MS              Number of obs =    1089
-------------+------------------------------           F(  8,  1080) =   38.76
       Model |  6953.00253     8  869.125317           Prob > F      =  0.0000
    Residual |  24219.6605  1080  22.4256115           R-squared     =  0.2230
-------------+------------------------------           Adj R-squared =  0.2173
       Total |   31172.663  1088  28.6513447           Root MSE      =  4.7356

------------------------------------------------------------------------------
    agekdbrn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Ieduc__1 |   1.215639   .7083273     1.72   0.086     -.174215    2.605492
    Ieduc_p1 |     .00374   .8606987     0.00   0.997    -1.685091    1.692571
    Imapr__1 |   1.153845    9.01628     0.13   0.898    -16.53757    18.84525
    Imapr_p1 |   .0927861   2.600166     0.04   0.972    -5.009163    5.194736
     Iage__1 |  -67.26803   42.28364    -1.59   0.112    -150.2354    15.69937
     Iage_p1 |  -.4932163   53.49507    -0.01   0.993    -105.4593    104.4728
    Iborn__1 |   1.380925   .5659349     2.44   0.015     .2704681    2.491381
     Isex__1 |  -2.017794    .298963    -6.75   0.000    -2.604408    -1.43118
       _cons |   25.14711   .2955639    85.08   0.000     24.56717    25.72706
------------------------------------------------------------------------------
educ      |   .5613397      .05549    10.116     Nonlin. dev. 11.972  (P = 0.001)
     p1   |    2.64077    .7027411     3.758
------------------------------------------------------------------------------
mapres80  |   .0337813    .0115436     2.926     Nonlin. dev.  0.126  (P = 0.724)
     p1   |   .4798773     1.28955     0.372
------------------------------------------------------------------------------
age       |   .0534185    .0098828     5.405     Nonlin. dev. 39.646  (P = 0.000)
     p1   |  -3.290191    .8046904    -4.089
------------------------------------------------------------------------------
Deviance:  6468.424.

Here, we interpret the last three portions of the output, and more specifically the P values there. P = 0.001 for educ and P = 0.000 for age suggest that there is some nonlinearity with regard to these two variables. Mapres80 appears to be fine. With regard to remedies, the process here is the same as we discussed earlier when talking about bivariate linearity. Once remedies are applied, it is a good idea to retest using these multivariate screening tools.
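For instance, one possible remedy for the nonlinearity in age (an illustration only, not necessarily the transformation we would settle on; the variable name agesq is arbitrary) would be to add a quadratic term and then rerun the diagnostics:

* create a squared term for age and refit the model
. gen agesq = age^2
. reg agekdbrn educ born sex mapres80 age agesq
* retest for remaining nonlinearity
. ovtest
. acprplot age, lowess
* refit the original model before moving on to the outlier diagnostics below
. quietly reg agekdbrn educ born sex mapres80 age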

3. Outliers, Leverage Points, and Influential Observations

A single observation that is substantially different from other observations can make a large difference in the results of regression analysis.  For this reason, unusual observations (or small groups of unusual observations) should be identified and examined. There are three ways that an observation can be unusual:

Outliers: In a univariate context, people often refer to observations with extreme values (unusually high or low) as outliers. But in regression models, an outlier is an observation that has an unusual value of the dependent variable given its values of the independent variables - that is, the relationship between the dependent variable and the independent ones is different for an outlier than for the other data points. Graphically, an outlier is far from the pattern defined by the other data points. Typically, in a regression model, an outlier has a large residual.

Leverage points: An observation with an extreme value (either very high or very low) on a single predictor variable or on a combination of predictors is called a point with high leverage. Leverage is a measure of how far a value of an independent variable deviates from the mean of that variable. In the multivariate context, leverage is a measure of each observation’s distance from the multidimensional centroid in the space formed by all the predictors. These leverage points can have an effect on the estimates of regression coefficients.


Influential Observations: A combination of the previous two characteristics produces influential observations. An observation is considered influential if removing the observation substantially changes the estimates of coefficients. Observations that have just one of these two characteristics (either an outlier or a high leverage point but not both) do not tend to be influential.

Thus, we want to identify outliers and leverage points, and especially those observations that are both, to assess and possibly minimize their impact on our regression model. Furthermore, outliers, even when they are not influential in terms of coefficient estimates, can unduly inflate the error variance. Their presence may also signal that our model failed to capture some important factors (i.e., indicate potential model specification problem).

In the multivariate context, to identify outliers, we want to find observations with high residuals; and to identify observations with high leverage, we can use the so-called hat-values -- these measure each observation’s distance from the multidimensional centroid in the space formed by all the regressors. We can also use various influence statistics that help us identify influential observations by combining information on outlierness and leverage.

To obtain these various statistics in Stata, we use the predict command. Here are some of the values we can obtain using predict, with the rule-of-thumb cutoff values for the statistics used in outlier diagnostics:

Predict option     Result                                              Cutoff value
                                                        (n = sample size, k = parameters)
------------------------------------------------------------------------------------------
xb                 fitted values (linear prediction); the default
stdp               standard error of the linear prediction
residuals          residuals
stdr               standard error of the residual
rstandard          standardized residuals (residuals divided by
                   their standard error)
rstudent           studentized (jackknifed) residuals, recommended      |rstudent| > 2
                   for outlier diagnostics (for each observation,
                   the residual is divided by the standard error
                   obtained from a model that includes a dummy
                   variable for that specific observation)
lev (or hat)       hat values, measures of leverage (diagonal           hat > (2k+2)/n
                   elements of the hat matrix)
*dfits             DFITS, an influence statistic based on               |DFITS| > 2*sqrt(k/n)
                   studentized residuals and hat values
*welsch            Welsch distance, a variation on DFITS                |WelschD| > 3*sqrt(k)
cooksd             Cook's distance, an influence statistic based        CooksD > 4/n
                   on DFITS, indicating the distance between
                   coefficient vectors when the jth observation
                   is omitted
*covratio          COVRATIO, a measure of the influence of the          |COVRATIO - 1| > 3k/n
                   jth observation on the variance-covariance
                   matrix of the estimates
*dfbeta(varname)   DFBETA, a measure of the influence of the jth        |DFBETA| > 2/sqrt(n)
                   observation on each coefficient (the difference
                   between the regression coefficient when the jth
                   observation is included and when it is excluded,
                   divided by the estimated standard error of the
                   coefficient)
------------------------------------------------------------------------------------------
*Note: Starred statistics are only available for the estimation sample; unstarred statistics are available both in and out of sample; type predict ... if e(sample) ... if you want them only for the estimation sample.


So we could obtain and individually examine various outlier and leverage statistics, e.g.,

. predict hats, lev
. predict resid, resid
. predict rstudent, rstudent
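Applying the |rstudent| > 2 rule of thumb from the table above, we could, for example, count and list the potential outliers (an illustrative sketch, not part of the original handout):

* count and list estimation-sample observations with large studentized residuals
. count if abs(rstudent) > 2 & e(sample)
. list id agekdbrn rstudent if abs(rstudent) > 2 & e(sample)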

For instance, we can then find the observations with the highest leverage values:

. sum hats if e(sample), det

                          Leverage
-------------------------------------------------------------
      Percentiles      Smallest
 1%        .00176       .0015777
 5%       .0021025      .0016196
10%       .0023401       .00162        Obs                1089
25%       .0030041      .0016511       Sum of Wgt.        1089

50%       .0041908                     Mean           .0055096
                        Largest        Std. Dev.        .004043
75%        .006332      .0236406
90%        .010143      .0258473       Variance       .0000163
95%       .0155289      .0302377       Skewness       2.466179
99%       .0198167       .038942       Kurtosis       11.40481

. list id hats if hats>.023 & hats~=. & e(sample)

        +-----------------+
        |   id       hats |
        |-----------------|
    3.  | 1934   .0302377 |
   10.  |  112    .038942 |
   17.  | 1230   .0236406 |
 2447.  | 1747   .0258473 |
        +-----------------+

But the best way to graphically examine both leverage values and residuals at the same time is the leverage versus the residuals squared plot (L-R plot) (you can replicate it by creating a scatterplot of hat values and residuals squared):

. lvr2plot, mlabel(id)

[Figure: leverage-versus-residual-squared plot (lvr2plot), observations labeled by id; Leverage on the vertical axis, Normalized residual squared on the horizontal axis]


There are many observations with high leverage and residuals; we would be especially concerned about 112, 1934, 2460, 1452, etc. Added-variable plots (avplots) are another tool we can use to identify outliers and leverage points - in this case, we can see them in relation to the slopes. Note that you can also obtain these plots one by one using the avplot command, e.g., avplot educ, mlabel(id)

. avplots, mlabel(id)

[Figure: added-variable plots of agekdbrn against each regressor, observations labeled by id]
  e(agekdbrn | X) vs. e(educ | X):      coef = .6158833,   se = .05610987, t = 10.98
  e(agekdbrn | X) vs. e(born | X):      coef = 1.6790781,  se = .57575994, t = 2.92
  e(agekdbrn | X) vs. e(sex | X):       coef = -2.2178232, se = .30436248, t = -7.29
  e(agekdbrn | X) vs. e(mapres80 | X):  coef = .03319454,  se = .01187284, t = 2.8
  e(agekdbrn | X) vs. e(age | X):       coef = .05826431,  se = .00992019, t = 5.87

Observation #2460 is the first one that looks especially suspicious - that's an outlier, a high residual observation; same thing with 1305. Looks like these are people who had their first child very late in life. As for high leverage observations, not too many stand out on this graph, although #112 might be one – looks like that might be a foreign born individual with very little education who had their first child relatively late in life.

To supplement these graphs, we can use a number of influence statistics that combine information on outlier status and leverage -- DFITS, Welsch's D, Cook's D, COVRATIO, and DFBETAs. It is usually a good idea to obtain a range of those to decide which cases are really problematic.
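For reference, the remaining statistics from the table above could be generated right after the regression along these lines (a sketch; the new variable names dfits1, welsch1, and covr1 are arbitrary):

* influence statistics combining residual and leverage information
. predict dfits1 if e(sample), dfits
. predict welsch1 if e(sample), welsch
. predict covr1 if e(sample), covratio
* dfbeta with no arguments creates one _dfbeta_* variable per regressor
. dfbeta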

It makes sense to list the values of your dependent and independent variables for those observations that have values of these measures above the suggested cutoffs. E.g., we get Cook's D (based on hat values and standardized residuals):

. predict cooksd if e(sample), cooksd


Don’t forget to specify “if e(sample)” here – Cook’s D is available out of sample as well!

NOTE: if you already generated a variable with this name (e.g. cooksd) but want to reuse the name, just use the drop command first: e.g., drop cooksd

Now we list those observations with high Cook's distance. The cutoff is 4/n, so in this case it's 4/1089 = .00367309.
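(The cutoff could also be computed from the stored estimation results rather than typed by hand; e(N) holds the number of observations from the last regression. A small illustrative sketch, not part of the original handout:)

. display 4/e(N)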

. sort cooksd

. list id agekdbrn educ born sex mapres80 age cooksd if cooksd>=4/1089 & cooksd~=.

       +--------------------------------------------------------------------+
       |   id   agekdbrn   educ   born      sex   mapres80   age     cooksd |
       |--------------------------------------------------------------------|
 1031. | 1394         30     15     no   female         28    33   .0036766 |
 1032. |   63         19     19    yes   female         34    64    .003683 |
 1033. | 2484         37     17    yes   female         52    56   .0037003 |
 1034. | 1906         29     10     no     male         23    39   .0037224 |
 1035. |  994         38     15    yes   female         33    41    .003788 |
       |--------------------------------------------------------------------|
 1036. |   22         19     12     no     male         44    23   .0038182 |
 1037. | 1402         37     12    yes     male         33    42   .0038667 |
 1038. |  742         36     13    yes     male         28    39   .0038726 |
 1039. |  366         37     17    yes     male         66    44   .0041899 |
 1040. | 2265         39     17    yes     male         52    55    .004212 |
       |--------------------------------------------------------------------|
 1041. | 2703         16     16    yes     male         23    45    .004219 |
 1042. | 1284         17     12    yes   female         64    76   .0043403 |
 1043. | 2764         35     12    yes     male         23    75   .0044005 |
 1044. | 1114         39     12    yes   female         46    46   .0044603 |
 1045. | 2653         38     12    yes     male         32    43   .0044713 |
       |--------------------------------------------------------------------|
 1046. |  322         13     16    yes   female         20    38   .0044766 |
 1047. |  352         16      9     no   female         44    49   .0045471 |
 1048. | 1382         39     12    yes     male         35    45   .0045595 |
 1049. | 1990         42     13    yes   female         34    46   .0046982 |
 1050. |  514         16     11     no   female         40    42   .0047655 |
       |--------------------------------------------------------------------|
 1051. | 1186         30     12     no   female         30    44   .0049131 |
 1052. |  669         37     18    yes   female         32    49    .005042 |
 1053. | 1428         17     20    yes   female         32    28   .0052439 |
 1054. |  753         35     13    yes   female         17    51   .0053052 |
 1055. |  797         34     12    yes   female         35    83   .0054951 |
       |--------------------------------------------------------------------|
 1056. |  126         38     15    yes   female         28    65   .0056446 |
 1057. | 1824         41     16    yes     male         34    49   .0058367 |
 1058. |    6         40     12    yes     male         29    47   .0059349 |
 1059. |  447         26      6     no   female         23    55   .0060603 |
 1060. | 1549         32     14     no   female         66    34   .0061423 |
       |--------------------------------------------------------------------|
 1061. | 1066         32     13     no   female         47    40   .0062896 |
 1062. |  612         36     18    yes   female         23    73   .0063017 |
 1063. |  508         18     14     no   female         64    40   .0064009 |
 1064. | 1747         24     17     no     male         86    36   .0065845 |
 1065. | 1189         39     16    yes     male         23    62   .0066001 |
       |--------------------------------------------------------------------|
 1066. |  773         37     20    yes   female         28    54   .0070942 |
 1067. | 2545         42     18    yes     male         46    54   .0072636 |
 1068. | 1709         38     20    yes   female         35    47   .0073801 |
 1069. |  541         35     18     no   female         46    37   .0075467 |
 1070. |  524         16     19    yes     male         42    34   .0075767 |
       |--------------------------------------------------------------------|
 1071. |  430         35     18     no   female         44    38   .0075794 |
 1072. | 1194         21     17     no   female         66    60   .0079331 |
 1073. |  435         19     12     no     male         36    67   .0079604 |
 1074. | 1172         33     14     no   female         32    39   .0080491 |
 1075. |  411         21     18     no     male         51    30   .0082472 |
       |--------------------------------------------------------------------|
 1076. | 1952         31     12     no   female         20    40   .0083125 |
 1077. | 1575         34     12     no     male         64    34   .0090088 |
 1078. | 1934         25      0    yes     male         23    89    .009117 |
 1079. | 1711         27      2    yes     male         36    69   .0093139 |
 1080. |  114         37     12    yes   female         66    47   .0096068 |
       |--------------------------------------------------------------------|
 1081. | 2156         25      2    yes     male         20    33   .0104581 |
 1082. |  527         22     20     no     male         44    43   .0112643 |
 1083. | 2362         36     12    yes   female         64    83   .0117106 |
 1084. | 1305         44     12    yes     male         56    53   .0125958 |
 1085. | 2415         35      7    yes   female         42    48   .0133718 |
       |--------------------------------------------------------------------|
 1086. | 1982         37      8    yes     male         30    83   .0139673 |
 1087. | 1452         41     16     no     male         36    47   .0191272 |
 1088. | 2460         50     16    yes     male         64    62   .0251248 |
 1089. |  112         32      2     no     male         63    38   .0434919 |
       +--------------------------------------------------------------------+

That's quite a few; the largest Cook's D values belong to observations 112, 2460, and 1452. All of those stood out in the graphs as well, so we want to investigate them, but first we might want to examine other indices (e.g. DFITS, COVRATIO, etc.) as well. In the end, we want to identify and further investigate those observations that are consistently problematic across a range of diagnostic tools.
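Similarly, DFBETAs could be screened against their cutoff, 2/sqrt(n) = 2/sqrt(1089) = 2/33, or about .061 here. A sketch, assuming dfbeta has been run as shown above so that the _dfbeta_* variables exist (one per regressor, in the order the regressors entered the model):

* e(N) still holds the estimation sample size from the last regression
. list id _dfbeta_1 if abs(_dfbeta_1) > 2/sqrt(e(N)) & e(sample)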

E.g., we can combine the information on high leverage, high studentized residuals, and Cook's D:

. scatter hats rstudent [w=cooksd], mfc(white)

[Figure: scatterplot of leverage (hats) against studentized residuals, marker size weighted by Cook's D]

To identify problematic observations, let's replace the circles with ID numbers:

. scatter hats rstudent [w=cooksd], mlabel(id)


[Figure: leverage (hats) vs. studentized residuals, observations labeled by id, marker size weighted by Cook's D]

1649

675

1005

1588

1917

14945102357

2059

2065

1181

1752

128217151259

1176

2200845

372

632 24421250

4622591

1761

425 1394

2353126

1972

707 1827

1217

1877

10011460 1882617

24142611500 21441717 940 1604

294

2336

994

1965326

18731585

1829499

7241169

1654322

288

457

769

194

636 24411470

6381630

11891586209211822142

2339

23341390889

2440

230926491897

21291171

1089

1188

371

1175

2264 1282288 362

14023371631

178

241872 1824683

1838

2197138618102502

2625

18701539

2460

271027282741

25611892

857883274

2703979

1245

16932737776

1132 10201541793

235

2457

682

25374272209

22853041492 1082

627

138958 10221215

1789862

2550686

1971

17792320

231

34113041146

538

498

17002449262321982551143

1811

657

1742

2119 18233732227

3761798

77117915821671840

890

1875

1452

1625714

1491455

6761650 687

99

2439 2791713

22651329

1110 366

24841099

1767

2511

1194

96

1911

1747

123943 19761957

24342292

26441719 2046

602369

480

2361501

1733

205 320

666

13031806

1420

15681096

1449

18042545284

1051

1787

1518

77725661485

2114 8882040

1534

20961656 296

430

2338788

2022

309

1306286276914

958

2061 2060

1049

411

1088500607

967 18791400 806

613

612

13

541

2485

1819669

63 287436

13371056524

1932

21401774

16591895

15271488

11611359

179110761709

522 1311483

773

527

11743391428

165355935

8862083 685 282

0.0

1.0

2.0

3.0

4Le

vera

ge

-4 -2 0 2 4Studentized residuals

Another set of index measures of influence, DFBETAs, focuses on one regression coefficient at a time. A DFBETA is a normalized measure of the effect of each specific observation on a regression coefficient, estimated by omitting each observation and comparing the resulting coefficient to the coefficient with that observation included in the data. A positive DFBETA value indicates that an observation increases the value of the coefficient; a negative value indicates a decrease in the coefficient due to that observation.
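To make the idea concrete, here is a minimal sketch of what a single DFBETA value captures, approximating DFbeta(educ) for case 112 by refitting the model without it. This is only an illustration; the dfbeta command below uses a slightly different computational formula and scaling:

. qui reg agekdbrn educ born sex mapres80 age
. scalar b_full = _b[educ]
. qui reg agekdbrn educ born sex mapres80 age if id~=112
. di "approximate DFbeta(educ) for id 112: " (b_full-_b[educ])/_se[educ]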

. dfbeta
(1676 missing values generated)
        DFeduc: DFbeta(educ)
(1676 missing values generated)
        DFborn: DFbeta(born)
(1676 missing values generated)
         DFsex: DFbeta(sex)
(1676 missing values generated)
    DFmapres80: DFbeta(mapres80)
(1676 missing values generated)
         DFage: DFbeta(age)

. di 2/sqrt(1089)

.06060606

. scatter DFage DFsex DFborn DFeduc DFmapres80 id, yline(.06 -.06) mlabel(id id id id id)

[Graph: DFbeta values for age, sex, born, educ, and mapres80 plotted against respondent id, points labeled with id; reference lines at .06 and -.06]

Observations 112 and 2460 seem to have influence on a number of coefficients; others seem to have effects on specific coefficients, so we need to look into those that have particularly large effects.

Remedies:
Once you have detected influential data points, you need to decide what to do with them. Typically, non-influential outliers and leverage points do not concern us much, although outliers do increase error variance. We also want to watch out for clusters of outliers, which may suggest an omitted variable. But influential points can have dramatic effects, and we definitely want to investigate those. Once we find them, there is no single clear-cut solution. They should not be ignored, but neither should they be automatically deleted. Typically, the presence of an influential point means one of the following:
A. Our model is correct, and the influential point can be attributed to some kind of measurement error.
B. The value of the influential point is observed correctly, but our model cannot accommodate it well. Possible reasons: (a) the relationship between the dependent and the independent variable is not linear in the interval of values that includes the influential point; (b) there is another explanatory variable that can help account for that influential point; (c) the model has heteroskedasticity problems.
Unfortunately, it is often not possible to determine which of these is the case. But here's what you can do:

1. You have to investigate what makes these data points unusual: make sure that you examine their values on all of the variables you use. This will help identify potential data entry errors or might provide other clues as to why these data points are unusual. E.g., we could check #112:

. list agekdbrn educ born sex mapres80 age if id==112

     +------------------------------------------------+
     | agekdbrn   educ   born    sex   mapres80   age |
     |------------------------------------------------|
 10. |       32      2     no   male         63    38 |
     +------------------------------------------------+


Let's also get averages for all variables to compare:

. sum agekdbrn educ born sex mapres80 age if e(sample)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    agekdbrn |      1089    23.66483    5.352695         11         50
        educ |      1089     13.3168    2.719027          0         20
        born |      1089    1.070707    .2564527          1          2
         sex |      1089    1.624426    .4844932          1          2
    mapres80 |      1089    39.44077    12.95284         17         86
         age |      1089     46.1258    15.06822         19         89

2. If you are considering omitting unusual data, you should investigate whether omitting these data points changes the results of your regression model. Try omitting them one by one and compare the coefficients with and without them: are there large changes? Let’s check what happens if we omit #112:. reg agekdbrn educ born sex mapres80 age, beta Source | SS df MS Number of obs = 1089-------------+------------------------------ F( 5, 1083) = 49.10 Model | 5760.17098 5 1152.0342 Prob > F = 0.0000 Residual | 25412.492 1083 23.4649049 R-squared = 0.1848-------------+------------------------------ Adj R-squared = 0.1810 Total | 31172.663 1088 28.6513447 Root MSE = 4.8441------------------------------------------------------------------------------ agekdbrn | Coef. Std. Err. t P>|t| Beta-------------+---------------------------------------------------------------- educ | .6158833 .0561099 10.98 0.000 .3128524 born | 1.679078 .5757599 2.92 0.004 .0804462 sex | -2.217823 .3043625 -7.29 0.000 -.2007438 mapres80 | .0331945 .0118728 2.80 0.005 .0803266 age | .0582643 .0099202 5.87 0.000 .1640182 _cons | 13.27142 1.252294 10.60 0.000 .------------------------------------------------------------------------------. reg agekdbrn educ born sex mapres80 age if id~=112, beta Source | SS df MS Number of obs = 1088-------------+------------------------------ F( 5, 1082) = 50.04 Model | 5841.74787 5 1168.34957 Prob > F = 0.0000 Residual | 25261.3762 1082 23.3469281 R-squared = 0.1878-------------+------------------------------ Adj R-squared = 0.1841 Total | 31103.1241 1087 28.6137296 Root MSE = 4.8319------------------------------------------------------------------------------ agekdbrn | Coef. Std. Err. t P>|t| Beta-------------+---------------------------------------------------------------- educ | .63726 .0565958 11.26 0.000 .3214802 born | 1.515919 .5778803 2.62 0.009 .0722698 sex | -2.187693 .3038273 -7.20 0.000 -.1980863 mapres80 | .030491 .0118905 2.56 0.010 .0737543 age | .0583569 .0098953 5.90 0.000 .1644404 _cons | 13.20334 1.249428 10.57 0.000 .

The actual effect of that observation on the coefficients of educ, mapres80, and born is quite small; for each, the beta changes by only about 0.01. Also, try omitting the most persistent influential points as a group and examine the effects (see the sketch below). If there are large changes in coefficients, you might use that to justify omitting a few (but only very few) observations from the model; but you will also have to explain what is so special about these cases.
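For instance, here is a minimal sketch of dropping the three most consistently flagged cases (112, 2460, and 1452) as a group and comparing the coefficients side by side (estimates store and estimates table are standard Stata commands):

. qui reg agekdbrn educ born sex mapres80 age
. estimates store full
. qui reg agekdbrn educ born sex mapres80 age if !inlist(id,112,2460,1452)
. estimates store dropped3
. estimates table full dropped3, b(%9.4f) se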


3. To reduce the incidence of high leverage points, consider transforming skewed variables and/or topcoding/bottomcoding variables to bring univariate outliers closer to the rest of the distribution (e.g., recoding incomes above $100,000 to $100,000 so that these high values do not stand out), as we did when we discussed data screening (if that was done at that stage, it reduces the chances that problems emerge in a multivariate context).
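A minimal sketch of topcoding, using a hypothetical dollar-denominated variable income that is not part of the variables used here (the income < . condition keeps missing values missing):

. * hypothetical variable "income", for illustration only
. replace income = 100000 if income > 100000 & income < .
. * or keep the original and create a topcoded copy:
. gen income_top = min(income, 100000) if income < .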

4. If unusual data come in clusters, you may have to introduce another variable to control for their unusualness, or you might want to deal with them in a separate regression model.

5. Robust regression is another option when one observes substantial problems with influential data. The Stata rreg command performs a robust regression using iteratively reweighted least squares, i.e., assigning a weight to each observation with higher weights given to better behaved observations, while extremely unusual data can have their weights set to zero so that they are not included in the analysis at all.

. rreg agekdbrn educ born sex mapres80 age, gen(wt)Robust regression Number of obs = 1089 F( 5, 1083) = 52.34 Prob > F = 0.0000------------------------------------------------------------------------------ agekdbrn | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- educ | .6518023 .0539119 12.09 0.000 .5460186 .7575859 born | 1.792079 .5532063 3.24 0.001 .7066014 2.877556 sex | -2.012778 .29244 -6.88 0.000 -2.586591 -1.438965 mapres80 | .0275798 .0114078 2.42 0.016 .005196 .0499637 age | .0522715 .0095316 5.48 0.000 .033569 .070974 _cons | 12.34444 1.203239 10.26 0.000 9.983493 14.70538------------------------------------------------------------------------------

. sum wt, det Robust Regression Weight------------------------------------------------------------- Percentiles Smallest 1% .2138941 0 5% .5965052 .000736310% .7419349 .0035576 Obs 108925% .8782627 .0726816 Sum of Wgt. 1089

50% .9564363 Mean .9001565 Largest Std. Dev. .151333775% .988214 .999999890% .9983087 .9999999 Variance .022901995% .9996306 1 Skewness -2.92681499% .9999847 1 Kurtosis 12.98754
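Because we saved the weights with gen(wt), we can also inspect which cases rreg downweighted the most. A quick sketch (note that sort reorders the dataset; cases with a weight of zero were effectively excluded from the robust fit):

. sort wt
. list id agekdbrn educ born sex mapres80 age wt in 1/10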

Comparing the robust regression results with the OLS results on the previous page, we see that even though there are a few small differences, the coefficients, standard errors, and p-values are quite similar. Despite the minor problems with influential data that we observed while doing our diagnostics, the robust regression analysis yielded quite similar results, suggesting that these problems are indeed minor. If the results of OLS and robust regression were substantially different, we would need to further investigate what problems in our OLS model caused the difference. If it is impossible to resolve such problems, then the robust regression results should be viewed as more trustworthy.


4. Additivity.

First and foremost, we should always use our theory insights to consider the need for interactions. We can have interactions between dummies (or sets of dummies), a dummy (or a set of dummies) and a continuous variable, or two continuous variables. To avoid multicollinearity problems, you should code your dummies 0/1 and mean-center those continuous variables that are involved in interaction terms.

. gen sexd=sex-1

. gen bornd=born-1
(6 missing values generated)

. for var age educ mapres80: sum X \ gen Xmean=X-r(mean)

-> sum age
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |      2751    46.28281    17.37049         18         89
-> gen agemean=age-r(mean)
(14 missing values generated)

-> sum educ
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        educ |      2753    13.36397    2.973924          0         20
-> gen educmean=educ-r(mean)
(12 missing values generated)

-> sum mapres80
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    mapres80 |      1619    40.96912    13.63189         17         86
-> gen mapres80mean=mapres80-r(mean)
(1146 missing values generated)

A user-written program “fitint” helps find statistically significant two-way interactions, so it can be used as a diagnostic tool.

. net search fitint
Click on: fitint from http://fmwww.bc.edu/RePEc/bocode/f

. fitint reg agekdbrn bornd sexd agemean educmean mapres80mean, twoway(bornd sexd agemean educmean mapres80mean) factor(bornd sexd) Source | SS df MS Number of obs = 1089-------------+------------------------------ F( 15, 1073) = 17.65 Model | 6169.67284 15 411.311523 Prob > F = 0.0000 Residual | 25002.9902 1073 23.301948 R-squared = 0.1979-------------+------------------------------ Adj R-squared = 0.1867 Total | 31172.663 1088 28.6513447 Root MSE = 4.8272------------------------------------------------------------------------------ agekdbrn | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- _Ibornd_1 | 1.710533 .9779923 1.75 0.081 -.2084614 3.629527 _Isexd_1 | -2.21852 .3179507 -6.98 0.000 -2.842395 -1.594644 agemean | .0587138 .0171439 3.42 0.001 .0250744 .0923532 educmean | .4551926 .0888308 5.12 0.000 .2808908 .6294943mapres80mean | .033156 .0203674 1.63 0.104 -.0068085 .0731205 _Ibornd_1 | (dropped) _Isexd_1 | (dropped)_IborXsex_~1 | .1211157 1.271076 0.10 0.924 -2.372961 2.615193


_Ibornd_1 | (dropped) agemean | (dropped)_IborXagem~1 | .0048469 .0568729 0.09 0.932 -.1067477 .1164415 _Ibornd_1 | (dropped) educmean | (dropped)_IborXeduc~1 | -.2922046 .210566 -1.39 0.166 -.7053724 .1209631 _Ibornd_1 | (dropped)mapres80mean | (dropped)_IborXmapr~1 | .0046759 .0414082 0.11 0.910 -.0765743 .0859261 _Isexd_1 | (dropped)_IsexXagem~1 | -.0031427 .0207363 -0.15 0.880 -.043831 .0375455 _Isexd_1 | (dropped)_IsexXeduc~1 | .391932 .1146716 3.42 0.001 .1669259 .6169381 _Isexd_1 | (dropped)_IsexXmapr~1 | -.0005186 .024932 -0.02 0.983 -.0494397 .0484024 __13_6 | -.0038885 .0038209 -1.02 0.309 -.0113858 .0036088 __14_6 | .0004487 .0008266 0.54 0.587 -.0011732 .0020706 __15_6 | .0033919 .0044236 0.77 0.443 -.005288 .0120717 _cons | 24.98069 .2579745 96.83 0.000 24.4745 25.48688------------------------------------------------------------------------------------------------------------------------------------------------------Fitting and testing any interactions and any main effects not includedin interaction terms using the ratio of the mean square error of eachterm and the residual mean square error to obtain an F ratio statistic------------------------------------------------------------------------Model summaryNumber of observations used in estimation: 1089Regression command: regressDependent variable: agekdbrnResidual MSE: 23.30degrees of freedom: 1073------------------------------------------------------------------------ Term | Mean square F ratio df1 df2 P>F-------------------+---------------------------------------------------- i.bornd*i.sexd | 0.21 0.01 1 1073 0.9241 i.bornd*agemean | 0.17 0.01 1 1073 0.9321 i.bornd*educmean | 44.87 1.93 1 1073 0.1655 i.bornd*mapres80mean| 0.30 0.01 1 1073 0.9101 i.sexd*agemean | 0.54 0.02 1 1073 0.8796 i.sexd*educmean | 272.21 11.68 1 1073 0.0007 i.sexd*mapres80mean| 0.01 0.00 1 1073 0.9834 agemean*educmean | 24.13 1.04 1 1073 0.3091 agemean*mapres80mean| 6.87 0.29 1 1073 0.5874 educmean*mapres80mean| 13.70 0.59 1 1073 0.4434------------------------------------------------------------------------It appears that when all twoway interactions are tested simultaneously, the only one that is statistically significant is sex by education. We could also check each two-way interaction separately to make sure we did not miss anything by testing all simultaneously:

. for X in var bornd sexd agemean educmean mapres80mean: for Y in var bornd sexd agemean educmean mapres80mean: fitint reg agekdbrn bornd sexd agemean educmean mapres80mean, twoway(Y X) factor(bornd sexd)[output omitted]

Note that you should always include main effect variables in addition to the interaction, because the interaction term can only be interpreted together with that main effect. Further, if you want to explore three-way interactions, the model should also include all possible two-way interactions in addition to main terms. For example:

. gen bornsex=bornd*sexd
(6 missing values generated)


. gen borneduc=bornd*educmean(13 missing values generated). gen educsex=educmean*sexd(12 missing values generated). gen educsexborn=educmean*sexd*bornd(13 missing values generated). xi: reg agekdbrn bornd sexd agemean educmean mapres80mean bornsex borneduc educsex educsexborn Source | SS df MS Number of obs = 1089-------------+------------------------------ F( 9, 1079) = 29.48 Model | 6152.90509 9 683.656121 Prob > F = 0.0000 Residual | 25019.7579 1079 23.1879128 R-squared = 0.1974-------------+------------------------------ Adj R-squared = 0.1907 Total | 31172.663 1088 28.6513447 Root MSE = 4.8154------------------------------------------------------------------------------ agekdbrn | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- bornd | 1.779615 .9740461 1.83 0.068 -.131624 3.690854 sexd | -2.220267 .3134215 -7.08 0.000 -2.835252 -1.605282 agemean | .0594442 .0098793 6.02 0.000 .0400593 .078829 educmean | .4461687 .0831105 5.37 0.000 .2830922 .6092451mapres80mean | .0324834 .0118427 2.74 0.006 .0092461 .0557208 bornsex | .0946345 1.204318 0.08 0.937 -2.268437 2.457706 borneduc | -.4745646 .2819971 -1.68 0.093 -1.027889 .0787601 educsex | .3621368 .1124932 3.22 0.001 .1414065 .5828671 educsexborn | .4750623 .3902632 1.22 0.224 -.2906985 1.240823 _cons | 25.00961 .2479526 100.86 0.000 24.52309 25.49614-------------+----------------------------------------------------------------

But we’ll focus on two-way interactions for now, and in order to explore how to interpret them, we’ll review 4 examples: (1) an interaction of two dichotomous variables; (2) an interaction of a dummy variable and a continuous variable; (3) an interaction of a set of dummy variables and a continuous variable; (4) an interaction of two continuous variables.

Example 1: Two dichotomous variables

. reg agekdbrn educ bornd##sexd mapres80 age Source | SS df MS Number of obs = 1089-------------+------------------------------ F( 6, 1082) = 40.91 Model | 5764.17997 6 960.696662 Prob > F = 0.0000 Residual | 25408.483 1082 23.4828863 R-squared = 0.1849-------------+------------------------------ Adj R-squared = 0.1804 Total | 31172.663 1088 28.6513447 Root MSE = 4.8459------------------------------------------------------------------------------ agekdbrn | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- educ | .6165377 .0561537 10.98 0.000 .5063552 .7267202 1.bornd | 1.358118 .9670434 1.40 0.160 -.5393752 3.25561 1.sexd | -2.251548 .3152298 -7.14 0.000 -2.870079 -1.633017 | bornd#sexd | 1 1 | .4964787 1.201596 0.41 0.680 -1.861244 2.854201 | mapres80 | .0333659 .0118846 2.81 0.005 .0100464 .0566855 age | .0584314 .0099322 5.88 0.000 .0389428 .07792 _cons | 12.73045 .9671152 13.16 0.000 10.83281 14.62808------------------------------------------------------------------------------The interaction is not statistically significant, but let’s suppose it would be. Then we can first interpret the two main effects: the foreign born men have children 1.4 years later than the native born men, and the native born women have children 2.3 years earlier than the native born men.


To interpret the interaction term, we need to focus on one variable as our main variable and the other will be used as a moderator. We can do it both ways.

Nativity status as the main variable:
The effect of being foreign born is 1.4 for men (i.e., the foreign born men have children 1.4 years later than the native born men), but for women, it is 1.4+0.5=1.9 (that is, the foreign born women have children 1.9 years later than the native born women).

Gender as the main variable:
The effect of gender is -2.3 for the native born (i.e., the native born women have children 2.3 years earlier than the native born men), but for the foreign born, it is -2.3+0.5=-1.8 (that is, the foreign born women have children 1.8 years earlier than the foreign born men).

The only time we would add up both main effects and the interaction is when we want to compare across gender and nativity status at the same time: for example, the foreign born women have children 0.4 of a year earlier than the native born men, since 1.4-2.3+0.5=-0.4.
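The same contrast, along with a standard error and test, can be obtained directly from the factor-variable regression above with lincom; a minimal sketch (the point estimate should match the -0.4 computed by hand, up to rounding):

. qui reg agekdbrn educ bornd##sexd mapres80 age
. lincom 1.bornd + 1.sexd + 1.bornd#1.sexd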

Although it doesn't make sense to examine an interaction of two dummy variables graphically, we can use the adjust command to help us interpret this interaction:

. xi: qui reg agekdbrn educ i.bornd*sexd mapres80 age

. adjust educ mapres80 age if e(sample), by(sexd bornd)

-------------------------------------------------------------------------------
     Dependent variable: agekdbrn     Command: regress
     Variables left as is: _Ibornd_1, _IborXsexd_1
     Covariates set to mean: educ = 13.316804, mapres80 = 39.440773, age = 46.125805
-------------------------------------------------------------------------------

----------------------------
          |      bornd
     sexd |       0        1
----------+-----------------
        0 |  24.9519    26.31
        1 |  22.7004   24.555
----------------------------
     Key:  Linear Prediction

These are the predicted values of agekdbrn given average values of education, age, and mother’s occupational prestige.
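In more recent versions of Stata, the margins command (which has largely superseded adjust) produces essentially the same table directly from the factor-variable specification; a sketch, with all covariates held at their means:

. qui reg agekdbrn educ bornd##sexd mapres80 age
. margins bornd#sexd, atmeans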

Example 2: A dummy variable and a continuous variable

. reg agekdbrn bornd##c.educmean sexd mapres80 age Source | SS df MS Number of obs = 1089-------------+------------------------------ F( 6, 1082) = 41.17 Model | 5793.5421 6 965.590349 Prob > F = 0.0000 Residual | 25379.1209 1082 23.4557494 R-squared = 0.1859-------------+------------------------------ Adj R-squared = 0.1813 Total | 31172.663 1088 28.6513447 Root MSE = 4.8431---------------------------------------------------------------------------------- agekdbrn | Coef. Std. Err. t P>|t| [95% Conf. Interval]-----------------+---------------------------------------------------------------- 1.bornd | 1.716336 .5764944 2.98 0.003 .585162 2.847509 educmean | .6352486 .058401 10.88 0.000 .5206565 .7498407 |


bornd#c.educmean | 1 | -.2323323 .194782 -1.19 0.233 -.6145255 .1498609 | sexd | -2.229199 .3044525 -7.32 0.000 -2.826583 -1.631814 mapres80 | .0334778 .0118729 2.82 0.005 .0101813 .0567743 age | .0587405 .0099263 5.92 0.000 .0392636 .0782175 _cons | 20.93833 .7498733 27.92 0.000 19.46696 22.4097----------------------------------------------------------------------------------Again, no significant interaction, but for practice, we’ll interpret the results.

Education as the main variable, nativity status as the moderator:
Among the native born individuals, a one year increase in education is associated with a 0.6 of a year increase in the age of having kids. Among the foreign born individuals, a one year increase in education is associated with a (.63-.23)=0.4 of a year increase in the age of having kids.

Nativity status as the main variable, education as the moderator:
Among those with average education (13.4 years), the foreign born have kids 1.7 years later than the native born. Among those with education one unit above average (14.4 years), the foreign born have kids 1.5 years later than the native born (1.7+1*(-0.2)). Among those with education one unit below average (12.4 years), the foreign born have kids 1.9 years later than the native born (1.7 + (-1*(-0.2))). We could also look at those whose education is 4 years below average (9.4 years); for them, the foreign born have kids 2.5 years later than the native born (1.7 + (-4*(-0.2))).
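These conditional effects of nativity status can also be requested with lincom after the factor-variable regression above, which adds standard errors and tests; a sketch (lincom uses the unrounded coefficients, so the results will differ slightly from the rounded arithmetic):

. qui reg agekdbrn bornd##c.educmean sexd mapres80 age
. lincom 1.bornd + 4*1.bornd#c.educmean    // nativity gap 4 years above average education
. lincom 1.bornd - 4*1.bornd#c.educmean    // nativity gap 4 years below average education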

We could estimate this model in a different way to see separately the effects of education in the native born and the foreign born groups; that will also allow us to see if the effect is significant in each of the groups:

. gen educfb=educmean*bornd
(13 missing values generated)
. gen educnb=educmean
(12 missing values generated)
. replace educnb=0 if bornd==1
(256 real changes made)

. reg agekdbrn bornd educfb educnb sexd mapres80 age Source | SS df MS Number of obs = 1089-------------+------------------------------ F( 6, 1082) = 41.17 Model | 5793.5421 6 965.590349 Prob > F = 0.0000 Residual | 25379.1209 1082 23.4557494 R-squared = 0.1859-------------+------------------------------ Adj R-squared = 0.1813 Total | 31172.663 1088 28.6513447 Root MSE = 4.8431------------------------------------------------------------------------------ agekdbrn | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- bornd | 1.716336 .5764944 2.98 0.003 .585162 2.847509 educfb | .4029163 .1871522 2.15 0.032 .0356939 .7701387 educnb | .6352486 .058401 10.88 0.000 .5206565 .7498407 sexd | -2.229199 .3044525 -7.32 0.000 -2.826583 -1.631814 mapres80 | .0334778 .0118729 2.82 0.005 .0101813 .0567743 age | .0587405 .0099263 5.92 0.000 .0392636 .0782175 _cons | 20.93833 .7498733 27.92 0.000 19.46696 22.4097------------------------------------------------------------------------------

This way we can see that the effect of education is significant in both groups. Finally, we can again examine this interaction graphically.

. adjust sexd mapres80 age if e(sample), gen(pred1)


---------------------------------------------------------------------------------- Dependent variable: agekdbrn Command: regress Created variable: pred1 Variables left as is: bornd, educfb, educnb Covariates set to mean: sexd = .62442607, mapres80 = 39.440773, age = 46.125805-------------------------------------------------------------------------------------------------------- All | xb----------+----------- | 23.6648---------------------- Key: xb = Linear Prediction

. twoway (line pred1 educ if bornd==0, sort color(red) legend(label(1 "native born"))) (line pred1 educ if bornd==1, sort color(blue) legend(label(2 "foreign born")) ytitle("Respondent’s Age When 1st Child Was Born"))

[Graph: predicted respondent's age when 1st child was born by highest year of school completed, separate lines for native born and foreign born]

Alternatively, we could split pred1 into two variables (or more, if needed):

. separate pred1, by(bornd)

This would generate two variables, pred10 and pred11, which we can graph:

. line pred10 pred11 educ, lcolor(red blue) sort

[Graph: pred10 and pred11 by highest year of school completed]

Example 3: A set of dummy variables and a continuous variable


. reg agekdbrn bornd marital##c.educmean sexd mapres80 age

Source | SS df MS Number of obs = 1089-------------+------------------------------ F( 13, 1075) = 21.96 Model | 6540.34346 13 503.103343 Prob > F = 0.0000 Residual | 24632.3195 1075 22.9137856 R-squared = 0.2098-------------+------------------------------ Adj R-squared = 0.2003 Total | 31172.663 1088 28.6513447 Root MSE = 4.7868

------------------------------------------------------------------------------------ agekdbrn | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------------+---------------------------------------------------------------- bornd | 1.536577 .5729824 2.68 0.007 .4122865 2.660868 | marital | widowed | -.8946254 .626208 -1.43 0.153 -2.123354 .3341031 divorced | -.9166076 .3889825 -2.36 0.019 -1.679859 -.1533567 separated | -1.944692 .7095625 -2.74 0.006 -3.336977 -.5524077 never married | -2.55648 .5380556 -4.75 0.000 -3.612238 -1.500722 | educmean | .6467199 .0727279 8.89 0.000 .504015 .7894247 |marital#c.educmean | widowed | -.3294696 .167311 -1.97 0.049 -.6577629 -.0011764 divorced | .0213546 .151949 0.14 0.888 -.2767956 .3195049 separated | -.0935184 .2455722 -0.38 0.703 -.5753736 .3883368 never married | -.527267 .2268917 -2.32 0.020 -.9724677 -.0820662 | sexd | -2.028997 .3066702 -6.62 0.000 -2.630737 -1.427257 mapres80 | .0292701 .0118022 2.48 0.013 .0061121 .0524282 age | .0435388 .0117499 3.71 0.000 .0204835 .0665942 _cons | 22.24782 .8245124 26.98 0.000 20.62999 23.86566------------------------------------------------------------------------------------ To test whether the set of interactions is jointly significant:. mat list e(b)

e(b)[1,16] 1b. 2. 3. 4. 5. bornd marital marital marital marital maritaly1 1.5365772 0 -.8946254 -.91660762 -1.9446921 -2.5564798

1b.marital# 2.marital# 3.marital# 4.marital# 5.marital# educmean co.educmean c.educmean c.educmean c.educmean c.educmeany1 .64671988 0 -.32946962 .02135465 -.09351841 -.52726698

sexd mapres80 age _consy1 -2.0289968 .02927015 .04353882 22.247823

. test 2.marital#c.educmean 3.marital#c.educmean 4.marital#c.educmean 5.marital#c.educmean

 ( 1)  2.marital#c.educmean = 0
 ( 2)  3.marital#c.educmean = 0
 ( 3)  4.marital#c.educmean = 0
 ( 4)  5.marital#c.educmean = 0

       F(  4,  1075) =    2.22
            Prob > F =    0.0653

We cannot reject the null hypothesis, so we conclude that jointly these interaction effects are not statistically significant (they do not add significantly to the amount of variance explained by the model; although it is possible that with fewer groups, the overall significance test would change). If we were to explore these interaction terms, however, we would want to get the estimates of separate slopes of education by marital status:

. tab marital, gen(mardummy) marital | status | Freq. Percent Cum.--------------+----------------------------------- married | 1,269 45.90 45.90 widowed | 247 8.93 54.83 divorced | 445 16.09 70.92 separated | 96 3.47 74.39never married | 708 25.61 100.00--------------+----------------------------------- Total | 2,765 100.00

. for num 1/5: gen educmarX=educmean*mardummyX-> gen educmar1=educmean*mardummy1(12 missing values generated)-> gen educmar2=educmean*mardummy2(12 missing values generated)-> gen educmar3=educmean*mardummy3(12 missing values generated)-> gen educmar4=educmean*mardummy4(12 missing values generated)-> gen educmar5=educmean*mardummy5(12 missing values generated)

. reg agekdbrn bornd i.marital educmar1-educmar5 sexd mapres80 age Source | SS df MS Number of obs = 1089-------------+------------------------------ F( 13, 1075) = 21.96 Model | 6540.34346 13 503.103343 Prob > F = 0.0000 Residual | 24632.3195 1075 22.9137856 R-squared = 0.2098-------------+------------------------------ Adj R-squared = 0.2003 Total | 31172.663 1088 28.6513447 Root MSE = 4.7868-------------------------------------------------------------------------------- agekdbrn | Coef. Std. Err. t P>|t| [95% Conf. Interval]---------------+---------------------------------------------------------------- bornd | 1.536577 .5729824 2.68 0.007 .4122865 2.660868 | marital | widowed | -.8946254 .626208 -1.43 0.153 -2.123354 .3341031 divorced | -.9166076 .3889825 -2.36 0.019 -1.679859 -.1533567 separated | -1.944692 .7095625 -2.74 0.006 -3.336977 -.5524077never married | -2.55648 .5380556 -4.75 0.000 -3.612238 -1.500722 | educmar1 | .6467199 .0727279 8.89 0.000 .504015 .7894247 educmar2 | .3172503 .1522423 2.08 0.037 .0185245 .615976 educmar3 | .6680745 .1348759 4.95 0.000 .4034246 .9327244 educmar4 | .5532015 .2360602 2.34 0.019 .0900105 1.016392 educmar5 | .1194529 .2155296 0.55 0.580 -.3034536 .5423594 sexd | -2.028997 .3066702 -6.62 0.000 -2.630737 -1.427257 mapres80 | .0292701 .0118022 2.48 0.013 .0061121 .0524282 age | .0435388 .0117499 3.71 0.000 .0204835 .0665942 _cons | 22.24782 .8245124 26.98 0.000 20.62999 23.86566--------------------------------------------------------------------------------It appears that education has a statistically significant effect on age of parenthood in all groups except for the never married.
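Returning to the joint test above: testparm offers a one-line shortcut that tests all of the marital-by-education interaction terms simultaneously; a sketch that should reproduce the F(4, 1075) result:

. qui reg agekdbrn bornd marital##c.educmean sexd mapres80 age
. testparm i.marital#c.educmean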


Example 4: Two continuous variables

Both variables should be mean centered:

. reg agekdbrn bornd c.educmean##c.agemean sexd mapres80

      Source |       SS       df       MS              Number of obs =    1089
-------------+------------------------------           F(  6,  1082) =   41.24
       Model |  5801.57307     6  966.928846           Prob > F      =  0.0000
    Residual |  25371.0899  1082  23.4483271           R-squared     =  0.1861
-------------+------------------------------           Adj R-squared =  0.1816
       Total |   31172.663  1088  28.6513447           Root MSE      =  4.8423

--------------------------------------------------------------------------------------
            agekdbrn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
               bornd |   1.679599   .5755567     2.92   0.004     .5502651    2.808932
            educmean |   .6362385   .0581443    10.94   0.000     .5221503    .7503268
             agemean |    .054804   .0102529     5.35   0.000     .0346862    .0749219
                     |
c.educmean#c.agemean |  -.0045353   .0034131    -1.33   0.184    -.0112324    .0021618
                     |
                sexd |  -2.232587   .3044578    -7.33   0.000    -2.829982   -1.635193
            mapres80 |   .0335181   .0118711     2.82   0.005      .010225    .0568111
               _cons |   23.64786     .52946    44.66   0.000     22.60897    24.68674
--------------------------------------------------------------------------------------

The interaction term is not significant. But if it were, to interpret it, we would pick one variable as primary while the other serves as the moderator. E.g., if education is primary:

For agemean=0 (age at its mean, 46 y.o.), the effect of education is the educmean coefficient, .6362385.

For agemean=20 (age at mean+20, i.e. 66 y.o.), the effect of education is
. di .6362385 + 20*-.0045353
.5455325

For agemean=-20 (age=26 y.o.), the effect of education is
. di .6362385 - 20*-.0045353
.7269445
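The same conditional slopes can be obtained with lincom, which also supplies standard errors and significance tests; a sketch:

. qui reg agekdbrn bornd c.educmean##c.agemean sexd mapres80
. lincom educmean + 20*c.educmean#c.agemean    // effect of education at age 66
. lincom educmean - 20*c.educmean#c.agemean    // effect of education at age 26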

We can do the same thing graphically -- focus on one of the continuous variables and then graph it at various levels of the other one. E.g., we'll see how the effect of education varies by age:

. gen educage=educmean*agemean
(24 missing values generated)

. qui reg agekdbrn bornd educmean agemean educage sexd mapres80

. qui adjust bornd sexd mapres80 if e(sample), gen(pred2)

. twoway (line pred2 educ if age==30, sort color(red) legend(label(1 "30 years old"))) (line pred2 educ if age==40, sort color(blue) legend(label(2 "40 years old"))) (line pred2 educ if age==50, sort color(green) legend(label(3 "50 years old"))) (line pred2 educ if age==60, sort color(lime) legend(label(4 "60 years old")) ytitle("Respondent’s Age When 1st Child Was Born"))


[Figure: predicted values of "Respondent's Age When 1st Child Was Born" plotted against highest year of school completed, with separate lines for respondents aged 30, 40, 50, and 60]

Here we can see that the older respondents are, the later they had their first child, but the effect of education becomes a little smaller with age (i.e., with age, the intercept becomes larger but the slope of education becomes smaller). We could have done it the other way around -- graph how agekdbrn is related to age for educational levels of, say, educ=10, 12, 14, 16, and 20. There is also a user-written command that allows us to automatically generate such a graph for three values of the moderator -- mean, mean+sd, and mean-sd:

. net search sslope
Click on: sslope from http://fmwww.bc.edu/RePEc/bocode/s

. sslope agekdbrn bornd educmean sexd mapres80 agemean educage, i(educmean agemean educage) graph

      Source |       SS       df       MS              Number of obs =    1089
-------------+------------------------------           F(  6,  1082) =   41.24
       Model |  5801.57308     6  966.928846           Prob > F      =  0.0000
    Residual |  25371.0899  1082  23.4483271           R-squared     =  0.1861
-------------+------------------------------           Adj R-squared =  0.1816
       Total |   31172.663  1088  28.6513447           Root MSE      =  4.8423

------------------------------------------------------------------------------
    agekdbrn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bornd |   1.679599   .5755567     2.92   0.004     .5502651    2.808932
    educmean |   .6362385   .0581443    10.94   0.000     .5221503    .7503268
        sexd |  -2.232587   .3044578    -7.33   0.000    -2.829982   -1.635193
    mapres80 |   .0335181   .0118711     2.82   0.005      .010225    .0568111
     agemean |    .054804   .0102529     5.35   0.000     .0346862    .0749219
     educage |  -.0045353   .0034131    -1.33   0.184    -.0112324    .0021618
       _cons |   23.64786     .52946    44.66   0.000     22.60897    24.68674
------------------------------------------------------------------
     Simple slope of agekdbrn on educmean at agemean +/- 1sd
------------------------------------------------------------------
    agemean |      Coef.   Std. Err.       t     P>|t|
------------+-----------------------------------------------------
       High |   .5678996    .066709     8.51    0.000
       Mean |   .6362385   .0581443    10.94    0.000
        Low |   .7045775   .0871861     8.08    0.000
------------+-----------------------------------------------------


[Figure: sslope graph -- predicted agekdbrn ("r's age when 1st child born") plotted against educmean, with separate lines for agemean+1sd, agemean mean, and agemean-1sd]

Note that this gives us significance tests for the slope estimates at three levels of the moderator variable. If we reverse how we list the two main effect variables in the i() option of this command, we get:

. sslope agekdbrn bornd educmean sexd mapres80 agemean educage, i(agemean educmean educage) graph
------------------------------------------------------------------
     Simple slope of agekdbrn on agemean at educmean +/- 1sd
------------------------------------------------------------------
   educmean |      Coef.   Std. Err.       t     P>|t|
------------+-----------------------------------------------------
       High |   .0424724   .0154784     2.74    0.006
       Mean |    .054804   .0102529     5.35    0.000
        Low |   .0671357   .0119546     5.62    0.000
------------+-----------------------------------------------------

[Figure: sslope graph -- predicted agekdbrn plotted against agemean, with separate lines for educmean+1sd, educmean mean, and educmean-1sd]

Finally, let's consider a more complicated case where we have a curvilinear relationship of age with agekdbrn and an interaction between age and education; we will create the interaction terms right away so that we can use the adjust command for graphs:

. gen agemean2=agemean^2
(14 missing values generated)

. gen agemean3=agemean^3
(14 missing values generated)

. gen educage2=educmean*agemean2
(24 missing values generated)


. gen educage3=educmean*agemean3
(24 missing values generated)

. reg agekdbrn bornd sexd mapres80 educmean agemean agemean2 agemean3 educage educage2 educage3

      Source |       SS       df       MS              Number of obs =    1089
-------------+------------------------------           F( 10,  1078) =   35.55
       Model |  7731.43912    10  773.143912           Prob > F      =  0.0000
    Residual |  23441.2239  1078  21.7451056           R-squared     =  0.2480
-------------+------------------------------           Adj R-squared =  0.2410
       Total |   31172.663  1088  28.6513447           Root MSE      =  4.6632

------------------------------------------------------------------------------
    agekdbrn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bornd |   1.278985    .556004     2.30   0.022     .1880122    2.369958
        sexd |  -2.113086   .2941837    -7.18   0.000    -2.690323   -1.535848
    mapres80 |   .0355671   .0114369     3.11   0.002     .0131259    .0580082
    educmean |   .7185734   .0759774     9.46   0.000      .569493    .8676538
     agemean |  -.0445573   .0182216    -2.45   0.015     -.080311   -.0088036
    agemean2 |  -.0064784   .0007326    -8.84   0.000    -.0079158    -.005041
    agemean3 |   .0002514   .0000327     7.69   0.000     .0001873    .0003155
     educage |  -.0001007    .005545    -0.02   0.986     -.010981    .0107796
    educage2 |  -.0008988   .0003225    -2.79   0.005    -.0015315   -.0002661
    educage3 |   .0000198   9.75e-06     2.03   0.042     6.87e-07     .000039
       _cons |   24.53094   .5244201    46.78   0.000     23.50194    25.55994
------------------------------------------------------------------------------

Indeed, significant interactions with the squared term and the cubed term.

. qui adjust bornd sexd mapres80 if e(sample), gen(pred3)

. twoway (line pred3 age if educ==12, sort color(red) legend(label(1 "12 years of education"))) (line pred3 age if educ==14, sort color(blue) legend(label(2 "14 years of education"))) (line pred3 age if educ==16, sort color(green) legend(label(3 "16 years of education"))) (line pred3 age if educ==20, sort color(lime) legend(label(4 "20 years of education")) ytitle("Respondent’s Age When 1st Child Was Born"))

[Figure: predicted values of "Respondent's Age When 1st Child Was Born" plotted against age of respondent, with separate lines for 12, 14, 16, and 20 years of education]
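As an aside (not from the original handout), the same graph can be produced without creating the interaction terms by hand, using factor-variable notation together with margins and marginsplot. A sketch, assuming the centered variables educmean and agemean are in memory; the educmean values listed in at() are illustrative offsets from the mean:

. reg agekdbrn bornd sexd mapres80 c.educmean c.agemean c.agemean#c.agemean c.agemean#c.agemean#c.agemean c.educmean#c.agemean c.educmean#c.agemean#c.agemean c.educmean#c.agemean#c.agemean#c.agemean
* predicted values across centered age for several education levels, then plot them
. margins, at(agemean=(-25(5)35) educmean=(-2 0 2 6))
. marginsplot, xdimension(agemean)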

5. Multicollinearity

Our real-life concern about multicollinearity is that independent variables are highly (but not perfectly) correlated. We need to distinguish this from perfect multicollinearity -- a situation where two or more independent variables are exactly linearly related. In practice, perfect multicollinearity usually happens only if we make a mistake when including variables; Stata will resolve it by omitting one of those variables and will report that it did so. It can also happen when the number of variables exceeds the number of observations.


Perfect multicollinearity violates regression assumptions -- no unique solution for regression coefficients.
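A quick illustration of how Stata handles perfect collinearity (a hypothetical example, not from the analyses above): educ2 below is an exact linear function of educ, so one of the two gets dropped.

. gen educ2 = 2*educ
. reg agekdbrn educ educ2 born sex mapres80
* Stata omits educ2 (or educ) and prints a note that it was dropped because of collinearity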

High, but not perfect, multicollinearity is what we most commonly deal with. High multicollinearity does not explicitly violate the regression assumptions. It is not a problem if we use regression only for prediction (and are therefore only interested in the predicted values of Y our model generates). But it is a problem when we want to use regression for explanation (which is typically the case in social sciences) -- in this case, we are interested in the values and significance levels of the regression coefficients. A high degree of multicollinearity results in imprecise estimates of the unique effects of the independent variables.

First, we can inspect the correlations among the variables; it allows us to see whether there are any high correlations, but does not provide a direct indication of multicollinearity:

. corr educ born sex mapres80
(obs=1615)

             |     educ     born      sex mapres80
-------------+------------------------------------
        educ |   1.0000
        born |   0.0182   1.0000
         sex |   0.0066   0.0205   1.0000
    mapres80 |   0.2861   0.0169  -0.0423   1.0000

Variance Inflation Factors (VIFs) are a better tool to diagnose multicollinearity problems. These indicate how much the variance of a given coefficient estimate is inflated because of correlations of that variable with the other variables in the model. E.g., a VIF of 4 means that the variance is 4 times higher than it would otherwise be, and the standard error is twice as high.

. reg agekdbrn educ born sex mapres80

      Source |       SS       df       MS              Number of obs =    1091
-------------+------------------------------           F(  4,  1086) =   51.24
       Model |  4954.03533     4  1238.50883           Prob > F      =  0.0000
    Residual |  26251.1232  1086   24.172305           R-squared     =  0.1588
-------------+------------------------------           Adj R-squared =  0.1557
       Total |  31205.1586  1090  28.6285858           Root MSE      =  4.9165

------------------------------------------------------------------------------
    agekdbrn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .6122718   .0569422    10.75   0.000     .5005426     .724001
        born |   1.360161   .5816506     2.34   0.020      .218875    2.501447
         sex |   -2.37973   .3075642    -7.74   0.000    -2.983218   -1.776243
    mapres80 |   .0243138   .0119552     2.03   0.042     .0008558    .0477718
       _cons |   16.95808   1.101139    15.40   0.000     14.79748    19.11868
------------------------------------------------------------------------------

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
    mapres80 |      1.08    0.926124
        educ |      1.08    0.926562
        born |      1.00    0.998366
         sex |      1.00    0.999456
-------------+----------------------
    Mean VIF |      1.04
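For intuition, the VIF for regressor j equals 1/(1 - R2_j), where R2_j comes from regressing that variable on all the other regressors. A sketch (not part of the original handout) reproducing the VIF for educ by hand:

. qui reg agekdbrn educ born sex mapres80
. gen byte insample = e(sample)
* auxiliary regression of educ on the other regressors, on the same estimation sample
. qui reg educ born sex mapres80 if insample
. di "VIF for educ = " 1/(1 - e(r2))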

Different researchers advocate for different cutoff points for VIF. Some say that if any one of the VIF values is larger than 4, there are some multicollinearity problems associated with that variable.


Others use cutoffs of 5 or even 10. In the example above, there are no problems with multicollinearity regardless of the cutoff we pick.

In addition, the following symptoms may indicate a multicollinearity problem:
- large changes in coefficients when adding or deleting variables
- non-significant coefficients for variables that you know are theoretically important
- coefficients with signs opposite of those you expected based on theory or previous results
- large standard errors in comparison to the coefficient size
- two (or more) large coefficients with opposite signs, possibly non-significant
- all or most coefficients are not significant even though the F-test indicates the entire regression model is significant

Solutions for multicollinearity problems:

1. See if you could create a meaningful scale from the variables that are highly correlated, and use that scale instead of the individual variables (i.e., several variables are reconceptualized as indicators of one underlying construct).

. sum mapres80 papres80

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    mapres80 |      1619    40.96912    13.63189         17         86
    papres80 |      2165    43.47206    12.40479         17         86

The variables have the same scales, so we can add them:

. gen prestige=mapres80+papres80
(1519 missing values generated)

If the scales were different, we would first standardize each of them:

. egen papres80std = std(papres80)
(600 missing values generated)

. egen mapres80std = std(mapres80)
(1146 missing values generated)

. sum mapres80std papres80std

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
 mapres80std |      1619    4.12e-09           1   -1.758312   3.303348
 papres80std |      2165   -8.26e-11           1   -2.134019    3.42835

. gen prestige2=mapres80std+papres80std
(1519 missing values generated)

. pwcorr prestige prestige2

             | prestige presti~2
-------------+------------------
    prestige |   1.0000
   prestige2 |   0.9994   1.0000

We can now use the prestige variable in subsequent OLS regressions. We might want to report Cronbach's alpha -- it indicates the reliability of the scale. It varies between 0 and 1, with 1 being perfect. Typically, alphas above .7 are considered acceptable, although some argue that those above .5 are ok.


. alpha mapres80 papres80

Test scale = mean(unstandardized items)

Average interitem covariance:     56.39064
Number of items in the scale:            2
Scale reliability coefficient:      0.5036

2. Consider whether all variables are necessary. Rely primarily on theoretical considerations -- automated procedures such as backward or forward stepwise regression methods (available via the "sw regress" command) are potentially misleading; they capitalize on minor differences among regressors and do not result in an optimal set of regressors. If there are not too many variables, examine all possible subsets.
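For reference only (a sketch, not an endorsement): in current Stata, stepwise selection is run as a prefix command, e.g. backward selection removing variables with p-values above .2:

* backward stepwise selection; pr() sets the significance level for removal
. stepwise, pr(.2): regress agekdbrn educ born sex mapres80 age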

3. If using highly correlated variables is absolutely necessary for correct model specification, you can use biased estimates. The idea here is that we add a small amount of bias but increase the efficiency of the estimates for those highly correlated variables. The most common method of this type is ridge regression (see http://members.iquest.net/~softrx/ for the Stata module).
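To make the idea concrete (not from the handout): where OLS computes b = (X'X)^(-1)X'y, ridge regression adds a small constant k to the diagonal of X'X before inverting, so that b_ridge = (X'X + kI)^(-1)X'y. As k increases, the coefficients are shrunk (biased toward zero) but their variances decrease, which can stabilize the estimates for highly correlated regressors; k is typically chosen by examining a ridge trace or by cross-validation.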

6. Heteroscedasticity

The problem of heteroscedasticity refers to non-constant error variance (the opposite of homoscedasticity). We can examine this graphically as well as using formal tests. First, let's see if error variance changes across fitted values of our dependent variable:

. qui reg agekdbrn educ born sex mapres80 age

. rvfplot

[Figure: rvfplot -- residuals plotted against fitted values]

We can examine the same thing using a formal test:

. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of agekdbrn

         chi2(1)      =    21.44
         Prob > chi2  =   0.0000

Since p<.05, we reject the null hypothesis of constant variance -- the errors are heteroscedastic. Both the graph and the test indicate that the error variance is nonconstant (note the megaphone pattern).
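To see what hettest is doing (a rough by-hand sketch, not the exact computation -- the default hettest statistic is scaled somewhat differently under its normality assumption): square the residuals and check whether they are related to the fitted values.

. qui reg agekdbrn educ born sex mapres80 age
. predict double ehat, resid
. predict double yhat, xb
. gen double ehat2 = ehat^2
* a Koenker-style version of the Breusch-Pagan idea: N*R2 from this auxiliary regression
. qui reg ehat2 yhat
. di "N*R2 = " e(N)*e(r2) "   (compare to a chi2(1) distribution)"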


Now let's see whether there is any systematic relationship between error variance and individual regressors. First, a graphical examination:

. rvpplot educ

[Figure: rvpplot educ -- residuals plotted against highest year of school completed]

. rvpplot age

[Figure: rvpplot age -- residuals plotted against age of respondent]

We can see the heteroscedasticity in both graphs, but it is much more severe for age. For a dummy variable, it is more difficult to examine graphically:

. rvpplot sex

[Figure: rvpplot sex -- residuals plotted against respondent's sex]


Now, let's use a formal test to examine the patterns of error variance across individual regressors:

. hettest, rhs mtest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance

---------------------------------------
    Variable |      chi2   df      p
-------------+-------------------------
        educ |      5.87    1   0.0154 #
        born |      0.00    1   0.9810 #
         sex |      9.19    1   0.0024 #
    mapres80 |      1.45    1   0.2279 #
         age |     10.26    1   0.0014 #
-------------+-------------------------
simultaneous |     25.78    5   0.0001
---------------------------------------
                  # unadjusted p-values

It looks like a number of regressors are responsible for our problems.

Remedies:

1. Transformations might help -- it is especially important to consider the distribution of the dependent variable. As we discussed above, a normally distributed dependent variable is typically desirable and can help avoid heteroscedasticity as well as non-normality problems. Let's examine whether the transformation we identified earlier -- the reciprocal square root -- would solve our heteroscedasticity problem.

. gen agekdbrnrr=1/(sqrt(agekdbrn))
(810 missing values generated)

. reg agekdbrnrr educ born sex mapres80 agemean agemean2

      Source |       SS       df       MS              Number of obs =    1089
-------------+------------------------------           F(  6,  1082) =   48.07
       Model |   .11381105     6  .018968508           Prob > F      =  0.0000
    Residual |  .426934693  1082  .000394579           R-squared     =  0.2105
-------------+------------------------------           Adj R-squared =  0.2061
       Total |  .540745743  1088  .000497009           Root MSE      =  .01986

------------------------------------------------------------------------------
  agekdbrnrr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.0024213   .0002353   -10.29   0.000    -.0028829   -.0019597
        born |  -.0070982   .0023638    -3.00   0.003    -.0117363   -.0024602
         sex |   .0095887   .0012506     7.67   0.000     .0071349    .0120425
    mapres80 |  -.0001494   .0000487    -3.07   0.002     -.000245   -.0000539
     agemean |  -.0003115   .0000434    -7.18   0.000    -.0003967   -.0002264
    agemean2 |   8.86e-06   2.29e-06     3.87   0.000     4.37e-06    .0000134
       _cons |   .2373519   .0046505    51.04   0.000      .228227    .2464769
------------------------------------------------------------------------------

. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of agekdbrnrr

         chi2(1)      =     0.35
         Prob > chi2  =   0.5566

. hettest, rhs mtest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance

---------------------------------------
    Variable |      chi2   df      p
-------------+-------------------------
        educ |      0.63    1   0.4262 #
        born |      0.26    1   0.6111 #
         sex |      0.29    1   0.5932 #
    mapres80 |      0.73    1   0.3939 #
         age |      1.71    1   0.1911 #
-------------+-------------------------
simultaneous |      3.06    5   0.6900
---------------------------------------
                  # unadjusted p-values

The heteroscedasticity problem has been solved. As I mentioned earlier, however, it is important to check that we did not introduce any nonlinearities with this transformation, and overall, transformations should be used sparingly -- always consider ease of model interpretation as well. Also, when searching for a transformation to remedy heteroscedasticity, Box-Cox transformations can be very helpful, including the "transform both sides" (TBS) approach (see the boxcox command).
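As a pointer (a sketch assuming the same variables; output not shown), boxcox can estimate the transformation parameter for the dependent variable directly:

* estimate a Box-Cox transformation of the dependent variable only
. boxcox agekdbrn educ born sex mapres80 age, model(lhsonly)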

2. Sometimes, dealing with outliers, influential observations, and nonlinearities might also help resolve heteroscedasticity problems. That is why I recommend testing for heteroscedasticity only after you have dealt with these other problems.

3. Heteroscedasticity can also be a sign that some important factor is omitted, so you might want to rethink your model specification.

4. If nothing else works, we can obtain robust variance estimates using the robust option of the regress command (note that this is different from robust regression as estimated by rreg!). These variance estimates do not rely on the constant-variance assumption and are therefore not sensitive to heteroscedasticity:

. reg agekdbrn educ born sex mapres80 age, robust

Linear regression                                      Number of obs =    1089
                                                       F(  5,  1083) =   47.74
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1848
                                                       Root MSE      =  4.8441

------------------------------------------------------------------------------
             |               Robust
    agekdbrn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .6158833   .0640298     9.62   0.000     .4902467    .7415199
        born |   1.679078   .5756992     2.92   0.004     .5494661     2.80869
         sex |  -2.217823   .3143631    -7.05   0.000    -2.834653   -1.600993
    mapres80 |   .0331945   .0122934     2.70   0.007      .009073    .0573161
         age |   .0582643   .0088246     6.60   0.000     .0409491    .0755795
       _cons |   13.27142   1.239779    10.70   0.000     10.83877    15.70406
------------------------------------------------------------------------------
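Note that the coefficients are identical to those from the ordinary (non-robust) fit of the same model; only the standard errors, t-statistics, and confidence intervals change. In current Stata the same estimates can also be requested with the vce() option (equivalent syntax):

* heteroskedasticity-robust (sandwich) standard errors via vce()
. reg agekdbrn educ born sex mapres80 age, vce(robust)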
