Seasonal and Temporary Employment and Health Insurance Coverage
Matthew Mark
Contents
Executive Summary
Section 1 – Introduction
Section 2 – Data Characteristics
Section 3.1 – Model Selection and Interpretation
Section 3.2 – Analysis of Model
Section 4 – Summary and Concluding Remarks
Section 5 – Works Cited
Section 6 – Appendices
  Appendix 1 – Full list of variables used from the MEPS database
  Appendix 2 – Specific Analysis of Data
  Appendix 3 – Validation Data
  Appendix 4 – “.36” Data
  Appendix 5 – All Interactions Model
  Appendix 6 – EMPOFFER Interaction Model
  Appendix 7 – SEASONAL/TEMPJOB Interaction Model
Executive Summary
In the United States, health insurance is usually obtained through an employer‐sponsored health plan. This paper uses data from the US government’s Medical Expenditure Panel Survey to analyze the relationship between the type of employment (i.e., whether it is seasonal or temporary) and whether the employee possesses health insurance. After controlling for age, sex, ethnicity, educational level, and whether the employer offers health coverage, temporary workers (but not seasonal workers) are significantly less likely to have health coverage than traditionally employed individuals. This result has public policy implications, as well as implications for employers seeking to introduce a new health plan in their benefits package.
Section 1 – Introduction
Access to health insurance is one of the most pressing issues in the United States. People obtain coverage through employer‐sponsored insurance programs, by purchasing individual coverage, or through government programs such as Medicare, Medicaid, and SCHIP. The most common method of obtaining insurance is employer‐sponsored health coverage, with 60% of people obtaining their coverage this way.¹
In the United States, an estimated five‐and‐a‐half million people have a current main job that is seasonal, and an estimated seven million people are temporarily employed.² Those who have seasonal or temporary employment (“non‐traditional employment”) may not be offered year‐round health coverage, may choose not to buy coverage, or may not have the resources to purchase coverage.
This paper examines the relationship between non‐traditional employment and whether the individual has health coverage. This information is useful from a public policy standpoint; for instance, it would be helpful when deciding whether to expand Medicaid coverage. Employers in states with “pay or play” laws can also use this analysis as a reference when estimating the costs of offering health insurance to their seasonal and temporary employees.
The analyzed data come from the Medical Expenditure Panel Survey (MEPS) from 2005, the most recent year with complete data available. Statistical techniques were used to investigate and estimate the relationship between non‐traditional employment and whether the people holding those jobs carry health insurance. A logistic regression was performed that controls for a person’s age, sex, ethnicity, education level, and whether or not that person’s current main job offers health coverage.
This paper is focused on the effect of type of employment on possessing health coverage. Therefore, unemployed individuals were excluded from the analysis. Children under sixteen years of age were also excluded from the sample.
In Section 2, I present and explain the variables used in the analysis, and patterns in the data themselves. In Section 3.1, a model is proposed, interpreted, and evaluated. Section 3.2 contains a more technical analysis of the model. Section 4 contains a summary of the key results and concluding remarks.
Section 2 – Data Characteristics
The Medical Expenditure Panel Survey (MEPS) database is the source of all data used in this analysis. The MEPS “collects data on the specific health services that Americans use, how frequently they use them, the cost of these services, and how they are paid for, as well as data on the cost, scope, and breadth of health insurance held by and available to U.S. workers.”³ Specifically, the data were taken from the 2005 Full Year Consolidated data file, which is the most current complete version of the survey. All data were observational and self‐reported. The survey included data on a total of 33,961 persons, but only 13,779 had complete entries for all variables of interest used in this analysis.
The variables analyzed were:
COVWHOLE – A binary variable denoting whether or not the specified person had health coverage in each month of 2005. This was the dependent variable in the analysis. “0” corresponds to no coverage.
SEASONAL – A binary variable denoting whether or not the specified person’s current main job (CMJ) is only available during certain times of year. A person was considered to be seasonally employed if their CMJ was listed as seasonal at any time during 2005. Teachers and other school personnel were counted as non‐seasonal. “0” corresponds to non‐seasonal employment.
TEMPJOB – A binary variable denoting whether or not the specified person’s CMJ only lasts for a limited period of time. A person was considered to be temporarily employed if their CMJ was listed as temporary at any time during 2005. “0” corresponds to non‐temporary employment. A very strong relationship exists between seasonal employment and temporary employment (See Table 1).
ETHNIC – A categorical variable denoting the ethnicity of the respondent. It could take on the values of:
o Asian
o Black
o Hispanic
o _Other – The default ethnicity in the analysis. Since the vast majority of the non‐Asian, non‐Black, non‐Hispanic population is Caucasian, the variable _Other is essentially interchangeable with being Caucasian.
CMJ Type        Non‐Temporary   Temporary
Non‐Seasonal    91.37%          8.63%
Seasonal        44.86%          55.14%
Total           87.69%          12.31%
Table 1: % of CMJs that are Temporary by whether or not CMJ is Seasonal
SEX – A binary categorical variable, taking on the value of M for male and F for female. Female was the default sex in the analysis.
EMPOFFER – A binary variable denoting whether or not the respondent’s CMJ offers employer subsidized health insurance. “0” corresponds to no health insurance offered.
HIDEG – A categorical variable denoting the highest level of education that the respondent achieved. Highest degree is highly correlated with income. Using HIDEG as a proxy for income increases the number of observations available for analysis due to the unwillingness of people to disclose income. The levels are:
o _HSDiploma – The default educational level in the analysis
o Bachelors
o Doctorate
o GED
o Masters
o NoDegree
o Other
AGE – A discrete variable listing the respondent’s age
Excluding AGE, there were a total of 13 binary variables (plus 6 defaults), for a total of 448 groups that a respondent could have fallen into. Of these groups, 114 contained no members (e.g., there were no Asian males with a doctorate who were seasonally employed). No group contained more than 10% of the survey participants (the largest group being Caucasian males in non‐seasonal, non‐temporary jobs, with a high school diploma and employer‐subsidized insurance, comprising 1,272 members out of 13,779).
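The group count above follows from multiplying the number of levels of each categorical variable. A minimal Python sketch of that arithmetic, using only the level counts described in this section (the dictionary and variable names are illustrative, not taken from MEPS):

```python
from math import prod

# Number of levels of each categorical variable described above (AGE excluded).
levels = {
    "SEASONAL": 2,
    "TEMPJOB": 2,
    "EMPOFFER": 2,
    "ETHNIC": 4,  # Asian, Black, Hispanic, _Other
    "HIDEG": 7,   # _HSDiploma, Bachelors, Doctorate, GED, Masters, NoDegree, Other
    "SEX": 2,
}

total_groups = prod(levels.values())                     # product of level counts
binary_indicators = sum(n - 1 for n in levels.values())  # one dummy per non-default level
defaults = len(levels)                                   # one default level per variable
largest_share = 1272 / 13779                             # share of the largest group

print(total_groups, binary_indicators, defaults, f"{largest_share:.1%}")
# prints: 448 13 6 9.2%
```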
The demographics of the covered group were very different from those of the non‐covered group (See Appendix 2). Those who were covered were less likely to be Black or Hispanic, less likely to hold a seasonal or a temporary job, and were offered insurance by their employer more often. They were also more likely to be female, tended to be older, and had attained a higher level of education.
Individually, nearly every variable was a significant predictor of coverage. Only the variables Black and _HSDiploma had a covered proportion that was not statistically different from the overall proportion.
Of the 13,779 complete observations, one thousand were randomly selected and set aside. These thousand observations were excluded when fitting the model, and were later used to test the explanatory power of the model (See Section 3.2).
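A hold‐out split of this kind can be sketched in a few lines of Python. This is an illustration only: the paper does not state how the random selection was performed, and the seed and the use of row indices in place of actual MEPS records are assumptions made for the example.

```python
import random

N_COMPLETE = 13_779   # complete observations reported in Section 2
N_VALIDATION = 1_000  # observations set aside for validation

# Hypothetical seed, used only so that the example is reproducible.
rng = random.Random(2005)

row_ids = list(range(N_COMPLETE))  # stand-ins for the survey records
validation_ids = set(rng.sample(row_ids, N_VALIDATION))
model_ids = [i for i in row_ids if i not in validation_ids]

print(len(model_ids), len(validation_ids))
# prints: 12779 1000
```

The 12,779 remaining rows correspond to the model data total reported in Appendix 3.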
Section 3.1 – Model Selection and Interpretation
This section covers the selection of the model and the results from its analysis. The purpose of this paper is to evaluate and estimate the effect of non‐traditional types of employment on health care coverage. Therefore, the model was chosen on the following basis:
It must predict whether or not an individual has health insurance coverage.
This prediction must be based in part on whether the individual holds a job that is seasonal and/or temporary.
The model should correct for factors that may create under‐estimates or over‐estimates of the effect of non‐traditional employment on holding health insurance.
The explanatory value of the model is important only insofar as it helps to correctly determine the effect of non‐traditional employment on health care coverage. Correcting factors should be chosen based on their relationship to the non‐traditional employment indicators.
Understanding the relationship between non‐traditional work and health insurance coverage is more important than predicting whether or not a person would be covered.
Due to the relationships between the data found in Section 2 and the criteria listed above, the chosen model was a logistic regression model to predict the value of COVWHOLE. The full regression equation of the proposed model is:
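$$\pi^{-1}(\text{COVWHOLE}) = b_0 + b_1\,\text{SEASONAL} + b_2\,\text{TEMPJOB} + b_3\,\text{EMPOFFER} + b_4\,\text{Asian} + b_5\,\text{Black} + b_6\,\text{Hispanic} + b_7\,\text{NoDegree} + b_8\,\text{GED} + b_9\,\text{Bachelors} + b_{10}\,\text{Masters} + b_{11}\,\text{Doctorate} + b_{12}\,\text{Other} + b_{14}\,\text{AGE} + b_{13}\,\text{M}$$
Where $\pi^{-1}(p) = \ln\left(\frac{p}{1-p}\right)$ is the logit (log‐odds) function, applied to the probability $p$ that COVWHOLE equals 1.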
An a priori α level of .05 is used for statistical significance.
The full list of coefficients for the logistic regression is located in Table 2. Each coefficient can be interpreted as the change in the logarithm of the odds (the log‐odds) of holding insurance associated with a one‐unit increase in the variable. For example, the log‐odds of holding insurance for a temporarily employed person would be .2481 lower than for an otherwise similar person in a permanent job.
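Exponentiating a coefficient converts the change in log‐odds into an odds ratio, which is often easier to read. A short Python sketch using the Temporary coefficient from Table 2 (the variable names are illustrative):

```python
import math

B_TEMPORARY = -0.2481  # Temporary coefficient from Table 2 (change in log-odds)

# exp(coefficient) is the multiplicative change in the odds of holding insurance.
odds_ratio = math.exp(B_TEMPORARY)

print(f"{odds_ratio:.2f}")
# prints: 0.78
```

Under the model, the odds of holding insurance for a temporary worker are therefore roughly 22% lower than for an otherwise similar permanent worker. This is a statement about odds, not about probabilities.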
The variables associated with seasonality, having a highest level of education of “Other”, and with being Asian were found to not be significant predictors of COVWHOLE, but all other variables were. Seasonality and an education level of “Other” are close to being statistically significant (p<.1).
The signs of the statistically significant coefficients are all as expected from the preliminary analysis of the data in Section 2. The sign for seasonality is positive. This means that after adjusting for the included variables, a person employed seasonally is more likely to hold health insurance coverage than a person who is not seasonally employed. The coefficient, however, is not statistically significantly different from zero.
The intercept alone corresponds to a predicted coverage probability of 31.8%. This is the prediction for a respondent with all default values at an age of zero, so it is not meaningful on its own. Using the median observed age of 39 and the default categorical variable values, the predicted probability of having coverage is 57.5%.
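Both percentages can be reproduced by applying the inverse logit to the linear predictor. A minimal Python sketch using the Intercept and Age coefficients from Table 2 (the function and constant names are illustrative):

```python
import math

def logistic(eta: float) -> float:
    """Inverse logit: converts log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

INTERCEPT = -0.7636  # Table 2
B_AGE = 0.0273       # Table 2
MEDIAN_AGE = 39      # median observed age

p_intercept = logistic(INTERCEPT)                        # all defaults, age zero
p_median_age = logistic(INTERCEPT + B_AGE * MEDIAN_AGE)  # all defaults, age 39

print(f"{p_intercept:.1%}")   # prints: 31.8%
print(f"{p_median_age:.1%}")  # prints: 57.5%
```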
The largest effects on the predicted probability come from whether the respondent’s CMJ offers insurance coverage, whether the respondent is Hispanic, and whether the respondent has an advanced degree.
Table 2 – Logistic Regression Coefficients
Variable        Estimate   Standard Error   P‐Value
Intercept       -0.7636    0.0803           < 0.001
Non‐Seasonal    *
Seasonal        0.1626     0.0855           0.057
Non‐Temporary   *
Temporary       -0.2481    0.0687           < 0.001
Not Offered     *
Offered         1.4283     0.0455           < 0.001
_Other          *
Asian           -0.1002    0.1173           0.393
Black           -0.4417    0.0613           < 0.001
Hispanic        -0.9922    0.0533           < 0.001
NoDegree        -0.1897    0.0574           < 0.001
GED             -0.6174    0.0977           < 0.001
_HSDiploma      *
Bachelors       0.4423     0.0707           < 0.001
Masters         1.0101     0.1376           < 0.001
Doctorate       1.0142     0.2563           < 0.001
Other           0.1658     0.0864           0.055
Age             0.0273     0.0016           < 0.001
F               *
M               -0.2056    0.0435           < 0.001
* Default value, included in Intercept

Variable        Approx. Change in Prediction
Seasonal        3.9%
Temporary       -6.1%
Offered         27.5%
Asian           -2.5%
Black           -11.0%
Hispanic        -24.1%
NoDegree        -4.7%
GED             -15.3%
Bachelors       10.3%
Masters         21.3%
Doctorate       21.4%
Other           4.0%
M               -5.1%
Table 3 ‐ Marginal Change in Prediction for Binary Variables

[Figure 1 – Difference between Actual Proportion and Prediction versus Prediction. Panels: Validation Data, Model Data.]

Table 3 shows the marginal effect of changing a single binary variable on the prediction. As above, a 39 year old Caucasian female with a high school diploma holding a non‐seasonal, non‐temporary job that does not offer insurance is predicted to have a 57.5% probability of having health care coverage. If this person is a male, holding all else equal, the predicted probability is 52.4% (57.5% minus 5.1%). Table 3 is provided for demonstrative purposes; the numbers are only accurate for values close to the default.
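The entries of Table 3 can be reproduced from the Table 2 coefficients by switching on one binary variable at a time and comparing the result with the default prediction at age 39. A Python sketch of that calculation (names are illustrative; the coefficients are copied from Table 2):

```python
import math

def logistic(eta: float) -> float:
    """Inverse logit: converts log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

INTERCEPT, B_AGE, MEDIAN_AGE = -0.7636, 0.0273, 39

# Coefficients of the binary indicators, copied from Table 2.
coefficients = {
    "Seasonal": 0.1626, "Temporary": -0.2481, "Offered": 1.4283,
    "Asian": -0.1002, "Black": -0.4417, "Hispanic": -0.9922,
    "NoDegree": -0.1897, "GED": -0.6174, "Bachelors": 0.4423,
    "Masters": 1.0101, "Doctorate": 1.0142, "Other": 0.1658, "M": -0.2056,
}

base_eta = INTERCEPT + B_AGE * MEDIAN_AGE  # default person, age 39
base_p = logistic(base_eta)

# Change in predicted probability when a single indicator is switched on.
marginal_change = {
    name: logistic(base_eta + b) - base_p for name, b in coefficients.items()
}

for name, change in marginal_change.items():
    print(f"{name:<10} {change:+.1%}")
```

Because the logistic curve is nonlinear, these differences depend on the starting point, which is why the table is only accurate near the default values.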
Section 3.2 – Analysis of Model
The proposed model has an Akaike Information Criterion (AIC) of 13095, and a null deviance of 16109. This implies a pseudo‐R² value of .1870.⁴ A model without the two independent variables of interest (SEASONAL and TEMPJOB) has an AIC of 13105, and a pseudo‐R² of .1865. Even though one of the two is statistically significant, the addition of the non‐traditional employment binary variables explains only an additional 0.05% of the variation of the data.
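The reported pseudo‐R² values are numerically close to one minus the ratio of AIC to null deviance. The Python sketch below shows that arithmetic; treating this ratio as the formula behind the reported figures is an assumption inferred from the numbers, not something the text states, and the more common McFadden form uses the residual deviance (or log‐likelihood) instead of the AIC.

```python
NULL_DEVIANCE = 16109
AIC_PROPOSED = 13095  # model with SEASONAL and TEMPJOB
AIC_REDUCED = 13105   # model without SEASONAL and TEMPJOB

def ratio_pseudo_r2(aic: float, null_deviance: float) -> float:
    """Assumed form: 1 - AIC / null deviance (see the hedge in the lead-in)."""
    return 1.0 - aic / null_deviance

r2_proposed = ratio_pseudo_r2(AIC_PROPOSED, NULL_DEVIANCE)
r2_reduced = ratio_pseudo_r2(AIC_REDUCED, NULL_DEVIANCE)

print(f"{r2_proposed:.4f} {r2_reduced:.4f}")
# prints: 0.1871 0.1865
```

Either way, the gain from adding the two employment indicators is on the order of a few hundredths of a percentage point.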
The model was applied to the thousand observations excluded from the sample, as mentioned in Section 2. The demographics of the validation data were very similar to those of the model data (See Appendix 3). Predictions were generated for the validation data using the fitted model. A new logistic regression was then run, with the dependent variable being whether the individual had health coverage, and the independent variable being the prediction from the model. This regression had a pseudo‐R² of .2083 on the new data. It is therefore likely that the model has actual explanatory power, and is not just an artifact of the data used.
The predictions of the model were evaluated by comparing the prediction with the actual proportion of people with health coverage. The results of this evaluation are in Figure 1. The black line is the difference between the actual proportion with health coverage and the predicted probability. The grey shaded area is a 95% confidence interval, given the model is accurate. The proportion and confidence intervals were created using a bandwidth of .04. They have been truncated at .12 due to the very low sample size for values below that.
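One way to compute a single point of such a plot is sketched below in Python. The details are assumptions, since the text does not spell out the procedure: the bandwidth is interpreted as a window of ±.04 around the prediction value, and the 95% band is taken as 1.96 standard errors of the observed proportion when each outcome is treated as a Bernoulli draw with the predicted probability. The function name and the toy data are hypothetical.

```python
import math

def calibration_point(preds, outcomes, center, bandwidth=0.04):
    """Return (actual - mean prediction, 95% half-width, n) for one window.

    preds    -- predicted probabilities from the model
    outcomes -- observed coverage indicators (0 or 1)
    """
    window = [(p, y) for p, y in zip(preds, outcomes) if abs(p - center) <= bandwidth]
    n = len(window)
    if n == 0:
        return None
    mean_pred = sum(p for p, _ in window) / n
    actual = sum(y for _, y in window) / n
    # Standard error of the observed proportion if the model is accurate,
    # i.e. each outcome is Bernoulli with its own predicted probability.
    std_err = math.sqrt(sum(p * (1 - p) for p, _ in window)) / n
    return actual - mean_pred, 1.96 * std_err, n

# Toy data, for illustration only (not MEPS values).
preds = [0.34, 0.36, 0.38, 0.80]
outcomes = [1, 0, 1, 1]
diff, half_width, n = calibration_point(preds, outcomes, center=0.36)
print(n, round(diff, 3), round(half_width, 3))
```

A point is flagged when the absolute difference exceeds the half‐width, which is the sense in which each point acts as a test of the prediction.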
Each point of the graph can be thought of as a statistical test of whether the prediction is accurate. If the model is accurate, we expect about 5% of the points to fall outside of the shaded area. We see similar patterns between the model data and the new validation data. For example, for predictions around .36, the actual proportion covered was significantly higher than predicted in both the model data and the validation data. The most interesting fact about the demographics of this group is that only 2% of respondents with a prediction near .36 were offered insurance through their employer (See Appendix 4).
The model appears to predict well for people who are likely to have insurance. These people constitute most of the data (See Appendix 3). It is possible that the relationship between the variables is different when several are present, implying that the best model may contain some measure of interaction between variables.
There were 60 possible dual‐interaction variables. Including all interactions explained a larger proportion of the deviance and eliminated any obvious biases. However, this model was far less parsimonious, and had several other problems (See Appendix 5). Because EMPOFFER was the most significant predictor, and because of the results at the prediction value .36, a new model was proposed that included only interactions between SEASONAL and EMPOFFER, and between TEMPJOB and EMPOFFER. Only one of these two interaction variables was significant, and the “.36 problem” was still present (See Appendix 6). A model involving an interaction between SEASONAL and TEMPJOB had the same AIC as the proposed model, and did not offer any more explanatory value (See Appendix 7).
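The count of 60 can be checked against the formula in Appendix 5, where SEASONAL, TEMPJOB, EMPOFFER, ETHNIC, HIDEG, and AGE are interacted pairwise and SEX enters only as a main effect. Each pair of variables contributes the product of their numbers of coefficients. A short Python sketch (names are illustrative):

```python
from itertools import combinations

# Number of coefficients each interacted variable contributes
# (non-default levels for categorical variables, 1 for AGE).
coef_counts = {
    "SEASONAL": 1,
    "TEMPJOB": 1,
    "EMPOFFER": 1,
    "ETHNIC": 3,
    "HIDEG": 6,
    "AGE": 1,
}

n_interactions = sum(a * b for a, b in combinations(coef_counts.values(), 2))
n_main_effects = sum(coef_counts.values()) + 1  # plus SEX (M)
n_parameters = 1 + n_main_effects + n_interactions  # plus the intercept

print(n_interactions, n_parameters)
# prints: 60 75
```

The total of 75 parameters agrees with the degrees of freedom in the Appendix 5 output (12,779 observations minus 12,704 residual degrees of freedom).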
Section 4 – Summary and Concluding Remarks
The proposed model, which uses no interactions, is the most parsimonious model considered, and has significant explanatory power. Although the proposed model does have its problems, it explains a significant proportion of the variation in the data set. Because the dependent variable (health care coverage) is ultimately based on human choice, it is unlikely that any data set or model would be able to perfectly predict whether a person would be covered. There were also technological limitations to this analysis that may have prevented the discovery of a better model (See Appendix 5).
Even when controlling for age, sex, ethnicity, education level, and whether a person is offered employer‐subsidized insurance, whether a person holds a temporary job carries significant predictive value when estimating the probability that the person carries insurance. There is an intuitive explanation for causation: if a person’s job is temporary, then their employer‐subsidized insurance (if it exists) would also be temporary, possibly leading to gaps in coverage throughout the year. Employers of temporary employees should take note of this result in their decisions, even though the actual effect is relatively small.
No statistically significant relationship was found between seasonality and the probability of coverage. It is possible that no relationship exists between seasonality and coverage, but hidden variables may also have played a part.
One possible hidden variable in this model is how a particular person obtained their coverage. Whether a person obtained their coverage through a spouse or through a government program may have some predictive value, and may help in analyzing the effect of non‐traditional employment on health coverage. In addition, eligibility for Medicaid or Medicare could also have had a major effect, especially on the low end. It is possible that different results could have been found if the dependent variable indicated whether a person was covered for at least part of the year, or covered at a particular point in time.
From a public policy standpoint, the key result is that temporarily employed people are less likely to hold health coverage. Programs like the Consolidated Omnibus Budget Reconciliation Act (1985) could be expanded to ease the transition from one temporary job to another, or to unemployment. Shifting tax advantages away from employer‐based coverage to individual‐purchaser coverage would remove the disparity between types of employment, and make it solely an income issue. Specific ways of addressing the disparity are left to another paper.
Section 5 – Works Cited
1 Income, Poverty, and Health Insurance Coverage in the United States: 2006. U.S. Census Bureau. Issued August 2007.
2 MEPS. http://www.meps.ahrq.gov/mepsweb/data_stats/download_data_files_codebook.jsp?PUFId=H97. Retrieved May 5, 2008.
3 Survey Background. http://www.meps.ahrq.gov/mepsweb/about_meps/survey_back.jsp. Retrieved May 5, 2008.
4 Applied Categorical & Nonnormal Data Analysis. http://www.gseis.ucla.edu/courses/ed231c/notes3/fit.html. Retrieved May 5, 2008.
Section 6 – Appendices
Appendix 1 – Full list of variables used from the MEPS database
DUID SSNLJB31 SSNLJB42 SSNLJB53 TEMPJB31 TEMPJB42 TEMPJB53 RACETHNX TTLP05X FCSZ1231 AGE05X SEX
INSJA05X INSFE05X INSMA05X INSAP05X INSMY05X INSJU05X INSJL05X INSAU05X INSSE05X INSOC05X INSNO05X INSDE05X
HIDEG PERWT05F OFFER31X OFFER42X OFFER53X OFREMP31 OFREMP42 OFREMP53 HELD31X HELD42X HELD53X
Appendix 2 – Specific Analysis of Data

Group            Not Covered   Covered   P‐Value
SEASONAL
  Non‐Seasonal   89.4%         93.4%     < 0.001
  Seasonal       10.6%         6.6%      < 0.001
TEMPJOB
  Non‐Temporary  82.5%         90.2%     < 0.001
  Temporary      17.5%         9.8%      < 0.001
EMPOFFER
  Not Offered    59.9%         21.5%     < 0.001
  Offered        40.1%         78.5%     < 0.001
ETHNIC
  _Other         40.4%         64.2%     < 0.001
  Asian          2.9%          4.8%      0.007
  Black          15.7%         14.9%     0.335
  Hispanic       41.0%         16.1%     < 0.001
HIDEG
  NoDegree       33.8%         13.8%     < 0.001
  GED            6.3%          3.4%      < 0.001
  _HSDiploma     44.4%         45.8%     0.688
  Bachelors      8.0%          18.5%     < 0.001
  Masters        1.5%          7.8%      < 0.001
  Doctorate      0.4%          2.2%      < 0.001
  Other          5.6%          8.5%      < 0.001
AGE
  16‐35          56.2%         34.4%     < 0.001
  36‐55          36.3%         48.9%     < 0.001
  56+            7.5%          16.7%     < 0.001
SEX
  F              45.1%         49.7%     < 0.001
  M              54.9%         50.3%     < 0.001
Table A‐1 ‐ Comparison of Demographics by Coverage

Group            % Covered   Count   P‐Value
SEASONAL
  Non‐Seasonal   68.4%       12689   0.021
  Seasonal       56.3%       1090    < 0.001
TEMPJOB
  Non‐Temporary  69.4%       12083   < 0.001
  Temporary      53.8%       1696    < 0.001
EMPOFFER
  Not Offered    42.7%       4688    < 0.001
  Offered        80.2%       9091    < 0.001
ETHNIC
  _Other         76.7%       7782    < 0.001
  Asian          77.4%       574     < 0.001
  Black          66.3%       2092    0.283
  Hispanic       44.8%       3331    < 0.001
HIDEG
  NoDegree       45.9%       2800    < 0.001
  GED            52.8%       597     < 0.001
  _HSDiploma     68.1%       6247    0.836
  Bachelors      82.7%       2073    < 0.001
  Masters        91.6%       793     < 0.001
  Doctorate      91.7%       228     < 0.001
  Other          75.9%       1041    < 0.001
AGE
  16‐35          55.9%       5712    < 0.001
  36‐55          73.6%       6177    < 0.001
  56+            82.2%       1890    < 0.001
SEX
  F              69.6%       6644    < 0.001
  M              65.5%       7135    < 0.001
TOTAL            67.5%       13779
Table A‐2 – Proportion Covered by Group

Statistic            Value
Minimum              16
25th Percentile      28
Median               39
Mean                 39.71
75th Percentile      50
Maximum              85
Standard Deviation   13.505
Table A‐3 – AGE Summary Statistics
Appendix 3 – Validation Data
The demographics are very similar between the model data and the validation data.
Group Model Validation Sample
SEASONAL
Non‐Seasonal 92.2% 90.9%
Seasonal 7.8% 9.1%
TEMPJOB
Non‐Temporary 87.7% 87.6%
Temporary 12.3% 12.4%
EMPOFFER
Not Offered 34.0% 34.5%
Offered 66.0% 65.5%
ETHNIC
_Other 56.6% 54.6%
Asian 4.2% 4.1%
Black 15.2% 15.1%
Hispanic 24.0% 26.2%
HIDEG
NoDegree 20.2% 21.7%
GED 4.4% 3.9%
_HSDiploma 45.2% 47.4%
Bachelors 15.2% 13.5%
Masters 5.7% 6.0%
Doctorate 1.7% 1.4%
Other 7.7% 6.1%
AGE
16‐35 41.4% 42.7%
36‐55 44.9% 44.5%
56+ 13.8% 12.8%
SEX
F 48.3% 47.1%
M 51.7% 52.9%
Covered
No Coverage 32.5% 33.6%
Coverage 67.5% 66.4%
TOTAL 12779 1000
Table A‐4 – Comparison of Demographics of Model Data and Validation Data
The distribution of predictions from the model data is very similar to that from the validation data, as we would expect from the similar demographics. There are many more predictions near the right of the distribution (that is, at high predicted probabilities), which explains why the accuracy of the model is better for the larger values.
Figure A‐1 – Comparison of Distribution of Predictions from the Model Data and Validation Data
Appendix 4 – “.36” Data
Note that very few people with predictions around .36 are offered employer‐sponsored health insurance. Other variables differ between populations, but none as drastically as EMPOFFER.

Group            Model Data   Predictions Around .36
SEASONAL
  Non‐Seasonal   92.2%        83.7%
  Seasonal       7.8%         16.3%
TEMPJOB
  Non‐Temporary  87.7%        75.4%
  Temporary      12.3%        24.6%
EMPOFFER
  Not Offered    34.0%        98.3%
  Offered        66.0%        1.7%
ETHNIC
  _Other         56.6%        48.6%
  Asian          4.2%         3.3%
  Black          15.2%        18.5%
  Hispanic       24.0%        29.6%
HIDEG
  NoDegree       20.2%        53.7%
  GED            4.4%         7.8%
  _HSDiploma     45.2%        33.6%
  Bachelors      15.2%        2.5%
  Masters        5.7%         0.0%
  Doctorate      1.7%         0.0%
  Other          7.7%         2.4%
AGE
  16‐35          41.4%        67.3%
  36‐55          44.9%        26.6%
  56+            13.8%        6.1%
SEX
  F              48.3%        48.6%
  M              51.7%        51.4%
Covered
  No Coverage    32.5%        53.0%
  Coverage       67.5%        47.0%
TOTAL            12779        67
Table A‐5 – Comparison of Demographics between Model Data and Persons with Predictions Around .36
Appendix 5 – All Interactions Model
The major problem with this model is that most of the variables are not statistically significant. Most likely, this is because the interactions are highly correlated with other variables. Even though this model gives a fairly good prediction with no major biases (See Figure A‐2) and has a lower AIC than the proposed model (12765 vs. 13095), it is too complex to be an accurate representation of the actual relationship. Performing a stepwise regression would be a good way to fine‐tune this model, but I do not have the computing power available to perform one to a satisfactory degree (to illustrate, one step of the algorithm took a 2GHz processor over ten minutes). Figure A‐2 is of the same type as Figure 1 (See Section 3.2).
The following is the regression output from the model that includes interaction between all variables.
glm(COVWHOLE ~ factor(SEASONAL) + factor(TEMPJOB) + ETHNIC + SEX +
    factor(EMPOFFER) + HIDEG + AGE +
    factor(SEASONAL)*factor(TEMPJOB) + factor(SEASONAL)*factor(EMPOFFER) +
    factor(SEASONAL)*ETHNIC + factor(SEASONAL)*HIDEG + factor(SEASONAL)*AGE +
    factor(TEMPJOB)*factor(EMPOFFER) + factor(TEMPJOB)*ETHNIC +
    factor(TEMPJOB)*HIDEG + factor(TEMPJOB)*AGE +
    factor(EMPOFFER)*ETHNIC + factor(EMPOFFER)*HIDEG + factor(EMPOFFER)*AGE +
    ETHNIC*HIDEG + ETHNIC*AGE + HIDEG*AGE,
    binomial(link=logit))

Coefficient  Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -1.081e+00  1.477e-01  -7.322  2.44e-13 ***
factor(SEASONAL)1  9.410e-01  2.940e-01  3.201  0.00137 **
factor(TEMPJOB)1  9.596e-01  2.272e-01  4.224  2.40e-05 ***
ETHNICAsian  -2.896e-01  4.529e-01  -0.639  0.52250
ETHNICBlack  -1.494e-01  2.167e-01  -0.689  0.49058
ETHNICHispanic  -1.414e+00  1.923e-01  -7.352  1.96e-13 ***
SEXM  -2.055e-01  4.454e-02  -4.613  3.96e-06 ***
factor(EMPOFFER)1  4.425e-01  1.542e-01  2.869  0.00412 **
HIDEGBachelors  1.759e-01  3.089e-01  0.569  0.56910
HIDEGDoctorate  2.444e+00  1.390e+00  1.759  0.07865 .
HIDEGGED  -7.056e-01  3.890e-01  -1.814  0.06967 .
HIDEGMasters  5.031e-01  6.792e-01  0.741  0.45885
HIDEGNoDegree  1.287e+00  1.766e-01  7.285  3.22e-13 ***
HIDEGOther  4.829e-01  3.463e-01  1.395  0.16316
AGE  3.500e-02  3.390e-03  10.323  < 2e-16 ***
factor(SEASONAL)1:factor(TEMPJOB)1  -2.124e-01  1.762e-01  -1.205  0.22814
factor(SEASONAL)1:factor(EMPOFFER)1  -2.466e-01  1.796e-01  -1.373  0.16971
factor(SEASONAL)1:ETHNICAsian  3.130e-01  6.501e-01  0.481  0.63018
factor(SEASONAL)1:ETHNICBlack  -4.181e-01  2.448e-01  -1.708  0.08758 .
factor(SEASONAL)1:ETHNICHispanic  -1.152e-01  2.098e-01  -0.549  0.58278
factor(SEASONAL)1:HIDEGBachelors  -5.085e-01  2.675e-01  -1.901  0.05731 .
factor(SEASONAL)1:HIDEGDoctorate  1.147e+01  1.540e+02  0.074  0.94062
factor(SEASONAL)1:HIDEGGED  1.485e-01  4.198e-01  0.354  0.72353
factor(SEASONAL)1:HIDEGMasters  -5.771e-01  4.641e-01  -1.243  0.21370
factor(SEASONAL)1:HIDEGNoDegree  4.144e-02  2.138e-01  0.194  0.84631
factor(SEASONAL)1:HIDEGOther  -7.373e-02  3.647e-01  -0.202  0.83979
factor(SEASONAL)1:AGE  -1.498e-02  6.062e-03  -2.472  0.01345 *
factor(TEMPJOB)1:factor(EMPOFFER)1  -8.245e-01  1.383e-01  -5.963  2.47e-09 ***
factor(TEMPJOB)1:ETHNICAsian  7.851e-02  4.136e-01  0.190  0.84944
factor(TEMPJOB)1:ETHNICBlack  -1.624e-01  1.812e-01  -0.896  0.37005
factor(TEMPJOB)1:ETHNICHispanic  1.963e-01  1.671e-01  1.175  0.24007
factor(TEMPJOB)1:HIDEGBachelors  -6.901e-02  2.215e-01  -0.312  0.75536
factor(TEMPJOB)1:HIDEGDoctorate  -2.364e-01  8.893e-01  -0.266  0.79037
factor(TEMPJOB)1:HIDEGGED  -3.749e-01  3.460e-01  -1.083  0.27870
factor(TEMPJOB)1:HIDEGMasters  8.784e-01  4.668e-01  1.882  0.05985 .
factor(TEMPJOB)1:HIDEGNoDegree  -3.102e-01  1.735e-01  -1.788  0.07383 .
factor(TEMPJOB)1:HIDEGOther  -5.344e-02  2.605e-01  -0.205  0.83747
factor(TEMPJOB)1:AGE  -2.116e-02  4.812e-03  -4.398  1.09e-05 ***
ETHNICAsian:factor(EMPOFFER)1  6.975e-01  2.573e-01  2.711  0.00670 **
ETHNICBlack:factor(EMPOFFER)1  3.477e-01  1.313e-01  2.649  0.00807 **
ETHNICHispanic:factor(EMPOFFER)1  9.307e-01  1.165e-01  7.986  1.39e-15 ***
factor(EMPOFFER)1:HIDEGBachelors  2.772e-01  1.630e-01  1.701  0.08900 .
factor(EMPOFFER)1:HIDEGDoctorate  3.067e-02  5.981e-01  0.051  0.95910
factor(EMPOFFER)1:HIDEGGED  2.079e-02  2.116e-01  0.098  0.92175
factor(EMPOFFER)1:HIDEGMasters  9.705e-01  3.270e-01  2.968  0.00300 **
factor(EMPOFFER)1:HIDEGNoDegree  -6.523e-01  1.196e-01  -5.455  4.89e-08 ***
factor(EMPOFFER)1:HIDEGOther  -5.967e-02  1.892e-01  -0.315  0.75240
factor(EMPOFFER)1:AGE  2.484e-02  3.546e-03  7.006  2.45e-12 ***
ETHNICAsian:HIDEGBachelors  -8.319e-01  3.146e-01  -2.644  0.00820 **
ETHNICBlack:HIDEGBachelors  -9.955e-02  2.156e-01  -0.462  0.64433
ETHNICHispanic:HIDEGBachelors  -6.752e-02  2.144e-01  -0.315  0.75279
ETHNICAsian:HIDEGDoctorate  -4.184e-01  6.987e-01  -0.599  0.54930
ETHNICBlack:HIDEGDoctorate  1.119e+01  1.445e+02  0.077  0.93829
ETHNICHispanic:HIDEGDoctorate  -7.025e-01  9.284e-01  -0.757  0.44925
ETHNICAsian:HIDEGGED  1.258e+01  3.593e+02  0.035  0.97208
ETHNICBlack:HIDEGGED  7.980e-01  2.654e-01  3.006  0.00264 **
ETHNICHispanic:HIDEGGED  7.515e-01  2.504e-01  3.001  0.00269 **
ETHNICAsian:HIDEGMasters  -3.065e-01  5.098e-01  -0.601  0.54775
ETHNICBlack:HIDEGMasters  -8.986e-01  4.045e-01  -2.221  0.02632 *
ETHNICHispanic:HIDEGMasters  8.873e-01  6.008e-01  1.477  0.13967
ETHNICAsian:HIDEGNoDegree  -1.547e-01  3.492e-01  -0.443  0.65784
ETHNICBlack:HIDEGNoDegree  4.810e-02  1.713e-01  0.281  0.77881
ETHNICHispanic:HIDEGNoDegree  -1.022e-01  1.311e-01  -0.780  0.43555
ETHNICAsian:HIDEGOther  4.598e-01  6.270e-01  0.733  0.46338
ETHNICBlack:HIDEGOther  -3.876e-01  2.304e-01  -1.682  0.09249 .
ETHNICHispanic:HIDEGOther  -3.502e-01  2.479e-01  -1.413  0.15778
ETHNICAsian:AGE  1.783e-03  1.040e-02  0.172  0.86381
ETHNICBlack:AGE  -1.073e-02  4.718e-03  -2.275  0.02289 *
ETHNICHispanic:AGE  1.094e-04  4.540e-03  0.024  0.98078
HIDEGBachelors:AGE  5.712e-03  6.469e-03  0.883  0.37724
HIDEGDoctorate:AGE  -3.394e-02  2.410e-02  -1.409  0.15895
HIDEGGED:AGE  -7.229e-03  8.432e-03  -0.857  0.39127
HIDEGMasters:AGE  -3.627e-03  1.304e-02  -0.278  0.78081
HIDEGNoDegree:AGE  -3.220e-02  4.196e-03  -7.673  1.68e-14 ***
HIDEGOther:AGE  -4.729e-03  7.350e-03  -0.643  0.51999

Null deviance: 16109 on 12778 degrees of freedom
Residual deviance: 12615 on 12704 degrees of freedom
AIC: 12765

[Figure A‐2 – Difference between Actual Proportion and Prediction versus Prediction for the All Interaction Model]
Appendix 6 – EMPOFFER Interaction Model
This model has a lower AIC than the proposed model (13048 vs. 13095), and one of the variables introduced is statistically significant. In this model, the main effect of TEMPJOB is no longer significant. One interpretation is that a temporary job only has explanatory value when the employer offers health coverage, which is intuitive. SEASONAL still does not have any significant predictive value associated with it. The model has the same problems as the proposed model, such as the “.36” problem (See Figure A‐3). Introducing the interaction terms also makes the model less parsimonious. This may be a better model than the proposed model, but a stepwise regression to introduce new interactions should be done first (See Appendix 5).

[Figure A‐3 – Difference between Actual Proportion and Prediction versus Prediction for the EMPOFFER Interaction Model]
The following is the output from the regression.
glm(formula = COVWHOLE ~ factor(SEASONAL) + factor(TEMPJOB) +
    factor(SEASONAL) * factor(EMPOFFER) + factor(TEMPJOB) * factor(EMPOFFER) +
    ETHNIC + SEX + factor(EMPOFFER) + HIDEG + AGE,
    family = binomial(link = logit))

Coefficients:
Coefficient  Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -0.83169  0.08115  -10.249  < 2e-16 ***
factor(SEASONAL)1  0.18786  0.10787  1.742  0.08159 .
factor(TEMPJOB)1  0.10188  0.09348  1.090  0.27576
factor(EMPOFFER)1  1.56229  0.04943  31.608  < 2e-16 ***
ETHNICAsian  -0.09311  0.11771  -0.791  0.42892
ETHNICBlack  -0.44007  0.06148  -7.158  8.21e-13 ***
ETHNICHispanic  -0.99267  0.05342  -18.583  < 2e-16 ***
SEXM  -0.21011  0.04365  -4.814  1.48e-06 ***
HIDEGBachelors  0.44608  0.07089  6.292  3.13e-10 ***
HIDEGDoctorate  1.01568  0.25724  3.948  7.87e-05 ***
HIDEGGED  -0.61958  0.09814  -6.314  2.73e-10 ***
HIDEGMasters  1.02096  0.13774  7.412  1.24e-13 ***
HIDEGNoDegree  -0.18901  0.05747  -3.289  0.00101 **
HIDEGOther  0.16749  0.08670  1.932  0.05339 .
AGE  0.02716  0.00165  16.464  < 2e-16 ***
factor(SEASONAL)1:factor(EMPOFFER)1  -0.33284  0.17134  -1.943  0.05208 .
factor(TEMPJOB)1:factor(EMPOFFER)1  -0.77997  0.13282  -5.872  4.29e-09 ***
---
Null deviance: 16109 on 12778 degrees of freedom
Residual deviance: 13014 on 12762 degrees of freedom
AIC: 13048
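With an interaction term in the model, the effect of holding a temporary job depends on whether the employer offers coverage. The Python sketch below combines the main effect with the interaction coefficient from the output above to show the two cases (the variable names are illustrative):

```python
import math

B_TEMPJOB = 0.10188          # factor(TEMPJOB)1
B_TEMP_X_OFFER = -0.77997    # factor(TEMPJOB)1:factor(EMPOFFER)1

# Change in log-odds of coverage from holding a temporary job.
effect_not_offered = B_TEMPJOB               # EMPOFFER = 0
effect_offered = B_TEMPJOB + B_TEMP_X_OFFER  # EMPOFFER = 1

or_not_offered = math.exp(effect_not_offered)
or_offered = math.exp(effect_offered)

print(f"{or_not_offered:.2f} {or_offered:.2f}")
# prints: 1.11 0.51
```

When coverage is not offered, the estimated odds ratio is close to one (and the coefficient is not significant); when coverage is offered, a temporary job roughly halves the estimated odds of being covered, which matches the interpretation given in this appendix.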
Appendix 7 – SEASONAL/TEMPJOB Interaction Model
The AIC for this model was the same as the AIC of the proposed model (13095). The interaction term was not statistically significant, the coefficients and significance of the other variables were essentially unchanged, and the model is less parsimonious than the proposed model. Overall, this is a worse model than the proposed one.
The following is the output from the regression.
glm(formula = COVWHOLE ~ factor(SEASONAL) + factor(TEMPJOB) +
    factor(SEASONAL):factor(TEMPJOB) + ETHNIC + SEX + factor(EMPOFFER) +
    HIDEG + AGE,
    family = binomial(link = logit))

Coefficients:
Coefficient  Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -0.762359  0.080350  -9.488  < 2e-16 ***
factor(SEASONAL)1  0.037825  0.119076  0.318  0.750748
factor(TEMPJOB)1  -0.299231  0.076785  -3.897  9.74e-05 ***
ETHNICAsian  -0.101594  0.117336  -0.866  0.386581
ETHNICBlack  -0.440250  0.061336  -7.178  7.09e-13 ***
ETHNICHispanic  -0.993105  0.053311  -18.629  < 2e-16 ***
SEXM  -0.206447  0.043546  -4.741  2.13e-06 ***
factor(EMPOFFER)1  1.430226  0.045521  31.419  < 2e-16 ***
HIDEGBachelors  0.444164  0.070769  6.276  3.47e-10 ***
HIDEGDoctorate  1.015207  0.256318  3.961  7.47e-05 ***
HIDEGGED  -0.619202  0.097739  -6.335  2.37e-10 ***
HIDEGMasters  1.014326  0.137610  7.371  1.69e-13 ***
HIDEGNoDegree  -0.191150  0.057370  -3.332  0.000863 ***
HIDEGOther  0.166396  0.086473  1.924  0.054323 .
AGE  0.027393  0.001649  16.613  < 2e-16 ***
factor(SEASONAL)1:factor(TEMPJOB)1  0.250167  0.168615  1.484  0.137899
---
Null deviance: 16109 on 12778 degrees of freedom
Residual deviance: 13063 on 12763 degrees of freedom
AIC: 13095
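The AIC values reported in Appendices 5 through 7 can be cross‐checked against the deviances. For a logistic regression on individual (ungrouped) binary outcomes, the residual deviance equals minus twice the log‐likelihood, so AIC = residual deviance + 2k, where k is the number of estimated coefficients. A Python sketch of this check, with k recovered from the degrees of freedom (the dictionary layout is illustrative):

```python
N_OBS = 12779  # model data: null degrees of freedom (12778) plus one

# (residual deviance, residual degrees of freedom, reported AIC)
models = {
    "All interactions (Appendix 5)": (12615, 12704, 12765),
    "EMPOFFER interactions (Appendix 6)": (13014, 12762, 13048),
    "SEASONAL:TEMPJOB interaction (Appendix 7)": (13063, 12763, 13095),
}

recomputed = {}
for name, (resid_dev, resid_df, reported_aic) in models.items():
    k = N_OBS - resid_df  # number of estimated coefficients
    recomputed[name] = resid_dev + 2 * k
    print(f"{name}: k={k}, AIC={recomputed[name]} (reported {reported_aic})")
```

All three recomputed values agree with the reported AICs, which also illustrates why the all‐interactions model achieves a lower AIC despite its 75 coefficients: its deviance falls by more than the penalty of 2 per added coefficient.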