The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.

Resolving the Goldilocks problem: Model specification

Jane E. Miller, PhD

Overview

• Model specification approaches to resolving the Goldilocks problem include
  – Standardized coefficients
  – Logarithmic transformation
  – Other specification issues

Standardized coefficients

Unstandardized coefficients

• Unstandardized βs estimate the effect of a 1-unit increase in Xi on Y, where the effect size is measured in the original units of Y.

• A “one-size-fits-all” approach to interpreting βs can be misleading because variables
  – Represent different levels of measurement,
  – Have different units of measurement,
  – Have varying distributions of values,
  – Occur in different real-world circumstances.
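The point about units of measurement can be checked directly. The sketch below fits the same simulated association twice: once with a hypothetical income variable in dollars and once in thousands of dollars. The data and the helper `ols_slope` are illustrative assumptions and do not come from the source. The association is identical in both fits, yet β changes by exactly the rescaling factor, because what "a 1-unit increase" means has changed.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(np.sum(xc * (y - y.mean())) / np.sum(xc ** 2))


# Hypothetical simulated data (not from the source): income in dollars and an outcome Y.
rng = np.random.default_rng(0)
income_dollars = rng.uniform(10_000, 100_000, size=500)
y = 2.0 + 0.00005 * income_dollars + rng.normal(0, 1, size=500)

beta_per_dollar = ols_slope(income_dollars, y)        # contrast = $1
beta_per_1000 = ols_slope(income_dollars / 1000, y)   # contrast = $1,000

print(beta_per_dollar, beta_per_1000)
# Same association, different "1-unit" contrast: beta scales by 1,000.
print(np.isclose(beta_per_1000, 1000 * beta_per_dollar))  # True
```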

Standardized coefficients

• A standardized coefficient estimates the effect of a one-standard-deviation increase in Xi on Y
  – Measured in standard deviation units of Y
    • E.g., an effect size of 0.3 would mean an increase of 30% of a standard deviation in the dependent variable
  – Similar to standardized scores or z-scores

• Standardized βs provide a consistent metric in which to compare the relative sizes of the βs on continuous independent variables with different ranges and scales.
  – The contrast for each IV is its standard deviation.
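For an OLS model, the link between the two metrics can be written as follows. β_i is the unstandardized coefficient, and s_{X_i} and s_Y are the sample standard deviations of Xi and Y. The notation β_i^* for the standardized coefficient is ours, not the source's. The second line shows that a one-standard-deviation increase in Xi shifts Y by the same amount in original units whichever coefficient you start from.

```latex
\beta_i^{*} = \beta_i \, \frac{s_{X_i}}{s_Y}
\qquad\Longleftrightarrow\qquad
\beta_i^{*} \, s_Y = \beta_i \, s_{X_i}
```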

Using standardized coefficients

• Commonly used for psychological or attitudinal scales for which the units have no inherent meaning.

• Should not be used for variables for which a one-standard-deviation increase lacks an intuitive interpretation, e.g.,
  – dummy variables
  – interaction terms

Specifying a model with standardized coefficients

• Easily specified as an option to an OLS model in most statistical packages.

• Identify the dependent and independent variables as usual.
  – Enter them in the model specification in their original, untransformed versions.
    • Do not create versions in the metric of standard deviations. The software will do that for you!

• Request “standardized betas”
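For readers curious about what the software computes when you request standardized betas, here is a minimal NumPy sketch. It assumes OLS with an intercept and sample standard deviations (ddof=1). The helper names and the simulated data are hypothetical. The model is fitted on the raw variables, as the slide advises, and each slope is then rescaled by s_X / s_Y. Z-scores appear only in the final line, as a check that the rescaled slopes match a regression on standardized variables.

```python
import numpy as np


def ols_slopes(X, y):
    """Unstandardized OLS slopes for raw X (n x k) and y; the intercept is dropped."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1:]


def standardized_betas(X, y):
    """Rescale each unstandardized slope by s_X / s_Y (sample SDs, ddof=1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return ols_slopes(X, y) * X.std(axis=0, ddof=1) / y.std(ddof=1)


def zscore(a):
    """Column-wise z-scores, used here only to check the result."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


# Hypothetical simulated data: two IVs on very different scales.
rng = np.random.default_rng(1)
n = 1_000
x1 = rng.normal(25, 5, n)
x2 = rng.normal(2.0, 1.5, n)
y = 3000 + 10 * x1 + 80 * x2 + rng.normal(0, 400, n)
X = np.column_stack([x1, x2])

raw = ols_slopes(X, y)
std = standardized_betas(X, y)
print("unstandardized:", raw)
print("standardized:  ", std)

# Regressing z-scored Y on z-scored Xs yields the same standardized betas.
print(np.allclose(std, ols_slopes(zscore(X), zscore(y))))  # True
```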

Descriptive statistics to report if you use standardized coefficients

• In the table of descriptive statistics, report the mean, minimum and maximum values, and standard deviation in the original units for
  – each independent variable (IV)
  – the dependent variable (DV)

Describing standardized coefficients in prose

• In the results section, interpret the effect sizes for different IVs in terms of multiples or percentages of the standard deviation in the DV
  – E.g., “A one-standard-deviation increase in the income-to-poverty ratio (IPR) is associated with an increase of 19.6% of a standard deviation in birth weight (about 38 grams), roughly twice the size of the corresponding standardized coefficient on mother’s age (9.7%).”

Reporting the effect size in original units

“A one-standard-deviation increase in the income-to-poverty ratio (IPR) is associated with an increase of 19.6% of a standard deviation in birth weight (about 38 grams), roughly twice the size of the corresponding standardized coefficient on mother’s age (9.7%).”

• Note that the effect size is also reported back in the original units of the DV (grams in this case), to facilitate intuitive understanding in the context of the specific research question and variables.
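The arithmetic behind this sentence can be sketched as follows. The two standardized coefficients (0.196 and 0.097) come from the quoted example. `SD_Y_HYPOTHETICAL` is a placeholder and not a figure from the source: substitute the standard deviation of the DV from your own descriptive statistics table to convert back to original units.

```python
def standardized_to_original(beta_std, sd_y):
    """Change in Y (original units) for a one-SD increase in X."""
    return beta_std * sd_y


def relative_size(beta_std_a, beta_std_b):
    """How many times larger standardized coefficient a is than b."""
    return beta_std_a / beta_std_b


# Standardized coefficients quoted in the text, as proportions of an SD of Y.
beta_ipr, beta_age = 0.196, 0.097
print(round(relative_size(beta_ipr, beta_age), 2))  # 2.02, i.e., "roughly twice"

# Placeholder value only: replace with the SD of your DV (e.g., in grams).
SD_Y_HYPOTHETICAL = 100.0
print(standardized_to_original(beta_ipr, SD_Y_HYPOTHETICAL))
```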

Logarithmic specifications

• Another approach to comparing βs across variables with different ranges and scales is to take logarithms of the
  – dependent variable (Y),
  – independent variable(s) (Xis),
  – or both.

• The βs on the transformed variable(s) lend themselves to straightforward interpretations such as percentage change.

Types of logarithmic specifications

• Lin-lin
• Lin-log
• Log-lin
• Log-log
  – Also known as “double log”

Lin-lin specifications

• Review: For OLS models in which neither the IV nor the DV is logged, β measures the change in Y for a 1-unit increase in X1,
  – with the changes measured in the respective units of the IV and DV.

• In the lingo of logarithmic specifications, these models are termed “lin-lin” models because they are linear in both the IV and DV:

$Y = \beta_0 + \beta_1 X_1$

Lin-log specifications

• Lin-log models are of the form $Y = \beta_0 + \beta_1 \ln X_1$,
  where $\ln X_1$ is the natural log (base e) of X1.

• For such models, β1 ÷ 100 approximates the change in the original units of the DV for a 1 percent increase in the IV.

• E.g., in a model of earnings, the β on ln(monthly hours worked) is 5,905.3:
  – “Each 1 percent increase in monthly hours worked is associated with a NT$59 increase in monthly earnings.”
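The β ÷ 100 rule is an approximation. Under the lin-log form above, the exact change in Y for a p percent increase in X is β1 × ln(1 + p/100). The sketch below compares the two using the coefficient quoted in the example. The function names are hypothetical.

```python
import math


def lin_log_change(beta, pct_increase=1.0):
    """Exact change in Y (original units) for a pct_increase % rise in X,
    in the model Y = b0 + beta * ln(X)."""
    return beta * math.log(1 + pct_increase / 100)


def lin_log_approx(beta, pct_increase=1.0):
    """Rule-of-thumb approximation: beta / 100 per 1 percent increase in X."""
    return beta * pct_increase / 100


beta_hours = 5905.3  # coefficient on ln(monthly hours worked), from the example
print(round(lin_log_approx(beta_hours), 2))  # 59.05
print(round(lin_log_change(beta_hours), 2))  # 58.76, still about NT$59
```

For a larger contrast such as a 10 percent increase, the gap between the approximation and the exact value widens, so compute the exact version when the contrast is not small.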

Log-lin specifications

• Log-lin models are of the form $\ln Y = \beta_0 + \beta_1 X_1$.

• For such models, $100(e^{\beta_1} - 1)$ gives the percentage change in Y for a 1-unit increase in X1,
  – where the increase in X1 is in its original units.

• E.g., “For each additional child a woman has, her monthly earnings are reduced by 3.6 percent.”
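A sketch of the 100(e^β1 − 1) calculation follows. The source does not report the coefficient behind the 3.6 percent example. The value below is a hypothetical one, back-solved as ln(1 − 0.036) so that the output matches the quoted figure. The last line shows the common 100 × β1 shortcut, which is close but not identical.

```python
import math


def log_lin_pct_change(beta, delta_x=1.0):
    """Percentage change in Y for a delta_x-unit increase in X,
    in the model ln(Y) = b0 + beta * X: 100 * (exp(beta * delta_x) - 1)."""
    return 100 * (math.exp(beta * delta_x) - 1)


# Hypothetical coefficient, back-solved to reproduce the 3.6 percent example.
beta_children = math.log(1 - 0.036)  # about -0.0367

print(round(log_lin_pct_change(beta_children), 1))  # -3.6
print(round(100 * beta_children, 2))                # -3.67, the 100*beta shortcut
```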

Log-log specifications

• Log-log models are of the form $\ln Y = \beta_0 + \beta_1 \ln X_1$.

• For such models, β1 estimates the percentage change in Y for a 1 percent increase in X1.
  – This measure is known in economics as the elasticity (Gujarati 2002).

• E.g., “A 1 percent increase in monthly hours worked is associated with a 0.6% increase in monthly earnings.”
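Under the log-log form, a p percent increase in X1 multiplies Y by (1 + p/100)^β1. For a 1 percent change this is very close to β1 percent, and for larger contrasts the two drift apart. The elasticity of 0.6 below is the value implied by the example, and the function name is hypothetical.

```python
def log_log_pct_change(elasticity, pct_increase_x=1.0):
    """Percentage change in Y for a pct_increase_x % rise in X,
    in the model ln(Y) = b0 + elasticity * ln(X)."""
    return 100 * ((1 + pct_increase_x / 100) ** elasticity - 1)


elasticity_hours = 0.6  # coefficient implied by the example in the text

print(round(log_log_pct_change(elasticity_hours), 1))        # 0.6
print(round(log_log_pct_change(elasticity_hours, 10.0), 1))  # 5.9 for a 10% increase
```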

Choice of contrast size for logarithmic models

• Caveat: The scale of the logged variable must be taken into account when choosing an appropriate-sized contrast.

• E.g., a 1-unit increase in ln(monthly hours worked) from 5.3 to 6.3 is equivalent to an increase from 200 to 544 hours per month.
  – That contrast is roughly a 2.7-fold increase in hours.
  – It implies working about three-quarters of all daytime and nighttime hours, 7 days a week.
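The arithmetic in this example can be checked by exponentiating back to hours. Any 1-unit increase in a natural log multiplies the original variable by e (about 2.72). The sketch assumes an average month of 24 × 365.25 ÷ 12 ≈ 730.5 hours; that assumption is ours, not the source's.

```python
import math

low, high = 5.3, 6.3  # ln(monthly hours worked), from the example
hours_low, hours_high = math.exp(low), math.exp(high)

print(round(hours_low, 1), round(hours_high, 1))  # 200.3 544.6
print(round(hours_high / hours_low, 2))           # 2.72, i.e., a factor of e

HOURS_PER_MONTH = 24 * 365.25 / 12  # assumption: about 730.5 hours per average month
print(round(hours_high / HOURS_PER_MONTH, 2))     # 0.75, about three-quarters
```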

Review: Assess whether a 1-unit increase in the variable is the right sized contrast

• Always consider whether a 1-unit increase in the variable as specified in the model makes sense in its real-world context!
  – Topic
  – Distribution in the data

• If not, use theoretical and empirical criteria for choosing a suitably sized contrast.
  – See podcast on measurement and variables approaches to resolving the Goldilocks problem.

Descriptive statistics to report if you use a logarithmic specification

• In a table of descriptive statistics, report the mean and range both
  – In the original, untransformed units, such as income in dollars, which are
    • more intuitively understandable
    • easier than the logged version to compare with values from other samples.
  – In the logged units, so readers know the range and scale of values to apply to the estimated coefficients.
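One reason to report both versions: the minimum and maximum carry over directly under the log transform, but the mean does not, because the mean of the logs is smaller than the log of the mean. The income values below are hypothetical and serve only as an illustration. `describe` is a helper invented for this sketch.

```python
import numpy as np


def describe(values):
    """Mean, minimum, and maximum of a 1-D array."""
    v = np.asarray(values, dtype=float)
    return {"mean": float(v.mean()), "min": float(v.min()), "max": float(v.max())}


# Hypothetical incomes in dollars (illustrative values only).
income = np.array([12_000, 25_000, 40_000, 58_000, 95_000, 210_000], dtype=float)
log_income = np.log(income)

for label, values in (("income ($)", income), ("ln(income)", log_income)):
    s = describe(values)
    print(f"{label:<12} mean={s['mean']:.2f}  min={s['min']:.2f}  max={s['max']:.2f}")

# The min and max of the logs equal the logs of the min and max,
# but the mean of the logs is not the log of the mean, so report it separately.
print(describe(log_income)["mean"] < np.log(describe(income)["mean"]))  # True
```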

Interpreting coefficients from logarithmic specifications

• Taking logs of the IV(s) and/or DV affects interpretation of the estimated coefficients.

• If your models include any logged variables, report the pertinent units as you write about the βs, especially if
  – your specifications include a mixture of logged and non-logged variables;
  – you are testing the sensitivity of your findings to different logarithmic specifications.

Goldilocks issues for other types of specifications

Polynomial: Quadratic specification of IPR/birth weight pattern

Goldilocks issues for polynomials

• In models involving polynomials such as $X_i$ and $X_i^2$, the effect of a 1-unit increase in $X_i$ on Y varies for different values of $X_i$.
  – E.g., the size of the effect of Xi on Y cannot be generalized to all values of Xi.

• To convey the shape of the association between Xi and Y:
  – In the text, present the change in Y for each of several contrasts in values of Xi.
  – Create a graph.

• See podcast on polynomials for more information.
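The point that no single number summarizes a quadratic can be made concrete. Under Y = β0 + β1X + β2X², the change in Y from x to x + 1 is β1 + β2(2x + 1), so it depends on where the contrast starts. The coefficients below are hypothetical and are not estimates from the IPR/birth weight model.

```python
def quadratic_change(b1, b2, x, delta=1.0):
    """Change in Y when X rises from x to x + delta,
    in the model Y = b0 + b1*X + b2*X**2 (b0 cancels out)."""
    return b1 * delta + b2 * ((x + delta) ** 2 - x ** 2)


# Hypothetical coefficients for illustration (not estimates from the text).
b1, b2 = 150.0, -12.0
for x in (1.0, 3.0, 5.0):
    print(x, quadratic_change(b1, b2, x))  # 114.0, 66.0, 18.0: the effect shrinks as x grows
```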

Goldilocks issues for interactions

• In models involving interactions, βs on main effect and interaction terms for two or more IVs must be combined to calculate the overall effect on the DV.

• The effect of a 1-unit change in only one of those variables cannot be assessed from its β alone.

• See chapter and podcasts on interactions.
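As a minimal sketch of combining terms, assume a two-way interaction model Y = β0 + β1X1 + β2X2 + β3X1X2. The effect of a 1-unit increase in X1 is then β1 + β3X2, so it must be reported for specific values of X2. The model form, function name, and coefficients are hypothetical.

```python
def interaction_effect(b1, b3, x2):
    """Effect of a 1-unit increase in X1 on Y at a given value of X2,
    in the model Y = b0 + b1*X1 + b2*X2 + b3*X1*X2."""
    return b1 + b3 * x2


# Hypothetical coefficients; X2 is treated as a 0/1 dummy here.
b1, b3 = 20.0, -8.0
print(interaction_effect(b1, b3, 0))  # 20.0 when X2 = 0
print(interaction_effect(b1, b3, 1))  # 12.0 when X2 = 1
```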

Summary

• Certain model specifications can help reduce Goldilocks problems by imposing a consistent metric to facilitate comparison of βs across independent variables with different levels and ranges, e.g.,
  – a 1-standard-deviation increase, from standardized coefficients;
  – a 1% increase, from log-log coefficients.

• Models involving non-linear functions or interactions complicate the Goldilocks issue because the effect of each variable involves several terms.

Suggested resources

• Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Chapter 10, on the Goldilocks problem, standardized coefficients, and polynomials
  – Chapter 8, on standardized scores and z-scores
  – Chapter 16, on interactions

More suggested resources

• Miller, J. E., and Y. V. Rodgers. 2008. “Economic Importance and Statistical Significance: Guidelines for Communicating Empirical Research.” Feminist Economics 14 (2): 117–49.

• Kachigan, Sam Kash. 1991. Multivariate Statistical Analysis: A Conceptual Introduction. 2nd Edition. New York: Radius Press, on standardized coefficients.

• Gujarati, Damodar N. 2002. Basic Econometrics. 4th ed. New York: McGraw-Hill/Irwin, on logarithmic specifications.

Supplemental online resources

• Podcasts on
  – Defining the Goldilocks problem
  – Resolving the Goldilocks problem
    • Measurement and variables
    • Presenting results
  – Calculating the shape of a polynomial
  – Calculating the shape of an interaction pattern

• Online appendix on interpreting coefficients from logarithmic specifications.

Suggested practice exercises

• Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Suggested course extensions for chapter 10
    • “Applying statistics and writing” question #5
    • “Revising” questions #1, 2, 3, and 9

Contact information

Jane E. Miller

Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html