using quasi-variance to communicate sociological results from statistical models

27
Using Quasi-variance to Communicate Sociological Results from Statistical Models Vernon Gayle & Paul S. Lambert University of Stirling Gayle and Lambert (2007) Sociology, 41(6):1191-1208.

Upload: leilani-briggs

Post on 30-Dec-2015

30 views

Category:

Documents


0 download

DESCRIPTION

Using Quasi-variance to Communicate Sociological Results from Statistical Models. Vernon Gayle & Paul S. Lambert University of Stirling. Gayle and Lambert (2007) Sociology, 41(6):1191-1208. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Using Quasi-variance to Communicate Sociological Results

from Statistical Models

Vernon Gayle & Paul S. LambertUniversity of Stirling

Gayle and Lambert (2007) Sociology, 41(6):1191-1208.

Page 2: Using Quasi-variance to Communicate Sociological Results from Statistical Models

“One of the useful things about mathematical and statistical models [of educational realities] is that, so long as one states the assumptions clearly and follows the rules correctly, one can obtain conclusions which are, in their own terms, beyond reproach. The awkward thing about these models is the snares they set for the casual user; the person who needs the conclusions, and perhaps also supplies the data, but is untrained in questioning the assumptions….

Page 3: Using Quasi-variance to Communicate Sociological Results from Statistical Models

…What makes things more difficult is that, in trying to communicate with the casual user, the modeller is obliged to speak his or her language – to use familiar terms in an attempt to capture the essence of the model. It is hardly surprising that such an enterprise is fraught with difficulties, even when the attempt is genuinely one of honest communication rather than compliance with custom or even subtle indoctrination” (Goldstein 1993, p. 141).

Page 4: Using Quasi-variance to Communicate Sociological Results from Statistical Models

A little biography (or narrative)…

• Since being at Centre for Applied Stats in 1998/9 I has been thinking about the issue of model presentation

• Done some work on Sample Enumeration Methods with Richard Davies

• Summer 2004 (with David Steele’s help) began to think about “quasi-variance”

• Summer 2006 began writing a paper with Paul Lambert

Page 5: Using Quasi-variance to Communicate Sociological Results from Statistical Models

The Reference Category Problem

• In standard statistical models the effects of a categorical explanatory variable are assessed by comparison to one category (or level) that is set as a benchmark against which all other categories are compared

• The benchmark category is usually referred to as the ‘reference’ or ‘base’ category

Page 6: Using Quasi-variance to Communicate Sociological Results from Statistical Models

The Reference Category Problem

An example of Some English Government Office Regions

0 = North East of England

----------------------------------------------------------------

1 = North West England

2 = Yorkshire & Humberside

3 = East Midlands

4 = West Midlands

5 = East of England

Page 7: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Government Office Region

Page 8: Using Quasi-variance to Communicate Sociological Results from Statistical Models

1 2 3 4

Beta StandardError

Prob. 95% Confidence Intervals

No Higher qualifications - - - - -

Higher Qualifications 0.65 0.0056 <.001 0.64 0.66

Males - - - - -

Females -0.20 0.0041 <.001 -0.21 -0.20

North East - - - - -

North West 0.09 0.0102 <.001 0.07 0.11

Yorkshire & Humberside 0.12 0.0107 <.001 0.10 0.14

East Midlands 0.15 0.0111 <.001 0.13 0.17

West Midlands 0.13 0.0106 <.001 0.11 0.15

East of England 0.32 0.0107 <.001 0.29 0.34

South East 0.36 0.0101 <.001 0.34 0.38

South West 0.26 0.0109 <.001 0.24 0.28

Inner London 0.17 0.0122 <.001 0.15 0.20

Outer London 0.27 0.0111 <.001 0.25 0.29

Constant 0.48 0.0090 <.001 0.46 0.50

Table 1: Logistic regression prediction that self-rated health is ‘good’ (Parameter estimates for model 1 )

Page 9: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Beta StandardError

Prob. 95% Confidence Intervals

North East - - - - -

North West 0.09 0.07 0.11

Yorkshire & Humberside 0.12 0.10 0.14

Page 10: Using Quasi-variance to Communicate Sociological Results from Statistical Models
Page 11: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Conventional Confidence Intervals

• Since these confidence intervals overlap we might be beguiled into concluding that the two regions are not significantly different to each other

• However, this conclusion represents a common misinterpretation of regression estimates for categorical explanatory variables

• These confidence intervals are not estimates of the difference between the North West and Yorkshire and Humberside, but instead they indicate the difference between each category and the reference category (i.e. the North East)

• Critically, there is no confidence interval for the reference category because it is forced to equal zero

Page 12: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Formally Testing the Difference Between Parameters -

)ˆˆ( s.e.

ˆˆ

3-2

3-2

t

The banana skin is here!

Page 13: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Standard Error of the Difference

))ˆˆ( (cov 2 - )ˆvar( )ˆvar( 3-232

Variance North West (s.e.2 )

Variance Yorkshire & Humberside (s.e.2 )

Only Available in the variance covariance matrix

Page 14: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Table 2: Variance Covariance Matrix of Parameter Estimates for the Govt Office Region variable in Model 1

Column 1 2 3 4 5 6 7 8 9

Row North West

Yorkshire &Humberside

East Midlands

West Midlands

East England

South East South West Inner London

Outer London

1 North West .00010483

2 Yorkshire &Humberside

.00007543 .00011543

3 East Midlands

.00007543 .00007543 .00012312

4 West Midlands

.00007543 .00007543 .00007543 .00011337

5 East England

.00007544 .00007543 .00007543 .00007543 .0001148

6 South East .00007545 .00007544 .00007544 .00007544 .00007545 .00010268

7 South West .00007544 .00007543 .00007544 .00007543 .00007544 .00007546 .00011802

8 Inner London

.00007552 .00007548 .0000755 .00007547 .00007554 .00007572 .00007558 .00015002

9 Outer London

.00007547 .00007545 .00007546 .00007545 .00007548 .00007555 .00007549 .00007598 .00012356

Covariance

Page 15: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Standard Error of the Difference

Variance North West (s.e.2 )

Variance Yorkshire & Humberside (s.e.2 )

Only Available in the variance covariance matrix

)0.00007543 ( 2- 0.00011543 0.000104830.0083 =

Page 16: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Formal Tests

t = -0.03 / 0.0083 = -3.6

Wald 2 = (-0.03 /0.0083)2 = 12.97; p =0.0003

Remember – earlier because the two sets of confidence intervals overlapped we could wrongly conclude that the two regions were not significantly different to each other

Page 17: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Comment

• Only the primary analyst who has the opportunity to make formal comparisons

• Reporting the matrix is seldom, if ever, feasible in paper-based publications

• In a model with q parameters there would, in general, be ½q (q-1) covariances to report

Page 18: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Firth’s Method (made simple)

)ˆvar( )ˆvar( 32 quasiquasi s.e. difference ≈

Page 19: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Table 1: Logistic regression prediction that self-rated health is ‘good’ (Parameter estimates for model 1, featuring conventional regression results, and quasi-variance statistics )

1 2 3 4 5

Beta StandardError

Prob. 95% Confidence Intervals

Quasi-Variance

No Higher qualifications - - - - - -

Higher Qualifications 0.65 0.0056 <.001 0.64 0.66 -

Males - - - - - -

Females -0.20 0.0041 <.001 -0.21 -0.20 -

North East - - - - - 0.0000755

North West 0.09 0.0102 <.001 0.07 0.11 0.0000294

Yorkshire & Humberside 0.12 0.0107 <.001 0.10 0.14 0.0000400

Page 20: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Firth’s Method (made simple)

)ˆvar( )ˆvar( 32 quasiquasi s.e. difference ≈

0.0000400 0.0000294 0.0083 =

t = (0.09-0.12) / 0.0083 = -3.6

Wald 2 = (-.03 / 0.0083)2 = 12.97; p =0.0003

These results are identical to the results calculated by the conventional method

Page 21: Using Quasi-variance to Communicate Sociological Results from Statistical Models

The QV based ‘comparison intervals’ no longer overlap

Page 22: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Firth QV Calculator (on-line)

Page 23: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Table 2: Variance Covariance Matrix of Parameter Estimates for the Govt Office Region variable in Model 1

Column 1 2 3 4 5 6 7 8 9

Row North West Yorkshire &Humberside

East Midlands

West Midlands

East England

South East South West Inner London

Outer London

1 North West .00010483

2 Yorkshire &Humberside

.00007543 .00011543

3 East Midlands

.00007543 .00007543 .00012312

4 West Midlands

.00007543 .00007543 .00007543 .00011337

5 East England .00007544 .00007543 .00007543 .00007543 .0001148

6 South East .00007545 .00007544 .00007544 .00007544 .00007545 .00010268

7 South West .00007544 .00007543 .00007544 .00007543 .00007544 .00007546 .00011802

8 Inner London

.00007552 .00007548 .0000755 .00007547 .00007554 .00007572 .00007558 .00015002

9 Outer London

.00007547 .00007545 .00007546 .00007545 .00007548 .00007555 .00007549 .00007598 .00012356

Page 24: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Information from the Variance-Covariance Matrix Entered into the Data Window (Model 1)

0

0 0.00010483

0 0.00007543 0.00011543

0 0.00007543 0.00007543 0.00012312

0 0.00007543 0.00007543 0.00007543 0.00011337

0 0.00007544 0.00007543 0.00007543 0.00007543 0.00011480

0 0.00007545 0.00007544 0.00007544 0.00007544 0.00007545 0.00010268

0 0.00007544 0.00007543 0.00007544 0.00007543 0.00007544 0.00007546 0.00011802

0 0.00007552 0.00007548 0.00007550 0.00007547 0.00007554 0.00007572 0.00007558 0.00015002

0 0.00007547 0.00007545 0.00007546 0.00007545 0.00007548 0.00007555 0.00007549 0.00007598 0.00012356

Page 25: Using Quasi-variance to Communicate Sociological Results from Statistical Models
Page 26: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Conclusion – We should start using method

Benefits

• Overcomes the reference category problem when presenting models

• Provides reliable results (even though based on an approximation)

• Easy(ish) to calculate

• Has extensions to other models

Costs

• Extra column in results

• Time convincing colleagues that this is a good thing

Page 27: Using Quasi-variance to Communicate Sociological Results from Statistical Models

Conclusion –Why have we told you this…

• Categorical X vars are ubiquitous

• Interpretation of coefficients is critical to sociological analyses– Subtleties / slipperiness– (cf. in Economics where emphasis is often on

precision rather than communication)