
Statistics and Computing 5 (1995) 97-101

Comments on J. A. Nelder 'The statistics of linear models: back to basics'

ROBERT RODRIGUEZ, RANDALL TOBIAS and RUSSELL WOLFINGER

SAS Institute Inc., SAS Campus Drive, Cary, NC 27513 USA

Submitted December 1994; accepted January 1995

1. What does SAS really compute?

We feel obliged to address Nelder's criticisms of Type III sums of squares, since these are an important feature of SAS linear modeling tools. Nelder's criticisms are levelled not so much at how they work, but at whether the data analytic questions which they answer are valid to ask in the first place. By elucidating the hypotheses which they test, we hope to clarify the issues.

Nelder indicts the Type III sum of squares for a main effect in the presence of a potential interaction on two counts: that it 'loses power in any test', and that it is 'obtained by constraining the [interaction] margins to be zero [and thus] corresponds to an uninteresting hypothesis'. It is difficult to see the sense in which the first statement is true. A Type III sum of squares provides an optimal test for the hypothesis that L'β = 0 for some matrix L of coefficients. It is only less powerful with respect to a different hypothesis or model than the one for which it was constructed, namely, the model without interaction. We can recover this power only at the cost of assuming the interaction to be null or of increasing the Type I error probability by removing it from the model on the basis of its significance.

With reference to Nelder's second criticism, the Type III sums of squares computed by the SAS GLM procedure are not, in fact, obtained by imposing constraints on the parameters. Instead, they are derived from the estimable functions, which Nelder correctly identifies as the only appropriate bases for inference. Before discussing this further, however, it is helpful to review the four types of sums of squares computed by the GLM procedure.

• Type I sums of squares are sequential and model-order dependent, and they are the only one of the four types that represents a partition of the total sum of squares of the data. They correspond precisely to the sequences of 'ignoring' and 'eliminating' effects described in Nelder's Marginality section. Type I sums of squares are computed in the GLM procedure by using a generalized Cholesky root of X'X (which may be rank deficient).

• Type II sums of squares are partial and model-order independent. The Type II sum of squares for an effect is adjusted for all other effects in the model except those which are interactions with or are nested within the given effect. One property of the Type II L matrix is that its coefficients are functions of the cell frequencies. Type II sums of squares are computed via a reversible g2-sweep operator as described by Goodnight (1979).

• Like Type II sums of squares, Type III and Type IV sums of squares are partial and model-order independent. The rows of their L matrices do not depend upon cell frequencies, but rather exhibit orthogonality and balance. They are computed in the GLM procedure by eliminating, reorganizing, and orthogonalizing columns of the Hermite matrix (X'X)⁻X'X, and they differ only when there are missing cells in the data.
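The sequential idea behind Type I sums of squares can be sketched in a few lines. The following is a minimal illustration with hypothetical data and plain least-squares projections; it is not the GLM procedure's actual Cholesky-based algorithm, only a demonstration that sequential differences in residual sums of squares partition the model sum of squares.

```python
import numpy as np

# Hypothetical unbalanced two-way layout: factor A (2 levels), B (3 levels).
A = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1])
B = np.array([0, 0, 1, 1, 2, 0, 1, 1, 2])
y = np.array([3.1, 2.9, 4.0, 4.2, 5.1, 6.0, 7.2, 7.0, 8.3])

def dummies(f):
    """Indicator (dummy) columns, one per factor level."""
    levels = np.unique(f)
    return (f[:, None] == levels[None, :]).astype(float)

def rss(X, y):
    """Residual sum of squares of a least-squares fit (rank-deficient X is fine)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

one = np.ones((len(y), 1))
XA, XB = dummies(A), dummies(B)
XAB = np.hstack([XA[:, [i]] * XB for i in range(XA.shape[1])])  # A.B cells

# Fit the terms in model order: 1, then +A, then +B, then +A.B
fits = [one,
        np.hstack([one, XA]),
        np.hstack([one, XA, XB]),
        np.hstack([one, XA, XB, XAB])]
r = [rss(X, y) for X in fits]
ssA, ssB, ssAB = r[0] - r[1], r[1] - r[2], r[2] - r[3]
# The Type I SS partition the model sum of squares: ssA + ssB + ssAB = r[0] - r[3]
```

Because each sum of squares is the drop in residual SS between two nested fits, the sequence depends on the order in which terms enter the model, which is exactly the model-order dependence noted above.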

For the example of a two-way model with interaction and no missing cells, the hypotheses associated with each of these sums of squares are displayed at the end of Nelder's Section 5. In particular, H~ is a Type I hypothesis, Hi* is a Type II hypothesis, and HA is a Type III/IV hypothesis.

To spell out what the Type III sums of squares are testing, note first that the marginality relations require that a main effect sum of squares be corrected for the interaction in some sense. For example, the Type II sums of squares for a main effect in a two-way table measure the extra variation due to A after fitting B and the part of the interaction which Nelder calls 'A.B eliminating (A and B)'. However, this tests a hypothesis which depends on the design. Type III sums of squares were specifically devised to have the same form regardless of the cell counts in the data; they correct a main effect sum of squares for the estimable functions of the interaction parameters, which do not change (assuming that all cell counts are non-zero). Thus, Type III tests address research questions associated with a balanced population of responses, even when the observed data do not exhibit such balance. The advantages of such tests are that they are invariant to ancillary missing-data mechanisms and they are comparable across similar experiments.

The key relationships between the four types are summarized as follows in Chapter 12.2 of Searle (1987):

I = II = III = IV    with balanced data
II = III = IV        with no-interaction models
III = IV             with all-cells-filled data

Because of these relationships, Nelder's criticisms of Type III hypotheses must be seen to apply also to Types I, II, and IV when his marginality constraints are violated. This occurs in the two-way model when testing main effects in the presence of an interaction for unbalanced data situations.
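The first of these identities is easy to verify numerically. The following sketch uses hypothetical balanced data and checks one consequence of it: the sequential (Type I) sum of squares for A equals the B-adjusted (Type II) sum of squares for A.

```python
import numpy as np

# Hypothetical balanced 2 x 3 layout, 2 observations per cell.
A = np.repeat([0, 1], 6)
B = np.tile(np.repeat([0, 1, 2], 2), 2)
y = np.array([5., 4., 7., 6., 9., 8., 6., 7., 9., 8., 11., 12.])

def dummies(f):
    levels = np.unique(f)
    return (f[:, None] == levels[None, :]).astype(float)

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

one = np.ones((len(y), 1))
XA, XB = dummies(A), dummies(B)

# Type I: A ignoring B.  Type II: A eliminating B.
ssA_ignoring_B = rss(one, y) - rss(np.hstack([one, XA]), y)
ssA_eliminating_B = rss(np.hstack([one, XB]), y) - rss(np.hstack([one, XA, XB]), y)
# With balanced cells the two agree; with unbalanced cells they generally differ.
```

With balance, the centered A and B indicator spaces are orthogonal, so adjusting for B does not change the A sum of squares; this is the mechanism behind Searle's first identity.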

Incidentally, our point that the GLM procedure does not use a constrained parametrization is evidenced by the fact that the expected mean squares it produces for mixed models are the same as those in the lower half of Nelder's Table 2. This has been the subject of some recent controversy; see McLean et al. (1991) and Samuels et al. (1991).

2. 'Uninteresting' hypotheses

Nelder's second indictment of Type III sums of squares is that they correspond to 'uninteresting' hypotheses. While we find his primary definition of 'uninteresting' somewhat vague, evidently the only hypotheses which Nelder deems interesting are those whose acceptance will lead to a simplification of the model, mainly by dropping terms. While we agree that Type III tests may not, in general, be useful for inferring such model simplifications, we do not concede that this makes them uninteresting. On the contrary, while Type III tests may be less informative about the relationship between the response and the explanatory variables, they are preferable for testing population relationships among the parameters of the model equation themselves (Macnaughton, 1992). The issue is the role of inference in the prediction phase of analysis, which was also raised in the discussion of Nelder's 1977 paper (which covers many of the ideas presented here).

Nelder is not addressing the question, 'Are the Type III sums of squares good at testing for a main effect in the presence of an interaction?' but rather whether it is 'interesting' to ask such a question at all. His answer is an emphatic 'No', even more so than in his 1977 paper. However, some of the discussants of that paper argued that the question is indeed valid; see, in particular, the contributions of Tukey, Cox, John, and Franc and Jennrich. Also see Speed et al. (1978), who present Type III hypotheses as common ones, and Searle (1987, Section 4.6), who describes them as 'interesting, reasonable, useful'.

In one sense, the controversy is about whether an effect should be thought of as describing a feature of the data or of the estimated model. Type III sums of squares are connected with the latter, whereas in Nelder's view only the former is valid. This is a philosophical question, and proponents of both sides can produce data and analyses which give evidence for their position. Our view is that either type of question may be appropriate in certain situations, and this is why SAS tools for linear modelling typically offer several different types of sums of squares. The GLM procedure prints Type I and Type III sums of squares by default, the implication being that if they are very different, the user should investigate further.

3. Comparisons of statistical software

A good feature of Nelder's paper is that he considers the practical implications of his ideas for statistical software. While we agree with his general conclusions, we would like to clarify what SAS software offers and argue for broader criteria by which to compare statistical software.

3.1. Criteria

Nelder claims that since the examples in Section 8.2 are taken from the documentation for competitors to GENSTAT, they should be biased in favour of said competitors. However, for the SAS example at least, the criteria that he uses for comparison have little to do with the purpose of the example. While Nelder complains that the PROC ANOVA code in Table 5 requires the user to know the appropriate error term for each sum of squares, the purpose of the example from which the code was extracted was precisely to show how a knowledgeable user can apply the TEST statement to construct particular tests. If the right error terms are unknown, the RANDOM statement for the GLM procedure can be used to find them:

proc glm;
   class rep soil ca fert;
   model y = ca|fert|soil rep rep*fert rep*soil rep*soil*fert rep*ca*fert;
   random rep rep*soil rep*fert rep*soil*fert rep*ca*fert / test;
run;

Since this is a mixed model, another alternative is to use the MIXED procedure (see also Section 3.3 below):

proc mixed;
   class rep soil ca fert;
   model y = ca|fert|soil;
   random int fert|soil ca*fert / subject=rep;
run;


These two approaches are much closer to the GENSTAT code in Table 5b in terms of Nelder's 'number of tokens for the parser' metric. The primary differences are that

• GLM and MIXED require a CLASS statement listing the qualitative factors, since they can also handle ANOVA with quantitative covariates

• effects modelling in SAS software does not have an equivalent to the GENSTAT nested model operator /, which provides

A/B/C = A + A.B + A.B.C

On the other hand, we must register scepticism about the token-count metric. Brevity is not the only desirable feature of code; versatility, depth, and readability are also important. Moreover, one is always able to write better code with a familiar language. SAS experts would be able to reproduce Nelder's analyses with about the same amount of effort as it took him with GENSTAT. We have shown this above for the ANOVA example; for the analysis of deviance example, an analyst who is familiar with SAS programming would know to use the following code:

proc plan ordered;
   factors chd=2 serum=4 press=4 / noprint;
   output out=anodev chd cvals=('Y' 'N');
run;

data anodev;
   set anodev;
   input num @@;
   datalines;
2 3 3 4 3 2 1 3 8 11 6 6 7 12 11 11
117 121 47 22 85 98 43 20 119 209 68 43 67 99 46 33
;
run;

proc genmod data=anodev;
   class chd serum press;
   model num = serum|press|chd / dist=p type1;
run;

3.2. Interactive versus 'batch'

Nelder's approach to linear modelling is interactive and iterative, and it measures effects by the amount of variation they account for over and above marginal effects. We heartily agree that looking at the data under several models will very likely prove more informative than analysis under a single model. Savvy data analysts will use their statistical software of choice in precisely this way: as a statistical computing environment and programming language. For example, the following code uses a SAS macro to compute an incremental linear analysis of the data from Andrews and Herzberg (1985), as in the first part of the GENSTAT code shown in Nelder's Table 4.

%let data = children;
%let depvar = y;
%let weight = n;
%addterms(a b c d e f);        /* Fit main effects */
%addterms(# a|b|c|d|e|f@2);    /* ... then 2-factor interactions */
%addterms(# a|b|c|d|e|f@3);    /* ... then 3-factor interactions */
proc print data=accum;         /* Display accumulated analysis */
run;

TYPE               DF        SS        MS        VR
a b c d e f         6   1866.35   311.058   14.6334
# a|b|c|d|e|f@2    15    413.61    27.574    1.2972
# a|b|c|d|e|f@3    20    368.96    18.448    0.8679
Residual           22    467.65    21.257    1.0000
Total              63   3116.57    49.469    2.3272

Admittedly, the %addterms macro is not a 'primitive operation' in SAS, but once it is created it can be treated as one.

However, Nelder's opinion that 'batch processing statistical computing' merely reflects outmoded technological limitations is simplistic. For example, by using a batch driver in conjunction with a cut-and-paste facility, code from a previous analysis can be quickly recalled and modified by adding or deleting effects, specifying interesting contrasts, etc., all of which is prompted by the intelligent interpretation of previous output. This underscores the importance of a clear, flexible syntax, and GENSTAT, S, and SAS all do fairly well in this regard. Of course, the trend is toward graphical point-and-click interfaces, two examples of which are SAS/INSIGHT and JMP software; see SAS Institute Inc. (1993, 1994). These interfaces are especially suitable for researchers with limited statistics backgrounds and little or no programming skills.

The fact is that much of the way statistics is taught and practised is of the one-analysis-per-data-set variety, and it is unlikely that this is just because common statistical software offers extensive printed output, as Nelder suggests in Section 8. (The truism that 'You can't make all of the people happy all of the time' is never more relevant than in deciding what to print by default with a statistical procedure.) Alternatives to this approach are still underdeveloped, though the knowledge-based front end to GLIM which Nelder describes is definitely an important advance. Our hope is that competition among software, combined with advances in interface technology, will stimulate joint improvement and subsequent benefit to many kinds of researchers.

3.3. Mixed models

Much of the confusion over mixed models mentioned by Nelder is avoided by the SAS MIXED procedure (see the above example). Restricted/residual maximum likelihood (REML) assuming normality is the default estimation method (elegantly handling unbalanced data), and F-tests are formed using only the fixed effects design matrix. The general F-statistic has the following form:

F = (Lβ̂)' [L'(X'V̂⁻¹X)⁻L]⁻¹ (Lβ̂) / rank(L)

where L is a suitably chosen contrast matrix, V̂ is the estimated variance-covariance matrix of the data, and

β̂ = (X'V̂⁻¹X)⁻X'V̂⁻¹y

A drawback is that for many unbalanced data situations, F is only approximately F-distributed, with numerator degrees of freedom equal to the rank of L, and denominator degrees of freedom selected by considering rank relationships between the fixed and random effects or by Satterthwaite's method. However, F has the favourable property of automatically incorporating the appropriate error terms without directly computing expected mean squares.
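The computation of this F statistic can be sketched directly from the formula. The data, design, and covariance matrix below are hypothetical, and V is treated as known for simplicity, whereas the MIXED procedure estimates it by REML.

```python
import numpy as np

# Hypothetical design: intercept + a two-level treatment, 8 observations.
n = 8
X = np.column_stack([np.ones(n), np.tile([0.0, 1.0], 4)])
V = np.diag(np.r_[np.full(4, 1.0), np.full(4, 2.0)])  # assumed (known) covariance
rng = np.random.default_rng(1)
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(n), V)

Vinv = np.linalg.inv(V)
XtVX = X.T @ Vinv @ X
beta_hat = np.linalg.solve(XtVX, X.T @ Vinv @ y)   # generalized least squares
L = np.array([[0.0, 1.0]])                         # contrast: the treatment effect
Lb = L @ beta_hat
mid = L @ np.linalg.inv(XtVX) @ L.T
F = float(Lb @ np.linalg.solve(mid, Lb)) / np.linalg.matrix_rank(L)
```

For a rank-one L this reduces to the square of the usual GLS t-statistic, which makes the role of the 'middle' matrix L'(X'V⁻¹X)⁻L as an estimated covariance of Lβ̂ easy to see.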

4. Points for clarification

Finally, we would like to mention a number of aspects that merit clarification by Professor Nelder.

4.1. Expected mean squares

In Table 2, Nelder uses a very concise notation in order to show the relationship between different models. Unfortunately, this notation glosses over important distinctions. For example, the Σ_A terms in the table define different functions of different parameters in different cells of the table. For the constrained parameters,

Σ_A = Σ_i α_i²    (1)

while for the unconstrained parameters

Σ_A = Σ_i (α_i − ᾱ.)²    (2)

except for the case where both A and B are fixed, where

Σ_A = Σ_i (α_i − ᾱ. + (1/b) Σ_j γ_ij − γ̄..)²    (3)

(see Nelder, 1977). In particular, (1) and (2) always define positive quantities which manifestly measure an A main effect, but this is not the case for (3).

4.2. Marginality

Nelder's definition of marginality is very clear, although it might be more elegant to base the definition on linear spaces, as in McCullagh and Nelder (1989): E1 is marginal to E2 if the linear space defined by E1 is a subspace of the space defined by E2. This is in line with the definition of marginality for classification effects, but for polynomials, it implies that it is more natural to think of the entire first-order model β₀ + β₁x as being marginal to the quadratic model β₀ + β₁x + β₂x².

However, it is not so clear what Nelder means by the 'marginality constraints' and what constitutes their 'violation'. He gives no firm definition, but there are several examples of what he doesn't like.

1. In the discussion of the fractional factorial design from Pignatiello and Ramberg (1985), the neglect of marginality consists of deciding that an interaction is non-null but that its constituent main effects are null. In this case, marginality dictates that

   If you believe there is an interaction then it makes little sense to say that the constituent factors do not affect the response.

2. In other examples, marginality is evidently violated by even trying to test a hypothesis about main effects without first deciding that interactions to which they are marginal are null. This is the sense of Nelder's discussion of the Hocking and Speed (1975) hypotheses in Section 5. Evidently the marginality constraint here is:

   Unless you believe there is no interaction then it makes little sense to consider whether there is a statistically significant difference between factor margins.

We would agree that the first situation here is problematic and should be avoided, although it is not entirely out of the question; even Nelder notes that large interactions with small main effects are not impossible. However, we would stop short of the second, much stronger characterization.

We ask, what does strict adherence to Nelder's marginality constraints tell us to do with the following data?

395 400 894 903 898 906 901 899 902 1593 1586

Here both main effects as well as the interaction are highly significant, but the main effects are much larger than the interaction. Should we simply say that the mean response depends on both factors and leave it at that, ignoring the fact that both factors independently cause an increase in the response, and that this dependence accounts for most of the observed variation? Often, this is how interactions show up in industrial experimentation: as effects which, while significantly larger than background noise, are much less important than general factor effects.
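The linear-space definition of marginality is straightforward to check numerically: E1 is marginal to E2 exactly when appending E1's design columns to E2's does not increase the rank. A minimal sketch for the polynomial case mentioned above (with hypothetical design points):

```python
import numpy as np

def is_subspace(A, B):
    # col(A) is contained in col(B) iff adding A's columns does not raise the rank.
    return np.linalg.matrix_rank(np.hstack([A, B])) == np.linalg.matrix_rank(B)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X_linear = np.column_stack([np.ones_like(x), x])           # b0 + b1*x
X_quad = np.column_stack([np.ones_like(x), x, x ** 2])     # b0 + b1*x + b2*x^2

# The entire first-order model is marginal to the quadratic model, not vice versa.
marginal = is_subspace(X_linear, X_quad)
```

The same rank test applies to classification effects: the design columns of a main effect span a subspace of those of any interaction containing it.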

4.3. Non-centrality parameters

Nelder substantively discusses the non-centrality parameter

λ = (Lβ)' [L'(X'X)⁻L]⁻¹ (Lβ) / σ²

(where L is a suitably chosen contrast matrix) only in the first paragraph of Section 5. The two references in that paragraph, Searle (1987) and SAS Institute Inc. (1985), hardly mention non-centrality parameters, and Nelder only appears to do so as a novel way of expressing certain hypotheses. Most statistical analysts never consider non-centrality parameters outside of power and sample size calculations, let alone taking the third 'false step' of 'confusing non-centrality parameters in the expectation of sums of squares with the corresponding hypothesis that they might be used to test.'
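The non-centrality parameter λ = (Lβ)'[L'(X'X)⁻L]⁻¹(Lβ)/σ² is itself a direct computation once L, β, and σ² are specified. A sketch with a hypothetical one-way design (three groups, four observations each, cell-means parametrization):

```python
import numpy as np

# Cell-means design matrix: 3 groups, 4 observations per group (hypothetical).
X = np.kron(np.eye(3), np.ones((4, 1)))
beta = np.array([10.0, 10.0, 12.0])          # hypothetical group means
sigma2 = 4.0
L = np.array([[1.0, -1.0, 0.0],              # contrasts among the group means
              [1.0, 0.0, -1.0]])

XtX_ginv = np.linalg.pinv(X.T @ X)           # generalized inverse of X'X
Lb = L @ beta
lam = float(Lb @ np.linalg.solve(L @ XtX_ginv @ L.T, Lb)) / sigma2
```

Here λ is zero precisely when Lβ = 0, which is the sense in which the non-centrality parameter encodes the hypothesis; this is exactly its role in power and sample size calculations.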

4.4. Predictive margins

In the final paragraph in Section 7, Nelder states that '... when the weights used in forming the predictive margins are the internal ones (n_ij), the Type III sum of squares coincides with Yates' weighted sum of squares of means (Yates, 1934).' While we agree about the correspondence between this sum of squares and Yates' procedure, it seems that they compare predictive margins constructed using equal weights (least squares means in the GLM procedure).

References

Andrews, D. F. and Herzberg, A. M. (1985) Data. Springer-Verlag, New York.

Goodnight, J. H. (1979) A tutorial on the sweep operator. American Statistician, 33, 149-158.

Hocking, R. R. and Speed, F. M. (1975) A full rank analysis of some linear model problems. Journal of the American Statistical Association, 70, 706-712.

Macnaughton, D. B. (1992) Which sums of squares are best in unbalanced ANOVA? Unpublished manuscript presented at the Joint Statistical Meetings, Boston, 11 August 1992.


McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd edn. Chapman & Hall, London.

McLean, R. A., Sanders, W. L. and Stroup, W. W. (1991) A unified approach to mixed linear models. American Statistician, 45, 54-63.

Nelder, J. A. (1977) A reformulation of linear models (with discussion). Journal of the Royal Statistical Society, Series A, 140, 48-76.

Pignatiello, J. J. and Ramberg, J. S. (1985) Contributions to discussion of off-line quality control, parameter design, and the Taguchi method. Journal of Quality Technology, 17, 198-206.

Samuels, M. L., Casella, G. and McCabe, G. P. (1991) Interpret- ing blocks and random factors. Journal of the American Statistical Association, 86, 798-808.

SAS Institute Inc. (1989) SAS/STAT User's Guide, Version 6, 4th edn., Vol. 1. SAS Institute Inc., Cary, NC.

SAS Institute Inc. (1992) SAS Technical Report P-229. SAS/STAT Software: Changes and Enhancements, Release 6.07. Chapter 16: The MIXED Procedure. SAS Institute Inc., Cary, NC.

SAS Institute Inc. (1993) SAS/INSIGHT User's Guide, Version 6, 2nd edn. SAS Institute Inc., Cary, NC.

SAS Institute Inc. (1994) JMP User's Guide, Version 3. SAS Institute Inc., Cary, NC.

Searle, S. R. (1987) Linear Models for Unbalanced Data. John Wiley & Sons, New York.

Speed, F. M., Hocking, R. R. and Hackney, O. P. (1978) Methods of analysis of linear models with unbalanced data. Journal of the American Statistical Association, 73, 105-112.