

Some Exploratory Methods for Modeling Mobility Tables and Other Cross-Classified Data

Robert M. Hauser

Sociological Methodology, Volume 11 (1980), 413-458.

Stable URL: http://links.jstor.org/sici?sici=0081-1750%281980%2911%3C413%3ASEMFMM%3E2.0.CO%3B2-B

Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

Sociological Methodology is published by American Sociological Association. Please contact the publisher for further permissions regarding the use of this work. Publisher contact information may be obtained at http://www.jstor.org/journals/asa.html.

Sociological Methodology © 1980 American Sociological Association

JSTOR and the JSTOR logo are trademarks of JSTOR, and are Registered in the U.S. Patent and Trademark Office. For more information on JSTOR contact [email protected].

© 2002 JSTOR

http://www.jstor.org/ Mon Feb 4 13:42:28 2002

SOME EXPLORATORY METHODS FOR MODELING MOBILITY

TABLES AND OTHER CROSS-CLASSIFIED DATA

Robert M. Hauser UNIVERSITY OF WISCONSIN-MADISON

This research was supported by grants from the National Science Foundation (GI-31604 and GI-44336), the National Institute of Mental Health (MH-06275), and the Institute for Research on Poverty at the University of Wisconsin-Madison under funds granted by the Economic Opportunity Act of 1964 and administered by the Department of Health, Education, and Welfare. It was written in part while the author was a Fellow at the Center for Advanced Study in the Behavioral Sciences with support from the National Institute of Mental Health, the Spencer Foundation, and the Graduate School of the University of Wisconsin-Madison. Peter J. Dickinson, Randy D. Hodson, and James Baron assisted in computations, using facilities of the Center for Demography and Ecology that are supported by a grant from the National Institute for Child Health and Human Development (HD-5876). I thank Otis Dudley Duncan, Leo A. Goodman, William Mason, Halliman H. Winsborough, Burton Singer, Harry P. Travis, Taissa S. Hauser, Randy D. Hodson, and Yossi Shavit for advice and criticism. Any opinions, findings, conclusions, or recommendations are those of the author and do not necessarily reflect the views of the National Science Foundation or other agencies supporting this work.

Social scientists often analyze square tables of counts, where relationships, persons, or other subjects of interest have been classified twice using the same set of categories. For example, marriages or friendships may be classified by the occupation, education, ethnicity, or religion of each participant. A sample of people may be cross-classified by place of birth and place of residence or by political party preference before and after an electoral campaign. In studies of social mobility, persons are often classified by their own occupation (or education or social class) and by the occupation (or education or social class) of their fathers.

How should one interpret the counts in such a classification, say, one of persons by their own occupation and by occupation of father? A common answer is to regard these counts as the products of prevalence and interaction effects. Because of the prevalence of blue-collar occupations, there are many blue-collar workers with blue-collar fathers; that number may be greater (or less) owing to an interaction between occupation of son and father.

The verbal distinction between prevalence and interaction is easy to maintain, but for many years a sound statistical representation of it eluded the efforts of social scientists. The history of this pursuit and the common faults of proposed solutions have been reviewed by Hauser (1978; see also Featherman and Hauser, 1978, chap. 4). In the next section of this chapter I describe a class of multiplicative (log linear) models whose parameters correspond exactly to the intuitive concepts of prevalence and interaction effects.¹ Empirically, the correspondence between parameters and concepts only becomes useful when a model of the desired form fits the data. One usually assumes the existence of prevalence effects for categories of the classificatory variables, so the empirical problem is to specify the form of the interactions. Sometimes sociological theory will provide sufficient guidance in model specification (Goldthorpe and Payne, 1979; Hope, 1978), but often theory will provide incorrect, incomplete, or contradictory directions. For these reasons I describe methods for assessing goodness of fit and for improving specification through the examination of residuals.

¹ I assume the familiarity of the reader with log linear models for frequency data. Fienberg (1970a, 1977), Goodman (1972a, 1972b), and Davis (1974) give useful introductions, as does the comprehensive treatise by Bishop and others (1975). I rely heavily on methods for the analysis of incomplete tables, which have been developed by Goodman (1963, 1965, 1968, 1969a, 1969b, 1971, 1972c), Bishop and Fienberg (1969), Fienberg (1970b, 1972), and Mantel (1970); again, Bishop and others (1975, esp. pp. 206-211, 225-228, 282-309, 320-324) is valuable. Applications of log linear models to occupational mobility data include several of the papers by Goodman just cited and, also, Hope (1974, 1978), Hauser and others (1975), Pullum (1975), Iutaka and others (1975), Featherman and others (1975), Ramsay (1977), Hauser and Featherman (1977), Baron (1978), Hauser (1978), Goldthorpe and others (1978), Goldthorpe and Payne (1979), Featherman and Hauser (1978), Duncan (1979), and Goodman (1979a, 1979b).

Users of empirically based search strategies run the risk of overfitting the data; that is, one loses parsimony and reliability by seeking to fit every feature of a sample of observations. To minimize such misuse of my empirically guided search methods, I describe methods of aggregating and smoothing data prior to model selection. In the same spirit I sometimes encourage the acceptance of models that would be rejected on narrowly statistical grounds, provided they have other interpretative virtues (Bishop and others, 1975, chap. 9). For illustrative purposes, I use tables of occupational mobility from the United States and Great Britain, but the present methods are applicable to other types of data.²

A MULTIPLICATIVE MODEL OF THE MOBILITY TABLE

My model is a special case of Goodman's (1972c) general multiplicative model for cross-classified data, but I take a slightly different approach from him in developing models of the mobility table. First, I limit my attention to a class of models in which only one interaction parameter applies to each cell in the classification. Second, I do not assume that occupational categories are ordered. Third, I emphasize the use of exploratory methods in model specification. Elsewhere these models and methods have been applied in analyses of the 1949 British mobility table (Hauser, 1978), several American mobility tables (Featherman and Hauser, 1978, chap. 4), Rogoff's (1953) Indianapolis mobility tables (Baron, 1977, 1978), and a 1972 British mobility table (Goldthorpe and Payne, 1979).

Let $x_{ij}$ be the observed frequency in the $ij$th cell of the classification of men by their own occupations ($j = 1, \ldots, J$) and their own occupations or fathers' occupations at an earlier time ($i = 1, \ldots, I$). In the context of mobility analysis the same categories will appear in rows and columns, and the table will be square with $I = J$. For $k = 1, \ldots, K$, let $H_k$ be a mutually exclusive and exhaustive partition of the pairs $(i, j)$ in which

$$E[x_{ij}] = m_{ij} = \alpha \beta_i \gamma_j \delta_{ij} \tag{1}$$

where $\delta_{ij} = \delta_k$ for $(i, j) \in H_k$, subject to the normalization $\prod_i \beta_i = \prod_j \gamma_j = \prod_i \prod_j \delta_{ij} = 1$. The normalization of parameters is a matter of convenience, and we choose the value of $\alpha$ so it will hold. Note that in contrast to the conventional structural model for counted data (Bishop and others, 1975, chap. 2), the interaction effects in Equation (1) are not constrained within rows or columns even though the marginal frequencies are fitted exactly.³

² Shavit (1978) applies the present methods to ethnic intermarriage.

Expected frequencies are the product of an overall effect ($\alpha$), a row effect ($\beta_i$), a column effect ($\gamma_j$), and an interaction effect ($\delta_{ij}$). The row and column parameters correspond to the concept of prevalence. They reflect occupational supply and demand, demographic replacement processes, and past and present technologies and economic conditions. The cells $(i, j)$ are assigned to $K$ mutually exclusive and exhaustive subsets, and each of those sets shares a common interaction parameter ($\delta_k$). Thus, aside from total, row, and column effects, each expected frequency is determined by only one interaction parameter, which reflects the density of mobility or immobility in that cell relative to that in other cells in the table. The interaction parameters of the model correspond directly to the concept of the joint density of observations (White, 1963, p. 26), and they may be interpreted as indexes of the social distance between categories of the row and column classifications (compare Rogoff, 1953, pp. 31-32).
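As a minimal sketch of Equation (1), the following Python fragment builds expected frequencies from an overall effect, row and column effects, and one interaction parameter per level. The function name, the 3 x 3 level map, and all parameter values are hypothetical, chosen only for illustration; they are not estimates from this chapter.

```python
import numpy as np

def expected_frequencies(alpha, beta, gamma, delta, levels):
    """Return m[i, j] = alpha * beta[i] * gamma[j] * delta[levels[i, j]].

    `levels` holds one 0-based level index per cell (the partition H_k).
    """
    beta = np.asarray(beta, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    delta = np.asarray(delta, dtype=float)
    levels = np.asarray(levels)
    return alpha * np.outer(beta, gamma) * delta[levels]

# Hypothetical design: diagonal cells share level 0, all other cells level 1.
levels = np.array([[0, 1, 1],
                   [1, 0, 1],
                   [1, 1, 0]])
m = expected_frequencies(alpha=100.0,
                         beta=[0.5, 1.0, 2.0],    # row (prevalence) effects
                         gamma=[2.0, 1.0, 0.5],   # column (prevalence) effects
                         delta=[4.0, 0.5],        # dense diagonal, sparse elsewhere
                         levels=levels)
print(m)
```

Because every cell carries exactly one interaction parameter, any odds ratio in the fitted table depends only on the levels of the four cells involved; the row and column effects cancel.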

For the model to be informative, the allocation of interaction levels across cells of the table must form a meaningful pattern and one in which the interaction parameters are identified (Mason and others, 1973; Haberman, 1974, p. 217). Furthermore, the number of interaction parameters ($K$) should be substantially less than the number of cells in the table. I have found it difficult to interpret models where the number of interaction parameters is large relative to the number of categories in the occupational classification.

³ In the $I \times J$ cross-classification there are $(I - 1)(J - 1)$ degrees of freedom for two-way interaction. The conventional structural model yields two-way interaction effects for each of $I \times J$ counts by constraining the product of interaction effects within each row and within each column of the table; these constraints identify $(I - 1)(J - 1)$ independent interaction effects. Instead, the model of Equation (1) identifies the two-way interaction effects by constraining some of them to be equal across cells of the classification.

It may be helpful to present the model of Equation (1) in more than one form. There is a pronounced rightward skew in multiplicative effects because decreases are bounded by 0 and 1, while increases are unbounded. For this reason it is useful to take logs of frequencies and parameters and write the model in additive form; then incremental and decremental effects may each range from zero to infinity in absolute value. Let $u = \log \alpha$, $u_{1(i)} = \log \beta_i$, $u_{2(j)} = \log \gamma_j$, $u_{12(ij)} = \log \delta_{ij}$, and $u_{3(k)} = \log \delta_k$. The model is

$$\log m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(ij)} \tag{2}$$

where $u_{12(ij)} = u_{3(k)}$ for $(i, j) \in H_k$, and $H_k$ is defined as before. Here the normalization of parameters is $\sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_i \sum_j u_{12(ij)} = 0$.
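The additive form follows from the multiplicative form by taking logarithms term by term. The worked step below only restates the definitions above; the numerical illustration in the last line (a halving and a doubling) is a hypothetical example, not an estimate from the data.

```latex
\begin{aligned}
\log m_{ij} &= \log\alpha + \log\beta_i + \log\gamma_j + \log\delta_{ij} \\
            &= u + u_{1(i)} + u_{2(j)} + u_{12(ij)}, \\
\prod_i \beta_i = 1 \;&\Longleftrightarrow\; \sum_i u_{1(i)} = 0, \\
\log 0.5 = -0.693, &\qquad \log 2.0 = +0.693 .
\end{aligned}
```

The last line shows why the log metric is convenient: a multiplicative decrease of one half and a multiplicative increase of two are equally far from zero once logs are taken.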

A slight variation of Equation (1), which I present in multiplicative form, is more suggestive of the way in which I have manipulated counts for purposes of estimation and testing. With $H_k$ defined as before, let

$$E[x_{ij}] = m_{ijk} = \alpha \beta_i \gamma_j \delta_k \quad \text{for } (i, j) \in H_k \tag{3}$$

and

$$m_{ijk} = 0 \quad \text{otherwise} \tag{4}$$

subject to the normalization $\prod_i \beta_i = \prod_j \gamma_j = \prod_k \delta_k^{n_k} = 1$, where $n_k$ is the number of cells assigned to the $k$th level. This version of the model suggests a three-dimensional representation of the original two-dimensional table in which $IJ(K - 1)$ of the interior cells contain structural zeros, and the original $IJ$ frequencies are fitted by row ($\beta_i$), column ($\gamma_j$), and level ($\delta_k$) parameters, as under a model of quasi independence (Goodman, 1972c, p. 689; Bishop and others, 1975, pp. 225-226).
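The three-dimensional representation can be sketched as follows. This is a hypothetical illustration with made-up counts and a made-up two-level design; the helper name `to_incomplete_array` is an assumption, not part of any program cited in the text.

```python
import numpy as np

def to_incomplete_array(table, levels, n_levels):
    """Spread an I x J table over K layers; cell (i, j) lands in layer levels[i, j].

    Returns the I x J x K array and a boolean mask marking structural zeros.
    """
    table = np.asarray(table, dtype=float)
    levels = np.asarray(levels)
    n_rows, n_cols = table.shape
    array3 = np.zeros((n_rows, n_cols, n_levels))
    rows, cols = np.indices(table.shape)
    array3[rows, cols, levels] = table
    structural_zero = np.ones(array3.shape, dtype=bool)
    structural_zero[rows, cols, levels] = False
    return array3, structural_zero

# Hypothetical 3 x 3 counts and a two-level design (0-based level labels).
counts = np.array([[30, 10, 5],
                   [12, 25, 9],
                   [4, 11, 40]])
levels = np.array([[0, 1, 1],
                   [1, 0, 1],
                   [1, 1, 0]])
array3, structural_zero = to_incomplete_array(counts, levels, n_levels=2)
I, J, K = array3.shape
print(structural_zero.sum(), I * J * (K - 1))  # both count the structural zeros
```

Summing the array over its third dimension returns the original two-way table, which is why fitting row, column, and level margins of the incomplete array is equivalent to fitting the model of Equation (3).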

To estimate and test models of the present form I have represented cross-classifications as incomplete multiway arrays. I have used Fay and Goodman's (1973) computer program, ECTA, to estimate frequencies by iterative rescaling and to run tests of goodness of fit (and other hypotheses). Under the assumption that the data were obtained by independent Poisson or simple multinomial sampling, the program computes maximum-likelihood estimates of the counts (Goodman, 1972c, pp. 663-667; Bishop and others, 1975, pp. 206-208). The likelihood ratio test statistic ($G^2$) computed by the program is asymptotically distributed as $\chi^2$ with degrees of freedom equal to $IJ$, the number of cells in the array that are not structural zeros, less the number of independent parameters that have been estimated. In many applications this will be $IJ - 1 - (I - 1) - (J - 1) - (K - 1) = (I - 1)(J - 1) - (K - 1)$, but it may be greater, depending on the arrangement of levels within the original two-way array (Bishop and others, 1975, chap. 5, esp. p. 227; Béland and Fortier, 1978). Great care should be used in computing degrees of freedom when the design specifies separable subtables (Bishop and others, 1975, chap. 5).

ECTA does not estimate parameters for models of incomplete tables. I have estimated the (additive) parameters by regressing logs of estimated frequencies on dummy-variable representations of the rows, columns, and levels of the design. That is, I create a dummy variable for each row (but one), for each column (but one), and for each level (but one); then I regress logs of estimated frequencies on these three sets of variables. By construction this regression completely accounts for the estimated frequencies (Payne, 1977). I use an auxiliary program to renormalize the parameter estimates as deviations from the grand mean and to compute and display residuals. Using other packaged programs for the analysis of categorical data, one can estimate the models and obtain parameter estimates and measures of fit in a single pass by the methods of weighted least squares or maximum likelihood (Evers and Namboodiri, 1978; Goldthorpe and Payne, 1979). Moreover, a simple computational method (for example, median fitting) may suffice for diagnostic purposes (Tukey, 1977, chap. 11).
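The dummy-variable regression and renormalization might look like the sketch below. It is not the auxiliary program mentioned in the text: the function name is hypothetical, ordinary least squares via NumPy stands in for whatever regression routine was used, and the input log frequencies are rebuilt from the additive estimates reported later in Table 3 (with the level assignments described in the text) rather than taken from ECTA output.

```python
import numpy as np

# Level assignments as described in the text (rows = father, columns = son), made 0-based.
LEVELS = np.array([[2, 4, 5, 5, 5],
                   [3, 4, 5, 5, 5],
                   [5, 5, 5, 5, 5],
                   [5, 5, 5, 4, 4],
                   [5, 5, 5, 4, 1]]) - 1

def dummy_regression(log_m, levels):
    """Regress log fitted counts on row, column, and level dummies (first category omitted)."""
    n_rows, n_cols = log_m.shape
    n_levels = levels.max() + 1
    i, j = np.indices(log_m.shape)
    i, j, k = i.ravel(), j.ravel(), levels.ravel()
    X = np.column_stack(
        [np.ones(i.size)]
        + [(i == r).astype(float) for r in range(1, n_rows)]
        + [(j == c).astype(float) for c in range(1, n_cols)]
        + [(k == l).astype(float) for l in range(1, n_levels)]
    )
    coef, *_ = np.linalg.lstsq(X, log_m.ravel(), rcond=None)
    row = np.concatenate([[0.0], coef[1:n_rows]])
    col = np.concatenate([[0.0], coef[n_rows:n_rows + n_cols - 1]])
    lev = np.concatenate([[0.0], coef[n_rows + n_cols - 1:]])
    # Renormalize: row and column effects sum to zero; level effects sum to zero over cells.
    row -= row.mean()
    col -= col.mean()
    lev -= lev[levels].mean()
    grand = (log_m - row[:, None] - col[None, :] - lev[levels]).mean()
    return grand, row, col, lev

# Illustration: rebuild log fitted counts from reported estimates, then recover the effects.
rows_in = np.array([-0.466, -0.451, 0.495, 0.570, -0.148])
cols_in = np.array([0.209, 0.190, 0.240, 1.020, -1.660])
levs_in = np.array([3.044, 1.234, 0.549, 0.243, -0.356])
log_m = 6.277 + rows_in[:, None] + cols_in[None, :] + levs_in[LEVELS]
grand, row, col, lev = dummy_regression(log_m, LEVELS)
print(np.round(lev, 3))
```

Contrasts between effects do not depend on which category is omitted or on the normalization; only the reported origin of each set of effects changes.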

In presenting goodness-of-fit tests and comparing alternative models, it is convenient to use a single letter to denote each variable. I let P = father's occupation, S = son's occupation, and H = levels of interaction to which the several cells in the mobility table are assigned in the model. Following the conventional notation, in which the highest-order marginal distributions fitted under a given model are listed in a series of parentheses, I denote the model by (P) (S) (H). Written in this form it is clear that the model is one of statistical independence, conditional on the assignment of cells in the P × S table to levels of H. Under the model the association between P and S is spurious; no association (quasi independence) between P and S occurs within levels of H (Goodman, 1972c, p. 689). One could think of the scheme as a latent-factor or latent-structure model in which the levels of H are latent classes (Goodman, 1974, p. 1231). However, the assignment of cells, and hence the assignment of observations to levels of H, is strictly deterministic, so the term manifest class might be more fitting.

MOBILITY TO FIRST JOBS OF AMERICAN MEN

Table 1 gives frequencies in a classification of son's first full-time civilian occupation by father's (or other family head's) occupation at son's sixteenth birthday among American men who were aged 20 to 64 in 1973 and were not currently enrolled in school. The data were obtained in the Occupational Changes in a Generation (OCG) supplement to the March 1973 Current Population Survey (Featherman and Hauser, 1975, 1978).⁴ Table 2 is a convenient display of the final model for the data of Table 1. Each numerical entry in the body of the table gives the level of $H_k$ to which the corresponding entry in the frequency table was assigned; one may think of the entries as subscripts of dummy variables pertaining to the density of interaction in the several regions of the table. Formally, the entries are merely labels; but for convenience in interpretation the numerical values are inverse to the estimated density of mobility or immobility in the cells to which they refer. I offer no a priori rationale for the specification of interaction effects in Table 2; it is the outcome of a search procedure that I describe later.

TABLE 1
Counts in a Classification of Mobility from Father's (or Other Family Head's) Occupation to Son's First Full-Time Civilian Occupation: U.S. Men Aged 20-64 in 1973

                                        Son's Occupation
Father's              Upper       Lower       Upper    Lower
Occupation            Nonmanual   Nonmanual   Manual   Manual    Farm    Total

Upper nonmanual           1,414         521      302      643      40    2,920
Lower nonmanual             724         524      254      703      48    2,253
Upper manual                798         648      856    1,676     108    4,086
Lower manual                756         914      771    3,325     237    6,003
Farm                        409         357      441    1,611   1,832    4,650
Total                     4,101       2,964    2,624    7,958   2,265   19,912

NOTE: Counts are based on observations weighted to estimate population counts and compensate for departures of the sampling design from simple random sampling. Broad occupation groups are upper nonmanual: professional and kindred workers, managers and officials, and non-retail sales workers; lower nonmanual: proprietors, clerical and kindred workers, and retail sales workers; upper manual: craftsmen, foremen, and kindred workers; lower manual: service workers, operatives and kindred workers, and laborers (except farm); farm: farmers and farm managers, farm laborers, and foremen.

⁴ The reported frequencies are based on a complex sampling design and have been weighted to estimate population counts while compensating for certain types of survey nonresponse. The estimated population counts have been scaled down to reflect underlying sample frequencies, and a further downward adjustment was made to compensate for departures of the sampling design from simple random sampling. The frequency estimates in Table 1 have been rounded to the nearest integer, but my computations are based on unrounded figures. I treat the adjusted frequencies as if they had been obtained under simple random sampling (Featherman and Hauser, 1978, app. B).

TABLE 2
Asymmetric Five-Level Model of Mobility from Father's (or Other Family Head's) Occupation to Son's First Full-Time Civilian Occupation

                            Son's Occupation
Father's Occupation         1    2    3    4    5

1. Upper nonmanual          2    4    5    5    5
2. Lower nonmanual          3    4    5    5    5
3. Upper manual             5    5    5    5    5
4. Lower manual             5    5    5    4    4
5. Farm                     5    5    5    4    1

The model says that, aside from conditions of supply and demand, immobility is highest in farm occupations (level 1) and next highest in the upper nonmanual category (level 2). If one takes the occupation groups as ranked from high to low in the order listed, one may say there are zones of high, almost uniform density bordering the peaks at either end of the status distribution. There is one zone of high density that includes upward or downward movements between the two nonmanual groups and immobility in the lower nonmanual group. Mobility from lower to upper nonmanual occupations (level 3) is more likely than the opposite movement, and the latter is as likely as stability in the lower nonmanual category (level 4). Moreover, the densities of immobility in the lower nonmanual category and of downward mobility to it are identical to those in the second zone of relatively high density, which occurs near the lower end of the occupational hierarchy. The second zone includes movements from the farm to the lower manual group and back, as well as immobility in the lower manual group. Last, there is a broad zone of relatively low density (level 5) that includes immobility in the upper manual category, upward and downward mobility within the manual stratum, mobility between upper manual and farm groups, and all movements between nonmanual and either manual or farm groups. The design says that an upper manual worker's son is equally likely to be immobile or to move to the bottom or top of the occupational hierarchy; obversely, it says that an upper manual worker is equally likely to have been recruited from any location in the occupational hierarchy, including his own.

It is worth noting that four of the five interaction levels recognized in the model occur along the main diagonal, and two of these (levels 4 and 5) are assigned both to diagonal and to off-diagonal cells. Thus immobility varies among occupational strata, and it is in some cases less likely than mobility. Moreover, with a single exception the design is symmetric. That is, net of row and column effects upward mobility is more prevalent than downward mobility within the nonmanual group. This asymmetry in the design is striking because it suggests the power of upper white-collar families to block at least one type of status loss, and because it is the only asymmetry in the design. For example, Blau and Duncan (1967, pp. 58-67) suggest that there are semipermeable class boundaries separating white collar, blue collar, and farm occupations that permit upward mobility but inhibit downward mobility. The only asymmetry in the present design occurs within one of the broad classes delineated by Blau and Duncan.⁵

TABLE 3
Estimated Parameters (in Additive Form) in the Model of Table 2: Mobility from Father's (or Other Family Head's) Occupation to Son's First Full-Time Civilian Occupation, U.S. Men Aged 20-64 in 1973

                                 Category of Row, Column, or Level
Design Factor                    (1)      (2)      (3)      (4)      (5)

Rows (father's occupation)      -0.466   -0.451    0.495    0.570   -0.148
Columns (son's occupation)       0.209    0.190    0.240    1.020   -1.660
Levels (interaction)             3.044    1.234    0.549    0.243   -0.356

Grand mean = 6.277.
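The structural claims about the design can be checked mechanically. The sketch below encodes the level assignments as described in the text and lists the cells whose level differs from that of the mirrored cell; the helper name is hypothetical.

```python
import numpy as np
from collections import Counter

# Level assignments as described in the text (rows = father, columns = son).
LEVELS = np.array([[2, 4, 5, 5, 5],
                   [3, 4, 5, 5, 5],
                   [5, 5, 5, 5, 5],
                   [5, 5, 5, 4, 4],
                   [5, 5, 5, 4, 1]])

def asymmetric_pairs(levels):
    """Return 1-based (i, j) pairs with i < j whose level differs from that of cell (j, i)."""
    n = levels.shape[0]
    return [(i + 1, j + 1)
            for i in range(n) for j in range(i + 1, n)
            if levels[i, j] != levels[j, i]]

cells_per_level = Counter(int(v) for v in LEVELS.ravel())
diagonal_levels = sorted({int(v) for v in np.diag(LEVELS)})
print(asymmetric_pairs(LEVELS))   # cell pairs that break symmetry
print(cells_per_level)            # number of cells assigned to each level
print(diagonal_levels)            # distinct levels found on the main diagonal
```

The single asymmetric pair involves the two nonmanual categories, and four distinct levels appear on the diagonal, in agreement with the description of the design.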

Table 3 gives the row, column, and interaction effects estimated in the 1973 OCG data under the model of Table 2 for intergenerational mobility to son's first job. The estimates are expressed in additive form; that is, they are effects on logs of frequencies under the model of Equation (2). The row and column effects clearly show an intergenerational shift out of farming and into white-collar or lower blue-collar occupations. These reflect temporal shifts in the distribution of the labor force across occupations, differential fertility, and life-cycle differences in occupational positions.

The interaction effects show very large differences in mobility and immobility across the several cells of the classification, and these differences closely follow my interpretation of the display in Table 2. Differences between interaction effects may readily be interpreted as differences in the log of the estimated frequency, net of row and column effects. For example, the estimates say that the immobility in farm occupations (at level 1) is 3.40 = 3.044 - (-0.356) greater (in the metric of logged frequencies) than the estimated mobility or immobility in cells assigned to interaction level 5 in Table 2. In multiplicative terms, immobility in farm occupations is $e^{3.40} = 29.96$ times greater than mobility or immobility at level 5. It would be incorrect to attach any importance to the signs of the interaction effects reported in Table 3, for they merely reflect our normalization rule that interaction effects sum to zero (in the log-frequency metric) across the cells of the table. For example, while the effects of levels 4 and 5 each reflect relatively low densities, it is not clear that either effect indicates "status disinheritance" in the diagonal cells to which it pertains (compare Goodman, 1969a, 1969b).

⁵ This observation is elaborated by Featherman and Hauser (1978, pp. 177-180).

In any event the effects do show a sharp density gradient across interaction levels. The smallest difference, between levels 3 and 4, indicates a relative density $e^{0.549 - 0.243} = e^{0.306} = 1.36$ times greater at level 3 than at level 4. Immobility in farm occupations and in upper nonmanual occupations is quite distinct from densities at other levels, but also immobility in the farm occupations is $e^{3.044 - 1.234} = e^{1.810} = 6.11$ times greater than in the upper nonmanual occupations.
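The contrasts quoted in this section are simple exponentials of differences between additive level effects. A short sketch, using the level effects reported in Table 3 and a hypothetical helper name, reproduces the arithmetic:

```python
import math

# Additive level effects as reported in Table 3 (levels 1 through 5).
level_effects = {1: 3.044, 2: 1.234, 3: 0.549, 4: 0.243, 5: -0.356}

def density_ratio(level_a, level_b, effects=level_effects):
    """Multiplicative contrast exp(u_a - u_b) between two interaction levels."""
    return math.exp(effects[level_a] - effects[level_b])

for a, b in [(1, 5), (3, 4), (1, 2)]:
    print(f"level {a} vs level {b}: {density_ratio(a, b):.2f}")
```

Because only differences enter the exponent, the ratios are unaffected by the normalization that makes the effects sum to zero.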

EVALUATING THE MODEL

The model of Table 2 provides less than a complete description of the mobility data in Table 1. Under the model of statistical independence the likelihood ratio statistic is $G^2 = 6{,}167.7$, which is asymptotically distributed as $\chi^2$ with 16 degrees of freedom. With the model of Table 2 as null hypothesis, $G^2 = 66.5$ with 12 degrees of freedom, since 4 degrees of freedom are used to create the 5 categories of H. By the usual inferential standards the model does not fit; the probability associated with the test statistic is very small. On the other hand, the model does account for 98.9 percent of the association in the data, that is, of the value of $G^2$ under independence. Given the extraordinarily large sample size, small departures from frequencies predicted by the model are likely to be statistically significant.
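A rough reproduction of these fit statistics is sketched below. It is not the ECTA program: a plain iterative proportional fitting loop over row, column, and level totals stands in for it, the function names are hypothetical, and the input is the rounded counts of Table 1 with the level assignments described in the text, so the results should only approximate the reported values, which were computed from unrounded figures.

```python
import numpy as np

COUNTS = np.array([[1414, 521, 302, 643, 40],
                   [724, 524, 254, 703, 48],
                   [798, 648, 856, 1676, 108],
                   [756, 914, 771, 3325, 237],
                   [409, 357, 441, 1611, 1832]], dtype=float)
LEVELS = np.array([[2, 4, 5, 5, 5],
                   [3, 4, 5, 5, 5],
                   [5, 5, 5, 5, 5],
                   [5, 5, 5, 4, 4],
                   [5, 5, 5, 4, 1]]) - 1   # 0-based level labels

def fit_levels_model(counts, levels, n_iter=2000):
    """Fit the model (P)(S)(H) by iterative rescaling to row, column, and level totals."""
    m = np.ones_like(counts)
    n_levels = levels.max() + 1
    flat = levels.ravel()
    level_obs = np.bincount(flat, weights=counts.ravel(), minlength=n_levels)
    for _ in range(n_iter):
        m *= (counts.sum(axis=1) / m.sum(axis=1))[:, None]
        m *= (counts.sum(axis=0) / m.sum(axis=0))[None, :]
        level_fit = np.bincount(flat, weights=m.ravel(), minlength=n_levels)
        m *= (level_obs / level_fit)[levels]
    return m

def g_squared(counts, fitted):
    """Likelihood-ratio statistic G^2 = 2 * sum of x * log(x / m)."""
    return 2.0 * np.sum(counts * np.log(counts / fitted))

independence = np.outer(COUNTS.sum(axis=1), COUNTS.sum(axis=0)) / COUNTS.sum()
fitted = fit_levels_model(COUNTS, LEVELS)
g2_ind = g_squared(COUNTS, independence)
g2_model = g_squared(COUNTS, fitted)
df_model = (5 - 1) * (5 - 1) - (5 - 1)   # (I - 1)(J - 1) - (K - 1)
print(round(g2_ind, 1), round(g2_model, 1), df_model)
print(round(1 - g2_model / g2_ind, 3))   # share of the independence G^2 accounted for
```

The final line gives the proportion of the association under independence that the model accounts for, the quantity described above as a percentage.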

Exact tests of the difference between any two interaction parameters can be carried out in a straightforward way. First modify the model to give the two groups of cells to be contrasted the same interaction parameter, then fit the revised model. Since the revised model places an additional equality constraint on two parameters of the initial model, leaving it otherwise unchanged, the difference between the likelihood-ratio $\chi^2$ statistics ($G^2$) of the two models will be distributed as $\chi^2$ with 1 degree of freedom. If one combines levels 1 and 2 of the present model, for example, the revised model yields $G^2 = 676.3$ with 13 degrees of freedom, so one rejects the hypothesis that immobility is the same in the farm and upper white-collar categories with $G^2 = 676.3 - 66.5 = 609.8$ with 1 degree of freedom.
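The comparison of nested models amounts to a difference of two statistics referred to a chi-square distribution with 1 degree of freedom. The sketch below uses the values reported in the text; the function name is hypothetical, and the tail probability relies on the identity that a chi-square variate with 1 degree of freedom is the square of a standard normal deviate.

```python
import math

def chi2_sf_1df(statistic):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

g2_restricted, df_restricted = 676.3, 13   # levels 1 and 2 combined
g2_full, df_full = 66.5, 12                # five-level model of Table 2
difference = g2_restricted - g2_full
df_difference = df_restricted - df_full
print(round(difference, 1), df_difference, chi2_sf_1df(difference))
```

For comparison, `chi2_sf_1df(3.841)` is close to 0.05, the conventional critical value for 1 degree of freedom.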

By examining residuals one can more fully evaluate the fit and perhaps see how to improve the model. Table 4 displays a measure of lack of fit for each cell of the mobility classification. It expresses residuals as natural logs of the ratios of observed frequencies to frequencies estimated under the model:

$$\log(e_{ij}) = \log(x_{ij} / \hat{m}_{ij}) = \log x_{ij} - \log \hat{m}_{ij} \tag{5}$$

where $x_{ij}$ is the observed frequency and $\hat{m}_{ij}$ is the estimated frequency in the $ij$th cell. As long as the residuals are small, say, within ±0.20, they can be interpreted approximately as proportions. Thus, expressed in this way, the residuals have a convenient interpretation, and positive and negative deviations are expressed symmetrically in the metric of the (log linear) model. For example, the entry of 0.06 in cell (3,1) says the observed mobility from upper manual to upper nonmanual occupations is $e^{0.06} = 1.06$ times the mobility estimated under the model.⁶ The entry of -0.17 in cell (2,3) says mobility from lower nonmanual to upper manual occupations is $e^{-0.17} = 0.84$ times the mobility estimated under the model, that is, 16 percent less. As suggested by these two examples, the approximation is better when the residual is small.

TABLE 4
Log of Ratio of Observed to Estimated Frequencies in the Model of Table 2: Mobility from Father's (or Other Family Head's) Occupation to Son's First Full-Time Civilian Occupation, U.S. Men Aged 20-64 in 1973

                            Son's Occupation
Father's Occupation          1        2        3        4        5

1. Upper nonmanual          0.00a    0.01     0.02    -0.01    -0.10
2. Lower nonmanual          0.00a    0.00    -0.17     0.06     0.06
3. Upper manual             0.06    -0.13     0.10    -0.01    -0.07
4. Lower manual            -0.07     0.14    -0.08     0.00     0.04
5. Farm                     0.03    -0.09     0.08    -0.01     0.00a

a Fitted exactly under the model.

⁶ Unsubscripted $e$ is the base of natural logarithms and should not be confused with the sample residuals in the multiplicative model, $e_{ij} = x_{ij} / \hat{m}_{ij}$.
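How closely a log residual approximates a proportional deviation can be seen by exponentiating it. The sketch below uses the two entries discussed in the text plus a value of 0.20 at the edge of the range described as small; the function name is hypothetical.

```python
import math

def ratio_from_log_residual(log_residual):
    """Observed-to-fitted ratio implied by a log residual."""
    return math.exp(log_residual)

for r in (0.06, -0.17, 0.20):
    ratio = ratio_from_log_residual(r)
    print(f"log residual {r:+.2f}: ratio {ratio:.2f}, "
          f"exact deviation {100 * (ratio - 1):+.1f}%, "
          f"approximate deviation {100 * r:+.0f}%")
```

The gap between the exact and approximate percentages widens as the residual grows in absolute value, which is the sense in which the approximation is better for small residuals.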

Under the model of Table 2 cells (1,1), (2,1), and (5,5) are fitted exactly, each by its own parameter. The fourth level, comprising cells (1,2), (2,2), (4,4), (4,5), and (5,4), is also fitted closely. At this level, the largest deviation is the 4 percent underestimate of movement from lower manual to farm occupations. Every other deviation at level 4 is less than 1 percent. The lack of fit in the model occurs primarily at level 5 of the design. There is a positive deviation of 0.10 in the one diagonal cell (3,3) assigned to level 5, so immobility in the upper manual (skilled) occupations is not quite so low as in some other cells at the same level. At the same time the largest positive residual at level 5 is that for upward mobility from lower manual to lower nonmanual occupations. The two largest negative residuals at level 5 pertain to the exchange between upper manual and lower nonmanual occupations (cells (3,2) and (2,3)). Even relative to the low density (presumed by the model) throughout level 5, movement between the skilled and lower white-collar occupations appears to be blocked. This is more striking because there is no similar hindrance to exchange between the skilled and upper white-collar occupations (cells (1,3) and (3,1)) or between the lower manual and lower nonmanual occupations (cells (4,2) and (2,4)). From the entries in Table 4 one might argue that the model and the fit could be improved by creating a sixth interaction level to include cells (3,2) and (2,3) and, possibly, (1,5), which indicates a very low rate of mobility from upper nonmanual origins to first jobs in farming.

The residuals in Table 4 are in a metric that facilitates interpretation and comparison, but they take no account of sam- pling variability, which is inverse to expected frequency. Perhaps the simplest way to take account of sampling variability in the residuals is to form the ratio

2. . = (xij - mhij)/ 23 (6)

which is the square root of the component of the Pearson chi-square statistic for each cell of the table. The zij are (roughly) interpretable

426 ROBERT M. HAUSER

TABLE 5
Standardized Errors in the Model of Table 2: Mobility from Father's (or Other Family Head's) Occupation to Son's First Full-Time Civilian Occupation, U.S. Men Aged 20-64 in 1973

                               Son's Occupation
Father's Occupation        1        2        3        4        5
1. Upper nonmanual       0.00a    0.26     0.29    -0.24    -0.60
2. Lower nonmanual       0.00a    0.04    -2.76     1.71     0.45
3. Upper manual          1.60    -3.38     2.80    - 0      -0.74
4. Lower manual         -1.97     4.14    -2.28    -0.05     0.60
5. Farm                  0.65    -1.62     1.58    -0.32     0.00a

a Fitted exactly under the model.

as unit normal deviates.7 Since there are several more cells in the table (25) than degrees of freedom under the model (12), the expected value of the squared zij is considerably less than unity. However, I have not made a correction for that here. (See Bishop and others, 1975, pp. 135-141.)
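The ratio in Equation (6) can be sketched in a few lines of Python. This is only an illustration: the counts below are hypothetical (they are not taken from Table 1), and the function name is an assumption introduced here.

```python
import math

def standardized_residual(observed, fitted):
    """z = (x - m) / sqrt(m): signed square root of a cell's Pearson chi-square component."""
    return (observed - fitted) / math.sqrt(fitted)

# Hypothetical observed and fitted counts for four cells (not from Table 1).
observed = [110.0, 90.0, 36.0, 64.0]
fitted = [100.0, 100.0, 49.0, 51.0]

z = [standardized_residual(x, m) for x, m in zip(observed, fitted)]

# Summing the squared residuals gives the Pearson chi-square statistic.
pearson_chi_square = sum(value ** 2 for value in z)

# With 12 degrees of freedom spread over 25 cells, the average squared
# residual is expected to be roughly 12 / 25, which is well below unity.
average_expected_z_squared = 12 / 25
```

The last line only restates the arithmetic behind the remark that the squared zij fall short of unity on average; it is not a formal correction of the kind discussed by Bishop and others.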

Table 5 displays standardized residuals from the model of Table 2. Again one is impressed with the close fit at level 4 and the heterogeneity at level 5 of the model. The interpretation of these residuals must be tempered by the results in Table 4, for the standardized residuals are not in the metric of the model. Taken in conjunction with earlier results, Table 5 also suggests a respecification of the model in which, as a first step, cells (2,3), (3,2), and possibly other negative outliers would be assigned to a separate level.

MOBILITY RATIOS

One other index is particularly useful in evaluating the specification of interaction effects. From Equation (1), observed frequencies may be expressed in terms of estimated parameters and residuals:

7 Larntz (1978) has shown that zij has better small-sample properties than do components of G2 (the likelihood-ratio test statistic) or Freeman-Tukey deviates.

Divide both sides of Equation (7) by the first three terms on the right-hand side to obtain

I call R*ij the mobility ratio. In the case of diagonal cells R*ii is equivalent to the new immobility ratio proposed by Goodman (1969a, 1969b, 1972c; see also Pullum, 1975, pp. 7-8), but I suggest the ratio be computed for all cells of the table as an aid in the evaluation of model design. If the model is specified correctly, the estimated interactions (δ̂ij) will be more useful in interpretation than the R*ij, for the latter will confound interaction effects with sampling errors (eij). On the other hand, when the model is not correctly specified, the residuals (eij) will reflect specification error as well as sampling variability. For this reason the R*ij can be useful in revising a model that does not fit the data.
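The decomposition described in this passage can be sketched as follows. The sketch assumes that Equation (1) has the multiplicative form $m_{ij} = \alpha\beta_i\gamma_j\delta_{ij}$, with a scale factor, a row effect, a column effect, and an interaction. The symbols $\alpha$, $\beta_i$, and $\gamma_j$ are assumptions introduced here; only the interaction $\delta_{ij}$ and the residual $e_{ij}$ are named in the surrounding text.

```latex
\begin{align}
x_{ij} &= \hat{\alpha}\,\hat{\beta}_i\,\hat{\gamma}_j\,\hat{\delta}_{ij}\,e_{ij} \tag{7} \\
R^{*}_{ij} &= \frac{x_{ij}}{\hat{\alpha}\,\hat{\beta}_i\,\hat{\gamma}_j}
            = \hat{\delta}_{ij}\,e_{ij} \tag{8}
\end{align}
```

Under these assumptions the mobility ratio is the product of the estimated interaction and the residual, which is consistent with the statement that it confounds interaction effects with errors.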

Conceptually, R*ij is related to Rij, Rogoff's (1953) social distance mobility ratio and Glass's (1954) index of association:

$$R_{ij} = \frac{x_{ij}}{x_{i.}\,x_{.j}/N} = \frac{N x_{ij}}{x_{i.}\,x_{.j}} \qquad (9)$$

where N is the sum of observed counts and xi. and x.j are, respectively, sums of counts in the ith row and in the jth column. Both Rij and R*ij may be interpreted as ratios of observed counts to those estimated from a scale factor and row and column effects under a given statistical model (see Hauser, 1978, pp. 923-924). Indeed, Rij = R*ij in the special case of the model of simple statistical independence, which specifies no interaction effects. In terms of Equation (1), R*ij becomes Rij when we specify δij = 1 for all i and j.

As a measure of interaction, Rij has several undesirable properties. Contrary to supposition (Rogoff, 1953, p. 32), Rij is not independent of prevalence (row and column) effects; it varies inversely as the marginal proportions in the ith row and jth column. The maximum of Rij is the reciprocal of the larger of the marginal proportions in the ith row and jth column. Moreover, the set of Rij


for a square table determines the row and column marginal distributions. This renders Rij useless in comparing interaction effects across tables with differing marginal distributions, for the multiple sets of Rij cannot take on values corresponding to the hypothesis of no change. Furthermore, the Rij cannot be symmetric across the main diagonal (Rij = Rji), showing, for example, equal propensities toward upward and downward mobility, unless the observed counts are symmetric (xij = xji). Thus propensities toward upward and downward mobility cannot appear to be the same unless the frequencies of upward and downward mobility are the same and, consequently, the two marginal distributions are the same (Blau and Duncan, 1967, pp. 93-97; Tyree, 1973). These undesirable properties all arise because the model of simple statistical independence does not fit the data, so Rij confounds prevalence effects (of rows and columns) with interaction effects (Goodman, 1969b). Hauser (1978, pp. 939-941, 943; Featherman and Hauser, 1978, pp. 156-161) shows that the marginal proportions and the Rij may be expressed as weighted sums of the prevalence and interaction effects under a model that fits the data. Except under statistical independence, R*ij has more desirable properties than Rij.
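The old mobility ratio and its ceiling can be illustrated with a short Python sketch. It assumes the usual definition, Rij = N xij / (xi. x.j). The counts are the unsmoothed first-to-current-occupation counts from Panel A of Table 11, used here purely as sample input (Table 6 itself is computed from Table 1), and the function names are assumptions introduced for the example.

```python
def margins(counts):
    """Return row totals, column totals, and the grand total of a two-way table."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    return row_totals, col_totals, sum(row_totals)

def old_mobility_ratios(counts):
    """R_ij: observed count divided by the count expected under simple independence."""
    rows, cols, total = margins(counts)
    return [[total * x / (rows[i] * cols[j]) for j, x in enumerate(row)]
            for i, row in enumerate(counts)]

def ratio_ceilings(counts):
    """Largest attainable R_ij: the reciprocal of the larger marginal proportion."""
    rows, cols, total = margins(counts)
    return [[total / max(rows[i], cols[j]) for j in range(len(cols))]
            for i in range(len(rows))]

# Panel A of Table 11 (rows: first occupation; columns: current occupation).
table_11a = [
    [3309, 320, 239, 258, 31],
    [1134, 768, 450, 634, 29],
    [448, 235, 1305, 628, 28],
    [1036, 837, 2019, 3966, 163],
    [158, 149, 454, 860, 519],
]

ratios = old_mobility_ratios(table_11a)
ceilings = ratio_ceilings(table_11a)
```

Because every ceiling depends only on the margins, two tables with different marginal distributions cannot share the same set of Rij, which is the difficulty described above.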

To illustrate the use of the R*ij and their differences from the Rij, Table 6 gives these indexes for the counts of mobility to first jobs. Obviously, the pattern of the R*ij conforms to our earlier description of the design. Moreover, as may not have been obvious from the δij (Table 3) and the rij (Table 4) taken separately, the fit is good enough so there is no overlap in interactions across levels recognized in the design. For example, if immobility among skilled workers, in cell (3,3), is high relative to mobility in other cells at level 5 in Table 2, the immobility in that category is still substantially less than the immobility in any other occupation group. Again level 5 appears to be heterogeneous, but I have not carried the analysis of Table 1 beyond the model of Table 2.

From the Rij in Table 6 one would conclude (correctly) that there is substantial immobility at both the top and the bottom of the occupation hierarchy, but not nearly so much immobility as is indicated by the R*ij. The Rij also show status immobility in the three middle occupation groups, but less in the lower manual than in the other two categories. In contrast the R*ij show a very low level

TABLE 6
New Mobility Ratios (R*ij) Under the Model of Table 2 and Old Mobility Ratios (Rij) Under the Model of Simple Independence: Mobility from Father's (or Other Family Head's) Occupation to Son's First Full-Time Civilian Occupation, U.S. Men Aged 20-64 in 1973

Panels: R*ij; Rij
Rows (Father's Occupation): 1. Upper nonmanual; 2. Lower nonmanual; 3. Upper manual; 4. Lower manual; 5. Farm
Columns (Son's Occupation): 1 2 3 4 5

of immobility in the upper manual group, and they show moderate and roughly equal levels of immobility in the lower nonmanual and lower manual groups. Both sets of ratios show greater than expected interchange between the upper and lower nonmanual groups, with the upward flow exceeding the downward flow. The Rij show asymmetric flows between the lower manual and farm groups, both of which are below expectation; but between these same two groups the R*ij show roughly equal flows that are larger than those expected from row, column, and scale effects. With a single exception the Rij decline regularly as one moves away from the main diagonal in any row or column, but the R*ij are low and fluctuate irregularly outside the eight cells in the upper left and lower right corners of the table. Outside those same corners, four of the Rij, in cells (3,2), (3,3), (3,4), and (4,2), show greater frequencies than expected, but none of the R*ij show greater frequencies than expected. With a single exception the Rij are greater in size in corresponding cells below than above the main diagonal, and this suggests a preponderance of upward relative to downward mobility. Excepting the one asymmetry in the design matrix, the R*ij are roughly the same size in corresponding cells above and below the main diagonal.


It is obvious that the Rij, which are based on a statistical model that does not fit the data (G2 = 6,167.7 with 16 degrees of freedom), show a substantially different pattern of interaction than the R*ij, which are based on a statistical model that fits the data moderately well (G2 = 66.5 with 12 degrees of freedom). I have emphasized the undesirable properties of the Rij because they have been so long and so widely used, yet other well-known measures of interaction have similar undesirable properties. For example, Hauser (1978, pp. 943-949) and Featherman and Hauser (1978, pp. 161-166) show that both the conventional saturated multiplicative model (Bishop and others, 1975, p. 24) and multiplicative standardization to uniform marginal distributions (Mosteller, 1968) yield patterns of interaction that more closely resemble the patterns of the Rij than those of the R*ij. While R*ij has been introduced in the context of a model that fits rather well, the reader should bear in mind that it is primarily useful when the specification of the model is in doubt. I turn now to such exploratory applications of mobility ratios and related measures, beginning with search procedures leading to the model of Table 2.

MODEL SPECIFICATION UNDER QUASI INDEPENDENCE

One can analyze the pattern of interaction in a table by temporarily ignoring those cells of the classification that contribute most to the confounding of interaction effects with row and column effects. In Goodman's (1965, 1969a) terms, one "blanks out" those cells and fits models of quasi independence to the remaining cells. Equivalently, in my multiplicative model for the full table, I fit one parameter to each cell that is to be ignored, and I assign the remaining cells to a single level of interaction. Table 7 displays these equivalent specifications for three models of quasi independence in the table of mobility to first jobs.

Model Q1 is the quasi-perfect mobility model (Goodman, 1965, 1969a). It ignores (or fits exactly) the frequencies on the main diagonal, which represent occupational inheritance relative to the five-category occupational classification. Under the null hypothesis there is no interaction in the remainder of the table, which is coded at level 1 in the displays both for the full and for the partial tables. Frequencies are estimated in those cells by iteratively rescaling a


TABLE 7
Three Models of Quasi Independence in the 5 x 5 Mobility Table

Panels: Model Q1; Model Q2; Model Q3 (each displayed as a Partial Table and as a Full Table)
Rows (Father's Occupation): 1 2 3 4 5
Columns (Son's Occupation): 1 2 3 4 5

matrix containing 1's at level 1 and 0's elsewhere (that is, the Q1 array in the left-hand panel of Table 7) to the observed marginal frequencies in the fitted cells (Bishop and others, 1975, p. 189). The multiplicative rescaling procedure preserves both the observed marginal frequencies and the hypothesized lack of interaction in the fitted portion of the table. Again it is convenient to estimate models of this form using ECTA. While there are 25 degrees of freedom in the 5 x 5 table, 9 degrees of freedom are used in fitting row, column, and scale effects, and another 5 degrees of freedom are used in fitting the six-level Q1 model. Thus, under the null hypothesis that there is no interaction off the main diagonal, there are 25 - 9 - 5 = 11 degrees of freedom for error.
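The iterative rescaling described here can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the ECTA procedure itself: the function name and the fixed number of sweeps are choices made for the example, and the input counts are those of Panel A of Table 11, used only as sample data (the text fits the table of mobility to first jobs).

```python
def fit_quasi_independence(counts, mask, sweeps=500):
    """Rescale a 0/1 mask to the observed margins of the unblocked cells."""
    size = len(counts)
    # Margins are taken over the cells retained by the mask only.
    row_targets = [sum(counts[i][j] * mask[i][j] for j in range(size)) for i in range(size)]
    col_targets = [sum(counts[i][j] * mask[i][j] for i in range(size)) for j in range(size)]
    fitted = [[float(mask[i][j]) for j in range(size)] for i in range(size)]
    for _ in range(sweeps):
        for i in range(size):  # match row margins
            total = sum(fitted[i])
            fitted[i] = [value * row_targets[i] / total for value in fitted[i]]
        for j in range(size):  # match column margins
            total = sum(fitted[i][j] for i in range(size))
            for i in range(size):
                fitted[i][j] *= col_targets[j] / total
    return fitted

table_11a = [
    [3309, 320, 239, 258, 31],
    [1134, 768, 450, 634, 29],
    [448, 235, 1305, 628, 28],
    [1036, 837, 2019, 3966, 163],
    [158, 149, 454, 860, 519],
]
size = len(table_11a)

# Model Q1: the main diagonal is blocked (0), every other cell is at level 1.
q1_mask = [[0 if i == j else 1 for j in range(size)] for i in range(size)]
fitted_q1 = fit_quasi_independence(table_11a, q1_mask)

# 25 cells, less 9 for scale, row, and column effects, less 5 blocked cells.
degrees_of_freedom = size * size - (1 + 2 * (size - 1)) - size
```

Because each sweep multiplies whole rows and whole columns, the fitted cells always keep the form of a row factor times a column factor, so every odds ratio among unblocked cells stays equal to 1.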

Table 8 summarizes the fit of maximum-likelihood estimates of the independence model and other multiplicative models of the 5 x 5 table of mobility to first jobs (Table 1). Clearly model Q1 accounts for much of the interaction in the table. While G2 = 683.06 is still very large relative to its degrees of freedom, it is


TABLE 8
Summary of Fit of Selected Multiplicative Models: Mobility from Father's (or Other Family Head's) Occupation to Son's First Full-Time Civilian Occupation, U.S. Men Aged 20-64 in 1973

Model                                            Na        G2        df     Δ      G2 as % of independence G2
Independence                                     19,913    6167.69   16    20.1    100.0
Q1: main diagonal blocked                        11,963     683.06   11     5.5     11.1
Q2: diagonal and intrastratum moves blocked       8,869      50.05    7     1.4      0.8
Q3: diagonal and inner diagonals blocked          5,520      15.67    3     0.6      0.3

a Sum of frequencies excluding those fitted exactly under the model.

only about one ninth the value of G2 under simple independence. Furthermore, while the simple independence model misallocates 20.1 percent of the joint distribution of fathers and sons (as indicated by the index of dissimilarity, Δ, in the fourth column of Table 8), model Q1 misallocates only 5.5 percent of the observations.
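The two summary measures used in Table 8 can be sketched as follows. The G2 values are those reported in Table 8; the index of dissimilarity is written with its usual definition (half the sum of absolute differences between observed and fitted counts, as a percentage of the total), which is assumed here rather than quoted from the text, and the small example passed to it is hypothetical.

```python
def index_of_dissimilarity(observed, fitted):
    """Percentage of cases that would have to change cells for fitted to equal observed."""
    total = sum(observed)
    return 100.0 * sum(abs(x - m) for x, m in zip(observed, fitted)) / (2.0 * total)

# Likelihood-ratio statistics reported in Table 8.
g2 = {"Independence": 6167.69, "Q1": 683.06, "Q2": 50.05, "Q3": 15.67}

# Each model's G2 as a percentage of G2 under simple independence.
percent_of_independence = {
    name: round(100.0 * value / g2["Independence"], 1) for name, value in g2.items()
}

# Hypothetical two-cell example of the index of dissimilarity.
example_delta = index_of_dissimilarity([10.0, 20.0], [15.0, 15.0])
```

The Q1 entry of `percent_of_independence` corresponds to the "about one ninth" mentioned in the text.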

The first panel of Table 9 presents ratios of observed frequencies to those estimated at level 1 of model Q1 (within the zone of quasi independence specified in the model). Although the diagonal cells are not assigned to level 1 of model Q1, I have also calculated ratios of observed to estimated frequencies in the diagonal cells. The diagonal entries are the indexes of immobility proposed by Goodman (1969a, 1969b); they are ratios of the observed frequencies to those frequencies that would have been estimated in the main diagonal if the quasi-independence hypothesis held there. In other words, they are ratios of observed frequencies to those estimated from the row, column, and scale effects at level 1 of the design. These estimated frequencies for "blanked out" cells are not produced directly by the computer program (ECTA) used to estimate models Q1, Q2, and Q3. With a simple model in a small table it is convenient to compute them from the estimated frequencies in the cells where quasi independence is presumed to hold. Under the null hypothesis all the odds ratios within the zone of quasi independence are equal to unity (Goodman 1968, 1969a). Thus if one knows only three estimated frequencies in a 2 x 2 subtable of the


TABLE 9
Ratios of Observed Frequencies to Estimated Frequencies at Quasi-Independent Level under Three Models of Quasi Independence: Mobility from Father's (or Other Family Head's) Occupation to Son's First Full-Time Civilian Occupation, U.S. Men Aged 20-64 in 1973

Panels: Model Q1; Model Q2; Model Q3
Rows (Father's Occupation): 1. Upper nonmanual; 2. Lower nonmanual; 3. Upper manual; 4. Lower manual; 5. Farm
Columns (Son's Occupation): 1 2 3 4 5

a Cells ignored (or fitted exactly) under the model.

full table, one can compute the fourth estimated frequency by setting the odds ratio in the subtable equal to 1. For example, Table 10 shows the estimated frequencies in each cell of the mobility table under model Q1. To estimate the frequency in cell (1,1), one could use the entries in cells (1,2), (3,1), and (3,2) to write

Other combinations of cells could be used to obtain the same estimate (within the limits of rounding error).8
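The hand computation can be checked with a short Python sketch that uses the estimated frequencies printed in Table 10. Setting the odds ratio of a 2 x 2 subtable to 1 and solving for the blocked cell gives the product of the two adjacent cells divided by the opposite cell. The function and variable names are assumptions introduced for the illustration.

```python
# Estimated frequencies under model Q1, keyed by (row, column), from Table 10.
fitted_q1 = {
    (1, 2): 344.1, (1, 3): 285.6,
    (2, 1): 419.5, (2, 3): 321.8,
    (3, 1): 754.9, (3, 2): 697.7,
}

def fill_blocked_cell(same_row, same_column, opposite):
    """Solve (target * opposite) / (same_row * same_column) = 1 for the target cell."""
    return same_row * same_column / opposite

# Cell (1,1) from the subtable formed by rows 1 and 3 and columns 1 and 2.
estimate_a = fill_blocked_cell(fitted_q1[(1, 2)], fitted_q1[(3, 1)], fitted_q1[(3, 2)])

# The same cell from a different subtable: rows 1 and 2, columns 1 and 3.
estimate_b = fill_blocked_cell(fitted_q1[(1, 3)], fitted_q1[(2, 1)], fitted_q1[(2, 3)])
```

Both estimates agree with the entry for cell (1,1) in Table 10 to within rounding of the printed frequencies.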

8 In more complex models other methods may be needed to estimate the missing frequencies, such as those used to estimate parameters for models of the full table. The manual computations are often convenient, and ECTA converges more rapidly when cells with unique parameters are ignored than when the program is forced to fit them exactly.


TABLE 10
Estimated Frequencies under Model Q1: Mobility from Father's (or Other Family Head's) Occupation to Son's First Full-Time Civilian Occupation, U.S. Men Aged 20-64 in 1973

                                  Son's Occupation
Father's Occupation        1         2         3         4          5
1. Upper nonmanual       372.3a    344.1     285.6      811.5      65.1
2. Lower nonmanual       419.5     387.7a    321.8      914.5      73.3
3. Upper manual          754.9     697.7     579.0a   1,645.5     131.9
4. Lower manual          934.8     864.0     717.0    2,037.7a    163.3
5. Farm                  578.6     534.8     443.8    1,261.2     101.1a

a Cells ignored in estimating the model.

The ratios of observed to estimated frequencies in Table 9 are not mobility ratios (R*ij), but they differ from the R*ij by only a scalar multiple. That is, I have expressed the R*ij as deviations from a scale factor (or grand mean) for the full table, but the ratios in Table 9 are expressed as deviations from estimated frequencies at level 1 of the design. With the understanding that a change in normalization (shift in origin) has occurred, I shall refer to the entries in Table 9 as mobility ratios.

Under model Q1 the mobility ratios show a pattern of association which is somewhere between that displayed by the Rij and the R*ij (see Table 6). Relative to the Rij, the ratios in Table 9 are larger in cells (1,1) and (5,5); they are also relatively larger in cells (4,5) and (5,4) and, to a lesser degree, in cells (1,2) and (2,1). The ratios for model Q1 do not appear to fall as rapidly as do the Rij as one moves away from the main diagonal. At the same time there is still a relatively high ratio in the central diagonal cell (3,3).

The fit of model Q1 is not very close, and there are relatively large mobility ratios in four of the cells that were not fitted exactly under model Q1, namely (1,2), (2,1), (4,5), and (5,4). For these reasons I specify that model Q2 ignores those four cells as well as the diagonal cells. Thus model Q2 has only 7 degrees of freedom for error under the null hypothesis. White (1963) and Pullum (1975) advocate models of this form; see Fienberg (1976) for an evaluation of this specification as applied by Pullum. The fit is much improved under model Q2, so one would expect the mobility ratios to be more


informative. Under model Q2, G2 = 50.05, which is only 0.8 percent of its value under simple independence. The model misclassifies only 1.4 percent of the joint frequency distribution of fathers' and sons' occupations.

The mobility ratios for model Q2 in Table 9 show a pattern that is far more like that of the R*ij in Table 6. One problem is the relatively high ratio in cell (3,3), but there is otherwise little variation in the ratios outside the intersections of rows 1 and 2 with columns 1 and 2 and of rows 4 and 5 with columns 4 and 5. Moreover, the change in specification has again increased the mobility ratios in cells (1,1) and (5,5) relative to those under simple independence.

In model Q3 I ignore all the cells on the main diagonal and on the adjacent minor diagonals. Here the fit is rather close: G2 = 15.7 with 3 degrees of freedom and Δ = 0.6. In obtaining this fit I ignore (or fit exactly) the cells containing about three-fourths of the observations, but my purpose is not to fit the data both closely and parsimoniously. Rather, I am trying to obtain diagnostic measures of interaction. Relative to the standard set by the pattern of R*ij in Table 6, model Q3 is very helpful in uncovering the pattern of interaction in Table 1. The mobility ratios in panel Q3 of Table 9 show all the major features of the display in Table 2 and those of the R*ij in Table 6.

The lesson in this illustrative analysis is that diagnostic or exploratory analysis of a classification may be improved by ignoring large parts of the classification. It may be better to ignore too much than too little of the classification, provided one is left with enough information at the end to construct diagnostic measures for the full table. In the present case, I would specify models Q1, Q2, and Q3 in advance and look at the mobility ratios only under model Q3, because it fits best, as a guide to specification of a more parsimonious model. By grouping cells with similar mobility ratios under model Q3, I can write the model of Table 2 by inspection.

ITERATIVE RESPECIFICATION OF THE MODEL

Rather than "blanking out" selected cells, one may start with an a priori model or one obtained from analyses of some other


classification. After each round of estimation the mobility ratios (R*ij) are examined, and the model may be accepted or revised. To illustrate this procedure, I introduce a second set of data from the 1973 OCG survey: a 5 x 5 classification of mobility from first full-time civilian occupation to current occupation among men aged 20 to 64. The counts are displayed in Panel A of Table 11.9

In developing a model to fit these data I wanted to eliminate

9 The classification, weighting, rescaling, and rounding of these counts follow procedures described in footnote 4.

TABLE 11
Counts in Observed and Smoothed Classifications of Mobility from First Full-Time Civilian Occupation to Current Occupation: U.S. Men Aged 20-64 in 1973

                                 Current Occupation
First Occupation           1        2        3        4        5     Total

A. Unsmoothed Counts
1. Upper nonmanual       3,309     320      239      258       31
2. Lower nonmanual       1,134     768      450      634       29
3. Upper manual            448     235    1,305      628       28
4. Lower manual          1,036     837    2,019    3,966      163
5. Farm                    158     149      454      860      519
Total                    6,085   2,309    4,467    6,346      770

B. Age-Homogeneous Quasi-Symmetric Counts
1. Upper nonmanual       3,321     332      269      228        8
2. Lower nonmanual       1,129     769      472      622       24
3. Upper manual            420     216    1,311      665       32
4. Lower manual          1,053     845    1,974    3,956      192
5. Farm                    163     148      439      877      514
Total                    6,086   2,310    4,465    6,348      770

C. Age-Homogeneous Counts
1. Upper nonmanual       3,321     323      234      248       31
2. Lower nonmanual       1,137     769      452      628       30
3. Upper manual            457     237    1,311      612       29
4. Lower manual          1,030     839    2,027    3,957      167
5. Farm                    141     142      442      903      513
Total                    6,086   2,310    4,466    6,348      770

NOTE: Counts are based on observations weighted to estimate population counts and compensate for departures of the sampling design from simple random sampling. Broad occupation groups are defined in Table 1. Marginal totals are not consistent across Panels A, B, and C because adjusted or smoothed counts were rounded independently.


the association between mobility and age. Furthermore, for reasons of parsimony I did not want to introduce unnecessary asymmetries into the model. That is, I wanted to assume equality of the interactions between each pair of occupations (δij = δji for all i, j) unless the data provided clear contrary evidence. For these reasons I did not develop a model using the counts in Panel A of Table 11. Instead I worked with two-way tables of counts from which fluctuations of interactions (across ages and across the main diagonal of the classification) had been removed.

The fit of models used to smooth the data is summarized in Panel A of Table 12. I begin with the three-way classification of first occupation (W) by current occupation (S) by 5-year age group (A). To this design I add a fourth dimension (R): one category composed of the initial three-way classification and the second of counts transposed across the main diagonal of each age-specific mobility subtable.10 Line A1 reports the fit of the model of conditional independence of mobility within each age group, (WAR)(SAR). In this model, the margins of the mobility subtables are permitted to vary by age; furthermore, occupational origin and destination distributions may differ within each age group, but there are no interactions between first and current occupations.11 The model of Line A1 does not fit the data, but it is a useful baseline for comparison with other models.

The model of Line 2 in Panel A introduces a set of age-constant interactions between first and current occupations. By construction these interactions are also symmetric across the main diagonal of each age-specific mobility subtable; the expected frequencies are quasi-symmetric.12 The model of Line A2 can also be rejected at any conventional level of statistical significance. (When its degrees of freedom are large, G2 is approximately normally distributed with expected value equal to its degrees of freedom and variance equal to twice its degrees of freedom.) At the same time the model of Line A2 does account for all but 3 percent of the association (G2) under the baseline model. Furthermore, as shown by the index of dissimilarity (Δ), it misallocates only 3.7 percent of the frequency distribution, compared to a misallocation of 28.3 percent of the distribution under the baseline model. I take a mobility subtable of expected frequencies under the model of Line A2 and rescale its marginal totals to those of the classification for men of all ages. This age-homogeneous, quasi-symmetric set of counts, displayed in Panel B of Table 11, is the starting point for my analysis of mobility from first to current occupations.

10 Each count appears twice in the four-way classification, so it is necessary to rescale the test statistic by dividing each count by 2 or by dividing G2 or X2 by 2. Since the diagonal entries are unaffected by transposition and play no part in the test of quasi symmetry, this procedure is equivalent to that proposed by Bishop and others (1975, pp. 289-290).

11 The model (WAR)(SAR) in the augmented classification is equivalent to the model (WA)(SA) in the initial three-way classification. Likewise, the model in Line 3 in Panel A, (WAR)(SAR)(WSR), is equivalent to the model (WA)(SA)(WS) in the initial three-way classification.

12 See Bishop and others (1975, pp. 286-288) for further discussion of statistical aspects of quasi symmetry. Featherman and Hauser (1978, pp. 184-187) discuss its substantive implications for mobility analysis.

To anticipate asymmetries in the age-smoothed data, I also fit a model of age-constant interaction (without the constraint of quasi symmetry). The fit of this model, reported in Line A3, is slightly better than that in Line A2, but the model is still rejected at conventional significance levels. Thus the interactions between first and current occupations differ across ages beyond the chance level. To obtain age-smoothed counts I take a mobility subtable of expected frequencies under the model of Line A3 and rescale its marginal totals to those of the classification for men of all ages. These counts are given in Panel C of Table 11.
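The rescaling step can be sketched as iterative proportional adjustment of a seed table to new margins. The sketch below is an illustration under stated assumptions: the seed subtable and the target margins are hypothetical numbers, and the function name and fixed number of sweeps are choices made for the example rather than the procedure actually used to produce Table 11.

```python
def rescale_to_margins(seed, row_targets, col_targets, sweeps=200):
    """Alternately rescale rows and columns of a seed table to target margins.

    Multiplying whole rows and whole columns leaves every odds ratio of the
    seed unchanged, so the pattern of interaction is carried over intact.
    """
    table = [[float(value) for value in row] for row in seed]
    for _ in range(sweeps):
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            table[i] = [value * target / total for value in table[i]]
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / total
    return table

# Hypothetical 3 x 3 subtable of expected frequencies for one age group.
seed = [[40.0, 10.0, 5.0], [12.0, 30.0, 8.0], [6.0, 9.0, 20.0]]

# Hypothetical margins for men of all ages (both sets sum to 1,000).
row_targets = [500.0, 300.0, 200.0]
col_targets = [450.0, 350.0, 200.0]

rescaled = rescale_to_margins(seed, row_targets, col_targets)
```

The rescaled table has the all-ages margins but the odds ratios of the age-specific seed, which is what makes the resulting counts "age-homogeneous" in their interactions.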

Line A4 of Table 12 contrasts the models of Lines A2 and A3 to test the hypothesis of quasi symmetry in the age-constant interactions between first and current occupations. This hypothesis also is rejected with G2 greater relative to its degrees of freedom than in the test (Line A3) for age homogeneity. At the same time, departures from quasi symmetry account for a very small fraction of G2 under the baseline model, so I do not expect to find substantial asymmetry in the final model.
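The comparisons among Lines A2, A3, and A4 can be reproduced from the values in Panel A of Table 12. The sketch below uses the large-sample normal approximation mentioned in the text for the two models with many degrees of freedom, and it forms the Line A4 contrast by differencing the nested models. The function name is an assumption introduced for the example.

```python
import math

def normal_deviate(g2, df):
    """Large-df approximation: G2 is roughly normal with mean df and variance 2 * df."""
    return (g2 - df) / math.sqrt(2.0 * df)

# (G2, degrees of freedom) from Panel A of Table 12.
line_a2 = (288.5, 134)  # age-constant, quasi-symmetric interactions
line_a3 = (220.3, 128)  # age-constant interactions without quasi symmetry

# Line A4: the contrast between the nested models of Lines A2 and A3.
contrast_g2 = line_a2[0] - line_a3[0]
contrast_df = line_a2[1] - line_a3[1]

z_a2 = normal_deviate(*line_a2)
z_a3 = normal_deviate(*line_a3)

# G2 per degree of freedom, for the comparison drawn in the text.
ratio_contrast = contrast_g2 / contrast_df
ratio_a3 = line_a3[0] / line_a3[1]
```

The contrast has only 6 degrees of freedom, so the normal approximation is not applied to it; the ratio of G2 to degrees of freedom is used instead.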

In fitting the data I work from simple models for smoothed counts toward more elaborate models for observed counts. I want to introduce only those parameters that are needed to fit major and persistent features of the data. Thus I start by fitting the age-homogeneous, quasi-symmetric counts (Panel B of Table 11). As shown in Line B1 of Table 12, the model of simple independence does not fit this classification (G2 = 9,778.2 with 10 degrees of freedom);


TABLE 12
Summary of Fit of Selected Models of Mobility from First to Current Occupation: U.S. Men Aged 20-64 in 1973

Model                        G2        df      Δ      G2 as % of baseline G2

A. Men 20 to 64 by 5-Year Age Groups
1. (WAR)(SAR)              9,780.9    144    28.3    100.0
2. (WAR)(SAR)(WS)            288.5    134     3.7      3.0
3. (WAR)(SAR)(WSR)           220.3    128     3.4      2.3
4. Line 2 vs. line 3          68.2      6     0.3      0.7

B. Men 20 to 64: Aggregate Age-Smoothed Quasi-Symmetric Data
1. (W)(S)                  9,778.2     10    28.1    100.0
2. (W)(S)(H1)              2,945.6      9    15.3     30.1
3. (W)(S)(H2)                518.3      7     4.5      5.3
4. (W)(S)(H3)                179.0      6     3.3      1.8
5. (W)(S)(H4)                198.3      7     3.3      2.0
6. (W)(S)(H5)                 19.9      6     1.0      0.2

C. Men 20 to 64: Aggregate Age-Smoothed Data
1. (W)(S)                  9,838.9     16    28.6    100.0
2. (W)(S)(H5)                 89.1     12     1.6      0.9
3. (W)(S)(H6)                 50.4     12     1.5      0.5
4. (W)(S)(H7)                 18.0     11     1.0      0.2

NOTE: W = first job; S = current job; R = transpose; A = age; H (with subscript) = model of interaction.

note that the classification has 10 degrees of freedom, rather than 16 degrees of freedom for interaction, by virtue of the constraint of quasi symmetry. I expect occupational persistence to be greater within the career than across generations; moreover, from the analysis of Table 1 I do not expect to find a marked social distance gradient in the interactions between occupations. With these ideas in mind I begin with the very simple model H1, shown in Panel A of Table 13. The design has only two levels of interaction: one pertaining to each diagonal cell and the other pertaining to each off-diagonal cell.13
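The two-level design and its degrees of freedom can be written down directly. The following Python sketch builds the levels matrix for H1 and counts degrees of freedom on the assumption that each interaction level beyond the first uses one degree of freedom from the baseline; the function names are introduced here for illustration only.

```python
def levels_matrix_h1(size):
    """Two-level design: level 1 on the main diagonal, level 2 in every other cell."""
    return [[1 if i == j else 2 for j in range(size)] for i in range(size)]

def error_degrees_of_freedom(levels, baseline_df):
    """Baseline degrees of freedom less one for each level beyond the first."""
    distinct_levels = {level for row in levels for level in row}
    return baseline_df - (len(distinct_levels) - 1)

h1 = levels_matrix_h1(5)

# The quasi-symmetric counts leave 10 degrees of freedom under independence
# (Line B1 of Table 12), so the two-level design leaves 10 - 1 for error.
df_h1 = error_degrees_of_freedom(h1, baseline_df=10)
```

The same count applied to a five-level design on the unconstrained table, with 16 baseline degrees of freedom, leaves 12, which matches the degrees of freedom reported for the model of Table 2.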

As shown in line 2 in Panel B of Table 12, model H1 with 1 degree of freedom accounts for nearly 70 percent of the association under independence, but a very large and significant component of G2 remains unexplained. The right-hand columns of Table 13

13 Some interesting properties of this model are discussed by Goodman (1979a).


TABLE 13
Interaction Levels and Residuals for Selected Models of Mobility from First to Current Jobs: U.S. Men Aged 20-64 in 1973

Data and Model    Levels in Model of Interaction    Residuals (log R*ij)

A. Age-Homogeneous, Quasi-Symmetric Data

H1                1 2 2 2 2                          1.27   -0.06   -1.04
                  2 1 2 2 2                          0.17    0.76   -0.50
                  2 2 1 2 2                         -0.44   -0.13    0.91
                  2 2 2 1 2                         -0.66    0.10    0.18
                  2 2 2 2 1                         -1.60   -0.72   -0.40

H2                1 3 4 4 4                          1.60    0.35   -0.61
                  3 2 3 3 4                          0.37    1.03   -0.20
                  4 3 2 3 4                         -0.55   -0.16    0.90
                  4 3 3 2 3                         -0.72    0.11    0.21
                  4 4 4 3 1                         -1.74   -0.79   -0.44

H3                1 3 5 5 5                          2.96    0.65   -0.31
                  3 2 4 4 5                          0.62    0.23   -1.01
                  5 4 2 4 5                         -0.30   -0.98    0.07
                  5 4 4 2 3                         -0.49   -0.72   -0.62
                  5 5 5 3 1                         -0.51   -0.62   -0.28

H4                1 3 4 4 4                          2.67    0.59   -0.37
                  3 2 4 4 4                          0.57    0.42   -0.83
                  4 4 2 4 4                         -0.36   -0.80    0.25
                  4 4 4 2 3                         -0.55   -0.54   -0.44
                  4 4 4 3 1                         -0.79   -0.66   -0.33

H5                1 2 4 4 5                          2.73    0.73   -0.35
                  2 2 5 4 5                          0.74    0.65   -0.71
                  4 5 3 4 4                         -0.33   -0.70    0.23
                  4 4 4 3 3                         -0.48   -0.40   -0.43
                  5 5 4 3 1                         -0.84   -0.64   -0.43

B. Age-Homogeneous Data

H5                1 2 4 4 5                          2.73    0.70   -0.49
                  2 2 5 4 5                          0.75    0.65   -0.75
                  4 5 3 4 4                         -0.25   -0.61    0.23
                  4 4 4 3 3                         -0.50   -0.41   -0.40
                  5 5 4 3 1                         -0.99   -0.68   -0.42

H6                1 2 4 4 2                          2.72    0.67   -0.52
                  2 2 5 4 4                          0.68    0.58   -0.82
                  4 5 3 4 4                         -0.31   -0.69    0.16
                  4 4 4 3 3                         -0.57   -0.49   -0.47
                  5 5 4 3 1                         -1.06   -0.77   -0.50

H7                1 2 4 4 2                          2.76    0.75   -0.47
                  2 2 5 4 4                          0.77    0.69   -0.74
                  4 5 3 5 4                         -0.39   -0.73    0.08
                  4 4 4 3 3                         -0.56   -0.45   -0.47
                  6 5 4 3 1                         -1.06   -0.74   -0.50

display the logs of mobility ratios under model H1 for each cell of the classification. Recall from Equation (8) that R*ij reflects both interaction and error; thus it should not be surprising that the residuals are asymmetric even though the interactions (by construction) are symmetric. This asymmetry reflects poor fit.

Using the residual matrix (log R*ij) from model H1 as a guide, I write the revised model H2 and refit the data. The respecification distinguishes densities in cells (1,1) and (5,5) from those in other diagonal cells. Off the diagonal it makes a rough distinction between short and long-distance mobility. Using two additional parameters, model H2 improves the fit substantially, as shown in Line B3 of Table 12. Again I examine the residual matrix and respecify the model as in H3. I continue in this fashion until I obtain H5, for which G2 = 19.9 with 6 degrees of freedom. I shall not describe the intervening respecifications in detail, but merely observe that heterogeneity of the residuals within levels (including asymmetry) is reduced as the fit improves. Moreover, the fit improves little from H3 to H4 (compare Lines B4 and B5 of Table 12), but it improves markedly at the next step.

The matrix of residuals (log R*ij) is not the only information I use to revise the model. I also find it helpful to look at a (computer-generated) display of the data in which residuals are ranked from largest to smallest. For example, Table 14 reproduces the display obtained from model H1. Column 1 gives the rank of the R*ij, whose logs are given in column 8. Columns 2, 3, and 4 identify the row, column, and level of each cell; columns 5 and 6 give the observed and expected frequencies, respectively. The expected frequencies are helpful in locating cells that are large enough to warrant fitting closely or small enough to disregard. A similar purpose is served by the zij in column 9 (Equation 6); the zij are particularly helpful in the later rounds of fitting. Column 7 reports errors within levels in the log linear metric (Equation 5).

TABLE 14
Diagnostic Output from Model H1 of Mobility from First to Current Occupations

Columns: Rank; Row; Column; Level; Observed Frequency; Expected Frequency; Error; log R*ij; zij; Cumulative Distribution of Observed Frequencies

Rank    log R*ij
  1       1.569
  2       1.266
  3       0.940
  4       0.905
  5       0.757
  6       0.355
  7       0.293
  8       0.178
  9       0.165
 10       0.097
 11      -0.062
 12      -0.129
 13      -0.156
 14      -0.336
 15      -0.402
 16      -0.440
 17      -0.497
 18      -0.658
 19      -0.724
 20      -0.979
 21      -1.037
 22      -1.139
 23      -1.602
 24      -1.669
 25      -2.776

The last column gives the cumulative percentage distribution of observed frequencies; I have used these entries to locate breaks between levels of interaction in revising models of large tables. For example, I have sometimes used one or two interaction parameters to fit the most dense cells in a classification and then added another parameter for each decile of the cumulative frequency distribution. This procedure tends to fit the data both closely and parsimoniously because the likelihood-ratio test statistic is a weighted sum of errors of prediction in which observed frequencies are the weights (Bishop and others, 1975, p. 125):

One should be cautious in using this procedure, for it is completely mechanical and cannot be relied upon to yield a substantively interpretable design.
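To see why fitting the densest cells first pays off, it may help to look at the cell-by-cell contributions to the likelihood-ratio statistic. The sketch below uses made-up counts (not from any table in this chapter) and a hypothetical helper `g2_terms`; the fitted counts are chosen so that they have the same total as the observed counts.

```python
import math


def g2_terms(observed, expected):
    """Per-cell contributions 2 * x * log(x / m); empty cells contribute 0."""
    return [2.0 * x * math.log(x / m) if x > 0 else 0.0
            for x, m in zip(observed, expected)]


# Hypothetical counts: two dense cells that are 10 percent off and
# two sparse cells that are 20 percent off. Totals agree (1,020).
observed = [550.0, 450.0, 12.0, 8.0]
expected = [500.0, 500.0, 10.0, 10.0]

terms = g2_terms(observed, expected)
dense_part = terms[0] + terms[1]
sparse_part = terms[2] + terms[3]
g2_total = sum(terms)

print("dense cells: %.3f  sparse cells: %.3f  G2: %.3f"
      % (dense_part, sparse_part, g2_total))
```

Although the sparse cells are proportionately further from their fitted values, the dense cells account for most of the statistic, because each log error is weighted by the observed frequency.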

Having obtained a satisfactory fit of the age-smoothed quasi-symmetric classification, I turn to the age-smoothed classification (Panel C of Table 11); I use H5 as the initial model. As one might expect, the fit deteriorates slightly in this round of estimation (Line C2 of Table 12), but with two minor respecifications the fit of model H7 becomes superior to that obtained with model H5 in the quasi-symmetric classification (Line C4). Model H7 cannot be rejected at the (nominal) 0.05 probability level, but it should be borne in mind that I have ransacked the data to obtain this fit.

The final model for career mobility (H7) resembles that for mobility to first occupations (Table 2) with respect to the concentration of high densities in the upper left and lower right corners of the classifications. In several respects, however, the two models differ. First, the design of H7 says that immobility is equally high in the farm and in the upper nonmanual stratum. Recall that in the transition to first occupations, immobility was greater in the farm stratum than in the upper nonmanual stratum. Second, immobility within the lower nonmanual stratum is in level 2 of the career mobility model; it is equal in density to both exchanges between the upper and lower nonmanual strata and to the flow from upper nonmanual to farm strata. The symmetry in mobility between upper and lower nonmanual strata contrasts with the greater mobility from lower to upper nonmanual strata in the intergenerational classification of mobility to first jobs. Third, the densities in cells adjacent to the upper left corner of the career mobility table are each encoded at a higher level of interaction (2) than those surrounding the lower right corner of the table, which are each encoded at level 3. Excepting the asymmetry just noted, in the intergenerational mobility table the densities in cells adjacent to the upper left and lower right corners of the table are homogeneous. Fourth, career immobility in the upper manual occupations is also assigned to level 3. Unlike the case of intergenerational mobility, career immobility in skilled work is relatively high.

As in Table 2, most cells outside the upper left and lower right corners of the career mobility table are encoded at one level (4) of model H7. Both the upward and downward exchanges between upper manual and lower nonmanual occupations are relatively rare (at level 5), as are moves from upper manual first jobs to lower manual current jobs or from first jobs in the farm stratum to lower nonmanual current jobs. By far the least likely move is that from first jobs in the farm stratum to current jobs in the upper nonmanual stratum; this would require no comment if it did not contrast so sharply with the relative excess of moves from upper nonmanual first jobs to current farm jobs.

There are only three asymmetries in the career mobility model. I have just mentioned the largest of these, the exchange between farm and upper nonmanual strata. In addition there are two asymmetries involving shifts in cells between levels 4 and 5. Mobility from lower to upper manual occupations (at level 4) is greater than that from upper to lower manual occupations (at level 5). This is the only respect in which the model suggests the greater prevalence of upward than downward mobility. Moreover, mobility is greater from lower nonmanual to farm occupations (at level 4) than from farm to lower nonmanual occupations (at level 5). These and other asymmetries in American mobility tables are interpreted in greater detail by Featherman and Hauser (1978, pp. 177-180).

AGGREGATION IN MODEL SPECIFICATION

If one lacks specific hypotheses about the structure of a cross-classification, it may be possible to borrow a model that fits a less detailed cross-classification. If the counts in a table are sparse or the categories are very narrowly defined, one may want to combine some categories, fit the aggregated cross-classification, and use the model of interaction in the aggregate classification as a starting point in specifying a model of the full table. My analysis of the smoothed career mobility table has followed this pattern, but another example may better illustrate the formal relationships of aggregation and disaggregation in this context.

Panel A of Table 15 gives the counts in an 8 x 8 version of the British mobility table of 1949 (Glass, 1954). Table 16 summarizes statistical tests of several models of this classification.14 Line A1 of Table 16 reports a test of the model of simple independence, which obviously is rejected. Line A2 reports a test of the model of symmetry, (PS)(R), which says that mji = mij for all i, j (Bishop and

14 Duncan's (1979) and Goodman's (1979b) models of this table ignore entries on the main diagonal, and they (plausibly) assume ordinality of the occupational classes. In at least the latter respect their models have greater substantive appeal than mine.

TABLE 15
Observed Counts and Fitted Counts Under Quasi Symmetry: British Mobility Table, 1949

                                   Son's Occupation
Father's Occupation       1       2       3       4       5       6

A. Observed Counts
1. I                     50      19      26       8       7      11
2. II                    16      40      34      18      11      20
3. III                   12      35      65      66      35      88
4. IV                    11      20      58     110      40     183
5. Va                     2       8      12      23      25      46
6. Vb                    12      28     102     162      90     554
7. VI                     0       6      19      40      21     158
8. VII                    0       3      14      32      15     126

B. Fitted Counts Under Quasi Symmetry
1. I                   50.0    21.1    22.1    10.8     6.4    13.2
2. II                  13.9    40.0    33.0    17.6    11.7    22.5
3. III                 15.9    36.0    65.0    60.2    30.0    93.1
4. IV                   8.2    20.4    63.8   110.0    41.0   174.0
5. Va                   2.6     7.3    17.0    22.0    25.0    48.1
6. Vb                   9.8    25.5    96.9   171.0    87.9   554.0
7. VI                   1.9     5.9    16.7    39.9    26.3   150.5
8. VII                  0.7     2.8    15.4    27.4    15.7   130.7

[Columns 7 and 8 are not recoverable.]

Source of observed counts is Miller (1960, p. 71).


TABLE 16
Summary of Fit of Selected Multiplicative Models: British Mobility Table, 1949 (N = 3,498)

Model                           G2      df    p         Δ       G2H/G2T

A. Observed Counts (Panel A of Table 15)
1. (P)(S) = (PR)(SR)            954.49  49    0.000     16.6    100.0%
2. (PS)(R)                       89.29  28    0.000      5.2      9.4%
3. (PR)(SR)(PS)                  22.94  21    >0.250     2.2      2.4%
4. (P)(S)(M1)                   155.35  44    0.000      6.5     16.3%
5. (P)(S)(M2)                    58.09  43    >0.050     4.2      6.1%

B. 5 x 5 Aggregate of Observed Counts
1. (P)(S)                       811.27  16    0.000     15.4    100.0%
2. (P)(S)(M1*)                   12.13  11    >0.250     1.1      1.5%

C. Fitted Counts Under Quasi Symmetry (Panel B of Table 15)
1. (P)(S)                       931.55  28    0.000     16.3    100.0%
2. (P)(S)(M1)                   132.41  23    0.000      6.0     14.2%
3. (P)(S)(M2)                    35.15  22    >0.025     3.7      3.8%

D. Selected Contrasts Between Models
1. A2 vs. A3 (a)                 66.35   7    0.000      -        -
2. B1 vs. B2                    799.14   5    0.000      -        -
3. A1 vs. A4                    799.14   5    0.000      -        -
4. C1 vs. C2                    799.14   5    0.000      -        -
5. A1 vs. B1                    143.22  33    0.000      -        -
6. A4 vs. B2                    143.22  33    0.000      -        -
7. A1 vs. A3                    931.55  28    0.000      -        -
8. A4 vs. C2                     22.94  21    >0.250     -        -
9. A5 vs. C3                     22.94  21    >0.250     -        -

NOTE: P = father's occupation; S = son's occupation; R = transposition of data; M1* = initial model of interaction for 5 x 5 table; M1 = initial model of interaction for 8 x 8 table; M2 = final model of interaction for 8 x 8 table.
(a) "A2" refers to line 2 of panel A; "B1" refers to line 1 of panel B; and so on.

others, 1975, pp. 282-284). If the counts were symmetric, one might aggregate the data simply by averaging counts (arithmetically) in corresponding cells above and below the diagonal. This hypothesis is rejected with G2 = 89.29 on 28 degrees of freedom, but an alternative scheme for smoothing the data across the main diagonal is more satisfactory. Line A3 reports a test of the model of quasi symmetry, (PR)(SR)(PS), which implies equality across the diagonal in the interaction effects but not in the frequencies. With G2 = 22.94 on 21 degrees of freedom, the hypothesis of quasi symmetry cannot be rejected. For this reason I use estimated counts under quasi symmetry to develop a (symmetric) model of interaction; these counts are given in Panel B of Table 15.15 Complete symmetry (Line A2 of Table 16) implies both quasi symmetry (Line A3) and marginal homogeneity. Hence the contrast between Lines A2 and A3 tests the hypothesis of marginal homogeneity. As reported in Line D1 of Table 16, marginal heterogeneity accounts for rejection of the model of symmetry in the present case.
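The relation among symmetry, quasi symmetry, and marginal homogeneity can be illustrated with a small sketch. It is not the program used for Table 16: the 3 x 3 counts are hypothetical, the function names are my own, and quasi symmetry is fitted here by iterative proportional fitting to the row totals, the column totals, and the sums of cells paired across the diagonal, assuming all counts are positive.

```python
import math


def g2(x, m):
    """Likelihood-ratio statistic 2 * sum x * log(x / m) over nonempty cells."""
    k = len(x)
    return 2.0 * sum(x[i][j] * math.log(x[i][j] / m[i][j])
                     for i in range(k) for j in range(k) if x[i][j] > 0)


def fit_symmetry(x):
    """Fitted counts under symmetry: average cells across the diagonal."""
    k = len(x)
    return [[(x[i][j] + x[j][i]) / 2.0 for j in range(k)] for i in range(k)]


def fit_quasi_symmetry(x, max_iter=5000, tol=1e-12):
    """Iterative proportional fitting to rows, columns, and pair sums.
    Assumes strictly positive counts (zeros would need special handling)."""
    k = len(x)
    m = [[1.0] * k for _ in range(k)]
    for _ in range(max_iter):
        worst = 0.0
        for i in range(k):                       # match row totals
            f = sum(x[i]) / sum(m[i])
            m[i] = [v * f for v in m[i]]
            worst = max(worst, abs(f - 1.0))
        for j in range(k):                       # match column totals
            f = sum(r[j] for r in x) / sum(r[j] for r in m)
            for i in range(k):
                m[i][j] *= f
            worst = max(worst, abs(f - 1.0))
        for i in range(k):                       # match x_ij + x_ji
            for j in range(i, k):
                f = (x[i][j] + x[j][i]) / (m[i][j] + m[j][i])
                m[i][j] *= f
                if j != i:
                    m[j][i] *= f
                worst = max(worst, abs(f - 1.0))
        if worst < tol:
            break
    return m


# Hypothetical 3 x 3 table (not the British data).
x = [[50.0, 20.0, 10.0],
     [30.0, 40.0, 20.0],
     [15.0, 25.0, 60.0]]
k = len(x)
m_qs = fit_quasi_symmetry(x)
g2_sym = g2(x, fit_symmetry(x))            # k(k-1)/2 degrees of freedom
g2_qs = g2(x, m_qs)                        # (k-1)(k-2)/2 degrees of freedom
g2_mh = g2_sym - g2_qs                     # k-1 degrees of freedom
print("symmetry %.3f  quasi symmetry %.3f  marginal homogeneity %.3f"
      % (g2_sym, g2_qs, g2_mh))
```

For an 8 x 8 table the same degrees-of-freedom formulas give 28, 21, and 7, which agree with Lines A2, A3, and D1 of Table 16.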

Hauser (1978) fitted a 5 x 5 version of the British mobility table using a specification of interaction levels, M1*, which is displayed in Table 17. Note that M1* is symmetric, so it implies quasi symmetry in the expected counts. In principle the 5 x 5 cross-classification may be obtained by collapsing categories II with III, Va with Vb, and VI with VII in the 8 x 8 cross-classification.16 Under

15 An implication of this finding is that a model of the 8 x 8 British table which implies quasi symmetry, such as the uniform association model of Duncan (1979), must yield a test statistic of at least G2 = 22.94 on at least 21 degrees of freedom. For this reason, if one regards the deviations from quasi symmetry as insignificant, one might fit models that imply quasi symmetry to the quasi-symmetric table of expected counts (with its reduced degrees of freedom) or, equivalently, refer the fit of such models to the test statistics for the observed counts less the G2 = 22.94 and 21 degrees of freedom under quasi symmetry.

16 In fact the 5 x 5 table analyzed by Hauser (1978) and Goodman (1969a, 1972c), among others, is not an aggregation of the 8 x 8 table given by Miller (1960, p. 71). The aggregate of the 8 x 8 table has one extra count in cell (4,4); for the present purpose I have analyzed the 5 x 5 aggregate of Miller's 8 x 8 table, rather than the 5 x 5 version analyzed elsewhere. This accounts for slight discrepancies between test statistics reported here and elsewhere. The discrepancy in the raw counts is by no means the most serious challenge to the validity of the 1949 British mobility table; see Payne and others (1977).

TABLE 17
Model of Interaction (M1*) for a 5 x 5 Aggregation of the 8 x 8 British Mobility Table

Rows (Father's Occupation): 1. I; 2. II and III; 3. IV; 4. Va and Vb; 5. VI and VII
Columns (Son's Occupation): 1 2 3 4 5

[Cell entries (levels of interaction) not recoverable.]


simple independence G2 = 811.27 with 16 degrees of freedom in the aggregated 5 x 5 table. Under the six-level model (P)(S)(M1*) the fit is very good; G2 = 12.13 with 11 degrees of freedom. (See Lines B1 and B2 of Table 16.) That is, as reported in Line D2 of Table 16, model M1* accounts for G2 = 799.14 with 5 degrees of freedom.

Panel A of Table 18 displays M1, a modified version of M1* with which I begin the analysis of the 8 x 8 cross-classification.

TABLE 18
Initial Model of Interaction (M1) and Two Sets of Residuals (log R*ij) from the 8 x 8 British Mobility Table

                                      Son's Occupation
Father's Occupation      1      2      3      4      5      6      7      8

A. Initial Model (M1)
1. I                     1      2      2      4      5      5      6      6
2. II                    2      3      3      4      5      5      6      6
3. III                   2      3      3      4      5      5      6      6
4. IV                    4      4      4      4      5      5      5      5
5. Va                    5      5      5      5      6      6      5      5
6. Vb                    5      5      5      5      6      6      5      5
7. VI                    6      6      6      5      5      5      4      4
8. VII                   6      6      6      5      5      5      4      4

B. Residuals (log R*ij) from Observed Counts (Panel A of Table 15)
1. I                   4.8    2.5    2.1    0.3    0.3   -0.8   -0.3   -1.1
2. II                  2.8    2.4    1.5    0.2   -0.2   -1.1   -1.0   -1.6
3. III                 1.6    1.4    1.3    0.6    0.2   -0.5   -0.7   -0.5
4. IV                  0.9    0.2    0.5    0.5   -0.4   -0.4   -0.4   -0.7
5. Va                 -0.0    0.0   -0.3   -0.3   -0.1   -1.0   -0.4   -0.9
6. Vb                 -0.4   -0.9   -0.3   -0.5   -0.9   -0.7   -0.5   -0.4
7. VI                    a   -0.9   -0.5   -0.4   -0.8   -0.4    0.6    0.2
8. VII                   a   -1.4   -0.6   -0.4   -1.0   -0.5    0.3    0.8

C. Residuals (log R*ij) from Quasi-Symmetric Counts (Panel B of Table 15)
1. I                   4.8    2.6    1.9    0.6    0.2   -0.6   -0.7   -1.6
2. II                  2.6    2.4    1.4    0.2   -0.1   -1.0   -0.9   -1.5
3. III                 1.9    1.4    1.3    0.6    0.0   -0.4   -0.6   -0.6
4. IV                  0.6    0.2    0.6    0.5   -0.3   -0.5   -0.4   -0.6
5. Va                  0.2   -0.1    0.0   -0.4   -0.1   -1.0   -0.6   -1.0
6. Vb                 -0.6   -1.0   -0.4   -0.5   -1.0   -0.7   -0.5   -0.4
7. VI                 -0.7   -0.9   -0.6   -0.4   -0.6   -0.5    0.6    0.3
8. VII                -1.5   -1.5   -0.5   -0.6   -1.0   -0.4    0.3    0.8

a Undefined because of zero counts.


Model M1 is obtained from M1* in the following way: Each cell in the disaggregated classification is assigned to its level in the aggregated classification. For example, immobility in class I is assigned to level 1 in both designs; moves from classes II or III to class I are assigned to level 2, as was their aggregate in the collapsed table. The fit of model M1 to the observed counts is reported in Line A4 of Table 16, and its fit to the estimated (quasi-symmetric) counts is reported in Line C2. In neither case is the fit satisfactory.
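The assignment rule just described is mechanical, and a short sketch may make it concrete. The 8 x 8 design below is copied from Panel A of Table 18; the grouping of classes and the function names `collapse` and `disaggregate` are my own, and the 5 x 5 design is recovered here by collapsing the 8 x 8 design rather than taken from Table 17.

```python
# Initial 8 x 8 design, as given in Panel A of Table 18.
M1 = [
    [1, 2, 2, 4, 5, 5, 6, 6],
    [2, 3, 3, 4, 5, 5, 6, 6],
    [2, 3, 3, 4, 5, 5, 6, 6],
    [4, 4, 4, 4, 5, 5, 5, 5],
    [5, 5, 5, 5, 6, 6, 5, 5],
    [5, 5, 5, 5, 6, 6, 5, 5],
    [6, 6, 6, 5, 5, 5, 4, 4],
    [6, 6, 6, 5, 5, 5, 4, 4],
]

# Zero-based class indices: I | II and III | IV | Va and Vb | VI and VII.
GROUPS = [[0], [1, 2], [3], [4, 5], [6, 7]]


def collapse(design, groups):
    """Aggregate a design; fail if any block of cells mixes levels."""
    out = []
    for rows in groups:
        out_row = []
        for cols in groups:
            levels = {design[i][j] for i in rows for j in cols}
            if len(levels) != 1:
                raise ValueError("block is not homogeneous")
            out_row.append(levels.pop())
        out.append(out_row)
    return out


def disaggregate(design, groups):
    """Give every detailed cell the level of its aggregate cell."""
    size = sum(len(g) for g in groups)
    out = [[None] * size for _ in range(size)]
    for a, rows in enumerate(groups):
        for b, cols in enumerate(groups):
            for i in rows:
                for j in cols:
                    out[i][j] = design[a][b]
    return out


aggregated = collapse(M1, GROUPS)
for row in aggregated:
    print(row)
```

Because every block of the 8 x 8 design is homogeneous, disaggregating the collapsed design returns the 8 x 8 design unchanged.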

Note that the components of association (G2) explained by model M1 in the observed counts and in the fitted (quasi-symmetric) counts are the same (G2 = 799.14 with 5 degrees of freedom), and these are the same as the component of association explained by design M1* in the collapsed table. (See Lines D2 to D4 of Table 16.) Moreover, the estimated interaction effects (not reported) are equivalent under model M1* in the collapsed table or model M1 in the observed or fitted (quasi-symmetric) 8 x 8 tables.17

Furthermore, the contrast between Lines A1 and B1 in Table 16 (see Line D5) shows that one loses G2 = 143.22 with 33 degrees of freedom by aggregating the 8 x 8 table to form the 5 x 5 table (see Goodman, 1968); this reflects association within categories collapsed in the aggregate table. This same component of association accounts for the difference between the fit of model M1 to the observed 8 x 8 table (Line A4) and the fit of model M1* to the collapsed (5 x 5) table (Line B2; the contrast is reported in Line D6). That is, there is an explicit three-component decomposition of association (G2 = 954.49 with 49 degrees of freedom) in the observed 8 x 8 table: between (5 x 5) cell association explained by model M1 (G2 = 799.14 with 5 degrees of freedom); unexplained between-cell association (G2 = 12.13 with 11 degrees of freedom); and unexplained within-cell association (G2 = 143.22 with 33 degrees of freedom).

Last, the contrast between the models of Lines A1 and A3 shows that the model of quasi-symmetric association explains G2 = 931.55 with 28 degrees of freedom (Line D7). This is the same

17 An overall shift in the estimated parameters (but not in differences among them) is implied by my normalization as the numbers of cells at each level are changed by aggregation. The estimated parameters of design M1 are identical in the observed and fitted (quasi-symmetric) 8 x 8 tables.


as the association (G2) under simple independence in the 8 x 8 table of fitted (quasi-symmetric) counts. Note that the difference between the fit of model M1 to the observed and to the fitted (quasi-symmetric) counts is G2 = 22.94 with 21 degrees of freedom (Line D8), which is the same result obtained earlier (Line A3) in testing the hypothesis of quasi symmetry.18
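The contrasts in Panel D of Table 16 follow from simple subtraction of test statistics and degrees of freedom, and they can be checked directly. In the sketch below the values are those of Panels A to C of Table 16; the dictionary `FITS` and the helper `contrast` are illustrative names only.

```python
# (G2, degrees of freedom) for each line of Panels A, B, and C of Table 16.
FITS = {
    "A1": (954.49, 49), "A2": (89.29, 28), "A3": (22.94, 21),
    "A4": (155.35, 44), "A5": (58.09, 43),
    "B1": (811.27, 16), "B2": (12.13, 11),
    "C1": (931.55, 28), "C2": (132.41, 23), "C3": (35.15, 22),
}


def contrast(first, second):
    """Difference in G2 and in degrees of freedom between two lines."""
    g_first, df_first = FITS[first]
    g_second, df_second = FITS[second]
    return round(g_first - g_second, 2), df_first - df_second


# Lines D1 to D9 of Table 16, in order.
PAIRS = [("A2", "A3"), ("B1", "B2"), ("A1", "A4"), ("C1", "C2"),
         ("A1", "B1"), ("A4", "B2"), ("A1", "A3"), ("A4", "C2"),
         ("A5", "C3")]
contrasts = [contrast(a, b) for a, b in PAIRS]
for number, ((a, b), (g, df)) in enumerate(zip(PAIRS, contrasts), start=1):
    print("D%d. %s vs. %s: G2 = %.2f, df = %d" % (number, a, b, g, df))

# Three-component decomposition of association in the observed 8 x 8 table:
# explained between-cell + unexplained between-cell + within-cell.
explained, df_explained = contrast("B1", "B2")
unexplained_between, df_between = FITS["B2"]
within, df_within = contrast("A1", "B1")
total = explained + unexplained_between + within
total_df = df_explained + df_between + df_within
```

The three components sum to the statistic for simple independence in Line A1, in both G2 and degrees of freedom.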

In summary, I have shown that aggregation is not merely a heuristic device in explorations of cross-classified counts. Strong formal properties support this use of aggregation (Goodman, 1968; Beland and Fortier, 1978), and these may help in understanding and interpreting the pattern of interaction in the classification. At minimum one can measure the information lost in aggregating the data.

Panels B and C of Table 18 report residuals (log R*ij) from model M1 in the observed counts and in the fitted (quasi-symmetric) counts, respectively. Because the observed counts are almost quasi-symmetric, I specify a symmetric model, and I work with residuals from fitted (quasi-symmetric) counts in Panel C, rather than those from observed counts. After three rounds of revision I specify model M2, which is shown in Panel A of Table 19. In Panels B and C, respectively, I show residuals from both the raw and the fitted (quasi-symmetric) counts. The final model, M2, has only one more density level than the initial model M1. Only two pairs of cells, (2,6) and (6,2) and (5,8) and (8,5), shift by more than one level, and most changes seem to occur among relatively sparse cells.
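The comparison of the initial and final designs can be checked cell by cell. The sketch below copies M1 from Panel A of Table 18 and M2 from Panel A of Table 19; the function `level_shifts` is a hypothetical helper, and it compares the level labels directly, which assumes that the numbering of levels is comparable in the two designs.

```python
M1 = [
    [1, 2, 2, 4, 5, 5, 6, 6],
    [2, 3, 3, 4, 5, 5, 6, 6],
    [2, 3, 3, 4, 5, 5, 6, 6],
    [4, 4, 4, 4, 5, 5, 5, 5],
    [5, 5, 5, 5, 6, 6, 5, 5],
    [5, 5, 5, 5, 6, 6, 5, 5],
    [6, 6, 6, 5, 5, 5, 4, 4],
    [6, 6, 6, 5, 5, 5, 4, 4],
]
M2 = [
    [1, 2, 2, 4, 4, 6, 6, 7],
    [2, 2, 3, 5, 5, 7, 7, 7],
    [2, 3, 3, 4, 5, 6, 6, 6],
    [4, 5, 4, 4, 6, 6, 6, 6],
    [4, 5, 5, 6, 5, 7, 6, 7],
    [6, 7, 6, 6, 7, 6, 6, 6],
    [6, 7, 6, 6, 6, 6, 4, 5],
    [7, 7, 6, 6, 7, 6, 5, 4],
]


def level_shifts(before, after, min_shift=2):
    """Cells (1-based row, column, change) whose level label moves by at
    least min_shift between two designs."""
    return sorted(
        (i + 1, j + 1, after[i][j] - before[i][j])
        for i in range(len(before))
        for j in range(len(before[i]))
        if abs(after[i][j] - before[i][j]) >= min_shift
    )


def is_symmetric(design):
    n = len(design)
    return all(design[i][j] == design[j][i] for i in range(n) for j in range(n))


shifted = level_shifts(M1, M2)
levels_m1 = len({v for row in M1 for v in row})
levels_m2 = len({v for row in M2 for v in row})
print(shifted, levels_m1, levels_m2)
```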

Model M2 fits the data fairly well; G2 = 58.09 with 43 degrees of freedom in the raw counts and G2 = 35.15 with 22 degrees of freedom in the fitted (quasi-symmetric) counts. The former test statistic is not significant at the 0.05 probability level; at the same time, the test statistic for the fitted (quasi-symmetric) counts is statistically significant beyond the 0.05 probability level.

18 A similar observation may be made about the fit of design M2 to the observed and fitted (quasi-symmetric) counts. See also footnote 15.

TABLE 19
Final Model of Interaction (M2) and Two Sets of Residuals (log R*ij) from the 8 x 8 British Mobility Table

                                      Son's Occupation
Father's Occupation      1      2      3      4      5      6      7      8

A. Final Model (M2)
1.                       1      2      2      4      4      6      6      7
2.                       2      2      3      5      5      7      7      7
3.                       2      3      3      4      5      6      6      6
4.                       4      5      4      4      6      6      6      6
5.                       4      5      5      6      5      7      6      7
6.                       6      7      6      6      7      6      6      6
7.                       6      7      6      6      6      6      4      5
8.                       7      7      6      6      7      6      5      4

B. Residuals (log R*ij) from Observed Counts (Panel A of Table 15)
1.                     5.0    2.5    2.2    0.3    0.4   -0.7   -0.3   -1.1
2.                     2.8    2.2    1.4    0.1   -0.2   -1.1   -1.1   -1.7
3.                     1.7    1.3    1.2    0.6    0.2   -0.4   -0.8   -0.6
4.                     0.9    0.0    0.4    0.4   -0.4   -0.4   -0.5   -0.9
5.                     0.1   -0.0   -0.3   -0.3   -0.0   -0.9   -0.5   -1.0
6.                    -0.3   -1.0   -0.3   -0.5   -0.9   -0.6   -0.5   -0.5
7.                       a   -1.1   -0.6   -0.5   -0.9   -0.4    0.4    0.1
8.                       a   -1.6   -0.8   -0.6   -1.1   -0.5    0.1    0.6

C. Residuals (log R*ij) from Quasi-Symmetric Counts (Panel B of Table 15)
1.                     5.0    2.6    2.0    0.6    0.3   -0.5   -0.7   -1.5
2.                     2.6    2.2    1.3    0.1   -0.1   -1.0   -1.0   -1.7
3.                     2.0    1.3    1.2    0.5    0.0   -0.4   -0.7   -0.6
4.                     0.6    0.0    0.5    0.4   -0.4   -0.4   -0.5   -0.7
5.                     0.3   -0.2    0.0   -0.4   -0.0   -0.9   -0.7   -1.1
6.                    -0.5   -1.0   -0.4   -0.5   -0.9   -0.6   -0.5   -0.5
7.                    -0.7   -1.1   -0.7   -0.5   -0.7   -0.4    0.4    0.1
8.                    -1.6   -1.7   -0.7   -0.8   -1.1   -0.5    0.1    0.6

a Undefined because of zero counts.

COMPARISONS BETWEEN CLASSIFICATIONS

Earlier, I alluded to one use of a structural model in comparing cross-classifications. By applying a model that fits one classification to a second classification, one obtains an explicit test of the assignment of cells to levels of interaction. For example, recall that the design of Table 2 fits the aggregate table of mobility from father's occupation to son's first occupation in the 1973 OCG survey with G2 = 66.5 on 12 degrees of freedom. To test the assignment of cells to levels in this model, I apply it to mobility from father's occupation to son's first occupation by age in the 1962 OCG survey.


Where P = father's occupational stratum, W = occupational stratum of son's first job, A = age in 5-year groups, and H = the model of Table 2, I fit the model (PA)(WA)(HA) to the 1962 data and obtain a test statistic of G2 = 121.2 with 108 degrees of freedom, which is not statistically significant. That is, conditional on variation in occupational origins and destinations between cohorts, the same set of equality restrictions fits interactions between father's occupation and son's first occupation in the 1962 OCG survey as in the 1973 OCG survey. Of course, this does not indicate that mobility chances are numerically identical (or even remotely similar) in the two OCG surveys.19

Under a given model one may also test the equality of interaction parameters among two or more cross-classifications. To illustrate this, I compare mobility from father's occupation (P) to son's first occupation (W) across nine 5-year age cohorts (A) covered in the 1973 OCG survey. I begin by fitting the model (PA)(WA)(H), in which occupational origins and destinations vary across cohorts but relative mobility chances do not. There are 9 mobility subtables, each with 16 degrees of freedom, conditional on the observed marginal distributions; since the 5-level model of Table 2 uses just 4 degrees of freedom, there are 140 degrees of freedom for error. Under this specification the test statistic, G2 = 235.3, is significant. I also fit the model (PA)(WA)(HA), which fits origin and destination effects as in the initial model but permits the parameters of the model to vary across cohorts. Under this model the test statistic is G2 = 175.6 with 108 degrees of freedom; relative to the initial model, there are 4 more parameters for each of 8 subtables. Again the test statistic is significant, so there are nonchance departures from the model within one or more cohorts. More important, since the model (PA)(WA)(HA) is obtained from (PA)(WA)(H) by relaxing restrictions on interactions in the latter, these restrictions may be tested by taking the difference between the two likelihood-ratio statistics. This is G2 = 59.7 with 32 degrees of freedom, so there are statistically significant intercohort variations in parameters of the model of Table 2.
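The degrees of freedom and significance levels in this comparison can be reproduced with a short calculation. The sketch below is an illustration only: the upper-tail chi-square probability is computed with a standard series and continued-fraction evaluation of the incomplete gamma function, written out here so that no statistical library is assumed, and the function and variable names are my own.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (regularized
    incomplete gamma function Q(df/2, x/2))."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    scale = math.exp(-h + a * math.log(h) - math.lgamma(a))
    if h < a + 1.0:
        # Series expansion for the lower tail P(a, h).
        term = 1.0 / a
        total = term
        n = a
        for _ in range(2000):
            n += 1.0
            term *= h / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * scale
    # Continued fraction (modified Lentz method) for the upper tail.
    tiny = 1e-300
    b = h + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    f = d
    for i in range(1, 2000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        f *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return f * scale


# Degrees of freedom as counted in the text.
subtables, df_per_subtable, design_df = 9, 16, 4
df_common = subtables * df_per_subtable - design_df               # (PA)(WA)(H)
df_varying = subtables * df_per_subtable - subtables * design_df  # (PA)(WA)(HA)

g2_common, g2_varying = 235.3, 175.6
g2_diff = round(g2_common - g2_varying, 1)
df_diff = df_common - df_varying

p_common = chi2_sf(g2_common, df_common)
p_varying = chi2_sf(g2_varying, df_varying)
p_diff = chi2_sf(g2_diff, df_diff)
p_1962 = chi2_sf(121.2, 108)  # (PA)(WA)(HA) fitted to the 1962 data

print(df_common, df_varying, g2_diff, df_diff)
print(p_common, p_varying, p_diff, p_1962)
```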

19 There are methodological differences in the measurement of first occupations in these two surveys, and for this reason it would not be surprising if the two models (or their parameters) differed substantially. (See Featherman and Hauser, 1978, pp. 200-208.)


Using related procedures one can test hypotheses about differences in each interaction parameter of a cross-classification, and one can evaluate the fit within each level of interaction specified by the model. Appropriate test statistics may be constructed by contrasting hierarchical models and exploiting the additive properties of the likelihood-ratio test statistic (see Bishop and others, 1975, pp. 126-127). In general, the comparisons will be more powerful than tests based on less parsimonious models of the classification.

CONCLUSION

It may be well to end on a cautionary, if not agnostic, note. The models described here are intended to fit cross-classifications using relatively few parameters under the condition that interactions be uniform within levels of the design. My exploratory methods are intended to yield models with these features. Fit and parsimony are always desirable; the distinctive feature of the present models is the specification of uniform interaction effects within levels of the design. It is intended to meet the traditional demand of mobility analysts (among others) for a model of the mobility table in which row and column effects are not confounded with interactions.

Any number of models may imply the same set of odds ratios, and in this sense they will be equivalent. One such model may be transformed into another by multiplicative rescaling (Goodman, 1979a). Thus any model that fits a given classification is going to be equivalent in this sense (or nearly equivalent) to any other model that fits the same classification-for close fit means they will imply a particular set of odds ratios. In this sense a saturated model of the 5 x 5 British mobility table will be equivalent to the rather different specification of Hauser (1978), within limits of sampling variability. Likewise, the present model of the 8 x 8 British mobility classification will be nearly equivalent in this sense to the models of the same classification proposed by Duncan (1979) and Goodman (1979b).
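A small numerical check may make the notion of equivalence concrete. The sketch below, with hypothetical counts and scale factors and function names of my own choosing, shows that multiplying the rows and columns of a table by arbitrary positive constants leaves the odds ratios in adjacent 2 x 2 subtables unchanged; these adjacent odds ratios determine all the others.

```python
from fractions import Fraction


def local_odds_ratios(table):
    """Odds ratios for all adjacent 2 x 2 subtables, as exact fractions."""
    rows, cols = len(table), len(table[0])
    return [[Fraction(table[i][j] * table[i + 1][j + 1],
                      table[i][j + 1] * table[i + 1][j])
             for j in range(cols - 1)]
            for i in range(rows - 1)]


def rescale(table, row_factors, col_factors):
    """Multiply each row and each column by a positive constant."""
    return [[a * b * v for v, b in zip(row, col_factors)]
            for row, a in zip(table, row_factors)]


# Hypothetical counts and scale factors.
table = [[50, 20, 10],
         [30, 40, 20],
         [15, 25, 60]]
rescaled = rescale(table, row_factors=[2, 3, 5], col_factors=[7, 1, 4])

original_ratios = local_odds_ratios(table)
rescaled_ratios = local_odds_ratios(rescaled)
print(original_ratios == rescaled_ratios)
```

Two sets of fitted counts that differ only by such a rescaling therefore carry the same information about interaction, however different their designs may look.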

It may be difficult to see the equivalence (of odds ratios) between a pair of models by inspecting a design like that in Table 2 or even by inspecting the array of interaction parameters. Thus it may be necessary to use multiplicative rescaling to compare models just as it is to compare classifications of counts.

At the same time, the way in which one chooses to model a given set of odds ratios is not arbitrary. The choice between equivalent models should be dictated by substantive questions, and it should be susceptible to explicit formulation. The models described here are motivated by the criterion of uniform interaction within levels of the design, for example, while those of Duncan (1979) and Goodman (1979b) are intended to exploit the assumption of order in the row and column variables.

Similarly, the exploratory methods I have described are linked to the specification I have been using. In the context of alternative specifications, the present methods may be useful for preliminary smoothing of tables of counts, but other search methods may be more useful in interpreting the data.

REFERENCES

BARON, J. N.

1977 "The structure of intergenerational occupational mobility: Another look at the Indianapolis mobility data." Unpublished master's thesis, University of Wisconsin, Madison.

1978 "The structure of intergenerational occupational mobility: Another look at the Indianapolis and OCG-II data." Paper presented at the 73rd annual meeting of the American Sociological Association, San Francisco.

BÉLAND, F., AND FORTIER, J.-J.

1978 "Collapsible contingency tables." Unpublished manuscript.

BISHOP, Y. M. M., AND FIENBERG, S. E.

1969 "Incomplete two-dimensional contingency tables." Biometrics 25:119-128.

BISHOP, Y. M. M., FIENBERG, S. E., AND HOLLAND, P. W.

1975 Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass.: M.I.T. Press.

BLAU, P. M., AND DUNCAN, O. D.

1967 The American Occupational Structure. New York: Wiley.

DAVIS, J. A.

1974 "Hierarchical models for significance tests in multivariate contingency tables: An exegesis of Goodman's recent papers." In H. L. Costner (Ed.), Sociological Methodology 1973-1974. San Francisco: Jossey-Bass.

DUNCAN, O. D.

1979 "How destination depends on origin in the occupational mobility table." American Journal of Sociology 84(January):793-803.

EVERS, M., AND NAMBOODIRI, N. K.

1978 "On the design matrix strategy in the analysis of categorical data." In K. F. Schuessler (Ed.), Sociological Methodology 1979. San Francisco: Jossey-Bass.

FAY, R., AND GOODMAN, L. A.

1973 "ECTA program description for users." Unpublished document.

FEATHERMAN, D. L., AND HAUSER, R. M.

1975 "Design for a replicate study of social mobility in the United States." In K. C. Land and S. Spilerman (Eds.), Social Indicator Models. New York: Russell Sage Foundation.

1978 Opportunity and Change. New York: Academic Press.

FEATHERMAN, D. L., JONES, F. L., AND HAUSER, R. M.

1975 "The assumptions of social mobility research in the United States: The case of occupational status." Social Science Research 4:329-360.

FIENBERG, S. E.

1970a "The analysis of multidimensional contingency tables." Ecology 51:419-433.

1970b "Quasi-independence and maximum likelihood estimation in incomplete contingency tables." Journal of the American Statistical Association 65:1610-1616.

1972 "The analysis of incomplete multiway contingency tables." Biometrics 23(March):177-202.

1976 "Book review of Measuring Occupational Inheritance by Thomas W. Pullum." Journal of the American Statistical Association 71(December):1005-1006.

1977 The Analysis of Cross-Classified Categorical Data. Cambridge, Mass.: M.I.T. Press.

GLASS, D. V.

1954 Social Mobility in Britain. London: Routledge and Kegan Paul.

GOLDTHORPE, J. H., AND PAYNE, C.

1979 "Class structure and the pattern of intergenerational fluidity." In J. H. Goldthorpe (Ed.), Social Mobility and Class Structure in Modern Britain. Oxford, England: Clarendon Press.

GOLDTHORPE, J. H., PAYNE, C., AND LLEWELLYN, C.

1978 "Trends in class mobility." Sociology 12(September):441-468.


GOODMAN, L. A.

1963 "Statistical methods for the preliminary analysis of transaction flows." Econometrica 31(January):197-208.

1965 "On the statistical analysis of mobility tables." American Journal of Sociology 70(March):564-585.

1968 "The analysis of cross-classified data: Independence, quasi-independence, and interaction in contingency tables with or without missing entries." Journal of the American Statistical Association 63(December):1091-1131.

1969a "How to ransack social mobility tables and other kinds of cross-classification tables." American Journal of Sociology 75(July):1-39.

1969b "On the measurement of social mobility: An index of status persistence." American Sociological Review 34(December):831-850.

1971 "A simple simultaneous test procedure for quasi-independence in contingency tables." Applied Statistics 20:165-177.

1972a "A general model for the analysis of surveys." American Journal of Sociology 77(May):1035-1086.

1972b "A modified multiple regression approach to the analysis of dichotomous variables." American Sociological Review 37(February):28-46.

1972c "Some multiplicative models for the analysis of cross-classified data." In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press.

1974 "The analysis of systems of qualitative variables when some of the variables are unobservable. Part I-A modified latent structure approach." American Journal of Sociology 79(March):1179-1259.

1979a "Multiplicative models for the analysis of occupational mobility tables and other kinds of cross-classification tables." American Journal of Sociology 84(January):804-819.

1979b "Simple models for the analysis of association in cross-classifications having ordered row categories and ordered column categories." Journal of the American Statistical Association (in press).

HABERMAN, S. J.

1974 The Analysis of Frequency Data. Chicago: University of Chicago Press.

HAUSER, R. M.

1978 "A structural model of the mobility table." Social Forces 56(March):919-953.


HAUSER, R. M., AND FEATHERMAN, D. L.

1977 The Process of Stratification: Trends and Analyses. New York: Academic Press.

HAUSER, R. M., DICKINSON, P. J., TRAVIS, H. P., AND KOFFEL, J. M.

1975 "Temporal change in occupational mobility: Evidence for men in the United States." American Sociological Review 40(June):279-297.

HOPE, K.

1974 "Trends in the openness of British society in the present century." Paper prepared for the Toronto Conference on Measurement and Models in Social Stratification, 14-16 August.

1978 "Vertical mobility in Britain: A structured analysis." Unpublished manuscript.

IUTAKA, S., BLOOMER, B. F., BURKE, R. E., AND WOLOWYNA, O.

1975 "Testing the quasi-perfect mobility model for intergenerational data: International comparisons." Economic and Social Review 6:215-236.

LARNTZ, K.

1978 "Small-sample comparisons of exact levels for chi-squared goodness-of-fit statistics." Journal of the American Statistical Association 73(June):253-263.

MANTEL, N.

1970 "Incomplete contingency tables." Biometrics 26:291-304.

MASON, K. O., MASON, W. M., WINSBOROUGH, H. H., AND POOLE, K. W.

1973 "Some methodological issues in cohort analysis of archival data." American Sociological Review 38(April):242-258.

MILLER, S. M.

1960 "Comparative social mobility." Current Sociology 9:1-89.

MOSTELLER, F.

1968 "Association and estimation in contingency tables." Journal of the American Statistical Association 64(March):1-28.

PAYNE, C.

1977 "The log-linear model for contingency tables." In C. A. O'Muircheartaigh and C. Payne (Eds.), The Analysis of Survey Data. Vol. 2: Model Fitting. London: Wiley.

PAYNE, G., FORD, G., AND ROBERTSON, C.

1977 "A reappraisal of social mobility in Great Britain." Sociology 11(May):289-310.

PULLUM, T.

1975 Measuring Occupational Inheritance. New York: Elsevier.


RAMSØY, N.

1977 Social Mobilitet i Norge [Social Mobility in Norway]. Oslo: Tiden Forlag.

ROGOFF, N.

1953 Recent Trends in Occupational Mobility. Glencoe, Ill.: Free Press.

SHAVIT, Y.

1978 "Ethnic intermarriage in the United States." Center for Demography and Ecology, Working Paper 78-11. Madison: University of Wisconsin.

TUKEY, J.

1977 Exploratory Data Analysis. Reading, Mass.: Addison-Wesley.

TYREE, A.

1973 "Mobility ratios and association in mobility tables." Population Studies 27(July):577-588.

WHITE, H. C.

1963 "Cause and effect in social mobility tables." Behavioral Science 8:14-27.