JOURNAL OF APPLIED MEASUREMENT, 3(2), 146-189

Copyright © 2002

A Comparison of Three Developmental Stage Scoring Systems

Theo Linda Dawson
University of California at Berkeley

In social psychological research the stage metaphor has fallen into disfavor due to concerns about bias, reliability, and validity. To address some of these issues, I employ a multidimensional partial credit analysis comparing moral judgment interviews scored with the Standard Issue Scoring System (SISS) (Colby and Kohlberg, 1987b), evaluative reasoning interviews scored with the Good Life Scoring System (GLSS) (Armon, 1984b), and Good Education interviews scored with the Hierarchical Complexity Scoring System (HCSS) (Commons, Danaher, Miller, and Dawson, 2000). A total of 209 participants between the ages of 5 and 86 were interviewed. The multidimensional model reveals that even though the scoring systems rely upon different criteria and the data were collected using different methods and scored by different teams of raters, the SISS, GLSS, and HCSS all appear to measure the same latent variable. The HCSS exhibits more internal consistency than the SISS and GLSS, and solves some methodological problems introduced by the content dependency of the SISS and GLSS. These results and their implications are elaborated.

Requests for reprints should be sent to Theo Linda Dawson, Graduate School of Education, Tolman Hall, UC Berkeley, Berkeley, CA 94720-1670, e-mail: [email protected].




As part of a larger lifespan investigation into the development of social and moral cognition, I am exploring the possibility of employing general criteria for establishing the developmental level of verbal and textual performances. As a step in this process, my colleagues and I have conducted a series of validation studies, comparing multiple scoring systems and data-gathering techniques to determine whether different means of assessing developmental progress provide similar results. This paper describes the results of one of these studies, a comparison of performances on three different instruments designed to assess the development of reasoning in three different domains, and scored with three different developmental scoring systems by two different teams of raters. The instruments are Kohlberg’s (1987b) Moral Judgment Interview, Armon’s (1984b) Good Life Interview, and Armon’s Good Education Interview. The scoring systems associated with these instruments are the Standard Issue Scoring System (SISS) (Colby and Kohlberg, 1987b), the Good Life Scoring System (GLSS) (Armon, 1984b), and the Hierarchical Complexity Scoring System (HCSS) (Commons, et al., 2000; Dawson, Commons, Wilson, and Xie, 1999).

The Piagetian model of cognitive development as a series of hierarchical integrations of thought processes (referred to here as stages or orders of hierarchical complexity) provides the theoretical underpinnings for the project. Novel understandings emerge as the actions of an earlier stage become the content of the actions of the subsequent stage. Kohlberg’s (1984) moral judgment stages, Armon’s (1984a) stages of evaluative reasoning about the good, and Commons’ (Commons, Trudeau, Stein, Richards, and Krause, 1998) orders of hierarchical complexity (OHCs) are all grounded in this notion of hierarchical integration.

Kohlberg’s moral stages describe 3 periods of development in the moral domain: preconventional, conventional, and postconventional. Each of these three periods is subdivided into two stages so that the model comprises six stages of moral development. Armon defines 5 stages of evaluative reasoning about the good, covering the period from early childhood to adulthood. In both of these systems, stage definitions are tightly tied to their content domain, and scoring with either Kohlberg’s Standard Issue Scoring System (SISS) or Armon’s Good Life Scoring System (GLSS) is conducted by matching concepts in a given performance with similarly structured concepts in a scoring manual.

Commons’ Model of Hierarchical Complexity, a content-independent model of development, specifies 15 orders of hierarchical complexity (OHCs). The sequence is: (0) computory, (1) sensory and motor, (2) circular sensory-motor, (3) sensory-motor, (4) nominal, (5) sentential, (6) preoperational, (7) primary, (8) concrete, (9) abstract, (10) formal, (11) systematic, (12) metasystematic, (13) paradigmatic, and (14) cross-paradigmatic. OHCs 0 through 12 are closely related to the tiers and levels of Fischer’s (1980) skill theory.

The Model of Hierarchical Complexity differs from Kohlberg’s and Armon’s models in that it is a general stage model. In this model, an action is considered to be at a given stage when it successfully completes a task of a specified order of hierarchical complexity. Hierarchical complexity refers to the number of non-repeating recursions that coordinating actions must perform on a set of primary elements. Actions at a higher order of hierarchical complexity (a) are defined in terms of the actions at the next lower order; (b) organize and transform the lower order actions; and (c) produce organizations of lower order actions that are new and not arbitrary and cannot be accomplished by the lower order actions alone. For example, the notion of learning through play is constructed at the abstract OHC from notions of play and learning constructed at the concrete OHC. The notion of learning through play (1) cannot be constructed until the individual notions of learning and play have been constructed, and (2) is an entirely new concept with a meaning that is not present in its learning and play elements.
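The learning-through-play example can be sketched in a few lines of Python. This is only an illustrative toy, not part of the Model of Hierarchical Complexity or the HCSS: the `Concept` class, the `coordinate` helper, and the rule that a coordination of at least two constituents lands exactly one order above its highest-order constituent are assumptions made for the illustration. Only the order names come from the sequence listed above.

```python
from dataclasses import dataclass

# Order names as listed in the text; the list index is the OHC number.
OHC_NAMES = [
    "computory", "sensory and motor", "circular sensory-motor",
    "sensory-motor", "nominal", "sentential", "preoperational",
    "primary", "concrete", "abstract", "formal", "systematic",
    "metasystematic", "paradigmatic", "cross-paradigmatic",
]


@dataclass(frozen=True)
class Concept:
    """A hypothetical record pairing a concept with its OHC number."""
    name: str
    order: int
    parts: tuple = ()


def coordinate(name, *parts):
    """Build a concept one order above its highest-order constituent.

    Assumption for this sketch: a new order requires coordinating at
    least two lower-order concepts.
    """
    if len(parts) < 2:
        raise ValueError("coordination needs at least two constituents")
    order = max(p.order for p in parts) + 1
    if order >= len(OHC_NAMES):
        raise ValueError("no order defined above cross-paradigmatic")
    return Concept(name, order, parts)


concrete = OHC_NAMES.index("concrete")
learning = Concept("learning", concrete)
play = Concept("play", concrete)
learning_through_play = coordinate("learning through play", learning, play)
print(OHC_NAMES[learning_through_play.order])  # abstract
```

The sketch captures only the ordering constraint, that the coordinated concept cannot exist before its constituents. It says nothing about how a rater recognizes such a coordination in a text.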

While cognitive developmental stage theories are grounded in the notion of hierarchical integration and embody principles that are thought to hold across knowledge domains, few efforts have been made to establish a generalized method of assessment that can be applied in multiple knowledge domains. In fact, most stage assessment systems assign a stage score based on the conceptual content of performances, with little direct attention to their hierarchical complexity. Several such stage-scoring systems have been developed (Armon, 1984b; Damon and Hart, 1982; Fowler, 1991; Kitchener and King, 1990). Kohlberg’s Standard Issue Scoring System (SISS) (Colby and Kohlberg, 1987b) is one of the best known of these scoring systems. To assess moral development with the SISS, the researcher administers a set of moral judgment interviews, transcribes these, identifies each moral argument that addresses one of 6 moral issues (life, law, conscience, punishment, authority, and contract), then employs a scoring manual to match each identified argument with a similarly structured argument in the manual. The Good Life Scoring System functions similarly.


In fact, though the notion that the development of knowledge is a process of hierarchical integration underlies Kohlberg’s stages, they are not explicitly defined in terms of the hierarchical complexity of their conceptions. Instead, each stage is defined in terms of particular conceptual content that is associated with a particular moral perspective. To construct these moral stages, Kohlberg and his team employed a “bootstrapping” process. They started with a theoretical sequence strongly influenced by philosophical categories and Piaget’s ideas that moral thinking moves (a) from heteronomy to autonomy and (b) from concrete to abstract, and refined their stage definitions on the basis of successive rounds of longitudinal data from a group of Chicago schoolboys. The result is a reasonably reliable but limited measure of moral judgment development. It is reliable in the sense that it has reasonable internal consistency (Dawson, 2002) and high inter-rater agreement rates (Armon and Dawson, 1997; Colby and Kohlberg, 1987a), and in that its hierarchy replicates longitudinally (Armon and Dawson, 1997; Colby and Kohlberg, 1987a; Walker, 1989). It is limited in the sense that the stage definitions are tied to a particular sample of schoolboys, who were administered a particular type of moral judgment interview.

The problems arising from this limitation have resulted in numerous critiques. From the perspective of this project, among the most notable are charges that Kohlberg’s moral stages were based on American male thinking and exclude moral conceptions more prevalent among women (Gilligan, 1982) and among persons embedded in other cultures (Boyes and Walker, 1988; Haste and Baddeley, 1991; Puka, 1994).

A second set of concerns about Kohlberg’s system is associated with stages 1 and 2. Several researchers report problems with the definition of these stages and their associated scoring criteria (Damon, 1977; Keller, Eckensberger, and von Rosen, 1989; Kuhn, 1976). Dawson and Kay (in review) suggest that these problems arise because Kohlberg based his lower stage scoring criteria on an analysis of performances that were too developmentally advanced to provide accurate information about lower-stage behavior. Kohlberg’s youngest respondents were 10 years old, the modal age for the emergence of abstractions (Fischer and Bidell, 1998). Stages 1 and 2 are intended to represent pre-abstract or concrete reasoning.

In addition to questions about bias and the age-range of the construction sample, there are a number of other problems confronted when stage definitions are tied to a particular domain, research approach, and sample. First, the scoring systems employed to assess performances involve matching particular conceptual content in a performance with similar conceptual content in a scoring manual. Therefore, the scoring systems can only be used to assess stage on a particular instrument within a particular domain of knowledge. This means that a scoring manual must be developed independently for each domain and instrument. The time and expense of producing reliable scoring systems with this method are formidable.

Second, the stage scores produced with various scoring systems are difficult to compare with one another, making cross-domain research difficult. Commons and his colleagues (Commons, et al., 1989; Commons, Richards, with Ruf, Armstrong-Roche, and Bretzius, 1984) suggest that one can legitimately compare the level of an individual’s reasoning across tasks only when scoring systems are based upon the same structural criteria. But even when stage sequences in different domains are based on similar conceptions of structure, comparing stage achievements across domains is not a simple task. In their comparison of reasoning across the domains of intellectual, moral, and ego development, King, Kitchener, Wood, and Davison (1989) point out that no method exists for determining just how stages in different domains should be related to one another. In spite of this problem, a few studies have been published that attempt to compare developmental stage across social-cognitive domains. One of the earliest of these was conducted by Selman (1971a), who reported an association between role-taking stage and moral judgment stage. Other examples include studies by Kuhn, Langer, Kohlberg, and Haan (1977) and Commons and Grotzer (1990), who report relationships between reasoning in the moral and logico-mathematical domains; and studies by Walker and his colleagues (1980; Walker and Richards, 1979), who report strong relationships between moral stages and cognitive and perspective-taking stages.

Third, content dependent scoring systems like Kohlberg’s conflate form and content. Some of the conceptual content identified with each stage becomes part of the stage definition. For example, the concept of care became synonymous with moral stage 3 reasoning, precipitating a major controversy in moral theory (Gilligan, 1977). The failure to differentiate between the formal attributes of stages (as embodied in the notion of orders of hierarchical complexity) and their particular conceptual content precipitates such acts of reification. Moreover, the failure to differentiate structure and content in a coherent manner makes it impossible to investigate the relationship between the more universal aspects of development and those that are strongly influenced by particular contexts and content. As Levine (1979) points out, it is impossible to study content or context effects when these are conflated with what he calls general cognitive abilities. Several attempts have been made to overcome this problem (Edelstein, Keller, and Wahlen, 1984; Keasey, 1975; Keller and Reuss, 1984; Rosenberg, Ward, and Chilton, 1988; Selman and Damon, 1975; Selman, 1971b; Simpson, 1976; Stuart, 1967), but few of these have provided a general solution that can be transferred to the study of any domain of knowledge. Case (Case, et al., 1996), Fischer (1980), and Commons (Commons, et al., 1998) are among a small number of researchers who have worked to provide such a general solution.

A fourth problem with content dependent scoring systems is in their use as methods for testing certain postulates of stage theory. Kohlberg and Armon (1984) describe four criteria for cognitive-developmental stage sequences: (1) ordered acquisition; (2) no reversals; (3) structured wholeness, a tendency for individuals to apply a single organizational structure to reasoning in any given domain; and (4) universality. While the research of Kohlberg, his colleagues, and others working in the cognitive-developmental tradition generally supports the ordered acquisition of moral stages as defined in his sequence and the absence of statistically significant reversals in the direction of development over time (Armon and Dawson, 1997; Colby, Kohlberg, Gibbs, and Lieberman, 1983; Holstein, 1976; Nisan and Kohlberg, 1982; Snarey, Reimer, and Kohlberg, 1985; Walker, 1982), postulates of structured wholeness and universality are not as uniformly supported.

If reasoning processes tend to organize themselves into a coherent system as suggested by the notion of structured wholeness, one would expect to see only two kinds of performances within a given knowledge domain: (1) those that are consolidated at a single stage, and (2) those that are in transition and therefore exhibit the structures of two adjacent stages. Global stage scores calculated with the SISS are weighted averages. All of the arguments on each of the 6 moral issues identified in an interview are scored separately. To support the postulate of structured wholeness, scores on these arguments should not span more than two adjacent stages (Fischer and Bidell, 1998). In Kohlbergian research, arguments in a single interview frequently span more than two stages. This appears, on the surface, to refute the structured whole criterion. However, this result may be an artifact of the scoring system. As I have demonstrated elsewhere, any performance can incorporate conceptual content associated with multiple stages (Dawson, 1998). Coding systems that rely on concept-matching have the potential to produce “noisier” results than those that emphasize more formal structural criteria, because every argument on a given theme is scored, whether or not it represents the highest level of integration in the larger argument within which it is embedded. Given this potential, it can be argued that content-dependent scoring systems like Kohlberg’s and Armon’s cannot reliably be employed to examine the postulate of structured wholeness. If we want to know about continuity in the overall structure of an individual’s reasoning performances, it is preferable, if possible, to analyze their hierarchical complexity directly.
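The two-adjacent-stages criterion and the weighted-average global score described above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the SISS procedure: the function names are hypothetical, a half-stage score such as 3.5 is assumed to draw on both neighboring stages, and the weights in the example are invented because the text does not give the SISS weighting rules.

```python
import math


def stages_touched(scores):
    """Full stages represented in a set of argument scores.

    Assumption: a half-stage score (e.g. 3.5) counts as touching both
    of its neighboring stages (3 and 4).
    """
    touched = set()
    for score in scores:
        touched.add(math.floor(score))
        touched.add(math.ceil(score))
    return touched


def within_two_adjacent_stages(scores):
    """True when the scores span no more than two adjacent stages."""
    touched = stages_touched(scores)
    return max(touched) - min(touched) <= 1


def weighted_global_score(scores, weights):
    """Weighted average of argument scores (weights are hypothetical)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


print(within_two_adjacent_stages([3.0, 3.5, 4.0]))       # True
print(within_two_adjacent_stages([2.0, 3.0, 3.5, 4.0]))  # False
print(weighted_global_score([3.0, 4.0], [1, 3]))         # 3.75
```

The second interview fails the check even though its weighted average could look unremarkable. A global score alone therefore cannot show whether the underlying argument scores are consistent with structured wholeness.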

Finally, claims of universality for Kohlberg’s stages are generally upheld by cross-cultural studies that support invariant sequence and the absence of reversals (e.g., Nisan and Kohlberg, 1982; Snarey, et al., 1985). However, notable differences have been found cross-culturally in the conceptual content present in performances (Maqsud, 1979; Nisan and Kohlberg, 1982; Snarey, et al., 1985). In fact, researchers conducting these cross-cultural moral development studies have generally been forced to make adaptations to Kohlbergian dilemmas in order to make them meaningful to new populations, and have often found it impossible to score by matching responses to those in the scoring manual. In addition, highest stage attainment varies across cultural contexts, raising questions about the universality of the sequence. The bias built into the SISS makes it difficult to interpret differences in conceptual content and highest stage attainment.

In summary, we presently have a theoretical construct, developmental stage, which has universal properties, in that aspects of the construct are defined similarly across domains, but we have a separate ruler for every domain in which stage is assessed and no satisfactory means for comparing rulers across domains. Using conventional methods we can neither assess whether criteria for determining stage across domains are equivalent, nor whether some developmental changes in reasoning can be successfully modeled as “stages.” A generalized stage scoring system like the HCSS would theoretically make it possible to (1) compare developmental progress across domains and contexts, (2) more meaningfully address theoretical postulates of universality and structured wholeness, and (3) examine the relationship between developmental stages and conceptual content. However, because the HCSS is so different from conventional stage scoring systems, it is important to examine its validity as a stage-scoring system. As part of a series of checks on the validity of the HCSS as a stage scoring system, its functioning is compared here with two content dependent scoring systems, the SISS and the GLSS.


Four hypotheses inform the following comparison of performances on the SISS, GLSS, and HCSS. First, it is hypothesized that the HCSS and the GLSS and SISS tap the same underlying dimension of performance: developmental stage or order of hierarchical complexity. The second hypothesis is that the HCSS, because it directly assesses hierarchical complexity rather than particular conceptual content, measures stage with greater sensitivity and consistency than either the GLSS or the SISS. The third hypothesis is that the stages identified with the HCSS will appear to be “easier” than the Good Life and Moral Judgment stages with which they correspond theoretically (see methods section). There are two reasons for this. First, the HCSS specifies that statements should be scored at the highest stage of hierarchical complexity evident in a given argument, and that borderline cases should be assigned the higher stage. This contrasts with the requirement in Kohlberg’s system that borderline cases be assigned to a transitional stage, and the requirement in Armon’s system that borderline cases should be assigned to the lower stage. Second, because concept matching is not necessary with the HCSS, the actual OHC of arguments will not be obscured by the presence in an interview of lower-stage concepts. The fourth and final hypothesis is that the model of Kohlberg’s moral stages, due to problems with the definition of his lower stages, will reveal less stage-like patterns of performance at stages 1 and 2 than at his higher stages.
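The three borderline rules behind the third hypothesis can be compared side by side in a short Python sketch. The function name and the generic stage numbering are hypothetical, and treating Kohlberg’s transitional stage as the half-stage midpoint between the two candidate stages is an assumption made for this illustration.

```python
def resolve_borderline(lower_stage, higher_stage, system):
    """Score assigned to an argument that sits between two adjacent stages.

    Assumption: the "transitional stage" in Kohlberg's system is modeled
    as the half-stage midpoint between the two candidates.
    """
    if higher_stage - lower_stage != 1:
        raise ValueError("borderline cases lie between adjacent stages")
    if system == "HCSS":   # assigned the higher stage
        return float(higher_stage)
    if system == "SISS":   # assigned a transitional stage
        return lower_stage + 0.5
    if system == "GLSS":   # assigned the lower stage
        return float(lower_stage)
    raise ValueError(f"unknown scoring system: {system}")


for system in ("HCSS", "SISS", "GLSS"):
    print(system, resolve_borderline(3, 4, system))
# HCSS 4.0
# SISS 3.5
# GLSS 3.0
```

For the same borderline performance the HCSS rule yields the highest score of the three. This is one reason a given HCSS stage would be expected to look “easier” to attain than its theoretical counterparts.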

Method

A total of 220 respondents were interviewed, resulting in 138 moral judgment interviews, 147 good life interviews, and 155 good education interviews. Sixty-three respondents received only the good education interviews. The ages of respondents ranged from 5 to 86 years as shown in Table 1.

Moral reasoning: Kohlberg (1969) combined a Piagetian understanding of developmental processes with a philosophical analysis of the moral domain to develop his theory of moral development. His instrument, the Moral Judgment Interview (Colby and Kohlberg, 1987a, 1987b), consists of a series of moral dilemmas that focus on a variety of moral issues, most of them concerned with problems of rights, responsibilities, and justice. Participants in Kohlberg’s studies were not only asked what was the right way to respond to a given dilemma, but they were required to provide justifications for their responses. This probing of respondents’ reasoning was designed to generate adequate material for scoring by eliciting participants’ highest stage of performance or competence.


Table 1
Age distributions

Age      All   Good Ed   Good life   Moral
5-6        3         1           2       0
7-8        6         3           2       3
9-10      11        10           3       3
11-12     14         9           8       8
13-14     12         8           5       5
15-16     12         8           5       5
17-18     15        15           6       5
19-20     13        12           5       5
21-25      9         7           7       7
26-30     10         5           9       9
31-35     23        11          22      20
36-40     25        19          19      17
41-45     23        18          17      15
46-50      5         2           5       5
51-55      8         6           4       3
56-60      7         2           7       7
61-65     10         6           8       8
66-70     10        10           9       9
71-86      4         3           4       4
Total    220       155         147     138

As described above, Kohlberg’s stage-scoring system was developed through a dialectical “bootstrapping” process between philosophical/theoretical constructs and empirical findings. Early in this process, philosophical concepts held sway over structural concepts, in that his scoring rubric appeared to be based more on the conceptual content of responses than their form. In his third scoring system (Colby and Kohlberg, 1987a, 1987b), structural criteria are more apparent, but are still not the explicit criteria through which stage is determined.

The interviews for Armon’s study were recorded on audiotape and transcribed. Armon scored or supervised the scoring of all of the interviews. The moral reasoning data were analyzed with Colby and Kohlberg’s (1987a; 1987b) Standard Issue Scoring System (SISS) as described in Armon (1984b). This system assigns stage scores in half-stage increments across six issues (or themes): life, law, conscience, punishment, contract, and authority. Before the assignment of stage scores, scorable arguments were identified and assigned to one of the six moral issues. They were then individually scored. The scorer had access to the entire interview as she scored each segment. Stages 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, and 5.0 were identified.

Evaluative Reasoning about the Good Life: The Good Life Interview focuses on evaluative reasoning about the good, including the good life, good work, the good person, and good friendship. Armon’s interview approach differs from Kohlberg’s in that she does not pose dilemmas. Instead, her interview technique employs open-ended questions such as, “What is the good life?” “Why is that good?” A second distinction between Armon’s and Kohlberg’s work is that the development of her stage scoring system was influenced, from the beginning, by more formal structural criteria than Kohlberg’s. Like Kohlberg, however, Armon was guided in her stage definitions by philosophical categories, and her stage scoring method relies, as does his, on content descriptions, though she does point out that the conceptual content included in her stage scoring manual is not exhaustive.

Armon’s stages are intended to conform to the same stage criteria as Kohlberg’s. Her longitudinal study (Armon, 1984b; Armon and Dawson, 1997) provides convincing empirical evidence of invariant sequence, and some evidence in support of structured wholeness (Dawson, 2000a), but there have been no longitudinal replication studies that could provide additional support for these or the other hard-stage criteria.

As with the moral judgment interviews, the good life interviews were recorded on audiotape and transcribed. The Good Life data were scored according to the guidelines in Armon’s (1984a) Good Life Scoring Manual (GLSM). This system assigns stage scores in full stage increments across four themes: good life, good work, good friendship, and the good person. Armon scored or supervised the scoring of all of the data. The interviews were divided into scorable segments, and the scorer had the entire interview in front of her as she scored each segment. Stages 1.0, 2.0, 3.0, 4.0, and 5.0 were identified in this dataset.

Evaluative Reasoning about Education: The Good Education Interview focuses on the development of evaluative reasoning about education. Altogether, 155 good education interviews were conducted. The data for 92 of these were collected for Armon’s longitudinal study of moral reasoning and evaluative reasoning about the good. The data for the remaining 63 cases were collected by Dawson (1998) for her cross sectional investigation of the development of evaluative reasoning about education. All interviews were recorded on audiotape and transcribed verbatim. All 155 of the good education interviews were scored with the HCSS by a blind rater after being divided into scorable segments and randomly ordered.

Data collection was conducted as follows: After demographic information was collected and the respondent appeared to be at ease, the Good Education Interview was administered. It includes the following questions:


1. What is a good education? or What is your concept (idea) of a good education? or What makes an education good? Why is that good (important)?

2. What are the aims (goals, purposes) of a good education? Why are those good (important)?

The clinical interview, as adapted by Armon (1984b) from an approach developed by Piaget (1929), was employed. Questions and probes are designed to encourage participants to expand upon their conceptions of good education, and elicit their highest level of reasoning. Responses are probed with requests for further elaboration (“Why is that good?” “Why is that important?” “Why should education include both of those things?”) until the interviewer is satisfied that a given participant has presented as full an account as possible of his reasoning on each question. The interviewer does not introduce concepts of her own unless the subject is unable to respond to her initial questions. Instead, she notes the elements of good education that are mentioned by the participant and probes for explanations of why these are important. Interviews vary in length from 15 or 20 minutes to over an hour.

For the present analysis, all 155 good education interviews were divided into scorable segments (statements). Because the interviews were open-ended, there was no predetermined content-guided basis for segmentation. Consequently, the following criteria were employed:

1. A scorable segment should, as much as is possible, represent a complete argument for a given proposition or related set of propositions, including all of the “why” probes and responses associated with that argument;

2. When two or more arguments are intertwined in the same text, the text is left intact and scored only once; and

3. Arguments must include responses to “why” probes or spontaneous justifications, because these, much more than the propositions themselves, reveal the structure of participants’ thinking. When these are not present, the argument is not scorable, and is dropped from the analysis.
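The third criterion, under which arguments without justifications are dropped, amounts to a simple filter. The Python sketch below is illustrative only: the `Segment` record and its field names are hypothetical, and deciding what counts as a justification is a judgment made by a human rater, which the code takes as given.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical record for one argument taken from an interview."""
    proposition: str
    # Responses to "why" probes or spontaneous justifications,
    # as identified by a rater.
    justifications: list = field(default_factory=list)


def is_scorable(segment):
    """A segment is scorable only if it has a non-empty justification."""
    return any(text.strip() for text in segment.justifications)


def scorable_segments(segments):
    """Keep scorable segments; the rest are dropped from the analysis."""
    return [s for s in segments if is_scorable(s)]


segments = [
    Segment("A good education teaches you to read.",
            ["Because then you can learn other things by yourself."]),
    Segment("A good education is fun."),  # no justification offered
]
print(len(scorable_segments(segments)))  # 1
```

The example propositions are invented for the illustration and are not drawn from the study’s interviews.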

Stage scoring for the education interviews was conducted with Commons’ Hierarchical Complexity Scoring System (HCSS) (Commons, et al., 2000; Commons, et al., 1998). The HCSS has been employed to score texts from a wide range of content domains, including evaluative reasoning about the good, moral reasoning, and logico-mathematical reasoning (Commons, et al., 1989; Dawson, 1998; Dawson, Commons, and Wilson, in review; Goodheart, 1996; Rodriguez, 1993; Sau-Ching Lam, 1995).

The Model of Hierarchical Complexity describes 15 OHCs as listed above. The present project is concerned with stages 6 through 12. In text performances, hierarchical complexity is reflected in two aspects of performance: hierarchical order of abstraction and the logical organization of arguments. While the Model of Hierarchical Complexity does not itself predict that increases in hierarchical complexity will take the form of increasing hierarchical order of abstraction, in scoring text performances order of abstraction must be taken into account. This is because new concepts are formed at each OHC as the operations of the previous OHC are “summarized” into single constructs (Fischer, 1980). Burtis (1982), Halford (1999), and others suggest that this summarizing or “chunking” helps make more complex forms of thought possible by reducing the number of elements that must be simultaneously coordinated, freeing up processing space and making it possible to produce an argument or conceptualization at a higher OHC. For example, the concept of honor, which appears for the first time at the formal OHC, “summarizes” an argument coordinating concepts of reputation, trustworthiness, and kindness constructed at the abstract OHC. Similarly, the concept of personal integrity, which appears for the first time at the systematic OHC, summarizes an argument coordinating concepts of honor, personal responsibility, and personal values constructed at the formal OHC. Appendix 1 provides an example of scoring with the HCSS.

Brief descriptions of analogous moral stages, good life stages, and relevant (those identified in the sample) OHCs are provided in Table 2. Note that the good life and moral stages are described in terms of the conceptualizations associated with each stage in each domain. HCSS stages are defined more abstractly. Examples of concepts and arguments associated with each OHC have been provided in order to connect OHC definitions with concrete instances. The correspondences between Good Life Stages and moral stages are suggested by Armon (1984b). The correspondences between Good Life Stages and OHCs are hypothetical.

Analysis

Members of the Rasch family of measurement models were employed to explore the three hypotheses that guide this analysis. Several researchers have now employed Rasch models to examine patterns of performance in developmental data (Bond and Fox, 2001; Bond, 1994; Dawson, 1998;


Table 2
Descriptions of Good Life Stages, Moral Stages, and Orders of Hierarchical Complexity
(Columns are listed separately, highest stage first. Bracketed text is illegible or uncertain in the source.)

Good Life Stage and Description
Stage 5: The good is conceptualized from a rational, generalizable framework of autonomously constructed values. Intrinsic and extrinsic values are differentiated.
Stage 4: The good life consists of activities that express one’s self-chosen interests and values. Good activities are personally meaningful. Focus is on self-fulfillment.
Stage 3: Good is a feeling of happiness resulting from [positive], mutual interpersonal experience that is distinguished from pleasure.
Stage [2]: The Good consists of activities, objects, and persons that serve the individual’s emotional and material interests. There is [a lack of distinction] between happiness and pleasure.
Stage 1: The Good life consists of physicalistic experiences that gratify the self’s desires. Good and bad are dichotomized.

Moral Stage and Description
Stage 5: People seen as complex systems of qualities and processes in interaction with other complex systems. Actions justified on the basis of universal abstract principles.
Stage 4: People seen as complex systems of qualities that vary with circumstances. Actions justified in terms of impact on system and status within system.
Stage 3: People understood in terms of fixed personality characteristics, sentiments, roles, or motives. Actions justified in terms of reputation and characterization of person.
Stage 2.5: People understood in terms of a few inner states that are closely tied to behavior. Actions are justified in terms of the psychological response they evoke, e.g. [illegible].
Stage 2: People classified into groups according to actions they perform. Action is justified in terms of one’s personal interests.
Stage 1: People conceived of as particular persons who do [particular things]. Actions justified in terms of avoiding punishment, or obtaining rewards.

OHC
Metasystematic; Systematic; [Formal]; Abstract; Concrete; Primary; Preoperational

OHC Conceptual Structure
Third Order Abstractions (properties of abstract systems)
Second Order Abstractions (abstractions coordinating or modifying abstractions)
First Order Abstractions (qualities abstracted from representations)
Third Order Representational sets (properties of representational [systems])
Second Order Representational sets (representations coordinating or modifying representations)
First Order Representational sets (classes of states, things, people)

OHC Concept Examples
fundamental principle, social contract, parallelism, heteronomy, proportionality
verbal contract, moral commitment, developmental scheme, structure, self-fulfillment
coming to an agreement, basis, catching on, building trust, evidence, reality
fairness, belief, [illegible], educate, [illegible], proof, [illegible]
explain, blame, [illegible], being fair, reason, understanding
believe, correct, changing your mind, lying, telling the truth, guess, probably
[illegible], nice, know, think, wrong, teach, learn

OHC Argument Structure and Argument Example, by OHC
Metasystematic. Definitional: Identifies one aspect of a principle coordinating systems of abstractions. Example: A good educational system coordinates the needs of individuals and society. Fortunately, our personal interests and shared interests are commensurate, especially when we take a developmental view.
Systematic. Multivariate: Coordinates multiple aspects of two or more abstractions. Example: Students have different learning styles and interests. One aspect of a good teacher’s role is to learn as much as she can about her students so she can individualize instruction enough to engage all of her students.
Formal. Linear: Coordinates one aspect of two abstractions. Example: If you talk to your teacher like a person, she’ll get to know you, and that will make her more interested in you. Teachers who are interested in you are more fun to learn from.
Abstract. Definitional: Identifies one aspect of a single abstraction. Example: Teachers should not be boring. They should make the class interesting.
Concrete. Multivariate: Coordinates multiple aspects of two or more representations. Example: If you study hard, listen to your teacher and do your homework, your teacher will give you good grades.
Primary. Linear: Coordinates one aspect of two representations. Example: You have to do what the teacher says because if you don’t she’ll be mean.
Preoperational. Definitional: Identifies one aspect of a single representation. Example: My teacher is nice.


Dawson, et al., in review; Demetriou, Efklides, Papadaki, Papantoniou, and Economou, 1993; Muller, Sokol, and Overton, 1999). One of the reasons for this trend is that these models are designed specifically to examine hierarchies of difficulty. Rasch models can be employed to test the extent to which items or scores form a stable sequence (within probabilistic constraints). A central tenet of stage theory is that cognitive abilities develop in an invariant sequence, making the statistical tests implemented in a Rasch analysis especially relevant to understanding stage data. A second reason for the growing employment of Rasch models is that the software used to conduct analyses provides probabilistic, quantitative estimates of both participant and item performance that can be arranged along a single interval scale (logit scale). Each of these estimates includes an error term and model fit statistics, which makes it possible to assess their reliability. This detailed information about item functioning and individual performances makes it possible to simultaneously examine group and individual effects.

Two of the important statistics provided in a Rasch analysis are person performance estimates and item difficulty estimates. The person performance estimates order respondents by the likelihood that they will perform at a given stage. The persons whose raw scores are high will be closer to the top of the developmental continuum, and the persons whose raw scores are lower will be closer to the bottom of the continuum. The item difficulty estimates (in this case, stage difficulty estimates), which are arranged on the same metric, order items by their relative difficulty. The common metric along which both stage difficulty and person performance estimates are arranged is referred to as a logit scale, in reference to the log-odds unit employed (Ludlow, 1985; Wright and Masters, 1982; Wright and Stone, 1979). In the analyses presented here, the mean item difficulty is set at 0.

The distance between logits has a particular probabilistic meaning. In the present case, an ability estimate for a given individual means that the probability of that individual performing at a stage whose difficulty estimate is at the same level is 50%. There is a 73% probability that the same individual will perform at a stage whose difficulty estimate is one logit easier, an 88% probability that he or she will perform at a stage whose difficulty estimate is two logits easier, and a 95% probability that he or she will perform at a stage whose difficulty estimate is three logits easier. The same relationships apply, only in reverse, for stages that are one, two, and three logits harder. (For more on Rasch’s models, see Andrich, 1988; Fisher, 1992; Fisher, 1994; Wright, 1997; Masters, 1982.)
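The percentages quoted above follow from the logistic form of the Rasch model. The sketch below is illustrative only: it uses the simple dichotomous form of the model rather than the partial credit model fitted in this analysis, and the function name `rasch_probability` is a label chosen for this example.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of success under the dichotomous Rasch model.

    Both arguments are in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A person located at 0 logits facing stages 0, 1, 2, and 3 logits easier.
for gap in range(4):
    print(gap, round(rasch_probability(0.0, -gap), 2))
# prints: 0 0.5, 1 0.73, 2 0.88, 3 0.95 (one pair per line)
```

Reversing the sign of the gap gives the complementary probabilities for harder stages, for example about 27% for a stage one logit harder.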


The logit estimates of item difficulty and person ability are but one of the statistics essential to measurement. Reliability and validity assessments require (1) that item and person ability estimates be associated with an error term, which makes it possible to establish confidence intervals for all item and person ability estimates, and (2) one or more model fit statistics, so both items and persons can be examined for their conformity with the requirements of the model. Two types of fit statistics are included in the following analysis, outfit and infit. Fit statistics are used to assess whether a given performance (or item) is consistent with other performances (or items). They are based on the difference between observed and expected performances. Outfit statistics are based solely on the difference between observed and expected scores. In calculating infit statistics, however, extreme persons or items are downweighted. In most applications, the weighted infit statistics are more useful for assessing fit, because they are less affected by outliers. Infits (or outfits) near 1 are desirable. T-values are calculated to assess the significance of both positive and negative divergences from 1.
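To make the difference between the two fit statistics concrete, the following sketch computes outfit and infit mean squares for a single item. It is a simplification under stated assumptions: it uses the dichotomous Rasch model rather than the partial credit model, treats the person estimates as known, and omits the conversion to t-values. The names `item_fit` and `rasch_probability` are hypothetical and do not come from any software used in the study.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Expected score (probability of success) for a dichotomous item."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_fit(responses, abilities, difficulty):
    """Return (outfit, infit) mean squares for one dichotomous item.

    responses: observed 0/1 scores, one per person
    abilities: person estimates in logits, in the same order
    """
    squared_z = []           # squared standardized residuals
    squared_residuals = 0.0  # numerator of the infit mean square
    total_variance = 0.0     # denominator of the infit mean square
    for observed, ability in zip(responses, abilities):
        expected = rasch_probability(ability, difficulty)
        variance = expected * (1.0 - expected)
        residual = observed - expected
        squared_z.append(residual ** 2 / variance)
        squared_residuals += residual ** 2
        total_variance += variance
    outfit = sum(squared_z) / len(squared_z)     # unweighted mean
    infit = squared_residuals / total_variance   # information-weighted
    return outfit, infit


# The last person (3 logits above the item) unexpectedly fails it.
outfit, infit = item_fit([0, 0, 1, 1, 1, 0], [-2, -1, 0, 1, 2, 3], 0.0)
print(outfit > infit)
```

In this made-up example the single surprising response from a person far from the item inflates the outfit much more than the infit, which is the downweighting of extreme cases described above.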

Scoring with the SISS, GLSS, and HCSS involves identifying scorable statements and awarding a stage score to each statement. This means, for any performance, that an unspecified number of scorable statements for each of the moral life, moral law, moral conscience, moral punishment, moral authority, moral contract, good life, good work, good friendship, good person, and good education issues could be identified. For example, the range of scorable responses identified in the good education interviews was from 2 to 14. Data in this form are cumbersome, and no entirely satisfactory means for summarizing such data exists, though our research team is currently exploring solutions. For the purpose of the present analysis, the data were set up so that each respondent was given 4 opportunities to score in each of the 6 moral and 4 good life thematic categories, and a total of 5 opportunities to score in the good education category. In total, this approach yielded 45 possible items, distributed among 11 thematic categories. Because interviews are of different lengths, some participants provided enough interview material to score on every item in every thematic category. Others did not.

Because there are no apparent patterns in the distribution of missing responses, absent responses are treated as missing at random. Missing data of this kind can be accounted for in the measurement process, though they can pose problems for estimation when some levels of items have very low sample sizes, as occurs at the extreme ranges of the distribution. When participants had more than the allocated number of scores in any given thematic area, which occurred in a few cases, scores were randomly deleted until the maximum allowable number were left. Within thematic categories, scores were distributed randomly across items in order to even out the ns for individual items. This was done to help ensure more reliable estimates. The “random” items, thus created, are not conventional, and raise serious questions about item independence that are currently under investigation. Table 3 shows the strings for three cases before the random reallocation of “items” and following the random reallocation of “items”.

Table 3
Response strings before and after random reallocation of items

Before random reallocation of items

Case no.  Moral items                    Good life items      Good ed.
0001      1219 2199 2221 1219 0120 1221  1999 2199 2101 1219  22219
0002      3333 3231 2223 3999 2399 9999  3332 2323 3219 2333  33339
0003      4999 4599 5554 5344 4999 4555  5999 4599 5454 4459  55599

After random reallocation of items

Case no.  Moral items                    Good life items      Good ed.
0001      1912 2991 2221 9211 0120 1221  9919 2199 2101 9212  29212
0002      3333 3231 2223 9993 9329 9999  3332 2323 3912 2333  93333
0003      9949 9549 5554 5344 9994 4555  9959 9594 5454 4459  59595
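The data preparation described above (random deletion of surplus scores, then random distribution of the remaining scores across the items of a thematic category) can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name `allocate_scores` is hypothetical, and the code assumes that the digit 9 marks a missing response, as the strings in Table 3 suggest.

```python
import random

MISSING = "9"  # assumption: 9 codes a missing response in Table 3


def allocate_scores(scores: str, n_items: int, rng: random.Random) -> str:
    """Fit one category's stage scores into n_items randomly ordered slots.

    Surplus scores are deleted at random, empty slots are filled with
    MISSING, and the result is shuffled so that no item position is
    systematically emptier than another.
    """
    kept = list(scores)
    while len(kept) > n_items:           # random deletion of surplus scores
        kept.pop(rng.randrange(len(kept)))
    kept += [MISSING] * (n_items - len(kept))
    rng.shuffle(kept)                    # random allocation across items
    return "".join(kept)


rng = random.Random(0)  # seeded only to make the example repeatable
print(allocate_scores("121", 4, rng))      # three scores, four item slots
print(allocate_scores("222122", 5, rng))   # six scores, five item slots
```

Each output string contains the same scores as its input (minus any randomly deleted surplus), which is the relationship between most of the "before" and "after" groups in Table 3.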

Results

Two analyses were conducted. First, to examine the extent to which the SISS, GLSS, and HCSS assess the same dimension of performance, a multidimensional partial credit analysis was conducted with the software application, Conquest (Wu, Adams, and Wilson, 1998). Second, to examine patterns of performance across the three instruments, the data were pooled into a single, unidimensional partial credit analysis.

Multidimensional Analysis: The underlying model of the software, Conquest, is the random coefficients multinomial logit (RCML) model (Adams and Wilson, 1996). The RCML model is a generalized Rasch model that provides the flexibility of customizing models for particular test situations. The multidimensional model can be employed to test the extent to which different instruments measure the same latent dimension. Here, a total of 220 cases were included in the analysis. Fifty-six received only the good life and moral judgment interviews, 9 received only the good life and good education interviews, 63 received only the good education interview, and 92 received all three of the interviews.


As an indicator of the amount of shared variance across instruments, the multidimensional analysis provides correlations, disattenuated for error, between performance estimates on the instruments included in the analysis. Table 4 shows the correlations of person performance estimates across moral judgment, good life, and good education instruments from the present analysis.

Table 4
Correlations (disattenuated for error)* between the Moral Judgment Interview, Good Life Interview, and Good Education Interview

             Good Life   Good Education
MJI             .97           .90
Good Life                     .92

* Correlations provided by Conquest (Wu, Adams & Wilson, 1997).

Even though the data were collected using different methods and the scoring systems relied upon quite different criteria, all of these instruments appear to work together to measure the same dimension of performance. In fact, there is less difference between these instruments than is often found between two administrations of the same instrument in other areas of psychological measurement. This is not to say that there are no differences, however, as revealed by the following unidimensional analysis.

Weighted item fit statistics for the multidimensional model, shown in Appendix B, are well within acceptable limits (t < 2.0). As is usually the case with stage developmental data, there is some overfit, with several t values below -2.0. Because a pattern of overfit is compatible with developmental theory, we do not consider it to be a form of misfit. Deviance (an indicator of overall model fit) for the multidimensional analysis was 5115.91. The deviance statistic is employed when two or more models are compared, and has little meaning independent of such a comparison. It is reported here because a comparison of the multidimensional model and the unidimensional model is presented below.

Unidimensional Analysis

The unidimensional partial credit analysis was also conducted with the software application, Conquest (Wu, et al., 1998). Deviance for the unidimensional model is 5301.67. The difference between the deviance for the unidimensional and multidimensional models is 185.78. Distributed as a chi square with 8 degrees of freedom, this difference is significant at the .05 level. This means that the multidimensional model explains a statistically significantly greater amount of the variance than the unidimensional model. Clearly, despite the high correlations between measures across instruments, something other than the shared dimension of performance is influencing the order of person performance estimates across the three instruments. A close examination of the unidimensional analysis provides some insight into what this other factor (or factors) might be.
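The model comparison above is a likelihood-ratio (deviance difference) test. The sketch below recomputes it from the two reported deviances, using a closed-form upper-tail probability for the chi-square distribution that holds only for even degrees of freedom. The helper name `chi_square_sf_even_df` is an assumption made for this example; the 8 degrees of freedom are taken from the text as given.

```python
import math


def chi_square_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variate.

    Uses the closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!,
    which is valid only when df is even.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series


deviance_unidimensional = 5301.67    # reported in the text
deviance_multidimensional = 5115.91  # reported in the text
difference = deviance_unidimensional - deviance_multidimensional
p_value = chi_square_sf_even_df(difference, 8)
print(round(difference, 2), p_value < 0.05)  # prints: 185.76 True
```

Subtracting the two reported deviances gives 185.76 rather than the 185.78 stated in the text; the discrepancy is far too small to affect the conclusion, since the .05 critical value for 8 degrees of freedom is about 15.5.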

Weighted item fit statistics for the unidimensional model, shown in Appendix C, are well within acceptable limits (t < 2.0). As is usually the case with stage developmental data, there is some overfit, with several t values below -2.0. Because a pattern of overfit is compatible with developmental theory, we do not consider it to be a form of misfit. Figure 1 displays some of the results of the analysis. The logit scale is displayed on the left of the figure. The person performance estimates are shown to the immediate right of the logit scale. To the right of these are the stage difficulty estimates for each of the three instruments. Note that the moral judgment scale includes transitional stages, 1.5, 2.5, and 3.5, while the good life and good education scales do not. This contributes to a somewhat less ordered appearance in the item levels for the moral judgment items. The moral judgment full stage estimates are highlighted in bold type to make it easier to visualize the range of difficulties for full stage item levels. While there is a clear separation between the estimates for moral stages 4 and 5, with estimates for each stage clustered within a two-logit range, the full-stage estimates for stages 2 and 3 overlap, with stage 2 items spanning a 6 logit range.

When item difficulty estimates cluster into clearly separated groups in a partial credit analysis, it is an indication that, in general, individuals are more likely to perform at their modal level, and less likely to perform at lower or higher levels than their modal level. The more pronounced this trend, the greater the band of white space between groups of item estimates. When the individual statements of a large percentage of the respondents in a sample are scored predominantly at single stages, the bands of white space widen. The cognitive developmental postulate of structured wholeness predicts that once individuals have access to the structures and processes of a given order of complexity, they will tend to apply these systematically, at least within a given domain. Consequently, as noted above, it would be unlikely for individuals to demonstrate reasoning at more than two adjacent orders of complexity within a particular domain of knowledge. Only two types of performances would be expected: those consolidated at a single stage, and those in transition from one stage to another (Dawson, 1998). In a partial credit analysis, large gaps between groups of item difficulty estimates reflect just such a pattern of performance.
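The link between widely separated difficulty estimates and performance at a single modal level can be illustrated with Masters’ partial credit model. The sketch below is an illustration under assumptions: it works with partial credit step difficulties (which are not identical to the thresholds plotted in the figures), the step values are invented, and `pcm_probabilities` is a hypothetical helper name.

```python
import math


def pcm_probabilities(ability: float, steps: list) -> list:
    """Category probabilities under the partial credit model.

    steps: step difficulties in logits, one per transition between
    adjacent score categories; returns len(steps) + 1 probabilities.
    """
    cumulative = [0.0]  # log-numerator for the lowest category
    for step in steps:
        cumulative.append(cumulative[-1] + (ability - step))
    weights = [math.exp(value) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Same person (0 logits), three score levels, two invented step patterns.
wide = pcm_probabilities(0.0, [-3.0, 3.0])    # steps six logits apart
narrow = pcm_probabilities(0.0, [-0.5, 0.5])  # steps one logit apart
print(round(wide[1], 2), round(narrow[1], 2))  # prints: 0.91 0.45
```

With widely separated steps the person is scored at the middle level about 91% of the time; with closely spaced steps the same person is scored there less than half the time, with the remainder spread over the adjacent levels.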

Figure 1: Moral Reasoning, Evaluative Reasoning About the Good, and Evaluative Reasoning About Education: Map of Latent Distributions and Thresholds for Unidimensional Model


The estimates for good life stages are well defined, with white space between the estimates for stages 3 and 4 and stages 4 and 5, and little overlap between the stage 2 and stage 3 estimates. Both the vertical spread and the levels of the stage estimates for the moral and good life items are very similar. The education items, however, present a more extreme pattern. First of all, the stage estimates for the good education items are clustered tightly, and the gaps between clusters of good education item levels are even larger, revealing a clear, step-like structure to performances on these items. This may mean more than is immediately apparent from viewing the figure. Recall that the scorable statements for the good education items were randomly sorted and blindly scored, while the segments for the moral and good life scoring were not. Randomization and blind scoring, in and of themselves, would lead to the expectation of less consistency in the good education scores, rather than more.

The fact that consistency of performances within persons is greater on the good education items indicates either that performance on the good education interview is much more consistent than performance on either the moral judgment or good life interviews, or that the HCSS, as a scoring system, is a more consistent method of assessing stage, possibly due to its focus on the structural features of arguments rather than their conceptual content. The argument might be made that the apparent increased consistency within performances is due to the fact that there are fewer education items than moral or good life items, but this does not appear to be the case. When the moral and good life items are subdivided into thematic categories (each of which includes 4 items), as shown in Figure 2, the stage estimates within each of these categories are still much less distinctly separated than the good education estimates.

The greater consistency in performances on the good education items over the good life and moral judgment items may account for some of the additional variance explained by the multidimensional model over the unidimensional model. If some arguments presented by an individual are incorrectly scored due to error introduced by the limitations of a particular scoring method, the performance estimate for that individual will be affected. If this occurs in several cases, the overall order of performance estimates will vary. This variation, which is due to the measure rather than any systematic differences in the performances across domains, may well account for the additional variance explained by the multidimensional model over the unidimensional model. One way to test this hypothesis is to rescore the moral or good life interviews with the HCSS and submit the findings to a series of analyses like the one presented here. The results of one such analysis (Xie and Dawson, 2000) comparing 375 moral judgment interviews scored with both the SISS and HCSS show a pattern of stage estimates for the SISS and the HCSS very much like those reported here, indicating that domain differences do not account for the difference between these two scoring systems.

[Figure 2 map: logit scale, person measures (each 'X' represents a fixed number of cases), and generalised-item thresholds by issue.]

Figure 2: Moral Reasoning, Evaluative Reasoning About the Good, and Evaluative Reasoning About Education: Map of Latent Distributions for Unidimensional Model by Issue


One possible problem with the conclusion that the HCSS is a more consistent method of stage scoring emerges from the very strong correlation of the moral judgment and good life performance estimates. The performance estimates from the multidimensional analysis of these two systems are very highly correlated (r = .97). Does the fact that they are more consistent with one another than with the HCSS imply that they are measuring stage with greater consistency? Though this is certainly a plausible conclusion, there are two other factors that may explain this high correlation. First, scoring of the good life and moral judgment interviews was conducted by the same group of raters. Second, the construction of both the GLSS and the SISS was strongly influenced by philosophical categories. In fact, Armon, as a student of Kohlberg’s, employed his approach, integrating philosophical concepts with empirical evidence. It is possible that these shared philosophical influences resulted in similar scoring criteria across the two systems.

The unidimensional analysis also reveals that the HCSS appears to be ‘easier,’ overall, than the GLSS and SISS, in that difficulty estimates for the good education items are systematically lower than for the good life and moral judgment items. It is here that the problem of comparing stages across scoring systems is most troubling. While OHCs have been shown to correspond to specific stages in the SISS (Dawson and Kay, in review) and theoretically correspond to specific stages in the GLSS, they do not line up in accord with expectations. The estimates for metasystematic items line up fairly well with the lower estimates for stage 5 on the good life and moral judgment items, but the estimates for the systematic items are generally below estimates for Armon’s and Kohlberg’s stage 4 items, estimates for the formal items are well below Armon’s and Kohlberg’s stage 3 items, and the abstract items are even further below Kohlberg’s stage 2.5 items. Recall that the third hypothesis guiding this analysis predicted that moral and good life stages would be “harder” overall than OHCs. However, it was not hypothesized that this trend would be more pronounced at the lower OHCs than at the higher OHCs. Dawson and Kay (in review) suggest a possible explanation for this trend. They argue that several of Kohlberg’s stage 1, 2, and 2.5 scoring criteria are misspecified, many of them actually belonging to the concrete, abstract, or formal OHCs, respectively. They suggest that this may be because Kohlberg constructed his lower stage scoring criteria on the basis of performances that were too developmentally advanced. His youngest respondents were 10, the modal age at which Dawson (Dawson, et al., in review), Fischer (Fischer and Bidell, 1998) and their colleagues find abstract performances. If more advanced structures are indeed part of stage 1, 2, and 2.5 scoring criteria, one would expect the kind of pattern seen here, where the abstract OHC is “easier” and estimates for moral stages 1.5, 2, and 2.5 present a much less systematic pattern than expected for stage items.

Discussion

The overall pattern of fit in this analysis, once known differences in the scoring systems have been accounted for, appears to support the first hypothesis, that the SISS, GLSS, and HCSS all predominantly measure hierarchical complexity. The HCSS, because of its focus on the level of abstraction and organization of performances rather than their particular conceptual content, is a more direct measure of this dimension, and patterns of performance on the good education interview as scored with the HCSS demonstrate greater consistency with the postulates of stage theory than performance on either the SISS or GLSS. However, even though the moral and good life measures appear to be less discriminating than the HCSS, they provide some evidence of order and gappiness (Wilson, 1984) that are stage-like, and their stage estimates have a systematic relationship with the good education stage estimates that suggests the differences between the three measures are due to factors other than developmental stage.

Support for the second hypothesis, that the HCSS measures stage with greater consistency and sensitivity than either the GLSS or the SISS, comes from the evidence shown in Figures 1 and 2 that the good education stages are more clearly delineated than those produced with the SISS and GLSS, with large gaps between stages and no overlap. In fact, there are only two kinds of performances in the good education data: consolidated performances, in which every statement is scored at a single stage, and transitional performances, with statements scored at two adjacent stages. This is not the case for the SISS and GLSS, in which scores regularly span more than two stages.
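The distinction between consolidated and transitional performances can be stated as a simple rule over the stage scores awarded to one interview. The sketch below is illustrative only: it assumes stage scores are coded as consecutive integers (for example, OHC order numbers), and the function name `classify_performance` and the example scores are invented.

```python
def classify_performance(stage_scores: list) -> str:
    """Classify one interview's statement-level stage scores.

    consolidated: every statement scored at a single stage
    transitional: statements scored at exactly two adjacent stages
    other:        scores span non-adjacent or more than two stages
    """
    if not stage_scores:
        raise ValueError("at least one scored statement is required")
    distinct = sorted(set(stage_scores))
    if len(distinct) == 1:
        return "consolidated"
    if len(distinct) == 2 and distinct[1] - distinct[0] == 1:
        return "transitional"
    return "other"


print(classify_performance([9, 9, 9, 9]))    # consolidated
print(classify_performance([9, 10, 9, 10]))  # transitional
print(classify_performance([8, 9, 10]))      # other
```

Under this rule, the claim in the text is that every good education performance falls into one of the first two classes, while SISS and GLSS performances regularly fall into the third.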

Support for the third hypothesis, that the stages identified by the HCSS will appear to be “easier” than the good life and moral judgment stages with which they correspond theoretically, is readily observed in the difference between estimates for good education items and both moral judgment and good life items. The locations of good education estimates may result from the fact that the HCSS assigns borderline statements to the higher stage in question, while Kohlberg’s system assigns transitional performances to half-stages and Armon’s system assigns transitional performances to the lower stage. It may also be a consequence of the HCSS’s exclusive employment of structural criteria, rather than concept matching, to determine stage, especially if the ability to construct particular conceptual content lags behind the ability to employ the organizational strategies necessary to construct that content.

Moving down the developmental scale, OHCs are progressively ‘easier’ than analogous moral and good life stages. Moreover, patterns of performance on the SISS are less stage-like at the lower stages than at the higher stages. These patterns, in combination with concerns raised about the specification of lower stage scoring criteria, suggest that the SISS underestimates moral competence at the lower stages as well as being inconsistent in assessing developmental level in this range. This lack of consistency in assessment of the lower stages supports the fourth hypothesis, that Kohlberg’s scoring system, due to problems with the definition of his lower stages, would reveal less stage-like patterns of performance at stages 1 and 2 than at his higher stages.

Overall, the results of this analysis provide some interesting insights into the performance of the systems used to score moral judgment, good life, and good education interviews. The combination of the small size of the sample and the fact that this is a secondary analysis of the moral judgment and good life scores means that these results should be interpreted cautiously. However, they do indicate some possible directions for future comparisons of scoring systems, and demonstrate the utility of Rasch modeling for such efforts.

Most importantly, however, this comparison of stage assessment sys- tems demonstrates that all three systems, to a remarkable extent, assess the same dimension of performance, and this dimension displays many of the qualities attributed to stage. The gaps between OHC estimates are a strong indication that individuals tend to employ a single form of reason- ing to a much greater extent than would be expected if development did not involve hierarchical restructuring of thought processes. The larger gaps between the good education items than between moral and good life stage items, potentially resulting from the greater precision in scoring provided by the HCSS, indicate that this tendency to form arguments at a single order of hierarchical complexity may be much stronger than previous re- search has indicated. See Dawson (1998) and Dawson, Commons and Wilson (in review) for further discussion of this finding).


Is the stage metaphor worth preserving? I believe it is. But we may need to reconsider the burden we place on this metaphor. Stages are not particular meanings; they are orders of hierarchical complexity. This comparison of the HCSS, a purely structural stage-scoring method, with the GLSS and the SISS shows that most of what these systems actually measure is hierarchical complexity, not particular conceptualizations. Only by determining the hierarchical complexity of performances independently of their particular content can we make it possible to order performances by their stage and then, independently, ask the question, “How do conceptualizations change or stay the same across stages?” Only when we have a reliable measure of development as a general feature of performance can we effectively examine situational, informational, and cultural influences on the development of meaning.

Acknowledgements

Special thanks to Cheryl Armon for the use of her data, to Mark Wilson for his psychometric expertise, and to Karen Draney for her comments on previous drafts of this manuscript.

References

Adams, R. J., and Wilson, M. (1996). Formulating the Rasch model as a mixed coefficients multinomial logit. In G. Engelhard and M. Wilson (Eds.), Objective measurement III: Theory into practice. Norwood, NJ: Ablex.

Andrich, D. (1988). Rasch models for measurement. Newbury Park, CA: Sage Publications.

Armon, C. (1984a). Ideals of the good life and moral judgment: Ethical reasoning across the lifespan. In M. Commons, F. Richards and C. Armon (Eds.), Beyond formal operations, Volume 1: Late adolescent and adult cognitive development. New York: Praeger.

Armon, C. (1984b). Ideals of the good life: Evaluative reasoning in children and adults. Unpublished Doctoral dissertation, Harvard, Boston.

Armon, C., and Dawson, T. L. (1997). Developmental trajectories in moral reasoning across the lifespan. Journal of Moral Education, 26(4), 433-453.

Bond, T., and Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement for the human sciences. Mahwah, NJ: Lawrence Erlbaum.

Bond, T. G. (1994). Piaget and measurement II: Empirical validation of the Piagetian model. Archives de Psychologie, 63, 155-185.

Boyes, M. C., and Walker, L. J. (1988). Implications of cultural diversity for the universality claims of Kohlberg’s theory of moral reasoning. Human Development, 31(1), 44-59.

Burtis, P. J. (1982). Capacity increase and chunking in the development of short-term memory. Journal of Experimental Child Psychology, 34, 387-413.

Case, R., Okamoto, Y., Griffin, S., McKeough, A., Bleiker, C., Henderson, B., and Stephenson, K. M. (1996). The role of central conceptual structures in the development of children’s thought (Vol. 60, Serial no. 246).

Colby, A., and Kohlberg, L. (1987a). The measurement of moral judgment, Vol. 1: Theoretical foundations and research validation. New York: Cambridge University Press.

Colby, A., and Kohlberg, L. (1987b). The measurement of moral judgment, Vol. 2: Standard issue scoring manual. New York: Cambridge University Press.

Colby, A., Kohlberg, L., Gibbs, J., and Lieberman, M. (Eds.). (1983). A longitudinal study of moral judgment (Vol. 48).

Commons, M. L., Armon, C., Richards, F. A., Schrader, D. E., Farrell, E. W., Tappan, M. B., and Bauer, N. E. (1989). A multidomain study of adult development. In D. Sinnott, F. A. Richards, and C. Armon (Eds.), Adult development, Vol. 1: Comparisons and applications of developmental models (pp. 33-56). New York: Praeger.

Commons, M. L., Danaher, D., Miller, P. M., and Dawson, T. L. (2000, June). The Hierarchical Complexity Scoring System: How to score anything. Paper presented at the Annual meeting of the Society for Research in Adult Development, New York City.

Commons, M. L., and Grotzer, T. A. (1990). The relationship between Piagetian and Kohlbergian stage: An examination of the “necessary but not sufficient relationship”. In M. L. Commons and C. Armon and L. Kohlberg and F. A. Richards and T. A. Grotzer and J. D. Sinnott (Eds.), Adult development, Vol. 2: Models and methods in the study of adolescent and adult thought (pp. 205-231). New York: Praeger.

Commons, M. L., Richards, F. A., with Ruf, F. J., Armstrong-Roche, M., and Bretzius, S. (1984). A general model of stage theory. In M. Commons, F. A. Richards and C. Armon (Eds.), Beyond Formal Operations (pp. 120-140). New York: Praeger.

Commons, M. L., Trudeau, E. J., Stein, S. A., Richards, F. A., and Krause, S. R. (1998). Hierarchical complexity of tasks shows the existence of developmental stages. Developmental Review, 18, 237-278.

Damon, W. (1977). Measurement and social development. Counseling Psychologist, 6(4), 13-15.

Damon, W., and Hart, D. (1982). The development of self-understanding from infancy through adolescence. Child Development, 53, 841-864.


Dawson, T. L. (1998). “A good education is ...” A life-span investigation of developmental and conceptual features of evaluative reasoning about education. Unpublished doctoral dissertation, University of California at Berkeley, Berkeley, CA.

Dawson, T. L. (2002). New tools, new insights: Kohlberg’s moral reasoning stages revisited. International Journal of Behavioral Development.

Dawson, T. L., Commons, M. L., and Wilson, M. (in review). The shape of development.

Dawson, T. L., Commons, M. L., Wilson, M., and Xie, Y. (1999, June). The General Model of Hierarchical Complexity Scoring System: Refinements and additions. Paper presented at the annual symposium of the Jean Piaget Society, Mexico City, Mexico.

Dawson, T. L., and Kay, A. (in review). A stage is a stage is a stage: A direct comparison of two scoring systems.

Demetriou, A., Efklides, A., Papadaki, M., Papantoniou, G., and Economou, A. (1993). Structure and development of causal-experimental thought: From early adolescence to youth. Developmental Psychology, 29, 480-497.

Edelstein, W., Keller, M., and Wahlen, K. (1984). Structure and content in social cognition: Conceptual and empirical analyses. Child Development, 55, 1514-1526.

Fischer, K. (1980). A theory of cognitive development: The control and construction of hierarchies of skills. Psychological Review, 87, 477-531.

Fischer, K. W., and Bidell, T. R. (1998). Dynamic development of psychological structures in action and thought. In W. Damon and R. M. Lerner (Eds.), Handbook of Child Psychology: Theoretical models of human development (5th ed., pp. 467-561). New York: John Wiley and Sons.

Fisher, W. P., Jr. (1992). Objectivity in measurement: A philosophical history of Rasch’s scalability theorem. In M. Wilson (Ed.), Objective measurement: Theory into practice (pp. 29-60). Norwood, NJ: Ablex.

Fisher, W. P., Jr. (1994). The Rasch debate: Validity and revolution in educational measurement. In M. Wilson (Ed.), Objective measurement: Theory into practice (pp. 36-72). Norwood, NJ: Ablex.

Fowler, J. W. (1991). Stages in faith consciousness. New Directions for Child Development, 27-45.

Gilligan, C. (1977). In a different voice: Women’s conceptions of self and of morality. Harvard Educational Review, 47, 481-517.

Gilligan, C. (1982). In a different voice: Psychological theory and women’s development. Cambridge, MA: Harvard University Press.


Goodheart, E., Dawson, T., and Commons, M. (1996, June). Primary, concrete, abstract, formal, systematic, and metasystematic operations as observed in a Piagetian logical task series. Paper presented at the Annual Meeting of the Society for Research in Adult Development, Boston.

Halford, G. S. (1999). The properties of representations used in higher cognitive processes: Developmental implications. In Development of mental representation: Theories and applications (pp. 147-168). Mahwah, NJ, USA: Lawrence Erlbaum Associates, Inc., Publishers.

Haste, H., and Baddeley, J. (1991). Moral theory and culture: The case of gender. In W. M. Kurtines and J. L. Gewirtz (Eds.), Handbook of moral behavior and development, Vol. 1: Theory (pp. 223-249). Hillsdale, NJ: Lawrence Erlbaum.

Holstein, C. B. (1976). Irreversible, stepwise sequence in the development of moral judgment: A longitudinal study of males and females. Child Development, 47(1), 51-61.

Keasey, C. B. (1975). Implicators of cognitive development for moral reasoning. In D. Palma and Foley (Eds.), Moral development: Current theory and research. Mahwah, NJ: Lawrence Erlbaum.

Keller, M., Eckensberger, L. H., and von Rosen, K. (1989). A critical note on the conception of preconventional morality: The case of stage 2 in Kohlberg’s theory. International Journal of Behavioral Development, 12, 57-69.

Keller, M., and Reuss, S. (1984). An action-theoretical reconstruction of the development of social-cognitive competence. Human Development, 27, 211-220.

King, P. M., Kitchener, K. S., Wood, P. K., and Davison, M. L. (1989). Relationships across developmental domains: A longitudinal study of intellectual, moral, and ego development. In M. L. Commons, J. D. Sinnot, F. A. Richards, and C. Armon (Eds.), Adult development. Volume I: Comparisons and applications of developmental models (pp. 57-71). New York: Praeger.

Kitchener, K. S., and King, P. M. (1990). The reflective judgment model: Ten years of research. In M. L. Commons, C. Armon, L. Kohlberg, F. A. Richards, T. A. Grotzer, and J. D. Sinnott (Eds.), Adult development (Vol. 2, pp. 62-78). New York: Praeger.

Kohlberg, L. (1969). Stage and sequence: The cognitive-developmental approach to socialization. In D. Goslin (Ed.), Handbook of socialization theory and research (pp. 347-480). Chicago: Rand McNally.

Kohlberg, L. (1984). Moral stages and moralization: A cognitive developmental approach. In The psychology of moral development: The nature and validity of moral stages (Vol. 2, pp. 170-205). San Francisco: Jossey Bass.

Kohlberg, L., and Armon, C. (1984). Three types of stage models in the study of adult development. In M. L. Commons, F. A. Richards, T. A. Grotzer, and J. D. Sinnot (Eds.), Beyond formal operations: Vol 1. Late adolescent and adult cognitive development (pp. 357-381). New York: Praeger.

Kuhn, D. (1976). Short-term longitudinal evidence for the sequentiality of Kohlberg’s early stages of moral judgment. Developmental Psychology, 12, 162-166.

Kuhn, D., Langer, J., Kohlberg, L., and Haan, N. S. (1977). The development of formal operations in logical and moral judgment. Genetic Psychology Monographs, 95, 97-188.

Levine, C. G. (1979). The form-content distinction in moral developmental research. Human Development, 22, 225-234.

Ludlow, L. H. (1985). A strategy for the graphical representation of Rasch model residuals. Educational and Psychological Measurement, 45, 851-859.

Maqsud, M. (1979). Cultural influences on transition in the development of moral reasoning in Nigerian boys. Journal of Social Psychology, 108, 151-159.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.

Muller, U., Sokol, B., and Overton, W. F. (1999). Developmental sequences in class reasoning and propositional reasoning. Journal of Experimental Child Psychology, 74(2), 69-106.

Nisan, M., and Kohlberg, L. (1982). Universality and variation in moral judgment: A longitudinal and cross-sectional study in Turkey. Child Development, 53, 865-876.

Piaget, J. (1929). The child’s conception of the world. London: Routledge and Kegan Paul.

Puka, B. (Ed.). (1994). The great justice debate: Kohlberg criticism. New York: Garland Publishing.

Rodriguez, J. A. (1993). The adult stages of social perspective-taking: Assessment with the Doctor-Patient Problem. Unpublished doctoral dissertation, Harvard, Cambridge, MA.

Rosenberg, S. W., Ward, D., and Chilton, S. (1988). Political reasoning and cognition. London: Duke University Press.

Sau-Ching Lam, M. (1995). Women and men scientists’ notions of the good life: A developmental approach. Unpublished Doctoral dissertation, University of Massachusetts.

Selman, R., and Damon, W. (1975). The necessity (but insufficiency) of social perspective taking for conceptions of justice at three early levels. In D. Palma and Foley (Eds.), Moral development: Current theory and research. Mahwah, NJ: Lawrence Erlbaum.


Selman, R. L. (1971a). The relation of role taking to the development of moral judgment in children. Child Development, 42(1), 79-91.

Selman, R. L. (1971b). Taking another’s perspective: Role-taking development in early childhood. Child Development, 42, 1721-1734.

Simpson, E. L. (1976). A holistic approach to moral development and behavior. In T. Lickona (Ed.), Moral development and behavior: Theory, research, and social issues. New York: Holt, Rinehart, and Winston.

Snarey, J. R., Reimer, J., and Kohlberg, L. (1985). Development of social-moral reasoning among Kibbutz adolescents: A longitudinal cross-cultural study. Developmental Psychology, 21, 3-17.

Stuart, R. B. (1967). Decentration in the development of children’s concepts of moral and causal judgments. Journal of Genetic Psychology, 3, 59-68.

Walker, L. J. (1980). Cognitive and perspective-taking prerequisites for moral development. Child Development, 51, 131-139.

Walker, L. J. (1982). The sequentiality of Kohlberg’s stages of moral development. Child Development, 53, 1330-1336.

Walker, L. J. (1989). A longitudinal study of moral reasoning. Child Development, 60, 157-166.

Walker, L. J., and Richards, B. S. (1979). Stimulating transitions in moral reasoning as a function of stage of cognitive development. Developmental Psychology, 15, 95-103.

Willett, J. B. (1989). Some results on reliability for the longitudinal measurement of change: Implications for the design of studies of individual growth. Educational and Psychological Measurement, 49(3), 587-602.

Wilson, M. (1984). A psychometric model of hierarchical development. Unpublished doctoral dissertation.

Wright, B. D. (1997). Fundamental measurement for outcome evaluation. In R. M. Smith (Ed.), Outcome measurement (Vol. 11, pp. 261-288). Philadelphia: Hanley and Belfus.

Wright, B. D., and Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.

Wright, B. D., and Stone, M. H. (1979). Best test design. Chicago: MESA Press.

Wu, M., Adams, R., and Wilson, M. (1998). ConQuest user guide. Hawthorn, AU: ACER.

Xie, Y., and Dawson, T. L. (2000, April). Multidimensional models in a developmental context. Paper presented at the International Objective Measurement Workshop, New Orleans, LA.


Appendix A

Scoring with the HCSS

To score for stage with the HCSS, one must first identify a scorable statement. A scorable statement is defined as a segment of text that contains a complete argument or justification. Here, for example, is an extract from one of the good education interviews conducted for the present project:

A. The ideal education would be to have a school system, okay, first of all, that you have teachers that are aware of human needs, and have a balanced perspective. Because nowadays we have one type of school where all they care about are the emotional needs and social interaction, but they have no intellectual growth for the kids. And you have the others that will make them into little computers. So what you have to do is realize that both of those have some equal weight in maximizing potential in human beings. I think where our problem lies, in devising the system, is that we’re in the age of specialization, which I’m completely appalled by. Because I feel it makes all these lopsided people in society. So that we have to start reeducating the educators about the uniqueness of the human being and his potentiality and the fragileness of it. That he has - not only is he an animal, emotional but he is also an intellectual, he has intellectual potential. And these things are intertwined so he can seek his happiness. So we have to have educators that are more interdisciplinary, that are more aware of all the different repercussions, the factors that influence the human being.

The general argument stated here is that a good education promotes both the emotional and the intellectual development of students, and that good school systems and teachers should promote this development. Several justifications (or arguments) are offered:

f1. Teachers would be aware of human needs.
f2. Teachers would have a balanced perspective.
g1. Teachers would not emphasize emotional needs over intellectual needs.
g2. Teachers do not emphasize intellectual needs over emotional needs.
h. Intellectual and emotional growth have equal weight in maximizing potential.
j. It appalls me that people are so specialized.
k. Specialization makes lopsided people.
l1. Educators should be reeducated about the uniqueness of human beings.
l2. Educators should be reeducated about the potentiality of human beings.
l3. Educators should be reeducated about the fragility of human beings.
m1. Human beings have an animal part.
m2. Human beings have an emotional part.
m3. Human beings have an intellectual part.
n. The emotional and intellectual parts are intertwined.
p. Ability to seek happiness results from [appropriate?] intertwining of parts of self.
q1. Educators should be more interdisciplinary.
q2. Educators should be more aware of the different repercussions.
q3. Educators should be more aware of the factors that influence the human being.

Each of these justifications is minimally a formal proposition, in that each coordinates a pair of abstract or formal concepts. For example, in k, the formal concept, specialization, is coordinated with the abstract concept of lopsided people. Specialization is considered to be a formal concept because even the most rudimentary understanding of this concept involves the abstraction of an abstraction. The first level of abstraction is the idea of specializing in a particular area. The second is the idea of specialization without reference to any particular area, specialization in general. The concept of lopsided people is considered to be at least abstract, because it refers to a generalized, abstract category, lopsided, of another such category, people. The concept lopsided people could also be considered a formal category, depending upon the intended meaning of lopsided. However, in the present case, it is not necessary to explore the meaning of this term more fully because there are several examples of formal concepts in the text, more than enough to conclude that this participant is, minimally, performing at the formal stage.


In addition to formal propositions, several systematic propositions are present in the text sample. Some of these are concepts such as factors that influence the human being. This is a system because one can infer from the types of factors listed in the text as a whole that these factors are, at the least, a coordinated set of formal concepts such as the potentiality, fragility, and uniqueness of human beings, and their intellectual and emotional needs. Other systems present in the text are constructions containing several concepts that are related to one another through causal or other logical links. The following four systems are formed in this way:

System A = proposition (f1 + f2) where proposition (g1 + g2) because proposition h.
System B = proposition j because proposition k, ∴ proposition (l1 + l2 + l3)
System C = proposition (m1, m2 = m3) where proposition n
System D = proposition p, ∴ proposition (q1 + q2 + q3)

The final System, D, is actually a metasystem, because it coordinates System A with Systems B and C. Educators, in a good education, must balance the intellectual and emotional needs of students, and, in order to do this, they must understand their students’ potentiality, fragility, and uniqueness. In other words, they must coordinate these two knowledge systems. In making his case, this participant has demonstrated his ability to think metasystematically by advocating that educators should reason metasystematically. Based on this analysis, the stage score for this participant’s performance would be stage 5, or metasystematic.
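The layered structure of this worked example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HCSS procedure itself: the names `Order`, `STRUCTURES`, `order_of`, and `score_statement` are hypothetical, the rule that coordinating elements raises the order by one step is a simplification introduced here, and the integer ranks are for comparison only (they are not the stage numbers used in the paper). Every bare proposition is treated as formal, which the appendix describes as the minimum for this text.

```python
from enum import IntEnum


class Order(IntEnum):
    """Orders named in the appendix, ranked lowest to highest.

    The integers are ranks for comparison only, not the paper's stage numbers.
    """
    FORMAL = 1
    SYSTEMATIC = 2
    METASYSTEMATIC = 3


# Which elements each structure coordinates (labels follow the appendix).
# Any label that is not a key here is treated as a single proposition.
STRUCTURES = {
    "A": ["f1", "f2", "g1", "g2", "h"],
    "B": ["j", "k", "l1", "l2", "l3"],
    "C": ["m1", "m2", "m3", "n"],
    "D": ["A", "B", "C"],  # the appendix states that D coordinates A with B and C
}


def order_of(label, structures):
    """Return the order of one element (assumed rule: one step above its parts)."""
    parts = structures.get(label)
    if not parts:
        # A bare proposition: scored as (minimally) formal in the appendix.
        return Order.FORMAL
    highest = max(order_of(part, structures) for part in parts)
    # Cap at the highest order modeled in this sketch.
    return Order(min(highest + 1, Order.METASYSTEMATIC))


def score_statement(structures):
    """Score the statement at the highest order found among its structures."""
    return max(order_of(label, structures) for label in structures)


print(score_statement(STRUCTURES).name)  # METASYSTEMATIC
```

Under these assumptions, Systems A, B, and C come out as systematic and D as metasystematic, which matches the score assigned in the text. A human rater still has to decide which propositions a structure actually coordinates; the sketch only records that decision.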


Appendix B

item         ESTIMATE    UNWGHTED FIT   WGHTED FIT
                         MNSQ      T    MNSQ      T
 1 MLIFE1       0.198    1.55    2.3    1.40    2.1
 2 MLIFE2       0.123    1.03    0.2    1.22    1.3
 3 MLIFE3      -0.256    1.31    1.4    1.31    1.7
 4 MLIFE4       0.021    1.11    0.5    1.18    1.0
 5 MLAW1       -0.259    1.20    0.8    1.47    1.9
 6 MLAW2       -0.685    1.24    0.9    1.34    1.5
 7 MLAW3       -0.020    0.59   -1.6    0.61   -2.1
 8 MLAW4       -0.178    1.05    0.3    1.28    1.2
 9 MCONS1       0.189    0.96   -0.1    1.05    0.4
10 MCONS2      -0.211    1.00    0.1    1.10    0.5
11 MCONS3      -0.092    1.27    1.0    1.35    1.5
12 MCONS4      -0.216    1.17    0.7    1.28    1.2
13 MPUN1       -0.730    1.12    0.5    1.39    1.8
14 MPUN2       -0.669    0.97   -0.0    1.06    0.3
15 MPUN3       -1.839    1.12    0.5    1.48    1.8
16 MPUN4       -0.473    0.93   -0.1    1.05    0.3
17 MCONT1       0.898    1.11    0.6    1.22    1.3
18 MCONT2      -0.145    1.15    0.7    1.36    1.9
19 MCONT3      -0.190    1.02    0.2    1.17    0.9
20 MCONT4       0.060    1.19    0.8    1.14    0.8
21 MAUTH1       0.467    0.84   -0.5    1.05    0.3
22 MAUTH2       0.836    2.73    4.2    1.42    1.5
23 MAUTH3       0.924    1.25    0.8    1.14    0.6
24 MAUTH4       2.248*
25 GLIFE1       0.870    0.95   -0.2    1.02    0.2
26 GLIFE2      -0.254    0.87   -0.7    0.95   -0.3
27 GLIFE3      -0.318    0.95   -0.2    1.03    0.2
28 GLIFE4      -0.387    0.69   -1.8    0.82   -1.2
29 GWORK1       0.940    0.78   -1.4    0.86   -1.0
30 GWORK2      -0.002    0.95   -0.3    0.86   -0.9
31 GWORK3      -0.547    7.38   16.6    0.91   -0.5
32 GWORK4      -0.485    0.81   -1.1    0.86   -0.9


item      step ESTIMATE    UNWGHTED FIT   WGHTED FIT
                           MNSQ      T    MNSQ      T
 1 MLIFE1    1   -2.317    0.87   -0.8    1.59    1.6
 1 MLIFE1    2   -4.184    1.97    4.7    1.43    1.3
 1 MLIFE1    3   -0.139    0.81   -1.2    0.98   -0.1
 1 MLIFE1    4    0.935    0.56   -3.2    0.86   -1.0
 1 MLIFE1    5    2.059    0.73   -1.8    1.06    0.3
 2 MLIFE2    1   -2.076    0.20   -7.7    0.83   -0.3
 2 MLIFE2    2   -5.025    0.14   -8.9    0.54   -1.5
 2 MLIFE2    3   -0.537    0.69   -2.1    0.95   -0.2
 2 MLIFE2    4    0.930    0.58   -3.0    0.86   -1.0
 2 MLIFE2    5    2.394    1.26    1.5    1.36    1.6
 3 MLIFE3    1   -1.723    0.59   -2.9    1.25    1.0
 3 MLIFE3    2   -4.509    0.63   -2.5    1.24    1.0
 3 MLIFE3    3   -0.618    1.32    1.8    1.32    1.7
 3 MLIFE3    4    1.041    1.43    2.4    1.12    1.2
 3 MLIFE3    5    2.729    0.71   -1.9    1.04    0.2
 4 MLIFE4    1   -2.064    0.47   -3.8    0.87   -0.2
 4 MLIFE4    2   -4.485    0.48   -3.7    0.92   -0.1
 4 MLIFE4    3   -1.528    1.60    3.0    1.40    1.6
 4 MLIFE4    4    1.038    0.60   -2.6    0.86   -0.9
 4 MLIFE4    5    2.535    0.57   -2.9    0.98   -0.0
 5 MLAW1     1   -2.699    1.59    2.2    0.97    0.0
 5 MLAW1     2   -2.338    2.90    5.5    1.12    0.5
 5 MLAW1     3   -0.249    1.15    0.7    1.11    0.4
 5 MLAW1     4   -0.480    0.66   -1.5    0.93   -0.2
 5 MLAW1     5    2.277    0.48   -2.7    0.97    0.0
 6 MLAW2     1   -5.750    0.29   -4.2    0.91    0.1
 6 MLAW2     2   -1.938    0.48   -2.6    1.10    0.4
 6 MLAW2     3   -0.742    0.75   -1.1    1.32    0.9
 6 MLAW2     4    0.073    0.93   -0.2    1.05    0.3


item      step ESTIMATE    UNWGHTED FIT   WGHTED FIT
                           MNSQ      T    MNSQ      T
 6 MLAW2     5    3.034    0.68   -1.4    1.09    0.4
 7 MLAW3     1   -3.954    0.13   -5.5    0.53   -1.2
 7 MLAW3     2   -2.600    0.26   -4.0    0.65   -0.8
 7 MLAW3     3   -0.961    0.36   -3.2    0.55   -1.9
 7 MLAW3     4    0.583    0.53   -2.1    0.73   -1.3
 7 MLAW3     5    3.545    0.26   -4.0    0.83   -0.2
 8 MLAW4     1   -4.123    0.34   -3.6    0.79   -0.5
 8 MLAW4     2   -1.438    0.42   -3.0    0.79   -0.7
 8 MLAW4     3   -0.978    0.49   -2.5    0.79   -0.6
 8 MLAW4     4    0.303    1.23    1.0    1.16    0.8
 8 MLAW4     5    2.860    0.88   -0.4    1.29    0.9
 9 MCONS1    1   -3.277    0.07   -8.9    0.31   -2.1
 9 MCONS1    2   -3.783    0.23   -5.9    0.57   -1.2
 9 MCONS1    3   -0.598    0.44   -3.6    0.65   -1.9
 9 MCONS1    4    0.411    0.55   -2.7    0.82   -1.0
 9 MCONS1    5    2.551    0.62   -2.2    1.11    0.5
10 MCONS2    1    2.409    0.29   -4.1    0.92   -0.0
10 MCONS2    2   -8.878    0.29   -4.1    0.92   -0.0
10 MCONS2    3   -1.086    0.27   -4.3    0.52   -1.7
10 MCONS2    4    0.647    0.69   -1.4    0.87   -0.7
10 MCONS2    5    3.135    0.42   -3.0    0.95   -0.0
11 MCONS3    1    2.086    2.35    4.0    1.18    0.6
11 MCONS3    2   -8.092    2.34    4.0    1.18    0.6
11 MCONS3    3   -0.215    0.99    0.1    1.05    0.3
11 MCONS3    4    0.619    1.07    0.4    1.14    0.9
11 MCONS3    5    2.891    0.46   -2.6    0.98    0.1
12 MCONS4    1    2.433    0.45   -2.6    1.08    0.3
12 MCONS4    2   -8.239    0.45   -2.6    1.08    0.3
12 MCONS4    3   -1.359    0.66   -1.4    0.85   -0.5
12 MCONS4    4    1.609    0.64   -1.5    0.90   -0.5
12 MCONS4    5    2.338    0.65   -1.4    1.04    0.2
13 MPUN1     1   -3.172    3.51    7.1    1.20    0.7
13 MPUN1     2   -1.398    1.45    1.9    1.24    1.0
13 MPUN1     3    0.124    0.88   -0.5    0.99    0.0
13 MPUN1     4    0.462    1.00    0.1    0.92   -0.5
14 MPUN2     1   -2.969    0.54   -2.0    0.86   -0.3
14 MPUN2     2   -1.434    0.61   -1.6    0.82   -0.5
14 MPUN2     3    0.652    0.67   -1.3    0.85   -0.6
14 MPUN2     4    0.490    0.70   -1.2    0.94   -0.4
15 MPUN3     1    1.933    0.55   -1.8    0.68   -1.8
15 MPUN3     2   -4.310    0.56   -1.8    0.68   -1.8
15 MPUN3     3   -0.250    0.81   -0.6    0.88   -0.9
16 MPUN4     1    3.016    0.25   -3.6    0.57   -1.1
16 MPUN4     2   -8.050    0.25   -3.6    0.57   -1.1
16 MPUN4     3   -0.141    0.62   -1.4    0.81   -0.8
16 MPUN4     4    2.124    0.52   -1.8    0.87   -0.4
17 MCONT1    1   -2.837    1.48    2.5    1.66    1.8
17 MCONT1    2   -3.727    0.43   -4.4    0.86   -0.4
17 MCONT1    3   -0.118    0.69   -2.0    0.89   -0.6
17 MCONT1    4   -1.163    0.78   -1.4    1.10    0.6
17 MCONT1    5    4.312    0.13   -8.7    0.89    0.0
18 MCONT2    1   -5.104    0.12   -8.2    0.75   -0.4
18 MCONT2    2   -3.114    0.23   -6.4    0.69   -1.1
18 MCONT2    3   -0.091    0.74   -1.6    1.04    0.3


item      step ESTIMATE    UNWGHTED FIT   WGHTED FIT
                           MNSQ      T    MNSQ      T
18 MCONT2    4    0.065    0.86   -0.8    1.13    0.8
18 MCONT2    5    4.076    0.41   -4.2    0.99    0.1
19 MCONT3    1   -4.659    0.20   -6.5    0.72   -0.6
19 MCONT3    2   -2.376    1.88    3.7    1.23    0.8
19 MCONT3    3    0.043    1.11    0.6    1.01    0.1
19 MCONT3    4   -0.113    0.86   -0.7    1.07    0.5
19 MCONT3    5    4.319    0.22   -6.2    0.94    0.1
20 MCONT4    1   -1.474    0.54   -2.6    1.15    0.5
20 MCONT4    2   -4.013    0.63   -2.1    1.21    0.6
20 MCONT4    3    0.746    1.14    0.7    1.23    1.1
20 MCONT4    4   -1.081    0.79   -1.1    1.04    0.2
20 MCONT4    5    3.461    0.38   -4.0    0.99    0.2
21 MAUTH1    2   -5.424    0.36   -3.5    0.68   -0.8
21 MAUTH1    3    0.059    0.63   -1.7    0.79   -1.0
21 MAUTH1    4    1.038    0.49   -2.6    0.77   -1.2
21 MAUTH1    5    1.440    0.87   -0.5    1.14    0.6
22 MAUTH2    3   -0.696    2.16    3.4    1.61    2.2
22 MAUTH2    4   -1.536    7.71   11.0    1.40    2.0
23 MAUTH3    3   -1.040    1.57    1.7    1.19    0.8
23 MAUTH3    4   -0.715    0.83   -0.5    1.00    0.0
23 MAUTH3    5    0.725    0.60   -1.4    0.94   -0.0
24 MAUTH4    3   -0.935    1.25    0.8    1.40    1.5
24 MAUTH4    4   -1.502    1.75    2.0    1.33    1.3
25 GLIFE1    1   -4.236    1.00    0.1    0.98   -0.0
25 GLIFE1    2    0.305    0.75   -1.8    1.03    0.3
26 GLIFE2    1   -4.055    0.39   -4.7    0.81   -0.8
26 GLIFE2    2   -0.418    0.91   -0.5    1.15    1.0
27 GLIFE3    1   -3.121    1.26    1.4    1.12    0.6
27 GLIFE3    2   -0.266    1.40    2.1    1.12    0.9
28 GLIFE4    1   -5.263    0.31   -5.5    0.75   -0.8
28 GLIFE4    2    0.272    0.81   -1.1    0.96   -0.2
29 GWORK1    1   -4.504    0.27   -7.4    0.73   -1.1
29 GWORK1    2   -0.189    9.24   23.2    0.94   -0.5
30 GWORK2    1   -4.761    0.97   -0.2    0.69   -1.1
30 GWORK2    2   -0.803    1.01    0.1    1.05    0.3
31 GWORK3    1   -3.786    8.96   21.0    0.90   -0.4
31 GWORK3    2   -0.524    9.17   21.3    1.02    0.2
32 GWORK4    1   -3.720    1.06    0.4    1.03    0.2
32 GWORK4    2   -0.143    1.10    0.7    0.96   -0.4
33 GFRIEND1  1   -4.706    1.06    0.4    0.99    0.0
33 GFRIEND1  2    0.721    1.10    0.7    1.07    0.7
34 GFRIEND2  1   -5.059    0.35   -5.4    0.62   -1.4
34 GFRIEND2  2    0.108    1.03    0.2    1.07    0.5
35 GFRIEND3  1   -3.955    0.54   -3.4    0.79   -0.9
35 GFRIEND3  2   -0.136    0.77   -1.5    0.98   -0.1
36 GFRIEND4  1   -3.946    1.48    2.7    1.05    0.3
36 GFRIEND4  2    0.038    0.79   -1.4    1.01    0.1
37 GPERSON1  1   -4.350    0.46   -3.8    0.85   -0.5
37 GPERSON1  2    0.503    0.93   -0.4    1.17    1.4
38 GPERSON2  1   -5.158    0.58   -2.6    1.13    0.4
38 GPERSON2  2   -0.253    1.03    0.2    1.19    1.1
39 GPERSON3  1   -5.848    5.27   11.7    1.08    0.3
39 GPERSON3  2   -0.256    1.17    0.9    1.16    0.9


item      step ESTIMATE    UNWGHTED FIT   WGHTED FIT
                           MNSQ      T    MNSQ      T
40 GPERSON4  1   -5.572    0.15   -7.2    0.57   -1.1
40 GPERSON4  2   -0.247    0.71   -1.6    0.89   -0.5
41 GOODED1   1   -6.778    0.49   -4.8    1.13    0.6
41 GOODED1   2   -2.393    1.14    1.1    1.34    1.9
41 GOODED1   3    2.246    0.68   -2.7    0.94   -0.4
42 GOODED2   1   -7.064    0.45   -5.3    1.11    0.5
42 GOODED2   2   -1.615    1.15    1.2    1.12    0.8
42 GOODED2   3    1.924    0.70   -2.6    0.99   -0.1
43 GOODED3   1   -7.813    0.30   -7.8    0.97   -0.0
43 GOODED3   2   -1.108    1.21    1.6    1.09    0.6
43 GOODED3   3    1.544    0.83   -1.3    1.15    1.1
44 GOODED4   1   -6.734    0.31   -7.6    0.72   -1.5
44 GOODED4   2   -2.151    0.59   -3.7    0.90   -0.6
44 GOODED4   3    1.656    0.80   -1.7    1.01    0.1
45 GOODED5   1   -6.976    0.79   -1.8    1.08    0.4
45 GOODED5   2   -1.210    0.86   -1.1    0.88   -0.7
45 GOODED5   3    1.201    1.00    0.0    0.94   -0.4

1 2 3

11.752 10.534 0.965 15.923 0.898 0.924


Appendix C

VARIABLES                        UNWGHTED FIT    WGHTED FIT
item         ESTIMATE   ERROR    MNSQ     T      MNSQ     T

 1 MLIFE1     -0.202    0.105    0.76   -1.8     0.88   -0.9
 2 MLIFE2      0.917    0.108    0.85   -1.1     1.05    0.4
 3 MLIFE3      0.569    0.106    0.75   -1.9     0.80   -1.7
 4 MLIFE4      0.831    0.115    0.74   -1.9     0.79   -1.7
 5 MLAW1       0.661    0.126    0.67   -2.3     0.71   -2.2
 6 MLAW2      -0.585    0.134    0.74   -1.7     0.76   -1.7
 7 MLAW3       0.978    0.135    0.59   -2.9     0.58   -3.4
 8 MLAW4      -0.380    0.128    0.74   -1.8     0.83   -1.2
 9 MCONS1      1.044    0.120    0.66   -2.5     0.71   -2.4
10 MCONS2      0.705    0.133    0.66   -2.4     0.61   -3.2
11 MCONS3      0.843    0.131    0.71   -1.9     0.82   -1.3
12 MCONS4      0.711    0.135    0.73   -1.8     0.74   -2.0
13 MPUN1       0.288    0.126    0.80   -1.3     1.00    0.0
14 MPUN2       0.233    0.134    0.69   -2.0     0.75   -1.9
15 MPUN3      -1.348    0.151    0.67   -2.2     0.70   -2.5
16 MPUN4       0.305    0.143    0.69   -2.0     0.74   -2.0
17 MCONT1      1.702    0.108    0.69   -2.4     0.76   -2.0
18 MCONT2      0.763    0.118    0.70   -2.2     0.79   -1.7
19 MCONT3      0.647    0.118    0.78   -1.5     0.86   -1.0
20 MCONT4      0.868    0.116    0.73   -1.9     0.80   -1.5
21 MAUTH1      1.090    0.129    0.68   -2.2     0.83   -1.2
22 MAUTH2      1.537    0.142    0.91   -0.5     0.77   -1.7
23 MAUTH3      1.555    0.137    0.75   -1.6     0.79   -1.5
24 MAUTH4      2.712    0.148    0.73   -1.7     0.77   -1.8
25 GLIFE1     -0.320    0.102    0.79   -1.6     1.00    0.0
26 GLIFE2     -0.563    0.112    0.90   -0.7     0.93   -0.5
27 GLIFE3     -0.858    0.105    0.91   -0.6     1.07    0.5
28 GLIFE4     -0.617    0.111    0.83   -1.2     0.97   -0.2
29 GWORK1     -0.301    0.098    0.72   -2.3     0.88   -0.9
30 GWORK2     -0.921    0.109    0.82   -1.3     0.87   -1.0
31 GWORK3     -0.763    0.106    0.69   -2.4     0.67   -2.7

32 GWORK4 33 GFRIEND1 34 GFRIEND2 35 GFRIEND3 36 GFRIEND4 37 GPERSON1 38 GPERSON2 39 GPERSON3 40 GPERSON4 41 GOODED1 42 GOODED2 43 GOODED3 44 GOODED4 45 GOODED5

-1.009 -1.077 0.522 -0.835 -0.322 -0.840 0.605

-0.134 -0.565 -1.207 -2.155 -1.524 -2.278 -1.282*

0.104 0.104 0.109 0.107 0.103 0.112 0.117 0.115 0.122 0.118 0.116 0.118 0.117

0.70 -2.3 0.83 -1.4 0.88 -0.8 0.90 -0.7 0.77 -1.7 0.89 -0.9 0.77 -1.7 0.88 -0.9 0.72 -2.2 0.84 -1.2 0.78 -1.5 0.93 -0.5 0.88 -0.8 0.99 -0.0 0.87 -0.8 1.04 0.4 0.67 -2.4 0.79 -1.6 0.73 -2.2 0.66 -3.1 0.64 -3.2 0.57 -3.9 0.84 -1.3 0.67 -2.8 0.74 -2.2 0.71 -2.6

1 MLIFE1 1 MLIFE1 1 MLIFE1 1 MLIFE1 1 MLIFE1 1 MLIFE1 1 MLIFE1 2 MLIFE2 2 MLIFE2 2 MLIFE2 2 MLIFE2 2 MLIFE2 3 MLIFE3 3 MLIFE3 3 MLIFE3 3 MLIFE3 3 MLIFE3 4 MLIFE4 4 MLIFE4 4 MLIFE4 4 MLIFE4 4 MLIFE4 5 MLAW1 5 MLAW1 5 MLAW1 5 MLAW1 5 MLAW1 6 MLAW2

1 2 3 4 5 6 7 3 4 5 6 7 3 4 5 6 7 3 4 5 6 7 3 4 5 6 7 1

1.238 -8.489 -0.459 -2.548 1.129 1.871 2.799

-1.443 -4.547 -0.417 0.681 1.925

-1.134 -4.051 -0.499 0.823 2.280 -1.453 -4.033 -1.315 0.812 2.046 -2.207 -1.944 -0.038 -0.633 1.834 3.266

0.653 0.653 0.511 0.464 0.286 0.312 0.421 0.539 0.508 0.292 0.292 0.405 0.432 0.420 0.294 0.281 0.405 0.573 0.543 0.349 0.309 0.425 0.575 0.519 0.458 0.440 0.531 0.953

0.09 -10.1 0.09 -10.1 1.09 0.6 1.91 4.5 0.88 -0.7 0.53 -3.5 0.80 -1.3 0.20 -7.7 0.22 -7.3 0.76 -1.5 0.84 -1.0 1.98 4.8 0.39 -4.8 0.40 -4.7 0.92 -0.5 0.95 -0.3 1.10 0.6 2.32 5.6 2.28 5.5 1.27 1.5 0.51 -3.4 0.35 -5.1 1.54 2.1 1.99 3.4 0.89 -0.4 0.68 -1.5 0.40 -3.3 0.61 -1.9

0.51 -1.3 0.51 -1.3 1.37 1.2 1.22 0.8 0.99 -0.0 0.78 -1.6 0.91 -0.3 0.84 -0.4 0.84 -0.4 0.82 -1.3 0.97 -0.2 1.27 1.1 0.95 -0.1 0.97 -0.1 0.99 -0.0 1.07 0.7 0.94 -0.2 0.87 -0.3 0.98 0.1 1.16 0.8 0.68 -2.5 0.81 -0.7 0.88 -0.2 1.02 0.2 1.07 0.4 0.95 -0.1

1.26 0.6 0.80 -0.7

6 MLAW2 6 MLAW2 6 MLAW2 6 MLAW2 6 MLAW2 6 MLAW2 7 MLAW3 7 MLAW3 7 MLAW3 7 MLAW3 7 MLAW3 8 MLAW4 8 MLAW4 8 MLAW4 8 MLAW4 8 MLAW4 8 MLAW4 8 MLAW4 9 MCONS1 9 MCONS1 9 MCONS1 9 MCONS1 9 MCONS1 10 MCONS2 10 MCONS2 10 MCONS2 10 MCONS2 10 MCONS2 11 MCONS3 11 MCONS3 11 MCONS3 11 MCONS3 11 MCONS3 12 MCONS4 12 MCONS4 12 MCONS4 12 MCONS4 12 MCONS4 13 MPUN1 13 MPUN1 13 MPUN1 13 MPUN1 14 MPUN2 14 MPUN2 14 MPUN2 14 MPUN2 15 MPUN3 15 MPUN3 15 MPUN3 15 MPUN3 15 MPUN3 16 MPUN4 16 MPUN4

2 3 4 5 6 7 3 4 5 6 7 1 2 3 4 5 6 7 3 4 5 6 7 3 4 5 6 7 3 4 5 6 7 3 4 5 6 7 3 4 5 6 3 4 5 6 1 2 3 4 5 3 4

-4.737 -8.124 -0.411 0.325 0.713 3.334

-3.211 -2.108 -0.806 0.328 2.929

-4.566 2.630

-7.259 0.156 0.264 1.279 3.541

-2.575 -3.441 -0.422 0.234 2.045 2.332

-7.550 -0.934 0.354 2.604 2.719

-7.314 -0.184 0.326 2.347 2.391 -7.002 -1.117 1.323 1.757

-2.404 -1.226 0.140 0.126

-2.484 -1.062 0.653 0.151 3.822

-1.696 -1.642 -3.565 0.302 3.021

-7.291

0.953 0.952 0.666 0.562 0.453 0.472 0.832 0.675 0.488 0.459 0.779 0.692 0.666 0.666 0.556 0.501 0.440 0.580 0.657 0.564 0.371 0.360 0.466 0.714 0.714 0.474 0.414 0.576 0.657 0.657 0.424 0.426 0.648 0.669 0.668 0.480 0.456 0.587 0.517 0.441 0.388 0.379 0.568 0.500 0.458 0.451 0.468 0.468 0.468 0.468 0.437 0.690 0.690

0.61 -1.9 0.61 -1.9 1.10 0.5 1.52 2.0 1.10 0.5 0.63 -1.7 0.11 -5.8 0.31 -3.6 0.40 -2.9 0.57 -1.9 0.32 -3.5 1.52 1.9 1.55 2.0 1.55 2.0 1.44 1.6 1.00 0.1 1.49 1.8 0.52 -2.3 0.10 -8.1 0.22 -6.1 0.43 -3.7 0.78 -1.2 0.55 -2.7 0.16 -5.6 0.16 -5.6 0.47 -2.7 0.54 -2.2 0.50 -2.5 2.73 4.8 2.72 4.8 0.79 -0.8 0.98 0.0 0.42 -2.9 0.51 -2.2 0.51 -2.2 0.53 -2.1 0.51 -2.2 0.44 -2.7 1.64 2.5 1.21 0.9 1.07 0.4 1.23 1.0 0.91 -0.2 0.89 -0.3 0.75 -0.9 0.98 0.0 0.63 -1.4 0.63 -1.4 0.63 -1.4 0.63 -1.4 0.82 -0.6 0.28 -3.3 0.28 -3.3

1.26 0.6 1.26 0.6 1.23 0.6 1.48 1.3 1.07 0.4 0.92 -0.3 0.48 -1.2 0.61 -0.8 0.59 -2.0 0.70 -1.6 0.92 0.0 1.11 0.4 1.03 0.2 1.02 0.2 0.98 0.0 1.06 0.3 1.06 0.4 1.05 0.3 0.51 -1.2 0.60 -1.2 0.64 -2.0 1.01 0.1 1.06 0.3 0.49 -1.1 0.49 -1.1 0.73 -1.1 0.69 -1.5 0.99 0.1 1.01 0.1 1.01 0.1 0.93 -0.3 1.08 0.6 0.92 -0.1 1.17 0.5 1.17 0.5 0.75 -0.9 0.67 -2.0 0.84 -0.5 1.22 0.7 1.24 1.0 1.10 0.6 1.00 0.1 0.93 -0.1 0.92 -0.2 0.92 -0.4 1.08 0.7 0.75 -1.5 0.75 -1.5 0.75 -1.5 0.75 -1.5 0.89 -1.0 0.66 -0.7 0.66 -0.7

16 MPUN4 16 MPUN4 17 MCONT1 17 MCONT1 17 MCONT1 17 MCONT1 17 MCONT1 18 MCONT2 18 MCONT2 18 MCONT2 18 MCONT2 18 MCONT2 19 MCONT3 19 MCONT3 19 MCONT3 19 MCONT3 19 MCONT3 20 MCONT4 20 MCONT4 20 MCONT4 20 MCONT4 20 MCONT4 21 MAUTH1 21 MAUTH1 21 MAUTH1 21 MAUTH1 22 MAUTH2 22 MAUTH2 23 MAUTH3 23 MAUTH3 23 MAUTH3 24 MAUTH4 24 MAUTH4 25 GLIFE1 25 GLIFE1 25 GLIFE1 25 GLIFE1 25 GLIFE1 25 GLIFE1 26 GLIFE2 26 GLIFE2 26 GLIFE2 26 GLIFE2 26 GLIFE2 27 GLIFE3 27 GLIFE3 27 GLIFE3 27 GLIFE3 27 GLIFE3 27 GLIFE3 28 GLIFE4 28 GLIFE4

5 6 3 4 5 6 7 3 4 5 6 7 3 4 5 6 7 3 4 5 6 7 4 5 6 7 5 6 5 6 7 5 6 1 2 3 4 5 6 2 3 4 5 6 1 2 3 4 5 6 1 2

-0.141 1.850

-2.242 -3.359 -0.019 -1.377 3.895 -4.359 -2.711 0.042 -0.171 3.557

-4.078 -2.033 0.180 -0.250 3.875

-0.976 -3.638 0.913

-1.200 3.046

-4.726 0.224 0.866 1.123 -0.298 -1.583 -0.858 -0.599 0.674

-0.671 -1.472 0.721

-8.305 2.028 -2.990 7.776

-3.890 -5.413 4.333

-5.865 6.873

-4.492 -0.160 -7.107 2.165

-1.941 7.145 -4.056 1.756

-7.053

0.514 0.593 0.491 0.432 0.318 0.316 0.764 0.763 0.489 0.352 0.328 0.630 0.659 0.472 0.355 0.332 0.744 0.540 0.510 0.363 0.353 0.620 0.651 0.419 0.449 0.537 0.439 0.439 0.512 0.501 0.655 0.518 0.547 0.434 0.434 0.337 0.330 0.295 0.295 0.427 0.376 0.377 0.303 0.303 0.414 0.414 0.341 0.335 0.296 0.296 0.475 0.475

0.72 -0.9 0.71 -1.0 0.53 -3.3 0.32 -5.5 0.59 -2.9 0.60 -2.8 0.14 -8.5 0.17 -7.2 0.21 -6.7 0.56 -2.9 0.78 -1.2 0.79 -1.2 0.14 -7.5 0.75 -1.4 1.07 0.4 1.03 0.2 0.20 -6.5 0.97 -0.1 0.87 -0.6 1.01 0.1 0.89 -0.5 0.32 -4.6 0.50 -2.5 0.71 -1.3 0.57 -2.1 1.04 0.2 1.66 2.2 3.36 5.7 1.39 1.2 0.90 -0.2 0.68 -1.1 1.45 1.3 1.56 1.6 1.17 1.1 1.17 1.1 0.92 -0.5 0.93 -0.5 0.88 -0.8 0.88 -0.8 0.50 -3.6 0.60 -2.8 0.60 -2.8 0.98 -0.1 0.98 -0.1 0.55 -3.1 0.55 -3.1 1.00 0.1 1.14 0.8 1.11 0.7 1.11 0.7 1.68 3.3 1.58 2.9

0.85 -0.6 1.00 0.1 1.27 1.0 0.73 -1.1 0.83 -1.1 0.87 -0.9 0.87 -0.0 0.72 -0.3 0.60 -1.5 0.80 -1.1 0.99 -0.0 1.08 0.3 0.62 -0.9 1.04 0.2 1.12 0.7 1.17 1.1 0.81 -0.1 1.10 0.4 1.20 0.7 1.06 0.4 0.93 -0.4 0.86 -0.2 0.75 -0.4 0.85 -0.6 0.89 -0.6 1.16 0.7 1.38 1.8 1.29 1.6 1.28 1.1 1.05 0.3 1.02 0.2 1.44 1.8 1.24 0.9 1.07 0.3 1.06 0.3 0.87 -0.6 0.92 -0.3 1.09 0.7 1.10 0.7 0.91 -0.2 0.85 -0.6 0.85 -0.6 1.16 1.2 1.16 1.2 0.82 -0.7 0.82 -0.7 1.04 0.3 1.11 0.6 1.17 1.5 1.17 1.5 0.82 -0.5 0.82 -0.5

VARIABLES

item step

28 GLIFE4 28 GLIFE4 28 GLIFE4 28 GLIFE4 29 GWORK1 29 GWORK1 29 GWORK1 29 GWORK1 29 GWORK1 29 GWORK1 30 GWORK2 30 GWORK2 30 GWORK2 30 GWORK2 30 GWORK2 30 GWORK2 31 GWORK3 31 GWORK3 31 GWORK3 31 GWORK3 31 GWORK3 31 GWORK3 32 GWORK4 32 GWORK4 32 GWORK4 32 GWORK4 32 GWORK4 32 GWORK4 33 GFRIEND1 33 GFRIEND1 33 GFRIEND1 33 GFRIEND1 33 GFRIEND1 33 GFRIEND1 34 GFRIEND2 34 GFRIEND2 34 GFRIEND2 34 GFRIEND2 35 GFRIEND3 35 GFRIEND3 35 GFRIEND3 35 GFRIEND3 35 GFRIEND3 35 GFRIEND3 36 GFRIEND4 36 GFRIEND4 36 GFRIEND4 36 GFRIEND4 36 GFRIEND4 36 GFRIEND4 37 GPERSON1 37 GPERSON1 37 GPERSON1

3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3

UNWGHTED FIT WGHTED FIT

ESTIMATE ERROR MNSQ T MNSQ T

3.913 -6.520 7.371

-4.296 0.765

-8.155 4.939

-6.187 7.624 -4.208 0.108

-7.918 4.772

-5.877 7.373

-4.431 1.362

-6.997 4.359

-5.415 6.775

-4.490 0.297

-7.470 4.836

-5.375 7.388

-4.072 -4.309 -3.943 4.567

-5.645 4.700 -0.011 3.056

-7.745 3.454

-2.644 -0.046 -7.130 4.800

-5.517 7.403

-4.115 1.818

-7.021 4.505

-6.039 7.100

-4.403 0.176

-7.572 4.745

0.435 0.435 0.310 0.310 0.428 0.428 0.345 0.345 0.262 0.262 0.476 0.476 0.398 0.398 0.290 0.290 0.357 0.357 0.330 0.330 0.271 0.271 0.372 0.372 0.326 0.326 0.270 0.270 0.373 0.369 0.338 0.337 0.265 0.269 0.413 0.413 0.288 0.288 0.403 0.403 0.340 0.340 0.283 0.283 0.372 0.372 0.331 0.331 0.281 0.281 0.450 0.450 0.390

0.28 -5.9 0.28 -5.9 0.83 -1.0 0.83 -1.0 0.19 -9.0 0.18 -9.0 0.43 -5.2 0.42 -5.2 6.41 18.1 6.62 18.6 0.78 -1.5 0.78 -1.5 0.82 -1.2 0.82 -1.2 0.90 -0.6 0.90 -0.6 0.48 -4.1 0.48 -4.1 0.65 -2.6 0.65 -2.5 0.83 -1.1 0.83 -1.1 0.39 -5.1 0.39 -5.1 0.63 -2.7 0.62 -2.7 0.88 -0.7 0.88 -0.7 1.85 4.5 1.83 4.5 1.36 2.2 1.34 2.1 1.01 0.1 0.90 -0.6 0.31 -6.0 0.31 -6.0 1.01 0.1 0.94 -0.3 0.52 -3.5 0.52 -3.5 0.65 -2.4 0.65 -2.4 1.00 0.1 1.00 0.1 1.22 1.3 1.22 1.3 1.12 0.8 1.11 0.7 0.68 -2.3 0.68 -2.3 0.54 -3.1 0.54 -3.1 0.63 -2.3

0.63 -1.6 0.63 -1.6 1.02 0.2 1.03 0.2 0.64 -1.5 0.63 -1.5 0.78 -1.0 0.78 -1.0 0.92 -0.6 0.92 -0.6 0.71 -0.9 0.71 -0.9 0.59 -1.8 0.59 -1.8 0.96 -0.2 0.96 -0.2 0.71 -1.4 0.71 -1.4 0.80 -1.0 0.80 -1.0 0.90 -0.8 0.90 -0.8 0.81 -0.9 0.81 -0.9 0.84 -0.8 0.83 -0.8 1.00 0.0 1.00 0.0 1.25 1.2 1.16 0.8 1.00 0.1 1.00 0.1 1.05 0.5 1.02 0.2 0.59 -1.7 0.59 -1.7 1.12 0.9 1.07 0.5 0.88 -0.4 0.88 -0.4 0.84 -0.8 0.84 -0.8 1.11 0.9 1.11 0.9 0.82 -0.8 0.81 -0.8 0.91 -0.4 0.91 -0.4 0.84 -1.3 0.84 -1.3 0.92 -0.2 0.91 -0.2 0.82 -0.7

37 GPERSON1 37 GPERSON1 37 GPERSON1 38 GPERSON2 38 GPERSON2 38 GPERSON2 38 GPERSON2 39 GPERSON3 39 GPERSON3 39 GPERSON3 39 GPERSON3 39 GPERSON3 39 GPERSON3 40 GPERSON4 40 GPERSON4 40 GPERSON4 40 GPERSON4 40 GPERSON4 40 GPERSON4 41 GOODED1 41 GOODED1 41 GOODED1 42 GOODED2 42 GOODED2 42 GOODED2 42 GOODED2 43 GOODED3 43 GOODED3 43 GOODED3 44 GOODED4 44 GOODED4 44 GOODED4 44 GOODED4 45 GOODED5 45 GOODED5 45 GOODED5

4 -5.729 5 7.848 6 -3.882 3 3.151 4 -7.855 5 6.160 6 -5.700 1 2.473 2 -7.356 3 3.844 4 -7.306 5 7.281 6 -4.767 1 0.824 2 -7.965 3 4.620 4 -6.823 5 7.709 6 -4.440 3 -4.543 4 -1.736 5 1.605 2 -2.889 3 -4.501 4 -0.249 5 2.115 3 -5.199 4 -0.720 5 0.932 2 -3.601 3 -3.978 4 -0.541 5 2.086 3 -4.594 4 -0.760 5 0.682

0.390 0.312 0.313 0.499 0.499 0.331 0.331 0.584 0.584 0.558 0.558 0.351 0.351 0.715 0.715 0.564 0.564 0.371 0.371 0.314 0.256 0.226 0.316 0.307 0.244 0.224 0.309 0.241 0.223 0.326 0.308 0.251 0.222 0.298 0.247 0.227

0.63 -2.3 0.85 -0.8 0.85 -0.8 0.32 -5.0 0.32 -5.0 1.54 2.6 1.55 2.6 1.96 4.0 1.96 4.0 1.88 3.8 1.88 3.7 1.30 1.5 1.30 1.5 0.08 -8.6 0.08 -8.6 0.13 -7.5 0.13 -7.5

0.80 -1.0 0.73 -2.3 1.00 0.1 0.72 -2.3 0.44 -5.5 0.45 -5.4 0.79 -1.7 0.76 -1.9 0.54 -4.4 0.85 -1.2 0.96 -0.2 0.38 -6.5 0.40 -6.2 0.65 -3.1 0.76 -2.0 0.55 -4.2 0.63 -3.3 0.68 -2.8

0.80 -1.0

0.82 -0.7 1.16 1.2 1.16 1.2 0.79 -0.6 0.78 -0.6 1.20 1.2 1.20 1.2 1.01 0.1 1.01 0.1 0.90 -0.2 0.90 -0.2 1.27 1.6 1.27 1.6 0.45 -1.2 0.45 -1.2 0.45 -1.6 0.45 -1.6 0.96 -0.1 0.96 -0.1 1.09 0.5 1.03 0.3 0.95 -0.6 0.95 -0.2 0.90 -0.5 0.92 -0.6 0.96 -0.5 0.85 -0.7 0.80 -1.5 1.08 0.7 0.92 -0.4 0.86 -0.7 0.94 -0.4 0.97 -0.3 0.90 -0.5 0.76 -1.8 0.88 -1.3

JOURNAL OF APPLIED MEASUREMENT, 3(2), 190-204

Copyright © 2002

Development of a Functional Movement Scale for Infants

Suzann K. Campbell University of Illinois at Chicago

Benjamin D. Wright

J. Michael Linacre University of Chicago

The increasing survival rate of infants with a complicated birth and perinatal history generated the need for a test of functional motor performance with the capability of identifying children under four months of age with delayed development which could be addressed with physical therapy. This paper describes a Rasch analysis of the psychometric qualities of the Test of Infant Motor Performance (TIMP) for the purpose of reducing the length of the test while maintaining its precision as a measurement device. Following analysis of fit statistics, item-to-total correlations, redundancy of item difficulty measures, and consideration of clinically-relevant features of test items from analysis of 1732 tests, the TIMP was reduced from 59 to 42 items forming a functional motor scale for prematurely born infants. The resulting person separation index was 4.85 and the item separation index was 23.19.

Requests for reprints should be sent to Suzann K. Campbell, Department of Physical Therapy, University of Illinois at Chicago, 1919 W. Taylor Street M/C 898, Chicago, IL 60612-7251, email: skc@uic.edu.
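The separation indices reported in the abstract can be read as reliabilities through the standard Rasch relationship R = G^2 / (1 + G^2), where G is the separation index. The sketch below is illustrative only and is not taken from the paper; the function names are hypothetical, and it assumes the reported indices follow this conventional definition.

```python
import math


def separation_to_reliability(separation: float) -> float:
    """Convert a Rasch separation index G to reliability R = G^2 / (1 + G^2).

    Assumes the conventional definition of separation (ratio of the
    "true" standard deviation to the root mean square error).
    """
    if separation < 0:
        raise ValueError("separation index must be non-negative")
    g_squared = separation ** 2
    return g_squared / (1.0 + g_squared)


def reliability_to_separation(reliability: float) -> float:
    """Inverse conversion: G = sqrt(R / (1 - R)), defined for 0 <= R < 1."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


# Values quoted in the abstract (person and item separation indices).
person_reliability = separation_to_reliability(4.85)
item_reliability = separation_to_reliability(23.19)

print(f"person reliability: {person_reliability:.3f}")
print(f"item reliability:   {item_reliability:.3f}")
```

Under this assumption, a person separation of 4.85 corresponds to a reliability of roughly 0.96, and an item separation of 23.19 to a reliability above 0.99.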