
Measuring the Quality of Education:

A Report on Assessing Educational Progress

(Material taken from a news release and the Wirtz/Lapointe report.)

Since 1969 National Assessment of Educational Progress (NAEP) reports on the scholastic achievements of the nation’s elementary and secondary level students have been appearing frequently. The reactions to these reports are a perplexing mix of acclaim and criticism, of expressed respect but apparently slight regard.

In December 1980 Willard Wirtz and Archie E. Lapointe were asked by the Carnegie Corporation, Ford Foundation, and Spencer Foundation to undertake a 1-year study to evaluate the Assessment and consider the anomaly of the gap between the Assessment’s reputed quality among experts and its relatively slight public notice and influence. Measuring the Quality of Education is the product of that study.

Copies of the report are available from Wirtz and Lapointe, 1211 Connecticut Avenue, N.W., Washington, D.C. 20036. Price: $5.00 plus shipping.

The following recommendations were made:

I That the essential elements of the National Assessment be maintained as vital factors in implementing an educational standards policy: by reporting nationwide student achievement levels; by showing changes over time; by developing objectives and exercises consistent with effective educational quality and process; and by providing services to state and local assessment, testing, and standard-setting agencies.

II-A That the National Assessment program be restored to a basis permitting at least two assessments each year.

II-B That current Assessment sampling practices be maintained; that the one-hour limitation on the use of student time be increased to two hours; that continuing attention be given to possible economies in instrument administration practices; and that the Assessment reports include information regarding the courses students have had in the subject area assessed.

II-C That the Assessment not be expanded to permit reporting of results on a state-by-state basis; but that arrangements be made to facilitate use of the Assessment by state or local school agencies for comparisons with nationwide student achievement levels.

II-D That the Assessment be administered in the future on a grade-level basis at the fourth, eighth, and twelfth grades; and that it include, as it did originally, out-of-school seventeen-year-olds and a young adult group.

II-E That Assessment practices and policies relating to the setting of objectives be directed toward including all elements important in improving education in particular learning areas, and that increased efforts be made to develop patterns of objectives that will facilitate the establishing of higher educational standards.

II-F That the National Assessment procedures and capacity for developing instrument items be revised to meet the need for items that will measure with maximum accuracy, and in a variety of ways, students’ proficiencies as they relate to identified educational objectives.

II-G That National Assessment results be reported on an aggregated-item basis; that recently adopted practices regarding interpretation of these results be maintained; that no definitive qualitative judgments be included in Assessment announcements; and that an independent council be established (see recommendation IV) with the responsibility, among others, to improve public understanding of this type of data.

II-H That a fuller research and development component be included in the Assessment program.

II-I That the Assessment program be designed and administered to maximize its service function to state and local educational assessment and standard-setting agencies.

III-A That Assessment data be developed in forms facilitating their use for research purposes, including particularly the analysis of factors that may relate causally to student achievement.

III-B That a program of specific assessments be developed to illuminate particular educational policy issues.

IV That an Educational Assessment Council be established to synthesize data developed by various assessment and measuring systems, to improve communication of the meaning and significance of educational statistics, and to recommend changes in the processes and structure of the educational measurement system.

An overriding concern of this report is that NAEP’s reports and materials are not as widely known or as useful to educators as they should be. EM invited a number of prominent educators to address this concern. Six responses are printed here. The editor welcomes additional reactions/suggestions and will be happy to share them with the new Director of NAEP, Dr. Beverly Anderson, 1860 Lincoln Street, Suite 700, Denver, Colorado 80295.

More Useful Reports

While taking the opportunity to offer suggestions for improvement, I also want to acknowledge how helpful NAEP has been to us at the state level. After all, good marks should be recognized also.

1. The variety of publications disseminated by NAEP certainly enhances the use of its data by all types of audiences, including the lay person and the media. This area has definitely strengthened since its inception.

2. The comparative trends are quite helpful also, though they often reflect an inconsistency across ages which is disconcerting but nevertheless valuable as we look at the total process of education. The reality of the data often shocks us.

3. To my knowledge, no other group has had such a nationwide impact on improving testing practices as has NAEP, primarily because of the quality workshops that they sponsor each summer. The states would suffer without this staff development resource.

4. Though analysis is done from the existing data for minorities, males and females, various regions, and various economic groups, the sample should be increased so that the analytical results would be valid for each region. For example, are Blacks progressing differently in the West than in the South, or are scores for girls from the Midwest different from the scores of their counterparts in the Northeast?

5. In addition, I feel that certain minimal competencies in each of the areas tested should be tested with sufficient numbers of students so that comparable statewide data would be available. Note that the emphasis is on minimal skills that would be expected regardless of one’s geographic location. These data should be compared state by state. My judgment is that the variation from state to state would be slight even though their demographic variables are different, because nearly everyone is capable of reaching the minimum. These data would, however, add some measure of reassurance to the publics of the various states.

6. Adding to national assessment some measures appropriate to the learning disabled and the educable mentally handicapped would also be helpful, especially if it were related to how they were being provided services. In short, make the purpose of NAEP partly evaluative.

William J. Brown
Director of Research
North Carolina State Department of Public Instruction

A Distant Trumpet

The National Assessment of Educational Progress cannot be all things to all educators. What it is designed to do, it does well. The NAEP tells us regularly and consistently what students in the nation know and can do. But its generalizations of nationwide achievement may not absolutely mesmerize local school districts in the midst of current priorities. Although comprehensive in their item-by-item analyses and multigroup comparisons, NAEP reports can be overwhelming. However, the information can be helpful in buttressing local planning and needs assessment.

From the perspective of the local school district, the question is not so much how to make NAEP test data more accessible to local and state educators, which the recent Wirtz/Lapointe study proposes, as it is how to make them more relevant to local situations. To accomplish this might require a different design, which may not be what local districts want or care to support. Maintaining the integrity of current national and regional samples presents a continuing demand on a cooperative school district to commit itself to repeated testing over the years. It is unlikely that expanded time and effort will be available for NAEP to do more. Currently, NAEP-style testing violates a longstanding principle in some districts because it is not designed to supply specific results to participating students and schools. “For the good of a national sample” is not a very appealing reason for participation. Given the daily crises and priorities of local and state assessment efforts, I don’t view NAEP soon being more than it is now: a “distant trumpet.”

Margaret Fleming
Deputy Superintendent
Cleveland Public Schools

Using NAEP Results

I am thoroughly satisfied with NAEP’s services and materials.

In previous employment with another school district, I developed a minimal competency testing program in which I relied heavily on advice from NAEP, as well as their released test items. In addition, I subcontracted with an NAEP consultant to help us develop a scoring mechanism for a student writing sample.

My phone calls to NAEP regarding advice on testing issues, and particularly the use of NAEP material, have always received a prompt and thorough response. I have had several opportunities to attend the NAEP summer conference in Boulder and have found these to be extremely worthwhile.

About the only thing I would appreciate more from NAEP at present is a concise overview of the trends that occur in major discipline areas as a result of their testing over the years.

I found a recent bulletin most interesting, wherein students’ interpretation of literature was described as being in a state of decline. It would be helpful if longitudinal overviews of the trends occurring in such subject areas could be written up in a concise brochure form.

Jeremy M. Hughes
Superintendent
Haslett Public Schools, Michigan

Using National Assessment Data in Schools

National Assessment of Educational Progress (NAEP) was designed to provide a benchmark for designated objectives and to report on progress, or lack of it, toward those objectives. The reports were to reflect national trends. NAEP has, for the most part, operated as designed. Assessment has been done, reports written, and results reported widely by the media. Thus in one sense of the word “use,” NAEP findings have been used.

I want to clarify what I mean by the term “use” in this context. It means that teachers, principals, and other curriculum decisionmakers look at items and data, study the relationships of the items and data to the curriculum, and base some activity on the results of that study. “Activity” is the important implication, and it can range from discussing the results with colleagues to making major program changes.

Although NAEP has met its major objective of providing national benchmarks, considerable concern has been expressed that results are not used at the local level. This lack of use of assessment results is more the norm than the exception. In Pennsylvania we found that state assessment results often did not get used, primarily because few school staff ever use any standardized test results extensively. The most extensive use of test results continued to be by teachers who use their own or short, text-oriented tests to assign a grade or to group students.

However, it is not surprising that so little use is made of test results, because there is virtually no training in test interpretation or use in teacher or administrator preparation programs. A major effort is required if educators are to be convinced to use any test results effectively.

The most successful way we have found to get people “into the data” is through what we call item analysis processes. We have designed worksheets that show clusters of items, such as all items measuring basic operations with whole numbers. The percentage of students responding correctly to an item within the cluster is recorded for both the school and the state. The state percentage is then subtracted from the local percentage and is recorded with a plus or minus sign. We suggest looking for patterns of strengths or weaknesses based on the differences between the local scores and the comparison group.
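A minimal sketch of that worksheet computation, written here in Python, may make the process concrete. The item labels and percent-correct figures are hypothetical, not actual NAEP or Pennsylvania data:

```python
# Hypothetical illustration of the item-analysis worksheet described
# above. Item labels and percent-correct values are invented; a real
# worksheet would use actual school and state results.

cluster = {  # one cluster: basic operations with whole numbers
    "addition":       {"school": 82.0, "state": 78.0},
    "subtraction":    {"school": 74.0, "state": 77.0},
    "multiplication": {"school": 61.0, "state": 69.0},
    "division":       {"school": 55.0, "state": 66.0},
}

print(f"{'Item':<16}{'School %':>10}{'State %':>10}{'Diff':>8}")
for item, pct in cluster.items():
    # The state percentage is subtracted from the local (school)
    # percentage and recorded with a plus or minus sign.
    diff = pct["school"] - pct["state"]
    print(f"{item:<16}{pct['school']:>10.1f}{pct['state']:>10.1f}{diff:>+8.1f}")

# Look for patterns: a mostly negative cluster suggests a weakness.
weak = [item for item, pct in cluster.items() if pct["school"] < pct["state"]]
if len(weak) > len(cluster) / 2:
    print("Possible cluster-level weakness:", ", ".join(weak))
```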

Once a given set of items has been analyzed, we ask the “analysts” to consider why the students responded as they did and, if the pattern was negative, what they might do to improve the skills on which the responses are based. Answers to this question usually indicate that educators do know ways to change the status quo.

Based on this experience, we suggest that one way for NAEP to get classroom teachers or principals to use NAEP results is to prepare a special packet of usable materials for them. It might be a short list of similar items with national and regional response data. Subject matter teachers could administer the items to a class and quickly compute the percentages for each item. By comparing the percentages with national/regional data, the teachers could ascertain how these students fared according to the standard. The action the teacher takes would depend on the results. If students compare favorably, the teacher might inform colleagues, the principal, the students, or even the school board. Less favorable responses may not receive such widespread dissemination but may be disturbing to the teacher and perhaps stimulate investigation into the reasons for the results.

Of course, the use of national/regional comparison data is not necessary. The teacher could use a criterion approach by reading through the items, estimating the percent of students who would respond correctly, administering the items, and matching results to the estimates. Actions taken as a result of the comparison might be similar to those mentioned above. In any event, some activity has taken place; NAEP results have been used as another kind of benchmark.
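The criterion approach lends itself to a similarly small sketch. Again, every item label and number is hypothetical, and the ten-point tolerance is an arbitrary choice for illustration:

```python
# Hypothetical sketch of the criterion approach: the teacher estimates
# the percent of students expected to answer each item correctly,
# administers the items, and matches the results to the estimates.

estimated = {"item_1": 80, "item_2": 65, "item_3": 90}  # teacher's estimates (%)
observed  = {"item_1": 78, "item_2": 71, "item_3": 58}  # actual percent correct

for item, est in estimated.items():
    gap = observed[item] - est
    if abs(gap) <= 10:          # tolerance chosen arbitrarily here
        verdict = "about as expected"
    elif gap > 0:
        verdict = "better than expected"
    else:
        verdict = "worse than expected"
    print(f"{item}: estimated {est}%, observed {observed[item]}% ({verdict})")
```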

A similar packet could be prepared for the principal using attitudinal items or those that are more school-referenced than subject-referenced. By administering a small set of NAEP items to a sample of students, the principal could determine how this school’s students compared to those nationally and regionally in these more general areas.

With the advent of microcomputers, sets of items could easily be put on line so that students could take the test almost any time and results could be made available immediately. The computer could make analyses instantaneous, and it could be programmed to retain the results for longitudinal comparisons, especially if interventions were effected.
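The retained-results idea might look like the following toy sketch; the dates, scores, and the record helper are all invented for illustration:

```python
# Hypothetical sketch of retaining on-line test results so that
# longitudinal comparisons are instantaneous.

from statistics import mean

results = {}  # administration date -> item scores (1 correct, 0 incorrect)

def record(date, scores):
    """Store one administration's item scores under its date."""
    results[date] = scores

record("1981-10", [1, 0, 1, 1, 0, 1])
record("1982-03", [1, 1, 1, 1, 0, 1])

# Longitudinal comparison: percent correct per administration.
for date in sorted(results):
    print(date, f"{100 * mean(results[date]):.0f}% correct")
```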

However, getting school staff, principals, or teachers to take time to use even these simple materials will be difficult. They are already overwhelmed by required activities. One motivational approach might be to emphasize that recent research indicates that more effective principals and teachers monitor student achievement more frequently than do those who are less effective.

National Assessment would also be faced with the problem of informing educators about the availability of these materials. That could be done by sending NAEP representatives to state, regional, and national teacher conventions of the various subject matter disciplines to demonstrate and distribute the packets. NAEP could offer workshops and, of course, make good use of its already existing mailing lists to make educators more aware of such materials.

J. Robert Coldiron, Chief
Division of Educational Quality Assessment
Pennsylvania Department of Education

A Solution in Search of a Problem

A solution in search of a problem. Is this a fair description of NAEP? I once saw a box of corks about to be discarded. Alas, although I had no immediate need for corks, it seemed a shame to destroy a collection so perfect in design and so pregnant with potential. I must be part of a large audience of test developers who have marveled over NAEP’s carefully developed body of objectives and test items covering such a wide range of intended school outcomes. We have wished that it were possible to dip into this pool, use a few carefully selected items, and report the results on some type of scale that allowed for comparisons with the whole nation, with selected regions, and over different points of time, without the various trammels of the fixed-item approach, including the awkwardness of dealing with “released” items and “unreleased” items.

NAEP is to be commended for its explorations into various applications of item response theory. Many of the different models and applications may prove useful; I am most familiar with the work of Bock, Mislevy, and Reiser, who have developed scaling models specially suited to large-scale, sampling-type assessment programs. The marriage of matrix sampling and item response curve methodology they have arranged seems to be off to a happy and productive start.
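For readers unfamiliar with item response theory, the two-parameter logistic model is a standard illustration of the general idea; it is offered here as background, not as the specific formulation of Bock, Mislevy, and Reiser, whose models are more specialized variants suited to group-level, matrix-sampled assessment:

```latex
% Two-parameter logistic (2PL) item response function: the probability
% that an examinee with proficiency \theta answers item i correctly,
% where a_i is the item's discrimination and b_i its difficulty.
P_i(\theta) = \frac{1}{1 + \exp\bigl(-a_i(\theta - b_i)\bigr)}
```

Because items and examinees are located on the same proficiency scale, results from different subsets of items can be expressed on a common metric; this is what would free comparisons across the nation, regions, and points in time from the trammels of the fixed-item approach described above.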

Major theoretical or philosophical shifts may have to be accommodated. It seems certain, however, that the payoffs would be substantial. NAEP itself would be on more solid footing, and the second-order NAEP consumers (state and district assessment personnel) would profit from the combination of greater flexibility in selecting items and greater interpretability of the findings.

The crisis of confidence in public education demands greater understandability of test results. At AERA in New York, Milton Goldberg, Executive Secretary of the National Commission on Excellence in Education, called for better ways of defining functional literacy and higher levels of literacy. Similarly, David Wiley (in Vol. 10 of Jossey-Bass’s series, New Directions in Testing and Measurement, Testing in the States: Beyond Accountability) called for NAEP to work toward the conceptualization and development of practically defined and easily understood levels of competence. This is just one of several ways in which the power of modern measurement models might be used to improve NAEP and increase its effectiveness and usefulness.

Dale Carlson, Director
California Assessment Program

learning theory and cognitive development, in general.

Answering Questions About ERIC. The Clearinghouse will respond to all inquiries about the ERIC database and how to use it. We can provide assistance in developing a search strategy for your computerized search of the database, present workshops on the use of ERIC, or explain how to order paper copies of ERIC documents. Each year, we respond to thousands of requests for this type of information.

Answering Questions About Testing and Measurement. Because this Clearinghouse specializes in testing, measurement, and evaluation, we are more than willing to respond to questions in these areas. We can verify specific test titles or find out who publishes a certain test, identify test reviews or validity studies, identify examples of a specific evaluation technique, or do a computerized literature search on a testing or evaluation topic.

Computerized literature searches of the ERIC database are available from ERIC/TM for $25.00 plus $0.10 per citation retrieved. If you wish to make your search more comprehensive, we will expand it to include other databases, for a charge of $12.50 per database. To arrange for a computerized literature search, contact our User Services Coordinator, Louisa Coburn, at (609) 734-5181.

Publications. ERIC/TM also publishes bibliographies and reports on current issues in its scope areas. They are published in three series. The first, the ERIC/TM Reports, consists of synthesis papers reviewing the state of the art in issues of current concern. Recent titles include “An Introduction to Rasch’s Measurement Model,” by Jan-Eric Gustafsson ($5.50); “How Attitudes Are Measured: A Review of Investigations of Professional, Peer, and Parent Attitudes toward the Handicapped,” by Marcia D. Horne ($5.50); and “Intelligence Testing, Education, and Chicanos: An Essay in Social Inequality,” by Adalberto Aguirre, Jr. ($5.50). Over 20 reports are currently available in this series.

The second series, Highlights, consists of computerized literature searches on current topics, which have been carefully edited by the ERIC/TM staff to select only the most relevant citations from the ERIC database. Recent titles in this series include “Equating With the Rasch Model,” “Qualitative Assessment Techniques,” and “Survey of State Minimum Competency Testing.” Each of the 25 bibliographies in this series is available for $6.00.

The third series, the Update series, consists of mini-bibliographies and fact sheets on topics of current interest. They are available free from the Clearinghouse. Recent titles include “Assessing Experiential Learning,” “Student Evaluation of Teacher Performance,” and “Minimum Competency Testing: Legal Issues.”

The complete list of ERIC/TM publications is available from the Clearinghouse on request. If you would like to receive announcements of our new publications as they are available, ask to be added to our mailing list. Any requests for information from ERIC/TM can be sent to Louisa Coburn, ERIC/TM, Educational Testing Service (ETS), Princeton, NJ 08531, or call (609) 734-5181.

Barbara M. Wildemuth
ERIC/TM at ETS


Cooperation Would Benefit All

One of the ways in which the usefulness of NAEP to state and local assessment programs could be enhanced is for NAEP and other testing programs to share in the development and administration of items and in the interpretation and use of the results. NAEP’s policy on the exclusive use of secure and unreleased items now prohibits such cooperative development projects. By sharing development costs, NAEP and state and local testing agencies could reduce costs and provide greater services within the shrinking resources in education.

The major payoff would come in cooperative interpretation and use of the test data. By involving state and local agencies in the effort, local educators could take the NAEP data and their own scores and use them as a basis for individual student remedial work as well as for review and modification of local curricula. Instructional support materials for the NAEP objectives could be developed that others could use. In addition, NAEP testing methods (e.g., primary trait scoring in writing) could themselves provide useful instructional techniques to local educators.

A portion of each NAEP assessment should be secure, using new or previously used but unreleased items. In this way the measurement of change will be free of possible contamination. However, the remainder of the assessment should be opened up so that items previously released or drawn from state and local programs could be used. This would encourage joint test development projects and concurrent NAEP, state, or local assessments. The critical factor would be to change the policy on the use of “released” or publicly available items in NAEP assessments, so that incentives would exist to promote cooperative assessment projects. NAEP, state assessment programs, and local districts all would have something to gain from such a change.

Edward D. Roeber
Supervisor
Michigan Educational Assessment Program
