Mis-conceptualizing the Cost of Large-Scale Assessment
Author(s): Richard P. Phelps
Source: Journal of Education Finance, Vol. 21, No. 4, Special Education Finance (Spring 1996), pp. 581-589
Published by: University of Illinois Press
Stable URL: http://www.jstor.org/stable/40703978
Accessed: 28/06/2014 08:30

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp.

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected].

University of Illinois Press is collaborating with JSTOR to digitize, preserve and extend access to Journal of Education Finance. http://www.jstor.org

This content downloaded from 193.0.147.17 on Sat, 28 Jun 2014 08:30:54 AM. All use subject to JSTOR Terms and Conditions.


JOURNAL OF EDUCATION FINANCE 21 (SPRING 1996), 581-589

COMMENTARY

Mis-conceptualizing the Cost of Large-Scale Assessment

By Richard P. Phelps

years ago, I started a job at the U.S. General Accounting Office (GAO), the congressional research agency that specializes in government program audits and evaluations. My charge was to estimate the cost of a national examination for elementary and secondary students, a concept then very much in the news.

Early in the debate over national testing, Congress had decided that it lacked some key information. What was the current extent and cost (in both time and dollars) of standardized testing in the schools, and how would a new, national test affect the current testing? And, what was the experience of the several states that had implemented large-scale performance-based tests, similar to the kind proposed for a national test?

The project proceeded in typical fashion for a GAO study - slowly, methodically, and carefully. This level of care - with all the double-checking and the unending reviews at every stage of the research process - frustrates many GAO project managers, but it produces highly reliable results. The project's design plan, sampling plan, data collection instruments, and draft report - each was reviewed by many experts and practitioners.

The development of the questionnaires was a project in itself. We drafted four questionnaires: one for each state education agency; each local school district; each statewide test; and each districtwide test. The draft questionnaires were reviewed by dozens of individuals both within the GAO and in state and local education agencies across the country. We then travelled to schools in the Mid-Atlantic states to pre-test these draft questionnaires with school district testing officials. In all these reviews, the questionnaires were screened both for reliability and completeness. We wished to make certain that we would obtain all the pertinent information on the extent and cost of testing that it was possible to obtain.

Richard P. Phelps is Senior Research Analyst at the Pelavin Research Institute, American Institutes for Research, Washington, DC.

I think we were successful. The GAO report provides a complete cost estimate for all systemwide testing in the United States for the school year 1990-91.[1] It includes no expenditure that cannot validly be classified as a cost. It counts only the marginal costs that can be associated with testing and not with other activities. It includes the cost of all test-related activity.

Imagine my surprise, then, when I heard that someone had accused the GAO study of leaving out substantial categories of costs. I traced the accusations to David Monk and a report he had written for the New Standards Project (NSP). That report, which has not been released by the NSP, was summarized in an article in the Spring 1995 issue of the Journal of Education Finance.[2]

In the unreleased NSP report, Monk makes several inaccurate assertions about the GAO report. They include: "The NSP program includes a sizeable investment in staff development while the GAO estimates are based primarily on the costs of simply administering the exams,"[3] and "It is worth noting that the GAO report is one of the few published studies where Development Costs of a national testing system are considered separately."[4]

Each of these statements is way off the mark. As I mentioned above, the GAO report includes all the pertinent costs of testing: all cash expenditures for testing materials or services as well as the costs of employing all the state and local education agency personnel involved in every aspect of the testing effort.[5] Personnel costs were estimated by asking state and local education agency respondents to list or estimate the number of personnel hours devoted to the testing effort in the four personnel categories of teacher, administrator, clerical, and "other" and multiplying the number of hours in each category by the average salary for that category. All this is straightforwardly explained in the GAO report.

But let me take each of Monk's criticisms in turn. The average cost for systemwide tests in the 1990-91 school year was about $15 per student. Of this amount, less than one-third, or $5, was accounted for by cash expenditures for test materials and services. Two-thirds of the costs of systemwide tests in 1990-91 were accounted for by personnel costs, costs that Monk suggested that the GAO study left out.

Monk also accused the GAO study of including only the costs of "simply administering the exams." In fact, the GAO study includes all costs associated with any and all test-related activity. Table 1 displays all the personnel cost categories in the GAO questionnaires along with a breakdown of the weight, or proportion of the total personnel cost, that each category represents.

The cost category for "simply administering the exams" would be Category 6 of the 11 categories of personnel costs included in the GAO study's cost calculations (see Table 1). Category 6 comprised 8.5 percent of personnel costs at the state level and less than 40 percent of the personnel costs at the local school district level. As mentioned earlier, personnel costs comprised about two-thirds of all costs. All told, then, "simply administering the exams" comprised less than 23 percent of the cost of testing that the GAO study calculated. Monk is not only wrong in his characterization of the GAO report, he is wrong by a wide margin.

Table 1
Personnel Hours, From the GAO Study

Task*                                                     State level      District level
(Original wording of the questionnaire)                    SUM      %          SUM      %

 1. Developing test                                     96,317   35.5      404,000    2.0
 2. Getting trained to administer or score the test      1,254    0.5    1,322,000    6.5
 3. Training others to administer or score the test     14,970    5.5      363,000    1.8
 4. Preparing the administration of the test            24,667    9.1
 5. Preparing students to take the test                                  5,725,000   28.2
 6. Administering or overseeing the administration
    of the test                                         23,003    8.5    8,003,000   39.4
 7. Getting trained to score test                        4,968    1.8
 8. Training others to score test                        6,720    2.5
 9. Scoring or overseeing the scoring of the test       37,092   13.7      515,000    2.5
10. Collecting, sorting, and mailing completed tests    12,808    4.7    1,270,000    6.3
11. Analyzing or reporting the results                  49,864   18.4    2,430,000   12.0
12. Doing any other related activity in any other
    way pertaining to the test (or, all of the above)  134,908             275,000    1.4
Total Hours                                            406,571          20,307,000

*The question in the GAO questionnaire was written thus: "For this test, what were the aggregate total number of all teacher, administrator, clerical, and other personnel hours spent in the previous fiscal year on each of the following test-related tasks within your school district [or, state]? Remember, if you can't provide exact answers, provide the best estimates you can."

"In calculating the aggregate total number of teacher hours, don't forget all the factors involved - the number of classrooms tested per school; the number of schools in the district; and the number of grade levels tested. Generally, most of these tasks listed below are performed at the school level. If your school district is large and you don't normally collect this kind of information from the schools, we recommend that you poll a sample of school principals or school test coordinators."

Note: Teachers represent about 81 percent of the staff that could possibly be involved in testing activities. Other staff that might be involved include: other instructional staff; administrative staff; and clerical staff. Still other staff are not likely to be involved in testing activities. They include: transportation and maintenance workers; school nurses; and cafeteria workers.

Source: GAO study, 1993.

Now, let me turn to the first part of Monk's sentence in the NSP draft, "The NSP program includes a sizeable investment in staff development while the GAO estimates...."[6] In fact, the GAO study includes all training costs related to testing (which are classified in the same way as in Monk's study) and no training costs not related to testing. Because the GAO study estimated the cost of testing, no staff development costs unrelated to testing were included. If Monk included staff development costs unrelated to testing in his study, then he overestimated the cost of testing.

1. We restricted the domain of tests to include only "systemwide" tests; that is, those administered to every student, to almost every student, or to a representative sample of all students in at least one grade level in a district or state. Since we intended to use questionnaires as our primary source of data, we realized it was impossible to ask about all tests, or even all standardized tests, because the reporting burden would have been too great and our response rate would have decreased in consequence. The domain of systemwide tests includes all standardized tests except those administered to special populations, such as special education and gifted and talented students; optional tests, such as college entry exams; and many tests used for Chapter 1 evaluation. Thus, the set of systemwide tests seemed the most appropriate for our study, since it comprised about 86 percent of all standardized academic tests. See U.S. General Accounting Office, Student Testing: Current Extent and Expenditures, With Costs and Estimates for a National Examination (Washington, DC: U.S. Government Printing Office, 1993).

2. David H. Monk, "The Costs of Pupil Performance Assessment: A Summary Report," Journal of Education Finance 20 (Spring 1995): 363-371.

3. The complete report upon which this summary is based can be found in: David H. Monk, "The Costs of Pupil Performance Assessment," ERIC Document Reproduction System: ED 376 210.

4. Monk, "The Costs of Pupil Performance Assessment," 226-7.

5. Some, including apparently David Monk, advocate also counting student test-taking and test-preparation time as a cost, say, against students' future earnings. The debate on this issue alone could fill a journal article. I believe that student test-taking time cannot be counted as a marginal cost of testing because: students learn while they're taking tests; tests are an integral, inseparable part of instructional programs; and the test-taking process itself has intrinsic instructional value.

6. Monk, "Conceptual Issues and Preliminary Estimates," p. 226.
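The personnel-cost calculation described for the GAO study (reported hours in each of the four staff categories, multiplied by that category's average pay) can be sketched roughly as follows. This is only an illustration: every hour count, pay rate, cash figure, and enrollment number below is hypothetical and does not come from the GAO report, and the sketch assumes the average salary is expressed per hour.

```python
# Illustrative sketch only. All figures are hypothetical, not GAO data.
# Assumption: "average salary" is expressed as pay per hour.

# Hours a hypothetical district reports for one test, by staff category.
hours_by_category = {
    "teacher": 1200.0,
    "administrator": 150.0,
    "clerical": 90.0,
    "other": 40.0,
}

# Hypothetical average hourly pay for each category (dollars).
hourly_pay_by_category = {
    "teacher": 25.0,
    "administrator": 35.0,
    "clerical": 15.0,
    "other": 20.0,
}


def personnel_cost(hours, hourly_pay):
    """Sum of hours times average hourly pay over all staff categories."""
    missing = set(hours) - set(hourly_pay)
    if missing:
        raise KeyError(f"no pay rate for categories: {sorted(missing)}")
    return sum(hours[cat] * hourly_pay[cat] for cat in hours)


def cost_per_student(cash_expenditures, personnel, students_tested):
    """Total testing cost (cash plus personnel) divided by students tested."""
    return (cash_expenditures + personnel) / students_tested


personnel = personnel_cost(hours_by_category, hourly_pay_by_category)
per_student = cost_per_student(18700.0, personnel, 3740)
print(personnel, per_student)
```

The hypothetical inputs were chosen so that personnel costs make up two-thirds of the total, mirroring the split reported in the text; a real estimate would substitute the survey responses and salary data.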

Monk further asserts, "It is worth noting that the GAO report is one of the few published studies where Development Costs of a national testing system are considered separately... the program I envision retains development activities during operations."[7] In fact, the GAO study separates test development costs into two components: "start-up" costs that precede the first year's administration of any test; and ongoing costs that accompany tests each year they are used. Ongoing test development accounts for 35 percent of annual testing personnel time at the state level in the GAO estimates (see Table 1).
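The percentage columns of Table 1 can be recomputed from the hour counts, which is a quick way to check figures such as the 35 percent share for state-level test development. The sketch below uses the hours printed in Table 1. One point is an inference from the arithmetic rather than something the table states: the state-level percentages match only when the base excludes category 12, while the district-level percentages match the full total.

```python
# Hours from Table 1, keyed by task number. Tasks absent at a level are omitted.
state_hours = {
    1: 96_317, 2: 1_254, 3: 14_970, 4: 24_667, 6: 23_003,
    7: 4_968, 8: 6_720, 9: 37_092, 10: 12_808, 11: 49_864,
}
state_other_hours = 134_908  # task 12, listed without a percentage

district_hours = {
    1: 404_000, 2: 1_322_000, 3: 363_000, 5: 5_725_000, 6: 8_003_000,
    9: 515_000, 10: 1_270_000, 11: 2_430_000, 12: 275_000,
}


def percent_shares(hours):
    """Each task's share of the summed hours, in percent, to one decimal."""
    total = sum(hours.values())
    return {task: round(100 * h / total, 1) for task, h in hours.items()}


# Inferred base: state shares are computed without task 12.
state_shares = percent_shares(state_hours)
district_shares = percent_shares(district_hours)

print(state_shares[1], state_shares[6])        # development, administration
print(district_shares[5], district_shares[6])  # student prep, administration
```

Adding `state_other_hours` back to the state sum gives the table's stated state total of 406,571 hours, and the district hours sum to the stated 20,307,000.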

A year-and-a-half ago, I telephoned David Monk to inform him that his characterizations of the GAO report were inaccurate. Then, I sent him a copy of the GAO report, pointing out the passages in it that contradicted what he was saying about it. I also sent him copies of the four questionnaires.

With that, I thought the issue was settled. Three-quarters of a year later, the Journal of Education Finance article, a summary of his NSP report, appeared. In the Journal of Education Finance article, Monk made minor adjustments that worsened, rather than corrected, his mistakes. His assertion that the GAO report cost estimates were "based primarily on the costs of simply administering the exams," was changed into "based primarily on the costs of administering the examination system."[8] This seems to me to be an even more restrictive category than the earlier one. It suggests to me something like a state education agency testing office's administrative costs for a testing program, a minuscule proportion of the total costs of testing in a state.

The two other erroneous characterizations of the GAO report, about the alleged exclusion of staff development and test development costs, were repeated in the Journal of Education Finance article verbatim from the NSP unpublished report.[9]

Even aside from his mischaracterization of the GAO report, I find a number of problems with Monk's NSP report and its summary in the Journal of Education Finance. His NSP report consists of three sections: theory; estimation methodology; and a summary of his calculations. The essence of the theory section is summarized in the Journal of Education Finance article:

7. Monk, "The Costs of Pupil Performance Assessment," pp. 226-7.

8. Monk, "A Summary Report," 371.

9. Monk, "A Summary Report," 371.

Costs are measures of what must be foregone to realize some benefit, and for this reason they cannot be divorced from benefits. Expenditures, in contrast, are measures of resource flows regardless of their consequences. A cost analysis requires a comparison of benefits; an expenditure analysis does not.[10]

I find this passage misleading in several ways. First, there are many different kinds of costs in the economist's tool kit (e.g., fixed, sunk, variable, total, marginal, average, incremental, avertable, avoidable, common, joint, stand-alone, fully-allocated, and so on), and all have been used in cost analyses of various types, with or without consideration of countervailing benefits. What Monk defines as "cost analysis," most researchers, I believe, would identify as benefit/cost analysis, which includes an analysis of costs, an analysis of benefits, and a reconciling of the two that calculates "net benefits" or "net costs." It appears to me that what Monk defines as "cost" is really this benefit-cost analysis product of net cost.

Even more misleading is how Monk identifies expenditures, as "measures of resource flows regardless of their consequences." This statement ignores the fact that we humans are capable of much, and we are rather easily able to isolate particular expenditures with particular consequences when we need to.

It goes without saying that one cannot grab willy-nilly any expenditure datum and call it a cost of anything one wishes. A quarter spent on bubble gum may well represent a cost of a preference for bubble gum, but it does not represent a cost of, say, transportation.

It also goes without saying that one cannot rely on budgetary expenditure categories to represent costs. Line-item expenditures in budgets are often not aggregated in precisely the way one needs them to be to represent the costs one wishes to represent. Take budgetary expenditures on testing programs, for example. They do not necessarily equal expenditures on particular tests. A testing division in an education agency may handle several tests, perform consulting services for tests given by other agencies, hire personnel from other divisions for work on certain tests, and so on. To learn the costs of any particular test, the researcher must isolate the expenditures for that particular test. And that requires the hard work of developing, administering, and collecting a customized survey, as the GAO did.

10. Monk, "A Summary Report," 365.

The above two points underline the proposition that one should only ascribe as a cost of an activity those expenditures attributable to that activity and not those attributable to some other activity.

There are at least two methods available for estimating the cost of testing. One, a benefit-cost analysis, consists of counting up all the costs, all the benefits, and then reconciling the two. The other method, the one the GAO report used, estimated the marginal costs of testing. This method attempts to isolate all costs attributable to a certain activity alone and ignore any costs not attributable to that activity. A heuristic can be used for judging whether any particular activity is a marginal cost of testing or not - take the test away and ask if the activity would remain. If it would, the activity's expense cannot be considered a cost of testing; if it would not, the activity's expense can be considered a marginal cost of testing.
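The "take the test away" heuristic amounts to a simple filter over activities. The sketch below is one way to express it. The `Activity` type, the activity names, and the dollar amounts are hypothetical illustrations, not items from the GAO questionnaires, and deciding whether an activity would remain is a judgment the analyst must supply.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Activity:
    name: str
    expense: float               # dollars spent on the activity
    remains_without_test: bool   # would it continue if the test were removed?


def marginal_cost_of_testing(activities: Iterable[Activity]) -> float:
    """Sum only the expenses of activities that would vanish with the test."""
    return sum(a.expense for a in activities if not a.remains_without_test)


# Hypothetical activities for illustration.
activities = [
    Activity("Scoring the test", 4000.0, remains_without_test=False),
    Activity("Printing test booklets", 1500.0, remains_without_test=False),
    Activity("Regular classroom instruction", 90000.0, remains_without_test=True),
]

print(marginal_cost_of_testing(activities))
```

Classroom instruction is excluded because it would continue without the test, so only the scoring and printing expenses count toward the marginal cost.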

It was a great disappointment to me that Monk's article in the Journal of Education Finance summarized only the "theory" and "results" sections of his NSP report, and did not cover the methods section,[11] which explains how he calculated his estimates.

I believe that an accurate summary of his estimation method would reveal that he mixed up benefit-cost and marginal cost analyses. He included in his calculations gross expenditures for tests, for example. That takes care of the cost side. But he estimated only a small portion of the benefits, those that accrue from using an allegedly better performance-based test rather than using some allegedly worse type of test. Thus, the cost side of his calculation measured costs that accrue over an initial condition of no testing. But the benefit side of his calculation measured benefits that accrue over an initial condition of another form of testing.

But this inconsistency is not the worst problem with his estimation method, in my opinion. The worst problem is that his estimates have no empirical base. His method consisted of obtaining single-data-point estimates from a few isolated pilot studies of the New Standards Project and, more often, from his own conjecture, for dozens of expenditure items, "conceptualizing" upper and lower bounds for them, and multiplying to larger scale. In other words, he started with zero degrees of freedom, then made many assumptions, and wants us to have confidence in the results. (An exposition of Monk's estimation methods is contained in chapters 3-6, on pp. 38-216 of the NSP report.)

11. Monk, "The Costs of Pupil Performance Assessment," pp. 226-7.

This estimation method should be familiar to all of us. We employ it in the consulting business, where I now work, every time we propose a budget for a new project. We think of all the components involved in executing a proposed task, think of how much of each component (e.g., a person's time, use of certain equipment, etc.) we will need, and multiply by the unit costs of each of those components.

It's a time-worn, familiar method for estimating costs. However, it can produce wildly unreliable estimates. We are all familiar with the maxim, "estimate the cost of everything you can think of, and then double or triple your estimate to account for all that you haven't thought of." Not terribly precise.
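A component-by-component estimate of this kind, with conjectured lower and upper bounds scaled up to a larger population, might look like the following sketch. It is not a reconstruction of Monk's calculations: the component names, quantities, unit costs, and scale are all hypothetical, chosen only to show how the gap between the bounds carries through the multiplication.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    quantity: Tuple[float, float]   # (low, high) amount needed per student
    unit_cost: Tuple[float, float]  # (low, high) dollars per unit


def bottom_up_estimate(components: Iterable[Component],
                       students: int) -> Tuple[float, float]:
    """Return (low, high) total cost: per-student bounds times students."""
    items = list(components)
    low = sum(c.quantity[0] * c.unit_cost[0] for c in items) * students
    high = sum(c.quantity[1] * c.unit_cost[1] for c in items) * students
    return low, high


# Hypothetical components and conjectured bounds, for illustration only.
components = [
    Component("Scorer time (hours)", quantity=(0.5, 1.0), unit_cost=(20.0, 30.0)),
    Component("Test materials (sets)", quantity=(1.0, 1.0), unit_cost=(2.0, 4.0)),
]

low, high = bottom_up_estimate(components, students=1000)
print(low, high, high / low)
```

Because each bound is a product of conjectured quantities and conjectured unit costs, modest uncertainty in each input compounds: here the high estimate is nearly three times the low one, and scaling to more students widens the gap in absolute dollars.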

Various factors can be brought into play to increase the reliability of such estimates. Estimates are more reliable: the smaller the project; the more control the estimator has over the spending; the greater the degree of familiarity the estimator has with the type of work involved in the task; and the more experience the estimator has with producing such estimates. The more solid, empirical data one has based on past experience, the more reliable one's estimates will be. The more assumptions one has to make, the less reliable one's estimates will be.

In making his estimates Monk had none of the above helpful factors on his side. He had no experience, no familiarity, no control over the spending, and no data other than the single data-point estimates and conjectures upon which all his calculations are based. One can know the standard errors of the GAO sample estimates; they're in the confidence intervals listed in Appendix I of the report. The standard errors of Monk's estimation method are infinite in value.

As I read the methods chapters in Monk's NSP report,[12] I highlighted every occurrence of the words "assume" and "presume" (in any variation). I counted 439 occurrences, or about 2.5 assumptions per page. This counting method, however, understates the extent of assumption in his estimates, because he mixes up his vocabulary quite a lot, also using such terms as: project, derive, imply, envision, adjust, modify, calibrate, expect, estimate, according to, in terms of, depending on, given that, impose, may be, could be, and so on.

12. Monk, "The Costs of Pupil Performance Assessment," pp. 38-216.
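The per-page figure follows from the page range cited for the methods chapters, and a count of this kind could be automated along the following lines. The regular expression is only a guess at what "any variation" of the two words covers, the sample sentence is invented rather than quoted from Monk's report, and the page count assumes the range pp. 38-216 is inclusive.

```python
import re

# Matches assume/presume and common variants (assumes, assumed, assuming,
# assumption(s), presumably, ...). The variant list is an assumption.
ASSUMPTION_PATTERN = re.compile(
    r"\b(?:as|pre)sum(?:e[sd]?|ing|ptions?|ably)\b", re.IGNORECASE
)


def count_assumption_words(text: str) -> int:
    """Count occurrences of 'assume'/'presume' variants in the text."""
    return len(ASSUMPTION_PATTERN.findall(text))


def occurrences_per_page(count: int, first_page: int, last_page: int) -> float:
    """Average occurrences per page over an inclusive page range."""
    return count / (last_page - first_page + 1)


# Invented sample text, not a quotation from the NSP report.
sample = ("We assume X. It is presumed that Y. "
          "Under these assumptions, Z holds, presumably.")
print(count_assumption_words(sample))

# 439 occurrences over pp. 38-216 (179 pages).
print(round(occurrences_per_page(439, 38, 216), 2))
```

The rate works out to a little under 2.5 per page, consistent with the rounded figure in the text. As the paragraph notes, a keyword count like this is a lower bound, since terms such as "envision" or "expect" are not matched.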

My intention here is not to trivialize his estimation method, but to illustrate in a concise way just how loose it is. Essentially, his estimates are back-of-the-envelope calculations that employ hundreds of conjectures about reality. This is not to say they are not thoughtful or even reasonable, given what they are. But they are far from being empirically based, and far from being reliable.

I strongly encourage anyone interested in the topic of cost estimation as it is applied to student testing to obtain copies of the full reports - Monk's NSP report and the GAO report - in order to judge the quality of the estimation methods for themselves.

Conclusion

It is an occupational hazard in our profession for one to invest some effort in a study only to be scooped by someone else who gets a similar study to press more quickly, has access to a superior data source, or who simply invests more effort in the research. There are no monopolies on public information.

The GAO study was conducted carefully and produced cost estimates with a genuine empirical base. Monk preferred, instead, to calculate very rough estimates. That is the most important difference between the two studies.
