
DOCUMENT RESUME

ED 301 828                                      CS 211 614

AUTHOR         Greenburg, Karen, Ed.; Slaughter, Ginny, Ed.
TITLE          Notes from the National Testing Network in Writing. Volume VIII, November 1988.
INSTITUTION    City Univ. of New York, N.Y. Office of Academic Affairs.
SPONS AGENCY   Fund for the Improvement of Postsecondary Education (ED), Washington, DC.
PUB DATE       Nov 88
NOTE           34p.; Abstracts of papers presented at the National Testing Network in Writing Conference (1988).
PUB TYPE       Collected Works - Conference Proceedings (021) -- Collected Works - Serials (022)
EDRS PRICE     MF01/PC02 Plus Postage.
DESCRIPTORS    Abstracts; Computer Uses in Education; Essay Tests; Holistic Evaluation; Scaling; *Scoring; *Writing Evaluation; *Writing Research

ABSTRACT

This newsletter contains 32 abstracts of approximately 1000 words each of papers presented at the 1988 conference of the National Testing Network in Writing. Abstracts, listed with their authors, include "Instructional Directions from Large Scale K-12 Writing Assessments" (C. Chew); "Portfolio Assessment across the Curriculum: Early Conflicts" (C. Anson and others); "Revamping the Competency Process for Writing: A Case Study" (D. Holdstein and I. Bosworth); "We Did It and Lived: A State University Goes to Exit Testing" (P. Liston and others); "Proficiency Testing: Issues and Models" (G. Gadda and M. Fowles); "Creating, Developing, and Evaluating a College-Wide Writing Assessment Program" (S. Groden); "How to Organize a Cross-Curricula Writing Assessment Program" (G. Hughes-Wiener and others); "Presenting a Unified Front in a University Writing and Testing Program" (L. Silverthorne and P. Stephens); "Evaluating a Literacy across the Curriculum Program: Designing an Appropriate Instrument" (L. Shohet); "Validity Issues in Direct Writing Assessment" (K. Greenberg and S. Witte); "Reliability Revisited: How Meaningful Are Essay Scores?" (E. White); "Establishing and Maintaining Score Scale Stability and Reading Reliability" (W. Patience and J. Auchter); "Training of Essay Readers: A Process for Faculty and Curriculum Development" (R. Christopher); "Discrepancies in Holistic Evaluation" (D. Daiker and N. Grogan); "Problems and Solutions in Using Open-Ended Primary Traits" (M. C. Flanagan); "The Implications of Using the Rhetorical Demands of College Writing for Placement" (K. Fitzgerald); "Using Video in Training New Readers of Assessment Essays" (G. Cooper); "WPA Presentation on Evaluating Writing Programs" (R. Christopher and others); "Developing and Evaluating a Writing Assessment Program" (L. Boehm and M. A. McKeever); "The Changing Task: Tracking Growth over Time" (C. Lucas); "Assessing Writing to Teach Writing" (V. Spandel); "Reader-Response Criticism as a Model for Holistic Evaluation" (K. Schnapp); "The Discourse of Self-Assessment: Analyzing Metaphorical Stories" (B. Tomlinson and P. Mortensen); "The Uses of Computers in the Analysis and Assessment of Writing" (W. Wresch and H. Schwartz); "Legal Ramifications of Writing Assessments" (W. Lutz); "Some Not So Random Thoughts on the Assessment of Writing" (A. Purves); "Research on National and International Writing Assessments" (A. Purves and others); "Teaching Strategies and Rating Criteria: An International Perspective" (S. Takala and R. E. Degenhart); "Effects of Essay Topic Variations on Student Writing" (G. Brossell and J. Hoetker); "What Should Be a Topic?" (S. Murphy and L. Ruth); "Classroom Research and Writing Assessment" (M. Meyers); and "Computers and the Teaching of Writing" (M. Ribaudo and L. Meeker). (RS)



VOLUME VIII

NOTES FROM THE NATIONAL TESTING NETWORK IN WRITING


NOVEMBER 1988

A Project of The City University of New York and The Fund for the Improvement of Postsecondary Education

The National Testing Network in Writing, now in its seventh year, numbers 3,000 members across eleven countries. We are busier than ever collecting, cataloging, and disseminating information and data on measures and procedures used to assess students' writing skills. We are grateful for your help in sending us materials from testing programs--which, amazingly, are still proliferating, as questions about when, how, and whether to test writers continue to plague teachers and school administrators.

From the beginning, NTNW has sought to help members find answers to these questions. Our eight issues of Notes, the book Writing Assessment: Issues and Strategies, and our annual conferences have attracted teachers, administrators, and assessment specialists from institutions around the world to examine models, and to explore the impact of assessment on pedagogy, curricula, and students.

The 1989 conference will be international in scope, featuring noted researchers from eight countries who will lead workshops and present their latest findings. The conference, co-sponsored by Dawson College, will take place Sunday, April 9th through Tuesday, April 11th (to allow for a weekend in Quebec) at the Centre Sheraton Hotel in Montreal, Canada. A new feature will be pre- and post-conference workshops. (See the centerfold of this issue for more information and a registration form.)

This issue of Notes continues the tradition of publishing abstracts from the annual conferences. The 1988 conference was co-sponsored by the University of Minnesota under the coordination of Chris Anson. Here are the abstracts of all of the workshops and panels, grouped according to themes: the first nine describe models of successful writing assessment programs, followed by eight that focus on models of scales and scoring; nine abstracts examine the impact of writing assessment on students, faculty, and curricula; and the final six examine current research on writing assessment.

The theme of the upcoming 1989 conference is "Writing Assessment Across Cultures." We hope you will join us in Montreal in April.

Karen Greenberg and Ginny Slaughter

We wish to thank the people who graciously contributed to this issue of Notes. First we thank all of the Recorders whose summary reports make up the bulk of the issue. We also thank Chris Anson of the University of Minnesota, Twin Cities for organizing a wonderful conference and Leslie Denny for carefully overseeing all of the local arrangements. And we are grateful to Haeryung Shin and Roddy Potter for their invaluable editorial assistance. Finally, our thanks to Harvey Wiener, University Associate Dean for Academic Affairs, and Director of CUNY's Instructional Resource Center, for supporting the production of NTNW Notes.

NOTES is published by the Instructional Resource Center, Office of Academic Affairs, The City University of New York, 535 East 80th Street, New York, New York 10021



CONTENTS

MODELS OF EFFECTIVE WRITING ASSESSMENT PROGRAMS

Instructional Directions from Large Scale K-12 Writing Assessments: CHARLES CHEW
Portfolio Assessment Across the Curriculum: Early Conflicts: CHRIS M. ANSON, ROBERT L. BROWN, & LILLIAN BRIDWELL-BOWLES
Revamping the Competency Process for Writing: A Case Study: DEBORAH HOLDSTEIN & INES BOSWORTH
We Did It and Lived: A State University Goes to Exit Testing: PHYLLIS LISTON, JOHN MATHEW, & LINDA PELZER
Proficiency Testing: Issues and Models: GEORGE GADDA & MARY FOWLES
Creating, Developing, and Evaluating a College-Wide Writing Assessment Program: SUZY GRODEN
How to Organize a Cross-Curricula Writing Assessment Program: GAIL HUGHES-WIENER, SUSAN JENSEN-CEKALLA, MARY THORNTON-PHILLIPS & GERALD MARTIN
Presenting a Unified Front in a University Writing and Testing Program: LANA SILVERTHORNE & PATRICIA STEPHENS
Evaluating a Literacy Across the Curriculum Program: Designing an Appropriate Instrument: LINDA SHOHET

MODELS OF SCALES AND SCORING

Validity Issues in Direct Writing Assessment: KAREN GREENBERG & STEPHEN WITTE
Reliability Revisited: How Meaningful Are Essay Scores?: EDWARD WHITE
Establishing and Maintaining Score Scale Stability and Reading Reliability: WAYNE PATIENCE & JOAN AUCHTER
Training of Essay Readers: A Process for Faculty and Curriculum Development: ROBERT CHRISTOPHER
Discrepancies in Holistic Evaluation: DONALD DAIKER & NEDRA GROGAN
Problems and Solutions in Using Open-Ended Primary Traits: MICHAEL C. FLANAGAN
The Implications of Using the Rhetorical Demands of College Writing for Placement: KATHRYN FITZGERALD
Using Video in Training New Readers of Assessment Essays: GEORGE COOPER

THE IMPACT OF WRITING ASSESSMENT ON STUDENTS, FACULTY, AND CURRICULA

WPA Presentation on Evaluating Writing Programs: ROBERT CHRISTOPHER, DONALD DAIKER, & EDWARD WHITE
Developing and Evaluating a Writing Assessment Program: LORENZ BOEHM & MARY ANN MCKEEVER
The Changing Task: Tracking Growth Over Time: CATHERINE LUCAS
Assessing Writing to Teach Writing: V. SPANDEL
Reader-Response Criticism as a Model for Holistic Evaluation: KARL SCHNAPP
The Discourse of Self-Assessment: Analyzing Metaphorical Stories: BARBARA TOMLINSON & PETER MORTENSEN
The Uses of Computers in the Analysis and Assessment of Writing: WILLIAM WRESCH & HELEN SCHWARTZ
Legal Ramifications of Writing Assessment: WILLIAM LUTZ
Some Not So Random Thoughts on the Assessment of Writing: ALAN C. PURVES

CURRENT RESEARCH ON WRITING ASSESSMENT

Research on National and International Writing Assessments: ALAN C. PURVES, THOMAS GORMAN, & RAINER LEHMANN
Teaching Strategies and Rating Criteria: An International Perspective: SAULI TAKALA & R. ELAINE DEGENHART
Effects of Essay Topic Variations on Student Writing: GORDON BROSSELL & JIM HOETKER
What Should Be a Topic?: SANDRA MURPHY & LEO RUTH
Classroom Research and Writing Assessment: MYLES MEYERS
Computers and the Teaching of Writing: MICHAEL RIBAUDO & LINDA MEEKER


INSTRUCTIONAL DIRECTIONS FROM LARGE SCALE K-12 WRITING ASSESSMENTS

Speaker: Charles Chew, New York State Department of Education

Introducer: Marie Jean Lederman, Baruch College, CUNY and NTNW

It is now generally agreed that (1) direct assessment of writing should, if possible, approximate what we expect when students write; (2) learning to write is a process which takes place over time, sometimes recursively; and (3) requiring students to write whole discourses is a better assessment tool than the objective test of discrete skills. In 1979, when New York State first instituted a writing competency test including three writing samples, it was virtually alone in its attempt to assess students' writing ability through multiple writing samples. Five years have passed since the first administration of the Regents Competency Test in Writing. The program has grown to encompass not only the eleventh grade but the eighth grade and fifth grade as well. Since September of 1985, G.E.D. diploma candidates must also write an essay.

The Writing Test for New York State Elementary Schools, administered at grade 5, comes very close to approximating the composing process. Students are required to write two different pieces for the test on two different days. A prewriting section precedes the writing sample and is not evaluated. Students draft a response and then redraft. In the tests at the secondary level, the Preliminary Competency Test in Writing at grade 8 requires students to write three pieces, as does the Regents Competency Test in Writing, which is administered in grade 11 and is a requirement for graduation. The Comprehensive Examination in English, which is administered usually to average or above average students, requires two writing samples.

If tests are to approximate the reality of the writing process and are to have a positive effect on instruction, they need to require many types of writing. The outline below shows the types of writing assessed in New York State writing assessment programs.

Types of Writing

Writing Test for New York State Elementary Schools

Personal Narrative: The writer recounts an experience which he/she had.

Personal Expression: The writer recounts a feeling or an emotion.

Description: The writer describes a person, object, or place.

Process: The writer explains how to do something.

Story Starter: The writer completes a story which is started in the writing prompt.

Preliminary and Regents Competency Test in Writing

Business Letter: The writer at eighth grade writes a letter ordering something. At grade 11, the writer composes a letter of complaint and suggests how to remedy the situation.

Report: The writer takes data supplied and prepares a report for another person or the class.

Persuasive Discourse: The writer attempts to persuade the reader to take some action by stating the action to be taken and giving reasons why such action should be taken.

Comprehensive Examination in English

Essay: The writer, using literature which has been read, responds to a given question which is generic in the sense that a wide variety of literature could be used in the response.

Composition: The writer can choose to do one question from among eight. Two of these are situations which provide a purpose and audience. The other six are discrete topics which require a full rhetorical invention by the writer.

The methods of evaluation used in this testing program also speak to instruction. Students' writing samples are evaluated holistically at grade 5, and modified holistic scoring is used with all the other tests. This rating procedure delivers a message to anyone in the state who is involved with the testing program. The idea that the whole piece of writing may be worth more than any one single feature is an important message to teachers who have for years spent an inordinate amount of time "red-penciling" errors in students' work. Many of the criteria used to evaluate the writing samples are virtually the same for grades 5-11, indicating that these elements are seen as essential in a competent piece of writing. The fact that the criterion focusing on mechanics is not at the top of the list reminds teachers that mechanics, although important, are not the "be all and end all" of written discourse. The fact that the tests are unlimited in time and that length is merely suggested delivers additional messages about the teaching of writing.

Any student who falls below the State Reference Point on the writing tests is required by the state to receive additional or remedial instruction. Parents must be notified of the student's grade and must be informed of the remedial program established for the student. These programs must begin no later than one semester after the administration of the test. Students can be removed from remediation if it can be documented that deficiencies have been overcome. At the senior high school level, students must pass the Regents Competency Test in Writing in order to receive a diploma.

To meet the needs of educators at the local school level in rating the tests and to devise instructional strategies to meet student needs in writing, the Bureau of English and Reading Education developed a two-year in-service program. The first phase of the program identified fifty key teachers or supervisors, representing geographic areas of the state, who came to Albany for a two and one-half day intensive training program. This program focused on rating procedures, developing a workshop agenda, and actually simulating the role of workshop leader. When these fifty people returned to their local areas, they in turn trained teachers from the local schools who were involved directly with students affected by the writing tests. The success of the program was confirmed by the evaluations done by workshop participants. The sampling by the Bureau of ratings of test papers done locally attested to the reliability of local rating.

Because of the success of our assessment program, there is a reluctance to make changes. In New York we have sensed a need to change the examinations for a number of years. After extensive discussion, pretesting and field testing, changes in the examinations will begin in the 1988-89 school year. Part III of the Preliminary Competency Test will be changed to reflect the revised composition curriculum for New York State, and the purposes for writing will rotate among those covered in this curriculum material. Evaluation of the samples will no longer require model answers. Although rating will be done in much the same way, raters will rely on criteria only. This change goes into effect for both the PCT and RCT. In January, 1989, the format of the business letter on the Regents Competency Test in Writing will change, and information needed by the test taker will be in note or outline form. This change will require the student to process the demands of the task and formulate a response rather than simply reword the task. In January 1992, Part III of the RCT will change to follow the change begun in the PCT.

I conclude by pointing out some problems, questions, and concerns which still need to be addressed by test makers and others interested in improving students' writing ability. These are as follows:

(1) Samples of students' writing for evaluation and instructional purposes must be obtained throughout the school year, not only at test time.

(2) Research needs to focus on the development of writers over time.

(3) Research needs to determine if skills differ appreciably for various types of writing in a test situation.

(4) Research needs to ascertain the relationship between writing done during a test and that done by the student at other times.

(5) Writing prompts may not tap the experience of the writers.

(6) Instruction can be limited to test items. Students may spend an inordinate amount of time writing business letters and structured responses to literature.

(7) Evaluators using holistic scoring may not appreciate the fact that more must be done with student papers to plan instruction.

(8) Once common elements of competent writing have been identified, instructional strategies need to be developed which will enable teachers to focus on these elements in a total language approach.

(9) In-service programs connected to a test may be limited when compared to extensive needs of teachers.



Those of us involved in the assessment of writing know how much more we know today than we knew just a few years ago, but there is still much to be learned.

PORTFOLIO ASSESSMENT ACROSS THE CURRICULUM: EARLY CONFLICTS

Speakers: Chris M. Anson, Robert L. Brown, Jr., and Lillian Bridwell-Bowles, University of Minnesota

Introducer: Virginia Slaughter, CUNY

Faced with a mandate to begin assessing students' writing at the University of Minnesota, members of the Program in Composition and Communication there finally convinced an interdisciplinary task force that a cross-curricular portfolio assessment would be the only way to bring about large-scale changes in the quantity and quality of writing instruction beyond their own writing program. In this session, the speakers shared pieces of an ongoing cultural critique that focuses on the political, curricular and ideological contexts in which they are struggling to turn a potentially damaging process into a method for empowerment, enrichment, and educational change.

Currently, the University of Minnesota plans to require applicants to submit a high school portfolio as part of the admission requirements. These portfolios require samples of writing from several subject areas as a way to encourage writing across the curriculum in the high schools. Throughout their college years, students will continue to build on their portfolio until they are juniors, at which time their major department will be responsible for assessing the quality of their writing for exit from junior-year status. Increased attention to writing, including new composition courses, writing-intensive courses across the curriculum, and trial assessments before the junior year, will provide support for the assessment program. Composition faculty will take on a greater consultative role to help departments incorporate writing into their curriculum and to help them establish methods for the portfolio assessment.

Chris Anson described the University's plans for this assessment as these are outlined in the 1987 report of the Task Force on Writing Standards. Reactions to the report were solicited from departments and colleges at the University of Minnesota, from 143 secondary school teachers and administrators across the state, and from assorted other readers, including personnel at the Minnesota State Department of Education and local professionals. Anson's close reading of these readers' responses to the report revealed a more positive attitude toward instruction among secondary teachers than among teachers at the University itself. In comparison to college faculty, the secondary teachers showed a deeper understanding of the relationship between testing and teaching, expressed fewer fears about increased workload, and worried more about the potential hazards of testing when it does not support enhanced instruction. Using quotations from several responses, Anson showed how faculty members' views of writing assessment are not only saturated by their tacit endorsement of the surrounding academic values of their institution, but also by the more specific ideological perspectives of their discipline.

Anson explained the resistance to portfolio assessment among the college faculty by describing the institutional ethos at Minnesota, a university that privileges research and publication and de-emphasizes undergraduate education. After exploring some of the ideological reasons why university faculty resist rich types of assessment and accept simplistic types (such as multiple-choice tests of grammar skills), Anson argued that before a writing assessment program can be implemented successfully, administrators must study and understand the academic culture that surrounds the planned assessment. Armed with this knowledge, administrators can plan ways of implementing rich assessment programs without facing the sort of resistance that can lead to impoverished tests and instructional decay.

Central to these understandings is an awareness of the relationship between writing programs and the larger academic culture. Composition teachers and administrators in radical writing programs are change agents, whose political praxis must be consciously grounded in theory or run the risk of becoming ineffectual, or worse, of merely reinscribing the ideologies they seek to change. Beginning with this premise, Robert Brown set out to raise theoretical questions central to such praxis. An adequate theory, he claimed, would be hermeneutic, and might take as its text the university itself, in its several manifestations: the behaviors of its members, its constituting texts, and its organizational structures. The university-as-text speaks of knowing and knowledge: their nature, value (economic and otherwise), creation and social utility. We might profitably read this text through the reciprocal processes defined in radical ethnography. If we do, we can simultaneously explicate the bureaucratic forces we encounter in attempting to build genuine literacy programs, and our own culture-specific ideologies.

Creating change requires ongoing dialogue across the curriculum about such issues as standards vs. individuality and creativity; program assessment vs. individual growth; and the place of writing instruction in the rise of the new professionalism vs. the liberal arts education. Arguing that change is possible with the right incentives for faculty, Lillian Bridwell-Bowles concluded the session by outlining some of the assessment activities underway at Minnesota. These include a study of "strong, typical and weak" writing samples across the undergraduate curriculum, studies of writing in "linked courses" which combine composition instruction with content learning, and planning the implementation of portfolios as a requirement for admission. The newly endowed Deluxe Center for Interdisciplinary Studies of Writing will provide ongoing research funds for faculty interested in five categories: the status of writing ability during the college years; characteristics of writing across the curriculum; the functions of writing in learning; characteristics of writing beyond the academy; and curricular reform in undergraduate education. Other efforts to improve the context for the planned assessment include early pilot projects for portfolio assessment that have been conducted in 18 Twin Cities metropolitan school districts, and collaborative writing assessment projects that are part of the Alliance for Undergraduate Education, a consortium of 13 public research universities.

REVAMPING THE COMPETENCY PROCESS FOR WRITING: A CASE STUDY

Speakers: Deborah Holdstein, Governors State University, Illinois; Ines Bosworth, Educational Testing Service, Illinois

Introducer/Recorder: Lu Ming Man, University of Minnesota

Deborah Holdstein began by describing Governors State University, a junior, senior, and graduate institution with a diversified student body. The University used to have a writing competency test for prospective juniors and seniors. Accompanying this competency test were a set of grading standards and a pass-and-fail system, both of which had continually drawn criticism from test readers and scorers alike, because they were vague and not academically sound. According to this set of standards, a passing essay must (1) respond to the stated topic; (2) have a clearly stated thesis; (3) show clear, logical organization of ideas in organized, well-developed paragraphs; (4) include supporting details; and (5) demonstrate one's editing ability.

Holdstein was asked to change this system. She observed that the process of revamping an assessment program was as political as it was academic. One misperception bandied around a lot was that the English teachers were determined to flunk students. Holdstein recalled that they needed someone from outside, an expert with no stake, political or otherwise, in the system, to help teachers revamp the system. Ines Bosworth from Educational Testing Service was brought in as a consultant. Bosworth emphasized that as a neutral observer, she was able to get different opinions from faculty in different departments. These discussions became extremely useful because they enabled faculty to articulate their concerns about possible changes in the testing program. Out of these discussions--and the Provost's unfailing support--came the new scoring criteria, which have four major areas: focus, organization, elaboration (support) and conventions (mechanics). These are scored with a 6-point scale, 6 being superior and 1 being seriously inadequate. This scale replaced the old pass/fail scale.


Holdstein noted that the number of questions on the test was reduced from 5 to 3 (although the test time is still 60 minutes) in response to students' complaints that the number of tasks on the old test forced them to spend a lot of time reading and figuring out questions instead of actually composing. One of the three new questions reads as follows:

"Matrimony is a process by which a grocer acquired an account the florist had." What does this quote say about the transition from single to married life? Is it accurate? How so -- or how not? Again, be sure to formulate a thesis with your point of view, and use specific examples to back up your points.

Both speakers noted that one of the many merits of this new competency test is that readers can more easily score each essay according to the criteria. Moreover, the new system is fairer than the old one. Under the old system, whenever there was a split in "failing" or "passing" decisions, a 3rd reader was consulted. Under the new system, each essay receives four readings, and readers do not know whether they are the 3rd or 4th reader; thus, a lot of political heat is removed. Readers also provide students with an "analytic checklist," which informs them of the criteria used, the weaknesses in their essays, and comments from the readers.

Bosworth commented that the inter-rater reliability of the new test is 92% (as opposed to 73% for the old test) and that more students have passed the new competency test than before. However, Holdstein pointed out that most questions in the new test tend to be too content-laden and that the scoring criteria are too heavily weighted toward content. Nevertheless, both speakers noted that the new test has proven to be far more effective than the old one and has fostered faculty collaboration.
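The abstract reports inter-rater reliability as a percentage but does not say how the figure was computed. A minimal sketch of one common way to arrive at such a number, assuming it is simple percent exact agreement between paired readers on the 6-point scale (the function name and data below are purely illustrative, not Governors State's procedure):

    # Percent exact agreement between two readers on a 6-point holistic scale.
    # Assumes "inter-rater reliability" here means simple percent agreement;
    # the newsletter does not specify the formula that was actually used.
    def percent_agreement(scores_a, scores_b):
        """Share of essays on which the two readers gave identical scores."""
        if len(scores_a) != len(scores_b):
            raise ValueError("Each essay needs a score from both readers.")
        matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
        return 100.0 * matches / len(scores_a)

    # Hypothetical scores for ten essays (6 = superior, 1 = seriously inadequate).
    reader_one = [4, 3, 5, 2, 6, 4, 3, 5, 2, 4]
    reader_two = [4, 3, 5, 2, 6, 4, 4, 5, 2, 4]
    print(f"{percent_agreement(reader_one, reader_two):.0f}% exact agreement")  # prints 90%

Under this definition, a rise from 73% to 92% simply means that readers assigned identical scores to a much larger share of the essays.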

WE DID IT AND LIVED: A STATE UNIVERSITY GOES TO EXIT TESTING

Speakers: Phyllis Liston, John Mathew, Linda Pelzer, Ball State University

Introducer/Recorder: Joyce Malek, University of Minnesota

In Fall 1987, Ball State University (Muncie, Indiana) implemented exit testing for writing competency as a prerequisite for graduation. The three-member panel responsible for establishing the rubrics and coordinating the testing and holistic grading discussed what they learned during this first year. Participants were given hands-on experience with the exam by writing briefly in response to a sample writing test essay assignment and discussing the process we went through to begin answering the essay question. They then ranked actual essays and were led through the process the panel uses to develop the rubric.

Phyllis Liston began by describing what the exam coordinators learned in the process: (1) implementing, coordinating and gaining community-wide acceptance for exit exams is "a lot harder than it looks"; (2) communication at all levels is essential; (3) low-level mistakes can cause high-level difficulties; money when needed is found; and (4) holistic grading works well. In addition, the exam needs full administrative and faculty support. As the director of the writing competency exam, Liston found the administrative duties to be a full-time responsibility requiring personnel assistance.

Liston explained Ball State's "3/3/3" exam process. Students sign up for the exam three weeks before the exam date and are given an instruction sheet detailing the exam process, the exam question, how to prepare for the exam, and where to go to receive help preparing for the exam. On the exam date, students are given three hours to write approximately three pages in response to the exam question. Students must pass the test to graduate. After two attempts, they are required to enroll in--and repeat until they pass--an upper division writing course. The second opportunity to take the exam constitutes an automatic appeal. Exit from the course is by portfolio prepared by the students with the help of their instructors. Portfolios are evaluated by two or more readers other than the classroom teacher/coach. No student takes the exam more than twice.

John Mathew explained the training process for holistic graders by taking participants through a mini grading workshop. We read and ranked three sample essays high, middle and low. Then we read, ranked and integrated into the previous essays three more, and did the same for two additional essays. We then discussed our ranking of one of the essays in terms of its strengths and weaknesses. Finally we were presented with the six-point rubric developed by the panel for the particular exam and were asked to rate the essay.

In an actual reading, graders read ten papers at a time, assess, record and score, and pass the papers on to a second reader. Papers with scores that do not match are given to a third reader. All pass decisions are made by the University Provost under the advisement of the panel and other administrators after all exams for the quarter have been scored. The panel acknowledges a high reader calibration and suggests a main reason for it is that readers do not know the cut-off point for failing, and therefore are more objective and not sympathetically influenced to pass a border-line paper.
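A minimal sketch of the reading flow just described, under the assumption that a "match" means identical scores and that a third reading settles a discrepant pair; the abstract does not spell out the resolution rule, so the names and tie-breaking logic here are illustrative only:

    # Two independent readings per paper; a third reader sees only papers whose
    # first two scores do not match. The rule that the third reading decides
    # the score is an assumption, not Ball State's documented policy.
    def score_paper(paper, first_reader, second_reader, third_reader):
        score_1 = first_reader(paper)
        score_2 = second_reader(paper)
        if score_1 == score_2:
            return score_1
        return third_reader(paper)  # discrepant papers get a deciding third read

Note that a routine like this only records scores; as the panel describes, the actual pass/fail cut-off is applied later, by the Provost, after all papers for the quarter have been read.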

Linda Pelzer described the rubric design process. The panel develops a new rubric for each exam by reading and sorting all essays written for the exam into high, middle and low categories. After sorting, they discuss the categories and write about them, and they draft a six-point rubric--one that is quite detailed and descriptive and that includes specific examples from student papers to illustrate the rubric's categories. A six-point rubric is used because it eliminates a middle score and because a four-point rubric would not be specific enough to encompass the aspects of the writing they wish to assess. The panel takes care and time in designing the rubric to make it clear and specific in order for readers to reach consensus and to withstand criticism from students, parents and faculty. Rubrics are kept on file at the University library. One indicator of the success of the rubric is that students who fail the exam and wish to contest it usually reach agreement after examining the rubric and evaluating their own writing against it.

Although the writing competency examination project is bigger than the panel first anticipated, they agree that it is worth the work.

PROFICIENCY TESTING: ISSUES AND MODELS

Speakers: George Gadda, University of California, Los Angeles; Mary Fowles, Educational Testing Service, New Jersey

Introducer/Recorder: Adele Hansen, University of Minnesota

George Gadda opened the discussion with a statement concerning general issues in developing a proficiency testing program. Proficiency testing, like achievement testing, measures success in a particular domain. There are several motivations for proficiency testing: to certify individual achievement exclusive of grades, to validate a program's effectiveness, or to screen before certification of passing to the next level of instruction. The choice of purpose governs the rest of the assessment program. Proficiency tests may be used to exempt students from further work; to prove value added in a course program; to permit passage, graduation or certification; or to identify those who need further instruction.

Gadda noted that test-makers should define the domain of the test by describing the kind of written ability being assessed and that we should make a public statement concerning the criteria used for judgment. Tests used for advancement should be a well-defined part of the curriculum, with samples and grading criteria clearly described. Ideally, scorers should be those people who are testing and using the results. In addition, we need to determine what will happen to those who don't pass. Gadda noted that proficiency tests should not be a "roadblock." He concluded by stating that we should strive for high reliability and validity in our testing because proficiency tests need to withstand legal challenges.

Mary Fowles remarked that we need an increased understanding of what is to be tested and that the "community" must share the same standards. She referred to a project in Rhode Island, where a state administrator decided to work on literacy beginning in the third grade. ETS was asked to construct a test that encouraged good writing. They worked with local administrators and teachers from every school district in the state to formulate a writing test which was administered to all 3rd graders. The test featured a pre-writing section and then an essay test. It also included an editing phase, where students were given specific questions about content.

Fowles described how scorers were trained. Every district in the state was represented in training sessions, and benchmark papers were identified and then used to train local raters. After the results were tabulated, the teachers returned to the classroom and showed examples of good papers to the students and discussed the scoring criteria. Next, the state decided to develop a portfolio of such "assignments" to validate the scores on the "test" and to enhance teaching.

In the discussion that followed, questions were raised concerning the "read and respond" type of test. Gadda agreed that such a test does assess reading as well as writing, but that there is a connection and such tests are useful to determine students' basic ability to do university level work. He added that such tests seem most fair, because all students begin with the same information and the students can then better understand the testing situation. He cautioned that such tests should always be pre-tested to discover if the reading is "accessible and interesting" and if the assignment elicits more than one response, because this can affect raters' evaluations.

CREATING, DEVELOPING, AND EVALUATING A COLLEGE-WIDE WRITING ASSESSMENT PROGRAM

Speaker: Suzy Groden, University of Massachusetts, Boston

Introducer/Recorder: Geoffrey Sim, University of Minnesota

In this session, Suzy Groden reported on the University of Massachusetts' writing assessment program, ongoing since 1978. She described how it was developed, changed, and validated. The exam is a "rising junior exam," required of students after 68 credits (or within the first semester for transfer students). Called a test of writing proficiency, the exam really tests reading, writing, and critical thinking because students have to respond to questions on texts (or "reading sets") with which they are provided one month prior to the exam. Students are either judged proficient or must remediate their writing skills.

The idea behind the test is to teach students what the faculty want them to know in various core curriculum courses, courses designed to include elements of critical analysis and reading/writing associated with that discipline. Each reading set is 20 pages and concerns a controversial topic associated with a specific discipline.

Students choose one of three sets, from natural sciences and mathematics, the social sciences, or the humanities, and they read the set for a month. After the first exam was given in 1978, a sample for students became available and a student manual was developed.

Groden stated that one problem in the exam is the lack of a penalty for those who fail. The exam is graded by readers who are trained in one morning and then read exams all afternoon. A student needs two readers to agree in order to pass, and three readers to agree in order to fail. But there are actually no practical penalties now associated with failing the exam: students can still take upper-division courses if they fail, and there is now an alternate way to demonstrate proficiency--a portfolio.
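A minimal sketch of the asymmetric decision rule described above (two agreeing readers to pass, three to fail). How additional readings are ordered when the first readers split is not stated in the abstract, so the sequential loop and the names below are assumptions for illustration:

    # Pass as soon as two readers judge the exam proficient; fail only when
    # three readers judge it not proficient. The order in which extra readings
    # are gathered is an assumption; the newsletter does not specify it.
    def judge_exam(readings):
        """readings: sequence of booleans, True = this reader judged the exam proficient."""
        passes = fails = 0
        for judged_proficient in readings:
            if judged_proficient:
                passes += 1
                if passes == 2:
                    return "pass"
            else:
                fails += 1
                if fails == 3:
                    return "fail"
        return "needs another reading"

    print(judge_exam([True, False, True]))    # pass
    print(judge_exam([False, False, False]))  # fail
    print(judge_exam([False, True]))          # needs another reading

The asymmetry builds a bias toward passing into the procedure itself: failing a student takes one more agreeing reader than passing one does.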

During the course of subsequent years, changes occurred in the context of the exam. After Groden and the university's ESL Director became involved, an interest in writing and the acquisition of language found its way into the readings. Policies surrounding the implementation of the test were gradually loosened. The use of the portfolio alternative was extended, particularly to ESL students. Also, the range of writing samples included in the portfolio was expanded to include more than just the traditional analytical paper: lab reports, for example, would be accepted. Students were allowed three hours to write the exam, rather than just two. And in one of her more striking findings, Groden found students wrote much more easily when they switched from the standard-size blue books to the larger, 8-1/2 inch size (that being the standard in which they most frequently composed). The exam committee also spent more time thinking about readings and questions; the exams became more complicated, involving ideas about the nature of knowledge. What ultimately evolved were two possible questions, one for the non-intellectual and one for the more challenging intellect. Finally, they also offered an evening session for taking the exam.

There were also many changes over the years which Groden termed losses. Faculty involvement waned, with more and more responsibility for grading falling to the exam committee. The school changed, taking in fewer freshmen and more transfer students, with the exam becoming a kind of graduation test. Funding dried up, causing the university to retreat from its core curriculum and limit the number of its core courses, and, hence, severing the relationship between the curriculum and writing proficiency.

One area in which the Massachusetts exam developers were successful was in establishing grading criteria. Sending exam samples to national experts, Groden received strong validation and agreement on their criteria. The experts, however, were critical of the number of questions, feeling they made different cognitive demands and were unfair. The committee is continuing to revise the exam.


HOW TO ORGANIZE A CROSS-CURRICULA WRITING ASSESSMENT PROGRAM

Speakers: Gail Hughes-Wiener, Susan Jensen-Cekalla, Gerald Martin, Mary Thornton-Phillips, Minnesota Community College System

Introducer/Recorder: Julienne Prineas, University of Minnesota

The speakers began the session by describing how, with the support of a Bush Foundation Grant, the Minnesota Community College System (comprised of 18 two-year colleges scattered across the state) has been engaged in a three-year project to assess the effects of their Writing Across the Curriculum (WAC) program on faculty and students, especially on student learning. The four speakers described their separate but overlapping roles in the project, with a view to communicating the complexity of implementing this type of project. Mary Thornton-Phillips' role has been to design and establish the broad structure of the project. Susan Jensen-Cekalla, as the WAC Program Coordinator, has served as a bridge between the evaluation project and the faculty out in the colleges. Gail Hughes-Wiener's role as Evaluation Coordinator is to ensure that all of the components of the evaluation--such as surveys, interviews, essay exams and such--are designed, coordinated, implemented, analyzed and communicated. Gerald Martin, as research analyst, is in charge of processing the data.

Hughes-Wiener pointed out the need to budget for the immense amount of administrative, interpersonal, and program development work required prior to any actual data analysis or report writing. Her experience has been that no one, including consultants prominent in the field of program evaluation, anticipated the amount of work and time needed for this preparatory work. The scope of the project demonstrates its complexity: data must be collected on faculty attitudes, student attitudes, and student learning. The project required the careful development of questionnaires and surveys, the effective training of interviewers, and the preparation of holistic scoring teams. In addition, the project leader had to build credibility and trust among program participants and become knowledgeable about all needed information.

Thornton-Phillips commented that their progress has been aided by a clear sense of direction, despite uncertainty as to how to achieve their goals. Allowing for flexibility and change within a general framework has proven necessary and productive. For example, faculty involvement was essential to the success of the project. Faculty had to become trained, knowledgeable participants who understood the research and their role in it. Thus, Thornton-Phillips' first challenge was to assess the needs and interests of faculty in an attempt to generate strong staff commitment and to develop a core faculty able to provide leadership for the program. The task was hindered by the voluntary nature of staff development in the Community College System and by the tendency to cut funds for such development during budget crises. Thornton-Phillips found the catalyst for the change needed in a dedicated Joint Faculty/Administrative Staff Development Committee and in a small group of faculty who had worked together for several years on implementing "Writing Across the Curriculum." Jensen-Cekalla joined the team as program coordinator, leaving Thornton-Phillips free to work on budgeting, staffing, and scheduling aspects. Together, they refined the assessment component and developed a viable approach to reassure faculty.

In her role as the most direct connector between faculty and the evaluation project, Jensen-Cekalla's foremost concern has been that all participants work together and coordinate their efforts. A cornerstone of the project is a summer workshop, which brings together faculty from all eighteen colleges. Follow-up meetings during the year provide the support and opportunity for exchange of information needed to maintain a united WAC teaching approach, and the grant provides all teachers with funds for a variety of supportive measures such as tutors, materials and supplies for the Learning Centers, and outside and in-house consulting. Jensen-Cekalla has also had to organize the flow of information and resources from the evaluation out to people in the colleges.

Martin's roles have included data analyst, data processor, in-house statistical research consultant, and resident skeptic. With the project now in its fourth year, the time has arrived to renew the research grant and inform the granting agency of the progress made. Martin noted that a project of this type raises many issues along the way. Its original purpose was to look at student outcomes, such as specific changes in writing proficiency and the learning of subject matter. However, several other desirable outcomes not in the original proposal have become obvious. Hughes-Wiener noted that they have learned, for example, that faculty enthusiasm for WAC can be generated and that inroads into the organizations at both campus and administration levels can be made.

PRESENTING A UNIFIED FRONT IN A UNIVERSITY WRITING AND TESTING PROGRAM

Speakers: Lana Silverthorne, University of South Alabama; Patricia Stephens, University of South Alabama

Introducer/Recorder: Gail A. Koch, University of Minnesota

How can we foster institutional consensus about undergraduate writing in a university? Lana Silverthorne and Patricia Stephens answered this question by focusing on university-wide participation and dialogue. They described a multilateral commitment to undergraduate writing that has grown incrementally over the past seven years at the University of South Alabama. The primary agent of this progress has been the continuous participation of faculty from various disciplines, especially in the construction of an upper-level writing across the curriculum (WAC) program.

The impetus for ongoing development of the upper-level WAC program has been a week-long summer seminar for faculty across the disciplines, including one representative from each of the undergraduate departments. It has been repeated annually since 1981. The seminar work is guided by the Director of the University Writing Program and by an outside consultant. The participants write and talk about the purpose of writing in their junior and senior courses. They get acquainted with the practice of continuous "writing-to-learn" and with its potential uses in their courses. They put together a proposal for a sequence of "writing-to-learn" assignments to be tried and revised in their own courses over several quarters, and they review each others' WAC proposals. They learn ways of responding to students' efforts to "write-to-learn."

According to Silverthorne, the WAC seminar, first conceived as a means to convert, has by now become a forum for faculty leadership. Participants become the teachers of upper-level content courses designated as writing courses. By now, at least half of the faculty are teaching such courses. (Students are now required to take two such courses, one in their major, and there are now about 70 such courses available each quarter.) WAC-experienced faculty influence the criteria by which a content course can be designated as a writing course. They give precedence to continuous writing in content courses over production of the "one-shot" term paper, and they sanction "discovery" writing which encourages students to "bring their own experiences to bear upon subject matter."

Silverthorne noted that holistic assessment of essays composed by transfer students who have had Freshman English elsewhere has provided a second opportunity for building consensus at the University of South Alabama. Piloted in 1983, the test has recently become a requirement. The test prompt mirrors the emphasis on personal writing in the University's first quarter of lower-level composition and on the exit test given at the end of this first quarter of writing. Students are given a choice of three prompts. They have two hours to write, with dictionaries and handbooks. Students are informed of the general criteria by which their essays will be judged. Each paper gets three readings, and the evaluation determines whether or not a tested transfer student starts in the first of the University's writing courses. Since 1983, about 75 percent of the students have passed the test. The transfer test essays are assessed by a cadre of faculty readers from various disciplines who teach the upper-level content courses designated as writing courses. Their decision is to pass or fail an essay. If an essay arouses irresolvable ambiguity in one reader, it is passed on to two additional readers for the pass/fail decision.

Records on this assessment process bear out the claim of active university-wide participation of faculty. Between the fall of 1986, fifty-six faculty have served as readers, about 71% of them from the professorial ranks. Their distribution by department or discipline shows variety: 7% Business; 34% English; 9% Humanities and the Arts; 18% Medical Sciences and Nursing; 14% Natural Sciences and Engineering; 18% Social Sciences and Education. The records also show high inter-reader agreement. Figures over twelve quarters between the fall of 1983 and the fall of 1987 show the average rate of agreement to be 87.2% in the first year. The local reading was tested against the judgment of external readers. With the help of the NTNW, a study was conducted to compare the assessments of five local readers to that of three readers at CUNY. The rate of agreement between the two groups overall was nearly 80%.

Patricia Stephens took up the matter of the reasons for the high degree of consensus in this assessment process. She cited the quality of the WAC seminars, the credibility of the program director, and administrative support and incentives. Faculty who are developing a new upper-level writing-designated content course are released from teaching one course, and the enrollment in their writing-designated content course is reduced to 2i. A participant in the week-long WAC seminar is paid $400; a reader for the transfer test essays who, on average, judges 35-40 papers, receives an honorarium of $50. Stephens stressed the importance of the faculty's common concern for students' development as effective writers, underscoring Silverthorne's contention that drawing upon faculty from various disciplines creates a university-wide sense of responsibility for the quality of students' writing and fosters a continuing university-wide dialogue about writing standards.

The continuing dialogue is crucial. Stephens described "calibration sessions." In these sessions, readers consider their common purpose of helping students to improve their writing and discuss the general criteria or qualities by which they decide to pass or fail a test essay in relation to this common goal. There are four qualities, a number kept small on purpose, to head off a penchant to "read for everything we know in our various disciplines." The naming of the criteria, too, is kept simple and true to the holistic assessment principle of reading for general impression: Invention (Has the writer of the essay been thoughtful, reflective, candid?); Arrangement (Has the writer achieved wholeness, made a piece of it?); Development (Has the writer recognized and fleshed out the point of the essay, giving it credibility and validity?); Style (Does the essay have clarity, give evidence of the writer's own voice, the writer's own crafting, and editing?).

The dialogue amongst faculty continues through instructional use of carefully kept records. Results of inter-reader reliability and validity studies are shared with readers to help them evaluate their own reading performance in relation to that of the others. Readers are given detailed information about the results of their own decisions, a statistical summary of each reading session, and a cumulative summary of all reading sessions. In addition, the readers are rated and their ranking reported to them. They are rated on three bases: experience, reliability, and validity (or the fit between their judgments and other information about students such as GPAs and ACT scores). In short, readers have regular, informed opportunities to reflect upon the relative fit of their judgment with the consensus.

One last piece of information about the consensus reported by Silverthorne and Stephens is that the membership of the transfer-test reading group is stable, the chief movement being the addition each year of two new members from the summer WAC seminar. Once having assumed the role, very few have ever repudiated it. Stephens pointed out that it is in the faculty's interest to be involved: reading the test essays serves as a useful means by which faculty who teach writing-designated junior and senior courses can gauge students' readiness to deal with the "writing-to-learn" orientation of their courses.

EVALUATING A LITERACY ACROSS THE CURRICULUM PROGRAM: DESIGNING AN APPROPRIATE INSTRUMENT

Speaker: Linda Shohet, Dawson College, Montreal

Introducer/Recorder: L. Lee Forsberg, University of Minnesota

Linda Shohet, director of the Literacy Across the Curriculum Center at Dawson College, has taught Canadian literature and writing at Dawson since 1973. She began developing the Literacy Across the Curriculum program in 1984; the Center now provides instructional and consultation services to English high schools and colleges throughout the province. She began the session by reviewing the language-related political issues in Quebec, and then she sketched the development of the center and discussed the evaluation of the program scheduled this spring at Dawson.

Quebec is a unilingual province in a bilingual country. French (first language) speakers comprise 24.6 percent of the population of Canada and 83.5 percent of the population in Quebec. English (first language) speakers comprise 68.2 percent of the population in Canada and 12.7 percent of the population in Quebec. French speakers see the maintenance of their language, politically, as the survival of their culture. Consequently, language awareness is high.

Dawson College is a two-year, English-language community college; all students going on for a university degree must first complete community college. The Literacy Across the Curriculum program was initiated by the faculty development committee, not the English Department, and its administration remains in the faculty development office. Keeping it out of the English Department gives the program a broader base of support and institutional commitment, Shohet said. The program, originally intended as internal, soon started receiving requests from English-language high schools and other colleges, asking for ideas and resources. As the program expanded to meet those needs, costs rose. The only source of additional funding was the government, which required evidence that the program was relevant to the entire community, including French-language schools and colleges. The Dawson College administration had ordered an evaluation of the program to show its value to its own faculty before supporting expansion.

The bureaucratic demand, Shohet said, is to ask how much student literacy has increased as a result of a program; her response consisted of showing how faculty have responded and how classroom activities have changed. She also commented that the outcomes of a literacy across the curriculum program are not limited to reading and writing instruction. At Dawson, faculty members began attending more faculty development seminars, interacting across departments, and volunteering to develop classroom projects (when previously they had been embarrassed to be seen at a writing workshop). Faculty publications also increased.

The design of an evaluation instrument began with a model developed at San Diego State College, which helped define an evaluation that would be particular to Dawson. The college also employed an outside consultant, Shohet noted, which gave the evaluation additional objectivity. She cautioned against using generic evaluation instruments; each program develops with its own goals and philosophy, and must be evaluated on that basis.

The Dawson evaluation focused on particular questions about classroom activities before and after workshop attendance: class time spent writing; time spent talking about writing; writing assignments; use of journals; working with drafts; oral communication assignments; use of library resources. In categories covering writing, reading, speaking, and listening skills, the evaluation attempted to determine what changes instructors had made in their classes, what goals they had for center programs, and whether participation in center programs had promoted educational exchange with other faculty members or increased levels of theoretical knowledge. One section defines the program's objectives and asks faculty to evaluate objectives as appropriate and applicable; this type of inquiry not only helps refine program goals for future planning, but reinforces awareness of program objectives among faculty members who respond, Shohet said.

The center ran a pilot study, distributing evaluation forms to 150 randomly selected faculty members, chosen from those who had attended workshops. About 100 responses were returned; some questions were refined. The evaluation in its final form will be distributed to faculty members across the curriculum this spring. Shohet will report on the results at next year's NTNW Conference, which will be held at Dawson College. (Shohet is the Conference Co-Coordinator.)

VALIDITY ISSUES IN DIRECT WRITING ASSESSMENT

Speakers: Karen Greenberg, NTNW and Hunter College, CUNY
Stephen Witte, Stanford University

Introducer/Recorder: Joanne Van Oorsouw, College of St. Catherine, Minnesota

Karen Greenberg began with what she deemed a radical statement: "I have examined more than 600 writing tests and have yet to see one that I would consider to be a valid one." She went on to state that it seems impossible for writing tests, with their narrow subjects, implausible audiences, and severely restricted time frames, to reflect the natural processes of writing in either academic or personal contexts.

Greenberg explained her position by pointing out that writing consists of the ability to discover what one wishes to say and to convey one's message through language, content, syntax, and usage that are appropriate to one's audience and purpose. In light of this, she said, it is particularly distressing to note that teachers at many institutions find themselves administering tests that bear little resemblance to this definition or to their curricula and pedagogy. For example, many schools still use multiple-choice tests of writing even though this type of testing does not elicit the cognitive and linguistic skills involved in writing.

She stated that writing sample tests, on the other hand, can assess writing capacities that cannot be measured by existing multiple-choice tests. They, however, also have flaws, and many problems result from our reliance on single-sample writing tests for placement and proficiency decisions. She warned that a single writing sample can never reflect a student's ability to write on another occasion or in a different mode. Yet, according to surveys conducted by NTNW and CCCC, thousands of schools across the country continue to assume that "writing ability" is stable across different writing tasks and contexts and continue to use a single piece of writing as their sole assessment instrument.

Greenberg then went on to suggest what those involved in large-scale direct assessment of writing should do about validity. The first step in establishing a test's validity is to determine its purpose: what information is needed, by which people, and for what purposes? The next step is developing a clear definition of the writing competence that is being assessed, one that will vary according to the purpose and context of the assessment. Developing this definition is a critical step in creating a valid assessment, but it is easier said than done, for there is as yet no adequate model of the various factors that contribute to effective writing in different contexts. Finally, after coming to agreement on their definition of writing competence, faculty need to establish consensus about the writing tasks that are significant in particular functional contexts.

Greenberg noted that she deliberately chose to talk about faculty rather than test developers, for she believes that the people who teach writing should be the ones who develop the assessment instruments. Faculty need to work together to develop tests, to shape an exam they believe in, so that they can be sure its principles infuse curriculum and classroom practice. Even when faculty work together, however, Greenberg said that definitions of competent writing may vary dramatically. Locally developed essay tests show incredible variability in the skills measured, due to differences in the range of skills assessed and the criteria used to judge those skills. For example, faculty often differ about the range of discourse structures that they should teach and that a test should assess. One way to sample students' ability to write different types of discourse is to use the portfolio method, in which writers select three or four different types of drafts and revisions for evaluation. This kind of assessment reflects a pedagogy that emphasizes process over short, unrevised products. Thus, this kind of test stimulates writing teachers and programs to pay more attention to the craft of composing.

Greenberg's final point was phrased as a question: What is the relationship between what we teach and what we test? We cannot, and should not, separate testing from teaching, and we as a profession must be more concerned with the validity of both of these efforts.

Steve Witte summarized a study begun in 1982 which sought to answer two research questions: (1) Do writing prompts that elicit different types of writing and that elicit written texts of the same quality cause writers to orchestrate composing in different ways? and (2) Do comparable prompts that elicit the same type of writing and elicit written texts of the same quality cause writers to orchestrate composing in different ways? Witte stated that although this study did not investigate naturally occurring discourse, this type of experimental study can inform the kinds of conceptualizations we can make beyond the experimental study.

The first step in conducting this research was to create two comparable writing tasks of two types: expository and persuasive. Prompts were created after consultation with students, writing teachers, high school teachers, and pre-service high school teachers. Those prompts found to be comparable by these groups were then pretested and were found to elicit comparable ranges of writing quality. The subjects were 40 volunteer college freshmen at the University of Texas who were randomly assigned to one of the four tasks. Think-aloud protocols and rough drafts were collected and analyzed according to a coding scheme developed by the experimenters. The results of a multivariate ANOVA showed that 16 variables distinguished between the persuasive and expository tasks; these variables included generating ideas, setting content goals, and reviewing text. Writers tended to set more content goals and generate more ideas for the expository tasks and set more rhetorical goals for the persuasive tasks. A discriminant analysis was done to determine which variables distinguished among all four tasks. Eleven variables were found to do this.
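Witte did not present analysis code; as a rough sketch of the kind of analysis the abstract describes (a multivariate comparison of coded process variables across tasks, followed by a discriminant analysis), something along the following lines could be run with standard Python libraries. The variable names and the data file are hypothetical stand-ins, not taken from the study.

    # Sketch only: multivariate comparison of coded composing-process variables
    # across prompt types, then a discriminant analysis over all four tasks.
    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # One row per writer: coded protocol counts plus the assigned task label.
    df = pd.read_csv("protocol_codes.csv")  # hypothetical file
    process_vars = ["generating_ideas", "content_goals", "rhetorical_goals", "reviewing_text"]

    # Multivariate test: do the process variables differ by task?
    formula = " + ".join(process_vars) + " ~ task"
    print(MANOVA.from_formula(formula, data=df).mv_test())

    # Discriminant analysis: which variables best separate the four tasks?
    lda = LinearDiscriminantAnalysis().fit(df[process_vars], df["task"])
    for name, weights in zip(process_vars, lda.scalings_):
        print(name, weights)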

Witte stated that the findings indicate that writers engage in different kinds of processes for different kinds of tasks. In terms of writing assessment, each prompt we use to assess ability will be measuring different dimensions of that ability. The obvious conclusion, then, is that there is no way to assess writing ability with only one task or prompt. We do not yet know how many prompts or tasks might be needed. Witte also noted that this study should make us question models of the writing process that are based on protocols from just one task. More research of the type presented here--studies that examine the effects of context on process--is needed. In Witte's study, context was limited to the writing prompt, a part of the context important to writing assessment. He said that we need more research that will help us identify how writing processes are circumscribed by other aspects of context.

RELIABILITY REVISITED: HOW MEANINGFUL ARE ESSAY SCORES?

Speaker: Edward White, California State University, San Bernardino

Introducer/Recorder: Karen Greenberg, NTNW and CUNY

Ed White began the session by offering a clear definition of reliability: it is the consistency of measurement over different test situations and contexts. He explained the various types of reliability and discussed their origins in agricultural research. He briefly discussed validity in educational research and noted that reliability is "the upper limit for validity" (i.e., no test can be any more valid than it is reliable).

Next, White discussed "true scores," the "standard error of measurement," and uncertainty in measurement. The true score of a test is a Platonic ideal--it is the mean score of repeated attempts at the test under identical conditions. Since we can never determine a student's true score on a test, we need to calculate the test's standard error of measurement (a statistical estimation of the standard deviation that would be obtained for a series of measurements of the same student on the same test). White pointed out that because of the error in all measurement, no single score is reliable enough to be used as the sole determinant of any particular ability or skill.
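White's point can be stated compactly. Although no formula appears in the abstract, the conventional estimate in the psychometric literature he was drawing on derives the standard error of measurement from the spread of observed scores and the test's reliability coefficient:

$$\mathrm{SEM} = s_X \sqrt{1 - r_{XX'}}$$

where $s_X$ is the standard deviation of observed scores and $r_{XX'}$ is the reliability coefficient. An observed score is then read as lying within roughly two SEMs of the true score about 95 percent of the time, which is why no single score should carry a high-stakes decision by itself.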

Next, White explained the problems in essay test reliability. He compared the reliabilities of holistic scoring, analytic scoring, and multiple-choice scoring; and he discussed the difference between inter-rater reliability (agreement between different raters) and intra-rater reliability (agreement of a rater with him/herself at different points in time). White commented that rater disagreements over the quality of holistically scored essays do not constitute "errors." The traditional psychometric paradigm of reliability cannot help us with a phenomenon such as subjective judgment, which may be better determined through rater disagreements rather than through their agreements. This led White to a discussion of "generalizability theory" and its implications for the reliability of essay test scores. He noted that our goal should be a reduction in the number of rater disagreements of more than two scale points (these should occur no more than 5% of the time in any scoring session).
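As a concrete illustration of the session-level check White described -- not code from the talk -- the share of paired readings that differ by more than two scale points can be tallied directly. The score pairs below are hypothetical.

    # Hypothetical paired holistic scores (reader 1, reader 2) on a 6-point scale.
    pairs = [(4, 4), (3, 5), (2, 6), (5, 4), (1, 4), (3, 3), (4, 5), (2, 3)]

    # Count disagreements of more than two scale points; White's stated goal is
    # that such discrepancies occur in no more than 5% of readings in a session.
    big_gaps = sum(1 for a, b in pairs if abs(a - b) > 2)
    print(f"{big_gaps}/{len(pairs)} readings ({big_gaps / len(pairs):.0%}) exceed a two-point gap")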

White ended with suggestions for increasing the reliability of essay testing. Essay test administrators should reduce the sources of variability in test contexts (by controlling as many variables as possible), should keep the scoring criteria constant, should pre-test and control test prompts, should control essay reading and scoring procedures, and should always try to use multiple measures to assess students' skills.

ESTABLISHING AND MAINTAINING SCORE SCALE STABILITY AND READING RELIABILITY

Speakers: Wayne Patience, GED Testing Service
Joan Auchter, GED Testing Service

Introducer/Recorder: Anne Aronson, University of Minnesota

Wayne Patience and Joan Auchter presented the procedures used by the General Education Development Testing Service (GEDTS) to evaluate essay exams required as part of the GED Test for individuals seeking high school equivalency diplomas. They described and illustrated the methods employed by GEDTS to establish and maintain stability or consistency of scoring, and reliability among readers, despite the decentralized nature of their evaluation program.

Patience explained that the notion of equivalency derives from: (1) defining the content of the GED Tests so as to reflect the community's expected outcomes of completing a traditional high school program of study and (2) defining passing scores relative to the actual demonstrated performance of contemporary graduating seniors. Only those examinees who receive scores that are better than 30% of high school seniors are awarded the diploma. The job of the GED staff is to describe the skills and content knowledge that characterize the work of high school seniors rather than to prescribe levels of achievement.

The recent addition of an essay exam to the Writing Skills Test created questions about how to establish and maintain reliability. The GEDTS's first activity was to develop a scoring scale that would have the same criteria regardless of time or place. By administering an essay exam to thousands of high school seniors, sorting those essays into six stacks, and describing the characteristics of each stack, the Writing Committee of GEDTS was able to develop a holistic scoring scale that has been used successfully in hundreds of sites nationwide.

Auchter then reported on how GEDTS insures stability and reliability in the use of the scoring guide. A permanent GEDTS Writing Committee, consisting of practicing language arts professionals, selects the topics and the papers that are used in training, certifying, and monitoring site trainers and readers. The Writing Committee chooses and tests "expository" topics that do not require students to have any special knowledge or experience. The next step is for GEDTS to train and certify Chief Readers who are responsible for insuring that the GED scoring standards are applied uniformly. During the 2 1/2-day training, Chief Readers learn to overcome personal biases (e.g., responses to handwriting) that may influence scoring, and to use the language of the scoring guide alone to describe and evaluate papers. Sets of training papers contain a range of papers for each point, to illustrate the fact that there is no "perfect" paper for each point, but that there is typically a distribution of high, medium, and low papers. Training packets also include problematic papers (e.g., a paper written in the form of a rap song). Since the national average for high school essay scores is 3.25 and for GED scores is 2.7, training sets contain a disproportional number of 2, 3, and 4 papers. After working with training papers, GEDTS trainees are required to evaluate several sets of papers to determine whether or not they are currently certifiable as Chief Readers.

The same training and certifying procedure is carried out at the various decentralized testing sites, with the Chief Readers responsible for training and certifying readers. Auchter noted that language arts teachers trained through this process feel better about teaching writing and about using holistic grading in the classroom.

Further steps to insure score scale stability and reliability are site certification and monitoring. Each scoring site must demonstrate the ability to score essays in accord with the standards defined by the GED Testing Service. Essays used for site certification must receive at least 80 percent agreement in scoring among Writing Committee members. Although some sites may achieve high inter-reader reliability, a site cannot pass certification unless it achieves at least 90 percent agreement with GEDTS essay scores (or 85 percent for a provisional pass). Three procedures are used to monitor testing sites: (1) the Chief Reader does third readings of discrepant scores and records each time a reader is off the standard; (2) readers evaluate a set of "recalibration" papers at the beginning of each scoring session in order to establish reliability for that day; and (3) GEDTS conducts site monitoring using the same procedures as are used in site certification.
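The presenters did not show scoring code; purely as a sketch of the certification arithmetic described above, a site's scores can be compared against the GEDTS scores for the same essays and the 90 percent (full) and 85 percent (provisional) thresholds applied. The scores below, and the reading of "agreement" as an exact match, are assumptions, not GEDTS specifications.

    def certify_site(site_scores, gedts_scores):
        # Percent of essays on which the site's score matches the GEDTS score.
        # Exact match is assumed; the abstract does not say whether adjacent
        # scores also count as agreement.
        matches = sum(1 for s, g in zip(site_scores, gedts_scores) if s == g)
        agreement = matches / len(gedts_scores)
        if agreement >= 0.90:
            return agreement, "certified"
        if agreement >= 0.85:
            return agreement, "provisional pass"
        return agreement, "not certified"

    # Hypothetical 6-point scores for the same twenty essays.
    site  = [3, 4, 2, 3, 5, 3, 2, 4, 3, 3, 4, 2, 3, 3, 4, 5, 2, 3, 4, 3]
    gedts = [3, 4, 2, 3, 4, 3, 2, 4, 3, 3, 4, 2, 3, 3, 4, 5, 2, 3, 4, 3]
    print(certify_site(site, gedts))  # (0.95, 'certified')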

TRAINING OF ESSAY READERS: A PROCESS FOR FACULTY AND CURRICULUM DEVELOPMENT

Speaker: Robert Christopher, Ramapo College, New Jersey

Introducer/Recorder: Mary Ellen Ashcroft, University of Minnesota

Robert Christopher emphasized the imperative of assessment, pointing out that assessment has always been intrinsic to the classroom experience, but it has now become extrinsic. He noted that many faculty fear writing assessment efforts because they represent an intrusion on their methods for evaluating students. He stated that faculty fears can be countered by several arguments: assessment helps students, it facilitates faculty and institutional research, and it is a professional activity.

Christopher went on to suggest ways of building faculty consensus for assessment. In training readers, he suggested starting with a loyal, supportive core. This group's primary responsibility would be the development of an instrument for assessment, a task which should take six months to a year. He suggested that good readers are people who are task oriented, are good collaborators, are preferably not new faculty members (who might not have a sense of writing at the institution), and who work with "all deliberate speed." Good readers must not be "Matthew Arnolds" before whose standard everything fails. It is important, according to Christopher, that a large pool of readers be developed, so that a small loyal group will not wear out.

The next step, according to Christopher, is conducting a reading to build consensus. The initial reading should consist of 500 to 1000 essays, so that readers get a sense of the range of writing abilities of students at the institution. The reading must be conducted "blind," with each paper read and assessed twice. Essays are identified as "strong," "weak," and "in-between"; readers discuss each essay and slowly evolve into an interpretive community.

In terms of curricular implications, Christopher pointed out that placement assessment is easier to accomplish and has been more fully studied than proficiency assessment. He also noted that assessment can be used for students to learn to talk about their writing in small groups and in conferences, so that students learn to be better readers and editors. Assessment can also be used to encourage collaborative or group teaching, said Christopher. As faculty members relinquish some control to group or collaborative situations in the assessment process, they learn from one another and share techniques and materials.

In answer to conferees' questions about developing the holistic process, Christopher suggested that the English Department provide a core of expert readers which should eventually grow to become interdisciplinary. He noted that in any two-day reading of essays, there is always the need for reliability checking ("Let's all read this essay and make sure we're on track"). Finally, Christopher pointed out that students benefit from holistic essay assessment because their writing skills are evaluated by a team of teachers. This kind of assessment program, Christopher says, works on behalf of students.

DISCREPANCIES IN HOLISTIC EVALUATION

Speakers: Donald Daiker, Miami University, Ohio
Nedra Grogan, Miami University, Ohio

Introducer/Recorder: Sandra Flake, University of Minnesota

Donald Daiker presented the goals of the session: to share the conclusions and a tentative evaluation of his and Nedra Grogan's examination of discrepancies in holistic evaluation. Noting that discrepancies in holistic evaluation have been a problem from the beginning, he raised two questions: What accounts for discrepancies in holistic evaluation if the "quirky" reader is ruled out? And is there such a thing as a discrepant essay?

Daiker and Grogan sought to answer these questions using an annual holistic grading session for Miami University's Early English Composition Assessment Program (EECAP), a program in which 10,000 essays written by high school juniors in a controlled setting are evaluated for diagnostic purposes. The setting was one in which students, using a prompt, wrote for 35 minutes in a high school composition class. The time limitation was dictated by the constraints of a single class period. The goal of the holistic evaluation was essentially diagnostic, with a scoring scale of 1 to 6. Grades of 5 or 6 indicated clearly above average papers demonstrating strengths in all of the rating criteria. Grades of 3 or 4 indicated papers ranging from slightly below to slightly above average, with combined strengths and weaknesses in the criteria or under development. And grades of 1 or 2 indicated clearly below average papers failing to demonstrate competence in several of the criteria, often because the paper was too short. A grade of 0 was used only for papers which were off the topic of the prompt. Evaluators gave each paper a single holistic rating, and additionally rated criteria in four categories (ideas, supporting details, unity and organization, and style).

The participating high school teachers (who were the evaluators) were trained through a process of rating and discussing sample papers, so that the rating criteria would be internalized. Participants in the session were then provided with the writing assignment or prompt, the scoring scale, the rating criteria, a rater questionnaire, and one of the papers.

To locate possible discrepant papers, Daiker looked for three-point gaps in scoring by two evaluators and gave such papers to both a third and a fourth evaluator. If those evaluators also disagreed on the rating of the paper, he identified it as a potentially discrepant paper. Through this process, four potentially discrepant papers were identified, and those four papers were given to all 61 of the evaluators in a session at the end of the second weekend of evaluation. Participants in the conference session then read and evaluated one of the potentially discrepant papers, using a rater questionnaire, scoring scale, and rating criteria. The ratings of the participants were tabulated: 1 person assigned the paper a 6, 16 assigned a 5, 28 assigned a 4, and 4 assigned a 3.

Following the participant evaluation and some discussion, Grogan presented the result of the evaluation by the 61 trained raters who rated the paper at the end of the second weekend of evaluation, with 26 of the raters (42.6%) giving an upper range (5-6) rating, 34 of the raters (55.8%) giving a middle range (3-4) rating, and 1 (1.6%) giving a lower range (1-2) rating.

Because of the clear division between the 5-6 and the 3-4 ratings, Grogan and Daiker believe that the paper did qualify as a discrepant paper. Daiker reported that discussion following the rating by the trained evaluators suggested a correlation between the depth of emotional response to the paper and the highness of the score. Following some discussion about whether or not the paper was truly discrepant, a conferee asked whether the problem was really caused by discrepant readers who could not be objective because of the depth of their emotional response. Daiker argued that reader objectivity was a more complicated issue and further argued that precisely because the paper provokes a range of responses to its emotional content, it could be defined as a discrepant paper.

The implications of evaluating discrepant papers were then summarized by Grogan, who raised the issue of the role of holistic evaluation of a single essay that receives discrepant scores. She concluded that in such cases a single essay should not determine the fate of the writer, and that an appeals process clearly needs to be a significant part of a holistic evaluation program. Discussion throughout the session focused on some of the limitations of holistic evaluation of writing produced under a time constraint, on problems in establishing clear criteria and scales, and on problems of reader objectivity.


PROBLEMS AND SOLUTIONS IN USING OPEN-ENDED PRIMARY TRAIT SCORING

Speaker: Michael C. Flanigan, University of Oklahoma

Introducer/Recorder: Chris Anson, University of Minnesota

Michael Flanigan began by outlining his university's plan for five years of experimental research on the teaching and testing of writing. Much of this research will replicate published research studies, but original research will also be conducted. All of the studies will be controlled experimental studies so that the researchers can be fairly faithful to the original ones and can analyze any differences between the original and new research.

Flanigan discussed one study, already completed, in which he and his colleague David Mair combined the strategies of two studies by George Hillocks: an experimental study involving teaching extended definition using inquiry and models, and a descriptive study dealing with "modes of instruction" (both of which Hillocks discusses in some detail in his book Research on Written Composition). In the replicated study at Oklahoma, all twenty classes consisted of university freshmen; for nine of the ten teachers it was their second semester of teaching, and approximately 500 students were involved.

Flanigan pointed out that he chose Hillocks' studies because both dealt with significant areas in teaching and writing. Extended definition represents a kind of discourse that permeates almost all thinking and writing. The researchers believed that by replicating such an important study they could get inside the problems of the earlier research and come to understand it better. The experimental extended definition study also used Hillocks' open-ended primary trait scoring technique because the researchers wanted to learn to use and understand it better.

After reporting the findings from a small sampling of the data, Flanigan described some problems that he and his colleagues faced as they attempted to use Hillocks' open-ended primary trait scoring system, and he discussed the modifications they made in it to obtain reliable results. He pointed out that with an open-ended primary trait scoring scale, theoretically there is almost no limit to what students can score. Most scoring scales range from 1 to 6 (as in the holistic score for the ECT), 2 to 8 (as in CLEP), 1 to 5 (as in CORE scoring), and so forth. In open-ended primary trait scoring, the limit for a talented student is probably dictated by time and by the variation and limitations imposed by the writing called for. In the papers scored in this study, the top score was 28.

The traits for which students could receive scores were: (1) properly putting an item in a class; (2) creating criteria for the class; (3) giving examples; and (4) providing contrastive examples to clarify and limit each criterion. Points were not given for differentiae as in Hillocks' original study; instead, class and differentiae were combined (on the advice of Hillocks when the study was set up). Hillocks' scorers had had problems reaching agreement on this point. Students could receive 2 points for the class, 2 for each criterion, 2 for each example, and 2 for each contrastive example. Obviously, the more criteria, examples, and contrastive examples students could come up with, the higher their score. In initial training, scorers had problems staying close together in the higher ranges, so Flanigan modified his tolerance of acceptability by allowing scores in the range 1 to 10 to differ by 1 point, 11 to 20 to differ by 2 points, and 21 and up to differ by 3 points. Scores within that range were averaged; scores that did not meet acceptable standards were read by a third reader. If the third reading fell within range of either of the other two readers, then those scores were averaged. If there still was no agreement, a fourth and fifth reader scored the paper, and the paper and the range of scores were given to the researchers and a score was determined. For example, one paper was scored 6 and 8; a third reader gave it 10; the fourth reader gave it 9, and the fifth reader gave it 7. Its final average was an 8. Only seven papers required the fourth and fifth readers. Often, readers had problems keeping clearly in mind the kinds of criteria the writers were developing. To simplify the process, any one clear criterion could be accompanied by a number of examples and contrastive examples. If no criterion was given, only one example could be counted. If an undeveloped example or string of general examples was given, a score of 1 was given.
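Flanigan's tolerance and averaging rules lend themselves to a short procedural restatement. The sketch below follows the ranges and steps reported above; which of the two scores sets the tolerance band is not specified in the abstract, so the higher score is assumed here.

    def tolerance(score):
        # Allowed gap widens with the score: 1-10 may differ by 1 point,
        # 11-20 by 2 points, 21 and up by 3 points.
        if score <= 10:
            return 1
        if score <= 20:
            return 2
        return 3

    def resolve(first, second, third=None):
        # Assumption: the higher of the two scores picks the tolerance band.
        if abs(first - second) <= tolerance(max(first, second)):
            return (first + second) / 2
        if third is None:
            return None  # a third reading is required
        for earlier in (first, second):
            if abs(third - earlier) <= tolerance(max(third, earlier)):
                return (third + earlier) / 2
        return None  # fourth and fifth readers, then the researchers, settle the score

    print(resolve(6, 7))      # 6.5: within the one-point band, so the two scores are averaged
    print(resolve(6, 8))      # None: a third reader is needed
    print(resolve(6, 8, 10))  # None: still unresolved, as with the example paper that ended at 8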

Flanigan concluded that open-ended primary-trait scoring offers real promise, for it allows for a kind of differentiation that closed, limited systems do not. However, researchers who use the system will probably have to modify it to get consistent, reliable scores. They will also have to plan their research so that the traits they are describing and scoring are clear, well-defined, and fully conceptualized by their scorers. The session ended with the speaker giving participants six papers that had been scored by three readers and leading participants through a guided scoring session.

THE IMPLICATIONS OF THE RHETORICAL DEMANDS OF COLLEGE WRITING FOR PLACEMENT

Speaker: Kathryn Fitzgerald, University of Utah
Introducer/Recorder: Linda Jorn, University of Minnesota

Kathryn Fitzgerald gave participants attending this session a chance to analyze student writing in terms of rhetorical evaluation criteria developed at the University of Utah. These criteria are intended to do the following: (1) describe the rhetorical situation college students face when asked to write an essay that will be assessed and used to place them in a freshman writing course; (2) assert that the evaluation of the rhetorical situation provides valid criteria for placement of students into various levels of freshman writing courses; and (3) shape the discussion of the student writing by providing a holistic view of writing.


THE SEVENTH ANNUAL CONFERENCE ON WRITING ASSESSMENT is a national conference designed to encourage the exchange of information about writing evaluation and assessment among elementary, secondary, and postsecondary administrators, teachers, and test developers through forums, panels, and workshops.

DISCUSSION TOPICS:
Research on Writing Assessment
Writing Assessment Across Cultures and Languages
Developing Essay Tasks
New Models of Scoring Essays
The Impact of Testing on Curricula and Pedagogy
Problems in Exit and Proficiency Testing
Computers and Writing Assessment
Writing Program Evaluation
National Standards for Writing
Portfolio Assessment

Note: Springboards, Quebec's annual language arts conference, will take place April 13th and 14th in Montreal. For information, call Fran Davis at (514) 484-7646 (E.S.T.).

PRE- AND POST-CONFERENCE WORKSHOPS:
To provide in-depth exploration of selected issues, four workshops have been added to this year's program. Enrollment for each is limited, so register as soon as possible. The fee for each workshop is $40 US, $50 Canadian (which includes lunch, coffee breaks, and all materials).

Pre-Conference: Saturday, April 8, 9:30 a.m.-4:00 p.m.

A. Computers in Writing
   Leaders: Helen Schwartz & Michael Spitzer (30 participants)

B. Large-Scale Writing Assessment
   Leader: Edward White (50 participants)

Post-Conference: Wednesday, April 12, 9:30 a.m.-4:00 p.m.

C. Portfolio Assessment
   Leaders: John Dixon & Peter Elbow (50 participants)

D. Writing Assessment Across Cultures
   Leaders: Alan Purves and colleagues (50 participants)

CONFERENCE SCHEDULE

Sunday, April 9, 1989

6:00 p.m. - 7:30 p.m.  Conference Opening

SPEAKERS:
Carolynn Reid-Wallace, Vice Chancellor, CUNY
Gerrard Kelly, Director General, Dawson College
Rexford Brown, Education Commission of the States

7:30 - 9:30 p.m.  Reception

Monday, April 10, 1989

9:00 a.m. - 10:30 a.m.  Opening Plenary Session

SPEAKERS:
Joseph S. Murphy, Chancellor, The City University of New York
John Dixon, Author of Growth Through English

10:30 a.m. - 5:30 p.m.  Concurrent Panels/Workshops

Tuesday, April 11, 1989

9:00 a.m. - 10:15 a.m.  Second Plenary Session

SPEAKER:
Janet White, Deputy Director, NFER, England

10:30 a.m. - 11:45 a.m.  Concurrent Panels/Workshops

12:00 p.m. - 2:00 p.m.  Luncheon and Closing Session

SPEAKER:
Bernard Shapiro, Deputy Minister of Education, Ontario

2:15 - 5:00 p.m.  Concurrent Panels and Workshops


REGISTRATION INFORMATION

Registration is limited to 700 people. Before March 9, the registration fee of $100 US or $125 Canadian per participant includes:

opening night reception (April 9)
2 continental breakfasts (April 10, 11)
conference luncheon (April 11)
coffee and tea between sessions
all conference materials

After March 9, 1989, the registration fee is $120 US or $150 Canadian.

For further information, call Linda Shohet, 514-931-8731, or Karen Greenberg, 516-766-8099.

HOTEL RESERVATIONS

Please write or call the Centre Sheraton Hotel directly before March 7, 1989 to receive our special conference rates:

Single: $115 Canadian*
Double: $125 Canadian*

*There is a tax on hotels in Montreal. The United States/Canadian exchange rate varies: the cost of the room in US dollars will be 20% - 25% less than the Canadian rates above.

Address: Le Centre Sheraton Hotel
1201 Dorchester Blvd. West
Montreal, Quebec, Canada H3B 2L7

Phone: (514) 878-2063

THE SEVENTH ANNUAL CONFERENCE ON WRITING ASSESSMENT

Registration Form

NAME(last name) (first name)

TITLE

INSTITUTION

MAILING ADDRESS

HOME PHONE ( ) WORK PHONE ( )

Conference Registration Fee: Before March 9, 1989: $100 US; $125 Canadian. After March 9, 1989: $120 US; $150 Canadian.

Workshop Fee: If you are registering for a pre- or post-conference workshop, please write a separate check for each workshop (so that we may return your check if the workshop has already reached its limit). Each workshop is $40 US, $50 Canadian. Please circle the letter of each workshop for which you are registering:

Sat. April 8: A_ B_ Wed. April 12: C_ D_

Please make all checks payable to: National Testing Network 1989

Please mail completed registration form and check(s) to:

Linda Shohet
3040 Sherbrooke Street W.
Montreal, Quebec, Canada H3Z 1A4



Before handing out samples of student writing, Fitzgerald discussed the theoretical background for developing the rhetorical criteria. She also reviewed some of the common problems of holistic scoring, emphasizing the fact that holistic scoring does not consider the different purposes of writing (for example, persuasive vs. self-expressive writing). The rhetorical evaluation criteria developed at the University of Utah were designed to alleviate some of the problems encountered when using holistic scoring. The criteria help readers consider the purpose of students' writing and identify the internal and external purposes of the writing situation.

Fitzgerald pointed out that students' internal and external purposes complicate the writing situation for them. At the University of Utah, faculty feel that the purpose for students' writing needs to come from the students (i.e., internal), but in academia the purpose often comes from the instructor and is motivated by grades (external). The student has to think up his or her purpose for writing and must shape this purpose to serve the academic external purpose. Therefore, the student's purpose is always dual. These internal and external purposes are in essence the rhetorical situation, and they need to be taken into account when faculty evaluate writing, particularly when this evaluation is used to place freshmen into English courses. Students' ability to handle this complex rhetorical situation informs instructors of the students' readiness for college writing.

Next, Fitzgerald described how the rhetorical expectations of University of Utah professors were determined and used to develop the rhetorical evaluation criteria. These criteria consist of the following categories:

Category 1: The writer's relationship to college readers and writers. Expectations: The most proficient writers recognize that any single piece of college writing is part of an ongoing written discussion about a topic and that they are expected to make a contribution to the discussion. They recognize that an authority (i.e., professor, test giver) identifies issues for discussion.

Category 2: The writer's relationship with his or her subject matter. Expectations: College writers control their subject matter, pressing it into service to support their internal and external purposes.

Category 3: The writer's relationship to the conventions of the genre. Expectations: College writers employ syntactical units appropriate to their thought, precise vocabulary, and the mechanics and spelling of standard written American English.

University of Utah students are given placement essay directions that explain the external rhetorical situation, and they have 45 minutes to plan, write, and revise their essays.

After this review of the theoretical background and the criteria, participants used the criteria to evaluate and discuss some student writing. Fitzgerald pointed out that readers are told to pay attention to content and reasonability, that there are no hard and fast rules, and that judgment is a balancing act involving the various criteria and expectations of each institution. Readers at the University of Utah look at the quantity of student writing as relative to every piece of writing. In summary, Fitzgerald stated that these rhetorical evaluation criteria force readers to evaluate writing for its purpose, help readers define good college writing, and address the need to teach students about the effect that the rhetorical situation has on their writing.

USING VIDEO IN TRAINING NEW READERS OF ASSESSMENT ESSAYS

Speaker: George Cooper, University of Michigan
Introducer/Recorder: Terence Collins, University of Minnesota

Large-scale testing programs face a recurring problem of reader consistency and reliability. In this presentation, George Cooper demonstrated how the English Composition Board at the University of Michigan uses a video presentation of reader "standardization sessions" for self-monitoring within the reader cadre, for training new readers, and for disseminating information about the ECB's procedures to various campus constituencies. While Cooper presented alone, his remarks were prepared with Liz Hamps-Lyon.

In its placement readings, members of Michigan's ECB teams are guided by statements of criteria clustered under three headings: "structure of the whole essay," "smaller rhetorical and linguistic units," and "conventions of standard English surface features." Students write essays in response to prompts that define a situation and provide several choices of opening sentences. Two important characteristics of the 6000 student essays, then, are that topic choice is limited and that orientation toward the topic is guided through the provision of choices for essay openings. Further, the essays are rated for placement; recommendations fall into one of the following categories: exempt (7%), Introductory Composition (82%), and tutorial (11%). These recommendations reflect scores of 1, 2-3, and 4. While criteria for quality are outlined to readers, no specific calibration of trait content for the four-point range is provided.


Scoring in this system depends on achieving what Cooper calls a "community of values" among readers. The video of reader standardization sessions grew out of one summer's experience in which this community of values had been lost; as Cooper put it, "readers were using an unimaginable range of criteria by which to evaluate essays" and "had become entrenched in their own perspectives." The original motive for the video was self-examination. Through videotaping daily standardization sessions in which papers receiving "split" scores were the focus of discussion, Cooper's team of readers sought to capture the articulation of values giving rise to the discrepancies and to record the process of moving to agreement on the application of criteria. This led the team to analyze and communicate important characteristics of their standardization sessions and their assessment as a whole. Also, this procedure modeled a process of "give-and-take" that was helpful in training new readers and in explaining the placement process to various departments.

From ten hours of session tapes, the team assembled thirty-five minutes of actual exchanges interspersed with explanation and highlighting. The standardization discussion presented in the tape enacts what Cooper calls "positive sharing": talk marked by the various readers' attempts to recognize the qualities in an essay that lead to divergent scoring, each reader's comments leading to further discussion and finally to agreement. Such discussion (whether on the tape or in person at the start of a reading session) reminds participants of the criteria governing scoring. It serves the further purpose of helping group members realize the vitality of the act of reading, placing an apparently perfunctory reading act (in the context of reader-response theory) into the full context of extra-textual factors that shape readings in open view. The importance of reflecting on the evaluator as reader--co-creator of a text--rests in the capacity of texts to sway a reader-evaluator when they embody positions to which the reader might be favorably inclined or which the reader might find repugnant.

Cooper asserted that the taped standardization sessions play the key role of "forming individual consciousness into a community consciousness." The video record of this work in progress puts flesh on the abstraction and models the process for beginners in order to cultivate a community of readers who will evaluate not only the student essays but will also study their own responses, keeping in mind the relationship of their responses to the criteria.

WPA PRESENTATION ON EVALUATING WRITING PROGRAMS

Speakers: Robert Christopher, Ramapo College, New Jersey
Donald Daiker, Miami University, Ohio
Edward White, California State University, San Bernardino

Introducer/Recorder: John Schwiebert, University of Minnesota

This session was organized by the National Council of Writing Program Administrators (WPA), and the panelists wished to share their experiences as writing program evaluators and to address salient issues of writing assessment as they pertain to writing program evaluation.

Upon request, consultant-evaluators from the National Council of Writing Program Administrators will conduct a writing program assessment for a college or university. To prepare both themselves and the WPA evaluators (usually a team of two) for the assessment, schools are asked to complete a narrative "self-study" of their writing program at least one month before the WPA team visits. Robert Christopher distributed copies of the self-study guidelines, which can be obtained from the address given at the end of this abstract. The purpose of the assessment is to help faculty and administrators develop more effective writing programs appropriate to their institutions' needs. Donald Daiker and Edward White described occasions when the WPA service assisted writing faculty on a campus to enlist high-level administrative support for innovative reforms in their writing programs.

Most of the session focused on the topic of testing, which, it was emphasized, is only one dimension of an overall program assessment. To be effective, institution-wide programs of assessment should be appropriate to the particular needs, demographics, and aims of the individual school. The challenge of deciding what is appropriate underscores the relevance and value both of the WPA assessment and of the self-study a school does before the WPA visit. Panel members discussed some of the key issues involved in each of the following kinds of testing: admissions, placement, equivalency, and course exit. Rising-junior and value-added tests were also mentioned but could not be discussed in detail in the time allotted. Key points about each type of test are below:

Admissions Tests: Discussing the purposes of the SAT verbal exam, White stressed that the SAT assesses verbal aptitude and not writing ability. As such, it is useful as a criterion for admissions but should not be a basis for exempting students from freshman composition.

Placement Tests: Before actually developing a placement test, a school should decide if it needs one. Many institutions do need such exams to assure that individual students receive writing instruction appropriate to their abilities and experience. After a need has been determined, a school should develop a test based upon its curriculum -- specifically, upon what is taught in freshman composition. Some schools borrow or adopt tests that fail to mesh with their own institutional needs. Only by examining its curriculum can an institution rationally decide what it is testing for.

Equivalency Tests: These tests provide a special service to students, and they differ fundamentally from placement exams. The basic message of an equivalency test is: "Show us that you (i.e., the student) are in control of what we do in freshman comp and we'll let you out of it." As such, equivalency tests must be based firmly on the school's curriculum. Given its special purpose, the testing instrument must also be more complex than one used for placement.

Course Exit Tests: The course exit exam is a common test that all students must pass in order to complete a course (freshman composition or other). Noting that such tests can discriminate against students who write well but who are poor drafters or test takers, White urged against tests being the only basis for exit. A good exit exam covers materials and processes which students have addressed in their class. White observed that the greatest potential benefit of an exit test derives less from the test itself than from the incentive it can provide for departmental and interdepartmental faculty discussions of writing and curriculum.

Institutions desiring more information on the WPA consultant-evaluator service should write to Professor Tori Haring-Smith at the following address:

Rose Writing Fellows Program, Box 1962, Brown University, Providence, RI 02912.

DEVELOPING AND EVALUATING A WRITING ASSESSMENT PROGRAM

Speakers: Lorenz Boehm, Oakton Community College, Illinois
Mary Ann McKeever, Oakton Community College, Illinois

Introducer/Recorder: Marion Larson, Bethel College, Minnesota

Lorenz Boehm and Mary Ann McKeever addressed issues of designing, implementing, and evaluating an essay test currently being used by three Chicago-area community colleges. This test is designed both to place students in appropriate composition courses and to determine if students in developmental or ESL composition courses are prepared to move on to Freshman Composition.

Although the test has been used since 1984, preparations for its implementation began in 1982, and evaluation and refinement of test questions and procedures is ongoing. This test replaced an objective test of grammar and usage that was being used at the time. During the planning process, prompts were developed and pilot-tested, evaluation criteria were discussed, and reader training methods were developed. In addition, those developing the test sought to gain campus-wide support and involvement from faculty, staff, and administration.

In the test, students are given two argumentative topics from which to choose. With each topic, they are given a context for writing and an audience for whom they are told to formulate an essay arguing their position. They are given 50 minutes to plan and write their essay. Efforts are made to be fair to ESL students: topics are as "culture free" as possible, prompts are worded simply, ESL (and Learning Disabled) students are given an additional 20 minutes to write their essay, and specially trained readers evaluate ESL and LD responses.

Each essay is holistically scored on a 6-point scale by three readers, two of whom must agree in their assessment. In cases of disagreement, an additional reader may be used, and an appeals procedure is available to students. These readers come from across the college, and all of them participate in frequent, extensive training to be sure that they understand and agree upon criteria for the essays they will be asked to evaluate. In training, as well as before actual evaluation sessions, agreement among readers is reached by examining, rating, and then discussing sample essays; discussing criteria for scoring; and then rating more sample essays.
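Stated procedurally, the Oakton decision rule is simple. The sketch below illustrates only the "two of three readers must agree" step; exact score agreement is assumed, since the abstract does not define agreement more precisely.

    from collections import Counter

    def initial_decision(scores):
        # scores: holistic scores from three readers on the 6-point scale.
        score, count = Counter(scores).most_common(1)[0]
        if count >= 2:
            return score  # two (or all three) readers agree
        return None       # no agreement: add a reader or route to appeals

    print(initial_decision([4, 4, 3]))  # 4
    print(initial_decision([2, 3, 5]))  # None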

Many benefits have come from Oakton's use of this writing placement test. Primary among them is the greatly increased dialogue among faculty, administrators, staff, local high schools, parents, and students about writing. Such cooperation is essential to the test's success, because it has helped short-circuit potential disagreement and has made members of the college community more receptive to what the composition faculty are trying to accomplish. It has also greatly fueled writing across the curriculum efforts on campus.

This test is continually being evaluated by Boehm, McKeever, and their colleagues to ensure that it is placing students appropriately, that the different prompts are eliciting responses of comparable quality, and that agreement among readers is high. The results thus far are quite positive: composition teachers are very satisfied that students are being placed in the courses they need. Pilot-testing prompts in composition classes and then carefully monitoring the ratings given to essays written in response to these prompts has helped ensure that different versions of the test are comparable; and evaluation criteria are kept consistent by frequent, ongoing training of essay raters.


THE CHANGING TASK: TRACKING GROWTH OVER TIME

Speaker: Catharine Lucas, San Francisco State University

Introducer/Recorder: Hildy Miller, University of Minnesota

Catharine Lucas explained that traditional writing assessment is designed to determine whether student writing improves on a given specified task, whereas what we need is a new kind of assessment that focuses on how students change the task as they grow as writers. She noted that we know that as writers develop, they formulate new structures to represent tasks, and that they may be awkward in their initial attempts at working with new structures. For example, writers may experiment with complex argumentative structures, abandoning the simpler narrative structures at which they may be more skilled. Ideally, writing assessment should recognize and reward their attempts at more sophisticated formulations, even when performance falls short, rather than constraining the writing task in a way that only measures their ability at what Moffett calls "crafting to given forms."

To debunk the myth that writing is a unitary measurable construct and to show instead the impact of a student's maturing task representation, she provided samples of one student's writing that were submitted in response to a longitudinal portfolio assessment of his writing abilities from ninth to twelfth grade. During each of the four years, the student was asked to produce an essay as part of a school-wide assessment program. Four readers then rank-ordered the four papers to determine the writer's best and weakest work. While we would assume that his ninth grade essay would be weakest and the twelfth grade version the best, instead a different pattern emerged: raters consistently rated the twelfth grade effort the worst.

The reason for this surprising result was found through closer inspection of the writer's choices in task representation. In the three papers he submitted in grades 9, 10, and 11, the writer used the narrative form, a structure that develops comparatively early, since 6th graders are typically sophisticated story tellers. These essays were successful, in part, because he was using a familiar form. However, in the 12th grade essay he chose to represent the task with an argumentative form, usually a later developing skill, and one in which he was as yet inexperienced.

Thus, Lucas concluded, we need a way to take a writer's growth into account in assessment. Writers experimenting with new structures face a harder task, one which is likely to cause the writer initially to produce new errors. Evaluators of writing, like judges of figure skating, diving, and other "performance sports," need to develop systematic ways of taking into account the difficulty level of what the performer is attempting. In order to account for changes in what is attempted we need to study how writers develop both across and within discourse domains. This will require a common language for identifying domains and a way of charting what carries over and what changes when writers move from one to another. All discourse theorists polarize fictional and non-fictional writing, or as Britton terms it, poetic and transactional writing. As a result, we tend to assume that the two are mutually exclusive: fiction writers rarely include essays in fiction and in academia we rarely allow poetic expression. In addition to these polar ends of the discourse continuum, Lucas posits a middle category, which draws freely on both fictional and academic styles, and includes autobiography, belles lettres, the New Journalism, and the personal reflection essay widely used in classrooms and school assessments. While it is relatively easy to chart a writer's development within either the literary or the discursive domains, growth in this middle domain is sometimes marked by shifts from fictional techniques to extended abstract discourse, as in the case presented. Whether students are moving within the mixed domain, or from the literary end of the spectrum to the discursive end, even when teachers recognize the second piece as representing a later effort, they recognize that the text is often less successful in what it attempts than the earlier piece. This difference diminishes, of course, as the student gains skill in handling discursive, transactional writing.

To make possible more careful comparisons of what changes as students move within and across domains, Lucas has developed a method of defining tasks that draws on work done by Freedman and Pringle ("Why Children Can't Write Arguments") based on Vygotsky's (Thought and Language) distinctions between focal, associational and hierarchical arrangements, as well as on Coe's (Toward a Grammar of Passages) method of charting relations between propositions in a text. Lucas's system distinguishes between four text patterns: (1) the chronological core, in which the student tells a story, providing commentary at the end--a sign the writer is moving toward abstraction; (2) the focal core, in which the title provides the subject of focus, with each sentence relating to it--a sign that some notion of related ideas is emerging; (3) the associational core, in which we see chains of associations forming, often with a closing commentary; and (4) the hierarchical core, in which long-distance logical ties supplement short-range connections between complexly interrelated ideas, in a pattern typical of advanced exposition in Western cultures. Using this system, we may begin to see how writers build new schema within these different domains, and begin to reward them for these promising signs of growth in our assessments of their writing abilities.


ASSESSING WRITING TO TEACH WRITING

Speaker: Vicki Spandel, Northwest Regional Education Laboratory

Introducer/Recorder: Alice Moorhead, Hamline University

Rarely are the lessons learned from large-scale writing assessment translated into terms that make them relevant for and useful to the classroom teacher. Yet many of those lessons show how teachers can use systematic writing assessment--especially when teaching writing as a process. Large scale, district-wide writing assessment is a costly process (at least 2.5 days for training/assessing and between $2-$8 a writing sample); however, as part of professional development programs, most districts could justify the necessary time and budget.

In this presentation, Vicki Spandel discussed her efforts, along with those of Richard Stiggins, to link writing assessment and instruction through their work in the Portland area for Northwest Regional Education Laboratory. Spandel's current assessment method focuses on using an analytic rating guide. She argues that although it is difficult to separate form from content in assessment, one can assess the features of writing, thus her interest in an analytic guide that can be used holistically to assess and to teach writing. Since teachers are often afraid of assessment, using the rating guide can ensure that what teachers value gets assessed and then gets translated into practice.

As an assessment tool, Spandel's analytic rating guide was generated from writing samples rather than developed as a guide to impose upon writing. The guide captures a more complete profile of the writing samples when used along with holistic assessment. It distinguishes six features of writing: ideas and content; organization; voice; word choice; sentence structure; writing conventions. Each feature is described and ranked by degrees for a score of 5 or 3 or 1. Not only does this analytic rating guide objectify expectations for writing but it also offers a more defensible version of the subjective process of writing assessment.
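As a rough illustration of the guide's structure, the six feature names and the 5/3/1 score points from the abstract could be represented as the data structure below; the validation logic and everything else in the sketch are assumptions, not Spandel's rubric language.

    # Illustrative data structure only; the six feature names and 5/3/1 score
    # points follow the abstract, everything else is invented for the sketch.
    FEATURES = [
        "ideas and content",
        "organization",
        "voice",
        "word choice",
        "sentence structure",
        "writing conventions",
    ]
    SCORE_POINTS = (5, 3, 1)

    def analytic_profile(ratings):
        """Check a set of analytic ratings and return them in feature order."""
        for feature, score in ratings.items():
            if feature not in FEATURES:
                raise ValueError(f"unknown feature: {feature}")
            if score not in SCORE_POINTS:
                raise ValueError(f"{feature}: score must be one of {SCORE_POINTS}")
        return {feature: ratings[feature] for feature in FEATURES}

    sample = {feature: 3 for feature in FEATURES}
    sample["voice"] = 5
    print(analytic_profile(sample))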

Using this guide with the holistic assessment process, particularly as an in-service workshop for professional development, has two key advantages:

(1) The assessment process promotes "real" agreement among teachers and professional raters about strengths and weaknesses in writing.

(2) Teachers can re-enter the classroom to teach writing with explicit attention to what "counts" in writing, knowing that this instruction is in concert with and reinforced by others.



Not only can teachers use the analytic guide but so can students. In peer review groups, students can focus their writing efforts more directly by using the six-feature guide as "revision stations" to visit for specific feedback on their writing. In Spandel's experience, teachers welcome the use of this analytic guide for assessment and for teaching writing. Many teachers claim: "I'll never teach or think of writing in quite the same way."

READER-RESPONSE CRITICISM AS A MODEL FOR HOLISTIC EVALUATION

Speaker: Karl Schnapp, Miami University

Introducer/Recorder: Ann Hill Duin, University of Minnesota

Karl Schnapp's session focused on the application of reader-response theory to large and small scale holistic assessment. Schnapp began by citing the work of Stanley Fish, David Bleich, and Norman Holland as working models for the holistic evaluation of student writing. He then said that his own work is also based on Edward White's theories of composition as a socializing and individualizing discipline. From these theorists, Schnapp concluded that the best composition pedagogy views students' writing from both social and individual perspectives. In short, the interpretation and evaluation of writing depends on qualities of the community in which the writing was created and was evaluated.

Schnapp then described his specific project. His model is based on three reading theories that lead to a model for the holistic evaluation of writing. The first theory is the "top-down" model of reading as discussed by Holland and Bleich, the second is the "text-reader interaction" theory (from information-processing theory) as discussed by Rosenblatt, and the third is the "communal association" theory as discussed by Fish. Schnapp described his model in detail. Then he asked conferees to fill out a survey identical to that used in his study. The survey asked us to complete questions regarding our perceptions and understanding of composition/language arts. Next we read an essay written by a freshman student and rated the student essay. Finally, we completed a second survey in which we gave information on the criteria we employ when holistically evaluating student writing. As with Schnapp's results, we had about 75% agreement in terms of the common goals of the composition instructors present. Schnapp stated that his research shows that writing teachers see writing as helping students on more of a practical level than on an aesthetic level.

The remainder of the presentation was a discussion between Schnapp and the conferees. Key points that emerged included: the need to ask readers about what influences them as they evaluate papers; the need to determine the evaluative standards for one's discourse community; and the extent to which readers are influenced by what they are thinking about while evaluating writing.

THE DISCOURSE OF SELF-ASSESSMENT: ANALYZING METAPHORICAL STORIES

Speakers: Barbara Tomlinson, University of California, San Diego
Peter Mortensen, University of California, San Diego

Introducer/Recorder: Anne O'Meara, University of Minnesota

Barbara Tomlinson and Peter Mortensen gave conferees attending this session an opportunity to become students of their own writing processes. Much of the session was devoted to composing, sharing, and analyzing our own metaphorical stories about how we write. Tomlinson and Mortensen feel that using metaphorical stories in the classroom provides a means for students to take responsibility for their own writing, to balance personal with external assessment, and to center attention on the writing process rather than the product.

Tomlinson began by sharing some of her own metaphors for writing as well as some of those she found in her study of over 2000 professional writers. Handouts gave further examples from both professional and student writers. The metaphors were sometimes relevant to the process of writing as a whole and sometimes symbols focusing on one aspect of writing. They ranged from clear analogies (e.g. building, giving birth, cooking, mining, gardening, hunting, getting the last bit of toothpaste) to metaphors that needed elaboration like a "gusset" (a small, irregular piece of material necessary for the construction of a garment, but hidden) and the "lost wax process" (a way of making a mold which then melts away when the product is finished). Tomlinson stressed that metaphors can reassure and guide her through composing problems as well as help her describe these problems.

The speakers then simulated their technique for using metaphorical stories in the classroom. As the participants began to compose their own metaphorical stories, Peter Mortensen asked some guiding questions to get us started, encouraging us to think of metaphors we might use for beginning writing, finishing writing, writing under pressure, writing badly, writing well, generating ideas, and so on. He suggested students could also use the guiding questions (distributed on the handouts) in interviews or in collaboration to get started.

In the discussion that followed, Tomlinson and Mortensen stressed that metaphors should be accepted and explored, rather than judged. They may be original, adopted, or enforced; they may be idiosyncratic, contradictory, or even strike us as "bad." The important thing is that we and our students look at what the effects of writing metaphors are, what they imply about writing, and how they match or might amplify our experience. When they have students compare their metaphors to those of professional writers, Tomlinson and Mortensen minimize possible intimidation by emphasizing that the purpose is to find similarities and common problems.

Finally, the speakers summarized their reasons for using metaphorical stories in the classroom. In addition to taking authority for their own writing and balancing personal with external assessment, students also need to develop better self-monitoring processes because many do not have a language for thinking about their processes. (Tomlinson's survey of 23 secondary and college writing texts showed that there was very little figurative language in these texts.) The speakers have found that by comparing metaphorical stories, students can gain confidence and learn that other writers (including professionals) may encounter similar problems. Students begin to talk like writers and develop a stronger interest in writing.

THE USES OF COMPUTERS IN THE ANALYSIS AND ASSESSMENT OF WRITING

Speakers: William Wresch, University of Wisconsin-Stevens Point
Helen Schwartz, Carnegie-Mellon University

Introducer/Recorder: Marie Jean Lederman, NTNW and Baruch College, CUNY

William Wresch discussed the current state of the field of computer analysis of student writing, dividing the software programs into six different categories, each of which has a different pedagogical orientation. The first category is grammar checkers. These programs focus on homonym confusions, sexist language, usage errors, and infelicitous phrases. Some examples are Writer's Helper (Conduit), Sensible Grammar (Sensible Software), RightWriter (RightSoft), Ghost Writer (MECC), and Writer's Workbench (AT&T).

The second category is reformatters which, rather than find errors, make it easier for writers to find their own errors. One of the first programs was Quill (D.C. Heath), which included a combination of prewriting, writing, and revising activities. For example, to help students revise their work, it displayed each sentence of their paper alone on the screen. Rather than make statements about or changes in the sentence, the program allowed students to look at each sentence in a new way. Other newer reformatters include Ghost Writer (MECC) and Writer's Helper (Conduit). The third category of programs is audience awareness programs. These programs include readability formulas and they pinpoint vague references and other problems.

The fourth category is student conference utilities. These computer programs try to help students develop editing skills as they read each other's papers and "send" comments to each other. Two examples are Quill and Alaska Writer (Yukon-Koyukuk School District). The fifth category is grading utilities, programs designed to help teachers in the clerical aspects of paper grading. Students turn in their work on disks, and the teacher uses the computer to help grade the work. By creating ten or twelve messages for major errors, teachers can respond with just a keystroke or two to most of the mistakes they are likely to see. Examples are the RSVP project (Miami-Dade Community College) and Writer's Network (Ideal Learning).

The last category is automatic graders. This is the logical "next step" after grading utilities. Ellis Page of the University of Wisconsin proved twenty years ago that a computer could grade papers quite well based on a formula of paper length, sentence length, level of subordination, and word length. However, merely assigning a grade isn't enough in a classroom situation in which students expect not only a grade but a range of responses from teachers. It might be possible, however, to use such computer graders in large-scale assessment programs. Wresch concluded that there are many decisions to be made about how computers will be used in writing analysis, but it is certain that there are already many opportunities and, surely, many more to come.
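A toy Python sketch of the kind of surface-feature grader Wresch describes is given below. It is not Ellis Page's actual model: the features are limited to essay length, mean sentence length, and mean word length, and the weights are invented stand-ins for coefficients that would normally be fit by regression against human holistic scores.

    # Toy surface-feature grader; weights and intercept are invented for
    # illustration, not taken from Page's research.
    import re

    def surface_features(text):
        words = re.findall(r"[A-Za-z']+", text)
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        return {
            "n_words": len(words),
            "mean_sentence_len": len(words) / max(len(sentences), 1),
            "mean_word_len": sum(len(w) for w in words) / max(len(words), 1),
        }

    WEIGHTS = {"n_words": 0.004, "mean_sentence_len": 0.05, "mean_word_len": 0.4}
    INTERCEPT = 1.0

    def predicted_score(text):
        feats = surface_features(text)
        return INTERCEPT + sum(WEIGHTS[name] * value for name, value in feats.items())

    essay = "Writing assessment raises hard questions. Computers can count, but they cannot read."
    print(round(predicted_score(essay), 2))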

Helen Schwartz began by discussing several purposes of assessment: diagnosis and revision as well as improved self-evaluation. The range of writing behaviors which can be assessed are ideas, organization, rhetorical presentation (purpose and audience assessment) and grammatical correctness. In answer to the question, "How can computer programs assess these behaviors for these purposes?" she first gave a short answer, "No computer program alone is now accurate or helpful enough," and most of the existing programs may overwhelm the student with too much information at once. Style checkers can draw attention to problems, but the student must make the decisions. And sometimes readability formulas can lead students to vary sentence length by creating run-on sentences and fragments. Schwartz pointed out that "Computer programs are useful as delivery systems for teacher, peer and self-assessment. They help students become aware of problems in their writing and help them to solve these problems." She gave four examples:

1) Prewriting programs such as "ORGANIZE" (Helen Schwartz, Wadsworth Publishing) can be used not only to help students see the shape of their papers but also to desensitize peer review.



2) Templates, such as the self-evaluation form given in "Interactive Writing," help students assess strengths and weaknesses.

3) "SEEN" (Schwartz, anduit) in-Andes a built-inbulletin board where peer review can take place.

4) Programs for teacher and peer response to paper drafts, including (a) "Comments," developed by Christine Neuwirth at Carnegie Mellon, which facilitates discussion and peer review; (b) "PROSE" (Prompted Revision of Student Essays by Davis, Kaplan, Martin; McGraw-Hill), which allows summary comments, comments embedded in the paper, revision notes, and handbook-like responses with an overview of the error, further explanation, and then interactive tutorials on each of 18 features; and (c) "Prentice Hall College Writer," a word processor that allows access to an on-line handbook and allows the insertion of comments that can include excerpts from the on-line handbook.

The discussion that followed centered on examples of software described and demonstrated by the speakers.

LEGAL RAMIFICATIONS OF WRITING ASSESSMENT

Speaker: William Lutz, Rutgers University, Camden

Introducer/Recorder: Chris Anson, University of Minnesota

William Lutz, who holds a law degree and is a member of the Pennsylvania Bar, addressed the importance of considering the legal constraints under which testing must operate. Lutz began by distinguishing the different kinds of testing programs: those conducted within an institution and those conducted outside the institution. External testing programs, such as those conducted by a school district or by a state agency, are governed by a series of laws and court decisions. Internal testing programs, such as course placement and proficiency testing, come under fewer legal constraints and exist, at present, in a legal nether world. However, there is enough legal precedent to warrant caution by anyone involved in any testing program.

According to Lutz, testing programs may be attacked from a variety of legal approaches. Title VI of the Civil Rights Act prohibits any practice that would have the effect of restricting an individual, on the grounds of race, color, or national origin, "in the enjoyment of any advantage or privilege enjoyed by others receiving any service, financial aid, or other benefit." It is important to note that this law would judge a testing program by its effect, not its purpose. Moreover, the burden of proof in any legal action would fall on those conducting the test. Thus, under this law, testing programs with disproportionate effects on minority students are subject to close judicial scrutiny. If a state has a law guaranteeing an education to all its citizens, then all citizens have a property interest in an education. A testing program in that state can be attacked as a denial of a property right without due process. Such attacks have succeeded.

Lutz pointed out that a testing program can be attacked as a denial of a liberty interest. Due process guarantees a right to liberty, and this liberty interest is infringed where a stigma attaches to the student as a result of the test. The 14th Amendment to the Constitution states that "No person shall . . . deny to any person within its jurisdiction the equal protection of the laws." While state laws may, for various purposes, treat persons differently by classification, persons "who are similarly situated with respect to the purpose of the law" must be accorded equal treatment. In hearing cases brought under this Amendment, the court will ask two questions: (1) has the state acted with an unconstitutional purpose? (2) has the state classified together all and only those persons who are similarly situated? For example, if someone wanted to attack a placement test, there are two possible arguments under the 14th Amendment which might be used. First, the test itself can be attacked by arguing that while testing may be a legitimate means of classification, this particular test is so inadequate that one cannot possibly tell whether a particular student is ready for or has the ability to do college level work. A second approach is to attack the test's results by arguing that while the means used to classify a student may be legitimate, these means are so imprecise that one cannot possibly tell whether the student has been classified correctly.

There are some vague areas here, or the legal nether world as Lutz calls it. Before the due process requirements of the 14th Amendment can apply to a cause of action, two questions must be answered: (1) do the concepts of liberty or property encompass the asserted interest? and (2) if due process does apply, what formal procedures does due process require to protect the interest adequately? In other words, an individual must have a legitimate claim of interest before due process can apply. Thus far, a college education has not yet been found to be a benefit for which someone can assert a claim of entitlement. However, a claim of liberty could apply because testing may affect an individual's opportunity to choose his or her own employment. This issue is still open for litigation.

Based upon a review of federal court decisions, Lutz offered the following Guidelines for Testing:

1. The purpose of the test must be clearly delineated. The test must be matched with specific skills and/or specific curriculum objectives.

2. Mere correlation between the test and the curriculum is not sufficient. There must be evidence, obtained from a regular process, that classroom activities are related to curriculum goals and test specifications.

3. All test items must be carefully developed and evaluated to ensure that they conform to curriculum and instructional practices. Moreover, there must be evidence that any bias related to racial, ethnic, or national origin minority status has been eliminated.

4. If possible, other measures of performance and ability should be used in conjunction with test results.

5. Cut-off scores should be the result of a well-documented process of deliberation that conforms to state and federal statutory requirements. There should be no suggestions of arbitrariness or capriciousness in setting cut-off scores.

6. Students should be informed well in advance of what it is they need to know to perform well on the test. Students should also be informed in advance as to the nature of the test.

7. Options should be available for those students who fail the test. These should include, at the very least, the option to re-take the test, and institutional help to prepare and/or correct deficiencies.

8. Students should have access to their test scores and a full explanation of those scores.

Finally, Lutz suggested that anyone conducting a testing program should do the following immediately:

1. Conduct a full, impartial review of the testing program, and document this review.

2. Examine all the documentation in the program, and write any necessary additional documentation.

3. Correct all the deficiencies identified in the program, and then document the process by which the deficiencies were identified and corrected.

4. Institute two procedures as a permanent part of the testing program:


(1) a formal process for administering and conducting the testing program, including full documentation;

(2) a formal review of the program conducted at regular intervals by an outside, impartial, objective reviewer.

Lutz concluded by saying that we live in a litigious age, and prudence suggests that those involved in testing be professional, institute the guidelines, and take the steps he outlined in his talk.

SOME NOT SO RANDOM THOUGHTS ON THE ASSESSMENT OF WRITING

Speaker: Alan C. Purves, The State University of New York, Albany

As I near the end of a seven-year long comparative study of student performance in Written Composition sponsored by the International Association for the Evaluation of Educational Achievement, I should like to set forth some conclusions I have reached about writing assessment.

1. Written Composition is an ill-defined domain. There have been a few recent efforts at mapping the domain through an examination of writing tasks and through an examination of perceived criteria, but in general these have been ignored in most assessments of student performance. Most assessments tend to rely on a single assignment selected at random.

2. Written composition is a domain in which products are clearly the most important manifestation; the texts that students produce form the basis for judgments concerning those students. Teachers and assessors know that and so do students.

3. These products are culturally embedded, and written composition is a culturally embedded activity. The culture may be fairly broad or it may be relatively narrow, such as the culture of a Lee Odell or an Andrea Lunsford, but students inhabit those cultures and produce compositions that reflect them.

4. When a student writes something in a large scale assessment in the United States, what is usually written is a first draft on an unknown assignment that is then rated by a group of people who make a judgment as to its quality. The result is an index of "PDQ," Perceived Drafting Quality. Whether PDQ has any relation to writing performance or ability is unclear, although it is probably a fair index.

5. Given the fact that what is assessed is PDQ, it is little wonder that students see writing performance as comprising adequacy of content, handwriting, spelling, grammar, and neatness. Such is the case of the reports of secondary school students as to the most important features of the textual products of a school culture.

NATIONAL AND INTERNATIONAL WRITING ASSESSMENT: RESEARCH ISSUES

Speakers: Alan C. Purves, State University of New York at Albany
Thomas Gorman, National Foundation for Educational Research, Great Britain
Rainer Lehmann, Institute for Educational Research, Federal Republic of Germany

Introducer/Recorder: Wayne Fenner, University of Minnesota

This session was the first of several sessions on research on international writing assessment. Alan Purves began with an overview of the background of the fourteen-nation Written Composition Study. Begun in 1980, this project is the most recent undertaken by the International Association for the Evaluation of Educational Achievement (IEA). Previous studies have examined the teaching and testing of science, math, reading, foreign language, and civic education. Unlike earlier subjects, the domain of written composition is a cloudy one: it is both an act of communication and an act of cognitive processing. Researchers, then, had first to define this domain, both empirically and theoretically. After this phase of domain specification, researchers designed a series of specific writing tasks and writing purposes to be included in the study. Third, a five-point scoring scheme was devised that would be valid and reliable across languages and cultures. Finally, raters were chosen and trained.

Thomas Gorman discussed the results from a recent writing assessment program in England in order to clarify what can be learned from international studies and cannot be learned from separate, national writing assessment projects. The problem of domain specification seems to be culturally relative. The purpose of writing varies in its relation to general educational aims, and specific tasks may or may not reflect the kind of writing that is generally required of students in specific schools in particular cultures. There is, however, remarkable unanimity of assessment criteria and standards of performance across languages and cultures. Content, for example, as well as form, style, and tone appear to be rating factors utilized internationally. As a result of the IEA Study, we have learned more about the relative difficulty of various writing tasks, and we have gathered a great deal of information about background variables relative to writing performance. These variables include students' interest and involvement in life at school, plans for future education, amount of daily and weekly homework, and involvement of parents in the educational process.

Rainer Lehmann discussed the methodology of comparative writing assessment, specifically the application of multitrait-multimethod analysis to the problem of validating the analytical scoring scheme used by all countries in the IEA Study. Although his discussion was limited to results from the Hamburg data, Lehmann provided information from a non-English language context that appeared to confirm the IEA study's methods and findings.

TEACHING STRATEGIES AND RATING CRITERIA: AN INTERNATIONAL PERSPECTIVE

Speakers: Sauli Takala, University of Jyväskylä, Finland
R. Elaine Degenhart, University of Jyväskylä, Finland

Introducer/Recorder: Robin Murie, University of Minnesota

This session reported on data gathered in the IEA (International Association for the Evaluation of Educational Achievement) study of Written Composition. The IEA study, now in its eighth year, is a large-scale examination of student writing in 14 countries (Chile, England, Finland, Hungary, Indonesia, Italy, the Netherlands, Nigeria, New Zealand, Sweden, Thailand, the USA, Wales, W. Germany). An internationally developed scoring system was used to rate the writing tasks in terms of organization, content, style, tone, mechanics, and handwriting. In addition, students, teachers, and schools filled out questionnaires. These data are now being examined in a number of ways.

Sauli Takala, one of the coordinators of this study, described patterns of agreement and disagreement among raters' application of a five-point rating scale (which included the criterion "off the topic"). He found that raters behaved in a uniform manner. Most of the time, two readers were within one point of being in full agreement with each other. Beyond a one-point discrepancy on the rating scale, there was a significant drop in frequency (2 points off: 5-12%; "off the topic": 2.5-7.5%; 3 points off: 2.5-5%). He then discussed where on the scale these discrepancies were occurring. Agreement was greatest at the high end of the scale and least likely in the low-middle range of scores.
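The kind of discrepancy tally Takala reports can be expressed in a few lines of Python; the paired ratings below are invented for illustration, and the percentages printed are not the IEA figures.

    # Sketch of a rater-discrepancy tally; the rating pairs are made up.
    from collections import Counter

    def discrepancy_distribution(pairs):
        """pairs: iterable of (rater1, rater2) scores given to the same essays."""
        diffs = Counter(abs(a - b) for a, b in pairs)
        total = sum(diffs.values())
        return {diff: count / total for diff, count in sorted(diffs.items())}

    ratings = [(3, 3), (4, 3), (2, 2), (5, 4), (1, 3), (4, 4), (2, 3), (5, 5)]
    for diff, share in discrepancy_distribution(ratings).items():
        print(f"{diff} point(s) apart: {share:.0%}")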

Takala then discussed where the rating of "off topic" appeared. In early discussions with colleagues, it was anticipated that this rating would pair up with ratings at the high end of the scale (an essay would be so creative as to elicit either "very good" or "off topic"). In fact, just the opposite was true: "off topic" appeared at the low end of the scale with "poor." Surprisingly, it also occurred in the middle range. Takala noted that perhaps some raters were unsure of how to score such essays and so chose a middle ground. In general, similarities between raters outweighed differences, lending credibility to further comparisons.

Elaine Degenhart, another coordinator of the IEA Study of Written Composition, looked at relationships between writing instruction and student performance, using data from the teacher questionnaires and from questionnaires on the background and curriculum of the schools involved in the IEA study. The purpose of her work was to identify some patterns in instructional approaches and to determine how well the variables that show these approaches discriminate between low, middle, and high achieving classes. The four main approaches that emerged were product, process, reading-literature, and a less well defined skills-oriented approach with emphasis on product. Based on mean scores on the writing tasks, classes were divided into achievement levels: 25% high, 50% middle, 25% low. The top two instructional approaches for each country were then examined in terms of how well they discriminate for the three levels of classes. Degenhart reported on findings from four of the countries: Chile, Finland, New Zealand, and the U.S.
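The 25/50/25 split described above amounts to ranking classes by their mean score on the writing tasks and cutting at the quartiles; the short Python sketch below shows one way to do it, with invented class means (the IEA study's actual grouping procedure may differ in detail).

    # Hypothetical quartile split of classes by mean writing score.
    def split_by_achievement(class_means):
        """class_means: dict mapping class id -> mean score on the writing tasks."""
        ranked = sorted(class_means, key=class_means.get)
        quarter = len(ranked) // 4
        return {
            "low": ranked[:quarter],                          # bottom 25%
            "middle": ranked[quarter:len(ranked) - quarter],  # middle 50%
            "high": ranked[len(ranked) - quarter:],           # top 25%
        }

    means = {"A": 2.1, "B": 3.4, "C": 2.8, "D": 3.9, "E": 3.0, "F": 2.5, "G": 3.6, "H": 3.2}
    print(split_by_achievement(means))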

The top two teaching strategies found for Chile were (1) a strongly student-centered approach with a process orientation and (2) a stronger product orientation. Here it appeared that low-achieving students had more process-centered teaching, whereas the product-centered approach distinguished well for the middle group. In Finland, the top two teaching strategies were (1) a reading-literature approach and (2) a process approach. The process approach did not distinguish between the top and bottom groups; the reading-literature approach was positive for low-achieving students. In New Zealand, the top two were (1) a teacher-centered reading/literature approach and (2) a less clearly defined approach leaning toward process. Both discriminated between all three levels. In the United States, the top two approaches were (1) a structured reading/literature approach and (2) a strong student-centered product orientation. The product orientation was high for the low-level students.

Questions centered around possible interpretations of these findings. Degenhart was careful not to draw premature conclusions or make quick generalizations. From the discussion it became clear that a greater understanding of the background situation in each country would help with the interpretation of why classes were receiving a particular type of writing instruction.

EFFECTS OF ESSAY TOPIC VARIATION ON STUDENT WRITING

Speakers: Gordon Brossell, Florida State University
Jim Hoetker, Florida State University

Introducer/Recorder: Laura Brady, George Mason University, VA

Gordon Brossell and Jim Hoetker presented the results of a study designed to analyze the ways in which systematic variations in essay topics affected the writing of college students under controlled conditions. To explore the question of whether a change in topic makes a difference in the quality of student response, Brossell and Hoetker chose extremes of topic and student population. The population consisted of remedial students and honors students writing in response to a regular course assignment. The year-long study (May 1987-April 1988) was based on 557 essays collected from four Florida sites: the University of Florida, Miami-Dade Community College, Valencia Community College, and Tallahassee Community College.

The general essay topic for this project, "The most harmful educational experience," was written according to procedures developed by Brossell and Hoetker in their previous research on content-fair essay examination topics for large scale writing assessments (CCC, October 1986). Brossell and Hoetker then varied this topic in two ways: (1) they controlled the degree of rhetorical specification and (2) they changed the wording to invite subjective and objective responses. These variations yielded four versions of the topic:

Minimal rhetorical specification requesting an impersonal discussion

Minimal rhetorical specification requesting a report of personal experience

Full rhetorical specification requesting an impersonal account

Full rhetorical specification requesting a report of personal experience

The essays written in response to these topic variations were scored holistically on a 7-point scale by experienced graders; the scale included operational descriptions for four levels of quality (1, 3, 5, 7) and left the other three levels (2, 4, 6) unspecified in order to give the raters greater flexibility. The essays were also scored analytically according to ten items in three categories: (1) development, (2) voice/speaker/persona, and (3) readability.

Although the original plan had been to gather samples from extreme student populations (high- and low-ability), differences between institutions in the average quality of student writing were noticeable: many "low-ability" students wrote as well as or better than students ranked as "high-ability." As a result, the sample fell into a bell-curve distribution. The researchers concluded that there is no evidence from either the holistic-scale scores or the analytic-scale scores that even gross variations in phrasing affect either the quality of student responses or the nature of student-topic interaction. Other conclusions: the appearance of first-person voice is significantly higher in essays written in response to topics calling for accounts of personal experience, but it is unaffected by the degree of rhetorical specification.

In a discussion following the presentation of the research, Brossell and Hoetker mentioned plans for future work that include a study to evaluate the effect of content variation in essay topics when wording and rhetorical specification are held constant. They also plan to develop their analytic scale further, based on additional essays written at greater leisure and revised, and representing average and high-ability students as well as low-ability students. With revision and development to make the scale reliable and "transportable," the analytic scale might, according to Brossell and Hoetker, have the potential to become an alternative to the single-digit holistic score.

WHAT SHOULD BE A TOPIC?

Speakers: Sandra Murphy, San Francisco State University
Leo Ruth, University of California, Berkeley

Introducer/Recorder: Robert L. Brown, Jr., University of Minnesota

Taking a cue from the Bay Area Writing Project's collective spirit, Sandra Murphy and Leo Ruth rejected the usual panel format by opening the session to audience discussion of issues influencing subject-selection for holistic scoring. They directed the session with six questions (treated at greater length in their recent Ablex book, Designing Writing Tasks for the Assessment of Writing). Their questions examined the dual problem facing assessment designers: naming a subject and providing the writers with instructions about what to do with it. In part, the session provided a forum for a critique of both the entire agenda of holistic scoring and of the specifics of assessment design. But it also allowed Murphy and Ruth a format in which to report some of the findings from their work.

The six questions treat variously the syntactico-semantic structure of the items, the discourse structures suggested, the power relationships established between tester and writer, and the cultural knowledge presupposed. The six questions and comments from the presenters and audience are as follows:

1. How much information should be provided about the subject?

Murphy and Ruth's findings suggest that a simple referring phrase (NP) elicited less rich responses than a full proposition. When a predicate was provided, writer responses were more "reasonable and responsible."

2. How does specification of a subject constrain response?

Discussion demonstrated the range of possible constraints: discourse type, qualification, quantification, text structure, style, and--always--ideology, explicit and implied.

3. How does knowledge of the subject affect performance?

The session members soon raised the meta-question of whether any topic could not require "specialized knowledge," and therefore whether holistic essay testing could be free from political bias. Generally, Murphy and Ruth and the session members agreed that knowing a lot about the topic was a great advantage, and that this knowledge extended well beyond simple propositional knowledge to familiarity with cultural discourses.

4. Should students be given options in selecting topics?

Generally, options invite confusion. Items may not be equally difficult. Students may not be wise in selecting, picking complex topics and writing complex, bad essays. Confusion over the selection process may penalize students.

5. How do rhetorical specifications affect performance?

Students did not seem to be helped by suggestions of rhetorical type. Typically, they ignored them or found that the problem of executing the rhetorical command interfered with their writing in general.



6. To what extent should admonitions about the writing task be mentioned? Time limits, pitfalls, and so on?

Again, the political demands of the writing assessment as an institution overwhelm the testers' attempts to help: students write the essay they have in mind, ignoring the instructions or finding themselves confounded by them.

The session eloquently expressed reservations about the ideology of holistic scoring and mass assessment in general. The conferees reacted to the inherent artificiality of pretending to write authentic prose while authentically demonstrating familiarity with academic conventions. They agreed that students who know the conventions of testing will, predictably, do best.

CLASSROOM RESEARCH AND WRITING ASSESSMENT

Speaker: Myles Meyers, California Federation of Teachers

Introducer/Recorder: Deborah Appleman, Carleton College, Minnesota

Myles Meyers addressed the issue of large scale assessment from the perspectives of the K-12 administrator and classroom teacher. From these perspectives he finds large scale assessment to be problematic and often ill-advised. The enormous diversity of schools makes it difficult to capture the current "state of the art." Meyers also contended that state assessments such as California's CTBS work against teaching as well as against the professionalization of teachers.

Meyers discussed at length the seemingly reductionist quality of large scale assessment. Although recent research on writing maintains that writing is a multiple construct, time and financial constraints limit the constructs that can be examined. The construct that is employed to define writing thus becomes the primary focus for a particular grade (for example, autobiography in grade 10). In our effort to handle the assessment task by limiting constructs, our definition of writing, as well as its instruction, therefore becomes uni-dimensional. Moreover, because of the inevitable prescriptive quality of the interpretation of assessment results as well as teachers' lack of involvement and consequently lack of ownership in the entire assessment process, Meyers claimed that statewide assessments can destroy teaching-as-inquiry and harm student learning.

Meyers then presented several suggestions for involving teachers in the assessment process. He emphasized the importance of having teachers participate significantly through summer institutes at university settings. He also underscored the importance of viewing assessment as a process of inquiry, one in which disagreement is as important as agreement. To illustrate the value of assessment as inquiry, Meyers handed out three sample student papers and asked the audience to rank them as high, middle, and low. The resulting scoring was quite discrepant, as were the reasons offered for the rankings. Meyers then discussed the value of discrepancy in our aim to improve literacy for all children. Rather than considering agreement as the ultimate goal in assessment, discrepancy can lead to a fruitful dialogue about our underlying assumptions about teaching good writing as well as about its evaluation.

Meyers pointed out that dialogues or debates such as those generated by the conferees when they were asked to rank the papers were a critical aspect of the assessment process. He stressed the importance of having classroom teachers as active participants in an on-going debate on assessment, rather than as recipients of an administrative decision to employ a particular large scale assessment instrument. He then handed out six additional student papers, and asked conferees to rank them and then to discuss the rankings in pairs. As with the first exercise, the rankings were widely discrepant. Meyers illustrated how this kind of exercise can be used to encourage teachers to think explicitly about their pedagogy and also described several ways in which the ranking of student writing can be employed to generate discussion among teachers. For example, he has asked teachers to devise sample lessons for students whose papers they have ranked.

Meyers ended his provocative discussion by suggesting several ways in which writing can be viewed as a speech act and as a collaborative social event. He discussed the differences and similarities between conversation and written presentation. Meyers concluded his talk with the following thought: "When you teach people how to write, you teach them a new definition of themselves."


COMPUTERS AND THE TEACHING OF WRITING

Speakers: Michael Ribaudo, The City University of New York
Linda Meeker, Ball State University

Introducer/Recorder: Donald Ross, University of Minnesota

Both speakers discussed the National Project on Computers and College Writing, a three-year project supported by the Fund for the Improvement of Postsecondary Education and The City University of New York. This project is coordinated by three of the NTNW directors: Michael Ribaudo, Harvey Wiener, and Karen Greenberg.

Michael Ribaudo explained the goals of the project: it will (1) identify outstanding college programs that have incorporated computers in freshman-level composition courses, (2) conduct research on the impact of computers on students' writing abilities, (3) develop and disseminate reports on this research and on instructional philosophy and methodology, and (4) host a national conference showcasing the programs and the research.

Ribaudo noted that, at this point in time, fifteen colleges and universities from across the country are involved in the project. They are developing research designs that will pair three "computer" sections and three traditional sections at each site. Some of the research instruments to assess students include essay tests (scored holistically and analytically), multiple-choice tests, and questionnaires on writing anxiety and writing attitudes.

Linda Meeker discussed her university's participation in the project, and summarized the efforts that Ball State has already made in evaluating the effects of computers on the teaching and learning of writing.

She described three of her recent studies. The first study assessed student attitudes toward using computer-assisted instruction (CAI) in basic writing classes. She found that CAI proved effective in terms of students' time management and that basic writing students developed positive attitudes toward CAI. Her second study focused on using "invention" software to assist the composing processes of basic writing students. Results indicated highly positive student attitudes and a noticeable improvement in students' ability to focus on their topics. Meeker's third study examined the revising strategies of basic writing students. This study revealed that students spent significant amounts of time in a variety of prewriting and revising activities, but it was unclear whether the text manipulations were clearly related to a greater flexibility provided by CAI. However, Meeker did find that the computer enabled students to do more frequent and more productive pre-editing.

Next, Meeker described some of the studies that will be conducted by the National Project on Computers and College Writing. She noted that data collected from these large scale assessments will either confirm or call into question the results of her studies. Students' attitudes toward CAI and the effectiveness of word-processing as a tool for inventing, composing, revising and editing will be evaluated. Moreover, each of the project sites will examine the comparative effectiveness of different hardware and software configurations available at their institutions.

For further information on this project, or the conference which is scheduled for Spring 1990, write to Dean Ribaudo at CUNY, 535 East 80th Street, New York, NY 10021.

New works on writing assessment by NTNW members:

THE IEA STUDY OF WRITTEN COMPOSITION: THE INTERNATIONAL WRITING TASKS AND SCORING SCALES
Edited by T.P. Gorman, A.C. Purves, and R.E. Degenhart
Pergamon, Oxford, England, 1988

THE EVALUATION OF COMPOSITION INSTRUCTION, Second Edition
by Barbara Gross Davis, Michael Scriven, and Susan Thomas
Teachers College Press, NY, 1987

DESIGNING WRITING TASKS FOR THE ASSESSMENT OF WRITING
by Leo Ruth and Sandra Murphy
Ablex, Norwood, NJ, 1987



How You May Participate

NTNW needs the active participation of those who have a concern with writing skills assessment, whether as specialists, administrators, or classroom teachers. If you wish to become a member of the network or to learn more about who we are, what we plan to be doing, and how our plans could involve you, just complete the coupon below and return it to us along with materials describing yourself and your professional interests in writing instruction and assessment.

Name

Position

Institution

Address

_ I would like to be on NTNW's mailing list.

_ I would be willing to share information about my writing assessment program with NTNW.

Please return to:
Karen Greenberg, Director
National Testing Network in Writing
Office of Academic Affairs
The City University of New York
535 East 80th Street
New York, New York 10021
(212) 794-5446

