


Wayne J. Camara ITC Bulletin: Test Ethics in the USA

Use and Consequences of Assessments in the USA: Professional, Ethical and Legal Issues

Wayne J. Camara
The College Board, Princeton Junction NJ, USA

Tests and assessments in the USA have taken on additional burdens as their uses have been greatly expanded by educators, employers, and policy makers. Increased demands are often placed on the same assessment, by different constituencies, to serve varied purposes (e.g., instructional reform, student accountability, quality of teaching and instruction). Such trends have raised considerable concerns about the appropriate use of tests and test data, and testing is under increased scrutiny in education, employment and health care. This paper distinguishes among the legal, ethical and professional issues recently emerging from the increased demands of assessments and also identifies unique issues emanating from computer-based modes of test delivery and interpretation. Efforts to improve assessment practices by professional associations and emerging issues concerning the proper use of assessments in education are reviewed. Finally, a methodology for identifying consequences associated with test use and a taxonomy for evaluating the multidimensional consequential outcomes of test use within a setting are proposed.

Keywords: ethics, legal issues, assessment, USA

Individuals are first exposed to tests and assessments very early in their school years in the United States. By the age of 18, assessments have already played a significant role in the life decisions of many young adults such as graduation, promotion and retention, college admissions, placement, and scholarship awards. Tests and assessments are also widely used in the psychological and educational screening of children and adults, for career and vocational assessment, for certification and licensing of individuals for a number of occupations, and for the selection and placement of workers within government and private sector organizations. Given the diverse and important use of tests and assessments, measurement professionals have become increasingly concerned with questions of validity, fairness, intended use(s) and consequences related to the appropriate use of educational and psychological assessments.

The ethical and legal conduct of lawmakers, celebrities, athletes and professionals from all areas (e.g., business, investment, marketing, law) has attracted headlines and the attention of mass media. In the U.S. and many European countries there has been a fixation on such ethical and legal issues involving responsible conduct and obligations to the public in recent years. Attention has also focused on the use of tests and test results in education, employment, and health care settings (Berliner & Biddle, 1995). The real and perceived misuses of assessments and assessment results have become one of the most challenging dilemmas facing measurement professionals and test users today. Abuses have been widely reported in preparing students to take tests and in the use and misuse of data resulting from large-scale testing programs. Elaborate and costly test cheating operations have been disclosed by federal agencies (Educational Testing Service, 1996), test preparation services have allegedly employed confederates to steal large pools of items from computer-based admissions testing programs, instances of students and employees being provided with actual test items before an administration have been reported, as have unauthorized extensions of time limits, falsification of answer sheets and score reports, and violations of the confidentiality of test data (Schmeiser, 1992). Misuses of test data in high stakes programs abound, and the accuracy and marketing tactics of test publishers have been criticized in some cases (Sackett, Burris, & Callahan, 1989; Sackett & Harris, 1984).

Professional conduct and responsibilities in the use of assessments can be ordered within three levels: (1) legal issues, (2) ethical issues, and (3) professional issues. The practices and behaviors within these three levels are certainly interrelated, yet this categorization is useful to initiate a discussion of concerns, the severity of inappropriate practices and behaviors, and examples of assessment practices which are most likely to be questioned by professionals and the public.

European Journal of Psychological Assessment, Vol. 13, Issue 2, pp. 140–152 © 1997 Hogrefe & Huber Publishers

This paper, focusing on the U.S., will first discuss the assessment practices and behaviors which raise professional, ethical and legal concerns. Second, the paper discusses efforts in addressing these concerns and the diversity among individuals using tests and test results. Third, a paradigm is proposed for identifying and evaluating consequences associated with test use. Unique professional issues emerging from the increased reliance on technology and computer-based testing are briefly reviewed. Finally, the variety of such issues directly relating to educational assessments is illustrated to provide a context for discussions of the technical, legal and professional issues involved in assessment.

Legal, Ethical and Professional Issues in Testing and Assessment

It is difficult to define the boundaries of and distinctions between professional, ethical and legal issues or concerns surrounding the development and use of tests and assessments. Legal, ethical, and professional issues form a continuum of standards for professional conduct in assessment and other areas. Laws and government regulations are legal mandates that affect all individuals living in a society. Ethical codes may range from enforceable to exemplary to educational principles that guide the professional behavior and conduct of members of any profession. Professional guidelines, principles and standards are also developed to educate and guide professionals in more technical activities. All three layers of regulations or standards exist in testing and assessment.

Laws and legal documents about testing and assessment are generally vague and ambiguous, but it is clear that where they exist, they have greatly influenced both professional standards of conduct and professional practices in assessment and testing. Government involvement and regulation of testing is most evident in personnel testing. But even here, laws and legal challenges to testing are limited to very specific domains. They address some issues (e.g., discrimination) and applications (e.g., employment testing) of assessment which have received widespread attention, while leaving many more common issues and concerns of test use unaddressed (e.g., validity of the measure when disparate impact is not present). Numerous federal and state laws and executive orders have implications for employment testing, primarily through prescribed standards for equal employment opportunity (Camara, 1996), but also for the assessment of individuals with disabilities, the handling and retention of personnel records, and restrictions on the use of certain pre-employment techniques (e.g., Employee Polygraph Protection Act of 1988). The general consensus among industrial psychologists is that Civil Rights laws, which emanated in the 1960s, have been a major stimulus for improved pre-employment assessment practices. Employers became better educated about the technical, professional, and legal issues involved in the use of testing out of necessity, and while there is some evidence that regulations initially decreased the use of employment testing, today tests are used by a higher proportion of organizations than ever (Deutsch, 1988).

The first formal ethics code for any profession using assessments was adopted by the American Psychological Association (APA) in 1952. Eighteen of the more than 100 ethical principles from this Code (APA, 1953) addressed the use of psychological tests and diagnostic aids, covering the following issues of test use: (1) qualifications of test users (3 principles); (2) responsibilities of the psychologist sponsoring test use (4 principles); (3) responsibilities and qualifications of test publishers' representatives (3 principles); (4) readiness of a test for release (1 principle); (5) the description of tests in manuals and publications (5 principles); and (6) security of testing materials (2 principles).

Codes from the Canadian and British Psychological Associations came later, as did those from other European nations (Lindsay, 1996). In the past decade, many other professional associations have adopted ethical standards and professional codes which cover measurement and assessment issues. These trends have resulted from the increased public awareness of ethical issues, the variety of new proposed and actual uses for assessments, the increased visibility given to assessments for accountability purposes, and a commitment from the professions to safeguard the public (Eyde & Quaintance, 1988; Schmeiser, 1992). Ethical standards of the American Counseling Association and APA are unique in that these associations maintain formal enforcement mechanisms that can result in member suspension and expulsion, respectively. In 1992, the American Educational Research Association (AERA) adopted ethical standards, followed in 1995 by the National Council on Measurement in Education's (NCME) Code of Professional Responsibilities in Educational Measurement. Several other organizations, such as the Society for Industrial and Organizational Psychology (SIOP) and regional I-O organizations, formally adopted APA's most recent ethical code for their members for educational purposes, without any enforcement mechanisms.

Laws which affect testing primarily strive to protect certain segments of the public from specific abuses. Ethical standards and codes attempt to establish a higher normative standard for a broader range of professional behaviors. For example, APA's ethical standards note that:

. . . in making decisions regarding their professional behavior, psychologists must consider this Ethics Code, in addition to applicable laws and psychology board regulations. If this Ethics Code establishes a higher standard of conduct than is required by law, psychologists must meet the higher ethical standard. If the Ethics Code standard appears to conflict with the requirements of law, then psychologists make known their commitment to the Ethics Code and take steps to resolve the conflict in a responsible manner (APA, 1992, p. 1598).

Coinciding with this increased attention to ethical codes has been a dramatic increase in professional and technical standards for assessment, which are described later. The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1985) are the most widely cited document addressing technical, policy, and operational standards for all forms of assessments that are professionally developed and used in a variety of settings. Four separate editions of these standards have been developed by these associations and a fifth edition is currently under development. However, numerous other sets of standards have been developed to address more specific applications of tests or aimed at specific groups of test users. Standards have been developed: (1) for specific uses such as the validation and use of pre-employment selection procedures (Society for Industrial and Organizational Psychology, 1987), integrity tests (ATP, 1990), licensing and certification exams (Council on Licensure, Enforcement, and Regulation, 1993; Council on Licensure, Enforcement and Regulation & National Organization for Competency Assurance, 1993), and educational testing (Joint Committee on Testing Practices, 1988); (2) for specific groups of users such as classroom teachers (AFT, NCME, NEA, 1990) and test takers (Joint Committee on Testing Practices, 1996); and (3) for specific applications such as performance assessments, adapting and translating tests (Hambleton, 1994), and admissions testing (College Entrance Examination Board, 1988; National Association of Collegiate Admissions Counselors, 1995).

Professional standards, principles, and guidelines are more specific and generally oriented toward more technical issues, guiding test users with specific applications and uses of assessments. First, technical issues concerning the development, validation, and use of assessments are addressed in standards. Validity is the overarching technical requirement for assessments; however, additional professional and social criteria have been considered in evaluating assessments, such as: (1) how useful the test is overall, (2) how fair the test is overall, and (3) how well the test meets practical constraints (Cole & Willingham, 1997). These criteria are directed at both the intended and unintended uses and consequences of assessments. Existing standards guide test users in the development and use of tests and assessments; however, these standards may rarely reach and influence test users not associated with a profession.* For example, most employers are unaware of the Principles for the Validation and Use of Personnel Selection Procedures (Society for Industrial and Organizational Psychology, 1987), and certainly the vast majority of educational administrators and policy makers who determine how to use tests and cite test results in making inferences about the quality of education have never viewed a copy of any of the various standards in educational measurement and testing.

Professional standards developed by groups such as APA and AERA do not appear in publications commonly read by employers, educators and policy makers. Many standards are written at a level where they may be incomprehensible to such individuals even if they had access to them. Finally, in many instances, members of the professional associations which develop and release standards may not themselves have and use copies of the standards, and may have had little exposure to the standards and other new topics in testing, measurement, and statistics through graduate training courses (Aiken, West, Sechrest, Reno, Roediger, Scarr, Kazdin, & Sherman, 1990). For example, the Standards for Educational and Psychological Testing, referred to as "the Standards" (AERA, APA, & NCME, 1985), which are the most widely cited professional standards for any form of testing and assessment, had total sales of 56,000 through 1996, while there are more than 120,000 members of APA alone (Brown, 1997).

* Often professional standards have been cited by courts in case law and have influenced assessment practices in these ways.

Efforts to Improve Proper Use of Tests and Assessments

In the 1950s, when the first technical standards for testing were adopted in the United States (APA, 1955; AERA & NCMUE, 1954), the test user was considered to be a trained professional who conducted testing or disseminated results and interpretations to an individual. This classic definition of test user includes psychologists, physicians, counselors, personnel managers, and state or local assessment directors who generally have both some training and explicit job responsibilities for assessment. These test users seek to qualify for purchasing test materials, provide detailed interpretation of test scores, or represent a client organization in procuring and using assessments. The current version of the Standards (AERA, APA, & NCME, 1985) follows this line in defining the test user as someone who "requires the test for some decision making purpose" (p. 1). These individuals may best be termed "primary test users" because of their role and responsibilities. Today the concept of test user is much broader and often includes many different individuals with little or no training in measurement and assessment. There are also secondary test users, especially in education, including policy makers, teachers, parents and the media, who often have no general training in assessment and no prescribed responsibilities for assessment. Some of these individuals can greatly influence and distort the general interpretation of assessment results, misuse assessments and results, and may have political incentives to selectively use results to support certain values or beliefs (Berliner & Biddle, 1995). Many of the most striking examples of test misuse are generated and supported by such secondary users. The more removed the test user is from the examinee, and the less familiar they are with the personal characteristics, academic abilities, or workplace skills of the individual tested, the more likely instances of test misuse will occur.

With the expanded uses and increased focus on assessment has come renewed criticism of the misuses and negative consequences of assessments. Professionals in measurement and testing are increasingly struggling with how best to improve proper test use and to both inform and influence an increasingly diverse group of test users who may have no formal training in testing and measurement, but who still have legitimate claims for using test results in a wide variety of ways. Professional groups have attempted to address the legal, ethical, and professional concerns with additional codes of conduct, technical standards, workshops, and case studies. However, most of these efforts rarely reach beyond members of the specific professional association or clients/users of a specific assessment product. Clearly such efforts are essential for improving the proper use of assessments and appropriate understanding of assessment results. Yet these initiatives will not generally reach the secondary users who may insist on using assessment results as the sole determinant of high school graduation, of rewards and sanctions to schools and teachers, and as the primary indicator of equity, educational improvement or student achievement. Unfortunately, efforts which are aimed at only one segment of a much more expansive population of test users may not go far enough, fast enough, to improve proper assessment practice.

Associations have attempted to cope with this new, more expansive group of "secondary test users" by developing broader and simpler forms of standards, such as the Code of Fair Testing Practices in Education, which condenses the primary standards from a 100-page document into a four-page booklet and encourages duplication. Other efforts have been to work in collaboration with broader groups such as the National Education Association to develop codified guidelines or standards. However, all such efforts have had no consistent impact across situations because there are few common linkages, different priorities and expectations for assessments, and little common understanding between primary and secondary test users. Relatively few efforts have been focused on the undergraduate and graduate programs which train teachers and measurement specialists. Because universities and colleges differ in the types of programs offered, the titles of courses and course sequences, and even the departments in which such programs are housed, targeting and reaching educational programs broadly presents a number of substantial logistical obstacles. Often it is difficult to identify the faculty and administrators responsible for such programs and to effect systematic changes in their training programs.

For these and other reasons, Haney and Madaus (1991) state that test standards have had little direct impact on test publishers' practices and even less impact on test use. They note that professional codes and standards primarily serve to enhance the prestige, professional status, and public relations image of the profession rather than narrow the gap between standards and actual practice. How do we resolve these issues? Given the increased social policy implications of testing, some have argued that greater legal regulation, litigation, or enforcement of technical standards by an independent auditing agency presents a potential mechanism for reducing the misuse of assessment practices (Haney, 1996; Haney, Madaus, & Lyons, 1993; Madaus, 1992). However, such mechanisms may have little impact on many of the most visible misuses of assessments, because it is often the legislative and executive branches of state and federal government who advance expanded and often inappropriate uses of assessments. Because test use is so expansive and abuses are so diverse, solutions which address only one element or one audience (e.g., test developer, teacher) may not be equipped to resolve the majority of instances where assessments are misused.

Consequences of Testing and Assessment

A more thorough understanding and consideration of potential consequences of test use can substantially reduce inappropriate uses and the resulting ethical issues. When consequences are discussed we are often reminded of the exclusively negative consequences resulting from test use:
– adverse impact on minorities and women
– discouraging and 'institutionalizing' failure for individuals
– teaching to the test and limiting curriculum and learning
– reinforcing tasks requiring simple rote memory at the expense of more complex cognitive processes required for success in the real world
– creating barriers
– tracking individuals into classrooms and jobs offering fewer challenges and opportunities

There are also often positive consequences when validated assessments are appropriately used:
– merit as a guide for decision making (selecting the most qualified candidate or making awards based on relevant performance)
– efficiency (relatively quick and effective means of collecting a large amount of data across a range of skills/competencies)
– quality control (certification or licensure)
– protection of the public (negligent hiring for critical occupations)
– objectivity for making comparisons and decisions among individuals or against established criteria
– cost effectiveness and utility

Consideration of how social ramifications of assessments affect validity has been summarized by Cronbach (1988), who stated that validity research is essentially a system which considers personal, institutional, and societal goals as they relate to inferences derived from test scores. If validity is established through evidence that supports inferences regarding specific uses of a test, then intended and unintended consequences of test interpretation and use should be considered in evaluating validity (Messick, 1989). Test developers and test users need to anticipate negative consequences that might result from test scores, potential corruption of tests, negative fallout from curriculum coverage, and how teachers and students spend their time (Linn, 1993). While there is some consensus that the consequences of test use must become an important criterion in evaluating tests within education, this view is not generally held in other settings (e.g., personnel, clinical).

Before consequences can be integrated as a component in evaluating assessments, a taxonomy or model is required. Such a taxonomy must consider both the positive and negative impacts and consequences associated with test use, as well as the impact, consequences, and feasibility of alternative procedures (e.g., biographical data, open admissions vs. selection). Further complicating such a taxonomy is the knowledge that different stakeholders will have widely differing views on these issues. After the consequences have been identified, their probability of occurrence, the weight (positive or negative) associated with each consequence, and the level at which the consequence occurs (i.e., individuals, organizations, or society) must be determined.

This taxonomy borrows terminology and processes from expectancy theory (Vroom, 1964), where the weight of a consequence is similar to the "valence" and the probability is related to the "instrumentality." Proposed steps (adapted from Camara, 1994) in determining the consequences of test use include:
1. Identify the intended consequences and objectives of assessment.
2. Identify subject matter or content experts, or develop an alternative consensus process.
3. Identify potential intended and unintended consequences to individuals and organizations through a review of the literature, interviews or focus groups with key stakeholders.
4. Determine the level at which each consequence occurs (does it impact individuals, organizations, or society?).
5. Determine the probability of occurrence for each consequence (e.g., instrumentality).
6. Determine the strength or weight of each consequence (e.g., valence).
7. Employ a consensus process to determine the summative consequences of different aspects of test use on individuals, organizations and society.
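Steps 3 through 7 above can be sketched as a small bookkeeping exercise. The following Python sketch is illustrative only; the article prescribes a consensus process, not code, and the class and function names here are assumptions. The arithmetic, however, mirrors the article's rule that a summative consequence is valence times instrumentality.

```python
# Illustrative sketch (not from the article): each candidate consequence is
# recorded with the level at which it occurs (step 4), a consensus probability
# of occurrence from 0 to 10 (step 5), and a valence from -10 to +10 (step 6).
# The summative consequence is their product, and step 7 aggregates the
# products within each level.
from dataclasses import dataclass

@dataclass
class Consequence:
    description: str
    level: str        # "individual", "organization", or "society"
    probability: int  # 0 to 10 (instrumentality)
    valence: int      # -10 to +10 (strength and direction)

    def summative(self) -> int:
        # Summative consequence = valence x instrumentality
        return self.valence * self.probability

def summative_by_level(consequences):
    """Sum the summative consequences within each level (step 7 input)."""
    totals = {"individual": 0, "organization": 0, "society": 0}
    for c in consequences:
        totals[c.level] += c.summative()
    return totals
```

Applied to the dropout consequence worked through in Figure 1 (valence –8, probability 5), `summative()` yields –40, matching the figure.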

Ideally the test developer, test users, and other key stakeholder groups would consider these issues before embarking on a new or revised testing program. Most consequences will have multiple impacts on individuals (e.g., test taker, teacher), organizations (e.g., schools, business), and society (e.g., community, state). Steps 5 and 6 require individuals, often with very diverse views, to arrive at a consensus or common judgment about the probabilities and strength (and direction) of consequences. The literature on standard setting may be of assistance in structuring a more explicit process.

Table 1. Paradigm for evaluating the consequential basis of assessments.

Consequence   Individual        Organization      Societal
              (e.g., student)   (e.g., school)    (e.g., community)
Positive
Harmful
Summative

Figure 1. Computation of the summative consequence for each potential consequence associated with test use:

Summative Consequence = Probability the consequence will occur (0 to 10) × Strength of the consequence (–10 to +10)

Example: A state proposes development of a high standards test which all students must pass to graduate from high school. This proposed test has numerous potential consequences for the students, schools, districts, and the state, which would include the business community, parents, citizens, etc. Below are only two of several potential consequences of such a testing program.

Individual Consequence #1: increase student drop out rate
Strength –8 × Probability 5 = Summative Consequence –40

Organizational Consequence #2: higher standards will increase the value of a diploma and produce more competent graduates
Strength +9 × Probability 3 = Summative Consequence +27

For each of the possible consequences, the summative consequence would be summed by level (individual, organization, societal) to arrive at an overall index of the potential consequences of an assessment for that level. A consensus process would be employed to develop these values and to determine the overall desirability of the proposed assessment.

Table 1 illustrates how consequences may be identified and classified through a consensus process. A list of potential consequences would be developed and classified within each of the nine boxes (step 4). Once all potential consequences are identified, each consequence is fully evaluated to determine its valence and instrumentality, as illustrated in Figure 1. Step 7 in the process would have key stakeholders determine the overall summative consequences on individuals, organizations and society before a final decision is reached on the desirability and appropriateness of an assessment program or proposed use for assessments.
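The nine-box classification just described can be sketched as a simple grid. This is a minimal, hypothetical layout (the function name and data structures are not the article's), but the binning of consequences as positive or harmful within each level, and the per-level summative row, follow Table 1 and Figure 1.

```python
# Illustrative sketch of the Table 1 grid: consequences are binned as
# positive or harmful by the sign of their valence within each level, and
# the summative row accumulates probability x valence per level.
def build_table(consequences):
    levels = ("individual", "organization", "societal")
    table = {
        "positive": {lvl: [] for lvl in levels},
        "harmful": {lvl: [] for lvl in levels},
        "summative": {lvl: 0 for lvl in levels},
    }
    for description, level, probability, valence in consequences:
        row = "positive" if valence >= 0 else "harmful"
        table[row][level].append(description)
        table["summative"][level] += probability * valence
    return table
```

With Figure 1's two example consequences, the dropout consequence lands in the harmful/individual box with a summative value of –40, and the diploma-value consequence in the positive/organization box with +27.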

Such a taxonomy would not ensure that test misuse is minimized, but it would help to raise awareness of the diverse range of issues that emerge across the different stakeholders and constituency groups involved in a high-stakes assessment program. The absence of literature proposing models or taxonomies to identify and contrast consequences associated with test use leaves the test developer and user with little or no guidance in improving professional conduct and appropriate use of assessments.

Concerns Arising from Technology and Computer-Based Testing

Professional and ethical concerns about the appropriate use of testing have dramatically increased over the past few decades as the community of test users has grown and the use of assessments has expanded. Most recently, technological innovations have given rise to a number of new and unique professional and ethical challenges in assessment.

Matarazzo (1986) and, later, Eyde and Kowal (1987) identified many of the unique concerns created by the use of computerized clinical psychological test interpretation services. For example, Matarazzo noted that variation in clinical interpretation is often the rule rather than the exception, requiring much greater time and effort for clinical judgment and interpretation of computer-generated clinical interpretations than is usually the case. “Specifically, two or more identical soil readings, blood chemistries, meteorological conditions or MMPI profiles may require different interpretations depending on the natural or human context in which each is found . . . use of the same ‘objective’ finding (e.g., an IQ of 120 or a ‘27’ MMPI codetype) may be quite different if the ‘unique’ patient is a 23-year-old individual being treated for a first acute, frankly suicidal episode than if the ‘unique’ patient is a 52-year-old truck driver . . . applying for total disability” (Matarazzo, 1986, pp. 20–21). Eyde and Kowal (1987) explained that computer-based testing provides greater access to tests and expressed concern about the qualifications of such expanded test users.

Technological innovations and the increased pressure for accountability in health care services may also be creating a different demand and market for clinical and counseling assessments. Assessments in all areas can be, and are, delivered directly to consumers. The availability of a take-home CD-ROM IQ test for children and adults, marketed to the general public by Pro-Ed, a psychological testing and assessment publisher, has raised these same ethical and professional issues for psychologists. The CD-ROM test comes with an 80-page manual which informs parents of some of the theories of testing, how to administer the test, and how to deal with test results (New York Times, January 22, 1997). In such instances, when tests are delivered by the vendor directly to the test taker, there is no traditional test user. The test taker or their parents, who have no training and little knowledge of testing, must interpret the results, which increases the risk of test misuse.

Computer-adaptive testing (i.e., assessments in which the examinee is presented with items or tasks matched to his or her ability or skill level) is increasingly used for credentialing and licensing examinations and admissions test programs today. Several unique concerns arise even when computer-based tests are administered under controlled conditions, such as in the above instances. First, issues of equity and access arise because these computer-based testing programs often charge substantially higher testing fees, which are required to offset the additional expenses incurred for test development and delivery, and often have more limited geographical testing locations. Second, familiarization with technology and completing assessments on computer may be related to test performance. Research has demonstrated that providing students with practice tests on disk in advance of testing, and tutorials at the beginning of the test administration, are important in reducing anxiety and increasing familiarization with computer-based testing. Third, differences in the format, orientation, and test specifications of computer-based testing may affect the overall performance of individual test takers. Russell and Haney (1997) note that students who use computers regularly perform about one grade level worse if tested with a paper-and-pencil test than with a computer-based test. Students completing computer-adaptive tests may also become more frustrated and anxious as they receive far fewer “easy items,” since item selection algorithms are designed to match items with the level of each test taker’s ability, resulting in more items that are perceived as “hard” by the test taker. Additionally, computer-based tests generally do not permit test takers to review items previously answered, as is common on paper-and-pencil tests. Additional rules are required for students who omit a large number of items and for disclosure of test forms. Computer-adaptive testing could be manipulated by test takers or coaching schools if some minimum threshold of item completion were not required. Because exposure of items is a major risk with these tests, disclosure of test forms cannot be as easily accommodated as with paper-and-pencil tests (Mills & Stocking, 1996). These and other distinctions associated with computer-based testing raise additional professional issues for test users, test developers, and test takers. As Everson (1997) notes, the convergence of new theories of measurement with increased technology presents many opportunities for improved assessment frameworks, but also raises additional professional and ethical issues concerning assessment.
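The item-selection behavior described above — each examinee receiving items matched to his or her current ability estimate, so that few items feel “easy” — can be illustrated with a toy sketch. All names here are assumptions for illustration, and the fixed step-update rule is a simplification: operational adaptive programs estimate ability with item response theory models rather than a fixed step.

```python
def pick_item(difficulties, ability, administered):
    """Select the unadministered item whose difficulty best matches ability."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return min(candidates, key=lambda i: abs(difficulties[i] - ability))

def adaptive_test(difficulties, is_correct, n_items, step=0.5):
    """Toy adaptive administration: nudge the ability estimate after each response.

    Because every item is chosen to match the current estimate, the examinee
    sees few items that feel 'easy' -- the frustration effect noted in the text.
    """
    ability, administered = 0.0, []
    for _ in range(n_items):
        item = pick_item(difficulties, ability, administered)
        administered.append(item)
        ability += step if is_correct(item) else -step
    return ability, administered

# A test taker who answers everything correctly is steered to harder items.
final, order = adaptive_test([-2.0, -1.0, 0.0, 1.0, 2.0], lambda i: True, 3)
print(final, order)  # 1.5 [2, 3, 4]
```

The sketch also makes the item-exposure concern concrete: the selection rule is deterministic, so without exposure controls or completion thresholds, coaching schools could predict which items a given response pattern will draw.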

Fees for computer-based testing programs have generally run between 300 and 600% higher than the fees for the same paper-based tests. Currently, test takers receive immediate score reports and slightly more precise measurement at their ability level, but few additional benefits from the higher test fees. The few national programs offering computer-based testing have either eliminated (or plan to eliminate) the paper-and-pencil version or raised fees on the paper-based test to ensure adequate volume for the higher priced computer-based product. Until additional advantages are realized from computer-based tests, the business practice of replacing a lower priced test with one that is three to six times as costly for the test taker should be questioned.

Educational Assessment Today: Legal, Ethical and Professional Concerns

As tests are increasingly used for distinct and multiple purposes, negative consequences and misuse are more likely to emerge. The use of performance assessments and portfolios in high-stakes assessment programs can also raise additional issues about standardization and fairness. Nowhere are these concerns more evident than in educational assessment today.

In the past decade, there have been expanded expectations for assessments not only to measure educational achievement but to bring it about. Assessments are increasingly viewed as tools to document the need for reform by holding schools and students accountable for learning, and also as levers of reform (Camara & Brown, 1995; Linn, 1993). President Clinton has proposed development of national assessments for all students in reading and mathematics by 1999 and called on all schools to measure the achievement of their students in these and other areas. Forty-eight of the fifty states in the U.S. currently have in place or are developing large-scale educational assessments to measure the performance of their students. In some states these tests are used for high-stakes purposes such as issuing a diploma or rewarding/sanctioning schools, districts, and even individual teachers. State and local boards of education and state and local departments of education translate test performance to make decisions about schools and/or individuals. School administrators come under increased pressure in such high-stakes testing programs to defend instructional practices and student achievement. Classroom teachers, who administer the assessments and increasingly view them as a driving force for instructional change and educational reform, also have a role in such assessment programs. Parents, students, and the general public who demand improved quality in education; business leaders who are often critical of graduates for lacking appropriate workplace skills; higher education, which finds an increasing proportion of incoming students requiring remedial instruction; and policy makers who must respond to all these diverse stakeholder groups represent many different types of secondary test users.

Dissatisfaction with standardized assessments is also greatest in education because of their perceived negative consequences on learning and instruction. The performance assessment movement has strong support both within education and educational measurement and has not become another educational fad, as some had predicted. Several large assessment programs had sought to replace their standardized testing programs with fully performance-based or portfolio systems. Today it appears that the “model” state assessment program will combine such constructed-response tasks with more discrete, selected-response (e.g., multiple choice, grid-ins) test items. Employing multiple measures allows educators to gain the benefits of more in-depth and applied performance tasks that increase curricular validity, as well as the increased reliability and domain coverage that selected-response items offer. However, a number of legal, ethical and professional concerns emerge with any high-stakes assessment program, whether the decisions made primarily affect the student or the school.

Single assessments, whether norm-referenced multiple-choice assessments or more performance-based assessments, do not well serve multiple, high-stakes needs (CRESST, 1995). Often key proponents of large-scale assessments support multiple uses but actually have very different priorities among those uses. Kirst and Mazzeo (1996) explain that when one such state assessment system moved from a design concept to an operational testing program, it became clear that not all the proposed uses and priorities for the design could be accommodated. When the priorities of key stakeholders could not be met, support for the program decreased.

Phillips (1996) identified legal criteria which apply to such expanded uses of assessments for high-stakes purposes. These criteria have been modified and supplemented with several additional criteria which reflect a range of issues:

Adequate advance notification of the standards required of students. To ensure fairness, students and parents should be notified several years in advance of the type of standards they will be held to in the future. Students and teachers should be provided with the content standards (knowledge and skills required) and performance standards (level of performance). Sample tasks, model answers, and released items should be provided, and clear criteria should be established when high-stakes (e.g., graduation) uses are associated with the test.

Evidence that students had an opportunity to learn. The critical issue is whether students had adequate exposure to the knowledge and skills included on the assessment or whether they are being asked to demonstrate competency on content or skills to which they were not exposed in school. Phillips (1996) notes that such curricular validity can often be demonstrated through survey responses from teachers that ensure students had, on average, more than one opportunity to learn each skill tested.

Evidence of opportunity for success. This challenge emerges when major variations from standardization occur. This criterion assumes that all students are familiar with the types of tasks on the assessment and the mode of administration (e.g., computer-based testing); that all have the same standardized administrative and scoring procedures and equipment (e.g., some students may have access to a calculator or superior laboratory equipment in completing the assessment); and that outside assistance (e.g., group tasks, or student work produced over time, where parents and others could unduly offer assistance) could not affect performance on the assessment. Variations in these and other conditions can present an unfair advantage to some students.

Assessments reflect current instructional and curricular practices. If assessments are designed to reflect exemplary instructional or curricular practices, as is often the desire of educators who hope to use the assessment to drive change, but those practices are not reflected in the actual practices of many schools, a fundamental fairness requirement may not be met. The same challenges could be brought where teachers do not receive the professional development needed to implement new instructional or assessment practices (e.g., use of a graphing calculator) that are required on the assessment, or in end-of-course assessments where the teacher lacks appropriate credentials for the subject area.

While these concerns apply to most educational assessments, they move from professional issues to legal and ethical concerns when assessments are used to make high-stakes decisions. Additional ethical and professional issues which have been associated with various high-stakes educational assessments may also affect other types of testing programs in other settings. Only a few of these issues are briefly addressed below.

Overreliance or exclusive reliance on test scores. Test performance should be supplemented with all relevant and available information to form a coherent profile of students when making individual high-stakes decisions (e.g., admissions, scholarships). Student performance on tests should be interpreted within the larger context of other relevant indicators of their performance. In admissions decisions, students’ grades, courses, and test scores are generally all considered, with supporting information on personal qualities and other achievements. When testing has been repeated, performance on all administrations will permit individuals to identify any particular aberrations; less weight should generally be assigned to such an aberrant test score or other indicator in these instances. Similar errors occur when individuals overinterpret small score differences between individuals, groups, or schools.
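The suggestion to down-weight an aberrant administration can be made concrete with a small sketch. The threshold and the half-weight rule here are illustrative assumptions, not a prescribed procedure, and the scores are hypothetical:

```python
def median(xs):
    """Median of a list of numbers (middle value, or mean of the middle two)."""
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

def combined_score(scores, threshold=80):
    """Average repeated administrations, giving half weight to aberrant ones.

    A score is treated as aberrant when it falls more than `threshold` points
    from the median of all attempts; both the threshold and the half-weight
    rule are assumptions made for this illustration.
    """
    m = median(scores)
    weights = [0.5 if abs(s - m) > threshold else 1.0 for s in scores]
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Three SAT-style attempts; the 380 looks like an aberration.
print(combined_score([520, 540, 380]))  # 500.0 (the plain mean would be 480.0)
```

The point is not the particular rule but the principle: information from all administrations should enter the decision, with the aberrant observation contributing less than the consistent ones.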

Cheating and “teaching to the test.” There have been numerous examples of individuals cheating on high-stakes tests. In addition, several instances where educators and other test users have been accused of systematic efforts at cheating (e.g., excessively high erasure rates on students’ papers, disclosure of answer keys to job incumbents on promotional exams) have received national attention, with some estimates that over 10% of test takers cheat on high-stakes tests (Fairtest, 1996). Because test scores are used as an indicator of school quality, school performance influences property values, school funding, and school choice, creating added incentives to increase school and district test scores by any means possible. According to many educators, these pressures often result in teaching to the test, a common criticism of standardized tests. It is such negative consequences and the prospect of improved schooling that have provided the impetus for performance assessment, not the desire for better measurement for its own sake (Dunbar, Koretz, & Hoover, 1991).

Consideration of the cultural and social experiences of the test taker. Students bring their prior social and cultural experiences with them when they participate in class, compete in a sporting event, or complete an assessment. For many students the cumulative effect of these experiences may be to emphasize certain behaviors, skills or abilities that are less similar to those required by the assessment. The greater the similarity of an individual’s socioeconomic and cultural background to that of the majority population, the better his or her test performance will generally be (Helms, 1992). Additional efforts are required both to ensure that all students are familiar with the types of tasks on the assessment and to ensure that divergent skills and abilities are considered in the construction and validation of assessment programs. Sensitivity to cultural, ethnic, gender, and language differences is required when interpreting results from assessments or other measures for individual students. Similarly, differences in these and other demographic variables must be considered when making simplistic comparisons among schools, districts, and other units. When these issues are not adequately considered by test developers and test users, serious professional and ethical issues arise.

Exclusion of students from large-scale testing programs. Most large-scale national assessment programs which use aggregate-level data (school, district, state) to monitor educational progress and permit comparisons systematically exclude large proportions of students with limited English proficiency and disabilities (McGrew, Thurlow, & Spiegel, 1993). Often school staff determine which students may be excluded from such national and state testing programs, and there is variation across schools in the exclusion rates and application of criteria for excluding students. Paris, Lawton, Turner, and Roth (1991) have also demonstrated that “low achievers” are often excluded by some schools or districts, which has the effect of artificially raising district test scores. Such practices introduce additional error into analyses, complicate accurate policy studies, affect the rankings resulting from the test data, and introduce a basic unfairness in the use of test data (National Academy of Education, 1992).
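The score-inflation effect of excluding “low achievers” is simple arithmetic, as a short sketch shows (all numbers are hypothetical and chosen only to illustrate the mechanism):

```python
# Hypothetical scores for a small district (illustrative numbers only).
scores = [45, 52, 60, 71, 78, 85, 90]

def mean(xs):
    """Arithmetic mean of a list of scores."""
    return sum(xs) / len(xs)

everyone = mean(scores)                             # all students tested
without_low = mean([s for s in scores if s >= 60])  # the two lowest excluded
print(round(everyone, 1), round(without_low, 1))    # 68.7 76.8
```

Excluding two low scorers raises this district’s mean by about eight points without any change in what students know, which is why uneven exclusion practices distort rankings and policy comparisons.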

Use of test scores for unintended purposes. Many of the most visible misuses of tests occur when scores are used for unintended purposes (Linn, Baker, & Dunbar, 1991). This occurs with state comparisons of unadjusted SAT or ACT scores, when results from state assessments are used as indicators of teacher competence, and when test results become the primary basis for inferences concerning the relative quality of learning or education among different schools or geographical regions. Test scores will always be considered an important indicator of learning. However, test users must become more aware of the extraordinary limitations and weaknesses of placing undue weight on test scores in such situations. Many state report cards have attempted to provide public reports on the quality of education by examining a range of criteria (e.g., safety, learning, continuation in higher education, gainful employment, student honors) with a range of indicators that extend beyond test scores. As the test user becomes increasingly removed from personal knowledge of the examinee, or less familiar with the units (e.g., schools, districts) of comparison, instances of mismeasurement and test misuse will increase (Scheuneman & Oakland, in press).


Conclusion

This paper has attempted to distinguish among legal and regulatory mandates, ethical issues, and professional responsibilities, all of which concern the appropriate use of tests and test data. Numerous efforts have been undertaken by testing professionals and professional organizations to improve the responsible use of tests, yet often these efforts are judged to have fallen short. As tests are used by an increasing number of users with a variety of objectives (e.g., policy makers, state and local education officials, business), the potential for misuse of tests increases and efforts to educate and monitor test users become less effective. Existing testing standards, specialty guidelines, and other means of addressing the responsible use of tests have been discussed. The potential consequences of testing and assessment were reviewed, and a taxonomy has been proposed to aid test users in addressing the multiple and multidimensional consequences resulting from test use with various key stakeholder groups. Finally, this paper provided a more detailed review of the professional concerns arising from the migration of tests to computer-based platforms and the increased demands placed on assessments in U.S. education.

The value of assessment is often related to its impact. Individual appraisals should bring to bear all relevant information to describe and explain important qualities, minimize problems, promote growth and development, and increase the validity of important decisions (e.g., course placement, admissions, certification, selection) (Scheuneman & Oakland, in press). National, state, and local testing programs should provide comprehensive data that can supplement other sources of information, informing us both of student skills and knowledge today and of growth in learning over time.

Legal, ethical, and professional concerns with assessment are difficult to distinguish. All such issues concern the proper use of assessment and the probable consequences of using assessments. Consequences of testing are in the eye of the beholder. The same assessment which presents several potential benefits to some groups (e.g., policy makers, community, business) may also result in negative consequences for individuals (e.g., test takers, students). A paradigm is needed to assist test users in identifying and evaluating the potential consequences that result from test use and the consequences which would result from alternative practices (use of more subjective processes, collecting no data). Additional attention to the consequences of testing, and to how these are determined and evaluated by the various stakeholders, is essential to reduce the misuse of testing and improve assessment practices among the increasingly diverse types of individuals using tests and test results.

Résumé

Tests and assessments have taken on additional burdens as their use has been expanded by educators, employers, and policy makers. Growing demands are often placed on these same assessments by various constituencies to serve varied purposes (e.g., instructional reform, student accountability, the quality of teaching methods). These trends have raised considerable concerns about the appropriate use of tests and their results, and testing practices are under increasing scrutiny in education, employment, and health care. This article distinguishes the legal, ethical, and professional problems that have recently emerged from the increased demands on assessment, and identifies the specific issues tied to computer-based test administration and interpretation. The author reviews the efforts undertaken by professional associations to improve assessment practices, as well as the issues concerning the appropriate use of assessments in education. Finally, a methodology for identifying the consequences associated with test use and a taxonomy for evaluating the multidimensional consequences of test use in a given setting are proposed.

Author’s address:

Dr. Wayne J. Camara
The College Board
19 Hawthorne Drive
Princeton Junction, NJ 08550
USA
E-mail: [email protected]

References

Aiken, L., West, S. G., Sechrest, L., Reno, R. R., Roediger, H. L., III, Scarr, S., Kazdin, A. E., & Sherman, S. J. (1990). Graduate training in statistics, methodology, and measurement in psychology: A survey of PhD programs in North America. American Psychologist, 45, 721–734.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1985). Standards for educational and psychological testing. Washington, DC: APA.

American Educational Research Association & National Council on Measurements Used in Education (1955). Technical recommendations for achievement tests. Washington, DC: National Education Association.

American Federation of Teachers, National Council on Measurement in Education, & National Education Association (1990). Standards for teacher competence in educational assessment of students. Washington, DC: Authors.

American Psychological Association (1953). Ethical standards for psychologists. Washington, DC: Author.

American Psychological Association (1954). Technical recommendations for psychological tests and diagnostic techniques. Washington, DC: Author.

American Psychological Association (1993). Ethical principles of psychologists and code of conduct. American Psychologist, 49, 1597–1611.

Association of Personnel Test Publishers (1990). Model guidelines for preemployment integrity testing programs. Washington, DC: Author.

Berliner, D. C., & Biddle, B. J. (1995). The manufactured crisis: Myths, fraud, and the attack on America’s public schools. Reading, MA: Addison-Wesley.

Brown, D. C. (February 8, 1997). Personal correspondence.

Camara, W. J. (1994). Consequences of test use: The need for criteria. Paper presented at the 23rd International Congress of Applied Psychology, Madrid, Spain.

Camara, W. J. (1996). Fairness and public policy in employment testing. In R. Barrett (Ed.), Fair employment strategies in human resource management (pp. 3–11). Westport, CT: Quorum Books.

Camara, W. J., & Brown, D. C. (1995). Educational and employment testing: Changing concepts in measurement and policy. Educational Measurement: Issues and Practice, 14, 1–8.

Center for Research on Evaluation, Standards and Student Testing (1995). Results from the 1995 CRESST conference: Assessment at the crossroads. Los Angeles: UCLA, CRESST.

College Entrance Examination Board (1988). Guidelines on the uses of College Board test scores and related data. New York: Author.

Cole, N., & Willingham, W. (1997). Gender and fair assessment. Hillsdale, NJ: Erlbaum.

Council on Licensure, Enforcement, and Regulation (1993). Development, administration, scoring, and reporting of credentialing examinations. Lexington, KY: Council of State Governments.

Council on Licensure, Enforcement, and Regulation & National Organization for Competency Assurance (1993). Principles for fairness: An examining guide for credentialing boards. Lexington, KY: Author.

Cronbach, L. J. (1988). Five perspectives on validity arguments. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 3–17). Hillsdale, NJ: Erlbaum.

Deutsch, C. H. (October 16, 1988). A mania for testing spells money. New York Times.

Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4(4), 289–303.

Educational Testing Service (October 31, 1996). Test cheating scheme used encoded pencils, complaint charges. ETS Access. Princeton, NJ: Author.

Employee Polygraph Protection Act of 1988, Sec. 2001 et seq., 29 U.S.C.

Everson, H. E. (in press). A theory-based framework for future college admissions tests. In S. Messick (Ed.), Assessment in higher education. Hillsdale, NJ: Erlbaum.

Eyde, L. D., & Kowal, D. M. (1987). Computerized test interpretation services: Ethical and professional concerns regarding U.S. producers and users. Applied Psychology: An International Review, 36, 401–417.

Eyde, L. D., & Quaintance, M. K. (1988). Ethical issues and cases in the practice of personnel psychology. Professional Psychology: Research and Practice, 19(2), 148–154.

Fairtest (Summer, 1996). Cheating cases reveal testing mania. Fairtest Examiner, 9, 3–4.

Hambleton, R. K. (1994). Guidelines for adapting psychological and educational tests: A progress report. European Journal of Psychological Assessment, 10, 229–244.

Haney, W. (1996). Standards, schmandards: The need for bringing test standards to bear on assessment practice. Paper presented at the Annual Meeting of the American Educational Research Association, New York.

Haney, W., & Madaus, G. C. (1991). In R. K. Hambleton & J. C. Zaal (Eds.), Advances in educational and psychological testing (pp. 395–424). Boston, MA: Kluwer.

Haney, W., Madaus, G. C., & Lyons, R. (1993). The fractured marketplace for standardized testing. Boston, MA: Kluwer.

Helms, J. E. (1992). Why is there no study of cultural equivalence in standardized cognitive ability testing? American Psychologist, 47, 1083–1101.

Joint Committee on Testing Practices (1988). Code of fair testing practices in education. Washington, DC: Author. (Copies may be obtained from NCME, Washington, DC.)

Joint Committee on Testing Practices (1996). Rights and responsibilities of test takers (Draft). Washington, DC: Author.

Kirst, M. W., & Mazzeo, C. (1996). The rise and fall of state assessment in California, 1993–96. Kappan, 22, 319–323.

Lindsay, G. (1996). Ethics and a changing society. European Psychologist, 1, 85–88.

Linn, R. L. (1993). Educational assessment: Expanded expectations and challenges. Educational Evaluation and Policy Analysis, 15(1), 1–16.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15–21.

Madaus, G. F. (1992). An independent auditing mechanism for testing. Educational Measurement: Issues and Practice, 11, 26–31.

Matarazzo, J. D. (1986). Computerized clinical psychological test interpretations: Unvalidated plus all mean and no sigma. American Psychologist, 41, 14–24.

McGrew, K. S., Thurlow, M. L., & Spiegel, A. N. (1993). An investigation of the exclusion of students with disabilities in national data collection programs. Educational Evaluation and Policy Analysis, 15, 339–352.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 33–46). New York: Macmillan.

Mills, C. N., & Stocking, M. L. (1996). Practical issues in large-scale computerized adaptive testing. Applied Measurement in Education, 9, 287–304.

National Academy of Education (1992). Assessing student achievement in the states: The first report of the National Academy of Education panel on the evaluation of the NAEP trial state assessment: 1990 Trial State Assessment. Stanford, CA: Stanford University, National Academy of Education.

National Association of Collegiate Admissions Counselors (1995). NACAC commission on the role of standardized testing in college admissions. Author.

New York Times (January 22, 1997). One of the newest take-at-home tests: IQ. New York Times.

Paris, S. G., Lawton, T. A., Turner, J. C., & Roth, J. L. (1991). A developmental perspective on standardized achievement testing. Educational Researcher, 20(5), 12–20.

Phillips, S. E. (1996). Legal defensibility of standards: Issues and policy perspectives. Educational Measurement: Issues and Practice, 15, 5–13.

Russell, M., & Haney, W. (1997). Testing writing on computers: An experiment comparing student performance on tests conducted via computer and via paper-and-pencil. Education Policy Analysis Archives, 5(3), 1–18.

Sackett, P. R., Burris, L. R., & Callahan, C. (1989). Integrity testing for personnel selection: An update. Personnel Psychology, 42, 491–529.

Sackett, P. R., & Harris, M. M. (1984). Honesty testing for personnel selection: A review and critique. Personnel Psychology, 32, 487–506.

Scheuneman, J. D., & Oakland, T. (in press). High stakes testing in education.

Schmeiser, C. B. (1992). Ethical codes in the professions. Educational Measurement: Issues and Practice, 11(3), 5–11.

Society for Industrial and Organizational Psychology (1987). Principles for the validation and use of personnel selection procedures. Bowling Green, OH: Author.

Vroom, V. H. (1964). Work and motivation. New York: Wiley.
