Instructor’s Resource Manual and Test Bank
for
Measurement and Assessment in Teaching Eleventh Edition
M. David Miller
University of Florida
Robert L. Linn
Professor Emeritus, University of Colorado–Boulder
Norman E. Gronlund
Late of University of Illinois at Urbana–Champaign
Prepared by
Michael Poulin
Boston Columbus Indianapolis New York San Francisco Upper Saddle River
Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto
Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo
______________________________________________________________________________
Copyright © 2013, 2009, 2005 by Pearson Education, Inc. All rights reserved. Manufactured in the United States of
America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to
any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic,
mechanical, photocopying, recording, or likewise. To obtain permission(s) to use material from this work, please
submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River,
New Jersey 07458, or you may fax your request to 201-236-3290.
Instructors of classes using Miller, Linn, and Gronlund’s Measurement and Assessment in Teaching, 11e, may
reproduce material from the instructor's resource manual and test bank for classroom use.
10 9 8 7 6 5 4 3 2 1
ISBN-10: 0-13-297795-8
ISBN-13: 978-0-13-297795-1
PREFACE
This manual is intended for use with the 11th edition of MEASUREMENT AND
ASSESSMENT IN TEACHING. It consists of (1) descriptions of student activities that might be
used to enhance learning, and (2) a set of test items for each chapter of the book. The majority of
test items are multiple-choice, but there are also some other selection-type items and short-answer
problems. The test items for each chapter have been revised to reflect changes in the 11th edition of
the textbook, and many new items have been added.
The student projects contained in this manual are useful for developing skill in the construction
and selection of tests and other assessment procedures. These types of activities also help
students appreciate some of the difficulties involved in developing objectives, constructing
specifications for tests and assessments, constructing test items and performance-based
assessment tasks, and selecting published tests. Such projects are typically considered an
important part of a measurement course, both from the standpoint of enhancing student learning
and as a useful means of evaluating how well students are able to apply the concepts and
procedures learned in the course.
The test bank items are arranged by chapter and are intended only as a beginning pool of items to
aid in constructing classroom tests. The items are geared closely to each chapter in the textbook
and thus have more knowledge items than would ordinarily be used in a typical classroom test.
The best procedure is to supplement these items with selection-type and supply-type items of
your own as well as performance tasks requiring more extended responses. By supplementing the
items in this way it should be possible to give more emphasis to testing and assessment issues
that cut across chapters and to measuring the understanding of concepts and issues discussed in
class.
CONTENTS
Page
STUDENT PROJECTS……………………………………………………………….... 1
1. Educational Testing and Assessment: Context, Issues, and Trends………..………... 4
2. The Role of Measurement and Assessment in Teaching………………………….… 15
3. Instructional Goals and Objectives: Foundation for Assessment…………………… 26
4. Validity……………………………………………………………………......…..… 38
5. Reliability and Other Desired Characteristics………………………………..……… 49
6. Planning Classroom Tests and Assessments………………………………………… 61
7. Constructing Objective Test Items: Simple Forms……………………………..….... 73
8. Constructing Objective Test Items: Multiple-Choice Forms………………………... 85
9. Measuring Complex Achievement: The Interpretative Exercise…………………... 96
10. Measuring Complex Achievement: Essay Questions……………………………. 107
11. Measuring Complex Achievement: Performance-Based Assessments………….. 118
12. Portfolios………………………………………………………………………… 129
13. Assessment Procedures: Observational Techniques, Peer Appraisal, and
Self-Report……………………………………………………………...……. 139
14. Assembling, Administering, and Appraising Classroom Tests and Assessments.. 149
15. Grading and Reporting…………………………………………………………… 159
16. Achievement Tests…………………………………………………………….… 170
17. Aptitude Tests ……………………………………………………………………. 181
18. Test Selection, Administration, and Use…………………………………………. 192
19. Interpreting Test Scores and Norms……………………………...……………… 204
Appendix A – Elementary Statistics…………………………………………….…… 215
STUDENT PROJECTS
During a one-semester course it is usually possible to have students develop a complete test and
assessment construction project and critically review one or more standardized tests or
assessments in their teaching fields. If the course is being offered for a shorter period of time, or
as part of another course, it may be desirable to use a series of shorter projects.
Complete Test and Assessment Construction Project
Each student is asked to select some course, or unit of work within a course, and develop a test
and assessment construction project that includes the following:
1. A list of 5 to 15 important learning outcomes to be assessed.
2. A list of subject-matter topics to be covered in the instruction.
3. A set of specifications for the test items and assessment tasks as described in Chapter
6.
4. A 40-item test using a combination of selection-type and short-answer (supply-type)
items that includes: (a) complete directions, (b) test items that are appropriate for
the specific learning outcomes being measured, and (c) a scoring key. Each test item
should be keyed to a specific learning outcome.
5. Four extended-response assessment tasks using either the essay question format
discussed in Chapter 10 or the performance-based task approach described in Chapter
11. The assessment tasks should include complete directions (including specification
of any special resources, such as equipment or books, available to students) and a
scoring guide. Each task should include a brief description of the learning outcomes the task
is intended to measure and why those outcomes would be difficult or impossible to
measure with items like those used in the 40-item test.
6. A bibliography of books and other source materials used in completing the project.
This project is fairly time-consuming, but it takes the student through the major steps of
constructing tests and assessments that are emphasized in the textbook. Since the steps in the
project closely parallel the sequence of the chapters in the textbook, it is possible for students to
start on it early in the semester and to work on each phase of it as it is discussed in class.
The above project can be reduced in scope by reducing the number of objectives, the number of
test items, or the number of performance assessment tasks.
Brief Test and Assessment Construction Projects
1. Have students select a chapter in the textbook and do the following:
a. State the learning outcomes stressed in the chapter.
b. Construct 10 objective test items and one essay question or performance-
based assessment task.
c. Indicate the learning outcome measured by each item and task.
2. Have students construct one multiple-choice item to measure each of the following
learning outcomes.
a. Knowledge of a specific term.
b. Knowledge of a specific test.
c. Knowledge of a method or procedure.
d. Understanding of a fact, principle, or procedure.
e. Ability to apply a fact, principle, or procedure.
3. Construct an interpretive exercise for each of the following.
a. A paragraph of written material.
b. Some type of pictorial material.
4. Construct a performance-based assessment task that could be completed in one class
period that would measure the ability to apply critical course concepts in a realistic
setting.
Portfolio Construction Project
Have students construct guidelines for a portfolio intended to display progress to parents during
the school year. Allow students to choose the subject area and grade for the portfolio. The
guidelines should specify:
a. The purpose of the portfolio.
b. Who will have access to the portfolio.
c. The number and types of entries students are expected to include.
d. The role of collaboration in developing portfolio entries.
e. The inclusion of self-evaluations of the entries.
f. The evaluation criteria to be employed.
Critical Evaluation of Published Tests
Have each student critically evaluate one or more of the following tests, using Chapter 18 of the
textbook.
1. Achievement test battery.
2. Achievement test in a specific content area.
3. Reading test (readiness, diagnostic, or survey type).
4. Scholastic aptitude test or multiaptitude test.
5. Test in a special area (art, music, creativity).
Item Analysis Project
Provide students with responses to a set of items (possibly one of your own tests) and have them
conduct an item analysis and interpret the results. Access to an easy-to-use item analysis package
for a personal computer would facilitate this project as well as show students how to use such
software.
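For instructors who lack such a package, the two basic item statistics can be computed with a short script. The sketch below is a minimal illustration, not a reference to any particular software; the function name and the response data are hypothetical, and Python is an arbitrary choice. It computes each item's difficulty index (the proportion of students answering correctly) and a discrimination index (proportion correct in the upper-scoring group minus the lower-scoring group, using the conventional upper/lower 27 percent grouping):

```python
# Minimal item-analysis sketch (hypothetical data for illustration).
# responses: rows = students, columns = items; entries are 1 (correct) or 0 (incorrect).

def item_analysis(responses, group_fraction=0.27):
    """Return (difficulty, discrimination) lists, one value per item."""
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score to form upper and lower scoring groups.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]

    difficulty, discrimination = [], []
    for j in range(n_items):
        p = sum(r[j] for r in responses) / n_students   # proportion correct overall
        p_upper = sum(r[j] for r in upper) / k
        p_lower = sum(r[j] for r in lower) / k
        difficulty.append(p)
        discrimination.append(p_upper - p_lower)
    return difficulty, discrimination

if __name__ == "__main__":
    data = [  # 6 students x 4 items (hypothetical)
        [1, 1, 1, 0],
        [1, 1, 0, 1],
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    diff, disc = item_analysis(data)
    for j, (p, d) in enumerate(zip(diff, disc), start=1):
        print(f"Item {j}: difficulty = {p:.2f}, discrimination = {d:+.2f}")
```

In interpreting the output with students, items with near-zero or negative discrimination generally deserve review, since they fail to separate high and low scorers.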
Construction of Rating Scales and Checklists
Have students select an appropriate area and construct either a rating scale or a checklist.
Chapter 1 Educational Testing and Assessment: Context, Issues, and Trends
Exercise 1-A
HISTORY OF TEST-BASED REFORM
LEARNING GOAL: Identifies trends in the use of tests in educational reform efforts.
Directions: Indicate whether the following statements about the use of tests in reform efforts
during the past forty years are true (T) or false (F) by circling the appropriate letter.
T F 1. The current emphasis on accountability in education has resulted in increased
school testing.
T F 2. The rapid growth of minimum-competency testing requirements in the
1970s and early 1980s was stimulated by the widely held belief that high
school graduates often lacked essential skills.
T F 3. There is public support for the use of test results to compare schools
academically.
T F 4. Concerns that accountability leads to teaching to the test have contributed to calls
for increased reliance on performance-based assessments.
T F 5. Content standards specify the minimum score required to pass a test.
LEARNING GOAL: Distinguishes between the purposes and characteristics of content standards
and performance standards.
Directions: List the defining features of content standards and of performance standards.
Distinguish between and describe the primary purposes of these two types of standards.
Content Standards:
Performance Standards:
Note: Answers will vary
Exercise 1-B
PERFORMANCE ASSESSMENTS
LEARNING GOAL: Identifies characteristics of and rationales for the use of performance
assessments.
Directions: Indicate whether measurement specialists would agree (A) or disagree (D) with
each of the following statements concerning performance assessments by circling the appropriate
letter.
A D 1. The belief that testing and assessment shape instruction has led to
increased emphasis on performance assessments.
A D 2. The best way to achieve authentic assessment in the classroom is through
performance assessments.
A D 3. Many proponents of performance assessments accept the idea that “what
you test is what you get.”
A D 4. Tasks requiring extended responses have been the target of most criticisms
of testing and assessment.
A D 5. Anything that can be measured by a performance assessment task could
also be measured by a multiple-choice test.
LEARNING GOAL: Identifies advantages and disadvantages of performance-based
assessments.
Directions: List some of the major advantages and disadvantages of performance-based
assessments.
Advantages:
Disadvantages:
Note: Answers will vary
Exercise 1-C
NATIONAL AND INTERNATIONAL ASSESSMENT
LEARNING GOAL: Identifies characteristics and limitations of national and international
assessments.
Directions: Indicate whether the following statements about national and international
assessment are true (T) or false (F) by circling the appropriate letter.
T F 1. The National Assessment of Educational Progress (NAEP) enables schools to
compare the performance of their students to the nation as a whole.
T F 2. NAEP provides a means of monitoring trends in the achievement of students
over 25 years.
T F 3. In addition to national results, NAEP now provides results for state-by-state
comparisons on a voluntary basis.
T F 4. NAEP collects achievement data for students by both age and grade level.
T F 5. Comparisons of nations based on international assessments are as trustworthy as
comparisons of regions of the country based on NAEP results.
T F 6. Comparability of results in international assessments is ensured by translating
assessments into the languages spoken in different countries.
LEARNING GOAL: Identifies developments at the national level that may influence the role and
nature of testing and assessment in the future.
Directions: Briefly describe actions of the federal government that are likely to influence testing
and assessment in the future.
Note: Answers may vary.
Exercise 1-D
CURRENT TRENDS IN EDUCATIONAL MEASUREMENT
LEARNING GOAL: Identifies factors related to current trends in testing and assessment.
Directions: Indicate whether measurement specialists would agree (A) or disagree (D) with
each of the following statements concerning current trends in testing and assessment by circling
the appropriate letter.
A D 1. Computers are especially useful for adaptive testing.
A D 2. Computer-administered simulations of problems enable the measurement of
complex skills not readily measured by paper-and-pencil tests.
A D 3. Despite concern about the quality of school programs, there has been a demand
for less testing and assessment.
A D 4. Computerized adaptive tests increase efficiency by reducing the number of items
that need to be administered to achieve reliable measurement for a given test
taker.
A D 5. The focus on the consequences of testing and assessment has decreased in recent
years.
A D 6. The computer provides the potential to present simulations that can measure the
processes that students use to solve problems.
LEARNING GOAL: Describes advantages and limitations of expanded uses of computer-based
tests and assessments.
Directions: Briefly describe some of the important advantages and major limitations of expanded
uses of computer-based tests and assessments.
Advantages:
Limitations:
Note: Answers may vary
Exercise 1-E
CONCERNS AND ISSUES IN TESTING AND ASSESSMENT
LEARNING GOAL: Identifies factors related to concerns and issues in testing and assessment.
Directions: Indicate whether test specialists would agree (A) or disagree (D) with each of the
following statements describing concerns and issues in testing and assessment by circling the
appropriate letter.
A D 1. Many of the criticisms of testing are the result of misinterpretation and misuse of
test scores.
A D 2. A common misinterpretation of scores on tests and assessments is to
assume they measure more than they do.
A D 3. Testing can only benefit students.
A D 4. If a particular group of students receives lower scores on a test, it means the test is
biased against members of that group.
A D 5. It is good practice to post scores on standardized tests so that students in a class
can see how their performance compares to that of their peers.
A D 6. Test anxiety may lower the performance of some students.
LEARNING GOAL: Lists the possible effects of students and parents examining school testing
and assessment results.
Directions: List the advantages and disadvantages of the legal requirement that students and
parents must be provided with access to school testing and assessment records.
Advantages:
Disadvantages:
Note: Answers may vary
Answers to Student Exercises
1-A: 1. T   2. T   3. T   4. T   5. F
1-B: 1. A   2. D   3. A   4. D   5. D
1-C: 1. F   2. T   3. T   4. T   5. F   6. F
1-D: 1. A   2. A   3. D   4. A   5. D   6. A
1-E: 1. A   2. A   3. D   4. D   5. D   6. A
Chapter 1
Educational Testing and Assessment: Context, Issues, and Trends
1. Externally mandated testing and assessment programs are often appealing to policy makers
because they
A. are popular with teachers.
B. are written by teachers in the child’s given school or school system.
C. indicate whether a given school or school district is effective.
D. indicate high or low teacher quality.
2. Content standards are intended to specify which of the following?
A. instructional approaches to use in teaching specific content
B. the curriculum for all subjects and grade levels
C. what students are expected to learn in a subject or course
D. lists of curriculum materials that should be used in lessons
3. Accountability programs for educational reform have put pressure on schools to make which
of the following decisions?
A. abolish the use of published tests and assessments for test preparation
B. reduce the number of classroom aides working in schools
C. increase the use of a variety of tests and assessments to prepare students
D. offer financial incentives such as scholarships for high-performing students
4. When externally mandated tests are used to measure current student achievement and
progress, the tests are being used as
A. a barometer
B. a lever
C. a method of formative assessment
D. a process to test teacher efficiency and quality
5. Which of the following best summarizes the findings in the report “A Nation at Risk”?
A. children tended to be tested too much, especially in later grades
B. tests should be administered beginning in the upper elementary or middle school
grades
C. children in the USA scored better than students in most European countries but
lower than most students in Asian countries
D. the quality of American education was mediocre compared with other countries
6. One negative influence of accountability pressures on schools is that they encourage
teachers to
A. show students how to make educated guesses on difficult multiple choice test
questions.
B. put less emphasis on important instructional topics not on the test.
C. stress the importance of test scores on students’ overall academic career.
D. organize into grade level teams in order to co-teach curriculum.
7. Which of the following is a possible danger of the accountability movement for the local
school program?
A. a narrowing of objectives
B. a neglect for basic skills
C. an expansion of the curriculum
D. an overemphasis on performance objectives
8. Which of the following events followed shortly after the publication of “A Nation at Risk”?
A. Many teachers and administrators were fired and/or transferred to other positions.
B. All 50 states introduced some form of educational reform.
C. No Child Left Behind was enacted.
D. One standardized test was adopted for use by all 50 states.
9. Which of the following summarizes the main difference between content standards and
performance standards?
A. Content standards define what will be learned while performance standards define
how well it must be learned.
B. Content standards define how things will be learned while performance standards
define what will be learned.
C. Content standards measure student effort while performance standards measure
the quality of the student performance.
D. Content standards are gender specific while performance standards are specific to
certain minority groups.
10. Computerized testing can increase the efficiency of testing by incorporating
A. adaptive testing procedures.
B. conventional test layout and formats.
C. informal teacher-made tests.
D. more essay questions.
11. Test critics have focused much of their attention on which of the following?
A. how essay tests are administered
B. how math and science tests are scored
C. the use of multiple-choice items
D. printing tests in other languages for nonnative English-speaking students
12. Abolishing all published tests would most likely yield which of the following results?
A. quicker administrative staffing decisions
B. less effective educational decisions
C. a more objective assessment of accountability programs
D. more opportunities for individuals to succeed on merit
13. Misuse of published tests can probably best be prevented by more careful
A. administration.
B. interpretation.
C. scoring.
D. collation.
14. Which of the following would serve as a particularly well-founded criticism of standardized
tests?
A. they are used to evaluate teachers rather than children’s achievement levels
B. they measure only limited characteristics of an individual
C. they require excessive time to administer
D. they result in an overemphasis on complex reasoning skills
15. Critics of externally mandated tests argue that these tests cause anxiety for children. Which
of the following arguments might a proponent of externally mandated tests counter with?
A. Moderate test anxiety can lead to student motivation to learn and do well on tests.
B. Students with test anxiety tend to score well on these tests because they are awarded
extra time.
C. Giving students positive rewards for doing well on tests negates most test anxiety.
D. Test anxiety may be present for older students but is virtually nonexistent for
younger students.
16. Which of the following is of particular concern regarding the interpretation of students’ test
scores?
A. that students were given adequate time to take the test
B. that the test did not contain any open-response questions
C. that the test was administered in the morning
D. that the test results do not lead to stereotyping or labeling students
17. Mr. Johnson has told Billy that he “can do better” while admitting to Monica that she is
probably doing “as well as can be expected.” Which of the following acts is Mr. Johnson
likely guilty of?
A. alienating parents
B. reinforcing a self-fulfilling prophecy
C. relying too much on test results
D. not taking into account that some tests may contain gender bias
18. Ms. Smith is using an assessment to gauge how well James is learning day-to-day class
material and to devise educational programs designed to help students learn the
classroom material better. Which of the following types of assessment would be most
beneficial to Ms. Smith?
A. externally mandated
B. summative
C. formative
D. arbitrary
19. Briefly explain, in 4–6 sentences, the mandates of No Child Left Behind and how they relate
to the inclusion or exclusion of children with disabilities in testing.
Chapter 1: Answer Key
1. C
2. C
3. C
4. A
5. D
6. B
7. A
8. B
9. A
10. A
11. C
12. B
13. B
14. B
15. A
16. D
17. B
18. C
19. According to NCLB, all students, regardless of disability, must demonstrate proficiency in
mastering learning goals. Only students with the most severe disabilities may obtain a waiver
from this requirement, and such waivers must be granted at the district level. However, in
achieving state proficiency standards, accommodations must be provided for students with
disabilities. The requirement for such accommodations states that a student’s disability
cannot be an impediment to, or the cause of, his or her inability to demonstrate competency
in learning goals. An example would be allowing a student with learning disabilities extra
time to take state-mandated tests.
Chapter 2 The Role of Measurement and Assessment in Teaching
Exercise 2-A
PRINCIPLES AND PROCEDURES OF CLASSROOM ASSESSMENT
LEARNING GOAL: Distinguishes between sound and unsound principles and procedures.
Directions: Indicate whether each of the following statements represents a sound (S) or
unsound (U) principle or procedure of classroom assessment by circling the appropriate letter to
the left of the statement.
S U 1. The first step in measuring classroom learning is to decide on the type of
test to use.
S U 2. Classroom assessment should be based on objective data only.
S U 3. The type of classroom assessment used should be determined by the
performance to be measured.
S U 4. Effective classroom assessment requires the use of a variety of assessment
techniques.
S U 5. Assessment techniques should replace teacher observation and judgment.
S U 6. Error of measurement must always be considered during the interpretation
of assessment results.
LEARNING GOAL: States the meaning of test, measurement, and assessment.
Directions: In your own words, state the meaning of each of the following terms.
Test:
Measurement:
Assessment:
Note: Answers will vary.
Exercise 2-B
CLASSROOM ASSESSMENT AND
THE INSTRUCTIONAL PROCESS
LEARNING GOAL: Identifies how classroom assessment functions in the instructional process.
Directions: Indicate whether the textbook authors would agree (A) or disagree (D) with each of
the following statements by circling the letter to the left of the statement.
A D 1. The main purpose of classroom assessment is to improve student learning.
A D 2. The first step in both teaching and assessment is to determine the intended
student learning outcomes.
A D 3. Classroom assessments should not be given until the end of instruction.
A D 4. Instructional objectives should aid in selecting the types of assessment
instruments to use.
A D 5. Assessment results should be used primarily for assigning grades.
LEARNING GOAL: Describes the role of instructional objectives.
Directions: Describe the role of instructional objectives in classroom assessment.
Note: Answers will vary.
Exercise 2-C
MEANING OF PLACEMENT, FORMATIVE, DIAGNOSTIC, AND SUMMATIVE
ASSESSMENT
LEARNING GOAL: Classifies examples of classroom assessment procedures.
Directions: For each of the following descriptions, indicate which type of assessment is
represented by circling the appropriate letter using the following key.
KEY P = Placement F = Formative
D = Diagnostic S = Summative
P F D S 1. An achievement test is used to certify student mastery.
P F D S 2. Students are given a ten-item test to determine their learning progress.
P F D S 3. A teacher observes the process used by a student solving arithmetic
problems.
P F D S 4. Algebra students take an arithmetic test on the first day of class.
P F D S 5. Course grades are assigned.
P F D S 6. An assessment is given at the beginning of a new unit.
LEARNING GOAL: States examples of types of assessment procedures.
Directions: For each of the following types of assessment state one specific example that
illustrates its use in some subject area.
Placement:
Formative:
Diagnostic:
Summative:
Note: Answers will vary.
Exercise 2-D
MEANING OF CRITERION-REFERENCED
AND NORM-REFERENCED INTERPRETATIONS
LEARNING GOAL: Distinguishes between examples of each type of interpretation.
Directions: Indicate whether each of the following statements represents a criterion-referenced
(C) interpretation or a norm-referenced (N) interpretation by circling the appropriate letter.
C N 1. Erik obtained the highest score on the reading test.
C N 2. Carlos can identify all of the parts of a sentence.
C N 3. Connie can type 60 words per minute.
C N 4. John earned an average score on an arithmetic test.
C N 5. Tonia defined only 20 percent of the science terms.
C N 6. Maria set up her laboratory equipment faster than anyone else.
LEARNING GOAL: Writes statements representing each type of interpretation.
Directions: Write three statements that represent criterion-referenced interpretations and three
statements that represent norm-referenced interpretations.
Criterion-referenced interpretations:
Norm-referenced interpretations:
Note: Answers will vary.
Exercise 2-E
MEANING OF CONTRASTING TEST TYPES
LEARNING GOAL: Distinguishes between contrasting test types.
Directions: For each of the following test descriptions indicate which test type is represented by
circling the letter to the left of each description using the following key.
KEY A = Informal C = Mastery E = Speed G = Objective I = Verbal
B = Standardized D = Survey F = Power H = Subjective J = Performance
A B 1. A test using national norms for interpretation.
C D 2. A test used to measure many skills with just a few items for each skill.
E F 3. A test with many items, most relatively simple.
G H 4. A test on which different scorers obtain the same results.
I J 5. A test requiring students to set up laboratory equipment.
LEARNING GOAL: Describes a test representing a given test type.
Directions: In the spaces below, write a brief description of a specific test representing each of
the test types.
Survey test:
Mastery test:
Power test:
Objective test:
Note: Answers will vary.
Answers to Student Exercises
2-A: 1. U   2. U   3. S   4. S   5. U   6. S
2-B: 1. A   2. A   3. D   4. A   5. D
2-C: 1. S   2. F   3. D   4. P   5. S   6. P
2-D: 1. N   2. C   3. C   4. N   5. C   6. N
2-E: 1. B   2. D   3. E   4. G   5. J
Chapter 2
The Role of Measurement and Assessment in Teaching
1. Classroom assessment of students should primarily focus on which of the following?
A. behavior
B. grading
C. learning
D. feedback
2. Which of the following terms is the most limited?
A. Assessment
B. Measurement
C. Testing
D. Quantitative description
3. Which of the following forms of assessment is the most effective way to determine whether
students are making satisfactory progress?
A. diagnostic
B. formative
C. norm-referenced
D. summative
4. Measurement always involves which of the following?
A. numbers
B. testing
C. performance
D. value judgments
5. When teachers use tests and assessments in the classroom, the highest priority should be
given to which of the following factors?
A. assigning course grades
B. improving instruction
C. maintaining adequate school records
D. reporting student progress to parents
6. Which of the following is one of the most important issues to consider when selecting an
assessment technique?
A. accuracy
B. convenience
C. objectivity
D. relevance
7. The first step in measuring student achievement is to determine the
A. date of testing
B. difficulty of the test
C. current student averages
D. method of assessment
8. Measures of maximum performance most likely would include which of the following?
A. mid-term tests
B. attitude scales
C. student journals
D. personality measures
9. Which of the following would be evaluated by using a measure of typical performance?
A. Arithmetic computation
B. Arithmetic problem solving
C. Writing a friendly letter
D. Reading comprehension
10. Which of the following methods of assessment would most likely be given at the beginning
of instruction?
A. Contextual
B. Formative
C. Diagnostic
D. Summative
11. Formative assessment is used primarily for which of the following purposes?
A. grading students
B. monitoring student progress
C. placing students in groups
D. selecting students for awards
12. Summative assessments are most appropriate for which of the following?
A. determining the extent to which instructional goals have been achieved
B. diagnosing student strengths and weaknesses
C. measuring entry learning skills in students from various backgrounds
D. measuring progress during learning
13. Assessments may be classified as norm-referenced and criterion-referenced on the basis of
the types of
A. directions used.
B. interpretations to be made.
C. learning outcomes measured.
D. test items used.
14. Which of the following types of assessments would most likely be norm-referenced?
A. Diagnostic
B. Mastery goal attainment
C. Readiness
D. College entrance exam
15. Which of the following factors is likely to differ when constructing norm-referenced and
criterion-referenced tests?
A. Arrangement of items
B. Item difficulty
C. Types of items
D. Relevance to objectives
16. Which of the following is most likely to be used in a criterion-referenced interpretation?
A. Average score in a group
B. Highest score in a group
C. Percentage correct score out of 20
D. Percentile score of 80
17. Norm-referenced and criterion-referenced tests are best viewed as
A. standardized tests.
B. similar tests with similar intents.
C. two different types of tests.
D. valid measures of student learning.
18. Which of the following best represents an untimed test that has items arranged in
increasing order of difficulty?
A. Diagnostic test.
B. Standardized test.
C. Performance assessment.
D. Power test.
19. Which of the following tests is likely to measure many skills with only a few items for each
skill?
A. Diagnostic test
B. Mastery test
C. Performance assessment
D. Survey test
20. A test is referred to as objective when it meets which of the following criteria?
A. Different scorers obtain the same results
B. It is constructed using a variety of question types
C. It measures a clearly defined set of standards
D. There is a standard procedure for interpreting the results
21. Which of the following would best describe most teacher-made tests?
A. Informal, power.
B. Formal, speed.
C. Standardized, power.
D. Standardized, speed.
22. Which of the following represents a norm-referenced interpretation?
A. Henry wrote over 500 words for each of his essay questions.
B. Jane defined 70 percent of the items correctly.
C. Bruce’s score was near the top of the class.
D. Emily completed 30 of the 40 math problems correctly.
23. Mr. Rich is a new teacher and is concerned about his class understanding the material being
taught. Which of the following types of assessment would best monitor his instruction?
A. A pretest at the beginning of the class.
B. Frequent class quizzes.
C. Standardized achievement tests.
D. Aptitude tests.
Chapter 2: Answer Key
1. C
2. C
3. B
4. A
5. B
6. D
7. D
8. A
9. C
10. C
11. B
12. A
13. B
14. D
15. B
16. C
17. C
18. D
19. D
20. A
21. A
22. C
23. B
Chapter 3 Instructional Goals and Objectives: Foundation for Assessment
Exercise 3-A
INSTRUCTIONAL OBJECTIVES AS LEARNING GOALS
LEARNING GOAL: Distinguishes between statements of learning process and learning
outcomes.
Directions: Indicate whether each of the following statements describes a learning process (P) or a
learning outcome (O).
P O 1. Learns the meaning of terms.
P O 2. Develops a more favorable attitude toward reading.
P O 3. Demonstrates concern for the environment.
P O 4. Locates a position on a map.
P O 5. Practices interpreting charts and graphs.
P O 6. Describes the value of good study habits.
LEARNING GOAL: Writes well-stated outcomes.
Directions: In the spaces below (1) rewrite as learning outcomes each of the statements at the
top of the page that were classified as learning processes, and (2) write three general statements
of learning outcomes for a course or subject area.
1. Learning outcomes rewritten from process statements.
2. Three general learning outcomes for a course.
Note: Answers will vary.
Exercise 3-B
DOMAINS OF THE TAXONOMY (COGNITIVE, AFFECTIVE, PSYCHOMOTOR)
LEARNING GOAL: Identifies examples of instructional objectives belonging to each domain of
the taxonomy.
Directions: Indicate the taxonomy domain to which each of the following general instructional
objectives belongs by circling the appropriate letter.
KEY A = Affective
C = Cognitive
P = Psychomotor
A C P 1. Understands basic concepts.
A C P 2. Appreciates the contributions of scientists.
A C P 3. Evaluates a book.
A C P 4. Operates a slide projector.
A C P 5. Writes smoothly and legibly.
A C P 6. Demonstrates an interest in science.
LEARNING GOAL: Writes general instructional objectives that fit each taxonomy domain.
Directions: Write two general instructional objectives for each of the following domains of the
taxonomy.
Cognitive objectives:
Affective objectives:
Psychomotor objectives:
Note: Answers will vary.
Exercise 3-C
SELECTING APPROPRIATE INSTRUCTIONAL OBJECTIVES
LEARNING GOAL: Distinguishes between sound and unsound criteria for selecting
instructional objectives.
Directions: Indicate whether each of the following statements is a sound (S) or unsound (U)
criterion for selecting instructional objectives, by circling the appropriate letter.
S U 1. Instructional objectives should be limited to those learning outcomes that can be
measured objectively.
S U 2. Instructional objectives should be in alignment with the goals of the school.
S U 3. Instructional objectives should be concerned primarily with knowledge of facts.
S U 4. Instructional objectives should be selected in terms of their feasibility.
S U 5. Instructional objectives should specify the intended learning outcomes.
LEARNING GOAL: Describes the importance of selecting appropriate instructional objectives.
Directions: In your own words, describe the importance of carefully selecting instructional
objectives.
Note: Answers will vary.
Exercise 3-D
STATING GENERAL INSTRUCTIONAL OBJECTIVES
LEARNING GOAL: Distinguishes between well-stated and poorly stated general instructional
objectives.
Directions: For each of the following pairs of objectives, indicate the one that is best stated as a
general instructional objective by circling the letter of your answer (A or B).
1. A Reads supplementary references.
B Sees the importance of reading.
2. A Is aware of the value of money.
B Comprehends oral directions.
3. A Shows students how to make accurate computations.
B Judges the adequacy of an experiment.
4. A Demonstrates proficiency in laboratory skills.
B Gains minimum proficiency in mathematics.
5. A Studies weather maps.
B Constructs weather maps.
6. A Is familiar with the use of the library.
B Locates references in the library.
LEARNING GOAL: Rewrites poorly stated objectives as well-stated general instructional
objectives.
Directions: Rewrite as well-stated general instructional objectives each of the six poorly stated
objectives at the top of the page.
1.
2.
3.
4.
5.
6.
Note: Answers will vary.
Exercise 3-E
STATING SPECIFIC LEARNING OUTCOMES
LEARNING GOAL: Distinguishes between performance and non-performance statements of
specific learning outcomes.
Directions: For each of the following pairs of specific learning outcomes, indicate the one that is
stated in performance terms.
1. A States the principle.
B Realizes the value of the principle.
2. A Increases ability to read.
B Selects the main thought in a passage.
3. A Learns facts about current events.
B Relates facts in explaining current events.
4. A Distinguishes facts from opinions.
B Is aware that opinions should not be stated as facts.
5. A Grasps the meaning of terms when used in context.
B Defines the terms in his or her own words.
6. A Identifies the value of a given point on a graph.
B Determines the trend shown in a graph.
LEARNING GOAL: States specific learning outcomes in performance terms.
Directions: Write three specific learning outcomes, in performance terms, for each of the
following general instructional objectives.
Knows basic terms.
Demonstrates good study habits.
Interprets a weather map.
Note: Answers will vary.
Answers to Student Exercises
3-A 3-B 3-C 3-D 3-E
1. P 1. C 1. U 1. A 1. A
2. P 2. A 2. S 2. B 2. B
3. O 3. C 3. U 3. B 3. B
4. O 4. P 4. S 4. A 4. A
5. P 5. P 5. S 5. B 5. B
6. O 6. A 6. B 6. B
Chapter 3
Instructional Goals and Objectives: Foundation for Assessment
1. For measurement purposes, instructional goals and objectives should be stated in terms of the
A. instructional process.
B. learning process.
C. subject-matter content to be covered.
D. types of learning outcomes expected.
2. Content standards, such as those developed by the National Council of Teachers of
Mathematics, provide which of the following for teachers?
A. a general framework for developing curriculum specifications
B. comprehensive instructional materials for classroom use
C. detailed curriculum specifications
D. specifications of standards of performance for students
3. Which of the following is the best example of student learning?
A. Mary increases her speed in reading.
B. John interprets weather maps to predict next week’s weather.
C. Sara memorizes her weekly spelling words.
D. Charles practices playing his guitar for music class.
4. Recent research on learning and cognitive development has led to an increased emphasis on
which of the following?
A. basic skills instruction
B. learning hierarchies of sequential skills of increasing complexity
C. use of drill-and-practice learning activities
D. students constructing meaning from problem solving
5. When specifying learning outcomes to be used in the development of classroom tests and
assessments, teachers need to guard against an overemphasis on
A. application of principles.
B. higher level thinking skills.
C. easy to measure factual knowledge.
D. problem solving skills.
6. Which of the following factors should be considered first by teachers when assigning weight
to each learning outcome in an assessment?
A. the complexity of the objective
B. the emphasis given on popular standardized tests
C. the instructional time devoted to the topic
D. the number of times it will be assessed
7. Which of the following questions would best satisfy the criteria for selecting instructional
objectives?
A. Can they be quickly and easily assessed?
B. Are they representative of all disciplines?
C. Can they all be assessed on an essay exam?
D. Are they attainable by the students to be taught?
8. Which of the following would indicate the lowest level of learning for a student?
A. applies a principle to a specific situation.
B. explains a principle in his or her own words.
C. gives a textbook definition of a principle.
D. states an example of a principle.
9. Which of the following indicates the most specific learning outcome?
A. educational goal
B. developmental objective
C. standardized test goal
D. behaviorally stated objective
10. Which of the following is an example of an “application” behavior?
A. solving a math problem
B. reciting a poem
C. underlining a sentence
D. enjoying classical music
11. Which of the following is best stated as a general instructional objective?
A. applies learning, as needed
B. demonstrates how to use laboratory equipment
C. gains skill in reading
D. possesses ability to use reference materials
12. Which of the following is best stated as a specific learning outcome?
A. can tell when conclusions lack validity
B. develops ability to evaluate conclusions
C. knows when conclusions are valid
D. states valid conclusions
13. Which of the following is an example of a performance term?
A. appreciates
B. outlines
C. realizes
D. thinks
14. Which of the following is stated in performance terms?
A. explains the value of a hypothesis
B. increases his or her ability to recognize hypotheses
C. realizes the importance of testing hypotheses
D. sees the difference between a fact and a hypothesis
15. Which of the following best describes how teachers should address unanticipated learning
outcomes?
A. ignore them
B. include them in instruction as they occur
C. note them for future use, but do not assess them
D. document them as evidence of poor planning for the current term
16. The three major domains of the Taxonomy of Educational Objectives are
A. affective, cognitive, psychomotor.
B. attitude, knowledge, performance.
C. knowledge, understanding, application.
D. competency, attitudes, skills.
17. Which of the following provides the best example of a student showing analysis of material?
A. circles an answer
B. outlines a chapter
C. defines vocabulary terms
D. recites multiplication tables
18–25. Following is a list of statements that a teacher compiled to clarify what is meant by
understanding principles. If the statement is properly stated in performance terms, circle
P. If it is not properly stated in performance terms, circle N.
Key P = Performance.
N = Not performance.
P N 18. Makes a prediction using the principle studied.
P N 19. Describes situations in which the principle is applicable.
P N 20. Realizes the essential features of the principle.
P N 21. Is familiar with the uses of the principle.
P N 22. Explains the principle in his or her own words.
P N 23. States tenable hypotheses based on the principle.
P N 24. Identifies misapplications of the principle.
P N 25. Develops complete understanding of the principle.
26. Which of the following is considered a behavioral verb?
A. think
B. choose
C. appreciate
D. understand
27–29. Match each of the following behaviors with the correct cognitive taxonomic level.
K = Knowledge
U = Understanding
Ap = Application
An = Analysis
S = Synthesis
E = Evaluation
27. Critique an artistic product based on sound objective principles.
28. Outline the major themes of a play.
29. Recite a poem.
30–32. Identify the correct domain from the Taxonomy of Educational Objectives.
C = Cognitive
A = Affective
P = Psychomotor
30. Choosing to go to an opera instead of a rock concert.
31. Hitting a tennis ball over the net.
32. Correctly solving an algebra problem.
33. List and describe the three domains of the Taxonomy of Educational Objectives. Give an
example of the behavior that might be appropriately included in each domain.
34. What is a behavioral instructional objective? Why is it important that instructional
objectives include behavioral rather than nonbehavioral verbs? Give three examples of
behavioral verbs and three examples of nonbehavioral verbs.
Chapter 3: Answer Key
1. D
2. D
3. B
4. D
5. C
6. C
7. D
8. C
9. D
10. A
11. B
12. D
13. B
14. A
15. B
16. A
17. B
18. P
19. P
20. N
21. N
22. P
23. P
24. P
25. N
26. B
27. E
28. An
29. K
30. A
31. P
32. C
33. The three domains of the Taxonomy of Educational Objectives are Cognitive, Affective, and
Psychomotor. The cognitive domain includes factual and intellectual knowledge similar to
most content taught in school. The affective domain includes areas such as values, interests
and motivation. The psychomotor domain includes skills related to fine motor and gross
motor movements and skills. An example of a skill under the cognitive domain would be
correctly solving a set of math problems. A skill included under the affective domain would
be an interest in stamp collecting as a hobby. An example of a psychomotor skill would be
riding a bicycle.
34. A behavioral instructional objective is one in which the student must perform an observable,
measurable task. Such a task is measurable if two or more observers can agree that the
behavior did or did not take place. If the goal or required task is not stated behaviorally, it is
difficult to judge whether the content has been mastered and the objective achieved.
Examples of behavioral verbs are list, solve, and outline. Examples of nonbehavioral verbs
are know, understand, and appreciate.
Chapter 4 Validity
Exercise 4-A
VALIDITY AND RELATED CONCEPTS
LEARNING GOAL: Identifies the nature of validity.
Directions: Indicate which of the following statements concerning validity are correct (C) and
which are incorrect (I) by circling the appropriate letter.
C I 1. A test is by definition valid if it is consistent.
C I 2. Validity is a matter of degree (e.g., high, low).
C I 3. Validity is a general quality that applies to various uses of assessment results.
C I 4. Validity is a unitary concept.
C I 5. An objective test is by definition valid.
C I 6. Validity may be described by the correlation of assessment scores with a
criterion measure.
LEARNING GOAL: Distinguishes among validity, reliability, and usability.
Directions: Briefly describe the key feature of each concept.
Validity:
Reliability:
Usability:
Note: Answers will vary
Exercise 4-B
MAJOR VALIDITY CONSIDERATIONS
LEARNING GOAL: Identifies characteristics of the major validity considerations.
Directions: For each of the following statements, indicate which major validity consideration is
being described by circling the appropriate letter, using the following key.
KEY A = content
B = construct
C = test-criterion relationships
D = consequences
A B C D 1. Can be expressed by an expectancy table.
A B C D 2. Infers a trait by observable behavior.
A B C D 3. Evaluates what happens when assessment results are used.
A B C D 4. Its correlation can range from –1.00 to +1.00.
A B C D 5. Emphasizes the representativeness of the sample of tasks.
A B C D 6. Involves use of a table of specifications.
LEARNING GOAL: Writes an example that illustrates each of the four major validity
considerations.
Directions: Briefly describe an example of evidence that would be relevant for each major
consideration.
Content
Construct
Test-Criterion Relationships:
Consequences:
Note: Answers may vary.
Exercise 4-C
MEANING OF CORRELATION
LEARNING GOAL: Interprets correlation coefficients and the effects of various conditions on
them.
Directions: In each of the following pairs of statements, select the statement that indicates the
greater degree of relationship and circle the letter of your answer (A or B). Assume other
things are equal.
1. A A correlation coefficient of .60.
B A correlation coefficient of .10.
2. A A predictive validity coefficient.
B A concurrent validity coefficient.
3. A A predictive validity coefficient of .70.
B A predictive validity coefficient of .80.
4. A A correlation between test scores and a criterion measure obtained one week later.
B A correlation between test scores and a criterion measure obtained one year later.
5. A The concurrent validity of a test for the academically gifted standardized on
academically gifted students.
B The concurrent validity of a test for the academically gifted standardized on
students taken as a cross section of any given school.
LEARNING GOAL: Lists factors influencing a correlation coefficient.
Directions: List three factors that will cause correlation coefficients to be small.
Note: Answers will vary.
Exercise 4-D
EXPECTANCY TABLE
LEARNING GOAL: Interprets an expectancy table.
Directions: In the expectancy table below, the row for each score level shows the percentage of
students who earned a grade of A, B, C, D, or F. Review the table and answer the questions
following it (“Chances” means chances in 100).
Percentage of Students
Score F D C B A Total
__________________________________________
115–134 0 12 20 26 42 100
95–114 10 18 18 24 30 100
75–94 32 26 18 18 6 100
__________________________________________
_______ 1. If Sara had a score of 120, what are her chances of obtaining a grade of A?
_______ 2. If Bob had a score of 113, what are his chances of obtaining a failing grade?
_______ 3. If Tanya had a score of 90, what are her chances of obtaining a grade of C or
higher?
_______ 4. How many times greater are Sara's chances than Tanya's of obtaining a grade of A?
_______ 5. What score levels provide the best prediction?
_______ 6. What score levels provide the weakest prediction?
LEARNING GOAL: Describes the advantages, limitations, and cautions in using expectancy
tables.
Directions: In the appropriate spaces below, describe the advantages, limitations, and cautions in
using expectancy tables.
Advantages:
Limitations:
Cautions in Interpreting:
Note: Answers will vary.
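Instructors who wish to verify the answers to items 1 through 4 can sketch the table lookups in Python. The score bands and percentages are copied from the table above; the `chances` helper is illustrative, not part of the text:

```python
# Expectancy table from above: each score band maps grade -> chances in 100.
EXPECTANCY = {
    (115, 134): {"F": 0, "D": 12, "C": 20, "B": 26, "A": 42},
    (95, 114):  {"F": 10, "D": 18, "C": 18, "B": 24, "A": 30},
    (75, 94):   {"F": 32, "D": 26, "C": 18, "B": 18, "A": 6},
}

def chances(score, grades):
    """Chances in 100 of earning any one of the listed grades at this score."""
    for (low, high), row in EXPECTANCY.items():
        if low <= score <= high:
            return sum(row[g] for g in grades)
    raise ValueError("score falls outside the table")

print(chances(120, ["A"]))           # Sara: 42
print(chances(113, ["F"]))           # Bob: 10
print(chances(90, ["C", "B", "A"]))  # Tanya: 42
```

Item 4 follows from the same lookups: Sara's 42 chances in 100 of an A are seven times Tanya's 6.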
Exercise 4-E
FACTORS AND CONDITIONS INFLUENCING VALIDITY
LEARNING GOAL: Identifies the influence of assessment practices on validity.
Directions: Indicate what influence each of the following assessment practices is most likely to
have on validity by circling the appropriate letter to the left of each statement, using the
following key.
KEY R = Raise validity; L = Lower validity.
R L 1. Increase item difficulty by using more complex sentence structure.
R L 2. Increase the number of items measuring each specific skill from five to ten.
R L 3. Replace multiple-choice items with short-answer items for measuring the ability
to define terms.
R L 4. Replace multiple-choice items by laboratory performance tasks for measuring
ability to conduct experiments.
R L 5. Use selection-type items instead of supply-type items to measure spelling ability.
R L 6. Use an essay test to measure factual knowledge of historical events.
LEARNING GOAL: Lists factors that lower assessment validity.
Directions: In the space provided below, list as many factors as you can think of that might
lower the validity of a classroom assessment.
Note: Answers will vary.
Answers to Student Exercises
4-A 4-B 4-C 4-D 4-E
1. I 1. C 1. A 1. 42 1. L
2. C 2. B 2. B 2. 10 2. R
3. C 3. D 3. B 3. 42 3. R
4. I 4. C 4. A 4. 7 4. R
5. I 5. A 5. A 5. 115–134 5. L
6. C 6. A 6. 95–114 6. L
Chapter 4
Validity
1. The term validity, as used in testing and assessment, refers to which of the following?
A. interpretation of the results
B. items or tasks in the test or assessment
C. sets of scores
D. setting learning standards
2. The current concept of validity is best described as
A. a collection of test items.
B. test scores over time.
C. a statistical concept.
D. a unitary concept.
3. All the following relationships between validity and reliability are possible EXCEPT which
of the following?
A. high validity and low reliability.
B. high validity and high reliability.
C. low validity and low reliability.
D. low validity and high reliability.
4. Which of the following data sources would a teacher likely examine to obtain evidence of
validity based on content considerations?
A. frequency distribution
B. correlation coefficient size
C. description of criterion used
D. table of specifications
5. If a test is valid for one group of individuals it most likely means that it
A. is valid for all groups of individuals
B. may not be valid for other groups of individuals
C. by definition possesses strong content validity
D. holds strong construct validity
6. If a teacher wants to generalize from the sample of items in a test to the larger domain of
achievement that the sample represents, the teacher is concerned with
A. a concurrent validation study.
B. content considerations.
C. criterion-related evidence of validity.
D. evidence of face validity.
7. When evaluating a standardized achievement test, the most important validity consideration
is the
A. construct claim made by the publisher.
B. content covered by the test.
C. variety of questions offered on the test.
D. predictive relationship with a criterion.
8. Which of the following would likely be a validity consideration inferred from observable
behavior?
A. Construct
B. Content
C. Criterion
D. Consequence
9. Criterion-related validity considerations typically include which of the following?
A. correlations
B. cut-off scores
C. psychological traits
D. construct variables
10. Which of the following is an example of concurrent validity?
A. two sets of behaviors occurring at once
B. updating classroom behaviors
C. known high science achievers scoring high on a biology achievement test
D. the setting of a standard of performance that is expected to be reached
11. Criterion-related evidence of validity can best be obtained by examining which of the
following?
A. reliability coefficient
B. expectancy table
C. table of specifications
D. test sample
12. Interpreting a student's chances of success in college can be most effectively done through
the use of which of the following?
A. construct validity
B. reliability
C. expectancy tables
D. test blueprints
13. Which of the following is an example of criterion-related validity?
A. permanence
B. construct
C. predictive
D. content
Below are the names of four major considerations in the evaluation of the validity of particular
interpretations and uses of assessment results. For each statement indicate which consideration is
of primary importance and indicate your answer by the appropriate letter (A = Content, B =
Criterion relationships, C = Construct, D = Consequences).
A B C D 14. A test to substitute for a more complex measure.
A B C D 15. A test of science principles.
A B C D 16. An assessment of musical abilities.
A B C D 17. A test of school achievement.
A B C D 18. A proposed math test correlates with a highly valid math test.
A B C D 19. Test results used to place children into reading groups.
A B C D 20. A test to select college students.
A B C D 21. A test used to determine grade-to-grade promotion.
Indicate what influence each of the following assessment practices is most likely to have on
validity by circling the appropriate letter (R = raise validity, L = lower validity).
R L 22. Increasing the number of test items used.
R L 23. Including irrelevant difficulty in test items.
R L 24. Changing the administration rules on a standardized test.
R L 25. Increasing the reading level of test questions.
R L 26. Using a variety of assessment procedures.
R L 27. Telling students how extended responses will be scored.
R L 28. Placing the most difficult items at the beginning of the test.
29. Define each of the four types of validity.
30. What is a construct? Give an example. How might one go about assessing the construct
validity of a test?
31. What is correlation? Give an example. What is the statistic by which it is expressed? What
are the outside ranges of this statistic?
Chapter 4: Answer Key
1. A
2. D
3. A
4. D
5. B
6. B
7. B
8. A
9. A
10. C
11. B
12. C
13. C
14. B
15. A
16. C
17. A
18. B
19. D
20. C
21. A
22. R
23. L
24. L
25. L
26. R
27. R
28. L
29. The four types of validity described in the text are content, construct, criterion-related, and
consequence. Content validity refers to the idea that a test should assess a representative
sample of all of the content presented to students. Construct validity refers to the idea that a
test measures a hypothetical concept or trait that is inferred from observable behavior.
Criterion-related validity refers to the idea that the test adequately measures the standard or
criterion of skills that it is designed to measure. Consequence validity refers to the
adequacy of the decisions and interpretations made on the basis of test results.
30. Construct validity refers to the testing of a hypothetical trait or set of traits that a person
exhibiting a set of observable behaviors is thought to possess. Examples of constructs include
giftedness, self-esteem and reading comprehension skill. Construct validity is usually
assessed by giving the test in question to a group of individuals who are believed by experts
to hold high levels of that concept. For example, a test of giftedness might be given to a
group of individuals judged to be highly gifted. They should score high on that test in order
to demonstrate the test’s construct validity.
31. Correlation is the degree of relatedness between two variables or events. That is, two events
are correlated when a change in one variable or event is accompanied by an expected change
in the second variable or event. An example of two events being correlated is that as people
exercise more they tend to drink more fluids. Correlation is measured by the Pearson
product-moment correlation coefficient (r). The outside limits of this statistic are –1.00 to 1.00.
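The coefficient (r) discussed in the answer above can be computed directly from paired scores. A minimal sketch; the data lists are hypothetical:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A perfect positive linear relationship gives r = 1.0;
# reversing one list gives r = -1.0.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 4))  # 1.0
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 4))  # -1.0
```

Any computed value falls within the outside limits of –1.00 and +1.00 noted above.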
Chapter 5 Reliability and Other Desired Characteristics
Exercise 5-A
COMPARISON OF VALIDITY AND RELIABILITY
LEARNING GOAL: Identifies similarities and differences between validity and reliability.
Directions: Indicate whether each of the following statements is characteristic of validity (V),
reliability (R), or both (B), by circling the appropriate letter to the left of the statement.
V R B 1. Can be expressed by an expectancy table or regression equation.
V R B 2. Refers to the consistency of a measurement.
V R B 3. Is often based on a comparison with an external criterion.
V R B 4. May be used to predict future behaviors.
V R B 5. Compares performance on two halves of an assessment.
V R B 6. Contributes to more effective classroom teaching.
LEARNING GOAL: Explains the relationship between validity and reliability.
Directions: In the appropriate spaces below, briefly explain each of the following statements.
1. If assessment results are highly valid, they will also be highly reliable.
2. If assessment results are highly reliable, they may or may not be valid.
3. In selecting an assessment, validity has priority over reliability.
Note: Answers will vary.
Exercise 5-B
METHODS FOR DETERMINING RELIABILITY
LEARNING GOAL: Distinguishes among the methods for determining reliability.
Directions: For each of the following statements, indicate which method of determining
reliability is being described by circling the appropriate letter. Use the following key.
KEY: A = Test-retest, same form; B = Equivalent form;
C = Equivalent form, test-retest; D = Split half
A B C D 1. Provides an inflated reliability coefficient for a speeded test.
A B C D 2. Would probably be wise to use if the same test is to be administered twice
to the same students.
A B C D 3. Pretest, posttest.
A B C D 4. Do two versions of my test measure identical content?
A B C D 5. Correlation coefficient must be adjusted with the Spearman-Brown
formula.
A B C D 6. Student scores should be consistent when they are given the same test
twice.
LEARNING GOAL: Summarizes the procedure for obtaining various types of reliability
coefficients.
Directions: Briefly describe the procedure for obtaining each type of reliability coefficient.
Test-retest:
Equivalent forms:
Split-half:
Interrater agreement:
Note: Answers will vary.
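The Spearman-Brown formula mentioned above projects full-length reliability from the correlation between two half-tests. A minimal sketch of the computation; the sample half-test correlation of .60 is hypothetical:

```python
def spearman_brown(r_half, factor=2):
    """Estimate reliability when test length is multiplied by `factor`.

    With factor=2 this is the split-half correction: it projects
    full-test reliability from the correlation between the two halves.
    """
    return factor * r_half / (1 + (factor - 1) * r_half)

# A half-test correlation of .60 projects to a full-test
# reliability estimate of .75.
print(round(spearman_brown(0.60), 2))  # 0.75
```

The corrected value is always at least as large as the half-test correlation, which is why the split-half coefficient must be adjusted rather than reported directly.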
Exercise 5-C
RELATING RELIABILITY TO THE USE OF ASSESSMENT RESULTS
LEARNING GOAL: Selects the reliability method that is most relevant to a particular use of
assessment results.
Directions: For each of the following objectives, select the reliability method that is most
relevant by circling the appropriate letter using the following key.
KEY: T = Test-retest, E = Equivalent form, S = Split-half, I = Interrater consistency.
T E S I 1. Determining whether test scores on school records are still dependable.
T E S I 2. Selecting an achievement test to measure growth over one school year
(pre-test, post-test)
T E S I 3. Do two versions of my test measure the same content?
T E S I 4. Evaluating the adequacy of judgmental scoring of performances on
a complex task.
T E S I 5. Seeking support for the adequacy of the sample of test items.
T E S I 6. Determining whether an informal classroom assessment has
internal consistency.
LEARNING GOAL: Justifies the selection of a reliability method for a particular test use.
Directions: For each of statements 1 through 6 above, write a sentence or two to justify why you
think the selected reliability method would provide the most relevant information for that
particular use.
1.
2.
3.
4.
5.
6.
Note: Answers will vary
Exercise 5-D
RELIABILITY COEFFICIENT AND STANDARD ERROR OF MEASUREMENT
LEARNING GOAL: Identifies the similarities and differences between the two basic methods of
expressing reliability.
Directions: Indicate whether each of the following statements is more characteristic of the
reliability coefficient (R), the standard error of measurement (E), or both (B) by circling the
appropriate letter to the left of the statement.
R E B 1. Indicates the degree to which a set of scores contains error.
R E B 2. Is high when the range of scores is low.
R E B 3. Cannot be computed without the other.
R E B 4. Useful in selecting a test for a particular grade.
R E B 5. Increases as the spread of scores increases.
R E B 6. Would be zero if the test were perfectly reliable.
LEARNING GOAL: Describes the use of standard error in interpreting test scores.
Directions: In the appropriate spaces below describe how confidence bands (error bands) are
used to interpret each of the following.
1. An individual test score.
2. The difference between two test scores.
Note: Answers will vary
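The confidence bands described above are built from the standard error of measurement, SEM = SD × √(1 − reliability). A minimal sketch; the score, standard deviation, and reliability values are hypothetical:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement for a set of test scores."""
    return sd * math.sqrt(1 - reliability)

def confidence_band(score, sd, reliability):
    """Approximate 68% band: the observed score plus or minus one SEM."""
    error = sem(sd, reliability)
    return score - error, score + error

# With SD = 10 and reliability = .91, SEM = 3; a score of 75 is
# interpreted as the band 72 to 78.
low, high = confidence_band(75, 10, 0.91)
print(round(low), round(high))  # 72 78
```

The same logic extends to the difference between two scores, whose standard error is larger than either score's SEM alone, so overlapping bands suggest the difference may not be dependable.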
Exercise 5-E
FACTORS INFLUENCING RELIABILITY AND INTER-RATER CONSISTENCY
LEARNING GOAL: Identifies the influence of assessment practices on reliability.
Directions: Indicate whether each of the following practices is most likely to raise (R), lower
(L), or have no effect (N) on reliability.
R L N 1. Add more items like those in the test.
R L N 2. Remove ambiguous tasks from the assessment.
R L N 3. Add five items that everyone answers correctly.
R L N 4. Replace a multiple-choice test with an essay test.
R L N 5. Modify the assessment tasks to obtain a wide spread of scores.
R L N 6. Replace a 10-item multiple-choice quiz with a 10-item true-false quiz.
LEARNING GOAL: Computes and interprets inter-rater consistency expressed as the percent of
exact agreement.
Directions: Using the information from the following table, compute the percent exact agreement
for the scores provided by two independent raters.
________________________________________________________________________
                        Scores Assigned by Rater 1
              __________________________________________________
              Score       1      2      3      4     Row Total
              __________________________________________________
Scores          4         0      1      4     12        17
Assigned        3         1      5     17      6        29
by              2         6     18      7      1        32
Rater 2         1        16      5      1      0        22
              __________________________________________________
Column Total             23     29     29     19       100
________________________________________________________________________
Percent exact agreement =
Briefly interpret the results:
Note: Answers will vary.
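To check work on this exercise, the table can be entered as a matrix and the cells where both raters gave the same score summed. The sketch below (Python) keys the rows as Rater 2's scores from 4 down to 1 and the columns as Rater 1's scores from 1 to 4, matching the table's layout.

```python
# Rows: Rater 2 scores 4, 3, 2, 1 (top to bottom); columns: Rater 1 scores 1-4.
table = [
    [0, 1, 4, 12],   # Rater 2 gave 4
    [1, 5, 17, 6],   # Rater 2 gave 3
    [6, 18, 7, 1],   # Rater 2 gave 2
    [16, 5, 1, 0],   # Rater 2 gave 1
]

total = sum(sum(row) for row in table)  # 100 papers in all

# Exact agreement: both raters assigned the same score. With rows listed
# 4..1 and columns 1..4, those cells fall on the anti-diagonal.
agreements = sum(table[i][len(table) - 1 - i] for i in range(len(table)))
percent_exact = 100 * agreements / total
print(percent_exact)  # 63.0
```

The agreement cells are 12 + 17 + 18 + 16 = 63, so the percent exact agreement for this table is 63%; the interpretation students write will still vary.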
Answers to Student Exercises
5-A 5-B 5-C 5-D 5-E
1. V 1. D 1. T 1. B 1. R
2. R 2. B 2. T 2. B 2. R
3. V 3. A 3. E 3. E 3. N
4. V 4. B 4. I 4. E 4. N
5. R 5. D 5. E 5. E 5. R
6. B 6. A 6. S 6. E 6. N
Chapter 5
Reliability and Other Desired Characteristics
1. The term reliability is closest in meaning to which of the following terms?
A. consistency
B. objectivity
C. practicality
D. validity
2. The term reliability, as used in testing and assessment, refers to which of the following?
A. accuracy of test construction
B. method of test interpretation
C. test or assessment results
D. method of test construction
3. A set of test scores may be classified as which of the following?
A. inconsistent and accurate
B. unreliable and valid
C. inaccurate and valid
D. reliable and invalid
4. Reliability can be best determined by
A. analyzing the overall assessment plan.
B. correlating assessment scores.
C. comparing test scores to a criterion.
D. comparing the errors examiners make.
5. Which of the following types of reliability provides a measure of internal consistency?
A. test blueprint analysis method
B. equivalent-forms with no time interval
C. Kuder-Richardson method
D. test-retest with a one-month interval
6. Which of the following methods of determining reliability is most likely to provide the
smallest reliability coefficient?
A. administer Form A—two-week interval—administer Form B.
B. administer Form A—three-month interval—administer Form B.
C. administer Form A—no time interval—administer Form B.
D. administer Form A and Form B on the same day and apply the Kuder-Richardson
formula.
7. Which of the following methods of estimating reliability is easiest to obtain?
A. equivalent-forms (with a time interval)
B. equivalent-forms (without a time interval)
C. split-half method
D. test-retest method
8. Which of the following types of reliability provides an inflated reliability coefficient for a
speeded test?
A. equivalent-forms (with a time interval)
B. equivalent-forms (without a time interval)
C. split-half method
D. test-retest method
9. Which of the following methods of estimating reliability takes into account the greatest
number of types of consistency in scores?
A. test-retest (immediate)
B. test-retest (time interval)
C. equivalent-forms (immediate)
D. equivalent-forms (time interval)
10. Which of the following types of reliability is the best to assess when students take a pretest
and posttest?
A. equivalent-forms method
B. Kuder-Richardson method
C. split-half method
D. interrater reliability method
11. The split-half method provides a measure of
A. equivalence.
B. internal consistency.
C. stability.
D. external reliability.
12. The standard error of measurement refers to the error involved in which of the following?
A. any assessment
B. computing the standard deviation
C. setting mastery standards
D. standardized testing
13. The standard error of measurement is especially useful for which of the following
calculations?
A. comparing the reliability of different tests
B. converting raw scores to standard scores
C. estimating validity coefficients
D. interpreting individual test scores
14. Whenever the reliability coefficient is low, the standard error of measurement will be:
A. zero.
B. low.
C. high.
D. unchanged.
15. A reliability coefficient (r) and a standard error of measurement (SEM) were computed on a
test with 10 items. What would happen to those statistics if the test were increased to 40
items?
A. r would decrease, SEM would increase.
B. r would decrease, SEM would stay the same.
C. r would increase, SEM would decrease.
D. r would increase, SEM would stay the same.
16. As the reliability coefficient (r) increases the SEM
A. increases.
B. decreases.
C. stays the same.
D. increases at first then decreases to zero.
17. If a test has a standard deviation of 8 and an equivalent-form reliability of .75, the
standard error of measurement is
A. 2.
B. 3.
C. 4.
D. 6.
18. The standard error of measurement tends to be smallest for which of the following scores?
A. average
B. extremely high
C. extremely low
D. high and low
19. A set of test scores is most likely to provide a small reliability coefficient when there is a
large
A. number of test items.
B. number of test scores.
C. spread of scores.
D. standard error of measurement.
20. Which of the following test characteristics would remain most constant over different
groups?
A. reliability coefficient
B. standard error of measurement
C. standard deviation
D. validity coefficient
21. The reliability of a criterion-referenced, performance mastery test focuses on which of the
following?
A. decision consistency
B. spread of scores
C. stability of test scores
D. standard error of measurement
22. Which of the following would be the best procedure to use for improving the reliability of a
classroom test?
A. making items easier
B. making items more difficult
C. increasing the number of items or tasks
D. using more extended-response tasks
23. An achievement test is most useful if it is
A. easy to score.
B. comprised of different question types.
C. reliable and valid.
D. given in essay form.
24. An SEM for a given test is 3. A student gets a score of 75 on the test. What are the limits of
the confidence band on this student’s score?
A. 69–71
B. 72–78
C. 79–82
D. 80–86
25. As SEM increases, what happens to the confidence band?
A. It widens.
B. It narrows.
C. It stays the same.
D. It automatically goes to zero.
26. As reliability increases, what happens to the confidence band?
A. It widens.
B. It narrows.
C. It stays the same.
D. It automatically goes to zero.
27. Two raters are asked to score an essay test. A total of 100 students took the test. The raters
scored the test and gave the same grade on 40 of the tests. What was their interrater reliability?
28. What does interrater reliability measure? In what types of cases or assessments is it
important?
29. Describe the factors that influence reliability. Provide an example of each.
Chapter 5: Answer Key
1. A
2. C
3. D
4. B
5. C
6. B
7. C
8. C
9. D
10. A
11. B
12. A
13. D
14. C
15. C
16. B
17. C
18. A
19. D
20. B
21. A
22. C
23. C
24. B
25. A
26. B
27. 40%
28. Interrater reliability is the degree of agreement between two raters that a given behavior has
taken place. It is of particular importance in performance or behavioral assessment situations.
29. Factors that influence reliability are the number of assessments performed, the spread in the
distribution of scores, the objectivity of the questions in the assessment, and the methods of
estimating reliability.
30. Factors influencing decisions about how high a reliability is adequate for a given assessment
are importance of the decisions to be made, if the decision is final, if the decision is
irreversible, if the results cannot be confirmed, if the decision concerns individuals rather
than groups, and if the consequences of the decision are lasting.
Chapter 6 Planning Classroom Tests and Assessments
Exercise 6-A
TYPES AND USES OF CLASSROOM TESTS AND ASSESSMENTS
LEARNING GOAL: Relates type of test item or assessment task to information needed.
Directions: For each of the following questions, indicate which type of test item or assessment
task provides the most useful information by circling the appropriate letter to the left of the
question.
KEY P = Placement
F = Formative
S = Summative
P F S 1. Are students making satisfactory progress in learning to make
connections among major mathematical concepts?
P F S 2. What types of errors are students making in learning grammar?
P F S 3. Should Carman enroll in an advanced mathematics course?
P F S 4. Is Michael ready for instruction on the new unit?
P F S 5. What final grade should Lizanne receive in the science course?
P F S 6. How do my students rank in achievement?
LEARNING GOAL: States whether a criterion-referenced or norm-referenced test is more useful
for a particular use and justifies the choice.
Directions: For each of the questions 1–6 above, (1) state whether a criterion-referenced test or a
norm-referenced test would provide more useful information, and (2) explain, in a sentence or
two, why you think that test type would be more useful.
1.
2.
3.
4.
5.
6.
Note: Answers will vary.
Exercise 6-B
SPECIFICATIONS FOR CLASSROOM TESTS AND ASSESSMENTS
LEARNING GOAL: Identifies the procedures involved in preparing specifications for classroom
tests and assessments.
Directions: For each of the following statements, determine whether the procedure is a desirable
(D) or undesirable (U) practice when preparing specifications for tests and assessments. Circle
the appropriate letter to the left of the statement.
D U 1. Start by identifying the intended learning outcomes.
D U 2. Limit the specifications to those outcomes that can be measured objectively.
D U 3. Consider the instructional emphasis when specifying the sample of items and
tasks.
D U 4. Increase the relative weighting of topics by including more items on those topics.
D U 5. Use a table of specifications for summative tests only.
D U 6. Consider the purpose of testing when determining item difficulty.
LEARNING GOAL: Explains the importance and nature of using a table of specifications.
Directions: Briefly explain each of the following statements in the space that follows it.
1. Well-defined specifications contribute to validity.
2. Well-defined specifications contribute to interpretability of the results.
3. Tables of specifications may differ for end of unit and end of course assessments.
Note: Answers will vary.
Exercise 6-C
USE OF OBJECTIVE ITEMS AND PERFORMANCE ASSESSMENT TASKS
LEARNING GOAL: Identifies whether objective items or performance assessment tasks are
more appropriate for a given condition.
Directions: For each of the following conditions, determine whether objective (O) items or
performance (P) assessment tasks would be more appropriate. Circle the correct letter to the left
of each statement.
O P 1. A broad sampling of learning outcomes is desired.
O P 2. The need is to measure ability to organize.
O P 3. Probably offers the highest interrater reliability.
O P 4. Time available for scoring is short.
O P 5. The need is to measure knowledge of important facts and major concepts covered
throughout the semester.
O P 6. The need is to measure learning at the synthesis level.
LEARNING GOAL: States whether objective items or performance tasks are more useful for
measuring a particular instructional objective and justifies the choice.
Directions: For each of the following general instructional objectives, (1) state whether
objective items or performance tasks would be more appropriate, and (2) explain, in a sentence
or two, why you think that approach would be more appropriate.
1. Knows specific facts.
2. Interprets a weather map.
3. Evaluates a plan for an experiment.
Note: Answers will vary.
Exercise 6-D
SELECTING SPECIFIC OBJECTIVE-TYPE ITEMS FOR CLASSROOM TESTS
LEARNING GOAL: Identifies the most relevant objective-type items for a given specific
learning outcome.
Directions: Indicate the type of objective test item that is most appropriate for measuring each of
the specific learning outcomes listed below by circling the appropriate letter to the left of the
outcome, using the following key.
KEY A = Short answer, B = True-false, C = Matching, D = Multiple-choice.
A B C D 1. Links inventors and their inventions.
A B C D 2. Distinguishes between correct and incorrect statements.
A B C D 3. Recalls chemical formulas.
A B C D 4. Identifies the correct date for a historical event.
A B C D 5. Reduces fractions to lowest terms.
A B C D 6. Selects the best reason for an action.
LEARNING GOAL: States specific learning outcomes that can be measured most effectively by
each item type.
Directions: For each of the following types of objective test items, state two specific learning
outcomes that can be measured most effectively by that item type.
Short-answer:
True-false:
Matching:
Multiple-choice:
Note: Answers will vary.
Exercise 6-E
PREPARING CLASSROOM TESTS AND ASSESSMENTS
LEARNING GOAL: Distinguishes between sound and unsound procedures for constructing
classroom tests and assessments.
Directions: Indicate whether each of the procedures listed below is sound (S) or unsound (U) in
the construction of classroom tests and assessments by circling the appropriate letter to the left of
the statement.
S U 1. Using a table of specifications in test preparation.
S U 2. Writing more test items and assessment tasks than needed.
S U 3. Including a large number of items and tasks for each interpretation.
S U 4. Writing items on the day before testing.
S U 5. Including some clues on items to aid struggling learners.
S U 6. Putting items and tasks aside for a while before reviewing them.
LEARNING GOAL: Describes the role of item difficulty in preparing classroom tests.
Directions: Describe what is appropriate item difficulty for tests that are designed for each
particular type of interpretation and explain why they differ.
Norm-referenced interpretation:
Criterion-referenced interpretation:
Why they differ:
Note: Answers will vary.
Answers to Student Exercises
6-A 6-B 6-C 6-D 6-E
1. F 1. D 1. O 1. C 1. S
2. F 2. D 2. P 2. B 2. S
3. P 3. D 3. O 3. A 3. S
4. P 4. D 4. O 4. D 4. U
5. S 5. U 5. O 5. A 5. U
6. S 6. D 6. P 6. D 6. S
Chapter 6
Planning Classroom Tests and Assessments
1. Which of the following is the first consideration in planning for a classroom test or
assessment?
A. Should it be criterion-referenced or norm-referenced?
B. Should it be objective or performance-based?
C. What will the results be used for?
D. What content should be covered?
2. Which of the following types of assessment should be used to evaluate student progress in
learning a unit on multiplication?
A. Diagnostic
B. Formative
C. Placement
D. Summative
3. A pretest is a type of
A. diagnostic test.
B. formative test.
C. placement test.
D. summative test.
4. To certify student accomplishment or assign final grades it would be best to use which of the
following?
A. diagnostic assessment
B. formative assessment
C. readiness assessment
D. summative assessment
5. Using a table of specifications will most likely improve a test’s
A. objectivity.
B. practicality.
C. reliability.
D. validity.
6. Failure to use proper specifications for tests and assessments will most likely result in an
overemphasis on which of the following?
A. difficult material
B. state-mandated curriculum
C. factual knowledge
D. writing ability
7. The distribution of items and tasks in a table of specifications should reflect the relative
A. importance of the objectives.
B. objectivity of measurement.
C. practicality of measurement.
D. timeliness of the topic.
8. If a classroom test measures recall of factual information only, it is apt to lack which of the
following?
A. interpretability
B. reliability
C. usability
D. validity
9. Constructing a test with items from a single cell of a two-way (content by objective) table of
specifications is likely to
A. decrease validity and decrease reliability.
B. decrease validity and increase reliability.
C. increase validity and decrease reliability.
D. increase validity and increase reliability.
10. A table of specifications is also referred to as which of the following?
A. a one-way chart
B. a scatter plot
C. a test blueprint
D. a frequency distribution
11. The weight assigned to each instructional objective in a table of specifications should be
determined by which of the following factors?
A. whether it comes first or last in the order of instruction.
B. the instructional time devoted to it.
C. whether or not it is represented on state standardized tests.
D. the time required to respond.
12. The two major classes of essay questions are
A. extended-response and restricted-response.
B. short answer and selection.
C. short answer and supply.
D. supply and selection.
13. One advantage of supply items over performance-based tasks is that they
A. can be used to measure complex outcomes.
B. have a more desirable influence on student learning.
C. require only paper and pencil.
D. require less time to prepare.
14. Which characteristic of a test or assessment is apt to be increased following the
principle, “Use the type of test item or assessment task that measures a learning
outcome most directly”?
A. Objectivity.
B. Practicality.
C. Reliability.
D. Validity.
15. To measure recall of important historical dates it would be best to use which of the following
types of items?
A. matching
B. multiple-choice
C. short-answer
D. true-false
16. To measure integration and application of critical concepts it would be best to use which of
the following question types?
A. extended-response performance tasks
B. multiple-choice items
C. restricted-response essays
D. true-false items
17. Which of the following types of objective test item is likely to be most appropriate for
measuring the following objective: “Distinguishes between fact and opinion”?
A. Matching
B. Multiple-choice
C. Short-answer
D. True-false
18. One advantage of multiple-choice items over other selection-type items is that they
A. eliminate guessing.
B. are easier to construct.
C. are easier to score.
D. provide clues to misunderstandings.
19. For a criterion-referenced interpretation, the difficulty should be determined by which of the
following factors?
A. length of the test
B. nature of the learning tasks
C. spread of the scores
D. type of items used
20. The most desirable way to increase the difficulty of a classroom test is to do which of the
following?
A. introduce irrelevant difficulty into the items
B. include more higher-level learning outcomes
C. include more obscure instructional content
D. use a longer test and shorter time limits
21. Removing clues to the answer from test items is most likely to improve the test’s
A. objectivity.
B. practicality.
C. reliability.
D. validity.
22. Using specifications when constructing test items and assessment tasks is most likely to
improve the
A. objectivity.
B. reliability.
C. sampling.
D. standardization.
23. Substituting multiple-choice items for extended-performance tasks is most likely to increase
which of the following?
A. objectivity
B. practicality
C. reliability
D. the number of items in the test
24. The answer to an item in a classroom test should be one that is
A. plausible under specific circumstances.
B. agreed upon by experts.
C. stated somewhere in the textbook glossary.
D. mentioned in the classroom.
25. The reading level of a test item should be at which of the following levels?
A. higher than the reading level of the student.
B. lower than the reading level of the student.
C. the same as the reading level of the student.
D. one level higher than the average of the class.
26. The most basic principle in selecting the type of test items and assessment tasks is to select
item types that are
A. the most direct measure of the intended learning outcome.
B. easy to construct.
C. the most challenging for students to answer in a short time period.
D. liked best by students.
27. What are the two types of tables of specifications? How are they different? When would you
use each?
28. Discuss the difference between selection-type and supply-type items. What are two
advantages and two disadvantages of each?
29. Identify a type of learning outcome where you would prefer to use matching items and
explain why.
Chapter 6: Answer Key
1. C
2. B
3. C
4. D
5. D
6. C
7. A
8. D
9. B
10. C
11. B
12. A
13. C
14. D
15. C
16. A
17. D
18. D
19. B
20. B
21. D
22. C
23. D
24. B
25. B
26. A
27. The two types of tables of specifications are a two-way table and a one-way table. A two-way
table has the content topics and/or subtopics down the y-axis of the table and the taxonomic
levels of the objectives (from knowledge through evaluation) across the x-axis. The one-way
table has the content or topics running down the y-axis and the number of items running
across the x-axis. The two-way table will be the best type to use in most cases.
28. Selection items are those in which the correct answer is given to a student along with a
number of incorrect answers. The student is then asked to select the correct answer. Supply
items are those in which no answers are provided to the student and the student is to supply
the answer on his/her own, from memory. One advantage of selection items is that more
of them can be included on a test than supply items, and thus sampling of content is better. A
second advantage is that they are more objective to score. A disadvantage of selection items
is that they do not measure higher-order learning outcomes and are not very “real-world”
(i.e., most tasks in life are not multiple choice). Advantages to supply items are that they are
more “real world” and that they measure higher-order learning outcomes.
29. Matching items work well when you wish to have students understand the relationship
between two facts or events. An example of a good use of matching questions would be in a
social studies lesson where the names of major explorers are in one list and their discoveries
in another list, and the student task is to match the explorers to their discoveries.
Chapter 7 Constructing Objective Test Items: Simple Forms
Exercise 7-A
CHARACTERISTICS OF SHORT-ANSWER, TRUE-FALSE, AND MATCHING ITEMS
LEARNING GOAL: Distinguishes among the characteristics of different item types.
Directions: Indicate which type of objective test item best fits each of the characteristics listed
below by circling the appropriate letter, using the following key.
KEY: S = Short answer, T = True-false, M = Matching.
S T M 1. Is classified as a supply-type item.
S T M 2. Most effective when relationships are involved.
S T M 3. Is most influenced by guessing.
S T M 4. Is most difficult to score.
S T M 5. Directions are most difficult to write for this type.
S T M 6. Correct answer may be obtained on the basis of misinformation.
LEARNING GOAL: States advantages and limitations of each item type.
Directions: For each of the following types of objective test items, state one advantage and one
limitation.
Short-answer item
Advantage:
Limitation:
True-false item
Advantage:
Limitation:
Matching Item
Advantage
Limitation
Note: Answers will vary.
Exercise 7-B
EVALUATING AND IMPROVING SHORT-ANSWER ITEMS
LEARNING GOAL: Identifies common faults in short-answer items.
Directions: Indicate the type of fault, if any, in each of the following short-answer items by
circling the appropriate letter, using the following key.
KEY A = Has no faults, B = Has more than one correct answer,
C = Contains clue to the answer
A B C 1. John Glenn first orbited the earth in_______.
A B C 2. In what year did Burgoyne surrender at Saratoga? _______.
A B C 3. The United Nations Building is located in the City of _______ _______.
A B C 4. An animal that eats only plants is classified as _______.
A B C 5. Test specifications can be indicated by a table of_______.
A B C 6. Abraham Lincoln was born in _______.
LEARNING GOAL: Improves defective short-answer items.
Directions: Rewrite as well-constructed short-answer items each of the faulty items in 1–6
above. If an item has no faults, write no faults in the space.
1.
2.
3.
4.
5.
6.
Note: Answers will vary.
Exercise 7-C
EVALUATING AND IMPROVING TRUE-FALSE ITEMS
LEARNING GOAL: Identifies common faults in true-false items.
Directions: Indicate the type of fault, if any, in each of the following true-false items by circling
the appropriate letter, using the following key.
KEY A = Has no faults, B = Is ambiguous, C = Contains a clue to the answer,
D = Opinion statement (not true or false).
A B C D 1. Camping is fun for the entire family.
A B C D 2. A parasite may provide a useful function.
A B C D 3. The best place to study is in a quiet room.
A B C D 4. A nickel is larger than a dime.
A B C D 5. Abraham Lincoln was born in Kentucky.
A B C D 6. True-false statements should never include the word always.
LEARNING GOAL: Improves defective true-false items.
Directions: Rewrite as well-constructed true-false items each of the faulty items in 1 to 6 above.
If an item has no faults, write no faults in the space.
1.
2.
3.
4.
5.
6.
Note: Answers will vary.
Exercise 7-D
EVALUATING AND IMPROVING MATCHING ITEMS
LEARNING GOAL: Identifies common faults in a matching exercise.
Directions: Indicate the specific faults in the following matching exercise by circling the
appropriate letter below the exercise (Y = yes, N = no).
Directions: Match the items in the two columns.
Column I Column II
____1. Wind vane A. Used to measure temperature
____2. Tornado B. Water vapor in the air
____3. Humidity C. Violent storm
____4. Thermometer D. Used to measure wind direction
Y N 1. Directions are inadequate.
Y N 2. Columns are inappropriately placed.
Y N 3. Clues are provided in the answers.
Y N 4. Both lists are the same length.
Y N 5. Order of responses is improper.
Y N 6. Matching exercise lacks homogeneity.
LEARNING GOAL: Improves a defective matching exercise.
Directions: In the space below, rewrite the matching exercise at the top of the page. You may (1)
add material, (2) delete material, or (3) rework it into more than one exercise, but you should
cover the same type of material.
Note: Answers will vary.
Exercise 7-E
CONSTRUCTING SHORT-ANSWER, TRUE-FALSE, AND MATCHING ITEMS
LEARNING GOAL: Constructs sample test items that are relevant to stated learning outcomes.
Directions: In the spaces provided, construct (1) two short-answer items (one in question form
and one in incomplete statement form), (2) four true-false items, and (3) one four-item matching
exercise. State the specific learning outcome for each item or set of items.
Short-answer item (question form)
Outcome:
Item:
Short-answer item (incomplete statement form)
Outcome:
Item:
True-false items
Outcome:
Items:
1.
2.
3.
4.
Matching exercise
Outcome:
Directions:
Column I Column II
____1.
____2.
____3.
____4.
____5.
Note: Answers will vary.
Answers to Student Exercises
7-A 7-B 7-C 7-D
1. S 1. B 1. D 1. Y
2. M 2. A 2. B 2. N
3. T 3. C 3. D 3. Y
4. S 4. A 4. B 4. Y
5. M 5. C 5. A 5. Y
6. T 6. B 6. D 6. N
Chapter 7
Constructing Objective Test Items: Simple Forms
1. Which of the following item types is classified as a supply-type item?
A. Matching
B. Short answer
C. True-false
2. Which of the following item types would provide the highest score based on guessing alone?
A. Matching
B. Short answer
C. True-false
3. Which of the following item types would be most effective for measuring the ability to
distinguish between factual statements and opinion statements?
A. Matching
B. Short answer
C. True-false
4. Which of the following item types is least useful for diagnosing learning difficulties?
A. Matching
B. Short answer
C. True-false
5. With which of the following item types is it most difficult to assess students’ higher-order thinking skills?
A. True-false
B. Short answer
C. Essay
6. Short-answer test items are clearly superior to matching or true-false items in measuring
A. the ability to distinguish between fact and opinion.
B. the ability to interpret data.
C. computational skill.
D. knowledge of terms.
7. The main shortcoming in using short-answer items is the difficulty for educators to do which
of the following?
A. construct them
B. make them challenging
C. score them
D. interpret the results
8. Which of the following is the most well-written short answer item?
A. Cleveland may be found on ______.
B. A person first landed on the moon in ___________.
C. The author of Silas Marner is _________ _________.
D. The United Nations building is in ________ _______.
9. Which of the following is the most well-written short-answer item?
A. A test that is ________ is not necessarily _____ or ________.
B. A test that is ________ is not ___________ valid or ________.
C. A test that is reliable is not ___________ _____ or ________.
D. A test that is reliable is not necessarily _____ or useful.
10. In correcting a short-answer test to measure students’ knowledge of specific principles, a
teacher deducted one point from the total score for each misspelled word. How would this
procedure affect test results?
A. Lowers reliability.
B. Lowers validity.
C. Raises validity.
D. Raises both reliability and validity.
11. In large-scale mathematics testing programs, many of the advantages of supply-type items
can be maintained by using which of the following item types?
A. grid-in
B. matching
C. essay
D. true-false
12. Which of the following is a limitation of short-answer items?
A. they are not provided with teacher’s edition textbooks
B. they are time consuming to score
C. they reduce the chance of guessing the correct answer
D. they don’t provide an opportunity to distinguish between fact and opinion
13. “All whales are mammals because they are large.” Asking students to mark the previous
statement true or false would be considered poor testing practice because
A. it cannot be classified as true or false.
B. misinformation could lead to the correct answer.
C. the statement is too vague.
D. the word “all” provides a clue to the answer.
14. Which of the following is the most well-written true-false item?
A. A barometer may be useful in predicting weather.
B. All barometers give precise measures of air pressure.
C. A rising barometer forecasts fair weather.
D. The barometer is the most useful weather instrument.
15. Absolute terms like all or none that provide clues in true-false statements are known as
A. exaggeration clues.
B. grammatical inconsistencies.
C. specific determiners.
D. verbal associations.
16. A true-false test can be made more reliable if a teacher does which of the following?
A. closely relate the items to the learning outcomes to be measured
B. increase the number of items
C. increase the difficulty of the items
D. instruct students to answer every item
17. Which of the following true-false statements has a specific determiner?
A. Alaska has both oil and mineral deposits.
B. Alaska is cold in the winter.
C. Alaska is increasing in population.
D. Alaska is never hot in the summer.
18. Matching items are most useful for measuring learning outcomes at which of the following
levels?
A. application
B. interpretation
C. knowledge
D. synthesis
19. The matching item is a modified form of the
A. multiple-choice item.
B. short-answer item.
C. true-false item.
D. fill-in item.
20. One difficulty in constructing matching items is that it is difficult to find material that
requires students to
A. analyze.
B. explain.
C. interpret.
D. relate.
21. A difficulty in constructing matching items is finding material that is
A. homogeneous.
B. interesting.
C. related to the teaching objectives.
D. unquestionably true.
22. Guessing on a matching item can be reduced by doing which of the following?
A. making the responses longer than the premises
B. making the responses more homogeneous
C. using an equal number of premises and responses
D. using responses more than once
23. Excessive use of matching items will most likely result in overemphasis on which of the
following?
A. complex learning
B. reasoning ability
C. rote learning
D. synthesis outcomes
24. Which of the following types of items has the greatest chance of measuring irrelevant
material?
A. Matching
B. Short answer
C. True-false
D. Fill-in
25. One way to cut down on the assessment of irrelevant information in matching items is to
write them first as true-false questions.
A. True.
B. False.
26. One problem with true-false items is that when a student correctly marks a false item, the
item does not measure whether the student actually knows the information that would make
the statement true.
A. Agree.
B. Disagree.
27. In which of the following types of questions would interrater reliability probably be an
issue?
A. Short answer.
B. Multiple choice.
C. True-false.
D. Fill-in.
28. Discuss why a teacher would likely not deduct points for spelling when scoring a short-answer
question.
29. Explain what is wrong with the following true-false question: “It takes more skill to write
an opera than a rock and roll song.”
84
Chapter 7: Answer Key
1. B
2. C
3. C
4. C
5. A
6. C
7. C
8. C
9. D
10. B
11. A
12. B
13. B
14. C
15. C
16. B
17. D
18. C
19. A
20. D
21. A
22. D
23. C
24. A
25. B
26. A
27. A
28. Deducting for spelling would reduce validity, because spelling is irrelevant to the learning
outcome being measured.
29. It measures an opinion.
85
Chapter 8 Constructing Objective Test Items: Multiple-Choice Forms
Exercise 8-A
CHARACTERISTICS OF MULTIPLE-CHOICE ITEMS
LEARNING GOAL: Identifies the advantages and limitations of multiple-choice items in
comparison to other item types.
Directions: The following statements compare multiple-choice (MC) items to other item types
with regard to some specific characteristic or use. Indicate whether test specialists would agree
(A) or disagree (D) with each statement by circling the appropriate letter.
A D 1. MC items avoid the possible ambiguity of the short-answer item.
A D 2. MC items are easier to construct than true-false items.
A D 3. MC items have less need for homogeneous material than the
matching exercise.
A D 4. MC items can be scored more reliably than short-answer items.
A D 5. MC items can measure all learning outcomes effectively.
A D 6. MC items have higher reliability per item than true-false items.
LEARNING GOAL: Lists the characteristics of an effective multiple-choice item.
Directions: List the important characteristics of each of the following parts of a multiple-choice
item.
Item Stem:
Correct Answer:
Distracters:
Note: Answers will vary.
86
Exercise 8-B
EVALUATING STEMS OF MULTIPLE-CHOICE ITEMS
LEARNING GOAL: Distinguishes between effective and ineffective stems for multiple-choice
items.
Directions: For each of the following pairs, indicate which element would make the most
effective stem for a multiple-choice item by circling its letter (A or B).
1. A Why did the cost of energy rise so rapidly in the 1970s?
B Which one of the following statements is true about energy?
2. A Achievement tests should
B Achievement tests are useful for
3. A A whale is a
B Whales are classified as
4. A Aluminum, which is finding many new uses, is made from
B Aluminum is made from
5. A The man who first explored Lake Michigan was
B The Frenchman who first explored Lake Michigan was
6. A Which of the following illustrates what is meant by the word climate?
B Which of the following does not illustrate what is meant by the word climate?
LEARNING GOAL: Describes the faults in ineffective stems for multiple-choice items.
Directions: For each of the ineffective stems in 1–6 above, briefly describe the type of fault it
contains.
1.
2.
3.
4.
5.
6.
Note: Answers will vary.
87
Exercise 8-C
EVALUATING ALTERNATIVES USED IN MULTIPLE-CHOICE ITEMS
LEARNING GOAL: Distinguishes between effective and ineffective use of alternatives in
multiple-choice items.
Directions: Each of the following multiple-choice item stems has two sets of alternatives.
Indicate which set would make the most effective alternatives for the items by circling the letter
(A or B). The items are kept simple and the alternatives are placed across the page to save space.
1. A United States astronaut flew to the moon in
A (1) 1967 (2) 1968 (3) 1969 (4) 1970
B (1) a spaceship (2) 1969 (3) 1979 (4) 1989
2. Who was the 33rd President of the United States?
A (1) Lincoln (2) Bush (3) Bush (4) Truman
B (1) Roosevelt (2) Truman (3) Eisenhower (4) Kennedy
3. Which of the following represents what is meant by the term reforestation?
A (1) Cutting (2) Replanting (3) Spraying (4) Surveying
B (1) Recutting (2) Replanting (3) Spraying (4) Resurveying
4. Which of the following best describes observable student performance?
A (1) Constructs (2) Fears (3) Realizes (4) Thinks
B (1) Constructs (2) Fears (3) Realizes (4) None of these
LEARNING GOAL: Describes the faults in ineffective sets of alternatives.
Directions: For each of the ineffective sets of alternatives in 1 to 4 above, briefly describe the
type of fault it contains.
1.
2.
3.
4.
Note: Answers will vary.
88
Exercise 8-D
EVALUATING AND IMPROVING MULTIPLE-CHOICE ITEMS
LEARNING GOAL: Identifies common faults in multiple-choice items.
Directions: Indicate the major type of fault, if any, in each of the following multiple-choice
items by circling the appropriate letter, using the following key. The alternatives are placed
across the pages to save space.
KEY A = No fault B = Stem is inadequate
C = Contains inappropriate distracters D = Contains a clue to the answer
A B C D 1. Reliability (a) means consistency, (b) is the same as objectivity,
(c) refers to usability, (d) is a synonym for interpretability.
A B C D 2. The characteristic that is most desired in test results is
(a) consistency, (b) reliability, (c) stability, (d) validity.
A B C D 3. If a test is lengthened, its reliability will
(a) decrease, (b) increase, (c) stay the same, (d) none of these.
A B C D 4. A method of determining reliability that requires correlating scores
from two halves of a test is called (a) equivalent forms, (b) Kuder-
Richardson method, (c) split-half method, (d) test-retest method.
LEARNING GOAL: Improves defective multiple-choice items.
Directions: Rewrite as well-constructed multiple-choice items each of the faulty items in 1–4
above. If an item has no faults, write no faults in the space.
1.
2.
3.
4.
Note: Answers will vary.
89
Exercise 8-E
CONSTRUCTING MULTIPLE-CHOICE ITEMS
LEARNING GOAL: Constructs sample multiple-choice items that are relevant to stated learning
outcomes.
Directions: In a subject area you have studied or plan to teach, state the desired learning outcome
and construct one multiple-choice item for each of the general instructional objectives listed
below.
Understands basic terms:
Outcome:
Item:
Understands specific facts:
Outcome:
Item:
Understands principles (or facts):
Outcome:
Item:
Applies principles or facts
Outcome:
Item:
Note: Answers will vary.
90
Answers to Student Exercises
8-A 8-B 8-C 8-D
1. A 1. A 1. A 1. B
2. D 2. B 2. B 2. C
3. A 3. B 3. B 3. C
4. A 4. B 4. B 4. D
5. D 5. B
6. A 6. A
91
Chapter 8
Constructing Objective Test Items: Multiple-Choice Forms
1. The problem presented in a multiple-choice item should be clear after reading which of the
following?
A. item stem
B. the correct answer
C. the distracters
D. item stem and all the alternatives
2. A distracter in a multiple-choice item refers to which of the following?
A. an alternative that is plausible yet clearly incorrect
B. alternatives that may be correct under certain circumstances
C. negatively stated alternatives
D. any alternative that diverts the reader’s attention away from the item stem
3. An incorrect alternative can be made more plausible by
A. avoiding textbook language.
B. making it grammatically inconsistent with the stem.
C. making it shorter than the others.
D. using common errors made by students.
4. One advantage of multiple-choice items over true-false items is that they reduce the
A. chance for cheating.
B. difficulty of machine scoring.
C. influence of guessing on the score.
D. time needed in test preparation.
5. Which of the following describes one advantage of multiple-choice items over matching
items?
A. for multiple choice items, clearly stated objectives are not needed
B. multiple choice items are more adaptable to different types of outcomes
C. multiple choice items require less testing time
D. there is less need to correct for guessing on multiple choice items
6. One advantage of multiple-choice items over short-answer items is that multiple choice items
A. encourage students to study harder.
B. measure computation skill more effectively than short-answer items.
C. provide more freedom of response.
D. provide a more objective measure of achievement.
92
7. A 50-item multiple-choice test would provide more reliable test scores than a 50-item true-
false test because
A. a bigger spread of scores is obtained.
B. greater care is needed during item writing.
C. the items can be written to include correct answers.
D. the scoring is more objective.
8. Which of the following learning outcomes is most likely to require the best-answer type of
multiple-choice item?
A. Can justify methods and procedures
B. Can identify proper grammar usage
C. Can distinguish between fact and opinion
D. Can understand specific facts
9. Which of the following learning outcomes is most likely to require the best-answer type of
multiple-choice item?
A. Distinguishes between fact and opinion.
B. Identifies the dates of historical events.
C. Understands specific historical facts.
D. Selects the reason a historical event occurred.
10. Inexperienced item writers will produce more effective multiple-choice items if they start
with which of the following?
A. best-answer items
B. a list of possible alternatives
C. the stem in question form
D. the entire test blueprint
11. A multiple-choice item that measures at the understanding level must include which of the
following?
A. a table of specifications
B. a situation that is new to the students
C. at least two plausible distracters
D. introductory material that requires a high level of reading ability
12. A major drawback of multiple-choice items that allow more than one alternative to be
marked as correct is the difficulty they create in
A. administering the test.
B. constructing the test items.
C. developing directions for the test.
D. scoring the test.
93
13. The reliability of a multiple-choice test will tend to increase with an increase in which
of the following?
A. difficulty of items
B. number of learning outcomes measured
C. number of alternatives in each item
D. diversity in the group being tested
14. Which of the following provides the best stem for a multiple-choice item?
A. Penicillin is
B. Penicillin was discovered by
C. Penicillin, which has many uses in medicine, was discovered by
D. Which of the following scientists discovered penicillin?
15. Which of the following provides the best stem for a multiple-choice item?
A. Which of the following did not contribute to the depression?
B. One major factor that contributed to the depression is
C. The depression was
D. The depression was caused by
16. What is wrong with the stem of the following multiple-choice question? “Which of the
following states is the largest state in the United States?”
A. Largest can be measured either geographically or by population.
B. It measures opinion rather than fact.
C. It measures only a lower order skill.
D. It should be posed as a statement instead of a question.
17. Which of the following sets of alternatives would be best for a multiple-choice item about a
battle in the Civil War?
A. Davis, Grant, Lincoln, none of the above.
B. Lincoln, Mason-Dixon Line, Sherman, Vicksburg.
C. Grant, Jackson, Lee, Sherman.
D. Jefferson, Lincoln, Roosevelt, Washington.
18. Which of the following sets of alternatives is best for the following multiple-choice item:
“The perimeter of a rectangle 4 inches long and 2 inches wide is ______”?
A. 6 inches, 8 inches, 12 inches, 16 inches
B. 2 inches, 12 inches, 24 inches, 36 inches
C. 11 inches, 12 inches, 13 inches, 22 inches
D. 2 inches, 3 inches, 17 inches, 18 inches
19. In order to measure application with multiple-choice items, the problem situations should be
A. described in complex terms.
B. new to the students.
C. the same as those solved in class.
D. able to measure factual knowledge.
94
20. When using the “best-answer” type of multiple-choice item, which of the following should
be included?
A. a stem that consists of at least three sentences
B. the alternative “none of the above”
C. more than seven alternatives
D. alternatives of equal length
21. Ambiguity can best be reduced in multiple-choice items by
A. avoiding the use of “best-answer” items.
B. having another teacher review the items.
C. keeping the length of the alternatives equal.
D. using no more than four alternatives.
22. Multiple-choice tests are particularly well suited for measuring analysis, synthesis and
evaluation.
Agree
Disagree
23. List three advantages that multiple-choice items have over true-false items.
24. If one test had 50 multiple-choice items and an equivalent test had 50 true-false items,
which test would probably have higher reliability? Why? What is probably the only way
that a true-false test might possess higher reliability than a multiple-choice test?
95
Chapter 8: Answer Key
1. A
2. A
3. D
4. C
5. B
6. D
7. A
8. A
9. D
10. C
11. B
12. D
13. C
14. D
15. B
16. A
17. C
18. A
19. B
20. D
21. B
22. Disagree
23. Multiple-choice items are less susceptible to guessing. They require that the correct answer
be selected, not that an incorrect statement simply be recognized, and they can measure
skills higher than simple facts.
24. All things being equal, a multiple-choice test will have a higher reliability than an equivalent
true-false test because of the high probability of getting a true-false item correct as a result of
guessing. One way to increase the reliability of true-false tests is to increase the number of
items on the test. Thus, a true-false test with many more items than a corresponding multiple-
choice test might possess higher reliability.
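The arithmetic behind answers 23 and 24 can be made concrete. The sketch below is illustrative only (the function names and the sample reliability value of .60 are hypothetical, not from the text): it computes the chance-level score for each item format and applies the Spearman-Brown prophecy formula to show how lengthening a test raises its predicted reliability.

```python
def chance_level(n_options):
    """Expected proportion correct from blind guessing alone,
    given the number of answer options per item."""
    return 1 / n_options

def spearman_brown(r, k):
    """Predicted reliability when a test is lengthened k times
    (Spearman-Brown prophecy formula)."""
    return (k * r) / (1 + (k - 1) * r)

# Blind guessing yields 50% on true-false items but only 25% on
# four-option multiple-choice items, so true-false scores carry
# more guessing noise at the same test length.
print(chance_level(2))   # 0.5
print(chance_level(4))   # 0.25

# Doubling a test whose reliability is .60 (a hypothetical value)
# raises the predicted reliability to .75 -- which is why a much
# longer true-false test can overtake a shorter multiple-choice one.
print(spearman_brown(0.60, 2))  # 0.75
```

Running the last line with larger k shows diminishing returns: each additional block of items adds less reliability than the one before.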
96
Chapter 9 Measuring Complex Achievement: The Interpretive Exercise
Exercise 9-A
CHARACTERISTICS OF INTERPRETIVE EXERCISES
LEARNING GOAL: Identifies the advantages and limitations of interpretive exercises in
comparison to other item types.
Directions: The following statements compare the interpretive exercise (IE) to other types with
regard to some specific characteristics or use. Indicate whether test specialists would agree (A)
or disagree (D) with the statements by circling the appropriate letter.
A D 1. The IE is more difficult to construct than other item types.
A D 2. The IE can be designed to measure more complex learning outcomes than the
single-objective item.
A D 3. The IE provides a more reliable measure of complex learning outcomes than the
essay test.
A D 4. The IE is more effective than the essay test for measuring the ability to
organize ideas.
A D 5. The IE is one of the most effective item types to use with poor readers.
A D 6. The IE measures knowledge of specific facts more effectively than other item
types.
LEARNING GOAL: Lists the characteristics of an effective interpretive exercise.
Directions: List the important characteristics of each of the following parts of an interpretive
exercise.
Introductory material:
Related test items:
Note: Answers will vary.
97
Exercise 9-B
EVALUATING AND IMPROVING THE INTERPRETIVE EXERCISE
LEARNING GOAL: Identifies the common faults in an interpretive exercise.
Directions: Indicate the specific faults in the following interpretive exercise by circling the
appropriate letter (Y = yes, N = no) by each fault.
INTERPRETIVE EXERCISE
Directions: Read the paragraph and mark your answers.
Some teachers falsely believe that multiple-choice items are limited to the measurement of
simple learning outcomes because they depend on the recognition of the answer rather than the
recall of it. However, complex outcomes can be measured, and the selection of the correct
answer is not based on the mere recognition of a previously learned answer. It involves the use of
higher mental processes to arrive at a solution and then the correct answer is selected from
among the alternatives presented. This is the reason multiple-choice items are also called
selection-type items rather than recognition-type items. It makes clear that the answer is
selected by the mental process involved and is not limited to recognition.
T F 1. Some teachers think multiple-choice items measure only at the
recognition level.
T F 2. All selection-type items are recognition-type items.
T F 3. Multiple-choice items are also called selection-type items.
T F 4. Selection-type items include multiple-choice, true-false, and matching.
Items for evaluating the interpretive exercise:
Y N 1. Directions are adequate.
Y N 2. Some items measure simple reading skill only.
Y N 3. Some items measure extraneous material.
Y N 4. Some of the items can be answered without reading the paragraph.
LEARNING GOAL: Improves defective interpretive exercise.
Directions: Rewrite the directions for the above interpretive exercise and write one true-false
item that calls for interpretation of the material.
Note: Answers will vary.
98
Exercise 9-C
CONSTRUCTING INTERPRETIVE EXERCISES
LEARNING GOAL: Constructs sample exercise for interpreting a paragraph.
Directions: Construct an interpretive exercise that measures the ability to interpret a paragraph
of written material. Include complete directions, the paragraph, and at least two multiple-choice
items.
Note: Directions will vary.
99
Exercise 9-D
CONSTRUCTING INTERPRETIVE EXERCISES
LEARNING GOAL: Constructs sample exercise for interpreting pictorial material.
Directions: Construct an interpretive exercise that measures the ability to interpret a picture or
cartoon. Include complete directions, the pictorial material, and two objective items of any
type.
Note: Directions will vary.
100
Exercise 9-E
CONSTRUCTING INTERPRETIVE EXERCISES
LEARNING GOAL: Constructs sample exercise for interpreting a table, chart, or graph.
Directions: Construct an interpretive exercise that measures the ability to interpret a table,
chart, or graph. Include complete directions, the tabular or graphic material, and two
objective items of any type.
Note: Directions will vary.
102
Chapter 9
Measuring Complex Achievement: The Interpretive Exercise
1. The interpretive item is probably most effective for measuring which of the following?
A. a broad range of factual knowledge
B. higher-order thinking skills
C. the ability to organize ideas
D. the ability to present relevant arguments
2. Using interpretive exercises will typically result in more effective
A. interpretive writing skills.
B. measurement of complex learning outcomes.
C. motivation to learn factual information.
D. sampling of course content.
3. The introductory material used in an interpretive exercise should include which of the
following qualities?
A. it should be based on pictorial material rather than written material
B. it should be complex and difficult enough that only the better students will answer
correctly
C. it should be in harmony with the instructional objectives
D. it should be selected from the material that the student had studied during the
course
4. The type of interpretive exercise to use should be determined by which of the following?
A. the amount of reading required
B. the intended learning outcomes
C. scoring method to be used
D. types of test items to be written
5. One advantage of the interpretive exercise over the single multiple-choice item is that the
interpretive exercise
A. can measure more complex outcomes.
B. is easier to construct.
C. is easier to score.
D. measures factual information more effectively.
6. One advantage of the interpretive exercise over performance-based assessment tasks is that
the interpretive exercise
A. measures more important outcomes.
B. places less emphasis on reading.
C. provides a more structured task.
D. prevents students from cheating.
103
7. Which of the following “enabling skills” would be most likely to lower the validity of an
interpretive exercise?
A. imitating
B. reading
C. thinking
D. writing
8. Requiring students to recall an excessive amount of factual information in order to solve an
interpretive exercise can result in lower
A. objectivity.
B. practicality.
C. reliability.
D. validity.
9. Which of the following is a common error made by test designers in constructing interpretive
exercises?
A. including introductory material that is new to students
B. including material that requires a low level of reading skill
C. including items that can be answered on the basis of general knowledge
D. including too many items for each section of the assessment
10. Which of the following types of test items are most commonly used with interpretive
exercises?
A. alternative-response and key-type
B. key-type and multiple-choice
C. matching and alternative-response
D. matching and multiple-choice
11. The use of interpretive exercises to measure complex learning outcomes reduces the
influence of which of the following?
A. irrelevant factual information
B. students guessing answers
C. reading skills
D. thinking skills
12. The introductory material in the interpretive exercise is most effective if it
A. comes directly from the textbook.
B. is new to the students.
C. places a high demand on reading skill.
D. was thoroughly covered in class discussion.
13. The interpretive exercise is especially useful for educators because it assists in measuring the
ability to
A. detect invalid inferences.
B. express original ideas.
C. recall information.
D. use grammar and spelling skills.
104
14. The test items in an interpretive exercise are most effective if they include which of the
following qualities?
A. they are answered directly from the introductory material
B. they can be answered without any additional information
C. they contain only two alternatives
D. they require more than the recall of information to answer
15. When testing very young students, it is best to use material that is
A. familiar.
B. humorous.
C. visual.
D. interesting.
16. When compared to essay questions, interpretive exercises possess
A. greater validity.
B. lower standard deviation.
C. higher interrater reliability.
D. easier construction.
17. A simple method for checking the adequacy of an interpretive exercise is to do which of the
following?
A. attempt to answer the questions without the introductory material
B. read the exercise to students and obtain their feedback on it
C. read and answer the item to yourself, just as a student would
D. score the item after students have taken it and then review their answers
18. Which of the following factors makes it difficult to use interpretive exercises on a test?
A. administering them
B. constructing them
C. scoring them
D. relating them to learning outcomes
19. The interpretive exercise is least useful for measuring the ability to
A. appraise the plan for an experiment.
B. evaluate the adequacy of an experiment.
C. produce a product.
D. recognize the validity of conclusions.
20. If the higher achieving students in class can answer the questions on an interpretive exercise
without looking at the introductory material but the lower achieving students can’t, the
exercise is likely
A. well constructed and valid.
B. well constructed and invalid.
C. poorly constructed and valid.
D. poorly constructed and invalid.
105
21. Which of the following would be considered a sound principle for constructing an
interpretive exercise based on a reading passage?
A. offering students a minimum of six alternatives to choose from
B. using introductory materials that students have seen before
C. choosing a passage that is relatively brief
D. incorporating data from a table that is one page in length
22. Keeping the introductory materials on an interpretive exercise relatively brief would assist
students who may be experiencing issues with:
A. excessive page turning.
B. short-term memory.
C. long-term memory.
D. English grammar.
23. Which of the following would likely be a legitimate use of the interpretive exercise?
A. answering reading comprehension questions
B. evaluating a work of art
C. judging a musical composition
D. describing a movie theme
24. Which of the following should probably be the relationship between introductory materials
in a science curriculum interpretive exercise and reading level of students?
A. higher than the class average
B. higher than grade level
C. lower than grade level
D. right at grade level
25. How are interpretive exercises, true-false items, and matching items alike? How are they
different? How are interpretive exercises and performance items alike? How are they
different?
26. What types of errors of interpretation might a social studies interpretive exercise
possess if it contains a relatively long passage that requires high-level reading? Be
as specific as possible.
106
Chapter 9: Answer Key
1. B
2. B
3. C
4. B
5. A
6. C
7. B
8. D
9. C
10. B
11. A
12. B
13. A
14. D
15. C
16. C
17. A
18. B
19. C
20. D
21. C
22. B
23. A
24. C
25. Interpretive, true-false, and matching exercises are similar in that they are objective and of
the selection variety. They differ in that true-false and matching items are usually used to
measure lower-order knowledge and factual material, whereas interpretive exercises may be
used for higher-order learning such as understanding and application. Interpretive exercises
and performance items are alike in that both can measure higher forms of learning. They
differ in that interpretive exercises are of the selection variety and performance items are of
the supply variety.
26. The teacher would not know whether the student answered the questions incorrectly
because he or she did not know the social studies material or because the reading demands
were too great. Likewise, an involved passage might have taxed the student’s attention or
short-term memory rather than revealing that the student had not learned the social studies
material.
107
Chapter 10 Measuring Complex Achievement: Essay Questions
Exercise 10-A
CHARACTERISTICS OF ESSAY QUESTIONS
LEARNING GOAL: Identifies the advantages and limitations of essay questions in comparison
to objective items.
Directions: The following statements compare essay questions to objective items with regard to
some specific characteristic or use. Indicate whether test specialists would agree (A) or disagree
(D) with the statement by circling the appropriate letter.
A D 1. Essay questions are more efficient than objective items for measuring
knowledge of facts.
A D 2. Essay questions are more subject to bluffing than are objective items.
A D 3. Essay questions are preferred when a teacher is measuring the student’s
ability to organize.
A D 4. Essay questions measure a more limited sampling of content than
objective questions in a given amount of testing time.
A D 5. Essay questions provide more reliable scores than do objective items.
A D 6. Essay questions can measure complex learning outcomes that are
difficult to measure by other means.
LEARNING GOAL: Lists the characteristics of an effective essay question.
Directions: List the important characteristics of each type of essay question.
Restricted-response question:
Extended-response question:
Note: Answers will vary.
108
Exercise 10-B
EVALUATING AND IMPROVING ESSAY QUESTIONS
LEARNING GOAL: Describes faults in essay questions and rewrites them as effective items.
Directions: Describe the faults in each of the following sample essay questions and rewrite each
question so that it meets the criteria for an effective essay item.
1. Why are essay questions better than objective items?
Faults:
Rewrite item:
2. List the rules from your textbook for constructing essay questions.
Faults:
Rewrite item:
3. How do you feel about using essay questions?
Faults:
Rewrite item:
4. Write on one of the following: (1) constructing essay questions, (2) scoring essay
questions, (3) using essay questions to improve learning.
Faults:
Rewrite item:
Note: Answers will vary.
109
Exercise 10-C
CONSTRUCTING RESTRICTED-RESPONSE ESSAY QUESTIONS
LEARNING GOAL: Constructs sample restricted-response essay questions.
Directions: Construct one restricted-response essay question for each of the types of thought
questions listed.
1. Comparing two things.
2. Justifying an idea or action.
3. Classifying things or ideas.
4. Applying a fact or principle.
Note: Answers will vary.
110
Exercise 10-D
CONSTRUCTING EXTENDED-RESPONSE ESSAY QUESTIONS
LEARNING GOAL: Constructs sample extended-response essay questions.
Directions: Construct one extended-response essay question for each of the types of thought
questions listed. For each question, describe the scoring procedures to be used and the elements
included in the scoring.
1. Synthesis: Production of a plan for doing something (e.g., experiment), for constructing
something (e.g., graph, table, dress), or for taking some social action (e.g., preventing
pollution).
Question:
Scoring procedure:
2. Evaluation: Judging the value for something (e.g., a proposal, book, poem, teaching
method, research study) using definite criteria.
Question:
Scoring procedure:
Note: Answers will vary.
111
Exercise 10-E
SCORING ESSAY QUESTIONS
LEARNING GOAL: Distinguishes between good and bad practices in scoring essay questions.
Directions: Indicate whether each of the following statements describes a good (G) practice or a
bad (B) practice in scoring essay questions by circling the appropriate letter.
G B 1. Use a model answer for scoring restricted-response questions.
G B 2. Evaluate all answers on a student’s paper before doing the next paper.
G B 3. Review a student's scores on earlier tests before reading the answers.
G B 4. Score content and writing skills separately.
G B 5. Use the rating method for scoring extended-response questions.
G B 6. Lower the score one point for each misspelled word.
LEARNING GOAL: Prepares a list of points for scoring essay questions.
Directions: List five do’s and five don’ts to serve as a guide for scoring essay tests.
DO:
1.
2.
3.
4.
5.
DON'T:
1.
2.
3.
4.
5.
Note: Answers will vary.
112
Answers to Student Exercises
10-A 10-E
1. D 1. G
2. A 2. B
3. A 3. B
4. A 4. G
5. D 5. G
6. A 6. B
113
Chapter 10
Measuring Complex Achievement: Essay Questions
1. The use of some essay questions in a classroom test will probably improve the assessment’s
A. objectivity.
B. practicality.
C. reliability.
D. validity.
2. For which of the following types of learning outcomes is the essay item most useful?
A. Application
B. Comprehension
C. Knowledge
D. Synthesis
3. Which of the following is useful for improving essay testing?
A. providing unlimited time for student responses
B. permitting students to choose among optional questions
C. determining the scoring procedures in advance
D. scoring the answers while looking at students’ names
4. Which of the following characteristics is shared by both objective tests and essay tests?
A. Both are efficient for measuring knowledge of specific facts.
B. Both are useful in both formative and summative assessment
C. Both provide for extensive sampling of content.
D. The reliability of scoring is high for both.
5. One major problem in using essay questions to evaluate learning is that they are difficult to
A. administer.
B. construct.
C. interpret.
D. score.
6. Essay questions should be used in achievement tests when
A. a wide sampling of material is desired.
B. knowledge of factual information is stressed.
C. little time is available for scoring.
D. organizing and integrating ideas is important.
7. Essay questions are more appropriate than multiple-choice items when the learning outcome
calls for which of the following?
A. development of an argument
B. identification of concepts
C. interpretation of data
D. recognition of relationships
114
8. Essay questions are more appropriate than objective items when measuring the ability
to do which of the following?
A. identify the importance of information
B. integrate information
C. interpret information
D. recall information
9. Which of the following is a serious limitation of an essay test?
A. difficulty of construction
B. limited sampling
C. lack of validity
D. susceptibility to cheating
10. An extended-response essay question is better than a restricted-response question if
A. administration time is limited.
B. complex learning outcomes are being assessed.
C. questions cannot be phrased clearly.
D. reliable scoring is of special importance.
11. A restricted-response essay question is better than an extended-response question if
A. creativity is desired in the response.
B. scoring is to be done by the rating method.
C. the task involves a global approach to problem solving.
D. specific information needs to be supplied by the student.
12. For which of the following learning outcomes would objective items be better than essay
questions?
A. Identifying the meaning of concepts.
B. Relating concepts to form a theory.
C. Synthesizing the arguments in favor of a proposal.
D. Using concepts in solving problems.
13. Which of the following is considered a sound essay grading procedure?
A. assigning points to restricted-response answers
B. using the rating method for extended-response questions
C. using a separate score for spelling errors
D. grading each student’s complete paper before doing the next one
14. Deducting points for neatness on essay responses will have the greatest influence on which of
the following?
A. objectivity
B. reliability
C. validity
D. writing ability
15. Which of the following is a desirable practice for scoring restricted-response essay
questions?
A. Give extra credit for brief answers.
B. Lower the score if bluffing is detected.
C. Prepare a model answer in advance.
D. Use the rating method of scoring.
16. Which of the following test construction procedures is most likely to result in valid responses
to extended-response essay questions?
A. Clearly indicate the nature of the desired answer.
B. Set brief time limits for each question to restrict “bluffing.”
C. Write questions that can be answered in a few sentences.
D. Write questions that are limited to the recall of factual information, but cover a
broad range of topics.
17. Which of the following is a desirable practice for grading essay answers?
A. Grade content and spelling separately.
B. Mark an answer with a zero if bluffing is detected.
C. Read the essays of the highest performing students first to establish the scoring
standard.
D. Score a student's answer in light of what is known about his or her past
achievement.
18. In order to assess students’ ability to express themselves creatively in writing, it would be
best to use which of the following?
A. an objective test of writing skill
B. extended-response essay questions
C. restricted-response essay questions
D. themes and other writing assignments
19. Presenting students with several essay questions and permitting them to choose any two of
them to respond to is an undesirable testing practice for which of the following reasons?
A. it encourages students to write more on each question
B. it is difficult for students to adequately prepare for the test
C. the basis of comparing students is undermined
D. students do not have an opportunity to demonstrate all that they have learned
20. Which of the following would most likely improve the scoring of essay answers on a test?
A. having a second competent scorer grade the papers
B. including errors in grammar in the total score
C. looking at the student's name before reading each paper
D. reading answers of students who typically are high achieving first
21. Discuss three differences between restricted-response and extended-response essay
questions.
22. Discuss three differences between analytic and holistic scoring.
Chapter 10: Answer Key
1. D
2. D
3. C
4. B
5. D
6. D
7. A
8. B
9. B
10. B
11. D
12. A
13. B
14. C
15. C
16. A
17. A
18. B
19. C
20. A
21. Restricted-response questions closely circumscribe both the content and the manner of
answering that students should use in their essay answers. They keep students from
answering the question in a tangential or irrelevant fashion and are easier to score.
Extended-response essays give students much more latitude in framing their answers. They
are more general than restricted-response questions and carry fewer instructions.
22. Holistic scoring involves reading the essay in its entirety and giving it an overall, or holistic,
score based on that reading. Analytic scoring requires that the essay be read and scored in
sections, with the section subtotals added before an overall essay grade is assigned. Analytic
scoring usually results in more reliable scoring.
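The contrast between holistic scoring (one overall judgment) and analytic scoring (section subtotals summed into a grade) can be sketched in a few lines of code. This is an illustrative sketch, not part of the manual; the rubric sections, point values, and function names are hypothetical.

```python
# Hypothetical sketch contrasting holistic and analytic essay scoring.

def holistic_score(overall_rating, max_points=20):
    """One overall judgment, e.g. a 1-5 rating scaled to the item's point total."""
    return round(overall_rating / 5 * max_points)

def analytic_score(section_points):
    """Section subtotals are summed to produce the overall essay grade."""
    return sum(section_points.values())

# Example: a rubric with four (hypothetical) sections.
essay = {"thesis": 4, "evidence": 6, "organization": 5, "mechanics": 3}
print(analytic_score(essay))   # 18
print(holistic_score(4))       # 16
```

The analytic version also preserves the per-section subtotals, which is why the text notes it is the more reliable (and more diagnostic) procedure.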
Chapter 11 Measuring Complex Achievement: Performance-Based Assessments
Exercise 11-A
CHARACTERISTICS OF PERFORMANCE-BASED ASSESSMENT TASKS
LEARNING GOAL: Identifies the advantages and limitations of performance-based assessment
tasks.
Directions: The following statements compare performance-based assessment (PBA) tasks to
objective items with regard to some specific characteristic or use. Indicate whether test
specialists would agree (A) or disagree (D) with the statement by circling the appropriate letter.
A D 1. PBA tasks are more likely to be used for higher-level learning objectives.
A D 2. PBA tasks provide a better means of assessing the breadth of a student's
knowledge.
A D 3. PBA tasks are preferred when teachers are measuring the process that a student
uses to solve a problem.
A D 4. PBA tasks measure a more limited sampling of behavior.
A D 5. PBA tasks are more suitable for measuring the ability to solve ill-structured
problems.
A D 6. PBA tasks provide more reliable scores.
LEARNING GOAL: Lists the characteristics of an effective performance-based assessment task.
Directions: List the important characteristics of each type of performance-based assessment task.
Restricted-response task:
Extended-response task:
Note: Answers will vary.
Exercise 11-B
CONSTRUCTING RESTRICTED-RESPONSE PERFORMANCE-BASED
ASSESSMENT TASKS
LEARNING GOAL: Constructs simple restricted-response performance-based assessment tasks.
Directions: Construct two restricted-response performance-based assessment tasks for a grade
and subject matter of your choice. Include a description of the directions to students and the
criteria to be used in judging their performances.
1. Directions:
Task:
Scoring criteria:
2. Directions:
Task:
Scoring criteria:
Note: Answers will vary.
Exercise 11-C
CONSTRUCTING EXTENDED-RESPONSE PERFORMANCE-BASED ASSESSMENT
TASKS
LEARNING GOAL: Constructs an extended-response performance-based assessment task.
Directions: Construct an extended-response performance-based assessment task involving
problem solving for a grade and subject matter of your choice. The task should require students
to decide on an approach to solving the problem, identify or gather relevant information, and
integrate that information to produce a product. Include a description of the directions to students
and the criteria to be used in judging their performances.
Directions:
Task:
Scoring:
Note: Answers will vary.
Exercise 11-D
RATING SCALES
LEARNING GOAL: Distinguishes between desirable and undesirable practices in using rating
scales.
Directions: Indicate whether each of the following statements describes a desirable (D) practice
or an undesirable (U) practice in using rating scales with performance-based assessments by
circling the appropriate letter.
D U 1. The descriptive graphic rating scale should be favored over the
numerical scale.
D U 2. In rating performance, derive the characteristics to be rated from
the list of learning objectives.
D U 3. Use at least ten points on each scale to be rated.
D U 4. Use holistic rating procedures to provide students with diagnostic
feedback.
D U 5. Separate ratings of secondary characteristics, such as neatness, from
ratings of accomplishment of primary learning objectives.
D U 6. Communicate the criteria to be used in judging performances to
students.
LEARNING GOAL: Constructs items for a rating scale.
Directions: Prepare two items for a descriptive graphic rating scale to be used in assessing some
type of student performance or some product produced by the student. Do not use sample items
from your textbook.
Note: Answers will vary.
Exercise 11-E
CHECKLISTS
LEARNING GOAL: Distinguishes between desirable and undesirable practices in using
checklists.
Directions: Indicate whether each of the following statements describes a desirable (D) practice
or an undesirable (U) practice in using checklists by circling the appropriate letter.
D U 1. Use a checklist wherever frequency of occurrence is an important element in the
assessment.
D U 2. When assessing performance with a checklist, include in that checklist both
desired actions and common errors.
D U 3. Use a checklist for assessing some products.
D U 4. Use a checklist to determine if steps in performance were completed in proper
order.
D U 5. Use a checklist for assessing process but not for assessing student products.
D U 6. Avoid the use of checklists in assessing process.
LEARNING GOAL: Constructs a performance checklist.
Directions: Prepare a brief checklist for some simple performance-based assessment task.
Include directions describing how to respond.
Note: Answers will vary.
Answers to Student Exercises
11-A 11-D 11-E
1. A 1. U 1. U
2. A 2. D 2. D
3. A 3. U 3. D
4. A 4. U 4. D
5. A 5. D 5. U
6. D 6. D 6. U
Chapter 11
Measuring Complex Achievement: Performance-Based Assessments
1. For which of the following types of learning outcomes are performance-based assessment
tasks most useful?
A. Comprehension of concepts
B. Distinguishing fact from opinion
C. Knowledge of appropriate procedures
D. Problem solving
2. Performance-based assessments are more effective than multiple-choice items in measuring
which of the following?
A. the ability to formulate problems
B. the ability to recognize faulty procedures
C. reliability of scoring
D. understanding of concepts
3. Which of the following validity considerations has led to the strongest argument for the
increased use of performance-based assessments?
A. Content
B. Test-criterion relationship
C. Construct
D. Consequences
4. An advantage of performance-based assessments over objective tests is that they can be used
to evaluate which of the following?
A. the attitudes of students
B. both process and product
C. reading skills
D. strengths and weaknesses
5. An advantage of performance-based assessments over objective tests is that they
A. are easier to construct.
B. are easier to score.
C. can better communicate instructional goals requiring complex problem solving.
D. can provide coverage of a broader array of instructional objectives in a given period
of time.
6. Which of the following is a limitation of performance-based assessments?
A. alignment to state standards
B. ability to measure higher-order learning
C. lengthy administration time
D. the ability to assess all learners accurately
7. A major advantage of developing criteria for judging performance prior to task
administration is that the criteria can
A. help students understand what is expected.
B. increase students’ appreciation of the task.
C. reduce the scoring burden for teachers.
D. restrict the range of student performances.
8. Restricted-response performance-based assessments are better than extended-response
assessments under which of the following conditions?
A. when the problems are unstructured and allow for multiple solutions
B. when the problems call for originality
C. when the problems require the integration of information from several sources
D. when the problems are structured so that model performances can be constructed
9. Extended-response performance-based assessments are better than restricted-response
assessments under which of the following conditions?
A. when the administration time is limited
B. if a broad sampling of the domain of content is desired
C. when a measure of the ability to gather and integrate information is needed
D. if reliable scoring is of special importance
10. Which of the following would be the best justification for the relatively large amount of time
required to respond to many performance-based assessment tasks?
A. students and parents like them
B. multiple scores can be derived from a single task
C. performance on one task generalizes well to performance on other tasks
D. the tasks can provide students with valuable learning opportunities
11. The dependence of task performance on skills that are irrelevant to the intended purpose of
the assessment tasks (e.g., reading skill for some mathematics tasks) will have the biggest
negative influence on which of the following factors?
A. variety of student performances
B. reliability
C. consistency in scoring
D. validity
12. The most reliable grading of task performances is likely to result when a teacher does which
of the following?
A. grades all performances on one task before going to the next one
B. looks at a student's name before grading the performance
C. starts with performances of the lower performing students
D. uses the rating method of scoring for all tasks
13. The points on a rating scale will be least ambiguous when using which of the following
scales?
A. constant alternative
B. descriptive graphic
C. evaluative
D. numerical
14. The most objective information is obtained from an assessment when rating which of the
following?
A. adjustment
B. attitudes
C. overt behavior
D. personality traits
15. Rating the performances of students who are perceived to be most able higher than
comparable performance of students perceived to be less able is an example of which of the
following errors?
A. central tendency error
B. halo effect
C. mathematical-logical error
D. severity error
16. A teacher who rates all performances lower than they are rated by other teachers
demonstrates an example of which of the following errors?
A. central tendency error
B. halo effect
C. mathematical-logical error
D. severity error
17. Which of the following statements best describes an instance of a logical rating error?
A. A rater gives a lower rating to a student who has obtained low scores on previous
tests and assessments than to the same performance by other students.
B. A rater gives low scores to the performance of the class “clown” because of the
rater’s belief that someone who acts so silly in class could not perform well.
C. A rater rates all performances as about average.
D. A rater uses only the high end of the scale in rating all performances.
18. When rating the products of student performances, it is best for educators to do which of the
following?
A. grade all the products produced by a given student before moving on to the next
student
B. identify the student so that background knowledge can be considered
C. include judgments of neatness of the product in the overall rating
D. rate performances on one task for all students before rating performance on
another task
19. Analytic scoring is better than holistic scoring when an educator is trying to
A. increase the efficiency of scoring.
B. provide diagnostic feedback to students.
C. remove sources of unreliability.
D. increase overall validity.
20. A checklist should be used in cases where the judgment is based on which of the following?
A. a matter of degree
B. a total impression
C. present or absent decisions
D. ambiguous criteria
21. Describe the advantages and limitations for using performance-based assessments.
22. Describe the types of rating scales used in performance assessment. Contrast a rating scale
with a checklist.
Chapter 11: Answer Key
1. D
2. A
3. D
4. B
5. C
6. C
7. A
8. D
9. C
10. D
11. D
12. A
13. B
14. C
15. B
16. D
17. B
18. D
19. B
20. C
21. A major advantage of performance assessments is that they can clearly communicate
instructional goals that involve complex performances in natural settings in and outside of
school. A second advantage of performance assessments is that they can measure complex
learning outcomes that cannot be measured by other means. A third advantage of
performance assessments is that they provide a means of assessing process or procedure as
well as the product that results from performing a task. Finally, a fourth advantage of
performance assessments is that they implement approaches that are suggested by modern
learning theory. Regarding limitations, the most commonly cited limitation of performance
assessments is the unreliability of ratings of performances across teachers or across time for
the same teacher. A second limitation of performance assessments is that they are time-
consuming.
22. The type of rating scale most often used in performance assessments is the numerical rating
scale. With this type of scale, the rater checks or circles a number to indicate the degree to
which a characteristic is present. Another type is the graphic rating scale, whose
distinguishing feature is that a horizontal line follows each characteristic. The rating is made
by placing a check on the line. A set of categories identifies specific positions along the line,
but the rater is free to check between these points. Checklists differ from rating scales in that
checklists are of the yes-no variety and thus offer only two possible choices.
Chapter 12 Portfolios
Exercise 12-A
PURPOSES
LEARNING GOAL: Identifies and distinguishes among the major purposes of portfolios.
Directions: The poles of four dimensions distinguishing the purposes of portfolios are listed
below. Describe the ways in which the purposes of portfolios at the ends of each continuum
differ.
1. a. Instruction
b. Assessment
2. a. Current accomplishments
b. Progress
3. a. Showcase
b. Documentation
4. a. Finished
b. Working
Note: Answers will vary.
Exercise 12-B
STRENGTHS AND WEAKNESSES
LEARNING GOAL: Identifies major strengths and weaknesses of using portfolios of student
work for particular purposes.
Directions: Identify four strengths and three weaknesses of portfolios when used for purposes of
assessment.
Strengths:
1.
2.
3.
4.
Weaknesses:
1.
2.
3.
Note: Answers will vary.
Exercise 12-C
GUIDELINES FOR PORTFOLIO ENTRIES
LEARNING GOAL: Constructs sample guidelines for entries in a portfolio designed for a
specified assessment purpose.
Directions: Assuming that a portfolio is intended for assessing a student’s progress in writing
during the school year, construct sample guidelines for entries in a portfolio dealing with each of
the four issues below.
Guidelines of Uses of Portfolio:
Guidelines Regarding Access to Portfolio:
Guidelines on Portfolio Construction and Entries:
Guidelines on the Criteria for Evaluation of the Portfolio:
Note: Answers will vary.
Exercise 12-D
EVALUATION CRITERIA
LEARNING GOAL: Identifies characteristics of effective criteria for evaluating student
portfolios.
Directions: Indicate whether measurement specialists would agree (A) or disagree (D) with
each of the following statements concerning criteria for evaluating student portfolios by
circling the appropriate letter.
A D 1. Fairness is enhanced by clear specifications of evaluation criteria.
A D 2. Reliability of scores assigned to portfolios is enhanced by using holistic evaluation
criteria for the portfolio as a whole rather than criteria for individual entries.
A D 3. It is desirable for students to include self-evaluations of their work with their
portfolio entries.
A D 4. Analytic criteria are more useful for summative evaluations than for formative
evaluations of portfolio entries.
A D 5. Evaluation criteria should be communicated to students in the guidelines provided to
students for constructing their portfolios.
A D 6. Evaluation criteria, while necessary for portfolios used for assessment purposes, are
not needed for portfolios used for instructional purposes.
LEARNING GOAL: Constructs evaluation criteria for entries in a portfolio used to display best
works.
Directions: Construct a set of evaluation criteria to be used for a portfolio designed to be a
showcase of a student’s best work in a subject area of interest to you.
Note: Answers will vary.
Exercise 12-E
PORTFOLIO CONSTRUCTION
LEARNING GOAL: Constructs a showcase portfolio for a class.
Directions: For a class of your choice, construct a showcase portfolio to demonstrate your best
work in that subject area.
Note: Answers will vary.
Chapter 12
Portfolios
1. Student portfolios are distinguished from file folders of work in that portfolios are
characterized as _________ collections of student work.
A. artistic
B. comprehensive
C. graded
D. purposeful
2. One of the strengths of portfolios that makes them appealing to many teachers is
A. the efficiency and time savings they provide to teachers.
B. the ease with which they can be integrated with instruction.
C. their high reliability.
D. their uniformity of work for purposes of grading.
3. A potential weakness in using portfolios for purposes of student assessment is that they
A. are used in parent conferences.
B. frequently include only “best work” entries.
C. lack standardization needed for comparability.
D. often include student self-evaluations of their work.
4. Portfolios of student work can be especially useful in parent-teacher conferences because
they provide parents with which of the following?
A. a complete record of student work
B. concrete examples of student accomplishments
C. grades on each entry in the portfolio
D. reliable scores that are easily understood
5. Portfolios involving student collaboration on entries are most readily justified when
portfolios are used for purposes of
A. assessment.
B. grading.
C. instruction.
D. job applications.
6. Which of the following is a major obstacle to the effective use of portfolios?
A. they are labor intensive
B. they frequently involve student collaboration
C. they include only examples of a student’s best work
D. they are unpopular with students
7. Which of the following is a common misperception of portfolios?
A. they can be used for communication with parents
B. they consist of a haphazard collection of student work
C. they include student self-evaluations of their work
D. they require a clear specification of purpose
8. Comparability of the selections included in a portfolio is of greatest concern when portfolios
are used for which of the following?
A. assignment of course grades
B. communication with parents
C. feedback to students
D. instructional purposes
9. The types of work that are appropriate to include in a portfolio should be
A. completely open to allow for student creativity.
B. determined solely by the student.
C. specified in the portfolio guidelines.
D. the same for all types of portfolios.
10. Which of the following evaluation criteria is most useful when using analytic scoring criteria
for individual portfolio entries?
A. formative evaluation
B. parent teacher conferences
C. student self analysis
D. summative evaluation
11. Which of the following goals is least likely to be effectively achieved using portfolio
assessments?
A. communication with parents
B. demonstration of progress in achievement
C. student self-reflection concerning performance
D. the assessment of factual knowledge
12. The assignment of summative grades to students based on portfolio assessments is best done
with the use of which of the following?
A. analytic evaluation criteria
B. holistic evaluation criteria
C. peer evaluations
D. student self evaluations
13. The inclusion of collaborative assessment tasks in portfolios is particularly useful when
portfolios are used primarily for purposes of
A. assessment
B. communication with parents
C. grading
D. instruction
14. Reliability in rating portfolios is enhanced by which of the following?
A. clearly specified evaluation criteria
B. collaborative assessment tasks
C. student freedom to decide on the types of work to include
D. the inclusion of drafts and peer comments as well as final copy
15. Rescoring of portfolios by persons other than a student’s teacher is most likely to be needed
when portfolios are used for which of the following?
A. assigning grades to students
B. formative evaluation of student work
C. interrater reliability
D. reporting achievement to parents
16. Many of the rubric rules for grading performance assessments also apply to grading
portfolios.
A. True
B. False
17. It is best to evaluate portfolios on their appearance.
A. Agree
B. Disagree
18. Which of the following is an advantage of communicating portfolios results with parents?
A. it allows parents to have input into classroom curriculum
B. it usually results in parents liking the teacher better
C. it is used as evidence to retain a given student
D. it gives parents insight into what goes on in classrooms
19. Which of the following is a particular issue that educators need to consider when portfolios
are used for group projects?
A. Will the work be best work or work in progress?
B. Will students receive an individual grade or a group grade?
C. Should spelling and grammar count?
D. Are all of the children reading at grade level?
20. Which of the following words best describes portfolios?
A. accidental
B. systematic
C. occasional
D. voluntary
21. Describe the major advantages and limitations of portfolios.
22. Why are portfolios useful tools in communicating student progress to parents?
Chapter 12: Answer Key
1. D
2. B
3. C
4. B
5. C
6. A
7. B
8. A
9. C
10. A
11. D
12. B
13. D
14. A
15. C
16. A
17. B
18. D
19. B
20. B
21. Advantages of portfolios include: they integrate readily with instruction; they give students
an opportunity to show what they can do; they encourage students to become reflective
learners; and they help students take responsibility for setting goals and evaluating their
progress. Other advantages: they provide teachers and students with opportunities to
collaborate and reflect on student progress; they are an effective way of communicating with
parents by showing concrete examples of student work and demonstrations of progress; they
provide a mechanism for student-centered and student-directed conferences with parents; and
they give parents concrete examples of a student's development over time as well as current
skills. Among their limitations: they are difficult to score, they often suffer from poor
interrater reliability, they are more difficult to construct than they at first appear, and it is
difficult to convert portfolio assessments to summative grades.
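The interrater-reliability limitation can be made concrete with a simple agreement index: the proportion of portfolios on which two raters assign the same holistic score. This is an illustrative sketch, not from the manual; the ratings and function name are hypothetical, and operational programs typically use stronger indices (e.g., chance-corrected agreement).

```python
# Hypothetical sketch: exact-agreement rate between two raters' portfolio scores.

def percent_agreement(rater_a, rater_b):
    """Proportion of cases where the two raters give identical scores."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

# Example: two teachers rating the same six portfolios on a 1-5 scale.
teacher_1 = [4, 3, 5, 2, 4, 3]
teacher_2 = [4, 3, 4, 2, 5, 3]
print(percent_agreement(teacher_1, teacher_2))  # 4 of 6 match, i.e. about 0.67
```

Low agreement rates like this one are the practical face of the scoring-reliability problem noted above, and are the reason clearly specified evaluation criteria and rater training matter.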
22. Portfolios provide an excellent means of communicating with parents. The products and
student self-reflections can give parents a window into the classroom and a more intimate
basis for seeing aspects of their children's experiences in school. Portfolios can also serve as
a vehicle for student-directed conferences involving students, parents, and teachers. The
specifics of the portfolio provide a framework for meaningful three-way discussions of the
student's achievements, progress, and areas to work on next. Parents' comments on specific
entries and on the overall portfolio can also contribute to and become part of the portfolio.
Chapter 13
Assessment Procedures: Observational Techniques, Peer Appraisal, and Self-Report
Exercise 13-A
ANECDOTAL RECORDS
LEARNING GOAL: Distinguishes between desirable and undesirable practices in using
anecdotal records.
Directions: Indicate whether each of the following statements describes a desirable (D) practice
or an undesirable (U) practice in using anecdotal records by circling the appropriate letter.
D U 1. Confine observations to areas that can be verified by objective testing.
D U 2. Keep factual descriptions of incidents and interpretations of them separate.
D U 3. Limit each anecdote to a single incident.
D U 4. Limit each anecdote to the behavior of only one student.
D U 5. Wait until after school hours to record the observed incidents.
D U 6. Record both positive and negative incidents.
LEARNING GOAL: Writes an anecdotal record on an incident.
Directions: Briefly observe some aspect of student performance (e.g., speaking, playing a game)
and write an anecdotal record of the incident.
Note: Answers will vary.
Exercise 13-B
USE OF PEER APPRAISAL AND SELF-REPORT TECHNIQUES
LEARNING GOAL: Selects the most appropriate technique for a particular use.
Directions: Indicate which technique is most appropriate for each of the uses listed below by
circling the appropriate letter. Use the key below.
KEY G = “Guess who” technique, S = Sociometric technique,
A = Attitude scale, I = Interest inventory
G S A I 1. To analyze the social structure of a group.
G S A I 2. To aid in selecting reading material for a poor reader.
G S A I 3. To determine the reputation a student holds among his or her classmates.
G S A I 4. To see how accurately students rate peers' talents and abilities.
G S A I 5. To determine how well a particular student is accepted by his or her
classmates.
G S A I 6. To aid students in career planning.
LEARNING GOAL: States the advantages and disadvantages of peer appraisal and self-report
techniques.
Directions: Briefly state one advantage and one disadvantage of each of the following
techniques.
Peer appraisal:
Self report:
Note: Answers will vary.
Exercise 13-C
GUESS WHO TECHNIQUE
LEARNING GOAL: Distinguishes between desirable and undesirable practices in using the
guess who technique.
Directions: Indicate whether each of the following statements describes a desirable (D) practice
or an undesirable (U) practice in using the "guess who" technique by circling the appropriate
letter.
D U 1. Use only clearly favorable behavior descriptions.
D U 2. Have students write as many names as they wish for each behavior description.
D U 3. Permit students to name a person for more than one behavior description.
D U 4. Have students respond by using first name and initial of last name.
D U 5. Use the "guess who" technique for evaluating personal and social development
only.
D U 6. Score the responses by counting the number of nominations a student
receives on each behavior description.
LEARNING GOAL: Constructs items for a “guess who” form.
Directions: List six statements that could be used in a "guess who" form for evaluating students'
"study and work habits."
Note: Answers will vary.
Exercise 13-D
SOCIOMETRIC TECHNIQUE
LEARNING GOAL: Distinguishes between desirable and undesirable practices in using the
sociometric technique.
Directions: Indicate whether each of the following statements describes a desirable (D) practice
or an undesirable (U) practice in using the sociometric technique by circling the appropriate
letter.
D U 1. Students should identify themselves on sociometric assessment instruments.
D U 2. The situations used in sociometric choosing should be ones in which all students are
equally free to participate.
D U 3. Students should be told to state their first choice only, in order to simplify the
tabulation of results.
D U 4. The plotted sociogram should show the social position of each student and the social
pattern of the group.
D U 5. Each student should be shown, in an individual conference, his or her place on the
sociogram.
D U 6. Sociometric choices should be used to assess the influence of school practices on
students' social relations.
LEARNING GOAL: Constructs items for a sociometric form.
Directions: List three choice situations to be used on a sociometric form. The situations should
be suitable for the grade level at which they will be used. Indicate the grade level. Do not use any
sample items from your textbook.
Note: Answers will vary.
Exercise 13-E
ATTITUDE MEASUREMENT
LEARNING GOAL: Distinguishes between desirable and undesirable practices in using a
Likert-type attitude scale.
Directions: Indicate whether each of the following statements describes a desirable (D) practice
or an undesirable (U) practice in using a Likert-type attitude scale by circling the appropriate
letter.
D U 1. Use only clearly favorable and unfavorable attitude statements.
D U 2. Have students write statements for use in the attitude scale.
D U 3. Use seven or more scale choices for each question.
D U 4. Have students respond by indicating how strongly they agree or disagree.
D U 5. Use a group of judges to obtain scoring weights.
D U 6. Have students put their names on the attitude scale.
LEARNING GOAL: Constructs a Likert-type attitude scale.
Directions: List six statements that could be used to measure student attitudes toward testing
according to a Likert-type scale. Include a place to respond and the scoring weights for each
item.
Note: Answers will vary.
Answers to Student Exercises
13-A 13-B 13-C 13-D 13-E
1. A 1. S 1. U 1. U 1. D
2. D 2. I 2. D 2. D 2. U
3. D 3. G 3. D 3. U 3. U
4. U 4. G 4. D 4. D 4. D
5. U 5. S 5. U 5. U 5. D
6. D 6. I 6. D 6. D 6. U
Chapter 13
Assessment Procedures: Observational Techniques, Peer Appraisal, and Self-Report
1. Anecdotal records can be made most useful by observing a student under which of the
following conditions?
A. at the same time each day
B. for the same amount of time for each observation
C. in various situations
D. under standardized conditions
2. Which of the following is the most serious limitation in the use of anecdotal records in the
classroom?
A. the lack of opportunities to observe
B. the lack of student cooperation
C. possible bias in the observations
D. the need for several observers
3. When writing anecdotal records, educators should try to include an objective record of
students’
A. attitudes.
B. motivation.
C. unique behavior.
D. values.
4. Anecdotal records are best for obtaining information about which of the following student
skills?
A. mathematical
B. social
C. science
D. writing
5. The value of anecdotal records can be improved by recording student behavior
A. after school, when a complete record can be written.
B. in positive terms only.
C. occurring in a variety of situations.
D. on cards instead of a note pad.
6. Which of the following methods is best for highlighting evidence of exceptional or atypical
student behavior on the playground?
A. Anecdotal record
B. Checklist
C. Peer appraisal
D. Self-appraisal
7. Peer appraisal methods are especially useful in which of the following areas?
A. attitudes
B. interests
C. performance skills
D. social skills
8. Self-report techniques are most useful when
A. frank responses are given.
B. the items are in question form.
C. they are scored objectively.
D. they are interpreted by counselors.
9. Students make nominations to fit behavior descriptions when using which of the following
techniques?
A. guess who
B. paired-comparison
C. sociometric
D. self-report
10. The results of the guess who technique should be interpreted as evidence of how students are
A. treated by others.
B. viewed by others.
C. feeling about school activities.
D. feeling about themselves.
11. The self-report technique is likely to provide valid evidence when used to assess students'
A. achievement.
B. interests.
C. personal adjustment.
D. personality traits.
12. Which of the following best describes the attitude scale assessment?
A. an objective test
B. a peer-appraisal method
C. a projective test
D. a self-report method
13. A Likert-type attitude scale should include which of the following characteristics?
A. clearly favorable and unfavorable statements
B. questions at progressing difficulty levels
C. questions measuring personal abilities and skills
D. statements covering negative attitudes
14. Which of the following statements would be best for a Likert-type attitude scale?
A. Reading helps you study better.
B. Reading is exciting.
C. Reading is one of the basic skills.
D. Some students like to read.
15. Which of the following numerical choices should a Likert scale possess?
A. 1–2
B. 2–3
C. 3–5
D. 7 or more
16. Which of the following is assumed in a student self-assessment?
A. objectivity
B. telling the truth
C. writing ability
D. long-term memory
17. A student's response to each set of three items on a Kuder General Interest Survey indicates
which of the following results?
A. disliking of each item
B. liking, indifference, or disliking of each item
C. liking of each set of 3 items in comparison to other sets
D. ranking of items in each set
18. The routine use of personality inventories in the school has declined primarily because of
which of the following factors?
A. difficulty of scoring
B. invasion of privacy issues
C. unreliability of the scores
D. wider use of projective techniques
19. It takes special training to appropriately administer and score projective personality
assessments.
True
False
20. Projective personality assessments are usually assessed with a Likert scale.
Agree
Disagree
21. Describe the advantages and limitations for using anecdotal records.
22. Describe the characteristics of a Likert scale. For what purpose might it be used?
Chapter 13: Answer Key
1. C
2. C
3. C
4. B
5. C
6. A
7. D
8. A
9. A
10. B
11. B
12. D
13. A
14. B
15. C
16. B
17. D
18. B
19. True
20. Disagree
21. Probably the most important advantage of anecdotal records is that they depict actual
behavior in natural situations. Anecdotal records also allow for descriptions of the most
characteristic behavior of a student, and they facilitate gathering evidence on events that are
exceptional but significant. Anecdotal records can also be used with very young students and
with students who have limited basic communication skills. One limitation of anecdotal
records is the amount of time required to maintain an adequate system of records. Another
serious limitation is the difficulty of being objective when observing and reporting student
behavior. A third difficulty is obtaining an adequate sample of behavior, which can
negatively impact validity.
22. A Likert scale is a self-report method giving clearly favorable or unfavorable attitude
statements; it asks the students to respond to each statement. Most Likert scales use a five-
point system: strongly agree (SA), agree (A), undecided (U), disagree (D), and strongly
disagree (SD). Likert scales are most commonly used to assess attitudes.
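The five-point scoring described in the answer above can be sketched in code. The statements and weights below are hypothetical illustrations, not items from the text; reverse-keying unfavorable statements (SA = 1 … SD = 5) is the standard Likert convention implied by the exercise's "scoring weights."

```python
# Minimal sketch of Likert-type scoring (hypothetical items, not from the text).
# Favorable statements are weighted SA=5 ... SD=1; unfavorable statements are
# reverse-keyed (SA=1 ... SD=5), the usual Likert convention.

WEIGHTS = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}

def score_response(response, favorable=True):
    """Return the scoring weight for one response to one statement."""
    w = WEIGHTS[response]
    return w if favorable else 6 - w

def total_score(responses, keys):
    """Sum item weights; `keys` marks each statement as favorable (True) or not."""
    return sum(score_response(r, k) for r, k in zip(responses, keys))

# A student who strongly agrees with a favorable statement and strongly
# disagrees with an unfavorable one earns the maximum weight (5) on both.
responses = ["SA", "SD", "A"]
keys = [True, False, True]   # statements 1 and 3 favorable, 2 unfavorable
print(total_score(responses, keys))  # 5 + 5 + 4 = 14
```

Summing the keyed weights across all statements gives the attitude score; higher totals indicate a more favorable attitude.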
Chapter 14
Assembling, Administering, and Appraising Classroom Tests and Assessments
Exercise 14-A
REVIEWING AND ARRANGING ITEMS AND TASKS IN CLASSROOM TESTS AND
ASSESSMENTS
LEARNING GOAL: Distinguishes between good and bad practices in reviewing and arranging
test items and assessment tasks.
Directions: Indicate whether each of the following statements describes a good (G) practice or a
bad (B) practice in reviewing items and tasks and arranging them in classroom tests and
assessments by circling the appropriate letter.
G B 1. Have another teacher review the items and tasks for defects.
G B 2. Recheck relevance to the specific learning outcome when reviewing an item or
task.
G B 3. During item and task review, remove any racial or sexual stereotyping.
G B 4. Group items by type (e.g., multiple-choice, true-false, etc.).
G B 5. Intersperse true-false items among multiple-choice items.
G B 6. Put easy items last to maintain student motivation.
LEARNING GOAL: Prepares a list of points for reviewing and arranging test items and
assessment tasks.
Directions: Make a list of four do’s and four don’ts to serve as a guide for reviewing and
arranging test items and assessment tasks.
DO
1.
2.
3.
4.
DON'T
1.
2.
3.
4.
Note: Answers will vary.
Exercise 14-B
PREPARING TEST DIRECTIONS
LEARNING GOAL: Prepares sample directions for a classroom test.
Directions: Prepare a complete set of directions for a test in a specific subject. Assume that the
general directions for the test as a whole and the specific directions for each item type are all for
the same test.
General directions:
Short-answer items:
True-false items:
Multiple-choice items:
Matching items:
Note: Answers will vary.
Exercise 14-C
ADMINISTERING AND SCORING CLASSROOM TESTS AND ASSESSMENTS
LEARNING GOAL: Distinguishes between good and bad practice in administering and scoring
classroom tests and assessments.
Directions: Indicate whether each of the following statements describes a good (G) practice or a
bad (B) practice in administering and scoring classroom tests and assessments by circling the
appropriate letter.
G B 1. Students are told whether there is a correction for guessing.
G B 2. Students answer every item, so their scores are corrected for guessing.
G B 3. Students are told repeatedly how important this test is to their grade.
G B 4. Students are told to skip items that seem too difficult and come back to them later.
G B 5. The teacher explains the meaning of an ambiguous question to the student who
asked about it.
G B 6. An objective test is scored by counting important items as 1 point and very
important items as 2 points.
LEARNING GOAL: Describes and illustrates the use of the correction-for-guessing formula.
Directions: Describe when the correction-for-guessing formula should and should not be used
for classroom tests and compare the corrected scores for the given data.
Use the correction formula when:
Do not use the correction formula when:
Compute the corrected scores on an eight-item, true-false test for the student responses shown
below:
(R = Right, W = Wrong, O = Omit)
Item:    1  2  3  4  5  6  7  8   Corrected Score
Bob      R  R  R  R  R  R  O  O   ________
Sara     R  R  R  R  R  W  R  W   ________
Terry    R  R  R  W  O  R  W  W   ________
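One way to check the computation: for selection-type items with k choices, the usual correction is Score = Rights − Wrongs/(k − 1); for true-false items (k = 2) this reduces to R − W, with omitted items not counted as wrong. A minimal sketch:

```python
# Correction-for-guessing sketch: Score = Rights - Wrongs / (k - 1).
# For true-false items k = 2, so the corrected score is simply R - W.
# Omitted items ("O") are ignored, not counted as wrong.

def corrected_score(responses, k=2):
    rights = responses.count("R")
    wrongs = responses.count("W")
    return rights - wrongs / (k - 1)

for name, resp in [("Bob",   "RRRRRROO"),
                   ("Sara",  "RRRRRWRW"),
                   ("Terry", "RRRWORWW")]:
    print(name, corrected_score(resp))
# Bob 6.0, Sara 4.0, Terry 1.0
```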
Exercise 14-D
APPLICATION OF ITEM ANALYSIS PRINCIPLES TO PERFORMANCE-BASED
ASSESSMENT TASKS
LEARNING GOAL: Applies and interprets item analysis principles with performance-based
assessment tasks.
Directions: A set of eight performance-based assessment tasks was administered to a group of
30 students. Each task was scored on a five-point scale. The total score for the assessment was
the sum of the eight task scores. Total scores for the assessment and scores on the last task of the
assessment are listed below. The scores are listed in order of the total score. Use these data to
analyze the discriminating power of the last task by comparing the performance of the upper and
lower groups of ten students.
Student           1   2   3   4   5   6   7   8   9  10
Total Score      36  35  35  34  33  33  33  32  32  32
Score on Item 8   5   5   4   5   4   4   4   3   5   3

Student          11  12  13  14  15  16  17  18  19  20
Total Score      31  30  30  30  29  29  29  29  28  28
Score on Item 8   4   4   3   3   4   3   3   3   3   2

Student          21  22  23  24  25  26  27  28  29  30
Total Score      27  27  26  25  23  23  21  20  18  16
Score on Item 8   4   3   3   3   3   2   3   2   1   1
Construct an analysis table for Item 8.
Answers to Student Exercises
14-A   14-C
1. G   1. G
2. G   2. B
3. G   3. G
4. G   4. G
5. B   5. G
6. B   6. B
14-D
Score 1 2 3 4 5
Upper 0 0 2 4 4
Lower 2 2 5 1 0
Chapter 14
Assembling, Administering, and Appraising Classroom Tests and Assessments
1. A fellow teacher’s review of test items and assessment tasks is helpful in
A. identifying the objectives measured.
B. improving clarity.
C. improving difficulty.
D. relating items and tasks to instruction.
2. When arranging items in a test, it is best to ensure which of the following?
A. item types are mixed within each section
B. essay questions are placed last
C. difficult items are placed first
D. items are placed randomly
3. If test directions instruct students to “answer every item,” it is not recommended that
educators
A. compute item difficulty.
B. compute item discrimination.
C. correct for guessing.
D. determine split-half reliability.
4. Which of the following is a desirable procedure for reducing student cheating on a test?
A. Correct the test for guessing
B. Do not permit questions during testing
C. Have students turn in all scratch paper
D. Allow students to use class notes
5. The correction-for-guessing formula assumes that student guesses are based on which of the
following?
A. blind choosing
B. incorrect information
C. partial information
D. testwiseness
6. A multiple-choice test contains 100 items, each having a correct answer and three distracters.
Which of the following would be the corrected score for 70 correct and 24 incorrect on the
test using the correction-for-guessing formula?
A. 58
B. 60
C. 62
D. 64
7. The effect of guessing on scores can best be reduced on a multiple-choice test by increasing
which of the following?
A. complexity of the items
B. number of alternatives
C. objectivity of the items
D. use of interpretive exercises
8. A test item has positive discriminating power when answered correctly by
A. all students.
B. more high-scoring students.
C. more low-scoring students.
D. average students.
9. On a test of 50 students, if 25 students answered an item correctly, the item difficulty is
A. 25%.
B. 50%.
C. 75%.
D. 100%.
10. If item analysis data showed that an item was answered correctly by 8 out of 10 students in
the upper group and 6 out of 10 students in the lower group, the difficulty of the test item is
A. 20%.
B. 60%.
C. 70%.
D. 80%.
11. Low discriminating power is acceptable, but only if the item
A. has a 50% level of difficulty.
B. has only three alternatives.
C. is closely related to other items.
D. measures a unique learning outcome.
12. Which of the following is a major limitation of using item analysis procedures with the
typical classroom test?
A. the complexity of the computations
B. the difficulty of the interpretations
C. the small number of students
D. the time needed to collect the data
13. If 8 out of 10 students in the upper group and 2 out of 10 students in the lower group answer
an item correctly, then the difficulty and discriminating power of the item would be
A. 50%, .60.
B. 50%, .80.
C. 60%, .60.
D. 60%, .80.
14. A distracter in a multiple-choice item is judged good if it attracts more students who have
A. cheated on the test.
B. obtained high scores.
C. obtained low scores.
D. marked the items carelessly.
15. The 10 students with the highest scores on a set of 8 assessment tasks had the following
distribution of scores on one of the assessment tasks that was scored on a 3-point scale: 1 (1
student), 2 (5 students), 3 (4 students). Which of the following distributions of scores for the
10 lowest scoring students indicates that the task discriminated negatively?
A. 1 (3 students), 2 (4 students), 3 (3 students)
B. 1 (5 students), 2 (5 students), 3 (0 students)
C. 1 (9 students), 2 (1 student), 3 (0 students)
D. 1 (0 students), 2 (3 students), 3 (7 students)
16. If a test is composed of items that all have high discrimination indexes (based on the total test
score), the test is said to also have
A. content relevance.
B. difficulty.
C. high reliability.
D. high predictive power.
17. Item discriminating power should typically not be interpreted as item validity because
A. item analysis usually is based on a partial sample.
B. item analysis usually uses an internal criterion.
C. item validity is also based on item difficulty.
D. defects in test items lower validity.
18. In a 100-item criterion-referenced test, how many items should have zero difficulty at the end
of instruction?
A. none
B. 10
C. 50
D. 100
19. An item analysis yields the following results for a multiple-choice item (alternative A is the
correct answer):
Number of students in the upper and lower scoring groups on the total test choosing each
alternative.
Alternative A* B C D Omit Total
Upper Group 3 1 5 1 0 10
Lower Group 6 2 0 1 1 10
Which one of the following statements is NOT justified by these results?
A. The item should be reviewed for possible miskeying.
B. The item has negative discrimination.
C. The item has a difficulty of 45%.
D. The item is invalid.
20. It is probably a sound motivational procedure for the teacher to announce that an upcoming
test is high stakes for students.
True
False
21. One good method for discouraging cheating is to proctor an exam by moving around the
room.
Agree
Disagree
22. If paper costs are a factor, educators can reduce the print size of test questions.
True
False
23. It is good practice to not break up a test item and continue it on the next page.
Agree
Disagree
24. It is good practice to have a colleague check test items for bias.
True
False
Chapter 14: Answer Key
1. B
2. B
3. C
4. C
5. A
6. C
7. B
8. B
9. B
10. C
11. D
12. C
13. A
14. C
15. D
16. C
17. B
18. A
19. D
20. True
21. Agree
22. False
23. Agree
24. True
Chapter 15 Grading and Reporting
Exercise 15-A
TYPES OF MARKING AND REPORTING SYSTEMS
LEARNING GOAL: Distinguishes among the characteristics of different types of marking and
reporting systems.
Directions: Indicate which type of marking and reporting system best fits each statement listed
below by circling the appropriate letter, using the following key.
KEY: A = traditional letter grade (A, B, C, D, F), B = Two-letter grade (pass, fail),
C = Checklist of objectives, D = Parent-teacher conference.
A B C D 1. Provides for two-way reporting.
A B C D 2. Provides most useful learning guide to student.
A B C D 3. Provides least information concerning learning.
A B C D 4. Most preferred by college admissions officers.
A B C D 5. May be too complex to be understood by parents.
A B C D 6. Most widely used method of reporting in high school.
LEARNING GOAL: Lists the advantages and disadvantages of the traditional (A, B, C, D, F)
marking system.
Directions: List the advantages and disadvantages of using the traditional (A, B, C, D, F)
marking system as the sole method of reporting student progress.
Advantages:
Disadvantages:
Note: Answers will vary.
Exercise 15-B
ASSIGNING RELATIVE LETTER GRADES
LEARNING GOAL: Distinguishes between desirable and undesirable practices in assigning
relative letter grades.
Directions: Indicate whether each of the following statements describes a desirable (D) practice
or an undesirable (U) practice in assigning relative letter grades by circling the appropriate
letter.
D U 1. The grades should reflect the learning outcomes specified for the course.
D U 2. To give test scores equal weight in a composite score, the scores should be
simply added together.
D U 3. If you decide to assign different weights to some scores, the weighting should be
based on the maximum possible score on the test.
D U 4. Grades should be lowered for tardiness or misbehavior.
D U 5. Grading typically should be based on the normal curve.
D U 6. Pass-fail decisions should be based on an absolute standard of achievement.
LEARNING GOAL: Assigns weights in obtaining composite scores for grading purposes.
Directions: Following is a list of types of information a teacher would like to include in
assigning a final grade to each student. If the teacher wants to count each type of information
one-fourth of the final grade, what weight should be given to each type of information?
Type of Information Range of Scores Weight to Be Used
_______________________ _______________ _________________
Midsemester examination 30 to 50
Term project 5 to 10
Performance assessments 15 to 25
Final examination 20 to 100
Note: Answers will vary.
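One common approach, consistent with the chapter's emphasis on weighting by the spread of scores rather than by the maximum possible score, is to weight each component inversely to its score range so that each contributes equally to the composite. A sketch using the ranges listed above:

```python
# Hedged sketch: weight each component inversely to its score range so that
# every component contributes equally to the spread of the composite score.

components = {
    "Midsemester examination": (30, 50),
    "Term project":            (5, 10),
    "Performance assessments": (15, 25),
    "Final examination":       (20, 100),
}

ranges = {name: hi - lo for name, (lo, hi) in components.items()}
widest = max(ranges.values())  # 80, the final examination

for name, rng in ranges.items():
    weight = widest // rng     # these ranges divide evenly; real data may not
    print(f"{name}: range {rng}, weight {weight}")
# Each weighted range equals 80, so each component counts one-fourth.
```

With these weights (4, 16, 8, and 1), each weighted component spans 80 points, giving the four types of information equal influence on the final grade.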
Exercise 15-C
ASSIGNING ABSOLUTE GRADES
LEARNING GOAL: Distinguishes between desirable and undesirable practices in assigning
absolute grades.
Directions: Indicate whether each of the following statements describes a desirable (D) practice
or an undesirable (U) practice in assigning absolute letter grades by circling the appropriate
letter.
D U 1. Absolute grades should be used with mastery learning.
D U 2. Clearly defined domains of learning tasks should provide the basis for
grading.
D U 3. If all students pass a test, a harder test should be given before grades are
assigned.
D U 4. The distribution of grades to be assigned should be predetermined and
explained.
D U 5. Criterion-referenced grades may be assigned as pass/fail.
D U 6. When you are using absolute grading, the standard for passing should be
predetermined.
LEARNING GOAL: Lists guidelines for effective grading.
Directions: List five important guidelines for effective grading.
1.
2.
3.
4.
5.
Note: Answers will vary.
Exercise 15-D
PARENT-TEACHER CONFERENCE
LEARNING GOAL: Distinguishes between desirable and undesirable practices in conducting a
parent-teacher conference.
Directions: Indicate whether each of the following statements describes a desirable (D) practice
or an undesirable (U) practice in conducting parent-teacher conferences by circling the
appropriate letter.
D U 1. Before the conference, assemble a portfolio of specific information about and
examples of the student's learning progress.
D U 2. Present examples of the student’s work to parents.
D U 3. Begin the conference by describing the student’s learning difficulties.
D U 4. Make clear to parents that, as a teacher, you know what is best for the
student's learning and development.
D U 5. In the concluding phase, review your conference notes with the parents.
D U 6. End the conference with a positive comment about the student.
LEARNING GOAL: Lists questions that might be asked of parents during the conference.
Directions: Write a list of questions that you could ask parents during parent-teacher conferences
that might help you better understand students’ problems regarding learning and development.
Note: Answers will vary.
Exercise 15-E
REPORTING RESULTS OF PUBLISHED TESTS TO PARENTS
LEARNING GOAL: Distinguishes between desirable and undesirable practices in reporting
results of published tests to parents.
Directions: Indicate whether each of the following statements describes a desirable (D) practice
or an undesirable (U) practice in reporting results of published tests to parents by circling the
appropriate letter.
D U 1. Describe what the test measures in brief, understandable terms.
D U 2. Make clear the distinction between percentile rank and percentage-correct scores.
D U 3. Use grade-equivalent scores to indicate the grade at which the student can
perform.
D U 4. Give explanations without using jargon whenever possible.
D U 5. Describe a difference between two test scores as a “real difference” only after the
error of measurement is considered.
D U 6. Explain how the test results will be used only if the parent asks.
LEARNING GOAL: Identifies and corrects common errors in reporting standardized test results.
Directions: Indicate what is wrong with each of the following statements and rewrite each one so
that it provides an accurate report.
1. “Derek’s percentile rank of 70 in spelling means he can spell 70 percent of the words in
the test.”
2. “Marie’s stanine score of 6 in reading indicates she is performing below average in
reading.”
3. “Erik’s grade-equivalent scores of 5.4 in reading and 6.2 in math indicate that his
performance in math is superior to his performance in reading.”
Note: Answers will vary.
Answers to Student Exercises
15-A 15-B 15-C 15-D 15-E
1. D 1. U 1. D 1. D 1. D
2. C 2. D 2. D 2. D 2. D
3. B 3. U 3. D 3. D 3. D
4. A 4. U 4. U 4. D 4. D
5. C 5. U 5. D 5. D 5. D
6. A 6. D 6. D 6. U 6. U
Chapter 15
Grading and Reporting
1. The main purpose of a marking and reporting system should be to accomplish which of the
following?
A. improve student learning
B. inform parents about students’ school progress
C. maintain effective school records
D. provide evidence of achievement for colleges and employers
2. A serious limitation of reporting progress with a single letter grade only is that letter grades
A. are disliked by administrators.
B. are difficult to average.
C. include too many different elements.
D. tend to be limited to achievement.
3. When a letter grade (A, B, C, D, F) is used to report student progress, the grade should be
based on which of the following?
A. achievement
B. effort
C. attitude
D. behavior
4. A school’s marking and reporting system should be based on which of the following?
A. estimates of students’ learning ability
B. fixed percentages of grades
C. instructional objectives
D. a normal curve
5. An effective grading and reporting system is based on which of the following?
A. adequate assessment of students
B. estimates of each student's learning potential
C. the normal curve
D. the use of at least five test scores
6. Which of the following methods is most useful for overcoming learning difficulties?
A. Checklist of objectives
B. Informal letter to parents
C. Pass-fail system
D. Single letter grade
7. Assigning grades that represent a pure measure of achievement is most feasible with which
of the following systems?
A. multiple marking
B. pass-fail
C. satisfactory-unsatisfactory
D. single letter grade
8. Which of the following is a major disadvantage of using the pass-fail system?
A. it tends to lower the grade-point average
B. it doesn’t assess gradations of learning
C. students take courses they are unable to pass
D. teachers find grading more difficult
9. One advantage of the pass-fail grading system in elective courses is that it
A. encourages students to explore new areas of study.
B. helps students improve their grade-point average.
C. makes criterion-referenced grading possible.
D. motivates students to study harder.
10. Which of the following is the most serious limitation of the traditional letter grade as a means
of reporting student progress?
A. colleges prefer more detailed reports
B. schools have not agreed upon a common set of letters
C. they are limited to academic learning outcomes
D. they lack common meaning from one teacher to another
11. When weighting sets of test scores to obtain a composite score for assigning grades, the
weighting should be based on which of the following?
A. average score on each test
B. right score on each test
C. number of items on each test
D. spread of scores on each test
12. Absolute grading would require information concerning a student’s
A. growth in achievement.
B. level of performance.
C. performance in relation to learning ability.
D. rank in the group.
13. Relative grading involves comparing a student’s achievement to which of the following?
A. a set of norms
B. the student’s learning ability
C. the student’s past performance
D. the achievement levels of the other students
14. Mastery learning would most likely require which of the following types of grading?
A. absolute
B. relative
C. curved
D. stratified
15. The distribution of letter grades (A, B, C, D, F) to be assigned in relative grading should be
determined by which of the following?
A. school aides and parents
B. class averages
C. teacher and administrator agreement
D. the percentages in a normal distribution
16. One advantage of a parent-teacher conference as a reporting method is
A. its ease of use.
B. its flexibility.
C. the systematic record it provides.
D. the time it saves in preparing written reports.
17. During parent-teacher conferences, which of the following actions should be avoided?
A. beginning by discussing the student’s weaknesses
B. interruptions by the parent during the teacher’s report
C. listening to parents’ complaints about school
D. telling the parents anything negative about the child
18. Near the end of a parent-teacher conference, it is most important for the teacher to
A. clarify the student’s shortcomings.
B. plan how the conference can be ended on time.
C. summarize and plan a course of action.
D. tell the parents what they are expected to do next.
19. When reporting standardized test results to parents, the explanation should be
A. complete and detailed.
B. kept separate from other assessment information.
C. presented in simple terms.
D. repeated for clarity.
20. Which of the following is the most useful score for reporting standardized test results to
parents?
A. NCE score
B. percentile rank
C. raw score
D. T-score
21. Grades such as T-scores are based on variability.
True
False
22. Sending letters home to the parent or guardian as a report card assumes that the adult is able
to understand the vocabulary on the report.
Agree
Disagree
23. It is appropriate to collapse achievement and effort into a single letter grade.
True
False
24. Record keeping of student grades may be streamlined by using computer spreadsheets.
Agree
Disagree
25. List and discuss three major limitations to letter grading systems.
26. Under what circumstances are letters to parents useful for reporting student grades?
Should this system be the sole method of reporting grades? Why or why not?
Chapter 15: Answer Key
1. A
2. C
3. A
4. C
5. A
6. A
7. A
8. B
9. A
10. D
11. D
12. B
13. D
14. A
15. C
16. B
17. A
18. C
19. C
20. B
21. True
22. Agree
23. True
24. Agree
25. There are three major limitations to traditional letter grades. First, they typically represent a
combination of achievement, effort, work habits, and good behavior. Second, the proportion
of students assigned each letter grade varies from teacher to teacher. Third, they do not
indicate a student’s specific strengths and weaknesses in learning. In order to reduce the
effects of these limitations, grades for effort should be eliminated. If such effort scores are to
be recorded, they should exist as a separate grade from letter grades for achievement. Next, a
school- or district-wide standard should exist for what constitutes a given letter
grade. Finally, letter grades should be accompanied by another grading system, such as
checklists, that outlines students’ strengths and weaknesses.
26. Some schools have turned to the use of letters to provide for greater flexibility in reporting
student progress to parents (or guardians). Letters make it possible to report on the unique
strengths, weaknesses, and learning needs of each student and to suggest specific plans for
improvement. In addition, the letter/report can include as much detail as needed to make
clear the student’s progress in all areas of development. However, these letters should not be
the sole method of grading and if used, should be combined with other methods such as letter
grades.
Chapter 16 Achievement Tests
Exercise 16-A
STANDARDIZED ACHIEVEMENT TESTS VERSUS INFORMAL CLASSROOM
TESTS
LEARNING GOAL: Identifies the comparative advantages of standardized and informal
classroom tests for measuring student achievement.
Directions: Indicate whether each of the following statements best describes a standardized
achievement test (S) or an informal classroom test (C) by circling the appropriate letter.
S C 1. Likely to be more relevant to a teacher's instructional objectives.
S C 2. Likely to provide more reliable test scores.
S C 3. Technical quality of test items is consistently high.
S C 4. Most useful in formative assessment.
S C 5. Typically provides the larger spread of scores.
S C 6. Best for use in rapidly changing content areas.
LEARNING GOAL: States a major advantage and limitation of standardized achievement tests.
Directions: Briefly state one major advantage and one major limitation of standardized
achievement tests.
Advantage:
Limitation:
Note: Answers will vary.
Exercise 16-B
USE OF PUBLISHED ACHIEVEMENT TEST BATTERIES
LEARNING GOAL: Selects the most appropriate type of achievement test battery for a
particular use.
Directions: Indicate which type of test is most useful for each of the following testing purposes
by circling the appropriate letter using the following key.
KEY S = Survey achievement test battery
D = Diagnostic achievement test battery
S D 1. To compare schools on basic skill development.
S D 2. To describe the specific skills a student has yet to learn in reading.
S D 3. To measure achievement in science and social studies.
S D 4. To detect specific weaknesses in adding fractions.
S D 5. To determine how a fifth-grade class compares to other fifth-grade classes in
reading.
S D 6. To determine mastery of particular language skills.
LEARNING GOAL: States a major advantage and limitation of achievement test batteries.
Directions: Briefly state one major advantage and one major limitation of achievement test
batteries of the survey type.
Advantage:
Limitation:
Note: Answers will vary.
Exercise 16-C
COMPARISON OF READING READINESS TESTS AND READING SURVEY TESTS
LEARNING GOAL: Identifies the functions measured by different types of reading tests.
Directions: Indicate whether each of the functions listed below is measured by reading readiness
tests (R), by reading survey tests (S), by both (B), or by neither (N), by circling the appropriate
letter.
R S B N 1. Auditory discrimination.
R S B N 2. Comprehension of the meaning of words.
R S B N 3. Ability to draw inferences.
R S B N 4. Ability to read maps.
R S B N 5. Rate of reading.
R S B N 6. Attitude toward reading.
LEARNING GOAL: Compares reading tests.
Directions: Compare two reading survey tests (or readiness tests) for a particular grade level. (1)
Briefly describe how the two tests differ, and (2) indicate which one you would prefer to use for
a particular purpose and why.
Note: Answers will vary.
173
Exercise 16-D
COMPARISON OF STANDARDIZED AND CUSTOMIZED ACHIEVEMENT TESTS
LEARNING GOAL: Distinguishes among the characteristics of different types of achievement
tests.
Directions: Indicate which type of test best fits each feature listed below by circling the
appropriate letter, using the following key.
KEY: S = Standardized achievement tests
C = Customized achievement tests
S C 1. Are most useful to the classroom teacher.
S C 2. Have the greatest need for adequate norms.
S C 3. Most adaptable to changing conditions.
S C 4. Most likely to have some content the students have not studied.
S C 5. Best for making criterion-referenced interpretations.
S C 6. Likely to provide the most valid measure of local instructional objectives.
LEARNING GOAL: Describes the procedure for producing customized achievement tests.
Directions: List and briefly describe the procedural steps to follow in producing locally prepared
customized achievement tests.
Note: Answers will vary.
174
Exercise 16-E
SELECTING PUBLISHED ACHIEVEMENT TESTS
LEARNING GOAL: Selects the type of test that is most appropriate for a particular purpose.
Directions: For each of the following statements, indicate which type of test would be used by
circling the appropriate letter using the following key.
KEY: A = Achievement Test Battery, B = Separate Test of Content,
C = Customized Achievement Test, D = Individual Achievement Test
A B C D 1. To test a student’s mastery of classroom objectives.
A B C D 2. To test a student who has a learning disability.
A B C D 3. To compare a student's performance in reading and mathematics.
A B C D 4. To test students at the end of each unit.
A B C D 5. To give a science test to a child who has difficulty in reading.
A B C D 6. To measure student progress from one grade level to the next.
LEARNING GOAL: Compares the usefulness of customized tests and standardized tests.
Directions: State one advantage and one disadvantage of using a customized test instead of a
standardized test to measure student achievement.
Advantage:
Disadvantage:
Note: Answers will vary.
175
Answers to Student Exercises
16-A 16-B 16-C 16-D 16-E
1. C 1. S 1. R 1. C 1. C
2. S 2. D 2. R 2. S 2. D
3. S 3. S 3. S 3. C 3. A
4. C 4. D 4. S 4. S 4. D
5. S 5. S 5. S 5. C 5. D
6. C 6. S 6. N 6. C 6. A
176
Chapter 16
Achievement Tests
1. Which of the following should be the first consideration when selecting a published
achievement test?
A. cost
B. interpretability
C. reliability
D. validity
2. Which of the following factors should be given the most weight in selecting a published
achievement test battery?
A. Cost of administration and scoring.
B. Equivalence of the various forms.
C. Relevance to local objectives.
D. Reliability of the tests and subtests.
3. The validity of a published test used to measure student achievement at the end of a specific
school science course can best be determined by
A. curriculum experts in science.
B. test experts with a science background.
C. the teacher of the course.
D. the test publisher.
4. A standardized achievement test differs most from a teacher-made objective test in which of
the following areas?
A. Arrangement of question types.
B. Known reliability of items.
C. Objectivity of scoring.
D. Level of difficulty.
5. One advantage of a teacher-made test over a standardized achievement test is that the
teacher-made test has greater
A. interpretability.
B. objectivity.
C. relevance.
D. reliability.
6. One advantage of a standardized test over a teacher-made test is that the standardized test has
greater
A. known technical quality.
B. flexibility.
C. overall objectivity.
D. relevance.
177
7. The validity of a standardized achievement test to be selected for classroom use can best be
determined by examining the test’s
A. directions.
B. items.
C. norms.
D. reliability.
8. The main advantage of using an achievement test battery, instead of a series of separate
achievement tests covering the same areas, is that the subtests in the achievement test battery
have
A. comparable norms.
B. higher reliability.
C. higher validity.
D. more items.
9. Which of the following should be determined first when evaluating a standardized
achievement test battery?
A. how reliability is reported for the tests
B. the content that the tests measure
C. whether or not comparable forms are available
D. whether the norms are adequate
10. An essential characteristic of a test battery is that each test can be interpreted in terms of the
same
A. content.
B. norm group.
C. objectives.
D. type of item.
11. A standardized achievement test is best used for which of the following purposes?
A. assigning grades
B. comparing achievement in several schools
C. evaluating a school's objectives
D. tabulating the achievements of each student in a classroom
12. Achievement test batteries are used less often at the high school level than at the elementary
school level for which of the following reasons?
A. course content varies more at the high school level
B. high school courses are more difficult
C. high school teachers have greater test construction skills
D. high school teachers assign more homework
178
13. When making criterion-referenced interpretations of standardized achievement test results,
educators should pay special attention to the number of items included in
A. each item cluster.
B. each subtest.
C. the total test.
D. the criterion used for the standard.
14. A survey achievement test battery would be least useful for determining a student's
A. achievement in different areas.
B. progress from year to year.
C. relative level of performance.
D. specific learning weaknesses.
15. One advantage of a diagnostic achievement test over a survey battery is that a diagnostic
achievement test includes
A. better norms.
B. clearer directions.
C. more items.
D. simpler scoring.
16. A reading readiness test is best used to identify which of the following?
A. pre-requisite skills in students
B. children who are nonreaders
C. students with visual defects
D. academically gifted children
17. Reading readiness tests place major emphasis on which of the following skills?
A. finger dexterity and motor skills
B. eye movements and motivation
C. recognition and discrimination
D. social and emotional adjustment
18. Achievement tests of the survey type are not very effective for diagnosing learning problems
because of inadequate
A. sampling.
B. selection of norm groups.
C. standardization.
D. preparation.
19. One advantage of customized achievement tests over standardized tests is that customized
achievement tests
A. are more readily adapted.
B. are based on more adequate norms.
C. contain more adequate rules for administration.
D. provide more reliable test scores.
179
20. Which of the following is likely to produce the most valid measure of classroom learning?
A. Achievement battery.
B. Locally prepared customized test.
C. Publisher-prepared customized test.
D. Single-content standardized achievement test.
21. An individual achievement test is usually used with children who have disabilities.
True
False
22. When giving a standardized achievement test in science, a teacher should be aware of any
gaps between the reading level of the test and the students’ reading abilities.
Agree
Disagree
23. One problem in using a customized test bank is the issue of low reliability.
True
False
24. Standardized tests usually leave rules of test administration up to the teacher.
Agree
Disagree
25. Discuss the major differences between a standardized test battery and a single-content
standardized test. Discuss two advantages and two limitations of each.
26. In what types of situations would a standardized test battery, a single-content standardized
test, a diagnostic teacher-made test, and a standardized individual achievement test be used?
180
Chapter 16: Answer Key
1. D
2. C
3. C
4. B
5. C
6. A
7. B
8. A
9. B
10. B
11. B
12. A
13. A
14. D
15. C
16. A
17. C
18. A
19. A
20. D
21. True
22. Agree
23. True
24. Disagree
25. Standardized test batteries measure several curricular areas with the same battery. For
example, a battery might measure reading, mathematics, written expression, and spelling. A
single-content standardized test measures only one content area (e.g., reading). One advantage
of the test battery is that a teacher does not have to purchase separate tests for each content
area tested. Another advantage is that the norm group is the same across the curricular areas
the battery assesses, so scores can be compared meaningfully across areas. A disadvantage is
that, because the battery asks fewer questions in each content area, it may be less reliable and
less diagnostic than the single-content test. Conversely, norm groups are not comparable
across single-content tests, and hence meaningful comparisons cannot be made across content
areas. However, single-content tests are more useful for diagnostic purposes.
26. A standardized test battery is probably best when comparable results are needed across
subject areas for given schools or school districts within a state. Single-content tests are best
when more diagnostic information is needed and when teachers wish to see whether specific
content has been mastered in a given content area. Individual standardized diagnostic tests
are usually used to test the achievement of poor readers or students with disabilities.
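The reliability point in the answer to item 25 (a battery subtest with fewer items tends to be less reliable than a full-length single-content test) can be illustrated with the Spearman-Brown prophecy formula. The sketch below is an added illustration with hypothetical reliability values, not material from the test bank:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical values: a 100-item single-content test with reliability .90,
# shortened to a 25-item battery subtest (length factor 0.25).
full_test = 0.90
subtest = spearman_brown(full_test, 0.25)
print(round(subtest, 2))  # 0.69 -- noticeably lower than .90
```

The same formula also shows the reverse: lengthening a test (length factor greater than 1) raises the projected reliability.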
181
Chapter 17 Aptitude Tests
Exercise 17-A
COMPARISON OF APTITUDE AND ACHIEVEMENT TESTS
LEARNING GOAL: Identifies the similarities and differences in the characteristics of aptitude
and achievement tests.
Directions: Indicate whether each of the following statements is characteristic of an aptitude (P)
test, an achievement (C) test, or both (B) types of tests by circling the appropriate letter.
P C B 1. Measures learned ability.
P C B 2. Useful in predicting future achievement.
P C B 3. Content-related evidence of validity is emphasized.
P C B 4. Criterion-related evidence of validity is emphasized.
P C B 5. Can be used in grades from kindergarten through grade 12.
P C B 6. Emphasizes reasoning abilities.
LEARNING GOAL: Lists the major differences between aptitude and achievement tests.
Directions: List the major differences between aptitude tests and achievement tests.
Note: Answers will vary.
182
Exercise 17-B
GROUP TESTS OF LEARNING ABILITY
LEARNING GOAL: Identifies the types of scores provided by selected group tests.
Directions: Indicate the types of scores provided by each of the group tests listed below by
circling the appropriate letter using the following key:
KEY: A = single score, B = verbal and quantitative scores only,
C = verbal, nonverbal and total scores only,
D = verbal, quantitative, and nonverbal scores, E = more than three scores.
A B C D E 1. Cognitive Abilities Test.
A B C D E 2. Differential Aptitude Tests.
A B C D E 3. Matrix Analogies Test.
A B C D E 4. Otis-Lennon School Ability Test.
LEARNING GOAL: States advantages and disadvantages of using different types of learning
ability tests.
Directions: Briefly state one advantage and one disadvantage of each type of group test of
learning ability.
Single score.
Separate-scores (verbal, nonverbal, quantitative).
Note: Answers will vary.
183
Exercise 17-C
INDIVIDUAL TESTS
LEARNING GOAL: Identifies the similarities and differences in the characteristics of individual
tests.
Directions: Indicate whether each of the following statements is characteristic of the Stanford-
Binet Intelligence Scale (S), the Wechsler Intelligence Scales-Revised (W), or both (B).
S W B 1. Uses a variety of item types.
S W B 2. Items are arranged by subtest.
S W B 3. Includes a vocabulary test.
S W B 4. Provides separate verbal and performance IQs.
S W B 5. Scores are reported in Standard Age Scores.
S W B 6. Provides total scores and scores on subtests.
LEARNING GOAL: List conditions that might lower scores on tests of learning abilities.
Directions: List five conditions that might lower a student's score on a test of learning ability.
1.
2.
3.
4.
5.
Note: Answers will vary.
184
Exercise 17-D
DIFFERENTIAL APTITUDE TESTING
LEARNING GOAL: Identifies the characteristics of the Differential Aptitude Tests (DAT).
Directions: Indicate whether each of the following statements is characteristic of the Differential
Aptitude Tests (DAT) by circling yes (if it is) and no (if it is not).
Yes No 1. The DAT would be classified as a test battery.
Yes No 2. The intercorrelations between subtests on the DAT are high (average about .90).
Yes No 3. Some of the DAT subtests measure abilities like those measured by group
scholastic aptitude tests.
Yes No 4. The DAT profile indicates scores in terms of percentile rank.
Yes No 5. The DAT can be administered “adaptively.”
Yes No 6. The eight tests on the DAT are speed tests.
LEARNING GOAL: States a major advantage and limitation of the Differential Aptitude Tests.
Directions: Briefly state one major advantage and one major limitation of using the Differential
Aptitude Tests instead of a series of separate tests from different publishers.
Advantage of DAT:
Limitation of DAT:
Note: Answers will vary.
185
Exercise 17-E
SELECTING APPROPRIATE TESTS
LEARNING GOAL: Selects the type of test that is most appropriate for a particular use.
Directions: For each of the following purposes indicate which type of test should be used by
circling the appropriate letter using the following key:
KEY: G = Group test of learning ability, I = Individual test of learning ability,
D = Differential aptitude tests.
G I D 1. To test a preschool child.
G I D 2. To test a fourth-grade student who is unable to speak.
G I D 3. To test a sixth-grade student who has a severe learning disability.
G I D 4. To assist a tenth-grade student with career planning.
G I D 5. To aid in forming learning groups within the classroom.
G I D 6. To aid in planning an individual program for students with severe learning
disabilities.
LEARNING GOAL: Compares the usefulness of culture-fair tests and conventional tests of
learning ability.
Directions: State one advantage and one disadvantage of using a culture-fair test instead of a
conventional learning ability test for testing students from disadvantaged homes.
Advantage:
Disadvantage:
Note: Answers will vary.
186
Answers to Student Exercises
17-A 17-B 17-C 17-D 17-E
1. B 1. D 1. B 1. Y 1. I
2. B 2. E 2. B 2. N 2. I
3. C 3. A 3. B 3. Y 3. I
4. P 4. C 4. W 4. Y 4. G
5. B 5. S 5. Y 5. D
6. P 6. B 6. N 6. D
187
Chapter 17
Aptitude Tests
1. Tests of learning ability differ from published achievement tests in that tests of learning
ability
A. are useful in predicting future achievement.
B. depend less on specific school learning.
C. measure school objectives more effectively.
D. provide norms for score interpretation.
2. One advantage of a learning ability test over an achievement test for predicting achievement
is that a learning ability test
A. can be used before instruction has been given.
B. provides more reliable scores.
C. measures a broader range of course content.
D. measures only innate learning potential.
3. In the spectrum of ability tests, which of the following test types would be most different
from the content-oriented achievement test?
A. a nonverbal test
B. a school-oriented aptitude test
C. a test of general educational development
D. a verbal ability test
4. Scholastic aptitude tests are best interpreted as measures of which of the following?
A. fixed learning capacity
B. mastery of the school’s curriculum
C. present learning ability
D. recent exposure to course content
5. Which of the following tests provides the widest array of scores?
A. Cognitive Abilities Test.
B. Differential Aptitude Tests.
C. Otis-Lennon School Ability Test.
D. School and College Ability Tests.
6. Standard age scores (SAS), used on some group and individual tests, have a mean of
A. 10
B. 16.
C. 50.
D. 100.
188
7. Which of the following is an advantage of a learning ability test with verbal and nonverbal
scores?
A. a check on the poor reader is provided.
B. differential prediction is made possible.
C. scoring is made easier.
D. test administration is simplified.
8. Scholastic aptitude tests used for purposes like college admission should be interpreted as
measures of
A. inherited ability.
B. innate learning ability.
C. potential for future development.
D. present developed ability.
9. The fourth edition of the Stanford-Binet Intelligence Scale is arranged by
A. age levels.
B. cognitive areas and subtests.
C. spiral omnibus pattern.
D. verbal and performance tests.
10. The Stanford-Binet differs from the WISC-R in that the Stanford-Binet uses
A. individual administration.
B. standard age scores.
C. separate subtests.
D. verbal and performance tests.
11. In comparison to a group test of learning ability, the Stanford-Binet provides more
A. objective results.
B. observational information.
C. restriction on test responses.
D. use of standard scores.
12. The Stanford-Binet yields scores with a standard deviation of 16. In a normally distributed
population of ten-year-old children approximately two-thirds of the cases will fall between
A. 68 and 100.
B. 68 and 132.
C. 84 and 116.
D. 100 and 132.
13. The Wechsler Intelligence Scales differ from the Stanford-Binet in that the Wechsler
Intelligence Scales are
A. arranged by age levels rather than subtests.
B. suitable for group administration.
C. made up of multiple subtests.
D. evaluated with separate verbal and performance scale scores.
189
14. If a student’s standard age score drops from 90 in the fifth grade to 85 in the sixth grade, the
score difference is most likely due to which of the following?
A. inadequate learning opportunities in sixth grade
B. lack of motivation
C. some type of emotional problem
D. the errors of measurement
15. Carl, a third-grade student, received a standard age score of 66 on a group test of scholastic
aptitude. Based on this score, his teacher should recommend that Carl be
A. continued in the third grade.
B. given an individual ability test.
C. moved back to the second grade.
D. placed in a class for academically gifted students.
16. Culture-fair testing typically uses materials that are
A. common in many cultures.
B. free of cultural influences.
C. indicators of innate abilities.
D. most familiar to members of minority groups.
17. The Differential Aptitude Tests can be used to compare students’ scores in different aptitude
areas because the subtests are
A. highly intercorrelated.
B. speeded tests.
C. standardized on the same group.
D. valid and reliable.
18. If students ask whether they can improve their aptitudes as measured by the DAT, which of
the following would be the most appropriate response for an educator?
A. All of these aptitudes can be readily modified.
B. Aptitudes are fixed traits that cannot be modified.
C. Aptitudes are seldom modified by training.
D. Some of these aptitudes can be improved more readily than others.
19. One advantage of the computerized adaptive edition of a test such as the Differential
Aptitude Tests over the paper-and-pencil version is that
A. a profile of scores can be obtained.
B. speed of response is used in scoring.
C. students can complete more items.
D. test results can be obtained sooner.
20. Many tests that profess to be culturally fair are nonverbal or pictorial.
True
False
190
21. Most learning aptitude tests given to children with disabilities are group tests.
Agree
Disagree
22. Aptitude tests measure potential for learning while achievement tests measure learned
material.
True
False
23. The Stanford-Binet and the Wechsler Scales may be given by any teacher who has read the
manual.
Agree
Disagree
24. Most, if not all, students are eventually administered an individual test of learning ability.
True
False
25. Discuss two advantages and two disadvantages of group and individual learning ability tests.
26. Discuss some of the issues and concerns that culture-fair tests of learning ability try to
address.
191
Chapter 17: Answer Key
1. B
2. A
3. A
4. C
5. B
6. D
7. A
8. D
9. A
10. B
11. B
12. C
13. D
14. D
15. B
16. A
17. C
18. D
19. D
20. True
21. Disagree
22. False
23. Disagree
24. False
25. Group tests are efficient in time and cost because they can be given to many students at
once. As such, they are usually given more often to the average student than to students
suspected of having a disability. Group tests can usually be administered by any individual
familiar with the administration and scoring procedures. However, they are less diagnostic
than individual tests and are more dependent on student reading ability. Individual tests are
given in a one-on-one setting and are therefore more expensive and time-consuming to give.
These tests, however, allow for more follow-up questions and behavioral observations than
group tests, and they depend less on reading. They usually can be administered and scored
only by a licensed school psychologist.
26. The assumption behind culture-fair tests is that not all cultures view intelligence or
intelligent behavior the same way. What might be considered intelligent behavior in one
culture may not be viewed that way in another. Cultures also differ in the complexity of
their language and in how members interpret concepts. Thus culture-fair tests try as much as
possible to incorporate universal or near-universal concepts that occur in most cultures. They
try to mitigate the effects of language by asking questions that measure nonverbal attributes.
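The normal-curve arithmetic behind item 12 above (mean 100, standard deviation 16, with roughly two-thirds of cases falling within one standard deviation of the mean) can be checked with Python's standard library. This sketch is an added illustration, not material from the test bank:

```python
from statistics import NormalDist

# Stanford-Binet-style standard age scores: mean 100, standard deviation 16.
sas = NormalDist(mu=100, sigma=16)

# Proportion of a normal population scoring between 84 and 116,
# i.e., within one standard deviation of the mean.
within_one_sd = sas.cdf(116) - sas.cdf(84)
print(round(within_one_sd, 3))  # 0.683 -> "approximately two-thirds"
```

The same computation with any mean and standard deviation yields about 68%, which is why the keyed answer is the 84-to-116 interval.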
192
Chapter 18 Test Selection, Administration, and Use
Exercise 18-A
SOURCES OF INFORMATION ON PUBLISHED TESTS
LEARNING GOAL: Identifies the most useful source of information for a given situation.
Directions: Below is a list of four sources of information concerning published tests. For each of
the statements following the list, indicate the source of information that should be consulted first
by circling the appropriate letter.
KEY A = Mental Measurements Yearbooks, B = Professional journals,
C = Test manual, D = Test publisher's catalog.
A B C D 1. To find information about cost of tests and scoring.
A B C D 2. To obtain critical reviews of a published test.
A B C D 3. To find out how a particular published test was constructed.
A B C D 4. To locate the most recent research studies using a particular test.
A B C D 5. To determine if any tests of study skills have been published.
A B C D 6. To determine the type of norms used in a published test.
LEARNING GOAL: Summarizes the purpose and content of the Standards for Educational and
Psychological Testing.
Directions: Briefly describe the purpose and content of the Standards for Educational and
Psychological Testing.
Purpose:
Content:
Note: Answers will vary.
193
Exercise 18-B
EVALUATING AN ACHIEVEMENT TEST
LEARNING GOAL: Evaluates a test using a test evaluation form.
Directions: Select an achievement test at the grade level of your choice and obtain a copy of the
test, the manual, and other accessory material; your instructor can help you with this. Study the
test materials, consult the reviews in the latest Mental Measurements Yearbook (MMY), and
write your evaluation using the following test evaluation form. Be brief and include only the
most essential information.
TEST EVALUATION FORM
Test title _____________________ Author(s)_____________________________
Publisher ______________________ Copyright date(s) _____________________
Purpose of test ________________________________________________________
For grades (ages)_______________ Forms _________________________________
Scores available _______________ Method of scoring _____________________
Administration time_____________ Time(s) of parts ______________________
Validity (cite manual pages) __________. Summarize evidence below.
Content considerations:
Test-criterion relationships:
Construct considerations:
Evidence regarding consequences of use:
194
TEST EVALUATION FORM, CONTINUED
Reliability (cite manual page numbers) _______. Summarize data below.
Age or Grade    Type of reliability    Range of reliabilities (total test)    Number tested    Range of reliabilities (part scores)
Standard errors of measurement ____________ _____________
Norms (cite manual page numbers) ___________. Summarize data below.
Type (e.g., percentile rank):
Groups (size, age, or grade):
Separate norms (e.g., type of district):
Criterion-referenced interpretation
Describe (if available):
Practical features
Ease of administration:
Ease of scoring:
Ease of interpretation:
Adequacy of manual and materials:
Comments of reviewers (See MMY)
Summary Evaluation
Advantages:
Limitations:
Note: Answers will vary.
195
Exercise 18-C
EVALUATING AN ABILITY TEST
LEARNING GOAL: Evaluates a test using test evaluation form.
Directions: Select an aptitude test at the grade level of your choice and obtain a copy of the test,
the manual, and other accessory material; your instructor can help you with this. Study the test
materials, consult the reviews in the latest Mental Measurements Yearbook (MMY), and write
your evaluation using the following test evaluation form. Be brief and include only the most
essential information.
TEST EVALUATION FORM
Test title _____________________ Author(s)_____________________________
Publisher ______________________ Copyright date(s) _____________________
Purpose of test ________________________________________________________
For grades (ages)_______________ Forms _________________________________
Scores available _______________ Method of scoring _____________________
Administration time_____________ Time(s) of parts ______________________
Validity (cite manual pages) __________. Summarize evidence below.
Content considerations:
Test-criterion relationships:
Construct considerations:
Evidence regarding consequences of use:
196
TEST EVALUATION FORM, CONTINUED
Reliability (cite manual page numbers) _______. Summarize data below.
Age or Grade    Type of reliability    Range of reliabilities (total test)    Number tested    Range of reliabilities (part scores)
Standard errors of measurement ____________ _____________
Norms (cite manual page numbers) ___________. Summarize data below.
Type (e.g., percentile rank):
Groups (size, age, or grade):
Separate norms (e.g., type of district):
Criterion-referenced interpretation
Describe (if available):
Practical features
Ease of administration:
Ease of scoring:
Ease of interpretation:
Adequacy of manual and materials:
Comments of reviewers (See MMY)
Summary Evaluation
Advantages:
Limitations:
Note: Answers will vary.
197
Exercise 18-D
ADMINISTERING PUBLISHED TESTS
LEARNING GOAL: Distinguishes between good and bad practices in administering published
tests.
Directions: Indicate whether each of the following statements describes a good (G) practice or a
bad (B) practice in administering a published test by circling the appropriate letter.
G B 1. Read the directions word for word.
G B 2. Give students extra time if there was an interruption during testing.
G B 3. Walk around the room and point out to students where they made careless errors.
G B 4. Tell students what to do about guessing if the directions fail to address it.
G B 5. If asked about a particular item, tell the student: “I’m sorry but I cannot help you.
Do the best you can.”
G B 6. Record any unusual student behavior during testing.
LEARNING GOAL: Describes a procedure for improving students’ test-taking skills.
Directions: Briefly describe an ethical procedure that a classroom teacher might follow for
improving students’ test-taking skills.
Note: Answers will vary.
198
Exercise 18-E
USES OF PUBLISHED TESTS
LEARNING GOAL: Distinguishes between correct and incorrect statements concerning uses of
published tests.
Directions: Indicate whether test specialists would agree (A) or disagree (D) with each of the
following statements concerning test use by circling the appropriate letter.
A D 1. Published achievement tests are most useful in the areas of basic skills.
A D 2. Published achievement tests need not match the instructional objectives of
the school.
A D 3. The best index of underachievement is a relatively large difference between the
scores of learning ability tests and achievement tests.
A D 4. Norm-referenced achievement tests are especially useful for
individualizing instruction.
A D 5. Course grades are more valid when based on scores from published
achievement tests.
A D 6. No important educational decision should be based on the scores of
published tests alone.
LEARNING GOAL: Lists misuses of published tests.
Directions: List as many ways as you can think of that published test results might be misused.
Use brief, concise statements.
Note: Answers will vary.
199
Answers to Student Exercises
18-A 18-D 18-E
1. D 1. G 1. A
2. A 2. B 2. D
3. C 3. B 3. D
4. B 4. B 4. D
5. A 5. G 5. D
6. C 6. G 6. A
200
Chapter 18
Test Selection, Administration, and Use
1. Tests in Print should be consulted by educators when they are seeking
A. a comprehensive list of published tests.
B. newly created test blueprints.
C. test reviews.
D. validity and reliability data.
2. The Mental Measurements Yearbooks are best known for which of the following contents?
A. annual reports on standardized assessments
B. test descriptions
C. test reviews
D. well organized technical information
3. Which of the following publications would be most appropriate for educators to consult for
descriptions of the latest editions of a test?
A. Tests in Print
B. Test Critiques electronic files
C. Mental Measurements Yearbooks
D. Test publishers’ catalogues
4. Which of the following publications would be most appropriate for educators to consult for
information that would be most helpful in evaluating a test manual?
A. Standards for Educational and Psychological Testing
B. Test Critiques electronic files
C. Tests in Print
D. test publishers’ catalogues
5. Which of the following publications would be most appropriate for educators to consult for
information that provides guidance about the responsibilities of test developers and test users
for informing test takers about tests?
A. Code of Fair Testing Practices.
B. Mental Measurements Yearbooks.
C. Test Critiques.
D. Test Publishers’ technical manuals.
6. Which of the following can most adequately be determined from the description of an
achievement test in a test publisher's catalogue?
A. readability
B. reliability
C. usability
D. validity
201
7. Which of the following is the first consideration in selecting published achievement tests?
A. Availability of comparable forms
B. Cost of the tests
C. Ease of administration
D. Relevance to local objectives
8. To determine whether a published achievement test is valid for their students, it is best for
educators to
A. compare the items to local curriculum goals.
B. examine item analysis data in the test manual.
C. examine reliability data in the test manual.
D. make a local test-retest study of the scores.
9. In order for educators to obtain information concerning the validity of a test of learning
ability, it would be best to first examine the test manual’s
A. item analysis data.
B. predictive studies.
C. reliability data.
D. standardization studies.
10. Changing the instructions when administering a standardized achievement test will probably
have the greatest influence on the test scores'
A. interpretability.
B. objectivity.
C. reliability.
D. relevance to local objectives.
11. Publishers attempt to reduce the influence of test-taking skills by doing which of the
following?
A. not providing alternative test forms
B. providing practice tests
C. using multiple-choice items
D. using special scoring formulas
12. Published achievement tests are probably most useful to classroom teachers for which of the
following reasons?
A. diagnosing strengths and weaknesses
B. evaluating teaching
C. grading students
D. reporting to parents
202
13. Measuring educational progress over several grade levels with published tests is most
feasible when measuring which of the following?
A. basic skills
B. critical thinking skills
C. science
D. social studies
14. Published achievement tests are least useful in which of the following situations?
A. curriculum planning
B. grade assignment
C. grouping students
D. monitoring educational progress
15. Standardized achievement tests are inadequate for evaluating teaching effectiveness because
they typically have
A. inadequate norms.
B. inappropriate item difficulty.
C. low reliability.
D. low relevance to local objectives.
16. A standardized published achievement test is one useful tool in diagnosing learning
disabilities.
True
False
17. It is probably best not to report the results of published tests to parents because they do not
have the necessary statistical background.
Agree
Disagree
18. Using published test results to retain teachers or give merit raises is a fair and equitable
method.
True
False
19. One reason that teachers teach to the tests is that they feel they are being evaluated on the
basis of their students’ test results.
Agree
Disagree
20. Discuss some things that a teacher should ensure when administering a published test so as
not to invalidate the results or their interpretability.
21. What are three legitimate uses of published test information? What are three inappropriate
uses?
203
Chapter 18: Answer Key
1. A
2. C
3. D
4. A
5. A
6. C
7. D
8. A
9. B
10. A
11. B
12. A
13. A
14. B
15. D
16. True
17. Disagree
18. False
19. Agree
20. Teachers can make sure that they are not invalidating the test by following the administration
rules. Perhaps the most important step is to shift from the teacher role to the test
administrator role. Other things they can do include trying to motivate students to do
their best, strictly following all test administration rules, keeping accurate time, and not
giving students extra time. Still other strategies include recording any significant events
during test administration that may influence test results, and collecting all test materials
promptly at the end of the test.
21. Perhaps the best use of published test results is in instructional planning. Knowing how
students did in a class, school, or school system can help educators make instructional
modifications for the future. A second good use of test results is in reporting the results to
parents in conferences. This information helps reinforce the teacher’s message about the
student reaching his or her learning goals. Finally, test results can be helpful in diagnosing
children, qualifying them for special education services, or identifying a learning disability.
However, no published test should be used alone to accomplish any of these goals.
Inappropriate uses of published tests include using test results to give a student grades for a
semester or marking period, using tests to evaluate teaching effectiveness, or assigning
students to a remedial track or retaining them in a grade for the following academic year.
204
Chapter 19 Interpreting Test Scores and Norms
Exercise 19-A
USES OF CRITERION-REFERENCED AND NORM-REFERENCED INTERPRETATIONS
LEARNING GOAL: Relates test interpretation to the type of information needed.
Directions: For each of the following questions, indicate whether a criterion-referenced (C) or
a norm-referenced (N) interpretation would be more useful by circling the appropriate letter.
C N 1. How does a student’s test performance compare to that of other students in
the same grade?
C N 2. What type of remedial work would be most helpful for a student struggling
academically?
C N 3. Has a student met state-mandated learning goals?
C N 4. Which students’ test performances exceed those of 90 percent of their
classmates?
C N 5. Which students have achieved mastery of computational skills?
C N 6. How does student test performance in our school compare with that of other
schools?
LEARNING GOAL: Describes the cautions to keep in mind when making criterion-referenced
interpretations of tests designed for norm-referenced use.
Directions: List and briefly describe several factors to consider when using criterion-referenced
interpretations with norm-referenced survey tests.
Note: Answers will vary.
205
Exercise 19-B
NATURE OF DERIVED SCORES
LEARNING GOAL: Distinguishes among the characteristics of different types of derived scores.
Directions: Indicate which type of derived score is described by each statement listed below by
circling the appropriate letter. Use the following key.
KEY: G = grade equivalent scores, P = percentile rank, S = standard scores.
G P S 1. Provides units that are based on the average score earned in different groups.
G P S 2. Provides units that are systematically unequal.
G P S 3. Provides units that are most nearly equal.
G P S 4. Provides units that are most meaningful when interpreted with reference to
normal curves.
G P S 5. Provides units that are most meaningful at the elementary school level.
G P S 6. Provides units that are easily interpreted and typically compare students
with their own age group.
LEARNING GOAL: States the advantages and limitations of derived scores.
Directions: Briefly state one advantage and one disadvantage of percentile ranks and standard
scores.
Percentile Ranks
Advantage:
Disadvantage:
Standard Scores
Advantage:
Disadvantage:
Note: Answers will vary.
206
Exercise 19-C
GRADE EQUIVALENT SCORES
LEARNING GOAL: Distinguishes between appropriate and inappropriate interpretations of
grade equivalent scores.
Directions: Indicate whether each of the following interpretations of grade equivalent (GE)
scores is appropriate (A) or inappropriate (I) by circling the appropriate letter.
A I 1. A student who obtained a GE score of 3.1 in the spring of grade 4 would be
expected to obtain a GE score of 4.1 in the spring of grade 5.
A I 2. A student in grade 4 who obtained a GE score of 4.7 in April scored higher
than about half the students in the grade 4 norm group.
A I 3. A student with a GE score of 5.3 in reading and a GE score of 6.1 in math is
performing better in math than in reading.
A I 4. A student who has GE scores in all subjects that are more than 1.5 above grade
placement should probably be skipped to the next grade.
A I 5. A GE score of 11.0 for a grade 6 student indicates that the student did
exceptionally well on the 6th grade content, but not that he or she could do
grade 11 work.
A I 6. One-year gains in GE scores of 3.0 to 4.5 for one student and 5.0 to 6.5 for
another indicate equivalent amounts of progress.
LEARNING GOAL: States the advantages and limitations of grade-equivalent scores.
Directions: Briefly state some of the major advantages and major limitations of grade equivalent
scores.
Advantages:
Limitations:
Note: Answers will vary.
207
Exercise 19-D
RELATIONSHIP OF DIFFERENT SCORING SYSTEMS
LEARNING GOAL: Converts scores from one scoring system to others.
Directions: Complete the following table by converting the given scores into comparable derived
scores in the other scoring systems. Round your answers to the nearest whole number, except for
z-scores where one decimal place should be reported. Assume that all score distributions are
normal and based on a common reference group. The first row, for a given z-score of 1.0, has
been completed to illustrate the procedure. Try to complete the exercise without looking at the
table in your book.
________________________________________________________________________
z-Score    T-Score    Standard Age Score (SD = 16)    Stanine    Percentile Rank
________________________________________________________________________
  1.0        60                   116                    7              84
________________________________________________________________________
–1.0
________________________________________________________________________
0.5
________________________________________________________________________
–0.5
________________________________________________________________________
35
________________________________________________________________________
70
________________________________________________________________________
100
________________________________________________________________________
124
________________________________________________________________________
0.7
________________________________________________________________________
–1.3
________________________________________________________________________
LEARNING GOAL: Explains the value of using score bands on test profiles.
Directions: Explain why it is desirable to plot scores on a test profile as score bands instead of
specific score points.
Note: Answers will vary.
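For instructors who want to generate a full key for the conversion table above, the relationships among the scales can be sketched in a few lines (an illustrative sketch only: the function name is ours, a normal distribution is assumed, and the stanine value uses the conventional approximation of rounding 2z + 5 and limiting the result to 1 through 9):

```python
from statistics import NormalDist

def derived_scores(z):
    """Convert a z-score to T-score, standard age score, stanine,
    and percentile rank, assuming a normal distribution."""
    t_score = round(50 + 10 * z)                  # T-score: mean 50, SD 10
    sas = round(100 + 16 * z)                     # standard age score: mean 100, SD 16
    stanine = min(9, max(1, round(2 * z + 5)))    # stanines: mean 5, SD 2, limits 1-9
    pct_rank = round(NormalDist().cdf(z) * 100)   # percentile rank from the normal CDF
    return t_score, sas, stanine, pct_rank

print(derived_scores(1.0))   # (60, 116, 7, 84) -- matches the completed first row
print(derived_scores(-1.0))  # (40, 84, 3, 16)
```

For rows that give a T-score, standard age score, or percentile rank rather than a z-score, first recover z (for example, z = (T − 50) / 10, or z = NormalDist().inv_cdf(pr / 100)) and then apply the function.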
208
Exercise 19-E
INTERPRETATIONS OF SCORES ON PUBLISHED TESTS
LEARNING GOAL: Distinguishes between appropriate and inappropriate interpretations of
scores on published tests.
Directions: Indicate whether test specialists would agree (A) or disagree (D) with each of the
following statements about interpretations of scores on published tests by circling the appropriate
letter.
A D 1. Percentile ranks of tests of the same subject can be used interchangeably.
A D 2. The percentile rank score does not require an assumption of a normal
distribution.
A D 3. Information about a student's previous educational experiences and
language background is used in interpreting test scores.
A D 4. A band that extends one standard error of measurement above and below a
student's observed score helps guard against overly precise interpretations.
A D 5. Because the units on a grade equivalent scale are approximately equal, the
difference between 4.0 and 5.0 can be treated as equivalent to that between
8.0 and 9.0.
A D 6. Before making an important decision based on a test score, the interpretation
should be verified by other evidence.
LEARNING GOAL: States the advantages and limitations of using local norms.
Directions: Briefly state the advantages and limitations of using local norms to interpret test
performance.
Advantages:
Limitations:
Note: Answers will vary.
209
Answers to Student Exercises
19-A 19-B 19-C 19-E
1. N 1. G 1. I 1. D
2. C 2. P 2. A 2. A
3. N 3. S 3. I 3. A
4. N 4. S 4. I 4. A
5. C 5. G 5. A 5. D
6. N 6. P 6. I 6. A
210
Chapter 19
Interpreting Test Scores and Norms
1. A student in grade 6 earned a raw score of 50 on an achievement test. Which of the following
interpretations is most justified?
A. The grade equivalent score is 6.0.
B. The percentage-correct score is 50.
C. The percentile score is 50.
D. The score cannot be interpreted without more information.
2. Brent earned a score of 91 on a 100-item achievement test. Which of the following
interpretations is most justified?
A. He is in the top half of the group.
B. His percentile score is 91.
C. His stanine score is 8.
D. His percentage-correct score is 91.
3. Which of the following best illustrates a criterion-referenced interpretation?
A. Comparing test performance with that of others.
B. Comparing performance on two different tests.
C. Describing the nature of the individual's performance.
D. Evaluating the correlation of test scores with a criterion.
4. When making criterion-referenced interpretations of standardized tests, educators must be
sure that
A. the total test is reliable.
B. the norms are adequate.
C. there are enough items for each interpretation.
D. there are provisions for using percentile scores.
5. Which of the following best describes test norms?
A. actual performance of representative groups
B. desired performance based on expert judgment
C. standards set by testing selected groups
D. test scores that have been normalized
6. At the end of grade 5, Erik has a grade-equivalent score of 6.5 in reading. When he is at the
end of grade 6 his score will most likely be
A. 7.0
B. 7.5
C. greater than 7.5
D. less than 6.0
211
7. Grade equivalent scores are least useful for
A. comparing performance on two tests.
B. describing growth in achievement.
C. interpreting scores to students.
D. reporting to parents.
8. A percentile score of 50 on an achievement test indicates that
A. half the norm group had lower scores.
B. half the items were marked correctly.
C. the raw score is at least 50.
D. this person failed the test.
9. Which of the following is a disadvantage of percentile ranks?
A. they are difficult to prepare
B. they are difficult to interpret
C. they depend on the number of items in the test
D. they have unequal units
10. The z-score serves as the basis for which of the following?
A. percentile score
B. standard age score
C. average score
D. formative score
11. Which of the following is true about T scores?
A. They are criterion-referenced.
B. They possess an absolute zero.
C. They are based on the z-score.
D. They are based on percentiles.
12. One advantage of T-scores over z-scores is that they
A. always have the same standard deviation.
B. are more easily converted to percentile ranks.
C. can be added and subtracted.
D. include only positive scores.
13. If a student ranks 5th in a class of 50, the percentile rank would be
A. 5.
B. 10.
C. 45.
D. 90.
212
14. In a normal distribution, which of the following scores equals a percentile rank of 16?
A. Standardized Age Score of 66.
B. Stanine of 2.
C. T-score of 40.
D. z-score of –1.6.
15. The smallest number of percentile ranks falls between which of the following ranges of T-
scores?
A. 30 and 35
B. 40 and 45
C. 50 and 55
D. 55 and 60
16. In a normal distribution a percentile rank of 84 would be equivalent to a stanine of
A. 6.
B. 7.
C. 8.
D. 9.
17. The “score bands” used on test profiles indicate which of the following?
A. score intercorrelations
B. score objectivity
C. score reliability
D. score validity
18. Other things being equal, which of the following represents the highest level of achievement?
A. Normal-curve equivalent = 80
B. Percentile rank = 84
C. T-score = 59
D. z-score = .8
19. In a normal distribution, which of the following represents the lowest level of achievement?
A. Normal-curve equivalent = 40
B. Percentile rank = 40
C. Stanine = 5
D. T-score = 40
20. Which of the following statements about stanine scores is most accurate?
A. It is possible to have a stanine of 0.
B. Stanines are based on a nine-point scale.
C. Stanines are more precise than other types of standard scores.
D. A stanine of 1 corresponds closely to the mean of the normal curve.
21. Percentages and percentiles are interchangeable statistics.
Agree
Disagree
213
22. Norm relevancy should be left up to the test publisher.
True
False
23. It is legitimate for the test consumer to ask questions about the recency of test norms.
Agree
Disagree
24. The normal curve has the particular property of being symmetrical.
True
False
25. List and describe the three attributes that good test norms should possess.
26. Describe z-scores and T scores. How are they different? What is their relationship to the
normal curve?
214
Chapter 19: Answer Key
1. D
2. D
3. C
4. C
5. A
6. C
7. A
8. A
9. D
10. B
11. C
12. D
13. D
14. C
15. A
16. B
17. C
18. A
19. D
20. B
21. Disagree
22. False
23. Agree
24. True
25. Good norms should be relevant, representative, and up to date (recent). Relevancy is the
degree of agreement between the test norm group and the attributes of the test takers.
The relevancy of norm groups needs to be judged by teachers and other educational
professionals before adopting a test for a given group of students. Representativeness
reflects the degree to which the norm group approximates a random sample of the intended
population. Truly random sampling is extremely difficult and expensive, however, so we
must usually settle for something less. At a minimum, we should demand that all significant
subgroups of the population be adequately represented. Recency means that the norms are up
to date and not outdated. Test norms ten or more years old are probably obsolete. Whenever
a test is published in a new edition, it should also contain up-to-date norms.
215
Appendix A Elementary Statistics
Exercise A-1
MEASURES OF CENTRAL TENDENCY
LEARNING GOAL: Distinguishes among measures of central tendency.
Directions: For each of the following statements, indicate which measure of central tendency is
being used by circling the appropriate letter using the following key.
KEY: A = Mean, B = Median, C = Mode.
A B C 1. It is the most frequent score in a set of scores.
A B C 2. It accounts for the numerical value of each score.
A B C 3. It is always an actual score.
A B C 4. It is always equal to the 50th percentile.
A B C 5. It is determined by dividing the sum of a set of scores by the number of
scores.
A B C 6. It would not change if an extremely high score earned by a single
individual was deleted from the set.
LEARNING GOAL: Selects the measure of central tendency that is most appropriate for a
particular use.
Directions: For each of the following statements, indicate whether the mean (A) or the median
(B) is most appropriate by circling the letter.
A B 1. To use with the quartile deviation.
A B 2. To use with the standard deviation.
A B 3. To divide a set of scores into two equal halves.
A B 4. To compute a set of standard scores.
A B 5. To limit the influence of a single score of 85 when all the other scores range
between 35 and 65.
A B 6. To report the most widely used measure of central tendency.
216
Exercise A-2
MEASURES OF VARIABILITY
LEARNING GOAL: Distinguishes among the measures of variability.
Directions: For each of the following statements, indicate which measure of variability is being
described by circling the appropriate letter using the following key.
KEY: A = Standard deviation, B = Quartile deviation, C = Range.
A B C 1. It is based on the highest and lowest scores only.
A B C 2. It is half the distance between the 25th and 75th percentiles.
A B C 3. It is also called the semi-interquartile range.
A B C 4. It accounts for the numerical value of each score.
A B C 5. It is influenced least by adding one extremely low score.
A B C 6. It can be used to identify the range of the middle 68 percent of scores in a
normal distribution.
LEARNING GOAL: Selects the measure of variability that is appropriate for a particular use.
Directions: For each of the following statements, indicate which measure of variability would be
most appropriate by circling the letter using the following key.
KEY: A = Standard deviation, B = Quartile deviation, C = Range.
A B C 1. To obtain the most stable measure of variability.
A B C 2. To obtain the simplest and quickest estimate of variability.
A B C 3. To obtain the range of the middle 50 percent of a set of scores.
A B C 4. To compute the amount of error in test scores.
A B C 5. To use with a small set of scores that includes one extremely high score.
A B C 6. To compute a set of T-scores.
217
Exercise A-3
CONSTRUCTING GRAPHS AND COMPUTING MEASURES OF CENTRAL
TENDENCY AND VARIABILITY
LEARNING GOAL: Constructs graphical representations of scores and computes measures of
central tendency and variability.
Directions: Use the following set of scores to: (1) construct a frequency polygon with a class
interval of 3, (2) construct a stem-and-leaf diagram, and (3) compute the mean, median, range,
and standard deviation.
Student Score
_______________
A 60
B 58
C 56
D 54
E 53
F 52
G 48
H 47
I 45
J 42
K 40
L 39
M 37
N 35
O 32
P 29
Q 28
R 20
S 15
T 10
Median =
Range =
Mean =
Standard deviation =
218
Exercise A-4
CORRELATION COEFFICIENT AND REGRESSION
LEARNING GOAL: Identifies the characteristics of the product-moment correlation coefficient.
Directions: Indicate whether each of the following features describes the product-moment
correlation coefficient by circling Yes (if it does) or No (if it does not).
Yes No 1. During the computation, it accounts for the numerical value of each score.
Yes No 2. It is easy to compute without the aid of a calculator.
Yes No 3. It can be used to compute estimates of test reliability.
Yes No 4. It can be used to evaluate test-criterion relationships.
Yes No 5. The degree of relationship is shown by the plus and minus signs.
Yes No 6. It can be used to indicate the cause/effect relations between measurement
variables.
LEARNING GOAL: Uses the regression equation to obtain predicted criterion scores from test
scores.
Directions: The regression equation for predicting a criterion measure, Y, from a test score, X,
is: predicted Y = –1.0 + .4X. Find the predicted criterion score for the following three students.
Carlos: Test score = 20
Predicted criterion score =
Kim: Test score = 15
Predicted criterion score =
Sam: Test score = 10
Predicted criterion score =
219
Exercise A-5
COMPUTING THE PRODUCT-MOMENT CORRELATION COEFFICIENT
LEARNING GOAL: Computes the product-moment correlation coefficient.
Directions: Compute the product-moment correlation for the pairs of scores in the following
table.
____________________________________________
Student      X      Y      X²      Y²      XY
____________________________________________
A 15 16
B 18 15
C 12 8
D 13 11
E 19 17
F 10 9
G 14 13
H 11 5
I 17 17
J 11 9
220
Answers to Student Exercises
A-1 (Top) 1. C 2. A 3. C 4. B 5. A 6. C
(Bottom) 1. B 2. A 3. B 4. A 5. B 6. A
A-2 (Top) 1. C 2. B 3. B 4. A 5. B 6. A
(Bottom) 1. A 2. C 3. B 4. A 5. B 6. A
A-3 STEM LEAF
6 0
5 23468
4 02578
3 2579
2 089
1 05
Median = 41
Range = 50
Mean = 40
Standard deviation = 14
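The A-3 values above can be verified with a few lines of Python (a quick check, assuming the population standard deviation, which is what yields 14 for these scores):

```python
from statistics import mean, median, pstdev

# Scores from Exercise A-3 (students A through T)
scores = [60, 58, 56, 54, 53, 52, 48, 47, 45, 42,
          40, 39, 37, 35, 32, 29, 28, 20, 15, 10]

print(median(scores))             # 41.0 (mean of the 10th and 11th scores)
print(max(scores) - min(scores))  # 50
print(mean(scores))               # 40
print(round(pstdev(scores)))      # 14 (population standard deviation)
```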
A-4 1. Y 2. N 3. Y 4. Y 5. N 6. N
Predicted criterion scores: Carlos, 7; Kim, 5; Sam, 3.
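The A-4 predictions follow directly from substituting each test score into the regression equation (a minimal sketch; the function name is ours):

```python
# Regression equation from Exercise A-4: predicted Y = -1.0 + 0.4X
def predict(x):
    return -1.0 + 0.4 * x

print(predict(20))  # Carlos: 7.0
print(predict(15))  # Kim: 5.0
print(predict(10))  # Sam: 3.0
```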
A-5 Product-moment correlation = .89.
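The A-5 coefficient can be checked with the raw-score formula for the product-moment correlation; for these pairs it works out to 107/120 ≈ .89 (a quick verification script):

```python
from math import sqrt

# Paired scores from Exercise A-5
xs = [15, 18, 12, 13, 19, 10, 14, 11, 17, 11]
ys = [16, 15, 8, 11, 17, 9, 13, 5, 17, 9]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Raw-score formula for the product-moment correlation
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(round(r, 2))  # 0.89
```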