1. Test Design 2. Item Specifications 3. Item Writing 4. Item Review
TRANSCRIPT
![Page 1: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/1.jpg)
EFFECTIVE ITEM WRITING
Item Writing Retreat
Orlando, Florida
January 2012
![Page 2: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/2.jpg)
Steps for Item Development
1. Test Design
2. Item Specifications
3. Item Writing
4. Item Review
![Page 3: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/3.jpg)
Steps for Item Development
1. Test Design
• Creates a design for what an effective end-of-course exam would look like for the selected course
• Determines which benchmarks are evaluated at what cognitive complexity level
![Page 4: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/4.jpg)
Steps for Item Development
2. Item Specifications
Writing Test Item Specifications: A Template
Step 1: Narrative & Depth of Knowledge
Narrative
Subject Area
Grade Level
Strand
Standard
Benchmark
Depth of Knowledge
Cognitive Complexity Level
Step 2: Assessment Item Type
Objective Item/ Multiple-choice
Performance Item
Step 3: Item Contexts, Content Limits, Key Vocabulary, & Prior Skills
Item Context
Content Limits
Key Vocabulary
Prior Skills
Step 4: Essential Questions
Question 1
Question 2
• Creates an outline for constructing items that effectively measure the intended content
![Page 5: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/5.jpg)
Steps for Item Development
3. Item Writing
• Uses the course description and item specifications document to create items
• Writes items to the cognitive complexity level that the benchmark warrants
![Page 6: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/6.jpg)
Steps for Item Development
4. Item Review
• Reviews items for bias, grammar, and punctuation
• Pilot tests items to determine validity and reliability
![Page 7: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/7.jpg)
Item Types
Multiple-choice
What is the capital of Florida?
This is considered the ‘STEM’
A. Miami
B. Tallahassee
C. Orlando
D. Jacksonville
Incorrect answers are called ‘DISTRACTORS’
![Page 8: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/8.jpg)
Item Types
Multiple-choice
The advantage is that, with careful construction, this type can be used to measure knowledge at most levels.
The disadvantage is that it's hard to write good distractors for levels beyond factual recall.
![Page 9: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/9.jpg)
Item Types
True/False
The border between the U.S. and Canada is longer than the border between the U.S. and Mexico.
A. True B. False
![Page 10: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/10.jpg)
Item Types
True/False
The advantage is that it's the most efficient way to measure a lot of content in a short period of test time.
The disadvantages are that it's hard to measure higher-level knowledge areas and that guessing gives a 50% chance of being right.
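The guessing problem can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original slides: the function name `chance_score` and the 20-item test length are assumptions chosen for illustration. It computes how many items a test taker could expect to get right by blind guessing alone.

```python
def chance_score(num_items: int, options_per_item: int) -> float:
    """Expected number of items answered correctly by blind guessing."""
    return num_items / options_per_item


# Hypothetical 20-item tests, one True/False and one four-option multiple-choice.
true_false_expected = chance_score(20, 2)
multiple_choice_expected = chance_score(20, 4)

print(true_false_expected)       # 10.0
print(multiple_choice_expected)  # 5.0
```

Under these assumptions, blind guessing earns half the points on the True/False test but only a quarter on the four-option test.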
![Page 11: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/11.jpg)
Item Types
Multiple Select
What colors are in the American flag? Mark all that are correct.
__ Red __ Green __ White __ Blue __ Black
![Page 12: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/12.jpg)
Item Types
Multiple Select
The advantage is that it is an efficient way of measuring a set of facts or concepts that cluster together.
The disadvantage is that it is suitable only for certain knowledge areas.
![Page 13: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/13.jpg)
Item Types
Matching
For each concept on the left, select the word from the list on the right that best matches it.
__ Test predicts future performance
__ Test appears a reasonable measure
__ Re-test scores are very similar
__ Low standard error
A. Face validity
B. Reliability
C. Accuracy
D. Validity
E. Consistency
![Page 14: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/14.jpg)
Item Types
Matching
The advantage is that it allows the comparison of related ideas or concepts.
The disadvantages are that it's not suitable for measuring isolated facts and information and that scoring can be complex.
![Page 15: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/15.jpg)
Item Types
Ranking
Put the following steps in the correct order a test author should take in writing a new test.
__ Prepare test blueprint
__ Determine test objectives
__ Draft test items
__ Evaluate items against criteria
__ Perform item analysis
__ Check with subject matter experts
__ Select item types to be used
__ Pilot the test and modify as needed
![Page 16: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/16.jpg)
Item Types
Ranking
The advantage is that this is perfect when knowing the correct order is important.
The disadvantages are that it's not suitable for anything else and that scoring can be complex.
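One reason ranking items are complex to score is that the scorer has to decide whether a partly correct order earns any credit. The sketch below is an illustration under stated assumptions, not a rule from the original slides: both scoring functions and the four placeholder steps ("A" through "D") are hypothetical.

```python
def score_exact(response: list[str], key: list[str]) -> float:
    """All-or-nothing rule: full credit only when the whole order matches."""
    return 1.0 if response == key else 0.0


def score_by_position(response: list[str], key: list[str]) -> float:
    """Partial-credit rule: fraction of steps placed in their keyed position."""
    matches = sum(r == k for r, k in zip(response, key))
    return matches / len(key)


# Hypothetical answer key and a response with two adjacent steps swapped.
key = ["A", "B", "C", "D"]
response = ["A", "C", "B", "D"]

print(score_exact(response, key))        # 0.0
print(score_by_position(response, key))  # 0.5
```

The same response earns nothing under one rule and half credit under the other, so the scoring rule needs to be chosen before the item is used.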
![Page 17: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/17.jpg)
Item Types
Fill in the blank
The first President of the United States was ___________________.
![Page 18: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/18.jpg)
Item Types
Fill in the blank
This type has little advantage over well-written multiple-choice items.
The disadvantage is that scoring can be difficult (and sometimes subjective).
![Page 19: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/19.jpg)
Effective Item Checklist
Trying to measure more than one thing with a single item is a mistake commonly made by new test authors.
Importance is sometimes confused with item difficulty. Something could be extremely important, but if 100% of the test takers always get the item right, it's probably trivial and should be eliminated.
The easiest way to tell if the stem is a complete thought is to cover up the response options and see if you know what you're supposed to do.
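The remark about items that every test taker answers correctly can be checked against pilot data. The following is a minimal sketch, assuming each item's responses are recorded as True (correct) or False (incorrect) per test taker. The function names and the sample data are hypothetical and do not come from the slides.

```python
def difficulty_index(responses: list[bool]) -> float:
    """Proportion of test takers who answered the item correctly."""
    return sum(responses) / len(responses)


def flag_trivial(items: dict[str, list[bool]]) -> list[str]:
    """Return the names of items that every test taker answered correctly."""
    return [name for name, r in items.items() if difficulty_index(r) == 1.0]


# Hypothetical pilot results for two items and four test takers.
pilot = {
    "Q1": [True, True, True, True],
    "Q2": [True, False, True, False],
}

print(difficulty_index(pilot["Q2"]))  # 0.5
print(flag_trivial(pilot))            # ['Q1']
```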
![Page 20: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/20.jpg)
Effective Item Checklist
Write each item as concisely as you can.
As you ensure that all response options are grammatically correct with respect to the stem, try to avoid the "a(n)," "is/are" solution. Rewrite the item so you can measure the knowledge or skill without getting hung up on the grammar.
Writing plausible distractors is both an art and a science, and it's very hard work.
![Page 21: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/21.jpg)
Effective Item Checklist
Response options that aren't independent
The average combined score in the NFL playoff games in 2010 was:
a) <10
b) <20
c) >40
d) >50
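The options in this item (<10, <20, >40, >50) overlap: any score below 10 is also below 20, and any score above 50 is also above 40. A minimal sketch of an automated check follows, assuming each numeric option is encoded as an open interval `(low, high)`. The function name `overlapping_options` and the encoding are illustrative assumptions, not part of the original slides.

```python
import math
from itertools import combinations


def overlapping_options(options: dict[str, tuple[float, float]]) -> list[tuple[str, str]]:
    """Return pairs of option labels whose open intervals (low, high) overlap."""
    pairs = []
    for (a, (lo_a, hi_a)), (b, (lo_b, hi_b)) in combinations(options.items(), 2):
        # Two open intervals overlap when the larger lower bound
        # is still below the smaller upper bound.
        if max(lo_a, lo_b) < min(hi_a, hi_b):
            pairs.append((a, b))
    return pairs


# The four options from the slide, encoded as open intervals.
options = {
    "a": (-math.inf, 10),  # <10
    "b": (-math.inf, 20),  # <20
    "c": (40, math.inf),   # >40
    "d": (50, math.inf),   # >50
}

print(overlapping_options(options))  # [('a', 'b'), ('c', 'd')]
```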
![Page 22: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/22.jpg)
Do’s and Don’ts
Things to do:
• Ensure that there is only one true and defensible answer
• Ask peers for help
• Get clarification if you have questions
![Page 23: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/23.jpg)
Do’s and Don’ts
Things to avoid:
• Jargon and Textbook Language
• Clichés
• Common Misinformation
• Logical Misinterpretations
• Copying questions from a textbook
• Partial Answers
• “None of These”
• “None of the Above”
• “All of these”
• “All of the Above”
![Page 24: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/24.jpg)
Webb’s 3 Tier Cognitive Complexity
Low Cognitive Complexity
Moderate Cognitive Complexity
![Page 25: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/25.jpg)
Webb’s 3 Tier Cognitive Complexity
Low Cognitive Complexity:
• One-step problem or basic facts
• Recall and basic comprehension, identify, label, define
![Page 26: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/26.jpg)
Webb’s 3 Tier Cognitive Complexity
Moderate Cognitive Complexity:
• Integrate and analyze
• Classify, analyze, explain, synthesize, implement
![Page 27: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/27.jpg)
Webb’s 3 Tier Cognitive Complexity
High Cognitive Complexity:
• Analyze and represent knowledge in new and innovative ways
• Create, represent, rearticulate, argue, extend content knowledge
![Page 28: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/28.jpg)
Webb’s 3 Tier Cognitive Complexity
• DOK level should reflect the level of work most commonly required to perform
• DOK level should reflect complexity of the cognitive processes
• DOK level describes the kind of thinking required by a task, not whether or not the task is “difficult”
• If there is a question between two levels, select the higher of the levels
![Page 29: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/29.jpg)
Examples of items
Locate and delete irrelevant clues
A test which may be scored merely by counting the correct responses is an _______________ test.
a) consistent b) objective c) stable d) standardized e) valid
The item could be rewritten.
A test which may be scored by counting the correct responses is said to be ____________
![Page 30: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/30.jpg)
Examples of items
Include one correct or most defensible answer
The most serious aspect of the energy crisis is the
According to the National Energy Council, the most serious aspect of the energy crisis is the
a) possible lack of fuel for industry.
b) possibility of widespread unemployment.
c) threat to our environment from pollution.
d) possible increase in inflation.
e) cost of developing alternate sources of energy.
![Page 31: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/31.jpg)
Examples of items
Multiple-choice items
a) may have several correct answers.
b) consists of a stem and some options.
c) always measure factual details.
The item should be revised.
The components of a multiple-choice item are:
a) stem and several foils.
b) correct answer and several foils.
c) stem, a correct answer, and some foils.
d) stem and a correct answer.
![Page 32: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/32.jpg)
Examples of items
Options should be presented in a logical and systematic order
What type of validity is determined by correlating scores on a test with scores on a criterion measured at a later date?
a) Concurrent b) Construct c) Content d) Predictive
![Page 33: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/33.jpg)
Examples of items
Options should be grammatically parallel and consistent with the stem
A test which can be scored by a clerk untrained in the content area of the test is an
a) diagnostic test. b) criterion-referenced tests. c) objective test. d) reliable test. e) subjective test.
![Page 34: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/34.jpg)
Examples of items
Options should be mutually exclusive
What should be the index of difficulty for an effective mastery-model test item?
a) Less than 10
b) Less than 20
c) More than 80
d) More than 90
What should be the index of difficulty for an effective mastery-model test item?
a) Approximately 10
b) Approximately 20
c) Approximately 80
d) Approximately 90
![Page 35: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/35.jpg)
Examples of items
Ensure that correct responses are not consistently shorter or longer than the foils.
A random sample is one in which
a) subjects are selected by levels.
b) each subject has an equal probability of being chosen for the sample.
c) every nth subject is chosen.
d) groups are the unit of analysis.
A random sample is one in which
a) subjects are selected by levels in proportion to the number at each level in the population.
b) each subject has an equal probability of being chosen.
c) every nth subject is chosen from a list.
d) groups, rather than individuals, are the unit of analysis.
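A length cue of this kind can be spotted mechanically. The sketch below is an illustration only: the function name `length_cue` is hypothetical, and it assumes that option b is the keyed answer, which the slide does not mark explicitly. It flags an item when the keyed answer is strictly the longest or the shortest option.

```python
def length_cue(options: dict[str, str], key: str) -> bool:
    """True when the keyed answer is strictly the longest or shortest option."""
    lengths = {label: len(text) for label, text in options.items()}
    others = [n for label, n in lengths.items() if label != key]
    return lengths[key] > max(others) or lengths[key] < min(others)


# First version of the item: the keyed answer (assumed to be b) stands out by length.
first_version = {
    "a": "subjects are selected by levels.",
    "b": "each subject has an equal probability of being chosen for the sample.",
    "c": "every nth subject is chosen.",
    "d": "groups are the unit of analysis.",
}

# Second version: the foils are lengthened so the key no longer stands out.
second_version = {
    "a": "subjects are selected by levels in proportion to the number at each level in the population.",
    "b": "each subject has an equal probability of being chosen.",
    "c": "every nth subject is chosen from a list.",
    "d": "groups, rather than individuals, are the unit of analysis.",
}

print(length_cue(first_version, "b"))   # True
print(length_cue(second_version, "b"))  # False
```

A single flagged item is not necessarily a problem, since the guideline concerns a consistent pattern across a test. Running the check over all items and counting the flags would be one way to look for that pattern.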
![Page 36: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/36.jpg)
Examples of items
Use negatively stated items infrequently
Which of the following is NOT a method of determining test reliability?
a) Coefficient of equivalence b) Coefficient of stability c) K-R #20 d) Split-halves procedure e) Test-criterion intercorrelation
Which of the following is a method of determining the validity of a test?
a) Coefficient of equivalence b) Coefficient of stability c) K-R #20 d) Split-halves procedure e) Test-criterion correlation
![Page 37: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/37.jpg)
Examples of items
Alter item difficulty by making options more alike or less alike in meaning
The quality of a test which indicates how consistently the test measures is called
a) objectivity. b) reliability. c) subjectivity. d) validity.
Easiest
![Page 38: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/38.jpg)
Examples of items
Alter item difficulty by making options more alike or less alike in meaning
The least expensive way to determine the reliability of a test is the
a) Kuder-Richardson procedure. b) test-retest procedure. c) parallel forms procedure. d) parallel forms over time procedure.
Harder
![Page 39: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/39.jpg)
Examples of items
Alter item difficulty by making options more alike or less alike in meaning
Which of the following procedures provides the most stable estimate of equivalence?
a) K-R #20 b) K-R #21 c) Odd-even split-halves d) Randomized split-halves
Hardest
![Page 40: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/40.jpg)
Validity
A test is valid when it:
• produces consistent scores over time
• correlates well with a parallel form
• measures what it purports to measure
• can be objectively scored
• has representative norms
![Page 41: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/41.jpg)
Online training modules
http://www.coedu.usf.edu/ychen/hc-htm.htm
![Page 42: 1.Test Design 2.Item Specifications 3.Item Writing 4.Item Review](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bff41a28abf838cbd435/html5/thumbnails/42.jpg)
EFFECTIVE ITEM WRITING