The good, the bad and the ugly of MCQs
Brett Vaughan PhD (candidate)
Department of Medical Education
Melbourne Medical School
Overview
Introduction
Writing good single best answer MCQs at Knowledge level
Technical item flaws
Post-test review
Introduction
MCQs and Miller’s triangle/pyramid
• Typically test at the ‘Knows’ and ‘Knows how’ levels
Multiple choice questions (Type A)
• Easy to write!
• Easy to be poor!
• Context free (i.e. no lead-in information)
• Context rich (i.e. case details)
Assessment utility
• Reliability
• Validity
• Cost-effectiveness
• Feasibility
https://faculty.londondeanery.ac.uk/e-learning/setting-learning-objectives/some-theory
Introduction
Overarching issues
• Content under-representation (resolved through blueprinting)
• Construct irrelevance
– Item flaws (e.g. inappropriate distractors)
– Test-wiseness (e.g. answers follow a pattern, stem cues the answer)
– Teaching to the test
– Poor phraseology (e.g. double negatives)
Writing good MCQs at Knowledge level
Writing MCQs: 1. Blueprint
• Make sure the learning outcomes are fairly assessed
• Ensure it is matched to the learner level
– E.g. Block 1 may focus mainly on knowledge
Learning Objective   Percentage of Block Content   Knowledge   Comprehension   Analysis   No. of Questions
LO1                  20%                           10          5               2          17 (21.25%)
LO2                  20%                           10          5               2          17 (21.25%)
LO3                  30%                           15          6               3          24 (30%)
LO4                  15%                           8           2               1          11 (13.75%)
LO5                  15%                           8           2               1          11 (13.75%)
Total                100%                                                                 80
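The blueprint arithmetic is easy to check with a short script. The following is a minimal sketch in Python using the counts from the blueprint; the names `blueprint` and `summarise` are illustrative assumptions, not part of the original slides.

```python
# Blueprint rows taken from the slide:
# objective -> (target weight %, knowledge, comprehension, analysis)
blueprint = {
    "LO1": (20, 10, 5, 2),
    "LO2": (20, 10, 5, 2),
    "LO3": (30, 15, 6, 3),
    "LO4": (15, 8, 2, 1),
    "LO5": (15, 8, 2, 1),
}


def summarise(rows):
    """Return (total items, {objective: (item count, share of test in %)})."""
    # Item count per objective is the sum across the three cognitive levels.
    counts = {lo: sum(levels) for lo, (_, *levels) in rows.items()}
    total = sum(counts.values())
    shares = {lo: (n, 100 * n / total) for lo, n in counts.items()}
    return total, shares


total_items, shares = summarise(blueprint)
for lo, (n, pct) in shares.items():
    target = blueprint[lo][0]
    print(f"{lo}: {n} items, {pct:.2f}% of test (target {target}%)")
print("Total items:", total_items)
```

Comparing each objective's share of the test with its target weight shows how closely the item counts follow the intended content weighting.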
Writing MCQs: 2. Consider topic focus
Item(s) should focus on a single important concept (common problem, serious problem, problem with significant consequences)
• Perianal (‘saddle’) paraesthesia may be associated with which of the following conditions:
A. Abdominal aortic aneurysm
B. Cauda equina syndrome
C. Lumbar muscle strain
D. ‘Pinched’ nerve
Avoid testing trivial or insignificant facts
• A patient says their pain is 8/10. This would be classified as:
A. Insignificant
B. Mild
C. Moderate
D. Severe
Writing MCQs: 3. A clear stem
Each item should have a clear question.
• The ectoderm forms which body tissues?
A. Nucleus pulposus of the intervertebral disc
B. Kidneys
C. Connective tissues (bones, ligaments)
D. Brain and spinal cord
The item should be answerable if the options are covered.
• The nerve supply to the diaphragm is the phrenic nerve. The nerve root levels for this are:
A. C2, C3 and C4
B. C3, C4 and C5
C. C4, C5 and C6
D. C5, C6 and C7
Writing MCQs: 4. Distractors (incorrect options)
Should be:
• Homogeneous (e.g. all diagnoses, all treatments, all blood tests)
• Plausible
• Worded in a similar way
• Potentially a logical response
• Same length as the correct answer
• Logically ordered
Writing MCQs: 5. Review
Have a colleague review each question
• Subject each question to the ‘covering response options process’
Identify potential technical item flaws
Reword item stem / response options as appropriate
Opportunity to pre-test items?
• Could you use some of the questions in formative assessment tasks?
Standard setting
Technical item flaws
Common technical flaws
Test-wiseness
• Logical cue
• Repeating words
• Longest option is correct
• Use of “all / none of the above”
• Convergence (patterns in responses)
• Implausible distractors
• Grammar cues
Irrelevant difficulty
• Vague terms
• Unnecessary information (in stem or response)
• Negative stems (e.g. not, except)
• More than one correct answer
• No correct answer
• Fails ‘response covering test’
• Numeric data not sequenced
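Some of the flaws listed above can be screened for automatically before peer review. The sketch below is a rough heuristic in Python, not a validated tool; the function name `flag_flaws`, its arguments and the flag labels are all assumptions made for illustration, and it covers only four of the flaws.

```python
import re


def flag_flaws(stem, options, key):
    """Return a list of possible technical flaws for one item.

    stem: question text; options: dict of letter -> option text;
    key: letter of the correct option. Heuristic only.
    """
    flags = []

    # Negative stems (e.g. not, except)
    if re.search(r"\b(not|except)\b", stem, re.IGNORECASE):
        flags.append("negative stem")

    # Use of "all / none of the above"
    pattern = r"\b(all|none) of the above\b"
    if any(re.search(pattern, t, re.IGNORECASE) for t in options.values()):
        flags.append("all/none of the above")

    # Longest option is correct (only if it is uniquely the longest)
    lengths = {k: len(t) for k, t in options.items()}
    longest = max(lengths.values())
    if lengths[key] == longest and list(lengths.values()).count(longest) == 1:
        flags.append("longest option is correct")

    # Numeric data not sequenced (neither ascending nor descending)
    try:
        nums = [float(t) for t in options.values()]
    except ValueError:
        nums = None  # options are not purely numeric
    if nums is not None and nums not in (sorted(nums), sorted(nums, reverse=True)):
        flags.append("numeric options not sequenced")

    return flags


item = {
    "stem": "Which of the following is NOT a factor that will affect "
            "conduction velocity of neurons?",
    "options": {"A": "Age", "B": "Neurological disorders",
                "C": "Gender", "D": "Location of the neuron"},
    "key": "A",
}
print(flag_flaws(item["stem"], item["options"], item["key"]))
```

A screen like this only narrows the list of items to look at; flaws such as vague terms, implausible distractors or more than one correct answer still need a human reviewer.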
Example Questions
The posterior intercostal arteries arise from which part of the arterial system?
A. Abdominal aorta
B. Inferior vena cava
C. Thoracic aorta
D. Aortic arch
The nerve supply to the tensor fascia lata muscle is:
A. Superior gluteal nerve
B. Femoral nerve
C. Inferior gluteal nerve
D. Nerve to tensor fascia lata
Post-test analysis
Must do!
Quantitative analysis
• Classical test theory
• Modern test theory (Rasch analysis, Mokken scaling)
Review common statistics
Post-test standard setting:
• ‘Cohen’ method
Post-test analysis
Considerations:
• Identify ‘potential’ item flaws, content poorly understood
• Do not equate analysis with item validity
• Reliability will vary based on cohort, test length, teaching methods, time of day etc.
Overall test:
• Cronbach’s alpha (ideally >0.7 for formative, >0.8 for summative)
• Be aware that if the assessment is testing a wide range of concepts, this value may be lower (i.e. different constructs being measured in the one assessment)
• May also see KR-20 being used
• Longer tests generally yield higher reliability
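Cronbach's alpha can be computed directly from a students-by-items score matrix. The following is a minimal sketch in Python using only the standard library; the function name `cronbach_alpha` and the `responses` data are hypothetical and not drawn from a real cohort. It uses population variances throughout, in which case the result for 0/1-scored items coincides with KR-20.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item scores per student)."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across students
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of students' total scores
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 6 students x 4 dichotomous (0/1) items
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

The value can then be compared with the thresholds above, bearing in mind that a test spanning several distinct constructs may legitimately score lower.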
Post-test analysis
Item level statistics
• Difficulty Index (proportion of students who answered the item correctly)
– <30% of cohort correct = ‘difficult’ item
– >70% of cohort correct = ‘easy’ item
• Discrimination Index
– Commonly reported as the point-biserial correlation
– >0.35 = good item
– 0.2-0.35 = acceptable item
– <0.2 = poor item
– Will rarely be >0.5
• Distractor analysis
– Functioning distractors (selected by students typically in the lower half of the cohort)
– Non-functioning distractors (selected by up to 5% of students)
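The three item-level statistics above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions: the function names and all data are hypothetical, the point-biserial is computed against the uncorrected total score (the item itself is included in the total), and "non-functioning" follows the 5% cut-off given above.

```python
from collections import Counter
from statistics import mean, pstdev


def difficulty(item_correct):
    """Proportion of students who answered the item correctly (0/1 list)."""
    return mean(item_correct)


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between a 0/1 item and total test score."""
    p = mean(item_correct)
    sd = pstdev(total_scores)
    if p in (0, 1) or sd == 0:
        return float("nan")  # undefined when everyone agrees or scores are equal
    right = [t for c, t in zip(item_correct, total_scores) if c]
    wrong = [t for c, t in zip(item_correct, total_scores) if not c]
    return (mean(right) - mean(wrong)) / sd * (p * (1 - p)) ** 0.5


def distractor_shares(choices, key, options="ABCD"):
    """Share of students choosing each incorrect option."""
    counts = Counter(choices)
    return {opt: counts[opt] / len(choices) for opt in options if opt != key}


# Hypothetical cohort of 10 students
item_correct = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
total_scores = [9, 8, 8, 7, 6, 5, 5, 4, 3, 2]
choices = list("AAAAABBCBB")  # option each student selected; key is A

p = difficulty(item_correct)
r_pb = point_biserial(item_correct, total_scores)
shares = distractor_shares(choices, key="A")
non_functioning = [opt for opt, s in shares.items() if s <= 0.05]

print(f"Difficulty: {p:.2f}  Discrimination: {r_pb:.2f}")
print("Distractor shares:", shares)
print("Non-functioning distractors:", non_functioning)
```

With small cohorts these figures are unstable, so they are better read as prompts for reviewing an item than as verdicts on it.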
Post-test analysis: Example (‘good’)
Somites are formed in which layer of the embryo? (1 mark)
A. Ectoderm
B. Mesoderm
C. Endoderm
D. Periderm
Post-test analysis
Answer is A
Difficulty: 44% correct answer (or 0.44)
Discrimination: 0.29
Distractors: B (20%), C (14%), D (19%)
Post-test analysis: Example (‘bad’)
Which of the following is NOT a factor that will affect conduction velocity of neurons? (1 mark)
A. Age
B. Neurological disorders
C. Gender
D. Location of the neuron
Post-test analysis
Answer is A
Difficulty: 98% correct answer (or 0.98)
Discrimination: 0.07
Distractors: B (1%), C (1%), D (1%)
Post-test analysis: Example (‘ugly’)
The pubic symphysis is classified as what type of joint?
A. Plane cartilaginous joint
B. Plane synovial joint
C. Secondary cartilaginous joint
D. Primary cartilaginous joint
Post-test analysis
Answer is C
Difficulty: 15% correct answer (or 0.15)
Discrimination: 0.02
Distractors: A (6%), B (1%), D (15%)
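The three worked examples can be run through the item-level thresholds given earlier. The sketch below is illustrative Python; the function names are assumptions, and the "mid-range" label for items between 30% and 70% correct is also an assumption, since the slides name only the ‘difficult’ and ‘easy’ bands.

```python
def difficulty_band(p):
    """Classify the proportion correct using the 30% / 70% cut-offs."""
    if p < 0.30:
        return "difficult"
    if p > 0.70:
        return "easy"
    return "mid-range"  # assumed label; not named in the slides


def discrimination_band(d):
    """Classify the discrimination index using the 0.2 / 0.35 cut-offs."""
    if d > 0.35:
        return "good"
    if d >= 0.2:
        return "acceptable"
    return "poor"


# (difficulty, discrimination) as reported for the three example items
examples = {
    "good": (0.44, 0.29),
    "bad": (0.98, 0.07),
    "ugly": (0.15, 0.02),
}

for label, (p, d) in examples.items():
    print(f"{label}: difficulty={difficulty_band(p)}, "
          f"discrimination={discrimination_band(d)}")
```

The ‘bad’ and ‘ugly’ items both discriminate poorly but for opposite reasons: one is answered correctly by nearly everyone, the other by very few.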
Post-test analysis
‘Cohen’ standard setting
• Relies on standard error of measurement (SEM) – “true score reliability” derived from Cronbach’s alpha or similar
• Might be difficult to use with institutional policies
• Can provide an indication as to overall test difficulty
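The standard error of measurement referred to above is conventionally calculated as the standard deviation of test scores multiplied by the square root of one minus the reliability coefficient. The following Python sketch shows that calculation only, not the full standard-setting procedure; the function name and the example figures (a score SD of 8 marks and an alpha of 0.84) are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability in [0, 1]."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


# Hypothetical test: SD of total scores = 8 marks, Cronbach's alpha = 0.84
sem = standard_error_of_measurement(sd=8, reliability=0.84)

# Approximate 95% band around an observed score of 50 marks
observed = 50
lower, upper = observed - 1.96 * sem, observed + 1.96 * sem
print(f"SEM: {sem:.1f} marks; 95% band: {lower:.1f} to {upper:.1f}")
```

A larger SEM means more uncertainty around any individual score, which matters most for students close to the pass mark.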
How can the analysis be done?
• Excel (easy!)
• R (the QME package produces neat reports)
Resources
Constructing A-Type Multiple Choice Questions (MCQs): Step By Step Manual. Mohammed Elhassan Abdalla, Abdelrahim Mutwakel Gaffar and Rasha Ali Suliman.
https://www.academia.edu/1215065/Constructing_A-Type_Multiple_Choice_Questions_MCQs_Step_By_Step_Manual
Constructing Written Test Questions for the Basic and Clinical Sciences (4th edition). National Board of Medical Examiners® (NBME®).
https://www.nbme.org/publications/item-writing-manual.html
Thank you
@BrettVaughan4
excite.mdhs.unimelb.edu.au