Evaluation Rating Forms
Craig McClure, MD
May 15, 2003
Educational Outcomes Service Group
Typical Use of Rating Scales
• End of rotation (global)
• After a single encounter (focused)
• To incorporate input from multiple evaluators
• Videotaped encounters
• NOT as a yes/no checklist for single encounters
Alternate Forms
• Multiple episodes versus a focused (single) episode
• Measuring global (six domains) versus task-specific behavior
Global Rating of Learner
• Rates domains of competence, not specific skills, tasks, or behaviors
• Completed retrospectively, covering multiple days and activities
• May draw on multiple sources
• Uses rating scales
Focused Rating Scale
• Single patient encounter
• Concerns a specific task, skill, or behavior
Advantages (Global)
• Easy to develop
• Easy to use (minimal training)
• Can be used to evaluate all domains
• Reasonable reliability when …
Focused Evaluation
• Tailored to the competencies being measured
Systematic Rater Errors (Global)
• Leniency/severity
• Range restriction
• Halo effect
• Inappropriate weighting
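Two of these errors leave a visible footprint when a rater's scores are compared with everyone else's: leniency or severity shifts a rater's average away from the group average, and range restriction shows up as an unusually small spread. The sketch below is a minimal illustration under stated assumptions, not a validated method: the ratings are invented, the 1-5 scale is assumed, and the function `rater_profile` and its thresholds `lenient_gap` and `min_sd` are hypothetical. Halo effect and inappropriate weighting cannot be detected this simply and are not addressed.

```python
from statistics import mean, pstdev

# Hypothetical ratings on an assumed 1-5 scale: rater -> scores given to five learners.
ratings = {
    "rater_a": [5, 5, 5, 4, 5],  # generous, little spread
    "rater_b": [3, 4, 2, 5, 3],
    "rater_c": [2, 2, 3, 2, 2],  # harsh, little spread
}

def rater_profile(ratings, lenient_gap=0.75, min_sd=0.5):
    """Flag possible leniency/severity and range restriction for each rater.

    lenient_gap and min_sd are arbitrary illustrative thresholds (assumptions).
    """
    all_scores = [s for scores in ratings.values() for s in scores]
    grand_mean = mean(all_scores)
    flags = {}
    for rater, scores in ratings.items():
        notes = []
        gap = mean(scores) - grand_mean  # distance from the pooled average
        if gap > lenient_gap:
            notes.append("possible leniency")
        elif gap < -lenient_gap:
            notes.append("possible severity")
        if pstdev(scores) < min_sd:  # scores bunched in a narrow band
            notes.append("possible range restriction")
        flags[rater] = notes
    return flags

print(rater_profile(ratings))
```

A flag is only a prompt to look more closely; a rater who supervised an unusually strong group of learners would also be flagged as "lenient".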
Drawbacks (Global)
• Content validity uncertain
• Questionable validity of general assessments extrapolated to the whole domain
• Inefficient at directing learner improvement
• Variable accuracy
• Generosity factor
• Poor discrimination between learners
Mixed Research Results
• Discriminating between competence levels
• Reliably rating more skilled physicians higher than less skilled ones
• Reliability of ratings
• Reproducibility
• Best: knowledge
• Harder: patient care, interpersonal skills
Clarify Evaluative Objectives
• Global versus focused
• Define objectives using the competency-based language emphasized by the ACGME
Group the Competencies
• Patient Care
• Medical Knowledge
• Practice-Based Learning and Improvement
• Interpersonal and Communication Skills
• Professionalism
• Systems-Based Practice
Composition of Form
• Short is better than long
• Big font is better than small
• Clean is better than cluttered
Each Behavior Is Evaluated Independently
Otherwise:
• The evaluator is uncertain what to evaluate
• The learner is uncertain what to address
Decide on Options in the Scale
• Best with a minimum of five options
• Best if each option has a descriptor
• Absence of middle labels skews ratings toward the positive side
Primacy Effect
“The results showed that when the positive side of the scale was on the left, the ratings were more positive and had reduced variance than when the positive label was on the right.”
• Lake Wobegon effect: where all the children are above average
• Faculty tend to interpret anchors as more negative than their literal meaning
• Generosity effect
Consider Changing Anchors
If you want to keep evaluative anchors, consider replacing
• Poor, fair, below average, average, above average, excellent
with
• Very poor, poor, fair, good, very good, excellent
Consider Using Frequency Anchors
• Rate the frequency of observable resident behaviors, from “never” to “always”
• Judgmental ratings, by contrast, need considerable evaluator education to minimize inter-rater variability
• Permits the program director (PD) to make the competency judgment
Example of Stem for Frequency Anchor
Resident demonstrates respect in speaking to patient…
Never, 25%, 50%, 75%, Always
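Because frequency anchors describe how often a behavior was seen, an evaluator's tally can be translated into an anchor mechanically. The sketch below assumes the evaluator counts how many observed encounters showed the behavior; the names `ANCHORS` and `frequency_anchor` are hypothetical, and snapping to the nearest anchor is one possible convention, not a rule from the source.

```python
# Anchors from the example stem, expressed as the proportion of observed encounters.
ANCHORS = [(0.0, "Never"), (0.25, "25%"), (0.50, "50%"), (0.75, "75%"), (1.0, "Always")]

def frequency_anchor(times_observed: int, encounters: int) -> str:
    """Return the anchor closest to the observed frequency of the behavior.

    Assumes the evaluator tallied encounters; ties go to the lower anchor.
    """
    if encounters <= 0 or not 0 <= times_observed <= encounters:
        raise ValueError("need encounters > 0 and 0 <= times_observed <= encounters")
    proportion = times_observed / encounters
    # min() keeps the first anchor on a tie, i.e. the lower one
    return min(ANCHORS, key=lambda anchor: abs(anchor[0] - proportion))[1]

print(frequency_anchor(7, 10))  # behavior seen in 7 of 10 encounters
```

Rounding ties downward is a deliberately conservative choice that slightly counteracts the generosity effect; a program could just as reasonably round up.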
Competency Judgment at Program Level
• Permits competency definitions to vary by year of training
• Diminishes the effect of inter-rater variability
• Focuses on observable behavior
• Requires less training of evaluators
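The two program-level benefits above can be seen together in a small sketch: the program pools frequency ratings from several evaluators, so no single rater decides the outcome, and compares the pooled value with an expectation set for the resident's year of training. Everything specific here is a hypothetical assumption: ratings are stored as proportions (0.0 = never, 1.0 = always), the median is the pooling rule, and the thresholds in `EXPECTED_FREQUENCY` are invented for illustration.

```python
from statistics import median

# Hypothetical program-defined expectations: minimum pooled frequency by PGY year.
EXPECTED_FREQUENCY = {1: 0.50, 2: 0.75, 3: 0.75}

def meets_expectation(frequency_ratings: list[float], pgy_year: int) -> bool:
    """Program-level competency call from several evaluators' frequency ratings.

    The median limits the pull of any single lenient or severe evaluator.
    """
    if not frequency_ratings:
        raise ValueError("no ratings to judge")
    return median(frequency_ratings) >= EXPECTED_FREQUENCY[pgy_year]

ratings = [0.5, 0.5, 1.0]  # three evaluators rating the same behavior
print(meets_expectation(ratings, pgy_year=1), meets_expectation(ratings, pgy_year=2))
```

The same ratings can meet the expectation for a first-year resident but not for a second-year resident, which is how competency definitions can vary by year without changing the form itself.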
References
• Evaluations. S. Swing. Academic Emergency Medicine 2002;9:1278-88.
• Assessment of Communication and Interpersonal Skills Competencies. Academic Emergency Medicine 2002;9:1257-69.
• ACGME/ABMS Joint Initiative. Toolbox of Assessment Methods. September 2000.
• Challenges in using rater judgments in medical education. M.A. Albanese. Journal of Evaluation in Clinical Practice 6(3):305-319.