Evaluation Rating Forms

Craig McClure, MD
May 15, 2003
Educational Outcomes Service Group

Typical Use of Rating Scales

• End of rotation (global)
• After single encounter (focused)
• To incorporate input from multiple evaluators
• Videotaped encounters
• NOT as a checklist for single encounters (Yes/No)

Alternate Forms

• Multiple episodes versus focused (single) episode
• Measuring global (six domains) versus task-specific behavior

Global Rating of Learner

• Domains of competence, not specific skills, tasks, or behaviors
• Completed retrospectively concerning multiple days and activities
• May be from multiple sources
• Use rating scales

Focused Rating Scale

• Single patient encounter
• Concerning specific task, skill, behavior

Advantages (Global)

• Easy to develop
• Easy to use (training minimal)
• Can be used to evaluate all domains
• Reasonable reliability when:
  • Focused evaluation
  • Tailored to competencies measured

Systematic Rater Errors (Global)

• Leniency/Severity
• Range Restriction
• Halo Effect
• Inappropriate Weighting
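As a rough illustration of how two of these errors might be screened for, the sketch below summarizes each rater's scores and flags possible leniency/severity (mean far from the scale midpoint) and possible range restriction (very small spread). The function names, the 1 to 5 scale, and all thresholds are hypothetical choices made for this example, not values from the presentation. A high mean can also reflect genuinely strong learners, so a flag is only a prompt for review.

```python
from statistics import mean, pstdev

def rater_profile(ratings_by_rater):
    """Return {rater: (mean, spread)} for each rater's list of scores."""
    return {r: (mean(s), pstdev(s)) for r, s in ratings_by_rater.items()}

def flag_raters(ratings_by_rater, scale_mid=3.0, shift=1.0, min_spread=0.5):
    """Flag raters whose scores suggest a systematic error.

    scale_mid, shift, and min_spread are illustrative assumptions
    for a 1-5 scale, not validated cut-offs.
    """
    flags = {}
    for rater, (avg, spread) in rater_profile(ratings_by_rater).items():
        notes = []
        if avg - scale_mid >= shift:
            notes.append("possible leniency")
        elif scale_mid - avg >= shift:
            notes.append("possible severity")
        if spread < min_spread:
            notes.append("possible range restriction")
        flags[rater] = notes
    return flags

# Made-up ratings on a 1-5 scale, one list per rater.
ratings = {
    "rater_a": [5, 5, 4, 5, 5],
    "rater_b": [3, 2, 4, 1, 5],
    "rater_c": [2, 2, 2, 2, 2],
}
flags = flag_raters(ratings)
for rater, notes in flags.items():
    print(rater, notes or "no flags")
```

Comparing raters who scored the same learners would separate rater tendency from learner ability better than the midpoint comparison used here.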

Drawbacks (Global)

• Content validity uncertain
• Questionable validity of general assessments extrapolated to whole domain
• Inefficient at directing learner improvement
• Accuracy variable
• Generosity factor
• Poor discrimination between learners

Mixed Research Results

• Discriminating between competence levels
• Reliably rating more skilled physicians higher than less skilled
• Reliability of ratings
• Reproducibility

• Best: knowledge
• Harder: patient care, interpersonal skills

Clarify Evaluative Objectives

• Global versus focused
• Define using competency-based language emphasized by ACGME

Group the Competencies

Patient Care, Medical Knowledge, Practice-Based Learning and Improvement, Interpersonal and Communication Skills, Professionalism, and Systems-Based Practice.

Composition of Form

• Short is better than long
• Big font is better than small
• Clean is better than cluttered

Each Behavior Is Evaluated Independently

Otherwise:
• Evaluator uncertain what to evaluate
• Learner uncertain what to address

Decide on Options in the Scale

• Best if minimum of five
• Best if a descriptor is present for each
• Absence of middle labels skews ratings toward the positive side

Primacy Effect

“The results showed that when the positive side of the scale was on the left, the ratings were more positive and had reduced variance than when the positive label was on the right.”

Lake Wobegon Effect

• Where all the children are above average
• Faculty tend to interpret anchors as more negative than literal
• Generosity effect

Consider Changing Anchors

If desire to keep evaluative anchors:
• Poor, fair, below average, average, above average, and excellent
• Very poor, poor, fair, good, very good, excellent

Consider Using Frequency Anchors

• Frequency of observable resident behaviors from “never” to “always”
• Judgmental ratings need considerable education of the evaluators to minimize inter-rater variability
• Permits program director (PD) competency judgment

Example of Stem for Frequency Anchor

Resident demonstrates respect in speaking to patient…

Never, 25%, 50%, 75%, Always

Competency Judgment at Program Level

• Permits competency definitions to vary by year of training
• Diminishes effect of inter-rater variability
• Focuses on observable behavior
• Requires less training of evaluators
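One way to picture a program-level judgment is as a rule applied to the collected frequency ratings, with the bar set per training year. The sketch below is only an illustration under stated assumptions: the numeric values assigned to the anchors, the averaging step, the thresholds in `REQUIRED_PERCENT`, and the function name `judge_competency` are all hypothetical and do not come from the presentation.

```python
# Frequency anchors from the example stem, mapped to percentages.
# Treating "Never" as 0 and "Always" as 100 is an assumption.
ANCHORS = {"Never": 0, "25%": 25, "50%": 50, "75%": 75, "Always": 100}

# Hypothetical minimum average frequency required per training year.
REQUIRED_PERCENT = {1: 50, 2: 70, 3: 90}

def judge_competency(anchor_ratings, training_year):
    """Return (meets_standard, average_percent) for one behavior.

    anchor_ratings: anchors chosen by the evaluators, e.g. ["75%", "Always"].
    Averaging across evaluators is an illustrative simplification.
    """
    scores = [ANCHORS[a] for a in anchor_ratings]
    average = sum(scores) / len(scores)
    return average >= REQUIRED_PERCENT[training_year], average

# Made-up ratings from four evaluators for a single behavior.
ratings = ["75%", "50%", "Always", "75%"]
print(judge_competency(ratings, training_year=1))  # (True, 75.0)
print(judge_competency(ratings, training_year=3))  # (False, 75.0)
```

The same observed ratings meet the hypothetical first-year standard but not the third-year one, which is the sense in which the competency definition can vary by year while evaluators only report what they observed.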

References

• Evaluations, S. Swing, Academic Emergency Medicine 2002;9:1278-88
• Assessment of Communication and Interpersonal Skills Competencies, Academic Emergency Medicine 2002;9:1257-69
• ACGME/ABMS Joint Initiative Toolbox of Assessment Methods, September 2000

References (2)

• Challenges in using rater judgments in medical education, M.A. Albanese, Journal of Evaluation in Clinical Practice 6(3):305-319
