Using simulation in medical student assessment


WR McIvor, MD
Associate Professor of Anesthesiology
Associate Director of WISER for Medical Student Simulation Education
www.wiser.pitt.edu


I don’t use sim to determine med student proficiency

• 12 years of experience

• "Teach" medical students and presume my efforts improve their knowledge, skills, and attitudes (KSAs)

• "Day 1" MS III course: 90 minutes, from 7:30 to 9:00 am
  – Goal: KSAs around "Do you want to intubate my patient?"
  – What is the value of keeping grossly incompetent students in the sim lab?


Factors driving assessment

• Public accountability¹

• LCME²
  – The educational program must provide a general professional education that prepares students for all career options in medicine, and cite relevant outcomes indicating success in that preparation.
  – Ensure students have acquired core clinical skills.

• Performance (vs. time) criterion for advancement³

¹ Crossing the Quality Chasm: A New Health System for the 21st Century. IOM, 2001.
² http://www.lcme.org/selfstudyguide1011.pdf
³ Educating Physicians: A Call for Reform of Medical School and Residency.


Advantages of sim-based assessment

• Reproducible

• Realistic

• Safe for patients

• Assess ability across many medical and surgical scenarios


Challenges

• What should we expect of a trainee?

• How hard is this scenario?

• Limitations of what can be simulated

• Requires several (4-8) scenarios to obtain an accurate assessment
  – Necessitates short experiences
  – Time validity?

• Clear understanding of what we are seeking to measure
  – Knowledge
  – Procedural skill
  – Decision making
  – Communication
  – Professionalism


Simulation used to assess medical students: USMLE Step 2 CS

• Uses standardized patients (SPs) to assess:

– Communication skills

– Diagnostic skills

– Interpersonal skills

– Documentation ability

– English proficiency

• Pass/fail exam


CS test characteristics¹

• Utilizes a method that has 35 years of history

• The cases (12) all have the same difficulty

• Very specific instructions:
  – Trust the vital signs (VS), unless you don't think you should
  – Do a focused, not necessarily complete, physical exam
  – Some physical findings will be real, some simulated (suspend disbelief)
  – Genital/rectal/pelvic simulators are used for those exams

• Only performed in Philadelphia

• Schools (certainly Pitt) rehearse this test

¹ http://www.usmle.org/Examinations/step2/cs/content/description.html


Mannequin simulator limitations

• Some things the simulators do not model well:
  – Cyanosis
  – Sweating
  – Respiratory distress

• Airway problems tend to be all or nothing
  – Can't present a moderately difficult intubation

• Time issues
  – Students give drugs, or mask ventilate, and expect an instantaneous change in VS
  – Sometimes administer several drugs at once, producing conflicting responses

• Simulators crash with some frequency


Key areas of human-patient simulation (HPS) assessment¹

1. Defining the skills to be assessed

• Choosing appropriate sim tasks

• Appropriate simulators

2. Establishing appropriate metrics

3. Determining the source of error in measurements

4. Evidence of the validity of test scores

¹ Anesthesiology 2010; 112:1041–52


1. Defining the skills to be measured and choosing the correct simulation

• The assessment needs:
  – A defined purpose
  – Delineation of the knowledge and skills evaluated
  – Context for performance-based activities

• Targeted to the examinee's ability

• Choose scenarios based upon:
  – Competency guidelines
  – Curriculum information
  – Simulation capabilities


2. Developing appropriate metrics: do the scores reflect actual ability?

• Implicit and explicit scoring
  – Explicit: checklists or key actions (see the sketch after this slide)
    • Established by content experts informed by experience and practice guidelines
    • Advantages: logical, objective scoring, modest reproducibility
    • Disadvantages: subjectively constructed; reward a scripted approach and "shotgun" performance; do not consider the order in which actions are taken
  – Implicit: the entire performance is rated as a whole ("global assessment")
    • Applied to teamwork/communication assessment
    • Often requires multiple well-trained raters
    • Typically scored retrospectively
    • How to assess varying performance over time?

• "Patient" (simulator) outcome
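
To make the explicit approach concrete, here is a minimal sketch of key-action checklist scoring in Python. The scenario, action names, and weights are invented for illustration, not taken from any instrument described in this talk.

    # Minimal sketch of explicit (key-action checklist) scoring for one
    # simulated scenario. Action names and weights are hypothetical.
    KEY_ACTIONS = {
        "checks_pulse": 1.0,
        "calls_for_help": 1.0,
        "begins_chest_compressions": 2.0,  # weighted: critical action
        "administers_epinephrine": 2.0,
    }

    def checklist_score(observed_actions: set[str]) -> float:
        """Fraction of weighted key actions the examinee performed."""
        total = sum(KEY_ACTIONS.values())
        earned = sum(weight for action, weight in KEY_ACTIONS.items()
                     if action in observed_actions)
        return earned / total

    # Example: an examinee who performed three of the four key actions.
    print(checklist_score({"checks_pulse", "calls_for_help",
                           "administers_epinephrine"}))  # ~0.67

Note that the score is computed from an unordered set of actions: exactly as the slide warns, this style of metric cannot penalize performing the right actions in the wrong sequence, and it rewards a "shotgun" attempt at everything.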


3. Test score reliability

• Generalizability (G) studies are conducted to identify sources of error (score inconsistency) and their interactions

• Decision (D) studies are then conducted to determine the optimal scoring design
  – How many simulations and raters are necessary for reliable scores, given the construct being assessed? (see the formula below)

• Task-sampling variance has a greater impact on assessment than the rater effect
  – Participants can do a great job treating hypotension and a poor job with hypoxia
  – More sim scenarios (not more raters) are needed to improve reliability
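
For reference, the quantity a D-study optimizes can be written down directly. In generalizability theory, for a fully crossed person × case × rater design, the expected reliability of relative decisions is

    E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pc}/n_c + \sigma^2_{pr}/n_r + \sigma^2_{pcr,e}/(n_c n_r)}

where \sigma^2_p is true person (examinee) variance, \sigma^2_{pc} and \sigma^2_{pr} are the person × case and person × rater interactions, \sigma^2_{pcr,e} is the residual, and n_c, n_r are the numbers of cases and raters. When \sigma^2_{pc} dominates, as the task-sampling finding above indicates, only increasing n_c shrinks the error term appreciably.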


4. Validity of test scores: what inferences can be made from the assessment scores?

• Content validity
  – Base simulations on actual occurrences/practice characteristics
  – Base scoring rubrics on evidence
  – Stakeholder feedback
  – Realistic modeling using real-world equipment

• Internal consistency
  – Good proceduralists are likely good communicators

• Criterion validity
  – Sim performance correlates positively with experience and test scores (e.g., board scores)

• A competency threshold ("cut score") must be determined (one common approach is sketched below)
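
The slides do not specify how a cut score should be set; one common standard-setting method is a modified Angoff procedure, sketched below with invented judge ratings.

    # Minimal sketch of a modified-Angoff cut-score calculation.
    # Each judge estimates, per checklist item, the probability that a
    # minimally competent examinee would perform that item correctly.
    # All ratings below are invented for illustration.
    angoff_ratings = {
        "judge_1": [0.9, 0.7, 0.5, 0.8],
        "judge_2": [0.8, 0.6, 0.6, 0.9],
        "judge_3": [0.9, 0.8, 0.4, 0.7],
    }

    def angoff_cut_score(ratings: dict[str, list[float]]) -> float:
        """Cut score = mean expected item score, averaged over judges."""
        per_judge = [sum(r) / len(r) for r in ratings.values()]
        return sum(per_judge) / len(per_judge)

    print(angoff_cut_score(angoff_ratings))  # ~0.72, i.e. a pass mark near 72%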


Experience with mannequin-based sim assessment¹

• Med school grads are expected to manage acute care scenarios
  – These scenarios can't be modeled with SPs
  – Knowledge (cognitive tests) may not be sufficient to assess management skills
  – Looked to HPS for a testing platform

• Had MS IVs and interns perform 6 of 10 scenarios

¹ Anesthesiology 2003; 99:1270–80


Results

• Interns were more proficient than MS IVs

• Variance in student/resident scores was attributable to case content

• To improve the precision of the assessment, increase the number of cases performed (a worked example follows the results)

• Increasing the number of raters would not improve reliability
  – Raters agreed about key elements during scenario development


Results (continued)

• Rater agreement reflected the assessment design:
  – Scoring was based on specific diagnostic and treatment guidelines
  – Scenarios were brief
  – Technical, not non-technical, skills were evaluated

• Participants with ACLS/PALS certification and CCM experience performed better
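
To see why adding cases beats adding raters, plug hypothetical variance components into the D-study formula given earlier, say \sigma^2_p = 0.30, \sigma^2_{pc} = 0.60, \sigma^2_{pr} = 0.03 (invented values, not the study's estimates; the residual term is omitted for simplicity):

    E\rho^2(n_c{=}4,\ n_r{=}1) = 0.30 / (0.30 + 0.60/4 + 0.03/1) = 0.30 / 0.48 \approx 0.63
    E\rho^2(n_c{=}8,\ n_r{=}1) = 0.30 / (0.30 + 0.075 + 0.03) = 0.30 / 0.405 \approx 0.74
    E\rho^2(n_c{=}4,\ n_r{=}3) = 0.30 / (0.30 + 0.15 + 0.01) = 0.30 / 0.46 \approx 0.65

Doubling the number of cases raises reliability far more than tripling the number of raters.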


Conclusions

• The rater facet did not impact overall reproducibility
  – Scenarios had a high degree of content validity (performance objectives established by experts)
  – Scoring rubrics were well defined

• Person × case variance was large
  – The number of cases is the most important factor affecting the reliability of this assessment

• Clinical experience correlated with better performance

• HPS can be used to evaluate clinical performance in med students and residents


For HPS to be an effective assessment tool, participants must be familiar with it

• More penetration of HPS into med school curricula

• ACGME statement that anesthesia residency programs use simulation yearly

• MOCA’s HPS requirement

• HPS is being studied as an evaluation instrument

• HPS will become commonplace in the next few years, therefore…


The future of assessment

• HPS will become an assessment tool for med students, like the USMLE Step 2 CS exam, by 20??

• Sim-based assessment will drive more sim experience

• Improve patient safety