Design of a Tagged Electronic Database of Exam Questions (TEDEQ) as a Tool
for Assessment Management within an Undergraduate Medical Curriculum.
Dr. Dale D. Vandre
Department of Physiology and Cell Biology, College of Medicine, The Ohio State University, Columbus, OH
Eric Ermie
Office of Medical Education, College of Medicine, The Ohio State University, Columbus, OH
EXAMSOFT WHITE PAPER
Abstract
An aspect of curriculum mapping that is often overlooked is exam blueprinting,
which provides a necessary link between instructional content and examination items.
Computerized testing not only increases the efficiency and reliability of examination
delivery, but also provides an effective tool for exam blueprinting, item banking, and
management of examination content and quality. We designed a method to categorize
the exam items used in our preclinical medical curriculum with a unique identifying tag,
creating a tagged electronic database of exam questions (TEDEQ) within the SofTeach
module of the ExamSoft test management system. Utilizing the TEDEQ output, a detailed
report of exam performance is now provided to students following completion of each
examination.
This enables students to better evaluate their performance in relevant subject areas
after each examination and to follow their cumulative performance in subject areas spread
longitudinally across the integrated medical curriculum. These same reports can be used
by faculty and staff to aid in academic advising and counseling of students. The TEDEQ
system provides a flexible tool for curricular management and examination blueprinting
that is relatively easy to implement in a medical curriculum. The information retrieved from
the TEDEQ enables students and faculty to better evaluate course performance.
Introduction
The importance of assessment in medical education is well established, not only
for guiding learning and ensuring competence, but also as a means of providing feedback
to students as they progress through the curriculum. Perhaps the most critical aspect of
assessment is the licensure exam, which has a significant impact on determining career
options. Assessing knowledge and evaluating the acquisition of competencies by
students during their undergraduate medical education relies heavily on
multiple-choice examinations. Despite the extensive use of multiple-choice
questions in medical curricula, comparatively little faculty time is dedicated to the
construction of the exam relative to the time involved in the design, preparation,
and delivery of curricular content.1 Instructors often do not devote sufficient effort to
the preparation of questions, and the exam tends to be assembled at the last minute with
little or no time for adequate review of the questions or evaluation of the overall balance
and quality of the exam as a whole.2 As a result, the quality of in-house multiple-choice
questions, especially in the pre-clinical curriculum, may suffer from being too reliant on
questions that focus on simple recall and comprehension of knowledge without effectively
testing higher order thinking skills.3,4
Curriculum mapping programs have been designed to facilitate the management
of integrated medical school curricula in order to keep track of institutional objectives,
course objectives, content areas, learning resources, learning events, assessments, and
outcomes. Many of these approaches focus on generating a database that includes a
taxonomy of subjects or concepts included in the curriculum with varying degrees of
granularity. Examples include databases such as KnowledgeMap,5 Topics for Indexing
Medical Education (TIME),6,7 and CurrMit.8 Documentation of where specific topics are
covered in the curriculum is an important component of the accreditation process for
medical schools, and utilization of curriculum management tools provides an efficient
mechanism to aid in addressing accreditation standards.
One component of effective curriculum mapping is exam blueprinting, which is often
overlooked in medical education.3,9 A test blueprint is used to link the content delivered
during a period of instruction to the items appearing on the corresponding examination,
and is a measurement of the representativeness of the test items to the subject matter.10,11
To improve the quality of written examinations it is essential that the examination questions
reflect the content of the curriculum. Therefore, test blueprinting is a critical component
of improving examination quality, but linking curricular topics with those in the exam
questions is not sufficient to ensure that the examination has content validity.9,12 In addition
to measuring whether the examination adequately represents the learning objectives,
content validity ensures that the examination is comprehensive and does not reflect
biased or under-sampling of the curriculum. Moreover, test blueprinting ensures that
the questions are balanced with regard to degree of difficulty, that the items are clearly
written and the format is not flawed, and that the examination measures higher order
thinking skills and not just factual recall.1,3,9,13 Therefore additional information, beyond
that provided by the subject taxonomy, is required of individual test questions in order to
effectively determine the content validity of assessments.
The introduction of computerized testing to medical education provides an opportunity
to increase the efficiency and reliability of the assessment process. When compared to
paper-and-pencil examinations, no difference was found in the performance of medical
students on computer-based exams.14 In addition to the delivery of examinations by computer,
programs designed to maintain a database of exam questions, or item bank, from which
examinations could be assembled were described nearly 30 years ago.15 However,
testing software must be extremely flexible to meet the demands of a medical school
curriculum. In addition to facilitating item banking, the software must be user-friendly, be
able to collect item statistics, provide immediate scoring feedback, have the capability of
presenting items using various multimedia formats, and be able to deliver the examination
in a secure mode. Because of these various demands, suitable commercial software
products were unavailable until recently, and as a result medical schools that were early
adopters of computerized testing developed in-house solutions such as the ItemBanker
program developed at the University of Iowa.16 As part of the ItemBanker system, each
exam question is identified by a unique serial number, and the database allows for a
breakdown of question topic taxonomy, and provides statistics on performance and item
difficulty.
On-line administration of licensure examinations is becoming more commonplace in
professional education. The United States Medical Licensing Examination (USMLE) Step
1, Step 2, and Step 3 tests have been delivered in a computerized format since 1998, and
the National Board of Medical Examiners provides an increasing number of examinations in
an on-line format. Similarly, the bar examination is used to determine whether a student
is qualified to practice law in a given jurisdiction. Unlike the USMLE, which is a national
examination, bar examinations are administered by each state in the United States, and 38
states currently use ExamSoft Worldwide, Inc., as the provider of secure on-line computer-
based testing software for the administration of the state bar examination.17 Based upon
this utilization, we evaluated ExamSoft among other commercial testing software products
for the administration of examinations to pre-clinical medical students, and adopted
ExamSoft for the administration of multiple-choice examinations in 2009.
The construction of a well-written exam is required to effectively measure student
competencies throughout the curriculum. Ultimately, exam performance is used to assess
the success of the educational program in preparing the student for licensure exams and
more advanced training. Therefore, a quality exam must not only measure the student’s
application of knowledge, but it is also essential that the questions adequately reflect the
course content and objectives. We describe the development of a tool to generate a tagged
electronic database of exam questions (TEDEQ) that can be used for the categorization
of multiple choice examination questions. TEDEQ provides information necessary to link
existing questions to curricular objectives using a taxonomy of instructional objectives,
identifies question characteristics, and helps ascertain the level of knowledge required to
address the question. The TEDEQ tool is easy to implement and integrate into existing
curricula and can be customized using the SofTeach module of the ExamSoft test
management system to derive the maximum amount of information from assessments. We
have integrated the TEDEQ tool as part of the computerized administration of our exams
using ExamSoft, and the information is being used to help improve examination quality,
provide input into curricular management, supplement curricular mapping documentation
for accreditation, and make content area specific performance feedback available to our
students.
The preclinical Integrated Pathway (IP) program at Ohio State University College of
Medicine is broken down into organ system blocks, which are subdivided into divisions
ranging in length from three to five weeks. At the end of each division, a 100-125 question
multiple-choice examination is administered to assess whether the outlined learning
objectives have been achieved. We have amassed an item bank of over 3,500
multiple-choice questions distributed across 22+ exams during the Med 1 and Med 2 years of the
IP curriculum. Previously, little or no information was gathered linking course learning
objectives with the items included on the examination; rather, test items were simply
grouped according to the division test they were part of. Additionally, the only performance
feedback provided to the students, administrators, or faculty following an examination
was the overall test score and an item analysis of each exam question.
We set out to design a system that would provide more thorough feedback to the
students regarding their performance in content specific areas of a particular exam as well
as across longitudinal topics that span the curriculum. In addition, we wanted to generate
information that would help faculty guide curricular management and enable improved
examination quality, and give administrators an additional measure by which to compare
internal course performance against student performance on the USMLE Step 1 exam.
To accomplish these goals we developed a simple coding system for each question, the
Tagged Electronic Database of Exam Questions, which utilizes specific data markers to
categorize, organize, and track use of items. We accomplished this goal using features of
the question categorization tool developed within ExamSoft, which is the software used
currently for secure computerized delivery of examinations within the IP program.
While the question categorization system, contained within the ExamSoft software,
allowed for an unlimited number of categories to be assigned to each question, one of our
design objectives was to limit the number of fields applied to each question as well as the
granularity of the topic categories in order to obtain data that would be most meaningful.
For students, useful information would include performance feedback in subject areas of
the curriculum, a study guide indicating areas of deficiency for those students
requiring remediation, and an aid in planning their USMLE Step 1 preparation.
Further, these limitations allowed us to create a system that was not overly complex or
difficult for faculty to implement. This structure ensured greater faculty buy-in, without
compromising the impact of the information collected, which was required to meet our goals
with regard to curricular management and test construction.
Methods
The method used for item tagging/question categorization consisted of six categories,
as outlined in Table 1. Each of these categories provides distinct information that will have
different significance depending on the recipient audience. The content for categories one
and two associates the exam question with the block and division of the IP curriculum and
provides a link to the faculty member responsible for writing the item. In most cases, the
identified faculty member is also responsible for the design and delivery of the learning
materials used to meet the learning objective the question is designed to assess.
Therefore, the tagging system links the question to the appropriate content expert if
questions or issues arise over the validity or accuracy of the question. The “options” for
each of the first two categories are simply the names of the blocks in the curriculum and
the names of the faculty members.
Table 1 – Design of the Tagged Electronic Database of Exam Questions (TEDEQ) Categories
Table 1 contains a detailed list of all the categorization options within the first five categories of the TEDEQ system. This table is distributed to faculty for use in categorizing questions.
For category three we defined four component choices that would classify the
question type with regard to level of cognitive complexity: 1) Recall of Factual Knowledge
- memory recall/factoid questions; 2) Interpretation or Analysis of Information - questions
that required the interpretation of data from a table/graph and use of that information
to answer the question; 3) Application: Basic Science Vignette - questions pertaining to
foundational science that contain a vignette of patient information that must be applied,
requiring multiple steps of knowledge application, to deduce the correct answer; and 4)
Application: Clinical Science Vignette - questions pertaining to clinical science that contain
a vignette of patient information that must be applied, requiring multiple steps of knowledge
application, to deduce the correct answer. To code the question, a number was assigned
to each category component. For example, a recall question generated by Dr. Smith in
the first division of the Neuroscience block would be coded Neuro1.Smith.1. Subsequent
categories were assigned numeric values and added to the end of the code sequentially
separated by periods.
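The dot-delimited code described above can be assembled and decomposed with simple string operations. The sketch below is illustrative only; the function names and field labels are our assumptions and are not part of the ExamSoft software:

```python
# Minimal sketch of building and parsing a TEDEQ-style dot-delimited
# code such as "Neuro1.Smith.1". Function names and field labels are
# illustrative assumptions, not part of the ExamSoft software.

def build_tag(block_division, author, *numeric_codes):
    """Join category values into a single code, with subsequent
    numeric category values appended and separated by periods."""
    return ".".join([block_division, author, *map(str, numeric_codes)])

def parse_tag(tag):
    """Split a TEDEQ-style code back into its category components."""
    block_division, author, *codes = tag.split(".")
    return {
        "block_division": block_division,  # category 1: block/division
        "author": author,                  # category 2: question writer
        "codes": [int(c) for c in codes],  # categories 3+: numeric values
    }

# A recall question (type 1) by Dr. Smith in Neuroscience division 1:
print(build_tag("Neuro1", "Smith", 1))  # Neuro1.Smith.1
```

Keeping the code a plain delimited string means it can be stored in any single categorization field and still be split apart later for reporting.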
Categories four, five and six are used to map the question to content areas. The
categorization was designed to meet the components of our curriculum, but is also flexible
enough to be applied to other medical curricula as well. To make the tagging system as
widely applicable as possible we focused on the subject categories of the USMLE Step 1
exam.18 We first tag the question into broad categories with regard to process and focus.
These include normal process, abnormal process, therapeutics, or gender, ethnic, and
behavioral considerations.
All medical schools whose students take the USMLE Step 1 exam receive a report from
USMLE that breaks down the performance of their students into 20 specific categories in
comparison to the national average for each of those categories. These are
the same 20 categories into which content is broken down in the USMLE Step 1 study guide
provided to medical students for Step 1 exam preparation. Since these major subject
areas are comparable to those used in either the block design of our curriculum or as
longitudinal subject areas that run across the IP curriculum, they were adopted as the
20 subject areas for category five. The study guide also contains specific sub-topics for
each of the 20 subject areas. We reviewed the sub-topics from the USMLE Step 1 study
guide and compared them with an internal set of learning objectives and topics that we
use within our curriculum. These two lists were combined and modified as necessary to
create the sub-categories that comprise category six. A sampling of those sub-categories
is presented in Table 2. A set of sub-categories was created for and matched to each of
the 20 subject areas of category five.
Table 2 – Sample TEDEQ Sub-Categories
Table 2 contains a sampling of the sub-categories used in the TEDEQ system. In total there are 290 sub-categories within the system, each associated with one of the 20 USMLE subject areas. Faculty were required to select one sub-category for each subject area they associated with a question.
Faculty buy-in was key to the successful implementation of the system. As such, we
met with the faculty members who serve as block leaders within the curriculum to explain
the categorization system, how it worked, and what it could do for them. We provided
all block leaders with instructions detailing the guidelines for applying the categorization to
their exams. Faculty members were instructed that they could assign only one option
from each category for categories one through four. A question could be assigned up
to three subject categories (category five); however, for each designation in category five a
corresponding sub-category designation in category six is required. The number of
sub-categories per subject category ranged from 5 to 25.
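The assignment rules just described amount to a small set of constraints that can be checked mechanically. The sketch below expresses them as a validator; the field names are hypothetical and chosen only to mirror the six categories:

```python
# Hypothetical sketch of the TEDEQ tagging rules described above:
# exactly one value for each of categories 1-4, one to three subject
# areas (category 5), and a matching sub-category (category 6) for
# every chosen subject. Field names are assumptions for illustration.

def validate_tagging(tag):
    """Return a list of rule violations (empty if the tagging is valid)."""
    errors = []
    for field in ("block_division", "author", "question_type", "process"):
        if not tag.get(field):
            errors.append(f"exactly one value required for {field}")
    subjects = tag.get("subjects", [])
    if not 1 <= len(subjects) <= 3:
        errors.append("a question must carry one to three subject areas")
    for s in subjects:
        if not s.get("sub_category"):
            errors.append(f"subject {s.get('name')!r} needs a sub-category")
    return errors

tag = {
    "block_division": "Neuro1",
    "author": "Smith",
    "question_type": 1,   # recall of factual knowledge
    "process": 2,         # abnormal process
    "subjects": [{"name": "Pharmacology", "sub_category": "Autonomic drugs"}],
}
print(validate_tagging(tag))  # [] -> the tagging is valid
```

A check of this kind could be run over a whole exam's worth of tags before the database accepts them, catching incomplete categorizations early.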
Rather than attempt to force a curriculum-wide application of the TEDEQ system
simultaneously, we chose to work the application into the existing framework of the exam
process. Therefore, categorization was required for each examination as it was generated
during the academic year. As a part of normal preparation and revision, application of
the TEDEQ categorization was added to the examination development process. A report
was generated using ExamSoft that served as a template for faculty members to review all
questions appearing on the exam. The report included the item analysis for each question
(if available from previous assessment records), and a section was provided for assignment
of the TEDEQ code to each question (Figure 1). The TEDEQ database was generated based
upon the codes assigned to each exam item. The database could be used retroactively,
since any question used on a previous assessment in ExamSoft would be recognized
and assigned TEDEQ categories. This allowed second year medical students to review
category performance on previous exams from their first year of medical school.
Figure 1 - Sample Question with TEDEQ Categorization
Figure 1 contains an example from the report faculty members use to categorize exam questions. It displays the item analysis of the question (from use on the previous year’s exam), the question text, the image (if any) associated with the question and the categories associated with each question.
Results
The TEDEQ method has been implemented successfully across the Med 1 and Med 2
IP curriculum for the 2010-2011 academic year. It has been well received by both faculty
and students, and we have successfully gathered data on all of our division assessments
using this system. While data collection is continuing, there have been some immediate
applications of the information provided by TEDEQ. As a feature of the ExamSoft software,
students can instantly view a breakdown report of their individual performance in every
category applied to exam items (Table 3).
Table 3 - Sample Exam Performance Breakdown
Table 3 is a sample of the TEDEQ report generated following each exam, which breaks down the performance of each student on that exam. The same report is also generated for faculty and staff; however, it also contains the breakdown of overall class performance for comparison purposes.
Students access these reports at any time post-exam by logging into a website,
and can view and compare their results from exam to exam throughout the course of the
year. Course leaders receive an identical report, but the results represent an aggregate for
the class as a whole rather than an individual student. An individual student's performance
data can also be readily plotted against the class average performance for any
coded category, which serves as an additional aid for students in evaluating their academic
success in the IP program (Figure 2). For example, a student would be able to evaluate
their performance on recall questions in comparison to clinical vignettes, or in particular
subject areas such as pharmacology. This information can be provided to the student for
each exam as shown in Figure 2, as well as a cumulative average across exams as the
student progresses through the curriculum. These individual student performance reports
(Table 3 and Figure 2) can be generated for faculty and staff review as necessary for
academic advising.
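The per-category breakdown behind reports like Table 3 reduces to a straightforward aggregation: for each category, score the student's items in that category and set the result beside the class average. The sketch below is our illustrative reconstruction, not the ExamSoft implementation, and the data are invented:

```python
# Minimal sketch of a per-category performance breakdown: given each
# item's TEDEQ subject tag and each student's responses, compute a
# student's percent correct per category alongside the class average.
# Data, names, and the "class" key are illustrative assumptions.

from collections import defaultdict

def category_breakdown(items, responses):
    """items: {item_id: subject}; responses: {student: {item_id: bool}}.
    Returns {subject: {student: pct_correct, "class": pct_correct}}."""
    report = defaultdict(dict)
    for subject in set(items.values()):
        ids = [i for i, s in items.items() if s == subject]
        all_marks = []
        for student, marks in responses.items():
            got = [marks[i] for i in ids]
            report[subject][student] = 100 * sum(got) / len(got)
            all_marks.extend(got)
        # class-wide average across every response in this subject
        report[subject]["class"] = 100 * sum(all_marks) / len(all_marks)
    return dict(report)

items = {"q1": "Pharmacology", "q2": "Pharmacology", "q3": "Gross Anatomy"}
responses = {"A": {"q1": True, "q2": False, "q3": True},
             "B": {"q1": True, "q2": True, "q3": False}}
print(category_breakdown(items, responses))
```

Running the same aggregation over multiple exams, keyed by the longitudinal subject tags, is what allows cumulative performance to be tracked across the curriculum.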
Figure 2 - Sample Student Performance Analysis
Figure 2 illustrates comparative student performance in four of the six TEDEQ categories to the performance of the whole class. This report is used by students and faculty to distinguish areas of strength and weakness in student performance.
Discussion
The TEDEQ method we developed for creating a linked database of information relating
exam items to curricular content was implemented with several specific applications in mind.
At the time of development, we had the ability neither to accurately monitor how
well examination content addressed instructional objectives nor to track student performance
on specific longitudinal foundational science subjects that cross blocks of the curriculum
and appear on multiple assessments. Therefore, the initial goal was to create a centralized
repository of information that allows for both improved quality control of assessments and
more detailed tracking of student performance in selected subject areas that span the
curriculum. The preclinical IP program is organized around organ systems, and integrates
normal structure/function with pathophysiology and clinical aspects of disease.
The TEDEQ coding system begins by identifying the temporal location and source of
each question within the curriculum. This is followed by two broad categories that define
general properties of each question with relation to cognitive skills and process that the
item is addressing. Each item is then assigned to a subject area corresponding to the
classification of topics used by the USMLE to both guide student preparation for the Step
1 licensure examination and break down student performance. The final level of granularity
in the tagging code indicates the most relevant sub-categories within each subject area
based upon curricular learning objectives. In many, but not all cases, this final level of
classification is directly related to the categories also defined by the USMLE for Step 1.
Utilizing a coding system that closely aligns with the USMLE Step 1 categories not only
allows for the collection of the necessary information to link internal learning objectives
with the assessment content, but also provides an opportunity to directly compare and
analyze student performance in discipline or subject specific components of the integrated
curriculum with performance on the Step 1 examination. This provides an opportunity
to identify potential areas of curricular content that excel in preparing students for the
licensure examinations, or those areas that may need attention and improvement in order
to better prepare the students.
The TEDEQ database provides a critical component necessary to blueprint the medical
curriculum, namely an exam blueprint,3,9 and the corresponding content validity.10,11
Another important aspect of aligning content between the curriculum and the examination
is that it provides greater relevance to the assessment.19 For example, the feedback report
that TEDEQ enables us to generate provides the student with immediate information relevant
to their success in the curriculum. In addition, these performance reports allow the
student learner to more easily visualize how their knowledge base builds upon itself as they
progress through the curriculum, especially in longitudinal subject areas, providing academic
relevance. Since the TEDEQ subject breakdown relates their current performance in
the curriculum to topics they will encounter in future licensure examinations, the report
provides an additional tool the student can use to gauge readiness and develop their study
plans for USMLE Step 1 preparation. Thus the current curricular examination gains future
relevance to the student. Together, feedback reports generated by TEDEQ contribute to
providing greater authentic relevance to the assessment process.19
The TEDEQ reports have already been used by the teaching faculty to identify selected
areas of curricular content in which students are not performing as well as expected on
assessments. For example, we have identified sub-categories within the gross anatomy
content that indicate students are having difficulty with specific anatomical regions.
Having identified these topics, anatomy faculty are designing additional e-learning
objects targeting these topics that will be available for the incoming class of students
to supplement the material currently presented in the gross anatomy component of the
curriculum.
In summary, the TEDEQ database provides a powerful tool for curricular management
that is easy for faculty to implement. The system was developed within the ExamSoft
software, which has been used to deliver computerized medical examinations at our
institution for the past two years. Information provided by the TEDEQ database allows for
exam blueprinting, which serves as a source of additional information necessary to meet
accreditation guidelines for curricular content management. The exam item breakdown
reports provide useful performance feedback to the students as well as faculty instructors.
The feedback information can be used to guide student remediation, student study habits,
and direct curricular modification. In the future, we plan to use the TEDEQ database to guide
the design and assembly of higher quality examinations within the preclinical medical
curriculum.
References
1. Wallach PM, Crespo LM, Holtzman KZ, Galbraith RM, Swanson DB. Use of a committee review process to improve the quality of course examinations. Adv Health Sci Educ 2006;11:61-8.
2. Jozefowicz RF, Koeppen BM, Case S, Galbraith R, Swanson D, Glew RH. The quality of in-house medical school examinations. Acad Med 2002;77:156-61.
3. Hamdy H. Blueprinting for the assessment of health care professionals. Clin Teach 2006;3:175-9.
4. Chandratilake MN, Davis MH, Ponnamperuma G. Evaluating and designing assessments for medical education: the utility formula. Internet J Med Educ 2010;1:1-9.
5. Denny JC, Smithers JD, Armstrong B, Spickard III A. “Where do we teach what?” Finding broad concepts in the medical school curriculum. J Gen Intern Med 2005;20:943-6.
6. Willett TG, Marshall KC, Broudo M, Clarke M. TIME as a generic index for outcome-based medical education. Med Teach 2007;29:655-9.
7. Willett TG, Marshall KC, Broudo M, Clarke M. It’s about TIME: a general-purpose taxonomy of subjects in medical education. Med Educ 2008;42:432-8.
8. Salas AA, Anderson MB, LaCourse L, Allen R, Candler CS, Cameron T, Lafferty D. CurrMIT: a tool for managing medical school curricula. Acad Med 2003;78:275-9.
9. Bridge PD, Musial J, Frank R, Roe T, Sawilowsky S. Measurement practices: methods for developing content-valid student examinations. Med Teach 2003;25:414-21.
10. McLaughlin K, Coderre S, Woloschuk W, Mandin H. Does blueprint publication affect students' perception of validity of the evaluation process? Adv Health Sci Educ 2005;10:15-22.
11. Coderre S, Woloschuk W, McLaughlin K. Twelve tips for blueprinting. Med Teach 2009;31:322-4.
12. Lynn MR. Determination and quantification of content validity. Nurs Res 1986;35: 382-5.
13. Yaghmale F. Content validity and its estimation. J Med Educ 2003;3:25-7.
14. Kies SM, Williams BD, Freund GC. Gender plays no role in student ability to perform on computer-based examinations. BMC Med Educ 2006;6:57.
15. Hall JR, Weitz FI. Question database and program for generation of examinations in national board of medical examiner format. Proc Annu Symp Comput Appl Med Care 1983;26:454-6.
16. Peterson MW, Gordon J, Elliott S, Kreiter C. Computer-based testing: initial report of extensive use in a medical school curriculum. Teach Learn Med 2004;16:51-9.
17. ExamSoft [http://examsoft.com/main/index.php?option=com_content&view= article&id=33&Itemid=7#NEWS12].
18. United States Medical Licensing Examination Website [http://www.usmle.org/ examinations/step1/2011step1.pdf].
19. D’Eon M, Crawford R. The elusive content of the medical-school curriculum: a method to the madness. Med Teach 2005;27:699-703.