Best Practices in Test Development and Validation
© 2007 Pedagogue Solutions. The ideas expressed in this presentation are the sole property of Pedagogue Solutions and may not be disseminated without its
express written permission.
© 2007 Pedagogue Solutions. All Rights Reserved.
A Little History
Testing Circa 1991
• Testing is a nice to have, not a need to have
• Testing is not part of our corporate culture
Fifteen Years Ago
• Tests were largely paper and pencil (if testing was done at all)
• Tests were not taken seriously by trainers (poorly written, poorly referenced, and perfunctory)
• Tests were not taken seriously by learners or management (no consequences)
• Tests were written by anyone
• Tests were an afterthought to the training curriculum
• Test data was not captured and analyzed
The Art and Science of Training
Today
The Art and Science of Training
Corporate Assessment
The eLearning Guild’s 2005 report on Metrics and Measurement provides the following data on how frequently each of the Kirkpatrick levels is actually used by organizations:

          Never   Rarely   Sometimes   Frequently   Always
Level 1   0%      1%       11%         25%          63%
Level 2   1%      7%       20%         38%          34%
Level 3   15%     25%      31%         23%          6%
Level 4   31%     36%      19%         10%          4%
Do you certify your representatives on their job required knowledge?
1. Yes
2. No
3. I think so
4. I don’t know
[Audience poll chart. Percentages shown: 59%, 0%, 5%, 36%]
Do you have positive or negative consequences related to test results?
1. Yes
2. No
3. I think so
4. I don’t know
[Audience poll chart. Percentages shown: 73%, 9%, 5%, 14%]
Today
• Tests are largely delivered on-line (though a surprising number of companies still use paper-and-pencil tests)
• Tests are not taken seriously by trainers (poorly written, poorly referenced, and perfunctory)
• Tests are not taken seriously by learners or management
• Tests are written by anyone
• Tests are an afterthought to the training curriculum
• Test data is not captured and analyzed
The Need for an Assessment Strategy
Why Do We Measure?
The Science of Training: Why Do We Evaluate?
To ensure competency
To improve the quality of the training
To justify training costs
Elements of an Assessment Strategy
• Governance and administration
• Legal issues
• Make expectations explicit and public
• Determine methods of testing
• Establish assessment frequency
• Assessment security
• Job competency analysis
• Define terms
• Create fair, valid and reliable assessments
• Determine cut (passing) scores
• Remediation and consequences
• Recertification
• Program evaluation/Item analysis
Common Terms
• Assessment
• Quiz
• Test
• Exam
• Evaluation
• Pretest
• Post-test
• Formative Assessment
• Summative Assessment
• Diagnostic Assessment
• Rubric
• Performance Assessment
• Self-assessment
• High Stakes Assessment
• Certification
Passing Scores Must be “Defensible”
and…
Knowledge must be re-certified periodically
Creating Fair, Valid and Reliable Assessments
Validity and Reliability
What is Validity?
Validity
• Construct Validity
• Face Validity
• Predictive Validity
• Content Validity
Construct Validity
Are you measuring what you think you are measuring?
Face Validity
Will your exam appear fair to the test takers?
Predictive Validity
A quantitative measure of how well a test predicts some form of measurable behavior.
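One common way to express predictive validity as a single number is the Pearson correlation between test scores and a later measure of behavior. The sketch below assumes that approach; the function name `pearson_r` and the sample scores and ratings are invented for illustration and do not come from the presentation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: exam scores and a later on-the-job performance rating.
exam_scores = [72, 85, 90, 65, 78]
job_ratings = [3.1, 4.0, 4.4, 2.8, 3.5]

r = pearson_r(exam_scores, job_ratings)
print(f"predictive validity coefficient r = {r:.2f}")
```

A coefficient near 1 would suggest that higher test scores go with better later performance; a coefficient near 0 would suggest the test predicts little.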
Reliability
• Consistency over time
• Consistency across forms
• Consistency among items
• Consistency among evaluators
A Model for Defensible Training and Measurement
The Science of Training
Developing Valid Content
Content Development
Learning Objectives
Learning Objectives
• The learning objectives drive the creation of the content
• Each objective should be concise
• Each objective should be “testable”
• Assessment questions test the learning objectives
Bloom’s Taxonomy
Examples of testing for:
Knowledge
– Choose the best definition for the term arthritis.
Comprehension
– Which of the following is an example of thrombosis?
Application
– You are in conversation with a physician about the safety data for your drug. Identify the critical supporting data points from the pivotal trial results.
Question Distribution
Objective/Content Area   # of items (percent)   Knowledge (Level 1)   Comprehension (Level 2)   Application (Level 3)
Anatomy/Physiology       30 (20%)               14                    12                        4
Disease State            30 (20%)               11                    14                        5
Market                   30 (20%)               10                    15                        5
Managed Care             30 (20%)               14                    10                        6
Selling Skills           30 (20%)               13                    11                        6
Totals                   150 (100%)             62 (~40%)             62 (~40%)                 26 (~20%)
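The row and column totals in the question distribution table can be checked with a short script. This is only a sketch: the item counts are copied from the table, while the `blueprint` dictionary and the `summarize` function are illustrative names, not part of the presentation.

```python
# Item counts per content area as (knowledge, comprehension, application),
# copied from the question distribution table.
blueprint = {
    "Anatomy/Physiology": (14, 12, 4),
    "Disease State": (11, 14, 5),
    "Market": (10, 15, 5),
    "Managed Care": (14, 10, 6),
    "Selling Skills": (13, 11, 6),
}

def summarize(blueprint):
    """Return (items per area, items per taxonomy level, total items)."""
    per_area = {area: sum(counts) for area, counts in blueprint.items()}
    per_level = tuple(sum(col) for col in zip(*blueprint.values()))
    return per_area, per_level, sum(per_area.values())

per_area, per_level, total = summarize(blueprint)
for area, n in per_area.items():
    print(f"{area}: {n} items ({n / total:.0%})")
for name, n in zip(("Knowledge", "Comprehension", "Application"), per_level):
    print(f"{name}: {n} items ({n / total:.1%})")
```

Running a check like this whenever items are added or retired helps keep the balance of taxonomy levels close to the intended split.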
The Science of Training: Measurement
Developing Valid Measurements
Knowledge-based Assessments: Four Keys to Developing Valid Questions
Questions must be properly constructed
Questions must be content-validated by placement within a structure of learning objectives
Questions must be written at the proper cognitive level by categorization within Bloom’s Taxonomy
Thorough post-hoc statistical evaluation must be performed
Question Development
Some Rules for Writing Valid Questions
Question 1
Cardiac tissue, composed of atrial muscle, ventricular muscle and specialized excitatory and conductive muscle fibers with variable contractile properties, can be correctly described as:
A. a collection of intercalated discs
B. a syncytium of many cells *
C. a gap junction structure
D. a conductive filament bundle
Rule 1: One piece of information per question
Which of the following describes the structural nature of cardiac muscle cells?
A. a collection of intercalated discs
B. a syncytium of many cells *
C. a gap junction structure
D. a conductive filament bundle
Question 2
What is the probable cause of myelosuppression in patients with CML therapy?
A. The product is toxic to most progenitor cells, which is why the drug is effective.
B. The product removes cells with the Lewes mutation, which make up an increasing fraction of progenitor cells as the disease advances. *
C. The product promotes apoptosis of rapidly dividing cells, which includes progenitor cells, erythrocytes, leukocytes, and thrombocytes.
D. The product inhibits cell division in neutrophils, which is a desired effect.
Rule 2: All information should appear in the stem
What is the probable cause of myelosuppression in patients with CML therapy?
A. The product is toxic to most progenitor cells.
B. The product removes cells with the Lewes mutation. *
C. The product promotes apoptosis of rapidly dividing cells.
D. The product inhibits cell division in neutrophils.
Question 3
Sympathetic nervous stimulation of the heart results in:
A. decreased ejection volume
B. decreased contractile force
C. increased heart rate *
D. the Frank-Starling mechanism
Rule 3: Choices should be parallel in format
Sympathetic nervous stimulation of the heart results in:
A. decreased ejection volume
B. decreased contractile force
C. increased heart rate *
D. increased reserve volume
Question 5
Which of the following waves, complexes and intervals can appear on an electrocardiogram:
A. P
B. QRS
C. QT
D. All of the above *
Rule 5: Avoid “All of the above”
Which of the following waves, complexes and intervals DOES NOT appear on an electrocardiogram:
A. P
B. QRS
C. QT
D. U *
Question 6
The cardiac cycle’s period of ejection is phase:
A. IV
B. II
C. I
D. III *
Rule 6: Arrange choices in logical order
The cardiac cycle’s period of ejection is phase:
A. I
B. II
C. III *
D. IV
Question 7
The function of the atrioventricular valves is to:
A. prevent backflow of blood from the ventricles to the atria during systole *
B. regulate pressure
C. ensure atrial filling
D. ensure atrial ejection
Rule 7: Correct choice and distractors should be of the same length
Atrioventricular valves are unlike the pulmonary valves in that they:
A. are exposed to more abrasion
B. snap shut rapidly at the end of systole
C. close by papillary muscle contraction
D. are subjected to much lower pressures *
Question 8
Which effects are correlated with a complete cytogenetic response (CCR)?
A. Increased risk of adverse events
B. Increased survival *
C. Increased risk of treatment resistance
D. Decreased probability of comorbidities
Rule 8: All choices should be in syntactic and semantic agreement with the stem
Which of the following is correlated with a complete cytogenetic response (CCR)?
A. Increased risk of adverse events
B. Increased survival *
C. Increased risk of treatment resistance
D. Decreased probability of comorbidities
Question 9
What member of the mental health team is the key prescriber of psychotropic medication?
A. Psychiatric nurse
B. Psychiatrist *
C. Occupational therapist
D. Receptionist
Rule 9: All choices must be plausible
What member of the mental health team is the key prescriber of psychotropic medication?
A. Psychiatric nurse
B. Psychiatrist *
C. Occupational therapist
D. Primary care physician
Summary: Rules for Creating Multiple Choice Questions
• One piece of information per question
• All information should appear in stem
• All choices should be parallel in form
• Do not use double negatives
• Avoid “all of the above”
• Arrange responses in logical order
• Correct choice and distractors should be of the same length
• All choices should be in syntactic and semantic agreement with the stem
• All choices must be plausible
Summary: Elements of Test Validity
Identify each course objective and determine if it is critical to the job
Determine that all questions properly test their respective learning objectives.
Make sure all questions follow the rules of valid question writing
Agree that the assessment can identify a learner who has mastered the content
Make sure you have a proper balance of Knowledge, Comprehension and Application questions
Document your decisions
Setting a Passing Score
Who sets your passing test score?
1. I do
2. Upper management
3. Training management
4. Therapeutic area
5. I haven’t a clue who sets it
[Audience poll chart. Percentages shown: 50%, 28%, 0%, 17%, 6%]
Setting Cut Scores: The Three Most Common Methods
The Higher Authority Method:
– “Our Vice President said it should be 90”
The Committee Method:
– “90 seems about right”
The Received Wisdom Method:
– “I don’t know how or when it got set, but it’s always been 90”
How you set the passing score depends on the type of test you are giving
Types of Tests
Broadly speaking there are two types of tests:
• Norm-referenced tests
• Criterion-referenced (or mastery) tests
Norm-referenced Tests
A norm-referenced test is one in which all scores are compared to a mean score
Criterion-referenced Tests
A criterion-referenced test is one in which scores are judged against a pre-set “mastery” level
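The difference between the two types of tests can be sketched in a few lines. This example assumes that "compared to a mean score" is expressed as a standardized distance from the group mean, which is one common choice rather than something the slides specify; the scores and the mastery level of 85 are hypothetical.

```python
from statistics import mean, pstdev

def norm_referenced(scores):
    """Express each score relative to the group mean, in standard deviations."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]

def criterion_referenced(scores, mastery_level):
    """Judge each score against a pre-set mastery level, ignoring the group."""
    return [s >= mastery_level for s in scores]

scores = [70, 80, 90]  # hypothetical percent-correct scores
print(norm_referenced(scores))
print(criterion_referenced(scores, mastery_level=85))
```

In the norm-referenced view a learner's result changes if the rest of the group does better or worse; in the criterion-referenced view it depends only on the learner's own score and the mastery level.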
What is your passing test score?
1. <80%
2. 80%
3. 85%
4. 90%
5. >90%
6. Varies from test to test
[Audience poll chart. Percentages shown: 0%, 47%, 0%, 0%, 26%, 26%]
Angoff Method
• Identify judges who are familiar with the competency covered by the test.
• For each item on the test each judge estimates the probability that a minimally competent person would get it right.
• Sum the probabilities of each judge
• Average the judges’ scores
Angoff Method: Example
Item      Judge 1   Judge 2   Judge 3
1         .75       .80       .85
2         .80       .90       1.00
3         .75       .75       .90
4         .90       .90       .80
5         .95       .75       .85
Total     4.15      4.10      4.40
Percent   83%       82%       88%

Averaging the totals for each judge: Cut Score = 84%
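The Angoff calculation in the example can be reproduced with a short script. This is a minimal sketch: the ratings are taken from the example table, and the names `ratings` and `angoff_cut_score` are illustrative only.

```python
# Each judge's estimated probability that a minimally competent person
# answers items 1-5 correctly (values from the example table).
ratings = {
    "Judge 1": [0.75, 0.80, 0.75, 0.90, 0.95],
    "Judge 2": [0.80, 0.90, 0.75, 0.90, 0.75],
    "Judge 3": [0.85, 1.00, 0.90, 0.80, 0.85],
}

def angoff_cut_score(ratings):
    """Average of the judges' summed probabilities, as a fraction of items."""
    n_items = len(next(iter(ratings.values())))
    if any(len(r) != n_items for r in ratings.values()):
        raise ValueError("every judge must rate every item")
    judge_fractions = [sum(r) / n_items for r in ratings.values()]
    return sum(judge_fractions) / len(judge_fractions)

cut = angoff_cut_score(ratings)
print(f"cut score = {cut:.0%}")
```

Keeping the judges' individual ratings, rather than only the final number, is what makes the resulting passing score documentable later.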
Remediation and Consequences
Do you have a formal system of remediation for students who fail a test?
1. Yes
2. No
3. Unsure
[Audience poll chart. Percentages shown: 33%, 11%, 56%]
Remediation
Must have a well-thought-out remediation plan
Should involve:
– Trainer(s)
– Manager(s)
Provide multiple, but a fixed number of, attempts to display mastery
Consequences
There must be consistent and increasing consequences for failure
At each “failure” you may involve higher levels of corporate management
Usually the final step is to involve HR
Recertification
Do You Retest Knowledge Periodically?
1. Yes
2. No
3. I think so
4. I don’t know
Ebbinghaus Curve of Forgetting
Re-certification
Re-certification applies to credentials that have a time limit.
It usually involves re-training and re-assessment.
Analyzing Results
Analyzing Results
• Point-biserial correlation
• Difficulty level
• Choice distribution
• Mean score for each answer choice
• Score by taxonomy level
• Score by learning objective/topic
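Two of these statistics, difficulty level and point-biserial correlation, can be sketched as follows. The sketch assumes the usual definitions: difficulty is the proportion of test takers who answered the item correctly, and the point-biserial is the correlation between the 0/1 item result and the total score (uncorrected, so the total here still includes the item itself). The response data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def difficulty(item_correct):
    """Difficulty level: proportion of test takers who answered correctly."""
    return sum(item_correct) / len(item_correct)

def point_biserial(item_correct, total_scores):
    """Correlation between a 0/1 item result and the total test score."""
    p = difficulty(item_correct)
    sd = pstdev(total_scores)
    if p in (0.0, 1.0) or sd == 0:
        raise ValueError("undefined when everyone scores the same")
    right = [t for c, t in zip(item_correct, total_scores) if c]
    wrong = [t for c, t in zip(item_correct, total_scores) if not c]
    return (mean(right) - mean(wrong)) / sd * sqrt(p * (1 - p))

# Hypothetical results for one item across six test takers.
item_correct = [1, 1, 1, 0, 1, 0]        # 1 = answered this item correctly
total_scores = [92, 88, 75, 60, 81, 70]  # total test score for each person

print(f"difficulty = {difficulty(item_correct):.2f}")
print(f"point-biserial = {point_biserial(item_correct, total_scores):.2f}")
```

A positive point-biserial means that people who did well on the whole test tended to get this item right; a value near zero or below flags an item worth reviewing.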
In Summary
Take the time to construct a good assessment
Take the time to validate your assessments
Take the time to set the passing score
Find reviewers and test the assessments
Set the right expectations for the learners
Analyze results and revise assessments as necessary
Questions?
Steven B. Just, Ed.D.
Pedagogue [email protected] x12