Meta-Analysis of Personnel Selection Tests & Overview of Situational Judgment Tests

Michael A. McDaniel, Virginia Commonwealth University, [email protected]
Deborah L. Whetzel, Human Resources Research Organization
Nhung Nguyen, Towson University, [email protected]

Prepared for: International Workshop on "Emerging Frameworks and Issues for S&T Recruitments"
Society for Reliability Engineering, Quality and Operations Management (SREQOM)
Delhi, India, September 2008
Overview
- Introduction: Dr. McDaniel
- Introduction: Dr. Whetzel
- Introduction: Dr. Nguyen
- Meta-analysis
- Meta-analysis results
- What are SJTs?
- Brief history of SJTs
- Item characteristics, response instructions, and item heterogeneity
- Steps in developing SJTs
- Scoring SJTs
Dr. McDaniel's Department
Department of Management, School of Business, Virginia Commonwealth University, Richmond, Virginia, 100 miles south of Washington, DC
Dr. McDaniel's Department
The PhD program in Management emphasizes organizational behavior and human resources.
The Center for the Advancement of Research Methods and Analysis (CARMA) is a non-profit unit of the School of Business at Virginia Commonwealth University (VCU):
- Established in 1997 by Dr. Larry Williams
- Has hosted over 60 events and 100 presentations on research methods topics for faculty and doctoral students worldwide
- Interdisciplinary focus, with emphasis on topics relevant to the social and organizational sciences (www.pubinfo.vcu.edu/carma/)
Dr. McDaniel's Research Theme
Applications of meta-analysis to examine the validity of personnel selection methods:
- Cognitive ability tests
- Interviews
- Reviews of training and experience
- Customer service tests
- Firefighter tests
- Short-term memory tests
- Job experience
- Job knowledge
- Situational judgment tests (SJTs)
Dr. Whetzel's Organization
Human Resources Research Organization (HumRRO), Alexandria, Virginia, 5 miles from Washington, DC
The Human Resources Research Organization (HumRRO)
- Independent non-profit research organization
- Established in 1951 as part of the U.S. Army; became independent in 1969
- Headquarters in Alexandria, VA
- Diverse staff: industrial/organizational psychologists, instructional designers, statisticians, management analysts, web programmers
- 100 professional staff, 20 support staff
- Strong history in selection, assessment, training, and evaluation
Deborah Whetzel
Experience in personnel selection research and development.
Areas of expertise include conducting job analyses, developing competency models, developing performance appraisal systems, and developing and validating assessment processes, including structured interviews and SJTs.
Dr. Nguyen's Department
Towson University, Towson, Maryland, 60 miles north of Washington, DC
Nhung Nguyen
Experience in personnel selection research and development.
Areas of expertise include:
- Situational judgment test research, including subgroup differences
- Wrote a monograph for the International Personnel Management Association Assessment Council (IPMAAC)
Meta-Analyses of Personnel Selection Tests
What is Meta-Analysis?
Meta-analysis is the quantitative combination of information from multiple empirical studies to produce an estimate of the overall magnitude of a relationship between an employment test and job performance.
Meta-analysis uses statistical procedures to determine the best estimate of the correlation between test and job performance.
Meta-Analysis
Although meta-analysis can be statistically complicated, conceptually it is simple.
In a meta-analysis of employment tests, one averages the correlations between the test and job performance across studies. The correlations are typically called "validity coefficients."
Studies are weighted so that studies with greater numbers of participants have more effect on the average (see the sketch below).
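To make this concrete, here is a minimal Python sketch of the sample-size-weighted average described above; the three studies and their values are hypothetical, not data from the presentation:

```python
# Hypothetical validity coefficients from three studies.
studies = [
    {"n": 68,  "r": 0.21},   # n = sample size, r = observed validity
    {"n": 120, "r": 0.34},
    {"n": 45,  "r": 0.12},
]

# Weight each study's correlation by its sample size, so larger
# studies have more effect on the average.
total_n = sum(s["n"] for s in studies)
mean_r = sum(s["n"] * s["r"] for s in studies) / total_n
print(f"Sample-size-weighted mean validity: {mean_r:.3f}")  # ~0.26
```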
Meta-Analysis
In addition to calculating the mean (average) validity, one also looks at the variability around the mean.
Some of the variability is due to random causes (sampling error).
Particularly with smaller sample studies, results will vary due to some samples being more representative of the population than other samples.
Meta-Analysis
The next few slides may be difficult for those without statistical training. Do not worry if you do not understand all of it.
We will get to the typical validities of different types of selection tests soon. These typical validities are what one needs to make informed decisions.
Meta-Analysis and Sampling Error
[Figure: sampling error plotted as a function of sample size.]
Meta-Analysis and Sampling Error
- The relationship between sampling error and sample size is asymptotic.
- Increasing sample size results in decreasing random sampling error.
- As sample size increases, one gets diminishing returns in the reduction of random sampling error (see the sketch below).
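The shape of this relationship follows from the standard approximation SE(r) ≈ (1 - r²)/√(N - 1). A small sketch, with an illustrative correlation of .25:

```python
import math

r = 0.25  # assumed population correlation (illustrative)
for n in (25, 100, 400, 1600):
    # Quadrupling N only halves the sampling error: diminishing returns.
    se = (1 - r**2) / math.sqrt(n - 1)
    print(f"N = {n:4d}: sampling error SD ~ {se:.3f}")
```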
Meta-Analysis and Sampling Error
Meta-analysis combines the results from many different studies.
Sampling error across studies tends to cancel out (sampling errors in one direction will be balanced by the sampling errors in the other direction).
The meta-analytic result gives a close approximation to the population value.
Meta-Analysis and Moderators
Some of the variability is likely due to the test working better for some jobs than others. These job effects are moderators of the correlation between the test and job performance.
- Extraversion is a better predictor for sales jobs than for most other jobs.
- Cognitive ability tests are good predictors for all jobs but work best for cognitively demanding jobs (e.g., S&T jobs).
Meta-Analysis
Meta-analysis tries to partition variability in the studies to better understand it.
[Figure: total variability partitioned into sampling error, artifactual variance in addition to sampling error, and moderator variance.]
Meta-Analysis
So far, we have briefly talked about random sampling error and moderators.
But the figure shows some variance called "artifactual variance in addition to sampling error." Differences across studies in measurement error and range restriction make up this additional artifactual variance.
Meta-Analysis: Measurement Error
All measures (e.g., job performance ratings) have some measurement error.
The more the measurement error, the lower the reliability of the measure.
Measurement error causes an observed effect size to underestimate its population parameter.
Differences across studies in measurement error cause variance.
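A minimal sketch of the classical correction for attenuation as it is commonly applied in selection research, where only the criterion side is corrected; the reliability and correlation values are illustrative:

```python
import math

r_observed = 0.30  # observed test-criterion correlation (illustrative)
ryy = 0.52         # reliability of supervisor ratings (illustrative)

# Disattenuation: the observed r underestimates the population value
# in proportion to the square root of the criterion reliability.
r_corrected = r_observed / math.sqrt(ryy)
print(f"Corrected for criterion unreliability: {r_corrected:.2f}")  # ~0.42
```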
Meta-Analysis: Range Restriction
The personnel selection literature has range restriction due to pre-selection of the sample: we seek to know the value of a predictor of performance, but we only have job performance data for the sample members who scored high on the predictor.
Range-restricted samples tend to underestimate the validity coefficient (see the sketch below).
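One common adjustment is Thorndike's Case II correction for direct range restriction. A minimal sketch with illustrative values:

```python
import math

r = 0.25  # validity observed in the range-restricted (incumbent) sample
u = 1.5   # SD of predictor among applicants / SD among incumbents

# Thorndike Case II: corrects r upward toward its unrestricted value.
r_corrected = (u * r) / math.sqrt(1 + (u**2 - 1) * r**2)
print(f"Corrected for range restriction: {r_corrected:.2f}")  # ~0.36
```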
Meta-Analysis and Artifactual Variance
Meta-analysis methods try to estimate the effects of measurement error and range restriction when determining the mean validity of an employment test, and the extent to which these artifacts account for variability in validities (see the sketch below).
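A minimal sketch of this kind of variance partitioning in the Hunter-Schmidt style, comparing the observed variance in validities against the variance expected from sampling error alone; all study values are hypothetical:

```python
# Hypothetical (sample size, observed r) pairs.
studies = [(68, 0.21), (120, 0.34), (45, 0.12), (90, 0.28), (200, 0.25)]

total_n = sum(n for n, _ in studies)
mean_r = sum(n * r for n, r in studies) / total_n

# N-weighted observed variance of the validities.
var_obs = sum(n * (r - mean_r) ** 2 for n, r in studies) / total_n

# Variance expected from sampling error alone.
n_bar = total_n / len(studies)
var_err = (1 - mean_r**2) ** 2 / (n_bar - 1)

print(f"Observed variance:       {var_obs:.5f}")
print(f"Sampling error variance: {var_err:.5f}")
print(f"Residual (moderators, other artifacts): {max(var_obs - var_err, 0.0):.5f}")
```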
Meta-Analysis: Publication Bias
A growing concern in the meta-analysis literature is whether the studies available for averaging are representative.
Some employment test publishers have been known to suppress studies that make their testing products look bad.
This causes the studies available to the reviewer to overestimate the validity of the test.
Meta-Analysis: Know what you are predicting
Another consideration is what measures of job performance are used.
For example, integrity test validity studies often use a self-report of employee theft as the measure of job performance. Far fewer studies correlate the tests with job performance as measured by more common criteria (e.g., supervisor ratings).
Meta-Analysis: Summary of Validity
Schmidt and Hunter (1998) summarized 85 years of research findings
The next table shows validity for predicting job performance (typically, supervisor ratings)
All validities are corrected for the downward bias due to measurement error and for the range restriction arising from the use of incumbent samples
A second table presents results for personality tests.
Meta-Analysis Results

Employment test (source)                                                 Validity (r)
General mental ability (Hunter, 1980)                                    .51
Interviews (McDaniel et al., 1994)                                       .38 unstructured; .51 structured
Job knowledge tests (Hunter & Hunter, 1984)                              .48
Training and experience, behavioral consistency (McDaniel et al., 1988a) .45
Assessment centers (Gaugler et al., 1987)                                .37
Biodata (Schmidt & Hunter, 1998)                                         .35
Situational judgment tests (McDaniel et al., 2007;
  no range restriction correction)                                       .26
Meta-Analysis Results for Personality

Personality test     Validity (r)
Conscientiousness    .20
Emotional stability  .13
Agreeableness        .11
Extraversion         .09
Openness             .06

From Hurtz & Donovan (2000)
Situational Judgment Tests
What Are SJTs?
An applicant is presented with a situation and several response options and is asked to evaluate the responses.
SJT items are typically in a multiple choice format.
Everyone in your work group has received a new computer except you. What is the best action to take?
A. Assume it was a mistake and speak to your supervisor.
B. Ask your supervisor why you are being treated unfairly.
C. Take a new computer from a co-worker’s desk.
D. Complain to human resources.
E. Quit.
Brief History
Judgment scale in the George Washington University Social Intelligence Test (Moss, 1926)
Used in World War II by psychologists working for the US military
Practical Judgment Test (Cardall, 1942)
Brief History continued
- How Supervise? (File & Remmers, 1948)
- Test of Supervisory Judgment (Richardson, Bellows & Henry, 1949)
- 1960s: SJTs were in use in the U.S. Civil Service System (Greenberg, 1963)
Brief History continued
- 1990s: Motowidlo reinvigorated interest in SJTs as "low fidelity" simulations (Motowidlo et al., 1990; Motowidlo & Tippins, 1993)
- 1990s: Sternberg's "tacit knowledge" tests (Sternberg et al., 1993, 1995; Wagner & Sternberg, 1991)
Brief History continued
Today, SJTs are used in many organizations, are promoted by various consulting firms, and are researched by many.
Brief History continued
Current popularity is based on assertions that SJTs:
- Have low adverse impact (subgroup differences)
- Have good acceptance by applicants
- Assess job-related knowledge or skills not readily tapped by other measures
Item Characteristics
Item stems can be distinguished along five characteristics:
- Fidelity
- Length
- Complexity
- Comprehensibility
- Nested and un-nested stems
Item Characteristics continued
Fidelity: the extent to which the format of the stem is consistent with how the situation would be encountered in a work setting.
- High fidelity: the situation is conveyed through a short video.
- Low fidelity: the situation is presented in written form.
Item Characteristics continued
Length:
- Some stems are very short (How Supervise?, File & Remmers, 1971).
- Other stems present very detailed descriptions of situations (Tacit Knowledge Inventory, Wagner & Sternberg, 1991).
Item Characteristics continued
Complexity: stems vary in the complexity of the situation presented.
- Low complexity: one has difficulty with a new assignment and needs instructions.
- High complexity: one has multiple supervisors who are not cooperating with each other and who are providing conflicting instructions concerning which of one's assignments has the highest priority.
Item Characteristics continued
Comprehensibility: it is more difficult to understand the meaning and import of some situations than others.
- Sacco, Schmidt & Rogg (2000) examined the comprehensibility of item stems using a reading formula.
Item Characteristics continued
Length, complexity, and comprehensibility of the stems are interrelated and probably drive the cognitive loading of the items.
Item Characteristics continued
Nested stems:
- Some situational judgment tests (Clevenger & Halland, 2000; Parker, Golden & Redmond, 2000) present an overall situation followed by subordinate situations.
- Subordinate stems are the stems linked to the responses.
Item Characteristics: Nature of Responses
Unlike item stems, which vary widely in format, item responses are usually presented in written form and are relatively short.
Even SJTs that use video to present the situation often present the responses in written form, sometimes accompanied by an audio presentation.
Item Characteristics: Response Instructions
The various item instructions can be described in a two-dimensional taxonomy:
(1) Behavioral tendency vs. knowledge: how do you typically behave vs. what is the most effective response
(2) Number of scorable responses
Item Characteristics: Response Instructions

Behavioral tendency instructions:
- One scorable response: What would you most likely do?
- Two scorable responses: What would you most likely do? What would you least likely do?
- As many scorable responses as response options: Rate each response for the likelihood you would perform the response, or rank the responses from the most likely to the least likely.

Knowledge instructions:
- One scorable response: Pick the best answer. What should you do?
- Two scorable responses: Pick the best answer and pick the worst answer, or pick the best and second best.
- As many scorable responses as response options: Rate each response for effectiveness, or rank the responses from the best to the worst.
Item Heterogeneity
Item Heterogeneity
SJT items tend to be construct heterogeneous at the item level. This means that they measure many things. They are typically correlated with one or more of the following:
- Cognitive ability
- Agreeableness
- Conscientiousness
- Emotional stability
Scenario A
You assigned a very high profile project to one of your project managers. During each of the project update meetings, your project manager indicates that everything is going as scheduled. Now, one week before the project is due, your project manager informs you that the project is less than 50% complete.

Correlations of each response with g (n = 448-450), Conscientiousness (C; n = 1196-1222), and Agreeableness (A; n = 1196-1222):
- Personally take over the project and meet with the customer to determine critical requirements: g = .10*, C = .01, A = -.13*
- Meet with the customer to extend the deadline. Talk with the project manager about how the lack of communication has jeopardized the company's relationship with the customer: g = .11*, C = -.03, A = -.05
- Fire the project manager and take over the project yourself: g = .08, C = .00, A = -.16*
- Coach the project manager on how to handle the project more efficiently: g = -.17*, C = .01, A = .09
- Do not assign any high profile jobs to this project manager in the future: g = .13*, C = .07, A = -.08
Scenario B
You lead a project that requires specific, accurate data to make decisions. The data-capturing method currently being used does not provide you with the information you need. Another department promised to provide you with the information, but failed to do so at the last minute. This setback delayed your project and you are certain that you will require the information to complete your project accurately.

Correlations of each response with g (n = 448-450), Conscientiousness (C; n = 1196-1222), and Agreeableness (A; n = 1196-1222):
- Do the time-consuming work yourself even though it is not technically your responsibility: g = .07, C = .11*, A = -.08
- Temporarily allocate some members of your team to capture the data: g = -.01, C = .11*, A = .00
- Ask the customer for a deadline extension and explain that the other department failed to provide the necessary information: g = .12*, C = .06, A = -.02
- Ask your manager to pressure the other department to deliver the information: g = .17*, C = .02, A = -.10*
Degree of item heterogeneity
Implications of item heterogeneity:
- Difficult to get interpretable factor analyses
- Difficult to build subscales that show discriminant validity
- A set of items intended to measure conscientiousness will likely be correlated with conscientiousness, but also with cognitive ability, agreeableness, and emotional stability
Degree of item heterogeneity
More implications of item heterogeneity:
- Difficult to specify the constructs assessed by the test
- Difficult to defend the use of internal consistency as a reliability estimate; best to use test-retest reliability
Degree of item heterogeneity
Probably best to think of SJTs as a measurement method in which you can and typically do measure multiple constructs.
Overview of SJT Test Development
Overview of SJT Test Development
- Identify a job or job class for which an SJT is to be developed
- Write critical incidents
- Sort critical incidents
- Turn selected critical incidents into item stems
- Generate item responses
- Edit item responses
- Determine response instructions
- Develop a scoring key
Development Issues
Identify a job or job class
Get clarification on the job(s) for which the SJT is intended.
Determine if supervisor jobs are included in a job “class” and if separate supervisor items are needed.
Development Issues
Critical Incidents
Motowidlo et al. (1990, 1997) recommended having subject matter experts write critical incidents to generate stems, and using additional subject matter experts to generate responses.
- A subject matter expert is someone who is very knowledgeable about the job (e.g., an incumbent or supervisor).
- Some test authors just write items.
Development Issues
Critical Incidents
We recommend collecting critical incidents from subject matter experts: it is unlikely that an item writer can come up with the richness and breadth of scenarios that can be generated by a group of subject matter experts writing critical incidents.
Development Issues
Critical Incident Workshops: Critical Incident Form
1. What was the situation leading up to the event? [Describe the context.]
2. What did the employee do?
3. What was the outcome or result of the employee's action?
4. What competency category is most relevant for this incident?
5. Circle the number below that best reflects the level of performance that this event exemplifies.
   1   2   3   4   5   6   7
   Low                     High
Development Issues
Critical Incident Workshops: Example of a Completed Critical Incident Form
1. What was the situation leading up to the event? [Describe the context.] I was the lead scientist on a complex project that had a very short timeline. My team and I identified some creative solutions to an ongoing problem addressed by the project. When the project was nearly completed, my supervisor told me that someone in another agency was assigned the same project and it was completed last week.
2. What did the employee do? Although I was disappointed, I asked if our creative solutions could be used by the other team. I explained the issue to my co-workers and asked them to stop work. I then asked my supervisor for the next project.
3. What was the outcome or result of the employee's action? My supervisor appreciated my flexibility and willingness to provide our solutions to the other team and thanked me for my willingness to shift focus toward a different project.
4. What competency category is most relevant for this incident? Adapting to change
5. Circle the number below that best reflects the level of performance that this event exemplifies. (1 = Low, 7 = High)
Development Issues
Critical Incident Workshops
Provide plenty of room, privacy, and anonymity:
- Critical incidents are often embarrassing to someone ("My boss did this stupid thing…").
- Anonymity permits these critical incidents to be offered.
Raise the comfort level:
- Spelling is not important.
- We are interested in the story, not the quality of the writing.
Development Issues
Critical Incident Workshops
Prompts for generating critical incidents:
- Think about a time when someone did a really good job.
- Think about a time when someone could have done something differently.
- Think of a recent work challenge you faced and how you handled it.
- Think of something you did in the past of which you were proud.
Development Issues
Critical Incident Workshops
- Give individual feedback on initial critical incidents; reinforce productivity.
- Consider laptops: many people are more comfortable typing for three hours than writing with a pen.
Development Issues
Critical Incident Workshops
Conduct at least two critical incident workshops:
- In the first workshop, let participants write about whatever they want.
- In the following workshops, direct them away from topics that have already been covered and toward topics that need better coverage.
Development Issues
Sort Critical Incidents
- Sort incidents into categories based on similarity of content.
- Provide a dimension name for each category.
- Have others sort the incidents using the dimension names to determine agreement (retranslation).
- Typical content piles are listed on the next page.
Development Issues
Sort Critical Incidents: Typical Content Piles
- Too much work
- Unpleasant work
- Changing work
- New procedures are bad
- Challenging work
- Work that is not usually part of your job
- Problematic boss
- Problematic co-workers
- Problematic subordinates
- Problematic upper management
- Problematic other departments/vendors
- Problematic customers
Development Issues
Sort Critical Incidents
Goals of sorting:
- Identify dimensions for which item stems will be written
- Identify duplicate or near-duplicate critical incidents
- Check for gaps in coverage
Development Issues
Sort Critical Incidents: Goals continued
Identify content that is inappropriate for items (content that you do not want to share with job applicants). For example:
- Ethnic discrimination
- Workplace violence
- Topics that are sources of conflict within the organization (crashing stock price, unpopular new policy)
Development Issues
Sort Critical Incidents
Developing item stems from critical incidents is the next step. This is labor intensive.
If you will ultimately drop a stem due to its content, make that decision now so you do not waste time turning the critical incident into a stem.
Development Issues
Turn Critical Incidents into Item Stems
- Write item stems using the first part of each critical incident.
- The stem needs to be appropriate and job-related for all jobs covered by the SJT.
Development Issues
Example stem from the critical incident described above:
You are the lead scientist on a complex project that had a very short timeline. You and your team have identified some creative solutions to an ongoing problem addressed by the project. When the project was nearly completed, your supervisor told you that someone in another agency was assigned the same project and it was completed last week.
Development Issues
Turn Critical Incidents into Item Stems
For technical jobs, a critical incident may concern difficulty learning a new software package for inventory control.
- If not all jobs require the use of this software, make the stem refer to "new software for your job."
- If not all jobs involve software, make the stem refer to "difficulty in learning a new work procedure."
Development Issues
Turn Critical Incidents into Item Stems
- Stems need to be edited for clarity and brevity. Stems with ambiguous meanings will result in disagreement concerning the effectiveness of the responses.
- Standardize the use of terms (boss vs. supervisor, co-worker vs. team member, etc.). Making these decisions early will reduce editing time.
Development Issues
Generate item responses
- Assemble a survey of item stems with space for respondents to write potential responses to each stem.
- The critical incident from which a stem was developed probably contained one response to the situation.
Development Issues
Generate item responses
Have multiple subject matter experts write additional responses for each stem.
Prompts for writing responses:
- What would you do?
- What is the best thing to do?
- What is a bad response that you think many people would do?
- Think of a good/poor employee. What would he/she do?
Development Issues
Generate item responses
You are the lead scientist on a complex project that had a very short timeline. You and your team have identified some creative solutions to an ongoing problem addressed by the project. When the project was nearly completed, your supervisor told you that someone in another agency was assigned the same project and it was completed last week.
A.
B.
C.
D.
Development Issues
Generate item responses
Use multiple subject matter experts working independently to get the maximum number of non-redundant responses.
- A given subject matter expert will often be able to generate only 2-3 non-redundant responses.
- A group of subject matter experts working independently can usually generate between 5 and 8 non-redundant responses.
Development Issues
Edit item responses
- Many of the item responses will be redundant.
- You might permit some redundancy in responses to convey a nuance: "Confront your boss about X and …" vs. "Assume X was a mistake and speak with your boss …"
Development Issues
Generate item responses
Screen out responses that will have little variance. These will primarily be very inappropriate responses that no applicant will say they find effective:
- "Tell your boss you think his/her idea was stupid."
Development Issues
Determine Item Response Instructions
- One now has a set of items, each with multiple responses.
- The next step is to determine the response instructions for the test.
- Response instructions tell the respondent how to evaluate the item responses.
- The choices are knowledge instructions or behavioral tendency instructions.
Development Issues
Determine Item Response Instructions
Knowledge instructions ask for the "best" answer and are thus assessments of knowledge of the appropriateness of responses:
- Pick the best response.
- Pick the best response and then the worst response.
- Rate the responses on effectiveness.
Development Issues
Determine Item Response Instructions
Behavioral tendency instructions ask for the applicant's likely behavior:
- What would you most likely do?
- What would you most likely do and what would you least likely do?
- Rate each response on how likely you would be to perform it.
Development Issues
Determine Item Response Instructions
As noted earlier, whether one uses knowledge or behavioral tendency instructions has important implications for:
- Applicant faking
- Magnitude of cognitive and non-cognitive correlates
- Criterion-related validity
- Magnitude of mean ethnic differences
Development Issues
Response Instructions and Faking
- Applicants may recognize that what they would most likely do (behavioral tendency) is not the most effective response.
- Some applicants may choose to misrepresent their behavioral tendency.
- Example: McDaniel keeps a messy desk, but he will report that he would keep his desk clean and tidy.
Development Issues
Response Instructions and Faking
Nguyen, Biderman & McDaniel (2005) showed that it is more difficult to intentionally fake a knowledge item than a behavioral tendency item.
By way of metaphor, compare a personality item (behavioral tendency) to a math item (knowledge).
Behavioral tendency item: How dependable are you?
Knowledge item: What is the cube root of 46,656?
Development Issues
Response Instructions and Construct Validity (correlations with other tests)
- SJTs with knowledge instructions tend to be more correlated with cognitive ability and less correlated with non-cognitive traits.
- SJTs with behavioral tendency instructions tend to be more correlated with non-cognitive traits and less correlated with cognitive ability.
Development Issues
Response Instructions and Construct Validity

Correlate            Knowledge instructions   Behavioral tendency instructions
Cognitive ability    .35                      .19
Conscientiousness    .24                      .34
Agreeableness        .19                      .37
Emotional stability  .12                      .35

From McDaniel, Hartman, Whetzel, & Grubb (2007)
Development Issues
Response Instructions and Correlations with Job Performance
Meta-analysis results:
- Knowledge instructions: ρ = .26 (k = 96)
- Behavioral tendency instructions: ρ = .26 (k = 22)
From McDaniel, Hartman, Whetzel, & Grubb (2007)
Development Issues
Response Instructions and Mean White-Black Differences in SJTs
- Mean differences are somewhat larger for knowledge-instruction SJTs than for behavioral tendency-instruction SJTs.
- Almost all data are for written-presentation (non-video) SJTs.
Development Issues
Response Instructions and Mean White-Black Differences in SJTs

Distribution                   k    N       d
All effect sizes               62   42,178  .38
Written, knowledge             45   36,348  .39
Written, behavioral tendency   17   5,830   .34

From Whetzel, McDaniel, & Nguyen (2008)
Development Issues
Response Instructions and Mean White-Black Differences in SJTs
The correlation of the SJT with cognitive ability explains almost all of the differences across studies in mean white-black differences.
Development Issues
Response Instructions and Mean Ethnic Differences in SJTs
Some employers may want to use a video presentation format or a behavioral tendency response format to reduce mean ethnic differences.
This may help, but even these options are driven somewhat by the cognitive loading of the test, and video-based SJTs can be expensive to develop.
Development Issues
Response Instructions and Mean Ethnic Differences in SJTs
Readily comprehensible and simple situations may have lower cognitive load regardless of presentation format or response instructions.
But will simple items meet your assessment needs?
Development Issues
Mean Sex Differences in SJTs
- Females score slightly higher than males (d = .11) on average.
- The effect is not related to cognitive loading and is not moderated by response instructions.
- There are some mean sex differences in conscientiousness and agreeableness favoring females.
Development Issues
Caveat on response instruction differences
- Most published data on SJTs are based on employee samples (less likely to fake).
- Applicants are more likely to respond to behavioral tendency instructions as if they were knowledge instructions: they will try to give the best answer when asked for their behavioral tendency.
- Thus, more research using applicant samples is needed.
Overview of SJT Scoring
Development Issues
Scoring
One needs to determine what the right answer is to build a scoring key. The options are:
- Rational keys
- Empirical keys
- Hybrid keys
Development Issues
Scoring with Rational Keys
Rational keys:
- SJTs are often keyed based on expert judgment.
- Reject item responses with low inter-rater agreement.
Development Issues
Scoring with Rational Keys
Data-assisted expert keying: collect effectiveness data and make the means, standard deviations, and frequencies of the ratings available to the experts who decide the key.
Development Issues
Scoring with Rational Keys
Data-assisted keying without experts:
- Collect effectiveness data and use the means to make the key.
- Drop response options with high standard deviations.
Development Issues
Scoring with Empirical Keys
Any empirical keying approach for biodata is applicable to SJTs: score items according to their relationship with a criterion variable (see the sketch below).
See Hogan (1994) for a good review of empirical keying procedures.
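A minimal sketch of the empirical keying idea: retain or weight a response option according to its correlation with the criterion. The calibration data here are hypothetical:

```python
import statistics

def pearson_r(x, y):
    """Plain Pearson correlation (no external dependencies)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical calibration data: whether each of 8 incumbents endorsed
# one response option, and their job performance ratings.
endorsed    = [1, 0, 1, 1, 0, 0, 1, 0]
performance = [4, 2, 5, 4, 3, 2, 4, 3]

# An option correlating positively with performance would be keyed
# positively; options near zero would be dropped or left unkeyed.
print(f"Option-criterion r: {pearson_r(endorsed, performance):.2f}")
```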
Development Issues
Scoring with Hybrid Keys
A hybrid key is some mix of rational and empirical keying. For example, you might key empirically but only retain the keyed option if it makes sense.
Development Issues
Scoring Issues
If one uses a Likert rating scale to record responses and uses a rational keying method, what do you do with the responses rated as average?
Suggestion: give the Likert scales an even number of response categories (4 or 6). This way, all response options are forced to be rated either effective or ineffective (or likely to be performed or unlikely to be performed).
Development Issues
Scoring Issues
A Likert scale often uses adjectives: very effective, effective, ineffective, very ineffective.
From a litigation point of view, it makes some people uneasy to try to defend the difference between "very effective" and "effective": your "very effective" might mean the same as my "effective."
Development Issues
Scoring Issues
For the purpose of rational keying, one might consider "very effective" and "effective" to be identical responses. Thus, one could score the item as dichotomous (see the sketch below).
If the scoring key indicates that the response is a good thing to do, a respondent providing a rating of "very effective" or "effective" gets a point; other ratings get zero.
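A minimal sketch of that dichotomous scoring rule, assuming a 4-point scale where 3 = effective and 4 = very effective; the key and ratings are hypothetical:

```python
# Hypothetical rational key: True means the option is keyed "effective".
key_effective = {"A": True, "B": False, "C": True}

def score_item(ratings):
    """ratings: response option -> Likert rating on a 1-4 scale."""
    return sum(
        1
        for option, rating in ratings.items()
        if key_effective[option] and rating >= 3  # effective or very effective
    )

print(score_item({"A": 4, "B": 1, "C": 2}))  # -> 1 (only A earns a point)
```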
Development Issues
Scoring Issues
Many applications of SJTs use discrete points assigned to response options:
- Very effective = 1
- Effective = 1
- Ineffective = 0
- Very ineffective = 0
Development Issues
Scoring Issues
Legree et al. (2005) discussed using mean effectiveness ratings as the correct answer and scoring responses as deviations from the mean (see the sketch below):
- If the mean is 1.5, respondents who provided a rating of 1 or 2 would both receive -.5 as the score on the item.
- Zero is the highest possible score.
- Scores are often inverted to make favorable scores positive.
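A minimal sketch of this consensus-based deviation scoring; the consensus means and the applicant's ratings are hypothetical:

```python
# Mean effectiveness ratings (the consensus key) for one item's options,
# and one applicant's own ratings of the same options.
consensus = [1.5, 3.2, 2.8, 4.1]
applicant = [2,   3,   2,   4]

# Score = negative sum of absolute deviations from the consensus means,
# so zero is the highest possible score.
item_score = -sum(abs(a - c) for a, c in zip(applicant, consensus))
print(f"Item score: {item_score:.1f}")  # -1.6; often inverted afterward
```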
Development Issues
Scoring Issues
Some research shows that mean ratings given by experts are the same as those given by novices; the novices simply have greater standard deviations.
Development Issues
Scoring Issues
Differences across applicants in how they use the rating scale create variance that might not be relevant to predicting the criterion.
Consider standardizing the item responses within person (a within-person z transformation).
Development Issues
Scoring Issues
The within-person standardization makes the mean and standard deviation of all the item responses the same for every applicant (see the sketch below).
This within-person standardization often improves item validity substantially.
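A minimal sketch of the within-person z transformation; the two hypothetical applicants differ only in how they use the scale, so their standardized scores come out identical:

```python
import statistics

def within_person_z(ratings):
    """Standardize one applicant's ratings across all response options,
    removing differences in how applicants use the Likert scale."""
    mean = statistics.mean(ratings)
    sd = statistics.pstdev(ratings)
    if sd == 0:  # applicant gave every option the same rating
        return [0.0] * len(ratings)
    return [(r - mean) / sd for r in ratings]

lenient = [4, 5, 4, 5, 3]  # a rater who uses the top of the scale
harsh   = [2, 3, 2, 3, 1]  # same pattern, shifted down two points
print(within_person_z(lenient))
print(within_person_z(harsh))  # identical to the lenient rater's z-scores
```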
Development Issues
Scoring Issues
A few slides ago, we suggested dropping an item response when the experts could not agree on its effectiveness.
One could also consider dropping the item responses for which the applicants show large variance in their ratings.
Development Issues
Scoring Issues
When Likert scales are used to rate each response option, the response options with large variances typically have mean ratings in the middle of the Likert scale.
Thus, a mid-range mean rating may indicate an item response about whose effectiveness there is much disagreement.
Development Issues
Scoring Issues
The response options with high or low mean Likert ratings often have the highest mean item validities.
Development Issues
Scoring Issues
Although the low and high mean items will likely have the highest validity, the mid-mean items may have some validity.
If one wants to shorten the test for future administrations, consider dropping the mid-mean items.
Development Issues
Scoring Issues
Incumbent vs. applicant differences:
- Incumbents are typically the experts for keying.
- If a company policy guides an action, incumbents will rate behaviors consistent with the policy as effective.
- High-quality applicants might respond differently because they do not know the policy (e.g., in a call center).
Development Issues
Scoring Issues
Cultural differences are possible with keys:
- Americans will often question their supervisor's judgment openly; other cultures do not typically question their supervisor's judgments.
- Example stem: You believe your boss has overlooked a key fact when making a decision…
Summary of SJT Development and Scoring
- SJTs are best developed using critical incidents.
- Carefully edit the critical incidents into item stems.
- Use several subject matter experts to suggest item responses.
- Drop response options when subject matter experts cannot agree.
Summary of SJT Development and Scoring
- Have the applicants rate each response on a Likert scale of effectiveness (a knowledge instruction).
- Use a within-person standardization on the applicant ratings; this tends to result in higher validities.
- To shorten SJTs for future administrations, drop the items with mid-range means.
References
Provided in book chapter.
Thank you.
Questions??