TRANSCRIPT
Human Resources Research Organization (HumRRO) | 66 Canal Center Plaza, Suite 700, Alexandria, Virginia 22314-1578 | Phone: 703.549.3611 | Fax: 703.549.9661 | www.humrro.org
Transitioning from Traditional ACs to Automated Simulations: Insights for Practice and Science
Presented at:
ACSG 2014 Stellenbosch, South Africa
Presenter: Dan J. Putka Alexandria, VA, U.S.A.
March 14, 2014
● Advances in technology are creating new and exciting opportunities for automating traditional AC assessments, but…
● Assessment practice is having trouble keeping pace with technological advancement
  – Constantly evolving technology + demand for the latest/greatest + limited budgets/time
● Psychometrics is having trouble keeping pace with technological advancement and assessment practice
  – Is the “technology train” taking us to a station without a psychometric platform?
Balancing promise and peril to create cutting-edge, yet scientifically sound assessments
Introduction
2
● Practice is getting ahead of the AC research literature
  – Research continues to be thin (Gibbons, 2013)
  – Long publication cycles + data access issues
In sum…
3
[Diagram: the intersection of Technology, Assessment Practice, and AC Research & Psychometrics]
● Helping a large U.S. federal government agency transition a long-standing, nationwide AC programme to an automated, simulation-driven assessment programme
  – History of the AC programme
  – Why change?
  – Technology, meet tradition
  – Making the transition
  – Challenges and opportunities
  – The road ahead
Today’s presentation
4
History of the AC Programme
Organizational context
● U.S. Bureau of Alcohol, Tobacco, Firearms, and Explosives (ATF)
  – Federal law enforcement agency within the U.S. Department of Justice
  – Dual responsibilities for enforcing Federal criminal laws and regulating the firearms and explosives industries
  – Offices located in cities throughout the U.S.
6
Context for the original AC programme
● 1996 settlement agreement required ATF to change its human resource practices for special agents
● Develop AC programmes for promotion decisions at two levels of leadership within ATF
  – 1st and 2nd level supervisors
● Court-appointed oversight committee overseeing custom development of the AC programmes
7
Programme scope
● 1st level supervisor AC
  – 2002: N = 170; 2003: N = 100; 2006: N = 275; 2008: N = 347
● 2nd level supervisor AC
  – 2003: N = 71; 2004: N = 41; 2005: N = 13; 2007: N = 85; 2009: N = 73
Note. N = number of candidates evaluated at the AC in the given year.
8
Dimension × exercise map
Exercises: Analysis Exercise, In-Basket, Role Play, Past Behavior Interview
● Technical/Procedural Knowledges
  – Knowledge of Relevant Laws, Regs, Policies (measured in 2 of the 4 exercises)
  – General Investigative Knowledge (3 of 4)
  – Knowledge of Administrative Procedures (1 of 4)
● Management/Administrative
  – Resources Management (2 of 4)
  – Judgment & Problem Solving (all 4)
  – Decisiveness (2 of 4)
  – Plan, Organize, Prioritize (all 4)
● Influence/Interpersonal
  – Communicate Orally (all 4)
  – Relate to Others (2 of 4)
  – Lead Others (2 of 4)
9
Comparison to other ACs
10
Characteristic | 2008 ATF AC | Woehr & Arthur (2003) studies
● Number of Assessments: 4 | M = 4.78, SD = 1.47
● Number of Dimensions: 10 | M = 10, SD = 5.11
● Candidate-to-Assessor Ratio: 1-to-2 (.50) | M = 1.71
● Rating Approach: Within-exercise | 63% of those reporting used a within-exercise approach
● Assessor Occupation: Managers/supervisors | 83% of those reporting used managers/supervisors
● Length of Assessor Training: 4 days (32 hours) | M = 3.35 days, SD = 3.06
● AC Purpose: Selection/promotion | 81% of those reporting were for selection/promotion
Summary of original AC
● Highly rigorous, custom development process grounded in the SIOP Principles for developing content-valid assessments and the Guidelines and Ethical Considerations for Assessment Center Operations
● Legal challenges drastically reduced
● Overall, viewed as a clear improvement
● Leveled the playing field for all candidates
11
Why Change?
$$$$$$$
● ATF was looking for alternatives that would retain as many benefits of a traditional AC as possible, but that would not involve human assessors
● The cost of “manager” assessors
  – Flights: 30+ managers flying in for the AC from all over the U.S.
  – Hotels, per diem, labor costs
    • Two-week AC + 4 days of pre-AC training
  – Productivity losses: managers away from their jobs for two+ weeks
● Flights, hotels, per diem, and labor costs for 300+ internal candidates!
13
Technology, Meet Tradition
Pros & Cons: Financial/business perspective (Technology-Enhanced Sims vs. Traditional AC Exercises)
● Lower long-term costs (no assessors) vs. higher long-term costs (assessors)
● Higher short-term costs (tech development/coding/testing) vs. lower short-term costs (little or no technology)
● Ease of administration (lower logistics burden) vs. difficulty of administration (greater logistics burden)
● Reduced testing time (e.g., 4.5 hrs) vs. longer testing time (e.g., 9 hrs)
● Lower long-term costs + lower admin burden allow the benefits of ACs to be pushed to more levels of leadership vs. higher long-term costs + higher admin burden tend to limit ACs to the highest levels of leadership
15
Pros & Cons: Psychometric perspective (Technology-Enhanced Sims vs. Traditional AC Exercises)
● Lower fidelity with on-the-job behavior (closed-ended response formats) vs. higher fidelity with on-the-job behavior (free-response formats)
● Difficulty measuring constructs best judged by actual behavioral observation (e.g., oral communication)* vs. well suited for measuring constructs best judged by actual behavioral observation
● More difficult validation strategy (criterion- and content-focused) vs. relatively easier validation strategy (content-focused)
● Potential for fully standardized assessment vs. potential lack of standardization across assessors/role players
● More objective scoring vs. more subjective scoring (perceived)
● Complex scoring and measurement issues to confront vs. relatively simple scoring and measurement issues
● Thin research literature (far less precedent for best practice) vs. deep research literature (much precedent for best practice)
16
Making the Transition
Overview of the EPAS
● Electronic Promotion Assessment System (EPAS)
  – Suite of three assessments for promoting ATF Special Agents to 1st line supervisor positions throughout the U.S.
  – Delivered online to 623 candidates at eight proctored test sites throughout the U.S. in the fall of 2012
    • Situational Judgment Test
    • Office Simulation
    • Virtual Role Play
  – The EPAS was custom developed by HumRRO (prime contractor) and ClicFlic (subcontractor) in partnership with ATF
18
EPAS development
● 7 months start to finish!
  – Initial assessment and scoring development
    • Working from previously developed “paper” in-basket and role play exercises and detailed job analysis data
    • Multiple SME workshops
  – Audio-animation production
  – Coding
  – Beta testing and quality control
  – Pilot testing and refinement
  – Criterion development
  – Concurrent, criterion-related validation study
  – Implementation
  – Sleep, lots of sleep
19
EPAS assessments
● Situational Judgment Test (SJT)
  – Simple progression through a series of animated scenarios
  – Closed-ended response format: “rate the effectiveness of each response” (a hypothetical scoring sketch follows this slide)
● Office Simulation (OS)
  – Variation on a traditional AC in-basket
  – Simple progression through a series of animated “in-basket” items
    • E-mails with attachments
    • Phone calls
    • Office visits from “virtual” employees
  – Limited branching within items
  – Wide variety of response types designed to mimic job behavior
● Virtual Role Play (VRP)
  – Highly interactive: a lot of conditional branching
  – Relatively fewer response types than the OS, but still varied
20
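The deck notes only that the SJT uses a closed-ended “rate the effectiveness of each response” format; it does not say how those ratings are keyed. The sketch below is a hypothetical illustration of one common scoring approach for such items, distance from an SME consensus key, and every option name, rating, and the keying rule are assumptions made up for illustration, not the EPAS method.

```python
# Hypothetical illustration of scoring a closed-ended SJT item in which the
# candidate rates the effectiveness of each response option on a 1-7 scale.
# One common approach (not necessarily the EPAS approach): score the item by
# the candidate's closeness to an SME consensus key.

from statistics import mean

# Assumed SME consensus (keyed) effectiveness ratings for one scenario's options
SME_KEY = {
    "confront_publicly": 2.0,
    "coach_privately": 6.5,
    "ignore_issue": 1.5,
    "escalate_to_chief": 4.0,
}

def score_sjt_item(candidate_ratings: dict[str, float],
                   key: dict[str, float],
                   max_rating: float = 7.0) -> float:
    """Return a 0-1 score: 1 = perfect agreement with the SME key, 0 = maximal disagreement."""
    distances = [abs(candidate_ratings[option] - key[option]) for option in key]
    # Normalize the mean absolute distance by the largest possible distance on the scale
    return 1.0 - mean(distances) / (max_rating - 1.0)

candidate = {
    "confront_publicly": 3.0,
    "coach_privately": 7.0,
    "ignore_issue": 1.0,
    "escalate_to_chief": 5.0,
}
print(f"Item score: {score_sjt_item(candidate, SME_KEY):.2f}")
```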
A video is worth 10,000 words
21
Dimension × exercise map
Assessments: SJT, Office Simulation, Virtual Role Play
● Technical/Procedural Knowledges
  – Knowledge of Relevant Laws, Regs, Policies (1 of the 3 assessments)
  – General Investigative Knowledge (2 of 3)
● Management/Administrative
  – Judgment & Problem Solving (all 3)
  – Plan, Organize, Prioritize (all 3)
● Influence/Interpersonal
  – Relate to Others (2 of 3)
  – Lead Others (2 of 3)
22
EPAS development
● Followed a strict content-oriented development process (SIOP, 2003)
  – Reviewed task/KSAO linkages from the Special Agent job analysis (and existing in-basket and role play assessments)
  – Identified critical job tasks that could be simulated and provide sufficient stimuli for eliciting dimension-relevant behavior
  – Developed assessments based on a subset of those tasks
  – Worked closely with Special Agent SMEs on content and scoring
  – Ensured ample opportunities for candidates to demonstrate behaviors relevant to the critical KSAOs required by the job/tasks
  – Evaluated strength of linkages between KSAOs/dimensions and each assessment (post-development content validity ratings)
23
EPAS criterion-related validity
● Concurrent, criterion-related validation study prior to operational use
  – Job incumbents and their supervisors (raters)
  – Supervisor ratings of incumbent job performance
    • Multi-dimension, behavioral summary scales
  – Sample size = 134
● Raw, uncorrected correlations between job performance and EPAS scores:
  – Overall EPAS score: correlation in the mid .40s
  – Dimension- and exercise-level scores: correlations averaged in the mid- to upper-.20s
(A hypothetical worked example of computing such a raw correlation follows this slide.)
24
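As a point of reference for the validity results above, the sketch below shows what a “raw, uncorrected” criterion-related validity estimate is computationally: a plain Pearson correlation between assessment scores and supervisor ratings, with no corrections for range restriction or criterion unreliability. The data are simulated; only the sample size (N = 134) is taken from the study.

```python
# A minimal, hypothetical illustration of a "raw, uncorrected" criterion-related
# validity estimate: the Pearson correlation between overall assessment scores
# and supervisor ratings of job performance. The data below are made up.

import numpy as np

rng = np.random.default_rng(42)
n = 134                                                           # sample size matching the study's N
true_ability = rng.normal(size=n)
epas_overall = true_ability + rng.normal(scale=1.0, size=n)       # simulated assessment scores
performance = 0.7 * true_ability + rng.normal(scale=1.0, size=n)  # simulated supervisor ratings

r = np.corrcoef(epas_overall, performance)[0, 1]
print(f"Raw, uncorrected validity estimate: r = {r:.2f}")
```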
EPAS cost savings
25
Full development and administration cost per AC candidate in U.S. dollars (bar chart):
● Original AC: ~$4,100
● EPAS 1st administration year: ~$2,400
● EPAS subsequent administration years*: ~$1,000
Note. *Projected cost for future administration years.
The bottom line
● Custom, content-oriented development process that follows the SIOP Principles
● Content validity evidence and a criterion-related validity study completed prior to implementation
● Substantial cost savings per candidate, even in the development year
● ATF received a 2013 SIOP-SHRM HRM Impact Award for the EPAS work
26
Challenges & Opportunities
Challenges and opportunities
● In practice, many issues arise for which the research literature is unclear or non-existent (i.e., fun!)
  – In terms of implementing sound practice, these represent challenges
  – In terms of scientific advancement, these represent opportunities
● Development and implementation of the EPAS presented several challenging opportunities
  – Standardization in the face of branching
  – Fidelity of closed-ended response formats
  – Multiple item response formats + branching = scoring fun!
28
Standardization in the face of branching
Recommendations and lessons learned
● Don’t go overboard with branching
  – Enough for realism, but not so much that it becomes difficult to ensure a sufficient number of “common” assessment points per competency
● Map and “play out” different potential branches with multiple SMEs
● Look for creative ways to “build in” commonality and strategically “redirect” the candidate as needed (a sketch of auditing common assessment points across branches follows this slide)
  – “Dead ends” that feed into a new common path or assessment point requiring action on the part of the candidate
  – “Visit” from another member of the office
  – Arrival of a new e-mail
  – Incoming phone call
29
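To make the “sufficient number of common assessment points per competency” idea concrete, here is a minimal bookkeeping sketch, assuming a hypothetical set of branch paths and scored items, that identifies which scored items every candidate sees regardless of branch and flags competencies with no common points. The path names, item names, and competency tags are invented for illustration.

```python
# A minimal sketch (with made-up data) of auditing how many scored assessment
# points per competency are common to every branch path through a simulation.

from collections import defaultdict

# Each scored item is tagged with the competency it is intended to measure
ITEM_COMPETENCY = {
    "email_priorities": "Plan/Organize/Prioritize",
    "budget_memo": "Judgment & Problem Solving",
    "employee_visit": "Lead Others",
    "phone_call_redirect": "Judgment & Problem Solving",
    "followup_email": "Plan/Organize/Prioritize",
}

# Each branch path maps to the scored items a candidate on that path encounters
BRANCH_PATHS = {
    "path_A": {"email_priorities", "budget_memo", "employee_visit"},
    "path_B": {"email_priorities", "budget_memo", "phone_call_redirect"},
    "path_C": {"email_priorities", "followup_email", "phone_call_redirect"},
}

# Items every candidate sees regardless of the branch taken
common_items = set.intersection(*BRANCH_PATHS.values())

common_per_competency = defaultdict(int)
for item in common_items:
    common_per_competency[ITEM_COMPETENCY[item]] += 1

print("Common items across all paths:", sorted(common_items))
for competency, count in sorted(common_per_competency.items()):
    print(f"  {competency}: {count} common assessment point(s)")

# Flag competencies with no common assessment points at all
for competency in sorted(set(ITEM_COMPETENCY.values()) - set(common_per_competency)):
    print(f"  WARNING: {competency} has no common assessment points across paths")
```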
Fidelity of closed-ended responses
Recommendations and lessons learned
● Use response formats that “naturally” follow from the stimulus used to elicit the competency you are trying to measure
● To the extent possible, let the response format mirror the types of judgments and decisions candidates make on the job, e.g.:
  – Evaluating the effectiveness of potential courses of action
  – Prioritizing potential courses of action
  – Identifying errors/deficiencies in work products
  – Evaluating the seriousness of errors/deficiencies
  – Evaluating the criticality of information for decision making
  – Determining when sufficient information has been obtained
  – This goes well beyond simply “pick the best answer”.
30
Multiple response formats + branching = psychometric fun
31
● Use of items with different response formats can lead to unintended weighting of items when forming assessment scores
  – Large differences in item variance can greatly impact weighting
  – Contrast a 0-1 scale with a 1-100 scale (illustrated in the sketch below)
● Individuals’ completion of different items (as a result of branching) can decrease the reliability of the measure
  – When individuals complete the same items, differences in item difficulty do not affect the rank ordering of candidates
  – When individuals complete different sets of items, differences in item difficulty can affect the rank ordering of candidates
  – Potentially exacerbated by use of items with different response formats
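A small simulated demonstration of the weighting problem described above: when a 0-1 item and a 1-100 item are summed without rescaling, the high-variance 1-100 item effectively determines the composite. The data are random draws; the effect, not the specific numbers, is the point.

```python
# Hypothetical demonstration of how mixing response scales (0-1 items vs.
# 1-100 items) lets the high-variance items dominate a simple sum score.

import numpy as np

rng = np.random.default_rng(7)
n = 500
binary_item = rng.integers(0, 2, size=n).astype(float)   # item scored 0-1
rating_item = rng.uniform(1, 100, size=n)                 # item scored 1-100

raw_composite = binary_item + rating_item                 # unit-weighted sum of raw scores

# The composite is almost entirely driven by the 1-100 item
print("r(composite, 0-1 item)   =", round(np.corrcoef(raw_composite, binary_item)[0, 1], 2))
print("r(composite, 1-100 item) =", round(np.corrcoef(raw_composite, rating_item)[0, 1], 2))
print("variance share of 1-100 item:",
      round(rating_item.var() / (rating_item.var() + binary_item.var()), 3))
```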
Recommendations and lessons learned
● Don’t go overboard with branching or multiple response formats!
  – Need to balance realism/fidelity with psychometric reality
● Map different item response scales to a common scale (a minimal rescaling sketch follows this slide)
  – Simple (e.g., linear) transformations: cost effective and perhaps sufficient if the meaning of the resulting scale is not critical (e.g., simple rank ordering of candidates)
  – SME-based mapping: more expensive, and may not completely solve the problem, but important if the meaning of the overall scale matters (e.g., comparison of scores to a proficiency benchmark or cut-off)
● When designing simulations with branching, be very cognizant of the amount of overlap among items that are scored, and strive to maximize that overlap
32
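A minimal sketch of the “simple (e.g., linear) transformation” recommendation, assuming hypothetical 0-1 and 1-100 items: rescale each item onto a common 0-1 metric using its known scale range before summing, so no item dominates merely because of its raw scale. An SME-based mapping would replace this mechanical rescaling with judgment-based conversions.

```python
# Map items with different native response scales onto a common 0-1 metric
# via a simple linear transformation before aggregating. Scale ranges are
# assumptions for illustration.

import numpy as np

def rescale_to_unit(scores: np.ndarray, scale_min: float, scale_max: float) -> np.ndarray:
    """Linearly map scores from their native response scale onto 0-1."""
    return (scores - scale_min) / (scale_max - scale_min)

rng = np.random.default_rng(7)
n = 500
binary_item = rng.integers(0, 2, size=n).astype(float)   # native scale 0-1
rating_item = rng.uniform(1, 100, size=n)                 # native scale 1-100

composite = rescale_to_unit(binary_item, 0, 1) + rescale_to_unit(rating_item, 1, 100)

# After rescaling, the items contribute far more evenly than in the raw sum;
# z-scoring (another linear transformation) would equalize their variances exactly.
print("r(composite, 0-1 item)   =", round(np.corrcoef(composite, binary_item)[0, 1], 2))
print("r(composite, 1-100 item) =", round(np.corrcoef(composite, rating_item)[0, 1], 2))
```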
The Road Ahead
● As technology evolves, there will ALWAYS be issues in need of research attention, and unknown questions
  – These are sources of risk, potential value… and fun!
● How do we ensure sound practice given the differences in rates of knowledge advancement highlighted earlier?
  – This issue is bigger than AC research/practice – it spans domains
  – Assessment technology consortia spanning academia and practice?
    • What would be the stimulus for practice to participate?
    • What would be the stimulus for academe to participate?
  – Balancing competitive advantage with scientific advancement
Advancing science and practice
34
In sum…
35
Technology, assessment practice, AC research, and psychometrics
The more we can bring knowledge in these areas into alignment, the more likely it is that our assessments will reflect truly innovative, best-in-class solutions for the individuals we assess, while at the same time providing the basis for impactful, practical contributions to the scientific knowledge base.
Thank You!
● Gibbons, A. M. (2013). Research evidence and AC 2.0: What we know and what we don’t. Presentation at the 33rd Annual Assessment Centre Study Group Conference. Stellenbosch, South Africa.
● Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th edition). Bowling Green, OH: Author.
● Woehr, D. J., & Arthur, W., Jr. (2003). The construct-related validity of assessment center ratings: A review and meta-analysis of the role of methodological factors. Journal of Management, 29, 231–258.
References
37
Dr. Dan Putka, Principal Staff Scientist Human Resources Research Organization (HumRRO) www.humrro.org Email: [email protected]
For More Information
38
Linked Slides
40