Technical Manual 9

THE CRITICAL TEST BATTERY

Test Batteries

Uploaded by bobrockd, 06-Apr-2015


CONTENTS

1 THEORETICAL OVERVIEW

2 THE CRITICAL REASONING TESTS

3 THE PSYCHOMETRIC PROPERTIES OF THE CRITICAL REASONING TESTS

4 REFERENCES

5 APPENDICES

ADMINISTRATION INSTRUCTIONS

SCORING INSTRUCTIONS

CORRECTION FOR GUESSING

NORM TABLES


LIST OF TABLES

1 MEAN AND SD OF AGE, AND GENDER BREAKDOWN, OF THE NORMATIVE SAMPLE FOR THE VCR2

2 MEAN SCORES FOR MEN AND WOMEN (MBAS) ON THE VCR2 AND NCR2

3 ALPHA COEFFICIENTS FOR THE VERBAL AND NUMERICAL CRITICAL REASONING TESTS

4 CORRELATIONS BETWEEN THE VCR2 AND NCR2

5 CORRELATIONS BETWEEN THE VERBAL AND NUMERICAL CRITICAL REASONING TESTS WITH THE APIL-B

6 CORRELATIONS BETWEEN THE ORIGINAL VERSIONS OF THE VCR2 AND NCR2 WITH THE AH5

7 ASSOCIATION BETWEEN THE VCR2, NCR2 AND INSURANCE SALES SUCCESS

8 CORRELATIONS BETWEEN THE VCR2, NCR2 AND MBA PERFORMANCE


1 THEORETICAL OVERVIEW

1 THE ROLE OF PSYCHOMETRIC TESTS IN PERSONNEL SELECTION AND ASSESSMENT

2 THE ORIGINS OF REASONING TESTS


THE ROLE OF PSYCHOMETRIC TESTS IN PERSONNEL SELECTION AND ASSESSMENT

A major reason for using psychometric tests to aid selection decisions is that they provide information that cannot be obtained easily in other ways. If such tests are not used then what we know about the applicant is limited to the information that can be gleaned from an application form or CV, an interview and references. If we wish to gain information about a person's specific aptitudes and abilities, and about their personality, attitudes and values, then we have little option but to use psychometric tests. In fact, psychometric tests can do more than simply provide additional information about the applicant. They can add a degree of reliability and validity to the selection procedure that it is impossible to achieve in any other way. How they do this is best addressed by examining the limitations of the information obtained through interviews, application forms and references, and exploring how some of these limitations can be overcome by using psychometric tests.

While much useful information can be gained from the interview, which clearly has an important role in any selection procedure, it does nonetheless suffer from a variety of weaknesses. Perhaps the most important of these is that the interview has been shown to be a very unreliable way to judge a person's character. This is because it is an unstandardised assessment procedure. That is to say, each interview will be different from the last. This is true even if the interviewer is attempting to ask the same questions and act in the same way with each applicant.

It is precisely this aspect of the interview that is both its main strength and its main weakness. The interview enables us to probe each applicant in depth and discover individual strengths and weaknesses. Unfortunately, the interview's unstandardised, idiosyncratic nature makes it difficult to compare applicants, as it provides no baseline against which to contrast interviewees' differing performances. In addition, it is likely that different interviewers may come to radically different conclusions about the same applicant. Applicants will respond differently to different interviewers, quite often saying very different things to them. Moreover, what any one applicant might say will be interpreted quite differently by each interviewer. In such cases we have to ask which interviewer has formed the correct impression of the candidate. This is a question to which there is no simple answer.

A further limitation of the interview is that it only assesses the candidate's behaviour in one setting, and with regard to a small number of people. How the candidate might act in different situations and with different people (e.g. when dealing with people on the shop floor) is not assessed, and cannot be predicted from an applicant's interview performance. Moreover, the interview provides no reliable information about the candidate's aptitudes and abilities. The most we can do is ask candidates about their strengths and weaknesses, a procedure that has obvious limitations. Thus the range and reliability of the information that can be gained through an interview are limited.

There are similar limitations on the range and usefulness of the information that can be gained from application forms or CVs. While work experience and qualifications may be prerequisites for certain occupations, in and of themselves they do not determine whether a person is likely to perform well or badly. Experience and academic achievement are not always good predictors of ability or future success. While such information is important, it may not be sufficient on its own to enable us to choose confidently between applicants. Thus aptitude and ability tests are likely to play a significant role in the selection process, as they provide information on a person's potential and not just their achievements to date. Moreover, application forms tell us little about a person's character. It is often a candidate's personality that will make the difference between an average and an outstanding performance. This is particularly true when candidates have relatively similar records of achievement and past performance. Therefore, personality tests can play a major role in assisting selection decisions.

There is very little to be said concerning the usefulness of references. While past performance is undoubtedly a good predictor of future performance, references are often not good predictors of past performance. If the name of the referee is supplied by the applicant, then it is likely that they have chosen someone they expect to speak highly of them. They will probably have avoided supplying the names of those who may have a less positive view of their abilities. Aptitude and ability tests, on the other hand, give us an indication of the applicant's probable performance under exam conditions. This is likely to be a truer reflection of the person's ability.

What advantages do psychometric tests have over other forms of assessment? The first advantage is that they add a degree of reliability to the selection procedure that cannot be achieved without their use. Test results can be represented numerically, making it easy both to compare applicants with each other and to compare them with pre-defined groups (e.g. successful vs. unsuccessful job incumbents). In the case of personality tests, the test addresses the issue of how the person characteristically behaves in a wide range of different situations and with different people. Thus psychometric tests, both personality tests and aptitude and ability tests, provide a range of information that is not easily and reliably assessed in other ways. Such information can fill important gaps which have not been assessed by application forms, interviews and references. It can also raise questions that can later be directly addressed in the interview. It is for this reason that psychometric tests are being used increasingly in personnel selection. Their use adds a degree of breadth to assessment decisions which cannot be achieved in any other way.

RELIABILITY AND VALIDITY

As previously noted, besides providing information that cannot be easily obtained in other ways, psychometric tests also add reliability and validity to the selection procedure. There are two ways in which psychometric tests increase the reliability of the assessment procedure:

i) The use of a standardised assessment procedure: Reliability is achieved by using the same tests on each applicant and administering, scoring and interpreting the test results in the same way. Thus, individual biases and distortions are removed from the assessment procedure. By comparing each applicant's scores against an agreed norm we create a baseline that enables us not only to compare applicants with each other, but also to contrast them against some agreed criterion (e.g. against the performance of a sample of graduates, accountants etc.). Thus, subjective and idiosyncratic interpretations of a candidate's performance are removed from the assessment process.

ii) The use of well standardised and reliable psychometric tests: To ensure the assessment procedure produces reliable and consistent results it is necessary to use well-constructed psychometric tests. It is not sufficient simply to administer any questionnaire that purports to be a psychometric test or assessment system. If the test has been constructed badly, it will be neither reliable nor valid and will add little to the assessment process. In the most extreme case the use of such a test may invalidate an otherwise valid selection procedure. For a test to be reliable, each of the questions in each scale must be a good measure of the underlying trait that the scale is attempting to assess. To this end the test publisher should provide data to demonstrate that the test is both reliable and valid. (The statistics that are used to determine this are described later in the manual.)

The assessment of intelligence or reasoning ability is perhaps one of the oldest areas of research interest in psychology. Gould (1981) has traced attempts to scientifically measure psychological aptitudes and abilities to the work of Galton at the end of the nineteenth century. Prior to Galton's pioneering work, however, interest in this area was aroused by phrenologists' attempts to assess mental ability by measuring the size of people's heads. Reasoning tests, in their present form, were first developed by Binet, a French educationalist who published the first test of mental ability in 1905.

Binet was concerned with assessing the intellectual development of children, and to this end invented the concept of mental age. Questions assessing academic ability were graded in order of difficulty according to the average age at which children could successfully complete each item. From the child's performance on such a test it was possible to derive the child's mental age. This involved comparing the performance of the child with the performance of the 'average child' from different age groups. If the child performed at the level of the average 10 year old, then the child was said to have a mental age of 10, regardless of chronological age. From this idea the concept of the Intelligence Quotient (IQ) was developed by William Stern (1912), who defined it as mental age divided by chronological age, multiplied by 100. Prior to Stern's paper, chronological age had been subtracted from mental age to provide a measure of mental alertness. Stern showed that it was more appropriate to take the ratio of these two constructs, which would provide a measure of the child's intellectual development relative to other children. He further proposed that this ratio should be multiplied by 100 for ease of interpretation, thus avoiding cumbersome decimals.
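Stern's ratio IQ can be illustrated in a few lines of code (a minimal sketch; the function name and sample ages are ours, not the manual's):

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's (1912) ratio IQ: mental age divided by chronological age,
    multiplied by 100 to avoid cumbersome decimals."""
    return 100.0 * mental_age / chronological_age

# An 8-year-old performing at the level of the average 10-year-old:
print(ratio_iq(10, 8))    # 125.0
# A child performing exactly at the level of its own age group:
print(ratio_iq(10, 10))   # 100.0
```

The older subtraction method would score these two children as +2 and 0 years of 'mental alertness'; the ratio instead expresses development relative to the child's own age, which was Stern's point.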

Binet's early tests were subsequently revised by Terman et al. (1917) to produce the now famous Stanford-Binet IQ test. These early IQ tests were first used for selection by the Americans during the First World War, when Yerkes (1921) tested 1.75 million soldiers with the Army Alpha and Beta tests. Thus by the end of the war, the assessment of reasoning ability had firmly established its place within psychology.

THE ORIGINS OF REASONING TESTS


2 THE CRITICAL REASONING TESTS

1 THE DEVELOPMENT OF THE CRITICAL REASONING TESTS

2 REVISIONS FOR THE SECOND EDITION


THE DEVELOPMENT OF THE CRITICAL REASONING TESTS

Research has clearly demonstrated that in order to accurately assess reasoning ability it is necessary to develop tests which have been specifically designed to measure that ability in the population under consideration. That is to say, we need to be sure that the test has been developed for use on the particular group being tested, and thus is appropriate for that particular group. There are two ways in which this is important. Firstly, it is important that the test has been developed in the country in which it is intended to be used. This ensures that the items in the test are drawn from a common, shared cultural experience, giving each candidate an equal opportunity to understand the logic which underlies each item. Secondly, it is important that the test is designed for the particular ability range on which it is to be used. A test designed for those of average ability will not accurately distinguish between people of high ability, as all the scores will cluster towards the top end of the scale. Similarly, a test designed for people of high ability will be of little use if given to people of average ability. Not only will it not discriminate between applicants, as all the scores will cluster towards the bottom of the scale, but also, as the questions will be too difficult for most of the applicants, they are likely to be de-motivated, producing artificially low scores. Consequently, the VCR2 and NCR2 have been developed on data from undergraduates; that is to say, people of above average intelligence, who are likely to find themselves in senior management positions as their careers develop.

In constructing the items in the VCR2 and NCR2 a number of guidelines were borne in mind. Firstly, and perhaps most importantly, special care was taken when writing the items to ensure that in order to correctly solve each item it was necessary to draw logical conclusions and inferences from the stem passage/table. This was done to ensure that the test was assessing critical (logical/deductive) reasoning rather than simple verbal/numerical checking ability. That is to say, the items assess a person's ability to think in a rational, critical way and make logical inferences from verbal and numerical information, rather than simply check for factual errors and inconsistencies.

In order to achieve this goal for the Verbal Critical Reasoning (VCR2) test, two further points were borne in mind when constructing the stem passages. Firstly, the passages were kept fairly short and cumbersome grammatical constructions were avoided, so that a person's score on the test would not be too affected by reading speed, thus providing a purer measure of critical reasoning ability. Secondly, care was taken to make sure that the passages did not contain any information which was counter-intuitive, and was thus likely to create confusion.

To increase the acceptability of the test to applicants, the themes of the stem passages were chosen to be relevant to a wide range of business situations. As a consequence of these constraints, the final stem passages were similar in many ways to the short articles found in the financial pages of a daily newspaper, or trade magazines.

The second edition of the Verbal and Numerical Critical Reasoning tests has been revised to meet the following aims:

● To improve the face validity of the test items, thus increasing the tests' acceptability to respondents.

● To modernise the items to reflect contemporary business and financial issues.

● To improve the tests' reliability and validity while maintaining the tests' brevity, with the CRTB being administrable in under one hour.

● To simplify test scoring.

● To make available a hand-scored as well as a computer-scored version of the tests.

● To remove the impact of guessing on raw VCR2 scores, thus increasing the power of the VCR2 to discriminate between respondents.

As noted above, the most significant change in the second edition of the VCR2 has been the incorporation of a correction for guessing. This obviates the problem that, due to the three-point response scale used in most verbal critical reasoning tests, it is possible for respondents to get 33% of the items correct simply by guessing.

While a variety of methods have been proposed for solving this problem (including the use of negative or harsh scoring criteria), we believe that a correction for guessing is the most elegant and practical solution.

This correction is based on the number of items the respondent gets wrong on the test. We know that to get these items wrong the respondent must have incorrectly guessed the answers to those items. We can further assume that, by chance, the respondent incorrectly guessed the answer two thirds of the time and correctly guessed the answer one third of the time. Thus it is possible to estimate the number of correct guesses the respondent made from the number of incorrect responses. This estimate can then be subtracted from the total score to adjust for the number of items the respondent is likely to have correctly guessed.

The use of this correction improves the test's score distribution, increasing its power to discriminate between respondents' 'true' ability levels. Thus it is recommended that test users correct scores for guessing before standardising scores.
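The arithmetic of the correction can be sketched as follows. With three response options, an estimated one correct guess accompanies every two incorrect guesses, so the number of lucky correct answers is estimated as half the number of wrong answers. (A sketch only; the function name and example scores are ours, and the published scoring key should be followed in practice.)

```python
def corrected_score(num_correct: int, num_wrong: int, num_options: int = 3) -> float:
    """Classical correction for guessing: subtract the estimated number of
    lucky correct guesses, wrong / (options - 1), from the number correct.
    Unanswered items count as neither correct nor wrong."""
    return num_correct - num_wrong / (num_options - 1)

# 24 right and 12 wrong on a three-option test: an estimated 6 of the
# correct answers were lucky guesses, so the corrected score is 18.
print(corrected_score(24, 12))   # 18.0
```

Note that a respondent who attempts fewer items is penalised less by the correction than one who guesses freely, which is what restores the test's power to discriminate.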

However, as the norm tables for corrected and uncorrected scores are significantly different from each other, it is important, if hand scoring the Critical Reasoning tests, to ensure that the correct norm table is used to standardise the scores on the VCR2: that is to say, either the norm table for the uncorrected scores (Appendix IV, Table 2) or for the corrected scores (Appendix IV, Table 3), depending upon whether or not the correction for guessing has been applied.

REVISIONS FOR THE SECOND EDITION


3 THE PSYCHOMETRIC PROPERTIES OF THE CRITICAL REASONING TESTS

This chapter presents information describing the psychometric properties of the Verbal and Numerical Critical Reasoning tests. The aim will be to show that these measures meet the necessary technical requirements with regard to standardisation, reliability and validity, to ensure the psychometric soundness of these test materials.

1 INTRODUCTION

2 STANDARDISATION

3 BIAS

4 RELIABILITY OF THE CRITICAL REASONING TESTS

5 VALIDITY

6 STRUCTURE OF THE CRITICAL REASONING TESTS

7 CONSTRUCT VALIDITY OF THE CRITICAL REASONING TESTS

8 CRITERION VALIDITY OF THE CRITICAL REASONING TESTS

INTRODUCTION

STANDARDISATION – NORMATIVE

Normative data allow us to compare an individual's score, on a standardised scale, against the typical score obtained from a clearly identifiable, homogeneous group of people.

RELIABILITY

The property of a measurement which assesses the extent to which variation in measurement is due to true differences between people on the trait being measured, or to measurement error. In order to provide meaningful interpretations, the reasoning tests were standardised against a number of relevant groups. The constituent samples are fully described in the next section. Standardisation ensures that the measurements obtained from a test can be meaningfully interpreted in the context of a relevant distribution of scores. Another important technical requirement for a psychometrically sound test is that the measurements obtained from that test should be reliable. Reliability is generally assessed using two specific measures: one related to the stability of scale scores over time, the other concerned with the internal consistency, or homogeneity, of the constituent items that form a scale score.

RELIABILITY – ASSESSING STABILITY

Also known as test-retest reliability: an assessment is made of the similarity of scores on a particular scale over two or more test occasions. The occasions may be a few hours, days, months or years apart. Normally, Pearson correlation coefficients are used to quantify the similarity between the scale scores over the two or more occasions. Stability coefficients provide an important indicator of a test's likely usefulness as a measure. If these coefficients are low (below approximately 0.6), this suggests either that the abilities/behaviours/attitudes being measured are volatile or situationally specific, or that over the duration of the retest interval situational events have made the content of the scale irrelevant or obsolete. Of course, the duration of the retest interval provides some clue as to which effect may be causing the unreliability of measurement. However, the second measure of a scale's reliability also provides valuable information as to why a scale may have a low stability coefficient. The most common measure of internal consistency is Cronbach's alpha. If the items on a scale have high inter-correlations with each other, and with the total scale score, then coefficient alpha will be high. Thus a high coefficient alpha indicates that the items on the scale are measuring very much the same thing, while a low alpha would be suggestive of either scale items measuring different attributes or the presence of error.

RELIABILITY – ASSESSING INTERNAL CONSISTENCY

Also known as scale homogeneity: an assessment is made of the ability of the items in a scale to measure the same construct or trait. That is, a parameter can be computed that indexes how well the items in a scale contribute to the overall measurement denoted by the scale score. A scale is said to be internally consistent if all the constituent item responses are shown to be positively associated with their scale score. The fact that a test has high internal consistency and stability coefficients only guarantees that it is measuring something consistently. It provides no guarantee that the test is actually measuring what it purports to measure, nor that the test will prove useful in a particular situation. Questions concerning what a test actually measures, and its relevance in a particular situation, are dealt with by looking at the test's validity. Reliability is generally investigated before validity, as the reliability of a test places an upper limit on the test's validity: it can be mathematically demonstrated that a validity coefficient for a particular test cannot exceed the square root of that test's reliability coefficient.
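The ceiling that reliability places on validity follows from classical test theory, and amounts to a one-line calculation (the illustrative alpha value here is ours, not a CRTB figure):

```python
import math

def max_validity(reliability: float) -> float:
    """Classical test theory bound: an observed validity coefficient cannot
    exceed the square root of the test's reliability coefficient."""
    return math.sqrt(reliability)

# A test with a reliability of .81 could at best show a validity
# coefficient of about .9.
print(max_validity(0.81))
```

In practice this is why reliability is checked first: a test with an alpha of .5 cannot achieve a validity coefficient above about .71, however good the criterion data.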

VALIDITY

The ability of a scale score to reflect what that scale is intended to measure. Kline's (1993) definition is: 'A test is said to be valid if it measures what it claims to measure'. Validation studies of a test investigate the soundness and relevance of a proposed interpretation of that test. Two key areas of validation are known as criterion validity and construct validity.

VALIDITY – ASSESSING CRITERION VALIDITY

Criterion validity involves translating a score on a particular test into a prediction concerning what could be expected if another variable was observed. The criterion validity of a test is demonstrated by showing that scores on the test relate in some meaningful way to an external criterion. Criterion validity comes in two forms: predictive and concurrent. Predictive validity assesses whether a test is capable of predicting an agreed criterion which will be available at some future time (e.g. can a test predict the likelihood of someone successfully completing a training course). Concurrent validity assesses whether the scores on a test can be used to predict a criterion measure which is available at the time of the test (e.g. can a test predict current job performance).

VALIDITY – ASSESSING CONSTRUCT VALIDITY

Construct validity assesses whether the characteristic which a test is actually measuring is psychologically meaningful and consistent with the test's definition. The construct validity of a test is assessed by demonstrating that the scores from the test are consistent with those from other major tests which measure similar constructs, and are dissimilar to scores on tests which measure different constructs.

STANDARDISATION

The critical reasoning tests were standardised on a mixed sample of 365 people drawn from graduate, managerial and professional groups. The age and sex breakdowns of the normative sample for the VCR2 and NCR2 are presented in Tables 1 and 2 respectively. As would be expected from an undergraduate sample, the age distribution is skewed to the younger end of the age range of the general population. The sex distribution is, however, broadly consistent with that found in the general population.

Norm tables for the VCR2 and NCR2 are presented in Appendix IV. For the Verbal Critical Reasoning test, different norm tables are presented for test scores that have, or have not, been corrected for guessing. (A correction for guessing has not been made available for the Numerical Critical Reasoning test, as the six-point scale this test uses mitigates the problem of guessing.) As noted above, it is recommended that scores on the VCR2 are corrected for guessing. The correction for guessing should be applied to the raw score (i.e. to the score before it has been standardised). The corrected (or uncorrected) raw score is then standardised with reference to the appropriate norm table (Appendix IV, Table 2 for uncorrected scores and Table 3 for corrected scores). Thus it is important that particular care is taken to refer to the correct norm table when standardising VCR2 raw scores.
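Once the right norm table is chosen, the standardisation step itself is mechanical. The sketch below converts a raw score to a z-score and a sten against a norm group; the norm mean and SD are invented for illustration and are not values from the manual's norm tables, and whether the CRTB reports stens or another standardised scale is not stated here:

```python
def standardise(raw_score: float, norm_mean: float, norm_sd: float):
    """Convert a raw (or guessing-corrected) score to a z-score and a sten
    relative to a norm group's mean and standard deviation."""
    z = (raw_score - norm_mean) / norm_sd
    sten = max(1, min(10, round(5.5 + 2 * z)))   # stens: mean 5.5, SD 2, range 1-10
    return z, sten

# Invented norm group: mean 21, SD 5. A corrected raw score of 26 sits one
# standard deviation above the norm-group mean.
z, sten = standardise(26, norm_mean=21.0, norm_sd=5.0)
print(z, sten)
```

A published norm table packages exactly this conversion as a lookup, which is why applying the guessing correction and then reading the wrong table produces misleading standardised scores.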

In addition, for users of the GeneSys system, normative data is also available from within the software, which computes, for any given raw score, the appropriate standardised scores for the selected reference group. The GeneSys™ software also allows users to establish their own in-house norms, to allow more focused comparison with profiles of specific groups.

BIAS

GENDER AND AGE DIFFERENCES

Gender differences on the CRTB were examined by comparing samples of male and female respondents matched for educational and socio-economic status. Table 2 provides mean scores for men and women on the verbal and numerical critical reasoning tests, along with the F-ratio for the difference between these means. While the men in this sample obtained marginally higher scores on both the verbal and numerical reasoning tests, this was not statistically significant.

Table 1 – Mean and SD of age, and gender breakdown, of the normative sample

Age mean: 31.7   Age SD: 7.9   Male: n=245   Female: n=119

Table 2 – Mean scores for men and women (MBAs) on the VCR2 and NCR2

        Men (n=218)   Women (n=166)   F-ratio   Significance of difference
VCR2    21.1          22.1            .64       n.s.
NCR2    9.0           10.1            .15       n.s.

RELIABILITY OF THE CRITICAL REASONING TESTS

If a reasoning test is to be used for selection and assessment purposes, the test needs to measure each of the aptitude or ability dimensions it is attempting to measure reliably, for the given population (e.g. graduate entrants, senior managers etc.). That is to say, the test needs to be consistently measuring each ability, so that if the test were to be used repeatedly on the same candidate it would produce similar results. It is generally recognised that reasoning tests are more reliable than personality tests, and for this reason high standards of reliability are usually expected from such tests. While many personality tests are considered to have acceptable levels of reliability if they have reliability coefficients in excess of .7, reasoning tests should have reliability coefficients in excess of .8.

INTERNAL CONSISTENCY

Table 3 presents alpha coefficients for the Verbal and Numerical Critical Reasoning tests. Each of these reliability coefficients is substantially greater than .8, clearly demonstrating that the VCR2 and NCR2 are highly reliable across a range of samples.

Table 3 – Alpha coefficients for the Verbal and Numerical Critical Reasoning tests

alpha   Insurance Sales Agents (n=132)   MBAs (n=205)   Undergraduates (n=70)
VCR2    .88                              .84            .88
NCR2    .83                              .81            .86

VALIDITY

Whereas reliability assesses the degree of measurement error of a reasoning test, that is to say the extent to which the test is consistently measuring one underlying ability or aptitude, validity addresses the question of whether or not the scale is measuring the characteristic it was developed to measure. This is clearly of key importance when using a reasoning test for assessment and selection purposes. In order for the test to be a useful aid to selection we need to know that the results are reliable and that the test is measuring the aptitude it is supposed to be measuring. Thus, after we have examined a test's reliability, we need to address the issue of validity. We traditionally examine the reliability of a test before we explore its validity, as reliability sets an upper limit on a scale's validity: broadly speaking, a test cannot be more valid than it is reliable.


STRUCTURE OF THE CRITICAL REASONING TESTS

Specifically, we are concerned that the tests are correlated with each other in a meaningful way. For example, we would expect the Verbal and Numerical Critical Reasoning tests to be moderately correlated with each other, as they are measuring different facets of critical reasoning ability, namely verbal and numerical ability. Thus if the VCR2 and NCR2 were not correlated with each other we might wonder whether each is a good measure of critical reasoning ability. Moreover, we would expect the Verbal and Numerical Critical Reasoning tests not to be so highly correlated with each other as to suggest that they are measuring the same construct (i.e. we would expect the VCR2 and NCR2 to show discriminant validity). Consequently, the first way in which we might assess the validity of a reasoning test is by exploring the relationship between the tests.

Table 4, which presents the Pearson product-moment correlation between the VCR2 and NCR2, demonstrates that while the Verbal and Numerical tests are significantly correlated, they are nevertheless measuring distinct abilities.

Table 4 – Correlations between the VCR2 and NCR2

Insurance Sales Agents (n=132)   MBAs (n=170)   Undergraduates (n=70)
.40                              .57            .49


CONSTRUCT VALIDITY OF THE CRITICAL REASONING TESTS

As an evaluation of construct validity, the Verbal and Numerical Critical Reasoning tests were correlated with other widely used measures of related constructs.

The VCR2 and NCR2 were correlated with the APIL-B (Ability, Processing of Information and Learning Battery), which was developed by Taylor (1995). The APIL-B has been specifically developed to be a culture-fair assessment tool for use in a multi-racial context (South Africa). As such, it has been designed to assess an individual's core cognitive capabilities, rather than specific skills that may depend upon educational experience and life advantage or disadvantage.

Table 5 presents the correlations between the Verbal and Numerical Critical Reasoning tests and the APIL-B, on a sample of MBA students. These correlations are highly statistically significant, and substantial in size, providing strong support for the construct validity of the VCR2 and NCR2.

The VCR2 and NCR2 were also found to correlate substantially (r=.42 and r=.36 respectively) with Factor B (Intellectual Self-confidence) on the 16PF in a sample (n=132) of insurance sales agents. This suggests that those respondents who were more confident of their own intellectual ability had higher levels of critical reasoning ability, providing some tangential support for the construct validity of the VCR2 and NCR2.

Table 6 presents the correlations between the original edition of the Verbal and Numerical Critical Reasoning tests and the AH5, a widely respected measure of general reasoning ability. These data thus provide evidence that the first edition of these two tests measures reasoning ability rather than some other (related) construct (i.e. verbal or numerical checking ability). As was noted above, because of the nature of critical reasoning test items, it is particularly important when developing such tests to demonstrate that they are measuring reasoning ability, and not checking ability. This is demonstrated by inspection of Table 6.

The relationship between the first edition of the CRTB and the Watson-Glaser Critical Thinking Appraisal was examined by correlating the VCR2 and NCR2 with the W-GCTA. The correlations with the W-GCTA were .38 for both the Verbal and Numerical tests. While modest, these correlations nonetheless demonstrate a degree of congruence between these two tests, as would be expected from different measures of critical reasoning.


Table 5 – Correlations between the Verbal and Numerical Critical Reasoning tests and the APIL-B

        APIL-B   Sample size   Significance
VCR2    .569     n=250         p<.001
NCR2    .512     n=169         p<.001

Table 6 – Correlations between the original versions of the VCR2 and NCR2 and the AH5

                                       VCR2   NCR2
Verbal/Numerical subscale of the AH5   .60    .51


CRITERION VALIDITY OF THE CRITICAL REASONING TESTS

In this section, we provide details of a number of studies in which the critical reasoning tests have been used to predict job-related performance criteria.

INSURANCE SALES

A sample of 132 Insurance Sales Agents completed the CRTB as part of a validation study. The association between their scores on the VCR2 and NCR2 and their job performance was examined using t-tests. Job incumbents were classified as either successful or unsuccessful depending upon their performance after one year in post. Table 7 presents the mean scores for these two groups on the VCR2 and NCR2. Inspection of this table indicates that, on average, the successful incumbents scored higher on both tests than the unsuccessful incumbents, although the difference reached statistical significance only for the NCR2. This provides support for the criterion-related validity of that test.

MBA PERFORMANCE

A group of MBA students completed the VCR2 and NCR2 prior to enrolling. Their scores on these tests were then correlated with their performance across different courses on the MBA syllabus. The results of this analysis are presented in Table 8. Inspection of Table 8 indicates that the critical reasoning tests were predictive of performance across a number of areas of study, providing strong support for the predictive validity of the CRTB.

Table 7 – Association between the VCR2, NCR2 and insurance sales success

        Unsuccessful    Successful
        mean (n=29)     mean (n=23)     t-value    p
VCR2    18.14           21.22           1.48       n.s.
NCR2    9.72            12.61           2.18       <.05
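As a minimal sketch of the analysis summarised in Table 7, the independent-samples t statistic can be computed from two groups of scores with a pooled variance estimate. The score lists below are hypothetical illustrations (the manual reports only the group means, sample sizes and t-values), and the helper name is ours:

```python
# Sketch of an independent-samples t-test like the one behind Table 7.
# The score lists are hypothetical, not the actual validation data.
import math
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Two-sample t statistic with a pooled variance estimate."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) +
                  (nb - 1) * variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se

successful = [14, 12, 11, 15, 13, 10, 12, 14]    # hypothetical NCR2 scores
unsuccessful = [9, 10, 8, 11, 7, 10, 9, 8]

t = independent_t(successful, unsuccessful)
print(f"t = {t:.2f} on {len(successful) + len(unsuccessful) - 2} df")
```

For Table 7's NCR2 comparison (n=29 and n=23, hence 50 degrees of freedom), the reported t of 2.18 exceeds the two-tailed 5% critical value of roughly 2.01, which is why it is flagged as significant.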


Table 8 – Correlations between the VCR2, NCR2 and MBA performance

                                       VCR2                  NCR2
Innovation & design                    .374 (n=89, p<.01)    .260 (n=89, p<.01)
Business decision making               .467 (n=35, p<.01)    .433 (n=35, p<.01)
Macro economics                        .478 (n=89, p<.001)   .386 (n=89, p<.001)
IT                                     .468 (n=35, p<.01)    .511 (n=35, p<.01)
Post Graduate Diploma in Business
Administration – average to date       .364 (n=34, p<.05)    .510 (n=34, p<.01)
Economics                              .236 (n=56, n.s.)     .013 (n=56, n.s.)
Analytical Tools and Techniques        .312 (n=51, p<.05)    .134 (n=51, n.s.)
Marketing                              .204 (n=53, n.s.)     -.124 (n=53, n.s.)
Finance & Accounting                   .209 (n=56, n.s.)     -.007 (n=56, n.s.)
Organisational Behaviour               .296 (n=56, p<.05)    -.032 (n=56, n.s.)
MBA Category                           .389 (n=48, p<.01)    .109 (n=48, n.s.)
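The coefficients in Table 8 are, presumably, ordinary Pearson product-moment correlations between test scores and course marks. A sketch of the computation, including the t statistic conventionally used to judge a correlation's significance; the paired scores below are hypothetical illustrations, not data from the validation sample:

```python
# Pearson product-moment correlation between two paired score lists,
# computed from first principles. Data are hypothetical illustrations.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

vcr2 = [22, 30, 18, 25, 28, 15, 33, 20]      # hypothetical VCR2 raw scores
marks = [58, 66, 52, 61, 70, 48, 72, 55]     # hypothetical course marks (%)

r = pearson_r(vcr2, marks)
n = len(vcr2)
t = r * math.sqrt((n - 2) / (1 - r ** 2))    # compare against t(n-2)
print(f"r = {r:.3f}, t = {t:.2f}")
```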


4 REFERENCES

Binet, A. (1910). Les idées modernes sur les enfants [Modern ideas about children]. Paris: E. Flammarion.

Budd, R.J. (1991). Manual for the Clerical Test Battery. Letchworth, Herts, UK: Psytech International Limited.

Budd, R.J. (1993). Manual for the Technical Test Battery. Letchworth, Herts, UK: Psytech International Limited.

Cronbach, L.J. (1960). Essentials of Psychological Testing (2nd edition). New York: Harper.

Galton, F. (1869). Hereditary Genius. London: Macmillan.

Gould, S.J. (1981). The Mismeasure of Man. Harmondsworth, Middlesex: Pelican.

Heim, A.W. (1970). Intelligence and Personality. Harmondsworth, Middlesex: Penguin.

Heim, A.W., Watts, K.P. and Simmonds, V. (1974). AH2/AH3 Group Tests of General Reasoning: Manual. Windsor: NFER-Nelson.

Jackson, D.N. (1987). User's Manual for the Multidimensional Aptitude Battery. London, Ontario: Research Psychologists Press.

Johnson, C., Blinkhorn, S., Wood, R. and Hall, J. (1989). Modern Occupational Skills Tests: User's Guide. Windsor: NFER-Nelson.

Stern, W. (1912). Psychologische Methoden der Intelligenz-Prüfung [Psychological methods of testing intelligence]. Leipzig: Barth.

Terman, L.M. et al. (1917). The Stanford Revision of the Binet-Simon Scale for Measuring Intelligence. Baltimore: Warwick and York.

Watson, G. and Glaser, E.M. (1980). Manual for the Watson-Glaser Critical Thinking Appraisal. New York: Harcourt Brace Jovanovich.

Yerkes, R.M. (1921). Psychological examining in the United States Army. Memoirs of the National Academy of Sciences, 15.


5 APPENDIX I

ADMINISTRATION INSTRUCTIONS

Good practice in test administration requires the assessor to set the scene before the formal administration of the tests. This scene-setting generally includes: welcome and introductions; the nature, purpose and use of the assessment; and feedback arrangements.

If only one (either the Verbal or Numerical) of the Critical Reasoning tests is being administered, then say:

‘From now on, please do not talk among yourselves, but ask me if anything is not clear. If you have a mobile phone please ensure that it is switched off. We shall be doing only one of the two tests contained in the booklet that I will shortly be distributing.

Say either:

The Verbal Critical Reasoning Test which takes 15 minutes

or

The Numerical Critical Reasoning Test which takes 25 minutes

Continue

During the test I shall be checking to make sure you are not making any accidental mistakes when filling in the answer sheet. I will not be checking your responses to see if you are answering correctly or not.’

If you are administering both the Verbal and Numerical Critical Reasoning tests (as is more common), and if this is the first or only questionnaire being administered, give an introduction as per or similar to the example script provided.


Continue by using the instructions exactly as given. Say:

‘From now on, please do not talk among yourselves, but ask me if anything is not clear. If you have a mobile phone please ensure that it is switched off. We shall be doing two tests, the Verbal Critical Reasoning Test which takes 15 minutes and the Numerical Critical Reasoning Test which takes 25 minutes. During the test I shall be checking to make sure you are not making any accidental mistakes when filling in the answer sheet. I will not be checking your responses to see if you are answering correctly or not.’

WARNING: It is most important that answer sheets do not go astray. They should be counted out at the beginning of the test and counted in again at the end.

DISTRIBUTE THE ANSWER SHEETS

Then ask:

‘Has everyone got two sharp pencils, an eraser, some rough paper and an answer sheet? Please note the answer boxes are in columns (indicate) and remember not to write on the booklets.’

Rectify any omissions, then say:

‘Print your last name and first name on the line provided, and indicate your title and sex followed by your age and today’s date.’

Explain to the respondents what to enter in the boxes marked ‘Test Centre’ and ‘Comments’. Walk round the room to check that the instructions are being followed.

WARNING: It is vitally important that test booklets do not go astray. They should be counted out at the beginning of the session and counted in again at the end.

DISTRIBUTE THE BOOKLETS WITH THE INSTRUCTION:

‘Please do not open the booklet until instructed.’

Remembering to read slowly and clearly, go to the front of the group. If you are only administering the Numerical Critical Reasoning test, go to the section below headed Numerical Critical Reasoning Test. If you are administering both Critical Reasoning tests, or just the Verbal Critical Reasoning test, say:

‘Please open the booklet at Page 2 and follow the instructions for this test as I read them aloud.’ (Pause to allow booklets to be opened.)


‘In this test you have to draw inferences from short passages of text. You will be presented with a passage of text followed by a number of statements. Your task is to decide, on the basis of the information contained in the passage, whether each statement is true, false or cannot be inferred from the passage. Your decision should be based only on the information contained in the passage and not on your own knowledge or opinions.’

‘Mark your answer by filling in the appropriate box on your answer sheet that corresponds to your choice.’

‘You now have a chance to complete the example questions on page 3 in order to make sure that you understand the test. Enter your responses to the example questions in the section marked Example Questions at the top of the answer sheet.’

Point to the section on the answer sheet marked Example Questions (as you read the above).

Pause while the candidates read the instructions, then say:

‘Please attempt the example questions now.’

While the candidates are doing the examples, walk around the room to check that everyone is clear about how to fill in the answer sheet. Make sure that no-one is looking at the actual test items during the example session. When all have finished (allow a maximum of two and a half minutes) give the answers as follows:

‘The correct response to Example 1 is False. It is explicitly stated within the text that further growth in the number of radio stations is limited due to there being no new radio frequencies available.’

‘The correct response to Example 2 is True. It is explicitly stated that audience figures affect advertising revenue, thus affecting profitability.’

‘The correct response to Example 3 is Cannot Determine. It is impossible to infer, from the information provided in the text, whether radio stations in general will become more profitable. The text indicates that audience figures are currently poor for many radio stations and that it is expected that some may go bankrupt. However, it is not possible to infer from this that audience figures (and as a result advertising revenue) will increase for the remaining radio stations.’


Check for understanding, then say:

‘Time is short so when you begin the timed test work as quickly and as accurately as you can.

If you are unsure of an answer, mark your best choice and move on to the next question.

If you want to change an answer cross it out, as indicated in the instructions in the top left-hand corner of the answer sheet, and fill in your new choice of answer.’

Point to the top left-hand corner of the answer sheet.

Then continue:

‘There are 8 passages of text and a total of 40 questions. You have 15 minutes in which to answer the questions.

If you reach the ‘End of Test’ before time is called you may review your answers if you wish.

If you have any questions please ask now, as you will not be able to ask questions once the test has started.’

Then say very clearly:

‘Is everyone clear about how to do this test?’

Deal with any questions appropriately then, starting a stopwatch or setting a count-down timer on the word ‘begin’, say:

‘Please turn over the page and begin’

Answer only questions relating to procedure at this stage, but enter in the Administrator’s Test Record any other problems which occur. Walk around the room at appropriate intervals to check for potential problems.

At the end of the 15 minutes, say:

‘Stop’

You should intervene if candidates continue after this point.

If you are only administering the Verbal Critical Reasoning test say:

‘Close the test booklets’


COLLECT ANSWER SHEETS AND BOOKLETS, ENSURING THAT ALL MATERIALS ARE RETURNED (COUNT BOOKLETS AND ANSWER SHEETS)

Then say:

‘Thank you for completing the Critical Reasoning Test Battery’

If you are administering both of the Critical Reasoning tests, continue by saying:

‘Now please turn to Page 12, which is a blank page’

Then say:

‘We are now ready to start the next test. Has everyone still got two sharpened pencils, an eraser and some unused rough paper?’

If not, rectify, then say:

‘The next test follows on the same answer sheet; please locate the section now.’

Check for understanding.

Then say:

‘Now please turn to page 14…’

If you are only administering the Numerical Critical Reasoning test say:

‘Please open the booklet at Page 14…’

and continue by saying:

‘and follow the instructions for this test as I read them aloud.’ (Pause to allow booklets to be opened.)

In this test you will have to draw inferences from numerical information which is presented in tabular form.

You will be presented with a numerical table and asked a number of questions about this information. You will then have to select the correct answer to each question from one of six possible choices. One and only one answer is correct in each case.


Mark your answer by filling in the appropriate box on your answer sheet that corresponds to your choice.

You now have a chance to complete the example questions on Page 15 in order to make sure that you understand the test. Enter your responses to the example questions in the section marked Example Questions at the top of the answer sheet.’

Point to the section on the answer sheet marked Example Questions (as you read the above).

Pause while the candidates read the instructions, then say:

‘Please attempt the example questions now.’

While the candidates are doing the examples, walk around the room to check that everyone is clear about how to fill in the answer sheet. Make sure that no-one is looking at the actual test items during the example session. When all have finished (allow a maximum of three minutes) give the answers as follows:

‘The correct answer to Example 1 is Design (answer no. 5). It can be seen, in the table, that amongst women, design was consistently chosen by the lowest percentage as the most important feature of a car.

The correct answer to Example 2 is Performance (answer no. 1). It can be seen that of all the features of a car, performance is rated by men as being the most important feature of a car.

The correct answer to Example 3 is 10.4 (answer no. 5). Of men below the age of 30, 5% identified safety and 52% identified performance as the most important feature of a car. 52 divided by 5 is 10.4, therefore the answer is number 5.

Please do not turn over the page yet’

Then say:

‘Time is short so when you begin the timed test work as quickly and as accurately as you can.

If you want to change an answer cross it out, as indicated in the instructions in the top left-hand corner of the answer sheet, and fill in your new choice of answer.’

Point to the top left-hand corner of the answer sheet.


Then continue:

‘There are 6 tables of numerical information and a total of 25 questions. You have 25 minutes in which to answer the questions.

If you reach the ‘End of Test’ before time is called you may review your answers if you wish.

If you have any questions please ask now, as you will not be able to ask questions once the test has started.’

Then say very clearly:

‘Is everyone clear about how to do this test?’

Deal with any questions appropriately then, starting a stopwatch or setting a count-down timer on the word ‘begin’, say:

‘Please turn over the page and begin’

Answer only questions relating to procedure at this stage, but enter in the Administrator’s Test Record any other problems which occur. Walk around the room at appropriate intervals to check for potential problems.

At the end of the 25 minutes, say:

‘Stop. Close the test booklets’

You should intervene if candidates continue after this point.

COLLECT ANSWER SHEETS AND BOOKLETS, ENSURING THAT ALL MATERIALS ARE RETURNED (COUNT BOOKLETS AND ANSWER SHEETS)

Then say either:

‘Thank you for completing the Critical Reasoning Test Battery’

or

‘Thank you for completing the Numerical Critical Reasoning Test’


APPENDIX II

SCORING INSTRUCTIONS

The completed answer sheets are scored and profiled by following the steps listed below:

1 Remove the top cover sheet of the combined answer/scoring sheet to reveal the scoring key.

To score and standardise the VCR2 follow steps 2-8. To score and standardise the NCR2 follow steps 9-10.

2 Count up the number of correct responses for the VCR2 and enter the total in the box marked ‘Total’ (Raw Score).

If you do not wish to correct the VCR2 score for guessing go straight to step 7.

3 To correct the VCR2 score for guessing add up the total number of incorrect responses (i.e. the total number of items attempted minus the raw score) and enter this in the box marked ‘Number Wrong’.

4 The correction for guessing can be found in Appendix III. The number of incorrect responses is listed in the first column of this table and the corresponding correction for guessing is listed in the second column. Make note of the correction for guessing (that corresponds to the number of incorrectly completed items).

5 To obtain the corrected raw score, subtract the correction for guessing from the raw score. If this number is negative (i.e. the correction for guessing is larger than the raw score) then the corrected raw score is zero. Enter the corrected raw score in the box marked ‘Corrected/Uncorrected Raw Score’. To indicate that you have made the correction, delete ‘Uncorrected’.

6 To standardise the corrected raw score, look this value up in the norm table presented in Appendix IV – Table 3 and enter this in the box marked ‘Standard Score’.

You have now scored and standardised the VCR2. If you wish to score and standardise the NCR2, follow steps 9-10.

7 Enter the total score obtained from step 2 in the box marked ‘Corrected/Uncorrected Raw Score’. To indicate that you have not made the correction, delete ‘Corrected’.

8 To standardise the uncorrected raw score, look this value up in the norm table presented in Appendix IV – Table 2 and enter this in the box marked ‘Standard Score’.

9 Count up the number of correct responses to the NCR2 and enter the total in the box marked ‘Total’.

10 To standardise the raw score, look this value up in the norm table presented in Appendix IV – Table 1 and enter this in the box marked ‘Standard Score’.
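Steps 2-5 can be condensed into a short sketch. The correction tabulated in Appendix III works out to half the number of incorrect responses, and the corrected raw score is floored at zero; the function name below is ours, not part of the published scoring materials:

```python
# Sketch of the VCR2 correction-for-guessing calculation (steps 3-5).
def corrected_raw_score(num_correct, num_wrong):
    """VCR2 raw score corrected for guessing, floored at zero."""
    correction = num_wrong / 2            # Appendix III: 0.5 per wrong answer
    return max(num_correct - correction, 0)

# e.g. 24 correct and 10 wrong (6 of the 40 items left unattempted):
print(corrected_raw_score(24, 10))        # -> 19.0
# With 27 or more wrong, the corrected score cannot exceed zero:
print(corrected_raw_score(13, 27))        # -> 0
```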


APPENDIX III

CORRECTION FOR GUESSING

Number of            Correction (to be
incorrect answers    deducted from raw score)
1                    .5
2                    1
3                    1.5
…                    …
25                   12.5
26                   13
27-40                corrected raw score = 0

(The correction equals half the number of incorrect answers; with 27 or more incorrect responses the corrected raw score is always 0.)


APPENDIX IV

NORM TABLES

Table 1 – Norm: NCR2, Graduates/Managers

Sten    NCR2 raw score
1       0-2
2       3
3       4-5
4       6-7
5       8-10
6       11-13
7       14-16
8       17-18
9       19-20
10      21-25

Table 2 – Norm: VCR2 (Uncorrected), Graduates/Managers

Sten    VCR2 raw score
1       0-7
2       8-10
3       11-12
4       13-16
5       17-20
6       21-23
7       24-27
8       28-29
9       30-32
10      33-40

Table 3 – Norm: VCR2 (Corrected), Graduates/Managers

Data not yet available
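The norm-table lookup in Tables 1 and 2 amounts to finding the band that contains a raw score. A sketch, with the band boundaries transcribed from the tables above; the constant and function names are ours:

```python
# Sketch of the sten lookup for the Graduates/Managers norm tables.
# Each list holds (low, high) inclusive raw-score bands for stens 1-10.
NCR2_BANDS = [(0, 2), (3, 3), (4, 5), (6, 7), (8, 10),
              (11, 13), (14, 16), (17, 18), (19, 20), (21, 25)]
VCR2_BANDS = [(0, 7), (8, 10), (11, 12), (13, 16), (17, 20),
              (21, 23), (24, 27), (28, 29), (30, 32), (33, 40)]

def sten(raw_score, bands):
    """Return the sten (1-10) whose band contains the raw score."""
    for sten_value, (low, high) in enumerate(bands, start=1):
        if low <= raw_score <= high:
            return sten_value
    raise ValueError(f"raw score {raw_score} is outside the norm table")

print(sten(12, NCR2_BANDS))   # NCR2 raw 12 falls in the 11-13 band -> sten 6
print(sten(26, VCR2_BANDS))   # VCR2 raw 26 falls in the 24-27 band -> sten 7
```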