Chapter 6 Validity
§1 Basic Concepts of Validity
What is Validity?
Interpretation
• The validity of a test concerns what the test measures and how well it does so.
(Anne Anastasi)
It tells us what can be inferred from test scores.
(Anne Anastasi)
Figure 6.1 A humorous picture
• Validity can be defined as the agreement between a test score or measure and the quality it is believed to measure.
(Robert M. Kaplan and Dennis P. Saccuzzo)
Does the test measure what it is supposed to measure?
• Validity is the evidence for inferences made about a test score.
(AERA, APA, NCME, Standards for Educational and Psychological Testing)
Validity is affected by random and systematic errors.
Random errors and systematic errors both reduce the accuracy of the test.
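Mathematical Definition of Validity
$$ Val = \frac{s_{co}^2}{s_t^2} \qquad (6.1) $$
$$ s_{\infty}^2 = s_{co}^2 + s_{sp}^2 $$
The validity coefficient is the ratio of the variance related to the trait measured ($s_{co}^2$) to the observed-score variance ($s_t^2$). Here $s_{sp}^2$ is specific variance, which is systematic but unrelated to the trait, and $s_{\infty}^2$ is true-score variance.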
Comparing Validity with Reliability
When the reliability of a test is low, its validity is usually low too;
when the reliability of a test is high, its validity is not necessarily high.
Figure 6.2 Components of the Variance of Observed Scores
Reliability is a necessary premise for validity, and validity represents the ultimate purpose of the test.
$$ s_t^2 = s_{co}^2 + s_{sp}^2 + s_e^2 $$
where $s_e^2$ is error variance.
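The decomposition of observed variance can be made concrete with a small calculation. The sketch below uses hypothetical variance components. It assumes the classical definition of reliability as true-score variance over observed variance, $(s_{co}^2 + s_{sp}^2)/s_t^2$, which is not stated explicitly above. Under that assumption validity can never exceed reliability.

```python
def variance_ratios(s2_co, s2_sp, s2_e):
    """Return (validity, reliability) from hypothetical variance components."""
    s2_t = s2_co + s2_sp + s2_e             # observed variance s_t^2
    validity = s2_co / s2_t                 # Eq. (6.1): trait-related share
    reliability = (s2_co + s2_sp) / s2_t    # assumed: true-score share of s_t^2
    return validity, reliability

# Hypothetical components that sum to an observed variance of 100.
val, rel = variance_ratios(s2_co=50.0, s2_sp=30.0, s2_e=20.0)
print(f"validity = {val:.2f}, reliability = {rel:.2f}")
```

Shrinking the error component raises both ratios. Enlarging the specific component raises only reliability, which is why high reliability does not guarantee high validity.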
Three Types of Validity
Criterion-Related Validity
Content-Related Validity
Construct-Related Validity
Note: The most recent standards emphasize that validity is a unitary concept. The use of categories does not imply that there are distinct forms of validity.
Factors Affecting Validity
The Test Itself
Test Administration and Scoring
Examinees
The Criterion Chosen for Criterion Validity
Effects of the test itself
Whether the items are stated clearly
Whether the items represent the trait measured
Whether the test is of adequate length
Whether the test difficulty is appropriate
……
Test administration and scoring
Whether the sample is representative and heterogeneous.
Whether the testing conditions are appropriate and whether unexpected disturbances occur.
Whether the tester administers the test according to the manual.
Whether the test instructions for examinees are clear.
Whether the scoring system is objective and standardized.
Examinees
Interest and Motivation Toward the Test
Emotional State and Attitude During Testing
State of Physical Health
Prior Test-Taking Experience
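The criterion chosen for criterion validity
$$ r_{XY} \le \sqrt{r_{XX}\, r_{YY}} $$
The validity coefficient cannot exceed the square root of the product of the reliability of the test ($r_{XX}$) and the reliability of the criterion ($r_{YY}$). An unreliable criterion therefore limits the validity that can be demonstrated.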
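The bound on the validity coefficient is easy to evaluate directly. This minimal sketch uses hypothetical reliability values to show how a weak criterion caps the attainable validity, even for a highly reliable test.

```python
import math

def max_validity(r_xx, r_yy):
    """Upper bound on the validity coefficient: sqrt(r_XX * r_YY)."""
    return math.sqrt(r_xx * r_yy)

# Hypothetical reliabilities for the same test paired with two criteria.
print(f"{max_validity(0.90, 0.50):.2f}")   # unreliable criterion
print(f"{max_validity(0.90, 0.90):.2f}")   # reliable criterion
```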
§2 Content Validity and Construct Validity
Content Validity
Interpretation
Content validity involves the careful definition of the domain of behaviors to be measured by the test and the logical design of items to cover all the important areas of the domain.
The purpose of content validation is to assess whether the items adequately represent a performance domain or construct of specific interest.
It is established through a rational analysis of the content of a test.
Steps for Content Validation Using Expert Judgment
1. Define the performance domain of interest.
2. Select a panel of qualified experts in the content domain.
3. Provide a structured framework for the process of matching items to the performance domain.
4. Collect and summarize the data from the matching process.
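Step 4 can be illustrated with a simple summary of the matching data. This is a sketch under stated assumptions: the items, the objectives, the four experts' choices, and the 0.75 retention threshold are all hypothetical. The index used here, the proportion of experts matching each item to its intended objective, is one simple choice among several.

```python
# Hypothetical data: intended objective of each item, and the objective
# each of four experts matched the item to.
intended = {"item1": "A", "item2": "B", "item3": "C"}
ratings = {
    "item1": ["A", "A", "A", "A"],
    "item2": ["B", "B", "C", "B"],
    "item3": ["C", "A", "B", "C"],
}

def agreement(intended, ratings):
    """Proportion of experts who matched each item to its intended objective."""
    return {item: sum(r == intended[item] for r in rs) / len(rs)
            for item, rs in ratings.items()}

THRESHOLD = 0.75  # hypothetical cut-off for keeping an item unrevised

props = agreement(intended, ratings)
for item, p in props.items():
    print(item, p, "keep" if p >= THRESHOLD else "review")
overall = sum(props.values()) / len(props)
print(f"mean agreement = {overall:.2f}")
```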
Application
Content validity is most often employed with achievement tests, so the performance domain is often defined by a list of instructional objectives.
Content validity is also applicable to certain occupational tests designed for employee selection and classification.
Table 6.1 Table of Instructional Objectives
           Knowledge  Comprehension  Application  Analysis  Evaluation  Synthesis  Sum
Chapter 1          -              -            8         2           -          -   10
Chapter 2          -             10            6         2          10          -   28
Chapter 3          3              6            2         4           7          -   22
Chapter 4          2              9           12         6           5          6   40
Sum                5             25           28        14          22          6  100
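A two-way table like Table 6.1 can be checked and summarized mechanically. The sketch below encodes the cell entries (0 where the table has a dash) and recomputes the margins. The table does not say whether the cells are item counts or score points, so the sketch treats them simply as weights out of 100.

```python
levels = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Evaluation", "Synthesis"]
# Cell entries of Table 6.1; 0 marks an empty cell.
blueprint = {
    "Chapter 1": [0, 0, 8, 2, 0, 0],
    "Chapter 2": [0, 10, 6, 2, 10, 0],
    "Chapter 3": [3, 6, 2, 4, 7, 0],
    "Chapter 4": [2, 9, 12, 6, 5, 6],
}

row_sums = {ch: sum(cells) for ch, cells in blueprint.items()}
col_sums = [sum(cells[j] for cells in blueprint.values()) for j in range(len(levels))]
total = sum(row_sums.values())

# Share of the total weight assigned to each level of objective.
for level, n in zip(levels, col_sums):
    print(f"{level:<14}{n:>4}{n / total:>7.0%}")
```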
Distinction from Face Validity
Face validity refers to what a test appears superficially to measure, not to what the test actually measures.
Construct Validity
Interpretation
The construct validity of a test is the extent to which the test may be said to measure a theoretical construct or trait.
What is a Construct?
Each construct is developed to explain and organize observed response consistencies. It derives from established interrelationships among behavioral measures.
Examples: scholastic aptitude, intelligence, verbal fluency, anxiety, depression, self-esteem, etc.
Construct validation has focused attention on the role of psychological theory in test construction and on the need to formulate hypotheses that can be proved or disproved in the validation process.
(Anne Anastasi)
Procedures for Construct Validation
Correlations between a measure of the construct and designated …
Internal Consistency
Differentiation between Groups
Developmental Changes
Factor Analysis
Multitrait-Multimethod Matrix
                             Method 1          Method 2          Method 3
Trait                        A     B     C     A     B     C     A     B     C
1. True-False
   A. Sex-Guilt            (.95)
   B. Hostility-Guilt       .28  (.86)
   C. Morality-Conscience   .58   .39  (.92)
2. Forced Choice
   A. Sex-Guilt             .86   .32   .57  (.95)
   B. Hostility-Guilt       .30   .90   .40   .39  (.76)
   C. Morality-Conscience   .52   .31   .86   .55   .26  (.84)
3. Incomplete Sentences
   A. Sex-Guilt             .73   .10   .43   .64   .17   .37  (.48)
   B. Hostility-Guilt       .10   .63   .17   .22   .67   .19   .15  (.41)
   C. Morality-Conscience   .35   .16   .52   .31   .17   .56   .41   .30  (.58)
Values in parentheses are reliability coefficients. The monotrait-heteromethod (convergent validity) coefficients are .86, .90, .86 (Method 1 vs. Method 2), .73, .63, .52 (Method 1 vs. Method 3), and .64, .67, .56 (Method 2 vs. Method 3).
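Reading the matrix amounts to sorting its off-diagonal entries into three kinds and comparing them. The sketch below follows the usual Campbell and Fiske logic: convergent (same trait, different method) coefficients should be higher than heterotrait coefficients. The matrix values are copied from the table above. Averaging each group is a simplifying summary chosen here, not a procedure given in the text.

```python
from itertools import combinations

labels = [(m, t) for m in (1, 2, 3) for t in "ABC"]   # (method, trait)
# Lower triangle of the matrix above, row by row; diagonal = reliability.
lower = [
    [.95],
    [.28, .86],
    [.58, .39, .92],
    [.86, .32, .57, .95],
    [.30, .90, .40, .39, .76],
    [.52, .31, .86, .55, .26, .84],
    [.73, .10, .43, .64, .17, .37, .48],
    [.10, .63, .17, .22, .67, .19, .15, .41],
    [.35, .16, .52, .31, .17, .56, .41, .30, .58],
]

def r(i, j):
    """Correlation between variables i and j, read from the lower triangle."""
    return lower[max(i, j)][min(i, j)]

def mean(xs):
    return sum(xs) / len(xs)

convergent, hetero_mono, hetero_hetero = [], [], []
for i, j in combinations(range(len(labels)), 2):
    (mi, ti), (mj, tj) = labels[i], labels[j]
    if ti == tj:          # same trait, different method
        convergent.append(r(i, j))
    elif mi == mj:        # different trait, same method
        hetero_mono.append(r(i, j))
    else:                 # different trait, different method
        hetero_hetero.append(r(i, j))

print(f"convergent (monotrait-heteromethod): {mean(convergent):.2f}")
print(f"heterotrait-monomethod:              {mean(hetero_mono):.2f}")
print(f"heterotrait-heteromethod:            {mean(hetero_hetero):.2f}")
```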
Example: How to Gather Evidence for a Proposed Intelligence Test
State the theoretical hypotheses of the test:
1. Intelligence grows with age.
2. IQ is relatively stable.
3. Intelligence is substantially related to school achievement.
4. Intelligence is affected by heredity.
Administer the test to a population and analyze the data.
Judge: whether test scores increase with age; whether IQ and school achievement are correlated; whether IQs remain stable across a time interval; whether the correlation between MZ (monozygotic) twins is higher than the correlation between DZ (dizygotic) twins.
§3 Criterion-Related Validity
Concepts
1. Interpretation of Criterion-Related Validity
It is the degree to which test scores can be related to a criterion.
It indicates the effectiveness of a test in predicting an individual's performance in specified activities.
Two Types
Predictive validity refers to the degree to which test scores predict criterion measurements that will be made at some point in the future.
Concurrent validity refers to the relationship between test scores and criterion measurements made at the time the test was given.
2. What is a Criterion?
The criterion is some behavior that the test scores are used to predict.
For example, grade-point average can serve as the criterion for a school admissions test.
Problems with the Criterion
The reliability of the criterion
The validity of the criterion
Whether it can be measured
Criterion contamination
Commonly Used Criteria
academic achievement (for intelligence tests)
performance in specialized training (for special aptitude tests)
job performance
contrasted groups (for personality and domain-referenced tests)
psychiatric diagnosis (for personality tests)
ratings by schoolteachers and job supervisors
previously available tests
Procedures of Criterion-Related Validation
Validity Coefficient
Discrimination Between Two Groups
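1. Estimating the Validity Coefficient
• Pearson Product-Moment Correlation Coefficient
$$ r_{XY} = \frac{\sum xy}{N s_X s_Y} $$
$$ r_{XY} = \frac{\sum XY / N - \bar{X}\bar{Y}}{s_X s_Y} $$
$$ r_{XY} = \frac{\sum XY - (\sum X)(\sum Y)/N}{\sqrt{\left[\sum X^2 - (\sum X)^2/N\right]\left[\sum Y^2 - (\sum Y)^2/N\right]}} $$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are deviation scores, and $s_X$ and $s_Y$ are standard deviations computed with $N$ in the denominator.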
Exercise 1
Suppose that 10 male applicants took a job-interest test and were then hired as salespeople by a company. Each applicant's job-interest test score (X) and first-year sales amount (Y, in units of ten thousand dollars) are listed in the following table.
Table 6.2 Test Scores and Sales Amounts of 10 Applicants
Examinee  1    2    3    4    5    6    7    8    9    10
X         30   34   32   47   20   24   27   25   22   16
Y         2.5  3.8  3    4    0.7  1    2.2  3.5  2.8  1.2
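The raw-score formula can be applied directly to the data in Table 6.2. This is a minimal sketch using only the standard library; the function name `pearson_r` is chosen here for illustration.

```python
import math

X = [30, 34, 32, 47, 20, 24, 27, 25, 22, 16]          # job-interest test scores
Y = [2.5, 3.8, 3, 4, 0.7, 1, 2.2, 3.5, 2.8, 1.2]      # first-year sales

def pearson_r(x, y):
    """Raw-score formula for the Pearson product-moment correlation."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    num = sum_xy - sum_x * sum_y / n
    den = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return num / den

r = pearson_r(X, Y)
print(f"r_XY = {r:.2f}")
```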
• Biserial Correlation Coefficient (for the correlation between a continuous variable and a dichotomous variable)
$$ r_b = \frac{\bar{X}_p - \bar{X}_q}{s_t} \cdot \frac{pq}{Y} \qquad (6.2) $$
$p$ is the proportion of examinees who score "1" on the dichotomous variable;
$q$ is equal to $1 - p$;
$\bar{X}_p$ is the mean score on the continuous variable of the examinees who score "1" on the dichotomous variable;
$\bar{X}_q$ is the mean score on the continuous variable of the examinees who score "0" on the dichotomous variable;
$s_t$ is the standard deviation of the test scores of all examinees on the continuous variable;
$Y$ is the ordinate (height) of the standard normal curve at the z-score that divides its area into proportions $p$ and $q$.
Research Case: Use $r_b$ to estimate the validity of the first application of the WISC-R in Shanghai.
Data concerned:
the number of first-level middle school students is 66;
the number of second-level middle school students is 286;
the mean IQ of the first-level students is 114;
the mean IQ of the second-level students is 96;
the standard deviation of all students' IQs is 14.53;
if $p = .1875$, then $Y$ is $.2685$.
$$ p = \frac{66}{352} = .1875, \qquad q = \frac{286}{352} = .8125 $$
$$ r_b = \frac{\bar{X}_p - \bar{X}_q}{s_t} \cdot \frac{pq}{Y} = \frac{114 - 96}{14.53} \cdot \frac{.1875 \times .8125}{.2685} = .70 $$
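Formula (6.2) can be sketched in Python by computing the normal ordinate $Y$ instead of looking it up. This sketch uses `statistics.NormalDist` from the standard library. The computed ordinate (about .2692) differs slightly from the table value .2685, but $r_b$ still rounds to .70.

```python
from statistics import NormalDist

def biserial_r(mean_p, mean_q, s_t, p):
    """Biserial correlation, Eq. (6.2), with the ordinate Y computed, not looked up."""
    q = 1 - p
    z = NormalDist().inv_cdf(q)   # z-score with proportion p of the curve above it
    y = NormalDist().pdf(z)       # ordinate (height) of the standard normal curve at z
    return (mean_p - mean_q) / s_t * (p * q) / y

# WISC-R research case: 66 of the 352 students are first-level students.
rb = biserial_r(mean_p=114, mean_q=96, s_t=14.53, p=66 / 352)
print(f"r_b = {rb:.2f}")
```

The same function can be reused for Exercise 2 by passing that exercise's group means, overall standard deviation, and proportion.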
Exercise 2 Middle school students took a math test. The mean score of the 382 students who had been taught with a higher-level math program was 60.188, and the mean score of the 618 students who had followed the regular program was 47.429. The standard deviation for all students was 11.910. Estimate the validity coefficient of the math test.
Here $\bar{X}_p = 60.188$, $\bar{X}_q = 47.429$, and $s_t = 11.910$.
2. Discrimination Between Two Groups
• Compare the means of the two groups (t test)
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} $$
$$ s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} $$
$$ df = (n_1 - 1) + (n_2 - 1) $$
where $df$ is the degrees of freedom.
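The pooled-variance t statistic above can be sketched as follows. Both score lists are hypothetical. `statistics.variance` uses the $n - 1$ denominator, matching $s_1^2$ and $s_2^2$ in the formula.

```python
import math
from statistics import mean, variance

def two_group_t(g1, g2):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(g1), len(g2)
    pooled = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled * (1 / n1 + 1 / n2))   # standard error of the mean difference
    return (mean(g1) - mean(g2)) / se_diff, (n1 - 1) + (n2 - 1)

criterion_group = [28, 31, 25, 30, 27]   # hypothetical scores, e.g. a diagnosed group
contrast_group = [20, 24, 22, 19, 23]    # hypothetical scores of a contrasted group
t, df = two_group_t(criterion_group, contrast_group)
print(f"t = {t:.2f}, df = {df}")
```

A significant t indicates that the test separates the groups, but it says nothing about how much their score distributions overlap.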
• Compute the amount of overlap between the two groups
Method 1
Count the examinees in one group (usually the contrasted group) whose test scores are higher than the mean of the other group;
divide this count by the number of examinees in that group;
the result is the proportion of the group scoring above the other group's mean.
Method 2
Compute the percentage of each group's score distribution that overlaps the score distribution of the other group.
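Method 1 can be sketched in a few lines. The sketch assumes "higher than" means strictly above the other group's mean, and both score lists are hypothetical.

```python
from statistics import mean

def proportion_above_other_mean(group, other):
    """Method 1: share of `group` whose scores exceed the mean of `other`."""
    cut = mean(other)
    return sum(score > cut for score in group) / len(group)

clinical = [28, 31, 25, 30, 27]   # hypothetical criterion group (mean 28.2)
control = [20, 24, 29, 19, 30]    # hypothetical contrasted group
share = proportion_above_other_mean(control, clinical)
print(share)
```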
§4 Application of the Validity Coefficient
Predicting the Criterion Score
1. Establish the Regression Equation
$$ \hat{Y} = b_{YX} X + a_{YX} $$
$\hat{Y}$ is the predicted criterion score for an examinee;
$X$ is the test score of the examinee;
$b_{YX}$ is the regression coefficient, $b_{YX} = r_{XY}\, s_Y / s_X$;
$a_{YX}$ is the intercept, $a_{YX} = \bar{Y} - b_{YX}\bar{X}$.
Example Figure 6.3 Scores of 100 Examinees on a Job Aptitude Test and Their Actual Job Performance Scores
$\bar{X} = 5.35$, $\bar{Y} = 4.28$, $s_X = 1.80$, $s_Y = 1.89$, $r_{XY} = 0.68$
$$ b_{YX} = r_{XY}\frac{s_Y}{s_X} = 0.68 \times \frac{1.89}{1.80} = 0.714 $$
$$ a_{YX} = \bar{Y} - b_{YX}\bar{X} = 4.28 - 0.714 \times 5.35 = 0.46 $$
$$ \hat{Y} = 0.714X + 0.46 $$
If an applicant scores 6 on the test, we can use the regression equation to predict his future job performance:
$$ \hat{Y} = 0.714 \times 6 + 0.46 = 4.744 $$
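The regression steps in the Figure 6.3 example can be reproduced from the summary statistics alone. This is a minimal sketch; the helper name `regression_line` is hypothetical.

```python
def regression_line(mean_x, mean_y, s_x, s_y, r_xy):
    """Return (b_YX, a_YX) for predicting Y from X."""
    b = r_xy * s_y / s_x        # regression coefficient
    a = mean_y - b * mean_x     # intercept
    return b, a

b, a = regression_line(mean_x=5.35, mean_y=4.28, s_x=1.80, s_y=1.89, r_xy=0.68)
y_hat = b * 6 + a               # applicant with a test score of 6
print(f"b = {b:.3f}, a = {a:.2f}, predicted Y = {y_hat:.3f}")
```

Exercise 3 can be solved the same way by substituting its statistics and John's score.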
Exercise 3
Suppose a group of high school students took a job-interest test. The researcher obtained these statistics:
$\bar{X} = 50$, $s_X = 10$, $\bar{Y} = 2.4$, $s_Y = 0.8$
The validity coefficient is 0.6. If John scored 54 on the job-interest test, what would his criterion score (job performance) be?
2. Estimating Error
Standard Error of Estimate ($s_{est}$)
The standard error of estimate shows the margin of error to be expected in an individual's predicted criterion score as a result of the imperfect validity of the test.
$$ s_{est} = s_Y \sqrt{1 - r_{XY}^2} $$
$r_{XY}^2$ is the coefficient of determination. It indicates the proportion of the variance of the criterion scores that is related to the variance of the predictor test scores.
3. Establish the approximate interval for an actual criterion score $Y$
$$ Y = \hat{Y} \pm z_p\, s_{est} $$
where $z_p$ is the standard normal value for the chosen confidence level (for example, 1.96 for 95%).
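Steps 2 and 3 can be combined for the applicant in the Figure 6.3 example. This sketch assumes a 95% confidence level, computing $z_p$ with `statistics.NormalDist` rather than reading it from a table. It starts from the predicted score $\hat{Y} = 4.744$ obtained above.

```python
import math
from statistics import NormalDist

def prediction_interval(y_hat, s_y, r_xy, confidence=0.95):
    """Return (s_est, lower, upper) for Y = y_hat +/- z_p * s_est."""
    s_est = s_y * math.sqrt(1 - r_xy ** 2)              # standard error of estimate
    z_p = NormalDist().inv_cdf(0.5 + confidence / 2)    # two-sided critical value
    return s_est, y_hat - z_p * s_est, y_hat + z_p * s_est

s_est, lo, hi = prediction_interval(y_hat=4.744, s_y=1.89, r_xy=0.68)
print(f"s_est = {s_est:.2f}; 95% interval: {lo:.2f} to {hi:.2f}")
```

The wide interval shows that a validity coefficient of .68 still leaves considerable uncertainty about any single applicant.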
Validity Coefficient and Classification Decisions
Figure 6.4 Scatter plot of predictor scores (X) and criterion scores (Y), with cut-off scores $X_c$ and $Y_c$
Basic Concepts
Cut-off Scores
Valid Acceptance
Valid Rejection
False Acceptance
False Rejection
Four Rates
Base Rate
the proportion of applicants selected without the use of a test who are successful.
Selection Ratio
the proportion of applicants who must be accepted.
Hit Rate
the percentage of predictions that are correct.
Success Ratio
the proportion of selected applicants who succeed.
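The four rates can be computed from the four decision outcomes. This sketch rests on several assumptions. An applicant is accepted when $X \ge X_c$ and is successful when $Y \ge Y_c$. Valid acceptance means accepted and successful; false acceptance means accepted but unsuccessful. False rejection means rejected but would have succeeded; valid rejection means rejected and would have failed. The counts are hypothetical.

```python
def decision_rates(valid_accept, valid_reject, false_accept, false_reject):
    """Four rates from the quadrant counts of a predictor-criterion scatter plot."""
    n = valid_accept + valid_reject + false_accept + false_reject
    return {
        "base_rate": (valid_accept + false_reject) / n,        # successful without the test
        "selection_ratio": (valid_accept + false_accept) / n,  # accepted by the test
        "hit_rate": (valid_accept + valid_reject) / n,         # correct decisions
        "success_ratio": valid_accept / (valid_accept + false_accept),  # accepted who succeed
    }

rates = decision_rates(valid_accept=40, valid_reject=30, false_accept=10, false_reject=20)
print(rates)
```

With these counts the success ratio (.80) exceeds the base rate (.60), which is the gain a Taylor-Russell table summarizes.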
Table 6.3 Taylor-Russell Table for a Base Rate of .60