Validity
• Does the test measure what it says it does?
• Is the test useful?
• Can a test be reliable, but not valid?
• Can a test be valid, but not reliable?
Types of validity
• Face validity
– Important only insofar as it doesn’t interfere with an examinee’s willingness to cooperate.
• Content validity
– How well does the test cover the areas of content that it should?
– How adequately does it sample the universe of behavior it was designed to assess?
Content validity (cont.)
• Panel of “experts”
– Is the item/content essential?
– Lawshe (1975): >50% of experts see the skill as essential
• Important for:
– Achievement/classroom tests
– Training program exams
– Professional exams
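Lawshe's panel judgment is usually summarized as a content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of experts rating an item essential and N is the panel size. The sketch below assumes a hypothetical panel of 10 experts and made-up item counts; CVR is positive only when more than half the panel rates the item essential.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR: (n_e - N/2) / (N/2), ranging from -1 to +1."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical data: how many of 10 experts rated each item "essential".
essential_votes = {"item_1": 9, "item_2": 5, "item_3": 3}
cvr = {item: content_validity_ratio(votes, 10)
       for item, votes in essential_votes.items()}

for item, value in cvr.items():
    print(item, round(value, 2))
```

An item with exactly half the panel behind it gets CVR = 0, so it would not clear the ">50%" bar.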
Criterion-Related Validity
• How well does a test score relate to another score/variable of interest?
– Correlate test with criterion
• Criterion: the standard against which the test is evaluated
• Concurrent
• Predictive
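"Correlate test with criterion" typically means computing a Pearson correlation (the validity coefficient) between test scores and criterion scores. A minimal sketch with invented scores; the data and variable names are illustrative assumptions, not values from the lecture.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical admission-test scores and later GPAs for six students.
test_scores = [18, 21, 24, 27, 30, 33]
criterion = [2.1, 2.6, 2.4, 3.0, 3.3, 3.6]

validity_coefficient = pearson_r(test_scores, criterion)
print(round(validity_coefficient, 2))
```

If the criterion is collected at the same time as the test, the coefficient is concurrent evidence; if it is collected later, it is predictive evidence.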
Criterion-Related Validity (cont.)
• Criterion should be
– Reliable
• Reliability limits validity; a measure can’t be valid if it is not reliable.
– Relevant
– Valid
– Uncontaminated
• Contamination: the criterion measure has been based in part on the predictor measure
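The point that reliability limits validity can be made concrete with the classical test theory bound: the observed validity coefficient cannot exceed the square root of the product of the test's and the criterion's reliabilities. A small sketch, using made-up reliability values:

```python
from math import sqrt


def max_validity(test_reliability: float, criterion_reliability: float) -> float:
    """Upper bound on the validity coefficient: sqrt(r_xx * r_yy)."""
    for r in (test_reliability, criterion_reliability):
        if not 0 <= r <= 1:
            raise ValueError("reliabilities must lie between 0 and 1")
    return sqrt(test_reliability * criterion_reliability)


# Hypothetical reliabilities for a test and its criterion.
ceiling = max_validity(0.81, 0.64)
print(round(ceiling, 2))

# A completely unreliable test cannot correlate with anything.
print(max_validity(0.0, 0.90))
```

This is why an unreliable criterion is a problem even when the test itself is well built: the ceiling drops with either reliability.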
Criterion-Related Validity (cont.)
• Concurrent validity
– Criterion immediately available
– Present standing on a criterion
• Diagnosis, score on another test
– Used to predict the performance of new test takers or of people for whom the criterion isn’t available.
Criterion-Related Validity (cont.)
• Predictive validity
– Test given, criterion measured later
– Ex.: ACT & college GPA; employment test & job performance
• Incremental validity
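The slide names incremental validity without defining it. On the common reading (an assumption here), it is the extra criterion variance a new predictor explains beyond predictors already in use. With two predictors this gain in R² can be computed from three correlations; the values below are invented for illustration.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R^2 for a criterion regressed on two predictors, from correlations."""
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def incremental_validity(r_y1: float, r_y2: float, r_12: float) -> float:
    """Gain in R^2 from adding predictor 2 to predictor 1 alone."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1 ** 2


# Hypothetical correlations: existing test with criterion (r_y1),
# new test with criterion (r_y2), and the two tests with each other (r_12).
gain = incremental_validity(r_y1=0.50, r_y2=0.40, r_12=0.30)
print(round(gain, 3))
```

The more the new test overlaps with the existing one (larger r_12), the less of its criterion correlation counts as new information.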
Base Rate & Decision Theory
• Base rate: proportion of a population who possess a certain trait, characteristic, or attribute
– % of EIU undergrads who graduate
– % of African Americans with sickle cell anemia
• Base rate affects usefulness of tests
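One way to see why the base rate affects usefulness: hold a test's accuracy fixed and vary the base rate, then ask what share of positive results are correct. The sketch applies Bayes' rule; the 90% sensitivity and specificity figures are assumptions chosen for illustration.

```python
def positive_predictive_value(base_rate: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Share of positive test results that are true positives."""
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    if true_positives + false_positives == 0:
        raise ValueError("the test never returns a positive result")
    return true_positives / (true_positives + false_positives)


# Same hypothetical test (90% sensitivity, 90% specificity), three base rates.
ppv_by_base_rate = {
    base_rate: positive_predictive_value(base_rate, 0.90, 0.90)
    for base_rate in (0.01, 0.10, 0.50)
}

for base_rate, ppv in ppv_by_base_rate.items():
    print(base_rate, round(ppv, 3))
```

With a very low base rate, most positives are false positives even though the test itself has not changed.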
Decision Theory
• 4 outcomes
– False rejections/negatives
– Valid acceptances/positives
– Valid rejections/negatives
– False acceptances/positives
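The four outcomes follow from crossing the test decision (accept or reject) with what actually happened on the criterion. A minimal sketch, assuming applicants are accepted at or above a cut score; the scores, outcomes, and cut score of 70 are invented.

```python
from collections import Counter


def classify(score: float, succeeded: bool, cut_score: float) -> str:
    """Label one decision as one of the four decision-theory outcomes."""
    accepted = score >= cut_score
    if accepted and succeeded:
        return "valid acceptance"
    if accepted and not succeeded:
        return "false acceptance"
    if not accepted and succeeded:
        return "false rejection"
    return "valid rejection"


# Hypothetical (test score, succeeded on criterion) pairs.
applicants = [(55, False), (62, True), (68, False), (71, True),
              (77, True), (83, False), (88, True), (94, True)]

outcomes = Counter(classify(score, ok, cut_score=70)
                   for score, ok in applicants)
correct = outcomes["valid acceptance"] + outcomes["valid rejection"]
proportion_correct = correct / len(applicants)

print(dict(outcomes))
print(proportion_correct)
```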
Cut scores & Hit rates
[Figure: four outcome regions defined by the cut score: false rejections/negatives, valid acceptances/positives, valid rejections/negatives, false acceptances/positives]
Cut scores & Hit rates (cont.)
• Reciprocal relationship between # of false rejections and # of false acceptances
• Which is more acceptable: to limit the number accepted who shouldn’t be, or to minimize the # rejected who could be successful?
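The reciprocal relationship can be seen by moving the cut score over a fixed set of applicants and counting each kind of error. A sketch with invented data, assuming acceptance at or above the cut score:

```python
def error_counts(applicants, cut_score):
    """Return (false rejections, false acceptances) for a given cut score."""
    false_rejections = sum(1 for score, ok in applicants
                           if score < cut_score and ok)
    false_acceptances = sum(1 for score, ok in applicants
                            if score >= cut_score and not ok)
    return false_rejections, false_acceptances


# Hypothetical (test score, succeeded on criterion) pairs.
applicants = [(55, False), (62, True), (68, False), (71, True),
              (77, True), (83, False), (88, True), (94, True)]

tradeoff = {cut: error_counts(applicants, cut) for cut in (60, 70, 80, 90)}

for cut, (fr, fa) in tradeoff.items():
    print(f"cut={cut}: false rejections={fr}, false acceptances={fa}")
```

Raising the cut score trades false acceptances for false rejections; which error is more costly depends on the decision being made.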
Construct Validity
• Construct:
– Scientific idea hypothesized to explain behavior
– Postulated attribute of people, assumed to be reflected in test score
– Ex.: intelligence, self-esteem, motivation
• Construct validity: Does the test measure the construct?
– Gives theoretical meaning to scores
– Subsumes all other types of validity
Construct Validity (cont.)
• Convergent evidence/validity
• Divergent/discriminant evidence
• Factor analysis
– Data reduction/simplification of complex correlational matrices … to reveal major dimensions that underlie a set of items
– A factor is considered to be the construct that best represents relationships among variables
Factor Analysis (cont.)
• Methods of factor analysis
– Exploratory
1. Correlation matrix
2. Factor matrix with loadings
3. Label factors
• Used to develop or eliminate items or scales from composite scores
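The exploratory steps (correlation matrix, then loadings, then labeling) can be sketched in a simplified form. The example extracts only the first factor, using principal-component extraction by power iteration, which is one simple method and not necessarily the one used in the lecture; real analyses usually extract several factors and rotate them. The correlation matrix is invented.

```python
from math import sqrt


def first_factor_loadings(corr, iterations=200):
    """First principal-component loadings of a correlation matrix.

    Power iteration finds the leading eigenvector; each loading is the
    eigenvector entry scaled by the square root of the eigenvalue.
    """
    n = len(corr)
    vector = [1.0 / sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(n))
                   for i in range(n)]
        eigenvalue = sqrt(sum(value * value for value in product))
        vector = [value / eigenvalue for value in product]
    loadings = [value * sqrt(eigenvalue) for value in vector]
    return loadings, eigenvalue


# Step 1: hypothetical correlation matrix for four test items.
corr = [
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.4],
    [0.4, 0.4, 0.4, 1.0],
]

# Step 2: loadings of each item on the first factor.
loadings, eigenvalue = first_factor_loadings(corr)
for item, loading in enumerate(loadings, start=1):
    print(f"item {item}: loading {loading:.2f}")

# Step 3 (labeling the factor) is a judgment call made by the analyst.
```

Items with low loadings on every retained factor are the usual candidates for removal from a scale.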
Factor Analysis (cont.)
• Confirmatory factor analysis
– Goodness of fit
– After test has been developed
Validity & Bias
• Bias: a factor inherent within a test that systematically prevents accurate, impartial measurement
– Bias implies systematic, not random, variation
• Can you make equally valid predictions for different groups?
Bias in Predictions
• Questions of regression
– Slope
– Intercept
– Error of estimate
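The three regression questions can be checked by fitting a separate prediction line for each group and comparing the results. The sketch below uses ordinary least squares on invented data for two hypothetical groups; group B's criterion scores are group A's plus 10, so only the intercept differs.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares slope, intercept, and standard error of estimate."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor scores must vary")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((b - (intercept + slope * a)) ** 2
                      for a, b in zip(x, y))
    error_of_estimate = sqrt(residual_ss / (n - 2))
    return slope, intercept, error_of_estimate


# Hypothetical predictor (test) and criterion scores for two groups.
predictor = [80, 90, 100, 110, 120]
criterion_a = [82, 91, 99, 111, 119]
criterion_b = [92, 101, 109, 121, 129]

slope_a, intercept_a, see_a = fit_line(predictor, criterion_a)
slope_b, intercept_b, see_b = fit_line(predictor, criterion_b)

print(f"A: slope={slope_a:.2f} intercept={intercept_a:.1f} SEE={see_a:.2f}")
print(f"B: slope={slope_b:.2f} intercept={intercept_b:.1f} SEE={see_b:.2f}")
```

Here a single common line would systematically under-predict group B and over-predict group A, which is the intercept-bias pattern; different slopes or different errors of estimate would signal the other two kinds of bias.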
Slope Bias
[Figure: Bias & the DAS. Axes: General Conceptual Ability Scores and Word Reading Scores; separate linear fit lines for Whites and Asian Americans.]
Intercept Bias
[Figure: Bias & the DAS. Axes: General Conceptual Ability Scores and Basic Number Skills; two series (Series1, Series2).]
Rating error
• Leniency Error
• Severity Error
• Central Tendency Error
• Halo Effect
Test Fairness
• Is the test used in an impartial, just, and equitable manner?
• Good tests discriminate among individuals
– Are group differences due to inadequate tests?
– Is the test being used fairly?