© 2013 Springer Publishing Company, LLC.
Chapter 2
Qualities of Effective Assessment Procedures
Oermann & Gaberson
Evaluation and Testing in Nursing Education
4th edition
General Criteria for Effective Assessment Procedures
♦ Produce results that can be used to make appropriate inferences about learners’ knowledge and abilities
– Important educational decisions based on such inferences
♦ Practical and easy to use
Guiding Questions
♦ To what extent will the interpretation of the scores be appropriate, meaningful, and useful for their intended application?
♦ What are the consequences of how the results are used and interpreted?
Assessment Validity
♦ Concept has changed over time
♦ Current philosophy
– Meaningfulness of the interpretations that teachers make of assessment results
– Adequacy and appropriateness of inferences about scores and how results are used
– Emphasis on consequences (intended and unintended) of test use
Assessment Validity (cont’d)
♦ Not a static property of the test itself
♦ Not an either/or judgment
– Degrees of validity depending on purpose of test and how scores will be used
Assessment Validity (cont’d)
♦ Unitary concept
– Variety of sources of evidence to support the validity of the interpretation and use of assessment results
– Four major considerations for validation
• Content
• Construct
• Assessment-criterion relationships
• Consequences
Content Considerations
♦ Goal of content validation
– Determine the degree to which the assessment tasks accurately represent the domain of content or abilities about which the teacher wants to interpret assessment results
– A test is only a sample of the universe of possible assessment tasks
– “Face validity” is insufficient evidence of content representativeness
Content Considerations
♦ Start by defining the universe of content
– Should be related to the purpose for which the test will be used
♦ Write or select test items that satisfactorily represent the desired content domain
– Test blueprint or table of specifications documents
– Also important when selecting a published test
Content Considerations (cont’d)
♦ Assessed by content-domain experts
– Determine if assessment tasks represent the
• content domain (as specified on test blueprint)
• learning outcomes
– Trustworthiness of this evidence is based on estimation of rater reliability
• How closely do the judgments of multiple experts agree?
Construct Considerations
♦ “Umbrella” concept for all types of assessment validation
♦ Goes beyond content considerations
– Used to make inferences from assessment results to more general abilities (e.g., clinical reasoning)
– What construct is the assessment intended to measure?
Construct Considerations (cont’d)
♦ Construct
– Characteristic assumed to exist because it explains some observed behavior
– Cannot be observed directly; inferred from performance
♦ Construct validation
– Determining the extent to which assessment results can be interpreted in terms of the construct
♦ Two central elements
– Construct representation
– Construct relevance
Construct Considerations (cont’d)
♦ Construct representation
– Extent to which important elements of the construct are represented in the assessment
♦ Construct relevance
– Extent to which the assessment focuses only on relevant elements of the construct
– Omits factors that are unrelated or irrelevant to the construct (e.g., writing ability, English language literacy)
Construct Considerations (cont’d)
♦ Methods used in construct validation
– Define the domain to be measured
– Analyze the process of responding to tasks required by the assessment
– Compare assessment results of known groups
– Compare assessment results before and after a learning activity
– Correlate assessment results with other measures
Assessment-Criterion Relationship Considerations
♦ Predictive validation
– Focuses on predicting future performance (the criterion) based on current assessment results
♦ Concurrent validation
– Uses assessment results to estimate performance on another assessment (the criterion measure) at the same time
– Not widely used for teacher-made assessments
Assessment-Criterion Relationship Considerations (cont’d)
♦ Relationship between assessment scores and criterion-measure scores usually expressed as a correlation coefficient
♦ Teacher who uses the test must judge what magnitude of correlation is adequate for the intended use of the assessment
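As an illustration of the correlation coefficient mentioned on this slide, here is a minimal sketch that computes a Pearson correlation between assessment scores and criterion-measure scores. The slide does not name a specific coefficient, so the choice of Pearson's r is an assumption. The function name `pearson_r` and all score values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("undefined when one set of scores has no variance")
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data: five students' scores on the assessment and,
# later, on the criterion measure.
assessment_scores = [52, 61, 70, 74, 88]
criterion_scores = [60, 65, 72, 71, 90]

r = pearson_r(assessment_scores, criterion_scores)
print(f"assessment-criterion correlation: {r:.2f}")
```

The code only produces the number. Whether that magnitude is adequate for the intended use remains the teacher's judgment, as the slide states.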
Consideration of Consequences
♦ Assessment has intended and unintended consequences
♦ Concept of validity includes consideration of the consequences of assessment use and how results are interpreted by students, teachers, and other stakeholders
Influences on Validity
♦ Characteristics of the assessment
– Examples: clarity of directions, number of items, test construction errors
♦ Assessment administration and scoring factors
– Examples: cheating, scoring errors, time limits
♦ Student characteristics
– Examples: test anxiety, motivation
Reliability
♦ Consistency of test scores
♦ Extent to which test scores are accurate, error-free, and stable
♦ Reproducibility and generalizability of test scores
♦ Necessary but insufficient condition for validity
Reliability (cont’d)
♦ Sources of inconsistency
– Instability of the behavior being measured
– Sample of tasks varies from one assessment to another
– Assessment conditions vary significantly
– Scoring procedures are inconsistent
♦ These and other factors introduce an unknown amount of error into every measurement
Reliability (cont’d)
♦ Obtained score
– The number of correct answers
♦ True score
– Hypothetical
– Cannot be measured directly
– Represents what the student actually knows
♦ Error score
– Difference between true score and obtained score
– Cannot be measured directly
– Affects measurement reliability
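The relationship among the three scores can be written in the usual classical test theory notation. The symbols are assumptions, not taken from the slides: X is the obtained score, T the true score, E the error score. The variance decomposition further assumes that error is uncorrelated with the true score.

```latex
X = T + E, \qquad E = X - T

\sigma_X^2 = \sigma_T^2 + \sigma_E^2

\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Read this way, less error variance means a reliability value closer to 1.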
Reliability (cont’d)
♦ Methods of determining assessment reliability estimate how much measurement error is present
♦ When assessment results are reasonably consistent, measurement error ↓ and reliability ↑
Reliability (cont’d)
♦ Reliability pertains to assessment results, not to the assessment instrument
♦ A reliability estimate always refers to a particular type of consistency
♦ A reliability estimate is always represented by a statistical value (reliability coefficient or standard error of measurement)
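To show how the two statistical values relate, the sketch below derives a standard error of measurement (SEM) from a reliability coefficient with the common formula SEM = SD × √(1 − reliability). The scores, the reliability value of 0.84, and the function name are hypothetical. Using the population standard deviation is also an assumption.

```python
from math import sqrt
from statistics import pstdev

def standard_error_of_measurement(scores, reliability):
    """SEM = SD * sqrt(1 - reliability), using the population SD of the scores."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1")
    return pstdev(scores) * sqrt(1.0 - reliability)

# Hypothetical class scores and a hypothetical reliability coefficient.
class_scores = [62, 70, 74, 78, 81, 85, 90]
sem = standard_error_of_measurement(class_scores, reliability=0.84)

# A rough band around one student's obtained score (obtained score +/- 1 SEM).
obtained = 78
print(f"SEM = {sem:.2f}; score band = {obtained - sem:.1f} to {obtained + sem:.1f}")
```

A higher reliability coefficient gives a smaller SEM, so the band around each obtained score narrows.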
Methods of Estimating Reliability
♦ Measures of stability
– Indicate whether students would achieve similar scores if they took the same assessment at another time (test-retest procedure)
– Appropriate when the trait being measured is expected to be stable over time
– Limited usefulness for teacher-made assessments, but an important consideration when selecting standardized tests
Methods of Estimating Reliability (cont’d)
♦ Measures of equivalence
– Use of two or more forms of the same assessment, based on the same blueprint
– Both forms administered to the same group of students in close succession; resulting scores are correlated
– High reliability coefficient indicates that the forms sample the domain equally well
– Widely used in standardized testing, but not practical for teacher-made assessments
Methods of Estimating Reliability (cont’d)
♦ Measures of internal consistency: split-half methods
– Used with a set of scores from only one administration of a single assessment: Divide the assessment into two equal subtests, score subtests separately, correlate the two sets of subscores
– Underestimates the true reliability of the scores produced by the whole assessment; correct with Spearman-Brown prophecy formula
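The split-half procedure and the Spearman-Brown correction can be sketched as follows. The odd/even item split is one common way to form two equal subtests, and it is an assumption here because the slide does not say how to divide the assessment. The response matrix (rows are students, columns are items scored 0 or 1) and all function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("undefined when one half has no score variance")
    return cross / sqrt(ss_x * ss_y)

def spearman_brown(r_half):
    """Estimate full-length reliability from the half-test correlation."""
    if r_half <= -1.0:
        raise ValueError("correction is undefined for r_half <= -1")
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown corrected estimate)."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    return r_half, spearman_brown(r_half)

# Hypothetical responses: 6 students x 6 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, corrected = {r_full:.2f}")
```

For a positive half-test correlation below 1, the corrected estimate is higher than the raw correlation, which reflects the underestimation noted on the slide.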
Methods of Estimating Reliability (cont’d)
♦ Measures of internal consistency: coefficient alpha
– Extent to which the assessment tasks measure similar characteristics
– Kuder-Richardson formulas are a specific type of coefficient alpha
• Require dichotomously scored assessment tasks
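A minimal sketch of coefficient alpha and Kuder-Richardson formula 20 (KR-20) follows, using the standard textbook formulas. Population variances are used throughout, which is an assumption, but it is the choice that makes the two results coincide for 0/1 items. The response matrix and function names are hypothetical.

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    total_variance = pvariance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("undefined when all total scores are equal")
    item_variances = [
        pvariance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)

def kr20(item_scores):
    """KR-20: the same idea with item variance written as p * (1 - p)."""
    if any(score not in (0, 1) for row in item_scores for score in row):
        raise ValueError("KR-20 requires dichotomously scored (0/1) items")
    n_students = len(item_scores)
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    total_variance = pvariance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("undefined when all total scores are equal")
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in item_scores) / n_students  # proportion correct
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / total_variance)

# Hypothetical responses: 6 students x 6 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(f"KR-20 = {kr20(responses):.2f}, alpha = {coefficient_alpha(responses):.2f}")
```

On dichotomous data the two functions return the same value, which illustrates why KR-20 is described as a specific case of coefficient alpha.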
Methods of Estimating Reliability (cont’d)
♦ Measures of consistency of ratings
– Determine if same scores would have been obtained if a different person had scored the assessment or judged the performance
– Two equally qualified persons score each student’s paper or rate each student’s performance; two scores are compared
– Produces a percentage of agreement or index of scorer consistency (correlation)
– Interrater consistency facilitated by the use of scoring rubrics and training of raters
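The percentage of agreement between two raters can be computed as in the sketch below. Counting only exact matches is an assumption, since some programs also count adjacent ratings as agreement. The rubric levels and function name are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of students to whom both raters gave exactly the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of ratings")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)

# Hypothetical rubric levels (1-4) assigned by two raters to eight papers.
rater_a = [4, 3, 3, 2, 4, 1, 2, 3]
rater_b = [4, 3, 2, 2, 4, 1, 3, 3]

print(f"agreement: {percent_agreement(rater_a, rater_b):.1f}%")  # 6 of 8 match: 75.0%
```

Percentage of agreement does not adjust for matches that occur by chance, so it is usually read alongside a correlation-based index of scorer consistency.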
Influences on Reliability of Scores
♦ Assessment-related factors
– Length of the test
• In general, more assessment tasks (e.g., test items) → greater score reliability
– Homogeneity of assessment tasks
• Score reliability enhanced by homogeneity of content covered by the assessment
– Item difficulty and discrimination ability
• Moderately difficult items, good discrimination between high and low achievers, and absence of technical errors → greater score reliability
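Item difficulty and discrimination are often summarized with two simple indices, sketched below. The slide does not define them, so the formulas are assumptions drawn from common item-analysis practice: difficulty is the proportion of students answering correctly, and discrimination is the proportion correct in the highest-scoring group minus the proportion correct in the lowest-scoring group. The 27% default group size, the data, and the function name are also hypothetical.

```python
def item_indices(item_scores, item, group_fraction=0.27):
    """Return (difficulty index, discrimination index) for one 0/1-scored item."""
    # Rank students from highest to lowest total score.
    ranked = sorted(item_scores, key=sum, reverse=True)
    n_group = max(1, round(len(ranked) * group_fraction))
    upper, lower = ranked[:n_group], ranked[-n_group:]

    difficulty = sum(row[item] for row in ranked) / len(ranked)
    p_upper = sum(row[item] for row in upper) / n_group
    p_lower = sum(row[item] for row in lower) / n_group
    return difficulty, p_upper - p_lower

# Hypothetical responses: 6 students x 6 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
# With so few students, compare the top half with the bottom half.
difficulty, discrimination = item_indices(responses, item=0, group_fraction=0.5)
print(f"item 1: difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")
```

A positive discrimination value means high scorers answered the item correctly more often than low scorers did.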
Influences on Reliability of Scores (cont’d)
♦ Student-related factors
– Heterogeneity of the student group
• In general, increased range of ability in the group of students → greater score reliability
– Testwiseness
• Student with test-taking skills and experience may obtain a higher score than true ability would predict
– Motivation
• Influences individual students differently
• Scores of poorly motivated students may not accurately represent their actual achievement levels
Influences on Reliability of Scores (cont’d)
♦ Assessment administration conditions
– Time limits
• Inadequate time can lower the reliability of scores
• Some students who know the content well may be unable to respond to all of the items
– Cheating
• Contributes random errors to assessment scores
• Raises offenders’ observed scores above their true scores
Practicality (Usability)
♦ A quality of the assessment instrument itself and its administration procedures
♦ Qualities of practical assessments
– Easy to administer and score
– Do not take too much time away from other instructional activities
– Have reasonable resource requirements
Practicality (Usability; cont’d)
♦ Practicality criteria
– Easy to construct and use
– Reasonable time requirements for administering and scoring the assessment and interpreting results
– Reasonable costs associated with assessment construction, administration, and scoring
– Assessment results can be interpreted easily and accurately by those who will use them