chapter 2 ppt eval & testing 4e formatted 01.10 kg edits

© 2013 Springer Publishing Company, LLC.

Chapter 2
Qualities of Effective Assessment Procedures
Oermann & Gaberson
Evaluation and Testing in Nursing Education
4th edition


General Criteria for Effective Assessment Procedures

♦ Produce results that can be used to make appropriate inferences about learners’ knowledge and abilities
  – Important educational decisions based on such inferences
♦ Practical and easy to use



Guiding Questions

♦ To what extent will the interpretation of the scores be appropriate, meaningful, and useful for their intended application?

♦ What are the consequences of how the results are used and interpreted?



Assessment Validity

♦ Concept has changed over time
♦ Current philosophy
  – Meaningfulness of the interpretations that teachers make of assessment results
  – Adequacy and appropriateness of inferences about scores and how results are used
  – Emphasis on consequences (intended and unintended) of test use



Assessment Validity (cont’d)

♦ Not a static property of the test itself
♦ Not an either/or judgment
  – Degrees of validity depending on purpose of test and how scores will be used



Assessment Validity (cont’d)

♦ Unitary concept
  – Variety of sources of evidence to support the validity of the interpretation and use of assessment results
  – Four major considerations for validation
    • Content
    • Construct
    • Assessment-criterion relationships
    • Consequences



Content Considerations

♦ Goal of content validation
  – Determine the degree to which the assessment tasks accurately represent the domain of content or abilities about which the teacher wants to interpret assessment results
  – A test is only a sample of the universe of possible assessment tasks
  – “Face validity” is insufficient evidence of content representativeness



Content Considerations

♦ Start by defining the universe of content
  – Should be related to the purpose for which the test will be used
♦ Write or select test items that satisfactorily represent the desired content domain
  – Test blueprint or table of specifications documents
  – Also important when selecting a published test



Content Considerations (cont’d)

♦ Assessed by content-domain experts
  – Determine if assessment tasks represent the
    • content domain (as specified on test blueprint)
    • learning outcomes
  – Trustworthiness of this evidence is based on estimation of rater reliability
    • How closely do the judgments of multiple experts agree?



Construct Considerations

♦ “Umbrella” concept for all types of assessment validation

♦ Goes beyond content considerations
  – Used to make inferences from assessment results to more general abilities (e.g., clinical reasoning)
  – What construct is the assessment intended to measure?



Construct Considerations (cont’d)

♦ Construct
  – Characteristic assumed to exist because it explains some observed behavior
  – Cannot be observed directly; inferred from performance
♦ Construct validation
  – Determining the extent to which assessment results can be interpreted in terms of the construct
♦ Two central elements
  – Construct representation
  – Construct relevance




♦ Construct representation
  – Extent to which important elements of the construct are represented in the assessment
♦ Construct relevance
  – Extent to which the assessment focuses only on relevant elements of the construct
  – Omits factors that are unrelated or irrelevant to the construct (e.g., writing ability, English language literacy)




♦ Methods used in construct validation
  – Define the domain to be measured
  – Analyze the process of responding to tasks required by the assessment
  – Compare assessment results of known groups
  – Compare assessment results before and after a learning activity
  – Correlate assessment results with other measures



Assessment-Criterion Relationship Considerations

♦ Predictive validation
  – Focuses on predicting future performance (the criterion) based on current assessment results
♦ Concurrent validation
  – Uses assessment results to estimate performance on another assessment (the criterion measure) at the same time
  – Not widely used for teacher-made assessments



Assessment-Criterion Relationship Considerations (cont’d)

♦ Relationship between assessment scores and criterion-measure scores usually expressed as a correlation coefficient

♦ Teacher who uses the test must judge what magnitude of correlation is adequate for the intended use of the assessment
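
The correlation coefficient mentioned on this slide can be sketched in a few lines of Python. The slides do not name a specific coefficient; this sketch assumes the Pearson product-moment correlation, and the function name `pearson_r` and the score lists are hypothetical illustrations, not data from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    # Sum of cross-products and sums of squared deviations
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores in a list are equal")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: assessment scores and later criterion-measure scores
assessment_scores = [62, 70, 75, 81, 90]
criterion_scores = [68, 72, 80, 79, 93]
validity_coefficient = pearson_r(assessment_scores, criterion_scores)
print(round(validity_coefficient, 2))
```

The code only produces the number; whether that magnitude is adequate for the intended use remains the teacher's judgment.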



Consideration of Consequences

♦ Assessment has intended and unintended consequences

♦ Concept of validity includes consideration of the consequences of assessment use and how results are interpreted by students, teachers, and other stakeholders



Influences on Validity

♦ Characteristics of the assessment
  – Examples: clarity of directions, number of items, test construction errors
♦ Assessment administration and scoring factors
  – Examples: cheating, scoring errors, time limits
♦ Student characteristics
  – Examples: test anxiety, motivation



Reliability

♦ Consistency of test scores
♦ Extent to which test scores are accurate, error-free, and stable
♦ Reproducibility and generalizability of test scores
♦ Necessary but insufficient condition for validity



Reliability (cont’d)

♦ Sources of inconsistency
  – Instability of the behavior being measured
  – Sample of tasks varies from one assessment to another
  – Assessment conditions vary significantly
  – Scoring procedures are inconsistent

♦ These and other factors introduce an unknown amount of error into every measurement



Reliability (cont’d)

♦ Obtained score
  – The number of correct answers
♦ True score
  – Hypothetical
  – Cannot be measured directly
  – Represents what the student actually knows
♦ Error score
  – Difference between true score and obtained score
  – Cannot be measured directly
  – Affects measurement reliability
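
The relationship among the three scores on this slide is commonly written as a single equation. The symbols are assumptions borrowed from classical test theory notation and do not appear in the slides: X is the obtained score, T the true score, and E the error score.

```latex
X = T + E \qquad\Longleftrightarrow\qquad E = X - T
```

Only X is observable; T and E have to be estimated.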




♦ Methods of determining assessment reliability estimate how much measurement error is present

♦ When assessment results are reasonably consistent, measurement error ↓ and reliability ↑




♦ Reliability pertains to assessment results, not to the assessment instrument

♦ A reliability estimate always refers to a particular type of consistency

♦ A reliability estimate is always represented by a statistical value (reliability coefficient or standard error of measurement)
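
The two statistics named on this slide are linked by a standard formula: the standard error of measurement equals the standard deviation of the scores times the square root of one minus the reliability coefficient. The slides do not give this formula, so the following is a minimal sketch with a hypothetical function name and illustrative values.

```python
from math import sqrt


def standard_error_of_measurement(score_sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    if score_sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1")
    return score_sd * sqrt(1.0 - reliability)


# Hypothetical test: score standard deviation of 5 points, reliability of 0.84
sem = standard_error_of_measurement(score_sd=5.0, reliability=0.84)
print(round(sem, 2))
```

A higher reliability coefficient yields a smaller SEM for the same spread of scores, which matches the inverse relationship between measurement error and reliability.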



Methods of Estimating Reliability

♦ Measures of stability
  – Indicates whether students would achieve similar scores if they took the same assessment at another time (test-retest procedure)
  – Appropriate when the trait being measured is expected to be stable over time
  – Limited usefulness for teacher-made assessments, but an important consideration when selecting standardized tests



Methods of Estimating Reliability (cont’d)

♦ Measures of equivalence
  – Use of two or more forms of the same assessment, based on the same blueprint
  – Both forms administered to the same group of students in close succession; resulting scores are correlated
  – High reliability coefficient indicates that the forms sample the domain equally well
  – Widely used in standardized testing, but not practical for teacher-made assessments



Methods of Estimating Reliability (cont’d)

♦ Measures of internal consistency: split-half methods
  – Used with a set of scores from only one administration of a single assessment: divide the assessment into two equal subtests, score the subtests separately, and correlate the two sets of subscores
  – Underestimates the true reliability of the scores produced by the whole assessment; correct with Spearman-Brown prophecy formula
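
The split-half procedure and its correction can be sketched as follows. The slides do not say how to divide the test; this sketch assumes an odd-even split and a Pearson correlation between the half scores. The function names and the small 0/1 response matrix are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a half has no score variance")
    return cross / sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown corrected estimate).

    item_scores holds one list of item scores per student.
    """
    # Assumption: odd-even split (1st, 3rd, ... items vs. 2nd, 4th, ... items)
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    if r_half <= -1.0:
        raise ValueError("correction is undefined for a correlation of -1")
    # Spearman-Brown correction for a test twice the length of each half
    r_full = (2 * r_half) / (1 + r_half)
    return r_half, r_full


# Hypothetical responses: 4 students, 8 items scored 1 (correct) or 0 (incorrect)
responses = [
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
]
r_half, r_full = split_half_reliability(responses)
print(round(r_half, 2), round(r_full, 2))
```

For any positive half-test correlation below 1, the corrected value is larger, which reflects the point that the uncorrected split-half figure underestimates whole-test reliability.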



♦ Measures of internal consistency: coefficient alpha
  – Extent to which the assessment tasks measure similar characteristics
  – Kuder-Richardson formulas are a specific type of coefficient alpha
    • Require dichotomously scored assessment tasks
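
As one illustration of a Kuder-Richardson formula, here is a minimal sketch of KR-20 for items scored 0 or 1. The slides do not specify which formula or which variance definition to use; this sketch assumes KR-20 with the population variance of total scores. The function name `kr20` and the response matrix are hypothetical.

```python
def kr20(item_scores):
    """KR-20 for dichotomous (0/1) items; one list of item scores per student."""
    n_students = len(item_scores)
    n_items = len(item_scores[0])
    if n_students < 2 or n_items < 2:
        raise ValueError("need at least 2 students and 2 items")

    # Variance of students' total scores (assumption: population variance)
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    if var_total == 0:
        raise ValueError("KR-20 is undefined when all total scores are equal")

    # Sum over items of p * q, where p is the proportion answering correctly
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in item_scores) / n_students
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses: 4 students, 3 items
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(responses), 2))
```

Because each item contributes p × (1 − p), the calculation only makes sense for right/wrong scoring, which is why these formulas require dichotomously scored tasks.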




♦ Measures of consistency of ratings
  – Determine if same scores would have been obtained if a different person had scored the assessment or judged the performance
  – Two equally qualified persons score each student’s paper or rate each student’s performance; two scores are compared
  – Produces a percentage of agreement or index of scorer consistency (correlation)
  – Interrater consistency facilitated by the use of scoring rubrics and training of raters
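
The percentage of agreement between two raters can be computed directly. This is a minimal sketch that counts exact matches only; the function name and the rubric scores are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of students to whom both raters gave exactly the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of ratings")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)


# Hypothetical rubric scores (1 to 5) given by two raters to the same 4 papers
rater_a = [3, 4, 2, 5]
rater_b = [3, 4, 3, 5]
print(percent_agreement(rater_a, rater_b))
```

Exact-match agreement ignores how far apart disagreeing scores are, which is one reason a correlation-based index may be reported alongside it.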




Influences on Reliability of Scores

♦ Assessment-related factors
  – Length of the test
    • In general, more assessment tasks (e.g., test items) → greater score reliability
  – Homogeneity of assessment tasks
    • Score reliability enhanced by homogeneity of content covered by the assessment
  – Item difficulty and discrimination ability
    • Moderately difficult items, good discrimination between high and low achievers, and absence of technical errors → greater score reliability
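
The effect of test length on reliability can be illustrated with the general Spearman-Brown prophecy formula. This is a sketch under the assumption that any added items are comparable in quality and content to the existing ones; the function name and the values are hypothetical.

```python
def projected_reliability(current_reliability, length_factor):
    """Spearman-Brown projection for a test made length_factor times as long."""
    if not 0.0 <= current_reliability <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length factor must be positive")
    r = current_reliability
    return (length_factor * r) / (1 + (length_factor - 1) * r)


# Hypothetical 25-item test with reliability 0.60, lengthened to 50 and 75 items
for factor in (1, 2, 3):
    print(factor, round(projected_reliability(0.60, factor), 2))
```

The projected gain shrinks with each further increase in length, so added items have to be weighed against the practicality of a longer test.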



Influences on Reliability of Scores (cont’d)

♦ Student-related factors
  – Heterogeneity of the student group
    • In general, increased range of ability in the group of students → greater score reliability
  – Testwiseness
    • Student with test-taking skills and experience may obtain a higher score than true ability would predict
  – Motivation
    • Influences individual students differently
    • Scores of poorly motivated students may not accurately represent their actual achievement levels



Influences on Reliability of Scores (cont’d)

♦ Assessment administration conditions
  – Time limits
    • Inadequate time can lower the reliability of scores
    • Some students who know the content well may be unable to respond to all of the items
  – Cheating
    • Contributes random errors to assessment scores
    • Raises offenders’ observed scores above their true scores



Practicality (Usability)

♦ A quality of the assessment instrument itself and its administration procedures

♦ Qualities of practical assessments
  – Easy to administer and score
  – Do not take too much time away from other instructional activities
  – Have reasonable resource requirements



Practicality (Usability; cont’d)

♦ Practicality criteria
  – Easy to construct and use
  – Reasonable time requirements for administration and scoring the assessment and interpreting results
  – Reasonable costs associated with assessment construction, administration, and scoring
  – Assessment results can be interpreted easily and accurately by those who will use them

