Measurement, Reliability and Validity
TRANSCRIPT
© 2011 Pearson Prentice Hall, Salkind.
Explain why measurement is important to the research process.
Discuss the four levels of measurement and provide an example of each.
Explain the concept of reliability in terms of observed score, true score, and error.
Describe the two elements that can make up an error score.
List methods for increasing reliability.
Discuss four ways in which reliability can be examined.
Provide a conceptual definition of validity.
List the three traditional types of validity.
Explain the relationship between reliability and validity.
The Measurement Process
Levels of Measurement
Reliability and Validity: Why They Are Very, Very Important
Validity
The Relationship Between Reliability and Validity
Closing (and Very Important) Thoughts
Two definitions
◦ Stevens: “assignment of numerals to objects or events according to rules.”
◦ “…the assignment of values to outcomes.”
Chapter foci
◦ Levels of measurement
◦ Reliability and validity
Variables are measured at one of these four levels.
Qualities of one level are characteristic of the next level up.
The more precise (higher) the level of measurement, the more accurate the measurement process.
Level of Measurement | For Example | Quality of Level
Ratio | Rachael is 5’ 10” and Gregory is 5’ 5” | Absolute zero
Interval | Rachael is 5” taller than Gregory | An inch is an inch is an inch
Ordinal | Rachael is taller than Gregory | Greater than
Nominal | Rachael is tall and Gregory is short | Different from
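The table above can be illustrated with a short sketch. This is purely illustrative (the heights are hypothetical): the same pair of measurements supports different statements at each level of measurement.

```python
# Illustrative sketch: what each level of measurement lets you claim.
rachael, gregory = 70, 65  # heights in inches; inches form a ratio scale

# Ratio: ratios are meaningful because zero inches means "no height".
ratio = rachael / gregory            # about 1.08
# Interval: differences are meaningful ("an inch is an inch is an inch").
difference = rachael - gregory       # 5 -- Rachael is 5 inches taller
# Ordinal: only order is claimed.
taller = rachael > gregory           # True -- Rachael is taller
# Nominal: only category labels ("different from"); the 68-inch cutoff is arbitrary.
labels = ("tall" if rachael >= 68 else "short",
          "tall" if gregory >= 68 else "short")
print(ratio, difference, taller, labels)
```

Each statement uses strictly less information than the one above it, which is why qualities of one level carry up to the next.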
Nominal level
◦ Quality: Assignment of labels
◦ Examples: Gender (male or female); Preference (like or dislike); Voting record (for or against)
◦ What you can say: Each observation belongs in its own category
◦ What you can’t say: An observation represents “more” or “less” than another observation
Ordinal level
◦ Quality: Assignment of values along some underlying dimension
◦ Examples: Rank in college; Order of finishing a race
◦ What you can say: One observation is ranked above or below another
◦ What you can’t say: The amount that one observation is more or less than another
Interval level
◦ Quality: Equal distances between points
◦ Examples: Number of words spelled correctly; Intelligence test scores; Temperature
◦ What you can say: One score differs from another on some measure that has equal-appearing intervals
◦ What you can’t say: The amount of difference is an exact representation of differences in the variable being studied
Ratio level
◦ Quality: A meaningful and non-arbitrary zero
◦ Examples: Age; Weight; Time
◦ What you can say: One value is twice as much as another, or that no quantity of the variable can exist
◦ What you can’t say: Not much!
Continuous variables
◦ Values can range along a continuum
◦ E.g., height
Discrete (categorical) variables
◦ Values are defined by category boundaries
◦ E.g., gender
Measurement should be as precise as possible.
In psychology, most variables are probably measured at the nominal or ordinal level.
But how a variable is measured can determine the level of precision.
Reliability: the tool is consistent.
Validity: the tool measures what it should.
Good assessment tools allow
◦ Rejection of null hypotheses, OR
◦ Acceptance of research hypotheses
Observed Score = True Score + Error Score
Error Score = Method Error + Trait Error
Observed score
◦ The score actually observed
◦ Consists of two components: a true score and an error score
True score
◦ A perfect reflection of the true value for an individual
◦ A theoretical score
Error score
◦ The difference between the observed score and the true score
Method error is due to characteristics of the test or testing situation.
Trait error is due to characteristics of the individual.
Conceptually,

Reliability = True Score / (True Score + Error Score)

Reliability of the observed score becomes higher as error is reduced!
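The observed = true + error model above can be simulated to show where the reliability ratio comes from. This is a minimal sketch with assumed, hypothetical score distributions (true scores with SD 15, error with SD 5), not data from the text.

```python
import random

# Classical test theory sketch: observed = true + error.
# Reliability is the share of observed-score variance due to true-score variance.
random.seed(1)
n = 10_000
true_scores = [random.gauss(100, 15) for _ in range(n)]  # hypothetical true abilities
errors = [random.gauss(0, 5) for _ in range(n)]          # method + trait error
observed = [t + e for t, e in zip(true_scores, errors)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability = true-score variance / (true-score variance + error variance)
reliability = variance(true_scores) / (variance(true_scores) + variance(errors))
print(round(reliability, 2))  # in expectation 15**2 / (15**2 + 5**2) = 0.9
```

Shrinking the error SD in the sketch pushes the ratio toward 1.0, which is exactly the sense in which reducing error raises reliability.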
Ways to increase reliability:
◦ Increase sample size
◦ Eliminate unclear questions
◦ Standardize testing conditions
◦ Moderate the degree of difficulty of the tests
◦ Minimize the effects of external events
◦ Standardize instructions
◦ Maintain consistent scoring procedures
Reliability is measured using a correlation coefficient: rtest1•test2
Reliability coefficients
◦ Indicate how scores on one test change relative to scores on a second test
◦ Can range from -1.00 to +1.00
◦ +1.00 = perfect reliability; 0.00 = no reliability
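The reliability coefficient is simply a Pearson correlation between two sets of scores. The sketch below computes it from scratch for two hypothetical administrations of the same test to the same group (the score lists are invented for illustration).

```python
import math

# Hypothetical scores: same eight participants tested twice.
test1 = [88, 92, 75, 60, 84, 70, 95, 66]
test2 = [85, 90, 78, 62, 80, 72, 96, 64]

def pearson_r(x, y):
    """Pearson correlation coefficient: covariance over product of SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(test1, test2)
print(round(r, 2))  # close to +1.00, so this test would be highly reliable
```

Because scores rise and fall together across the two administrations, r lands near +1.00; unrelated score lists would give a coefficient near 0.00.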
Type of Reliability | What It Is | How You Do It | What the Reliability Coefficient Looks Like
Test-Retest | A measure of stability | Administer the same test/measure at two different times to the same group of participants | rtest1•test2
Parallel Forms | A measure of equivalence | Administer two different forms of the same test to the same group of participants | rform1•form2
Inter-Rater | A measure of agreement | Have two raters rate behaviors and then determine the amount of agreement between them | Percentage of agreements
Internal Consistency | A measure of how consistently each item measures the same underlying construct | Correlate performance on each item with overall performance across participants | Cronbach’s alpha; Kuder-Richardson
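Of the coefficients in the table, Cronbach’s alpha is the least obvious to compute, so here is a from-scratch sketch using invented item-response data (rows are participants, columns are items on a 1–5 scale).

```python
# Cronbach's alpha for internal consistency (hypothetical data).
scores = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

k = len(scores[0])                                    # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])    # variance of total scores
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # 0.93 for this data
```

When items track the same underlying construct, total-score variance greatly exceeds the sum of item variances and alpha approaches 1.0; here it comes out at 0.93, which would usually be read as high internal consistency.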
A valid test does what it was designed to do; it measures what it was designed to measure.
Validity refers to the test’s results, not to the test itself
Validity ranges from low to high; it is not “either/or.”
Validity must be interpreted within the testing context
Type of Validity | What Is It? | How Do You Establish It?
Content | A measure of how well the items represent the entire universe of items | Ask an expert whether the items assess what you want them to
Criterion (Concurrent) | A measure of how well a test estimates a criterion | Select a criterion and correlate scores on the test with scores on the criterion in the present
Criterion (Predictive) | A measure of how well a test predicts a criterion | Select a criterion and correlate scores on the test with scores on the criterion in the future
Construct | A measure of how well a test assesses some underlying construct | Assess the underlying construct on which the test is based and correlate these scores with the test scores
Ways to establish construct validity:
◦ Correlate the new test with an established test
◦ Show that people with and without certain traits score differently
◦ Determine whether the tasks required on the test are consistent with the theory guiding test development
Convergent validity: different methods of measuring the same trait yield similar results.
Discriminant validity: measures of different traits yield different results.
[Multitrait-multimethod matrix: Trait 1 (Impulsivity) and Trait 2 (Activity Level), each measured by Method 1 (Paper and Pencil) and Method 2 (Activity Level Monitor). Correlations between different methods measuring the same trait are moderate; correlations between different traits are low.]
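The convergent/discriminant pattern in the matrix can be checked mechanically. The sketch below uses invented correlation values (labels and numbers are hypothetical, chosen to match the slide’s moderate/low pattern).

```python
# Hypothetical multitrait-multimethod correlations: two traits
# (impulsivity, activity) each measured by two methods (paper, monitor).
corr = {
    ("impulsivity_paper", "impulsivity_monitor"): 0.55,  # same trait, different methods
    ("activity_paper",    "activity_monitor"):    0.60,  # same trait, different methods
    ("impulsivity_paper", "activity_paper"):      0.15,  # different traits
    ("impulsivity_paper", "activity_monitor"):    0.10,  # different traits
}

# Split each key on the trait prefix (the part before "_").
convergent = [r for (a, b), r in corr.items() if a.split("_")[0] == b.split("_")[0]]
discriminant = [r for (a, b), r in corr.items() if a.split("_")[0] != b.split("_")[0]]

# Evidence for construct validity: same-trait correlations should exceed
# different-trait correlations.
print(min(convergent) > max(discriminant))  # True for this illustrative matrix
```

If a same-trait correlation dipped below a different-trait one, the test would show weak convergent or weak discriminant validity for that trait.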
A valid test must be reliable, but a reliable test need not be valid.
You must define a reliable and valid dependent variable; otherwise you cannot know whether a finding of no difference between groups is real!
Use a test with established and acceptable levels of reliability and validity.
If you cannot do this, develop such a test for your thesis or dissertation (and do no more than that) OR change what you are measuring.
Explain why measurement is important to the research process.
Discuss the four levels of measurement and provide an example of each.
Explain the concept of reliability in terms of observed score, true score, and error.
Describe the two elements that can make up an error score.
List methods for increasing reliability.
Discuss four ways in which reliability can be examined.
Provide a conceptual definition of validity.
List the three traditional types of validity.
Explain the relationship between reliability and validity.