psy 407 reliability
8/13/2019 Psy 407 Reliability
http://slidepdf.com/reader/full/psy-407-reliability 2/28
Evaluation of Measurement Instruments
• Reliability has to do with the consistency of the instrument.
- Internal Consistency (Consistency of the items)
- Test-retest Reliability (Consistency over time)
- Interrater Reliability (Consistency between raters)
- Split-half Methods
Correlation - a measure of the association between items/variables
Correlations are measured by a numerical value whose size runs from 0 (no correlation) to 1 (perfect correlation); the sign of the value shows the direction
A) Strength
Correlations can be weak (near 0) or strong (near 1)
B) Direction
- Positive: the variables go in the same direction (as one increases the other increases, or as one decreases the other decreases)
- Negative: they go in opposite directions (as one increases the other decreases)
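Strength and direction can be read off a single coefficient. The sketch below computes a Pearson correlation by hand; the variable names and the small data set are invented for illustration and do not come from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical data, made up for this example
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 71, 80, 88]   # rises with hours: positive direction
errors_made = [9, 8, 6, 4, 3]       # falls with hours: negative direction

r_pos = pearson_r(hours_studied, exam_score)
r_neg = pearson_r(hours_studied, errors_made)

for label, r in [("score", r_pos), ("errors", r_neg)]:
    direction = "positive" if r > 0 else "negative"
    print(f"{label}: r = {r:.2f}, direction = {direction}, strength = {abs(r):.2f}")
```

The sign carries the direction and the absolute value carries the strength, so a coefficient near -1 is just as strong as one near +1.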
Alcohol consumption and reaction time (positive or negative?)
Correlation does not imply causation!
Breastfeeding and academic development (positive correlation)
Rap music and violent behavior (positive correlation)
Reliability
• Reliability is synonymous with consistency. It is the degree to
which test scores for an individual test taker or group of test
takers are consistent over repeated applications.
• No psychological test is completely consistent; however, a
measurement that is unreliable is worthless.
For Example
A student receives a score of 100 on one intelligence test and
114 on another, or imagine that every time you stepped on a
scale it showed a different weight.
Would you keep using these measurement tools?
• The consistency of test scores is critically important in determining whether a test can provide good measurement.
Reliability (cont.)
• Because no unit of measurement is exact, any time you measure
something (observed score), you are really measuring two things:
1. True Score - the portion of the observed score that truly represents
what you are intending to measure.
2. Error Component - the portion of the observed score that comes from
other variables
Observed Test Score = True Score + Errors of Measurement
For Example - Personality scores from the MMPI may reflect your true personality and: a) your mood that day; b) what you ate that
morning; c) the actions of the tester; and d) bias in the test itself
Why Do Test Scores Vary?
Possible Sources of Variability of Scores (pg. 110)
- General Ability to comprehend instructions
- Stable response sets (e.g., answering “C” option more frequently)
- The element of chance of getting a question right
- Conditions of testing
- Unreliability or bias in grading or rating performance
- Motivation
- Emotional Strain
Measurement Error
• Any fluctuation in test scores that results from factors related to the measurement process that are irrelevant to what is being
measured.
• The difference between the observed score and the true score is
called the error score.
S_true = S_observed - S_error
• Developing better tests with less random measurement error is
better than simply documenting the amount of error.
Measurement Error is Reduced By:
- Writing items clearly
- Making instructions easily understood
- Adhering to proper test administration
- Providing consistent scoring
Determining Reliability
• There are several ways that a measurement's reliability
can be determined, depending on the type of
measurement and the supporting data required.
They include:
- Internal Consistency
- Test-retest Reliability
- Interrater Reliability
- Split-half Methods
- Odd-even Reliability
- Alternate Forms Methods
Internal Consistency
• Measures the reliability of a test based solely on the number of items on
the test and the intercorrelation among the items. Therefore, it
compares each item to every other item.
• If a scale is measuring a construct, then overall the items on that
scale should be highly correlated with one another.
• There are two common ways of measuring internal consistency:
1. Cronbach’s Alpha:
   .80 to .95 (Excellent)
   .70 to .80 (Very Good)
   .60 to .70 (Satisfactory)
   <.60 (Suspect)
2. Item-Total Correlations - the correlation of each item with the
remainder of the items (.30 is the minimum acceptable item-total
correlation).
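Both internal consistency measures can be computed from a table of item responses. The sketch below uses the standard formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), and correlates each item with the sum of the remaining items. The function names and the response data are assumptions made up for this example.

```python
from math import sqrt
from statistics import variance

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def item_total_correlations(rows):
    """Correlation of each item with the sum of the remaining items."""
    k = len(rows[0])
    result = []
    for i in range(k):
        item = [r[i] for r in rows]
        rest = [sum(r) - r[i] for r in rows]   # total without item i
        result.append(pearson_r(item, rest))
    return result

# Hypothetical responses: 5 respondents, 4 items on a 1-5 scale
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(responses)
item_totals = item_total_correlations(responses)
print(f"alpha = {alpha:.2f}")
for i, r in enumerate(item_totals, start=1):
    flag = "ok" if r >= 0.30 else "below .30 minimum"
    print(f"item {i}: item-total r = {r:.2f} ({flag})")
```

An item whose item-total correlation falls below .30 would be a candidate for rewording or removal under the rule of thumb given above.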
Split Half & Odd-Even Reliability
Split Half - refers to determining a correlation between the first
half of the measurement and the second half of the measurement
(i.e., we would expect answers to the first half to be similar to the
second half).
Odd-Even - refers to the correlation between even items and odd items of a measurement tool.
• In this sense, we are using a single test to create two tests,
eliminating the need for additional items and multiple
administrations.
• Since in both of these types only 1 administration is needed and
the groups are determined by the internal components of the test,
it is referred to as an internal consistency measure.
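The two ways of splitting a single test can be sketched as follows. Each respondent's items are divided into two half-test scores, and the halves are then correlated. The function names and the response data are assumptions invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def split_half_r(rows):
    """Correlate the first half of the items with the second half."""
    half = len(rows[0]) // 2
    first = [sum(r[:half]) for r in rows]
    second = [sum(r[half:]) for r in rows]
    return pearson_r(first, second)

def odd_even_r(rows):
    """Correlate odd-numbered items (1st, 3rd, ...) with even-numbered items."""
    odd = [sum(r[0::2]) for r in rows]
    even = [sum(r[1::2]) for r in rows]
    return pearson_r(odd, even)

# Hypothetical responses: 5 respondents, 4 items on a 1-5 scale
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

r_split = split_half_r(responses)
r_odd_even = odd_even_r(responses)
print(f"first half vs second half: r = {r_split:.3f}")
print(f"odd items vs even items:   r = {r_odd_even:.3f}")
```

Only one administration is needed, but the two splits of the same data do not return the same coefficient.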
Split Half & Odd-Even Reliability
Possible Advantages
• Simplest method - easy to perform
• Time and Cost Effective
Possible Disadvantages
• Many ways of splitting
• Each split yields a somewhat different reliability estimate
• Which is the real reliability of the test?
Test-retest Reliability
• Test-retest reliability is usually measured by computing
the correlation coefficient between scores of two administrations.
Test-retest Reliability (cont.)
• The amount of time allowed between measures is critical.
• The shorter the time gap, the higher the correlation; the longer
the time gap, the lower the correlation. This is because the two
observations are related over time.
• Optimum time between administrations is 2 to 4 weeks.
• If a scale is measuring a construct consistently, then there should
not be radical changes in the scores between administrations, unless something significant happened.
• The rationale behind this method is that the difference between
the scores of the test and the retest should be due solely to measurement error.
Test-retest Reliability (cont.)
• It is hard to specify one acceptable test-retest correlation since what is considered acceptable depends on the
type of scale, the use of the scale, and the time between
testing.
For example - it is not clear whether differences in test
scores should be regarded as sources of measurement error or
as real change in the characteristic being measured.
Possible sources of difference in scores between tests: experience,
the characteristic being measured may change over time
(e.g., reading test), carryover effects (e.g., remembering the test)
Test-retest Reliability (cont.)
• A minimum correlation of at least .50 is expected.
• The higher the correlation (in a positive direction) the
higher the test-retest reliability
• The biggest problem with this type of reliability is what is
called the memory effect: a respondent
may recall the answers from the original test, therefore
inflating the reliability.
• Also, is it practical?
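A test-retest check reduces to correlating the same respondents' scores from two administrations and comparing the result with the .50 minimum. The sketch below assumes six respondents with invented scores; the function name `retest_reliability` is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def retest_reliability(first_scores, second_scores, minimum=0.50):
    """Return the test-retest correlation and whether it meets the minimum."""
    r = pearson_r(first_scores, second_scores)
    return r, r >= minimum

# Hypothetical scores for the same six people, a few weeks apart
first_administration = [100, 95, 110, 120, 105, 90]
second_administration = [102, 97, 108, 118, 109, 93]

r, acceptable = retest_reliability(first_administration, second_administration)
print(f"test-retest r = {r:.2f}, meets .50 minimum: {acceptable}")
```

The scores must be paired by respondent: position i in both lists has to belong to the same person, or the coefficient is meaningless.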
Interrater Reliability
• Whenever you use humans as a part of your measurement procedure, you have to worry about whether the results you get
are reliable or consistent. People are notorious for their
inconsistency. We are easily distractible. We get tired of doing
repetitive tasks. We daydream. We misinterpret.
Interrater Reliability (cont.)
• For some scales it is important to assess interrater reliability.
• Interrater reliability means that if two different raters
scored the scale using the scoring rules, they should
attain the same result.
• Interrater reliability is usually measured by computing
the correlation coefficient between the scores of two
raters for the set of respondents.
• Here the criterion of acceptability is pretty high (e.g., a
correlation of at least .9), but what is considered
acceptable will vary from situation to situation.
Factors Affecting Reliability
• Administrator Factors
• Number of Items on the instrument
• The Instrument Taker
• Heterogeneity of the Items
• Heterogeneity of the Group Members
• Length of Time between Test and Retest
Administrator Factors
• Poor or unclear directions given during administration or inaccurate scoring can affect
reliability.
For Example - say you were told that your scores on
being social determined your promotion. The result
is more likely to be what you think they want than what your behavior is.
Number of Items on the Instrument
• The larger the number of items, the greater the
chance for high reliability.
For Example - it makes sense when you consider that
twenty questions on your leadership style are more
likely to give a consistent result than four questions.
• Remedy: Use longer tests or accumulate
scores from short tests.
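One standard way to quantify the effect of test length is the Spearman-Brown formula, which the slides do not mention and which is added here only as an illustration. It predicts the reliability of a test lengthened by a factor k from its current reliability r, assuming the added items behave like the existing ones: r_new = k*r / (1 + (k - 1)*r). The starting reliability of .50 below is an invented value.

```python
def spearman_brown(r, k):
    """Predicted reliability when a test is made k times as long.

    r: reliability of the current test
    k: lengthening factor (e.g. 5 when going from 4 to 20 items)
    Assumes the new items are comparable to the existing ones.
    """
    return (k * r) / (1 + (k - 1) * r)

current_reliability = 0.50   # hypothetical reliability of a 4-item scale

for n_items in (4, 8, 20):
    k = n_items / 4
    predicted = spearman_brown(current_reliability, k)
    print(f"{n_items:2d} items: predicted reliability = {predicted:.2f}")
```

The predicted value rises with every added block of items but by smaller and smaller steps, so lengthening a test helps most when the starting test is short.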
The Test Taker
For Example - If you took an instrument in August
when you had a terrible flu and then in December
when you were feeling quite good, we might see a
difference in your response consistency. If you were under considerable stress of some sort or if you were
interrupted while answering the instrument
questions, you might give different responses.
Heterogeneity
Heterogeneity of the Items - The greater the
heterogeneity (differences in the ways that the same
issue is assessed) of the items, the greater the chance
for high reliability correlation coefficients.
- You ask the same question in “different” ways
- Clients cannot determine what you are trying to
assess and “fake” answers
Length of Time between Test and Retest
• The shorter the time, the greater the chance for high reliability correlation coefficients.
• As we have experiences, we tend to adjust our views a little
from time to time. Therefore, the time interval between the
first time we took an instrument and the second time is
really an "experience" interval.
• Experience happens, and it influences how we see things.
Because internal consistency has no time lapse, one can
expect it to have the highest reliability correlation
coefficient.
How High Should Reliability Be?
• A highly reliable test is always preferable to a test with lower reliability.
.80 or greater (Excellent)
.70 to .80 (Very Good)
.60 to .70 (Satisfactory)
<.60 (Suspect)
• A reliability coefficient of .80 indicates that only 20% of
the variability in test scores is due to measurement error.
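The 80/20 reading follows from Observed Test Score = True Score + Errors of Measurement once reliability is taken as the share of observed-score variance that comes from true scores. That definition is the classical test theory reading and is an assumption here, since the slides do not state it outright. The tiny data set below is invented and chosen so that the errors are uncorrelated with the true scores.

```python
from statistics import pvariance

# Hypothetical true scores and errors for four test takers.
# The errors average to zero and are uncorrelated with the true scores.
true_scores = [90, 90, 110, 110]
errors = [-5, 5, -5, 5]

# Observed Test Score = True Score + Errors of Measurement
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = pvariance(true_scores)
var_error = pvariance(errors)
var_observed = pvariance(observed)

# Assumed definition: reliability = true-score variance / observed variance
reliability = var_true / var_observed
error_share = var_error / var_observed

print(f"observed scores: {observed}")
print(f"reliability = {reliability:.2f}, error share = {error_share:.2f}")
```

Because the errors are uncorrelated with the true scores, the observed variance is the sum of the two parts, and the reliability and the error share add up to 1.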
Is there a trait for kindness? Aggression?
Are we simply a sum total of our environment
and our experiences?