Reliability and Validity
Hatim Al-Jifree
MB;ChB(Hon), FRCSC, GOC, MMedEd
Lecture objectives
To review the definitions of reliability and validity
To review methods of evaluating reliability and validity in survey research
EBM perspective
Reliability
Definition
The degree of stability exhibited when a measurement is repeated under identical conditions
Lack of reliability may arise from divergences between observers or instruments of measurement or instability of the attribute being measured
(from Last. Dictionary of Epidemiology)
Assessment of reliability
Reliability is assessed in 3 forms
1. Test-retest reliability
2. Alternate-form reliability
3. Internal consistency reliability
Test-retest reliability
Most common form in surveys
Same respondents complete a survey at two different points in time
Usually quantified with a correlation coefficient (r value)
r values are considered good if r ≥ 0.70
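As a rough illustration of the correlation step above, the following sketch computes a Pearson r for two administrations of the same survey and checks it against the 0.70 rule of thumb. The scores are hypothetical and the helper name `pearson_r` is an assumption made for this example; it is not part of the lecture.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical total scores for the same five respondents at two time points
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

r = pearson_r(time_1, time_2)
print(f"test-retest r = {r:.2f}")
print("meets the 0.70 threshold" if r >= 0.70 else "below the 0.70 threshold")
```

The same function could be applied to a single question or to the whole instrument, depending on which scores are passed in.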
Test-retest reliability (2)
If data are recorded by an observer, you can have the same observer make two separate measurements
The comparison between the two measurements is intraobserver reliability
What does a difference mean?
Test-retest reliability (3)
You can test-retest specific questions or the entire survey instrument
For variables likely to change over a short period of time, such as energy, happiness, or anxiety, test-retest over very short periods of time
Test-retest reliability (4)
Potential problem with test-retest is the practice effect
Individuals become familiar with the items
What effect does this have on your reliability estimates?
It inflates the reliability estimate
Alternate-form reliability
Use differently worded forms to measure the same attribute
Questions or responses are reworded
Or their order is changed
To produce two items that are similar but not identical
Alternate-form reliability (2)
Two items address:
The same aspect of behavior
Same vocabulary
Same level of difficulty
Items should differ in wording only
It is common to simply change the order of the response alternatives
This reduces practice effect
Example: Assessment of depression
Circle one item
Version A:
During the past 4 weeks, I have felt downhearted:
Every day 1
Some days 2
Never 3
Version B:
During the past 4 weeks, I have felt downhearted:
Never 1
Some days 2
Every day 3
Alternate-form reliability (3)
You could also change the wording of the response alternatives without changing the meaning
Example: Assessment of urinary function
Version A:
During the past week, how often did you usually empty your bladder?
1 to 2 times per day
3 to 4 times per day
5 to 8 times per day
12 times per day
More than 12 times per day
Example: Assessment of urinary function
Version B:
During the past week, how often did you usually empty your bladder?
Every 12 to 24 hours
Every 6 to 8 hours
Every 3 to 5 hours
Every 2 hours
More than every 2 hours
Alternate-form reliability (4)
You could also change the actual wording of the question
The two items must be equivalent
Items with different degrees of difficulty do not measure the same attribute
What might they measure?
Reading comprehension or cognitive function
Example: Assessment of loneliness
Version A:
How often in the past month have you felt alone in the world?
Every day
Some days
Occasionally
Never
Version B:
During the past 4 weeks, how often have you felt a sense of loneliness?
All of the time
Sometimes
From time to time
Never
Example of nonequivalent item rewording
Version A:
When your boss blames you for something you did not do, how often do you stick up for yourself?
All the time
Some of the time
None of the time
Version B:
When presented with difficult professional situations where a superior censures you for an act for which you are not responsible, how frequently do you respond in an assertive way?
All of the time
Some of the time
None of the time
Alternate-form reliability (5)
You can measure alternate-form reliability at the same timepoint or separate timepoints
If large enough sample:
You can split it in half and administer one item to each half
Then compare the two halves
This is called a split-halves method
Can split into thirds and administer three forms of the item
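One way the sample split described above might be carried out is sketched below. It randomly assigns respondents to two groups, one per version of the item. The function name `split_sample`, the respondent IDs, and the fixed seed are assumptions for illustration only.

```python
import random


def split_sample(respondent_ids, seed=0):
    """Randomly divide respondents into two halves, one per form of the item."""
    ids = list(respondent_ids)
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical respondent IDs 1..20
group_a, group_b = split_sample(range(1, 21))
print("Version A goes to:", sorted(group_a))
print("Version B goes to:", sorted(group_b))
```

Once responses are collected, the two groups' results are compared. Splitting into thirds for three forms would follow the same pattern with three slices.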
Internal consistency reliability
Applied to groups of items that are thought to measure different aspects of the same concept
Cronbach’s coefficient alpha
Measures internal consistency reliability
It is a reflection of how well the different items complement each other
Interpret like a correlation coefficient (≥ 0.70 is good)
Example: Assessment of physical function
Response scale: 1 = Limited a lot, 2 = Limited a little, 3 = Not limited
Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports 1 2 3
Moderate activities, such as moving a table, pushing a vacuum cleaner, bowling, or playing golf 1 2 3
Lifting or carrying groceries 1 2 3
Climbing several flights of stairs 1 2 3
Bending, kneeling, or stooping 1 2 3
Walking more than a mile 1 2 3
Walking several blocks 1 2 3
Walking one block 1 2 3
Bathing or dressing yourself 1 2 3
Calculation of Cronbach’s coefficient alpha
Example: Assessment of emotional health
During the past month: Yes No
Have you been a very nervous person? 1 0
Have you felt downhearted and blue? 1 0
Have you felt so down in the dumps that nothing could cheer you up? 1 0
Results
Patient              Item 1   Item 2   Item 3   Summed scale score
1                    0        1        1        2
2                    1        1        1        3
3                    0        0        0        0
4                    1        1        1        3
5                    1        1        0        2
Percentage positive  3/5=.6   4/5=.8   3/5=.6
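Calculations
Mean score = 2
Sample variance:
\[ \text{Var} = \frac{\sum_i (x_i - \bar{x})^2}{n - 1} = \frac{0 + 1 + 4 + 1 + 0}{4} = 1.5 \]
Coefficient alpha:
\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i (\%\text{pos}_i)(\%\text{neg}_i)}{\text{Var}}\right) = \frac{3}{2}\left(1 - \frac{(.6)(.4) + (.8)(.2) + (.6)(.4)}{1.5}\right) = 0.86 \]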
Conclude that this scale has good reliability
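The worked example can be checked with a short script. This is a minimal sketch that applies the formula for yes/no items to the five-patient results above; the function name `coefficient_alpha_dichotomous` is an assumption for illustration, and the sample variance uses the n − 1 denominator as on the slide.

```python
from statistics import variance

# One row per patient, one column per item (1 = yes, 0 = no),
# taken from the five-patient results table
responses = [
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]


def coefficient_alpha_dichotomous(rows):
    """Coefficient alpha for yes/no items: k/(k-1) * (1 - sum(p*q) / Var)."""
    n = len(rows)
    k = len(rows[0])
    totals = [sum(row) for row in rows]   # summed scale score per patient
    var_total = variance(totals)          # sample variance (n - 1 denominator)
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in rows) / n   # proportion positive on item j
        sum_pq += p * (1 - p)                 # times proportion negative
    return (k / (k - 1)) * (1 - sum_pq / var_total)


alpha = coefficient_alpha_dichotomous(responses)
print(f"alpha = {alpha:.2f}")
```

With these data the script reproduces the slide's value of 0.86, which is above the 0.70 rule of thumb.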
Internal consistency reliability (2)
If internal consistency is low:
You can add more items
Re-examine existing items for clarity
Interobserver reliability
How well two evaluators agree in their assessment of a variable
Use correlation coefficient to compare data between observers
May be used as property of the test or as an outcome variable
Validity
Definition
How well a survey measures what it sets out to measure
Assessment of validity
Validity is measured in four forms
Face validity
Content validity
Criterion validity
Construct validity
Face validity
Cursory review of survey items by untrained judges
Ex. Showing the survey to untrained individuals to see whether they think the items look okay
Very casual, soft
Many don’t really consider this as a measure of validity at all
Content validity
Subjective measure of how appropriate the items seem to a set of reviewers who have some knowledge of the subject matter
Usually consists of an organized review of the survey’s contents
Still very qualitative
Criterion validity
Measure of how well one instrument stacks up against another instrument or predictor
Concurrent: assess your instrument against a “gold standard”
Predictive: assess the ability of your instrument to forecast future events, behavior, attitudes, or outcomes
Assess with correlation coefficient
Construct validity
Most valuable and most difficult measure of validity
Basically, it is a measure of how meaningful the scale or instrument is when it is in practical use
Construct validity (2)
Convergent: Implies that several different methods for obtaining the same information about a given trait or concept produce similar results
Evaluation is analogous to alternate-form reliability except that it is more theoretical and requires a great deal of work, usually by multiple investigators with different approaches
Construct validity (3)
Divergent: The ability of a measure to estimate the underlying truth in a given area; the measure must be shown not to correlate too closely with similar but distinct concepts or traits
EBM Perspective
Introduction
Three Steps in Using Medical Literature Articles:
Are the results of the study valid?
What are the results?
How can I apply these results to patient care?
Introduction
Four types of papers:
Therapy
Diagnostic Intervention
Prognosis
Systematic review
Therapy
Study design: RCT
Were Patients Randomized?
Was Randomization Concealed?
Were Patients Analyzed in the Groups to Which They Were Randomized?
Intention to treat analysis
Therapy
Were Patients in The Treatment And Control Groups Similar With Respect to Known Prognostic Factors?
Were Patients Aware of Group Allocation?
Therapy
Were Clinicians Aware of Group Allocation?
Were Outcome Assessors Aware of Group Allocation?
Was Follow-up Complete?
Was Follow-up Long Enough?
Diagnostic Intervention
Study Design: Cross-sectional
Was there an independent, blind comparison with a reference standard?
• Spectrum of patients
• Did the results of the test being evaluated influence the decision to perform the reference standard?
• Did the methods description permit replication?
Prognosis
• Study design: Cohort
• Was a defined, representative sample of patients assembled at a common point in the course of their disease?
• Inception Cohort; early
• Late stage prognosis
• Patients equal in all prognostic factors
• Stratified analysis?
• Follow up complete and long enough
• Valid and reliable data collection
Thank You