Development and Validation of a health status measure
Susan Stock, MD MSc FRCPC
Institut national de santé publique du Québec
Direction de santé publique de Montreal-Centre
Dept of Epidemiology, Biostatistics and Occupational Health, McGill
513-611A STUDY DESIGN AND ANALYSIS I
Sept 29, 2003
Susan Stock : Developing & Validating Health Status Measures
Plan
- Types of Health Status Measures
- Steps in the development of a health status measure
- Steps in the development of the Neck and Upper Limb Index
- Steps in the validation of a health status measure
- Steps in the validation of the Neck and Upper Limb Index
Health status measure
A health outcome questionnaire that quantifies symptoms, function, feelings and/or behaviour directly from the respondent to measure overall health status (generic instrument) or disorder-specific health status

Vary in scope:
- Activities of daily living ("ADL", e.g. self care, mobility)
- Functional status: measure capacity or performance of physical functioning, e.g. household tasks, work, recreational activities
- Health-related "quality of life" instruments: measure not only physical functioning but also psychological, social and role functioning
Health status measures
- Allow patient/subject to identify impact of a disorder or health problem on his/her life across many dimensions, based on his/her experience rather than the interpretation of a health care professional
- Useful in a wide range of studies and clinical contexts:
  - In studies of aetiology, prevalence and prognostic factors they can be incorporated into case definitions that distinguish according to severity
  - In intervention studies and health services research they can be used as the primary outcome to demonstrate change over time in health status
Development of Health Status Measures: references
- Streiner DL, Norman GR. Health measurement scales. A practical guide to their development and use. Second edition. New York: Oxford University Press, 1995: 28-53
- Guyatt GH, Bombardier C, Tugwell PX. Measuring disease-specific quality of life in clinical trials. CMAJ 1986; 134: 889-895
- Guyatt GH, Jaeschke R, Feeny DH, Patrick DL. Measurement in clinical trials: Choosing the right approach.
- Juniper EF, Guyatt GH, Jaeschke R. How to develop and validate a new health-related quality of life instrument.
  In Spilker B (ed), Quality of Life and Pharmacoeconomics in Clinical Trials, Second edition. Lippincott-Raven Publishers, Philadelphia, 1996
Neck and Upper Limb Index (NULI)
Health-related quality of life instrument:
- specific to neck and upper extremity musculoskeletal disorders
- capable of measuring changes within subjects over time in intervention studies
- capable of distinguishing between subjects (i.e., assess severity) in prognostic, prevalence or etiologic studies
- applicable to both French and English speaking populations in Canada, and
- practical and easy to use in clinical settings
Neck and Upper Limb Index (NULI)
In order to develop an instrument that was equally appropriate to the two major cultural and linguistic groups in Canada:
- Conducted two separate studies with similar protocols for item reduction and selection and subsequent validation
  - one in an Ontario English-speaking population
  - the other in a Quebec French-speaking population
Steps in development of a health status measure
- Search for appropriate existing measure!
- If none available:
  - Identify domains of interest
  - Generate potential items
  - Refine items and pre-test
  - Choose appropriate response scale(s) for the items
  - Carry out item reduction and item selection strategies
Steps in development of the NULI
- Identification of domains of interest
- Generation of potential items
- Item refinement and pre-testing
- English item reduction and selection study
- French translation of potential items
- French item reduction and selection study
- Comparison of English and French results
- Selection of 20 items appropriate for both populations
- Reliability and validity testing of the final 20-item instrument in both English and French populations
Domain
A dimension of life potentially affected by the disorder or health problem in question
- e.g. self care, household responsibilities, work, social life, sexual life, mood, self esteem, transportation, recreation, sleep, financial impact of disorder, iatrogenic effect of evaluation or treatment
Identifying domains & generating items
Strategies for identifying the most appropriate domains of interest and for generating potential items are aimed at optimizing content validity:
- the extent to which the measurement incorporates all the relevant content or domains of the phenomenon under study
NULI: Identifying domains & generating items
- review of relevant literature (rheumatology, rehabilitation, orthopaedics, back pain)
- review of existing health status instruments identified by bibliographic search and contact with experts
- clinical experience of investigators
- survey of 30 clinicians
- interviews with 33 worker-patients who presented with neck and/or upper limb disorders in five clinical occupational health settings
Evaluating content validity of existing instruments
- Identify relevant domains for the concept of interest and evaluate whether instruments measure these domains adequately
- Identify number or proportion of items in each instrument that are not relevant to the concept you wish to measure

Ref: Stock SR, Cole DC, Tugwell P. Review of applicability of existing functional status measures to the study of workers with musculoskeletal disorders of the neck and upper limb. Am J Indust Med 1996, 29, 679-688
An example of evaluation of content validity
Distribution of items among domains for selected musculoskeletal functional status instruments

Instrument  Work  Self care  Household & family  Social life  Recreation  Sleep  Mood  Sex life
PDI         2     4          2                   -            2           -      -     -
DRI         2     1          1                   -            2           -      -     -
NULI        4     1          4                   -            2           2      4     -
DASH        1     1          9                   1            3 (+4)      1      -     1
ASES        1     4          -                   -            1           1      -     -
NULI: Identifying domains & generating items
- 80 questions in 8 domains identified through investigator clinical experience, existing instruments and literature
- 52 additional items and 2 domains generated by clinician survey
- 48 additional items and 2 more domains identified by patient interviews
- Total of 12 domains identified
Item refinement
- Redundant items eliminated
- Pool of approximately 150 items with 7-30 items per domain
- Wording of items
- Literacy editor to ensure Grade 6 language
- "Applicability": screening question developed for activity-related items to evaluate whether the item was applicable or relevant to the subject (work, household and family responsibilities, transportation/driving, recreation, and social activities; sexual life)
  - Vacuuming, shovelling snow
  - Sports activities
Item refinement: Choice of response scale
- Response scale: 7-point numbered scale with verbal anchors
- Maximize reliability: reliability of a scale rises rapidly as the number of divisions increases to seven and then rises more slowly until there are 11 points (Streiner and Norman 1995, Nunnally and Wilson 1975, Nishisato and Torii 1985)
Response scale: number of points on a scale
- Loss of test re-test reliability:
  - 7-10 categories: little reduction of reliability
  - 5 categories reduces reliability by 12%
  - 2 categories reduces reliability by 35%
- Optimum number of points recommended: (5 to) 7 categories
  (Reference: Streiner and Norman 1995, Chap 4)
- Treating rating scales as interval data statistically will result in less measurement error when there are more items
Scaling: number of points on a scale
Potential sources of error when there are few points on a scale:
- Uncertainty, confusion of respondents
- Reduction in reliability
- Loss of efficiency of the instrument
  - More subjects needed to show an effect (S Suissa, J Clin Epidemiol 1991, 44: 241-8)
  - Lower correlation with other measures (Hunter & Schmidt 1990, J Applied Psychol 75:334-49)
Pre-test
- Pre-test in 10 clients with musculoskeletal disorders of neck or upper extremity in a vocational rehabilitation clinic
- To identify questions that are unclear, ambiguous, difficult to understand or inappropriate
- Revise items following pre-test
Inter-rater reliability testing
- Inter-rater reliability study of revised potential items
- English study conducted on 38 worker-patients with neck and upper limb disorders in four clinical settings prior to the item selection study; French inter-rater reliability study was conducted with 16 worker-patients
- 2 raters interviewed each patient on the same day, at 2-4 hour intervals
- Following the second interview, feedback was sought from respondents to further identify any ambiguous items or those difficult to understand
- ICC (intraclass correlations) calculated for the mean of items in each domain and for each individual item.
- Items with low inter-rater reliability (ICC<0.7) identified and source of difficulty reviewed with the interviewers.
- Items were reformulated where indicated.
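The ICC screening step can be sketched in code. The slides do not say which form of ICC was used, so this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single rater). The item names and scores are hypothetical; only the 0.7 cut-off comes from the slide.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per patient, one column per rater.
    """
    n = len(ratings)          # number of patients
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for a two-way layout without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical 7-point scores from two interviewers for six patients.
item_scores = {
    "item_a": [(5, 5), (3, 3), (6, 6), (2, 2), (4, 4), (7, 7)],
    "item_b": [(5, 2), (3, 6), (6, 4), (2, 5), (4, 4), (7, 3)],
}
THRESHOLD = 0.7  # cut-off quoted on the slide
flagged = [name for name, rows in item_scores.items()
           if icc_2_1(rows) < THRESHOLD]
print(flagged)  # items to review with the interviewers
```

With real data a statistics package would normally be used, and it would also report confidence intervals, which matter for samples as small as 16 to 38 patients.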
Interviewer training
- 3-5-day training sessions for interviewers
  - to be familiar with content of questions, use of scales
  - to teach appropriate standardised technique
  - interviewers trained to probe in a non-directive, non-biasing fashion, and be interpersonally neutral
  - feedback on tape-recorded interviews
  - role-playing of interviews with potentially difficult subjects
Interviewer training
- To reduce bias and random error and ensure strict adherence to research protocol
- Inform re purpose of study, type of data to be gathered, how results will be used
- Familiarize with questionnaire, understand every item
- How to handle first meeting with respondent, techniques for building rapport
- How to answer questions commonly asked by respondents
- Confidentiality procedures
- When and how to probe
- How to ask questions
- How to record responses
- Checking the questionnaire
- How to end interviews
- How to deal with special situations (angry, tearful, or verbose respondents)
Item reduction studies
Study procedure:
- Pre and post-treatment administration of 170 potential items and validating measures to 119 English-speaking Ontario workers and to 93 French-speaking Quebec workers with neck or upper limb disorders recruited from occupational and physiotherapy clinics
- 7-30 specific items in each of the 12 domains including a global question about the overall impact of the disorder on that domain
- An additional administration 3-7 days after the initial administration for test re-test reliability
- Subjects rank ordered the 12 domains according to the relative importance of the impact of their musculoskeletal disorder on these dimensions of their lives
NULI Item reduction
Objective of item reduction:
- To identify and omit items that were irrelevant, had poor test re-test reliability, discriminated poorly or were unresponsive to change
Criteria for item reduction
- Applicability of activity related items
  - Eliminate items applicable to fewer than 80% of the study population
  - Eliminate items applicable to fewer than 70% of men or fewer than 70% of women
    - e.g. vacuuming applicable to 49% men, 83% women
    - Shovelling snow not applicable to 82% women
- Reproducibility
  - Eliminate items with Pearson correlation coefficient below 0.5
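The two criteria above can be sketched as simple filters. The function names, the hypothetical scores and the overall applicability figure for vacuuming are assumptions for illustration. The sketch also assumes an item must meet both applicability thresholds, and that an item with a correlation of exactly 0.5 is kept, since the comparison symbol did not survive in the transcript.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def passes_applicability(pct_all, pct_men, pct_women):
    """Keep an item only if it applies to at least 80% of all subjects
    and to at least 70% of men and 70% of women."""
    return pct_all >= 80 and pct_men >= 70 and pct_women >= 70


def passes_reproducibility(first, second, cutoff=0.5):
    """Keep an item whose test re-test Pearson correlation reaches the cutoff."""
    return pearson(first, second) >= cutoff


# Vacuuming: 49% of men and 83% of women (from the slide);
# the overall percentage of 63 is made up for this example.
print(passes_applicability(pct_all=63, pct_men=49, pct_women=83))

# Hypothetical scores for one item at the first and second administration.
first = [2, 4, 5, 3, 6, 7, 1, 5]
second = [3, 4, 5, 2, 6, 6, 2, 5]
print(passes_reproducibility(first, second))
```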
Criteria for item reduction
- Internal consistency
  - Eliminate items with correlation below 0.3 between item score and: (1) mean of all items in the domain without that item; (2) the global question score for the domain
- Responsiveness to change
  - Eliminate items with correlation below 0.3 between the residual change score of the item (pre-treatment to post-treatment) and the residual change score of the domain Global Score
- Discriminative ability
  - Eliminate items with a skewness statistic greater than 2 times the standard error of this statistic
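A minimal sketch of the internal consistency and discriminative ability checks follows. It assumes the usual adjusted sample skewness and its large-sample standard error formula, neither of which is specified on the slide. Function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(items, index):
    """Correlate one item with the mean of the other items in its domain.

    items: list of per-item score lists, same respondents in the same order.
    """
    others = [scores for i, scores in enumerate(items) if i != index]
    rest_mean = [sum(vals) / len(vals) for vals in zip(*others)]
    return pearson(items[index], rest_mean)


def skewness_ratio(x):
    """Adjusted sample skewness divided by its standard error."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    skew = (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew / se


# Hypothetical domain with three items answered by four respondents.
domain = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 5]]
drop_for_consistency = corrected_item_total(domain, 0) < 0.3

# Hypothetical item where nearly everyone answers 1 (floor effect).
floor_item = [1, 1, 1, 1, 1, 1, 1, 2, 1, 7]
drop_for_skew = abs(skewness_ratio(floor_item)) > 2  # absolute value is an assumption
print(drop_for_consistency, drop_for_skew)
```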
Measuring change
- Problem with change scores: regression to the mean (tendency of outlying scores to return to the mean)
  - by chance, low pre-test scores will be higher on post-test and high pre-test scores will be lower on post-test
- Possible solution: residual change scores
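One common way to compute residual change scores is to regress post-test on pre-test scores and keep the residuals, which are uncorrelated with the pre-test score by construction. The sketch below assumes that definition, which the slides do not spell out. All data are hypothetical.

```python
def residual_change(pre, post):
    """Residuals from the least-squares regression of post on pre."""
    n = len(pre)
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    slope = (sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
             / sum((x - mean_pre) ** 2 for x in pre))
    intercept = mean_post - slope * mean_pre
    # Observed post score minus the score predicted from the pre score.
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical pre/post scores for one item and for the domain global question.
item_pre = [6, 5, 7, 4, 6, 3, 5, 7]
item_post = [3, 4, 4, 3, 2, 2, 4, 5]
global_pre = [6, 6, 7, 5, 6, 4, 5, 7]
global_post = [4, 4, 5, 3, 2, 3, 4, 5]

item_res = residual_change(item_pre, item_post)
global_res = residual_change(global_pre, global_post)
responsiveness = pearson(item_res, global_res)
print(responsiveness < 0.3)  # True would mark the item for elimination
```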
Selection of final domains
- Selection of domains: relative impact and importance study subjects attributed to each domain
  - mean score of the global question for each domain and domain rankings calculated for each study population as well as by gender
  - committee of co-investigators reviewed these data and, through consensus discussion, arrived at a choice of priority domains and the number of items of each domain the final instrument should include
- Selection among remaining items
Comparison of Global Question Mean Scores for Each Domain between Quebec and Ontario Study Populations
[Figure: mean global question score (axis 1 to 7) by domain, Ontario vs Québec]

Domain     Ontario  Québec
Work       5.4      5.3
Sleep      4.9      5.2
Recr       4        4.3
Mood       3.9      3.8
Housework  3.7      4
Esteem     3.5      4
Self care  3.3      3.4
Financial  3        2
Driving    2.9      3
Sex life   2.9      4.2
Iatrog     2.5      3.2
Social     2.1      2.6
Comparison of Mean Ranking for Each Domain between Quebec and Ontario Study Populations
[Figure: 13 - mean rank (axis 1 to 12) by domain, Ontario vs Quebec]

Domain   Ontario  Quebec
WORK     10.8     10.3
HOUSE/F  8.4      8.1
SLEEP    8.1      8.6
MOOD     6.9      6.3
RECR     6.5      7.2
$$$      6.5      4.2
S/CARE   6.5      6.7
DRIV     5.9      6.3
ESTEEM   5.7      5.1
IATRO    4.8      5.9
SOCIAL   4.7      4.8
SEX      3.7      4.5
Selection of remaining items
- Selection of the most responsive and most discriminating items that covered the priority domains
- Number of items that would result in an instrument that takes no more than 5-10 minutes to complete (version 1 = 35 items; version 2 = 20 items)
- Among items with similar responsiveness and discriminative ability, selection was based on the clinical judgement of the co-investigator research committee
Translation into French
- Double reverse parallel translation method (Vallerand 1989)
  - translation into French of the English questionnaire by two independent translators (versions A and B)
  - the two French versions (versions A and B) translated into English by two different translators (versions C and D)
  - versions C and D compared to the original English version by a committee comprised of three bilingual study researchers (two francophones, one anglophone) and discrepancies resolved through consensus to arrive at a revised French translation, version E
  - version E pre-tested on 16 francophone workers with neck or upper extremity disorders to identify ambiguous or difficult to understand items
  - results of the pre-test reviewed by the research translation committee and a final French version of the questionnaire was agreed upon (version F).
Criteria for acceptance of a French formulation
- meaning of the French version was as close as possible to the English one
- the most simple term would be selected (in order to be understandable at a Grade 6 or lower reading level)
- French syntax would be respected
- the terms most commonly used in current Quebec French would be selected
Comparison of English and French item reduction results
- Compare demographic profile of the 2 populations
- Compare English and French subjects' mean responses for the global question of each domain by t-test for univariate analyses and by multiple regression analyses controlling for sex, age, income and duration of symptoms
- Compare English and French subjects' mean ranking scores for each domain by Wilcoxon rank-sum test for univariate analyses and by partial Spearman correlations between the mean ranking score of each domain and the study group status (i.e., English or French study group) controlling for sex, age, income and duration of symptoms
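The univariate rank comparison can be illustrated with a hand-rolled Wilcoxon rank-sum test. This is a sketch under simplifying assumptions: it uses the normal approximation with no continuity correction and no tie correction to the variance. Domain rankings from 1 to 12 contain many ties, so a statistics package would be preferable in practice. The rankings shown are hypothetical.

```python
import math


def midranks(values):
    """Ranks 1..N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (W, z, two-sided p), where W is the rank sum of group_a."""
    n1, n2 = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return w, z, p


# Hypothetical rankings (1 = most important) given to one domain.
english = [3, 5, 2, 6, 4, 7, 5, 3]
french = [8, 6, 9, 7, 10, 5, 9, 8]
w, z, p = rank_sum_test(english, french)
print(f"W = {w}, z = {z:.2f}, p = {p:.4f}")
```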
Comparison of Ontario and Quebec study populations

                                            Ontario (n=119)           Quebec (n=93)
Gender                                      40.3% female; 59.7% male  55.9% female; 44.1% male
Mean age (S.D.)                             39.7 yr. (± 10.1)         41.1 yr. (± 10.0)
% cases with duration of injury > 6 months  30.4                      58.8
% cases off work                            72.9                      57.0
% cases on WCB                              67.8                      26.9

The Quebec study population was more likely to be female (p=.02), to have had symptoms > 6 months (p=.001) and to still be at work (p=.02), and less likely to be on WCB benefits (p=0.0001)
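As a worked example, the gender comparison can be approximated with a two-proportion z-test. The slides do not say which test produced the reported p-values, so the choice of test is an assumption. The counts are also assumptions, back-calculated from the percentages and sample sizes in the table.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


ontario_n, quebec_n = 119, 93
# Counts reconstructed from the table's percentages (an assumption).
ontario_female = round(0.403 * ontario_n)
quebec_female = round(0.559 * quebec_n)

z, p_value = two_proportion_z(ontario_female, ontario_n, quebec_female, quebec_n)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Under these assumptions the result falls in the neighbourhood of the p=.02 reported for gender, which suggests, but does not confirm, that a test of this kind was used.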
Comparison of mean rank of each domain between English and French study subjects: univariate analyses
Domain «Wilcoxon rank sum» test (p)
Personal care .798 Family and domestic responsibilities .448 Work .099 Transportation .412 Mood .168 Self esteem .069 Sleep .251 Sexual life .018 recreation .084 Social life .442 Financial impact .000 Iatrogenic effects .017
Correlation of study status (English or French) to mean domain ranking controlling for age, gender, income and duration of symptoms
Domain                                 Partial Spearman correlation coefficient   p
Personal care                          .003                                       .966
Family and domestic responsibilities   -.037                                      .622
Work                                   -.092                                      .216
Transportation                         .076                                       .313
Mood                                   -.116                                      .122
Self esteem                            -.135                                      .075
Sleep                                  .077                                       .305
Sexual life                            .146                                       .051
Recreation                             .170                                       .022
Social life                            .061                                       .415
Financial impact                       -.293                                      .0001
Iatrogenic effects                     .180                                       .015
Multiple regression for each domain to assess whether study status (English or French) was a predictor of the mean score of the domain global question when controlling for age, gender, income and duration of symptoms

Domain                                 Standardised coefficient   Signif.
Personal care                          -.016                      .829
Family and domestic responsibilities   -.004                      .950
Work                                   -.041                      .587
Transportation                         -.032                      .676
Mood                                   -.080                      .290
Self esteem                            -.020                      .791
Sleep                                  -.013                      .869
Sexual life                            .309 (1)                   .0004
Recreation                             .102                       .193
Social life                            .108                       .156
Financial impact                       -.230 (2)                  .002
Iatrogenic effects                     .149 (1)                   .049

(1) A positive coefficient indicates that French study subjects had significantly higher mean global scores than English subjects for that domain
(2) A negative coefficient indicates that English study subjects had significantly higher mean global scores than French subjects for that domain
Synthesis of English-French comparisons
Sexual life:
  Statistically significant differences in mean ranking and mean domain global score, but clinically insignificant difference in ranking
  Domain did not meet applicability criteria
Financial impact/iatrogenic effects:
  Statistically significant differences in mean ranking and mean domain global score, probably reflecting differences in the proportion of subjects off work and differences in clinical treatment program
Overall, no major differences in mean domain rankings or mean domain scores, or in the results of individual item reduction
A single instrument could be developed for both populations
Final instrument
20 items:
  4 work
  7 physical activities (self care, domestic responsibilities, leisure)
  6 psychosocial (mood, self esteem, social role function)
  2 sleep
  1 iatrogenic
Validation of a health status measure
Internal consistency
Reproducibility (test re-test reliability)
Validity
  Content
  Criterion or convergent
  Construct
  Predictive
Responsive to change
Measures of internal consistency
Cronbach alpha (0.0-1.0)
  An estimate of the correlation between the total score across a series of items from a rating scale and the total score that would have been obtained had a comparable series of items been employed
Inter-item correlations
Item-total correlations (total ± item)
Correlation of item to mean of items (mean ± item)
Split half reliability (items randomly divided and 2 sub-scales correlated)
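As an illustration of the first measure, Cronbach's alpha can be computed from the item variances and the variance of the total score using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total score), where k is the number of items. The sketch below assumes complete data (no missing item responses); the function name and the sample responses are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent (complete data assumed)."""
    k = len(rows[0])
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents x 3 items
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 5],
    [3, 3, 4],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha rises both with stronger inter-item correlations and with the number of items, so a high value on a long scale does not by itself show that the items measure a single concept.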
Reliability
Test re-test reliability: the stability exhibited when a measurement is repeated under identical conditions
  calculation of the intra-class correlation (ICC) for two administrations of the index, 3-7 days apart, in 99 Ontario subjects and 33 Quebec subjects
Internal consistency: intercorrelation between items of a scale meant to measure the same concept
  Cronbach’s alpha calculated for 119 Ontario subjects and 93 Quebec subjects present at the initial pre-treatment administration of the questionnaires
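A test re-test ICC can be sketched from a one-way analysis of variance. The slides do not state which form of the ICC was used, so the one-way random-effects form, ICC(1,1), is an assumption here; the function name and the paired scores are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    scores: one tuple per subject holding k repeated measurements
    (k = 2 for a test and a re-test).
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(s) for s in scores) / (n * k)
    subject_means = [sum(s) / k for s in scores]
    # Between-subject mean square
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square
    msw = sum(
        (x - m) ** 2 for s, m in zip(scores, subject_means) for x in s
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical index scores at two administrations a few days apart
test_retest = [(10, 12), (20, 18), (30, 31), (40, 39)]
print(round(icc_oneway(test_retest), 3))
```

The ICC approaches 1 when differences between subjects are large relative to the differences between a subject's two administrations.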
Ways of improving reproducibility
Increase the number of items in a test or measurement scale
Increase the number of response choices for each item
Reduce inter-observer variation (training of interviewers, standardised protocol)
Reduce ambiguity in questions
Validity
An expression of the degree to which a measurement measures what it purports to measure (Last)
Is the scale measuring what it was intended to measure?
How do we demonstrate validity?
Subjective judgement by “experts”:
  Face validity: the extent to which, on the face of it, the measurement appears to be assessing the desired qualities
  Content validity: the extent to which the measurement incorporates all the relevant content or domains of the phenomenon under study
How do we demonstrate validity?
Criterion validity: extent to which the measurement correlates with an external criterion (preferably a "gold standard")
How do we demonstrate validity?
Convergent (concurrent) validity: correlation between the measurement of interest and another measurement known to measure the same concept and taken at the same time (expected correlation 0.4-0.8)
Predictive validity: ability of a measurement to predict the criterion
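A convergent validity check comes down to a correlation between two sets of scores collected at the same administration. The sketch below computes Pearson's r and tests it against the 0.4-0.8 band quoted above. Comparing the absolute value of r is an assumption made here so that measures scored in the opposite direction can also be checked; the function names and sample data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def in_convergent_range(r, low=0.4, high=0.8):
    """True if |r| lies in the band expected for convergent validity."""
    return low <= abs(r) <= high

# Hypothetical index scores and pain-scale scores for the same subjects
index_scores = [30, 45, 50, 62, 70]
pain_scores = [2, 5, 4, 7, 6]
r = pearson_r(index_scores, pain_scores)
print(round(r, 2), in_convergent_range(r))
```

A correlation far above the band would suggest the two instruments are nearly redundant, while one far below it would suggest they measure different concepts.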
Reliability and Responsiveness of Revised 20-item NULI / IDVQ
                                                        Ontario (based on original items)   Quebec (based on revised format)
Reproducibility: test re-test reliability (ICC)         0.88 (n = 99)                       0.83 (n = 33)
Internal consistency (Cronbach alpha)                   0.90 (n = 119)                      0.93 (n = 93)
Responsiveness (standardised response mean, 95% CI)     1.48 (1.1-1.8) (n = 33)             1.63 (1.3-2.0) (n = 35)
Example of convergent validity: Pearson’s Correlations between NULI and other measures
                                                  Ontario (based on original items)   Québec (based on revised format)
1-item Global question (mean subject-clinician)   0.60                                0.73
Pain Scale                                        0.42                                0.55
Shoulder abduction                                -0.32                               -0.47
Scratch test                                      0.37                                0.30
Hand grip strength (Jamar)                        0.29                                -0.41
SIP                                               0.66                                N/A
Physical component SF-36                          N/A                                 -0.50
Mental SF-36                                      N/A                                 -0.52
How do we demonstrate validity?
Construct validity: the extent to which the measurement corresponds to theoretical constructs concerning the phenomenon under study
  e.g., testing a hypothesis about whether the measure will distinguish between 2 groups who differ with respect to the concept of interest
  e.g., NULI: those who returned to work had significantly lower NULI scores at the post-treatment administration than those who did not return to work at that time
Tests theory and measure at the same time
Responsiveness
The ability of a measure to detect change (in the construct being measured) over time
AKA « sensitivity to change »
Important when testing effectiveness of an intervention
Statistical measures of responsiveness
Effect size – ability to detect the effect of treatments
  Ratio of the difference between groups to the variability within groups
  Numerator: raw change score
  Denominator:
    • standard deviation of pre-test scores vs
    • SD of change scores vs
    • standard error of change score vs
    • SD of change score in stable subjects
Example: Standardised response mean = mean change score / SD of change scores
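Two of the denominators listed above can be compared directly in code. The sketch below computes the standardised response mean (mean change divided by the SD of the change scores) and the effect size based on the SD of the pre-test scores. Change is taken as post minus pre, which is an assumption: the sign of the result depends on whether higher scores mean better or worse health. The function names and the scores are hypothetical.

```python
from statistics import mean, stdev

def change_scores(pre, post):
    """Raw change score per subject, taken as post minus pre."""
    return [b - a for a, b in zip(pre, post)]

def standardised_response_mean(pre, post):
    """Mean change divided by the SD of the change scores."""
    change = change_scores(pre, post)
    return mean(change) / stdev(change)

def effect_size_pretest_sd(pre, post):
    """Mean change divided by the SD of the pre-test scores."""
    return mean(change_scores(pre, post)) / stdev(pre)

# Hypothetical scores before and after treatment for 4 subjects
pre = [10, 12, 14, 16]
post = [8, 9, 10, 13]
print(round(standardised_response_mean(pre, post), 2))
print(round(effect_size_pretest_sd(pre, post), 2))
```

The two statistics differ only in the denominator, so the same data can look more or less responsive depending on whether subjects change by similar amounts (small SD of change) or started from similar scores (small pre-test SD).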
Responsiveness to change of NULI
Standardized response mean (SRM) calculated for 33 Ontario subjects and 35 Quebec subjects whom both subject and clinician deemed improved on a 1-item global question of disability
Comparison of standardised response means of Revised 20-item NULI / IDVQ and other measures

                             Ontario               Québec
NULI (IDVQ)-20               1.48 (1.1, 1.8)       1.63 (1.3, 2.0)
Pain Scale                   1.22 (0.9, 1.6)       1.73 (1.4, 2.1)
Shoulder abduction           -0.61 (-1.0, -0.3)    -1.16 (-1.6, -0.7)
Scratch test                 0.02 (-0.3, 0.4)      0.59 (0.2, 1.0)
Hand grip strength (Jamar)   -0.80 (-1.2, -0.5)    -0.33 (-0.8, 0.1)
SIP (total)                  1.14 (0.8, 1.5)       -
Physical component SF-36     -                     -1.26 (-1.6, -0.9)
Mental component SF-36       -                     -0.48 (-0.5, 0.2)
Existing instrument vs. designing your own
Development of a reliable, valid instrument is a lengthy, complicated process
Whenever possible, use an existing instrument with known reliability and validity
When choosing among existing instruments, choose the instrument with the best reliability, validity and/or responsiveness that will measure the concept you wish to measure
Choosing a health outcome measure: Internet resources
1. Quality of life compendium: choosing a quality of life instrument (from the Dept of Public Health and Primary Health Care, University of Bergen, Norway) www.uib.no/isf/people/doc/qol/comp0006.htm
2. Quality of Life Assessment in Medicine - Internet Resources http://www.qlmed.org/url.htm
3. Clinician’s computer-assisted guide to the choice of instruments for quality of life assessment in medicine http://www.glamm.com/ql/guide.htm
4. Medical Outcomes Trust Scientific Advisory Committee Instrument Review Criteria http://www.outcomes-trust.org/bulletin/34sacrev.htm