1
Measurement Issues in Health Disparities Research
Anita L. Stewart, Ph.D.
University of California, San Francisco
Clinical Research with Diverse Communities
EPI 222, Spring
April 17, 2008
2
Background
U.S. population becoming more diverse
More minority groups are being included in research due to:
– NIH mandate
– Recent health disparities initiatives
3
Types of Diverse Groups
Health disparities research focuses on differences in health between the following groups:
– Minority vs. non-minority
– Low income vs. others
– Low education vs. others
– Limited English skills vs. others
– …. and others
4
Health Disparities Research
Describe health disparities
– Health differences across various diverse groups
Identify mechanisms by which health disparities occur
– Individual level
– Environmental level
Intervene to reduce health disparities
5
Health Care Disparities
Differential access to and quality of health care is well known
– Thus, health care disparities become a plausible mechanism for health disparities
Understanding determinants of health care disparities is also of interest
6
Types of Self-Report Measures Needed
Measures of health, and of various mechanisms for disparities
– Class 4 will present numerous mechanisms
Examples from this class: sense of control, self-efficacy for managing disease, health-related quality of life for various health conditions
7
Measurement Implications of Research in Diverse Groups
Most self-reported measures were developed and tested in mainstream, well-educated groups
– Subgroup analysis of measures has been rare
Thus, little information is available on appropriateness, reliability, validity, and responsiveness in minority and other diverse groups
8
The Measurement Goal: Identify Measures That Can Be Used…
To compare diverse groups
To study mechanisms within any particular “diverse” group
9
Group Comparisons are the Most Problematic
Disparities research involves comparing mean levels of health or its determinants
Requires “equivalent” concepts and measures
– Potential true differences may be obscured
– Observed group differences may be inaccurate
10
Alternative Explanations for Observed Group Differences
Observed group mean differences in a measure can be due to
– culturally- or group-mediated differences in true score (true differences)
-- OR --
– bias: systematic differences between group observed scores not attributable to true scores
11
Bias - A Special Concern
Measurement bias in any one group may make group comparisons invalid
Bias can be due to group differences in:
– the meaning of concepts or items
– the extent to which a measure represents a concept
– cognitive processes of responding
– use of response scales
– appropriateness of data collection methods
12
Example of Effect of Biased Items
5 CES-D items administered to black and white men
– 1 item subject to differential item functioning (bias)
5-item scale including the biased item suggested that black men had higher levels of somatic symptoms than white men (p < .01)
4-item scale excluding biased item showed no differences between black and white men
S Gregorich, Med Care, 2006;44:S78-S94.
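A minimal sketch of how one biased item can create an apparent group difference. It does not use the CES-D data from Gregorich (2006): the responses, group labels, and function names below are invented for illustration. Two hypothetical groups give identical answers on four items, and one group answers a fifth item one category higher at the same underlying level.

```python
from statistics import mean

def scale_scores(responses, items):
    """Sum the selected item columns for each respondent."""
    return [sum(row[i] for i in items) for row in responses]

# Hypothetical 0-3 responses to five items (one row per respondent).
group_a = [
    [0, 1, 0, 1, 0],
    [1, 1, 2, 1, 1],
    [2, 2, 1, 2, 2],
    [3, 2, 3, 3, 2],
]
# Assumption: group B answers items 0-3 exactly like group A, but answers
# item 4 one category higher (capped at 3), i.e. item 4 is biased.
group_b = [row[:4] + [min(row[4] + 1, 3)] for row in group_a]

all_items = [0, 1, 2, 3, 4]
unbiased_items = [0, 1, 2, 3]

# Group mean difference with and without the biased item.
diff_5 = mean(scale_scores(group_b, all_items)) - mean(scale_scores(group_a, all_items))
diff_4 = mean(scale_scores(group_b, unbiased_items)) - mean(scale_scores(group_a, unbiased_items))
print(diff_5, diff_4)
```

In this toy setup the 5-item difference comes entirely from the biased item, which is why comparing results with and without a suspect item is informative.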
13
Bias or “Systematic Difference”?
Bias refers to “deviation from true score”
Cannot speak of a measure being “biased” in one group compared to another without knowing the true score
Preferred term: differential “item” functioning
– Item (or measure) that has a different meaning in one group than another
14
Typical Sequence of Developing New Self-Report Measures
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
16
Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Develop concept: obtain perspectives of diverse groups
Create item pool: .. to reflect these perspectives
Pretest/revise: .. in all diverse groups
Field survey: .. in all diverse groups
Psychometric analyses: measurement studies across groups
Final measures: if results are non-equivalent
22
Measurement Adequacy vs. Measurement Equivalence
Making group comparisons requires conceptual and psychometric adequacy and equivalence
Adequacy - within a “diverse” group
– concepts are appropriate
– psychometric properties meet minimal criteria
Equivalence - between “diverse” groups
– conceptual and psychometric properties are comparable
23
Why Not Use Culture-Specific Measures?
Measurement goal - identify measures that can be used across all groups, yet maintain sensitivity to diversity and have minimal bias
Most health disparities studies require comparing mean scores across diverse groups
– need comparable measures
24
Conceptual and Psychometric Adequacy and Equivalence
             | Adequacy in 1 Group | Equivalence Across Groups
Conceptual   | Concept meaningful within one group | Concept equivalent across groups
Psychometric | Psychometric properties meet minimal standards within one group | Psychometric properties invariant (equivalent) across groups
25
Left Side of Matrix: Issues in a Single Group
26
Right Side of Matrix: Issues in More Than One Group
27
Conceptual Adequacy in One Group
28
Conceptual Adequacy in One Group
Is concept relevant, meaningful, and acceptable to a “diverse” group?
Traditional research
– Conceptual adequacy = simply defining a concept
– Mainstream population “assumed”
Minority and health disparities research
– Mainstream concepts may be inadequate
– Concept should correspond to how a particular group thinks about it
29
Qualitative Approaches to Explore Conceptual Adequacy in Diverse Groups
Literature reviews
– ethnographic and anthropological
In-depth interviews and focus groups
– discuss concepts, obtain their views
Expert consultation from diverse groups
– review concept definitions
– rate relevance of items
30
Example of Inadequate Concept
Patient satisfaction typically conceptualized in mainstream populations in terms of, e.g.,
– access, technical care, communication, continuity, interpersonal style
In minority and low income groups, additional relevant domains include, e.g.,
– discrimination by health professionals
– sensitivity to language barriers
31
Method for Examining Conceptual Relevance
Compiled set of 33 HRQL items spanning many concepts
Assessed relevance to older African Americans
After answering each question, asked “how relevant is this question to the way you think about your health?”
– Response scale: 0-10 scale with endpoints labeled
– Labels: 0=not at all relevant, 10=extremely relevant
Cunningham WE et al., Qual Life Res, 1999;8:749-768.
32
Results: Conceptual Relevance
Most relevant items:
– Spirituality (3 items)
– Weight-related health (2 items)
– Hopefulness (1 item)
Spirituality items
– importance of spirituality to well-being, level of spirituality, being sick affected spirituality
33
Results: Conceptual Relevance
Least relevant items:
– Physical functioning
– Role limitations due to emotional problems
All standard MOS measures ranked in the lower 2/3, including all SF12 items
34
Conceptual Relevance of Spanish FACT-G
Bilingual/bicultural expert panel reviewed all 28 items for relevance
– One item had low cultural relevance to quality of life
– One concept was missing: spirituality
Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts
– Sample item “I worry about dying”
Cella D et al. Med Care 1998: 36;1407
35
Psychometric Adequacy in One Group
36
Psychometric Adequacy in any Group
Minimal standards:
– Sufficient variability
– Minimal missing data
– Adequate reliability/reproducibility
– Evidence of construct validity
– Evidence of responsiveness to change
Basic classical test theory approach
37
Evidence of Psychometric Inadequacy of Measure in Various Diverse Groups
SF-36 social functioning scale - internal consistency reliability < .70 in three different samples:
– Chinese language, adults aged 55-96 years
– Japanese language, Japanese elders
– English, Pima Indians
Stewart AL & Nápoles-Springer A, Med Care, 2000;38(9 Suppl):II-102
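Internal consistency reliability is commonly summarized with Cronbach's alpha. The sketch below computes alpha from a respondents-by-items matrix and compares it with the .70 criterion mentioned above. The data, the function names, and the helper `meets_minimum` are hypothetical and are not taken from the cited studies.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def meets_minimum(alpha, threshold=0.70):
    """Check alpha against a minimal criterion (0.70 assumed here)."""
    return alpha >= threshold

# Hypothetical responses: four respondents, three items.
responses = [
    [1, 1, 2],
    [2, 2, 2],
    [3, 2, 3],
    [4, 4, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), meets_minimum(alpha))
```

Running the same function separately within each group is one way to check adequacy group by group before any comparison is attempted.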
38
Conceptual Equivalence Across Groups
39
Conceptual Equivalence
Is the concept relevant, familiar, acceptable to all diverse groups being studied?
Is the concept defined the same way in all groups?
– all relevant “domains” included (none missing)
– interpreted similarly
Is the concept appropriate for all diverse groups?
40
Generic/Universal vs Group-Specific (Etic versus Emic)
Concepts unlikely to be defined exactly the same way across diverse ethnic groups
Generic/universal (etic)
– features of a concept that are appropriate across groups
Group-specific (emic)
– idiosyncratic or culture-specific portions of a concept
41
Etic versus Emic (cont.)
Goal in health disparities research on more than one group:
– identify generic/universal portion of a concept (could be entire concept) that can be applied across all groups
For within-group analyses or studies
– the culture-specific portion is also relevant
– Same as examining “conceptual adequacy” within one group
42
Approaches Similar to Those for Conceptual Adequacy
Main difference: Need to assure concept is equivalent across groups
– Additional criterion
What do we mean by equivalent conceptually?
Methods are poorly developed
43
Obtain Perspective of All Diverse Groups on Concept
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
Obtain perspectives of diverse groups
44
Example: Develop Concept of Interpersonal Processes of Care
Began with conceptual framework from literature and psychometric studies of preliminary survey
IPC Version I Conceptual Framework
Three major multi-dimensional categories:
– Communication
– Decision-making
– Interpersonal Style
45
IPC Version I: Subdomains
Communication
– Elicitation of concerns, explanations, general clarity
Decision-making
– Involving patients in decisions
Interpersonal Style
– Respectfulness, emotional support, non-discrimination, cultural sensitivity
Stewart et al., Milbank Quarterly, 1999: 77:305
46
Limitations of First IPC Framework
Tested on small sample of 600 patients from San Francisco General Hospital
Several hypothesized concepts were not confirmed
– e.g., cultural sensitivity
Needed further development and validation on a larger sample
47
Developed Revised IPC Concept
Inputs:
– IPC Version I framework in Milbank Quarterly
– 19 focus groups: African American, Latino, and White adults
– Literature review of quality of care in diverse groups
Result: Draft IPC II conceptual framework
48
IPC-II Conceptual Framework
I. COMMUNICATION
– General clarity
– Elicitation/responsiveness
– Explanations of processes, condition, self-care, meds
– Empowerment
II. DECISION MAKING
– Responsive to patient preferences
– Consider ability to comply
III. INTERPERSONAL STYLE
– Respectfulness
– Courteousness
– Perceived discrimination
– Emotional support
– Cultural sensitivity
49
IPC-II Conceptual Framework (cont)
IV. OFFICE STAFF
– Respectfulness
– Discrimination
V. FOR LIMITED ENGLISH PROFICIENCY PATIENTS
– MD’s and office staff’s sensitivity to language
50
Psychometric Equivalence
51
Psychometric Equivalence
Measures have similar measurement properties in all diverse groups of interest in your study
– e.g., English and Spanish language, African Americans and Caucasians
Measures have similar measurement properties in one diverse group as in original (mainstream) groups on which the measures were developed
52
Equivalence of Reliability?? No!
Difficult to compare reliability because it depends on the distribution of the construct in a sample
– Thus lower reliability in one group may simply reflect less variability
More important is the adequacy of the reliability in both groups
– Reliability meets minimal criteria within each group
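The dependence of reliability on sample variability follows from the classical test theory definition of reliability. In the worked example below, the symbols and numbers are illustrative assumptions: \sigma_T^2 is true-score variance, \sigma_E^2 is error variance, and two hypothetical groups A and B share the same error variance but differ in how much they vary on the construct.

```latex
\begin{align*}
\rho_{XX'} &= \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2} \\[4pt]
\text{Group A } (\sigma_T^2 = 16,\ \sigma_E^2 = 4):\quad
  \rho_{XX'} &= \frac{16}{16 + 4} = .80 \\
\text{Group B } (\sigma_T^2 = 4,\ \sigma_E^2 = 4):\quad
  \rho_{XX'} &= \frac{4}{4 + 4} = .50
\end{align*}
```

The measure is equally precise in both hypothetical groups (same error variance), yet the reliability coefficient is lower in group B only because scores there are more homogeneous.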
53
Equivalence of Criterion Validity
Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g.
– a measure predicts utilization in both groups
– a cutpoint on a screening measure has the same specificity and sensitivity in both groups
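A minimal sketch of the cutpoint check, assuming a criterion ("gold standard") diagnosis is available in each group. The scores, the cutpoint of 16, and the function name are hypothetical.

```python
def sensitivity_specificity(scores, has_condition, cutpoint):
    """Return (sensitivity, specificity), treating score >= cutpoint as a positive screen."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, has_condition):
        screened_positive = score >= cutpoint
        if truth and screened_positive:
            tp += 1
        elif truth:
            fn += 1
        elif screened_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

CUTPOINT = 16  # hypothetical screening cutpoint
# Hypothetical criterion status: first four respondents without the condition, last four with it.
truth = [False] * 4 + [True] * 4
group_a_scores = [4, 9, 15, 17, 19, 22, 25, 28]
group_b_scores = [6, 10, 17, 18, 14, 15, 21, 26]

result_a = sensitivity_specificity(group_a_scores, truth, CUTPOINT)
result_b = sensitivity_specificity(group_b_scores, truth, CUTPOINT)
print(result_a, result_b)
```

In this invented example the same cutpoint performs differently in the two groups, which would argue against criterion equivalence.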
54
Equivalence of Construct Validity
Are hypothesized patterns of associations confirmed in both groups?
– Example: Scores on the Spanish version of the FACT had similar relationships with other health measures as scores on the English version
Primarily tested through subjectively examining pattern of correlations
– Can test differences using confirmatory factor analysis (e.g., through Structural Equation Modeling)
55
Item Equivalence
Differential Item Functioning (DIF)
– Items are non-equivalent if they are differentially related to the underlying trait
– Equivalence indicated by no DIF
Meaning of response categories is similar across groups
Distance between response categories is similar across groups
56
Equivalence of Response Choices: Spanish and English Self-rated Health
English: Excellent | Very good | Good | Fair | Poor
Spanish: Excelente | Muy buena | Buena | Regular | Mala
“Regular” in Spanish may be closer to “good” in English, thus is not comparable to the meaning of “fair”
57
Spanish and English Self-rated Health Responses
English: Excellent | Very good | Good | Fair | Poor
Spanish: Excelente | Muy buena | Buena | Regular (Pasable?) | Mala
Another choice, “pasable,” may be closer in meaning to “fair”
58
Methods for Identifying Differential Item Functioning (DIF)
Item Response Theory (IRT)
Examines each item in relation to underlying latent trait
Tests if responses to one item predict the underlying latent “score” similarly in two groups
– if not, items have “differential item functioning”
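Fitting an IRT model requires specialized software. As a simpler illustration of the same question, the sketch below uses the Mantel-Haenszel procedure, a non-IRT DIF screen that is substituted here for illustration and is not the method named on the slide. Respondents are matched on a score level (stratum), and the common odds ratio compares item endorsement between a reference and a focal group. All counts, labels, and function names are hypothetical.

```python
from collections import defaultdict

def mh_odds_ratio(records):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    records: iterable of (group, stratum, endorsed), where group is "ref" or
    "focal", stratum is the matching score level, and endorsed is a bool.
    """
    # Per stratum counts: [ref yes, ref no, focal yes, focal no]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, endorsed in records:
        column = (0 if group == "ref" else 2) + (0 if endorsed else 1)
        tables[stratum][column] += 1
    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

def expand(stratum, ref_yes, ref_no, focal_yes, focal_no):
    """Turn hypothetical cell counts into individual records."""
    return (
        [("ref", stratum, True)] * ref_yes
        + [("ref", stratum, False)] * ref_no
        + [("focal", stratum, True)] * focal_yes
        + [("focal", stratum, False)] * focal_no
    )

# Item with identical endorsement rates in both groups at each score level.
no_dif_item = expand("low", 10, 30, 10, 30) + expand("high", 30, 10, 30, 10)
# Item the focal group endorses more often at the same score level.
dif_item = expand("low", 10, 30, 20, 20) + expand("high", 30, 10, 36, 4)

or_no_dif = mh_odds_ratio(no_dif_item)
or_dif = mh_odds_ratio(dif_item)
print(or_no_dif, or_dif)
```

An odds ratio near 1 suggests no DIF for that item; values well away from 1 flag the item for closer review. A real analysis would also need a significance test and a decision about how to form the matching score.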
59
Equivalence of Factor Structure
Factor structure is similar in new group to structure in original groups in which measure was tested
– In other words, the measurement model is the same across groups
Methods
– Specify the number of factors you are looking for
– Determine if the hypothesized model fits the data
60
Confirmatory Factor Analysis (CFA)
Can specify a hypothesized structure a priori
Can test mean and covariance structures
– to estimate bias
61
Equivalence of Factor Structure: Testing Psychometric Invariance
Psychometric invariance (equivalence)
Important properties of theoretically-based factor structure (measurement model) do not vary across groups (are invariant)
– measurement model is the same across groups
Empirical comparison across groups
– Not simply by examination
62
Criteria for Psychometric Invariance
Across all groups – a sequential process:
– Same number of factors or dimensions
– Same items on same factors
– Same factor loadings
– No bias on any item across groups
– Same residuals on items
– No item or scale bias AND same residuals
63
Criteria for Evaluating Invariance Across Groups: Technical Terms
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Scalar or Strong Factorial Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances are unbiased
Strict Factorial Invariance: Both scalar and residual criteria are met
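These levels can be written as equality constraints on a multi-group confirmatory factor model. The notation below is the conventional one and is an assumption here, not taken from the slides: for group g, x is the vector of observed item scores, \tau the item intercepts, \Lambda the factor loadings, \xi the latent factors, and \Theta the covariance matrix of the residuals \varepsilon.

```latex
\begin{align*}
x^{(g)} &= \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \varepsilon^{(g)},
  \qquad \Theta^{(g)} = \operatorname{Cov}\!\left(\varepsilon^{(g)}\right) \\[6pt]
\text{Dimensional:} &\quad \text{same number of columns in } \Lambda^{(g)} \text{ for all } g \\
\text{Configural:} &\quad \text{same pattern of zero and nonzero entries in } \Lambda^{(g)} \\
\text{Metric:} &\quad \Lambda^{(1)} = \Lambda^{(2)} = \dots = \Lambda^{(G)} \\
\text{Scalar (strong):} &\quad \text{metric, and } \tau^{(1)} = \dots = \tau^{(G)} \\
\text{Residual:} &\quad \text{metric, and } \Theta^{(1)} = \dots = \Theta^{(G)} \\
\text{Strict:} &\quad \Lambda^{(g)},\ \tau^{(g)},\ \Theta^{(g)} \text{ all equal across groups}
\end{align*}
```

Under this reading, equal intercepts are what make observed mean comparisons unbiased, and equal residual variances are what make observed variance comparisons unbiased.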
64
Interpersonal Processes of Care (IPC)
Social-psychological aspects of the patient-physician interaction
– communication, respectfulness, patient-centered decision-making, and being sensitive to patients’ needs
Developed survey of 92 items based on principles outlined above
65
Conducted Survey
From over 16,000 primary care patients, randomly sampled those who:
– Made at least one visit in prior 12 months
– Were identified in records as African American, Latino, or White (Caucasian)
Sampled within race/ethnic group
66
Sample Size (N=1,664)
383 Spanish speaking Latino
435 African American
428 English speaking Latino
418 White
67
Results
Of the 92 items, 29 had similar factor structure across all 4 groups
– achieved “metric invariance”
68
Results: Metric Invariance Across 4 Groups for 29 Items
69
Seven “Metric Invariant” Scales (29 items)
I. COMMUNICATION
– Hurried communication
– Elicited concerns, responded
– Explained results, medications
II. DECISION MAKING
– Patient-centered decision-making
III. INTERPERSONAL STYLE
– Compassionate, respectful
– Discriminated
– Disrespectful office staff
70
Continued Exploration of Invariance: Item Bias
Tested invariance of model parameter estimates across groups for “scalar invariance”
– Bias in items
71
Obtained Partial Scalar Invariance Across 4 Groups for 18 Items
72
Seven “Scalar Invariant” (Unbiased) Scales (18 items)
I. COMMUNICATION
– Hurried communication: lack of clarity
– Elicited concerns, responded
– Explained results, medications: explained results
II. DECISION MAKING
– Patient-centered decision-making: decided together
III. INTERPERSONAL STYLE
– Compassionate, respectful: (subset) compassionate, respectful
– Discriminated: discriminated due to race/ethnicity
– Disrespectful office staff
73
What to do if Measures Are Not Equivalent in a Specific Study Comparing Groups
Need guidelines for how to handle data when substantial non-comparability is found in a study
– Drop bad or “biased” items from scores
» Compare results with and without biased items
– Analyze study by stratifying diverse groups
The current challenge for measurement in minority health studies
74
Example: 20-item Spanish CES-D in Older Latinos
2 items had very low item-scale correlations, high rates of missing data in two studies
– I felt hopeful about the future
– I felt I was just as good as other people

Version and statistic | Study 1 | Study 2
20-item version: item-scale correlations | -.20 to .73 | .05 to .78
20-item version: Cronbach’s alpha | |
18-item version: item-scale correlations | .45 to .76 | .33 to .79
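Item-scale correlations like those reported above are usually computed as "corrected" correlations: each item is correlated with the sum of the remaining items. The sketch below shows that calculation on invented data in which the last item runs against the rest of the scale. The responses and function names are hypothetical and unrelated to the CES-D studies.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_scale_correlations(responses):
    """Correlate each item with the sum of all other items."""
    n_items = len(responses[0])
    correlations = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]  # scale score without item i
        correlations.append(pearson(item, rest))
    return correlations

# Hypothetical 0-3 responses: five respondents, four items.
responses = [
    [0, 0, 1, 3],
    [1, 1, 1, 0],
    [2, 2, 2, 2],
    [3, 3, 2, 1],
    [1, 2, 1, 3],
]
correlations = corrected_item_scale_correlations(responses)
print([round(r, 2) for r in correlations])
```

An item whose corrected correlation is near zero or negative is a candidate for removal, after which the correlations and alpha would be recomputed on the shortened scale.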
75
Example: Measure Can be Modified
GHAA Consumer Satisfaction Survey
Adapted to be appropriate for African American patients
– Focus groups conducted to obtain perspectives of African Americans
– New domains added (e.g., discrimination/stereotyping)
– New items added to existing domains
Fongwa M et al. Ethnicity and Disease, 2006:16;948-955.
76
Approaches to Conducting Studies When You Are Not Sure
Use a combination of “universal” and group-specific items
– use universal items to compare across groups
– use specific items (added onto universal items) when conducting analyses within one group
» To find a variable that correlates with a health measure within one group
77
Conclusions
Measurement in health disparities and minority health research is a relatively new field - few guidelines
Encourage first steps - test and report adequacy and equivalence
As evidence grows, concepts and measures that work better across diverse groups will be identified
78
Two Special Journal Issues on Measurement in Diverse Populations
Measurement in older ethnically diverse populations
– J Mental Health Aging, Vol 7, Spring 2001
Measurement in a multi-ethnic society
– Med Care, Vol 44, November 2006
79
Homework for Class 3
Using the same template and measure you reviewed for class 2, complete sections 14-21
– Use same file and submit entire document by email to anita.stewart@ucsf.edu