
Medicare Health Outcomes Survey Evaluation

December 27, 2004

FINAL REPORT
Contract #50002MD02


Centers for Medicare & Medicaid Services

Co-Government Task Leaders

Samuel C. Haffer, Ph.D.
Sonya Bowen, MSW

Delmarva Foundation for Medical Care, Inc.

Mark Plunkett, Ph.D.

Roxanne Rodgers, RN

University of Maryland, Baltimore County

Nathaniel Jones III, Ph.D., M.B.A.

Stephanie L. Jones, Ph.D.
Marvin Mandell, Ph.D.

Dave E. Marcotte, Ph.D.
Nancy A. Miller, Ph.D.


Contents

EXECUTIVE SUMMARY ..... iii
    Background ..... iii
    Sources of Information ..... iii
    Key Findings ..... iii
    Key Recommendations ..... iv

MEDICARE HEALTH OUTCOMES SURVEY PROGRAM EVALUATION ..... 1
I. INTRODUCTION ..... 1
    Evaluation Logic Model ..... 2
II. HOS PROGRAM: ORIGINS, MILIEU, AND PROGRAM DESCRIPTION ..... 6
    Introduction ..... 6
    Research Design and Methodology ..... 7
    Measuring Health Care Quality and Performance ..... 8
        Health Outcomes Measurement ..... 8
        Linking Outcomes and Process Measurement ..... 9
        Transitions in Health Care Quality Improvement Initiatives ..... 10
    Health Care Quality Improvement at CMS ..... 11
    Performance Measurement in Medicare Managed Care ..... 13
        Medicare HEDIS ..... 13
        Medicare CAHPS ..... 14
        Quality Assessment and Performance Improvement (QAPI) ..... 14
    The Medicare Health Outcomes Survey (HOS) Program ..... 14
        Early Development ..... 14
        HOS Program Goals ..... 18
        Instrument ..... 18
        HOS Design and Survey Methodology ..... 20
        HOS Partners ..... 20
        Program Administration ..... 22
        HOS Costs ..... 22
    Summary ..... 24
III. HEALTH OUTCOMES SURVEY TECHNICAL PROPERTIES ..... 25
    Introduction ..... 25
    Using the SF-36 for Medicare Managed Care Enrollees ..... 26
        Reliability of the SF-36 in Older Populations ..... 26
            Analytic Approach for HOS Reliability Assessment ..... 28
            Results Related to HOS Reliability ..... 29
            Discussion of HOS Reliability Results ..... 38
        Validity of the SF-36 in Older Populations ..... 39
            Analytic Approach for Assessment of HOS Validity ..... 41
            Results Related to HOS Criterion Validity ..... 42
            Discussion of HOS Validity Results ..... 46
        Attrition ..... 47
            Overview of Analyses to Assess Significance of Attrition in HOS ..... 48
            Results of Analyses to Assess Significance of Attrition in HOS ..... 49
                Demographics ..... 51
                Socioeconomic Status ..... 52
                Health Status and Context ..... 53


            Multivariate Analyses ..... 57
            Discussion of Attrition Results ..... 61
    Statistical Power and Minimum Detectable Effects ..... 62
        Statistical Model and Estimation Procedure ..... 63
        Calculation of Power and Minimum Detectable Effects ..... 66
        Results ..... 68
            Power ..... 68
            Minimum Detectable Effects ..... 71
            Ability to Detect Differences in Performance Among Health Plans ..... 80
            Discussion of Power and Minimal Detectable Effects ..... 80
IV. UTILIZATION OF HOS RESULTS ..... 81
    HOS Dissemination Strategies and Communication Tools ..... 81
    Analytic Approach to Review of Current HOS Use ..... 83
        QIO and M+CO User Surveys ..... 83
        QIO and M+CO Focus Groups ..... 85
        CMS Expert Interviews ..... 85
    Results of Utilization Surveys ..... 85
        Utility of HOS for QIOs ..... 85
        Utility of HOS for M+COs ..... 94
        Utility of HOS for CMS ..... 103
        Utility of HOS for Health Services Researchers ..... 105
    Discussion of How HOS Data Are Currently Used ..... 106
V. PROGRAM ALTERNATIVES ..... 109
    Increasing the Effective Sample Size ..... 109
    Shortening the HOS Instrument ..... 111
    Adding Process Measures to the HOS Instrument ..... 112
    Increasing the Impetus for QIOs and Health Plans to Use HOS Data for Quality Improvement ..... 113
    Discussion of Alternatives ..... 115
VI. SUMMARY AND FUTURE DIRECTIONS ..... 116
    Summary ..... 116
    Future Directions ..... 117

APPENDIX I ..... 120
    List of Interviewees ..... 120
    Example of Stakeholder Interview Questions – Health Services Advisory Group (HSAG) ..... 121
APPENDIX II ..... 123
    CMS Leadership/Technical Experts Central to HOS Development ..... 123
APPENDIX III ..... 124
    M+CO & QIO Survey Instruments ..... 124
    M+CO & QIO Focus Group Agenda and Questions ..... 124
    CMS Users Interview Questions ..... 125

REFERENCES ..... 127


EXECUTIVE SUMMARY

Background

The Medicare Health Outcomes Survey (HOS) program was developed in an era when there was growing concern about the quality of care in managed care organizations. The Medicare HOS is the first national survey to measure the quality of life and functional health status of Medicare beneficiaries enrolled in managed care (Cooper et al., 2001). It is a measure of a health plan’s ability to maintain or improve the physical and mental health of its Medicare beneficiaries over time. The primary goal of the HOS is to gather valid and reliable health status data in Medicare managed care plans for use in quality improvement activities, public reporting, plan accountability, and improving health outcomes. The HOS was first implemented in 1998 using a nonstratified random sample of Medicare beneficiaries enrolled in Medicare+Choice (M+CO) plans. Each HOS respondent receives a follow-up survey two years after completing the baseline survey. To date, seven baseline surveys and five follow-up surveys have been administered.

In the spring of 2003 the Centers for Medicare & Medicaid Services (CMS) funded this evaluation of the Medicare HOS program. The evaluation encompasses three components: 1) a review of the health care quality improvement milieu at the time of the genesis of HOS, a description of the current program, and methods for considering program changes; 2) an evaluation of the properties of the HOS instrument and its operational protocol as well as an assessment of the use of HOS data by M+COs, quality improvement organizations (QIOs), CMS, and health care researchers; and 3) a discussion of the alternatives for improving the HOS program. This report is both a compilation of these components and an analysis of the conclusions on key issues that are raised.

Sources of Information

CMS let a contract for this evaluation study. The Delmarva Foundation administered the contract and synthesized this document. Researchers from the University of Maryland, Baltimore County’s Department of Public Policy and the Maryland Institute of Policy Analysis and Research performed the research and analyses and developed most of the conclusions that form the body of this evaluation study. The following sources of information were used in this evaluation study:

1. Review of literature
2. Cohorts 1 through 5 HOS data
3. Interviews and focus groups with research professionals, QIOs, M+COs, and CMS

Key Findings

In this evaluation study: (a) relevant literature was extensively reviewed; (b) HOS data were analyzed for reliability, validity, attrition, and power; and (c) stakeholders involved in the development or use of the HOS were interviewed. The key research findings are as follows:

1. The HOS was an innovative response to the need for information about the health status of Medicare beneficiaries enrolled in managed care and M+CO plan quality.


2. The HOS measure is reliable for the elderly population.
3. The HOS measure is valid for the elderly population.
4. Most survey attrition is the result of random factors.
5. Whether the HOS has adequate power depends upon the threshold of actual net plan effects deemed important to detect. This threshold cannot be determined on analytical grounds alone. Rather, it depends critically on the clinical and policy implications of actual net plan effects of various sizes.
6. Both QIOs and M+COs use HOS data for quality improvement purposes.
7. There is room for improvement in the extent to which QIOs and M+COs use HOS data for quality improvement.
8. CMS uses the HOS data to assess M+CO quality and reward performance.
9. Health researchers use the HOS data extensively.
10. The HOS program is meeting CMS’s stated goals.

Key Recommendations

The HOS program was found to be meeting its goals; however, areas for improvement were identified. Recommendations are as follows:

1. Consider possible means of increasing the effective sample size of the HOS, if the conclusion is reached that the HOS does not have adequate power. Possible mechanisms include increasing the baseline sample size, increasing the baseline response rate, reducing the attrition rate between the baseline and follow-up surveys, and reducing the amount of missing data.

2. Evaluate the merits of shortening the instrument.
3. Evaluate adding process measures to the HOS.
4. Develop mechanisms to encourage QIOs and M+COs both to use the HOS data and to work together.
5. Consider publicly reporting the HOS data.
6. Develop mechanisms to help M+COs and QIOs understand how to improve the HOS results.
7. Develop ways to raise awareness of HOS products and timelines.
8. Consider allowing M+COs to increase sample sizes at their expense to facilitate M+CO quality improvement projects.


MEDICARE HEALTH OUTCOMES SURVEY PROGRAM EVALUATION

I. INTRODUCTION

Managed care plans are an important source of health care services for Medicare beneficiaries. At present, 5.3 million beneficiaries receive care in these settings, of whom 87% are enrolled in Medicare+Choice (M+CO) plans (renamed Medicare Advantage plans in the fall of 2004) (CMS Data Compendium, 2003). The number enrolled in M+CO plans is projected to increase to 13.6 million by 2010, given the recent passage of the Medicare Prescription Drug, Improvement and Modernization Act (Trustees Report, 2004). Thus, an estimated 30% of Medicare beneficiaries will be enrolled in managed care settings by 2010. Quality of care provided in managed care settings remains a critical and growing concern.

A similar concern about health care quality and rising managed care enrollment was a major impetus when, in the spring of 1997, the Centers for Medicare & Medicaid Services (CMS) sponsored the development of a national health outcomes survey, the Medicare Health Outcomes Survey (HOS), of Medicare beneficiaries enrolled in M+COs. The primary goal of the HOS is to gather valid and reliable health status data in Medicare managed care plans for use in quality improvement activities, public reporting, plan accountability, and improving health outcomes. The HOS has the potential to become a model of government leadership in our nation’s quest to improve the quality of health care for all Americans.

In the spring of 2003 CMS funded an evaluation of the Medicare HOS program. The evaluation encompasses three components: (1) a review of the health care quality improvement milieu at the time of the genesis of the HOS, a description of the current program, and methods for considering program changes; (2) an evaluation of the properties of the HOS instrument and its operational protocol, as well as an assessment of the use of the HOS data by M+COs, quality improvement organizations (QIOs), CMS, and health care researchers; and (3) a discussion of the alternatives for improving the HOS. This report is both a compilation of these components and an analysis of conclusions on the key issues that are raised.


The first component of this report is an examination of the context and current operation of the HOS program. This component includes a detailed examination of the milieu that gave rise to the HOS as well as the process by which the measure was created, and the specific issues the measure was designed to address. Details regarding how the HOS program is administered, together with survey protocol, program participants and their roles, and program costs, are also discussed in this component of the program evaluation.

In the second component of this report, the technical properties of the HOS instrument are evaluated: (a) the reliability and validity of the SF-36®1 (a physical and mental health status measure that comprises the core of the instrument) in the context of its use as a component of the HOS instrument; (b) patterns of attrition; and (c) statistical power.

The third component then assesses policy issues related to turning the HOS data into useful information for health plans, QIOs, CMS, and health care researchers.

The first three components present a number of possibilities and alternatives. The fourth component of this evaluation report is an examination of alternatives for improving the HOS program.

Evaluation Logic Model

To begin consideration of potential improvements to the HOS, it is necessary to first understand the analytic and programmatic contexts within which the HOS data are collected and utilized. For purposes of this analysis, we summarize relevant aspects of this context with the logic model shown in Figure I-1. Of particular concern in this logic model is a set of four primary criteria: 1) costs to CMS; 2) costs to health plans; 3) costs to respondents; and 4) health outcomes of Medicare managed care enrollees. Costs to CMS, health plans, and respondents, respectively, are broadly defined to include both tangible costs (i.e., “out-of-pocket” expenditures) and intangible costs (e.g., staff or respondent time and effort).

Health outcomes are directly affected by: the extent to which consumers exercise choice of health plans; CMS oversight activities; and the extent to which health plans use the HOS results in quality and health outcomes improvement activities. In addition to directly affecting health outcomes, consumer choice and CMS oversight have indirect effects on health outcomes (through their effect on the extent to which health plans engage in quality improvement activities).

1 SF-36® is a registered trademark of the Medical Outcomes Trust.


A central element of the evaluation logic model is the ability of the HOS to detect meaningful performance differences among health plans with respect to their ability to maintain or improve the health status of their Medicare beneficiaries over a 2-year period. This ability is an important determinant of the extent to which the following can be performed effectively: consumer choice among health plans; CMS oversight functions; and health plans’ use of the HOS data in quality and health outcomes improvement activities. It should be noted that the level of competition within a specific market also influences the extent to which consumers can exercise choice. The higher reimbursement rates resulting from the Medicare Prescription Drug, Improvement and Modernization Act of 2003 could lead to greater market competition by increasing the number of Medicare Advantage plans serving the Medicare managed care population.

The ability of the HOS to detect meaningful performance differences among health plans with respect to their ability to maintain or improve the health status of their Medicare beneficiaries over a 2-year period in turn depends on six technical characteristics of the HOS: 1) baseline survey response patterns; 2) patterns of attrition2; 3) patterns of missing data; 4) statistical power; 5) validity; and 6) reliability. In addition to directly affecting the ability of the HOS to detect meaningful performance differences among health plans, the first three of these technical characteristics affect statistical power.
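For illustration only, the following sketch shows how baseline response, attrition, and missing data shrink the effective per-plan sample and thereby enlarge the smallest difference in mean two-year change scores that can be detected between two plans. The rates, the change-score standard deviation, and the simple two-sample normal approximation are all hypothetical assumptions; this is not the estimation procedure used in this evaluation, whose statistical model is described in Section III.

```python
# Illustrative only: a back-of-the-envelope link between the first three
# technical characteristics, effective sample size, and detectable effects.
# All numeric inputs below are hypothetical.
from statistics import NormalDist

def effective_n(baseline_n, response_rate, attrition_rate, missing_rate):
    """Respondents per plan contributing complete baseline and follow-up data."""
    return baseline_n * response_rate * (1 - attrition_rate) * (1 - missing_rate)

def minimum_detectable_effect(n_per_plan, sd, alpha=0.05, power=0.80):
    """Smallest true difference in mean change scores detectable when comparing
    two equal-sized plans (two-sided test, normal approximation)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * (2 * sd**2 / n_per_plan) ** 0.5

if __name__ == "__main__":
    n = effective_n(baseline_n=1000, response_rate=0.65,
                    attrition_rate=0.30, missing_rate=0.05)
    # sd: assumed standard deviation of an individual two-year change score.
    mde = minimum_detectable_effect(n_per_plan=n, sd=10.0)
    print(f"effective n per plan: {n:.0f}")
    print(f"minimum detectable difference in mean change: {mde:.2f} points")
```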

The logic model also posits that the extent to which health plans use the HOS results in quality improvement activities depends on: the understanding and familiarity of health plans and QIOs with the HOS data; the data’s availability; the degree of accountability brought to bear by CMS and consumers; and the resources available to QIOs and health plans for quality and health outcomes improvement activities.

2 Attrition refers to baseline survey respondents for whom a follow-up survey is not available for reasons of death, involuntary disenrollment, voluntary disenrollment, nonresponse at follow-up, or invalid follow-up surveys.


Many of the variables contained in the logic model are discussed in sections III and IV of this report. Specifically, reliability, validity, patterns of attrition, and statistical power are addressed in section III. In addition, M+CO and QIO familiarity with and use of the Medicare HOS results and communication tools are discussed in section IV. Other determinants of the effectiveness of the HOS, including patterns of baseline survey response, patterns of missing data, the level of competition within various market areas, and CMS oversight of M+COs, are beyond the scope of this project, but should nonetheless be kept in mind as CMS further addresses the question of what, if any, changes to the HOS are warranted.


Figure I-1. Alternative Assessment Logic Model
[Figure elements: Consumer choice; CMS oversight; Level of competition within market; Health outcomes; Costs to CMS; Costs to plans; Costs to respondents; Respondent burden; Extent to which plans engage in QI activities; Understanding/familiarity on the part of plans and QIOs; Resources available to QIOs and plans for QI activities; Ability to detect high/low performing plans; Baseline survey response rates; Attrition; Missing data; Statistical power; Validity; Reliability.]
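As a reading aid, the relationships described in the Evaluation Logic Model narrative above can be summarized as a simple dependency map. The sketch below is reconstructed from the surrounding text only; the labels are paraphrased, and it is not presented as a complete rendering of every arrow in Figure I-1.

```python
# Each key lists the factors described in Section I as influencing it
# ("key <- values"). Reconstructed from the prose, not from the figure itself.
LOGIC_MODEL_INFLUENCES = {
    "health outcomes": [
        "consumer choice",
        "CMS oversight",
        "extent to which plans engage in QI activities",
    ],
    "extent to which plans engage in QI activities": [
        "consumer choice",            # indirect path to health outcomes
        "CMS oversight",              # indirect path to health outcomes
        "understanding/familiarity of plans and QIOs with HOS data",
        "availability of HOS data",
        "accountability from CMS and consumers",
        "resources available to QIOs and plans for QI activities",
        "ability to detect high/low performing plans",
    ],
    "consumer choice": [
        "level of competition within the market",
        "ability to detect high/low performing plans",
    ],
    "CMS oversight": [
        "ability to detect high/low performing plans",
    ],
    "ability to detect high/low performing plans": [
        "baseline survey response patterns",
        "patterns of attrition",
        "patterns of missing data",
        "statistical power",
        "validity",
        "reliability",
    ],
    "statistical power": [
        "baseline survey response patterns",
        "patterns of attrition",
        "patterns of missing data",
    ],
}

def upstream(factor, model=LOGIC_MODEL_INFLUENCES, seen=None):
    """All factors that directly or indirectly influence `factor`."""
    seen = set() if seen is None else seen
    for parent in model.get(factor, []):
        if parent not in seen:
            seen.add(parent)
            upstream(parent, model, seen)
    return seen

if __name__ == "__main__":
    # Example: everything in the model that ultimately bears on health outcomes.
    for f in sorted(upstream("health outcomes")):
        print(f)
```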


II. HOS PROGRAM: ORIGINS, MILIEU, AND PROGRAM DESCRIPTION

Introduction

The Medicare HOS program was developed in an era when there was growing concern about the quality of care provided by managed care organizations. Before 1998 (when the HOS was implemented) a growing body of research had begun to focus on the quality of care in managed care settings. In particular, research found that certain vulnerable populations appeared to experience worse health outcomes in managed care relative to the traditional system (Manton et al., 1993; Miller, 1992; Shaughnessy et al., 1994; Ware et al., 1996). The most widely recognized of these studies, the Medical Outcomes Study conducted by Ware and colleagues (1996), found that both the elderly and the poor with chronic health conditions experienced worse physical health outcomes in health maintenance organizations over a 4-year period.

Research findings, work by the Office of the Inspector General (OIG), and federal legislation during the late 1980s and early 1990s spurred the development of a systematic series of activities to evaluate Medicare managed care performance, including initiatives related to quality of care. Following the Balanced Budget Act of 1997 (BBA), the Medicare program enhanced its evaluation of managed care providers by initiating a set of activities to assess both quality of care and beneficiary satisfaction. The HOS is one key component of this ongoing quality assessment. Since its implementation in 1998, the survey has provided an assessment of managed care quality focused on beneficiary health status, an outcome-oriented measure of quality.

Concerns about quality of care within the Medicare managed care program before 1997 mirrored broader national concerns related to the quality of care provided in the U.S. health care system. In the first of a series of reports focused on quality, the Institute of Medicine’s (IOM’s) National Roundtable on Health Care Quality issued the following statement in 1998: “Serious and widespread quality problems exist throughout American medicine. . . . [They] occur in small and large communities alike, in all parts of the country, and with approximately equal frequency in managed care and fee-for-service systems of care. Very large numbers of Americans are harmed as a result” (Chassin and Galvin, 1998). Subsequent reports outlined areas in which substantial change is required (IOM, 2000; IOM, 2001a, 2001b), as well as priority areas for national action (IOM, 2003).

Efforts to improve the quality of health care thus emerged in both the public and private sectors of the American health care system. In this broader effort to improve the quality of health care, the Medicare HOS program is an example of an initiative adopted by the public sector to ensure the proper and effective use of Medicare funds and to improve care (Bhatia and Blackstock, 2000).


The Medicare HOS is the first national survey to measure the quality of life and functional health status of Medicare beneficiaries enrolled in managed care (Cooper et al., 2001). It is a measure of a health plan’s ability to maintain or improve the physical and mental health of its Medicare beneficiaries over time.

As part of its continuing commitment to data-based quality improvement, CMS funded a review of the HOS program. In this portion of the evaluation report, the historical context that led to the development and implementation of the HOS program is explored. Also examined are the administration of the HOS program and its role and function within CMS. A brief overview is provided on the measurement of quality of care and, in particular, the transition from structural and process measures of care to the inclusion of outcomes of care.

Also discussed is the role of quality of care assessment and improvement within a broader strategy of health plan performance measurement (Lied and Kazandjian, 1999). Several national initiatives are noted. As a context for quality assessment and improvement activities within the Medicare program, a number of Medicare provider–specific quality initiatives (e.g., quality initiatives in skilled nursing facilities) are briefly described. These are contrasted to quality measurement within Medicare managed care.

Research Design and Methodology

The information for this portion of the evaluation was gathered through a review of the published health services research literature, national reports (e.g., IOM reports, Agency for Healthcare Research and Quality [AHRQ] publications), CMS documents and reports, and select in-person and telephone interviews with key current and former CMS staff members. (See Appendix I for a list of interviewees.) Interview questions were e-mailed or faxed prior to interviews to ensure adequate preparation by participants (Appendix I). Individual interviews lasted 1 to 1½ hours, while interviews with group partners lasted 2 hours. Interview questions focused on participants’ roles in the HOS program, the environment that led to the HOS’s development, administration of the HOS, and the uses of the HOS results.


Measuring Health Care Quality and Performance

Health Outcomes Measurement

The most widely recognized paradigm in the assessment of quality of care focuses on three factors associated with quality—structure, process, and outcome (Donabedian, 1988). While structural measures such as staffing ratios and medical specialty certification are the furthest removed from quality, the data to measure structural aspects of care are often the easiest to obtain. Process measures focus on actions taken in the delivery of care. On the other hand, outcome measures focus on the end result of medical care using health status measures such as mortality and quality of life. During the HOS development, there was an increased interest in measuring outcomes in health care.

The use of outcome measures in quality improvement efforts stems in part from a desire to focus on the impact of managed care on patients (Ware, 1997). W. Rogers (telephone interview, 2003) noted that one of the motivations to pursue functional status assessment was to inject more awareness of outcomes into the medical care system. Interest shifted from the focus on “treating diseases” to “treating patients.” Likewise, representatives from the Health Assessment Lab (HAL) stated there was an increased emphasis on looking at the whole person, noting the importance of physical and mental health status to individuals (personal communication, 2003).

Traditional health outcome measures have included physiologic measures as well as mortality. More recently, a variety of tools have been developed to measure additional dimensions of health and well-being that are important to individuals (Bierman et al., 2001a). As described by Bierman and colleagues, “health status measures assess the net effect of one or all health problems and treatments on multiple domains of health” (2001b). Health status measures can be generic in nature or disease specific. Generic measures should consider function across a number of domains and be applicable to various health states (2001b).


Linking Outcomes and Process Measurement

Although outcomes measurement has been the focus of the HOS since its early development, there has been a renewed interest in examining the interaction between patient outcomes and the health care system and/or process variables. Understanding the manner in which process and outcome measures interact provides a basis for the development of delivery systems that lead to better patient outcomes. Chassin (1997) maintains that process measures are valid only when they are linked to health outcomes and that, in order for a health outcome to be a valid quality measure, it must be linked to a process or a set of processes that can be altered to enhance the outcome. The Institute of Medicine (IOM) has called for greater attention to be given to the development of processes of care for common chronic conditions (2001a; as cited in IOM, 2003), presumably leading to improved health outcomes. In an article detailing quality improvement activities in the State Children’s Health Insurance Program (SCHIP), Halfon et al. (1999) assert that quality improvement efforts should involve linking specific management processes to measured outcomes, implementing changes to those processes, and monitoring the effects of those changes. Brook, McGlynn, and Cleary (1996) also noted that valid outcome measures needed to be associated with processes of care, as well as structural aspects of care.

Sheingold and Lied (2001) view the assessment of health outcomes as a component of a broader system of health plan performance measurement. Modifying and expanding Donabedian’s framework, they identify several measurement dimensions:

1. Structure—measures that indicate the potential that appropriate services provided by experienced providers will be available when needed by patients.

2. Process—measures that indicate the degree to which services are available and provided according to best evidence available and needs of patients.

3. Consumer experience and acceptability—the degree to which patients are satisfied with their care.

4. Outcomes—indicators of actual changes in health status and functioning.


Transitions in Health Care Quality Improvement Initiatives

The release of a study in 1990 by Berwick and colleagues detailing the findings of a national demonstration project on quality improvement in health care represented a shift in quality assurance activities. As described by Jencks and Wilensky (1992), these new models of quality improvement “focus on improving the processes of producing typical care rather than using inspection to correct unusual errors...these quality management models suggest that we should inspect care in order to identify patterns of errors, not to correct individual errors.” Subsequently, a number of performance measurement efforts focusing on processes and outcomes of care were implemented by organizations including: the National Committee for Quality Assurance (NCQA); the Joint Commission on Accreditation of Healthcare Organizations (JCAHO); the Foundation for Health Care Accountability; IOM; AHRQ; and CMS. In particular, as described below, CMS requires Medicare managed care organizations to participate in performance measurement activities that include the reporting of Health Plan Employer Data and Information Set (HEDIS®3) data, participation in the Medicare HOS program, and quality improvement activities through quality assessment and performance improvement projects (McIntyre et al., 2001).

Adding to the ferment about quality of care was the great uncertainty regarding the quality of care that managed care plans would provide (Ginsburg and Lesser, 1999), along with findings from the 1996 Medical Outcomes Study (Ware et al., 1996). These findings showed that “among the elderly and disadvantaged, there was some solid evidence of worse health outcomes in health maintenance organizations” (Rogers, personal communication, 2003). HAL representatives (personal communication, 2003) noted that “there was a lot of publicity surrounding the 1996 Medical Outcomes Study article.” One outcome, as John Ware testified before Congress, was a “heightened interest in establishing effective mechanisms for measuring and, ultimately, ensuring quality of health care in managed care organizations.”

3 HEDIS is a registered trademark of the National Committee for Quality Assurance.


This research and the transition in thinking about care quality, which had begun before the implementation of the HOS program, have continued. Particularly influential is the work of the IOM. As discussed by Fineberg (IOM, 2003), the IOM’s quality initiative has included three phases. In the first phase, the IOM convened the National Roundtable on Health Care Quality, whose article “The Urgent Need to Improve Health Care Quality” highlighted “serious and pervasive problems in quality of care nationally” (IOM, 2003). Between 1999 and 2001, two reports were released in the second phase of the IOM’s efforts: To Err Is Human and Crossing the Quality Chasm. These reports called for “drastic redesign of our health care delivery system to narrow the gap between the best clinical practices and the usual practices today” (IOM, 2003). Most recently, the IOM released its report Priority Areas for National Action: Transforming Health Care Quality (IOM, 2003). In the report, the committee identified 20 priority areas for quality improvement.

Health Care Quality Improvement at CMS

As thoughts regarding health care improvement shifted in the late 1980s and early 1990s, health care quality improvement efforts began to change throughout the various national health programs, including those that CMS administers. Early CMS quality improvement efforts involved the implementation of federal quality assurance standards in various health care facilities such as hospitals and nursing homes. Through contracts with state Peer Review Organizations (PROs—now Quality Improvement Organizations, or QIOs), CMS sought to ensure that the health care provided to Medicare beneficiaries was appropriate, was medically necessary, and met established clinical standards (Jencks and Wilensky, 1992).

In 1992, CMS redesigned its approach to enhancing quality of care for Medicare beneficiaries (Jencks and Wilensky, 1992). The Health Care Quality Improvement Program (HCQIP) was started in early 1993 with the goal of moving away from handling individual clinical errors to assisting providers in enhancing the overall care of beneficiaries (Jencks, personal communication, 2003). One major component involved PROs using clearer and more standardized national criteria to assess care and outcomes. Such efforts had the goal of producing measurable improvements in care and outcomes for Medicare beneficiaries.


Several CMS quality measures have emerged, including the Medicare HOS. These measures are designed for specific health care settings such as skilled nursing facilities, home health agencies, hospitals, and managed care organizations. Although dates of implementation have varied, the development of many of these measures was on a parallel track throughout the 1990s. According to Kang (personal communication, 2003), End Stage Renal Disease (ESRD) and the Minimum Data Set “were up and running on a parallel track along with HOS during the developmental stages. Home health and the Outcome and Assessment Information Set (OASIS) came later, but the OASIS system was consistent with the approach followed by HOS.” Building from this developmental work, CMS’s Quality Initiative was instituted in November 2002 for nursing homes and expanded to home health agencies and hospitals in 2003.

A fee-for-service version of the HOS was pilot tested in 1998. This pilot was part of CMS’s larger goal to develop performance measures across health care providers and settings. Additionally, the BBA required providing information to beneficiaries comparing the effectiveness of care between Medicare fee-for-service and coordinated care plans. The primary goals of the HOS fee-for-service project were to: 1) study the feasibility and applicability of developing managed care-like HEDIS performance measures for fee-for-service physicians in group practices; and 2) use longitudinal estimates of self-reported health status for evaluating care provided to Medicare fee-for-service beneficiaries. Results of the pilot suggested that health status scores were not appropriate for identifying high and low performing physicians. However, the results of the pilot helped further CMS’s efforts to measure quality in the fee-for-service sector (www.cms.hhs.gov/surveys/hos).

An outgrowth of the HOS fee-for-service pilot is the Physician Focused Quality Initiative. This initiative includes several demonstration projects, including the Doctor’s Office Quality (DOQ) Project and the Doctor’s Office Quality Information Technology (DOQ-IT) Project. For example, through the DOQ Project, CMS is working with stakeholders to develop and test the Doctor’s Office Quality measurement set. This measurement set has three components: 1) a clinical performance measurement set; 2) a practice system assessment survey; and 3) a patient experience of care survey (www.cms.hhs.gov/quality/pfqi.asp).


Performance Measurement in Medicare Managed Care

The BBA also required that the Medicare program enhance its evaluation of M+CO organizations. This legislation initiated a set of activities to assess both quality of care and beneficiary satisfaction. Activities related to quality assurance reflect the previously described movement toward continuous quality improvement. As stated previously, CMS requires M+COs to participate in performance measurement activities that include the reporting of HEDIS data, quality improvement activities through quality assessment and performance improvement (QAPI) projects, and participation in the Medicare HOS program, as well as participation in the Consumer Assessment of Health Plans Study (CAHPS®4). Thus, HOS is a key component of this broader Medicare managed care performance measurement system. Before reviewing the development of the HOS program, we briefly describe the other components of CMS’s performance measurement system in managed care.

Medicare HEDIS

The NCQA develops and maintains HEDIS, guided by its Committee on Performance Measurement (CPM), which includes purchasers, consumers, managed care organizations, policy makers, and providers (NCQA, 2003b). HEDIS is the most widely used set of performance measures in managed care. NCQA’s CPM is supported by a group of technical experts who guide and oversee the scientific development of HEDIS, which includes a total of 52 measures covering eight domains of care. Public purchasers, regulators, and consumers use HEDIS for improving quality and developing consumer report cards. Additionally, HEDIS is useful to health management systems focused on plan accountability (NCQA, 2003b).

In 1996, CMS contracted with NCQA to develop HEDIS measures for assessing performance in the Medicare program. Some measures, such as breast cancer screening, are common measures across the HEDIS measurement sets for Medicare, Medicaid, and commercial plans. Others, such as the provision of flu shots for older adults, are limited to the Medicare HEDIS. The 2004 Medicare HEDIS measurement set comprises 33 measures covering five domains: 1) Effectiveness of Care; 2) Access/Availability of Care; 3) Health Plan Stability; 4) Health Plan Descriptive Information; and 5) Use of Services. The Medicare HOS is an Effectiveness of Care measure (NCQA, 2004).

4 CAHPS is a registered trademark of the Agency for Healthcare Research and Quality (AHRQ).


Medicare CAHPS

The Medicare CAHPS effort consists of three components: 1) the CAHPS survey of managed care enrollees; 2) a managed care disenrollment survey; and 3) a Medicare fee-for-service survey. The Medicare CAHPS survey was an extension of a survey developed for use with any population. In 1996, CMS funded a CAHPS consortium to develop a survey specifically for enrollees in Medicare managed care plans to assess their health plan experience. The Medicare Managed Care CAHPS draws core survey items from the original general population survey and adds items, queried only of Medicare enrollees, regarding ease of obtaining needed medical equipment, therapy services (e.g., physical therapy) and advice related to quitting smoking (Goldstein et al., 2001). The Medicare managed care CAHPS survey is conducted annually.

Quality Assessment and Performance Improvement (QAPI)

The Quality Assessment and Performance Improvement (QAPI) program was also developed in response to the BBA. The QAPI was designed to ensure that M+COs emphasized quality assurance initiatives and actively created programs that improved health outcomes and the satisfaction of Medicare beneficiaries enrolled in health plans. Through the QAPI program, M+COs must provide evidence that there is a continuous quality assessment and improvement program in their organization. Additionally, ongoing evaluations are conducted on the program to ensure that the program is effective and that necessary modifications are made when appropriate. QAPI projects have included the National Diabetes Project, the National Pneumonia Project, the National Project on Congestive Heart Failure, and the National Breast Cancer Screening Project. The most recent projects have included the Clinical Health Care Disparities and Culturally and Linguistically Appropriate Services in 2003, and a return to Diabetes in 2004 (www.cms.hhs.gov/healthplans/quality).

The Medicare Health Outcomes Survey (HOS) Program

Early Development

The development of the HOS in the 1990s reflected a synergism of factors including: 1) a recognized need to monitor the performance of managed care plans; 2) burgeoning technical expertise and advancement in the areas of quality measurement and health outcomes assessment; 3) the existence of a tested functional health status assessment tool (SF-36), which was valid for an elderly population; 4) CMS leadership; and 5) political interest in quality improvement.


Following a national trend, by the mid-1990s, Medicare beneficiaries were joining health maintenance organizations in significant numbers (Haffer et al., 2003). However, there was uncertainty regarding the quality of care that managed care plans would provide (Ginsburg and Lesser, 1999).

While many process and clinical care performance measures existed in the 1990s, appropriate measures of health outcomes were noticeably lacking among them. Jencks (personal communication, 2003) indicated that “CMS needed to develop an appropriate health outcomes measure because one did not already exist and there was not likely to be one developed by someone other than CMS in the near term.” Stevic (personal communication, 2003) noted that, “coming from a quality measurement background, there were no patient-based measures, no measures of function, no patient feedback.” Because of this need for more patient-based outcome measures, several projects were launched both within and outside CMS. For example, in 1990, CMS conducted a pilot project to determine whether it was feasible to build a comprehensive Medicare dataset that included SF-36-like information gathered at the time persons enrolled in Medicare (Stevic, personal communication, 2003).

Similarly, the Department of Veterans Affairs (VA) version of the SF-36 was developed and piloted shortly afterward, in 1993 (Kazis, personal communication, 2003). During this time, there was increasing financial support within the VA to conduct patient-centered assessments. As a result, key individuals within the VA began to adopt such measures for quality improvement activities (Kazis, personal communication, 2003).

Building on the work previously described, in 1996, CMS partnered with NCQA to develop a functional health status assessment tool to be used as a health plan performance measure for Medicare. The Medicare HOS was the resulting instrument. According to Kang (personal communication, 2003), functional health status was selected as CMS’s outcome measure of choice because it was of greatest interest to Medicare beneficiaries. Paul (personal communication, 2003) stated that the decision to use functional health status was the right one, reflecting the agency’s strategic objective of being patient-centered. Additionally, according to Bierman, Haffer, and Wang (2001a) there was evidence that complex issues such as chronic illness, comorbidity, and functional impairment are particular problems among older individuals and present distinct challenges to this population. CMS believed that a functional status measure was an important outcome of care, because of the complexities involved in the health of older adults and because of the growing concern about how managed care would affect these conditions and this population’s subsequent functional status (Bierman, Haffer, & Wang, 2001a).


Although the planning and development of HOS began in 1996, it was the 1997 Balanced Budget Act that gave HOS a clear mandate. Among the BBA’s requirements was the establishment of quality requirements for health plans enrolling Medicare and Medicaid beneficiaries. The Act authorized performance measurement reporting requirements for Medicare managed care to promote quality improvement. These statutorily mandated provisions were implemented in the regulations at 42 CFR 417.470 et seq. They reference 42 CFR 417.126(a), which states that each contracted managed care plan must have an effective procedure to develop, compile, evaluate, and report to CMS, to its enrollees, and to the general public, developments in the health status of its enrollees to the extent practical.

CMS established a technical expert panel (TEP) to guide the development, implementation, and operations of HOS (www.cms.hhs.gov/surveys/hos). TEP membership has always consisted of individuals with specific expertise in the health care industry and outcomes measurement. (See Appendix II for a list of CMS leadership and technical experts central to the HOS’s development.)

The TEP recommended that the HOS become a component of HEDIS and that the SF-36 be used as the foundation of the HOS instrument. As previously noted, the SF-36 had been used in the Medical Outcomes Study (Ware et al., 1996) and also had a history of use in estimating relative disease burden for numerous conditions (NCQA, 2003b). The SF-36 has been widely used in the general population, as well as specific populations such as the elderly and younger people with disabilities (NCQA, 2003b). The instrument also had been used to determine the effectiveness of various treatments such as hip and knee replacement and heart valve surgery (Kantz et al., 1992; Phillips and Lanka, 1992), as well as the burden of various illnesses on populations (Turner-Bowker et al., 2002).

Commenting on the selection of the SF-36 for the HOS, HAL representatives (personal communication, 2003) noted that the SF-36 was the most widely used health status measure at the time of HOS’s development and was found to be reliable and valid for use with older individuals and other demographic groups. The SF-36 had good psychometric properties and had been peer-reviewed. Additionally, both a user’s manual and normed SF-36 data were available.

Although the SF-36 was the most widely used generic health status measure at the time of developing the HOS instrument, debate existed about its use with older individuals (Hayes et al., 1995; Johansen, 1993; Lyons et al., 1993). As discussed by McHorney (1996), there are "practical and psychometric issues that can differentially affect the performance of generic health status tools in cross-sectional and longitudinal studies of elderly persons, including mode-of-administration effects, floor and ceiling effects, and score stability versus internal consistency." Two additional issues are relevant for longitudinal studies: (1) selective mortality; and (2) proxy reports (McHorney, 1996). Bierman and colleagues (2001a) raised similar concerns related to administration and survey response. For example, while the SF-36 takes approximately ten minutes to complete, it likely takes longer for participants with less than a high school education (McHorney, 1996), older participants (Sherbourne and Meredith, 1992), and participants with cognitive impairments (McHorney et al., 1990). The effect these differences may produce is unclear (Cooper and Kohlmann, 2001) and will be important to address in the broader HOS evaluation.

One of the key factors facilitating the HOS’s development was the leadership of CMS officials and collaborators. Several individuals emphasized the importance of having support and the necessary tools available within CMS, and in the greater health care community, to nurture the development of HOS (HAL, personal communication, 2003; Jencks, personal communication, 2003; Kang, personal communication, 2003; Stevic, personal communication, 2003). Jencks (personal communication, 2003) noted that there was a great level of enthusiasm from a number of people who really wanted to see the HOS program succeed. Kang (personal communication, 2003) perceived a high level of support for the HOS within NCQA’s CPM. In the context of developing Medicare HEDIS, the majority of CPM members wanted to expand and include an outcome measure. According to Stevic (personal communication, 2003), the necessary technical expertise and interest in outcomes was evident early in the process.

CMS leadership felt that an emphasis on the use of outcomes-oriented performance data would be consistent with the goals of value-based purchasing (Sheingold and Lied, 2001). During the Clinton administration, the agency thought of itself as a beneficiary-centered, value-based purchaser of services (Haffer, personal communication, 2003). The agency sought to assess the value that it was receiving for the services it was purchasing on behalf of the beneficiaries. Performance measures provided CMS a means both to examine what was being spent on care and to compare plans based on performance.

Broader political support was also important to the implementation of the HOS program. In addition to many aspects of the BBA previously noted, the Act authorized performance measurement reporting requirements for Medicare managed care to promote quality improvement. Both the Office of Inspector General (1997) and Bailit (1997a, b) released reports encouraging CMS to adopt a performance assessment model that emphasized the use of outcome-oriented data.

HOS Program Goals

Goals for a program that spans a number of years (as the HOS does) tend to evolve. It is, therefore, probably most worthwhile to note the current statement of goals (Haffer and Bowen, personal communication, 2004), because program success tends to be measured by how well current goals are met, with only minor reflection on past objectives. In the next two components, we examine whether these goals are being met. Haffer and Bowen (2004) state that "the goal of the HOS program has been to gather valid, reliable, and clinically meaningful data that can be used by:

1. M+COs, providers, and QIOs to monitor and improve health care quality
2. CMS to assess the performance of M+COs and reward high performers
3. Medicare beneficiaries, their families, and advocates when making health care purchasing decisions
4. Health researchers to advance the state-of-the-science in functional health outcomes measurement, and quality improvement interventions and strategies."

Instrument

The HOS instrument consists of three primary components: 1) the SF-36; 2) case-mix and risk-adjustment questions, including activities of daily living (ADL) and chronic health conditions; and 3) demographic and other questions required by the BBA (NCQA, 2003b). The SF-36, as previously noted, is a multipurpose short-form general health survey, which provides eight scale scores of physical and mental health properties, as well as two aggregated summary measures of physical and mental health status, the Physical Component Summary (PCS) score, and the Mental Component Summary (MCS) score. The eight scales represent the most frequently used measures of physical and mental health and are described in Table II-1.

Table II-1. SF-36 scale measures.

Scale | Measures/Questions
Physical Component Summary (PCS score) | Summary measure which includes: PF, RP, BP, VT, SF, RE, MH, and GH.
Physical Functioning (PF) | Ten questions ask respondents to indicate the extent to which their health limits them in performing physical activities.
Role-Physical (RP) | Four questions assess whether respondents' physical health limits them in the kind of work or usual activities they perform.
Bodily Pain (BP) | Two questions determine frequency of pain and extent to which pain interferes with normal activities.
General Health (GH) | Five questions ask respondents to rate their current health status overall, their susceptibility to illness, and their expectations for health in the future.
Mental Component Summary (MCS score) | Summary measure which includes: VT, MH, RE, PF, RP, BP, GH, and SF.
Vitality (VT) | Four questions ask respondents to rate their well-being by indicating how frequently they experience energy and fatigue.
Mental Health (MH) | Five questions ask respondents how frequently they experience feelings representing 4 major mental health dimensions.
Role-Emotional (RE) | Three questions assess whether emotional problems have caused respondents to accomplish less in their work or other usual activities in terms of time and performance.
Social Functioning (SF) | Two questions ask respondents to indicate limitations in social functioning due to health.
(Source: Ware and Kosinski, 2001)

The responses to the SF-36 are combined with different weights based on standardized 1998 norms to produce PCS and MCS scores (Ware et al., 2003; Ware and Kosinski, 2001). Because each beneficiary is measured twice, at baseline and again at follow-up, the beneficiary serves as his/her own control. Given that not all factors that could affect the change in a beneficiary’s health status are constant, plan-to-plan comparisons of health outcomes are case-mix and risk adjusted for socio-demographic and clinical characteristics, including chronic conditions, age, ethnicity, gender, and education. Using scoring algorithms consisting of logistic regression models for risk adjustment and missing data, changes in individual-level two-year physical and mental health status are determined as: better than expected, same as expected, or worse than expected. The risk-adjusted outcomes are aggregated across beneficiaries for each plan to provide a plan-specific performance assessment (NCQA, 2003b).
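For illustration only, the general shape of this scoring and classification logic can be sketched as follows. The function names, weights, and chance threshold below are placeholders; the actual NCQA/QualityMetric scoring coefficients, risk-adjustment models, and missing-data rules are more detailed and are not reproduced here.

```python
# Illustrative sketch of HOS-style summary scoring and change classification;
# all coefficients and thresholds are placeholders, not the NCQA algorithm.

SCALES = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]

def summary_score(scale_scores, norm_means, norm_sds, factor_weights,
                  t_mean=50.0, t_sd=10.0):
    """PCS- or MCS-style summary: z-score each scale against the 1998 norms,
    combine with factor score coefficients, and transform to a T-score."""
    z = {s: (scale_scores[s] - norm_means[s]) / norm_sds[s] for s in SCALES}
    composite = sum(factor_weights[s] * z[s] for s in SCALES)
    return t_mean + t_sd * composite

def classify_change(follow_up_score, expected_score, chance_threshold):
    """Classify a beneficiary's two-year change relative to the expected
    (case-mix and risk-adjusted) score; the threshold stands in for the
    'expected by chance' criterion used in the actual methodology."""
    diff = follow_up_score - expected_score
    if diff > chance_threshold:
        return "better than expected"
    if diff < -chance_threshold:
        return "worse than expected"
    return "same as expected"
```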

To assess M+CO performance, a four-stage method of data analysis is employed. In the first stage, each beneficiary is classified as to whether his or her PCS and MCS 2-year change scores, respectively, are “worse than expected,” “the same as expected,” or “better than expected.” In the second stage, each beneficiary is classified according to whether his or her follow-up scores differ more than would be “expected” by chance. Then, the expected probability of being better or worse is estimated using statistical models. The third and fourth stages calculate plan-level results. In particular, in the third stage, the mean expected rate of various categories, such as “alive and PCS score same or better,” as well as the actual rate of those categories, is calculated. The differences between the corresponding actual and expected rates are then calculated. In the final stage, the statistical significance of plan-level differences is assessed. The variables on which statistical significance is assessed are “alive and PCS score same/better” and “MCS score same/better.” They were determined in advance as the key outcomes of interest (NCQA, 2003a).
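A minimal sketch of the plan-level aggregation in the third and fourth stages is given below, assuming each beneficiary has a 0/1 observed outcome (e.g., "alive and PCS score same/better") and a model-based expected probability of that outcome. The normal-approximation significance test is a simplification for illustration, not the NCQA methodology.

```python
import numpy as np
from scipy import stats

def plan_performance(observed, expected):
    """Compare a plan's actual rate with its expected (risk-adjusted) rate.

    observed -- 0/1 outcomes for each sampled beneficiary in the plan
    expected -- each beneficiary's model-based probability of that outcome
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    actual_rate = observed.mean()
    expected_rate = expected.mean()
    difference = actual_rate - expected_rate
    # Standard error of the expected rate under the model (simplified).
    se = np.sqrt((expected * (1.0 - expected)).sum()) / len(expected)
    z = difference / se
    p_value = 2.0 * stats.norm.sf(abs(z))
    return actual_rate, expected_rate, difference, p_value
```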

HOS Design and Survey Methodology

HOS is a longitudinal cohort survey administered to a nonstratified random sample of 1,000 Medicare beneficiaries enrolled in M+COs from each applicable Medicare contract-market at baseline (NCQA, 2003b). Beneficiaries are deemed eligible if they were continuously enrolled for at least six months in the same plan and do not have ESRD. In plans with fewer than 1,000 Medicare beneficiaries, all eligible enrollees are surveyed. The survey is administered at baseline, and again two years later to those who remain enrolled in their same health plan; a new baseline cohort is chosen each year (Cooper et al., 2001). The surveys are mailed to selected participants, with a series of reminder postcards and telephone contacts used to increase the survey response rate. Since 1998, seven baseline and five follow-up surveys have been fielded.
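The baseline sampling rule described above can be summarized in a few lines of code; the sketch below is illustrative only, and the enrollee field names (months_enrolled, has_esrd) are assumed rather than drawn from the actual HOS sample frame.

```python
import random

def select_baseline_sample(enrollees, sample_size=1000, seed=None):
    """Simple random sample of eligible enrollees for one Medicare contract:
    at least six months of continuous enrollment and no ESRD; a census is
    taken when fewer than sample_size enrollees are eligible."""
    eligible = [e for e in enrollees
                if e["months_enrolled"] >= 6 and not e["has_esrd"]]
    if len(eligible) <= sample_size:
        return eligible
    rng = random.Random(seed)
    return rng.sample(eligible, sample_size)
```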

HOS Partners

CMS, responsible for leadership, oversight, coordination, and successful implementation of the national Medicare HOS Program, contracted with organizations that had skills and experience in the areas of health outcomes and performance measurement. The team of HOS partners and their responsibilities are described below in Table II-2.

Table II-2. HOS partners: Roles and responsibilities.

HOS Partners | Role & Responsibilities
Health Assessment Lab (HAL) & QualityMetric (QM) | Under subcontract with NCQA, HAL and QM staff and consultants have collaborated in developing the HOS survey form, designing the HOS case-mix adjustment methodology, studying the psychometric properties of the HOS survey, and translating the HOS form into Spanish and Chinese.
Health Services Advisory Group (HSAG) | Under contract with CMS, HSAG performs HOS data cleaning and analysis, develops and disseminates data files and reports, educates data users and stakeholders on the HOS findings and applications, and conducts applied research with the HOS data to support CMS priorities.
Boston University/Health Outcomes Technologies Program (HOT) | Under subcontract with NCQA, Boston University compares health outcomes between Medicare managed care, using the HOS data, and the Veterans Health Administration, using data from the Veterans versions of the SF-36® (Veterans SF-36) and SF-12®5 (Veterans SF-12) health surveys. The analyses include psychometric comparisons of the SF-36 between the HOS and VA and comparisons of the disease burden of patients seen in Medicare managed care and veterans seen in the VA system of care.
National Committee for Quality Assurance (NCQA) | Under contract with CMS, NCQA implements the HEDIS Medicare HOS, which includes managing the data collection and transmittal of the HOS, supporting the development and standardization of the HOS measure, annually certifying and evaluating HOS vendors, and conducting ongoing quality assurance of the survey process.
Research Triangle Institute (RTI) International | Under subcontract with NCQA, RTI International is involved in the sample selection for each round of the Medicare HOS; the development, fielding, and analysis of the HOS for use in special plans that target frail beneficiaries; the development of frailty adjusters for payment using the HOS data; and the calibration of Medicare costs associated with the HOS measures. RTI also piloted the HOS in fee-for-service samples for comparisons to managed care.
(Source: www.cms.hhs.gov/surveys/hos)

5 SF-36® and SF-12® are registered trademarks of the Medical Outcomes Trust.

Program Administration

CMS and its partners perform the following tasks as part of the HOS program: 1) support the technical/scientific development of the HOS measure; 2) certify survey vendors; 3) collect HEDIS HOS data; 4) clean, score, and disseminate annual rounds of the HOS data, public use files and reports to CMS, QIOs, M+COs, and other stakeholders; 5) train M+COs and QIOs in the use of functional status measures and best practices for improving care; 6) provide technical assistance to CMS, QIOs, M+COs and other data users; and 7) conduct analyses using the HOS data to support CMS and the Department of Health and Human Services (DHHS) priorities (www.cms.hhs.gov/surveys/hos).

Plans contract with an NCQA-certified vendor to conduct the survey once the sample has been selected by the Research Triangle Institute (RTI) International and approved by CMS. Vendors receive HOS survey administration training annually from NCQA. Vendors then administer the surveys in accordance with the applicable NCQA HEDIS protocol. Once the survey data have been collected, they are submitted to NCQA for consistency review. The data are then submitted to the Health Services Advisory Group (HSAG) for cleaning. CMS scores the data and then returns them to HSAG for aggregation and analysis. Following this process, HSAG develops and disseminates data files and reports to CMS, QIOs, and the health plans (NCQA, 2003b).

HOS Costs

Since the inception of the Medicare HOS program in 1998, CMS has, on average, spent more than $2.1 million per year to fund the HOS program. The HOS is funded from the QIO program budget (Bowen, personal communication, 2004b). CMS administers two contracts to facilitate the administration of the HOS program. CMS has a contract with HSAG for data cleaning, analysis, dissemination, technical support, education, and applied research. The HSAG contract has averaged $1,077,000 per year (Bowen, personal communication, 2004b). In addition, CMS contracts with NCQA for the implementation of the HEDIS Medicare HOS measure, which includes sample selection, management of the data collection and transmittal process, continued development and standardization of the HOS measure, annual certification and evaluation of HOS vendors, and ongoing quality assurance of the survey process. The NCQA contract has been $750,000 each year since 1998 (Bowen, personal communication, 2004b). In addition to the contract costs, CMS pays salaries and benefits, on average, for 2.2 GS 13/14-level personnel, with an average annual cost of $219,000 (Bowen, personal communication, 2004b).

Health plans are responsible for the costs associated with fielding the HOS survey. On average, fielding the surveys (baseline and follow-up) costs each plan no more than $21,000 annually (Bowen, personal communication, 2003). It should also be noted that NCQA charges the HOS vendors to obtain their certification (New Application Fee = $500, Annual Training Seminar Fee = $3,000, and Sample Fee = $550) (NCQA, personal communication, 2004a). Further, NCQA charges for the HEDIS Volume 6 manuals ($90/hard copy) (Bowen, personal communication, 2003). Finally, because of arrangements that CMS has with QualityMetric, the proprietary nature of the SF-36 Version 1 does not result in additional program costs for the use of the SF-36 and its scoring algorithms. However, the cost of instrument use and scoring is a critical consideration in the assessment of alternatives to the SF-36 Version 1.

On average, the HOS costs Medicare beneficiaries approximately 20 minutes of their time (Bowen, personal communication, 2003). Although not excessive by research standards, this may be somewhat burdensome to HOS respondents because of the physical and mental limitations associated with this older population. Study results have shown that elderly persons, individuals with less than a high school education, and individuals with cognitive impairments need more time to complete general health surveys and have higher levels of nonresponse, selective mortality (attrition), and proxy reporting (McHorney, 1996; McHorney et al., 1990; Sherbourne and Meredith, 1992). In our statistical analysis of attrition, discussed later, we found that follow-up nonrespondents were more likely to be in poorer physical and mental health. In addition, those with less than a high school education and poorer health status were more prone to attrition, as stated above. CMS also has received feedback from some Medicare beneficiaries indicating that the HOS is too long.

Summary

Development of the HOS program can be attributed to several factors. Increased enrollment in managed care, coupled with research findings in the 1990s, led to heightened concern with the quality of care provided in these settings. The technical expertise in the areas of performance assessment, quality improvement, and health outcomes measurement, developed in part through research studies, was available to CMS. CMS leadership was also critical, as was broader political support through the requirements of the BBA. The approach to measuring quality in the HOS, with its focus on health outcomes, reflected a more general movement toward assessing participants' health outcomes to ascertain quality. The system of Medicare health plan performance measurement, of which HOS is a central component, mirrored broader trends in quality assessment toward continuous quality improvement. This trend was reflected in a variety of CMS and private sector quality initiatives.

The HOS measure itself is an outgrowth of this combination of trends. CMS worked with the leading experts in quality measurement, drawing on the most current thinking about how health care quality should be measured. The measure is administered by survey vendors selected (and supervised) by one of the leading health quality assurance organizations.

In 1998, the first HOS baseline survey was conducted. To date, seven baseline and five follow-up surveys have been fielded (although much of the analysis in the program evaluation has been done on the five baseline and three follow-up surveys available when the evaluation project began). Data from these surveys have provided plans with the opportunity to monitor the health status of their enrollees, as well as to develop strategies to improve care. The HOS data have been combined with additional plan data such as HEDIS measures and disenrollment rates to monitor plan performance. HOS data have been used to develop payment methodologies and have supported a broad array of research studies.

There are HOS program cost implications for CMS, M+COs, and beneficiaries. Since the inception of the Medicare HOS program in 1998, CMS has, on average, spent more than $2.1 million per year to fund HOS. The cost to M+COs has been, on average, $21,000 annually for fielding the surveys (baseline and follow-up). NCQA charges the HOS vendors fees, and these fees are passed through to M+COs. The cost of the HOS to Medicare beneficiaries is, on average, approximately 20 minutes.

III. HEALTH OUTCOMES SURVEY TECHNICAL PROPERTIES

Introduction

Every performance measure has a quantifiable set of properties: reliability, validity, attrition, power, and minimal detectable effects. It is the nature of these qualities in relation to the program goals that determines whether a given measure supplies worthwhile information; it is these qualities on which a measure is judged. Unless the quantifiable properties of the measure meet certain standards, the measure cannot provide useful information to potential users. The overall HOS goals (as stated previously) are to provide valid and reliable information for M+COs, providers, CMS, beneficiaries, and health researchers to draw on to meet their quality improvement, performance assessment, or research objectives.

Many of the HOS end users (CMS and M+COs in particular) use the HOS data to assess the extent to which M+COs improve health care quality and, in particular, maintain or improve physical and mental health of their Medicare beneficiaries over time. The ability of the HOS to detect meaningful differences among plans with respect to their ability to maintain or improve the health status of their Medicare beneficiaries is dependent upon several characteristics, including: 1) the reliability and validity of the SF-36 in the context of its use as a component of the HOS instrument; 2) patterns of attrition; and 3) statistical power and minimal detectable effects.

In this second component of the evaluation study, we evaluate the measurable properties of the HOS and consider implications of our analysis for the HOS program goals. Information is presented regarding the instrument's reliability, validity, attrition, power and minimal detectable effects. Data were available for our analyses from the first five cohorts (baseline data Cohorts 1–5 and follow-up data Cohorts 1–3). Normed data from 1998 using the half-scale rule for imputation of missing items were used in our analyses.

Using the SF-36 for Medicare Managed Care Enrollees

Critical to the performance of any measure are two criteria: reliability and validity. The reliability and validity of the SF-36 have been well established for adult populations (Gandek et al., 1998; Turner-Bowker et al., 2002; Ware et al., 2003; Ware and Kosinski, 2001; Ware and Sherbourne, 1992; see also NCQA, 2003 for a review). Studies have also examined the instrument’s performance in an older population (Anderson et al., 1996; Andresen et al., 1996; Andresen et al., 1998; Dexter et al., 1996; Gandek et al., 2004; Hayes et al., 1995; Hill et al., 1996; McHorney, 1996; McHorney et al., 1994; O’Mahoney et al., 1998; Reuben et al., 1995; Seymour et al., 2001; Stadnyk et al., 1998; Weinberger et al., 1991). In this section, we analyze the HOS data to assess the reliability and validity of the SF-36 as it is used in the HOS instrument, starting with reliability. We first present the analytic methods and findings related to our assessment of reliability in the Medicare HOS. After that, we address the validity of the SF-36.

Reliability of the SF-36 in Older Populations

Measures of reliability are concerned with reproducibility and consistency of measurement scores. Evaluating reliability consists of estimating how much of the observed variation in scores is due to true differences, and how much is due to chance (Selltiz et al., 1976). Although several measures of reliability are available (e.g., test–retest reliability estimated by correlating scores), most studies use Cronbach’s alpha coefficient, a measure of internal consistency reliability (Nunnally and Bernstein, 1994, p. 234). Generally accepted minimum levels of reliability range from values >0.90 for comparisons across individuals (one person compared to another person) to 0.70 for comparisons across groups (one group compared to another group, such as African American as compared to Hispanic) (Nunnally and Bernstein, 1994, p. 265).
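For reference, Cronbach's alpha can be computed directly from item-level responses, as in the short sketch below (complete cases only; the HOS analyses additionally apply the half-scale rule for missing items).

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha (internal consistency reliability) for one scale.

    items -- array of shape (n_respondents, n_items) with no missing values
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1.0)) * (1.0 - item_variances.sum() / total_variance)
```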

Prior research has examined the reliability of the SF-36 in an older population. Reliability has been examined in samples of community-dwelling elderly (Andresen et al., 1996; Andresen et al., 1998; Gandek et al., 2004; McHorney et al., 1994; Reuben et al., 1995; Seymour et al., 2001); older individuals who are stroke survivors (Anderson et al., 1996); outpatients at risk for acute deterioration (Dexter et al., 1996); frail older individuals (Stadnyk et al., 1998); and older individuals with cognitive impairments (Seymour et al., 2001). Among community-dwelling elderly, Gandek et al. (2004) reported a reliability estimate of PCS and MCS scores as 0.94 and 0.89, respectively. Andresen et al. (1998) reported a reliability estimate of 0.90 and 0.69, respectively. Reliability estimates for scale scores have also been reported. Reliability estimates exceeded the minimum level of 0.70 for group comparisons in Andresen et al. (1996); Andresen et al. (1998), Gandek et al. (2004); and Reuben et al. (1995). For example, Andresen et al. (1996) examined the reliability of the eight scale scores in a sample of community-dwelling older individuals affiliated with one of two primary care clinics. Reliability estimates based on Cronbach’s alpha coefficients exceeded 0.80 for all scales. Estimates by Gandek et al. (2004) ranged from 0.83 for General Health and Mental Health to 0.93 for Physical Functioning, again exceeding the minimum level of 0.70 for group comparisons.

Among individuals at risk for acute deterioration, Cronbach’s alpha coefficients exceeded 0.70 for all scales, ranging from 0.74 for General Health to 0.90 for Physical Functioning (Dexter et al., 1996). Reliability estimates for scale scores of individuals who were stroke survivors exceeded the minimum level for group comparisons of 0.70 for seven of eight scales, the estimate of Vitality being the exception (Anderson et al., 1996). Seymour and colleagues (2001) studied the reliability of the SF-36 scale scores in a sample of older patients with physical disabilities. For the cognitively normal group, a Cronbach’s alpha coefficient of 0.70 or higher was obtained for six of the eight dimensions; Cronbach’s alpha coefficient was below 0.70 for Role Physical and Vitality. For the cognitively impaired group, five of the eight dimensions were associated with a Cronbach’s alpha coefficient of 0.70 or higher. The three scales for which the Cronbach’s alpha coefficient was below 0.70 were General Health, Social Functioning, and Mental Health.

Stadnyk et al. (1998) assessed the reliability of the eight scale scores in a sample of "frail" individuals. Reliability estimates of internal consistency exceeded the minimum of 0.70 for all scales. Test–retest reliability was lower. Three of the eight scale estimates (Vitality, Social Functioning, and Role-Emotional) did not meet the minimum criterion of 0.70 for group comparisons.

Analytic Approach for HOS Reliability Assessment

A standard approach was used for assessing reliability. We first estimated the reliability of each of the eight scale scores for respondents of Cohorts 1 through 5 at baseline and 1 through 3 at follow-up by computing Cronbach's alpha coefficients (Nunnally and Bernstein, 1994, p. 234). We then estimated the reliability of the two SF-36 summary measures (PCS score and MCS score) for the same cohorts. Next, we examined Cronbach's alpha coefficients for the summary and scale scores by various respondent demographic characteristics, respondent type (e.g., self, proxy), survey mode (e.g., mail), and vendor. We discuss whether the minimum criterion for group-level comparisons (0.70) is met. As discussed previously, one purpose of HOS is to provide plan-level estimates, for which the group-level criterion is the criterion of interest.

Our measure of reliability for each of the two summary measures was computed as follows (Nunnally and Bernstein, 1994, p. 271):

$$
r_{YY} = 1 - \frac{\sum_{i=1}^{8} w_i^{2}\sigma_i^{2} - \sum_{i=1}^{8} r_{ii}\, w_i^{2}\sigma_i^{2}}{\sigma_Y^{2}}
$$

where:

r_YY = the measure of reliability for the summary measure being considered (i.e., PCS score or MCS score);

σ_Y^2 = the variance of the summary measure being considered;

σ_i^2 = the variance of the ith SF-36 scale score;

r_ii = the reliability of the ith SF-36 scale score;

w_i = the factor score coefficient by which the ith SF-36 scale score is multiplied when computing the summary measure being considered.
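A direct implementation of this formula is straightforward once the eight scale reliabilities, scale variances, factor score coefficients, and the variance of the summary measure have been estimated from the data; the sketch below simply restates the expression above in code.

```python
import numpy as np

def composite_reliability(weights, scale_variances, scale_reliabilities,
                          composite_variance):
    """Reliability r_YY of a weighted composite (PCS or MCS score) per
    Nunnally and Bernstein (1994), using the terms defined above:
    w_i, sigma_i^2, r_ii, and sigma_Y^2."""
    w = np.asarray(weights, dtype=float)
    var_i = np.asarray(scale_variances, dtype=float)
    r_ii = np.asarray(scale_reliabilities, dtype=float)
    error_variance = (w**2 * var_i).sum() - (r_ii * w**2 * var_i).sum()
    return 1.0 - error_variance / composite_variance
```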

Results Related to HOS Reliability

Table III-1 provides a description of respondents for the baseline and follow-up surveys for Cohorts 1 through 3, followed by characteristics of baseline respondents for Cohorts 4 and 5. Baseline respondents for Cohort 1 were predominantly Caucasian (88.1%), female (56.8%), with a high school education (34.0%) or additional years of education beyond high school (33.7%). Thirty-nine percent of respondents reported incomes under $20,000. Most (85.3%) respondents completed the survey themselves, and most (87.2%) completed the mailed version of the survey. Baseline characteristics are similar across cohorts. Respondent characteristics vary somewhat between baseline and follow-up surveys. This is the focus of the analysis of sample attrition, which is discussed later in this component of this evaluation study.

Sample sizes for Tables III-2 through III-7 are based on those reflected in Table III-1. Although some missing data exist in Tables III-2 through III-7, sample sizes are sufficient to support our analyses, with the exception of subgroup analyses for Native Americans. Due to the small sample size of these respondents, related analyses may be somewhat unstable.

Table III-1. Description of survey respondents.

Characteristic | Cohort 1 Baseline | Cohort 1 Follow-up | Cohort 2 Baseline | Cohort 2 Follow-up | Cohort 3 Baseline | Cohort 3 Follow-up | Cohort 4 Baseline | Cohort 5 Baseline
Number of respondents | 167,096 (100%) | 75,365 (100%) | 194,378 (100%) | 72,517 (100%) | 208,655 (100%) | 68,032 (100%) | 125,402 (100%) | 109,853 (100%)
Demographic
Age
< 65 | 10,360 (6.2%) | 3,542 (4.7%) | 13,606 (7.0%) | 3,553 (4.9%) | 13,145 (6.3%) | 3,129 (4.6%) | 8,026 (6.4%) | 7,140 (6.5%)
65-69 | 50,797 (30.4%) | 14,922 (19.8%) | 56,564 (29.1%) | 11,748 (16.2%) | 56,754 (27.2%) | 10,205 (15.0%) | 30,849 (24.6%) | 25,046 (22.8%)
70-74 | 44,280 (26.5%) | 22,383 (29.7%) | 51,316 (26.4%) | 21,900 (30.2%) | 55,294 (26.5%) | 19,933 (29.3%) | 33,608 (26.8%) | 29,001 (26.4%)
75-79 | 32,417 (19.4%) | 17,485 (23.2%) | 38,098 (19.6%) | 17,549 (24.2%) | 42,566 (20.4%) | 16,736 (24.6%) | 26,084 (20.8%) | 23,618 (21.5%)
80+ | 29,075 (17.4%) | 17,032 (22.6%) | 34,794 (17.9%) | 17,767 (24.5%) | 41,105 (19.7%) | 18,028 (26.5%) | 26,836 (21.4%) | 25,046 (22.8%)
Gender
Male | 72,185 (43.2%) | 32,030 (42.5%) | 82,999 (42.7%) | 30,167 (41.6%) | 89,304 (42.8%) | 27,757 (40.8%) | 52,418 (41.8%) | 44,930 (40.9%)
Female | 94,911 (56.8%) | 43,335 (57.5%) | 111,379 (57.3%) | 42,350 (58.4%) | 119,351 (57.2%) | 40,275 (59.2%) | 72,984 (58.2%) | 64,923 (59.1%)
Race/Ethnicity
Caucasian | 147,212 (88.1%) | 66,924 (88.8%) | 168,720 (86.8%) | 64,540 (89.0%) | 182,364 (87.4%) | 60,684 (89.2%) | 109,978 (87.7%) | 95,462 (86.9%)
African American | 12,198 (7.3%) | 4,673 (6.2%) | 15,939 (8.2%) | 4,786 (6.6%) | 15,649 (7.5%) | 4,422 (6.5%) | 10,032 (8.0%) | 9,777 (8.9%)
Hispanic | 2,841 (1.7%) | 1,379 (1.8%) | 3,499 (1.8%) | 1,233 (1.7%) | 3,547 (1.7%) | 1,089 (1.6%) | 1,756 (1.4%) | 1,868 (1.7%)
Asian | 1,671 (1.0%) | 980 (1.3%) | 1,944 (1.0%) | 1,015 (1.4%) | 1,878 (0.9%) | 952 (1.4%) | 1,630 (1.3%) | 1,318 (1.2%)
Native American | 167 (0.1%) | 75 (0.1%) | 194 (0.1%) | 72 (0.1%) | 209 (0.1%) | 68 (0.1%) | 125 (0.1%) | 110 (0.1%)
Other | 2,506 (1.5%) | 1,130 (1.5%) | 3,304 (1.7%) | 798 (1.1%) | 4,173 (2.0%) | 816 (1.2%) | 1,630 (1.3%) | 1,318 (1.2%)
Unknown | 501 (0.3%) | 226 (0.3%) | 778 (0.4%) | 72 (0.1%) | 835 (0.4%) | 68 (0.1%) | 125 (0.1%) | 110 (0.1%)
Socioeconomic
Education
8th grade or less | 21,221 (12.7%) | 9,044 (12.0%) | 26,630 (13.7%) | 8,702 (12.0%) | 27,334 (13.1%) | 8,164 (12.0%) | 15,424 (12.3%) | 13,622 (12.4%)
Some high school | 29,576 (17.7%) | 12,737 (16.9%) | 35,766 (18.4%) | 12,400 (17.1%) | 36,515 (17.5%) | 11,497 (16.9%) | 21,945 (17.5%) | 19,334 (17.6%)
High school graduate | 56,813 (34.0%) | 26,001 (34.5%) | 66,089 (34.0%) | 25,599 (35.3%) | 72,195 (34.6%) | 24,628 (36.2%) | 44,769 (35.7%) | 39,877 (36.3%)
Some college | 33,920 (20.3%) | 15,751 (20.9%) | 37,709 (19.4%) | 14,576 (20.1%) | 41,490 (20.1%) | 13,879 (20.4%) | 25,080 (20.0%) | 21,641 (19.7%)
College degree | 10,861 (6.5%) | 4,899 (6.5%) | 11,663 (6.0%) | 4,641 (6.4%) | 12,937 (6.2%) | 4,286 (6.3%) | 7,775 (6.2%) | 6,701 (6.1%)
> College | 11,530 (6.9%) | 5,294 (7.3%) | 12,246 (6.3%) | 5,004 (6.9%) | 13,563 (6.5%) | 4,490 (6.6%) | 7,900 (6.3%) | 6,811 (6.2%)
Missing | 3,175 (1.9%) | 1,305 (1.8%) | 4,471 (2.3%) | 1,668 (2.3%) | 4,173 (2.0%) | 1,089 (1.6%) | 2,759 (2.2%) | 1,868 (1.7%)
Income
< $5,000 | 5,514 (3.3%) | 2,261 (3.0%) | 11,080 (5.7%) | 1,813 (2.5%) | 7,303 (3.5%) | 1,905 (2.8%) | 3,386 (2.7%) | 3,405 (3.1%)
$5,000-$9,999 | 18,381 (11.0%) | 7,235 (9.6%) | 22,353 (11.5%) | 6,672 (9.2%) | 20,657 (9.9%) | 6,191 (9.1%) | 11,662 (9.3%) | 10,656 (9.7%)
$10,000-$19,999 | 41,106 (24.6%) | 18,540 (24.6%) | 50,538 (26.0%) | 17,622 (24.3%) | 51,329 (24.6%) | 16,872 (24.8%) | 30,096 (24.0%) | 27,244 (24.8%)
$20,000-$29,999 | 28,513 (17.1%) | 13,038 (17.3%) | 33,239 (17.1%) | 12,400 (17.1%) | 34,845 (16.7%) | 12,382 (18.2%) | 21,193 (16.9%) | 19,114 (17.4%)
$30,000-$39,999 | 17,211 (10.3%) | 7,763 (10.3%) | 18,660 (9.6%) | 7,324 (10.1%) | 19,822 (9.5%) | 7,075 (10.4%) | 12,415 (9.9%) | 11,095 (10.1%)
$40,000-$49,999 | 9,357 (5.6%) | 4,447 (5.9%) | 10,108 (5.2%) | 4,351 (6.0%) | 11,267 (5.4%) | 3,946 (5.8%) | 7,273 (5.8%) | 6,152 (5.6%)
$50,000-$79,999 | 8,838 (5.3%) | 4,522 (6.0%) | 9,719 (5.0%) | 4,424 (6.1%) | 11,685 (5.6%) | 3,878 (5.7%) | 7,273 (5.8%) | 6,262 (5.7%)
$80,000-$99,999 | 1,838 (1.1%) | 1,055 (1.4%) | 2,138 (1.1%) | 1,088 (1.5%) | 2,713 (1.3%) | 952 (1.4%) | 1,756 (1.4%) | 1,538 (1.4%)
$100,000 or more | 2,172 (1.3%) | 1,130 (1.5%) | 2,333 (1.2%) | 1,233 (1.7%) | 3,130 (1.5%) | 1,020 (1.5%) | 1,881 (1.5%) | 1,758 (1.6%)
Don't know | 18,381 (11.0%) | 7,386 (9.8%) | 15,939 (8.2%) | 7,179 (9.9%) | 22,326 (10.7%) | 7,347 (10.8%) | 12,289 (9.8%) | 12,304 (11.2%)
Missing | 15,707 (9.4%) | 7,989 (10.6%) | 18,135 (9.3%) | 8,339 (11.5%) | 23,578 (11.3%) | 6,395 (9.4%) | 16,177 (12.9%) | 10,436 (9.5%)
Administration
Respondent
Enrollee | 142,533 (85.3%) | 61,046 (81.0%) | 152,004 (78.2%) | 56,998 (78.6%) | 166,924 (80.0%) | 54,017 (79.4%) | 100,322 (80.0%) | 88,102 (80.2%)
Family | 16,375 (9.8%) | 7,537 (10.0%) | 19,632 (10.1%) | 7,542 (10.4%) | 22,952 (11.0%) | 7,143 (10.5%) | 13,042 (10.4%) | 11,425 (10.4%)
Friend | 1,003 (0.6%) | 452 (0.6%) | 1,361 (0.7%) | 435 (0.6%) | 1,461 (0.7%) | 408 (0.6%) | 752 (0.6%) | 769 (0.7%)
Professional | 501 (0.3%) | 301 (0.4%) | 778 (0.4%) | 363 (0.5%) | 1,043 (0.5%) | 408 (0.6%) | 752 (0.6%) | 659 (0.6%)
Missing | 6,684 (4.0%) | 6,029 (8.0%) | 20,604 (10.6%) | 7,252 (10.0%) | 16,275 (7.8%) | 6,123 (9.0%) | 10,408 (8.3%) | 8,898 (8.1%)
Survey Disposition
Mail | 145,708 (87.2%) | 68,356 (90.7%) | 171,441 (88.2%) | 65,990 (91.0%) | 172,349 (82.6%) | 58,371 (85.8%) | 104,836 (83.6%) | 89,970 (81.9%)
Telephone | 20,219 (12.1%) | 6,331 (8.4%) | 21,576 (11.1%) | 6,091 (8.4%) | 24,621 (11.8%) | 7,007 (10.3%) | 15,299 (12.2%) | 15,160 (13.8%)
Missing | 1,170 (0.7%) | 678 (0.9%) | 1,361 (0.7%) | 435 (0.6%) | 11,685 (5.6%) | 2,653 (3.9%) | 5,267 (4.2%) | 4,724 (4.3%)

Table III-2 presents the estimates of reliability for the summary measures—PCS score and MCS score. The reliability of these two summary measures has been high and consistent over time. Reliability coefficients for the PCS score are on the order of 0.94 for baseline and follow-up in each cohort, while reliability coefficients for the MCS score are on the order of 0.90 for baseline and follow-up in each cohort. Our estimates for the PCS score are essentially identical to those reported by Gandek et al. (2004) (0.94) and Andresen et al. (1998) (0.90) for community-dwelling elderly. Our estimates of MCS score are also essentially identical to those reported by Gandek et al. (2004) (0.89) and higher than the estimate of Andresen et al. (1998) (0.69). Reliability estimates for the two HOS summary measures well exceed the minimum standard of reliability for group comparisons (r>0.70).

Table III-2. Reliability estimates for PCS score and MCS score: all respondents.

Cohort | Survey | PCS score | MCS score
Cohort 1 | Baseline | 0.941 | 0.903
Cohort 1 | Follow-up | 0.940 | 0.904
Cohort 2 | Baseline | 0.941 | 0.902
Cohort 2 | Follow-up | 0.942 | 0.905
Cohort 3 | Baseline | 0.941 | 0.904
Cohort 3 | Follow-up | 0.941 | 0.904
Cohort 4 | Baseline | 0.941 | 0.904
Cohort 5 | Baseline | 0.942 | 0.904

We also examined the reliability of the eight scales, at both baseline and follow-up (Table III-3). Cronbach’s alpha coefficients ranged from 0.935 to 0.940 (PF), 0.904 to 0.910 (RP), 0.881 to 0.886 (BP), 0.832 to 0.838 (GH), 0.868 to 0.873 (VT), 0.849 to 0.858 (SF), 0.875 to 0.883 (RE) and 0.839 to 0.843 (MH). In comparison, Gandek et al. (2004) report the following Cronbach’s alpha coefficients for the eight scales: PF (0.93), RP (0.91), BP (0.88), GH (0.83), VT (0.87), SF (0.85), RE (0.88) and MH (0.83). Andresen et al. (1998) report the following estimates: PF (0.92), RP (0.89), BP (0.88), GH (0.82), VT (0.89), SF (0.80), RE (0.86) and MH (0.88). Thus, our analyses of scale scores are comparable. All scale scores exceed the minimum level of reliability for group comparisons.

Table III-3. Reliability estimates for SF-36 component scales: all respondents.

Cohort | Survey | PF | RP | BP | GH | VT | SF | RE | MH
Cohort 1 | Baseline | 0.935 | 0.906 | 0.881 | 0.838 | 0.873 | 0.850 | 0.875 | 0.843
Cohort 1 | Follow-up | 0.935 | 0.904 | 0.883 | 0.834 | 0.873 | 0.858 | 0.880 | 0.842
Cohort 2 | Baseline | 0.935 | 0.908 | 0.882 | 0.837 | 0.872 | 0.849 | 0.877 | 0.839
Cohort 2 | Follow-up | 0.936 | 0.905 | 0.886 | 0.836 | 0.873 | 0.854 | 0.883 | 0.842
Cohort 3 | Baseline | 0.936 | 0.907 | 0.884 | 0.837 | 0.872 | 0.852 | 0.880 | 0.842
Cohort 3 | Follow-up | 0.935 | 0.904 | 0.886 | 0.836 | 0.872 | 0.854 | 0.881 | 0.842
Cohort 4 | Baseline | 0.936 | 0.905 | 0.884 | 0.838 | 0.870 | 0.850 | 0.882 | 0.840
Cohort 5 | Baseline | 0.940 | 0.910 | 0.884 | 0.832 | 0.868 | 0.850 | 0.880 | 0.840

Table III-4 shows the reliability coefficients for the two summary measures for various demographic and socioeconomic subgroups as well as respondent type, survey mode, and vendor for Cohort 1 baseline and follow-up. Findings are similar across cohorts, so subsequent cohort data are not shown. The reliability coefficients did not differ appreciably by these characteristics. Both summary measures well exceed minimum standards of reliability for group comparisons for all subgroups. Our findings are not directly comparable to Gandek et al. (2004), as their estimates by participant characteristics are limited to those participants who completed the survey themselves (vs. proxy completion). McHorney et al. (1994) do not provide estimates by subgroups for the summary PCS and MCS score measures.

Table III-4. Reliability estimates for PCS score and MCS score for Cohort 1 baseline and follow-up.

Characteristic | Baseline PCS score | Baseline MCS score | Follow-up PCS score | Follow-up MCS score
Age
< 65 | 0.929 | 0.928 | 0.928 | 0.929
65-69 | 0.940 | 0.888 | 0.943 | 0.898
70-74 | 0.937 | 0.895 | 0.937 | 0.901
75-79 | 0.932 | 0.891 | 0.935 | 0.901
80+ | 0.927 | 0.892 | 0.928 | 0.897
Gender
Male | 0.940 | 0.902 | 0.939 | 0.900
Female | 0.941 | 0.904 | 0.940 | 0.905
Race
Caucasian | 0.942 | 0.905 | 0.941 | 0.906
African American | 0.924 | 0.886 | 0.925 | 0.884
Hispanic | 0.930 | 0.904 | 0.934 | 0.904
Asian | 0.932 | 0.893 | 0.942 | 0.895
Native American | 0.947 | 0.924 | 0.960 | 0.901
Other | 0.944 | 0.911 | 0.941 | 0.905
Education
8th grade or less | 0.930 | 0.893 | 0.929 | 0.894
Some high school | 0.936 | 0.897 | 0.936 | 0.900
High school graduate | 0.941 | 0.905 | 0.941 | 0.904
Some college credits | 0.939 | 0.906 | 0.944 | 0.908
College graduate | 0.940 | 0.900 | 0.939 | 0.898
Income
< $5,000 | 0.924 | 0.891 | 0.925 | 0.899
$5,000-$9,999 | 0.936 | 0.908 | 0.935 | 0.906
$10,000-$19,999 | 0.940 | 0.907 | 0.938 | 0.905
$20,000-$29,999 | 0.942 | 0.904 | 0.940 | 0.905
$30,000-$39,999 | 0.942 | 0.900 | 0.940 | 0.899
$40,000-$49,999 | 0.943 | 0.894 | 0.941 | 0.898
$50,000-$79,999 | 0.937 | 0.894 | 0.941 | 0.896
$80,000-$99,999 | 0.943 | 0.901 | 0.936 | 0.908
$100,000 or more | 0.935 | 0.885 | 0.949 | 0.894

Table III-4 (continued). Reliability estimates for PCS score and MCS score for Cohort 1 baseline and follow-up.

Baseline Follow-up PCS score MCS score PCS score MCS score Survey Administration Respondent Type S e lf 0. 9 4 0 0. 8 9 9 0. 9 4 1 0. 9 05 F am ily M em b er 0. 9 4 1 0. 9 2 0 0. 9 4 0 0. 9 1 9 F rien d 0. 9 3 1 0. 9 05 0. 9 3 7 0. 9 08 P rofession a l 0. 9 3 0 0. 8 8 8 0. 9 3 0 0. 8 90 Survey Mode M a i l 0. 9 4 4 0. 9 1 3 0. 9 4 3 0. 9 1 1 T e lep h one 0. 9 2 7 0. 8 9 9 0. 9 2 4 0. 8 8 6 Survey Vendor 1 0. 9 4 0 0. 9 02 0. 9 3 4 0. 9 03 2 0. 9 4 3 0. 9 04 0. 9 4 3 0. 9 07 3 0. 9 3 8 0. 8 9 8 0. 9 4 2 0. 9 08 4 0. 9 4 1 0. 9 01 0. 9 4 2 0. 9 07 5 0. 9 4 1 0. 9 05 0. 9 3 6 0. 8 9 6 6 0. 9 3 8 0. 9 05 0. 9 3 9 0. 9 02

As with the summary PCS score and MCS score measures, we examined the reliability of the eight scale scores by participant characteristics, respondent type, mode of administration and vendor. Tables III-5a and 5b present reliability estimates for Cohort 1 Baseline and Follow-up. Findings are similar across cohorts; thus, subsequent cohort data are not shown. Note that the reliability at baseline for the Spanish language version of the HOS was consistent for Cohorts 2–5 and had reliability coefficients similar to those displayed for Cohort 1 follow-up. Similar to its performance in the full HOS sample, all scale scores met the minimum criterion for group comparisons in all subgroups except Native Americans. Our findings are not directly comparable to Gandek et al. (2004), because of differences in participants included in the subgroup analyses. While McHorney et al. (1994) provide subscale estimates by age, their estimates of scale score reliability of other demographic groups (e.g., race) include both non-elderly and elderly participants. Thus, our findings are not directly comparable.

Table III-5a. Reliability estimates for SF-36 component scales: Cohort 1 baseline Cronbach’s alpha coefficient.

Characteristic | PF | RP | BP | GH | VT | SF | RE | MH
Age
< 65 | 0.920 | 0.885 | 0.896 | 0.812 | 0.833 | 0.817 | 0.897 | 0.867
65-69 | 0.929 | 0.904 | 0.873 | 0.833 | 0.879 | 0.841 | 0.862 | 0.835
70-74 | 0.927 | 0.899 | 0.874 | 0.827 | 0.870 | 0.842 | 0.866 | 0.828
75-79 | 0.924 | 0.895 | 0.870 | 0.818 | 0.858 | 0.830 | 0.865 | 0.823
80+ | 0.929 | 0.891 | 0.870 | 0.806 | 0.838 | 0.828 | 0.868 | 0.824
Gender
Male | 0.937 | 0.908 | 0.878 | 0.844 | 0.877 | 0.849 | 0.876 | 0.843
Female | 0.933 | 0.904 | 0.882 | 0.834 | 0.870 | 0.851 | 0.875 | 0.842
Race
Caucasian | 0.936 | 0.905 | 0.881 | 0.841 | 0.879 | 0.858 | 0.875 | 0.845
African American | 0.925 | 0.902 | 0.864 | 0.806 | 0.820 | 0.777 | 0.861 | 0.822
Hispanic | 0.925 | 0.921 | 0.893 | 0.831 | 0.822 | 0.807 | 0.893 | 0.827
Asian | 0.927 | 0.910 | 0.880 | 0.830 | 0.816 | 0.811 | 0.893 | 0.806
Native American | 0.933 | 0.930 | 0.909 | 0.854 | 0.896 | 0.882 | 0.853 | 0.877
Other | 0.936 | 0.924 | 0.903 | 0.852 | 0.842 | 0.856 | 0.904 | 0.846
Education
8th grade or less | 0.932 | 0.908 | 0.875 | 0.818 | 0.832 | 0.804 | 0.878 | 0.823
Some high school | 0.932 | 0.909 | 0.879 | 0.825 | 0.853 | 0.830 | 0.874 | 0.828
High school graduate | 0.933 | 0.905 | 0.880 | 0.841 | 0.878 | 0.857 | 0.876 | 0.844
Some college credits | 0.935 | 0.902 | 0.883 | 0.843 | 0.891 | 0.871 | 0.864 | 0.849
College graduate | 0.933 | 0.897 | 0.868 | 0.830 | 0.887 | 0.867 | 0.860 | 0.840
Income
< $5,000 | 0.932 | 0.902 | 0.875 | 0.813 | 0.816 | 0.786 | 0.872 | 0.814
$5,000-$9,999 | 0.931 | 0.906 | 0.884 | 0.832 | 0.851 | 0.838 | 0.881 | 0.843
$10,000-$19,999 | 0.932 | 0.905 | 0.885 | 0.842 | 0.872 | 0.854 | 0.874 | 0.847
$20,000-$29,999 | 0.932 | 0.903 | 0.880 | 0.840 | 0.883 | 0.861 | 0.870 | 0.844
$30,000-$39,999 | 0.932 | 0.898 | 0.875 | 0.839 | 0.888 | 0.870 | 0.859 | 0.837
$40,000-$49,999 | 0.933 | 0.897 | 0.873 | 0.837 | 0.888 | 0.860 | 0.857 | 0.837
$50,000-$79,999 | 0.929 | 0.889 | 0.859 | 0.826 | 0.898 | 0.860 | 0.850 | 0.833
$80,000-$99,999 | 0.935 | 0.897 | 0.863 | 0.835 | 0.896 | 0.882 | 0.847 | 0.841
$100,000 or more | 0.931 | 0.890 | 0.853 | 0.816 | 0.880 | 0.854 | 0.844 | 0.807
Respondent Type
Self | 0.929 | 0.903 | 0.879 | 0.830 | 0.874 | 0.844 | 0.866 | 0.837
Family | 0.948 | 0.916 | 0.882 | 0.846 | 0.858 | 0.853 | 0.914 | 0.854
Friend | 0.937 | 0.909 | 0.880 | 0.845 | 0.803 | 0.796 | 0.909 | 0.828
Professional | 0.948 | 0.917 | 0.888 | 0.806 | 0.808 | 0.752 | 0.877 | 0.816
Survey Mode
Mail | 0.940 | 0.910 | 0.880 | 0.850 | 0.880 | 0.860 | 0.880 | 0.850
Telephone | 0.930 | 0.885 | 0.836 | 0.787 | 0.840 | 0.760 | 0.860 | 0.830


Table III-5a (continued). Reliability estimates for SF-36 component scales: Cohort 1 baseline (Cronbach's alpha coefficients).

                              PF     RP     BP     GH     VT     SF     RE     MH
  Survey Language
    English                  0.935  0.906  0.881  0.838  0.873  0.850  0.875  0.843
    Spanish                  NA     NA     NA     NA     NA     NA     NA     NA
  Survey Vendor
    1                        0.934  0.906  0.877  0.832  0.871  0.838  0.877  0.843
    2                        0.937  0.907  0.891  0.841  0.876  0.856  0.874  0.842
    3                        0.932  0.904  0.880  0.833  0.861  0.831  0.872  0.832
    4                        0.934  0.903  0.881  0.838  0.875  0.850  0.872  0.841
    5                        0.935  0.908  0.875  0.840  0.873  0.852  0.876  0.846
    6                        0.932  0.906  0.879  0.831  0.866  0.849  0.885  0.840

Table III-5b. Reliability estimates for SF-36 component scales: Cohort 1 follow-up (Cronbach's alpha coefficients).

                              PF     RP     BP     GH     VT     SF     RE     MH
Demographics
  Age
    <65                      0.920  0.881  0.889  0.812  0.830  0.813  0.894  0.871
    65–69                    0.933  0.905  0.882  0.840  0.883  0.861  0.872  0.843
    70–74                    0.927  0.897  0.875  0.825  0.873  0.848  0.870  0.828
    75–79                    0.926  0.897  0.880  0.820  0.863  0.849  0.873  0.829
    80+                      0.930  0.892  0.874  0.804  0.848  0.839  0.875  0.824
  Gender
    Male                     0.936  0.905  0.880  0.840  0.876  0.857  0.881  0.843
    Female                   0.933  0.903  0.884  0.830  0.871  0.859  0.879  0.841
  Race
    Caucasian                0.935  0.903  0.883  0.835  0.878  0.866  0.880  0.844
    African American         0.925  0.907  0.866  0.806  0.819  0.778  0.857  0.816
    Hispanic                 0.929  0.920  0.906  0.857  0.843  0.806  0.883  0.832
    Asian                    0.938  0.929  0.889  0.844  0.840  0.830  0.907  0.811
    Native American          0.945  0.964  0.924  0.805  0.865  0.677  0.900  0.835
    Other                    0.937  0.920  0.900  0.843  0.846  0.831  0.895  0.841
Socioeconomic
  Education
    8th grade or less        0.933  0.909  0.873  0.819  0.833  0.819  0.887  0.818
    Some high school         0.932  0.908  0.883  0.827  0.856  0.839  0.882  0.832
    High school graduate     0.932  0.903  0.884  0.831  0.873  0.863  0.877  0.838
    Some college credits     0.935  0.897  0.885  0.840  0.890  0.875  0.868  0.852
    College graduate         0.932  0.890  0.868  0.826  0.890  0.871  0.859  0.843


Table III-5b (continued). Reliability estimates for SF-36 component scales: Cohort 1 follow-up (Cronbach's alpha coefficients).

                              PF     RP     BP     GH     VT     SF     RE     MH
  Income
    <$5,000                  0.932  0.911  0.869  0.801  0.804  0.815  0.883  0.812
    $5,000–$9,999            0.931  0.905  0.888  0.829  0.846  0.839  0.877  0.843
    $10,000–$19,999          0.932  0.902  0.886  0.834  0.863  0.857  0.877  0.843
    $20,000–$29,999          0.931  0.897  0.883  0.837  0.883  0.866  0.873  0.846
    $30,000–$39,999          0.929  0.896  0.874  0.830  0.887  0.866  0.869  0.841
    $40,000–$49,999          0.931  0.892  0.875  0.837  0.888  0.869  0.866  0.837
    $50,000–$79,999          0.934  0.894  0.871  0.830  0.897  0.868  0.863  0.831
    $80,000–$99,999          0.937  0.894  0.864  0.835  0.893  0.863  0.879  0.821
    $100,000 or more         0.934  0.882  0.848  0.819  0.896  0.863  0.873  0.823
Survey Administration
  Respondent Type
    Self                     0.929  0.900  0.881  0.827  0.876  0.854  0.870  0.837
    Family                   0.946  0.916  0.886  0.836  0.854  0.852  0.924  0.852
    Friend                   0.941  0.907  0.890  0.864  0.843  0.816  0.902  0.850
    Professional             0.939  0.909  0.903  0.791  0.818  0.812  0.900  0.784
  Survey Mode
    Mail                     0.940  0.910  0.888  0.840  0.888  0.870  0.882  0.850
    Telephone                0.930  0.880  0.840  0.777  0.823  0.770  0.850  0.820
  Survey Language
    English                  0.917  0.904  0.883  0.833  0.873  0.858  0.880  0.842
    Spanish                  0.964  0.833  0.991  0.931  0.910  0.975  0.750  0.762
  Survey Vendor
    1                        0.931  0.904  0.875  0.783  0.859  0.845  0.880  0.832
    2                        0.936  0.906  0.887  0.838  0.872  0.862  0.885  0.842
    3                        0.933  0.908  0.885  0.841  0.880  0.866  0.881  0.845
    4                        0.936  0.903  0.884  0.838  0.884  0.861  0.880  0.843
    5                        0.932  0.900  0.884  0.844  0.865  0.855  0.872  0.844
    6                        0.934  0.908  0.878  0.828  0.864  0.855  0.880  0.841

Discussion of HOS Reliability Results

The reliability of the PCS and MCS summary scores was acceptable for group-level comparisons across cohorts, with no appreciable differences by participant characteristics, respondent type, mode, or vendor. Similarly, reliability estimates for each of the eight scale scores met minimum standards for group comparisons. Although the scale score estimates varied more by participant characteristics and survey administration features, they still met those standards. Thus, our analyses suggest that the reliability of the SF-36 summary measures and scale scores meets minimum standards for group-level comparisons, the intended use in the Medicare HOS.


Validity of the SF-36 in Older Populations

Measures of instrument validity are concerned with the extent to which an empirical measure of a concept adequately reflects that concept (Babbie, 2004). Measures of validity include face validity, construct validity, and criterion-related or predictive validity.

Face validity assesses the extent to which an indicator seems to be a reasonable measure of some variable (Babbie, 2004). Face validity can be examined by gaining survey respondents' perspectives on an instrument. Hayes et al. (1995) interviewed individuals in the U.K. who were 65 years or older regarding their views of the SF-36. These semi-structured interviews were conducted after each participant completed a self-administered or interviewer-administered SF-36. When asked if the SF-36 was "applicable to the situation and circumstances of their life," 88% of participants (172 of 195 individuals) regarded all or most of the questions as applicable to them (p. 123). The few concerns raised about applicability related most strongly to the questions on work or other regular activities (Role—Physical scale) and to the question on "vigorous activities" (Physical Functioning scale). Questions judged to be irrelevant produced missing data concentrated on nine of the 36 questions, which accounted for about 40% of all missing data; these items addressed limitations in vigorous activities, problems with work or other activities as a result of physical health, and problems with work or other activities as a result of emotional health. A subsequent study by Hobson and Meara (1997) of individuals with Parkinson's disease reported that modifying the role functioning questions yielded only minor improvements in missing data rates, and those modifications reduced the ability to compare their findings to studies using the standard form.

Construct validity examines the extent “to which a measure relates to other variables as expected within a system of theorized relationships” (Babbie, 2004, p. 144). Convergent and discriminant validity form the core of construct validity. Convergent validity can be assessed by examining the extent to which different methods of measuring a single construct provide similar results. In contrast, discriminant validity is assessed by examining the extent to which a measure of one underlying construct can be differentiated from another.


Factor analysis is frequently used to test the construct validity of instruments such as the SF-36. Gandek et al. (2004) conducted a factor analysis of the SF-36 in older community-dwelling individuals, and Dexter et al. (1996) did so in older adults at risk of acute deterioration. The analysis by Gandek et al. supported two principal components, physical and mental health. For example, Physical Functioning, Role—Physical, and Bodily Pain each correlated highly with the physical component, at 0.85, 0.77, and 0.78, respectively. Some differences were seen across subgroups, such as racial/ethnic groups. The exploratory factor analysis by Dexter et al. (1996) suggested that the eight subscales reflected a single underlying dimension of general health; when a two-factor solution was forced, the physical and mental health dimensions were identified. Despite differences in participants' age, education, race, income, and health relative to previous work (e.g., McHorney et al., 1994), Dexter and colleagues (1996) reported that scaling assumptions and reliability compared favorably to prior work.

Gandek et al.'s analysis (2004), as well as that of Dexter et al. (1996), demonstrated discriminant validity. For example, in Dexter et al.'s analysis the SF-36 subscales related to mental health had factor loadings of 0.88 (Role—Emotional) and 0.80 (Mental Health) on the MCS, while their loadings on the PCS were only 0.14 and 0.34, respectively.
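As a rough illustration of how a two-component structure of this kind can be examined, the sketch below extracts the two largest principal components from the correlation matrix of the eight scale scores and reports their loadings. It is a simplified, unrotated version of the published analyses (which typically rotate components before labeling them physical and mental), and the scale column names are assumptions.

    import numpy as np
    import pandas as pd

    SCALES = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]  # eight SF-36 scales

    def two_component_loadings(scores: pd.DataFrame) -> pd.DataFrame:
        """Loadings of the eight scale scores on the two largest principal components."""
        corr = scores[SCALES].corr().to_numpy()               # 8 x 8 correlation matrix
        eigvals, eigvecs = np.linalg.eigh(corr)                # eigenvalues in ascending order
        top2 = np.argsort(eigvals)[::-1][:2]                   # indices of the two largest
        loadings = eigvecs[:, top2] * np.sqrt(eigvals[top2])   # eigenvector * sqrt(eigenvalue)
        return pd.DataFrame(loadings, index=SCALES,
                            columns=["Component 1", "Component 2"])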

Weinberger et al. (1991) correlated the performance of the SF-36 with the Sickness Impact Profile in a small sample of 25 older veterans. The two instruments were highly correlated for overall functioning (r=0.73), physical functioning (r=0.78), and social functioning (r=0.67). Reuben and colleagues (1995) examined the construct validity among several measures of physical functioning in a sample of community-dwelling older individuals. Describing their findings as “inconsistent and weak,” Reuben et al. reported that the Older Americans Resources and Services (OARS) physical function measure of instrumental activities of daily living and the Physical Performance Test (PPT) had comparable correlations with the SF-36 measure of role limitations as a result of physical health problems, but inconsistent relationships with other SF-36 measures of health. For example, the PPT did not correlate with the measures of bodily pain and general health, both physical health scales.

Among ninety stroke survivors, construct validity was demonstrated by differences in scores across the eight SF-36 scales between patients classified by physical disability or activities of daily living (ADL) dependence (measured by the Barthel Index) and by mental health (measured by the General Health Questionnaire) (Anderson et al., 1996). The SF-36 did not appear to characterize social functioning well: no association was found between the social functioning scale and the domains of everyday living relevant to many older individuals covered by the Adelaide Activities Profile.


In a study of frail older individuals, convergent validity was confirmed for the Physical Functioning, Bodily Pain, and Mental Health scales, as their correlations with related instruments exceeded the criterion level of r=0.60; convergent validity was not found for the Social Functioning or General Health scales (Stadnyk et al., 1998). Seymour et al. (2001) correlated the PCS score with the functional independence measure (FIM) in a population of older adults with physical disabilities. While a correlation sufficient to establish convergent validity was found for participants who were cognitively intact, convergent validity could not be established for those who were cognitively impaired.

Criterion validity assesses the degree to which a measure relates to some external criterion (Babbie, 2004). Criterion validity can be examined with measures gathered concurrently (concurrent validity) or after some interval of time (predictive validity).

McHorney (1996) examined predictive validity for participants in the Medical Outcomes Study who were age 65 or older, relating each of the eight scales to mortality over a 4-year period, use of inpatient services over a prospective 2-year period, and the number of physician visits over a prospective 2-year period. With regard to mortality, the General Health perceptions scale differentiated subsequent mortality most clearly, and the Physical Functioning scale was also predictive; baseline reports of Mental Health, however, offered no predictive value. The Physical Functioning, Role—Physical, and Bodily Pain scales were predictive of hospitalizations. In contrast, all scales were predictive of ambulatory visits, with Bodily Pain the most predictive and Mental Health and Role—Emotional the least predictive.

Analytic Approach for Assessment of the HOS Validity

Our assessment of validity focused on criterion validity, both concurrent and predictive, because face validity has been well established in the literature and the available data and resources supported an assessment of criterion validity. We examined concurrent validity by calculating correlations between the SF-36 summary measures and other measures of health and mental health status within the HOS instrument. Specific questions used in the assessment of concurrent validity for the PCS score included: a composite measure of ADL function; three questions about symptoms experienced at present or within the previous four weeks; seven chronic health conditions most likely to be related to higher mortality; five questions related to cancer; a general health status question; and participant age. The "symptom" questions comprised two on chest pain within the past four weeks and one on urinary incontinence. The chronic health conditions examined are listed below (a brief sketch of the correlation calculation follows the list):


1. High blood pressure (HP)
2. Coronary artery disease (CAD)
3. Congestive heart failure (CHF)
4. Acute myocardial infarction (AMI)
5. Other heart problems
6. Stroke
7. Gastrointestinal problems (GI)
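The sketch below illustrates the correlation calculation described above, pairing the PCS score with the ADL composite, the symptom questions, the chronic condition indicators, the general health rating, and age. The variable names are hypothetical placeholders, not actual HOS field names.

    import pandas as pd

    # Hypothetical column names for the comparison measures.
    PCS_COMPARISONS = ["adl_summary", "chest_pain_exercise", "chest_pain_rest",
                       "urinary_incontinence", "high_blood_pressure", "cad", "chf",
                       "ami", "other_heart", "stroke", "gi_problems",
                       "general_health", "age"]

    def concurrent_validity(df: pd.DataFrame, score_col: str = "pcs") -> pd.Series:
        """Pearson correlations between a summary score and each comparison measure."""
        return df[PCS_COMPARISONS].corrwith(df[score_col])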

Next, we examined predictive validity. For the three cohorts for which both baseline and follow-up responses were available, we examined the relationship between the PCS and MCS scores and 2-year mortality, controlling for age, gender, race/ethnicity, Medicaid eligibility, poverty status, education, marital status, home ownership, and chronic conditions at baseline (Miller and Weissert, 2000). For each cohort, we estimated a probit regression in which the dependent variable was equal to 1 if the respondent died between baseline and follow-up, and 0 otherwise. Probit is a standard technique for analyzing data when the dependent variable is dichotomous and yields results comparable to those from logit, another standard technique. This approach allowed us to examine the predictive value of the PCS and MCS scores while controlling for other relevant participant characteristics. Because coefficients from probit regressions must be transformed before their magnitudes can be interpreted, we adopt the convention of reporting coefficients as marginal probabilities, so that they may be read in a manner similar to coefficients from ordinary least squares (OLS) regression. Specifically, each reported coefficient can be interpreted as the marginal change in the probability that a respondent died between baseline and follow-up due to a one-unit change in an independent variable such as the PCS or MCS score.6
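A minimal sketch of this estimation step is shown below, using the statsmodels probit routine with marginal effects evaluated at the means. The column names ("died", "pcs", "mcs", and so on) are illustrative assumptions; the report's own estimates were not necessarily produced with this software.

    import statsmodels.api as sm

    def mortality_probit(df, covariates):
        """Probit of 2-year mortality; returns marginal effects (dF/dx) at the means."""
        X = sm.add_constant(df[covariates])          # e.g. ["pcs", "mcs", "age", ...]
        result = sm.Probit(df["died"], X).fit(disp=0)
        margeff = result.get_margeff(at="mean")      # change in Pr(death) per one-unit change
        return result, margeff.summary_frame()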

Results Related to HOS Criterion Validity

Tables III-6 and III-7 present the correlation coefficients, our estimates of concurrent validity. All estimates are statistically significant at the 5% level, and all are in the expected direction. That is, functional limitations, the presence of symptoms, chronic health conditions, treatment for cancer, poorer perceived health, and increasing age were related to a lower PCS score, or poorer health (see Table III-6). Similarly, self-reported symptoms such as feeling sad or blue, poorer perceived health, and increasing age were related to a lower MCS score (see Table III-7).

6 Since the probit estimator is nonlinear, it is important to recognize that we evaluate the change in probability at the mean values of the independent variables.


Table III-6. Correlations among participants' PCS scores, health characteristics, and age.

                                Cohort 1        Cohort 2        Cohort 3        Cohort 4   Cohort 5
                                B       F       B       F       B       F       B          B
ADL Summary                    –0.58   –0.57   –0.57   –0.57   –0.57   –0.56   –0.56      –0.56
Symptoms
  Chest pain/exercise          –0.38   –0.37   –0.38   –0.36   –0.37   –0.36   –0.37      –0.36
  Chest pain/resting           –0.32   –0.31   –0.33   –0.31   –0.32   –0.30   –0.32      –0.31
  Difficulty controlling
    urination                  –0.22   –0.23   –0.23   –0.23   –0.23   –0.23   –0.23      –0.23
Chronic Health Conditions
  High blood pressure          –0.16   –0.15   –0.16   –0.17   –0.16   –0.16   –0.16      –0.16
  Coronary artery disease      –0.20   –0.19   –0.19   –0.18   –0.20   –0.19   –0.20      –0.19
  Congestive heart failure     –0.21   –0.22   –0.21   –0.22   –0.21   –0.21   –0.21      –0.21
  Acute myocardial infarction  –0.16   –0.15   –0.15   –0.15   –0.16   –0.15   –0.15      –0.15
  Other heart problems         –0.18   –0.18   –0.19   –0.19   –0.19   –0.18   –0.18      –0.19
  Stroke                       –0.16   –0.15   –0.16   –0.17   –0.16   –0.17   –0.17      –0.17
  Gastrointestinal problems    –0.12   –0.11   –0.12   –0.12   –0.12   –0.11   –0.12      –0.11
General Health                  0.67    0.66    0.66    0.67    0.66    0.66    0.67       0.66
Age                            –0.07   –0.09   –0.05   –0.08   –0.07   –0.09   –0.07      –0.07

B = baseline; F = follow-up.


Table III-7. Correlations among participants' MCS scores, mental health characteristics, and age.

                                Cohort 1        Cohort 2        Cohort 3        Cohort 4   Cohort 5
                                B       F       B       F       B       F       B          B
Sadness                        –0.60   –0.59   –0.60   –0.60   –0.60   –0.60   –0.61      –0.60
Feeling Depressed
  Within past year             –0.60   –0.58   –0.60   –0.58   –0.60   –0.58   –0.60      –0.59
  Within past 2 years          –0.44   –0.43   –0.45   –0.43   –0.45   –0.43   –0.45      –0.45
General Health                  0.46    0.46    0.47    0.46    0.47    0.46    0.47       0.46
Age                            –0.05   –0.02   –0.07   –0.05   –0.05   –0.03   –0.06      –0.06

B = baseline; F = follow-up.

The magnitude of the estimates varies. For the PCS score, the highest correlations were observed for the composite ADL measure (–0.56 to –0.58), the measure of general health (0.66 to 0.67), and questions related to chest pain while exercising (–0.36 to –0.38) and at rest (–0.30 to –0.33). For the MCS score, the correlations between feelings of sadness (–0.59 to –0.61), feeling depressed or sad within the last year (–0.58 to –0.60), feeling depressed or sad within the past two years (–0.43 to –0.45), and general health (0.46 to 0.47) were the highest.

Table III-8 presents the findings from our probit analyses. Both the PCS and MCS scores were predictive of mortality. When a model including only the PCS and MCS scores as independent variables was estimated, the Pseudo R2 for mortality ranged from .0727 to .0756 across cohorts, slightly less than half of the total Pseudo R2 reported in Table III-8 for the full models. The three full models were statistically significant, with Pseudo R2 ranging from 0.152 to 0.169. Associations were fairly robust across the three models, particularly with regard to age and most measures of chronic conditions, in addition to the estimated PCS and MCS effects. Focusing on the Cohort 3 findings, a one-unit increase in PCS score was associated with a 0.20 percentage point decrease in the probability of death; the effect of a one-unit increase in MCS score was somewhat smaller, a 0.12 percentage point decrease. Increasing age and the presence of any of a number of chronic conditions were also predictive of death, as one might anticipate. Individuals under treatment for lung cancer at baseline had the highest probability of death, controlling for other factors. The probability of death was also notably higher for persons self-reporting a history of congestive heart failure, stroke, COPD, diabetes, a history of cancer, or treatment at baseline for colon cancer. Some chronic conditions included in our estimations were not significantly related to 2-year mortality (e.g., angina), while others appeared to be protective (e.g., arthritis). We attribute these differences to differing disease progression among chronic conditions over a 2-year period.
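The share of explanatory power attributable to the summary scores can be approximated by comparing McFadden's pseudo R-squared from a restricted model (PCS and MCS scores only) with that of the full specification. The sketch below shows the comparison under the same hypothetical column names as the earlier probit example.

    import statsmodels.api as sm

    def pseudo_r2(df, covariates, outcome="died"):
        """McFadden's pseudo R-squared (1 - llf/llnull) from a probit model."""
        X = sm.add_constant(df[covariates])
        return sm.Probit(df[outcome], X).fit(disp=0).prsquared

    # share = pseudo_r2(df, ["pcs", "mcs"]) / pseudo_r2(df, full_covariate_list)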


Table III-8. Predictors of mortality in Medicare HOS participants.

                               Cohort 1              Cohort 2              Cohort 3
                               dF/dx (S.E.)          dF/dx (S.E.)          dF/dx (S.E.)
PCS score                      –.0019 (.000)***      –.0020 (.000)***      –.0020 (.000)***
MCS score                      –.0010 (.000)***      –.0011 (.000)***      –.0012 (.000)***
Demographic
  Age (in years)                .0021 (.000)***       .0024 (.000)***       .0025 (.000)***
  75–84                         .0009 (.000)***       .0007 (.000)*         .0009 (.000)***
  85+                           .0026 (.000)***       .0026 (.000)***       .0026 (.004)***
  Female                       –.0159 (.010)         –.0074 (.010)          .0062 (.010)
  Female * Age                 –.0001 (.000)         –.0002 (.000)         –.0004 (.000)**
  African American             –.0021 (.002)         –.0018 (.002)         –.0041 (.002)
  Hispanic                     –.0063 (.002)*        –.0086 (.002)***      –.0075 (.002)**
  Asian                        –.0051 (.004)         –.0088 (.002)*        –.0060 (.004)
  Married                      –.0039 (.001)**       –.0084 (.001)***      –.0063 (.001)***
Socioeconomic
  High school grad             –.0006 (.001)          .0000 (.001)         –.0001 (.001)
  Poverty income                .0024 (.001)          .0015 (.001)          .0035 (.001)**
  Medicaid                      .0119 (.004)***       .0049 (.001)          .0085 (.003)**
  Own home                     –.0054 (.001)***      –.0090 (.001)***      –.0089 (.001)***
Health Characteristics
  Hypertension                 –.0003 (.001)         –.0016 (.001)         –.0047 (.001)***
  Angina                       –.0022 (.002)         –.0020 (.001)         –.0015 (.001)
  CHF                           .0358 (.003)***       .0392 (.003)***       .0421 (.002)***
  AMI                           .0060 (.002)**        .0037 (.001)*         .0036 (.002)*
  Other heart                   .0014 (.001)          .0022 (.001)          .0032 (.001)*
  Stroke                        .0141 (.002)***       .0170 (.002)***       .0166 (.002)***
  COPD                          .0152 (.002)***       .0181 (.002)***       .0166 (.002)***
  Gastrointestinal             –.0054 (.002)*        –.0049 (.002)*        –.0107 (.002)***
  Arthritis of hip             –.0010 (.001)***      –.0121 (.001)***      –.0152 (.001)***
  Arthritis of hand            –.0078 (.001)***      –.0082 (.001)***      –.0091 (.001)***
  Sciatica                     –.0126 (.001)***      –.0152 (.001)***      –.0128 (.001)***
  Diabetes                      .0128 (.002)***       .0140 (.001)***       .0107 (.001)***
  Any cancer history            .0226 (.002)***       .0274 (.002)***       .0292 (.002)***
  Colon                         .0140 (.006)*         .0146 (.006)**        .0132 (.005)**
  Breast                        .0004 (.005)         –.0074 (.004)          .0002 (.004)
  Prostate                     –.0122 (.003)***      –.0052 (.003)         –.0093 (.003)***
  Lung                          .1413 (.017)***       .1681 (.017)***       .1579 (.015)***
Number of obs.                  109,783               126,539               128,968
LR chi2(32)                     7277.38               9499.48               10250.55
Prob > chi2                     .0000                 .0000                 .0000
Pseudo R2                       .1520                 .1639                 .1686
Obs. P                          .0568                 .0607                 .0632
Pred. P                         .0371                 .0384                 .0399

* p < .05. ** p < .01. *** p < .001.


Our findings on predictive validity are similar to those reported by McHorney (1996) for PCS score and mortality, but not MCS score. McHorney (1996) found that the General Health and Physical Functioning scales differentiated subsequent mortality, but reported no relationship between baseline Mental Health scale scores and mortality. In part, this may be due to a difference in approach. McHorney examined the relationship of each of the eight scale scores with mortality, not controlling for additional factors. Our approach relied on the summary measures and controlled for several additional predictors of mortality, such as chronic conditions. However, we did observe an association between MCS score and mortality in models unadjusted for additional individual characteristics (data not shown).

Discussion of HOS Validity Results

With regard to the validity of using the SF-36 in the Medicare managed care population, we found evidence of concurrent validity through the correlation analyses and evidence of predictive validity through our analyses of 2-year mortality across the first three cohorts. The PCS and MCS scores alone accounted for almost half of the Pseudo R2 for 2-year mortality.

Although our analysis presents strong evidence for the reliability and validity of the HOS instrument, a review of the literature suggests two areas that should be targeted for more research. Hill et al. (1996) reported concerns with face validity among older individuals using mental health or incontinence services. One focus of subsequent research could be to further explore issues of face validity among Medicare beneficiaries, which could lead to refinements so that questions relate more clearly to the health status concerns of older individuals.

Bierman, Lawrence, and colleagues (2001b) cautioned that the effects of racial, ethnic, and cultural differences on SF-36 scores have not been well established. Gandek et al. (2004) reported internal consistency reliability estimates that meet the minimum criterion for group comparisons across several different subgroups, including racial and ethnic subgroups. With regard to validity, although Gandek et al.’s (2004, p. 19) results “across the subgroups supported the interpretation of the two components as physical and mental”, there were differences in factor loadings across subgroups, particularly racial/ethnic groups. In light of the increasing diversity of Medicare enrollees, this issue warrants additional research.


Attrition

As noted in the Introduction, an important feature of the HOS is its longitudinal framework. In any longitudinal survey, attrition between survey waves can pose a serious impediment to drawing inferences about changes in outcomes over time. The inability to survey all baseline respondents at follow-up would not limit such inferences if the process that determined who participated in follow-up surveys were purely random. Typically, however, baseline respondents in a longitudinal survey are lost at follow-up for many reasons, including the inability or unwillingness of respondents to participate in a follow-up survey and the inability of surveyors to locate baseline respondents at follow-up. To the extent that these reasons for lack of follow-up are correlated with health outcomes, respondent characteristics, or plan characteristics, attrition between the HOS waves could limit the ability to reach conclusions about changes in health status overall or for important subgroups of beneficiaries or plans.

In the case of the HOS, eligible respondents from the baseline survey may not be surveyed at follow-up for several reasons, including unwillingness to respond at follow-up; death between baseline and follow-up; and voluntary disenrollment from the managed care plan in which they were enrolled at baseline. In addition, beneficiaries surveyed at baseline may not be included in a follow-up survey because the plan in which they were enrolled at baseline merged with another plan, reduced its market area, or exited the Medicare market.

In this section, we analyze the impact of the inability to obtain follow-up information on HOS baseline respondents who met the eligibility criteria for inclusion in the follow-up survey. We restrict our analysis to those who were 65 or older and enrolled in plans that remained part of the HOS between baseline and follow-up. We adopt a standard approach, defining attrition as the inability to obtain valid follow-up data on respondents who were eligible for follow-up or who died between baseline and follow-up. Understanding how eligible baseline respondents who were successfully remeasured differ from those for whom no information could be collected is useful in assessing whether the data available at follow-up within plans reflect real changes in beneficiaries' health outcomes or non-random selection of enrollees out of plans. We compare follow-up respondents to attriters as a group, as well as to subgroups of attriters: those who did not respond at follow-up, those who died before follow-up, and those who voluntarily disenrolled prior to follow-up.

We first describe the analyses conducted to assess the impact of attrition on understanding changes in health outcomes overall and among groups of beneficiaries and managed care plans. We then report on the results of those analyses.


Overview of Analyses to Assess Significance of Attrition in HOS

To assess whether attrition could limit the ability to draw inferences about health status change overall, and by groups of beneficiaries and plans, we conducted two types of analyses for each HOS cohort for which follow-up data are available. First, we compared baseline characteristics of the group for which follow-up is available (follow-up respondents) to the group for whom it is not (attriters). Specifically, we compared differences in physical and mental health, as measured with the SF-36; differences in demographic and socioeconomic characteristics; and differences in Medicaid status and M+CO plan size. We tested for differences in these characteristics between attriters and follow-up respondents assuming independent samples and unequal variances. We then carried out the same tests between follow-up respondents and each attrition group: 1) non-respondents; 2) those who died before follow-up; and 3) those who voluntarily disenrolled before follow-up. We do not include in our analyses baseline respondents who were ineligible for follow-up because their plans merged or exited the Medicare managed care market, since our focus here is on whether attrition inhibits the use of baseline and follow-up HOS data to inform performance management and quality improvement among Medicare managed care plans.
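The group comparisons described above amount to Welch (unequal-variance) t-tests applied to each baseline characteristic in turn. A minimal sketch of that step follows; the attrition flag and characteristic column names are hypothetical.

    import pandas as pd
    from scipy import stats

    def compare_groups(df, characteristic_cols, flag_col="attrited"):
        """Welch t-tests comparing attriters (flag = 1) with follow-up respondents (flag = 0)."""
        rows = []
        for col in characteristic_cols:
            a = df.loc[df[flag_col] == 1, col].dropna()
            b = df.loc[df[flag_col] == 0, col].dropna()
            t, p = stats.ttest_ind(a, b, equal_var=False)   # unequal variances
            rows.append({"characteristic": col, "attriter_mean": a.mean(),
                         "respondent_mean": b.mean(), "t": t, "p": p})
        return pd.DataFrame(rows)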

Next, we explored the role of physical and mental health, demographic and socioeconomic characteristics, and plan size in attrition within a multivariate context. For each cohort, we estimated a probit regression in which the dependent variable equals 1 if the respondent attrited between baseline and follow-up, and 0 otherwise. This approach permits a more careful analysis of which factors are associated with attrition, because multivariate analysis helps eliminate spurious correlation between observable beneficiary characteristics and attrition, and it identifies which factors have the most substantial and significant associations with attrition.


Results of Analyses to Assess Significance of Attrition in the HOS

In Table III-9, we summarize baseline characteristics for follow-up respondents and attriters for Cohort 1 of the HOS. In the first column, we present descriptive statistics for the demographic, socioeconomic, and health status characteristics at baseline for the HOS respondents who later provided follow-up information. In the second column, we present the same descriptive information at baseline for the group of HOS respondents who were eligible for follow-up but for whom no follow-up survey information was obtained. The remaining columns present this information on attriters by reason for attrition: respectively, those who were nonrespondents at follow-up, respondents who died before follow-up, and those who voluntarily disenrolled from their baseline managed care plan prior to follow-up.7 All columns with information on attriters indicate whether observed baseline characteristics differ significantly from those of follow-up respondents. It is important to recognize that the very large sample size of the HOS results in small standard errors, and hence statistically significant differences are common; assessing the substantive magnitude of observed differences is therefore at least as important in understanding differences between attriters and follow-up respondents.

Table III-9. Baseline characteristics by follow-up status: HOS Cohort 1.

                                              Attriters: By Reason for Attrition
                              Respondents   All         Non-          Dead        Voluntarily
                                                        Respondents               Disenrolled
Demographics
  Age                         73.9          74.8*       74.9*         78.5*       73.8
  Race (1)
    White                     89.3%         87.1%*      83.8%*        88.9%       88.0%*
    Black                     5.8%          8.0%*       10.7%*        7.3%*       7.3%*
    Asian                     1.3%          1%*         1.1%*         0.76%*      0.94%*
    Hispanic                  1.6%          2.1%*       2.2%*         1.4%        2.1%*
    Native American           0.1%          0.06%*      0.1%*         0.1%        0.04%
  Female                      58.2%         56.9%*      58.1%*        49.1%*      58.5%
Socioeconomic
  Marital status
    Married                   59.2%         55.5%*      51.8%*        49.9%*      58.4%
    Divorced                  8.4%          9.3%*       9.9%*         8.1%        9.4%*
    Widow                     28.9%         31.5%*      34.1%*        38.2%*      28.9%
    Never married             2.7%          2.4%        3.1%          2.6%        2.1%*

7 The entire group of attriters also includes respondents determined by the HOS to be invalid for follow-up survey, for reasons such as an incorrect address or phone number. There were only 567 such respondents, and they are not analyzed separately here.


Table III-9 (continued). Baseline characteristics by follow-up status: HOS Cohort 1.

                                              Attriters: By Reason for Attrition
                              Respondents   All         Non-          Dead        Voluntarily
                                                        Respondents               Disenrolled
  Education
    Less than high school     28.8%         33.3%*      34.7%*        39.5%*      31.0%*
    High school graduate      35.7%         33.1%*      33.2%*        31.3%*      33.6%*
    Some college credits      20.6%         20.0%*      19.4%*        17.8%*      20.8%*
    College graduate          6.8%          6.1%*       5.4%*         5.1%*       6.6%*
    Post graduate             7.5%          6.4%*       6.1%*         4.8%*       7.0%*
  Income (2)
    Household income
      (continuous)            $27,564       $25,130*    $24,404*      $21,905*    $26,271*
    <$5,000                   3.4%          5.2%*       6.3%*         6.8%*       4.4%*
    $5,000–9,999              12.4%         15.5%*      17.1%*        18.7%*      14.0%*
    $10,000–19,999            30.2%         31.4%*      31.4%*        33.7%*      30.7%*
    $20,000–29,999            22.1%         20.7%*      19.4%*        19.4%*      21.5%*
    $30,000–39,999            13.7%         12.1%*      11.1%*        10.6%*      12.9%*
    $40,000–49,999            7.6%          6.3%*       5.8%*         4.9%*       6.9%*
    $50,000–79,999            7.4%          6.0%*       6.0%*         4.1%*       6.5%*
    >$80,000                  3.3%          2.8%*       2.8%*         1.7%*       3.0%*
Health
  PCS                         44.1          42.1*       42.9*         35.1*       43.6*
  MCS                         52.8          50.8*       51.0*         46.7*       51.8*
  Physical functioning scale  42.9          40.2*       40.8*         31.8*       42.1*
  Role–Physical scale         45.2          43.3*       44.1*         37.1*       44.6*
  Bodily pain scale           47.8          46.4*       47.1*         42.6*       47.1*
  General health scale        48.0          45.8*       46.4*         39.0*       47.4*
  Vitality scale              50.7          48.7*       49.4*         42.6*       49.9*
  Social functioning scale    49.2          46.6*       47.4*         39.3*       48.2*
  Role–Emotional scale        48.6          46.6*       46.9*         41.8*       47.7*
  Mental health scale         51.8          49.9*       50.0*         46.2*       50.9*
Health Care Context
  Respondent Medicaid
    beneficiary               2.2%          4.0%*       3.8%*         6.1%*       3.4%*
  Enrolled in plan larger
    than 1,000                98.1%         98.8%*      98.9%         98.4%*      99.2%*

* Significantly different from respondents at the 1% level.
(1) Race categories may not sum to 100% because respondents reporting race as "Unknown" or "Other" are not reported here.
(2) Income was recorded in nine ranges. We estimate the underlying value of income by assigning midpoint values.


Demographics

In terms of demographic characteristics, there are some notable differences between follow-up respondents and attriters, as well as many similarities. The mean age of respondents who would be available for follow-up was 73.9 at baseline, nearly one year younger than the mean age of baseline respondents who attrited. Among attriters, ages differ substantially by reason for attrition. Not surprisingly, respondents who died between baseline and follow-up were substantially older at baseline (78.5 years) than follow-up respondents, and follow-up nonrespondents were also older than follow-up respondents. Those who voluntarily disenrolled, who comprise the largest group of attriters, were approximately the same age at baseline as follow-up respondents.

In many ways, the racial and ethnic composition of respondents available for follow-up and those who were not are similar. Nearly 90% (89.3%) of follow-up respondents were white, 5.8% African American, 1.3% Asian, and 1.6% Hispanic. This compares to 87.1% white, 8% African American, 1% Asian, and 2.1% Hispanic among attriters. The only real difference here is that attriters are slightly more likely to be African American (and less likely to be white) than follow-up respondents. This difference is almost entirely driven by the higher rate of non-response among African Americans. Fully 10.7% of follow-up nonrespondents were African American (nearly double the proportion of African Americans among follow-up respondents). At the same time, the proportions of African Americans and whites among those who attrited because of death or voluntary disenrollment are very close to the proportions among follow-up respondents.

Sample members who were available for follow-up were somewhat more likely to be female than were attriters (58.2% versus 56.9%). This small difference is driven entirely by the higher rate of mortality among men between baseline and follow-up: while women make up a majority of all respondents to the HOS at baseline and follow-up, just under half (49.1%) of respondents who died between baseline and follow-up were female.


Socioeconomic Status

There are a few small differences between follow-up respondents and attriters in terms of their marital status at baseline. Fifty-nine percent (59.2%) of those responding to follow-up were married at baseline, compared to 55.5% of attriters. Follow-up respondents were slightly less likely to be divorced or widowed, but did not differ significantly in the likelihood of never having married. Among attriters, nonrespondents were more likely to have been divorced at baseline (9.9% versus 8.4% for follow-up respondents), and those who were dead by follow-up were substantially more likely to have been widowed at baseline (38.2% versus 28.9% for follow-up respondents). The latter difference is likely the result of age differences between the two groups.

The educational attainment of follow-up respondents and attriters differs in a notable way. Attriters are more likely than follow-up respondents to have less than a high school education (33.3% versus 28.8%), and this is especially true among nonrespondents (34.7%) and those who died before follow-up (39.5%). To the extent that follow-up respondents and attriters differ in the likelihood of not finishing high school, the difference is almost completely reflected in the likelihood of having a high school diploma with no additional college education; differences in the proportions with various levels of post-secondary education are small.

Next, we turn to differences in income at baseline. It is important to recognize that our analysis of income is limited by the fact that respondents were asked to report their incomes in one of nine ranges rather than their actual income. In the first row under income, we therefore present a mean income estimated from the reported ranges, assigning the midpoint of a range to all respondents reporting income in that range.8 In the rows below, we report the proportion reporting income within various ranges.9 Regardless of how income is analyzed, it is clear that follow-up respondents report higher incomes than attriters ($27,564 versus $25,130). Those who died between waves and follow-up nonrespondents disproportionately report low incomes. Among follow-up respondents, 3.4% report incomes of less than $5,000 and 12.4% report incomes between $5,000 and $9,999, compared to 5.2% and 15.5%, respectively, among all attriters.

8 This is a common technique for dealing with grouped data. 9 These ranges do not match the HOS income ranges precisely. For purposes of presentation, we collapsed some of the higher income ranges.
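The footnoted midpoint approach can be expressed in a few lines, as sketched below. The category codes and the value assigned to the open-ended top range ($100,000 or more), which has no true midpoint, are assumptions for illustration.

    import pandas as pd

    # Hypothetical codes 1-9 for the nine HOS income ranges.
    INCOME_MIDPOINTS = {1: 2500, 2: 7500, 3: 15000, 4: 25000, 5: 35000,
                        6: 45000, 7: 65000, 8: 90000, 9: 110000}

    def impute_income(income_category: pd.Series) -> pd.Series:
        """Assign each respondent the midpoint of the reported income range."""
        return income_category.map(INCOME_MIDPOINTS)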


Health Status and Context

Differences in the health status of respondents who completed the follow-up survey and those who did not are largely driven by the poor baseline health of respondents who died before follow-up. The mean SF-36 physical component summary score of those who died before follow-up was 35.11 at baseline, compared to 44.12 for follow-up respondents; this group also had lower mental component summary scores (46.71 versus 52.75). Other groups of attriters differed much less markedly from follow-up respondents on both physical and mental health measures.

The most substantial differences in health status between those who died before follow-up and follow-up respondents appear in the physical and social functioning scales. As with the summary measures, the differences between follow-up respondents and other groups of attriters on these scales are much smaller.

In the final two rows of Table III-9, we present evidence on whether follow-up respondents and attriters differ in the likelihood of receiving Medicaid benefits and in the size of their M+CO plan. Follow-up respondents are less likely to be Medicaid beneficiaries than are attriters (2.2% versus 4.0%). In particular, baseline respondents who died before follow-up or did not respond at follow-up have the highest rates of Medicaid receipt (6.1% and 3.8%, respectively).

There is little difference in attrition for beneficiaries enrolled in plans with more than 1,000 enrollees. The only difference between HOS follow-up respondents and attriters in the proportion enrolled in such plans is driven by the higher rate of voluntary disenrollment among respondents in larger plans. Ninety-eight percent (98.1%) of follow-up respondents were enrolled in large plans at baseline, compared to a slightly higher rate of 98.8% among attriters. However, the rates of enrollment in large plans at baseline for nonrespondents and those who died before follow-up were not statistically different from the rate among follow-up respondents; only for those who voluntarily disenrolled was the proportion enrolled in large plans higher at baseline (99.2%).

This last finding provides an important contrast to the findings mentioned above on differences between follow-up respondents and various groups of attriters. In general, follow-up respondents and those who voluntarily disenrolled between baseline and follow-up were the most similar in terms of demographics, socioeconomic status, and health status. Baseline respondents who were nonrespondents at follow-up and those who died between baseline and follow-up were the most likely to differ from follow-up respondents on these dimensions, and when they did, their differences were more substantial. Many hypotheses might be developed to explain this pattern, including several based on the nexus of income, education, and health.


In Tables III-10 and III-11, we present essentially identical analyses for HOS Cohorts 2 and 3, respectively. The patterns of demographic, socioeconomic, and health differences between follow-up respondents and attriters for Cohorts 2 and 3 are substantially similar to those observed in Cohort 1. Consequently, we do not summarize those patterns here, but include the tables for the benefit of the reader.

Table III-10. Baseline characteristics by follow-up status: HOS Cohort 2.

                                              Attriters: By Reason for Attrition
                              Respondents   All         Non-          Dead        Voluntarily
                                                        Respondents               Disenrolled
Demographics
  Age                         74.1          75.2*       75.6*         78.9*       74.0
  Race (1)
    White                     88.9%         86.5%*      84.2%*        88.3%       87.1%*
    Black                     6.1%          8.2%*       10.1%*        7.5%*       7.7%*
    Asian                     1.4%          1.0%*       1.3%*         1.0%*       0.8%*
    Hispanic                  1.5%          2.2%*       1.9%*         1.5%        2.2%*
    Native American           0.1%          0.1%        0.1%          0.1%        0.03%
  Female                      59.2%         57.1%*      59.6%         49.7%*      58.1%
Socioeconomic
  Marital status
    Married                   57.9%         53.5%*      49.9%*        47.7%*      56.5%
    Divorced                  8.3%          9.4%*       9.7%*         8.7%        9.5%*
    Widow                     29.3%         32.4%*      35.3%*        39.0%*      30.5%
    Never married             2.6%          2.6%        3.1%          2.6%        2.4%*
  Education
    Less than high school     29.4%         34.9%*      37.2%*        40.1%*      32.4%*
    High school graduate      35.6%         33.7%*      33.3%*        32.1%*      34.3%*
    Some college credits      20.5%         18.9%*      17.7%*        16.8%*      20.0%*
    College graduate          6.6%          5.7%*       5.6%*         5.1%*       5.9%*
    Post graduate             7.2%          5.8%*       4.9%*         4.6%*       6.4%*


Table III-10 (continued). Baseline characteristics by follow-up status: HOS Cohort 2.

                                              Attriters: By Reason for Attrition
                              Respondents   All         Non-          Dead        Voluntarily
                                                        Respondents               Disenrolled
  Income (2)
    Household income
      (continuous)            $26,762       $23,765*    $23,059*      $21,170*    $24,799*
    <$5,000                   5.9%          8.1%*       9.3%*         9.7%*       7.2%*
    $5,000–9,999              12.1%         15.2%*      16.3%*        17.0%*      14.2%*
    $10,000–19,999            30.1%         32.4%*      32.8%*        34.9%*      31.6%*
    $20,000–29,999            21.6%         19.7%*      18.3%*        18.6%*      20.5%*
    $30,000–39,999            12.9%         10.8%*      10.4%*        9.6%*       11.4%*
    $40,000–49,999            7.0%          5.6%*       4.9%*         4.5%*       6.3%*
    $50,000–79,999            6.9%          5.5%*       5.2%*         3.9%*       6.1%*
    >$80,000                  3.4%          2.6%*       2.7%*         1.8%*       2.7%*
Health
  PCS                         43.1          41.4*       42.2*         34.5*       43.0*
  MCS                         52.6          50.6*       50.7*         46.3*       51.7*
  Physical functioning scale  42.4          39.5*       39.9*         31.8*       41.5*
  Role–Physical scale         44.6          42.5*       43.1*         31.4*       44.0*
  Bodily pain scale           47.4          45.8*       46.3*         36.3*       46.7*
  General health scale        47.6          45.2*       45.7*         41.7*       46.8*
  Vitality scale              50.4          48.4*       49.0*         38.5*       49.8*
  Social functioning scale    48.9          46.0*       46.5*         38.6        47.8*
  Role–Emotional scale        48.2          46.0*       46.1*         41.1*       47.3*
  Mental health scale         51.6          49.6*       49.7*         45.8*       50.7*
Health Care Context
  Respondent Medicaid
    beneficiary               2.5%          4.5%*       4.1%*         6.1%*       4.1%*
  Enrolled in plan larger
    than 1,000                98.0%         98.0%*      97.7%         98.2%       98.1%

* Significantly different from respondents at the 1% level.
(1) Race categories may not sum to 100% because respondents reporting race as "Unknown" or "Other" are not reported here.
(2) Income was recorded in nine ranges. We estimate the underlying value of income by assigning midpoint values.


Table III-11. Baseline characteristics by follow-up status: HOS Cohort 3.

                                              Attriters: By Reason for Attrition
                              Respondents   All         Non-          Dead        Voluntarily
                                                        Respondents               Disenrolled
Demographics
  Age                         74.4          75.3*       75.7*         79.3*       74.1*
  Race (1)
    White                     89.6%         87.4%       85.2%*        89.5%       88.1%*
    Black                     5.5%          7.3%*       8.5%*         6.2%        7.0%*
    Asian                     1.2%          1.0%*       1.4%          1.1%        0.9%*
    Hispanic                  1.4%          1.9%*       1.9%*         1.6%        1.8%*
    Native American           0.04%         0.05%       0.1%          0.03%       0.1%
  Female                      59.1%         57.0%*      59.4%         49.0%*      57.8%*
Socioeconomic
  Marital status
    Married                   58.7%         55.0%*      52.8%*        47.6%*      57.9%
    Divorced                  8.4%          9.2%*       0.1%          8.4%        9.5%*
    Widow                     29.1%         31.9%*      34.2%*        39.7%*      28.8%
    Never married             2.8%          2.8%        2.8%          3.1%        2.7%
  Education
    Less than high school     27.9%         32.0%*      34.0%*        38.3%*      29.2%*
    High school graduate      36.8%         34.5%*      34.7%*        32.7%*      35.0%*
    Some college credits      20.9%         19.8%*      18.7%*        17.5%*      21.1%
    College graduate          6.6%          6.4%        5.9%          5.5%*       6.8%
    Post graduate             7.3%          6.5%*       5.7%*         4.8%*       7.3%
  Income (2)
    Household income
      (continuous)            $28,342       $26,774*    $25,745*      $23,549*    $28,060*
    <$5,000                   3.3%          5.0%*       5.9%*         6.3%*       4.3%*
    $5,000–9,999              10.8%         13.2%*      14.2%*        16.1%*      12.0%*
    $10,000–19,999            31.1%         31.5%       32.4%*        35.1%*      30.2%*
    $20,000–29,999            23.0%         21.0%*      20.4%*        19.3%*      21.6%*
    $30,000–39,999            12.7%         12.0%*      11.1%*        10.2%*      12.8%
    $40,000–49,999            7.4%          6.6%*       6.0%*         5.0%*       7.2%
    $50,000–79,999            7.9%          7.3%*       6.7%*         5.1%*       8.1%
    >$80,000                  3.9%          3.6%*       3.4%*         2.8%*       3.9%
Health
  PCS                         43.1          41.5*       42.0*         34.4*       43.0
  MCS                         52.5          50.5*       50.9*         45.9*       51.6*
  Physical functioning scale  42.1          39.8*       40.1*         31.5*       41.7*
  Role–Physical scale         44.3          42.7*       43.3*         36.2*       44.1
  Bodily pain scale           47.0          46.0*       46.4*         41.8*       46.8*
  General health scale        47.3          45.3*       45.8*         38.4*       46.9*
  Vitality scale              50.2          48.5*       49.1*         42.1*       49.8*
  Social functioning scale    48.8          46.3*       46.9*         38.5*       48.0*
  Role–Emotional scale        48.0          46.0*       46.3*         40.8*       47.2*
  Mental health scale         51.6          49.8*       50.0*         45.7*       50.8*

Final Report December 27, 2004

56

Page 63: Medicare Health Outcomes Survey Evaluation...Medicare+Choice (M+COs) plans (renamed Medicare Advantage plans in the fall of 2004) (CMS Data Compendium, 2003). The number enrolled in

Delmarva Foundation We don’t provide healthcare…we make it better.

M ed ica re H ea lth O u tcom es S u rvey P rogram E va lu a tion

Table III-11. (continued) Baseline characteristics by follow-up Status: HOS Cohort 3.

Attriters: By Reason for Attrition

Respondents

All Non-

Respondents

Dead Voluntarily Disenrolled

Health Care Context R esp on d en t M ed ica id

b eneficia ry 2 . 7 % 4 . 3 %* 3 . 7 %* 6 . 7 %* 3 . 9%*

E n rolled in p la n la rger th an 1 , 000

1 00.0% 9 9 . 0%* 9 8 . 2 % 9 8 . 5% 9 9 . 5%*

* S ign ifican tly d ifferen t from resp on d en ts a t th e 1 % leve l. 1 R ace categories m ay not su m to 1 00% b ecau se resp on d en ts rep ortin g race as “U n known” or “O th er” are not rep orted h ere .

2 I n com e was record ed in n in e ranges. W e estim ate th e u n d erly in g va lu e of in com e b y assign in g m id p oin t va lu es.

Multivariate Analyses

To further assess the relationship between demographic, socioeconomic, and baseline health status measures and attrition, we conducted probit regression analyses for each of the first three HOS cohorts. These multivariate analyses permit us to assess the independent contribution of each of these baseline respondent characteristics to the likelihood of attrition from the HOS.

The dependent variable in these regressions was set equal to 1 if an eligible baseline respondent did not respond to the follow-up survey (i.e., the respondent was an attriter). The dependent variable was set equal to 0 if the baseline respondent was also a follow-up respondent. As in the previous analyses, we restricted our focus to baseline respondents who were eligible for follow-up. The independent variables in these regressions measured demographic, socioeconomic, and health characteristics of respondents similar to the measures employed in Tables III-9 to III-11.10

The results of the probit regression analyses for each of the three cohorts are presented in Table III-12. Recall that coefficients from probit regressions must be transformed for the purposes of interpreting magnitudes. As in our assessment of the predictive validity in the previous section, we adopt the convention here of reporting coefficients as marginal probabilities so that the reported coefficients can be interpreted as the marginal change in the probability of attrition, due to a one unit change in an independent variable.11

10 For the probit regression analyses, we measured income using a series of dummy variables based on responses to the HOS income question. We also included only the physical and mental health summary scores, and not their component scales.
11 Since the probit estimator is non-linear, it is important to recognize that we evaluate the change in probability at the mean values of the independent variables.
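To make the estimation step concrete, the following is a minimal sketch of a probit attrition model with marginal effects evaluated at the means of the covariates, fit with statsmodels. The file name and column names are hypothetical stand-ins, not the actual HOS variable names, and the covariate list is only an approximation of the full set used in the report.

```python
# A minimal sketch of the probit attrition model described above (illustrative only).
# The file name and column names are hypothetical stand-ins for the HOS analysis
# file; they are not the actual HOS variable names.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("hos_cohort1_followup_eligible.csv")  # hypothetical analysis file

# Dependent variable: 1 if an eligible baseline respondent did not respond at follow-up.
y = df["attrited"]

covariates = ["age", "black", "asian", "hispanic", "native_american", "female",
              "married", "divorced", "hs_grad", "some_college", "college_grad",
              "post_grad", "inc_10_20k", "inc_20_30k", "inc_30_40k", "inc_40_50k",
              "inc_50_80k", "inc_80k_plus", "pcs", "mcs", "medicaid", "plan_gt_1000"]
X = sm.add_constant(df[covariates])

result = sm.Probit(y, X).fit()

# Marginal changes in the probability of attrition, evaluated at the means of the
# independent variables -- the convention used for the coefficients in Table III-12.
print(result.get_margeff(at="mean").summary())
print("McFadden pseudo R-squared:", result.prsquared)
```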


Table III-12. Multivariate analysis of predictors of attrition.

Predictor | Cohort 1 | Cohort 2 | Cohort 3

Demographics
  Age | .003 (.000)* | .004 (.000)* | .003 (.000)*
  Race
    African American | .049 (.007)* | .042 (.007)* | .048 (.007)*
    Asian | –.046 (.014)* | –.070 (.014)* | –.033 (.015)
    Hispanic | .050 (.013)* | .050 (.012)* | .044 (.013)*
    Native American | .052 (.074) | .073 (.072) | –.076 (.078)
  Female | –.032 (.004)* | –.052 (.004)* | –.047 (.004)*

Socioeconomic
  Marital status
    Married | –.0003 (.004) | –.005 (.004) | –.016 (.004)*
    Divorced | .017 (.006)* | .027 (.006)* | .014 (.006)
  Education
    High school graduate | –.023 (.004)* | –.019 (.004)* | –.018 (.004)*
    Some college credits | –.002 (.005) | –.015 (.005) | –.011 (.005)
    College graduate | –.013 (.007) | –.021 (.007) | –.007 (.007)
    Post graduate | –.019 (.007)* | –.035 (.007)* | –.023 (.007)*
  Income
    $10,000–19,999 | –.030 (.005)* | –.023 (.005)* | –.035 (.005)*
    $20,000–29,999 | –.039 (.006)* | –.048 (.006)* | –.047 (.006)*
    $30,000–39,999 | –.043 (.007)* | –.057 (.006)* | –.029 (.007)*
    $40,000–49,999 | –.051 (.008)* | –.059 (.008)* | –.037 (.008)*
    $50,000–79,999 | –.053 (.008)* | –.056 (.008)* | –.023 (.008)
    > $80,000 | .046 (.011)* | –.065 (.01)* | –.022 (.01)

Health
  PCS score | –.002 (.000)* | –.002 (.000)* | –.002 (.000)*
  MCS score | –.003 (.000)* | –.003 (.000)* | –.003 (.000)*
  Medicaid beneficiary | .081 (.011)* | .068 (.009)* | .056 (.009)*
  Enrolled in plan > 1,000 | .126 (.012)* | .036 (.012)* | .213 (.012)*

Pseudo R2 | .0152 | .0178 | .0147

* Significant at the 1% level.
Note. Coefficients reported are changes in the probability of attrition, evaluated at mean values of the independent variables. Standard errors are in parentheses.

These results generally confirm the patterns identified in Tables III-9 to III-11. Age, race, and gender are all substantial predictors of attrition. Notably, even controlling for socioeconomic status and health, African American and Hispanic baseline respondents are substantially more likely to attrite than white baseline respondents. Specifically, the probability of attrition among African American respondents is 4.9 percentage points higher than among white respondents, and Hispanic respondents attrite at a rate that is 5.0 percentage points higher than white respondents. The magnitude of these increases is substantial, since the overall rate of attrition is 41.9% among all baseline respondents eligible for follow-up. So, even when controlling for other pertinent factors, attrition among African American and Hispanic respondents is more than 10% higher, in relative terms, than among white respondents. Similarly, women remain substantially less likely to attrite (for reasons other than death) prior to follow-up, with the rate of attrition about 3.2 percentage points lower for women than for men, conditional on other factors.


Similar to the earlier results, education is a predictor of attrition and most of that is due to differences between respondents who do not complete high school and those who earn a high school diploma (regardless of whether or not additional education was attained). The attrition rate among high school graduates is 2.3 percentage points lower than that among high school dropouts, controlling for other factors. Attrition rates of respondents with education beyond high school are also lower than the attrition rates of those not completing high school (1.3 percentage points lower among college graduates, and 1.9 percentage points lower among those with postgraduate levels of education). However, attrition rates are not lower among those who complete college than among those who complete high school but do not continue on to graduate from college.

The results in Table III-12 suggest that upon controlling for other factors, marital status is not a particularly important predictor of attrition. There is no difference in attrition risk between married respondents and those widowed. Attrition rates among respondents who were divorced at baseline were 1.7 percentage points higher than those widowed.

The multivariate results suggest a relationship between income and attrition that is not obvious in Tables III-9 to III-11. While attrition and income are clearly negatively related, the income gradient in the likelihood of attrition apparent in Table III-12 is noteworthy. Respondents with incomes between $10,000 and $19,999 at baseline drop out by follow-up at a rate that is 3.0 percentage points lower than the attrition rate of those with incomes under $10,000 (the comparison group). The likelihood of attrition declines somewhat further as income rises (e.g., the probability of attrition is 5.1 percentage points lower among those with incomes in the $40,000 to $49,999 range), but it does not fall appreciably further as income rises beyond that range.

Next, it is clear that good physical and mental health (measured by the PCS and MCS scores) are substantially, significantly, and negatively related to the likelihood of attrition between the baseline and follow-up surveys. These two measures were the most statistically significant predictors included in the model. For each one-unit increase in a respondent's PCS score at baseline, the probability of attrition falls by about one-quarter of a percentage point (0.23). Mental health status is an equally important predictor of attrition: as the MCS score increases by one unit, the probability of attrition falls by about one-third of a percentage point (0.33).


In terms of magnitude of risk, however, Medicaid status and enrollment in plans with more than 1,000 enrollees were among the most substantial variables included in the model. Respondents who were Medicaid beneficiaries at baseline attrited at a rate that was 8.1 percentage points higher than comparable enrollees not enrolled in Medicaid. The rate of attrition among respondents enrolled in large plans was 12.6 percentage points higher than the rate among enrollees in smaller plans with similar demographic, socioeconomic, and health characteristics. Clearly, Medicaid receipt and enrollment in a large health plan can, on average, substantially increase the risk of attrition.

It is important to mention that while each of these variables can help identify patterns of attrition on average, collectively they explain very little of the overall variation in attrition. The Pseudo-R2 values for the multivariate models to assess attrition are quite low. So, for no cohort could individual variation in the likelihood of attrition among respondents be well explained by the demographic, socioeconomic, health status, and context variables examined here. While these variables help identify increases in risk of attrition, the vast majority of follow-up non-response is due to individual properties of respondents that are not associated with these characteristics. Thus, it appears the process determining attrition is largely random, insofar as it is unrelated to important measurable socio-demographic and economic properties of respondents.


Discussion of Attrition Results

Nearly all of the attrition observed in the HOS Cohorts 1, 2, and 3 was driven by factors other than respondents’ age, gender, race, marital status, education, income, health status, plan size, and Medicaid status. Nonetheless, although the overall patterns of attrition could not be explained by such factors, there were small average differences in some of these characteristics between follow-up respondents and attriters. Respondents who attrited from the HOS, on average, are older, less educated, have lower incomes, are in poorer mental and physical health, are more likely to be Medicaid beneficiaries, and are more likely to be African American and male. Although these differences exist, the impact of attrition on analyses using the HOS is likely to be relatively minor. On many of these dimensions, average differences between follow-up respondents and attriters were small, even if they were statistically significant. For example, for Cohort 1, 89.3% of follow-up respondents were white. The percentage of attriters who were white was substantively similar at 87.1%. Although this difference is statistically significant, its substantive importance is likely to be minor.12

Although our models show their effects are quite small, two important dimensions of attrition in the HOS are useful for understanding its potential impact. First, it is clear that differences between groups of attriters on many important dimensions exist. Most importantly, and not surprisingly, baseline respondents who die before follow-up are substantially older and have a poorer mental and physical health status at baseline. There also are other important differences. In general, respondents who were counted among the attriters because they were nonrespondents at follow-up tended to differ from follow-up respondents. For example, 10.7% of follow-up nonrespondents were African American compared with 5.8% of follow-up respondents. Nonrespondents also had substantially lower incomes and lower levels of education. Alternatively, baseline respondents who were counted among the attriters because of voluntary disenrollment were much more similar to follow-up respondents on almost all dimensions. For the purposes of understanding the net effect of each of these various group differences on the larger issue of attrition, it is important to recognize that the number of attriters due to voluntary disenrollment is about three times as large as the number due to non-response. Hence, the net effect of attrition is relatively modest because of similarities between follow-up respondents and those who voluntarily disenrolled before follow-up.

12 The large sample size of the HOS, and subsequent small standard errors, renders this small difference statistically significant.


The second important dimension is that although socioeconomic factors are associated with attrition, the pattern is not linear. Attrition is substantially more likely among respondents at the very lowest levels of income and education. However, rates of attrition do not fall appreciably lower as respondents’ education levels increase beyond high school or their incomes increase above the $30,000 to $40,000 range.

Both dimensions are important for users to keep in mind as they consider the potential implications of attrition for analytical use of the HOS. Although there are some average differences between the characteristics of baseline respondents who attrite and those who are surveyed at follow-up, these are driven largely by differences between follow-up respondents and those who were nonrespondents or deceased at follow-up. Further, these differences appear most substantial at the very extremes of socioeconomic status. Nonetheless, the effects of attrition in the HOS are likely to be small, though even small effects should not be ignored: differences between respondents available for follow-up and those not available for follow-up are often substantively minor.

Statistical Power and Minimum Detectable Effects

Two types of errors are of concern in statistical analysis. The first, which is referred to as a type I error, results when a null hypothesis that is in fact true is rejected. The type I error rate is generally what is referred to as significance level (e.g., 5%) and is generally specified a priori. The second type of error, referred to as a type II error, results when a null hypothesis that is in fact false is not rejected. Power is defined as one minus the type II error rate and is interpreted as the probability of rejecting a null hypothesis when that null hypothesis is, in fact, false. A concept that is closely related to power is minimum detectable effect, “the smallest true impact that would be found to be statistically and significantly different from zero at a specified level of significance with specified power” (Orr, 1999, p. 112; also see Bloom, 1995).

In this section of the analysis, we consider the power and minimum detectable effect associated with the HOS. It should be noted that the statistical methods actually used to conduct plan-level comparisons in the HOS entail a multi-step procedure. Tractable assessments of the power and minimum detectable effect of the HOS based upon that multi-step procedure are extremely difficult, if not impossible. Hence, the analysis presented here is based upon a statistical model that is less complex than that actually used to conduct plan-level comparisons in the HOS. However, we believe that the results obtained are a reasonable approximation to what would be obtained by directly analyzing power and minimum detectable effects based on the HOS multi-step procedure.


Statistical Model and Estimation Procedure

As noted above, the related concepts of power and minimum detectable effects are intimately tied to the testing of a null hypothesis. In the context of using HOS outcome measures to identify the extent to which Medicare plans maintain or improve the physical and mental health of their Medicare beneficiaries over time, the null hypothesis of concern is:

H0: There is no unique “effect” of being enrolled in plan i.

To test any null hypothesis, a statistical model must be specified. Here, we employ the following statistical model:

Y_jt = β0 + Σ(k=1..K) βk X_kjt + Σ(i=2..I) αi P_ij + γ1 F_jt + Σ(i=2..I) γi P_ij F_jt + u_j + a_jt,   t = 1, 2   [1]

where:

Y_j2 is the "true," but not observed, follow-up score for individual j on the measure being analyzed (i.e., PCS score or MCS score);

Y_j1 is the "true," but not observed, baseline score for individual j on the measure being analyzed (i.e., PCS score or MCS score);

X_kjt is the value of covariate k for individual j (at time t);

P_ij = 1 if individual j is enrolled in plan i, and 0 otherwise;

F_jt = 1 if the observation being considered is the follow-up score for individual j, and 0 otherwise;

u_j is an "individual effect" that persists over time;

a_jt is a random error term that applies to person j at time t;

I is the number of plans; and

β0, β1, …, βK, α2, α3, …, αI, γ1, γ2, …, γI are a series of coefficients to be estimated.


Based on the coefficients γ1, γ2, …, γI, the "gross" plan effect for plan i, ∆i, can be determined as follows:

∆1 = γ1   [2A]

∆i = γ1 + γi,   i = 2, 3, …, I   [2B]

These "gross" plan effects represent the mean change in PCS or MCS scores between baseline and follow-up for enrollees in each plan, controlling for the characteristics represented in the vector X (see equation [1]). Our focus in this analysis is on "net effects," that is, the mean change in PCS or MCS scores between baseline and follow-up for enrollees in each plan, controlling for the characteristics represented in the vector X, relative to the mean change of enrollees in all other plans. The net plan effect for plan i, δi, can be determined as follows:

δi = ∆i − ∆̄i   [3]

where ∆̄i is the mean of the "gross" plan effects (as defined in equation [2]) for all plans other than plan i.

Because both the baseline and follow-up scores for each individual are measured with error, it is impossible to calculate the quantity on the left-hand side of equation [1]. Hence, equation [1] cannot be estimated directly from the HOS data. However, the relationship between true scores and observed scores at baseline and follow-up can be represented as follows:

Ỹ_jt = Y_jt + ω_j + η_jt   [4]

where:

Ỹ_j1 is the observed baseline score for individual j on the measure being analyzed (i.e., PCS score or MCS score);

Ỹ_j2 is the observed follow-up score for individual j on the measure being analyzed (i.e., PCS score or MCS score);

ω_j is a random error term that affects both the baseline and follow-up scores of individual j; and


η_j1 and η_j2 are random error terms that are time-specific.

Substituting the right-hand side of equation [1] for Y_jt in equation [4] yields:

Ỹ_jt = β0 + Σ(k=1..K) βk X_kjt + Σ(i=2..I) αi P_ij + γ1 F_jt + Σ(i=2..I) γi P_ij F_jt + u_j + a_jt + ω_j + η_jt

which can be rewritten as

Ỹ_jt = β0 + Σ(k=1..K) βk X_kjt + Σ(i=2..I) αi P_ij + γ1 F_jt + Σ(i=2..I) γi P_ij F_jt + e_j + θ_jt,   t = 1, 2   [5]

where e_j = u_j + ω_j and θ_jt = a_jt + η_jt.
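As a brief illustration of equations [2A], [2B], and [3], the sketch below computes gross and net plan effects from a vector of estimated γ coefficients. The γ values shown are invented for illustration only; they are not HOS estimates.

```python
# Illustrative computation of gross and net plan effects (equations [2A], [2B], [3]).
# The gamma values below are invented for illustration; they are not HOS estimates.
import numpy as np

gamma = np.array([0.40, -0.15, 0.25, -0.60, 0.10])  # gamma_1 ... gamma_I for I = 5 plans

# Equations [2A]/[2B]: the gross effect is gamma_1 for plan 1 and gamma_1 + gamma_i otherwise.
gross = gamma.copy()
gross[1:] = gamma[0] + gamma[1:]

# Equation [3]: the net effect for plan i is its gross effect minus the mean of the
# gross effects of all other plans.
net = np.array([g - np.delete(gross, i).mean() for i, g in enumerate(gross)])

for i, (g, n) in enumerate(zip(gross, net), start=1):
    print(f"plan {i}: gross = {g:+.3f}, net = {n:+.3f}")
```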

Equation [5] can be estimated since all variables in it are observed. In particular, if the individual effects in equation [5] (i.e., ej) represent random effects, a random effects estimator is appropriate. If, on the other hand, the individual effects in equation [5] represent fixed effects, fixed effects estimates are necessary (see Wooldridge, 2003, chapter 14). There do not appear to be any a priori grounds for believing one estimation procedure is preferable to the other for the HOS.13 Hence, we examined both estimation procedures and also conducted Hausman tests to guide our choice of whether to use fixed effects or random effects estimates.

13 It should be noted, however, that any of the covariates contained in the vector X that are time-invariant (e.g., gender, race) will drop out of fixed-effects models. Consequently, observations that contain missing values of these variables will not need to be dropped from the analysis.
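The Hausman comparison mentioned above can be computed directly from the fixed-effects and random-effects coefficient vectors and their covariance matrices. The sketch below shows one such calculation under the usual assumption that, under the null hypothesis, the covariance of the difference equals the difference of the covariances; the numeric inputs are placeholders rather than HOS estimates.

```python
# A sketch of a Hausman comparison of fixed-effects and random-effects estimates.
# b_fe and b_re are coefficient vectors (for covariates common to both models) and
# V_fe and V_re their covariance matrices; the numeric inputs are placeholders.
import numpy as np
from scipy import stats

def hausman(b_fe, V_fe, b_re, V_re):
    """Hausman chi-square statistic, degrees of freedom, and p-value."""
    diff = b_fe - b_re
    cov_diff = V_fe - V_re  # under the null, Var(b_fe - b_re) = V_fe - V_re
    stat = float(diff @ np.linalg.pinv(cov_diff) @ diff)
    dof = len(diff)
    return stat, dof, stats.chi2.sf(stat, dof)

b_fe = np.array([0.021, -0.140])   # placeholder fixed-effects estimates
b_re = np.array([0.018, -0.110])   # placeholder random-effects estimates
V_fe = np.diag([0.0004, 0.0009])
V_re = np.diag([0.0003, 0.0007])

stat, dof, p = hausman(b_fe, V_fe, b_re, V_re)
print(f"Hausman chi2({dof}) = {stat:.2f}, p = {p:.4f}")
# A small p-value favors the fixed-effects estimator, as the HOS analyses found.
```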

Final Report December 27, 2004

6 5

Page 72: Medicare Health Outcomes Survey Evaluation...Medicare+Choice (M+COs) plans (renamed Medicare Advantage plans in the fall of 2004) (CMS Data Compendium, 2003). The number enrolled in

Delmarva Foundation We don’t provide healthcare…we make it better.

M ed ica re H ea lth O u tcom es S u rvey P rogram E va lu a tion

For our random effects estimators, covariates included in the model were essentially the same as the demographic and socioeconomic variables used in the HOS case mix adjustment model (CMS and HSAG, 2003, D1-D14). Specifically, the covariates included in the model were: age; product of age and a dummy variable indicating whether the respondent is 75 years of age or older; product of age and a dummy variable indicating whether the respondent is 85 years of age or older;14 a dummy variable indicating whether the respondent is female; the product of age and the dummy variable for female; a series of dummy variables representing race/ethnicity (specifically, we used dummy variables for African American, Asian/Pacific Islander, and Hispanic); a dummy variable indicating whether the respondent is on Medicaid; a dummy variable indicating whether the respondent’s annual income is below $20,000; a dummy variable indicating whether the respondent is a high school graduate; a dummy variable indicating whether the respondent owns his/her home; a dummy variable indicating whether the respondent is married; and a dummy variable indicating whether the survey was administered by telephone. Since some of these covariates are fixed and the remaining ones have minimal variation over time, they were excluded from the fixed effects estimations. Hence, the fixed effects estimations were based on larger effective sample sizes.
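The sketch below shows one way the case-mix covariates listed above might be assembled with pandas. The input file and column names are hypothetical, and the category labels used in the recodes are assumptions rather than the HOS file specifications.

```python
# A sketch of constructing the case-mix covariates listed above with pandas.
# The input file and column names are hypothetical, and the category labels used
# for the recodes are assumptions rather than the HOS file specifications.
import pandas as pd

df = pd.read_csv("hos_analysis_file.csv")  # hypothetical analysis file

X = pd.DataFrame(index=df.index)
X["age"] = df["age"]
X["age_x_75plus"] = df["age"] * (df["age"] >= 75)  # piecewise age terms; see footnote 14
X["age_x_85plus"] = df["age"] * (df["age"] >= 85)
X["female"] = (df["sex"] == "F").astype(int)
X["age_x_female"] = X["age"] * X["female"]
X["african_american"] = (df["race"] == "Black").astype(int)
X["asian_pacific_islander"] = (df["race"] == "Asian/Pacific Islander").astype(int)
X["hispanic"] = (df["race"] == "Hispanic").astype(int)
X["medicaid"] = df["medicaid"].astype(int)
X["income_below_20k"] = (df["income"] < 20000).astype(int)
X["hs_graduate"] = df["hs_graduate"].astype(int)
X["homeowner"] = df["homeowner"].astype(int)
X["married"] = df["married"].astype(int)
X["telephone_survey"] = df["telephone_survey"].astype(int)
```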

Calculation of Power and Minimum Detectable Effects

The power of an estimate of a net plan effect is equal to the area under a standard normal distribution to the left of γi/SE(γ̂i) − t*, i.e.,

Power = Φ( γi/SE(γ̂i) − t* )   [6]

where:

γi is the actual net plan effect;

SE(γ̂i) is the standard error of the estimate of γi; and

t* is the critical value of the t-statistic for the type I error rate selected (e.g., 1.96 for a type I error rate of .05).

14 These three terms involving age allow for a piecewise nonlinear effect, with the marginal effects of an additional year of age taking on three different values: one for the first 75 years of life, one for the period between ages 75 and 85, and one for the period after age 85.


It can also be shown that the minimum detectable effect (MDE) of an estimate of a net plan effect is given by

MDE = (k + t*) × SE(γ̂i)   [7]

where:

t* and SE(γ̂i) are as defined above; and

k is a standard normal deviate such that the area to the left of k under a standard normal distribution is equal to the specified level of power (e.g., 0.84 if the specified level of power is 0.80).
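Equations [6] and [7] are straightforward to apply once the standard error of a net plan effect estimate is in hand. The short sketch below uses the median Cohort 1 PCS standard error (0.58) as an example and reproduces, to rounding, the corresponding entries reported later in Tables III-13 and III-19.

```python
# Power (equation [6]) and minimum detectable effect (equation [7]) for a net plan effect.
from scipy.stats import norm

def power(actual_effect, se, t_star=1.96):
    """Probability of detecting a net plan effect of the given size (equation [6])."""
    return norm.cdf(actual_effect / se - t_star)

def mde(se, target_power=0.80, t_star=1.96):
    """Smallest effect detectable with the target power (equation [7])."""
    k = norm.ppf(target_power)  # about 0.84 when the target power is 0.80
    return (k + t_star) * se

# Median standard error of the Cohort 1 PCS net plan effect estimates: 0.58.
print(round(power(2.0, 0.58), 3))   # about 0.93, the "small" effect entry in Table III-13
print(round(mde(0.58, 0.80), 2))    # about 1.62, matching Table III-19 (power .80, Cohort 1)
```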

The expressions for both power and minimum detectable effect require that the standard errors of the estimates of the net plan effects be calculated. These can be estimated as a by-product of estimating equation [5]. The calculation of power also requires that the actual effect of plan i be specified. However, this is, of course, unknown. In our analyses, we have selected two sets of values to use in calculating power. The first is derived from the distribution of net effects that were calculated from Cohort 1 data. This is analogous to the "actuarial approach" described by Lipsey (1990). In particular, we calculated power associated with the first quartile, median, and third quartile of the absolute value of estimated net effects that were calculated from Cohort 1 data. The second set of values of actual net plan effects that we considered was based on what Cohen (1988) labeled "small," "medium," and "large" effects, namely .2 standard deviations, .5 standard deviations, and .8 standard deviations. Because, by design of the SF-36, the standard deviation of PCS and MCS scores is 10, this implies that in the context of the HOS, a small effect is equal to 2.0 units, a medium effect is equal to 5.0 units, and a large effect is equal to 8.0 units.

In calculating the minimum detectable effect, the level of power must be specified a priori. It has become standard to consider .80 as a “minimal standard” for power (see Lipsey, 1990, p. 22; and Cohen, 1988). In calculating minimum detectable effects, we considered this “minimal standard,” as well as .90.


Results

The model represented by equation [5] was estimated for Cohorts 1 through 3.15 For Cohort 1, 225 plans were included in the analysis; for Cohort 2, 165 plans were included in the analysis; for Cohort 3, 155 plans were included in the analysis. Separate analyses were conducted using PCS score and MCS score as dependent variables in the model. In all cases, the Hausman test indicated that fixed effects estimators were required. The results that follow are based on these estimators.

Power

As equation [6] indicates, the power of an estimate of a net plan effect is a function of the standard error of that estimate. Because the effective sample size16 for each plan is different, the standard error of the estimate of each net plan effect is different. In Cohort 1, for example, the effective sample size for plans ranged from a low of 8 to a high of 602. While the standard error of the estimate of each plan effect was calculated, power was only calculated for a selected set of plans, namely, those with small (i.e., first quartile), moderate (i.e., median), and high (i.e., third quartile) standard errors of estimates of net plan effects. The results of our power calculations for PCS score are shown in Tables III-13 to III-15. Tables III-16 to III-18 contain the results of our power calculations for MCS score.

These results suggest that whether the HOS has "adequate" power depends very strongly upon the actual effect size deemed important to detect. The HOS clearly has adequate power to detect actual net plan effects that correspond to what Cohen (1988) labeled "small" effects, namely 0.2 standard deviations. For Cohort 1 PCS scores, for example, if the actual net plan effects correspond to these "small" effects, the power is greater than 0.93 for half of the plans and greater than 0.81 for 75% of plans. It is only the 25% of plans with the smallest effective sample sizes for which the power is less than 0.815 when the actual net plan effect is 2.0. Similar or stronger results hold for the analysis of Cohort 2 and Cohort 3 PCS scores, as well as for the analysis of MCS scores for each cohort.

15 Our analysis was based on "follow-up plans," rather than "performance measurement units." We believe that analysis on the basis of "performance measurement units" would have produced essentially identical results.
16 By effective sample size, we mean the number of respondents from whom usable baseline, as well as follow-up, data was obtained.


While the HOS has the power to detect Cohen's "small" effects, very few estimated net plan effects exceeding 2.0 (or even 1.5) in absolute value appear in our analysis. For example, the analysis of Cohort 1 PCS scores yielded only three plans for which the absolute value of the estimated net plan effect exceeded 2.0 and three plans for which it fell between 1.5 and 2.0. Ultimately, the appropriate threshold of actual net plan effects that it is deemed important to detect cannot be determined on analytical grounds alone. Rather, this decision depends critically on the clinical and policy implications of actual net plan effects of various sizes.

Table III-13. Calculated power for net plan effect estimates—PCS score, Cohort 1.

Actual Effect | Power at SE = .53 (1st quartile) | Power at SE = .58 (median) | Power at SE = .70 (3rd quartile)
0.20 (1st quartile of actual estimates) | .057 | .053 | .047
0.48 (median of actual estimates) | .146 | .129 | .101
0.84 (3rd quartile of actual estimates) | .354 | .304 | .224
2 ("Small") | .965 | .932 | .815
5 ("Medium") | 1.0 | 1.0 | 1.0
8 ("Large") | 1.0 | 1.0 | 1.0

(SE = standard error of the net plan effect estimate.)

Table III-14. Calculated power for net plan effect estimates—PCS score, Cohort 2.

Actual Effect | Power at SE = .41 (1st quartile) | Power at SE = .45 (median) | Power at SE = .54 (3rd quartile)
0.20 (1st quartile of actual estimates) | .069 | .064 | .055
0.34 (median of actual estimates) | .131 | .116 | .093
0.65 (3rd quartile of actual estimates) | .355 | .306 | .227
2 ("Small") | .998 | .994 | .960
5 ("Medium") | 1.0 | 1.0 | 1.0
8 ("Large") | 1.0 | 1.0 | 1.0

(SE = standard error of the net plan effect estimate.)


Table III-15. Calculated power for net plan effect estimates—PCS score, Cohort 3.

Actual Effect | Power at SE = .40 (1st quartile) | Power at SE = .44 (median) | Power at SE = .53 (3rd quartile)
0.15 (1st quartile of actual estimates) | .055 | .051 | .046
0.34 (median of actual estimates) | .132 | .116 | .093
0.56 (3rd quartile of actual estimates) | .289 | .244 | .184
2 ("Small") | .999 | .995 | .965
5 ("Medium") | 1.0 | 1.0 | 1.0
8 ("Large") | 1.0 | 1.0 | 1.0

(SE = standard error of the net plan effect estimate.)

Table III-16. Calculated power for net plan effect estimates—MCS score, Cohort 1.

Actual Effect | Power at SE = .48 (1st quartile) | Power at SE = .58 (median) | Power at SE = .67 (3rd quartile)
0.23 (1st quartile of actual estimates) | .071 | .060 | .054
0.47 (median of actual estimates) | .166 | .126 | .105
0.87 (3rd quartile of actual estimates) | .446 | .323 | .257
2 ("Small") | .987 | .932 | .851
5 ("Medium") | 1.0 | 1.0 | 1.0
8 ("Large") | 1.0 | 1.0 | 1.0

(SE = standard error of the net plan effect estimate.)


Table III-17. Calculated power for net plan effect estimates—MCS score, Cohort 2.

Actual Effect | Power at SE = .45 (1st quartile) | Power at SE = .49 (median) | Power at SE = .59 (3rd quartile)
0.17 (1st quartile of actual estimates) | .057 | .053 | .047
0.37 (median of actual estimates) | .128 | .114 | .091
0.66 (3rd quartile of actual estimates) | .311 | .270 | .200
2 ("Small") | .994 | .983 | .924
5 ("Medium") | 1.0 | 1.0 | 1.0
8 ("Large") | 1.0 | 1.0 | 1.0

(SE = standard error of the net plan effect estimate.)

Table III-18. Calculated power for net plan effect estimates—MCS score, Cohort 3.

Actual Effect | Power at SE = .44 (1st quartile) | Power at SE = .49 (median) | Power at SE = .60 (3rd quartile)
0.19 (1st quartile of actual estimates) | .063 | .058 | .050
0.40 (median of actual estimates) | .147 | .126 | .098
0.70 (3rd quartile of actual estimates) | .356 | .298 | .214
2 ("Small") | .995 | .983 | .915
5 ("Medium") | 1.0 | 1.0 | 1.0
8 ("Large") | 1.0 | 1.0 | 1.0

(SE = standard error of the net plan effect estimate.)

Minimum Detectable Effects

The minimum detectable effect of a net plan effect is, like the power of an estimate of such an effect, a function of the standard error of that estimate. As in the case of power, we calculated the minimum detectable effect for the same set of plans for which we calculated power. The minimum detectable effects for PCS score and MCS score are shown in Tables III-19 and III-20, respectively.


Table III-19. Minimum detectable effect for net plan effect estimates—PCS score.

Power | Cohort | MDE at 1st-quartile SE | MDE at median SE | MDE at 3rd-quartile SE
.80 | 1 | 1.48 | 1.62 | 1.96
.80 | 2 | 1.15 | 1.26 | 1.51
.80 | 3 | 1.12 | 1.24 | 1.49
.90 | 1 | 1.72 | 1.88 | 2.27
.90 | 2 | 1.33 | 1.46 | 1.75
.90 | 3 | 1.30 | 1.44 | 1.72

(SE = standard error of the net plan effect estimate; see notes 17–19 for the SE values by cohort.)

Table III-20. Minimum detectable effect for net plan effect estimates—MCS score.

Power | Cohort | MDE at 1st-quartile SE | MDE at median SE | MDE at 3rd-quartile SE
.80 | 1 | 1.34 | 1.62 | 1.87
.80 | 2 | 1.23 | 1.37 | 1.68
.80 | 3 | 1.26 | 1.37 | 1.65
.90 | 1 | 1.46 | 1.59 | 1.91
.90 | 2 | 1.55 | 1.88 | 2.16
.90 | 3 | 1.42 | 1.59 | 1.94

(SE = standard error of the net plan effect estimate; see notes 20–22 for the SE values by cohort.)

The results shown in these tables reinforce the basic picture suggested by the results of the power analysis. In particular, depending on the measure (PCS score or MCS score) and the cohort of concern, if the desired level of power is 0.80, the minimum detectable effect for a plan for which the standard error of the effect estimate is equal to the median of that standard error for all plans ranges from 1.24 to 1.62. Again using Cohort 1 PCS scores as an illustration, consider a plan for which the standard error of the effect estimate is equal to the median of that standard error for all plans. If the desired level of power is 0.80, the minimum detectable effect for such a plan is 1.62. In our analysis of PCS scores in Cohort 1, only 5 plans (out of 225) exceeded 1.62 for the absolute value of the actual estimated net plan effects.

17 The first quartiles of the standard error of net plan effect estimates for PCS score were 0.53 for Cohort 1, 0.41 for Cohort 2, and 0.40 for Cohort 3.
18 The medians of the standard error of net plan effect estimates for PCS score were 0.58 for Cohort 1, 0.45 for Cohort 2, and 0.44 for Cohort 3.
19 The third quartiles of the standard error of net plan effect estimates for PCS score were 0.70 for Cohort 1, 0.54 for Cohort 2, and 0.53 for Cohort 3.
20 The first quartiles of the standard error of net plan effect estimates for MCS score were 0.48 for Cohort 1, 0.45 for Cohort 2, and 0.44 for Cohort 3.
21 The medians of the standard error of net plan effect estimates for MCS score were 0.58 for Cohort 1, 0.49 for Cohort 2, and 0.49 for Cohort 3.
22 The third quartiles of the standard error of net plan effect estimates for MCS score were 0.67 for Cohort 1, 0.59 for Cohort 2, and 0.60 for Cohort 3.


Figures III-1 to III-6 plot the relationship between the effective sample size for a plan and the estimated minimum detectable effect for that plan. Not surprisingly, there are diminishing returns from increasing the effective sample size for a plan.23 Looking at Figure III-1, for example, increasing the effective sample size from 100 to 200 reduces the minimum detectable effect for Cohort 1 PCS scores from approximately 2.5 to approximately 1.8, whereas increasing the effective sample size from 400 to 500 reduces it from approximately 1.2 to approximately 1.1.

23 The results represented by these figures should be interpreted in a ceteris paribus sense. That is, each curve represents how the minimum detectable effect for a plan would change as the effective sample size for that plan varies, if the effective sample sizes for all other plans are held constant. Our conjecture is that simultaneously increasing the effective sample sizes for multiple plans would somewhat steepen the curve.
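Because the standard error of a net plan effect estimate shrinks roughly in proportion to the inverse square root of the effective sample size, the minimum detectable effect does as well, which is what produces the diminishing returns in the figures. The sketch below illustrates this scaling with a constant calibrated to the approximate Cohort 1 PCS value quoted above (an MDE of about 2.5 at an effective sample size of 100); it is an approximation, not a re-estimate of the HOS results.

```python
# The standard error of a net plan effect estimate shrinks roughly as 1/sqrt(n), so the
# MDE does as well. The constant below is calibrated, for illustration only, so that the
# MDE is about 2.5 at an effective sample size of 100 (the approximate Cohort 1 PCS value
# quoted above); it is an approximation, not a re-estimate of the HOS results.
import math

def approx_mde(n, c=25.0):
    """Approximate MDE as c / sqrt(n)."""
    return c / math.sqrt(n)

for n in (100, 200, 400, 500):
    print(n, round(approx_mde(n), 2))
# Roughly 2.5 at n = 100, 1.77 at n = 200, 1.25 at n = 400, and 1.12 at n = 500,
# echoing the diminishing returns described in the text.
```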


Figure III-1: Minimum Detectable Effect, Cohort 1 PCS

[Figure: minimum detectable effect (MDE, vertical axis) plotted against effective sample size (horizontal axis).]


Figure III-2: Minimum Detectable Effect, Cohort 2 PCS

[Figure: minimum detectable effect (MDE, vertical axis) plotted against effective sample size (horizontal axis).]


Figure III-3: Minimum Detectable Effect, Cohort 3 PCS

[Figure: minimum detectable effect (MDE, vertical axis) plotted against effective sample size (horizontal axis).]


Figure III-4: Minimum Detectable Effect, Cohort 1 MCS

[Figure: minimum detectable effect (MDE, vertical axis) plotted against effective sample size (horizontal axis).]


Figure III-5: Minimum Detectable Effect, Cohort 2 MCS

[Figure: minimum detectable effect (MDE, vertical axis) plotted against effective sample size (horizontal axis).]


Figure III-6: Minimum Detectable Effect, Cohort 3 MCS

[Figure: minimum detectable effect (MDE, vertical axis) plotted against effective sample size (horizontal axis).]


Ability to Detect Differences in Performance among Health Plans

The SF-36 PCS and MCS scores are used as the basis for establishing the plan-level health outcomes (see both the SF-36 Health Survey Manual and Interpretation Guide and the SF-36 Physical and Mental Health Summary Scales: A Manual for Users of Version 1 for details) (Ware et al., 1993; Ware and Kosinski, 2001). Using the four-stage data analysis method described above, the HOS results based on Cohorts 1–3 data are shown in Table III-21. Our power analysis found that the HOS has the power to detect small (0.2 standard deviation) effects. The table shows that about 15% of plans in Cohort 1, 17% in Cohort 2, and 27% in Cohort 3 had significant changes (better or worse than expected).

Table III-21. Medicare HOS performance measurement results.

Cohort | Years | Total Number of Reporting Units | Mental Health Better Than Expected | Mental Health Worse Than Expected | Physical Health Better Than Expected | Physical Health Worse Than Expected
Cohort 1 | 1998–2000 | 188 plans | 13 plans | 15 plans | None | None
Cohort 2 | 1999–2001 | 160 plans | 8 plans | 5 plans | 9 plans | 5 plans
Cohort 3 | 2000–2002 | 146 plans | 15 plans | 4 plans | 20 plans | 1 plan

(Source: www.cms.hhs.gov/surveys/hos)

Discussion of Power and Minimum Detectable Effects

The HOS clearly has adequate power to detect actual net plan effects that correspond to what Cohen (1988) labeled “small” effects, namely 0.2 standard deviations. Most net plan effects that have been observed in our analyses of Cohorts 1–3, however, are smaller than 0.2 standard deviations. Ultimately, the appropriate threshold of actual net plan effects deemed important to detect is a matter that cannot be determined on analytical grounds alone. Rather, this decision critically depends on the clinical and policy implications of actual net plan effects of various sizes.


IV. UTILIZATION OF THE HOS RESULTS

A number of organizations and individuals use the HOS data in varied ways. One of the major goals of the program is that the HOS should be useful to M+COs, providers, QIOs, CMS, beneficiaries, and health researchers. Previously we examined the quantifiable properties of the HOS measure, without which results would be ineffective for users. Here we provide information regarding how these stakeholders use HOS data. In particular, using surveys, interviews, and focus groups, we examine how the HOS is currently being used by M+COs, QIOs, CMS, and health services researchers. We also examine current thinking by the same groups regarding how the HOS data utility can be improved.

We start by reviewing the current strategies used to disseminate the HOS data and communicate related information. In addition, the methodology to develop, field, and analyze the HOS-user survey of Medicare M+COs and QIOs is presented. The surveys focus on the utility of the HOS data and dissemination strategies to M+COs and QIOs. Three focus groups were conducted with plan and QIO representatives in California, Florida, and New York to explore in greater detail some of the issues related to the HOS program. Additionally, expert interviews were conducted with CMS staff members, HOS stakeholders, and health services researchers who have used HOS data. Findings from these focus groups and interviews are presented to supplement survey findings. We discuss the major challenges identified by QIO and M+CO representatives in their use of the HOS data and offer suggestions to improve the instrument and program for its use in quality improvement within the Medicare managed care market.

HOS Dissemination Strategies and Communication Tools

A key function of the HOS is to provide health status data to M+COs and QIOs in order to facilitate improvements in the quality of care provided to their Medicare populations. At the time of this report, results from Cohorts 1–4 (baseline and follow-up) and Cohorts 5–6 (baseline only) are available to M+COs and QIOs. The following is a summary of CMS's dissemination strategies.

Baseline and performance measurement reports. Each year, a baseline report and a performance measurement report are produced for each M+CO participating in the Medicare HOS. Each participating M+CO receives plan-specific baseline and performance measurement reports, which present aggregated results from its plan, the state total, and the HOS national total. Additionally, each state’s QIO receives state-specific baseline and performance measurement reports, which present results for all plans in its respective state(s), the state total, and the HOS national total.


After the administration of each baseline cohort, a cohort-specific baseline report is produced. This report presents SF-36 PCS and MCS scores and includes data on resource utilization predictors, health status indicators, comparative results, and respondent demographics. The SF-36 scales also provide useful information, such as bodily pain and vitality scores, that can be used to guide quality improvement plans. The performance measurement report provides information similar to that of the baseline report plus health plan performance data.

HEDIS Volume 6 manual. The HEDIS Volume 6 manual is an annual document that provides information and specifications for the Medicare HOS. It includes background information about the survey, the measure description, the HOS HEDIS protocol, English and Spanish versions of the HOS questionnaire, and the text for the survey letters and postcards.

Medicare HOS information and technical support. HOS information and technical support is a vehicle for HOS stakeholders to obtain information about HOS data and results through the technical assistance telephone line (888-880-0077) and e-mail address ([email protected]). These mechanisms are used to get technical questions about the HOS program answered by experts.

HOS Web site. The HOS Web site (www.cms.hhs.gov/surveys/hos) is a tool that provides access to information about the HOS program, survey data, and results. It includes background materials, a list of relevant research publications, downloadable survey instruments, applied research reports, public use files, and data user’s guides.

Data user’s guides. Data user’s guides consist of information regarding the HOS file specifications and the accurate use of data. Detailed documentation regarding file construction and contents has been compiled for data sets distributed by the HOS program. These data user’s guides have been created to facilitate the use of the HOS data files.

HOS Conferences. Five HOS conferences have been sponsored, targeting managed care plans, QIOs, Central Office and Regional Office CMS staff, and other stakeholders. The conferences have provided training related to maximizing the use of the HOS in PRO/QIO projects, explaining the role of Medicare HOS in CMS’s quality improvement strategy, and presenting QIO and managed care plan interventions designed from HOS results. In addition, the HOS partners present the HOS findings and activities at professional conferences on an ongoing basis (e.g., American Health Quality Association, Disease Management Association of America).


HOS Research Activities. The Medicare HOS partners also are involved in research that relies on HOS data. Reports have been made available on the Medicare HOS Web site, covering topics such as how performance measurement results are calculated, the health status of younger (under 65 years of age) Medicare beneficiaries with disabilities, enrollees dually eligible for Medicare and Medicaid, and health status comparisons between the Veteran’s Health Administration and Medicare managed care enrollees. A range of research related to HOS also has been published in scientific journals. Manuscripts are available on survey administration, policy, and technical implications, and several publications are related to measurement results. A list of available publications can be found on the HOS Web site.

Analytic Approach to Review of Current HOS Use

QIO and M+CO User Surveys

To gain an understanding of M+CO and QIO familiarity with and use of Medicare HOS results and communication tools, the HOS evaluation contractor conducted a survey of M+COs currently participating in Medicare and QIOs representing each state that has M+COs.

Survey design. Two surveys were developed. One targeted M+COs, and the other targeted QIOs. The surveys covered two general areas: 1) familiarity with and use of specific HOS instrument items (e.g., PCS and MCS scores, participant demographics), and 2) familiarity with and use of the previously identified dissemination strategies. The survey instruments were developed in consultation with CMS, Delmarva Foundation representatives, and a survey research expert at the University of Maryland, Baltimore County. The instruments were refined through discussion with several M+COs and the QIO for New York State (IPRO). Several representatives of the Arizona QIO (Health Services Advisory Group), two Arizona M+COs, Cigna Health, and Maricopa Integrated Health System Health Plan then pilot tested the instruments. Information on the availability of the final survey instruments can be found in Appendix III.


Questions specific to HOS data utilization asked M+COs to rate HOS’s usefulness for conducting quality improvement, resource utilization, performance assessment, and disease and case management activities. QIOs were asked to rate the usefulness of the HOS data in quality improvement, educational, and performance assessment activities. Respondents were provided a definition for each of these activities (e.g., quality improvement). Both plans and QIOs were asked to rate their experiences with a series of potential challenges to using HOS data (e.g., challenges linking HOS results to processes of care). With regard to dissemination strategies, both M+COs and QIOs were asked about their familiarity with the following dissemination strategies 1) HOS baseline reports; 2) HOS performance measurement reports; 3) HEDIS Volume 6 manuals; 4) Medicare HOS information and technical support; 5) HOS Web site; 6) data user’s guides; and 7) HOS conferences. If M+COs and QIOs were familiar with a given strategy, they were asked to indicate how often they had used the strategy, to rate its utility for specified activities, and to rate potential challenges with using the given strategy. Respondent evaluations were designed to reflect experiences over the previous 2 years.

Five-point Likert-like scales were used for the ratings of usefulness and challenges. The scales were equally weighted with two negative and two positive responses. A middle category on the scale served as a point of moderate agreement.

The survey administration was Web-based. Each M+CO and QIO was contacted to confirm the identity of the person with primary responsibility for HOS within the M+CO or QIO and to inform them of the survey. Each contact was e-mailed an invitational letter and the survey in early July. A series of four e-mail reminders was sent over a 3-week period. At least one telephone call was made to each representative who had not completed the survey by the initial deadline, so as to increase survey response.

Survey findings presented in this report provide the actual numbers of respondents, as well as percentages. For questions involving a 5-point Likert ranking, average rankings are presented. For certain analyses, the actual numbers of positive and negative responses relative to total responses are shown. Responses to the open-ended questions were categorized and tallied for frequencies.
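As a rough illustration of the tallying just described, the sketch below summarizes hypothetical 5-point Likert responses into counts, percentages, and an average rating. This is not the evaluation's actual analysis code; the item and response values are invented for the example.

```python
# Illustrative sketch only: summarizing hypothetical 5-point Likert responses
# into counts, percentages, and an average rating, as described in the text.
from collections import Counter

def summarize_item(responses):
    """responses: list of integer ratings 1-5 (1 = not at all useful, 5 = very useful)."""
    counts = Counter(responses)
    n = len(responses)
    return {
        "n": n,
        "counts": {score: counts.get(score, 0) for score in range(1, 6)},
        "percent": {score: round(100.0 * counts.get(score, 0) / n, 2) for score in range(1, 6)},
        "average": round(sum(responses) / n, 2),
    }

# Hypothetical example: 14 respondents rating the usefulness of HOS data
example_responses = [4, 3, 5, 3, 4, 2, 4, 3, 5, 3, 4, 2, 3, 4]
print(summarize_item(example_responses))
```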

QIO and M+CO Focus Groups

Focus group sessions were conducted in three states with significant Medicare managed care enrollment: California, Florida, and New York. These focus groups provided qualitative data from M+CO and QIO representatives that supplement the quantitative survey data and enrich the understanding of the utility of HOS data and the effectiveness of the current CMS strategies used to communicate HOS results. Focus group questions centered on M+COs’ and QIOs’ experiences using HOS results and related tools (see Appendix III for the focus group agenda and questions). Approximately 12 M+CO representatives and 1 or more QIO representatives participated in each group, and each session lasted approximately 2 hours. The QIO for each state assisted in organizing the focus groups.

CMS Expert Interviews

In-person or telephone interviews were conducted with former and current CMS officials as well as with HOS partners and researchers who have used HOS data. Before the interviews, participants were e-mailed or faxed a copy of the interview questions so they could prepare (see Appendices I and III). Interviews with individuals ranged from 1 to 1½ hours, whereas interviews with group partners, including NCQA and HSAG, lasted 2 hours. The interview questions for this portion of the evaluation focused on participants’ experiences with the HOS data and the data’s utility with respect to the participants’ needs.

Results of Utilization Surveys

Utility of HOS for QIOs

Thirty-two of the 43 QIOs that have M+COs in their states completed the survey, for an overall response rate of 74.42%. Responses were received from 9 of the 10 QIOs with plans in markets with the highest managed care penetration, including the QIOs for California, Florida, Massachusetts, and New York. A response was received from at least 2 QIOs in each of the four regions: northeast, south, central, and west. One QIO had not had an active M+CO plan since 1999 and was excluded from the analyses. All remaining respondents had at least one active M+CO at the time of the survey.

Of 31 QIOs, 26 described their roles and responsibilities with respect to the HOS program as providing assistance with using HOS data for quality improvement activities, and 23 as providing assistance to M+COs in interpreting HOS data. Most surveys (20 of 31) were completed in collaboration with additional QIO staff members.

Use of HOS data. QIOs were asked about the extent to which they used HOS data for: 1) quality improvement activities; 2) educational activities; and 3) performance assessment activities. Quality improvement was defined as “activities aimed at enhancing the current quality of health care services and outcomes.” Of 31 QIOs, 14 (45.16%) indicated that they used HOS data for quality improvement activities, 9 of which used HOS data three or more times for this purpose (see Table IV-1).

Table IV-1. QIO data utilization in the previous 2 years (n = 31).

Uses                     % Use   None   1   2   3 or more   Ranking
Quality Improvement      45.16     17   3   2           9      3.47
Education                35.48     20   1   6           4      3.14
Performance Assessment   38.71     19   4   4           4      3.00

Data were used most frequently to identify groups with greater health needs (11 of 14) and identify health conditions of concern (10 of 14) (see Table IV-2).

Table IV-2. QIO uses of HOS data for quality improvement (n = 14).

Uses                                                 Frequency
To identify groups with greater health care needs           11
To identify health conditions of concern                    10
To identify processes needing improvement                    9
To identify low-performing M+CO plans                        3

QIOs using HOS data for quality improvement activities gave the data an average rating of 3.47 on a scale of 1 (not at all useful) to 5 (very useful). As shown in Table IV-3, the list of chronic conditions (4.26) and participant demographics (4.13) were viewed as most useful and the ADL measures (3.13) as least useful.

Table IV-3. QIO usefulness ratings of HOS instrument components (average scores).

Components                                                                            Quality Improvement   Education   Performance Assessment
Chronic Condition List (e.g., numbness or loss of feeling in your feet)                              4.26        3.69                     3.21
Demographics (e.g., race, gender, age)                                                               4.13        3.38                     3.43
Physical Component Summary (PCS score)                                                               3.93        3.21                     2.86
SF-36 Scale Scores (e.g., bodily pain)                                                               3.87        3.36                     2.86
Mental Component Summary (MCS score)                                                                 3.80        3.56                     3.07
Smoking History and Frequency (e.g., ever smoked 100 cigarettes in your life)                        3.67        3.21                     2.93
Self-Reported Health Issues (e.g., physician told you that you have cancer)                          3.67        3.25                     3.14
Activities of Daily Living (ADL) Items (e.g., difficulty bathing, dressing, eating)                  3.13        3.00                     2.79
Overall Usefulness                                                                                   3.47        3.14                     3.00

Education was defined as “activities aimed at providing information or enhancing the understanding of various stakeholders (i.e., providers and beneficiaries) related to HOS results or information.” Eleven QIOs (35.48%) reported using the HOS data at least once for educational purposes; 4 reported using them three or more times for this purpose (see Table IV-1). All 11 of these QIOs reported using the data to educate M+COs; 4 used the HOS data to educate providers, and 3 used the information to educate beneficiaries. Overall usefulness for educational activities (3.14) was somewhat lower than for quality improvement (3.47), although the difference was not large. For education, the list of chronic conditions (3.69) and the MCS score (3.56) were rated as most useful and functional limitations (3.00) as least useful, although the differences in rankings are smaller for this activity (see Table IV-3).

Performance assessment was defined as “comparing and monitoring performance within and across M+CO plans and health care providers.” Twelve QIOs (38.71%) reported using HOS data for performance assessment, with 4 using HOS data three or more times for this purpose (Table IV-1). HOS data were used most often for comparing state performance to national performance (11 of 12 QIOs) and for plan-to-plan comparisons (10 of 12 QIOs). Eight QIOs reported using HOS data to identify M+CO plans needing more attention, whereas six reported using the data to identify plans with outstanding performance (see Table IV-4).

Table IV-4. QIO uses of HOS data for performance assessment (n = 12).

Uses                                                     Frequency
To compare state performance to national performance           11
For plan-to-plan comparison                                     10
To identify M+CO plans needing more assistance                   8
To identify M+CO plans with outstanding performance              6

QIOs rated the HOS data at 3.00 for performance assessment, relative to 3.47 for quality improvement. Demographic data (3.43) and chronic condition data (3.21) were rated as most useful, whereas information on functional limitations (ADL) (2.79) was least useful for performance assessment (Table IV-3).

Desired changes to HOS. As part of the evaluation, QIOs were also asked whether there were items that would be useful to add to the HOS instrument. Through the surveys and the focus group discussions, several additional items were identified, including:

1. Advance directives (e.g., living wills)
2. Several questions related to social support, such as availability of community support
3. Information on living arrangements or setting (e.g., assisted living)
4. Receipt of preventive services
5. Data items related to QAPI projects
6. Additional demographic data

QIOs identified no items as possible measures to be deleted.

Challenges using HOS data. QIO respondents were asked to rate a series of experiences on whether they posed a challenge to the QIO in using HOS data (1 = a major challenge, 5 = not at all a challenge). As shown in Table IV-5, interpreting HOS data was not viewed as a challenge (3.87), and the data being sent to an inappropriate person (3.54) also was not viewed as a significant challenge. The rating for understanding how to improve HOS results (2.86) indicates that QIO respondents saw improving HOS scores as more of a challenge. Challenges resulting from the timeliness24 of the data (2.71) also were identified. QIOs viewed linking HOS data to processes of care (2.52) as the greatest challenge.

Table IV-5. QIO data utilization challenges (n = 31).

Challenges                                                                    Rating
Challenges interpreting HOS data and information                                3.87
Challenges associated with the data being sent to an inappropriate person      3.54
Challenges understanding how to improve HOS results                            2.86
Challenges resulting from the timeliness of the data                           2.71
Challenges linking HOS results to processes of care                            2.52

24 The survey did not include a specific definition of timeliness. However, based on information received during the focus groups, timeliness seems to refer to the time that elapses between the initial baseline survey administration and the receipt of HOS baseline and performance measurement reports and data.

QIOs were asked whether there were additional challenges that affected their willingness and ability to use HOS data. The most frequent response related to low managed care enrollment; of 21 respondents, 9 described this as an issue. For QIOs with some but limited enrollment, a few issues were apparent. First, one of the useful aspects of HOS data for QIOs is the ability to compare plan demographics and performance within market areas, within the state, or nationally (Table IV-4); with few plans and small enrollments, such comparisons are of limited value. Second, unlike CAHPS, there is no fee-for-service equivalent of HOS, so QIOs are unable to compare M+CO beneficiaries with the majority of their beneficiaries who are in traditional Medicare. In the focus groups, QIO and M+CO participants discussed how the small sample size limited their ability to use HOS to focus on quality improvement related to specific chronic conditions.

Dissemination strategies. QIOs were queried about their use of seven reports, documents, and communication tools. Of 31 QIOs, 26 (83.87%) indicated that they were familiar with the HOS baseline reports. Eighteen of these 26 reported using the HOS baseline report at least once, and 6 reported using the reports three or more times (see Table IV-6a).

Table IV-6a. QIO familiarity with and use of HOS baseline reports in previous 2 years.

Familiar With HOS Baseline Reports (n = 31)    Number    Percent
Yes                                                26      83.87

Use of HOS Baseline Reports (n = 26)          Frequency   Percent
0                                                   8       30.77
1                                                   5       19.23
2                                                   7       26.92
3 or more                                           6       23.08

Table IV-6b. QIO HOS baseline reports purpose for use in previous 2 years.

Purposes for Use (n = 18)                                                                      Frequency of Use   Ranking of Use
Provide comparative information on health status indicators and demographic characteristics                 13             3.25
Provide information for the development of quality improvement initiatives                                   9             3.00
Monitor performance within and across plans and providers                                                    9             3.17
Monitor progress of quality improvement activities                                                           7             2.94
Educate a stakeholder                                                                                         4             3.00

Table IV-6c. QIO HOS baseline reports challenges in previous 2 years.

Challenges                                                                    Rating
Challenges with understanding the content of the report                        4.41
Challenges associated with inaccuracies and inconsistencies in the reports     4.04
Challenges associated with the reports being sent to an inappropriate person   3.59
Challenges resulting from the timeliness of the reports                        2.77

HOS baseline reports were used most frequently to provide comparative information on health status indicators and demographics (13 of 18) and least frequently to educate a stakeholder (4 of 18). QIOs were then asked to rate the usefulness of HOS baseline reports related to these activities. The reports (on a scale of 1 to 5) were rated as most useful in providing comparative information (3.25) and least useful in monitoring progress of quality improvement activities (2.94). The absolute difference in ratings is small. One QIO also used the reports in the context of a QAPI project. Overall, QIOs rated HOS baseline reports at 3.24. Eight of 17 respondents viewed HOS baseline reports as somewhat or very useful compared with 4 of 17 who viewed the reports as not at all or not very useful.

Somewhat fewer of the 31 QIOs, 22 (71.00%), were familiar with HOS performance measurement reports. Twelve had used the performance measurement reports one or more times, whereas six had used the reports three or more times (see Table IV-7a). Performance measurement reports were used most often to provide comparative information on health status indicators and demographics (10 of 12 QIOs), to provide information for the development of quality improvement initiatives (9 of 12 QIOs), and to monitor plan performance (8 of 12 QIOs). Performance measurement reports were viewed by QIOs using the reports as most useful for providing comparative information (3.58). QIOs rated the performance measurement reports as 3.69 in their overall usefulness, slightly above the HOS baseline reports.

Challenges in dissemination of reports. The ratings indicate that QIO respondents encountered few challenges that affected their willingness or ability to use the HOS baseline reports (Table IV-6c). The ratings for three of the four items indicate that QIO respondents had little trouble understanding the reports (4.41), few problems with inaccuracies and inconsistencies in the reports (4.04), and few problems with the reports being sent to the wrong person (3.59). The rating of 2.77 indicates that respondents did find the timeliness of the reports somewhat of a challenge.

Reported challenges for performance measurement reports (see Table IV-7c) are similar to baseline reports. The highest rating was for understandability of reports (3.86) and the lowest for timeliness (2.48).

Table IV-7a. QIO familiarity and use of HOS performance measurement reports in previous 2 years.

Familiar With HOS Performance Measurement Reports (n = 31)    Number    Percent
Yes                                                               22      71.00

Use of HOS Performance Measurement Reports (n = 22)          Frequency   Percent
0                                                                 10       45.46
1                                                                  2        9.09
2                                                                  4       18.18
3 or more                                                          6       27.27

Table IV-7b. QIO HOS performance measurement reports purpose for use in previous 2 years.

Purposes for Use (n = 12)                                                                      Frequency of Use   Ranking of Use
Provide comparative information on health status indicators and demographic characteristics                 10             3.58
Provide information for the development of quality improvement initiatives                                   9             3.24
Monitor performance within and across plans and providers                                                    8             3.42
Monitor progress of quality improvement activities                                                           6             3.45
Educate a stakeholder                                                                                         3             3.10

Table IV-7c. QIO HOS performance measurement reports challenges in previous 2 years.

Challenges                                                                    Rating
Challenges with understanding the contents of the reports                      3.86
Challenges associated with inaccuracies and inconsistencies in the reports     3.48
Challenges associated with the reports being sent to an inappropriate person   3.04
Challenges resulting from the timeliness of the reports                        2.48

Table IV-8. QIO Familiarity with tools (n = 31).

Tools                                  Frequency of Familiarity   Any Use in Past 2 Years   Overall Usefulness
HOS Baseline Report                                        27                        18                  3.24
HOS Performance Measurement Report                         22                        12                  3.69
HEDIS Volume 6 Manual                                      16                        10                  4.00
Data User’s Guides                                         14                        12                  4.08
HOS Web site                                               11                        10                  3.60
HOS Conferences                                            11                         0                  3.50
Information and Technical Support                           7                         5                  3.86

Use of other HOS tools. Of the 31 QIOs, 16 were familiar with the HEDIS Volume 6 manual (Table IV-8). Ten QIOs had used the manual one or more times, and four had used it three or more times. Among the QIOs who had used this document, the HEDIS Volume 6 manual was viewed as useful for providing HEDIS background information (4.10), understanding the survey protocol (4.00), and gaining HOS background (3.90) (data not shown). QIOs rated the HEDIS Volume 6 manual as 4.00 in overall usefulness and did not view it as presenting a challenge (3.71–3.92) (data not shown).

Fourteen QIOs (45.16%) were familiar with the data user’s guides and 12 had used the data user’s guides at least once in the past 2 years (Table IV-8). Data user’s guides were used most often to understand file contents (10 of 12 QIOs) and facilitate the use of data files (9 of 12 QIOs). Those QIOs who used the guides rated them a 4.08 in overall usefulness, with the guides viewed as most useful for facilitating use of data files (4.23). Those QIOs using the guides did not report any particular challenges to their use (data not shown).

Eleven QIOs (35.48%) were familiar with the HOS Web site, and 10 had used the Web site (Table IV-8). Four had used the Web site three or more times in the previous 2 years. The Web site was used equally often to obtain updates on HOS development, improve understanding of the HOS, and access HOS publications and reports (6 of 10 QIOs each). The Web site was rated most useful for obtaining reports (4.00) and least useful for gaining access to HOS data (2.44). Those using the HOS Web site rated it a 3.60 in overall usefulness. No challenges to using the Web site were identified (data not shown).

Eleven QIOs (35.48%) were familiar with HOS conferences (Table IV-8); 5 or fewer QIO respondents reported QIO attendance in a given year. HOS conferences were used most often to obtain guidance in the use of HOS data (7 QIOs) and obtain historical HOS information (6 QIOs). Those 8 QIOs rating the overall usefulness of the conferences rated them as 3.50. Conferences were viewed as most useful in providing information on HOS progress (3.88) and gaining HOS history (3.88). The HOS conferences posed no particular challenges for QIOs who were familiar with them (data not shown).

Seven QIOs were familiar with the HOS information and technical support (Table IV-8). These QIOs had used information and technical support most often to gain access to HOS data files (5 of 7 QIOs). Those using HOS information and technical support rated it a 3.86 in overall usefulness, with the information and technical support rated as most useful in gaining access to HOS data files (4.43). Nine QIOs rated the potential challenges to its use, and these respondents did not view the information and technical support as posing particular challenges (3.56-4.00) (data not shown).

QIO/M+CO collaboration. QIOs were then asked a series of questions related to QIO/M+CO collaboration. About two-thirds of QIO respondents (67.70%) perceived that the HOS program facilitates QIO/M+CO collaboration around quality improvement activities. QIOs that did not perceive the HOS program as facilitating collaboration were asked why. Of 8 respondents, 4 indicated that not linking the HOS program to the required QAPI topic hindered collaboration. As explained by one respondent, “. . . the QIO/M+CO QI efforts focus primarily on the annual CMS required QAPI topics, to which the HOS data do not readily apply.”

To facilitate collaboration around quality improvement activities, QIOs provided several suggestions, including:

1. Improving the timeliness of the data (5 of 14 respondents)
2. Building a requirement to use HOS data in the QIO scope of work or linking HOS data to the QAPI project in the scope of work (4 of 14 respondents)
3. Developing “benchmarks” or “best practices” to disseminate to QIOs and M+COs (3 of 14 respondents)

The primary strength of the HOS program from the QIO representatives’ perspectives is its provision of outcomes data that can be used for comparative purposes as well as for surveillance (4 of 13). One QIO representative remarked that HOS “provides great information about the state of health for select groups of enrollees.” As a second QIO representative noted, it “provides a standardized surveillance system for monitoring the health of seniors across many dimensions.”

Of 14 respondents who discussed weaknesses, the lack of benchmarks and difficulties in knowing how to use the HOS data for quality improvement were cited most often (6 of 14). Of 6 respondents who offered suggestions for improvement of the HOS program, 3 identified activities related to best practices. As suggested by one respondent, “Develop templates of ‘if/then’ scenarios for M+COs to follow—this would ease the use of the data in QI efforts.”

Utility of HOS for M+COs

Forty-two individuals representing M+COs completed the survey. As several respondents represented more than one M+CO plan, survey respondents represented a total of 57 M+COs. These M+COs represented 41.30% of all M+COs and approximately 39.10% of Medicare enrollees presently in a managed care plan. Respondents were slightly more often representatives of nonprofit M+COs (45.45%) than nonrespondents (41.10%). Respondents were more often from group model M+COs than nonrespondents (45.45% vs. 31.51%) and less often from Independent Physician Associations (IPA) M+COs than nonrespondents (46.27% vs. 53.42%). Of the 57 represented M+COs, 19 were from the northeast (including New York, Pennsylvania, and New Jersey), 16 from the west (including Arizona, California, Hawaii, and Oregon), 11 from central states, and 11 from the south (including Florida, Georgia, and Alabama). Thirty-six of the 57 M+COs were from areas with the highest Medicare managed care penetration rates, such as Arizona, California, Hawaii, and Oregon. A little more than one third (35.71%) of responses were completed in collaboration with others in the organization.

HOS data. M+COs were asked to evaluate the usefulness of HOS data for activities related to quality improvement, resource utilization, performance assessment, and disease and case management. As previously discussed, quality improvement was defined as “activities aimed at enhancing the current quality of health care services and health outcomes.” Sixteen M+COs (38.10%) reported using HOS data for quality improvement activities. Of these, 6 reported using HOS data three or more times in the previous 2 years for quality improvement (see Table IV-9).

Table IV-9. M+CO data utilization in previous 2 years (n = 42).

Uses                          % Use   None   1    2   3 or more   Ranking
Performance Assessment        52.38     20   4   16           2      3.94
Quality Improvement           38.10     26   4    6           6      2.71
Disease and Case Management   19.05     34   3    1           4      3.07
Resource Utilization          14.29     36   1    4           1      2.43

HOS data were used most frequently to identify health conditions of concern (14 of 16) and groups with greater health care needs (11 of 16) (see Table IV-10).

Table IV-10. M+CO uses of HOS data for quality improvement (n = 16).

Uses                                                Frequency
To identify health conditions of concern                    14
To identify groups with greater health needs                11
To identify processes needing improvement                    9
To identify low-performing health care providers              2

The chronic condition list (3.33) (1 = not at all useful; 5 = very useful) and respondent demographics (3.20) were perceived as most useful, and the information on functional limitations (2.55) as least useful in quality improvement. Overall, M+COs rated the HOS instrument as 2.71 in its usefulness for quality improvement (see Table IV-11).

Table IV-11. M+CO usefulness ratings of HOS instrument components (average scores).

Components                                                                            Quality Improvement   Resource Utilization   Performance Assessment   Disease and Case Management
Chronic Condition List (e.g., numbness or loss of feeling in your feet)                              3.33                   2.62                     3.08                          3.57
Demographics (e.g., race, gender, and age)                                                            3.20                   2.85                     3.36                          3.20
Mental Component Summary (MCS score)                                                                  2.95                   2.54                     3.24                          3.20
Physical Component Summary (PCS score)                                                                2.95                   2.54                     3.32                          3.20
Self-Reported Health Issues (e.g., physician told you that you have cancer)                           2.90                   2.50                     2.75                          3.07
Smoking History and Frequency (e.g., ever smoked 100 cigarettes in your life)                         2.86                   2.46                     2.84                          2.87
SF-36 Scale Scores (e.g., bodily pain)                                                                2.77                   2.38                     3.16                          3.14
Activities of Daily Living (ADL) Items (e.g., difficulty bathing, dressing, eating)                   2.55                   2.17                     2.60                          2.86
Overall Usefulness                                                                                    2.71                   2.43                     3.04                          3.07

Performance assessment was defined as “comparing and monitoring performance within and across M+CO plans and health care providers.” More than half (52.38%) of the M+COs reported using the HOS data for performance assessment activities. Of 22 M+COs, 16 reported using the HOS twice in the past 2 years for this purpose, whereas 2 reported using the HOS data three or more times. All M+COs using the data for performance assessment used them to compare their performance to state and national performance figures, and almost all (20 of 22) used the data for plan-to-plan comparisons. Sixteen plans also used the data to identify areas or processes needing improvement (see Table IV-12).

Table IV-12. M+CO uses of HOS data for performance assessment (n = 22).

Uses                                                                     Frequency
To compare your plan’s performance to state and national performance           22
For plan-to-plan comparison                                                     20
To identify areas or processes needing improvement                              16

Resource utilization was defined as “activities aimed at reviewing the use of resources and services to determine whether the use was appropriate, reasonable, and medically necessary.” Only 6 (14.29%) M+COs reported using the HOS data for resource utilization activities, with 1 using the data three or more times for this purpose (Table IV-9). Ratings ranged from 2.85 for participant demographics to 2.17 for functional limitations. All items were ranked below a moderate rating of 3.00 for this activity, and the overall ranking of usefulness was 2.43 (Table IV-11).

Participant demographics (3.36), PCS score (3.32), and MCS score (3.24) were viewed as most useful for performance assessment and functional limitations (2.60) the least useful. M+COs rated the HOS data at 3.04 in overall usefulness for performance assessment activities (Table IV-11).

Disease and case management was defined as “activities aimed at determining patient needs and providing adequate guidance and coordinated and comprehensive care to effectively manage their illness.” Eight (19.05%) M+COs reported using the HOS data for disease and case management activities, with four reporting use three or more times within the past 2 years (Table IV-9). M+COs using the HOS for this purpose rated it an overall 3.07 in usefulness. The chronic condition list (3.57) was perceived as most useful and the functional limitation information (2.86) as least useful for this purpose (Table IV-11).

Desired changes to HOS. Through the survey and focus groups, M+COs suggested some additional questions to add to the HOS instrument. These included:

1. Questions related to social support and use of preventive services
2. Information related to enrollees’ utilization of different types of services (e.g., emergency rooms, skilled nursing facilities)
3. Medication information, such as the number of medications being taken and the affordability of medications
4. Additional medical condition questions, including obesity
5. Information on physical exercise and activity

Although no items were recommended for removal, 1 M+CO representative suggested replacing the SF-36 with the SF-12.

Challenges to using HOS data. M+COs were asked to rate a series of issues as to whether they presented a challenge to using HOS data (see Table IV-13). The ratings ranged from 2.84 for interpreting HOS data, the least challenging item, to 2.15 for linking HOS data to processes of care, the greatest challenge.

Table IV-13. M+CO data utilization challenges (n = 42).

Challenges                                                                    Rating
Challenges interpreting HOS data and information                                2.84
Challenges associated with the data being sent to an inappropriate person      2.77
Challenges resulting from the timeliness of the data                           2.44
Challenges understanding how to improve HOS results                            2.21
Challenges linking HOS results to processes of care                            2.15

Several additional challenges were identified in an open-ended question directed to the M+COs as well as from the focus group discussions. Of 18 respondents, 6 discussed how the HOS data and reporting cycles are not linked to M+COs’ internal quality improvement activities. This issue also was discussed in the focus groups. As explained more fully by focus group participants, most if not all M+COs conduct risk assessments of newly enrolled members and use these assessments to structure activities that primarily are chronic disease based. Most M+COs, for example, operate disease management programs for members with conditions such as diabetes or congestive heart failure. These diseases have both process and outcome measures of quality that are used to monitor quality improvement. Plans have not particularly seen the utility of a more general measure of health status. As explained by one survey respondent, “We have one of the most progressive disease management departments in the industry, yet we fail to see the link between the HOS survey results and the progress we see with our disease management populations. That disconnect makes HOS virtually worthless to us.” Given the relatively high rankings for HOS’s usefulness in disease management, it is clear that most respondents do not agree.

A related concern is that HOS data do not appear “actionable.” Actionability relates to perceived timeliness (discussed in greater detail elsewhere), the generality of the SF-36 measures, and lack of knowledge about actions to be taken to improve scores. Generality here refers to the SF-36’s measurement of general health status, as opposed to disease-specific measures.

HOS was designed to enable plan-to-plan comparisons, though our focus groups make it clear that some M+COs would like access to other combinations of data. One survey respondent noted, “The major limitation with the HOS is that it reports data at plan or contract level. This is not specific enough to identify where in the organization improvement efforts should be targeted. Given the lack of specificity, our organization uses our own internal data sources. That said, our measures of mental and physical functioning are limited.” For an organization with multiple Medicare contracts within a state or nationally, provision of data by contract number does not facilitate comparison across the organization’s multiple plans, as explained by one focus group participant.

In previous discussion, we reported that most of the minimum detectable effects found so far were quite small, indicating that the survey has adequate power to detect modest differences across plans. Even so, few significant differences in health status performance across plans have been observed, and this lack of difference was discussed by focus group participants and representatives of HSAG (HSAG representatives, personal communication, 2003a). A focus group participant noted that she saw few differences either cross-sectionally or longitudinally between her plan and others in her state, in contrast to other measures, such as HEDIS. The issue of limited performance differences between plans may have been based on the results from Cohort 1, where no plans had PCS scores better than or worse than expected; only MCS score outliers were reported, as discussed previously. As noted, the results from Cohorts 2 and 3 show greater differences in plan performance for both measures.

Although our research shows that current sample sizes may provide adequate power to make plan-to-plan comparisons, some respondents and focus group participants pointed to the small sample size as an issue. In particular, it seems that some plans wished to use the HOS as a measure specific to quality improvement activities focused on a particular condition (e.g., diabetes), and current sample sizes were too small to support such activities.

Dissemination strategies. M+COs were queried regarding seven HOS reports, documents, and communication tools:

1. HOS baseline reports
2. HOS performance measurement reports
3. HEDIS Volume 6 manual
4. HOS information and technical support

5. HOS Web site
6. Data user’s guides
7. HOS conferences

Most M+COs (83.33%) were familiar with the HOS baseline reports (see Table IV-14a). Slightly more than half (54.29%) of those familiar with these reports had used them in the previous 2 years, with 9 reporting using these reports three or more times. All M+COs using the HOS baseline reports used them to provide comparative information on health status indicators and demographics. Ten of 19 M+COs reported using the reports to develop quality improvement activities. M+COs rated the HOS baseline reports as 2.86 in overall usefulness.

Table IV-14a. M+CO familiarity and use of HOS baseline reports in previous 2 years.

Familiar With HOS Baseline Reports (n = 42)    Number    Percent
Yes                                                35      83.33

Use of HOS Baseline Reports (n = 35)          Frequency   Percent
0                                                  16       45.71
1                                                   5       14.29
2                                                   5       14.29
3 or more                                           9       25.71

Table IV-14b. M+CO HOS baseline reports, purpose for use in previous 2 years.

Purposes for Use (n = 19)                                                                      Frequency of Use   Ranking of Use
Provide comparative information on health status indicators and demographic characteristics                 19             3.24
Develop quality improvement programs                                                                        10             2.71
Monitor performance                                                                                          8             2.82
Monitor progress of quality improvement programs                                                             7             2.29
Resource utilization                                                                                         5             2.29
Identify beneficiaries for disease and case management                                                       5             2.33

Table IV-14c. M+CO HOS baseline reports, challenges in previous 2 years.

Challenges                                                                    Rating
Challenges associated with inaccuracies and inconsistencies in the reports     4.03
Challenges associated with the reports being sent to an inappropriate person   3.97
Challenges with understanding the contents of the reports                      3.47
Challenges associated with lack of beneficiary data                            2.56
Challenges resulting from the timeliness of the reports                        2.50

Two-thirds (66.67%) of M+COs were familiar with the HOS performance measurement reports. The majority (67.90%) of those familiar with the reports had used them one or more times in the previous 2 years (see Table IV-15a).

Table IV-15a. M+CO familiarity and use of HOS performance measurement reports in previous 2 years.

Familiar With HOS Performance Measurement Reports (n = 42)    Number    Percent
Yes                                                               28      66.67

Use of HOS Performance Measurement Reports (n = 28)          Frequency   Percent
0                                                                  9       32.1
1                                                                  4       14.3
2                                                                 11       39.3
3 or more                                                          4       14.3

Table IV-15b. M+CO performance measurement reports, purpose for use in previous 2 years.

Purposes for Use (n = 19)                                                                      Frequency of Use   Ranking of Use
Provide comparative information on health status indicators and demographic characteristics                 15             3.14
Monitor performance                                                                                         12             3.00
Develop quality improvement programs                                                                         9             2.67
Monitor progress of quality improvement programs                                                             6             2.33
Resource utilization                                                                                         4             2.25
Identify beneficiaries for disease and case management                                                       3             2.29

Table IV-15c. M+CO performance measurement reports, challenges in previous 2 years.

Challenges                                                                    Rating
Challenges associated with inaccuracies and inconsistencies in the reports     4.12
Challenges associated with the reports being sent to an inappropriate person   3.89
Challenges with understanding the contents of the reports                      3.33
Challenges resulting from the timeliness of the reports                        2.89
Challenges associated with the lack of beneficiary data                        2.74

M+COs used the performance measurement reports most often to provide comparative information on health status indicators (15 of 19) and to monitor performance (12 of 19). Nine M+CO plans used these reports to provide information for the development of quality improvement activities, and 6 used them to monitor the progress of quality improvement activities. M+COs rated the performance measurement reports as 2.83 in overall usefulness, with the reports being most useful for providing comparative information (3.14). The reports were not seen as useful in either developing (2.67) or monitoring (2.33) quality improvement activities.

Challenges to using the HOS reports. With regard to the baseline and performance measurement reports, respondents reported little challenge with inaccuracies, with the reports being sent to the wrong person, or with understanding the contents of the reports. For both types of reports, the lowest ratings, and thus the greatest challenges, were for the timeliness of the reports and the lack of individual-level beneficiary data.

Use of other HOS tools. Most M+COs (71.44%) were familiar with the HEDIS Volume 6 manual (see Table IV-16). Of those familiar with this document, 18 had used it in the previous 2 years, with 8 using it three or more times. M+COs viewed the HEDIS Volume 6 manual as most useful in providing HEDIS (3.85) and HOS (3.75) background information (data not shown), rating it a 3.62 in overall usefulness. M+COs did not rate any of the potential issues with obtaining or using HEDIS Volume 6 as a challenge (data not shown).

Table IV-16. M+CO familiarity with tools (n = 42).

Tools                                  Frequency of Familiarity   Any Use in Past 2 Years   Overall Usefulness
HOS Baseline Reports                                       35                        19                  2.86
HEDIS Volume 6 Manual                                      30                        18                  3.62
HOS Performance Measurement Reports                        28                        19                  2.83
HOS Conferences                                            16                         7                  2.92
HOS Web site                                               15                        10                  3.09
Information and Technical Support                          12                         2                   N/A
Data User’s Guides                                          7                         4                  3.40

Of the 42 M+CO respondents, 16 (38.10%) were familiar with HOS conferences (Table IV-16). Of these, few reported attendance at the conferences in 1998 (3), 1999 (3), or 2002 (7). Conferences were rated as 2.92 in their overall usefulness by M+COs familiar with the conferences. Conferences were viewed as most useful for providing historical HOS information (3.50) but less useful in providing guidance on how to use the HOS data for quality improvement (2.50) or plan monitoring (2.75) activities. There were no clear challenges identified by M+COs to the potential use of HOS conferences (data not shown). However, focus group participants noted time and travel expenses as an issue affecting their use of the conferences.

Fifteen (35.71%) M+COs were familiar with the HOS Web site, 10 of which had used it one or more times in the previous 2 years (Table IV-16). These 10 respondents used the Web site about equally often to access HOS data (7), obtain HOS updates (7), access publications and reports (7), and improve their understanding of HOS (6). The Web site was viewed as most useful in gaining access to the HOS data (3.67) and accessing data user’s tools (3.38) (data not shown). M+COs familiar with the Web site rated it a 3.09 in overall usefulness. M+COs identified no challenges to using the HOS Web site (data not shown).

Twelve (28.56%) M+COs reported knowing about HOS information and technical support; 2 of them reported using this support in the previous 2 years (Table IV-16). Seven (16.67%) M+COs were familiar with the data user’s guides, and 4 had used the guides in the previous 2 years, with all 4 using them to understand file contents (data not shown). Those M+COs using the guides rated them a 3.40 in overall usefulness, with the guides most useful for understanding file contents (3.40). No particular challenges to using the data user’s guides were identified (data not shown).

QIO/M+CO collaboration. Two questions focused on QIO/M+CO collaboration in quality improvement. The majority of M+COs responding (80.00%) did not view the HOS as encouraging QIO/M+CO collaboration. Fourteen respondents offered suggestions to improve collaboration or identified impediments. Four perceived that their QIO did not appear to know how to provide assistance with the HOS data. One M+CO representative indicated that the “QIO doesn’t seem to know much about the HOS and how they can assist us with interpreting the data, identifying areas needing improvement, etc.” Two additional M+CO respondents preferred receiving data directly rather than through the QIO; this preference illustrates a dissemination difficulty, since M+COs do receive data directly from HSAG. Four respondents suggested that providing ideas regarding how to use HOS data would facilitate collaboration; for example, one respondent suggested audio or Web-based conferences on how to use HOS results. Four respondents identified timeliness as hindering collaboration. One respondent explained, “Data timeliness is a huge issue. Reports, when they do arrive, provide very little information that has not already been obtained through other avenues.”

In contrast, one M+CO noted that it reached out to its QIO “because we know that our QIO has access to the most detailed data for all of the plans in our state. . . . the reports they have produced for us have been helpful because they have allowed us to view our scores compared to the rest of the state.”

Sixteen of 42 M+CO respondents identified HOS program strengths. Of these respondents, 8 viewed the greatest strength of the HOS program to be its ability to provide comparative data both cross-sectionally and longitudinally. One respondent noted, “The HOS program provides good data about Medicare member demographics and provides nice comparative data.” This echoes the survey ratings of HOS data elements as well as reports of use previously described. Focus group participants described how they had used HOS data to profile enrollees to identify prevalent conditions. M+COs also used HOS data on member demographics for purposes of member outreach, including the development of outreach materials that were culturally sensitive to their membership.

Nineteen respondents identified HOS program weaknesses. The most frequently cited weakness was the timeliness of the data, identified by 8 respondents. Five respondents perceived the lack of beneficiary-level data as a primary weakness.

Utility of the HOS for CMS

One of the goals of the HOS program is to produce valid and reliable data for CMS to use for Medicare managed care health plan performance monitoring and assessment. Currently, the HOS data are used as part of CMS’s Health Plan Management System (HPMS), a system of health plan process and outcomes performance measures. These measures are composite scores derived from HEDIS, CAHPS, HOS, and disenrollment data, and are used by CMS to rank plans (Malsbary, personal communication, 2003).

To develop a plan composite score, the percentile ranks for selected HEDIS, HOS, CAHPS, and disenrollment indicators are averaged. Those plans deviating substantially from the mean are considered either high or low overall performers based on the national comparison group. Plans in the top 5% are exempt from portions of the CMS biennial site audit. Areas of exemptions include the following: quality and effectiveness of care, access to care, member satisfaction, and possibly appeals processing (Bowen, personal communication, 2004a; Giovanni, personal communication, 2004; Hoogerwerf, personal communication, 2004).
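As a rough illustration of the composite approach described above, the sketch below averages percentile ranks across indicators and flags plans that fall well above the group mean. The indicator names and values are hypothetical, the indicators are assumed to be oriented so that higher is better, and the one-standard-deviation cutoff merely stands in for CMS's actual (unspecified here) definition of deviating substantially from the mean.

```python
# Illustrative sketch only: averaging percentile ranks across indicators to form
# a plan composite score, in the spirit of the HPMS approach described in the text.
# Indicator names, values, and the outlier cutoff are hypothetical.
from statistics import mean, stdev

def percentile_rank(value, all_values):
    """Percent of plans scoring at or below `value` on an indicator."""
    return 100.0 * sum(v <= value for v in all_values) / len(all_values)

def composite_scores(plans):
    """plans: dict of plan_id -> {indicator: value}, higher values assumed better."""
    indicators = list(next(iter(plans.values())).keys())
    by_indicator = {ind: [p[ind] for p in plans.values()] for ind in indicators}
    return {
        plan_id: mean(percentile_rank(values[ind], by_indicator[ind]) for ind in indicators)
        for plan_id, values in plans.items()
    }

# Hypothetical indicator values for three plans
plans = {
    "Plan A": {"hedis": 0.82, "hos": 0.55, "cahps": 0.90, "disenroll": 0.95},
    "Plan B": {"hedis": 0.74, "hos": 0.61, "cahps": 0.85, "disenroll": 0.90},
    "Plan C": {"hedis": 0.65, "hos": 0.48, "cahps": 0.70, "disenroll": 0.80},
}
scores = composite_scores(plans)
cutoff = mean(scores.values()) + stdev(scores.values())  # stand-in for "substantially above the mean"
print(scores)
print({plan_id: score >= cutoff for plan_id, score in scores.items()})
```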

The area of CMS responsible for using the HOS data for plan performance monitoring is the Division of Health Plan Accountability. CMS influences the performance of health plans by: (1) providing an incentive for health plans to provide and maintain quality of care (i.e., the exemption from portions of the biennial site audit) and (2) imposing a penalty, in the form of contract nonrenewal, on health plans with a persistent history of poor performance. Few plans are actually nonrenewed because of performance issues (Hoogerwerf, personal communication, 2004). Further, the reward from CMS in the form of the site audit exemptions is unlikely to be a strong influence on health plan behavior. Staff members from the Division of Health Plan Accountability acknowledge that the CMS penalties and quality improvement/performance incentives are not a strong influence on plan behavior (Giovanni, personal communication, 2004; Hoogerwerf, personal communication, 2004). Division representatives affirm that the HOS provides useful data about the quality of care provided by health plans as measured by the HOS plan-level outcome measure (the percentage of plan beneficiaries with better-than-expected, same, or worse-than-expected changes in PCS and MCS scores) (Giovanni, personal communication, 2004; Hoogerwerf, personal communication, 2004).

Further, the HOS has provided CMS with the ability to use health outcomes data for payment adjustments. CMS is now using the ADL data from the HOS to calculate a frailty adjustor to set payment for Social Health Maintenance Organizations. A variant of the HOS, the PACE Health Survey, is being used to calculate a frailty adjustor for PACE plans and for demonstrations that target Medicare beneficiaries (Tudor, personal communication, 2003).
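As an illustration only, the following sketch counts self-reported ADL limitations and maps the count to a payment adjustment factor, in the spirit of the frailty adjustor described above. The ADL item list reflects the six activities asked about in the HOS, but the factor values are invented placeholders and do not reproduce CMS's actual frailty adjustment methodology.

```python
# Illustrative sketch only: counting ADL limitations from HOS-style responses and
# mapping the count to a hypothetical payment adjustment factor.

ADL_ITEMS = [
    "bathing", "dressing", "eating",
    "getting_in_or_out_of_chairs", "walking", "using_the_toilet",
]

# Hypothetical factors keyed by the number of ADLs with reported difficulty
FRAILTY_FACTORS = {0: 0.00, 1: 0.10, 2: 0.10, 3: 0.20, 4: 0.20, 5: 0.30, 6: 0.30}

def adl_count(response):
    """response: dict of ADL item -> True if the beneficiary reports difficulty."""
    return sum(bool(response.get(item)) for item in ADL_ITEMS)

def frailty_factor(response):
    """Return the (hypothetical) adjustment factor for one beneficiary's responses."""
    return FRAILTY_FACTORS[adl_count(response)]

# Hypothetical beneficiary reporting difficulty with bathing and walking
print(frailty_factor({"bathing": True, "walking": True}))  # -> 0.1
```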

Although CMS is using the HOS data for performance monitoring and assessment, CMS is not using them to encourage quality of care through market pressure brought to bear by publicly reporting the HOS data. CMS has developed and rigorously tested the module to publicly report the HOS data but has yet to implement the tool. According to Paul (personal communication, 2003), at the time data were initially available for public reporting, M+COs did not see value added by reporting the HOS data in addition to the already available HEDIS and CAHPS data. Paul also noted that the HOS data did not differentiate many M+COs as either high or low in performance relative to those in the middle of the distribution. Finally, even if differences in plan performance were captured by the HOS, it was not clear that these differences could be translated to consumers.

The HOS has been used as a vehicle to explore emerging and geriatric quality-of-care–related health issues and concerns. The HOS instrument has included questions about health services utilization, retirement community living, smoking frequency and cessation, management of urinary incontinence, and healthy days (NCQA, 2003b). Management of urinary incontinence, based on questions used in the HOS, was added in 2003 as a new HEDIS measure (NCQA, 2003b). The Healthy Days questions from the Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System (BRFSS) were added to allow a link between the HOS and BRFSS results, facilitating comparison across federal initiatives.

In addition, CMS has used the HOS data for research purposes to gain an understanding of the health status of dual eligible (Medicare and Medicaid) beneficiaries and beneficiaries under 65 years of age who are disabled. The HOS data also have been used to examine the quality of mental health care provided to beneficiaries (Bowen, personal communication, 2004a; Harris, personal communication, 2004; Lied, personal communication, 2004). Lastly, the HOS data have contributed to the development of a database of measures of functional status and health outcomes of disadvantaged populations. CMS also plans to establish a chronic disease database of health-related data by chronic diseases, as mandated by the Medicare Modernization Act of 2003, using the HOS as one of the data sources (Bowen, personal communication, 2004a).

Utility of the HOS for Health Services Researchers

The HOS data have been used in intramural research. Technical research reports have been made available on the Medicare HOS Web site, covering topics such as how performance measurement results are calculated and the health status of younger individuals with disabilities enrolled in M+CO plans. These reports are available at www.cms.hhs.gov/surveys/hos. The HOS Web site also includes a complete list of HOS-related publications appearing in scientific journals.

Researchers have used the HOS data to examine the functional status of chronically ill Medicare managed care enrollees and to assess the need for disease management programs. One study reviewed the disease management demonstration projects sponsored by CMS and discussed how such interventions may help to enhance the functional status of chronically ill Medicare beneficiaries over time. An examination of change in SF-36 scores over time found that the presence of chronic disease negatively affects both the physical and mental health of enrollees over a 2-year period (Haffer et al., 2003). Results from the HOS data indicate that opportunities exist to enhance health outcomes in this population.

Researchers who have used the HOS data find the data to be extremely useful and accessible (Bierman, personal communication, 2003; Harris, personal communication, 2004; Kazis, personal communication, 2003; Lied, personal communication, 2004). According to Kazis (personal communication, 2003), the HOS data allow evaluation of health care quality and health outcomes in a way that is not available anywhere else for the Medicare managed care population. Although producing health outcomes data for research was not an initial goal or objective of the HOS program, it has become a very positive, unintended outcome of the program.

Discussion of How the HOS Data Are Currently Used

The HOS program goals state that if the HOS data are to be used in quality improvement activities, they must be useful to those program users (QIOs and M+COs) who have the primary role in delivering health care. The QIO and M+CO surveys and focus groups in this component of the evaluation study focused on issues related to the use of the HOS data in quality improvement and performance assessment activities. Of the 31 QIO respondents with active M+CO plans, 45.17% reported using HOS data for quality improvement activities, and 38.71% reported using HOS data for performance assessment. QIO HOS data users found various HOS components particularly useful for quality improvement and education and less so for performance assessment. QIO respondents had little problem with interpreting the HOS data and information or receiving the reports, but did report challenges linking the HOS data to processes of care and challenges associated with the timeliness of the data.

Surveys were sent to all M+COs currently serving Medicare beneficiaries. Forty-two M+CO responses were received, representing approximately 40% of Medicare M+CO plans and enrollment. Of the 42 respondents, 52.38% reported using the HOS data for performance assessment, and 38.10% reported using the data for quality improvement initiatives. M+CO HOS data users found various HOS components particularly useful for disease management and performance assessment and less so for resource utilization and quality improvement. Similar to QIO respondents, M+CO respondents found less challenge with interpreting the HOS data and information or receiving the reports than they did with timeliness, how to improve HOS results, and linking HOS data to processes of care.

An interesting dichotomy resulted when QIO and M+CO respondents were asked whether they saw the HOS program as encouraging collaboration between them. About two-thirds of the QIO respondents agreed, but only 20% of the M+CO respondents agreed. Many M+CO respondents did not seem to view QIOs as knowledgeable about either M+COs or the HOS.

QIO respondents also gave feedback on their use and perceptions of the various products of the HOS program: baseline reports, performance measurement reports, the HEDIS Volume 6 manual, HOS conferences, and so forth. QIOs used or were aware of the various products, with the greatest awareness of the baseline reports (83.87%) and the least awareness of the HOS conferences and Web site (35.48%). The average rating of these reports was positive (above 3.0 on a 1–5 scale). QIO respondents reported the least challenge with the accuracy of the reports, the correct person receiving the reports, and understanding the reports, and reported somewhat more challenge with the timeliness of the reports and linking HOS data to processes of care. QIOs receive both baseline and follow-up data from CMS about a year after collection.

In addition to the use of the data themselves, we also examined M+CO use of the various products of the HOS program: baseline reports, performance measurement reports, the HEDIS Volume 6 manual, HOS conferences, and so forth. M+COs either used or were aware of the various products, with the greatest awareness of the baseline reports (88.33%) and the least awareness of the HOS Web site (35.17%). On average, these reports were positively rated (above 3.0 on a 1–5 scale). M+CO respondents had the least challenge with the accuracy of the reports, the correct person receiving the reports, and understanding the reports, and somewhat more challenge with the timeliness of the reports and the lack of beneficiary-level data. Note, however, that M+COs receive beneficiary-level data at the end of each cohort.

The HOS provides QIOs and health plans with data that they may use to identify opportunities to improve the quality of care provided to their Medicare populations. In addition, CMS uses the HOS as a source of data for performance monitoring, plan accountability, and informing policy decisions. Further, researchers have a rich source of health outcomes data to use in a variety of gerontology studies aimed at improving the quality of care and health outcomes of this large and growing segment of our population.

CMS’s QAPI program represents another opportunity where HOS results could be used for quality improvement activities. The QAPI program ensures that Medicare managed care health plans emphasize quality assurance initiatives and actively create programs that enhance the quality of care they provide. Through the QAPI program, health plans provide evidence that their organization has a continuous quality assessment and improvement program. Additionally, ongoing evaluations are conducted on the program to ensure that it is effective and that necessary modifications are made when appropriate. QAPI projects have included the National Diabetes Project, the National Pneumonia Project, the National Project on Congestive Heart Failure, and the National Breast Cancer Screening Project. The most recent projects have included the Clinical Health Care Disparities and Culturally and Linguistically Appropriate Services in 2003 and a return to Diabetes in 2004 (www.cms.hhs.gov/healthplans/quality). Although health plans are required under the QAPI program to engage in quality improvement, they currently are not required to use HOS data (or any other specific data set) in quality improvement projects, nor are they required to pursue improvements in health outcomes. Rather, Chapter 5 of the Medicare Managed Care Manual (www.cms.hhs.gov/manuals/116_mmc/mc86c05.pdf) outlines general requirements for the types of data (population information, performance measures, enrollee satisfaction data, utilization) to be used in developing, implementing, and evaluating QAPI projects, without mandating any particular data source.

HOS data also have been used in research. Researchers have used HOS data for projects related to disease management and for exploring opportunities to enhance health outcomes. The HOS also has been used as a vehicle to explore emerging and geriatric health quality issues and concerns.

V. PROGRAM ALTERNATIVES

Identification of alternatives for the HOS program is a natural outgrowth of the information developed for this evaluation study. The first step in the assessment of alternatives was to evaluate the extent to which the key program outcomes are currently being met. This process entailed a systematic review of the literature; statistical analyses of reliability and validity, attrition, and power and precision on data from Cohorts 1–5; expert interviews with stakeholders; and, to a lesser extent, preliminary results of QIO and health plan focus groups and surveys. The results of this process are reported in previous sections. Based on the results of this assessment, several of the most promising alternatives were identified a priori.

These alternatives were organized into the following categories: increasing the effective sample size; shortening the HOS instrument; adding process measures to the HOS instrument; and increasing the impetus for health plans to use HOS data for quality improvement activities and monitoring health outcomes. We then engaged in a screening process to limit the set of alternatives to be evaluated in-depth. The screening process entailed assessing whether the alternative positively affects a key program outcome and conducting an intuitive assessment of the extent to which the alternative affects the various criteria in the evaluation logic model presented earlier.

The in-depth evaluation of each alternative entailed a qualitative, literature-based assessment of the impact of that alternative on the costs to respondents, health plans, and CMS. Each alternative presented is assessed qualitatively on the basis of the literature and, where possible, on analysis of the existing HOS data. The likely effects of each alternative on baseline survey response rates, attrition, missing data, statistical power, reliability, validity, and QIOs’ and health plans’ understanding of and familiarity with HOS data for quality and health outcomes improvement, as well as their willingness to engage in quality improvement activities, are based on this assessment. The results of this assessment are discussed here.

Increasing the Effective Sample Size

The effective sample size is an important determinant of statistical power, although it is not the only determinant (Lipsey, 1990). By effective sample size, we mean the number of observations that can actually be used in analyzing plan performance. Thus, the effective sample size depends on the size of the baseline sample, the baseline response rate, the attrition rate between the baseline and follow-up surveys, and the patterns of missing data.
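As a rough illustration (the rates below are hypothetical and are not estimates from the HOS cohorts analyzed in this study), the relationship can be written as

n_eff ≈ n_baseline × r_response × (1 − a_attrition) × (1 − m_missing)

so that, for example, a plan fielding 1,000 baseline surveys with a 65% baseline response rate, 30% attrition between baseline and follow-up, and 10% of follow-up responses unusable because of missing data would contribute roughly 1,000 × 0.65 × 0.70 × 0.90 ≈ 410 analyzable observations.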

Increasing the effective sample size is a means of increasing statistical power and thereby detecting differences in plan performance in terms of health outcomes that might exist. The most straightforward way to increase the effective sample size is to increase the baseline sample size. However, the extent to which baseline sample size can be increased is constrained by plan size. The HOS currently is administered to a random sample of 1000 Medicare beneficiaries from each health plan at baseline. In health plans with fewer than 1000 beneficiaries, all eligible beneficiaries are surveyed. Our research on the first five cohorts of the HOS found that between 6.5% and 10.0% of all health plans have fewer than 1000 beneficiaries. Moreover, increasing the sample size also would increase the cost to health plans because they pay an NCQA-certified vendor for each Medicare beneficiary surveyed. Results of our focus groups and M+CO surveys suggest that increased cost to the health plans may decrease the extent to which they engage in quality improvement activities.
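To make the power implications of these choices concrete, the sketch below (a minimal illustration only; the effect size, standard deviation, and function shown are hypothetical and are not part of the HOS methodology) computes the number of completed follow-up surveys needed per group to detect a given difference in mean score change, using the standard two-sample normal approximation:

# Hypothetical power sketch: completed surveys per group needed to detect a
# difference "delta" in mean score change, assuming a common SD "sigma".
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2

# e.g., a 2-point difference in mean PCS change with an SD of 10 points
print(round(n_per_group(delta=2.0, sigma=10.0)))  # about 392 per group

Read against the illustrative calculation above, a plan that retains roughly 400 usable follow-up observations sits near the margin for an effect of this size, which is one reason the response-rate and attrition levers discussed below matter as much as the baseline sample itself.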

The costs to survey respondents associated with increasing the baseline sample size are, at the individual level, negligible. However, the cost to the broader Medicare managed care population, in terms of survey burden (i.e., the number and frequency of surveys administered to them), would be greater. This issue of survey burden led CMS to study the feasibility of integrating the HOS and CAHPS programs as a means to reduce survey burden for the Medicare managed care population and to enhance cost efficiency. The results of this study are not available at this time.

The effective sample size also could be increased by increasing the baseline response rate or decreasing attrition. Increasing the effective sample size in this manner can limit potential problems due to non-random non-response or attrition. In addition, compared with increasing the baseline sample size, alternatives that address increasing the baseline response rate or decreasing attrition are less likely to impose additional costs on health plans, although they may impose additional costs on CMS. Of course, many causes of attrition, namely death and voluntary and involuntary disenrollment, are likely to be beyond the control of CMS. Beyond this, not enough is known about the reasons that beneficiaries do not respond at baseline or follow-up to identify possible alternatives for increasing the baseline response rate and decreasing attrition.

As reflected in our logic model, reducing respondent burden will influence baseline response rates, attrition, and missing data rates positively. These, in turn, should lead to an improved ability of the HOS to detect differences in a health plan’s performance based on health outcomes. In addition, decreasing respondent burden is important in its own right.

Shortening the HOS Instrument

Shortening the HOS instrument is an alternative primarily aimed at reducing respondent burden. Respondent burden refers to the length of time required to complete a survey, the difficulty of completion, and overall survey acceptability (McHorney, 1996). According to Neuman (1994), a mail survey of about 100 questions would be considered long for a general population survey. The HOS has, on average, 98 questions and is administered to a population that is older and sicker than the general population. Mail surveys are most appropriate if their length is moderate rather than long (Neuman, 1994). However, as noted by Dillman (2000), simply reducing the number of questions does not automatically translate into improved response and missing data rates. Issues of question relevance, clarity, and invasiveness are also important.

The items in the HOS instrument consist of three primary components: (1) the SF-36; (2) case-mix and risk-adjustment questions; and (3) demographic and other questions required by the Balanced Budget Act of 1997 (NCQA, 2003b). Shortening the instrument would necessitate modifying one or more of these components of the current instrument.

In the fall of 2002, HAL and QM presented the HOS Technical Expert Panel with two analytic reports addressing the issues of modifying the HOS instrument and strategies for shortening the HOS instrument. In general, we agree with the findings in these two reports (HAL & QM, 2002a, 2002b). However, we do not fully agree with the recommendations made for reducing the length of the HOS instrument. We have two points of divergence.

HAL and QM recommended replacing the SF-36 with the SF-12v2®. On psychometric grounds, we take no exception to this recommendation. However, the SF-12v2 instrument and scoring algorithms are proprietary (owned by QM). Adopting the SF-12v2 for the HOS can be expected to increase the cost of the program to CMS as well as to health plans.25 Although we are unable to provide a precise estimate of the cost increase, we believe that this increase in program costs would not be trivial. Based on the literature, the VA SF-12 has acceptable psychometric characteristics. Because this instrument is in the public rather than the private domain, the costs of using the VA SF-12 and its associated scoring algorithms are likely to be lower than those of using the SF-12v2. However, adopting the VA SF-12 still will result in transition costs to CMS, such as training costs and the cost of redrafting the HEDIS Volume 6 manual.

HAL and QM also recommended that CMS give consideration to eliminating 33 questions that were not used in the case-mix analysis, have not been used to develop severity classifications for specific chronic conditions, or are not used in the HOS plan reports, as well as items that have high “none of the time” response rates. We suggest that in addition to these characteristics, CMS consider eliminating questions that do not change over time and are already available to CMS, such as age, race, and gender questions (Q48-Q51). In addition, CMS should give consideration to eliminating questions that are believed to have low utility to QIOs and health plans based on our evaluation survey and focus group results. Thoughtful consideration must be given to balancing survey length and the data needs that the HOS program is intended to meet.

Adding Process Measures to the HOS Instrument

Understanding the manner in which process and outcome measures interact provides a basis for the development of delivery systems that lead to better patient outcomes (O’Leary, 1998). Linking outcomes and process measures is widely viewed as an important mechanism for improving the quality of care (e.g., see Chassin, 1997; Halfon et al., 1999; Leichter and Tryens, 2002; Sheingold and Lied, 2001).

25 CMS is likely to have increased costs related to license/use fees for the instrument and charges for scoring the data. Additionally, health plans may have increased costs if the cost of the HEDIS Volume 6 manual is increased due to the change to the SF-12v2.

Evidence from expert interviews, QIO and health plan focus groups, and HOS-user surveys indicates that a number of factors limit the use of HOS results by QIOs and health plans for quality improvement purposes. These include: (1) limited incentives to use HOS data; (2) lack of pressure (governmental or market) to use HOS data to improve health outcomes; (3) lack of understanding of how to use HOS data for quality improvement; and (4) HOS results that are not timely for quality improvement purposes.

The results of our expert interviews, focus groups, and QIO and health plan surveys suggest that several stakeholders believe that adding process measures would benefit the HOS. These stakeholders argue that the most significant challenge to using HOS data is that of linking the data to processes of care. According to feedback received, the addition of process measures would improve a health plan’s understanding of what is driving HOS results and what the plan can do to improve beneficiary outcomes.

Mechanisms to link the HOS to process measures were suggested as a way for plans to understand more fully what health plan practices may be driving PCS and MCS scores (HSAG, personal communication, 2003; NCQA, personal communication, 2003; Ware, personal communication, 2003). Interviewees contend that it is necessary to “devise mechanisms for clinicians and plans to learn to use the data . . . to use this type of data in real time to help people” (HAL, personal communication, 2003). Jencks (personal communication, 2003) also noted the importance of using the data to target improvement efforts within health plans.

A number of process measures seem particularly appropriate to consider. These include annual flu shots, depression screening, pain management, falls and injury prevention education, and nutrition and exercise programs. The primary benefit of adding process measures to the HOS is increasing the extent to which plans engage in quality improvement activities. The addition of process measures would also have several negative consequences. In particular, more process measures would increase respondent burden, which would, in turn, reduce baseline response rates as well as increase attrition and missing data, thereby reducing power and the ability to detect high- and low-performing plans. CMS may incur increased costs to have NCQA develop a training protocol for the new instrument, as well as production costs associated with developing a revised HEDIS Volume 6 manual. Costs to health plans as a result of this alternative would depend on how much the survey was lengthened and would come from higher survey vendor charges for administering the revised survey.

Increasing the Impetus for QIOs and Health Plans to Use HOS Data for Quality Improvement

Several strategies are available to address the first two issues. These strategies include: (1) requiring the use of HOS data in QIO and health plan contracts; (2) using HOS results for payment adjustments (reward health plans for achieving better-than-expected health outcomes); and (3) publicly reporting HOS health outcomes results.

Medicare beneficiaries would, in principle, use publicly reported HOS data to inform their enrollment decisions. To the extent that the data are valid signals of health plan performance and there is an adequate level of competition in the market, this would directly increase health plan accountability. In addition, it would increase the extent to which health plans are likely to engage in quality and health outcomes improvement activities. However, unintended consequences of publicly reporting HOS results may include consumer confusion and data overload, misrepresentation of health plan performance, and a reduction in the number of health plans participating in the Medicare managed care market.

Most of the empirical evidence regarding the effects of publicly reporting health care performance data is based on hospital data (Marshall et al., 2000a). Moreover, the limited evidence that exists regarding the public reporting of health plan performance data (Gabel et al., 1998; Hibbard et al., 1997) addresses the effects of such reporting on employers rather than on consumers. The existing evidence on the effects of publicly reporting hospital performance “suggests that the information has only a limited impact on consumer decision making” (Marshall et al., 2000a, p. 1867). Among the reasons for this is the difficulty in understanding the data (Marshall et al., 2000a, p. 1867). Understanding plan performance data is likely to be even more challenging to consumers than is understanding hospital mortality data, suggesting that publicly reporting HOS health outcomes results would be less likely to influence consumer choice than publicly reporting hospital mortality data. Curiously, however, despite the apparent lack of influence of publicly reporting health care performance on consumer behavior, public reporting appears to result in improved outcomes through some other undetermined mechanism (Marshall et al., 2000b, p. 16). Moreover, public reporting of HOS data need not be directed only at individual consumers. Among the other possible audiences are consumer advocates and interest groups (Marshall et al., 2000a, p. 1873).

Given that CMS has already developed the modules and data fields for reporting HOS data and the work it has done in the area of payment adjustments, key aspects of this alternative could be implemented without significant initial cost to CMS. However, the costs to health plans are likely to increase but will vary across plans depending on the plans’ initial level of resources committed to quality and health outcomes improvement activities. On the other hand, the costs to Medicare beneficiaries (respondents), in terms of time and money, are likely to be negligible.

Discussion of Alternatives

Based on the evaluation logic model, literature review, and interviews, a number of alternatives merit consideration. First, various techniques could be used to increase the effective sample size. The primary positive outcome of this change would be an increase in the power of the instrument. The primary drawbacks to increasing sample size are cost and the lack of clarity about the most effective way to increase the effective sample size.

Second, to reduce respondent burden and perhaps to increase response rates, the HOS instrument could be shortened. Shortening the instrument could be done in a number of ways, including moving to a shorter version of the SF-36 or eliminating "unused" questions. The primary benefits of shortening the instrument are lowering respondent burden and reducing administration costs associated with survey length (printing and shipping). The primary drawbacks are the costs that might be associated with using a proprietary short form such as the SF-12v2 (though a nonproprietary alternative, the VA SF-12, is available), loss of comparable data caused by shifting to a new form, and loss of items that researchers use.

Third, process measures could be added to the HOS instrument. The primary benefit of doing so is the ability to link health status changes to process measures. The primary drawbacks would be an increase in the cost to administer the survey and an increase in respondent burden.

Fourth is to increase the impetus for QIOs and M+COs to use HOS data for quality improvement. This alternative could be pursued in a number of ways, from including specific tasks related to the use of HOS data in contracts (with both QIOs and M+COs) to reporting the data publicly. The primary benefit would be that HOS data would be more likely to be used as originally intended. The primary drawbacks would be the costs to implement contract changes and the resistance that occurs whenever data that could be used to judge a provider are released to the public.

VI. SUMMARY AND FUTURE DIRECTIONS

Summary

“The goal of the HOS program has been to gather valid, reliable, and clinically meaningful data that are used by:

1. M+COs, providers, and QIOs to monitor and improve health care quality.
2. CMS to assess the performance of M+COs and reward high performers.
3. Medicare beneficiaries, their families, and advocates when making health care purchasing decisions.
4. Health researchers to advance the state-of-the-science in functional health outcomes measurement, and quality improvement interventions and strategies” (Haffer and Bowen, 2004).

In light of these goals, it has been the objective of this evaluation to determine whether the goals are being met and how the HOS program could be improved to better meet them. In the first component of this study, we describe how the HOS instrument was developed during a period when policy makers and researchers were focused on increasing enrollment in managed care and found that care quality was less than adequate. This was coupled with a growing view that understanding health quality meant measuring not just the processes of care but also the outcomes of care. As a reaction to and outgrowth of the best thinking about health care quality in the early and mid-1990s, the HOS was introduced as a novel approach to measuring and improving health care quality.

In this evaluation study, we examine whether CMS’s approach has been effective and make recommendations about how best to improve the HOS program. Overall, despite some minor issues, we find that the HOS program is meeting most, if not all, of its stated goals. Our research found (as many others have already done) that the HOS measure is valid and reliable and so meets that basic goal. We also find that the vast majority of attrition seems to be caused by random factors. Random attrition does not cause systematic biases that may damage an instrument's usefulness. As to power, we found that HOS has adequate power to detect small effects at the health plan level, although whether this is sufficient is a matter that cannot be determined on analytical grounds alone.

Further, we found that M+COs, providers, QIOs, CMS, and health science researchers are using the HOS in the manner in which it was intended. CMS is using the data to reward high performers. QIOs and M+COs are using the data to monitor and improve health quality. Health researchers are beginning to advance the state-of-the-science in functional health outcomes measurement. HOS products (reports, in particular) are well understood, believed to be accurate, and sent to the correct individuals (no small task in large organizations). Awareness of some HOS products, such as the Web site, is lower than it should be, which suggests that efforts should be made to increase awareness of the HOS "brand" and its products.

Future Directions

The HOS program was developed in an era of great change in the health care system as a way to measure health care quality. To continue to meet its goals, it is essential that the HOS continues to evolve with the health care system.

One key alternative to consider is to increase the effective sample size of the HOS by some combination of increasing the baseline sample size, increasing the baseline response rate, reducing the attrition rate between the baseline and follow-up surveys, and reducing the amount of missing data. Whether this would provide meaningful benefits depends in large part on the size of the actual plan effects that it is deemed important to detect—an issue that cannot be determined on analytical grounds alone.
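One way to frame this dependence, following the minimum detectable effect logic of Bloom (1995), is that for a comparison of two groups with effective sizes n1 and n2 and a common standard deviation σ (the symbols here are generic; no HOS-specific values are implied),

MDE ≈ (z_(1−α/2) + z_(1−β)) × σ × sqrt(1/n1 + 1/n2)

so that, holding the significance level and power fixed, doubling both effective sample sizes shrinks the smallest reliably detectable plan effect only by a factor of about 1/√2 (roughly 0.71). Whether that gain is worth the cost therefore turns on how small a plan effect CMS judges to be substantively important.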

Shortening the instrument in some manner might be more fruitful. Shifting to a shorter version of the SF-36 represents quite a radical (and perhaps expensive) departure for an instrument that already has seven years of data behind it. A first step in that direction might be an examination of the HOS measure itself, seeking items that are not used in the SF-36 or case-mix, items with low response rates, and items with low variability.
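As a rough sketch of what such an item screen could look like (the column names, threshold values, and scored-item list are hypothetical; an actual screen would be run against the HOS analytic files and the SF-36 and case-mix scoring specifications):

import pandas as pd

def screen_items(responses: pd.DataFrame, scored_items: set,
                 max_missing=0.20, min_variance=0.05) -> pd.DataFrame:
    """Flag candidate items for removal: not used in SF-36 or case-mix scoring,
    frequently skipped, or showing little variability. `responses` holds one
    row per respondent and one numerically coded column per survey item."""
    summary = pd.DataFrame({
        "missing_rate": responses.isna().mean(),
        "variance": responses.var(skipna=True),
    })
    summary["used_in_scoring"] = summary.index.isin(list(scored_items))
    summary["removal_candidate"] = (
        ~summary["used_in_scoring"]
        | (summary["missing_rate"] > max_missing)
        | (summary["variance"] < min_variance)
    )
    return summary.sort_values("removal_candidate", ascending=False)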

Adding process measures to the HOS may be another useful enhancement to the survey. Linking outcomes and process measures is widely viewed as an important mechanism for improving the quality of care (e.g., see Chassin, 1997; Halfon et al., 1999; Leichter and Tryens, 2002; Sheingold and Lied, 2001). Both QIOs and M+COs stress the difficulty of linking the HOS data or results with process measures. A primary focus in quality improvement of M+COs in particular is on enrollees with chronic health conditions such as diabetes. Enrollees with these conditions are identified through a risk assessment, typically at the time of enrollment. These enrollees are managed through disease or case management programs. Disease-specific process and outcome measures can be used to monitor the performance of these programs.

The HOS, in part, represented a shift away from process measures, but such measures have obvious value and continue to be used. However, adding such measures to the instrument itself should be done in concert with shortening the instrument to minimize respondent burden and administrative cost. Any process measures considered for the HOS instrument should be those that affect the largest number of Medicare beneficiaries and for which better understanding provides the greatest benefit. Another possibility is to develop a scoring model that uses outcome measures from the HOS and process measures from another source, such as Medicare HEDIS.
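A minimal sketch of what such a combined scoring model might look like (the measure names, weights, and z-score standardization are hypothetical illustrations, not a prescribed CMS or NCQA method):

import pandas as pd

def composite_plan_score(plans: pd.DataFrame,
                         outcome_cols=("pcs_change", "mcs_change"),
                         process_cols=("flu_shot_rate", "depression_screening_rate"),
                         outcome_weight=0.5) -> pd.Series:
    """Blend HOS outcome measures with HEDIS-style process rates into a single
    plan-level score. Each measure is z-scored across plans so that measures on
    different scales contribute comparably; the two domains are then weighted."""
    cols = list(outcome_cols) + list(process_cols)
    z = (plans[cols] - plans[cols].mean()) / plans[cols].std(ddof=0)
    outcome = z[list(outcome_cols)].mean(axis=1)
    process = z[list(process_cols)].mean(axis=1)
    return outcome_weight * outcome + (1 - outcome_weight) * process

Other inputs, such as expected-versus-actual outcome comparisons of the kind already reported for HOS performance measurement, could be substituted into the same structure without changing its basic form.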

Another alternative we suggest is to find new ways to encourage QIOs and M+COs both to use HOS data and work together. A valid and reliable measure such as the HOS is ideal for detecting and guiding the transformative change that is expected of the QIOs in their eighth scope of work. Including specific incentives in QIO and M+CO contracts seems a natural way to encourage the use of HOS data. Additionally, QIOs and M+COs need to have a common understanding regarding how best to implement interventions that may improve HOS scores.

Data regarding health quality in nursing homes, home health agencies, and hospitals have been publicly released in the past 18 months, with more expected to come. Since CMS already has a tested, ready-to-deploy Web module for releasing HOS data, doing so may be another way to encourage both QIOs and M+COs to use the HOS. Releasing the information to the public would certainly address the one goal that remains largely unmet: providing HOS information directly to beneficiaries and their families so that they have an additional source to help inform their health care purchasing decisions. However, any release of information must be weighed against the fact that little evidence exists that this type of information is actually used by beneficiaries in making decisions (Marshall et al., 2000a). In addition, the complexity of the information may add to beneficiary confusion.

Related to the issue of M+COs and QIOs working together is the specific challenge of understanding how to improve HOS results. This challenge may relate in part to the previously discussed lack of process measures. As discussed, M+CO quality improvement activities may include outcome measures, but these typically are disease-specific measures rather than general health measures. Mechanisms are needed to link both the disease-specific measures in the HOS and the PCS and MCS scores (which are general health measures) to what M+COs currently are doing. Ware and colleagues (2004) have released an SF-36 primer related to measuring and improving health outcomes. Three chapters provide information on the Medicare HOS, the SF-36, and HOS normative data. A fourth chapter summarizes six studies in which the relationship between health care and SF-36 scores is presented. Work of this type may begin to address issues associated with the challenge of understanding how to improve HOS results.

If the production timeline can be shortened without great expense, doing so would certainly be useful, though this largely seems to be a perceptual issue in which expectations need to be managed. The actual length of time is less than that of other national surveys, such as the General Social Survey, which does not have the scope of the HOS, but is more than that of NCQA's CAHPS, which does not have the scoring complexity of the HOS. It seems likely that the belief that the HOS takes too long arises because the majority of QIOs and M+COs are unaware of information to the contrary, such as that available on the Web site, and have little experience with a major survey effort such as the HOS. Efforts should be made to manage these expectations, which arise from a lack of information.

Finally, a number of M+COs would like the HOS to have a sample size large enough to drill down to specific groups, such as persons with diabetes (NCQA, personal communication, 2004). One option would be to allow more flexibility with regard to sample size, as NCQA's CAHPS does. In that program, if NCQA approves, sample sizes can be increased (at M+CO expense) to address specific research issues. This certainly adds management complexity, but not insurmountably so, and it would address M+COs' concerns.

APPENDIX I List of Interviewees

Catherine Gordon
Ron Lambert
Barbara Gandek
Kristin Spector

Boston University School of Public Health—Veterans Administration Medical Center
Lewis Kazis

Centers for Medicare & Medicaid Services (Current and Former Staff)
Sonya Bowen
Steven Clauser
Chris Eisenberg
Michelle Giovanni
Elizabeth Goldstein
Samuel C. “Chris” Haffer
Yael Harris
Tim Hoogerwerf
Stephen Jencks
Jeffrey Kang
Terry Lied
Richard Malsbary
Barbara Paul
Cynthia Tudor

Health Assessment Lab

Health Services Advisory Group (Current and Former Staff)
Randall Adams
David Drachman
Brenda Fowler
Laura Giordano

HOS Technical Expert Panel Members (Current and Former Members)
William Rogers
Marcia Stevic
John Ware

National Committee for Quality Assurance
Lori Andersen
Alan Hoffman
Russell Mardon
Oanh Vuong

Research Triangle Institute International/Division of Health Economics Research
Nancy McCall

Example of Stakeholder Interview Questions

Health Services Advisory Group (HSAG)

CMS contracts with HSAG for the HOS data cleaning and analysis, developing and disseminating data files and reports, educating data users and stakeholders on the HOS findings and applications, and conducting applied research with the HOS data to support CMS priorities.

General

1. Describe the circumstances that led to your selection to the HOS program team.
2. Describe in detail your role and responsibilities in the HOS program and how they have changed over the years.
3. What do you expect your future role and responsibilities to include?
4. Describe the process/procedures you use to fulfill your HOS role and responsibilities. Describe the data cleaning and editing process used by HSAG. Describe the process used to develop profiles and disseminate data to QIOs and plans. Describe the processes used to educate data users and stakeholders on HOS findings and applications.
5. Describe how these processes/procedures were developed.
6. How have these processes/procedures changed since the start of the HOS program?
7. What were some of the other strategies considered for fulfilling your responsibilities? What were the benefits and costs of each strategy?
8. In light of all the options available, how did you arrive at the decision to conduct operations in the way you did?
9. Describe the decision-making process for the development and implementation of the HOS survey.
10. Describe some of the current quality initiatives implemented by CMS and other entities. Describe how the HOS compares to other public and private quality improvement initiatives within a managed care setting.
11. What are the key industry factors and changes that are likely to affect the future of quality assurance/improvement, and HOS specifically?
12. Does the slowdown in growth of managed care affect the need for HOS?
13. If the HOS program were discontinued, what would be the likely impact?
14. What changes to the HOS program would you recommend and why?

Data Use

1. Describe the manner in which you have used HOS data (i.e., National Pilot Program on Depression, quality improvement).
2. Overall, how useful are HOS data for quality improvement purposes?
3. Overall, how useful are HOS data for research purposes?
4. What items on the HOS have been most useful?
5. What items on the HOS have been least useful?
6. What should be done in the future to increase the utility of HOS data?
7. What can be done in the future to increase the utility of HOS data?
8. If the HOS program were discontinued, what would be the likely impact?
9. What have been the major challenges to HSAG in using HOS data?

Quality Assurance

1. What have been your role and responsibilities in maintaining quality assurance in the HOS program?
2. What specific strategies does HSAG currently have in place for maintaining quality assurance in the HOS program?
3. How did HSAG arrive at the decision to use these strategies? What other options were considered?
4. What have been the major challenges to HSAG in maintaining quality assurance?
5. How does the HOS program differ from CMS quality assurance/quality improvement programs for fee-for-service plans?

Cost/Burden

1. What are currently the costs to HSAG in helping CMS administer the HOS?
2. Have these costs changed since the start of the program? If so, how?
3. What can be done to minimize the costs to HSAG?

A similar set of questions was asked of representatives of the following organizations:

1. Boston University School of Public Health, Health Services Department Health Outcomes Technologies Program (HOT)
2. Centers for Medicare & Medicaid Services (CMS)
3. Health Assessment Lab
4. National Committee for Quality Assurance (NCQA)
5. Research Triangle Institute (RTI) International

APPENDIX II CMS Leadership/Technical Experts Central to HOS Development

TEP Interviewees: Roles & Responsibilities

Steven Clauser, PhD
Clauser is currently a Senior Scientist at the National Cancer Institute. He is interested in using HOS data to study the impact of cancer on health outcomes. From 1997 to 2001, Clauser was Director of the Quality Measurement and Health Assessment Group (QMHAG), the organizational unit within which HOS was located.

Samuel C. “Chris” Haffer, PhD
Haffer has been Director of the Medicare Health Outcomes Survey Program since 1997.

Stephen Jencks, MD, MPH
Jencks has been Director of the Quality Improvement Group at CMS since 1998.

Jeffrey Kang, MD, MPH
Kang was Director of the Office of Clinical Standards & Quality at CMS from 1998 to 2000. He was the principal champion within CMS leadership for developing a health outcomes measure for Medicare. He also served on NCQA’s Committee for Performance Measurement, which played a key role in the development of the HOS instrument.

Lewis Kazis, ScD
Kazis is Director of the Veterans SF-36 Project for the Veterans Administration and Chief of Health Outcomes for the Center for Health Quality at the Veterans Administration Medical Center in Bedford, Massachusetts. He has provided technical expertise in the development and refinement of the HOS instrument and conducted comparative analyses using Veterans SF-36 and HOS data.

William Rogers, PhD
Rogers is Senior Statistician at The Health Institute and has worked for nearly 3 decades to apply statistical methods to studies of health and health care delivery. He served as senior statistician for both the RAND Health Insurance Experiment and the Medical Outcomes Study. He collaborated in the development of the HOS survey, case-mix adjustment methodology, and studies of the psychometric properties of the HOS.

Marcia Stevic, RN, PhD
Stevic is a nationally recognized expert in health outcomes measurement and improvement who was involved in the initial discussions at CMS/HCFA on designing an outcomes measure. She served as Director of Health Outcomes at HSAG from 1995 to 1999 and previously worked in the Administrator’s office at CMS (HCFA).

John E. Ware, Jr., PhD
Ware is the Founder, President, CEO, and CSO of QualityMetric, Inc., and Executive Director of HAL. He served as Principal Investigator for the Medical Outcomes Study, which developed the SF-36 survey. Ware collaborated in the development of the HOS instrument, case-mix adjustment methodology, and studies of the psychometric properties of the HOS.

Barbara Paul, MD
Paul was Director of the Quality Measurement and Health Assessment Group at CMS from 2001 to 2003.

Catherine Gordon, RN, MBA
Gordon was Director of the Division of Health Promotion and Disease Prevention in QMHAG from 1997 to 2003. She conducted the initial research on state-of-the-art instruments available for measuring functional health status to inform the development of the HOS instrument.

(Source: www.cms.hhs.gov/surveys/hos)

APPENDIX III

M+CO & QIO Survey Instruments

The survey instruments are available as a separate document at www.cms.hhs.gov/surveys/hos.

M+CO & QIO Focus Group Agenda and Questions

Focus Group Agenda (California, Florida, and New York)

1. Introductions
2. Brief overview of the Medicare Health Outcomes Survey (HOS) evaluation
3. Purpose of the focus groups
4. Discussion of quality improvement initiatives and activities related to Medicare HOS
5. Concluding thoughts

Focus Group Questions

[Introduction by ____________]: Thank you for agreeing to participate in this focus group. We would like to begin with a general discussion regarding quality improvement initiatives and activities that your organization has implemented based on the HOS data.

1. What quality improvement initiatives or related activities has your organization implemented?
2. Please discuss how you have used HOS data and/or other resources (e.g., baseline profiles) in the development and implementation of these activities.
3. What have been the outcomes of these activities?
4. Please discuss ways in which CMS could improve the HOS program’s ability to facilitate collaboration between Quality Improvement Organizations and managed care plans.
5. What has worked well with the HOS program?
6. What, if any, recommendations do you propose to enhance the usefulness of the HOS instrument, data, results, and/or communication tools?

CMS Users Interview Questions

1. What CMS department are you affiliated with?

2. Describe the purpose and goals of this department and describe how the department’s goals fit in with the overall goals of CMS (i.e., quality improvement, quality measurement, accountability).

3. Describe the nature of your department’s relationship with the HOS program/team.

4. How does your department utilize HOS data? For what purposes does your department use HOS data (i.e., for quality improvement, research, medical management)? PLEASE PROVIDE EXAMPLES.

5. How long has your department been using HOS data for these activities?

6. What is your specific role within the department? How long have you been in that role?

7. When did you begin using HOS data? How knowledgeable and familiar with the HOS program are you?

9. Describe in detail the nature of your work with HOS data. Provide examples of the manner in which you have used the data.

10. Describe the manner or procedures in which your organization obtains HOS data (e.g., from HOS team, HOS Web site, public use files).

11. Are you included in the HOS dissemination process? What is the nature of the data that you receive (e.g., CMS reports, publications, public use data files, Web site information)?

12. Is assistance with data use made available to your department by the HOS team?

13. Upon receipt of the data, what tools do you draw upon to facilitate your utilization of the data (i.e., data user’s guides, HEDIS Volume 6 manual, technical support, conferences, consulting with HOS contacts)? PLEASE DESCRIBE YOUR USE OF EACH.

14. How useful are these tools in assisting you with your efforts?

15. Of those tools you have not used, were you aware they existed? If not, are they tools you will consider using in the future to inform your efforts?

16. What additional tools would you recommend to enhance your ability to use HOS data?

17. How useful are HOS data for your department’s specific efforts and goals? PROVIDE EXAMPLES.

18. What components of the HOS data have been most useful in your department’s efforts (i.e., ADL, urinary incontinence questions, PCS score, MCS score, demographics)?

19. What components of the HOS data have been least useful in your department’s efforts?

20. How could the data be made more useful to your department?

21. What have been some of the major challenges or difficulties faced with respect to the HOS data or program?

22. What could be done to alleviate or lessen the impact of these difficulties?

23. Overall, what improvements to the HOS program would you recommend?

24. Overall, do HOS data provide value-added benefits to your department?

25. Overall, how important are HOS data to the operations of your department?

References

Anderson, C., Laubscher, S., & Burns, R. (1996). Validation of the Short-Form 36 (SF-36) Health Survey Questionnaire among stroke patients. Stroke, 27, 1812–1816.

Andresen, E. M., Bowley, N., Rothenberg, B. M., Panzer, R., & Katz, P. (1996). Test-retest performance of a mailed version of the Medical Outcomes Study 36-Item Short Form Health Survey among older adults. Medical Care, 34, 1165–1170.

Andresen, E. M., Rothenberg, B. M., Panzer, R., & McDermott, M. P. (1998). Selecting a generic measure of health-related quality of life for use among older adults. Evaluation & the Health Professions, 21, 244–264.

Babbie, E. R. (2004). The practice of social research (10th ed.). Belmont, CA: Wadsworth.

Bailit Health Purchasing. (1997a). Assessment of the Medicare Managed Care Compliance Monitoring Program (Report 1). Unpublished report.

Bailit Health Purchasing. (1997b). Review of the Managed Care Contractor Compliance Monitoring Programs of major U.S. health care purchasers. Unpublished report.

Berwick, D. M., Godfrey, A. B., & Roessner, J. (1990). Curing health care: New strategies for quality improvement. San Francisco: Jossey-Bass.

Bhatia, A., & Blackstock, S. (2000). Evolution of quality review programs for Medicare: Quality assurance to quality improvement. Health Care Financing Review, 22(1), 69–74.

Bierman, A. (2003, October 21). Telephone interview.

Bierman, A. S., & Clancy, C. M. (2001). Health disparities among older women: Identifying opportunities to improve quality of care and functional health outcomes. Journal of the American Medical Women's Association, 56(4), 155–160.

Bierman, A. S., Haffer, S. C., & Hwang, Y. T. (2001a). Health disparities among older women enrolled in Medicare managed care. Health Care Financing Review, 22(4), 187–198.

Centers for Medicare & Medicaid Services. (1998, Spring). Medicare HOS Cohort 1 baseline data. Retrieved from http://www.cms.hhs.gov/surveys/hos/hosdata.asp

Centers for Medicare & Medicaid Services. (1999, Spring). Medicare HOS Cohort II baseline data. Retrieved from http://www.cms.hhs.gov/surveys/hos/hosdata.asp

Centers for Medicare & Medicaid Services. (2000). Medicare HOS Cohort 1 analytic data, 1998–2000. Retrieved from http://www.cms.hhs.gov/surveys/hos/hosdata.asp

Centers for Medicare & Medicaid Services. (2000, Spring). Medicare HOS Cohort III baseline data. Retrieved from http://www.cms.hhs.gov/surveys/hos/hosdata.asp

Bierman, A. S., Lawrence, W. F., Haffer, S. C., & Clancy, C. M. (2001b). Functional health outcomes as a measure of health care quality for Medicare beneficiaries. Health Services Research, 36(6), 90–109.

Bloom, H. S. (1995). Minimum detectable effects: A simple way to report the statistical power of experimental designs. Evaluation Review, 19, 547–556.

Boards of Trustees, Federal Hospital Insurance and Federal Supplementary Medical Insurance Trust Funds. (2004). 2004 Annual Report of the Boards of Trustees of the Federal Hospital Insurance and Federal Supplementary Medical Insurance Trust Funds. Retrieved from http://www.cms.hhs.gov/publications/trusteesreport/

Bowen, S. (2003, August 28). Personal communication (E-mail–HOS Beneficiary and Plan Burden), Baltimore, MD.

Bowen, S. (2004, April 8). Personal communication (E-mail–CMS HOS Data Users), Baltimore, MD.

Bowen, S. (2004, July 13). Personal communication (E-mail–HOS Project Funding), Baltimore, MD.

Brook, R. H., McGlynn, E. A., & Cleary, P.D. (1996). Measuring quality of care. New England Journal of Medicine, 335, 966–970.

Centers for Medicare & Medicaid Services. (2000, Spring). Medicare HOS Cohort 1 follow-up. Retrieved from http://www.cms.hhs.gov/surveys/hos/hosdata.asp

Centers for Medicare & Medicaid Services. (2001). Medicare HOS Cohort II analytic data, 1999–2001. Retrieved from http://www.cms.hhs.gov/surveys/hos/hosdata.asp

Centers for Medicare & Medicaid Services. (2002, October 17). Implementing the HEDIS Medicare Health Outcomes Survey (Option Year 1), Draft Annual Report 09/29/01–09/28/02, CMS Contract 500-00-0055. Baltimore, MD: Author.

Centers for Medicare & Medicaid Services. (n.d.). Implementing the HEDIS Medicare Health Outcomes Survey, Statement of Work (9/29/02–9/28/03). Baltimore, MD: Author.

Centers for Medicare & Medicaid Services. (n.d.). Medicare HOS publications. Retrieved from http://www.cms.hhs.gov/surveys/hos/hosresearch.asp

Centers for Medicare & Medicaid Services. (2001, Spring). Medicare HOS Cohort II follow-up. Retrieved from http://www.cms.hhs.gov/surveys/hos/hosdata.asp

Centers for Medicare & Medicaid Services. (2001, Spring). Medicare HOS Cohort IV baseline data. Retrieved from http://www.cms.hhs.gov/surveys/hos/hosdata.asp

Centers for Medicare & Medicaid Services. (2002, November 20). Medicare Health Outcomes Survey (HOS) Technical Expert Panel, conference call summary. Baltimore, MD: Author.

Centers for Medicare & Medicaid Services. (2003). 2003 Data Compendium. Retrieved from http://www.cms.hhs.gov/researchers/pubs/datacompendium/current/

Centers for Medicare & Medicaid Services. (n.d.). Improving outcomes using Medicare Health Outcomes Survey Data, Statement of Work (12/15/02–10/31/05). Baltimore, MD: Author.

Centers for Medicare & Medicaid Services. (n.d.). Medicare Health Outcomes Survey overview. Retrieved from http://www.cms.hhs.gov/surveys/hos/hosoverview.asp

Centers for Medicare & Medicaid Services. (n.d.). Medicare Health Outcomes Survey Technical Expert Panel memorandum: Follow-up on HOS TEP decisions from November 20, 2002. Baltimore, MD: Author.

Centers for Medicare & Medicaid Services and Health Services Advisory Group. (2003, August). Medicare HOS Performance Measurement Report, Cohort III (2000–2002). Phoenix, AZ: HSAG.

Centers for Medicare & Medicaid Services. (n.d.). Medicare HOS Technical Expert Panel. Retrieved from http://www.cms.hhs.gov/surveys/hos/hostep.asp

Chassin, M. R. (1997). Assessing strategies for quality improvement. Health Affairs, 16(3), 151–161.

Chassin, M., & Galvin, R. (1998). The urgent need to improve health care quality: Institute of Medicine National Roundtable on Health Care Quality. Journal of the American Medical Association, 280, 1000–1005.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Cooper, J. K., & Kohlmann, T. (2001). Factors associated with health status of older Americans. Age and Ageing, 30, 495–501.

Cooper, J. K., Kohlmann, T., Michael, J. A., Haffer, S. C., & Stevic, M. (2001). Health outcomes: New quality measure for Medicare. International Journal for Quality in Health Care, 13(1), 9–16.

Dexter, P. R., Stump, T. E., Tierney, W. M., & Wolinsky, F. D. (1996). The psychometric properties of the SF-36 Health Survey among older adults in a clinical setting. Journal of Clinical Geropsychology, 2, 223–237.

Dillman, D. A. (2000). Mail and Internet surveys: The tailored design method (2nd ed.). New York: Wiley.

Donabedian, A. (1988). The quality of care: How can it be assessed? Journal of the American Medical Association, 260(12), 1743–1748.

Gabel, J. R., Hunt, K. A., Hurst, K. & Marwick, K. P. (1998). When employers choose health plans (Report 293). New York: Commonwealth Fund.

Gandek, B., Sinclair, S. J., Kosinski, M. A., & Ware, J. E. (2004). Psychometric evaluation of the SF-36 Health Survey in Medicare managed care. Health Care Financing Review, 25, 5–25.

Ginsburg, P. B., & Lesser, C. S. (1999). The view from communities. Journal of Health Politics, Policy, and Law, 24(5), 1005–1013.

Harris, Y. (2004, April 23). Personal communication, Baltimore, MD.

Gandek, B., Ware, J. E., Aaronson, N. K., Alonso, J., Apolone, G., Bjorner, J., Brazier, J., Bullinger, M., Fukuhara, S., Kaasa, S., Leplege, A., & Sullivan, M. (1998). Tests of data quality, scaling assumptions, and reliability of the SF-36 in eleven countries: Results from the IQOLA Project. Journal of Clinical Epidemiology, 51, 1149–1158.

Giovanni, M. (2004, April 21). Personal communication, Baltimore, MD.

Goldstein, E., Cleary, P. D., Langwell, K. M., Zaslavsky, A. M., & Heller, A. (2001). Medicare Managed Care CAHPS: A tool for performance improvement. Health Care Financing Review, 22(3), 101–107.

Haffer, S. C. (2003, October 29). Personal communication, Baltimore, MD.

Haffer, S. C., Bowen, S. E., Shannon, E. D., & Fowler, B. M. (2003). Assessing beneficiary health outcomes and disease management initiatives in Medicare. Disease Management and Health Outcomes, 11(2), 111–124.

Haffer, S. C., & Bowen, S. E. (2004). Measuring and improving health outcomes in Medicare: The Medicare HOS program. Health Care Financing Review, 25(4), 1–3.

Halfon, N., Inkelas, M., & Newacheck, P. W. (1999). Enrollment in the State Child Health Insurance Program: A conceptual framework for evaluation and continuous quality improvement. Milbank Quarterly, 77(2), 181–204.

Hayes, V., Morris, J., Wolfe, C., & Morgan, M. (1995). The SF-36 Health Survey Questionnaire: Is it suitable for use with older adults? Age and Ageing, 24, 120–125.

Health Assessment Lab. (2003, October 1). Group interview, Washington, DC.

Hill, S., Harries, U., & Popay, J. (1996). Is the Short Form 36 (SF-36) suitable for routine health outcomes assessment in health care for older people? Evidence from preliminary work in community based health services in England. Journal of Epidemiology and Community Health, 50, 94–98.

Institute of Medicine. (2003). Priority areas for national action. Washington, DC: National Academy Press.

Health Assessment Lab & QualityMetric. (2002a). Modifying the Medicare Health Outcomes Survey Questionnaire: Issues and analysis. Report for November 20, 2002, HOS TEP Meeting. Boston, MA: Author.

Health Assessment Lab & QualityMetric. (2002b). Strategies for shortening the Medicare Health Outcomes Survey. Report for November 20, 2002, HOS TEP Meeting. Boston, MA: Author.

Health Services Advisory Group. (2003a, October 1). Group interview, Washington, DC.

Hibbard, J. H., Jewett, J. J., Legnini, M. W., & Tusler, M. (1997). Choosing a health plan. Health Affairs, 16, 172–180.

Hobson, J. P., & Meara, R. J. (1997). Is the SF-36 Health Survey Questionnaire suitable as a self-report measure of the health status of older adults with Parkinson’s disease? Quality of Life Research, 6, 213–216.

Hoogerwerf, T. (2004, April 21). Personal communication, Baltimore, MD.

Institute of Medicine. (2000). To err is human: Building a safer health system. Washington, DC: National Academy Press.

Institute of Medicine. (2001a). Envisioning the National Health Care Quality Report. Washington, DC: National Academy Press. Retrieved from http://www.nap.edu/books/030907343X/html/

Institute of Medicine. (2001b). Crossing the quality chasm: A new health system for the 21st century. Washington, DC: National Academy Press.

Institute of Medicine. (2002). Leadership by example: Coordinating government roles in improving health care quality. Washington, DC: National Academy Press.

Kazis, L. (2003, October 3). Personal communication, Washington, DC.

Jencks, S. (2003, October 8). Personal communication, Baltimore, MD.

Jencks, S. F., & Wilensky, G. (1992). The health care quality improvement initiative: A new approach to quality assurance in Medicare. Journal of the American Medical Association, 268(7), 900–904.

Johansen, A. (1993). SF-36 may reinforce ageism. British Medical Journal, 307, 127.

Kane, R. (1997). Improving outcomes in rehabilitation: A call to arms and legs. Medical Care, 35(6), JS21–JS27.

Kang, J. (2003, October 27). Personal communication, Baltimore, MD.

Kantz, M. E., Harris, W. J., Levitsky, K., Ware, J. E., & Davies, A. R. (1992). Methods for assessing condition-specific and generic functional status outcomes after total knee replacement. Medical Care, 30(Suppl. 5), MS240–MS252.

Leichter, H. M., & Tryens, J. (2002). Achieving better health outcomes: The Oregon Benchmark Experience. Milbank Quarterly. Retrieved from http://www.milbank.org/reports/OregonProgres/020909Oregon.html

Lied, T. (2004, April 21). Personal communication, Baltimore, MD.

Lied, T. R., & Kazandjian, V. A. (1999). Performance: A multidisciplinary and conceptual model. Journal of Evaluation in Clinical Practice, 5(4), 394–400.

Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Thousand Oaks, CA: Sage.

Lyons, R. A., Perry, H. M., & Littlepage, B. (1993). Comparison of postal and interview-administered versions of the Short-Form 36 Questionnaire (SF-36). Unpublished manuscript. As cited in McHorney, C. A. (1996). Measuring and monitoring general health status in elderly persons: Practical and methodological issues in using the SF-36 Health Survey. The Gerontologist, 36(5), 571–583.

Malsbary, R. (2003, October 29). Personal communication, Baltimore, MD.

Manton, K. G., Newcomer, R., Lowrimore, G. R., Vertrees, J. C., & Harrington, C. (1993). Social/health maintenance organization and fee-for-service health outcomes over time. Health Care Financing Review, 15(4), 173–202.

Marshall, M., Shekelle, P. G., Leatherman, S. T., & Brook, R. H. (2000a). What do we expect to gain? A review of the evidence. Journal of the American Medical Association, 283, 1866–1874.

Marshall, M., Shekelle, P. G., Leatherman, S. T., Brook, R. H., & Owen, C. B. (2000b). Dying to know: Public release of information about quality of health care. London: The Nuffield Trust and RAND Corporation.

Maxwell, J., Briscoe, F., Davidson, S., Eisen, L., Robbins, M., Temin, P., & Young, C. (1998). Managed competition in practice: Value purchasing by fourteen employers. Health Affairs, 17(3), 216–226.

McHorney, C. A., Teno, J., Lu, J. F. R., Sherbourne, C., & Ware, J. E. (1990). The use of standardized measures of functional status and well-being among cognitively impaired and intact elders: Results from the Medical Outcomes Study. Paper presented at the 43rd Annual Scientific Meeting of the Gerontological Society of America, Boston, MA. As cited in McHorney, C. A. (1996). Measuring and monitoring general health status in elderly persons: Practical and methodological issues in using the SF-36 Health Survey. The Gerontologist, 36(5), 571–583.

McHorney, C. A., Ware, J. E., Lu, J. F. R., & Sherbourne, C. D. (1994). The MOS 36-Item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Medical Care, 32, 40–66.

McHorney, C. A. (1996). Measuring and monitoring general health status in elderly persons: Practical and methodological issues in using the SF-36 health survey. The Gerontologist, 36(5), 571–583.

McIntyre, D., Rogers, L., & Heier, E. J. (2001). Overview, history and objectives of performance measurement. Health Care Financing Review, 22(3), 7–21.

National Committee for Quality Assurance. (2002a). HEDIS 2002, Volume 6: The Medicare Health Outcomes Survey manual: Specifications for the Medicare Health Outcomes Survey. Washington, DC: Author.

Miller, E. A., & Weissert, W. G. (2000). Predicting elderly people’s risk of nursing home placement, hospitalization, functional impairment and mortality. Medical Care Research and Review, 57, 259–297.

Miller, N. A. (1992). An evaluation of substance misuse treatment providers used by an employee assistance program. International Journal of the Addictions, 27(5), 533–559.

National Committee for Quality Assurance. (1998a). HEDIS 1998, Medicare Health Outcomes Survey. Retrieved from the Centers for Medicare & Medicaid Services Web site: http://www.cms.hhs.gov/surveys/hos/download/HOS_1998_Survey.pdf

National Committee for Quality Assurance. (1998b). HEDIS 1998, Volume 6: The Medicare Health Outcomes Survey manual: Specifications for the Medicare Health Outcomes Survey. Washington, DC: Author.

National Committee for Quality Assurance. (1999). HEDIS 1999, Volume 6: The Medicare Health Outcomes Survey manual: Specifications for the Medicare Health Outcomes Survey. Washington, DC: Author.

National Committee for Quality Assurance. (2000). HEDIS 2000, Volume 6: The Medicare Health Outcomes Survey manual: Specifications for the Medicare Health Outcomes Survey. Washington, DC: Author.

National Committee for Quality Assurance. (2001). HEDIS 2001, Volume 6: The Medicare Health Outcomes Survey manual: Specifications for the Medicare Health Outcomes Survey. Washington, DC: Author.

National Committee for Quality Assurance. (2002b). Making a difference: Recognizing and rewarding excellence. Annual report 2002. Washington, DC: Author.

National Committee for Quality Assurance. (2002c). The state of health care quality report. Washington, DC: Author.

National Committee for Quality Assurance. (2004, July 15). Personal communication (E-mail–HOS Fees), Washington, DC.

Palmer, H. (1995). Perspectives on Medicare: Securing healthcare quality for Medicare. Health Affairs, 14(4), 89–100.

National Committee for Quality Assurance. (2003a). HEDIS 2003, Medicare Health Outcomes Survey. Retrieved from the Centers for Medicare & Medicaid Services Web site: http://www.cms.hhs.gov/surveys/hos/download/HOS_2003_Survey.pdf

National Committee for Quality Assurance. (2003b). HEDIS 2003, Volume 6: The Medicare Health Outcomes Survey manual: Specifications for the Medicare Health Outcomes Survey. Washington, DC: Author.

National Committee for Quality Assurance. (2003c, October 1). Group interview, Washington, DC.

Neuman, W. L. (1994). Social research methods. Qualitative and quantitative approaches (2nd ed.). Boston: Allyn and Bacon.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Office of Inspector General. (1997). Medicare’s oversight of managed care: Implications for regional staffing (Report No. OEI-01-96-00190). Washington, DC: U.S. Department of Health and Human Services.

O’Mahony, P. G., Rodgers, H., Thomson, R. G., Dobson, R., & James, O. F. W. (1998). Is the SF-36 suitable for assessing health status of older stroke patients? Age and Ageing, 27, 19–22.

Orr, L. L. (1999). Social experiments: Evaluating social programs with experimental methods. Thousand Oaks, CA: Sage.

Paul, B. (2003, October 29). Personal communication, Baltimore, MD.

Phillips, R. C., & Lanka, D. J. (1992). Outcomes management in heart valve replacement surgery: Early experience. Journal of Heart Valve Disease, 1(1), 42–50.

Seymour, D. G., Ball, A. E., Russell, E. M., Primrose, W. R., Garrett, A. M., & Crawford, J. R. (2001). Problems in using health survey questionnaires in older patients with physical disabilities: The reliability and validity of the SF-36 and the effect of cognitive impairment. Journal of Evaluation in Clinical Practice, 7(4), 411–418.

Reuben, D. B., Valle, L. A., Hays, R., & Siu, A. L. (1995). Measuring physical function in community-dwelling older persons: A comparison of self-administered, interviewer-administered, and performance-based measures. Journal of the American Geriatric Society, 43, 17–23.

Rogers, W. (2003, November 6). Telephone interview.

Selltiz, C., Wrightsman, W., Hackbarth, G. M., & Cook, S. W. (1976). Research methods in social relations. New York: Holt, Rinehart & Winston.

Shaughnessy, P., Schlenker, R. E., & Hittle, D. F. (1994). Home health care outcomes under capitated and fee-for-service payment. Health Care Financing Review, 16(3), 252–270.

Sheingold, S., & Lied, T. (2001). An overview: The future of plan performance measurement. Health Care Financing Review, 22(3), 1–5.

Sherbourne, C., & Meredith, L. S. (1992). Quality of self-report data: A comparison of older and younger chronically ill patients. Journal of Gerontology, 46(4), S204–S211.

Stadnyk, K., Calder, J., & Rockwood, K. (1998). Testing the measurement properties of the Short Form-36 Health Survey in a frail elderly population. Journal of Clinical Epidemiology, 51, 827–835.

Stevic, M. (2003, October 21). Telephone interview.

Tudor, C. (2003, October 29). Personal communication, Baltimore, MD.

Turner-Bowker, D. M., Bartley, P. J., & Ware, J. E. (2002). SF-36 health survey and SF bibliography (3rd ed.). Lincoln, RI: QualityMetric.

Ware, J. E. (1997). Patient based assessment: Tools for monitoring and improving healthcare outcomes. Boston: New England Medical Center.

Ware, J. E. (2003, October 31). Telephone interview.

Ware, J. E., Bayliss, M. S., Rogers, W. H., Kosinski, M., & Tarlov, A. R. (1996). Differences in 4-year health outcomes for elderly and poor chronically ill patients treated in HMO and fee-for-service systems. Journal of the American Medical Association, 276, 1039–1047.

Ware, J. E., Gandek, B., Sinclair, S. J., & Kosinski, M. (2004). Measuring and improving health outcomes: An SF-36 primer for the Medicare Health Outcomes Survey. Waltham, MA: Health Assessment Lab and QualityMetric Incorporated.

Ware, J. E., & Kosinski, M. (2001). SF-36 Physical and Mental Health Summary Scales: A manual for users of version 1. Lincoln, RI: QualityMetric Incorporated.

Ware, J. E., Kosinski, M., Bayliss, M. S., McHorney, C. A., Rogers, W. H., & Raczek, A. (1995). Comparison of methods for the scoring and statistical analysis of the SF-36 health profile and summary measures: Results from the Medical Outcomes Study. Medical Care, 33(Suppl. 4), AS264–AS279.

Ware, J. E., Kosinski, M., & Gandek, B. (2003). SF-36 Health Survey manual and interpretation guide. Lincoln, RI: QualityMetric.

Ware, J. E., & Sherbourne, C. D. (1992). The MOS 36-Item Short-Form Health Survey (SF-36): Conceptual framework and item selection. Medical Care, 30(6), 473–483.

Ware, J. E., Snow, K. K., Kosinski, M., & Gandek, B. (1993). SF-36® Health Survey manual and interpretation guide. Boston: The Health Institute.

Weinberger, M., Samsa, G. P., Hanlon, J. T., Schmader, K., Doyle, M. E., Cowper, P.A. (1991). An evaluation of a brief health status measure in elderly veterans. Journal of the American Geriatrics Society, 39, 691–694.

Wooldridge, J. M. (2003). Introductory econometrics (2nd ed.). Mason, OH: South-Western.
