The Role of Local Specificity in the Interpretation of Small Area Estimation
Benmei Liu, Scott Gilkeson, Gordon Willis, Rocky Feuer
2012 FCSM Statistical Policy Seminar
December 4, 2012
Outline
I. Overview of small area estimation
II. The importance of local specificity and how it could affect data use
III. An example from a recent project to estimate cancer risk factors and screening behavior
IV. Discussion
I. Overview of Small Area Estimation (SAE)
The demand for survey estimates for small areas (small geographic areas or domains) has increased in many different areas of application (e.g., income and poverty, education, health, substance use) over the past several decades
Standard direct estimation methods for survey data cannot provide reliable estimates for such areas because the area-level sample sizes are small
Model-based methods that combine information from multiple related sources have been developed to increase the precision of these estimates
Basic SAE Model and Estimates
The Fay-Herriot model (1979) is widely regarded as the foundational approach
The final estimate for area $i$ derived from the Fay-Herriot class of models:
$$\hat{\theta}_i = (1 - B_i)\, y_i + B_i\, x_i^{\top}\hat{\beta}$$
where:
- $y_i$ is the direct estimate;
- $x_i^{\top}\hat{\beta}$ is a regression-based synthetic estimate;
- $B_i$ is the proportion of the final estimate due to the regression-based synthetic estimate, or a measure of this borrowed strength
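As a rough illustration, the Fay-Herriot-style final estimate is a weighted average of the direct estimate and the regression-based synthetic estimate, with the weight on the synthetic part acting as the measure of borrowed strength. The sketch below assumes the textbook area-level setup in which the sampling variance of the direct estimate and the model variance are treated as known, so that the weight is sampling variance divided by (model variance + sampling variance). The names `AreaInput`, `shrinkage_factor`, and `composite_estimate`, and all numbers, are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class AreaInput:
    """Inputs for one small area (all values are illustrative)."""
    direct: float        # direct survey estimate for the area
    synthetic: float     # regression-based synthetic estimate
    sampling_var: float  # sampling variance of the direct estimate (assumed known)


def shrinkage_factor(sampling_var: float, model_var: float) -> float:
    """Weight on the synthetic estimate: B = D / (A + D).

    D is the sampling variance of the direct estimate and A is the
    model (between-area) variance; both are assumed known here.
    """
    if sampling_var < 0 or model_var < 0:
        raise ValueError("variances must be non-negative")
    total = model_var + sampling_var
    if total == 0:
        raise ValueError("at least one variance must be positive")
    return sampling_var / total


def composite_estimate(area: AreaInput, model_var: float) -> float:
    """Final estimate: (1 - B) * direct + B * synthetic."""
    b = shrinkage_factor(area.sampling_var, model_var)
    return (1.0 - b) * area.direct + b * area.synthetic


if __name__ == "__main__":
    # Hypothetical area: noisy direct estimate, so most weight is borrowed.
    area = AreaInput(direct=0.70, synthetic=0.60, sampling_var=0.003)
    b = shrinkage_factor(area.sampling_var, model_var=0.001)
    est = composite_estimate(area, model_var=0.001)
    print(f"borrowed-strength weight B = {b:.2f}, final estimate = {est:.3f}")
```

In this hypothetical case the weight on the synthetic estimate is 0.75, so only a quarter of the final estimate comes from local data. An area with a larger sample (smaller sampling variance) would have a smaller weight and would sit closer to its direct estimate.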
II. The Importance of Local Specificity
We use the term local specificity for information about how much an SAE-based estimate relies on local versus borrowed data
We propose that the term local specificity be used as a generalizable and intuitively understandable term for the degree to which local data contribute to the small area estimate for a specified area
The Importance of Local Specificity (Cont’d)
Local specificity can be an important indicator of fitness for use
We argue that local specificity provides unique information that is not otherwise available
For local data users, a measure of local specificity could be useful
A measure of local specificity was not provided on any of the government websites that release small area estimates (e.g., SAIPE, NAAL, NSDUH)
III. Communicating Local Specificity to End Users: An Example
Combining information from two health surveys to enhance small-area estimation (Raghunathan et al. 2007; Davis et al. 2010)
Project led by National Cancer Institute, with collaboration by:
- National Center for Health Statistics
- National Center for Chronic Disease Prevention and Health Promotion
- University of Michigan
- University of Pennsylvania
- Information Management Services
Motivation for the Project
Cancer screening and risk factor data are of great interest to cancer control planners at the state and sub-state level, but accurate local statistics have been difficult to obtain
Different surveys have different strengths
Combining information from surveys and borrowing strength from other sources (e.g., Census or administrative records) using a small area modeling approach could improve small-area estimates
Surveys Used
Behavioral Risk Factor Surveillance System (BRFSS) – the largest U.S. survey tracking health conditions and risk behaviors at the state and sub-state level since 1984
Limitations: Potential nonresponse bias; undercoverage of households without landline phones
National Health Interview Survey (NHIS) – the principal source of information on the health of the civilian noninstitutionalized population of the U.S. since 1957
Limitations: Smaller sample size; only includes data on about ¼ of U.S. counties
Project Description
Bayesian methods were developed to combine information from the two surveys, also incorporating telephone coverage rates from the Census
National Cancer Institute released estimates for two time periods: 1997-99 and 2000-03 (http://sae.cancer.gov/)
- Smoking, mammography, and Pap smear
- Counties, health service areas, and states
Current work involves adding a component for cellphone-only households and extending the estimates to more recent periods
Focus Group Suggestions
Conducted two focus groups with cancer control planners and public health professionals at the Comprehensive Cancer Control Leadership Institute in June 2010
Recommendations:
Include these estimates within NCI’s State Cancer Profiles website (http://statecancerprofiles.cancer.gov/)
- The website is a comprehensive system of interactive maps and graphs enabling the investigation of cancer trends at the national, state, and county level
Need a way to describe the differences between the bias-adjusted model-based estimates and existing direct estimates
Data users would appreciate an indicator like local specificity to validate the estimate against local evidence
Issues on Communicating Local Specificity
1) How should it be measured?
2) What should it be labeled?
3) What thresholds should be set in assigning values to it?
1) Measuring Local Specificity
The bias-adjusted SAE model is complex and lacks an explicit shrinkage factor
The concept of borrowed strength still applies, depending primarily on the combined BRFSS and NHIS sample size within the area
The NHIS sample size is confidential, but the combined sample size is close to the BRFSS sample size
The BRFSS sample size is published and, on its own, was the best practical measure of the amount of local data
2) Labeling Local Specificity
Presenting the BRFSS sample size as a number along with the estimates didn’t convey the message of local specificity
Developed the term local specificity and selected qualitative (i.e., high, medium, and low) rather than quantitative descriptors
3) Assigning Thresholds
Selected a BRFSS sample size of 50 as the threshold for low local specificity
Determining break points for the categories of local specificity deserves further study
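To make the labeling step concrete, here is a minimal sketch of how a BRFSS sample size might be mapped to the qualitative descriptors high, medium, and low. Only the value 50 for the low category comes from the slides. Whether the boundary is "below 50" or "50 or below" is not stated, so the sketch assumes "below 50". The cutoff between medium and high is not given in the source, so it is a required, purely hypothetical parameter, and the function name `local_specificity_label` is also an assumption.

```python
def local_specificity_label(
    brfss_n: int,
    medium_high_cutoff: int,
    low_cutoff: int = 50,
) -> str:
    """Map a BRFSS sample size to a qualitative local-specificity label.

    low_cutoff=50 follows the slides; treating sizes strictly below it
    as "low" is an assumption. medium_high_cutoff is hypothetical: the
    source does not state where "medium" ends and "high" begins.
    """
    if brfss_n < 0:
        raise ValueError("sample size must be non-negative")
    if medium_high_cutoff <= low_cutoff:
        raise ValueError("medium_high_cutoff must exceed low_cutoff")
    if brfss_n < low_cutoff:
        return "low"
    if brfss_n < medium_high_cutoff:
        return "medium"
    return "high"


if __name__ == "__main__":
    # The value 200 is an arbitrary placeholder, not a recommended break point.
    for n in (30, 50, 120, 400):
        print(n, local_specificity_label(n, medium_high_cutoff=200))
```

Keeping the upper cutoff as an explicit argument reflects the point above that break points for the categories still deserve further study.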
Ratios of the model-based county-level current mammography screening rate to the bias-corrected BRFSS direct estimate
Small area estimates of mammography screening by county in Pennsylvania, with a mini-map showing local specificity
Warren county 2000-2003 percentage = 65.9 (56.6-75.2)
Westmoreland county 2000-2003 percentage = 64.8 (57.5-72.2)
IV. Discussion
Our experience has convinced us that a measure of local specificity is critical for end users in their use and interpretation of results
The potential importance of local specificity should not be under-emphasized, given that users demand more from SAEs than from the results of most other statistical models
There is no single computational formula for calculating levels of local specificity that applies generally across models; further research is needed
Whenever estimates are based on non-ignorable levels of borrowed strength, it is vitally important to disseminate analyses in such a way that local specificity, as an important index of fitness for use, is conveyed to data users in a clear and unbiased manner
Thank you!
Contact information:
Benmei Liu, Ph.D.
Survey Statistician
National Cancer Institute
[email protected]