Conventions for handling missing data



440 Abstracts

P94 ANALYSIS OF RISK FACTORS FOR CONDITIONS DEFINED BY EXTREMES OF A DISTRIBUTION

Bruce Barton and Robert McMahon

Maryland Medical Research Institute

Baltimore, Maryland

Some medical conditions, such as obesity, are commonly defined by values of a continuous variable exceeding some extreme cutpoint. For example, obesity is frequently defined as exceeding the 85th percentile of a standard distribution of the body mass index (weight/height²). Predictors of development of extreme values may differ from predictors of change within the normal range. With longitudinal data, development of such conditions might be represented by repeated assessment of a binary outcome.

If the medical condition may be treated as an absorbing state, logistic regression can be used for survival analysis of repeated measures of binary response at discrete time points (Wu and Ware, 1979). The significance of the effect of factors measured immediately prior to the development of obesity and of factors measured at baseline can be assessed within the same model.
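The data layout behind this approach can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Subject` class, the function names and the four example subjects are all hypothetical. Each subject is expanded into one record per visit at risk, and records stop at the visit where the condition first appears because the state is treated as absorbing. A logistic regression of the binary response on visit indicators and covariates, fitted to records of this shape, would then estimate the discrete-time hazard.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Subject:
    """Hypothetical record for one study participant."""
    subject_id: int
    # 1-based visit at which the condition was first observed,
    # or None if it was never observed during follow-up.
    event_visit: Optional[int]
    n_visits: int  # number of visits attended


def person_period_rows(subjects: List[Subject]) -> List[Tuple[int, int, int]]:
    """Expand subjects into (subject_id, visit, response) records.

    A subject contributes no records after the event visit,
    which is what treating the condition as absorbing means here.
    """
    rows = []
    for s in subjects:
        last = s.event_visit if s.event_visit is not None else s.n_visits
        for visit in range(1, last + 1):
            rows.append((s.subject_id, visit, int(visit == s.event_visit)))
    return rows


def discrete_hazard(rows: List[Tuple[int, int, int]]) -> Dict[int, float]:
    """Observed hazard at each visit: events divided by subjects at risk."""
    at_risk: Dict[int, int] = {}
    events: Dict[int, int] = {}
    for _, visit, response in rows:
        at_risk[visit] = at_risk.get(visit, 0) + 1
        events[visit] = events.get(visit, 0) + response
    return {v: events[v] / at_risk[v] for v in sorted(at_risk)}


# Invented example data, for illustration only.
subjects = [
    Subject(1, event_visit=2, n_visits=5),
    Subject(2, event_visit=None, n_visits=5),
    Subject(3, event_visit=4, n_visits=5),
    Subject(4, event_visit=None, n_visits=3),  # lost to follow-up after visit 3
]
rows = person_period_rows(subjects)
hazard = discrete_hazard(rows)
```

Time-dependent covariates would be attached to each record as the value measured at or immediately before that visit, while baseline covariates repeat across all of a subject's records.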

Data from the NHLBI Growth and Health Study (NGHS) will be used to demonstrate this approach. NGHS is a 5-year longitudinal epidemiological study of 2379 girls, age 9-10 at entry, to assess factors related to the development of obesity in adolescent girls. Time-dependent data used will include nutrition, physical activity and stage of maturation. This approach offers a valuable tool to the statistician analyzing this type of data.

P95 CONVENTIONS FOR HANDLING MISSING DATA

Linda Ward for the British Stomach Cancer Group

CRC Trials Unit, University of Birmingham

Birmingham, England

Since a false negative result is potentially as misleading as a false positive, data managers are trained to record silent data as missing. If requests for more information fail, there are three main options at analysis: assume a negative result, assume a positive result, or drop unclassifiable cases. The implications of “silent” stage data were examined in a cancer registry cohort of 1491 resected gastric cancer cases treated in the West Midlands between 1976 and 1980. Cases were staged using the modified TNM classification of the 1st British Stomach Cancer Group trial (Fielding, 1983). Palliatively treated (stage 4) patients were those with metastatic disease or local residual disease. Curatively resected cases were characterized by histological findings on depth of tumour (S), node involvement (N) and resection line clearance (L). Inadequate pathological reporting prevented true staging in 122/976 (13%) curatively treated cases. A further 113 had incomplete surgical data, of whom 47 (3%) patients were possibly resected for cure. We excluded the remaining 66 (4%) patients since intent to cure or palliate could not be evaluated. The effect of applying each of the three conventions (stage up, down or true) on the number and proportion surviving to 5 years in each stage is shown below:

Stage assigned            Down               True               Up
                          No. (% at 5 yrs)   No. (% at 5 yrs)   No. (% at 5 yrs)
Stage 1 (s- n- l-)        143 (40)           79 (48)            79 (48)
Stage 2 (s+ n- l-)        213 (30)           136 (29)           139 (29)
Stage 3 (n+ 2 l+)         667 (11)           639 (11)           758 (14)
Stage 4 (palliative)      402 (3)            402 (3)            449 (4)
Curative: unspecified     0                  122 (32)
NK (at operation)         66† (11)           113 (12)           66† (11)

† Excluded from statistics
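The three conventions can be sketched in code as follows. This is an illustration under stated assumptions, not the Group's actual procedure. The function names are hypothetical, and the staging rule is a simplified guess: the abstract does not spell out the modified TNM rule, so any case with positive nodes or a positive resection line is assumed here to be stage 3. A missing finding is represented as `None`.

```python
from typing import Optional


def stage(s: bool, n: bool, l: bool) -> int:
    """Assumed staging rule for curatively resected cases (illustrative only).

    stage 1: s-, n-, l-
    stage 2: s+, n-, l-
    stage 3: n+ or l+   (assumption; the exact rule is not given in the abstract)
    """
    if n or l:
        return 3
    return 2 if s else 1


def assign_stage(s: Optional[bool], n: Optional[bool], l: Optional[bool],
                 convention: str) -> Optional[int]:
    """Stage a case whose findings may be missing (None).

    "down": treat every missing finding as negative
    "up":   treat every missing finding as positive
    "true": leave the case unstaged if any finding is missing
    """
    findings = (s, n, l)
    if convention == "true":
        if any(f is None for f in findings):
            return None
        return stage(s, n, l)
    if convention == "down":
        fill = False
    elif convention == "up":
        fill = True
    else:
        raise ValueError(f"unknown convention: {convention!r}")
    s_f, n_f, l_f = (fill if f is None else f for f in findings)
    return stage(s_f, n_f, l_f)


# A case with serosal involvement, clear resection line, and silent node status.
case = (True, None, False)
staged = {c: assign_stage(*case, convention=c) for c in ("down", "true", "up")}
```

Under these assumptions the same case lands in stage 2 when staged down, stage 3 when staged up, and is left unstaged under the true convention, which is how a single silent finding moves cases between rows of the table.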

The stage-down convention offered the best compromise of power and accuracy. However, missing data had a disproportionate effect in preventing the accurate identification of rarer, highly prognostic early stage disease. Staging down reduced the proportion of Stage 1 cases surviving to 5 years by 8 percentage points (from 48% to 40%); accurate surgical and pathological reporting is therefore needed to establish prognosis following surgery for early gastric cancer.
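The counts in the table can be checked against the figures quoted in the text with a few lines of arithmetic. The sketch below only re-adds the published numbers; the variable names are hypothetical, and the percentages are the rounded values from the table.

```python
# (number of cases, % surviving to 5 years) by convention and stage,
# transcribed from the table above.
table = {
    "down": {1: (143, 40), 2: (213, 30), 3: (667, 11), 4: (402, 3)},
    "true": {1: (79, 48), 2: (136, 29), 3: (639, 11), 4: (402, 3)},
    "up":   {1: (79, 48), 2: (139, 29), 3: (758, 14), 4: (449, 4)},
}

COHORT = 1491              # resected gastric cancer cases
EXCLUDED = 66              # intent to cure or palliate could not be evaluated
UNSTAGED_CURATIVE = 122    # curative cases without adequate pathology
INCOMPLETE_SURGICAL = 113  # incomplete surgical data (47 + 66)

# Total cases assigned to stages 1-4 under each convention.
totals = {
    convention: sum(count for count, _ in stages.values())
    for convention, stages in table.items()
}

# Staging down or up places everyone except the excluded cases.
staged_all_but_excluded = (
    totals["down"] == totals["up"] == COHORT - EXCLUDED
)

# Under the true convention the unstaged groups account for the remainder.
true_accounts_for_cohort = (
    totals["true"] + UNSTAGED_CURATIVE + INCOMPLETE_SURGICAL == COHORT
)

# Curatively treated cases (stages 1-3) under the stage-up convention.
curative_total = sum(table["up"][s][0] for s in (1, 2, 3))

# Effect of staging down on Stage 1.
extra_stage1_cases = table["down"][1][0] - table["true"][1][0]
survival_drop_points = table["true"][1][1] - table["down"][1][1]
```

The stage totals come to 1425 for the down and up conventions and 1256 for the true convention, the curative total of 976 matches the denominator quoted for unstaged cases, and staging down adds 64 cases to Stage 1 while lowering its 5-year survival by 8 percentage points.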