Improved Use of Continuous Data - Statistical Modeling instead of Categorization

Willi Sauerbrei, Institute of Medical Biometry and Informatics, University Medical Center Freiburg, Germany
Patrick Royston, MRC Clinical Trials Unit, London, UK

Post on 11-Jan-2016

Page 1:

Improved Use of Continuous Data - Statistical Modeling instead of Categorization

Willi Sauerbrei, Institute of Medical Biometry and Informatics, University Medical Center Freiburg, Germany

Patrick Royston, MRC Clinical Trials Unit, London, UK

Page 2:

Qiao et al, BJC June 2005, 137-143

What is the evidence for this statement?

Page 3:

Study (first report on Rad51 in NSCLC)

340 NSCLC patients, median follow-up 34 months
Immunohistochemistry (IHC)
Proportion of positively stained tumor cells (positive-cell index, PCI)

PCI is a continuous variable, but 'an optimal cutoff point of marker index was determined that allowed best separation ... for prognosis'

IHC scores ≤ 10%: low-level expression (70% of patients)
IHC scores > 10%: high-level expression (30% of patients)

Page 4:

Overall population

RR (95%CI): 1.93 (1.44-2.59)

multivariate analysis adjusting for N Status, Stage, Differentiation

Is such a large effect believable?

Dangers of using optimal cutpoints ... JNCI 1994

Page 5:

Contents

• Categorisation or determination of functional form

• Problems of optimal cutpoint approach

• Fractional polynomials

• Prognostic markers – current situation

Page 6:

Continuous marker: categorisation or determination of functional form?

a) Step function (categorical analysis)
• Loss of information
• How many cutpoints?
• Which cutpoints?
• Bias introduced by outcome-dependent choice

b) Linear function
• May be wrong functional form
• Misspecification of functional form leads to wrong conclusions

c) Non-linear function
• Fractional polynomials
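The "loss of information" point under a) can be made concrete with a small simulation. This is only an illustrative sketch, not part of the talk. It assumes a normally distributed marker with a linear effect on a continuous outcome, and the names `pearson` and `attenuation` are invented for the example. For such a marker, a split at the median is expected to retain a fraction of about sqrt(2/pi), roughly 0.8, of the correlation with the outcome.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def attenuation(n=20000, slope=0.5, seed=1):
    """Ratio of corr(y, dichotomised x) to corr(y, continuous x)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]          # simulated marker
    y = [slope * xi + rng.gauss(0.0, 1.0) for xi in x]   # linear effect plus noise
    median = sorted(x)[n // 2]
    x_high = [1.0 if xi > median else 0.0 for xi in x]   # median split
    return pearson(x_high, y) / pearson(x, y)

ratio = attenuation()
print(f"correlation retained after median split: {ratio:.2f}")
```

Squaring the retained correlation gives an efficiency of roughly two thirds, so under these assumptions the median split behaves much like discarding a third of the patients.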

Page 7:

Example 1

Freiburg DNA study in breast cancer patients

N = 266, median follow-up 82 months
115 events for event-free survival time

Prognostic value of SPF

Page 8:

SPF in Freiburg DNA study, N+ patients

Searching for optimal cutpoint

Page 9:

Problems of the 'optimal' cutpoint

• Multiple testing increases Type I error (~ 40% instead of 5%)

• p-value correction is possible
  SPF (N+ patients): p-value 0.007, corrected p-value 0.123

• Size of effect overestimated

• Different cutpoints in different studies
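The Type I error inflation and the p-value correction can both be sketched in a few lines. The following is a hedged illustration, not the analysis behind the slide. It simulates a marker with no effect on a continuous outcome, searches every cutpoint in the central 80% of the marker distribution, and uses a z-test with known variance instead of a logrank test. The function `corrected_p` uses the approximation for a 10%-90% search range as recalled from Altman et al (1994); treat the constants as an assumption and check them against the paper. All function names are invented for the example.

```python
import math
import random

def min_p_over_cutpoints(x, y, lo=0.1, hi=0.9):
    """Smallest two-sided p-value over all cutpoints in the central part of x.

    Each cutpoint splits the sample into a 'low' and a 'high' group, which are
    compared with a z-test (outcome variance assumed known and equal to 1).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ys = [y[i] for i in order]              # outcomes sorted by marker value
    total = sum(ys)
    best_p = 1.0
    running = 0.0
    for k in range(1, n):                   # k = size of the 'low' group
        running += ys[k - 1]
        if k < lo * n or k > hi * n:
            continue
        mean_low = running / k
        mean_high = (total - running) / (n - k)
        z = (mean_high - mean_low) / math.sqrt(1.0 / k + 1.0 / (n - k))
        p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
        best_p = min(best_p, p)
    return best_p

def type_one_error(n=200, n_sim=1000, alpha=0.05, seed=1):
    """Share of null data sets in which the 'optimal' cutpoint is significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x = [rng.random() for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]   # outcome unrelated to x
        if min_p_over_cutpoints(x, y) < alpha:
            hits += 1
    return hits / n_sim

def corrected_p(p_min):
    """Approximate corrected p-value for a cutpoint search over the central 80%.

    Constants assumed from Altman et al (1994); intended for small p_min only.
    """
    return -1.63 * p_min * (1.0 + 2.35 * math.log(p_min))

if __name__ == "__main__":
    print(f"Type I error with 'optimal' cutpoint: {type_one_error():.2f}")
    print(f"corrected p for p_min = 0.007: {corrected_p(0.007):.3f}")
```

Under these assumptions the simulated error rate comes out far above the nominal 5%, and the correction applied to p = 0.007 gives about 0.12, close to the corrected value of 0.123 quoted for SPF.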

Page 10:

SPF cutpoints used in the literature (Altman et al 1994)

Cutpoint   Reference                Method
2.6        Dressler et al 1988      median
3.0        Fisher et al 1991        median
4.0        Hatschek et al 1990      1)
5.0        Arnerlöv et al 1990      not given
6.0        Hatschek et al 1989      median
6.7        Clark et al 1989         'optimal'
7.0        Baak et al 1991          not given
7.1        O'Reilly et al 1990b     median
7.3        Ewers et al 1992         median
7.5        Sigurdsson et al 1990    median
8.0        Kute et al 1990          median
9.0        Witzig et al 1993        median
10.0       O'Reilly et al 1990a     'optimal'
10.3       Dressler et al 1988      median
12.0       Sigurdsson et al 1990    'optimal'
12.3       Witzig et al 1993        2)
12.5       Muss et al 1989          median
14.0       Joensuu et al 1990       'optimal'
15.0       Joensuu et al 1991       'optimal'

1) Three groups of approximately equal size
2) Upper third of SPF distribution

'Optimal' cutpoint analysis: a serious problem

Page 11:

Continuous factor: categorisation or determination of functional form?

a) Step function (categorical analysis)
• Loss of information
• How many cutpoints?
• Which cutpoints?
• Bias introduced by outcome-dependent choice

b) Linear function
• May be wrong functional form
• Misspecification of functional form leads to wrong conclusions

c) Non-linear function
• Fractional polynomials

Page 12:

Fractional polynomial models

• Conventional polynomial of degree 2 with powers $p = (1, 2)$ is defined as

  $\beta_1 X^{1} + \beta_2 X^{2}$

• Fractional polynomial of degree 2 with powers $p = (p_1, p_2)$ is defined as

  $\mathrm{FP2} = \beta_1 X^{p_1} + \beta_2 X^{p_2}$

• Powers $p$ are taken from a predefined set $S = \{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}$
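A minimal sketch of how an FP2 function is evaluated, assuming the usual power set S = {-2, -1, -0.5, 0, 0.5, 1, 2, 3}. Two conventions used here are not stated on the slide and come from Royston and Altman (1994): power 0 stands for log(X), and for a repeated power (p1 = p2) the second term is multiplied by log(X). The names `fp_term`, `fp2` and `FP2_PAIRS` are invented for the example.

```python
import math
from itertools import combinations_with_replacement

# Assumed standard power set; power 0 denotes log(x) by convention.
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_term(x, p):
    """One FP transformation of x > 0: x**p, with p = 0 meaning log(x)."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    return math.log(x) if p == 0 else x ** p

def fp2(x, p1, p2, b1, b2, b0=0.0):
    """FP2 function b0 + b1 * x^p1 + b2 * x^p2.

    For a repeated power (p1 == p2) the second term is x^p1 * log(x).
    """
    t1 = fp_term(x, p1)
    t2 = t1 * math.log(x) if p1 == p2 else fp_term(x, p2)
    return b0 + b1 * t1 + b2 * t2

# All candidate FP2 power pairs, repeated powers included.
FP2_PAIRS = list(combinations_with_replacement(POWERS, 2))

print(len(POWERS), "FP1 models,", len(FP2_PAIRS), "FP2 models")
print(fp2(2.0, -2, 1, b1=1.0, b2=0.5))
```

Model selection then amounts to fitting each candidate power pair and comparing the fits. With only 8 FP1 and 36 FP2 candidates the family stays small while still covering a wide range of curve shapes.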

Page 13:

Some examples of fractional polynomial curves

Royston P, Altman DG (1994) Applied Statistics 43: 429-467.

Sauerbrei W, Royston P, et al (1999) British Journal of Cancer 79:1752-60.

[Figure: four panels of FP2 curves with powers (-2, 1), (-2, 2), (-2, -2) and (-2, -1)]

Page 14:

Example 2

German Breast Cancer Study Group - 2

n = 686 patients, median follow-up 5 years
299 events for event-free survival time (EFS)

Prognostic markers: 5 continuous, 1 ordinal, 1 binary factor

Page 15:

Continuous factors: different results assuming different functions

Example: Prognostic effect of age

P-values under the three assumed functions: 0.9, 0.2, 0.001

Page 16:

FP approach can also be used to investigate predictive factors

Page 17:

Example 3

RCT in metastatic renal carcinoma

N = 347; 322 deaths

[Figure: survival curves, proportion alive (0 to 1) against follow-up (0 to 72 months), for (1) MPA and (2) Interferon]

At risk 1: 175 55 22 11 3 2 1

At risk 2: 172 73 36 20 8 5 1

Page 18:

MRCRCC, Lancet 1999

Overall conclusion: Interferon is better (p<0.01)

Is the treatment effect similar in all patients?

Page 19:

Treatment-covariate interaction

Treatment effect function for WCC

[Figure: treatment effect (log relative hazard, axis from -4 to 2) against white cell count (5 to 20), original data]

Only a result of complex (mis-)modelling?

Page 20:

Check result of FP modelling

Treatment effect in subgroups defined by WCC

[Figure: survival curves, proportion alive (0 to 1) against follow-up (0 to 72 months), in four panels: Group I, Group II, Group III, Group IV]

HR (Interferon to MPA), with 95% CI:
overall: 0.75 (0.60-0.93)
I: 0.53 (0.34-0.83)
II: 0.69 (0.44-1.07)
III: 0.89 (0.57-1.37)
IV: 1.32 (0.85-2.05)
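To compare the subgroup results with a treatment effect function drawn on the log relative hazard scale, the hazard ratios can be put on the log scale. The sketch below is only arithmetic on the numbers quoted for this slide. It assumes the intervals are symmetric 95% intervals on the log scale (normal quantile 1.96), and the names `ESTIMATES` and `log_hr_and_se` are invented for the example.

```python
import math

# Hazard ratios (Interferon to MPA) with 95% CIs, as quoted on the slide.
ESTIMATES = {
    "overall": (0.75, 0.60, 0.93),
    "I": (0.53, 0.34, 0.83),
    "II": (0.69, 0.44, 1.07),
    "III": (0.89, 0.57, 1.37),
    "IV": (1.32, 0.85, 2.05),
}

def log_hr_and_se(hr, lower, upper, z=1.96):
    """Log hazard ratio and its standard error recovered from a 95% CI.

    Assumes the interval was computed as exp(log HR +/- z * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return math.log(hr), se

for name, values in ESTIMATES.items():
    log_hr, se = log_hr_and_se(*values)
    print(f"{name:>7}: log HR = {log_hr:+.2f}, SE = {se:.2f}")
```

The log hazard ratios rise steadily from group I to group IV, which is the pattern the subgroup analysis is meant to check against the fitted treatment effect function.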

Page 21:

Prognostic markers: current situation

• The number of cancer prognostic markers validated as clinically useful is pitifully small

• Evidence-based assessment is required, but the collection of studies is difficult to interpret due to inconsistencies in conclusions or a lack of comparability

• Reasons: small underpowered studies, poor study design, varying and sometimes inappropriate statistical analyses, and differences in assay methods or endpoint definitions

• More complete and transparent reporting is needed to distinguish carefully designed and analyzed studies from haphazardly designed and over-analyzed studies

Source: McShane LM, Altman DG, Sauerbrei W. Identification of clinically useful cancer prognostic factors: What are we missing? Editorial, JNCI, July 2005

Page 22:

We expect some improvements by REMARK guidelines, published simultaneously in 5 journals, August 2005

Page 23:

Conclusions

• Cutpoint approaches have several problems
• Analyses are required in which continuous markers are kept continuous
• More power by using all information from continuous markers
• FPs are well-suited to the task
• FP analyses may detect important effects which may be missed by standard methodology

Page 24:

• Substantial improvement in research in prognostic and predictive markers is required; similar problems in
  - risk factors in epidemiology
  - analysis of genomic data
  - gene-environmental interactions
  - …

• Improvement by more collaboration
  - within disciplines
  - between disciplines

Page 25:

References

Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using "optimal" cutpoints in the evaluation of prognostic factors. Journal of the National Cancer Institute 1994; 86:829-835.

McShane LM, Altman DG, Sauerbrei W. Identification of clinically useful cancer prognostic factors: What are we missing? (Editorial). Journal of the National Cancer Institute 2005.

McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, Clark GM for the Statistics Subcommittee of the NCI-EORTC Working Group on Cancer Diagnostics. REporting recommendations for tumor MARKer prognostic studies (REMARK). Simultaneous publication in Journal of Clinical Oncology, Nature Clinical Practice Oncology, Journal of the National Cancer Institute, European Journal of Cancer, British Journal of Cancer, 2005.

Pfisterer J, Kommoss F, Sauerbrei W, Renz H, du Bois A, Kiechle-Schwarz M, Pfleiderer A. Cellular DNA content and survival in advanced ovarian carcinoma. Cancer 1994; 74:2509-2515.

Qiao G-B, Wu Y-L, Yang X-N et al. High-level expression of Rad51 is an independent prognostic marker of survival in non-small-cell lung cancer patients. British Journal of Cancer 2005; 93:137-143.

Rosenberg et al. Quantifying epidemiologic risk factors using non-parametric regression: Model selection remains the greatest challenge. Stat Med 2003; 22:3369-3381.

Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling (with discussion). Applied Statistics 1994; 43:429-467.

Royston P, Sauerbrei W, Ritchie A. Is treatment with interferon-alpha effective in all patients with metastatic renal carcinoma? A new approach to the investigation of interactions. British Journal of Cancer 2004; 90:794-799.

Sauerbrei W, Meier-Hirmer C, Benner A, Royston P. Multivariable regression model building by using fractional polynomials: description of SAS, STATA and R programs. Computational Statistics and Data Analysis 2005, to appear.

Sauerbrei W, Royston P. Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society A 1999; 162:71-94.

Sauerbrei W, Royston P, Bojar H, Schmoor C, Schumacher M and the German Breast Cancer Study Group (GBSG). Modelling the effects of standard prognostic factors in node positive breast cancer. British Journal of Cancer 1999; 79:1752-1760.