kernel estimation as alternative to (semi)parametric

29
Introduction Kernel estimates Simulation study Real data Conclusion Kernel estimation as alternative to (semi)parametric models in survival analysis Iveta Selingerová, Stanislav Katina, Ivana Horová Department of Mathematics and Statistics Faculty of Science, Masaryk University Brno ROBUST 2016 11 - 16 September 2016

Upload: others

Post on 10-Dec-2021

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Kernel estimation as alternative to(semi)parametric models in survival analysis

Iveta Selingerová, Stanislav Katina, Ivana Horová

Department of Mathematics and StatisticsFaculty of Science, Masaryk University

Brno

ROBUST 201611 - 16 September 2016

Page 2: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Outline1 Introduction

Survival analysisKernel smoothing

2 Modeling hazard function using kernel methodTwo types of kernel estimates of conditional hazard functionCox proportional hazard modelBandwidth selection

3 Simulation studyWeibull modelLognormal modelCox modelGeneral hazard function

4 Real dataTriple negative breast cancer

5 Conclusion

Page 3: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Survival analysis

Survival time T with distribution function FRandom right censoringCensoring time C with distribution function GObserved time Y = min(T ,C) with distribution function L,censoring indicator δ = I(T ≤ C)Survival function F (t) = P(T ≥ t) = 1− F (t)Hazard function

λ(t) = lim∆t→0+

P(t ≤ T < t + ∆t|T ≥ t)∆t = f (t)

F (t)

Cumulative hazard function Λ(t) =∫ t

0 λ(u)du

Page 4: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Properties

Survival function of observed time L(t) = F (t)G(t)Joint density of (Y , δ) is l(·, ·) and is composed of two”subdensities” l(·, 0) and l(·, 1)Subdensity of the event time r(t) = l(·, 1) = f (t)G(t)Hazard function

λ(t) = r(t)L(t)

Page 5: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Nonparametric estimation

Kaplan-Meier estimator of the survival function

F KM(t) =

1 t ≤ Y(1),∏i :Y(i)<t

(n−i

n−i+1

)δ(i) t > Y(1).

Nelson-Aalen estimator of the cumulative hazard

ΛNA(t) =

0 t ≤ Y(1),∑i :Y(i)<t

δ(i)n−i+1 t > Y(1).

Smoothing techniques for estimation of hazard function λ(t)

Page 6: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Survival models

X = (X1,X2, . . . ,Xp)T vector of p covariatesConditional hazard function

λ(t|x) = lim∆t→0+

P(t ≤ T < t + ∆t|T ≥ t,X = x)∆t

Parametric survival modelAssumption of distribution of survival timeAccelerated failure time (AFT) models, parametricproportional hazard (PH) models

Semiparametric survival model – Cox modelExponential dependence on covariatesAssumption proportionality of hazard ratioNonparametric estimate of baseline hazard

Nonparametric survival modelNo assumption of distribution or dependence on covariatesSmoothing techniques

Page 7: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Kernel smoothing

The value of unknown function at a point is estimated as alocal weighted average of known observations in theneighbourhood of this pointKernel K – real function with support on [−1, 1] satisfying

∫ 1

−1x jK (x)dx =

1 j = 0,0 0 < j < k,bk 6= 0 j = k.

Epanechnikov kernel K (x) = −34(x2 − 1)I ([−1, 1])

Page 8: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Kernel smoothing

The bandwidth sequence {h(n)}

limn→∞

h(n) = 0, limn→∞

nh(n) =∞

Statistical properties of an estimator ϑ(x) of a function ϑ(x)Local error of an estimator

MSE(ϑ(x)

)= E

(ϑ(x)− ϑ(x)

)2= Var

(ϑ(x)

)+Bias2

(ϑ(x)

)Global error of an estimator

MISE(ϑ)

=∫

MSE(ϑ(x)

)ω(x)dx

Page 9: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Bandwidths

0 2 4 6 8 10 12 14 16 18 20

0

2

4

6

8

10

0

0.05

0.1

0.15

0.2

0.25

0.3

t

h

λ(t

)

Page 10: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Two types of kernel conditional hazard estimates

External estimatorL(t|x) = F (t|x)G(t|x)λ(t|x) = f (t|x)

F (t|x) = f (t|x)G(t|x)L(t|x) = r(t|x)

L(t|x)

λE (t|x) = r(t|x)L(t|x)

=1ht

∑ni=1 wi (x)K

(t−Yi

ht

)δi∑n

i=1 wi (x)W(

Yi−tht

)Nadaraya–Watson weights

wi (x) =K(

x−Xihx

)∑n

j=1 K(

x−Xjhx

) , i = 1, . . . , n

Page 11: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Two types of kernel conditional hazard estimates

Internal estimatorBeran estimator of conditional cumulative hazard function

Λ(t|x) =

0 t ≤ Y(1)∑i :Y(i)<t

δ(i)w(i)(x)1−∑i−1

j=1 w(j)(x)t > Y(1)

λI(t|x) = 1ht

∫K( t − u

ht

)dΛ(u|x)

= 1ht

n∑i=1

K( t − Y(i)

ht

)δ(i)w(i)(x)

1−∑i−1

j=1 w(j)(x)

ht influences smoothing in direction of the timehx influences smoothing in direction of the covariate

Page 12: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Cox proportional hazards model

Cox modelλ(t|x) = λ0(t)eβx

Properties and assumptions:Hazard ratios are constant over time, i.e.

λ(t|x1)λ(t|x2) = eβ(x1−x2)

The dependence on covariate is exponentialNo distribution of the survival time is assumed

Maximum partial likelihood estimator of the model parameterβ

Page 13: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Cox proportional hazards model

Breslow estimator of baseline hazard function

Λ0(t) =∑

i :t(i)<tλ0(t(i)) =

∑i :t(i)<t

di∑j∈Ri eβXj

Kernel estimator of baseline hazard function

λ0(t) = 1h

∫K( t − u

h

)dΛ0(u)

= 1h

n∑i=1

K( t − Yi

h

)δi∑n

j=1 I(Yj ≥ Yi )eβXj

Page 14: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Bandwidth selection

Asymptotically optimal bandwidthsMinimize MISETheoretical value of bandwidthsNeed to know distribution of survival time, observed time, orcovariate

Page 15: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Bandwidth selection

Cross-validation method

CV (hx , ht) = 1n

n∑i=1

∫λ3(t|Xi)e

−∫ t

0λ(u|Xi )dudt

− 2n

n∑i=1

λ2−i(Yi |Xi)

L(Yi |Xi)e−∫ Yi

0λ−i (u|Xi )du

δi

(hx,CV , ht,CV ) = arg min(hx ,ht ) CV (hx , ht)

0 2 4 6 8 10 12 14 16 18 20

0

24

6

810

−0.01

0

0.01

0.02

0.03

0.04

0.05

0.06

ht

hx

CV

(hx,h

t)

Page 16: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Bandwidth selection

Maximum likelihood method

ML(hx , ht) =n∏

i=1

λδi−i(Yi |Xi)F−i(Yi |Xi)

(hx,ML, ht,ML) = arg max(hx ,ht ) ML(hx , ht)

0 2 4 6 8 10 12 14 16 18 20

0

2

4

6

8

100

1

2

3

4

5

6

7

8

x 10−207

ht

hx

ML(h

x,h

t)

Page 17: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Simulation study

200 triples of observed data (Xi ,Yi , δi )100 samplesSpecified conditional hazard function λ(t|x)Covariate Xi ∼ U(0, 10)Survival time Ti obtained as Ti = F−1(Ui |Xi ), whereUi ∼ U(0, 1) and F (t|x) = 1− e−

∫ t0 λ(u|x)du

Censoring time Ci ∼ logN (µ; 0.22), where µ influencescensoring rate (30%, 60% or 90%)Observed time Yi = min(Ti ,Ci )Censoring indicator δi = 1 for Yi = Ti and δi = 0 for otherError between true and estimated conditional hazard functionmeasured by

ASE(λ) = 1n

n∑i=1

(λ(Yi |Xi)− λ(Yi |Xi)

)2

Page 18: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Weibull model

λ(t|x) = νµtµ−1eβx

ν = 0.018, µ = 1.3,β = 0.2The interpretationusing PH or AFTmodel 0

5

10

15

20

0

2

4

6

8

10

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

t

x

Page 19: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Lognormal model

λ(t|x) =1σt φ( log t−βx

σ )Φ(βx−log t

σ )σ = 0.5, β = 1

3The interpretationusing AFT modelln Ti = βXi + εi ,εi ∼ N (0, σ2) 0

5

10

15

20

0

2

4

6

8

10

0

0.5

1

1.5

2

t

x

Page 20: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Cox model

λ(t|x) =1

15e(1.5− cos

( t1.7 + 1

))eβx

β = 0.2λ0(t) =

115e(1.5− cos

( t1.7 + 1

))0

5

10

15

20

0

2

4

6

8

10

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

tx

Page 21: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

General hazard function

λ(t|x) =1

40000

(t (t − 25)2 + 200

× (sin(0.7x − 5) + 2)

0

5

10

15

20

02

46

8100

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

tx

Page 22: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Results of simulation study

If assumptions of the (semi)parametric models are satisfied,these models have slightly better resultsThis difference between models are diminished with increasingcensoring rateThe differences between kernel estimates and parametricmodels are more pronounced than between kernel estimatesand Cox model. This is due to stronger assumptions of theparametric models.The simulation of general conditional hazard function(violated assumptions of the (semi)parametric models) showskernel estimates as preferableIn practice, time distribution and dependence on covariatesare not known ⇒ general conditional hazard is most frequentsituation

Page 23: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Triple negative breast cancer

408 patients diagnosed and/or treated at Masaryk MemorialCancer Institute in Brno in the period 2004–2011100 deaths (82 deaths in the first 4 years from diagnosis ∼80% censoring rate)The age at diagnosis varies between 25 and 88 years (theaverage 55 years)Is survival time affected by patient age? (Testing ofhypothesis that β = 0 versus β 6= 0 )

0 20 40 60 80 100 120 1400

50

100

150

200

250

300

350

400

Time (months)

Patient

death

censored

Patients’ times Death rate

Page 24: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Triple negative breast cancer

010

2030

4050

20

40

60

80

1000

0.002

0.004

0.006

0.008

0.01

0.012

Time (months)Age (years)

Weibull model p-value=0.129β = 0.0128, ν = 0.0005, µ = 1.41PH interpretation: increase of age about one yearincreases risk of death 1.0128 timesAFT interpretation: increase of age about one yearshortens survival time 0.9910 times

010

2030

4050

20

40

60

80

1000

0.002

0.004

0.006

0.008

0.01

0.012

Time (months)Age (years)

Lognormal model p-value=0.045β = −0.0126, σ = 1.290, µ = 5.616AFT interpretation: increase of age about one yearshortens survival time 0.9875 times

010

2030

4050

20

40

60

80

1000

0.002

0.004

0.006

0.008

0.01

0.012

Time (months)Age (years)

Cox model p-value=0.127β = 0.0126PH interpretation: increase of age about one yearincreases risk of death by 1.0127 times

Page 25: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Triple negative breast cancer

010

2030

4050

20

40

60

80

1000

0.002

0.004

0.006

0.008

0.01

0.012

Time (months)Age (years)

External kernel estimate

010

2030

4050

20

40

60

80

1000

0.002

0.004

0.006

0.008

0.01

0.012

Time (months)Age (years)

Internal kernel estimate

High risk of death for patients over 70 years and increased riskof death for patients up to 40 yearsSlightly different shape of hazard function for various agegroupsIt offers possibility to divide patients into groups with similarrisk of death (up to 40, 41-70, over 70 years)

Page 26: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Conclusion

In practice, the theoretic shape of the conditional hazardfunction or distribution of the survival time is not known,assumptions of PH model are often violated – using kernelestimates is therefore very helpful alternativeThe kernel estimates of the conditional hazard function can beuseful for verifying assumptions of the (semi)parametricmodels and for finding thresholds of continuous variablesThe kernel estimates are able to capture any changes in thehazard function in direction of covariate and timeThe kernel methods are able to estimate different shapes ofthe hazard function without any constrains and smooth themoutThe kernel estimation produces functions more useful forpresentation

Page 27: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Conclusion

Advantage DisadvantageKernel Flexibility Complexity of calculating

estimates Visualization Estimate influencedNo assumptions by choice of bandwidthFinding groupswith similar risk

Parametric Easy interpretationInadequate results can

be obtained whereassumptions of models

are violated

model Quality of estimate forknown time distribution

Semiparametric Easy interpretationmodel No distribution

assumption

Page 28: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

References

I. Selingerová, H. Doleželová, I. Horová, S. Katina, J. Zelinka, Survival ofPatients with Primary Brain Tumors: Comparison of Two StatisticalApproaches, PloS one 11(2), 2016.T. R. Fleming and D. P. Harrington, Counting processes and survivalanalysis, John Wiley & Sons, 2011.I. Horová, J. Koláček, J. Zelinka, Kernel Smoothing in MATLAB: Theoryand Practice of Kernel Smoothing, World Scientific Publishing, 2012.L. Spierdijk, Nonparametric Conditional Hazard Rate Estimation: A LocalLinear Approach, Computational Statistics & Data Analysis 52, 2008,2419–2434.M. Svoboda et al., Triple-Negative Breast Cancer: Analysis of PatientsDiagnosed and/or Treated at the Masaryk Memorial Cancer Institutebetween 2004–2009 (in Czech), Clinical Oncology 25(3), 2012, 188–198.

Page 29: Kernel estimation as alternative to (semi)parametric

Introduction Kernel estimates Simulation study Real data Conclusion

Thank you for yourattention!