
Outlier Detection in Survival Analysis

João Diogo Pinto

Thesis to obtain the Master of Science Degree in

Electrical and Computer Engineering

Supervisors: Prof. Alexandra Sofia Martins de Carvalho

Prof. Susana de Almeida Mendes Vinga Martins

Examination Committee

Chairperson: Prof. Nuno Cavaco Gomes Horta

Supervisor: Prof. Susana de Almeida Mendes Vinga Martins

Members of the Committee: Nuno Luís Barbosa Morais

May 2015

To my parents


Acknowledgments

Firstly, I would like to thank my supervisors, Prof. Alexandra Carvalho and Prof. Susana Vinga, for their continuous attention and support; they gave me the freedom to do research and kept me on track when needed. My gratitude also goes to the post-doctoral researchers and PhD students at CSI/IDMEC and IT for their support and useful feedback. Special thanks go to André Veríssimo, whose help was essential to produce the simulation results. I would like to give my sincere thanks to the medical team from Hospital de Santa Maria and Instituto de Medicina Molecular for their support and availability, in particular to Prof. Luís Costa and Dr. Irina Duarte. I would also like to express my gratitude to Fundação para a Ciência e a Tecnologia, which supported this work under the CancerSys project (EXPL/EMS-SIS/1954/2013).

I would like to thank my dear girlfriend Luegi for all her patience and understanding, and for being such a great source of inspiration. I would also like to give special thanks to my goddaughter Benedita, my sister Catarina, and her husband Pedro; even abroad, you are a daily inspiration to me. My sincere thanks go to my grandparents, who were always interested and supportive. Special thanks go to my parents; this thesis is dedicated to them.


Abstract

The goal of outlier detection methods is to identify observations that are dissimilar to or inconsistent with the rest of the data. What constitutes an outlier is subjective and commonly depends on the application. Outlier detection is a fundamental task in many fields, from financial fraud detection and computer network intrusion detection to the diagnosis of clinical diseases. Outliers can have an extreme influence on data analysis, so their presence must be taken into account. Additionally, outliers may be interesting observations in themselves: they can provide insights about certain structures in the data or about particular events that occurred in the sample. In this thesis we propose three novel methods for outlier detection in a survival context. The proposed methods are model-based and rely on the concordance c-index (Harrell et al., 1982). The first method, One Step Deletion (OSD), relies on a backward search for the subset of the k most outlying observations. The second method, Bootstrap Hypothesis Testing (BHT), is a stochastic method that obtains several measures of the concordance c-index using the bootstrap resampling scheme (Efron, 1979). The observation under test is removed from the original dataset and the concordance c-index is bootstrapped on the remaining data; using the resulting histogram of concordances, a hypothesis test on the improvement of concordance is made. The stronger the tendency of the concordance to improve when the observation is removed from the data, the more outlying the observation under test is considered to be. The third method, Dual Bootstrap Hypothesis Testing (DBHT), is an extension of BHT in which two different bootstrap schemes are used: one where the observation being tested is never present in the bootstrap samples, and another where the observation under test is present at least once in every sample. The more significant the difference between the two generated histograms, the more outlying we consider the observation to be. The last two methods (BHT and DBHT) are single-step methods, meaning they output an outlying score for each observation, while OSD just returns the set of the k most outlying observations, with k given as a parameter. In the results chapter, the merits of the proposed methods are assessed through a comparative analysis with several existing methods. Performance is first assessed on a set of simulated scenarios, and the methods are then applied to real clinical datasets. DBHT outperformed the remaining methods in most of the simulated scenarios. On the real clinical datasets, the predictive ability of the Cox regression improved when a certain level of outliers was trimmed from the fit.

Keywords: outlier detection, survival analysis, model-based outlier detection, Cox proportional hazards,

bootstrap, robust estimation, concordance c-index.


Summary

Outlier detection methods aim to identify individuals that show inconsistencies or extreme differences with respect to the other individuals in a sample; these are usually called outliers. The definition of what constitutes an outlier is subjective and normally depends on the application at hand. Outlier detection methods are applied in many areas: detection of financial fraud, detection of intrusions in computer networks, and the diagnosis of patients. The presence of outliers in a sample can influence the analysis disproportionately, so their presence normally has to be taken into account.

The work presented here proposes three new methods for outlier detection in survival data. The three methods use a performance metric specific to survival analysis: the concordance c-index introduced by Harrell et al. (1982). The first method, One Step Deletion (OSD), performs a sequential search for the subset that maximizes concordance; the k eliminated observations are considered the most outlying in the sample. The second proposed method, Bootstrap Hypothesis Testing (BHT), is based on the bootstrap resampling scheme (Efron, 1979). Several bootstrap samples are generated from the dataset with the observation under test excluded, and from them the histogram of the variation in concordance relative to the concordance on the original dataset is computed; the more the histogram lies above zero, the more outlying the observation under test is considered to be. The third method, Dual Bootstraps Hypothesis Testing (DBHT), is an extension of BHT in which, instead of a single bootstrap, two antagonistic versions of the bootstrap are used: in the first version, the observation under test is absent from all generated samples, whereas in the dual version the observation is included once in every sample. The merit of the developed methods is evaluated on a set of simulated data, where DBHT showed superior performance in most scenarios. On the real data, removing a certain level of outliers was shown to increase the predictive performance of the Cox model.

Keywords: outlier detection, survival analysis, model-based outlier detection, Cox proportional hazards, concordance c-index, bootstrap, robust estimation.


Notation

B number of generated bootstrap samples

BHT Bootstrap Hypothesis Testing (proposed method)

DBHT Dual Bootstraps Hypothesis testing (proposed method)

$\delta_i$ event indicator for subject i

DEV deviance residuals

DFB DFBETAS

h(t) hazard function

H(t) cumulative hazard function

LD Likelihood Displacement statistic

MART Martingale residuals

OSD One step Deletion (proposed method)

S(t) survival function

T survival time

$T_i$ survival time for subject i

$T^C_i$ the follow-up time for individual i

$X_i$ covariates vector for subject i


Contents

1 Introduction

2 Survival Analysis
2.1 Censoring
2.2 Continuous-time Survival Function S(t)
2.3 Continuous-time Hazard Function h(t)
2.4 Discrete-time Survival Function
2.5 Discrete-time Hazard Function
2.6 Kaplan-Meier Estimator
2.7 Log-Rank Test
2.8 Cox proportional hazards
2.9 Other Survival Models
2.10 Performance Metrics for Survival Models
2.10.1 Somers' D
2.10.2 Harrell's concordance c index
2.10.3 Time-dependent ROC
2.11 Counting Processes Framework

3 Survival outlier detection and robust estimation

4 Proposed Methods
4.1 Motivation for the use of Bootstrapping
4.2 One Step Deletion
4.3 Bootstrap Hypothesis Test Outlier Detection
4.4 Dual Bootstraps Hypothesis Testing Outlier Detection

5 Results
5.1 Goals
5.2 Simulated Data
5.3 Worcester Heart Attack Study dataset
5.4 Bone Marrow Transplant dataset
5.5 CancerSys Dataset

6 Conclusions and Future Work


List of Tables

2.1 Example of data with censored observations ($\delta_i = 0$).
2.2 Example of the subject replication.
2.3 Example of coding time-dependent covariates.
3.1 Linear regression example: observations sorted by residuals.
4.1 Evolution of the OSD algorithm when applied to an example dataset.
4.2 Example of a BHT output, sorted by p values.
5.1 Outlier scenarios.
5.2 Average of TPR grouped by outlier scenarios.
5.3 Average of AUC grouped by outlier scenarios.
5.4 TPR by outlier scenario and outlier amount for the alternative methods.
5.5 Average TPR of the proposed methods grouped by outlier scenario and outlier amount k.
5.6 Average AUC of the alternative methods grouped by outlier scenario and outlier amount k.
5.7 Average AUC of proposed methods by outlier scenario and outlier amount k.
5.8 Average TPR of the alternative methods grouped by outlier scenario and level of censoring.
5.9 Average TPR of the proposed methods grouped by outlier scenario and level of censoring c.
5.10 Average AUC of the alternative methods grouped by outlier scenario and censoring amount c.
5.11 AUC of BHT and DBHT by scenario and censoring amount.
5.12 Average TPR of the alternative methods by outlier scenario and baseline hazard ($\lambda$, $\nu$).
5.13 Average TPR of the proposed methods by outlier scenario and baseline hazard ($\lambda$, $\nu$).
5.14 Average AUC of alternative methods by outlier scenario and baseline hazard type ($\lambda$, $\nu$).
5.15 Average AUC of proposed methods by outlier scenario and baseline hazard ($\lambda$, $\nu$).
5.16 Top-15 outliers detected by the methods on the WHAS100 dataset.
5.17 Cox model estimated with all WHAS observations.
5.18 WHAS: Cox estimation after 5% outlier trimming.
5.19 WHAS: Cox estimation after 10% outlier trimming.
5.20 Cox model on the WHAS data with 10% outlier trimming for alternative methods.
5.21 Cox model fit on the WHAS dataset with 10% outlier trimming, using the proposed methods.
5.22 Top-10% outliers detected by the methods on the BMT dataset.
5.23 Cox model estimation with all BMT data.
5.24 Cox model estimations with 5% outlier trimming using the alternative methods.
5.25 Cox model estimations with 5% outlier trimming using the proposed methods.
5.26 Cox model estimations with 10% outlier trimming using the alternative methods.
5.27 Cox model estimations with 10% outlier trimming using the proposed methods.
5.28 Top-15 outliers detected by the methods on the CSYS dataset.
5.29 Cox model fitted to all CSYS data.
5.30 Cox model fit with 5% outlier trimming using the alternative methods.
5.31 Cox model fit with 5% outlier trimming using the proposed methods.
5.32 Cox model with 10% outlier trimming for alternative methods.
5.33 Cox model with 10% outlier trimming for proposed methods.
5.34 Leave-one-out c-indexes for the BHT method.
5.35 Leave-one-out c-indexes for the DBHT procedure.
5.36 Leave-one-out c-indexes for the OSD procedure.
6.1 TPR of each method in the 12 outlier scenarios. $c = 0.2$; $k = 5$; $\lambda = 1$; $\nu = 1$.
6.2 AUC of each method in the 12 outlier scenarios. $c = 0.2$; $k = 5$; $\lambda = 1$; $\nu = 1$.
6.3 TPR of each method in the 12 outlier scenarios. $c = 0.2$; $k = 10$; $\lambda = 1$; $\nu = 1$.
6.4 AUC of each method in the 12 outlier scenarios. $c = 0.2$; $k = 10$; $\lambda = 1$; $\nu = 1$.
6.5 TPR of each method in the 12 outlier scenarios. $c = 0.3$; $k = 5$; $\lambda = 1$; $\nu = 1$.
6.6 AUC of each method in the 12 outlier scenarios. $c = 0.3$; $k = 5$; $\lambda = 1$; $\nu = 1$.
6.7 TPR of each method in the 12 outlier scenarios. $c = 0.3$; $k = 10$; $\lambda = 1$; $\nu = 1$.
6.8 AUC of each method in the 12 outlier scenarios. $c = 0.3$; $k = 10$; $\lambda = 1$; $\nu = 1$.
6.9 TPR of each method in the 12 outlier scenarios. $c = 0.2$; $k = 5$; $\lambda = 0.5$; $\nu = 1.5$.
6.10 AUC of each method in the 12 outlier scenarios. $c = 0.2$; $k = 5$; $\lambda = 0.5$; $\nu = 1.5$.
6.11 TPR of each method in the 12 outlier scenarios. $c = 0.2$; $k = 10$; $\lambda = 0.5$; $\nu = 1.5$.
6.12 AUC of each method in the 12 outlier scenarios. $c = 0.2$; $k = 10$; $\lambda = 0.5$; $\nu = 1.5$.














List of Figures

2.1 Example of left, right, and interval censoring.
2.2 The effect of changing the time scale on h(t).
2.3 Example of S(t) KM estimation.
2.4 Example of ROC(t).
3.1 Example of linear regression fit on a data set with outliers.
4.1 Bootstrapping a test statistic.
4.2 Bootlier multi-modal effect.
4.3 BHT: histograms for inlier and outlier.
4.4 DBHT: poison and antidote bootstraps for an outlier.
4.5 DBHT: poison and antidote bootstraps for an inlier.
5.1 A 2-D example of a general trend $\beta_G$ with examples of outlier sources.
5.2 Baseline hazards used in the simulation data.
5.3 Evolution of TPR with parameter B, DBHT in blue and BHT in red.
5.4 Evolution of AUC with parameter B, DBHT in blue and BHT in red.


Chapter 1

Introduction

Survival analysis is a body of statistical methods that aim to study time-to-event data. Its applications range from the social sciences and industrial reliability to clinical studies. One of its main innovations was the introduction of the proportional hazards model by Cox (1972), also known as Cox regression. The simplicity and semi-parametric nature of the estimation procedure made it one of the most used tools in survival analysis, and the model of choice when studying the relationship between explanatory variables and the survival time of individuals. Despite its popularity, the model has been shown to lack robustness: it has a breakdown point of zero, meaning that the estimation can be compromised by a single corrupt observation (Kalbfleisch and Prentice, 2011). To deal with this lack of robustness, survival-specific tools can be employed. Several of these come from Cox regression diagnostics, including martingale residuals and deviance residuals (Therneau et al., 1990); other methods are not specific to the Cox model, such as the likelihood displacement statistic (David Collett, 2003) and regression DFBETAS (Harrell, 2001). This thesis is concerned with algorithms able to identify the k most likely outlying observations. Those k observations could then be excluded prior to the estimation of a survival model, potentially increasing the robustness of the Cox fitting process, or of the fitting of other survival models such as the Buckley-James regression (Buckley and James, 1979), which is also known for its lack of robustness (Stare et al., 2000). Our approach to survival outlier detection is model-based, meaning that what is evaluated is neither the explanatory variables nor the survival outcomes of the subjects, but the relationship between them. The rationale is that an extreme value of the outcome can be regular given the explanatory variables and, conversely, an extreme value of a given explanatory variable can be regular when an extreme outcome is also observed. We thus trade assumptions about the distributions of covariates and outcomes for the assumption that the model is able to capture the relationship between covariates and the respective survival outcomes.

Contribution

We propose three new methods to perform outlier detection in survival data. The methods are implemented as an R package, which can be made available upon request. Some of the results presented in this thesis have already been published in an article that was nominated for the Best Paper Award at the international conference BIOSTEC - BIOINFORMATICS 2015 in Lisbon: João Diogo Pinto, Alexandra M. Carvalho and Susana Vinga, Outlier Detection in Survival Analysis based on the Concordance c-index, in Proceedings of BIOINFORMATICS, pages 72-82, 2015.

Thesis Outline

Chapter 2 starts with a literature review covering the basic concepts of survival analysis and their mathematical definitions; we then explore the Cox proportional hazards model, its features, and its estimation process, and give an overview of alternative survival models. Chapter 3 concerns outlier detection and robust estimation. Starting with a review of the general outlier detection literature, we then focus on outlier detection in a survival context, including Cox residual analysis and methods that aim to increase the robustness of Cox regression. In Chapter 4 we present our three proposed methods. In Chapter 5 we present the experimental results, where the different methods were tested on simulated data and on three real datasets from clinical studies.


Chapter 2

Survival Analysis

Survival analysis consists of a set of statistical and machine-learning procedures for the study of data where the outcome of interest is the time until a given event occurs. The event of interest depends on the application. In the medical field this event can be the death of a patient, organ rejection, or relapse or remission from disease; examples of such studies are (Crowley and Hu, 1977; Ojo et al., 2000). In the social sciences, the time to rearrest of drug offenders can be studied using survival analysis techniques (Rocha, 2011). Since the survival analysis literature is mostly applied to the clinical and biomedical fields, the time until the event of interest is usually named survival time. In general, when undertaking a survival analysis task we can consider three major goals (David G. Kleinbaum, Mitchel Klein, 2005):

1. Estimate survival distributions from the data.

2. Compare survival distributions between groups of patients.

3. Evaluate the relationship between the explanatory variables and respective survival time.

The first goal is important to characterize the survival time of the overall sample, in particular to obtain statistics such as the mean and median survival times. The second aims at discovering statistically significant differences in survival time among several groups. The third concerns the fact that in most applications the event of interest is associated with an individual with its own particular characteristics. For instance, in a clinical study the researchers obtain several biological measures from the patients in order to associate biological phenomena with survival time. For convenience, we henceforward denote the survival time by $T$, and the event of interest will very often be referred to as death. This is because most datasets used for analysis, simulation, and testing come from clinical studies where, in the majority of cases, the event of interest is death or relapse of disease. There is an observed survival time $T_i$ for each individual $i$. Individuals are represented by a vector of explanatory variables, also called covariates. This vector is denoted by $X$, with $X_i$ being the covariate vector representing the $i$-th individual.


2.1 Censoring

It is mainly due to censoring that survival analysis is considered a field of its own: most statistical procedures and machine learning algorithms are not designed to incorporate it. Censoring is very frequent in survival data; it occurs when the individual's exact time of failure $T_i$ is unknown. When a subject is censored, we only know that $T_i$ lies in a certain interval around the censoring time $T^C$, also known as the follow-up time. The follow-up time $T^C_i$ is the only observed time of the $i$-th individual; if there is no censoring, then $T^C_i = T_i$. More formally:

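\[
T_i \in \bigl[\,T^C_i - \Delta_{\text{left}},\; T^C_i + \Delta_{\text{right}}\,\bigr] \quad \text{with } \Delta_{\text{left}}, \Delta_{\text{right}} \ge 0.
\]

Depending on the characteristics of this uncertainty interval, an observation is either uncensored or falls into one of three censoring categories:

$\Delta_{\text{left}} = 0 \wedge \Delta_{\text{right}} = 0$: the event time is uncensored;

$\Delta_{\text{left}} = 0 \wedge \Delta_{\text{right}} > 0$: the event time is right censored;

$\Delta_{\text{left}} > 0 \wedge \Delta_{\text{right}} = 0$: the event time is left censored;

$\Delta_{\text{left}} > 0 \wedge \Delta_{\text{right}} > 0$: the event time is interval censored.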

Figure 2.1 illustrates the uncensored case and the three types of censoring described above; the dotted lines represent the time interval in which the event might have occurred.

[Figure: timelines on a time axis t (marks t1, t2, t3) for an uncensored event and for left, right, and interval censoring.]

Figure 2.1: Example of left, right, and interval censoring.

Right censoring typically occurs for one of the following reasons: 1) the person experiences the event only after the study has ended; 2) the person is lost to follow-up during the study; 3) the subject is removed from the study because of unrelated events. Left censoring may be caused by a subject entering a study with an unknown disease onset time. For instance, if an examination to evaluate cancer recurrence gives a positive result, we can only assume that the recurrence time is less than or equal to the time of the examination. To illustrate interval censoring, we can return to the cancer example, this time with two examinations, one at 3 months after surgery and another at 6 months: if the patient gets a negative result in the first examination and a positive result in the second, we only know that the exact time of recurrence is between 3 and 6 months. Right censoring is by far the most common type in clinical studies, and it will be the only type of censoring considered in this work.

A typical data format for right-censored data adds an event indicator variable $\delta_i$ for each individual $i$. When the individual experiences the event, $\delta_i = 1$; otherwise the individual is right censored and $\delta_i = 0$. Table 2.1 shows data coded in this format. For individuals A and C we know that the event occurred after 44.5 and 10 months, respectively; for individual B, whose survival time was censored, we only know that it survived for at least 30 months.

Table 2.1: Example of data with censored observations ($\delta_i = 0$).

Individual   $T^C_i$ (months)   $\delta_i$
A            44.5               1
B            30                 0
C            10                 1

Regarding right censoring, another kind of categorization can be made, which concerns the design of the study and the dependence between the existence of censoring and the individuals' survival times:

Type I censoring: a survival time $T_i$ is observed if it is no larger than a pre-specified censoring time $T_{\text{study}}$; otherwise we only know that the event happened after $T_{\text{study}}$.

Type II censoring: this kind of censoring occurs when the study is stopped after a pre-specified number of events has been registered.

Random censoring: also called noninformative censoring; the survival time $T$ and the censoring time are random variables independent of each other.
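The event-indicator format and Type I censoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `apply_type1_censoring`, the study length of 36 months, and the exponential true survival times are hypothetical and are not taken from the thesis.

```python
import random


def apply_type1_censoring(true_times, t_study):
    """Return (follow_up_time, delta) pairs under Type I right censoring.

    delta = 1 means the event was observed; delta = 0 means the subject
    was still event-free when the study ended at t_study.
    """
    observed = []
    for t in true_times:
        if t <= t_study:
            observed.append((t, 1))        # event observed at its true time
        else:
            observed.append((t_study, 0))  # censored at the end of the study
    return observed


# Hypothetical true survival times (months), drawn only for illustration.
rng = random.Random(0)
true_times = [rng.expovariate(1 / 24.0) for _ in range(5)]

for follow_up, delta in apply_type1_censoring(true_times, t_study=36.0):
    print(f"{follow_up:6.1f}  {delta}")
```

Each printed row has the same two columns as Table 2.1: the follow-up time and the event indicator.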

2.2 Continuous-time Survival Function S(t)

Survival time is typically treated as a non-negative real random variable $T$. Instead of characterizing $T$ by its distribution function $F(t) = \Pr(T < t)$, survival analysis focuses on the complement of $F(t)$, given by $S(t) = \Pr(T \ge t) = 1 - F(t)$. $S(t)$ is called the survival function (David Collett, 2003), also known as the survivor function (Lawless, 2003). As the name indicates, $S(t)$ gives the probability of an individual living longer than a given time $t$; for a continuous $T$, $\Pr(T > t)$ and $\Pr(T \ge t)$ coincide. We may write:

\[
S(t) = \Pr(T > t) = \int_t^{\infty} f(x)\,dx = 1 - F(t), \tag{2.1}
\]

where $f(t)$ is the probability density function of $T$. $S(t)$ has some properties worth noting:

$S(t) : \mathbb{R}^+_0 \to [0, 1]$.

$S(t)$ is a monotone non-increasing continuous function.

$S(0) = 1$ and $S(+\infty) = 0$.
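As a numerical illustration of (2.1), the following sketch compares the integral of the density with $1 - F(t)$. It assumes an exponential lifetime with a hypothetical rate of 0.5, chosen only because its density and distribution function have simple closed forms; nothing here is taken from the thesis's R package.

```python
import math

RATE = 0.5  # hypothetical exponential rate, for illustration only


def pdf(t):
    """Density f(t) of the assumed exponential lifetime."""
    return RATE * math.exp(-RATE * t)


def cdf(t):
    """Distribution function F(t) of the assumed exponential lifetime."""
    return 1.0 - math.exp(-RATE * t)


def survival_by_integration(t, upper=60.0, steps=60_000):
    """Approximate S(t) as the integral of f from t to infinity.

    Uses the trapezoid rule and truncates the integral at `upper`,
    beyond which the exponential tail is negligible.
    """
    width = (upper - t) / steps
    total = 0.5 * (pdf(t) + pdf(upper))
    for i in range(1, steps):
        total += pdf(t + i * width)
    return total * width


for t in (0.0, 1.0, 4.0):
    print(t, survival_by_integration(t), 1.0 - cdf(t))
```

The two printed columns should agree closely, and both decrease from 1 towards 0 as $t$ grows, matching the properties listed above.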


2.3 Continuous-time Hazard Function h(t)

Another fundamental concept in survival analysis is the hazard function $h(t)$, also called the hazard rate, force of mortality, conditional failure rate, or instantaneous death rate (David G. Kleinbaum, Mitchel Klein, 2005; David Collett, 2003; Kalbfleisch and Prentice, 2011). Whereas $S(t)$ represents the probability of not failing until a given time $t$, $h(t)$ represents the failure rate at time $t$ given that the individual has survived until $t$; the higher the values of $h(t)$, the shorter the survival time tends to be. Unlike $S(t)$, it expresses a rate rather than a probability, so its values are non-negative but can exceed unity. This failure rate can be thought of as the number of events per unit of time occurring at instant $t$, relative to the individuals who have survived up to time $t$. Formally, $h(t)$ is defined by:

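\begin{align}
h(t) &= \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \tag{2.2} \\
&= \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t,\; T \ge t)}{\Pr(T \ge t)\,\Delta t} \notag \\
&= \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t)}{\Pr(T \ge t)\,\Delta t} \notag \\
&= \lim_{\Delta t \to 0} \frac{\int_t^{t + \Delta t} f(x)\,dx}{\Pr(T \ge t)\,\Delta t} \notag \\
&= \lim_{\Delta t \to 0} \frac{\Delta t\, f(t)}{\Delta t\, S(t)} \;\Longrightarrow\; h(t) = \frac{f(t)}{S(t)} \tag{2.3}
\end{align}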

The meaning of $h(t)$ is not as intuitive as that of $S(t)$. David G. Kleinbaum, Mitchel Klein (2005) provide the following conceptual interpretation: the hazard function $h(t)$ gives the instantaneous potential per unit time for the event to occur, given that the individual has survived up to time $t$.

In (Lawless, 2003) the concept of instantaneous death rate is used, and the author notes that $h(t)\Delta t$ gives the approximate probability of the event occurring in the interval $[t, t + \Delta t)$, given that the individual has survived until $t$.

This function is particularly useful, since it describes how the failure rate of the population evolves over time. In many applications there may be clinical and statistical information that can be used to define the hazard function, which in turn can help in selecting a lifetime distribution model. For example, in some situations there are reasons to consider only a monotone increasing hazard function, which reflects a "wear-out" effect: a mechanical part, for instance, is subject to an aging phenomenon that degrades it as time passes. Some properties of $h(t)$ include:

It is a continuous function $h(t) : \mathbb{R}^+_0 \to \mathbb{R}^+_0$.

It is generally not monotone.

The scale of $h(t)$ depends on the time unit used.
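The limit definition (2.2) and the ratio form (2.3) can be compared numerically. The sketch below assumes a Weibull lifetime with survival function $\exp(-\lambda t^{\nu})$; the parameter values and all function names are hypothetical choices made for illustration, not part of the thesis.

```python
import math

LAM, NU = 0.5, 1.5  # hypothetical Weibull scale and shape, for illustration only


def survival(t):
    """S(t) for the assumed Weibull lifetime."""
    return math.exp(-LAM * t ** NU)


def pdf(t):
    """f(t) = -dS/dt for the assumed Weibull lifetime."""
    return LAM * NU * t ** (NU - 1) * math.exp(-LAM * t ** NU)


def hazard_ratio_form(t):
    """h(t) = f(t) / S(t), as in (2.3)."""
    return pdf(t) / survival(t)


def hazard_limit_form(t, dt=1e-6):
    """Finite-dt version of (2.2): Pr(t <= T < t + dt | T >= t) / dt."""
    return (survival(t) - survival(t + dt)) / (survival(t) * dt)


for t in (0.5, 1.0, 2.0):
    print(t, hazard_ratio_form(t), hazard_limit_form(t))
```

With these assumed parameters the hazard is increasing in $t$ (a wear-out pattern) and is larger than 1 at $t = 2$, which illustrates that $h(t)$ is a rate and not a probability.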

An important relation between $h(t)$ and $S(t)$ is that $S(t)$ can be obtained from a product of probabilities of the form $P(T \in [t, t + dt] \mid T > t) = h(t)\,dt$. Since these probabilities refer to infinitesimal intervals, we can write $S(t)$ as the product, over all instants before $t$, of the probabilities of not dying at each instant (Cox, 1972):

\begin{align}
S(t) &= P(T \ge t) \tag{2.4} \\
&= \prod_{0}^{t} \bigl(1 - h(u)\,du\bigr) \tag{2.5} \\
&= \lim_{(\tau_{k+1} - \tau_k) \to 0} \; \prod_{\tau_k < t} \bigl[1 - h(\tau_k)(\tau_{k+1} - \tau_k)\bigr] \tag{2.6} \\
&= \exp\left(-\int_0^t h(u)\,du\right). \tag{2.7}
\end{align}

From this we get:

\[
h(t) = -\frac{d \log S(t)}{dt}.
\]

The integral in the exponent of (2.7) is known as the cumulative hazard function, and it is given by:

\[
H(t) = \int_0^t h(x)\,dx. \tag{2.8}
\]

As we can conclude from the definitions so far, the functions $f(t)$, $F(t)$, $S(t)$, $H(t)$ and $h(t)$ contain exactly the same information about the distribution of the random variable $T$.
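The equivalence between the product form (2.6) and the exponential form (2.7) can be checked numerically. The sketch below assumes a hypothetical hazard $h(t) = 0.75\sqrt{t}$, whose cumulative hazard is $H(t) = 0.5\,t^{1.5}$; the function names and the grid size are illustrative choices only.

```python
import math


def hazard(t):
    """Hypothetical hazard h(t) = 0.75 * sqrt(t); H(t) is then 0.5 * t**1.5."""
    return 0.75 * math.sqrt(t)


def survival_product(t, steps=200_000):
    """Product of 1 - h(tau_k) * (tau_{k+1} - tau_k) over a fine grid on [0, t]."""
    dt = t / steps
    prod = 1.0
    for k in range(steps):
        prod *= 1.0 - hazard(k * dt) * dt
    return prod


def survival_exponential(t):
    """S(t) = exp(-H(t)) using the closed-form cumulative hazard."""
    return math.exp(-0.5 * t ** 1.5)


print(survival_product(2.0), survival_exponential(2.0))
```

As the grid is refined the product approaches $\exp(-H(t))$, which is the passage from (2.6) to (2.7).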

2.4 Discrete-time Survival Function

Sometimes, when the individuals' lifetimes are grouped or quantized, or when the available data is not appropriate for a continuous interpretation of time, $T$ becomes a random variable taking values in a finite set $t_1, t_2, \ldots, t_n$. The survivor function is then defined as the sum of the probabilities of failing at time $t$ or later:

\begin{align}
S(t) &= \Pr(T \ge t) \tag{2.9} \\
&= \sum_{j \,:\, t_j \ge t} p(t_j), \tag{2.10}
\end{align}

where $p(t_j) = \Pr(T = t_j)$. The above calculation is not able to incorporate censoring in each of the probabilities $p(t_j)$. In Section 2.6 we introduce the Kaplan-Meier estimator, which solves precisely this problem.

2.5 Discrete-time Hazard Function

The hazard function is defined as a failure rate (Klein and Moeschberger, 2003):

$$h(t_i) = \Pr(T = t_i \mid T \geq t_i) = \frac{p(t_i)}{S(t_{i-1})} \tag{2.11}$$

In this expression $S(t_{i-1})$ follows the convention of Klein and Moeschberger, in which the survivor function is $\Pr(T > t)$, so that $S(t_{i-1}) = \Pr(T > t_{i-1}) = \Pr(T \geq t_i)$.

It is worth noticing that if we change the time scale, for example from days to months, the numeric value of $\Delta t_i$ gets smaller and the probability in the numerator gets larger (as the same interval now includes more events). This effect can be seen in Figure 2.2, where h(t) is estimated once with time measured in years and once with time measured in days.

[Figure 2.2: two panels plotting Hazard against Time, one with [Time]=years and one with [Time]=days.]

Figure 2.2: The effect of changing the time scale on h(t).

2.6 Kaplan-Meier Estimator

The Kaplan-Meier (KM) estimator (Kaplan and Meier, 1958) is one of the most used tools in survival analysis. It provides a non-parametric estimate of S(t) that is able to incorporate right censoring. Consider m events in a population of n individuals, with n − m of them being right censored. Denote by $n_j$ the number of individuals at risk at time $t_j$, meaning the individuals still alive (who have not experienced the event) at time $t_j$, and by $d_j$ the number of deaths (occurred events) at time $t_j$. In a discrete setting, and assuming independence between the event and censoring times (random censoring), we can interpret the KM estimator as a product of the probabilities of surviving each time $t_j$ having survived until time $t_{j-1}$:

$$
\begin{align}
\hat{S}(t) &= \widehat{\Pr}(T \geq t) \\
&= \prod_{t_j < t} \left(1 - \widehat{\Pr}(T = t_j \mid T > t_{j-1})\right) \\
&= \prod_{t_j < t} \left(1 - \frac{d_j}{n_j}\right)
\end{align}
$$

In figure 2.3 we have an example of the KM estimation of S(t) for a survival dataset. The dashed lines

represent the 95% confidence interval which can be estimated as in (Klein and Moeschberger, 2003).
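The product form of the KM estimator translates directly into code. The sketch below is a plain-Python illustration on a small hypothetical sample (it is not the WHAS100 data and it does not compute confidence intervals). The function name `kaplan_meier` and the toy values are our own.

```python
def kaplan_meier(times, events):
    """Return a list of (t_j, d_j, n_j, S) over the ordered event times.

    S is the running product of (1 - d_j / n_j) including the factor for t_j,
    i.e. the estimated survival for t just after t_j.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    curve, s = [], 1.0
    for tj in event_times:
        n_j = sum(1 for t in times if t >= tj)                             # at risk at t_j
        d_j = sum(1 for t, e in zip(times, events) if t == tj and e == 1)  # deaths at t_j
        s *= 1.0 - d_j / n_j
        curve.append((tj, d_j, n_j, s))
    return curve

# Hypothetical sample: event indicator 1 = death observed, 0 = right censored.
times = [2, 3, 3, 5, 6, 8, 9, 9]
events = [1, 1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for tj, d_j, n_j, s in curve:
    print(tj, d_j, n_j, round(s, 4))
```

Censored subjects never contribute a factor of their own, but they remain in $n_j$ for every event time up to their censoring time, which is how the estimator uses their partial information.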

[Figure 2.3: Kaplan-Meier curve titled “S(t) for WHAS100”, plotting P(T>t) against t [days].]

Figure 2.3: Example of S(t) KM estimation.

This estimate of the survival curve S(t) says nothing by itself about relationships with the available explanatory variables, but when applied to different strata of the original data it is very useful for assessing relevant differences between survival profiles. For example, in a clinical study, computing one Kaplan-Meier curve for male patients and another for female patients can give insight into the association between gender and survival time. As we will see later, the difference between survival curves is assessed with statistical tests such as the Log-Rank test. The Nelson-Aalen estimator (Aalen et al., 2008) is another non-parametric estimator, this time of the cumulative hazard function. It can also be used to estimate the survival function through the relationship $S(t_i) = \exp[-H(t_i)]$, giving an estimate that is asymptotically equivalent to the KM estimator. The Nelson-Aalen estimator computes the cumulative hazard function as the sum of the failure rates until time t:

$$H(t_i) = \sum_{j=0}^{i-1} \frac{d_j}{n_j} \tag{2.12}$$

with $d_j$ again denoting the number of deaths (events) that occurred at time $t_j$ and $n_j$ the total number of individuals at risk at time $t_j$, in other words, subjects that did not fail until $t_{j-1}$.
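As an illustration of 2.12, the following sketch accumulates the increments $d_j / n_j$ on a small hypothetical sample and converts the cumulative hazard into a survival estimate through $\exp[-H]$. The function name and the data are assumptions made for the example.

```python
import math

def nelson_aalen(times, events):
    """Return a list of (t_j, H, exp(-H)) where H is the running sum of d_j / n_j."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    out, cumulative_hazard = [], 0.0
    for tj in event_times:
        n_j = sum(1 for t in times if t >= tj)                             # at risk at t_j
        d_j = sum(1 for t, e in zip(times, events) if t == tj and e == 1)  # deaths at t_j
        cumulative_hazard += d_j / n_j
        out.append((tj, cumulative_hazard, math.exp(-cumulative_hazard)))
    return out

# Hypothetical sample: event indicator 1 = death observed, 0 = right censored.
times = [2, 3, 3, 5, 6, 8, 9, 9]
events = [1, 1, 0, 1, 0, 1, 1, 0]
estimate = nelson_aalen(times, events)
for tj, h, s in estimate:
    print(tj, round(h, 4), round(s, 4))
```

Because $1 - x \leq e^{-x}$, the survival estimate obtained this way is never below the product of the factors $(1 - d_j/n_j)$ computed on the same data; the two get closer as the risk sets grow.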

2.7 Log-Rank Test

Although we can estimate survival curves for several groups, it is useful to assess whether the estimated curves present statistically significant differences. One of the most used methods is the Log-Rank test, which is presented here following (David G. Kleinbaum, Mitchel Klein, 2005). This procedure tests whether the individuals of the two (or more) groups come from the same population with respect to their survival time. The log-rank test is a large-sample chi-square test whose statistic captures the difference in survival between groups and is able to incorporate censoring. The log-rank statistic reflects the difference between the observed and expected numbers of failure events registered for each group at each time step. The time steps are defined as the ordered failure times. First the following quantities are defined (David G. Kleinbaum, Mitchel Klein, 2005):

$m_{i,t_j}$: number of deaths in group i at time $t_j$.

$n_{i,t_j}$: number of individuals at risk (still alive) in group i at time $t_j$.

$e_{i,t_j}$: number of expected events in group i at time $t_j$.

$O_i$: total of observed events for group i.

$E_i$: total of expected events for group i.

The expected number of events is calculated assuming that the two groups are equally prone to have individuals experiencing events. So for each group, the expected number of events at each failure time is proportional to the number of individuals at risk in the group:

$$e_{i,t_j} = \frac{n_{i,t_j}}{n_{1,t_j} + n_{2,t_j}} \left(m_{1,t_j} + m_{2,t_j}\right)$$

The total numbers of observed and expected events are sums over all failure times:

$$O_i = \sum_j m_{i,t_j}$$

$$E_i = \sum_j e_{i,t_j}$$

The Log-Rank statistic (LR) expresses the difference between expected and observed events for each group. In the two-group case this statistic is equal for both groups, so calculating it for group 1 we have:

$$LR = \frac{(O_1 - E_1)^2}{Var(O_1 - E_1)} \tag{2.13}$$

Making use of this statistic, the following hypothesis test is performed:

$$H_0 : \text{no difference in survival between groups} \tag{2.15}$$

$$LR \sim \chi^2 \text{ with 1 d.f. under } H_0 \tag{2.16}$$

For the general case with G groups, the LR statistic is given by:

$$LR = \sum_{i=1}^{G} \frac{(O_i - E_i)^2}{Var(O_i - E_i)} \tag{2.17}$$

In this case, under the null hypothesis the LR statistic follows a $\chi^2$ distribution with G − 1 degrees of freedom.
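The two-group statistic 2.13 can be sketched as follows. The text above does not state the variance term, so this example assumes the usual hypergeometric variance, $\sum_j n_{1,t_j} n_{2,t_j} m_{t_j} (n_{t_j} - m_{t_j}) / [n_{t_j}^2 (n_{t_j} - 1)]$ with $n_{t_j}$ and $m_{t_j}$ the pooled numbers at risk and of deaths. The function name and the sample are hypothetical.

```python
import math

def logrank_two_groups(times, events, groups):
    """Two-group log-rank statistic (O1 - E1)^2 / Var(O1 - E1) and its chi-square(1) p-value.

    groups holds the labels 1 and 2; events holds 1 for a failure and 0 for censoring.
    """
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_1 = expected_1 = variance = 0.0
    for tj in failure_times:
        n1 = sum(1 for t, g in zip(times, groups) if t >= tj and g == 1)
        n2 = sum(1 for t, g in zip(times, groups) if t >= tj and g == 2)
        m1 = sum(1 for t, e, g in zip(times, events, groups) if t == tj and e == 1 and g == 1)
        m2 = sum(1 for t, e, g in zip(times, events, groups) if t == tj and e == 1 and g == 2)
        n, m = n1 + n2, m1 + m2
        observed_1 += m1
        expected_1 += n1 / n * m          # e_{1,t_j}
        if n > 1:                          # the variance term is undefined for a single subject
            variance += n1 * n2 * m * (n - m) / (n ** 2 * (n - 1))
    statistic = (observed_1 - expected_1) ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2.0))  # upper tail of chi-square with 1 d.f.
    return statistic, p_value

# Hypothetical sample in which group 1 fails systematically earlier than group 2.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 1, 2, 2, 2, 2]
statistic, p_value = logrank_two_groups(times, events, groups)
print(round(statistic, 3), round(p_value, 4))
```

Swapping the group labels leaves the statistic unchanged, which matches the remark that in the two-group case it is equal for both groups.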


2.8 Cox proportional hazards

When assessing the relationship between a vector of explanatory variables $X = (X_1, X_2, ..., X_p)$ and observed survival times, regression analysis is usually very insightful and is supported by strong mathematical tools. Unfortunately, typical regression techniques such as multivariate least squares and logistic regression are not able to deal with survival data because of censoring. To fill this void, in 1972 Sir David Cox introduced the widely used Cox proportional hazards model (Cox, 1972), also known as Cox regression. Its semi-parametric nature has made it the most used regression model in survival analysis. The Cox regression relies on the proportional hazards assumption: it is assumed that for every pair of individuals i, j their hazard functions are proportional over time. This has the elegant consequence that every hazard function $h_i(t)$ can be written in relation to an abstract baseline function $h_0(t)$, so the hazard function of individual i is the product of two factors:

$$h_i(t) = K_i h_0(t)$$

Cox modeled the factor $K_i$ as a function of the covariates of individual i. A natural choice to model this factor would be a linear combination of the covariates but, as the hazard function cannot take negative values, the linear combination is exponentiated so that the values are always positive (David G. Kleinbaum, Mitchel Klein, 2005):

$$h(t \mid x_i) = e^{\beta x_i} h_0(t)$$

The factor multiplying the baseline hazard is often called the relative risk or hazard ratio: knowing it, we can only compare the risks among individuals, and we cannot infer any absolute hazard value, because we do not know $h_0(t)$. In applications the relative risks are quite useful to compare survival profiles among subjects, and to assess the impact that each variable has on the subjects' survival time.

Partial Likelihood and estimation of β

Cox's approach to the estimation of $\beta = (\beta_1, ..., \beta_p)$ is based on maximum likelihood, in particular on a conditional partial likelihood. It is partial because the baseline hazard function $h_0(t)$ does not need to be defined in order to calculate β. It is conditional because the likelihood function is constructed considering only the set of instants $\{t_j\}$ at which events have occurred. Let $T_i$ be the random variable representing the survival time of individual i, and $R(t_j)$ the set of individuals at risk at time $t_j$. Cox defined the conditional probability of $T_i = t_j$ (restricting the event time to $t_j$) as the fraction of the total hazard at time $t_j$ that belongs to individual i (Cox, 1972):

$$P(T_i = t_j \mid \text{event at } t_j) = \frac{h_i(t_j)}{\sum_{l \in R(t_j)} h_l(t_j)}$$


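$P(T_i = t_j \mid \text{death at } t_j)$ is the probability that individual i fails at time $t_j$, knowing that an individual has died at time $t_j$. Indexing the individuals so that individual j experiences the event at time $t_j$, the conditional likelihood is given by:

$$
\begin{align}
L(\beta) &= P(T_1 = t_1, ..., T_j = t_j, ..., T_k = t_k \mid \text{events at } t_1, ..., t_j, ..., t_k) \\
&= \prod_{j=1}^{k} \frac{h_j(t_j)}{\sum_{l \in R(t_j)} h_l(t_j)} \\
&= \prod_{j=1}^{k} \frac{e^{x_j \beta} h_0(t_j)}{\sum_{l \in R(t_j)} e^{x_l \beta} h_0(t_j)} \\
&= \prod_{j=1}^{k} \frac{e^{x_j \beta} h_0(t_j)}{h_0(t_j) \sum_{l \in R(t_j)} e^{x_l \beta}} \\
&= \prod_{j=1}^{k} \frac{e^{x_j \beta}}{\sum_{l \in R(t_j)} e^{x_l \beta}}
\end{align}
$$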

Following the proportional hazards assumption, the term $h_0(t)$ cancels out and is thus irrelevant to this maximum likelihood estimation. Moreover, note that the information from censored subjects is incorporated in the denominator: the hazards of all individuals in the risk set $R(t_j)$ are summed. The Cox model log-likelihood function becomes:

$$l(\beta) = \sum_{i=1}^{k} x_i \beta - \sum_{i=1}^{k} \log\left[\sum_{l \in R(t_i)} e^{x_l \beta}\right] \tag{2.18}$$

The cancellation of the baseline hazard is one of the main reasons for the popularity of this model: in order to calculate the regression parameters β, no assumption has to be made about the baseline hazard function, as long as the proportional hazards assumption holds. The estimate $\hat{\beta}$ is obtained by maximizing 2.18 with respect to β:

$$\hat{\beta} = \arg\max_{\beta} \, l(\beta)$$

In order to maximize 2.18 we differentiate the log-likelihood with respect to each coefficient $\beta_r$, resulting in:

$$U_r(\beta) = \frac{\partial l(\beta)}{\partial \beta_r} = \sum_{i=1}^{k} \left[x_{r,i} - A_{r,i}(\beta)\right]$$

$$\text{where } A_{r,i}(\beta) = \frac{\sum_{l \in R(t_i)} x_{r,l} \, e^{x_l \beta}}{\sum_{l \in R(t_i)} e^{x_l \beta}}$$

$U_r(\beta)$ is known as the score function. The entries of the Hessian matrix are given by:

$$\Upsilon_{r,\xi}(\beta) = \frac{\partial^2 l(\beta)}{\partial \beta_r \partial \beta_\xi}$$

The score function, in conjunction with the Hessian $\frac{\partial^2 l(\beta)}{\partial \beta_r \partial \beta_\xi}$, can be used with the Newton-Raphson algorithm to calculate the maximum likelihood estimate of β (Cox, 1972).
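The estimation procedure can be sketched for the simplest case. The example below is a minimal illustration, assuming a single covariate, no tied event times, and a small hypothetical sample; it is not the routine used to produce the results in this text. It evaluates the score and the second derivative of 2.18 and iterates Newton-Raphson steps.

```python
import math

def score_and_hessian(beta, times, events, x):
    """Score U(beta) and second derivative of the partial log-likelihood for one covariate."""
    score = hessian = 0.0
    for i, (ti, ei) in enumerate(zip(times, events)):
        if ei != 1:
            continue                                            # only event times contribute terms
        risk = [l for l, tl in enumerate(times) if tl >= ti]    # risk set R(t_i)
        weights = [math.exp(x[l] * beta) for l in risk]
        s0 = sum(weights)
        s1 = sum(w * x[l] for w, l in zip(weights, risk))
        s2 = sum(w * x[l] ** 2 for w, l in zip(weights, risk))
        score += x[i] - s1 / s0                                 # x_i - A_i(beta)
        hessian -= s2 / s0 - (s1 / s0) ** 2                     # minus the weighted variance of x over R(t_i)
    return score, hessian

def fit_cox_1d(times, events, x, iterations=25):
    """Newton-Raphson iterations beta <- beta - U(beta) / U'(beta), starting from beta = 0."""
    beta = 0.0
    for _ in range(iterations):
        score, hessian = score_and_hessian(beta, times, events, x)
        beta -= score / hessian
    return beta

# Hypothetical sample: one binary covariate, one censored subject (event = 0).
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 1, 1]
x = [1, 0, 1, 0, 1, 0]
beta_hat = fit_cox_1d(times, events, x)
print(beta_hat, score_and_hessian(beta_hat, times, events, x))
```

The censored subject appears only inside the risk sets, as described above. With several covariates the same scheme applies with a score vector and a Hessian matrix, and production routines additionally handle ties and step-size control.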


Wald Test

The Wald test is used to assess whether each coefficient $\beta_r$ is significantly different from zero. The Wald statistic is given by (Harrell, 2001):

$$W(\beta_r) = \frac{\beta_r^2}{\sigma_{\beta_r}^2}$$

The covariance matrix of the parameter vector β can be approximated by the inverse of the negative Hessian matrix of the partial log-likelihood defined above, and the standard error $\sigma_{\beta_r}$ of each coefficient is given by the square root of the corresponding diagonal element. Under the null hypothesis that the coefficient is equal to zero, this statistic follows a $\chi^2$ distribution with one degree of freedom. More formally, the hypothesis test is the following:

$$H_0 : \beta_r = 0$$

$$\text{with } H_0 \text{ true}: W(\beta_r) \sim \chi^2 \text{ with 1 degree of freedom}$$

This test is one of the most used when assessing the significance of a coefficient. For example, it is used to calculate the p-values of the Cox coefficients in the R routine “coxph” of the survival analysis package “survival” (Therneau, 2014).
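As a small worked example of the Wald test, the sketch below computes the statistic and its p-value for a hypothetical coefficient estimate and standard error. It uses the identity $P(\chi^2_1 > w) = P(|Z| > \sqrt{w})$ for a standard normal Z, so only the complementary error function from the standard library is needed.

```python
import math

def wald_test(beta_hat, std_error):
    """Wald statistic W = beta^2 / sigma^2 and its p-value under a chi-square with 1 d.f."""
    w = (beta_hat / std_error) ** 2
    p_value = math.erfc(math.sqrt(w / 2.0))  # P(chi2_1 > w) = P(|Z| > sqrt(w))
    return w, p_value

# Hypothetical estimate and standard error, chosen only for illustration.
w, p_value = wald_test(beta_hat=0.8, std_error=0.4)
print(w, round(p_value, 4))
```

Here the coefficient is two standard errors away from zero, so the p-value is the familiar two-sided value of about 0.0455.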

Testing the Proportional Hazards Assumption

As the name indicates, the key assumption in the Cox model is the proportionality of hazards over time, given by the relation:

$$\frac{h_0(t) e^{X_i(t)\beta}}{h_0(t) e^{X_j(t)\beta}} = \frac{e^{X_i(t)\beta}}{e^{X_j(t)\beta}}, \tag{2.19}$$

note that even for time-varying covariates, although the relative hazard is not independent of time, the relative impact of any two values of a covariate is always determined by β. There are a variety of approaches to assessing the proportional hazards assumption. The simplest, and perhaps the most intuitive, is to plot the survival curves estimated, for example, with the Kaplan-Meier estimator: if the assumption holds, the curves plotted in log-log scale should be approximately parallel ((Therneau and Grambsch, 2000) provides a simple explanation of this result). In the Cox model with time-fixed covariates only, the cumulative hazard function is given by:

$$
\begin{align}
H_i(t) &= \int_0^t h_0(u) e^{X_i \beta}\,du \tag{2.20} \\
&= e^{X_i \beta} \int_0^t h_0(u)\,du \tag{2.21} \\
&= e^{X_i \beta} H_0(t) \tag{2.22}
\end{align}
$$

Thus, since $S_i(t) = \exp[-H_i(t)]$, we have for the survival function:

$$
\begin{align}
-\log S_i(t) &= e^{X_i \beta} H_0(t) \iff \tag{2.23} \\
\log[-\log S_i(t)] &= \log[H_0(t)] + X_i \beta \tag{2.24}
\end{align}
$$

so the log-log survival curves of two individuals differ only by the constant $(X_i - X_j)\beta$, which does not depend on t.


An alternative test, based on the work of (Grambsch and Therneau, 1994), extends the Cox model to include time-varying coefficients: β(t) is calculated from the Schoenfeld residuals, and the closer β(t) is to being constant over time, the more reasonable the proportional hazards assumption.

Ridge and LASSO

The partial log-likelihood can be adapted in order to restrict the structure of β. For example, one common goal is to promote sparsity, since in many settings the number of covariates is large. To ensure convergence and significance of the coefficients, we want the model to shrink the coefficients, ideally forcing some of them to zero. This can be done by introducing a penalty term, as in ridge regression (Goeman, 2010):

$$l_{ridge}(\beta) = l_{Cox}(\beta) - \lambda \|\beta\|_2^2, \tag{2.25}$$

where λ is a tuning parameter usually called the regularization parameter. It has been shown that the ridge penalty does not necessarily lead to sparsity [citation needed]. An alternative approach to penalization is the Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani, 1996), for which the penalized Cox partial log-likelihood is given by:

$$l_{lasso}(\beta) = l_{Cox}(\beta) - \lambda \sum_{r=1}^{p} |\beta_r|. \tag{2.26}$$

These methods are very useful when performing feature selection. After fitting the model, the covariates whose coefficients are close to zero are the natural candidates to be discarded. Both methods are available in the R package “penalized” (Goeman, 2012).
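The different behaviour of the two penalties with respect to sparsity can be illustrated in one dimension. The sketch below is a simplification that replaces the Cox partial log-likelihood with a quadratic surrogate $-(b - z)^2 / 2$ around an unpenalized estimate z (an assumption made only for this example), for which both penalized maximizers have closed forms.

```python
def ridge_shrink(z, lam):
    """Maximizer of -(b - z)**2 / 2 - lam * b**2: proportional shrinkage, never exactly zero for z != 0."""
    return z / (1.0 + 2.0 * lam)

def lasso_shrink(z, lam):
    """Maximizer of -(b - z)**2 / 2 - lam * abs(b): soft-thresholding, exactly zero when abs(z) <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical unpenalized estimates and a hypothetical regularization parameter.
for z in (2.0, 0.3, -0.1):
    print(z, ridge_shrink(z, 0.5), lasso_shrink(z, 0.5))
```

Small coefficients are set exactly to zero by the LASSO penalty, while the ridge penalty only scales them down, which is why the LASSO is the one usually associated with feature selection.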

Competing Risks and Time-varying covariates

In clinical studies it is fairly common to have individuals who are subject to multiple risks of failure. For example, in the Stanford heart transplant study reported by (Crowley and Hu, 1977), the failures are classified as death from organ rejection or death from other complications related to the procedure. Several approaches to analyzing this kind of data have been studied. For example, (Kalbfleisch and Prentice, 2011) treat this case by fitting two separate Cox models, one for each kind of failure, treating the other failure types as censored observations. Another approach, adopted by various authors including (Larson and Dinse, 1985), involves fitting more complex models that incorporate the different failure types. Here we present one of the simplest methods to incorporate competing risks when fitting a survival model, that of (Lunn and McNeil, 1995). Their approach allows competing risks survival data to be analyzed directly with the standard Cox proportional hazards model. The strategy relies on reformulating the data according to each individual's cause of death. Consider an example scenario with two competing risks, I and II, coded as ρ = 0 and ρ = 1. Each subject appears twice (once per competing risk): if the subject dies from risk ρ = 0, it appears as censored in the entry corresponding to risk ρ = 1. The covariate vector is duplicated in order to take into account interactions of the covariates with the type of failure. Table 2.2 shows the general case of two failure types for subjects with covariate vector x. As shown, an individual always has two entries in the data, one for each kind of failure.

Table 2.2: Example of the subject replication.

Individual   $T_i$   $\delta_i$   Type of failure (ρ)   Augmented covariates
i            $t_i$   1            $\rho_i$              $\{\rho_i x, x\}$
i            $t_i$   0            $1 - \rho_i$          $\{(1 - \rho_i) x, x\}$

The procedure relies on the assumption that the overall hazard function is the sum of the independent hazard functions of each competing risk. Thus, using the counting process framework, we have for the cumulative hazard function:

$$
\begin{align}
H(t) &= \sum_{i=0}^{n} H_i(t) \tag{2.27} \\
&= \sum_{i=0}^{n} H_{i,I}(t) + \sum_{i=0}^{n} H_{i,II}(t) \tag{2.28}
\end{align}
$$
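The subject replication of Table 2.2 can be sketched as a small data transformation. The function below is a hypothetical helper written for illustration. The treatment of subjects who are censored for both risks (two censored rows, one per failure type) is our assumption, since Table 2.2 only covers subjects with an observed failure.

```python
def augment(subject_id, time, failed, rho, x):
    """Return the two rows of Table 2.2 for one subject.

    failed: 1 if a failure was observed, 0 if the subject is censored.
    rho: observed failure type (0 or 1); ignored when failed == 0.
    x: list with the covariate values of the subject.
    """
    # Assumption: a censored subject contributes one censored row for each failure type.
    types = (rho, 1 - rho) if failed else (0, 1)
    rows = []
    for position, failure_type in enumerate(types):
        delta = 1 if (failed and position == 0) else 0          # only the observed type is an event
        covariates = [failure_type * value for value in x] + list(x)
        rows.append({"id": subject_id, "time": time, "delta": delta,
                     "type": failure_type, "covariates": covariates})
    return rows

# Hypothetical subject who failed from the risk coded rho = 1 at time 10.
rows = augment("a", 10, failed=1, rho=1, x=[2.0, 3.0])
for row in rows:
    print(row)
```

The first half of each augmented covariate vector is the interaction between the covariates and the failure type, so a standard Cox fit on the augmented data can estimate how the effect of each covariate differs between the two risks.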

Time-dependent covariates are very common in clinical data; the most common types are repeated measurements on a subject over time, or a change in the patient's therapy. To incorporate such time series, the data is usually written in counting process format, as described in (Therneau and Grambsch, 2000). For example, given a time-varying covariate X(t), we can describe its behaviour by adding a new entry at the time the variable changed its value, and censoring the entry corresponding to the previous value, as can be seen in Table 2.3.

Table 2.3: Example of coding time-dependent covariates.

id   Interval   $\delta$   X(t)
A    (0,102]    1          0
B    (0,21]     0          0
B    (21,200]   1          1

This extension does not require any modification of the standard Cox model. The data format allows the computer routine to pick the right value of each covariate at each time point where a component of the partial likelihood is calculated. There is no subject replication: at each time point, a subject enters the partial likelihood calculation only once.
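The coding of Table 2.3 can be sketched as a function that splits the follow-up of a subject at the times where the covariate changes. The function name and argument layout are hypothetical; the two calls reproduce the rows of subjects A and B in the table.

```python
def to_counting_process(subject_id, end_time, event, changes, initial_value=0):
    """Split the follow-up (0, end_time] into (start, stop] rows at covariate change times.

    changes: list of (time, new_value) pairs sorted by time, with 0 < time < end_time.
    Returns rows of the form (id, start, stop, delta, covariate value).
    """
    rows, start, value = [], 0, initial_value
    for change_time, new_value in changes:
        rows.append((subject_id, start, change_time, 0, value))  # interval ends censored at the change
        start, value = change_time, new_value
    rows.append((subject_id, start, end_time, event, value))     # last interval carries the event indicator
    return rows

rows_a = to_counting_process("A", 102, event=1, changes=[])
rows_b = to_counting_process("B", 200, event=1, changes=[(21, 1)])
print(rows_a)
print(rows_b)
```

Because the intervals of one subject never overlap, the subject is in at most one risk set row at any time point, which is the property noted above.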

2.9 Other Survival Models

Buckley-James regression

A least squares approach to survival data was first introduced by (Miller, 1976). Buckley and James (Buckley and James, 1979) presented a type of linear regression able to incorporate censored data, which has been shown to have good statistical properties (Miller and Halpern, 1982; Heller and Simonoff, 1990). The Buckley-James model assumes that the survival time T is linearly related to the covariate vector X:

$$T_i = \beta_0 + \beta X_i + \epsilon_i,$$

where the $\epsilon_i$ are i.i.d. with $E[\epsilon_i] = 0$ and $Var(\epsilon_i) = \sigma^2$, and are distributed independently of $X_i$. Since for right-censored observations we only observe $C_i$, the usual least squares regression approach is not applicable.


Buckley and James defined their regression on a new response variable $T_i^*$:

$$T_i^* = T_i \delta_i + E[T_i \mid T_i > C_i](1 - \delta_i),$$

recalling that δ = 1 when the event is registered and δ = 0 when the observation is censored. This expression replaces the censored survival times $C_i$ with $E[T_i \mid T_i > C_i]$. Their approach to the calculation of this conditional expectation was (Buckley and James, 1979):

$$
\begin{align}
E[T_i \mid T_i > C_i] &= E[\beta_0 + \beta X_i + \epsilon_i \mid \beta_0 + \beta X_i + \epsilon_i > C_i] \\
&= \beta_0 + \beta X_i + E[\epsilon_i \mid \epsilon_i > C_i - \beta_0 - \beta X_i] \\
&= \beta_0 + \beta X_i + \int_{C_i - \beta_0 - \beta X_i}^{\infty} \frac{\epsilon}{1 - F(C_i - \beta_0 - \beta X_i)}\,dF(\epsilon)
\end{align}
$$

where F is the distribution of ε obtained through the Kaplan-Meier estimator. Since this estimate depends on β itself, the estimation is carried out iteratively. A few reasons can be given for using the Buckley-James regression instead of the Cox regression (Stare et al., 2000):

The assumption of proportionality of hazards may not hold.

Predicting survival times with the Cox model requires estimating a baseline hazard $h_0(t)$, which is not part of the Cox model estimation process.

The results of a Cox model are less intuitive than the results of a linear fit.

Survival trees by Goodness of Split

A tree-based approach to survival data was introduced by (Leblanc and Crowley, 1993). The method recursively partitions the data until it has been split into many regions, each containing only a few observations. The splitting rule is the maximization of a standardized two-sample log-rank test statistic, which is called the goodness of split G. Each partition is made under the CART framework (Breiman et al., 1984), and the authors further restrict themselves to splits on a single covariate $X_j$, which can be described by the following rules:

1. Each split depends on the value of one predictor $X_j$.

2. If $X_j$ is an ordered variable, the partition is made by finding a split point c such that one group has $X_j < c$ and the other has $X_j \geq c$.

3. If $X_j$ is nominal with values in $B = \{b_1, ..., b_r\}$, the partition is made into non-empty disjoint subsets of B.

2.10 Performance Metrics for Survival Models

2.10.1 Somers’ D

Somers’ D (Somers, 1962) is an asymmetric measure of association between two variables. Given a predictor variable Z and an outcome variable T, we may estimate $D_{TZ}$ as a measure of the performance of Z as a predictor of T. For example, taking Z as the hazard predicted by a Cox model fitted on a dataset, we can check how well the estimated hazard predicts the survival times of the subjects. The definition of Somers’ D is usually expressed in terms of Kendall’s $\tau_{ZT}$ (Kendall and Gibbons, 1990):

$$\tau_{ZT} = E\left[\mathrm{sign}(Z_i - Z_j)\,\mathrm{sign}(T_i - T_j)\right];$$

$$
\mathrm{sign}(a) =
\begin{cases}
-1, & \text{if } a < 0 \\
1, & \text{if } a > 0 \\
0, & \text{if } a = 0
\end{cases}
$$

where $Z_i$ is the predictor variable for individual i, and $T_i$ the outcome of individual i. The expression above is unable to incorporate censored values, because when a value is censored there is uncertainty about which of the two values in a pair is greater. In order to incorporate censoring, the sign factors are replaced with a censored sign difference, csign. The version given by (Newson, 2006) considers the general case with both right and left censoring. Here we give the particular case for right-censored survival data coded with the standard event indicators $\delta \in \{0, 1\}$:

$$
\mathrm{csign}(a_i, \delta_i, a_j, \delta_j) =
\begin{cases}
1, & \text{if } a_i > a_j \text{ and } \delta_j = 1 \\
-1, & \text{if } a_i < a_j \text{ and } \delta_i = 1 \\
0, & \text{otherwise}
\end{cases}
$$

where $\delta_i$, $\delta_j$ are the event indicators (0 for censored) of $a_i$ and $a_j$ respectively. This operator has an intuitive interpretation. Using an analogy with survival times and event indicators, if $a_i$ is apparently longer than $a_j$, we can only be sure that it is indeed longer if $a_j$ is not censored; otherwise there is uncertainty and, as random censoring is assumed, the expected difference between the two is zero. In the second case, if $a_i$ is apparently shorter than $a_j$, certainty can only be assured when $a_i$ is not censored.

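With this operator a new quantity $\tau^{cens}_{ZT}$, analogous to $\tau_{ZT}$, can be defined:

$$\tau^{cens}_{ZT} = E\left[\mathrm{csign}(Z_i, R^Z_i, Z_j, R^Z_j)\,\mathrm{csign}(T_i, \delta_i, T_j, \delta_j)\right]$$

where, following the argument order of csign, $R^Z_i$ and $R^Z_j$ play the role of event indicators for $Z_i$ and $Z_j$. As in survival studies the only source of censoring is usually the T variable, the formula above reduces to:

$$\tau^{cens}_{ZT} = E\left[\mathrm{sign}(Z_i - Z_j)\,\mathrm{csign}(T_i, \delta_i, T_j, \delta_j)\right]$$

The Somers’ D measure (Somers, 1962) is calculated by:

$$D_{TZ} = \frac{\tau^{cens}_{ZT}}{\tau_{ZZ}}$$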
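A sample version of these quantities can be sketched by replacing the expectations with sums over all ordered pairs of distinct individuals. The function names below are our own and the data are hypothetical; censoring is assumed to affect only T, as in the reduced formula.

```python
from itertools import permutations

def sign(a):
    """Return -1, 0 or 1 according to the sign of a."""
    return (a > 0) - (a < 0)

def csign(ai, di, aj, dj):
    """Censored sign difference for right-censored values (event indicator 1 = observed)."""
    if ai > aj and dj == 1:
        return 1
    if ai < aj and di == 1:
        return -1
    return 0

def somers_d(z, t, delta):
    """Sample D_TZ: ratio of the pair sums estimating tau_cens_ZT and tau_ZZ."""
    pairs = list(permutations(range(len(z)), 2))
    tau_zt = sum(sign(z[i] - z[j]) * csign(t[i], delta[i], t[j], delta[j]) for i, j in pairs)
    tau_zz = sum(sign(z[i] - z[j]) ** 2 for i, j in pairs)
    return tau_zt / tau_zz

# Hypothetical predictor that orders the subjects exactly as their survival times do.
z = [1, 2, 3, 4]
t = [1, 2, 3, 4]
print(somers_d(z, t, [1, 1, 1, 1]))  # no censoring
print(somers_d(z, t, [1, 0, 1, 1]))  # second subject censored
```

With the second subject censored, the pairs in which that subject has the shorter time contribute zero, so D drops from 1 to 2/3 even though the ordering is unchanged. If Z is a risk score, larger values are expected to go with shorter survival times, so D is negative for a good model unless the sign of Z is flipped.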

2.10.2 Harrell’s concordance c index

Harrell’s concordance index c (Harrell et al., 1982) can be calculated directly from Somers’ D:

$$c = \frac{D_{TZ}}{2} + 0.5$$

However, this index can be interpreted in a more intuitive way: it measures the proportion of concordant pairs among all permissible pairs of individuals. It can be calculated with the following steps:

1. Form all possible pairs over the data.

2. Omit the pairs whose shorter survival time is censored. Omit pairs i and j if $T_i = T_j$ unless at least one of them is dead. The remaining pairs are the permissible pairs.

3. For each permissible pair where $T_i \neq T_j$, count 1 if the shorter survival time has the higher predicted risk, and count 0.5 if the predicted outcomes are tied. For $T_i = T_j$ with both individuals dead, count 1 if the predicted risks are tied. For each permissible pair where $T_i = T_j$ and only one is censored, count 1 if the uncensored one has the higher predicted risk.

4. In the tied-time cases not specified above, count 0.5. A permissible pair with $T_i \neq T_j$ whose shorter survival time has the lower predicted risk counts 0.

5. The C-index is given by

$$C = \frac{\text{Concordance}}{\text{Permissible Pairs Count}}$$
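The steps above can be sketched as a direct pair enumeration. The function below is an illustration written for this text, not the implementation of any particular package. It assumes that discordant pairs with distinct times count 0 and that the 0.5 fallback applies to the tied-time cases; the sample is hypothetical.

```python
from itertools import combinations

def harrell_c(times, events, risk):
    """Concordance index from pair counting; risk holds the predicted risks (higher = earlier failure)."""
    concordance, permissible = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] != times[j]:
            shorter, longer = (i, j) if times[i] < times[j] else (j, i)
            if events[shorter] == 0:
                continue                                   # shorter time censored: not permissible
            permissible += 1
            if risk[shorter] > risk[longer]:
                concordance += 1.0
            elif risk[shorter] == risk[longer]:
                concordance += 0.5
        else:
            if events[i] == 0 and events[j] == 0:
                continue                                   # tied times, both censored: not permissible
            permissible += 1
            if events[i] == 1 and events[j] == 1:
                concordance += 1.0 if risk[i] == risk[j] else 0.5
            else:
                dead, censored = (i, j) if events[i] == 1 else (j, i)
                concordance += 1.0 if risk[dead] > risk[censored] else 0.5
    return concordance / permissible

# Hypothetical sample: the predicted risk decreases as the survival time increases.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
risk = [4, 3, 2, 1]
print(harrell_c(times, events, risk))
```

In this sample the two pairs whose shorter time belongs to the censored subject are dropped, and the four remaining permissible pairs are all concordant.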

2.10.3 Time-dependent ROC

The Receiver Operating Characteristic (ROC) curve is a common measure of predictive ability in binary classification problems. In general, the model outputs a continuous value, and one of the outcomes is chosen by comparing it with a threshold. Given the model output Y, a typical classification setting would be:

$$
\text{Outcome} =
\begin{cases}
1 & \text{if } Y \geq Y_{thr} \\
0 & \text{if } Y < Y_{thr}
\end{cases}
$$

The ROC curve is particularly useful for assessing the discrimination ability of the model, because it does not depend on any particular threshold value. To calculate the ROC curve, the classification is done for every possible threshold value, and the true positive rate (sensitivity) and false positive rate (1 − specificity) are computed and plotted. Usually the false positive rate is placed on the horizontal axis and the sensitivity on the vertical axis. Generally a survival problem does not fit the binary setting specified above and, even if we define a priori two classes depending on a chosen threshold, survival data also presents censoring. To extend the use of ROC to survival data, (Heagerty et al., 2000) developed the time-dependent ROC. In their work, they take as binary outcome a modified event indicator D(t), which takes D(t) = 0 if the event did not happen until time t and D(t) = 1 otherwise. Since the value of the binary outcome depends on the time at which it is evaluated, there is a ROC curve for every time instant $t_i$. Each curve ROC($t_i$) is obtained by computing the sensitivity and specificity for each possible value of the threshold $Y_{thr}$. Assuming that higher values of the model prediction Y indicate higher risk, that is, shorter survival, sensitivities and specificities are given by:

$$\text{sensitivity}(Y_{thr}, t_i) = P(Y > Y_{thr} \mid D(t_i) = 1)$$

$$\text{specificity}(Y_{thr}, t_i) = P(Y \leq Y_{thr} \mid D(t_i) = 0)$$

In Figure 2.4 we have the ROC(t) curves measuring the predictive ability and discrimination of a Cox regression fit on the WHAS100 dataset. The curves were calculated using the R package “survivalROC” (Heagerty and packaging by Paramita Saha-Chaudhuri, 2013):

[Figure 2.4: six panels of ROC curves plotting TPR against FPR; the first panel is titled “ROC(1 year) AUC=0.775000”.]

Figure 2.4: Example of ROC(t).

2.11 Counting Processes Framework

Introduced by Aalen in (1975) and (1978), the counting process approach has been a source of many important developments and continues to be one of the most studied fields in state-of-the-art survival analysis. Issues such as left censoring, right censoring, left truncation and time-dependent covariates can be elegantly incorporated in this framework. In this text we skip the heavy mathematical theory regarding stochastic integrals, martingales and measure theory. In (Aalen et al., 2008) the authors give a very intuitive explanation of the counting process modeling of survival data, which will serve as the basis for the exposition made in the next section. This framework is based on the theory of stochastic processes and stochastic integrals, in particular martingale theory. First we define the event counting process N(t), which counts the number of events that occurred until time t. For one individual with survival time T and event indicator δ, it can be written using the indicator function I:

$$N(t) = I(T \leq t, \delta = 1); \tag{2.29}$$

Then we define the at-risk process, a decreasing process that indicates whether the individual is still in the risk set at time t; summed over individuals, it gives the number of subjects in the risk set at time t:

$$Y(t) = I(T \geq t); \tag{2.30}$$

The expected number of events is related to the survival curves of the individuals. To obtain the process giving the expected number of individuals that fail until $t_j$, consider that at each time instant an individual's event is a Bernoulli random variable with probability p, equal for all subjects. The total number of events then becomes a binomial random variable with expected value np. In the counting process framework, n is given by the process Y(t) and each probability p by the population hazard function h(t). Since these values change over time, the expected value is obtained by integrating their product:

$$\Lambda(t) = \int_0^t Y(u) h(u)\,du. \tag{2.31}$$

The difference between these two quantities is called the counting process martingale:

$$M(t) = N(t) - \Lambda(t) \tag{2.32}$$

M(t) possesses the martingale property (Mikosch, 1998). In substance, this means that the process is driftless: its expected value at a given time in the future is equal to the current value of the process.
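The three processes can be evaluated on a sample to make the definitions concrete. The sketch below aggregates N(t) and Y(t) over individuals and approximates Λ(t) with a midpoint Riemann sum. The constant hazard of 0.1 and the sample are hypothetical choices made only for illustration.

```python
def counting_processes(times, events, t):
    """Aggregate N(t) = sum_i I(T_i <= t, delta_i = 1) and Y(t) = sum_i I(T_i >= t)."""
    n_t = sum(1 for ti, di in zip(times, events) if ti <= t and di == 1)
    y_t = sum(1 for ti in times if ti >= t)
    return n_t, y_t

def compensator(times, t, hazard, steps=10_000):
    """Approximate Lambda(t) = integral from 0 to t of Y(u) h(u) du by a midpoint Riemann sum."""
    du = t / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * du
        y_u = sum(1 for ti in times if ti >= u)   # number at risk at time u
        total += y_u * hazard(u) * du
    return total

# Hypothetical sample and a hypothetical constant population hazard.
times = [2, 3, 3, 5, 6, 8, 9, 9]
events = [1, 1, 0, 1, 0, 1, 1, 0]
n_4, y_4 = counting_processes(times, events, 4)
lambda_4 = compensator(times, 4, hazard=lambda u: 0.1)
print(n_4, y_4, lambda_4, n_4 - lambda_4)   # the last value is M(4) = N(4) - Lambda(4)
```

For this sample the at-risk count is 8 on (0, 2], 7 on (2, 3] and 5 on (3, 4], so Λ(4) is 0.1 × (16 + 7 + 5) = 2.8, and with two observed events by time 4 the martingale takes the value −0.8.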


Chapter 3

Survival outlier detection and robust

estimation

The fields of outlier detection and robust estimation intersect very often. Outlier detection concerns the identification of outlying observations in a dataset. Robust estimation, or robust inference, addresses the consequences of having outliers in the data when applying analysis methods such as fitting regressions or computing statistics. There are several views and definitions of what is considered an outlier, and definitions vary greatly with the application. For example, (Hawkins, 1980) defines an outlier as an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism than the remaining data. (Johnson et al., 1992) define an outlier as an observation in a data set which appears to be inconsistent with the remainder of that set of data, which alludes to a more parametric view. In econometrics, (Jeffrey Wooldridge) gives an informal definition regarding outliers in an OLS regression context: loosely speaking, an observation is an influential observation if dropping it from the analysis changes key OLS estimates by a practically large amount. Informal definitions are also plentiful in the survival field: (Nardi and Schemper, 1999) define outlying observations as individuals whose survival time is too short or too long with respect to the values of their covariates. In a regression context, the term outlier is very often used interchangeably with overly influential, a term that refers to the influence that one particular observation has on the estimated model parameters, for example the slope parameter of a linear regression.

Taxonomy of Outlier Detection Methods

Outlier detection methods can be divided into several classes. In relation to data dimension, there is a clear division between univariate methods and multivariate methods. Another important categorization is between parametric and non-parametric methods. Parametric methods are based on the assumption that the data follows a known underlying distribution, or at least a distribution of known form with unknown parameters. These methods flag as outliers the observations that deviate from the a priori model. The class of non-parametric methods contains only model-free techniques, in particular distance-based methods and clustering techniques. Regarding the method's output, hard classifiers label each observation as outlier or not, while the so-called soft classifiers output an outlyingness score for each observation. Another important distinction (Davies and Gather, 1993) is between single-step and sequential procedures. Single-step procedures do not rely on inward or outward removal of observations, while sequential procedures rely on eliminating the most outlying observations at each step (outward) or including the least outlying at each step (inward).

Swamping and masking effects.

When trying to identify outliers, one must rely on a measure of outlyingness (for example, residuals) to assess whether a given observation is in fact an outlier. When performing such an analysis, swamping and masking phenomena are very likely to occur. Here we give their definitions, following (Ben-Gal, 2005):

Swamping Effect: one outlier observation swamps a second observation if the latter can be considered an outlier only in the presence of the first, but not by itself.

Masking Effect: one outlier masks another outlier if the second outlier can be considered an outlier only by itself, but not in the presence of the first outlier.

When performing model-based outlier detection, as will be our case, swamping is prone to generate false positives. These occur because the model fit is biased by the presence of the true outliers, and consequently inlying observations can appear as outliers in relation to the fitted model. Masking will potentially generate false negatives: true outliers are hidden, or “masked”, again because of the bias introduced in the model by the presence of true outliers.

Swamping and masking: a practical example

Consider the artificial data depicted in Figure 3.1. It is composed of 8 inliers that follow a strict linear trend and two outlying observations (9 and 10). One possible measure of outlyingness is the regression residual of each observation with respect to the fitted line. Table 3.1 lists the observations sorted by their linear regression residual. Instead of having the two outliers in the top two positions, we have observations 8 and 10, and observation 9 appears only in the fourth position. This is a very clear example of swamping and masking effects: observation 8 is being swamped by the two outliers, and observation 9 is being masked by observation 10.
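
The same kind of ranking can be reproduced with a short sketch. The data below are hypothetical (they are not the values behind Figure 3.1 or Table 3.1): eight points on an exact line plus two points shifted upwards. The helper name `fit_line` is an assumption made for illustration only.

```python
# Hypothetical data: observations 1-8 follow y = 2x exactly;
# observations 9 and 10 are shifted upwards to act as outliers.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 6, 8, 10, 12, 14, 16, 30, 45]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Rank observations (1-based labels) by absolute residual, largest first.
ranking = sorted(range(1, len(xs) + 1), key=lambda i: -abs(residuals[i - 1]))
for obs in ranking:
    print(obs, round(abs(residuals[obs - 1]), 2))
```

With these hypothetical values, observation 10 ranks first, but observation 9 falls behind inliers 7 and 8: the fitted line is pulled towards the outliers, so the last inliers receive large residuals (swamping) while observation 9 sits close to the biased line (masking).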

Cox Regression Residual Analysis

As in typical regression analysis, one of the most traditional methods to detect overly influential observations is to compute the leverage the observation has on the regression line (Rousseeuw and Leroy, 2005). The leverage is a value proportional to $(X - \bar{X})^2$. The flaw of this method is that it does not take into account the outcome of the individual; distribution-based methods such as the box plot suffer from the same limitation. In survival analysis, a model-based approach is very typical when assessing for outliers. In this class, residuals specific to the Cox proportional hazards model are by far the most used in practice. In this section we review some of these residuals: Cox-Snell residuals, martingale residuals and

Figure 3.1: Example of linear regression fit on a data set with outliers.

Table 3.1: Linear regression example: observations sorted by residuals.

Observation   Residual
8             25.89
10            24.12
7             10.08
9             8.48
1             1.66
6             1.59
2             0.42
5             0.39
4             0.14
3             0.00

deviance residuals. We then review some outlier detection methods typically used in a regression context:

the likelihood displacement statistic and DFBETAS.

Cox-Snell Residuals

The Cox-Snell residuals were the first kind of residuals to be defined for the Cox regression (Cox and Snell, 1968), and they are still widely used when assessing for outlying observations. The Cox-Snell residual for individual $i$ is given by:

$$r_{C_i} = \hat{H}_i(T_i) = e^{\beta x_i} H_0(T_i); \tag{3.1}$$

the baseline cumulative hazard $H_0$ is usually obtained using the Nelson-Aalen estimator (Collett, 2003). It is important to notice that, as $H(t)$ is a monotonically increasing function, censored observations are biased towards lower residual values. To correct this, the modified Cox-Snell residuals were introduced as:

$$r'_{C_i} = \begin{cases} r_{C_i} & \text{for observed event times} \\ r_{C_i} + 0.693 & \text{for censored event times} \end{cases} \tag{3.2}$$

Martingale residuals

As seen in section 2.11, the martingale counting process is a residual-like quantity that expresses the difference between the observed and the expected number of events. Assigning to every individual its own martingale process, we have:

$$M_i(t) = N_i(t) - \Lambda_i(t) \tag{3.3}$$

The martingale residual is defined as the value of the process $M_i(t)$ at the time of failure or censoring (the follow-up time). As $N_i(t)$ takes the value 1 if the event is observed and 0 when censored, the residuals are given simply by (Therneau et al., 1990):

$$r_{M_i} = \delta_i - \Lambda_i(T_i^C) \tag{3.4}$$

where $\delta_i$ is the event indicator for individual $i$, and $\Lambda_i(T_i^C)$ is the value of the cumulative hazard function at the follow-up time of individual $i$. The value of $\delta_i$ is present in the data; $\Lambda_i(T_i^C)$ can be calculated from the estimated Cox model with the choice of an appropriate baseline hazard function $h_0(t)$.

Deviance Residuals

The skewness of martingale residuals makes their plots difficult to interpret. Therneau et al. (1990) introduced the deviance residuals in order to obtain residuals that are more symmetrically distributed around zero. They are given by:

$$r_{D_i} = \operatorname{sgn}(r_{M_i}) \left[ -2 \left\{ r_{M_i} + \delta_i \log(\delta_i - r_{M_i}) \right\} \right]^{1/2} \tag{3.5}$$
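
As a minimal sketch of Eqs. 3.4 and 3.5, the functions below compute both residuals for a single individual. The function names are hypothetical, and the cumulative hazard at the follow-up time is assumed to be supplied by an already fitted model (it must be positive when the event is observed).

```python
import math

def martingale_residual(delta, cum_hazard):
    """Eq. 3.4: event indicator minus cumulative hazard at the follow-up time."""
    return delta - cum_hazard

def deviance_residual(delta, cum_hazard):
    """Eq. 3.5 applied to the martingale residual of one individual."""
    r_m = martingale_residual(delta, cum_hazard)
    # delta * log(delta - r_m) is taken as 0 for censored observations (delta = 0).
    log_term = math.log(delta - r_m) if delta == 1 else 0.0
    inner = -2.0 * (r_m + log_term)
    sign = (r_m > 0) - (r_m < 0)
    # max() guards against tiny negative values caused by rounding.
    return sign * math.sqrt(max(inner, 0.0))

print(deviance_residual(1, 0.5))   # observed event with low cumulative hazard
print(deviance_residual(0, 0.5))   # censored observation
```

A censored individual always has a non-positive martingale residual, so its deviance residual is non-positive as well; the square root compresses the long negative tail of the martingale residuals.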

The likelihood displacement statistic

Let $\hat{\beta}$ be the value of $\beta$ that maximizes the partial Cox likelihood and $\hat{\beta}_{(-i)}$ the estimate when observation $i$ is eliminated from the fitting. The likelihood displacement statistic (LD) (Collett, 2003) is given by:

$$LD_i = 2 \log L(\hat{\beta}) - 2 \log L(\hat{\beta}_{(-i)}) \tag{3.6}$$

Under the null hypothesis $\beta_{(-i)} = \beta$, the LD statistic follows a chi-square distribution with one degree of freedom. We therefore calculate the p-value of this test for every observation; the most significant ones are considered the most outlying observations.
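
The computation can be sketched as follows, assuming the two log-likelihood values have already been obtained from a fitted model. The function names are hypothetical; the upper tail of a chi-square distribution with one degree of freedom is evaluated with the complementary error function from the standard library.

```python
import math

def likelihood_displacement(loglik_full, loglik_deleted):
    """Eq. 3.6: LD_i = 2 log L(beta_hat) - 2 log L(beta_hat_(-i))."""
    return 2.0 * loglik_full - 2.0 * loglik_deleted

def chi2_1df_p_value(statistic):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))

# Hypothetical log-likelihood values for one deleted observation.
ld = likelihood_displacement(-100.0, -101.5)
print(ld, chi2_1df_p_value(ld))
```

Observations would then be sorted by increasing p-value, so that the most significant displacements come first.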

DFBETAS

One common criterion for assessing the influence of one particular observation is to measure its impact on the estimated parameter vector $\beta$. DFBETAS measure the change in the estimated coefficients upon deleting each observation in turn, scaled by their standard errors (Harrell, 2001). More formally, the $j$-th component of DFBETAS for a given observation $i$ is given by:

$$DFBETAS_j = \left( \hat{\beta}_j - \hat{\beta}_j^{-i} \right) / \sigma_{\beta}. \tag{3.7}$$

The standard deviation $\sigma_{\beta}$ can be obtained as the square root of the corresponding diagonal component of the inverse of the negative partial likelihood Hessian matrix (2.19). Since DFBETAS are a vector-valued quantity, analyzing the components associated with each covariate allows us to study in which components the observation shows an outlying behavior.

Finite Sample Breakdown Point

The concept of breakdown point was introduced by Hampel in 1968, who provided an asymptotic definition. In this work we review it in a finite-sample context (Donoho and Huber, 1983). Qualitatively, the breakdown point corresponds to the smallest fraction of data that may cause an estimator to take on arbitrary values (Huber, 2011). For instance, when estimating the mean of a sample by the sample mean estimator, one single corrupt observation is able to offset the estimator by an arbitrarily large value; in this case the breakdown point is said to be $1/N$, where $N$ is the dataset size. On the other hand, to offset the median estimator by an arbitrary value, half of the observations need to be corrupt; in this case the breakdown point is $1/2$.
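
A small numerical illustration of the two breakdown points, using hypothetical values and the standard `statistics` module: a single corrupted observation moves the sample mean arbitrarily far, while the median is unchanged.

```python
import statistics

clean = [2.0, 3.0, 4.0, 5.0, 6.0]
# Replace a single observation by an arbitrarily large value.
corrupted = clean[:-1] + [1e9]

mean_clean, mean_corrupted = statistics.mean(clean), statistics.mean(corrupted)
median_clean, median_corrupted = statistics.median(clean), statistics.median(corrupted)

print(mean_clean, mean_corrupted)       # the mean is dragged towards the corrupt value
print(median_clean, median_corrupted)   # the median stays at the central observation
```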

Robust methods for the Cox model estimation

It has been shown that the estimation process of a Cox model can be severely affected by the presence of overly influential or outlying observations. In terms of robustness, it has been pointed out (Kalbfleisch and Prentice, 2011; Struthers and Kalbfleisch, 1986) that the Cox regression has a breakdown point of $1/N$. In order to perform robust Cox regression in the presence of outliers, Farcomeni and Viviani (2011) proposed a modified Cox model that is fit by trimming the smallest contributions to the partial likelihood. Given a trimming level $\alpha$, typically between 0.1 and 0.2, the problem corresponds to finding the subset of cardinality $(1-\alpha)n$ that maximizes the Cox partial likelihood. Their results indicate that trimming increases the robustness of the method: on simulated data with contamination levels of 5%, the trimmed model produced the estimates closest to those obtained from the uncontaminated data. Unfortunately, the authors did not publish any code or package for download, so we are not able to use it in our results section for comparison with the proposed methods. Bednarski (1993) proposed another way to increase the robustness of the Cox model estimation, which consists of modifying the maximization process of the Cox partial likelihood. This method is available through the routine “coxr” from the “coxrobust” R package (Bednarski and Borowicz, 2006).

Chapter 4

Proposed Methods

In this chapter we present three new methods for outlier detection in a survival analysis context. We start by reviewing the Bootstrap resampling procedure and related concepts. Next, we explain how a test statistic, in conjunction with a resampling technique (like the Bootstrap), can be used to perform outlier detection. The test statistic used must be sensitive to the presence of outliers. Our three methods rely on the performance of a survival model as a statistic sensitive to outliers; in particular, we use Harrell's concordance c-index (Harrell et al., 1982). The rationale is the belief that the larger the amount of outliers present in the data, the lower the performance of the model.

The first proposed method is One Step Deletion (OSD), which maximizes the concordance of the model on a dataset by removing the most likely outlying observation at each step. The second method is the Bootstrap Hypothesis Test (BHT), which performs a hypothesis test for each observation in the dataset, where the null hypothesis states that the observation does not increase concordance when absent from the data, and thus is not an outlier; the significance of this test is used as the measure of outlyingness. The third method is the Dual Bootstrap Hypothesis Test (DBHT), which extends BHT by performing a hypothesis test on the inequality of two random variables, produced by resampling concordance under two different variants of the bootstrap specially designed for this method.

General Strategy

The proposed methods share the same underlying strategy. Let a dataset be denoted by $D$ with observations $d_1, d_2, ..., d_N$, and let a sample from this dataset be denoted by $D^*$. Let $\tau$ represent an arbitrary test statistic, with $\tau(D^*)$ representing the value of the test statistic on sample $D^*$. To search for outliers, we study the impact of removing each observation $d_i$ on the value of $\tau$, again with the requirement that $\tau$ must be sensitive to the presence of outliers. Bootstrap resampling is employed in order to better assess the impact that observation $d_i$ has on concordance, aiming to make this assessment more resistant to masking and swamping interactions.

The Bootstrap

The Bootstrap (Efron, 1979) is a resampling technique whose main goal is to recreate the underlying distribution of the data. It is used when the underlying distribution is unknown or simplifying assumptions are not reasonable. Bootstrapping is useful when one wants to gain insight into the behaviour of a test statistic $\tau$ on the underlying distribution. Given a dataset $D$ with $N$ observations, one bootstrap sample is obtained by sampling $N$ observations from $D$ with replacement; the bootstrapped test statistic is obtained by calculating the value of $\tau$ for each bootstrap sample, as illustrated in Figure 4.1. One typical example (Efron, 1979) concerns the standard error of the sample mean, which can be estimated with the following Bootstrap procedure: 1) generate $B$ bootstrap samples from the original data; 2) calculate the sample mean for each of the $B$ bootstrap samples; 3) calculate the standard deviation of the $B$ sample means. The rationale behind this resampling procedure is that, when resampling with replacement, we are using the original dataset as a distribution; the original dataset, taken as an empirical distribution, is the best available approximation of the underlying distribution of the observed data.
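
The three-step procedure for the standard error of the sample mean can be sketched as follows. The data are simulated and hypothetical, the function name `bootstrap_standard_error` is an assumption, and the textbook value $s/\sqrt{N}$ is printed only as a point of comparison.

```python
import random
import statistics

def bootstrap_standard_error(data, statistic, n_boot, rng):
    """Standard deviation of `statistic` over n_boot bootstrap samples of `data`."""
    n = len(data)
    replications = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in range(n)]   # N draws with replacement
        replications.append(statistic(sample))
    return statistics.stdev(replications)

rng = random.Random(0)
data = [rng.gauss(10.0, 2.0) for _ in range(50)]          # hypothetical sample
se_boot = bootstrap_standard_error(data, statistics.mean, 2000, rng)
se_formula = statistics.stdev(data) / len(data) ** 0.5    # textbook value for comparison
print(round(se_boot, 3), round(se_formula, 3))
```

The same function works for any statistic that takes a list of observations, which is the property exploited by the methods in this chapter.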

Figure 4.1: Bootstrapping a test statistic $\tau$: from the original dataset $D = (d_1, d_2, ..., d_N)$, $B$ bootstrap samples $D^*_1, D^*_2, ..., D^*_B$ are drawn, yielding $B$ bootstrap replications $\tau(D^*_1), \tau(D^*_2), ..., \tau(D^*_B)$. Adapted from (Efron and Tibshirani, 1994).

The Plug-in principle

We denote by $F$ the true underlying distribution that generated dataset $D$. The empirical distribution $\hat{F}$ is defined as the discrete distribution that puts probability $1/N$ on each observation $d_i$, $i = 1, 2, ..., N$. The plug-in estimate (Efron and Tibshirani, 1994) of a parameter $\theta = \tau(F)$ is given by:

$$\hat{\theta} = \tau(\hat{F}).$$

This means that a quantity defined as a function of the probability distribution $F$ is estimated by the same function of the empirical distribution $\hat{F}$.

Bootstrap Hypothesis testing

In two of our methods we perform hypothesis tests following a Bootstrap approach, further explained in (MacKinnon, 2009). Given a test statistic $\tau$ with an observed value $\tau_0$ on $D$, to assess where $\tau_0$ lies on the distribution $F(\tau)$, we apply the plug-in principle, given that $F$ is unknown. For example, suppose we want to calculate the approximate p-value $\hat{p}$ for the test $H_0: \tau > \tau_0$. Using a bootstrap approach, this can be done by generating $B$ bootstrap samples and then calculating the fraction of samples whose values of $\tau$ are larger than $\tau_0$:

$$\hat{p} = P(\tau > \tau_0) = \frac{1}{B} \sum_{j=1}^{B} I\left( \tau(D^*_j) > \tau_0 \right), \tag{4.1}$$

where $I$ represents the indicator function.
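
Eq. 4.1 translates almost directly into code. The sketch below is a minimal version under the assumption that the statistic is any function of a list of observations; the name `bootstrap_p_value` is hypothetical.

```python
import random

def bootstrap_p_value(data, statistic, tau_0, n_boot, rng):
    """Fraction of bootstrap replications of `statistic` that exceed tau_0 (Eq. 4.1)."""
    n = len(data)
    exceed = 0
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in range(n)]   # one bootstrap sample
        if statistic(sample) > tau_0:                    # indicator function I(.)
            exceed += 1
    return exceed / n_boot

data = [1, 2, 3, 4, 5]
# The maximum of a bootstrap sample can never exceed the maximum of the data.
print(bootstrap_p_value(data, max, 5, 100, random.Random(1)))
```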

4.1 Motivation for the use of Bootstrapping

The semi-clairvoyant outlier counter

In this section we explain the idea behind using a bootstrap approach to perform outlier detection. We start by defining a test statistic that is sensitive to the presence of outliers and show how it can be used for outlier detection. Finally, we propose a way of improving the utility of such a statistic by using a bootstrap approach. As outlier-sensitive test statistic we present what we call the semi-clairvoyant outlier counter (SOC). This test statistic counts outliers in an imperfect way, simulating an entity with some expertise in counting outliers that sometimes misses true outliers and other times takes inliers for outliers. First we define $I_{sc}$, the semi-clairvoyant outlier indicator function, with the following characteristics:

$$I_{sc}(d^*_i) = \begin{cases} \text{Bernoulli}(p_{TP}) & \text{when } d_i \text{ is an outlier} \\ \text{Bernoulli}(p_{FP}) & \text{when } d_i \text{ is not an outlier (inlier);} \end{cases} \tag{4.2}$$

where $p_{TP}$ represents the probability of counting a true outlier as an outlier (true positive), and $p_{FP}$ the probability of counting an inlier as an outlier (false positive). The SOC is given by summing the indications over all observations in a given data sample $D^*$:

$$SOC(D^*) = \sum_{i=1}^{n} I_{sc}(d^*_i); \tag{4.3}$$

It is fairly intuitive to verify that under certain conditions the SOC is an outlier-sensitive test statistic; in particular, if $p_{TP} > p_{FP}$ we can expect higher counts for data samples with larger amounts of outliers.

Using SOC for outlier detection

Considering a dataset $D$ of $N$ observations with $k < N$ outliers, our strategy to perform outlier detection using the SOC operator is based on its expected value under two different scenarios: 1) when one outlier is removed from $D$; and 2) when one inlier is removed from $D$. The remaining dataset, now with $N - 1$ observations, is denoted by $D^-$ when one outlier is removed, and by $D^+$ when the observation removed was an inlier. For the first scenario, the expected value of SOC on the remaining data is:

$$E\left[SOC(D^-)\right] = (N - k)\,p_{FP} + (k - 1)\,p_{TP} \tag{4.4}$$

$$= N p_{FP} + k\,(p_{TP} - p_{FP}) - p_{TP}, \tag{4.5}$$

Similarly, for the second scenario, the expected value is:

$$E\left[SOC(D^+)\right] = (N - k - 1)\,p_{FP} + k\,p_{TP} \tag{4.6}$$

$$= N p_{FP} + k\,(p_{TP} - p_{FP}) - p_{FP}. \tag{4.7}$$

Taking the difference between these two expected values, we have:

$$E\left[SOC(D^+)\right] - E\left[SOC(D^-)\right] = p_{TP} - p_{FP} \tag{4.8}$$

Considering $p_{TP} > p_{FP}$, we can expect lower values of SOC when an outlier is removed. With this expected difference under the two scenarios, the following outlier detection strategy could be devised:

1. For all observations remove one at a time.

2. For each removal calculate SOC on the remaining data.

3. The lower the value of SOC on the remaining data, the more outlying is the observation considered to

be.

One potential problem with this strategy is that the difference $p_{TP} - p_{FP}$ may be very small in relation to the standard deviation of the SOC statistic in each scenario, which is given by:

$$SD\left[SOC(D^-)\right] = \sqrt{(N - k)\,p_{FP}(1 - p_{FP}) + (k - 1)\,p_{TP}(1 - p_{TP})}; \tag{4.9}$$

$$SD\left[SOC(D^+)\right] = \sqrt{(N - k - 1)\,p_{FP}(1 - p_{FP}) + k\,p_{TP}(1 - p_{TP})}. \tag{4.10}$$

By inspection we see that the standard deviation grows with $N$. For example, with 100 individuals of which 10 are outliers, and a SOC with $p_{TP} = 0.8$ and $p_{FP} = 0.1$, the difference in expected values is $p_{TP} - p_{FP} = 0.7$, while the standard deviations of SOC for the outlier-removed and inlier-removed scenarios are, respectively, 3.09 and 3.10. So, when removing an observation, the variability of the statistic on the remaining data introduces a large amount of confusion in our method, since it is very large in comparison with the difference between expected values. To solve this problem we need an outlier-sensitive statistic with much lower variance; to achieve this, we employ Bootstrap resampling, as explained in the next section.
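
The numbers in this example can be checked with a few lines of code. The sketch treats SOC as a sum of independent Bernoulli indicators, as Eqs. 4.4 to 4.10 do; the function name `soc_moments` is hypothetical.

```python
import math

def soc_moments(n_inliers, n_outliers, p_tp, p_fp):
    """Mean and standard deviation of SOC as a sum of independent Bernoulli indicators."""
    mean = n_inliers * p_fp + n_outliers * p_tp
    var = n_inliers * p_fp * (1 - p_fp) + n_outliers * p_tp * (1 - p_tp)
    return mean, math.sqrt(var)

N, k, p_tp, p_fp = 100, 10, 0.8, 0.1
mean_minus, sd_minus = soc_moments(N - k, k - 1, p_tp, p_fp)   # one outlier removed
mean_plus, sd_plus = soc_moments(N - k - 1, k, p_tp, p_fp)     # one inlier removed

print(round(mean_plus - mean_minus, 2))          # difference in expected values
print(round(sd_minus, 2), round(sd_plus, 2))     # standard deviations of each scenario
```

The difference in expected values (0.7) is less than a quarter of either standard deviation, which is why a single evaluation of SOC cannot separate the two scenarios.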

Bootstrapping the SOC test statistic

For typical cases, the standard deviations in Eqs. 4.9 and 4.10 will be much higher than the difference between the expected values of the two scenarios (Eq. 4.8). It is predictable that a single calculation of the SOC statistic when removing each observation will have very low significance due to the high variance of SOC. To mitigate this effect, a bootstrap approach can be employed. Even though SOC was described as a stochastic test statistic, we consider that it takes the same value for the same dataset, and that the probabilities reflect the uncertainty about the data sample to which SOC is applied. Drawing $B$ bootstrap samples from $D$, we can define a new test statistic named $SOC_B$, corresponding to the mean SOC over the $B$ bootstrap samples $D^*_1, D^*_2, ..., D^*_B$:

$$SOC_B = \frac{\sum_{i=1}^{B} SOC(D^*_i)}{B}$$

By the linearity of the expected value, the expected difference between scenarios continues to be $(p_{TP} - p_{FP})$, while the standard deviation is now given by:

$$SD\left[ \frac{\sum_{i=1}^{B} SOC(D^*_i)}{B} \right] = \sqrt{\frac{B}{B^2}\, Var\left( SOC(D^*_i) \right)} \tag{4.11}$$

$$= \sqrt{\frac{1}{B}} \times SD\left[ SOC(D^*_i) \right]; \tag{4.12}$$

where $SD\left[SOC(D^*_i)\right]$ is the standard deviation seen in expression 4.9. This last expression shows the advantage of intensive simulation in reducing the variance of the SOC statistic. For the same setting as in section 4.1, with $B = 100$, instead of a standard deviation of about 3 we would have an approximate standard deviation of $\sqrt{\frac{1}{100}} \times 3 = \frac{3}{10} = 0.3$. This ultimately allows for a significant outlying score. To achieve more significance for the outlying scores we have to increase $B$; its value necessarily depends on the size of the data set and the level of significance one wants for the statistic.

Concordance as an outlier-sensitive statistic

The choice of test statistic to use as a potential outlier-sensitive statistic greatly influences the performance of our methods. As mentioned, the statistic we use is related to a survival model's performance, which we assess with Harrell's concordance c-index. The main assumption underlying our methods is that the c-index of a survival model fit on a given dataset will increase as the quantity of outliers in the data decreases. Behind this choice is the fact that the c-index is a rank measure: it only measures how well predicted values are concordant with the rank-ordered response variables. For example, the c-index for two patients with predicted hazard ratios of 0.4 and 0.6 is the same as if the patients had hazard ratios of 0.1 and 0.9 (Harrell, 2001); it only measures whether the predictions are concordant with the response variables or not. Thus, unlike measures such as the sum of squared errors, one observation by itself has a limited contribution to the overall concordance. This robustness may allow for the maximization of the c-index without worrying that it is being maximized at the cost of the majority of the data, only to better fit one or a cluster of outlying observations, as can happen with the sum of squared errors, as exemplified in (Fischler and Bolles, 1981).
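
To make the rank-based nature of the c-index concrete, the sketch below implements a naive pairwise version. It is an illustrative assumption rather than a library routine: the function name is hypothetical, tied follow-up times are simply skipped, and at least one comparable pair is assumed to exist.

```python
def concordance_index(times, events, risk_scores):
    """Naive O(n^2) c-index: share of comparable pairs ordered correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has the shorter time and i's event was observed.
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0      # higher predicted risk failed first
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5      # tied predictions count as half
    return concordant / comparable

# Two patients: only the ordering of the predicted risks matters.
c_close = concordance_index([5, 3], [1, 1], [0.4, 0.6])
c_far = concordance_index([5, 3], [1, 1], [0.1, 0.9])
print(c_close, c_far)
```

Both calls return the same value because the predicted risks are ordered identically in the two cases, which mirrors the hazard ratio example above.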

4.2 One Step Deletion

The One Step Deletion (OSD) algorithm removes a subset of observations in order to maximize the concordance of a model fit on the remaining data. This maximization is made by a greedy search. At each step, every observation is temporarily removed and the corresponding concordance is computed for the model on the remaining data. The observation whose removal causes the highest improvement in concordance is eliminated definitively from the data. Each of these eliminated observations is considered more outlying than the ones that remain in the dataset. The algorithm terminates when the quantity of removed observations equals a reasonable amount of expected outliers. The output consists of the subset of observations that were eliminated, which are considered the most outlying ones.

Input

Besides the input dataset $D$ and a survival model, this method has one input parameter $k$, corresponding to the maximum amount of outliers expected in the data. This parameter is needed since it is impossible to remove all observations: eventually the model estimation will cease to converge as the number of remaining observations gets too low.

Output

The output is the subset of the $k$ most outlying observations according to the method. No score of outlyingness is defined within this subset because, due to masking and swamping effects, one cannot conclude that the first observation to be removed was the most extreme outlier: more extreme outliers might be masked by this particular observation, or the removed observation might have been swamped by extreme outliers and not be an outlier at all.

Algorithm

The input parameter $k$ determines the number of steps of the algorithm. At each step one observation is removed from the data and takes no part in further steps. To decide which observation is removed, at each step the method leaves out one observation at a time and fits a Cox model on the remaining data; the observation whose removal leads to the highest concordance is removed. In the rare case when no removal improves the concordance of the model, the algorithm terminates, returning the observations removed until then. These steps are given in detail in Algorithm 1.

Algorithm 1 One Step Deletion

Inputs: D: input dataset; Model: survival model; k: number of expected outliers.
Output: subset of removed observations.
OutlierSet = ∅ {stores the observations already removed (outliers)}
count = 0 {counts the number of observations already removed}
D_actual = D {D_actual contains the actual set of remaining observations}
while count < k do
    C_0 = C(D_actual, Model) {compute the model concordance for the actual set}
    ΔC_candidate = 0 {initialize the concordance variation with zero}
    d_candidate = null {start with no candidate to be removed}
    for all d_i ∈ D_actual do
        D^i_actual = D_actual \ d_i {remove observation i from the actual set}
        ΔC_i = C(D^i_actual, Model) − C_0 {compute the concordance variation upon removing observation i}
        if ΔC_i > ΔC_candidate then
            ΔC_candidate = ΔC_i
            d_candidate = d_i {if the concordance improvement of i is larger than that of the previous candidate, i is now the candidate observation to be removed}
        end if
    end for
    if d_candidate ≠ null then
        OutlierSet = OutlierSet ∪ d_candidate {add the candidate to the output}
        D_actual = D_actual \ d_candidate {remove the observation from the actual set}
        count++ {increment the number of removed observations}
    else
        return OutlierSet {special case when no removal improves concordance}
    end if
end while
return OutlierSet
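
The greedy search of Algorithm 1 can be sketched in a few lines. The sketch is model-agnostic by assumption: `score` is a hypothetical callable standing in for "fit the survival model on these observations and return its concordance", so the fitting step itself is not shown.

```python
def one_step_deletion(data, score, k):
    """Greedy removal of up to k observations, each maximizing the gain in `score`."""
    remaining = list(data)
    removed = []
    while len(removed) < k:
        baseline = score(remaining)
        best_gain, best_index = 0.0, None
        for i in range(len(remaining)):
            candidate = remaining[:i] + remaining[i + 1:]   # drop observation i
            gain = score(candidate) - baseline
            if gain > best_gain:                             # strict improvement only
                best_gain, best_index = gain, i
        if best_index is None:          # no removal improves the score
            return removed
        removed.append(remaining.pop(best_index))
    return removed

# Toy stand-in for concordance: data sets with a smaller range score higher.
toy_score = lambda d: -(max(d) - min(d))
print(one_step_deletion([1, 2, 3, 4, 100], toy_score, 1))
```

Each step costs one score evaluation per remaining observation, so with a real model the method requires on the order of $kN$ model fits.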

Example

In this example we apply OSD to a dataset, to illustrate how it works and to analyze its output. As survival model we chose the Cox model, with the parameter $k$ equal to 15 (in this case 15% of the total observations). Table 4.1 shows the algorithm flow. The column $\Delta C$ displays the concordance improvement at each step (a removal). The last column, $C$, contains the concordance c-index of the fitted model on the remaining dataset. The respective output of the algorithm is given in Table ??; the outlyingness measure used is the order in which the observations were removed, so the sooner an observation gets removed, the more outlying it is considered to be. By inspecting Table 4.1 we verify that if the outlyingness measure were $\Delta C$, the ranking of the outliers would be different. Experimentally, sorting by order of removal has shown to be more successful at identifying the outliers; this may be because, due to masking and swamping, the values of $\Delta C$ depend strongly on the observations already removed.

Table 4.1: Evolution of the OSD algorithm when applied to an example dataset.

#step Observations removed ΔC C

0 {} 0 0.692

1 {1} 0.0111 0.7031

2 {1,67} 0.0099 0.7130

3 {1,67,97} 0.0112 0.7242

4 {1,67,97,51} 0.0068 0.7310

5 {1,67,97,51,23} 0.0089 0.7399

6 {1,67,97,51,23,31} 0.0059 0.7458

7 {1,67,97,51,23,31,93} 0.0049 0.7507

8 {1,67,97,51,23,31,93,52} 0.0109 0.7616

9 {1,67,97,51,23,31,93,52,56} 0.0063 0.7679

10 {1,67,97,51,23,31,93,52,56,57} 0.0098 0.7777

11 {1,67,97,51,23,31,93,52,56,57,7} 0.0090 0.7867

12 {1,67,97,51,23,31,93,52,56,57,7,30} 0.0113 0.7980

13 {1,67,97,51,23,31,93,52,56,57,7,30,13} 0.0106 0.8086

14 {1,67,97,51,23,31,93,52,56,57,7,30,13,78} 0.0110 0.8196

15 {1,67,97,51,23,31,93,52,56,57,7,30,13,78,8} 0.0124 0.8320

Comments

Although we previously argued that concordance can be relatively resistant to masking and swamping, the step-wise process of removing one observation at a time can be very fragile. Some outlying observations may only be perceived as outliers at a later stage of the process, because the observations responsible for masking them were not removed at an early stage. Similarly, for swamping, non-outlying observations may be erroneously removed at an early stage because the model fitting was severely affected by the presence of strong outliers that were swamping those regular observations. On the other hand, if swamping and masking are not extreme and concordance captures the effect of outliers, we have a computationally cheap and extremely simple algorithm for survival outlier detection.

4.3 Bootstrap Hypothesis Test Outlier Detection

Singh and Xie (2003) introduced the Bootlier plot, a method that uses the bootstrap resampling technique to extract outliers. As an example of their rationale, they consider the bootstrapping of the sample mean on a dataset of $n$ real numbers containing only one outlier. The probability of having a bootstrap sample free of outliers is given by $(1 - \frac{1}{n})^n \approx 1/e \ (\approx 37\%)$ as $n \to \infty$. This means that about 37% of the bootstrap samples will be outlier-free, while the remaining 63% will contain the outlier at least once. If the outlier observation seriously affects the sample mean, the authors argue that, when generating $B$ bootstrap samples, the histogram of the $B$ sample mean values will present a multi-modal effect. In particular, there will be two modes: one resulting from the 37% outlier-free samples and the other from the remaining samples, which contain the outlier at least once and therefore have a severely contaminated sample mean. This effect is illustrated in Figure 4.2, where each large circle containing smaller circles represents a bootstrap sample. A portion of the samples do not contain the outlier, so their mean maps much lower on the histogram. The bootstrap samples containing the outlier map much higher on the histogram.

Figure 4.2: Representation of the multi-modal effect of the sample mean with one outlier in the data (red circle).

With only one outlier, the following strategy to detect outliers can be devised: 1) for each observation $i$, compute the histogram of the sample mean from $B$ bootstrap samples generated from the data without observation $i$; 2) the observation that, when removed, does not cause a multi-modal histogram is the outlier. Unfortunately, a single-outlier setting is a very limited scenario. To overcome this limitation, we use the belief that the concordance of a survival model is an outlier-sensitive test statistic: it tends to increase in data samples with fewer outliers. Under this assumption, we remove one observation at a time, and the observations that more systematically improve concordance (when absent) are considered the most outlying ones. This method performs one hypothesis test for each observation in the dataset. The resulting p-value is assigned as outlying measure to the observation under test. The hypothesis test for each observation $i$ can be stated as:

$$H_0: C_{Model,\sim \hat{F}_{-i}} \le C_{original}$$

$$H_1: C_{Model,\sim \hat{F}_{-i}} > C_{original}$$

The hypothesis tests are made following the bootstrap approach explained in section 4. Each empirical distribution $\hat{F}_{-i}$ represents a distribution where observation $d_i$ has probability zero. Writing $\Delta C_i = C_{Model,(X,T,\delta)\sim Data_i} - C_{original}$, it is more useful to formulate the hypothesis test as:

$$H_0: \Delta C_i \le 0 \tag{4.13}$$

$$H_1: \Delta C_i > 0 \tag{4.14}$$

Input

Besides a survival model and the input dataset $D$, this method has one input parameter: $B$, corresponding to the number of bootstrap samples generated from the empirical distribution. The value of $B$ needs to be large enough to achieve the convergence of the p-values. The number of bootstrap samples necessary for the convergence of the output has shown to be dependent on the number of individuals and the number of covariates. In our tests, the value of $B$ was iteratively increased until convergence of the p-values.

Output

The BHT method is a soft classifier and a single-step method; the output consists of an outlying score for each observation.

Algorithm

Algorithm 2 gives the sequence of operations needed to compute each observation's p-value. First we compute the baseline concordance $C_0$ as the concordance of the model fit with all observations. Then, for each observation, we remove the observation under test from the data and generate $B$ bootstrap samples from the remaining data. The proportion of samples that register a model concordance higher than the baseline corresponds to the p-value.

Algorithm 2 Bootstrap Hypothesis Test

Input: D: input dataset; Model: survival model; B: number of bootstrap samples.
Output: a p-value for each observation d_i ∈ D.
C_0 = C(D, Model) {compute the baseline concordance C_0 as the concordance on the original dataset}
for all d_i ∈ D do
    D_{-i} = D \ d_i {remove observation i from the original dataset}
    From D_{-i} generate B bootstrap samples D*_1, D*_2, ..., D*_B.
    p[i] = (1/B) Σ_{j=1}^{B} I(C(D*_j, Model) > C_0) {compute the p-value for each observation}
end for
return the vector of p-values p
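
A minimal sketch of Algorithm 2 follows. As an assumption made for illustration, `score` is a hypothetical callable standing in for "fit the survival model on these observations and return its concordance", and each bootstrap sample has the size of the reduced dataset. The sketch mirrors the listing literally: for each observation it reports the share of bootstrap samples, drawn without that observation, whose score exceeds the baseline.

```python
import random

def bootstrap_hypothesis_test(data, score, n_boot, rng):
    """For each observation, the share of bootstrap samples drawn without it
    whose score exceeds the baseline score on the full data."""
    baseline = score(data)
    p_values = []
    for i in range(len(data)):
        reduced = data[:i] + data[i + 1:]                  # D_{-i}
        exceed = 0
        for _ in range(n_boot):
            sample = [rng.choice(reduced) for _ in range(len(reduced))]
            if score(sample) > baseline:
                exceed += 1
        p_values.append(exceed / n_boot)
    return p_values

# Toy stand-in for concordance: data sets with a smaller range score higher.
toy_score = lambda d: -(max(d) - min(d))
values = bootstrap_hypothesis_test([1, 2, 3, 4, 100], toy_score, 200, random.Random(0))
print(values)
```

The cost is $N \times B$ score evaluations, which with a real survival model means $N \times B$ model fits; this is the main practical constraint on the choice of $B$.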

Example

To illustrate how the BHT method works, we present some results of applying BHT to a dataset. An example of BHT's output is presented in Table 4.2, where the observations are sorted by their p-value from the hypothesis test in expression 4.13. Figure 4.3 shows the overlapped histograms of the bootstrapped concordance for two different observations from the dataset, one of which, following the concordance criterion, is clearly more outlying than the other: the blue histogram corresponds to a more outlying observation than the red histogram. The blue histogram is shifted further to the right of $\Delta C = 0$ than the red one, so the p-value for the test will be lower and the observation is thus considered more outlying. Using the p-value as measure of outlyingness allows us to measure how systematically the removal of an observation leads to an improvement of concordance on the remaining data.

Figure 4.3: Two histograms: outlier (blue) and inlier observation (red) produced by BHT.

Comments

This method extends the ideas of the Bootlier plot (Singh and Xie, 2003). Being a single-step procedure, its output is more flexible in terms of analysis; mainly, it allows the definition of a significance threshold from which an observation is considered an outlier. Being single-step, it also tends to be less sensitive to masking and swamping effects.

4.4 Dual Bootstraps Hypothesis Testing Outlier Detection

Table 4.2: Example of a BHT output, sorted by p-values.

#    Observation   p
1    67            0.274
2    1             0.284
3    78            0.285
4    56            0.285
5    69            0.293
6    8             0.294
7    45            0.300
8    93            0.308
9    30            0.313
10   32            0.315
11   23            0.316
12   100           0.316
13   91            0.325
14   29            0.326
15   13            0.328

This method aims to improve the approach taken in the BHT method. In the BHT method, removing one observation from the dataset and then assessing the impact of each removal on concordance has an undesired effect: since the model has fewer observations to fit (the observation under test is removed), there is a tendency for the concordance to increase. This potentially introduces confusion in the hypothesis test made in BHT; in particular, it may increase the number of “false positives”. The rationale behind DBHT is to generate two histograms from two antagonistic versions of the bootstrap procedure, the poison and antidote bootstraps, and then compare them. The antidote bootstrap excludes the observation under test from every bootstrap sample; it can be described as:

1: Input: dataset $D$; index of observation under test $i$.

2: $D_{-i} = D \setminus d_i$.

3: Generate $B$ bootstrap samples from $D_{-i}$, each of size $N$.

4: Output: $B$ antidote bootstrap samples.

The poison bootstrap forces the observation under test to be part of every bootstrap sample. The procedure is the following:

1: Input: dataset $D$; index of observation under test $i$.

2: Generate $B$ bootstrap samples from $D_{-i} = D \setminus d_i$, each of size $N - 1$.

3: Add observation $d_i$ to each bootstrap sample generated.

4: Output: $B$ poison bootstrap samples.
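The two resampling procedures above can be sketched as follows. The function names `antidote_bootstrap` and `poison_bootstrap` are hypothetical, observations are represented by plain list elements, and sampling with replacement uses Python's standard `random.choices`; this is an illustration of the two steps, not the implementation used for the results.

```python
import random

def antidote_bootstrap(data, i, B, rng):
    """B samples of size N drawn with replacement from D without observation i."""
    reduced = data[:i] + data[i + 1:]
    return [rng.choices(reduced, k=len(data)) for _ in range(B)]

def poison_bootstrap(data, i, B, rng):
    """B samples of size N - 1 from D without observation i, plus observation i."""
    reduced = data[:i] + data[i + 1:]
    return [rng.choices(reduced, k=len(data) - 1) + [data[i]] for _ in range(B)]

rng = random.Random(1)
data = list(range(20))  # stand-in for 20 observations
ant = antidote_bootstrap(data, 3, 50, rng)
psn = poison_bootstrap(data, 3, 50, rng)
```

Both variants return samples of the same size $N$, so the concordance values computed on them are comparable; the only difference is whether the observation under test is guaranteed to be absent or present.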

Using these two bootstrap variants, the strategy is the following: for each observation $i$ we make the hypothesis that the observation is “poison” (meaning that it is an outlier). To test it, we compare the histograms of the concordance variation $\Delta C$ between the antidote and poison bootstraps. If the observation is an outlier, we expect the antidote bootstrap to push the histogram towards higher values of $\Delta C$, since the outlier is always absent from the samples. Additionally, we expect the poison bootstrap to generate lower values of $\Delta C$, since every sample contains the outlier. The further the poison histogram lies to the left of the antidote histogram, the more outlying the observation is considered to be. We consider $\Delta C_{\text{antidote}}$ and $\Delta C_{\text{poison}}$ as two real random variables and test the following hypotheses:

$H_0 : E[\Delta C_{\text{antidote}}] > E[\Delta C_{\text{poison}}]$; (4.15)

$H_1 : E[\Delta C_{\text{antidote}}] \leq E[\Delta C_{\text{poison}}]$. (4.16)

To calculate the p-value of the test we use an independent two-sample t test with unequal variances, as described in (Rajagopalan, 2006) (“test for equality of population means with known equal variances”).

Input

Similar to the BHT method, besides a survival model and the input dataset D, DBHT only takes one input

parameter: B the number of bootstrap samples used on the antidote and poison bootstrap procedures.

Output

The DBHT method is a soft-classifier and single-step method. Thus, the output is an outlyingness measure for each observation, from which one can extract the $k$ most outlying observations.

Algorithm

Algorithm 3 Dual Bootstraps Hypothesis Test

Input: $D$: input dataset; Model: survival model; $B$: number of bootstrap samples.

Output: a p-value for each observation.

for all $d_i \in D$ do

    $D_{-i} = D \setminus d_i$ {remove observation $i$ from the original dataset}

    Generate $B$ poison bootstrap samples.

    Generate $B$ antidote bootstrap samples from $D_{-i}$.

    Compute the $B$ values of $\Delta C_{\text{poison}}$ and store them in vector psn.

    Compute the $B$ values of $\Delta C_{\text{antidote}}$ and store them in vector ant.

    From psn and ant compute the p-value $p[i]$ using a t test for equality of means.

end for

return the vector of p-values $p$
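The last step of the loop, turning the vectors psn and ant into a p-value, can be illustrated with a small sketch. It computes the unequal-variance (Welch) t statistic and its Welch-Satterthwaite degrees of freedom using only the Python standard library. Two points are assumptions and not taken from the text: the p-value uses a normal approximation to the t distribution (reasonable when $B$ is large), and the one-sided direction is chosen so that small p-values flag observations whose antidote histogram lies to the right of the poison histogram. The names `welch_statistic` and `dbht_pvalue` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_statistic(ant, psn):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(ant) / len(ant)  # squared standard error of mean(ant)
    vp = variance(psn) / len(psn)  # squared standard error of mean(psn)
    t = (mean(ant) - mean(psn)) / sqrt(va + vp)
    df = (va + vp) ** 2 / (va ** 2 / (len(ant) - 1) + vp ** 2 / (len(psn) - 1))
    return t, df

def dbht_pvalue(ant, psn):
    """One-sided p-value for mean(ant) > mean(psn), normal approximation."""
    t, _df = welch_statistic(ant, psn)
    return 1.0 - NormalDist().cdf(t)

# Made-up concordance variations, for illustration only.
ant = [0.05, 0.06, 0.04, 0.05, 0.07]
psn = [-0.01, 0.00, 0.01, -0.02, 0.00]
t, df = welch_statistic(ant, psn)
```

With real bootstrap output the two vectors would each hold $B$ values; an exact t-distribution tail can replace the normal approximation where a statistics library is available.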

Example

Figure 4.4 shows the overlapped poison and antidote histograms of observation 1. They are clearly apart from each other, which is confirmed by the low p-value of the test. In contrast, for observation 82 (Figure 4.5) there is no clear distinction between the two histograms. This indicates that the observation has little influence on the concordance of the survival model and therefore does not appear to be an outlier.

Figure 4.4: Contrast between antidote (blue) and poison (red) bootstrap histograms of concordance variation for a typical outlier.

Figure 4.5: Antidote (blue) and poison (red) bootstrap histograms of concordance variation for a typical inlier.

Chapter 5

Results

5.1 Goals

In this section we assess the performance of the developed methods (OSD, BHT, and DBHT) on several datasets. To compare their performance, we also apply to the same datasets some of the alternative methods mentioned in Chapter 3, in particular martingale residuals (MART), deviance residuals (DEV), the likelihood displacement statistic (LD), and DFBETAS (DFB). Two types of datasets were used: 1) artificial datasets, which simulate a set of survival observations containing outliers; and 2) real clinical datasets.

For the artificial datasets, since we know beforehand which observations are outliers, we can assess each method’s performance in outlier identification. Our goal is to check whether our concordance-based methods can match or even surpass the alternative methods in the scenarios recreated. We analyze how each method’s performance evolves under different conditions, such as the amount of censoring and the quantity of outliers, and whether it is affected by the type of baseline hazard used to simulate the data. Another goal is to study the behavior of parameter $B$ (the number of bootstrap samples) for BHT and DBHT, more specifically the relation between $B$ and the method’s performance.

On the real datasets containing clinical data, we study our approach to robust Cox regression, which is based on trimming a certain quantity of outliers from the original data. Given that the breakdown point of the Cox estimation process is the lowest possible (see section 3), if the methods are accurate enough we may exclude observations that could have been distorting the model estimation when all observations were included in the fit. We call this process outlier trimming. Outlier trimming consists of removing a certain level of outliers, for example the 5% or 10% most outlying observations, in order to fit the model on the remaining uncontaminated dataset. Based on this, we compare the cross-validated c-index of a Cox model when fitting the model with all observations, and when trimming the top-3, top-10, and top-30 outliers.
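Outlier trimming as described above amounts to ranking observations by their outlyingness and dropping a fixed share before refitting. The sketch below assumes that each method returns one p-value per observation and that smaller p-values mean more outlying, as in the tables of this chapter; `trim_outliers` is a hypothetical helper and the p-values are made up.

```python
def trim_outliers(pvalues, fraction):
    """Split observation indices into (kept, trimmed) by ascending p-value."""
    n_trim = int(round(fraction * len(pvalues)))
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    trimmed = sorted(order[:n_trim])  # most outlying observations
    kept = sorted(order[n_trim:])     # observations used to refit the model
    return kept, trimmed

pvalues = [0.90, 0.01, 0.50, 0.30, 0.02, 0.80, 0.70, 0.60, 0.40, 0.95]
kept, trimmed = trim_outliers(pvalues, 0.2)
```

The Cox model would then be refitted on the rows indexed by `kept` and its cross-validated c-index compared with the fit on all observations.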

5.2 Simulated Data

Before generating the simulated data, we have to define what an outlier is in our simulation study. An observation is considered outlying with regard to the relationship between the vector of covariates $X_i$ and $(T_i, \delta_i)$. Our rationale is that, in general, we cannot judge the outlyingness of an observation only by looking at its covariates. Having unusual covariates is not enough to be considered an outlier; instead, we focus on unusual behavior. This relationship can be expressed by a survival model such as the Cox proportional hazards model, which belongs to the class of generalized linear models (GLM). In a survival context, GLM models describe the hazard of an individual as a function of a linear combination of its covariates and time. The hazard function in a generalized linear model is described as:

$h(t, X) = g(t, \beta X)$; (5.1)

for the Cox model, $g$ is proportional to the exponential of $\beta X$. This type of model is completely characterized by its parameter vector $\beta$, which defines the direction of the hazard, meaning that it describes the effects of the covariates on survival time. We consider an observation outlying if it is generated by a model very distinct from the model generating the large majority of the data. The more distinct the model is, the more outlying we can consider the observation to be. To evaluate the distinction between the models generating each observation, we only have to look at the $\beta$ parameters.

Figure 5.1: A 2-D example of a general trend $\beta_G$ with examples of outlier sources (axes $\beta_1$ and $\beta_2$; vectors $\beta_G$, $\beta'$, $\beta''$, $\beta'''$, $\beta''''$; angle $\Theta'''$).

Figure 5.1 shows an example of a pure model $\beta_G$ that generates the large majority of the data; the remaining vectors are models that produce outlying observations. Each vector represents a two-dimensional $\beta$ parameter of a GLM model. Looking at each one separately: $\beta'$ is similar in direction to the general trend but shorter, meaning that individuals will have a lower hazard for the same covariates; $\beta''$ is also very similar in direction but has a larger norm, which means that the same covariates will produce a higher hazard; $\beta'''$ points in the opposite direction of the general trend, so the effects of the two covariates are the opposite of those in the general trend model. Other variants are models that are not opposite but have a negative dot product with the general trend, like $\beta''''$.

We define two measures of outlyingness, both measured in relation to the general trend model $\beta_G$: the ratio of norms $\|\beta'\|/\|\beta_G\|$, where $\beta'$ denotes the outlying model, and the value of $\cos\theta$, where $\theta$ is the angle between $\beta'$ and $\beta_G$. The lower the value of $\cos\theta$, the more opposite the effect of the covariates is in relation to the general trend. For example, we can consider as an outlier a person whose response to a drug is opposite to that of the vast majority: if in general the administration of a certain drug decreases the patient’s hazard, for this outlier it will increase it. The difference in norms aims to express the discrepancy in the magnitude of the hazard for the same covariates. Continuing with the drug example, for different norms, if the administration of a certain drug decreases the hazard, for the outlying patient it will also decrease, but by a much lower or higher amount than in the general trend. In our results section we generate outliers by varying both the angle and the magnitude of the $\beta$ parameters of the outlying models.

When measuring the performance of outlier detection methods on the simulated datasets, we have to take into account that the observations are randomly generated from distributions: the inliers from the general distribution $\beta_G = (1, 1, 1)$, and the outliers from an outlying distribution $\beta'$. Observations initially intended to be inliers may be drawn from the lower or upper tail of the distribution and may constitute outliers. For the same reason, observations generated from the outlying distribution may turn out to be inliers. Our performance analysis assumes that, for each scenario, the observations generated from the general distribution are inliers and the observations generated from the outlying distribution are outliers.

To assess performance we use two measures: the True Positive Rate (TPR), also known as sensitivity, and the area under the ROC curve (AUC). For datasets with $k$ outliers, the TPR measures for each scenario the fraction of true outliers found among the top-$k$ most outlying observations indicated by each method. The AUC provides a threshold-independent measure of outlier detection ability. The AUC is not applicable to the output of the OSD method, because OSD does not provide an outlying score for every observation. The TPR and AUC are measured on 50 random datasets per simulation configuration, and we then take the mean and standard deviation of the metrics.
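The two performance measures can be made concrete with a short sketch. It assumes that each method yields a score where higher means more outlying (for p-value based methods, for instance, one minus the p-value) and that the true outlier labels are known, as in the simulated datasets. The function names and the toy scores are hypothetical; the AUC is computed through its pairwise (Mann-Whitney) formulation.

```python
def tpr_at_k(scores, is_outlier):
    """Fraction of true outliers among the top-k scores, k = number of true outliers."""
    k = sum(is_outlier)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(is_outlier[i] for i in top) / k

def auc(scores, is_outlier):
    """Probability that a random outlier scores above a random inlier (ties count 1/2)."""
    pos = [s for s, y in zip(scores, is_outlier) if y]
    neg = [s for s, y in zip(scores, is_outlier) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3]
is_outlier = [1, 0, 0, 1, 0, 0]
tpr = tpr_at_k(scores, is_outlier)
area = auc(scores, is_outlier)
```

Here one of the two true outliers is ranked in the top two, and seven of the eight outlier-inlier pairs are ordered correctly.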

Generating survival observations from a Cox model

The model chosen to recreate survival times was the Cox proportional hazards model. The simulated observations are generated from two different Cox models, a general trend model $\beta_G$ and an outlier model $\beta'$. From the Cox hazard function, the distribution of $T$ is given by:

$F(t \mid X) = 1 - \exp\left[-H_0(t) \times \exp(\beta X)\right]$;

the vector of covariates $X$ characterizing each individual is generated from a three-dimensional normal distribution with zero mean and identity covariance matrix. The survival times are generated using the methodology explained in (Bender et al., 2005). Each observation time, as a function of the covariate vector $X$, is given by:

$T = H_0^{-1}\left[-\log(U) \times \exp(-\beta X)\right]$, (5.2)

where $U$ is a random variable uniformly distributed on the interval $[0, 1]$. Before being able to generate survival times from the hazard functions produced by each Cox model, we need to define the baseline hazard function $h_0(t)$. Our choice is the Weibull function, characterized by two parameters: scale $\lambda$ and shape $\nu$. For $T \sim \text{Weibull}(\lambda, \nu)$ the corresponding baseline hazard is:

$h_{\text{Weibull}}(t) = \lambda \nu t^{\nu - 1}$.

The inverse of the corresponding cumulative hazard function is given by:

$H_0^{-1}(t) = (\lambda^{-1} t)^{1/\nu}$; (5.3)

inserting this inverse cumulative baseline hazard function in equation 5.2 we have:

$T = \left(\lambda^{-1}\left[-\log(U) \times \exp(-\beta X)\right]\right)^{1/\nu} = \left(\frac{-\log(U)}{\lambda \times \exp(\beta X)}\right)^{1/\nu}$.

In order to recreate random censoring, we generate event indicators as:

$\delta_i \sim \text{Bernoulli}(p = c)$. (5.4)
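The inverse-transform step for a Weibull baseline can be checked numerically. The sketch below evaluates the closed form for $T$ given a uniform draw $u$, and then verifies that the cumulative hazard at $T$ equals $-\log(u)$, which is the defining property of the transform. The function name `weibull_cox_time` and the example covariates are hypothetical; $u$ is passed in explicitly so that the example is deterministic.

```python
import math

def weibull_cox_time(u, x, beta, lam, nu):
    """Survival time T = (-log(u) / (lam * exp(beta . x))) ** (1 / nu)."""
    lin = sum(b * xi for b, xi in zip(beta, x))  # linear predictor beta . x
    return (-math.log(u) / (lam * math.exp(lin))) ** (1.0 / nu)

u = math.exp(-1.0)  # chosen so that -log(u) = 1
t_const = weibull_cox_time(u, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), lam=1.0, nu=1.0)
t_incr = weibull_cox_time(u, (0.5, -0.2, 0.1), (1.0, 1.0, 1.0), lam=0.5, nu=1.5)

# Cumulative hazard at the generated time: H0(T) * exp(beta . x) = lam * T**nu * exp(lin)
cum_hazard = 0.5 * t_incr ** 1.5 * math.exp(0.5 - 0.2 + 0.1)
```

In a simulation, $u$ would be drawn from a uniform generator for each observation.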

Several scenarios are simulated. For each one, the vector of covariates is given by $X_i \sim N(0, I)$, where $I$ is the identity matrix. Each simulated dataset contains $n = 100$ observations with hazard functions given by:

$h_i(t) = \begin{cases} h_0(t) \exp\{\beta_G X\}, & 1 \leq i \leq n - k; \\ h_0(t) \exp\{\beta' X\}, & n - k < i \leq n, \end{cases}$ (5.5)

where $\beta_G$, the pure model, is always equal to $(1, 1, 1)$. The twelve different vectors simulated for $\beta'$ can be seen in Table 5.1. Concerning censoring, we experiment with scenarios with amounts of $c = 0.2$ and $c = 0.3$ of censored observations. Regarding the number of outliers, levels of $k = 5$ and $k = 10$ are simulated. When generating the observations, three types of baseline hazard $h_0(t)$ are used. The baselines correspond to a baseline survival function that follows a Weibull distribution. For its scale $\lambda$ and shape $\nu$ parameters, three configurations were used, corresponding to constant, strictly decreasing, and strictly increasing hazard functions, represented in Figure 5.2. The following dimension values were considered in our simulation:

Levels of censoring: $c = 0.2$ and $c = 0.3$.

Outlier amounts: $k = 5$ and $k = 10$.

Weibull baseline hazards with parameters: $(\lambda = 1, \nu = 1)$, $(\lambda = 1.5, \nu = 0.5)$, $(\lambda = 0.5, \nu = 1.5)$.

The methods use the Cox proportional hazards model as survival model, OSD is parametrized with $k = 10$, and for DBHT and BHT the number of bootstrap samples used was $B = 1000$.
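One simulated dataset under this design can be assembled as in the following sketch, which places the $k$ outliers in the last $k$ positions as in equation 5.5. It is an illustration under assumptions: `simulate_dataset` is a hypothetical name, the event indicator is drawn as Bernoulli($p$) with $p$ passed in by the caller (the text sets $p = c$), and the random number generator is Python's standard one.

```python
import math
import random

def simulate_dataset(n, k, beta_g, beta_out, lam, nu, p, seed=0):
    """Return n rows (x, time, delta, is_outlier); the last k rows use beta_out."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        outlier = i >= n - k
        beta = beta_out if outlier else beta_g
        x = [rng.gauss(0.0, 1.0) for _ in beta]      # X ~ N(0, I)
        lin = sum(b * xi for b, xi in zip(beta, x))  # beta . x
        u = 1.0 - rng.random()                       # in (0, 1], avoids log(0)
        t = (-math.log(u) / (lam * math.exp(lin))) ** (1.0 / nu)
        delta = 1 if rng.random() < p else 0         # Bernoulli(p) indicator
        rows.append((x, t, delta, outlier))
    return rows

rows = simulate_dataset(100, 5, (1, 1, 1), (-1, -1, -1), lam=1.0, nu=1.0, p=0.2)
```

Repeating the call with different seeds gives the random datasets over which the performance metrics are averaged.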


Table 5.1: The different outlier configurations used in the simulation data. The pure model is $\beta_G = (1, 1, 1)$.

#    $\Theta'$ (degrees)    $\|\beta'\|/\|\beta_G\|$    $\beta'$
1    180    1      (-1, -1, -1)
2    180    0.2    (-0.2, -0.2, -0.2)
3    180    5      (-5, -5, -5)
4    135    0.2    (-0.143, 0, -0.283)
5    135    5      (-3.6, 0, -7.07)
6    90     0.2    (-0.245, 0, -0.245)
7    90     5      (6.12, 0, -6.12)
8    0      0.2    (0.2, 0.2, 0.2)
9    0      5      (5, 5, 5)
10   180    10     (-10, -10, -10)
11   0      10     (10, 10, 10)
12   135    10     (-7.15, 0, -14.15)
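The two columns that describe each configuration, the angle to the general trend and the ratio of norms, follow directly from the vectors. The sketch below computes both for a given $\beta'$ and applies them to scenarios 1, 7, and 9; `angle_and_ratio` is a hypothetical helper, and the clamp on the cosine only guards against floating-point values slightly outside $[-1, 1]$.

```python
import math

def angle_and_ratio(beta_out, beta_g=(1.0, 1.0, 1.0)):
    """Angle in degrees between beta_out and beta_g, and the ratio of their norms."""
    dot = sum(a * b for a, b in zip(beta_out, beta_g))
    norm_out = math.sqrt(sum(a * a for a in beta_out))
    norm_g = math.sqrt(sum(b * b for b in beta_g))
    cos_theta = max(-1.0, min(1.0, dot / (norm_out * norm_g)))
    return math.degrees(math.acos(cos_theta)), norm_out / norm_g

scenario_1 = angle_and_ratio((-1.0, -1.0, -1.0))
scenario_7 = angle_and_ratio((6.12, 0.0, -6.12))
scenario_9 = angle_and_ratio((5.0, 5.0, 5.0))
```

Scenario 1 is the exact opposite of the general trend with the same norm, scenario 7 is orthogonal to it with roughly five times the norm, and scenario 9 keeps the direction while scaling the norm by five.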

Figure 5.2: The three types of Weibull baseline hazards used, shown as $h(t)$ against $t$: $\lambda = 1, \nu = 1$ (blue); $\lambda = 1.5, \nu = 0.5$ (orange); $\lambda = 0.5, \nu = 1.5$ (red).
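The shapes of the three baselines follow from the Weibull hazard $h(t) = \lambda \nu t^{\nu - 1}$: the hazard is constant for $\nu = 1$, decreasing for $\nu < 1$, and increasing for $\nu > 1$. A small numerical check, with the hypothetical helper `weibull_hazard` and an arbitrary grid of time points:

```python
def weibull_hazard(t, lam, nu):
    """Weibull baseline hazard h(t) = lam * nu * t ** (nu - 1)."""
    return lam * nu * t ** (nu - 1.0)

configs = {
    "constant": (1.0, 1.0),
    "decreasing": (1.5, 0.5),
    "increasing": (0.5, 1.5),
}
grid = (0.5, 1.0, 2.0, 4.0)
curves = {
    name: [weibull_hazard(t, lam, nu) for t in grid]
    for name, (lam, nu) in configs.items()
}
```

The three parameter pairs are the ones listed for the simulation; the labels in `configs` are descriptive names chosen for this example.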

Simulation results

The average TPR and AUC for each scenario are displayed in Tables 5.2 and 5.3, respectively; more detailed results of the simulation can be seen in the appendix. The highest value for each scenario is marked in bold. We observe that, for both TPR and AUC, the DBHT method attains the best performance in 9 of the 12 outlier scenarios. Also worth noting is the very poor performance of the proposed methods in scenarios 9 and 11. These are the only two scenarios where the angle between $\beta_G$ and $\beta'$ is zero and the magnitude (thus the hazard) is higher than in the general trend (see Table 5.1). One possible explanation for the low performance is that this type of outlier may help point the model in the right direction, and can possibly mitigate the effect of other outliers originating from the general distribution. Additionally, as explained previously, concordance is a rank coefficient, so differences in magnitude are not captured as easily as differences in angle, as indicated by the overall results (Tables 5.2 and 5.3).

Table 5.2: Average of TPR grouped by outlier scenarios.

Scenario # MART DEV LD DFB OSD BHT DBHT

1 0.29 0.36 0.43 0.36 0.47 0.43 0.47

2 0.22 0.25 0.31 0.29 0.32 0.31 0.34

3 0.50 0.58 0.59 0.52 0.63 0.59 0.65

4 0.22 0.23 0.30 0.28 0.30 0.29 0.32

5 0.44 0.54 0.52 0.48 0.58 0.53 0.58

6 0.21 0.22 0.28 0.26 0.27 0.26 0.28

7 0.40 0.50 0.40 0.41 0.44 0.37 0.42

8 0.18 0.18 0.23 0.22 0.22 0.20 0.23

9 0.32 0.36 0.18 0.25 0.09 0.06 0.07

10 0.53 0.63 0.64 0.57 0.68 0.60 0.70

11 0.38 0.46 0.24 0.32 0.14 0.11 0.12

12 0.49 0.60 0.54 0.51 0.60 0.52 0.60

Scenario sensitivity analysis

Here we analyze how the performance of the outlier detection methods behaves under different simulation conditions. For each of the 12 outlier scenarios, we break down the averaged values shown in Tables 5.2 and 5.3 by each simulation dimension: outlier amount, level of censoring, and baseline hazard type. The methods’ performance split by the two values of $k$ is presented in Tables 5.4, 5.5, 5.6, and 5.7. Among the alternative methods, martingale residuals (MART) is the only one that presents a consistent TPR increase when going from 5 to 10 outliers in the data. The TPR of our proposed methods also increases in most scenarios when going from 5 to 10 outliers. Regarding AUC, there is an overall tendency to decrease as the number of outliers goes from 5 to 10. The two metrics therefore point in different directions regarding the relation between the outlier amount and performance.


Table 5.3: Average of AUC grouped by outlier scenarios.

Scenario # MART DEV LD DFB BHT DBHT

1 0.70 0.70 0.74 0.68 0.78 0.82

2 0.65 0.65 0.70 0.64 0.71 0.75

3 0.80 0.80 0.78 0.77 0.86 0.90

4 0.64 0.64 0.69 0.63 0.71 0.73

5 0.78 0.77 0.74 0.75 0.82 0.84

6 0.63 0.63 0.67 0.63 0.68 0.71

7 0.76 0.76 0.66 0.73 0.70 0.72

8 0.62 0.62 0.66 0.62 0.65 0.68

9 0.74 0.72 0.61 0.69 0.60 0.60

10 0.83 0.83 0.80 0.81 0.87 0.92

11 0.78 0.76 0.61 0.73 0.59 0.61

12 0.80 0.80 0.74 0.78 0.81 0.86

Table 5.5: Average TPR of the proposed methods grouped by outlier scenario and outlier amount k.

OSD BHT DBHT

k = 5 k = 10 k = 5 k = 10 k = 5 k = 10

1 0.43 0.51 0.40 0.46 0.43 0.51

2 0.27 0.37 0.27 0.35 0.29 0.38

3 0.59 0.67 0.58 0.59 0.63 0.67

4 0.25 0.34 0.26 0.32 0.28 0.36

5 0.57 0.60 0.54 0.52 0.57 0.58

6 0.24 0.31 0.23 0.29 0.25 0.32

7 0.41 0.47 0.37 0.37 0.39 0.44

8 0.17 0.26 0.16 0.24 0.18 0.28

9 0.05 0.13 0.04 0.09 0.04 0.09

10 0.66 0.70 0.60 0.60 0.69 0.71

11 0.10 0.18 0.08 0.13 0.08 0.16

12 0.57 0.63 0.53 0.51 0.59 0.61


Table 5.4: Average TPR of the alternative methods grouped by outlier scenario and outlier amount k.

MART DEV LD DFB

k = 5 k = 10 k = 5 k = 10 k = 5 k = 10 k = 5 k = 10

1 0.26 0.31 0.35 0.36 0.41 0.45 0.36 0.37

2 0.19 0.24 0.22 0.28 0.28 0.35 0.27 0.30

3 0.45 0.55 0.58 0.57 0.61 0.56 0.54 0.50

4 0.20 0.24 0.22 0.25 0.27 0.33 0.26 0.31

5 0.38 0.50 0.56 0.52 0.56 0.48 0.50 0.47

6 0.20 0.22 0.19 0.25 0.26 0.31 0.25 0.28

7 0.34 0.45 0.50 0.51 0.41 0.38 0.42 0.41

8 0.16 0.19 0.15 0.21 0.20 0.27 0.20 0.25

9 0.29 0.35 0.35 0.38 0.16 0.19 0.22 0.28

10 0.48 0.58 0.65 0.62 0.68 0.59 0.58 0.56

11 0.36 0.41 0.45 0.48 0.22 0.26 0.29 0.35

12 0.44 0.54 0.62 0.58 0.58 0.50 0.52 0.50

Table 5.6: Average AUC of the alternative methods grouped by outlier scenario and outlier amount k.

MART DEV LD DFB

k = 5 k = 10 k = 5 k = 10 k = 5 k = 10 k = 5 k = 10

1 0.70 0.69 0.70 0.69 0.74 0.74 0.69 0.66

2 0.66 0.63 0.66 0.63 0.70 0.70 0.65 0.62

3 0.82 0.79 0.82 0.78 0.81 0.81 0.79 0.75

4 0.65 0.63 0.65 0.62 0.70 0.70 0.64 0.62

5 0.78 0.77 0.79 0.76 0.78 0.78 0.77 0.74

6 0.65 0.62 0.65 0.62 0.68 0.68 0.65 0.61

7 0.77 0.75 0.77 0.75 0.70 0.70 0.74 0.71

8 0.63 0.61 0.63 0.61 0.66 0.66 0.63 0.60

9 0.74 0.74 0.73 0.71 0.63 0.63 0.70 0.67

10 0.84 0.82 0.85 0.82 0.84 0.84 0.82 0.79

11 0.78 0.78 0.77 0.76 0.64 0.64 0.73 0.73

12 0.81 0.79 0.82 0.78 0.78 0.78 0.79 0.76


Table 5.7: Average AUC of proposed methods by outlier scenario and outlier amount k.

BHT DBHT

k = 5 k = 10 k = 5 k = 10

1 0.77 0.79 0.81 0.82

2 0.71 0.72 0.74 0.75

3 0.87 0.85 0.90 0.89

4 0.72 0.71 0.73 0.73

5 0.83 0.81 0.85 0.83

6 0.69 0.67 0.71 0.70

7 0.72 0.68 0.73 0.71

8 0.66 0.64 0.68 0.67

9 0.62 0.58 0.61 0.58

10 0.88 0.86 0.92 0.91

11 0.61 0.58 0.62 0.59

12 0.83 0.80 0.87 0.85

Concerning the level of censoring, Tables 5.8, 5.9, 5.10, and 5.11 display the performance metrics for each method split by censoring levels $c = 0.2$ and $c = 0.3$. In terms of TPR, the alternative methods show a slight decrease in performance when the censoring level is increased from 0.2 to 0.3, and the proposed methods present a similar behavior. Analyzing the changes in AUC when going from $c = 0.2$ to $c = 0.3$, we verify that all alternative methods present a significant drop in performance, while the proposed methods experience only a small decrease, similar to the decrease in TPR.

Table 5.8: Average TPR of the alternative methods grouped by outlier scenario and level of censoring.

MART DEV LD DFB

c = 0.2 c = 0.3 c = 0.2 c = 0.3 c = 0.2 c = 0.3 c = 0.2 c = 0.3

1 0.29 0.29 0.36 0.35 0.44 0.42 0.36 0.37

2 0.22 0.22 0.26 0.24 0.34 0.29 0.29 0.28

3 0.49 0.50 0.60 0.55 0.63 0.54 0.53 0.51

4 0.21 0.22 0.24 0.23 0.31 0.29 0.27 0.29

5 0.44 0.44 0.56 0.52 0.56 0.49 0.50 0.47

6 0.22 0.20 0.24 0.21 0.30 0.26 0.28 0.25

7 0.39 0.41 0.52 0.49 0.41 0.38 0.42 0.41

8 0.17 0.18 0.16 0.19 0.23 0.23 0.21 0.23

9 0.31 0.33 0.39 0.34 0.19 0.16 0.26 0.24

10 0.53 0.53 0.65 0.62 0.68 0.59 0.57 0.56

11 0.39 0.37 0.48 0.45 0.25 0.24 0.33 0.30

12 0.49 0.49 0.63 0.57 0.59 0.49 0.53 0.50


Table 5.9: Average TPR of the proposed methods grouped by outlier scenario and level of censoring c.

OSD BHT DBHT

c = 0.2 c = 0.3 c = 0.2 c = 0.3 c = 0.2 c = 0.3

1 0.48 0.47 0.44 0.42 0.47 0.47

2 0.33 0.31 0.32 0.30 0.35 0.32

3 0.65 0.61 0.62 0.55 0.67 0.63

4 0.30 0.30 0.30 0.28 0.32 0.31

5 0.61 0.55 0.56 0.51 0.60 0.55

6 0.29 0.25 0.29 0.23 0.30 0.26

7 0.45 0.43 0.39 0.36 0.43 0.41

8 0.20 0.23 0.18 0.21 0.22 0.24

9 0.09 0.09 0.06 0.07 0.06 0.07

10 0.70 0.66 0.61 0.59 0.72 0.68

11 0.14 0.14 0.10 0.11 0.12 0.12

12 0.63 0.57 0.54 0.50 0.63 0.57

Table 5.10: Average AUC of the alternative methods grouped by outlier scenario and censoring amount c.

MART DEV LD DFB

c = 0.2 c = 0.3 c = 0.2 c = 0.3 c = 0.2 c = 0.3 c = 0.2 c = 0.3

1 0.70 0.69 0.70 0.69 0.75 0.73 0.69 0.67

2 0.66 0.64 0.66 0.63 0.72 0.67 0.64 0.63

3 0.82 0.79 0.83 0.78 0.82 0.74 0.78 0.76

4 0.64 0.64 0.64 0.63 0.70 0.68 0.63 0.63

5 0.80 0.76 0.80 0.75 0.77 0.71 0.77 0.74

6 0.64 0.62 0.64 0.63 0.70 0.65 0.64 0.62

7 0.78 0.74 0.78 0.74 0.68 0.65 0.74 0.72

8 0.61 0.63 0.61 0.62 0.67 0.64 0.61 0.63

9 0.76 0.72 0.75 0.69 0.63 0.59 0.71 0.66

10 0.85 0.81 0.85 0.81 0.84 0.76 0.82 0.79

11 0.80 0.76 0.79 0.74 0.62 0.61 0.75 0.72

12 0.84 0.77 0.84 0.76 0.79 0.70 0.81 0.74


Table 5.11: Average AUC of proposed methods by outlier scenario and censoring amount c.

BHT DBHT

c = 0.2 c = 0.3 c = 0.2 c = 0.3

1 0.79 0.77 0.82 0.81

2 0.72 0.70 0.75 0.74

3 0.88 0.84 0.91 0.89

4 0.72 0.70 0.74 0.72

5 0.83 0.80 0.86 0.83

6 0.70 0.66 0.72 0.69

7 0.71 0.69 0.73 0.71

8 0.64 0.65 0.67 0.68

9 0.60 0.60 0.60 0.59

10 0.88 0.86 0.92 0.91

11 0.60 0.59 0.61 0.60

12 0.83 0.80 0.88 0.84

Regarding the effects of different baseline hazards on performance, the simulation results grouped by type of baseline hazard are shown in Tables 5.12, 5.13, 5.14, and 5.15. In terms of TPR, neither the alternative nor the proposed methods experience significant changes between the three types of baseline hazard. Regarding AUC, the behavior is similar, with no significant changes detected for any of the methods between types of baseline hazard.

Table 5.12: Average TPR of the alternative methods by outlier scenario and baseline hazard $(\lambda, \nu)$.

MART DEV LD DFB

(1, 1) (0.5, 1.5) (1.5, 0.5) (1, 1) (0.5, 1.5) (1.5, 0.5) (1, 1) (0.5, 1.5) (1.5, 0.5) (1, 1) (0.5, 1.5) (1.5, 0.5)

1 0.29 0.28 0.29 0.36 0.35 0.37 0.44 0.42 0.43 0.36 0.36 0.38

2 0.22 0.22 0.21 0.25 0.26 0.24 0.32 0.31 0.31 0.29 0.30 0.27

3 0.51 0.51 0.47 0.59 0.57 0.57 0.59 0.59 0.57 0.53 0.52 0.51

4 0.23 0.23 0.20 0.23 0.24 0.23 0.30 0.31 0.29 0.28 0.29 0.27

5 0.45 0.43 0.44 0.52 0.56 0.54 0.50 0.55 0.52 0.48 0.50 0.47

6 0.20 0.21 0.21 0.23 0.23 0.22 0.29 0.29 0.27 0.27 0.26 0.26

7 0.40 0.40 0.40 0.51 0.51 0.50 0.38 0.40 0.40 0.40 0.42 0.42

8 0.18 0.17 0.18 0.17 0.19 0.17 0.23 0.24 0.22 0.23 0.23 0.21

9 0.32 0.31 0.33 0.37 0.37 0.35 0.17 0.18 0.17 0.24 0.24 0.27

10 0.55 0.52 0.52 0.62 0.63 0.65 0.65 0.62 0.63 0.57 0.55 0.57

11 0.39 0.38 0.38 0.47 0.45 0.48 0.24 0.23 0.25 0.32 0.30 0.33

12 0.48 0.49 0.49 0.59 0.60 0.61 0.53 0.54 0.54 0.49 0.51 0.53


Table 5.13: Average TPR of the proposed methods by outlier scenario and baseline hazard $(\lambda, \nu)$.

OSD BHT DBHT

(1, 1) (0.5, 1.5) (1.5, 0.5) (1, 1) (0.5, 1.5) (1.5, 0.5) (1, 1) (0.5, 1.5) (1.5, 0.5)

1 0.47 0.46 0.49 0.44 0.43 0.43 0.48 0.46 0.47

2 0.33 0.33 0.30 0.32 0.31 0.30 0.34 0.34 0.33

3 0.65 0.63 0.60 0.60 0.57 0.58 0.67 0.65 0.62

4 0.30 0.30 0.30 0.30 0.28 0.28 0.32 0.33 0.30

5 0.56 0.60 0.59 0.51 0.55 0.54 0.55 0.59 0.58

6 0.27 0.28 0.27 0.27 0.27 0.24 0.29 0.29 0.27

7 0.45 0.43 0.44 0.38 0.37 0.36 0.42 0.41 0.43

8 0.22 0.22 0.20 0.21 0.20 0.18 0.24 0.24 0.20

9 0.09 0.10 0.08 0.06 0.07 0.06 0.07 0.07 0.06

10 0.67 0.69 0.69 0.61 0.60 0.59 0.70 0.70 0.70

11 0.12 0.13 0.16 0.10 0.09 0.12 0.11 0.12 0.13

12 0.58 0.62 0.60 0.53 0.53 0.51 0.59 0.61 0.60

Table 5.14: Average AUC of alternative methods by outlier scenario and baseline hazard type $(\lambda, \nu)$.

MART DEV LD DFB

(1, 1) (0.5, 1.5) (1.5, 0.5) (1, 1) (0.5, 1.5) (1.5, 0.5) (1, 1) (0.5, 1.5) (1.5, 0.5) (1, 1) (0.5, 1.5) (1.5, 0.5)

1 0.70 0.70 0.70 0.70 0.70 0.69 0.74 0.73 0.74 0.67 0.69 0.68

2 0.65 0.64 0.65 0.65 0.65 0.64 0.69 0.69 0.70 0.65 0.63 0.64

3 0.80 0.81 0.80 0.80 0.80 0.80 0.78 0.79 0.78 0.77 0.77 0.77

4 0.63 0.65 0.63 0.64 0.65 0.62 0.69 0.70 0.68 0.62 0.65 0.62

5 0.77 0.79 0.78 0.76 0.79 0.77 0.73 0.77 0.73 0.75 0.76 0.75

6 0.64 0.63 0.63 0.64 0.63 0.63 0.69 0.67 0.67 0.63 0.63 0.62

7 0.76 0.77 0.76 0.76 0.77 0.76 0.66 0.67 0.66 0.72 0.74 0.73

8 0.62 0.62 0.62 0.62 0.62 0.62 0.66 0.65 0.65 0.61 0.62 0.63

9 0.74 0.74 0.74 0.72 0.73 0.72 0.61 0.62 0.61 0.69 0.68 0.68

10 0.84 0.83 0.82 0.84 0.83 0.82 0.81 0.80 0.79 0.82 0.80 0.80

11 0.78 0.78 0.78 0.76 0.76 0.77 0.62 0.61 0.61 0.73 0.73 0.73

12 0.78 0.80 0.82 0.78 0.80 0.82 0.73 0.74 0.76 0.76 0.78 0.80

BHT and DBHT sensitivity to parameter B

The assessment was made using the outlier scenario where BHT and DBHT have shown median performance (in terms of AUC and TPR), which for both corresponds to outlier scenario 7. For each of the 4 configurations (the combinations of outlier amount and censoring level, with the baseline hazard fixed at $\lambda = 1, \nu = 1$), 20 runs were made, giving 80 runs for each value of $B$. The values of $B$ were increased in increments of 100 until there was no further improvement in performance. The mean values of TPR and AUC were taken and are depicted in Figures 5.3 and 5.4. As expected from the previous results, DBHT converges much faster to its maximum performance than BHT: the TPR of DBHT converges at about 400 bootstrap samples, whereas that of BHT converges only at about $B = 800$. In terms of AUC, DBHT likewise converges to its maximum performance at about $B = 400$, and BHT only at about $B = 800$.


Table 5.15: Average AUC of proposed methods by outlier scenario and baseline hazard $(\lambda, \nu)$.

BHT DBHT

(1, 1) (0.5, 1.5) (1.5, 0.5) (1, 1) (0.5, 1.5) (1.5, 0.5)

1 0.78 0.77 0.79 0.81 0.81 0.82

2 0.72 0.71 0.72 0.75 0.74 0.75

3 0.87 0.86 0.86 0.90 0.90 0.90

4 0.71 0.72 0.70 0.73 0.74 0.72

5 0.81 0.83 0.82 0.83 0.85 0.85

6 0.68 0.68 0.68 0.72 0.70 0.71

7 0.70 0.70 0.70 0.72 0.72 0.72

8 0.65 0.65 0.65 0.68 0.69 0.66

9 0.59 0.59 0.61 0.59 0.60 0.60

10 0.88 0.87 0.86 0.92 0.91 0.91

11 0.59 0.59 0.60 0.60 0.61 0.61

12 0.81 0.82 0.81 0.85 0.87 0.86

Figure 5.3: Evolution of TPR with parameter B (horizontal axis: B; vertical axis: TPR), DBHT in blue and BHT in red.

Figure 5.4: Evolution of AUC with parameter B (horizontal axis: B; vertical axis: AUC), DBHT in blue and BHT in red.

5.3 Worcester Heart Attack Study dataset

The dataset from the Worcester Heart Attack Study (WHAS) contains data on 100 individuals, each with 5 covariates. The data concern the survival times of patients after their first heart attack. The dataset is publicly available at https://www.umass.edu/statdata/statdata/data/. The outliers detected by the methods in the WHAS dataset are presented in Table 5.16. The selection corresponds to the fifteen observations with the lowest p-values. The estimates of the regression coefficients when fitting the Cox model to all observations are given in Table 5.17. We observe that only two covariates are statistically significant: the age at the first heart attack (Age) and the body mass index (BMI). Performing a 5% trimming of outliers, the Cox estimates


Table 5.16: Top-15 outliers detected by the methods on the WHAS100 dataset.

# MART DEV LD DFB OSD BHT DBHT

1 93 1 97 8 1 67 1

2 51 31 67 97 67 1 67

3 90 56 1 93 97 78 97

4 33 85 52 52 51 56 56

5 11 97 23 30 23 69 23

6 27 93 7 10 31 8 90

7 40 30 57 78 93 45 93

8 1 78 78 7 52 93 8

9 31 51 56 56 56 30 78

10 56 90 17 32 57 32 51

11 85 67 29 54 7 13 29

12 97 91 31 98 30 33 69

13 46 11 91 57 13 47 72

14 69 23 30 51 78 51 30

15 30 27 32 90 8 97 17


Table 5.17: Cox model estimated with all WHAS observations.

$X_i$ $\beta_i$ p-value

Los -0.0220 0.3967

Age 0.0386 0.0025

Gender 0.1558 0.6066

BMI -0.0711 0.0497

Table 5.18: Cox model fit on the WHAS dataset with 5% outlier trimming, using the alternative methods.

$X_i$    MART $\beta_i$    MART p    DEV $\beta_i$    DEV p    LD $\beta_i$    LD p    DFB $\beta_i$    DFB p

Los -0.0322 0.2074 -0.0194 0.4543 -0.0169 0.5012 -0.1001 0.0541

Age 0.0443 0.0006 0.0492 0.0005 0.0588 0.0001 0.0490 0.0003

Gender 0.5758 0.0559 0.0751 0.8154 -0.0316 0.9214 -0.0716 0.8253

BMI -0.0794 0.0286 -0.0967 0.0160 -0.0970 0.0156 -0.1305 0.0020

Table 5.19: Cox model fit on the WHAS dataset with 5% outlier trimming, using the proposed methods.

$X_i$    OSD $\beta_i$    OSD p    BHT $\beta_i$    BHT p    DBHT $\beta_i$    DBHT p

Los -0.0195 0.4412 -0.1048 0.0434 -0.0216 0.4260

Age 0.0525 0.0003 0.0485 0.0003 0.0555 0.0002

Gender 0.1408 0.6544 0.1786 0.5690 -0.0020 0.9951

BMI -0.1064 0.0067 -0.1214 0.0031 -0.1010 0.0122

Table 5.20: Cox model on the WHAS data with 10% outlier trimming for alternative methods.

$X_i$    MART $\beta_i$    MART p    DEV $\beta_i$    DEV p    LD $\beta_i$    LD p    DFB $\beta_i$    DFB p

Los -0.0399 0.1646 -0.0308 0.3014 -0.0158 0.5654 -0.1139 0.0358

Age 0.0523 0.0001 0.0481 0.0006 0.0690 0.0000 0.0503 0.0012

Gender 0.5013 0.1064 0.3778 0.2475 -0.2390 0.4821 -0.2272 0.5076

BMI -0.0936 0.0134 -0.1613 0.0002 -0.1458 0.0013 -0.1706 0.0004


Table 5.21: Cox model fit on the WHAS dataset with 10% outlier trimming, using the proposed methods.

$X_i$    OSD $\beta_i$    OSD p    BHT $\beta_i$    BHT p    DBHT $\beta_i$    DBHT p

Los -0.0252 0.3743 -0.1662 0.0062 -0.1298 0.0172

Age 0.0677 0.0000 0.0478 0.0000 0.0588 0.0001

Gender 0.0422 0.8983 0.0029 0.9921 0.2585 0.4247

BMI -0.1336 0.0024 -0.1620 0.0012 -0.1618 0.0003

5.4 Bone Marrow Transplant dataset

The Bone Marrow Transplant (BMT) dataset (Klein and Moeschberger, 2003) contains data on 137 leukemia patients, each with 10 covariates. The data concern the survival time after the bone marrow transplant. It is publicly available in the R package “KMsurv” (Klein et al., 2012). The outliers detected by the methods in the BMT dataset are presented in Table 5.22. The observations presented correspond to those with the 10% lowest p-values. For BHT and DBHT, a number of bootstrap samples $B = 2000$ proved sufficient for convergence.

Table 5.22: Top-10% outliers detected by the methods on the BMT dataset.

# MART DEV LD DFB OSD BHT DBHT


1 65 129 129 65 129 129 129

2 103 35 132 26 132 103 132

3 99 108 89 129 30 99 99

4 97 65 90 99 130 65 65

5 13 132 26 2 26 30 130

6 42 87 30 6 28 132 103

7 63 84 28 89 65 13 30

8 40 103 130 43 13 130 89

9 92 30 17 84 103 16 13

10 14 99 105 103 14 136 28

11 43 97 136 130 72 15 14

12 39 28 116 132 89 26 105

13 49 109 72 30 50 97 116

14 10 80 36 10 99 131 90

When using all the data, the statistically significant covariates are FAB, Hospital and MTX (Table 5.23). After identifying the most outlying observations with each method, we can perform outlier trimming to obtain a more robust estimate of the Cox model. Starting with a trimming level of 5%, new Cox models were estimated (Tables 5.24 and 5.25). There are no major changes from the model with all data, apart from the coefficients of Donor Age and CMV, whose p-values drop considerably. With an outlier trimming level of 10%, we can verify that the covariate Donor Age for two of the

Table 5.23: Cox model estimation with all BMT data (coefficients with p-value below 5% in bold).

Xi β̂i p-value

Age Diagn -0.0017 0.9357

Donor Age 0.0316 0.1072

Sex -0.2738 0.2651

Donor Sex 0.0409 0.8662

CMV -0.1701 0.4922

Donor CMV 0.0038 0.9875

Wait Time -0.0001 0.8701

FAB 0.7917 0.0012

Hospital -0.5570 0.0004

MTX 1.0062 0.0026

Table 5.24: Cox model estimations with 5% outlier trimming using the alternative methods.

MART DEV LD DFB

Xi β̂i p β̂i p β̂i p β̂i p

Age Diagn 0.0057 0.7733 0.0184 0.4007 0.0087 0.6996 0.0022 0.9163

Donor Age 0.0338 0.0630 0.0194 0.3390 0.0352 0.0936 0.0353 0.0787

Sex -0.4350 0.0754 -0.3857 0.1318 -0.3110 0.2309 -0.2544 0.3129

Donor Sex -0.0826 0.7417 0.2214 0.3959 0.2268 0.3834 0.2096 0.4284

CMV -0.4239 0.1077 -0.3828 0.1580 -0.3386 0.1910 -0.4058 0.1264

Donor CMV 0.0150 0.9511 0.0326 0.8968 0.1254 0.6206 -0.0774 0.7585

Wait Time 0.0000 0.9538 0.0001 0.7803 -0.0003 0.5304 0.0005 0.3899

FAB 1.1493 0.0000 1.0162 0.0001 0.9587 0.0002 1.1592 0.0000

Hospital -0.6484 0.0000 -0.6842 0.0001 -1.0732 0.0001 -0.7807 0.0000

MTX 1.6598 0.0000 1.2379 0.0008 1.7474 0.0003 1.6405 0.0000

alternative methods now appears above the 5% significance level. For all three of our proposed methods, trimming the 13 most outlying observations from the data results in the covariate CMV having a p-value below the 5% level.
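The trimming step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `n_to_trim` and `trim_outliers` are hypothetical, the p-values in the usage note are synthetic, and rounding the trimmed count down is an assumption chosen because it matches the counts quoted in the text (13 of 137 observations at 10%, 4 of 91 at 5%).

```python
import math


def n_to_trim(n_obs, level):
    """Number of observations removed at a given trimming level.

    Assumption: the count is rounded down. A tiny tolerance guards
    against floating-point products such as 0.29 * 100 = 28.999...
    """
    return math.floor(level * n_obs + 1e-9)


def trim_outliers(p_values, level):
    """Split observation indices into (kept, trimmed).

    The observations with the smallest outlier p-values are treated
    as the most outlying and are the ones trimmed.
    """
    n_obs = len(p_values)
    ranked = sorted(range(n_obs), key=lambda i: p_values[i])
    trimmed = sorted(ranked[:n_to_trim(n_obs, level)])
    trimmed_set = set(trimmed)
    kept = [i for i in range(n_obs) if i not in trimmed_set]
    return kept, trimmed
```

A Cox model would then be refitted on the `kept` indices only. For example, with 137 synthetic p-values, `trim_outliers(p, 0.10)` returns 13 trimmed and 124 kept indices.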


Table 5.25: Cox model estimations with 5% outlier trimming using the proposed methods.

OSD BHT DBHT

Xi β̂i p β̂i p β̂i p

Age Diagn 0.0192 0.3837 0.0086 0.6762 0.0183 0.3779

Donor Age 0.0242 0.2356 0.0327 0.0910 0.0250 0.2001

Sex -0.3655 0.1508 -0.3076 0.2234 -0.2922 0.2524

Donor Sex 0.3848 0.1510 0.0357 0.8904 0.1293 0.6206

CMV -0.5158 0.0574 -0.3469 0.1868 -0.4485 0.0919

Donor CMV 0.0064 0.9797 0.0514 0.8349 0.0265 0.9153

Wait Time -0.0003 0.5408 0.0001 0.7567 0.0001 0.6760

FAB 1.0752 0.0000 1.2288 0.0000 1.2365 0.0000

Hospital -0.8930 0.0000 -0.7683 0.0000 -0.8743 0.0000

MTX 1.6513 0.0001 1.6951 0.0000 1.8809 0.0000

Table 5.26: Cox model estimations with 10% outlier trimming using the alternative methods.

MART DEV LD DFB

Xi β̂i p β̂i p β̂i p β̂i p

Age Diagn 0.0029 0.8845 0.0295 0.1748 0.0017 0.9421 0.0102 0.6457

Donor Age 0.0335 0.0778 0.0225 0.2738 0.0555 0.0144 0.0368 0.0849

Sex -0.3882 0.1233 -0.5748 0.0359 -0.3793 0.1673 -0.3072 0.2421

Donor Sex 0.0508 0.8399 0.2325 0.3946 0.3354 0.2309 0.2538 0.3634

CMV -0.4251 0.1097 -0.4897 0.0765 -0.3968 0.1484 -0.5183 0.0637

Donor CMV 0.0068 0.9777 0.1063 0.6800 -0.0026 0.9922 0.0010 0.9968

Wait Time -0.0001 0.7227 0.0003 0.3269 -0.0002 0.6049 0.0005 0.3632

FAB 1.1796 0.0000 1.2746 0.0000 1.0994 0.0000 1.3017 0.0000

Hospital -0.6857 0.0000 -0.7872 0.0000 -1.4023 0.0001 -0.9851 0.0000

MTX 1.9100 0.0000 1.5219 0.0001 2.2493 0.0002 2.0618 0.0000

5.5 CancerSys Dataset

The CancerSys dataset (CSYS) contains data on 161 cancer patients with bone metastasis (91 after removing observations with missing values). The recorded time is the follow-up time after the diagnosis of bone metastasis. The top-10% outliers detected by each method are displayed in Table 5.28. Fitting a Cox model to all observations (see Table 5.29), we verify that only two covariates are significant at the 5% level: AgeDiagn and ExtraMets. After 5% outlier trimming and refitting the Cox model (see Tables 5.30 and 5.31), we observe important changes: for three of the alternative methods, trimming results in the covariate Sex becoming significant; for all three of our proposed methods, removing the 4 most


Table 5.27: Cox model estimations with 10% outlier trimming using the proposed methods.

OSD BHT DBHT

Xi β̂i p β̂i p β̂i p

Age Diagn 0.0278 0.1945 -0.0174 0.4183 0.0226 0.2863

Donor Age 0.0124 0.5390 0.0332 0.0972 0.0288 0.1394

Sex -0.4453 0.0873 -0.4120 0.1148 -0.4235 0.1124

Donor Sex 0.2619 0.3427 0.0760 0.7804 0.1546 0.5717

CMV -0.6221 0.0263 -0.5407 0.0466 -0.5840 0.0378

Donor CMV 0.0597 0.8164 -0.0239 0.9256 0.1857 0.4689

Wait Time -0.0002 0.7155 0.0001 0.6230 0.0002 0.5920

FAB 1.2761 0.0000 1.2596 0.0000 1.2792 0.0000

Hospital -1.3875 0.0000 -0.9910 0.0000 -1.5141 0.0000

MTX 3.0082 0.0000 2.1267 0.0000 3.0513 0.0000

outlying observations results in the covariate XRayPattern emerging as significant. It is noteworthy that, for DBHT, removing only 4 of the 91 observations decreased the p-value of XRayPattern from 0.0997 to 0.0073 (Tables 5.29 and 5.31).

Table 5.28: Top-15 outliers detected by the methods on the CSYS dataset.


1 68 60 49 68 83 83 83

2 83 143 112 78 126 112 126

3 34 68 60 34 84 22 112

4 78 110 124 53 91 49 124

5 53 64 22 57 112 47 28

6 97 124 28 23 49 124 49

7 126 112 32 126 22 126 60

8 23 78 62 83 62 62 91

9 91 53 83 9 60 28 73

10 57 126 140 97 124 91 62


Table 5.29: Cox model fitted to all CSYS data.

Xi β̂i p-value

Sex 0.2848 0.2658

AgeDiagn 0.0268 0.0034

XRayPattern -0.2442 0.0997

NSRE -0.0565 0.6210

ExtraMets 0.8031 0.0035

NTXBase 0.0004 0.2406

Table 5.30: Cox model fit with 5% outlier trimming using the alternative methods.

MART DEV LD DFB

Xi β̂i p β̂i p β̂i p β̂i p

Sex 0.6734 0.0130 0.5709 0.0396 0.2684 0.3174 0.8407 0.0029

AgeDiagn 0.0209 0.0450 0.0198 0.0567 0.0311 0.0012 0.0147 0.1627

XRayPattern -0.2175 0.1766 -0.2532 0.1017 -0.3116 0.0428 -0.1344 0.3911

NSRE -0.0648 0.5530 -0.0858 0.4680 -0.0096 0.9339 -0.0275 0.8017

ExtraMets 1.0720 0.0002 0.9979 0.0006 1.0453 0.0005 1.1404 0.0001

NTXBase 0.0002 0.5102 0.0004 0.1568 0.0005 0.1512 0.0005 0.1436

Table 5.31: Cox model fit with 5% outlier trimming using the proposed methods.

OSD BHT DBHT

Xi β̂i p β̂i p β̂i p

Sex 0.1289 0.6167 0.2300 0.3803 0.2739 0.2917

AgeDiagn 0.0351 0.0002 0.0306 0.0012 0.0344 0.0003

XRayPattern -0.4881 0.0036 -0.3423 0.0282 -0.4292 0.0073

NSRE -0.1377 0.2560 -0.0461 0.6838 -0.0842 0.4665

ExtraMets 1.0456 0.0003 1.0025 0.0008 0.9654 0.0008

NTXBase 0.0003 0.2913 0.0005 0.1506 0.0004 0.2524

With a 10% level of outlier trimming (Tables 5.32 and 5.33), the MART method, which at the 5% level returned Sex as significant, reverts to considering Sex non-significant. Our proposed methods remain consistent: all three yield the same significant covariates, and their coefficient estimates are similar.


Table 5.32: Cox model with 10% outlier trimming for alternative methods.

MART DEV LD DFB

Xi β̂i p β̂i p β̂i p β̂i p

Sex 0.3642 0.1902 0.8061 0.0052 0.2234 0.4153 0.6558 0.0218

AgeDiagn 0.0365 0.0012 0.0158 0.1367 0.0352 0.0004 0.0289 0.0152

XRayPattern -0.3291 0.0502 -0.1408 0.4040 -0.3998 0.0136 -0.3261 0.0503

NSRE -0.1056 0.3328 -0.0105 0.9290 -0.0332 0.7764 -0.0188 0.8842

ExtraMets 1.0650 0.0004 1.2823 0.0000 1.2264 0.0001 0.9939 0.0011

NTXBase 0.0001 0.6687 0.0004 0.2584 0.0005 0.1074 0.0004 0.3105

Table 5.33: Cox model with 10% outlier trimming for proposed methods.

OSD BHT DBHT

Xi β̂i p β̂i p β̂i p

Sex -0.0204 0.9400 0.1799 0.5039 0.2191 0.4228

AgeDiagn 0.0398 0.0001 0.0379 0.0001 0.0394 0.0001

XRayPattern -0.5477 0.0015 -0.4688 0.0042 -0.5169 0.0022

NSRE -0.1258 0.3132 -0.0831 0.4848 -0.0724 0.5272

ExtraMets 1.3058 0.0000 1.1139 0.0003 1.1872 0.0001

NTXBase 0.0005 0.1512 0.0005 0.1541 0.0004 0.2351

Leave-one-out Cross-validation of the c-index

To assess the predictive ability of the model when facing new observations, we perform leave-one-out cross-validation of the c-index. In these results the outliers also become part of the test sets, but they are never present in the training sets used to estimate the models. Three levels of outlier trimming were used: removal of the 3, the 10, and the 30 most outlying observations. The leave-one-out values obtained for the three real datasets are presented for each method in Tables 5.34, 5.35, and 5.36. The results are very positive for the three methods: the concordance increases as more of the most outlying observations of each dataset are removed, the only exception being OSD on CSYS, where the c-index drops slightly from top-10 (0.6214) to top-30 (0.6196).
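For reference, the concordance measure behind these values can be sketched as follows. This is a minimal pure-Python version of Harrell's c-index for right-censored data, not the implementation used to produce the tables. The function name `harrell_c_index` is hypothetical, pairs with tied survival times are skipped for simplicity, and how the held-out predictions of the leave-one-out folds are pooled into a single value is not specified here.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair is comparable when the subject with the shorter time had an
    observed event (events[i] is truthy). It is concordant when that
    subject also has the higher predicted risk; tied risks count 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

In a leave-one-out setting, `risk_scores` would hold the prediction for each observation from a model fitted without that observation (and without the trimmed outliers). A value of 0.5 corresponds to random ordering and 1.0 to perfect concordance.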

Table 5.34: Leave-one-out c-indexes for the BHT method.

Dataset All data top-3 top-10 top-30

WHAS 0.6607 0.6813 0.6824 0.6900

BMT 0.6208 0.6314 0.6441 0.6668

CSYS 0.5963 0.6053 0.6147 0.6186


Table 5.35: Leave-one-out c-indexes for the DBHT procedure.

Dataset All data top-3 top-10 top-30


WHAS 0.6607 0.6710 0.6807 0.6910

BMT 0.6208 0.6288 0.6462 0.6630

CSYS 0.5963 0.6114 0.6160 0.6240

Table 5.36: Leave-one-out c-indexes for the OSD procedure.

Dataset All data top-3 top-10 top-30


WHAS 0.6607 0.6832 0.6853 0.6986

BMT 0.6208 0.6314 0.6441 0.6629

CSYS 0.5963 0.6100 0.6214 0.6196


Chapter 6

Conclusions and Future Work

We proposed three methods for outlier detection in a survival context. In our simulation study the methods achieved, in general, better performance than the alternative methods. Overall, DBHT has shown promising results, being the best method in nine of the twelve simulated outlier scenarios. In the three scenarios where the outlier source is collinear with the general model, performance is poor for all of our proposed methods. One possible cause is that concordance fails to capture this type of outlier, given that such outliers have the same hazard direction as the inliers. We have also verified that the performance of the proposed methods is relatively robust to changes in scenario conditions, when compared to the alternative methods. On the real datasets, performing outlier trimming prior to fitting a Cox model looks promising, since it may unveil Cox regression effects that were distorted by the presence of outliers in the original data. In terms of the model’s predictive ability, the leave-one-out c-indexes increased when the most outlying observations were excluded from the fit.

In terms of future work, the developed methods leave room for many extensions. In particular, since the methods use a survival model as a black box (it is only used to fit the data and calculate concordance), other models can be used instead of only the Cox model, as we did in this work. The bootstrapping frameworks of BHT and DBHT can also be applied to outlier detection on other kinds of data: the methods only need a test statistic that is believed to be sensitive to outliers. In conclusion, it is shown once more that outlier detection methods can provide tools for performing robust regression and for improving model interpretability and accuracy in many fields of research.
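To illustrate the idea of plugging an arbitrary outlier-sensitive statistic into a bootstrap scheme, the sketch below scores univariate observations. It is a generic leave-one-out bootstrap comparison and is not the BHT or DBHT procedure as defined in this work. The function name `bootstrap_outlier_pvalues`, the choice of the sample variance as the statistic, and the toy data in the usage note are all assumptions made for the example. The sketch also assumes a statistic that outliers inflate, whereas concordance is a statistic that outliers are expected to lower.

```python
import random
import statistics


def bootstrap_outlier_pvalues(data, statistic, n_boot=200, seed=0):
    """Score each observation with a p-value-like quantity.

    For observation i, bootstrap samples are drawn from the data
    without i. The score is the fraction of replicates whose statistic
    is not lower than the statistic on the full data. If removing i
    consistently lowers the statistic, the score is close to 0 and i
    is flagged as outlying.
    """
    rng = random.Random(seed)
    data = list(data)
    t_full = statistic(data)
    p_values = []
    for i in range(len(data)):
        reduced = data[:i] + data[i + 1:]
        not_lower = 0
        for _ in range(n_boot):
            # Resample with replacement from the data without observation i.
            sample = rng.choices(reduced, k=len(reduced))
            if statistic(sample) >= t_full:
                not_lower += 1
        p_values.append(not_lower / n_boot)
    return p_values
```

As a usage example, calling `bootstrap_outlier_pvalues(values, statistics.variance)` on `values = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.05, 0.95, 10.0]` gives the last observation the lowest score, because every bootstrap sample drawn without it has a much smaller variance than the full data.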


Bibliography

Aalen, O., Borgan, O., and Gjessing, H. (2008). Survival and event history analysis: a process point of view.

Springer Science & Business Media.

Bednarski, T. (1993). Robust estimation in cox’s regression model. Scandinavian Journal of Statistics, pages

213–225.

Bednarski, T. and Borowicz, F. (2006). coxrobust: Robust Estimation in Cox Model. R package version 1.0.

Ben-Gal, I. (2005). Outlier detection. In Data Mining and Knowledge Discovery Handbook, pages 131–146.

Springer.

Bender, R., Augustin, T., and Blettner, M. (2005). Generating survival times to simulate cox proportional

hazards models. Statistics in medicine, 24(11):1713–1723.

Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. A. (1984). Classification and regression trees. CRC

press.

Buckley, J. and James, I. (1979). Linear regression with censored data. Biometrika, 66(3):429–436.

Klein, J. P., Moeschberger, M. L., and Yan, J. (2012). KMsurv: Data sets from Klein and Moeschberger (1997), Survival Analysis. R package version 0.1-5.

Cox, D. R. (1972). Regression models and life tables. Journal of the Royal Statistical Society. Series B (Methodological), 34:187–202.

Cox, D. R. and Snell, E. J. (1968). A general definition of residuals. Journal of the Royal Statistical Society.

Series B (Methodological), pages 248–275.

Crowley, J. and Hu, M. (1977). Covariance analysis of heart transplant survival data. Journal of the American

Statistical Association, 72(357):27–36.

Collett, D. (2003). Modelling survival data in medical research. Chapman & Hall/CRC, Boca Raton, FL.

Kleinbaum, D. G. and Klein, M. (2005). Survival analysis: a self-learning text. Springer, New York, NY.

Davies, L. and Gather, U. (1993). The identification of multiple outliers. Journal of the American Statistical

Association, 88(423):782–792.

Donoho, D. L. and Huber, P. J. (1983). The notion of breakdown point. A Festschrift for Erich L. Lehmann, pages 157–184.


Efron, B. (1979). Bootstrap methods: another look at the jackknife. The annals of Statistics, pages 1–26.

Efron, B. and Tibshirani, R. J. (1994). An introduction to the bootstrap. CRC press.

Farcomeni, A. and Viviani, S. (2011). Robust estimation for the cox regression model based on trimming.

Biometrical Journal, 53(6):956–973.

Fischler, M. and Bolles, R. (1981). Random Sample Consensus: A Paradigm for Model Fitting with Applica-

tions to Image Analysis and Automated Cartography. Communications of the ACM.

Goeman, J. J. (2010). L1 penalized estimation in the cox proportional hazards model. Biometrical Journal,

(52):–14.

Goeman, J. J. (2012). Penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs

and in the Cox model. R package version.

Grambsch, P. M. and Therneau, T. M. (1994). Proportional hazards tests and diagnostics based on weighted

residuals. Biometrika, 81(3):515–526.

Harrell, F. E. (2001). Regression modeling strategies: with applications to linear models, logistic regression,

and survival analysis. Springer.

Harrell, F. E., Califf, R. M., Pryor, D. B., Lee, K. L., and Rosati, R. A. (1982). Evaluating the yield of medical tests. JAMA, 247(18):2543–2546.

Hawkins, D. M. (1980). Identification of outliers, volume 11. Springer.

Heagerty, P. J., Lumley, T., and Pepe, M. S. (2000). Time-dependent roc curves for censored survival data

and a diagnostic marker. Biometrics, 56(2):337–344.

Heagerty, P. J. and packaging by Paramita Saha-Chaudhuri (2013). survivalROC: Time-dependent ROC curve

estimation from censored survival data. R package version 1.0.3.

Heller, G. and Simonoff, J. S. (1990). A comparison of estimators for regression with a censored response variable. Biometrika, 77(3):515–520.

Huber, P. J. (2011). Robust statistics. Springer.

Johnson, R. A. and Wichern, D. W. (1992). Applied multivariate statistical analysis, volume 4. Prentice Hall, Englewood Cliffs, NJ.

Kalbfleisch, J. D. and Prentice, R. L. (2011). The statistical analysis of failure time data, volume 360. John

Wiley & Sons.

Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the

American statistical association, 53(282):457–481.

Kendall, M. and Gibbons, J. D. (1990). Rank Correlation Methods. A Charles Griffin Title, 5 edition.

Klein, J. P. and Moeschberger, M. L. (2003). Survival Analysis Techniques for Censored and Truncated Data.

Second edition.

Larson, M. G. and Dinse, G. E. (1985). A mixture model for the regression analysis of competing risks data.

Applied statistics, pages 201–211.


Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data. John Wiley & Sons, 2nd edition.

Leblanc, M. and Crowley, J. (1993). Survival trees by goodness of split. Journal of the American Statistical

Association, 88(422):457–467.

Lunn, M. and McNeil, D. (1995). Applying cox regression to competing risks. Biometrics, pages 524–532.

MacKinnon, J. G. (2009). Bootstrap hypothesis testing. Handbook of Computational Econometrics, pages

183–213.

Mikosch, T. (1998). Elementary stochastic calculus, with finance in view. AMC, 10:12.

Miller, R. and Halpern, J. (1982). Regression with censored data. Biometrika, 69(3):521–531.

Miller, R. G. (1976). Least squares regression with censored data. Biometrika, 63(3):449–464.

Nardi, A. and Schemper, M. (1999). New residuals for cox regression and their application to outlier screening.

Biometrics, 55(2):523–529.

Newson, R. (2006). Confidence intervals for rank statistics: Somers’ d and extensions. Stata Journal, 6(3):309.

Ojo, A. O., Hanson, J. A., Wolfe, R. A., Leichtman, A. B., Agodoa, L. Y., and Port, F. K. (2000). Long-term

survival in renal transplant recipients with graft function. Kidney international, 57(1):307–313.

Rajagopalan, V. (2006). Selected Statistical Tests. New Age International.

Rocha, C. (2011). Examining time to rearrest by drug treatment experience of drug court eligible offenders.

Rousseeuw, P. J. and Leroy, A. M. (2005). Robust regression and outlier detection, volume 589. John Wiley

& Sons.

Singh, K. and Xie, M. (2003). Bootlier-Plot: Bootstrap Based Outlier Detection Plot. Sankhya: The Indian

Journal of Statistics (2003-2007), 65(3):532–559.

Somers, R. H. (1962). A new asymmetric measure of association for ordinal variables. American sociological

review, pages 799–811.

Stare, J., Heinzl, H., and Harrell, F. (2000). On the use of buckley and james least squares regression for

survival data. New approaches in applied statistics: Metodoloski zvezki, 16.

Struthers, C. A. and Kalbfleisch, J. D. (1986). Misspecified proportional hazard models. Biometrika,

73(2):363–369.

Therneau, T. M. (2014). A Package for Survival Analysis in S. R package version 2.37-7.

Therneau, T. M. and Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. Springer.

Therneau, T. M., Grambsch, P. M., and Fleming, T. R. (1990). Martingale-based residuals for survival models.

Biometrika, 77(1):147–160.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society.

Series B (Methodological), pages 267–288.


Appendix A: Results on the simulation data

Table 6.1: TPR of each method in the 12 outlier scenarios. c = 0.2; k = 5; λ = 1; ν = 1.

# Scenario MART DEV LD DFB OSD BHT DBHT

1 0.28(0.19) 0.39(0.19) 0.43(0.19) 0.37(0.16) 0.44(0.21) 0.42(0.16) 0.44(0.21)

2 0.2(0.16) 0.22(0.17) 0.28(0.19) 0.27(0.15) 0.26(0.19) 0.28(0.18) 0.28(0.18)

3 0.44(0.22) 0.59(0.16) 0.62(0.18) 0.55(0.17) 0.62(0.19) 0.58(0.19) 0.63(0.19)

4 0.21(0.14) 0.23(0.14) 0.32(0.14) 0.25(0.15) 0.28(0.17) 0.3(0.15) 0.31(0.16)

5 0.41(0.2) 0.54(0.17) 0.53(0.19) 0.49(0.19) 0.54(0.2) 0.51(0.19) 0.56(0.21)

6 0.2(0.16) 0.24(0.14) 0.31(0.18) 0.28(0.17) 0.25(0.17) 0.28(0.17) 0.28(0.18)

7 0.3(0.19) 0.54(0.17) 0.41(0.17) 0.38(0.15) 0.4(0.2) 0.39(0.19) 0.41(0.2)

8 0.16(0.13) 0.13(0.13) 0.24(0.16) 0.22(0.13) 0.18(0.17) 0.17(0.14) 0.2(0.17)

9 0.28(0.17) 0.39(0.18) 0.17(0.15) 0.23(0.16) 0.05(0.1) 0.04(0.08) 0.06(0.1)

10 0.53(0.19) 0.7(0.19) 0.72(0.18) 0.64(0.16) 0.72(0.19) 0.62(0.16) 0.73(0.18)

11 0.38(0.2) 0.49(0.17) 0.25(0.14) 0.32(0.15) 0.06(0.13) 0.07(0.13) 0.06(0.11)

12 0.39(0.22) 0.64(0.2) 0.61(0.2) 0.53(0.18) 0.6(0.19) 0.54(0.19) 0.59(0.2)


Table 6.2: AUC of each method in the 12 outlier scenarios. c = 0.2; k = 5; λ = 1; ν = 1.

# Scenario MART DEV LD DFB BHT DBHT

1 0.71(0.13) 0.72(0.14) 0.74(0.14) 0.7(0.13) 0.77(0.15) 0.79(0.15)

2 0.66(0.12) 0.66(0.13) 0.71(0.15) 0.65(0.13) 0.71(0.14) 0.75(0.13)

3 0.82(0.13) 0.83(0.13) 0.83(0.13) 0.8(0.14) 0.86(0.11) 0.91(0.08)

4 0.65(0.13) 0.67(0.12) 0.71(0.11) 0.62(0.12) 0.72(0.12) 0.73(0.13)

5 0.8(0.12) 0.8(0.12) 0.78(0.13) 0.78(0.11) 0.82(0.13) 0.85(0.1)

6 0.66(0.11) 0.66(0.11) 0.74(0.12) 0.67(0.12) 0.72(0.13) 0.75(0.13)

7 0.77(0.13) 0.8(0.13) 0.71(0.14) 0.74(0.12) 0.7(0.12) 0.73(0.13)

8 0.61(0.1) 0.62(0.1) 0.7(0.11) 0.61(0.12) 0.66(0.12) 0.67(0.13)

9 0.76(0.12) 0.76(0.12) 0.65(0.13) 0.71(0.13) 0.61(0.1) 0.61(0.09)

10 0.9(0.11) 0.9(0.1) 0.88(0.14) 0.88(0.12) 0.91(0.09) 0.95(0.06)

11 0.82(0.12) 0.82(0.12) 0.66(0.11) 0.76(0.13) 0.59(0.09) 0.62(0.1)

12 0.8(0.15) 0.82(0.15) 0.81(0.16) 0.8(0.14) 0.84(0.12) 0.88(0.1)



1 0.32(0.11) 0.37(0.14) 0.48(0.13) 0.37(0.12) 0.53(0.12) 0.51(0.12) 0.55(0.12)

2 0.22(0.1) 0.28(0.09) 0.38(0.11) 0.3(0.12) 0.39(0.11) 0.39(0.14) 0.39(0.12)

3 0.59(0.13) 0.6(0.15) 0.63(0.15) 0.51(0.12) 0.7(0.14) 0.67(0.11) 0.71(0.11)

4 0.23(0.1) 0.22(0.12) 0.31(0.13) 0.29(0.1) 0.32(0.13) 0.31(0.13) 0.35(0.11)

5 0.52(0.16) 0.51(0.14) 0.49(0.14) 0.51(0.1) 0.6(0.14) 0.53(0.15) 0.57(0.12)

6 0.21(0.13) 0.25(0.11) 0.29(0.14) 0.29(0.12) 0.29(0.14) 0.29(0.13) 0.3(0.14)

7 0.46(0.11) 0.51(0.14) 0.35(0.12) 0.4(0.08) 0.5(0.15) 0.4(0.14) 0.45(0.15)

8 0.18(0.07) 0.17(0.1) 0.25(0.13) 0.22(0.09) 0.23(0.15) 0.21(0.12) 0.25(0.12)

9 0.33(0.11) 0.42(0.15) 0.16(0.1) 0.28(0.1) 0.14(0.09) 0.08(0.09) 0.08(0.07)

10 0.6(0.12) 0.62(0.14) 0.66(0.15) 0.55(0.11) 0.68(0.17) 0.65(0.13) 0.73(0.1)

11 0.39(0.14) 0.5(0.14) 0.26(0.13) 0.35(0.12) 0.17(0.09) 0.12(0.09) 0.15(0.09)

12 0.55(0.12) 0.61(0.13) 0.53(0.16) 0.51(0.13) 0.62(0.16) 0.54(0.18) 0.62(0.16)




1 0.71(0.08) 0.71(0.08) 0.76(0.1) 0.67(0.1) 0.81(0.07) 0.84(0.07)

2 0.64(0.09) 0.63(0.09) 0.72(0.11) 0.63(0.1) 0.74(0.11) 0.74(0.11)

3 0.82(0.1) 0.81(0.1) 0.8(0.1) 0.77(0.11) 0.89(0.06) 0.91(0.07)

4 0.6(0.08) 0.61(0.08) 0.68(0.09) 0.58(0.07) 0.71(0.1) 0.72(0.09)

5 0.79(0.1) 0.77(0.1) 0.72(0.11) 0.76(0.1) 0.82(0.08) 0.84(0.07)

6 0.64(0.09) 0.63(0.08) 0.68(0.1) 0.62(0.08) 0.67(0.09) 0.7(0.1)

7 0.74(0.1) 0.75(0.09) 0.62(0.1) 0.68(0.1) 0.7(0.11) 0.73(0.11)

8 0.6(0.06) 0.6(0.06) 0.66(0.09) 0.57(0.07) 0.63(0.09) 0.66(0.1)

9 0.76(0.08) 0.74(0.1) 0.61(0.09) 0.7(0.08) 0.55(0.07) 0.56(0.07)

10 0.85(0.07) 0.85(0.08) 0.83(0.1) 0.8(0.07) 0.89(0.06) 0.92(0.05)

11 0.78(0.09) 0.77(0.08) 0.6(0.06) 0.73(0.11) 0.58(0.06) 0.58(0.06)

12 0.82(0.1) 0.8(0.1) 0.73(0.11) 0.79(0.09) 0.81(0.1) 0.86(0.1)



1 0.28(0.18) 0.3(0.17) 0.39(0.18) 0.33(0.16) 0.41(0.19) 0.39(0.18) 0.42(0.18)

2 0.21(0.16) 0.22(0.15) 0.28(0.19) 0.28(0.16) 0.29(0.2) 0.26(0.17) 0.3(0.19)

3 0.44(0.21) 0.6(0.17) 0.61(0.17) 0.54(0.17) 0.62(0.18) 0.58(0.17) 0.66(0.16)

4 0.22(0.16) 0.22(0.17) 0.24(0.18) 0.28(0.17) 0.24(0.19) 0.27(0.19) 0.28(0.2)

5 0.36(0.19) 0.5(0.21) 0.52(0.21) 0.46(0.17) 0.52(0.2) 0.48(0.19) 0.5(0.19)

6 0.18(0.15) 0.18(0.17) 0.26(0.19) 0.24(0.17) 0.22(0.2) 0.21(0.16) 0.26(0.2)

7 0.35(0.19) 0.47(0.19) 0.41(0.18) 0.4(0.16) 0.42(0.21) 0.34(0.2) 0.4(0.2)

8 0.18(0.16) 0.17(0.15) 0.21(0.16) 0.23(0.16) 0.21(0.17) 0.18(0.14) 0.23(0.17)

9 0.32(0.17) 0.31(0.18) 0.15(0.16) 0.22(0.17) 0.04(0.08) 0.04(0.08) 0.04(0.09)

10 0.5(0.2) 0.6(0.18) 0.66(0.16) 0.56(0.13) 0.63(0.15) 0.58(0.18) 0.68(0.15)

11 0.38(0.19) 0.43(0.19) 0.21(0.15) 0.28(0.16) 0.08(0.13) 0.09(0.12) 0.09(0.13)

12 0.43(0.21) 0.57(0.19) 0.5(0.19) 0.48(0.19) 0.52(0.21) 0.5(0.2) 0.57(0.18)




1 0.69(0.12) 0.68(0.12) 0.73(0.12) 0.67(0.13) 0.77(0.11) 0.81(0.13)

2 0.66(0.13) 0.67(0.14) 0.67(0.14) 0.69(0.12) 0.71(0.13) 0.74(0.14)

3 0.81(0.14) 0.81(0.15) 0.79(0.13) 0.78(0.14) 0.87(0.1) 0.89(0.08)

4 0.66(0.11) 0.65(0.11) 0.72(0.11) 0.65(0.13) 0.72(0.12) 0.73(0.15)

5 0.75(0.15) 0.75(0.16) 0.76(0.17) 0.74(0.15) 0.8(0.14) 0.83(0.13)

6 0.65(0.12) 0.65(0.12) 0.68(0.12) 0.63(0.13) 0.67(0.13) 0.72(0.12)

7 0.74(0.14) 0.74(0.14) 0.7(0.13) 0.72(0.14) 0.71(0.13) 0.72(0.14)

8 0.66(0.11) 0.65(0.11) 0.65(0.13) 0.65(0.13) 0.66(0.13) 0.7(0.12)

9 0.72(0.14) 0.7(0.13) 0.61(0.12) 0.68(0.13) 0.62(0.1) 0.61(0.08)

10 0.84(0.12) 0.84(0.11) 0.83(0.13) 0.83(0.12) 0.88(0.08) 0.92(0.06)

11 0.76(0.14) 0.75(0.12) 0.64(0.1) 0.73(0.13) 0.59(0.11) 0.61(0.09)

12 0.77(0.15) 0.76(0.15) 0.74(0.15) 0.75(0.14) 0.81(0.12) 0.84(0.12)



1 0.29(0.11) 0.37(0.12) 0.45(0.14) 0.36(0.11) 0.49(0.14) 0.43(0.14) 0.48(0.13)

2 0.26(0.1) 0.26(0.1) 0.34(0.13) 0.32(0.13) 0.37(0.15) 0.33(0.13) 0.37(0.14)

3 0.57(0.14) 0.56(0.12) 0.52(0.13) 0.5(0.12) 0.67(0.13) 0.58(0.1) 0.67(0.12)

4 0.24(0.1) 0.26(0.1) 0.33(0.13) 0.31(0.11) 0.35(0.12) 0.34(0.13) 0.35(0.13)

5 0.5(0.13) 0.52(0.13) 0.44(0.17) 0.47(0.11) 0.56(0.16) 0.51(0.13) 0.57(0.14)

6 0.23(0.11) 0.25(0.11) 0.3(0.11) 0.29(0.11) 0.33(0.12) 0.3(0.12) 0.32(0.11)

7 0.47(0.15) 0.51(0.14) 0.37(0.13) 0.41(0.12) 0.47(0.13) 0.37(0.12) 0.45(0.11)

8 0.2(0.09) 0.22(0.12) 0.24(0.14) 0.25(0.11) 0.27(0.12) 0.26(0.12) 0.28(0.11)

9 0.33(0.14) 0.37(0.13) 0.19(0.14) 0.25(0.13) 0.13(0.12) 0.07(0.09) 0.09(0.1)

10 0.56(0.13) 0.57(0.15) 0.55(0.14) 0.54(0.11) 0.66(0.14) 0.61(0.12) 0.67(0.12)

11 0.42(0.16) 0.45(0.15) 0.25(0.15) 0.34(0.14) 0.17(0.15) 0.14(0.14) 0.16(0.14)

12 0.55(0.14) 0.54(0.15) 0.46(0.13) 0.45(0.13) 0.59(0.15) 0.52(0.14) 0.59(0.12)




1 0.67(0.09) 0.67(0.09) 0.72(0.11) 0.64(0.11) 0.76(0.09) 0.82(0.07)

2 0.63(0.09) 0.64(0.09) 0.67(0.09) 0.63(0.1) 0.72(0.09) 0.75(0.09)

3 0.78(0.11) 0.76(0.11) 0.69(0.13) 0.74(0.11) 0.84(0.07) 0.89(0.05)

4 0.62(0.09) 0.61(0.09) 0.66(0.09) 0.61(0.09) 0.7(0.09) 0.72(0.09)

5 0.74(0.08) 0.73(0.08) 0.68(0.09) 0.71(0.09) 0.79(0.08) 0.82(0.09)

6 0.61(0.09) 0.62(0.08) 0.64(0.11) 0.6(0.09) 0.68(0.09) 0.71(0.08)

7 0.77(0.08) 0.75(0.09) 0.61(0.11) 0.73(0.09) 0.67(0.1) 0.71(0.09)

8 0.6(0.08) 0.6(0.08) 0.64(0.08) 0.61(0.09) 0.64(0.1) 0.68(0.09)

9 0.72(0.1) 0.67(0.11) 0.57(0.08) 0.65(0.09) 0.6(0.08) 0.6(0.08)

10 0.78(0.11) 0.77(0.11) 0.7(0.11) 0.76(0.11) 0.86(0.08) 0.9(0.07)

11 0.75(0.11) 0.71(0.12) 0.59(0.08) 0.7(0.12) 0.58(0.08) 0.59(0.08)

12 0.74(0.11) 0.73(0.1) 0.64(0.1) 0.7(0.1) 0.8(0.07) 0.83(0.08)

Table 6.9: TPR of each method in the 12 outlier scenarios. c = 0.2; k = 5; λ = 0.5; ν = 1.5.

# Scenario MART DEV LD DFB OSD BHT DBHT


1 0.24(0.16) 0.35(0.2) 0.38(0.21) 0.33(0.14) 0.42(0.22) 0.41(0.22) 0.42(0.2)

2 0.2(0.16) 0.25(0.14) 0.29(0.19) 0.3(0.2) 0.32(0.2) 0.27(0.19) 0.32(0.19)

3 0.52(0.2) 0.6(0.15) 0.69(0.14) 0.58(0.14) 0.64(0.25) 0.64(0.22) 0.7(0.2)

4 0.25(0.18) 0.25(0.18) 0.26(0.2) 0.28(0.16) 0.24(0.22) 0.28(0.19) 0.28(0.2)

5 0.35(0.24) 0.6(0.21) 0.62(0.16) 0.56(0.14) 0.64(0.2) 0.6(0.19) 0.62(0.18)

6 0.21(0.15) 0.22(0.19) 0.3(0.18) 0.27(0.18) 0.27(0.13) 0.28(0.19) 0.29(0.19)

7 0.34(0.17) 0.55(0.19) 0.46(0.15) 0.45(0.13) 0.45(0.17) 0.43(0.16) 0.42(0.17)

8 0.16(0.19) 0.15(0.16) 0.2(0.17) 0.19(0.18) 0.16(0.19) 0.16(0.12) 0.2(0.15)

9 0.23(0.15) 0.37(0.22) 0.2(0.18) 0.19(0.14) 0.06(0.11) 0.02(0.06) 0.02(0.06)

10 0.4(0.23) 0.65(0.18) 0.69(0.2) 0.51(0.2) 0.68(0.19) 0.57(0.2) 0.69(0.19)

11 0.34(0.16) 0.41(0.21) 0.21(0.15) 0.28(0.15) 0.08(0.12) 0.06(0.13) 0.08(0.12)

12 0.45(0.21) 0.69(0.18) 0.63(0.12) 0.55(0.16) 0.67(0.16) 0.57(0.16) 0.67(0.16)


Table 6.10: AUC of each method in the 12 outlier scenarios. c = 0.2; k = 5; λ = 0.5; ν = 1.5.

# Scenario MART DEV LD DFB BHT DBHT


1 0.68(0.13) 0.69(0.14) 0.72(0.13) 0.7(0.13) 0.77(0.12) 0.8(0.11)

2 0.67(0.11) 0.69(0.12) 0.73(0.14) 0.65(0.15) 0.69(0.14) 0.75(0.12)

3 0.85(0.12) 0.86(0.11) 0.87(0.12) 0.83(0.11) 0.92(0.07) 0.93(0.06)

4 0.69(0.14) 0.68(0.12) 0.69(0.15) 0.67(0.12) 0.74(0.13) 0.74(0.15)

5 0.81(0.13) 0.83(0.14) 0.83(0.16) 0.82(0.14) 0.85(0.13) 0.87(0.11)

6 0.65(0.1) 0.67(0.12) 0.67(0.15) 0.67(0.13) 0.71(0.13) 0.71(0.11)

7 0.8(0.13) 0.81(0.14) 0.74(0.17) 0.77(0.13) 0.76(0.14) 0.76(0.15)

8 0.64(0.12) 0.63(0.11) 0.68(0.14) 0.63(0.14) 0.66(0.12) 0.69(0.13)

9 0.75(0.13) 0.76(0.13) 0.65(0.14) 0.72(0.11) 0.62(0.12) 0.63(0.13)

10 0.83(0.12) 0.85(0.14) 0.86(0.13) 0.81(0.13) 0.88(0.12) 0.91(0.09)

11 0.77(0.11) 0.77(0.12) 0.63(0.1) 0.75(0.13) 0.62(0.1) 0.63(0.11)

12 0.85(0.11) 0.86(0.12) 0.83(0.11) 0.84(0.1) 0.85(0.11) 0.92(0.07)



1 0.3(0.12) 0.35(0.13) 0.43(0.13) 0.37(0.12) 0.48(0.14) 0.44(0.11) 0.49(0.14)

2 0.22(0.1) 0.28(0.12) 0.37(0.15) 0.32(0.12) 0.36(0.14) 0.33(0.15) 0.39(0.14)

3 0.52(0.18) 0.59(0.13) 0.62(0.14) 0.5(0.13) 0.69(0.13) 0.6(0.13) 0.69(0.14)

4 0.22(0.12) 0.23(0.09) 0.36(0.12) 0.29(0.07) 0.35(0.12) 0.32(0.12) 0.36(0.11)

5 0.46(0.15) 0.52(0.13) 0.57(0.16) 0.45(0.13) 0.62(0.11) 0.58(0.13) 0.62(0.12)

6 0.22(0.1) 0.28(0.09) 0.36(0.1) 0.28(0.09) 0.34(0.1) 0.36(0.13) 0.36(0.09)

7 0.45(0.14) 0.53(0.12) 0.41(0.11) 0.43(0.13) 0.46(0.16) 0.39(0.11) 0.43(0.11)

8 0.19(0.1) 0.21(0.11) 0.27(0.13) 0.25(0.13) 0.27(0.12) 0.24(0.09) 0.29(0.13)

9 0.34(0.13) 0.39(0.12) 0.19(0.13) 0.28(0.11) 0.13(0.1) 0.11(0.09) 0.1(0.1)

10 0.57(0.12) 0.65(0.11) 0.67(0.13) 0.55(0.12) 0.74(0.1) 0.61(0.12) 0.74(0.11)

11 0.44(0.15) 0.48(0.14) 0.25(0.12) 0.33(0.1) 0.18(0.13) 0.12(0.11) 0.16(0.16)

12 0.52(0.14) 0.61(0.17) 0.6(0.12) 0.5(0.12) 0.7(0.1) 0.56(0.13) 0.68(0.1)




1 0.68(0.1) 0.69(0.09) 0.72(0.1) 0.67(0.1) 0.79(0.08) 0.82(0.07)

2 0.64(0.09) 0.64(0.09) 0.72(0.1) 0.63(0.08) 0.71(0.1) 0.75(0.1)

3 0.8(0.1) 0.8(0.1) 0.79(0.11) 0.76(0.11) 0.86(0.07) 0.9(0.06)

4 0.63(0.07) 0.63(0.09) 0.73(0.1) 0.65(0.1) 0.73(0.08) 0.76(0.08)

5 0.79(0.1) 0.8(0.11) 0.79(0.13) 0.74(0.11) 0.85(0.09) 0.86(0.06)

6 0.63(0.07) 0.62(0.09) 0.7(0.1) 0.6(0.1) 0.71(0.06) 0.73(0.05)

7 0.79(0.1) 0.78(0.1) 0.64(0.1) 0.75(0.1) 0.7(0.09) 0.7(0.1)

8 0.61(0.08) 0.61(0.09) 0.66(0.09) 0.61(0.08) 0.64(0.09) 0.68(0.09)

9 0.76(0.09) 0.73(0.1) 0.6(0.07) 0.68(0.08) 0.57(0.07) 0.59(0.07)

10 0.85(0.1) 0.85(0.1) 0.83(0.09) 0.8(0.1) 0.87(0.07) 0.92(0.05)

11 0.82(0.09) 0.8(0.07) 0.59(0.08) 0.74(0.08) 0.57(0.07) 0.61(0.08)

12 0.84(0.07) 0.84(0.08) 0.77(0.1) 0.8(0.07) 0.84(0.05) 0.89(0.05)



1 0.27(0.17) 0.36(0.18) 0.44(0.2) 0.36(0.16) 0.46(0.24) 0.42(0.2) 0.45(0.21)

2 0.17(0.15) 0.2(0.17) 0.24(0.19) 0.26(0.2) 0.24(0.2) 0.26(0.2) 0.24(0.2)

3 0.45(0.23) 0.54(0.2) 0.55(0.18) 0.51(0.18) 0.57(0.23) 0.52(0.17) 0.59(0.18)

4 0.18(0.17) 0.19(0.15) 0.31(0.18) 0.28(0.17) 0.28(0.21) 0.24(0.17) 0.29(0.18)

5 0.38(0.19) 0.56(0.19) 0.53(0.17) 0.51(0.15) 0.55(0.18) 0.52(0.16) 0.56(0.17)

6 0.2(0.16) 0.17(0.15) 0.22(0.18) 0.22(0.17) 0.2(0.19) 0.19(0.18) 0.21(0.18)

7 0.35(0.2) 0.46(0.18) 0.38(0.19) 0.4(0.17) 0.38(0.18) 0.34(0.18) 0.36(0.17)

8 0.14(0.15) 0.17(0.15) 0.21(0.16) 0.2(0.15) 0.18(0.17) 0.15(0.12) 0.18(0.15)

9 0.3(0.17) 0.34(0.18) 0.15(0.16) 0.21(0.15) 0.05(0.1) 0.06(0.11) 0.05(0.1)

10 0.52(0.24) 0.65(0.18) 0.63(0.18) 0.59(0.18) 0.62(0.2) 0.58(0.19) 0.65(0.17)

11 0.31(0.18) 0.42(0.19) 0.2(0.15) 0.25(0.15) 0.08(0.12) 0.06(0.11) 0.07(0.11)

12 0.45(0.22) 0.56(0.17) 0.52(0.17) 0.49(0.17) 0.52(0.19) 0.51(0.19) 0.55(0.18)




1 0.74(0.13) 0.73(0.14) 0.77(0.14) 0.73(0.12) 0.77(0.13) 0.82(0.14)

2 0.63(0.12) 0.64(0.11) 0.66(0.13) 0.63(0.14) 0.7(0.13) 0.71(0.11)

3 0.8(0.16) 0.8(0.15) 0.77(0.14) 0.78(0.16) 0.85(0.11) 0.89(0.1)

4 0.65(0.11) 0.64(0.12) 0.71(0.13) 0.67(0.13) 0.74(0.14) 0.74(0.13)

5 0.77(0.13) 0.76(0.14) 0.74(0.14) 0.75(0.13) 0.82(0.11) 0.84(0.11)

6 0.63(0.11) 0.63(0.11) 0.68(0.12) 0.63(0.12) 0.65(0.12) 0.68(0.11)

7 0.75(0.15) 0.75(0.14) 0.67(0.16) 0.73(0.14) 0.7(0.13) 0.73(0.14)

8 0.65(0.1) 0.63(0.11) 0.65(0.1) 0.63(0.12) 0.65(0.12) 0.68(0.13)

9 0.71(0.13) 0.71(0.13) 0.63(0.13) 0.68(0.12) 0.61(0.11) 0.6(0.1)

10 0.84(0.13) 0.84(0.13) 0.82(0.11) 0.83(0.13) 0.87(0.11) 0.91(0.08)

11 0.74(0.14) 0.73(0.14) 0.63(0.11) 0.72(0.13) 0.61(0.1) 0.62(0.11)

12 0.76(0.16) 0.76(0.15) 0.71(0.17) 0.74(0.14) 0.82(0.11) 0.85(0.09)



1 0.33(0.11) 0.34(0.11) 0.43(0.14) 0.37(0.11) 0.49(0.15) 0.43(0.12) 0.49(0.13)

2 0.28(0.12) 0.32(0.11) 0.33(0.12) 0.32(0.1) 0.39(0.15) 0.37(0.16) 0.4(0.13)

3 0.56(0.11) 0.55(0.11) 0.52(0.13) 0.47(0.1) 0.64(0.13) 0.52(0.12) 0.64(0.13)

4 0.26(0.09) 0.27(0.11) 0.32(0.14) 0.32(0.13) 0.35(0.13) 0.3(0.12) 0.37(0.13)

5 0.53(0.15) 0.56(0.14) 0.48(0.14) 0.46(0.11) 0.59(0.15) 0.51(0.13) 0.58(0.14)

6 0.22(0.12) 0.24(0.12) 0.29(0.12) 0.27(0.11) 0.3(0.14) 0.25(0.12) 0.31(0.14)

7 0.46(0.15) 0.48(0.13) 0.36(0.14) 0.4(0.12) 0.45(0.15) 0.33(0.13) 0.42(0.12)

8 0.19(0.09) 0.22(0.12) 0.28(0.13) 0.27(0.13) 0.29(0.14) 0.25(0.11) 0.29(0.12)

9 0.38(0.12) 0.38(0.13) 0.19(0.13) 0.28(0.11) 0.15(0.12) 0.1(0.1) 0.1(0.11)

10 0.61(0.14) 0.58(0.14) 0.51(0.15) 0.55(0.1) 0.71(0.12) 0.62(0.13) 0.72(0.11)

11 0.41(0.16) 0.49(0.12) 0.26(0.13) 0.34(0.13) 0.18(0.11) 0.13(0.11) 0.16(0.1)

12 0.54(0.08) 0.55(0.2) 0.42(0.18) 0.49(0.12) 0.59(0.17) 0.46(0.15) 0.55(0.14)




1 0.68(0.09) 0.67(0.09) 0.71(0.11) 0.66(0.1) 0.77(0.11) 0.81(0.09)

2 0.64(0.1) 0.64(0.08) 0.66(0.11) 0.61(0.08) 0.72(0.1) 0.76(0.07)

3 0.77(0.09) 0.76(0.08) 0.72(0.09) 0.72(0.09) 0.81(0.08) 0.88(0.05)

4 0.63(0.1) 0.64(0.09) 0.67(0.09) 0.62(0.1) 0.68(0.07) 0.73(0.09)

5 0.79(0.09) 0.78(0.1) 0.7(0.1) 0.75(0.1) 0.8(0.08) 0.83(0.08)

6 0.61(0.08) 0.61(0.08) 0.62(0.11) 0.6(0.08) 0.65(0.09) 0.69(0.09)

7 0.73(0.11) 0.72(0.1) 0.62(0.1) 0.7(0.1) 0.66(0.1) 0.7(0.09)

8 0.6(0.07) 0.6(0.08) 0.63(0.1) 0.61(0.08) 0.65(0.09) 0.69(0.08)

9 0.72(0.09) 0.69(0.1) 0.58(0.08) 0.66(0.09) 0.57(0.06) 0.58(0.07)

10 0.81(0.1) 0.79(0.1) 0.7(0.11) 0.78(0.09) 0.87(0.06) 0.9(0.05)

11 0.77(0.1) 0.75(0.09) 0.59(0.09) 0.72(0.09) 0.57(0.07) 0.58(0.08)

12 0.76(0.1) 0.73(0.13) 0.65(0.13) 0.73(0.11) 0.77(0.08) 0.82(0.07)



1 0.25(0.21) 0.36(0.19) 0.42(0.2) 0.38(0.19) 0.45(0.2) 0.42(0.2) 0.41(0.19)

2 0.2(0.17) 0.26(0.15) 0.36(0.19) 0.29(0.17) 0.3(0.18) 0.33(0.23) 0.36(0.21)

3 0.35(0.18) 0.62(0.19) 0.62(0.16) 0.51(0.21) 0.56(0.21) 0.62(0.17) 0.6(0.18)

4 0.14(0.13) 0.22(0.16) 0.25(0.17) 0.21(0.15) 0.25(0.16) 0.23(0.18) 0.24(0.18)

5 0.4(0.16) 0.64(0.17) 0.61(0.18) 0.5(0.17) 0.65(0.16) 0.63(0.15) 0.64(0.12)

6 0.24(0.17) 0.2(0.17) 0.25(0.18) 0.25(0.17) 0.29(0.2) 0.21(0.14) 0.25(0.16)

7 0.36(0.15) 0.48(0.18) 0.39(0.14) 0.4(0.11) 0.38(0.24) 0.31(0.17) 0.37(0.2)

8 0.16(0.14) 0.11(0.14) 0.15(0.16) 0.16(0.15) 0.13(0.13) 0.12(0.15) 0.11(0.14)

9 0.3(0.17) 0.36(0.17) 0.2(0.15) 0.27(0.18) 0.07(0.12) 0.04(0.1) 0.04(0.1)

10 0.48(0.26) 0.64(0.12) 0.71(0.12) 0.61(0.15) 0.66(0.2) 0.63(0.12) 0.71(0.12)

11 0.39(0.18) 0.48(0.18) 0.23(0.19) 0.33(0.18) 0.14(0.15) 0.11(0.14) 0.11(0.15)

12 0.47(0.21) 0.63(0.18) 0.62(0.19) 0.54(0.16) 0.54(0.18) 0.5(0.21) 0.56(0.19)




1 0.73(0.11) 0.72(0.11) 0.79(0.11) 0.72(0.12) 0.83(0.08) 0.86(0.08)

2 0.7(0.12) 0.7(0.14) 0.74(0.14) 0.69(0.11) 0.79(0.12) 0.78(0.12)

3 0.8(0.13) 0.83(0.12) 0.82(0.15) 0.77(0.15) 0.88(0.09) 0.91(0.09)

4 0.63(0.11) 0.61(0.14) 0.68(0.11) 0.62(0.12) 0.72(0.13) 0.73(0.11)

5 0.82(0.12) 0.84(0.12) 0.82(0.17) 0.79(0.12) 0.87(0.1) 0.88(0.1)

6 0.65(0.11) 0.64(0.12) 0.69(0.15) 0.66(0.13) 0.72(0.14) 0.74(0.12)

7 0.77(0.11) 0.77(0.12) 0.68(0.14) 0.74(0.13) 0.72(0.15) 0.73(0.13)

8 0.6(0.11) 0.62(0.11) 0.66(0.09) 0.64(0.12) 0.67(0.1) 0.64(0.12)

9 0.75(0.13) 0.76(0.13) 0.65(0.11) 0.7(0.13) 0.64(0.12) 0.64(0.1)

10 0.82(0.11) 0.83(0.1) 0.84(0.12) 0.81(0.1) 0.89(0.07) 0.92(0.06)

11 0.8(0.1) 0.78(0.11) 0.64(0.12) 0.75(0.14) 0.65(0.1) 0.66(0.1)

12 0.86(0.12) 0.87(0.12) 0.83(0.15) 0.83(0.13) 0.84(0.12) 0.88(0.11)



1 0.31(0.11) 0.36(0.11) 0.47(0.13) 0.34(0.12) 0.54(0.13) 0.46(0.12) 0.52(0.11)

2 0.25(0.11) 0.25(0.13) 0.33(0.13) 0.29(0.11) 0.34(0.12) 0.31(0.13) 0.35(0.14)

3 0.54(0.17) 0.6(0.15) 0.61(0.14) 0.53(0.14) 0.69(0.15) 0.61(0.13) 0.68(0.13)

4 0.25(0.11) 0.26(0.11) 0.38(0.12) 0.32(0.13) 0.38(0.12) 0.34(0.13) 0.39(0.13)

5 0.48(0.16) 0.54(0.15) 0.51(0.16) 0.48(0.11) 0.63(0.15) 0.5(0.14) 0.6(0.14)

6 0.21(0.09) 0.25(0.13) 0.32(0.12) 0.3(0.1) 0.33(0.11) 0.3(0.13) 0.33(0.12)

7 0.44(0.13) 0.52(0.15) 0.43(0.12) 0.44(0.13) 0.5(0.14) 0.4(0.09) 0.49(0.11)

8 0.2(0.13) 0.19(0.13) 0.26(0.12) 0.24(0.13) 0.25(0.14) 0.21(0.11) 0.27(0.14)

9 0.37(0.12) 0.39(0.13) 0.19(0.13) 0.3(0.13) 0.11(0.11) 0.08(0.08) 0.09(0.1)

10 0.58(0.17) 0.66(0.14) 0.65(0.14) 0.58(0.1) 0.73(0.14) 0.58(0.13) 0.72(0.12)

11 0.39(0.14) 0.52(0.15) 0.29(0.11) 0.36(0.13) 0.18(0.09) 0.11(0.09) 0.16(0.08)

12 0.54(0.15) 0.62(0.15) 0.53(0.18) 0.53(0.12) 0.67(0.15) 0.52(0.14) 0.66(0.13)




1 0.68(0.09) 0.69(0.11) 0.74(0.11) 0.66(0.1) 0.78(0.08) 0.83(0.06)

2 0.63(0.1) 0.63(0.09) 0.69(0.1) 0.62(0.1) 0.7(0.08) 0.75(0.08)

3 0.82(0.1) 0.81(0.09) 0.81(0.1) 0.78(0.1) 0.86(0.08) 0.9(0.06)

4 0.62(0.09) 0.62(0.09) 0.7(0.09) 0.62(0.08) 0.71(0.09) 0.75(0.08)

5 0.77(0.11) 0.77(0.11) 0.71(0.13) 0.73(0.11) 0.8(0.08) 0.85(0.08)

6 0.61(0.07) 0.62(0.08) 0.69(0.08) 0.6(0.08) 0.67(0.09) 0.72(0.08)

7 0.79(0.1) 0.78(0.12) 0.66(0.11) 0.75(0.11) 0.69(0.09) 0.72(0.11)

8 0.62(0.08) 0.6(0.08) 0.65(0.1) 0.61(0.09) 0.62(0.09) 0.66(0.1)

9 0.77(0.1) 0.74(0.1) 0.59(0.08) 0.71(0.1) 0.59(0.06) 0.59(0.07)

10 0.84(0.11) 0.84(0.1) 0.81(0.1) 0.81(0.1) 0.86(0.08) 0.92(0.06)

11 0.8(0.1) 0.8(0.11) 0.61(0.08) 0.75(0.09) 0.57(0.07) 0.59(0.08)

12 0.84(0.1) 0.84(0.1) 0.75(0.12) 0.81(0.1) 0.8(0.1) 0.88(0.08)



1 0.26(0.17) 0.37(0.14) 0.38(0.17) 0.38(0.18) 0.41(0.17) 0.36(0.17) 0.42(0.16)

2 0.17(0.16) 0.17(0.16) 0.22(0.17) 0.22(0.17) 0.21(0.19) 0.23(0.19) 0.23(0.19)

3 0.48(0.25) 0.54(0.17) 0.6(0.19) 0.55(0.18) 0.55(0.2) 0.54(0.15) 0.6(0.18)

4 0.17(0.14) 0.2(0.15) 0.22(0.15) 0.24(0.14) 0.23(0.15) 0.22(0.13) 0.26(0.15)

5 0.36(0.17) 0.52(0.15) 0.56(0.2) 0.47(0.18) 0.53(0.16) 0.51(0.17) 0.55(0.18)

6 0.17(0.13) 0.16(0.15) 0.21(0.19) 0.22(0.14) 0.18(0.17) 0.19(0.16) 0.2(0.16)

7 0.36(0.2) 0.5(0.2) 0.41(0.2) 0.46(0.19) 0.42(0.23) 0.38(0.22) 0.42(0.23)

8 0.16(0.17) 0.14(0.13) 0.18(0.16) 0.18(0.16) 0.13(0.17) 0.15(0.15) 0.16(0.15)

9 0.32(0.21) 0.32(0.17) 0.12(0.13) 0.22(0.18) 0.03(0.07) 0.05(0.08) 0.03(0.07)

10 0.45(0.2) 0.69(0.19) 0.68(0.16) 0.53(0.17) 0.67(0.17) 0.61(0.17) 0.7(0.16)

11 0.34(0.21) 0.46(0.14) 0.22(0.16) 0.26(0.16) 0.14(0.17) 0.12(0.13) 0.1(0.14)

12 0.46(0.24) 0.62(0.19) 0.56(0.18) 0.56(0.18) 0.57(0.19) 0.55(0.17) 0.59(0.18)




1 0.68(0.12) 0.68(0.12) 0.71(0.13) 0.66(0.15) 0.74(0.12) 0.78(0.11)

2 0.63(0.09) 0.61(0.11) 0.69(0.11) 0.62(0.1) 0.66(0.11) 0.72(0.1)

3 0.82(0.15) 0.82(0.13) 0.81(0.13) 0.8(0.14) 0.84(0.11) 0.9(0.08)

4 0.62(0.12) 0.63(0.1) 0.69(0.12) 0.61(0.12) 0.67(0.13) 0.71(0.13)

5 0.76(0.18) 0.74(0.17) 0.77(0.2) 0.74(0.16) 0.83(0.1) 0.85(0.15)

6 0.62(0.1) 0.63(0.09) 0.65(0.12) 0.62(0.1) 0.66(0.13) 0.68(0.12)

7 0.76(0.14) 0.76(0.14) 0.69(0.13) 0.75(0.16) 0.72(0.14) 0.73(0.14)

8 0.63(0.11) 0.62(0.11) 0.63(0.11) 0.64(0.14) 0.66(0.12) 0.68(0.13)

9 0.74(0.13) 0.71(0.13) 0.6(0.09) 0.67(0.12) 0.61(0.1) 0.6(0.1)

10 0.8(0.15) 0.82(0.14) 0.81(0.15) 0.78(0.14) 0.87(0.09) 0.91(0.08)

11 0.76(0.13) 0.75(0.12) 0.61(0.08) 0.7(0.1) 0.6(0.11) 0.6(0.12)

12 0.82(0.14) 0.82(0.13) 0.78(0.14) 0.8(0.13) 0.82(0.15) 0.87(0.11)



1 0.32(0.12) 0.38(0.08) 0.44(0.14) 0.4(0.12) 0.54(0.12) 0.46(0.11) 0.54(0.09)

2 0.22(0.12) 0.27(0.11) 0.34(0.12) 0.28(0.13) 0.36(0.15) 0.34(0.14) 0.38(0.15)

3 0.51(0.13) 0.51(0.1) 0.46(0.14) 0.46(0.12) 0.61(0.14) 0.57(0.15) 0.61(0.14)

4 0.22(0.11) 0.24(0.12) 0.29(0.1) 0.3(0.15) 0.33(0.12) 0.32(0.15) 0.33(0.13)

5 0.5(0.11) 0.47(0.13) 0.39(0.14) 0.45(0.11) 0.57(0.16) 0.51(0.15) 0.54(0.14)

6 0.23(0.11) 0.24(0.14) 0.28(0.17) 0.26(0.09) 0.26(0.16) 0.25(0.15) 0.28(0.14)

7 0.44(0.17) 0.49(0.14) 0.37(0.14) 0.4(0.13) 0.44(0.18) 0.37(0.13) 0.44(0.16)

8 0.2(0.13) 0.24(0.13) 0.29(0.15) 0.26(0.14) 0.28(0.14) 0.25(0.11) 0.28(0.15)

9 0.34(0.12) 0.34(0.15) 0.18(0.13) 0.27(0.14) 0.13(0.1) 0.09(0.07) 0.08(0.07)

10 0.57(0.16) 0.61(0.15) 0.5(0.17) 0.56(0.1) 0.69(0.15) 0.55(0.11) 0.67(0.11)

11 0.39(0.15) 0.45(0.12) 0.27(0.1) 0.36(0.14) 0.16(0.11) 0.14(0.12) 0.15(0.13)

12 0.52(0.14) 0.56(0.15) 0.46(0.14) 0.51(0.14) 0.62(0.15) 0.48(0.11) 0.57(0.11)




1 0.69(0.09) 0.68(0.09) 0.72(0.11) 0.68(0.09) 0.8(0.07) 0.83(0.05)

2 0.62(0.09) 0.61(0.1) 0.69(0.1) 0.62(0.11) 0.71(0.1) 0.75(0.09)

3 0.76(0.11) 0.75(0.11) 0.68(0.14) 0.74(0.1) 0.84(0.07) 0.88(0.07)

4 0.64(0.09) 0.63(0.09) 0.65(0.1) 0.63(0.09) 0.71(0.1) 0.69(0.11)

5 0.75(0.08) 0.71(0.09) 0.64(0.1) 0.73(0.08) 0.79(0.08) 0.81(0.08)

6 0.63(0.1) 0.63(0.1) 0.66(0.12) 0.61(0.1) 0.66(0.13) 0.68(0.11)

7 0.71(0.12) 0.71(0.11) 0.61(0.09) 0.69(0.11) 0.67(0.11) 0.7(0.13)

8 0.64(0.11) 0.63(0.1) 0.66(0.09) 0.62(0.08) 0.63(0.08) 0.67(0.09)

9 0.71(0.11) 0.68(0.11) 0.58(0.08) 0.65(0.09) 0.59(0.05) 0.58(0.07)

10 0.8(0.11) 0.8(0.12) 0.69(0.12) 0.78(0.12) 0.83(0.08) 0.9(0.06)

11 0.77(0.13) 0.74(0.12) 0.57(0.09) 0.73(0.1) 0.6(0.07) 0.59(0.08)

12 0.76(0.09) 0.75(0.09) 0.68(0.1) 0.74(0.09) 0.77(0.09) 0.83(0.09)
