enhancing diagnostics for invasive aspergillosis using machine learning

22
Introduction Results Description Conclusions HISA Big Data 2014 – April 3rd 2014 ( #BD14 ) Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning Simone Romano [email protected] @ialuronico James Bailey 1 Lawrence Cavedon 1,2,3 Orla Morrissey 4,5 Monica slavin 6,7 Karin Verspoor 1,2 1 The University of Melbourne, Dept. of Computing and Information Systems 2 NICTA (National ICT Aust.) VRL 3 School of Computer Science and IT, RMIT University 4 Alfred Health 5 Monash University 6 Peter MacCallum Cancer Centre 7 Melbourne Health Simone Romano The University of Melbourne Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Upload: simone-romano

Post on 12-Apr-2017

124 views

Category:

Science


0 download

TRANSCRIPT

Page 1: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

HISA Big Data 2014 – April 3rd 2014 ( #BD14 )

Enhancing Diagnostics for Invasive Aspergillosis usingMachine Learning

Simone Romano

[email protected]

@ialuronico

James Bailey1 Lawrence Cavedon1,2,3 Orla Morrissey4,5

Monica slavin6,7 Karin Verspoor1,2

1The University of Melbourne, Dept. of Computing and Information Systems2NICTA (National ICT Aust.) VRL3School of Computer Science and IT, RMIT University4Alfred Health 5Monash University6Peter MacCallum Cancer Centre 7Melbourne Health

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 2: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

IntroductionInvasive AspergillosisChallenging Big Data Task

ResultsDiagnostic Model

DescriptionMachine Learning for DiagnosisDiagnosis of Invasive Aspergillosis

ConclusionsSummaryFuture Work

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 3: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Invasive Aspergillosis

Invasive Aspergillosis (IA)

Serious fungal infection and major cause ofmortality in patients undergoing allogeneicstem cell transplantation or chemotherapyfor acute leukaemia.

Figure : Pulmonary IA.http://en.wikipedia.org/wiki/Aspergillosis

Facts

I 34–43% mortality rate;

I culture methods low sensitivity, only 40–50% IA cases identified;

I IA patient results in +7 days of hospital stay and +$30,957.

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 4: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Invasive Aspergillosis

Diagnosis and Treatment

Cases are classified with ProvenIA/ProbableIA/PossibleIA.

Current criteria for diagnosing IA are:

1. microbiology, risk factors, and CT scan findings;

2. Improved biomarkers such as Aspergillus PCR and Galactomannan(GM) tested twice a week.

I positive biopsy OR (positive CT scan AND single positive PCR/GM)⇒ ProvenIA

I ≥ 2 consecutive positive PCR/GM in 2 week time frame⇒ ProbableIA

ProblemOne single positive biomarker might be a False Positive⇒ Unnecessary harmful treatment.

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 5: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Challenging Big Data Task

Big Data task

In a randomised controlled trial comparing the two different strategies fordiagnosis IA, large amount of data was collected from 240 patientsbetween Sept. 2005 and Nov. 2009 at six Australian Centres.

Objective: Leverage such data to produce more accurate prediction ofIA with Machine Learning techniques.

Are we really dealing with Big Data?

All patients tracked for 26 weeks providing rich longitudinal data ondaily and weekly tests for each patient.

240× 26× 7 = 45,680 records.Bed-side interpretation is a challenging task!

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 6: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Challenging Big Data Task

Big Data task

In a randomised controlled trial comparing the two different strategies fordiagnosis IA, large amount of data was collected from 240 patientsbetween Sept. 2005 and Nov. 2009 at six Australian Centres.

Objective: Leverage such data to produce more accurate prediction ofIA with Machine Learning techniques.

Are we really dealing with Big Data?All patients tracked for 26 weeks providing rich longitudinal data ondaily and weekly tests for each patient.

240× 26× 7 = 45,680 records.Bed-side interpretation is a challenging task!

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 7: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Diagnostic Model

IntroductionInvasive AspergillosisChallenging Big Data Task

ResultsDiagnostic Model

DescriptionMachine Learning for DiagnosisDiagnosis of Invasive Aspergillosis

ConclusionsSummaryFuture Work

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 8: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Diagnostic Model

Model

Our training set is a collection of 358 single positive biomarker tests thatprecede the earliest label of IA.

Transplant/Chemotherapybegins

1st 2nd 3rd 4th 5th months

positive biomarkers infection

Just 29 of the positive biomarkers were associated with a Proven IA orProbable IA label within a week (329 false positives)

I Built a model to output a probability of infection within a weekvalue;

I Validated by a patient-level cross-validation framework.

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 9: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Diagnostic Model

1 − TNR

TPR

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

AUC = 0.63

I AUC not too good

I But good inclassifying negatives!

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 10: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Diagnostic Model

1 − TNR

TPR

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

AUC = 0.63 I AUC not too good

I But good inclassifying negatives!

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 11: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Diagnostic Model

1 − TNR

TPR

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

AUC = 0.63

I AUC not too good

I But good inclassifying negatives!

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 12: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Diagnostic Model

Result

Setting a low threshold on the model output probability to achieve highNPV (100%) we were able to identify 95 (26.5%) tests that do notlead to an IA infection (TNR = 28.9%) within a week.

⇒ Doctors can avoid to start treatment in 26.5% cases!

I avoid over-treatment;

I reduce drug-toxicity;

I reduce antifungal drug costs(E.g. Amphotericin B $8,260 per patient per week).

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 13: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Machine Learning for Diagnosis

IntroductionInvasive AspergillosisChallenging Big Data Task

ResultsDiagnostic Model

DescriptionMachine Learning for DiagnosisDiagnosis of Invasive Aspergillosis

ConclusionsSummaryFuture Work

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 14: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Machine Learning for Diagnosis

Classification ModelsI Logistic regression;

I Decision trees;

I Random forestTraining set

Voting

resampling

random tree

resampling

random tree

resampling

random tree

resampling

random tree

resampling

random tree

Random forest because:

I It has the capability to work with heterogeneous features(categorical/continuous);

I It can work with many features.

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 15: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Diagnosis of Invasive Aspergillosis

Features to use

Known at baseline: Gender, age, BMI, smoking attitude status,etc.

Daily tested: neutrophil count, body temperature, amount ofadministered steroids, haemoglobin, platelets, white cell count, urea,creatinine, ALT, AST, GGT, bilirubin, LDH, etc.

Very heterogeneous features!!!

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 16: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Diagnosis of Invasive Aspergillosis

Features to use

Known at baseline: Gender, age, BMI, smoking attitude status,etc.

Daily tested: neutrophil count, body temperature, amount ofadministered steroids, haemoglobin, platelets, white cell count, urea,creatinine, ALT, AST, GGT, bilirubin, LDH, etc.

Very heterogeneous features!!!

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 17: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Diagnosis of Invasive Aspergillosis

Heterogeneous FeaturesFeatures constant along the treatment: Age, Gender, etc.

Features that varied over time: neutrophil count, temperature,corticosteroid doses, etc.

When we have a positive biomarker test we can use the recent pastinformation to predict IA. We consider recent past the values in the 3week window prior a single positive test result.

May Jun Jul

36.5

37.5

38.5

date

temperature

window

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 18: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Diagnosis of Invasive Aspergillosis

Features that varied over time

I Duration Features we count the number of days the value eachparameter lay within a particular range. For example, we divide themeasured temperature measurements into the intervals [36,37],(37,38], (38,39], (39, 40], and and greater than 40(>40) Celsiusdegrees and counted the number of days temperature occurred ineach interval;

I Trajectories We select two days in the 3 week window preceding apositive test test and compute the mean value, the standarddeviation, and the relative difference between those values. Wedo it for all possible intervals in the window.

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 19: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Summary

IntroductionInvasive AspergillosisChallenging Big Data Task

ResultsDiagnostic Model

DescriptionMachine Learning for DiagnosisDiagnosis of Invasive Aspergillosis

ConclusionsSummaryFuture Work

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 20: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Summary

Summary

Target: Enhance Diagnostics for biomarkers for InvasiveAspergillosis

Method: Random forest for heterogeneous features creatingduration features, and trajectories features;

Validation: patient-level cross-validation;

Results: Setting a low threshold on the output probability, NPV =100%, TNR = 28.9%. Safe avoidance of antifungaltherapy for 26.5% cases. Savings around $8K per patientper week.

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 21: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Future Work

Future Work

I make the model more accurate in predicting when a positive test isassociated with an immediate infection to trigger the antifungaltreatment earlier in time;

I search for alternative diagnosis when the outcomes are equallyprobable according to the model;

I make the model output more interpretable to clinical practitioners,e.g. by identifying the trajectories in the data which generate a lowor high probability of IA.

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Page 22: Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning

Introduction Results Description Conclusions

Future Work

Thank you.

Questions?

Simone Romano The University of Melbourne

Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning