Crowdsourcing Predictive Analytics to Enhance Clinical Trial Design Scott A. Jelinsky, Ph.D Computational Precision Medicine Inflammation and Immunity Research Area

Post on 19-Aug-2020


Page 1: Crowdsourcing Predictive Analytics to Enhance Clinical ... · – Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a

Crowdsourcing Predictive Analytics to Enhance Clinical Trial Design
Scott A. Jelinsky, Ph.D.
Computational Precision Medicine
Inflammation and Immunity Research Area

Page 2

Patient selection strategy in Chronic Obstructive Pulmonary Disease (COPD)

•  Exacerbation requiring emergency visit: 20%
•  Stable or slow decline in disease: 80%
•  No effective way to identify these patients

Image source: http://www.nhlbi.nih.gov/health//dci/Diseases/Copd/Copd_WhatIs.html

Page 3

Patient-Centric DataCommons

Pfizer Confidential │ 3

•  Routine clinical trials collect an unprecedented amount of data
•  1000s to millions of data points per patient
•  Diverse data types collected
•  Advanced data analytics are needed to analyze this wealth of data

Page 4

Genotype data

N=10,000

N=3.2 Billion

N=1 Million

•  Genetics, a discipline of biology, is the science of heredity and variation in living organisms.

Page 5

Imaging Lung Data

https://commons.wikimedia.org/wiki/File:Pulmon_fibrosis.PNG#/media/File:Pulmon_fibrosis.PNG

Med Image Comput Comput Assist Interv 2009; 12:690-8

Acad Radiol. 2012 Oct; 19(10): 1241–1251.

Data reduced to ~100 numerically derived fields

Page 6

Integration of Diverse Biomolecular and Clinical Data

•  Our goal: Integrate our Genetics, Genomics, Clinical, Text mining efforts to enable powerful analyses

Diagram labels: Internal Efforts; Externalization; Open Sourcing; Consortia

Knowledge to help develop patient stratification and drive clinical development

Page 7

What is Crowdsourcing/Open innovation?

•  From Wikipedia
–  Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people and especially from the online community rather than from traditional employees or suppliers
•  Crowdsourcing can apply to a wide range of activities
–  Microtasks: division of labor for tedious tasks (Wikipedia)
–  Macrotasks: finding a specific skill for a job (web design)
–  Crowdfunding: engagement of social networks to raise money
–  Crowdcontests: a broad-based competition to identify the best solution for a particular question

Page 8

Prize based contest approach for open innovation

•  Prize-based contest approach
–  Generalize any life sciences problem into generic computer-science terms
•  Remove bias
–  Diverse group of programmers tackle the problem
–  Award prizes/milestone payments for best solutions
•  Contests run on platforms including TopCoder.com and CrowdANALYTIX.com
•  Community of over 500,000 developers and data scientists

Page 9

Crowdsourced predictive algorithm development

Collect data
•  Identify data/questions
•  Obfuscate data to protect patient/data privacy

Predictive models
•  Crowdsource custom predictive models

Visualization / Integration
•  Crowdsource the visualization

Implementation
•  Server support
•  Update code

Page 10

Question Formulation/Data collection

•  Identify data/questions
–  Babbage Analytics and Innovation contracted to help formulate questions that would be suitable for a crowd-based contest
•  Goal:
–  Create a model that will predict whether a patient with lung disease will experience worsening of symptoms
•  Objectives:
–  Given baseline data and LFU outcomes, develop an algorithm to predict which patients are more likely to exhibit an exacerbation.
–  Uncover top variables which can help identify and monitor exacerbation of disease.

Collect data
•  Identify data/questions
•  Obfuscate data to protect patient/data privacy

Page 11

DataSets

•  Observational study of past and current smokers
–  Clinical data (over 400 data points) for >10,000 subjects
–  High resolution radiology data for >10,000 subjects
–  Genotype data for >10,000 subjects
–  Telephone follow-up (3-6 month intervals)

Collect data
•  Identify data/questions
•  Obfuscate data to protect patient/data privacy

Page 12

Data was obfuscated to protect patient privacy

•  Obfuscate data to protect patient/data privacy
–  Contest to have the experts develop a software solution to obfuscate data
•  MetaDataEngine
–  Python script developed through crowdsourcing contest
•  Patient IDs de-identified
•  All data labels stripped
•  All continuous and non-continuous data values normalized to values between 0.0 and 1.0
•  Columns/rows randomized and rearranged
–  $400 in prize money

Collect data
•  Identify data/questions
•  Obfuscate data to protect patient/data privacy
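The four obfuscation steps listed on this slide can be sketched in a few lines of Python. This is not the MetaDataEngine script itself, which the slides do not show. It is a minimal illustration that assumes the data arrives as a list of dictionaries. The function names, the salted-hash approach to de-identification, the ordinal coding of categorical values, and the generic `v0`, `v1`, ... column names are all assumptions made for the example.

```python
import hashlib
import random


def min_max_scale(values):
    """Scale a list of numbers linearly into [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def encode_column(values):
    """Keep numeric columns as floats; map categorical values to integer codes."""
    if all(isinstance(v, (int, float)) for v in values):
        return [float(v) for v in values]
    codes = {cat: i for i, cat in enumerate(sorted(set(map(str, values))))}
    return [float(codes[str(v)]) for v in values]


def obfuscate(rows, id_column, salt, seed=0):
    """Hypothetical obfuscation pass over a list of row dictionaries."""
    rng = random.Random(seed)
    columns = [c for c in rows[0] if c != id_column]

    # 1. De-identify patient IDs with a salted one-way hash (an assumption).
    ids = [
        hashlib.sha256((salt + str(r[id_column])).encode()).hexdigest()[:12]
        for r in rows
    ]

    # 2. Normalize every column, continuous or not, to values in [0.0, 1.0].
    scaled = {
        c: min_max_scale(encode_column([r[c] for r in rows])) for c in columns
    }

    # 3. Shuffle column order and strip labels (generic names v0, v1, ...).
    rng.shuffle(columns)
    out = [
        {"id": ids[i], **{f"v{j}": scaled[c][i] for j, c in enumerate(columns)}}
        for i in range(len(rows))
    ]

    # 4. Shuffle row order.
    rng.shuffle(out)
    return out


# Made-up example rows; the field names are illustrative only.
rows = [
    {"patient_id": "P001", "age": 61, "fev1": 2.1, "smoker": "current"},
    {"patient_id": "P002", "age": 70, "fev1": 1.4, "smoker": "former"},
    {"patient_id": "P003", "age": 55, "fev1": 3.0, "smoker": "current"},
]
for row in obfuscate(rows, id_column="patient_id", salt="demo-salt"):
    print(row)
```

The sketch does not handle missing values, and hashing plus rescaling should not be read as a complete privacy guarantee. It only mirrors the steps named on the slide.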

Page 13

Predictive Analytics contest

•  Contest run on CrowdANALYTIX to develop an algorithm to predict which patients are more likely to exhibit an exacerbation
–  412 people registered
–  101 people submitted solutions

Data Set: 400 Clinical Variables, 1000 Genetic Markers

Leader Board

Position  Solver               Score    Location
1         Guschin Alexander    0.8616   Israel
2         Andrey Shapulin      0.86052  Baghdad
3         Pietro Marini        0.86003  Amsterdam
4         Rohan Rao            0.85954  Kolkata
5         piotrek              0.85942  Amsterdam
6         Stanislav Semenov    0.85899  Baghdad
7         Roberto Abalde       0.85831  Buenos Aires
8         Sriram Sampathraman  0.85605  Kolkata
9         marios michailidis   0.85521
10        Marija Zoldin        0.85507  Amsterdam
11        Manuel Amunategui    0.85083  US
12        NimNid               0.85052  Kolkata
13        Giuseppe C.          0.85016  Amsterdam

Predictive models (Complete)
•  Crowdsource custom predictive models

Page 14

Crowd Sourced prediction contest

•  30 Day Contest
–  Obfuscated data
–  101 unique submitters
–  ~2870 submissions
–  $9000 in prize money
•  Winning solutions used a number of different approaches
–  Place 1. Random Forest Classifier
–  Place 2. XGBoost and Logistic Regression
–  Place 3. Ensemble of KNN, Logistic Regression and Random Forest
–  Place 4. Extremely Random Forest
–  Place 5. Extreme Gradient Boosting
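The third-place entry is described as an ensemble of KNN, logistic regression, and random forest, but the slides do not say how the three models were combined. One common option is to average the predicted probabilities from each model (often called soft voting). The sketch below illustrates that idea only. The probability lists are invented, and `average_ensemble` is a hypothetical helper, not code from the contest.

```python
def average_ensemble(model_probs, weights=None):
    """Combine per-model predicted probabilities by (weighted) averaging.

    model_probs[m][i] is model m's predicted probability of exacerbation
    for patient i. All models must score the same patients in the same order.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_patients = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_patients)
    ]


# Invented outputs of three already-trained models for four patients.
knn_probs = [0.20, 0.60, 0.90, 0.40]
logreg_probs = [0.10, 0.70, 0.80, 0.50]
forest_probs = [0.30, 0.50, 0.70, 0.30]

combined = average_ensemble([knn_probs, logreg_probs, forest_probs])
flagged = [p >= 0.5 for p in combined]  # 0.5 is an arbitrary cut-off
print(combined, flagged)
```

Passing `weights` lets a stronger model count for more. Stacking, where a further model is trained on the individual predictions, is another common way to combine them.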

Page 15

Exploration of solution landscape

•  Top 5 submissions had very similar performance
–  Over 2800 submissions
–  Sufficient test of the landscape

Predictive categories

Category                       MeanDecreaseGini
genScore                       18.9
Quality of life Questionnaire  7.1
Quality of life Questionnaire  6.1
Quality of life Questionnaire  5.7
Quality of life Questionnaire  5.1
Quality of life Questionnaire  5.1
Lung Function                  3.8
Lung Function                  3.2
Lung Function                  3.2
Lung Function                  3.1
Lung Function                  3.0
Lung Function                  2.9
Quality of life Questionnaire  2.9
Quality of life Questionnaire  2.8
Quality of life Questionnaire  2.6

21% increase in performance over current algorithm
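MeanDecreaseGini is a random-forest variable-importance measure: each time a tree splits on a variable, the drop in Gini impurity is credited to that variable, and the credits are aggregated across the forest. The sketch below computes that drop for a single split on made-up labels, to show what one such credit looks like. It is illustrative only. Implementations differ in how they weight and aggregate the decreases, so the numbers are not comparable to the table above.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - (p1 ** 2 + (1.0 - p1) ** 2)


def gini_decrease(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy node: 1 = exacerbation, 0 = no exacerbation (made-up labels).
parent = [1, 1, 1, 0, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 0]  # split on a hypothetical variable
decrease = gini_decrease(parent, left, right)
print(round(decrease, 5))
```

A variable that repeatedly produces large decreases across many trees ends up with a high importance score, as genScore does in the table.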

Page 16

What we have Learned

•  Sufficient test of the solution landscape
–  Which algorithms work and which do not
–  Key predictive variables
•  Significant effort spent on identification of key predictive variables
–  Most variables are relatively easy to measure
•  Identification of key next steps

Pfizer Proprietary Information│ 16

Page 17

Predictive Analytics contest

•  Results were used to refine data/question
•  Second contest run on CrowdANALYTIX to predict patients that will have increased disease severity
–  350 people registered
–  146 people submitted solutions
–  2527 different code versions

Data Set: 60 Clinical Variables, 3309 patients, 430 Exacerbators

Additional high level annotation provided

Predictive models (Complete)
•  Crowdsource custom predictive models
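The data set figures on this slide imply an imbalanced outcome: 430 exacerbators out of 3309 patients. The short calculation below makes that explicit. The helper name `class_balance` is made up for the example, and the slides do not state which evaluation metric the contest used.

```python
def class_balance(n_total, n_positive):
    """Return (prevalence, accuracy of always predicting the majority class)."""
    prevalence = n_positive / n_total
    return prevalence, max(prevalence, 1.0 - prevalence)


prevalence, majority_baseline = class_balance(n_total=3309, n_positive=430)
print(f"prevalence: {prevalence:.1%}, majority-class baseline: {majority_baseline:.1%}")
```

With roughly 13% positives, a model that never predicts an exacerbation would be right about 87% of the time under plain accuracy. This is why threshold-independent or class-balanced metrics are often preferred for data like this.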

Page 18

Crowd Sourced prediction contest

•  Winning solutions used a number of different approaches
–  Place 1. Extremely Random Forest Classifier
–  Place 2. Logistic Regression
–  Place 3. SVM with Regularization
–  Place 4. Logistic Regression with elastic net
–  Place 5. PCA
•  Unique creation and selection of variables
–  PCA, Random Forest, Linear combinations

[Bar chart: Accuracy (axis 0.6 to 0.76) for Previous Exacerbation, Random Forest, and Winning Solution; annotations: +2%, +14%]

Page 19

Integration and utilization

•  Need to develop front end visualization (app, dashboard, or web site)
–  Algorithm does not need to be available to the end user
–  Data is collected at screening visit, entered, and probability of outcome is reported back
•  Turned to the crowd to develop visualization dashboard

Visualization
•  Crowdsource the visualization
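The flow described on this slide (screening data goes in, a probability comes back, the algorithm stays hidden from the end user) can be sketched as a single scoring function behind the dashboard. The example below uses a logistic model purely for illustration. The field names, intercept, and weights are hypothetical and are not taken from the contest models.

```python
import math

# Hypothetical coefficients for illustration only; not from the source.
INTERCEPT = -2.0
WEIGHTS = {
    "prior_exacerbations": 0.8,
    "qol_score": 0.03,
    "fev1_pct_predicted": -0.02,
}


def exacerbation_probability(screening):
    """Return a probability in (0, 1) from a dict of screening-visit values."""
    missing = [name for name in WEIGHTS if name not in screening]
    if missing:
        raise ValueError(f"missing screening fields: {missing}")
    score = INTERCEPT + sum(w * screening[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-score))


# Made-up screening record entered through the front end.
patient = {"prior_exacerbations": 2, "qol_score": 50, "fev1_pct_predicted": 60}
probability = exacerbation_probability(patient)
print(f"{probability:.3f}")
```

Only `exacerbation_probability` needs to be exposed to the dashboard. The model behind it could be swapped for a random forest or any other winning approach without changing what the end user sees.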

Page 20

Clinical Prediction Dashboard

•  Trial Information Configuration:
–  List of Sites
–  Target Enrollments
–  Inclusion/Exclusion criteria
•  Enrollment Summary:
–  Summary of Estimated Target Number of Patients
–  Visualization of Enrollment numbers
–  Visualization of geographic enrollment numbers
•  Patient Prediction Scores
–  Summary of each patient’s predicted efficacy, safety and dropout scores.

Page 21

Patient Prediction dashboards

General Information

Sites

Inclusion/Exclusion criteria

Page 22

Integration with algorithm and ECR/Clinical trials

•  Visualization needs to be integrated with predictive algorithm
•  Data input directly from ECR (Pfizer Electronic Clinical Record)
•  Multiple views to support different
–  Clinical view (most simplistic) for direct use by clinicians
–  Research view
–  Administration view

Implementation
•  Server support
•  Update code

Page 23

Summary

•  Crowdsourcing can be an effective tool for predictive analytics
–  Leverages crowds of data scientists to identify and build the best-performing model
–  Access to domain knowledge experts
–  On-demand resources, particularly when consultants may not be appropriate
–  Winners usually outperform the state-of-the-art methods