what makes healthcare data science so hard & interesting - data science pop-up seattle
TRANSCRIPT
#datapopupseattle
What Makes Healthcare Data Science so Hard & Interesting
David Talby, SVP Engineering, Atigeo
@davidtalby
UNSTRUCTURED Data Science POP-UP in Seattle
www.dominodatalab.com
Produced by Domino Data Lab
Domino’s enterprise data science platform is used by leading analytical organizations to increase productivity, enable collaboration, and publish
models into production faster.
David Talby, SVP Engineering, Atigeo
@davidtalby
WHAT MAKES HEALTHCARE DATA SCIENCE SO HARD & INTERESTING
“as many as 40,500 patients die each year in an ICU in the U.S. due to misdiagnosis” Winters et al., 2012, Johns Hopkins
“Combining estimates from 3 studies yielded a rate of outpatient diagnostic errors of 5.08%, or 12 million US adults every year.” Singh et al., 2014, VA Medical Center
Root cause: being human
• Premature closure
• Fatigue
• Overconfidence
• Team dynamics
• Snap judgment
• Prejudice
THE PROMISE
• Real time monitoring
• Always up to date with science
• Large sample size
• Works 24x7
• No trip, no waiting
• Cheaper
• More accurate
• More objective
HUMAN NUANCES
Can you distinguish between real smiles of happiness and fake smiles trying to mask frustration?
“The algorithm was able to identify the fake smiles 92% of the time. Humans, on the other hand, performed no better than chance.” MIT, 2012
MENTAL HEALTH
“Algorithms correctly predicted which at-risk youth would go on to develop psychosis over a 2.5-year period with 100% accuracy.” Bedi et al., Nature Schizophrenia, 2015
SAMPLE HYBRID ANALYTICS PIPELINE
Inputs:
• Free-text clinical notes
• Relationships & ontologies
• Sensors & wearables
• Imagery, drugs, labs, …
Features:
• NLP Features
• Graph Features
• Time Series Features
Train & test Classifiers
Train & test ensembles
Direct & ambient Feedback
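The pipeline above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the pipeline from the talk: the record fields, the two-word vocabulary, the toy patients, and the nearest-centroid classifier are all hypothetical, and the feedback stage is left out. It shows one feature extractor per input type, one classifier per feature group, and a majority-vote ensemble on top.

```python
from collections import Counter
from statistics import mean

VOCAB = ["pain", "fever"]  # hypothetical vocabulary, for illustration only

def nlp_features(record):
    """Bag-of-words counts from the free-text note."""
    tokens = record["note"].lower().split()
    return [tokens.count(term) for term in VOCAB]

def time_series_features(record):
    """Mean level and mean step-to-step change of a sensor trace."""
    readings = record["heart_rate"]
    deltas = [b - a for a, b in zip(readings, readings[1:])]
    return [mean(readings), mean(deltas) if deltas else 0.0]

def graph_features(record):
    """Degree of the patient node in a relationship graph."""
    return [sum(1 for a, b in record["edges"] if record["id"] in (a, b))]

EXTRACTORS = {
    "nlp": nlp_features,
    "time_series": time_series_features,
    "graph": graph_features,
}

class NearestCentroid:
    """Tiny stand-in classifier: predict the label of the closest class mean."""
    def fit(self, rows, labels):
        self.centroids = {}
        for label in set(labels):
            members = [r for r, y in zip(rows, labels) if y == label]
            self.centroids[label] = [mean(col) for col in zip(*members)]
        return self

    def predict(self, row):
        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[label]))
        return min(sorted(self.centroids), key=squared_distance)

def train_ensemble(records, labels):
    """Train one classifier per feature group."""
    return {name: NearestCentroid().fit([fx(r) for r in records], labels)
            for name, fx in EXTRACTORS.items()}

def predict_ensemble(models, record):
    """Majority vote across the per-feature-group classifiers."""
    votes = [models[name].predict(fx(record)) for name, fx in EXTRACTORS.items()]
    return Counter(votes).most_common(1)[0][0]

# Invented toy data: label 1 = high risk, 0 = low risk.
TRAIN = [
    {"id": "p1", "note": "chest pain and fever", "heart_rate": [88, 95, 102],
     "edges": [("p1", "cardiology"), ("p1", "icu")]},
    {"id": "p2", "note": "severe pain fever chills", "heart_rate": [90, 98, 110],
     "edges": [("p2", "icu"), ("p2", "cardiology")]},
    {"id": "p3", "note": "routine checkup no complaints", "heart_rate": [70, 71, 70],
     "edges": [("p3", "primary_care")]},
    {"id": "p4", "note": "follow up visit", "heart_rate": [65, 66, 66],
     "edges": [("p4", "primary_care")]},
]
LABELS = [1, 1, 0, 0]

models = train_ensemble(TRAIN, LABELS)
new_patient = {"id": "p5", "note": "fever and pain overnight",
               "heart_rate": [92, 99, 104],
               "edges": [("p5", "icu"), ("p5", "cardiology")]}
print(predict_ensemble(models, new_patient))
```

Keeping one classifier per feature group means each input type can be retrained or replaced on its own schedule, and the ensemble only needs their votes.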
THE OPEN PROBLEM: EXPLAINABILITY
@DavidJBianco, http://www2.mlsecproject.org/blog/on-explainability-in-machine-learning
THE MOMENT YOU PUT A MODEL IN PRODUCTION, IT STARTS DEGRADING
Models placed along an axis from Never Changing to Always Changing:
• Online Social Networking Models/Rules
• Banking & eCommerce fraud
• Cyber Security
• Automated trading
• Real-time ad bidding
• Natural Language, Social Behavior Models
• Political & Economic Models
• Physical models: Face recognition, Voice recognition, Climate models
• Google/Amazon Search models
[Gunjan Gupta, Atigeo, 2014]
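One way to notice the degradation described above is to track accuracy over a sliding window of recent predictions once ground truth arrives. The sketch below is a minimal illustration, not a method from the talk; the class name, window size, baseline, and tolerance are assumptions chosen for the example.

```python
from collections import deque

class RollingAccuracyMonitor:
    """Flags a model whose recent accuracy falls below its baseline."""

    def __init__(self, window, baseline, tolerance):
        self.outcomes = deque(maxlen=window)  # keeps only the newest `window` results
        self.baseline = baseline              # accuracy measured at deployment time
        self.tolerance = tolerance            # allowed drop before raising a flag

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        # Only judge once the window is full, to avoid noisy early alarms.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.baseline - self.tolerance

monitor = RollingAccuracyMonitor(window=5, baseline=0.9, tolerance=0.1)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (0, 0), (1, 1)]:
    monitor.record(predicted, actual)
healthy = monitor.degraded()

# The world changes: the next three predictions are wrong.
for predicted, actual in [(1, 0), (0, 1), (1, 0)]:
    monitor.record(predicted, actual)
drifted = monitor.degraded()
print(healthy, drifted, monitor.accuracy())
```

A flag from a monitor like this is a trigger for retraining or review, not a diagnosis of why the model drifted.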
SO PUT THE RIGHT MACHINERY IN PLACE
Techniques placed along an axis from 100% Offline to 100% Online:
• Automated ensemble, boosting & feature selection techniques
• Automated ‘challenger’ online evaluation & deployment
• Real-time online learning via passive feedback
• Hand-crafted machine learned models
• Active learning via active feedback
• Traditional Scientific Method: Test a Hypothesis
• Hand-Crafted Rules
• Daily/weekly batch retraining
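The automated ‘challenger’ evaluation named above can be sketched as follows. This is a hypothetical illustration, not Atigeo's implementation: the class, the promotion rule (minimum sample count plus an accuracy margin), and the heart-rate thresholds are all assumptions. The champion serves every prediction while the challenger is scored in shadow on the same inputs.

```python
class ChampionChallenger:
    """Serve the champion, score the challenger in shadow, promote on evidence."""

    def __init__(self, champion, challenger, min_samples, margin):
        self.champion = champion
        self.challenger = challenger
        self.min_samples = min_samples  # feedback needed before any decision
        self.margin = margin            # required accuracy lead for promotion
        self._reset()

    def _reset(self):
        self.seen = 0
        self.hits = {"champion": 0, "challenger": 0}

    def observe(self, x, actual):
        """Score both models on one labeled example; return what was served."""
        served = self.champion(x)
        shadow = self.challenger(x)
        self.seen += 1
        self.hits["champion"] += int(served == actual)
        self.hits["challenger"] += int(shadow == actual)
        return served

    def maybe_promote(self):
        """Swap the models if the challenger leads by more than the margin."""
        if self.seen < self.min_samples:
            return False
        lead = (self.hits["challenger"] - self.hits["champion"]) / self.seen
        if lead <= self.margin:
            return False
        self.champion, self.challenger = self.challenger, self.champion
        self._reset()
        return True

# Hypothetical alert rules on a heart-rate reading.
def old_rule(heart_rate):
    return heart_rate >= 100

def new_rule(heart_rate):
    return heart_rate >= 90

duel = ChampionChallenger(old_rule, new_rule, min_samples=8, margin=0.1)
# Invented feedback stream: (reading, whether an alert was truly warranted).
stream = [(85, False), (92, True), (95, True), (101, True),
          (88, False), (97, True), (110, True), (93, True)]
for reading, actual in stream:
    duel.observe(reading, actual)
promoted = duel.maybe_promote()
print(promoted, duel.champion(95))
```

Resetting the counters after a promotion makes the former champion the new challenger, so the comparison keeps running as conditions change.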
STATE OF THE PRACTICE IN HEALTHCARE
[Figure: the same offline-to-online spectrum as the previous slide]
THE OPEN PROBLEM: MODEL EVALUATION
Evaluate models that are:
• Personalized
• Localized
• Evolving over time
• Acceptable to regulators
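A single overall accuracy number hides exactly the variation listed above. As a small, hedged illustration (the field names, sites, and months are invented, and this does not address the regulatory question), the same predictions can be scored per site and per time period:

```python
from collections import defaultdict

def accuracy_by_segment(rows, key):
    """Accuracy of predictions grouped by the value of `key` (e.g. site or month)."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [correct, seen]
    for row in rows:
        bucket = totals[row[key]]
        bucket[0] += int(row["predicted"] == row["actual"])
        bucket[1] += 1
    return {segment: correct / seen for segment, (correct, seen) in totals.items()}

# Invented evaluation log, for illustration only.
rows = [
    {"site": "A", "month": "2015-01", "predicted": 1, "actual": 1},
    {"site": "A", "month": "2015-02", "predicted": 0, "actual": 0},
    {"site": "B", "month": "2015-01", "predicted": 1, "actual": 0},
    {"site": "B", "month": "2015-02", "predicted": 1, "actual": 1},
]

by_site = accuracy_by_segment(rows, "site")
by_month = accuracy_by_segment(rows, "month")
print(by_site, by_month)
```

Here the overall accuracy is 0.75, but site B sits at 0.5 and the first month at 0.5, which is the kind of gap a localized or evolving model has to be evaluated against.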
© 2015 Atigeo, Corporation. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.