vipr: an interactive tool for meaningful visualization of...

1
VIPR: An Interactive Tool for Meaningful Visualization of High-Dimensional Data Donghan Wang [email protected] Artur Dubrawski [email protected] Madalina Fiterau mfi[email protected] MOTIVATION ACCEPT OUTCOME INVESTIGATE FURTHER QUERY QUERY BORDER CONTROL STOCK MARKET VEHICLE CHECKS DIAGNOSTICS CELL ANALYSIS DRUG EVALUATION Automated Decision Support Systems Our method targets applicaQons where a human operator is involved in the decision. The process must be: Transparent Comprehensible Thus, the problem of finding subspaces where data is classified with high accuracy but which also give operators confidence in the predicQons. Informa<ve Projec<on Ensemble (IPE) methodology has proven effecQve in finding interpretable renderings of high- dimensional data that reveal hidden low- dimensional structures if they exist. User is in control of the choice: InvesQgate Further – expensive Accept Outcome - assume responsibility INFORMATIVE PROJECTION ENSEMBLES User-System Interac<on: 1. the user provides the system a query point; 2. system finds a query- specific projec<on 3. system displays result and illustra<on of how the label was obtained X g(X) c 1 (X) c 2 (X) c 3 (X) h 1 (c 1 (X)) h 2 2 (X)) h 3 3 (X)) π (X) CONTEXT CLASSIFIERS PROJECTIONS SELECTOR QUERY optimal nearly optimal Penalty – limits # of projections 1 2 3 4 5 6 7 Data Points Projections Loss Matrix (L) § B binary selection matrix § B ij is § 1, if projection j is to be used to solve point i and § 0, otherwise 1 0 1 2 3 4 5 6 7 Data Points Projections Selection Matrix (B) IPE learning = finding a set of few projecQons for which the loss is close to the opQmum. We limit the number of projecQons used. LEARNING THE ENSEMBLE c j THE VIPR INTERFACE EXAMPLES EXPERIMENTAL RESULTS Dataset k-NN Feature SelecQon IPE-Min IPE-HC SVM Feature SelecQon IPE-Linear IPE-HC + k-NN (Best) k-NN k-NN + SVM (Best) SVM SVM Alert BP 0.7216 76.1 88.6 82.3 75 73.96 77.08 76.04 0.226 0.1253 0.1479 0.1645 0.063 0.1294 0.1391 15.06 Alert RR 63.51 85.34 97.8 93.65 88.39 86.8 89.25 88.95 0.0397 0.0557 0.0117 0.0157 0.0408 0.0648 0.0442 0.0531 Alert SPO2 88.01 89.55 91.2 90 90.76 90.35 93.02 90.77 0.0104 0.0411 0.0164 0.0328 0.0362 0.0268 0.0255 0.0556 Chars74k 31.61 25.34 35.78 35.54 27.07 30.76 35.92 33.72 0.0245 0.0222 0.0305 0.0159 0.0145 0.0217 0.0321 0.0175 G50C 87.27 92 94.18 93.45 95.09 94.18 95.64 94.36 0.035 0.023 0.044 0.0466 0.0302 0.0384 0.0294 0.023 Leler 95.25 92.61 95.33 95.1 97.07 91.66 97.1 94.86 0.0018 0.0012 0.0017 0.0013 0.0017 0.002 7.00E-04 9.00E-04 MNIST 97.21 92.49 97.57 97.48 9.15 90.53 93.96 9.35 0.0037 0.001 4.71E-04 7.17E-04 0.0056 0.0062 0.01 0.0014 USPS 95.82 93.5 96.69 97.13 93.91 91.38 9.58 9.55 0.0036 0.0064 0.0052 0.0061 0.086 0.0964 0.057 0.62 Dataset # Features # Samples # Classes Alert BP 147 96 2 Alert RR 147 362 2 Alert SPO2 147 259 2 Chars74k 85 3410 62 G50C 50 550 2 Leler 16 16000 26 MNIST 784 60000 10 USPS 256 11000 10 Method Classifier SelecQon OpQmizaQon IPE-Min k-NN k-NN Min loss Two-stage IPE-H k-NN k-NN Hyper rectangle Greedy IPE-Linear SVM SVM MulQclass SVM Two-stage IPE-H SVM SVM Hyperrectan gle Greedy CLINICAL ALERT ADJUDICATION CEREBRAL PALSY PROGRESSION Heart Rate<40 or >140 Respiratory Rate<8 or >36 Systolic Blood Pressure<80 or >200 Diastolic Blood Pressure>110 SPO 2 <85% window preceding alert alert duration Alerts some are ar(facts, not true alerts Maximum Respiratory Rate Median Respiratory Rate Artifacts Real alerts New alert New alert can be confidently adjudicated with the informaQve projecQon. Alert events Events labeled by commilee Selected for expert review Review based on InformaQve ProjecQons Chart-based review AutomaQc AdjudicaQon AdjudicaQon Model CalibraQon PaQents are monitored via non- invasive vital sign monitors. Alerts issued when a VS exceeds predefined thresholds. Many alerts are arQfacts, due to threshold- based issuance. ArQfacts cause alarm faQgue. Machine Learning has proven useful in classifying clinical data. Training data requires laborious expert annotaQon. ObjecQve: Reduce expert annotaQon effort through semi-automaQc adjudicaQon of VS alerts as real or arQfacts, while maintaining high accuracy. CP PaQents Surgical IntervenQon No surgery Selected by doctors, not at random GDI + No guarantee of success Gait DeviaQon Index GDI - GDI + GDI - Most have physical therapy PaQent does not always worsen Subject Information Predict GDI (Gait Deviation Index) following surgery High GDI Low GDI Stronger patients improve High initial GDI can’t be improved Preoperative GDI Strength score We aim to predict the improvement in Gait DeviaQon Index (ΔGDI) following Surgery, for the treatment group AlternaQve treatment for the control group (for instance, physical therapy) Features used in predicQon: age, BMI, joint angles, motor control, strength, walking efficiency (oxygen cost), iniQal GDI We show a subset of preliminary results for Propensity Score group 3 (PS3) The severity of the disease, in terms of Gait Devia<on Index, is the strongest predictor of outcome 0 1 2 3 0 2 4 6 8 0 5 10 15 20 25 30 35 O2cost pctSMC O2cost pre age 20 40 60 80 100 50 0 50 50 60 70 80 90 100 KneeFlex maxFlexion AnkleDorsi IC GDI pre 0 50 100 20 0 20 40 0 0.2 0.4 0.6 0.8 Anteversion AnkleDors Kn90 AggregateSpasScore pre 30.1 25 19.8 14.7 9.57 4.43 0.704 5.84 11 16.1 21.2 Predict GDI for the controls Visual toolkit for InformaQve ProjecQon Recovery (VIPR) 1. Analysis tool for the following tasks: regression, classificaQon, clustering 2. User-specified parameters Features to be used Number of submodels Dimensionality of subspaces Hypothesis class Hold-out set evaluaQon 3. ManipulaQons of trained models Add/remove features/samples Compare models Observe predicQon on test samples Provide feedback on labels

Upload: vandat

Post on 13-Jun-2019

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: VIPR: An Interactive Tool for Meaningful Visualization of ...mfiterau/papers/2016/IJCAI_2016_VIPR_poster.pdf · VIPR: An Interactive Tool for Meaningful Visualization of High-Dimensional

VIPR:AnInteractiveToolforMeaningfulVisualizationofHigh-DimensionalData

[email protected]@cs.cmu.edu

[email protected]

MOTIVATION

ACCEPTOUTCOME

INVESTIGATEFURTHER

QUERY QUERY

BORDERCONTROL STOCKMARKET VEHICLECHECKS

DIAGNOSTICS CELLANALYSIS DRUGEVALUATION

AutomatedDecisionSupportSystemsOurmethod targets applicaQonswhere ahuman operator is involved in thedecision.Theprocessmustbe:•  Transparent•  ComprehensibleThus, the problem of finding subspaceswheredataisclassifiedwithhighaccuracybutwhich also give operators confidenceinthepredicQons.Informa<veProjec<onEnsemble(IPE)methodologyhasproveneffecQveinfindinginterpretablerenderingsofhigh-dimensionaldatathatrevealhiddenlow-dimensionalstructuresiftheyexist.

Userisincontrolofthechoice:•  InvesQgateFurther–expensive•  AcceptOutcome-assumeresponsibility

INFORMATIVEPROJECTIONENSEMBLESUser-SystemInterac<on:1.  the user provides the

systemaquerypoint;

2.  system finds a query-specificprojec<on

3.  systemdisplaysresultandillustra<on of how thelabelwasobtained

X g(X)

c1(X)

c2(X)

c3(X)

h1(c1(X))

h2(π2(X))

h3(π3(X))

π(X)

CONTEXTCLASSIFIERSPROJECTIONSSELECTORQUERY

optimalnearlyoptimal

Penalty–limits#ofprojections

1234567

DataPoints

ProjectionsLossMatrix(L)

§  Bbinaryselectionmatrix§  Bijis

§  1,ifprojectionjistobeusedtosolvepointiand

§  0,otherwise

10

1234567

DataPoints

ProjectionsSelectionMatrix(B)

IPElearning=findingasetoffewprojecQonsforwhichthelossisclosetotheopQmum.WelimitthenumberofprojecQonsused.

LEARNINGTHEENSEMBLE

cj

THEVIPRINTERFACE EXAMPLES

EXPERIMENTALRESULTS

Dataset k-NNFeatureSelecQon IPE-Min IPE-HC SVM

FeatureSelecQon IPE-Linear IPE-HC

+k-NN(Best) k-NN k-NN

+SVM(Best) SVM SVM

AlertBP 0.7216 76.1 88.6 82.3 75 73.96 77.08 76.04 0.226 0.1253 0.1479 0.1645 0.063 0.1294 0.1391 15.06AlertRR 63.51 85.34 97.8 93.65 88.39 86.8 89.25 88.95 0.0397 0.0557 0.0117 0.0157 0.0408 0.0648 0.0442 0.0531AlertSPO2 88.01 89.55 91.2 90 90.76 90.35 93.02 90.77 0.0104 0.0411 0.0164 0.0328 0.0362 0.0268 0.0255 0.0556Chars74k 31.61 25.34 35.78 35.54 27.07 30.76 35.92 33.72 0.0245 0.0222 0.0305 0.0159 0.0145 0.0217 0.0321 0.0175G50C 87.27 92 94.18 93.45 95.09 94.18 95.64 94.36 0.035 0.023 0.044 0.0466 0.0302 0.0384 0.0294 0.023Leler 95.25 92.61 95.33 95.1 97.07 91.66 97.1 94.86 0.0018 0.0012 0.0017 0.0013 0.0017 0.002 7.00E-04 9.00E-04MNIST 97.21 92.49 97.57 97.48 9.15 90.53 93.96 9.35 0.0037 0.001 4.71E-04 7.17E-04 0.0056 0.0062 0.01 0.0014USPS 95.82 93.5 96.69 97.13 93.91 91.38 9.58 9.55 0.0036 0.0064 0.0052 0.0061 0.086 0.0964 0.057 0.62

Dataset #Features #Samples #ClassesAlertBP 147 96 2AlertRR 147 362 2AlertSPO2 147 259 2Chars74k 85 3410 62G50C 50 550 2Leler 16 16000 26MNIST 784 60000 10USPS 256 11000 10

Method Classifier SelecQon OpQmizaQonIPE-Mink-NN k-NN Minloss Two-stageIPE-Hk-NN k-NN

Hyperrectangle Greedy

IPE-LinearSVM SVM

MulQclassSVM Two-stage

IPE-HSVM SVMHyperrectangle Greedy

CLINICALALERTADJUDICATION

CEREBRALPALSYPROGRESSION

HeartRate<40or>140RespiratoryRate<8or>36

SystolicBloodPressure<80or>200DiastolicBloodPressure>110

SPO2<85%

window preceding alert alert duration

Alertssomearear(facts,nottruealerts

MaximumRespiratoryRate

Med

ianRe

spira

toryRate

ArtifactsRealalertsNewalert

NewalertcanbeconfidentlyadjudicatedwiththeinformaQveprojecQon.

Alertevents

Eventslabeledbycommilee

Selectedforexpertreview

ReviewbasedonInformaQveProjecQons

Chart-basedreview

AutomaQcAdjudicaQon

AdjudicaQonModel

CalibraQon

PaQentsaremonitoredvianon-invasivevitalsignmonitors.AlertsissuedwhenaVSexceedspredefinedthresholds.ManyalertsarearQfacts,duetothreshold-basedissuance.ArQfactscausealarmfaQgue.MachineLearninghasprovenusefulinclassifyingclinicaldata.TrainingdatarequireslaboriousexpertannotaQon.

ObjecQve:ReduceexpertannotaQoneffortthroughsemi-automaQcadjudicaQonofVSalertsasrealorarQfacts,whilemaintaininghighaccuracy.

CPPaQents

SurgicalIntervenQon

Nosurgery

Selectedbydoctors,notatrandom GDI+

Noguaranteeofsuccess

GaitDeviaQonIndex

GDI-

GDI+

GDI-

Mosthavephysicaltherapy

PaQentdoesnotalwaysworsen

Subject Information

Predict 𝞓GDI (Gait Deviation Index) following surgery High 𝞓GDI

Low 𝞓GDI

Stronger patients improve

High initial GDI can’t be improved

Preoperative GDI

Str

engt

h sc

ore

WeaimtopredicttheimprovementinGaitDeviaQonIndex(ΔGDI)following•  Surgery,forthetreatmentgroup•  AlternaQvetreatmentforthecontrol

group(forinstance,physicaltherapy)FeaturesusedinpredicQon:age,BMI,jointangles,motorcontrol,strength,walkingefficiency(oxygencost),iniQalGDIWeshowasubsetofpreliminaryresultsforPropensityScoregroup3(PS3)

Theseverityofthedisease,intermsofGaitDevia<onIndex,isthestrongestpredictorofoutcome

01

23 0 2 4 6 8

0

5

10

15

20

25

30

35

O2cost pctSMCO2cost pre

age

20 40 60 80 100−50

0

5050

60

70

80

90

100

KneeFlex maxFlexion

AnkleDorsi IC

GD

I pre

0

50

100

−20

0

20

400

0.2

0.4

0.6

0.8

AnteversionAnkleDors Kn90

Agg

rega

teSp

asSc

ore

pre

−30.1

−25

−19.8

−14.7

−9.57

−4.43

0.704

5.84

11

16.1

21.2

Predict 𝞓GDI for the controls

VisualtoolkitforInformaQveProjecQonRecovery(VIPR)1.Analysistoolforthefollowingtasks:regression,classificaQon,clustering2.User-specifiedparameters•  Featurestobeused•  Numberofsubmodels•  Dimensionalityofsubspaces•  Hypothesisclass•  Hold-outsetevaluaQon3.ManipulaQonsoftrainedmodels•  Add/removefeatures/samples•  Comparemodels•  ObservepredicQonontestsamples•  Providefeedbackonlabels