evaluation of interactive machine learning systems › ial2019 › pdf ›...

85
Evaluation of Interactive Machine Learning Systems Nadia Boukhelifa IAL, September 2019

Upload: others

Post on 28-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

EvaluationofInteractiveMachineLearningSystems

NadiaBoukhelifaIAL,September2019

Page 2: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

EvaluationofInteractiveMachineLearningSystems

NadiaBoukhelifaIAL,September2019

visual

(IVMLs)

Page 3: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

WhoamI?

3

NadiaBOUKHELIFA

Ph.D.2007—UniversityofKent,UKComputerScience,InformationVisualization

Post-doc—INRIA,TélécomParisTech,FranceVisualization,Human-ComputerInteraction

Researcher,Tenured,2016—INRA,FranceMulti-dimensionalDataVisualization,InteractiveModelling

Page 4: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

INRA—NationalInstituteofAgriculturalResearch

4

NadiaBOUKHELIFA

Page 5: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

INRA—NationalInstituteofAgriculturalResearch

5

NadiaBOUKHELIFA

http://www.cfsg.fr/site-de-grignon

Page 6: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�6

MALICESTeam:ModellingandKnowledgeIntegration

Expertiseformalisation

Uncertainty

Deterministicandstochasticmodelling

DecisionmakingOptimisation,visualisation

Page 7: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�7

Expertiseformalisation

Uncertainty

Deterministicandstochasticmodelling

DecisionmakingOptimisation,visualisation

NotexpertinAI

MALICESTeam:ModellingandKnowledgeIntegration

Page 8: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�8

Context:complexsystems,complexdatasets,complexmodels…

Azodyn-blé (Jeuffroy, Recous, Girard, Barré, Champeil, Meynard) modèle complet

Caractéristiques du sol (Clay%, CaCO3%, N%)

Données climatiques (température, pluie, Irr)

Nature du précédent

Amendements organiques (date, dose, forme)

Minéralisation nette de l�humus, des résidus et des amendements organiques (Mh, Mr, Ma)

Engrais minéral ou organique (date et dose)

Nmin dans sol

Vitesse de croissance la semaine avant apport

Indice de nutrition azotée (NNI)

Quantité de N aérien réel

Courbe de dilution maxi, Vmax

Climat (rayonnement, température)

Quantité maxi de N accumulé dans la culture

Biomasse aérienne

Rayonnement intercepté

Indice foliaire

RUE

NG/m²réel Quantité max de N dans les grains

Max Biomass in grains

Biomasse des grains = rendement

Quantité d�N dans les grains

Teneur en protéines

Acides aminés accumulés

N remobilisé

N dans organes végétatifs

Nmin dans sol à récolte climat (température)

CAU de l�engrais

NE/m²

climat avant floraison (température, rayonnement, déficit hydrique) NG/m² max

ENTREES

SORTIES

Jeuffroy, M-H., and Sylvie Recous. "Azodyn: a simple model simulating the date of nitrogen deficiency for decision support in wheat fertilization." European journal of Agronomy 10, no. 2 (1999): 129-144.

Page 9: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�9

Context:InteractiveModelExploration

Page 10: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Whyhumanintheloopinmodelling?

• tointegratevaluableexpertsknowledgethatmaybehardtoencodedirectlyintomathematicalorcomputationalmodels.

• tohelpresolveexistinguncertaintiesasaresultof,forexample,biasanderrorthatmayarisefromautomaticmachinelearning.

• tobuildtrustbymakinghumansinvolvedinthemodellingorlearningprocesses.

• tocaterforindividualhumandifferencesandsubjectiveassessmentssuchasinartandcreativeapplications.

�10

Page 11: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

VisualisationOptimisationModel Simulations

Experimental System

Application!Multiple Computational Stages!

Multiple Actors!

Feedback

Interaction!Results !

11

Boukhelifa,Nadia,etal."AnExploratoryStudyonVisualExplorationofModelSimulationsbyMultipleTypesofExperts."Proceedingsofthe2019CHIConferenceonHumanFactorsinComputingSystems.ACM,2019.

OurapproachtoInteractiveModelExploration

Page 12: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

VisualisationOptimisationModel Simulations

Experimental System

Application!Multiple Computational Stages!

Multiple Actors!

Feedback

Interaction!Results !

12 Boukhelifa et al., CHI2019

Collaborativesetup

OurapproachtoInteractiveModelExploration

Page 13: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

VisualisationOptimisationModel Simulations

Experimental System

Application!Multiple Computational Stages!

Multiple Actors!

Feedback

Interaction!Results !

13

• Modelswrittenbythirdparties• Expertsspecialistsinpartsofthemodelledprocess

=>Co-locatedmultipleexpertise:domain,model,optimisation,visualization.

OurapproachtoInteractiveModelExploration

Collaborativesetup

Boukhelifa et al., CHI2019

Page 14: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

VisualisationOptimisationModel Simulations

Experimental System

Application!Multiple Computational Stages!

Multiple Actors!

Feedback

Interaction!Results !

Data & Models 14

OurapproachtoInteractiveModelExploration

Boukhelifa et al., CHI2019

Page 15: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

VisualisationOptimisationModel Simulations

Experimental System

Application!Multiple Computational Stages!

Multiple Actors!

Feedback

Interaction!Results !

Data & Models 15

Trade-offs

OurapproachtoInteractiveModelExploration

Boukhelifa et al., CHI2019

Page 16: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

VisualisationOptimisationModel Simulations

Experimental System

Application!Multiple Computational Stages!

Multiple Actors!

Feedback

Interaction!Results !

Data & Models Pareto Fronts 16

setofnon-dominatedcompromisepoints,wherenoobjectivecanbeimprovedwithoutsacrificingatleastoneotherobjective.

QU

AN

TITY

OF

ITEM

2

QUANTITY OF ITEM 1

OurapproachtoInteractiveModelExploration

Boukhelifa et al., CHI2019

Page 17: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

VisualisationOptimisationModel Simulations

Experimental System

Application!Multiple Computational Stages!

Multiple Actors!

Feedback

Interaction!Results !

Visualization SystemData & Models Pareto Fronts 17

QU

AN

TITY

OF

ITEM

2

QUANTITY OF ITEM 1

OurapproachtoInteractiveModelExploration

Page 18: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�18

http://www.seguetech.com/

a way to generate pretty images from data

Definition#1:Visualization

Page 19: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�19

Definition#1:Visualization

Thepurposeofvisualizationisinsight,notpictures!

http://www.seguetech.com/

Page 20: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�20

Definition#1:Visualization

Informa<onVisualisa<on

“Theuseofcomputer-supported,interactive,visualrepresentationsofabstractdatatoamplifycognition.”[Cardetal.,1999]

Thepurposeofvisualizationisinsight,notpictures!

http://www.seguetech.com/

Card,StuartK.,JockD.Mackinlay,andBenShneiderman."Usingvisiontothink."Readingsininformationvisualization.MorganKaufmannPublishersInc.,1999.Card,S.

Page 21: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

WhyvisualiseyourdataRawDatafromAnscombe’sQuartet

�21

FrankAnscombe

Page 22: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

StatisticalanalysisRawDatafromAnscombe’sQuartet

�22

StatisticalProperties

Meanofx 9.0

Varianceofx 11.0

Meanofy 7.5

Varianceofy 4.12

Correlationbetweenxandy 0.816

Linearregressionline y=3+0.5x

Page 23: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

VisualrepresentationofthedataRawDatafromAnscombe’sQuartet

�23

VisualProperties

Samestats,differentgraphs!

Page 24: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Nevertrustsummarystatisticsalone;visualiseyourdata!

�24

DatasaurusDozenDataset,AutodeskResearchhttps://www.autodeskresearch.com/publications/samestat

Visualrepresentationofthedata

Page 25: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�25

Communicate visually

Explore interactively

Evaluate

InformationVisualization

Page 26: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�26

Definition#2:Human-ComputerInteraction

“Human-computer interaction is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them.”

Hewett,T.etal.ACMCurriculaforHuman-ComputerInteraction.1992;

Page 27: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

“algorithms that can interact with both computational agents and human agents (in active learning: oracles) and can optimize their learning behavior through these interactions.”

�27

Definition#3:iML

“Users Are People, Not Oracles”

Takingintoaccounthumanfactors

Amershietal.,PowertothePeople:TheRoleofHumansinInteractiveMachineLearning,2014

AndreasHolzinger.Interactivemachinelearning(iml).InformatikSpektrum,39(1),2016.DanielKottke,InteractiveAdaptiveLearning,IntelligentEmbeddedSystems,UniversityofKassel,2018

Page 28: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�28

Definition#4:IVML

“In interactive visual machine learning (IVML), a human operator and a machine collaborate to achieve a task mediated by an interactive visual interface.”

Ourdefinition:

N.Boukhelifa,A.Bezerianos,andE.Lutton.EvaluationofInteractiveMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.

Page 29: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�29

EvaluationofInteractiveSystems

✔Userperformance

✔Userexperience

✔Algorithmicperformance

Weknowhowtoassess:

Page 30: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�30

EvaluationofIVMLSystems

https://unsplash.com/photos/VBe9zj-JHBs

Page 31: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�31

OurexperiencewithEVE-EvolutionaryVisualExploration

N.Boukhelifa,A.Bezerianos,I-CTrelea,N.MPerrot,andELutton.AnExploratoryStudyonVisualExplorationofModelSimulationsbyMultipleTypesofExperts."ACMCHIConferenceonHumanFactorsinComputingSystems,2019

N.Boukhelifa,A.Bezerianos,andE.Lutton.EvaluationofInteractiveMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.

N.Boukhelifa,A.Bezerianos,W.CancinoandE.Lutton.EvolutionaryVisualExploration:EvaluationofanIECFrameworkforGuidedVisualSearch.EvolutionaryComputationJournal,MITpress,2017.

N.Boukhelifa,A.BezerianosandE.Lutton.AMixedApproachfortheEvaluationofaGuidedExploratoryVisualization System.EuroVisWorkshoponReproducibility,Verification,andValidationinVisualization(EuroRV3)2015.

W.Cancino,N.Boukhelifa,A.BezerianosandE.Lutton.EvolutionaryVisualExploration:ExperimentalAnalysisofAlgorithmBehaviour.GECCOworkshoponGeneticandEvolutionaryComputation(VizGEC2013).

N.Boukhelifa,W.Cancino,A.BezerianosandE.Lutton.EvolutionaryVisualExploration:EvaluationWithExpertUsers.ComputerGraphicsForum(EuroVis2013),EurographicsAssociation,2013,32(3).

EvaluationofIVMLSystems

Page 32: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

N.Boukhelifa,A.Bezerianos,I-CTrelea,N.MPerrot,andELutton.AnExploratoryStudyonVisualExplorationofModelSimulationsbyMultipleTypesofExperts."ACMCHIConferenceonHumanFactorsinComputingSystems,2019

N.Boukhelifa,A.Bezerianos,andE.Lutton.EvaluationofInteractiveMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.

N.Boukhelifa,A.Bezerianos,W.CancinoandE.Lutton.EvolutionaryVisualExploration:EvaluationofanIECFrameworkforGuidedVisualSearch.EvolutionaryComputationJournal,MITpress,2017.

N.Boukhelifa,A.BezerianosandE.Lutton.AMixedApproachfortheEvaluationofaGuidedExploratoryVisualization System.EuroVisWorkshoponReproducibility,Verification,andValidationinVisualization(EuroRV3)2015.

W.Cancino,N.Boukhelifa,A.BezerianosandE.Lutton.EvolutionaryVisualExploration:ExperimentalAnalysisofAlgorithmBehaviour.GECCOworkshoponGeneticandEvolutionaryComputation(VizGEC2013).

�32

EvaluationofIVMLSystems

w. Cancino A. Bezerinaos E. Lutton

Our experience with EVE - Evolutionary Visual Exploration

Page 33: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�33

EvolutionaryVisualExploration

Needtoexplorecombineddimensionshttps://unsplash.com/photos/YTbFHT9_IhY

Page 34: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�34

Howtoexploreahugesearchspaceofpossiblecombined

dimensions?

Page 35: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�35

Howtoexploreahugesearchspaceofpossiblecombined

dimensions?

n-DdatasetInteresting2D projectionsEV

E IEAs

EvolutionaryVisualExploration

Page 36: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�36

WhyIEAs?

✔cancombineobjective&subjectivemeasures

✔supportexploration&exploitation

✔adapttouserchangeofinterest

Suitableforexploratoryvisualization:

Page 37: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

EVEPrototype

�37

IEA

Bou

khel

ifa e

t al.,

Eur

oVis

201

3

Page 38: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�38

Page 39: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

ArtificialEvolution

NaturalEvolution EvolutionaryAlgorithms(EAs)

�39

Wikipedia

Page 40: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

EVE:CreatingNewProjections

�40

Biscuit data (6D)

Top_heat, Bottom_heat

+,-,*,/, (.)(.), exp and log

offspring1: 0.2 * Top_heat + 0.7 * Bottom_heat

offspring2: Exp(Top_heat)

offspring3: 0.99 - (0.69 * log(Top_heat)) …

Favor linear patterns

Choose

offspringX

InteractiveEvolution(IEAs)

Page 41: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

EvaluationofProjections

�41

User assessment ComplexitySurrogate function

Scagnostics, Wilkinson and Wills, 2008

learned

interactivecomputed

Page 42: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

TheSearchSpace

TheSetofalldimensionsencodedastrees(GPframework,Koza92)

�42

initialdimensions,operators,constants

Page 43: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

SubspaceExploration

�43

Page 44: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

SubspaceExploration

�44

Page 45: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

KeyfeaturesofEVE

�45

Intuitive

Adaptable

Interactive

Flexible

Isit?

Page 46: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

EvaluatingEVE

46

Page 47: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Amixedapproach

�47

Algorithm centred evaluation

User centred Evaluation

Page 48: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Isthehuman-machinecooperationproducingtherightresults?

�48

Whenwehaveground-truth

Page 49: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Aim

• istheexplorationactuallysteeredtowardaninterestingareaofthesearchspace?

• aretheproposedsolutionsvaried?

Quantitativestudymethodology

• syntheticdataset

• pre-specifiedtask

�49

Evaluatinghuman-AIcollaboration

Page 50: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

• 12participants• mean28.5years• noexperiencerequired• Syntheticdataset5D• 20minutes

Participants

�50

Page 51: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Gameseparatetwopointclusters

Task

�51

Page 52: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

IEAisabletofollowtheorderofuserranking

Convergenceanalysis

�52

Page 53: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

failure

failuresuccess

success

Interfaceevaluation-usestrategiesVisualpatternselectionstrategies

Page 54: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

failure

failuresuccess

success

Interfaceevaluation-usestrategiesVisualpatternselectionstrategies

Page 55: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Keyfindings

Userstakedifferentsearchandevaluationstrategiesevenforasimpletask.

Onaveragethesurrogatefunctionfollowstheorderofuserrankingfairlyconsistently.

Linkbetweenuserevaluationstrategy,andoutcomeofexploration&speedofconvergence.

�55

Page 56: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Promisingresultsbut…

• Realworldsituationshavemorecomplexdatasetsandtasks.

• Usersarenotalwaysconsistentorgivedetailedfeedback.

• Wedonotalwayshavegroundtruth.

What happens when we do not have ground truth ?

�56

Page 57: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

User-centredevaluation

Insightandusabilityevaluation

• areexpertsabletoconfirmoldknowledge?

• areexpertsabletogainnewinsight?

Qualitativestudymethodology

• thinkaloud,observe,interview&questionnaire

• videotapedandlogdatacapture

�57

Page 58: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

• 5domainexperts

• mean34.2years

• owndatasets

• pre-questionnaire

• 2.5hours

Participants

�58

Page 59: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Training

T1:showinthetoolwhatyoualreadyknowaboutthedata

T2:explorethedatainlightofaresearchquestion

Tasks

�59

Page 60: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

EVEResults

Hypothesisgeneration

�60

“thiscombinationmaybeanimportantfindingbecauseitinvolvesparametersthataffectonlyonepartofthe

simulationmodel...”

Cityemergencemodel

Page 61: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

EVEResults

Hypothesisquantification

‘‘wealwaystalkaboutthisqualitatively.Thisisthefirst

timeIseeconcreteweights...’’

�61

Electricityconsumptionprofiles

Page 62: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

EVEResults

�62

expertswereableto:• tryoutalternativescenarios• thinklaterally• quantifyaqualitativehypothesis• formulateanewhypothesisorrefineoldone• (domainvalue)

But…

Page 63: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�63

« On one hand, humans perform unpredictable and sophisticated reasoning; on the other, artificial solvers are technically complex and adopt solving strategies which are very different from those employed by humans.

Secondly, the environment from which the problem to be solved is drawn is usually uncontrollable and uncertain.

Together, these factors complicate the task of designing precise and effective evaluation studies. »

G. Cortellessa and A. Cesta, AAAI

EvaluationofIVMLisChallenging

Cortellessa,Gabriella,andAmedeoCesta."EvaluatingMixed-InitiativeSystems:AnExperimentalApproach."ICAPS.Vol.6.2006.

Page 64: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�64

EvaluationofIVMLisChallenging

Ch#1ComplexHumanFactors

Ch#2MultipleExpertise

Ch#3StochasticProcesses

Ch#4Co-adaptation

Ch#5Uncertainty

Page 65: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�65

Ch#1ComplexHumanFactors

“Users Are People, Not Oracles”

FrustratedBored

FatigueAnnoyed

Inconsistent

Reluctanttogivefeedback

Interruptibility

Amershi et al., Power to the People: The Role of Humans in Interactive Machine Learning , 2014

Bias

e.g.,Needbettertechniquestocaptureuserintent.

Page 66: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�66

Ch#2MultipleExpertise

ShouldIVMLhelpgroupsreachconsensus?Encouragemultiple

views?

Forus:buildcommongroundandselectbest

trade-offs

Page 67: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�67

Ch#2MultipleExpertise

ShouldIVMLhelpgroupsreachconsensus?Encouragemultiple

views?

Forus:buildcommongroundandselectbest

trade-offs

Weneedtocaterforindividualaswellascollective

exploration

Needsetupsthatcanswitchbetweenindividualand

collectivelearning

Page 68: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�68

• Riskofgettingstuckinlocaloptima?

�68

• Effectofstochasticityonuser’smentalmodeoftheML

Ch#3StochasticProcesses

Page 69: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Ch#3Co-adaptation

�69

“ Co-adaptive phenomena are defined as those in which the environment affects human behavior and at the same time, human behavior affects the environment. Such phenomena pose theoretical and methodological challenges and are difficult to study in traditional ways. ”

W. E. Mackay, Users and Customizable Software: A Co-Adaptive Phenomenon, 1990.

Page 70: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Ch#4Uncertainty

�70

Differentsourcesofuncertaintyarisingfrom:• Automatic

inferences• Analysts

reasoning

https://www.facebook.com/pedromics

Page 71: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�71

N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.

ReviewofCHI/VIZpublications2012-2017Keywordsearch:IML+Evaluation19papersfromdifferentdomains

Notanexhaustivesurvey!

IMLEvaluation-HCIStudies

Page 72: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�72

N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.

IMLEvaluation-HCIStudies

Page 73: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�73

N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.

HumanFeedback:

• implicit(7papers)

• explicit(8papers)

• mixed(4papers):• implicithumanfeedbackhelpsinferinformationtocomplementimplicithumanfeedback.

IMLEvaluation-HCIStudies

Page 74: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�74

N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.

SystemFeedback:

• toinformhumansaboutthestateofthemachinelearningalgorithm,andprovenanceofsystemsuggestions.

• canbevisual,progressive,andcanindicateuncertainty.

• mostsystemsgavefeedback.

• challenge:toinformwithoutoverwhelmingtheuser.

IMLEvaluation-HCIStudies

Page 75: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�75

N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.

Typesofstudy:

• 12/19studiesinvolvingusers(someforofcontrolledstudy)

• Difficulttoconduct:• potentialconfoundingfactors

• nogroundtruth

IMLEvaluation-HCIStudies

Page 76: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�76

N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.

EvaluationMetrics:

objective:• time,precision,re-call,#Insights

• iMLvs.baselineorvariantsofML;with/outsystemfeedback;explicitvs.implicit;impactofuserfeedback

• difficultyseparatingusabilityissuesfromtaskresults

IMLEvaluation-HCIStudies

Page 77: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�77

N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.

EvaluationMetrics:

subjective:• aspectsofuserexperience,e.g.,reported,easiness,speed,taskload,trust,andconfidence.

IMLEvaluation-HCIStudies

Page 78: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�78

N.Boukhelifa,A.Bezerianos,andE.LuQon.Evalua<onofInterac<veMachineLearningSystems.InHumanandMachineLearning,pp.341-360.Springer,Cham,2018.

EvaluationMetrics:

subjective:• aspectsofuserexperience,e.g.,reported,easiness,speed,taskload,trust,andconfidence.

IMLEvaluation-HCIStudies

“One sign of success of iML systems is when humans forget that they are feeding information to an algorithm, and rather focus on

synthesising information relevant to their task”. Endertetal.,2012[38]

Page 79: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

GuidelinesforIVMLevaluation

�79

Weknowhowtoevaluateinterfaces&interactivesystems,butveryfewguidelinesexistspecificallyforiMLsystems!

JakobNielsen's10UsabilityHeuristicsforUserInterfaceDesign

G1:VisibilityofsystemstatusG2:MatchbetweensystemandtherealworldG3:UsercontrolandfreedomG4:ConsistencyandstandardsG5:ErrorpreventionG6:RecognitionratherthanrecallG7:FlexibilityandefficiencyofuseG8:AestheticandminimalistdesignG9:Helpusersrecognize,diagnose,&recoverfromerrorsG10:Helpanddocumentation

Page 80: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

GuidelinesforIVMLevaluation

�80

G1Developingsignificantvalue-addedautomation.

G2Consideringuncertaintyaboutauser’sgoals.

G3Consideringthestatusofauser’sattentioninthetimingofservices.

G4Inferringidealactioninlightofcosts,benefits,anduncertainties.

G5Employingdialogtoresolvekeyuncertainties.

G6Allowingefficientdirectinvocationandtermination.

G7Minimizingthecostofpoorguessesaboutactionandtiming.

G8Scopingprecisionofservicetomatchuncertainty,variationingoals.

G9Providingmechanismsforefficientagent−usercollaborationtorefineresults.

G10Employingsociallyappropriatebehaviorsforagent−userinteraction.

G11Maintainingworkingmemoryofrecentinteractions.

G12Continuingtolearnbyobserving.

PrinciplesofMixed-InitiativeUserInterfaces-EricHorvitz,1999

Page 81: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

GuidelinesforIVMLevaluation

�81

G1Makeclearwhatthesystemcando.

G2Makeclearhowwellthesystemcandowhatitcando.G3Timeservicesbasedoncontext.G4Showcontextuallyrelevantinformation.G5Matchrelevantsocialnorms.G6Mitigatesocialbiases.

G7Supportefcientinvocation.

G8Supportefcientdismissal.

G9Supportefcientcorrection.

G10Scopeserviceswhenindoubt.

G11Makeclearwhythesystemdidwhatitdid.G12Rememberrecentinteractions.G13Learnfromuserbehavior.G14Updateandadaptcautiously.G15Encouragegranularfeedback.

G16Conveytheconsequencesofuseractions.

G17Provideglobalcontrols.

G18Notifyusersaboutchanges.

Newguidelinesproposed,e.g.,Amershietal.,2019.

Page 82: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Noconclusions-researchquestions!

�82

Q1WhataspectsorcomponentsoftheIVMLsystemaremostimportanttoevaluate?

Q2Whattypesoftaskscanbedelegatedtomachinelearningandwhicharebestlefttohumans?

Q3WhoarethetargetusersofIVMLsystems?

Q4Whatistheroleofexpertiseinthiscontext?

Q5ShouldwehavedomainexpertstrainIVMLsystems?

Q6Whataretherisksandbenefitsofintroducinghumanexpertiseintothe(machine)learningprocess?

Q7Whatmetricsshouldweusetoevaluate?

Q8Canweestablishapplicationordomain-independentmetrics?

Q9DoweestablishdifferentevaluationsmeasuresfortheunderstandingofIVMLsystemsandfortheirperformance?

Q10Canweestablishbenchmarkdatasetsandtestuse-casestohelpevaluateIVMLsystems?

Q11Shouldweseekreplicationinthiscontext,andifso,howdowesupportreplicationofresultsinaco-learningoradaptiveenvironment?

Q12Howdowecommunicatetheevaluationresultstootherdisciplines?

eviva-ml.github.io

Page 83: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

iML-eval:On-goingResearchTopic

�83

21 October, 2019

eviva-ml.github.io

Page 84: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

�84

eviva-ml.github.ioEarly registration deadline 20/09

Organizing Committee

• Nadia Boukhelifa (INRA, FR)• Anastasia Bezerianos (Univ. Paris-Sud and INRIA, FR)• Enrico Bertini (NYU Tandon School of Engineering, USA)• Christopher Collins (Uni. of Ontario Institute of Technology, CA)• Steven Drucker (Microsoft Research, USA)• Alex Endert (Georgia Tech, USA)• Jessica Hullman (Northwestern University, USA)• Michael Sedlmair (University of Stuttgart, DE) • Remco Chang (Tufts University, USA)• Chris North (Virginia Tech, USA)

Page 85: Evaluation of Interactive Machine Learning Systems › ial2019 › pdf › IAL_invited_boukhelifa.pdf · Evaluation of Interactive Machine Learning Systems. In Human and Machine Learning,

Noconclusions-researchquestions!

�85

Q1WhataspectsorcomponentsoftheIVMLsystemaremostimportanttoevaluate?

Q2Whattypesoftaskscanbedelegatedtomachinelearningandwhicharebestlefttohumans?

Q3WhoarethetargetusersofIVMLsystems?

Q4Whatistheroleofexpertiseinthiscontext?

Q5ShouldwehavedomainexpertstrainIVMLsystems?

Q6Whataretherisksandbenefitsofintroducinghumanexpertiseintothe(machine)learningprocess?

Q7Whatmetricsshouldweusetoevaluate?

Q8Canweestablishapplicationordomain-independentmetrics?

Q9DoweestablishdifferentevaluationsmeasuresfortheunderstandingofIVMLsystemsandfortheirperformance?

Q10Canweestablishbenchmarkdatasetsandtestuse-casestohelpevaluateIVMLsystems?

Q11Shouldweseekreplicationinthiscontext,andifso,howdowesupportreplicationofresultsinaco-learningoradaptiveenvironment?

Q12Howdowecommunicatetheevaluationresultstootherdisciplines?

eviva-ml.github.io

Danke!