Are we data responsible?
Assessing your algorithms, tools, systems: benchmark, benchmark, benchmark!
Irini Fundulaki, Institute of Computer Science – FORTH, Greece
7/21/16 Dagstuhl Seminar 16291: Data, Responsibly
Status of the Linked Open Data Cloud, 2014
[Figure: Linked Open Data cloud diagram. Domains in the legend: Media, Government, Geographic, Publications, User-generated, Life sciences, Cross-domain]
More than 31B triples in LOD
Links (external): 500M
The Question(s)
• Which problems do I wish to solve?
• Which key performance indicators are relevant?
• How do the existing tools behave w.r.t. the key performance indicators?
Which are the tool(s) that I should use for my data and for my use case?
The Answer: Benchmark your tools!
• A benchmark comprises
– datasets (synthetic or real)
– a set of software tools
  • Synthetic data generators
  • Test case generators
– key performance indicators, and
– a set of clear execution rules
• Standardized application scenario(s) that serve as a basis for testing systems
• Must include a clear set of factors to be measured and the conditions under which the systems should be measured
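The components listed above (datasets, key performance indicators, execution rules, and the tools under test) could be captured in a small data structure. The following is a minimal sketch, not a reference implementation: the names `Benchmark`, `run_benchmark`, `keep_positive`, and `selectivity` are hypothetical and chosen only for illustration, and the toy tool and KPI stand in for real systems and real indicators.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

# All names in this block are hypothetical; they illustrate the idea only.
Record = dict
Tool = Callable[[Sequence[Record]], list]         # a system under test
KPI = Callable[[Sequence[Record], list], float]   # scores one tool output


@dataclass
class Benchmark:
    name: str
    datasets: Dict[str, List[Record]]   # synthetic or real data
    kpis: Dict[str, KPI]                # key performance indicators
    execution_rules: List[str] = field(default_factory=list)


def run_benchmark(
    benchmark: Benchmark, tools: Dict[str, Tool]
) -> Dict[Tuple[str, str, str], float]:
    """Run every tool on every dataset and score each output with every KPI."""
    results = {}
    for tool_name, tool in tools.items():
        for ds_name, data in benchmark.datasets.items():
            output = tool(data)
            for kpi_name, kpi in benchmark.kpis.items():
                results[(tool_name, ds_name, kpi_name)] = kpi(data, output)
    return results


# Toy tool: keeps the records whose "value" is positive.
def keep_positive(records):
    return [r for r in records if r["value"] > 0]


# Toy KPI: fraction of the input records that the tool returned.
def selectivity(records, output):
    return len(output) / len(records) if records else 0.0


toy = Benchmark(
    name="toy",
    datasets={"synthetic": [{"value": v} for v in (-1, 2, 3, -4)]},
    kpis={"selectivity": selectivity},
    execution_rules=["run each tool once per dataset"],
)
results = run_benchmark(toy, {"keep_positive": keep_positive})
print(results)
```

Keeping the KPIs separate from the tools means the same measurement applies unchanged to every system, which is what makes the results comparable.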
Benchmarks for the Linked Data Value Chain
What about data responsibility?
We need benchmarks that test whether the tools for each step of the Big Linked Data value chain are data responsible!
Are we data responsible?
• Can we identify use cases of interest?
– Emergency Response Priorities
– Targeted Ads
– Loans
– Criminal Justice
– (Universal) Healthcare, Precision Medicine
– Evaluation of employee performance (e.g., teachers, public servants)
– Pricing Schemes
– H2020 and NSF Project Funding
– …
© Input from the Use Case & Benchmarks Breakout Group @ Dagstuhl Seminar 16291
Are we data responsible?
• Can we identify Key Performance Indicators?
– Discrimination/Fairness
– Neutrality
– Transparency
– Diversity
– …
• Can we identify tools whose responsibility we must measure?
– Search Engines?
– Service Providers (e.g., Amazon, eBay)?
– Social Network Providers (e.g., Facebook, Twitter)?
– Decision Makers?
– …
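To make the "Discrimination/Fairness" indicator above concrete, one possible KPI is the statistical parity difference: the gap between the rates at which different groups receive a positive decision. The slides do not prescribe any particular measure, so this is only a sketch under stated assumptions: decisions are binary, each decision is tagged with a single group label, and the gap is taken between the highest and lowest group rates. The function names are hypothetical.

```python
from collections import defaultdict


def positive_rates(decisions, groups):
    """Return the share of positive decisions for each group label."""
    if len(decisions) != len(groups) or not decisions:
        raise ValueError("decisions and groups must be non-empty and equal in length")
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        if decision:
            positives[group] += 1
    return {g: positives[g] / totals[g] for g in totals}


def statistical_parity_difference(decisions, groups):
    """Gap between the highest and lowest group-level positive rate.

    0.0 means all groups receive positive decisions at the same rate;
    larger values indicate a larger disparity (illustrative choice of KPI).
    """
    rates = positive_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Made-up toy data: 1 = positive decision (e.g., loan granted), 0 = negative.
decisions = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rates(decisions, groups)
gap = statistical_parity_difference(decisions, groups)
print(rates, gap)
```

A single number like this covers only one notion of fairness; a benchmark would likely report several indicators side by side, chosen per use case.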
Are we data responsible?
• Can we find datasets for our use cases?
– Open (public/private) datasets
– User-provided datasets (through crowdsourcing?)
– …
• What are the tests that we need to set up to benchmark the tools?
– Differ according to the use case
• We need a principled, unbiased set of guidelines for benchmark definition!
• We must consider guidelines and legislation (national and international) when defining benchmarks!
• Multidisciplinary work required!