A Licensing Model and Ecosystem for Data Sharing
Sam Madden1, Jane Greenberg2,3, Carsten Binnig4, Tim Kraska4, Danny Weitzner1, & Sam Grabus2,3
1 MIT, 2 Metadata Research Center, 3 Drexel University, 4 Brown University
A Licensing Model and Ecosystem for Data Sharing is supported by NSF award 1636788.
Summary
As part of the NSF Big Data Regional Innovation Hubs program, the Northeast Hub is addressing key data-sharing challenges by:
• Creating a licensing model that facilitates sharing data that is not necessarily open or free between different organizations,
• Developing a prototype data-sharing software platform, ShareDB, which will enforce the terms and restrictions of the developed licenses, and
• Developing and integrating relevant metadata that will accompany the datasets shared under the different licenses, making them easily searchable and interpretable.
To ensure that the developed tools and licenses are useful, the project will form the Northeast Data Sharing Group, composed of many different stakeholders, to make the licensing model widely accepted and usable in many application domains (e.g., health and finance).
Timeline
• Year 1: Requirements gathering. Initial requirements-gathering workshop.
• Year 2: Version 0.1, the first draft of the licensing model. Present the model at a workshop for suggestions. Identify stakeholders who commit to using the licensing model.
• Year 3: Transition the Northeast Data Sharing Consortium into a non-profit organization.
Rationale
Sharing datasets can provide tremendous mutual benefits for industry, researchers, and nonprofit organizations. A major obstacle is that data often come with prohibitive restrictions on how they can be used (e.g., requiring the enforcement of legal terms or other policies, or the handling of data privacy issues). Additionally, many attempts to share relevant datasets between stakeholders in industry and academia fail or require a large investment to make data sharing possible.
Key Components
1. Data-Sharing Licensing Framework/Generator
2. Data-Sharing Platform (enforces licenses)
3. Metadata (search licenses & data)
Licensing Framework: Create a set of possible options that can be easily composed into a standardized data-sharing agreement for different domains.
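To illustrate what "composable options" could look like in practice, here is a minimal Python sketch that assembles a small catalogue of reusable clauses into a single agreement. The option names, the clause wording, and the `compose`/`Agreement` names are hypothetical illustrations, not the project's actual license terms.

```python
from dataclasses import dataclass, field

# Hypothetical option catalogue: each option is a short, reusable clause.
# Names and wording are illustrative assumptions, not the project's licenses.
OPTIONS = {
    "attribution": "Licensee must credit the data provider in any publication.",
    "non_commercial": "Data may not be used for commercial purposes.",
    "no_redistribution": "Data may not be shared with third parties.",
    "deidentified_only": "Only de-identified records may be accessed.",
    "expires_days": "Access expires {value} days after the agreement is signed.",
}


@dataclass
class Agreement:
    provider: str
    licensee: str
    domain: str
    terms: dict = field(default_factory=dict)  # option name -> parameter (or True)

    def render(self) -> str:
        """Render the chosen options as a numbered, human-readable agreement."""
        lines = [
            f"Data-sharing agreement ({self.domain})",
            f"Provider: {self.provider}",
            f"Licensee: {self.licensee}",
        ]
        # Sort options so the same set of choices always yields the same text.
        for i, (name, value) in enumerate(sorted(self.terms.items()), start=1):
            # str.format ignores unused keyword arguments, so clauses without
            # a {value} placeholder render unchanged.
            lines.append(f"{i}. {OPTIONS[name].format(value=value)}")
        return "\n".join(lines)


def compose(provider: str, licensee: str, domain: str, **terms) -> Agreement:
    """Build an agreement from catalogue options, rejecting unknown ones."""
    unknown = set(terms) - set(OPTIONS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return Agreement(provider, licensee, domain, dict(terms))


if __name__ == "__main__":
    agreement = compose("Hospital A", "University B", "health",
                        attribution=True, expires_days=365)
    print(agreement.render())
```

Keeping clauses in one shared catalogue is what makes the resulting agreements "standardized": two organizations that pick the same options get identical text, and a domain (e.g., health or finance) can be served by adding options rather than drafting a new agreement.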
Data-Sharing Platform: Develop a prototype software system for data sharing that seamlessly enforces the restrictions stated in the developed licenses.
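One way a platform could "seamlessly enforce" a license is to check every data request against a machine-readable form of the license before returning any rows. The Python sketch below assumes a simple license with a licensee, permitted purposes, an expiry date, and a set of withheld PII columns; these fields and the `enforce` function are hypothetical and do not describe ShareDB's actual design.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class License:
    # Hypothetical machine-readable license fields (assumed, not ShareDB's schema).
    licensee: str
    allowed_purposes: frozenset
    expires: date
    pii_columns: frozenset  # columns the licensee may never read


@dataclass(frozen=True)
class Request:
    user: str
    purpose: str
    columns: tuple
    on: date


class LicenseViolation(Exception):
    """Raised when a request falls outside the license terms."""


def enforce(lic: License, req: Request, rows: list) -> list:
    """Return only the data the license permits, or raise LicenseViolation."""
    if req.user != lic.licensee:
        raise LicenseViolation(f"{req.user} is not the licensee")
    if req.on > lic.expires:
        raise LicenseViolation("license has expired")
    if req.purpose not in lic.allowed_purposes:
        raise LicenseViolation(f"purpose '{req.purpose}' not permitted")
    blocked = set(req.columns) & lic.pii_columns
    if blocked:
        raise LicenseViolation(f"restricted columns requested: {sorted(blocked)}")
    # Project each row onto the requested (and permitted) columns only.
    return [{c: row[c] for c in req.columns} for row in rows]


if __name__ == "__main__":
    lic = License("univ_b", frozenset({"research"}), date(2018, 12, 31),
                  frozenset({"name", "ssn"}))
    rows = [{"name": "Ann", "ssn": "xxx", "age": 34, "zip": "02139"}]
    req = Request("univ_b", "research", ("age", "zip"), date(2018, 6, 1))
    print(enforce(lic, req, rows))
```

Placing the check in the platform, rather than relying on the licensee to honor a document, means a request for a withheld column, an unlisted purpose, or access after expiry is refused before any data leaves the system.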
Metadata: Develop a metadata scheme that leverages the best of breed from the vast number of existing metadata standards.
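As a rough illustration of how metadata could make datasets and their licenses searchable together, the Python sketch below defines a record that borrows a few Dublin Core element names (title, creator, subject, rights) and adds a hypothetical `license_options` field linking each dataset to licensing-framework options. The record layout and the `search` function are assumptions for illustration, not the project's metadata scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    # Field names borrow Dublin Core elements (title, creator, subject, rights);
    # `license_options` is a hypothetical extension pointing at license options.
    identifier: str
    title: str
    creator: str
    subject: tuple
    rights: str
    license_options: frozenset


def search(records, keyword=None, allowed_options=None):
    """Return records whose title/subject contain `keyword` and whose license
    uses only options from `allowed_options` (either filter may be omitted)."""
    hits = []
    for r in records:
        text = " ".join((r.title,) + r.subject).lower()
        if keyword and keyword.lower() not in text:
            continue
        # Subset test: every option the dataset's license requires is acceptable.
        if allowed_options is not None and not r.license_options <= allowed_options:
            continue
        hits.append(r)
    return hits


if __name__ == "__main__":
    catalog = [
        DatasetRecord("ds1", "Hospital readmissions", "Hospital A",
                      ("health", "readmission"), "Restricted",
                      frozenset({"attribution", "deidentified_only"})),
        DatasetRecord("ds2", "Loan defaults", "Bank C", ("finance",),
                      "Restricted", frozenset({"non_commercial", "no_redistribution"})),
    ]
    for rec in search(catalog, keyword="health"):
        print(rec.identifier, rec.title)
```

Indexing license options alongside descriptive metadata lets a user ask not only "which datasets cover health?" but also "which datasets could I use under terms my organization already accepts?"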
Current Progress @ Brown, Drexel, & MIT
• Gathering examples of data-sharing licenses
• Parsing essential attributes
• Natural language processing, term clustering, and categorization
• Initial data-sharing platform (DataHub)
• Added support for access control & authorization
• Exploring research issues related to anonymization & de-identification of PII
In collaboration with: