LG-71-17-0094-17 Indiana University Data To Insight Center
Abstract
Over the last decade, society has seen a nearly exponential increase in the volume of digital content. Researchers and educators, in response, see the potential that Big Data techniques bring to the computational exploration of cultural and scholarly digital collections for organizing, accessing, and analyzing content. Libraries have long made a mission of provisioning access services to digital content to enrich and improve the lives of all Americans; however, when digital collections have access restrictions, provisioning services becomes a challenge.
We respond to this challenge with the Data Capsule service, developed in the HathiTrust Research Center, that enables remote access to restricted digital data in the HathiTrust Digital Library. Data Capsule is architected to be modular and uses application programming interfaces (APIs) for communication; this best practice in systems design, plus the proposed packaging effort, will allow for faster integration into a new environment and ready contributions by third parties.
In this project, we intend to partner with 8 academic libraries across the country in a multi-method research project that draws from human-computer interaction and experimental computer science to:
• Understand current library needs and practices in provisioning library services for computational access to special collections having constraints due to sensitivity or restrictions,
• Extend the Data Capsule service to broader needs of provisioning for analytical access to restricted collections across a range of collections and uses,
• Study extensions of Data Capsule to cloud computing environments for broader uses,
• Identify gaps in skills needed for librarians to enable secure data analytics and provide resources that can address those gaps.
This project proposal, responsive to the IMLS National Leadership Grants for Libraries program, is planned as a 2-year effort. If funded, it will be carried out under the encompassing framework of Participatory Design and involve funded partners at Indiana University, University of Illinois, University of California at Berkeley, and University of Virginia; and engaged partners at Indiana University, Lafayette College, MIT, Rutgers University, Swarthmore College, and UCLA.
In response to reviewer feedback, we increased the number of library partners in the project from 3-5 to 8, and introduced the two-tiered partner model. Level 1 partners (2) receive direct funding through the grant. Level 2 partners (6) receive travel funds built into the Indiana University grant to participate in a regional community-building event. The change resulted in an increase in the overall budget of about 15% from the pre-proposal.
Sustainability is planned through utilizing an existing operational service, growing its adopter community (libraries), and extending it for broader collections and use cases. The service itself is grounded in the HathiTrust Research Center, which continues to support and endorse the Data Capsule service as its primary service for computational analysis on the nearly 15 million volumes of the HathiTrust Digital Library. HTRC deeply welcomes this initiative to involve more partners as users and sustainers of the software codebase.
Data Capsule Appliance for Research Analysis of Restricted and Sensitive Data in Academic Libraries
1. Statement of National Need
Over the last decade, society has seen a nearly exponential increase in the volume of digital content [1]. The new content is coming into existence on the cultural side through massive digitization efforts [2] or because content is increasingly born digital. Libraries have long made a mission of provisioning access services to digital content to enrich and improve the lives of all Americans [3]. When digitized collections (of letters, government papers, video clips, institutional records, annotated volumes) have access restrictions, however, provisioning services becomes a challenge. Collections can have access restrictions for a number of reasons: a set of papers that have not been properly accessioned; a collection of videos with mixed in-copyright and public domain content; material donated by a prominent researcher that contains sensitive information from ethnographic studies on aboriginal peoples. The data-side push for new services to meet the challenge of restricted and sensitive collections is being met with a corollary end-user pull, as researchers and educators discover the potential that Big Data techniques bring to the humanities [4] and other areas, and begin to envision opportunities in their own research spheres for the computational exploration of both small and large collections of materials for organizing, accessing, and analyzing content.
Traditional types of library services often inadequately address end-user needs when a collection of materials is restricted or deemed to contain sensitive data. Secure data enclave pilots allow researchers to work with this unique type of data [5]–[9]. Yet such enclaves are often limited to analysis of microdata through common statistical packages, making them less suited for other uses: there are hundreds of different computational content mining tools; for example, the text analysis portal TAPoR lists 493 of them [10]. Additionally, enclaves are frequently custom-built for a collection, or a small set of centrally located collections, making this solution not easily portable to new institutions or collections.
Drawing on the most pressing themes of trust, access, infrastructure, and skills in providing data services [11], the overarching goal of this project is manifold: understand current library needs and practices in provisioning services for computational access to special collections, extend an existing service to enable intuitive and yet secure computational access to restricted data in libraries, and identify gaps in skills needed for librarians to enable secure data analytics and provide resources that can address those gaps. We aim to build upon a service that has been developed in the HathiTrust Research Center (HTRC) that enables end users to remotely access the HathiTrust Digital Library for computational use. We propose, as part of this grant, to package the service as an appliance so that it can be easily installed in a library technological environment, and extend the service to satisfy scenarios of different collections and end-user needs driven by our library partners. The service is called Data Capsule [12], [13], and it derives from theoretical work on a concept called “storage capsules” [14]. Through a grant from the Alfred P. Sloan Foundation (2011-2015), the author of storage capsules, Atul Prakash, along with Plale and McDonald (the latter two are leads on this proposal), developed the storage capsule concept into the working Data Capsule service, which became available in HTRC in 2015. The service in HTRC utilizes a tool called the Workset [15], which maintains an end user’s context.
Building on the earlier work of the Data Capsule (DC) service, we propose to extend and evaluate the system under the encompassing framework of Participatory Design with library partners from eight libraries across the country who have committed to serving either as Level 1 testing partners, eager to engage in hands-on evaluation, or as Level 2 partners, ready to participate in discussions and studies. We propose, through this Participatory Design framework, to extend the service to:
• Be packaged as an appliance that can be run and managed locally at partner institutions
• Generalize the Data Capsule service to connect to broader types of restricted collections
• Deliver extensions to the Data Capsule service and Workset model that reflect partner needs obtained through intense partner engagement
• Deliver a design of Data Capsule that utilizes high-performance and cloud computing resources and accommodates both the large-scale needs of partners and partners with lighter technology resources available to them
As the Data Capsule service is architected using principles of well-defined APIs and software component modularity, it is highly suited to extension and generalization for broader use.
The conceptual framework guiding the architecture of Data Capsule (DC) in its current form can be explained in the context of fair use. Legal judgments of fair use have repeatedly returned to two key analytical questions [16]: First, “Did the use ‘transform’ the material taken from the copyrighted work by using it for a broadly beneficial purpose different from that of the original, or did it just repeat the work for the same intent and value as the original?” And second, “Was the material taken appropriate in kind and amount, considering the nature of the copyrighted work and of the use?” In DC, the transforming work is carried out by an end user within a Capsule that they have at their disposal for an extended period of weeks to months. The service then enforces both questions as follows:
• Use is appropriate: the DC service assesses appropriateness of the content exported from a Capsule:
    o Unintentional exportation, such as through malware, is stopped
    o Intentional exportation is reviewed through manual (or, in future, automatic) results review
• Amount of data used is appropriate: the amount of data used in creation of all exported data products is below a threshold of appropriateness of use
• Data types: the type of data used in the creation of new content is allowable for the need
• Intent is reasonable and identity is proven: through structures of policy and institutional infrastructure
• When a Capsule is used for analytical purposes, acceptable activities include but are not limited to a) image analysis and text extraction, b) textual analysis and information extraction, c) linguistic analysis, d) automated translation and language translation, and e) indexing and search.
Data Capsule thus enables transformative use of restricted and sensitive collections through a service that will be packaged as an appliance, will have options for hooking to a new collection with relative ease, and provides the needed assurances that the actions allowable by the service will protect the collection.
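The export conditions listed above can be pictured as a small check that runs before a result is queued for manual review. The following is a minimal Python sketch under stated assumptions, not the Data Capsule implementation: the class `ExportRequest`, the function `automatic_checks`, the set of allowed data types, and the numeric threshold are all hypothetical names and values chosen for illustration, since the proposal names the criteria but not their concrete form.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical policy values; the proposal names the criteria but not the numbers.
ALLOWED_DATA_TYPES = {"text", "image", "metadata"}
MAX_SOURCE_ITEMS = 1000  # assumed threshold for an "appropriate amount" of data used


@dataclass
class ExportRequest:
    identity_verified: bool   # identity proven through institutional infrastructure
    data_types: List[str]     # types of restricted data used to create the result
    source_items_used: int    # amount of restricted data that went into the result


def automatic_checks(request: ExportRequest) -> List[str]:
    """Return reasons to reject the export; an empty list means the result
    may be queued for manual results review."""
    problems = []
    if not request.identity_verified:
        problems.append("identity not verified")
    disallowed = sorted(set(request.data_types) - ALLOWED_DATA_TYPES)
    if disallowed:
        problems.append("disallowed data types: " + ", ".join(disallowed))
    if request.source_items_used > MAX_SOURCE_ITEMS:
        problems.append("amount of data used exceeds threshold")
    return problems
```

In this sketch a request that passes every automatic check is still only a candidate for export; the manual review step described in the list remains the final gate.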
2. Project Design
The project is structured to bring together three distinct, complementary bodies of expertise: human-computer interaction expertise in community engagement, participatory design, and social-technical interactions (Kouper); computer science and technology expertise in data-driven architectures, data models, and trust (Plale, McDonald, and Downie); and library partners with expertise in technology services for special collections (Mitchell and Unsworth). The multidisciplinary team is critical to bringing about a project of this nature.
The library partnership is designed at two levels. Level 1 Testing Partners identify a collection and an end-user need, and work with the Data Capsule team to implement a proof-of-concept demonstration for the collection. Level 1 Testing Partners also participate in the assessment, user study, and participatory activities. They include the libraries of University of California Berkeley and University of Virginia. Level 2 Partners engage in the assessment and user study, and contribute to participatory activities. Level 2 partners include the libraries of Lafayette College, Indiana University, MIT, Rutgers, Swarthmore, and UCLA.
2.1 Goals, methods, assumptions, and risks
The broad goal of this project will be accomplished through synergistic and mutually reinforcing activity in its two major foci of expertise: participatory, design-oriented partner engagement, and software architecture and evaluation. The nature of the project is iterative within and between the two foci of expertise: “explore, approximate, and refine” [17].
Research methodologies: The project will employ research methodologies from both the domain of human-computer interaction, to accomplish the goals of assessment, partner engagement, and evaluation, and the domain of experimental computer science, to advance the Data Capsule design and Workset. This multi-method approach to research is increasingly important in successful technology adoption: active all-stakeholder engagement at the early stages ensures a good fit on the human capital side, and the experimental computer science ensures a good fit on the technological side. The methodologies of each are described in more detail below.
Project risks: Low library partner participation is a potential project risk. We addressed this risk during development of the full proposal by devoting substantially more resources to the library partners. We increased the number of library partners in the project from 3-5 to 8, and introduced the two-tiered partner model. Level 1 partners (2) receive funding through a subcontract that they use for engagement of technical or collections expertise. We additionally built funding into the Indiana University budget to fund travel for Level 2 partners (6) to participate in a regional community-building event. The change resulted in an increase in the overall budget of about 15% from the pre-proposal. We thought this action a necessary risk mitigation strategy. Our project has already built into it a program for constant support and interaction with the library partners on both levels to ensure the highest possible participation.
Assumptions: Our project has several assumptions, all of which we think are reasonable expectations in the environments of major academic libraries, though further study will be carried out for less well-equipped libraries. Data Capsule is an environment (a set of software services plus policies) that utilizes a cluster of computers located within a secure network. The codebase is modular and utilizes Application Programming Interfaces (APIs) for extensibility and interoperability. A Data Capsule Controller runs on one of the cluster nodes. From there, it allocates to an end user a Capsule -- a virtual computer (virtual machine) that runs on one of the other nodes in the cluster. The Data Capsule service implementation will be extended in this project to utilize the operating system library, Libvirt¹, which allows DC to configure an end-user Capsule for secure access. Our implementation plan thus assumes i) Level 1 testing partners support the existence of a library such as Libvirt running on their testing servers, ii) programmatic access to a collection is available through an API, and iii) there exists a trusted service in the library environment through which user authentication can be carried out.
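The allocation step described above, in which the Controller places a Capsule on one of the other cluster nodes, can be sketched as follows. This is an illustrative Python sketch only: the function `pick_capsule_host`, the least-loaded placement policy, and the per-node capacity limit are assumptions made here, as the proposal does not describe how a node is chosen, and no Libvirt call is shown.

```python
from typing import Dict, Optional


def pick_capsule_host(capsules_per_node: Dict[str, int],
                      controller_node: str,
                      max_capsules_per_node: int) -> Optional[str]:
    """Pick a node to host a new Capsule virtual machine.

    The Controller's own node is excluded, matching the description that a
    Capsule runs on one of the other cluster nodes. Least-loaded placement
    and the per-node capacity limit are assumptions made for this sketch.
    """
    candidates = {
        node: count
        for node, count in capsules_per_node.items()
        if node != controller_node and count < max_capsules_per_node
    }
    if not candidates:
        return None  # no capacity left; the caller decides how to report this
    # Sort names first so that ties are broken deterministically.
    return min(sorted(candidates), key=lambda node: candidates[node])
```

For example, with hypothetical nodes `node0` (running the Controller), `node1` holding two Capsules, and `node2` holding one, the sketch selects `node2`.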
Research Framework 1: The framework of Participatory Design (PD) informs the research questions and methodologies of the human-computer interaction research. A theoretical framework and a set of practices, PD explores conditions for deep user engagement in the design and implementation of computer-based systems at work [18]. User empowerment and democratic decision-making are crucial for successful PD, as one of the main assumptions is that technology is being designed to facilitate skilled work and enhance rather than completely replace human labor [19]. Libraries recognize the need to engage their end users in the design of library spaces and technologies [20], [21]. We raise the question of how librarians themselves can be involved in co-design of tools that use and enhance their skill sets while, at the same time, enabling library end users.
Research Framework 2: Experimental computer science as a discipline and methodology forms the framework for assessing and advancing the technological aspects of the project. Through iterative design and prototyping, we reflect user needs in the software development process. Through carefully controlled comparative evaluation studies that are designed to include performance evaluation, we accurately assess different technological tradeoffs. These studies, which are of a quality so as to be published in archival venues, contribute to the diffusion of the project results more broadly through libraries and through time.
Data Capsule is an environment that utilizes a cluster of computers located within a secure network. Capsules have two modes of running. The first is an open mode, during which a user can upload tools, data, and software of their choice. During open mode, access to the restricted or sensitive data is blocked. In the second mode, a closed mode, all access to the Internet is blocked, and the channels to the restricted data are opened. This is where the tools that need to work with the sensitive data can be started up. Upon completion of a task, the user stores the results they wish to export to a special directory, where they are queued for manual review, and, upon successful review, the user is sent a URL from which download can occur.
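The two running modes can be modeled as a small state machine. The Python sketch below is a toy model under stated assumptions: the names `Mode`, `Capsule`, `install_tool`, `read_restricted`, and `queue_for_export` are hypothetical, and the class only mimics the access rules described above; it performs no network or storage enforcement.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    OPEN = "open"      # Internet reachable, restricted data blocked
    CLOSED = "closed"  # Internet blocked, restricted data reachable


class Capsule:
    """Toy model of one Capsule; it enforces nothing at the network level."""

    def __init__(self) -> None:
        self.mode = Mode.OPEN
        self.installed_tools: List[str] = []
        self.review_queue: List[str] = []

    def switch_mode(self, mode: Mode) -> None:
        self.mode = mode

    def install_tool(self, name: str) -> None:
        # Uploading tools, data, and software is an open-mode activity.
        if self.mode is not Mode.OPEN:
            raise PermissionError("Internet access is blocked in closed mode")
        self.installed_tools.append(name)

    def read_restricted(self, item_id: str) -> str:
        # Channels to the restricted data are opened only in closed mode.
        if self.mode is not Mode.CLOSED:
            raise PermissionError("restricted data is blocked in open mode")
        return "<contents of " + item_id + ">"  # placeholder, no real data access

    def queue_for_export(self, result_name: str) -> None:
        # Results saved for export wait for manual review before release.
        self.review_queue.append(result_name)
```

The point of the design is that no single mode grants both Internet access and access to restricted data, so a tool installed in open mode cannot send restricted content out while it runs in closed mode.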
The existing Data Capsule system will be migrated to utilize the Libvirt virtualization toolkit. The Data Capsule Controller is delivered as either a virtual machine image or multiple Docker containers, together with a set of configuration files for partners to customize for their particular environment. The Data Capsule Controller expects two communication endpoints from the partner site: APIs and a corresponding SDK/toolkit that can securely access the data collection to be used from capsules; and a trusted relay of user authentication/authorization information to the Data Capsule Controller. Libvirt daemons are required to be running on all Data Capsule hosting servers. The Data Capsule Controller will provide RESTful APIs and a basic administration dashboard for partner sites to build customized front-end user interfaces. A separate database is needed to store the status of Capsules and their activities, as well as user computation results for the whole system. The Data Capsule Controller is expected to be lightweight enough to run as a single VM. The Docker container approach could provide further flexibility in packaging components and lower system resource consumption, albeit at the cost of more complicated deployment [22], [23].
¹ The virtualization API: https://libvirt.org; runs on Linux, Windows, OS X, FreeBSD
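The two communication endpoints expected from a partner site, together with the list of Capsule hosting servers, suggest what a partner's configuration might need to contain. The following Python sketch is hypothetical: the field names, the `validate` function, and the requirement that endpoints use `https` are assumptions for illustration and do not describe the actual Data Capsule configuration files.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class PartnerSiteConfig:
    collection_api_url: str   # API used from Capsules to reach the collection
    auth_relay_url: str       # trusted relay of authentication/authorization info
    capsule_hosts: List[str]  # servers expected to run a Libvirt daemon


def validate(config: PartnerSiteConfig) -> List[str]:
    """Return a list of configuration problems; empty means the sketch accepts it."""
    problems = []
    for name in ("collection_api_url", "auth_relay_url"):
        parsed = urlparse(getattr(config, name))
        # Requiring https is an assumption standing in for "securely access".
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(name + " must be an https URL")
    if not config.capsule_hosts:
        problems.append("at least one Capsule hosting server is required")
    return problems
```

A check of this kind could run when a partner first customizes the configuration, before any Capsule is allocated.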
One of the important tools in the Data Capsule environment is the Workset. As restricted collections cannot be moved outside of their secure storage and processing environment, users need a mechanism to save a persistent context of their sources that holds information about the state of their activities. HTRC uses the notion of the Workset: a machine-actionable personal research collection described using the Resource Description Framework (RDF) that consists of references to digital objects (e.g., volumes, pages, and so on) and metadata [18]. The Workset model combines pointers to, and metadata about, the generated resources and its selection procedures, as well as metadata about bibliographic resources that went into its creation. It provides context and continuity through the research lifecycle, from its conception and creation to archiving, citation, and use by other researchers.
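Because a Workset holds references and metadata rather than the restricted content itself, it can be pictured as a set of RDF-style statements about a collection. The Python sketch below is a simplified illustration under stated assumptions: the `Workset` class, the `example.org` vocabulary, and the property names are hypothetical and are not the HTRC Workset model; triples are represented as plain tuples rather than through an RDF library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical vocabulary for illustration; not the HTRC Workset ontology.
EX = "http://example.org/workset#"

Triple = Tuple[str, str, str]


@dataclass
class Workset:
    uri: str
    title: str
    creator: str
    volume_ids: List[str] = field(default_factory=list)  # references, not content

    def to_triples(self) -> List[Triple]:
        """Express the workset as (subject, predicate, object) triples."""
        triples: List[Triple] = [
            (self.uri, EX + "title", self.title),
            (self.uri, EX + "creator", self.creator),
        ]
        for volume_id in self.volume_ids:
            triples.append((self.uri, EX + "hasVolume", volume_id))
        return triples
```

Since only identifiers leave the secure environment in this representation, the same description could be archived, cited, or reused without moving any restricted material.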
The research questions/issues that we propose to investigate are:
● What are the uses of restricted collections in the context of delivering computational analytical services? How do collection providers and users construct their needs for transformative uses of the collection?
● How do collection-specific services, policies, and uses affect the design of DC, and how can the DC appliance fit within the library and its technological and organizational models? How do differently positioned actors within an organization influence that?
● Quantify the performance implications of certain design tradeoffs in extending and generalizing the Data Capsule system to meet the needs of a broad set of library uses and environments.
    ○ Include in the study an assessment of tradeoffs when considering libraries with less well-equipped technical infrastructures
● Evaluate the tradeoffs for extending the Data Capsule system to allow user Capsules to utilize high-performance compute resources inside or external to an institution, and run large analysis tasks.
● Evaluate the different models for Workset use in the Capsule for different use and collection needs.
2.2 Specific activities
Element 1: Assessment
Work with partners to map out collection specifics and the contexts of their use; prioritize needs in co-design and implementation; organize events to bring participants together as a community. Employ parallel theoretical reflection and continuous exchange of knowledge.
Tasks
• The research team interviews partners to gather information about collections and the context of their uses, and identifies collection-specific characteristics as well as work practices that may impact development and implementation of DC. Access restrictions, storage, security, and analytical needs, as well as the relationships between collection users, stewards, and technical support, will be included. User needs as seen by librarians or taken from previous feedback of actual users (e.g., types of data analysis, tools used) will also be identified.
• Examine policies and other factors that affect the use of restricted data and DC. Collect and analyze documents that govern access and use of the restricted collections.
• Organize community-building events, possibly co-located with regional HTRC UnCamp events, to increase participation; organize regular information-sharing sessions.
Outcomes
• Effective coordination, sharing, and networking with all partners
• Taxonomic knowledge about restricted collections and their policies and contexts of use
• Emerging sense of community
• Community-building meetings
Element 2: Partner Engagement
Engage the technical team, Level 1 testing partners, and Level 2 partners in close cooperation. Level 1 testing partners each have an installation of Data Capsule on an experimental set of machines of their choice.
Tasks
• Technical team and Level 1 testing partners engage in mutual exchange about collection constraints, infrastructure constraints, technology options, and solutions for prototype demonstrations with partner collections. Carry out continuous installation, evaluation, and feedback cycles to refine.
• Engage library partners in Participatory Design. Participatory activities and evaluation of the appliance will include a demo of the Data Capsule prototype and Workset reflecting co-designed functionality; installation of Data Capsule at Level 1 partners; continuous install of extensions at Level 1 partners; and evaluation of improvements for all partners.
• Visit workplaces of Level 1 and 2 partners for purposes of information exchange, assessment, and learning.
Outcomes
• Shared knowledge and understanding
• Participant-influenced design of technologies
• Better fit of technology to needs
• Lowered barriers to adoption for partners
Element 3: Data Capsule
Extend the existing Data Capsule service to enable intuitive and yet secure computational access to restricted data in libraries. Evaluate extensions through demos, prototyped functionality, and evaluative studies.
Tasks
• Design and develop architecture for packaging Data Capsule as an appliance
• Extend the Data Capsule system’s architecture to
    i) Enforce proper access of restricted and sensitive collections,
    ii) Support access to multiple collections having diverse formats and types,
    iii) Support the range of use models needed by partners. Implement selective changes in the form of a prototype demo for feedback.
• Design an evaluative study of DC as capable of utilizing high-performance or cloud computing resources to serve institutions with various resources, including less-equipped institutions. Carry out performance experiments to evaluate different design tradeoffs
Outcomes
• Extended codebase of Data Capsule packaged as an appliance with support for new collection types and use cases. Codebase released with appropriate user and developer documentation.
• Published proof-of-concept study of how Data Capsule can be scaled to use large-scale compute resources at an institution or at a cloud provider such as Amazon Web Services
• Published study of design tradeoffs in enhancements to support new use cases and access modes to restricted and sensitive collections
Element 4: Workset
Evaluate Worksets within the context of the project’s new uses and users to improve the utility and impact of Worksets in the scholarly research process.
Tasks
• Participate in assessment and participatory activities to gather information about the applicability of the current Workset model to specific collections.
• Design and carry out a study that evaluates tradeoffs of extending the Workset model to accommodate the new uses of Data Capsules for computational access to restricted and sensitive collections.
• Bring Workset to a state in which it can participate in demos showcasing new Data Capsule functionality
• Actively engage library partners in exploring how best to educate users on optimal practices for Workset use and reuse.
Outcomes
• Educational materials for a researcher’s best utilization of the Workset notion in the distant analysis that this project enables
• Publishable study of design tradeoffs for extending Workset to additional collections and uses
2.3 Project management
The project will be led by Beth A. Plale, with direct oversight and responsibility for project success. Dr. Kouper and Robert McDonald will serve as co-Directors. The leadership team, including J. Stephen Downie at University of Illinois, will meet weekly, and be joined once a month by the Level 1 Testing Library partners. Decision making is carried out through consensus building, with the final decision resting with the Project Director (PD).
Dr. Plale also brings technical expertise, and in this capacity will work closely with Dr. Yu (Marie) Ma, Dev/Ops manager of the HathiTrust Research Center, to ensure that the technical staff members are tasked appropriately for the project needs and timelines. Dr. Inna Kouper will lead the project assessment and community-building activities, using Participatory Design methods and carried out in collaboration with partner libraries. Robert H. McDonald will coordinate the partner libraries. Level 1 partner libraries will supervise prototyping and testing of digital collections. J. Stephen Downie will coordinate expertise on the Workset.
Bi-weekly videoconferencing meetings carried out for community building will be held using the Zoom.us conferencing system that IU provides free to its research groups. Technical communication with Level 1 (and Level 2, as interested) partners, which tends to be frequent and short during joint efforts, will utilize a Slack.com channel. Stakeholder interactions will be via regular teleconferences and phone calls. User studies will be conducted online using screen-sharing and recording tools such as Zoom, in addition to in-person visits.
Issues raised by library partners needing immediate attention of the Data Capsule and Workset technical team can utilize the HathiTrust Research Center service desk, built on the Atlassian Jira Service Desk and bug tracking system. Software development and project management computers, grants management staff, and office space needed for the effort at Indiana University are provided by the Data To Insight Center. The other funded universities will provide similar resources needed for accomplishing tasks. We will utilize computer resources such as Amazon Web Services as needed for testing.
As this is a research grant, evaluation and performance measurements are built into the outcomes. That is, published results are amongst the planned outcomes. The findings from assessment and Participatory Design will be shared and discussed with developer and librarian teams during regular meetings. Ongoing feedback will be incorporated into the findings.
2.4 Project dissemination and sustainability
Recommendations from this project can be adopted in diverse library settings; the surveys and community-building efforts can bring together many stakeholders in data, including researchers, librarians, university administrators, and funding agencies. Results of the project will be disseminated through multiple professional, academic, and social media channels.
Community building is a key part of the project. Community-building user meetings from this project will be considered for inclusion in the regular HTRC UnCamps -- hybrid conference-workshop events already a part of HTRC’s community engagement plan. Changes to the Data Capsule codebase undertaken during this project will be committed back to a new project branch of the existing Data Capsule code repository (https://github.com/htrc/HTRC-DataCapsules). As an intended outcome of the Participatory Design framework of this project, library partners, especially Level 1 partners, will be actively contributing to the code branch by the end of the project. This will create a broader community around the codebase, thus giving a strong foundation for its sustainability. The changes to the Data Capsules system, including the Workset, are anticipated to also benefit the instance running in the HathiTrust Research Center, creating another pillar in the foundation of sustainability for the framework.
3. National Impact
The proposed project will have national impact through i) provision of a portable solution for accessing restricted and sensitive collections, ii) fostering a community and increased collaboration around the technical, organizational, and policy challenges of providing computational access to restricted collections, and iii) amplifying project outcomes through the connection to the HathiTrust Consortium and its hundreds of member libraries. Our portable solution, once in shareable form, can be reused by other libraries around the country, where experts can improve the code and documentation as well as digital curation activities, and work with their users to develop new requirements and materials to use restricted digital collections in research and teaching. An emerging community will become part of the larger HathiTrust community and will continue stimulating libraries and research and non-profit organizations to join forces in further development and mutual learning and support. A strong sense of contribution and collaboration around community-sustained software will help to have a long-lasting impact.
Addressed needs: Through its development and participatory activities, this project will broaden access to digital collections that exist in libraries, including papers, letters, video materials, and many others. It will not only establish a community dedicated to working on solutions for restricted collections, but also develop a strong foundation for motivating and engaging future generations of library experts in developing innovative software and services. Project outcomes will address the library needs of providing scalable tools for working with digital collections, while respecting privacy, copyright, and confidentiality restrictions, and contribute to building the National Digital Platform as a distributed set of software applications and professional expertise that provide library content and services to all users in the US [24].
In addition to providing a strong prototype, we will help train librarians and professionals involved in developing technology via support from and collaborations with our technical team and via targeted community events. We will support communities of practice and strengthen libraries as partners in addressing the research and scholarship needs of computational research.
Resulting products: This project will result in the tangible products of extensions to the existing codebase for Data Capsule, guidelines and educational materials, and publications. The intangible product is community buy-in towards adoption and community involvement in ongoing contributions to the DC codebase. The tangible products enable proliferation of experience and facts beyond the immediate library partners to increased adoption. Publications, for instance, are a tangible outcome that facilitates trust in technology and human work. Research is grounding for assessments of use.
Sustaining the benefit: The sustainability of the benefits of the proposed activity extends well beyond the period of funding. It is an important point that this activity will vault an existing and successful service into broader use through study and extension, and will do so in a way that builds its adopters (libraries) into the process, thus growing the sustaining community through the grant duration.
Growing adopters and a sustaining community around the software codebase can take time, likely more time than the short grant duration. This risk is mitigated because the service itself is grounded in the HathiTrust Research Center, which stands behind the Data Capsule service as its primary service for computational analysis on the nearly 15 million volumes of the HathiTrust Digital Library. HTRC deeply welcomes this initiative to involve more partners. As an expected outcome of this project is to have partners outside the HTRC technical team making contributions to the codebase, the HTRC commits to incorporating those changes back into the main branch of the Data Capsule codebase and to using the extensions in future releases of Data Capsule for its own and broader use.
Schedule of Completion
Timeline columns (quarters): 2017: Apr-Jun, Jul-Sep, Oct-Dec; 2018: Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec; 2019: Jan-Mar, Apr-Jun
Award: May 2017
Task/element I: Assessment
• Preparation for assessment
• Assessment of collections, policies, and contexts of use
• Preparation for community-building events
• Community-building events
• Carry out publishable analyses of collected assessment and participatory design data
• Support stakeholder/community interactions
• Conduct online user studies
• Publish training materials
• Publish results
Task/element II: Partner engagement and evaluation
• Plan DC install
• First install in test environment
• Partner campus visits
• Guided hands-on experience and cross-institution learning
• Co-design and evaluation of appliance
• Demo DC and Workset reflecting participatory design functionality
• Continuous install, evaluation of improvements
• Integrate project developments into DC codebase and release
Task/element III: Data Capsule development
• Design for appliance architecture
• Development: code changes to package as appliance
• Using feedback from assessment, refine design plans
• Carry out publishable study that evaluates different design tradeoffs
• Design evaluative study for DC as thin client to HPC resources
• Carry out development study of DC as thin client
• Evaluate and integrate changes in main DC branch
• Publish results
• Develop and release user and developer guides
Task/element IV: Workset study and development
• Develop study of Workset in this setting
• Conduct study of Workset
• Using feedback from assessment, refine design plans
• Carry out publishable study that evaluates different design tradeoffs
• Evaluate and integrate changes in main Workset/Workset builder branch
• Publish results
OMB Control #: 3137-0092, Expiration Date: 7/31/2018 IMLS-CLR-F-0032 Digital product, pp. 1 of 6
DIGITAL PRODUCT FORM
Introduction
The Institute of Museum and Library Services (IMLS) is committed to expanding public access to federally funded digital products (i.e., digital content, resources, assets, software, and datasets). The products you create with IMLS funding require careful stewardship to protect and enhance their value, and they should be freely and readily available for use and re-use by libraries, archives, museums, and the public. However, applying these principles to the development and management of digital products can be challenging. Because technology is dynamic and because we do not want to inhibit innovation, we do not want to prescribe set standards and practices that could become quickly outdated. Instead, we ask that you answer questions that address specific aspects of creating and managing digital products. Like all components of your IMLS application, your answers will be used by IMLS staff and by expert peer reviewers to evaluate your application, and they will be important in determining whether your project will be funded.
PART I: Intellectual Property Rights and Permissions
A.1 What will be the intellectual property status of the digital products (content, resources, assets, software, or datasets) you intend to create? Who will hold the copyright(s)? How will you explain property rights and permissions to potential users (for example, by assigning a non-restrictive license such as BSD, GNU, MIT, or Creative Commons to the product)? Explain and justify your licensing selections.
The formal products produced as outcomes of our proposed effort are software, training materials, user and developer documentation, and studies. We also anticipate intermediate products in the form of datasets derived from testing the connections to restricted and sensitive collections. The formal materials and software products resulting from this effort will be licensed using open and free licensing, e.g., Creative Commons and Apache 2.0-style licenses, following the best practice established by the HathiTrust Research Center (HTRC). Intermediate products emerging as a result of testing and experimentation will be discarded by the end of the project life. Operational use of a Data Capsule service at a partner institution is not anticipated over the course of the project. Should it occur, or should HTRC’s operational Data Capsule service be used for training, the data products emerging from end-user use of a Capsule will follow the HTRC policy of not imposing licensing restrictions on the products, assuming that the Data Capsule service the end user is using is fully operational and the data products pass the review process (run by HTRC). If these conditions are not met, the data products are considered intermediate products and will be destroyed by the end of the project life.
A.2 What ownership rights will your organization assert over the new digital products and what conditions will you impose on access and use? Explain and justify any terms of access and conditions of use and detail how you will notify potential users about relevant terms or conditions.
Software products developed in this project will be openly shared and accessible via an open software repository (GitHub). As to access to the Data Capsule service, during the course of the project there will be test instances of Data Capsule service running at the Level 1 library testing partner institutions, and an operational instance running at Indiana University as part of HTRC. We anticipate the test instances of Data Capsule service having no end-user uses during the course of the project as they will be under development. Training will be carried out on the operational HTRC instance of the Data Capsule service.
A.3 If you will create any products that may involve privacy concerns, require obtaining permissions or rights, or raise any cultural sensitivities, describe the issues and how you plan to address them.
As part of this project, we will be conducting interviews and taking notes during ethnographic observations. The data collected via interactions with human subjects will be stored securely and accessed by project investigators only. Such data will be shared only after appropriate anonymization or with explicit consent from participants. Additionally, restricted collections that will be used during testing in computational analysis in Data Capsules may raise copyright, privacy or other concerns. These concerns will be addressed through policy discussions with library partners; these discussions may be guided by HTRC’s policy developed to address similar concerns.
Part II: Projects Creating or Collecting Digital Content, Resources, or Assets
A. Creating or Collecting New Digital Content, Resources, or Assets
A.1 Describe the digital content, resources, or assets you will create or collect, the quantities of each type, and format you will use.
In the course of this project the following digital content will be created:
1. Extensions to Data Capsule service. The extensions will start from the existing HTRC codebase, which is organized in approx. 50 modules. It is expected that modifications will touch 10-20% of the code for partner customization.
2. Enhancements to the Workset model. This resource is an ontology that can be expressed in RDF and/or XML formats. Enhancements will comprise about 10% of the resource.
3. Interview recordings and transcripts and field notes. See Part IV Datasets for more details.
4. Online manuals and training materials. Installation, testing and use of Data Capsule will be documented in online manuals and training materials, which will be openly accessible via the web.
5. Publications and presentations. Findings from the project will be disseminated via journals, conferences, and other venues. PDF documents and slides will be openly shared with the community, unless publishing restrictions apply.
A.2 List the equipment, software, and supplies that you will use to create the content, resources, or assets, or the name of the service provider that will perform the work.
The project activity will be to develop software extensions to existing codebases and conduct human-computer interaction studies. Activity does not extend to the creation of digital collections. We intend to use computers at Indiana University, University of Illinois, University of Virginia, UC Berkeley, and UCLA for testing and development. We expect Level 1 partners to have test servers available on which we will install the software (Data Capsule).
A.3 List all the digital file formats (e.g., XML, TIFF, MPEG) you plan to use, along with the relevant information about the appropriate quality standards (e.g., resolution, sampling rate, or pixel dimensions).
Software will exist in development formats, predominantly Java files, Python scripts, and XML configuration files. Partner libraries who will use the operational Data Capsule service at HTRC for analyzing their restricted collections may have derived products in other formats that are appropriate in their respective user disciplines, such as tabular files or images. Quality standards for those derived products as well as quality challenges will be discussed during participatory design activities. Software quality will be monitored and evaluated by using "fitness for purpose" and structural analysis techniques.
B. Workflow and Asset Maintenance/Preservation
B.1 Describe your quality control plan (i.e., how you will monitor and evaluate your workflow and products).
For details on software quality control, see Part III.
The assessment will be carried out by a PhD research faculty member who is highly trained in carrying out quality processes. Dr. Kouper has a strong record of publication-quality research in this area. Software development will use HTRC’s software development processes, including oversight by a DevOps Manager, help desk, and bug tracking. Studies of Data Capsule and Workset will be under the supervision of Plale and Downie, both full professors and accomplished scholars in this type of work.
B.2 Describe your plan for preserving and maintaining digital assets during and after the award period of performance. Your plan may address storage systems, shared repositories, technical documentation, migration planning, and commitment of organizational funding for these purposes. Please note: You may charge the federal award before closeout for the costs of publication or sharing of research results if the costs are not incurred during the period of performance of the federal award (see 2 C.F.R. § 200.461).
Software products will be shared, preserved and maintained using the open software repository GitHub. Technical documentation will be stored on GitHub as well as on the open HTRC wiki pages. We will encourage the HathiTrust community and the emerging Data Capsule community to further contribute to curation and preservation of the software. Products of research (publications, datasets, and presentations) will be preserved in the Indiana University institutional repository IUScholarworks, which will serve as an additional preservation layer to traditional publication venues.
C. Metadata
C.1 Describe how you will produce any and all technical, descriptive, administrative, or preservation metadata. Specify which standards you will use for the metadata structure (e.g., MARC, Dublin Core, Encoded Archival Description, PBCore, PREMIS) and metadata content (e.g., thesauri).
README files and user and developer guides are the forms of documentation used to preserve software metadata. For datasets we will use Dublin Core to record descriptive, administrative, and preservation metadata.
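As an illustration of what a Dublin Core description of a dataset might look like, the following minimal sketch builds a small record with Python's standard library. The wrapper element name `metadata`, the helper `dublin_core_record`, and all field values are hypothetical and chosen only for the example; the project does not specify a serialization.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <metadata> element holding one dc:* child per (term, value) pair."""
    root = ET.Element("metadata")  # wrapper name is an arbitrary choice
    for term, value in fields:
        ET.SubElement(root, f"{{{DC_NS}}}{term}").text = value
    return root


# Hypothetical values, for illustration only.
record = dublin_core_record([
    ("title", "Interview transcripts (de-identified)"),
    ("type", "Dataset"),
    ("format", "text/plain"),
    ("rights", "CC BY 4.0"),
])
print(ET.tostring(record, encoding="unicode"))
```

Keeping the record as a list of (term, value) pairs allows repeated terms, which Dublin Core permits.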
C.2 Explain your strategy for preserving and maintaining metadata created or collected during and after the award period of performance.
Metadata will be maintained as part of the software and data maintenance, i.e., it will be stored and migrated along with the digital products.
C.3 Explain what metadata sharing and/or other strategies you will use to facilitate widespread discovery and use of the digital content, resources, or assets created during your project (e.g., an API [Application Programming Interface], contributions to a digital platform, or other ways you might enable batch queries and retrieval of metadata).
As the project is not concerned with creating a digital collection, we will rely on other larger resources for widespread discovery and use, including HathiTrust Research Center networks, academic publishing databases, and software and institutional repositories.
D. Access and Use
D.1 Describe how you will make the digital content, resources, or assets available to the public. Include details such as the delivery strategy (e.g., openly available online, available to specified audiences) and underlying hardware/software platforms and infrastructure (e.g., specific digital repository software or leased services, accessibility via standard web browsers, requirements for special software tools in order to use the content).
Software and study products will be openly available online, unless the latter are restricted by the publishers.
D.2 Provide the name(s) and URL(s) (Uniform Resource Locator) for any examples of previous digital content, resources, or assets your organization has created.
The Data to Insight Center has its own group repository on GitHub where all software products are made available to the public: https://github.com/Data-to-Insight-Center
Most recent examples include Data MatchMaker (https://github.com/Data-to-Insight-Center/Data-MatchMaker) and PRAGMA Data (https://github.com/Data-to-Insight-Center/PRAGMA-Data-Repository).
Additionally, D2I contributions to HTRC code are made available via a separate repository, https://github.com/htrc, where the existing Data Capsule codebase can be found: https://github.com/htrc/HTRC-DataCapsules
Part III. Projects Developing Software
A. General Information
A.1 Describe the software you intend to create, including a summary of the major functions it will perform and the intended primary audience(s) it will serve.
To accomplish the goals of this project, we will extend the Data Capsules service codebase. HTRC Data Capsule works by giving a researcher a virtual machine (VM) that runs within the HTRC domain. The researcher can configure the VM as they would their own desktop with their own tools. After they are done, the VM switches into a “secure mode”, where network and other data channels are restricted in exchange for access to the data being protected. Currently, Data Capsule works only with the HathiTrust Digital Library and within HTRC architecture. We will generalize the architecture to work with other collections and evaluate design, secure access and scalability options to work in specific library environments.
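The two-phase workflow described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the HTRC implementation: the class `Capsule`, the mode name `MAINTENANCE`, and the policy fields are hypothetical and only mirror the prose (an open configuration phase, followed by a secure mode that trades network access for access to protected data).

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MAINTENANCE = "maintenance"  # hypothetical name for the configuration phase
    SECURE = "secure"            # the "secure mode" described in the text


@dataclass(frozen=True)
class ChannelPolicy:
    network_allowed: bool
    protected_data_allowed: bool


# Assumed policy table: each mode permits exactly one of the two channels.
POLICIES = {
    Mode.MAINTENANCE: ChannelPolicy(network_allowed=True, protected_data_allowed=False),
    Mode.SECURE: ChannelPolicy(network_allowed=False, protected_data_allowed=True),
}


class Capsule:
    """Toy model of a capsule VM; it tracks the mode only, no real VM is started."""

    def __init__(self) -> None:
        self.mode = Mode.MAINTENANCE

    @property
    def policy(self) -> ChannelPolicy:
        return POLICIES[self.mode]

    def enter_secure_mode(self) -> None:
        self.mode = Mode.SECURE

    def read_protected(self, volume_id: str) -> str:
        if not self.policy.protected_data_allowed:
            raise PermissionError("protected data is only readable in secure mode")
        return f"contents of {volume_id}"  # placeholder for a real data fetch


capsule = Capsule()
capsule.enter_secure_mode()
print(capsule.policy)
```

Expressing the restrictions as a per-mode policy table keeps the trade-off explicit: generalizing to another collection would mean changing what `read_protected` fetches, not the mode logic.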
A.2 List other existing software that wholly or partially performs the same functions, and explain how the software you intend to create is different, and justify why those differences are significant and necessary.
Comparable conceptual frameworks that intend to perform similar functions include Data Enclaves and Storage Capsules. Data Enclaves rely on customized virtualization software and a pre-defined set of tools to enable access. To the best of our knowledge, no working software exists that addresses the need to perform computational analysis on documents and resources using a researcher-defined set of tools. As the need for computational research on restricted collections using a large variety of tools grows, the development of such software is both significant and necessary.
B. Technical Information
B.1 List the programming languages, platforms, software, or other applications you will use to create your software and explain why you chose them.
Data Capsule software is written in Java, Python, and shell scripts.
B.2 Describe how the software you intend to create will extend or interoperate with relevant existing software.
The software extends the Data Capsule service.
B.3 Describe any underlying additional software or system dependencies necessary to run the software you intend to create.
Data Capsule uses open source virtualization infrastructure (QEMU and KVM), which needs to be installed for the capsule to work.
The MySQL relational database system is used to store capsule metadata and results.
Data Capsule is provided for the Ubuntu (Linux) environment.
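A partner site preparing a test server could check for these prerequisites before installing. The sketch below is one possible pre-flight check and is not part of the Data Capsule codebase: the function name `check_dependencies` is hypothetical, and it assumes an x86-64 host where QEMU is installed as `qemu-system-x86_64`, KVM is exposed as `/dev/kvm`, and the `mysql` client binary stands in for a MySQL installation.

```python
import os
import shutil


def check_dependencies(which=shutil.which, exists=os.path.exists):
    """Return a dict mapping each assumed prerequisite to True or False.

    The lookup functions are parameters so the check can be exercised
    without touching the real system.
    """
    return {
        # QEMU system emulator binary (assumed x86-64 host)
        "qemu": which("qemu-system-x86_64") is not None,
        # KVM exposes this device node when the kernel module is loaded
        "kvm": exists("/dev/kvm"),
        # MySQL command-line client, used here as a proxy for a MySQL install
        "mysql": which("mysql") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```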
B.4 Describe the processes you will use for development, documentation, and for maintaining and updating documentation for users of the software.
The code will be forked in the GitHub repository, creating a new branch. Contributing developers will write code in their own environments and then commit it back to GitHub. We will use HTRC documentation and bug-tracking services (Atlassian Confluence and Jira) for maintaining and updating documentation for users of the software. At the end of the project, online manuals will also be written.
B.5 Provide the name(s) and URL(s) for examples of any previous software your organization has created.
The Data to Insight Center has its own group repository on GitHub where all software products are made available to the public: https://github.com/Data-to-Insight-Center
Most recent examples include Data MatchMaker (https://github.com/Data-to-Insight-Center/Data-MatchMaker) and PRAGMA Data (https://github.com/Data-to-Insight-Center/PRAGMA-Data-Repository).
Additionally, D2I contributions to HTRC code are made available via a separate repository, https://github.com/htrc, where the existing Data Capsule codebase can be found: https://github.com/htrc/HTRC-DataCapsules
C. Access and Use
C.1 We expect applicants seeking federal funds for software to develop and release these products under open-source licenses to maximize access and promote reuse. What ownership rights will your organization assert over the software you intend to create, and what conditions will you impose on its access and use? Identify and explain the license under which you will release source code for the software you develop (e.g., BSD, GNU, or MIT software licenses). Explain and justify any prohibitive terms or conditions of use or access and detail how you will notify potential users about relevant terms and conditions.
We will use the Apache 2.0 license to release Data Capsule. The license allows users to reproduce and distribute copies of the software and its derivatives with or without modifications. The license is applied by adding its text to the header of a software file (see https://www.apache.org/licenses/LICENSE-2.0 for a copy of the license).
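Adding the notice to file headers is easy to automate. The following is a minimal sketch, assuming `#`-style line comments; the function name `with_license_header` and the year and owner values are hypothetical placeholders. The notice text is the standard boilerplate from the appendix of the Apache License 2.0.

```python
APACHE_NOTICE = """\
Copyright {year} {owner}

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""


def with_license_header(source: str, year: str, owner: str, comment: str = "#") -> str:
    """Prepend the notice as line comments unless it is already present."""
    if "Licensed under the Apache License, Version 2.0" in source:
        return source  # already licensed; leave the file unchanged
    notice = APACHE_NOTICE.format(year=year, owner=owner)
    # Prefix every line with the comment marker; blank lines become a bare marker.
    lines = [f"{comment} {line}".rstrip() for line in notice.splitlines()]
    return "\n".join(lines) + "\n\n" + source


# Placeholder year and owner, for illustration only.
print(with_license_header("print('hello')\n", "2017", "Example Owner"))
```

The sketch does not handle files that begin with a shebang line or use block comments (as Java sources typically do); those cases would need a different insertion point or comment style.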
C.2 Describe how you will make the software and source code available to the public and/or its intended users.
The source code extensions to the Data Capsule will be made available via GitHub (https://github.com/htrc) as a separate branch of the primary branch.
C.3 Identify where you will deposit the source code for the software you intend to develop:
Name of publicly accessible source code repository: GitHub
URL: https://github.com/htrc
Part IV: Projects Creating Datasets
A.1 Identify the type of data you plan to collect or generate, and the purpose or intended use to which you expect it to be put. Describe the method(s) you will use and the approximate dates or intervals at which you will collect or generate it.
Data will be collected via phone interviews and ethnographic observations, which involve note-taking, recording, and photographs. Phone interviews will be conducted at the beginning of the project. Follow-up interviews and additional recordings of conversations and note-taking will take place throughout the project as the need to document participant interactions arises.
A.2 Does the proposed data collection or research activity require approval by any internal review panel or institutional review board (IRB)? If so, has the proposed research activity been approved? If not, what is your plan for securing approval?
Data collection involves human subjects and requires IRB approval. An IRB application will be prepared and submitted when/if the project is approved for funding.
A.3 Will you collect any personally identifiable information (PII), confidential information (e.g., trade secrets), or proprietary information? If so, detail the specific steps you will take to protect such information while you prepare the data files for public release (e.g., data anonymization, data suppression PII, or synthetic data).
Participants can be identified in phone interviews, notes, and recordings. Personally identifiable information will be stored securely and only the PI and co-PIs will have access to it. Before public release of the dataset all PII will be removed (participants will be assigned coded numbers and any information that may identify them individually will be obscured in the interviews, notes, and transcripts).
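The coded-number step can be sketched as follows. This is a minimal illustration, not the project's procedure: the helper names, the `P01`-style code format, and the participant names are hypothetical, and only names that appear in the supplied list are replaced. Indirect identifiers (job titles, institutions, distinctive phrasing) would still require manual review.

```python
import re


def assign_codes(names):
    """Map each participant name to a coded number such as 'P01' (format is an assumption)."""
    return {name: f"P{i:02d}" for i, name in enumerate(sorted(set(names)), start=1)}


def pseudonymize(text, codes):
    """Replace every whole-word occurrence of a known name with its code."""
    # Longer names first, so a full name is replaced before any shorter name it contains.
    for name in sorted(codes, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", codes[name], text, flags=re.IGNORECASE)
    return text


# Fictional participants, for illustration only.
codes = assign_codes(["Maria Lopez", "John Smith"])
print(pseudonymize("John Smith asked Maria Lopez about access.", codes))
```

The `codes` mapping is itself identifying information, so it would need to be stored securely and separately from the released transcripts.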
A.4 If you will collect additional documentation, such as consent agreements, along with the data, describe plans for preserving the documentation and ensuring that its relationship to the collected data is maintained.
Participants will be provided with informed consent forms, which they will sign. The forms will be stored securely and separately and the relationship to the collected data will be maintained via a study ID that will be recorded in the informed consent forms and in the data files.
A.5 What methods will you use to collect or generate the data? Provide details about any technical requirements or dependencies that would be necessary for understanding, retrieving, displaying, or processing the dataset(s).
The data will be collected via interviews and observations and will consist of text files, audio and video files, and photographs. Common word processing software and multimedia players may be used to display the data. Processed data may consist of additional spreadsheets and visualizations, which will be stored in non-proprietary formats (e.g., CSV or PNG).
A.6 What documentation (e.g., data documentation, codebooks) will you capture or create along with the dataset(s)? Where will the documentation be stored and in what format(s)? How will you permanently associate and manage the documentation with the dataset(s) it describes?
Codebooks will be created as part of the analysis of qualitative data (e.g., in the thematic coding procedures, codes will be developed in an inductive manner, after close iterative reading of the interviews). Codes, their descriptions, and other documentation that describes when and where the interviews and observations took place will be stored in text formats along with the data. The documentation will be associated with the datasets through consistent file naming and through identifiers that refer to each data collection effort separately.
A.7 What is your plan for archiving, managing, and disseminating data after the completion of the award-funded project?
The data will be managed and archived using Scholarly Data Archive (backed-up storage for long-term archiving) and institutional Google Drive at Indiana University (for active work with data). Folders with appropriate permissions for data, processing scripts, IRB documentation, and publications will be created. For dissemination, we will use IUScholarworks repository and one of the publicly available repositories, such as Figshare or Mendeley.
A.8 Identify where you will deposit the dataset(s):
Name of repository: IUScholarworks; Figshare; Mendeley Data
URL: scholarworks.iu.edu/dspace/; figshare.com; data.mendeley.com
A.9 When and how frequently will you review this data management plan? How will the implementation be monitored?
PIs will monitor the implementation of this data management plan. The plan will be reviewed every 6 months and adjusted according to the amounts and types of data generated.