Open Access Indicator for 2015
Part 2
Technical Description of Data Foundation, Processes and Output
0 Preface
1 Introduction and Main Processes
2 Process 1: Collection of the Data
2.1 The Universities' Publication Data
2.1.1 Requirements on Universities – Metadata Format and Method of Collection
2.1.2 This Year's Universities and Their Research Databases
2.2 Authority and Auxiliary Data
2.2.1 Directory of Open Access Journals (DOAJ)
2.2.2 Sherpa/Romeo (Sh/Ro)
2.2.3 The Danish Bibliometric Research Indicator (BFI)
2.2.4 Authority List: Accepted External Repositories ("The Whitelist")
2.2.5 Authority List: Journals with Extended Embargo ("The Blacklist")
2.3 This Year's Complete Data Collection
3 Process 2: Defining the Set of In-Scope Publications
3.1 The Set of Scoped Records Including Duplicates
3.2 The Set of Scoped Records Excluding Duplicates
3.3 This Year's Sets of Scoped Records
4 Process 3: Calculation of OA Realization and Potential
4.1 Open Access Classification – University Level
4.1.1 Checking for Golden Open Access Potential
4.1.2 Checking for Green Open Access Potential
4.1.3 Checking for Unused & Unclear Potential
4.1.4 Checking Open Access Potential – Combined
4.2 Open Access Classification – National and Main Research Area Level
5 Process 4: Quality Assurance
6 Process 5: Output
6.1 Data Reports for Download
6.2 Web Dissemination via The Danish Research Database
7 Appendix A: The Fulltext Download Subprocess
Revision 3 of 11 April 2017
0 Preface

The National Steering Group for Open Access [1] has proposed that the Danish Agency for Science, Technology and Innovation and Denmark's Electronic Research Library develop a Danish Open Access Indicator. The intention is to support the implementation of the national Open Access strategy [2], cf. the strategy's statement on monitoring: "The implementation of Open Access is to be monitored on an ongoing basis to ensure that all parties make a maximum effort to develop and disseminate free accessibility to Danish research findings."

The Open Access Indicator is calculated once per year with the following target field: scientific, peer-reviewed articles and conference contributions in journals and proceedings with an ISSN.

In the context of Horizon 2020 [3], the EU requires that Open Access be established within at most 6 months after publication for the areas of science, technology and health, and within at most 12 months for the social sciences and humanities. These delays are allowed because many journals maintain so-called embargo periods, during which researchers may not establish Open Access to the articles. As the OA Indicator is calculated once annually for all publications within its target field, it is designed to accept a one-year delay in Open Access to the publications. Consequently, the OA Indicator for 2015 is calculated in early March 2017, in order to accommodate a full-year embargo period also for publications from December 2015. In practice, this means that publications from January 2015 could have embargo periods of up to 24 months and still be credited by the OA Indicator.

The description of the Open Access Indicator is organised in two parts:
• Part 1: Overview of data foundation, processes and output
• Part 2: Technical description of data foundation, processes and output

Note: In Part 2, the technical description, the notion of the indicator's "target field" is expressed using the term "set of scoped records".

Queries regarding the indicator may be directed to:

Adam Baden / Hanne-Louise Kirkegaard
Danish Agency for Science and Higher Education
Ministry of Higher Education and Science
Bredgade 40
DK-1260 København K
Email: [email protected] / [email protected]
[1] http://ufm.dk/en/research-and-innovation/cooperation-between-research-and-innovation/open-access
[2] http://ufm.dk/en/research-and-innovation/cooperation-between-research-and-innovation/open-access/Publications/denmarks-national-strategy-for-open-access
[3] https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
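1 Introduction and Main Processes

The activities of the OA Indicator can be broken down into these five main processes:

• Process 1: Collection of the data
• Process 2: Defining the set of in-scope publications
• Process 3: Calculation of OA realisation and potential
• Process 4: Quality assurance
• Process 5: Output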
The five main processes are described in further detail in the sections below. This description of the Open Access Indicator is aimed at a technically inclined audience and describes in depth how the Indicator works, both overall and in detail. It assumes that the reader is familiar with basic XML [4] and with the basic parts of the XPath [5] notation for referring to elements of an XML document conforming to a given XML Schema. It also assumes that the reader is familiar with visualising processes as workflow diagrams [6].

[4] https://www.w3.org/TR/xml/
[5] https://www.w3.org/TR/xpath-30/
[6] https://en.wikipedia.org/wiki/Flowchart
2 Process 1: Collection of the Data

The first activity in the OA Indicator is the collection of the complete data foundation used by the indicator. This includes importing data from six national and international sources. The data foundation is composed of metadata describing the publications of the universities, as well as authority and auxiliary data.

2.1 The Universities' Publication Data

Metadata describing the publications of the universities are used to establish the set of publications in scope of the OA Indicator. These metadata are collected for the OA Indicator once annually, directly from the universities, using a nationally agreed XML-based exchange format and a nationally agreed exchange protocol. For fulltexts registered in the collected publication metadata, a download is attempted.
2.1.1 Requirements on Universities – Metadata Format and Method of Collection

A university can be included in the OA Indicator if it meets the following minimum requirements:

• Publications published by researchers employed at the university are collected in a university research database containing publication data, person data, project data etc. of that particular university only.
• This research database must expose its publication data using OAI-PMH (http://www.openarchives.org/OAI/openarchivesprotocol.html).
• The research database must support OAI-PMH selective harvesting using Sets, characterised by their setSpec (code), so that only parts of the database are harvested.
• A dedicated OAI-PMH Set exposing all publication data held in the research database must exist.
• For this dedicated set, the OAI-PMH metadataPrefix "ddf_mxd" must be supported.
• When an OAI-PMH client harvests this dedicated set using metadataPrefix "ddf_mxd", the metadata records must be valid DDF-MXD (http://mx.forskningsdatabasen.dk/mxd/).
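The requirements above amount to a standard OAI-PMH selective harvest. As a minimal sketch using only the Python standard library, the hypothetical helper `list_records_url` below builds the ListRecords request for a dedicated set. The argument names `verb`, `metadataPrefix`, `set` and `resumptionToken` come from the OAI-PMH 2.0 specification, where `resumptionToken` is an exclusive argument sent on follow-up requests; how the OA Indicator's own harvester is built is not described here.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     set_spec: Optional[str] = None,
                     metadata_prefix: str = "ddf_mxd",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH 2.0 ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # Follow-up request: resumptionToken is exclusive, so only verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        # Initial request: selective harvest of one set in DDF-MXD format.
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Initial request for AAU's dedicated set; ':' in the setSpec is percent-encoded.
print(list_records_url("http://vbn.aau.dk/ws/oai", "publications:all"))
# → http://vbn.aau.dk/ws/oai?verb=ListRecords&metadataPrefix=ddf_mxd&set=publications%3Aall
```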
2.1.2 This Year's Universities and Their Research Databases

The following 8 universities, and their associated research databases, are included in the OA Indicator for 2015:

| University | Research database (OAI-PMH server) | OAI-PMH setSpec |
|---|---|---|
| AAU | http://vbn.aau.dk/ws/oai | publications:all |
| AU | https://pure.au.dk/ws/oai | publications:all |
| CBS | http://research.cbs.dk/ws/oai | publications:all |
| DTU | http://orbit.dtu.dk/ws/oai | publications:all |
| ITU | https://pure.itu.dk/ws/oai | publications:all |
| KU | http://curis.ku.dk/ws/oai | publications:all |
| RUC | http://rucforsk.ruc.dk/ws/oai | publications:all |
| SDU | http://heinz.sdu.dk:8080/ws/oai | publications:all |
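Each harvest returns its records in pages. A minimal sketch, assuming standard OAI-PMH 2.0 responses, of how one page could be parsed: the hypothetical function `parse_list_records` returns the `<record>` elements and the resumptionToken, where an empty or missing token marks the last page. The `ENDPOINTS` mapping simply restates the table of universities and servers.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Servers and setSpec as listed in the table of universities.
ENDPOINTS = {
    "AAU": "http://vbn.aau.dk/ws/oai",
    "AU": "https://pure.au.dk/ws/oai",
    "CBS": "http://research.cbs.dk/ws/oai",
    "DTU": "http://orbit.dtu.dk/ws/oai",
    "ITU": "https://pure.itu.dk/ws/oai",
    "KU": "http://curis.ku.dk/ws/oai",
    "RUC": "http://rucforsk.ruc.dk/ws/oai",
    "SDU": "http://heinz.sdu.dk:8080/ws/oai",
}
SET_SPEC = "publications:all"

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def parse_list_records(xml_text: str) -> Tuple[List[ET.Element], Optional[str]]:
    """Return the <record> elements of one ListRecords page and its resumptionToken, if any."""
    root = ET.fromstring(xml_text)
    records = root.findall(f".//{OAI_NS}record")
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    # An empty or missing resumptionToken means the list is complete.
    token = (token_el.text or "").strip() if token_el is not None else ""
    return records, token or None
```

A harvester would keep requesting pages with the returned token until `None` comes back.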
2.2 Authority and Auxiliary Data

Authority and auxiliary data are collected for the OA Indicator from various sources. For each of these sources, collection is done once annually. Collection method and data format vary across sources.

2.2.1 Directory of Open Access Journals (DOAJ)

DOAJ is used by the OA Indicator as an authoritative list of Golden Open Access journals. Parameters of the data collection:

• Protocol: OAI-PMH (server http://www.doaj.org/oai/)
• metadataPrefix: oai_dc
• Data format: Dublin Core (http://dublincore.org/documents/dces/)

2.2.2 Sherpa/Romeo (Sh/Ro)

Sh/Ro is used by the OA Indicator to determine journals' policies for Green Open Access, and thereby the Open Access potential of individual journal articles. Parameters of the data collection:

• Protocol: HTTP (GET from http://www.sherpa.ac.uk/downloads/)
• Data format: Proprietary XML-based format (http://sherpa.ac.uk/news/2012-10-08-RoMEO-API-News.html)
2.2.3 The Danish Bibliometric Research Indicator (BFI)

Data from BFI are used by the OA Indicator for three purposes:

• To identify duplicate publication data across universities (these exist for collaborative publications with co-authors employed at different universities, which are therefore registered in multiple research databases)
• To resolve potential conflicts with respect to the Main Research Areas registered in the publication metadata
• To ensure that articles published in DOAJ-validated journals can be considered scientific and peer-reviewed (BFI level 1 or 2)

Parameters of the data collection:

• Protocol: HTTPS (GET from https://bfi.fi.dk/AnnualReport)
• Format: Compressed Excel spreadsheet – undocumented template
2.2.4 Authority List: Accepted External Repositories ("The Whitelist")

For fulltexts deposited in external repositories, the OA Indicator uses this authority list so that only fulltexts deposited in accepted external repositories can demonstrate Realised Open Access Potential.

• Protocol: Mail (from the authority list maintainers)
• Format: Excel spreadsheet – undocumented template

2.2.5 Authority List: Journals with Extended Embargo ("The Blacklist")

The OA Indicator uses this authority list to reclassify journals registered on the list from Unused to Unclear Open Access Potential.

• Protocol: Mail (from the authority list maintainers)
• Format: Excel spreadsheet – undocumented template
2.3 This Year's Complete Data Collection

Summary of the data collection for the OA Indicator for 2015:

| Source | Protocol | Protocol ver. | Format | Format ver. | Collection date | Records |
|---|---|---|---|---|---|---|
| AAU | OAI-PMH | 2.0 | DDF-MXD | 1.3.0 | 6/3-2017 | 7248* |
| AU | OAI-PMH | 2.0 | DDF-MXD | 1.3.0 | 6/3-2017 | 13221* |
| CBS | OAI-PMH | 2.0 | DDF-MXD | 1.3.0 | 6/3-2017 | 2118* |
| DTU | OAI-PMH | 2.0 | DDF-MXD | 1.3.0 | 6/3-2017 | 7740* |
| ITU | OAI-PMH | 2.0 | DDF-MXD | 1.3.0 | 6/3-2017 | 280* |
| KU | OAI-PMH | 2.0 | DDF-MXD | 1.3.0 | 6/3-2017 | 13845* |
| RUC | OAI-PMH | 2.0 | DDF-MXD | 1.3.0 | 6/3-2017 | 1550* |
| SDU | OAI-PMH | 2.0 | DDF-MXD | 1.3.0 | 6/3-2017 | 7327* |
| DOAJ | OAI-PMH | 2.0 | DC | - | 6/3-2017 | 13515 |
| Sh/Ro | HTTP | - | Proprietary | - | 6/3-2017 | 27032 |
| BFI | HTTPS | - | Proprietary | - | 6/3-2017 | 25044 |
| Whitelist | Mail | - | Proprietary | - | 26/1-2017 | 15 |
| Blacklist | Mail | - | Proprietary | - | 14/12-2016 | 2945 |

* With submission year 2015
3 Process 2: Defining the Set of In-Scope Publications

After the collection of all data for the OA Indicator, a number of activities are initiated in order to isolate the publication records that are in scope for the OA Indicator. Not all publications are in scope, only a subset of the universities' publications.
The scope is defined as:

• Scientific, peer-reviewed articles and conference contributions published in journals or proceedings with an ISSN

Thus, the subset of publication metadata records representing this scope must be isolated from the total set of publication metadata collected. This is done in two ways, in order to facilitate statistics at both the national level and the university level:

• Scoped records including duplicates – for statistics at the university level. For articles written in collaboration across universities, all registrations from all participating universities are kept.
• Scoped records excluding duplicates – for statistics at the national level. For articles written in collaboration across universities, only one registration is kept.
3.1 The Set of Scoped Records Including Duplicates

Each of the requirements in the definition of the scope maps neatly to a corresponding rule regarding DDF-MXD data elements and their content. The set of scoped publication metadata records is therefore the set of records that complies with all the rules, which are described below.

First of all, the set of scoped records must represent records with a given submission year. The initial rule is therefore:

0) The submission year (indberetningsår) must be marked up in the publication metadata record with the given value. Rule applied: Attribute /ddf_doc/@doc_year has the value (year) of the OA Indicator calculation.

Subsequently, the following four rules are applied to all records:

1) The type of the publication must be marked up in the publication metadata record as "Journal Article", "Review Article" or "Conference Contribution" (the same definition of "article" as used by BFI). Rule applied: Attribute /ddf_doc/@doc_type has the value "dja", "djr" or "dcp".
2) The review status of the publication must be marked up in the publication metadata record as "Peer-review" (a similar requirement as for BFI). Rule applied: Attribute /ddf_doc/@doc_review has the value "pr".
3) The scientific level of the publication must be marked up in the publication metadata record as "Scientific" (a similar requirement as for BFI). Rule applied: Attribute /ddf_doc/@doc_level has the value "sci".
4) The publication channel of the publication must be marked up in the publication metadata record with an ISSN. Rule applied: Element /ddf_doc/publication/*/issn has a value.
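Rules 0 to 4 can be expressed as one predicate over a `/ddf_doc` element. The sketch below illustrates the rules; it is not the Indicator's code. It assumes the `doc_*` attributes are unqualified and compares element names by local name, so it works whether or not the DDF-MXD namespace is present. The child element name `in_journal` in the usage lines is only an illustrative stand-in for the `*` step.

```python
import xml.etree.ElementTree as ET

IN_SCOPE_TYPES = {"dja", "djr", "dcp"}  # rule 1


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix so checks work with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def has_issn(ddf_doc: ET.Element) -> bool:
    """Rule 4: /ddf_doc/publication/*/issn has a non-empty value."""
    for publication in ddf_doc:
        if local_name(publication.tag) != "publication":
            continue
        for channel in publication:  # the '*' step
            for child in channel:
                if local_name(child.tag) == "issn" and (child.text or "").strip():
                    return True
    return False


def in_scope(ddf_doc: ET.Element, year: int) -> bool:
    """Apply rules 0-4 to a single /ddf_doc element."""
    return (ddf_doc.get("doc_year") == str(year)           # rule 0
            and ddf_doc.get("doc_type") in IN_SCOPE_TYPES  # rule 1
            and ddf_doc.get("doc_review") == "pr"          # rule 2
            and ddf_doc.get("doc_level") == "sci"          # rule 3
            and has_issn(ddf_doc))                         # rule 4


doc = ET.fromstring(
    '<ddf_doc doc_year="2015" doc_type="dja" doc_review="pr" doc_level="sci">'
    '<publication><in_journal><issn>1932-6203</issn></in_journal></publication>'
    '</ddf_doc>')
print(in_scope(doc, 2015))  # True
```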
3.2 The Set of Scoped Records Excluding Duplicates

For publications produced in collaboration between universities, multiple publication metadata records may represent the same publication. As this is impractical when producing statistics at the national level, a set of scoped records without duplicates is produced. This set is produced by passing the set of scoped records including duplicates through a deduplication process. The aim of this process is to ensure that for each publication in the scope of the OA Indicator for which there is at least one record in the set of scoped records including duplicates, there is exactly one record in the set of scoped records excluding duplicates.

The deduplication process creates clusters of records. A cluster contains records that represent the same publication. The full set of scoped records excluding duplicates is ultimately established by producing one record per cluster. The algorithm for producing clusters is:

1) Records that were part of the BFI calculation for the same submission year and were identified by the BFI process as duplicates are added to the same cluster.
2) Records whose significant metadata elements (DOI, title, subtitle, ISSN, publication year, etc.) match sufficiently well are considered to represent the same publication and are added to the same cluster.

This algorithm respects BFI's deduplication algorithm: rule (1) ensures that any records identified by BFI as duplicates are also identified by the OA Indicator as duplicates. The scope of BFI and the scope of the OA Indicator differ, so it is realistic that some records outside the BFI scope are within the OA Indicator scope and are indeed duplicates of other records. Rule (2) ensures that these records are in fact (on a best-effort basis) gathered into clusters as well. Thus, clusters may include:

a. Only records which were part of BFI,
b. Both records which were part of BFI and records which were not, or
c. Only records which were not part of BFI.
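The two clustering rules can be sketched with a union-find structure: records that share a clustering key end up in the same cluster, and since keys from both rules are merged, a single cluster can join records from different BFI clusters. This is a simplification under stated assumptions: the record fields (`id`, `bfi_cluster`, `doi`, `title`, `issn`, `year`) are hypothetical, and exact matching on a DOI or on normalised title, ISSN and year stands in for "match sufficiently well", whose actual similarity criteria are not specified.

```python
import re
from typing import Dict, Hashable, List


def normalise_title(title: str) -> str:
    """Lower-case and keep only word characters, so punctuation differences do not matter."""
    return " ".join(re.findall(r"\w+", title.lower()))


def cluster_records(records: List[dict]) -> List[List[str]]:
    """Group record ids into duplicate clusters using union-find."""
    parent: Dict[str, str] = {r["id"]: r["id"] for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    first_seen: Dict[Hashable, str] = {}
    for r in records:
        keys = []
        if r.get("bfi_cluster"):  # rule (1): same BFI duplicate cluster
            keys.append(("bfi", r["bfi_cluster"]))
        if r.get("doi"):  # rule (2): matching DOI (simplified)
            keys.append(("doi", r["doi"].strip().lower()))
        if r.get("title") and r.get("issn") and r.get("year"):  # rule (2): title + ISSN + year
            keys.append(("meta", normalise_title(r["title"]),
                         r["issn"].replace("-", "").upper(), r["year"]))
        for key in keys:
            if key in first_seen:
                union(r["id"], first_seen[key])
            else:
                first_seen[key] = r["id"]

    clusters: Dict[str, List[str]] = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return list(clusters.values())
```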
A subtle but important remark: for clusters containing BFI records, (a) and (b) above, the BFI records clustered by rule (2) may stem from different BFI clusters. That is, OA Indicator clusters may contain BFI records which were not joined by the BFI deduplication algorithm.

Conflict Resolution

The results of the OA Indicator are distributed on Main Research Area (MRA). In order to do this distribution, each cluster must have a unique Main Research Area. BFI's definition of MRA is used by the OA Indicator:

• Science (sci)
• Social Science (soc)
• Humanities (hum)
• Medicine (med)

All DDF-MXD records contain a unique MRA. For records in the set of scoped records including duplicates, these MRAs are used. For records in the set of scoped records excluding duplicates, the records in the underlying clusters may disagree on MRA. In BFI terminology, such a situation is called an MRA conflict. Such MRA conflicts must be resolved so that each cluster has a unique MRA. The algorithm for resolving MRA conflicts in a cluster is:

1) If all the records in a cluster have the same MRA, this MRA is used for the cluster (no conflict).
2) Otherwise, if one or more of the records in the cluster were part of a BFI cluster, the BFI MRA for that cluster is used.
3) If none of the records in the cluster were part of the BFI calculation, or if records were part of different BFI clusters that disagree on their BFI MRA, majority wins: the MRA of the cluster is the MRA represented by most of the records in the cluster.
4) If two or more MRAs are represented by the same number of records in the cluster, the MRA with the highest representation in the entire set of scoped records is chosen for the cluster.

This algorithm ensures that the OA Indicator resolves potential MRA conflicts in a way that respects, to the largest extent possible, the corresponding MRA conflict resolutions made by BFI.
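The four MRA rules translate almost line by line into a function. In this sketch, each record is assumed to be a dict with its own `mra` and, if it was part of a BFI cluster, that BFI cluster's MRA under the hypothetical key `bfi_mra`; `global_counts` holds MRA frequencies over the entire set of scoped records and is used only for the tie-break in rule 4.

```python
from collections import Counter
from typing import Dict, List


def resolve_mra(cluster: List[dict], global_counts: Dict[str, int]) -> str:
    """Pick one MRA for a cluster of records (each with 'mra' and optional 'bfi_mra')."""
    mras = {r["mra"] for r in cluster}
    if len(mras) == 1:  # rule 1: no conflict
        return next(iter(mras))
    bfi_mras = {r["bfi_mra"] for r in cluster if r.get("bfi_mra")}
    if len(bfi_mras) == 1:  # rule 2: reuse BFI's resolution
        return next(iter(bfi_mras))
    # rule 3: no BFI information, or BFI clusters disagree -> majority wins
    counts = Counter(r["mra"] for r in cluster)
    top = max(counts.values())
    tied = [mra for mra, n in counts.items() if n == top]
    # rule 4: break ties by frequency in the entire set of scoped records
    return max(tied, key=lambda mra: global_counts.get(mra, 0))
```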
3.3 This Year's Sets of Scoped Records

| Dataset | Records |
|---|---|
| Total number of publication records collected from the universities | 53,429 |
| Set of scoped records including duplicates | 25,070 |
| Set of scoped records excluding duplicates | 22,666 |

For further details, see the section on data reports.
4 Process 3: Calculation of OA Realization and Potential
The calculation of OA realisation and potential is done with respect to both Green and Golden Open Access. The calculation is done nationally, distributed on Main Research Area (MRA), and distributed on universities.

The Open Access potential, and its realisation, is initially calculated per university, using a per-publication approach based on the set of scoped records including duplicates. Subsequently, it is also calculated at the national level and the MRA level, again using a per-publication approach, but based on the set of scoped records excluding duplicates.

For both sets, each record/publication belonging to the set is classified according to how the publication realises its Open Access potential. There are three values for this classification, and they are colour coded green, yellow and red (traffic light):

• Realised Open Access potential
• Unused Open Access potential, and
• Unclear Open Access potential

For some in-scope records, the classification includes attempting to download a fulltext registered in the record. For technical reasons, the actual download attempts for all potential fulltexts are carried out first, as a separate subprocess. Please refer to Appendix A for technical details on how this is done.

4.1 Open Access Classification – University Level

For any record in the set of scoped records including duplicates, the Open Access potential is established through a number of validation steps. As an overview, the classification process can be illustrated as follows:
Please note that although the diagram above indicates that validation for Golden and Green Open Access takes place in parallel, in the actual implementation Golden is validated before Green. Each of the steps illustrated above is a workflow of its own; they are described individually below.
4.1.1 Checking for Golden Open Access Potential

First, the journal registered in the publication metadata record is checked against DOAJ. If it is present, and if the publication record achieved a level 1 or level 2 BFI classification, the publication is considered one with a (Golden) Open Access potential, and the potential is considered to be Realised. The associated (simple) workflow can be depicted as follows:
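A minimal sketch of the Golden check, assuming the DOAJ data have already been reduced to a set of ISSNs and that the BFI level of the record is known (`None` if the record received no BFI level). Normalising ISSNs by dropping the hyphen is an assumption about how journals are matched, and the function names are hypothetical.

```python
from typing import Optional, Set


def normalise_issn(issn: str) -> str:
    """Make '1932-6203' and '19326203' compare equal (assumed matching convention)."""
    return issn.replace("-", "").strip().upper()


def golden_realised(issn: str, bfi_level: Optional[int], doaj_issns: Set[str]) -> bool:
    """Realised (Golden) potential: journal listed in DOAJ and record at BFI level 1 or 2."""
    return bfi_level in (1, 2) and normalise_issn(issn) in doaj_issns
```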
4.1.2 Checking for Green Open Access Potential

Green Open Access validation of a publication record involves inspecting the element /ddf_doc/oa_link. Below, it is referred to with the shorthand notation //oa_link. Records may contain zero, one or more //oa_link elements. The combined workflow for validating Green Open Access is as follows:
Three decisions in this workflow have to do with qualification. These three decisions are made by the following sub-workflows:

Decision: //oa_link element qualifies?

A qualified //oa_link element is an //oa_link element

• with an attribute @type having an acceptable value ("loc" for local or "rem" for remote; not "doi" for DOI), and
• with a @url attribute that has a value.

Checking for qualification can be illustrated with the following workflow:
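The //oa_link qualification decision is a simple filter. As a sketch, assuming the attribute names `type` and `url` exactly as given above and matching the element by local name to stay namespace-agnostic (the function name is hypothetical):

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ACCEPTED_TYPES = {"loc", "rem"}  # "doi" links do not qualify


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def qualified_oa_links(ddf_doc: ET.Element) -> List[Tuple[str, str]]:
    """Return (type, url) for every //oa_link with an accepted @type and a non-empty @url."""
    result = []
    for el in ddf_doc.iter():
        if local_name(el.tag) != "oa_link":
            continue
        link_type = el.get("type")
        url = (el.get("url") or "").strip()
        if link_type in ACCEPTED_TYPES and url:
            result.append((link_type, url))
    return result
```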
Decision: Does URL qualify?

A qualified URL is either a URL to a local repository, or a URL to an external repository whose prefix (domain name and possibly also path) is registered for a repository on the list of accepted external (remote) repositories (the Whitelist). Checking for qualification can be illustrated with the following workflow:
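The URL decision can be sketched as prefix matching against the Whitelist. The sketch assumes that Whitelist entries are written as a domain name optionally followed by a path, that links of type "loc" always point to the university's local repository, and that the scheme, query string and fragment of a URL are ignored. None of these details come from the Indicator's implementation; matching on a `/` boundary is a design choice that keeps a domain prefix from also matching look-alike domains.

```python
from typing import Iterable
from urllib.parse import urlsplit


def url_key(url: str) -> str:
    """Host (lower-cased, including any port) plus path; scheme, query and fragment dropped."""
    parts = urlsplit(url.strip())
    return parts.netloc.lower() + parts.path


def normalise_prefix(prefix: str) -> str:
    """Lower-case the domain part of a Whitelist entry and drop a trailing '/'."""
    host, slash, path = prefix.strip().partition("/")
    return (host.lower() + slash + path).rstrip("/")


def url_qualifies(link_type: str, url: str, whitelist_prefixes: Iterable[str]) -> bool:
    if link_type == "loc":
        return True  # assumed: local links point to the university's own repository
    key = url_key(url)
    for prefix in whitelist_prefixes:
        p = normalise_prefix(prefix)
        # Match on a '/' boundary so 'arxiv.org' does not match 'arxiv.org.example.com'.
        if key == p or key.startswith(p + "/"):
            return True
    return False
```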
Decision: Does file qualify?

A qualified file is a file that

• can be downloaded by a computer, and
• whose downloaded content has a size greater than zero.

Checking for qualification can be illustrated with the following workflow:
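The file decision amounts to "download it and check that it is non-empty". This sketch uses only `urllib.request` from the standard library; the injectable `fetch` parameter is a hypothetical design choice that makes the check testable without network access, and the 30-second timeout is an arbitrary assumption.

```python
import urllib.request
from typing import Callable


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download the resource at url and return its body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def file_qualifies(url: str, fetch: Callable[[str], bytes] = fetch_bytes) -> bool:
    """A file qualifies if it can be downloaded and its content is non-empty."""
    try:
        content = fetch(url)
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # urlopen raises ValueError for malformed URLs.
        return False
    return len(content) > 0
```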
4.1.3 Checking for Unused & Unclear Potential

If the record has no Realised Open Access Potential, the record is examined to determine whether the potential is Unused or Unclear. The Open Access potential of the publication is derived from the Open Access potential of the journal registered in the publication metadata record, as registered in the Sherpa/Romeo dataset (cf. http://www.sherpa.ac.uk/romeoinfo.html).

Rules applied:

• If the ISSN of the journal is registered in Sherpa/Romeo with colour code green, blue or yellow, the journal is considered one with Open Access potential, and the publication metadata record is considered one with an Unused Open Access potential.
o An exception to this rule is when the ISSN is registered on the list of accepted journals with extended embargo periods (the Blacklist). If so, the record is reclassified to Unclear.
• If the journal is registered in Sherpa/Romeo with a different colour code, or is not registered at all, the journal does not have a clear Open Access potential, and the publication metadata record is considered to be one with an Unclear Open Access potential.

This validation can be depicted as follows:
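The Unused/Unclear rules, including the Blacklist exception, as a sketch. It assumes that Sherpa/Romeo colours have been loaded into a dict keyed by normalised ISSN and the Blacklist into a set of normalised ISSNs; these data structures and the function names are hypothetical.

```python
from typing import Dict, Set

POTENTIAL_COLOURS = {"green", "blue", "yellow"}


def normalise_issn(issn: str) -> str:
    """Make '1234-5678' and '12345678' compare equal (assumed matching convention)."""
    return issn.replace("-", "").strip().upper()


def unused_or_unclear(issn: str, romeo_colours: Dict[str, str], blacklist: Set[str]) -> str:
    key = normalise_issn(issn)
    colour = romeo_colours.get(key)
    if colour in POTENTIAL_COLOURS:
        # Journals with extended embargoes are reclassified from Unused to Unclear.
        return "Unclear" if key in blacklist else "Unused"
    return "Unclear"  # another colour (e.g. white) or not registered at all
```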
4.1.4 Checking Open Access Potential – Combined

Thus, the combined decision workflow for determining the Open Access potential of a record is:
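Put together, the combined decision can be sketched as below. The three checks are passed in as functions so the sketch stays independent of how each is implemented; `link_realises` is a hypothetical callable standing for "the //oa_link qualifies, its URL qualifies, and its file qualifies". Following the note in section 4.1, Golden is checked before Green.

```python
from typing import Callable, Dict, Tuple

Link = Tuple[str, str]  # (type, url) of a qualified //oa_link


def classify_record(record: Dict,
                    is_golden: Callable[[Dict], bool],
                    link_realises: Callable[[Link], bool],
                    unused_or_unclear: Callable[[Dict], str]) -> str:
    """Combined workflow: Golden first, then Green, then Unused/Unclear."""
    if is_golden(record):
        return "Realised"
    if any(link_realises(link) for link in record.get("oa_links", [])):
        return "Realised"
    return unused_or_unclear(record)
```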
4.2 Open Access Classification – National and Main Research Area Level

Publication metadata records in the set of scoped records excluding duplicates correspond to clusters of one or more records from the set of scoped records including duplicates. After each record in the set of scoped records including duplicates has been classified according to its Open Access potential and the realisation of that potential, clusters inherit classifications according to a "best-classification-wins" algorithm, using the following decision workflow:
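The ranking used in this sketch is an assumption: Realised is treated as the best classification, and Unused is ranked above Unclear on our reading of "best". With that ordering, best-classification-wins is a minimum over ranks (the names `RANK` and `cluster_classification` are hypothetical):

```python
from typing import Iterable

# Assumed ranking, best first.
RANK = {"Realised": 0, "Unused": 1, "Unclear": 2}


def cluster_classification(member_classes: Iterable[str]) -> str:
    """Best-classification-wins: the cluster takes the best class among its records."""
    return min(member_classes, key=RANK.__getitem__)
```

Under this reading, a cluster is Realised as soon as any one of the universities' records for the publication is Realised.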
5 Process 4: Quality Assurance

The results of the Open Access Indicator have been subjected to the following quality assurance measures:

• Data foundation. The collected data, the registered links to fulltexts, and their resolvability back to the universities' research databases have been tested. The tests have been based on sampling across the universities.
• Downloaded fulltext files. A selection of the downloaded fulltext files has been inspected to ensure that they can indeed be considered files representing the scientific article in a complete and readable form. The tests have focused on files that, based on a simple computer-based analysis, seemed to deviate suspiciously from the metadata registered for the publication (page numbers, file sizes, etc.).
• Links to external OA repositories. All files realised through links to recognised external OA repositories have been inspected in order to ensure that the links lead to a fulltext file representing the scientific article.
• Random sample. A random sample of 5% of the total set of records with Realised Open Access potential, from each university, has been inspected with the aim of validating the overall data quality.
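The per-university random sample can be sketched as stratified sampling without replacement. Rounding 5% up to a whole record, and the seedable generator for reproducibility, are assumptions; the source does not say how the sample was drawn or rounded. The function name is hypothetical.

```python
import random
from typing import Dict, List, Optional


def qa_sample(realised_by_university: Dict[str, List[str]],
              percent: int = 5,
              seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Draw percent% (rounded up) of each university's Realised records, without replacement."""
    rng = random.Random(seed)
    sample = {}
    for university, record_ids in realised_by_university.items():
        k = (len(record_ids) * percent + 99) // 100  # integer ceiling, avoids float rounding
        sample[university] = rng.sample(record_ids, k)
    return sample
```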
6 Process 5: Output

As output, the Open Access Indicator produces a number of data reports as well as web-friendly visualisations of their summations. The Danish Research Database (http://forskningsdatabasen.dk/) is used as the dissemination platform for the visualisations and the reports.

6.1 Data Reports for Download

Five data reports are produced:
1) Summations: the sets of scoped records, aggregated and distributed on Realised, Unused and Unclear Open Access potential
a. Nationally (set of scoped records excluding duplicates)
b. Distributed on Main Research Area (set of scoped records excluding duplicates)
c. Distributed on the universities (set of scoped records including duplicates)
2) Detailed foundation for (a) and (b): total list of publication records in the set of scoped records excluding duplicates
3) Detailed foundation for (c): total list of publication records in the set of scoped records including duplicates
4) The list of accepted external repositories (the Whitelist) used for the calculation
5) The list of accepted journals with extended embargoes (the Blacklist) used for the calculation
6.2 Web Dissemination via The Danish Research Database

The summations of the Open Access Indicator are visualised on http://forskningsdatabasen.dk/en/open_access/overview, from where the data reports can be downloaded as well.
7 Appendix A: The Fulltext Download Subprocess

A download is attempted for every fulltext registered (by its URL) in the scoped set of publication metadata records, in a single subprocess. This subprocess is implemented in the following way:

• Fulltexts are downloaded one by one (serially, not in parallel)
• Fulltexts are downloaded in a "University Round Robin" fashion:
o one fulltext from university 1,
o one fulltext from university 2,
o one fulltext from university 3,
o …,
o one fulltext from university N,
o one fulltext from university 1,
o one fulltext from university 2,
o …,
o one fulltext from university N,
o …
o …
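The download order above is a round robin over per-university queues. A minimal sketch as a Python generator (the function name and the queue structure are hypothetical); consuming it with a plain `for` loop gives the serial, one-at-a-time behaviour described, and a university whose queue runs out simply drops out of later rounds.

```python
from typing import Dict, Iterator, List, Tuple


def university_round_robin(queues: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (university, url) pairs, one per university per round, until all queues are empty."""
    iterators = [(uni, iter(urls)) for uni, urls in queues.items()]
    sentinel = object()
    while iterators:
        still_active = []
        for uni, it in iterators:
            url = next(it, sentinel)
            if url is sentinel:
                continue  # this university has no fulltexts left
            still_active.append((uni, it))
            yield uni, url
        iterators = still_active


queues = {"AAU": ["a1", "a2", "a3"], "AU": ["b1"], "CBS": ["c1", "c2"]}
for university, url in university_round_robin(queues):
    print(university, url)  # a real robot would download url here, one at a time
```

A plausible benefit of this ordering is that no single repository receives a long run of consecutive requests.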
All downloads are done automatically by the OA Indicator download robot. Any repository holding the fulltexts (either the universities' research databases or external repositories) can identify a download by the OA Indicator robot by:

• IP address: 192.38.67.38