
Open Access Indicator for 2016

Part 2

Technical Description of Data Foundation, Processes and Output

0 Preface
1 Introduction and Main Processes
2 Process 1: Collection of the Data
2.1 The Universities' Publication Data
2.1.1 Requirements on Universities – Metadata Format and Method of Collection
2.1.2 This Year's Universities and Their Research Databases
2.2 Authority and Auxiliary Data
2.2.1 Directory of Open Access Journals (DOAJ)
2.2.2 Sherpa/Romeo (Sh/Ro)
2.2.3 The Danish Bibliometric Research Indicator (BFI)
2.2.4 Authority List: Accepted External Repositories (“The Whitelist”)
2.2.5 Authority List: Journals with Extended Embargo (“The Blacklist”)
2.3 This Year's Complete Data Collection
3 Process 2: Defining the Set of In-Scoped Publications
3.1 The Set of Scoped Records Including Duplicates
3.2 The Set of Scoped Records Excluding Duplicates
4 Process 3: Calculation of OA Realization and Potential
4.1 Open Access Classification – University Level
4.1.1 Checking for Golden Open Access Potential
4.1.2 Checking for Green Open Access Potential
4.1.3 Checking for Unused & Unclear Potential
4.1.4 Checking Open Access Potential – Combined
4.2 Open Access Classification – National and Main Research Area Level
5 Process 4: Quality Assurance
6 Process 5: Output
6.1 Data Reports for Download
6.2 Web Dissemination via the Danish Research Database
7 Appendix A: The Fulltext Download Sub-process

Revision 1 of 20 April 2018


0 Preface

The National Steering Group for Open Access¹ has proposed that the Danish Agency for Science, Technology and Innovation and Denmark’s Electronic Research Library develop a Danish Open Access Indicator. The intention is to support the implementation of the national Open Access strategy², cf. the strategy’s statement on monitoring: “The implementation of Open Access is to be monitored on an ongoing basis to ensure that all parties make a maximum effort to develop and disseminate free accessibility to Danish research findings.”

The Open Access Indicator is calculated once per year with the target field: scientific and peer-reviewed articles and conference contributions in journals and proceedings with ISSN.

In the context of Horizon 2020³, the EU requires that Open Access be established within at most 6 months after publication for the areas of science, technology and health, and within at most 12 months for the social sciences and humanities. This delay reflects the fact that many journals maintain so-called embargo periods, which prevent researchers from establishing Open Access to the articles before the embargo period ends.

As the OA Indicator is calculated once annually for all publications within its target field, it is designed to accept a one-year delay in Open Access to the publications. Consequently, the OA Indicator for 2016 is calculated in early March 2018, in order to accommodate a full-year embargo period also for publications from December 2016. In practice, this means that publications from January 2016 could have embargo periods of up to 24 months and still be credited by the OA Indicator.

The description of the Open Access Indicator is organized in two parts:

• Part 1: Overview of data foundation, processes and output
• Part 2: Technical description of data foundation, processes and output

Note: In Part 2, the technical description, the notion of the indicator’s “target field” is expressed using the term “set of scoped records”.

Queries regarding the indicator may be directed to:

Adam Baden / Hanne-Louise Kirkegaard
Danish Agency for Science and Higher Education
Ministry of Higher Education and Science
Bredgade 40
DK-1260 København K
Email: [email protected] / [email protected]

1 http://ufm.dk/en/research-and-innovation/cooperation-between-research-and-innovation/open-access
2 http://ufm.dk/en/research-and-innovation/cooperation-between-research-and-innovation/open-access/Publications/denmarks-national-strategy-for-open-access
3 https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf


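1 Introduction and Main Processes

The activities of the OA Indicator can be broken down into the following five main processes:

• Process 1: Collection of the data
• Process 2: Defining the set of in-scoped publications
• Process 3: Calculation of OA realisation and potential
• Process 4: Quality assurance
• Process 5: Output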

The five main processes are described in further detail in the sections below. This description of the Open Access Indicator is aimed at a technically inclined audience and describes in depth how the Indicator works, both overall and in detail. It assumes that the reader is familiar with basic XML⁴ and with the basic parts of the XPath⁵ notation for referring to elements of an XML document conforming to a certain XML Schema. It also assumes that the reader is familiar with the visualisation of processes as workflow diagrams⁶.

4 https://www.w3.org/TR/xml/
5 https://www.w3.org/TR/xpath-30/
6 https://en.wikipedia.org/wiki/Flowchart


2 Process 1: Collection of the Data

The first activity in the OA Indicator is the collection of the complete data foundation used by the indicator. This includes importing data from six national and international sources. The data foundation is composed of metadata describing the publications of the universities, as well as authority and auxiliary data.

2.1 The Universities' Publication Data

Metadata describing the publications of the universities are used to establish the set of publications in scope of the OA Indicator. These metadata are collected for the OA Indicator once annually, directly from the universities, using a nationally agreed XML-based exchange format and a nationally agreed exchange protocol. For full texts registered in the collected publication metadata, collection (download) is attempted.

2.1.1 Requirements on Universities – Metadata Format and Method of Collection

A university can be included in the OA Indicator if it meets the following minimum requirements:

• Publications published by researchers employed at the university are collected in a university research database containing publication data, person data, project data etc. of that particular university only.

• This research database of the university must expose its publication data using OAI-PMH (http://www.openarchives.org/OAI/openarchivesprotocol.html).

• The research database must support OAI-PMH selective harvesting using Sets, characterised by their setSpec (code), so that only parts of the database are harvested.

• A dedicated OAI-PMH Set exposing all publication data held in the research database must exist.

• For this dedicated set, the OAI-PMH metadataPrefix “ddf_mxd” must be supported.
• When an OAI-PMH client harvests this dedicated set using metadataPrefix “ddf_mxd”, the metadata records must be valid DDF-MXD (http://mx.forskningsdatabasen.dk/mxd/).
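The harvesting side of these requirements can be sketched in Python. The sketch only builds OAI-PMH ListRecords request URLs for the dedicated set with metadataPrefix “ddf_mxd”, and reads the resumptionToken that OAI-PMH 2.0 uses for paging through large result lists. The function names are hypothetical. HTTP fetching, error handling and DDF-MXD validation are left out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def list_records_url(base_url, set_spec, metadata_prefix="ddf_mxd",
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    In OAI-PMH, resumptionToken is an exclusive argument: follow-up
    requests carry only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords",
                  "metadataPrefix": metadata_prefix,
                  "set": set_spec}
    return base_url + "?" + urlencode(params)

def next_resumption_token(response_xml):
    """Return the token for the next page, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None  # absent or empty token: the harvest is complete
    return token.text.strip()
```

For example, `list_records_url("http://vbn.aau.dk/ws/oai", "publications:all")` gives the first request for AAU. A harvester would repeat the request with the returned token until `next_resumption_token` yields None.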


2.1.2 This Year's Universities and Their Research Databases

The following 8 universities, and their associated research databases, are included in the OA Indicator for 2016:

University   Research database (OAI-PMH server)    OAI-PMH setSpec
AAU          http://vbn.aau.dk/ws/oai              publications:all
AU           https://pure.au.dk/ws/oai             publications:all
CBS          http://research.cbs.dk/ws/oai         publications:all
DTU          http://orbit.dtu.dk/ws/oai            publications:all
ITU          https://pure.itu.dk/ws/oai            publications:all
KU           http://curis.ku.dk/ws/oai             publications:all
RUC          http://rucforsk.ruc.dk/ws/oai         publications:all
SDU          http://heinz.sdu.dk:8080/ws/oai       publications:all

2.2 Authority and Auxiliary Data

Authority and auxiliary data are collected for the OA Indicator from various sources. For each of these sources, collection is done once annually. Collection method and data format vary across sources.

2.2.1 Directory of Open Access Journals (DOAJ)

DOAJ is used by the OA Indicator as an authoritative list of Golden Open Access journals. It is also the source of data describing whether or not a journal requires APCs (article processing charges). Parameters of the data collection:

• Protocol: OAI-PMH (server http://www.doaj.org/oai/)
• metadataPrefix: oai_dc
• Data format: Dublin Core (http://dublincore.org/documents/dces/)
• Enrichment: Per-journal lookup using the REST API endpoint https://doaj.org/api/v1/journals (cf. https://doaj.org/api/v1/docs#!/CRUD_Journals/get_api_v1_journals_journal_id)
  o Data format: JSON

2.2.2 Sherpa/Romeo (Sh/Ro)

Sh/Ro is used by the OA Indicator to determine the journals' policies for Green Open Access, and thereby the Open Access potential of individual journal articles. Parameters of the data collection:

• Protocol: HTTP (GET from http://www.sherpa.ac.uk/downloads/)
• Data format: Proprietary XML-based format (http://sherpa.ac.uk/news/2012-10-08-RoMEO-API-News.html)

2.2.3 The Danish Bibliometric Research Indicator (BFI)

Data from BFI are used by the OA Indicator for two purposes:

• To identify duplicate publication data across universities. Such duplicates exist for collaborative publications with co-authors employed at different universities, which are therefore registered in multiple research databases.

• To resolve potential conflicts regarding the Main Research Areas registered in the metadata for the publications.

Parameters of the data collection:


• Protocol: HTTPS (GET from https://bfi.fi.dk/AnnualReport)
• Format: Compressed Excel spreadsheet – undocumented template

2.2.4 Authority List: Accepted External Repositories (“The Whitelist”)

This authority list applies to full texts deposited in external repositories. The OA Indicator uses it so that only full texts deposited in accepted external repositories can demonstrate Realised Open Access Potential.

• Protocol: Mail (from the authority list maintainers)
• Format: Excel spreadsheet – undocumented template

2.2.5 Authority List: Journals with Extended Embargo (“The Blacklist”)

The OA Indicator uses this authority list to reclassify the Open Access potential from Unused to Unclear for journals registered on the list.

• Protocol: Mail (from the authority list maintainers)
• Format: Excel spreadsheet – undocumented template

2.3 This Year's Complete Data Collection

Summary of the data collection for the OA Indicator for 2016:

Source     Protocol  Ver.  Format       Ver.   Collection date
AAU        OAI-PMH   2.0   DDF-MXD      1.4.0  6/3-2018
AU         OAI-PMH   2.0   DDF-MXD      1.4.0  6/3-2018
CBS        OAI-PMH   2.0   DDF-MXD      1.4.0  6/3-2018
DTU        OAI-PMH   2.0   DDF-MXD      1.4.0  6/3-2018
ITU        OAI-PMH   2.0   DDF-MXD      1.4.0  6/3-2018
KU         OAI-PMH   2.0   DDF-MXD      1.4.0  6/3-2018
RUC        OAI-PMH   2.0   DDF-MXD      1.4.0  6/3-2018
SDU        OAI-PMH   2.0   DDF-MXD      1.4.0  6/3-2018
DOAJ       OAI-PMH   2.0   DC+JSON      –      6/3-2018
Sh/Ro      HTTP      –     Proprietary  –      5/3-2018
BFI        HTTPS     –     Proprietary  –      23/10-2017
Whitelist  Mail      –     Proprietary  –      6/3-2018
Blacklist  Mail      –     Proprietary  –      5/3-2018

3 Process 2: Defining the Set of In-Scoped Publications


After all data for the OA Indicator have been collected, a number of activities are initiated in order to isolate the publication records that are in scope for the OA Indicator. Only a subset of the universities' publications is in scope. The scope is defined as:

• Scientific, peer-reviewed articles and conference contributions published in journals or proceedings with ISSN

Thus, the subset of publication metadata records representing this scope must be isolated from the total set of publication metadata collected. This is done in two ways, in order to facilitate statistics on both the national level and the university level:

• Scoped records including duplicates, for statistics on the university level. For collaborative articles across universities, all registrations from all participating universities are kept.

• Scoped records excluding duplicates, for statistics on the national level. For collaborative articles across universities, only one registration is kept.

3.1 The Set of Scoped Records Including Duplicates

Each of the requirements in the definition of the scope maps directly to a corresponding rule regarding DDF-MXD data elements and their content. The set of scoped publication metadata records is therefore the set of records that complies with all the rules. The rules are described below. First of all, the set of scoped records must represent records with a given submission year. The initial rule is therefore:

0) The submission year (indberetningsår) must be marked up in the publication metadata record with the given value. Rule applied: Attribute /ddf_doc/@doc_year has the value (year) of the OA Indicator calculation.

Subsequently, the following four rules are applied to all records:

1) The type of the publication must be marked up in the publication metadata record as “Journal article”, “Review article” or “Conference contribution” (the same definition of “article” as used by BFI). Rule applied: Attribute /ddf_doc/@doc_type has the value “dja”, “djr” or “dcp”.

2) The review status of the publication must be marked up in the publication metadata record as “Peer-review” (similar demand as for BFI). Rule applied: Attribute /ddf_doc/@doc_review has the value “pr”.

3) The scientific level of the publication must be marked up in the publication metadata record as “Scientific” (similar demand as for BFI). Rule applied: Attribute /ddf_doc/@doc_level has the value “sci”.

4) The publication channel of the publication must be marked up in the publication metadata record with an ISSN. Rule applied: Element /ddf_doc/publication/*/issn has a value.
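Rules 0) to 4) can be sketched as a single predicate over a parsed record, using Python's xml.etree.ElementTree. The sketch assumes a namespace-free <ddf_doc> element. Real DDF-MXD records are namespaced, so tag names would need to be qualified. The function name in_scope and the child element name used in the usage example are hypothetical.

```python
import xml.etree.ElementTree as ET

ARTICLE_TYPES = {"dja", "djr", "dcp"}  # rule 1: journal/review article, conference contribution

def in_scope(record: ET.Element, year: str) -> bool:
    """Return True if a <ddf_doc> element satisfies rules 0-4 (sketch).

    Assumes no XML namespace on the element; attribute values are strings.
    """
    if record.get("doc_year") != year:                # rule 0: submission year
        return False
    if record.get("doc_type") not in ARTICLE_TYPES:   # rule 1: publication type
        return False
    if record.get("doc_review") != "pr":              # rule 2: peer reviewed
        return False
    if record.get("doc_level") != "sci":              # rule 3: scientific level
        return False
    # rule 4: /ddf_doc/publication/*/issn must have a non-blank value
    return any((issn.text or "").strip()
               for issn in record.findall("publication/*/issn"))
```

Attribute values are compared as strings, so the year is passed as `"2016"`.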


3.2 The Set of Scoped Records Excluding Duplicates

For collaborative publications between the universities, multiple publication metadata records may represent the same publication. As this is impractical when producing statistics on the national level, a set of scoped records without duplicates is produced. This set is produced by subjecting the set of scoped records including duplicates to a deduplication process. The aim of this process is as follows. For each publication in the scope of the OA Indicator with at least one record in the set of scoped records including duplicates, there must be exactly one record in the set of scoped records excluding duplicates.

The deduplication process creates clusters of records. A cluster contains records that represent the same publication. The full set of scoped records excluding duplicates is ultimately established by producing one record per cluster. The algorithm for producing clusters is:

1) Records that were part of the BFI calculation for the same submission year, and that were identified by the BFI process as duplicates, are added to the same cluster.

2) Records for which significant metadata elements (DOI, title, subtitle, ISSN, publication year, etc.) match sufficiently well are considered to represent the same publication and are added to the same cluster.

This algorithm respects BFI’s deduplication algorithm: Rule (1) ensures that any records identified by BFI as duplicates are also identified by the OA Indicator as duplicates. However, the scope of BFI and the scope of the OA Indicator differ. It is therefore possible that records outside the BFI scope are part of the OA Indicator scope and are indeed duplicates of other records. Rule (2) ensures that these records are also gathered into clusters, on a best-effort basis. Thus, clusters may include:

a. Only records which were part of BFI,
b. Both records which were part of BFI and records which were not, or
c. Only records which were not part of BFI.
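The two clustering rules can be sketched with a disjoint-set (union-find) structure. This structure makes cluster membership transitive: if record A matches B and B matches C, all three end up in one cluster. The sketch rests on several assumptions. The document does not specify what “matches sufficiently well” means, so `match_keys` below is a hypothetical simplification: an exact match on the normalised DOI, or on the normalised title plus ISSN and publication year. The record dictionaries and BFI cluster ids are illustrative input shapes, not the indicator's actual data model.

```python
import re
from collections import defaultdict

def _find(parent, x):
    # Follow parent links to the cluster representative (with path halving).
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def _union(parent, a, b):
    ra, rb = _find(parent, a), _find(parent, b)
    if ra != rb:
        parent[rb] = ra

def _norm_title(title):
    return re.sub(r"\W+", " ", title.lower()).strip()

def match_keys(rec):
    """HYPOTHETICAL stand-in for 'matches sufficiently well': exact match on
    normalised DOI, or on normalised title + ISSN + publication year."""
    keys = []
    if rec.get("doi"):
        keys.append(("doi", rec["doi"].strip().lower()))
    if rec.get("title"):
        keys.append(("title", _norm_title(rec["title"]),
                     rec.get("issn"), rec.get("year")))
    return keys

def build_clusters(records, bfi_cluster_of):
    """records: record id -> metadata dict (scoped records incl. duplicates).
    bfi_cluster_of: record id -> BFI duplicate-cluster id, for BFI records."""
    parent = {rid: rid for rid in records}
    first_with_key = {}

    def link(key, rid):
        # Merge rid with the first record seen carrying the same key.
        if key in first_with_key:
            _union(parent, first_with_key[key], rid)
        else:
            first_with_key[key] = rid

    # Rule (1): records BFI identified as duplicates share a BFI cluster id.
    for rid, bfi_id in bfi_cluster_of.items():
        if rid in records:
            link(("bfi", bfi_id), rid)
    # Rule (2): records whose significant metadata match are merged.
    for rid, rec in records.items():
        for key in match_keys(rec):
            link(key, rid)

    clusters = defaultdict(set)
    for rid in records:
        clusters[_find(parent, rid)].add(rid)
    return list(clusters.values())
```

Because both rules feed the same union-find structure, a cluster can combine a BFI duplicate group with non-BFI records matched by rule (2). It can also join records from different BFI groups, which corresponds to cases (a) to (c) above.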

A subtle but important remark: for clusters containing BFI records, i.e. (a) and (b) above, the BFI records clustered by rule (2) above may stem from different BFI clusters. OA Indicator clusters may therefore contain BFI records which were not joined by the BFI deduplication algorithm.

Conflict Resolution

The results of the OA Indicator are distributed on Main Research Area (MRA). To be able to do this distribution, each cluster must have a unique Main Research Area. The OA Indicator uses BFI’s definition of MRA:

• Science (sci)
• Social Science (soc)
• Humanities (hum)
• Medicine (med)

All DDF-MXD records contain a unique MRA. For records in the set of scoped records including duplicates, these MRAs are used. For records in the set of scoped records excluding duplicates, the records in the underlying clusters may disagree on MRA. In BFI terminology, such a situation is called an MRA conflict. MRA conflicts must be resolved so that each cluster has a unique MRA. The algorithm for resolving MRA conflicts in a cluster is:

1) If all the records in a cluster have the same MRA, this MRA is used for the cluster (no conflict).

2) Otherwise, if one or more of the records in the cluster were part of a BFI cluster, the BFI MRA for that cluster is used.

3) If none of the records in the cluster were part of the BFI calculation, or if multiple records were part of different BFI clusters disagreeing on their BFI MRA, majority wins: the MRA of the cluster is the MRA represented by most of the records in the cluster.

4) If two or more MRAs are represented by the same number of records in the cluster, the MRA with the highest representation in the entire set of scoped records is chosen for the cluster.

This algorithm ensures that the OA Indicator resolves potential MRA conflicts while respecting, to the largest extent possible, the corresponding MRA-conflict resolutions made by BFI.
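The four conflict-resolution rules translate into a short function. In this sketch, each record in a cluster is represented (hypothetically) as a pair. The first element is the record's own MRA. The second is the MRA that BFI assigned to the BFI cluster the record belonged to, or None for records outside the BFI calculation. `global_counts` is the MRA distribution over the entire set of scoped records.

```python
from collections import Counter

def resolve_mra(cluster, global_counts):
    """cluster: list of (record_mra, bfi_cluster_mra_or_None) pairs.
    global_counts: Counter of MRAs over the entire set of scoped records."""
    record_mras = [mra for mra, _ in cluster]
    # Rule 1: all records agree, so there is no conflict.
    if len(set(record_mras)) == 1:
        return record_mras[0]
    # Rule 2: use the BFI MRA when the BFI clusters involved agree on one.
    bfi_mras = {bfi for _, bfi in cluster if bfi is not None}
    if len(bfi_mras) == 1:
        return bfi_mras.pop()
    # Rule 3: no BFI MRA, or disagreeing BFI MRAs: majority of records wins.
    counts = Counter(record_mras)
    top = max(counts.values())
    tied = [mra for mra, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    # Rule 4: break the tie by representation in the entire scoped set.
    return max(tied, key=lambda mra: global_counts[mra])
```

The global counts could also be tied. In that case `max` returns the first tied MRA in the order encountered. The document does not say how such a residual tie is broken.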

4 Process 3: Calculation of OA Realization and Potential

The calculation of OA realisation and potential is done for both Green and Golden Open Access. It is carried out nationally, distributed on Main Research Area (MRA), and distributed on universities.


The Open Access potential, and its realisation, is initially calculated per university, using a per-publication approach based on the set of scoped records including duplicates. Subsequently, it is calculated for the national level and the MRA level, again using a per-publication approach, but based on the set of scoped records excluding duplicates. For both sets, each record/publication belonging to the set is classified according to how the publication realises its Open Access potential. This classification has three values, which are color-coded green, yellow and red (traffic light):

• Realised Open Access potential
• Unused Open Access potential, and
• Unclear Open Access potential

For some in-scoped records, the classification includes attempting a download of a full text registered in the record. For technical reasons, the actual download attempts of all potential full texts are carried out as the first sub-process. Please refer to Appendix A for technical details on how this is done. For records/publications classified as Realised, the types of realisation are also determined. There are four types of Realised:

• Golden Open Access in journals with APC
• Golden Open Access in journals without APC
• Green Open Access from local repository
• Green Open Access from external repository

Each record/publication may have more than one type of realisation.

4.1 Open Access Classification – University Level

For any record in the set of scoped records including duplicates, the Open Access potential is established through a number of validation steps. As an overview, the classification process can be illustrated as follows:


Please note that although the diagram above indicates that validation for Golden and Green Open Access takes place in parallel, in the actual implementation Golden is validated before Green. Each of the steps illustrated above is a workflow of its own. They are described individually below.

4.1.1 Checking for Golden Open Access Potential

First, the journal registered in the publication metadata record is checked against DOAJ. If the journal is present, the publication is considered one with a (Golden) Open Access potential, and the potential is considered to be Realised. To determine the type of realisation, the DOAJ API is queried for the journal, and the JSON response element apc{average_price} is checked. Below, this element is referred to in the shorthand notation ‘apc_price’. If apc_price has a value greater than zero, the type of realisation is considered to be Golden with APC. Otherwise, it is Golden without APC. The associated, simple workflow can be depicted as follows:
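A minimal Python sketch of the Golden check follows. It assumes the DOAJ data has already been collected into a set of ISSNs and a per-ISSN dictionary of API responses. The location of apc{average_price} inside the JSON (under `bibjson`, then `apc`) is an assumption about the DOAJ v1 response layout. The returned labels `gold_apc` / `gold_no_apc` are hypothetical.

```python
def apc_price(journal: dict):
    """Extract apc{average_price} from a DOAJ journal response.
    ASSUMPTION: the element sits under 'bibjson' -> 'apc'."""
    apc = journal.get("bibjson", {}).get("apc") or {}
    return apc.get("average_price")

def golden_check(issn, doaj_issns, doaj_journals):
    """Return None if the journal is not in DOAJ (no Golden potential),
    otherwise the (hypothetical) realisation label."""
    if issn not in doaj_issns:
        return None
    price = apc_price(doaj_journals.get(issn, {}))
    try:
        has_apc = price is not None and float(price) > 0
    except (TypeError, ValueError):
        has_apc = False  # unparseable price: treated as no APC
    return "gold_apc" if has_apc else "gold_no_apc"
```

Treating a missing or unparseable price as “without APC” mirrors the text's “Otherwise, it is Golden without APC”.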


4.1.2 Checking for Green Open Access Potential

Green Open Access validation of a publication record involves inspecting the element /ddf_doc/oa_link. Below, it is referred to with the shorthand notation //oa_link. Records may contain zero, one or more //oa_link elements. The combined workflow for validating Green Open Access is as follows:


Three decisions in this workflow concern qualification. These three decisions are made following the sub-workflows described below. A file that passes all three decisions gives the record the status Realised. For each such file, the type of (Green Open Access) realisation is then determined. This fourth decision is also described below.

Decision 1: Does the //oa_link element qualify?

A qualified //oa_link element is an //oa_link element

• with an attribute @type having an acceptable value (“loc” for local or “rem” for remote, but not “doi” for DOI), and

• with a @url attribute that has a value.

Checking for qualification can be illustrated with the following workflow:
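Alongside the diagram, Decision 1 can be expressed as a filter over the //oa_link elements of a record. The Python sketch below assumes a namespace-free <ddf_doc> element and uses a hypothetical function name. It yields the URL and @type of every qualifying link, so that Decisions 2 to 4 can be applied per link.

```python
import xml.etree.ElementTree as ET

ACCEPTED_LINK_TYPES = {"loc", "rem"}  # "doi" links do not qualify

def qualified_oa_links(record: ET.Element):
    """Yield (url, type) for each //oa_link element that passes Decision 1.

    Assumes a namespace-free <ddf_doc>; real DDF-MXD is namespaced."""
    for link in record.iter("oa_link"):  # iter() finds descendants at any depth
        link_type = link.get("type")
        url = (link.get("url") or "").strip()
        if link_type in ACCEPTED_LINK_TYPES and url:
            yield url, link_type
```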


Decision 2: Does the URL qualify?

A qualified URL is either a URL to a local repository or a URL to an external repository whose prefix (domain name and possibly also path) is registered for a repository on the list of accepted external (/remote) repositories (the Whitelist). Checking for qualification can be illustrated with the following workflow:
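A sketch of Decision 2 follows. It rests on two assumptions the document leaves open. First, a link with @type “loc” is treated as pointing to the local repository and qualifies directly. Second, each Whitelist entry is a prefix of host name plus optional path, matched on whole path segments. The function name and the example Whitelist entries used in the usage are hypothetical.

```python
from urllib.parse import urlsplit

def _host_and_path(url_or_prefix):
    # ASSUMPTION: Whitelist prefixes may be given with or without a scheme.
    if "://" not in url_or_prefix:
        url_or_prefix = "http://" + url_or_prefix
    parts = urlsplit(url_or_prefix)
    return (parts.hostname or "").lower(), parts.path.rstrip("/")

def url_qualifies(url, link_type, whitelist_prefixes):
    """Decision 2 (sketch): local links qualify; remote links must match
    a Whitelist prefix on host and on whole leading path segments."""
    if link_type == "loc":
        return True  # ASSUMPTION: @type "loc" identifies the local repository
    host, path = _host_and_path(url)
    for prefix in whitelist_prefixes:
        p_host, p_path = _host_and_path(prefix)
        if host == p_host and (path == p_path or path.startswith(p_path + "/")):
            return True
    return False
```

With a hypothetical Whitelist `["arxiv.org", "europepmc.org/articles"]`, a URL under `http://europepmc.org/articles/...` qualifies, while one under `http://europepmc.org/abstract/...` does not. Host names are compared exactly, so `www.` variants would need their own entries.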

Decision 3: Does the file qualify?

A qualified file is a file that

• can be downloaded by a computer, and
• whose downloaded content has a size greater than zero.

Checking for qualification can be illustrated with the following workflow:
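Decision 3 can be sketched with Python's standard urllib. The fetch function is injectable, so the check can be exercised without network access. The timeout, the User-Agent string and the absence of retries are assumptions, not parameters of the actual download robot described in Appendix A.

```python
import urllib.request

def fetch_bytes(url, timeout=60):
    """Download url and return its body (hypothetical helper; no retries)."""
    req = urllib.request.Request(url, headers={"User-Agent": "oa-indicator-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()

def file_qualifies(url, fetch=fetch_bytes):
    """Decision 3: the file can be downloaded and its content is non-empty."""
    try:
        content = fetch(url)
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs such as a missing scheme.
        return False
    return len(content) > 0
```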


Decision 4: Determining the type of realisation

The type of realisation is determined by the attribute //oa_link/@type:

• If this attribute has the value “loc”, the type is Green Open Access from local repository,
• otherwise it is Green Open Access from external repository.

This is illustrated by the following workflow:

4.1.3 Checking for Unused & Unclear Potential

If the record has no Realised Open Access potential, the record is examined to determine whether the potential is Unused or Unclear. The Open Access potential of the publication is derived from the Open Access potential of the journal registered in the publication metadata record, as registered in the Sherpa/Romeo dataset (cf. http://www.sherpa.ac.uk/romeoinfo.html).

Rules applied:

• If the ISSN of the journal is registered in Sherpa/Romeo with color code green, blue or yellow, the journal is considered one with Open Access potential, and the publication metadata record is considered one with an Unused Open Access potential.
  o An exception to this rule applies if the ISSN is registered on the list of accepted journals with extended embargo periods (the Blacklist). If so, the record is reclassified to Unclear.
• If the journal is registered in Sherpa/Romeo with a different color code, or not registered at all, the journal does not have a clear Open Access potential, and the publication metadata record is considered one with an Unclear Open Access potential.

This validation can be depicted as follows:
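The Sherpa/Romeo rules and the Blacklist exception can be sketched as one function. The sketch assumes that the Sherpa/Romeo data has been reduced to a mapping from ISSN to a lowercase color name. It also assumes the Blacklist is a set of ISSNs. Both data shapes, and the function name, are hypothetical.

```python
OA_COLORS = {"green", "blue", "yellow"}  # Sherpa/Romeo colors giving OA potential

def unused_or_unclear(issn, sherpa_color_by_issn, blacklist_issns):
    """Classify a record without Realised potential as "Unused" or "Unclear"."""
    color = sherpa_color_by_issn.get(issn)  # None if not registered in Sh/Ro
    if color in OA_COLORS:
        # Clear potential, unless the journal has an extended embargo.
        return "Unclear" if issn in blacklist_issns else "Unused"
    return "Unclear"  # other color code, or not registered at all
```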

4.1.4 Checking Open Access Potential – Combined

Thus, the combined decision workflow for determining the Open Access potential of a record is:
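The combined decision can also be written as a function over the outcomes of the three checks described above. This minimal sketch assumes that both the Golden and the Green checks are always run, so that a record can collect several realisation types. The inputs and the realisation-type labels are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Classification:
    status: str                                       # "Realised", "Unused" or "Unclear"
    realisation_types: set = field(default_factory=set)

def classify(golden_type, green_types, fallback_status):
    """Combine the checks: Golden first, then Green, then Unused/Unclear.

    golden_type: None, or a Golden realisation type from the DOAJ check
    green_types: Green realisation types, one per file passing Decisions 1-3
    fallback_status: "Unused" or "Unclear" from the Sherpa/Romeo check
    """
    types = set(green_types)
    if golden_type is not None:
        types.add(golden_type)
    if types:
        return Classification("Realised", types)
    return Classification(fallback_status)
```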


4.2 Open Access Classification – National and Main Research Area Level

Publication metadata records in the set of scoped records excluding duplicates correspond to clusters of one or more records from the set of scoped records including duplicates. First, each record in the set of scoped records including duplicates is classified according to its Open Access potential and the realisation of it. The clusters then inherit classifications according to a “best-classification-wins” algorithm, using the following decision workflow:

For clusters classified as Realised, the type of realisation for the cluster is also inherited from the records of the cluster. The inheritance is done by union: any type of realisation associated with any record in the cluster that is classified as Realised is also associated with the cluster as a whole.
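The inheritance can be sketched as follows. Only Realised is unambiguously the best classification. The relative ranking of Unused and Unclear used here is an assumption, since the decision workflow is not reproduced in the text, and it is therefore passed as a parameter. Each record of a cluster is represented hypothetically as a pair of its status and its set of realisation types.

```python
# ASSUMED ranking, best first; the order of Unused vs. Unclear is not
# confirmed by the text and can be overridden via the rank parameter.
RANK = {"Realised": 0, "Unused": 1, "Unclear": 2}

def inherit(cluster, rank=RANK):
    """cluster: list of (status, realisation_types) pairs for its records.
    Returns the cluster's (status, realisation_types)."""
    best = min((status for status, _ in cluster), key=rank.__getitem__)
    types = set()
    if best == "Realised":
        # Union of the realisation types of all Realised records.
        for status, rtypes in cluster:
            if status == "Realised":
                types |= set(rtypes)
    return best, types
```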

5 Process 4: Quality Assurance

The results of the Open Access Indicator have been subjected to quality assurance. For a description, please refer to the Overview documentation (Part 1).


6 Process 5: Output

As output, the Open Access Indicator produces a number of data reports, as well as web-friendly visualisations of their summations. The Danish Research Database (http://forskningsdatabasen.dk/) is used as the dissemination platform for both the visualisations and the reports.

6.1 Data Reports for Download

Five data reports are produced:

1) Summations: The sets of scoped records, aggregated and distributed on Realised (and types of realisation), Unused and Unclear Open Access potential
   a. Nationally (set of scoped records excluding duplicates)
   b. Distributed on Main Research Area (set of scoped records excluding duplicates)
   c. Distributed on the universities (set of scoped records including duplicates)

2) Detailed foundation for (a) and (b): Complete list of publication records in the set of scoped records excluding duplicates

3) Detailed foundation for (c): Complete list of publication records in the set of scoped records including duplicates

4) The list of accepted external repositories (the Whitelist) used for the calculation

5) The list of accepted journals with extended embargoes (the Blacklist) used for the calculation

6.2 Web Dissemination via the Danish Research Database

The summations of the Open Access Indicator are visualised on http://forskningsdatabasen.dk/en/open_access/overview, from where the data reports can be downloaded as well.


7 Appendix A: The Fulltext Download Sub-process

Downloads are attempted, in a single sub-process, for all the full texts registered (by their URL) in the scoped set of publication metadata records. This sub-process is implemented in the following way:

• Full texts are downloaded one by one (serially, not in parallel)

• Full texts are downloaded in a “University Round Robin” fashion:
  o one full text from university 1,
  o one full text from university 2,
  o one full text from university 3,
  o …,
  o one full text from university N,
  o one full text from university 1,
  o one full text from university 2,
  o …,
  o one full text from university N,
  o …
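The “University Round Robin” order can be sketched with `itertools.zip_longest`. One URL is taken from each university's queue in turn, and universities whose queues are exhausted are skipped. The queue dictionary is a hypothetical input shape. The serial downloading itself would then simply iterate over the resulting list, one URL at a time.

```python
from itertools import chain, zip_longest

_SKIP = object()  # sentinel marking an exhausted university queue

def round_robin(queues):
    """Interleave per-university download queues (dict: university -> URLs)."""
    rounds = zip_longest(*queues.values(), fillvalue=_SKIP)
    return [url for url in chain.from_iterable(rounds) if url is not _SKIP]
```

For example, queues `{"AAU": ["a1", "a2", "a3"], "AU": ["b1"], "CBS": ["c1", "c2"]}` give the order a1, b1, c1, a2, c2, a3. Interleaving in this way spreads consecutive requests across the universities' servers.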

All downloads are done automatically by the OA Indicator download robot. Any repository holding the full texts (either the research databases of the universities or external repositories) can identify a download by the OA Indicator robot by:

• IP address: 192.38.67.38