towards an eu psi portal
TRANSCRIPT
-
8/6/2019 Towards an EU PSI Portal
1/39
1
TowardsapanEUdataportaldata.gov.eu
ProfessorNigelShadbolt
15thDecember2010
Version4.0
1 ExecutiveSummary 22 Introduction 33 data.gov.eumotivation 34 data.gov.euservices 5
4.1 LocatorServices 64.2 RecordsCreation 8
5 SchemaDesigns 96 TheRoadtoStardom-SupportingLinkedData 117 Multilingualism 138 SearchandRetrievalServices 149 RecommendationServices 1610 Dataexploitation 1611 DataEnhancementandImprovementServices 17
11.1 EvaluationServices 17 11.2 CommunityDevelopmentandUserInvolvement 18
12 Challenges 1812.1 OpenData 1912.2 Technical 1912.3 Financial 2012.4 Legal 2012.5 SocialandOrganisational 2112.6 Political 22
13 TimeframeandDeliverables 2314 Conclusions 2615 Acknowledgements 26AppendixAAnOpenGovernmentDataServiceDeployment:thedata.gov.uk
exampleatDecember2010 27AppendixBDescriptionofmeta-dataelements:thedata.gov.ukexample 31AppendixCUKPublicDataPrinciples 38AppendixDAPossibleProjectPlanforPilotdata.gov.eu 39
-
8/6/2019 Towards an EU PSI Portal
2/39
2
1 ExecutiveSummary
1. There is a rapidly growing move towards the release of open data. EUMemberStatesareinthevanguardofthismovementwithcountriessuchas
theUKputtinglargeamountsofitsPublicSectorInformationonline.
2. Publishing Public Sector Information (PSI) promotes transparency andaccountability,supportsevidenced-basedpolicy,createssocialandeconomic
value,achievesefficienciesandcanimprovethequalityofdataitself.
3. As this happens we need to build data portals sites like data.gov.ukcatalogue the databeing releasedand offer a rangeof additional services.
Theseportalsactasunifiedpointsofaccesstofindthepublisheddata.
4. data.gov.euwouldbeanEUportalthatwouldaggregatenationalefforts.Datafrom across the EU would be catalogued and discovered using common
standards. It would showcase applications that exploit the data. It would
connectcommunitiesofdevelopersandusers,organisationsandindividuals,
privateandpublicbodiesfromacrosstheEUastheyuseandexploitthedata.
5.Challenges include a scarcity of PSI in some member states data.gov.eucouldactasastimulushere.Therearealsowell-understoodfinancial,legal
and organizational challenges around the publication of PSI. Political
leadershipand Public DataPrinciples can dealwith theseemphasizing the
advantagesandbenefitsofPSIpublication.
6. Technical architectures exist that would make the implementation ofdata.gov.eu relatively straightforward. A modest programme of resource
coulddelivertheportal.IfprocuredandmanagedinanagilefashionasaBeta
site(oneundercontinuousrefinement)itoughttobedeliverablein6months
atacostof500K-thereaftermaintenanceofthebasicsiteataround200K
peryeardependingonthespeedatwhichnationaldatabecameavailable.
7. Commissioninganinitialproject inthisareaneednot beexpensive,wouldprovidevaluableservices,deepenourunderstandingofhowtopublishand
exploitEUPSI,buildacommunityofusersanddeveloperscapableofcreating
andderivingvaluefromthedataacrossallEUMemberStates.
-
8/6/2019 Towards an EU PSI Portal
3/39
3
2 Introduction
AnumberofEUMemberStates(e.g. data.gov.ukintheUK)andelsewhere(the
USwithdata.gov)havedeveloped orare now in the processofcreatingopen
datacataloguesandportalsofPublicSectorInformation(PSI).Thisapproachis
also beginning to be implemented in a limited number of municipalities and
regions (e.g. Catalunya dadesobertes.gencat.cat, Piemonte dati.piemonte.it).
These initiatives are beingundertaken by the public sector and are providing
accesstoabroadandsubstantialsetofgovernmentdata,makingdataaccessible
inelectronicrawdataformats(e.g.XLS,CSV)allowingforitsimmediatere-use.
These initiativeshave attracted widespread interestand are highlighting how
publicdatacanbemademoreaccessibleandavailableforreuse.Thesettingup
ofadata.gov.eu,anEUportalaggregatingnationalportals(UK,FR,ES,etc.)could
becomeanEUflagshipinitiativeallowinggovernments,companiesandcitizens
to easily find, understand, and re-use data created and maintained by the
European institutions and Member States. This report is a scoping paper
intendedtoofferguidance astowhat the initiative couldoffer, the services it
might provide, as well as the problems to be surmounted and a preliminary
tentativeprojectplantoimplementaworkingsystem.
Specificallythereportaddressesthefollowing:
1. Itdetermines the potential European addedvalueofsuch an initiative,identifyingpotentialbenefitstoEuropeanre-usersandcitizens
2. It identifies the different potential types of services the EU portalcould/shouldeventuallyprovide.
3. Itidentifiesthemaintechnical,financialandorganisationalchallengestheestablishmentoftheportalwouldentail
4. It presents a tentative timeframe and proposes intermediarysteps/milestones(feasibilitystudy,pilotproject,etc.)
3 data.gov.eumotivation
The past 18 months has witnessed the emergence of a number of open
governmentdata(OGD)initiatives.Twoofthemostdevelopedaretobefoundin
-
8/6/2019 Towards an EU PSI Portal
4/39
4
the UK and the US. However, Member States, regions and cities are also
embracingopendatainitiatives1.Themotivationsvarybutanumberofbenefits
havebeenadvancedfortheseOGDprojects.
The first of these is transparency simply put the provision of detailed
informationrelatingtoEuropeancommoninterestssuchastaxation,spending,
education,transport,energy,environment,crime,healthetc.enablescitizensto
be better informed and to be able to make comparisonswithin and between
states.
Related to this is accountability information in these sectors holds the
providersofpublicservicesaccountablefromspendingoninfrastructuretothe
timeliness of transport, from death rates in hospitals to employability of
graduates.
Open data is also the basis for evidence-based policy this provides EU
institutionsandMemberStateswiththeabilitytobasetheirpolicydecisionson
empiricaldatadatathatisopentopublicscrutinyanddebate.
TheprovisionofopengovernmentdataacrossMemberStatescanalsogenerate
socialvaluesocialvaluearisesfromthepublicgoodtowhichthedatacanbeput. This can range from community action to remedy noise pollution to
equitable access to public transport, from identifying those disadvantaged by
poorITinfrastructuretoprovidingthemeansforpatientstoshareinsightsand
experience.
Aswellassocialvalueopengovernmentdatacangiverisetoeconomicvalue-
openpublicdatacanbeaggregatedandenrichedtooffernewservicesthatcan
generaterealeconomicreturnsexamplesincludelookingatspendingpatternstodetermineintelligentprocurement,buildingsmarttransportationapplications
frompublicallyaggregatedtime-tabledataacrossEurope.Theseapplicationsare
built outside of government procurement and exploit data in ways not
anticipatedbythecollectingagency.
1PublicSectorInformation(PSI)DataCatalogues(bygovernments)
http://www.epsiplus.net/psi_data_catalogues/category_1_public_sector_informa
-
8/6/2019 Towards an EU PSI Portal
5/39
5
Opendataalsoofferstheopportunitytoachieveefficienciesthroughoutpublic
servicesandgovernment,betweenandwithinMemberStates.Examplesrange
frommoreefficientprocurementtoreducingthecostsofservicingFreedomof
Informationrequests,fromcuttingthecostofpublishingdatainamyriadofnon-
machine readable formats toreducing the cost of locatingand retrievingdata
fromacrossMemberStates.Opendatacansupportmoreforless.
There is also growing evidence that open data can improve data quality
publicdataisoftenincomplete,outofdateorofvariablequality.Crowdsourcing
data improvement is possible once data is openly published and a feedback
meansprovided.Whetheritisthelocationofbusstopsorpotholesinthestreet
thepublicprovidesapowerfulresourcetolocate,identifyandreporttheactual
factsofthematter.
Above and beyond these general benefits there are a number of particular
potentialbenefitsthatwouldaccruefromapanEuropeandataportal.
ProvidesasinglepointofaccessforEuropeanpublicsectordata. PromotesaracetothetopasknowledgeofsuccessfulOGDinitiatives
stimulatesnewinitiativesinotherMemberStates.
Supports interoperability across processes thanks to the greateravailabilityofdata.
Provides better comparability of EU 27 information and data bothbetweenMemberStatesandforthepurposedofwiderglobalcomparison.
ReducesEUadministrativecosts. Eliminatesorcutsdownre-publicationcostsofofficialEUinformation. Provides planning and monitoring resources for those companies that
operateacrossEUborders.
SupportstheEuropeaninnovationprocess. Facilitates the harmonisation of standards and guidelines for open
governmentdataacrossEurope.
4 data.gov.euservices
Inthissectionwereviewthetypesofservicesuchaportalcouldprovide.Two
classesofservicewouldbecentraltodata.gov.eufirstlythelocatorservicesthat
-
8/6/2019 Towards an EU PSI Portal
6/39
6
usemetadata tocatalogue anddiscoverrelevantcontentandsecondlyrecords
creationthatsupportthecreationanddepositionofdataassetsintotheportal.
OnemightassumethatGovernmentsandtheiragenciesmaintainthedata.This
iscurrentpracticeatalreadyestablishedportalse.g.data.gov.uk(SeeAppendix
A). However, in themix of practice we see proposals for the development of
servercloudstooutsourcedatasets.Thuscloud-hostingdataatEUlevelmight
changesomeoftheassumptionsbelow.Wearealsowitnessinganumberofnon-
governmental agencies and groups aggregating national datasets, such efforts
but not explicitly supported by national governments (e.g. in Ireland
opendata.ie).
4.1 LocatorServices
When referring to locator services we generally mean the set of human,
organisational,andtechnologicalfacilitiesthathelppeoplefindinformation.A
high level view of the locator infrastructure is presented in figure 1. This
envisages a situation where catalogues are harvested or provided by the
respectiveMemberStates,andaresynchronisedbyamediatorservice(although
thegeneralarchitecturecouldalsosupportthesituationwheredataisharvested
orsuppliedbyagenciesotherthanEUMemberStates).Theroleofamediatoris
essentialifwearetoprovidemorethanasimplecatalogueofcatalogues.Priorto
publication,therecordsoughttobepreparedsoastosupportglobalretrieval,
multi-lingual search, and facilitate recommendation processes that utilise
schematicandotheralignmentsbetweenthecatalogues.Afundamentalaspectof
any portal is to support the developmentofa community ofusers capable of
providingrelevancefeedbacktotheretrievalmodels,exploitthedatainarange
ofapplicationsandfurtherenrichitbyinterlinkingtoothermaterial.
-
8/6/2019 Towards an EU PSI Portal
7/39
7
Figure1:data.gov.eulocatorservicearchitecture
Wecan see how someoftheseserviceshavebeenprovidedinthe largestEU
Open Data project data.gov.uk - a technical summary and topology of the
physical architecture employed is provided in Appendix A. The functional
componentsinclude
1. a data registry from a customised version of CKAN (http://ckan.net/)whichprovidesanopenregistryofdataandcontentpackages,
2. afrontendContentManagementSystem(CMS)usingDrupalapowerfulopensourceCMS(http://drupal.org/)(v.6),
3. a search engine provided by Apache SOLR an open source enterprisesearchengine(http://lucene.apache.org/solr/),
4. aLinkedDatacapability(http://data.gov.uk/linked-data)andhostingenvironmentprovidedbyTalis(e.g.
http://api.talis.com/stores/ordnance-survey/services/sparql)andTSOs
FourStoreservices(e.g.http://gov.tso.co.uk/transport/sparql).
5. Additionalservicesforend-usersupportincludeblogs,wikis,forums,andmailinglists.
CKAN is themain mediator service of data.gov.uk and the primary route for
publisherstoincludetheirdataintothecatalogue.Thegreatmajorityofdatasets
remainhostedonthepublisherssite,notdata.gov.uk.Thearchitectureisfurther
splitintotwomainsubnets:asubnetusedforprimaryconnections,andabackup
subnetforfail-overconnections(moredetailsareprovidedinAppendixA).
-
8/6/2019 Towards an EU PSI Portal
8/39
8
Afundamentaltechnicalchallengeofdata.gov.euconcernstheoriginalformatsof
thecataloguesfedintothemediator,theirlinguisticstructure,andtheoriginal
schema used tomodel them.Global schemamightbe the preferred option in
such cases. A timeframe towards common implementation of core datasets
wouldbedesirable aswould agreed taxonomies andontologies forclassifying
specificelementsofthecatalogues.
In the following sectionsweoutline the processes of record creation, schema
design, and discuss support for LinkedData,andmultilingualism. Services for
end-user support and collaboration, including discovery, retrieval, and
exploitation of data, are presented later in the report. Facilities for data
enhancementandevaluationarealsocovered.
4.2 RecordsCreation
Creationanddepositionofdatarecordswillgenerallyoccuralongsideorduring
the period and process of collecting and manufacturing the data assets. The
practice starts with the public sector body in charge of producing the data
resourceandendswiththereleaseofthemetadatarecord,andpotentiallythe
resourceitself,totheonlineportal,oritsmediatorservice(asinfigure1).
Asthesizeofthecataloguesexpandsovertime,thecostandmaintenanceeffort
willincrease.Keepingup-to-dateandconsistentthousandsortensofthousands
of records will demand careful use and choice of technology. Versioning and
recordcleansingtoolswillbeanimportantclassofservicesandtheseshouldbe
addresseddirectlybyanyEuropean-wideinitiative.
It is unlikely that a central service would cover and support all European
agenciesthroughouttheentirelifecycleoftheirrecords.Butsuchaservicecould
prove valuable for agencies to use. Additionally, it would be useful for
administeringandcontrollingtheflowoftherecordscreationprocess.
Versioningtools,recordcleansingandrefinementapplications(e.gGoogleRefine
http://code.google.com/p/google-refine/),RDFconvertors,automaticclassifiers
and recommenders for keyword or category related metadata, could be
employed or unified at a central location. Promoting their use would foster
better classification of records, faster and seamless integration, and an
improvement of metadata elements (such as misspelled, redundant and
-
8/6/2019 Towards an EU PSI Portal
9/39
9
ambiguoustags).Further,providingaunifiedbundleforagenciestouse,either
as an integrated network-service or as a package for in-house deployment,
wouldalleviatethetaskofamediatortocleansetherecordspriortopublication.
5 SchemaDesigns
While access mechanisms, such as database technologies and user interfaces,
mightchangeovertime,certainfundamentalaspectswillpersistandcarrylong-
term implications. Citizens searching for information resources will generally
always need information to guide their search.What is the data about? Who
distributesit?Whenandwherewasitaccruedandpublished?Howfrequentlyis
itupdated?Successfuldeploymentandcontinuousevolutionofapan-European
portal will be interlinked with the ways in which such characteristics of the
informationareexposedtosearchingweneedstableschemas.
A catalogue schema should provide a common semantic basis for meaningful
searching, encapsulate thesalient featuresof information andbesufficient for
user needs. At the same time, the schema must remain flexible so as to
accommodatelocalextensions.MemberStatesmightwishtomodifyschemain
waysthattheythinkisbesttoservetheirownusergroups.Diversity,therefore,
mightbe commonplace, especially as the portal starts to integrate increasing
numbersofcatalogues, astheyaremadeavailable.The roleofamediatorwill
remain prominent for semantic mappings between one metadata perspective
andanother.
The w3c e-government group has undertaken work to develop a cataloguing
schemaforgovernmentcatalogues.TheoutcomehasbeentheDCATontology2,
andwhichwasdeployedbydata.gov.uk.Thevocabularyisdetailedinfigure2.
DCAT proposes a unified format for publishing the contents of a government
catalogue and proposes the use of several open vocabularies, such as FOAF
(www.foaf-project.org/), Dublin Core (dublincore.org/documents/dces/), and
SKOS (www.w3.org/2004/02/skos/). There is a clear separation between
metadataandprovenanceinformationrelatedtoadataset,thecataloguerecord,
and the catalogue itself, which is also depicted as a unique element in the
ontology.
2DataCatalogVocabulary,DCAT,http://vocab.deri.ie/dcat-overview
-
8/6/2019 Towards an EU PSI Portal
10/39
10
Figure2:TheDCATontology(recreatedhere)
DCAT comprises a breadth ofmetadata, whichhas proved adequate for most
purposes.Thecurrentmetadataadvicefordata.gov.ukisprovidedinAppendixB
andalignswiththeDCATontologyhere.
Clearly there is a substantial amount of experience to draw on in this area.
AlthoughtheUKmetadataschemaisitselfworkinprogressitprovidesauseful
initialdesignreferencepoint.Nevertheless, therearestillaspectsforpotential
improvementandmetadataschemasshouldbeseenassubjecttoevolutionand
improvement.Someoftheareasfordevelopmentaresummarizedbelow.
Although Accrual Periodicity appears to be modeled, users might also beinterested in the exact methods of accrual of the data, otherwise called
collectionmode.
Fullerdescriptionofthecontributorsofthedata:authors,publishers,thirdpartiesinvolved,andmaintainingprovenancetrailswhentheresponsibility
fordatacollectionchanges.
-
8/6/2019 Towards an EU PSI Portal
11/39
11
Informationaboutthe cost ofcreationand dissemination,and liabilitiesofuse(ifnotcoveredbyanyopendatalicensesused).
Connectionstoataxonomyforrepresentinggeographicalboundaries,suchasfordc:spatialorcoverage.Thiswillenableseamlesstopologicalintegrationof
therecords,animportantfactorforcross-statesearchandretrieval.Related
workhasbeenongoingaspartoftheNUTS3hierarchicalclassificationforthe
economicterritoriesoftheEU.AmorerecentinitiativeattheUniversityof
SouthamptonhasintegratedtheNUTStaxonomywithRDFandLinkedData
formats4.
Need to representmeta data relating to large statistical datasets. There isexistingwork topublish large-scalestatistical datasetsasRDF and Linked
Data5.Thishasarisenfromeffortstointegrateexistingstatisticalmetadata
schemasuchasSDMXwiththeW3CLinkedDatastandards.
Theproposedclassificationschemesforessentialelementsintheschemashould
ideally be adopted over time by the local agencies. For example, a uniform
format for temporal information ISO-8601, or the aforementioned NUTS
taxonomy.Thiswill,again,alleviatethetaskofthemediatorandreducepossible
ambiguities.Considerationsheremaybealignedwiththosediscussedinsection
3.2onRecordsCreation, sincepartoftheproposalistofindpathwaystoassist
agenciesinthecorrectdeliveryofrecords.Itisimportanttopromotethewidest
possibleapplicationoftheschemaatthepointofcreatingtherecords.
6 TheRoadtoStardom-SupportingLinkedData
InpromotingthepublicationofOpenGovernmentDatatherehavebeenavariety
ofpracticestodate.AcrosstheEUMemberStatesitislikelythattherewillbea
widevarietyofformats.InordertosucceedanyEUdataportalshouldacceptall
structureddataformats.
3Nomenclatureofterritorialunitsforstatistics,NUTS,Eurostat,
http://epp.eurostat.ec.europa.eu/portal/page/portal/nuts_nomenclature/introduction4GianlucaCorrendo,ManuelSalvadores,YangYang,NickGibbins,andNigelShadbolt.Geographicalservice:acompassforthewebofdata.InWWW2010
Workshop:LinkedDataontheWeb(LDOW2010),April2010.5http://data.gov.uk/resources/coins
-
8/6/2019 Towards an EU PSI Portal
12/39
12
Neverthelessbestpracticesuggeststheadoptionofamethodologythatsupports
progressiontomoreusefulformatsintermsofdataintegrationandenrichment.
The'fivestar'methodologyoutlinedbySirTimBerners-Lee6hasbeenadopted
withindata.gov.ukthisoutlinesapragmaticmeanstoprovideforaprogression
throughincreasinglymoreflexibleformats.
ifthedataisavailableontheWeb(anyformat)
ifthedataismadeavailableasstructureddata(e.g.thexlsoutput
fromExcel)
ifthedataismadeavailableusingopen,non-proprietorialstandard
formats(e.g.thecommaseparatedvaluesCSVformat)
if URIs are used to identify those things the data represents (so
peopleandmachinescanpointatyourdata)
if the data represented as in 4 is linked to other peoples data
(LinkedDataapproach)
Afeatureofdata.gov.ukhasbeentheadoptionoflinkeddataprinciples.Whilsta
majority of datasets are submitted as 3 star data the UK effort has seen the
construction of a digital estate that comprises a set of URIs describing the
objects of interest. The paradigm of Linked Data is a principled approach to
connecting structureddataontheWebusingtyped links i.e. linksthatenclose
meaninganddistinctfunctionality.
TheadoptionofLinkedDatastandardsforopendataisgrowingrapidly7.There
hasbeensignificantadoptionbylargeorganisationssuchastheBBC,theLibrary
ofCongress,theGuardian,andotherPSIportalstheUSequivalentdata.govand
UKsdata.gov.uk.Itisenvisagedthatintimeallthesewillbelinkedtogether,in
ways thatwill allowuserstostartbrowsing inone datasource and gradually
navigate along links to related data sources. The development of data.gov.eu
could represent a leading initiative towards this goal.The adoptionofLinked
Dataprinciplesto federateEuropeancatalogueswithsimilarinitiativesaround
theworldwillpavethewaytothenextgenerationlocatorservices.
6LinkedOpenDataStarScheme,http://lab.linkeddata.deri.ie/2010/star-
scheme-by-example/7http://linkeddata.org/andhttp://en.wikipedia.org/wiki/Linked_Data
-
8/6/2019 Towards an EU PSI Portal
13/39
13
OwnershipoftheURIsontheWebofData(sometimesexpressedsimplybytheir
namespace)iscrucialasisprovenanceandpersistence.Thisappliesnotjustto
thedomaincontentwithinthedatasetsbutalsotothemetadataandcatalogue
URIs.ShouldrecordandmetadataURIspointbacktotheportalsofthemember
states,orshouldtheybecentralisedattheEUplatformi.e.bemodeledwiththe
EU namespace? This is a decision that will drive some parts of the general
architecture.
Whateverthedecisionithighlightsapotentiallyimportantpairofrolesforapan
EUportal.
1. An authoritative registry of conceptual models (e.g. the definition of aterms in domains such as transportation or spending). This would
facilitate analysis,exploration andapplication developmentacross data
setscontributedbyseveraldataproviders.
2. Anauthoritativeproviderofcodes,identifiers,URIs(e.g.toseewhatthismightlooklikeseetheURIsforadministrategeographyprovidedbythe
UK Ordnance Survey http://data.ordnancesurvey.co.uk/) that could be
reused by national initiatives when structuring their own data. This
would make it possible for information about the same entity to beretrievedacrossindependentlydevelopeddatasets.
7 Multilingualism
Linguistic and cultural diversity is a key feature of European identity; it
comprisesthesharedheritageofanintegratedEuropebutpresentsachallenge
to media and content online. The European Union is a continent of many
languages.Therearecurrentlymorethan23officiallanguagesintheEU,with60
orsoindigenous,regionalandminorityones.
While the learning and dissemination of European languages is encouraged,
multilingualism in many digital communication technologies remains a
challenging task. Work on multilingual lexicons and research on language-
independentretrievalmodelsarebeginningtoreceiveattention.Examplesare
the increasing availability of electronic lexicons and translators, provided as
-
8/6/2019 Towards an EU PSI Portal
14/39
14
open source and using international standards, such as the EuroWordNet
database8.
Member States should have different choices for the languages of their
cataloguesand,thetermsandelementswithintheirdatasets.However,thenwe
face the challenge of catalogue translation between the different European
languages.Multilingualsupportwouldbeaprominentassetofdata.gov.eubutat
thesametimeitrepresentsanenormouschallengetoovercome.
Severalquestionsneedtobeaddressed.Howmanylanguagesshouldtheportal
support? Which parts of the catalogues are most essential for translation to
reach aminimum functional stage? What are themost effective ways to use
technology to implement multilingual support in the portal? Can we reduce
complexitybycoordinatingamanualtranslationphase?
Thereisacasetobemadeinthefirstinstanceforahubandspoketranslation
methodology. For catalogue schema and the associated taxonomieswewould
requirethattheinitialcreationoftherecordalsosuppliedatranslationmapping
ofthetermsintoEnglish.Thiswouldenablecoresearchmethodstomapintothe
native datasets in a relatively straightforward manner. More sophisticated
translationsupportcouldbedevelopedsubsequently.
8 SearchandRetrievalServices
Duringthelast30orsoyears,workoninformationretrievalhasgrownbeyond
itsprimarygoalsofindexingandsearchingfordocumentsinlibrarycollections.
Increasinglyweseetheimportanceofintelligentrankingalgorithmstoimprove
the qualityofanswer sets,studyingthe behaviors ofusersand understanding
their needs, delivering improved graphical interfaces and collaborative
frameworks.
Severaltechnicalapproacheshavebeenpursued;relevancefeedbacktoembody
user satisfaction about the retrieval process, exploiting document semantics,
query log analysis to correlate user requests, query expansion via local and
globalgraphanalyses,useoftermassociationandmetricclusters,development
of similarity and statistical thesauri. If we take account of this rich range of
8http://www.illc.uva.nl/EuroWordNet/
-
8/6/2019 Towards an EU PSI Portal
15/39
15
informationretrieval approacheswecanimproveuser satisfactionbyfulfilling
themostnaveandatthesametimethemostcomplicatedinformationrequests.
The development of data.gov.eu will bring expert users and regular citizens
together and require a robust and collaborative retrieval model. The rich
semantic structure of government catalogues, coupled with flexible retrieval
models and rich user interfaces, has the potential to deliver breakthrough
servicesforthePSIneedsofEuropeancitizens.
Weoutlinebelow some of the fundamental search and retrieval functionality
neededtohelpassistusersindealingwiththevolumeandvarietyofrecordsat
such a portal. In the next section we describe additional recommendation
services.
Facetedsearchandwaystoprogressivelyrefinesearchrequests.Exploitandcombinetherichmetadataofthecataloguestoprovideintelligent,yetsimple
search utilities i.e. be able to get precise results on a guided search for
hospital listings per capital or country in Europe, retrieve records about
accommodationfacilitiesandrestaurantsrefinedbycountryortown,etc.
Embody user feedback directly in the retrieval/search process. There areseveralwaystoaccomplishthissomeinvolvealgorithmicapproaches,such
asqueryexpansionviarelevancefeedback,othersrelyoninterfacefeatures,
suchas iterativeand interactive search.Probabilistic retrievalmodels also
incorporateuserfeedbackintherankingprocess.
Access to previous search requests. Browsing history and key conceptsvisited throughout the users stay. Let users save their history in their
profiles. Let them view their history (primarily consisting of requests and
observedrecords)asatag-cloudandgetrecommendationsfromit.
Feedsforpreviouslyviewedrecordsanddownloadeddatasets. ProvidealistofdatasetsthatcannotbereleasedthisaddressesaFreedom
of Information (FOI) issue encountered in the UK and likely to exist
throughouttheEU.Howcanyouaskforwhatyoudontknowisthere.Asset
inventoriesshouldbeavailablewiththeirstatus.
Better tag-cloud navigation. Recent analysis of tag-clouds coming from anumber of available portals data.gov, data.gov.uk, data.australia.gov.au
reveallargeamountsofagency-definedindexterms,coveringawidearrayof
topics.Onaverage recordscomewith about8 suchkeywords,whilesome
-
8/6/2019 Towards an EU PSI Portal
16/39
16
records contain more than 100 pre-compiled keyword terms. These are
usefulandcanbeexploitedforimprovingnavigationofthetag-clouds.
Amultilingualretrievalservicebeabletosearchinmultiplelanguages.
9 RecommendationServices
Arangeofrecommendationservices9wouldbepossibleinaportalofthissize
andambition.Forexample,
SimilarusersalsodownloadedAmazon-stylerecommendations.Someuserswill have a longer and more interactive experience with the portal than
others.The systemcanprovideassistancetonaveusersornewcomersby
correlating theirprofileswith expert usersusing features of the resources
they search for, records they view, and the datasets they download. This
implicitcollaborationofexpertwithinexperienceduserswillcertainlyhelp
individualstomineresourcesthatarenotreadilyavailableorrankedhighon
theirsearchresults.Recommendationsofthistypemaybedeliveredinthe
formofseewhatsimilarusersaresearching,orwhiletheuserisviewinga
specificrecord.
Recommendationsofsimilardatasetsforlocationsnearbytheusersspecificgeography.Thiscanbeemployedonaper-recordrecommendationbasisi.e.
offertherecommendationwhiletheuserisviewingaspecificrecord.
Popularandfeatureddatasetspresentedonthemainpage. Supporting network formation of people like me is another
recommendationmethodthatiswidelyandsuccessfullyused.
10Dataexploitation
Theportalshouldfeatureservicesthatenableexploitationofthedata.Forfull
engagementthereareanumberofdistinctaudiences.
9DietmarJannachetal,(2010)RecommenderSystemsAnIntroductionCUP,
ISBN:9780521493369
-
8/6/2019 Towards an EU PSI Portal
17/39
17
Organisations that have the in house expertise toextract value from thedata.Forexample,traditionaldataintegratorswhowouldfindmostvaluein
the databeingeasy to find and available todownloadinstandardformats
andwherethesemanticsofthedataareexplicit.
Non-specialists who have an interest in the data but are not specialistprogrammers. Here a collaborative, simple module for putting together
mashupsofdatasetsiscalledforonethatallowsfordiscussionaroundthe
dataand itsproperties.Simpleblogsandlinksto thedatasetscansuffice
seeforexamplehttp://www.datamasher.org/node/106
DeveloperswhoaremovingtoLinkedDataandW3Crecommendations 10 here the data would need to be available as well-formed RDF with well
designedURIseitherasfilesoravailablefromSPARQLendpoints.
Developers who appreciate the data in accessible forms that do notnecessarilyrequireLinkedDataexpertiseseeforexampletheLinkedData
APIofdata.gov.ukthatdeliversJSONandotherforms11.
Linkstoscriptsforconvertingandmanipulatingthedatashouldalsobemade
availablewherepossible.Further,thecataloguesshouldbeintegratedwithlocal
datasites,eveniftheyofferredundantmeansofpresentingdata.Available
mashupsandothervisualisationsofthedatacouldalsobelinkedoffthemain
site(e.g.data.gov.uk/appsandwww.data.gov/developers/showcase).
11DataEnhancementandImprovementServices
11.1EvaluationServices
Member States and Public sector bodies will need to undertake regular
evaluationsinordertodeterminethefuturestrategiesthatshouldbeadoptedby
providersofPSIThesemaybebasedaroundthequality,value,andquantityof
recordsprovided.Thefivestarreferredtoearlierinthisreportisaproposal
thatofferspragmaticguidanceinthisprogression.Further,amodulewithvisual
aidstoviewtheevaluationswillalsobeusefuli.e.barcharts,linecharts,league
tables.
10http://www.w3.org/standards/semanticweb/data11http://www.epimorphics.com/public/presentations/ogdc-
slides/ogdc.html#(1)
http://code.google.com/p/linked-data-api/
-
8/6/2019 Towards an EU PSI Portal
18/39
18
11.2CommunityDevelopmentandUserInvolvement
Therecruitmentanddevelopmentofanengageddevelopercommunitywasone
of themajorsuccesses ofdata.gov.uk the community ofdevelopers actedas
criticalfriendsasthesystemwasbeingdevelopedandprovidedmuchofthe
inputtorefineandenhancethelook,feel,andservicesofferedbythesite.
Inordertosupportthecommunitiesthatwillnaturallyformaroundtheportal,
obviousfunctionalityincludes
User profileswith groupmembership, user networks, shared blogs, wikis,email lists and fora. In supporting these functionalities it is important to
recognizetheimportanceofactiveadministratorsandmoderators.
User ranks provide kudos for those who are the main and most valuedcontributors.
AcollaborativetaggingmodulethatallowsuserstoapplytheirowntagstothedatasetsDel.icio.usstyle
Ask the experts module discussion forums are essential elements fordisseminatingbestpractice.
Itis important toremember thatdevelopersandusersofPSIcareabouttheir
localcontext first and foremost. Theirprimary interest is likely tobe localor
hyperlocal12,regional,national,andsupernationalinthatorder.Anycommunity
developmentneedstobearthisconsiderationinmind.Itshouldalsopayclose
attentiontohowdevelopersthemselvesareusingthedata.
12Challenges
There are a number of significant challenges that present themselves when
considering a pan EUportal of the sort described here. Technical challenges
have been addressed already in this report although it is useful to outline
additionalspecificitems.Asecondsetcouldbeconsideredessentiallyfinancial
or resource specific. Finally there exist a set of challenges around the social,
organizational,legalandpoliticalaspectsofapanEUportal.
12http://openlylocal.com/hyperlocal_sites
-
8/6/2019 Towards an EU PSI Portal
19/39
19
12.1OpenData
PerhapsthesinglemostsignificantchallengeisthescarcityofnationalEUOpen
GovernmentData initiatives. The largest nationalefforts are still restricted to
particular English speaking democracies; UK, US, Australia and New Zealand.
There are however a significant number of regional EUopen data initiatives.
Moreover,thereareanumberofEUnationalinitiativesplanned.Butitislikely
that initially the EUportalmighthave toserveas the primary datacatalogue
whilst national portals are commissioned. In some ways this is a strong
argumentinfavouroftheEUportalsincetherearemanynationaldatasetsthat
arereadyandavailableforpublicationbutasyetnonationaldepositoryexists.
12.2Technical
Thelargesttechnicalchallengeisaroundthecharacterizationofanappropriate
portalarchitectureandthesetofassociatedservices.Thisreporthasreviewed
thelargestextantsuchportaldata.gov.uk- itprovidesanexistenceproofthat
the basic functionality can be realized. It would seem a good candidate for
implementationatleastinanypilotsystemthatwascommissioned.
TheparticulartechnicalproblemsrelatingtothepanEUportalarearoundthe
multilingualfacilitiesandtherewouldneedtobecarefulconsiderationaround
the initial functionality offeredhere.Thisreporthas argue foraminimal hub
andspokesolutioninthefirstinstance.
Itishardenoughtoestablishconsistentdatareportingstandardsandformats
withinexistingnationalbordersandadministrations.Thesewouldbemultiplied
across the EU and there would need to be considerable political will and
leadershiptosupportthistechnicalharmonization.
Itisimportanttobuildevaluationandassessmentintothearchitecturefromthe
outset.Thisallows for the ability tomeasuredatareuseand establishessome
notiononthereturnoninvestmentorsavingsthatmightresult.
Iflinkeddataistobesupportedorelseifdatasetsaretobehostedbytheportal
there is the questions of providing sufficient computational resource.
Infrastructureneedstobepaid for andmaintained.Largedatapublishersare
outsourcing someof the datahosting tocloudproviders.However, ifSPARQL
-
8/6/2019 Towards an EU PSI Portal
20/39
20
endpoints are tobesupported allowingdatasetstobequerieddirectlyover
thesecanimposesubstantialcomputationaloverheads.
12.3Financial
Onefinancialchallengethathastobefactoredinbyanydatapublisherconcerns
datasetsthataretodaydistributedforafeeandrepresentasourceofincometo
some administrations. Some of these provide local economic benefit but the
greaterbenefitofreleasingthedatatoalargeraudienceandsetofapplication
developershastobeconsidered.Somedatasetsrelatingtocompanyregistration
orelsegeography are particularly important in fusing togetherotherdata if
thiskey connectivedata is restrictedbyfeesand licences then the benefitsofopendataneveraccrue.
Theservicewillhaveacosttosetupandrun.Thereisalsoacostintermsofthe
capabilitydevelopmentwithinpublishers.Peoplehavetobegiventheskillsto
generateopendata.
12.4Legal
Asubstantialchallengewillexistaroundvariationsanddifferencesinlegislation
across Member States. For example, basic data access and storage law varies
across the EU. This willmake data publication a challenge for someMember
States.
Anothersubstantialchallengeisstrikingtherightbalancebetweentransparency
and open data and ensuring that privacy is maintained. This might involve
careful redaction of items in datasets that could identify individuals for
exampleinthepublicationofcrimestatistics,orspendinginformation13.
A fundamental precondition to the effective implementation of OGD is open
licensing.Experienceinmanycountriesisthatdevelopersarediscouragedfrom
usingdataiftherearerestrictionsaroundit.Thesecanvaryfromfeesthatare
charged, through to the assertion of deriveddata rights over the data.OGD
flourisheswheretherearenoimpedimentsonthereuseofdata.TheUKOpen
13http://data.gov.uk/blog/transparency-and-privacy-review-announced
-
8/6/2019 Towards an EU PSI Portal
21/39
21
GovernmentLicenseisoneofthemostsignificantachievementsintheUKeffort
andservesasan interesting template14.Licensesneedtobeopenforthedata
setsandalsofortheassociatedmetadata.
AnopenlicensepolicywouldseemtobeaprerequisiteforthepanEUportaland
essentialifwearetomaximizereuseofdataassets.
12.5SocialandOrganisational
Although implicit when discussing the services needed for developers and
citizens it must be explicitly recognized that there needs to be active
involvementofbothifwearetobuildviablecommunitiesofusersofthedata.
Thiswillrequireactivemanagement.
AtthepanEUlevelitislikelythatarangeofsuppliersofsoftwaresystemsand
serviceswouldfindthedataofsignificantinterest.Theinclusionofvalueadded
material and services from private enterprise will be an important success
criteria.Theirinvolvementisimportant.
Inbuildingcommunitiesofengagedstakeholdersaneffective communications,
disseminationanddevelopmentprogrammeisneeded.Whetherviaconferences
oronlinecompetitions,schoolsorUniversities,throughpublicorprivatesector
bodies a feature of OGD initiatives to date has been vibrant community
involvement (e.g. the Government Bar Camp15 movement). The pan EU effort
needstoreplicatethis.
The engagementofthemediahasbeenaparticular feature inOGD.Mediaare
increasinglylookingattheconceptofstoriesfromdatatheGuardianinthe
UK andNew York Times in US have been two prominent proponents of this
approach16. The increasing use ofpublic databy informationaggregators and
14http://www.nationalarchives.gov.uk/doc/open-government-licence/15http://en.wikipedia.org/wiki/BarCamp,http://opengovernmentdata.org/camp2010/,
http://opengovernmentdata.org/camp2010/after/16http://www.guardian.co.uk/news/datablog/2010/oct/01/data-journalism-
how-to-guide,http://data.nytimes.com/
-
8/6/2019 Towards an EU PSI Portal
22/39
22
search engine companies also offers an opportunity for engagement and
dissemination.
The organizational challenges are significant. Not least because many in
participatinggovernmentswillviewOGDassomethingelsetheyarebeingasked
todo.Thebenefitshavetobeclearlyexplainedandthedatapublishershaveto
be acknowledged in this process. Even with clear commitment there will be
variable practice in publication capabilities. We have learnt in the UK that
keeping the publication process close to those who generate the data is
important.Themorehubsand aggregationpoints thatare put in theway the
moredifficulttheorganizationalprocess.
InthecaseofrelativelymaturenationalOGDsitesitmightbesimplyamatterof
harvestingorsynchronizingwiththeseportals.Analternativemodelistoaskthe
datapublisherstosubmittheirdatatotheirnationalandtheEUportal.However,
onewouldneedtobesurethatnoadditionalburdenwasbeingplacedonthe
publishersintermsofmetadata.Thisisanotherreasontofindageneralformof
theDCATschemethatcanbeadoptedacrosstheEU.
12.6Political
ThereisastrongargumentfortryingtoestablishasetofPublicDataPrinciples
(SeeAppendixC). Many of these relate to the policy assumptions around the
publicationofpublicnon-personaldata. Inparticular embracingOGDrequires
themove toapositionthatnon-personal public datawillbepublished unless
thereisaclearreasonnotto.
Finally,andperhapsmostimportantlythemostsuccessfulOGDinitiativeshave
receivedtop-levelpoliticalsupportandleadership intheUKsuccessivePrime
MinistersandintheUSataPresidentiallevel.TheEUisdemonstratingpolitical
leadership and commitment in the statements of EC Vice-President Neelie
Kroes17
17
http://www.epsiplatform.eu/news/news/neelie_kroes_on_eu_open_data_portal
-
8/6/2019 Towards an EU PSI Portal
23/39
23
ExpectationmanagementisimportantandatthesametimethatapanEUportal
isbeingbuilttheCommissionsowndatashouldbemadeavailable.NotdoasI
saybutdoasIdo.
Thereisgreatvalueinpublishingallthedatasetsonecanfindontheprinciple
thatotherswillfindapplicationsfordatathatgovernmentshadneverthoughtof.
Thisunanticipated reuseofcontentwas one of themajor lessonsofWeb 1.0.
However, public interest and enthusiasm is maintained when high value
datasetsarereleaseddatathatthepubliccaresabout.IntheUKthishasbeen
around how the Government spends taxpayersmoney, crimeand healthdata,
data relating toschoolsand transport etc.One should aim inpan EUwork to
surfacesuchdata.Indeedoneofthereasonsforhavingsuchaportal isthatit
producesanenvironmentinwhichcitizensofoneMemberStatecanaskwhyif
otherscanhavecertainpublicdatareleasedwhycantthey.
13TimeframeandDeliverables
Despite the complexities of a pan EUportal given the rapid developments at
national and regional level there is a strong argument for launching a pilot
projectquickly.
However,oneoftheimmediatechallengesthatfollowsonfromthecurrentlack
ofEUnationalportalsisestablishingwhethertogoforabroadbasedcatalogue
where the portal acts as an official depository whilst national efforts get
underway or else to select a number of official or otherwise recognized
nationalsitestoactasfeeds.
Whichever approach is takenabasicset ofdeliverables outlinedbelowwould
allowfortheestablishmentofadata.gov.eupanEUportal.Thisshouldcomprise
thebasicelementsoftheservicesdescribedinthisreportandsowouldneedthe
followingas intermediate tasks, subtasksandmilestones (seealso the GANTT
chartinAppendixD)
1. Workpackage1data.gov.eudataandschemadesign1. GuidelinesforpanEUURIdesignschemesimilarinnaturetothe
advice contained here http://data.gov.uk/resources/uris but
allowing for particular design considerations from an EUperspective2personmonths/40days
-
8/6/2019 Towards an EU PSI Portal
24/39
24
2. Initial Data Inventory for inclusion in the portal this willprovidedanoverviewofextantdataanditsrespectiveattributes
formats,licencesandotherbasicmetadata2personmonths/40
days
3. DCATstyleschema for record creation initialbasicschema forEUdatasetrecords1personmonth/20days
2. Workpackage2data.gov.euOpenLicence1. Open Licence for portal content need todetermine if thiswas
requiredaboveandbeyondthelicencesattachingtothenational
datasetsbutcertainlyitwouldbeforthepanEUportalspecific
dataelements3personmonths/60days
3. Workpackage3data.gov.eutechnicalinfrastructure1. A CKAN installation customized for EUdatasets would expect
thistobeinlinewiththatmaintainedforexamplebytheUK1
personmonth/20days
2. Basic search, retrieval and recommendation capabilities sufficienttoenabledatasetstobeidentified,retrievedandsimilar
itemsidentified3personmonths/40days
3. Basic minimal level of multilingual support initially wouldsuggest all submitted data sets additionally have a mapping of
their vocabulary and data schema into English this wouldprovide for a hub-based mapping model that would have
significantinitialutility3personmonths/60days
4. CMS framework selection and design from a number ofalternatives1personmonth/20days
5. BasicWeb2.0toolstosupportcommunicationandcollaborationblog,wiki,posting ofapplications, suggestions for new data and
applications,discussionforum,email lists2personmonths/40
days4. Workpackage4data.gov.euSystemIntegration
1. SystemIntegrationPhaseItoensureallelementsofthesystemandworkflowprocessesintegrated-2personmonths/40days
2. MilestoneDeveloperReleaseofdata.gov.euafter40%ofelapsedpilotproject
3. SystemIntegrationandRefinementPhaseIItoiterateandrefineelementsoftheportal-3personmonths/60days
4.MilestonePublicReleasedata.gov.eu6monthsafterinitiationofproject
-
8/6/2019 Towards an EU PSI Portal
25/39
25
5. Workpackage5data.gov.euLinkedDataInfrastructure1. SPARQL endpoints toprovide Linked Data access via SPARQL
endpointsrunningontriplestoretechnology-1personmonth/20
days
2. Linked Data Support Tools modify and customize variety oflinkeddatapublicationtoolsfortheportal1.5personmonths/30
days
3. CoreReferenceLinkedDataSets-convertanumberofcoredatasetsintolinkeddata2personmonths/40days
4. ExemplarApplicationsLinkedDataFunctionalitybuildexemplarlinked data applications using core reference data 1.5 person
months/30days
6. Workpackage6ProjectManagementandDissemination1. ProjectManagement1personmonth/20days2. DisseminationandCommunication1personmonth/20days
Totalpilotprojectresource30personmonthsor5full-timeequivalentsover6
monthsequatestoaround400Kofstaffcosts(fulleconomiccost)and100
Kforothercostsincludingsystemprocurementuntillaunch.
Ongoing development andmaintenancecosts are difficult toestimate. Inpartbecause the effort required will be a function of the success of the initial
deployment.IfthesiteservesasastimulusforMemberstatestopublishtheir
datathentheamountofdatatobedealtwithcouldincreaseverydramatically
andthiswouldhaveaneffectonthemaintenanceeffortandcouldalsostimulate
requests for new services. A modest ongoing cost of operation and support
wouldbearound200Kperyearoverfivesyearsequatingto1Mandatotal
projectcostofaround1.5M.
The costs are only possible if the system is procured with the following
characteristics : (i) based as open source, (ii) presented as a perpetual Beta
(systemunderdevelopment)and(iii)adoptanopenlicence
Thisshoulddeliverasystemthatcouldbelaunchedwithin6monthsalthough
developersandthewiderdatacommunitywouldbeinvolvedfromtheoutset
sothatsystemmeetstheirneeds.Thesystemwouldhavebasicfunctionalityand
serve to highlight the particular challenges and opportunities afforded bydata.gov.eu
-
8/6/2019 Towards an EU PSI Portal
26/39
26
Based onnational experience building such sites can happen if procured and
managedusingagilemethods.Thisisbestservedbyasmallteamofdedicated
developers with a few officials having the political backing and authority to
surfacetheinitialdatasets.
14Conclusions
ThereisasignificantmomentumbehindOGD.Therearecleareconomic,social
and political benefits. Within the EU Member States there is a range of
experience and some developed national expertise. The number of official
Member State portals is still small but more can be expected in arise in the
courseofthenextyear.
Ifthe EUdoesnothing thereare still likely tobeefforts tocatalogue theOGD
efforts.However,asthisreportandanearlierWorkshop18pointsoutapanEU
portal could act as a stimulus for national efforts. This report considers that
commissioning an initial project in this area need not be expensive, would
provide valuable services, deepen our understanding of how to publish and
exploitEUOGD,buildacommunityofusersanddeveloperscapableofcreating
andderivingvaluefromthedataacrossallEUMemberStates.
15Acknowledgements
Thisreportreflectsthepersonalviewoftheauthor.Hegratefullyacknowledges
input from JamesForrester of the data.gov.uk team and ChristosKoumenides
fromtheUniversityofSouthampton.Thereportalsobenefitedfromaworkshop
hostedbytheCommissionatwhichtheauthorwaspresent.
18http://cordis.europa.eu/fp7/ict/content-knowledge/docs/report-ws-pan-eu-
dat-porta_en.pdf
-
8/6/2019 Towards an EU PSI Portal
27/39
27
AppendixAAnOpenGovernmentDataServiceDeployment:the
data.gov.ukexampleatDecember2010
Architecture
Therearefourprimaryfunctionalcomponentsofdata.gov.uksmainWebservice
(i)thedataregistry,(ii)frontend,(iii)search,and(iv)theLinkedDatahosting.
Other than Linked Data and very occasional use e.g. for the HM Treasury
Comprehensive Spending (COINS)datasets, hostingofdatasets is providedby
thepublisher,notdata.gov.uk
data.gov.ukas-isphysicalarchitecture(excludingLinkedDataservices)
-
8/6/2019 Towards an EU PSI Portal
28/39
28
Thedataregistryisprovidedfromaslightly-customisedversionofCKANwhich
providestheopenregistryofdataandcontentpackages.Thecustomizedversion
bothleadsandlagsfromthepublicversionrunningckan.net,andissubjectto
on-goingdevelopment(especiallywithregardstoimplementingtheECs
INSPIREdirective).CodechangesarereleasedintothemainCKANtreewhen
theyarestable.
The service includes a (read-only) public API at catalogue.data.gov.uk which
returnsJSON,JSON-PorusingtheCKANclientlibrary.ThereisanRDFviewof
thiscatalogue.TheRDFserviceisan"official"service,butitisnotnative,instead
itoriginatesfromanRDF-converteddailydumpfromtheUKGovernmentCKAN.
ItisavailablefromtheAPI,e.g.http://catalogue.data.gov.uk/doc/dataset/coins).
Thereisanargumentfor buildinganativeRDF-enabled service basedaround
triplestoretechnologyratherthanthecurrentPostgresRDBMS.
ThefrontendCMSisprovidedusingDrupal(v.6).Meta-dataaboutthedatasets
ismasteredinandprovidedfromCKANdirectlyexceptforafewitems(e.g.user
ratings,linksbetweendatasetsandrelatedapps,etc.),whichwillbemovedover
toCKANasandwhenappropriate.
ThereisRDFainselectedareasof thefrontend(e.g.datasetdescriptionpages),buta comprehensiveversionwiththehuman-andmachine-readableversions
produced from the same back-end would require moving to a different CMS
(assumedtobeDrupal7,whenthisbecomesstable).
Search isprovidedbyApache SOLR(open sourceenterprise searchplatform),
mastered by CKAN and also written to by Drupal. This is surfaced through
Drupal for the human- readable version; a machine-viewable form is not yet
public.
LinkedDatahostingisprovidedusingTalissandTSOsFourStoreserviceswith
Puelia (the Linked Data API) supplying the joint human-/machine-readable
interfaceontopseee.g.:
http://education.data.gov.uk/doc/school?parliamentaryConstituency.label=Islin
gton%20North
-
8/6/2019 Towards an EU PSI Portal
29/39
29
Coreservices
Servicesspecifictothedatacatalogue.
Currentdatacatalogueservices:
Reading
Human-facing(viaDrupalandCKAN) Machine-facingintheAPI(viaCKANsomeelementsnotyetavailable)
Writing
Human-facing(viaDrupalandCKAN) ImportscriptsfromtheOfficeofNationalStatistics(ONS)andData4NR;
occasionalbulk-imports
Searching
Human-facingstructuredsearch(fromSOLRviaDrupal) Machine-facingintheAPI(fromCKANnotstructured)
LinkedData
AccesstoSPARQLendpointsforkeydatasets APIforLinkedData
Communication
Blogs andbestpracticeadvice andguidance(includingmultimedia andvideo
ideasandapplicationcataloguestoharvestideasfornewdatasetsandshowcaseapplicationsusingthedata
wikis to support discussion of range of topics around methods, tools,techniquesanddatasets
foraandmailinglistPotentialfutureservices:
DevolvingthedatacataloguewritingviaAPItodatapublishers. Data access (not just meta-data) via catalogue API i.e. proxying of
payloadonrequest.
ComprehensiveRDFformsofthedatacataloguerequiresback-endre-writeofCKAN.
Data catalogue search via the API to use SOLR rather than plain-textsearch.
Transfer and presentation of the community-focussed meta-data fromDrupaltoCKAN.
-
8/6/2019 Towards an EU PSI Portal
30/39
30
Workflow
Datasetsarepublishedintodata.gov.ukscataloguethroughthreemainroutes:
Theprimaryrouteforpublisherstoincludetheirdataintothecatalogueinvolvesusing aWeb form for authenticatedusers through the Drupal
data.gov.uksystemintoCKANviaanAPI.Thereiscurrentlynoprovision
forpre-moderationofdatareleasesmadeinthisway,andTheNational
Archives(TNA)go through the published dataand follow-upwithdata
publisherstoimprovethemeta-data.TNAalsoundertaketheadditionto
thecatalogueofdatafromsomepublishers.
AnadditionalrouteareimportscriptsthattakefeedsfromtheOfficeforNationalStatisticsPublicationsHubandtheDepartmentforCommunity
and Local Governments Data4NR19 service, and occasional manually-
drivenbulk-importsofdatafromlargescaledatapublishers.
LinkedDataworkdonebytheteamatTheNationalArchive(TNA)hasadifferent route that involves peer review of the modelling and
representationsofdatabeforepublication.Notallsuchdataisincludedin
thedata.gov.ukindex,butthiswillchangesoon.
Datasets provided under the ECs INSPIRE Directive will (by law) be mass-
submitted via a different set of routes (mainly CSW), which is under
developmentandwillbedeployedoverthenextfewweeksandmonths.
19http://www.data4nr.net/introduction/
-
8/6/2019 Towards an EU PSI Portal
31/39
31
AppendixBDescriptionofmeta-dataelements:thedata.gov.uk
exampleTheinstructionsbelowweredevelopedbythedata.gov.ukteamtohelppublic
bodiessupplytheinformationrequiredtoincludetheirpublisheddatasetsinthedata.gov.ukInternetservice.
GiventhelargeamountsofdatathattheUKpublicsectorwillprovidethrough
data.gov.uk,thedatasetsmustbewell-catalogued.Itisessentialtoprovideeach
datasetwithacarefullyconstructedrecordthatfacilitatesuseractivitiessuchas
discovering,searching,linking,andcomparingdatasets.
The elements of meta-data have been designed to provide flexibility forcataloguinganypublicdatasets;guidanceforeachelementisgivenbelow.
Dependentonwhatkindofdata-setissupplied,someelementsareMandatory
andmustbecompleted. Thisappendixprovidesanat-a-glanceviewofwhich
elementsarerequiredundereachprofile.
Thereistheoptiontoaddothersifhelpfulformembersofthepublic.
Itis important thatpublishersunderstandthatall datasubmitted inthis form
willbepublishedontheInternetandmustbeUNCLASSIFIEDandinsomecases
aprocessofredactionwillberequired.
The project team for the data.gov.uk work can be reached at
-
8/6/2019 Towards an EU PSI Portal
32/39
32
Identifier
This is a public unique identifier for the dataset. It should be roughly
readable,withdashesseparatingwords.Ifthedatarelatestoaperiodof
time, includethatin thename. Indicatebroadgeographicalcoverageto
distinguishfromthosedatasetsforanothercountry.
Format Twoormorelowercasealphanumericordashes(-).
Example uk-road-traffic-statistics-2008 or west-midlands-
employment-statistics-2010
Obligation Mandatoryforalldatasetsondata.gov.uk
Change Inthepreviousmeta-dataguidancethiswascalledName.
Title
Thisis thetitleofthedatasetsothatthepubliccaneasilytellwhatthe
datasetcovers.Itisnotadescription.Donotgiveatrailingfullstop.
Example RoadtrafficstatisticsformajorUKroads,2008
Obligation Mandatoryforalldatasetsondata.gov.uk
Abstract
Thiselementisthemaindescriptionofthedataset,andoftendisplayedwiththepackagetitle.Inparticular,itshouldstartwithashortsentence
thatdescribesthedatasetsuccinctly,becausethefirstfewwordsalone
maybeusedinsomeviewsofthedatasets.
Obligation Mandatoryforalldatasetsondata.gov.uk
Change Inthepreviousmeta-dataguidancethiswascalledNotes.
DatereleasedThisisthedateoftheofficialreleaseoftheinitialversionofthedataset
(thisisprobablynotthedatethatitisaddedtothedata.gov.ukindex).Be
carefulnot toconfuseanew 'version' ofsomedatawithanewdataset
coveringanothertimeperiodorgeographicarea.
Dateupdated
This should be the date of release of the most recent version of the
dataset(notnecessarilythedatewhenitwasupdatedondata.gov.uk).As
-
8/6/2019 Towards an EU PSI Portal
33/39
33
withDatereleased, this is for updates toaparticular dataset, suchas
correctionsorrefinements,notforthatofanewtimeperiod.
Datetobepublished
Thedatewhenthedatasetwillbeupdatedinthefuture,ifappropriate.
Change Newforthisversionofthemeta-dataguidance.
Updatefrequency
Thisshouldbehowfrequentlythedatasetsisupdatedwithnewversions.
For one-off data, use never. For those once updated but now
discontinued,usediscontinued.
Choices Never|Discontinued |Annual |Monthly|Weekly |Otheras
appropriate
Precision
Thisshould indicatetousersthe levelofprecision inthedata,toavoid
over-interpretation.
Example percenttotwodecimalplacesorassuppliedbyrespondents
Geographicgranularity
Thisshouldgivethelowestlevelofgeographicdetailgiveninthedataset
ifit isaggregated.If thedataisnotaggregated,andsothedatasetgoes
down to the level of the entities being reported on (such as school,
hospital, or police station), use Point. If none of the choices is
appropriate,pleasespecifyintheotherelement.Wherethegranularity
varies,pleaseselectthelowestandnotethisintheotherelement.
Choices National|Regional|LocalAuthority|Ward|Point|Otheras
appropriate
Example ForalistofallschoolsinaLocalEducationAuthoritysarea,
Point.
Geographiccoverage
This should show the geographic coverage of this dataset. Where adataset covers multiple columns, the system will automatically group
-
8/6/2019 Towards an EU PSI Portal
34/39
34
these (e.g. England, Scotland and Wales all being Yes would be
shownasGreatBritain).
Choices Yes|Noforeach.
Temporalgranularity
Thisshouldgivethelowestleveloftemporaldetailgiveninthedatasetif
it is aggregated, expressed as an interval of time. If the data is not
aggregatedovertime,andsothedatasetgoesdowntotheinstantsthat
reportedeventsoccurred(suchasthetimingsofhighandlowtides),use
Point.Ifnoneofthechoicesisappropriate,pleasespecifyintheother
element.Wherethegranularityvaries,pleaseselectthelowestandnote
thisintheotherelement.
Choices Year|Quarter|Month|Week|Day|Hour|Point| Otheras
appropriate
Example Forthenumberofenquiriesdealtwitheachday,Day.
Temporalcoverage
The temporal coverage of this dataset. If available, please indicate the
timeaswellasthedate.Wheredatacoversonlyasinglepointintime,givetheinstantratherthanarange.
Example 21/03/200703/10/2009or07:4531/03/2006
URL
ThisistheInternetlinktoawebpagediscussingthedataset.
Example http://www.somedept.gov.uk/growth-figures.html
TaxonomyURL
This is an Internet link to a Web page describing for re-users the
taxonomies used in the dataset, if any, to ensure they understand any
termsused.
Example http://www.somedept.gov.uk/growth-figures-technical-
details.html
-
8/6/2019 Towards an EU PSI Portal
35/39
35
MinisterialDepartment
ThisisthesponsoringministerialDepartmentunderwhichthedatasetis
collected andpublished (but notnecessarilydirectlyundertaking this
use the Organisation element where this applies). The data.gov.uk
systemwillautomaticallycapturethis.
Obligation Mandatoryforalldatasets
Organisation
Thisisthebodyresponsibleforthedatacollection,ifdifferentfromthe
MinisterialDepartment.Pleaseusethefulltitleofthebodywithoutany
abbreviations,sothatallitemsfromitappeartogether.Thedata.gov.uk
systemautomaticallycapturesthiswhereappropriate.
Example EnvironmentAgency
Publisher
An over-ride for the system, setting the public body that should be
creditedwiththepublicationofthisdata,insteadoftheOrganisationor
MinisterialDepartment.Thiscouldbeusedwherethepublicbranding
oftheworkofanagencyisasitsparentdepartment.
Change Newforthisversionofthemeta-dataguidance.
Contact
Thisisthepermanentcontactpointforthepublictoenquireaboutthis
particulardataset.Thisshouldbethenameofthesectionoftheagencyor
Department responsible,andshouldnotbeanamed person.Particular
careshouldbetakeninchoosingthiselement.
Example Statisticsteam|Publicconsultationunit|FOIcontactpoint
Change Inthepreviousmeta-dataguidancethiswascalledAuthor.
Contacte-mail
Thisshouldbeagenericofficiale-mailaddressformembersofthepublic
tocontact,tomatchtheAuthorelement.Anewe-mailaddressmayneed
tobecreatedforthisfunction.
-
8/6/2019 Towards an EU PSI Portal
36/39
36
Change Inthepreviousmeta-dataguidancethiswascalledAuthore-
mail.
Mandate
AnInternetlinktotheenablinglegislationthatservesasthemandatefor
thecollectionorcreationofthisdata,ifappropriate.Thisshouldbetaken
fromTheNationalArchivesLegislationwebsite,andwherepossiblebea
linkdirectlytotherelevantsectionoftheAct.
Example http://www.legislation.gov.uk/id/ukpga/Eliz2/6-
7/51/section/2
Change Newforthisversionofthemeta-dataguidance.
Licence
Thisshouldmatchthelicenceunderwhichthedatasetisreleased.This
should generally be the Open Government Licence for public sector
information. If you wish to release the data under a different licence,
pleasecontacttheMakingPublicDataPublicteam.
Obligation Mandatoryforalldatasetsondata.gov.uk
Tags
Tagscanbethoughtofastheway thatthepackagesarecategorised,so
areofprimaryimportance.Oneormoretagsshouldbeaddedtogivethe
governmentdepartmentandgeographiclocationthedatacovers,aswell
as general descriptivewords. The Integrated Public Sector Vocabulary
may be helpful in forming these. As tags cannot contain spaces, use
dashesinstead.
Format Two or more lowercase alphanumeric or dashes (-); tags
separatedbyspaces.
Example For NHS statistics on burns to the arms: "nhs arm burns
medical-statistics"
Resources
Thesecanberepeatedasrequired.Forexampleifthedataisbeingsuppliedin
multiple formats, or split into different areas or time periods, each file is a
-
8/6/2019 Towards an EU PSI Portal
37/39
37
differentresourcewhichshouldbedescribeddifferently.Theywillallappear
onthedatasetpageondata.gov.uktogether.
Resourcedescription
Thiswillidentifyeachspecificresourceifyouhavemultipleones
forthisdataset.
Example 2009/10data|Zippeddata(20MB)|CSV-format|
APIservice
Change Newforthisversionofthemeta-dataguidance.
Resourceformat
Thisshouldgivethefileformatinwhichthedataissupplied.You
maysupplythedataina formnotlistedhere,constrainedbythe
Public Sector TransparencyBoardsprinciples that require that
alldataisavailableinanopenandstandardisedformatthatcan
bereadbyamachine.Datacanalsobereleasedinformatsthat
arenotmachine-processable(e.g.PDF)alongsidethis.
Choices CSV|RDF|XML|XBRL|SDMX|HTML+RDFa|Other
Obligation MandatoryforeachresourceChange Inthepreviousmeta-dataguidancethiswas
calledFileformat.
ResourceURL
ThisistheInternetlinkdirectlytothedatabyselectingthislink
in aweb browser, the userwill immediately download the full
dataset.Notethatdatasetsarenothostedbytheproject,butby
theresponsibledepartment
Example http://www.somedept.gov.uk/
growth-figures-2009.csv
Obligation Mandatoryforeachresource
Change Inthepreviousmeta-dataguidancethiswascalled
DownloadURL.
Obligation Mandatory for there to be at least one Resource for allitemsondata.gov.uk
-
8/6/2019 Towards an EU PSI Portal
38/39
38
AppendixCUKPublicDataPrinciples
http://data.gov.uk/wiki/Public_Data_Principle
"Public Data" is the objective, factual, non-personal data on which publicservicesrun and are assessed, andonwhichpolicydecisionsare based, or
whichiscollectedorgeneratedinthecourseofpublicservicedelivery.
Publicdatapolicy and practicewill beclearlydriven bythe public andbusinesseswhowant and use thedata, includingwhatdata is released
whenandinwhatform
Publicdatawillbepublishedinreusable,machine-readableform Publicdatawillbereleasedunderthesameopenlicencewhichenables
freereuse,includingcommercialreuse
Publicdatawillbeavailableandeasytofindthroughasingleeasytouseonlineaccesspoint(data.gov.uk)
Public data will be published using open standards, and followingrelevantrecommendationsoftheWorldWideWebConsortium
PublicdataunderlyingtheGovernmentsownwebsiteswillbepublishedinreusableformforotherstouse
Publicdatawillbetimelyandfinegrained Releasedataquickly,andthenre-publishitinlinkeddataform Publicdatawillbefreelyavailabletouseinanylawfulway Publicbodiesshouldactivelyencouragethere-useoftheirpublicdata Public bodies should maintain and publish inventories of their data
holdings
-
8/6/2019 Towards an EU PSI Portal
39/39
AppendixDAPossibleProjectPlanforPilotdata.gov.eu