uk discovery-jisc-project-showcase
DESCRIPTION
Overview of the eight RDTF Discovery projects funded by JISC, due to complete summer 2011
TRANSCRIPT
Project Showcase
Twitter hashtag: #ukdiscovery Wifi: WellcomeNet
AIM25: Open Metadata Pathway
• AIM25: aggregating and cross-searching descriptions of archives held in London. 16,000 descriptions on every key theme; subject, personal & corporate name and place indexing; 1.3m hits per month
• King’s College London and ULCC lead; 123 institutional partners
• OMP: JISC-funded, Feb-July 2011. Add 1,100 descriptions from 5 new partners including National Maritime Museum and Kew Gardens; run Linked Data across the records to improve user searching. Blog & Twitter updates: #Aim25news
• Focus groups and statistical analysis to test outcomes and inform forward action
AIM25 OMP: Progress so far
• GATE and OpenCalais used to interrogate samples of records, including their indexing. Pros and cons of each explored. OpenCalais selected
• Articulating the link between ISAD(G), Linked Data, Open Metadata and faceted searching
• Aim to test the value of Linked Data – does it improve searching; does it speed up data input and editing; does it enhance or supplant indexing?
• Questions include: how to deal with subject terms (use UKAT – out of date?); which URI scheme to use? How will the user benefit – what are the tangible, demonstrable gains?
http://openmetadatapathway.blogspot.com/
COMET – Cambridge Open Metadata
What?
Releasing large subset of UL records under a Public Domain Data License
Identifying IPR history of our bibliographic data
Documenting process and releasing tools for others to do the same
Converting to useful linked RDF
Some as Marc21
Establishing a library triple-store
Link to OCLC ‘next-gen’ authority services
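The "converting to useful linked RDF" step can be illustrated with a minimal sketch. It assumes a simplified record held as a Python dictionary rather than real MARC21, a hypothetical URI stem (`http://example.org/bib/`), and a hypothetical mapping of fields to Dublin Core elements; it is not the COMET project's actual conversion.

```python
# Minimal sketch: turn a simplified bibliographic record into N-Triples.
# The record layout, BASE_URI and FIELD_TO_PREDICATE are illustrative
# assumptions, not the COMET project's real conversion rules.

DC = "http://purl.org/dc/elements/1.1/"
BASE_URI = "http://example.org/bib/"  # hypothetical URI stem

# Hypothetical mapping from simplified record fields to Dublin Core elements.
FIELD_TO_PREDICATE = {
    "title": DC + "title",
    "creator": DC + "creator",
    "date": DC + "date",
}


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def record_to_ntriples(record):
    """Return one N-Triples line per mapped, non-empty field of the record."""
    subject = "<{}{}>".format(BASE_URI, record["id"])
    triples = []
    for field, predicate in FIELD_TO_PREDICATE.items():
        value = record.get(field)
        if value:
            triples.append(
                '{} <{}> "{}" .'.format(subject, predicate, escape_literal(value))
            )
    return triples


if __name__ == "__main__":
    sample = {"id": "1", "title": "An example title", "date": "1999"}
    for line in record_to_ntriples(sample):
        print(line)
```

Lines in this form could be loaded into a triple-store; a real conversion would also need to handle repeated fields, authority links and the licensing status of each record.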
Why?
See what developers can do with our stuff
Move libraries forward with open data licensing
Gain in-house understanding of semantic web
Better realise value in records through contribution to the public domain
COMET – Cambridge Open Metadata
Where?
http://cul-comet.blogspot.com/ - project blog
http://data.lib.cam.ac.uk - datasets and triplestore (eventually)
http://lib.cam.ac.uk/api - library API portal
Who?
Ed Chamberlain – [email protected] - @edchamberlain
How and when?
1. License negotiation with Record vendors – 75% complete
2. RDF Conversion – 70% complete
3. Triplestore and linked data web infrastructure - 60% complete
4. Documentation and supporting work - 30% complete
Contextual Wrappers http://context.collectionstrustblogs.org.uk/
packaging collections information as part of open metadata provision
Contextual Wrappers
developing collection level descriptions for aggregation on the Culture Grid as an aid to resource discovery
[Diagram: three collection level descriptions (17th century Dutch art in museum a; 17th century Dutch art in museum b; 17th century Dutch material in archive a), shown with related item records, related publications and related collections, are brought together under "17th century Dutch art", a subject area in undergraduate study.]
improved contextual information about collection scope and strength relevant to area of study
for further information, contact: David Scruton [email protected]
Jerome 1/2
University of Lincoln
What's the point of Jerome?
• Harvesting data from several existing library systems
• Unifying and standardising bibliographic data
• Supplementing/enhancing with existing Open Data
• Releasing 200,000+ bib records under CC0
• Immensely fast search using NoSQL d/b and Sphinx
• Personalised portal
• Awesome APIs!
• Range of data formats: XML, JSON, MARC, RIS
http://jerome.library.lincoln.ac.uk/
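Offering one record in several formats, as the list above mentions, might look like the following sketch. It covers JSON and RIS only, and the record fields and the field-to-tag mapping are assumptions made for illustration; it does not reproduce the Jerome APIs.

```python
import json

# Hypothetical mapping from simplified record fields to RIS tags.
RIS_TAGS = [("author", "AU"), ("title", "TI"), ("year", "PY"), ("isbn", "SN")]


def to_json(record):
    """Serialise the record as JSON with keys in a stable order."""
    return json.dumps(record, sort_keys=True)


def to_ris(record):
    """Serialise the record as a single RIS entry of type BOOK."""
    lines = ["TY  - BOOK"]  # every RIS entry opens with a type tag
    for field, tag in RIS_TAGS:
        value = record.get(field)
        if value:
            lines.append("{}  - {}".format(tag, value))
    lines.append("ER  - ")  # and closes with an end-of-record tag
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {"title": "Example title", "author": "Example, A.", "year": "2011"}
    print(to_json(sample))
    print(to_ris(sample))
```

Keeping one internal record and deriving each output format from it means a new format only needs one more serialiser function.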
Jerome 2/2
University of Lincoln
What have we done already?
• Now harvesting live MARC, Repository, e-journals data
• Live search portal available: http://lncn.eu/jerome
• Basic APIs created
What's still to come?
• Records with explicit Open Data licence
• Fully documented APIs
• "Radical Personalisation" (machine intelligence, mixing desk, is it raining?)
• Mashup with the COMET project
http://jerome.library.lincoln.ac.uk/
OpenART : Open metadata for Art Research at the Tate
OpenART has at its heart a dataset, ‘The London Art World 1660-1735’, the result of several years' research into People, Sales, Places and Sources from the period.
OpenART is a collaboration between the University of York, Tate and Acuity Unlimited.
OpenART will model the entities, likely using an event-based approach, store in a semantic-friendly way and expose the data for aggregation.
OpenART will explore data normalisation with vocabularies and ontologies, and propose methods for Tate and York to expose or encode their own data.
What now, when next …
Analysis and working with our researcher to understand the complex dataset so that we can produce a data model; implementing a database to help manage the input and data structure
Finalising the data model, choosing a modeling approach and ontology/ies and blogging about it; validating with stakeholder use cases; considering entity registration.
Contact: Julie Allinson [email protected]
tinyurl.com/dlib-openart -- http://yorkdl.wordpress.com/
Existing process
[Diagram: a request arrives at the OpenURL Router, which logs the request and redirects it to one of the institutional resolvers.]
Level 0 Data (Log)
Survey institutions to enable opt-out
Process to Level 1:
• Exclude data from opted-out institutions
• Anonymise IP addresses
• Anonymise institution & remove button & lookup data that would identify institution
• Parse OpenURL request into constituent parts
Level 1 Data
Process to Level 2:
• Include only redirect to resolver requests
Level 2 Data
Use for prototypes & services
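The Level 0 to Level 2 processing described on this slide can be sketched roughly as follows. The log entry layout (`timestamp`, `ip`, `institution`, `request_type`, `query`), the salted-hash anonymisation and the `"redirect"` label are all assumptions made for illustration; the project's actual log format and anonymisation method are not given here.

```python
import hashlib
from urllib.parse import parse_qs

SALT = "replace-with-a-secret-value"  # hypothetical; keep private in practice


def anonymise(value):
    """Replace an identifier with a truncated salted hash (an assumed method)."""
    digest = hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()
    return digest[:16]


def to_level1(log_entries, opted_out):
    """Level 0 -> Level 1: drop opted-out institutions, anonymise, parse."""
    level1 = []
    for entry in log_entries:
        if entry["institution"] in opted_out:
            continue  # exclude data from opted-out institutions
        params = parse_qs(entry["query"])
        # Only the fields listed here are copied, so any other field that
        # could identify the institution is left behind.
        level1.append({
            "timestamp": entry["timestamp"],
            "ip": anonymise(entry["ip"]),
            "institution": anonymise(entry["institution"]),
            "request_type": entry["request_type"],
            # Keep the first value of each OpenURL key as its constituent part.
            "openurl": {key: values[0] for key, values in params.items()},
        })
    return level1


def to_level2(level1):
    """Level 1 -> Level 2: keep only redirect-to-resolver requests."""
    return [row for row in level1 if row["request_type"] == "redirect"]


if __name__ == "__main__":
    log = [{
        "timestamp": "2011-04-01T10:00:00",
        "ip": "192.0.2.1",
        "institution": "inst-a",
        "request_type": "redirect",
        "query": "genre=article&atitle=Open+data&issn=1234-5678",
    }]
    print(to_level2(to_level1(log, opted_out={"inst-b"})))
```

A salted hash keeps the same user or institution recognisable across rows without exposing the original value, which matters if the data is later used for recommendations.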
Using OpenURL Activity Data: Overview
• Build on existing process for OpenURL Router log
• Legal investigation: risk management and data disclosure when processing IP addresses
Project aims:
• Make this data available under open licence
• Develop prototype service using this activity data
• Explore including institutions’ data in aggregation
OpenURL Activity Data: Project and Info
• What’s in the data set?
– Date & time
– Anonymised IP address & institution IDs
– OpenURL request data, e.g.
  • Article Title
  • Journal Title
  • Book Title
  • Author
  • ISSN
  • DOI …
• How might the data be used?
– Article/journal recommendations
– Student analysis
– Research thesis
– Publishers comparing listings with texts sought
– Innovative services to meet your users’ needs
– Other, unanticipated uses
Key activity to date:
• Legal investigation: report available
• Licence selection: ODC-PDDL with Attribution Sharealike community norms
• Extend openurl.ac.uk for provision of data
• Aggregated Level 2 Data Available
We are here:
Next steps/outputs:
• Provide Level 1 & Annual Data
• Develop Prototype Recommender Service
• Further information:
– openurl.ac.uk/doc
– EDINA Project Page: Using OpenURL Activity Data
– Sheila Fraser: [email protected] 0131 651 7715
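One of the uses suggested on this slide, article/journal recommendations, could be approached with a simple co-occurrence count: journals requested by the same anonymised user are treated as related. The sketch below assumes each row carries an anonymised `ip` and an `issn`, both hypothetical field names, and it is not the project's prototype recommender.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(records):
    """Count how often two ISSNs were requested by the same anonymised user."""
    by_user = defaultdict(set)
    for row in records:
        by_user[row["ip"]].add(row["issn"])

    pairs = defaultdict(Counter)
    for issns in by_user.values():
        # Each unordered pair of journals seen for one user counts once.
        for first, second in combinations(sorted(issns), 2):
            pairs[first][second] += 1
            pairs[second][first] += 1
    return pairs


def recommend(pairs, issn, limit=3):
    """Return up to `limit` ISSNs most often requested alongside `issn`."""
    return [other for other, _count in pairs[issn].most_common(limit)]


if __name__ == "__main__":
    rows = [
        {"ip": "u1", "issn": "A"}, {"ip": "u1", "issn": "B"},
        {"ip": "u2", "issn": "A"}, {"ip": "u2", "issn": "B"},
        {"ip": "u3", "issn": "A"}, {"ip": "u3", "issn": "C"},
    ]
    print(recommend(build_cooccurrence(rows), "A"))
```

Raw counts favour heavily used journals, so a real service would probably normalise by overall popularity.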
SALDA Project
Sussex Archive Linked Data Application
http://blogs.sussex.ac.uk/salda/
Transforming the catalogue records of the Mass Observation Archive into Linked Open Data
SALDA Project
Progress
• Converted catalogue records to EAD
• Ongoing transformation of data to Linked Data by Pete Johnston at Eduserv
• Using a PDDL licence to make data available
• Using data.lib.sussex.ac.uk as our URI stem
More information at: http://blogs.sussex.ac.uk/salda/
Contact: Karen Watson [email protected]
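To give a feel for the EAD to Linked Data step, here is a minimal sketch that reads a tiny EAD fragment and mints a URI under the data.lib.sussex.ac.uk stem. The `archive/` path pattern, the `http` scheme, the sample record and the choice of Dublin Core elements are assumptions for illustration only; the project's real transformation is not shown here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

URI_STEM = "http://data.lib.sussex.ac.uk/"  # stem from the slide; scheme assumed
DC = "http://purl.org/dc/elements/1.1/"

# Made-up EAD fragment (no XML namespace, as in DTD-based EAD 2002).
SAMPLE_EAD = """<ead>
  <archdesc level="fonds">
    <did>
      <unitid>EX-1</unitid>
      <unittitle>Example papers</unittitle>
      <unitdate>1937-1950</unitdate>
    </did>
  </archdesc>
</ead>"""

# EAD element name -> Dublin Core element (hypothetical mapping).
ELEMENT_TO_PREDICATE = [
    ("unitid", DC + "identifier"),
    ("unittitle", DC + "title"),
    ("unitdate", DC + "date"),
]


def ead_to_triples(ead_xml):
    """Return N-Triples lines for the top-level description in an EAD document."""
    did = ET.fromstring(ead_xml).find("./archdesc/did")
    unitid = did.findtext("unitid").strip()
    # The 'archive/<unitid>' path pattern is hypothetical.
    subject = "<{}archive/{}>".format(URI_STEM, quote(unitid, safe=""))
    triples = []
    for element, predicate in ELEMENT_TO_PREDICATE:
        text = did.findtext(element)
        if text and text.strip():
            literal = text.strip().replace("\\", "\\\\").replace('"', '\\"')
            triples.append('{} <{}> "{}" .'.format(subject, predicate, literal))
    return triples


if __name__ == "__main__":
    for line in ead_to_triples(SAMPLE_EAD):
        print(line)
```

Minting every URI from one stem keeps identifiers stable even if the underlying catalogue system changes.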