vra_2015_catalogingroundup_seneff

47
Using The Getty Vocabularies As Linked Open Data In A Cataloging Tool For An Academic Teaching Collection: Case Study At The University Of Denver Heather Seneff, MA, MLS Director of the Visual Media Center School of Art and Art History University of Denver Fernando Reyes, MS Senior Systems Developer University Libraries University of Denver Sheila Yeh, MLS, MSE Library Digital Infrastructure and Technology Coordinator University Libraries University of Denver © Seneff, H., Yeh, S., Reyes, F. VRA 2015, Session 4: Cataloging Round-up March 12, 2015

Upload: heather-seneff

Post on 17-Jan-2017

84 views

Category:

Documents


3 download

TRANSCRIPT

Using the Getty Vocabularies as Linked Open Data in a Cataloging Tool for an Academic Teaching Collection: Case Study at the University of Denver

Using The Getty Vocabularies As Linked Open Data In A Cataloging Tool For An Academic Teaching Collection: Case Study At The University Of Denver

Heather Seneff, MA, MLSDirector of the Visual Media CenterSchool of Art and Art HistoryUniversity of Denver

Fernando Reyes, MSSenior Systems DeveloperUniversity LibrariesUniversity of Denver

Sheila Yeh, MLS, MSELibrary Digital Infrastructure and Technology CoordinatorUniversity LibrariesUniversity of Denver

Seneff, H., Yeh, S., Reyes, F.VRA 2015, Session 4: Cataloging Round-upMarch 12, 2015

Linked Open Data is an exciting component of the Semantic Web that has special implications in the realm of art and cultural objects. In the Semantic Web, objects and concepts are seen in context through the use of structured, defined data. This improves searches and services on the Internet and exposes cultural objects in a new way. The Getty Research Institute has made the Art and Architecture Thesaurus (AAT) available as LOD, with plans to do the same for its Union List of Artist Names (ULAN) in 2015. These vocabularies are traditional mainstays of cataloging cultural objects. This case study addresses some of the practical applications of LOD in the cataloging environment during the 2013 development of a cataloging tool at the University of Denver. This study will also discuss principles of agile software development, which contributed to a successful collaboration.

1

Sir Tim Berners-Lee The Semantic Web (Web 3.0) = web of linked data

The Semantic Web (Web 3.0) is the newest evolution of the Internet. In the Semantic Web, natural language is structured so it is machine-readable. Logical connections can be made between things. Sir Tim Berners-Lee (who is considered the inventor of the Internet and is now Director of the World Wide Web Consortium or W3C) envisions the Semantic Web to be a web of linked data rather than a web of hypertext as it is now. In his 2009 TED talk The Next Web of open, linked data, he explained that the Semantic Web can unlock enormous potential on the Internet. The content is there on the web, but the data about the content needs to be explored, added to, and structured. 2

Semantic Web:

URIs = Uniform Resource Identifiers

The Semantic Web relies on three important components: Uniform Resource Identifiers, which are unique strings of characters identifying a resourcean entity or thing.3

Semantic Web:

URIs = Uniform Resource Identifiers

RDF = Resource Description Frameworks

SPARQL = SPARQL Protocol and RDF Query Language

RDF triples-- subject, predicate, and objectare the syntax that describes the thing.

RDF information is expressed in a graph format using SPARQL (SPARQL Protocol and RDF Query Language) which is the W3C standard query language for the Semantic Web4

Semantic Web:

URIs = Uniform Resource Identifiers

RDF = Resource Description Frameworks

SPARQL = SPARQL Protocol and RDF Query Language

Ontologies and vocabularies

OWL = Web Ontology Language

Ontologies and vocabularies which define the properties and relationships of the thing and how it is described.

These ontologies and vocabularies are created by subject domains or specific disciplines using W3C standards in RDF schema language or in Web Ontology Language (OWL), and are shared on the Semantic Web. 5

Semantic Web:

URIs = Uniform Resource Identifiers

RDF = Resource Description Frameworks

SPARQL = SPARQL Protocol and RDF Query Language

Ontologies and vocabularies

OWL = Web Ontology Language

LOD = Linked Open Data

Those that are freely accessible and shared without licensing on the Internet are called Linked Open Data or LOD.

6

Semantic Web:

URIs = Uniform Resource Identifiers

RDF = Resource Description Frameworks

SPARQL = SPARQL Protocol and RDF Query Language

Ontologies and vocabularies

OWL = Web Ontology Language

LOD = Linked Open Data

These are the pillars of the Semantic Web--URIs, RDF, and standardized ontologies.7

RDF TriplesAfter Yang & Lee: Organizing Bibliographic Data with RDA, p. 8, fig. 1

At the basic level, information is organized in a RDF triple with a subject, predicate, and object.8

URIsURIsURIs or textRDF TriplesAfter Yang & Lee: Organizing Bibliographic Data with RDA, p. 8, fig. 1

The subjects and predicates consist of Unique Resource Identifiers while the objects can be URIs or text content.9

URIsURIsURIs or textShakespeareIs author ofOthelloRDF TriplesAfter Yang & Lee: Organizing Bibliographic Data with RDA, p. 8, fig. 1

Such as this example: Shakespeare. is author of. Othello. These sets of RDF triples then link to other triples10

giving further meaning to the information11

CC-BY-SA license: Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

and spreading out into a web of meaningful content.

Examples of projects using LOD include Europeana and the Digital Public Library of America (DPLA).

12

http://www.europeana.eu/portal/

Europeana is often cited as an example of the power and usefulness of Linked Open Data. Their goal is to connect and publish data from Europes cultural heritage and scientific communities. 1500 institutions have contributed over two hundred million records to the Europeana Digital Library, and all the metadata is shared under a Creative Commons CCO 1.0 agreement. The Europeana Foundation exposes its content as Linked Open Data through its professional portal.13

http://dp.la/

The Digital Public Library of America is a similar project and uses the Europeana Data Model to aggregate the resources of a growing number of partners. Their metadata model is based on RDF and uses Dublin Core as a central descriptive metadata standard. In addition to an RDF-based data model, the DPLA also uses JavaScript Object Notation for Linked Data (JSON-LD) serialization.

A majority of contributing partners to both Europeana and the DPLA are cultural institutionsmuseums, archives, libraries--which usually have a wealth of metadata and description about their holdings AND they also benefit by having a number of established vocabularies structured specifically for the types of objects they describe. For example14

the Getty Research Institute vocabularies are long established standards for cataloging cultural objects. The Getty Research Institute began working to expose their vocabularies as Linked Open Data, and in February 2014, the Art and Architecture Thesaurus (AAT) was the first vocabulary to be released as Linked Open Data. 15

In 2013, I took an online class (a MOOC) from Coursera about Metadata and the professor, Jeffrey Pomerantz, discussed Linked Open Data a bit--enough to intrigue me about LOD, and for me to realize how useful it would be in cataloging cultural objects in my own work as the Director of the Visual Media Center at the University of Denver. 16

The University of Denver uses a learning object media management software called CourseMedia which was developed in house in 2003 by DUs Office of Teaching and Learning. It is a web-based, password-protected media management system for images, video, and audio files to support teaching. The primary software technologies used are Adobe ColdFusion, Flash, and MySQL database.Image files are added to CourseMedia by the Visual Media Center in the School of Art and Art History (video and audio files are added by the library). I have four graduate students and one half-time staff member cataloging art and cultural objects. I train the students and edit each record before it is ingested into CourseMedia. OTL had also developed the cataloging system used in the VMC in 2003, which was becoming outdated. When the opportunity to develop an updated cataloging tool arose in 2013, it was reasonable for me to collaborate with DUs Library instead of OTL, which was concentrating more on the user end of CourseMedia rather than cataloging. I began meeting with Librarys Digital Infrastructure and Technology Coordinator, Sheila Yeh, and the Librarys Senior Systems Analyst Fernando Reyes to plan for this software.

17

user-friendly designqualified Dublin Core metadata schema accommodate fields from the first generation cataloging toolexport metadata into CourseMedia

I envisioned user-friendly software based on a qualified Dublin Core metadata schema that would accommodate fields from the first generation cataloging tool and export metadata into CourseMedia18

user-friendly designqualified Dublin Core metadata schema accommodate fields from the first generation cataloging toolexport metadata into CourseMediametrics for productionability to review and edit new recordsintegrated authority control

I also wanted metrics for production assessment and a process for me to review and edit new records. And I wanted integrated authority control for certain fields:19

user-friendly designqualified Dublin Core metadata schema accommodate fields from the first generation cataloging toolexport metadata into CourseMediametrics for productionability to review and edit new recordsintegrated authority controlULAN and Library of Congress names authority for the creator and repository (DC: coverage: spatial) fieldsAAT and Library of Congress subject headings for the style/period (DC: coverage: temporal) and subjects fields.

the ULAN and Library of Congress Names authority for the creator and repository fields, and the AAT and Library of Congress subject headings for the style/period and subjects fields.

Sheila and Fernando decided to apply an agile software development approach to the planning.

20

This approach focused on defining the features of the system and what it could and could not do, and on prioritizing the development, allowing for features that could be added later. Using this approach, the development efforts welcomed reasonable changes and focused on a modular design for simplicity, sustainability, and reusability. The three of us also met frequently in face-to-face meetings as the development progressed and as the software was ready for testing. The collaboration went well and communication remained open and positive throughout the project.

Here is a teaser look at the finished product (from the catalogers view) before I talk more about Fernandos role in programming behind the scenes:21

This is the catalogers view of Art History Metadata Management system (MMS). It uses JavaScript, PHP and MySQL which were chosen because of Fernandos expertise and the fact that they are currently part of the general curriculum of Computer Science education. This helps with the sustainability of the project, since MMS does not rely on the specific skillset of one individual or a particular team. MMS also uses Solr search server and a Fedora-based digital repository. Solr is an open source search platform produced by the Apache Software Foundation used in indexing metadata and providing a fast discovery layer. Fedora offers an accessible application programming interface (API) to manage XML metadata, in this case the qualified Dublin Core, in a flexible way. The LOC names and subjects were integrated into MMS by API and the AAT and ULAN were purchased from the Getty in their pre-LOD form for integration as local data stores.22

After MMS was in use for about 6 months, the Getty made the AAT accessible through API as linked open data. Fernando and Sheila and I jumped at the chance to utilize the Linked Open Data AAT when it was made available by Getty. MMSs local JavaScript Web service was modified to search the AAT (in addition to the LCSH and LCN) using SPARQL query language in a uniform manner.23

Screenshot from Fernando Reyes

Fernando found that the Getty Institute facilitates the use of its LOD very professionally with things like sample queries guidelines.24

There is also a forum for users to ask technical questions of all kinds. This is the SemanticWeb forum accessed through the Gettys implementation instructions. For example, Fernando initially encountered ambiguous headers from the dataset returned from the AAT Web services, but was able to quickly gain insights on the problem from reading posts from the forum. The forum is also helpful when LOD changes the way their Web Services return data, helping the programmers accommodate the changes to service.

Now the next screenshots are from Fernando to reveal coding--25

Screenshot from Fernando Reyes

This is an example of a JSON (JavaScript Object Notation) search response.26

Screenshot from Fernando Reyes

where you can see what a search for pipes in the AAT exposeswith the specific term calumets within the structured data.27

Screenshot from Fernando Reyes

This screenshot shows the metadata class returned from the AATs vocabulary service in MMS.

28

Screenshot from Fernando Reyes

This shows the AATs response in LOD code. Now well look at MMS again from the catalogers perspective29

So here were back at the catalogers view of MMS, where you can30

choose authorities .31

from a pulldown menu on the right (Alora is the acronym for the first generation cataloging tool developed in 2003we had to keep two vocabulary lists from that tool to support some of the older records in the system).32

So for example we can use the AAT to search for a term33

like pipes for which we retrieve a number of results. We want to be sure to differentiate between pipes in the context of plumbing, musical instrument pipes, or the kind of pipes that are Native American ritual objects (as opposed to the kind of pipes that are associated with a different sort of ritual in Colorado.). We can select a more specific term, like calumets.34

and access the AAT for more information on the term by clicking the associated URL (or URI).35

which takes you to the LOD version of the term. From here you can see the RDF triples associated with the term.36

.on the left hand side tabs

37

revealed in columns of Subject, predicate, object --and on the right hand38

side the links to the AAT website and the hierarchy to which the term belongs.

39

Heres the familiar page for a term in the AAT this one for calumets, the term for the specific type of pipe we are cataloging.40

Then from inside MMS, we can click the AAT term of choice41

and the appropriate field is populated! No re-typing necessary. You can see here that the subjects field is repeatable, too, with the plus sign above the field. We can use the LOC Subject Heading authority for the same field.42

.and for the style\period field above as well.43

Heres an example of a completed record in MMS. 44

You can see multiple subjects used.

The software has proved to be user friendly for the graduate students I train and direct, and it encourages good cataloging habits. The integrated vocabularies help me mentor the students about the importance of metadata -- AND the authoritative terms improve the searches in CourseMedia.45

Its now been over a year since MMS has been used as our cataloging system in the VMC. You can see the metrics displayed within MMS here. I believe using integrated vocabularies has improved our productivity, our efficiency, and our accuracy in cataloging new records and correcting old ones. MMS was a successful collaboration, and an exciting introduction to the power and usefulness of Linked Open Data.46

BibliographyBreitman, Karen Koogan, Marco Antonio Casanova, and Walter Truszkowski. Semantic Web Concepts, Technologies and Applications. New York; London: Springer, 2007. Hendler, James, Jeanne Holm, Chris Musialek, and George Thomas. US Government Linked Open Data: Semantic. Data. Gov. IEEE Intelligent Systems 27, no. 3 (2012): 00250031.Hooland, Seth van. Linked Data for Libraries: How to Clean, Link and Publish Your Metadata. Chicago, IL: Neal-Schuman, 2014.Johnson, L., S. Adams Becker, V. Estrada, and A. Freeman. NMC Horizon Report: 2014 Library Edition. Austin, Texas: The New Media Consortium, 2014.Manifesto for Agile Software Development. Manifesto for Agile Software Development. Accessed January 23, 2015. http://www.agilemanifesto.org/principles.html.Mitchell, Erik T. Building Blocks of Linked Open Data in Libraries. Library Technology Reports: Library Linked Data: Research and adoption 49, no. 5 (July 2013): 1125.Mitchell, Erik T. Three Case Studies in Linked Open Data. Library Technology Reports: Library Linked Data: Research and adoption 49, no. 5 (July 2013): 2643.Park, Jung-ran, and Lynne C. Howarth, eds. Library and Information Science, Volume 7: New Directions in Information Organization. Bradford, GBR: Emerald Insight, 2013. Park, Ziyoung, and Heejung Kim. Organizing and Sharing Information Using Linked Data. In Library and Information Science, Volume 7: New Directions in Information Organization. Bradford, GBR: Emerald Insight, 2013.Yang, Sharon Q., and Yan Yi Lee. Organizing Bibliographic Data with RDA: How Far Have We Stridden Toward the Semantic Web? In Library and Information Science, Volume 7: New Directions in Information Organization. Bradford, GBR: Emerald Insight, 2013.Presentation screenshots by H. Seneff unless otherwise noted

47