A Registry for controlled vocabularies at the Library of Congress

Rebecca Guenther

Network Development & MARC Standards Office, Library of Congress

October 29, 2008



Oct. 29, 2008 ASIST 2008

Outline of presentation

Types of controlled vocabularies
Vocabularies maintained at LC
An introduction to SKOS
Establishing concept databases at LC
Examples of concept schemes: ISO 639-2 and PREMIS event type
Providing the registry as a web service


Why establish controlled vocabularies?

Control values that occur in metadata
Document and publish for reuse
Reduce ambiguity
Control synonyms
Establish formal relationships among terms (where appropriate)
Test and validate terms


Types of Controlled Vocabularies used in metadata standards

Lists of enumerated values
Code lists (e.g., language, country)
Taxonomies
Formal thesauri
Locally controlled enumerated lists


Enumerated lists

Simple list of terms used in a pull-down menu or Web site pick list

Values enumerated in an XML schema

Little additional information or structure about each value

Examples:
– Code and value from a MARC 21 fixed field, e.g., code “e” in Leader/06 is “cartographic material”
– Enumerated value “MD5” for METS CHECKSUMTYPE
– Enumerated value “born digital” in MODS digitalOrigin


Code lists

Some are established as ISO standards and used worldwide in many communities for many purposes

The standard standardizes the code, not a particular name for it

Codes are used as identifiers

Examples (maintained by LC):
– ISO 639-2 (language codes)
– MARC relator codes
– MARC country codes


Thesauri

A thesaurus is a controlled vocabulary with multiple types of relationships

Example (UF = used for, BT = broader term, NT = narrower term, RT = related term):
Rice
  UF paddy
  BT Cereals
  BT Plant products
  NT Brown rice
  RT Rice straw
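Each relationship in a thesaurus entry implies a reciprocal one on the other term. The Python sketch below makes that concrete; it is illustrative only, and the `Entry` class and `add_reciprocals` function are hypothetical names, not part of any LC system. Given entries that state only one direction of each link, it adds the inverse links (NT on one term implies BT on the other, and RT is symmetric) and builds the USE references implied by UF.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One thesaurus term and its relationships (hypothetical structure)."""
    term: str
    uf: set[str] = field(default_factory=set)  # "used for": non-preferred synonyms
    bt: set[str] = field(default_factory=set)  # broader terms
    nt: set[str] = field(default_factory=set)  # narrower terms
    rt: set[str] = field(default_factory=set)  # related terms


def add_reciprocals(thesaurus: dict[str, Entry]) -> dict[str, str]:
    """Add the inverse of every BT/NT/RT link in place and return a USE
    table mapping each non-preferred term to its preferred term."""
    use: dict[str, str] = {}
    # Iterate over a snapshot, because new entries may be created below.
    for entry in list(thesaurus.values()):
        for other in entry.bt:  # X BT Y implies Y NT X
            thesaurus.setdefault(other, Entry(other)).nt.add(entry.term)
        for other in entry.nt:  # X NT Y implies Y BT X
            thesaurus.setdefault(other, Entry(other)).bt.add(entry.term)
        for other in entry.rt:  # RT is symmetric
            thesaurus.setdefault(other, Entry(other)).rt.add(entry.term)
        for synonym in entry.uf:  # X UF s implies s USE X
            use[synonym] = entry.term
    return use


# The "Rice" entry from the example above.
thesaurus = {
    "Rice": Entry("Rice", uf={"paddy"}, bt={"Cereals", "Plant products"},
                  nt={"Brown rice"}, rt={"Rice straw"}),
}
use = add_reciprocals(thesaurus)
print(use["paddy"])                     # Rice
print(sorted(thesaurus["Cereals"].nt))  # ['Rice']
print(thesaurus["Brown rice"].bt)       # {'Rice'}
```

Storing only one direction and deriving the other keeps the source data small and guarantees that the two sides of a relationship never disagree.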


Standards maintained at LC that use controlled vocabularies

MARC (including code lists)
MODS
METS
MIX (XML schema for Z39.87 Technical metadata for digital still images)
PREMIS
ISO 639-2 (language codes)
Thesaurus of Graphic Materials
LCSH
… and some others


SKOS: What is it?

Simple Knowledge Organisation System(s)

SKOS is … for declaring and publishing taxonomies, thesauri or classification schemes, for use in a distributed, decentralised information system (i.e. a semantic web).

For describing concepts and creating relationships between concepts and terms

A practical application of RDF: a formal language for representing controlled, structured vocabularies


The SKOS data model

…views a knowledge organization system as a concept scheme comprising a set of conceptual resources (concepts).

– These concept schemes and conceptual resources are identified by URIs.

– The model is multilingual and extensible


Concepts can be…

labeled with any number of strings. In any given language, one label can be indicated as the "preferred" label for that language and others as "alternate" or "hidden" labels; a concept can also be given a notation, a code that identifies it within its concept scheme:
– skos:prefLabel
– skos:altLabel
– skos:hiddenLabel
– skos:notation
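The W3C SKOS Reference attaches integrity rules to these label properties: a concept has at most one skos:prefLabel per language tag, and skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint, so the same literal cannot appear under two of them. The Python sketch below checks a concept's labels against those two rules. Representing labels as (property, text, language) tuples and the `label_problems` name are assumptions made for illustration; skos:notation is deliberately ignored because it is not a label property. The sample labels are those of the ISO 639-2 concept "por" shown later in this presentation.

```python
from collections import Counter

LABEL_PROPERTIES = {"skos:prefLabel", "skos:altLabel", "skos:hiddenLabel"}

# Hypothetical representation: (property, literal text, language tag).
Label = tuple[str, str, str]


def label_problems(labels: list[Label]) -> list[str]:
    """Return a description of each SKOS labelling rule the labels break."""
    problems = []

    # Rule 1: no more than one skos:prefLabel per language tag
    # (language tags are compared case-insensitively).
    pref_counts = Counter(lang.lower() for prop, _, lang in labels
                          if prop == "skos:prefLabel")
    for lang, count in sorted(pref_counts.items()):
        if count > 1:
            problems.append(f"{count} prefLabels for language '{lang}'")

    # Rule 2: the same text + language may not be used by two label properties.
    props_by_literal: dict[tuple[str, str], set[str]] = {}
    for prop, text, lang in labels:
        if prop in LABEL_PROPERTIES:
            props_by_literal.setdefault((text, lang.lower()), set()).add(prop)
    for (text, lang), props in sorted(props_by_literal.items()):
        if len(props) > 1:
            problems.append(f"'{text}'@{lang} used as {', '.join(sorted(props))}")

    return problems


por = [
    ("skos:prefLabel", "por", "x-notation"),
    ("skos:altLabel", "Portuguese", "en-Latn"),
    ("skos:altLabel", "portugais", "fr-Latn"),
    ("skos:notation", "por", ""),
]
print(label_problems(por))  # []

# A deliberately faulty label set.
broken = [
    ("skos:prefLabel", "Portuguese", "en"),
    ("skos:prefLabel", "Portuguese language", "en"),
    ("skos:altLabel", "Portuguese", "en"),
]
print(label_problems(broken))
# ["2 prefLabels for language 'en'", "'Portuguese'@en used as skos:altLabel, skos:prefLabel"]
```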


Concepts can be…

linked to other concepts within the same concept scheme.

Hierarchical links:
– skos:broader and skos:narrower
– skos:broaderTransitive and skos:narrowerTransitive

Associative links:
– skos:related
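In SKOS, skos:broader is not itself transitive; skos:broaderTransitive is its transitive super-property, so the broaderTransitive values of a concept are everything reachable by following skos:broader links upward. A minimal Python sketch of that closure follows, assuming (for illustration only) that the hierarchy is held as a plain dictionary from each concept to its broader concepts; the data reuses the Rice example from the thesaurus slide, and the function names are hypothetical.

```python
from collections import deque


def broader_transitive(broader: dict[str, set[str]], concept: str) -> set[str]:
    """Every concept reachable from `concept` by following skos:broader
    links; the `seen` set also guards against accidental cycles."""
    seen: set[str] = set()
    queue = deque(broader.get(concept, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(broader.get(parent, ()))
    return seen


def narrower_from_broader(broader: dict[str, set[str]]) -> dict[str, set[str]]:
    """skos:narrower is the inverse of skos:broader, so it can be derived."""
    narrower: dict[str, set[str]] = {}
    for child, parents in broader.items():
        for parent in parents:
            narrower.setdefault(parent, set()).add(child)
    return narrower


broader = {
    "Brown rice": {"Rice"},
    "Rice": {"Cereals", "Plant products"},
}
print(sorted(broader_transitive(broader, "Brown rice")))
# ['Cereals', 'Plant products', 'Rice']
print(narrower_from_broader(broader)["Rice"])  # {'Brown rice'}
```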


Concepts can be…

grouped into collections, which can be labeled and/or ordered. A concept can be in one or more collections.
– skos:Collection
– skos:OrderedCollection
– skos:member
– skos:memberList
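An ordered collection keeps its members in sequence through skos:memberList, whose value is an RDF list; in RDF/XML such a list can be written with rdf:parseType="Collection". The Python sketch below builds a description of this kind with the standard library's ElementTree. The collection URI, its label and the `ordered_collection` helper are hypothetical; the member URIs are taken from the PREMIS event type example later in this presentation.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2008/05/skos#"  # SKOS namespace used in the examples
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def ordered_collection(uri: str, label: str, members: list[str]) -> ET.Element:
    """Describe a skos:OrderedCollection whose skos:memberList holds
    `members` in the given order."""
    coll = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    ET.SubElement(coll, f"{{{RDF}}}type",
                  {f"{{{RDF}}}resource": SKOS + "OrderedCollection"})
    pref = ET.SubElement(coll, f"{{{SKOS}}}prefLabel", {XML_LANG: "en"})
    pref.text = label
    # rdf:parseType="Collection" makes the child nodes an ordered RDF list.
    member_list = ET.SubElement(coll, f"{{{SKOS}}}memberList",
                                {f"{{{RDF}}}parseType": "Collection"})
    for member in members:
        ET.SubElement(member_list, f"{{{RDF}}}Description",
                      {f"{{{RDF}}}about": member})
    return coll


BASE = "http://www.loc.gov/standards/registry/vocabulary/preservationEvents/"
coll = ordered_collection(
    "http://example.org/collections/event-types",  # hypothetical URI
    "Example event types",                         # hypothetical label
    [BASE + "creation", BASE + "migration", BASE + "normalization"],
)
print(ET.tostring(coll, encoding="unicode"))
```

A plain skos:Collection would instead list its members with repeated skos:member properties, which carry no order.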


Concepts can be…

mapped to other concepts in different concept schemes.

Hierarchical mapping:
– skos:broadMatch
– skos:narrowMatch

Associative mapping:
– skos:relatedMatch

Equivalence mapping:
– skos:closeMatch
– skos:exactMatch
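The SKOS Reference gives the mapping properties inverse and symmetry rules: skos:broadMatch and skos:narrowMatch are inverses of each other, while skos:relatedMatch, skos:closeMatch and skos:exactMatch are symmetric. A registry that stores only one direction of each mapping can therefore derive the other, as in this Python sketch. The triple representation and the function names are illustrative assumptions; the por/pt pair comes from the ISO 639-2 record shown later in this presentation.

```python
Triple = tuple[str, str, str]  # (subject URI, SKOS mapping property, object URI)

SYMMETRIC = {"skos:relatedMatch", "skos:closeMatch", "skos:exactMatch"}
INVERSE = {"skos:broadMatch": "skos:narrowMatch",
           "skos:narrowMatch": "skos:broadMatch"}


def with_inverses(mappings: set[Triple]) -> set[Triple]:
    """Return the stated mappings plus those implied by symmetry and inverses.
    (skos:exactMatch is also transitive; that closure is not computed here.)"""
    result = set(mappings)
    for subject, prop, obj in mappings:
        if prop in SYMMETRIC:
            result.add((obj, prop, subject))
        elif prop in INVERSE:
            result.add((obj, INVERSE[prop], subject))
    return result


def matches(triples: set[Triple], subject: str, prop: str) -> set[str]:
    """Objects linked to `subject` by `prop`."""
    return {o for s, p, o in triples if s == subject and p == prop}


REG = "http://www.loc.gov/standards/registry/vocabulary/"
stated = {(REG + "iso639-2/por", "skos:exactMatch", REG + "iso639-1/pt")}
all_mappings = with_inverses(stated)
print(matches(all_mappings, REG + "iso639-1/pt", "skos:exactMatch"))
# {'http://www.loc.gov/standards/registry/vocabulary/iso639-2/por'}
```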


Advantages to using SKOS

SKOS has a defined element set which is particularly relevant for controlled vocabularies

Relationships between entries in a thesaurus can be expressed (broader, narrower, etc.)

Relationships between entries in different thesauri can be expressed (exactMatch, relatedMatch)

Having a dereferenceable URI for concepts and their concept schemes enhances the ability to provide web services for consumers of these standards


Controlled vocabularies registry at LC

The Library of Congress is establishing databases with controlled vocabulary values for the standards that it maintains

Controlled lists are represented using SKOS as well as alternative syntaxes

Lists currently in progress:
– ISO 639-2 and MARC language code list
– MARC geographic area codes
– MARC country code list
– MARC relators
– PREMIS controlled value lists
– Thesaurus of Graphic Materials

Other possibilities:
– Enumerated values in MODS schema
– Coded and uncoded value lists in MARC


Reasons for developing a registry

Facilitate the development and maintenance process

Make controlled lists openly available

Develop a web service where comprehensive information about controlled terms is available

Experiment with semantic web technologies

Expose vocabularies to wider communities

http://www.loc.gov:8081/standards/registry/lists.html


Example: ISO 639-2 vocabulary

One in the family of ISO 639 language coding standards

Has a close relationship with other language coding standards (ISO 639-1 and -3, MARC)

LC is the maintenance agency

The standard is the CODE, not the language name; multiple names are given

ISO 639-2 language code example

<rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
  <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
  <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
  <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
  <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">por</skos:notation>
  <skos:definition xml:lang="en-Latn">This Concept has not yet been defined.</skos:definition>
  <skos:inScheme rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-2"/>
  <vs:term_status>stable</vs:term_status>
  <skos:historyNote rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2006-07-19T08:41:54.000-05:00</skos:historyNote>
  <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  <skos:changeNote rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2008-07-09T13:49:05.321-04:00</skos:changeNote>
</rdf:Description>
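A consumer of the registry might read a record like this with any XML or RDF toolkit. The Python sketch below uses only the standard library's ElementTree on an abbreviated copy of the record; the rdf:RDF wrapper and namespace declarations are added because the slide omits them, and the `summarize` helper and its output shape are hypothetical. A full RDF parser would be the more robust choice, since RDF/XML permits several equivalent serializations that fixed XML paths would miss.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2008/05/skos#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_ABOUT = f"{{{NS['rdf']}}}about"
RDF_RESOURCE = f"{{{NS['rdf']}}}resource"

# Abbreviated copy of the "por" record, wrapped in rdf:RDF with the
# namespace declarations that the slide leaves out.
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2008/05/skos#">
  <rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
    <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
    <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
    <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
    <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">por</skos:notation>
    <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  </rdf:Description>
</rdf:RDF>"""


def summarize(rdf_xml: str) -> dict:
    """Pull the code, names by language, and exact matches out of one
    concept record (hypothetical helper; assumes a single rdf:Description)."""
    concept = ET.fromstring(rdf_xml).find("rdf:Description", NS)
    return {
        "uri": concept.get(RDF_ABOUT),
        "notation": concept.findtext("skos:notation", namespaces=NS),
        "names": {el.get(XML_LANG): el.text
                  for el in concept.findall("skos:altLabel", NS)},
        "exact_matches": [el.get(RDF_RESOURCE)
                          for el in concept.findall("skos:exactMatch", NS)],
    }


info = summarize(RECORD)
print(info["names"])          # {'en-Latn': 'Portuguese', 'fr-Latn': 'portugais'}
print(info["exact_matches"])  # ['http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt']
```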


PREMIS controlled lists

PREMIS Data Dictionary for Preservation Metadata:
– Some semantic units call for controlled vocabularies and have suggested lists
– A central registry could document them and make them available
– Users could submit their own terms
– The PREMIS schema could be enhanced with dynamically generated enumerated values for validation
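Generating enumerations for the PREMIS schema from the registry could work along the lines of this Python sketch, which turns a list of terms into an XML Schema simple type with one xs:enumeration facet per term. It is an illustration, not the PREMIS schema itself: the type name `eventTypeValues` and the `enumeration_type` helper are hypothetical, and in practice the terms would be fetched from the registry rather than written as a literal list. The three values are the event types in the PREMIS example that follows.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def enumeration_type(type_name: str, terms: list[str]) -> str:
    """Build an xs:simpleType restricting xs:string to the given terms."""
    simple = ET.Element(f"{{{XS}}}simpleType", {"name": type_name})
    # The "xs" prefix used in base="xs:string" is declared on the output
    # root, because the root element itself is in the XML Schema namespace.
    restriction = ET.SubElement(simple, f"{{{XS}}}restriction",
                                {"base": "xs:string"})
    for term in sorted(set(terms)):  # de-duplicate; sort for stable output
        ET.SubElement(restriction, f"{{{XS}}}enumeration", {"value": term})
    return ET.tostring(simple, encoding="unicode")


xsd = enumeration_type("eventTypeValues", ["creation", "migration", "normalization"])
print(xsd)
```

Regenerating this fragment whenever the registry changes would keep schema validation in step with the controlled list without hand-editing the schema.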

PREMIS event type example

<rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/creation">
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
  <skos:prefLabel xml:lang="en-latn">creation</skos:prefLabel>
  <skos:narrower rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/migration"/>
  <skos:narrower rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/normalization"/>
  <skos:definition xml:lang="en-latn">the act of creating a new object</skos:definition>
  <skos:inScheme rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents"/>
</rdf:Description>

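Registry web service architecture (diagram). Components: XML Database using XQuery (eXist); RDF Triple Store (Sesame); Registry Web service.
– The user sends an HTTP request to the Registry Web service
– The service interprets the URI and formulates a SPARQL query
– The query is run and the results are obtained
– The results are sent back to the database and then to the user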
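The step in which the web service interprets the URI and formulates a SPARQL query can be illustrated with a Python sketch. It is not LC's implementation: the URI patterns, the http://www.loc.gov base assumed for stored resources, the two query shapes and the `uri_to_sparql` name are assumptions based on the examples in this presentation, and sending the query to the triple store is left out.

```python
import re
from urllib.parse import urlparse

# Assumptions for illustration: registry URIs follow the pattern in the
# examples, and stored resources use the http://www.loc.gov base.
BASE = "http://www.loc.gov"
PREFIX = "/standards/registry/vocabulary/"
SKOS = "http://www.w3.org/2008/05/skos#"
SEGMENT = re.compile(r"[A-Za-z0-9._-]+")  # reject characters that could break the query


def uri_to_sparql(request_uri: str) -> str:
    """Interpret a registry URI and formulate a SPARQL query for it."""
    path = urlparse(request_uri).path
    if not path.startswith(PREFIX):
        raise ValueError(f"not a registry URI: {request_uri}")
    parts = [p for p in path[len(PREFIX):].split("/") if p]
    if not parts or len(parts) > 2 or not all(SEGMENT.fullmatch(p) for p in parts):
        raise ValueError(f"unsupported registry path: {path}")
    resource = BASE + PREFIX + "/".join(parts)
    if len(parts) == 1:
        # A concept scheme: list the concepts that belong to it.
        return (f"PREFIX skos: <{SKOS}>\n"
                f"SELECT ?concept WHERE {{ ?concept skos:inScheme <{resource}> }}")
    # A single concept: return all of its properties and values.
    return f"SELECT ?property ?value WHERE {{ <{resource}> ?property ?value }}"


print(uri_to_sparql("http://www.loc.gov/standards/registry/vocabulary/iso639-2/por"))
print(uri_to_sparql("http://www.loc.gov/standards/registry/vocabulary/iso639-2"))
```

Validating each path segment before interpolating it keeps a malformed request from altering the structure of the generated query.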


Further development

Consider programming changes to improve speed

Develop mechanisms to output all public documentation from the database

Include additional coding about relationships to other concept schemes and controlled vocabularies (facilitating crosswalks)

Encourage experimentation


Questions?

Contacts:
– Rebecca Guenther
– Clay Redding