RDA Vocabularies Briefing
DESCRIPTION
Presented Jan. 15, 2010 for the Technical Services 'Big Heads', as an introduction to the RDA Vocabularies and the opportunities provided by this different approach to data.
TRANSCRIPT
The Case for RDA Vocabularies
Diane Hillmann, Jon Phipps
Metadata Management Associates
1/15/2010 1Big Heads briefing
RDA in Two Parts
You’ve heard about the guidance text
The vocabularies have been developed in parallel
Agreement made in Apr./May 2007 for a Task Group to work on this
Vocabularies are up-to-date as of latest JSC changes
They’re Here
Please explore!
http://metadataregistry.org/rdabrowse.htm
Why are the Vocabularies Important?
They provide a way for libraries to move from a limited, bespoke “format” and elderly encoding to a more modern approach to data creation, management, and sharing
They are open and usable by others, making re-use of non-library data easier for libraries to accomplish
Why Not Just “Improve” MARC?
MARC is optimized for records, and although some improvement is possible (and is happening), a complete overhaul is not feasible
Too much change (i.e., “improvement”) is likely to make a transition more difficult
We need to re-think our approach to creating, managing, and sharing metadata, not apply band-aids to 45-year-old standards
What’s Going On Outside Libraries?
Many more sources of good data are becoming available
Much of it is freely available, with links to even more sources
NY Times is one of the newer entrants in this field, building links to enrich their own data in a similar environment of retrenchment
We’re Not In This Picture
With MARC, we’re currently “delivering:”
Primarily textual information, with few or no links to follow
Information almost exclusively created and maintained by [expensive] human agents
Currently, as we look at financial retrenchment, we are focusing on how to make our data less expensive by doing less of it
Isn’t this a strategy designed to put us at the margins?
Moving Beyond Records
Linked open data enables conversations with the rest of the data world
This data is independent of format, syntax and "records" (although it can be aggregated for various uses)
May include “crowd-sourced” data (DBpedia or Freebase) or data re-used from other sources
... To Statements
The one book=one record world of MARC is a serious limitation
Making use of FRBR also requires a new view of data management
An RDF approach, based on statements rather than records, gives us a means to incorporate other sources of data and to do so using cheaper machine-based strategies
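A minimal sketch of what "statements rather than records" could look like, using plain Python tuples rather than an RDF library. Every identifier and property name below (the ex: prefix, ex:title, ex:placeOfPublication, and so on) is a hypothetical placeholder, not a term from the RDA Vocabularies.

```python
# Each statement is a (subject, property, value) triple.
# All identifiers are hypothetical placeholders, not real RDA properties.
library_statements = {
    ("ex:book1", "ex:title", "An Example Title"),
    ("ex:book1", "ex:placeOfPublication", "ex:daytona_beach"),
}

# Statements about the same place, created and maintained by someone else.
external_statements = {
    ("ex:daytona_beach", "ex:label", "Daytona Beach"),
    ("ex:daytona_beach", "ex:locatedIn", "ex:florida"),
}


def merge(*sources):
    """Combine statements from any number of sources; duplicates collapse."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def describe(statements, subject):
    """Return every (property, value) pair asserted about one subject."""
    return sorted((p, v) for s, p, v in statements if s == subject)


graph = merge(library_statements, external_statements)
print(describe(graph, "ex:daytona_beach"))
```

Because each statement stands alone, adding an outside source is a set union; no record has to be reopened or re-keyed by a person.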
Why Invest in Change?
We know our current way of creating and managing data is:
Unsustainable in an environment of limited resources
Based on a notion of standard data that does not meet the needs of our users
Reliant on expensive human effort
The Vocabularies ...
Built according to RDF Vocabulary standards, can be used in a variety of data environments
Based on library data experience
Intended to be attractive to the data world outside libraries, in hopes that they will use our vocabularies for their bibliographic description
This would make re-use easier for us
Richer, Cheaper Data?
Data that is more easily manipulated and maintained by machine
Data that is created and maintained by someone else, but “good enough” to provide important functionality
Ex.: Geographic data, to support mapping applications
Ex.: Data to better support faceted searching and browsing
Real Example
LC Chronicling America Project
Building georeferencing into library data
About this Newspaper: The Daytona Daily News
• HTML: http://chroniclingamerica.loc.gov/lccn/sn93063916/
• RDF: http://chroniclingamerica.loc.gov/lccn/sn93063916.rdf
• MARC (HTML): http://chroniclingamerica.loc.gov/lccn/sn93063916/marc/
• MARC (XML): http://chroniclingamerica.loc.gov/lccn/sn93063916/marc.xml
• WorldCat (HTML only?): http://www.worldcat.org/oclc/1631353
Un-Linked Data
MARC21 has a naming convention for place names…
752 $a United States $b Florida $c Volusia $d Daytona Beach
Wikipedia also has a naming convention for place names…
http://en.wikipedia.org/wiki/Daytona_Beach,_Florida
An LC staffer created a little script to use the 752 hierarchy to build a Wikipedia URL and see if it would resolve as a URI from DBpedia…
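The LC script itself is not shown in the slides, so the following is only a guess at the general idea, sketched in Python. It assumes the city is in $d and the state in $b, and that the Wikipedia title follows the "City,_State" pattern; the function names are made up for illustration. It builds the candidate URLs but does not test whether they resolve, since that would need a network request.

```python
import re
from urllib.parse import quote


def parse_752(field):
    """Split a displayed 752 field into a {subfield code: value} dict."""
    return {code: value.strip()
            for code, value in re.findall(r"\$(\w)\s*([^$]+)", field)}


def place_title(subfields):
    """Build a 'City,_State' title from $d (city) and $b (state)."""
    return f"{subfields['d']}, {subfields['b']}".replace(" ", "_")


def wikipedia_url(subfields):
    # The Wikipedia URL in the slide keeps the comma literal.
    return "http://en.wikipedia.org/wiki/" + quote(place_title(subfields), safe=",")


def dbpedia_uri(subfields):
    # The DBpedia URI in this deck percent-encodes the comma as %2C.
    return "http://dbpedia.org/resource/" + quote(place_title(subfields), safe="")


field = "752 $a United States $b Florida $c Volusia $d Daytona Beach"
subfields = parse_752(field)
print(wikipedia_url(subfields))
print(dbpedia_uri(subfields))
```

A real version would also need a fallback for places whose Wikipedia title does not follow the "City, State" pattern.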
Linked Data
DBpedia:
<dcterms:coverage rdf:resource="http://dbpedia.org/resource/Daytona_Beach%2C_Florida" />
GeoNames:
<dcterms:coverage rdf:resource="http://sws.geonames.org/4152872/" />
DBpedia
DBpedia is “a community effort to extract structured information from Wikipedia and to make this information available on the Web.”
DBpedia
“The DBpedia knowledge base currently describes more than 2.6 million things, including at least…
213,000 persons
328,000 places
57,000 music albums
36,000 films
20,000 companies.”
DBpedia (even more data)
owl:sameAs
Rdfabout: The 2000 U.S. Census: http://www.rdfabout.com/rdf/usgov/geo/us/fl/counties/volusia_county/daytona_beach
GeoNames: http://sws.geonames.org/4152872/
Freebase: fbase:Daytona Beach, Florida
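One way to picture what these owl:sameAs links buy us, as a rough Python sketch: starting from any one identifier, a program can collect all the others that name the same place. The pairing of the DBpedia URI with each target is an assumption based on the slide, and the `equivalents` helper is hypothetical.

```python
# owl:sameAs pairs as listed on the slide; the DBpedia URI as subject is assumed.
DBPEDIA = "http://dbpedia.org/resource/Daytona_Beach%2C_Florida"
GEONAMES = "http://sws.geonames.org/4152872/"
SAME_AS = [
    (DBPEDIA, "http://www.rdfabout.com/rdf/usgov/geo/us/fl/counties/volusia_county/daytona_beach"),
    (DBPEDIA, GEONAMES),
    (DBPEDIA, "fbase:Daytona Beach, Florida"),
]


def equivalents(start, pairs):
    """Collect every identifier reachable from `start` via sameAs links."""
    found = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for a, b in pairs:
            # sameAs is symmetric, so follow each link in both directions.
            for here, there in ((a, b), (b, a)):
                if here == current and there not in found:
                    found.add(there)
                    frontier.append(there)
    return found


for uri in sorted(equivalents(GEONAMES, SAME_AS)):
    print(uri)
```

Starting from the GeoNames URI that Chronicling America already records, the links lead to the census and Freebase identifiers without anyone in a library having entered them.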
Interesting Questions
There are hundreds, if not thousands, of people tracking down place names in Wikipedia and making sure they are normalized and geo-referenced.
Is this crowd-sourced, Wikipedia data ‘authoritative’?
Is it ‘good enough’?
How different is this from the strategy that’s used for NACO?
More Questions
Chronicling America’s data for the Daytona Beach Daily News references DBpedia, but there’s no corresponding reference to Chronicling America data in DBpedia, even though there’s a ‘place’ where it could be referenced.
How do we make sure that happens?
Where’s the library data anyway?
Even More Questions
DBpedia uses its own vocabulary for many statements, and chooses to use skos:subject instead of dc:subject and foaf:name instead of dc:title.
Was there a specific reason for this choice?
Would there be value for us if they used more RDA properties instead?
How Do We Get From Here to There?
Work with vendors to shift from MARC to RDA; from records to statements
Focus community effort on solid innovation rather than incremental shifts
Worry less about the costs of moving forward, and more about the costs of stasis
Support open sharing of library data!
The Elephants in the Room
Record “ownership” as OCLC is attempting to enforce it will not help libraries as they attempt to move forward
OCLC’s membership must reinforce an open model of record use and re-use, lest necessary innovation be stifled
LC’s R2 report recommends a backward-facing strategy
Given LC’s well-known (and well-respected) record for innovation, why is cataloging data exempt from consideration?
What RDA Vocabularies Bring to the Table
Readiness for participation in the open data world
Potential for automating more data capture to enrich library data without using expensive human resources, and sharing without artificial boundaries
Improved "marketing" of our collections (particularly digital and special collections) beyond the library world