Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues


Upload: sheila-hicks

Post on 22-Dec-2015



Metadata Standards and Applications

8. Metadata Interoperability and Quality Issues

Goals of Session

Understand interoperability protocols (OpenURL for reference, OAI-PMH for metadata sharing)

Understand crosswalking and mapping as it relates to interoperability

Investigate issues concerning metadata quality

Metadata Standards & Applications 2

What's the Point About Interoperability?

For users, it's about resource discovery (user tasks):
– What's out there?
– Is it what I need for my task?
– Can I use it?

For resource creators, it's about distribution and marketing:
– How can I increase the number of people who find my resources easily?
– How can I justify the funding required to make these resources available?


OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting (http://www.openarchives.org)

Roots in the ePrint community, although applicability is much broader

Mission: "The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content."

Content in this context is actually "metadata about content"


Metadata About the Resource


OAI-PMH in a Nutshell

Essentially provides a simple protocol for "harvest" and "exposure" of metadata records

Specifies a simple "wrapper" around metadata records, providing metadata about the record itself

OAI-PMH is about the metadata, not about the resources
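To make the "wrapper" idea concrete, here is a minimal Python sketch, using only the standard library, that splits one record from a GetRecord response into its OAI header (metadata about the record: identifier, datestamp, set membership) and its oai_dc payload (metadata about the resource). The sample response, repository URL, and identifier are made up for illustration; the namespace URIs are the ones defined for OAI-PMH 2.0 and oai_dc.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and oai_dc schemas
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}

# A trimmed, hypothetical GetRecord response
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-06-01T12:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:example.org:123"
           metadataPrefix="oai_dc">http://example.org/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:123</identifier>
        <datestamp>2005-05-30</datestamp>
        <setSpec>theses</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def parse_record(xml_text):
    """Split one OAI-PMH record into its header (about the record) and DC payload (about the resource)."""
    root = ET.fromstring(xml_text)
    record = root.find(".//oai:record", NS)
    header = record.find("oai:header", NS)
    about_record = {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
    }
    about_resource = {}
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    for element in dc:
        # element.tag looks like "{http://purl.org/dc/elements/1.1/}title"
        name = element.tag.split("}", 1)[1]
        about_resource.setdefault(name, []).append(element.text)
    return about_record, about_resource


header, payload = parse_record(SAMPLE)
print(header["datestamp"], payload["title"])  # 2005-05-30 ['An Example Thesis']
```

Note that the datestamp belongs to the header, so it describes when the record changed, not when the thesis was written.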


The OAI World

Divided into two categories:

– Data providers: "A data provider maintains one or more repositories (web servers) that support the OAI-PMH as a means of exposing metadata."

– Service providers: "A service provider issues OAI-PMH requests to data providers and uses the metadata as a basis for building value-added services."


Other Important Definitions

Archive: not the same as "archive" as used in libraries; more like "repository"

Protocol: a set of rules defining communication between systems. FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are other examples of Internet protocols.

Harvesting: the gathering together of metadata from a number of distributed repositories into a combined data store


Inside OAI Repositories

repository: A repository is a network-accessible server that can process OAI-PMH requests. A repository is managed by a data provider to expose metadata to harvesters.

resource: A resource is the object or "stuff" that metadata is "about", whether physical or digital, stored in the repository or a constituent of another database.

item: An item is a constituent of a repository from which metadata about a resource can be disseminated.

record: A record is metadata in a specific metadata format.


OAI Goals

Low barrier to participation
– Server software available in many programming languages, intended to be easy to install
– Server-less implementation available now via "Static Repository" (essentially a web page that looks like an OAI response and can be harvested as such)

Limited set of commands

Predictable responses and flows of data


Other OAI Info

Responses are encoded in XML syntax

OAI-PMH supports any metadata format encoded in XML; Simple Dublin Core is the minimal format specified

Data providers may define a logical set hierarchy to support levels of granularity for harvesting by service providers

Datestamps flag the last change to each metadata record and thus provide further support for selective harvesting

OAI-PMH supports flow control (large responses are split into pages linked by a resumptionToken)


OAI Requests

Identify → returns general information about the particular OAI server (repository)

ListMetadataFormats → returns formats available

ListSets → returns list of sets available

ListIdentifiers → returns identifiers (record headers) only

ListRecords → returns full records, optionally restricted to a set or date range

GetRecord → returns a particular record

Try it out at the UIUC OAI Registry (http://gita.grainger.uiuc.edu/registry/searchform.asp)
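The six verbs are sent as ordinary HTTP GET parameters. The sketch below assumes a hypothetical repository at http://example.org/oai and only builds request URLs; it does not contact a server. The argument names (metadataPrefix, set, identifier, from) are the protocol's own.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/oai"  # hypothetical repository endpoint

VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request(verb, **arguments):
    """Return the GET URL for an OAI-PMH request (no network access)."""
    if verb not in VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    # urlencode percent-escapes reserved characters such as ":" in identifiers
    return BASE_URL + "?" + urlencode({"verb": verb, **arguments})


print(oai_request("Identify"))
print(oai_request("ListRecords", metadataPrefix="oai_dc", set="theses"))
# "from" is a Python keyword, so it is passed through a dictionary
print(oai_request("ListIdentifiers", metadataPrefix="oai_dc", **{"from": "2005-01-01"}))
```

Pasting any of the printed URLs (with a real base URL substituted) into a browser is an easy way to see the raw XML responses.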


Dates Used in OAI-PMH

Datestamps are used as values in requests to support selective harvesting by date (generally the latest update date of the metadata record)

Datestamps are also used in record headers in responses

Datestamps are particular to a repository

Repeat: OAI dates are about the metadata, not the resources
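Datestamps and flow control work together in a harvester: the first request carries a from datestamp, and each later request sends back only the resumptionToken returned by the repository. A rough sketch, assuming a caller-supplied fetch function (hypothetical; in practice a wrapper around an HTTP client) so the paging logic can be shown without network access:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, base_url, from_date, prefix="oai_dc"):
    """Yield (identifier, datestamp) for records changed on or after from_date.

    fetch: hypothetical callable that takes a URL and returns the response text.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": prefix, "from": from_date}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier"), header.findtext(OAI + "datestamp")
        token = root.find(".//" + OAI + "resumptionToken")
        # An absent or empty resumptionToken means the list is complete
        if token is None or not (token.text or "").strip():
            break
        # resumptionToken is an exclusive argument: send only the verb and the token
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}
```

A production harvester would also handle error responses such as noRecordsMatch, deleted records, and retries; those are omitted here.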


OAI-PMH Optional Containers

Repository level
– Rights
– Branding

Record level
– About: Provenance, Rights


About Container Example


OAI Rights Expressions

Rights expressions are valid at three levels:
– Repository
– Set
– Record

Rights expressed at the Repository and Set levels are not a substitute for expressions at the Record level

OAI Best Practices (DLF & NSDL)

Guidelines for data providers and service providers
– http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page

Best Practices for Shareable Metadata
– http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC


OAI In Practice

The UIUC OAI-PMH Data Provider Registry
– http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers

Link on home page to service providers

Provides multiple reports, sample records, browses, search, etc.

Ex.: Show report from left-hand menu "Distinct Metadata Schemas"
– http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
– Choose a schema, look for providers and sample records


What's an OpenURL?

The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services

Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)


"The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, and agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material."

-- OpenURL Overview, SFX website


OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a "link server" or "link resolver"

Link server defines the user context

Takes source citation and determines whether a user has access
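As a toy illustration of how a link server applies user context, the sketch below reads citation keys (issn, date) from an OpenURL query string and checks them against one library's holdings. The holdings table and the returned messages are hypothetical; real resolvers consult a knowledge base of licences and also take into account how the user was identified.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical holdings: ISSN -> (first year, last year) the library can provide
HOLDINGS = {"1234-5678": (1995, 2004)}


def resolve(openurl):
    """Toy link server: choose a service for the cited item based on local holdings."""
    query = parse_qs(urlparse(openurl).query)
    issn = query.get("issn", [""])[0]
    date = query.get("date", [""])[0]
    year = int(date[:4]) if date[:4].isdigit() else None
    span = HOLDINGS.get(issn)
    if span and year is not None and span[0] <= year <= span[1]:
        return "link to full text"
    return "offer catalogue search or interlibrary loan"


print(resolve("http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12"))
```

The same citation can therefore resolve differently for users of different institutions, which is the point of keeping the user context in the link server rather than in the citation.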


Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user's IP address

Obtains user attributes via the Shibboleth framework


Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library holdings of the cited journal


OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
– http://www.ukoln.ac.uk/distributed-systems/openurl/
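The example OpenURL above is the resolver's base URL followed by citation metadata as key/value pairs. A minimal sketch of producing such a link from a citation record, assuming OpenURL 0.1-style keys (issn, date, volume, issue, spage, atitle); the later ANSI/NISO Z39.88-2004 version (OpenURL 1.0) uses a different key format (e.g. rft.issn), which this sketch does not cover.

```python
from urllib.parse import urlencode

# OpenURL 0.1-style keys, emitted in this fixed order when present
KEYS = ("issn", "date", "volume", "issue", "spage", "atitle")


def make_openurl(resolver_base, citation):
    """Serialize citation metadata as an OpenURL 0.1-style query string."""
    pairs = [(key, citation[key]) for key in KEYS if key in citation]
    return resolver_base + "?" + urlencode(pairs)


citation = {"issn": "1234-5678", "date": "1998", "volume": "12",
            "issue": "2", "spage": "134"}
print(make_openurl("http://sfxserver.uni.edu/sfxmenu", citation))
```

Because only the resolver base changes from one institution to the next, the same citation metadata can be pointed at whichever link server serves the current user.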


Defining and Ensuring Metadata Quality

What constitutes quality?

Techniques for evaluating and enforcing consistency and predictability

Automated metadata creation: advantages and disadvantages

Metadata maintenance strategies


Beginning to Define Quality

Experience of the library community (BIBCO & NACO)
– Agreed-upon standards for library quality
– Training and documentation in support of practitioners
– Review and enforcement of standards by means of an institutional "buddy system"


How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of standards of quality must take place at the community level

Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability


Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (currency and lag)
Accessibility


Completeness

"Metadata should describe the target objects as completely as economically feasible."

"Element set should be applied to the target object population as completely as possible."
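The second completeness statement can be turned into a simple measurement: for each element in the set, the share of records that carry a non-empty value. A minimal sketch, assuming records have already been parsed into dictionaries of element name to list of values; the expected element list would come from the local application profile (the one in the usage example is hypothetical).

```python
def completeness(records, elements):
    """Share of records with at least one non-empty value for each element."""
    total = len(records)
    scores = {}
    for name in elements:
        filled = sum(
            1 for rec in records
            if any((value or "").strip() for value in rec.get(name, []))
        )
        scores[name] = filled / total if total else 0.0
    return scores


records = [
    {"title": ["A"], "language": ["en"]},
    {"title": ["B"], "language": [" "]},   # whitespace-only counts as empty
    {"title": ["C"]},
    {"title": ["D"], "language": ["fr"]},
]
print(completeness(records, ["title", "language", "format"]))
# {'title': 1.0, 'language': 0.5, 'format': 0.0}
```

A score like this says nothing about accuracy; a field filled with "unknown" still counts as complete.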


Accuracy

Information provided in values should be correct and factual

Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviations and usages in general


Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or created by machine?

What transformations have been applied since creation?

Where has it been before?


Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata is reflective of community thinking about necessary compromises


Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values


Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object disseminated before some or all metadata is available

"Metadata aging" is affected by cultural differences between librarians and technologists
– Librarians: once and it's done
– Technologists: metadata as an iterative process
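Currency and lag can both be checked mechanically when the date the target object last changed is known alongside the metadata datestamp. A sketch under that assumption, with the two inputs as plain dictionaries keyed by a hypothetical record id:

```python
from datetime import date


def timeliness_report(resources, metadata):
    """Split resource ids into 'stale' (currency problem) and 'missing' (lag problem).

    resources: id -> date the target object last changed (or was disseminated)
    metadata:  id -> datestamp of the last metadata update
    """
    stale, missing = [], []
    for rid, changed in sorted(resources.items()):
        stamp = metadata.get(rid)
        if stamp is None:
            missing.append(rid)   # lag: object is out, metadata is not
        elif stamp < changed:
            stale.append(rid)     # currency: object changed, metadata did not
    return stale, missing


resources = {"a": date(2005, 3, 1), "b": date(2005, 1, 1), "c": date(2005, 2, 1)}
metadata = {"a": date(2005, 1, 15), "b": date(2005, 1, 2)}
print(timeliness_report(resources, metadata))  # (['a'], ['c'])
```

Run periodically, a report like this supports the technologists' view of metadata as something to revisit rather than finish once.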


Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as "premium" or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)


Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages
    Includes some formatting and color coding
– Disadvantages
    Assumes consistency/predictability
    Difficult to determine extent of problems found
    Tedious at best
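If random sampling is the chosen method, drawing the sample with a fixed seed at least makes it reproducible, so a second reviewer can inspect the same records. A minimal Python sketch (the seed value and the id format are arbitrary choices for illustration):

```python
import random


def sample_records(record_ids, k, seed=42):
    """Draw a reproducible random sample of record ids for manual review."""
    ids = list(record_ids)
    rng = random.Random(seed)  # fixed seed: the same inputs give the same sample
    return rng.sample(ids, min(k, len(ids)))


ids = [f"oai:example.org:{n}" for n in range(1000)]
print(sample_records(ids, 5))
```

This does not remove the other disadvantages listed: a sample still says little about how widespread a problem is across the whole file.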


Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages
    Better sorting and control by reviewer
– Disadvantages
    Unwieldy for large files
    Requires sustained focus from reviewer
    Requires translation into tab-delimited file
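The translation into a tab-delimited file can be scripted. A minimal sketch with Python's csv module, assuming records are (id, dictionary of element name to list of values) pairs; joining repeated values with " | " is an arbitrary choice made here so each element fits in one cell.

```python
import csv
import io


def records_to_tsv(records, elements):
    """Flatten records into tab-delimited text: one row per record, one column per element."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["record_id"] + list(elements))
    for rid, rec in records:
        # Join repeated values so each element occupies a single cell
        writer.writerow([rid] + [" | ".join(rec.get(name, [])) for name in elements])
    return out.getvalue()


records = [("r1", {"title": ["A"], "subject": ["x", "y"]}), ("r2", {"title": ["B"]})]
print(records_to_tsv(records, ["title", "subject"]))
```

Writing the result to a .tsv file gives something both a spreadsheet and a visual analysis tool can open directly.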


Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)
– Advantages
    View of several data dimensions simultaneously
    Reviewer controls data display
    Tends to pull reviewer focus to anomalies
    Handles fairly large files at one time while allowing subset views
    Display manipulation possible without programmers
– Disadvantages
    High cost of software
    Requires translation into tab-delimited file


Element Names vs Record Ids (Scatter Plot)


Missing Elements (Scatter Plot)

Figure callouts: 2 records without a language element; format element present inconsistently; easy to rescale axis on the fly and scroll through records
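The gaps that a scatter plot of element names against record ids makes visible can also be listed directly from the data. This sketch assumes each record is an (id, dictionary of element name to list of values) pair; it builds an element-to-records presence map and reports, for each element that is not present everywhere, which records lack it.

```python
def presence_matrix(records):
    """Map each element name to the set of record ids in which it has a value."""
    matrix = {}
    for rid, rec in records:
        for name, values in rec.items():
            if values:
                matrix.setdefault(name, set()).add(rid)
    return matrix


def gaps(records):
    """Per element, the record ids that lack it (elements present everywhere are omitted).

    Elements absent from every record never appear in the matrix, so they are
    not reported here; compare against an expected element list for those.
    """
    all_ids = [rid for rid, _ in records]
    matrix = presence_matrix(records)
    return {
        name: [rid for rid in all_ids if rid not in present]
        for name, present in sorted(matrix.items())
        if len(present) < len(all_ids)
    }


records = [
    ("r1", {"title": ["A"], "language": ["en"], "format": ["text/html"]}),
    ("r2", {"title": ["B"], "format": ["text/html"]}),
    ("r3", {"title": ["C"]}),
    ("r4", {"title": ["D"], "language": ["fr"], "format": []}),
]
print(gaps(records))  # {'format': ['r3', 'r4'], 'language': ['r2', 'r3']}
```

The visual tool remains better at showing patterns at a glance; a listing like this is useful once the reviewer knows what to fix.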


Table View

Figure callouts: non-empty "no information" values that may confuse end users; only DC Date elements are selected for display; the only W3CDTF syntax present is four digits; sorted by element value
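The two problems called out in the table view, placeholder values and non-W3CDTF dates, can be flagged automatically. A sketch assuming a hypothetical list of placeholder strings; the regular expression covers the W3CDTF profile's levels (YYYY, YYYY-MM, YYYY-MM-DD, and date-times with hh:mm, optional seconds and fractions, and a time zone designator) but checks syntax only, not whether a month or day is in range.

```python
import re

# W3CDTF syntax: YYYY[-MM[-DD[Thh:mm[:ss[.s]]TZD]]], where TZD is Z or +hh:mm / -hh:mm
W3CDTF = re.compile(r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?")

# Hypothetical "no information" strings; extend with whatever the local data uses
PLACEHOLDERS = {"unknown", "n.d.", "no date", "none", "not available"}


def classify_dates(values):
    """Bucket DC date strings as 'w3cdtf', 'placeholder', or 'other'."""
    buckets = {"w3cdtf": [], "placeholder": [], "other": []}
    for value in values:
        text = value.strip()
        if text.lower() in PLACEHOLDERS:
            buckets["placeholder"].append(value)
        elif W3CDTF.fullmatch(text):
            buckets["w3cdtf"].append(value)
        else:
            buckets["other"].append(value)
    return buckets


print(classify_dates(["1998", "2005-06-01", "n.d.", "Unknown", "ca. 1900"]))
```

Placeholders are checked first so that a value like "none" is never mistaken for missing data of another kind; everything in "other" needs a human decision.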


Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies


… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the "general good"
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation


Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access"


"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts (Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on), but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, "Two Paths to Interoperable Metadata"

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes transformations required to convert a metadata record's content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
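A crosswalk can be expressed as a lookup table, and running it makes the limits Woodley describes visible. The sketch below uses only the two equivalences from the Godby quote (DC title to MARC 245, author/creator to MARC 100) and treats both tags as non-repeatable, as they are in MARC 21. It is a toy, not the Library of Congress crosswalk; a real conversion would route additional names to 700 rather than dropping them.

```python
# Toy crosswalk (DC element -> MARC tag) following the Godby quote
CROSSWALK = {"title": "245", "creator": "100"}
# Both target tags are non-repeatable (NR) in MARC 21
NON_REPEATABLE = {"245", "100"}


def convert(dc_record):
    """Map a DC record (element -> list of values) onto MARC tags.

    Returns (fields, lost), where lost lists the (element, value) pairs
    this crosswalk could not carry over.
    """
    fields, lost = {}, []
    for element, values in dc_record.items():
        tag = CROSSWALK.get(element)
        if tag is None:
            lost.extend((element, v) for v in values)      # no equivalent defined
        elif tag in NON_REPEATABLE:
            fields[tag] = values[:1]
            lost.extend((element, v) for v in values[1:])  # repeatability mismatch
        else:
            fields[tag] = list(values)
    return fields, lost


record = {"title": ["Sample report"], "creator": ["Doe, Jane", "Roe, Richard"],
          "rights": ["Open access"]}
fields, lost = convert(record)
print(fields)  # {'245': ['Sample report'], '100': ['Doe, Jane']}
print(lost)    # [('creator', 'Roe, Richard'), ('rights', 'Open access')]
```

Reporting what was lost, instead of silently discarding it, is what lets a reviewer decide whether the crosswalk needs more rules.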


Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards make sharing sometimes problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution


Example Mapping: MODS:title to DC:title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xlink

Subelements: title, partName, partNumber, nonSort


Mapping MODS:title to DC:title

DC has one element refinement: Alternative

DC title has no substructure; MODS allows for subelements for partNumber, partName

Best practice statement in DC-Lib says to include initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
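The DC-Lib practice and the lack of substructure in DC title together suggest one way to flatten MODS titles. A sketch, assuming a single <mods> root element in the MODS v3 namespace. The punctuation used to join the parts (": " before subTitle, ". " before partNumber and partName) and the rule that any typed titleInfo becomes a DC alternative are simplifying assumptions, not part of a published crosswalk.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def _part(title_info, name):
    """Return the stripped text of a titleInfo subelement, or "" if absent."""
    value = title_info.findtext(MODS + name)
    return value.strip() if value else ""


def mods_titles_to_dc(mods_xml):
    """Flatten top-level MODS titleInfo elements into DC title strings."""
    root = ET.fromstring(mods_xml)
    dc = {"title": [], "alternative": []}
    # findall (not iter) skips titles nested inside relatedItem
    for info in root.findall(MODS + "titleInfo"):
        # Keep the initial article (DC-Lib practice) by re-joining nonSort and title
        title = " ".join(p for p in (_part(info, "nonSort"), _part(info, "title")) if p)
        if _part(info, "subTitle"):
            title += ": " + _part(info, "subTitle")
        for name in ("partNumber", "partName"):
            if _part(info, name):
                title += ". " + _part(info, name)
        # Assumption: untyped titleInfo is the main title; any typed one is an alternative
        key = "title" if info.get("type") is None else "alternative"
        dc[key].append(title)
    return dc


xml = ('<mods xmlns="http://www.loc.gov/mods/v3">'
       '<titleInfo><nonSort>The</nonSort><title>Metadata handbook</title>'
       '<subTitle>a practical guide</subTitle><partNumber>Volume 2</partNumber></titleInfo>'
       '<titleInfo type="alternative"><title>Handbook of metadata</title></titleInfo>'
       '</mods>')
print(mods_titles_to_dc(xml))
```

The reverse direction is harder: once the parts are joined into one DC string, the boundaries between nonSort, title, and part information cannot be recovered reliably.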


Exercise

Evaluate a small set of human-created and machine-created metadata
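One way to start this exercise programmatically is to tabulate, per record, which expected elements are absent (completeness) and which carry non-empty "no information" values that may confuse end users. The Python sketch below assumes the records have already been parsed into dictionaries; the expected element list, the placeholder strings, and both sample records are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, List[str]]

# Hypothetical element set the evaluator expects every record to carry.
EXPECTED = ["title", "creator", "date", "format", "language"]

# Hypothetical values that are non-empty but convey no information.
PLACEHOLDERS = {"unknown", "n/a", "none", "no information"}

# Hypothetical sample: one human-created and one machine-created record.
RECORDS: Dict[str, Record] = {
    "rec-001": {"title": ["Field notes, Site 4"], "creator": ["Doe, J."],
                "date": ["2004-06"], "format": ["text/html"], "language": ["en"]},
    "rec-002": {"title": ["scan0042.tif"], "date": ["unknown"],
                "format": ["image/tiff"]},
}


def missing_elements(records: Dict[str, Record], expected: List[str]) -> Dict[str, List[str]]:
    """For each record id, list expected elements that have no non-blank value."""
    report: Dict[str, List[str]] = {}
    for rec_id, fields in records.items():
        missing = [e for e in expected
                   if not any(v.strip() for v in fields.get(e, []))]
        if missing:
            report[rec_id] = missing
    return report


def placeholder_values(records: Dict[str, Record]) -> Dict[str, List[str]]:
    """For each record id, list elements whose values are 'no information' strings."""
    report: Dict[str, List[str]] = {}
    for rec_id, fields in records.items():
        flagged = [e for e, values in fields.items()
                   if any(v.strip().lower() in PLACEHOLDERS for v in values)]
        if flagged:
            report[rec_id] = flagged
    return report


print(missing_elements(RECORDS, EXPECTED))
print(placeholder_values(RECORDS))
```

A tabulation like this only points the reviewer at candidate problems; whether a missing language element or an "unknown" date actually matters is still a judgment against community expectations.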

Page 3: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

What's the Point About Interoperability?

• For users, it's about resource discovery (user tasks)
  – What's out there?
  – Is it what I need for my task?
  – Can I use it?
• For resource creators, it's about distribution and marketing
  – How can I increase the number of people who find my resources easily?
  – How can I justify the funding required to make these resources available?

OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting (http://www.openarchives.org)

• Roots in the ePrint community, although applicability is much broader
• Mission: "The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content."
• Content in this context is actually "metadata about content"

Metadata About the Resource


OAI-PMH in a Nutshell

• Essentially provides a simple protocol for "harvest" and "exposure" of metadata records
• Specifies a simple "wrapper" around metadata records, providing metadata about the record itself
• OAI-PMH is about the metadata, not about the resources

The OAI World

• Divided into two categories
  – Data providers: "A data provider maintains one or more repositories (web servers) that support the OAI-PMH as a means of exposing metadata."
  – Service providers: "A service provider issues OAI-PMH requests to data providers and uses the metadata as a basis for building value-added services."

Other Important Definitions

• Archive: not the same as "archive" as used in libraries; more like "repository"
• Protocol: a set of rules defining communication between systems. FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are other examples of Internet protocols
• Harvesting: the gathering together of metadata from a number of distributed repositories into a combined data store

Inside OAI Repositories

• repository: A repository is a network-accessible server that can process requests. A repository is managed by a data provider to expose metadata to harvesters.
• resource: A resource is the object or "stuff" that metadata is about, whether physical or digital, stored in the repository or a constituent of another database.
• item: An item is a constituent of a repository from which metadata about a resource can be disseminated.
• record: A record is metadata in a specific metadata format.

OAI Goals

• Low barrier to participation
  – Server software available in many programming languages, intended to be easy to install
  – Server-less implementation available now via a "Static Repository" (essentially an XML file posted at a URL that is formatted like an OAI response and can be harvested as such through a Static Repository Gateway)
• Limited set of commands
• Predictable responses and flows of data

Other OAI Info

• Responses are encoded in XML syntax
• OAI-PMH supports any metadata format encoded in XML; Simple Dublin Core (oai_dc) is the minimal format every repository must support
• Data providers may define a logical set hierarchy to support levels of granularity for harvesting by service providers
• Datestamps record when each metadata record was created, last modified, or deleted, and thus provide further support for granularity of harvesting
• OAI-PMH supports flow control (through resumption tokens)
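Flow control works like this: when a list response is incomplete, the repository ends it with a resumptionToken, and the harvester sends that token back (as the only argument besides the verb) to get the next batch. An empty or missing token means the list is complete. A minimal harvester loop might look like the sketch below. It assumes a caller-supplied fetch function (hypothetical) that performs the HTTP request, and it is checked here against canned responses rather than a live repository.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch: Callable[[Optional[str]], str]) -> Iterator[str]:
    """Yield identifiers from a ListIdentifiers response, following resumptionTokens.

    `fetch` is a hypothetical caller-supplied function: fetch(None) returns the
    first response page as XML text; fetch(token) returns the page for that token.
    """
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(token))
        for header in root.iter(f"{OAI_NS}header"):
            ident = header.find(f"{OAI_NS}identifier")
            if ident is not None and ident.text:
                yield ident.text.strip()
        token_el = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty resumptionToken marks the last page of the list.
        if token_el is None or not (token_el.text or "").strip():
            return
        token = token_el.text.strip()
```

Against a live repository, fetch would issue ListIdentifiers with a metadataPrefix on the first call and verb=ListIdentifiers&resumptionToken=... on later calls.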


OAI Requests

• Identify: returns general information about the particular OAI server
• ListMetadataFormats: returns the metadata formats available
• ListSets: returns the list of sets available
• ListIdentifiers: returns record headers (identifiers) only
• ListRecords: returns full records, optionally restricted to a set or a date range
• GetRecord: returns a particular record
• Try it out at the UIUC OAI Registry (http://gita.grainger.uiuc.edu/registry/searchform.asp)
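Each request is an ordinary HTTP GET with a verb parameter plus that verb's arguments. The sketch below builds request URLs and rejects arguments a verb does not accept. The base URL is hypothetical, and checks for required arguments (such as metadataPrefix for ListRecords) are deliberately left out.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Arguments each OAI-PMH 2.0 verb accepts (required vs. optional not distinguished here).
ALLOWED_ARGS = {
    "Identify": set(),
    "ListMetadataFormats": {"identifier"},
    "ListSets": {"resumptionToken"},
    "ListIdentifiers": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "ListRecords": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def oai_request_url(base_url: str, verb: str, args: Optional[Dict[str, str]] = None) -> str:
    """Build an OAI-PMH GET request URL, rejecting unknown verbs and arguments."""
    args = args or {}
    if verb not in ALLOWED_ARGS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    unexpected = set(args) - ALLOWED_ARGS[verb]
    if unexpected:
        raise ValueError(f"{verb} does not accept: {sorted(unexpected)}")
    # "from" is a Python keyword, so arguments are passed as a dict rather than **kwargs.
    return base_url + "?" + urlencode({"verb": verb, **args})


if __name__ == "__main__":
    base = "http://example.org/oai"  # hypothetical repository base URL
    print(oai_request_url(base, "Identify"))
    print(oai_request_url(base, "ListRecords", {"metadataPrefix": "oai_dc", "set": "theses"}))
```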


Dates Used in OAI-PMH

• Datestamps are used as values in requests to support selective harvesting by date (generally the latest update date of the metadata record)
• Datestamps are also used in record headers in responses
• Datestamps are particular to a repository
• Repeat: OAI dates are about the metadata, not the resources
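Selective harvesting compares record datestamps with the request's from and until arguments, both of which are inclusive. The sketch below shows what a repository does conceptually with those arguments. It assumes day granularity (YYYY-MM-DD); a repository may instead declare seconds granularity (YYYY-MM-DDThh:mm:ssZ) in its Identify response, which this sketch does not handle.

```python
from datetime import date
from typing import Iterable, List, Optional, Tuple


def select_by_datestamp(
    headers: Iterable[Tuple[str, str]],
    from_date: Optional[str] = None,
    until_date: Optional[str] = None,
) -> List[str]:
    """Return identifiers whose datestamp lies within [from_date, until_date].

    Assumes day-granularity datestamps; both bounds are inclusive, as in OAI-PMH.
    """
    lower = date.fromisoformat(from_date) if from_date is not None else None
    upper = date.fromisoformat(until_date) if until_date is not None else None
    selected = []
    for identifier, stamp in headers:
        day = date.fromisoformat(stamp)
        if lower is not None and day < lower:
            continue
        if upper is not None and day > upper:
            continue
        selected.append(identifier)
    return selected
```

A harvester would typically store the date of its last successful harvest and send it as from on the next run, so that only new or changed records come back.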


OAI-PMH Optional Containers

• Repository level
  – Rights
  – Branding
• Record level
  – About
    – Provenance
    – Rights

About Container ExampleAbout Container Example


OAI Rights Expressions

• Rights expressions are valid at three levels
  – Repository
  – Set
  – Record
• Rights expressed at the repository and set levels are not a substitute for expressions at the record level

OAI Best Practices (DLF & NSDL)

• Guidelines for data providers and service providers
  – http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page
• Best Practices for Shareable Metadata
  – http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC

OAI in Practice

• The UIUC OAI-PMH Data Provider Registry
  – http://gita.grainger.uiuc.edu/registry/searchform.asp
• Includes most known data providers
• Link on home page to service providers
• Provides multiple reports, sample records, browses, search, etc.
• Example: show the report "Distinct Metadata Schemas" from the left-hand menu
  – http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
  – Choose a schema; look for providers and sample records

What's an OpenURL?

• The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services
• Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)

"The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, and agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material."

-- OpenURL Overview, SFX website

OpenURL Characteristics

• Protocol operates between an information resource and a service component
• Service component is called a "link server" or "link resolver"
• Link server defines the user context
• Takes the source citation and determines whether a user has access

Distinguishing Users

• Uses information stored in a cookie (the CookiePusher mechanism)
• Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project
• Identifies a user's IP address
• Obtains user attributes via the Shibboleth framework
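Of these methods, IP-address recognition is the simplest to illustrate. The sketch below shows how a link resolver might use it to set the user context before choosing a target for a citation. The licensed address ranges are hypothetical (they are reserved documentation blocks), the two service labels are placeholders, and cookies, certificates, and Shibboleth are not modeled.

```python
import ipaddress

# Hypothetical licensed address ranges (reserved documentation blocks).
LICENSED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_licensed_user(client_ip: str) -> bool:
    """True if the requesting address falls inside one of the licensed ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in LICENSED_NETWORKS)


def choose_target(client_ip: str) -> str:
    """Pick a (hypothetical) service for a resolved citation based on user context."""
    return "licensed full text" if is_licensed_user(client_ip) else "interlibrary loan request"
```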


Examples of Extended Service Links

• From a record in an abstracting and indexing (A&I) database to the full text described by the record
• From a record describing a book in a library catalogue to a description of the same book in an Internet bookshop
• From a reference in a journal article to a record matching that reference in an A&I database
• From a citation in a journal article to a record in a library catalogue that shows the library's holdings of the cited journal

OpenURL Examples & Demo

• http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134
• An OpenURL demo
  – http://www.ukoln.ac.uk/distributed-systems/openurl/
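The example link above is just the resolver's base URL with citation metadata appended as a query string. The sketch below rebuilds that link and then shows the first thing a resolver does with it: recovering the citation fields. The key names (issn, date, volume, issue, spage) follow the OpenURL 0.1 style used in the example. The standardized version, ANSI/NISO Z39.88-2004, uses a different key format (for example, rft.-prefixed keys), which this sketch does not cover.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit


def make_openurl(resolver_base: str, **citation: str) -> str:
    """Append citation metadata to a link resolver's base URL (OpenURL 0.1-style keys)."""
    return resolver_base + "?" + urlencode(citation)


def read_openurl(url: str) -> Dict[str, str]:
    """Resolver side: recover the citation metadata from the query string."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


if __name__ == "__main__":
    link = make_openurl(
        "http://sfxserver.uni.edu/sfxmenu",
        issn="1234-5678", date="1998", volume="12", issue="2", spage="134",
    )
    print(link)
    print(read_openurl(link)["spage"])
```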


Defining and Ensuring Metadata Quality

• What constitutes quality?
• Techniques for evaluating and enforcing consistency and predictability
• Automated metadata creation: advantages and disadvantages
• Metadata maintenance strategies

Beginning to Define Quality

• Experience of the library community: BIBCO & NACO
  – Agreed-upon standards for library quality
  – Training and documentation in support of practitioners
  – Review and enforcement of standards by means of an institutional "buddy system"

How Does Quality Happen?

• Lessons from the library community
  – Quality is quantifiable and measurable
  – To be effective, enforcement of standards of quality must take place at the community level
• Furthermore
  – Data problems are not unique to particular communities
  – General strategies can improve interoperability

Quality Measurement Criteria

• Completeness
• Accuracy
• Provenance
• Conformance to expectations
• Logical consistency and coherence
• Timeliness (currency and lag)
• Accessibility

Completeness

• "Metadata should describe the target objects as completely as economically feasible"
• "Element set should be applied to the target object population as completely as possible"
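The second criterion can be measured mechanically: for each record, count how many expected elements actually carry a value, then average across the collection. The sketch below assumes records are represented as a dict mapping element names to lists of values, and the set of expected elements is hypothetical; a real community would take it from its application profile.

```python
from typing import Dict, List

# Hypothetical set of elements a community expects in every record.
EXPECTED_ELEMENTS = ("title", "creator", "date", "format", "language", "identifier")


def missing_elements(record: Dict[str, List[str]]) -> List[str]:
    """Expected elements that are absent or contain only empty values."""
    return [
        element
        for element in EXPECTED_ELEMENTS
        if not any(value.strip() for value in record.get(element, []))
    ]


def completeness(record: Dict[str, List[str]]) -> float:
    """Share of expected elements that carry at least one non-empty value."""
    return 1 - len(missing_elements(record)) / len(EXPECTED_ELEMENTS)


def collection_completeness(records: List[Dict[str, List[str]]]) -> float:
    """Average per-record completeness across the target object population."""
    return sum(completeness(r) for r in records) / len(records)
```

Note that a whitespace-only value is counted as missing, which catches "present but empty" elements that a simple presence check would miss.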


Accuracy

• Information provided in values should be correct and factual
• Editing applied to
  – Eliminate typos
  – Ensure conforming name expressions
  – Ensure standard abbreviations and usages in general

Provenance

• Who prepared the metadata? What do we know about the preparer?
• What methods were used to create the metadata? Is it human-created or created by machine?
• What transformations have been applied since creation?
• Where has it been before?

Conformance to Expectations

• Contains elements a community would expect to find
• Controlled vocabularies are well chosen and explicitly exposed to downstream users
• Metadata is reflective of community thinking about necessary compromises

Logical Consistency/Coherence

• Standard mechanisms like application profiles and common crosswalks are used
• Similar structures and appearance are enabled for search results
• There is very limited reliance on defaulted values

Timeliness

• Currency
  – Target object changes but metadata does not
• Lag
  – Target object disseminated before some or all metadata is available
• "Metadata aging" is affected by cultural differences between librarians and technologists
  – Librarians: once and it's done
  – Technologists: metadata as an iterative process

Accessibility

• Barriers to accessibility may be economic, technical, or organizational
  – Metadata as "premium" or proprietary information
  – Unreadable for technical reasons (file formats, etc.)
  – Metadata may not be properly linked to relevant object(s)

Evaluating Metadata (1)

• Random sampling (XMLSpy)
  – Advantages
    – Includes some formatting and color coding
  – Disadvantages
    – Assumes consistency/predictability
    – Difficult to determine extent of problems found
    – Tedious at best

Evaluating Metadata (2)

• Spreadsheets (Microsoft Excel)
  – Advantages
    – Better sorting and control by reviewer
  – Disadvantages
    – Unwieldy for large files
    – Requires sustained focus from reviewer
    – Requires translation into tab-delimited file
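The translation step into a tab-delimited file can be scripted rather than done by hand. The sketch below flattens Simple Dublin Core (oai_dc) records into one row per record and one column per element. The column list is hypothetical, and joining repeated values with "|" is an assumption, chosen so that repeatability stays visible to the reviewer.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
OAI_DC = "{http://www.openarchives.org/OAI/2.0/oai_dc/}"
FIELDS = ["title", "creator", "date", "language", "format"]  # hypothetical column choice


def oai_dc_to_tsv(xml_text: str) -> str:
    """Flatten every oai_dc:dc record in xml_text into a tab-delimited table."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["record"] + FIELDS)
    for n, dc in enumerate(root.iter(f"{OAI_DC}dc"), start=1):
        row = [str(n)]
        for field in FIELDS:
            values = [el.text.strip() for el in dc.findall(f"{DC}{field}")
                      if el.text and el.text.strip()]
            row.append("|".join(values))  # repeated elements joined with a pipe
        writer.writerow(row)
    return out.getvalue()
```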


Evaluating Metadata (3)

• Visual graphical analysis (Spotfire)
  – Advantages
    – View of several data dimensions simultaneously
    – Reviewer controls data display
    – Tends to pull reviewer focus to anomalies
    – Handles fairly large files at one time while allowing subset views
    – Display manipulation possible without programmers
  – Disadvantages
    – High cost of software
    – Requires translation into tab-delimited file

Element Names vs Record Ids (Scatter Plot)


Missing Elements (Scatter Plot)

• 2 records without language element
• Format element present inconsistently
• Easy to rescale axis on the fly and scroll through records

Table View

• Non-empty "no information" values that may confuse end users
• Only DC Date elements are selected for display
• The only W3CDTF syntax present is four digits
• Sorted by element value
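The two problems the table view exposes (dates not in W3CDTF syntax, and non-empty "no information" placeholders) can also be flagged automatically before a visual review. In the sketch below, the regular expression follows the W3CDTF profile of ISO 8601 (YYYY, YYYY-MM, YYYY-MM-DD, or a full date plus time and time zone designator). It checks syntax and basic ranges only, so a value such as 2001-02-30 would still pass. The list of placeholder strings is hypothetical.

```python
import re

# W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, or YYYY-MM-DD plus Thh:mm[:ss[.s]] and a
# time zone designator (Z or +hh:mm / -hh:mm).
W3CDTF = re.compile(
    r"\d{4}"
    r"(-(0[1-9]|1[0-2])"
    r"(-(0[1-9]|[12]\d|3[01])"
    r"(T([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
    r")?)?"
)

# Hypothetical placeholder strings that carry no real information.
NO_INFORMATION = {"unknown", "n/a", "none", "not available", "no date", "undated"}


def classify_date(value: str) -> str:
    """Label a DC date value as empty, no-information, w3cdtf, or other."""
    text = value.strip()
    if not text:
        return "empty"
    if text.lower() in NO_INFORMATION:
        return "no-information"
    if W3CDTF.fullmatch(text):
        return "w3cdtf"
    return "other"
```

Sorting records by these labels gives much the same view as sorting by element value, but without eyeballing every row.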


Improving Metadata Quality...

• Documentation
  – Basic standards, best practice guidelines, examples
  – Exposure and maintenance of local and community vocabularies
  – Application profiles
  – Training materials, tools, methodologies

... Over Time

• Culture change
  – Support for documentation and exchange of knowledge and experience
  – Routine contribution to the "general good"
  – More focused research on practical metadata use and quality considerations
  – Better project-based and community-wide documentation

Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access?"

"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts (Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on), but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, "Two Paths to Interoperable Metadata"

Crosswalks

• In general: semantic mapping of elements between source and target metadata standards
• The process of metadata conversion specification includes the transformations required to convert a metadata record's content to another format, including
  – Element-to-element mapping
  – Hierarchy and object resolution
  – Metadata content conversions
• Stylesheets can be created to transform metadata based on crosswalks

Available Crosswalks

• Library of Congress
  – http://www.loc.gov/marc/marcdocz.html
• MIT
  – http://libraries.mit.edu/guides/subjects/metadata/mappings.html
• Getty
  – http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Problems With Converted Records

• Differences in granularity (complex vs. simple scheme)
  – Some data might be lost
  – Differences in semantics can occur
  – Differences in use of content standards make sharing sometimes problematic
  – Properties may vary (e.g., repeatability)
• Converting everything may not always be the best solution
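Element-to-element mapping, and two of the problems listed above (lost data and differing repeatability), can be made concrete with a small table-driven crosswalk. The DC-to-MARC table below is a hypothetical and deliberately partial simplification, although MARC 245 and 100 are in fact non-repeatable fields. The function reports elements with no target and non-repeatable targets that receive several values, instead of silently dropping or merging them.

```python
from typing import Dict, List, Tuple

# Hypothetical, deliberately partial Dublin Core -> MARC 21 crosswalk.
DC_TO_MARC = {"title": "245", "creator": "100", "subject": "650", "date": "260$c"}
# MARC fields that may occur only once per record.
NON_REPEATABLE = {"245", "100"}

Values = Dict[str, List[str]]


def crosswalk(record: Values) -> Tuple[Values, Values, List[str]]:
    """Map DC elements to MARC tags; return (converted, unmapped, warnings)."""
    converted: Values = {}
    unmapped: Values = {}
    for element, values in record.items():
        target = DC_TO_MARC.get(element)
        if target is None:
            unmapped[element] = values  # no equivalent: this data would be lost
        else:
            converted.setdefault(target, []).extend(values)
    warnings = [
        f"{tag} is non-repeatable but received {len(vals)} values"
        for tag, vals in converted.items()
        if tag in NON_REPEATABLE and len(vals) > 1
    ]
    return converted, unmapped, warnings
```

A production crosswalk would route additional creators to MARC 700 (Added Entry) rather than merely warn. The point of the sketch is that each loss or conflict becomes visible instead of being absorbed during conversion.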


Example Mapping: MODS title to DC title

• Includes attribute for type of title
  – Abbreviated
  – Translated
  – Alternative
  – Uniform
• Other attributes
  – ID, authority, displayLabel, xlink
• Subelements: title, partName, partNumber, nonSort

Mapping MODS title to DC title

• DC has one element refinement: Alternative
  – DC title has no substructure; MODS allows for subelements for partNumber, partName
• Best practice statement in DC-Lib says to include initial article
  – MODS parses it into <nonSort>
• MODS can link to a title in an authority file, if desired
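The mapping decisions above can be sketched as a small function that flattens each top-level MODS titleInfo into a DC title or alternative. Several choices here are assumptions rather than an official crosswalk. The punctuation used to join parts is arbitrary. Every typed title is sent to the single Alternative refinement. nonSort is assumed to hold a whole word such as "The". subTitle, a MODS subelement not listed on the slide, is included because it otherwise would be lost. The ID, authority, displayLabel, and xlink attributes are dropped, which illustrates the loss of granularity.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

MODS = "{http://www.loc.gov/mods/v3}"


def _child_text(parent: ET.Element, tag: str) -> str:
    """Stripped text of the first MODS child element with this tag, or ""."""
    element = parent.find(f"{MODS}{tag}")
    return element.text.strip() if element is not None and element.text else ""


def mods_titles_to_dc(mods_xml: str) -> List[Tuple[str, str]]:
    """Flatten each top-level MODS <titleInfo> into a (DC term, value) pair."""
    pairs = []
    # Only direct children of <mods>; relatedItem titles are deliberately skipped.
    for info in ET.fromstring(mods_xml).findall(f"{MODS}titleInfo"):
        # DC-Lib keeps the initial article, so nonSort is re-joined to the title.
        words = (_child_text(info, "nonSort"), _child_text(info, "title"))
        value = " ".join(w for w in words if w)
        subtitle = _child_text(info, "subTitle")
        if subtitle:
            value += ": " + subtitle
        # DC title has no substructure, so part number and name become plain text.
        for part in (_child_text(info, "partNumber"), _child_text(info, "partName")):
            if part:
                value += ". " + part
        # Any typed title (abbreviated, translated, alternative, uniform) -> Alternative.
        term = "alternative" if info.get("type") else "title"
        pairs.append((term, value))
    return pairs
```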


Exercise

• Evaluate a small set of human- and machine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 4: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 44

OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting (http://www.openarchives.org)

Roots in the ePrint community, although applicability is much broader

Mission: "The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content."

Content in this context is actually "metadata about content"

Metadata About the Resource

OAI-PMH in a Nutshell

Essentially provides a simple protocol for "harvest" and "exposure" of metadata records

Specifies a simple "wrapper" around metadata records, providing metadata about the record itself

OAI-PMH is about the metadata, not about the resources
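To make the record/resource distinction concrete, here is a minimal Python sketch. It splits one OAI-PMH record into two parts: the wrapper's header (identifier, datestamp, set membership, deletion status), which is metadata about the record, and the Simple Dublin Core payload, which is metadata about the resource. It uses only the standard library's ElementTree with the OAI-PMH 2.0 and DC element namespaces. The function names and the sample record used to check it are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def split_record(record):
    """Separate an OAI-PMH <record> into header info and its DC payload."""
    header = record.find("oai:header", NS)
    header_info = {  # metadata about the record itself
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "deleted": header.get("status") == "deleted",
    }
    payload = {}  # metadata about the resource
    metadata = record.find("oai:metadata", NS)
    if metadata is not None:  # deleted records carry no metadata
        for el in metadata.iter():
            if el.tag.startswith("{%s}" % NS["dc"]) and el.text:
                name = el.tag.split("}", 1)[1]
                payload.setdefault(name, []).append(el.text.strip())
    return header_info, payload

def parse_get_record(xml_text):
    """Parse the XML of a GetRecord response and split its single record."""
    root = ET.fromstring(xml_text)
    return split_record(root.find("oai:GetRecord/oai:record", NS))
```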

The OAI World

Divided into two categories
– Data providers: "A data provider maintains one or more repositories (web servers) that support the OAI-PMH as a means of exposing metadata."
– Service providers: "A service provider issues OAI-PMH requests to data providers and uses the metadata as a basis for building value-added services."

Other important definitions

Archive: not the same as "archive" as used in libraries; more like "repository"

Protocol: a set of rules defining communication between systems. FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are other examples of Internet protocols

Harvesting: the gathering together of metadata from a number of distributed repositories into a combined data store

Inside OAI Repositories

repository - A repository is a network-accessible server that can process requests. A repository is managed by a data provider to expose metadata to harvesters

resource - A resource is the object or "stuff" that metadata is "about", whether physical or digital, stored in the repository or a constituent of another database

item - An item is a constituent of a repository from which metadata about a resource can be disseminated

record - A record is metadata in a specific metadata format

OAI Goals

Low barrier to participation
– Server software available in many programming languages, intended to be easy to install
– Server-less implementation available now via "Static Repository" (essentially a web page that looks like an OAI response and can be harvested as such)

Limited set of commands

Predictable responses and flows of data

Other OAI Info

Responses are encoded in XML syntax

OAI-PMH supports any metadata format encoded in XML; Simple Dublin Core is the minimal format specified

Data providers may define a logical set hierarchy to support levels of granularity for harvesting by service providers

Datestamps flag the last change of each metadata record and thus provide further support for granularity of harvesting

OAI-PMH supports flow control (via resumption tokens)
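Flow control works through resumption tokens. A repository returns a partial list plus a token, and the harvester repeats the request with that token until an empty token signals the end. The minimal Python sketch below assumes a caller-supplied `fetch` function (hypothetical) that turns a parameter dict into the XML text of one response, so the loop can be tested without a network.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest_all(fetch, base_params):
    """Collect <record> elements across pages by following resumption tokens.

    `fetch` (hypothetical) maps a dict of query parameters to the XML text
    of one OAI-PMH response, e.g. by issuing an HTTP GET."""
    params = dict(base_params)
    records = []
    while True:
        root = ET.fromstring(fetch(params))
        records.extend(root.iter(OAI + "record"))
        token = root.find(f".//{OAI}resumptionToken")
        # An absent or empty resumptionToken marks the last batch.
        if token is None or not (token.text or "").strip():
            return records
        # A follow-up request carries only the verb and the token.
        params = {"verb": base_params["verb"], "resumptionToken": token.text.strip()}
```

In a real harvester, `fetch` would issue the HTTP request. It should also handle OAI-PMH error responses such as `noRecordsMatch`, which this sketch ignores.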

OAI Requests

Identify -> returns general information about the particular OAI server

ListMetadataFormats -> returns formats available

ListSets -> returns list of sets available

ListIdentifiers -> returns identifiers (record headers) only

ListRecords -> returns full records, optionally restricted by set or date

GetRecord -> returns a particular record

Try it out at the UIUC OAI Registry (http://gita.grainger.uiuc.edu/registry/searchform.asp)
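Each request is an HTTP request whose `verb` parameter names one of the six requests listed. The Python sketch below builds GET URLs and rejects argument names that a verb does not accept, following the OAI-PMH 2.0 argument lists. It does not check which arguments are required, and the base URL used to check it is a placeholder. Because `from` is a Python keyword, arguments are passed as a dict.

```python
from urllib.parse import urlencode

# Arguments each OAI-PMH 2.0 verb accepts (required or optional).
VERB_ARGS = {
    "Identify": set(),
    "ListMetadataFormats": {"identifier"},
    "ListSets": {"resumptionToken"},
    "ListIdentifiers": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "ListRecords": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
}

def oai_request_url(base_url, verb, args=None):
    """Build an OAI-PMH GET request URL, rejecting unknown verbs or arguments."""
    args = dict(args or {})
    if verb not in VERB_ARGS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    unexpected = set(args) - VERB_ARGS[verb]
    if unexpected:
        raise ValueError(f"{verb} does not accept {sorted(unexpected)}")
    return base_url + "?" + urlencode({"verb": verb, **args})
```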

Dates Used in OAI-PMH

Datestamps are used as values in requests to support selective harvesting by date (generally the latest update date of the metadata record)

Datestamps are also used in record headers in responses

Datestamps are particular to a repository

Repeat: OAI dates are about the metadata, not the resources
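On the repository side, a `from`/`until` request amounts to a date-range filter over record-header datestamps. OAI-PMH expresses these in UTC at day (`YYYY-MM-DD`) or seconds (`YYYY-MM-DDThh:mm:ssZ`) granularity. Below is a minimal Python sketch using illustrative header dicts. Treating a day-granularity `until` as covering the whole day is this sketch's own reading, and the code says so.

```python
from datetime import datetime, timedelta, timezone

def parse_datestamp(value):
    """Parse an OAI-PMH datestamp at day or seconds granularity (UTC)."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            pass
    raise ValueError(f"not an OAI-PMH datestamp: {value!r}")

def select_by_datestamp(headers, from_=None, until=None):
    """Return headers whose datestamp lies in the inclusive range [from_, until]."""
    lo = parse_datestamp(from_) if from_ else None
    hi = parse_datestamp(until) if until else None
    if until and "T" not in until:
        # Assumption: a day-granularity 'until' includes the whole of that day.
        hi += timedelta(days=1, seconds=-1)
    selected = []
    for h in headers:
        stamp = parse_datestamp(h["datestamp"])
        if (lo is None or stamp >= lo) and (hi is None or stamp <= hi):
            selected.append(h)
    return selected
```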

OAI-PMH Optional Containers

Repository level
– Rights
– Branding

Record level
– About: Provenance, Rights

About Container ExampleAbout Container Example

OAI Rights Expressions

Rights expressions are valid at three levels
– Repository
– Set
– Record

Rights expressed at the Repository and Set levels are not a substitute for expressions at the Record level

OAI Best Practices (DLF & NSDL)

Guidelines for data providers and service providers
– http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page

Best Practices for Shareable Metadata
– http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC

OAI in Practice

The UIUC OAI-PMH Data Provider Registry
– http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers

Link on home page to Service Providers

Provides multiple reports, sample records, browses, search, etc.

Ex: Show report from left-hand menu "Distinct Metadata Schemas"
– http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
– Choose a schema, look for providers and sample records

What's an OpenURL?

The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services

Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)

"The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material."

-- OpenURL Overview, SFX website

OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a "link server" or "link resolver"

Link server defines the user context

Takes source citation and determines whether a user has access
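The link server's decision can be pictured as a lookup against the institution's holdings. The Python sketch below is a toy model, not how SFX or any other product works. Everything in it is hypothetical: the `Holding` record, the year-range coverage check, the URL template, and the boolean affiliation flag, which stands in for whatever mechanism identifies the user.

```python
from dataclasses import dataclass

@dataclass
class Holding:
    """Hypothetical electronic-holdings entry for one journal."""
    issn: str
    first_year: int
    last_year: int
    url_template: str  # hypothetical article-level link pattern

def resolve(citation, holdings, user_is_affiliated):
    """Return a full-text link for the citation, or None if there is no access."""
    if not user_is_affiliated:
        return None
    year = int((citation.get("date") or "0")[:4])
    for h in holdings:
        if h.issn == citation.get("issn") and h.first_year <= year <= h.last_year:
            return h.url_template.format(**citation)
    return None  # a real resolver might offer interlibrary loan instead
```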

Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user's IP address

Obtains user attributes via the Shibboleth framework
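Of these methods, the IP-address check is the simplest to illustrate. Below is a minimal Python sketch using the standard `ipaddress` module. The campus ranges are hypothetical: they are the blocks reserved for documentation. A production resolver would load its own ranges from configuration.

```python
import ipaddress

# Hypothetical campus ranges (documentation-reserved blocks, not real ones).
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

def is_on_campus(client_ip):
    """True if the client address falls inside any configured campus network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership tests between IPv4 and IPv6 objects simply return False.
    return any(addr in net for net in CAMPUS_NETWORKS)
```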

Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library holdings of the cited journal

OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
– http://www.ukoln.ac.uk/distributed-systems/openurl
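An OpenURL of this kind is simply a resolver address plus citation metadata in the query string. Below is a minimal Python sketch that splits one apart using the standard `urllib.parse` functions. It assumes the simple key/value style of this slide's sample URL, with keys such as `issn`, `date`, `volume`, `issue`, and `spage`. It keeps only the first value of any repeated key.

```python
from urllib.parse import parse_qs, urlsplit

def parse_openurl(url):
    """Split a key/value OpenURL into (resolver base URL, citation dict)."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs returns lists of values; keep the first value for each key.
    citation = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return base, citation
```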

Defining and Ensuring Metadata Quality

What constitutes quality?

Techniques for evaluating and enforcing consistency and predictability

Automated metadata creation: advantages and disadvantages

Metadata maintenance strategies

Beginning to Define Quality

Experience of the library community: BIBCO & NACO
– Agreed-upon standards for library quality
– Training and documentation in support of practitioners
– Review and enforcement of standards by means of an institutional "buddy system"

How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of standards of quality must take place at the community level

Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability

Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (Currency and Lag)
Accessibility

Completeness

"Metadata should describe the target objects as completely as economically feasible"

"Element set should be applied to the target object population as completely as possible"
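Completeness can be measured per record as the share of expected elements that actually carry a value. Below is a minimal Python sketch. The expected element list is a hypothetical community expectation. Records are modeled as dicts that map element names to lists of values.

```python
# Hypothetical community expectation: elements every record should carry.
EXPECTED = ("title", "creator", "date", "format", "language", "identifier")

def _has_value(record, element):
    """True if the element has at least one non-blank value."""
    return any(v.strip() for v in record.get(element, []))

def completeness(record, expected=EXPECTED):
    """Fraction of expected elements present with a non-blank value."""
    return sum(_has_value(record, el) for el in expected) / len(expected)

def missing_elements(records, expected=EXPECTED):
    """Map each record id to the expected elements it lacks."""
    return {rid: [el for el in expected if not _has_value(rec, el)]
            for rid, rec in records.items()}
```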

Accuracy

Information provided in values should be correct and factual

Editing applied to
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviations and usages in general

Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human created or created by machine?

What transformations have been applied since creation?

Where has it been before?

Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata is reflective of community thinking about necessary compromises

Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values

Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object disseminated before some or all metadata is available

"Metadata aging" is affected by cultural differences between librarians and technologists
– Librarians: once and it's done
– Technologists: metadata as an iterative process

Accessibility

Barriers to accessibility may be economic, technical or organizational
– Metadata as "premium" or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)

Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages: includes some formatting and color coding
– Disadvantages: assumes consistency/predictability; difficult to determine extent of problems found; tedious at best

Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages: better sorting and control by reviewer
– Disadvantages: unwieldy for large files; requires sustained focus from reviewer; requires translation into tab-delimited file
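Spreadsheet review requires translating records into a tab-delimited file, and that step can be scripted. Below is a minimal Python sketch using the standard `csv` module. Two things are assumptions of this sketch: the record model (element name to list of values) and the ` | ` separator for repeated values, which keeps one record per row.

```python
import csv
import io

def records_to_tsv(records, elements):
    """Flatten records (id -> element -> list of values) into tab-delimited text.

    Repeated values are joined with ' | ' (an arbitrary choice) so each
    record stays on one spreadsheet row; absent elements become empty cells."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["record_id", *elements])
    for rid, rec in records.items():
        writer.writerow([rid, *(" | ".join(rec.get(el, [])) for el in elements)])
    return buf.getvalue()
```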

Evaluating Metadata (3)

Visual graphical analysis (Spotfire)
– Advantages: view of several data dimensions simultaneously; reviewer controls data display; tends to pull reviewer focus to anomalies; handles fairly large files at one time while allowing subset views; display manipulation possible without programmers
– Disadvantages: high cost of software; requires translation into tab-delimited file

Element Names vs Record Ids (Scatter Plot)

Missing Elements (Scatter Plot)

2 records without language element

Format element present inconsistently

Easy to rescale axis on the fly and scroll through records

Table View

Non-empty "no information" values that may confuse end users

Only DC Date elements are selected for display

The only W3CDTF syntax present is four digits

Sorted by element value
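The two problems this table view surfaces can also be flagged by a script: placeholder values, and dates that are not in W3CDTF syntax. Below is a minimal Python sketch. The regular expression follows the W3CDTF patterns, from YYYY through a full date-time with fractional seconds and a time zone designator. It does not range-check months or days. The placeholder list is hypothetical.

```python
import re

# W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, or a date plus hh:mm[:ss[.s]] and a TZD.
W3CDTF = re.compile(
    r"\d{4}"                                  # YYYY
    r"(?:-\d{2}"                              # -MM
    r"(?:-\d{2}"                              # -DD
    r"(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?"   # Thh:mm[:ss[.s]]
    r"(?:Z|[+-]\d{2}:\d{2}))?"                # time zone designator
    r")?)?"
)

# Hypothetical "no information" values a collection might contain.
PLACEHOLDERS = {"unknown", "n.d.", "none", "no information", "not available"}

def review_dates(values):
    """Sort DC date values into W3CDTF, placeholder, and other buckets."""
    report = {"w3cdtf": [], "placeholder": [], "other": []}
    for v in values:
        s = v.strip()
        if s.lower() in PLACEHOLDERS:
            report["placeholder"].append(v)
        elif W3CDTF.fullmatch(s):
            report["w3cdtf"].append(v)
        else:
            report["other"].append(v)
    return report
```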

Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies

… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the "general good"
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation

Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access"

"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts (Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on), but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, "Two Paths to Interoperable Metadata"

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes transformations required to convert a metadata record's content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks

Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards make sharing sometimes problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution
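A toy DC-to-MARC crosswalk shows both element-to-element mapping and the data loss described here. Below is a minimal Python sketch. The table is illustrative and partial: title to 245 and creator to 100 echo the Godby quote, and 653 and 520 are common targets for subject and description. Fields are reduced to bare tags without indicators or subfields. Two kinds of value are reported as lost: values of elements with no target, and values beyond the first for a non-repeatable field (245 and 100 are non-repeatable in MARC 21).

```python
# Illustrative, partial crosswalk: DC element -> (MARC tag, tag repeatable?).
DC_TO_MARC = {
    "title": ("245", False),
    "creator": ("100", False),
    "subject": ("653", True),
    "description": ("520", True),
}

def crosswalk(dc_record):
    """Map a DC record (element -> list of values) to (tag, value) pairs.

    Returns the converted fields plus the (element, value) pairs that
    could not be carried over."""
    fields, lost = [], []
    for element, values in dc_record.items():
        if element not in DC_TO_MARC:
            lost.extend((element, v) for v in values)  # no target field
            continue
        tag, repeatable = DC_TO_MARC[element]
        kept = values if repeatable else values[:1]
        fields.extend((tag, v) for v in kept)
        lost.extend((element, v) for v in values[len(kept):])  # repeatability clash
    return fields, lost
```

A fuller crosswalk would route additional creators to 700 rather than dropping them. The point is that each such decision has to be made explicitly.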

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 5: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 55

Metadata About the Resource

Metadata Standards amp ApplicationsMetadata Standards amp Applications 66

OAI-PMH in a NutshellOAI-PMH in a Nutshell

Essentially provides a simple protocol Essentially provides a simple protocol for ldquoharvestrdquo and ldquoexposurerdquo of for ldquoharvestrdquo and ldquoexposurerdquo of metadata recordsmetadata records

Specifies a simple ldquowrapperrdquo around Specifies a simple ldquowrapperrdquo around metadata records providing metadata records providing metadata about the record itselfmetadata about the record itself

OAI-PMH is about the OAI-PMH is about the metadatametadata not not about the about the resourcesresources

Metadata Standards amp ApplicationsMetadata Standards amp Applications 77

The OAI WorldThe OAI World Divided into two categoriesDivided into two categories

ndash Data providers ldquoA data provider Data providers ldquoA data provider maintains one or more repositories (web maintains one or more repositories (web servers) that support the OAI-PMH as a servers) that support the OAI-PMH as a means of exposing metadatardquomeans of exposing metadatardquo

ndash Service providers ldquoA service provider Service providers ldquoA service provider issues OAI-PMH requests to data issues OAI-PMH requests to data providers and uses the metadata as a providers and uses the metadata as a basis for building value-added servicesrdquobasis for building value-added servicesrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 88

Metadata Standards amp ApplicationsMetadata Standards amp Applications 99

Other important definitions

Archive: not the same as "archive" as used in libraries; more like "repository"

Protocol: a set of rules defining communication between systems. FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are other examples of Internet protocols

Harvesting: the gathering together of metadata from a number of distributed repositories into a combined data store


Inside OAI Repositories

repository: A repository is a network-accessible server that can process OAI-PMH requests. A repository is managed by a data provider to expose metadata to harvesters

resource: A resource is the object or "stuff" that metadata is about, whether physical or digital, stored in the repository or a constituent of another database

item: An item is a constituent of a repository from which metadata about a resource can be disseminated

record: A record is metadata in a specific metadata format


OAI Goals

Low barrier to participation
– Server software available in many programming languages, intended to be easy to install
– Server-less implementation available now via "Static Repository" (essentially a web page that looks like an OAI response and can be harvested as such)

Limited set of commands

Predictable responses and flows of data


Other OAI Info

Responses are encoded in XML syntax

OAI-PMH supports any metadata format encoded in XML; Simple Dublin Core is the minimal format specified

Data providers may define a logical set hierarchy to support levels of granularity for harvesting by service providers

Datestamps flag the last change of the metadata record and thus provide further support for granularity of harvesting

OAI-PMH supports flow control (via resumption tokens)
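Flow control works through resumption tokens: a data provider may return a partial list plus a `resumptionToken`, and the harvester repeats the request with that token until an empty one comes back. A minimal Python sketch of the harvester side, assuming a hypothetical `fetch` callable that performs one request and returns the page's records and token; `fake_provider` is an invented in-memory stand-in for a real repository.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One request's result: the records on this page and the resumptionToken (None if done).
Page = Tuple[List[str], Optional[str]]


def harvest(fetch: Callable[[dict], Page], metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Yield every record from a ListRecords response, following resumptionTokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        records, token = fetch(params)
        yield from records
        if not token:  # missing or empty token: the list is complete
            return
        # A resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": token}


def fake_provider(params: dict) -> Page:
    """Hypothetical in-memory data provider: three records, at most two per response."""
    data = ["rec1", "rec2", "rec3"]
    start = int(params.get("resumptionToken", 0))
    nxt = start + 2
    return data[start:nxt], (str(nxt) if nxt < len(data) else None)
```

Here `list(harvest(fake_provider))` takes two requests to collect all three records; the token's content is opaque to the harvester, which only echoes it back.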


OAI Requests

Identify --> returns general information about the particular OAI server
ListMetadataFormats --> returns the metadata formats available
ListSets --> returns the list of sets available
ListIdentifiers --> returns record headers (identifiers) only
ListRecords --> returns full records, optionally restricted to a set or date range
GetRecord --> returns a particular record

Try it out at the UIUC OAI Registry (http://gita.grainger.uiuc.edu/registry/searchform.asp)
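Each request is an HTTP GET (or POST) to the repository's base URL carrying a `verb` argument plus that verb's own arguments. A minimal sketch of building such request URLs in Python; the base URL is hypothetical and no network call is made.

```python
from urllib.parse import urlencode

# Hypothetical base URL; a real one could be found through a registry such as UIUC's.
BASE_URL = "http://repository.example.org/oai"

VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request(verb, **args):
    """Build an OAI-PMH GET request URL for one of the six protocol verbs."""
    if verb not in VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    # urlencode percent-escapes reserved characters such as ':' in identifiers.
    return BASE_URL + "?" + urlencode({"verb": verb, **args})
```

For example, `oai_request("ListRecords", metadataPrefix="oai_dc", set="theses")` gives `http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=theses`. Because `from` is a Python keyword, a date-restricted request passes it through a dict: `oai_request("ListRecords", **{"metadataPrefix": "oai_dc", "from": "2005-01-01"})`.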


Dates Used in OAI-PMH

Datestamps are used as values in requests to support selective harvesting by date (generally the latest update date of the metadata record)

Datestamps are also used in record headers in responses

Datestamps are particular to a repository

Repeat: OAI dates are about the metadata, not the resources
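Selective harvesting by date uses the optional `from` and `until` arguments to ListRecords and ListIdentifiers. Both bounds are inclusive and are compared against record datestamps, not against any date of the resource. The sketch below mimics that selection on the client side. It assumes day-granularity datestamps (YYYY-MM-DD) and uses hypothetical header data. `from_` has a trailing underscore only because `from` is reserved in Python.

```python
from datetime import date
from typing import Dict, List, Optional


def select_by_datestamp(headers: List[Dict[str, str]],
                        from_: Optional[str] = None,
                        until: Optional[str] = None) -> List[str]:
    """Return identifiers whose datestamp lies in [from_, until], bounds inclusive."""
    lower = date.fromisoformat(from_) if from_ else None
    upper = date.fromisoformat(until) if until else None
    selected = []
    for h in headers:
        stamp = date.fromisoformat(h["datestamp"])
        if lower is not None and stamp < lower:
            continue
        if upper is not None and stamp > upper:
            continue
        selected.append(h["identifier"])
    return selected
```

An incremental harvester would store the date of its last successful harvest and send it as `from` next time, picking up only records whose metadata changed since then.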


OAI-PMH Optional Containers

Repository level
– Rights
– Branding

Record level
– About
   Provenance
   Rights


About Container Example


OAI Rights Expressions

Rights expressions are valid at three levels:
– Repository
– Set
– Record

Rights expressed at the repository and set levels are not a substitute for expressions at the record level

OAI Best Practices (DLF & NSDL)

Guidelines for data providers and service providers
– http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page

Best Practices for Shareable Metadata
– http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC


OAI In Practice

The UIUC OAI-PMH Data Provider Registry
– http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers

Link on home page to service providers

Provides multiple reports, sample records, browses, search, etc.

Example: show the "Distinct Metadata Schemas" report from the left-hand menu
– http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
– Choose a schema; look for providers and sample records


What's an OpenURL?

The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services

Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)


"The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost and agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material."

-- OpenURL Overview, SFX website


OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a "link server" or "link resolver"

Link server defines the user context

Takes the source citation and determines whether a user has access


Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user's IP address

Obtains user attributes via the Shibboleth framework
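Of these approaches, IP-based recognition is the simplest to illustrate: the link server checks whether the requesting address falls inside networks the institution has licensed. A minimal sketch using Python's standard `ipaddress` module. The networks listed are documentation-only example ranges, not real subscriber data.

```python
import ipaddress

# Hypothetical licensed networks (RFC 5737 documentation ranges used as stand-ins).
LICENSED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),  # covers .0 through .127 only
]


def has_ip_access(client_ip: str) -> bool:
    """True if the client address falls inside any licensed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in LICENSED_NETWORKS)
```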


Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library's holdings of the cited journal


OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
– http://www.ukoln.ac.uk/distributed-systems/openurl
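The first example is an OpenURL in the older key/value style. The part before `?` is the institution's resolver (link server), and the query string carries the citation metadata. A minimal Python sketch that separates the two using only the standard library. Because the citation travels with the link, the same metadata can be re-sent to a different resolver, and that is what lets each institution apply its own copy-selection rules. The second resolver address used in the usage line is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def parse_openurl(url):
    """Split an OpenURL into (resolver base URL, citation metadata dict)."""
    parts = urlsplit(url)
    resolver = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs returns a list per key because keys may repeat; keep the first value.
    citation = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return resolver, citation


def retarget(url, new_resolver):
    """Send the same citation metadata to a different link resolver."""
    _, citation = parse_openurl(url)
    return new_resolver + "?" + urlencode(citation)
```

For the slide's URL, `parse_openurl` yields the resolver `http://sfxserver.uni.edu/sfxmenu` and the citation keys `issn`, `date`, `volume`, `issue` and `spage`; `retarget(url, "http://resolver.example.edu/openurl")` carries the same citation to another (hypothetical) link server.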


Defining and Ensuring Metadata Quality

What constitutes quality?

Techniques for evaluating and enforcing consistency and predictability

Automated metadata creation: advantages and disadvantages

Metadata maintenance strategies


Beginning to Define Quality

Experience of the library community (BIBCO & NACO)
– Agreed-upon standards for library quality
– Training and documentation in support of practitioners
– Review and enforcement of standards by means of an institutional "buddy system"


How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of standards of quality must take place at the community level

Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability


Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (currency and lag)
Accessibility


Completeness

"Metadata should describe the target objects as completely as economically feasible"

"Element set should be applied to the target object population as completely as possible"
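Both senses of completeness quoted above can be measured directly. One measure is how fully each record uses the expected elements; the other is how fully each element is applied across the record population. A minimal sketch, assuming records have already been parsed into dictionaries mapping element name to string value. The choice of "expected" elements is a local decision, and any sample data is hypothetical.

```python
from typing import Dict, List, Tuple


def completeness(records: Dict[str, Dict[str, str]],
                 expected: List[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Per-record share of expected elements filled, and per-element share of records using it."""
    def filled(fields, element):
        # Whitespace-only values count as empty.
        return bool(fields.get(element, "").strip())

    per_record = {rid: sum(filled(f, e) for e in expected) / len(expected)
                  for rid, f in records.items()}
    per_element = {e: sum(filled(f, e) for f in records.values()) / len(records)
                   for e in expected}
    return per_record, per_element
```

Low per-record scores point at individual weak descriptions, while low per-element scores show an element that is not being applied "to the target object population as completely as possible."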


Accuracy

Information provided in values should be correct and factual

Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviations and usages in general


Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or machine-created?

What transformations have been applied since creation?

Where has it been before?


Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata is reflective of community thinking about necessary compromises


Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values
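Heavy reliance on defaulted values often shows up as a single value filling nearly every record. The sketch below flags such cases. The 80% threshold is an arbitrary assumption. A genuinely uniform collection (for example, all texts) will also trip the check, so a flagged element is a prompt for review, not proof of a problem.

```python
from collections import Counter


def suspected_default(records, element, threshold=0.8):
    """Return the dominant value of `element` if it fills more than `threshold`
    of the records that carry the element; otherwise None."""
    values = [r[element] for r in records if r.get(element)]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) > threshold else None
```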


Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object disseminated before some or all metadata is available

"Metadata aging" is affected by cultural differences between librarians and technologists
– Librarians: once and it's done
– Technologists: metadata as an iterative process
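Both timeliness problems can be checked mechanically if a system keeps a last-modified date for each object and for its metadata. A minimal sketch. The field names `resource_modified` and `metadata_modified` are hypothetical, dates are assumed to be ISO YYYY-MM-DD strings, and `None` stands for metadata that does not exist yet.

```python
from datetime import date
from typing import Dict, List, Optional


def timeliness_report(records: Dict[str, Dict[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Sort record ids into 'lag' (no metadata yet) and 'stale' (object changed since)."""
    report: Dict[str, List[str]] = {"lag": [], "stale": []}
    for rid, r in sorted(records.items()):
        if r.get("metadata_modified") is None:
            # Lag: the object is out, but its metadata is not.
            report["lag"].append(rid)
        elif date.fromisoformat(r["resource_modified"]) > date.fromisoformat(r["metadata_modified"]):
            # Currency: the object changed after its metadata was last touched.
            report["stale"].append(rid)
    return report
```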


Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as "premium" or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)


Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages
   Includes some formatting and color coding
– Disadvantages
   Assumes consistency/predictability
   Difficult to determine extent of problems found
   Tedious at best


Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages
   Better sorting and control by reviewer
– Disadvantages
   Unwieldy for large files
   Requires sustained focus from reviewer
   Requires translation into tab-delimited file
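The "translation into tab-delimited file" step can be scripted rather than done by hand. A minimal sketch using Python's `csv` module. It assumes the records have already been parsed (for example, from harvested oai_dc) into a mapping of record id to element name to list of values. Joining repeated values with " | " is an arbitrary choice that keeps one row per record.

```python
import csv
import io
from typing import Dict, List


def records_to_tsv(records: Dict[str, Dict[str, List[str]]], elements: List[str]) -> str:
    """One row per record, one column per element; repeated values joined with ' | '."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id"] + elements)
    for rid, fields in records.items():
        # Missing elements become empty cells, which stand out when sorted.
        writer.writerow([rid] + [" | ".join(fields.get(e, [])) for e in elements])
    return out.getvalue()
```

The resulting text can be saved with a `.tsv` extension and opened directly in a spreadsheet or a visual-analysis tool.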


Evaluating Metadata (3)

Visual graphical analysis (Spotfire)
– Advantages
   View of several data dimensions simultaneously
   Reviewer controls data display
   Tends to pull reviewer focus to anomalies
   Handles fairly large files at one time while allowing subset views
   Display manipulation possible without programmers
– Disadvantages
   High cost of software
   Requires translation into tab-delimited file


Element Names vs Record Ids (Scatter Plot)


Missing Elements (Scatter Plot)

Callouts on the plot:
– 2 records without language element
– Format element present inconsistently
– Easy to rescale axis on the fly and scroll through records
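The gaps this scatter plot exposes can also be listed directly, which helps when no visual-analysis tool is available. A minimal sketch, assuming each record is represented by the set of element names it contains (sample data hypothetical). `presence_grid` prints a rough text version of the plot, with one row per element and one column per record.

```python
def missing_elements(records, elements):
    """Map each element to the sorted ids of records that lack it."""
    return {e: sorted(rid for rid, names in records.items() if e not in names)
            for e in elements}


def presence_grid(records, elements):
    """Text 'scatter plot': X where a record has the element, '.' where it does not."""
    ids = sorted(records)
    return "\n".join(
        f"{e:<10}" + "".join("X" if e in records[rid] else "." for rid in ids)
        for e in elements
    )
```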


Table View

Callouts on the table:
– Non-empty "no information" values that may confuse end users
– Only DC Date elements are selected for display
– The only W3CDTF syntax present is four digits
– Sorted by element value
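The date checks made visually in the table view can be automated. W3CDTF, the W3C profile of ISO 8601 that DCMI recommends for encoding dates, allows YYYY, YYYY-MM, YYYY-MM-DD, and full date-times with a time zone designator. A minimal sketch that sorts date values into W3CDTF, "no information" placeholders, and everything else. The placeholder list is a hypothetical local choice.

```python
import re

# W3CDTF levels: YYYY, YYYY-MM, YYYY-MM-DD, then Thh:mm, Thh:mm:ss or Thh:mm:ss.s
# followed by a time zone designator (Z or +hh:mm / -hh:mm).
# This checks syntax only, not calendar validity (1998-02-31 would pass).
W3CDTF = re.compile(
    r"\d{4}"
    r"(-(0[1-9]|1[0-2])"
    r"(-(0[1-9]|[12]\d|3[01])"
    r"(T([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d+)?)?(Z|[+-]([01]\d|2[0-3]):[0-5]\d))?"
    r")?)?"
)

# Hypothetical local list of values that are non-empty but carry no information.
PLACEHOLDERS = {"n.d.", "unknown", "none", "no date", "no information"}


def classify_date(value):
    """Return 'no information', 'w3cdtf', or 'other' for a DC date value."""
    v = value.strip()
    if not v or v.lower() in PLACEHOLDERS:
        return "no information"
    return "w3cdtf" if W3CDTF.fullmatch(v) else "other"
```

Counting the three categories across a collection reproduces the observations above: how many values are placeholders, and how many (if any) go beyond four-digit years.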


Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies


… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the "general good"
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation


Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access"


"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts (Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on), but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, "Two Paths to Interoperable Metadata"

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert a metadata record's content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
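At its simplest, the element-to-element part of a crosswalk is a lookup table, and writing it as data makes losses explicit. A minimal sketch with a small, deliberately partial MODS-to-simple-DC table. The table is simplified for illustration and is not the Library of Congress crosswalk. Hierarchy resolution and content conversion, the other steps listed above, are not attempted.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified MODS-to-simple-DC table (source path -> DC element).
MODS_TO_DC = {
    "titleInfo/title": "title",
    "name/namePart": "creator",
    "originInfo/dateIssued": "date",
    "subject/topic": "subject",
    "abstract": "description",
    "language/languageTerm": "language",
}


def crosswalk(source: Dict[str, List[str]],
              mapping: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Copy values to their mapped DC element and list the source paths with no target."""
    target: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for path, values in source.items():
        element = mapping.get(path)
        if element is None:
            unmapped.append(path)  # data that would otherwise be lost silently
        else:
            target.setdefault(element, []).extend(values)
    return target, unmapped
```

Because several source paths may feed one DC element, values are appended to a list. This is where differences in granularity get flattened away.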


Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards make sharing sometimes problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution


Example Mapping: MODS title to DC title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xlink

Subelements: title, partName, partNumber, nonSort


Mapping MODS title to DC title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows for subelements for partNumber, partName

Best practice statement in DC-Lib says to include initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file if desired
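The points above can be sketched as a small conversion function. It rejoins `<nonSort>` with the title, following the DC-Lib practice of keeping the initial article. It flattens `partNumber` and `partName` into the string, because DC title has no substructure. It routes any typed title to the Alternative refinement and drops the authority link. The join punctuation and the decision to send every typed title to Alternative are assumptions of this sketch; a uniform title, for instance, might reasonably be handled differently.

```python
import xml.etree.ElementTree as ET

MODS = {"mods": "http://www.loc.gov/mods/v3"}


def _part(title_info, name):
    """Text of one titleInfo subelement, or '' if absent."""
    return title_info.findtext(f"mods:{name}", default="", namespaces=MODS).strip()


def mods_titles_to_dc(mods_xml):
    """Flatten MODS titleInfo elements into DC title / Alternative strings."""
    root = ET.fromstring(mods_xml)
    out = {"title": [], "alternative": []}
    for ti in root.findall("mods:titleInfo", MODS):
        # Keep the initial article: <nonSort>The </nonSort><title>...</title> -> "The ..."
        text = " ".join(p for p in (_part(ti, "nonSort"), _part(ti, "title")) if p)
        # DC title has no substructure, so part number/name become plain text.
        parts = [p for p in (_part(ti, "partNumber"), _part(ti, "partName")) if p]
        if parts:
            text += ". " + ", ".join(parts)
        # Any typed title (abbreviated, translated, alternative, uniform) -> Alternative.
        out["alternative" if ti.get("type") else "title"].append(text)
    return out
```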


Exercise

Evaluate a small set of human- and machine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 6: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 66

OAI-PMH in a NutshellOAI-PMH in a Nutshell

Essentially provides a simple protocol Essentially provides a simple protocol for ldquoharvestrdquo and ldquoexposurerdquo of for ldquoharvestrdquo and ldquoexposurerdquo of metadata recordsmetadata records

Specifies a simple ldquowrapperrdquo around Specifies a simple ldquowrapperrdquo around metadata records providing metadata records providing metadata about the record itselfmetadata about the record itself

OAI-PMH is about the OAI-PMH is about the metadatametadata not not about the about the resourcesresources

Metadata Standards amp ApplicationsMetadata Standards amp Applications 77

The OAI WorldThe OAI World Divided into two categoriesDivided into two categories

ndash Data providers ldquoA data provider Data providers ldquoA data provider maintains one or more repositories (web maintains one or more repositories (web servers) that support the OAI-PMH as a servers) that support the OAI-PMH as a means of exposing metadatardquomeans of exposing metadatardquo

ndash Service providers ldquoA service provider Service providers ldquoA service provider issues OAI-PMH requests to data issues OAI-PMH requests to data providers and uses the metadata as a providers and uses the metadata as a basis for building value-added servicesrdquobasis for building value-added servicesrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 88

Metadata Standards amp ApplicationsMetadata Standards amp Applications 99

Other important definitionsOther important definitions

ArchiveArchive Not the same as lsquoarchiversquo used in Not the same as lsquoarchiversquo used in libraries more like ldquorepositoryrdquolibraries more like ldquorepositoryrdquo

ProtocolProtocol a set of rules defining a set of rules defining communication between systems FTP communication between systems FTP (File Transfer Protocol) and HTTP (File Transfer Protocol) and HTTP (Hypertext Transport Protocol) are other (Hypertext Transport Protocol) are other examples of Internet protocolsexamples of Internet protocols

HarvestingHarvesting the gathering together of the gathering together of metadata from a number of distributed metadata from a number of distributed repositories into a combined data storerepositories into a combined data store

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1010

Inside OAI RepositoriesInside OAI Repositories repositoryrepository - A - A repositoryrepository is a network is a network

accessible server that can process accessible server that can process requests A requests A repositoryrepository is managed by a is managed by a data provider to expose metadata to data provider to expose metadata to harvestersharvesters

resourceresource - A - A resourceresource is the object or is the object or stuff that metadata is aboutrdquo whether stuff that metadata is aboutrdquo whether physical or digital stored in the repository physical or digital stored in the repository or a constituent of another databaseor a constituent of another database

itemitem - An - An itemitem is a constituent of a is a constituent of a repository from which metadata about a repository from which metadata about a resource can be disseminated resource can be disseminated

recordrecord - A - A recordrecord is metadata in a specific is metadata in a specific metadata formatmetadata format

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1111

OAI Goals

Low barrier to participation
– Server software available in many programming languages, intended to be easy to install
– Server-less implementation available now via “Static repository” (essentially a web page that looks like an OAI response and can be harvested as such)

Limited set of commands

Predictable responses and flows of data

Other OAI Info

Responses are encoded in XML syntax

OAI-PMH supports any metadata format encoded in XML; Simple Dublin Core is the minimal format specified

Data Providers may define a logical set hierarchy to support levels of granularity for harvesting by Service Providers

Datestamps flag the last change to each metadata record and thus provide further support for granularity of harvesting

OAI-PMH supports flow control (long lists are returned in parts, continued via resumption tokens)
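
The flow-control point can be illustrated with a short harvesting loop. A data provider may return an incomplete list together with a resumptionToken; the harvester repeats the request with that token as the only argument besides the verb, and an empty or missing token marks the end of the list. This is a minimal Python sketch, not a production harvester: `fetch` is a hypothetical caller-supplied function that returns the XML text of one response, so no real repository is contacted, and HTTP errors, retries and OAI error codes are ignored.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_identifiers(fetch: Callable[[Dict[str, str]], str],
                        metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Yield identifiers from ListIdentifiers, following resumptionTokens.

    `fetch` (hypothetical) takes the request arguments and returns the raw
    XML of one OAI-PMH response.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for ident in root.iterfind(".//oai:header/oai:identifier", OAI_NS):
            yield ident.text
        token = root.find(".//oai:resumptionToken", OAI_NS)
        # A missing or empty token means this was the last part of the list.
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}
```

Because the loop is a generator, a harvester can process identifiers as each partial response arrives rather than waiting for the whole list.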

OAI Requests

Identify --> returns general information about the particular OAI server
ListMetadataFormats --> returns the metadata formats available
ListSets --> returns the list of sets available
ListIdentifiers --> returns record headers (identifiers and datestamps) only
ListRecords --> returns full records, optionally restricted to a set or a date range
GetRecord --> returns a particular record

Try it out at the UIUC OAI Registry (http://gita.grainger.uiuc.edu/registry/searchform.asp)
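
Each of the six requests is an HTTP GET with a `verb` argument plus verb-specific arguments. A minimal sketch in Python; the base URL is a hypothetical placeholder, and the function only checks the verb name, without enforcing which arguments each verb requires (for example, GetRecord needs both `identifier` and `metadataPrefix`).

```python
from urllib.parse import urlencode

# Hypothetical repository base URL; use the baseURL a data provider publishes.
BASE_URL = "http://repository.example.org/oai"

OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request(verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request URL; only the verb name is validated."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    return BASE_URL + "?" + urlencode({"verb": verb, **args})


print(oai_request("Identify"))
print(oai_request("ListRecords", metadataPrefix="oai_dc", set="physics"))
```

`urlencode` percent-encodes reserved characters, so an identifier such as `oai:example.org:123` is sent with its colons encoded as `%3A`.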

Dates Used in OAI-PMH

Datestamps are used as values in requests to support selective harvesting by date (generally the latest update date of the metadata record)

Datestamps are also used in record headers in responses

Datestamps are particular to a repository

Repeat: OAI dates are about the metadata, not the resources
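
Selective harvesting by date uses the `from` and `until` arguments. OAI-PMH allows two datestamp granularities, day (`YYYY-MM-DD`) and seconds (`YYYY-MM-DDThh:mm:ssZ`, in UTC), and a single request must not mix them; a repository declares the finest granularity it supports in its Identify response. The Python sketch below (the helper name is hypothetical) only validates argument syntax. As the slide stresses, these dates select on the metadata records' datestamps, not on anything about the resources.

```python
import re
from typing import Dict, Optional

DAY = re.compile(r"\d{4}-\d{2}-\d{2}")                         # YYYY-MM-DD
SECONDS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")  # UTC date-time


def selective_harvest_params(metadata_prefix: str,
                             from_: Optional[str] = None,
                             until: Optional[str] = None) -> Dict[str, str]:
    """Return ListRecords arguments restricted by metadata datestamp."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    granularities = set()
    # `from` is a Python keyword, hence the trailing underscore on the parameter.
    for name, value in (("from", from_), ("until", until)):
        if value is None:
            continue
        if DAY.fullmatch(value):
            granularities.add("day")
        elif SECONDS.fullmatch(value):
            granularities.add("seconds")
        else:
            raise ValueError(f"malformed {name} datestamp: {value!r}")
        params[name] = value
    if len(granularities) > 1:
        raise ValueError("from and until must use the same granularity")
    return params
```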

OAI-PMH Optional Containers

Repository level
– Rights
– Branding

Record level
– About
    Provenance
    Rights

About Container ExampleAbout Container Example
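
The example image from this slide is not reproduced in the transcript. As a stand-in, this Python sketch builds a record-level about container that carries provenance information, using element names from the OAI provenance schema (namespace `http://www.openarchives.org/OAI/2.0/provenance`) as we understand it; check the names against that schema before relying on them. The function name is hypothetical, and the schemaLocation attribute is omitted.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"


def provenance_about(base_url: str, identifier: str, datestamp: str,
                     metadata_ns: str, harvest_date: str,
                     altered: bool) -> ET.Element:
    """Build an <about> container describing where a record was harvested from."""
    about = ET.Element(f"{{{OAI_NS}}}about")
    prov = ET.SubElement(about, f"{{{PROV_NS}}}provenance")
    origin = ET.SubElement(prov, f"{{{PROV_NS}}}originDescription",
                           harvestDate=harvest_date,
                           altered="true" if altered else "false")
    # Where the record came from, and in which format it was harvested.
    for tag, text in (("baseURL", base_url), ("identifier", identifier),
                      ("datestamp", datestamp), ("metadataNamespace", metadata_ns)):
        ET.SubElement(origin, f"{{{PROV_NS}}}{tag}").text = text
    return about


about = provenance_about("http://origin.example.org/oai", "oai:origin:001",
                         "2005-01-01", "http://www.openarchives.org/OAI/2.0/oai_dc/",
                         "2005-02-02T14:10:02Z", altered=True)
print(ET.tostring(about, encoding="unicode"))
```

Recording the origin repository, the original datestamp and whether the record was altered answers several of the provenance questions raised later in this session.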

OAI Rights Expressions

Rights expressions are valid at three levels
– Repository
– Set
– Record

Rights expressed at the Repository and Set levels are not a substitute for expressions at the Record level

OAI Best Practices (DLF & NSDL)

Guidelines for data providers and service providers
– http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page

Best Practices for Shareable Metadata
– http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC

OAI In Practice

The UIUC OAI-PMH Data Provider Registry
– http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers
Link on home page to Service Providers
Provides multiple reports, sample records, browses, search, etc.
Ex.: Show report from left-hand menu, “Distinct Metadata Schemas”
– http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
– Choose a schema, look for providers and sample records

What’s an OpenURL?

The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services

Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)

“The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material.”

-- OpenURL Overview, SFX website

OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a “link server” or “link resolver”

Link server defines the user context

Takes source citation and determines whether a user has access
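
The resolver's core decision, whether the user's institution has access to the cited item, can be sketched as a holdings lookup. Everything here is hypothetical: the `HOLDINGS` table stands in for a resolver's knowledge base, the full-text URL pattern is invented, and real link servers weigh far richer rules about location, cost and supplier agreements.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Holding:
    first_year: int
    last_year: int
    target: str  # hypothetical full-text URL pattern


# Hypothetical local knowledge base: what this institution can reach, by ISSN.
HOLDINGS: Dict[str, Holding] = {
    "1234-5678": Holding(1995, 2004,
                         "http://fulltext.example.com/{issn}/{volume}/{issue}/{spage}"),
}


def resolve(citation: Dict[str, str]) -> Optional[str]:
    """Return a full-text link if the cited issue is held, else None.

    Assumes the citation carries issn, date, volume, issue and spage keys.
    """
    holding = HOLDINGS.get(citation.get("issn", ""))
    if holding is None:
        return None  # not subscribed: a real resolver might offer ILL instead
    year = int(citation.get("date", "0")[:4])
    if not holding.first_year <= year <= holding.last_year:
        return None  # outside the licensed coverage range
    return holding.target.format(**citation)
```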

Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user’s IP address

Obtains user attributes via the Shibboleth framework
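
Of these mechanisms, IP-address recognition is the simplest to show. A minimal Python sketch using the standard `ipaddress` module; the institution name and address ranges are hypothetical (the ranges are reserved documentation blocks), and cookies, certificates and Shibboleth attributes are not modeled.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical mapping of institutional address ranges to resolver profiles.
INSTITUTION_RANGES: Dict[str, List[ipaddress.IPv4Network]] = {
    "example-university": [ipaddress.ip_network("192.0.2.0/24"),
                           ipaddress.ip_network("198.51.100.0/25")],
}


def institution_for(ip: str) -> Optional[str]:
    """Return the institution whose ranges contain this address, if any."""
    addr = ipaddress.ip_address(ip)
    for name, networks in INSTITUTION_RANGES.items():
        if any(addr in net for net in networks):
            return name
    return None
```

IP checks fail for off-campus users, which is one reason the cookie, certificate and Shibboleth approaches exist alongside them.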

Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library holdings of the cited journal

OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
– http://www.ukoln.ac.uk/distributed-systems/openurl/
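
The example URL carries citation metadata as ordinary query-string key/value pairs appended to the resolver's base address. This Python sketch builds and parses such a link with `urllib.parse`; the key names simply follow the example (the early OpenURL 0.1 style), whereas the later NISO Z39.88-2004 version uses different keys such as `rft.issn`, so treat the key set as an assumption. The function names are hypothetical.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit


def make_openurl(resolver_base: str, citation: Dict[str, str]) -> str:
    """Append citation metadata to a link resolver's base URL."""
    return resolver_base + "?" + urlencode(citation)


def parse_openurl(url: str) -> Dict[str, str]:
    """Recover the citation keys a resolver would read from the query string."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


citation = {"issn": "1234-5678", "date": "1998", "volume": "12",
            "issue": "2", "spage": "134"}
url = make_openurl("http://sfxserver.uni.edu/sfxmenu", citation)
print(url)
```

Because the source only transports metadata, the same citation can be sent to whichever resolver serves the user's institution by changing `resolver_base`.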

Defining and Ensuring Metadata Quality

What constitutes quality?
Techniques for evaluating and enforcing consistency and predictability
Automated metadata creation: advantages and disadvantages
Metadata maintenance strategies

Beginning to Define Quality

Experience of the library community: BIBCO & NACO
– Agreed-upon standards for library quality
– Training and documentation in support of practitioners
– Review and enforcement of standards by means of an institutional “buddy system”

How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of standards of quality must take place at the community level

Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability

Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (currency and lag)
Accessibility

Completeness

“Metadata should describe the target objects as completely as economically feasible”

“Element set should be applied to the target object population as completely as possible”
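
Completeness becomes measurable once a community states which elements it expects. A Python sketch, assuming records are dicts mapping element names to lists of values (Simple Dublin Core elements are repeatable); the expected-element list passed in is a hypothetical profile, and empty strings count as missing.

```python
from typing import Iterable, List, Mapping


def completeness(record: Mapping[str, List[str]], expected: Iterable[str]) -> float:
    """Fraction of expected elements that carry at least one non-empty value."""
    expected = list(expected)
    if not expected:
        return 1.0
    present = sum(1 for element in expected
                  if any(v.strip() for v in record.get(element, [])))
    return present / len(expected)


record = {"title": ["Annual report"], "creator": [""], "date": ["1998"]}
print(completeness(record, ["title", "creator", "date", "language"]))  # 0.5
```

Averaging the score over a collection gives a rough measure of the second quotation: how completely the element set has been applied to the whole population.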

Accuracy

Information provided in values should be correct and factual

Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviations and usages in general

Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or created by machine?

What transformations have been applied since creation?

Where has it been before?

Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata is reflective of community thinking about necessary compromises

Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values

Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object disseminated before some or all metadata is available

“Metadata aging” is affected by cultural differences between librarians and technologists
– Librarians: once and it’s done
– Technologists: metadata as an iterative process
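
Both timeliness problems can be checked whenever dates are recorded for a resource and for its metadata. The record layout below (`id`, `resource_modified`, `metadata_modified`) is hypothetical: currency flags records whose resource changed after the metadata last did, and lag is the gap between a resource's release and its metadata becoming available.

```python
from datetime import date
from typing import Dict, List


def stale_records(records: List[Dict]) -> List[str]:
    """Currency: ids whose resource changed after the metadata was last updated."""
    return [r["id"] for r in records
            if r["resource_modified"] > r["metadata_modified"]]


def lag_days(resource_released: date, metadata_available: date) -> int:
    """Lag: days the resource circulated without metadata (0 if none)."""
    return max(0, (metadata_available - resource_released).days)


records = [
    {"id": "a", "resource_modified": date(2005, 1, 1), "metadata_modified": date(2005, 2, 1)},
    {"id": "b", "resource_modified": date(2005, 3, 1), "metadata_modified": date(2005, 2, 1)},
]
print(stale_records(records))                        # ['b']
print(lag_days(date(2005, 1, 1), date(2005, 1, 31)))  # 30
```

Running such checks repeatedly fits the technologists' view of metadata as an iterative process.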

Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as “premium” or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)

Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages
    Includes some formatting and color coding
– Disadvantages
    Assumes consistency/predictability
    Difficult to determine extent of problems found
    Tedious at best
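
Drawing the sample itself is easy to script, and a fixed seed lets a second reviewer examine exactly the same records. A minimal Python sketch with a hypothetical helper name; it shares the disadvantages listed above, since it only chooses which records a person inspects.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_records(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Pick k distinct records for manual review (all of them if fewer than k)."""
    pool = list(records)
    rng = random.Random(seed)  # private generator: the seed makes the draw repeatable
    return rng.sample(pool, min(k, len(pool)))
```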

Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages
    Better sorting and control by reviewer
– Disadvantages
    Unwieldy for large files
    Requires sustained focus from reviewer
    Requires translation into tab-delimited file
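
The translation into a tab-delimited file can be automated. This Python sketch flattens records (element name to list of values) into TSV with the standard `csv` module; joining repeated values with ` | ` is our own assumption so that each record stays on one spreadsheet row, and the function name is hypothetical.

```python
import csv
import io
from typing import Dict, List


def records_to_tsv(records: Dict[str, Dict[str, List[str]]], elements: List[str]) -> str:
    """One row per record id, one column per element; repeats joined by ' | '."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", *elements])
    for rec_id, rec in records.items():
        writer.writerow([rec_id, *(" | ".join(rec.get(e, [])) for e in elements)])
    return out.getvalue()


records = {"r1": {"title": ["A"], "subject": ["x", "y"]}, "r2": {"title": ["B"]}}
print(records_to_tsv(records, ["title", "subject"]))
```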

Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)
– Advantages
    View of several data dimensions simultaneously
    Reviewer controls data display
    Tends to pull reviewer focus to anomalies
    Handles fairly large files at one time while allowing subset views
    Display manipulation possible without programmers
– Disadvantages
    High cost of software
    Requires translation into tab-delimited file
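
Part of what the element-by-record scatter plots reveal can be approximated without special software by counting gaps. A Python sketch under the same assumed record layout (element name to list of values; hypothetical function name): for every element seen anywhere in the collection, it counts the records that lack a non-empty value for it.

```python
from collections import Counter
from typing import Dict, List


def missing_elements(records: Dict[str, Dict[str, List[str]]]) -> Dict[str, int]:
    """Map each element to the number of records lacking it (omits complete ones)."""
    all_elements = set().union(*(rec.keys() for rec in records.values()))
    missing: Counter = Counter()
    for rec in records.values():
        for element in all_elements:
            if not any(v.strip() for v in rec.get(element, [])):
                missing[element] += 1
    return dict(missing)


records = {
    "r1": {"title": ["A"], "language": ["en"], "format": ["text/html"]},
    "r2": {"title": ["B"], "language": ["en"]},
    "r3": {"title": ["C"]},
}
print(missing_elements(records))
```

An element missing from only a few records stands out at once, much as isolated gaps do in a plot.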

Element Names vs Record Ids (Scatter Plot)

Missing Elements (Scatter Plot)

– 2 records without language element
– format element present inconsistently
– Easy to rescale axis on the fly and scroll through records

Table View

– Non-empty “no information” values that may confuse end users
– Only DC Date elements are selected for display
– The only W3CDTF syntax present is four digits
– Sorted by element value
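
The two problems called out in the table view, placeholder values and dates outside W3CDTF syntax, can be flagged automatically. The pattern below accepts the W3CDTF profile of ISO 8601 (YYYY, YYYY-MM, YYYY-MM-DD, and date-times with a time zone designator) but does not check that months or days are in range; the placeholder list is a hypothetical starting point to extend from local data.

```python
import re
from typing import Dict, Iterable

W3CDTF = re.compile(
    r"\d{4}"                                                        # YYYY
    r"(?:-\d{2}"                                                    # -MM
    r"(?:-\d{2}"                                                    # -DD
    r"(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2}))?"  # Thh:mm[:ss[.s]]TZD
    r")?)?"
)

# Hypothetical "no information" strings; extend after looking at real data.
PLACEHOLDERS = {"unknown", "n.d.", "none", "no date", "not available"}


def review_dates(values: Iterable[str]) -> Dict[str, str]:
    """Classify each date value as 'ok', 'placeholder' or 'non-W3CDTF'."""
    result = {}
    for value in values:
        s = value.strip()
        if s.lower() in PLACEHOLDERS:
            result[value] = "placeholder"
        elif W3CDTF.fullmatch(s):
            result[value] = "ok"
        else:
            result[value] = "non-W3CDTF"
    return result


print(review_dates(["1998", "n.d.", "May 1998", "2005-03-01T12:00:00Z"]))
```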

Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application Profiles
– Training materials, tools, methodologies

… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the “general good”
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation

Crosswalking

“Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks: there is rarely a one-to-one correspondence between the fields or data elements in different information systems.”

-- Mary Woodley, “Crosswalks: The Path to Universal Access”

“Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts: Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on. But this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages.”

-- Jean Godby, “Two Paths to Interoperable Metadata”

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert a metadata record’s content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
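
Element-to-element mapping, together with the data-loss and repeatability problems covered under converted records, can be sketched in a few lines of Python. The two-entry crosswalk follows the equivalences in the Godby quotation (Dublin Core title to MARC 245, creator to MARC 100) and is illustrative only; a production crosswalk would route additional names and every other element to appropriate fields. Both MARC fields used here are non-repeatable, which the sketch reports.

```python
from typing import Dict, List, Tuple

Record = Dict[str, List[str]]

# Hypothetical two-entry crosswalk: Dublin Core element -> MARC tag.
DC_TO_MARC = {"title": "245", "creator": "100"}
# MARC 245 and 100 are non-repeatable, unlike their DC sources.
NON_REPEATABLE = {"245", "100"}


def apply_crosswalk(record: Record) -> Tuple[Record, Record, List[str]]:
    """Return (mapped, unmapped, warnings).

    Unmapped elements are data that would be lost in conversion; warnings
    flag repeated values sent to a non-repeatable target field.
    """
    mapped: Record = {}
    unmapped: Record = {}
    for element, values in record.items():
        tag = DC_TO_MARC.get(element)
        if tag is None:
            unmapped[element] = list(values)
        else:
            mapped.setdefault(tag, []).extend(values)
    warnings = [f"{tag} is non-repeatable but received {len(vals)} values"
                for tag, vals in mapped.items()
                if tag in NON_REPEATABLE and len(vals) > 1]
    return mapped, unmapped, warnings


rec = {"title": ["Metadata basics"], "creator": ["Smith, A.", "Jones, B."],
       "coverage": ["Ohio"]}
print(apply_crosswalk(rec))
```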

Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards make sharing sometimes problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution

Example Mapping: MODS:title to DC:title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort

Mapping MODS:title to DC:title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows for subelements for partNumber, partName

Best practice statement in DC-Lib says to include initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
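
A sketch of this mapping using Python's `xml.etree.ElementTree`. These choices are assumptions, not a published crosswalk: any typed title (abbreviated, translated, alternative, uniform) goes to DC's single refinement, Alternative; part numbers and names are appended with `. `; and `nonSort` is joined to the title with a space, which would mishandle elided articles such as “L'”. The authority link and other attributes are simply dropped, the kind of loss described under converted records.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

NS = {"mods": "http://www.loc.gov/mods/v3"}


def map_title_info(title_info: ET.Element) -> Tuple[str, str]:
    """Flatten one MODS <titleInfo> into a (DC element, value) pair."""
    def text(tag: str) -> str:
        el = title_info.find(f"mods:{tag}", NS)
        return el.text.strip() if el is not None and el.text else ""

    # DC-Lib keeps the initial article, so nonSort is put back in front.
    value = " ".join(p for p in (text("nonSort"), text("title")) if p)
    # DC title has no substructure: parts are folded into the string.
    for part in (text("partNumber"), text("partName")):
        if part:
            value += ". " + part
    element = "dcterms:alternative" if title_info.get("type") else "dc:title"
    return element, value


doc = ET.fromstring("""<mods xmlns="http://www.loc.gov/mods/v3">
<titleInfo><nonSort>The</nonSort><title>Journal of metadata</title>
<partNumber>Part 2</partNumber><partName>Quality</partName></titleInfo>
<titleInfo type="abbreviated"><title>J. metadata</title></titleInfo>
</mods>""")
for title_info in doc.findall("mods:titleInfo", NS):
    print(map_title_info(title_info))
```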

Exercise

Evaluate a small set of human- and machine-created metadata


Metadata Standards amp ApplicationsMetadata Standards amp Applications 77

The OAI WorldThe OAI World Divided into two categoriesDivided into two categories

ndash Data providers ldquoA data provider Data providers ldquoA data provider maintains one or more repositories (web maintains one or more repositories (web servers) that support the OAI-PMH as a servers) that support the OAI-PMH as a means of exposing metadatardquomeans of exposing metadatardquo

ndash Service providers ldquoA service provider Service providers ldquoA service provider issues OAI-PMH requests to data issues OAI-PMH requests to data providers and uses the metadata as a providers and uses the metadata as a basis for building value-added servicesrdquobasis for building value-added servicesrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 88

Metadata Standards amp ApplicationsMetadata Standards amp Applications 99

Other important definitionsOther important definitions

ArchiveArchive Not the same as lsquoarchiversquo used in Not the same as lsquoarchiversquo used in libraries more like ldquorepositoryrdquolibraries more like ldquorepositoryrdquo

ProtocolProtocol a set of rules defining a set of rules defining communication between systems FTP communication between systems FTP (File Transfer Protocol) and HTTP (File Transfer Protocol) and HTTP (Hypertext Transport Protocol) are other (Hypertext Transport Protocol) are other examples of Internet protocolsexamples of Internet protocols

HarvestingHarvesting the gathering together of the gathering together of metadata from a number of distributed metadata from a number of distributed repositories into a combined data storerepositories into a combined data store

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1010

Inside OAI RepositoriesInside OAI Repositories repositoryrepository - A - A repositoryrepository is a network is a network

accessible server that can process accessible server that can process requests A requests A repositoryrepository is managed by a is managed by a data provider to expose metadata to data provider to expose metadata to harvestersharvesters

resourceresource - A - A resourceresource is the object or is the object or stuff that metadata is aboutrdquo whether stuff that metadata is aboutrdquo whether physical or digital stored in the repository physical or digital stored in the repository or a constituent of another databaseor a constituent of another database

itemitem - An - An itemitem is a constituent of a is a constituent of a repository from which metadata about a repository from which metadata about a resource can be disseminated resource can be disseminated

recordrecord - A - A recordrecord is metadata in a specific is metadata in a specific metadata formatmetadata format

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1111

OAI GoalsOAI Goals Low barrier to participationLow barrier to participation

ndash Server software available in many Server software available in many programming languages intended to be programming languages intended to be easy to installeasy to install

ndash Server-less implementation available now Server-less implementation available now via ldquoStatic repositoryrdquo (essentially a web via ldquoStatic repositoryrdquo (essentially a web page that looks like an OAI response and page that looks like an OAI response and can be harvested as such)can be harvested as such)

Limited set of commandsLimited set of commands Predictable responses and flows of Predictable responses and flows of

datadata

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1212

Other OAI InfoOther OAI Info Responses are encoded in XML syntaxResponses are encoded in XML syntax OAI-PMH supports any metadata format OAI-PMH supports any metadata format

encoded in XMLmdashSimple Dublin Core is the encoded in XMLmdashSimple Dublin Core is the minimal format specified minimal format specified

Data Providers may define a logical set Data Providers may define a logical set hierarchy to support levels of granularity hierarchy to support levels of granularity for harvesting by Service Providersfor harvesting by Service Providers

Date stamps flag the last change of the Date stamps flag the last change of the metadata set and thus provide further metadata set and thus provide further support for granularity of harvestingsupport for granularity of harvesting

OAI-PMH supports flow controlOAI-PMH supports flow control

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1313

OAI RequestsOAI Requests Identify--gtReturns general information Identify--gtReturns general information

about the particular OAI serverabout the particular OAI server ListMetadataFormats--gtreturns formats ListMetadataFormats--gtreturns formats

availableavailable ListSets--gtreturns list of sets availableListSets--gtreturns list of sets available ListIdentifiers--gtreturns identifiers onlyListIdentifiers--gtreturns identifiers only ListRecords--gtreturns record ids in a setListRecords--gtreturns record ids in a set GetRecord--gtreturns particular recordGetRecord--gtreturns particular record Try it out at the UIUC OIA Registry Try it out at the UIUC OIA Registry ((

httpgitagraingeruiuceduregistrysearchformasp))

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1414

Dates Used in OAI-PMHDates Used in OAI-PMH

Datestamps are used as values in requests Datestamps are used as values in requests to support selective harvesting by date to support selective harvesting by date (generally latest update date of the (generally latest update date of the metadata record)metadata record)

Datestamps are also used in record Datestamps are also used in record headers in responsesheaders in responses

Datestamps are particular to a repositoryDatestamps are particular to a repository Repeat OAI dates are about the Repeat OAI dates are about the metadatametadata

not the not the resourcesresources

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1515

OAI-PMH Optional ContainersOAI-PMH Optional Containers

Repository levelRepository levelndash RightsRightsndash BrandingBranding

Record levelRecord levelndash AboutAbout

ProvenanceProvenanceRightsRights

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1616

About Container ExampleAbout Container Example

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1717

OAI Rights Expressions

Rights expressions are valid at three levels:
– Repository
– Set
– Record

Rights expressed at the Repository and Set levels are not a substitute for expressions at the Record level

OAI Best Practices (DLF & NSDL)

Guidelines for data providers and service providers
– http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page

Best Practices for Shareable Metadata
– http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC

OAI In Practice

The UIUC OAI-PMH Data Provider Registry
– http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers
Link on home page to Service Providers
Provides multiple reports, sample records, browses, search, etc.
Example: show the report “Distinct Metadata Schemas” from the left-hand menu
– http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
– Choose a schema; look for providers and sample records

What’s an OpenURL?

The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services

Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)
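An OpenURL 0.1 link is simply a resolver base URL followed by citation metadata as key=value pairs. The sketch below assumes a hypothetical resolver address and supports only a subset of the 0.1 keys (genre, sid, issn, title, atitle, aulast, date, volume, issue, spage); anything else in the citation is dropped rather than guessed at.

```python
from urllib.parse import urlencode

# A subset of the OpenURL 0.1 citation keys, in the order they are emitted.
OPENURL_01_KEYS = ("genre", "sid", "issn", "title", "atitle", "aulast",
                   "date", "volume", "issue", "spage")


def make_openurl(resolver_base, citation):
    """Serialize citation metadata as an OpenURL 0.1 query string.
    Keys outside the subset above, and empty values, are dropped."""
    params = [(key, citation[key]) for key in OPENURL_01_KEYS if citation.get(key)]
    return resolver_base + "?" + urlencode(params)


citation = {"genre": "article", "issn": "1234-5678", "date": "1998",
            "volume": "12", "issue": "2", "spage": "134"}
print(make_openurl("http://resolver.example.edu/menu", citation))
# http://resolver.example.edu/menu?genre=article&issn=1234-5678&date=1998&volume=12&issue=2&spage=134
```

urlencode percent-encodes values, so article titles with spaces or punctuation travel safely in the link.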

“The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, and agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material.”

-- OpenURL Overview, SFX website

OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a “link server” or “link resolver”

Link server defines the user context
Takes the source citation and determines whether a user has access
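A link server's decision can be pictured as four steps: parse the incoming citation, match it against what the institution holds, keep only copies the current user may use, and rank them by institutional preference. The Python sketch below is a toy model under those assumptions; the Holding structure, user-group labels, and cost ranking are hypothetical, and production resolvers such as SFX rely on far richer knowledge bases.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class Holding:
    issn: str
    first_year: int
    last_year: int
    url: str            # hypothetical link to this copy
    groups: frozenset   # user groups licensed to use this copy
    cost: int = 0       # institutional preference: lower is better


def resolve(openurl, holdings, user_group):
    """Return links to copies this user may access, most preferred first."""
    query = parse_qs(urlsplit(openurl).query)
    issn = query.get("issn", [""])[0]
    date = query.get("date", [""])[0]
    if not date[:4].isdigit():
        return []  # cannot match a coverage range without a year
    year = int(date[:4])
    usable = [h for h in holdings
              if h.issn == issn and h.first_year <= year <= h.last_year
              and user_group in h.groups]
    return [h.url for h in sorted(usable, key=lambda h: h.cost)]


holdings = [
    Holding("1234-5678", 1990, 2005, "http://aggregator.example.com/1234-5678",
            frozenset({"campus"}), cost=1),
    Holding("1234-5678", 1995, 2005, "http://publisher.example.com/1234-5678",
            frozenset({"campus"}), cost=0),
]
url = "http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134"
print(resolve(url, holdings, "campus"))
# ['http://publisher.example.com/1234-5678', 'http://aggregator.example.com/1234-5678']
```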

Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user’s IP address

Obtains user attributes via the Shibboleth framework
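IP-address identification is the simplest of these mechanisms to illustrate. The sketch below maps a client address to a user-group label that a link server could treat as the user context. The network ranges are hypothetical (RFC 5737 documentation blocks stand in for a real institution's addresses); cookie, certificate, and Shibboleth-based identification are not shown.

```python
import ipaddress

# Hypothetical licensed networks; RFC 5737 documentation ranges stand in
# for a real institution's address blocks.
LICENSED_NETWORKS = {
    "campus": [ipaddress.ip_network("192.0.2.0/24"),
               ipaddress.ip_network("198.51.100.0/25")],
}


def user_group_for_ip(address, default="guest"):
    """Return the first group whose networks contain the client address."""
    ip = ipaddress.ip_address(address)
    for group, networks in LICENSED_NETWORKS.items():
        # An IPv6 address is simply "not in" an IPv4 network (no error).
        if any(ip in net for net in networks):
            return group
    return default


print(user_group_for_ip("192.0.2.17"))      # campus
print(user_group_for_ip("198.51.100.200"))  # guest (outside the /25)
```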

Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library’s holdings of the cited journal

OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
– http://www.ukoln.ac.uk/distributed-systems/openurl
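The example above is an OpenURL 0.1 link. As a sketch, it can be parsed back into citation fields and re-expressed in the key/encoded-value (KEV) form of OpenURL 1.0 (ANSI/NISO Z39.88-2004). The key table is an assumption of this sketch covering only the journal keys listed, not a complete 0.1-to-1.0 mapping; sid and other referrer information are ignored.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# OpenURL 0.1 key -> OpenURL 1.0 journal-format key (sent with an "rft." prefix).
# Only these keys are translated; "title" (the journal title) became "jtitle".
KEY_MAP = {"genre": "genre", "issn": "issn", "title": "jtitle", "atitle": "atitle",
           "aulast": "aulast", "date": "date", "volume": "volume",
           "issue": "issue", "spage": "spage", "epage": "epage"}


def parse_openurl(url):
    """Split an OpenURL into its resolver base URL and a dict of citation keys."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return base, dict(parse_qsl(parts.query))


def to_openurl_10(url):
    """Re-express an OpenURL 0.1 journal citation as a Z39.88-2004 KEV link."""
    base, citation = parse_openurl(url)
    params = [("url_ver", "Z39.88-2004"),
              ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal")]
    params += [("rft." + KEY_MAP[k], v) for k, v in citation.items() if k in KEY_MAP]
    return base + "?" + urlencode(params)


EXAMPLE = ("http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998"
           "&volume=12&issue=2&spage=134")
print(parse_openurl(EXAMPLE))
print(to_openurl_10(EXAMPLE))
```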

Defining and Ensuring Metadata Quality

What constitutes quality?
Techniques for evaluating and enforcing consistency and predictability

Automated metadata creation: advantages and disadvantages

Metadata maintenance strategies

Beginning to Define Quality

Experience of the library community: BIBCO & NACO
– Agreed-upon standards for library quality
– Training and documentation in support of practitioners
– Review and enforcement of standards by means of an institutional “buddy system”

How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of standards of quality must take place at the community level

Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability

Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (Currency and Lag)
Accessibility

Completeness

“Metadata should describe the target objects as completely as economically feasible”

“Element set should be applied to the target object population as completely as possible”
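The second statement can be measured directly. This Python sketch assumes records have been loaded as dictionaries mapping element names to lists of string values (a hypothetical in-memory layout). It reports, per element, the share of records with at least one non-blank value and, per record, the share of the element set it populates. Whether a given level is economically feasible remains a judgment call.

```python
from collections import Counter


def has_value(record, element):
    """True if the record holds at least one non-blank value for the element."""
    return any(value.strip() for value in record.get(element, []))


def element_completeness(records, element_set):
    """Share of records (0.0-1.0) populating each element of the element set."""
    filled = Counter(el for rec in records for el in element_set if has_value(rec, el))
    total = len(records)
    return {el: (filled[el] / total if total else 0.0) for el in element_set}


def record_completeness(record, element_set):
    """Share of the element set that a single record populates."""
    return sum(has_value(record, el) for el in element_set) / len(element_set)


records = [
    {"title": ["Map of Ohio"], "creator": ["Smith, Jane"], "date": ["1890"]},
    {"title": ["Letter"], "creator": ["  "]},  # blank creator counts as missing
]
print(element_completeness(records, ["title", "creator", "date"]))
# {'title': 1.0, 'creator': 0.5, 'date': 0.5}
```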

Accuracy

Information provided in values should be correct and factual

Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviation usage in general

Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or created by machine?

What transformations have been applied since creation?

Where has it been before?

Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata is reflective of community thinking about necessary compromises

Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values

Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object disseminated before some or all metadata is available

“Metadata aging” is affected by cultural differences between librarians and technologists
– Librarians: once and it’s done
– Technologists: metadata as an iterative process
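Currency and lag can be flagged when both the resource and its metadata carry dates. The sketch below assumes four such dates are available per record (many real records will lack some of them); the function name and the returned labels are hypothetical.

```python
from datetime import date


def timeliness_flags(resource_released, resource_modified,
                     metadata_created, metadata_modified):
    """Return (problem, days) pairs for lag and currency.
    lag:      the resource was disseminated before its metadata existed
    currency: the resource changed after the metadata was last updated"""
    flags = []
    if metadata_created > resource_released:
        flags.append(("lag", (metadata_created - resource_released).days))
    if resource_modified > metadata_modified:
        flags.append(("currency", (resource_modified - metadata_modified).days))
    return flags


print(timeliness_flags(date(2005, 1, 10), date(2005, 6, 1),
                       date(2005, 1, 31), date(2005, 2, 1)))
# [('lag', 21), ('currency', 120)]
```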

Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as “premium” or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)

Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages
  Includes some formatting and color coding
– Disadvantages
  Assumes consistency/predictability
  Difficult to determine extent of problems found
  Tedious at best

Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages
  Better sorting and control by reviewer
– Disadvantages
  Unwieldy for large files
  Requires sustained focus from reviewer
  Requires translation into tab-delimited file
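Preparing records for spreadsheet review usually means flattening XML into a tab-delimited file. This Python sketch assumes simple Dublin Core records in the oai_dc wrapper; because root.iter finds oai_dc:dc elements at any depth, a saved ListRecords response should also work as input. It writes one row per record and joins repeated elements with " | " so each record stays on one row. The join separator, the record-number column, and the records wrapper in the sample are choices of this sketch.

```python
import csv
import io
import xml.etree.ElementTree as ET

OAI_DC = "{http://www.openarchives.org/OAI/2.0/oai_dc/}dc"
DC = "{http://purl.org/dc/elements/1.1/}"


def dc_to_tsv(xml_text, elements):
    """Flatten every oai_dc:dc record in xml_text into one tab-delimited row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["record"] + list(elements))
    for number, record in enumerate(root.iter(OAI_DC), start=1):
        row = [str(number)]
        for element in elements:
            values = [(e.text or "").strip() for e in record.findall(DC + element)]
            row.append(" | ".join(values))  # repeated elements share one cell
        writer.writerow(row)
    return out.getvalue()


SAMPLE = """<records xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
  <oai_dc:dc><dc:title>Map of Ohio</dc:title><dc:date>1890</dc:date>
    <dc:subject>Maps</dc:subject><dc:subject>Ohio</dc:subject></oai_dc:dc>
  <oai_dc:dc><dc:title>Letter</dc:title></oai_dc:dc>
</records>"""
print(dc_to_tsv(SAMPLE, ["title", "date", "subject"]))
```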

Evaluating Metadata (3)

Visual graphical analysis (Spotfire)
– Advantages
  View of several data dimensions simultaneously
  Reviewer controls data display
  Tends to pull reviewer focus to anomalies
  Handles fairly large files at one time while allowing subset views
  Display manipulation possible without programmers
– Disadvantages
  High cost of software
  Requires translation into tab-delimited file

Element Names vs. Record IDs (Scatter Plot)

Missing Elements (Scatter Plot)

Callouts on the plot:
– 2 records without a language element
– format element present inconsistently
– Easy to rescale axis on the fly and scroll through records
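The information in the scatter plot can also be computed directly, which helps when no visualization tool is at hand. As a sketch, assuming records held as a mapping from record ID to element-value lists (a hypothetical layout), this lists, for every element that appears anywhere in the set, the records that lack a non-blank value for it.

```python
def missing_elements(records):
    """records: {record_id: {element: [values]}}.
    For each element used anywhere in the set, list the records lacking it."""
    def filled(values):
        return any(v.strip() for v in values)

    used = sorted({el for rec in records.values()
                   for el, vals in rec.items() if filled(vals)})
    report = {}
    for element in used:
        missing = [rid for rid, rec in records.items()
                   if not filled(rec.get(element, []))]
        if missing:
            report[element] = missing
    return report


records = {
    "r1": {"title": ["A"], "language": ["en"], "format": ["text/html"]},
    "r2": {"title": ["B"], "language": []},
    "r3": {"title": ["C"], "format": ["image/jpeg"]},
}
for element, ids in missing_elements(records).items():
    print(f"{element}: missing in {len(ids)} of {len(records)} records {ids}")
# format: missing in 1 of 3 records ['r2']
# language: missing in 2 of 3 records ['r2', 'r3']
```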

Table View

Callouts on the table:
– Sorted by element value
– Only DC Date elements are selected for display
– The only W3CDTF syntax present is four digits
– Non-empty “no information” values that may confuse end users
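The checks behind this table view can be partly automated. The Python sketch below classifies DC date values as empty, a placeholder, W3CDTF year-only, other W3CDTF, or non-W3CDTF. The regular expression checks W3CDTF syntax only (not, for example, that a month lies between 01 and 12), and the placeholder list is a hypothetical starting point to extend from what the data actually contains.

```python
import re
from collections import Counter

# W3CDTF syntax: YYYY, YYYY-MM, YYYY-MM-DD, or a full date plus Thh:mm,
# optional :ss and fraction, and a required time zone (Z or +hh:mm / -hh:mm).
W3CDTF = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)

# Hypothetical starter list of non-empty "no information" values.
PLACEHOLDERS = {"unknown", "n.d.", "no date", "undated", "none", "n/a", "not available"}


def classify_date(value):
    """Put one date value into a review category."""
    v = value.strip()
    if not v:
        return "empty"
    if v.lower().strip("[]") in PLACEHOLDERS:
        return "placeholder"
    if not W3CDTF.fullmatch(v):
        return "non-W3CDTF"
    return "W3CDTF (year only)" if len(v) == 4 else "W3CDTF"


def summarize(values):
    """Count date values by category, for a quick profile of a collection."""
    return Counter(classify_date(v) for v in values)


print(summarize(["1998", "unknown", "1998", "May 1998", "1998-05-12T10:30Z"]))
```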

Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies

… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the “general good”
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation

Crosswalking

“Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems.”

-- Mary Woodley, “Crosswalks: The Path to Universal Access?”

“Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts (Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on), but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages.”

-- Jean Godby, “Two Paths to Interoperable Metadata”

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert a metadata record’s content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
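Element-to-element mapping is the simplest part of a crosswalk, and even it can lose data. The Python sketch below uses a deliberately tiny, hypothetical Dublin Core to MARC table (title to 245 and creator to 100 follow the Godby quote; description to 520 is an added assumption, not taken from a published crosswalk). It records every value it could not place, either because no target is defined or because the target field is not repeatable. A fuller crosswalk would route a second creator to a 700 added entry instead of losing it.

```python
# Hypothetical element-to-element table: DC element -> (MARC tag, subfield).
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("100", "a"),
    "description": ("520", "a"),
}
NON_REPEATABLE = {"245", "100"}  # fields that may occur only once per record


def crosswalk(dc_record):
    """Map {element: [values]} to (tag, subfield, value) triples.
    Returns (fields, lost) so that data loss is visible, not silent."""
    fields, lost, used = [], [], set()
    for element, values in dc_record.items():
        for value in values:
            target = DC_TO_MARC.get(element)
            if target is None or (target[0] in NON_REPEATABLE and target[0] in used):
                lost.append((element, value))  # no equivalent, or slot taken
                continue
            used.add(target[0])
            fields.append((target[0], target[1], value))
    return fields, lost


fields, lost = crosswalk({"title": ["Letters home"],
                          "creator": ["Smith, Jane", "Doe, John"],
                          "coverage": ["Ohio"]})
print(fields)  # [('245', 'a', 'Letters home'), ('100', 'a', 'Smith, Jane')]
print(lost)    # [('creator', 'Doe, John'), ('coverage', 'Ohio')]
```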

Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards make sharing sometimes problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution

Example Mapping: MODS:title to DC:title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort

Mapping MODS:title to DC:title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows for subelements for partNumber, partName

Best practice statement in DC-Lib says to include initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file if desired
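The MODS-to-DC title mapping described above can be sketched in Python. Under this sketch's assumptions, the initial article in <nonSort> is kept at the front (per the DC-Lib best practice), partNumber and partName are flattened into the string because DC title has no substructure, and any titleInfo carrying a type attribute goes to the Alternative refinement. That last rule, the ". " separators, and the output layout are choices of this sketch rather than a published mapping; authority links and other attributes are simply dropped.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def child_text(parent, name):
    """Stripped text of a MODS child element, or "" if it is absent."""
    element = parent.find(MODS + name)
    return (element.text or "").strip() if element is not None else ""


def mods_titles_to_dc(mods_xml):
    """Map top-level MODS titleInfo elements to DC title / Alternative strings."""
    root = ET.fromstring(mods_xml)
    result = {"title": [], "alternative": []}
    for info in root.findall(MODS + "titleInfo"):  # skips relatedItem titles
        # Keep the initial article (nonSort) at the front, per DC-Lib practice.
        text = " ".join(filter(None, [child_text(info, "nonSort"),
                                      child_text(info, "title")]))
        # DC title has no substructure, so part numbers/names are flattened in.
        for part in ("partNumber", "partName"):
            value = child_text(info, part)
            if value:
                text += ". " + value
        # Typed titles (abbreviated, translated, alternative, uniform) go to
        # Alternative; attributes such as authority are not carried over.
        result["alternative" if info.get("type") else "title"].append(text)
    return result


MODS_RECORD = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><nonSort>The</nonSort><title>Collected Letters</title>
    <partNumber>Volume 2</partNumber><partName>Early years</partName></titleInfo>
  <titleInfo type="abbreviated"><title>Coll. Lett.</title></titleInfo>
</mods>"""
print(mods_titles_to_dc(MODS_RECORD))
# {'title': ['The Collected Letters. Volume 2. Early years'], 'alternative': ['Coll. Lett.']}
```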

Exercise

Evaluate a small set of human- and machine-created metadata

OAI RequestsOAI Requests Identify--gtReturns general information Identify--gtReturns general information

about the particular OAI serverabout the particular OAI server ListMetadataFormats--gtreturns formats ListMetadataFormats--gtreturns formats

availableavailable ListSets--gtreturns list of sets availableListSets--gtreturns list of sets available ListIdentifiers--gtreturns identifiers onlyListIdentifiers--gtreturns identifiers only ListRecords--gtreturns record ids in a setListRecords--gtreturns record ids in a set GetRecord--gtreturns particular recordGetRecord--gtreturns particular record Try it out at the UIUC OIA Registry Try it out at the UIUC OIA Registry ((

httpgitagraingeruiuceduregistrysearchformasp))

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1414

Dates Used in OAI-PMHDates Used in OAI-PMH

Datestamps are used as values in requests Datestamps are used as values in requests to support selective harvesting by date to support selective harvesting by date (generally latest update date of the (generally latest update date of the metadata record)metadata record)

Datestamps are also used in record Datestamps are also used in record headers in responsesheaders in responses

Datestamps are particular to a repositoryDatestamps are particular to a repository Repeat OAI dates are about the Repeat OAI dates are about the metadatametadata

not the not the resourcesresources

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1515

OAI-PMH Optional ContainersOAI-PMH Optional Containers

Repository levelRepository levelndash RightsRightsndash BrandingBranding

Record levelRecord levelndash AboutAbout

ProvenanceProvenanceRightsRights

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1616

About Container ExampleAbout Container Example

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1717

OAI Rights ExpressionsOAI Rights Expressions

Rights expressions are valid at three Rights expressions are valid at three levelslevelsndash RepositoryRepositoryndash SetSetndash RecordRecord

Rights expressed at the Repository Rights expressed at the Repository and Set levels are not a substitute for and Set levels are not a substitute for expressions at the Record Levelexpressions at the Record Level

OAI Best Practices (DLF amp OAI Best Practices (DLF amp NSDL)NSDL)

Guidelines for data providers and Guidelines for data providers and service providersservice providersndash httpwebservicesitcsumicheduhttpwebservicesitcsumichedu

mediawikioaibpindexphpMain_Page1048708mediawikioaibpindexphpMain_Page1048708 Best Practices for Shareable Best Practices for Shareable

MetadataMetadatandash httpwebservicesitcsumicheduhttpwebservicesitcsumichedu

mediawikioaibpPublicTOC1048708mediawikioaibpPublicTOC1048708

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1818

OAI In PracticeOAI In Practice

The UIUC OAI-PMH Data Provider RegistryThe UIUC OAI-PMH Data Provider Registryndash httpgitagraingeruiuceduregistrysearchformasp

Includes most known data providersIncludes most known data providers Link on home page to Service ProvidersLink on home page to Service Providers Provides multiple reports sample records Provides multiple reports sample records

browses search etcbrowses search etc Ex Show report from left hand menu Ex Show report from left hand menu

ldquoDistinct Metadata Schemasrdquo ldquoDistinct Metadata Schemasrdquo ndash httpgitagraingeruiuceduregistryListSchemasasp ndash Choose a schema look for providers and sample records Choose a schema look for providers and sample records

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1919

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2020

Whatrsquos an OpenURLWhatrsquos an OpenURL

The OpenURL provides a standardized The OpenURL provides a standardized format for transporting bibliographic format for transporting bibliographic metadata about objects between metadata about objects between information servicesinformation services

Provides a basis for building services via Provides a basis for building services via the notion of an the notion of an extended service-linkextended service-link which moves beyond the classic notion of which moves beyond the classic notion of a a reference linkreference link (a link from metadata to (a link from metadata to the full-content described by the the full-content described by the metadata)metadata)

"The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, and agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material."

-- OpenURL Overview, SFX website

OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a "link server" or "link resolver"

Link server defines the user context

Takes source citation and determines whether a user has access
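To make the link server's role concrete, here is a minimal Python sketch of the decision it makes: given citation metadata, it checks a local holdings table and returns the copies the user can reach, best copy first. The `Holding` structure, the sample holdings, and the preference-then-cost ordering are hypothetical illustrations, not the SFX implementation or any product's API.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    """One owned or licensed copy of a journal (hypothetical structure)."""
    issn: str
    first_year: int
    last_year: int
    target: str       # where the copy lives, e.g. a publisher platform
    cost: float       # per-use cost to the institution (0 for licensed content)
    preference: int   # lower number = institution prefers this copy


def resolve(citation: dict, holdings: list) -> list:
    """Return the targets available for this citation, best copy first.

    An empty list means the user has no access through this link server.
    """
    issn = citation.get("issn")
    year = int(citation.get("date", "0")[:4])
    matches = [
        h for h in holdings
        if h.issn == issn and h.first_year <= year <= h.last_year
    ]
    # "Best copy": institutional preference first, then lowest cost.
    matches.sort(key=lambda h: (h.preference, h.cost))
    return [h.target for h in matches]
```

For a 1998 article in a journal held by both a publisher platform (preference 1) and an aggregator (preference 2), `resolve` returns `['publisher', 'aggregator']`; a print run ending in 1989 is excluded by its date range.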

Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user's IP address

Obtains user attributes via the Shibboleth framework
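Of these mechanisms, IP address recognition is the simplest to illustrate. The sketch below uses Python's standard `ipaddress` module to test whether a request comes from an institution's address ranges. The ranges are reserved documentation prefixes standing in for real campus networks (an assumption for illustration); a production resolver would typically combine this with proxy or Shibboleth-based authentication.

```python
import ipaddress

# Hypothetical campus ranges (RFC 5737 documentation prefixes used as stand-ins).
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_on_campus(client_ip: str, networks=CAMPUS_NETWORKS) -> bool:
    """Return True if client_ip falls inside any of the institution's networks."""
    addr = ipaddress.ip_address(client_ip)
    # Membership tests between IPv4 and IPv6 objects simply return False.
    return any(addr in net for net in networks)
```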

Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library holdings of the cited journal

OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
– http://www.ukoln.ac.uk/distributed-systems/openurl
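The example link above follows the original (version 0.1) key/value style of OpenURL: a resolver base URL followed by citation fields as query parameters. A minimal sketch of how a source might build such a link, and how a resolver might read it back, using only Python's `urllib.parse`. The base URL is the placeholder from the example, and only the fields it shows are used; the later ANSI/NISO Z39.88-2004 version of OpenURL uses a different key scheme (e.g. `rft.issn`), which this sketch does not attempt.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_openurl(base: str, citation: dict) -> str:
    """Append citation metadata to a resolver base URL as a query string."""
    return base + "?" + urlencode(citation)


def parse_openurl(url: str) -> dict:
    """Recover the citation fields from an OpenURL (first value per key)."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


citation = {"issn": "1234-5678", "date": "1998", "volume": "12",
            "issue": "2", "spage": "134"}
link = build_openurl("http://sfxserver.uni.edu/sfxmenu", citation)
```

Here `link` reproduces the example URL exactly, and `parse_openurl(link)` returns the original `citation` dictionary, which is all a resolver needs before consulting its holdings.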

Defining and Ensuring Metadata Quality

What constitutes quality?

Techniques for evaluating and enforcing consistency and predictability

Automated metadata creation: advantages and disadvantages

Metadata maintenance strategies

Beginning to Define Quality

Experience of the library community: BIBCO & NACO
– Agreed-upon standards for library quality
– Training and documentation in support of practitioners
– Review and enforcement of standards by means of an institutional "buddy system"

How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of standards of quality must take place at the community level

Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability

Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (currency and lag)
Accessibility

Completeness

"Metadata should describe the target objects as completely as economically feasible"

"Element set should be applied to the target object population as completely as possible"
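The second statement can be made measurable by computing, for each element, the share of records in which it is present. The sketch below assumes records are already parsed into Python dicts mapping element names to lists of values, and uses the fifteen Simple Dublin Core elements as the expected set; both are illustrative choices, not requirements of any standard.

```python
# The fifteen Simple Dublin Core elements, used here as the expected element set.
DC_ELEMENTS = ["title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights"]


def has_value(record: dict, element: str) -> bool:
    """True if the record has at least one non-blank value for the element."""
    return any(v.strip() for v in record.get(element, []))


def element_coverage(records: list, elements=DC_ELEMENTS) -> dict:
    """Fraction of records (0.0 to 1.0) that populate each element."""
    if not records:
        return {e: 0.0 for e in elements}
    return {e: sum(has_value(r, e) for r in records) / len(records)
            for e in elements}


def missing_elements(record: dict, required) -> list:
    """Elements from `required` that a single record lacks or leaves blank."""
    return [e for e in required if not has_value(record, e)]
```

A whitespace-only value counts as missing, so a record with `"language": [" "]` is reported the same way as one with no language element at all.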

Accuracy

Information provided in values should be correct and factual

Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviation usage in general

Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or machine-created?

What transformations have been applied since creation?

Where has it been before?

Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata reflects community thinking about necessary compromises

Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values
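Reliance on defaulted values can be approximated by looking for elements where a single value dominates the collection. A minimal sketch, assuming the same hypothetical dict-of-lists record shape and an arbitrary 90% threshold; a flagged element is only a prompt for human review, since some elements (a collection-wide rights statement, for instance) legitimately repeat.

```python
from collections import Counter


def dominant_values(records: list, threshold: float = 0.9) -> dict:
    """Report elements whose most common value appears in >= threshold of records.

    Returns {element: (value, share_of_all_records)}.
    """
    flagged = {}
    elements = {e for record in records for e in record}
    for element in sorted(elements):
        # Count the first value of each record that has the element.
        values = [record[element][0] for record in records if record.get(element)]
        if not values:
            continue
        value, count = Counter(values).most_common(1)[0]
        share = count / len(records)
        if share >= threshold:
            flagged[element] = (value, share)
    return flagged
```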

Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object disseminated before some or all metadata is available

"Metadata aging" is affected by cultural differences between librarians and technologists
– Librarians: once and it's done
– Technologists: metadata as an iterative process
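Currency problems can be detected mechanically when both the metadata and the object it describes carry modification dates. The sketch compares the two; the `metadata_modified` and `object_modified` field names are hypothetical, and a real check would draw them from sources such as a repository datestamp and a content-management or file-system timestamp.

```python
from datetime import datetime


def stale_records(records: list) -> list:
    """Return ids of records whose target object changed after the metadata did.

    Each record is a dict with hypothetical keys "id", "metadata_modified",
    and "object_modified", the last two as ISO 8601 strings like "2005-03-01".
    """
    stale = []
    for record in records:
        metadata_time = datetime.fromisoformat(record["metadata_modified"])
        object_time = datetime.fromisoformat(record["object_modified"])
        if object_time > metadata_time:
            stale.append(record["id"])
    return stale
```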

Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as "premium" or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)

Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages
  Includes some formatting and color coding
– Disadvantages
  Assumes consistency/predictability
  Difficult to determine extent of problems found
  Tedious at best

Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages
  Better sorting and control by reviewer
– Disadvantages
  Unwieldy for large files
  Requires sustained focus from reviewer
  Requires translation into tab-delimited file
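Because both the spreadsheet and the visual-analysis approaches need a tab-delimited file, a small conversion script can remove that manual step. This sketch flattens Simple Dublin Core records (the `oai_dc` format used by OAI-PMH) into one row per record, element, and value, a layout that sorts and filters well in a spreadsheet. The `record_id` column and the (id, XML string) input pairs are assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_to_rows(record_id: str, dc_xml: str):
    """Yield (record_id, element, value) for each Dublin Core element in a record."""
    root = ET.fromstring(dc_xml)
    for child in root:
        if child.tag.startswith("{" + DC_NS + "}"):
            element = child.tag[len(DC_NS) + 2:]  # strip "{namespace}"
            yield record_id, element, (child.text or "").strip()


def records_to_tsv(records) -> str:
    """Write all (id, element, value) rows as tab-delimited text with a header."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["record_id", "element", "value"])
    for record_id, dc_xml in records:
        writer.writerows(dc_to_rows(record_id, dc_xml))
    return out.getvalue()
```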

Evaluating Metadata (3)

Visual graphical analysis (Spotfire)
– Advantages
  View of several data dimensions simultaneously
  Reviewer controls data display
  Tends to pull reviewer focus to anomalies
  Handles fairly large files at one time while allowing subset views
  Display manipulation possible without programmers
– Disadvantages
  High cost of software
  Requires translation into tab-delimited file

Element Names vs Record Ids (Scatter Plot)

Missing Elements (Scatter Plot)
– 2 records without a language element
– format element present inconsistently
– Easy to rescale axis on the fly and scroll through records

Table View
– Non-empty "no information" values that may confuse end users
– Only DC date elements are selected for display
– The only W3CDTF syntax present is four digits
– Sorted by element value
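The date observations above can be turned into an automated check. W3CDTF (the W3C profile of ISO 8601 commonly recommended for Dublin Core dates) allows a year, a year and month, a full date, or a full date plus time and a time-zone designator. This regex-based sketch reports which granularity a value uses, flags "no information" placeholders, and returns `None` for anything else. It checks syntax only (it would accept month 13), and the placeholder list is a hypothetical example.

```python
import re

# W3CDTF granularities, from coarsest to finest.
W3CDTF_PATTERNS = [
    ("year", r"\d{4}"),
    ("year-month", r"\d{4}-\d{2}"),
    ("date", r"\d{4}-\d{2}-\d{2}"),
    ("datetime", r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})"),
]

# Hypothetical "no information" values seen in a collection.
PLACEHOLDERS = {"unknown", "n/a", "none", "no date", "[s.d.]"}


def classify_date(value: str):
    """Return the W3CDTF granularity of value, "placeholder", or None."""
    text = value.strip()
    if text.lower() in PLACEHOLDERS:
        return "placeholder"
    for name, pattern in W3CDTF_PATTERNS:
        if re.fullmatch(pattern, text):
            return name
    return None
```

Run over a collection's date values, this shows at a glance whether only the four-digit year form is in use and how many records carry non-conforming or placeholder dates.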

Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies

… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the "general good"
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation

Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access?"

"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts (Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on), but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, "Two Paths to Interoperable Metadata"

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert a metadata record's content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
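Element-to-element mapping, the first of these steps, can be expressed as a lookup table. The sketch hard-codes the two equivalences from the Godby quotation (title to MARC 245, author/creator to MARC 100) and reports unmapped elements so that lost data is visible rather than silently dropped. Because MARC 245 and 100 are non-repeatable, it sends additional values to 246 and 700; that overflow rule, and the whole table, are illustrative assumptions rather than the Library of Congress crosswalk. Indicators and subfields are deliberately left out.

```python
# Illustrative subset of a Dublin Core -> MARC crosswalk:
# element -> (tag for the first value, repeatable tag for any further values).
DC_TO_MARC = {
    "title": ("245", "246"),
    "creator": ("100", "700"),
}


def crosswalk(dc_record: dict, mapping=DC_TO_MARC):
    """Map a DC record (element -> list of values) to MARC-like (tag, value) pairs.

    Returns (fields, unmapped), where unmapped lists elements with no entry
    in the mapping table.
    """
    fields, unmapped = [], []
    for element, values in dc_record.items():
        if element not in mapping:
            unmapped.append(element)
            continue
        first_tag, extra_tag = mapping[element]
        for i, value in enumerate(values):
            fields.append((first_tag if i == 0 else extra_tag, value))
    return fields, unmapped
```

The repeatability handling is a small instance of the "properties may vary" problem with converted records: a straight one-to-one mapping would produce an invalid second 100 field.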

Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards sometimes make sharing problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution

Example Mapping: MODS:title to DC:title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort

Mapping MODS:title to DC:title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows for subelements for partNumber, partName

Best practice statement in DC-Lib says to include the initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
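These rules can be sketched in code. The function below reads one MODS `titleInfo`, reassembles `nonSort`, `title`, `partNumber`, and `partName` into a single flat string (keeping the initial article, as the DC-Lib guidance asks), and routes `type="alternative"` titles to the DC Terms `alternative` refinement. Sending the other title types to `dc:title`, the ". " joining convention, and the output keys are assumptions of this sketch; authority links have no DC equivalent here and are dropped.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def mods_title_to_dc(title_info_xml: str):
    """Flatten one MODS <titleInfo> into a (dc_key, value) pair (sketch)."""
    ti = ET.fromstring(title_info_xml)

    def part(name):
        # Text of the first matching child element, or "" if absent or empty.
        return (ti.findtext(MODS + name) or "").strip()

    # Keep the initial article; assumes nonSort holds a whole word such as "The".
    main = " ".join(p for p in (part("nonSort"), part("title")) if p)
    # DC title has no substructure, so part number/name are appended as text.
    value = ". ".join(p for p in (main, part("partNumber"), part("partName")) if p)
    key = "dcterms:alternative" if ti.get("type") == "alternative" else "dc:title"
    return key, value
```

The flattening shows the granularity loss in miniature: once the parts are joined, a DC consumer can no longer tell where the title ends and the part name begins, or which characters to skip when sorting.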

Exercise

Evaluate a small set of human- and machine-created metadata


Other important definitions

Archive: not the same as "archive" as used in libraries; more like "repository"

Protocol: a set of rules defining communication between systems. FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are other examples of Internet protocols

Harvesting: the gathering together of metadata from a number of distributed repositories into a combined data store

Inside OAI Repositories

repository - A repository is a network-accessible server that can process requests. A repository is managed by a data provider to expose metadata to harvesters

resource - A resource is the object or "stuff" that metadata is "about", whether physical or digital, stored in the repository or a constituent of another database

item - An item is a constituent of a repository from which metadata about a resource can be disseminated

record - A record is metadata in a specific metadata format

OAI Goals

Low barrier to participation
– Server software available in many programming languages, intended to be easy to install
– Server-less implementation available now via "Static Repository" (essentially a web page that looks like an OAI response and can be harvested as such)

Limited set of commands

Predictable responses and flows of data

Other OAI Info

Responses are encoded in XML syntax

OAI-PMH supports any metadata format encoded in XML; Simple Dublin Core is the minimal format specified

Data providers may define a logical set hierarchy to support levels of granularity for harvesting by service providers

Datestamps flag the last change of each metadata record and thus provide further support for granularity of harvesting

OAI-PMH supports flow control
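Flow control in OAI-PMH works through resumption tokens: a repository may return a partial list plus a `resumptionToken`, and the harvester repeats the request with only the `verb` and that token until the token is missing or empty. A minimal harvesting loop, written against an injected `fetch(params)` function (a hypothetical hook that would wrap an HTTP request to the repository's base URL) so the paging logic can be read and tested without a network.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, metadata_prefix: str = "oai_dc") -> list:
    """Collect record identifiers across all pages of a ListIdentifiers response.

    `fetch(params)` must return the XML response body (str) for a dict of
    OAI-PMH request parameters.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    identifiers = []
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI + "header"):
            identifiers.append(header.findtext(OAI + "identifier"))
        token = root.findtext(f".//{OAI}resumptionToken")
        if not token:  # missing or empty token: the list is complete
            return identifiers
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}
```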

OAI Requests

Identify --> returns general information about the particular OAI server

ListMetadataFormats --> returns formats available

ListSets --> returns list of sets available

ListIdentifiers --> returns identifiers (record headers) only

ListRecords --> returns full records, optionally restricted to a set

GetRecord --> returns a particular record

Try it out at the UIUC OAI Registry (http://gita.grainger.uiuc.edu/registry/searchform.asp)
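Each of these requests is an HTTP GET (or POST) to the repository's base URL with a `verb` parameter plus whatever arguments that verb takes. A sketch that builds the six request types; the base URL, the `physics` set, and the `oai:example.org:1234` identifier are placeholders, and only argument names defined by OAI-PMH 2.0 are used.

```python
from urllib.parse import urlencode

BASE_URL = "http://repository.example.org/oai"  # hypothetical data provider


def oai_request(verb: str, **args) -> str:
    """Build an OAI-PMH request URL from a verb and its arguments."""
    return BASE_URL + "?" + urlencode({"verb": verb, **args})


examples = [
    oai_request("Identify"),
    oai_request("ListMetadataFormats"),
    oai_request("ListSets"),
    oai_request("ListIdentifiers", metadataPrefix="oai_dc"),
    oai_request("ListRecords", metadataPrefix="oai_dc", set="physics"),
    oai_request("GetRecord", identifier="oai:example.org:1234", metadataPrefix="oai_dc"),
]
```

Pasting URLs of this shape into a browser, with a real base URL from the registry, is an easy way to see the XML responses each verb produces.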

Dates Used in OAI-PMH

Datestamps are used as values in requests to support selective harvesting by date (generally the latest update date of the metadata record)

Datestamps are also used in record headers in responses

Datestamps are particular to a repository

Repeat: OAI dates are about the metadata, not the resources
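Selective harvesting pairs the `from` (and optionally `until`) request arguments with the datestamps in response headers; both bounds are inclusive. Since `from` is a Python keyword, the parameters are built as a dict here. The sketch builds an incremental request from the date of the last harvest and reads datestamps back from record headers; the base URL and any sample response are placeholders, and, as the slide stresses, these dates describe changes to the metadata records, not to the resources.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def incremental_request(base_url: str, last_harvest: str) -> str:
    """ListRecords request for records whose metadata changed on/after last_harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc", "from": last_harvest}
    return base_url + "?" + urlencode(params)


def header_datestamps(response_xml: str) -> dict:
    """Map each record identifier to the datestamp in its header."""
    root = ET.fromstring(response_xml)
    return {
        header.findtext(OAI + "identifier"): header.findtext(OAI + "datestamp")
        for header in root.iter(OAI + "header")
    }
```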

OAI-PMH Optional Containers

Repository level
– Rights
– Branding

Record level
– About
  Provenance
  Rights

About Container Example

OAI Rights Expressions

Rights expressions are valid at three levels
– Repository
– Set
– Record

Rights expressed at the repository and set levels are not a substitute for expressions at the record level

OAI Best Practices (DLF & NSDL)

Guidelines for data providers and service providers
– http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page

Best Practices for Shareable Metadata
– http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC

OAI In Practice

The UIUC OAI-PMH Data Provider Registry
– http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers

Link on home page to service providers

Provides multiple reports, sample records, browses, search, etc.

Ex: Show report from left-hand menu "Distinct Metadata Schemas"
– http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
– Choose a schema; look for providers and sample records

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1919

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2020

Whatrsquos an OpenURLWhatrsquos an OpenURL

The OpenURL provides a standardized The OpenURL provides a standardized format for transporting bibliographic format for transporting bibliographic metadata about objects between metadata about objects between information servicesinformation services

Provides a basis for building services via Provides a basis for building services via the notion of an the notion of an extended service-linkextended service-link which moves beyond the classic notion of which moves beyond the classic notion of a a reference linkreference link (a link from metadata to (a link from metadata to the full-content described by the the full-content described by the metadata)metadata)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2121

ldquoldquoThe OpenURL standard enables a user who has The OpenURL standard enables a user who has retrieved an article citation for example to obtain retrieved an article citation for example to obtain immediate access to the most appropriate copy of immediate access to the most appropriate copy of that object through the implementation of extended that object through the implementation of extended linking services The selection of the best copy is linking services The selection of the best copy is based on user and organizational preferences based on user and organizational preferences regarding the location of the copy its cost and regarding the location of the copy its cost and agreements with information suppliers and similar agreements with information suppliers and similar considerations This selection occurs without the considerations This selection occurs without the knowledge of the user it is made possible by the knowledge of the user it is made possible by the transport of metadata with the OpenURL link from transport of metadata with the OpenURL link from the source citation to a resolver (the link server) the source citation to a resolver (the link server) which stores the preference information and the which stores the preference information and the links to the appropriate materialrdquolinks to the appropriate materialrdquo

--OpenURL Overview SFX website--OpenURL Overview SFX website

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2222

OpenURL CharacteristicsOpenURL Characteristics

Protocol operates between an Protocol operates between an information resource and a service information resource and a service componentcomponent

Service component is called a ldquolink Service component is called a ldquolink serverrdquo or ldquolink resolverrdquoserverrdquo or ldquolink resolverrdquo

Link server defines the user contextLink server defines the user context Takes source citation and determines Takes source citation and determines

whether a user has accesswhether a user has access

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2323

Distinguishing UsersDistinguishing Users

Uses information stored in a cookie Uses information stored in a cookie (the CookiePusher mechanism)(the CookiePusher mechanism)

Uses information contained in a Uses information contained in a digital certificate such as the one digital certificate such as the one proposed by the DLF digital proposed by the DLF digital certificates prototype projectcertificates prototype project

Identifies a users IP addressIdentifies a users IP address Obtains user attributes via the Obtains user attributes via the

Shibboleth frameworkShibboleth framework

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2424

Examples of Extended Service Examples of Extended Service LinksLinks

From a record in an abstracting and indexing From a record in an abstracting and indexing database (AampI) to the full-text described by the database (AampI) to the full-text described by the recordrecord

From a record describing a book in a library From a record describing a book in a library catalogue to a description of the same book in an catalogue to a description of the same book in an Internet book shopInternet book shop

From a reference in a journal article to a record From a reference in a journal article to a record matching that reference in an AampI databasematching that reference in an AampI database

From a citation in a journal article to a record in a From a citation in a journal article to a record in a library catalogue that shows the library holdings library catalogue that shows the library holdings of the cited journalof the cited journal

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2525

OpenURL Examples amp DemoOpenURL Examples amp Demo

httpsfxserveruniedusfxmenuissn=1234-5678ampdate=1998ampvolume=12ampissue=2ampspage=134

An OpenURL demoAn OpenURL demondash httpwwwukolnacukdistributed-syste

msopenurl

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2626

Defining and Ensuring Metadata Defining and Ensuring Metadata QualityQuality

What constitutes qualityWhat constitutes quality Techniques for evaluating and Techniques for evaluating and

enforcing consistency and enforcing consistency and predictabilitypredictability

Automated metadata creation Automated metadata creation advantages and disadvantagesadvantages and disadvantages

Metadata maintenance strategiesMetadata maintenance strategies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2727

Beginning to Define QualityBeginning to Define Quality

Experience of the library Experience of the library community--BIBCO amp NACOcommunity--BIBCO amp NACOndash Agreed upon standards for library Agreed upon standards for library

qualityqualityndash Training and documentation in support Training and documentation in support

of practitionersof practitionersndash Review and enforcement of standards Review and enforcement of standards

by means of institutional ldquobuddy by means of institutional ldquobuddy systemrdquosystemrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2828

How Does Quality Happen?

Lessons from the library community:
- Quality is quantifiable and measurable
- To be effective, enforcement of standards of quality must take place at the community level

Furthermore:
- Data problems are not unique to particular communities
- General strategies can improve interoperability


Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (currency and lag)
Accessibility


Completeness

"Metadata should describe the target objects as completely as economically feasible."

"Element set should be applied to the target object population as completely as possible."


Accuracy

Information provided in values should be correct and factual

Editing applied to:
- Eliminate typos
- Ensure conforming name expressions
- Ensure standard abbreviations and usage in general


Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or created by machine?

What transformations have been applied since creation?

Where has it been before?


Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata is reflective of community thinking about necessary compromises
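Conformance to a controlled vocabulary is one of the few expectations that can be checked mechanically: every value of a controlled element should be a term the community has exposed. A minimal sketch, assuming records are plain dicts of element name to list of values; the vocabulary used here is the DCMI Type Vocabulary, and the sample records are made up.

```python
# Terms of the DCMI Type Vocabulary
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

def nonconforming_values(records, element, vocabulary):
    """Return (record_id, value) pairs whose value is not a vocabulary term."""
    problems = []
    for rec in records:
        for value in rec.get(element, []):
            if value not in vocabulary:  # exact, case-sensitive match
                problems.append((rec["id"], value))
    return problems

records = [
    {"id": "rec1", "type": ["Text"]},
    {"id": "rec2", "type": ["text"]},        # wrong case
    {"id": "rec3", "type": ["Photograph"]},  # local term, not in DCMI Type
]
print(nonconforming_values(records, "type", DCMI_TYPES))
# [('rec2', 'text'), ('rec3', 'Photograph')]
```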


Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values


Timeliness

Currency
- Target object changes but metadata does not

Lag
- Target object disseminated before some or all metadata is available

"Metadata aging" is affected by cultural differences between librarians and technologists:
- Librarians: once and it's done
- Technologists: metadata as an iterative process
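The currency problem can be flagged automatically when both the object and its metadata carry a last-modified date. A minimal sketch, assuming each record is a dict with hypothetical `resource_modified` and `metadata_modified` fields holding ISO 8601 dates (YYYY-MM-DD):

```python
from datetime import date

def stale_records(records):
    """IDs of records whose target object changed after its metadata was last updated."""
    stale = []
    for rec in records:
        resource_changed = date.fromisoformat(rec["resource_modified"])
        metadata_changed = date.fromisoformat(rec["metadata_modified"])
        if resource_changed > metadata_changed:
            stale.append(rec["id"])
    return stale

records = [
    {"id": "a", "resource_modified": "2005-03-01", "metadata_modified": "2005-03-02"},
    {"id": "b", "resource_modified": "2006-07-15", "metadata_modified": "2005-01-10"},
]
print(stale_records(records))  # ['b']
```

Run periodically, a check like this supports the technologists' view of metadata as an iterative process rather than a one-time task.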


Accessibility

Barriers to accessibility may be economic, technical, or organizational:
- Metadata as "premium" or proprietary information
- Unreadable for technical reasons (file formats, etc.)
- Metadata may not be properly linked to relevant object(s)


Evaluating Metadata (1)

Random sampling (XMLSpy)
- Advantages
    - Includes some formatting and color coding
- Disadvantages
    - Assumes consistency/predictability
    - Difficult to determine extent of problems found
    - Tedious at best
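Whatever tool displays the records, drawing the sample with a fixed seed makes a spot check repeatable, so a second reviewer can examine exactly the same records. A sketch with Python's standard `random` module; the `oai:example.org` identifiers and the sample size are illustrative only.

```python
import random

def sample_for_review(record_ids, k, seed=42):
    """Return k distinct record IDs chosen at random, reproducible for a given seed."""
    rng = random.Random(seed)  # private generator; leaves the global state untouched
    k = min(k, len(record_ids))
    return rng.sample(record_ids, k)

ids = [f"oai:example.org:{n}" for n in range(1, 501)]
batch = sample_for_review(ids, 10)
print(batch)
```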


Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
- Advantages
    - Better sorting and control by reviewer
- Disadvantages
    - Unwieldy for large files
    - Requires sustained focus from reviewer
    - Requires translation into tab-delimited file
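Translating XML records into a tab-delimited file is the prerequisite for spreadsheet review. A minimal sketch using only the standard library, assuming simple Dublin Core records in the `oai_dc` container; the wrapping `<records>` element and the sample data are made up. Repeated elements are joined with a semicolon so each record stays on one row.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

def dc_to_tsv(xml_text, elements):
    """Flatten oai_dc records into TSV: one row per record, one column per element."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(elements)  # header row
    for rec in root.iter(f"{{{OAI_DC}}}dc"):
        row = []
        for name in elements:
            values = [(e.text or "").strip() for e in rec.findall(f"{{{DC}}}{name}")]
            row.append("; ".join(values))  # repeated elements share one cell
        writer.writerow(row)
    return out.getvalue()

sample = """<records xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <oai_dc:dc><dc:title>Maps of Ohio</dc:title><dc:date>1998</dc:date>
    <dc:subject>Maps</dc:subject><dc:subject>Ohio</dc:subject></oai_dc:dc>
  <oai_dc:dc><dc:title>Field notes</dc:title></oai_dc:dc>
</records>"""
tsv = dc_to_tsv(sample, ["title", "date", "subject"])
print(tsv)
```

Empty cells make missing elements visible once the file is sorted by column in the spreadsheet.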


Evaluating Metadata (3)

Visual graphical analysis (Spotfire)
- Advantages
    - View of several data dimensions simultaneously
    - Reviewer controls data display
    - Tends to pull reviewer focus to anomalies
    - Handles fairly large files at one time while allowing subset views
    - Display manipulation possible without programmers
- Disadvantages
    - High cost of software
    - Requires translation into tab-delimited file


Element Names vs Record Ids (Scatter Plot)


Missing Elements (Scatter Plot)
- 2 records without a language element
- Format element present inconsistently
- Easy to rescale axis on the fly and scroll through records
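The scatter plot answers a question that can also be asked directly: which expected elements are absent (or empty) in which records, and how well is each element applied across the whole population? A minimal sketch, assuming records are already parsed into dicts of element name to list of values; the `EXPECTED` element list is a hypothetical local requirement, not a Dublin Core rule.

```python
from collections import Counter

EXPECTED = ["title", "creator", "date", "format", "language"]

def missing_elements(records, expected=EXPECTED):
    """Map each record ID to the expected elements it lacks or leaves empty."""
    report = {}
    for rec_id, fields in records.items():
        absent = [name for name in expected
                  if not any(v.strip() for v in fields.get(name, []))]
        if absent:
            report[rec_id] = absent
    return report

def coverage(records, expected=EXPECTED):
    """Fraction of records in which each expected element has a non-empty value."""
    gaps = Counter(name for absent in missing_elements(records, expected).values()
                   for name in absent)
    total = len(records)
    return {name: (total - gaps[name]) / total for name in expected}

records = {
    "r1": {"title": ["A"], "creator": ["X"], "date": ["1999"],
           "format": ["text/html"], "language": ["en"]},
    "r2": {"title": ["B"], "creator": ["Y"], "date": ["2001"], "format": [""]},
    "r3": {"title": ["C"], "creator": ["Z"], "date": ["2003"], "language": ["fr"]},
    "r4": {"title": ["D"], "creator": ["W"], "date": ["2004"], "format": ["image/jpeg"]},
}
print(missing_elements(records))
# {'r2': ['format', 'language'], 'r3': ['format'], 'r4': ['language']}
print(coverage(records))
```

The per-record report corresponds to the gaps visible in the plot, while the coverage figures correspond to the second completeness criterion (applying the element set across the target population).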


Table View
- Non-empty "no information" values that may confuse end users
- Only DC date elements are selected for display
- The only W3CDTF syntax present is four digits
- Sorted by element value
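The date review in the table view can also be scripted. W3CDTF defines year (YYYY), year-month (YYYY-MM), and complete-date (YYYY-MM-DD) profiles, plus date-time forms with a time zone designator. The sketch below recognizes only the three date-only profiles (a simplifying assumption) and separately flags "no information" placeholders; the `PLACEHOLDERS` list is hypothetical and would come from local practice.

```python
import re
from collections import Counter

# YYYY, YYYY-MM, or YYYY-MM-DD; checks month and day ranges only loosely
W3CDTF_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")
PLACEHOLDERS = {"n.d.", "unknown", "no date", "none", "[date]"}

def classify_date(value):
    """Label a dc:date value by W3CDTF profile, as a placeholder, or as 'other'."""
    v = value.strip()
    if v.lower() in PLACEHOLDERS:
        return "placeholder"
    if W3CDTF_DATE.fullmatch(v):
        return {4: "w3cdtf-year", 7: "w3cdtf-month", 10: "w3cdtf-day"}[len(v)]
    return "other"

values = ["1998", "1998-05", "1998-05-17", "May 1998", "n.d.", "c1998"]
print(Counter(classify_date(v) for v in values))
```

Counting the labels over a whole file shows at a glance whether, as in the table view, only the four-digit year profile is actually in use.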


Improving Metadata Quality …

Documentation
- Basic standards, best practice guidelines, examples
- Exposure and maintenance of local and community vocabularies
- Application profiles
- Training materials, tools, methodologies


… Over Time

Culture change:
- Support for documentation and exchange of knowledge and experience
- Routine contribution to the "general good"
- More focused research on practical metadata use and quality considerations
- Better project-based and community-wide documentation


Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access?"


"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts (Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on), but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, "Two Paths to Interoperable Metadata"

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert a metadata record's content to another format, including:
- Element-to-element mapping
- Hierarchy and object resolution
- Metadata content conversions
- Stylesheets can be created to transform metadata based on crosswalks
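At its simplest, element-to-element mapping is a lookup table from source elements to target fields. The sketch below uses the two equivalences Godby cites (DC title to MARC 245, DC author, i.e. the creator element, to MARC 100). The `$a`-only output, the record structure, and the loss report are simplifying assumptions, and in practice such a conversion is usually written as a stylesheet (e.g. XSLT) rather than in Python. Because 245 and 100 are non-repeatable in MARC 21, a second creator cannot be carried over, which illustrates Woodley's point that one-to-one correspondence is rare.

```python
# Hypothetical, deliberately tiny crosswalk: DC element -> MARC 21 tag (subfield $a only)
CROSSWALK = {"title": "245", "creator": "100"}
NON_REPEATABLE = {"245", "100"}  # both fields are non-repeatable in MARC 21

def convert(dc_record):
    """Map a DC record (element -> list of values) to MARC-like (tag, value) fields.

    Returns (fields, losses); losses lists (element, value) pairs not carried over.
    """
    fields, losses, used_tags = [], [], set()
    for element, values in dc_record.items():
        tag = CROSSWALK.get(element)
        for value in values:
            if tag is None:
                losses.append((element, value))   # no equivalent in this crosswalk
            elif tag in NON_REPEATABLE and tag in used_tags:
                losses.append((element, value))   # target field cannot repeat
            else:
                fields.append((tag, f"$a {value}"))
                used_tags.add(tag)
    return fields, losses

dc = {"title": ["Metadata basics"], "creator": ["Smith, Ann", "Lee, Bo"],
      "coverage": ["Ohio"]}
fields, losses = convert(dc)
print(fields)  # [('245', '$a Metadata basics'), ('100', '$a Smith, Ann')]
print(losses)  # [('creator', 'Lee, Bo'), ('coverage', 'Ohio')]
```

Reporting losses explicitly, rather than silently dropping values, makes the granularity and repeatability problems of converted records measurable.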


Available Crosswalks

Library of Congress
- http://www.loc.gov/marc/marcdocz.html

MIT
- http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
- http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
- Some data might be lost
- Differences in semantics can occur
- Differences in use of content standards make sharing sometimes problematic
- Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution


Example Mapping: MODS title to DC title

Includes attribute for type of title:
- Abbreviated
- Translated
- Alternative
- Uniform

Other attributes:
- ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort


Mapping MODS title to DC title

DC has one element refinement:
- Alternative

DC title has no substructure; MODS allows for subelements for partNumber, partName

Best practice statement in DC-Lib says to include initial article
- MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
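The mapping described on these two slides can be sketched with Python's `ElementTree`: the `<nonSort>` article is kept at the front of dc:title (following the DC-Lib practice of including the initial article), `subTitle`, `partNumber`, and `partName` are flattened into the string because DC title has no substructure, and a `titleInfo` with `type="alternative"` goes to the Alternative refinement. The separators (`": "`, `". "`) and the decision to send the other title types (abbreviated, translated, uniform) to dc:title are assumptions for illustration, not an official crosswalk.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"

def text_of(parent, name):
    """Stripped text of the first MODS child element `name`, or '' if absent."""
    el = parent.find(MODS + name)
    return (el.text or "").strip() if el is not None else ""

def mods_titles_to_dc(mods_xml):
    """Build dc:title and Alternative values from the MODS titleInfo elements."""
    root = ET.fromstring(mods_xml)
    result = {"title": [], "alternative": []}
    for info in root.findall(MODS + "titleInfo"):
        # Keep the initial article in front of the sortable title
        main = " ".join(p for p in (text_of(info, "nonSort"), text_of(info, "title")) if p)
        sub = text_of(info, "subTitle")
        if sub:
            main += ": " + sub
        # DC title has no substructure, so part data is flattened into the string
        for part in (text_of(info, "partNumber"), text_of(info, "partName")):
            if part:
                main += ". " + part
        target = "alternative" if info.get("type") == "alternative" else "title"
        result[target].append(main)
    return result

record = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>The</nonSort><title>metadata handbook</title>
    <partNumber>Part 2</partNumber><partName>Crosswalks</partName>
  </titleInfo>
  <titleInfo type="alternative"><title>Handbook of metadata</title></titleInfo>
</mods>"""
print(mods_titles_to_dc(record))
# {'title': ['The metadata handbook. Part 2. Crosswalks'], 'alternative': ['Handbook of metadata']}
```

Note what the flat string loses: a downstream system can no longer tell which words form the sortable title or the part name, which is the granularity problem of converted records.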


Exercise

Evaluate a small set of human- and machine-created metadata


Inside OAI Repositories

repository: A repository is a network-accessible server that can process requests. A repository is managed by a data provider to expose metadata to harvesters.

resource: A resource is the object or "stuff" that metadata is "about", whether physical or digital, stored in the repository or a constituent of another database.

item: An item is a constituent of a repository from which metadata about a resource can be disseminated.

record: A record is metadata in a specific metadata format.


OAI Goals

Low barrier to participation
- Server software available in many programming languages, intended to be easy to install
- Server-less implementation available now via "Static Repository" (essentially a web page that looks like an OAI response and can be harvested as such)

Limited set of commands

Predictable responses and flows of data


Other OAI Info

Responses are encoded in XML syntax

OAI-PMH supports any metadata format encoded in XML; Simple Dublin Core is the minimal format specified

Data providers may define a logical set hierarchy to support levels of granularity for harvesting by service providers

Datestamps flag the last change to each metadata record and thus provide further support for granularity of harvesting

OAI-PMH supports flow control (via resumption tokens)
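Flow control works through resumption tokens: a list response may be incomplete and end with a `<resumptionToken>`, which the harvester sends back (as the only argument besides the verb) to get the next batch; an empty token marks the final batch. A minimal harvesting loop, sketched with a caller-supplied `fetch` function (hypothetical, so the example runs without a network) and made-up canned responses:

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest_identifiers(fetch, **args):
    """Collect identifiers from ListIdentifiers, following resumption tokens.

    `fetch` takes a dict of request parameters and returns the response XML text.
    """
    params = {"verb": "ListIdentifiers", **args}
    identifiers = []
    while True:
        root = ET.fromstring(fetch(params))
        for ident in root.iter(OAI + "identifier"):
            identifiers.append(ident.text)
        token = root.find(f"{OAI}ListIdentifiers/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return identifiers  # no token, or an empty one: the list is complete
        # A resumed request carries only the verb and the token
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}

PAGES = {
    None: """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>
      <header><identifier>oai:example.org:1</identifier></header>
      <header><identifier>oai:example.org:2</identifier></header>
      <resumptionToken>page2</resumptionToken></ListIdentifiers></OAI-PMH>""",
    "page2": """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>
      <header><identifier>oai:example.org:3</identifier></header>
      <resumptionToken/></ListIdentifiers></OAI-PMH>""",
}

def fake_fetch(params):
    """Stand-in for an HTTP GET against a repository."""
    return PAGES[params.get("resumptionToken")]

print(harvest_identifiers(fake_fetch, metadataPrefix="oai_dc"))
# ['oai:example.org:1', 'oai:example.org:2', 'oai:example.org:3']
```

The repository controls the batch size, so a harvester written this way needs no knowledge of how large the collection is.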


OAI Requests

Identify: returns general information about the particular OAI server
ListMetadataFormats: returns the metadata formats available
ListSets: returns the list of sets available
ListIdentifiers: returns record headers (identifiers) only
ListRecords: returns full records, optionally restricted to a set or date range
GetRecord: returns a particular record
Try it out at the UIUC OAI Registry (http://gita.grainger.uiuc.edu/registry/searchform.asp)
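Each of the six requests is an ordinary HTTP GET with a `verb` parameter plus arguments. A sketch that builds request URLs and checks the required arguments defined in OAI-PMH 2.0 (GetRecord needs `identifier` and `metadataPrefix`; ListIdentifiers and ListRecords need `metadataPrefix` unless they are resuming with a `resumptionToken`). Arguments are passed as a dict because `from` is a Python keyword; the base URL is a placeholder, not a real repository.

```python
from urllib.parse import urlencode

# Required arguments per verb (OAI-PMH 2.0), for non-resumed requests
REQUIRED = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
    "GetRecord": {"identifier", "metadataPrefix"},
}
# Verbs whose incomplete lists can be resumed with a token
RESUMABLE = {"ListIdentifiers", "ListRecords", "ListSets"}

def oai_request(base_url, verb, args=None):
    """Build an OAI-PMH GET URL after basic argument checks."""
    args = dict(args or {})
    if verb not in REQUIRED:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if "resumptionToken" in args:
        # resumptionToken is an exclusive argument of the list verbs
        if verb not in RESUMABLE or len(args) > 1:
            raise ValueError("resumptionToken must be the only argument of a list request")
    else:
        missing = REQUIRED[verb] - set(args)
        if missing:
            raise ValueError(f"{verb} requires: {', '.join(sorted(missing))}")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"

BASE = "http://repository.example.org/oai"  # hypothetical endpoint
print(oai_request(BASE, "Identify"))
print(oai_request(BASE, "GetRecord",
                  {"identifier": "oai:example.org:42", "metadataPrefix": "oai_dc"}))
print(oai_request(BASE, "ListRecords",
                  {"metadataPrefix": "oai_dc", "set": "maps", "from": "2005-01-01"}))
```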


Dates Used in OAI-PMH

Datestamps are used as values in requests to support selective harvesting by date (generally the latest update date of the metadata record)

Datestamps are also used in record headers in responses

Datestamps are particular to a repository

Repeat: OAI dates are about the metadata, not the resources
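On the repository side, selective harvesting means returning only records whose datestamp falls within the `from` and `until` arguments, both of which are inclusive in OAI-PMH. A sketch at day granularity (YYYY-MM-DD) with made-up headers; note that it compares metadata datestamps, never dates of the resources themselves.

```python
from datetime import date

def select_by_datestamp(headers, from_=None, until=None):
    """Identifiers whose datestamp lies in [from_, until]; either bound may be omitted."""
    lo = date.fromisoformat(from_) if from_ else date.min
    hi = date.fromisoformat(until) if until else date.max
    return [h["identifier"] for h in headers
            if lo <= date.fromisoformat(h["datestamp"]) <= hi]

headers = [
    {"identifier": "oai:example.org:1", "datestamp": "2004-12-31"},
    {"identifier": "oai:example.org:2", "datestamp": "2005-01-01"},
    {"identifier": "oai:example.org:3", "datestamp": "2005-06-30"},
]
print(select_by_datestamp(headers, from_="2005-01-01"))
# ['oai:example.org:2', 'oai:example.org:3']
```

A harvester that remembers the date of its last successful run can pass it as `from` next time and receive only metadata created or changed since then.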


OAI-PMH Optional Containers

Repository level
- Rights
- Branding

Record level
- About
    - Provenance
    - Rights


About Container Example


OAI Rights Expressions

Rights expressions are valid at three levels:
- Repository
- Set
- Record

Rights expressed at the repository and set levels are not a substitute for expressions at the record level

OAI Best Practices (DLF & NSDL)

Guidelines for data providers and service providers
- http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page

Best Practices for Shareable Metadata
- http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC


OAI In Practice

The UIUC OAI-PMH Data Provider Registry
- http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers
Link on home page to service providers
Provides multiple reports, sample records, browses, search, etc.
Ex.: Show report from left-hand menu, "Distinct Metadata Schemas"
- http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
- Choose a schema, look for providers and sample records


What's an OpenURL?

The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services

Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)


"The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, and agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material."

-- OpenURL Overview, SFX website


OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a "link server" or "link resolver"

Link server defines the user context

Takes source citation and determines whether a user has access
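The link server's core decision can be pictured as a lookup: parse the citation carried in the OpenURL, then test it against what the institution can provide. A toy sketch, assuming a hypothetical in-memory holdings table keyed by ISSN with a range of years and a placeholder full-text URL; a production link server also consults a far larger knowledge base and the user's context.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical holdings: ISSN -> (first year, last year, full-text target URL)
HOLDINGS = {
    "1234-5678": (1995, 2005, "http://fulltext.example.com/journal-x"),
}

def resolve(openurl):
    """Return a target URL if the cited item falls within holdings, else None."""
    query = {k: v[0] for k, v in parse_qs(urlsplit(openurl).query).items()}
    entry = HOLDINGS.get(query.get("issn", ""))
    year_text = query.get("date", "")[:4]
    if entry is None or not year_text.isdigit():
        return None  # unknown journal or unusable date: no full-text link offered
    first, last, target = entry
    return target if first <= int(year_text) <= last else None

url = "http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134"
print(resolve(url))  # http://fulltext.example.com/journal-x
```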


Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user's IP address

Obtains user attributes via the Shibboleth framework
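The IP-address method usually amounts to checking whether a request comes from one of the institution's registered network ranges. A sketch with Python's standard `ipaddress` module; the ranges below are reserved documentation networks standing in for a real campus allocation.

```python
import ipaddress

# Placeholder campus ranges (reserved documentation blocks, not real allocations)
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
    ipaddress.ip_network("2001:db8::/32"),
]

def is_on_campus(client_ip):
    """True if client_ip belongs to any registered institutional network."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply not a member
    return any(addr in net for net in CAMPUS_NETWORKS)

print(is_on_campus("192.0.2.17"))      # True
print(is_on_campus("198.51.100.200"))  # False: outside the /25
```

IP checks identify a location rather than a person, which is one reason the cookie, certificate, and Shibboleth approaches exist alongside them.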


Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library From a record describing a book in a library catalogue to a description of the same book in an catalogue to a description of the same book in an Internet book shopInternet book shop

From a reference in a journal article to a record From a reference in a journal article to a record matching that reference in an AampI databasematching that reference in an AampI database

From a citation in a journal article to a record in a From a citation in a journal article to a record in a library catalogue that shows the library holdings library catalogue that shows the library holdings of the cited journalof the cited journal

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2525

OpenURL Examples amp DemoOpenURL Examples amp Demo

httpsfxserveruniedusfxmenuissn=1234-5678ampdate=1998ampvolume=12ampissue=2ampspage=134

An OpenURL demoAn OpenURL demondash httpwwwukolnacukdistributed-syste

msopenurl

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2626

Defining and Ensuring Metadata Defining and Ensuring Metadata QualityQuality

What constitutes qualityWhat constitutes quality Techniques for evaluating and Techniques for evaluating and

enforcing consistency and enforcing consistency and predictabilitypredictability

Automated metadata creation Automated metadata creation advantages and disadvantagesadvantages and disadvantages

Metadata maintenance strategiesMetadata maintenance strategies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2727

Beginning to Define QualityBeginning to Define Quality

Experience of the library Experience of the library community--BIBCO amp NACOcommunity--BIBCO amp NACOndash Agreed upon standards for library Agreed upon standards for library

qualityqualityndash Training and documentation in support Training and documentation in support

of practitionersof practitionersndash Review and enforcement of standards Review and enforcement of standards

by means of institutional ldquobuddy by means of institutional ldquobuddy systemrdquosystemrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2828

How Does Quality HappenHow Does Quality Happen

Lessons from the library communityLessons from the library communityndash Quality is quantifiable and measurableQuality is quantifiable and measurablendash To be effective enforcement of standards To be effective enforcement of standards

of quality must take place at the of quality must take place at the community levelcommunity level

FurthermoreFurthermorendash Data problems are not unique to particular Data problems are not unique to particular

communitiescommunitiesndash general strategies can improve general strategies can improve

interoperabilityinteroperability

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2929

Quality Measurement CriteriaQuality Measurement Criteria

CompletenessCompleteness AccuracyAccuracy ProvenanceProvenance Conformance to expectationsConformance to expectations Logical consistency and coherenceLogical consistency and coherence Timeliness (Currency and Lag)Timeliness (Currency and Lag) AccessibilityAccessibility

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3030

CompletenessCompleteness

ldquoldquoMetadata should describe the target Metadata should describe the target objects as completely as objects as completely as economically feasiblerdquoeconomically feasiblerdquo

ldquoldquoElement set should be applied to Element set should be applied to the target object population as the target object population as completely as possiblerdquocompletely as possiblerdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3131

AccuracyAccuracy

Information provided in values Information provided in values should be correct and factualshould be correct and factual

Editing applied toEditing applied tondash Eliminate typosEliminate typosndash Ensure conforming name expressionsEnsure conforming name expressionsndash Ensure standard abbreviations usages Ensure standard abbreviations usages

in generalin general

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3232

ProvenanceProvenance

Who prepared the metadata What Who prepared the metadata What do we know about the preparerdo we know about the preparer

What methods were used to create What methods were used to create the metadata Is it human created or the metadata Is it human created or created by machinecreated by machine

What transformations have been What transformations have been applied since creationapplied since creation

Where has it been beforeWhere has it been before

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as "premium" or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to the relevant object(s)

Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages
  Includes some formatting and color coding
– Disadvantages
  Assumes consistency/predictability
  Difficult to determine the extent of problems found
  Tedious at best

Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages
  Better sorting and control by reviewer
– Disadvantages
  Unwieldy for large files
  Requires sustained focus from reviewer
  Requires translation into a tab-delimited file
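Both the spreadsheet and the Spotfire approaches need the metadata translated into a tab-delimited file first. A minimal sketch using only Python's standard library, assuming simple Dublin Core records wrapped in hypothetical `<records>`/`<record>` elements; repeated elements are joined with `|` so each record stays on one row, and empty cells show where an element is missing.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"

def dc_to_tsv(xml_text):
    """Flatten simple DC records (children of the root) into tab-delimited text."""
    root = ET.fromstring(xml_text)
    rows, columns = [], []
    for record in root:
        row = {}
        for child in record:
            if not child.tag.startswith(DC_NS):
                continue  # ignore anything that is not a DC element
            name = child.tag[len(DC_NS):]  # "{ns}title" -> "title"
            if name not in columns:
                columns.append(name)
            value = (child.text or "").strip()
            # Repeated elements share one cell, separated by "|".
            row[name] = row[name] + "|" + value if name in row else value
        rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

sample = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record><dc:title>Map of Ithaca</dc:title><dc:subject>Maps</dc:subject><dc:subject>New York</dc:subject></record>
  <record><dc:title>Survey notes</dc:title><dc:language>eng</dc:language></record>
</records>"""
tsv = dc_to_tsv(sample)
print(tsv)
```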

Evaluating Metadata (3)

Visual graphical analysis (Spotfire)
– Advantages
  View of several data dimensions simultaneously
  Reviewer controls data display
  Tends to pull reviewer focus to anomalies
  Handles fairly large files at one time while allowing subset views
  Display manipulation possible without programmers
– Disadvantages
  High cost of software
  Requires translation into a tab-delimited file

Figure: Element Names vs Record Ids (scatter plot)

Figure: Missing Elements (scatter plot). Callouts: 2 records without a language element; format element present inconsistently; easy to rescale the axis on the fly and scroll through records.

Figure: Table View. Only DC Date elements are selected for display, sorted by element value. Callouts: non-empty "no information" values that may confuse end users; the only W3CDTF syntax present is four digits.
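Two of the Table View observations can be checked automatically: dates that use only part of the W3CDTF profile, and non-empty placeholder values. A sketch; the regular expression follows the W3CDTF granularities (YYYY, YYYY-MM, YYYY-MM-DD, and date-times with a required time zone designator), while the placeholder list is a hypothetical local example.

```python
import re
from collections import Counter

# W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, or a date-time hh:mm[:ss[.s]] plus time zone.
W3CDTF = re.compile(
    r"\d{4}"
    r"(-(0[1-9]|1[0-2])"
    r"(-(0[1-9]|[12]\d|3[01])"
    r"(T([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d+)?)?(Z|[+-]([01]\d|2[0-3]):[0-5]\d))?"
    r")?)?"
)

# Hypothetical local list of non-empty "no information" values.
PLACEHOLDERS = {"no information", "unknown", "n/a", "none"}

def classify_date(value):
    v = value.strip()
    if v.lower() in PLACEHOLDERS:
        return "placeholder"
    if W3CDTF.fullmatch(v):
        return "w3cdtf-year-only" if len(v) == 4 else "w3cdtf"
    return "invalid"

dates = ["1998", "2001", "no information", "1998-05-12", "May 1998", "1999-13"]
summary = Counter(classify_date(d) for d in dates)
print(summary)
```

A summary dominated by `w3cdtf-year-only` reproduces the "only four digits" finding without scrolling through a table.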

Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies

… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the "general good"
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation

Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access"

"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts -- Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on -- but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, "Two Paths to Interoperable Metadata"

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert the content of a metadata record to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
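Element-to-element mapping can be sketched as a lookup table plus a report of what did not map. The table is deliberately tiny and uses only the two equivalences from the Godby quotation (title to MARC 245, author to MARC 100, with DC `creator` standing in for "author"); a published crosswalk covers far more elements and may choose different tags and subfields.

```python
# Hypothetical, deliberately tiny crosswalk: DC element -> MARC tag.
DC_TO_MARC = {
    "title": "245",
    "creator": "100",
}

def crosswalk(record, mapping):
    """Map a {element: [values]} record; return (converted, unmapped)."""
    converted, unmapped = {}, {}
    for element, values in record.items():
        target = mapping.get(element)
        if target is None:
            unmapped[element] = list(values)  # no equivalent in the target
        else:
            converted.setdefault(target, []).extend(values)
    return converted, unmapped

record = {"title": ["Map of Ithaca"], "creator": ["Smith, Jane"], "coverage": ["Ithaca (N.Y.)"]}
converted, unmapped = crosswalk(record, DC_TO_MARC)
print(converted)  # {'245': ['Map of Ithaca'], '100': ['Smith, Jane']}
print(unmapped)   # {'coverage': ['Ithaca (N.Y.)']}
```

Returning the unmapped values, rather than dropping them, keeps the crosswalk's limits visible to whoever reviews the converted records.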

Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards sometimes make sharing problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution
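Repeatability is one property that can silently lose data in conversion. A minimal sketch, assuming a hypothetical target profile that says which elements may repeat; when the target allows only one value, the first is kept and the rest are reported instead of being discarded quietly. Merging the values into one string is another option, and which is better is a community decision.

```python
# Hypothetical target profile: element -> is it repeatable?
TARGET_REPEATABLE = {"title": False, "subject": True}

def fit_repeatability(record, repeatable):
    """Keep what the target allows; return (kept, dropped) so losses are visible."""
    kept, dropped = {}, {}
    for element, values in record.items():
        # Assumption: elements missing from the profile are treated as repeatable.
        if repeatable.get(element, True) or len(values) <= 1:
            kept[element] = list(values)
        else:
            kept[element] = values[:1]     # first value wins
            dropped[element] = values[1:]  # the rest would be lost
    return kept, dropped

source = {"title": ["Ithaca maps", "Maps of Ithaca, N.Y."], "subject": ["Maps", "Ithaca (N.Y.)"]}
kept, dropped = fit_repeatability(source, TARGET_REPEATABLE)
print(kept)     # {'title': ['Ithaca maps'], 'subject': ['Maps', 'Ithaca (N.Y.)']}
print(dropped)  # {'title': ['Maps of Ithaca, N.Y.']}
```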

Example Mapping: MODS:title to DC:title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort

Mapping MODS:title to DC:title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows for subelements for partNumber, partName

Best practice statement in DC-Lib says to include the initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file if desired
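The mapping described above can be sketched in a few lines, assuming MODS v3 markup in which title, partNumber, partName, and nonSort are subelements of `<titleInfo>`. Each `<titleInfo>` is flattened into one string (DC title has no substructure), the `<nonSort>` article is kept in front per the DC-Lib practice, and typed titles go to the DC Alternative refinement. Treating every typed title (abbreviated, translated, uniform) as Alternative is an assumption of this sketch, not a rule from the slide.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"

def _text(parent, name):
    el = parent.find(MODS + name)
    return el.text.strip() if el is not None and el.text else ""

def mods_titles_to_dc(mods_xml):
    """Flatten each top-level <titleInfo> into one DC title string."""
    root = ET.fromstring(mods_xml)
    dc = {"title": [], "alternative": []}
    for info in root.findall(MODS + "titleInfo"):
        # Keep the initial article from <nonSort>, as DC-Lib suggests.
        main = (_text(info, "nonSort") + " " + _text(info, "title")).strip()
        parts = [p for p in (_text(info, "partNumber"), _text(info, "partName")) if p]
        flat = ". ".join([main] + parts)
        # Assumption: untyped titleInfo is the main title; any typed one is Alternative.
        dc["alternative" if info.get("type") else "title"].append(flat)
    return dc

sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>The</nonSort>
    <title>Annual report</title>
    <partNumber>Part 2</partNumber>
    <partName>Appendices</partName>
  </titleInfo>
  <titleInfo type="alternative"><title>Report of the Board</title></titleInfo>
</mods>"""
result = mods_titles_to_dc(sample)
print(result)
```

The flattening is one-way: once partNumber and partName are joined into a string, a DC consumer cannot reliably split them apart again, which is the granularity loss described under converted records.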

Exercise

Evaluate a small set of human- and machine-created metadata


OAI Goals

Low barrier to participation
– Server software available in many programming languages, intended to be easy to install
– Server-less implementation now available via "Static Repository" (essentially a web page that looks like an OAI response and can be harvested as such)

Limited set of commands

Predictable responses and flows of data

Other OAI Info

Responses are encoded in XML syntax

OAI-PMH supports any metadata format encoded in XML; Simple Dublin Core is the minimal format specified

Data Providers may define a logical set hierarchy to support levels of granularity for harvesting by Service Providers

Datestamps flag the last change to each metadata record and thus provide further support for selective harvesting

OAI-PMH supports flow control (via resumption tokens)
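Flow control works through resumption tokens: when a list response is incomplete, it ends with a `<resumptionToken>`, and the harvester sends that token back to get the next batch. A minimal sketch that pulls record headers and the token out of a ListIdentifiers response in the OAI-PMH 2.0 namespace; the response is a trimmed, made-up example.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def parse_list_response(xml_text):
    """Return ([(identifier, datestamp), ...], resumption token or None)."""
    root = ET.fromstring(xml_text)
    headers = [
        (h.findtext(OAI + "identifier"), h.findtext(OAI + "datestamp"))
        for h in root.iter(OAI + "header")
    ]
    token_el = root.find(".//" + OAI + "resumptionToken")
    # A missing or empty token means this is the last (or only) batch.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return headers, token

response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-06-01T12:00:00Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://repository.example.edu/oai</request>
  <ListIdentifiers>
    <header><identifier>oai:example.edu:1</identifier><datestamp>2005-05-30</datestamp></header>
    <header><identifier>oai:example.edu:2</identifier><datestamp>2005-05-31</datestamp></header>
    <resumptionToken completeListSize="4" cursor="0">batch-2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""
headers, token = parse_list_response(response)
print(headers)  # [('oai:example.edu:1', '2005-05-30'), ('oai:example.edu:2', '2005-05-31')]
print(token)    # batch-2
```

The follow-up request would then be `?verb=ListIdentifiers&resumptionToken=batch-2`; OAI-PMH expects the token to be the only argument besides `verb`.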

OAI Requests

Identify -> returns general information about the particular OAI server
ListMetadataFormats -> returns the metadata formats available
ListSets -> returns the list of sets available
ListIdentifiers -> returns identifiers (record headers) only
ListRecords -> returns full records (header plus metadata), optionally restricted to a set
GetRecord -> returns a particular record

Try it out at the UIUC OAI Registry (http://gita.grainger.uiuc.edu/registry/searchform.asp)
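Each request in the list above is an HTTP request to the repository's base URL with a `verb` parameter plus that verb's arguments. A sketch that builds such GET URLs; the base URL is hypothetical, while `metadataPrefix`, `identifier`, `set`, and `from` are argument names defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode

BASE_URL = "http://repository.example.edu/oai"  # hypothetical data provider

def oai_request(verb, **args):
    """Build an OAI-PMH GET request URL for `verb` with its arguments."""
    if "from_" in args:  # 'from' is a Python keyword, so accept from_ instead
        args["from"] = args.pop("from_")
    return BASE_URL + "?" + urlencode({"verb": verb, **args})

print(oai_request("Identify"))
print(oai_request("ListSets"))
print(oai_request("ListRecords", metadataPrefix="oai_dc", set="maps"))
print(oai_request("GetRecord", identifier="oai:example.edu:123", metadataPrefix="oai_dc"))
```

`urlencode` percent-encodes reserved characters, so the colons in an OAI identifier travel as `%3A`, which is what the protocol expects in a URL.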

Dates Used in OAI-PMH

Datestamps are used as values in requests to support selective harvesting by date (generally the latest update date of the metadata record)

Datestamps are also used in record headers in responses

Datestamps are particular to a repository

Repeat: OAI dates are about the metadata, not the resources
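Because datestamps describe changes to the metadata, a harvester can remember the latest datestamp it received and use it as the `from` argument of its next request. A sketch, assuming day granularity (YYYY-MM-DD, which every OAI-PMH 2.0 repository supports); `next_from` is a hypothetical helper. Since `from` is inclusive, records from the boundary day come back again, so they are stored by identifier rather than appended; and since datestamps are particular to a repository, one such value is kept per repository.

```python
from datetime import date

def next_from(datestamps, previous=None):
    """Latest day seen, as YYYY-MM-DD for the next request's 'from' argument."""
    days = [date.fromisoformat(d[:10]) for d in datestamps]  # drop any time part
    if previous:
        days.append(date.fromisoformat(previous))
    return max(days).isoformat() if days else previous

store = {}
batch = [("oai:example.edu:1", "2005-05-30"), ("oai:example.edu:2", "2005-05-31T08:15:00Z")]
store.update(batch)  # keyed by identifier: re-harvested records replace old copies
print(next_from([stamp for _, stamp in batch], previous="2005-05-01"))  # 2005-05-31
```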

OAI-PMH Optional Containers

Repository level
– Rights
– Branding

Record level
– About
  Provenance
  Rights

About Container Example (slide image not reproduced)
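The record-level `about` container is where provenance information usually goes. A sketch that builds one with ElementTree; the element and attribute names (`provenance`, `originDescription` with `harvestDate` and `altered`, `baseURL`, `identifier`, `datestamp`, `metadataNamespace`) follow the OAI provenance schema as we understand it and should be checked against that schema before use, and all values are made up.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"
ET.register_namespace("oai", OAI_NS)
ET.register_namespace("prov", PROV_NS)

def provenance_about(base_url, identifier, datestamp, metadata_ns, harvest_date, altered):
    """Build an <about> container describing where a harvested record came from."""
    about = ET.Element(f"{{{OAI_NS}}}about")
    prov = ET.SubElement(about, f"{{{PROV_NS}}}provenance")
    origin = ET.SubElement(prov, f"{{{PROV_NS}}}originDescription",
                           harvestDate=harvest_date, altered=str(altered).lower())
    for tag, text in (("baseURL", base_url), ("identifier", identifier),
                      ("datestamp", datestamp), ("metadataNamespace", metadata_ns)):
        ET.SubElement(origin, f"{{{PROV_NS}}}{tag}").text = text
    return about

about = provenance_about("http://repository.example.edu/oai", "oai:example.edu:123",
                         "2005-05-31", "http://www.openarchives.org/OAI/2.0/oai_dc/",
                         "2005-06-01T12:00:00Z", altered=False)
print(ET.tostring(about, encoding="unicode"))
```

This answers three of the Provenance questions in machine-readable form: where the record has been before, when it was harvested, and whether it was altered since.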

OAI Rights Expressions

Rights expressions are valid at three levels
– Repository
– Set
– Record

Rights expressed at the Repository and Set levels are not a substitute for expressions at the Record level

OAI Best Practices (DLF & NSDL)

Guidelines for data providers and service providers
– http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page

Best Practices for Shareable Metadata
– http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC

OAI In Practice

The UIUC OAI-PMH Data Provider Registry
– http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers
Link on home page to Service Providers
Provides multiple reports, sample records, browses, search, etc.
Ex.: Show the report "Distinct Metadata Schemas" from the left-hand menu
– http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
– Choose a schema, look for providers and sample records

What's an OpenURL?

The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services

Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)

"The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material."

-- OpenURL Overview, SFX website

OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a "link server" or "link resolver"

Link server defines the user context
Takes the source citation and determines whether a user has access

Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user's IP address

Obtains user attributes via the Shibboleth framework
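IP-address checking is the simplest of these mechanisms to illustrate. A sketch using Python's `ipaddress` module; the campus ranges are hypothetical (taken from address blocks reserved for documentation), and a real link resolver would combine the result with its holdings and preference data.

```python
import ipaddress

# Hypothetical campus ranges, taken from documentation-only address blocks.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

def is_on_campus(client_ip):
    """True if the client address falls inside any licensed network."""
    addr = ipaddress.ip_address(client_ip)
    # Comparing an IPv4 address with an IPv6 network simply yields False.
    return any(addr in net for net in CAMPUS_NETWORKS)

print(is_on_campus("192.0.2.17"))    # True
print(is_on_campus("198.51.100.5"))  # False
```

IP ranges only say where a request comes from, not who sent it, which is why the cookie, certificate, and Shibboleth approaches exist alongside it.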

Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library holdings of the cited journal

OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
– http://www.ukoln.ac.uk/distributed-systems/openurl/
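The example URL above uses the early key=value form associated with OpenURL 0.1: the resolver's base URL followed by citation metadata as query parameters (the later Z39.88-2004 standard prefixes referent keys with `rft.`). A sketch of how a link resolver might unpack it, and rebuild it, with Python's standard library.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

openurl = ("http://sfxserver.uni.edu/sfxmenu?"
           "issn=1234-5678&date=1998&volume=12&issue=2&spage=134")

parts = urlsplit(openurl)
resolver = f"{parts.scheme}://{parts.netloc}{parts.path}"
# parse_qs returns lists because a key may repeat; take the first value of each.
citation = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(resolver)  # http://sfxserver.uni.edu/sfxmenu
print(citation)  # {'issn': '1234-5678', 'date': '1998', 'volume': '12', 'issue': '2', 'spage': '134'}

# A source (e.g. an A&I database) builds the same link in the other direction.
rebuilt = resolver + "?" + urlencode(citation)
```

The citation travels with the link; deciding which copy the user can reach is left entirely to the resolver at the base URL.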

Defining and Ensuring Metadata Quality

What constitutes quality?

Techniques for evaluating and enforcing consistency and predictability

Automated metadata creation: advantages and disadvantages

Metadata maintenance strategies

Beginning to Define Quality

Experience of the library community: BIBCO & NACO
– Agreed-upon standards for library quality
– Training and documentation in support of practitioners
– Review and enforcement of standards by means of an institutional "buddy system"

How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of quality standards must take place at the community level

Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability

Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (currency and lag)
Accessibility

Completeness

"Metadata should describe the target objects as completely as economically feasible"

"Element set should be applied to the target object population as completely as possible"
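The second statement can be measured directly: for each element in the expected element set, the share of records that carry a non-empty value. A sketch, assuming records as dicts of element to list of values and a hypothetical four-element set.

```python
def completeness(records, element_set):
    """Share of records with at least one non-blank value, per element."""
    total = len(records)
    report = {}
    for element in element_set:
        filled = sum(1 for r in records if any(v.strip() for v in r.get(element, [])))
        report[element] = filled / total if total else 0.0
    return report

ELEMENT_SET = ["title", "creator", "date", "language"]  # hypothetical profile
records = [
    {"title": ["Map of Ithaca"], "date": ["1998"], "language": ["eng"]},
    {"title": ["Survey notes"], "creator": ["Smith, Jane"], "date": [""]},
]
print(completeness(records, ELEMENT_SET))
# {'title': 1.0, 'creator': 0.5, 'date': 0.5, 'language': 0.5}
```

Blank strings count as missing here, so an empty `date` element does not inflate the score.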

Accuracy

Information provided in values should be correct and factual

Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviations and usages in general

Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or machine-created?

What transformations have been applied since creation?

Where has it been before?

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp Applications

40

Element Names vs Record Ids (Scatter Plot)

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

• In general: semantic mapping of elements between source and target metadata standards
• The metadata conversion specification includes the transformations required to convert the content of a metadata record to another format, including:
  – Element-to-element mapping
  – Hierarchy and object resolution
  – Metadata content conversions
  – Stylesheets, which can be created to transform metadata based on crosswalks
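The element-to-element part of a crosswalk can be pictured as a lookup table. The Python below is a minimal sketch, not any published crosswalk: the MODS_TO_DC table is a hypothetical, heavily simplified subset, and MODS records are assumed to be flattened into (element, value) pairs, ignoring attributes and hierarchy. Elements the table does not cover are reported rather than silently dropped, which makes the data loss of a conversion visible.

```python
# Hypothetical, simplified crosswalk table: flattened MODS element names -> DC elements.
# A real crosswalk (such as the Library of Congress MODS-to-DC mapping) also covers
# attributes, hierarchy resolution, and content conversion, and maps MODS name to
# creator or contributor depending on role; this only shows element-to-element mapping.
MODS_TO_DC = {
    "title": "title",
    "name": "creator",
    "subject": "subject",
    "abstract": "description",
    "publisher": "publisher",
    "dateIssued": "date",
    "typeOfResource": "type",
    "identifier": "identifier",
    "language": "language",
}


def crosswalk(record, mapping):
    """Convert a list of (element, value) pairs using an element-to-element mapping.

    Returns (converted, unmapped): unmapped pairs are the ones this mapping
    cannot carry over, i.e. data that would be lost in the conversion.
    """
    converted, unmapped = [], []
    for element, value in record:
        target = mapping.get(element)
        if target is None:
            unmapped.append((element, value))
        else:
            converted.append((target, value))
    return converted, unmapped


record = [
    ("title", "Metadata fundamentals"),
    ("name", "Smith, Jane"),
    ("recordContentSource", "DLC"),
]
dc, lost = crosswalk(record, MODS_TO_DC)
print(dc)    # [('title', 'Metadata fundamentals'), ('creator', 'Smith, Jane')]
print(lost)  # [('recordContentSource', 'DLC')]
```

Reporting unmapped pairs, instead of discarding them, gives the reviewer a concrete list of what a simpler target scheme cannot hold.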


Available Crosswalks

• Library of Congress
  – http://www.loc.gov/marc/marcdocz.html
• MIT
  – http://libraries.mit.edu/guides/subjects/metadata/mappings.html
• Getty
  – http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

• Differences in granularity (complex vs. simple scheme)
  – Some data might be lost
  – Differences in semantics can occur
  – Differences in the use of content standards can make sharing problematic
  – Properties may vary (e.g., repeatability)
• Converting everything may not always be the best solution


Example: Mapping MODS title to DC title

• Includes an attribute for the type of title:
  – Abbreviated
  – Translated
  – Alternative
  – Uniform
• Other attributes:
  – ID, authority, displayLabel, xLink
• Subelements: title, partName, partNumber, nonSort


Mapping MODS title to DC title

• DC has one element refinement for title: Alternative
  – DC title has no substructure; MODS allows subelements for partNumber and partName
• The best practice statement in DC-Lib says to include the initial article
  – MODS parses the initial article into <nonSort>
• MODS can link to a title in an authority file, if desired
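The flattening this mapping requires can be sketched in Python with the standard library's ElementTree. This is an illustration under stated assumptions, not the LC stylesheet: it reads the titleInfo children of a single <mods> record (namespace http://www.loc.gov/mods/v3), joins nonSort, title, subTitle, partNumber and partName into one string using ": " and ". " as separators (an arbitrary choice), keeps the initial article as DC-Lib recommends, and sends every typed title to the single DC refinement, Alternative.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def _child_text(parent, tag):
    """Text of the first MODS child element `tag`, stripped, or '' if absent."""
    el = parent.find(f"mods:{tag}", MODS_NS)
    return el.text.strip() if el is not None and el.text else ""


def mods_titles_to_dc(mods_xml):
    """Map the titleInfo elements of one <mods> record to DC title values.

    Returns (titles, alternatives). DC title has no substructure, so the MODS
    subelements are flattened; the initial article in nonSort is kept; any
    titleInfo with a type attribute becomes an Alternative title.
    """
    root = ET.fromstring(mods_xml)
    titles, alternatives = [], []
    for info in root.findall("mods:titleInfo", MODS_NS):
        head = (_child_text(info, "nonSort"), _child_text(info, "title"))
        full = " ".join(p for p in head if p)
        sub = _child_text(info, "subTitle")
        if sub:
            full += ": " + sub
        for part in (_child_text(info, "partNumber"), _child_text(info, "partName")):
            if part:
                full += ". " + part
        if info.get("type"):
            alternatives.append(full)
        else:
            titles.append(full)
    return titles, alternatives


sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>The </nonSort>
    <title>metadata handbook</title>
    <partNumber>Part 2</partNumber>
  </titleInfo>
  <titleInfo type="abbreviated">
    <title>Metadata hdbk.</title>
  </titleInfo>
</mods>"""
print(mods_titles_to_dc(sample))
# (['The metadata handbook. Part 2'], ['Metadata hdbk.'])
```

Once flattened, the partNumber and nonSort boundaries cannot be recovered from the DC value, which is the loss of granularity the slide describes.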


Exercise

• Evaluate a small set of human- and machine-created metadata


Other OAI Info

• Responses are encoded in XML syntax
• OAI-PMH supports any metadata format encoded in XML; Simple Dublin Core (oai_dc) is the minimal format every repository must support
• Data Providers may define a logical set hierarchy to support levels of granularity for harvesting by Service Providers
• Datestamps flag the last change to each metadata record and thus provide further support for granularity of harvesting
• OAI-PMH supports flow control (long lists are returned in pages linked by resumption tokens)
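Flow control works by returning long lists in pages: each incomplete page ends with a resumptionToken, and the next request sends only the verb and that token. The Python sketch below assumes the OAI-PMH 2.0 namespace and follows tokens for ListIdentifiers. The `fetch` function is a parameter supplied by the caller (hypothetical here, e.g. a thin wrapper around urllib.request.urlopen) so the paging logic can be exercised without a live repository; error responses, retries, and rate limits are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_page(response_xml):
    """Return (identifiers, resumption_token) from one ListIdentifiers page.

    A missing or empty <resumptionToken> means the list is complete.
    """
    root = ET.fromstring(response_xml)
    ids = [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token or None


def harvest_identifiers(fetch, base_url, metadata_prefix="oai_dc"):
    """Yield every record identifier, following resumption tokens page by page.

    `fetch(url) -> str` is supplied by the caller and returns the response body.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        ids, token = parse_list_page(fetch(f"{base_url}?{urlencode(params)}"))
        yield from ids
        if token is None:
            return
        # The resumptionToken is an exclusive argument: follow-up requests
        # carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}
```

Keeping the network call outside the paging loop is a design choice for testability; the same loop applies to ListRecords by changing the verb and reading full records instead of headers.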


OAI Requests

• Identify → returns general information about the particular OAI server
• ListMetadataFormats → returns the metadata formats available
• ListSets → returns the list of sets available
• ListIdentifiers → returns record headers (identifiers) only
• ListRecords → returns full records, optionally restricted to a set or date range
• GetRecord → returns a particular record
• Try it out at the UIUC OAI Registry (http://gita.grainger.uiuc.edu/registry/searchform.asp)
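Each request is an HTTP query whose `verb` argument names one of the six operations above. The Python sketch below builds request URLs and checks arguments against the OAI-PMH 2.0 rules as summarized in VERB_ARGS; the base URL is a hypothetical placeholder, and the sketch only constructs URLs, it does not send them.

```python
from urllib.parse import urlencode

# Arguments accepted by each OAI-PMH 2.0 verb; True marks a required argument.
# For the two list verbs, metadataPrefix is required unless a resumptionToken is given.
VERB_ARGS = {
    "Identify": {},
    "ListMetadataFormats": {"identifier": False},
    "ListSets": {"resumptionToken": False},
    "ListIdentifiers": {"metadataPrefix": True, "from": False, "until": False,
                        "set": False, "resumptionToken": False},
    "ListRecords": {"metadataPrefix": True, "from": False, "until": False,
                    "set": False, "resumptionToken": False},
    "GetRecord": {"identifier": True, "metadataPrefix": True},
}


def oai_request(base_url, verb, **args):
    """Build an OAI-PMH request URL, checking the arguments against VERB_ARGS.

    `from` is a Python keyword, so pass it as `from_`.
    """
    if verb not in VERB_ARGS:
        raise ValueError(f"unknown verb: {verb}")
    if "from_" in args:
        args["from"] = args.pop("from_")
    allowed = VERB_ARGS[verb]
    unknown = set(args) - set(allowed)
    if unknown:
        raise ValueError(f"{verb} does not accept {sorted(unknown)}")
    if "resumptionToken" in args:
        if len(args) > 1:
            raise ValueError("resumptionToken is an exclusive argument")
    else:
        missing = [name for name, required in allowed.items()
                   if required and name not in args]
        if missing:
            raise ValueError(f"{verb} requires {missing}")
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"


base = "http://example.org/oai"  # hypothetical repository base URL
print(oai_request(base, "Identify"))
# http://example.org/oai?verb=Identify
print(oai_request(base, "ListRecords", metadataPrefix="oai_dc", from_="2005-01-01"))
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2005-01-01
```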


Dates Used in OAI-PMH

• Datestamps are used as values in requests to support selective harvesting by date (generally the latest update date of the metadata record)
• Datestamps are also used in record headers in responses
• Datestamps are particular to a repository
• Repeat: OAI dates are about the metadata, not the resources
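OAI-PMH datestamps come in two granularities, day (YYYY-MM-DD) and seconds (YYYY-MM-DDThh:mm:ssZ, UTC), and the from/until arguments are inclusive. In a real harvest the repository applies from/until itself; the Python sketch below performs the same inclusive comparison on the harvester side, for example to double-check harvested headers. It assumes all values being compared share one granularity, and the identifiers are hypothetical.

```python
from datetime import datetime, timezone

# The two datestamp granularities defined by OAI-PMH 2.0 (UTC).
DATESTAMP_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d")


def parse_datestamp(value):
    """Parse an OAI-PMH datestamp (day or seconds granularity) as a UTC datetime."""
    for fmt in DATESTAMP_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"not an OAI-PMH datestamp: {value!r}")


def in_harvest_window(datestamp, from_=None, until=None):
    """True if a header datestamp falls within the inclusive from/until range."""
    stamp = parse_datestamp(datestamp)
    if from_ is not None and stamp < parse_datestamp(from_):
        return False
    if until is not None and stamp > parse_datestamp(until):
        return False
    return True


# Header datestamps describe changes to the metadata record, not to the resource.
headers = [
    ("oai:example.org:1", "2005-01-20"),
    ("oai:example.org:2", "2005-02-15"),
    ("oai:example.org:3", "2005-03-01"),
]
selected = [ident for ident, stamp in headers
            if in_harvest_window(stamp, from_="2005-02-01", until="2005-02-28")]
print(selected)  # ['oai:example.org:2']
```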


OAI-PMH Optional Containers

• Repository level
  – Rights
  – Branding
• Record level
  – About
    – Provenance
    – Rights


About Container Example
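The original example on this slide was an image and is not reproduced here. As a stand-in, the Python sketch below embeds an illustrative record-level about container using the OAI provenance schema (namespace http://www.openarchives.org/OAI/2.0/provenance); the repositories, identifiers, and dates are made up, and schemaLocation attributes are omitted. Because each nested originDescription describes an earlier source, walking inward answers the provenance question “Where has it been before?”.

```python
import xml.etree.ElementTree as ET

PROV = "{http://www.openarchives.org/OAI/2.0/provenance}"

# Illustrative <about> container (hypothetical repositories and identifiers).
SAMPLE_ABOUT = """<about xmlns="http://www.openarchives.org/OAI/2.0/">
  <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
    <originDescription harvestDate="2005-02-02T14:10:02Z" altered="true">
      <baseURL>http://aggregator.example.org/oai</baseURL>
      <identifier>oai:aggregator.example.org:42</identifier>
      <datestamp>2005-01-15</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
      <originDescription harvestDate="2004-12-01T09:00:00Z" altered="false">
        <baseURL>http://repository.example.edu/oai</baseURL>
        <identifier>oai:repository.example.edu:7</identifier>
        <datestamp>2004-11-30</datestamp>
        <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
      </originDescription>
    </originDescription>
  </provenance>
</about>"""


def provenance_chain(about_xml):
    """List each hop recorded in a provenance container, most recent first."""
    origin = ET.fromstring(about_xml).find(f".//{PROV}originDescription")
    chain = []
    while origin is not None:
        chain.append({
            "baseURL": origin.findtext(f"{PROV}baseURL"),
            "identifier": origin.findtext(f"{PROV}identifier"),
            "harvestDate": origin.get("harvestDate"),
            "altered": origin.get("altered") == "true",
        })
        # A nested originDescription describes where this source got the record.
        origin = origin.find(f"{PROV}originDescription")
    return chain


for hop in provenance_chain(SAMPLE_ABOUT):
    print(hop["baseURL"], hop["harvestDate"], "altered" if hop["altered"] else "unaltered")
```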


OAI Rights Expressions

• Rights expressions are valid at three levels:
  – Repository
  – Set
  – Record
• Rights expressed at the Repository and Set levels are not a substitute for expressions at the Record level

OAI Best Practices (DLF & NSDL)

• Guidelines for data providers and service providers
  – http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page
• Best Practices for Shareable Metadata
  – http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC


OAI In Practice

• The UIUC OAI-PMH Data Provider Registry
  – http://gita.grainger.uiuc.edu/registry/searchform.asp
• Includes most known data providers
• Link on the home page to Service Providers
• Provides multiple reports, sample records, browses, search, etc.
• Ex.: Show the “Distinct Metadata Schemas” report from the left-hand menu
  – http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
  – Choose a schema; look for providers and sample records


What’s an OpenURL?

• The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services
• Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)


“The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material.”

-- OpenURL Overview, SFX website


OpenURL Characteristics

• The protocol operates between an information resource and a service component
• The service component is called a “link server” or “link resolver”
• The link server defines the user context
• It takes the source citation and determines whether the user has access
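The link server's access decision can be sketched in a few lines of Python. Everything here is hypothetical: the HOLDINGS table stands in for the institution-specific holdings and preference data a real link server stores, the full-text URL template is invented, and the sample citation reuses the values from the OpenURL example in this session. Real resolvers weigh many more factors (cost, supplier agreements, multiple copies).

```python
# Hypothetical holdings of one institution's link server:
# ISSN -> (first year held, last year held, full-text URL template).
HOLDINGS = {
    "1234-5678": (1995, 2004, "https://fulltext.example.com/{issn}/v{volume}/i{issue}/p{spage}"),
}


def resolve(citation, holdings=HOLDINGS):
    """Return a full-text link if the institution holds the cited issue, else None.

    `citation` is a dict of OpenURL-style keys (issn, date, volume, issue, spage).
    """
    entry = holdings.get(citation.get("issn"))
    if entry is None:
        return None
    first, last, template = entry
    year = citation.get("date", "")[:4]
    if not year.isdigit() or not first <= int(year) <= last:
        return None
    try:
        return template.format(**citation)
    except KeyError:  # the citation lacks a field the target URL needs
        return None


citation = {"issn": "1234-5678", "date": "1998", "volume": "12", "issue": "2", "spage": "134"}
print(resolve(citation))
# https://fulltext.example.com/1234-5678/v12/i2/p134
```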


Distinguishing Users

• Uses information stored in a cookie (the CookiePusher mechanism)
• Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project
• Identifies a user’s IP address
• Obtains user attributes via the Shibboleth framework
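Of the four mechanisms, IP-address recognition is the simplest to illustrate. The Python sketch below uses the standard ipaddress module to test whether a request comes from an institution's licensed address ranges; the ranges are RFC 5737 documentation networks standing in for real campus ranges. Cookies, certificates, and Shibboleth attributes are not shown.

```python
import ipaddress

# Hypothetical address ranges licensed to one institution
# (RFC 5737 documentation networks, standing in for real campus ranges).
INSTITUTION_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_institutional_user(client_ip, networks=INSTITUTION_NETWORKS):
    """True if the request's source address falls inside a licensed range.

    Comparing an IPv6 address with an IPv4 network simply yields False.
    """
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


print(is_institutional_user("192.0.2.17"))      # True
print(is_institutional_user("198.51.100.200"))  # False (outside the /25)
```

IP checks identify a location rather than a person, which is one reason the attribute-based approaches on this slide exist for off-campus users.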


Examples of Extended Service Links

• From a record in an abstracting and indexing (A&I) database to the full text described by the record
• From a record describing a book in a library catalogue to a description of the same book in an Internet bookshop
• From a reference in a journal article to a record matching that reference in an A&I database
• From a citation in a journal article to a record in a library catalogue that shows the library’s holdings of the cited journal


OpenURL Examples & Demo

• http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134
• An OpenURL demo
  – http://www.ukoln.ac.uk/distributed-systems/openurl/
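The example URL above is simply a resolver's base address followed by citation metadata as key=value pairs. The Python sketch below builds and parses that URL with urllib.parse. The keys (issn, date, volume, issue, spage) follow the version 0.1 style of the example; the later NISO standard Z39.88-2004 (OpenURL 1.0) uses different keys, such as url_ver and rft.-prefixed metadata, which this sketch does not cover.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def make_openurl(resolver_base, **citation):
    """Append citation metadata to a resolver's base URL as key=value pairs."""
    return f"{resolver_base}?{urlencode(citation)}"


def parse_openurl(url):
    """Recover the citation metadata from an OpenURL (first value per key)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


url = make_openurl("http://sfxserver.uni.edu/sfxmenu",
                   issn="1234-5678", date="1998", volume="12", issue="2", spage="134")
print(url)
# http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134
print(parse_openurl(url)["spage"])  # 134
```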


Defining and Ensuring Metadata Quality

• What constitutes quality?
• Techniques for evaluating and enforcing consistency and predictability
• Automated metadata creation: advantages and disadvantages
• Metadata maintenance strategies


Beginning to Define Quality

• Experience of the library community: BIBCO & NACO
  – Agreed-upon standards for library quality
  – Training and documentation in support of practitioners
  – Review and enforcement of standards by means of an institutional “buddy system”


How Does Quality Happen?

• Lessons from the library community
  – Quality is quantifiable and measurable
  – To be effective, enforcement of quality standards must take place at the community level
• Furthermore
  – Data problems are not unique to particular communities
  – General strategies can improve interoperability


Quality Measurement Criteria

• Completeness
• Accuracy
• Provenance
• Conformance to expectations
• Logical consistency and coherence
• Timeliness (currency and lag)
• Accessibility


Completeness

• “Metadata should describe the target objects as completely as economically feasible”
• “Element set should be applied to the target object population as completely as possible”
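The second criterion, applying the element set across the whole population, lends itself to a simple measurement: for each element, the share of records that carry a non-empty value. The Python sketch below assumes records have already been parsed into dicts mapping element names to lists of values (a hypothetical shape); it does not judge whether a value is meaningful, so placeholder values such as “unknown” still count as present.

```python
def completeness(records, elements):
    """Share of records with at least one non-empty value for each element.

    `records` is a list of dicts mapping element name -> list of values.
    """
    if not records:
        return {element: 0.0 for element in elements}
    return {
        element: sum(1 for r in records if any(v.strip() for v in r.get(element, [])))
        / len(records)
        for element in elements
    }


records = [
    {"title": ["Map of Ohio"], "language": ["en"]},
    {"title": ["Survey notes"], "language": [""]},
    {"title": ["Letter, 1862"]},
    {"title": ["Photograph"], "language": ["fr"]},
]
print(completeness(records, ["title", "language"]))
# {'title': 1.0, 'language': 0.5}
```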


Accuracy

• Information provided in values should be correct and factual
• Editing applied to:
  – Eliminate typos
  – Ensure conforming name expressions
  – Ensure standard abbreviations and usage in general


Provenance

• Who prepared the metadata? What do we know about the preparer?
• What methods were used to create the metadata? Is it human-created or machine-created?
• What transformations have been applied since creation?
• Where has it been before?


Conformance to Expectations

• Contains the elements a community would expect to find
• Controlled vocabularies are well chosen and explicitly exposed to downstream users
• Metadata reflects community thinking about necessary compromises


Logical Consistency/Coherence

• Standard mechanisms like application profiles and common crosswalks are used
• Similar structures and appearance are enabled for search results
• There is very limited reliance on defaulted values


Timeliness

• Currency
  – The target object changes but the metadata does not
• Lag
  – The target object is disseminated before some or all of its metadata is available
• “Metadata aging” is affected by cultural differences between librarians and technologists
  – Librarians: once and it’s done
  – Technologists: metadata as an iterative process


Accessibility

• Barriers to accessibility may be economic, technical, or organizational
  – Metadata as “premium” or proprietary information
  – Unreadable for technical reasons (file formats, etc.)
  – Metadata may not be properly linked to relevant object(s)


Evaluating Metadata (1)

• Random sampling (XMLSpy)
  – Advantages
    – Includes some formatting and color coding
  – Disadvantages
    – Assumes consistency/predictability
    – Difficult to determine the extent of problems found
    – Tedious at best


Evaluating Metadata (2)

• Spreadsheets (Microsoft Excel)
  – Advantages
    – Better sorting and control by the reviewer
  – Disadvantages
    – Unwieldy for large files
    – Requires sustained focus from the reviewer
    – Requires translation into a tab-delimited file
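Both spreadsheet and visual analysis require translating the metadata into a tab-delimited file first. The Python sketch below flattens harvested oai_dc records into one row per record and one column per Dublin Core element, using only the standard library. The input shape, (record_id, XML string) pairs, and the “ | ” separator for repeated values are assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
DC_ELEMENTS = ["title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights"]


def dc_records_to_tsv(records, separator=" | "):
    """Flatten (record_id, oai_dc XML) pairs into tab-delimited text.

    One row per record, one column per DC element; repeated values are joined
    with `separator`, and a missing element becomes an empty cell.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["record_id"] + DC_ELEMENTS)
    for record_id, xml_text in records:
        root = ET.fromstring(xml_text)
        row = [record_id]
        for element in DC_ELEMENTS:
            values = [e.text.strip() for e in root.iter(DC + element)
                      if e.text and e.text.strip()]
            row.append(separator.join(values))
        writer.writerow(row)
    return out.getvalue()


sample = ('<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
          'xmlns:dc="http://purl.org/dc/elements/1.1/">'
          '<dc:title>Map of Ohio</dc:title><dc:subject>Maps</dc:subject>'
          '<dc:subject>Ohio</dc:subject></oai_dc:dc>')
print(dc_records_to_tsv([("rec-1", sample)]))
```

Empty cells make missing elements easy to spot once the file is sorted or plotted, which is what the scatter-plot views below rely on.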


Evaluating Metadata (3)

• Visual graphical analysis (Spotfire)
  – Advantages
    – View of several data dimensions simultaneously
    – Reviewer controls the data display
    – Tends to pull reviewer focus to anomalies
    – Handles fairly large files at one time while allowing subset views
    – Display manipulation possible without programmers
  – Disadvantages
    – High cost of software
    – Requires translation into a tab-delimited file


Element Names vs Record Ids (Scatter Plot)


Missing Elements (Scatter Plot)

[Figure callouts: 2 records without a language element; format element present inconsistently; easy to rescale the axis on the fly and scroll through records]


Table View

[Figure callouts: only DC date elements are selected for display, sorted by element value; the only W3CDTF syntax present is four digits; non-empty “no information” values may confuse end users]
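The two problems called out in the table view, non-W3CDTF dates and non-empty “no information” values, can also be flagged programmatically. The Python sketch below sorts date values into three buckets with a regular expression for the W3CDTF profile of ISO 8601 (YYYY, YYYY-MM, YYYY-MM-DD, or a full date and time with a time zone designator). The PLACEHOLDERS list is a hypothetical starting point to adapt to local data, and the pattern does not check calendar validity (for example, February 30 passes).

```python
import re

# W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, or YYYY-MM-DDThh:mm with optional :ss and
# fractional seconds, followed by a time zone designator (Z or +hh:mm / -hh:mm).
W3CDTF = re.compile(
    r"\d{4}"
    r"(-(0[1-9]|1[0-2])"
    r"(-(0[1-9]|[12]\d|3[01])"
    r"(T([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d+)?)?(Z|[+-]([01]\d|2[0-3]):[0-5]\d))?"
    r")?)?"
)

# Hypothetical list of non-empty "no information" placeholders to flag.
PLACEHOLDERS = {"unknown", "n/a", "none", "no date", "no information", "not available"}


def review_dates(values):
    """Sort DC date values into W3CDTF-conformant, placeholder, and other buckets."""
    report = {"w3cdtf": [], "placeholder": [], "other": []}
    for value in values:
        text = value.strip()
        if text.lower() in PLACEHOLDERS:
            report["placeholder"].append(value)
        elif W3CDTF.fullmatch(text):
            report["w3cdtf"].append(value)
        else:
            report["other"].append(value)
    return report


print(review_dates(["1998", "unknown", "May 1998", "1998-05-12", "n/a"]))
# {'w3cdtf': ['1998', '1998-05-12'], 'placeholder': ['unknown', 'n/a'], 'other': ['May 1998']}
```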


Improving Metadata Quality …

• Documentation
  – Basic standards, best practice guidelines, examples
  – Exposure and maintenance of local and community vocabularies
  – Application profiles
  – Training materials, tools, methodologies


… Over Time

• Culture change
  – Support for documentation and exchange of knowledge and experience
  – Routine contribution to the “general good”
  – More focused research on practical metadata use and quality considerations
  – Better project-based and community-wide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 13: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1313

OAI RequestsOAI Requests Identify--gtReturns general information Identify--gtReturns general information

about the particular OAI serverabout the particular OAI server ListMetadataFormats--gtreturns formats ListMetadataFormats--gtreturns formats

availableavailable ListSets--gtreturns list of sets availableListSets--gtreturns list of sets available ListIdentifiers--gtreturns identifiers onlyListIdentifiers--gtreturns identifiers only ListRecords--gtreturns record ids in a setListRecords--gtreturns record ids in a set GetRecord--gtreturns particular recordGetRecord--gtreturns particular record Try it out at the UIUC OIA Registry Try it out at the UIUC OIA Registry ((

httpgitagraingeruiuceduregistrysearchformasp))

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1414

Dates Used in OAI-PMHDates Used in OAI-PMH

Datestamps are used as values in requests Datestamps are used as values in requests to support selective harvesting by date to support selective harvesting by date (generally latest update date of the (generally latest update date of the metadata record)metadata record)

Datestamps are also used in record Datestamps are also used in record headers in responsesheaders in responses

Datestamps are particular to a repositoryDatestamps are particular to a repository Repeat OAI dates are about the Repeat OAI dates are about the metadatametadata

not the not the resourcesresources

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1515

OAI-PMH Optional ContainersOAI-PMH Optional Containers

Repository levelRepository levelndash RightsRightsndash BrandingBranding

Record levelRecord levelndash AboutAbout

ProvenanceProvenanceRightsRights

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1616

About Container ExampleAbout Container Example

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1717

OAI Rights ExpressionsOAI Rights Expressions

Rights expressions are valid at three Rights expressions are valid at three levelslevelsndash RepositoryRepositoryndash SetSetndash RecordRecord

Rights expressed at the Repository Rights expressed at the Repository and Set levels are not a substitute for and Set levels are not a substitute for expressions at the Record Levelexpressions at the Record Level

OAI Best Practices (DLF amp OAI Best Practices (DLF amp NSDL)NSDL)

Guidelines for data providers and Guidelines for data providers and service providersservice providersndash httpwebservicesitcsumicheduhttpwebservicesitcsumichedu

mediawikioaibpindexphpMain_Page1048708mediawikioaibpindexphpMain_Page1048708 Best Practices for Shareable Best Practices for Shareable

MetadataMetadatandash httpwebservicesitcsumicheduhttpwebservicesitcsumichedu

mediawikioaibpPublicTOC1048708mediawikioaibpPublicTOC1048708

OAI In Practice

The UIUC OAI-PMH Data Provider Registry
- http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers

Link on home page to Service Providers

Provides multiple reports, sample records, browses, search, etc.

Ex.: Show the "Distinct Metadata Schemas" report from the left-hand menu
- http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
- Choose a schema; look for providers and sample records

What's an OpenURL?

The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services

Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)

"The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, and agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material."

-- OpenURL Overview, SFX website

OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a "link server" or "link resolver"

Link server defines the user context

Takes the source citation and determines whether a user has access
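
The holdings half of that decision can be sketched in Python as follows: given the citation carried by an OpenURL, check whether the institution holds the cited issue and, if so, return a link. The holdings table, the ISSN, and the full-text URL pattern are all hypothetical. A production link resolver consults a knowledge base of licensed coverage and combines it with the user context; it does not rely on a hard-coded dictionary.

```python
# Hypothetical knowledge base: ISSN -> (first year held, last year held, full-text base URL)
HOLDINGS = {
    "1234-5678": (1995, 2004, "https://fulltext.example.edu/journal-a"),
}


def resolve(citation):
    """Return a full-text link if the cited issue falls within local holdings, else None."""
    entry = HOLDINGS.get(citation.get("issn"))
    if entry is None:
        return None  # not held; a real resolver might offer ILL or a catalogue search instead
    first, last, base = entry
    year = citation.get("date", "")[:4]
    if not year.isdigit() or not first <= int(year) <= last:
        return None  # held, but outside the licensed coverage dates
    # Hypothetical deep-link pattern for the target platform
    return (f"{base}/{year}/v{citation.get('volume', '')}"
            f"/i{citation.get('issue', '')}/p{citation.get('spage', '')}")


citation = {"issn": "1234-5678", "date": "1998", "volume": "12", "issue": "2", "spage": "134"}
print(resolve(citation))
# https://fulltext.example.edu/journal-a/1998/v12/i2/p134
```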

Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user's IP address

Obtains user attributes via the Shibboleth framework
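
Of these mechanisms, IP-based recognition is the simplest to illustrate. This Python sketch uses the standard ipaddress module to check a client address against a list of licensed ranges. The ranges are RFC 5737 documentation blocks standing in for a hypothetical campus network.

```python
import ipaddress

# Hypothetical licensed ranges (RFC 5737 documentation blocks, not a real institution)
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),  # covers 198.51.100.0 - 198.51.100.127
]


def is_on_campus(client_ip):
    """Return True if the requesting address falls inside any licensed range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in CAMPUS_NETWORKS)


print(is_on_campus("192.0.2.17"))      # True
print(is_on_campus("198.51.100.200"))  # False: outside the /25
```

IP checks do not recognize off-campus users unless they connect through a proxy or VPN. This limitation is one motivation for attribute-based approaches such as Shibboleth.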

Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library holdings of the cited journal

OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
- http://www.ukoln.ac.uk/distributed-systems/openurl/
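
The example OpenURL is just a resolver address followed by citation metadata in the query string. The Python sketch below uses only the standard urllib.parse module. It pulls the citation out of that URL and re-addresses the same metadata to a different resolver, which is what lets each institution point citations at its own link server. The sample uses the older 0.1-style keys; OpenURL 1.0 (ANSI/NISO Z39.88-2004) uses prefixed keys such as rft.issn instead. The second resolver address is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

SAMPLE = ("http://sfxserver.uni.edu/sfxmenu?"
          "issn=1234-5678&date=1998&volume=12&issue=2&spage=134")


def citation_from_openurl(openurl):
    """Extract the citation metadata carried in an OpenURL's query string."""
    return dict(parse_qsl(urlsplit(openurl).query))


def openurl_for_resolver(resolver_base, citation):
    """Send the same citation metadata to a different link resolver."""
    return resolver_base + "?" + urlencode(citation)


citation = citation_from_openurl(SAMPLE)
print(citation)
# {'issn': '1234-5678', 'date': '1998', 'volume': '12', 'issue': '2', 'spage': '134'}
print(openurl_for_resolver("https://resolver.example.edu/openurl", citation))
# https://resolver.example.edu/openurl?issn=1234-5678&date=1998&volume=12&issue=2&spage=134
```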

Defining and Ensuring Metadata Quality

What constitutes quality

Techniques for evaluating and enforcing consistency and predictability

Automated metadata creation: advantages and disadvantages

Metadata maintenance strategies

Beginning to Define Quality

Experience of the library community: BIBCO & NACO
- Agreed-upon standards for library quality
- Training and documentation in support of practitioners
- Review and enforcement of standards by means of an institutional "buddy system"

How Does Quality Happen?

Lessons from the library community
- Quality is quantifiable and measurable
- To be effective, enforcement of standards of quality must take place at the community level

Furthermore
- Data problems are not unique to particular communities
- General strategies can improve interoperability

Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (currency and lag)
Accessibility

Completeness

"Metadata should describe the target objects as completely as economically feasible"

"Element set should be applied to the target object population as completely as possible"
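
The second criterion, applying the element set across the whole population, can be measured directly. Here is a Python sketch of one simple measure: for each element, the share of records carrying at least one non-blank value. The records are invented, and representing each record as a dict of element name to value list is an assumption made for illustration.

```python
# Hypothetical Dublin Core records: element name -> list of values
RECORDS = [
    {"title": ["Map of Ithaca"], "creator": ["Smith, J."], "date": ["1998"], "language": ["en"]},
    {"title": ["Field notes"], "date": ["1972"]},
    {"title": ["Survey report"], "creator": ["Jones, A."], "language": ["en"]},
]


def element_completeness(records, elements):
    """For each element, the fraction of records carrying at least one non-blank value."""
    total = len(records)
    return {
        element: sum(1 for rec in records
                     if any(v.strip() for v in rec.get(element, []))) / total
        for element in elements
    }


for element, share in element_completeness(RECORDS, ["title", "creator", "date", "language"]).items():
    print(f"{element:10} {share:.0%}")
```

A low share flags an element that is applied unevenly. Whether that is a problem depends on the first criterion: some elements may not be economically feasible to supply for every object.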

Accuracy

Information provided in values should be correct and factual

Editing applied to:
- Eliminate typos
- Ensure conforming name expressions
- Ensure standard abbreviations and usages in general

Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or created by machine?

What transformations have been applied since creation?

Where has it been before?

Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata is reflective of community thinking about necessary compromises

Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values

Timeliness

Currency
- Target object changes but metadata does not

Lag
- Target object is disseminated before some or all metadata is available

"Metadata aging" is affected by cultural differences between librarians and technologists
- Librarians: once and it's done
- Technologists: metadata as an iterative process
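
A currency problem can be detected whenever both the resource and its metadata carry a last-changed date. This Python sketch compares the two and flags records whose resource changed after the metadata was last updated. The inventory and record identifiers are hypothetical, and the check assumes such paired dates are available, which is often not the case in practice.

```python
from datetime import date

# Hypothetical inventory: record id -> (resource last changed, metadata last changed)
INVENTORY = {
    "rec-001": (date(2005, 3, 1), date(2005, 3, 2)),
    "rec-002": (date(2006, 7, 15), date(2004, 11, 30)),  # resource revised, metadata not
}


def stale_records(inventory):
    """Currency check: ids whose resource changed after its metadata was last updated."""
    return sorted(rid for rid, (resource_changed, metadata_changed) in inventory.items()
                  if resource_changed > metadata_changed)


print(stale_records(INVENTORY))  # ['rec-002']
```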

Accessibility

Barriers to accessibility may be economic, technical, or organizational
- Metadata as "premium" or proprietary information
- Unreadable for technical reasons (file formats, etc.)
- Metadata may not be properly linked to relevant object(s)

Evaluating Metadata (1)

Random sampling (XMLSpy)
- Advantages
  - Includes some formatting and color coding
- Disadvantages
  - Assumes consistency/predictability
  - Difficult to determine extent of problems found
  - Tedious at best

Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
- Advantages
  - Better sorting and control by reviewer
- Disadvantages
  - Unwieldy for large files
  - Requires sustained focus from reviewer
  - Requires translation into tab-delimited file
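
The translation step into a tab-delimited file can be scripted. This Python sketch uses only the standard library (xml.etree.ElementTree and csv) to flatten Dublin Core XML into one row per value (record id, element, value). A long layout like this is convenient for sorting by element value and plotting element names against record ids. The records wrapper and the id attribute are hypothetical simplifications, not the OAI-PMH response structure.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical two-record sample; the <records>/<record id=...> wrapper is illustrative only
SAMPLE = f"""<records xmlns:dc="{DC_NS}">
  <record id="rec-1"><dc:title>Map of Ithaca</dc:title><dc:date>1998</dc:date></record>
  <record id="rec-2"><dc:title>Field notes</dc:title><dc:language>en</dc:language></record>
</records>"""


def dc_to_tsv(xml_text):
    """Flatten Dublin Core records into tab-delimited rows: record id, element, value."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["record_id", "element", "value"])
    for record in ET.fromstring(xml_text).iter("record"):
        for child in record:
            if child.tag.startswith("{" + DC_NS + "}"):
                element = child.tag[len(DC_NS) + 2:]  # strip "{namespace}" prefix
                writer.writerow([record.get("id"), element, (child.text or "").strip()])
    return out.getvalue()


print(dc_to_tsv(SAMPLE))
```

One row per value also copes with repeatable elements. A wide layout with one column per element would need extra columns or joined cells whenever an element repeats.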

Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)
- Advantages
  - View of several data dimensions simultaneously
  - Reviewer controls data display
  - Tends to pull reviewer focus to anomalies
  - Handles fairly large files at one time while allowing subset views
  - Display manipulation possible without programmers
- Disadvantages
  - High cost of software
  - Requires translation into tab-delimited file

Element Names vs Record Ids (Scatter Plot)

Missing Elements (Scatter Plot)

[Figure callouts: 2 records without a language element; format element present inconsistently; easy to rescale axis on the fly and scroll through records]

Table View

[Figure callouts: non-empty "no information" values that may confuse end users; only DC Date elements are selected for display; the only W3CDTF syntax present is four digits; sorted by element value]
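
The two problems called out in the table view, placeholder values and inconsistent date syntax, can also be found programmatically. This Python sketch classifies each date value as a placeholder, a W3CDTF-conformant date, or something else. The regular expression follows the W3CDTF profile of ISO 8601: YYYY, YYYY-MM, YYYY-MM-DD, or a full date-time with a time zone designator. The placeholder list is a hypothetical example that a project would tailor to its own data.

```python
import re

# W3CDTF (the W3C profile of ISO 8601): YYYY, YYYY-MM, YYYY-MM-DD, or
# YYYY-MM-DDThh:mm[:ss[.s]]TZD, where TZD is "Z" or +hh:mm / -hh:mm.
# Checks syntax only; it does not reject impossible values such as month 13.
W3CDTF = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)

# Hypothetical filler strings that look like data but carry no information
PLACEHOLDERS = {"no information", "unknown", "n/a", "none"}


def classify_date(value):
    """Label a DC date value as 'placeholder', 'w3cdtf', or 'other'."""
    v = value.strip()
    if not v or v.lower() in PLACEHOLDERS:
        return "placeholder"
    if W3CDTF.fullmatch(v):
        return "w3cdtf"
    return "other"


for value in ["1998", "1998-05-12", "2004-07-01T12:00Z", "ca. 1900", "No information"]:
    print(f"{value!r:22} -> {classify_date(value)}")
```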

Improving Metadata Quality …

Documentation
- Basic standards, best practice guidelines, examples
- Exposure and maintenance of local and community vocabularies
- Application profiles
- Training materials, tools, methodologies

… Over Time

Culture change
- Support for documentation and exchange of knowledge and experience
- Routine contribution to the "general good"
- More focused research on practical metadata use and quality considerations
- Better project-based and community-wide documentation

Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access"

"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts (Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on), but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, Two Paths to Interoperable Metadata

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert metadata record content to another format, including:
- Element-to-element mapping
- Hierarchy and object resolution
- Metadata content conversions
- Stylesheets can be created to transform metadata based on crosswalks
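
Element-to-element mapping, the first of those transformations, can be expressed as a lookup table. In this Python sketch, the local photo-collection schema and its field names are hypothetical, and the target is unqualified Dublin Core. The sketch maps a flat record through the table and also reports fields that have no target. That report is where the data loss mentioned in the Woodley quotation becomes visible.

```python
# Hypothetical crosswalk from a local photo-collection schema to unqualified Dublin Core.
# Several local fields may map to the same DC element; fields with no target are reported.
CROSSWALK = {
    "Title": "title",
    "Photographer": "creator",
    "Studio": "creator",
    "Date Taken": "date",
    "Subject Keywords": "subject",
}


def apply_crosswalk(local_record, crosswalk):
    """Map a flat local record to DC; return (dc_record, unmapped_fields)."""
    dc, unmapped = {}, []
    for field, value in local_record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)  # no DC target: this information is lost in conversion
            continue
        values = value if isinstance(value, list) else [value]
        dc.setdefault(target, []).extend(values)  # DC elements are repeatable
    return dc, unmapped


record = {"Title": "Harbor at dusk", "Photographer": "Lee, K.", "Studio": "Bayview Studio",
          "Date Taken": "1954", "Camera Model": "Rolleiflex"}
dc, lost = apply_crosswalk(record, CROSSWALK)
print(dc)    # {'title': ['Harbor at dusk'], 'creator': ['Lee, K.', 'Bayview Studio'], 'date': ['1954']}
print(lost)  # ['Camera Model']
```

Mapping both Photographer and Studio to creator shows a semantic difference being flattened. After conversion, a DC consumer can no longer tell the person from the business.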

Available Crosswalks

Library of Congress
- http://www.loc.gov/marc/marcdocz.html

MIT
- http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
- http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
- Some data might be lost
- Differences in semantics can occur
- Differences in use of content standards sometimes make sharing problematic
- Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution

Example Mapping: MODS title to DC title

Includes attribute for type of title
- Abbreviated
- Translated
- Alternative
- Uniform

Other attributes
- ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort

Mapping MODS title to DC title

DC has one element refinement: Alternative
- DC title has no substructure; MODS allows for subelements for partNumber and partName

Best practice statement in DC-Lib says to include the initial article
- MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
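
A minimal Python sketch of this mapping, using the standard xml.etree.ElementTree module, follows. It assumes the MODS version 3 namespace (http://www.loc.gov/mods/v3), and the sample record is invented. Two further choices are illustrative assumptions, not rules from DC-Lib or any published MODS-to-DC mapping: the ". " punctuation used to flatten partNumber and partName into the single DC title string, and sending every typed titleInfo to the Alternative refinement. A uniform or translated title may deserve different handling.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"

# Hypothetical MODS fragment: a main title with parts, plus an alternative title
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>The </nonSort><title>annals of the village</title>
    <partNumber>Part 2</partNumber><partName>Harvest records</partName>
  </titleInfo>
  <titleInfo type="alternative"><title>Village annals</title></titleInfo>
</mods>"""


def mods_titles_to_dc(xml_text):
    """Flatten each MODS titleInfo into one DC title string."""
    dc = {"title": [], "alternative": []}
    for info in ET.fromstring(xml_text).iter(MODS + "titleInfo"):
        # Keep the initial article (nonSort) in front of the title, per the DC-Lib practice
        main = (info.findtext(MODS + "nonSort") or "") + (info.findtext(MODS + "title") or "")
        # DC title has no substructure, so part number and part name are folded into the string
        extras = [info.findtext(MODS + tag) for tag in ("partNumber", "partName")]
        text = ". ".join([main.strip()] + [e.strip() for e in extras if e])
        # Assumption: any typed titleInfo (abbreviated, translated, alternative, uniform) -> alternative
        target = "alternative" if info.get("type") else "title"
        dc[target].append(text)
    return dc


print(mods_titles_to_dc(SAMPLE))
# {'title': ['The annals of the village. Part 2. Harvest records'], 'alternative': ['Village annals']}
```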

Exercise

Evaluate a small set of human- and machine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 14: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1414

Dates Used in OAI-PMHDates Used in OAI-PMH

Datestamps are used as values in requests Datestamps are used as values in requests to support selective harvesting by date to support selective harvesting by date (generally latest update date of the (generally latest update date of the metadata record)metadata record)

Datestamps are also used in record Datestamps are also used in record headers in responsesheaders in responses

Datestamps are particular to a repositoryDatestamps are particular to a repository Repeat OAI dates are about the Repeat OAI dates are about the metadatametadata

not the not the resourcesresources

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1515

OAI-PMH Optional ContainersOAI-PMH Optional Containers

Repository levelRepository levelndash RightsRightsndash BrandingBranding

Record levelRecord levelndash AboutAbout

ProvenanceProvenanceRightsRights

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1616

About Container ExampleAbout Container Example

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1717

OAI Rights ExpressionsOAI Rights Expressions

Rights expressions are valid at three Rights expressions are valid at three levelslevelsndash RepositoryRepositoryndash SetSetndash RecordRecord

Rights expressed at the Repository Rights expressed at the Repository and Set levels are not a substitute for and Set levels are not a substitute for expressions at the Record Levelexpressions at the Record Level

OAI Best Practices (DLF amp OAI Best Practices (DLF amp NSDL)NSDL)

Guidelines for data providers and Guidelines for data providers and service providersservice providersndash httpwebservicesitcsumicheduhttpwebservicesitcsumichedu

mediawikioaibpindexphpMain_Page1048708mediawikioaibpindexphpMain_Page1048708 Best Practices for Shareable Best Practices for Shareable

MetadataMetadatandash httpwebservicesitcsumicheduhttpwebservicesitcsumichedu

mediawikioaibpPublicTOC1048708mediawikioaibpPublicTOC1048708

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1818

OAI In PracticeOAI In Practice

The UIUC OAI-PMH Data Provider RegistryThe UIUC OAI-PMH Data Provider Registryndash httpgitagraingeruiuceduregistrysearchformasp

Includes most known data providersIncludes most known data providers Link on home page to Service ProvidersLink on home page to Service Providers Provides multiple reports sample records Provides multiple reports sample records

browses search etcbrowses search etc Ex Show report from left hand menu Ex Show report from left hand menu

ldquoDistinct Metadata Schemasrdquo ldquoDistinct Metadata Schemasrdquo ndash httpgitagraingeruiuceduregistryListSchemasasp ndash Choose a schema look for providers and sample records Choose a schema look for providers and sample records

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1919

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2020

Whatrsquos an OpenURLWhatrsquos an OpenURL

The OpenURL provides a standardized The OpenURL provides a standardized format for transporting bibliographic format for transporting bibliographic metadata about objects between metadata about objects between information servicesinformation services

Provides a basis for building services via Provides a basis for building services via the notion of an the notion of an extended service-linkextended service-link which moves beyond the classic notion of which moves beyond the classic notion of a a reference linkreference link (a link from metadata to (a link from metadata to the full-content described by the the full-content described by the metadata)metadata)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2121

ldquoldquoThe OpenURL standard enables a user who has The OpenURL standard enables a user who has retrieved an article citation for example to obtain retrieved an article citation for example to obtain immediate access to the most appropriate copy of immediate access to the most appropriate copy of that object through the implementation of extended that object through the implementation of extended linking services The selection of the best copy is linking services The selection of the best copy is based on user and organizational preferences based on user and organizational preferences regarding the location of the copy its cost and regarding the location of the copy its cost and agreements with information suppliers and similar agreements with information suppliers and similar considerations This selection occurs without the considerations This selection occurs without the knowledge of the user it is made possible by the knowledge of the user it is made possible by the transport of metadata with the OpenURL link from transport of metadata with the OpenURL link from the source citation to a resolver (the link server) the source citation to a resolver (the link server) which stores the preference information and the which stores the preference information and the links to the appropriate materialrdquolinks to the appropriate materialrdquo

--OpenURL Overview SFX website--OpenURL Overview SFX website

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2222

OpenURL CharacteristicsOpenURL Characteristics

Protocol operates between an Protocol operates between an information resource and a service information resource and a service componentcomponent

Service component is called a ldquolink Service component is called a ldquolink serverrdquo or ldquolink resolverrdquoserverrdquo or ldquolink resolverrdquo

Link server defines the user contextLink server defines the user context Takes source citation and determines Takes source citation and determines

whether a user has accesswhether a user has access

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2323

Distinguishing UsersDistinguishing Users

Uses information stored in a cookie Uses information stored in a cookie (the CookiePusher mechanism)(the CookiePusher mechanism)

Uses information contained in a Uses information contained in a digital certificate such as the one digital certificate such as the one proposed by the DLF digital proposed by the DLF digital certificates prototype projectcertificates prototype project

Identifies a users IP addressIdentifies a users IP address Obtains user attributes via the Obtains user attributes via the

Shibboleth frameworkShibboleth framework

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2424

Examples of Extended Service Examples of Extended Service LinksLinks

From a record in an abstracting and indexing From a record in an abstracting and indexing database (AampI) to the full-text described by the database (AampI) to the full-text described by the recordrecord

From a record describing a book in a library From a record describing a book in a library catalogue to a description of the same book in an catalogue to a description of the same book in an Internet book shopInternet book shop

From a reference in a journal article to a record From a reference in a journal article to a record matching that reference in an AampI databasematching that reference in an AampI database

From a citation in a journal article to a record in a From a citation in a journal article to a record in a library catalogue that shows the library holdings library catalogue that shows the library holdings of the cited journalof the cited journal

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2525

OpenURL Examples amp DemoOpenURL Examples amp Demo

httpsfxserveruniedusfxmenuissn=1234-5678ampdate=1998ampvolume=12ampissue=2ampspage=134

An OpenURL demoAn OpenURL demondash httpwwwukolnacukdistributed-syste

msopenurl

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2626

Defining and Ensuring Metadata Defining and Ensuring Metadata QualityQuality

What constitutes qualityWhat constitutes quality Techniques for evaluating and Techniques for evaluating and

enforcing consistency and enforcing consistency and predictabilitypredictability

Automated metadata creation Automated metadata creation advantages and disadvantagesadvantages and disadvantages

Metadata maintenance strategiesMetadata maintenance strategies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2727

Beginning to Define QualityBeginning to Define Quality

Experience of the library Experience of the library community--BIBCO amp NACOcommunity--BIBCO amp NACOndash Agreed upon standards for library Agreed upon standards for library

qualityqualityndash Training and documentation in support Training and documentation in support

of practitionersof practitionersndash Review and enforcement of standards Review and enforcement of standards

by means of institutional ldquobuddy by means of institutional ldquobuddy systemrdquosystemrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2828

How Does Quality HappenHow Does Quality Happen

Lessons from the library communityLessons from the library communityndash Quality is quantifiable and measurableQuality is quantifiable and measurablendash To be effective enforcement of standards To be effective enforcement of standards

of quality must take place at the of quality must take place at the community levelcommunity level

FurthermoreFurthermorendash Data problems are not unique to particular Data problems are not unique to particular

communitiescommunitiesndash general strategies can improve general strategies can improve

interoperabilityinteroperability

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2929

Quality Measurement CriteriaQuality Measurement Criteria

CompletenessCompleteness AccuracyAccuracy ProvenanceProvenance Conformance to expectationsConformance to expectations Logical consistency and coherenceLogical consistency and coherence Timeliness (Currency and Lag)Timeliness (Currency and Lag) AccessibilityAccessibility

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3030

CompletenessCompleteness

ldquoldquoMetadata should describe the target Metadata should describe the target objects as completely as objects as completely as economically feasiblerdquoeconomically feasiblerdquo

ldquoldquoElement set should be applied to Element set should be applied to the target object population as the target object population as completely as possiblerdquocompletely as possiblerdquo


Accuracy

Information provided in values should be correct and factual

Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard usage of abbreviations in general


Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or created by machine?

What transformations have been applied since creation?

Where has it been before?


Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata is reflective of community thinking about necessary compromises


Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Records can be displayed with a similar structure and appearance in search results

There is very limited reliance on defaulted values


Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object disseminated before some or all metadata is available

"Metadata aging" is affected by cultural differences between librarians and technologists
– Librarians: once and it's done
– Technologists: metadata as an iterative process
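Currency and lag can both be checked mechanically, provided each record carries dates for the object and for its metadata. The minimal sketch below assumes hypothetical date fields: when the object was modified or released, and when its metadata was last updated or first made available. The source does not name any such fields.

```python
from datetime import date


def is_stale(object_modified: date, metadata_modified: date) -> bool:
    """Currency problem: the object changed after its metadata was last updated."""
    return object_modified > metadata_modified


def lag_days(object_released: date, metadata_available: date) -> int:
    """Lag: days the object circulated before its metadata existed (0 if none)."""
    return max(0, (metadata_available - object_released).days)


print(is_stale(date(2005, 6, 1), date(2004, 1, 15)))  # True
print(lag_days(date(2005, 3, 1), date(2005, 3, 31)))  # 30
```

Under the "iterative process" view, a check like `is_stale` would run on a schedule. Under the "once and it's done" view, it would never be run.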


Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as "premium" or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)


Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages: includes some formatting and color coding
– Disadvantages: assumes consistency/predictability; difficult to determine the extent of problems found; tedious at best
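Random sampling is easy to make reproducible, which lets a second reviewer examine exactly the same records. The minimal Python sketch below assumes records are identified by a list of ids. The batch size and seed in the example are arbitrary.

```python
import random


def sample_for_review(record_ids, k: int, seed=None) -> list:
    """Draw a simple random sample of k record ids for manual review.

    A fixed seed makes the sample reproducible.
    """
    ids = list(record_ids)
    rng = random.Random(seed)
    # Never ask for more records than exist.
    return rng.sample(ids, min(k, len(ids)))


batch = sample_for_review(range(1, 5001), k=25, seed=42)
print(len(batch), sorted(batch)[:5])
```

The sample still assumes consistency. A problem confined to records outside the sample stays invisible.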


Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages: better sorting and control by reviewer
– Disadvantages: unwieldy for large files; requires sustained focus from reviewer; requires translation into a tab-delimited file


Evaluating Metadata (3)

Visual graphical analysis (Spotfire)
– Advantages: view of several data dimensions simultaneously; reviewer controls data display; tends to pull reviewer focus to anomalies; handles fairly large files at one time while allowing subset views; display manipulation possible without programmers
– Disadvantages: high cost of software; requires translation into a tab-delimited file
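Both the spreadsheet approach and the Spotfire approach require translating the metadata into a tab-delimited file. The minimal Python sketch below uses only the standard library. It assumes a hypothetical `<records><record id="…">` wrapper around unqualified Dublin Core elements; a real harvest would arrive inside an OAI-PMH envelope instead. The sketch writes one row per value, so repeated elements become repeated rows that sort and filter cleanly.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_to_tsv(xml_text: str) -> str:
    """Flatten DC records into tab-delimited rows: record id, element, value."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["record_id", "element", "value"])
    for rec in root.iter("record"):
        rid = rec.get("id", "")
        for child in rec:
            if child.tag.startswith("{" + DC_NS + "}"):
                element = child.tag[len(DC_NS) + 2:]  # strip "{namespace}"
                # Collapse internal whitespace (including tabs) so each value stays in one cell.
                value = " ".join((child.text or "").split())
                writer.writerow([rid, element, value])
    return out.getvalue()


DC_SAMPLE = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record id="r1"><dc:title>Field   notes</dc:title><dc:date>2003</dc:date></record>
  <record id="r2"><dc:title>Annual report</dc:title></record>
</records>"""
print(dc_to_tsv(DC_SAMPLE))
```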


Element Names vs Record Ids (Scatter Plot)


Missing Elements (Scatter Plot)
Figure callouts: 2 records without a language element; format element present inconsistently; easy to rescale axis on the fly and scroll through records
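The scatter plots give a presence/absence view of the data, and the same view can be produced as a plain report. The minimal Python sketch below assumes each record has been reduced to the set of element names it uses. The four sample records are hypothetical; they only mimic the kind of gaps the callouts describe.

```python
def missing_elements(records: dict[str, set[str]]) -> dict[str, list[str]]:
    """For every element seen in any record, list the record ids that lack it."""
    seen = set().union(*records.values())
    return {
        element: sorted(rid for rid, elements in records.items() if element not in elements)
        for element in sorted(seen)
    }


element_sets = {
    "r1": {"title", "language", "format"},
    "r2": {"title", "format"},
    "r3": {"title"},
    "r4": {"title", "language"},
}
for element, ids in missing_elements(element_sets).items():
    if ids:
        print(f"{element}: missing in {len(ids)} of {len(element_sets)} records -> {ids}")
```

An element missing from every record never appears in the report. Checking for such elements requires an expected element set, as in a completeness check.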


Table View
Figure callouts: only DC Date elements are selected for display, sorted by element value; the only W3CDTF syntax present is four digits; non-empty "no information" values may confuse end users
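The table view calls out two problems: dates that are not in W3CDTF syntax, and "no information" placeholder values. Both can be tallied automatically, as in the minimal Python sketch below. The regular expression covers the W3CDTF granularities: year, year-month, and full date, plus date-time to minutes, seconds, or fractions of a second with a time zone designator. The `NO_INFO` list is a hypothetical example of local placeholder strings.

```python
import re
from collections import Counter

# W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, or YYYY-MM-DDThh:mm[:ss[.s]] plus Z or +hh:mm/-hh:mm.
# Checks syntax only; impossible values such as month 13 still pass.
W3CDTF = re.compile(r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?")

# Hypothetical placeholder strings that carry no real information.
NO_INFO = {"unknown", "n.d.", "none", "not available", "[no date]"}


def classify_date(value: str) -> str:
    """Sort a DC Date value into empty / no-information / w3cdtf / non-conforming."""
    v = value.strip()
    if not v:
        return "empty"
    if v.lower() in NO_INFO:
        return "no-information"
    if W3CDTF.fullmatch(v):
        return "w3cdtf"
    return "non-conforming"


values = ["1998", "2004-05-12", "Unknown", "c1998", "May 2004", "n.d."]
print(Counter(classify_date(v) for v in values))
```

A value such as `2004-13-45` passes the pattern, so a further range check would be needed to catch it.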


Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies


… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the "general good"
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation


Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks: there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access?"


"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts -- Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on -- but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, "Two Paths to Interoperable Metadata"

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert a metadata record's content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks


Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards make sharing sometimes problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution
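The conversion problems above appear even in a tiny crosswalk. The minimal Python sketch below applies element-to-element mapping using only the two equivalences from the Godby quote: DC title to MARC 245, and DC author (the `creator` element) to MARC 100. It treats both MARC fields as non-repeatable, as MARC defines them. The table, function name, and sample record are hypothetical.

```python
# Hypothetical crosswalk table holding only the two equivalences quoted above:
# DC element -> (MARC tag, is the MARC field repeatable?)
CROSSWALK = {
    "title": ("245", False),
    "creator": ("100", False),
}


def convert(dc_record: dict[str, list[str]]):
    """Map a DC record to (tag, value) pairs; return what converted and what was lost."""
    fields, lost = [], []
    for element, values in dc_record.items():
        if element not in CROSSWALK:
            lost.extend((element, v) for v in values)  # no equivalence defined
            continue
        tag, repeatable = CROSSWALK[element]
        kept = values if repeatable else values[:1]  # non-repeatable: first value only
        fields.extend((tag, v) for v in kept)
        lost.extend((element, v) for v in values[len(kept):])
    return fields, lost


fields, lost = convert({
    "title": ["Field notes", "Notes from the field"],
    "creator": ["Smith, Jane"],
    "coverage": ["Alaska"],
})
print(fields)  # [('245', 'Field notes'), ('100', 'Smith, Jane')]
print(lost)    # [('title', 'Notes from the field'), ('coverage', 'Alaska')]
```

A production crosswalk would do more. It would route the extra title to another field (for example MARC 246, Varying Form of Title), add indicators and subfields, and could be expressed as a stylesheet. The sketch only makes visible the loss caused by differences in granularity and repeatability.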


Example Mapping: MODS:title to DC:title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort


Mapping MODS:title to DC:title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows for subelements for partNumber, partName

Best practice statement in DC-Lib says to include initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
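The mapping decisions above can be sketched in code. The minimal Python version below reads a single `<mods>` record and keeps the initial article from `<nonSort>` in the title, following the DC-Lib practice. It folds `subTitle`, `partNumber`, and `partName` into the flat string. As an assumption rather than a published mapping, it sends every typed `<titleInfo>` to the `alternative` refinement. The joining punctuation is likewise an illustrative choice.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def _text(parent, name, strip=True):
    """Text of a direct MODS child element, or "" if absent."""
    el = parent.find(MODS + name)
    if el is None or el.text is None:
        return ""
    return el.text.strip() if strip else el.text


def flatten_title(title_info) -> str:
    """Collapse one <titleInfo> into a single DC title string."""
    # Keep the initial article; MODS usually stores it with its trailing space ("The ").
    non_sort = _text(title_info, "nonSort", strip=False).lstrip()
    if non_sort and not non_sort.endswith((" ", "'")):
        non_sort += " "
    text = non_sort + _text(title_info, "title")
    sub = _text(title_info, "subTitle")
    if sub:
        text += ": " + sub
    part = ", ".join(p for p in (_text(title_info, "partNumber"), _text(title_info, "partName")) if p)
    if part:
        text += ". " + part
    return text


def mods_titles_to_dc(mods_xml: str) -> dict[str, list[str]]:
    """Untyped titleInfo -> title; any typed titleInfo -> alternative (an assumption)."""
    root = ET.fromstring(mods_xml)
    out = {"title": [], "alternative": []}
    for ti in root.findall(MODS + "titleInfo"):  # direct children only, not relatedItem titles
        out["alternative" if ti.get("type") else "title"].append(flatten_title(ti))
    return out


MODS_SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><nonSort>The </nonSort><title>collected letters</title>
    <partNumber>Volume 2</partNumber><partName>1890-1899</partName></titleInfo>
  <titleInfo type="uniform"><title>Correspondence</title></titleInfo>
</mods>"""
print(mods_titles_to_dc(MODS_SAMPLE))
# {'title': ['The collected letters. Volume 2, 1890-1899'], 'alternative': ['Correspondence']}
```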


Exercise

Evaluate a small set of human- and machine-created metadata


OAI-PMH Optional Containers

Repository level
– Rights
– Branding

Record level
– About (Provenance, Rights)
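Record-level `<about>` containers sit alongside `<header>` and `<metadata>` in each OAI-PMH record. The minimal Python sketch below reads one record and reports what its `about` containers hold. It assumes the provenance container layout from the OAI-PMH implementation guidelines: an `originDescription` with `baseURL`, `identifier`, `datestamp`, and `metadataNamespace`, plus `harvestDate` and `altered` attributes. Check that schema before relying on these names. The identifiers and URLs are placeholders.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
PROV = "{http://www.openarchives.org/OAI/2.0/provenance}"

# Hypothetical harvested record; identifiers and URLs are placeholders.
RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:example.org:123</identifier><datestamp>2005-01-01</datestamp></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>Field notes</dc:title></oai_dc:dc>
  </metadata>
  <about>
    <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
      <originDescription harvestDate="2004-12-01T00:00:00Z" altered="true">
        <baseURL>http://an.example.org/oai</baseURL>
        <identifier>oai:an.example.org:abc</identifier>
        <datestamp>2001-01-01</datestamp>
        <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
      </originDescription>
    </provenance>
  </about>
</record>"""


def about_summary(record_xml: str):
    """Return the record identifier, the root tag of each about payload, and provenance origins."""
    rec = ET.fromstring(record_xml)
    identifier = rec.findtext(f"{OAI}header/{OAI}identifier")
    payloads = [child.tag for about in rec.findall(OAI + "about") for child in about]
    origins = [
        (od.findtext(PROV + "baseURL"), od.get("altered"))
        for od in rec.iter(PROV + "originDescription")  # includes nested origins
    ]
    return identifier, payloads, origins


print(about_summary(RECORD))
```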


About Container ExampleAbout Container Example


OAI Rights Expressions

Rights expressions are valid at three levels
– Repository
– Set
– Record

Rights expressed at the repository and set levels are not a substitute for expressions at the record level

OAI Best Practices (DLF & NSDL)

Guidelines for data providers and service providers
– http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page

Best Practices for Shareable Metadata
– http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC


OAI In Practice

The UIUC OAI-PMH Data Provider Registry
– http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers
Link on home page to service providers
Provides multiple reports, sample records, browses, search, etc.
Ex.: Show report from left-hand menu, "Distinct Metadata Schemas"
– http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
– Choose a schema; look for providers and sample records
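Once the registry has pointed to a data provider, you need only its base URL and the OAI-PMH verbs to look at its metadata formats and sample records. The minimal Python sketch below only builds request URLs and makes no network calls. `http://example.org/oai` is a placeholder base URL.

```python
from urllib.parse import urlencode

VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH request URL (an HTTP GET query string)."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if "resumptionToken" in args and len(args) > 1:
        raise ValueError("resumptionToken is an exclusive argument")
    return base_url + "?" + urlencode({"verb": verb, **args})


base = "http://example.org/oai"  # placeholder; real base URLs come from the registry
print(oai_request(base, "ListMetadataFormats"))
print(oai_request(base, "ListRecords", metadataPrefix="oai_dc", set="physics"))
# "from" is a Python keyword, so pass date ranges through a dict:
print(oai_request(base, "ListIdentifiers", **{"metadataPrefix": "oai_dc", "from": "2005-01-01"}))
```

ListMetadataFormats shows which schemas a provider exposes. ListRecords with one of those prefixes returns sample records. When a response ends with a resumptionToken, the next request must carry that token as its only argument besides the verb.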


What's an OpenURL?

The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services

Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)


"The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material."

-- OpenURL Overview, SFX website


OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a "link server" or "link resolver"

Link server defines the user context
Takes source citation and determines whether a user has access


Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user's IP address
Obtains user attributes via the Shibboleth framework
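Of the mechanisms above, IP-based recognition is the simplest to illustrate. The minimal Python sketch below shows how a link server might determine the user context from the requester's address. It assumes hypothetical licensed ranges (RFC 5737 documentation addresses) and hypothetical target URLs. In a real resolver, cookies, certificates, and Shibboleth attributes would feed the same decision.

```python
import ipaddress

# Hypothetical campus ranges (RFC 5737 documentation addresses, not real networks).
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_on_campus(client_ip: str) -> bool:
    """IP-based user context: is the requester inside a licensed address range?"""
    addr = ipaddress.ip_address(client_ip)
    # Membership is simply False when an IPv6 address is tested against an IPv4 network.
    return any(addr in net for net in CAMPUS_NETWORKS)


def choose_target(client_ip: str, licensed_url: str, fallback_url: str) -> str:
    """Send licensed users to the full text and everyone else to a fallback such as the catalogue."""
    return licensed_url if is_on_campus(client_ip) else fallback_url


print(choose_target("192.0.2.17", "http://fulltext.example.org/a1", "http://catalog.example.org/a1"))
```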


Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library holdings of the cited journal


OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
– http://www.ukoln.ac.uk/distributed-systems/openurl
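The example URL is a resolver base URL followed by citation metadata as key=value pairs. The minimal Python sketch below builds the same URL from its parts and then parses it back, as a resolver would. The keys follow the example above (OpenURL 0.1 style). The later ANSI/NISO Z39.88-2004 standard (OpenURL 1.0) uses prefixed keys such as `rft.issn` instead.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_openurl(resolver_base: str, **citation: str) -> str:
    """Append citation metadata to a link resolver's base URL as key=value pairs."""
    present = {key: value for key, value in citation.items() if value}  # drop empty fields
    return resolver_base + "?" + urlencode(present)


url = build_openurl(
    "http://sfxserver.uni.edu/sfxmenu",
    issn="1234-5678", date="1998", volume="12", issue="2", spage="134",
)
print(url)
# http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

# The resolver side reverses the process to recover the citation.
print(parse_qs(urlsplit(url).query))
```

`urlencode` escapes values that contain spaces or reserved characters. As a result, a citation with a free-text article title still produces a valid link.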


Defining and Ensuring Metadata Quality

What constitutes quality?
Techniques for evaluating and enforcing consistency and predictability
Automated metadata creation: advantages and disadvantages
Metadata maintenance strategies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2727

Beginning to Define QualityBeginning to Define Quality

Experience of the library Experience of the library community--BIBCO amp NACOcommunity--BIBCO amp NACOndash Agreed upon standards for library Agreed upon standards for library

qualityqualityndash Training and documentation in support Training and documentation in support

of practitionersof practitionersndash Review and enforcement of standards Review and enforcement of standards

by means of institutional ldquobuddy by means of institutional ldquobuddy systemrdquosystemrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2828

How Does Quality HappenHow Does Quality Happen

Lessons from the library communityLessons from the library communityndash Quality is quantifiable and measurableQuality is quantifiable and measurablendash To be effective enforcement of standards To be effective enforcement of standards

of quality must take place at the of quality must take place at the community levelcommunity level

FurthermoreFurthermorendash Data problems are not unique to particular Data problems are not unique to particular

communitiescommunitiesndash general strategies can improve general strategies can improve

interoperabilityinteroperability

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2929

Quality Measurement CriteriaQuality Measurement Criteria

CompletenessCompleteness AccuracyAccuracy ProvenanceProvenance Conformance to expectationsConformance to expectations Logical consistency and coherenceLogical consistency and coherence Timeliness (Currency and Lag)Timeliness (Currency and Lag) AccessibilityAccessibility

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3030

CompletenessCompleteness

ldquoldquoMetadata should describe the target Metadata should describe the target objects as completely as objects as completely as economically feasiblerdquoeconomically feasiblerdquo

ldquoldquoElement set should be applied to Element set should be applied to the target object population as the target object population as completely as possiblerdquocompletely as possiblerdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3131

AccuracyAccuracy

Information provided in values Information provided in values should be correct and factualshould be correct and factual

Editing applied toEditing applied tondash Eliminate typosEliminate typosndash Ensure conforming name expressionsEnsure conforming name expressionsndash Ensure standard abbreviations usages Ensure standard abbreviations usages

in generalin general

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3232

ProvenanceProvenance

Who prepared the metadata What Who prepared the metadata What do we know about the preparerdo we know about the preparer

What methods were used to create What methods were used to create the metadata Is it human created or the metadata Is it human created or created by machinecreated by machine

What transformations have been What transformations have been applied since creationapplied since creation

Where has it been beforeWhere has it been before

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp Applications

40

Element Names vs Record Ids (Scatter Plot)

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert a metadata record's content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
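
The element-to-element mapping step can be sketched in Python as a lookup table. This is a minimal, hypothetical sketch: the table holds only the two equivalences Godby cites (DC title to MARC 245, DC creator to MARC 100), records are assumed to be plain dictionaries of element name to list of values, and the `crosswalk` function name is illustrative. It does no hierarchy resolution, content conversion, or subfield coding, but it does report which source elements had no target, which is where data loss shows up.

```python
# Hypothetical crosswalk table: Dublin Core element -> MARC 21 tag.
# Only the two equivalences cited by Godby are included; a real crosswalk
# would also specify indicators, subfields and content conversions.
DC_TO_MARC = {
    "title": "245",
    "creator": "100",
}


def crosswalk(record, table):
    """Map a {element: [values]} record through an element-to-element table.

    Returns (converted, unmapped): `converted` is keyed by target field, and
    `unmapped` lists source elements with no equivalent in the target, i.e.
    data that a naive conversion would silently lose.
    """
    converted = {}
    unmapped = []
    for element, values in record.items():
        target = table.get(element)
        if target is None:
            unmapped.append(element)
            continue
        # Note: no repeatability check here, although MARC 100 is non-repeatable.
        converted.setdefault(target, []).extend(values)
    return converted, unmapped


if __name__ == "__main__":
    record = {
        "title": ["Example title"],
        "creator": ["Woodley, Mary"],
        "coverage": ["20th century"],
    }
    converted, unmapped = crosswalk(record, DC_TO_MARC)
    print(converted)  # {'245': ['Example title'], '100': ['Woodley, Mary']}
    print(unmapped)   # ['coverage']
```

Even this toy mapping hides a repeatability mismatch: DC creator may repeat, but MARC 100 is non-repeatable, so a fuller crosswalk would send additional names to MARC 700.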

Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards make sharing sometimes problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution

Example Mapping: MODS title to DC title

Includes an attribute for the type of title:
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes:
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort

Mapping MODS title to DC title

DC has one element refinement: Alternative

DC title has no substructure; MODS allows for subelements for partNumber and partName

Best practice statement in DC-Lib says to include the initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
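
The flattening described above can be sketched with Python's standard `xml.etree.ElementTree`. This is a hedged illustration, not the Library of Congress MODS-to-DC mapping: the joining punctuation (": " before a subtitle, ". " before part number and part name) is an assumption, `subTitle` is included because MODS defines it even though the slide does not list it, and every typed `titleInfo` (abbreviated, translated, alternative, uniform) is sent to the single DC refinement, Alternative, which makes visible how the type, ID, authority, and xLink information is dropped. The sample record is hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def _child_text(parent, tag):
    """Return the stripped text of the first mods:<tag> child, or None."""
    child = parent.find(f"mods:{tag}", MODS_NS)
    if child is None or child.text is None:
        return None
    return child.text.strip() or None


def flatten_title_info(info):
    """Collapse a structured mods:titleInfo into one flat DC title string."""
    non_sort = _child_text(info, "nonSort")
    value = _child_text(info, "title") or ""
    if non_sort:
        # DC-Lib best practice keeps the initial article in the title string.
        value = f"{non_sort} {value}"
    sub_title = _child_text(info, "subTitle")
    if sub_title:
        value += f": {sub_title}"
    for tag in ("partNumber", "partName"):
        part = _child_text(info, tag)
        if part:
            value += f". {part}"
    return value


def mods_titles_to_dc(mods_xml):
    """Return (dc_element, value) pairs for every titleInfo in one MODS record."""
    root = ET.fromstring(mods_xml)
    pairs = []
    for info in root.findall("mods:titleInfo", MODS_NS):
        # Assumption: any typed title goes to the one DC refinement, Alternative.
        element = "alternative" if info.get("type") else "title"
        pairs.append((element, flatten_title_info(info)))
    return pairs


# Hypothetical sample record.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>The</nonSort>
    <title>annals of metadata</title>
    <partNumber>Part 2</partNumber>
    <partName>Crosswalks</partName>
  </titleInfo>
  <titleInfo type="abbreviated">
    <title>Ann. metadata</title>
  </titleInfo>
</mods>"""

if __name__ == "__main__":
    for element, value in mods_titles_to_dc(SAMPLE):
        print(element, "=", value)
```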

Exercise

Evaluate a small set of human- and machine-created metadata

About Container Example

OAI Rights Expressions

Rights expressions are valid at three levels:
– Repository
– Set
– Record

Rights expressed at the Repository and Set levels are not a substitute for expressions at the Record level

OAI Best Practices (DLF & NSDL)

Guidelines for data providers and service providers
– http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page

Best Practices for Shareable Metadata
– http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC

OAI In Practice

The UIUC OAI-PMH Data Provider Registry
– http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers
Link on home page to Service Providers
Provides multiple reports, sample records, browses, search, etc.
Ex.: Show the “Distinct Metadata Schemas” report from the left-hand menu
– http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
– Choose a schema; look for providers and sample records
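
Harvesters and registries typically gather this kind of information by sending OAI-PMH requests to each data provider. A minimal Python sketch of such requests, using only the standard library, follows. The base URL is a placeholder, not a real provider, and the helper names are illustrative; the verbs (`Identify`, `ListMetadataFormats`, `ListRecords`), the `metadataPrefix` and `resumptionToken` arguments, and the OAI-PMH 2.0 namespace come from the protocol. `ListMetadataFormats` returns the schemas a provider supports, the kind of information a “Distinct Metadata Schemas” report summarizes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Placeholder base URL; substitute a real data provider's OAI-PMH endpoint.
BASE_URL = "http://example.org/oai"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def oai_request(verb, **arguments):
    """Build an OAI-PMH GET request URL for a verb and its arguments."""
    return f"{BASE_URL}?{urlencode({'verb': verb, **arguments})}"


def next_resumption_token(response_xml):
    """Return the resumptionToken of a list response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken means the list is complete.
    if token is None or not token.text:
        return None
    return token.text


if __name__ == "__main__":
    print(oai_request("Identify"))
    print(oai_request("ListMetadataFormats"))
    print(oai_request("ListRecords", metadataPrefix="oai_dc"))
    # Follow-up pages carry only the token, which is an exclusive argument.
    print(oai_request("ListRecords", resumptionToken="example-token"))
```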

What’s an OpenURL?

The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services

Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)

“The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, and agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material.”

-- OpenURL Overview, SFX website

OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a “link server” or “link resolver”

Link server defines the user context
Takes the source citation and determines whether a user has access

Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user’s IP address

Obtains user attributes via the Shibboleth framework
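
Of the four mechanisms, the IP-address check is the simplest to illustrate. The sketch below assumes a link server that keeps a list of an institution's licensed network ranges; the ranges shown are RFC 5737 documentation networks standing in for real ones, and `is_authorized_ip` is a hypothetical helper name. Cookie, certificate, and Shibboleth-based approaches are not shown.

```python
import ipaddress

# Hypothetical licensed ranges; these are RFC 5737 documentation networks.
LICENSED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_authorized_ip(client_ip):
    """Hypothetical helper: True if the client address is in a licensed network."""
    address = ipaddress.ip_address(client_ip)
    # Membership is simply False when an IPv6 address is tested against IPv4 ranges.
    return any(address in network for network in LICENSED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "198.51.100.200", "2001:db8::1"):
        print(ip, is_authorized_ip(ip))
```

An IP check identifies where a request comes from, not who the user is, which is one reason attribute-based approaches such as Shibboleth appear in the same list.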

Examples of Extended Service Links

From a record in an abstracting and indexing database (A&I) to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library holdings of the cited journal

OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
– http://www.ukoln.ac.uk/distributed-systems/openurl
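
The example URL is a resolver base address followed by citation metadata as key/value pairs. A minimal Python sketch that splits it apart and rebuilds it with the standard `urllib.parse` functions follows. It assumes the early (version 0.1) key style used in the example (`issn`, `date`, `volume`, `issue`, `spage`); the later ANSI/NISO Z39.88-2004 standard uses different keys (for instance `rft.issn`), which this sketch does not handle. The function names are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXAMPLE = (
    "http://sfxserver.uni.edu/sfxmenu"
    "?issn=1234-5678&date=1998&volume=12&issue=2&spage=134"
)


def parse_openurl(url):
    """Split an OpenURL into (resolver base URL, citation metadata dict)."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs returns a list per key; these citation keys are single-valued.
    citation = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return base, citation


def build_openurl(base, citation):
    """Attach citation metadata to a resolver base URL as a query string."""
    return f"{base}?{urlencode(citation)}"


if __name__ == "__main__":
    base, citation = parse_openurl(EXAMPLE)
    print(base)      # http://sfxserver.uni.edu/sfxmenu
    print(citation)  # {'issn': '1234-5678', 'date': '1998', 'volume': '12', 'issue': '2', 'spage': '134'}
```

Because the metadata travels in the link rather than pointing at a fixed target, the same citation can be sent to whichever institution's resolver serves the user by swapping only the base URL.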

Defining and Ensuring Metadata Quality

What constitutes quality?
Techniques for evaluating and enforcing consistency and predictability

Automated metadata creation: advantages and disadvantages

Metadata maintenance strategies

Beginning to Define Quality

Experience of the library community--BIBCO & NACO
– Agreed-upon standards for library quality
– Training and documentation in support of practitioners
– Review and enforcement of standards by means of an institutional “buddy system”

How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of standards of quality must take place at the community level

Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability

Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (currency and lag)
Accessibility

Completeness

“Metadata should describe the target objects as completely as economically feasible”

“Element set should be applied to the target object population as completely as possible”
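
One way to make the completeness criterion measurable, in line with the idea that quality is quantifiable, is to score each record against the elements a profile expects. The Python sketch below is illustrative only: the expected element list is a hypothetical simple Dublin Core profile, records are assumed to be dictionaries of element name to list of values, and a whitespace-only value is treated as absent.

```python
# Hypothetical profile: the elements this collection expects every record to carry.
EXPECTED_ELEMENTS = ["title", "creator", "date", "format", "language", "rights"]


def completeness(record, expected=EXPECTED_ELEMENTS):
    """Return the fraction of expected elements with at least one non-blank value."""
    present = sum(
        1 for element in expected
        if any(value.strip() for value in record.get(element, []))
    )
    return present / len(expected)


def population_completeness(records, expected=EXPECTED_ELEMENTS):
    """Average record completeness across a set of records."""
    if not records:
        return 0.0
    return sum(completeness(r, expected) for r in records) / len(records)
```

The average corresponds loosely to the second quoted principle (applying the element set across the whole object population); whether a gap is “economically feasible” to fill remains a human judgment the score cannot make.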

Accuracy

Information provided in values should be correct and factual

Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviations and usage in general

Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or created by machine?

What transformations have been applied since creation?

Where has it been before?

Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata is reflective of community thinking about necessary compromises
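
Checking values against an explicitly exposed controlled vocabulary is easy to automate. The Python sketch below uses the DCMI Type Vocabulary as the example list for a `type` element; the record layout and the `nonconforming_values` helper are assumptions for illustration. It is a case-sensitive exact match, so “text” is flagged even though “Text” is valid, which is the kind of near-miss a reviewer would want to see.

```python
# The DCMI Type Vocabulary terms, used here as an example controlled list.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}


def nonconforming_values(records, element, vocabulary):
    """List (record_id, value) pairs whose value is not in the vocabulary."""
    problems = []
    for record_id, record in records.items():
        for value in record.get(element, []):
            if value not in vocabulary:
                problems.append((record_id, value))
    return problems


if __name__ == "__main__":
    records = {
        "rec1": {"type": ["Text"]},
        "rec2": {"type": ["text", "Image"]},
        "rec3": {"type": ["Photograph"]},
    }
    print(nonconforming_values(records, "type", DCMI_TYPES))
    # [('rec2', 'text'), ('rec3', 'Photograph')]
```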

Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values
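
Heavy reliance on defaulted values can be spotted heuristically: a value that appears in nearly every record of a set was probably supplied by a template or default rather than by a describer. In the Python sketch below, the 90% threshold and the `suspected_defaults` name are arbitrary assumptions; a hit is a prompt for review, not proof of a problem.

```python
from collections import Counter


def suspected_defaults(records, element, threshold=0.9):
    """Return values of `element` found in at least `threshold` of the records."""
    if not records:
        return []
    counts = Counter()
    for record in records:
        # Count each distinct value once per record, even if repeated.
        counts.update(set(record.get(element, [])))
    return sorted(
        value for value, count in counts.items()
        if count / len(records) >= threshold
    )


if __name__ == "__main__":
    records = [{"publisher": ["Unknown"]} for _ in range(9)]
    records.append({"publisher": ["Example University Press"]})
    print(suspected_defaults(records, "publisher"))  # ['Unknown']
```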

Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object disseminated before some or all metadata is available

“Metadata aging” is affected by cultural differences between librarians and technologists
– Librarians: once and it’s done
– Technologists: metadata as an iterative process

Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as “premium” or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)

Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages
  Includes some formatting and color coding
– Disadvantages
  Assumes consistency/predictability
  Difficult to determine extent of problems found
  Tedious at best

Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages
  Better sorting and control by reviewer
– Disadvantages
  Unwieldy for large files
  Requires sustained focus from reviewer
  Requires translation into tab-delimited file
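
The translation into a tab-delimited file that spreadsheet (and Spotfire) review requires can be sketched with Python's standard `csv` module. Repeatable elements do not fit a flat grid, so this sketch joins repeated values with " | "; that separator, the record layout, and the `records_to_tsv` name are assumptions.

```python
import csv
import io


def records_to_tsv(records, elements):
    """Render {record_id: {element: [values]}} as tab-delimited text.

    One row per record and one column per element; repeated values are
    joined with " | " so they stay in a single cell.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", *elements])
    for record_id, record in records.items():
        row = [" | ".join(record.get(element, [])) for element in elements]
        writer.writerow([record_id, *row])
    return buffer.getvalue()


if __name__ == "__main__":
    records = {
        "rec1": {"title": ["Map of Ohio"], "subject": ["Maps", "Ohio"]},
        "rec2": {"title": ["Field notes"], "date": ["1998"]},
    }
    print(records_to_tsv(records, ["title", "subject", "date"]))
```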

Evaluating Metadata (3)

Visual graphical analysis (Spotfire)
– Advantages
  View of several data dimensions simultaneously
  Reviewer controls data display
  Tends to pull reviewer focus to anomalies
  Handles fairly large files at one time while allowing subset views
  Display manipulation possible without programmers
– Disadvantages
  High cost of software
  Requires translation into tab-delimited file

Element Names vs. Record IDs (Scatter Plot)

Missing Elements (Scatter Plot)
– 2 records without language element
– Format element present inconsistently
– Easy to rescale axis on the fly and scroll through records
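
The presence/absence data behind a scatter plot like this can also be produced as a plain report. The Python sketch below lists, for each element of interest, the records that lack it entirely; the record layout and the `missing_by_element` name are assumptions, and it does not reproduce Spotfire's interactive display.

```python
def missing_by_element(records, elements):
    """Map each element to the sorted ids of records that have no value for it."""
    return {
        element: sorted(
            record_id for record_id, record in records.items()
            if not record.get(element)
        )
        for element in elements
    }


if __name__ == "__main__":
    records = {
        "rec1": {"language": ["en"], "format": ["text/html"]},
        "rec2": {"format": ["text/html"]},
        "rec3": {"language": ["fr"]},
    }
    print(missing_by_element(records, ["language", "format"]))
    # {'language': ['rec2'], 'format': ['rec3']}
```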

Table View
– Non-empty “no information” values that may confuse end users
– Only DC Date elements are selected for display
– The only W3CDTF syntax present is four digits
– Sorted by element value
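
Both problems flagged in the table view, placeholder strings and inconsistent date syntax, can be caught with a simple check. The Python sketch below tests values against the W3CDTF profile of ISO 8601 (year, year-month, full date, or date-time with a time zone designator) using a regular expression that checks syntax only, not whether months or days are in range. The placeholder list is a hypothetical example, not a standard.

```python
import re

# W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, or a full date plus hh:mm[:ss[.s]] and a
# time zone designator (Z or +hh:mm / -hh:mm). Syntax only; ranges unchecked.
W3CDTF = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)

# Hypothetical "no information" strings that look like data but carry none.
PLACEHOLDERS = {"no information", "unknown", "n.d.", "none"}


def classify_date(value):
    """Label a date value as 'placeholder', 'w3cdtf', or 'other'."""
    cleaned = value.strip()
    if cleaned.lower() in PLACEHOLDERS:
        return "placeholder"
    if W3CDTF.fullmatch(cleaned):
        return "w3cdtf"
    return "other"


if __name__ == "__main__":
    for value in ("1998", "1998-05-12", "1998-05-12T10:30Z", "c1998", "No information"):
        print(value, "->", classify_date(value))
```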

Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies

… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the “general good”
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 17: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1717

OAI Rights ExpressionsOAI Rights Expressions

Rights expressions are valid at three Rights expressions are valid at three levelslevelsndash RepositoryRepositoryndash SetSetndash RecordRecord

Rights expressed at the Repository Rights expressed at the Repository and Set levels are not a substitute for and Set levels are not a substitute for expressions at the Record Levelexpressions at the Record Level

OAI Best Practices (DLF amp OAI Best Practices (DLF amp NSDL)NSDL)

Guidelines for data providers and Guidelines for data providers and service providersservice providersndash httpwebservicesitcsumicheduhttpwebservicesitcsumichedu

mediawikioaibpindexphpMain_Page1048708mediawikioaibpindexphpMain_Page1048708 Best Practices for Shareable Best Practices for Shareable

MetadataMetadatandash httpwebservicesitcsumicheduhttpwebservicesitcsumichedu

mediawikioaibpPublicTOC1048708mediawikioaibpPublicTOC1048708

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1818

OAI In PracticeOAI In Practice

The UIUC OAI-PMH Data Provider RegistryThe UIUC OAI-PMH Data Provider Registryndash httpgitagraingeruiuceduregistrysearchformasp

Includes most known data providersIncludes most known data providers Link on home page to Service ProvidersLink on home page to Service Providers Provides multiple reports sample records Provides multiple reports sample records

browses search etcbrowses search etc Ex Show report from left hand menu Ex Show report from left hand menu

ldquoDistinct Metadata Schemasrdquo ldquoDistinct Metadata Schemasrdquo ndash httpgitagraingeruiuceduregistryListSchemasasp ndash Choose a schema look for providers and sample records Choose a schema look for providers and sample records

Metadata Standards amp ApplicationsMetadata Standards amp Applications 1919

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2020

Whatrsquos an OpenURLWhatrsquos an OpenURL

The OpenURL provides a standardized The OpenURL provides a standardized format for transporting bibliographic format for transporting bibliographic metadata about objects between metadata about objects between information servicesinformation services

Provides a basis for building services via Provides a basis for building services via the notion of an the notion of an extended service-linkextended service-link which moves beyond the classic notion of which moves beyond the classic notion of a a reference linkreference link (a link from metadata to (a link from metadata to the full-content described by the the full-content described by the metadata)metadata)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2121

ldquoldquoThe OpenURL standard enables a user who has The OpenURL standard enables a user who has retrieved an article citation for example to obtain retrieved an article citation for example to obtain immediate access to the most appropriate copy of immediate access to the most appropriate copy of that object through the implementation of extended that object through the implementation of extended linking services The selection of the best copy is linking services The selection of the best copy is based on user and organizational preferences based on user and organizational preferences regarding the location of the copy its cost and regarding the location of the copy its cost and agreements with information suppliers and similar agreements with information suppliers and similar considerations This selection occurs without the considerations This selection occurs without the knowledge of the user it is made possible by the knowledge of the user it is made possible by the transport of metadata with the OpenURL link from transport of metadata with the OpenURL link from the source citation to a resolver (the link server) the source citation to a resolver (the link server) which stores the preference information and the which stores the preference information and the links to the appropriate materialrdquolinks to the appropriate materialrdquo

--OpenURL Overview SFX website--OpenURL Overview SFX website

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2222

OpenURL CharacteristicsOpenURL Characteristics

Protocol operates between an Protocol operates between an information resource and a service information resource and a service componentcomponent

Service component is called a ldquolink Service component is called a ldquolink serverrdquo or ldquolink resolverrdquoserverrdquo or ldquolink resolverrdquo

Link server defines the user contextLink server defines the user context Takes source citation and determines Takes source citation and determines

whether a user has accesswhether a user has access

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2323

Distinguishing UsersDistinguishing Users

Uses information stored in a cookie Uses information stored in a cookie (the CookiePusher mechanism)(the CookiePusher mechanism)

Uses information contained in a Uses information contained in a digital certificate such as the one digital certificate such as the one proposed by the DLF digital proposed by the DLF digital certificates prototype projectcertificates prototype project

Identifies a users IP addressIdentifies a users IP address Obtains user attributes via the Obtains user attributes via the

Shibboleth frameworkShibboleth framework

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2424

Examples of Extended Service Links

• From a record in an abstracting and indexing (A&I) database to the full text described by the record
• From a record describing a book in a library catalogue to a description of the same book in an Internet book shop
• From a reference in a journal article to a record matching that reference in an A&I database
• From a citation in a journal article to a record in a library catalogue that shows the library's holdings of the cited journal


OpenURL Examples & Demo

• http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134
• An OpenURL demo
  – http://www.ukoln.ac.uk/distributed-systems/openurl/
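The example link above is an early-style OpenURL: a resolver base URL followed by the citation's metadata as query parameters. A minimal sketch of how a source (an A&I database, say) might generate such a link from a citation record, using only the standard library; the `make_openurl` name and the `citation` dictionary are illustrative.

```python
from urllib.parse import urlencode

def make_openurl(resolver_base: str, citation: dict[str, str]) -> str:
    """Append citation metadata to a link-resolver base URL as an OpenURL query string."""
    # urlencode keeps the dict's insertion order and percent-encodes unsafe characters.
    return resolver_base + "?" + urlencode(citation)

citation = {"issn": "1234-5678", "date": "1998", "volume": "12", "issue": "2", "spage": "134"}
print(make_openurl("http://sfxserver.uni.edu/sfxmenu", citation))
# http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134
```

Because only the base URL is institution-specific, the same citation metadata can be sent to whichever resolver serves the current user.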


Defining and Ensuring Metadata Quality

• What constitutes quality?
• Techniques for evaluating and enforcing consistency and predictability
• Automated metadata creation: advantages and disadvantages
• Metadata maintenance strategies


Beginning to Define Quality

• Experience of the library community: BIBCO & NACO
  – Agreed-upon standards for library quality
  – Training and documentation in support of practitioners
  – Review and enforcement of standards by means of an institutional "buddy system"


How Does Quality Happen?

• Lessons from the library community
  – Quality is quantifiable and measurable
  – To be effective, enforcement of standards of quality must take place at the community level
• Furthermore
  – Data problems are not unique to particular communities
  – General strategies can improve interoperability


Quality Measurement Criteria

• Completeness
• Accuracy
• Provenance
• Conformance to expectations
• Logical consistency and coherence
• Timeliness (currency and lag)
• Accessibility


Completeness

• "Metadata should describe the target objects as completely as economically feasible"
• "Element set should be applied to the target object population as completely as possible"
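Completeness in the second sense (how fully an element set is applied across a population of records) can be measured directly. A minimal sketch, assuming records have already been parsed into dictionaries mapping Dublin Core element names to lists of values; the sample records are hypothetical, and whitespace-only values are counted as missing.

```python
def element_completeness(records: list[dict[str, list[str]]], elements: list[str]) -> dict[str, float]:
    """Fraction of records with at least one non-blank value for each element."""
    result = {}
    for element in elements:
        filled = sum(
            1 for record in records
            if any(value.strip() for value in record.get(element, []))
        )
        result[element] = filled / len(records) if records else 0.0
    return result

records = [
    {"title": ["Map of Ithaca"], "language": ["en"], "date": ["1998"]},
    {"title": ["Field notes"], "language": [" "], "date": ["2001-05"]},
    {"title": ["Survey photographs"], "date": []},
]
print(element_completeness(records, ["title", "language", "date"]))
# {'title': 1.0, 'language': 0.3333333333333333, 'date': 0.6666666666666666}
```

A per-element score like this makes the "economically feasible" trade-off visible: a low score for a required element is a clear target for remediation.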


Accuracy

• Information provided in values should be correct and factual
• Editing applied to:
  – Eliminate typos
  – Ensure conforming name expressions
  – Ensure standard abbreviations and usages in general


Provenance

• Who prepared the metadata? What do we know about the preparer?
• What methods were used to create the metadata? Is it human-created or machine-created?
• What transformations have been applied since creation?
• Where has it been before?


Conformance to Expectations

• Contains the elements a community would expect to find
• Controlled vocabularies are well chosen and explicitly exposed to downstream users
• Metadata reflects community thinking about necessary compromises


Logical Consistency/Coherence

• Standard mechanisms like application profiles and common crosswalks are used
• Similar structures and appearance are enabled for search results
• There is very limited reliance on defaulted values
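Reliance on defaulted values can be screened for roughly by flagging elements where a single value dominates, which often means a default was filled in automatically. A sketch under the same records-as-dictionaries assumption; the function name and the 0.9 threshold are arbitrary choices, and a hit is only a candidate for review, not proof of a default.

```python
from collections import Counter
from typing import Optional

def suspected_default(records: list[dict[str, list[str]]], element: str,
                      threshold: float = 0.9) -> Optional[tuple[str, float]]:
    """Return (value, share) when one value makes up at least `threshold` of an element's values.

    A dominant value may still be legitimate (every object really is a
    photograph), so results need human review.
    """
    values = [value for record in records for value in record.get(element, [])]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    share = count / len(values)
    return (value, share) if share >= threshold else None

records = [{"rights": ["All rights reserved"]}] * 19 + [{"rights": ["Public domain"]}]
print(suspected_default(records, "rights"))  # ('All rights reserved', 0.95)
```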


Timeliness

• Currency
  – Target object changes but metadata does not
• Lag
  – Target object disseminated before some or all metadata is available
• "Metadata aging" is affected by cultural differences between librarians and technologists
  – Librarians: once and it's done
  – Technologists: metadata as an iterative process
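Currency and lag can both be expressed as comparisons between two dates: when the target object changed or was disseminated, and when its metadata was updated or became available. A minimal sketch; the function and parameter names are hypothetical, and where those dates come from (repository logs, record headers) is left open.

```python
from datetime import date

def is_stale(object_modified: date, metadata_modified: date) -> bool:
    """Currency problem: the object changed after its metadata was last updated."""
    return object_modified > metadata_modified

def lag_days(object_released: date, metadata_available: date) -> int:
    """Lag: days between dissemination of the object and availability of its metadata (0 if none)."""
    return max(0, (metadata_available - object_released).days)

print(is_stale(date(2005, 3, 1), date(2004, 1, 15)))  # True
print(lag_days(date(2005, 1, 1), date(2005, 1, 31)))  # 30
```

Run over a whole collection, these checks give a numeric picture of "metadata aging" instead of an impression.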


Accessibility

• Barriers to accessibility may be economic, technical, or organizational
  – Metadata as "premium" or proprietary information
  – Unreadable for technical reasons (file formats, etc.)
  – Metadata may not be properly linked to relevant object(s)


Evaluating Metadata (1)

• Random sampling (XMLSpy)
  – Advantages
    · Includes some formatting and color coding
  – Disadvantages
    · Assumes consistency/predictability
    · Difficult to determine extent of problems found
    · Tedious at best
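Drawing the sample is the easy part to automate, even though inspecting it stays manual. A sketch that picks a reproducible random sample of record identifiers for review; the function name, seed, and sample size are illustrative, and the chosen records would still be opened in XMLSpy or another viewer.

```python
import random

def sample_for_review(record_ids: list[str], k: int, seed: int = 42) -> list[str]:
    """Pick up to k distinct record IDs at random; a fixed seed lets a second reviewer draw the same sample."""
    rng = random.Random(seed)
    return rng.sample(record_ids, min(k, len(record_ids)))

ids = [f"rec-{n:04d}" for n in range(1, 2001)]
print(sample_for_review(ids, 10))
```

Fixing the seed addresses one of the listed weaknesses only partially: the sample becomes repeatable, but it still says little about how widespread a problem is.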


Evaluating Metadata (2)

• Spreadsheets (Microsoft Excel)
  – Advantages
    · Better sorting and control by reviewer
  – Disadvantages
    · Unwieldy for large files
    · Requires sustained focus from reviewer
    · Requires translation into tab-delimited file
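The translation into a tab-delimited file can be scripted. A minimal sketch using the standard library, assuming simple unqualified Dublin Core XML (elements in the `http://purl.org/dc/elements/1.1/` namespace) wrapped in hypothetical `<records>`/`<record>` elements; repeated elements are joined into one cell with " | ".

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

def dc_xml_to_tsv(xml_text: str, elements: list[str]) -> str:
    """Flatten <record> elements with dc:* children into TSV, one row per record."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["record"] + elements)
    for number, record in enumerate(root.iter("record"), start=1):
        row = [str(number)]
        for element in elements:
            # Repeated elements (e.g. several dc:subject values) share one cell.
            nodes = record.findall(f"{{{DC_NS}}}{element}")
            row.append(" | ".join((node.text or "").strip() for node in nodes))
        writer.writerow(row)
    return out.getvalue()

sample = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record><dc:title>Map of Ithaca</dc:title><dc:subject>Maps</dc:subject><dc:subject>Ithaca</dc:subject></record>
  <record><dc:title>Field notes</dc:title></record>
</records>"""
print(dc_xml_to_tsv(sample, ["title", "subject"]))
```

An empty cell then marks a missing element, which is exactly what sorting in a spreadsheet or plotting in a visual tool brings to the surface.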


Evaluating Metadata (3)

• Visual graphical analysis (Spotfire)
  – Advantages
    · View of several data dimensions simultaneously
    · Reviewer controls data display
    · Tends to pull reviewer focus to anomalies
    · Handles fairly large files at one time while allowing subset views
    · Display manipulation possible without programmers
  – Disadvantages
    · High cost of software
    · Requires translation into tab-delimited file


Element Names vs Record Ids (Scatter Plot)


Missing Elements (Scatter Plot)

[Figure callouts: 2 records without a language element; format element present inconsistently; easy to rescale the axis on the fly and scroll through records]
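The scatter plot is, in effect, a presence matrix of records against element names. The same gaps can be listed without visualization software; a sketch with hypothetical record IDs, assuming each record has been reduced to the set of element names it contains.

```python
def missing_elements(records: dict[str, set[str]], expected: list[str]) -> dict[str, list[str]]:
    """Map each expected element to the sorted IDs of records that lack it."""
    return {
        element: sorted(rid for rid, present in records.items() if element not in present)
        for element in expected
    }

records = {
    "rec-001": {"title", "creator", "language", "format"},
    "rec-002": {"title", "creator", "format"},
    "rec-003": {"title", "creator"},
}
print(missing_elements(records, ["title", "language", "format"]))
# {'title': [], 'language': ['rec-002', 'rec-003'], 'format': ['rec-003']}
```

The plot remains better for spotting patterns across thousands of records; a list like this is better for handing specific records to someone to fix.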


Table View

[Figure callouts: non-empty "no information" values that may confuse end users; only DC Date elements are selected for display; the only W3CDTF syntax present is four digits; sorted by element value]
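Both problems called out in the table view (placeholder values that are not empty, and dates using only the simplest W3CDTF form) can be flagged automatically. A sketch using a regular expression for the W3CDTF profile of ISO 8601 (YYYY, YYYY-MM, YYYY-MM-DD, or a complete date plus hh:mm, optional seconds and fraction, and a time zone designator). The placeholder list is an assumption that would need local tuning.

```python
import re

# W3CDTF syntax check only: it does not reject impossible values such as month 13.
W3CDTF = re.compile(r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?")

# Hypothetical strings that look like data but carry none.
PLACEHOLDERS = {"no information", "unknown", "n/a", "none"}

def classify_date(value: str) -> str:
    """Label a DC Date value as 'empty', 'placeholder', 'w3cdtf', or 'other'."""
    text = value.strip()
    if not text:
        return "empty"
    if text.lower() in PLACEHOLDERS:
        return "placeholder"
    if W3CDTF.fullmatch(text):
        return "w3cdtf"
    return "other"

for value in ["1998", "1998-05-14", "No information", "May 1998", "2004-10-01T14:30Z"]:
    print(value, "->", classify_date(value))
```

Counting the labels per collection turns the table view's observations into numbers that can be tracked over time.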


Improving Metadata Quality ...

• Documentation
  – Basic standards, best practice guidelines, examples
  – Exposure and maintenance of local and community vocabularies
  – Application profiles
  – Training materials, tools, methodologies


... Over Time

• Culture change
  – Support for documentation and exchange of knowledge and experience
  – Routine contribution to the "general good"
  – More focused research on practical metadata use and quality considerations
  – Better project-based and community-wide documentation


Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access?"


"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts -- Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on -- but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, "Two Paths to Interoperable Metadata"

Crosswalks

• In general: semantic mapping of elements between source and target metadata standards
• The process of metadata conversion specification includes the transformations required to convert a metadata record's content to another format, including:
  – Element-to-element mapping
  – Hierarchy and object resolution
  – Metadata content conversions
  – Stylesheets can be created to transform metadata based on crosswalks
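The element-to-element part of a crosswalk can be held as data and applied mechanically; what the table cannot do is settle the semantic mismatches that human experts must decide. A minimal sketch using a tiny, illustrative set of DC-to-MARC equivalences (title to 245 and creator to 100, following the Godby quotation, plus an assumed date to 260 $c); it is not the Library of Congress crosswalk, which also handles indicators and many more elements.

```python
# Illustrative DC element -> (MARC tag, subfield) equivalences; not an official crosswalk.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("100", "a"),
    "date": ("260", "c"),
}

def crosswalk(record: dict[str, list[str]]) -> tuple[list[tuple[str, str, str]], list[str]]:
    """Map DC elements to (tag, subfield, value) triples; also return elements with no mapping."""
    fields: list[tuple[str, str, str]] = []
    unmapped: list[str] = []
    for element, values in record.items():
        target = DC_TO_MARC.get(element)
        if target is None:
            unmapped.append(element)  # No recorded equivalence: needs a human decision.
            continue
        tag, code = target
        # Repeatability is not checked here: MARC 100 and 245 are non-repeatable.
        fields.extend((tag, code, value) for value in values)
    return fields, unmapped

print(crosswalk({"title": ["Map of Ithaca"], "creator": ["Smith, John"], "format": ["image/tiff"]}))
```

An XSLT stylesheet would express the same table as templates; keeping the mapping as data makes it easier to review alongside the crosswalk documentation.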


Available Crosswalks

• Library of Congress
  – http://www.loc.gov/marc/marcdocz.html
• MIT
  – http://libraries.mit.edu/guides/subjects/metadata/mappings.html
• Getty
  – http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

• Differences in granularity (complex vs. simple scheme)
  – Some data might be lost
  – Differences in semantics can occur
  – Differences in use of content standards sometimes make sharing problematic
  – Properties may vary (e.g., repeatability)
• Converting everything may not always be the best solution
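The repeatability problem shows up in miniature when a source element may repeat but its target may not: the converter has to choose what to keep, and whatever it drops is lost. A sketch with a hypothetical target schema that keeps the first value and reports the rest, so the loss is at least visible; the element names and the keep-first policy are assumptions.

```python
# Hypothetical target schema: element name -> whether it may repeat.
TARGET_REPEATABLE = {"main_author": False, "added_author": True, "title": False}

def enforce_repeatability(mapped: dict[str, list[str]]) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Apply target repeatability rules; return (converted record, values that could not be carried over)."""
    converted: dict[str, list[str]] = {}
    lost: dict[str, list[str]] = {}
    for element, values in mapped.items():
        # Unknown elements are treated as repeatable (an assumption).
        if TARGET_REPEATABLE.get(element, True) or len(values) <= 1:
            converted[element] = list(values)
        else:
            converted[element] = values[:1]   # keep-first policy
            lost[element] = values[1:]        # reported rather than silently dropped
    return converted, lost

print(enforce_repeatability({"main_author": ["Smith, J.", "Jones, K."], "title": ["Maps"]}))
```

A less lossy policy would move the extra values into a repeatable element (here, `added_author`), which is itself a semantic decision, one reason converting everything mechanically is not always the best solution.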


Example Mapping: MODS title to DC title

• Includes attribute for type of title
  – Abbreviated
  – Translated
  – Alternative
  – Uniform
• Other attributes
  – ID, authority, displayLabel, xLink
• Subelements: title, partName, partNumber, nonSort


Mapping MODS title to DC title

• DC has one element refinement: Alternative
  – DC title has no substructure; MODS allows for subelements for partNumber, partName
• Best practice statement in DC-Lib says to include initial article
  – MODS parses it into <nonSort>
• MODS can link to a title in an authority file, if desired
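These points can be turned into a small conversion routine. A sketch using `xml.etree.ElementTree` against the MODS v3 namespace: it flattens each `<titleInfo>` into one string (DC title has no substructure), keeps the `<nonSort>` article as the DC-Lib practice suggests, and sends typed titles to Alternative. The punctuation between parts and the policy of mapping every typed title to Alternative are assumptions, not the Library of Congress MODS-to-DC mapping.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"

def _child_text(parent: ET.Element, name: str) -> str:
    node = parent.find(MODS + name)
    return (node.text or "").strip() if node is not None else ""

def title_string(title_info: ET.Element) -> str:
    """Flatten one MODS <titleInfo> into a single string."""
    title = _child_text(title_info, "title")
    nonsort_node = title_info.find(MODS + "nonSort")
    nonsort = (nonsort_node.text or "") if nonsort_node is not None else ""
    if nonsort.strip():
        # Keep the initial article; add a space unless one (or an elided "L'") is already there.
        title = nonsort + ("" if nonsort.endswith((" ", "'")) else " ") + title
    subtitle = _child_text(title_info, "subTitle")
    if subtitle:
        title += ": " + subtitle
    for part in ("partNumber", "partName"):  # assumed ". " separator for parts
        value = _child_text(title_info, part)
        if value:
            title += ". " + value
    return title

def mods_titles_to_dc(mods_xml: str) -> dict[str, list[str]]:
    """Untyped titles -> DC title; abbreviated/translated/alternative/uniform -> Alternative."""
    root = ET.fromstring(mods_xml)
    result: dict[str, list[str]] = {"title": [], "alternative": []}
    for title_info in root.findall(MODS + "titleInfo"):
        result["alternative" if title_info.get("type") else "title"].append(title_string(title_info))
    return result

sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><nonSort>The </nonSort><title>annals of metadata</title>
    <partNumber>Part 2</partNumber><partName>Quality</partName></titleInfo>
  <titleInfo type="abbreviated"><title>Ann. metadata</title></titleInfo>
</mods>"""
print(mods_titles_to_dc(sample))
```

The authority link mentioned above (via the `authority` attribute or `xlink:href`) has no DC counterpart in this sketch and is simply not carried over.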


Exercise

• Evaluate a small set of human- and machine-created metadata


OAI Best Practices (DLF & NSDL)

• Guidelines for data providers and service providers
  – http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page
• Best Practices for Shareable Metadata
  – http://webservices.itcs.umich.edu/mediawiki/oaibp/?PublicTOC


OAI In Practice

• The UIUC OAI-PMH Data Provider Registry
  – http://gita.grainger.uiuc.edu/registry/searchform.asp
• Includes most known data providers
• Link on home page to Service Providers
• Provides multiple reports, sample records, browses, search, etc.
• Ex.: Show report from left-hand menu, "Distinct Metadata Schemas"
  – http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
  – Choose a schema, look for providers and sample records
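Harvesting from any of the registered data providers follows the OAI-PMH request pattern: a `ListRecords` request with a `metadataPrefix`, repeated with the returned `resumptionToken` until the token comes back empty. A sketch of that loop; the `fetch` callable stands in for an HTTP GET (for example, `urllib.request.urlopen` plus decoding) so the logic can run offline, and the base URL and canned pages are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import parse_qs, urlencode, urlparse

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def list_records(base_url: str, fetch: Callable[[str], str], prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> from a ListRecords harvest, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        yield from root.iter(OAI + "record")
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        # resumptionToken is an exclusive argument: metadataPrefix is not repeated.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Offline demo: two canned response pages standing in for a repository.
PAGES = {
    "": '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        '<record><header><identifier>oai:example:1</identifier></header></record>'
        '<resumptionToken>page2</resumptionToken></ListRecords></OAI-PMH>',
    "page2": '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
             '<record><header><identifier>oai:example:2</identifier></header></record>'
             '<resumptionToken/></ListRecords></OAI-PMH>',
}

def fake_fetch(url: str) -> str:
    token = parse_qs(urlparse(url).query).get("resumptionToken", [""])[0]
    return PAGES[token]

identifiers = [r.find(f"{OAI}header/{OAI}identifier").text
               for r in list_records("http://repository.example.org/oai", fake_fetch)]
print(identifiers)  # ['oai:example:1', 'oai:example:2']
```

Error responses (such as `noRecordsMatch`) and HTTP retries are not handled here.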


What's an OpenURL?

• The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services
• Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)
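The "standardized format" was later formalized as ANSI/NISO Z39.88-2004 (OpenURL 1.0), whose key/encoded-value (KEV) form labels each piece of metadata explicitly. A sketch that expresses a journal-article citation in that form, covering only a few referent keys from the journal format; the function name is hypothetical, and a complete context object would normally also describe the referring source and requester.

```python
from urllib.parse import urlencode

def to_kev_journal(citation: dict[str, str]) -> str:
    """Build an OpenURL 1.0 (Z39.88-2004) KEV query string for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    # Metadata about the referent (the cited article) carries the "rft." prefix.
    for key in ("issn", "date", "volume", "issue", "spage"):
        if key in citation:
            params["rft." + key] = citation[key]
    return urlencode(params)

print(to_kev_journal({"issn": "1234-5678", "date": "1998", "volume": "12", "issue": "2", "spage": "134"}))
```

Appended to a resolver base URL, this carries the same citation as the earlier SFX-style link, but in a self-describing vocabulary that any conforming link resolver can interpret.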

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2121

ldquoldquoThe OpenURL standard enables a user who has The OpenURL standard enables a user who has retrieved an article citation for example to obtain retrieved an article citation for example to obtain immediate access to the most appropriate copy of immediate access to the most appropriate copy of that object through the implementation of extended that object through the implementation of extended linking services The selection of the best copy is linking services The selection of the best copy is based on user and organizational preferences based on user and organizational preferences regarding the location of the copy its cost and regarding the location of the copy its cost and agreements with information suppliers and similar agreements with information suppliers and similar considerations This selection occurs without the considerations This selection occurs without the knowledge of the user it is made possible by the knowledge of the user it is made possible by the transport of metadata with the OpenURL link from transport of metadata with the OpenURL link from the source citation to a resolver (the link server) the source citation to a resolver (the link server) which stores the preference information and the which stores the preference information and the links to the appropriate materialrdquolinks to the appropriate materialrdquo

--OpenURL Overview SFX website--OpenURL Overview SFX website

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2222

OpenURL CharacteristicsOpenURL Characteristics

Protocol operates between an Protocol operates between an information resource and a service information resource and a service componentcomponent

Service component is called a ldquolink Service component is called a ldquolink serverrdquo or ldquolink resolverrdquoserverrdquo or ldquolink resolverrdquo

Link server defines the user contextLink server defines the user context Takes source citation and determines Takes source citation and determines

whether a user has accesswhether a user has access

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2323

Distinguishing UsersDistinguishing Users

Uses information stored in a cookie Uses information stored in a cookie (the CookiePusher mechanism)(the CookiePusher mechanism)

Uses information contained in a Uses information contained in a digital certificate such as the one digital certificate such as the one proposed by the DLF digital proposed by the DLF digital certificates prototype projectcertificates prototype project

Identifies a users IP addressIdentifies a users IP address Obtains user attributes via the Obtains user attributes via the

Shibboleth frameworkShibboleth framework

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2424

Examples of Extended Service Examples of Extended Service LinksLinks

From a record in an abstracting and indexing From a record in an abstracting and indexing database (AampI) to the full-text described by the database (AampI) to the full-text described by the recordrecord

From a record describing a book in a library From a record describing a book in a library catalogue to a description of the same book in an catalogue to a description of the same book in an Internet book shopInternet book shop

From a reference in a journal article to a record From a reference in a journal article to a record matching that reference in an AampI databasematching that reference in an AampI database

From a citation in a journal article to a record in a From a citation in a journal article to a record in a library catalogue that shows the library holdings library catalogue that shows the library holdings of the cited journalof the cited journal

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2525

OpenURL Examples amp DemoOpenURL Examples amp Demo

httpsfxserveruniedusfxmenuissn=1234-5678ampdate=1998ampvolume=12ampissue=2ampspage=134

An OpenURL demoAn OpenURL demondash httpwwwukolnacukdistributed-syste

msopenurl

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2626

Defining and Ensuring Metadata Defining and Ensuring Metadata QualityQuality

What constitutes qualityWhat constitutes quality Techniques for evaluating and Techniques for evaluating and

enforcing consistency and enforcing consistency and predictabilitypredictability

Automated metadata creation Automated metadata creation advantages and disadvantagesadvantages and disadvantages

Metadata maintenance strategiesMetadata maintenance strategies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2727

Beginning to Define QualityBeginning to Define Quality

Experience of the library Experience of the library community--BIBCO amp NACOcommunity--BIBCO amp NACOndash Agreed upon standards for library Agreed upon standards for library

qualityqualityndash Training and documentation in support Training and documentation in support

of practitionersof practitionersndash Review and enforcement of standards Review and enforcement of standards

by means of institutional ldquobuddy by means of institutional ldquobuddy systemrdquosystemrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2828

How Does Quality HappenHow Does Quality Happen

Lessons from the library communityLessons from the library communityndash Quality is quantifiable and measurableQuality is quantifiable and measurablendash To be effective enforcement of standards To be effective enforcement of standards

of quality must take place at the of quality must take place at the community levelcommunity level

FurthermoreFurthermorendash Data problems are not unique to particular Data problems are not unique to particular

communitiescommunitiesndash general strategies can improve general strategies can improve

interoperabilityinteroperability

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2929

Quality Measurement CriteriaQuality Measurement Criteria

CompletenessCompleteness AccuracyAccuracy ProvenanceProvenance Conformance to expectationsConformance to expectations Logical consistency and coherenceLogical consistency and coherence Timeliness (Currency and Lag)Timeliness (Currency and Lag) AccessibilityAccessibility

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3030

CompletenessCompleteness

ldquoldquoMetadata should describe the target Metadata should describe the target objects as completely as objects as completely as economically feasiblerdquoeconomically feasiblerdquo

ldquoldquoElement set should be applied to Element set should be applied to the target object population as the target object population as completely as possiblerdquocompletely as possiblerdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3131

AccuracyAccuracy

Information provided in values Information provided in values should be correct and factualshould be correct and factual

Editing applied toEditing applied tondash Eliminate typosEliminate typosndash Ensure conforming name expressionsEnsure conforming name expressionsndash Ensure standard abbreviations usages Ensure standard abbreviations usages

in generalin general

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3232

ProvenanceProvenance

Who prepared the metadata What Who prepared the metadata What do we know about the preparerdo we know about the preparer

What methods were used to create What methods were used to create the metadata Is it human created or the metadata Is it human created or created by machinecreated by machine

What transformations have been What transformations have been applied since creationapplied since creation

Where has it been beforeWhere has it been before

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp Applications

40

Element Names vs Record Ids (Scatter Plot)

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality ...

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies

... Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the "general good"
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation

Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access?"

"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts (Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on), but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, "Two Paths to Interoperable Metadata"

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert the content of a metadata record to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
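
At its simplest, element-to-element mapping is a lookup table applied to each record. The sketch below keeps the crosswalk as data, separate from the code that applies it. It is a minimal illustration, assuming a flat DC record of element names to value lists. The two-entry table follows the examples in the Godby quote (title to 245, author/creator to 100). It is hypothetical and deliberately incomplete; a published crosswalk such as the Library of Congress one may choose different fields. Elements with no target are returned rather than silently dropped.

```python
# Hypothetical, deliberately tiny crosswalk: DC element -> (MARC tag, subfield).
# The pairs follow the Godby quote; real crosswalks may map differently.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("100", "a"),
}


def crosswalk(record, mapping=DC_TO_MARC):
    """Map a flat DC record {element: [values]} to {tag: [(subfield, value)]}.

    Returns the mapped fields plus the elements that had no equivalent,
    so that data loss is visible rather than silent.
    """
    mapped, unmapped = {}, []
    for element, values in record.items():
        target = mapping.get(element)
        if target is None:
            unmapped.append(element)
            continue
        tag, subfield = target
        for value in values:
            mapped.setdefault(tag, []).append((subfield, value))
    return mapped, unmapped
```

Keeping the table separate from the code means the expert-identified equivalences are stored in one machine-processable place, which is the gap the Godby quote points to.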

Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in the use of content standards sometimes make sharing problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution
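
Repeatability is a concrete case where conversion loses data. If the target element is non-repeatable, a record with several source values cannot carry them all. The sketch below makes the loss explicit. The target schema dictionary is hypothetical, and "keep the first value" is only one possible policy; a project might instead concatenate values or route extras to another element.

```python
# Hypothetical target schema: element name -> repeatable?
TARGET_REPEATABLE = {"title": False, "subject": True, "creator": True}


def convert(record, repeatable=TARGET_REPEATABLE):
    """Convert {element: [values]} to a target that may forbid repeats.

    Non-repeatable elements keep their first value; the remaining values and
    any elements the target lacks are returned as `lost`.
    """
    converted, lost = {}, []
    for element, values in record.items():
        if element not in repeatable:
            lost.extend((element, v) for v in values)      # no target element
        elif repeatable[element] or len(values) <= 1:
            converted[element] = list(values)
        else:
            converted[element] = values[:1]                # keep first only
            lost.extend((element, v) for v in values[1:])
    return converted, lost
```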

Example: Mapping MODS:title to DC:title

Includes an attribute for the type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort

Mapping MODS:title to DC:title

DC has one element refinement for title: Alternative
– DC title has no substructure; MODS allows subelements for partNumber and partName

A best practice statement in DC-Lib says to include the initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
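
Because DC title has no substructure, the MODS parts have to be flattened into one string. Following DC-Lib, the initial article from nonSort stays at the front. The sketch below is a minimal illustration under several stated assumptions. An untyped titleInfo becomes dc:title, and any typed one becomes the dcterms "alternative" refinement. The ": " and ". " separators are an arbitrary choice. Any space between nonSort and title is assumed to be part of the nonSort text. It reads only top-level titleInfo elements, so titles inside relatedItem are ignored.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def _child_text(parent, name):
    """Text of the first MODS child called `name`, or "" if it is absent."""
    child = parent.find(MODS + name)
    return (child.text or "") if child is not None else ""


def mods_titles_to_dc(mods_xml):
    """Flatten top-level MODS <titleInfo> elements into DC title strings.

    Assumptions: untyped titleInfo -> dc:title; any typed titleInfo
    (abbreviated, translated, alternative, uniform) -> dcterms alternative;
    nonSort already includes any separating space.
    """
    out = {"title": [], "alternative": []}
    for info in ET.fromstring(mods_xml).findall(MODS + "titleInfo"):
        # Keep the initial article (nonSort) in front of the title.
        value = _child_text(info, "nonSort") + _child_text(info, "title")
        if _child_text(info, "subTitle"):
            value += ": " + _child_text(info, "subTitle")
        for part in ("partNumber", "partName"):
            if _child_text(info, part):
                value += ". " + _child_text(info, part)
        key = "alternative" if info.get("type") else "title"
        out[key].append(value.strip())
    return out
```

The authority link mentioned on the slide has no DC equivalent in this sketch, which is one more example of data lost in a complex-to-simple conversion.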

Exercise

Evaluate a small set of human- and machine-created metadata

OAI In Practice

The UIUC OAI-PMH Data Provider Registry
– http://gita.grainger.uiuc.edu/registry/searchform.asp

Includes most known data providers
Link on the home page to Service Providers
Provides multiple reports, sample records, browses, search, etc.
Example: show the "Distinct Metadata Schemas" report from the left-hand menu
– http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
– Choose a schema; look for providers and sample records
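
Behind a registry entry, harvesting from a data provider follows a simple request/response loop. The harvester sends a ListRecords request with a metadataPrefix such as oai_dc, then follows resumptionToken values until the provider returns an empty token. The sketch below is a minimal illustration of that loop, not a complete harvester (no error handling, no datestamps or sets). The base URL is whatever the provider advertises. The parsing functions can be tried offline; only the last function needs network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build an OAI-PMH ListRecords request.

    resumptionToken is an exclusive argument, so follow-up requests send it
    instead of metadataPrefix.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [header.findtext(OAI + "identifier") for header in root.iter(OAI + "header")]
    token = root.findtext(f".//{OAI}resumptionToken")
    return ids, (token or None)  # an empty token marks the last page


def harvest_identifiers(base_url):
    """Yield every record identifier, following resumption tokens (needs network)."""
    token = None
    while True:
        with urlopen(list_records_url(base_url, token=token)) as response:
            ids, token = parse_list_records(response.read())
        yield from ids
        if not token:
            return
```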

What's an OpenURL?

The OpenURL provides a standardized format for transporting bibliographic metadata about objects between information services

Provides a basis for building services via the notion of an extended service-link, which moves beyond the classic notion of a reference link (a link from metadata to the full content described by the metadata)

"The OpenURL standard enables a user who has retrieved an article citation, for example, to obtain immediate access to the most appropriate copy of that object through the implementation of extended linking services. The selection of the best copy is based on user and organizational preferences regarding the location of the copy, its cost, agreements with information suppliers, and similar considerations. This selection occurs without the knowledge of the user; it is made possible by the transport of metadata with the OpenURL link from the source citation to a resolver (the link server), which stores the preference information and the links to the appropriate material."

-- OpenURL Overview, SFX website

OpenURL Characteristics

The protocol operates between an information resource and a service component

The service component is called a "link server" or "link resolver"

The link server defines the user context
The link server takes the source citation and determines whether the user has access
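
The resolver's decision can be pictured as a lookup against the institution's holdings, ordered by preference. The sketch below is a toy version. The holdings table, its target names, and the interlibrary-loan fallback are all hypothetical, and a real link server weighs many more factors (cost, supplier agreements, user context). Here, list order stands in for organizational preference, and coverage is checked by publication year only.

```python
# Hypothetical holdings for one institution: ISSN -> targets in order of preference.
HOLDINGS = {
    "1234-5678": [
        {"target": "Publisher full text", "from": 1995, "to": None},  # None = ongoing
        {"target": "Print copy, Main Library", "from": 1970, "to": 1994},
    ],
}


def resolve(citation, holdings=HOLDINGS):
    """Pick the first preferred target whose coverage includes the cited year.

    Falls back to a hypothetical interlibrary-loan service when nothing matches.
    """
    year = int(citation["date"][:4])
    for option in holdings.get(citation.get("issn"), []):
        end = option["to"] if option["to"] is not None else year
        if option["from"] <= year <= end:
            return option["target"]
    return "Interlibrary loan request"
```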

Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user's IP address

Obtains user attributes via the Shibboleth framework
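
Of the four methods, IP-address recognition is the easiest to illustrate. The link server checks whether the requesting address falls inside the institution's licensed ranges. The sketch below uses Python's standard ipaddress module. The ranges are hypothetical (they are reserved documentation blocks), and cookies, certificates, and Shibboleth attributes are not covered.

```python
import ipaddress

# Hypothetical campus ranges (documentation blocks); a real link server
# would load these from its own configuration.
CAMPUS_NETWORKS = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/25")]


def is_on_campus(client_ip, networks=CAMPUS_NETWORKS):
    """Return True if the requesting address falls inside any licensed range.

    An IPv6 address is simply never "in" an IPv4 network, so mixed
    address families return False rather than raising.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)
```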

Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet bookshop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library's holdings of the cited journal

OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
– http://www.ukoln.ac.uk/distributed-systems/openurl
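
The example link is just the resolver's base URL followed by the citation fields encoded as key/value pairs. The sketch below builds that query string from a citation and parses it back the way a resolver would receive it, using only the standard urllib.parse functions. The key names follow the example above, which uses the original OpenURL 0.1 style; the later ANSI/NISO Z39.88-2004 standard uses prefixed keys such as rft.issn instead.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def make_openurl(resolver_base, citation):
    """Append citation metadata to a resolver base URL as key/value pairs."""
    return f"{resolver_base}?{urlencode(citation)}"


def parse_openurl(url):
    """Recover the citation fields a link resolver would receive (first value per key)."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}
```

For example, make_openurl("http://sfxserver.uni.edu/sfxmenu", {"issn": "1234-5678", "date": "1998", "volume": "12", "issue": "2", "spage": "134"}) reproduces the query string of the sample link.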

Defining and Ensuring Metadata Quality

What constitutes quality?
Techniques for evaluating and enforcing consistency and predictability
Automated metadata creation: advantages and disadvantages
Metadata maintenance strategies

Beginning to Define Quality

Experience of the library community: BIBCO & NACO
– Agreed-upon standards for library quality
– Training and documentation in support of practitioners
– Review and enforcement of standards by means of an institutional "buddy system"

How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of quality standards must take place at the community level

Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability

Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (currency and lag)
Accessibility

Completeness

"Metadata should describe the target objects as completely as economically feasible"

"Element set should be applied to the target object population as completely as possible"
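
The two quoted criteria point to two different ratios. The first is how much of the element set each record fills. The second is how much of the record population carries each element. The sketch below computes both. It assumes records are plain dictionaries of element names to values and treats an empty string as "not filled"; the element set passed in is whatever the project's application profile expects.

```python
def completeness(records, element_set):
    """Two simple completeness measures over {record_id: {element: value}}.

    per_record: share of the element set each record fills with a non-empty value.
    per_element: share of records that fill each element.
    """
    if not records or not element_set:
        return {}, {}
    filled = {rid: {e for e in element_set if str(rec.get(e, "")).strip()}
              for rid, rec in records.items()}
    per_record = {rid: len(f) / len(element_set) for rid, f in filled.items()}
    per_element = {e: sum(e in f for f in filled.values()) / len(records)
                   for e in element_set}
    return per_record, per_element
```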

Accuracy

Information provided in values should be correct and factual

Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviations and usage in general

Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or machine-created?

What transformations have been applied since creation?

Where has it been before?

Conformance to Expectations

Contains the elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata reflects community thinking about necessary compromises

Logical Consistency/Coherence

Standard mechanisms such as application profiles and common crosswalks are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values
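
Reliance on defaulted values can be estimated mechanically. A value shared by nearly every record in an element is a hint that a template default was never overwritten. The sketch below flags such (element, value) pairs. The 90% threshold is an arbitrary assumption. Legitimately constant fields, such as a collection-wide publisher, will also be flagged, so the output is a review list rather than a verdict.

```python
from collections import Counter


def suspected_defaults(records, threshold=0.9):
    """Flag (element, value, count) triples shared by at least `threshold` of records.

    records: list of {element: value}. A high share of identical values is only
    a hint of defaulting and needs a human decision.
    """
    counts = {}
    for rec in records:
        for element, value in rec.items():
            counts.setdefault(element, Counter())[value] += 1
    flagged = []
    for element, counter in counts.items():
        value, hits = counter.most_common(1)[0]
        if hits / len(records) >= threshold:
            flagged.append((element, value, hits))
    return flagged
```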

Timeliness

Currency
– The target object changes but the metadata does not

Lag
– The target object is disseminated before some or all of its metadata is available

"Metadata aging" is affected by cultural differences between librarians and technologists
– Librarians: once and it's done
– Technologists: metadata as an iterative process
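
A currency problem can be detected whenever both the object and its metadata carry a last-modified date. Any record whose object changed after the metadata was last touched is a candidate for review. The sketch below assumes such dates are available as ISO 8601 strings without time-zone suffixes. Where they would come from (file system, repository, OAI datestamps) is left open.

```python
from datetime import datetime


def stale_records(items):
    """Return ids whose target object changed after its metadata was last updated.

    items: iterable of (record_id, object_modified, metadata_modified), each an
    ISO 8601 date or date-time string without a time zone (an assumption).
    """
    stale = []
    for record_id, obj_mod, meta_mod in items:
        if datetime.fromisoformat(obj_mod) > datetime.fromisoformat(meta_mod):
            stale.append(record_id)
    return stale
```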

Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as "premium" or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to the relevant object(s)

Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages: includes some formatting and color coding
– Disadvantages: assumes consistency/predictability; difficult to determine the extent of problems found; tedious at best
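
Random sampling is only as useful as the sample is random. Drawing it with a fixed seed also lets a second reviewer examine exactly the same records. The sketch below uses the standard random module. The seed value and sample size are the reviewer's choice. As the slide warns, a sample can reveal that a problem exists but not how widespread it is.

```python
import random


def sample_records(record_ids, k, seed=None):
    """Draw a simple random sample of k record ids for manual review.

    A fixed seed makes the sample reproducible; if fewer than k ids exist,
    all of them are returned in random order.
    """
    ids = list(record_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(k, len(ids)))
```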

Page 20: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2020

Whatrsquos an OpenURLWhatrsquos an OpenURL

The OpenURL provides a standardized The OpenURL provides a standardized format for transporting bibliographic format for transporting bibliographic metadata about objects between metadata about objects between information servicesinformation services

Provides a basis for building services via Provides a basis for building services via the notion of an the notion of an extended service-linkextended service-link which moves beyond the classic notion of which moves beyond the classic notion of a a reference linkreference link (a link from metadata to (a link from metadata to the full-content described by the the full-content described by the metadata)metadata)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2121

ldquoldquoThe OpenURL standard enables a user who has The OpenURL standard enables a user who has retrieved an article citation for example to obtain retrieved an article citation for example to obtain immediate access to the most appropriate copy of immediate access to the most appropriate copy of that object through the implementation of extended that object through the implementation of extended linking services The selection of the best copy is linking services The selection of the best copy is based on user and organizational preferences based on user and organizational preferences regarding the location of the copy its cost and regarding the location of the copy its cost and agreements with information suppliers and similar agreements with information suppliers and similar considerations This selection occurs without the considerations This selection occurs without the knowledge of the user it is made possible by the knowledge of the user it is made possible by the transport of metadata with the OpenURL link from transport of metadata with the OpenURL link from the source citation to a resolver (the link server) the source citation to a resolver (the link server) which stores the preference information and the which stores the preference information and the links to the appropriate materialrdquolinks to the appropriate materialrdquo

--OpenURL Overview SFX website--OpenURL Overview SFX website

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2222

OpenURL CharacteristicsOpenURL Characteristics

Protocol operates between an Protocol operates between an information resource and a service information resource and a service componentcomponent

Service component is called a ldquolink Service component is called a ldquolink serverrdquo or ldquolink resolverrdquoserverrdquo or ldquolink resolverrdquo

Link server defines the user contextLink server defines the user context Takes source citation and determines Takes source citation and determines

whether a user has accesswhether a user has access

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2323

Distinguishing UsersDistinguishing Users

Uses information stored in a cookie Uses information stored in a cookie (the CookiePusher mechanism)(the CookiePusher mechanism)

Uses information contained in a Uses information contained in a digital certificate such as the one digital certificate such as the one proposed by the DLF digital proposed by the DLF digital certificates prototype projectcertificates prototype project

Identifies a users IP addressIdentifies a users IP address Obtains user attributes via the Obtains user attributes via the

Shibboleth frameworkShibboleth framework

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2424

Examples of Extended Service Links

From a record in an abstracting and indexing (A&I) database to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library holdings of the cited journal


OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
  – http://www.ukoln.ac.uk/distributed-systems/openurl
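The example link uses the older version 0.1 key style. It consists of the resolver's base URL followed by key/value citation metadata (issn, date, volume, issue, spage). The sketch below shows how a source might assemble such a link and how a resolver might read it back. It assumes the hypothetical resolver address from the example and uses only the standard `urllib.parse` module.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL of the institution's link resolver (hypothetical, from the example link).
RESOLVER_BASE = "http://sfxserver.uni.edu/sfxmenu"

def build_openurl(base: str, citation: dict) -> str:
    """Append citation metadata to the resolver base URL as a query string."""
    return f"{base}?{urlencode(citation)}"

citation = {"issn": "1234-5678", "date": "1998", "volume": "12", "issue": "2", "spage": "134"}
url = build_openurl(RESOLVER_BASE, citation)
print(url)
# http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

# On the resolver side, the same metadata can be recovered from the incoming link.
received = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
```

The source does not need to know which copy the user can reach. It only transports the citation metadata, and the resolver applies the institution's preferences.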


Defining and Ensuring Metadata Quality

What constitutes quality

Techniques for evaluating and enforcing consistency and predictability

Automated metadata creation: advantages and disadvantages

Metadata maintenance strategies


Beginning to Define Quality

Experience of the library community: BIBCO & NACO
  – Agreed-upon standards for library quality
  – Training and documentation in support of practitioners
  – Review and enforcement of standards by means of an institutional “buddy system”


How Does Quality Happen?

Lessons from the library community
  – Quality is quantifiable and measurable
  – To be effective, enforcement of standards of quality must take place at the community level

Furthermore
  – Data problems are not unique to particular communities
  – General strategies can improve interoperability


Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (currency and lag)
Accessibility


Completeness

“Metadata should describe the target objects as completely as economically feasible.”

“Element set should be applied to the target object population as completely as possible.”
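The second statement can be turned into a number: for each element in the set, the share of records that actually carry it. The sketch below makes two assumptions. Records are held as Python dicts mapping Dublin Core element names to lists of values, and the expected element list is hypothetical.

```python
# Hypothetical element set an application profile expects every record to use.
EXPECTED = ["title", "creator", "date", "language", "format"]

def element_coverage(records: list[dict]) -> dict[str, float]:
    """Fraction of records with at least one non-blank value for each expected element."""
    total = len(records)
    coverage = {}
    for element in EXPECTED:
        present = sum(
            1 for rec in records
            if any(value.strip() for value in rec.get(element, []))
        )
        coverage[element] = present / total if total else 0.0
    return coverage

records = [
    {"title": ["Map of Ithaca"], "creator": ["Smith, J."], "date": ["1998"], "language": ["en"]},
    {"title": ["Field notes"], "date": ["2001"], "format": ["text/plain"]},
]
print(element_coverage(records))
# {'title': 1.0, 'creator': 0.5, 'date': 1.0, 'language': 0.5, 'format': 0.5}
```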


Accuracy

Information provided in values should be correct and factual

Editing applied to:
  – Eliminate typos
  – Ensure conforming name expressions
  – Ensure standard abbreviations and usage in general


Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or machine-created?

What transformations have been applied since creation?

Where has it been before?


Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata reflects community thinking about necessary compromises
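One measurable part of conformance is whether values that should come from a controlled vocabulary actually do. The sketch below uses the DCMI Type Vocabulary as the example of an exposed vocabulary. The per-record layout (record ID mapped to its type values) is a hypothetical assumption.

```python
# DCMI Type Vocabulary terms, used here as an example of an exposed controlled vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource", "MovingImage",
    "PhysicalObject", "Service", "Software", "Sound", "StillImage", "Text",
}

def nonconforming_types(records: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map record id -> type values that are not in the vocabulary."""
    problems = {}
    for rec_id, values in records.items():
        bad = [v for v in values if v not in DCMI_TYPES]
        if bad:
            problems[rec_id] = bad
    return problems

types_by_record = {"rec1": ["Text"], "rec2": ["text", "Image"], "rec3": ["photograph"]}
print(nonconforming_types(types_by_record))
# {'rec2': ['text'], 'rec3': ['photograph']}
```

The comparison is case-sensitive, so "text" is flagged. A project might instead normalize case first, and that decision is itself a community compromise worth documenting.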


Logical Consistency/Coherence

Standard mechanisms, like application profiles and common crosswalks, are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values
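Heavy reliance on defaulted values often shows up as one value repeated across nearly every record in a field. The sketch below is a rough heuristic only. It assumes records are dicts of element name to a single string value, and the 90% threshold is an arbitrary choice.

```python
from collections import Counter

def suspected_defaults(records: list[dict], threshold: float = 0.9) -> dict[str, str]:
    """Return element -> dominant value where one value covers >= threshold of all records."""
    suspects = {}
    elements = {element for rec in records for element in rec}
    for element in sorted(elements):
        values = [rec[element] for rec in records if element in rec]
        value, count = Counter(values).most_common(1)[0]
        if count / len(records) >= threshold:
            suspects[element] = value
    return suspects

records = [{"title": f"Item {i}", "rights": "Copyright unknown"} for i in range(10)]
print(suspected_defaults(records))  # {'rights': 'Copyright unknown'}
```

A field that is legitimately constant, such as one publisher for a whole collection, will also be flagged. The output is a prompt for human review, not a verdict.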


Timeliness

Currency
  – Target object changes but metadata does not

Lag
  – Target object is disseminated before some or all metadata is available

“Metadata aging” is affected by cultural differences between librarians and technologists
  – Librarians: once and it’s done
  – Technologists: metadata as an iterative process
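Currency problems can be detected mechanically when both the object and its metadata carry modification dates. The sketch below assumes each record stores hypothetical `object_modified` and `metadata_modified` fields. It lists records whose metadata predates the latest change to the object.

```python
from datetime import date

def stale_records(records: list[dict]) -> list[str]:
    """Return ids of records whose metadata was last updated before the object changed."""
    return [
        rec["id"] for rec in records
        if rec["metadata_modified"] < rec["object_modified"]
    ]

records = [
    {"id": "a", "object_modified": date(2004, 5, 1), "metadata_modified": date(2004, 5, 2)},
    {"id": "b", "object_modified": date(2005, 1, 10), "metadata_modified": date(2003, 8, 30)},
]
print(stale_records(records))  # ['b']
```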


Accessibility

Barriers to accessibility may be economic, technical, or organizational
  – Metadata as “premium” or proprietary information
  – Unreadable for technical reasons (file formats, etc.)
  – Metadata may not be properly linked to relevant object(s)


Evaluating Metadata (1)

Random sampling (XMLSpy)
  – Advantages
      Includes some formatting and color coding
  – Disadvantages
      Assumes consistency/predictability
      Difficult to determine extent of problems found
      Tedious at best
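Whatever tool is used to view the records, drawing the sample reproducibly makes a manual review easier to repeat and to report. The sketch below uses Python's standard `random` module. It assumes the record identifiers have already been loaded into a list; the IDs shown are hypothetical.

```python
import random

def review_sample(records: list, k: int, seed: int = 42) -> list:
    """Draw up to k records without replacement; a fixed seed makes the sample repeatable."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))

record_ids = [f"rec{n:04d}" for n in range(1, 1001)]
sample = review_sample(record_ids, 20)
print(len(sample), sample[:3])
```

A random sample still assumes the problems are spread evenly. It can miss a small batch of bad records, which matches the "assumes consistency/predictability" caveat.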


Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
  – Advantages
      Better sorting and control by reviewer
  – Disadvantages
      Unwieldy for large files
      Requires sustained focus from reviewer
      Requires translation into tab-delimited file
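The translation into a tab-delimited file can be scripted. The sketch below uses the standard `xml.etree.ElementTree` and `csv` modules. It assumes simple Dublin Core records wrapped in hypothetical `<records>`/`<record id="...">` elements, and writes one row per element value.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical input: <record> wrappers holding simple Dublin Core elements.
SAMPLE = f"""<records xmlns:dc="{DC_NS}">
  <record id="r1"><dc:title>Map of Ithaca</dc:title><dc:date>1998</dc:date></record>
  <record id="r2"><dc:title>Field notes</dc:title><dc:language>en</dc:language></record>
</records>"""

def to_tsv(xml_text: str) -> str:
    """Flatten records into tab-delimited rows of (record id, element name, value)."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["record_id", "element", "value"])
    for record in root.iter("record"):
        for child in record:
            # Strip the "{namespace}" prefix that ElementTree puts on tag names.
            name = child.tag.split("}", 1)[-1]
            writer.writerow([record.get("id"), name, (child.text or "").strip()])
    return out.getvalue()

print(to_tsv(SAMPLE))
```

This "long" layout, with one row per value, keeps repeated elements intact. It also sorts easily by element or by record in a spreadsheet.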


Evaluating Metadata (3)

Visual graphical analysis (Spotfire)
  – Advantages
      View of several data dimensions simultaneously
      Reviewer controls data display
      Tends to pull reviewer focus to anomalies
      Handles fairly large files at one time while allowing subset views
      Display manipulation possible without programmers
  – Disadvantages
      High cost of software
      Requires translation into tab-delimited file


Element Names vs Record Ids (Scatter Plot)


Missing Elements (Scatter Plot)

2 records without a language element

Format element present inconsistently

Easy to rescale axis on the fly and scroll through records
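The scatter plots are in effect a presence matrix of element names against record IDs, so the same gaps can also be listed directly. The sketch below assumes the (record ID, element) pairs have already been extracted, for example from a tab-delimited export. The sample pairs are hypothetical.

```python
from collections import defaultdict

def missing_elements(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """For each element seen anywhere, list the record ids that lack it."""
    present = defaultdict(set)  # element -> record ids that have it
    all_records = set()
    for record_id, element in pairs:
        present[element].add(record_id)
        all_records.add(record_id)
    return {
        element: sorted(all_records - ids)
        for element, ids in sorted(present.items())
        if all_records - ids
    }

pairs = [("r1", "title"), ("r1", "language"), ("r1", "format"),
         ("r2", "title"), ("r2", "format"),
         ("r3", "title")]
print(missing_elements(pairs))
# {'format': ['r3'], 'language': ['r2', 'r3']}
```

As in the plot, an element that no record uses at all never appears. Checking against an expected element list closes that gap.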


Table View

Non-empty “no information” values that may confuse end users

Only DC Date elements are selected for display

The only W3CDTF syntax present is four digits

Sorted by element value

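The two problems called out in the table view can both be checked automatically: placeholder values, and dates that use only the four-digit year form. The sketch below matches the W3CDTF profile of ISO 8601 with a regular expression, checking syntax only, not month or day ranges. The placeholder list is a hypothetical assumption.

```python
import re

# W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, or a full date plus hh:mm[:ss[.s]] and a time zone.
# Syntax only: month and day ranges are not validated.
W3CDTF = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)

# Hypothetical placeholder strings that are non-empty but carry no information.
PLACEHOLDERS = {"unknown", "n/a", "no date", "none", "[s.d.]"}

def classify_date(value: str) -> str:
    """Label a DC date value as placeholder, non-W3CDTF, year-only, or W3CDTF."""
    v = value.strip()
    if v.lower() in PLACEHOLDERS:
        return "placeholder"
    if not W3CDTF.fullmatch(v):
        return "not W3CDTF"
    return "year only" if len(v) == 4 else "W3CDTF"

for value in ["1998", "1998-07-16", "ca. 1900", "Unknown", "1997-07-16T19:20+01:00"]:
    print(value, "->", classify_date(value))
```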


Improving Metadata Quality …

Documentation
  – Basic standards, best practice guidelines, examples
  – Exposure and maintenance of local and community vocabularies
  – Application profiles
  – Training materials, tools, methodologies


… Over Time

Culture change
  – Support for documentation and exchange of knowledge and experience
  – Routine contribution to the “general good”
  – More focused research on practical metadata use and quality considerations
  – Better project-based and community-wide documentation


Crosswalking

“Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems.”

-- Mary Woodley, “Crosswalks: The Path to Universal Access”


“Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts -- Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on -- but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages.”

-- Jean Godby, “Two Paths to Interoperable Metadata”

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert a metadata record’s content to another format, including:
  – Element-to-element mapping
  – Hierarchy and object resolution
  – Metadata content conversions
  – Stylesheets can be created to transform metadata based on crosswalks
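Element-to-element mapping reduces to a lookup table plus a decision about what to do with fields the table does not cover. The sketch below assumes MARC records already reduced to hypothetical (tag, value) pairs, with indicators and subfields ignored. It uses a small, simplified subset of MARC-to-Dublin Core correspondences: 245 to title and 100 to creator, as in the quote above, plus 520 to description and 650 to subject. It is not a complete crosswalk.

```python
# Small, simplified subset of MARC-tag-to-Dublin-Core correspondences (illustrative only).
MARC_TO_DC = {
    "100": "creator",
    "245": "title",
    "520": "description",
    "650": "subject",
}

def crosswalk(marc_fields: list[tuple[str, str]]) -> tuple[dict[str, list[str]], list[str]]:
    """Map (tag, value) pairs to DC elements; return the DC record and tags left unmapped."""
    dc: dict[str, list[str]] = {}
    unmapped = []
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            unmapped.append(tag)  # report rather than silently drop
            continue
        dc.setdefault(element, []).append(value)
    return dc, unmapped

record = [  # hypothetical record
    ("100", "Doe, Jane"),
    ("245", "Sample title"),
    ("300", "12 p."),
    ("650", "Metadata"),
    ("650", "Interoperability"),
]
dc, lost = crosswalk(record)
print(dc)    # {'creator': ['Doe, Jane'], 'title': ['Sample title'], 'subject': ['Metadata', 'Interoperability']}
print(lost)  # ['300']
```

Tag 300 is reported only because it lies outside this small subset. Returning the unmapped tags makes any data loss visible instead of silent. In production this step would typically live in a stylesheet rather than a Python dictionary.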


Available Crosswalks

Library of Congress
  – http://www.loc.gov/marc/marcdocz.html

MIT
  – http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
  – http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
  – Some data might be lost
  – Differences in semantics can occur
  – Differences in use of content standards sometimes make sharing problematic
  – Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution
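A difference in repeatability forces an explicit choice during conversion. The sketch below assumes a hypothetical target schema whose field is not repeatable. Keeping only the first value loses data, while joining the values keeps the text but makes the individual values no longer separately addressable.

```python
def to_single_valued(values: list[str], policy: str = "first",
                     sep: str = "; ") -> tuple[str, list[str]]:
    """Squeeze a repeatable element into a non-repeatable target field.

    Returns the target value and the source values that were dropped.
    """
    if not values:
        return "", []
    if policy == "first":
        return values[0], values[1:]
    if policy == "join":
        # Nothing is dropped, but the values are no longer separately addressable.
        return sep.join(values), []
    raise ValueError(f"unknown policy: {policy}")

subjects = ["Metadata", "Interoperability", "Crosswalks"]
print(to_single_valued(subjects, "first"))  # ('Metadata', ['Interoperability', 'Crosswalks'])
print(to_single_valued(subjects, "join"))   # ('Metadata; Interoperability; Crosswalks', [])
```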


Example Mapping: MODS title to DC title

Includes attribute for type of title
  – Abbreviated
  – Translated
  – Alternative
  – Uniform

Other attributes
  – ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort


Mapping MODS title to DC title

DC has one element refinement: Alternative
  – DC title has no substructure; MODS allows for subelements for partNumber, partName

Best practice statement in DC-Lib says to include initial article
  – MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
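The mapping decisions on these two slides can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration, not a complete MODS-to-DC stylesheet, and it makes four assumptions:

- The input uses the MODS v3 namespace.
- `<nonSort>` is re-attached so the initial article is kept, per the DC-Lib practice.
- `partNumber` and `partName` are flattened into the title string with ". " as a separator, which is an arbitrary choice.
- Only `type="alternative"` is routed to `dcterms:alternative`; other type values are treated as plain titles.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"

def title_info_to_dc(title_info: ET.Element) -> tuple[str, str]:
    """Return (dc_element_name, flattened_title) for one MODS <titleInfo>."""
    def text(tag: str) -> str:
        el = title_info.find(MODS + tag)
        return (el.text or "").strip() if el is not None else ""

    # DC-Lib best practice keeps the initial article, so re-attach <nonSort>.
    main = " ".join(p for p in (text("nonSort"), text("title")) if p)
    # DC title has no substructure: flatten part number and part name into the string.
    parts = [p for p in (text("partNumber"), text("partName")) if p]
    flat = ". ".join([main] + parts)
    # DC's only title refinement is Alternative; other MODS title types map to plain title.
    element = "dcterms:alternative" if title_info.get("type") == "alternative" else "dc:title"
    return element, flat

SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><nonSort>The</nonSort><title>annals of metadata</title>
    <partNumber>Part 2</partNumber><partName>Crosswalks</partName></titleInfo>
  <titleInfo type="alternative"><title>Metadata annals</title></titleInfo>
</mods>"""

root = ET.fromstring(SAMPLE)
for ti in root.findall(MODS + "titleInfo"):
    print(title_info_to_dc(ti))
# ('dc:title', 'The annals of metadata. Part 2. Crosswalks')
# ('dcterms:alternative', 'Metadata annals')
```

The `authority` and xlink attributes have no DC counterpart and are simply dropped here. This is one concrete case of the granularity loss described under converted records.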


Exercise

Evaluate a small set of human- and machine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 21: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2121

ldquoldquoThe OpenURL standard enables a user who has The OpenURL standard enables a user who has retrieved an article citation for example to obtain retrieved an article citation for example to obtain immediate access to the most appropriate copy of immediate access to the most appropriate copy of that object through the implementation of extended that object through the implementation of extended linking services The selection of the best copy is linking services The selection of the best copy is based on user and organizational preferences based on user and organizational preferences regarding the location of the copy its cost and regarding the location of the copy its cost and agreements with information suppliers and similar agreements with information suppliers and similar considerations This selection occurs without the considerations This selection occurs without the knowledge of the user it is made possible by the knowledge of the user it is made possible by the transport of metadata with the OpenURL link from transport of metadata with the OpenURL link from the source citation to a resolver (the link server) the source citation to a resolver (the link server) which stores the preference information and the which stores the preference information and the links to the appropriate materialrdquolinks to the appropriate materialrdquo

--OpenURL Overview SFX website--OpenURL Overview SFX website

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2222

OpenURL CharacteristicsOpenURL Characteristics

Protocol operates between an Protocol operates between an information resource and a service information resource and a service componentcomponent

Service component is called a ldquolink Service component is called a ldquolink serverrdquo or ldquolink resolverrdquoserverrdquo or ldquolink resolverrdquo

Link server defines the user contextLink server defines the user context Takes source citation and determines Takes source citation and determines

whether a user has accesswhether a user has access

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2323

Distinguishing UsersDistinguishing Users

Uses information stored in a cookie Uses information stored in a cookie (the CookiePusher mechanism)(the CookiePusher mechanism)

Uses information contained in a Uses information contained in a digital certificate such as the one digital certificate such as the one proposed by the DLF digital proposed by the DLF digital certificates prototype projectcertificates prototype project

Identifies a users IP addressIdentifies a users IP address Obtains user attributes via the Obtains user attributes via the

Shibboleth frameworkShibboleth framework

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2424

Examples of Extended Service Examples of Extended Service LinksLinks

From a record in an abstracting and indexing From a record in an abstracting and indexing database (AampI) to the full-text described by the database (AampI) to the full-text described by the recordrecord

From a record describing a book in a library From a record describing a book in a library catalogue to a description of the same book in an catalogue to a description of the same book in an Internet book shopInternet book shop

From a reference in a journal article to a record From a reference in a journal article to a record matching that reference in an AampI databasematching that reference in an AampI database

From a citation in a journal article to a record in a From a citation in a journal article to a record in a library catalogue that shows the library holdings library catalogue that shows the library holdings of the cited journalof the cited journal

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2525

OpenURL Examples amp DemoOpenURL Examples amp Demo

httpsfxserveruniedusfxmenuissn=1234-5678ampdate=1998ampvolume=12ampissue=2ampspage=134

An OpenURL demoAn OpenURL demondash httpwwwukolnacukdistributed-syste

msopenurl

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2626

Defining and Ensuring Metadata Defining and Ensuring Metadata QualityQuality

What constitutes qualityWhat constitutes quality Techniques for evaluating and Techniques for evaluating and

enforcing consistency and enforcing consistency and predictabilitypredictability

Automated metadata creation Automated metadata creation advantages and disadvantagesadvantages and disadvantages

Metadata maintenance strategiesMetadata maintenance strategies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2727

Beginning to Define QualityBeginning to Define Quality

Experience of the library Experience of the library community--BIBCO amp NACOcommunity--BIBCO amp NACOndash Agreed upon standards for library Agreed upon standards for library

qualityqualityndash Training and documentation in support Training and documentation in support

of practitionersof practitionersndash Review and enforcement of standards Review and enforcement of standards

by means of institutional ldquobuddy by means of institutional ldquobuddy systemrdquosystemrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2828

How Does Quality HappenHow Does Quality Happen

Lessons from the library communityLessons from the library communityndash Quality is quantifiable and measurableQuality is quantifiable and measurablendash To be effective enforcement of standards To be effective enforcement of standards

of quality must take place at the of quality must take place at the community levelcommunity level

FurthermoreFurthermorendash Data problems are not unique to particular Data problems are not unique to particular

communitiescommunitiesndash general strategies can improve general strategies can improve

interoperabilityinteroperability

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2929

Quality Measurement CriteriaQuality Measurement Criteria

CompletenessCompleteness AccuracyAccuracy ProvenanceProvenance Conformance to expectationsConformance to expectations Logical consistency and coherenceLogical consistency and coherence Timeliness (Currency and Lag)Timeliness (Currency and Lag) AccessibilityAccessibility

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3030

CompletenessCompleteness

ldquoldquoMetadata should describe the target Metadata should describe the target objects as completely as objects as completely as economically feasiblerdquoeconomically feasiblerdquo

ldquoldquoElement set should be applied to Element set should be applied to the target object population as the target object population as completely as possiblerdquocompletely as possiblerdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3131

AccuracyAccuracy

Information provided in values Information provided in values should be correct and factualshould be correct and factual

Editing applied toEditing applied tondash Eliminate typosEliminate typosndash Ensure conforming name expressionsEnsure conforming name expressionsndash Ensure standard abbreviations usages Ensure standard abbreviations usages

in generalin general

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3232

ProvenanceProvenance

Who prepared the metadata What Who prepared the metadata What do we know about the preparerdo we know about the preparer

What methods were used to create What methods were used to create the metadata Is it human created or the metadata Is it human created or created by machinecreated by machine

What transformations have been What transformations have been applied since creationapplied since creation

Where has it been beforeWhere has it been before

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp Applications

40

Element Names vs Record Ids (Scatter Plot)

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

A metadata conversion specification includes the transformations required to convert a metadata record's content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
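Element-to-element mapping, the simplest of the transformations listed above, can be sketched as a lookup table applied to every field of a record. The sketch below is a minimal illustration that assumes records are plain dictionaries mapping element names to lists of values. The MODS-style source paths and DC target names in `CROSSWALK` are a hypothetical, deliberately partial subset, not an official mapping. Elements the table does not cover are returned separately, so any loss stays visible.

```python
# Hypothetical, partial crosswalk from MODS-style paths to simple DC elements.
CROSSWALK = {
    "titleInfo/title": "dc:title",
    "name/namePart": "dc:creator",
    "originInfo/dateIssued": "dc:date",
    "language/languageTerm": "dc:language",
}


def apply_crosswalk(record, crosswalk):
    """Apply an element-to-element crosswalk to one record.

    `record` maps source element names to lists of values. Returns
    (converted, unmapped); source elements missing from the crosswalk are
    returned in `unmapped` instead of being silently dropped.
    """
    converted, unmapped = {}, {}
    for element, values in record.items():
        target = crosswalk.get(element)
        if target is None:
            unmapped.setdefault(element, []).extend(values)
        else:
            # Several source elements may share one target, so values accumulate.
            converted.setdefault(target, []).extend(values)
    return converted, unmapped


# Usage with an invented record.
record = {
    "titleInfo/title": ["Moby Dick"],
    "name/namePart": ["Melville, Herman"],
    "physicalDescription/extent": ["635 p."],  # not covered by this partial table
}
converted, unmapped = apply_crosswalk(record, CROSSWALK)
# converted == {"dc:title": ["Moby Dick"], "dc:creator": ["Melville, Herman"]}
# unmapped == {"physicalDescription/extent": ["635 p."]}
```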


Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in the use of content standards can make sharing problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution
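The repeatability problem can be made concrete with a small check. This sketch assumes a hypothetical target scheme, described only by a set of non-repeatable element names, and a conversion rule that keeps the first value. It does not model any particular standard. It reports the values that would be lost, so the loss can be reviewed instead of happening silently.

```python
# Hypothetical target scheme: elements that may occur only once.
NON_REPEATABLE = {"title", "date"}


def convert_with_loss_report(record, non_repeatable):
    """Keep only the first value of each non-repeatable element.

    Returns (converted, dropped): `dropped` holds every value that the
    conversion discarded, keyed by element name.
    """
    converted, dropped = {}, {}
    for element, values in record.items():
        if element in non_repeatable and len(values) > 1:
            converted[element] = values[:1]
            dropped[element] = values[1:]
        else:
            converted[element] = list(values)
    return converted, dropped
```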


Example Mapping: MODS title to DC title

MODS includes an attribute for the type of title:
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes:
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort


Mapping MODS title to DC title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows subelements for partNumber and partName

Best practice statement in DC-Lib says to include the initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
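The choices on this slide can be sketched in code. The function below is a minimal illustration and is not the Library of Congress MODS-to-DC stylesheet. It assumes a single MODS v3 `<mods>` record and reads only its top-level `<titleInfo>` elements. It keeps the initial article by prepending `<nonSort>`, following the DC-Lib practice noted above. Because DC title has no substructure, it flattens partNumber and partName into the title string. Only `type="alternative"` is routed to the Alternative refinement. Treating the other title types as plain DC titles is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def _child_text(parent, tag):
    """Stripped text of the first MODS child element `tag`, or "" if absent."""
    child = parent.find(MODS + tag)
    return (child.text or "").strip() if child is not None else ""


def mods_titles_to_dc(mods_xml):
    """Return (element, value) pairs for each top-level <titleInfo> of a MODS record."""
    root = ET.fromstring(mods_xml)
    pairs = []
    for title_info in root.findall(MODS + "titleInfo"):
        # Keep the initial article: nonSort + title.
        # Assumes nonSort holds a separate word such as "The".
        main = " ".join(p for p in (_child_text(title_info, "nonSort"),
                                    _child_text(title_info, "title")) if p)
        # DC title has no substructure, so parts are flattened into one string;
        # the ". " separator is an arbitrary choice for this sketch.
        parts = [_child_text(title_info, "partNumber"),
                 _child_text(title_info, "partName")]
        value = ". ".join(p for p in [main] + parts if p)
        element = ("dcterms:alternative"
                   if title_info.get("type") == "alternative" else "dc:title")
        if value:
            pairs.append((element, value))
    return pairs
```

For example, a record whose first `<titleInfo>` has `<nonSort>The </nonSort><title>Hobbit</title>` yields `("dc:title", "The Hobbit")`.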


Exercise

Evaluate a small set of human- and machine-created metadata



OpenURL Characteristics

Protocol operates between an information resource and a service component

Service component is called a “link server” or “link resolver”

Link server defines the user context

Takes the source citation and determines whether a user has access
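A link server's access decision can be sketched roughly as follows. This is a hypothetical illustration, not how SFX or any particular resolver works. The `HOLDINGS` table, which maps an ISSN to a licensed range of years, is invented. Only the `issn` and `date` keys of an OpenURL 0.1-style query string (for example `issn=1234-5678&date=1998`) are consulted.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical holdings of one library: ISSN -> (first year, last year) licensed.
HOLDINGS = {"1234-5678": (1995, 2005)}


def has_access(openurl, holdings):
    """Return True if the cited ISSN and year fall inside the library's holdings."""
    query = parse_qs(urlsplit(openurl).query)
    issn = query.get("issn", [None])[0]
    date = query.get("date", [""])[0]
    # Unknown journal or no usable year: no access can be asserted.
    if issn not in holdings or not date[:4].isdigit():
        return False
    first, last = holdings[issn]
    return first <= int(date[:4]) <= last
```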


Distinguishing Users

Uses information stored in a cookie (the CookiePusher mechanism)

Uses information contained in a digital certificate, such as the one proposed by the DLF digital certificates prototype project

Identifies a user's IP address

Obtains user attributes via the Shibboleth framework
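Of these four methods, IP address recognition is the simplest to sketch. The example below uses Python's standard `ipaddress` module. The campus network ranges are hypothetical and are taken from the RFC 5737 documentation blocks. A real link server would probably combine this check with the other mechanisms rather than rely on it alone.

```python
import ipaddress

# Hypothetical institutional ranges (RFC 5737 documentation blocks).
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_on_campus(client_ip, networks=CAMPUS_NETWORKS):
    """Return True if client_ip belongs to one of the institution's networks."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:  # malformed address: treat as an unrecognized user
        return False
    # Membership is False when IPv4/IPv6 versions differ, so mixed lists are safe.
    return any(address in network for network in networks)
```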


Examples of Extended Service Links

From a record in an abstracting and indexing database (A&I) to the full text described by the record

From a record describing a book in a library catalogue to a description of the same book in an Internet book shop

From a reference in a journal article to a record matching that reference in an A&I database

From a citation in a journal article to a record in a library catalogue that shows the library's holdings of the cited journal
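In each of these links, the source packages a citation as an OpenURL for the link server. A minimal sketch of that step follows, using Python's standard `urllib.parse.urlencode`. The key names follow the OpenURL 0.1-style example in this section (issn, date, volume, issue, spage). The resolver base URL is that example's hypothetical host, not a working service.

```python
from urllib.parse import urlencode


def build_openurl(base_url, citation):
    """Encode a citation dictionary as an OpenURL query on the resolver's base URL.

    Empty fields are omitted so the resolver only receives known metadata.
    """
    fields = {key: value for key, value in citation.items() if value not in (None, "")}
    return f"{base_url}?{urlencode(fields)}"


url = build_openurl(
    "http://sfxserver.uni.edu/sfxmenu",
    {"issn": "1234-5678", "date": "1998", "volume": "12", "issue": "2", "spage": "134"},
)
# url == "http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134"
```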


OpenURL Examples & Demo

http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

An OpenURL demo
– http://www.ukoln.ac.uk/distributed-systems/openurl


Defining and Ensuring Metadata Quality

What constitutes quality?

Techniques for evaluating and enforcing consistency and predictability

Automated metadata creation: advantages and disadvantages

Metadata maintenance strategies


Beginning to Define Quality

Experience of the library community: BIBCO & NACO
– Agreed-upon standards for library quality
– Training and documentation in support of practitioners
– Review and enforcement of standards by means of an institutional “buddy system”


How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of quality standards must take place at the community level

Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability


Quality Measurement Criteria

Completeness

Accuracy

Provenance

Conformance to expectations

Logical consistency and coherence

Timeliness (currency and lag)

Accessibility


Completeness

“Metadata should describe the target objects as completely as economically feasible.”

“Element set should be applied to the target object population as completely as possible.”
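The second definition, applying the element set across the whole object population, can be measured directly. A minimal sketch follows. It assumes records are dictionaries keyed by element name with lists of values, plus a hypothetical expected element set. For each element, it reports the share of records that contain at least one non-empty value.

```python
def element_completeness(records, expected_elements):
    """Return {element: fraction of records with at least one non-empty value}."""
    total = len(records)
    if total == 0:
        return {element: 0.0 for element in expected_elements}
    completeness = {}
    for element in expected_elements:
        present = sum(
            1 for record in records
            if any(str(value).strip() for value in record.get(element, []))
        )
        completeness[element] = present / total
    return completeness
```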


Accuracy

Information provided in values should be correct and factual

Editing is applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard usage of abbreviations in general


Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or created by machine?

What transformations have been applied since creation?

Where has it been before?
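The provenance questions above suggest information worth carrying alongside each record. One possible shape is sketched below as a Python dataclass. The field names are hypothetical and do not follow any particular provenance standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provenance:
    """Answers to the provenance questions, kept alongside a metadata record."""
    preparer: str                                         # who prepared the metadata
    method: str                                           # "human" or "machine"
    transformations: List[str] = field(default_factory=list)  # applied since creation
    locations: List[str] = field(default_factory=list)        # where it has been before

    def __post_init__(self):
        if self.method not in ("human", "machine"):
            raise ValueError("method must be 'human' or 'machine'")

    def record_transformation(self, description: str) -> None:
        """Append a step, such as a crosswalk, applied after creation."""
        self.transformations.append(description)
```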


Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata reflects community thinking about necessary compromises
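One mechanical part of this criterion, checking that values come from the chosen controlled vocabulary, is easy to automate. The sketch below checks DC type values against the DCMI Type Vocabulary terms. Choosing that vocabulary, and matching exactly and case-sensitively, are assumptions made for illustration.

```python
# Terms of the DCMI Type Vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}


def nonconforming_values(records, element, vocabulary):
    """Return (record_index, value) pairs whose value is not in the vocabulary."""
    return [
        (index, value)
        for index, record in enumerate(records)
        for value in record.get(element, [])
        if value not in vocabulary
    ]
```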


Logical Consistency/Coherence

Standard mechanisms, such as application profiles and common crosswalks, are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values
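Reliance on defaulted values often shows up as one value repeated across a large share of records. The sketch below implements a rough heuristic for spotting this. The 90% threshold and the minimum record count are arbitrary assumptions. A flagged value still needs human review, because some elements legitimately share a value, such as a single language for a whole collection.

```python
from collections import Counter


def suspected_defaults(records, element, threshold=0.9, min_records=10):
    """Return values of `element` found in at least `threshold` of the records."""
    if len(records) < min_records:
        return []  # too few records for the ratio to mean much
    # set() so a value repeated inside one record is counted once for it.
    counts = Counter(
        value for record in records for value in set(record.get(element, []))
    )
    return [value for value, count in counts.items() if count / len(records) >= threshold]
```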


Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object is disseminated before some or all metadata is available

“Metadata aging” is affected by cultural differences between librarians and technologists
– Librarians: once and it’s done
– Technologists: metadata as an iterative process
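Both timeliness problems can be flagged by comparing dates, provided the system records when the object and its metadata were last changed. A minimal sketch under that assumption follows. The parameter names are hypothetical. A missing metadata date is treated as lag.

```python
from datetime import date
from typing import Optional


def timeliness_issue(object_modified: date, metadata_modified: Optional[date]) -> Optional[str]:
    """Classify a record's timeliness problem, if any.

    'lag': the object is available but no metadata exists yet.
    'currency': the object changed after its metadata was last updated.
    """
    if metadata_modified is None:
        return "lag"
    if object_modified > metadata_modified:
        return "currency"
    return None
```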


Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as “premium” or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)


Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages: includes some formatting and color coding
– Disadvantages: assumes consistency/predictability; difficult to determine the extent of problems found; tedious at best
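The sampling step is easy to script; the reviewer's inspection is the tedious part. The sketch below draws a reproducible random sample of record identifiers with Python's `random` module. The seed and sample size are arbitrary choices. A small sample still cannot show how widespread a problem is.

```python
import random


def sample_records(record_ids, sample_size, seed=42):
    """Return a reproducible random sample of record IDs for manual review."""
    ids = list(record_ids)
    rng = random.Random(seed)  # fixed seed so a second reviewer sees the same sample
    return rng.sample(ids, min(sample_size, len(ids)))
```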


Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages: better sorting and control by reviewer
– Disadvantages: unwieldy for large files; requires sustained focus from reviewer; requires translation into a tab-delimited file
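Both the spreadsheet and the visual-analysis approaches require translating records into a tab-delimited file. The sketch below does this with Python's `csv` module. It assumes each record is a dictionary with a string identifier plus lists of values per element. Joining repeated values with `|` is an arbitrary choice.

```python
import csv
import io


def records_to_tsv(records, id_field="id"):
    """Flatten records into tab-delimited text: one row per record, one column per element."""
    elements = sorted({name for record in records for name in record if name != id_field})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([id_field] + elements)
    for record in records:
        row = [record.get(id_field, "")]
        # Repeated values share one cell; absent elements become empty cells.
        row += ["|".join(record.get(name, [])) for name in elements]
        writer.writerow(row)
    return out.getvalue()
```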


Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)
– Advantages: view of several data dimensions simultaneously; reviewer controls data display; tends to pull reviewer focus to anomalies; handles fairly large files at one time while allowing subset views; display manipulation possible without programmers
– Disadvantages: high cost of software; requires translation into a tab-delimited file

Element Names vs. Record IDs (Scatter Plot)

Missing Elements (Scatter Plot)
– 2 records without a language element
– Format element present inconsistently
– Easy to rescale axis on the fly and scroll through records

Table View
– Non-empty “no information” values that may confuse end users
– Only DC Date elements are selected for display
– The only W3CDTF syntax present is four digits
– Sorted by element value
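The date problems shown in this table view can also be detected with a script. The sketch below is rough. Its regular expression checks W3CDTF syntax only and does not reject impossible dates such as month 13. The list of "no information" placeholder strings is a hypothetical example that would need to be built from the actual data.

```python
import re

# W3CDTF profiles: YYYY, YYYY-MM, YYYY-MM-DD, and date-times with a time zone.
W3CDTF = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)
PLACEHOLDERS = {"no information", "unknown", "n/a", "none"}  # hypothetical list


def classify_date(value):
    """Return 'placeholder', 'w3cdtf-year', 'w3cdtf', or 'other' for a DC date value."""
    text = value.strip()
    if text.lower() in PLACEHOLDERS:
        return "placeholder"
    if W3CDTF.fullmatch(text):
        return "w3cdtf-year" if len(text) == 4 else "w3cdtf"
    return "other"
```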


Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies


… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the “general good”
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 23: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2323

Distinguishing UsersDistinguishing Users

Uses information stored in a cookie Uses information stored in a cookie (the CookiePusher mechanism)(the CookiePusher mechanism)

Uses information contained in a Uses information contained in a digital certificate such as the one digital certificate such as the one proposed by the DLF digital proposed by the DLF digital certificates prototype projectcertificates prototype project

Identifies a users IP addressIdentifies a users IP address Obtains user attributes via the Obtains user attributes via the

Shibboleth frameworkShibboleth framework

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2424

Examples of Extended Service Examples of Extended Service LinksLinks

From a record in an abstracting and indexing From a record in an abstracting and indexing database (AampI) to the full-text described by the database (AampI) to the full-text described by the recordrecord

From a record describing a book in a library From a record describing a book in a library catalogue to a description of the same book in an catalogue to a description of the same book in an Internet book shopInternet book shop

From a reference in a journal article to a record From a reference in a journal article to a record matching that reference in an AampI databasematching that reference in an AampI database

From a citation in a journal article to a record in a From a citation in a journal article to a record in a library catalogue that shows the library holdings library catalogue that shows the library holdings of the cited journalof the cited journal

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2525

OpenURL Examples amp DemoOpenURL Examples amp Demo

httpsfxserveruniedusfxmenuissn=1234-5678ampdate=1998ampvolume=12ampissue=2ampspage=134

An OpenURL demoAn OpenURL demondash httpwwwukolnacukdistributed-syste

msopenurl

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2626

Defining and Ensuring Metadata Defining and Ensuring Metadata QualityQuality

What constitutes qualityWhat constitutes quality Techniques for evaluating and Techniques for evaluating and

enforcing consistency and enforcing consistency and predictabilitypredictability

Automated metadata creation Automated metadata creation advantages and disadvantagesadvantages and disadvantages

Metadata maintenance strategiesMetadata maintenance strategies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2727

Beginning to Define QualityBeginning to Define Quality

Experience of the library Experience of the library community--BIBCO amp NACOcommunity--BIBCO amp NACOndash Agreed upon standards for library Agreed upon standards for library

qualityqualityndash Training and documentation in support Training and documentation in support

of practitionersof practitionersndash Review and enforcement of standards Review and enforcement of standards

by means of institutional ldquobuddy by means of institutional ldquobuddy systemrdquosystemrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2828

How Does Quality HappenHow Does Quality Happen

Lessons from the library communityLessons from the library communityndash Quality is quantifiable and measurableQuality is quantifiable and measurablendash To be effective enforcement of standards To be effective enforcement of standards

of quality must take place at the of quality must take place at the community levelcommunity level

FurthermoreFurthermorendash Data problems are not unique to particular Data problems are not unique to particular

communitiescommunitiesndash general strategies can improve general strategies can improve

interoperabilityinteroperability

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2929

Quality Measurement CriteriaQuality Measurement Criteria

CompletenessCompleteness AccuracyAccuracy ProvenanceProvenance Conformance to expectationsConformance to expectations Logical consistency and coherenceLogical consistency and coherence Timeliness (Currency and Lag)Timeliness (Currency and Lag) AccessibilityAccessibility

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3030

CompletenessCompleteness

ldquoldquoMetadata should describe the target Metadata should describe the target objects as completely as objects as completely as economically feasiblerdquoeconomically feasiblerdquo

ldquoldquoElement set should be applied to Element set should be applied to the target object population as the target object population as completely as possiblerdquocompletely as possiblerdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3131

AccuracyAccuracy

Information provided in values Information provided in values should be correct and factualshould be correct and factual

Editing applied toEditing applied tondash Eliminate typosEliminate typosndash Ensure conforming name expressionsEnsure conforming name expressionsndash Ensure standard abbreviations usages Ensure standard abbreviations usages

in generalin general

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3232

ProvenanceProvenance

Who prepared the metadata What Who prepared the metadata What do we know about the preparerdo we know about the preparer

What methods were used to create What methods were used to create the metadata Is it human created or the metadata Is it human created or created by machinecreated by machine

What transformations have been What transformations have been applied since creationapplied since creation

Where has it been beforeWhere has it been before

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)
– Advantages
    View of several data dimensions simultaneously
    Reviewer controls data display
    Tends to pull reviewer focus to anomalies
    Handles fairly large files at one time while allowing subset views
    Display manipulation possible without programmers
– Disadvantages
    High cost of software
    Requires translation into tab-delimited file


Element Names vs. Record IDs (Scatter Plot)

[Screenshot: scatter plot of element names vs. record IDs]


Missing Elements (Scatter Plot)

[Screenshot; callouts:]
– 2 records without a language element
– format element present inconsistently
– Easy to rescale axis on the fly and scroll through records
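The pattern in the "Missing Elements" plot can be approximated without graphing software, with records along one axis and elements along the other. Below is a rough Python sketch of that idea. It assumes hypothetical `<record id="...">` wrappers and uses invented sample records arranged to mimic the callouts: two records lack `dc:language`, and `dc:format` appears only in some records.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical records: r2 and r4 lack dc:language; dc:format appears only in r1 and r2.
SAMPLE_XML = f"""<records xmlns:dc="{DC}">
  <record id="r1"><dc:title>A</dc:title><dc:language>en</dc:language><dc:format>image/jpeg</dc:format></record>
  <record id="r2"><dc:title>B</dc:title><dc:format>text/xml</dc:format></record>
  <record id="r3"><dc:title>C</dc:title><dc:language>fr</dc:language></record>
  <record id="r4"><dc:title>D</dc:title></record>
</records>"""


def element_presence(xml_text: str) -> Dict[str, Set[str]]:
    """Map each record id to the set of element names it contains."""
    return {
        rec.get("id"): {child.tag.rsplit("}", 1)[-1] for child in rec}
        for rec in ET.fromstring(xml_text).findall("record")
    }


def missing_report(presence: Dict[str, Set[str]], expected: List[str]) -> Dict[str, List[str]]:
    """For each expected element, list the ids of records that lack it."""
    return {el: sorted(rid for rid, els in presence.items() if el not in els) for el in expected}


def print_matrix(presence: Dict[str, Set[str]], expected: List[str]) -> None:
    """Text stand-in for the scatter plot: 'x' = element present, '.' = missing."""
    ids = list(presence)
    print("".ljust(10) + " ".join(rid.ljust(3) for rid in ids))
    for el in expected:
        print(el.ljust(10) + " ".join(("x" if el in presence[rid] else ".").ljust(3) for rid in ids))


presence = element_presence(SAMPLE_XML)
print_matrix(presence, ["title", "language", "format"])
print(missing_report(presence, ["language", "format"]))
```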


Table View

[Screenshot; callouts:]
– Non-empty “no information” values that may confuse end users
– Only DC Date elements are selected for display
– The only W3CDTF syntax present is four digits
– Sorted by element value
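The date problems shown in the table view can be flagged automatically. The Python sketch below classifies each value against the W3CDTF profiles: year, year-month, complete date, and date-time with a time zone designator. It also flags non-empty "no information" strings separately. The `PLACEHOLDERS` set is a hypothetical starting point that should be adapted to each collection. The regular expressions check syntax only.

```python
import re
from typing import Iterable, List, Optional, Tuple

# W3CDTF profiles, from least to most precise. Only the syntax is checked:
# out-of-range values such as month 13 are not rejected.
PROFILES = [
    ("year", re.compile(r"\d{4}")),
    ("year-month", re.compile(r"\d{4}-\d{2}")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("datetime", re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})")),
]

# Hypothetical list of non-empty "no information" values; extend per collection.
PLACEHOLDERS = {"n.d.", "unknown", "none", "not available", "[no date]"}


def w3cdtf_profile(value: str) -> Optional[str]:
    """Return the W3CDTF profile a value matches exactly, or None."""
    for name, pattern in PROFILES:
        if pattern.fullmatch(value):
            return name
    return None


def review_dates(values: Iterable[str]) -> List[Tuple[str, str]]:
    """Label each date value, sorted by value as in the table view."""
    rows = []
    for value in sorted(values):
        cleaned = value.strip()
        if cleaned.lower() in PLACEHOLDERS:
            verdict = "placeholder"
        else:
            profile = w3cdtf_profile(cleaned)
            verdict = f"W3CDTF {profile}" if profile else "not W3CDTF"
        rows.append((value, verdict))
    return rows


for value, verdict in review_dates(["unknown", "1998", "ca. 1900", "1998-05-12T10:30Z"]):
    print(f"{value!r:22} {verdict}")
```

Run over a whole collection, a summary of verdict counts would show the situation in the screenshot: the year-only profile is the only W3CDTF form present.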


Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies


… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the “general good”
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation


Crosswalking

“Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems.”

-- Mary Woodley, “Crosswalks: The Path to Universal Access”


“Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts -- Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on -- but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages.”

-- Jean Godby, “Two Paths to Interoperable Metadata”

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert a metadata record's content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
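Element-to-element mapping can be pictured as a lookup table applied to each record. The following Python sketch is a deliberately simplified illustration; it is not the Library of Congress MARC-to-DC crosswalk. Records are modeled as plain dictionaries from MARC tag to a list of values, and subfields and indicators are ignored. The table and the function name are hypothetical. Unmapped fields are returned rather than silently discarded, so the person running the conversion can see what would be lost.

```python
from typing import Dict, List, Tuple

Record = Dict[str, List[str]]

# Hypothetical, heavily simplified MARC-tag-to-DC table for illustration only.
# A real crosswalk also handles subfields, indicators, and many-to-one cases.
MARC_TO_DC = {"245": "title", "100": "creator", "650": "subject", "520": "description"}


def crosswalk(record: Record, table: Dict[str, str]) -> Tuple[Record, Record]:
    """Apply an element-to-element mapping; return (converted, unmapped)."""
    converted: Record = {}
    unmapped: Record = {}
    for source, values in record.items():
        target = table.get(source)
        if target is None:
            # No equivalent in the target schema: report it instead of dropping it.
            unmapped[source] = list(values)
        else:
            converted.setdefault(target, []).extend(values)
    return converted, unmapped


marc = {"245": ["Map of Ithaca"], "100": ["Smith, John"], "650": ["Maps", "Ithaca (N.Y.)"], "590": ["Local note"]}
dc, leftovers = crosswalk(marc, MARC_TO_DC)
print(dc)
print(leftovers)
```

In production, a stylesheet (for example, XSLT) driven by the same crosswalk would typically do this work. The dictionary form simply makes the mapping table explicit.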


Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards sometimes make sharing problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution
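Differences in repeatability force a policy decision: what happens when several source values meet a target element that allows only one? In simple Dublin Core, every element is optional and repeatable. The non-repeatable target here therefore stands in for a stricter, hypothetical schema. The Python sketch shows two possible policies, `first` and `join`; both names, and the function itself, are invented for illustration.

```python
from typing import List, Sequence, Tuple


def fit_to_target(values: Sequence[str], repeatable: bool,
                  policy: str = "first", sep: str = "; ") -> Tuple[List[str], List[str]]:
    """Fit source values into a target element; return (kept, dropped).

    policy="first" keeps the first value and reports the rest as dropped;
    policy="join" folds all values into one string (no loss, but less structure).
    """
    if repeatable or len(values) <= 1:
        return list(values), []
    if policy == "first":
        return [values[0]], list(values[1:])
    if policy == "join":
        return [sep.join(values)], []
    raise ValueError(f"unknown policy: {policy!r}")


subjects = ["Maps", "Ithaca (N.Y.)"]
print(fit_to_target(subjects, repeatable=True))
print(fit_to_target(subjects, repeatable=False))
print(fit_to_target(subjects, repeatable=False, policy="join"))
```

Neither policy is free. `first` loses data outright, while `join` keeps the text but erases the boundaries between values. This is one reason converting everything may not be the best solution.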


Example Mapping: MODS title to DC title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort


Mapping MODS title to DC title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows subelements for partNumber, partName

Best practice statement in DC-Lib says to include initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
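The mapping points above can be combined into a small conversion function. This Python sketch makes several assumptions that a real project would need to settle:

- Every typed `titleInfo` is sent to the DC Alternative refinement (`dcterms:alternative`).
- The initial article from `<nonSort>` is kept in the string, following the DC-Lib practice.
- `partNumber` and `partName` are folded in with periods, because DC title has no substructure.

Other subelements, such as `subTitle`, and the authority link are ignored. The sample record and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

MODS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS}

# titleInfo/@type values listed on the slide; sending them all to the DC
# Alternative refinement is an assumption of this sketch.
ALT_TYPES = {"abbreviated", "translated", "alternative", "uniform"}

# Hypothetical MODS record.
SAMPLE_MODS = f"""<mods xmlns="{MODS}">
  <titleInfo>
    <nonSort>The</nonSort>
    <title>Metadata handbook</title>
    <partNumber>Part 2</partNumber>
    <partName>Crosswalks</partName>
  </titleInfo>
  <titleInfo type="abbreviated">
    <title>Metadata hdbk.</title>
  </titleInfo>
</mods>"""


def _text(title_info: ET.Element, name: str) -> str:
    """Stripped text of the first mods:<name> child, or '' if absent."""
    el = title_info.find(f"mods:{name}", NS)
    return el.text.strip() if el is not None and el.text else ""


def mods_titles_to_dc(xml_text: str) -> List[Tuple[str, str]]:
    """Flatten each titleInfo into one (DC element, value) pair."""
    pairs = []
    for ti in ET.fromstring(xml_text).findall("mods:titleInfo", NS):
        # Keep the initial article (nonSort) in the DC string.
        main = " ".join(p for p in (_text(ti, "nonSort"), _text(ti, "title")) if p)
        # DC title has no substructure, so part data is folded into the string.
        parts = [p for p in (_text(ti, "partNumber"), _text(ti, "partName")) if p]
        element = "dcterms:alternative" if ti.get("type") in ALT_TYPES else "dc:title"
        pairs.append((element, ". ".join([main] + parts)))
    return pairs


for element, value in mods_titles_to_dc(SAMPLE_MODS):
    print(element, "=", value)
```

The sketch shows the loss the slide describes. Once the parts are joined into one string, a DC consumer can no longer tell the part number from the part name.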


Exercise

Evaluate a small set of human- and machine-created metadata

Page 25: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2525

OpenURL Examples amp DemoOpenURL Examples amp Demo

httpsfxserveruniedusfxmenuissn=1234-5678ampdate=1998ampvolume=12ampissue=2ampspage=134

An OpenURL demoAn OpenURL demondash httpwwwukolnacukdistributed-syste

msopenurl

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2626

Defining and Ensuring Metadata Defining and Ensuring Metadata QualityQuality

What constitutes qualityWhat constitutes quality Techniques for evaluating and Techniques for evaluating and

enforcing consistency and enforcing consistency and predictabilitypredictability

Automated metadata creation Automated metadata creation advantages and disadvantagesadvantages and disadvantages

Metadata maintenance strategiesMetadata maintenance strategies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2727

Beginning to Define QualityBeginning to Define Quality

Experience of the library Experience of the library community--BIBCO amp NACOcommunity--BIBCO amp NACOndash Agreed upon standards for library Agreed upon standards for library

qualityqualityndash Training and documentation in support Training and documentation in support

of practitionersof practitionersndash Review and enforcement of standards Review and enforcement of standards

by means of institutional ldquobuddy by means of institutional ldquobuddy systemrdquosystemrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2828

How Does Quality HappenHow Does Quality Happen

Lessons from the library communityLessons from the library communityndash Quality is quantifiable and measurableQuality is quantifiable and measurablendash To be effective enforcement of standards To be effective enforcement of standards

of quality must take place at the of quality must take place at the community levelcommunity level

FurthermoreFurthermorendash Data problems are not unique to particular Data problems are not unique to particular

communitiescommunitiesndash general strategies can improve general strategies can improve

interoperabilityinteroperability

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2929

Quality Measurement CriteriaQuality Measurement Criteria

CompletenessCompleteness AccuracyAccuracy ProvenanceProvenance Conformance to expectationsConformance to expectations Logical consistency and coherenceLogical consistency and coherence Timeliness (Currency and Lag)Timeliness (Currency and Lag) AccessibilityAccessibility

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3030

CompletenessCompleteness

ldquoldquoMetadata should describe the target Metadata should describe the target objects as completely as objects as completely as economically feasiblerdquoeconomically feasiblerdquo

ldquoldquoElement set should be applied to Element set should be applied to the target object population as the target object population as completely as possiblerdquocompletely as possiblerdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3131

AccuracyAccuracy

Information provided in values Information provided in values should be correct and factualshould be correct and factual

Editing applied toEditing applied tondash Eliminate typosEliminate typosndash Ensure conforming name expressionsEnsure conforming name expressionsndash Ensure standard abbreviations usages Ensure standard abbreviations usages

in generalin general

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3232

ProvenanceProvenance

Who prepared the metadata What Who prepared the metadata What do we know about the preparerdo we know about the preparer

What methods were used to create What methods were used to create the metadata Is it human created or the metadata Is it human created or created by machinecreated by machine

What transformations have been What transformations have been applied since creationapplied since creation

Where has it been beforeWhere has it been before

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages: better sorting and control by reviewer
– Disadvantages: unwieldy for large files; requires sustained focus from reviewer; requires translation into a tab-delimited file
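Spreadsheet and visual-analysis tools both need the XML records translated into a tab-delimited file. The sketch below is a minimal illustration using only Python's standard library. It assumes simple Dublin Core records wrapped in a hypothetical `<records>`/`<record id="…">` container. Joining repeated elements with `|` is an arbitrary choice made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def dc_xml_to_tsv(xml_text):
    """Flatten simple DC records into TSV: one row per record, one column per element."""
    root = ET.fromstring(xml_text)
    rows, columns = [], set()
    for record in root.iter("record"):
        row = {"record_id": record.get("id", "")}
        for child in record:
            if not child.tag.startswith(DC_NS):
                continue  # ignore anything that is not a DC element
            name = child.tag[len(DC_NS):]  # strip the '{namespace}' prefix
            value = (child.text or "").strip()
            # Repeated elements (e.g. several dc:subject values) share one cell
            row[name] = row[name] + "|" + value if name in row else value
            columns.add(name)
        rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["record_id"] + sorted(columns),
                            delimiter="\t", restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record id="r1"><dc:title>Maps of Ohio</dc:title><dc:subject>Maps</dc:subject><dc:subject>Ohio</dc:subject></record>
  <record id="r2"><dc:title>Field notes</dc:title><dc:language>en</dc:language></record>
</records>"""
print(dc_xml_to_tsv(sample))
```

Missing elements become empty cells. When the file is opened in a spreadsheet, those cells sort together, which helps with reviewing gaps.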


Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)
– Advantages: view of several data dimensions simultaneously; reviewer controls data display; tends to pull reviewer focus to anomalies; handles fairly large files at one time while allowing subset views; display manipulation possible without programmers
– Disadvantages: high cost of software; requires translation into a tab-delimited file


Element Names vs Record Ids (Scatter Plot)


Missing Elements (Scatter Plot)
– 2 records without a language element
– Format element present inconsistently
– Easy to rescale axis on the fly and scroll through records
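Without a graphical tool, the same kind of finding can be approximated by tabulating which records lack each element. The sketch below is a minimal illustration. It assumes records have already been parsed, for example from a tab-delimited export, into dicts that map element names to lists of values. The reviewer supplies the list of expected elements, and the function name is hypothetical.

```python
def missing_elements(records, expected):
    """Map each expected element to the sorted IDs of records that lack it.

    records: dict of record ID -> dict of element name -> list of values.
    A record counts as lacking an element if it has no value that is
    non-blank after whitespace is stripped.
    """
    report = {}
    for element in expected:
        report[element] = sorted(
            rid for rid, rec in records.items()
            if not any(v.strip() for v in rec.get(element, []))
        )
    return report


records = {
    "r1": {"title": ["Maps of Ohio"], "language": ["en"], "format": ["image/tiff"]},
    "r2": {"title": ["Field notes"], "format": ["text/plain"]},
    "r3": {"title": ["Survey data"], "language": [" "]},
}
for element, ids in missing_elements(records, ["title", "language", "format"]).items():
    print(f"{element}: {len(ids)} missing {ids}")
```

Treating whitespace-only values as missing is a design choice. It catches empty elements that a plain presence check would accept as filled.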


Table View
– Non-empty “no information” values that may confuse end users
– Only DC Date elements are selected for display
– The only W3CDTF syntax present is four digits
– Sorted by element value
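The two problems in the table view can both be checked automatically: placeholder strings, and dates that do not follow W3CDTF. W3CDTF allows YYYY, YYYY-MM, YYYY-MM-DD, or a complete date followed by hh:mm, optional seconds and fraction, and a required time zone designator. The sketch below is a minimal illustration of such a check. The placeholder list is hypothetical and should be tailored to a given collection.

```python
import re

# W3CDTF: YYYY[-MM[-DD[Thh:mm[:ss[.s]]TZD]]], where TZD is Z or +hh:mm / -hh:mm
W3CDTF = re.compile(
    r"\d{4}"
    r"(?:-(?:0[1-9]|1[0-2])"
    r"(?:-(?:0[1-9]|[12]\d|3[01])"
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d"
    r"(?::[0-5]\d(?:\.\d+)?)?"
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)"
    r")?)?)?"
)

# Hypothetical "no information" strings; extend for your own collection.
PLACEHOLDERS = {"unknown", "n.d.", "no date", "none", "n/a", "not available"}


def classify_date(value):
    """Label a DC Date value as 'empty', 'placeholder', 'w3cdtf', or 'other'."""
    v = value.strip()
    if not v:
        return "empty"
    if v.lower() in PLACEHOLDERS:
        return "placeholder"
    if W3CDTF.fullmatch(v):
        return "w3cdtf"
    return "other"


for value in ["1998", "1998-07", "2003-12-13T18:30:02Z", "Unknown", "ca. 1900", "12/13/2003"]:
    print(f"{value!r:26} {classify_date(value)}")
```

The pattern checks syntax only. An impossible date such as 2003-02-30 still passes.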


Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies


… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the “general good”
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation


Crosswalking

“Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems.”

-- Mary Woodley, “Crosswalks: The Path to Universal Access?”


“Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts -- Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on -- but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages.”

-- Jean Godby, Two Paths to Interoperable Metadata

Crosswalks

In general: semantic mapping of elements between source and target metadata standards
The process of metadata conversion specification includes the transformations required to convert a metadata record’s content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
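At its simplest, element-to-element mapping is a lookup table applied to each element of a record. The sketch below is a minimal illustration. The title → 245 and creator → 100 entries follow the equivalences in the Godby quote. The publisher → 260 $b and date → 260 $c entries are simplified assumptions added for the example. The table is not an official crosswalk.

```python
# Illustrative crosswalk table: DC element -> (MARC tag, subfield code).
# title/creator follow the Godby quote; the 260 entries are simplified assumptions.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("100", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def crosswalk(dc_record, table=DC_TO_MARC):
    """Apply an element-to-element crosswalk to a dict of element -> list of values.

    Returns (fields, unmapped): fields is a list of (tag, subfield, value)
    tuples; unmapped lists the DC elements the table has no target for.
    """
    fields, unmapped = [], []
    for element, values in dc_record.items():
        target = table.get(element)
        if target is None:
            unmapped.append(element)  # no equivalent, so this data would be lost
            continue
        tag, code = target
        for value in values:
            fields.append((tag, code, value))
    return fields, unmapped


record = {"title": ["Maps of Ohio"], "creator": ["Smith, Jane"], "coverage": ["Ohio"]}
fields, unmapped = crosswalk(record)
print(fields)    # [('245', 'a', 'Maps of Ohio'), ('100', 'a', 'Smith, Jane')]
print(unmapped)  # ['coverage']
```

This sketch naively emits one 100 field per creator value, even though MARC 100 is non-repeatable. A real conversion would route additional names to 700. This is one instance of the repeatability differences that complicate converted records.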


Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html
MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html
Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards make sharing sometimes problematic
– Properties may vary (e.g., repeatability)
Converting everything may not always be the best solution


Example Mapping: MODS title to DC title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform
Other attributes
– ID, authority, displayLabel, xLink
Subelements: title, partName, partNumber, nonSort


Mapping MODS title to DC title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows for subelements for partNumber, partName
Best practice statement in DC-Lib says to include initial article
– MODS parses it into <nonSort>
MODS can link to a title in an authority file if desired
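The points above can be sketched as a small flattening function. The illustration below is minimal and rests on several assumptions, which are choices made for the example rather than rules from the MODS or DC documentation. An untyped `<titleInfo>` becomes DC title. Any typed `<titleInfo>` becomes the Alternative refinement, because DC has no finer distinctions. The `<nonSort>` article is kept, in line with the DC-Lib practice. Part data is folded into the string with ". " separators.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def text_of(parent, name):
    """Return the stripped text of the first MODS child called `name`, or ''."""
    el = parent.find(MODS + name)
    return (el.text or "").strip() if el is not None else ""


def mods_titles_to_dc(mods_xml):
    """Flatten MODS <titleInfo> elements into (dc_term, value) pairs."""
    root = ET.fromstring(mods_xml)
    pairs = []
    for info in root.iter(MODS + "titleInfo"):
        # Keep the initial article by prepending nonSort (joined with a space;
        # elided articles such as "L'" would need special handling).
        parts = (text_of(info, "nonSort"), text_of(info, "title"))
        title = " ".join(p for p in parts if p)
        sub = text_of(info, "subTitle")
        if sub:
            title += ": " + sub
        # DC title has no substructure, so part data is folded into the string.
        for part in (text_of(info, "partNumber"), text_of(info, "partName")):
            if part:
                title += ". " + part
        # The MODS type (abbreviated, translated, ...) collapses to one refinement.
        term = "alternative" if info.get("type") else "title"
        pairs.append((term, title))
    return pairs


sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>The</nonSort><title>journal of field studies</title>
    <partNumber>Part 2</partNumber><partName>Appendices</partName>
  </titleInfo>
  <titleInfo type="abbreviated"><title>J. field stud.</title></titleInfo>
</mods>"""
print(mods_titles_to_dc(sample))
```

The output shows where information is lost. The abbreviated title becomes a generic alternative, the part structure becomes punctuation inside a string, and any `authority` attribute linking the title to an authority file is dropped.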


Exercise

Evaluate a small set of human- and machine-created metadata


Defining and Ensuring Metadata Quality

What constitutes quality?
Techniques for evaluating and enforcing consistency and predictability
Automated metadata creation: advantages and disadvantages
Metadata maintenance strategies


Beginning to Define Quality

Experience of the library community: BIBCO & NACO
– Agreed-upon standards for library quality
– Training and documentation in support of practitioners
– Review and enforcement of standards by means of an institutional “buddy system”


How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of standards of quality must take place at the community level
Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability


Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (currency and lag)
Accessibility


Completeness

“Metadata should describe the target objects as completely as economically feasible”

“Element set should be applied to the target object population as completely as possible”
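The second criterion lends itself to a simple measurement: for each element in the element set, compute the share of the object population that carries it. The sketch below is a minimal illustration. It assumes records are dicts that map element names to lists of values, and that blank values do not count. Whether a given rate is “as complete as economically feasible” remains a human judgment.

```python
def element_fill_rates(records, element_set):
    """Return, per element, the fraction of records with at least one
    non-blank value for it (1.0 means it was applied to every record)."""
    total = len(records)
    rates = {}
    for element in element_set:
        filled = sum(1 for rec in records
                     if any(v.strip() for v in rec.get(element, [])))
        rates[element] = filled / total if total else 0.0
    return rates


records = [
    {"title": ["Maps of Ohio"], "subject": ["Maps"], "rights": []},
    {"title": ["Field notes"], "subject": ["Botany", "Ohio"]},
    {"title": ["Survey data"]},
    {"title": ["Letters"], "rights": ["Public domain"]},
]
for element, rate in element_fill_rates(records, ["title", "subject", "rights"]).items():
    print(f"{element:8} {rate:.0%}")
```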


Accuracy

Information provided in values should be correct and factual
Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviations and usages in general


Provenance

Who prepared the metadata? What do we know about the preparer?
What methods were used to create the metadata? Is it human-created or created by machine?
What transformations have been applied since creation?
Where has it been before?


Conformance to Expectations

Contains elements a community would expect to find
Controlled vocabularies are well chosen and explicitly exposed to downstream users
Metadata is reflective of community thinking about necessary compromises
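Once a controlled vocabulary has been chosen and exposed, conformance to it can be tested value by value. The sketch below is a minimal illustration. It assumes, purely for the example, that the community expects `dc:type` values drawn from the DCMI Type Vocabulary and that comparison is case-sensitive. Records are assumed to be dicts of element name to list of values, keyed by record ID.

```python
# Terms of the DCMI Type Vocabulary
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}


def nonconforming_types(records):
    """Return (record_id, value) pairs whose type value is not a DCMI Type term."""
    problems = []
    for rid, rec in records.items():
        for value in rec.get("type", []):
            if value.strip() not in DCMI_TYPES:
                problems.append((rid, value))
    return problems


records = {
    "r1": {"type": ["Text"]},
    "r2": {"type": ["text"]},        # wrong case
    "r3": {"type": ["Photograph"]},  # local term, not in the vocabulary
    "r4": {"type": ["StillImage"]},
}
print(nonconforming_types(records))  # [('r2', 'text'), ('r3', 'Photograph')]
```

A local term such as “Photograph” may be a deliberate, documented compromise. The check flags it for review and does not prove an error.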


Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used
Similar structures and appearance are enabled for search results
There is very limited reliance on defaulted values
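Reliance on defaulted values often shows up as a single value repeated across nearly every record. The sketch below is a minimal heuristic for surfacing such values. The 0.8 threshold is an arbitrary assumption, and records are assumed to be dicts of element name to list of values. A uniformly applied value may still be legitimate, so the output is a prompt for review and not a verdict.

```python
from collections import Counter


def suspected_defaults(records, element, threshold=0.8):
    """Return values of `element` shared by at least `threshold` of the
    records that carry that element; such values may be system defaults."""
    counts = Counter()
    with_element = 0
    for rec in records:
        values = {v.strip() for v in rec.get(element, []) if v.strip()}
        if values:
            with_element += 1
            counts.update(values)  # each distinct value counted once per record
    if not with_element:
        return []
    return sorted(v for v, n in counts.items() if n / with_element >= threshold)


records = (
    [{"type": ["Text"], "rights": ["All rights reserved"]} for _ in range(6)]
    + [{"type": ["Image"], "rights": ["All rights reserved"]} for _ in range(4)]
)
print(suspected_defaults(records, "rights"))  # ['All rights reserved']
print(suspected_defaults(records, "type"))    # []
```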

Page 27: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2727

Beginning to Define QualityBeginning to Define Quality

Experience of the library Experience of the library community--BIBCO amp NACOcommunity--BIBCO amp NACOndash Agreed upon standards for library Agreed upon standards for library

qualityqualityndash Training and documentation in support Training and documentation in support

of practitionersof practitionersndash Review and enforcement of standards Review and enforcement of standards

by means of institutional ldquobuddy by means of institutional ldquobuddy systemrdquosystemrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2828

How Does Quality HappenHow Does Quality Happen

Lessons from the library communityLessons from the library communityndash Quality is quantifiable and measurableQuality is quantifiable and measurablendash To be effective enforcement of standards To be effective enforcement of standards

of quality must take place at the of quality must take place at the community levelcommunity level

FurthermoreFurthermorendash Data problems are not unique to particular Data problems are not unique to particular

communitiescommunitiesndash general strategies can improve general strategies can improve

interoperabilityinteroperability

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2929

Quality Measurement CriteriaQuality Measurement Criteria

CompletenessCompleteness AccuracyAccuracy ProvenanceProvenance Conformance to expectationsConformance to expectations Logical consistency and coherenceLogical consistency and coherence Timeliness (Currency and Lag)Timeliness (Currency and Lag) AccessibilityAccessibility

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3030

CompletenessCompleteness

ldquoldquoMetadata should describe the target Metadata should describe the target objects as completely as objects as completely as economically feasiblerdquoeconomically feasiblerdquo

ldquoldquoElement set should be applied to Element set should be applied to the target object population as the target object population as completely as possiblerdquocompletely as possiblerdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3131

AccuracyAccuracy

Information provided in values Information provided in values should be correct and factualshould be correct and factual

Editing applied toEditing applied tondash Eliminate typosEliminate typosndash Ensure conforming name expressionsEnsure conforming name expressionsndash Ensure standard abbreviations usages Ensure standard abbreviations usages

in generalin general

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3232

ProvenanceProvenance

Who prepared the metadata What Who prepared the metadata What do we know about the preparerdo we know about the preparer

What methods were used to create What methods were used to create the metadata Is it human created or the metadata Is it human created or created by machinecreated by machine

What transformations have been What transformations have been applied since creationapplied since creation

Where has it been beforeWhere has it been before

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Figure: Element Names vs Record Ids (Scatter Plot)

Figure: Missing Elements (Scatter Plot). Callouts: 2 records without a language element; format element present inconsistently; easy to rescale the axis on the fly and scroll through records.
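
The missing-element plot is, in effect, a record-by-element presence grid, and the same gaps can be listed programmatically. A sketch assuming (record_id, element) pairs, such as rows read from a tab-delimited export, and a reviewer-supplied list of expected elements; both function names are hypothetical. Records that contribute no rows at all will not appear, so compare the output against the full list of record ids.

```python
from collections import defaultdict

def missing_elements(rows, expected):
    """Map each record id to the expected elements it lacks (complete records are omitted).

    rows: iterable of (record_id, element) pairs.
    expected: element names the reviewer expects in every record (an assumption).
    """
    present = defaultdict(set)
    for rec_id, element in rows:
        present[rec_id].add(element)
    expected = set(expected)
    return {
        rec_id: sorted(expected - elements)
        for rec_id, elements in present.items()
        if expected - elements
    }

def element_coverage(rows):
    """Number of distinct records using each element; low counts flag inconsistent use."""
    records_by_element = defaultdict(set)
    for rec_id, element in rows:
        records_by_element[element].add(rec_id)
    return {element: len(ids) for element, ids in records_by_element.items()}
```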

Figure: Table View (only DC Date elements selected for display, sorted by element value). Callouts: non-empty “no information” values that may confuse end users; the only W3CDTF syntax present is four digits.
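
The table view surfaces two problems that can also be checked in code: placeholder values that say nothing, and dates outside W3CDTF syntax. A sketch assuming the placeholder list shown (illustrative, not a standard) and a regular expression for the W3CDTF profile of ISO 8601. The expression checks syntax only, so a month of 13 would pass.

```python
import re

# W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, or a full date plus a time (hh:mm, optional :ss
# and fraction) and a mandatory time zone designator (Z or +/-hh:mm).
# Syntax only: "2001-13" would pass.
W3CDTF = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)

# Illustrative placeholders that fill a field without informing the user.
NO_INFO = {"n/a", "none", "unknown", "n.d.", "no date", "undated", "not available"}

def review_dates(values):
    """Split date values into W3CDTF-conforming, "no information", and other."""
    report = {"w3cdtf": [], "no_info": [], "other": []}
    for value in sorted(values):  # sorted by value, as in the table view
        text = value.strip()
        if text.lower().strip("[]() ") in NO_INFO:
            report["no_info"].append(value)
        elif W3CDTF.fullmatch(text):
            report["w3cdtf"].append(value)
        else:
            report["other"].append(value)
    return report
```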

Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application Profiles
– Training materials, tools, methodologies

… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the “general good”
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation

Crosswalking

“Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks: there is rarely a one-to-one correspondence between the fields or data elements in different information systems.”

-- Mary Woodley, “Crosswalks: The Path to Universal Access”

“Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts -- Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on -- but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages.”

-- Jean Godby, Two Paths to Interoperable Metadata

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification includes the transformations required to convert a metadata record's content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
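
At its simplest, element-to-element mapping is a lookup table plus a record of what did not map. A sketch using only the two pairs named in the Godby quotation (Dublin Core's creator element stands in for "author"); a real crosswalk would be far larger, and published crosswalks may choose different MARC fields. Hierarchy resolution and content conversion are out of scope here.

```python
# Illustrative fragment only, taken from the Godby quotation; not a published crosswalk.
DC_TO_MARC = {
    "title": "245",
    "creator": "100",
}

def apply_crosswalk(record, crosswalk):
    """Map source elements to target fields and report source elements with no mapping.

    record: dict of element name -> list of values.
    Returns (converted, unmapped).
    """
    converted, unmapped = {}, {}
    for element, values in record.items():
        target = crosswalk.get(element)
        if target is None:
            unmapped[element] = list(values)  # data that would be lost in conversion
        else:
            converted.setdefault(target, []).extend(values)
    return converted, unmapped
```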

Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards make sharing sometimes problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution
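
Repeatability is one property that can silently drop data: if the target element is non-repeatable, extra values must go somewhere or be lost. A sketch that keeps the first value and returns the rest so the loss is visible; the `repeatable` table describing the target scheme is a hypothetical input, and keeping the first value is only one possible policy.

```python
def enforce_repeatability(record, repeatable):
    """Keep only the first value of non-repeatable target elements.

    record: dict of target element -> list of values.
    repeatable: dict of target element -> bool (hypothetical target-scheme properties;
    elements not listed are treated as repeatable).
    Returns (kept, dropped).
    """
    kept, dropped = {}, {}
    for element, values in record.items():
        if repeatable.get(element, True) or len(values) <= 1:
            kept[element] = list(values)
        else:
            kept[element] = values[:1]
            dropped[element] = values[1:]
    return kept, dropped
```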

Example Mapping: MODS title to DC title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort

Mapping MODS title to DC title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows for subelements for partNumber, partName

Best practice statement in DC-Lib says to include initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file if desired
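
This title mapping can be sketched with Python's standard XML parser, assuming a single MODS record in the `http://www.loc.gov/mods/v3` namespace. Following the DC-Lib guidance, the initial article from `<nonSort>` is restored; partNumber and partName are flattened into the title string because DC title has no substructure; and any typed title is sent to the Alternative refinement. The separator, the typed-title policy, and ignoring other subelements such as subTitle are assumptions of this sketch, not rules from either standard.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

def _text(parent, name):
    node = parent.find("mods:" + name, MODS_NS)
    return node.text if node is not None and node.text else ""

def mods_titles_to_dc(mods_xml):
    """Flatten MODS <titleInfo> elements into DC title / alternative strings."""
    root = ET.fromstring(mods_xml)
    result = {"title": [], "alternative": []}
    for info in root.findall("mods:titleInfo", MODS_NS):
        non_sort = _text(info, "nonSort")
        # Restore the initial article. nonSort often carries its own trailing
        # space ("The "); add one only if it is missing and no elision apostrophe ends it.
        if non_sort and not non_sort[-1].isspace() and not non_sort.endswith("'"):
            non_sort += " "
        parts = [non_sort + _text(info, "title").strip()]
        # DC title has no substructure, so part data becomes plain text.
        # The ". " separator is an assumption.
        for name in ("partNumber", "partName"):
            value = _text(info, name).strip()
            if value:
                parts.append(value)
        # Typed titles (abbreviated, translated, alternative, uniform) go to Alternative.
        key = "alternative" if info.get("type") else "title"
        result[key].append(". ".join(parts))
    return result
```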

Exercise

Evaluate a small set of human- and machine-created metadata

How Does Quality Happen?

Lessons from the library community
– Quality is quantifiable and measurable
– To be effective, enforcement of standards of quality must take place at the community level

Furthermore
– Data problems are not unique to particular communities
– General strategies can improve interoperability

Quality Measurement Criteria

Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (Currency and Lag)
Accessibility

Completeness

“Metadata should describe the target objects as completely as economically feasible”

“Element set should be applied to the target object population as completely as possible”
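
The two completeness statements suggest two measurements: how fully each record uses the element set, and how consistently each element is applied across the object population. A sketch assuming records as dicts of element names to lists of values; it treats any non-blank value as present, which says nothing about whether the value is accurate.

```python
def record_completeness(record, element_set):
    """Fraction of a (non-empty) element set with at least one non-blank value in the record."""
    filled = sum(1 for e in element_set if any(v.strip() for v in record.get(e, [])))
    return filled / len(element_set)

def element_fill_rates(records, element_set):
    """Per element, the share of records in the population that use it."""
    if not records:
        return {e: 0.0 for e in element_set}
    return {
        e: sum(1 for r in records if any(v.strip() for v in r.get(e, []))) / len(records)
        for e in element_set
    }
```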

Accuracy

Information provided in values should be correct and factual

Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviations and usages in general
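
Part of the typo-elimination step can be automated when an authority list exists: values not found in the list are flagged, and close spellings are suggested as likely corrections. A sketch using the standard library's `difflib.get_close_matches`; the authority list and the 0.8 similarity cutoff are assumptions to tune locally, and a suggestion is a lead for an editor, not an automatic fix.

```python
import difflib

def check_against_authority(values, authority, cutoff=0.8):
    """Flag values absent from an authority list and suggest close spellings.

    Returns {value: [suggestions]}; an empty suggestion list means nothing was close.
    """
    known = set(authority)
    findings = {}
    for value in values:
        if value not in known:
            findings[value] = difflib.get_close_matches(value, authority, n=3, cutoff=cutoff)
    return findings
```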

Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or created by machine?

What transformations have been applied since creation?

Where has it been before?
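
The provenance questions amount to a small event history attached to each record. A sketch of one possible structure; the class and field names are hypothetical, not drawn from any provenance standard, and answering the questions still depends on someone recording the events.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ProvenanceEvent:
    when: date
    agent: str   # who acted: a person, an organization, or a program
    action: str  # e.g. "created", "crosswalked from MODS", "harvested"
    method: str  # "human" or "machine"

@dataclass
class MetadataProvenance:
    record_id: str
    events: list[ProvenanceEvent] = field(default_factory=list)

    def add(self, event: ProvenanceEvent) -> None:
        self.events.append(event)
        self.events.sort(key=lambda e: e.when)  # keep the history in date order

    def created_by_machine(self) -> bool:
        """Human-created or machine-created, judged from the earliest event."""
        return bool(self.events) and self.events[0].method == "machine"

    def transformations_since_creation(self) -> list[str]:
        return [e.action for e in self.events[1:]]

    def agents(self) -> list[str]:
        """Everyone who has handled the record, oldest first."""
        return [e.agent for e in self.events]
```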

Conformance to Expectations

Contains elements a community would expect to find

Controlled vocabularies are well-chosen and explicitly exposed to downstream users

Metadata is reflective of community thinking about necessary compromises
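
Whether a record contains the elements a community expects, and draws on its controlled vocabularies, can be tested against an application profile. A sketch with a hypothetical minimal profile: the required-element list is invented for illustration, while the vocabulary for type is the DCMI Type Vocabulary.

```python
# DCMI Type Vocabulary terms, used here as the controlled vocabulary for "type".
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# A hypothetical, minimal application profile.
PROFILE = {
    "required": ["title", "type", "identifier"],
    "vocabularies": {"type": DCMI_TYPES},
}

def check_conformance(record, profile):
    """Return human-readable problems; an empty list means the record conforms."""
    problems = []
    for element in profile["required"]:
        if not record.get(element):
            problems.append(f"missing required element: {element}")
    for element, vocab in profile["vocabularies"].items():
        for value in record.get(element, []):
            if value not in vocab:
                problems.append(f"{element} value not in vocabulary: {value!r}")
    return problems
```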

Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Similar structures and appearance are enabled for search results

There is very limited reliance on defaulted values
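
Heavy reliance on defaulted values often shows up as one value repeated across nearly every record. A heuristic sketch; the 90% threshold is an assumption, and a widely shared value may be legitimate (for example, a collection that really was created in a single year), so results need human review.

```python
from collections import Counter

def suspected_defaults(records, element, threshold=0.9):
    """Values of `element` shared by at least `threshold` of the records that carry it.

    records: list of dicts of element name -> list of values.
    """
    carrying = sum(1 for r in records if r.get(element))
    if carrying == 0:
        return []
    # Count each value once per record, so repeats inside one record do not inflate it.
    counts = Counter(v for r in records for v in set(r.get(element, [])))
    return sorted(v for v, n in counts.items() if n / carrying >= threshold)
```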

Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object disseminated before some or all metadata is available

“Metadata aging” is affected by cultural differences between librarians and technologists
– Librarians: once and it’s done
– Technologists: metadata as an iterative process
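
Both timeliness problems reduce to date comparisons, provided the dates are recorded at all. A sketch assuming the object's modification and release dates and the metadata's update and availability dates are known; the function names are hypothetical.

```python
from datetime import date

def currency_problem(object_modified: date, metadata_modified: date) -> bool:
    """Currency: the target object changed after its metadata was last updated."""
    return object_modified > metadata_modified

def lag_days(object_released: date, metadata_available: date) -> int:
    """Lag: days the object circulated before its metadata was available (0 if none)."""
    return max(0, (metadata_available - object_released).days)
```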

Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as “premium” or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp Applications

40

Element Names vs Record Ids (Scatter Plot)

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 29: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 2929

Quality Measurement CriteriaQuality Measurement Criteria

CompletenessCompleteness AccuracyAccuracy ProvenanceProvenance Conformance to expectationsConformance to expectations Logical consistency and coherenceLogical consistency and coherence Timeliness (Currency and Lag)Timeliness (Currency and Lag) AccessibilityAccessibility

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3030

CompletenessCompleteness

ldquoldquoMetadata should describe the target Metadata should describe the target objects as completely as objects as completely as economically feasiblerdquoeconomically feasiblerdquo

ldquoldquoElement set should be applied to Element set should be applied to the target object population as the target object population as completely as possiblerdquocompletely as possiblerdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3131

AccuracyAccuracy

Information provided in values Information provided in values should be correct and factualshould be correct and factual

Editing applied toEditing applied tondash Eliminate typosEliminate typosndash Ensure conforming name expressionsEnsure conforming name expressionsndash Ensure standard abbreviations usages Ensure standard abbreviations usages

in generalin general

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3232

ProvenanceProvenance

Who prepared the metadata What Who prepared the metadata What do we know about the preparerdo we know about the preparer

What methods were used to create What methods were used to create the metadata Is it human created or the metadata Is it human created or created by machinecreated by machine

What transformations have been What transformations have been applied since creationapplied since creation

Where has it been beforeWhere has it been before

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp Applications

40

Element Names vs Record Ids (Scatter Plot)

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

A metadata conversion specification covers the transformations required to convert a metadata record's content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions

Stylesheets can be created to transform metadata based on crosswalks
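Once the human-identified equivalences are written down, a crosswalk can be applied mechanically. The sketch below uses Python instead of an XSLT stylesheet for brevity. Its two-entry table covers only the equivalences named in the Godby quote: MARC 245 for title, and MARC 100 for the creator, which simple Dublin Core calls `creator` rather than author. The flat `(tag, value)` input is an assumption that ignores MARC indicators and subfields, and `crosswalk` is a hypothetical helper, not a library API.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Illustrative crosswalk table: MARC tag -> simple Dublin Core element.
# Only the two equivalences from the Godby quote are included.
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
}


def crosswalk(marc_fields, table=MARC_TO_DC):
    """Convert (tag, value) pairs into a DC <record> element.

    Returns the new element and the list of source tags that had no
    mapping, so unmapped data is reported instead of silently dropped.
    """
    record = ET.Element("record")
    unmapped = []
    for tag, value in marc_fields:
        dc_name = table.get(tag)
        if dc_name is None:
            unmapped.append(tag)
            continue
        child = ET.SubElement(record, f"{{{DC_NS}}}{dc_name}")
        child.text = value
    return record, unmapped


record, skipped = crosswalk(
    [("245", "Metadata fundamentals"), ("100", "Doe, Jane"), ("650", "Metadata")]
)
print(ET.tostring(record, encoding="unicode"))
print(skipped)  # ['650']
```

Reporting unmapped tags (here the subject field `650`) instead of discarding them makes it easier to see what a conversion leaves behind.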


Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards can make sharing problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution
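The losses listed above can be surfaced during conversion rather than discovered later. This is a minimal sketch built on two hypothetical rules. First, a many-to-one table collapses two granular date elements into one simple `date`. Second, `title` is treated as non-repeatable. In simple Dublin Core every element is repeatable, so that second rule stands in for whatever target schema is actually in use. The `convert` function returns the converted record together with a list of what was lost.

```python
from collections import defaultdict

# Hypothetical crosswalk: two granular source dates collapse into one
# simple target element, so the kind of date is lost.
GRANULAR_TO_SIMPLE = {
    "dateCreated": "date",
    "dateIssued": "date",
    "title": "title",
}

# Hypothetical target properties that accept only a single value.
NON_REPEATABLE = {"title"}


def convert(source, table=GRANULAR_TO_SIMPLE, single=NON_REPEATABLE):
    """Map a {element: [values]} record and report what the conversion loses."""
    target = defaultdict(list)
    origins = defaultdict(set)  # source elements that fed each target element
    losses = []
    for src_name, values in source.items():
        tgt_name = table.get(src_name)
        if tgt_name is None:
            losses.append(f"no mapping for {src_name}")
            continue
        origins[tgt_name].add(src_name)
        target[tgt_name].extend(values)
    for tgt_name, sources in origins.items():
        if len(sources) > 1:
            losses.append(f"{tgt_name}: merged {sorted(sources)}, distinction lost")
        if tgt_name in single and len(target[tgt_name]) > 1:
            dropped = target[tgt_name][1:]
            target[tgt_name] = target[tgt_name][:1]
            losses.append(f"{tgt_name}: not repeatable, dropped {dropped}")
    return dict(target), losses


converted, lost = convert({
    "dateCreated": ["1998"],
    "dateIssued": ["2001"],
    "title": ["Main title", "Second title"],
    "extent": ["3 p."],
})
print(converted)
for note in lost:
    print(note)
```

A loss report like this is one way to judge whether a given source is worth converting at all, or is better shared in its richer native format.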


Example Mapping: MODS title to DC title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort


Mapping MODS title to DC title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows for subelements for partNumber and partName

Best practice statement in DC-Lib says to include the initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
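The mapping decisions above can be sketched in code. This is not an official crosswalk. The choices are assumptions made for illustration. An untyped `titleInfo` becomes DC title, and any typed one becomes the `alternative` refinement. `nonSort` is kept, per the DC-Lib advice. `partNumber` and `partName` are appended with ". " because DC title has no substructure. The MODS v3 namespace is real; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def child_text(parent, tag):
    """Stripped text of a MODS child element, or '' if absent."""
    el = parent.find(MODS + tag)
    return el.text.strip() if el is not None and el.text else ""


def mods_titles_to_dc(mods_xml):
    """Map each top-level MODS <titleInfo> to a (dc_element, value) pair.

    Assumptions, not an official crosswalk:
    - untyped titleInfo -> DC title; any typed one (abbreviated,
      translated, alternative, uniform) -> the refinement "alternative";
    - nonSort is kept, following the DC-Lib advice on initial articles;
    - partNumber and partName are appended with ". " (no substructure in DC);
    - attributes such as authority are dropped; DC title cannot carry them.
    """
    root = ET.fromstring(mods_xml)
    pairs = []
    for info in root.findall(MODS + "titleInfo"):
        main = " ".join(p for p in (child_text(info, "nonSort"),
                                    child_text(info, "title")) if p)
        parts = [child_text(info, "partNumber"), child_text(info, "partName")]
        value = ". ".join(p for p in [main, *parts] if p)
        element = "alternative" if info.get("type") else "title"
        pairs.append((element, value))
    return pairs


SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>The</nonSort>
    <title>annotated Alice</title>
    <partNumber>Part 2</partNumber>
    <partName>Through the looking-glass</partName>
  </titleInfo>
  <titleInfo type="alternative">
    <title>Alice annotated</title>
  </titleInfo>
</mods>"""

for element, value in mods_titles_to_dc(SAMPLE):
    print(element, "=", value)
```

Collapsing four MODS title types into one DC refinement is exactly the kind of semantic loss described under "Problems With Converted Records". The type distinction survives only if the target schema can express it.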


Exercise

Evaluate a small set of human- and machine-created metadata


Completeness

“Metadata should describe the target objects as completely as economically feasible.”

“Element set should be applied to the target object population as completely as possible.”
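One rough way to measure the second principle is to compute the share of an expected element set that each record actually fills, then average over the collection. This sketch assumes records are dicts mapping element names to lists of values. The expected set is a hypothetical stand-in for a real application profile. Whether a gap is "economically feasible" to fill is still a human judgment the number cannot make.

```python
# Hypothetical expected element set, e.g. taken from a collection's
# application profile.
EXPECTED = {"title", "creator", "date", "type", "format", "language", "identifier"}


def completeness(record, expected=EXPECTED):
    """Share of expected elements that carry at least one non-blank value."""
    filled = {
        name
        for name, values in record.items()
        if name in expected and any(v.strip() for v in values)
    }
    return len(filled) / len(expected)


def collection_completeness(records, expected=EXPECTED):
    """Mean record-level completeness across a set of records."""
    scores = [completeness(r, expected) for r in records]
    return sum(scores) / len(scores) if scores else 0.0


sparse = {"title": ["Map of Ithaca"], "date": [" "], "subject": ["Maps"]}
full = {name: ["x"] for name in EXPECTED}
print(round(completeness(sparse), 2))
print(round(collection_completeness([sparse, full]), 2))
```

Blank strings are deliberately not counted as filled, and elements outside the expected set (here `subject`) do not raise the score.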


Accuracy

Information provided in values should be correct and factual

Editing applied to:
– Eliminate typos
– Ensure conforming name expressions
– Ensure standard abbreviation usage in general
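Two of these editing tasks lend themselves to simple automated checks. Catching typos usually still needs a dictionary or human review. This sketch uses hypothetical local rules: a name is treated as conforming if it is in inverted "Surname, Forename" form, and a small table maps variant abbreviations to a preferred form. A real project would take both rules from its own name authority practice and abbreviation guidelines.

```python
import re

# Hypothetical local rules, for illustration only.
INVERTED_NAME = re.compile(r"^[^,]+, [^,]+$")  # "Surname, Forename"
PREFERRED_ABBREVIATIONS = {"dept": "Dept.", "dep't": "Dept.", "univ": "Univ."}


def check_name(value):
    """Return a problem description, or None if the name form conforms."""
    if INVERTED_NAME.match(value.strip()):
        return None
    return f"name not in 'Surname, Forename' form: {value!r}"


def normalize_abbreviations(value, table=PREFERRED_ABBREVIATIONS):
    """Replace known variant abbreviations with the preferred form, word by word."""
    return " ".join(table.get(word.lower(), word) for word in value.split())


for name in ["Doe, Jane", "Jane Doe"]:
    print(name, "->", check_name(name) or "ok")
print(normalize_abbreviations("History dept"))
```

The name pattern is deliberately simple. Forms with dates or extra commas (for example "Doe, Jane, 1950-") would be flagged for review rather than accepted.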


Provenance

Who prepared the metadata? What do we know about the preparer?

What methods were used to create the metadata? Is it human-created or machine-created?

What transformations have been applied since creation?

Where has it been before?
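Each of these questions can be answered later only if the record carries a provenance note from the start. The sketch below is a hypothetical structure, not a standard provenance schema. It stores the preparer and creation method, and logs each transformation with the system where it happened. The last question can then be answered from the log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Provenance:
    """Hypothetical provenance note stored alongside a metadata record."""

    preparer: str  # who prepared the metadata
    method: str    # how it was created, e.g. "human" or "machine"
    history: list = field(default_factory=list)  # transformations since creation

    def log(self, step, system):
        """Record one transformation and the system where it happened."""
        self.history.append({
            "step": step,
            "system": system,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def systems_visited(self):
        """Answer 'where has it been before?' in order, without repeats."""
        seen = []
        for entry in self.history:
            if entry["system"] not in seen:
                seen.append(entry["system"])
        return seen


note = Provenance(preparer="Catalog department", method="human")
note.log("MARC to MODS", "ILS export")
note.log("MODS to simple DC", "OAI-PMH provider")
note.log("date normalization", "OAI-PMH provider")
print(note.systems_visited())  # ['ILS export', 'OAI-PMH provider']
```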


Conformance to Expectations

Contains the elements a community would expect to find

Controlled vocabularies are well chosen and explicitly exposed to downstream users

Metadata reflects community thinking about necessary compromises
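A vocabulary only helps downstream users if the values really come from it. As a small sketch, type values can be checked against the twelve terms of the DCMI Type Vocabulary. The dict-of-lists record layout and the function name are assumptions. The comparison is case-sensitive on purpose, so a near miss like "text" is reported.

```python
# Terms of the DCMI Type Vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}


def nonconforming_types(records, vocabulary=DCMI_TYPES):
    """Return (record_id, value) pairs whose type value is not in the vocabulary."""
    problems = []
    for rec_id, record in records.items():
        for value in record.get("type", []):
            if value not in vocabulary:
                problems.append((rec_id, value))
    return problems


records = {
    "r1": {"type": ["Text"]},
    "r2": {"type": ["text", "Image"]},
    "r3": {"type": ["photograph"]},
}
print(nonconforming_types(records))  # [('r2', 'text'), ('r3', 'photograph')]
```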


Logical Consistency/Coherence

Standard mechanisms like application profiles and common crosswalks are used

Search results can be displayed with similar structure and appearance

There is very limited reliance on defaulted values
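Heavy reliance on defaulted values often shows up as one value that nearly every record shares. This heuristic sketch flags any value carried by at least a given share of records; the 0.9 threshold is an arbitrary assumption. It only directs human review. A legitimately shared value, such as a collection-wide rights statement, will be flagged too.

```python
from collections import Counter


def suspected_defaults(records, element, threshold=0.9):
    """Values of `element` carried by at least `threshold` of all records.

    A value that nearly every record shares is often a template default
    rather than a description of the individual object. This is a
    heuristic to direct human review, not proof of a problem.
    """
    if not records:
        return []
    counts = Counter(v for r in records for v in set(r.get(element, [])))
    return sorted(v for v, n in counts.items() if n / len(records) >= threshold)


records = [{"language": ["eng"], "rights": ["See rights statement"]} for _ in range(9)]
records.append({"language": ["fre"], "rights": ["See rights statement"]})
print(suspected_defaults(records, "language"))  # ['eng']
print(suspected_defaults(records, "rights"))    # ['See rights statement']
```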


Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object is disseminated before some or all metadata is available

“Metadata aging” is affected by cultural differences between librarians and technologists
– Librarians: once and it’s done
– Technologists: metadata as an iterative process
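Both timeliness problems can be checked if timestamps are kept for the object and for its metadata, which this sketch simply assumes. Currency is flagged when the object changed after its metadata was last updated. Lag is flagged when metadata was created after the object was disseminated. The optional `grace` period is a hypothetical tolerance.

```python
from datetime import datetime, timedelta


def timeliness_issues(object_modified, metadata_modified,
                      object_released=None, metadata_created=None,
                      grace=timedelta(0)):
    """Report currency and lag problems for one object/metadata pair."""
    issues = []
    # Currency: the object changed after its metadata was last touched.
    if object_modified - metadata_modified > grace:
        issues.append("currency: object changed after metadata was last updated")
    # Lag: the object went out before its metadata existed.
    if (object_released is not None and metadata_created is not None
            and metadata_created - object_released > grace):
        issues.append("lag: object disseminated before metadata was available")
    return issues


print(timeliness_issues(
    object_modified=datetime(2024, 5, 1),
    metadata_modified=datetime(2024, 1, 1),
    object_released=datetime(2024, 1, 1),
    metadata_created=datetime(2024, 2, 1),
))
```

Running a check like this on a schedule reflects the technologists' view of metadata as an iterative process rather than a one-time task.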


Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as “premium” or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)


Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages
    • Includes some formatting and color coding
– Disadvantages
    • Assumes consistency/predictability
    • Difficult to determine extent of problems found
    • Tedious at best
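Whatever viewer is used, the sample itself can be drawn independently of the tool. This sketch draws a simple random sample of record IDs. A fixed seed makes the draw reproducible, so a second reviewer can examine exactly the same records. The "assumes consistency" disadvantage still applies: a problem confined to unsampled records goes unseen.

```python
import random


def sample_records(record_ids, k, seed=None):
    """Draw a simple random sample of record IDs for manual review.

    A fixed seed makes the draw reproducible, so a second reviewer can
    examine exactly the same records.
    """
    ids = list(record_ids)
    return random.Random(seed).sample(ids, min(k, len(ids)))


all_ids = [f"rec{n:04d}" for n in range(1, 501)]
print(sample_records(all_ids, 5, seed=2024))
```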


Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages
    • Better sorting and control by reviewer
– Disadvantages
    • Unwieldy for large files
    • Requires sustained focus from reviewer
    • Requires translation into tab-delimited file
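The translation into a tab-delimited file can be scripted so that it is not a manual step. This sketch assumes a hypothetical wrapper format of `<record id="...">` elements holding simple DC. It writes one row per record and joins repeated values with " | ", so each record stays on a single sortable row.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"


def dc_records_to_tsv(xml_text, elements=("title", "creator", "date")):
    """Flatten <record> elements holding simple DC into tab-delimited text.

    One row per record; repeated values are joined with " | " so each
    record stays on a single row that sorts cleanly in a spreadsheet.
    """
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", *elements])
    for rec in root.iter("record"):
        row = [rec.get("id", "")]
        for name in elements:
            values = [(el.text or "").strip() for el in rec.findall(DC + name)]
            row.append(" | ".join(values))
        writer.writerow(row)
    return out.getvalue()


DC_SAMPLE = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record id="r1"><dc:title>Field notes</dc:title>
    <dc:creator>Doe, Jane</dc:creator><dc:creator>Roe, Rick</dc:creator></record>
  <record id="r2"><dc:date>1999</dc:date></record>
</records>"""

print(dc_records_to_tsv(DC_SAMPLE))
```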


Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)
– Advantages
    • View of several data dimensions simultaneously
    • Reviewer controls data display
    • Tends to pull reviewer focus to anomalies
    • Handles fairly large files at one time while allowing subset views
    • Display manipulation possible without programmers
– Disadvantages
    • High cost of software
    • Requires translation into tab-delimited file


Element Names vs. Record IDs (Scatter Plot)


Missing Elements (Scatter Plot)

2 records without language element

Format element present inconsistently

Easy to rescale axis on the fly and scroll through records
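The scatter plot is essentially a picture of an element-by-record presence matrix. Computing that matrix directly reproduces the same observations without the visualization tool, such as records lacking a language element or a format element present only in some records. This is a sketch that assumes records are dicts of element name to list of values, keyed by record ID.

```python
def presence_matrix(records):
    """Map each element name to the set of record IDs that have a value for it."""
    matrix = {}
    for rec_id, record in records.items():
        for name, values in record.items():
            if any(v.strip() for v in values):
                matrix.setdefault(name, set()).add(rec_id)
    return matrix


def missing_by_element(records):
    """For each element seen anywhere, the sorted IDs of records that lack it."""
    all_ids = set(records)
    report = {}
    for name, present in presence_matrix(records).items():
        absent = all_ids - present
        if absent:
            report[name] = sorted(absent)
    return report


records = {
    "r1": {"title": ["A"], "language": ["eng"], "format": ["text/html"]},
    "r2": {"title": ["B"]},
    "r3": {"title": ["C"], "format": ["image/jpeg"]},
}
print(missing_by_element(records))  # {'language': ['r2', 'r3'], 'format': ['r2']}
```

The visual tool still has the edge for spotting unexpected patterns. The computed report is easier to rerun after every batch of edits.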


Table View

Non-empty “no information” values that may confuse end users

Only DC Date elements are selected for display

The only W3CDTF syntax present is four digits

Sorted by element value

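The Table View observations can also be checked in code once date values are extracted. These include non-empty placeholder values, and dates that are valid W3CDTF only at year precision. The pattern below follows the W3C "Date and Time Formats" profiles: YYYY, YYYY-MM, YYYY-MM-DD, and a complete date plus time with a time zone designator. It checks field ranges only, so an impossible date like 1999-02-31 still passes. The placeholder list is a hypothetical local example.

```python
import re
from collections import Counter

# W3CDTF profiles: YYYY, YYYY-MM, YYYY-MM-DD, and a complete date plus
# hh:mm, hh:mm:ss or hh:mm:ss.s followed by a time zone designator.
# Ranges are checked per field only (e.g. 1999-02-31 still passes).
W3CDTF = re.compile(
    r"\d{4}"
    r"(-(0[1-9]|1[0-2])"
    r"(-(0[1-9]|[12]\d|3[01])"
    r"(T([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d+)?)?"
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d))?)?)?"
)

# Hypothetical local list of non-empty "no information" placeholders.
PLACEHOLDERS = {"n/a", "unknown", "none", "no date", "undated"}


def classify_date(value):
    """Label a DC date value as empty, placeholder, w3cdtf, or other."""
    v = value.strip()
    if not v:
        return "empty"
    if v.lower() in PLACEHOLDERS:
        return "placeholder"
    if W3CDTF.fullmatch(v):
        return "w3cdtf"
    return "other"


dates = ["1998", "unknown", "ca. 1920", "1997-07-16T19:20+01:00", "n/a"]
print(Counter(classify_date(d) for d in dates))
```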


Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies


… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the “general good”
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 31: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3131

AccuracyAccuracy

Information provided in values Information provided in values should be correct and factualshould be correct and factual

Editing applied toEditing applied tondash Eliminate typosEliminate typosndash Ensure conforming name expressionsEnsure conforming name expressionsndash Ensure standard abbreviations usages Ensure standard abbreviations usages

in generalin general

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3232

ProvenanceProvenance

Who prepared the metadata What Who prepared the metadata What do we know about the preparerdo we know about the preparer

What methods were used to create What methods were used to create the metadata Is it human created or the metadata Is it human created or created by machinecreated by machine

What transformations have been What transformations have been applied since creationapplied since creation

Where has it been beforeWhere has it been before

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp Applications

40

Element Names vs Record Ids (Scatter Plot)

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 32: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3232

ProvenanceProvenance

Who prepared the metadata What Who prepared the metadata What do we know about the preparerdo we know about the preparer

What methods were used to create What methods were used to create the metadata Is it human created or the metadata Is it human created or created by machinecreated by machine

What transformations have been What transformations have been applied since creationapplied since creation

Where has it been beforeWhere has it been before

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

Timeliness

Currency
– Target object changes but metadata does not

Lag
– Target object disseminated before some or all metadata is available

"Metadata aging" is affected by cultural differences between librarians and technologists
– Librarians: once and it's done
– Technologists: metadata as an iterative process
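The two timeliness problems can be expressed as date comparisons, provided the object and its metadata both carry dates. The sketch assumes four hypothetical inputs: when the object was released, when it was last changed, and when its metadata was created and last modified. Lag means the metadata did not exist at release. A currency problem means the object changed after the metadata was last touched.

```python
from datetime import date
from typing import Optional


def timeliness_flags(object_released: date, object_modified: date,
                     metadata_created: Optional[date],
                     metadata_modified: Optional[date]) -> list[str]:
    """Return "lag" and/or "currency" for one object/metadata pair."""
    flags = []
    # Lag: object went out before its metadata existed (or metadata never arrived).
    if metadata_created is None or metadata_created > object_released:
        flags.append("lag")
    # Currency: the object changed after the metadata was last touched.
    last_touched = metadata_modified or metadata_created
    if last_touched is not None and last_touched < object_modified:
        flags.append("currency")
    return flags
```

The "librarian vs. technologist" point shows up here as well. Under a "once and it's done" practice, metadata_modified is never set again, so every later change to the object surfaces as a currency flag.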


Accessibility

Barriers to accessibility may be economic, technical, or organizational
– Metadata as "premium" or proprietary information
– Unreadable for technical reasons (file formats, etc.)
– Metadata may not be properly linked to relevant object(s)


Evaluating Metadata (1)

Random sampling (XMLSpy)
– Advantages
    Includes some formatting and color coding
– Disadvantages
    Assumes consistency/predictability
    Difficult to determine extent of problems found
    Tedious at best
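The slide names XMLSpy for viewing records. Choosing which records to open is a separate step, and it is easy to make repeatable. The sketch below assumes only that each record has a string id. The ids are sorted before sampling, so a given seed yields the same review set however the ids were listed. This lets a second reviewer re-examine exactly the same records.

```python
import random
from collections.abc import Iterable


def sample_record_ids(record_ids: Iterable[str], k: int, seed=None) -> list[str]:
    """Draw up to k distinct record ids for manual review."""
    ids = sorted(record_ids)  # fixed order, so the same seed gives the same sample
    rng = random.Random(seed)  # private generator; does not disturb global random state
    return rng.sample(ids, min(k, len(ids)))
```

Sampling does not remove the slide's main disadvantage. A clean sample still says little about how widespread a problem is elsewhere in the file.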


Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
– Advantages
    Better sorting and control by reviewer
– Disadvantages
    Unwieldy for large files
    Requires sustained focus from reviewer
    Requires translation into tab-delimited file
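Both the spreadsheet and the graphical-analysis approaches need the XML translated into a tab-delimited file first. The sketch below does this with Python's standard library. It writes one row per element occurrence (record id, element, value), the long format behind plots such as "element names vs. record ids". It assumes a hypothetical wrapper of `<record id="...">` elements containing Dublin Core children in the `http://purl.org/dc/elements/1.1/` namespace. Real OAI-PMH or local exports would need their own paths.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def records_to_tsv(xml_text: str) -> str:
    """Flatten <record id="..."> elements into record_id/element/value rows."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["record_id", "element", "value"])
    prefix = "{" + DC_NS + "}"  # ElementTree spells namespaced tags as {uri}local
    for record in root.iter("record"):
        record_id = record.get("id", "")
        for child in record:
            if child.tag.startswith(prefix):
                element = child.tag[len(prefix):]
                writer.writerow([record_id, element, (child.text or "").strip()])
    return out.getvalue()
```

The csv module quotes any value that itself contains a tab or newline, so the output stays one row per occurrence when opened in a spreadsheet.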


Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)
– Advantages
    View of several data dimensions simultaneously
    Reviewer controls data display
    Tends to pull reviewer focus to anomalies
    Handles fairly large files at one time while allowing subset views
    Display manipulation possible without programmers
– Disadvantages
    High cost of software
    Requires translation into tab-delimited file


Element Names vs Record Ids (Scatter Plot)


Missing Elements (Scatter Plot)

– 2 records without a language element
– Format element present inconsistently
– Easy to rescale axis on the fly and scroll through records
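Without plotting software, the same finding ("2 records without a language element") can be produced as a text report. The sketch builds the set of element names used by any record. For each name, it lists the records that lack it or carry only an empty value. Records are assumed to be a dict from record id to a dict of element lists. An element that no record uses at all cannot show up this way; checking for it needs an expected-element list such as an application profile.

```python
def missing_elements(records: dict[str, dict[str, list[str]]]) -> dict[str, list[str]]:
    """Map each element name seen anywhere to the sorted ids of records lacking it."""
    seen = set()
    for fields in records.values():
        seen.update(name for name, values in fields.items() if values)
    report = {}
    for element in sorted(seen):
        # An element present with an empty list counts as missing.
        lacking = sorted(rid for rid, fields in records.items() if not fields.get(element))
        if lacking:
            report[element] = lacking
    return report
```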


Table View

– Non-empty "no information" values that may confuse end users
– Only DC Date elements are selected for display
– The only W3CDTF syntax present is four digits
– Sorted by element value
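The table view reports two kinds of date problems: placeholder strings and limited W3CDTF usage. Both can be tallied automatically. The regular expression below follows the W3C "Date and Time Formats" note: YYYY, YYYY-MM, YYYY-MM-DD, and a full date plus hh:mm, hh:mm:ss, or hh:mm:ss.s with a time zone designator. It checks syntax only, not calendar validity. The placeholder list is a hypothetical starting set that each project would extend from its own data.

```python
import re
from collections import Counter

W3CDTF = re.compile(
    r"\d{4}"                                # YYYY
    r"(-(0[1-9]|1[0-2])"                    # -MM
    r"(-(0[1-9]|[12]\d|3[01])"              # -DD
    r"(T([01]\d|2[0-3]):[0-5]\d"            # Thh:mm
    r"(:[0-5]\d(\.\d+)?)?"                  # optional :ss and .s
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d)"      # time zone designator
    r")?)?)?"
)

# Hypothetical "no information" strings; extend from what the data contains.
PLACEHOLDERS = {"unknown", "n/a", "none", "no information", "not available"}


def classify_date(value: str) -> str:
    """Label a date value as empty, placeholder, w3cdtf, or other."""
    v = value.strip()
    if not v:
        return "empty"
    if v.lower() in PLACEHOLDERS:
        return "placeholder"
    if W3CDTF.fullmatch(v):
        return "w3cdtf"
    return "other"


def date_profile(values) -> Counter:
    """Count how many date values fall into each class."""
    return Counter(classify_date(v) for v in values)
```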


Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application profiles
– Training materials, tools, methodologies


… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the "general good"
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation


Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access"


"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts (Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on), but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, "Two Paths to Interoperable Metadata"

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The metadata conversion specification covers the transformations required to convert a metadata record's content to another format, including:
– Element-to-element mapping
– Hierarchy and object resolution
– Metadata content conversions
– Stylesheets can be created to transform metadata based on crosswalks
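A crosswalk can be held as data, so that the human-identified equivalences Godby describes live in one machine-processable table. The sketch covers two of the listed transformations. Element-to-element mapping is the table itself. Content conversion is a per-element function, for example splitting a semicolon-separated keyword string into repeated subject values. The local source element names ("Title", "Author", "Keywords", "PubYear") are hypothetical, and so is the mapping. This is not any published crosswalk. Source elements with no mapping are returned rather than silently dropped.

```python
def _one(value: str) -> list[str]:
    """Direct copy: one source value becomes at most one target value."""
    value = value.strip()
    return [value] if value else []


def _split_keywords(value: str) -> list[str]:
    """Content conversion: a semicolon-separated string becomes repeated values."""
    return [k.strip() for k in value.split(";") if k.strip()]


# Hypothetical local element -> (simple Dublin Core element, content conversion)
CROSSWALK = {
    "Title": ("title", _one),
    "Author": ("creator", _one),
    "Keywords": ("subject", _split_keywords),
    "PubYear": ("date", _one),
}


def apply_crosswalk(source: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Return (DC record, names of source elements the crosswalk does not cover)."""
    target: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for element, value in source.items():
        if element not in CROSSWALK:
            unmapped.append(element)
            continue
        dc_element, convert = CROSSWALK[element]
        values = convert(value)
        if values:
            target.setdefault(dc_element, []).extend(values)
    return target, unmapped
```

An XSLT stylesheet would encode the same table as templates. Keeping it as data makes the equivalences easier to review and share.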


Available Crosswalks

Library of Congress
– http://www.loc.gov/marc/marcdocz.html

MIT
– http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
– http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html


Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
– Some data might be lost
– Differences in semantics can occur
– Differences in use of content standards sometimes make sharing problematic
– Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution
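Differences in repeatability are one place where data is lost quietly. The sketch below assumes a hypothetical target scheme in which some elements may occur only once. It keeps the first value, as many simple converters do, and returns whatever was dropped so that the loss can be reviewed. Keeping the first value is only one policy. A conversion could instead concatenate the values or route the extras to another element.

```python
def enforce_single_value(record: dict[str, list[str]],
                         non_repeatable: set[str]) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Return (converted record, values lost for each non-repeatable element)."""
    kept: dict[str, list[str]] = {}
    lost: dict[str, list[str]] = {}
    for element, values in record.items():
        if element in non_repeatable and len(values) > 1:
            kept[element] = values[:1]   # first value survives
            lost[element] = values[1:]   # the rest would vanish without this report
        else:
            kept[element] = list(values)
    return kept, lost
```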


Example Mapping: MODS:title to DC:title

Includes attribute for type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort


Mapping MODS:title to DC:title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows subelements for partNumber and partName

Best practice statement in DC-Lib says to include the initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
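The mapping decisions above can be shown on actual MODS XML. The sketch reads the top-level `<titleInfo>` elements of one MODS record (namespace `http://www.loc.gov/mods/v3`). It flattens each one into a single DC title string and re-attaches `<nonSort>`, following the DC-Lib advice to keep the initial article. Several choices here are assumptions rather than an official MODS-to-DC mapping. One is the joining punctuation (": " before subTitle, ". " before the part information). Another is routing every typed title (abbreviated, translated, alternative, uniform) to the single "alternative" refinement.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"  # MODS v3 namespace in ElementTree's {uri} form


def _part(title_info: ET.Element, name: str) -> str:
    """Stripped text of one titleInfo subelement, or "" if it is absent."""
    el = title_info.find(MODS + name)
    return (el.text or "").strip() if el is not None else ""


def mods_titles_to_dc(mods_xml: str) -> dict[str, list[str]]:
    """Flatten the top-level <titleInfo> elements of one MODS record into DC strings."""
    root = ET.fromstring(mods_xml)
    dc: dict[str, list[str]] = {"title": [], "alternative": []}
    # findall() only searches direct children, so titles inside <relatedItem>
    # (series, host item, etc.) are not taken as titles of this resource.
    for ti in root.findall(MODS + "titleInfo"):
        text = " ".join(p for p in (_part(ti, "nonSort"), _part(ti, "title")) if p)
        if not text:
            continue
        subtitle = _part(ti, "subTitle")
        if subtitle:
            text += ": " + subtitle
        part = ", ".join(p for p in (_part(ti, "partNumber"), _part(ti, "partName")) if p)
        if part:
            text += ". " + part
        # Untyped titleInfo -> title; any typed titleInfo -> alternative (assumed policy).
        dc["title" if ti.get("type") is None else "alternative"].append(text)
    return dc
```

The flattening is one-way. Once partNumber and partName are merged into one DC string, a converter can no longer reliably recover them. This is the granularity loss described under "Problems With Converted Records".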


Exercise

Evaluate a small set of human- and machine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 33: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3333

Conformance to ExpectationsConformance to Expectations

Contains elements a community Contains elements a community would expect to find would expect to find

Controlled vocabularies are well-Controlled vocabularies are well-chosen and explicitly exposed to chosen and explicitly exposed to downstream usersdownstream users

Metadata is reflective of community Metadata is reflective of community thinking about necessary thinking about necessary compromises compromises

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp Applications

40

Element Names vs Record Ids (Scatter Plot)

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 34: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3434

Logical ConsistencyCoherenceLogical ConsistencyCoherence

Standard mechanisms like Standard mechanisms like application profiles and common application profiles and common crosswalks are usedcrosswalks are used

Similar structures and appearance Similar structures and appearance are enabled for search resultsare enabled for search results

There is very limited reliance on There is very limited reliance on defaulted valuesdefaulted values

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3535

TimelinessTimeliness CurrencyCurrency

ndash Target object changes but metadata does Target object changes but metadata does notnot

LagLagndash Target object disseminated before some Target object disseminated before some

or all metadata is availableor all metadata is available ldquoldquoMetadata agingrdquo is affected by Metadata agingrdquo is affected by

cultural differences between librarians cultural differences between librarians and technologistsand technologistsndash Librarians once and itrsquos doneLibrarians once and itrsquos donendash Technologists metadata as an iterative Technologists metadata as an iterative

processprocess

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp Applications

40

Element Names vs Record Ids (Scatter Plot)

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Records

• Differences in granularity (complex vs. simple scheme)
  – Some data might be lost
  – Differences in semantics can occur
  – Differences in use of content standards sometimes make sharing problematic
  – Properties may vary (e.g., repeatability)
• Converting everything may not always be the best solution

Example Mapping: MODS:title to DC:title

• Includes an attribute for type of title
  – Abbreviated
  – Translated
  – Alternative
  – Uniform
• Other attributes
  – ID, authority, displayLabel, xLink
• Subelements: title, partName, partNumber, nonSort

Mapping MODS:title to DC:title

• DC has one element refinement: Alternative
  – DC title has no substructure; MODS allows for subelements for partNumber and partName
• The best practice statement in DC-Lib says to include the initial article
  – MODS parses it into <nonSort>
• MODS can link to a title in an authority file if desired
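
The points above can be sketched as a small Python transform over MODS `<titleInfo>` elements using only the standard library. The mapping choices are assumptions for illustration, not an official MODS-to-DC stylesheet: the initial article from `<nonSort>` is kept in the flat string (following the DC-Lib practice noted above), `partNumber` and `partName` are appended with ". " separators, and any typed title (abbreviated, translated, alternative, uniform) is sent to the single refinement, written here as `dcterms:alternative`.

```python
import xml.etree.ElementTree as ET

MODS = {"mods": "http://www.loc.gov/mods/v3"}


def map_title_info(title_info):
    """Flatten one MODS <titleInfo> into a (DC element, value) pair (illustrative rules)."""
    def part(tag):
        # findtext returns "" for an empty element and the default when the element is absent
        return title_info.findtext(f"mods:{tag}", default="", namespaces=MODS).strip()

    # DC title has no substructure, so the initial article stays in the string.
    value = " ".join(p for p in (part("nonSort"), part("title")) if p)
    for tag in ("partNumber", "partName"):
        if part(tag):
            value += ". " + part(tag)
    # Assumption: every typed title maps to DC's one title refinement, Alternative.
    element = "dc:title" if title_info.get("type") is None else "dcterms:alternative"
    return element, value


record = ET.fromstring(
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><nonSort>The</nonSort><title>Survey of county roads</title>"
    "<partNumber>Volume 2</partNumber><partName>Appendices</partName></titleInfo>"
    '<titleInfo type="abbreviated"><title>Surv. cty. roads</title></titleInfo>'
    "</mods>"
)
pairs = [map_title_info(t) for t in record.findall("mods:titleInfo", MODS)]
print(pairs)
# [('dc:title', 'The Survey of county roads. Volume 2. Appendices'),
#  ('dcterms:alternative', 'Surv. cty. roads')]
```

The flattening is one-way: once the parts are joined, a consumer of the DC record can no longer tell where the title ends and the part designation begins, which is the granularity loss listed under converted records.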

Exercise

• Evaluate a small set of human- and machine-created metadata

Timeliness

• Currency
  – Target object changes but metadata does not
• Lag
  – Target object disseminated before some or all metadata is available
• “Metadata aging” is affected by cultural differences between librarians and technologists
  – Librarians: once and it’s done
  – Technologists: metadata as an iterative process
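
Currency and lag can be checked mechanically if each item carries two dates: when the target object last changed and when its metadata last changed. This Python sketch assumes those two dates are available per item (the `Item` record and `timeliness_report` helper are hypothetical) and treats a missing metadata date as lag.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    id: str
    object_modified: date               # last change to the target object
    metadata_modified: Optional[date]   # None: object disseminated before any metadata existed


def timeliness_report(items):
    """Split items into 'lag' (no metadata yet) and 'stale' (object changed, metadata did not)."""
    report = {"lag": [], "stale": []}
    for item in items:
        if item.metadata_modified is None:
            report["lag"].append(item.id)
        elif item.object_modified > item.metadata_modified:
            report["stale"].append(item.id)
    return report


items = [
    Item("a", date(2005, 3, 1), date(2005, 3, 2)),
    Item("b", date(2006, 1, 10), date(2005, 3, 2)),
    Item("c", date(2006, 2, 1), None),
]
print(timeliness_report(items))  # {'lag': ['c'], 'stale': ['b']}
```

A check like this only helps under the technologists' view of metadata as iterative; under a "once and it’s done" workflow the metadata date never moves, so stale items simply accumulate.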

Accessibility

• Barriers to accessibility may be economic, technical, or organizational
  – Metadata as “premium” or proprietary information
  – Unreadable for technical reasons (file formats, etc.)
  – Metadata may not be properly linked to relevant object(s)

Evaluating Metadata (1)

• Random sampling (XMLSpy)
  – Advantages
    • Includes some formatting and color coding
  – Disadvantages
    • Assumes consistency/predictability
    • Difficult to determine the extent of problems found
    • Tedious at best
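
One way to put a rough number on "the extent of problems found" is to draw the sample programmatically and record which sampled records fail a given check. This Python sketch assumes records are dictionaries and that the reviewer can express a problem as a yes/no test; `sample_problem_rate` is a hypothetical helper. The result is only an estimate, and a small sample can easily miss rare problems, which is the consistency assumption noted above.

```python
import random


def sample_problem_rate(records, has_problem, k, seed=None):
    """Review a random sample of k records; return the share flagged and the flagged records."""
    rng = random.Random(seed)  # a fixed seed makes the same sample repeatable for a second reviewer
    sample = rng.sample(records, k)
    flagged = [r for r in sample if has_problem(r)]
    return len(flagged) / k, flagged


# Toy data: every fourth record lacks a language value.
records = [{"id": i, "language": "" if i % 4 == 0 else "en"} for i in range(100)]
rate, flagged = sample_problem_rate(records, lambda r: not r["language"], k=20, seed=1)
print(f"{rate:.0%} of sampled records lack a language value")
```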

Evaluating Metadata (2)

• Spreadsheets (Microsoft Excel)
  – Advantages
    • Better sorting and control by reviewer
  – Disadvantages
    • Unwieldy for large files
    • Requires sustained focus from reviewer
    • Requires translation into tab-delimited file

Evaluating Metadata (3)

• Visual graphical analysis (Spotfire)
  – Advantages
    • View of several data dimensions simultaneously
    • Reviewer controls data display
    • Tends to pull reviewer focus to anomalies
    • Handles fairly large files at one time while allowing subset views
    • Display manipulation possible without programmers
  – Disadvantages
    • High cost of software
    • Requires translation into tab-delimited file
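
Both the spreadsheet and the visual-analysis approaches need the XML translated into a tab-delimited file first. A minimal Python sketch that flattens simple Dublin Core (`oai_dc`) records into one row per record, element, and value, the long format that a plot of element names against record IDs can be drawn from. It assumes the caller supplies each record's ID alongside its XML (in an OAI-PMH response that ID would come from the record header); `dc_to_tsv` is a hypothetical name.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_to_tsv(records):
    """records: mapping of record id -> oai_dc XML string. Returns tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["record_id", "element", "value"])
    for record_id, xml_text in records.items():
        root = ET.fromstring(xml_text)
        for el in root:  # DC elements are direct children of <oai_dc:dc>
            prefix = "{" + DC_NS + "}"
            if el.tag.startswith(prefix):
                writer.writerow([record_id, el.tag[len(prefix):], (el.text or "").strip()])
    return buf.getvalue()


sample = (
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Maps of the county</dc:title><dc:date>1998</dc:date></oai_dc:dc>"
)
print(dc_to_tsv({"rec1": sample}))
```

Keeping empty values as rows (rather than skipping them) lets the reviewer spot elements that are present but blank.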

Element Names vs. Record IDs (Scatter Plot)

Missing Elements (Scatter Plot)

• 2 records without a language element
• Format element present inconsistently
• Easy to rescale the axis on the fly and scroll through records
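
The gaps this scatter plot makes visible can also be listed directly from the same record/element rows. A Python sketch, assuming the rows are (record ID, element name) pairs such as those in a tab-delimited export and that the reviewer supplies the list of expected elements; `missing_elements` is a hypothetical helper. Records with no rows at all will not appear, so they need a separate check.

```python
from collections import defaultdict


def missing_elements(rows, expected):
    """rows: (record_id, element_name) pairs. Returns {element: [record ids lacking it]}."""
    present = defaultdict(set)
    for record_id, element in rows:
        present[record_id].add(element)
    report = {}
    for element in expected:
        lacking = sorted(rid for rid, names in present.items() if element not in names)
        if lacking:  # only report elements that are actually missing somewhere
            report[element] = lacking
    return report


rows = [("r1", "title"), ("r1", "language"), ("r2", "title"), ("r3", "title"), ("r3", "format")]
print(missing_elements(rows, ["title", "language", "format"]))
# {'language': ['r2', 'r3'], 'format': ['r1', 'r2']}
```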

Table View

• Non-empty “no information” values that may confuse end users
• Only DC Date elements are selected for display
• The only W3CDTF syntax present is four digits (YYYY)
• Sorted by element value
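
The Table View findings can be reproduced with a short script over the DC Date values. This Python sketch sorts each value into empty, placeholder, W3CDTF, or other. The regular expression follows the W3CDTF profile of ISO 8601 (YYYY, YYYY-MM, YYYY-MM-DD, and date-times that carry a time zone designator) but checks syntax only, not value ranges. The placeholder strings are hypothetical examples; a real review would collect them from the data.

```python
import re
from collections import Counter

# W3CDTF levels: YYYY, YYYY-MM, YYYY-MM-DD, then hh:mm[:ss[.s]] with a required
# time zone designator (Z, +hh:mm or -hh:mm). Syntax only; ranges are not checked.
W3CDTF = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)

# Hypothetical "no information" strings.
PLACEHOLDERS = {"unknown", "n/a", "none", "undated", "no date"}


def classify_date(value):
    v = value.strip()
    if not v:
        return "empty"
    if v.lower() in PLACEHOLDERS:
        return "placeholder"
    if W3CDTF.fullmatch(v):
        return "w3cdtf"
    return "other"


values = ["1998", "Unknown", "c. 1900", "", "2001-05"]
for v in sorted(values):  # sorted by element value, as in the table view
    print(repr(v), classify_date(v))
print(Counter(classify_date(v) for v in values))
```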

Improving Metadata Quality …

• Documentation
  – Basic standards, best practice guidelines, examples
  – Exposure and maintenance of local and community vocabularies
  – Application profiles
  – Training materials, tools, methodologies
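
Application profiles and exposed vocabularies are most useful when records can be checked against them automatically. This Python sketch assumes a profile can be reduced to three rules per element (required, repeatable, controlled vocabulary). The `Rule` type, `check_record` helper, and the example profile are all hypothetical; real application profiles carry more than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    required: bool = False
    repeatable: bool = True
    vocabulary: Optional[frozenset] = None  # None: any value allowed


def check_record(record, profile):
    """record: {element: [values]}. Returns human-readable problems, in profile order."""
    problems = []
    for element, rule in profile.items():
        values = record.get(element, [])
        if rule.required and not values:
            problems.append(f"{element}: required but missing")
        if not rule.repeatable and len(values) > 1:
            problems.append(f"{element}: not repeatable but has {len(values)} values")
        if rule.vocabulary is not None:
            problems.extend(
                f"{element}: '{v}' not in vocabulary" for v in values if v not in rule.vocabulary
            )
    return problems


PROFILE = {
    "title": Rule(required=True, repeatable=False),
    "language": Rule(required=True, vocabulary=frozenset({"eng", "fre", "ger"})),
    "format": Rule(),
}
print(check_record({"title": ["A", "B"], "language": ["english"]}, PROFILE))
```

Writing the profile as data means the same document can serve both as cataloger guidance and as the input to a validation step.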

… Over Time

• Culture change
  – Support for documentation and exchange of knowledge and experience
  – Routine contribution to the “general good”
  – More focused research on practical metadata use and quality considerations
  – Better project-based and community-wide documentation

Crosswalking

“Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems.”

-- Mary Woodley, “Crosswalks: The Path to Universal Access?”

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 36: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3636

AccessibilityAccessibility

Barriers to accessibility may be Barriers to accessibility may be economic technical or organizationaleconomic technical or organizationalndash Metadata as ldquopremiumrdquo or proprietary Metadata as ldquopremiumrdquo or proprietary

informationinformationndash Unreadable for technical reasons (file Unreadable for technical reasons (file

formats etc)formats etc)ndash Metadata may not be properly linked to Metadata may not be properly linked to

relevant object(s)relevant object(s)

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp Applications

40

Element Names vs Record Ids (Scatter Plot)

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 37: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3737

Evaluating Metadata (1)Evaluating Metadata (1)

Random sampling (XMLSpy)Random sampling (XMLSpy)ndash AdvantagesAdvantages

Includes some formatting and color codingIncludes some formatting and color coding

ndash DisadvantagesDisadvantagesAssumes consistencypredictabilityAssumes consistencypredictabilityDifficult to determine extent of problems Difficult to determine extent of problems

foundfoundTedious at bestTedious at best

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3838

Evaluating Metadata (2)Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)Spreadsheets (Microsoft Excel)ndash AdvantagesAdvantages

Better sorting and control by reviewerBetter sorting and control by reviewer

ndash DisadvantagesDisadvantagesUnwieldy for large filesUnwieldy for large filesRequires sustained focus from reviewerRequires sustained focus from reviewerRequires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp ApplicationsMetadata Standards amp Applications 3939

Evaluating Metadata (3)Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)Visual Graphical Analysis (Spotfire)ndash AdvantagesAdvantages

View of several data dimensions simultaneouslyView of several data dimensions simultaneously Reviewer controls data displayReviewer controls data display Tends to pull reviewer focus to anomaliesTends to pull reviewer focus to anomalies Handles fairly large files at one time while allowing Handles fairly large files at one time while allowing

subset viewssubset views Display manipulation possible without programmersDisplay manipulation possible without programmers

ndash DisadvantagesDisadvantages High cost of softwareHigh cost of software Requires translation into tab-delimited fileRequires translation into tab-delimited file

Metadata Standards amp Applications

40

Element Names vs Record Ids (Scatter Plot)

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired


Exercise

Evaluate a small set of human- and machine-created metadata

Page 38: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues


Evaluating Metadata (2)

Spreadsheets (Microsoft Excel)
- Advantages
  - Better sorting and control by reviewer
- Disadvantages
  - Unwieldy for large files
  - Requires sustained focus from reviewer
  - Requires translation into tab-delimited file
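The "translation into tab-delimited file" step can be sketched with Python's `csv` module. The sketch assumes records have already been parsed into dictionaries mapping DC element names to lists of values. The `record_id` column and the " | " separator for repeated values are arbitrary choices, not part of any standard.

```python
import csv
import io


def records_to_tsv(records: dict[str, dict[str, list[str]]]) -> str:
    """One row per record, one column per DC element; repeats share a cell."""
    elements = sorted({name for rec in records.values() for name in rec})
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["record_id", *elements])
    for rec_id, rec in records.items():
        # Missing elements become empty cells; repeated values are joined.
        writer.writerow([rec_id, *(" | ".join(rec.get(e, [])) for e in elements)])
    return buf.getvalue()


records = {
    "rec1": {"title": ["Map of Ithaca"], "date": ["1998"], "subject": ["Maps", "Ithaca (N.Y.)"]},
    "rec2": {"title": ["Campus photograph"], "language": ["en"]},
}
print(records_to_tsv(records))
```

Opening the result in a spreadsheet gives the sortable, reviewer-controlled view described above, with blank cells marking missing elements.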


Evaluating Metadata (3)

Visual Graphical Analysis (Spotfire)
- Advantages
  - View of several data dimensions simultaneously
  - Reviewer controls data display
  - Tends to pull reviewer focus to anomalies
  - Handles fairly large files at one time while allowing subset views
  - Display manipulation possible without programmers
- Disadvantages
  - High cost of software
  - Requires translation into tab-delimited file


Element Names vs Record Ids (Scatter Plot)
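The data behind an element-name vs. record-ID scatter plot is the set of (element, record) pairs in which a record uses an element. A minimal sketch follows. It assumes records are parsed into dictionaries of element names to value lists, and that an empty or whitespace-only value counts as absent. The record IDs and values are invented.

```python
def element_coverage(records: dict[str, dict[str, list[str]]]) -> dict[str, list[str]]:
    """Map each element name to the IDs of records that give it a non-empty value.

    Every (element, record ID) pair returned is one point on the scatter plot.
    """
    coverage: dict[str, list[str]] = {}
    for rec_id, rec in records.items():
        for name, values in rec.items():
            if any(v.strip() for v in values):
                coverage.setdefault(name, []).append(rec_id)
    return dict(sorted(coverage.items()))


records = {
    "r1": {"title": ["Map of Ithaca"], "format": ["image/jpeg"]},
    "r2": {"title": ["Campus view"], "format": [""]},  # empty value: not plotted
    "r3": {"title": ["Gorge trail"], "language": ["en"]},
}

for name, ids in element_coverage(records).items():
    print(f"{name:<10} {' '.join(ids)}")
```

Feeding these pairs to a plotting tool should give a view like the one on the slide. Gaps in an element's row are the records that lack it.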


Missing Elements (Scatter Plot)
- 2 records without language element
- Format element present inconsistently
- Easy to rescale axis on the fly and scroll through records
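This slide calls out two findings: records missing an element, and an element present inconsistently. Both can also be checked programmatically. A minimal sketch follows, using the same plain dictionary representation as an assumption. The `expected` list is a hypothetical local requirement, and the four sample records are invented to mirror the slide's "2 records without language element".

```python
def missing_report(records: dict[str, dict[str, list[str]]], expected: list[str]) -> dict[str, list[str]]:
    """For each expected element, list the records with no non-empty value for it."""
    report: dict[str, list[str]] = {}
    for name in expected:
        missing = [rid for rid, rec in records.items()
                   if not any(v.strip() for v in rec.get(name, []))]
        if missing:
            report[name] = missing
    return report


def inconsistent_elements(records: dict[str, dict[str, list[str]]]) -> list[str]:
    """Return elements that some, but not all, records use."""
    used = [{n for n, vals in rec.items() if any(v.strip() for v in vals)}
            for rec in records.values()]
    if not used:
        return []
    return sorted(set.union(*used) - set.intersection(*used))


records = {
    "r1": {"title": ["A"], "language": ["en"], "format": ["image/tiff"]},
    "r2": {"title": ["B"], "language": ["en"]},
    "r3": {"title": ["C"]},
    "r4": {"title": ["D"], "format": ["image/tiff"]},
}
print(missing_report(records, ["title", "language"]))
print(inconsistent_elements(records))
```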


Table View
- Non-empty "no information" values that may confuse end users
- Only DC Date elements are selected for display
- The only W3CDTF syntax present is the four-digit year (YYYY)
- Sorted by element value
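The date checks behind this table view can be sketched with a regular expression for the W3CDTF profile. The profile covers YYYY, YYYY-MM, YYYY-MM-DD, and date-times with a time zone designator. The regex checks syntax only, not whether a month or day is in range. The `NO_INFO` set is a hypothetical example of the non-empty "no information" placeholders a reviewer might find.

```python
import re

# W3CDTF levels: YYYY, YYYY-MM, YYYY-MM-DD, then an optional time (hh:mm, with
# optional :ss and fractional seconds) that must carry a zone: Z or +hh:mm/-hh:mm.
W3CDTF = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)

# Hypothetical placeholder values that are non-empty but carry no information.
NO_INFO = {"unknown", "n/a", "none", "no date", "not available"}


def classify_date(value: str) -> str:
    """Sort a DC Date value into one of four review categories."""
    v = value.strip()
    if not v:
        return "empty"
    if v.lower() in NO_INFO:
        return "no-information placeholder"
    if W3CDTF.fullmatch(v):
        return "W3CDTF"
    return "other syntax"


dates = ["1998", "Unknown", "ca. 1900", "July 15, 1998", "1998-07-15", "n/a"]
# Sorting by value, as in the table view, groups similar forms together.
for d in sorted(dates, key=str.lower):
    print(f"{d!r:<16} {classify_date(d)}")
```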


Improving Metadata Quality …

Documentation
- Basic standards, best practice guidelines, examples
- Exposure and maintenance of local and community vocabularies
- Application Profiles
- Training materials, tools, methodologies
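An application profile is easier to enforce when it is written down in machine-readable form. A minimal sketch follows, assuming a hypothetical profile with three DC elements. The `required`, `repeatable`, and `vocabulary` keys are invented for illustration and are not a DCMI format. The vocabulary itself is the DCMI Type Vocabulary.

```python
# DCMI Type Vocabulary terms.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Hypothetical application profile: the rule keys are invented for this sketch.
PROFILE = {
    "title": {"required": True, "repeatable": False},
    "type": {"required": True, "repeatable": True, "vocabulary": DCMI_TYPES},
    "language": {"required": False, "repeatable": True},
}


def validate(record: dict[str, list[str]], profile: dict[str, dict]) -> list[str]:
    """Return human-readable problems; an empty list means the record conforms."""
    problems: list[str] = []
    for name, rule in profile.items():
        values = [v for v in record.get(name, []) if v.strip()]
        if rule.get("required") and not values:
            problems.append(f"{name}: required but missing")
        if not rule.get("repeatable", True) and len(values) > 1:
            problems.append(f"{name}: not repeatable but has {len(values)} values")
        vocabulary = rule.get("vocabulary")
        if vocabulary:
            problems.extend(f"{name}: '{v}' not in vocabulary" for v in values if v not in vocabulary)
    # Elements the profile does not define at all.
    problems.extend(f"{name}: not defined in profile" for name in record if name not in profile)
    return problems


good = {"title": ["Map of Ithaca"], "type": ["StillImage"], "language": ["en"]}
bad = {"title": ["Map of Ithaca", "Ithaca map"], "type": ["Photograph"], "coverage": ["Ithaca"]}
print(validate(good, PROFILE))
print(validate(bad, PROFILE))
```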


… Over Time

Culture change
- Support for documentation and exchange of knowledge and experience
- Routine contribution to the "general good"
- More focused research on practical metadata use and quality considerations
- Better project-based and community-wide documentation


Crosswalking

"Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems."

-- Mary Woodley, "Crosswalks: The Path to Universal Access"


"Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts -- Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on -- but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages."

-- Jean Godby, Two Paths to Interoperable Metadata

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

Specifying a metadata conversion means defining the transformations required to convert a metadata record's content to another format, including:
- Element-to-element mapping
- Hierarchy and object resolution
- Metadata content conversions
- Stylesheets can be created to transform metadata based on crosswalks
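Godby's point that crosswalk knowledge is rarely machine-processable suggests keeping the element-to-element mapping as data that a converter reads. A minimal sketch follows, assuming a hypothetical two-entry crosswalk. Its MARC tags follow the examples in the quote (title to 245, author/creator to 100). It is not the Library of Congress's official Dublin Core to MARC crosswalk. Anything the table does not cover is reported rather than silently dropped.

```python
# Hypothetical crosswalk kept as data: DC element -> (MARC tag, subfield code).
# Tags follow the Godby quote's examples; this is not LC's official crosswalk.
CROSSWALK: dict[str, tuple[str, str]] = {
    "title": ("245", "a"),
    "creator": ("100", "a"),  # the quote's "Dublin Core author"
}


def apply_crosswalk(record: dict[str, list[str]], crosswalk: dict[str, tuple[str, str]]):
    """Map each DC value to a (tag, subfield, value) triple; collect the leftovers."""
    fields: list[tuple[str, str, str]] = []
    unmapped: list[str] = []
    for name, values in record.items():
        target = crosswalk.get(name)
        for value in values:
            if target is None:
                # No correspondence in this table: flag the value for human review.
                unmapped.append(f"{name}: {value}")
            else:
                tag, code = target
                fields.append((tag, code, value.strip()))  # trivial content conversion
    return sorted(fields), unmapped


record = {
    "title": ["Map of Ithaca "],
    "creator": ["Smith, Jane"],
    "coverage": ["Tompkins County (N.Y.)"],
}
fields, unmapped = apply_crosswalk(record, CROSSWALK)
for tag, code, value in fields:
    print(f"{tag} ${code} {value}")
print("unmapped:", unmapped)
```

A real conversion would also need indicators, punctuation, and hierarchy resolution. For XML records, the stylesheets mentioned on the slide would play the role this small function plays here.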


Available Crosswalks

Library of Congress
- http://www.loc.gov/marc/marcdocz.html

MIT
- http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
- http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Page 41: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp Applications

41

Missing Elements (Scatter Plot)

2 records without

language element

format element present

inconsistently

Easy to rescale axis on the fly

and scroll through records

Metadata Standards amp Applications

42

Table View

Non-empty ldquono informationrdquo

values that may confuse end users

Only DC Date elements are

selected for display

The only W3CDTF syntax present is four

digits

Sorted by element value

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

Example Mapping: MODS title to DC title

Includes an attribute for the type of title
– Abbreviated
– Translated
– Alternative
– Uniform

Other attributes
– ID, authority, displayLabel, xLink

Subelements: title, partName, partNumber, nonSort


Mapping MODS title to DC title

DC has one element refinement: Alternative
– DC title has no substructure; MODS allows subelements for partNumber and partName

The best practice statement in DC-Lib says to include the initial article
– MODS parses it into <nonSort>

MODS can link to a title in an authority file if desired
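The mapping points above can be sketched with Python's standard `xml.etree.ElementTree`. The following choices are assumptions and do not come from the slides:
- Every `titleInfo` with a `type` attribute is routed to the DC refinement Alternative, written here as `dcterms:alternative`.
- Part number and part name are joined to the title with `". "`.
- Only top-level `titleInfo` elements are read, so titles inside `relatedItem` are ignored.

The sample record is invented.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"  # MODS version 3 namespace


def title_string(title_info: ET.Element) -> str:
    """Flatten one <titleInfo>, since DC title has no substructure."""
    def text(tag: str) -> str:
        el = title_info.find(MODS + tag)
        return el.text.strip() if el is not None and el.text else ""

    # Keep the initial article from <nonSort>, per the DC-Lib best practice.
    # Assumes a space separates article and title (not true for elided forms like "L'").
    main = " ".join(p for p in (text("nonSort"), text("title")) if p)
    parts = [text("partNumber"), text("partName")]
    return ". ".join(p for p in [main, *parts] if p)


def mods_titles_to_dc(mods_xml: str) -> dict[str, list[str]]:
    root = ET.fromstring(mods_xml)
    out: dict[str, list[str]] = {"dc:title": [], "dcterms:alternative": []}
    for title_info in root.findall(MODS + "titleInfo"):  # top-level titles only
        value = title_string(title_info)
        if not value:
            continue
        # Untyped titles -> dc:title; abbreviated, translated, alternative and
        # uniform titles -> the single DC refinement, Alternative (an assumption).
        key = "dc:title" if title_info.get("type") is None else "dcterms:alternative"
        out[key].append(value)
    return out


SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>The</nonSort>
    <title>Journal of Metadata</title>
    <partNumber>Volume 2</partNumber>
    <partName>Crosswalks</partName>
  </titleInfo>
  <titleInfo type="abbreviated">
    <title>J. Metadata</title>
  </titleInfo>
</mods>"""

print(mods_titles_to_dc(SAMPLE))
# {'dc:title': ['The Journal of Metadata. Volume 2. Crosswalks'], 'dcterms:alternative': ['J. Metadata']}
```

Flattening keeps the words of the title but loses the markup that tells a system which part is sortable and which part is a part number. Links to an authority file (the `authority` and `xLink` attributes) are not carried over at all.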


Exercise

Evaluate a small set of human- and machine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
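For the exercise, the data behind the two scatter plots can be computed directly. This minimal sketch assumes that each record is a dictionary from element name to a list of values. The three sample records are invented.

```python
Record = dict[str, list[str]]


def present(record: Record) -> set[str]:
    """Elements that carry at least one non-blank value."""
    return {el for el, values in record.items() if any(v.strip() for v in values)}


def presence_points(records: dict[str, Record]) -> list[tuple[str, str]]:
    """(element, record id) pairs: the dots of an element-names-vs-record-ids plot."""
    return sorted((el, rid) for rid, rec in records.items() for el in present(rec))


def missing_elements(records: dict[str, Record]) -> dict[str, list[str]]:
    """For each record, the elements used elsewhere in the set but absent here."""
    used = set().union(*(present(r) for r in records.values()))
    return {rid: sorted(used - present(rec)) for rid, rec in records.items()}


records = {
    "rec1": {"title": ["Harbor at dusk"], "creator": ["Lee, K."], "date": ["1998"], "subject": ["Harbors"]},
    "rec2": {"title": ["IMG_0042.jpg"], "date": ["unknown"], "format": ["image/jpeg"]},
    "rec3": {"title": ["Ferry terminal"], "creator": [""], "date": ["2001-07"]},
}
print(missing_elements(records))
# {'rec1': ['format'], 'rec2': ['creator', 'subject'], 'rec3': ['creator', 'format', 'subject']}
```

A blank value counts as missing here, but a placeholder such as `unknown` counts as present. For that reason, value-level checks are also needed.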


Table View

Non-empty “no information” values that may confuse end users

Only DC Date elements are selected for display

The only W3CDTF syntax present is four digits

Sorted by element value
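The observations in this table view can be reproduced with a small check on DC Date values. The regular expression follows the W3CDTF profile of ISO 8601, which allows these forms:
- YYYY
- YYYY-MM
- YYYY-MM-DD
- a date with hh:mm, optional seconds and fraction, and a time zone designator

The expression checks syntax only, not whether a month or day is valid. The set of “no information” strings is a hypothetical example.

```python
import re

# W3CDTF: YYYY[-MM[-DD[Thh:mm[:ss[.s]]TZD]]], where TZD is Z or +hh:mm / -hh:mm.
W3CDTF = re.compile(
    r"\d{4}"
    r"(-\d{2}"
    r"(-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?"
    r"(Z|[+-]\d{2}:\d{2}))?"
    r")?)?"
)

# Hypothetical placeholders that are non-empty but carry no information.
NO_INFO = {"unknown", "n/a", "none", "undated", "n.d.", "no date"}


def classify_date(value: str) -> str:
    v = value.strip()
    if v.lower().strip("[]") in NO_INFO:
        return "no-information"
    if W3CDTF.fullmatch(v):
        return "w3cdtf"
    return "other"


def date_table(values: list[str]) -> list[tuple[str, str]]:
    """Rows of (value, class) sorted by element value, like the table view."""
    return sorted((v, classify_date(v)) for v in values)


for row in date_table(["1998", "unknown", "ca. 1900", "2001-07-15", "n.d."]):
    print(row)
```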


Improving Metadata Quality …

Documentation
– Basic standards, best practice guidelines, examples
– Exposure and maintenance of local and community vocabularies
– Application Profiles
– Training materials, tools, methodologies
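An application profile is easier to apply consistently when its rules can also be checked by a machine. This is a minimal sketch with a hypothetical three-element profile. The allowed `type` values are a subset of the DCMI Type Vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    required: bool = False
    repeatable: bool = True
    vocabulary: Optional[frozenset] = None  # allowed values for a controlled element


# Hypothetical application profile.
PROFILE = {
    "title": Rule(required=True, repeatable=False),
    "type": Rule(required=True, vocabulary=frozenset({"Image", "StillImage", "Text", "Sound"})),
    "creator": Rule(),
}


def check(record: dict[str, list[str]], profile: dict[str, Rule]) -> list[str]:
    """Return human-readable problems found in one record."""
    problems: list[str] = []
    for element, rule in profile.items():
        values = [v for v in record.get(element, []) if v.strip()]
        if rule.required and not values:
            problems.append(f"{element}: required but missing")
        if not rule.repeatable and len(values) > 1:
            problems.append(f"{element}: not repeatable but has {len(values)} values")
        if rule.vocabulary is not None:
            problems.extend(
                f"{element}: '{v}' is not in the vocabulary"
                for v in values
                if v not in rule.vocabulary
            )
    for element in sorted(record.keys() - profile.keys()):
        problems.append(f"{element}: not defined in the profile")
    return problems


record = {"title": ["Harbor", "Harbour"], "type": ["photo"], "rights": ["CC BY"]}
for problem in check(record, PROFILE):
    print(problem)
```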


… Over Time

Culture change
– Support for documentation and exchange of knowledge and experience
– Routine contribution to the “general good”
– More focused research on practical metadata use and quality considerations
– Better project-based and community-wide documentation


Crosswalking

“Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems.”

-- Mary Woodley, “Crosswalks: The Path to Universal Access”


“Metadata schema transformations are more complex than purely structural transforms because they require a set of equivalences identified by human experts – Dublin Core title can be mapped to MARC 245, Dublin Core author can be mapped to MARC 100, and so on – but this important knowledge is recorded in a multitude of ways that are not standardized and not always machine-processable, including Web pages, databases, spreadsheets, PDF documents, and the source code of many computer languages.” -- Jean Godby, “Two Paths to Interoperable Metadata”
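One way to record such equivalences in a machine-processable form is a plain table that programs load, instead of logic buried in source code. This is a minimal sketch. The CSV column layout is hypothetical. The two rows restate the examples in the quote; the quote's “author” corresponds to the Dublin Core `creator` element.

```python
import csv
import io

# Crosswalk kept as data. Column names are a hypothetical layout.
CROSSWALK_CSV = """source_scheme,source_element,target_scheme,target_field
dc,title,marc21,245
dc,creator,marc21,100
"""

Key = tuple[str, str]


def load_crosswalk(text: str) -> dict[Key, Key]:
    """Read (scheme, element) -> (scheme, field) equivalences from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        (row["source_scheme"], row["source_element"]): (row["target_scheme"], row["target_field"])
        for row in reader
    }


def lookup(crosswalk: dict[Key, Key], scheme: str, element: str):
    """Return the target (scheme, field), or None when no equivalence is recorded."""
    return crosswalk.get((scheme, element))


def invert(crosswalk: dict[Key, Key]) -> dict[Key, Key]:
    """Reverse direction; only safe when the mapping is one-to-one."""
    return {target: source for source, target in crosswalk.items()}


cw = load_crosswalk(CROSSWALK_CSV)
print(lookup(cw, "dc", "title"))    # ('marc21', '245')
print(lookup(cw, "dc", "subject"))  # None
```

Because the table is data, the same file can drive conversion in either direction and can be published for reuse. The inversion shown here breaks down as soon as two source elements share one target field.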

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 43: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4343

Improving Metadata Quality hellipImproving Metadata Quality hellip

DocumentationDocumentationndash Basic standards best practice Basic standards best practice

guidelines examplesguidelines examplesndash Exposure and maintenance of local and Exposure and maintenance of local and

community vocabulariescommunity vocabulariesndash Application ProfilesApplication Profilesndash Training materials tools methodologiesTraining materials tools methodologies

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 44: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4444

hellip hellip Over TimeOver Time

Culture changeCulture changendash Support for documentation and Support for documentation and

exchange of knowledge and experienceexchange of knowledge and experiencendash Routine contribution to the ldquogeneral Routine contribution to the ldquogeneral

goodrdquogoodrdquondash More focused research on practical More focused research on practical

metadata use and quality considerationsmetadata use and quality considerationsndash Better project-based and community-Better project-based and community-

wide documentationwide documentation

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 45: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4545

CrosswalkingCrosswalking

ldquoldquoCrosswalks support conversion projects and Crosswalks support conversion projects and semantic interoperability to enable semantic interoperability to enable searching across heterogeneous searching across heterogeneous distributed databases Inherently there distributed databases Inherently there are limitations to crosswalks there is are limitations to crosswalks there is rarely a one-to-one correspondence rarely a one-to-one correspondence between the fields or data elements in between the fields or data elements in different information systemsrdquodifferent information systemsrdquo

-- Mary Woodley -- Mary Woodley ldquoCrosswalks The Path to Universal AccessrdquoldquoCrosswalks The Path to Universal Accessrdquo

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

The process of metadata conversion specification The process of metadata conversion specification includes transformations required to convert a includes transformations required to convert a metadata record content to another format metadata record content to another format includingincludingndash Element to element mappingElement to element mappingndash Hierarchy and object resolutionHierarchy and object resolutionndash Metadata content conversionsMetadata content conversionsndash Stylesheets can be created to transform Stylesheets can be created to transform

metadata based on crosswalksmetadata based on crosswalks

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4747

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4848

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4949

Available CrosswalksAvailable Crosswalks

Library of CongressLibrary of Congressndash httpwwwlocgovmarcmarcdoczhtmlhttpwwwlocgovmarcmarcdoczhtml

MITMITndash httplibrariesmiteduguideshttplibrariesmiteduguides

subjectsmetadatamappingshtmlsubjectsmetadatamappingshtml GettyGetty

ndash httpwwwgettyeduresearchhttpwwwgettyeduresearchconducting_researchstandardsconducting_researchstandardsintrometadatacrosswalkshtmlintrometadatacrosswalkshtml

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5050

Problems With Converted Problems With Converted RecordsRecords

Differences in granularity (complex Differences in granularity (complex vs simple scheme)vs simple scheme)ndash Some data might be lostSome data might be lostndash Differences in semantics can occurDifferences in semantics can occurndash Differences in use of content standards Differences in use of content standards

make sharing sometimes problematicmake sharing sometimes problematicndash Properties may vary (eg repeatability)Properties may vary (eg repeatability)

Converting everything may not Converting everything may not always be the best solutionalways be the best solution

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5151

ExampleMappingExampleMappingMODStitle to DCtitleMODStitle to DCtitle

Includes attribute for type of titleIncludes attribute for type of titlendash AbbreviatedAbbreviatedndash TranslatedTranslatedndash AlternativeAlternativendash UniformUniform

Other attributesOther attributesndash IDauthoritydisplayLabelxLinkIDauthoritydisplayLabelxLink

Subelements title partName Subelements title partName partNumber nonSortpartNumber nonSort

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5252

Mapping MODStitle toMapping MODStitle toDCtitleDCtitle

DC has one element refinementDC has one element refinement

AlternativeAlternativendash DC title has no substructure MODS allows for DC title has no substructure MODS allows for

subelements for partNumber partNamesubelements for partNumber partName Best practice statement in DC-Lib says to Best practice statement in DC-Lib says to

include initial article include initial article ndash MODS parses intoltnonSortgtMODS parses intoltnonSortgt

MODS can link to a title in an authority file MODS can link to a title in an authority file if desiredif desired

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5353

Metadata Standards amp ApplicationsMetadata Standards amp Applications 5454

ExerciseExercise

Evaluate a small set of human and Evaluate a small set of human and machine-created metadatamachine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
Page 46: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Metadata Standards amp ApplicationsMetadata Standards amp Applications 4646

ldquoldquoMetadata schema transformations are more Metadata schema transformations are more complex than purely structural transforms complex than purely structural transforms because they require a set of equivalences because they require a set of equivalences identified by human expertsmdashDublin Core title identified by human expertsmdashDublin Core title can be mapped to MARC 245 Dublin Core can be mapped to MARC 245 Dublin Core author can be mapped to MARC 100 and so onauthor can be mapped to MARC 100 and so onmdashbut this important knowledge is recorded in a mdashbut this important knowledge is recorded in a multitude of ways that are not standardized and multitude of ways that are not standardized and not always machine-processable including not always machine-processable including Web pages databases spreadsheets PDF Web pages databases spreadsheets PDF documents and the source code of many documents and the source code of many computer languagesrdquocomputer languagesrdquo -- Jean Godby -- Jean Godby Two Paths to Interoperable MetadataTwo Paths to Interoperable Metadata

CrosswalksCrosswalks

In general Semantic mapping of elements In general Semantic mapping of elements between source and target metadata standardsbetween source and target metadata standards

Page 47: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Crosswalks

In general: semantic mapping of elements between source and target metadata standards

The metadata conversion specification includes the transformations required to convert a metadata record's content to another format, including:
- Element-to-element mapping
- Hierarchy and object resolution
- Metadata content conversions
- Stylesheets can be created to transform metadata based on crosswalks
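Element-to-element mapping can be pictured as a lookup table that a transform walks over. The sketch below uses Python's standard `xml.etree.ElementTree` in place of an XSLT stylesheet; the four-entry `CROSSWALK` table, the function names and the sample record are hypothetical illustrations, not any published crosswalk.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
DC_NS = "http://purl.org/dc/elements/1.1/"
NS = {"mods": MODS_NS}

# Hypothetical, deliberately tiny crosswalk: MODS element path -> DC element name.
# Published crosswalks cover many more elements and attribute-level rules.
CROSSWALK = {
    "mods:titleInfo/mods:title": "title",
    "mods:originInfo/mods:dateIssued": "date",
    "mods:subject/mods:topic": "subject",
    "mods:abstract": "description",
}


def crosswalk(mods_xml: str) -> list[tuple[str, str]]:
    """Element-to-element mapping: return (dc_element, value) pairs."""
    root = ET.fromstring(mods_xml)
    pairs = []
    for path, dc_name in CROSSWALK.items():
        for el in root.findall(path, NS):
            if el.text and el.text.strip():
                pairs.append((dc_name, el.text.strip()))
    return pairs


def to_dc_xml(pairs: list[tuple[str, str]]) -> str:
    """Serialize the mapped pairs as dc:* elements under a plain <record> wrapper."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    for name, value in pairs:
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Metadata basics</title></titleInfo>
  <originInfo><dateIssued>2005</dateIssued></originInfo>
  <subject><topic>Metadata</topic></subject>
  <subject><topic>Crosswalks</topic></subject>
</mods>"""

print(to_dc_xml(crosswalk(sample)))
```

Each value lands in a flat DC element, so the MODS wrapper above it (`titleInfo`, `subject`, `originInfo`) is discarded; that is hierarchy resolution in its crudest form.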

Page 48: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Page 49: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Page 50: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Available Crosswalks

Library of Congress
- http://www.loc.gov/marc/marcdocz.html

MIT
- http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
- http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

Page 51: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Problems With Converted Records

Differences in granularity (complex vs. simple scheme)
- Some data might be lost
- Differences in semantics can occur
- Differences in the use of content standards can make sharing problematic
- Properties may vary (e.g., repeatability)

Converting everything may not always be the best solution
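One way to keep these problems visible is to have the conversion report what it could not carry across instead of dropping it silently. In this minimal sketch the flat source record, the `MAPPING` table and the `TARGET_REPEATABLE` rules are all hypothetical; a non-repeatable target keeps only its first value.

```python
from dataclasses import dataclass, field

# Hypothetical source-to-target element mapping (complex scheme -> simple scheme).
# Several distinct source elements collapse onto one target: a granularity loss.
MAPPING = {
    "title": "title",
    "subTitle": "title",
    "dateIssued": "date",
    "dateCaptured": "date",
    "topic": "subject",
}

# Hypothetical target rules: element name -> repeatable?
TARGET_REPEATABLE = {"title": False, "date": False, "subject": True}


@dataclass
class ConversionReport:
    converted: dict[str, list[str]] = field(default_factory=dict)
    unmapped: list[tuple[str, str]] = field(default_factory=list)    # no target element: data lost
    collisions: list[tuple[str, str]] = field(default_factory=list)  # rejected by a non-repeatable target


def convert(source: list[tuple[str, str]]) -> ConversionReport:
    report = ConversionReport()
    for name, value in source:
        target = MAPPING.get(name)
        if target is None:
            report.unmapped.append((name, value))
            continue
        values = report.converted.setdefault(target, [])
        if values and not TARGET_REPEATABLE[target]:
            report.collisions.append((name, value))
            continue
        values.append(value)
    return report


record = [
    ("title", "Maps of Ohio"),
    ("subTitle", "1850-1900"),
    ("dateIssued", "1901"),
    ("dateCaptured", "2004"),
    ("topic", "Ohio"),
    ("topic", "Maps"),
    ("physicalLocation", "Map Room"),
]
report = convert(record)
print(report.unmapped)
print(report.collisions)
```

Mapping both `dateIssued` and `dateCaptured` to a single `date` also blurs their meaning: a record carrying only a capture date would come out as a plain date with nothing to distinguish it.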

Page 52: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Example Mapping: MODS:title to DC:title

MODS (on the titleInfo wrapper) includes a type attribute for the kind of title:
- Abbreviated
- Translated
- Alternative
- Uniform

Other attributes:
- ID, authority, displayLabel, xlink

Subelements: title, partName, partNumber, nonSort
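This attribute-level information is exactly what a simple DC target has trouble holding. The sketch below reads each `titleInfo` and routes it by its `type` attribute. The routing policy (untyped goes to title, any typed title goes to the Alternative refinement) is an assumption chosen for illustration, not the Library of Congress mapping, and the sample record is made up.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}


def route_titles(mods_xml: str) -> tuple[list[str], list[str]]:
    """Return (title values, alternative-title values) from a MODS record.

    Assumed policy: an untyped titleInfo is a main title; a titleInfo with any
    type (abbreviated, translated, alternative, uniform) becomes an alternative.
    """
    root = ET.fromstring(mods_xml)
    titles, alternatives = [], []
    for info in root.findall("mods:titleInfo", NS):
        title = info.findtext("mods:title", default="", namespaces=NS).strip()
        if not title:
            continue
        # The type value itself, plus ID, authority, displayLabel and xlink,
        # has no place in DC title and is discarded here.
        target = alternatives if info.get("type") else titles
        target.append(title)
    return titles, alternatives


sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Gone with the wind</title></titleInfo>
  <titleInfo type="translated"><title>Autant en emporte le vent</title></titleInfo>
  <titleInfo type="abbreviated"><title>GWTW</title></titleInfo>
</mods>"""

print(route_titles(sample))
```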

Page 53: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Mapping MODS:title to DC:title

DC has one element refinement: Alternative
- DC title has no substructure; MODS allows for subelements for partNumber and partName

The best practice statement in DC-Lib says to include the initial article
- MODS parses it into <nonSort>

MODS can link to a title in an authority file, if desired
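Because DC title has no substructure, a conversion has to collapse the MODS subelements into one string. A minimal sketch, assuming that `nonSort` is kept at the front (in line with the DC-Lib practice of including the initial article) and that parts are joined with ". "; the separator, the function names and the sample record are illustrative choices, not a prescribed rule.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}


def flatten_title(title_info: ET.Element) -> str:
    """Collapse one MODS titleInfo into a single DC title string."""

    def text(tag: str) -> str:
        return title_info.findtext(f"mods:{tag}", default="", namespaces=NS).strip()

    # Keep the initial article: nonSort + title, joined by one space.
    # (Assumption: elided articles such as "L'" would need different handling.)
    main = " ".join(p for p in (text("nonSort"), text("title")) if p)
    # partNumber and partName have no DC equivalent, so append them to the string.
    parts = [p for p in (text("partNumber"), text("partName")) if p]
    return ". ".join([main, *parts]) if main else ""


def flatten_titles(mods_xml: str) -> list[str]:
    root = ET.fromstring(mods_xml)
    flattened = (flatten_title(info) for info in root.findall("mods:titleInfo", NS))
    return [t for t in flattened if t]


sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>The</nonSort>
    <title>collected works of Jane Doe</title>
    <partNumber>Volume 2</partNumber>
    <partName>Letters</partName>
  </titleInfo>
</mods>"""

print(flatten_titles(sample))
```

Once flattened, a DC-only consumer can no longer sort past "The" or tell the part number from the title proper; that is the granularity this mapping gives up for simplicity.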

Page 54: Metadata Standards and Applications 8. Metadata Interoperability and Quality Issues

Exercise

Evaluate a small set of human- and machine-created metadata

  • Element Names vs Record Ids (Scatter Plot)
  • Missing Elements (Scatter Plot)
  • Table View
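For the exercise, the two scatter-plot views amount to an element-by-record presence grid and a per-record list of missing elements. A minimal sketch of those counts follows, with hypothetical record ids, an assumed list of expected DC elements, and made-up sample values; a value that is empty or only whitespace is treated as missing.

```python
# Hypothetical expected element set for the collection being evaluated.
EXPECTED = ["title", "creator", "date", "subject"]

# Hypothetical sample: record id -> (element, value) pairs.
RECORDS = {
    "rec1": [("title", "Ohio maps"), ("creator", "Smith, J."), ("date", "1901"), ("subject", "Maps")],
    "rec2": [("title", "Untitled"), ("identifier", "rec2.tif"), ("date", "  ")],
}


def present_elements(fields: list[tuple[str, str]]) -> set[str]:
    """Element names that carry a non-blank value."""
    return {name for name, value in fields if value.strip()}


def presence_grid(records, expected):
    """Element names vs record ids: True where the record has that element."""
    return {rid: {e: e in present_elements(f) for e in expected} for rid, f in records.items()}


def missing_elements(records, expected):
    """Per record, the expected elements that are absent or blank."""
    return {rid: [e for e in expected if e not in present_elements(f)] for rid, f in records.items()}


def completeness(records, expected):
    """Share of expected elements present, per record (1.0 = nothing missing)."""
    missing = missing_elements(records, expected)
    return {rid: 1 - len(m) / len(expected) for rid, m in missing.items()}


for rid, gaps in missing_elements(RECORDS, EXPECTED).items():
    print(rid, gaps)
```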