Managing Your Metadata Quality: 2010 CrossRef Workshops
![Page 1: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/1.jpg)
Patricia Feeney
Metadata Quality Coordinator
Managing your metadata quality
![Page 2: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/2.jpg)
Agenda
I. Metadata quality audit
II. DOI registration
III. Conflicts overhaul (discussion)
IV. Metadata Quality tools
![Page 3: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/3.jpg)
Best query ever -> bad metadata = no match
Mediocre query -> bad metadata = no match
Horrible query -> bad metadata = no match
Best query ever -> good metadata = match ✓+
Mediocre query -> good metadata = match (probably) ✓
Horrible query -> good metadata = match (maybe) ✓-
Metadata Quality Audit: Overview
Accurate and complete metadata is vital to querying and citation linking.
If the metadata for a DOI is incorrect, incomplete, or messy, a match can't be made, regardless of the quality of a query.
![Page 4: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/4.jpg)
Current efforts include:
Reports:
  resolution report (emailed monthly)
  depositor report (on website)
  crawler (on website)
  field report (on website)
  conflict report (on website, emailed monthly)
  schematron reports (emailed weekly)
  failed query report (on website)
  DOI error reports (emailed daily)
Contact members individually (as issues arise)
Documentation and communication
![Page 5: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/5.jpg)
Metadata Quality Audit
A Metadata Quality Audit will:
  provide publishers with detailed feedback on the quality of their metadata by identifying problem areas
  identify members who need attention
  provide motivation and support to members with metadata issues
The intent of the audit is to provide information, but there may be consequences for extreme abusers.
![Page 6: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/6.jpg)
Audit Scope
I. DOI resolution
II. Conflicts
III. Overall metadata quality
IV. Metadata maintenance
[Cartoon: "Hello, I’d like to audit you." "Great, let's get started!" "Hooray!"]
![Page 7: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/7.jpg)
I. DOI Resolution
Level I: DOIs that have been distributed but not deposited and resolve to the Handle error page. *
Level II: DOIs resolving to an error page *
Level III: DOIs with response page blocked by access control
Level IV: DOIs that resolve to an inadequate response page.
* actionable transgressions
![Page 8: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/8.jpg)
II. Conflicts
Conflicts occur when two (or more) DOIs are deposited with identical metadata.
Level I: conflicts created between members *
Level II: conflicts within a publisher prefix(es) *
Level III: conflicts created due to insufficient metadata +
Level IV: conflicts created due to item/content type +
* actionable transgressions
+ this may change, more later
![Page 9: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/9.jpg)
Quality of deposited metadata
I. Missing metadata: is all available metadata deposited?
II. Accuracy: is metadata correct?
III. Unusual metadata: does metadata fit into the correct content type?
IV. Overall quality: is metadata messy?
![Page 10: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/10.jpg)
Maintenance
I. Gaps in coverage - this usually indicates undeposited DOIs (very very bad)
II. Currency of deposits - are deposits made ahead of DOIs being distributed?
III. Title maintenance - less of a problem since the recent title restrictions, but problems remain (title abbreviations, for example)
IV. Reference linking compliance
![Page 11: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/11.jpg)
Actionable Areas
DOI Resolution:
  Level I (Undeposited DOIs)
  Level II (DOIs resolving to error page)
If action is not taken within a reasonable time period (TBD), DOIs will be registered on behalf of the member (eventually for a fee).
Continual distribution of unregistered DOIs may affect membership.
Conflicts:
  Level I: conflicts created between members
  Level II: conflicts within a publisher prefix
A $2 per DOI conflict penalty fee may be imposed for conflicts of this type if they are not resolved within a reasonable time period (TBD).
Metadata Maintenance:
  Outbound linking compliance
Members found to not be linking during the audit will be subject to non-linking penalties.
![Page 12: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/12.jpg)
Audit Process
![Page 13: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/13.jpg)
Questions?
![Page 14: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/14.jpg)
II. DOI Registration Pilot
DOIs should without exception be registered before they are released to the public.
Most DOIs resolve, but the ones that don’t are a big problem.
Solution: we’re going to register them*
*(ideal solution: publisher registers them)
![Page 15: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/15.jpg)
DOI selection: At the moment, we will register DOIs reported by end users, using the DOI error report as a source.
![Page 16: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/16.jpg)
![Page 17: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/17.jpg)
DOI error report:
Implemented mid-2008
~4,000 DOI errors reported monthly
> 1,400 fixed monthly through publisher deposits
Some of the unfixed DOIs are not ‘real’ DOIs, but many are.
![Page 18: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/18.jpg)
We will register DOIs that meet the following criteria:
  Have been distributed publicly by the publisher/prefix owner
  Have an identifiable response page
  Have been reported to the publisher’s technical and business contacts
![Page 19: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/19.jpg)
DOI Registration Process
1. DOI reported: a user reports an unresolving DOI using the DOI error form
2. Technical contact notified (DOI error report email)
3. CrossRef review: CR staff reviews reported DOIs and expires DOIs that do not meet our registration criteria
4. Business contact notified: 2 weeks from the initial report, business contact is notified of remaining valid unregistered DOIs.
5. CR deposit: after 2 weeks have passed from business contact notification, CrossRef will register any undeposited DOIs.
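The two waiting periods in the process above can be laid out as dates. The following is a minimal sketch, not CrossRef's actual tooling: the function name `registration_timeline` and the dictionary keys are hypothetical, and it assumes both two-week periods are counted in calendar days from the preceding step.

```python
from datetime import date, timedelta

# Both waiting periods in the process above are two weeks long.
WAITING_PERIOD = timedelta(weeks=2)


def registration_timeline(reported: date) -> dict:
    """Return the earliest date for each dated step (illustrative only)."""
    business_contact_notified = reported + WAITING_PERIOD
    crossref_deposit = business_contact_notified + WAITING_PERIOD
    return {
        "doi_reported": reported,
        "business_contact_notified": business_contact_notified,
        "earliest_crossref_deposit": crossref_deposit,
    }


timeline = registration_timeline(date(2010, 11, 1))
for step, day in timeline.items():
    print(f"{step}: {day.isoformat()}")
```

Under these assumptions, a publisher has roughly four weeks from the initial report to deposit the DOI before CrossRef registers it.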
![Page 20: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/20.jpg)
Questions?
![Page 21: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/21.jpg)
Conflicts overhaul
Conflicts occur when two (or more) DOIs share the same metadata, suggesting two DOIs are assigned to a single item.
![Page 22: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/22.jpg)
Why are conflicts bad?
Only one DOI should be assigned per item
Queries will return multiple DOIs, causing confusion
Some queries (OpenURL) may not return a DOI if multiple results are present
Conflicts between two DOIs often result in one of the DOIs being neglected***
![Page 23: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/23.jpg)
We currently have ~200,000+ conflicts in our system. Not all of them are a problem:
For some items, our schema only allows minimal metadata
Some content types require matching metadata (for example, standards, and book chapters with minimal metadata such as dictionaries)
![Page 24: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/24.jpg)
Legitimate conflicts
Conflict between 2 prefixes:
http://dx.doi.org/10.1639/0044-7447(2001)030[0037:IOPOFU]2.0.CO;2
http://dx.doi.org/10.1579/0044-7447-30.1.37
Sample query

| Journal Title | Year | Vol | Issue | Page | Author | Article Title |
|---|---|---|---|---|---|---|
| AMBIO | 2001 | 30 | 1 | 37 | Köhlin | Impact of Plantations on Forest Use a... |

Conflict within 1 prefix:
http://dx.doi.org/10.3724/SP.J.1006.2008.00070
http://dx.doi.org/10.3724/SP.J.1006.2008.00770

| Journal Title | Year | Vol | Iss | Page | Author | Article Title |
|---|---|---|---|---|---|---|
| ACTA AGRONOMICA SINICA | 2008 | 34 | 5 | 770 | Zhang | Differential Gene Expression in Upper… |
![Page 25: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/25.jpg)
‘Bad’ conflicts
Conflicts with minimal metadata:
10.1002/ijc.11095
10.1002/ijc.11093

| Journal Title | Year | Vol | Issue | Page | Author | Article Title |
|---|---|---|---|---|---|---|
| International Journal of Cancer | 2003 | 104 | 6 | 798 | | Errata |

Conflict due to content type:
10.1520/C0506-10
10.1520/C0506-10A
10.1520/C0506-10B

| Book Title | Year | Edition | Page | Author | Title |
|---|---|---|---|---|---|
| Specification for Reinforced Concrete... | 2010 | 2010 | | C13 Committee | |
![Page 26: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/26.jpg)
Elements considered during conflict generation:
  Content type
  Journal, book and/or series title
  Article title / content_item title (book chapters)
  Publication year
  Volume
  Issue
  First page
  Author
  Edition
If there is a match between all deposited elements, a conflict is generated.
Two items with matching journal title, volume, issue, and article title will cause a conflict.
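The matching rule above can be sketched as a grouping step. This is only an illustration, not CrossRef's actual conflict logic: the field names in `CONFLICT_FIELDS`, the helper names, and the case and whitespace normalisation are assumptions, and "match between all deposited elements" is read here as "the same elements were deposited with the same values". The sample records reuse the conflicting DOIs and truncated titles shown on the earlier slides.

```python
from collections import defaultdict

# Hypothetical field names for the elements listed above.
CONFLICT_FIELDS = (
    "content_type", "title", "item_title", "year",
    "volume", "issue", "first_page", "author", "edition",
)


def conflict_key(record: dict) -> tuple:
    """Build a key from the deposited (non-empty) elements only."""
    return tuple(
        (field, " ".join(str(record[field]).lower().split()))
        for field in CONFLICT_FIELDS
        if record.get(field)
    )


def find_conflicts(records: list) -> list:
    """Group DOIs whose deposited elements all match."""
    groups = defaultdict(list)
    for record in records:
        groups[conflict_key(record)].append(record["doi"])
    return [dois for dois in groups.values() if len(dois) > 1]


records = [
    {"doi": "10.3724/SP.J.1006.2008.00070", "content_type": "journal_article",
     "title": "ACTA AGRONOMICA SINICA", "year": 2008, "volume": 34, "issue": 5,
     "first_page": 770, "author": "Zhang",
     "item_title": "Differential Gene Expression in Upper…"},
    {"doi": "10.3724/SP.J.1006.2008.00770", "content_type": "journal_article",
     "title": "Acta Agronomica Sinica", "year": 2008, "volume": 34, "issue": 5,
     "first_page": 770, "author": "Zhang",
     "item_title": "Differential Gene Expression in Upper…"},
    {"doi": "10.1579/0044-7447-30.1.37", "content_type": "journal_article",
     "title": "AMBIO", "year": 2001, "volume": 30, "issue": 1,
     "first_page": 37, "author": "Köhlin",
     "item_title": "Impact of Plantations on Forest Use a..."},
]

conflicts = find_conflicts(records)
print(conflicts)
```

Because only deposited elements enter the key, records with very little metadata (an erratum with no author, for instance) collide more easily, which is the situation the 'Bad' conflicts slide describes.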
![Page 27: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/27.jpg)
Ideas?
What should our minimum set of metadata be?
How should conflicts be monitored/reported?
![Page 28: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/28.jpg)
Managing your metadata quality
![Page 29: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/29.jpg)
Sample #1: incorrect metadata
Q: My link resolver is retrieving the wrong metadata for DOI 10.1002/rra.1288, causing our links to break - here is my query*:
http://www.crossref.org/[email protected]&aulast=Null&title=River Research and Applications&volume=26&issue=6&page=663&year=2010
*query metadata matches the response page metadata
A: Two problems with deposited metadata (DOI query):
#1 <year media_type="print">2009</year>
#2 <pages>
     <first_page>n/a</first_page>
     <last_page>n/a</last_page>
   </pages>
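A field-by-field comparison makes the mismatch in this sample easy to spot. The sketch below is illustrative only: `mismatched_fields` is a hypothetical helper, the query values are taken from the URL parameters above, and the deposited title, volume, and issue are assumed to be correct because the slide reports only the year and page problems.

```python
def mismatched_fields(query: dict, deposited: dict) -> dict:
    """Return {field: (query value, deposited value)} for fields that differ."""
    return {
        field: (value, deposited.get(field))
        for field, value in query.items()
        if str(deposited.get(field)) != str(value)
    }


# Values from the query parameters in the sample above.
query = {
    "title": "River Research and Applications",
    "volume": "26",
    "issue": "6",
    "page": "663",
    "year": "2010",
}

# Year and first page as deposited; the other values are assumed correct.
deposited = {
    "title": "River Research and Applications",
    "volume": "26",
    "issue": "6",
    "page": "n/a",
    "year": "2009",
}

problems = mismatched_fields(query, deposited)
for field, (asked, stored) in problems.items():
    print(f"{field}: query has {asked!r}, deposit has {stored!r}")
```

The query is accurate here, so the fix belongs in the deposit: the correct year and real page numbers in place of "n/a".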
![Page 30: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/30.jpg)
Sample #2: messy metadata
Q: I know DOI 10.1068/p6742 exists, why doesn’t my query work?
A: Let’s check the guest query form
Metadata for article:
Newport R, Preston C, 2010, "Pulling the finger off disrupts agency, embodiment and peripersonal space" Perception 39(9) 1296 – 1298
Problem is: author surname is deposited as:
<person_name sequence="first" contributor_role="author">
<given_name>Roger</given_name></given_name>
<surname><surname>Newport</surname></surname>
</person_name>
![Page 31: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/31.jpg)
Sample #3: duplicate authors
Q: Why does DOI 10.2307/1382491 have multiple versions of the same author?
A: attempt to improve query matching
<contributors>
  <person_name sequence="first" contributor_role="author">
    <given_name>Erling Johan</given_name>
    <surname>Solberg</surname>
  </person_name>
  <person_name sequence="additional" contributor_role="author">
    <given_name>Bernt-Erik</given_name>
    <surname>Sæther</surname>
  </person_name>
  <person_name sequence="additional" contributor_role="author">
    <given_name>Bernt-Erik</given_name>
    <surname>Saether</surname>
  </person_name>
</contributors>
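Duplicates of this kind can be found by folding names to plain ASCII before comparing them. The following is a rough sketch under stated assumptions: the helper names and the `EXTRA_FOLDS` table are hypothetical and far from complete, and the approach is a generic normalisation technique, not something the slides attribute to CrossRef. Note that Unicode NFKD normalisation alone does not turn "æ" into "ae", so that mapping is applied explicitly.

```python
import unicodedata

# Letters that NFKD does not decompose; an illustrative, incomplete table.
EXTRA_FOLDS = str.maketrans({"æ": "ae", "Æ": "AE", "ø": "o", "Ø": "O", "ß": "ss"})


def fold(name: str) -> str:
    """Reduce a name to a lowercase, accent-free form for comparison."""
    expanded = name.translate(EXTRA_FOLDS)
    decomposed = unicodedata.normalize("NFKD", expanded)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def duplicate_authors(authors: list) -> list:
    """Return (first seen, repeat) pairs whose folded names are identical."""
    seen = {}
    duplicates = []
    for given, surname in authors:
        key = (fold(given), fold(surname))
        if key in seen:
            duplicates.append((seen[key], (given, surname)))
        else:
            seen[key] = (given, surname)
    return duplicates


# Contributors from the deposit shown above.
authors = [
    ("Erling Johan", "Solberg"),
    ("Bernt-Erik", "Sæther"),
    ("Bernt-Erik", "Saether"),
]

print(duplicate_authors(authors))
```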
![Page 32: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/32.jpg)
New(ish) tools for managing metadata and deposit problems
Schema documentation: http://www.crossref.org/schema/documentation/ or linked from help doc
Reporting problems / asking for help:
Help documentation (http://www.crossref.org/help/)
Support portal and forums (http://support.crossref.org)
Contact [email protected]
![Page 33: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/33.jpg)
Schematron update
Schematron reports notify depositors of non-fatal deposit issues
35-40 emails sent out weekly
Alerts are generated for < 1% of deposits
Tend to identify ‘messy’ deposits
Rules updated periodically
![Page 34: Managing Your Metadata Quality 2010 CrossRef Workshops](https://reader034.vdocuments.us/reader034/viewer/2022052618/554e942eb4c90526358b5017/html5/thumbnails/34.jpg)
Schematron Warnings
[Chart: share of Schematron warnings by rule]
Jr. in surname: 61%
punctuation in surname: 26%
last page contains dash: 7%
first page contains dash: 4%
page number contains underscore: 2%
Jr. in surname:
  Araújo Jr
  Prata Jr.
  Szezech Jr.
Punctuation in surname:
  (Earven) Tribble
  Frederick (Frikkie) J.
  Arch Marin [email protected]
  ********
Other rules:
‘ed’ ‘iss’ ‘vol’ in edition, issue, volume elements
Publication year exceeds current year by >2
Surname / title all upper case
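The rules named on this slide can be approximated in a few lines. The sketch below is a Python stand-in, not the actual Schematron rule set (real Schematron rules are written in XML and run against the deposit file): the record keys, warning strings, and regular expressions are assumptions chosen for illustration, and hyphens and apostrophes are deliberately allowed in surnames.

```python
import re
from datetime import date
from typing import List, Optional

JR_SUFFIX = re.compile(r"\bjr\.?\s*$", re.IGNORECASE)
# Anything other than letters, digits, whitespace, apostrophes and hyphens.
SURNAME_PUNCTUATION = re.compile(r"[^\w\s'’\-]")


def check_record(record: dict, today: Optional[date] = None) -> List[str]:
    """Return non-fatal warnings for one deposited record (illustrative)."""
    today = today or date.today()
    warnings = []

    surname = record.get("surname", "")
    if JR_SUFFIX.search(surname):
        warnings.append("Jr. in surname")
    # Ignore the Jr. suffix so its full stop is not reported twice.
    if SURNAME_PUNCTUATION.search(JR_SUFFIX.sub("", surname)):
        warnings.append("punctuation in surname")
    if surname.isupper():
        warnings.append("surname all upper case")
    if record.get("title", "").isupper():
        warnings.append("title all upper case")

    first_page = record.get("first_page", "")
    last_page = record.get("last_page", "")
    if "-" in first_page:
        warnings.append("first page contains dash")
    if "-" in last_page:
        warnings.append("last page contains dash")
    if "_" in first_page + last_page:
        warnings.append("page number contains underscore")

    for field, label in (("edition", "ed"), ("issue", "iss"), ("volume", "vol")):
        if re.search(rf"\b{label}", record.get(field, ""), re.IGNORECASE):
            warnings.append(f"'{label}' in {field}")

    year = record.get("year", "")
    if year.isdigit() and int(year) > today.year + 2:
        warnings.append("publication year exceeds current year by more than 2")

    return warnings


sample = {"surname": "Prata Jr.", "first_page": "1296-1298",
          "volume": "vol. 39", "year": "2015"}
print(check_record(sample, today=date(2010, 11, 1)))
```

Passing `today` explicitly keeps the publication-year rule reproducible; without it the check uses the current date.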