1
Pre-processing OpenURLs
Case Study: University of Kansas
John [email protected]. of Kansas EndUser 2004
2
Outline
• the problems
   • examples
• possible solutions / tools
   • examples
• PubMed exception
• Benefits
3
The problems...
• inaccurate incoming OpenURLs
   • data in the wrong element
• incomplete incoming OpenURLs
   • GENRE sometimes missing
   • Author data missing or hidden
• problematic journal title data
• no logging for statistics
4
The problems...
• lack of “optional” passing of OpenURL elements to Extended Services within LFP SysAdmin
   • only required elements are passed
   • absence of a required element eliminates the link
• not really possible to pass an entire OpenURL to an External Service simply from within SysAdmin
5
Examples of problems ...
Journal Titles:
• initial articles (Voyager doesn’t handle)
• quotation marks (Voyager chokes on them)
• internal dashes (SilverPlatter, et al.)
• author and other characters at end of title (causes some title search links to fail)
SilverPlatter:
• author(s) embedded in the PID rather than in AUTHOR or AULAST
• Book titles appear in ATITLE rather than TITLE
Example of SilverPlatter author tagging:
&pid=%3CAN%3E0668115%3C/AN%3E%3CAU%3ELedesma-Liebana%2c-Patricia%3C/AU%3E
&pid=<AN>0668115</AN><AU>Ledesma-Liebana,-Patricia</AU>
&aulast=Ledesma-Liebana,-Patricia
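The title and author repairs listed above can be sketched roughly as follows. This is an illustration in Python rather than the Perl program the slides describe; the function names, the short list of English articles, and the choice to replace dashes with spaces are all assumptions, not the KU implementation.

```python
import re
from urllib.parse import parse_qs

# Leading articles to drop from titles (assumed list, English only).
INITIAL_ARTICLES = re.compile(r"^(the|a|an)\s+", re.IGNORECASE)


def clean_title(title):
    """Strip quotation marks, internal dashes, and a leading article."""
    title = re.sub(r'["“”]', "", title)      # quotation marks
    title = re.sub(r"\s*-+\s*", " ", title)  # internal dashes (assumed fix)
    title = INITIAL_ARTICLES.sub("", title.strip())
    return re.sub(r"\s+", " ", title).strip()


def author_from_pid(pid):
    """Return the first <AU>...</AU> value in a SilverPlatter PID, or None."""
    match = re.search(r"<AU>(.*?)</AU>", pid)
    return match.group(1) if match else None


def fix_silverplatter(query_string):
    """Parse an incoming query string and return repaired OpenURL elements."""
    # parse_qs also undoes the percent-encoding seen in the PID example.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    if "aulast" not in params and "author" not in params:
        author = author_from_pid(params.get("pid", ""))
        if author:
            params["aulast"] = author
    if "title" in params:
        params["title"] = clean_title(params["title"])
    return params
```

Given the encoded PID from the example, `fix_silverplatter` adds an `aulast` element holding the text found between `<AU>` and `</AU>`.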
6
7
Examples of problems ...
ISI (Web of Science):
• journal titles appear only in STITLE, even though not abbreviated
• mushes the volume and issue together incorrectly for Physical Review D -- is 4 digits, should be 2 or 3
• American Physical Society Journals need to move ARTNUM to SPAGE
• GENRE element often missing
• Only PAGES is supplied, but need SPAGE
• Need formatted data for later use, to get around the “only required are passed” problem
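A minimal sketch of how several of these repairs might look, written in Python for illustration. The function name and the rules are assumptions: it copies STITLE to TITLE, splits PAGES into SPAGE and EPAGE, falls back to ARTNUM for SPAGE (for any source, not only American Physical Society journals), and guesses a GENRE of "article" when an ISSN is present. The Physical Review D volume/issue repair is left out because the slides do not give the rule.

```python
def fix_isi(params):
    """Return a copy of the OpenURL elements with ISI-style repairs applied."""
    fixed = dict(params)

    # Journal title arrives only in STITLE: copy it to TITLE when TITLE is absent.
    if not fixed.get("title") and fixed.get("stitle"):
        fixed["title"] = fixed["stitle"]

    # Only PAGES is supplied: derive SPAGE (and EPAGE) from "start-end".
    if not fixed.get("spage") and fixed.get("pages"):
        start, _, end = fixed["pages"].partition("-")
        fixed["spage"] = start.strip()
        if end.strip() and not fixed.get("epage"):
            fixed["epage"] = end.strip()

    # No start page at all: use the article number instead (assumed rule).
    if not fixed.get("spage") and fixed.get("artnum"):
        fixed["spage"] = fixed["artnum"]

    # GENRE missing: assume "article" when an ISSN is present (assumed rule).
    if not fixed.get("genre") and fixed.get("issn"):
        fixed["genre"] = "article"

    return fixed
```

Returning a copy keeps the original elements available, which helps if the log record should reflect what the source actually sent.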
8
Possible solution ...
• Intercept the incoming OpenURL
   • alter it, augment it
   • send the revised OpenURL to LFP
   • offers a generalized, flexible approach that can be improved over time
• Coordinate with what is needed by individual Extended Services (e.g., OPAC searches, ILL form)
   • Use revised and augmented data supplied by the pre-processing program
9
Tools & Techniques ...
• Pre-processor = fairly simple Perl / CGI program (but could be something else)
   • must be able to receive data from a URL, change it, and send a new URL elsewhere
• substitute pre-processor’s URL for the normal LFP base URL
• a willingness to fudge with some OpenURL elements that are infrequently used and not needed by LFP
   • e.g., a fake BICI element
• create a log record of each “click”
[Diagram: Source (index citation, catalog record, footnote) → pre-processor (Perl, PHP, etc.) → Link Resolver (parser + knowledge base) → Standard Target or Extended Service]
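The receive, change, and resend cycle could be sketched like this. It is a Python illustration, not the Perl program used at KU; `LFP_BASE_URL` is a placeholder address, and `preprocess` and `cgi_redirect` are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode

# Placeholder: the real LFP base URL is site-specific.
LFP_BASE_URL = "http://lfp.example.edu/resolve"


def preprocess(query_string, fixers=()):
    """Parse the incoming OpenURL, apply each fixer, and build the URL for LFP.

    Each fixer is a function that takes a dict of OpenURL elements and
    returns a revised dict.
    """
    params = dict(parse_qsl(query_string, keep_blank_values=True))
    for fix in fixers:
        params = fix(params)
    return LFP_BASE_URL + "?" + urlencode(params)


def cgi_redirect(url):
    """Build the CGI response that sends the browser on to the revised URL."""
    return "Status: 302 Found\r\nLocation: " + url + "\r\n\r\n"
```

In a CGI setting the script would read the `QUERY_STRING` environment variable, pass it to `preprocess`, and write the result of `cgi_redirect` to standard output. Sources are then configured with the pre-processor's URL in place of the normal LFP base URL.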
10
(fake) BICI=sid|genre|atitle|full_author|title|date|volume|issue|spage|epage|issn|isbn|artnum
• the above string enables LFP SysAdmin to look only for the presence of a BICI as a trigger for a particular extended service, rather than the existence of a set of OpenURL elements
• the extended service then has access to all of the above elements that exist
• some elements (full_author, spage, epage) sometimes can be derived from others if they do not already exist (“full_author” is a locally-defined tag)
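A rough sketch of packing and unpacking the fake BICI, in Python for illustration. The field order follows the string above; the function names, the rule for deriving `full_author` from AULAST and AUFIRST/AUINIT, and the replacement of stray "|" characters with spaces are assumptions.

```python
BICI_FIELDS = [
    "sid", "genre", "atitle", "full_author", "title", "date", "volume",
    "issue", "spage", "epage", "issn", "isbn", "artnum",
]


def derive_full_author(params):
    """Use full_author if present, else build "Last, First" (assumed rule)."""
    if params.get("full_author"):
        return params["full_author"]
    first = params.get("aufirst") or params.get("auinit", "")
    parts = [params.get("aulast", ""), first]
    return ", ".join(part for part in parts if part)


def build_bici(params):
    """Join the OpenURL elements into one pipe-delimited string."""
    merged = dict(params, full_author=derive_full_author(params))
    # A literal "|" inside a value would shift every later field.
    values = [str(merged.get(name, "")).replace("|", " ") for name in BICI_FIELDS]
    return "|".join(values)


def parse_bici(bici):
    """Split a fake BICI back into named elements, dropping empty ones."""
    values = bici.split("|")
    return {name: value for name, value in zip(BICI_FIELDS, values) if value}
```

An extended service that receives only the `bici` parameter can call `parse_bici` to recover whichever elements were present, so SysAdmin needs to require only that one element.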
11
http://diglib.ku.edu/cgi-bin/illiad?bici=%BICI%
BICI %BICI%
12
13
14
Log file
DATE+TIME | SID | GENRE | TITLE | DATE | VOLUME | ISSUE | SPAGE | EPAGE | ISSN | ISBN
20040323094818|CAS:CAPLUS|article|Journal of Pharmaceutical Sciences|2003|92|8|1531||0022-3549|
20040323094920|ISI:WoK|article|BIODIVERSITY AND CONSERVATION|2004|13|1|1||0960-3115|
20040323095100|ISI:WoK|article|BIODIVERSITY AND CONSERVATION|2004|13|1|207||0960-3115|
20040323095247|ISI:WoK|article|BIODIVERSITY AND CONSERVATION|2004|13|1|275||0960-3115|
20040323095518|ISI:WoK|article|BASIC AND APPLIED ECOLOGY|2003|4|5|385||1439-1791|
20040323095749|ISI:WoK|article|CONSERVATION ECOLOGY|2002|6|2||14|1195-5449|
20040323095948|ISI:WoK|article|BIOLOGICAL CONSERVATION|2004|115|1|63||0006-3207|
20040323100026|SP:MLAB|article|Russian Studies in Literature|2001|37|3|89||1061-1975|
20040323100112|SP:MLAB|article|Russian Studies in Literature|2003|39|4|66||1061-1975|
20040323100519|ISI:WoK|article|AGRICULTURE ECOSYSTEMS &|2003|98|1-3|331||0167-8809|
20040323101518|SP:PY|article|American Psychologist|1954|9||632||0003-066X|
20040323101611|SP:PY|article|American Psychologist|1957|12||14||0003-066X|
Date / Time: 20040323095247
SID: ISI:WoK
GENRE: article
TITLE: BIODIVERSITY AND CONSERVATION
DATE: 2004
VOLUME: 13
ISSUE: 1
SPAGE: 275
ISBN:
ISSN: 0960-3115
ARTNUM:
20040323095247 | ISI:WoK | article | BIODIVERSITY AND CONSERVATION | 2004 | 13 | 1 | 275 | | 0960-3115|
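Writing one such record per click might look like the sketch below (Python, for illustration). The field order follows the header line of the log; the function names and the replacement of stray "|" characters are assumptions.

```python
from datetime import datetime

# Field order taken from the log header, after the leading date+time stamp.
LOG_FIELDS = [
    "sid", "genre", "title", "date", "volume",
    "issue", "spage", "epage", "issn", "isbn",
]


def format_log_record(params, when):
    """Return one pipe-delimited log line (without a trailing newline)."""
    stamp = when.strftime("%Y%m%d%H%M%S")
    # A literal "|" inside a value would break the column layout.
    values = [str(params.get(name, "")).replace("|", " ") for name in LOG_FIELDS]
    return "|".join([stamp] + values)


def append_log_record(path, params, when=None):
    """Append one record to the log file at the given path."""
    line = format_log_record(params, when or datetime.now())
    with open(path, "a", encoding="utf-8") as log:
        log.write(line + "\n")
```

Missing elements become empty fields, which keeps every record at the same column count and makes later tallying straightforward.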
15
16
March 2004 - “clicked on” titles
ASHP Midyear Clinical Meeting 103
Journal of Personality and Social Psychology 88
International Journal of Eating Disorders 74
Social Work 72
Psychological Reports 70
Child Development 67
Journal of the American Academy of Child and Adolescent Psychiatry 55
Journal of Adolescence 53
Child Abuse and Neglect 48
Addictive Behaviors 48
Journal of Youth and Adolescence 46
Journal of College Student Development 45
Drug Top 43
Child and Adolescent Social Work Journal 42
Journal of Applied Social Psychology 42
American Journal of Psychiatry 42
Journal of Applied Behavior Analysis 41
Perceptual and Motor Skills 41
Adolescence 41
Am J Health Syst Pharm 41
Nature 40
Annals of human biology 40
Smith College Studies in Social Work 40
Human Biology 40
(6,626 other titles with fewer than 40 clicks)
(18,644 clicks altogether)
17
March 2004 - “clicked-from” databases
PsycInfo (SilverPlatter) 6931
Eric (CSA) 2128
Social Work Abstracts (SilverPlatter) 1034
MLA Bibliography (SilverPlatter) 1000
IPA (SilverPlatter) 867
SciFinder Scholar: CA Plus 724
Anthropology Plus (RLG) 552
ArticleFirst (OCLC FS) 435
Art Index (SilverPlatter) 429
Biological Abstracts (SilverPlatter) 344
Web of Science (ISI) 342
Sociological Abstracts (CSA) 255
Linguistics and Language Behavior Abstracts (CSA) 246
Anthropological Index, Royal Anthropological Institute (RLG) 243
Periodical Abstracts (OCLC FS) 235
GeoRef (SilverPlatter) 210
WorldCat (OCLC FS) 203
America: History and Life (ABC-Clio) 190
Sports Discus (SilverPlatter) 174
Education Abstracts (OCLC FS) 171
Social Service Abstracts (CSA) 154
Compendex (EV2) 145
SciFinder Scholar: Medline 139
Zoological Abstracts 136
PapersFirst (OCLC FS) 131
EconLit (SilverPlatter) 125
(33 other databases with fewer than 40 clicks)
(18,644 clicks altogether)
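Counts like the two tables above can be compiled from the pipe-delimited log with a short script. The following is a sketch in Python; the function name is hypothetical, and it assumes SID is the second field and TITLE the fourth, as in the log header.

```python
from collections import Counter


def tally_clicks(log_lines):
    """Count clicks per source (SID) and per title from log records."""
    by_sid = Counter()
    by_title = Counter()
    for line in log_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 4:
            continue  # skip short records, e.g. ones with only an identifier
        by_sid[fields[1]] += 1
        if fields[3]:
            by_title[fields[3]] += 1
    return by_sid, by_title
```

`by_title.most_common(25)` then gives a ranked list of titles. Titles are compared exactly as logged, so variants in capitalization count as separate titles unless they are normalized first.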
18
The “PubMed” exception
• All that comes from PubMed initially is a PMID (PubMed Identifier)
• Can log the identifier and the time, but nothing else
• Requires redundant External Services to handle variations
• This set-up is combined with custom XML to:
   (1) suppress duplicates when an incoming OpenURL satisfies more than one condition; and
   (2) supply a standard phrase
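Detecting the PMID-only case so that it can be logged separately might be sketched as follows (Python, for illustration). It assumes the identifier arrives as `id=pmid:<digits>`; the function name and the list of elements treated as citation metadata are also assumptions.

```python
import re
from urllib.parse import parse_qs

PMID_PATTERN = re.compile(r"^pmid:(\d+)$", re.IGNORECASE)

# Elements whose presence means the OpenURL carries real citation data.
CITATION_KEYS = ("title", "stitle", "atitle", "issn", "volume", "spage")


def pmid_only(query_string):
    """Return the PMID if the OpenURL carries nothing but an identifier."""
    params = parse_qs(query_string)
    pmid = None
    for value in params.get("id", []):
        match = PMID_PATTERN.match(value.strip())
        if match:
            pmid = match.group(1)
    has_citation = any(key in params for key in CITATION_KEYS)
    return pmid if pmid and not has_citation else None
```

When `pmid_only` returns a value, the pre-processor can write a short log record holding just the time and the identifier and pass the OpenURL on unchanged.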
19
<xsl:for-each select="link">
  <xsl:variable name="link-name" select="name"/>
  <xsl:choose>
    <xsl:when test="contains($link-name, 'ILLiad')">
      <xsl:choose>
        <xsl:when test="position() = 1">
          ...
          <ul class="list">
            <li class="list-item">
              <xsl:variable name="orig-url" select="url"/>
              <xsl:variable name="url"><xsl:value-of select="$orig-url"/></xsl:variable>
              <a target="_blank" href="{$url}">
                Request a loan or copy of this item (if not available in the KU Libraries)
              </a>
          ...
        </xsl:when>
        <xsl:otherwise/>
      </xsl:choose>
      ...
from LFPDisplay.xsl
• multiple ILLiad services
• in priority order in SysAdmin
• this shows only the first one with a standard phrase
20
Standard ILL phrase:
“Request a loan or copy ...”
21
22
The Benefits?
• More full text links work
• More OPAC title searches work
• Some impossible services become possible
   • more importantly, they become consistently possible
• Source use statistics are compilable