Technical Workshop London – On Line 2004
Technical Workshop, On Line, November 31, 2004, London
TRANSCRIPT
Agenda
10:00 - 10:45 System status & issues
10:45 - 11:15 New system features
11:15 - 11:30 XML Interfaces
11:30 - 12:15 New initiatives & schema developments
12:15 - 12:30 Questions
System status
Database hardware running at ~ 60% capacity
During peak loads, Web server running at >90% (single CPU)
Dell servers running RH Linux E 3.0
Sun server running Solaris 9
Migration to Oracle 9i 95% completed
Redundant IP handoff from MCI to complete by Dec 2004
[Diagram: system architecture. Two Cisco PIX firewalls; three Dell 2650 servers (Java); 1 Gb switch, 100 Mb full duplex; Sun V440, 4 x 1.28 GHz SPARC III, 16 GB memory (Database); Sun 3510 DAS, Fibre Channel storage]
Query response times
Single query in a request
Great: 0.5s, Good: 1.5s, Slow: 3.5s, Bad: 6+
Five queries in a request
Great: 2.0s, Good: 5.5s, Slow: 10s, Bad: 15+
We’re investigating the relationship between query request load, deposit processing load and query response time.
We’ll be adding SW load balancing to the Web front end
To help:
  Limit the number of concurrent requests!
  Place more than one query in a request
  Use the batch upload
[Chart: Weekly query load - hourly, Mon-Sun, Oct 4-10; y-axis from 0 to 40000]
[Chart: Weekly query load - hourly, Mon-Sun, Oct 11-17; x-axis in 5-hour steps from Mon-11 Hr-00 to Sun-17 Hr-21; y-axis from 0 to 40000]
[Chart: Weekly query load - hourly, Mon-Sun, Oct 18-24; x-axis in 5-hour steps from Mon-18 Hr-00 to Sun-24 Hr-21; y-axis from 0 to 30000]
Conflicts
www.crossref.org =>Members Area => System Reports => Conflict Report
Conflicts
===========================================
Created: 2004-10-21 04:38:03.0 ConfID: 139239 CauseID: 110986773 OtherID: 76436491,
JT: Scottish Journal of Theology
MD: Marsh, 55 ,3,253,2002,In defense of a self: the theological …
DOI: 10.1017/S0336930602000313 (139239-null 139291-null )
DOI: 10.1017/S0036930602000315 (139239-null 139291-null )
===========================================
JT: journal title
MD: metadata used for both DOIs
DOI: 2 DOIs for the same article
(139239-null 139291-null): the state of each conflict ID, null => unresolved; the second ID shows the DOIs are in a second conflict
Conflicts: What to do about them
Send us an email instructing how to resolve the conflict
  Make one DOI primary, all others into aliases
  Resolve the conflict without doing anything
Resend one of the DOIs with new (different) metadata
(Soon) log in to doi.crossref.org and resolve them yourself

Primary DOI                          DOI to be aliased to primary     Conflict ID
10.1016/j.clindermatol.2003.11.001   10.1016/S0738-081X(03)00103-2    104155
10.1016/j.clindermatol.2003.12.026   10.1016/S0738-081X(03)00150-0    104157
10.1016/j.clindermatol.2003.12.031   10.1016/S0738-081X(03)00153-6    104159

Conflict ID
101115
103044
103048
105650
Conflicts: prevent them
<journal_article publication_type="full_text">
  <titles><title>Phys. Rev. A</title></titles>
  <contributors>
    <person_name sequence="first" contributor_role="author">
      <given_name>Petr O.</given_name>
      <surname>Fedichev</surname>
    </person_name>
  </contributors>
  <publication_date media_type="online">
    <month>04</month>
    <year>2004</year>
  </publication_date>
  <publisher_item>
    <item_number item_number_type="sequence-number">PhysRevA.69.049902</item_number>
  </publisher_item>
  <doi_data>
    <doi>10.1103/PhysRevA.69.049902</doi>
    <timestamp>20040412120604</timestamp>
    <resource>http://link.aps.org/doi/10.1103/PhysRevA.69.049902</resource>
  </doi_data>
</journal_article>
Issues
Data quality
  Missing fields (publish ahead of print is OK, but update when data is available)
  First author being mixed up with other contributors
Journal titles
  Full titles in query without ISSN may cause misses
  Two recent fuzzy match changes have had an effect
    1. Eliminated a dangerous rule that could return false positives when title and ISSN did not match well
    2. Lowered the threshold on matching long titles
Issues
Depositing a new title
  If you send in 2 files at the same time with DOIs for a new title it may result in two title entries in CrossRef
DOIs for journal titles and issues
  These can be created in the <journal_metadata> and <issue_metadata> tags
Page numbers with alpha characters
  ‘S110’ or ‘110S’ is handled better than ‘30-1’
  110 in a query will match S110 or 110S
  30 in a query will not match 30-1
    10.1016/S0003-4975(02)04151-6
    10.1029/2002GL014973
  ‘20-a’ should be OK
  ‘69F-a’ will only match an exact string
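The page-matching behaviour listed on this slide can be sketched roughly as follows. This is only an illustration of the stated examples, not CrossRef's actual matcher: the function names are hypothetical, and the ‘20-a’ case is not modelled because the slide does not say how it is handled.

```python
import re


def page_core(page):
    """Strip a run of letters from either end: 'S110' or '110S' -> '110'."""
    return re.sub(r"^[A-Za-z]+|[A-Za-z]+$", "", page)


def page_matches(query_page, stored_page):
    """Hypothetical rule: exact match, or numeric match after stripping letters."""
    if query_page == stored_page:
        return True
    core = page_core(stored_page)
    # Fall back to the stripped form only when what is left is purely numeric,
    # so '30-1' and '69F-a' still need an exact string.
    return core.isdigit() and query_page == core
```

With this rule, a query for 110 finds S110 and 110S, while a query for 30 does not find 30-1.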
Issues
Query results in XML format
servlet/query?usr=<username>&pwd=<password>&type=<queryType>&format=<resultFormat>&qdata= ….
Result format can be: piped, xml, xsd_xml
  xml is the old legacy XML format (no schema)
  xsd_xml has a schema and includes all new features
  http://www.crossref.org/qrschema/crossref_query_output2.0.xsd
Use of the legacy XML format should be discontinued
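A minimal sketch of assembling such a request URL. The parameter names come from the slide; the host (doi.crossref.org, borrowed from the other servlet URLs in this deck) is an assumption, and the query type is left as a caller-supplied placeholder because its allowed values are not listed here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint: the slide only shows the relative path "servlet/query".
QUERY_ENDPOINT = "http://doi.crossref.org/servlet/query"
RESULT_FORMATS = ("piped", "xml", "xsd_xml")  # formats named on the slide


def build_query_url(usr, pwd, query_type, qdata, result_format="xsd_xml"):
    """Build a query URL; defaults to xsd_xml rather than the legacy xml format."""
    if result_format not in RESULT_FORMATS:
        raise ValueError("unknown result format: %r" % result_format)
    params = [("usr", usr), ("pwd", pwd), ("type", query_type),
              ("format", result_format), ("qdata", qdata)]
    # urlencode percent-encodes the pipes and spaces in qdata.
    return QUERY_ENDPOINT + "?" + urlencode(params)


# "QUERY_TYPE" is a stand-in, not a real value.
url = build_query_url("USR", "PWD", "QUERY_TYPE",
                      "0277786X|Proceedings of SPIE||4272||133|2001|||")
sent = parse_qs(urlsplit(url).query, keep_blank_values=True)
```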
Issues
Each batch ID / query key combination must be unique.
<citations_diagnostic>
  <citation key="CR1" status="warning">
    A stored query with doi_batch_id=SPI_2004-09-15_09-48-22 and query_key=CR1 already exists for the same depositor
  </citation>

<head>
  <doi_batch_id>SPI_2004-09-15_09-48-22</doi_batch_id>

<doi_data>
  <doi>10.1007/BF00393374</doi>
  <resource><![CDATA[ http://www.springerlink.com/index/10.1007/BF00393374 ]]></resource>
</doi_data>
<citation_list>
  <citation key="CR1">
    <journal_title>Appl Environ Microbiol</journal_title>
    <author>RI Amann</author>
    <volume>56</volume>
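One way a depositor might catch the reuse of a batch ID / query key combination before submitting is sketched below. The function name and the idea of keeping a local record of earlier pairs are assumptions for illustration; the system itself only reports the warning shown above.

```python
def find_reused_keys(previous, batch_id, keys):
    """Return the keys whose (batch_id, key) pair was already used.

    previous: iterable of (doi_batch_id, query_key) pairs from earlier
    submissions, kept locally by the depositor (hypothetical bookkeeping).
    """
    seen = set(previous)
    reused = []
    for key in keys:
        pair = (batch_id, key)
        if pair in seen:
            reused.append(key)
        # Also catches a key repeated inside the same file.
        seen.add(pair)
    return reused


earlier = {("SPI_2004-09-15_09-48-22", "CR1")}
clashes = find_reused_keys(earlier, "SPI_2004-09-15_09-48-22", ["CR1", "CR2", "CR2"])
```

Using a fresh doi_batch_id for every file avoids the clash even when citation keys such as CR1 repeat from article to article.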
Issues
Upload timeouts
  Large (250K+) files may not be completing the upload
  No HTTP response is returned
  Only a few users seem to be affected (some can upload 1M+ files)
Solutions (& workarounds)
  Break the files up (some are doing 1 DOI per file)
  CrossRef to investigate session timeouts
Let me know if you're having this problem
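The "break the files up" workaround could look something like this sketch, which groups already-serialized records into chunks below a size limit. The function name and the 250,000-byte default (a reading of "250K") are assumptions; each chunk would still need its own deposit wrapper and batch ID before upload.

```python
def split_records(records, max_bytes=250_000):
    """Group serialized records so each chunk's UTF-8 size stays under max_bytes.

    A single record larger than max_bytes still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for rec in records:
        rec_size = len(rec.encode("utf-8"))
        if current and size + rec_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(rec)
        size += rec_size
    if current:
        chunks.append(current)
    return chunks


# Five 100-byte records with a 250-byte limit end up in chunks of 2, 2 and 1.
chunk_sizes = [len(c) for c in split_records(["a" * 100] * 5, max_bytes=250)]
```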
DX contingency planning
CrossRef will be running a secondary Handle system and a DOI proxy resolver.
The secondary Handle server receives updates from the DOI primary (at CNRI) about 15 minutes after DOIs are created/updated by CrossRef
The proxy will share the load going to http://dx.doi.org (DNS subdomain will direct traffic to several IPs)
[Diagram: deposit, steps 1-3]
New features
Unified query
Tracking ID
Open Channel Interface
Forward linking
Local hosting changes
Unified query
Journals, conf. proceedings and books have different metadata => queries must examine different fields
… and it gets worse
  Proceedings have event name and an event acronym
  Proceedings have event date and publication date
  Proceedings and Books have ISBNs and/or ISSNs
The real problem is: it's hard to tell from a reference what kind of item is being referenced
Unified query
The solution is to have one query that examines everything and returns the right result
Step 1: change the current ‘journal’ query to have the ‘title’ field also examine proceedings event name and event acronym and the ‘issn’ field examine proceedings ISSNs
‘journal’ query: only one title
0277786X|Proceedings of SPIE||4272||133|2001|||
‘proceedings’ result: two title fields (series is empty)
0277786X||Proceedings of SPIE |Srinivasan|4272||133|2001||full_text ||10.1117/12.430790
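To make the difference between the two pipe'd lines concrete, here is a small sketch that splits them into fields. The helper name is hypothetical, and which of the two title slots holds the series title is inferred from the "series is empty" note rather than stated.

```python
QUERY = "0277786X|Proceedings of SPIE||4272||133|2001|||"
RESULT = ("0277786X||Proceedings of SPIE |Srinivasan|4272||133|2001|"
          "|full_text ||10.1117/12.430790")


def split_piped(line):
    """Split a pipe-delimited line into stripped fields; empty slots stay ''."""
    return [field.strip() for field in line.split("|")]


query_fields = split_piped(QUERY)    # 10 fields, a single title at index 1
result_fields = split_piped(RESULT)  # 12 fields, title slots at indexes 1 and 2

# In the result the first title slot is empty (presumably the series title)
# and the DOI is the last field.
doi = result_fields[-1]
```

A client that reads results by position has to allow for the extra title slot when a ‘journal’ query returns a ‘proceedings’ record.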
Tracking IDs
http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&pwd=<PWD>&doi_batch_id=NJ028011-b406513a&type=result
OR
http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&pwd=<PWD>&file_name=b406513a_doi.xml&type=result
Returns the log file
http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&pwd=<PWD>&file_name=b406513a_doi.xml&type=contents
Returns the XML deposit file
Open Channel Interface
‘Premium’ fee has been dropped
Available on a case-by-case basis
Continuous connection to CrossRef for pipe’d queries
Response time can be 10X better than HTTP queries
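import java.io.*;
import java.net.*;

// host, port and qData are supplied by the caller
Socket socket = new Socket(host, port);
// autoflush is on, so each println is sent immediately
PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));

out.println(qData);            // send one pipe'd query
String line = in.readLine();   // read one line of the response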
Forward Linking
Forward linking deposits are simply the list of references listed in the bibliography
You most likely already send this data to CrossRef in the form of queries (in fact reference deposits look very much like queries)
There are two ways to deposit references for an article
1. Send them in with the article’s metadata
2. Send them in separately after an article’s DOI and metadata are deposited
Forward Linking – deposit log
<?xml version="1.0" encoding="UTF-8"?>
<doi_batch_diagnostic status="completed">
  <submission_id>115276193</submission_id>
  <batch_id>4219-com.wiley.cch.processes.JournalToDOI16047.xref</batch_id>
  <record_diagnostic status="Success">
    <doi>10.1002/(ISSN)1097-0134</doi>
    <msg>Successfully updated in handle</msg>
  </record_diagnostic>
  <record_diagnostic status="Success">
    <doi>10.1002/prot.20276</doi>
    <msg>Successfully added</msg>
    <citations_diagnostic>
      <citation key="10.1002/prot.20276-BIB1" status="stored_query" />
      <citation key="10.1002/prot.20276-BIB2" status="resolved_reference">10.1006/jsbi.2001.4428</citation>
      <citation key="10.1002/prot.20276-BIB3" status="resolved_reference">10.1110/ps.0227803</citation>
      <citation key="10.1002/prot.20276-BIB4" status="resolved_reference">10.1006/jmbi.1990.9999</citation>
      <citation key="10.1002/prot.20276-BIB5" status="stored_query" />
      <citation key="10.1002/prot.20276-BIB6" status="stored_query" />
      <citation key="10.1002/prot.20276-BIB7" status="stored_query" />
Forward Linking – query
<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="2.0" xmlns = "http://www.crossref.org/qschema/2.0">
  <head>
    <email_address>[email protected]</email_address>
    <doi_batch_id>fl_001</doi_batch_id>
  </head>
  <body>
    <fl_query alert='false'>
      <doi>10.1110/ps.0227803</doi>
    </fl_query>
  </body>
</query_batch>
Note: only user ‘coldspring’ can run this query
Multiple resolution
Multiple resolution presents choices to the user from the site of the link
An XML CrossRef deposit sets up the menu and multiple links (sample)
Multiple resolution - deposit
Normal link is built with the <a> (anchor) tag
<a href="http://dx.doi.org/10.5555/sample-doi">The Link Text</a>
Multiple resolution link is built with the <script> tag
One instance of <script> to load the menu library
<script src="http://www.crossref.org/MRLoader/milonic_src.js"></script>
For each link
<script src="http://www.crossref.org/MRLoader/MR/10.5555/sample-doi?The%20Link%20Text"></script>
The src is made up of the menu builder code (http://www.crossref.org/MRLoader/MR/), the DOI (10.5555/sample-doi) and the link text (The%20Link%20Text)
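A sketch of generating the per-link <script> tag from a DOI and its link text. The base URL and the layout (menu builder code, then DOI, then "?" and the link text) follow the example on this slide; the function name is hypothetical, and how DOIs with unusual characters should be escaped is an assumption.

```python
from urllib.parse import quote

MR_BASE = "http://www.crossref.org/MRLoader/MR/"  # menu builder code, from the slide


def mr_script_tag(doi, link_text):
    """Return the <script> tag that replaces a normal <a> link for one DOI."""
    # Keep the slash between DOI prefix and suffix; spaces become %20.
    src = MR_BASE + quote(doi, safe="/") + "?" + quote(link_text)
    return '<script src="%s"></script>' % src


tag = mr_script_tag("10.5555/sample-doi", "The Link Text")
```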
Multiple resolution - deployment
Multiple resolution deployment requires three things:
1. Registration of multiple targets for a given DOI
2. Operation of the MRLoader resolver
3. Construction of MR links on Web pages
Everyone has a part to play
1. Publishers that ‘own’ the target DOI must register multiple targets (or authorize a 3rd party to do so)
2. CrossRef and/or the content owner publisher must operate the MRLoader resolver
3. Every Web page that links to the MR enabled DOI must replace <a> tags with <script> tags
Web Deposit Form
Allows users to enter the metadata for a deposit using a Web form. No XML skills required
Supports journal articles; now working to add conference proceedings and books. Later, will add reference deposits and components
Must know your CrossRef member login
www.crossref.org =>Member Area => Member Resources => web deposit form
http://www.crossref.org/webDeposit
XML Queries
• XML Queries provide a more structured format and enable features unavailable in pipe’d queries
1. Enable multiple hits
2. Control over which fields are fuzzy matched
3. Forward linking queries
4. Query match alerts
Metadata query
<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="1.0" xmlns = "http://www.crossref.org/qschema/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <head>
    <email_address>[email protected]</email_address>
    <doi_batch_id>SomeTrackingID2</doi_batch_id>
  </head>
  <body>
    <query key="MyKey1" enable-multiple-hits="false" forward-match="false">
      <issn>10408746</issn>
      <journal_title>Current Opinion in Oncology</journal_title>
      <author>Chauncey</author>
      <volume>13</volume>
      <issue>1</issue>
      <first_page>21</first_page>
      <year>2001</year>
    </query>
  </body>
</query_batch>
• Order is important
• Fields can be omitted
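Because order matters and fields can be omitted, a client can emit whatever fields it has in one fixed order. The sketch below does that with Python's standard ElementTree; the field order and attribute names are copied from the sample query, while the function name, the example email address and the choice of version="1.0" (the deck shows both 1.0 and 2.0) are assumptions.

```python
import xml.etree.ElementTree as ET

NS = "http://www.crossref.org/qschema/2.0"  # namespace shown in the sample
# Element order as it appears in the sample metadata query.
FIELD_ORDER = ["issn", "journal_title", "author", "volume",
               "issue", "first_page", "year"]


def build_query_batch(email, batch_id, key, fields, version="1.0"):
    """Serialize one metadata query; absent fields are simply left out."""
    batch = ET.Element("query_batch", {"version": version, "xmlns": NS})
    head = ET.SubElement(batch, "head")
    ET.SubElement(head, "email_address").text = email
    ET.SubElement(head, "doi_batch_id").text = batch_id
    body = ET.SubElement(batch, "body")
    query = ET.SubElement(body, "query", {"key": key,
                                          "enable-multiple-hits": "false",
                                          "forward-match": "false"})
    for name in FIELD_ORDER:  # fixed order, whatever order the dict was built in
        if name in fields:
            ET.SubElement(query, name).text = fields[name]
    return ET.tostring(batch, encoding="unicode")


xml_text = build_query_batch("user@example.org", "SomeTrackingID2", "MyKey1",
                             {"year": "2001", "issn": "10408746", "author": "Chauncey"})
```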
Fuzzy match control
• Fields with a “match” attribute can be controlled
ISSN: optional, exact
Journal/Volume: optional, fuzzy, exact (default="fuzzy")
Series Title: optional, null, fuzzy, exact (default="fuzzy")
Author: optional, fuzzy, null, exact (default="fuzzy")
Volume: optional, fuzzy, exact (default="fuzzy")
Issue: optional, fuzzy, exact (default="fuzzy")
Page: optional, null, exact (default="optional")
Example: <journal_title match="exact">Current Opinion in Oncology</journal_title>
A word on special characters
Arrggghh…
• Metadata deposits are supposed to be UTF-8 Unicode
é = &#233; (decimal) = &#xE9; (hex)
Deposited record
10.5555/char_test_001 ISSN: 12345678 Title: Test Publication Author: Joénes Volume: 12 Issue: 1 Page: S125 Year: 1999
Queries
Works because page is supplied:
<journal_title>Test Publication</journal_title> <author>Joenes</author> <volume>12</volume> <first_page>125</first_page> <year>1999</year>
Does NOT work:
<journal_title>Test Publication</journal_title> <author>Joenes</author> <volume>12</volume> <year>1999</year>
Works because correct author is supplied:
<journal_title>Test Publication</journal_title> <author>Joénes</author> <volume>12</volume> <year>1999</year>
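The three ways of writing é on this slide (the literal character, a decimal character reference and a hex character reference) can be checked with a few lines of standard-library Python. This is only a demonstration of the encodings, not of how CrossRef processes them.

```python
import html

ch = "\u00e9"                    # é, LATIN SMALL LETTER E WITH ACUTE
decimal_ref = "&#%d;" % ord(ch)  # decimal character reference
hex_ref = "&#x%X;" % ord(ch)     # hexadecimal character reference
utf8_bytes = ch.encode("utf-8")  # two bytes in a UTF-8 deposit file

# Both references decode back to the same single character.
same_char = html.unescape(decimal_ref) == html.unescape(hex_ref) == ch

# The plain-ASCII spelling is a different string, which is why an author
# query for "Joenes" needs the page number to find the "Joénes" record.
ascii_differs = "Joenes" != "Jo\u00e9nes"
```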
Stored Queries
• CrossRef remembers queries that do not initially match and sends an email notice when they finally do.
<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="1.0" xmlns = "http://www.crossref.o…">
  <head>
    <email_address>[email protected]</email_address>
    <doi_batch_id>fm_429_001</doi_batch_id>
  </head>
  <body>
    <query key="fm_1" enable-multiple-hits="false" forward-match="true">
      <journal_title>Test Publication</journal_title>
      <author>Anderson</author>
      <volume>33</volume>
      <issue>9</issue>
      <first_page>125</first_page>
      <year>2002</year>
    </query>
  </body>
</query_batch>
Log message when query is submitted
<?xml version="1.0" encoding="UTF-8" ?>
<crossref_result version="2.0" xmlns="http://www.crossref.org/qrschema/2.0" …">
  <query_result>
    <head>
      <email_address>[email protected]</email_address>
      <doi_batch_id>fm_429_001</doi_batch_id>
    </head>
    <body>
      <query status="unresolved">
        <journal_title>Test Publication</journal_title>
        <author>Anderson</author>
        <volume>33</volume>
        <issue>9</issue>
        <first_page>125</first_page>
        <year>2002</year>
        <msg>Query stored in CrossRef for forward matching</msg>
      </query>
    </body>
  </query_result>
</crossref_result>
Results email
Subject: Crossref stored query match: doi_batch_id= fm_429_001 ; query_key= fm_1
<?xml version = "1.0" encoding = "UTF-8"?>
<crossref_result version="2.0" xmlns="http://www….-instance… ">
  <query_result>
    <head>
      <email_address>[email protected]</email_address>
      <doi_batch_id> fm_429_001 </doi_batch_id>
    </head>
    <body>
      <query key="fm_1" status="resolved">
        <doi>10.5555/forward_match_test_2</doi>
        <issn>12345678</issn>
        <journal_title match="exact">Test Publication</journal_title>
        <author match="exact">Smith</author>
        <volume match="exact">3</volume>
        <issue>2</issue>
        <first_page match="exact">100</first_page>
        <year match="exact">1985</year>
        <publication_type>full_text</publication_type>
      </query>
    </body>
  </query_result>
</crossref_result>
Polling for Query Matches
• You can interrogate the system to get a list of queries that may have matched.
http://doi.crossref.org/servlet/downloadStoredQueries?usr=creftest&pwd=c53test&startDate=2004-03-31&endDate=2004-05-03
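A polling client mostly needs to fill in the date window. The sketch below rebuilds the URL from this slide using the parameter names it shows (usr, pwd, startDate, endDate); the function name is hypothetical, and nothing is assumed here about the format of the response.

```python
from datetime import date
from urllib.parse import urlencode

POLL_ENDPOINT = "http://doi.crossref.org/servlet/downloadStoredQueries"


def stored_query_poll_url(usr, pwd, start, end):
    """Build the polling URL for stored queries matched between two dates."""
    if start > end:
        raise ValueError("start date must not be after end date")
    params = [("usr", usr), ("pwd", pwd),
              ("startDate", start.isoformat()),  # YYYY-MM-DD, as on the slide
              ("endDate", end.isoformat())]
    return POLL_ENDPOINT + "?" + urlencode(params)


poll_url = stored_query_poll_url("creftest", "c53test",
                                 date(2004, 3, 31), date(2004, 5, 3))
```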
Forward Linking Queries
• Forward linking is an ‘opt-in’ service
• Fees: a surcharge on the annual membership
• Permission must be enabled by a CrossRef administrator
Sample: forward linking query results
Forward Linking Query Example
<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="2.0" xmlns = "http://www.crossref.org/qschema/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.crossref.org/qschema/2.0 http://www.crossref.org/qschema/crossref_query_input2.0.xsd">
  <head>
    <email_address>[email protected]</email_address>
    <doi_batch_id>fl_001</doi_batch_id>
  </head>
  <body>
    <fl_query alert="true">
      <doi>10.1097/00001622-200101000-00005</doi>
    </fl_query>
  </body>
</query_batch>
Reference deposit #1 (log)
Reference deposit #2 (log)
Forward Linking Alerts
• Once you’ve made a forward link query, deposit of any new articles that cite the DOI you requested will generate an alert email
Reference deposit #3 (log)
Email alert notice
New Initiatives
Components
Extended content types
Plans for 2005
Component Deposits
What is a component ?
Components are considered to be sub-items that are part of the construction of an article, chapter or conference paper or provide supporting (sometimes called supplemental) information.
These items by and of themselves are not typically cited in a bibliography, but they are cited within the text
NOTE: Title DOIs and Issue DOIs are not components. They should be deposited in the journal, conf-proc or book metadata
Component Deposits
Why create DOIs for components?
  To improve link management
  Build persistent links
  Use multiple resolution on them
How are components deposited?
  Schema version 3.0.3 supports components
  Deposit as part of an article’s metadata or standalone (note: a parent DOI must be specified)
Component Deposits
What Component services will CrossRef offer?
Near term
  Just the registration of the DOI
Long term
  Some form of lookup service (e.g. query)
  Expanded component metadata (licensing, copyright …?)
Component Deposits
<journal_article>
  ...
  <doi_data>
    <doi>10.9876/S0003695199019014</doi>
    <resource>http://ojps.aip.org:18000/link/?apl/74/1/76/ab</resource>
  </doi_data>
  <component parent_relation="isPartOf">
    <description><b>Figure 1:</b> This is the caption of the first figure...</description>
    <format mime_type="image/jpeg">Web resolution image</format>
    <doi_data>
      <doi>10.9876/S0003695199019014/f1</doi>
      <resource>http://ojps.aip.org:18000/link/?apl/74/1/76/f1</resource>
    </doi_data>
  </component>
  <component parent_relation="isReferencedBy">
    <description><b>Video 1:</b> This is a description of the video...</description>
    <format mime_type="video/mpeg"/>
    <doi_data>
      <doi>10.9876/S0003695199019014/video1</doi>
      <resource>http://ojps.aip.org:18000/link/?apl/74/1/76/video1</resource>
    </doi_data>
  </component>
</journal_article>
Component Deposits
Alternatively components may be deposited separately from their ‘parent’ item’s metadata
<body>
  <sa_component>
    <doi>10.9876/molcell/10/4</doi>
    <component parent_relation="isPartOf">
      <description>Cover Image, Molecular Cell, Volume 10, Issue 4, January 2004</description>
      <format mime_type="image/tiff"/>
      <doi_data>
        <doi>10.9876/molcell/10/4/cover</doi>
        <resource>http://molcell.org/10/4/cover</resource>
      </doi_data>
    </component>
  </sa_component>
</body>
Expanded Content Types
Metadata study now underway
  Dissertations, technical reports, working papers, standards, patents and databases
Implementation to occur in early 2005
  XML schema will update to version 4.0
  Deposits
  Query services
    Extend current query mechanism?
    ‘Firewall’ current content?
Expanded Content Types
Dissertations
Add <advisor> elements
Review NDLTD metadata standards
Survey T & D organizations
  Cal Tech
  ProQuest
  Texas A&M
  ?
<dissertation> <person_name> <titles> <acceptance_date> <university> <name> <location> <department> <degree> <publisher_item> <doi_data>
Expanded Content Types
Reports
Drop ‘technical’ label
Support chapters?
Survey organizations
  AGU
  NASA/JPL
  Other government?
  ?
<report> <contributors> <titles> <publisher> <publication_date> <publisher_item> <series_metadata> <isbn> <issn> <research_organization> <sponsor> <organization> <contract> <doi_data>
Expanded Content Types
Working papers
Conflicts with published articles
Include series metadata?
Survey organizations
  ?
<report> <contributors> <titles> <publisher> or <university> <publication_date> <publisher_item> <series_metadata> <isbn> <issn> <research_organization> <sponsor> <organization> <contract> <doi_data>
Expanded Content Types
Standards
Not included in initial analysis, added after annual member meeting
Interest from IEEE
Metadata draft development TBD
Accredited vs. consortium standards
Survey organizations
  NISO, ANSI, BSI, ISO
  IEEE
  ConsortiumInfo.org
Plans for 2005
Modify / improve page number processing
Normalized XML
Modularize CrossRef system
Implement Expanded Content Types
Others
Page & article numbers
CrossRef deposit schema allows for first page and article number
  <pages><first_page>
  <publisher_item><item_number item_number_type="article-number">
Article number will be used if no first page is provided
  a query has only one ‘page’ field and will search either first_page or article_number but not both
Some articles have both: both are presented to the reader
  change the query logic to search both fields
  add an XML query field for ‘article_number’
Page numbers (and article numbers?) are not numbers
  would a full fuzzy match on page improve matching rates?
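The difference between the current ‘page’ lookup and the proposed change can be sketched as below. The record layout and function names are hypothetical, and the comparison is a plain string match; the real query logic is not shown in this deck.

```python
def matches_current(record, page):
    """Current behaviour as described: first_page, else article_number, never both."""
    stored = record.get("first_page") or record.get("article_number")
    return stored == page


def matches_proposed(record, page):
    """Proposed behaviour: the query 'page' may match either stored field."""
    candidates = {record.get("first_page"), record.get("article_number")} - {None}
    return page in candidates


# A made-up record that carries both a first page and an article number.
record = {"first_page": "133", "article_number": "049902"}
found_now = matches_current(record, "049902")     # article number is ignored
found_later = matches_proposed(record, "049902")  # both fields are searched
```

Both values are kept as strings because, as the slide notes, page numbers are not really numbers.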
Normalized XML
Journal, proceedings and book content is stored in 2 places in the CrossRef database
1. Subset in tables/columns to support query operations
2. Entire deposit as a CLOB (not easily accessed)
XML query results are specialized for each content type (<journal_cite><conf_cite><book_cite>)
Reduce all content type info to a simpler ‘one size fits all’ schema
Store each DOI record as XML in a database column (memo?)
Facilitates access to all metadata (e.g. complete ‘lite’ weight local host files)
Yield a more consistent XML query result
Modularize
Current system is a monolith
  One database supports everything
Separate operations to improve performance and scalability
  Deposits & updates
  Queries
  Reports
Additional benefits
  Local host the CrossRef query system, not just the metadata
Other
Components
  Query mechanism
  Expand the metadata (license, rights …)
Production implementation of multiple resolution
  Integrate into the deposit process
  Implement a local host type option
  Automatic appropriate copy service