Technical Workshop London – On Line 2004

Technical Workshop On Line

November 31, 2004

London


Agenda

10:00 - 10:45  System status & issues

10:45 - 11:15  New system features

11:15 - 11:30  XML Interfaces

11:30 - 12:15  New initiatives & schema developments

12:15 - 12:30  Questions

System status

Database hardware running at ~60% capacity

During peak loads Web server running >90% (single CPU)

Dells running RH Linux E 3.0

Sun running Solaris 9

Migration to Oracle 9i 95% completed

Redundant IP handoff from MCI to complete by Dec 2004

[Diagram: three Dell 2650 servers (Java); Sun V440, 4 x 1.28 GHz Sparc III, 16 GB memory (database); 1 Gb switch, 100 Mb full duplex; Sun 3510 DAS Fibre Channel storage; two Cisco PIX]


Query response times

Single query in a request

Great: 0.5s, Good: 1.5s, Slow: 3.5s, Bad: 6+

Five queries in a request

Great: 2.0s, Good: 5.5s, Slow: 10s, Bad: 15+

We’re investigating the relationship between query request load, deposit processing load and query response time.

We’ll be adding SW load balancing to the Web front end

To help:
  Limit the number of concurrent requests!
  Place more than one query in a request
  Use the batch upload

[Chart: Weekly query load - hourly, Mon-Sun, Oct 4-10. Y-axis: 0 to 40000.]

[Chart: Weekly query load - hourly, Mon-Sun, Oct 11-17. Y-axis: 0 to 40000. X-axis: day and hour, Mon-11:Hr-00 to Sun-17:Hr-21.]

[Chart: Weekly query load - hourly, Mon-Sun, Oct 18-24. Y-axis: 0 to 30000. X-axis: day and hour, Mon-18:Hr-00 to Sun-24:Hr-21.]

Batch processing times

Conflicts

www.crossref.org => Members Area => System Reports => Conflict Report

Conflicts

===========================================
Created: 2004-10-21 04:38:03.0
ConfID: 139239
CauseID: 110986773
OtherID: 76436491,
JT: Scottish Journal of Theology
MD: Marsh, 55 ,3,253,2002,In defense of a self: the theological …
DOI: 10.1017/S0336930602000313 (139239-null 139291-null )
DOI: 10.1017/S0036930602000315 (139239-null 139291-null )
===========================================

2 DOIs for the same article

The state of the conflict ID: null => unresolved

DOIs are in a second conflict

Metadata used for both DOIs (MD)

Journal title (JT)

Conflicts: What to do about them

Send us an email instructing how to resolve the conflict

Make one DOI prime, all others into aliases

Resolve the conflict without doing anything

Resend one of the DOIs with new (different) metadata

(Soon) login to doi.crossref.org and resolve them yourself

Primary DOI                          DOI to be aliased to primary     Conflict ID
10.1016/j.clindermatol.2003.11.001   10.1016/S0738-081X(03)00103-2    104155
10.1016/j.clindermatol.2003.12.026   10.1016/S0738-081X(03)00150-0    104157
10.1016/j.clindermatol.2003.12.031   10.1016/S0738-081X(03)00153-6    104159

Conflict ID
101115
103044
103048
105650

Conflicts: prevent them

<journal_article publication_type="full_text">
  <titles><title>Phys. Rev. A</title></titles>
  <contributors>
    <person_name sequence="first" contributor_role="author">
      <given_name>Petr O.</given_name>
      <surname>Fedichev</surname>
    </person_name>
  </contributors>
  <publication_date media_type="online">
    <month>04</month>
    <year>2004</year>
  </publication_date>
  <publisher_item>
    <item_number item_number_type="sequence-number">PhysRevA.69.049902</item_number>
  </publisher_item>
  <doi_data>
    <doi>10.1103/PhysRevA.69.049902</doi>
    <timestamp>20040412120604</timestamp>
    <resource>http://link.aps.org/doi/10.1103/PhysRevA.69.049902</resource>
  </doi_data>
</journal_article>

Issues

Data quality
  Missing fields (publish ahead of print is OK, but update when data is available)
  First author being mixed up with other contributors

Journal titles
  Full titles in query without ISSN may cause misses
  Two recent fuzzy match changes have had an effect
    1. Eliminated a dangerous rule that could return false positives when title and ISSN did not match well
    2. Lowered the threshold on matching long titles

Issues

Depositing a new title
  If you send in 2 files at the same time with DOIs for a new title it may result in two title entries in CrossRef

DOIs for journal titles and issues
  These can be created in the <journal_metadata> and <issue_metadata> tags

Page numbers with alpha characters
  ‘S110’ or ‘110S’ is handled better than ‘30-1’
  110 in a query will match S110 or 110S
  30 in a query will not match 30-1
    10.1016/S0003-4975(02)04151-6
    10.1029/2002GL014973
  20-a should be OK
  ‘69F-a’ will only match an exact string
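The page matching behaviour listed above can be pictured with a small sketch. This is not CrossRef's matcher: the class name, method name and regular expression are hypothetical, and the rule (digits with at most one letter in front or behind) is only an assumption that reproduces the S110, 110S, 30-1 and 69F-a cases from the slide. The "20-a should be OK" case is not modelled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageMatchSketch {
    // Assumed rule: digits with at most one letter in front or behind, e.g. S110 or 110S.
    private static final Pattern SIMPLE = Pattern.compile("^[A-Za-z]?(\\d+)[A-Za-z]?$");

    /** Returns true if a page given in a query would match the deposited page. */
    static boolean matches(String queryPage, String depositedPage) {
        if (queryPage.equals(depositedPage)) {
            return true; // identical strings always match
        }
        Matcher m = SIMPLE.matcher(depositedPage);
        // Compound pages such as 30-1 or 69F-a do not fit the pattern,
        // so they can only match through the exact comparison above.
        return m.matches() && m.group(1).equals(queryPage);
    }
}
```

For example, `PageMatchSketch.matches("110", "S110")` is true while `PageMatchSketch.matches("30", "30-1")` is false.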

Issues

Query results in XML format

servlet/query?usr=<username>&pwd=<password>&type=<queryType>&format=<resultFormat>&qdata= ….

Result format can be: piped, xml, xsd_xml

xml is the old legacy XML format (no schema)

xsd_xml has a schema and includes all new features

http://www.crossref.org/qrschema/crossref_query_output2.0.xsd

Use of the legacy XML format should be discontinued
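A request of this shape could be assembled as in the sketch below. Several things are assumptions rather than facts from the slide: the host (borrowed from the other servlet URLs in this deck), the class and method names, and the idea that several queries are placed in one request by joining them with newlines inside qdata. The value passed for type is left to the caller.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

class QueryUrlSketch {
    // Assumed host; the slide shows only the path "servlet/query".
    static final String BASE = "http://doi.crossref.org/servlet/query";

    /** Builds one request URL carrying all queries, asking for the given result format. */
    static String build(String usr, String pwd, String type, String format, List<String> queries) {
        // Assumption: queries in one request are separated by newlines.
        String qdata = String.join("\n", queries);
        return BASE + "?usr=" + enc(usr) + "&pwd=" + enc(pwd)
                + "&type=" + enc(type) + "&format=" + enc(format)
                + "&qdata=" + enc(qdata);
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Passing "xsd_xml" as the format asks for the schema-based output rather than the legacy XML.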

Issues

Each batch ID / query key combination must be unique.

<citations_diagnostic>
  <citation key="CR1" status="warning">
    A stored query with doi_batch_id=SPI_2004-09-15_09-48-22 and query_key=CR1 already exists for the same depositor
  </citation>

<head>
  <doi_batch_id>SPI_2004-09-15_09-48-22</doi_batch_id>

<doi_data>
  <doi>10.1007/BF00393374</doi>
  <resource><![CDATA[ http://www.springerlink.com/index/10.1007/BF00393374 ]]></resource>
</doi_data>
<citation_list>
  <citation key="CR1">
    <journal_title>Appl Environ Microbiol</journal_title>
    <author>RI Amann</author>
    <volume>56</volume>
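One way to avoid the warning above is to derive a fresh doi_batch_id for every submission. The sketch below follows the prefix-plus-timestamp shape of the sample ID on the slide. The class and method names are hypothetical, and a timestamp with one-second resolution is only an assumption about what is "unique enough": two submissions in the same second would still collide.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class BatchIdSketch {
    // Same layout as the sample ID SPI_2004-09-15_09-48-22.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss");

    /** Builds a batch ID from a depositor prefix and the submission time. */
    static String batchId(String prefix, LocalDateTime when) {
        return prefix + "_" + when.format(STAMP);
    }

    /** Builds the n-th citation key within one batch, e.g. CR1, CR2. */
    static String queryKey(int n) {
        return "CR" + n;
    }
}
```

Resubmitting the same references under a new batch ID gives each batch ID / query key pair a new value even when the keys (CR1, CR2, …) repeat.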

Issues

Upload timeouts
  Large (250K+) files may not be completing the upload
  No HTTP response is returned
  Only a few users seem to be affected (some can upload 1M+ files)

Solutions (& workarounds)
  Break the files up (some are doing 1 DOI per file)
  CrossRef to investigate session timeouts

Let me know if you're having this problem
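Breaking a large deposit into smaller files amounts to chunking the list of records before serializing each chunk. A minimal sketch, assuming the records are already held in a list; the class name, method name and the maxPerFile parameter are hypothetical and say nothing about what size actually avoids the timeout.

```java
import java.util.ArrayList;
import java.util.List;

class DepositSplitSketch {
    /** Splits records into consecutive groups of at most maxPerFile items. */
    static <T> List<List<T>> chunk(List<T> records, int maxPerFile) {
        if (maxPerFile < 1) {
            throw new IllegalArgumentException("maxPerFile must be >= 1");
        }
        List<List<T>> files = new ArrayList<>();
        for (int i = 0; i < records.size(); i += maxPerFile) {
            int end = Math.min(i + maxPerFile, records.size());
            files.add(new ArrayList<>(records.subList(i, end))); // copy so each chunk is independent
        }
        return files;
    }
}
```

With maxPerFile set to 1 this reproduces the "1 DOI per file" approach mentioned on the slide.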


DX contingency planning

CrossRef will be running a secondary Handle system and a DOI proxy resolver.

The secondary Handle server receives updates from the DOI primary (at CNRI) about 15 minutes after DOIs are created/updated by CrossRef

The proxy will share the load going to http://dx.doi.org

(DNS subdomain will direct traffic to several IPs)

[Diagram: deposit flow with numbered steps 1 to 3]


New features

Unified query

Tracking ID

Open Channel Interface

Forward linking

Local hosting changes

Unified query

Journals, conf. proceedings and books have different metadata => queries must examine different fields

… and it gets worse
  Proceedings have event name and an event acronym
  Proceedings have event date and publication date
  Proceedings and Books have ISBNs and/or ISSNs

The real problem is: it's hard to tell from a reference what kind of item is being referenced


Unified query

The solution is to have one query that examines everything and returns the right result

Step 1: change the current ‘journal’ query to have the ‘title’ field also examine proceedings event name and event acronym and the ‘issn’ field examine proceedings ISSNs

‘journal’ query: only one title

0277786X|Proceedings of SPIE||4272||133|2001|||

‘proceedings’ result: two title fields (series is empty)

0277786X||Proceedings of SPIE |Srinivasan|4272||133|2001||full_text ||10.1117/12.430790
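The two piped lines above differ in field count, which is easy to miss by eye. A small sketch for splitting them; the class and method names are hypothetical, and treating the last field as the DOI is an assumption based only on the sample result line.

```java
class PipedRecordSketch {
    /** Splits a piped line into fields, keeping empty fields at the end. */
    static String[] fields(String line) {
        // A limit of -1 stops split() from dropping trailing empty strings.
        return line.split("\\|", -1);
    }

    /** Returns the last field of a result line, assumed here to be the DOI. */
    static String doi(String resultLine) {
        String[] f = fields(resultLine);
        return f[f.length - 1].trim();
    }
}
```

On the samples above, the ‘journal’ query splits into 10 fields and the ‘proceedings’ result into 12, the extra title field being the empty series title.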

Tracking IDs

Returns the log file

http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&pwd=<PWD>&doi_batch_id=NJ028011-b406513a&type=result

OR

http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&pwd=<PWD>&file_name=b406513a_doi.xml&type=result

Returns the XML deposit file

http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&pwd=<PWD>&file_name=b406513a_doi.xml&type=contents

Open Channel Interface

‘Premium’ fee has been dropped

Available on a case-by-case basis

Continuous connection to CrossRef for pipe’d queries

Response time can be 10X better than HTTP queries

import java.net.*;
import java.io.*;

Socket socket;
PrintWriter out;
BufferedReader in;

// host, port and qData are supplied by the caller
// Open one continuous connection and reuse it for every query
socket = new Socket(host, port);
out = new PrintWriter(socket.getOutputStream(), true);  // true = autoflush on println
in = new BufferedReader(new InputStreamReader(socket.getInputStream()));

// Send one pipe'd query and read one line of the response
out.println(qData);
String line = in.readLine();

Forward Linking

Forward linking deposits are simply the list of references listed in the bibliography

You most likely already send this data to CrossRef in the form of queries (in fact reference deposits look very much like queries)

There are two ways to deposit references for an article

1. Send them in with the article’s metadata
2. Send them in separately after an article’s DOI and metadata are deposited

Forward Linking – deposit log

<?xml version="1.0" encoding="UTF-8"?>
<doi_batch_diagnostic status="completed">
  <submission_id>115276193</submission_id>
  <batch_id>4219-com.wiley.cch.processes.JournalToDOI16047.xref</batch_id>
  <record_diagnostic status="Success">
    <doi>10.1002/(ISSN)1097-0134</doi>
    <msg>Successfully updated in handle</msg>
  </record_diagnostic>
  <record_diagnostic status="Success">
    <doi>10.1002/prot.20276</doi>
    <msg>Successfully added</msg>
    <citations_diagnostic>
      <citation key="10.1002/prot.20276-BIB1" status="stored_query" />
      <citation key="10.1002/prot.20276-BIB2" status="resolved_reference">10.1006/jsbi.2001.4428</citation>
      <citation key="10.1002/prot.20276-BIB3" status="resolved_reference">10.1110/ps.0227803</citation>
      <citation key="10.1002/prot.20276-BIB4" status="resolved_reference">10.1006/jmbi.1990.9999</citation>
      <citation key="10.1002/prot.20276-BIB5" status="stored_query" />
      <citation key="10.1002/prot.20276-BIB6" status="stored_query" />
      <citation key="10.1002/prot.20276-BIB7" status="stored_query" />

Forward Linking – query

<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="2.0" xmlns = "http://www.crossref.org/qschema/2.0">
  <head>
    <email_address>[email protected]</email_address>
    <doi_batch_id>fl_001</doi_batch_id>
  </head>
  <body>
    <fl_query alert='false'>
      <doi>10.1110/ps.0227803</doi>
    </fl_query>
  </body>
</query_batch>

Note: only user ‘coldspring’ can run this query


Multiple resolution

Multiple resolution presents choices to the user from the site of the link

An XML CrossRef deposit sets up the menu and multiple links (sample)


Multiple resolution - deposit

Normal link is built with the <a> (anchor) tag

<a href="http://dx.doi.org/10.5555/sample-doi">The Link Text</a>

Multiple resolution link is built with the <script> tag

One instance of <script> to load the menu library

<script src="http://www.crossref.org/MRLoader/milonic_src.js"></script>

For each link

<script src="http://www.crossref.org/MRLoader/MR/10.5555/sample-doi?The%20Link%20Text"></script>

  http://www.crossref.org/MRLoader/MR/ : the menu builder code
  10.5555/sample-doi : the DOI
  The%20Link%20Text : the link text
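Generating the per-link tag is mostly a matter of encoding the link text. A sketch under assumptions: the class and method names are hypothetical, the DOI is inserted as given, and the text is percent-encoded with spaces written as %20 to match the sample on the slide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class MrLinkSketch {
    // Menu builder location as shown on the slide.
    static final String LOADER = "http://www.crossref.org/MRLoader/MR/";

    /** Builds the <script> tag that replaces a normal <a> link for one DOI. */
    static String scriptTag(String doi, String linkText) {
        return "<script src=\"" + LOADER + doi + "?" + encodeText(linkText) + "\"></script>";
    }

    /** Percent-encodes the link text, writing spaces as %20 rather than '+'. */
    static String encodeText(String text) {
        try {
            // URLEncoder emits '+' for spaces; a literal '+' in the text becomes %2B,
            // so replacing '+' afterwards only affects encoded spaces.
            return URLEncoder.encode(text, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Calling `MrLinkSketch.scriptTag("10.5555/sample-doi", "The Link Text")` reproduces the per-link tag from the slide.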

Multiple resolution - deployment

Multiple resolution deployment requires three things:

1. Registration of multiple targets for a given DOI

2. Operation of the MRLoader resolver

3. Construction of MR links on Web pages

Everyone has a part to play

1. Publishers that ‘own’ the target DOI must register multiple targets (or authorize a 3rd party to do so)

2. CrossRef and/or the content owner publisher must operate the MRLoader resolver

3. Every Web page that links to the MR enabled DOI must replace <a> tags with <script> tags

Web Deposit Form

Allows users to enter the metadata for a deposit using a Web form. No XML skills required

Supports journal articles, now working to add conference proceedings and books. Later, will add reference deposits and components

Must know your CrossRef member login

www.crossref.org => Member Area => Member Resources => web deposit form

http://www.crossref.org/webDeposit

XML Queries

• XML Queries provide a more structured format and enable features unavailable in pipe’d queries

1. Enable multiple hits
2. Control over which fields are fuzzy matched
3. Forward linking queries
4. Query match alerts

Metadata query

<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="1.0" xmlns = "http://www.crossref.org/qschema/2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <head>
    <email_address>[email protected]</email_address>
    <doi_batch_id>SomeTrackingID2</doi_batch_id>
  </head>
  <body>
    <query key="MyKey1" enable-multiple-hits="false" forward-match="false">
      <issn>10408746</issn>
      <journal_title>Current Opinion in Oncology</journal_title>
      <author>Chauncey</author>
      <volume>13</volume>
      <issue>1</issue>
      <first_page>21</first_page>
      <year>2001</year>
    </query>
  </body>
</query_batch>

• Order is important
• Fields can be omitted

Fuzzy match control

• Fields with a “match” attribute can be controlled

ISSN: optional, exact
Journal/Volume: optional, fuzzy, exact (default="fuzzy")
Series Title: optional, null, fuzzy, exact (default="fuzzy")
Author: optional, fuzzy, null, exact (default="fuzzy")
Volume: optional, fuzzy, exact (default="fuzzy")
Issue: optional, fuzzy, exact (default="fuzzy")
Page: optional, null, exact (default="optional")

Example: <journal_title match="exact">Current Opinion in Oncology</journal_title>

A word on special characters

Arrggghh…

• Metadata deposits are supposed to be UTF-8 Unicode
  é = &#233; (decimal) = &#xE9; (hex)

Deposited record: 10.5555/char_test_001
  ISSN: 12345678
  Title: Test Publication
  Author: Joénes
  Volume: 12
  Issue: 1
  Page: S125
  Year: 1999

Queries

Works because page is supplied

<journal_title>Test Publication</journal_title> <author>Joenes</author> <volume>12</volume> <first_page>125</first_page> <year>1999</year>

Does NOT work

<journal_title>Test Publication</journal_title> <author>Joenes</author> <volume>12</volume> <year>1999</year>

Works because correct author is supplied

<journal_title>Test Publication</journal_title> <author>Jo&#233;nes</author> <volume>12</volume> <year>1999</year>
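Writing an author name with decimal character references, as in the query that works without a page, can be done mechanically. A minimal sketch with hypothetical class and method names; it converts only characters outside ASCII and does not escape XML markup characters such as & or <, which would need separate handling.

```java
class CharRefSketch {
    /** Replaces every non-ASCII character with a decimal reference such as &#233;. */
    static String toNumericRefs(String text) {
        StringBuilder sb = new StringBuilder();
        // Iterate by code point so characters outside the BMP become one reference.
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }
}
```

Applied to the deposited author name Joénes, this yields Jo&#233;nes, the form used in the third query above.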

Stored Queries

• CrossRef remembers queries that do not initially match and sends an email notice when they finally do.

<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="1.0" xmlns = "http://www.crossref.o…">
  <head>
    <email_address>[email protected]</email_address>
    <doi_batch_id>fm_429_001</doi_batch_id>
  </head>
  <body>
    <query key="fm_1" enable-multiple-hits="false" forward-match="true">
      <journal_title>Test Publication</journal_title>
      <author>Anderson</author>
      <volume>33</volume>
      <issue>9</issue>
      <first_page>125</first_page>
      <year>2002</year>
    </query>
  </body>
</query_batch>

Log message when query is submitted

<?xml version="1.0" encoding="UTF-8" ?>
<crossref_result version="2.0" xmlns="http://www.crossref.org/qrschema/2.0" …">
  <query_result>
    <head>
      <email_address>[email protected]</email_address>
      <doi_batch_id>fm_429_001</doi_batch_id>
    </head>
    <body>
      <query status="unresolved">
        <journal_title>Test Publication</journal_title>
        <author>Anderson</author>
        <volume>33</volume>
        <issue>9</issue>
        <first_page>125</first_page>
        <year>2002</year>
        <msg>Query stored in CrossRef for forward matching</msg>
      </query>
    </body>
  </query_result>
</crossref_result>

Results email

Subject: Crossref stored query match: doi_batch_id= fm_429_001 ; query_key= fm_1

<?xml version = "1.0" encoding = "UTF-8"?>
<crossref_result version="2.0" xmlns="http://www….-instance… ">
  <query_result>
    <head>
      <email_address>[email protected]</email_address>
      <doi_batch_id> fm_429_001 </doi_batch_id>
    </head>
    <body>
      <query key="fm_1" status="resolved">
        <doi>10.5555/forward_match_test_2</doi>
        <issn>12345678</issn>
        <journal_title match="exact">Test Publication</journal_title>
        <author match="exact">Smith</author>
        <volume match="exact">3</volume>
        <issue>2</issue>
        <first_page match="exact">100</first_page>
        <year match="exact">1985</year>
        <publication_type>full_text</publication_type>
      </query>
    </body>
  </query_result>
</crossref_result>

Polling for Query Matches

• You can interrogate the system to get a list of queries that may have matched.

http://doi.crossref.org/servlet/downloadStoredQueries?usr=creftest&pwd=c53test&startDate=2004-03-31&endDate=2004-05-03


Forward Linking Queries

• Forward linking is an ‘opt-in’ service

• Fees: a surcharge on the annual membership

• Permission must be enabled by a CrossRef administrator

Forward Linking Query Example

<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="2.0" xmlns = "http://www.crossref.org/qschema/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.crossref.org/qschema/2.0 http://www.crossref.org/qschema/crossref_query_input2.0.xsd">
  <head>
    <email_address>[email protected]</email_address>
    <doi_batch_id>fl_001</doi_batch_id>
  </head>
  <body>
    <fl_query alert="true">
      <doi>10.1097/00001622-200101000-00005</doi>
    </fl_query>
  </body>
</query_batch>

Sample: forward linking query results

Reference deposit #1 (log)

Reference deposit #2 (log)


Forward Linking Alerts

• Once you’ve made a forward link query, deposit of any new articles that cite the DOI you requested will generate an alert email

Reference deposit #3 (log)

Email alert notice


New Initiatives

Components

Extended content types

Plans for 2005

Component Deposits

What is a component?

Components are considered to be sub-items that are part of the construction of an article, chapter or conference paper or provide supporting (sometimes called supplemental) information.

These items by and of themselves are not typically cited in a bibliography, but they are cited within the text

NOTE: Title DOIs and Issue DOIs are not components

They should be deposited in the journal, conf-proc or book metadata

Component Deposits

Why create DOIs for components?
  To improve link management
  Build persistent links
  Use multiple resolution on them

How are components deposited?
  Schema version 3.0.3 supports components
  Deposit as part of an article’s metadata or standalone (note: a parent DOI must be specified)

Component Deposits

What Component services will CrossRef offer?

Near term
  Just the registration of the DOI

Long term
  Some form of lookup service (e.g. query)
  Expanded component metadata (licensing, copyright …?)

Component Deposits

<journal_article>
  ...
  <doi_data>
    <doi>10.9876/S0003695199019014</doi>
    <resource>http://ojps.aip.org:18000/link/?apl/74/1/76/ab</resource>
  </doi_data>
  <component parent_relation="isPartOf">
    <description><b>Figure 1:</b> This is the caption of the first figure...</description>
    <format mime_type="image/jpeg">Web resolution image</format>
    <doi_data>
      <doi>10.9876/S0003695199019014/f1</doi>
      <resource>http://ojps.aip.org:18000/link/?apl/74/1/76/f1</resource>
    </doi_data>
  </component>
  <component parent_relation="isReferencedBy">
    <description><b>Video 1:</b> This is a description of the video...</description>
    <format mime_type="video/mpeg"/>
    <doi_data>
      <doi>10.9876/S0003695199019014/video1</doi>
      <resource>http://ojps.aip.org:18000/link/?apl/74/1/76/video1</resource>
    </doi_data>
  </component>
</journal_article>

Component Deposits

Alternatively components may be deposited separately from their ‘parent’ item’s metadata

<body>
  <sa_component>
    <doi>10.9876/molcell/10/4</doi>
    <component parent_relation="isPartOf">
      <description>Cover Image, Molecular Cell, Volume 10, Issue 4, January 2004</description>
      <format mime_type="image/tiff"/>
      <doi_data>
        <doi>10.9876/molcell/10/4/cover</doi>
        <resource>http://molcell.org/10/4/cover</resource>
      </doi_data>
    </component>
  </sa_component>
</body>

Expanded Content Types

Metadata study now underway
  Dissertations, technical reports, working papers, standards, patents and databases

Implementation to occur in early 2005

XML schema will update to version 4.0

Deposits

Query services
  Extend current query mechanism?
  ‘Firewall’ current content?

Expanded Content Types

Dissertations

Add <advisor> elements
Review NDLTD metadata standards
Survey T & D organizations
  Cal Tech
  ProQuest
  Texas A&M
  ?

<dissertation>
<person_name>
<titles>
<acceptance_date>
<university>
<name>
<location>
<department>
<degree>
<publisher_item>
<doi_data>

Expanded Content Types

Reports

Drop ‘technical’ label
Support chapters?
Survey organizations
  AGU
  NASA/JPL
  Other government?
  ?

<report>
<contributors>
<titles>
<publisher>
<publication_date>
<publisher_item>
<series_metadata>
<isbn>
<issn>
<research_organization>
<sponsor>
<organization>
<contract>
<doi_data>

Expanded Content Types

Working papers

Conflicts with published articles
Include series metadata?
Survey organizations
  ?

<report>
<contributors>
<titles>
<publisher> or <university>
<publication_date>
<publisher_item>
<series_metadata>
<isbn>
<issn>
<research_organization>
<sponsor>
<organization>
<contract>
<doi_data>

Expanded Content Types

Standards

Not included in initial analysis, added after annual member meeting

Interest from IEEE
Metadata draft development TBD
Accredited vs consortium standards
Survey organizations
  NISO, ANSI, BSI, ISO
  IEEE
  ConsortiumInfo.org


Plans for 2005

Modify / improve page number processing

Normalized XML

Modularize CrossRef system

Implement Expanded Content Types

Others

Page & article numbers

CrossRef deposit schema allows for first page and article number
  <pages><first_page>
  <publisher_item><item_number item_number_type="article-number">

Article number will be used if no first page is provided
  A query has only one ‘page’ field and will search either first_page or article_number but not both

Some articles have both: both are presented to the reader
  Change the query logic to search both fields
  Add an XML query field for ‘article_number’

Page numbers (and article numbers?) are not numbers
  Would a full fuzzy match on page improve matching rates?
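The difference between the current behaviour and the proposed change can be shown side by side. This sketch is one reading of the slide, not CrossRef's code: the class and method names are hypothetical, and exact string comparison stands in for whatever matching the real system applies.

```java
class PageFieldSketch {
    /** Current behaviour as read from the slide: first page wins, article number is the fallback. */
    static String searchablePage(String firstPage, String articleNumber) {
        return (firstPage != null && !firstPage.isEmpty()) ? firstPage : articleNumber;
    }

    /** A query's single 'page' value is compared with only one of the two deposited fields. */
    static boolean currentMatch(String queryPage, String firstPage, String articleNumber) {
        String page = searchablePage(firstPage, articleNumber);
        return page != null && page.equals(queryPage);
    }

    /** Proposed change: compare the query's 'page' value with both deposited fields. */
    static boolean proposedMatch(String queryPage, String firstPage, String articleNumber) {
        return queryPage.equals(firstPage) || queryPage.equals(articleNumber);
    }
}
```

For an article deposited with both a first page and an article number, a reference that cites the article number misses under the current logic but would hit under the proposed one.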


Normalized XML

Journal, proceedings and book content is stored in 2 places in the CrossRef database

1. Subset in tables/columns to support query operations

2. Entire deposit as a CLOB (not easily accessed)

XML query results are specialized for each content type (<journal_cite><conf_cite><book_cite>)

Reduce all content type info to a simpler ‘one size fits all’ schema

Store each DOI record as XML in a database column (memo?)

Facilitates access to all metadata (e.g. complete ‘lite’ weight local host files)

Yield a more consistent XML query result

Modularize

Current system is a monolith
  One database supports everything

Separate operations to improve performance and scalability
  Deposits & updates
  Queries
  Reports

Additional benefits
  Local host the CrossRef query system, not just the metadata

Other

Components
  Query mechanism
  Expand the metadata (license, rights …)

Production implementation of multiple resolution
  Integrate into the deposit process
  Implement a local host type option
  Automatic appropriate copy service


Questions / Discussion