Publisher’s Perspective: Digitization of print resources, and archiving of digital resources
Judy Best, June 13, 2006
NRC Research Press Background
• NRC Research Press has been publishing scholarly journals since 1929; the electronic editions have been available since 1996
• Publishes journals in biology, life sciences, and physical sciences (everything from botany to chemistry to environmental engineering)
• Publishes 16 journals, plus monographs and conference proceedings
• Offers print and electronic subscriptions
• A continuing investment in electronic publishing keeps us at the forefront of scientific communications
– XML-tagged data
– HTML, PDF, metadata
– Print
Digitization of print edition
• Digitization of back issues
• Pilot project
– Canadian Journal of Chemistry
– From 1951
– Approximately 26,000 articles
Digitization of print edition
• Process
– Scanned PDF image (NRC Reprographics)
– Metadata (contractor): OCLC document type definition (DTD)
– Created OCR PDF (in-house)
• Adobe Acrobat 7 Professional
• Creates a single layered file: PDF image + PDF OCR
Digitization of print edition
• Benefits
– Metadata deposited to CAS (Chemical Abstracts Service), CrossRef (for reference linking from other scholarly journals), Chemport (for reference linking from other chemistry journals)
– Revenue through subscriptions
– To be indexed by Google and Google Scholar
– Tagged metadata in a standard format now available for future uses
• Future Plans
– Digitize back issues of all journals (PDFs plus metadata)
– Expand locations and services for depositing content
Archiving electronic edition
• What is the archival format? Controversy in the publishing community
• Is PDF archival? Possible problems with being machine-readable in future
• SGML- and XML-tagged data considered archival because
– they are text files
– they can be read by all platforms
– they can be translated to graphical (HTML or PDF) format by writing a simple program
• NRC Research Press currently implementing XML-based workflow for all journals in order to have a fully XML-tagged version of all articles
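To illustrate the point that tagged text can be turned into a display format by a simple program, here is a minimal sketch in Python. The element names (article, title, authors, author, journal, year, abstract) and the sample record are hypothetical placeholders chosen for illustration; they are not the DTD used by NRC Research Press, and a production workflow would more likely rely on a full stylesheet toolchain.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article record. The element names and values are
# illustrative placeholders, not an actual publisher DTD or real article.
SAMPLE_XML = """<article>
  <journal>Example Journal</journal>
  <title>An example title &amp; subtitle</title>
  <authors>
    <author>A. Author</author>
    <author>B. Author</author>
  </authors>
  <year>2000</year>
  <abstract>Placeholder abstract text.</abstract>
</article>"""


def article_to_html(xml_text: str) -> str:
    """Translate one tagged article record into a small HTML page."""
    root = ET.fromstring(xml_text)

    def text_of(tag: str) -> str:
        # Return the escaped text of a child element, or "" if it is absent.
        node = root.find(tag)
        if node is None:
            return ""
        return escape((node.text or "").strip())

    authors = ", ".join(
        escape((a.text or "").strip()) for a in root.findall("authors/author")
    )
    return "\n".join([
        "<html><body>",
        f"<h1>{text_of('title')}</h1>",
        f"<p>{authors}</p>",
        f"<p><em>{text_of('journal')}</em>, {text_of('year')}</p>",
        f"<p>{text_of('abstract')}</p>",
        "</body></html>",
    ])


if __name__ == "__main__":
    print(article_to_html(SAMPLE_XML))
```

Because the input is plain tagged text, the same record could be fed to a different small program to produce another output format, which is the device-independence argument made above.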
Archiving electronic edition
• Library and Archives Canada
– Formerly the National Library of Canada and Public Archives of Canada
• Mandate
– To preserve the documentary heritage of Canada for the benefit of present and future generations;
– To be a source of enduring knowledge accessible to all, contributing to the cultural, social and economic advancement of Canada;
– To facilitate in Canada cooperation among communities involved in the acquisition, preservation and diffusion of knowledge; and
– To serve as the continuing memory of the government of Canada and its institutions.
Depositing material to LAC
• Publisher required by law to archive print
• Deposit of electronic version is voluntary
• NRC Research Press deposits print and electronic content
– PDF, metadata
– Website duplicated
Accessing electronic archive
• Access
– Depositing data is only useful if you can access it
– In case of catastrophe, subscribers require access to our electronic edition
– Current LAC duplication of our Website is not available to the public (dark archive), but could be made available in case of disaster
– Explore other avenues to make content available to subscribers in case of disaster
– Portico (www.portico.org) sustains an archive of scholarly journals, on behalf of publishers and subscribing libraries
– In event of disaster, publishers agree to allow libraries to access the content (PDF or tagged SGML/XML) through Portico
Digitization and archiving — notes for publishers
• Publishers need to have an integrated plan for digitization of print resources and archiving of electronic resources
• Factors to consider:
– Access (disaster scenarios)
• Communicate to subscribers: more likely to subscribe electronically if they know the Web site will be available in event of disaster
– Archival format
• Format migration: will it be readable in 10 years?
• Device-independence: can it be read by any computer?
• Media: hard disk, tape, DVD?
– Reuse and repurposing of metadata and full text
– Services available (national library, Portico)