MELLON E-JOURNAL ARCHIVING PROJECT
January 20, 2002
DIGITAL PRESERVATION
THE BIG ISSUE IN DIGITAL LIBRARIES
• Digital is inherently fragile
  – constant technological change yields short life for all digital materials
• Nothing will be saved passively
  – requires constant and conscious action to preserve
• A core role for research libraries in the digital era????
JOURNAL ARCHIVING IN THE PAPER ERA
• Large-scale redundancy
• Access copy and archival copy usually the same
• Not just storage, but preservation
  – includes environmental control, library binding, repair, reformatting...
• Deliberate, long-term archiving largely the role of national and research libraries
E-JOURNAL MODEL IS DIFFERENT
• “Copies” are remote, held in publisher systems
  – not replicated across different institutions
• Perpetual license provides limited comfort in the absence of independent copies
• Long-term preservation involves very different issues than day-to-day access
LACK OF ARCHIVING A GROWING PROBLEM
• Libraries bearing double costs
  – the e-journals users prefer
  – the paper for preservation
• Publishers cannot convert totally to digital
  – authors and editors distrust e-only journals because of concerns about persistence
  – libraries demand paper for preservation
• Libraries preserving paper version, but electronic more complete, increasingly the copy of record
MELLON E-JOURNAL ARCHIVING PROGRAM
• 13 institutions invited to submit proposals for a one-year planning project
• Six planning proposals were selected and funded in December 2000
  – additional project focused on technology (LOCKSS) also funded
• Second round of Mellon grants to be announced in June will fund actual implementation
SIX PLANNING PROJECTS
• Publisher-based
  – Harvard (Wiley, Blackwell, University of Chicago Press)
  – Penn (Oxford and Cambridge University Presses)
  – Yale (Elsevier)
• Discipline-based
  – Cornell (agriculture)
  – NYPL (performing arts)
• Dynamic e-journals
  – MIT
SOME BASIC ASSUMPTIONS
• Archive should be independent of publishers
  – responsibility of institutions for whom archiving is a core mission
• Archiving requires active publisher partnership
• Address long timeframes (100 years?)
• Archive design based on Open Archival Information System (OAIS) model
OBJECTIVES FOR PLANNING PROJECTS
• Develop draft archiving agreements with publisher partners
• Design technical architecture for an archive
• Formulate an acquisitions and growth plan
• Articulate access policies
• Address validation/certification
• Design an organizational model, staffing, long-term funding model
Key planning issues/decisions…
BASE ON DL INFRASTRUCTURE
• Use existing infrastructure for storage, management, preservation, access
• Enhanced to comply with OAIS model
• New ingest and rendering functions
ARCHIVING AGREEMENT
• Explicit archiving license with publisher
• License addresses what content is archived, responsibilities of parties, conditions of use, economics
• Not always an easy negotiation
  – archiving involves handing publisher’s intellectual property to independent party
PUSH MODEL
• Publishers will “push” content to be archived to Harvard
  – on-going regular deposit following on-line publication of issue
    • (what happens when issues disappear?)
WHAT CONTENT IS DEPOSITED?
• “Journal issues” are complex
  – publishers do not treat all journal content the same (e.g. “front matter” treated as web pages, not objects in content management systems)
  – “associated materials” (datasets, images, tables, etc.) not in the print versions
  – advertising usually dynamic, and can involve country-specific complexities
SOME COMMON STUFF
• Journal description
• Editorial board
• Instructions to authors
• Rights and usage terms
• Copyright statement
• Ordering information
• Reprint information
• Indexes
• Career information
• News
• Events lists
• Discussion fora
• Editorials
• Errata
• Reviewers
• Conference announcements
ARCHIVE MOST CONTENT
• Exclude little except advertisements
  – different from most “local loading”
• Articles include supplementary materials
• Include an “issue object” in addition to the article components
  – masthead, news, jobs, meetings, etc.
• Reference links problematic
  – dynamic, frequently separate from article
STANDARD ARCHIVAL ARTICLE DTD
• Publishers’ SGML formats vary widely
• Consultant report on practicality of common archival XML DTD
• Dramatically reduces archive complexity
• Issues include
  – how low a common denominator
  – extended character sets, formulae, etc.
  – sacrifice functionality and original appearance
  – transformations involve risks
DEPOSIT MORE THAN ONE FORMAT?
• Archive must accept PDF in any case
  – so include both SGML and PDF when available?
    • belt and suspenders
  – inclined to do this
• Accept publisher’s original SGML also?
  – conversion to archival DTD will result in loss
  – inclined to not do this
“DARK-TO-LIGHT”
• Archived material not accessible at deposit
  – do not compete with publishers
• Content becomes accessible after “trigger event”
  – default then is universal access
• But how do you know “dark” archival content is still good?
  – it would be better if there was some on-going access…
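One partial answer to the "is dark content still good?" question is a periodic fixity audit: record a checksum for every file at deposit and recompute it later. This is a different technique from the on-going access the slide suggests, and the slides do not say the project uses it. The sketch below assumes a hypothetical manifest that maps relative file paths to SHA-256 digests; the function names are illustrative only. Note that a matching checksum shows the bits are unchanged, not that the content can still be rendered.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return the relative paths that are missing or whose digest has changed.

    `manifest` is a hypothetical deposit-time record: relative path -> digest.
    """
    damaged = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return damaged
```

An empty list from `audit` means every deposited file is present and bit-identical to what was recorded at deposit.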
ACCESS MODEL
• Archived content always accessible to anyone with appropriate license from publisher
  – might be satisfied by batch export
• After trigger, simple on-line functionality
  – assume same functionality for auditors
TRIGGER EVENTS
• “N” years after deposit
  – “N” set by publisher title-by-title
• When title/year no longer commercially accessible on the Internet
  – still problematic with some publishers
• When content enters public domain
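The three trigger events listed above can be read as a simple rule: archived content turns "light" as soon as any one of them has occurred. The sketch below is only an illustration of that reading. The record fields, function names, and reason labels are all hypothetical, and the slides do not say how "commercially accessible" or public-domain status would actually be determined.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ArchivedTitleYear:
    """One archived title/year. All field names are hypothetical."""
    deposit_date: date
    embargo_years: Optional[int]   # the publisher-set "N"; None if no time-based trigger
    commercially_available: bool   # still offered on the Internet by the publisher?
    public_domain: bool            # has the content entered the public domain?


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; a Feb 29 deposit falls forward to Mar 1."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, month=3, day=1)


def triggered_by(item: ArchivedTitleYear, today: date) -> List[str]:
    """Return the names of the trigger events that have occurred so far."""
    reasons = []
    if item.embargo_years is not None:
        if today >= add_years(item.deposit_date, item.embargo_years):
            reasons.append("embargo_elapsed")
    if not item.commercially_available:
        reasons.append("not_commercially_available")
    if item.public_domain:
        reasons.append("public_domain")
    return reasons


def is_light(item: ArchivedTitleYear, today: date) -> bool:
    """Content is accessible once at least one trigger event has occurred."""
    return bool(triggered_by(item, today))
```

Returning the list of reasons rather than a bare boolean keeps a record of why a title/year was opened, which may matter when a publisher disputes a trigger.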
PRESERVATION
• Format-by-format issue
• Archive specifies preferred formats, which will be kept renderable
• Just maintain bits for others
  – e.g., “associated materials” (datasets, models, etc.) generally accepted in ANY format
    • maintaining the viability of such wildly heterogeneous materials unrealistic
  – keep unaltered for future “digital archeology”
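The format-by-format policy above amounts to a two-tier lookup: preferred formats get a commitment to stay renderable, everything else is kept as unaltered bits. A minimal sketch follows, assuming formats are identified by MIME type. The preferred-format set is a placeholder, since the slides do not list which formats the archive would prefer.

```python
from enum import Enum


class PreservationLevel(Enum):
    RENDERABLE = "keep renderable"  # archive commits to keeping the format usable
    BITS_ONLY = "maintain bits"     # stored unaltered for future "digital archeology"


# Hypothetical preferred-format list, for illustration only.
PREFERRED_FORMATS = {"application/pdf", "application/xml"}


def preservation_level(mime_type: str) -> PreservationLevel:
    """Map a deposited file's MIME type to the level of care it receives."""
    normalized = mime_type.strip().lower()
    if normalized in PREFERRED_FORMATS:
        return PreservationLevel.RENDERABLE
    return PreservationLevel.BITS_ONLY
```

Anything not on the preferred list, such as an arbitrary dataset format, falls through to `BITS_ONLY` rather than being rejected, which matches the "accepted in ANY format" stance on the slide.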
ECONOMIC MODEL
• First question is not who pays, but what will it cost…
  – reducing costs to the minimum is critical
• In general publishers expected to bear preparation costs for archived objects
• Process automation critical to keeping costs low
  – ingest process
  – auditing
PAYMENT WITH DEPOSIT
• Two part fee
  – ingest fee to cover up-front costs
    • varies with publisher effort to create easily archived objects???
  – “dowry” to create maintenance endowment
• Sources include subscribers, authors, societies
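The slides give no formula for the two-part fee. One possible way to size the "dowry", shown purely as an assumption, is to treat it as an endowment whose annual return covers the annual maintenance cost indefinitely (the standard perpetuity relation: endowment = annual cost / rate of return). All parameter names and any figures passed in are hypothetical.

```python
from typing import Dict


def deposit_fee(ingest_cost: float,
                annual_maintenance_cost: float,
                real_return_rate: float) -> Dict[str, float]:
    """Split a deposit payment into an ingest fee and an endowment "dowry".

    Assumes the dowry is invested at a constant real rate of return and
    that only the return is spent, never the principal.
    """
    if real_return_rate <= 0:
        raise ValueError("real_return_rate must be positive")
    dowry = annual_maintenance_cost / real_return_rate
    return {
        "ingest_fee": ingest_cost,
        "dowry": dowry,
        "total": ingest_cost + dowry,
    }
```

Under this assumption the dowry is very sensitive to the assumed rate of return, which is one reason the slide's emphasis on first pinning down what archiving costs matters.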
NEXT…..
• Proposal to Mellon by April 1 for funding to implement an archive
  – particular parameters of the call-for-proposals still uncertain
• Original plan suggested 3 or 4 year projects
• Intent is to implement archive, contract for deposit, begin operations
  – learn by getting dirty hands
  – help understand issues, costs