Content Production Workflows in the Scientific Publishing Process

WIRTSCHAFTSUNIVERSITÄT WIEN

DIPLOMA THESIS. Title of the diploma thesis: Content Production Workflows in the Scientific Publishing Process. Building an E-Publishing Solution for Research Articles.

Author: Andreas Geyrecker

Student ID no.: 8951713

Field of study: Business Administration 2002

Examiners: Univ.-Prof. Dr. Gustaf Neumann, Mag. Fridolin Wild (supervising assistant)

I affirm that I have written this diploma thesis independently, that I have not used any sources or aids other than those indicated, and that I have not made use of any other unauthorized assistance; that I have not previously submitted this diploma thesis topic in any form as an examination paper, either in Austria or abroad (to an examiner for assessment); and that this thesis is identical to the thesis assessed by the examiner. Date

Signature

CONTENT PRODUCTION WORKFLOWS IN THE SCIENTIFIC PUBLISHING PROCESS

BUILDING AN E-PUBLISHING SOLUTION FOR RESEARCH ARTICLES

Andreas Geyrecker

Institute for Information Systems and New Media

Vienna University of Economics and Business Administration

January 2009

Thesis Supervisors: Univ.-Prof. Dr. Gustaf Neumann Mag. Fridolin Wild

ABSTRACT

Publishing is a vital step in science. Most new research builds on previous work that has been published in scientific journals or conference proceedings. Making research findings quickly and openly accessible in publications therefore encourages research innovation.

Today, a significant part of scientific publishing consists of distributing articles through journals, both in print and online. Improvements in web-based publishing technologies are enabling faster, more collaborative scientific communication. These new methods challenge traditional publishing mechanisms. This work reviews different aspects of scholarly publishing by examining the theoretical concepts of the traditional publishing model. It presents Open Access, a new publishing model that can viably complement the existing one.

Building on this theoretical part, the work aims to provide a technical concept for a scientific publishing system that is tightly integrated with the existing DSpace repository platform software. The system should provide value-added services for the repository. These include a simplified authoring and submission process and the publishing of archived articles in different formats, namely XHTML and PDF, with the help of XSL style sheets. In addition, the system provides a workflow for aggregating articles into journals and the ability to produce a high-quality journal PDF.

The project was motivated by several requirements, all centred on the ability to aggregate submitted papers easily into a single publication, either a journal or conference proceedings. Today, papers are mostly submitted in a format that requires manual effort to bundle them into a publication.

TABLE OF CONTENTS

Chapter 1 Introduction
Chapter 2 Scientific Publishing
1 Publishing as Integral Part of the Research Process
2 Functions in Scholarly Publishing
3 The Journal
4 The Traditional Publishing Model
5 Peer Review
6 Problems with the Traditional Publishing Model
6.1 Accessibility
6.2 Time Delay
6.3 The Volume of Research Papers
6.4 Copyrights
Chapter 3 Open Access
1 What is Open Access
1.1 The Green Road
1.2 The Gold Road
2 Open Access Milestones
2.1 Budapest Open Access Initiative
2.2 Bethesda Statement
2.3 Berlin Declaration
2.4 Open Access Mandates
3 Open Access Business Models
4 Key Open Access Aspects
4.1 Global Access to Research
4.2 Visibility and Citation Impact
4.3 Long-term Preservation
5 Arguments against Open Access
Chapter 4 The Future of Scholarly Publishing
1 New Research and Publishing Environments
2 The Semantic Web
3 Web 2.0 and Science
4 Alternative Review Models
Chapter 5 E-Publishing Solution
1 Mixing the Green and Gold Road
1.1 Existing Literature
1.2 Multi Channel Publishing
1.3 Organizing Content into Journals
2 Requirements
3 Open Standards


3.1 Metadata
3.2 XML
3.3 Persistent Identifiers
3.4 Authoring and Preservation Formats
4 Solution Architecture
4.1 The DSpace Repository
4.2 DSpace Customizations
5 System Implementation
5.1 Submission Workflow
5.1.1 Authoring
5.1.2 Content Submission
5.1.3 Quality Control Mechanism
5.1.4 Archiving
5.2 Publishing in Various Formats
5.3 Journal Management Workflow
6 Related Work
7 Conclusion


FIGURES

Figure 1: Global market shares of STM publishers (2003) (House of Commons Science and Technology Committee 2004, p.13)
Figure 2: The publishing cycle (Mabe 2006, p.58)
Figure 3: Berlin Declaration signatories (http://oa.mpg.de/openaccess-berlin/signatories-graphs.html)
Figure 4: Average citation ratios for articles in the same journal and year that were and were not made Open Access by author self-archiving. Date span: 1992–2003. (Harnad et al. 2008, p.38)
Figure 5: RSS feed carrying additional bibliographic metadata (Hammond, Hannay & Lund 2004)
Figure 6: A model for Open Access e-journals (Koohang & Harman 2006)
Figure 7: DSpace-based publishing solution's core components
Figure 8: Geographical distribution of DSpace repositories (http://maps.repository66.org/)
Figure 9: DSpace data model (dspace.org 2006)
Figure 10: DSpace technical architecture including the new Manakin user interface (Phillips et al. 2005)
Figure 11: The Manakin architecture (Phillips et al. 2007)
Figure 12: Manakin DRI schema (Digital Initiatives 2005)
Figure 13: DSpace repository structure
Figure 14: PRISM metadata fields
Figure 15: Submission and publishing workflow
Figure 16: OpenDocument format authoring template
Figure 17: Traditional submission process
Figure 18: Proposed submission process
Figure 19: Article conversion
Figure 20: Article upload: the author has immediate access to different output formats (XHTML web preview, PDF and a report view) to ensure that the article renders as expected
Figure 21: Report for quality control showing metadata, document structure and references
Figure 22: DSpace collection workflow (dspace.org 2006)
Figure 23: Article landing page containing links to various viewing options and social bookmarking services
Figure 24: PDF article full text
Figure 25: XHTML article full text
Figure 26: Clickable images in XHTML do not impede the reading flow
Figure 27: Automated saving of repository content to Zotero
Figure 28: Dublin Core metadata made visible by the Dublin Core Viewer, a Firefox extension
Figure 29: PDF containing mathematical formulae
Figure 30: Aspect chain containing the Journal aspect
Figure 31: Journal issue table of contents
Figure 32: DRI content added by JournalIssueViewer transformer
Figure 33: Journal landing page (showing unpublished issues to privileged users)
Figure 34: Journal issue table of contents administration
Figure 35: Journal issue administration page
Figure 36: Journal issue PDF


ACKNOWLEDGMENT

I would like to thank all the people who have helped and inspired me during my studies.

I especially want to thank my advisor, Fridolin Wild, Institute for Information Systems and New Media, for his commitment to helping see this project through to its final completion, and for his wise guidance during its development.

I want to thank Professor Gustaf Neumann, Institute for Information Systems and New Media, for his guidance and advice during my research.

My deepest gratitude goes to my family, Helga, Jacob and Luisa, for their unflagging love and support throughout my studies. This thesis would simply have been impossible without them.


ABBREVIATIONS

DC Dublin Core Metadata

DRI Digital Repository Interface

HTML Hypertext Markup Language

MathML Mathematical Markup Language

METS Metadata Encoding and Transmission Standard

OAI Open Archives Initiative

OAI-PMH Open Archives Initiative Protocol for Metadata Harvesting

ORE Object Reuse and Exchange

PDF Portable Document Format

PRISM Publishing Requirements for Industry Standard Metadata

RDF Resource Description Framework

RSS Rich Site Summary

SVG Scalable Vector Graphics

WML Wireless Markup Language

XHTML eXtensible Hypertext Markup Language

XML eXtensible Markup Language

XSL-FO eXtensible Stylesheet Language Formatting Objects

XSLT eXtensible Stylesheet Language Transformation

Chapter 1

INTRODUCTION

“Our mission of disseminating knowledge is only half complete if the information is not made widely and readily available to society.” (Berlin Declaration 2003)

The purpose of this paper is to present the main concepts of the existing scholarly communication system and of new practices in scientific publishing, especially the Open Access model. This theoretical part forms the basis for the development of an E-Publishing solution based on the DSpace repository (an open-source software platform for digital archiving). The proposed solution suggests enhancements to support Open Access journal management and publishing workflows in DSpace.

Chapter 2 describes the functions and characteristics of the established scholarly communication system, drawing on the existing literature. As an integral part of the research workflow, the publishing process fulfils major tasks such as the dissemination and archiving of research findings. The chapter therefore presents the players, systems and processes of the traditional publishing model.

Today, a significant part of scientific publishing consists of distributing articles through journals, both in print and online. Journal publication ensures the quality and widespread dissemination of research findings. However, journals do not publish all the material they receive. In the traditional, established scholarly journal workflow, peer review is used to assess the quality of research before it is published. Peer review requires the critical review of a paper by two or more qualified experts. The advantages of peer review, as well as the drawbacks and challenges it faces today, are discussed.

Scientific publishing has undergone a technological transformation in recent years. Web-based publishing technologies enable faster, more open and more collaborative scientific communication. These new methods challenge parts of the existing publishing mechanisms.

One of these challenges concerns how access to scientific information is organized. In the established scholarly communication system, economic conditions (rising costs and the sheer number of scholarly journals) limit access to scholarly information (Kennan & Kautz 2007). The cost increases mean that, for most libraries and universities, relevant publications are no longer affordable.

The dominant business model in the traditional publishing system is subscription-based: either the reader or the reader's institution or library has to pay the journal subscription costs. Because research authors and reviewers are publicly funded, the public effectively pays once for the research and again for access to its results. Combined with this, the cost increases of many scholarly journals resulted in the so-called Serials Crisis.

Web-based publishing technologies, on the other hand, allow unrestricted, open access to information. The Serials Crisis, combined with the possibilities of these new publishing technologies, promoted a new model for scholarly publishing called Open Access, which is presented in Chapter 3.

The Budapest Open Access Initiative (2002) defines Open Access as: “free availability on the public Internet, permitting any users to read, download, copy, distribute and/or print, with the possibility to search or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the Internet itself”.

The main purpose of Open Access is to maximize research access and impact. There are two basic strategies for Open Access (Harnad et al. 2004). The first, known as the Green Road to Open Access, is self-archiving: the author deposits a supplementary copy of an already published, peer-reviewed article in an institutional repository. As Suber (2005) notes, this strategy maximizes access and visibility not only for authors but also for publishers, because wider dissemination and availability let the articles reach a much larger set of readers than any priced journal, in print or online.

The second strategy, known as the Gold Road to Open Access, consists of new Open Access journals. Most of these journals have adopted new business models, most commonly an author-pays model in which authors (or their institutions) pay a fee when an article is accepted for publication.

Open Access has thus forced traditional publishers to reconsider the current subscription-based model. Some commercial publishers are already experimenting with hybrid models (Lamb 2004, p.147), which, for example, let authors choose between publishing in the traditional scheme and paying for publication to make their articles Open Access.

Major concerns about the Open Access model include the need to maintain current quality standards by not undermining peer review (European Commission 2007), and uncertainty about the sustainability of Open Access in terms of funding models, acceptance by the research community and impact factors.

Chapter 4 discusses the future of scholarly publishing. “We are already in a world where scientific information is primarily digital” (Hannay 2007). Digital availability of scientific journals has been common for at least the last ten years. Although publishers provide access to their materials online, the way a journal issue is composed and the stages of producing a publication have remained the same as in print delivery. Current standards in electronic journal publishing, and also in Open Access, still rely on the journal as their basic unit (Harnad 2005).

Yet information technology, and especially the web, makes fundamental changes possible in how publishing may work in the future. As Tim Berners-Lee, the inventor of the World Wide Web, and James Hendler (2001) put it: “But we are only in the early days of a new Internet revolution, one which will have a deeper and more disruptive impact on scientific, and other, web publishing, and have profound implications for the web itself. An emerging successor to the web, the Semantic Web, will likely profoundly change the very nature of how scientific knowledge is produced and shared, in ways that we can now barely imagine.”

Semantic markup will allow computers to process the meaning of information, which will help scholars manage the vast amount of scholarly information. “Scientific information in an online world needs to be made useful not only to readers but also to software and other websites. Only in this way will the information become optimally useful to humans” (Hannay 2006).

The web is also evolving towards an “architecture of participation” (O'Reilly 2004). Concepts like blogs, tags and comments are reshaping scientific communication into a more collaborative environment. This type of user participation adds value to existing content. For example, the collection of all users' tag assignments creates new relations between individual content items. Furthermore, such collaborative environments can encourage the development of alternative review models. However, these approaches should be regarded as a complement to more formalized web standards such as the Semantic Web.
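The idea that aggregated tag assignments create relations between content items can be illustrated with a small sketch. It assumes, as a simplification not taken from the cited sources, that two items are related when users have attached the same tag to them, and that the strength of the relation is the number of distinct tags they share. The function name `related_items` and the (user, item, tag) input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def related_items(assignments):
    """Relate items that share tags.

    assignments: iterable of (user, item, tag) triples collected from all users.
    Returns a dict mapping (item_a, item_b), with item_a < item_b, to the number
    of distinct tags the two items have in common.
    """
    # Group items by lower-cased tag; who assigned the tag is not needed here.
    items_by_tag = defaultdict(set)
    for _user, item, tag in assignments:
        items_by_tag[tag.lower()].add(item)

    # Every pair of items filed under the same tag gains one unit of relatedness.
    relations = defaultdict(int)
    for items in items_by_tag.values():
        for a, b in combinations(sorted(items), 2):
            relations[(a, b)] += 1
    return dict(relations)


if __name__ == "__main__":
    tags = [
        ("alice", "paper1", "open-access"),
        ("bob", "paper2", "Open-Access"),
        ("alice", "paper2", "dspace"),
        ("carol", "paper3", "dspace"),
        ("carol", "paper1", "dspace"),
    ]
    print(related_items(tags))
    # {('paper1', 'paper2'): 2, ('paper1', 'paper3'): 1, ('paper2', 'paper3'): 1}
```

No single user tagged paper1 and paper2 with the same tag, yet the aggregated assignments still link them. Lower-casing merges variants such as "Open-Access" and "open-access"; a real system would probably need further normalization.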

The final chapter, Chapter 5, outlines the architecture of an E-Publishing solution. The aim of this solution is to provide storage for research articles based on the DSpace repository platform, an open-source solution for accessing, managing and preserving scholarly works. The repository is intended to ensure open and persistent access to research material.

The proposed solution suggests an approach to enhancing the existing DSpace Green Road model with Gold Road Open Access journal concepts. It provides additional multi-channel publishing and journal management workflows for the DSpace repository platform.

A key concept is the simplification of the article authoring and submission process, which includes automated metadata extraction directly from the author's document. Delivering content in multiple formats helps to improve the user experience and enables better handling of archived scholarly material. The proposed system therefore offers publishing of repository content in different output formats, namely XHTML (eXtensible Hypertext Markup Language) and PDF (Portable Document Format), with the help of XSL (eXtensible Stylesheet Language) style sheets.
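Automated metadata extraction from the author's document can be sketched for the OpenDocument case. An ODF file is a ZIP package whose `meta.xml` member stores document properties such as the title, the initial author and keywords. The sketch assumes that authors fill in these properties. The mapping to DSpace-style Dublin Core field names (`dc.title`, `dc.contributor.author`, `dc.description.abstract`, `dc.subject`) and the helper names `extract_metadata` and `make_odt` are illustrative assumptions, not the system's actual implementation, which may instead read metadata from the styled paragraphs of an authoring template.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used in ODF meta.xml (OpenDocument specification).
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_metadata(odt_bytes: bytes) -> dict:
    """Map ODF document properties to DSpace-style Dublin Core field names (illustrative)."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        root = ET.fromstring(package.read("meta.xml"))
    meta = root.find("office:meta", NS)
    if meta is None:
        return {}

    def text_of(path):
        element = meta.find(path, NS)
        return element.text.strip() if element is not None and element.text else None

    record = {
        "dc.title": text_of("dc:title"),
        # meta:initial-creator is the original author; dc:creator is whoever saved last.
        "dc.contributor.author": text_of("meta:initial-creator") or text_of("dc:creator"),
        # Assumption: the document's description property holds the abstract.
        "dc.description.abstract": text_of("dc:description"),
        "dc.subject": [k.text.strip() for k in meta.findall("meta:keyword", NS) if k.text],
    }
    # Drop properties the author left empty.
    return {name: value for name, value in record.items() if value}


def make_odt(files: dict) -> bytes:
    """Build a minimal in-memory ODF package, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # ODF expects 'mimetype' as the first entry, stored uncompressed
        # (ZipFile's default compression is ZIP_STORED).
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        for name, content in files.items():
            package.writestr(name, content)
    return buffer.getvalue()


SAMPLE_META = """<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/" office:version="1.1">
  <office:meta>
    <dc:title>Open Access Journals</dc:title>
    <meta:initial-creator>Jane Doe</meta:initial-creator>
    <dc:creator>Copy Editor</dc:creator>
    <meta:keyword>open access</meta:keyword>
    <meta:keyword>repositories</meta:keyword>
  </office:meta>
</office:document-meta>"""

if __name__ == "__main__":
    print(extract_metadata(make_odt({"meta.xml": SAMPLE_META})))
```

Preferring `meta:initial-creator` over `dc:creator` matters in practice: an editor who last saved the file should not silently become the article's author.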

The system incorporates e-journal management functions into DSpace that allow easy setup and maintenance of journal issues containing repository content. The new journal workflow allows flexible packaging of repository content and automated production of print editions of journal issues.
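One way to picture journal issues that package repository content is an issue object that stores only references (handles) to archived items and derives a table of contents with consecutive page numbers for the print edition. The classes `Item` and `JournalIssue`, the `approved` flag and the page counts are hypothetical simplifications for illustration. They are neither DSpace's data model nor the system's actual code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    """A simplified archived repository item (hypothetical fields)."""
    handle: str
    title: str
    pages: int
    approved: bool  # True once the collection administrator has approved the item


@dataclass
class JournalIssue:
    """A journal issue that references archived items instead of copying them."""
    volume: int
    number: int
    handles: list = field(default_factory=list)

    def add(self, item: Item) -> None:
        # Only approved items may be packaged into an issue.
        if not item.approved:
            raise ValueError(f"{item.handle} has not been approved yet")
        # Adding the same item twice is ignored.
        if item.handle not in self.handles:
            self.handles.append(item.handle)

    def table_of_contents(self, repository: dict) -> list:
        """Return (start_page, title) pairs, paginating the issue consecutively."""
        toc, page = [], 1
        for handle in self.handles:
            item = repository[handle]
            toc.append((page, item.title))
            page += item.pages
        return toc


if __name__ == "__main__":
    repository = {
        "123456789/1": Item("123456789/1", "Open Access Mandates", 10, True),
        "123456789/2": Item("123456789/2", "Repository Metadata", 5, True),
    }
    issue = JournalIssue(volume=1, number=1)
    for handle in repository:
        issue.add(repository[handle])
    print(issue.table_of_contents(repository))  # [(1, 'Open Access Mandates'), (11, 'Repository Metadata')]
```

Because the issue holds only handles, an article remains a single archived object that can appear both in the repository and in any number of issues. Page numbers for a print edition are computed rather than stored.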

The platform will combine open-source software components to support the major stages of the scientific publishing process. These stages include:

Content authoring: Most text documents today are created in a word processor such as Microsoft Word or OpenOffice.org Writer. As Barnes (2006b) notes, these formats are not suitable for long-term storage. The latest versions of word processors offer increased support for media-independent XML (eXtensible Markup Language) technology. After evaluating options for a viable authoring environment, the system proposes the use of the OpenDocument Format (ODF).

Submission: Submission is a crucial step in publishing, even in the age of electronic publishing, in which authors are expected to submit electronic manuscripts. Although most established publishing and repository platforms ease the actual submission step, they often shift much of the burden onto the author by expecting camera-ready papers. The proposed system is built to support both simple authoring and a seamless submission process, incorporating automated article conversion and metadata extraction. This approach should increase user acceptance and enable a better and cleaner authoring and submission process.

Preservation: The purpose of a repository is to keep digital content accessible and to preserve it in the long term. Preservation and reliable access therefore include storing content in an open format that remains accessible in the long term. Most existing repositories use PDF as the file format for both storage and access. Preserving scientific information in an online repository, however, opens up additional possibilities in the ways and formats in which content is presented, and limiting the formats to PDF introduces an unwanted constraint (Guédon 2006, p.35). That is one of the main reasons why the proposed system uses a text-based XML format for preservation.


Ultimately, XHTML was chosen as the preservation format. XHTML is an XML format suitable for structuring scientific articles. Furthermore, it is an excellent access format that can easily be viewed in a web browser. To allow storage in XHTML, all articles authored in the OpenDocument Format must be converted into XHTML during submission.

Repository: The submitted articles are archived in a DSpace repository, making the content searchable and browsable. After approval by the collection administrator, articles are mapped to a Preprints collection, where they become publicly accessible.

Publishing: The existing DSpace platform is enhanced with publishing features to deliver content in multiple output formats such as PDF and XHTML. Different publishing channels improve the reading experience and allow the integration of new applications, which means enhanced interoperability with other web-based services such as reference management systems.

E-journal issue management: By default, DSpace does not support any journal management features; the repository is limited to archiving journal articles in a hierarchical structure after publication. The system has been enhanced to support the workflow of creating and maintaining journals, which aggregate content already available in the Preprints collection.
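The conversion of ODF articles into XHTML during submission, described under Preservation, can be sketched as a mapping from the ODF document body (the `content.xml` member of the package) to XHTML elements. This minimal version is an assumption rather than the system's actual converter: it handles only top-level headings (`text:h`) and paragraphs (`text:p`), while tables, lists, images, footnotes and formulae would need further rules. The function name `odf_body_to_xhtml` is hypothetical.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XHTML = "http://www.w3.org/1999/xhtml"


def _plain_text(element) -> str:
    """Concatenate all character data below an ODF element (spans, links, ...)."""
    return "".join(element.itertext())


def odf_body_to_xhtml(content_xml: str, title: str) -> str:
    """Convert top-level ODF headings and paragraphs into a minimal XHTML document."""
    body = ET.fromstring(content_xml).find("office:body/office:text", {"office": OFFICE})
    if body is None:
        raise ValueError("not an ODF text document: office:body/office:text missing")

    html = ET.Element(f"{{{XHTML}}}html")
    head = ET.SubElement(html, f"{{{XHTML}}}head")
    ET.SubElement(head, f"{{{XHTML}}}title").text = title
    xhtml_body = ET.SubElement(html, f"{{{XHTML}}}body")

    for element in body:
        if element.tag == f"{{{TEXT}}}h":
            # text:outline-level becomes h1..h6, clamped to the levels XHTML offers.
            level = max(1, min(int(element.get(f"{{{TEXT}}}outline-level", "1")), 6))
            ET.SubElement(xhtml_body, f"{{{XHTML}}}h{level}").text = _plain_text(element)
        elif element.tag == f"{{{TEXT}}}p":
            ET.SubElement(xhtml_body, f"{{{XHTML}}}p").text = _plain_text(element)
        # Everything else (lists, tables, frames, sequence declarations) is skipped here.

    # default_namespace (Python 3.8+) serializes the XHTML elements without a prefix.
    return ET.tostring(html, encoding="unicode", default_namespace=XHTML)


if __name__ == "__main__":
    sample = (
        '<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"><office:body><office:text>'
        '<text:h text:outline-level="1">Introduction</text:h>'
        '<text:p>Open Access <text:span>matters</text:span>.</text:p>'
        "</office:text></office:body></office:document-content>"
    )
    print(odf_body_to_xhtml(sample, "Sample article"))
```

Mapping `text:outline-level` to `h1` to `h6` keeps the document outline in the preserved XHTML, so that later XSL-based publishing steps can rely on it for both web and print output.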


Chapter 2

SCIENTIFIC PUBLISHING

Scientific publishing means the dissemination of knowledge. Ideally, scientists should record and share all their findings. Scholarly publications are crucial to the work of researchers, because nearly all new research builds on work that has been published elsewhere. Without the publication of research findings, future innovation would therefore not be possible. Moreover, how scientific results can be accessed, how rapidly access is granted and what that access costs all have an impact on research excellence and innovation (Potočnik 2007).

The process of communicating and disseminating research findings is called scholarly communication. It relies on formal and informal channels (Graham 2000, p.3): informal networks such as e-mail, web pages and science blogs; initial public dissemination via conferences or preprints; and, finally, formal publishing in scientific journals. In this paper, the phrase scientific publishing refers to the formal part of disseminating scholarly information by means of journals.

Today, scientific publishing is a sophisticated process involving diverse players, among them researchers (who often act as authors as well as editors or referees), publishers, funding bodies, libraries and universities. The advent of the Internet made both formal and informal scholarly publishing more advanced and more complex. Technology plays an increasing role throughout the whole process, enabling changes that range from content creation to new digital publishing activities.

Despite these new web-based publishing technologies, the basic system of scholarly communication has not changed significantly since the mid-17th century, when Henry Oldenburg published the first scientific journal (Willinsky 2006, p.177). Scientific publishing is still a sophisticated process of selecting, peer reviewing, editing, printing and distributing papers, and the article remains the predominant publishing format, even in the web-based era.

This chapter explores the history, functions and purposes of scientific publishing, and presents the traditional publishing model and the journal as the principal means of distributing scholarly information.


A key stage in the traditional journal publishing model is certification. It is typically achieved through peer review, which ensures that an article manuscript meets a certain level of quality.

New possibilities in web based publishing have also revealed weaknesses in the traditional publishing model; these are presented as well.

1 Publishing as an Integral Part of the Research Process

One may ask why scientists must publish their research findings at all. Kling and McKim (1999) note that researchers must publish papers “to communicate their results, allocate status, and allocate resources”. Publishing is thus essential for researchers to establish their own reputations, to advance their careers and to establish their priority and ownership of ideas (Mabe 2006, p.59).

Kling and McKim (1999) refer to three criteria that must be satisfied for a publication to be complete: publicity, to alert the scientific community to its existence; trustworthiness, achieved by peer review, which certifies the claims made in the article; and accessibility, making documents available to readers in a stable print or online environment.

In this respect, publishing can be considered an integral part of the research process: the research process is not finished until its findings are published. Publications are the only countable and assessable output of research (Mabe 2006, p.59) and serve as the major input for new research in the researcher’s community by encouraging discussions and reviews. Publishing thus makes the research process more effective.

The answer to the opening question is therefore yes: scientists must write up and publish their findings. The resulting pressure on academics with respect to funding and career progression has come to be known as “publish or perish” (Mabe 2006, p.59).

2 Functions in Scholarly Publishing

Today, research is most commonly published in scholarly journals and conference proceedings. The scientific publishing model serves different purposes for different actors in the publishing cycle. As noted above, without publishing, innovation and progress in science would not be possible. Roosendaal and Geurts (1998) have identified the following four functions of a scholarly communication system (independent of the actual publishing medium):

• Registration, the establishment of priority, which allows claims of precedence for scholarly

findings and identifies the author as the originator of an idea.

• Certification, which establishes the validity of a registered scholarly claim. Certification ensures that a scholarly article has undergone a peer-review process. Besides quality control, its most important function, peer review also provides constructive criticism for the author. A further form of certification is publication in a leading, high-quality journal.

• Awareness, communicating and disseminating the findings to the intended audience. Awareness keeps all actors in the scholarly system informed about new research findings, makes research visible through wide distribution and thus allows it to have an impact on knowledge. Without access to the relevant scientific information, researchers would find themselves lagging behind.

• Archiving, providing long term access and preservation of publications. Archiving ensures

future accessibility as well as the possibility to reference and cite scholarly information.

A further function of journal publishing is reward: giving authors recognition for their performance in the communication system and helping them to build a reputation (Mabe 2006, p.59). This academic reward structure is based on the number of papers published, on citation counts of individual papers, or on the impact factors (a measure of how well-cited a journal is) of the journals in which an author publishes (Kennan & Kautz 2007, p.4).
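The sources cited here describe the impact factor only informally. For illustration, the widely used two-year Journal Impact Factor (as published in the Journal Citation Reports) is usually defined as follows, where $C_Y(t)$ denotes the citations received in year $Y$ by items the journal published in year $t$, and $N_t$ the number of citable items it published in year $t$. This definition is added here for reference and is not taken from Kennan and Kautz (2007):

$$\mathrm{IF}_Y = \frac{C_Y(Y-1) + C_Y(Y-2)}{N_{Y-1} + N_{Y-2}}$$

As a purely hypothetical example, a journal that published 200 citable items in the two preceding years, which together received 600 citations in year $Y$, would have $\mathrm{IF}_Y = 600 / 200 = 3.0$.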

The choice of journal to which researchers submit their manuscripts is, to a certain extent, driven by the academic reward system. This is all the more understandable as most researchers do not receive financial compensation for publishing their work. Björk (2005) even states: “Prestige counts much more than wide and rapid dissemination, and easy access.” Publication in a well-known, frequently cited scientific journal not only fulfils the four functions of registration, certification, awareness and archiving; it also enhances those functions by “delivering increased audience and visibility” (Guédon 2001). Increased reputation, in turn, supports applications for research funding (House of Commons Science and Technology Committee 2004, p.9).


Guédon (2001), however, argues that researchers act as both authors and readers: as authors, scientists want to publish in highly prestigious journals, while as readers they, together with their institutions and libraries, suffer from the high prices of those same journals.

Another role journals play is the selection of material (John W. T. Smith 1999, p.81). Selection is mainly subject-based and ensures that all articles fall within the field the journal focuses on; in this way, the selection of material for inclusion helps to define the subject the journal serves.

Moreover, a journal has certain technical characteristics, for example its format or navigational structure. Journals are text-based: although they can carry images and tables, they are unable to deal with multimedia material such as audio or video content. The exclusion of all non-textual content is a major drawback of the existing journal publishing model.

3 The Journal

What is a journal? Librarians often speak of serials or periodicals, i.e. publications that are issued continually (Garfield 1972, p.376). Journals constitute the primary scientific literature, in which research findings are published after peer review. Mabe (2006, p.57) highlights the final nature of an article published in a scholarly journal, which forms “part of the ‘minutes of science’ of that discipline.”

The first scientific journal, the Philosophical Transactions, was published in 1665 by Henry Oldenburg for the Royal Society of London. This journal was initially a collection of letters, enabling scholars to communicate and archive research findings (Oppenheim et al. 2000, p.362). One of the principal problems that a periodical publication was meant to solve was the researchers’ demand to establish precedence for their scholarly findings. They wanted their “priority as discoverer to be publicly acknowledged” (Mabe 2006, p.56) before they were prepared to make their findings accessible to the scientific community. This first periodical thus took on the function of registration described by Roosendaal and Geurts (section 2) by recording the author, his research findings and the date of submission.

Scientific journals provide a measure of research output. Today there are approximately 21,000 active, peer-reviewed journals, publishing 1.4 million articles each year (Mabe 2006, p.56). Harnad (2004) estimates 24,000 active, peer-reviewed scientific journals publishing 2.5 million articles per year.


These figures establish the scholarly journal as the principal means by which researchers and scholars communicate. The number of journals and articles is growing: each year, the number of articles increases by 3% and the number of journals by about 3.5% (Mabe 2006, p.56).
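To make these growth rates more tangible, the following minimal Python sketch computes doubling times and a simple projection. It assumes, as a simplification, that the rates reported by Mabe (2006, p.56) stay constant from year to year; the helper names `doubling_time` and `project` are illustrative only. At a constant 3% per year, annual article output doubles roughly every 23 years; at 3.5% per year, the number of journals doubles roughly every 20 years.

```python
import math

# Figures quoted from Mabe (2006, p.56); constant rates are a simplifying assumption.
ARTICLES_PER_YEAR = 1_400_000  # articles published per year
ARTICLE_GROWTH = 0.03          # ~3% more articles each year
JOURNALS = 21_000              # active, peer-reviewed journals
JOURNAL_GROWTH = 0.035         # ~3.5% more journals each year


def doubling_time(annual_rate: float) -> float:
    """Years until a quantity doubles under constant compound annual growth."""
    return math.log(2) / math.log(1 + annual_rate)


def project(initial: float, annual_rate: float, years: int) -> float:
    """Value of a quantity after `years` of constant compound growth."""
    return initial * (1 + annual_rate) ** years


if __name__ == "__main__":
    print(f"Articles double every {doubling_time(ARTICLE_GROWTH):.1f} years")
    print(f"Journals double every {doubling_time(JOURNAL_GROWTH):.1f} years")
    for years in (10, 20):
        articles = project(ARTICLES_PER_YEAR, ARTICLE_GROWTH, years)
        journals = project(JOURNALS, JOURNAL_GROWTH, years)
        print(f"After {years} years: ~{articles:,.0f} articles/year in ~{journals:,.0f} journals")
```

Even modest annual rates thus compound into a doubling of output within a single research career.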

This growth is essentially caused by the growing number of researchers worldwide, which in turn results from the expansion of postsecondary education and increasing government research funding after World War II (Willinsky 2006, p.15).

The vast majority of scholarly journals are published by commercial publishers. Elsevier, Thomson, Wolters Kluwer and Springer are the largest corporate publishers in the Science, Technology and Medicine (STM) market; together they account for about 52% of the market (Figure 1).

Figure 1: Global market shares of STM publishers (2003) (House of Commons Science and Technology Committee 2004, p.13)

In the last two decades, the creation and delivery of articles have been moving to digital form, challenging the existing print journal publishing model. Articles available online are easily reproducible and also easily changed (Mabe 2006, p.61).

Mabe (2006, p.57) also notes, however, that the four functions of the journal (registration, certification, awareness and archival record) are so fundamental to “how science is carried out that all subsequent journals, even those published electronically in the 21st century, have conformed to Oldenburg’s model”.

The advantages of digital journals over print editions are evident: much easier distribution; reduced distribution costs (although the coexistence of electronic and print editions has actually increased overall publishing costs); highly effective searching across all archived journal issues; article delivery in multiple formats; linking of articles through citations and references; and more accurate usage statistics, which help publishers to develop new features for their online platforms. For a comprehensive overview of the benefits of online journals, see the House of Commons Science and Technology Committee (2004, p.15) and Willinsky (2006, p.14).

In print, journal issues are static. Once an issue has been published, the issue and the articles it contains perform a unique role as an evaluated, final statement by the authors (Mabe 2006, p.57). Journals offer an aggregated collection of current research: each issue brings many articles together in a fixed order, which the table of contents of a print issue represents.

Reading a journal in print means browsing through the entire issue, whereas online journals are accessed article by article, mostly through search. The technological transformation in publishing is about to change the arrangement of journal articles. For example, Gerstein (1999) notes that with the advent of “online journals, one might imagine a dynamic table of contents, arranging articles according to the reader’s research interests, [or] download frequency.”
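Gerstein’s idea of a dynamic table of contents can be illustrated with a short Python sketch. It is a hypothetical illustration, not a description of any existing journal platform: the `Article` fields, the sample titles and the ranking rule (first by the number of keywords shared with the reader’s stated interests, then by download count) are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical minimal article record for an online journal issue."""
    title: str
    keywords: set[str]
    downloads: int


def dynamic_toc(articles: list[Article], interests: set[str]) -> list[Article]:
    """Order an issue's articles for one reader.

    Articles sharing more keywords with the reader's interests come first;
    ties are broken by download frequency (most downloaded first).
    """
    return sorted(
        articles,
        key=lambda a: (len(a.keywords & interests), a.downloads),
        reverse=True,
    )


# Hypothetical sample issue; titles and numbers are invented for illustration.
SAMPLE_ISSUE = [
    Article("Peer review models", {"peer review", "publishing"}, 120),
    Article("Open access economics", {"open access", "publishing"}, 300),
    Article("Genome databases", {"bioinformatics"}, 500),
]

if __name__ == "__main__":
    for article in dynamic_toc(SAMPLE_ISSUE, {"peer review"}):
        print(article.title)
```

For a reader interested in peer review, this ordering puts “Peer review models” first and lists the remaining articles by download count, whereas a print issue offers every reader the same fixed order.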

The scholarly journal’s digital transition will continue, encouraging many technological advances on publishers’ existing research platforms, and will even lead to a fundamental change in how scientific information is chunked. In highly dynamic and data-intensive fields, the traditional journal model fails to exploit the full capability of the Internet. Seringhaus and Gerstein (2007) propose capturing and integrating database records and other types of digital formats (datasets, simulations, software, annotations, and aggregates thereof) alongside manuscript publication, in order to build an optimal information architecture for a discipline. They note that the traditional text based “format imposes sharp constraints upon the type and quantity of biological information published today. Academic journals alone cannot capture the findings of modern genome-scale inquiry.”

Hannay (2007) argues that the concept of the scientific paper will remain intact, because there is real value in this form of publication. However, web technologies offer considerable potential for change. He also predicts that the distinction between journals and databases will ultimately become meaningless.

There is also a social aspect of journals, which emerges from the collective, shared interests of their authors and readers. The journal audience represents a scientific community, encouraging discussions and debates about research findings. As Guédon (1994) puts it: “With the advent of the periodical, print brought about a momentous change in the function of writing itself. Designed initially as a prop for memory, writing evolved into a virtual discussion space.” Kling and McKim (1999) also recognize the function of scholarly publishing as a communicative practice anchored in a particular community. Given that scholarly research is a social process, discussion in research will always be in high demand, and the scientific journal serves as an important vehicle for sharing and discussing findings with relevant peers.

To recall the essential functions of a scholarly journal (section 2): first, it provides authors with a record of accomplishment, rewarding their contribution to knowledge, and is therefore crucial for advancing their careers. It also establishes intellectual property rights. A journal has the important functions of both knowledge distribution and archiving, the latter ensuring the preservation of scholarly information for the future. In the digital world, preservation is more difficult to achieve than in the print world: unlike print publications, digital publications are not owned by libraries. This affects the archiving function.

Disseminating the latest research findings enables scholars to remain continually aware of new findings, a prerequisite for conducting further research. The digital age has changed distribution, making it possible to deliver a journal issue at much lower cost than in print.

Peer review is probably the most important function of scholarly journals. In peer review, qualified experts referee research papers, assessing the quality of the research before it is published. Peer review, with its indisputable advantages as well as the challenges it faces today, is discussed in more detail in section 5.

Having identified the journal’s crucial functions in scholarly communication, we should also carefully examine some characteristics of the existing journal publishing model. With all the new technological possibilities in mind, one may ask whether existing journals meet the needs of the academic community and the requirements of publishers in terms of

• easy and open access to, and search and retrieval of, the relevant information for every researcher in the world who is interested in a particular piece of science

• communication and collaboration features that exploit the growing capabilities of the Internet

• fast availability and accessibility of new research findings

• quality assurance, which is currently based solely on peer review; in the digital age, peer review can be enhanced and connected with the communication features provided by the Internet

• long-term preservation and findability of scholarly information in the digital era

• coping with the exponentially growing amount of knowledge

• whether the article is still the unit of scholarly communication, or whether there are other types and formats of research information

• the possibility of storing non-textual material and data with journal articles

• the costs involved in providing different formats of journals

4 The Traditional Publishing Model

The traditional journal publishing model represents a system that has proven successful over hundreds of years. The scientific publishing process involves scholars, publishers, institutions, funding bodies and other organizations such as learned societies. Over this long period, rules and individual patterns of behaviour have evolved among all actors. Today, established scholarly journal publishing forms a dense web of individual relationships between all actors.

The flow of information between the different actors in the journal publishing process is usually called the publishing cycle (Figure 2). Houghton (2001, p.168) lists creation, production and distribution as the key stages in the publishing process.

Figure 2: The publishing cycle (Mabe 2006, p.58)


It is worth looking at the actors in the publishing cycle and at their respective roles. Authors, readers, journal editors and referees are all part of the research community. This means that individual scholars can have multiple roles in the system.

Authors conduct research and write papers about their findings, which they then submit to a particular journal. The journal’s audience and prestige may influence the choice of journal (House of Commons Science and Technology Committee 2004, p.11). By publishing in more and better journals, authors become recognised as experts in their field. Authors provide their articles to publishers for free. Moreover, Mabe (2006, p.62) points out that, owing to the digital transition, authors are expected to submit electronic manuscripts and even camera-ready papers, which has shifted much of the burden of the paper creation phase from typesetters to authors.

Typically, the editor of a journal is an independent expert in the research area the journal focuses on.

The editor is appointed by the publisher. The journal editor judges the relevance of submissions to

the journal and organizes the peer review process by selecting expert referees. Thus, editors and refe-

rees act as gatekeepers in publishing (Guédon 2001). While the editor’s job is to compare the sub-

mission to other submissions, a referee has to compare the submission to the current findings in

the discipline. Peer review, one of the key steps in the publishing cycle, is outlined in more detail in section 5.

Based on the results of the peer review process, the decision to publish a paper is also made by the

editor (Mabe 2006, p.59). The paper then has the status of being accepted for publication. Thus, the

editor plays an active, powerful role in the publishing process. Together with the decision, the author receives constructive feedback from the editor and referees in order to improve the paper.
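Since this work is ultimately concerned with building an e-publishing solution, the editorial steps described above can be sketched as a simple state machine for a manuscript. The following Python sketch is a hypothetical, simplified model, not the workflow of any particular journal or system: the status names and the allowed transitions (rejection by the editor, review by referees, a revision loop, acceptance, and publication by the publisher) are assumptions based on the description in this section.

```python
from __future__ import annotations

from enum import Enum, auto


class Status(Enum):
    """Hypothetical states of a manuscript in the publishing cycle."""
    SUBMITTED = auto()           # author has submitted to the journal
    UNDER_REVIEW = auto()        # editor has selected referees
    REVISION_REQUESTED = auto()  # referees/editor ask the author to revise
    ACCEPTED = auto()            # editor's decision based on peer review
    REJECTED = auto()            # out of scope or not accepted after review
    PUBLISHED = auto()           # publisher has produced and distributed it


# Allowed transitions in this simplified model; terminal states map to an empty set.
TRANSITIONS: dict[Status, set[Status]] = {
    Status.SUBMITTED: {Status.UNDER_REVIEW, Status.REJECTED},
    Status.UNDER_REVIEW: {Status.ACCEPTED, Status.REVISION_REQUESTED, Status.REJECTED},
    Status.REVISION_REQUESTED: {Status.UNDER_REVIEW},
    Status.ACCEPTED: {Status.PUBLISHED},
    Status.REJECTED: set(),
    Status.PUBLISHED: set(),
}


class Manuscript:
    """Tracks one manuscript's status and the history of its transitions."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.status = Status.SUBMITTED
        self.history: list[Status] = [self.status]

    def move_to(self, new_status: Status) -> None:
        """Advance the manuscript, rejecting transitions the model does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"cannot move from {self.status.name} to {new_status.name}"
            )
        self.status = new_status
        self.history.append(new_status)


if __name__ == "__main__":
    paper = Manuscript("Example manuscript")
    for step in (
        Status.UNDER_REVIEW,
        Status.REVISION_REQUESTED,
        Status.UNDER_REVIEW,
        Status.ACCEPTED,
        Status.PUBLISHED,
    ):
        paper.move_to(step)
    print(" -> ".join(s.name for s in paper.history))
```

Encoding the allowed transitions explicitly means, for example, that a manuscript cannot be published without first being accepted, which mirrors the editor’s gatekeeping role described above.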

The publisher is engaged with the production of content based products and services (Houghton

2001, p.168). The journal publisher’s role is much wider than that of a printer (Mabe 2006, p.60): a publisher performs various tasks, from manufacturing the actual print publication to distributing the final publication. Publishers launch new journals, gather manuscripts, and sustain and promote their journals. They organize and financially support the peer review process.

In the digital era, publishers face increased challenges, like all other actors in the publishing cycle. They are expected to provide final output both in print and online, and to offer archiving of and access to scholarly information on a digital publishing platform. In traditional, paper based publishing, publishers are not concerned with preservation or sustainability: the preservation of print publications, once purchased, is the responsibility of the purchaser, mostly libraries. In the digital age this has changed. Information sold and provided in digital format remains in the domain of the publisher.

The publisher is the only actor in the publishing cycle who receives monetary profits directly, al-

though Kennan and Kautz (2007) argue that academics and scholars profit indirectly, for example

through increased reputation.

Once an article has been published in a journal, whether online or in print, members of the research community have access to it only if their library subscribes to the journal. The library acts as an information professional and content service provider for the research community, separating “the point of purchase” from “the point of use”. “Libraries purchase journals on behalf of their community of users” (House of Commons Science and Technology Committee 2004, p.10). They provide archiving for print publications and structured access to online resources. The growth of distributed technologies and evolving modes of access have been affecting library services, and these implications are widely discussed; see, for example, Wittenberg (2008) and Borgman (1999).

The continuous move into a digital publishing era calls for a rethinking of the roles of all actors in the publishing cycle. Mabe (2006, p.62) states that through the digital transition “some processes have become very easy (such as distribution), others have become much more complex.” However, the discussion should not be limited to new technologies; it should also include issues such as changing organizational structures and changing user needs (Wittenberg 2008, p.35).

5 Peer Review

Peer review is probably the most important defining characteristic of the modern, learned journal

(Mabe 2006, p.57). A discussion paper on peer review (Sense About Science 2004, p.ix) defines the

term peer review as “the reviewing and assessing of manuscripts for competence, significance and

originality, by independent qualified experts who are researching and publishing work in the same

field (peers)”.

Quality control has been an issue since the early days of journal publishing. Learned societies such as the Royal Society of London (which published the first scientific journal in 1665; see section 3) therefore restricted their membership to fellows of known reputation (Foerster 2001). They did not publish all submitted manuscripts, but reviewed the papers and approved a selection for publication (Mabe 2006, p.57).


Peer review is considered a core value in science and the central mechanism by which scientific

quality is guaranteed. Besides serving as a quality control mechanism, peer review serves the purpose of giving feedback to the authors, helping them to improve their manuscripts.

Referees have to provide very detailed written feedback on the reviewed manuscript, not just a ‘yes’ or ‘no’ decision (Sense About Science 2004, p.8). The criteria that peer reviewers have to assess include the quality of the research, the relevance of the article to the journal’s readership, its novelty and interest, and its content, structure and language (House of Commons Science and Technology Committee 2004, p.11).

Peer review is one of publishers’ key arguments for keeping the current scholarly publishing system; it is one of the added values that publishers try to sell. As a newsletter for Elsevier’s journal editors

(Mulligan 2004) puts it: “It is testament to the power of peer review that a scientific hypothesis or

statement, presented to the world is largely ignored by the scholarly community unless it is first

published in a peer-reviewed journal.”

Publishers, however, only administer peer review, with qualified expert editors selecting the referees, adjudicating the referee reports and ensuring that authors revise as required (Berners-Lee et al. 2005). Researchers, not publishers, do the actual reviewing, mostly without receiving financial compensation. They invest their time in assessing manuscripts, helping to ensure that journals publish only material of high quality.

The ways peer review is put into practice vary across disciplines, organizations, publishers and journals (Fröhlich 2006). The number of reviewers is typically small and varies by journal, averaging two per manuscript. In the most common form of peer review, the reviewers’ names are withheld from authors, with the intention of maintaining the objectivity of the referees and the integrity of the process (Foerster 2001).

Despite its acceptance within the research community, peer review has attracted criticism in recent years. Concerns have been raised about bias, fairness, unnecessary delay and the overall ineffectiveness of the process (Benos et al. 2007). The increasing volume of papers being sent for review puts further pressure on peer review.

Referees should identify flaws, but they are not infallible. In recent years, there have been some notable incidents in which series of fraudulent papers passed the peer review process in high-profile journals such as Nature and Science. In September 2002, Jan Hendrik Schön, who had been tipped for the Nobel prize and had published 100 papers in just four years, was found to have fabricated data (Benos et al. 2007). In March 2005, Woo Suk Hwang and his co-authors submitted a paper containing manipulated images and fake DNA data to Science (Couzin 2006).

Mabe (2006, p.58) warns that “generally peer review cannot determine whether the underlying data

presented in the article is correct or not, but peer review undoubtedly improves the quality of most

papers and the process is appreciated by authors and readers as greatly improving the quality of

reported research. The correctness or otherwise of the conclusions of a paper readily become ap-

parent as further investigations of that field are undertaken.” Peer review thus cannot detect every

case of fraud.

There are also claims that science and peer review are conservative. This is illustrated by a substantial list of historical misjudgements in which even work that later won a Nobel prize was rejected by prestigious journals (Nature 2003).

However, it is widely accepted that peer review plays an important role. The report of the House of Commons Science and Technology Committee (2004, p.94) and Benos et al. (2007) list the following arguments for keeping the peer review system intact:

• the volume of research output increases by approximately 3% per year (see also section 3). Researchers are not able to scan all new papers to determine which are worth reading. The peer review process provided by publishers is said to “act as a filter”, helping researchers to save time and money.

• peer-reviewed research output provides a measure of the level of achievement of researchers and their departments. As noted above, scholars publish to advance their careers and reputations. Peer review provides a “mark of distinction” for scholarly articles, and the “incentive to publish would be significantly reduced were the mark of achievement conferred by passing successfully through the peer review process to be abandoned”.

• peer review gives the reader “an indication of the extent to which they can trust each article”. Scientists depend on publications they can trust, and peer review helps to make journals a reliable source of new information.

• abolishing the review process would remove the opportunity to respond to criticisms raised by experts prior to publication.


In view of new developments in electronic publishing, peer review will continue to play a vital role. Roberts (1999) notes that this does not require the same methods of peer review as for print journals, but that “[s]ome kind of 'filtering' system will, however, be essential if the academic community is to have faith in the digital mode of scholarly publishing”.

The new technologies offered by the Internet, together with the inability of peer review to uncover scientific misconduct, have prompted a discussion about the role of peer review, and many changes have been proposed (Benos et al. 2007). An ongoing Nature web debate at http://www.nature.com/nature/peerreview/debate/index.html discusses the pros and cons of peer review, addressing possible improvements as well as technological and ethical aspects of the peer review process.

Some of these proposed modifications of the current peer review system are discussed in chapter 4, section 4.

6 Problems with the Traditional Publishing Model

6.1 Accessibility

Web based technologies have contributed to a major shift in the way scientific publishing is carried out, and in fact most journals today have assumed “a parallel digital life” (Willinsky 2006, p.14). Despite its long history and successful evolution, the established journal publishing process is confronted with growing dissatisfaction (Van de Sompel et al. 2004). This dissatisfaction is born, as Houghton (2001, p.167) states, of a “combination of fundamental technological change and system dysfunction”. One of the key reasons why the scholarly publishing system has come under criticism is its inability to provide fast and widespread access to scientific material, even in the digital era.

Recalling the journal’s function of awareness (see section 2), authors want their articles to reach the widest possible audience and therefore to be accessible without any barriers. Houghton (2001, p.171) explains why the social returns from consuming information are maximized by expanding access and disseminating widely, not by limiting access and excluding users: “The social value of

ideas and information increases to the degree they can be shared with, and used by, others. The

more such items are consumed, the greater the social return on investment in them.”

The lack of access to traditional scholarly publications reflects the failure of the whole system to achieve its purpose of communication cost-effectively (Graham 2000, p.4). The cost of the existing publishing process is borne by the consumers. In addition, journal subscription prices have been increasing continuously in recent years (Houghton 2001, p.168).

Together with static or even declining library budgets, this has produced the so-called Serials Crisis. In recent years, research institutions have been unable to purchase all the research being published (Oppenheim et al. 2000, p.363); they cannot afford subscriptions to every publication they would like. Libraries and research institutions struggle with limited budgets and are even forced to cancel journal subscriptions (Harnad et al. 2004), limiting the number of articles available to their users. In conjunction with the exponentially growing amount of research findings, this phenomenon slows down innovation; such developments “run contrary to widespread knowledge dissemination, leading to a declining access to research and scholarship within an otherwise expanding global academic community” (Willinsky 2006, pp.16-17).

Reasons for the high journal prices charged by publishers include the spreading of costs across a range of journal titles, concentration in the corporate publishing industry, and uncertainty about cost and pricing mechanisms for electronic journals (Houghton 2001, pp.172-173; Oppenheim et al. 2000, p.369). Publishers also often point to large investments in new technologies and to the increasing volume of research, which, they argue, leads to decreasing costs per article use (European Commission 2006, p.59).

Funding bodies (Potočnik 2007) argue that public money contributes to the research process several times over: it funds the research, usually pays the salaries of the reviewers and, finally, often pays for the resulting journal publications on behalf of research organisations.

Harnad et al. (2008, p.37) distinguish between the access/impact problem and the journal-

affordability problem, noting that even if all journals were sold to universities at no profit, libraries

would not be able to afford all 25,000 scholarly journals. Hence, the access problem would still

persist.

6.2 Time Delay

Another problem is that current publishing processes introduce significant delays between submis-

sion and publication (Oppenheim et al. 2000, p.364), mostly as a result of communication delays in

the peer-review process and the increasing volume of research. Providing fast access to scientific content clearly conflicts with the need for discussion and review. The latency between new findings and their publication is critical (Van de Sompel et al. 2004), especially in rapidly moving fields where current information is vital. These time delays can also be considered another form of access barrier, hindering scientific progress.

The delay before a manuscript is published in a journal can be further increased by the space limitations of traditional print based journals, which result in a backlog of already accepted articles.

However, the Internet allows us to speed up many steps in the journal publishing process, by ena-

bling online submission of manuscripts, electronic correspondence in the review process and elimi-

nating the print production stage (Roberts 1999).

Despite these improvements, it can still take close to a year for a paper to be published after submission (Sense About Science 2004, p.11). Roberts (1999) even notes that, in the traditional publishing model, the time from the first conception of an idea to its publication in print can be up to three years.

Therefore, applying web-based technologies to speed up the publishing cycle is essential to restore the journal's function of communicating new research findings.

6.3 The Volume of Research Papers

Scientific research is generating increasing quantities of information. This trend is not limited to science; information is arguably the fastest growing good today. The growth of published scholarly literature is challenging the traditional publishing system. As readers, researchers face information overload: the growing volume of scientific material makes the individual article less visible and harder to track, and finding relevant information is becoming more difficult. As with time delays, web technologies can help here: they give scientists convenient access to an increasing amount of literature and help them locate relevant research (Lawrence 2001).

Furthermore, semantic technologies promise to make information interpretable by computers, which could dramatically improve communication and progress in science. It is mostly the responsibility of content providers and publishers to offer such new services. Odlyzko (2002, p.7) notes that “readers are faced with a ‘river of knowledge’ that allows them to select among a multitude of sources […] To stay relevant, scholars, publishers and librarians will have to make even greater efforts to make their material easily accessible.”

Another consequence of the fast-growing amount of scholarly information is that it is becoming more difficult to find enough reviewers who are willing to referee the increasing number of papers. Referees report that the overall burden of review is increasing (Mulligan 2004, p.3). This can lead to delays in obtaining reviews, hindering fast publication. The growing number of manuscripts that need to be reviewed can even affect the quality of the peer-review process (Sense About Science 2004, p.21).

6.4 Copyrights

Van de Sompel et al. (2004) refer to another issue closely connected to access, the “permission crisis”: the barrier arising from copyright law and licensing agreements that restrict the use of publications once access has been obtained. There is growing resistance in the scientific community to transferring the copyright of published works to publishers (Harnad 1997). These restrictions affect not only readers; they also limit how authors can use their own work.

In most cases, scholars submitting their work to a journal have signed over copyright to the publisher (Swan 2006, p.3) in exchange for having the work reviewed and published. Commercial publishers request the transfer of copyright to secure exclusive distribution rights for all media and to protect scholarly literature from unauthorised copying. They justify these exclusive rights by the journal's certification function (see chapter 1, section 2), which ensures that readers can access the final, peer-reviewed version of an article. For details of copyright from a publisher's point of view see, for example, the authors' section of Elsevier's website (http://www.elsevier.com/wps/find/authorsview.authors/copyright).

As scholarly content moves to online media, it can be shared more easily, which threatens commercial publishers. To retain control, publishers are increasingly asserting licensing and ownership rights over content (Henry 2003).

Resistance to existing copyright agreements increases the demand for alternative copyright arrangements. Recently, against the background of growing investments in scientific research, a number of research funders have begun mandating alternative copyright agreements. A proposal from the ‘Transition from Paper’ Working Group (Bachrach et al. 1998), addressing research funded by the U.S. government, states that “[f]ederal agencies that fund research should recommend (or even require) as a condition of funding that the copyrights of articles or other works describing research that has been supported by those agencies remain with the author.”


Graham (2000, p.9) supports this argument, noting that “researchers should retain copyright and only licence the material to publishers for specific purposes.” Willinsky (2006, p.49) notes that “the key to copyright is the right of authors to profit from their work.”

In 2001, Lawrence Lessig and James Boyle founded Creative Commons (2005), a system of agreements that helps authors retain ownership of their work while granting users certain rights, such as accessing and copying the work. Creative Commons offers a set of easy-to-use standard agreements for authors.

Despite these challenges for commercial publishers, there is still a role for professional publishing services in a world where fast-growing scientific content must be widely disseminated. Publishers must convince the scientific community that they can add the best tools and value-added services to their publications, services that cannot be copied. At the same time, publishers must deal with copyright issues in this changing technological environment. By allowing authors to self-archive their work in a repository (see chapter 3, section 1.1), most commercial publishers have granted a “substantial exception to their control of materials for which they hold the copyright” (Willinsky 2006, p.48).

Chapter 3

OPEN ACCESS

Open Access addresses some of the major problems of the traditional publishing model. This chapter introduces the concepts of Open Access and its historical development. It discusses advantages and concerns, and examines whether this new type of publishing is a viable option to replace, or at least complement, existing publishing models.

1 What is Open Access

Communicating their results to peers remains the primary reason for scholars to publish their work (Swan 2005). Advanced digital technologies have ushered in a new era of scientific communication. Today, scholars can collaborate, communicate and share their research findings within seconds over the Internet, simply by sending an email to colleagues or posting their findings on a weblog. The web supports new methods of communication and collaboration beyond the traditional, formal journal article.

Hence, the digital age has driven changes in scientific publishing processes. Web-based publishing technologies challenge the established, print-focused publishing model by enabling faster scientific communication and often allowing unrestricted and free access to scholarly information. These new methods, which offer free access to scientific content, are commonly referred to as Open Access. Among other definitions, the term Open Access is defined in the Budapest Open Access Initiative (2002) as the “free availability of research results on the public Internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the Internet itself”. Furthermore, Open Access is free of most copyright and licensing restrictions: “The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.”

Bailey (2006, p.15) lists the key points of Open Access. First, it means the availability of scientific material on the Internet. The term scientific material applies to journal articles as well as to other types of research output such as conference papers, theses or research reports. Using the web as a distribution channel extends dissemination and access to the largest possible audience. Hence, Open Access means greatly expanded access to research.

Peter Suber (2007) identifies some additional characteristics of Open Access: it is free of charge, and it removes both price barriers (e.g. subscription fees) and permission barriers (copyright and licensing restrictions). Another key aspect of Open Access is that it should be immediate (not delayed) in order to remove access barriers. Furthermore, Open Access means the permanent availability of full-text articles (not only abstracts) on the Internet.

The costs of distributing scientific content on the Internet are far lower than those of traditional print dissemination. Although Open Access publishing is not free of cost, the Open Access model helps to reduce publishing costs. It is important to note that Open Access is free for readers but not for producers (Suber 2007).

Better access to scientific publications can be expected to stimulate further research and innovation. The scientific community benefits from an expanded research cycle in which research advances more effectively because researchers have immediate access to all the findings they need. Hence, the progress of research can be maximised. Open Access is a way to attain this goal, because it addresses the access problem described in the last chapter. When researchers have immediate access to scientific literature, articles are cited sooner, which Kurtz et al. (2005) have called the early access effect. This contributes to greater visibility and greater research impact of Open Access articles compared to subscription-based journal articles (see also section 4.2). An article's research impact “is the degree to which its findings are read, used, applied, and cited by users in their own further research and applications.” (Harnad et al. 2004)

There are two ways to achieve Open Access, which are discussed in more detail below (Harnad 2005):

• Self-archiving of either already published, peer-reviewed articles (postprints) or draft versions of articles (preprints). This is accomplished by depositing scholarly content in repositories. Institutional repositories are public online archives in which papers are deposited. Such archives follow a set of standards that allow interoperability, for example searching across multiple repositories. This concept is known as the Green Road to Open Access (see 1.1).

• Publishing articles in Open Access journals, the Gold Road to Open Access (see 1.2). These journals often perform peer review and therefore incur additional costs for transforming manuscripts into approved journal articles ready for free distribution on the Internet. Hence, most Open Access journals follow the principle that the article processing fee is paid by the author or the author's sponsor. Mainly because of these costs, Open Access journals are more difficult to launch and operate.

1.1 The Green Road

The Green Road to Open Access means that authors archive copies of their articles in Open Access archives or repositories (Harnad 2005). This process is also referred to as self-archiving. A full-text version of the article in an appropriate electronic format is deposited in an institutional or subject-based repository in order to achieve Open Access, dissemination and long-term archiving. Self-archived articles may be preprints (draft articles that have not been peer-reviewed) or postprints (the final draft of peer-reviewed articles accepted for publication). The term used for both is e-prints (Bailey Jr. 2005, p.xvii).

Clifford Lynch (2003) defines a (university-based) institutional repository as “a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”

Self-archiving in an institutional repository is a means by which institutions can make their research findings available free of charge over the Internet. Furthermore, it helps to ensure the preservation of those articles, and hence to fulfil the archiving function, in a rapidly evolving electronic environment (House of Commons Science and Technology Committee 2004, p.56). Most universities can afford to subscribe to only a fraction of scientific journals. This problem of the traditional publishing model, commonly referred to as the serials crisis (see chapter 2, section 6.1), means that research published in the traditional way achieves only a fraction of its potential usage and impact. Self-archiving papers in an institutional repository can overcome this problem.

Self-archiving is not publication. It does not replace the functions of scientific journal publishing; rather, it preserves a copy of a fully peer-reviewed article. Self-archiving in repositories is often regarded as an adjunct to the traditional journal publishing process, in which commercial publishers still provide certification and registration (Kennan & Kautz 2007).


Making already published articles available in such repositories is possible because most scientific journals published by traditional publishers permit authors to distribute those articles under specified terms and conditions. Such self-archiving policies often stipulate that availability in repositories is delayed by 6 to 12 months after first online publication in the journal. For a comprehensive overview of the diverse types of self-archiving policies see Suber (2007). According to Harnad et al. (2004), over 90% of scientific journals already give authors a green light to self-archive in some form.

This shows that publishers have responded to the research community's demand for Open Access (Harnad et al. 2004). Permission to self-archive gives authors the option of Open Access; in other words, realising the green strategy depends only on authors and their institutions. So far, however, only about 15% of researchers self-archive their publications spontaneously (Harnad 2006). To raise this share towards 100%, all universities and research funding agencies must adopt Open Access self-archiving mandates, i.e. make the self-archiving of published articles a requirement.

In parallel, more indirect strategies such as incentives could increase the self-archiving rate (Guédon 2006). Such incentives can include Open Access performance metrics (number of citations, number of downloads, number of articles, etc.) that motivate and reward self-archiving. However, the low self-archiving rate also reveals a shortcoming of current Open Access strategies: the inability to convince all authors to self-archive their research findings. As the Berlin Declaration (2003) puts it: “Establishing Open Access as a worthwhile procedure ideally requires the active commitment of each and every individual producer of scientific knowledge and holder of cultural heritage”.

Making Open Access possible for articles in institutional repositories required several technological innovations and developments (Lynch 2003):

• the development of the World Wide Web (originally developed to enable scholarly communication in physics). Open Access was practically impossible in the print-only publishing era.

• low online storage costs.

• standards such as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which exposes the metadata of each article (title, authors and other bibliographic data) and allows the harvesting of repository contents.

• the ability to reference materials in institutional repositories using persistent identifiers.

• the development of institutional repository platform software such as DSpace, jointly developed by the Massachusetts Institute of Technology (MIT) and Hewlett-Packard (HP), or EPrints (developed by the University of Southampton). Current usage of the different platforms can be tracked through the Registry of Open Access Repositories (ROAR, roar.eprints.org).
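To make the harvesting mechanism of OAI-PMH more concrete, the following Python sketch builds a ListRecords request and extracts identifiers, titles and creators from a response in the unqualified Dublin Core format (oai_dc). It is a minimal sketch using only the standard library: the verb, the request arguments (metadataPrefix, from, resumptionToken) and the XML namespaces follow the OAI-PMH 2.0 specification, whereas the function names and the example endpoint are hypothetical. Network access is deliberately left out; a real harvester would fetch each URL (for example with urllib.request.urlopen) and repeat the request with every returned resumptionToken until none is left.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses and unqualified Dublin Core (oai_dc)
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH base URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb is sent with it
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # e.g. "2008-01-01": only harvest records changed since this date
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        if header is None or header.get("status") == "deleted":
            continue  # deleted records carry only a header, no metadata
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue  # record is not in the oai_dc format
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "titles": [t.text for t in dc.findall("dc:title", NS)],
            "creators": [c.text for c in dc.findall("dc:creator", NS)],
        })
    # A missing or empty resumptionToken means the list is complete
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)


if __name__ == "__main__":
    # Hypothetical repository endpoint, used only to show the request format
    print(list_records_url("http://repository.example.org/oai",
                           from_date="2008-01-01"))
    # prints http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2008-01-01
```

Because a harvester needs only a repository's base URL, a metadata format and, optionally, a date for incremental updates, a service provider can collect metadata from many repositories in the same way, which is what makes searching across repositories possible.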

Self-archiving offers benefits for authors as well as for institutions: both wish to enhance the visibility and impact of the research generated within the institution (Swan 2005). It also benefits the research community as a whole by encouraging fast and barrier-free Open Access to research findings.

Institutional repositories offer an additional benefit: they can store research data that cannot be published in traditional journal format but is often an integral part of research findings, such as very large datasets or audio, video and image files.

Repositories can greatly enhance access to scholarly literature. This is why Open Access is inevitable and why it can be accomplished using the Green Road strategy. Self-archiving works and can look back on a history of more than 15 years. The first and best known subject-specific repository, arXiv, was founded in 1991 (Ginsparg 1997) as an archive for preprints in physics and has had a big impact on its discipline ever since. Today arXiv provides access to about 500,000 e-prints in physics, mathematics, computer science, quantitative biology and statistics.

1.2 The Gold Road

The other major Open Access strategy is Open Access journals, known as the Gold Road to Open Access. This model is based on traditional journal publication but changes the funding model.

Bailey Jr. (2005, pp.xviii-xix) characterises Open Access journals as follows:

• they are primarily electronic journals (print editions are sometimes offered as an optional, fee-based add-on)

• all of the journals' content is freely available on the Internet

• they allow authors to retain their copyrights, although they may use Creative Commons or similar licences

• they are peer-reviewed like conventional journals (although there is some dispute whether Open Access journals must utilise peer review)

• they cost money to produce and distribute, especially since they utilise quality-control mechanisms such as copy-editing, but they can realise significant savings by publishing online only.

The Gold Road can be achieved by creating new Open Access journals from scratch or by transforming existing journals into Open Access journals.

To cover their costs, however, many gold journals have had to adopt the Open Access journal cost-recovery model (Harnad et al. 2004). The costs of producing such journals have given rise to various funding strategies. Open Access journals replace the subscription model with other economic models that do not charge readers or their institutions for access. Some of these are outlined in more detail in section 3.

The Directory of Open Access Journals (DOAJ, www.doaj.org) currently lists about 3,500 Open Access journals, meaning that, out of roughly 25,000 scholarly journals, about 15% are gold today.

Currently, the “riskiness and untestedness of the Gold Road strategy” (Harnad et al. 2004) prevent publishers from adopting the Gold Road. The fact that more than 90% of all journals permit self-archiving makes Open Access publishers more willing to take the Green Road (Harnad et al. 2004). One of the main open questions in the transition to Open Access journals today is whether the existing business models are viable (Guédon 2006).

Nevertheless, the number of Open Access journal publishers is growing. The following organisations play a major role in the publication and archiving of Open Access journals: BioMed Central, the Public Library of Science (PLoS) and PubMed Central. Bailey (2006, p.24) refers to them as “born Open Access publishers”, established solely for the purpose of publishing Open Access journals.

BioMed Central is a commercial publisher offering a wide range of Open Access journals in all areas of biology and medical research. BioMed Central encourages self-archiving by authors. All articles published in BioMed Central journals are archived in PubMed Central, the digital archive of the U.S. National Institutes of Health (NIH).

The Public Library of Science (PLoS) is a non-profit organisation publishing Open Access journals in fields such as biology, genetics and tropical diseases. It was co-founded by Nobel Prize winner Harold Varmus and began publishing its first journal in 2003.


New software developments support and improve all stages of manuscript management and the publishing process of Open Access journals. The aim of such software is to ease the setup and maintenance of journals and to reduce both the software design and development costs and the costs of actually running journals (Willinsky 2006, p.74). One example is Open Journal Systems (OJS), an open source journal management system developed by the Public Knowledge Project (pkp.sfu.ca/ojs).

2 Open Access Milestones

The Budapest, Bethesda and Berlin definitions of Open Access are the most central and influential for the Open Access movement. Together, these declarations form the best-established definition of Open Access, which Peter Suber (2007) refers to as the BBB definition. He notes that Open Access is compatible with existing copyright law, because “it does not require the abolition, reform, or infringement of copyright law. Nor does it require that copyright holders waive all the rights that run to them under copyright law and assign their work to the public domain.” Instead, authors grant users additional rights to read, download, copy, share, store, print, search, link to and crawl the full text of the work.

2.1 Budapest Open Access Initiative

The statement resulting from a meeting held in Budapest in December 2001, published in February 2002, stands as the most important definition of Open Access (Budapest Open Access Initiative 2002). It recommends two strategies for achieving Open Access, namely self-archiving and Open Access journals (see 1.1 and 1.2).

“An old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the Internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds.”

As this passage shows, Open Access is closely connected with the electronic distribution of scholarly literature over the Internet. Only electronic distribution enables cost-effective, widespread Open Access to scientific literature.


Because access to this literature can be provided at far lower cost, the initiative points to “many alternative sources of funds for this purpose, including the foundations and governments that fund research, the universities and laboratories that employ researchers, endowments set up by discipline or institution, friends of the cause of Open Access, profits from the sale of add-ons to the basic texts, funds freed up by the demise or cancellation of journals charging traditional subscription or access fees, or even contributions from the researchers themselves. There is no need to favor one of these solutions over the others for all disciplines or nations, and no need to stop looking for other, creative alternatives.”

Every potential reader, from interested laymen to researchers, should be able to find, access and use scientific journal literature. Open Access gives authors and their institutions “vast and measurable visibility, readership and impact”.

2.2 Bethesda Statement

In April 2003, another important meeting took place in Chevy Chase, Maryland. It extended the definition of Open Access by specifying that copyright owners grant users “a free, irrevocable, worldwide, perpetual right of access to, and a license to copy, distribute, transmit and display the work publicly and to make and distribute derivative works” (Bethesda Statement 2003). The statement thus introduced the right to make derivative works (for example translations into other languages).

Another extension of the Budapest Open Access Initiative is the requirement to deposit the article “immediately upon initial publication in at least one online repository that is supported by an academic institution” (Bailey Jr. 2006, p.17), which helps to achieve long-term preservation, in contrast to simply posting articles on authors' homepages.

2.3 Berlin Declaration

In October 2003, 19 international research organisations signed the Berlin Declaration on Open Access (2003).

In terms of content, the Berlin Declaration differs only slightly from the Bethesda Statement.


It starts with: “The Internet has fundamentally changed the practical and economic realities of distributing scientific knowledge and cultural heritage improving access to scholarly research” and states the goal of a “global and accessible representation of knowledge”. Therefore, “the future Web has to be sustainable, interactive, and transparent. Content and software tools must be openly accessible and compatible.”

The signatories to the Berlin Declaration, now about 250 organisations from all over the world including large research institutions such as Germany's Max Planck Institutes and CERN in Switzerland, have committed themselves to taking action within their organisations and hence to strengthening Open Access.

Figure 3: Berlin Declaration signatories (http://oa.mpg.de/openaccess-berlin/signatories-graphs.html)

In March 2005, the third in a series of follow-up meetings on the implementation of the recommendations of the Berlin Declaration (Berlin 3 Open Access 2005) issued a policy recommendation that “universities, research institutions and research funding agencies should require—as a matter of institutional policy—that their employees/fundees deposit a supplementary copy of each of their published research journal articles into their own institutional OAI-compliant repository”, thus specifying how institutions should provide Open Access.

2.4 Open Access Mandates

Various research funding organisations have stated their support for Open Access to publicly funded research data. Recently, funding bodies and research institutions have extended this support and begun to mandate that works they fund be made freely available in Open Access repositories.


The following list of major Open Access landmarks is not exhaustive, but it contains some notable events that have contributed to the growing attention paid to Open Access:

In January 2004, the Organisation for Economic Co-operation and Development (OECD) and 34

nations declared that “open access to, and unrestricted use of, data promotes scientific progress”

(OECD 2004).

The Wellcome Trust, the world's largest private funder of medical research, strongly endorsed Open Access in a statement and a report on the economics of scientific research publishing (Wellcome Trust 2003). In the Trust's view, everyone should have free Internet access to the full text of high-quality, peer-reviewed articles as soon as the paper version is published. One solution supported by the Trust is the creation of Open Access journals, such as those of the Public Library of Science (PLoS). The Wellcome Trust recognises Open Access fees as a legitimate research cost, thereby supporting the idea that publication costs should be covered as part of research spending.

In 2004, the UK House of Commons Science and Technology Committee also issued a report on

scientific publishing (2004). This report recommended that UK “Research Councils and other

Government funders mandate their funded researchers to deposit a copy of all their articles in their

institution's repository within one month of publication”. A further passage focuses on copyright:

“The issue of copyright is crucial to the success of self-archiving […] Research Councils and other

Government funders should mandate their funded researchers to retain the copyright on their re-

search articles, licensing it to publishers for the purposes of publication. The Government would

also need to be active in raising the issue of copyright at an international level.”

The United Nations World Summit on the Information Society in Geneva (WSIS 2003) declared key principles for an “Information Society to all”, stating that “[t]he ability for all to access and contribute information, ideas and knowledge is essential in an inclusive Information Society” and committing to “promote universal access with equal opportunities for all to scientific knowledge and the creation and dissemination of scientific and technical information, including Open Access initiatives for scientific publishing”.

In December 2007, the U.S. Congress mandated public access, through the PubMed Central repository, to articles arising from research sponsored by the National Institutes of Health (NIH) (National Institutes of Health 2007). The policy requires self-archiving of the final peer-reviewed journal article no later than 12 months after publication. Initially, archiving of NIH papers was optional for authors, and fewer than 4 per cent of researchers self-archived their articles (Agosti 2008). Since the NIH Open Access mandate was issued, the number of submissions to the PubMed Central repository has risen significantly (Library Journal Academic Newswire 2008). This indicates that Open Access mandates work and are the most reliable way to ensure self-archiving.

In his opening address at the conference Scientific Publishing in the European Research Area in February 2007, European Commissioner for Science and Research Janez Potočnik noted that “[o]ver the next 7 years, the EU will invest over 54 billion euros in research and development. I want every euro of this funding to contribute in some way to developing a true European Research Area and creating a strong European knowledge society. That is my job. The European Commission, and, indeed, the European citizen, must get a good return on its investment” (Potočnik 2007).

The European Research Council (ERC) (2007) stated its intention “to issue specific guidelines for the mandatory deposit in Open Access repositories of research results”. The ERC has established an interim position on Open Access that requires all peer-reviewed publications to be deposited in an appropriate research repository on publication, making ERC-funded research Open Access within 6 months of publication. Additionally, the ERC considers it essential that primary data be deposited in relevant databases as soon as possible.

In August 2008 the European Commission launched a pilot project (European Commission 2008)

that will give unrestricted online access to EU-funded research results, primarily research articles

published in peer reviewed journals, after an embargo period of between 6 and 12 months. This

experiment will cover 20% of the EU 2007-2013 research budget.

Around a dozen universities worldwide had adopted mandates earlier; one of the latest is Harvard University, which adopted a policy (Suber 2008a) requiring faculty members to allow the university to make their scholarly articles freely available online. This made Harvard the first university in the United States to mandate Open Access to its faculty members' research publications.

The Fonds zur Förderung der wissenschaftlichen Forschung (FWF 2008), the Austrian Science Fund, issued the following Open Access policy for the projects it supports: “In accordance with the ‘Berlin Declaration’, the FWF holds all project leaders and project workers responsible for making their publications freely available via Open Access media on the Internet. The responsibility to publish in Open Access media may only be waived if legal reasons make Open Access publication impossible. Any such cases must be justified to the FWF.”


Almost all of the Open Access mandates and supporting documents mentioned above stress that peer review is of fundamental importance in ensuring the certification and dissemination of high-quality scientific research.

3 Open Access Business Models

As the number of Open Access journals and repositories grows, so does the number of business models. As already mentioned, Open Access is not completely free for the producer. Open Access publishers must fund technological infrastructure, editors, cost-intensive peer review (Varmus 2003) and administrative processes.

Open Access publishers achieve significant cost savings compared to the traditional journal publishing workflow, mainly through lower administration costs (subscription handling, sales etc.). Distributing journals primarily over the Internet also contributes to cost savings by eliminating print production costs, but this benefit is equally available to commercial publishers. Willinsky (2006, p.76) reports that eliminating print editions and using journal management software can reduce costs by up to 50 percent.

Nevertheless, how and by whom an Open Access journal or repository is funded remains an important issue.

The best-known business model for Open Access journals is the author fee, also known as the author-pays model. Instead of readers or libraries paying the journal subscription price, the authors or their institutions pay to publish their articles in a journal (House of Commons Science and Technology Committee 2004). The model is based on the assumption that publication is an integral part of research, so that the institution can pay for it. Suber (2006) refers to publications using this model as “fee based” Open Access journals. Major Open Access publishers such as BioMed Central and the Public Library of Science (PLoS) charge author fees of between US$ 500 and US$ 3,000 per published article (Willinsky 2006, p.214).

Although author fees are the best-known model, most Open Access journals (52%) do not charge any author-side fees at all (ALPSP 2005, p.10). Suber (2006) notes that “the majority of Open Access journals turned out to use business models that had rarely been acknowledged, let alone studied”. Some receive direct or indirect subsidies from institutions, others earn revenues from advertising or membership fees, and many rely heavily on volunteer support.


Willinsky (2006, p.212) lists 10 models of Open Access, most of them financed by the research institution. The list covers a variety of funding sources, including subsidies from scholarly societies, governments or foundations, and journals that sell print subscriptions while offering an Open Access online edition. Willinsky also includes “Partial Open Access”, in which some articles of each journal issue are Open Access. Even the largest commercial publisher, Reed Elsevier, contributes to Open Access by providing free access to bibliographic information and abstracts via its ScienceDirect online portal (Willinsky 2006, p.28).

Commercial publishers have also begun to offer genuine Open Access publishing programmes. These hybrid models combine Open Access with subscriptions and are funded in different ways. For example, more than 40 journals published by Elsevier offer authors the option to sponsor non-subscriber access to individual articles for US$ 3,000. Springer Open Choice likewise allows authors to publish their articles Open Access for a fee of US$ 3,000 (Willinsky 2006, p.5), making the article freely available on SpringerLink and allowing authors to self-archive it in a repository.

In October 2008, Springer bought one of the most established Open Access publishers, BioMed Central. The acquisition shows that one of the largest commercial publishers recognizes Open Access publishing as a viable business model. Springer CEO Derk Haank even noted that “this acquisition reinforces the fact that we see Open Access publishing as a sustainable part of STM publishing, and not an ideological crusade.” (Suber 2008b)

4 Key Open Access Aspects

4.1 Global Access to Research

Another key dimension of Open Access to scientific content is that it reduces the global digital

divide in terms of accessibility and affordability of research findings (Willinsky 2006, p.28). It is

evident that access to relevant information increases social and economic returns. Today access to

relevant information is unequally distributed. Research institutions in developing countries struggle

to afford subscription based journal literature.

The Internet has removed geographical barriers, enabling a global circulation of knowledge. Consider its remarkable achievement: “in fewer than 4,000 days we have encoded half a trillion versions of our collective story and put them in front of 1 billion people” (Kelly 2005). But an economic barrier still prevents scholars in developing countries from accessing research findings. This barrier has a negative impact on innovation and scientific strength in such countries. The current publishing model not only restricts access, it also limits the research output of developing countries (Kirsop & Chan 2005, p.247).

The Internet could help to lower this barrier, giving researchers in developing countries convenient access to research and journals produced in developed countries. As Kirsop & Chan (2005, p.251) state, self-archiving already published journal articles in repositories will provide “the fastest and most efficient way” for these nations to build their own research archives as well as to access international scholarly content. However, the transition to the digital publishing era requires fast Internet connections and computer equipment to connect the developing world to new scientific information.

In addition, Open Access increases the visibility of research from developing countries, which is discussed in the next section.

4.2 Visibility and Citation Impact

Deposited in Open Access repositories, scientific content can reach the widest possible audience. Making research findings Open Access therefore affects both the visibility and the citation impact of articles. The citation impact is defined as the number of citations to an article (Brody et al. 2004). A citation is a listing of a previously published article in the reference section of an article (Craig et al. 2007, p.240), indicating the relevance of the cited article for the current work.
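As a minimal illustration of this definition, the following Python sketch counts how often each article appears in the reference sections of other articles. The article identifiers and reference lists are hypothetical toy data. Real citation indexes must also match differently formatted references to the same work, and the sketch ignores that problem.

```python
from collections import Counter

def citation_counts(reference_lists):
    """Count how often each article is listed in the reference sections of others.

    reference_lists maps a citing article to the articles in its reference section.
    """
    counts = Counter()
    for references in reference_lists.values():
        # set() ensures a duplicated entry in one reference list is counted once
        counts.update(set(references))
    return counts

# Hypothetical toy data: three citing articles and their reference sections
reference_lists = {
    "A": ["X", "Y"],
    "B": ["X"],
    "C": ["X", "Z", "Y", "X"],
}
counts = citation_counts(reference_lists)
print(counts["X"], counts["Y"], counts["Z"])  # 3 2 1
```

Articles that nobody cites, such as "A" here, simply get a count of zero.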

Citation index databases such as Scopus and Google Scholar collect these bibliographic records and citations. Such services are essential for locating research on the Internet and make it possible for authors to see how many times their work has been cited.

Several studies covering fields like computer science and physics (Lawrence 2001; Harnad & Brody 2004) have found a clear correlation between the number of times an article is cited and the probability that the article is online (Figure 4). These studies revealed dramatic citation advantages for Open Access.


Figure 4: Average citation ratios for articles in the same journal and year that were and were not made Open Access by author self-archiving. Date span: 1992–2003. (Harnad et al. 2008, p.38)

Whether Open Access has a causal effect on citation counts is still heavily debated. Most of the studies mentioned above concentrate on differences between articles that were made available online and those that were not (Craig et al. 2007, p.244). Kurtz et al. (2005) deconstructed the Open Access citation effect into three components. First, the open access postulate holds that articles are cited more frequently because authors can read them more easily. Second, the early access postulate holds that articles are cited more often because they appear sooner. Finally, the self-selection bias postulate holds that better authors archive more. The results showed that there is no general Open Access citation effect, but strong early access and self-selection bias effects.
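The ratios in Figure 4 compare articles within the same journal and year. One plausible way to compute such a ratio is sketched below, assuming hypothetical records of (journal, year, Open Access flag, citation count). For each journal-year group, it divides the mean citation count of the Open Access articles by that of the other articles. A ratio above 1 only shows that Open Access articles were cited more. As the discussion above makes clear, it says nothing about why.

```python
from collections import defaultdict
from statistics import mean

def oa_citation_ratios(articles):
    """Per (journal, year): mean citations of OA articles / mean of non-OA articles."""
    groups = defaultdict(lambda: {True: [], False: []})
    for journal, year, is_oa, cites in articles:
        groups[(journal, year)][is_oa].append(cites)
    ratios = {}
    for key, group in groups.items():
        oa, non_oa = group[True], group[False]
        # A ratio is only defined if both kinds of article exist and non-OA ones were cited
        if oa and non_oa and mean(non_oa) > 0:
            ratios[key] = mean(oa) / mean(non_oa)
    return ratios

# Hypothetical records: (journal, year, made Open Access?, citation count)
articles = [
    ("J1", 2001, True, 12), ("J1", 2001, False, 6), ("J1", 2001, False, 4),
    ("J2", 2002, True, 9), ("J2", 2002, True, 3), ("J2", 2002, False, 4),
    ("J3", 2003, False, 7),  # no OA article in this group, so no ratio
]
ratios = oa_citation_ratios(articles)
print(ratios)
```

Comparing within journal and year controls for the fact that different journals and publication years attract very different citation levels.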

4.3 Long-term Preservation

Long-term access to scholarly publications is an important component of the existing publishing cycle and is not limited to the Open Access publishing model. Archiving scholarly content ensures the usability of digital content, its future accessibility and the possibility to reference and cite scholarly information. Preserving scholarly content in the long term also helps to ensure the traceability and repeatability of experiments (European Commission 2007).

In the print world, librarians have accepted responsibility for long-term preservation (Park 2007, p.24). In the digital era, however, providing long-term access to and preservation of publications has become a complex challenge. Libraries no longer own the electronic publications they provide access to; they license them. A consequence of electronic publishing is thus the shift of responsibility for maintaining access to scholarly content from libraries to publishers.

Nevertheless, librarians must ensure access to subscription-based electronic journals even after the license period has ended. Libraries therefore need strategies for maintaining copies of electronic journals in the long term.

Whereas print editions of journals can be accessed for decades, digital materials can become unreadable as storage technologies change and current preservation formats become obsolete (Kling & McKim 1999). Gerstein (1999) asks: “How can we be assured that an article written today in format X will be interpretable in 50 years from now?” It is hard to predict which electronic format will be preferred in the future. As Ginsparg (1997, p.94) states, it will most likely be “none of TeX, Postscript, PDF, Microsoft Word”.

Besides finding the right format for preservation, technological infrastructure must be developed to address issues like persistent identifiers, interoperability and optimized metadata. Digital objects are complex and are becoming even more so. In this respect, the use of open standards is crucial. Several initiatives are developing the required digital archives. Most of them are funded by agencies and research institutions.

The Open Archival Information System Reference Model (OAIS) (Consultative Committee for Space Data Systems 2002) is the most widely accepted standard describing what an archive requires to provide permanent access to, and long-term preservation of, digital information. The most widely used repository software packages include DSpace and EPrints. For an overview of repository software packages see Park (2007, p.24) and Awre (2006).

This open source repository software is fundamental for facing long-term preservation challenges.

The Open Access movement is a key driver and therefore plays a vital role in the development

process of such repository software.

At present, however, there is little experience with digital preservation, and funders and institutions are uncertain about the right strategy for the future. As the European Commission (2007) emphasizes, there is still no clear long-term preservation strategy in place across the EU. Despite some promising approaches, digital preservation strategies deserve more research. Resolving these long-term preservation problems is crucial to increasing the acceptance of electronic publishing models (Beyer & Irmer 2007).


5 Arguments against Open Access

The rise of the Open Access movement has resulted in a debate about whether it is a viable publishing option and what it implies for established commercial publishers. Commercial publishers defend the current subscription-based system.

Among the main arguments against Open Access are (European Commission 2007, p.4):

• Publishers argue that there is no access problem. On the contrary, access to scientific information has never been better.

• Publishers add considerable value to the research process by guaranteeing the quality of journal articles. Hence, the costs of publishing are justified.

• The publishing market is highly competitive and does not require public intervention.

• There is no clear and viable alternative to the existing system.

Open Access seeks to bypass the costly publishing process and this will seriously threaten the fu-

ture of established publishers (Oppenheim et al. 2000, p.361).

However, as Björk (2005) mentions, Open Access has not yet changed the commercially oriented scholarly communication system, because several barriers remain. Among these barriers, the question of whether Open Access can find a sustainable business model seems to be central. The acquisition of the Open Access publisher BioMed Central by Springer shows that this argument is weak. Another barrier is related to the critical mass of Open Access publications and the fact that Open Access adoption differs from discipline to discipline. For researchers, publishing in highly prestigious journals still counts more than making their work Open Access (Björk 2005).

A major concern about the Open Access publishing model is that it must not undermine peer review as a quality mechanism (European Commission 2007). Confronted with growing support for the Open Access publishing model, commercial publishers announced the Brussels Declaration (International Association of Scientific, Technical & Medical Publishers 2007) in November 2007, a set of ten principles on science publishing. The declaration outlines the contribution that publishers consider indispensable to maximising the dissemination of knowledge through economically self-sustaining business models. Publishers argue that Open Access to scientific information would jeopardise this system and, with it, the established peer review system.

On the other hand, even Open Access proponents (Harnad 1998) argue that peer review is the

central mechanism which maintains quality in scientific publishing and therefore needs to be re-

tained.

Aronson (2005) cautions that “we have little evidence about the balance of benefits and harms” of immediate, free access to scholarly information. He outlines “few advantages and many disadvantages” of the Open Access model, stating that someone has to pay, even in the author-pays model. He also warns that the Open Access system will be open to abuse: “The wide availability of the Internet means that Open Access journals will spring up everywhere and vanity publishing in science will be possible”.

Nevertheless, publishers have already become more open-minded about Open Access. They give authors permission to self-archive their work (see section 1.1), but they try to delay self-archiving so as not to destabilise their existing subscription-based business model. It is unclear how large-scale self-archiving will affect journal subscriptions. It is highly likely that libraries will cancel subscriptions to journals whose content is immediately available at no charge (R. Anderson 2007). Yet Berners-Lee et al. (2005) state that after 15 years, self-archiving and journal publishing continue to coexist peacefully.


Chapter 4

THE FUTURE OF SCHOLARLY PUBLISHING

Scholarly publishing will change. As the authors of the report ‘University Publishing in a Digital Age’ (Brown et al. 2007, p.4) note, the future of scholarly publishing will look very different from its past. Electronic publishing technologies have greatly improved distribution and reduced publication delays, yet new technologies keep emerging that could further enhance scientific communication. New information technologies drive innovation at each stage of the publishing process, from idea generation to the dissemination of the final publication. Technologies and concepts like the Semantic Web, Web 2.0 and network-enabled collaboration foster new research environments and scholarly publishing workflows. They even allow reviews and discussions after publication.

This chapter explores some of these new ways and forms of disseminating and publishing scholarly information. But will they ever affect the traditional journal and article as the basic units of scholarly communication? Michael Mabe (2006, p.65) states: “It is impossible to tell. But based on the fit between journal functions and researchers’ human needs, journals and their like are probably around for some time to come.” The author believes that the article will remain the basic unit for the text-based description of new research findings. However, the scientific article will have to be enhanced to integrate, or at least reference, all relevant supplemental data and to make information easily accessible to both humans and applications.

Web technologies will certainly affect one of the journal’s most important functions, certification. Collaboration technologies on the Internet will encourage new, more effective forms of the traditional peer review quality mechanism.

1 New Research and Publishing Environments

Given that most established journals are available online today, Hannay (2007) notes that “we are already in a world where scientific information is primarily digital”. Seringhaus & Gerstein (2007) state that although “the Internet has revolutionized the way our society thinks about information, the traditional text based framework of the scientific article remains largely unchanged”. Currently, most web publishing activities focus on the traditional journal as it already existed in the print-only era. As a result, scientific journals are now available in parallel as digital editions. As outlined in chapter 1 section 3, electronic journal publication can serve the scientific community in many ways.

In light of fast-changing web technologies, there is no doubt that the Internet will enable completely new forms of scientific communication. We are just at the beginning of utilizing these new technologies for advanced services. The use of new web-based technologies is not limited to the publishing stage of the research process. Their rapid progress opens completely new opportunities at each stage of science, such as collaborative research environments (Van de Sompel et al. 2004).

At the publishing stage, the Web encourages new forms of informal scholarly publishing like blogs, discussion forums or preprint servers for sharing researchers’ work and ideas. This is a tremendous change in scientific communication, allowing real-time collaboration and interaction. In addition, these new applications allow the capturing, sharing and archiving of new formats and types of data, for example non-textual content like audio and video.

A consequence of the increased use of these new forms of communication will be the need to enhance the existing scholarly publishing system. Seringhaus & Gerstein (2006) note that “journals must produce more than just papers”. New publishing systems need to be developed that support different types of content: text and multimedia, associated data sets, presentations, software and simulations (Van de Sompel & Lagoze 2007). Increasingly, data-driven research, for example in genomics, produces large data sets that are also worth preserving. However, publishing all associated data is not possible in the restricted, text-based framework of journal articles. As research findings consist of a broader set of modes and formats, this data must be “regarded as a critically important part of the publication process” (Rzepa & Murray-Rust 2001, p.178).

The volume of data produced by researchers exceeds what human readers can process (chapter 2 section 6.3). One of the emerging concepts that will help researchers to cope with exponentially growing scholarly content is the Semantic Web (see section 2), which “will likely change the nature of how scientific knowledge is produced and shared, in ways that we can now barely imagine” (Berners-Lee & Hendler 2001).

These developments will blur the boundaries between journals and databases (Hannay 2007) and

between formal and informal publications (Brown et al. 2007, p.4). There is great potential and the

full promise of web technologies has not been realized yet.


Brown, Griffiths & Rascoff (2007, pp.13-14) summarize some key statements from interviews

about the future of scholarly communications:

• scholarly publishing of the future will be electronic; all content must be available online.

• integrated electronic research and publishing environments will help researchers to conduct and publish research.

• multi-format delivery of scholarly content will become increasingly important, allowing the consumption of scientific information on any device.

2 The Semantic Web

The Semantic Web is one promising concept to develop completely new research environments.

According to Berners-Lee et al. (2001) “the Semantic Web is not a separate Web but an extension

of the current Web in which information is given well-defined meaning, better enabling computers

and people to work in cooperation”.

The Semantic Web would make scholarly information machine-readable. All scientific research could be interlinked in one open, interoperable information system. Today, scholarly communication still means that researchers manually scan scholarly (journal) content, a process based solely on human-human or human-computer interaction.

Semantically limited formats like PDF or HTML are primarily designed for human consumption. The Semantic Web would extend scientific communication to a process in which computers can also take part.

By applying Semantic Web technologies, the scientific article can be enhanced to integrate data sets, allowing computers to read and process that information. A central technology in this context is XML (Extensible Markup Language), which allows the semantic enrichment of content. The Resource Description Framework (RDF) is a W3C standard for describing web resources and the relationships among them; its best-known serialization, RDF/XML, is an XML format. Articles containing such additional markup will be found by new and better tools, and “users will thus be able to issue significantly more precise queries” (Berners-Lee & Hendler 2001).
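The following Python sketch uses only the standard library to show what such a description can look like. Each Dublin Core property attached to an rdf:Description is one statement (subject, predicate, object) about the article. The article URI, title, authors and date are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

def describe_article(uri, title, creators, date):
    """Serialize Dublin Core statements about one article as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    for name in creators:
        ET.SubElement(desc, f"{{{DC}}}creator").text = name
    ET.SubElement(desc, f"{{{DC}}}date").text = date
    return ET.tostring(root, encoding="unicode")

def triples(rdf_xml):
    """Read the description back as (subject, predicate, object) statements."""
    statements = []
    for desc in ET.fromstring(rdf_xml).findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # prop.tag holds the predicate in ElementTree's {namespace}localname form
            statements.append((subject, prop.tag, prop.text))
    return statements

xml = describe_article("http://example.org/articles/42", "An example article",
                       ["A. Author", "B. Author"], "2008-10-01")
for statement in triples(xml):
    print(statement)
```

Because every statement names its subject by URI, descriptions of the same article produced by different services can in principle be merged.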


Imagine a scholarly article where the text is marked up in XHTML (eXtensible Hypertext Markup

Language) and extended with structured metadata expressed in RDF, mathematical formulae de-

scribed in MathML (Mathematical Markup Language) and graphics represented in SVG (Scalable Vector

Graphics). Markup languages for various fields are already available, for example CML (Chemical

Markup Language) or CellML (Cell Markup Language). These can be used to further describe the se-

mantic meaning of scholarly content. If software also adopts such standards it will increase interop-

erability of semantic information (Hannay 2007).

An XHTML or PDF version would then be just one possible (visual) representation of such an article. The same article could be processed by machines using the structured RDF and other markup information. This would extend the current text-based concept of scientific articles and open up new aspects of scholarly communication. For example, content from multiple sources could be combined and enriched to form a powerful new service (Bourne et al. 2008).

However, only a few applications currently make use of semantic markup. Similarly, the integration of semantic information with data in repositories is lacking (Fink & Bourne 2007). Nevertheless, there are some promising approaches and experimental projects that integrate semantic technologies.

A project experimenting with semantic markup in journal articles is BioLit (http://biolit.ucsd.edu),

which tries to integrate the text of articles with other bioinformatics information (Bourne et al.

2008).

Seringhaus & Gerstein (2007) propose an expanded publishing process in bioscience, extending the

text based article with a machine-readable XML based summary.

Another project is BibApp (Larson 2007), a mashup which matches researchers with their publica-

tions, disclosing collaborations and research communities.

YeastHub (Cheung et al. 2005) demonstrates how semantic web technologies can be used to inte-

grate data provided by different resources into a life science data warehouse.

The World Wide Web Consortium introduced RDFa (W3C 2008), a set of XHTML attributes to

augment visual data with machine-readable hints. Such integration of machine-readable content in

human-readable web pages enables standard web browsers to assist humans in interpreting seman-

tic information.
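The sketch below illustrates the idea on a hypothetical XHTML fragment. Human readers see the text unchanged, while the about, property and content attributes let a program extract statements about the article. This is a deliberate simplification, not a conforming RDFa processor. It only inherits the nearest about value and does not expand prefixed names such as dc:title into full URIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment: visible text for humans, RDFa attributes for machines
PAGE = """<div xmlns="http://www.w3.org/1999/xhtml"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.org/articles/42">
  <h1 property="dc:title">An example article</h1>
  <p>by <span property="dc:creator">A. Author</span>,
     published <span property="dc:date" content="2008-10-01">October 2008</span></p>
</div>"""

def extract_rdfa(xhtml):
    """Collect (subject, property, value) statements from @about/@property/@content.

    Simplified: the nearest @about is the subject and CURIEs are left unexpanded.
    """
    statements = []

    def walk(elem, subject):
        subject = elem.get("about", subject)
        prop = elem.get("property")
        if prop:
            # @content, if present, overrides the human-readable text
            value = elem.get("content", "".join(elem.itertext()).strip())
            statements.append((subject, prop, value))
        for child in elem:
            walk(child, subject)

    walk(ET.fromstring(xhtml), None)
    return statements

for statement in extract_rdfa(PAGE):
    print(statement)
```

Note how the date is shown to readers as "October 2008" but handed to machines in the unambiguous form given in the content attribute.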


The Object Re-Use and Exchange (ORE) project (Van de Sompel & Lagoze 2007) of the Open Archives Initiative (OAI) defines aggregations of distinct information units as compound units that, when combined, form a logical whole. It specifies mechanisms to represent and reference such aggregations in a machine-readable way.
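A resource map can be serialized as RDF/XML, as the minimal sketch below shows. It assumes the vocabulary of the published ORE 1.0 specification: ore:describes, ore:aggregates and the ore:Aggregation class in the namespace http://www.openarchives.org/ore/terms/. The map describes an aggregation, and the aggregation lists its parts. Here the parts are a hypothetical article PDF, data set and video.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("ore", ORE)

def resource_map(rem_uri, aggregation_uri, parts):
    """Serialize a resource map describing one aggregation of distinct parts."""
    root = ET.Element(f"{{{RDF}}}RDF")
    rem = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": rem_uri})
    # the resource map is a document that describes the aggregation
    ET.SubElement(rem, f"{{{ORE}}}describes", {f"{{{RDF}}}resource": aggregation_uri})
    agg = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": aggregation_uri})
    ET.SubElement(agg, f"{{{RDF}}}type", {f"{{{RDF}}}resource": ORE + "Aggregation"})
    for part in parts:
        # one ore:aggregates statement per constituent of the logical whole
        ET.SubElement(agg, f"{{{ORE}}}aggregates", {f"{{{RDF}}}resource": part})
    return ET.tostring(root, encoding="unicode")

# Hypothetical URIs for an article, its data set and a video presentation
rem = resource_map(
    "http://example.org/rem/42",
    "http://example.org/aggregation/42",
    ["http://example.org/42/article.pdf",
     "http://example.org/42/data.csv",
     "http://example.org/42/talk.mp4"],
)
print(rem)
```

The parts keep their own URIs, so each can still be cited or reused on its own while the aggregation identifies the whole.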

To support semantic web applications, digital content must be captured in ways that are signifi-

cantly different from conventional methods. The effort of providing the semantic information is an

extra task for researchers. Despite the promising projects mentioned above, there is still a lack of

tools that help authors to integrate such markup in their content.

3 Web 2.0 and Science

The term Web 2.0 was popularized by O’Reilly (2005) and comprises a wide range of concepts. A central principle is to treat the web as a platform on which services replace traditional client software. Another key factor is leveraging collective intelligence: the audience is involved and contributes to making services better. These concepts are built on what O’Reilly (2004) calls the “architecture of participation”. Such services generate network effects by harnessing the collective intelligence of their users.

The Web provides an environment that enables collaboration and communication. It is thus becoming a social environment, a community of collaborative interaction. The Web is already a global space where users do most of the work: they manage, post and categorize content. A Web 2.0 concept for categorizing content is tagging. Tagging classifies content informally by applying user-generated metadata to it, in contrast to structured methods like the ontologies used in Semantic Web applications.
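To make the contrast with ontologies concrete, the tags of many users can be represented as nothing more than a set of (user, tag, resource) assignments, a structure often called a folksonomy. The class below is a hypothetical minimal sketch. The names Folksonomy, tag, resources_for and tag_cloud are illustrative and not part of any existing service's API.

```python
from collections import Counter

class Folksonomy:
    """Hypothetical minimal tag store: who labelled which resource with which tag."""

    def __init__(self):
        self.assignments = set()  # (user, tag, resource) triples

    def tag(self, user, resource, *tags):
        for t in tags:
            # lower-case so that "OpenAccess" and "openaccess" count as one tag
            self.assignments.add((user, t.lower(), resource))

    def resources_for(self, tag):
        """All resources that any user has labelled with this tag."""
        return {r for (_, t, r) in self.assignments if t == tag.lower()}

    def tag_cloud(self, resource):
        """Number of distinct users who applied each tag to one resource."""
        return Counter(t for (_, t, r) in self.assignments if r == resource)

store = Folksonomy()
store.tag("alice", "article-1", "OpenAccess", "publishing")
store.tag("bob", "article-1", "openaccess")
store.tag("bob", "article-2", "semanticweb", "openaccess")
print(store.tag_cloud("article-1"))                 # Counter({'openaccess': 2, 'publishing': 1})
print(sorted(store.resources_for("OpenAccess")))   # ['article-1', 'article-2']
```

No vocabulary is defined in advance. The categories emerge from what users actually type, which is both the strength and the weakness of tagging compared with an ontology.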

Web 2.0 technologies even allow users to police themselves: commenting and rating capabilities allow abuse to be revealed. Even in science, where there is still some reluctance, the web will encourage the adoption of Web 2.0 applications like social networks and blogs. Hannay (2007) mentions that the remaining barriers to fully embracing such tools in science are not technical but social and psychological.

The scholarly communication system, and especially the established (journal) publishing process, is a network through which information moves between different actors (see chapter 1 section 4). Utilizing the Internet and its new Web 2.0 applications would enlarge that publishing network and enable researchers to (inter)act as real-time readers and writers, as consumers and producers of online scholarly content.

Web 2.0 concepts can be utilized in science in different ways (Butler 2005). First, services like social networks can influence the way science is done and help to increase collaboration in the research process itself, by providing new research environments and forms of scholarly communication. Such virtual communities and collaborative environments already exist. Sites like Nature Network (http://network.nature.com), SciSpace.net (http://www.scispace.net) or nanoHUB (http://www.nanohub.org) try to connect scientists worldwide with blogs, forums and discussion groups. These sites allow researchers to share and review preliminary findings and know-how and thereby accelerate research. Another approach is SciVee (Fink & Bourne 2007), which allows authors to upload an already published article together with a video or podcast presentation. The video can then be synchronized with the content of the article.

Secondly, Web 2.0 concepts can be used to enhance the publishing process by enabling post-publication commenting on and discussion of scholarly findings. A principle of Web 2.0 is that the audience decides what is important (O'Reilly 2005). Applied to science, this would mean that the readers of scholarly content act as a filter. Given the volume of new research findings, collective scanning and commenting will help to discover the most relevant material among the growing amount of new content.
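How readers' judgements could be turned into such a filter is an open design question. One common technique is a damped (Bayesian) average, sketched below under the assumption that readers rate articles on a 1 to 5 scale. Each article starts with a few imaginary average ratings, so a single enthusiastic vote cannot outrank a widely and consistently well-rated article. The prior values are arbitrary illustrative choices, and the function name rank_by_readers is hypothetical.

```python
def rank_by_readers(ratings, prior_mean=3.0, prior_weight=5):
    """Order articles by a damped (Bayesian) average of reader ratings.

    Every article starts with `prior_weight` imaginary ratings of `prior_mean`,
    so a handful of votes moves its score only a little.
    """
    def score(votes):
        return (prior_mean * prior_weight + sum(votes)) / (prior_weight + len(votes))
    return sorted(ratings, key=lambda article: score(ratings[article]), reverse=True)

# Hypothetical ratings on a 1-5 scale
ratings = {
    "article-1": [5],                          # one enthusiastic reader
    "article-2": [5, 4, 5, 5, 4, 5, 5, 4],     # many consistently good ratings
    "article-3": [2, 3, 2],
}
print(rank_by_readers(ratings))  # ['article-2', 'article-1', 'article-3']
```

With a plain average, article-1 would rank first on the strength of a single vote.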

Researchers must be aware of new research findings in their field; for them, new content is relevant content. Traditional relevance-based search engines (like Google) fail for new, incremental content. RSS (Really Simple Syndication) is a Web 2.0 technology used to push new content, allowing users not only to link to a webpage but to “subscribe to it, with notification every time that page changes” (O'Reilly 2005). This is especially helpful for chronologically ordered content like a journal’s table of contents or new entries in a weblog.

Several specifications are available: RSS 0.91, RSS 1.0, RSS 2.0 (http://www.rss-specification.com/rss-specifications.htm) and Atom (http://www.atompub.org/). All of them are based on XML. These XML documents are exposed on a web server and can be syndicated by RSS readers or other applications.
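The sketch below shows what an RSS reader does with such a document. It parses a hypothetical RSS 2.0 table-of-contents feed and returns only the items published since the reader last checked, newest first. It uses only the Python standard library, and the feed content and URLs are invented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 table-of-contents feed of a journal
FEED = """<rss version="2.0">
  <channel>
    <title>Example Journal: latest articles</title>
    <link>http://example.org/journal</link>
    <description>Table of contents</description>
    <item>
      <title>Earlier article</title>
      <link>http://example.org/journal/2</link>
      <pubDate>Wed, 01 Oct 2008 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Newest article</title>
      <link>http://example.org/journal/3</link>
      <pubDate>Mon, 06 Oct 2008 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Already seen article</title>
      <link>http://example.org/journal/1</link>
      <pubDate>Thu, 25 Sep 2008 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def new_items(feed_xml, last_seen):
    """Return (title, link) of items published after last_seen, newest first."""
    channel = ET.fromstring(feed_xml).find("channel")
    fresh = []
    for item in channel.findall("item"):
        # RSS 2.0 dates follow RFC 822, which the email module can parse
        published = parsedate_to_datetime(item.findtext("pubDate"))
        if published > last_seen:
            fresh.append((published, item.findtext("title"), item.findtext("link")))
    fresh.sort(reverse=True)
    return [(title, link) for _, title, link in fresh]

last_check = datetime(2008, 9, 30, tzinfo=timezone.utc)
for title, link in new_items(FEED, last_check):
    print(title, link)
```

The reader, not the publisher, keeps track of what has already been seen. This is what turns a static table of contents into a notification service.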


Figure 5: RSS feed carrying additional bibliographic metadata (Hammond et al. 2004)

Hammond, Hannay & Lund (2004) propose the use of RSS 1.0 for science publishers because it is an extensible format based on RDF (Resource Description Framework) with support for namespaces. It allows additional metadata such as Dublin Core or PRISM (Publishing Requirements for Industry Standard Metadata) to be included and can therefore also be used to exchange scholarly content.
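A minimal sketch of such a feed, built with Python's standard library, might look like the following. The journal, article and URLs are hypothetical. The PRISM namespace URI assumes PRISM version 1.2, and the choice of PRISM elements is illustrative; a publisher decides which ones to include. The prefixes (rdf, rss, dc, prism) are cosmetic, because XML tools match elements by namespace URI.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/1.2/basic/",  # assumed PRISM 1.2
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

def q(prefix, name):
    """Qualified name in ElementTree's {namespace}name notation."""
    return f"{{{NS[prefix]}}}{name}"

def rss10_feed(channel_uri, title, articles):
    """Build an RSS 1.0 feed whose items carry Dublin Core and PRISM metadata."""
    root = ET.Element(q("rdf", "RDF"))
    channel = ET.SubElement(root, q("rss", "channel"), {q("rdf", "about"): channel_uri})
    ET.SubElement(channel, q("rss", "title")).text = title
    ET.SubElement(channel, q("rss", "link")).text = channel_uri
    ET.SubElement(channel, q("rss", "description")).text = "Latest articles"
    # RSS 1.0 lists the items of a channel in an rdf:Seq ...
    seq = ET.SubElement(ET.SubElement(channel, q("rss", "items")), q("rdf", "Seq"))
    for a in articles:
        ET.SubElement(seq, q("rdf", "li"), {q("rdf", "resource"): a["link"]})
        # ... and describes each item as a separate RDF resource
        item = ET.SubElement(root, q("rss", "item"), {q("rdf", "about"): a["link"]})
        ET.SubElement(item, q("rss", "title")).text = a["title"]
        ET.SubElement(item, q("rss", "link")).text = a["link"]
        for name in a["creators"]:
            ET.SubElement(item, q("dc", "creator")).text = name
        ET.SubElement(item, q("prism", "publicationName")).text = a["journal"]
        ET.SubElement(item, q("prism", "volume")).text = a["volume"]
        ET.SubElement(item, q("prism", "startingPage")).text = a["page"]
    return ET.tostring(root, encoding="unicode")

feed = rss10_feed("http://example.org/journal", "Example Journal", [{
    "title": "An example article", "link": "http://example.org/journal/42",
    "creators": ["A. Author"], "journal": "Example Journal",
    "volume": "7", "page": "101",
}])
print(feed)
```

An ordinary RSS reader can ignore the dc and prism elements and still display the feed. A reference manager can use them to build a full citation.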

Another Web 2.0 concept that has gained wide momentum is the blog. Blogs offer informal and immediate forms of scholarly communication. Oppenheim, Greenhalgh & Rowland (2000, p.365) caution that scientific publishing must be distinguished from academic debate. A scientific article is not a blog post. Disseminating new ideas on a science blog encourages early discussion but does not fulfil the functions and role of a formal publication: blog posts are unfiltered and not reviewed. Therefore, the need for a final, peer-reviewed publication of research findings remains.

Scientific blogging is still a niche activity (Hannay 2007). “Scientists who blog see their activities as a useful adjunct to formal journals, not a replacement” (Butler 2005). Researchers still do not embrace the potential of blogs to share and discuss their findings ahead of publication, nor do they take part in open commenting on research papers (Liu 2007). Yet blogs could be a forum to discuss science informally and a way to get immediate feedback on new ideas. They are also an excellent way to communicate science better to the general public (Clarke 2008).


There are even more Web 2.0 applications that encourage participation and collaboration (Hannay

2007):

• wikis, websites that any visitor can add to and edit. Scientific wikis allow geographically distributed researchers to contribute to the creation of documents.

• social bookmarking sites used by researchers, such as del.icio.us (http://delicious.com) or Connotea (http://www.connotea.org), allow users to save links to research articles. They support tagging to organize and categorize content, and users can also view other users’ collections.

• virtual worlds (for example Second Life, http://secondlife.com) provide an environment where

people can meet and communicate in a virtual space. Some universities and publishers al-

ready have a presence there, exploring the potential of virtual worlds in education and other

areas like conferences.

4 Alternative Review Models

As outlined in chapter 1 section 5, there is broad agreement that peer review retains an important role even in the Open Access publishing era. However, new web based technologies and collaborative environments are challenging the traditional review process and point to new approaches that make the process cheaper and more effective. Ginsparg (2000) considers whether the established peer

review process “remains the most efficient way to organize the review and certification functions,

or if the dissemination and authentication systems can be naturally disentangled to create a more

forward looking research communications infrastructure.”

Anderson (2006) proposes to “democratize scientific publishing” by tapping the collective intelli-

gence of thousands of researchers and students in an abundant environment of online journals.

Bankier & Perciali (2008) state that in the Web 2.0 era “peer review is now everywhere—Google,

YouTube, Epinions, NetFlix. The academic community has worked this way for decades, and

scholars can now have the tools to do it faster and on a larger scale.”

As mentioned in the last section, Web 2.0 concepts like commenting and rating could harness the full power of the Internet in scientific communication. Such concepts would even allow authors to police themselves: every paper would be published, and the audience would decide what is to be taken seriously (Harnad 1998).


This approach, which Harnad (1996) calls peer commenting, requires a critical mass of readers who are willing to comment on the huge number of new papers, and it would eliminate the hierarchical filter of the current quality control mechanism (Harnad 1998). It is therefore currently not considered a full replacement for the established peer review system. It is, however, being tested successfully as an adjunct to the traditional peer review process.

The first form is post-publication commenting. At PLoS ONE, a journal published by the Public Library of Science (http://www.plosone.org), readers can annotate parts of an article’s text, comment on its overall content or rate it. PLoS still runs a conventional peer review process in addition. However, through these new feedback tools the paper gathers post-publication commentary that enriches the scientific debate around its content. PLoS even asks its reviewers to make their reviews public as a comment on the article after publication. Surridge, editor of PLoS ONE (quoted in Waldrop 2008), highlights the importance of peer reviewed papers and journals, but “they're

effectively just snapshots of what the authors have done and thought at this moment in time. They

are not collaborative beyond that, except for rudimentary mechanisms such as citations and letters

to the editor".

A pure approach, with post-publication commentary and no peer review, has been introduced by Nature Precedings (http://precedings.nature.com), which calls itself a citable archive for pre-publication research and preliminary findings and is therefore complementary to peer reviewed journals.

This is similar to what happens in physics on the arXiv e-print archive (http://arXiv.org), where readers send comments about preprint articles (Rowland 2002, p.255). However, a major objection to omitting refereeing entirely is the sheer volume of new research findings, which peer review helps to filter for high quality content.

The second form consists of readers or closed groups of specialists commenting on unrefereed papers, with these commentaries serving as a basis for formal publication (Harnad 1998). Nature (2006) launched a trial of such community peer review in 2006 by hosting submitted manuscripts on the Internet for public comment. The manuscripts are public while under review, but they are not formally published. The two major advantages of such methods are that they make research available immediately and allow multiple people to comment on papers (Benos et al. 2007).

However, the experiments at Nature and BioMed Central (Adie 2008) have shown that the percentage

of people who do leave comments is small. It seems that Waldrop (2008) is right in stating that “the

acceptance of any such measure would require a big change in the culture of academic science”.


Nevertheless, Foerster (2001) predicts that “it seems highly likely that such electronic forums will become the norm of the future”. Commentaries undoubtedly add value to already published scientific content, as long as reputable scholars take part (Rowland 2002, p.254).

Open review systems come in a wide range of types, from pure forms to combinations and tiered systems. At the Electronic Transactions on Artificial Intelligence (ETAI) journal, the peer review process is divided into two steps (Sandewall 2006): first, reviewing, a three-month period during which the peer community can provide feedback to the authors; and second, refereeing, a subsequent quality threshold before acceptance to the journal.
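The two ETAI steps can be pictured as a small state machine. The Python sketch below is hypothetical: the class, its state names and the 90-day approximation of the three-month review period are assumptions made for illustration, not a description of ETAI's actual system.

```python
from datetime import date, timedelta

# Assumption: the three-month open review period is approximated as 90 days.
REVIEW_PERIOD = timedelta(days=90)


class Submission:
    """Hypothetical two-stage ETAI-style workflow: open reviewing, then refereeing."""

    def __init__(self, title, posted_on):
        self.title = title
        self.posted_on = posted_on
        self.state = "reviewing"   # open to comments from the peer community
        self.comments = []

    def review_ends(self):
        return self.posted_on + REVIEW_PERIOD

    def comment(self, reviewer, text, on):
        if self.state != "reviewing" or on > self.review_ends():
            raise ValueError("the open review period has ended")
        self.comments.append((reviewer, text))

    def close_review(self, today):
        # Move to refereeing only once the review period is over.
        if self.state == "reviewing" and today >= self.review_ends():
            self.state = "refereeing"
        return self.state

    def referee(self, accept):
        if self.state != "refereeing":
            raise ValueError("refereeing starts only after the review period")
        self.state = "accepted" if accept else "rejected"
        return self.state


paper = Submission("Article A", date(2008, 1, 1))
paper.comment("colleague", "Please clarify the evaluation.", date(2008, 2, 1))
print(paper.close_review(date(2008, 2, 1)))   # still "reviewing"
print(paper.close_review(date(2008, 4, 1)))   # now "refereeing"
print(paper.referee(accept=True))             # "accepted"
```

The point of the separation is visible in the guards: community feedback is only accepted during the open phase, and the refereeing decision cannot be taken before that phase has ended.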

Another example is Frontiers (http://www.frontiersin.org/neuroscience), which has developed a tiered system in neuroscience: multiple rounds of independent and interactive review let papers gradually rise to their level of importance. Papers are published in the first tier, and the most outstanding research is additionally republished in the second tier.

A common benefit of new, Web-based forms of peer review is that they are faster than the traditional peer review process. Papers are immediately available to the scientific community for review and discussion. Sandewall (2006) even argues that journals using open review processes are likely to receive manuscripts of higher quality, because authors are more selective about what they submit.

Traditional peer review is a closed system in which the author does not know the identity of the referee, but the referee knows the name of the author. A variant of this system is double-blind refereeing, where the author’s identity is also hidden from the referees (Rowland 2002, p.248). The opposite approach is open peer review, where the referees’ identity is known to the author. BMJ (British Medical Journal, http://www.bmj.com) has adopted this system for its journals, arguing that “it seems wrong for somebody making an important judgment on the work of others to do so in secret” (Richard Smith 1999).

Another way of establishing the relevance of scientific work is to automatically analyze the citations of published articles. A major drawback of this approach, however, is that it only works with a significant time lag. And, as Arms (2002) notes, importance does not guarantee quality, although articles that are cited more often tend to be of good quality.
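In its simplest form, citation analysis reduces to counting incoming links in a citation graph. The Python sketch below uses invented article IDs and ignores refinements such as field normalization or citation windows; it only illustrates the basic counting step.

```python
from collections import Counter

# Hypothetical citation graph: citing article -> articles it cites.
CITATIONS = {
    "A": ["C", "D"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": ["C"],
}


def citation_counts(graph):
    """Count how many distinct articles cite each article (self-citations ignored)."""
    counts = Counter({article: 0 for article in graph})  # uncited articles score 0
    for citing, cited_list in graph.items():
        for cited in set(cited_list):      # count each citing article once
            if cited != citing:
                counts[cited] += 1
    return counts


print(citation_counts(CITATIONS).most_common(2))  # [('C', 3), ('D', 2)]
```

A newly published article enters such a graph with a count of zero and only accumulates citations as later articles appear, which is exactly the time lag mentioned above.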

All of the new concepts presented here have one thing in common: They mark a huge step forward

towards a more democratic review process where any member of the scientific community has the

ability to discuss new scholarly work.


Chapter 5

E-PUBLISHING SOLUTION

So far, many different aspects of scholarly publishing have been outlined. Technical, economic,

social, cultural and political issues related to the scientific publishing process have been discussed.

This theoretical part forms the background for the research presented in this last chapter, where a

technical publishing concept, based on the DSpace repository platform, is presented. The proposed

solution suggests an approach to enhance the existing DSpace Green Road repository model with

Gold Road Open Access journal concepts.

1 Mixing the Green and Gold Road

Repositories, the Green Road to Open Access, play a key role in the self-archiving of already published scholarly work. Although repositories have also been used to store preprints (unpublished articles) in order to allow fast access to new research, a repository is primarily a place where authors post their papers after peer review and publication (Bankier & Perciali 2008).

In this project, the functionality currently offered by repository software has been investigated. In a second step, an attempt has been made to incorporate new publishing concepts into such repositories and to build a system for the management of Open Access journals based on repositories.

Repositories provide a number of features which can also be essential for Open Access journals,

the Gold Road to Open Access: they are designed to post research articles, they allow easy submis-

sion of scholarly work, they index content and make it searchable, they assign persistent identifiers

to make content citable, they preserve data in the long term, they support standards like the OAI-

PMH (Protocol for Metadata Harvesting of the Open Archives Initiative) to make the content

available for harvesting, they provide detailed metadata for archived content and they allow meta-

data to be exposed via RSS feeds.

By assigning metadata to archived items, repositories provide different types of content categorization. Articles uploaded to a repository have already been classified during submission through keywords and other metadata such as author and title. Articles can then be grouped and retrieved

by collection, keyword, author or date. Repository content is already organized, which allows users


to search and access content via different routes. However, repositories are currently not able to represent journal volumes, issues and a table of contents with article sections. The common practice of packaging repository content into collections does not correspond to a journal’s organizational scheme.

In the first project phase, repository platforms were evaluated as to whether they offer the flexibility to extend the current repository functionality. After this evaluation phase, DSpace was selected as the repository platform upon which the new features are built. The submission and workflow architecture embodied in DSpace seems sufficiently flexible to support functions outside the core repository: the proposed solution makes it possible to organize repository content into journal issues. It enhances the repository with new publishing and production workflows: easy-to-handle content submission, long-term preservation of content, publishing of content into various formats, and improved organization of repository content by grouping it into journal issues. The result of all these enhancements has been a reduction in the time and effort from submission to final journal publication.

1.1 Existing Literature

Numerous authors have been calling for a strengthened role of repositories in future publishing

environments. University publishing activities in general may play a more important role in the fu-

ture. Brown, Griffiths & Rascoff (2007, p.3) argue “that a renewed commitment to publishing in its

broadest sense can enable universities to more fully realize the potential global impact of their aca-

demic programs, enhance the reputations of their specific institutions [...] There seems to us to be a

pressing and urgent need to revitalize the university’s publishing role and capabilities in this digital

age”.

One possible way to achieve that goal is to enhance repositories with publishing features. Reposito-

ries are a widely accepted method for archiving university and institutional work. As presented in

chapter 2 section 1.2, the fact that more than 90% of all scientific journals permit self-archiving has strengthened the Green Road to Open Access. Nevertheless, Guédon (2004) states that the two Open Access strategies, the Green and the Gold Road, are complementary approaches that could help each other by borrowing each other’s advantages.

Bailey (2006, p.22) suggests that repository platform software should be further optimized to more

fully support electronic document publishing functions, such as e-journal management or confer-

ence management systems.


Bankier & Perciali (2008) argue that “the best way forward, and the best way for the university to

reconnect with its core mission and to support Open Access publishing, is to rediscover the [re-

pository] as a place for authors. This requires an expanded sort of repository and a new way of pre-

senting it. […] we are no longer talking about a repository as only an archive for preservation and

access. Instead, we are talking about a repository as a full-featured scholarly research and publishing

system. […] there is a second way to reinvigorate the repository, one that takes the repository be-

yond self-archiving altogether, to repositories as a platform for the creation of peer-reviewed jour-

nals. […] The repository has up to now been limited to the Green (self-archiving) model of Open

Access. We suggest that it is a place for Gold Open Access as well: A platform for journals that are

Open Access from the start. […] With university support, and with simple and easy to use tools,

repositories can become the most viable alternative for scholars to create Open Access journals”.

Royster (2008) also suggests a role for repositories beyond that of archival storage, arguing that they

are “well suited to become online publishers giving voice to a wide range of authors normally ex-

cluded […] of the conventional publication routes”.

1.2 Multi Channel Publishing

The growth of electronic communication and web based applications together with the increasing

number of (Open Access) electronic journals has encouraged the development of journal manage-

ment software (chapter 2 section 1.2). Existing open source journal management systems—for

example OJS (Open Journal Systems), http://pkp.sfu.ca/ojs—already assist and improve all stages of

the journal publishing process. These stages include submission, peer review, copy editing and the

publishing of journal articles. Online journal management systems are often structured around the

workflow required to connect these stages (Willinsky 2006, p.224). However, this type of software

focuses on optimising the communication workflows in the journal production process.

Such software still does not interoperate with repository software. In addition, it neither simplifies the authoring and submission of articles nor seeks to reduce the effort of creating the final publication layout.

In repositories, content is predominantly archived in a layout-centric format like PDF. A PDF file is an exact copy of an article’s print representation. But the print format offers only one possible way to present scholarly content. To exploit the power of the Internet, scholarly publishing must support multi-format delivery.


Publishing journals in print and online requires output to multiple formats, at least PDF for print and XHTML for online use. In the future, scholarly content will also have to be available in additional formats, for example formats optimized for viewing on mobile devices. Scholarly content will not be limited to a single format and reading environment. Guédon (1994) already noted that in an electronic publishing environment print is “merely a subset of possible modes of materialization”.

Multi-channel publishing is possible if it is based on a single-source production workflow. Single-

source means storing content in a media independent, XML based format. Journal management

systems currently do not support single-source publishing workflows. The whole production proc-

ess—including steps like format conversion and fabricating the final layout—is not covered by such

software.

The proposed solution addresses these issues by providing multi-channel publishing features for the DSpace repository platform. To this end, scholarly content is archived in a suitable, XML based preservation format, which enables flexible publishing workflows in which different article representations such as XHTML and PDF are rendered dynamically.
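The single-source idea can be illustrated with one XML source rendered into two channels. The workflow described here uses XSLT for such transformations; since Python's standard library has no XSLT processor, this sketch performs the mapping with `xml.etree.ElementTree` instead. The `<article>`, `<section>`, `<heading>` and `<para>` vocabulary is hypothetical, not a standard schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical media-independent source; the vocabulary is not a standard schema.
SOURCE = """<article>
  <title>Mixing the Green and Gold Road</title>
  <author>Jane Doe</author>
  <section>
    <heading>Introduction</heading>
    <para>Repositories archive preprints.</para>
    <para>Journals organize content.</para>
  </section>
</article>"""


def to_xhtml(src):
    """Channel 1: render the source as an XHTML page."""
    art = ET.fromstring(src)
    html = ET.Element("html", xmlns="http://www.w3.org/1999/xhtml")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = art.findtext("title")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = art.findtext("title")
    ET.SubElement(body, "p", {"class": "author"}).text = art.findtext("author")
    for sec in art.findall("section"):
        ET.SubElement(body, "h2").text = sec.findtext("heading")
        for para in sec.findall("para"):
            ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")


def to_plain_text(src):
    """Channel 2: render the same source as plain text."""
    art = ET.fromstring(src)
    lines = [art.findtext("title"), "by " + art.findtext("author"), ""]
    for sec in art.findall("section"):
        lines.append(sec.findtext("heading").upper())
        lines.extend(para.text for para in sec.findall("para"))
    return "\n".join(lines)


print(to_xhtml(SOURCE))
print(to_plain_text(SOURCE))
```

Neither renderer modifies the source, so a third channel (for example a PDF route via XSL-FO) can be added later without touching the archived article.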

1.3 Organizing Content into Journals

Although storing preprint articles in a digital repository already increases their visibility, articles could get far more attention if they were organized into a journal issue. The idea of overlaying journals on a repository was first discussed by Ginsparg (1997), John Smith (1999) and Arthur Smith (2000). An overlay journal can offer further features such as additional formats or enhanced reading tools. Once a preprint article has been archived in the repository, it can be assigned to a journal issue. The article is then part of the issue and can be retrieved in different formats according to the journal’s look and feel.

Roosendaal and Geurts (1998) have identified the journal’s main functions as registration, certification, awareness and archiving (chapter 1 section 2). But the journal has an additional function: organizing content. Organizing content is a key task in a world of exponentially growing information, and it helps scholars cope with the huge volume of scientific information. Organized content is easier to find, which is a precondition for using or consuming it.

Organizing content in journals is a successful method tested over some 300 years. The publication

process acts as a filter by selecting appropriate content (chapter 2 section 2).


Journals combine different aspects of content organization. All articles published in a journal are of a certain quality. Journals publish new content and offer an aggregated collection of current research (Wellcome Trust 2003, p.1). Each issue represents a snapshot of research findings and knowledge at a specific date. Articles published in a journal relate to each other in terms of subject, publication date or other descriptive metadata, and they all relate to the specific field of science that the journal focuses on.

As mentioned in section 3, a published journal issue and the articles it contains perform a unique role, being an evaluated, final statement made by the authors (Mabe 2006, p.57). A journal issue brings many articles together in a fixed arrangement, and a print journal’s table of contents represents this order for a particular issue.
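The organization that plain repository collections lack (volumes, issues, sections and an ordered table of contents) can be modeled with a few small types. This Python sketch is a hypothetical data model, not the DSpace data model or the one implemented in this project; the class names and the handles in the example are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    handle: str          # persistent identifier of the archived repository item
    title: str
    authors: List[str]
    section: str         # journal section, e.g. "Research Articles"
    first_page: int


@dataclass
class Issue:
    volume: int
    number: int
    section_order: List[str]                        # fixed order of sections in the TOC
    articles: List[Article] = field(default_factory=list)

    def add(self, article):
        """Assign an already archived article to this issue."""
        if article.section not in self.section_order:
            raise ValueError(f"unknown section: {article.section}")
        self.articles.append(article)

    def table_of_contents(self):
        """List sections in their fixed order, articles sorted by first page."""
        lines = [f"Volume {self.volume}, Issue {self.number}"]
        for section in self.section_order:
            in_section = sorted(
                (a for a in self.articles if a.section == section),
                key=lambda a: a.first_page,
            )
            if in_section:
                lines.append(section)
                for a in in_section:
                    lines.append(f"  {a.title} ({', '.join(a.authors)}), p. {a.first_page}")
        return lines


issue = Issue(volume=1, number=2, section_order=["Editorial", "Research Articles"])
issue.add(Article("123456789/7", "Article B", ["Roe, R."], "Research Articles", 10))
issue.add(Article("123456789/3", "Article A", ["Doe, J."], "Editorial", 1))
print("\n".join(issue.table_of_contents()))
```

The article itself stays in the repository under its handle; the issue only records references and an order, which is what allows a preprint to be archived first and assigned to an issue later.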

Journals definitely have a place in the digital age. The functions provided by journals do not depend

on the actual distribution medium. The more important question is how we can preserve those valuable functions while simultaneously exploiting the advantages of new web technologies.

As mentioned above, a repository today is often a place for archiving content after publication. The proposed solution extends the repository with journal publishing features. The idea is to build journals out of a single large (preprint) archive, in contrast to the original overlay journal concept, which is built on distributed sources. The proposed concept suggests a simplified workflow for managing journals in DSpace. The aim is to help journal editors reduce the effort of creating journal issues and appropriate metadata out of already submitted articles. A further aim is to reduce the layout effort by generating a high quality journal print PDF automatically.

The goal of the project is to remove some of the overheads associated with publishing, and to

make it relatively easy and cost effective to set up and run journals on top of a repository.

In rapidly moving research fields, speed of publication is crucial. The proposed approach could help make new scholarly findings quickly accessible by making preprints immediately available in an archive and, in a subsequent step, rewarding excellent papers by bundling them into journal issues. This approach follows Hannay (2007), who states that “the web is particularly well suited to a publish then filter approach rather than the traditional filter then publish approach that was required when publishing was necessarily a physical-world process”.


2 Requirements

The project focus was on building a production quality system. Basic functionality of the system

should include features such as submission, content management, preservation, authentication,

technological stability and system security.

Furthermore, the system of choice should provide the following characteristics:

• reduction of time from manuscript preparation to publication (through a simplified author-

ing and submission process)

• convenient and fast editorial process

• ability to publish content into various formats from the same source

• increased interoperability

• integrated journal management workflow and automated production of a journal issue

PDF.

As the project aims to provide repository functionalities enhanced with journal management fea-

tures, some working models for such types of systems have been reviewed. For repository systems,

the Reference Model for an Open Archival Information System (OAIS) (Consultative Committee for Space

Data Systems 2002) describes what is required for an archive to provide long-term preservation of

digital information and interoperability (see also chapter 2 section 4.3).

For the publishing and journal management functionality, Koohang & Harman (2006) have developed a working model for open source communities involved in designing and developing Open Access e-journals.

Figure 6: A model for Open Access e-journals (Koohang & Harman 2006)


They propose three essential parts for academic e-journals: the communication platform, the content management platform, and the portal. Within these components, journals can include a wide spectrum of different functionality.

Applying those models to our concept would require the following communication features:

• clear permissions and roles for authors, editors and journal administrators as well as users of the system. The OAIS model speaks of the producers, management and consumers of a digital archive.

• simplified editorial and reviewer tasks

The content management part must include the following requirements:

• easy authoring and preparation of manuscripts that requires no knowledge of XML or

other technical skills

• simplified submission process, including automatic metadata extraction like title, author,

keywords and abstract

• deposit of items in a media independent preservation format

• support of non-textual objects like images, mathematical formulae etc.

• assignment of persistent identifiers to articles

• exposing article metadata for harvesting (interoperability)

Publishing articles online needs a deep understanding of how readers access and navigate through

the content, especially in a journal based environment. Therefore, the portal part of the system

should provide:

• immediately accessible content after submission and review

• open access to articles stored in the repository

• different publishing formats for archived items (PDF, XHTML)

• possibility of searching articles


• articles browsable per title, subject, date, author, journal volume and issue

• flexible setup and management of journal issues

• journal navigational structure (table of contents, journal sections)

• a merged, on-the-fly journal PDF

• RSS feeds for journal issues

• an improved reading experience through hyperlinked references

• interoperability with reference management software by exposing article metadata in an

open standard

3 Open Standards

Some standards and technologies are essential parts of the proposed solution and therefore a brief

overview of these standards is provided in this section before the solution architecture is outlined in

detail.

Using open standards is crucial for increasing the acceptance of institutional repositories. The

content stored in the system must be findable, easy to access and interoperable. Standards are a key

element in publishing. As different players are developing repository and publishing systems it is

fundamental to focus on interoperability (European Commission 2006, p.83), which eases data

exchange and improves the dissemination of scientific content. Open standards also ease the inte-

gration of other applications with the repository.

3.1 Metadata

Metadata is used to further describe and classify digital content. Each article in an institutional re-

pository has metadata associated with it. Applying metadata in a consistent way enables improved

navigation and discovery of digital material. For example, articles can be retrieved by browsing and

searching the content using associated keyword metadata. Therefore, the quality of metadata is an

important part of every repository (Awre 2006, p.60).


There are several metadata standards for different purposes. One of the most important, widely adopted for describing digital resources, is Dublin Core, maintained by the Dublin Core Metadata Initiative (http://dublincore.org). Dublin Core offers a set of fifteen metadata elements that can be used to describe various types of resources.
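As an illustration of the fifteen-element set, the Python sketch below checks element names against the Dublin Core Metadata Element Set (version 1.1) and serializes a simple record in the `dc` namespace, using this thesis as the described resource. The wrapper element `metadata` and the function name `dc_record` are assumptions; OAI-PMH, for example, wraps such elements in an `oai_dc:dc` container instead.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dc_record(fields):
    """Serialize {element: [values]} as simple Dublin Core XML (hypothetical wrapper)."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")
    for name in sorted(fields):            # stable element order
        for value in fields[name]:         # every element is repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dc_record({
    "title": ["Content Production Workflows in the Scientific Publishing Process"],
    "creator": ["Geyrecker, Andreas"],
    "date": ["2009-01"],
    "type": ["Text"],
}))
```

Rejecting unknown element names keeps records within the shared vocabulary; anything more specific (such as a journal volume) would need a richer schema like PRISM.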

The PRISM (Publishing Requirements for Industry Standard Metadata, http://www.prismstandard.org)

standard provides a vocabulary for managing published resources like magazine, news, book, and

journal content. PRISM also defines metadata that can be applied as inline markup in the resource

itself.

Another metadata standard related to library content is the Metadata Encoding & Transmission Standard (METS, http://www.loc.gov/standards/mets/), which has been developed to describe objects and their structure within digital libraries.

Besides preservation, metadata can be useful at the dissemination stage of the publishing process. If the metadata is part of the published material in a standards-based form, applications can read and process that information.

Repositories are used by thousands of institutions worldwide, which raises the question of how these distributed repositories can share information and be searched. Using the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) standard (Lagoze & Van de Sompel 2002), an application-independent interoperability framework, content providers (mostly research institutions) can expose the metadata of repository items, which is then harvested by service providers who add value to it, for example by indexing it and making it searchable.

Examples of such services are Google Scholar (http://scholar.google.com) and OAIster (http://www.oaister.org) at the University of Michigan. The OAI protocol uses a standard set of fifteen Dublin Core metadata elements (Willinsky 2006, p.241).
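A harvester's side of OAI-PMH can be sketched in a few lines of Python: it builds a `ListRecords` request for unqualified Dublin Core (`metadataPrefix=oai_dc`) and extracts identifiers and titles from the XML response. The base URL, the record identifier and the abbreviated response below are hypothetical (a real response also carries `responseDate` and `request` elements), and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def parse_records(response_xml):
    """Return ([(oai identifier, [titles])], resumption token or None)."""
    root = ET.fromstring(response_xml)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        # Deleted records carry a header but no metadata.
        titles = [t.text for t in dc.findall("dc:title", NS)] if dc is not None else []
        records.append((ident, titles))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Abbreviated, hypothetical ListRecords response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:123456789/1</identifier>
        <datestamp>2008-11-20</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Article A</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai/request", from_date="2008-01-01"))
print(parse_records(SAMPLE))
```

If a response contains a non-empty `resumptionToken`, a harvester would issue a follow-up request carrying only the `verb` and `resumptionToken` arguments to fetch the next batch of records.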

3.2 XML

XML (Extensible Markup Language, http://www.w3.org/XML) is a standard of the World Wide

Web Consortium (W3C). XML is a flexible, system-independent language for defining data and its

structure. XML focuses on the content and not the layout of a document. The major benefit of

XML is its extensible syntax. Various XML vocabularies for different—data or document centric—

purposes have been defined, such as XHTML, DocBook, RSS, RDF or SVG. If existing vocabular-


ies are not sufficient, XML allows creation of user-defined schemas. Such schemas or document

type definitions (DTDs) specify the structure and markup declarations of an XML document.

The XML standard has also been adopted as a main technology in the publishing process from

manuscript preparation to multi channel delivery (European Commission 2006, p.83). Processing

XML documents in publishing applications improves and simplifies the publishing process and

allows setting up flexible, media independent workflows. Using a consistent format like XML also

encourages a more consistent usage of content. The concept of separation of content and layout

enables multiple output format delivery. However, currently there is no standard XML vocabulary

to structure the full text of scientific articles.

Another important technology is XSLT (Extensible Stylesheet Language Transformations). XSLT is a powerful style sheet language for transforming XML into a different XML structure or into other formats such as HTML or plain text.

XSL-FO (Extensible Stylesheet Language for Formatting Objects) is an XML vocabulary for specifying

formatting semantics, which can be post-processed into page oriented formats like PDF, PostScript

or RTF. With the help of XSLT, arbitrary XML structures can be transformed into XSL-FO. The

XSL-FO representation must then be fed into an XSL-FO processor, which renders the formatting objects into PDF or PostScript.
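To show what an XSL-FO document looks like, the Python sketch below builds a minimal one: a single A4 page master, one page sequence and one block per paragraph. In the workflow described here this output would be produced by an XSLT stylesheet; generating it with `xml.etree.ElementTree` is only a stand-in, and the page dimensions, margins and font settings are arbitrary choices.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Qualified name for an element in the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def article_to_fo(title, paragraphs):
    """Build a minimal XSL-FO document: one A4 page master, one flow of blocks."""
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-width": "210mm",
        "page-height": "297mm", "margin": "20mm",
    })
    ET.SubElement(page, fo("region-body"))
    # The page sequence refers to the page master by name.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    heading = ET.SubElement(flow, fo("block"), {
        "font-size": "16pt", "font-weight": "bold", "space-after": "12pt",
    })
    heading.text = title
    for text in paragraphs:
        ET.SubElement(flow, fo("block"), {"space-after": "6pt"}).text = text
    return ET.tostring(root, encoding="unicode")


print(article_to_fo("Article A", ["First paragraph.", "Second paragraph."]))
```

Saved to a file, this output could be passed to an XSL-FO processor such as Apache FOP, for example with `fop -fo article.fo -pdf article.pdf`.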

3.3 Persistent Identifiers

Stable references and citations play a critical role in scientific publishing. They provide information

about precedence and importance of scholarly work. In a more technical sense each digital object

must have an identifier which must be unique and accessible in the long term. Improving reference

quality is an important task for publishers. Traditional URLs (Uniform Resource Locators) have several limitations (Sun et al. 2003) and are therefore not a viable option for tracking digital information: they can change, and they are then no longer valid identifiers. Persistent digital object identifiers provide unique names for digital objects and allow archived scientific material to be retrieved in the future. They should be preferred to URLs (European Commission 2006, p.83).

As scholarly objects comprise more than just the article full text, for example additional images, equations, data sets or multimedia content, each of these article components should be accessible through a unique identifier.


Several schemes exist upon which persistent identifiers can be built (DiLauro 2004). The Digital

Object Identifier (DOI) system (http://www.doi.org) is an accepted naming convention for digital

entities. A DOI name contains a prefix and a suffix, for example doi:10.1038/news070618-15.

CrossRef (http://www.crossref.org) is the largest DOI registration agency that registers scholarly

and professional research content. According to its website, researchers can rely on links to the millions of scientific articles registered in CrossRef.

Another schema is the CNRI (Corporation for National Research Initiatives) Handle system (Sun et

al. 2003). The Handle system, for example, is used as the persistent identifier associated with each

item by the DSpace repository platform (McKenzie Smith et al. 2003). The Handle system includes

an open protocol, a namespace, and a reference implementation of that protocol. The protocol

enables storage of names (handles) of digital resources in a distributed computer network and re-

solves those handles into the information necessary to locate, access, and otherwise make use of the

resources (Sun et al. 2003). Handles use a syntax similar to that of DOIs, with a numerical prefix representing the handle's naming authority and a suffix which can be an arbitrary string:

<Handle> ::= <Handle Naming Authority> "/" <Handle Local Name>

for example 10.1045/january99-bearman.
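Because both DOIs and handles follow the pattern naming authority, slash, local name, an identifier can be split into its two parts without a resolver. The following Java sketch is a hypothetical helper (the class and method names are not part of any DOI or Handle library); it splits at the first slash only, since the local name may itself contain slashes, and it does not check whether the identifier is actually registered.

```java
/** Hypothetical helper: splits a handle or DOI into naming authority and local name. */
final class HandleName {
    final String namingAuthority;
    final String localName;

    private HandleName(String namingAuthority, String localName) {
        this.namingAuthority = namingAuthority;
        this.localName = localName;
    }

    /** Parses "authority/local name"; an optional "doi:" scheme prefix is stripped first. */
    static HandleName parse(String identifier) {
        String s = identifier.startsWith("doi:") ? identifier.substring(4) : identifier;
        int slash = s.indexOf('/'); // only the first slash separates the two parts
        if (slash <= 0 || slash == s.length() - 1) {
            throw new IllegalArgumentException("Not of the form <authority>/<local name>: " + identifier);
        }
        return new HandleName(s.substring(0, slash), s.substring(slash + 1));
    }
}
```

For example, parsing doi:10.1038/news070618-15 yields the naming authority 10.1038 and the local name news070618-15.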

3.4 Authoring and Preservation Formats

Scientific publishing lacks a comprehensive authoring standard. As Ginsparg (2007) notes, little progress has been made in the development of underlying document formats. In the natural sciences, articles are mostly written in LaTeX. Other popular authoring environments are word processors like Microsoft Word or OpenOffice.org Writer. However, in the majority of cases the documents are not preserved in their authoring format. They are converted into a format that meets the needs of the documents' distribution channel. For print delivery, PDF is the most widely accepted format.

The PDF representation created for print delivery is mostly used for preservation as well. Keeping a layout-centric format like PDF ensures that documents are preserved exactly as they were created. This means that preservation is currently a stage downstream of distribution.


The proposed solution will turn the order of stages around, with preservation preceding distribu-

tion. Articles are archived in the repository in a suitable preservation format and subsequently dis-

tributed into various channels.

One can argue that converting submitted papers to a different preservation format may result in the loss of important semantic or structural information. But if the conversion is set up carefully and formats and processes are based on open standards, storing the converted file together with the conversion rules is, in effect, equivalent to storing the original file.

Evaluation work has been carried out to identify the most suitable authoring and preservation for-

mat. The proposed workflow is based on an authoring process using the OpenDocument Format (ODF).

ODF has been chosen because it allows a complete article, including embedded images and objects (like equations), to be uploaded in a single step. The free OpenOffice.org Writer word processor uses ODF as its native format. In addition, OpenDocument is an open OASIS standard (Organization for the Advancement of Structured Information Standards, www.oasis-open.org). OpenDocument files use an XML-based format and can therefore be converted into other XML formats using standard XML technologies. Finally, the OpenDocument format preserves embedded formulae in MathML (Mathematical Markup Language).

In his detailed analysis of the preservation of word processing documents, Barnes (2006b) defines a “preservation format as one suitable for storing a document in an electronic archive for a long period. An access format is one suitable for viewing a document or doing something with it”. Using an XML-based format helps to achieve long-term preservation: XML formats are text based and therefore remain accessible and readable in the future.

After DocBook (www.docbook.org) and XHTML (http://www.w3.org/TR/xhtml1) had been identified as viable options, XHTML was finally chosen as the preservation format for the repository. XHTML is not only a viable preservation format but also an excellent access format, which can easily be viewed in any web browser (Barnes 2006b). It is a W3C standard and is stored as plain text, so it is readable even in a simple text viewer. It offers sufficient means to represent the document structure of research articles.

XHTML supports the embedding of additional (semantic) markup from separate namespaces, for example MathML or SVG. XHTML can also be extended with structured metadata expressed in RDF (http://www.w3.org/TR/xhtml-rdfa-primer). This allows value to be added to the article content in a standardized, machine-readable format. Because XHTML documents are text based, incompatibilities with future technology are minimized. In addition, as document presentation technologies evolve, new applications or transformation style sheets can be developed to render the documents using the new technologies.

4 Solution Architecture

This section proposes an architecture based on the analysis and requirements presented in earlier sections. The system is based on the DSpace repository platform software (www.dspace.org) and extends it with multi-channel publishing and journal management features. The core concepts of the DSpace repository software are therefore explored in the next subsection.

Figure 7 gives an overview of the solution architecture. The approach simplifies the authoring and submission process, with the goal of reducing the authors' workload in paper preparation. Authors submit their articles in a simplified submission step. During submission, articles are converted to and preserved in XHTML and further described by extracting semantic information. Repository metadata fields are pre-populated with the extracted information. The consistent availability of this information helps to increase interoperability and allows computers to read it. During submission, authors receive immediate feedback through various content previews.

After approval, articles are available to the public in a Preprints collection. The multi-channel publishing options provide article representations in various formats. Selected contributions can be aggregated into journal issues and published in a journal-specific layout.

Figure 7: DSpace based publishing solution's core components


Publishers need capabilities (see chapter 1 section 4) that go beyond the technical solution proposed here. They perform various tasks (Mabe 2006, p.60), from manufacturing a publication to distributing the final publication. They add value, communicate with all actors in the publishing cycle and provide services like marketing. A technical solution like the one presented in this project will therefore not work out of the box; it has to be embedded in an organizational environment where all those professional publishing conditions are met.

4.1 The DSpace Repository

In a first phase, several repository software platforms were evaluated, among them EPrints (http://www.eprints.org), developed by the University of Southampton, Fedora (http://www.fedora.info), developed by Fedora Commons, and DSpace (http://www.dspace.org). DSpace was chosen for its comprehensive “out of the box” functionality and ease of implementation. Its open source approach allows the software to be fully customized to suit specific needs. DSpace allows the user interface to be customized, the default language to be changed, and metadata schemes to be customized or added. It offers clean extensibility: extra functionality can be added without worrying about the rest of the system. Another argument for choosing DSpace is its large and active developer community.

DSpace is a digital repository platform jointly developed by the MIT Libraries and Hewlett-Packard (HP) (McKenzie Smith et al. 2003). It is an open source solution for accessing, managing and preserving scholarly works, and it addresses the need of institutions and libraries to collect and archive digital content. One of the core goals of DSpace is to make the archived content more visible and easier to access. DSpace can preserve various types of digital content, for example text, datasets, images and presentations; in practice, however, the most common formats are Word and PDF documents.

About 350 DSpace repositories worldwide (Figure 8) are currently listed as registered installations.

DSpace is mostly used as an institutional repository, but DSpace can be used for a wide variety of

purposes.


Figure 8: geographical distribution of DSpace repositories (http://maps.repository66.org/).

DSpace provides a comprehensive set of services. It offers a web-based user interface, which also allows web-based submission of digital material. Each item has one metadata record, based on the Dublin Core metadata standard (McKenzie Smith et al. 2003). The metadata is indexed for searching and browsing, and the content is preserved in the repository in the long term. DSpace supports the export of metadata via OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) to make the content available for harvesting. Users can search and retrieve repository content. Because DSpace assigns a persistent identifier to each item, items remain citable over very long time spans.
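To illustrate what harvesting via OAI-PMH looks like from the client side, the following Java sketch builds a ListRecords request for Dublin Core records. The verb, metadataPrefix and set parameters and the oai_dc prefix are defined by the OAI-PMH specification; the class name, the base URL and any set name used with it are hypothetical, and the actual endpoint depends on how a DSpace installation is deployed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds an OAI-PMH ListRecords request URL. */
final class OaiPmhRequest {
    /**
     * @param baseUrl        the repository's OAI-PMH base URL (installation specific)
     * @param metadataPrefix the metadata format, e.g. "oai_dc" for unqualified Dublin Core
     * @param set            an optional set for selective harvesting, or null
     */
    static String listRecords(String baseUrl, String metadataPrefix, String set) {
        StringBuilder url = new StringBuilder(baseUrl)
            .append("?verb=ListRecords&metadataPrefix=")
            .append(URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8));
        if (set != null) {
            url.append("&set=").append(URLEncoder.encode(set, StandardCharsets.UTF_8));
        }
        return url.toString();
    }
}
```

A harvester would issue this request over HTTP and follow the resumptionToken elements in the responses to page through large result sets.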

DSpace distinguishes several types of resources (dspace.org 2006) (Figure 9):

• a Bitstream represents a single file uploaded to the DSpace system. A bitstream can be of any supported format, such as an XML, text, Word, Excel or PDF document, an image, a video or a presentation.

• a Bundle represents a related group of bitstreams that together form one deposit. For example, an HTML document can consist of a list of bitstreams (the HTML file and the images it includes).

• an Item groups one or more bundles with their metadata, i.e. related content together with its associated metadata. An item's exposed metadata is indexed for browsing and searching.

• a Collection groups related items, for example a collection for working papers. Items are organized in a hierarchical structure based on such collections.

• a Community is a group of collections that share a common subject. Communities form the highest level of the DSpace content hierarchy and typically correspond to parts of the organization, such as departments.

• an E-person is a registered user in DSpace. E-persons can have different roles, such as collection manager, item submitter, item reviewer, metadata editor or administrator. Such roles can be defined as DSpace groups to which E-persons are assigned. DSpace has extensive authorization capabilities which allow permissions to be set on a per-collection basis. Many of DSpace's features, such as document discovery and retrieval, can be used anonymously, but users must be registered to perform functions such as submission or administration (Michael J. Bass et al. 2002).

Figure 9: DSpace data model (dspace.org 2006)
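The containment hierarchy of the data model can be sketched with a few plain Java classes. This is a simplified, hypothetical model for illustration only, not the DSpace API (whose content classes live in the org.dspace.content package); it ignores item mapping, so an item mapped into several collections would be counted more than once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the DSpace containment hierarchy. */
final class DSpaceModelSketch {
    static final class Bitstream {
        final String name;
        final String mimeType;
        Bitstream(String name, String mimeType) { this.name = name; this.mimeType = mimeType; }
    }

    static final class Bundle {
        final String name; // e.g. the bundle holding the original files
        final List<Bitstream> bitstreams = new ArrayList<>();
        Bundle(String name) { this.name = name; }
    }

    static final class Item {
        final Map<String, List<String>> metadata = new LinkedHashMap<>(); // e.g. "dc.title" -> values
        final List<Bundle> bundles = new ArrayList<>();
    }

    static final class Collection {
        final String name;
        final List<Item> items = new ArrayList<>();
        Collection(String name) { this.name = name; }
    }

    static final class Community {
        final String name;
        final List<Community> subCommunities = new ArrayList<>();
        final List<Collection> collections = new ArrayList<>();
        Community(String name) { this.name = name; }
    }

    /** Counts all bitstreams below a community by walking the hierarchy top-down. */
    static int countBitstreams(Community community) {
        int n = 0;
        for (Community sub : community.subCommunities) {
            n += countBitstreams(sub);
        }
        for (Collection collection : community.collections) {
            for (Item item : collection.items) {
                for (Bundle bundle : item.bundles) {
                    n += bundle.bitstreams.size();
                }
            }
        }
        return n;
    }
}
```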

DSpace is Java based and offers a comprehensive API (Application Programming Interface). The

system requires a Java servlet engine (such as Tomcat http://tomcat.apache.org). All the metadata

is stored in a relational database (either PostgreSQL or Oracle). The bitstreams are stored in the file

system. For a more detailed description of the DSpace system architecture, see McKenzie Smith et al. (2003).


In March 2008, DSpace 1.5 was released. This version introduced many new features, including the fully customizable Manakin user interface (Phillips et al. 2007). Manakin introduces an interface layer which effectively replaces the JSP (JavaServer Pages) based interface system. Manakin allows the repository interface to be customized according to specific needs; the look and feel of the repository can even be adapted on a per-collection basis.

Figure 10: DSpace technical architecture including the new Manakin user interface (Phillips et al. 2005)

Manakin is based on Apache Cocoon (Mazzocchi 2002). The basic concept of Cocoon is the separation of style (layout) from content, which is accomplished by transforming XML-based content with XSL style sheets. Cocoon is a powerful and scalable XML framework built on the concept of pipelines. A pipeline is assembled from three types of components: a generator produces SAX (Simple API for XML) events that represent the input XML, transformers alter the stream of SAX events, and a serializer finally creates an output stream which is sent to the requesting client. Cocoon offers additional components like selectors, views, readers and actions (Phillips et al. 2005), which are not described in detail here. Pipelines are assembled in the Cocoon configuration file, called the sitemap.


<map:pipeline>
  <map:match pattern="hello.html">
    <map:generate src="docs/samples/hello-page.xml"/>
    <map:transform src="stylesheets/page/simple-page2html.xsl"/>
    <map:serialize type="html"/>
  </map:match>
  ...
</map:pipeline>

The sample pipeline above matches requests for the URI hello.html, generates SAX events from the input XML (hello-page.xml), transforms that XML by applying the XSL style sheet simple-page2html.xsl and serializes the output as HTML.

This architecture means that a final DSpace repository page is generated through the sequential

arrangement of components along a pipeline (Phillips et al. 2007).
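The generator, transformer and serializer roles can be reproduced outside of Cocoon with the standard JAXP and SAX APIs. The following Java sketch is not Cocoon code: a SAX parser acts as the generator, a hypothetical XMLFilterImpl subclass that renames para elements to p acts as the transformer, and an identity Transformer acts as the serializer. The class name and the renaming rule are purely illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical generator -> transformer -> serializer chain built from plain SAX/JAXP. */
final class SaxPipelineSketch {
    /** "Transformer" stage: renames every <para> element to <p>, passing all other events through. */
    static final class RenameParaFilter extends XMLFilterImpl {
        RenameParaFilter(XMLReader parent) { super(parent); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if ("para".equals(qName)) {
                super.startElement(uri, "p", "p", atts);
            } else {
                super.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if ("para".equals(qName)) {
                super.endElement(uri, "p", "p");
            } else {
                super.endElement(uri, localName, qName);
            }
        }
    }

    static String run(String inputXml) {
        try {
            // "Generator" stage: a SAX parser that emits events for the input XML
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader generator = spf.newSAXParser().getXMLReader();
            XMLReader transformer = new RenameParaFilter(generator);

            // "Serializer" stage: an identity transform that writes the event stream as text
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new SAXSource(transformer, new InputSource(new StringReader(inputXml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }
}
```

As in Cocoon, no intermediate document tree is built: each stage only sees the SAX events handed to it by the previous one.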

Figure 11: The Manakin architecture (Phillips et al. 2007)

Manakin uses this pipeline concept to create its three major architectural components (Figure 11): the DRI schema, aspects, and themes. A final DSpace page is created in two steps. In the content generation step the XML document is built, and in the style application step that XML document is transformed and styled for output. Content generation is performed by aspect chaining, while style application is performed by a theme (Digital Initiatives 2005).

The Digital Repository Interface (DRI) is a semantic representation of a DSpace repository page. This

generic XML structure contains metadata and structure of any repository page and complies with

the DRI schema (Figure 12).


Figure 12: Manakin DRI schema (Digital Initiatives 2005)

The metadata section contains information about the authenticated user and about the repository itself. Within the DRI page, DSpace items are referenced as METS (Metadata Encoding & Transmission Standard, http://www.loc.gov/standards/mets) objects. METS is being developed by the Digital Library Federation and provides a standard for wrapping metadata together with DSpace bundles and bitstreams. The first implementation of Manakin encodes Dublin Core metadata inside the METS objects (Phillips et al. 2005).

Manakin aspects are arrangements of Cocoon components that implement new features for the repository (Digital Initiatives 2005). Aspects can include Java classes or simple XSL transformations. An aspect expects a DRI XML document as input, modifies or adds content to this DRI page and generates a DRI document as output. This allows aspects to be chained together. Manakin thus supports modular extensions, which allow existing features to be modified or new features to be created. In this way, anything from minor display changes to new workflow functionality can be added (Phillips et al. 2007).

Themes allow the customization of the look-and-feel of a repository. This is achieved by applying

XSL and CSS style sheets to the final DRI page (see Phillips et al. 2007 for detailed information on

the Manakin concepts). Themes typically produce XHTML for web consumption but it would be

possible to generate other formats like PDF or SVG (Phillips et al. 2005).

4.2 DSpace Customizations

The proposed solution requires some customization of the DSpace platform. However, only minor

changes have to be made to a default DSpace 1.5 installation.


The system is built upon a hierarchical DSpace structure:

Figure 13: DSpace repository structure

Library: the root community

All Articles: the main collection for all articles, both preprints and articles published in journals

Preprints: the collection to which items are mapped after acceptance and before publication in a journal.

Journals: the top-level community of all journals. Its child communities are the actual journals (for example UPGRADE), which in turn hold the issue collections.

To support additional journal-related metadata, the PRISM metadata schema (namespace: http://prismstandard.org/namespaces/basic/2.0) has been added to DSpace. Figure 14 shows the new metadata fields. These fields (prism:issueIdentifier, prism:issueName, prism:number, prism:publicationDate, prism:publicationName, prism:section, prism:startingPage, prism:volume) are used in the journal management workflow.

Figure 14: PRISM metadata fields


To allow on-the-fly generation of PDF, Apache FOP (http://xmlgraphics.apache.org/fop) has been integrated into DSpace. FOP is a print formatter driven by XSL formatting objects (XSL-FO). This has been accomplished by adding a new serializer to Manakin's main Cocoon sitemap (see appendix, Listing 14) and by adding the required JAR files (fop.jar, xmlgraphics-commons-1.3.1.jar and fop-hyph.jar for hyphenation support) to the lib directory of the DSpace web application (webapps\xmlui\WEB-INF\lib).

To allow MathML support, the JEuclid FOP plugin has been installed (see

http://jeuclid.sourceforge.net/jeuclid-fop/index.html).

We replaced the default XSLT processor used by Manakin with the XSLT 2.0 compliant Saxon processor (http://saxon.sourceforge.net/). This allows us to use features introduced in XSLT 2.0, for example regular expression handling, in the conversion and publishing process (appendix, Listing 12 and Listing 13).

The Manakin concepts of aspects and themes allow us to package modifications cleanly, without changing existing functionality.

These flexible concepts are used to build the journal management and publishing extensions: a Manakin theme has been developed which provides a distinct view of the repository and enables the multi-channel publishing pipelines, and a new Manakin aspect provides the journal management functionality.

The solution can be adapted to handle additional input formats (for example MS Office Open XML) or different preservation or output formats by changing the underlying XSL style sheets.

5 System Implementation

The system workflow (Figure 15) is based on a two-tier publishing process:

In the first tier, articles are archived in the repository after submission and acceptance and mapped into the Preprints collection, where they are immediately accessible to the scientific community. In the second tier, selected contributions can be rewarded by being packaged into a journal issue. When an article is published in a journal, it is unmapped from the Preprints collection.
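The mapping logic of the two tiers can be sketched in a few lines of Java, using plain sets of item handles in place of DSpace collections. The class name, method names and handle values are hypothetical; in the actual system these steps operate on DSpace collections through the journal management aspect.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of the two-tier mapping, with sets of item handles standing in for collections. */
final class TwoTierPublishing {
    final Set<String> allArticles = new LinkedHashSet<>();
    final Set<String> preprints = new LinkedHashSet<>();

    /** Tier 1: an accepted article is archived in All Articles and mapped into Preprints. */
    void accept(String itemHandle) {
        allArticles.add(itemHandle);
        preprints.add(itemHandle);
    }

    /** Tier 2: a selected article is mapped into an issue collection and unmapped from Preprints. */
    void publishInIssue(String itemHandle, Set<String> issueCollection) {
        if (!allArticles.contains(itemHandle)) {
            throw new IllegalStateException("Item has not been accepted: " + itemHandle);
        }
        issueCollection.add(itemHandle);
        preprints.remove(itemHandle); // the item itself stays archived in All Articles
    }
}
```

Because publishing only changes mappings, the archived item and its identifier remain the same in both tiers.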


All articles, whether preprints or articles already published in journals, can be disseminated in different formats. The layout can be adapted on a per-collection basis, which allows journal-specific document presentation.

Setting up DSpace requires some experience with web technologies. A basic procedure along with all associated customizations is outlined in the appendix.

Figure 15: submission and publishing workflow


5.1 Submission Workflow

5.1.1 Authoring

As presented in section 3.4, the authoring process is based on the OpenDocument format (ODF). Providing an authoring template allows a consistently structured manuscript to be produced. Having manuscripts in a compliant form reduces the time and effort needed to convert them into the target preservation format.

Figure 16: OpenDocument format authoring template

The process of authoring an article remains roughly the same as it is today. Following simple authoring guidelines, the author structures the document by applying custom styles for the title, author, affiliation, abstract, index terms, headings (document structure), tables, image captions and references. The template helps the author write a paper that conforms to the specified structure, including appropriate metadata. Images must be embedded in the correct dimensions and quality. Image captions must be marked up using the word processor’s built-in functionality.


Sefton (2007a) reports that authors do use word-processing templates properly when they receive feedback: “if you provide a feedback loop so people can see their document converted to HTML, then they do use and even like templates.” Templates must not be excessively complex, and authors must benefit from using them. The presented solution offers extensive feedback options immediately after a manuscript has been submitted to the system, which are discussed in more detail in section 5.1.2. Furthermore, the template should ease both the paper creation process, by eliminating a large portion of layout work, and the submission process, by allowing semantic information to be extracted automatically.

OpenDocument files are XML-based and stored in JAR (Java Archive) format. A JAR file is a compressed ZIP file with an additional manifest file that lists the contents of the archive. An unzipped ODF file contains the following files (Eisenberg 2006):

• mimetype: the MIME type of the document

• content.xml: the actual content of the document

• styles.xml: information about the styles used in the content

• meta.xml: metadata about the content of the document

• settings.xml: application-specific settings (such as zoom factor, headers and footers etc.)

• META-INF/manifest.xml: a list of all the other files in the JAR, providing metadata about the entire JAR file

• Pictures: a directory that contains all images embedded in the document

The most important files for replicating the semantics of an ODF file are content.xml, meta.xml and styles.xml. styles.xml contains information about the default and named styles that have been predefined in the authoring template. If an ODF file contains additional objects (like equations), they are stored in subdirectories, each with its own content.xml file.
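Because an ODF file is an ordinary ZIP archive, its parts can be read with java.util.zip from the Java standard library. The following sketch uses hypothetical class and method names: readEntries loads every file entry (content.xml, styles.xml, Pictures/..., and so on) into memory, and the zip helper only builds a small demonstration package. Unlike a real ODF writer, the helper does not store the mimetype entry first and uncompressed, and it writes no manifest.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: reads the entries of an ODF package (a ZIP archive) into memory. */
final class OdfPackage {
    /** Returns entry path -> bytes for every file entry, e.g. "content.xml" or "Pictures/fig1.png". */
    static Map<String, byte[]> readEntries(InputStream odf) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            byte[] chunk = new byte[8192];
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(chunk)) != -1) { // read() returns -1 at the end of the entry
                    buf.write(chunk, 0, n);
                }
                entries.put(e.getName(), buf.toByteArray());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }

    /** Demonstration helper: zips path -> text pairs in memory (not a valid ODF writer). */
    static byte[] zip(Map<String, String> files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(f.getKey()));
                zos.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }
}
```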

5.1.2 Content Submission

In the electronic publishing era, authors are expected to submit their articles to journals in electronic form. A simple submission process is a crucial factor in increasing scholars' acceptance of Open Access repositories. Although Carr and Harnad (2005) found that a highly active researcher spends only about 40 minutes per year submitting papers, the task of entering the metadata required by repositories places an unnecessary burden on the submitter. Furthermore, most systems shift much of the burden onto the author, some even expecting camera-ready papers.

In a standard DSpace configuration, a large component deals with the submission process. A web user interface with several interactive steps guides the user through submission. Registered users can submit content to a DSpace collection by providing mandatory and optional metadata. A DSpace item is created from that information, and one or more uploaded files (bitstreams) are attached to it. The Dublin Core metadata elements are part of the standard configuration, but other metadata standards can be added, as described in section 4.2. After reviewing the item metadata and the associated files, the user must accept the collection’s license.

With version 1.5, DSpace introduced a new, more configurable item submission system (Donohue 2007). It allows submission steps to be rearranged, removed or added, and it allows different submission processes to be defined on a per-collection basis.

Nevertheless, the default submission process in DSpace 1.5 is the same as in previous versions: a fixed sequence containing the steps described above and shown in Figure 17.

Figure 17: traditional submission process

The steps have been rearranged by placing the license and upload steps at the beginning of the submission process.

Figure 18: proposed submission process

This has been accomplished by reordering the steps in the submission configuration file (item-submission.xml, available in the DSpace config folder, Listing 1) according to Figure 18.


Listing 1: submission configuration

<submission-process name="upload_convert">
  <step>
    <heading>submit.progressbar.upload</heading>
    <processing-class>at.ac.wuwien.dspace.submit.step.UploadStep</processing-class>
    <jspui-binding>org.dspace.app.webui.submit.step.JSPUploadStep</jspui-binding>
    <xmlui-binding>at.ac.wuwien.xmlui.aspect.submission.submit.UploadConvertStep</xmlui-binding>
    <workflow-editable>true</workflow-editable>
  </step>
  ...

The new submission process should only apply to the main All Articles collection. Therefore, the process must be restricted to the handle identifier of that collection in the <submission-map> section of item-submission.xml (Listing 2).

Listing 2: restrict submission process to a specific collection

<submission-map>
  <name-map collection-handle="123456789/5" submission-name="upload_convert" />
  <name-map collection-handle="default" submission-name="traditional" />
</submission-map>

Hannay (2006) notes that “scientific information in an online world needs to be made useful not only to readers but also to software”. Articles should therefore include additional machine-readable information. For this purpose, the proposed article submission system includes an enhanced upload step (at.ac.wuwien.xmlui.aspect.submission.submit.UploadConvertStep), which post-processes the uploaded, well-structured OpenDocument file (Figure 19). The file is unpacked, and the main content is transformed into the XHTML preservation format using an XSL style sheet (odt2html.xsl).
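Outside of Cocoon, applying a style sheet such as odt2html.xsl to the wrapped ODF XML can be sketched with the standard JAXP API. The XslConverter class below is hypothetical and is not the code of UploadConvertStep. Note that the processor built into the JDK only supports XSLT 1.0; running odt2html.xsl itself would require an XSLT 2.0 processor such as Saxon on the classpath (see section 4.2).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: applies an XSL style sheet to an input document via JAXP. */
final class XslConverter {
    static String transform(String inputXml, String styleSheet) {
        try {
            // JAXP picks the default processor here; with Saxon on the classpath, setting the system
            // property javax.xml.transform.TransformerFactory to net.sf.saxon.TransformerFactoryImpl
            // selects Saxon and thereby XSLT 2.0 support.
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(styleSheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }
}
```

In the conversion step, the input would be the wrapped document of Listing 4 and the output the XHTML preservation format.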

The style sheet is derived from J. David Eisenberg's ODF to XHTML style sheet (available at

http://books.evc-cit.info/odf_utils/odt_to_xhtml.html) and Ruslan Shevchenko's adaptation of

the style sheet (available at http://odt2html.gradsoft.ua/Odt2Html.html). The style sheet has been

extended to handle markup specific to scientific articles which have been created using the author-

ing template described in section 5.1.1. The metadata created by the author is embedded in the

output XHTML file following the recommendation of the Dublin Core Metadata Initiative

(http://dublincore.org/documents/dcq-html/). This machine-readable information lets users transfer the structured data between applications and increases interoperability with reference management tools and browser add-ons.

Figure 19: article conversion

During conversion, images are extracted and important semantic information is captured, such as the heading structure of the document or image captions. Although the goal was to be as media-independent as possible, the right balance had to be found between preserving semantic meaning and preserving layout. Some layout information, such as table markup, borders and alignment, is still indispensable.

The style sheet uses XSLT 2.0 functions and therefore needs an XSLT 2.0 compliant XSLT proces-

sor (like Saxon, see section 4.2). For example, regular expressions are used to extract the author’s

affiliation and email address from the submitted ODF file (Listing 3).

Listing 3: regular expression handling in XSLT 2.0

<xsl:when test="$map_style='author'">
  <xsl:variable name="affiliation">
    <xsl:value-of select=".//text:note-body"></xsl:value-of>
  </xsl:variable>
  <span class="authorname"><xsl:value-of select="text:span[1]"></xsl:value-of></span>
  <span class="affiliation">
    <xsl:analyze-string select="$affiliation"
        regex="\s(([A-Z0-9._%\+\-])+(@)([A-Z0-9\.\-])+(\.[A-Z]{{2,4}}))" flags="i">
      <xsl:matching-substring>
        <xsl:text> </xsl:text>
        <span class="email"><xsl:value-of select="regex-group(1)"></xsl:value-of></span>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="normalize-space(.)"></xsl:value-of>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </span>
</xsl:when>
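For comparison, the same affiliation/email split can be expressed with java.util.regex. The hypothetical AffiliationParser below reuses the pattern from Listing 3; the braces are single here because the doubled braces in the XSLT are only needed to escape the attribute value template. Pattern.CASE_INSENSITIVE mirrors flags="i". As in the style sheet, the pattern requires whitespace before the address and only accepts top-level domains of two to four letters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: the affiliation/email split of Listing 3, redone with java.util.regex. */
final class AffiliationParser {
    private static final Pattern EMAIL = Pattern.compile(
        "\\s(([A-Z0-9._%\\+\\-])+(@)([A-Z0-9\\.\\-])+(\\.[A-Z]{2,4}))", Pattern.CASE_INSENSITIVE);

    /** Returns {affiliation text, email address}; the address is null if none is found. */
    static String[] split(String affiliation) {
        Matcher m = EMAIL.matcher(affiliation);
        if (!m.find()) {
            return new String[] { normalize(affiliation), null };
        }
        // Remove the matched address (and its leading whitespace) from the affiliation text
        String text = affiliation.substring(0, m.start()) + " " + affiliation.substring(m.end());
        return new String[] { normalize(text), m.group(1) };
    }

    /** Equivalent of XPath normalize-space(): trims and collapses runs of whitespace. */
    private static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }
}
```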

The style sheet must be applied to an input XML file which has the structure outlined in Listing 4.

This is accomplished by wrapping the meta.xml, content.xml and styles.xml into an

<office:document> root element.

Listing 4: input XML structure for XSL transformation

<office:document xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'>
  <office:document-meta> ... content from meta.xml ... </office:document-meta>
  <office:document-content> ... content from content.xml ... </office:document-content>
  <office:document-styles> ... content from styles.xml ... </office:document-styles>
</office:document>
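One way to produce the input structure of Listing 4 is to parse the three package parts with a namespace-aware DOM parser and import their root elements under a new office:document element. The OdfWrapper class below is a hypothetical sketch of this approach; the thesis does not state how the wrapping is implemented in UploadConvertStep.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical sketch: wraps meta.xml, content.xml and styles.xml into one office:document. */
final class OdfWrapper {
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    static String wrap(String metaXml, String contentXml, String stylesXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();

            Document wrapper = db.newDocument();
            Element root = wrapper.createElementNS(OFFICE_NS, "office:document");
            wrapper.appendChild(root);

            // Each part's root element (office:document-meta, -content, -styles) becomes a child, in order
            for (String part : new String[] { metaXml, contentXml, stylesXml }) {
                Document doc = db.parse(new InputSource(new StringReader(part)));
                root.appendChild(wrapper.importNode(doc.getDocumentElement(), true));
            }

            // Serialize the combined tree so that it can be fed to the XSL transformation
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(wrapper), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not wrap the ODF parts", e);
        }
    }
}
```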

After the XSLT transformation has completed, a new DSpace item is generated, and the article metadata (title, author names, affiliation, abstract, keywords) is extracted from the article content and assigned to the new item (Listing 5). The metadata provided by the author is combined with additional system-generated metadata such as the submission date or format. The automated metadata extraction allows the DSpace metadata form to be pre-populated in the next submission step, where the submitter only needs to double-check the data already available.

A bitstream is created for the converted XHTML representation of the article and, if present, for each embedded image. The article bitstream is always named article.xhtml, which allows generic publishing workflows to be set up.

Listing 5: Java method for metadata extraction

import java.io.InputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.dspace.content.Item;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public void extractMeta(final InputStream bsInputStream, final Item item) throws Exception {
    // Remove existing values
    item.clearMetadata("dc", "title", null, Item.ANY);

    XPath xPath = XPathFactory.newInstance().newXPath();
    // establish namespace context that binds the prefix "xhtml" to the XHTML namespace URI
    // (NamespaceContextProvider is a project-specific helper class)
    xPath.setNamespaceContext(new NamespaceContextProvider());

    // compile the XPath expression that selects all named meta elements in the XHTML head
    InputSource inputSource = new InputSource(bsInputStream);
    XPathExpression xPathExpr = xPath.compile("/xhtml:html/xhtml:head/xhtml:meta[@name]");
    NodeList nodes = (NodeList) xPathExpr.evaluate(inputSource, XPathConstants.NODESET);

    for (int i = 0; i < nodes.getLength(); i++) {
        NamedNodeMap attributes = nodes.item(i).getAttributes();
        String name = attributes.getNamedItem("name").getNodeValue();
        Node contentAttr = attributes.getNamedItem("content");
        if (contentAttr == null) {
            continue; // skip meta elements without a content attribute
        }
        String content = contentAttr.getNodeValue();

        if (name.equals("DC.title")) {                        // dc.title
            item.addMetadata("dc", "title", null, "en", content);
        } else if (name.equals("DC.creator")) {               // dc.contributor.author
            item.addMetadata("dc", "contributor", "author", "en", content);
        } else if (name.equals("DCTERMS.abstract")) {         // dc.description.abstract
            item.addMetadata("dc", "description", "abstract", "en", content);
        } else if (name.equals("DC.subject")) {               // dc.subject
            item.addMetadata("dc", "subject", null, "en", content);
        } else if (name.equals("DCTERMS.dateSubmitted")) {    // dc.date.submitted
            item.addMetadata("dc", "date", "submitted", "en", content);
        }
    }
    // dc.type
    item.addMetadata("dc", "type", null, "en", "Article");
    // update item; errors are propagated to the caller instead of being silently printed
    item.update();
}
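Listing 5 depends on the DSpace Item class and on a project-specific namespace context. The following standalone sketch (the DcMetaReader class is hypothetical) performs only the reading part: it collects all named meta elements of an XHTML head into a map from names such as DC.title to their values, keeping repeated names like DC.subject in document order. It disables loading of the external XHTML DTD via a feature of the JDK's built-in parser, so that the DOCTYPE in Listing 6 does not trigger a network request; documents using DTD-defined entities such as &nbsp; would then fail to parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: reads named meta elements from an XHTML head as name -> values. */
final class DcMetaReader {
    private static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Binds the prefix "xhtml" to the XHTML namespace for XPath evaluation. */
    private static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI) ? "xhtml" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI)
                ? Collections.singletonList("xhtml").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    static Map<String, List<String>> read(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // do not fetch the XHTML DTD referenced in the DOCTYPE
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));

            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(XHTML_CONTEXT);
            NodeList nodes = (NodeList) xPath.evaluate(
                "/xhtml:html/xhtml:head/xhtml:meta[@name]", doc, XPathConstants.NODESET);

            Map<String, List<String>> fields = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element meta = (Element) nodes.item(i);
                if (!meta.hasAttribute("content")) {
                    continue; // ignore meta elements without a value
                }
                fields.computeIfAbsent(meta.getAttribute("name"), k -> new ArrayList<>())
                      .add(meta.getAttribute("content"));
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the XHTML metadata", e);
        }
    }
}
```

The resulting map could then be translated into DSpace fields as in Listing 5, e.g. DC.creator to dc.contributor.author.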

Listing 6 shows an example of an article bitstream preserved in XHTML format and containing

Dublin Core metadata.

Listing 6: XHTML preservation format including Dublin Core metadata

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://dublincore.org/documents/dcq-html/">
  <title>Contextualized Attention Metadata in Learning Environments</title>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
  <meta name="DC.title" content="Contextualized Attention Metadata in Learning Environments" />
  <meta name="DC.creator" content="Martin Wolpers" />
  <meta name="DCTERMS.created" scheme="DCTERMS.W3CDTF" content="2008-04-03" />
  <meta name="DCTERMS.dateSubmitted" scheme="DCTERMS.W3CDTF" content="2008-04-03" />
  <meta name="DCTERMS.abstract" content="This paper presents the notion of contextualized attention metadata (CAM) in learning environments. CAM describes observations about the handling of digital information in relation to the context in which the respective activities took place. The usage of CAM is exemplified in three scenarios: (1) using CAM to support the learning process of employees in agile business process execution, (2) enriching learning resource description with CAM and (3) identifying usage patterns of architectural learning resources with CAM. CAM helps to individualize the learning experience by providing detailed information about the learner’s way of dealing with digital information which can, for example, be used to target the information provision to the learners needs by helping her to focus on the learning activities rather than information management." />
  <meta name="DC.format" scheme="DCTERMS.IMT" content="application/xhtml+xml"/>
  <meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
  <meta name="DC.subject" content="context" />
  <meta name="DC.subject" content="attention" />
  <meta name="DC.subject" content="technology enhanced learning" />
  <meta name="DC.subject" content="contextualized attention metadata" />
</head>
<body>
  <div class="title">Contextualized Attention Metadata in Learning Environments</div>
  <div class="author">
    <span class="authorname">Martin Wolpers</span>
    <span class="affiliation">Fraunhofer Institut für Angewandte Informationstechnologie (FIT), Schloss Birlinghoven, 53754 Sankt Augustin, Germany; <span class="email">[email protected]</span> </span>
  </div>
  <div class="abstract">This paper presents the notion of contextualized attention metadata (CAM) in learning environments. CAM describes observations about the handling of digital information in relation to the context in which the respective activities took place. The usage of CAM is exemplified in three scenarios: (1) using CAM to support the learning process of employees in agile business process execution, (2) enriching learning resource description with CAM and (3) identifying usage patterns of architectural learning resources with CAM. CAM helps to individualize the learning experience by providing detailed information about the learner’s way of dealing with digital information which can, for example, be used to target the information provision to the learners needs by helping her to focus on the learning activities rather than information management.</div>
  <div class="indexterms">context, attention, technology enhanced learning, contextualized attention metadata </div>
  <h1>1 Introduction</h1>
  <div>Information is plenty today. ...

At the end of the upload and conversion step, the submitter has different feedback options to

check whether the converted article meets both the author’s and the repository requirements:

• a quality report automatically detects the article metadata and provides an overview of the

article structure by outlining the document headings


• a web preview in XHTML renders the full text of the article according to the repository

look-and-feel

• a PDF preview generates a high quality PDF, marked with a 'DRAFT' watermark on each

page

Figure 20: article upload - the author has immediate access to different

output formats (XHTML web preview, PDF and a report view) to ensure that the article renders as expected

The quality report (Figure 21) assists the submitter in checking the output of the upload and conversion process. It includes warnings for missing metadata, an overview of the article structure to check proper nesting of headings and a list of all references found in the article.

Figure 21: report for quality control showing metadata, document structure and references


5.1.3 Quality Control Mechanism

After the submission process is complete, the item can be reviewed in a limited review process in DSpace. This editorial review process is called workflow. The workflow can have up to three steps, each performed by an associated e-person group. The submitted item ends up in a pool, awaiting a reviewer’s attention. The reviewers can check the submission and ensure it is suitable for archiving in the repository collection.

If a reviewer takes a review task, she can simply accept the submitted item, change some metadata or reject the item. Each collection within DSpace can have its own workflow (Michael J. Bass et al. 2002).

Figure 22: DSpace collection workflow (dspace.org 2006)

This simple workflow is insufficient for journal management. A Google Summer of Code 2008 project (De Schouwer 2008) aims to provide modular, configurable workflow steps in DSpace. The system presented in this paper uses the existing workflow features.
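The review workflow described above can be summarized in a small Java model: a collection workflow with one to three steps, a task pool per step, and the accept and reject decisions of a reviewer. This is a hypothetical sketch for illustration only; none of the class or method names belong to the DSpace API, e-person groups are reduced to plain group names, and metadata editing is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the collection workflow described above;
// names are illustrative and do not belong to the DSpace API.
final class CollectionWorkflow {
    enum State { IN_POOL, CLAIMED, ARCHIVED, REJECTED }

    private final List<String> stepGroups; // one e-person group per configured step
    private int step = 0;                  // index of the current step
    private State state = State.IN_POOL;   // a new submission waits in the first pool
    private String claimedBy;

    CollectionWorkflow(List<String> stepGroups) {
        if (stepGroups.isEmpty() || stepGroups.size() > 3) {
            throw new IllegalArgumentException("a workflow has one to three steps");
        }
        this.stepGroups = new ArrayList<>(stepGroups);
    }

    // A member of the current step's group takes the task out of the pool.
    void claim(String reviewer, String group) {
        if (state != State.IN_POOL || !stepGroups.get(step).equals(group)) {
            throw new IllegalStateException("task not available to " + reviewer);
        }
        claimedBy = reviewer;
        state = State.CLAIMED;
    }

    // Accepting moves the item to the next step's pool or, after the last step, into the archive.
    void accept() {
        requireClaimed();
        step++;
        state = step < stepGroups.size() ? State.IN_POOL : State.ARCHIVED;
        claimedBy = null;
    }

    // In this model, rejecting ends the workflow.
    void reject() {
        requireClaimed();
        state = State.REJECTED;
        claimedBy = null;
    }

    private void requireClaimed() {
        if (state != State.CLAIMED) throw new IllegalStateException("task has not been claimed");
    }

    State state() { return state; }
    int step() { return step; }
    String claimedBy() { return claimedBy; }
}
```

A two-step workflow, for example, passes the item from a reviewers' pool to an editors' pool before archiving it; a more flexible journal workflow would need more states than this model offers.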

5.1.4 Archiving

If a submitted article is reviewed and approved by the collection editor, the item will be archived in

the main All Articles collection. At this point it becomes accessible through the DSpace search and

browse functions. A persistent identifier is assigned to the item and each of its bitstreams. At the

same time, the archived item is mapped to the Preprints collection.

Mapping in DSpace means linking items to other collections. Moving items from one community

or collection to another is not supported in DSpace. The DSpace mapping concept is used to map

new articles into the Preprints collection and—after publication in a journal—to map the articles into

the journal issue collection.
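The mapping concept can be sketched in Java as an item that keeps its owning collection and additionally holds the set of collections it is mapped into. The class below is a hypothetical illustration and not the DSpace Item or Collection API; the collection names used in the test follow the All Articles, Preprints and journal issue collections described in this chapter.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of DSpace-style mapping: the owning collection never
// changes, while mapped collections can be added and removed.
final class RepositoryItem {
    private final String owningCollection;
    private final Set<String> mappedCollections = new LinkedHashSet<>();

    RepositoryItem(String owningCollection) {
        this.owningCollection = owningCollection;
    }

    // Links the item into a further collection (mapping to the owner is a no-op).
    void map(String collection) {
        if (!collection.equals(owningCollection)) mappedCollections.add(collection);
    }

    // Removes a mapping; the owning collection itself cannot be unmapped.
    void unmap(String collection) {
        mappedCollections.remove(collection);
    }

    // All collections in which the item is listed, owning collection first.
    Set<String> listedIn() {
        Set<String> all = new LinkedHashSet<>();
        all.add(owningCollection);
        all.addAll(mappedCollections);
        return all;
    }
}
```

The life cycle of an article then consists of archiving it in All Articles, mapping it into Preprints, mapping it into an issue collection and, once the issue is published, unmapping it from Preprints.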


5.2 Publishing in Various Formats

“The beauty of open access is that it allows endless possibilities in the ways in which information is

packaged and presented to the reader. But these possibilities may not be so open if poorly chosen file

formats introduce unwanted constraints or limitations.” (Guédon 2006, p.35)

“The sooner we have HTML editions of scientific articles, next to or instead of PDF editions, the better. HTML and PDF files can both be Open Access, but HTML facilitates re-use of the content and PDF retards it.” (Sefton 2007b)

There is evidence of an increasing demand to support multiple publishing formats for scientific content. Offering various viewing options increases and improves access to scholarly material.

By default, DSpace delivers an item in the format in which it is preserved in the repository, which is mostly PDF. The PDF format is well suited for printing articles. However, disseminating articles over the Web can go beyond access to their PDF representation. As already noted in the DSpace Technology and Architecture Specification (Michael J. Bass et al. 2002, p.6), “in the long term content needs to be accessible by a wider audience, who may be using a wide variety of computing equipment. […] A method is needed whereby a client’s capability for rendering media is assessed, and an appropriate version or rendition of the content is accessed or computed”.

Additional article representations and output to multiple platforms offer considerable potential to enhance the reading experience. A single-source, XML based preservation format provides the flexibility to publish content in various output formats. The same XML document can be rendered by different XSL style sheets into XHTML for viewing in a web browser, into XSL Formatting Objects that an XSL-FO processor turns into Adobe PDF for printing, or into other formats such as WML (Wireless Markup Language) or SVG.
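The single-source idea can be illustrated with a minimal Java sketch. It assumes the JDK's built-in JAXP XSLT 1.0 processor rather than the Saxon XSLT 2.0 processor used in the proposed solution: the same XHTML source is rendered by two different style sheets, one producing a plain-text outline of the headings and one producing a skeletal XSL-FO document. Both style sheets are toy examples written for this sketch; they are not the xhtml.xsl or fo.xsl style sheets of the system.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical single-source rendering: one XHTML input, two style sheets.
final class SingleSource {
    static final String XSL_HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:h='http://www.w3.org/1999/xhtml' xmlns:fo='http://www.w3.org/1999/XSL/Format'"
      + " exclude-result-prefixes='h'>";

    // Rendition 1: plain-text outline, one heading per line.
    static final String OUTLINE_XSL = XSL_HEAD
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:for-each select='//h:h1'>"
      + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each></xsl:template></xsl:stylesheet>";

    // Rendition 2: minimal XSL-FO document with one fo:block per body element.
    static final String FO_XSL = XSL_HEAD
      + "<xsl:template match='/'><fo:root><fo:layout-master-set>"
      + "<fo:simple-page-master master-name='A4' page-width='210mm' page-height='297mm'>"
      + "<fo:region-body margin='20mm'/></fo:simple-page-master></fo:layout-master-set>"
      + "<fo:page-sequence master-reference='A4'><fo:flow flow-name='xsl-region-body'>"
      + "<xsl:for-each select='//h:body/*'><fo:block><xsl:value-of select='.'/></fo:block>"
      + "</xsl:for-each></fo:flow></fo:page-sequence></fo:root></xsl:template></xsl:stylesheet>";

    // Applies a style sheet (given as a string) to an XHTML document (given as a string).
    static String render(String xsl, String xhtml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The FO output would still have to be passed to an XSL-FO processor such as Apache FOP to obtain a PDF.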

The proposed XHTML preservation format remains important in its own right: it can be opened with any web browser, even without applying a transformation style sheet. Nevertheless, generating further output formats that include additional representation information improves the user experience.

In DSpace 1.5, Manakin is used to make the repository interface adaptable, but not to transform content into different output formats. The proposed idea is to use the Cocoon framework integrated in Manakin to provide additional publishing features and viewing options for archived XHTML articles.


To deliver content in different output formats, a new Cocoon generator (BitstreamGenerator.class) has been built which queries DSpace for a bitstream of a particular repository item and passes its content to the next step in a Cocoon pipeline. The new generator must be registered in Manakin’s main Cocoon sitemap (sitemap.xmap):

<map:generator name="BitstreamGenerator" src="at.ac.wuwien.xmlui.cocoon.BitstreamGenerator"/> The original task of a Manakin theme is to transform the final repository DRI page to a representa-

tion format like XHTML which allows setup of distinct repository user interfaces. This means, the

theme takes care of the visual appearance of the repository. The DRI page transformation chain is

determined by the aspect pipeline found in the theme sub-sitemap. In the proposed solution, the

template.xsl style sheet is responsible for transforming repository pages into XHTML.

The theme is configured in a Cocoon sub-sitemap residing in the theme subdirectory:

<map:component-configurations>
  <global-variables>
    <theme-path>library</theme-path>
    <theme-name>Library theme</theme-name>
  </global-variables>
</map:component-configurations>

A theme is installed by adding an entry to the xmlui.xconf file in the DSpace config directory:

<themes>
  <theme name="Library Theme" regex=".*" path="library/"/>
</themes>

The regex attribute determines which theme applies to a particular URL. Specifying regex=".*" is the broadest possible rule, since it matches every URL. More specific expressions allow community or collection based themes to be set up. For detailed instructions on how to add a new theme to Manakin see dspace.org (2008).
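The effect of the regex attribute can be sketched as follows, assuming that themes are tried in the order in which they are declared and that the first theme whose regular expression matches the whole request path is applied. The ThemeSelector class and the example paths are hypothetical; the sketch only illustrates why a catch-all rule such as regex=".*" belongs at the end of the list.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical theme selection: first matching theme wins (ordering and
// full-match semantics are assumptions of this sketch).
final class ThemeSelector {
    private final Map<String, Pattern> themes = new LinkedHashMap<>(); // keeps declaration order

    ThemeSelector add(String themePath, String regex) {
        themes.put(themePath, Pattern.compile(regex));
        return this;
    }

    // Returns the path of the first theme whose regex matches the whole request path.
    Optional<String> select(String requestPath) {
        for (Map.Entry<String, Pattern> theme : themes.entrySet()) {
            if (theme.getValue().matcher(requestPath).matches()) {
                return Optional.of(theme.getKey());
            }
        }
        return Optional.empty();
    }
}
```

With a collection-specific theme declared before the catch-all Library theme, requests below that collection's handle receive the specific theme and all other pages fall back to library/.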

In the proposed approach, the theme sitemap provides additional functionality. Different rendition formats of DSpace content items are generated by setting up different pipelines in the theme sitemap (Listing 7). The bitstream is located using the new BitstreamGenerator, transformed by an XSL style sheet using the Saxon XSLT 2.0 processor and serialized in the appropriate format.


Each repository item can be delivered in XHTML and PDF format. The Cocoon pipeline concept allows specific layouts to be generated for preprints, journals and items in submission. The journal name is passed as part of the requested URL (the third wildcard, {3}, in Listing 7) and selects the appropriate XSL style sheet. All XSL style sheets reside in the theme subdirectory.
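The wildcard patterns of the publishing pipelines in Listing 7 bind URL segments to the numbered placeholders {1} to {4}. The following Java sketch re-implements this binding for single-star wildcards, assuming that * matches any sequence of characters except '/'. It is an illustration, not Cocoon's matcher implementation, and the request path in the test is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch (not Cocoon's code): each '*' becomes one capturing
// group that matches any run of characters except '/'.
final class PipelineMatch {
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(Pattern.quote(literal.toString())).append("([^/]*)");
                literal.setLength(0);
            } else {
                literal.append(c);
            }
        }
        regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString());
    }

    // Resolves the values the XHTML pipeline of Listing 7 would use for a request path.
    static Map<String, String> xhtmlPipeline(String path) {
        Matcher m = compile("view/handle/*/*/*/*.xhtml").matcher(path);
        if (!m.matches()) throw new IllegalArgumentException("pattern does not match: " + path);
        Map<String, String> resolved = new LinkedHashMap<>();
        resolved.put("handle", m.group(1) + "/" + m.group(2));     // {1}/{2}
        resolved.put("bitstream", m.group(4) + ".xml");             // generator parameter "name"
        resolved.put("stylesheet", "xhtml_" + m.group(3) + ".xsl"); // src of map:transform
        return resolved;
    }
}
```

A request for view/handle/123456789/30/preprint/article.xhtml would thus read the bitstream article.xml of item 123456789/30 and transform it with xhtml_preprint.xsl.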

Listing 7: Cocoon publishing pipeline

  <map:match pattern="view/handle/*/*/*/*.xhtml"> <map:generate type="BitstreamGenerator"> <map:parameter name="handle" value="{1}/{2}"/> <map:parameter name="name" value="{4}.xml"/> </map:generate> <map:transform type="xslt-saxon" src="xhtml_{3}.xsl"> <map:parameter name="handle" value="{1}/{2}"/> <map:parameter name="name" value="{4}"/> </map:transform> <map:serialize type="xhtml"/> </map:match>  <map:match pattern="view/handle/*/*/*/*.pdf"> <map:generate type="BitstreamGenerator"> <map:parameter name="handle" value="{1}/{2}"/> <map:parameter name="name" value="{4}.xml"/> </map:generate> <map:transform type="xslt-saxon" src="fo_{3}.xsl"> <map:parameter name="handle" value="{1}/{2}"/> <map:parameter name="name" value="{4}"/> </map:transform> <map:serialize type="fo2pdf"/> </map:match> The main theme style sheet (template.xsl) is also used to change the default structure of a re-

pository item’s page (Figure 23). The keywords have been hyperlinked to allow easy search of arti-

cles within the same subjects.

In the default DSpace implementation, the repository item’s page contains the item’s metadata and

a list of its associated bitstreams. The list of bitstreams has been omitted and links to the different

viewing options have been added in the right hand options section. These links make the full text

of articles immediately accessible in different formats.

Furthermore, links to various social bookmarking services (Connotea, CiteULike, Del.icio.us and

Facebook) have been added.


Figure 23: article landing page containing links to various viewing options and social bookmarking services

The PDF article full text (Figure 24) is generated using XSL-FO and Apache FOP. Apache FOP is

generally recognized as the leading open source XSL-FO processor. The new FOP version (0.95)

offers improved conformance to the W3C XSL-FO 1.0 standard (www.w3.org/Style/XSL/). XSL-

FO is considered to be a robust system for high-quality PDF production.

However, automated typesetting does have some limitations. This is due to limited conformance of

available XSL-FO processors and limitations of the specification itself. Apache FOP has limitations

with floating text and footnotes, especially in multi-column layouts.

An XSL style sheet (fo.xsl) has been developed which transforms the preservation XHTML into XSL Formatting Objects; these are passed to the FOP engine, which renders a high-quality PDF. The page layout can be customized by changing the XSL style sheet in the theme subdirectory, which allows journal-specific layout definitions.

For the configuration of Apache FOP within Cocoon and how to add custom fonts see appendix,

Listing 14.


Figure 24: PDF article full text


The second format in which repository content can be delivered to the user is XHTML. The XHTML article representation is generated (xhtml.xsl) according to the repository look-and-feel (Figure 25).

Figure 25: XHTML article full text

Both media, online and print journals, fulfil useful but different communicative purposes. However, electronic publishing services provide advanced communicative capabilities.

Guédon (1994) has written an excellent article about electronic publishing and its relation to the

print culture:

“... moving text from print to a digitized medium transforms its functionalities, the way we relate to it, and the way it is distributed and received. For example, a digitized document is immediately amenable to full-text searching – a possibility print cannot offer and which no index, chapter heading, subsections or any other devices invented in the course of the last few centuries of print could ever hope to fulfil. Also, print offers only one way to present information. Whether the reader is browsing or studying deeply, printed texts remain wedded to paper. With digitized documents, the reader moves from browsing mode, often on the screen, to deep reading, often through a printout of the document. In other words, electronic publishing brings about a distinction between the access to information and the way readers relate to it. According to our needs, we materialize the electronic information differently and we search it or study it or recycle it in other documents differently.”

Distributing scholarly articles online enables easy searching and distribution and can improve the reading experience (Willinsky 2006, pp.14-15) in the following ways:

• references (cited work) can be linked to the appropriate full text (at least to the abstract).

This can even be automated, if all references are provided in a consistent reference format.

• internal references can be linked as well; for example, authors are hyperlinked to their affiliation.

• figures and images need not be restricted in size and colors as they are in print. So as not to impede the reading flow, images in the proposed solution are displayed in a smaller size (Figure 26). Clicking on the small image opens a new window showing the image in its original dimensions.

Figure 26: clickable images in XHTML do not impede the reading flow

• XHTML enables the use of semantic markup within articles, such as embedding Dublin Core metadata in <meta> elements or embedding microformats (microformats.org). Exposing this machine-readable information increases interoperability with other applications like reference management tools, for example Zotero (http://www.zotero.org, Figure 27) or Connotea (http://www.connotea.org). The Dublin Core metadata (Figure 28), created in the XHTML preservation format during the submission step, is exposed in the XHTML full text representation of the article.


Figure 27: automated saving of repository content to Zotero

Figure 28: Dublin Core metadata made visible by the Dublin Core Viewer, a Firefox extension

• the article structure can be outlined at the beginning of the article to allow easy navigation

• in contrast to a print edition, there are no space limitations in the online era. For example,

abbreviations can be replaced or additionally described by their exact meaning.
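Because the Dublin Core metadata is stored in ordinary <meta> elements, a client such as a reference manager can extract it with a standard XML parser. The Java sketch below uses only the JDK's DOM API and collects all meta elements whose name starts with DC. or DCTERMS.; it is a hypothetical client, not the way Zotero or Connotea actually work, and it assumes a well-formed XHTML document without a DOCTYPE declaration, so that no external DTD has to be loaded.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical metadata reader for the XHTML full text representation.
final class DublinCoreReader {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Maps each DC/DCTERMS element name to its values, in document order.
    static Map<String, List<String>> read(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // meta elements live in the XHTML namespace
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> dc = new LinkedHashMap<>();
            NodeList metas = doc.getElementsByTagNameNS(XHTML_NS, "meta");
            for (int i = 0; i < metas.getLength(); i++) {
                Element meta = (Element) metas.item(i);
                String name = meta.getAttribute("name");
                if (name.startsWith("DC.") || name.startsWith("DCTERMS.")) {
                    // repeated elements such as DC.subject collect several values
                    dc.computeIfAbsent(name, k -> new ArrayList<>()).add(meta.getAttribute("content"));
                }
            }
            return dc;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed XHTML document", e);
        }
    }
}
```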

Mathematical equations are common objects in scientific articles. The publishing workflow has been designed to fully support mathematical formulae. For this to work, formulae must be authored using the equation editor in OpenOffice.org Writer, which emits equations in the MathML (Mathematical Markup Language) XML vocabulary. During submission, the MathML equations are embedded into the XHTML preservation format.


Since MathML and XHTML are both XML based, XML namespaces provide a standard mechanism for embedding MathML in XHTML. MathML markup resides in a separate namespace (xmlns="http://www.w3.org/1998/Math/MathML").

For PDF generation (Figure 29), the MathML markup must be wrapped in an fo:instream-foreign-object element:

 <xsl:template match="math:math"> <fo:instream-foreign-object> <math xmlns="http://www.w3.org/1998/Math/MathML" mode="inline"> <xsl:copy-of select="*"></xsl:copy-of> </math> </fo:instream-foreign-object> </xsl:template>

Figure 29: PDF containing mathematical formulae
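The MathML template can be tried out in isolation. The following Java sketch embeds it in a minimal XSLT 1.0 style sheet, runs it with the JDK's JAXP processor on a small XHTML paragraph containing MathML, and returns the resulting XSL-FO fragment. The surrounding style sheet and the sample input are assumptions made for illustration; the actual rendering of the formula by Apache FOP is not part of the sketch.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical test harness for the MathML wrapping template.
final class MathToFo {
    // The 'math' prefix must be bound to the MathML namespace for match='math:math'.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'"
      + " xmlns:h='http://www.w3.org/1999/xhtml'"
      + " xmlns:math='http://www.w3.org/1998/Math/MathML' exclude-result-prefixes='h'>"
      // paragraph content becomes one fo:block; text nodes are copied by the built-in rule
      + "<xsl:template match='/'><fo:block><xsl:apply-templates select='//h:p/node()'/></fo:block></xsl:template>"
      // the template from the text: wrap each MathML island for the FO processor
      + "<xsl:template match='math:math'><fo:instream-foreign-object>"
      + "<math xmlns='http://www.w3.org/1998/Math/MathML' mode='inline'><xsl:copy-of select='*'/></math>"
      + "</fo:instream-foreign-object></xsl:template>"
      + "</xsl:stylesheet>";

    static String toFo(String xhtml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```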

5.3 Journal Management Workflow

The journal management workflow was built on a new Manakin aspect (see section 4.1 for an introduction to aspects). An aspect provides a set of features for the DSpace repository system. To install the new Journal aspect in Manakin, the aspect classes must be made available to the DSpace web application by copying them to the [xmlui]/WEB-INF/classes directory. The aspect configuration files (sitemap.xmap and journal.js) must reside in the (new) [xmlui]/aspects/Journal directory. In the xmlui.xconf configuration file (available in the [dspace]/config directory), the aspect must be registered by adding it to the <aspects> section:

<aspect name="Journal" path="Journal/"/>

See Donohue et al. (2007) for details about installing aspects.


The Journal aspect code relies on some new parameters specified in the main DSpace configuration file ([dspace]/config/dspace.cfg, Listing 8). A new journal.community.handle parameter has been introduced, representing the handle of the journal community. This means that all descendant communities of this journal community represent journals and all descendant collections of these journals represent journal issues.

Likewise, the preprints.collection.handle parameter represents the handle of the Preprints collection.

To allow articles to be sorted and displayed according to the journal table of contents, a new sort option has been defined in dspace.cfg, which allows sorting by the prism:startingpage metadata element. The new sort option is then referenced by the journal.toc.sort-option parameter.

Any change in the search or sorting configuration requires the DSpace search indexes to be rebuilt. This is accomplished with the index scripts in [dspace]/bin (index-init, index-update).

Listing 8: Journal aspect specific parameters in dspace.cfg

# Set the options for what can be sorted by
...
webui.itemlist.sort-option.4 = startingpage:prism.startingPage:text
...
#---------------------------------------------------------------#
#------------JOURNAL SPECIFIC CONFIGURATIONS--------------------#
#---------------------------------------------------------------#
# These configs are used by the JOURNAL aspect                  #
#---------------------------------------------------------------#
journal.community.handle = 123456789/4
preprints.collection.handle = 123456789/24
journal.toc.sort-option = startingpage
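Since dspace.cfg uses a key = value syntax, the parameters of Listing 8 can be illustrated with java.util.Properties. The sketch below loads the relevant lines and resolves which webui.itemlist.sort-option entry the journal.toc.sort-option parameter refers to. The class and record names are hypothetical, and reading dspace.cfg as a Java properties file is an assumption of the sketch, not a description of how DSpace loads its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical reader for the Journal aspect parameters in Listing 8.
final class JournalConfig {
    // name:metadata-field:type, as in "startingpage:prism.startingPage:text"
    record SortOption(int index, String name, String metadataField, String type) {}

    static Properties load(String cfg) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(cfg));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    // Finds the webui.itemlist.sort-option.<n> entry named by journal.toc.sort-option.
    static SortOption tocSortOption(Properties p) {
        String prefix = "webui.itemlist.sort-option.";
        String wanted = p.getProperty("journal.toc.sort-option");
        for (String key : p.stringPropertyNames()) {
            if (!key.startsWith(prefix)) continue;
            String[] parts = p.getProperty(key).split(":");
            if (parts.length == 3 && parts[0].equals(wanted)) {
                int n = Integer.parseInt(key.substring(prefix.length()));
                return new SortOption(n, parts[0], parts[1], parts[2]);
            }
        }
        throw new IllegalStateException("sort option not defined: " + wanted);
    }
}
```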

Manakin comes with four default aspects (Artifact Browser, Administration, E-Person and Submission aspect). All aspects are chained together in the aspect chain, which generates the final DRI pages. Each aspect adds content to, or modifies content of, the DRI page.

The new Journal aspect has been inserted at the second position in the aspect chain (Figure 30). The final DRI page is converted into XHTML by the style sheet of the Library theme (template.xsl, see section 5.2). This style sheet has been altered to prioritize content added by the Journal aspect, if present.

Figure 30: aspect chain containing the Journal aspect

An aspect is embodied in a Cocoon pipeline (dspace.org 2008), which is defined in the aspect sub-sitemap. In addition, the sub-sitemap lists all components used by the aspect (see appendix, Listing 17 for the complete Journal aspect sitemap).

For example, Figure 31 shows the journal issue landing page, including a link to the issue’s PDF, a table of contents and journal sections. This page serves as an example to present the general aspect concept.

Figure 31: journal issue table of contents


The Cocoon transformer component for this page is named JournalIssueViewer and is associated with the Java class that implements it (at.ac.wuwien.xmlui.aspect.journal.JournalIssueViewer).

This transformer is referenced in a Cocoon pipeline which, if it is matched and therefore processes the current DRI document, augments the input DRI page with additional content. In this particular example, the HandleTypeMatcher tests whether the current page represents a collection.

Listing 9: journal issue viewer transformer and pipeline section

<map:transformer name="JournalIssueViewer" src="at.ac.wuwien.xmlui.aspect.journal.JournalIssueViewer"/>

...  <map:match type="HandleTypeMatcher" pattern="collection"> <map:transform type="JournalIssueViewer"/> <map:serialize type="xml"/> </map:match>

The JournalIssueViewer transformer is a subclass of AbstractDSpaceTransformer. The class

adds a new division named collection-journal-toc containing a link to the issue PDF and a

list of items representing the table of contents (Listing 10 and Figure 32).

Listing 10: JournalIssueViewer Java class

public class JournalIssueViewer extends AbstractDSpaceTransformer implements CacheableProcessingComponent {
    // ... (excerpt: method header and the setup of home, dso, collection and contextPath omitted)

    // link to journal pdf
    Para issueLinkPara = home.addPara();
    issueLinkPara.addXref(contextPath + "/view/issue/handle/" + dso.getHandle() + "/journal.pdf", "Journal PDF");

    // Journal TOC
    {
        java.util.List<BrowseItem> items = getIssueItems(collection);
        Division tocDiv = home.addDivision("collection-journal-toc", "secondary journal-toc");
        // FIXME: Header should be internationalized
        tocDiv.setHead("Table of Contents");
        String section = "";
        ReferenceSet toc = null;
        for (BrowseItem item : items) {
            // check item section
            DCValue[] itemSectionArray = item.getMetadata("prism", "section", Item.ANY, Item.ANY);
            // if a new section starts, create a new referenceset
            if (itemSectionArray != null && itemSectionArray.length >= 1) {
                if (!(itemSectionArray[0].value.equals(section))) {
                    // referenceset for new section
                    section = itemSectionArray[0].value;
                    toc = tocDiv.addReferenceSet("collection-table-of-contents " + section,
                            ReferenceSet.TYPE_SUMMARY_LIST, null, "table-of-contents");
                    toc.setHead(section);
                }
                // add the item reference (items without a prism.section value are not listed)
                toc.addReference(item);
            }
        }
    }
    // ... (remaining code omitted)
}

Figure 32: DRI content added by JournalIssueViewer transformer
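The grouping logic of Listing 10 does not depend on DSpace and can be illustrated in isolation. The following sketch uses hypothetical Article and Section records in place of BrowseItem and ReferenceSet. It assumes the items arrive already sorted by start page and starts a new section whenever the section value of the next item changes; like Listing 10, it skips items without a section value.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch of the table-of-contents grouping in Listing 10;
// Article and Section are hypothetical types, not DSpace classes.
final class TocBuilder {
    record Article(String title, String section) {}
    record Section(String head, List<String> titles) {}

    static List<Section> build(List<Article> sortedItems) {
        List<Section> toc = new ArrayList<>();
        String current = null;
        for (Article a : sortedItems) {
            if (a.section() == null) continue; // no prism.section value: not listed
            if (!a.section().equals(current)) {
                // a new section starts: open a new group with its heading
                current = a.section();
                toc.add(new Section(current, new ArrayList<>()));
            }
            toc.get(toc.size() - 1).titles().add(a.title());
        }
        return toc;
    }
}
```

As in Listing 10, a section name that reappears after another section opens a second group, so the sort order by start page must keep the articles of a section together.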

In more complex aspects, the web application needs to interact with the client. If the application collects information from the user over more than one page, it has to maintain the input accumulated so far. For this purpose, Cocoon offers an advanced control flow, which describes the order of pages that have to be sent to the client at any given time in an application (Apache Cocoon 2005). Cocoon uses the concept of continuation objects, which contain a snapshot of the execution state (the call stack and its variables) at the point where processing was suspended.


The flow of information using a control flow is described in detail in Apache Cocoon (2005): A

request is received by Cocoon and passed to the sitemap for processing. In the sitemap, two things

can be done to pass the control to the Control Flow layer:

• A JavaScript top-level function can be invoked to start processing a sequence of pages. Each time a response page is sent back to the client browser from this function, the processing of the JavaScript code stops at the point where the page is sent back, and the HTTP request finishes. The execution state is saved in a continuation object. To invoke a top-level JavaScript function in the Control Flow, you use the <map:call function="function-name"/> construction.

• To restart the computation of a previously stopped function, you use the <map:call continuation="..."/> construction. This restarts the computation saved in a continuation object. When the computation stored in the continuation object is restarted, it appears as if nothing happened: all the local and global variables have exactly the same values as they had when the computation was stopped.
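Plain Java has no first-class continuations, so the following sketch does not reproduce Cocoon's mechanism. It only emulates its observable effect for a hypothetical two-page "create issue" dialogue: each page stores an immutable snapshot of the input accumulated so far under a fresh identifier, and a later request resumes from exactly that snapshot. Because older snapshots are kept, returning to an earlier page resumes with the values that were current at that point. All names are illustrative; the sketch never discards snapshots, which a production system would have to do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Emulation of continuation-style state keeping (not Cocoon's implementation).
final class FlowStore {
    // Immutable snapshot of the hypothetical dialogue state.
    record IssueDraft(String title, String volume) {
        IssueDraft with(String field, String value) {
            return switch (field) {
                case "title" -> new IssueDraft(value, volume);
                case "volume" -> new IssueDraft(title, value);
                default -> throw new IllegalArgumentException("unknown field: " + field);
            };
        }
    }

    private final Map<String, IssueDraft> snapshots = new HashMap<>();

    // Starts the dialogue; the returned id is sent to the client with the first page.
    String start() {
        return save(new IssueDraft(null, null));
    }

    // Resumes from the snapshot stored under id, applies the submitted value
    // and stores the new state under a fresh id for the next page.
    String resume(String id, String field, String value) {
        IssueDraft saved = snapshots.get(id);
        if (saved == null) throw new IllegalArgumentException("unknown continuation: " + id);
        return save(saved.with(field, value));
    }

    IssueDraft peek(String id) {
        return snapshots.get(id);
    }

    private String save(IssueDraft draft) {
        String id = UUID.randomUUID().toString();
        snapshots.put(id, draft);
        return id;
    }
}
```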

In the proposed solution, already published journal articles can be consumed anonymously. However, the management of journals and issues requires certain user permissions. The management of particular journal issues requires a sequence of interactive pages which have been built using the described Cocoon flow concept.

The control flow for the Journal aspect uses the JavaScript defined in journal.js. The JavaScript

is registered in the aspect sub-sitemap.

<map:flow language="javascript"> <map:script src="journal.js"/> </map:flow>

The solution utilizes DSpace’s user groups and authorization concept. If a user has administration privileges for a particular journal community, he is entitled to perform journal management tasks, like creating new journal issues and editing existing ones, whether published or unpublished (Figure 33). For a comprehensive introduction to the DSpace authorization system see its documentation (dspace.org 2006).


Figure 33: journal landing page (showing unpublished issues to privileged users)

The administrator of the journal community can edit an issue, change the status of an issue and map items from the main Articles collection to the journal issue collection. Mapping assigns a new metadata entry (DC.Relation.isPartOf) to the mapped item, indicating the handle identifier of the journal issue the article is now part of. The journal administrator can then reorder articles in a table of contents view (Figure 34). The articles in the journal issue can also be grouped into sections (like Editorial, Research Articles, …).

Figure 34: journal issue table of contents administration


After final approval and changing the status to published (by clicking Publish Issue, Figure 35), all

items are automatically unmapped from the Preprints collection and permissions are changed, so that

the journal issue is available for all DSpace users.

Figure 35: journal issue administration page

A journal issue aggregates multiple articles. To create a complete journal issue in PDF format, a new Cocoon generator (JournalIssueGenerator.class) has been developed which generates an RDF document, serialized as XML, according to the issue’s table of contents. The RDF format is based on RSS 1.0 (http://web.resource.org/rss/1.0/).

The basic idea is that each journal issue is represented as an RSS feed. This RDF/XML document is passed into a Cocoon pipeline where an XSL style sheet is applied to generate the final journal issue PDF (Figure 36). The style sheet is the same as the one used for a single article (fo.xsl, Listing 11). Based on the root element, it automatically detects whether the request is to generate a single article PDF or a complete issue PDF. The template uses XSLT 2.0 tunnel parameters to pass the handle down to other templates. For the issue’s PDF, a front matter including the table of contents is additionally generated.

Listing 11: matching RDF root element in fo.xsl

    <xsl:template match="/rdf:RDF">    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">  <fo:layout-master-set> <xsl:copy-of select="$simple.page.master"/> <xsl:copy-of select="$page.sequence"/> </fo:layout-master-set>

99

<!—process Table of Contents --> <xsl:call-template name="TOC"> <xsl:with-param name="rdf_root" select="."></xsl:with-param> </xsl:call-template>  <xsl:for-each select="rss:channel/rss:items/rdf:Seq/rdf:li"> <xsl:variable name="item_position" select="position()"/> <xsl:variable name="current_handle" se-lect="concat(tokenize(@resource, '/')[last() - 1], '/' , token-ize(@resource, '/')[last()])"/> <xsl:variable name="current_bitstream" se-lect="concat(xmlui/bitstream/handle/', $current_handle, '/article.xhtml')"/> <xsl:variable name="current_xhtml" se-lect="document($current_bitstream)"/> <xsl:apply-templates select="$current_xhtml/html" mode="issue">  <xsl:with-param name="this_handle" select="$current_handle" tunnel="yes"/>  <xsl:with-param name="this_item_position" se-lect="$item_position" tunnel="yes"/> </xsl:apply-templates> </xsl:for-each> </fo:root> </xsl:template>

Figure 36: journal issue PDF
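The issue feed and the handle extraction of Listing 11 can be sketched in Java with the JDK's StAX API. The sketch writes an rdf:RDF document whose channel lists the article URLs in table-of-contents order inside an rdf:Seq, and it derives an item's handle from the last two path segments of its URL, mirroring the tokenize() expression in Listing 11. It is an assumption-based illustration, not the JournalIssueGenerator: it follows the RSS 1.0 convention of an rdf:resource attribute (whereas Listing 11 selects an unprefixed @resource), it omits the link, description and item elements that a complete RSS 1.0 feed requires, and all URLs in the test are hypothetical.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch of an issue feed: an RSS 1.0 style channel with an rdf:Seq.
final class IssueFeed {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS = "http://purl.org/rss/1.0/";

    // Writes the channel skeleton; the order of articleUrls is the table-of-contents order.
    static String write(String issueUrl, String title, List<String> articleUrls) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("rdf", "RDF", RDF);
            w.writeNamespace("rdf", RDF);
            w.writeNamespace("rss", RSS);
            w.writeStartElement("rss", "channel", RSS);
            w.writeAttribute("rdf", RDF, "about", issueUrl);
            w.writeStartElement("rss", "title", RSS);
            w.writeCharacters(title);
            w.writeEndElement(); // rss:title
            w.writeStartElement("rss", "items", RSS);
            w.writeStartElement("rdf", "Seq", RDF);
            for (String url : articleUrls) {
                w.writeEmptyElement("rdf", "li", RDF);
                w.writeAttribute("rdf", RDF, "resource", url);
            }
            w.writeEndElement(); // rdf:Seq
            w.writeEndElement(); // rss:items
            w.writeEndElement(); // rss:channel
            w.writeEndElement(); // rdf:RDF
            w.writeEndDocument();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Java counterpart of the tokenize() expression: the last two path segments.
    static String handleOf(String resourceUrl) {
        String[] parts = resourceUrl.split("/");
        return parts[parts.length - 2] + "/" + parts[parts.length - 1];
    }
}
```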


6 Related Work

Traditional journal management software does not interoperate with repository platforms. However, there are a few projects heading in a similar direction as the project presented in this paper.

The University of Southern Queensland in Australia has developed the Integrated Content Environment for Research and Scholarship (ICE-RS), a web based content management system with multi channel publishing features (Sefton 2006). ICE converts documents written in a word processor into XHTML and preserves them in this format. Recently, the ICE team has investigated how to integrate repositories with its authoring and content conversion tools (Sefton 2008). For example, they have developed an ICE disseminator for the Fedora repository platform. They have also integrated ICE with Open Journal Systems, which allows ICE documents to be uploaded to the journal management software.

Another system which uses the same authoring templates as ICE is the Digital Scholar’s Workbench (Barnes 2006a). The Digital Scholar’s Workbench uses the DocBook XML format for preserving documents in an institutional repository (DSpace). It also provides XHTML and PDF representations of DSpace items.

Another project is edoc, a document and publication repository at the Humboldt University in Berlin

(Dobratz 2005). The edoc server is built to provide Open Access and long term preservation of

archived theses. Theses received in a word processor format or LaTeX are converted into a custom

XML vocabulary for preservation.

The National Library of Australia registered a METS (Metadata Encoding and Transmission Standard) profile describing rules to access content in digital repositories (Pearce et al. 2008). Using this METS profile, the journal workflow project developed an interface that takes a native Open Journal Systems (OJS) journal definition, maps it to the METS profile and submits it to a repository.

DiVA, the Digital Scientific Archive, is an electronic publishing system developed at Uppsala University Library (Müller et al. 2003). The system includes an OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) compliant portal which provides dissemination of content stored in XML in different formats.

All of the projects mentioned encode documents in XML for long term preservation, which indicates that the approach presented in this paper follows established practice.


7 Conclusion

There is evidence that the Open Access publishing model can complement traditional scientific communication systems. Self-archiving in repositories is a widely accepted method of increasing the dissemination of and access to scholarly information, and one that is supported even by commercial publishers.

However, self-archiving of scientific material is often limited to making the PDF version of already published articles available in an institutional repository. This limits interoperability and flexibility. Since scientific content is valued for its visibility and usability, offering it in multiple formats makes it more open. Such changes to scholarly communication can improve access to research findings.

It has been demonstrated that new concepts like multi channel publishing and journal management workflows can be incorporated into a repository platform like DSpace. The proposed solution represents an attempt to combine Green Road Open Access repository functionality with Gold Road Open Access journal functionality. The proposed concept re-uses repository content as part of new publishing processes. Articles can be packaged into journals, which add a more structured view on repository content.

The new DSpace architecture offers comprehensive methods to build workflow extensions. Such

extensions can also provide a mechanism to ease the authoring and submission process.

It has also been shown that metadata creation can be tied to the document creation phase. This

allows extraction of Dublin Core metadata directly from the authors’ documents.

Although the described system requires some implementation effort (mostly in developing the XSL style sheets for multi channel publishing), this effort is offset by economies of scale in the journal management and publishing workflows.

The proposed system has some limitations. The most significant is the limited quality control mechanism in DSpace. Another restriction is that DSpace provides only simple collection- and community-based metadata and cannot mix items and collections at the same hierarchical level.
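In DSpace, communities contain sub-communities and collections, while collections contain items. The following simplified Python model makes the hierarchy restriction concrete. It is not DSpace's Java API, and the journal, issue and section names are hypothetical. The model shows why a journal structure in which, for example, an editorial item sits next to issue collections, or an issue contains section collections, cannot be expressed directly.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    handle: str


@dataclass
class Collection:
    name: str
    items: list[Item] = field(default_factory=list)

    def add(self, child: object) -> None:
        # Collections hold items only; they cannot contain sub-collections.
        if not isinstance(child, Item):
            raise TypeError("a collection can only contain items")
        self.items.append(child)


@dataclass
class Community:
    name: str
    children: list[object] = field(default_factory=list)

    def add(self, child: object) -> None:
        # Communities hold sub-communities and collections, but no items.
        if not isinstance(child, (Community, Collection)):
            raise TypeError("a community cannot contain items directly")
        self.children.append(child)


# Usage with hypothetical names: journal as community, issue as collection.
journal = Community("Example Journal")
issue = Collection("Vol. 1, No. 1")
journal.add(issue)
issue.add(Item("123456789/29"))

# Mixing levels is rejected: an item next to issues, a collection in an issue.
for parent, child in [(journal, Item("123456789/30")),
                      (issue, Collection("Research Articles"))]:
    try:
        parent.add(child)
    except TypeError as err:
        print(f"{parent.name}: {err}")
```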

Scientific communication lacks universal authoring and preservation standards. Such standards would extend interoperability and make it possible to realize the overlay journal concept by aggregating content from distributed sources.
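The overlay journal idea can be sketched with OAI-PMH (Lagoze & Van de Sompel, 2002), which many repositories already expose. The Python sketch below parses simple Dublin Core (oai_dc) records from the ListRecords responses of several repositories and keeps only the records an editor has selected. It is a minimal sketch with several assumptions: HTTP requests, resumption tokens and error responses are omitted, the sample responses and identifiers are hypothetical, and editorial selection is reduced to a set of identifiers.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_records(xml_text: str, source: str) -> list[dict]:
    """Extract identifier, title and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        if header is None or header.get("status") == "deleted":
            continue  # deleted records carry a header but no metadata
        dc_elem = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc_elem is None:
            continue
        records.append({
            "source": source,
            "identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "title": dc_elem.findtext("dc:title", default="", namespaces=NS),
            "creators": [c.text for c in dc_elem.iterfind("dc:creator", NS) if c.text],
        })
    return records


def build_overlay_issue(responses: dict[str, str], selected: set[str]) -> list[dict]:
    """Aggregate records from several repositories, keeping editor-selected ones."""
    harvested = []
    for source, xml_text in responses.items():
        harvested.extend(parse_records(xml_text, source))
    return [r for r in harvested if r["identifier"] in selected]


def sample_response(identifier: str, title: str) -> str:
    # Hypothetical, heavily trimmed ListRecords response (no responseDate,
    # request element or resumptionToken).
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record>
    <header><identifier>{identifier}</identifier></header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>{title}</dc:title><dc:creator>Doe, J.</dc:creator>
      </oai_dc:dc>
    </metadata>
  </record></ListRecords>
</OAI-PMH>"""


overlay_issue = build_overlay_issue(
    {
        "repo-a": sample_response("oai:repo-a.example.org:1", "Paper A"),
        "repo-b": sample_response("oai:repo-b.example.org:7", "Paper B"),
    },
    selected={"oai:repo-b.example.org:7"},
)
print(overlay_issue)
```

The content stays in the originating repositories. The overlay only stores the editorial selection and the harvested metadata needed to build a table of contents.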


Furthermore, automated typesetting using XSL-FO has unavoidable limitations. Nevertheless, it is considered a robust system that offers sufficient features for high-quality PDF production.
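In the described system, the XSL-FO input is produced by XSL style sheets and rendered to PDF by Apache FOP (see the fo2pdf serializer in the appendix). The following Python sketch builds a minimal FO document directly, to illustrate the kind of structure such a style sheet has to emit. It is not the style sheet used in this work. The function name article_to_fo, the A4 page geometry and the spacing values are assumptions. The Garamond font family only takes effect if that font is registered in fop.xconf.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag: str) -> str:
    """Qualify a tag name with the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def article_to_fo(title: str, paragraphs: list[str]) -> ET.Element:
    """Build a minimal single-column XSL-FO document for an article."""
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-width": "210mm", "page-height": "297mm",
        "margin": "25mm",
    })
    ET.SubElement(page, fo("region-body"))
    # One page sequence whose flow fills the body region of the A4 master.
    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
    heading = ET.SubElement(flow, fo("block"), {
        "font-size": "16pt", "font-weight": "bold", "space-after": "12pt",
    })
    heading.text = title
    for text in paragraphs:
        block = ET.SubElement(flow, fo("block"), {
            "font-family": "Garamond", "font-size": "11pt",
            "text-align": "justify", "space-after": "6pt",
        })
        block.text = text
    return root


doc = article_to_fo("The green AND gold road", ["First paragraph.", "Second paragraph."])
print(ET.tostring(doc, encoding="unicode"))
```

Page layout, fonts and hyphenation are all fixed in this declarative description. This makes automated typesetting predictable, but it also explains the limitations when fine manual adjustments are needed.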

Further development of the proposed system could integrate missing stages of the journal management process, such as peer review, into the workflow. The system could also be extended to support additional word processing input formats, and an automated citation lookup tool could be developed for better reference handling.


REFERENCES

Adie, E., 2008. Who comments on scientific papers - and why? Nascent Nature's blog on web technology and science. Available at: http://blogs.nature.com/wp/nascent/2008/07/who_leaves_comments_on_scienti_1.html [Accessed November 5, 2008].

Agosti, D., 2008. Ein Erfolg für Open Access (Wissenschaft, NZZ Online). Neue Zürcher Zeitung (NZZ). Available at: http://www.nzz.ch/nachrichten/wissenschaft/ein_erfolg_fuer_open_access_1.647693.html [Accessed January 15, 2008].

ALPSP, 2005. The facts about Open Access, ALPSP - Association of Learned and Professional Society Publishers. Available at: http://www.alpsp.org/ngen_public/article.asp?id=200&did=47&aid=270&st=&oaid=-1 [Accessed July 1, 2008].

Anderson, C., 2006. Technical solutions: Wisdom of the crowds. Nature: Peer Review: Debate. Available at: http://www.nature.com/nature/peerreview/debate/nature04992.html [Accessed June 11, 2008].

Anderson, R., 2007. Open access - clear benefits, hidden costs. Learned Publishing, 20, 83-84.

Apache Cocoon, 2005. Apache Cocoon - Control Flow. Available at: http://cocoon.apache.org/2.1/userdocs/flow/index.html [Accessed September 7, 2008].

Arms, W.Y., 2002. Quality Control in Scholarly Publishing on the Web. Journal of Electronic Publishing, 8(1). Available at: http://www.press.umich.edu/jep/08-01/arms.html [Accessed February 11, 2008].

Aronson, J.K., 2005. Open access publishing: too much oxygen? BMJ, 330(7494), 759.

Awre, C., 2006. The technology of open access. In N. Jacobs, ed. Open Access: Key Strategic, Technical and Economic Aspects. Chandos Publishing.


Bachrach, S. et al., 1998. INTELLECTUAL PROPERTY: Who Should Own Scientific Papers? Science, 281(5382), 1459-1460.

Bailey Jr., C.W., 2005. Open Access Bibliography: Liberating Scholarly Literature with E-Prints and Open Access Journals, Washington, DC: Association of Research Libraries. Available at: http://www.digital-scholarship.com/oab/oab.pdf [Accessed July 23, 2008].

Bailey Jr., C.W., 2006. What is open access? In N. Jacobs, ed. Open Access: Key Strategic, Technical and Economic Aspects. Chandos Publishing.

Bankier, J. & Perciali, I., 2008. The Institutional Repository Rediscovered: What Can a University Do for Open Access Publishing? Serials Review, 34(1), 21-26.

Barnes, I., 2006a. Integrating the repository with academic workflow. Available at: http://www.apsr.edu.au/Open_Repositories_2006/ian_barnes.pdf [Accessed May 14, 2008].

Barnes, I., 2006b. Preservation of word processing documents, Available at: http://www.apsr.edu.au/publications/preservation_of_word_processing_documents.html [Accessed October 1, 2007].

Bass, M.J. et al., 2002. DSpace - Technology & Architecture. Available at: http://www.dspace.org/technology/architecture.pdf [Accessed March 2, 2008].

Benos, D.J. et al., 2007. The ups and downs of peer review. Advances in Physiology Education, 31(2), 145-152.

Berlin 3 Open Access, 2005. Berlin 3 Open Access: Progress in Implementing the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. Available at: http://www.eprints.org/events/berlin3/index.html [Accessed August 24, 2008].

Berlin Declaration, 2003. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. Available at: http://www.zim.mpg.de/openaccess-berlin/berlin_declaration.pdf [Accessed January 18, 2008].


Berners-Lee, T. et al., 2005. Journal publishing and author self-archiving: Peaceful Co-Existence and Fruitful Collaboration. Available at: http://eprints.ecs.soton.ac.uk/11160/ [Accessed June 10, 2008].

Berners-Lee, T. & Hendler, J., 2001. Scientific publishing on the 'semantic web'. Nature webdebates. Available at: http://www.nature.com/nature/debates/e-access/Articles/bernerslee.htm [Accessed January 20, 2008].

Berners-Lee, T., Hendler, J. & Lassila, O., 2001. The Semantic Web. Scientific American, 284(5). Available at: http://www.sciam.com/article.cfm?id=the-semantic-web [Accessed October 18, 2007].

Bethesda Statement, 2003. Bethesda Statement on Open Access Publishing. In Howard Hughes Medical Institute, Chevy Chase, Maryland. Available at: http://www.earlham.edu/~peters/fos/bethesda.htm [Accessed May 17, 2008].

Beyer, A. & Irmer, M., 2007. Sicherheitsaspekte elektronischen Publizierens. In Information und Ethik: Dritter Leipziger Kongress für Information und Bibliothek. pp. 95-106.

Björk, B., 2005. Open access to scientific publications - an analysis of the barriers to change? EBIB, (2), 63.

Borgman, C.L., 1999. What are digital libraries? Competing visions. Information Processing and Management, 35, 227-243.

Bourne, P.E., Fink, J.L. & Gerstein, M., 2008. Open Access: Taking Full Advantage of the Content. PLoS Computational Biology, 4(3). Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2275780 [Accessed June 10, 2008].

Brody, T. et al., 2004. The effect of Open Access on Citation Impact. Available at: http://opcit.eprints.org/feb19oa/brody-impact.pdf [Accessed August 22, 2008].


Brown, L., Griffiths, R. & Rascoff, M., 2007. University Publishing In A Digital Age, Available at: http://www.ithaka.org/strategic-services/university-publishing [Accessed January 30, 2008].

Budapest Open Access Initiative, 2002. Budapest Open Access Initiative. Available at: http://www.soros.org/openaccess/read.shtml [Accessed October 8, 2007].

Butler, D., 2005. Science in the web age: Joint efforts. Nature, 438(7068), 548-549.

Carr, L. & Harnad, S., 2005. Keystroke Economy: A Study of the Time and Effort Involved in Self-Archiving. Available at: http://eprints.ecs.soton.ac.uk/10688/ [Accessed May 31, 2008].

Cheung, K. et al., 2005. YeastHub: a semantic web use case for integrating data in the life sciences domain. Bioinformatics, 21(Suppl 1), 185-196.

Clarke, M., 2008. Role of blogs in communicating scientific knowledge. Peer-to-Peer. Available at: http://blogs.nature.com/peer-to-peer/2008/04/role_of_blogs_in_communicating.html [Accessed April 27, 2008].

Consultative Committee for Space Data Systems, 2002. Reference Model for an Open Archival Information System (OAIS). Available at: http://public.ccsds.org/publications/archive/650x0b1.pdf [Accessed August 11, 2008].

Couzin, J., 2006. STEM CELLS: ... And How the Problems Eluded Peer Reviewers and Editors. Science, 311(5757), 23-24.

Craig, I.D. et al., 2007. Do open access articles have greater citation impact? : A critical review of the literature. Journal of Informetrics, 1(3), 239-248.

Creative Commons, 2005. About. Available at: http://creativecommons.org/ [Accessed June 15, 2008].


De Schouwer, B., 2008. Google Summer of Code 2008 Collection Workflow. DSpace Wiki. Available at: http://wiki.dspace.org/index.php/Google_Summer_of_Code_2008_Collection_Workflow [Accessed October 28, 2008].

Digital Initiatives, 2005. DRI SchemaReference. Available at: http://di.tamu.edu/projects/xmlui/schemaReference [Accessed March 17, 2008].

DiLauro, T., 2004. Choosing the components of a digital infrastructure. First Monday, 9(5). Available at: http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1144/1064 [Accessed October 9, 2008].

Dobratz, S., 2005. Thinking the long term: the XML-based publishing workflow for handling electronic theses and dissertations at Humboldt-University Berlin. In University of New South Wales, Sydney, Australia. Available at: http://adt.caul.edu.au/etd2005/papers/075Dobratz.pdf [Accessed January 29, 2008].

Donohue, T.G., 2007. Configurable Submission System for DSpace. Available at: https://www.ideals.uiuc.edu/handle/2142/207 [Accessed May 31, 2008].

Donohue, T.G., Phillips, S. & Salo, D., 2007. DSpace How-To Guide. Available at: http://hdl.handle.net/2142/1043 [Accessed February 6, 2008].

dspace.org, 2006. DSpace System Documentation: Functional Overview. Available at: http://www.dspace.org/index.php?option=com_content&task=view&id=149 [Accessed September 16, 2008].

dspace.org, 2008. Create a new aspect (Manakin). DSpace Wiki. Available at: http://wiki.dspace.org/index.php/Create_a_new_aspect_(Manakin) [Accessed November 8, 2008].

dspace.org, 2008. Manakin theme tutorial. DSpace Wiki. Available at: http://wiki.dspace.org/index.php/Manakin_theme_tutorial [Accessed October 2, 2008].


Eisenberg, J.D., 2006. OpenOffice.org XML Essentials, O’Reilly & Associates, Inc. Available at: http://books.evc-cit.info/oobook/book.html [Accessed June 5, 2008].

European Commission, 2006. Study on the economic and technical evolution of the scientific publication markets in Europe, Available at: http://ec.europa.eu/research/science-society/pdf/scientific-publication-study_en.pdf [Accessed June 13, 2008].

European Commission, 2007. Communication on scientific information in the digital age: access, dissemination and preservation. Available at: http://ec.europa.eu/information_society/activities/digital_libraries/doc/scientific_information/communication_en.pdf [Accessed October 10, 2007].

European Commission, 2008. Better access to scientific articles on EU-funded research: European Commission launches online pilot project. Available at: http://europa.eu/rapid/pressReleasesAction.do?reference=IP/08/1262&format=HTML&aged=0&language=EN&guiLanguage=de [Accessed August 25, 2008].

European Research Council, 2007. ERC Scientific Council Guidelines for Open Access. Available at: http://erc.europa.eu/pdf/ScC_Guidelines_Open_Access_revised_Dec07_FINAL.pdf [Accessed May 23, 2008].

Fink, J.L. & Bourne, P.E., 2007. Reinventing Scholarly Communication for the Electronic Age. CTWatch Quarterly, 3(3). Available at: http://www.ctwatch.org/quarterly/articles/2007/08/ [Accessed June 15, 2008].

Foerster, T.V., 2001. The Future (?) of Peer Review. In The Transition from Paper: Where Are We Going and How Will We Get There? Available at: http://www.amacad.org/publications/trans5.aspx [Accessed May 16, 2008].

Fröhlich, G., 2006. "Informed Peer Review": Ausgleich der Fehler und Verzerrungen? In pp. 193-204. Available at: http://eprints.rclis.org/archive/00008493/fullmetadata.html [Accessed August 16, 2008].

FWF, 2008. Open Access Policy for FWF Projects. Available at: http://www.fwf.ac.at/en/public_relations/oai/index.html [Accessed July 25, 2008].


Garfield, E., 1972. What is a journal? Current Comments, 1(45), 376-377.

Gerstein, M., 1999. E-publishing on the Web: promises, pitfalls, and payoffs for bioinformatics. Bioinformatics, 15(6), 429-431.

Ginsparg, P., 1997. Winners and Losers in the Global Research Village. The Serials Librarian, 30(3/4), 83-95.

Ginsparg, P., 2000. Creating a global knowledge network. BMC News and Views, 1(1), 9.

Ginsparg, P., 2007. Next-Generation Implications of Open Access. CTWatch Quarterly, 3(3). Available at: http://www.ctwatch.org/quarterly/articles/2007/08/ [Accessed June 15, 2008].

Graham, T.W., 2000. Scholarly Communication. Serials: The Journal for the Serials Community, 13(1), 3-11.

Guédon, J., 1994. Why are electronic publications difficult to classify? Available at: http://people.virginia.edu/~pm9k/libsci/guedon.html [Accessed January 31, 2008].

Guédon, J., 2001. In Oldenburg’s Long Shadow: Librarians, Research Scientists, Publishers, and the Control of Scientific Publishing. In Proceedings of the 138th Annual Meeting. Toronto. Available at: http://www.arl.org/resources/pubs/mmproceedings/138guedon.shtml [Accessed August 9, 2008].

Guédon, J., 2004. The “Green” and “Gold” Roads to Open Access: The Case for Mixing and Matching. Serials Review, 30(4), 315-328.

Guédon, J., 2006. Open access: a symptom and a promise. In N. Jacobs, ed. Open Access: Key Strategic, Technical and Economic Aspects. Chandos Publishing.

Hammond, T., Hannay, T. & Lund, B., 2004. The Role of RSS in Science Publishing. D-Lib Magazine, 10(12). Available at: http://www.dlib.org/dlib/december04/hammond/12hammond.html [Accessed April 1, 2008].

Hannay, T., 2006. The Scientific Paper of the Future. Available at: http://blogs.nature.com/wp/nascent/061014_eScience_Hannay.pdf [Accessed October 4, 2007].

Hannay, T., 2007. Interview with Timo Hannay, Head of Web Publishing, Nature Publishing Group. Available at: http://jdupuis.blogspot.com/2007/07/interview-with-timo-hannay-head-of-web.html [Accessed November 29, 2007].

Hannay, T., 2007. Web 2.0 in Science. CTWatch Quarterly, 3(3). Available at: http://www.ctwatch.org/quarterly/articles/2007/08/web-20-in-science/ [Accessed June 15, 2008].

Harnad, S., 1996. Implementing Peer Review on the Net, MIT Press. Available at: http://cogprints.org/1692/0/harnad96.peer.review.html [Accessed December 9, 2007].

Harnad, S., 1997. Learned Inquiry and the Net: The Role of Peer Review, Peer Commentary and Copyright. Available at: http://eprints.ecs.soton.ac.uk/2633/ [Accessed August 18, 2008].

Harnad, S., 1998. The invisible hand of peer review. Nature Web Matters. Available at: http://www.nature.com/nature/webmatters/invisible/invisible.html [Accessed October 12, 2008].

Harnad, S., 2004. The Green Road to Open Access: A Leveraged Transition. Available at: http://users.ecs.soton.ac.uk/harnad/Temp/greenroad.html [Accessed July 22, 2008].

Harnad, S., 2005. Fast-Forward on the Green Road to Open Access: The Case Against Mixing Up Green and Gold. Ariadne, (42). Available at: http://www.ariadne.ac.uk/issue42/harnad/ [Accessed October 9, 2007].


Harnad, S., 2006. Publish or Perish — Self-Archive to Flourish: The Green Route to Open Access - ECS EPrints Repository. ERCIM News, (64). Available at: http://eprints.ecs.soton.ac.uk/11715/ [Accessed July 22, 2008].

Harnad, S. & Brody, T., 2004. Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals. D-Lib Magazine, 10(6). Available at: http://www.dlib.org/dlib/june04/harnad/06harnad.html [Accessed August 23, 2008].

Harnad, S. et al., 2004. The Access/Impact Problem and the Green and Gold Roads to Open Access. Serials Review, 30(4). Available at: http://eprints.ecs.soton.ac.uk/10209/01/impact.html [Accessed October 9, 2007].

Harnad, S. et al., 2008. The Access/Impact Problem and the Green and Gold Roads to Open Access: An Update. Serials Review, 34(1), 36-40.

Henry, G., 2003. On-line Publishing in the 21st Century. D-Lib Magazine, 9(10). Available at: http://www.dlib.org//dlib/october03/henry/10henry.html [Accessed August 17, 2008].

Houghton, J.W., 2001. Crisis and transition: the economics of scholarly communication. Learned Publishing, 14(3), 167-176.

House of Commons Science and Technology Committee, 2004. Scientific Publications: Free for all?, Available at: http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/39903.htm [Accessed May 19, 2008].

International Association of Scientific, Technical & Medical Publishers, 2007. Brussels Declaration on STM publishing. Available at: http://www.stm-assoc.org/brussels-declaration/ [Accessed October 12, 2008].

Kelly, K., 2005. We Are the Web. Wired Magazine. Available at: http://www.wired.com/wired/archive/13.08/tech_pr.html [Accessed November 3, 2008].


Kennan, M.A. & Kautz, K., 2007. Scholarly Publishing and Open Access: Searching for Understanding of an Emerging IS Phenomenon. In Proceedings ECIS 2007. St Gallen, Switzerland. Available at: http://dlist.sir.arizona.edu/1867/ [Accessed May 24, 2008].

Kirsop, B. & Chan, L., 2005. Transforming Access to Research Literature for Developing Countries. Serials Review, 31(4), 246-255.

Kling, R. & McKim, G., 1999. Scholarly communication and the continuum of electronic publishing. Journal of the American Society for Information Science, 50(10), 890-906.

Koohang, A. & Harman, K., 2006. The Academic Open Access E-Journal: Platform and Portal. Informing Science Journal, 9. Available at: http://hosted.trinigeeks.com/001research/wp-content/uploads/2007/03/informnu-articles-vol9v9p071-081koohang71-the-academic-open-access-e-journal_-platform-and-portal.pdf [Accessed May 25, 2008].

Kurtz, M.J. et al., 2005. The Effect of Use and Access on Citations. Information Processing and Management, 41(6), 1395-1402.

Lagoze, C. & Van de Sompel, H., 2002. Open Archives Initiative - Protocol for Metadata Harvesting. Available at: http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm [Accessed July 22, 2008].

Lamb, C., 2004. Open access publishing models: opportunity or threat to scholarly and academic publishers? Learned Publishing, 17(2), 143-150.

Larson, E., 2007. The BibApp. Available at: http://openrepositories.org/2007/program/files/4/larson.pdf [Accessed February 14, 2008].

Lawrence, S., 2001. Online or Invisible? Nature, 411(6837), 521.

Library Journal Academic Newswire, 2008. Submissions Jump Sharply Under New NIH Policy. Available at: http://www.libraryjournal.com/info/CA6581624.html?nid=2673#news1 [Accessed July 25, 2008].


Liu, S.V., 2007. Why are people reluctant to join in open review? Nature, 447, 1052.

Lynch, C.A., 2003. Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age. ARL, (226), 1-7.

Mabe, M., 2006. (Electronic) journal publishing. In The E-Resources Management Handbook. UK Serials Group, pp. 56-66. Available at: http://uksg.metapress.com/app/home/contribution.asp?referrer=parent&backto=issue,6,12;journal,1,1;linkingpublicationresults,1:120087,1 [Accessed January 30, 2008].

Mazzocchi, S., 2002. Introducing Cocoon 2.0. XML.com. Available at: http://www.xml.com/pub/a/2002/02/13/cocoon2.html [Accessed November 20, 2008].

Müller, E. et al., 2003. The DiVA Project - Development of an Electronic Publishing System. D-Lib Magazine, 9(11). Available at: http://www.dlib.org/dlib/november03/muller/11muller.html [Accessed December 15, 2008].

Mulligan, A., 2004. Is peer review in crisis? Available at: http://www.elsevier.com/framework_editors/pdfs/PerspPubl2.pdf [Accessed May 17, 2008].

National Institutes of Health, 2007. NIH Public Access Policy. Available at: http://publicaccess.nih.gov/policy.htm [Accessed June 25, 2008].

Nature, 2003. Coping with peer rejection. Nature, 425(6959), 645.

Nature, 2006. Nature's peer review trial. Available at: http://www.nature.com/nature/peerreview/debate/nature05535.html [Accessed November 5, 2008].

Odlyzko, A., 2002. The rapid evolution of scholarly communication. Learned Publishing, 15(1), 7-19.


OECD, 2004. Science, Technology and Innovation for the 21st Century. Meeting of the OECD Committee for Scientific and Technological Policy at Ministerial Level, 29-30 January 2004 - Final Communique. Available at: http://www.oecd.org/document/0,2340,en_2649_34487_25998799_1_1_1_1,00.html [Accessed May 25, 2008].

Oppenheim, C., Greenhalgh, C. & Rowland, F., 2000. The future of scholarly journal publishing. Journal of Documentation, 56(4), 361 - 398.

O'Reilly, T., 2004. The Architecture of Participation. Available at: http://www.oreillynet.com/pub/a/oreilly/tim/articles/architecture_of_participation.html [Accessed October 1, 2007].

O'Reilly, T., 2005. What Is Web 2.0. Available at: http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html?page=1 [Accessed October 31, 2007].

Park, E.G., 2007. Perspectives on Access to Electronic Journals for Long-Term Preservation. Serials Review, 33(1), 22-25.

Pearce, J. et al., 2008. The Australian METS Profile - A Journey about Metadata. D-Lib Magazine, 14(3/4). Available at: http://www.dlib.org/dlib/march08/pearce/03pearce.html [Accessed December 15, 2008].

Phillips, S. et al., 2005. Manakin Developer Guide. Available at: http://di.tamu.edu/projects/xmlui/resources/DevelopersGuide.pdf [Accessed April 9, 2008].

Phillips, S. et al., 2007. Manakin: A New Face for DSpace. D-Lib Magazine, 13(11/12). Available at: http://www.dlib.org/dlib/november07/phillips/11phillips.html [Accessed January 29, 2008].

Potočnik, J., 2007. 'Scientific Publishing in the European Research Area' – Access, Dissemination and Preservation in the Digital Age. Available at: http://europa.eu/rapid/pressReleasesAction.do?reference=SPEECH/07/83&format=HTML&aged=1&language=EN&guiLanguage=en [Accessed May 24, 2008].


Roberts, P., 1999. Scholarly Publishing, Peer Review and the Internet. First Monday. Available at: http://firstmonday.org/issues/issue4_4/proberts/ [Accessed August 10, 2008].

Roosendaal, H.E. & Geurts, P.A.T.M., 1998. Forces and functions in scientific communication. In Oldenburg, Germany. Available at: http://www.physik.uni-oldenburg.de/conferences/crisp97/roosendaal.html [Accessed January 30, 2008].

Rowland, F., 2002. The peer-review process. Learned Publishing, 15(4), 247-258.

Royster, P., 2008. Publishing Original Content in an Institutional Repository. Serials Review, 34(1), 27-30.

Rzepa, H.S. & Murray-Rust, P., 2001. A new publishing paradigm: STM articles as part of the semantic web. Learned Publishing, 14(3), 177-182.

Sandewall, E., 2006. Opening up the process. Nature's peer review debate. Available at: http://www.nature.com/nature/peerreview/debate/nature04994.html [Accessed June 14, 2008].

Sefton, P., 2006. The Integrated Content Environment. Available at: http://eprints.usq.edu.au/697/1/Sefton_ICE-ausweb06-paper-revised-3.pdf [Accessed December 15, 2008].

Sefton, P., 2007a. Why ICE works. PT's Blog. Available at: http://ptsefton.com/blog/2007/08/10/09-25-10.681066/ [Accessed June 5, 2008].

Sefton, P., 2007b. Why not HTML for online journals? People need the right tools. PT's Blog. Available at: http://ptsefton.com/blog/2007/08/09/09-23-19.208941/ [Accessed June 5, 2008].

Sefton, P., 2008. Swimming upstream. From the repository to the source, in search of better content - OR08 Publications. Available at: http://pubs.or08.ecs.soton.ac.uk/74/ [Accessed December 15, 2008].


Sense About Science ed., 2004. Peer review and the acceptance of new scientific ideas. Available at: http://www.senseaboutscience.org.uk/pdf/PeerReview.pdf [Accessed October 8, 2007].

Seringhaus, M. & Gerstein, M., 2006. The Scientist : The Death of the Scientific Paper. The Scientist, 20(9), 25.

Seringhaus, M. & Gerstein, M., 2007. Publishing perishing? BMC Bioinformatics, 8(17). Available at: http://www.biomedcentral.com/1471-2105/8/17/abstract [Accessed May 16, 2008].

Smith, A.P., 2000. The journal as an overlay on preprint databases. Learned Publishing, 13(1), 43-48.

Smith, J.W.T., 1999. The deconstructed journal - a new model for academic publishing. Learned Publishing, 12(2), 79-91.

Smith, M. et al., 2003. DSpace - An Open Source Dynamic Digital Repository. D-Lib Magazine, 9(1). Available at: http://dx.doi.org/10.1045/january2003-smith [Accessed January 3, 2008].

Smith, R., 1999. Opening up BMJ peer review. BMJ, 318(7175), 4-5.

Suber, P., 2005. Open access, impact, and demand. BMJ, (330), 1097-1098.

Suber, P., 2006. No-fee open-access journals. SPARC Open Access Newsletter, issue #103. Available at: http://www.earlham.edu/~peters/fos/newsletter/11-02-06.htm#nofee [Accessed June 14, 2008].

Suber, P., 2007. Open Access Overview (definition, introduction). Available at: http://www.earlham.edu/~peters/fos/overview.htm [Accessed October 8, 2007].

Suber, P., 2008a. SPARC Open Access Newsletter. SPARC Open Access Newsletter, issue #119. Available at: http://www.earlham.edu/~peters/fos/newsletter/03-02-08.htm [Accessed July 25, 2008].


Suber, P., 2008b. Springer buys BioMed Central. Open Access News. Available at: http://www.earlham.edu/~peters/fos/2008/10/springer-buys-biomed-central.html [Accessed October 11, 2008].

Sun, S., Lannom, L. & Boesch, B., 2003. Handle System Overview. Available at: http://www.handle.net/rfc/rfc3650.html [Accessed June 28, 2008].

Swan, A., 2005. Open access self-archiving: An introduction. Available at: http://eprints.ecs.soton.ac.uk/11006/1/jiscsum.pdf [Accessed July 22, 2008].

Swan, A., 2006. Overview of scholarly communication. In N. Jacobs, ed. Open Access: Key Strategic, Technical and Economic Aspects. Chandos Publishing.

Van de Sompel, H. & Lagoze, C., 2007. Interoperability for the Discovery, Use, and Re-Use of Units of Scholarly Communication. CTWatch Quarterly, 3(3). Available at: http://www.ctwatch.org/quarterly/articles/2007/08/ [Accessed June 15, 2008].

Van de Sompel, H. et al., 2004. Rethinking Scholarly Communication. D-Lib Magazine, 10(9). Available at: http://dx.doi.org/10.1045/september2004-vandesompel [Accessed January 9, 2008].

Varmus, H., 2003. „Werdet Teil der Revolution!“. Available at: http://zeus.zeit.de/text/2003/26/N-Interview-Varmus [Accessed October 31, 2007].

W3C, 2008. RDFa in XHTML: Syntax and Processing. Available at: http://www.w3.org/TR/rdfa-syntax/ [Accessed October 16, 2008].

Waldrop, M.M., 2008. Science 2.0: Great New Tool, or Great Risk? Scientific American. Available at: http://www.sciam.com/article.cfm?id=science-2-point-0-great-new-tool-or-great-risk [Accessed January 31, 2008].

Wellcome Trust, 2003. Economic analysis of scientific research publishing, Available at: http://www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/wtd003182.pdf [Accessed August 24, 2008].


Willinsky, J., 2006. The Access Principle: The Case for Open Access to Research and Scholarship, Cambridge, Massachusetts: MIT Press.

Wittenberg, K., 2008. The Role of the Library in 21st-Century Scholarly Publishing. In No Brief Candle: Reconceiving Research Libraries for the 21st Century. Washington, D.C.: Council on Library and Information Resources. Available at: http://www.clir.org/pubs/reports/pub142/pub142.pdf [Accessed August 16, 2008].

WSIS, 2003. Declaration of Principles. Building the Information Society: a global challenge in the new Millennium. In Geneva. Available at: http://www.itu.int/wsis/docs/geneva/official/dop.html [Accessed August 25, 2008].


APPENDIX

DSpace Installation

For detailed documentation on setting up DSpace, see http://www.dspace.org/index.php/Architecture/technology/system-docs/install.html

The following listings describe how to change the default XSLT processor (to Saxon) and how to add the XSL-FO renderer Apache FOP:

Listing 12: Adding Saxon XSLT processor to Manakin's main sitemap.xmap

<map:transformer name="xslt-saxon" pool-grow="2" pool-max="32" pool-min="8"
    src="org.apache.cocoon.transformation.TraxTransformer">
  <use-request-parameters>false</use-request-parameters>
  <use-browser-capabilities-db>false</use-browser-capabilities-db>
  <xslt-processor-role>saxon</xslt-processor-role>
</map:transformer>

Listing 13: Activating Saxon XSLT processor in cocoon.xconf

<component logger="core.xslt-processor"
    role="org.apache.excalibur.xml.xslt.XSLTProcessor/saxon"
    class="org.apache.excalibur.xml.xslt.XSLTProcessorImpl">
  <parameter name="use-store" value="true"/>
  <parameter name="transformer-factory" value="net.sf.saxon.TransformerFactoryImpl"/>
</component>

Listing 14: Adding Apache FOP to Manakin's main sitemap.xmap

<map:serializer name="fo2pdf" src="org.apache.cocoon.blocks.fop.FOPNGSerializer"
    mime-type="application/pdf">
  <user-config>context://fop/fop.xconf</user-config>
</map:serializer>

Adding custom fonts to fop.xconf (example). For details on generating FOP font metrics files, see http://xmlgraphics.apache.org/fop/0.94/fonts.html

<font metrics-url="context://fop/GDRG____.xml" kerning="yes"
    embed-url="context://fop/GDRG____.PFB">
  <font-triplet name="Garamond" style="normal" weight="normal"/>
  <font-triplet name="AGaramond" style="normal" weight="normal"/>
</font>


Listings

Listing 15: DSpace DRI (Digital Repository Interface) sample:

<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="http://di.tamu.edu/DRI/1.0/" xmlns:i18n="http://apache.org/cocoon/i18n/2.1" version="1.1">
  <body>
    <div rend="primary" n="item-view" id="aspect.artifactbrowser.ItemViewer.div.item-view">
      <head>The green AND gold road: journal management and publishing workflow extensions for the DSpace repository platform</head>
      <p rend="item-view-toggle item-view-toggle-top">
        <xref target="/xmlui/handle/123456789/29?show=full">Show full item record</xref>
      </p>
      <referenceSet type="summaryView" n="collection-viewer" id="aspect.artifactbrowser.ItemViewer.referenceSet.collection-viewer">
        <reference repositoryID="123456789" type="DSpace Item" url="/metadata/handle/123456789/29/mets.xml">
          <referenceSet rend="hierarchy" type="detailList">
            <head>This item appears in the following Collection(s)</head>
            <reference repositoryID="123456789" type="DSpace Collection" url="/metadata/handle/123456789/5/mets.xml"/>
          </referenceSet>
        </reference>
      </referenceSet>
      <p rend="item-view-toggle item-view-toggle-bottom">
        <xref target="/xmlui/handle/123456789/29?show=full">Show full item record</xref>
      </p>
    </div>
    <div rend="primary journal collection" n="collection-journal" id="at.ac.wuwien.xmlui.aspect.journal.JournalItemViewer.div.collection-journal">
      <head>The green AND gold road: journal management and publishing workflow extensions for the DSpace repository platform</head>
      <referenceSet type="summaryView" n="item" id="at.ac.wuwien.xmlui.aspect.journal.JournalItemViewer.referenceSet.item">
        <reference repositoryID="123456789" type="DSpace Item" url="/metadata/handle/123456789/29/mets.xml">
          <referenceSet rend="hierarchy" type="summaryList"/>
        </reference>
      </referenceSet>
    </div>
  </body>
  <options>
    <list n="browse" id="aspect.artifactbrowser.Navigation.list.browse">
      <head>Browse</head>
      <list n="global" id="aspect.artifactbrowser.Navigation.list.global">
        <head>All of DSpace</head>
        <item><xref target="/xmlui/community-list">Communities &amp; Collections</xref></item>
        <item><xref target="/xmlui/browse?type=dateissued">By Issue Date</xref></item>
        <item><xref target="/xmlui/browse?type=author">Authors</xref></item>
        <item><xref target="/xmlui/browse?type=title">Titles</xref></item>
        <item><xref target="/xmlui/browse?type=subject">Subjects</xref></item>
      </list>
      <list n="context" id="aspect.artifactbrowser.Navigation.list.context">
        <head>This Collection</head>
        <item><xref target="/xmlui/handle/123456789/5/browse?type=dateissued">By Issue Date</xref></item>
        <item><xref target="/xmlui/handle/123456789/5/browse?type=author">Authors</xref></item>
        <item><xref target="/xmlui/handle/123456789/5/browse?type=title">Titles</xref></item>
        <item><xref target="/xmlui/handle/123456789/5/browse?type=subject">Subjects</xref></item>
      </list>
    </list>
    <list n="account" id="aspect.artifactbrowser.Navigation.list.account">
      <head>My Account</head>
      <item><xref target="/xmlui/login">Login</xref></item>
      <item><xref target="/xmlui/register">Register</xref></item>
    </list>
    <list n="context" id="aspect.artifactbrowser.Navigation.list.context"/>
    <list n="administrative" id="aspect.artifactbrowser.Navigation.list.administrative"/>
    <list n="context_journal" id="at.ac.wuwien.xmlui.aspect.journal.Navigation.list.context_journal">
      <head>This Article</head>
      <item><xref target="/xmlui/view/handle/123456789/29/xhtml/article.xhtml">Full Text (XHTML)</xref></item>
      <item><xref target="/xmlui/view/handle/123456789/29/xhtml/article.pdf">PDF</xref></item>
      <list n="context_share" id="at.ac.wuwien.xmlui.aspect.journal.Navigation.list.context_share">
        <head>Bookmark</head>
        <item><xref rend="connotea" target="http://www.connotea.org/add?uri=http://hdl.handle.net/123456789/29">Connotea</xref></item>
        <item><xref rend="citeulike" target="http://www.citeulike.org/posturl?url=http://hdl.handle.net/123456789/29">CiteULike</xref></item>
        <item><xref rend="delicious" target="http://delicious.com/post?url=http://hdl.handle.net/123456789/29">Del.icio.us</xref></item>
        <item><xref rend="facebook" target="http://www.facebook.com/sharer.php?u=http://hdl.handle.net/123456789/29">Facebook</xref></item>
      </list>
    </list>
  </options>
  <meta>
    <userMeta authenticated="no">
      <metadata element="identifier" qualifier="loginURL">/xmlui/login</metadata>
      <metadata element="language" qualifier="RFC3066">en_GB</metadata>
      <metadata element="language" qualifier="RFC3066">en</metadata>
    </userMeta>
    <pageMeta>
      <metadata element="contextPath">/xmlui</metadata>
      <metadata element="request" qualifier="queryString"/>
      <metadata element="request" qualifier="scheme">http</metadata>
      <metadata element="request" qualifier="serverPort">8080</metadata>
      <metadata element="request" qualifier="serverName">dspace.wu-wien.ac.at</metadata>
      <metadata element="request" qualifier="URI">handle/123456789/29</metadata>
      <metadata element="search" qualifier="simpleURL">/xmlui/search</metadata>
      <metadata element="search" qualifier="advancedURL">/xmlui/advanced-search</metadata>
      <metadata element="search" qualifier="queryField">query</metadata>
      <metadata element="page" qualifier="contactURL">/xmlui/contact</metadata>
      <metadata element="page" qualifier="feedbackURL">/xmlui/feedback</metadata>
      <metadata element="focus" qualifier="object">hdl:123456789/29</metadata>
      <metadata element="focus" qualifier="container">hdl:123456789/5</metadata>
      <metadata element="title">The green AND gold road: journal management and publishing workflow extensions for the DSpace repository platform</metadata>
      <trail target="/xmlui/">DSpace Home</trail>
      <trail target="/xmlui/handle/123456789/1">Library</trail>
      <trail target="/xmlui/handle/123456789/5">All Articles</trail>
      <trail>View Item</trail>
    </pageMeta>
    <repositoryMeta>
<repository repositoryID="123456789" url="/metadata/internal/repository/123456789/mets.xml"/> </repositoryMeta> </meta> </document>
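In the DRI document above, the journal aspect adds its own `context_journal` and `context_share` lists to the `<options>` section, next to the standard DSpace navigation lists. XMLUI themes render these lists with XSL. The following Python sketch uses only the standard library and is not part of the described implementation. It illustrates the DRI structure by selecting a list by its `n` attribute and returning that list's links. The helper name `option_links` and the shortened `SAMPLE_DRI` excerpt are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the DRI root element <document xmlns="http://di.tamu.edu/DRI/1.0/">
DRI_NS = {"dri": "http://di.tamu.edu/DRI/1.0/"}

# Shortened excerpt of the DRI document: only the journal aspect's option lists
SAMPLE_DRI = """<document xmlns="http://di.tamu.edu/DRI/1.0/" version="1.1">
  <options>
    <list n="context_journal" id="at.ac.wuwien.xmlui.aspect.journal.Navigation.list.context_journal">
      <head>This Article</head>
      <item><xref target="/xmlui/view/handle/123456789/29/xhtml/article.xhtml">Full Text (XHTML)</xref></item>
      <item><xref target="/xmlui/view/handle/123456789/29/xhtml/article.pdf">PDF</xref></item>
      <list n="context_share" id="at.ac.wuwien.xmlui.aspect.journal.Navigation.list.context_share">
        <head>Bookmark</head>
        <item><xref rend="citeulike" target="http://www.citeulike.org/posturl?url=http://hdl.handle.net/123456789/29">CiteULike</xref></item>
      </list>
    </list>
  </options>
</document>"""


def option_links(dri_xml, list_name):
    """Hypothetical helper: return (label, target) pairs of the items directly
    inside the first <options> list whose n attribute equals list_name."""
    root = ET.fromstring(dri_xml)
    # "//" descends into nested lists, so context_share is found as well
    for lst in root.iterfind("dri:options//dri:list", DRI_NS):
        if lst.get("n") == list_name:
            # Only direct <item> children, so links of nested lists are not mixed in
            return [(x.text, x.get("target"))
                    for x in lst.iterfind("dri:item/dri:xref", DRI_NS)]
    return []


for label, target in option_links(SAMPLE_DRI, "context_journal"):
    print(f"{label}: {target}")
```

Because the sketch reads only direct `<item>` children, the bookmark links of the nested `context_share` list do not appear in the result for `context_journal`. They can be requested separately by name.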

Listing 16: DSpace METS object sample

<?xml version="1.0" encoding="UTF-8"?>
<mets:METS xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/TR/xlink/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dim="http://www.dspace.org/xmlns/dspace/dim" LABEL="DSpace Item" OBJID="/xmlui/handle/123456789/29" PROFILE="DSPACE METS SIP Profile 1.0" OBJEDIT="/xmlui/admin/item?itemID=178" ID="hdl:123456789/29">
  <mets:dmdSec GROUPID="group_dmd_0" ID="dmd_1">
    <mets:mdWrap OTHERMDTYPE="DIM" MDTYPE="OTHER">
      <mets:xmlData>
        <dim:dim dspaceType="ITEM">
          <dim:field element="contributor" mdschema="dc" qualifier="author">Andreas Geyrecker</dim:field>
          <dim:field element="contributor" mdschema="dc" qualifier="author">Fridolin Wild</dim:field>
          <dim:field element="date" mdschema="dc" qualifier="accessioned">2008-06-10T05:13:56Z</dim:field>
          <dim:field element="date" mdschema="dc" qualifier="available">2008-06-10T05:13:56Z</dim:field>
          <dim:field element="date" mdschema="dc" qualifier="issued">2008-06-10T05:13:56Z</dim:field>
          <dim:field element="date" mdschema="dc" qualifier="submitted" language="en">2008-06-10</dim:field>
          <dim:field element="identifier" mdschema="dc" qualifier="uri">http://hdl.handle.net/123456789/29</dim:field>
          <dim:field element="description" mdschema="dc" qualifier="abstract" language="en">Today a major part of scientific publishing means distribution of articles through journals both in print and online. Articles are still the predominant publishing format, even in the web-based publishing era. In this paper we present an integrated journal management and publishing workflow based on the DSpace repository software. Our aim is to provide storage for and persistent access to research articles archived in an institutional repository and to publish journals from those articles. It is meant to suggest an approach which simplifies the authoring and submission process, including automated metadata extraction directly from the author's document. Delivering content in flexible ways helps to improve reading experience and enables better handling of archived scholarly material. Therefore, the system offers the publishing of repository content in different output formats, namely XHTML and PDF with the help of XSL style sheets. Furthermore, it should make the set up and maintenance of journal issues easier. The new journal workflow allows flexible packaging of repository content and even automated production of print editions of journal issues. We also consider how our approach could be extended to integrate missing journal management stages like peer review.</dim:field>
          <dim:field element="description" mdschema="dc" qualifier="provenance" language="en">Submitted by Andreas Geyrecker ([email protected]) on 2008-06-10T05:13:31Z No. of bitstreams: 1 article.xml: 41465 bytes, checksum: 4c0443f33cd3e63f2aa51f08a7e647a3 (MD5)</dim:field>
          <dim:field element="description" mdschema="dc" qualifier="provenance" language="en">Approved for entry into archive by Andreas Geyrecker ([email protected]) on 2008-06-10T05:13:56Z (GMT) No. of bitstreams: 1 article.xml: 41465 bytes, checksum: 4c0443f33cd3e63f2aa51f08a7e647a3 (MD5)</dim:field>
          <dim:field element="description" mdschema="dc" qualifier="provenance" language="en">Made available in DSpace on 2008-06-10T05:13:56Z (GMT). No. of bitstreams: 1 article.xml: 41465 bytes, checksum: 4c0443f33cd3e63f2aa51f08a7e647a3 (MD5)</dim:field>
          <dim:field element="subject" mdschema="dc" language="en">open access</dim:field>
          <dim:field element="subject" mdschema="dc" language="en">journal management</dim:field>
          <dim:field element="subject" mdschema="dc" language="en">institutional repository</dim:field>
          <dim:field element="title" mdschema="dc" language="en">The green AND gold road: journal management and publishing workflow extensions for the DSpace repository platform</dim:field>
          <dim:field element="type" mdschema="dc" language="en">Article</dim:field>
          <dim:field element="volume" mdschema="prism" language="">X</dim:field>
        </dim:dim>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:fileSec>
    <mets:fileGrp USE="LICENSE">
      <mets:file SIZE="1842" GROUP_ID="group_file_418" CHECKSUM="9753550784ac11023c625b60772ea2bc" MIMETYPE="text/plain" CHECKSUMTYPE="MD5" ID="file_418">
        <mets:FLocat LOCTYPE="URL" xlink:href="/xmlui/bitstream/handle/123456789/29/license.txt?sequence=1" xlink:title="license.txt" xlink:type="locator"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="CONTENT">
      <mets:file SIZE="41465" GROUP_ID="group_file_419" CHECKSUM="4c0443f33cd3e63f2aa51f08a7e647a3" MIMETYPE="text/xml" CHECKSUMTYPE="MD5" ID="file_419">
        <mets:FLocat LOCTYPE="URL" xlink:href="/xmlui/bitstream/handle/123456789/29/article.xml?sequence=2" xlink:title="article.xml" xlink:type="locator" xlink:label="Article"/>
      </mets:file>
      <mets:file SIZE="26964" GROUP_ID="group_file_421" CHECKSUM="35e28175baf07d30cdf17f46a7a61315" MIMETYPE="image/png" CHECKSUMTYPE="MD5" ID="file_421">
        <mets:FLocat LOCTYPE="URL" xlink:href="/xmlui/bitstream/handle/123456789/29/10000000000002C0000002B71A4ABC91.png?sequence=4" xlink:title="10000000000002C0000002B71A4ABC91.png" xlink:type="locator" xlink:label=""/>
      </mets:file>
      <mets:file SIZE="19701" GROUP_ID="group_file_423" CHECKSUM="e39998431c146a32b40a54596eda0934" MIMETYPE="image/png" CHECKSUMTYPE="MD5" ID="file_423">
        <mets:FLocat LOCTYPE="URL" xlink:href="/xmlui/bitstream/handle/123456789/29/1000000000000394000002C3EAF8DC56.png?sequence=6" xlink:title="1000000000000394000002C3EAF8DC56.png" xlink:type="locator" xlink:label=""/>
      </mets:file>
      <mets:file SIZE="12441" GROUP_ID="group_file_425" CHECKSUM="6ddf6d050ec0c652834c0c392edd0654" MIMETYPE="image/png" CHECKSUMTYPE="MD5" ID="file_425">
        <mets:FLocat LOCTYPE="URL" xlink:href="/xmlui/bitstream/handle/123456789/29/10000000000002920000019BDB8CEE50.png?sequence=8" xlink:title="10000000000002920000019BDB8CEE50.png" xlink:type="locator" xlink:label=""/>
      </mets:file>
      <mets:file SIZE="2001" GROUP_ID="group_file_427" CHECKSUM="301db28f9bf4e7549f55940c14c035fe" MIMETYPE="image/png" CHECKSUMTYPE="MD5" ID="file_427">
        <mets:FLocat LOCTYPE="URL" xlink:href="/xmlui/bitstream/handle/123456789/29/1000000000000250000000228B3C9064.png?sequence=10" xlink:title="1000000000000250000000228B3C9064.png" xlink:type="locator" xlink:label=""/>
      </mets:file>
      <mets:file SIZE="106241" GROUP_ID="group_file_420" CHECKSUM="492982f52e190eb18a0a71584bf9b0c6" MIMETYPE="image/png" CHECKSUMTYPE="MD5" ID="file_420">
        <mets:FLocat LOCTYPE="URL" xlink:href="/xmlui/bitstream/handle/123456789/29/10000000000002B40000011E7337002A.png?sequence=3" xlink:title="10000000000002B40000011E7337002A.png" xlink:type="locator" xlink:label=""/>
      </mets:file>
      <mets:file SIZE="9236" GROUP_ID="group_file_422" CHECKSUM="96eac881fd48f79cfcac51caa3f9fb09" MIMETYPE="image/png" CHECKSUMTYPE="MD5" ID="file_422">
        <mets:FLocat LOCTYPE="URL" xlink:href="/xmlui/bitstream/handle/123456789/29/10000000000002F500000144DAB359C5.png?sequence=5" xlink:title="10000000000002F500000144DAB359C5.png" xlink:type="locator" xlink:label=""/>
      </mets:file>
      <mets:file SIZE="2475" GROUP_ID="group_file_424" CHECKSUM="3a3e2a8936148fa2efa57fa5e745097b" MIMETYPE="image/png" CHECKSUMTYPE="MD5" ID="file_424">
        <mets:FLocat LOCTYPE="URL" xlink:href="/xmlui/bitstream/handle/123456789/29/10000000000002920000003D62B832F5.png?sequence=7" xlink:title="10000000000002920000003D62B832F5.png" xlink:type="locator" xlink:label=""/>
      </mets:file>
      <mets:file SIZE="44560" GROUP_ID="group_file_426" CHECKSUM="45b3d4f0604f9d86d246cce0fcdb0298" MIMETYPE="image/png" CHECKSUMTYPE="MD5" ID="file_426">
        <mets:FLocat LOCTYPE="URL" xlink:href="/xmlui/bitstream/handle/123456789/29/100000000000026900000361BEB954CC.png?sequence=9" xlink:title="100000000000026900000361BEB954CC.png" xlink:type="locator" xlink:label=""/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="LOGICAL" LABEL="DSpace">
    <mets:div TYPE="DSpace Item" DMDID="dmd_1">
      <mets:div TYPE="DSpace Content Bitstream" ID="div_2"><mets:fptr FILEID="file_419"/></mets:div>
      <mets:div TYPE="DSpace Content Bitstream" ID="div_3"><mets:fptr FILEID="file_421"/></mets:div>
      <mets:div TYPE="DSpace Content Bitstream" ID="div_4"><mets:fptr FILEID="file_423"/></mets:div>
      <mets:div TYPE="DSpace Content Bitstream" ID="div_5"><mets:fptr FILEID="file_425"/></mets:div>
      <mets:div TYPE="DSpace Content Bitstream" ID="div_6"><mets:fptr FILEID="file_427"/></mets:div>
      <mets:div TYPE="DSpace Content Bitstream" ID="div_7"><mets:fptr FILEID="file_420"/></mets:div>
      <mets:div TYPE="DSpace Content Bitstream" ID="div_8"><mets:fptr FILEID="file_422"/></mets:div>
      <mets:div TYPE="DSpace Content Bitstream" ID="div_9"><mets:fptr FILEID="file_424"/></mets:div>
      <mets:div TYPE="DSpace Content Bitstream" ID="div_10"><mets:fptr FILEID="file_426"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:METS>

Listing 17: Journal aspect sitemap.xmap

<?xml version="1.0"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:components>
    <map:transformers>
      <map:transformer name="JournalViewer" src="at.ac.wuwien.xmlui.aspect.journal.JournalViewer"/>
      <map:transformer name="JournalIssueViewer" src="at.ac.wuwien.xmlui.aspect.journal.JournalIssueViewer"/>
      <map:transformer name="JournalItemViewer" src="at.ac.wuwien.xmlui.aspect.journal.JournalItemViewer"/>
      <map:transformer name="Navigation" src="at.ac.wuwien.xmlui.aspect.journal.Navigation"/>
      <map:transformer name="JournalMain" src="at.ac.wuwien.xmlui.aspect.journal.JournalMain"/>
      <map:transformer name="BrowseJournalForm" src="at.ac.wuwien.xmlui.aspect.journal.BrowseJournalForm"/>
      <map:transformer name="SearchItemForm" src="at.ac.wuwien.xmlui.aspect.journal.SearchItemForm"/>
      <map:transformer name="SystemwideAlerts" src="org.dspace.app.xmlui.aspect.administrative.SystemwideAlerts"/>
      <map:transformer name="NotAuthorized" src="org.dspace.app.xmlui.aspect.administrative.NotAuthorized"/>
      <map:transformer name="RestrictedItem" src="org.dspace.app.xmlui.aspect.artifactbrowser.RestrictedItem"/>
      <map:transformer name="EditJournalMetadataForm" src="at.ac.wuwien.xmlui.aspect.journal.EditJournalMetadataForm"/>
      <map:transformer name="DeleteCommunityConfirm" src="org.dspace.app.xmlui.aspect.administrative.community.DeleteCommunityConfirm"/>
      <map:transformer name="CreateJournalForm" src="at.ac.wuwien.xmlui.aspect.journal.CreateJournalForm"/>
    </map:transformers>
    <map:matchers default="wildcard">
      <map:matcher name="requestParameterWild" src="org.apache.cocoon.matching.WildcardRequestParameterMatcher"/>
      <map:matcher name="HandleTypeMatcher" src="org.dspace.app.xmlui.aspect.general.HandleTypeMatcher"/>
      <map:matcher name="HandleAuthorizedMatcher" src="org.dspace.app.xmlui.aspect.general.HandleAuthorizedMatcher"/>
    </map:matchers>
    <map:selectors>
      <map:selector name="AuthenticatedSelector" src="org.dspace.app.xmlui.aspect.general.AuthenticatedSelector"/>
    </map:selectors>
  </map:components>
  <map:flow language="javascript">
    <map:script src="journal.js"/>
  </map:flow>
  <map:pipelines>
    <map:pipeline>
      <map:select type="AuthenticatedSelector">
        <map:when test="eperson">
          <map:match pattern="journal/issue">
            <map:match type="request" pattern="journal-continue">
              <map:call continuation="{1}"/>
            </map:match>
            <map:match type="request" pattern="collectionID">
              <map:call function="startJournal"/>
            </map:match>
          </map:match>
          <map:match pattern="journal/createJournal">
            <map:match type="request" pattern="journal-continue">
              <map:call continuation="{1}"/>
            </map:match>
            <map:match type="request" pattern="createNew">
              <map:match type="request" pattern="communityID">
                <map:call function="startCreateJournal"/>
              </map:match>
              <map:call function="startCreateJournal"/>
            </map:match>
            <map:match type="request" pattern="communityID">
              <map:call function="startEditJournal"/>
            </map:match>
          </map:match>
        </map:when>
      </map:select>
      <map:generate/>
      <map:transform type="SystemwideAlerts"/>
      <map:transform type="Navigation"/>
      <map:select type="AuthenticatedSelector">
        <map:when test="eperson">
          <map:match type="WildcardParameterMatcher" pattern="true">
            <map:parameter name="parameter-name" value="flow"/>
            <map:parameter name="flow" value="{flow-attribute:flow}"/>
            <map:match type="WildcardParameterMatcher" pattern="true">
              <map:parameter name="parameter-name" value="notice"/>
              <map:parameter name="notice" value="{flow-attribute:notice}"/>
              <map:transform type="notice">
                <map:parameter name="outcome" value="{flow-attribute:outcome}"/>
                <map:parameter name="header" value="{flow-attribute:header}"/>
                <map:parameter name="message" value="{flow-attribute:message}"/>
                <map:parameter name="characters" value="{flow-attribute:characters}"/>
              </map:transform>
            </map:match>
            <map:match pattern="journal/not-authorized">
              <map:transform type="NotAuthorized"/>
            </map:match>
            <map:match pattern="journal/issue/main">
              <map:transform type="JournalMain">
                <map:parameter name="collectionID" value="{flow-attribute:collectionID}"/>
              </map:transform>
            </map:match>
            <map:match pattern="journal/issue/browse">
              <map:transform type="BrowseJournalForm">
                <map:parameter name="collectionID" value="{flow-attribute:collectionID}"/>
              </map:transform>
            </map:match>
            <map:match pattern="journal/issue/search">
              <map:transform type="SearchItemForm">
                <map:parameter name="collectionID" value="{flow-attribute:collectionID}"/>
                <map:parameter name="query" value="{flow-attribute:query}"/>
              </map:transform>
            </map:match>
            <map:match pattern="journal/editJournalMetadata/admin">
              <map:transform type="EditJournalMetadataForm">
                <map:parameter name="communityID" value="{flow-attribute:communityID}"/>
              </map:transform>
            </map:match>
            <map:match pattern="journal/deleteJournal/admin">
              <map:transform type="DeleteCommunityConfirm">
                <map:parameter name="communityID" value="{flow-attribute:communityID}"/>
              </map:transform>
            </map:match>
            <map:match pattern="journal/createJournal/admin">
              <map:transform type="CreateJournalForm">
                <map:parameter name="communityID" value="{flow-attribute:communityID}"/>
              </map:transform>
            </map:match>
          </map:match>
        </map:when>
        <map:otherwise>
          <map:match pattern="journal">
            <map:act type="StartAuthentication"/>
          </map:match>
        </map:otherwise>
      </map:select>
      <map:match pattern="handle/*/**">
        <map:match pattern="handle/*/*">
          <map:match type="HandleAuthorizedMatcher" pattern="READ">
            <map:match type="HandleTypeMatcher" pattern="community">
              <map:transform type="JournalViewer"/>
              <map:serialize type="xml"/>
            </map:match>
            <map:match type="HandleTypeMatcher" pattern="collection">
              <map:transform type="JournalIssueViewer"/>
              <map:serialize type="xml"/>
            </map:match>
            <map:match type="HandleTypeMatcher" pattern="item">
              <map:transform type="JournalItemViewer"/>
              <map:serialize type="xml"/>
            </map:match>
          </map:match>
          <map:match type="HandleAuthorizedMatcher" pattern="!READ">
            <map:select type="AuthenticatedSelector">
              <map:when test="eperson">
                <map:transform type="RestrictedItem"/>
                <map:serialize/>
              </map:when>
              <map:otherwise>
                <map:act type="StartAuthentication">
                  <map:parameter name="header" value="xmlui.ArtifactBrowser.RestrictedItem.auth_header"/>
                  <map:parameter name="message" value="xmlui.ArtifactBrowser.RestrictedItem.auth_message"/>
                </map:act>
                <map:serialize/>
              </map:otherwise>
            </map:select>
          </map:match>
        </map:match>
      </map:match>
      <map:serialize type="xml"/>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
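The `handle/...` branch at the end of Listing 17 dispatches in two stages. The untyped `<map:match>` elements use the default wildcard matcher declared in `<map:matchers default="wildcard">`. The DSpace `HandleTypeMatcher` then picks `JournalViewer`, `JournalIssueViewer` or `JournalItemViewer` according to the object type.

The Python sketch below approximates that routing for illustration only. It makes the following simplifications:

- It assumes the usual Cocoon wildcard semantics: `*` matches any characters except `/`, and `**` also matches `/`.
- It ignores escaping and the `HandleAuthorizedMatcher` READ check.
- It replaces handle resolution with a hypothetical lookup table. The types of handles 123456789/5 and 123456789/29 come from the DRI and METS listings. Handle 123456789/1 is inferred from the breadcrumb trail to be a community.

The names `wildcard_to_regex`, `route` and `HANDLE_TYPES` are illustrative, not DSpace or Cocoon APIs.

```python
import re
from typing import Optional


def wildcard_to_regex(pattern):
    """Translate a Cocoon-style wildcard pattern into an anchored regex.
    Simplified: '**' matches any characters including '/', '*' matches any
    characters except '/'; Cocoon's backslash escaping is not handled."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append("(.*)")
            i += 2
        elif pattern[i] == "*":
            parts.append("([^/]*)")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")


# Viewer transformer per object type, as in the handle/*/* branch of Listing 17
VIEWER_BY_TYPE = {
    "community": "JournalViewer",
    "collection": "JournalIssueViewer",
    "item": "JournalItemViewer",
}

# Hypothetical stand-in for HandleTypeMatcher, which asks DSpace for the type
HANDLE_TYPES = {
    "123456789/1": "community",   # inferred from the breadcrumb trail ("Library")
    "123456789/5": "collection",  # "All Articles"
    "123456789/29": "item",       # the sample article
}

OUTER = wildcard_to_regex("handle/*/**")
INNER = wildcard_to_regex("handle/*/*")


def route(uri: str) -> Optional[str]:
    """Return the viewer transformer this branch would apply, or None."""
    if not OUTER.match(uri):
        return None
    inner = INNER.match(uri)
    if inner is None:  # deeper paths, e.g. handle/123456789/5/browse
        return None
    handle = f"{inner.group(1)}/{inner.group(2)}"
    return VIEWER_BY_TYPE.get(HANDLE_TYPES.get(handle, ""))


for uri in ("handle/123456789/29", "handle/123456789/5", "handle/123456789/5/browse"):
    print(uri, "->", route(uri))
```

Some URIs carry further path segments, such as a collection's browse page. They match the outer `handle/*/**` pattern but not the inner `handle/*/*`, so the journal aspect adds no viewer for them.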

Documents