8th International Symposium on Electronic Theses and Dissertations, ETD2005, Sydney

SCOPE: An XML Based Publishing Platform

Uwe Müller, Manuel Klatt
Humboldt-Universität zu Berlin

Electronic Publishing Group
{u.mueller, manuel.klatt}@cms.hu-berlin.de

Service Core for Open Publishing Environments

28-30 October 2005 Uwe Müller & Manuel Klatt, Electronic Publishing Group, CMS / UB Humboldt-Universität zu Berlin

Background

• Humboldt University: 800–1,000 dissertations / year
• Germany: duty to publish dissertations
• Humboldt U.: ~ ¼ of dissertations published electronically
• conference proceedings
• series (university series, preprint series, technical reports …)
• electronic journals
• Open Access campaign (Pre- / Postprints)

• XML as central strategy

Why XML?

• Standardized format
• Long term preservation
• easily convertible to
  – presentation formats (HTML, PDF)
  – other XML structures
• qualified full text retrieval
• contains structural and contextual information
  – in a machine readable format

[Figure: XML shown alongside HTML, PDF, and Office document, each of the latter with a digital signature]

XML: Restrictions to deal with

• XML source does not contain layout information
• rather linear structure
• XML is not used as Authoring System
  – authors use their 'own' systems
    • Microsoft Word
    • LaTeX
    • Open Office / Star Office
    • Framemaker
    • Word Perfect

How to bridge the gap?

• meet the authors where they are …
• instructions and guidelines for authors
  – usage of style files (e.g., dissertation-hu.dot)
  – manuals, support hotline, regular courses
• different conversion processes
  – SGML author (plug in for MS Word <= 97)
  – Open Office / Star Office
• exploit genuine XML format
  – MS Office 2003: XML according to DiML DTD
  – common pitfalls: tables, pictures

Conversion Process Using OO (Example)

[Figure: conversion pipeline. Files and tools shown, in order: example.doc; Open Office; example.sxw (zip file); content.xml; example_stl.xml; example.xml; front.xml, chapter1.xml, chapter2.xml, chapter3.xml; example.html; *.gif, *.jpg; front.html, chapter1.html, chapter2.html, chapter3.html]
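The archive step of this example can be sketched in Python. An .sxw file is a zip archive that holds the document body in a member named content.xml, so the first step of any conversion is to pull that member out and parse it. The sketch below is only an illustration and not the converter used at Humboldt-Universität: the helper names `read_content_xml` and `make_demo_sxw` are hypothetical, and the demo archive uses a simplified, namespace-free stand-in for the real Open Office markup.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_content_xml(sxw_bytes: bytes) -> ET.Element:
    """Return the parsed root element of content.xml inside an .sxw archive."""
    with zipfile.ZipFile(io.BytesIO(sxw_bytes)) as archive:
        with archive.open("content.xml") as handle:
            return ET.parse(handle).getroot()


def make_demo_sxw() -> bytes:
    """Build a tiny stand-in archive so the sketch runs without Open Office.

    Assumption: real content.xml files use namespaced office:/text: elements;
    the markup below is deliberately simplified.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", "<document><p>Introduction</p></document>")
    return buffer.getvalue()


root = read_content_xml(make_demo_sxw())
first_paragraph = root.find("p").text
```

With a real file, the bytes would come from reading example.sxw from disk instead of `make_demo_sxw()`.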

Principal Structure of a DiML document

<etd>
  <front> ..title...author...abstract... </front>
  <body>
    <chapter>
      <section> ...
  </body>
  <back> ..bibliography...appendix...vita... </back>
</etd>

From flat structure to Hierarchy

• only two types of styles in Word
  – paragraph styles
  – character styles
• e.g., in case of the first occurring Heading 1 paragraph style the converter has to know
  – Heading 1 is the beginning of a chapter
  – Heading 1 implies a head element
  – the element chapter can only occur in body

</front>
<body>
  <chapter>
    <head id="anyID">Introduction</head>
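The step from a flat list of styled paragraphs to nested elements can be sketched as follows. This is a minimal Python illustration of the idea, not the actual converter: the style-to-element table, the use of "Heading 2" for section, the `p` element for ordinary paragraphs, and the function name `build_body` are all assumptions made for the example, and the `id` attribute on `head` is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from paragraph style to (container element, nesting depth).
STYLE_TO_ELEMENT = {"Heading 1": ("chapter", 1), "Heading 2": ("section", 2)}


def build_body(paragraphs):
    """Turn a flat list of (style, text) pairs into nested chapter/section elements."""
    body = ET.Element("body")
    stack = [(0, body)]  # (depth, element) of the currently open containers
    for style, text in paragraphs:
        if style in STYLE_TO_ELEMENT:
            tag, depth = STYLE_TO_ELEMENT[style]
            # Close every open container at the same or a deeper level.
            while stack[-1][0] >= depth:
                stack.pop()
            container = ET.SubElement(stack[-1][1], tag)
            # A heading style implies a head element inside the new container.
            ET.SubElement(container, "head").text = text
            stack.append((depth, container))
        else:
            # Any other paragraph style becomes a plain paragraph (assumed name "p").
            ET.SubElement(stack[-1][1], "p").text = text
    return body


flat = [
    ("Heading 1", "Introduction"),
    ("Standard", "Some text."),
    ("Heading 2", "Motivation"),
    ("Heading 1", "Methods"),
]
body = build_body(flat)
xml_text = ET.tostring(body, encoding="unicode")
```

The stack records which containers are open, which is the contextual knowledge the flat Word document lacks: a new Heading 1 closes the previous chapter and any section inside it before opening the next chapter.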

One Core – Multiple Views

• HTML generation (static or dynamic)
  – performance problems with XSLT and huge documents
  – solution: division of XML sources into components (easier and faster to process)
• PDF + Print on Demand (http://www.proprint-service.de)
• Current problems
  – changing Office systems and versions
    • ongoing implementations and adaptations necessary
    • but: might be restricted to XSL coding
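Dividing an XML source into components can be sketched in Python as below. The sketch assumes a document with the front / body / chapter layout of DiML and writes nothing to disk; it only returns the fragments keyed by file names in the style of front.xml and chapter1.xml. The function name `split_into_components` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def split_into_components(etd_xml: str) -> dict:
    """Return a mapping of component file name to serialized XML fragment."""
    root = ET.fromstring(etd_xml)
    components = {}
    front = root.find("front")
    if front is not None:
        components["front.xml"] = ET.tostring(front, encoding="unicode")
    # One component per chapter, numbered in document order.
    for number, chapter in enumerate(root.findall("body/chapter"), start=1):
        components[f"chapter{number}.xml"] = ET.tostring(chapter, encoding="unicode")
    return components


SAMPLE = (
    "<etd><front><title>Example</title></front>"
    "<body><chapter><head>Introduction</head></chapter>"
    "<chapter><head>Methods</head></chapter></body></etd>"
)
components = split_into_components(SAMPLE)
```

Each fragment can then be transformed on its own, so a stylesheet run only has to hold one chapter in memory at a time rather than the whole dissertation.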

XML Based Publishing

characterized by

• complex processes and workflows

• many dependent tools and manual work steps

• relatively high human effort

• different processes for different publications, but with many identical steps and properties

• ongoing development – changing versions

• Basic Idea:
  1. Raise concrete process description to an abstract level
  2. Implement integrated workflow system

SCOPE: Service Core for Open Publishing Environments

• support for authors and editors: provide an integrated publishing platform
• XML based
• aimed at the technological aspects of publishing processes
• tool management
• platform for distributed publishing
• generic framework for different processes

SCOPE goals

• elementary Publication Components (Document Models, Authoring Tools, Conversion Scripts, Digital Signatures …)
• Management System to organize and administer the Publication Components
  – modelling of relations and dependencies
  – version management
• Publishing System
  – management and storage of documents
• Workflow System
  – modelling of recurrent processes (technical validation, conversion processes, reviewing, conference organisation …)

[Figure: Publication Components along the authoring, publishing, and archival processes]

Publication Components:
• Authoring Tools (e.g. Document Style)
• Validation and Correction Tools (e.g. Word Macros)
• Conversion and Transformation Tools (XSLT, Perl-Scripts, Java, ...)
• Digital Signature Software (TeleSec)
• Print on Demand System (ProPrint Service)
• Registration and Cataloguing Tools (Metadata System)

Document stages:
• Idea / Scholarly Research
• Source File (e.g. MS Word)
• corrected Source File (e.g. MS Word)
• XML File(s) (e.g. in xDiML Format)
• HTML File(s)
• PDF File
• Digital Signatures
• Metadata
• Printed Edition / Paper Copy
• Web Presentation (Search, Browsing, Fulltexts)

Processes:
• Authoring Process
• Publishing Processes
• Archival Processes

Documents and Publication Components

• Documents can occur in different formats

• Publication Components can convert formats into each other and change properties

• PCs: automatic and manual "tools"

[Figure: data model. Entities: Publication Component, Tool, Informal Guidelines, Occurrence Conversion, Property Conversion, Property Validation, Document, Occurrence, Property, Base Type. Relations: realizes, represents, transforms (from / in), implements, characterizes]

Publication Components (PC)

Main Properties (Metadata)
• source occurrence (base type + properties) or property
• target occurrence (base type + properties) or property
• parameters
• necessary environment / interfaces
• modules / used files

Examples:
• authoring tools, conversion scripts, Word macros, XSLT scripts, PDF checker, …

Management system to register PCs and metadata (CVS based)
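Registering source and target occurrences for each PC makes it possible to ask which components convert one format into another. The Python sketch below illustrates that idea under strong simplifications: an occurrence is reduced to a plain string, the remaining properties (parameters, environment, modules) are left out, and the registry entries and the names `PublicationComponent` and `find_conversion_chain` are invented for the example rather than taken from SCOPE.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicationComponent:
    """Minimal PC record: only name, source and target occurrence are modelled."""
    name: str
    source: str
    target: str


def find_conversion_chain(components, start, goal):
    """Breadth-first search for the shortest chain of PCs from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        occurrence, chain = queue.popleft()
        if occurrence == goal:
            return chain
        for pc in components:
            if pc.source == occurrence and pc.target not in seen:
                seen.add(pc.target)
                queue.append((pc.target, chain + [pc.name]))
    return None  # no chain of registered components reaches the goal


# Hypothetical registry entries, named only for this illustration.
REGISTRY = [
    PublicationComponent("oo-import", "doc", "sxw"),
    PublicationComponent("sxw-to-diml", "sxw", "diml-xml"),
    PublicationComponent("diml-to-html", "diml-xml", "html"),
    PublicationComponent("diml-to-pdf", "diml-xml", "pdf"),
]
chain = find_conversion_chain(REGISTRY, "doc", "html")
```

Here `chain` lists the component names to run in order; a `None` result signals that no registered combination of components performs the requested conversion.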

Metadata System

• special publication component
• basic information on each document
• adaptable and configurable in terms of
  – data model
  – management processes (forms …)
  – presentation styles (browsing, search …)
• configuration via XML files and style files
• data entry forms can be used
  – internally
  – by external data managers (editors, by login)
  – by normal authors (document upload)

Publishing Process

Formal process model

• assembled with the help of the PC management system (queries to the database …)

• realized and monitored by workflow system

– abstract state machine

– PCs: atomic actions

– integration of external workflow components (e.g. GAPWorks)

• web based – distributed access
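A workflow driven by a state machine, with PCs as the atomic actions, might look like the following Python sketch. It is a plain finite-state machine used for illustration and makes no claim about the formalism or the XML workflow definition actually used in SCOPE; the state names, event names, and the `Workflow` class are all assumptions.

```python
# Hypothetical workflow definition: state -> {event: next state}.
TRANSITIONS = {
    "submitted": {"validate_ok": "validated", "validate_failed": "rejected"},
    "validated": {"convert_ok": "converted"},
    "converted": {"publish": "published"},
}


class Workflow:
    """Tiny state machine; each event stands for the outcome of one atomic action."""

    def __init__(self, initial="submitted"):
        self.state = initial
        self.history = [initial]

    def fire(self, event):
        """Apply an event, or raise if it is not allowed in the current state."""
        allowed = TRANSITIONS.get(self.state, {})
        if event not in allowed:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = allowed[event]
        self.history.append(self.state)
        return self.state


workflow = Workflow()
for event in ("validate_ok", "convert_ok", "publish"):
    workflow.fire(event)
```

Keeping the transition table as data rather than code mirrors the idea of a process description that can be changed without touching the engine.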

Workflow Engine based on State Machine

[Figure: workflow engine architecture. Components: WF Definition (XML), Role Model, User Management, Alert System, Emails, Work List, File System, Metadata DB, edoc Server, WF Data, Work Control, User Interface, Application, Publication Component. Relations: accesses, manages, activates, interprets, references, asks, uses, sends, enquires]

SCOPE Service Platform

[Figure: SCOPE service platform. Systems: Publishing System; Management System for Publication Components (PC); Workflow System (State Machine); DMS. Publication Components: XML DTDs, Authoring Tools, XSLT Styles, Validation Tools, Conversion Tools, Word macros. Databases: DTD Database and PC Database; Metadata Database. Interfaces: Development Interface, Production Interface, Administration Interface, Interface for Editors, Interface for Users, General User Interface. Roles: Author, Editor, Reviewer, Developer]

SCOPE – Service

Support for authors and editors

• tools, adaptations, advisory service

Hosting – centralized technology for distributed publishing

• institutions within university

• small research institutions, smaller universities

• editorial boards of electronic journals

• also: single publication series, technical reports

Technology Transfer

• Publication Components

• modular structure but also: HU specific components

Thank you

Questions?

http://edoc.hu-berlin.de/