8th International Symposium on Electronic Theses and Dissertations, ETD2005, Sydney
SCOPE: An XML Based Publishing Platform
Service Core for Open Publishing Environments
Uwe Müller, Manuel Klatt
Humboldt-Universität zu Berlin, Electronic Publishing Group
{u.mueller, manuel.klatt}@cms.hu-berlin.de
28-30 October 2005 Uwe Müller & Manuel Klatt, Electronic Publishing Group, CMS / UB Humboldt-Universität zu Berlin
Background
• Humboldt University: 800 – 1,000 dissertations / year
• Germany: duty to publish dissertations
• Humboldt U.: ~ ¼ of dissertations published electronically
• conference proceedings
• series (university series, preprint series, technical reports …)
• electronic journals
• Open Access campaign (pre- / postprints)
• XML as central strategy
Why XML?
• standardized format
• long term preservation
• easily convertible to
  – presentation formats (HTML, PDF)
  – other XML structures
• qualified full text retrieval
• contains structural and contextual information
  – in a machine readable format
[Figure labels: HTML, PDF, Office document (each with a digital signature); XML]
XML: Restrictions to deal with
• XML source does not contain layout information
• rather linear structure
• XML is not used as an authoring system
  – authors use their 'own' systems
    • Microsoft Word
    • LaTeX
    • Open Office / Star Office
    • Framemaker
    • Word Perfect
How to bridge the gap?
• meet the authors where they are …
• instructions and guidelines for authors
  – usage of style files (e.g., dissertation-hu.dot)
  – manuals, support hotline, regular courses
• different conversion processes
  – SGML author (plug-in for MS Word <= 97)
  – Open Office / Star Office
• exploit genuine XML format
  – MS Office 2003 XML according to DiML DTD
  – common pitfalls: tables, pictures
Conversion Process Using OO (Example)
[Figure: conversion pipeline. Labels as they appear on the slide: Open Office; example.doc; example.sxw (zip file); content.xml; example_stl.xml; example.xml; front.xml, chapter1.xml, chapter2.xml, chapter3.xml; example.html; *.gif, *.jpg; front.html, chapter1.html, chapter2.html, chapter3.html]
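The slide notes that example.sxw is a zip file containing content.xml. A minimal sketch of that first unpacking step, using only the Python standard library, might look as follows. The helper names (read_content_xml, make_demo_sxw) and the tiny stand-in archive are assumptions for illustration; they are not part of the SCOPE tool chain, and the stand-in is not a complete OpenOffice document.

```python
import io
import zipfile


def read_content_xml(sxw_bytes: bytes) -> str:
    """Return the text of content.xml stored inside an .sxw (zip) archive."""
    with zipfile.ZipFile(io.BytesIO(sxw_bytes)) as archive:
        return archive.read("content.xml").decode("utf-8")


def make_demo_sxw() -> bytes:
    """Build a tiny stand-in archive for the demo (not a real OpenOffice file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", "<doc><p>Introduction</p></doc>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_content_xml(make_demo_sxw()))
```

With a real file, the bytes would come from reading example.sxw in binary mode; the later steps on the slide (style mapping, splitting, HTML generation) would then operate on the extracted XML.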
Principal Structure of a DiML document
<etd>
  <front>..title...author...abstract...</front>
  <body>
    <chapter>
      <section> ... </section>
    </chapter>
  </body>
  <back>..bibliography...appendix...vita...</back>
</etd>
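As a rough illustration of the front / body / back layout above, the following sketch checks only the top-level parts of a document. It is not a DTD validation and does not use the real DiML DTD; the sample document and the child element names in it (title, bibliography) are assumptions taken loosely from the slide.

```python
import xml.etree.ElementTree as ET

# Top-level parts in the order shown on the slide.
EXPECTED_PARTS = ["front", "body", "back"]


def top_level_parts(xml_text: str) -> list:
    """Return the tag names of the direct children of the etd root element."""
    root = ET.fromstring(xml_text)
    if root.tag != "etd":
        raise ValueError(f"unexpected root element: {root.tag}")
    return [child.tag for child in root]


def has_principal_structure(xml_text: str) -> bool:
    """True if the document consists of front, body and back, in that order."""
    return top_level_parts(xml_text) == EXPECTED_PARTS


# Hypothetical sample document for the demo.
SAMPLE = (
    "<etd>"
    "<front><title>Example</title></front>"
    "<body><chapter><section/></chapter></body>"
    "<back><bibliography/></back>"
    "</etd>"
)

if __name__ == "__main__":
    print(has_principal_structure(SAMPLE))
```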
From flat structure to hierarchy
• only two types of styles in Word
  – paragraph styles
  – character styles
• e.g., at the first occurrence of the Heading 1 paragraph style the converter has to know
  – Heading 1 is the beginning of a chapter
  – Heading 1 implies a head element
  – the element chapter can only occur in body
</front><body><chapter>
<head id="anyID">Introduction</head>
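The rule described above (a heading style opens a new structural element) can be sketched as a small converter from a flat list of styled paragraphs to nested XML. This is an illustration, not the SCOPE converter: the mapping of Heading 2 to section, the p element for body text, and the "Standard" style name are assumptions.

```python
import xml.etree.ElementTree as ET


def build_body(paragraphs):
    """Turn a flat list of (style, text) pairs into a nested body element."""
    body = ET.Element("body")
    chapter = None    # chapter currently open
    container = None  # element that receives the next plain paragraph
    for style, text in paragraphs:
        if style == "Heading 1":
            # Heading 1 starts a new chapter and implies a head element.
            chapter = ET.SubElement(body, "chapter")
            ET.SubElement(chapter, "head").text = text
            container = chapter
        elif style == "Heading 2":
            if chapter is None:
                raise ValueError("Heading 2 before any Heading 1")
            section = ET.SubElement(chapter, "section")
            ET.SubElement(section, "head").text = text
            container = section
        else:
            if container is None:
                raise ValueError("text before the first heading")
            ET.SubElement(container, "p").text = text
    return body


if __name__ == "__main__":
    flat = [
        ("Heading 1", "Introduction"),
        ("Standard", "First paragraph."),
        ("Heading 2", "Motivation"),
        ("Standard", "Why XML."),
        ("Heading 1", "Methods"),
    ]
    print(ET.tostring(build_body(flat), encoding="unicode"))
```

The converter has to carry state (the open chapter and section) because the flat source only marks where a unit begins, never where it ends.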
One Core – Multiple Views
• HTML generation (static or dynamic)
  – performance problems with XSLT and huge documents
  – solution: division of XML sources into components (easier and faster to process)
• PDF + Print on Demand (http://www.proprint-service.de)
• Current problems
  – changing Office systems and versions
    • ongoing implementations and adaptations necessary
    • but: might be restricted to XSL coding
XML Based Publishing
characterized by
• complex processes and workflows
• many dependent tools and manual work steps
• relatively high human effort
• different processes for different publications, but with many shared steps and properties
• ongoing development – changing versions
• Basic Idea:
  1. Raise concrete process description to an abstract level
  2. Implement integrated workflow system
SCOPE: Service Core for Open Publishing Environments
• support for authors and editors
• provide an integrated publishing platform
• XML based
• aimed at the technological aspects of publishing processes
• tool management
• platform for distributed publishing
• generic framework for different processes
SCOPE goals
• elementary Publication Components (Document Models, Authoring Tools, Conversion Scripts, Digital Signatures …)
• Management System to organize and administer the Publication Components
  – modelling of relations and dependencies
  – version management
• Publishing System
  – management and storage of documents
• Workflow System
  – modelling of recurrent processes (technical validation, conversion processes, reviewing, conference organisation …)
[Figure: Publication Components along the authoring, publishing and archival processes]
Publication Components
• Authoring Tools (e.g. Document Style)
• Validation and Correction Tools (e.g. Word Macros)
• Conversion and Transformation Tools (XSLT, Perl scripts, Java, ...)
• Digital Signature Software (TeleSec)
• Print on Demand System (ProPrint Service)
• Registration and Cataloguing Tools (Metadata System)
Document stages
• Idea / Scholarly Research
• Source File (e.g. MS Word)
• corrected Source File (e.g. MS Word)
• XML File(s) (e.g. in xDiML Format)
• HTML File(s)
• PDF File
• Digital Signatures
• Metadata
• Printed Edition / Paper Copy
• Web Presentation (Search, Browsing, Fulltexts)
Processes
• Authoring Process
• Publishing Processes
• Archival Processes
Documents and Publication Comp.
• Documents can occur in different formats
• Publication Components can convert formats into each other and change properties
• PCs: automatic and manual "tools"
[Figure: data model. Entities: Publication Component, Occurrence Conversion, Property Conversion, Property Validation, Document Occurrence, Property, Base Type, Tool, Informal Guidelines. Relations: realizes, represents, transforms (from / into), implements, characterizes]
Publication Components (PC)
Main Properties (Metadata)
• source occurrence (base type + properties) or property
• target occurrence (base type + properties) or property
• parameters
• necessary environment / interfaces
• modules / used files
Examples:
• authoring tools, conversion scripts, Word macros, XSLT scripts, PDF checker, …
Management system to register PCs and metadata (CVS based)
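Because each Publication Component records a source and a target occurrence, a registry of components can be searched for a chain that leads from one format to another. The sketch below illustrates that idea with a breadth-first search. The class, the component names and the format labels are all hypothetical, and an occurrence is reduced to a single base type label; the real management system is CVS based and is not shown here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicationComponent:
    name: str
    source: str  # source occurrence, reduced here to a base type label
    target: str  # target occurrence, reduced here to a base type label


def find_conversion_chain(components, start, goal):
    """Breadth-first search for the shortest chain of components from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == goal:
            return chain
        for pc in components:
            if pc.source == fmt and pc.target not in seen:
                seen.add(pc.target)
                queue.append((pc.target, chain + [pc]))
    return None  # no chain of registered components reaches the goal


# Hypothetical registry entries for the demo.
REGISTRY = [
    PublicationComponent("word-to-sxw", "doc", "sxw"),
    PublicationComponent("sxw-to-diml", "sxw", "diml-xml"),
    PublicationComponent("diml-to-html", "diml-xml", "html"),
    PublicationComponent("diml-to-pdf", "diml-xml", "pdf"),
]

if __name__ == "__main__":
    chain = find_conversion_chain(REGISTRY, "doc", "html")
    print([pc.name for pc in chain])
```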
Metadata System
• special publication component
• basic information on each document
• adaptable and configurable in terms of
  – data model
  – management processes (forms …)
  – presentation styles (browsing, search …)
• configuration via XML files and style files
• data entry forms can be used
  – internally
  – by external data managers (editors, by login)
  – by normal authors (document upload)
Publishing Process
Formal process model
• assembled with the help of the PC management system (queries to the database …)
• realized and monitored by workflow system
– abstract state machine
– PCs: atomic actions
– integration of external workflow components (e.g. GAPWorks)
• web based – distributed access
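A workflow in which Publication Components act as atomic actions can be sketched as a transition table. This is a plain finite-state table, which simplifies the abstract state machine named on the slide; the state and action names are assumptions and do not come from the SCOPE workflow definitions.

```python
# (current state, action) -> next state; each action stands for one atomic
# step, e.g. a Publication Component being run.
TRANSITIONS = {
    ("submitted", "validate"): "validated",
    ("validated", "convert"): "converted",
    ("converted", "review"): "reviewed",
    ("reviewed", "publish"): "published",
}


class Workflow:
    """Tracks the state of one document as actions are applied to it."""

    def __init__(self, state="submitted"):
        self.state = state
        self.history = [state]

    def fire(self, action):
        """Apply an action; reject it if it is not allowed in the current state."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"action {action!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state


if __name__ == "__main__":
    wf = Workflow()
    for action in ("validate", "convert", "review", "publish"):
        wf.fire(action)
    print(wf.history)
```

Keeping the transitions in a data structure rather than in code is what would allow a process to be described in an external definition and monitored step by step.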
Workflow Engine based on State Machine
[Figure: architecture of the workflow engine. Components: WF Definition (XML), Role Model, User Management, Alert System, Emails, Work List, File System, Metadata DB, edoc Server, WF Data, Work Control, User Interface, Application / Publication Component. Relations: accesses, manages, activates, interprets, references, asks, uses, sends, enquires]
SCOPE Service Platform
[Figure: SCOPE service platform]
• Publishing System
• Management System for Publication Components (PC): XML DTDs, Authoring Tools, XSLT Styles, Validation Tools, Conversion Tools, Word macros
• DTD Database and PC Database, Metadata Database, DMS
• Workflow System (State Machine)
• Interfaces: Development Interface, Production Interface, Administration Interface, Interface for Editors, Interface for Users, General User Interface
• Roles: Author, Editor, Reviewer, Developer
SCOPE – Service
Support for authors and editors
• tools, adaptations, advisory service
Hosting – centralized technology for distributed publishing
• institutions within university
• small research institutions, smaller universities
• editorial boards of electronic journals
• also: single publication series, technical reports
Technology Transfer
• Publication Components
• modular structure but also: HU specific components
Thank you
Questions?
http://edoc.hu-berlin.de/