

Page 1: XML-Publishing - Implementation Strategy file · Web viewImplementation Strategy of an XML-based publishing. in Eurostat. D2.1: Analysis & Evaluation of existing standards. May 2007

STIS Statistical Information Systems

Consortium INTRASOFT INTERNATIONAL S.A. and AGILIS S.A.

European Commission – EUROSTAT/B3

Framework Contract 14200/2005/007-2005/699 - Lot 1

Specific Contract 17101.2006.001-2006.457

‘XML-Publishing’

Implementation Strategy of an XML-based publishing in Eurostat

D2.1: Analysis & Evaluation of existing standards

May 2007


Project: XML-Publishing - Implementation Strategy
Contract: Specific Contract 17101.2006.001-2006.457
Prepared by: VBE, CBO
Reviewed by: MFE
Version 2.0
Date Updated: 18/05/2007
Status: Company Approved

Document Service Data

Type of Document Project deliverable

Reference: document.doc

Issue: 2 Revision: 0 Status: Company Approved

Created by: Victorio Bentivogli, Christian Boudot

Date: 18/05/2007

Distribution: EU-Eurostat, Intrasoft International S.A.

Contract Full Title: XML-Publishing - Implementation Strategy

Service contract number: Specific Contract 17101.2006.001-2006.457

For Internal Use Only

Reviewed by: CBO, VJB

Approved by: MFE

Document Change Record

Issue/Revision Date Change

0.1 15/01/2007 First draft document

0.2 05/02/2007 Updated draft

0.3 23/02/2007 Updated draft

0.4 07/03/2007 Updated draft

1.0 07/04/2007 Added information about CoSSI

1.1 10/04/2007 Updated draft

1.2 15/04/2007 Updated draft

1.3 30/04/2007 Updated draft

1.4 09/05/2007 Delivery version

2.0 18/05/2007 Delivery version


Table of contents

1 Introduction
1.1 Purpose
1.2 References

2 Existing Standards
2.1 CoSSI
2.1.1 Introduction
2.1.2 Specification
2.1.3 Usage
2.2 Formex 4
2.2.1 Introduction
2.2.2 Specification
2.2.3 Usage
2.3 DocBook
2.3.1 Introduction
2.3.2 Specification
2.3.3 Usage
2.4 ODF
2.4.1 Introduction
2.4.2 Specification
2.4.3 Usage
2.5 MS OOXML
2.5.1 Introduction
2.5.2 Specification
2.5.3 Usage
2.6 Comparison Tables
2.6.1 General Information
2.6.2 Characteristics
2.7 Conclusion

3 ODF vs OOXML
3.1 Advantages of OpenDocument over Office Open XML formats
3.2 Advantages of Office Open XML formats over OpenDocument
3.3 Shortcomings of OpenDocument
3.4 Shortcomings of Office Open XML
3.5 Cross-platform interoperability
3.6 Conclusion

Table of Figures

Table 1: References
Table 2: Document types


1 Introduction

1.1 Purpose

This document constitutes the deliverable “D2.1 Analysis & Evaluation of existing standards” and is the outcome of Task 2.1 “Analysis & Evaluation of existing standards for publications”.

1.2 References

This document references:

Reference Document/Resource Name Filename

R1 OASIS foundation http://opendocument.xml.org/

http://www.oasis-open.org/committees/download.php/18630/06-06-08-bidi-appendix

http://books.evc-cit.info/odbook/ch05.html#table-value-table

R2 Ecma International http://www.ecma-international.org/publications/standards/Ecma-376.htm

http://www.ecma-international.org/news/PressReleases/PR_TC45_Dec2006.htm

R3 Microsoft http://www.microsoft.com/office/xml/covenant.mspx

R4 ISO http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=43485

R5 W3C http://www.ecma-international.org/news/PressReleases/PR_TC45_Dec2006.htm

Table 1: References


2 Existing Standards

2.1 CoSSI

2.1.1 Introduction

CoSSI stands for “Common Structure of Statistical Information”. This model covers different ways of statistical data organisation (statistical data matrix and statistical table), statistical publications (monthly and quarterly publications, press releases, etc.) and quality declarations. The structuring of the metadata connected to statistical data is also implemented within this system.

2.1.2 Specification

The CoSSI model defines the structures of statistical data (matrices and tables), metadata (document and statistical metadata, and quality declarations), and publications using XML DTDs. The model comprises several DTDs that can be combined modularly for different types of documents. The basic document types are a statistical table, a statistical matrix and a publication. These are XML documents that conform to the CoSSI model and also contain the metadata and the language versions necessary for describing a set of statistics.

2.1.3 Usage

As an in-house initiative of Statistics Finland, it is not used elsewhere.

CoSSI is a model that was designed to fulfil the requirements of Statistics Finland. According to our research it has not been widely adopted.

Furthermore, integration into Office products was not part of the implementation scope.

2.2 Formex 4

2.2.1 Introduction

Formex stands for “Formalised exchange of electronic documents”. It is the document exchange format of the Office for Official Publications of the European Communities (OPOCE) and is used for the delivery of documents to the Official Journal of the European Union. Formex was developed in-house by the Publications Office and is not a standard used elsewhere in the industry.

The first version of the specification was started in 1985. Initially it was a mixture of SGML (Standard Generalized Markup Language) and CCF (Common Communication Format). By 1999, the slow take-up of SGML and the lack of tools available on the market led to a move to XML. Formex v4 came out on February 9, 2004. It included three major initiatives: the migration to XML, the character migration to Unicode and the adoption of XML Schema. The migration to XML also gave an opportunity to streamline the specifications: instead of about 1200 tags in Formex version 3, Formex version 4 consists of only about 260 tags.

2.2.2 Specification

Version 4 of Formex 4 was published on October 31, 2006 and entered into force on January 1, 2007. It includes changes for the new structure of the Official Journal.

2.2.3 Usage

As an in-house initiative of the Office for Official Publications of the European Communities (OPOCE), it is not used elsewhere.

Formex 4 is a model that was designed to fulfil the requirements of OPOCE. According to our research it has not been widely adopted.

As with CoSSI, integration into Office products was not part of the implementation scope.

2.3 DocBook

2.3.1 Introduction

DocBook is a markup language for technical documentation. It was originally intended for authoring technical documents related to computer hardware and software, but it can be used for other sorts of documentation as well.

DocBook began in 1991 as a joint project of HaL Computer Systems and O'Reilly & Associates and eventually spawned its own maintenance organization before moving in 1998 to the SGML Open consortium, which subsequently became OASIS. DocBook is currently maintained by the DocBook Technical Committee at OASIS.

2.3.2 Specification

As of December 2006, DocBook version 5.0 is in its first candidate release. There are many changes between the older 4.x versions and version 5.0. Among them is that DocBook is now defined by a RELAX NG + Schematron schema. While there is a W3C XML Schema + Schematron version available, it is not considered the definitive or "normative" version of the schema. There is also a DTD available, though it lacks the power to truly validate all DocBook 5 documents.

DocBook 5 markup is not a strict superset of DocBook 4.x. Many of the redundancies that grew from DocBook's origins have been consolidated. For example, in DocBook 4.x there was a set of info elements (bookinfo, chapterinfo, appendixinfo, etc.) that describe information about a particular kind of element (book, chapter, appendix). In most cases, the contents of these elements were identical. However, because DocBook 4.x was defined by a DTD, any differences between these info elements based on their context required a new element name, as DTDs can only describe the content model of an element based on its name. RELAX NG has no such limitation, so all of these elements are called info in DocBook 5.

Because DocBook 5 is defined by a RELAX NG schema rather than a DTD, versioning became an issue. In DocBook 5, the version of a document is therefore given by a version attribute, which is required on the root element of a DocBook 5 document. This attribute specifies the version of DocBook 5 that the document is written against. Through Schematron rules, the schema requires that it appear on the root element, though it may also appear on other elements.
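The two points above (a single info element and a version attribute on the root element) can be illustrated with a minimal Python sketch. The sample document and the function names are hypothetical and used only for illustration; the sketch merely inspects the XML and does not validate it against the RELAX NG or Schematron schemas.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"  # DocBook 5 namespace

# Hypothetical sample document, used only for illustration.
SAMPLE = f"""<book xmlns="{DOCBOOK_NS}" version="5.0">
  <info><title>Sample book</title></info>
  <chapter>
    <info><title>Sample chapter</title></info>
    <para>Text.</para>
  </chapter>
</book>"""


def docbook_version(xml_text):
    """Return the version attribute of the root element, or None if absent."""
    root = ET.fromstring(xml_text)
    return root.get("version")


def info_parents(xml_text):
    """List the local names of elements that directly contain an info element."""
    root = ET.fromstring(xml_text)
    info_tag = f"{{{DOCBOOK_NS}}}info"
    parents = []
    for element in root.iter():
        if any(child.tag == info_tag for child in element):
            parents.append(element.tag.split("}", 1)[1])
    return parents


print(docbook_version(SAMPLE))
print(info_parents(SAMPLE))
```

Both the book and the chapter carry the same info element name here, where DocBook 4.x would have used bookinfo and chapterinfo.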

2.3.3 Usage

DocBook was adopted by O’Reilly and the open source community and was used for creating documentation for many projects, including FreeBSD, KDE, GNOME desktop documentation, the GTK+ API references, the Linux kernel documentation, and the work of the Linux Documentation Project.

DocBook is a model that was originally designed for hardware and software documentation. The format is widely used in the open source community.

As with the previously listed standards, integration into Office products was not part of the implementation scope.

2.4 ODF

2.4.1 Introduction

OpenDocument, or ODF, is a document file format used for describing electronic documents such as memos, reports, books, spreadsheets, charts, presentations and word processing documents. This standard was developed by a Technical Committee of the Organization for the Advancement of Structured Information Standards (OASIS) consortium and is based upon the XML format originally created and implemented by the OpenOffice.org office suite. OpenDocument is an OASIS Standard and a published ISO and IEC International Standard referred to as ISO/IEC 26300:2006.


2.4.2 Specification

Document and Template

The most common file extensions used for OpenDocument documents are .odt for text documents, .ods for spreadsheets, .odp for presentations, and .odg for graphics. OpenDocument also supports a set of template types that represent formatting information (including styles) for documents, without the content itself.

Here is the complete list of document types, showing the type of file, the recommended file extension, and the MIME type:

File type        Extension  MIME type
Text             .odt       application/vnd.oasis.opendocument.text
Spreadsheet      .ods       application/vnd.oasis.opendocument.spreadsheet
Presentation     .odp       application/vnd.oasis.opendocument.presentation
Drawing          .odg       application/vnd.oasis.opendocument.graphics
Chart            .odc       application/vnd.oasis.opendocument.chart
Formula          .odf       application/vnd.oasis.opendocument.formula
Image            .odi       application/vnd.oasis.opendocument.image
Master Document  .odm       application/vnd.oasis.opendocument.text-master

Table 2: Document types

Metadata

The OpenDocument format supports storing metadata by having a set of pre-defined metadata elements, as well as allowing user-defined and custom metadata.

Content

OpenDocument's text content format supports both typical and advanced capabilities. Headings of various levels, lists of various kinds (numbered and not), numbered paragraphs, and change tracking are all supported. Page sequences and section attributes can be used to control how the text is displayed. Hyperlinks, bookmarks, and references are supported as well. Text fields (for autogenerated content), and mechanisms for automatically generating tables such as tables of contents, indexes, and bibliographies, are included as well.

In the OpenDocument format, spreadsheets are an example of a set of tables. Thus, there are extensive capabilities for formatting the display of tables and spreadsheets. Database ranges, filters, and data pilots (known to Excel users as "pivot tables") are also supported. Change tracking is available for spreadsheets as well.

The graphics format supports a vector graphic representation, in which a set of layers and the contents of each layer are defined. Available drawing shapes include Rectangle, Line, Polyline, Polygon, Regular Polygon, Path, Circle, Ellipse, and Connector. 3D Shapes are also available; the format includes information about the Scene, Light, Cube, Sphere, Extrude, and Rotate. Custom shapes can also be defined.

Presentations are supported. Animations can be included in presentations, with control over sound, showing or hiding a shape or text, and dimming an object; these effects can be grouped. In OpenDocument, many of the format capabilities are reused from the text format, simplifying implementations. However, tables are not supported within OpenDocument as drawing objects, so they may only be included in presentations as embedded tables.

Charts define how to create graphical displays from numerical data. They support titles, subtitles, footers, and a legend to explain the chart. The format defines the series of data that is to be used for the graphical display, and a number of different kinds of graphical display (such as line charts, pie charts, and so on).

Formatting

The style and formatting controls are numerous, providing a number of controls over how information is displayed.

Page layout is controlled by a variety of attributes. These include page size, number format, paper tray, print orientation, margins, border (and its line width), padding, shadow, background, columns, print page order, first page number, scale, table centering, maximum footnote height and separator, and many layout grid properties.

Headers and footers can have defined fixed and minimum heights, margins, border line width, padding, background, shadow, and dynamic spacing.

There are many attributes for specific text, paragraphs, ruby text, sections, tables, columns, lists, and fills. Specific characters can have their fonts, sizes, and other properties set. Paragraphs can have their vertical space controlled through attributes on keep together, widow, and orphan, and have other attributes such as "drop caps" to provide special formatting.

Format internals

An OpenDocument file is a ZIP-compressed archive (in the style of a JAR file) containing a number of files and directories. This simple compression mechanism means that OpenDocument files are normally significantly smaller than equivalent Microsoft ".doc" or ".ppt" files. This smaller size is important for organizations that store a vast number of documents for long periods of time, and for organizations that must exchange documents over low bandwidth connections. Once uncompressed, most data is contained in simple text-based XML files, so the contents have the typical ease of modification and processing of XML files. The standard also allows for the creation of a single XML document, which uses <office:document> as the root element, for use in document processing.

Directories can be included to store non-SVG images, non-SMIL animations, and other files that are used by the document but cannot be expressed directly in the XML.


Due to the openly specified compression format used, it is possible for a user to extract the container file to manually edit the contained files. This allows a corrupted file to be repaired or low level manipulation of the contents. It is known that many programs which implement the OpenDocument format do not utilise high compression levels. It is therefore possible for the user to optimise the file sizes by using more aggressive compression programs. This may be coupled with a number of image optimisation programs being used on the contained pictures and has been seen to give over 40% reduction in file size over a file directly saved from an OpenDocument compatible program.
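The repacking idea described above can be sketched in Python with the standard zipfile module. This is a minimal illustration under stated assumptions: the sample package is hypothetical and contains only two entries, the function names are invented for this sketch, and the size gain on real documents depends entirely on their content (the 40% figure quoted above is not reproduced here). The mimetype entry is kept first and uncompressed, as the OpenDocument packaging rules expect.

```python
import io
import zipfile


def build_sample_package():
    """Build a small, uncompressed, ODF-like package (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", "<p>repetitive text</p>" * 500)
    return buffer.getvalue()


def recompress(package_bytes, level=9):
    """Rewrite a ZIP-based package using a higher Deflate level."""
    target_buffer = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as source, \
            zipfile.ZipFile(target_buffer, "w") as target:
        names = source.namelist()
        # Keep mimetype as the first entry (stable sort, others keep their order).
        names.sort(key=lambda name: name != "mimetype")
        for name in names:
            data = source.read(name)
            if name == "mimetype":
                # Stored without compression so that tools can sniff the type.
                target.writestr(name, data, compress_type=zipfile.ZIP_STORED)
            else:
                target.writestr(name, data,
                                compress_type=zipfile.ZIP_DEFLATED,
                                compresslevel=level)
    return target_buffer.getvalue()


original = build_sample_package()
smaller = recompress(original)
print(len(original), len(smaller))
```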

The zipped set of files and directories includes the following:

XML files:
* content.xml
* meta.xml
* settings.xml
* styles.xml

Other files:
* mimetype

Directories:
* META-INF/
* Thumbnails/

The OpenDocument format provides a strong separation between content, layout and metadata. The most notable components of the format are described in the subsections below. The files in XML format are further defined using the RELAX NG language for defining XML schemas. RELAX NG is itself defined by an OASIS specification, as well as by part two of the international standard ISO/IEC 19757: Document Schema Definition Languages (DSDL).

content.xml is the most important file. It carries the actual content of the document (except for binary data, like images). The base format is inspired by HTML and, though far more complex, is reasonably legible to humans.
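As an illustration of how content.xml can be processed with ordinary XML tooling, the following Python sketch lists the headings and paragraphs of a text document. The sample is a hypothetical, heavily reduced content.xml; a real file declares many more namespaces and carries automatic styles alongside the body.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical, heavily reduced content.xml of a text document.
CONTENT_XML = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body>
    <office:text>
      <text:h text:outline-level="1">Existing Standards</text:h>
      <text:p>OpenDocument separates content from layout.</text:p>
    </office:text>
  </office:body>
</office:document-content>"""


def headings_and_paragraphs(content_xml):
    """Return (kind, text) pairs for top-level headings ("h") and paragraphs ("p")."""
    root = ET.fromstring(content_xml)
    body = root.find("office:body/office:text", NS)
    result = []
    for element in body:
        kind = element.tag.split("}", 1)[1]
        if kind in ("h", "p"):
            result.append((kind, "".join(element.itertext())))
    return result


print(headings_and_paragraphs(CONTENT_XML))
```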

styles.xml contains style information. OpenDocument makes heavy use of styles for formatting and layout. Most of the style information is here (though some is in content.xml). Style types include:

Paragraph styles.

Page styles.

Character styles.

Frame styles.

List styles.


meta.xml contains the file metadata, for example the author, "Last modified by" and the date of last modification.
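A short Python sketch of reading these fields is given below. The sample meta.xml, the names in it and the function name are hypothetical; the sketch assumes the usual ODF mapping in which meta:initial-creator holds the original author, dc:creator the person who last modified the document and dc:date the date of last modification.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical meta.xml with invented names and dates.
META_XML = """<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <office:meta>
    <meta:initial-creator>A. Author</meta:initial-creator>
    <dc:creator>B. Editor</dc:creator>
    <dc:date>2007-05-18T10:00:00</dc:date>
  </office:meta>
</office:document-meta>"""


def read_metadata(meta_xml):
    """Return author, last modifier and modification date (None when missing)."""
    root = ET.fromstring(meta_xml)
    return {
        "author": root.findtext("office:meta/meta:initial-creator", namespaces=NS),
        "last_modified_by": root.findtext("office:meta/dc:creator", namespaces=NS),
        "modified": root.findtext("office:meta/dc:date", namespaces=NS),
    }


print(read_metadata(META_XML))
```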

settings.xml includes settings such as the zoom factor or the cursor position. These are properties that are not content or layout.

mimetype is just a one-line file with the mimetype of the document. One implication of this is that the file extension is actually immaterial to the format. The file extension is only there for the benefit of the user.
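The following Python sketch shows how a program might identify the document type from the mimetype entry alone, without looking at the file extension. The lookup table repeats Table 2; the sample package and the function names are hypothetical.

```python
import io
import zipfile

# MIME types from Table 2, mapped to the file type they denote.
MIME_TO_TYPE = {
    "application/vnd.oasis.opendocument.text": "Text",
    "application/vnd.oasis.opendocument.spreadsheet": "Spreadsheet",
    "application/vnd.oasis.opendocument.presentation": "Presentation",
    "application/vnd.oasis.opendocument.graphics": "Drawing",
    "application/vnd.oasis.opendocument.chart": "Chart",
    "application/vnd.oasis.opendocument.formula": "Formula",
    "application/vnd.oasis.opendocument.image": "Image",
    "application/vnd.oasis.opendocument.text-master": "Master Document",
}


def build_sample(mime):
    """Build a hypothetical package that contains only a mimetype entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as package:
        package.writestr("mimetype", mime)
    return buffer.getvalue()


def document_type(package_bytes):
    """Read the mimetype entry and translate it into a file type name."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        mime = package.read("mimetype").decode("ascii").strip()
    return MIME_TO_TYPE.get(mime, "unknown")


sample = build_sample("application/vnd.oasis.opendocument.spreadsheet")
print(document_type(sample))
```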

2.4.3 Usage

OpenDocument is designed to reuse parts of existing XML standards whenever they are available, and it creates new tags only where no existing standard can provide the needed functionality. Thus OpenDocument uses Dublin Core for metadata, MathML for displayed formulas, SVG for vector graphics, SMIL for multimedia, XLink for hyperlinks, and so on.

This new standard is widely supported by the open source community and by industry, which explains why the list of applications supporting or based on OpenDocument is continually growing. Large organisations and governments around the world have started evaluating the use of OpenDocument for saving and exchanging editable office documents.

The OpenDocument Format was developed as an open standard, since adopted by ISO, to represent office documents. Owing to its origins, ODF can naturally be integrated into office products. Plug-ins and conversion tools are freely available.

The format is widely used, supported and adopted by the open source community, industry and national bodies.

2.5 MS OOXML

2.5.1 Introduction

Office Open XML (commonly abbreviated as OOXML) is a file format specification for the storage of electronic documents such as memos, reports, books, spreadsheets, charts, presentations and word processing documents. The specification was developed by Microsoft for its Microsoft Office 2007 product suite and was standardized by Ecma International as Ecma 376 in December 2006.

2.5.2 Specification

File format and structure


The Office Open XML file is a ZIP package containing the individual files that form the basis of the document. As well as XML files the ZIP package can also include embedded (binary) files in formats such as PNG, BMP, AVI or PDF.

Since this is a complete break with the previous binary Microsoft Office file formats, it is not entirely clear how far Microsoft's backwards compatibility claims for OOXML extend. Being a completely new format, it cannot be backwards compatible with existing Microsoft Office documents, and it is not backwards compatible with versions of Microsoft Office prior to 2007 without the Microsoft Office Compatibility Pack.

Document format

Office Open XML is a container format for several specialized XML-based document markup languages, roughly corresponding to individual applications within the Microsoft Office product line:

* WordprocessingML for word processing documents

* SpreadsheetML for spreadsheets

* PresentationML for presentations

* DataDiagramingML for technical diagrams

* FormTemplate for electronic forms

Container structure

A basic Office Open XML file contains an XML file called [Content_Types].xml at the root level of the ZIP package, along with three folders: _rels, docProps, and a directory specific for the document type (for example, in a .docx word processing file that would be a word directory). The word directory contains the document.xml file which is the core content of the document.

[Content_Types].xml file

This file describes the content of the ZIP package. It also contains a mapping for file extensions and overrides for specific URIs.
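A minimal Python sketch of how this mapping can be resolved is shown below: an override for the exact part name is tried first, then the default for the file extension. The sample [Content_Types].xml is hypothetical and reduced to three entries, and the function name is invented for this illustration.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# Hypothetical, reduced [Content_Types].xml of a word processing document.
CONTENT_TYPES_XML = f"""<Types xmlns="{CT_NS}">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""


def content_type_for(part_name, types_xml):
    """Resolve the content type of a part: overrides first, then extension defaults."""
    root = ET.fromstring(types_xml)
    for override in root.findall(f"{{{CT_NS}}}Override"):
        if override.get("PartName") == part_name:
            return override.get("ContentType")
    extension = part_name.rsplit(".", 1)[-1].lower()
    for default in root.findall(f"{{{CT_NS}}}Default"):
        if default.get("Extension").lower() == extension:
            return default.get("ContentType")
    return None


print(content_type_for("/word/document.xml", CONTENT_TYPES_XML))
print(content_type_for("/_rels/.rels", CONTENT_TYPES_XML))
```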

_rels Folder

The _rels folders are where one goes to find the relationships for any given part within the package. To find the relationships for a specific part, one looks for the _rels folder that is a sibling of that part. If the part has relationships, the _rels folder will contain a file that has the original part name with .rels appended to it. For example, if the content types part had any relationships, there would be a file called [Content_Types].xml.rels inside the _rels folder.

_rels/.rels

The root level _rels folder always contains a part called .rels. This URI (/_rels/.rels) and /[Content_Types].xml are the only two reserved URIs for parts in files that adhere to Office Open XML conventions. This is where the "package relationships" are located. Whenever one opens a file using these conventions, one always starts by going to the _rels/.rels file. All relationship files are represented with XML. If one opens it in a text editor, one will see XML that outlines each relationship for that part. In a minimal Word document containing only the basic document.xml, the top level parts are two metadata parts and the document.xml part.
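The lookup described above can be sketched in Python as follows. The sample _rels/.rels content mirrors the minimal case (two metadata parts and the main document part) but is hypothetical, as are the function names; relationship identifiers such as rId1 are arbitrary.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)

# Hypothetical _rels/.rels of a minimal word processing package.
RELS_XML = f"""<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" Target="word/document.xml"/>
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/>
</Relationships>"""


def package_relationships(rels_xml):
    """Return (type, target) pairs for every relationship in the file."""
    root = ET.fromstring(rels_xml)
    return [
        (relationship.get("Type"), relationship.get("Target"))
        for relationship in root.findall(f"{{{REL_NS}}}Relationship")
    ]


def main_document_part(rels_xml):
    """Return the target of the officeDocument relationship, or None."""
    for rel_type, target in package_relationships(rels_xml):
        if rel_type == OFFICE_DOCUMENT:
            return target
    return None


print(main_document_part(RELS_XML))
```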

word/document.xml

This is the main part for any Word document. If one views it in an XML editor, one will see a pretty basic XML file. The body of the word processing document is contained in this part.
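To give an idea of what this part looks like, the Python sketch below extracts the paragraph text from a hypothetical, heavily reduced document.xml. It assumes the usual WordprocessingML nesting of paragraphs (w:p), runs (w:r) and text (w:t), and ignores everything else, such as formatting properties.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, heavily reduced word/document.xml.
DOCUMENT_XML = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def paragraphs(document_xml):
    """Return the plain text of each paragraph in the document body."""
    namespaces = {"w": W_NS}
    root = ET.fromstring(document_xml)
    result = []
    for paragraph in root.findall("w:body/w:p", namespaces):
        # A paragraph is split into runs; join the text of all runs.
        pieces = [t.text or "" for t in paragraph.findall(".//w:t", namespaces)]
        result.append("".join(pieces))
    return result


print(paragraphs(DOCUMENT_XML))
```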

2.5.3 Usage

Office Open XML is the default Office 2007 format if macros are not enabled. Microsoft has also released a compatibility pack for older versions. Using the compatibility pack, users can create and edit Office Open XML files from within Office 2000, Office XP and Office 2003. The compatibility pack can also be used as a stand-alone converter in combination with Office 97.

OOXML was developed as a standard to represent office documents within MS Office 2007. According to Microsoft, the format is not backward compatible with the document formats of previous versions of MS Office.

As the format has not yet been released to the public, it has not been adopted anywhere outside the MS Office 2007 product.

2.6 Comparison Tables

2.6.1 General Information

Language  Creator              First public release date  Latest stable version  Editor        Viewer
CoSSI     Statistics Finland   --                         0.91                   XML Editor    --
Formex 4  OPOCE                2004                       4                      XML Editor    --
DocBook   The Davenport Group  1992                       5.0                    XML Editor    Output to HTML, PDF, CHM, javadoc, others
ODF       OASIS                2005                       1.1                    Office suite  Office suite
MS OOXML  Microsoft            --                         --                     Office suite  Office suite

2.6.2 Characteristics

Language | Major purpose                      | Based on   | Structural markup | Presentational markup
CoSSI    | Statistical Information            | SGML / XML | Yes               | No
Formex 4 | Document exchange format for OPOCE | SGML / XML | Yes               | No
DocBook  | Technical documents                | SGML / XML | Yes               | No
ODF      | Multi-purpose                      | XML/ZIP    | Yes               | Yes
MS OOXML | Multi-purpose                      | XML/ZIP    | Yes               | Yes


2.7 Conclusion

In order to fulfil the requirements of Eurostat, the chosen standard will have to provide functionality to separate content from layout and integrate fully into office suite tools. From the above analysis, only two of the standards match these expectations, namely ODF and MS OOXML.

We recommend choosing ODF as it is strongly supported by the industry, well documented, easy to use and it is a recognised ISO standard.

We want to highlight that, being based on XML, ODF can be transformed into other XML-based formats if that specific requirement arises. These formats could include Formex 4, DocBook, etc.
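To give a sense of what such a transformation involves, the following sketch maps the headings and paragraphs of an ODF content.xml fragment onto DocBook-like title and para elements. It is only an illustration under stated assumptions: the sample content and the function name odf_to_docbook are hypothetical, the output is a simplified, un-namespaced and unvalidated approximation of DocBook, and inline formatting is flattened to plain text. In practice an XSLT stylesheet would normally be used; plain Python is used here only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# OpenDocument namespaces for the document body and for text content.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical, simplified content.xml taken from an .odt package.
SAMPLE_CONTENT_XML = f"""
<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <office:body><office:text>
    <text:h text:outline-level="1">Population</text:h>
    <text:p>Figures for <text:span>2006</text:span> are provisional.</text:p>
  </office:text></office:body>
</office:document-content>
""".strip()


def odf_to_docbook(content_xml):
    """Map text:h to title and text:p to para; inline markup is flattened."""
    src = ET.fromstring(content_xml)
    text_root = src.find(f"{{{OFFICE_NS}}}body/{{{OFFICE_NS}}}text")
    article = ET.Element("article")
    for child in text_root:
        flat = "".join(child.itertext())
        if child.tag == f"{{{TEXT_NS}}}h":
            ET.SubElement(article, "title").text = flat
        elif child.tag == f"{{{TEXT_NS}}}p":
            ET.SubElement(article, "para").text = flat
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    print(odf_to_docbook(SAMPLE_CONTENT_XML))
```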


3 ODF vs OOXML

Office Open XML and OpenDocument are two competing XML-based formats for documents intended for use in office productivity software. Both formats combine XML content with other files into compressed ZIP archives. In both formats, the main office document content and presentation information is stored as XML, with the ability to reference embedded and external binary content such as PNG, BMP, GIF, and JPEG.
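Because both formats are ZIP archives, a file can be classified by looking at a few well-known entries. The sketch below assumes that an Office Open XML package contains /[Content_Types].xml and that an OpenDocument package contains a mimetype entry whose value starts with application/vnd.oasis.opendocument. The function names and the in-memory sample archives are hypothetical and serve only to make the example runnable.

```python
import io
import zipfile


def detect_format(package):
    """Classify a ZIP-based office file by its characteristic entries."""
    with zipfile.ZipFile(package) as zf:
        names = set(zf.namelist())
        if "[Content_Types].xml" in names:
            return "Office Open XML"
        if "mimetype" in names:
            mimetype = zf.read("mimetype").decode("ascii").strip()
            if mimetype.startswith("application/vnd.oasis.opendocument."):
                return "OpenDocument"
    return "unknown"


def make_zip(files):
    """Build an in-memory ZIP from a {name: content} mapping (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    odt = make_zip({"mimetype": "application/vnd.oasis.opendocument.text",
                    "content.xml": "<x/>"})
    docx = make_zip({"[Content_Types].xml": "<Types/>",
                     "_rels/.rels": "<Relationships/>"})
    print(detect_format(odt), "/", detect_format(docx))
```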

3.1 Advantages of OpenDocument over Office Open XML formats

1. OpenDocument uses a mixed content model whereas the Office Open XML format does not.

Non-mixed documents usually represent structured data, and mixed documents are usually used to represent narrative. MS XML uses the non-mixed model to represent narrative, which in certain cases leads to ambiguous markup. The mixed-content model is closer to what a developer will be familiar with.

2. OpenDocument is similar to XHTML, while MS XML is not. OpenDocument uses mixed content and marks styles in a similar way. This makes it easier to transform data accurately between OpenDocument and XHTML.

3. OpenDocument gives better separation of style and content.

4. OpenDocument hyperlink URLs are embedded in the main file, whereas in Office Open XML the URL is placed in a separate file. This causes problems when manipulating OOXML with standard tools such as XSLT.

5. OpenDocument reuses existing standards whenever possible, whereas MS XML implements its own definitions. ODF uses parts of SVG for drawings, MathML for equations, XLink for linking, Dublin Core for metadata, etc. This makes the format considerably more transparent to someone familiar with XML technologies. It also allows the reuse of existing tools that understand these standards.

6. OpenDocument is an approved ISO standard; Office Open XML is not.

7. OpenDocument is royalty-free and can be used without charge by anyone, whereas Microsoft has only released a covenant not to sue for the use of its schemas.
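Points 1 and 4 of the list above can be made concrete with a small comparison. In the sketch below, the ODF paragraph carries the link inline as mixed content with an xlink:href attribute, while the Office Open XML paragraph splits the text into runs and refers to the URL through a relationship identifier that must be resolved against a separate relationships part. Both fragments are hand-written simplifications, and the function names, the identifier rId5 and the example URL are assumptions introduced for this illustration.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# ODF: mixed content, URL stored directly on the inline element.
ODF_PARAGRAPH = (
    f'<text:p xmlns:text="{TEXT_NS}" xmlns:xlink="{XLINK_NS}">See '
    '<text:a xlink:href="http://example.org/">the site</text:a>'
    ' for details.</text:p>')

# OOXML: text split into runs, URL only reachable through r:id.
OOXML_PARAGRAPH = (
    f'<w:p xmlns:w="{W_NS}" xmlns:r="{R_NS}">'
    '<w:r><w:t>See </w:t></w:r>'
    '<w:hyperlink r:id="rId5"><w:r><w:t>the site</w:t></w:r></w:hyperlink>'
    '<w:r><w:t> for details.</w:t></w:r></w:p>')

# Simplified stand-in for word/_rels/document.xml.rels.
OOXML_RELS = (
    f'<Relationships xmlns="{RELS_NS}">'
    f'<Relationship Id="rId5" Type="{R_NS}/hyperlink" '
    'Target="http://example.org/" TargetMode="External"/></Relationships>')


def odf_links(paragraph_xml):
    """One document is enough: label and URL sit side by side."""
    p = ET.fromstring(paragraph_xml)
    return [("".join(a.itertext()), a.get(f"{{{XLINK_NS}}}href"))
            for a in p.iter(f"{{{TEXT_NS}}}a")]


def ooxml_links(paragraph_xml, rels_xml):
    """Two documents are needed: the paragraph and its relationships part."""
    rels = ET.fromstring(rels_xml)
    targets = {r.get("Id"): r.get("Target")
               for r in rels.iter(f"{{{RELS_NS}}}Relationship")}
    links = []
    for h in ET.fromstring(paragraph_xml).iter(f"{{{W_NS}}}hyperlink"):
        label = "".join(t.text or "" for t in h.iter(f"{{{W_NS}}}t"))
        links.append((label, targets.get(h.get(f"{{{R_NS}}}id"))))
    return links


if __name__ == "__main__":
    print(odf_links(ODF_PARAGRAPH))
    print(ooxml_links(OOXML_PARAGRAPH, OOXML_RELS))
```

The second function needs an extra input, which is the practical difficulty for single-document tools such as XSLT mentioned in point 4.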

3.2 Advantages of Office Open XML formats over OpenDocument

1. Microsoft Excel has a well-known formula language that has been defined in its entirety in the new XML formats. The ODF implementation of the formulas in spreadsheets is specific to every vendor.

2. The Office Open XML spreadsheet format is faster than the ODF spreadsheet format.

3.3 Shortcomings of OpenDocument

1. OpenDocument has no macro language specification.

2. ODF 1.1 does not define digital signatures.


3.4 Shortcomings of Office Open XML

1. The specification is incomplete and not entirely publicly available.

2. The markup language for spreadsheets used in Office Open XML has two numeric formats for storing dates.
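The text does not name the two numeric date formats. The sketch below assumes they are the two serial-number date systems known from Excel, one counting from 1900 and one counting from 1904, and shows why a consumer must know which system a workbook uses before interpreting a stored number. The function name serial_to_date and its date1904 parameter are hypothetical, and only whole-day serial numbers are handled.

```python
from datetime import date, timedelta


def serial_to_date(serial, date1904=False):
    """Convert a whole-day serial number to a calendar date.

    Assumption: in the 1904 system serial 0 is 1 January 1904; in the
    1900 system serial 1 is 1 January 1900 and serial 60 stands for the
    non-existent 29 February 1900.
    """
    if date1904:
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial < 1:
        raise ValueError("the 1900 system starts at serial 1")
    if serial == 60:
        raise ValueError("serial 60 is the non-existent 29 February 1900")
    # Serials after the fictitious leap day are shifted by one day.
    epoch = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return epoch + timedelta(days=serial)


if __name__ == "__main__":
    # The same stored number means two different days.
    print(serial_to_date(36526), serial_to_date(36526, date1904=True))
```

Under these assumptions the same calendar day is stored as two numbers that differ by 1462, so reading a value with the wrong system shifts every date by about four years.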

3.5 Cross-platform interoperability

1. Microsoft Office 2007 for Windows uses Office Open XML as its native file format. Microsoft Office 2008 for Mac OS X, scheduled for release in late summer 2007, will also use Office Open XML as its native file format. An ODF converter plugin for Microsoft Office XP/2003/2007 for Windows allows one to open and save OpenDocument word processing (.odt) files.

2. Corel has announced that the WordPerfect Office X3 suite will include support for OpenDocument Format as well as Office Open XML by mid-2007.

3. Gnumeric has included support for OpenDocument spreadsheet and preliminary support for Microsoft Office Open XML spreadsheet format since version 1.7.

4. IBM announced that Lotus Notes will use OpenDocument as the native format for its office productivity editors in the next release, due in 2007. IBM Workplace 2.6 already supports OpenDocument format.

5. Google Docs and Spreadsheets supports OpenDocument word processing and spreadsheet formats.

6. AbiWord 2.4 supports OpenDocument word processing format.

7. Scribus 1.3.3, a multi-platform, open source, page layout application, supports import of OpenDocument word processing files.

8. OpenDocument Format is currently supported in several office suites and individual applications, including as the native file format for KOffice 1.5, OpenOffice.org 2.0 and StarOffice 8.

3.6 Conclusion

Based on the arguments for and against presented in the previous sections, we conclude that OpenDocument Format is the leading format, with the greatest potential and flexibility. Its advantages clearly outweigh its disadvantages. As an ISO standard, ODF has been widely adopted by the industry and is well supported by most common office tools. Furthermore, from a technical point of view, the schema is clear, well structured and easily convertible into other XML-based formats using XSLT.