Office Open XML Developer Workshop
Office Open XML Architecture
A developer’s introduction to the file formats
Disclaimer
The information contained in this slide deck represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This slide deck is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this slide deck may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this slide deck. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this slide deck does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred.
© 2006 Microsoft Corporation. All rights reserved.
Microsoft, 2007 Microsoft Office System, .NET Framework 3.0, Visual Studio, and Windows Vista are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Objectives
• In this module, we will learn about the architecture of the Office Open XML formats.
• Primary focus is on concepts that apply to all three main document types.
• Details specific to word processing documents, spreadsheets, or presentations will be covered in separate modules for each of those document types.
Evolution of Document Authoring
Old Approach: Linear, static process
• Paper Documents – printed document of record
• Electronic Masters – digital work style, temporary until printed
• Face-to-face Collaboration – time consuming & coordination issues
• Paper-based project management
• Binary formats with “copy/paste” tasks for content reuse
New Approach: Dynamic, interactive process
• Paper Documents – temporary & disposable, rarely used
• Electronic formats – automatic, machine-driven processes
• Electronic collaboration – always using up-to-date information
• Electronic management – scheduling, tracking easier to communicate
• XML formats make content easy to locate, edit and reuse
Document Formats and Applications
Formats describe information
• Define content appearance/rendering
• Structure content for business processes
• Enable machines (software) to use information
Software applications use information
• Provide functionality for authoring, organizing, developing, representing, evaluating, reviewing, collaborating, validating, calculating, protecting, and printing information
Formats can influence application design, and vice versa
• Open Document Format (ODF) and OpenOffice functionality
• Office Open XML and Microsoft Office functionality
Levels of Interoperability: Reference and Custom-defined Schemas
Custom-defined Schemas
• Data-oriented (e.g., Price, Invoice) business information
• Enable system integration
XML Reference Schemas
• Display-oriented (Bold, Italics, Tables, Paragraphs, Styles, …)
• Document format
• Enable archival and file format interoperability
Levels of Interoperability: Reference and Custom-defined Schemas
XML Reference Schemas
• Display-oriented (for example, Bold, Italics, Tables, Paragraphs, Styles)
• Document format
• Enable archival and file format interoperability
<w:p>
  <w:r>
    <w:rPr><w:b /></w:rPr>
    <w:t>John Doe</w:t>
  </w:r>
  <w:r>
    <w:rPr><w:i /></w:rPr>
    <w:t>Health Agency</w:t>
  </w:r>
</w:p>
DEMO
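To make the display-oriented markup concrete, the following is a minimal Python sketch that reads the paragraph above and reports which runs are bold or italic. It assumes the `w` prefix is bound to the WordprocessingML main namespace published in the final Ecma standard; the function name `describe_runs` is hypothetical, and the presence check ignores cases such as an explicit `w:val` that turns a property off.

```python
import xml.etree.ElementTree as ET

# Assumption: the w prefix maps to the WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The slide's fragment, with the namespace declaration a real part would carry.
PARAGRAPH = f"""
<w:p xmlns:w="{W_NS}">
  <w:r><w:rPr><w:b /></w:rPr><w:t>John Doe</w:t></w:r>
  <w:r><w:rPr><w:i /></w:rPr><w:t>Health Agency</w:t></w:r>
</w:p>
"""


def describe_runs(paragraph_xml: str) -> list[tuple[str, bool, bool]]:
    """Return (text, is_bold, is_italic) for each run in a paragraph."""
    ns = {"w": W_NS}
    paragraph = ET.fromstring(paragraph_xml)
    runs = []
    for run in paragraph.findall("w:r", ns):
        text = "".join(t.text or "" for t in run.findall("w:t", ns))
        # Simplification: only the presence of the element is checked.
        bold = run.find("w:rPr/w:b", ns) is not None
        italic = run.find("w:rPr/w:i", ns) is not None
        runs.append((text, bold, italic))
    return runs


if __name__ == "__main__":
    for text, bold, italic in describe_runs(PARAGRAPH):
        print(f"{text!r}: bold={bold}, italic={italic}")
```

The markup says how "John Doe" looks (bold) but nothing about what it means; that gap is what custom-defined schemas address.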
Levels of Interoperability: Reference and Custom-defined Schemas
Custom-defined Schemas
• Data-oriented (for example, Price, Invoice) business information
• Enable system integration
<ConferenceReport>
  <Date>3/24/2004</Date>
  <Attendees>
    <Attendee Name="John Doe">
      <Department>Health Agency</Department>
      <Potential>
        <Sales>100</Sales>
        <Growth>25%</Growth>
        …
    </Attendee>
DEMO
Goals of XML File Formats
• Reduce operational cost
• Program interoperability
• Streamlined processes
• Not locked into a single vendor or platform
• Integrated solutions
• Successfully share information across applications
• Not proprietary
• Facilitate openness, transparency and interoperability
• Transition away from the locked-in approach to system design
User View of Open XML Files
• Single file
• Compact
  • Compression
• Corruption resistant
  • Segmented architecture
  • Corruption of one part does not prevent the rest of the file from opening
• Separation of macro-enabled content
  • Macro-enabled extensions end with “m” instead of “x” (e.g., .docm)
  • VBA, Excel macro sheets, PowerPoint action commands
  • Enforced at runtime by 2007 Office programs
Programmer View of Open XML Files
• ZIP archive
• Document parts
  • XML parts
  • Binary parts
  • Typed (RFC 2616)
• Relationships
  • Connections between parts
• Content Type stream
  • A specially named stream
  • Defines mappings from part names to content types
  • Not itself a part, not URI addressable
• Folder structure for convenience only
DEMO
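The programmer view can be explored with any ZIP library. The sketch below uses only the Python standard library to list the parts of a package and resolve each part's content type from the content type stream. It assumes the stream is the ZIP entry named `[Content_Types].xml` with `Default` (by extension) and `Override` (by part name) entries; `describe_package` is a hypothetical helper, and the sketch ignores the case-insensitive matching a full implementation would need.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: namespace of the content type stream in the final standard.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def describe_package(package_bytes: bytes) -> dict[str, str]:
    """Map each part name (e.g. '/word/document.xml') to its content type."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        names = pkg.namelist()
        types = ET.fromstring(pkg.read("[Content_Types].xml"))

    # Default entries apply by file extension, Override entries by part name.
    defaults = {d.get("Extension").lower(): d.get("ContentType")
                for d in types.findall(f"{{{CT_NS}}}Default")}
    overrides = {o.get("PartName"): o.get("ContentType")
                 for o in types.findall(f"{{{CT_NS}}}Override")}

    parts = {}
    for name in names:
        if name == "[Content_Types].xml" or name.endswith("/"):
            continue  # the content type stream is not itself a part
        part_name = "/" + name
        extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        parts[part_name] = overrides.get(part_name) or defaults.get(extension, "")
    return parts
```

Passing the bytes of any .docx, .xlsx, or .pptx file should return one entry per part; the folder names in the result carry no meaning of their own.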
How to think about OPC packages
Files and folders – NO! Parts and relationships – YES
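In practice, "parts and relationships" means locating a part by following a relationship rather than by assuming a path. The sketch below finds the main document part by reading the package-level relationships and matching on relationship type. The namespace and type URIs are the ones I understand the final standard to use, and `find_main_part` is a hypothetical helper name.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

# Assumptions: relationship namespace and officeDocument relationship type.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                   "relationships/officeDocument")


def find_main_part(package_bytes: bytes) -> Optional[str]:
    """Return the ZIP entry name of the main document part, or None."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        rels = ET.fromstring(pkg.read("_rels/.rels"))
    for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOCUMENT and rel.get("TargetMode") != "External":
            # Package-level relationship targets resolve against the package root.
            target = posixpath.normpath(posixpath.join("/", rel.get("Target")))
            return target.lstrip("/")
    return None
```

A package whose main part lives at an unusual path is still handled, because the code never hardcodes `word/document.xml`.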
Ecma Office Open XML Specifications
• Markup Languages: WordprocessingML, SpreadsheetML, PresentationML
• Vocabularies: DrawingML, Custom XML, Bibliography, Metadata, VML (legacy), Equations
• Open Packaging Convention: Content Types, Relationships, Digital Signatures
• Core Technologies: ZIP, XML + Unicode
Ecma Office Open XML Specifications
• Main Document MLs: WordprocessingML (.docx), SpreadsheetML (.xlsx), PresentationML (.pptx)
• Shared Markup: DrawingML, Custom XML, Bibliography, Metadata, VML (legacy), Equations
• Open Packaging Convention: Content Types, Relationships, Digital Signatures
• Core Technologies: ZIP, XML
[Diagram labels: Module 06, 07A · Module 03, 04 · Module 07B · Module 05 · Module 08 · Module 02 · Module 01, 09 · Module 02 · Module 02]
Office Open XML Reference Schemas
First published schema in September 2005, with ongoing standardization and documentation since then.
In December 2005, Microsoft and 9 other companies submitted upgrades to Ecma International.
Estimated completion for Format specifications by December 2006. (Final vote on December 7!)
Full XML-based environments begin with the 2007 Microsoft Office system.
Interoperability
• Extensibility can create interoperability problems: the consumer may not understand extensions added by the producer.
• To guarantee interoperability, there must be a standardized list of allowed media formats, etc.
• An organization that wants to assure interoperability must take responsibility for making these decisions.
• Example: MS-Office allows for reliable interoperability by:
  • Using the Office Open XML format
  • Automatic document conversion from legacy formats
  • Using the macro-enabled extensions to support a defined set of allowed content types
Office Open XML File Format Extensions
              Macro-Free             Macro-Enabled
              Document   Template    Document   Template
Word          docx       dotx        docm       dotm
PowerPoint    pptx       potx        pptm       potm
Excel         xlsx       xltx        xlsm       xltm
All of these are Open Packaging Convention packages.
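The extension table maps directly onto a lookup. The following sketch transcribes it into a Python dictionary so that a program can tell from a file name whether a package may carry macros; the application labels and the `classify` helper are illustrative names, not part of the format.

```python
import os
from typing import Optional

# Extension -> (application, kind, macro_enabled), transcribed from the table.
EXTENSIONS = {
    ".docx": ("Word", "document", False),
    ".dotx": ("Word", "template", False),
    ".docm": ("Word", "document", True),
    ".dotm": ("Word", "template", True),
    ".pptx": ("PowerPoint", "document", False),
    ".potx": ("PowerPoint", "template", False),
    ".pptm": ("PowerPoint", "document", True),
    ".potm": ("PowerPoint", "template", True),
    ".xlsx": ("Excel", "document", False),
    ".xltx": ("Excel", "template", False),
    ".xlsm": ("Excel", "document", True),
    ".xltm": ("Excel", "template", True),
}


def classify(filename: str) -> Optional[tuple[str, str, bool]]:
    """Return (application, kind, macro_enabled), or None if not listed."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSIONS.get(extension)
```

The extension is only a hint to the user; a careful consumer would still inspect the package contents before trusting it.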
Hello World
Creating the minimal WordprocessingML document
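One way to build such a document without Office installed is sketched below in Python. It writes three ZIP entries: the content type stream, the package-level relationships, and the main document part. The namespace URIs and content types are the values I understand the final standard to use and should be checked against the spec; `build_hello_world` is a hypothetical helper name.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

ROOT_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""

DOCUMENT_TEMPLATE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>{text}</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def build_hello_world(text: str = "Hello World") -> bytes:
    """Return the bytes of a minimal WordprocessingML package."""
    document = DOCUMENT_TEMPLATE.format(text=escape(text))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("[Content_Types].xml", CONTENT_TYPES)
        pkg.writestr("_rels/.rels", ROOT_RELS)
        pkg.writestr("word/document.xml", document)
    return buffer.getvalue()


if __name__ == "__main__":
    with open("hello.docx", "wb") as handle:
        handle.write(build_hello_world())
```

Only the relationship, not the folder name, tells a consumer that `word/document.xml` is the main document.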
Developer Scenario: Styling Content
Example: enforce organizational standards for document formatting.
[Diagram: Open XML Processing]
Developer Scenario: Content Inspection
Example #1: remove confidential information, tracked changes or metadata from outbound documents.
Example #2: remove macros, inappropriate language, or other content from inbound documents.
[Diagram: two Open XML Processing steps]
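As a small illustration of content inspection, the sketch below counts tracked insertions and deletions in the XML of a main document part. It assumes tracked changes appear as `w:ins` and `w:del` elements in the WordprocessingML main namespace and deliberately ignores other revision markup (formatting changes, moves); `count_tracked_changes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: WordprocessingML main namespace from the final standard.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_tracked_changes(document_xml: bytes) -> dict[str, int]:
    """Count tracked insertions and deletions in a main document part."""
    root = ET.fromstring(document_xml)
    return {
        "insertions": sum(1 for _ in root.iter(f"{{{W_NS}}}ins")),
        "deletions": sum(1 for _ in root.iter(f"{{{W_NS}}}del")),
    }
```

An outbound-document gateway could refuse to send any document for which either count is non-zero.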
Development Scenario: Consuming Documents
Example: user creates expense reports as spreadsheet documents, which are loaded into a back-end system on the server.
[Diagram: Authoring environment (Microsoft Office, etc.) → Open XML Processing → Back-end system (LOB/CRM/etc.)]
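A server-side consumer of this kind mostly reads cell values out of worksheet parts. The sketch below is a simplified reader, assuming cells are `c` elements with an `r` reference and a `v` value, and that `t="s"` marks an index into the shared strings part. It takes the XML of the two parts directly, so locating them through relationships is left out; `read_cells` is a hypothetical helper, and dates, formulas, and inline strings are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: SpreadsheetML main namespace from the final standard.
S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def read_cells(sheet_xml: bytes,
               shared_strings_xml: Optional[bytes] = None) -> dict[str, str]:
    """Map cell references (e.g. 'A1') to their stored values as text."""
    strings = []
    if shared_strings_xml is not None:
        table = ET.fromstring(shared_strings_xml)
        for item in table.findall(f"{{{S_NS}}}si"):
            # Concatenate every text fragment of the string item.
            strings.append("".join(t.text or "" for t in item.iter(f"{{{S_NS}}}t")))

    cells = {}
    for cell in ET.fromstring(sheet_xml).iter(f"{{{S_NS}}}c"):
        value = cell.find(f"{{{S_NS}}}v")
        if value is None or value.text is None:
            continue  # empty cell
        if cell.get("t") == "s":
            cells[cell.get("r")] = strings[int(value.text)]
        else:
            cells[cell.get("r")] = value.text
    return cells
```

The returned dictionary can then be validated and written to the back-end system with ordinary application code.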
Development Scenario: Document Assembly
Example: create sales reports from financial and forecast data stored in a CRM system.
[Diagram: Web client or rich client allows user to select or enter content criteria; Open XML Processing]
Development Scenario: Custom XML Markup
Example: tagging document content with custom semantics for processing by a back-end system.
[Diagram: Authoring environment; Open XML Processing]
Custom XML Data Store
• Customer-defined XML stored separately from other document parts
• Any XML can be stored
  • Document properties
  • WSS metadata
  • Custom XML (with or without XML schema)
• XML data is available as an editable tree (using familiar DOM) within Word
• External applications (client/server) can process the store or populate the store
[Diagram: Doc/Template containing Doc Parts and XML; External App]
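Because the custom XML lives in its own part, an external application can populate the store by swapping that one part and leaving the rest of the package untouched. The sketch below does this generically with the Python standard library. The entry name used in the usage note (`customXml/item1.xml`) is only the conventional location and is an assumption here, as is the helper name `replace_part`.

```python
import io
import zipfile


def replace_part(package_bytes: bytes, entry_name: str, new_data: bytes) -> bytes:
    """Return a copy of the package with one ZIP entry's content replaced."""
    source = zipfile.ZipFile(io.BytesIO(package_bytes))
    output = io.BytesIO()
    with source, zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        if entry_name not in source.namelist():
            raise KeyError(entry_name)
        for info in source.infolist():
            # Copy every entry unchanged except the one being replaced.
            if info.filename == entry_name:
                data = new_data
            else:
                data = source.read(info.filename)
            target.writestr(info.filename, data)
    return output.getvalue()
```

For example, `replace_part(package, "customXml/item1.xml", new_xml)` would refresh the business data while the document body, styles, and relationships stay byte-for-byte the same in content.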
XML Data Binding
• Link content controls to nodes in the XML data store
• Mappings are created using standard XPath expressions
• Microsoft Office offers built-in support for mapping to Office metadata (document properties)
[Diagram: Customers]
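To show what an XPath mapping resolves to, the sketch below evaluates a simple absolute XPath against a custom XML part and returns the text a bound content control would display. It is a simplification: Python's ElementTree supports only a subset of XPath, namespaces are ignored, and the `Customers`/`Customer`/`Name` element names in the usage note are hypothetical sample data, as is the helper `resolve_binding`.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def resolve_binding(custom_xml: str, xpath: str) -> Optional[str]:
    """Return the text of the node an absolute XPath such as
    '/Customers/Customer[2]/Name' selects, or None if nothing matches."""
    root = ET.fromstring(custom_xml)
    steps = xpath.strip("/").split("/", 1)
    if steps[0] != root.tag:
        return None  # the first step must name the root element
    if len(steps) == 1:
        return root.text
    # ElementTree paths are relative to the element they are called on.
    node = root.find(steps[1])
    return node.text if node is not None else None
```

For a store holding two customers, `resolve_binding(store, "/Customers/Customer[2]/Name")` selects the second customer's name; changing that node in the store is what changes the text shown in the document.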
Open XML Interoperability
ZIP Library
• Linux: Minizip, zlib
• Java: J2SE java.util.zip
• Microsoft: .NET Framework 3.0 System.IO.Packaging *, Xceed .NET controls
• COM: Xceed ActiveX controls
XML Library
• Linux: Apache Xerces
• Java: JAXP
• Microsoft: .NET Framework 3.0 System.Xml
• COM: MSXML
* Also includes abstractions for OPC concepts (Open Packaging Convention)
Standardization of Office Open XML
Timeline of current status:
• Ecma: December 7, 2006
• Submitted to ISO as Fast-Track: December 8, 2006
• Administration phase: comments were filed; Ecma is working to resolve these comments
• Upcoming: DIS ballot period, then the vote
The Ecma Spec
Where to get the final draft: http://www.ecma-international.org/news/TC45_current_work/TC45-2006-50_final_draft.htm
Organization of the spec
1. Fundamentals
2. Open Packaging Conventions
3. Primer
4. Markup Language Reference
5. Markup Compatibility and Extensibility
Reference Schemas (XSD, RelaxNG)
The Ecma Spec: Where To Start
Where to get the final draft: http://www.ecma-international.org/news/TC45_current_work/TC45-2006-50_final_draft.htm
Organization of the spec
1. Fundamentals
2. Open Packaging Conventions
3. Primer
4. Markup Language Reference
5. Markup Compatibility and Extensibility
Reference Schemas (XSD, RelaxNG)
[Slide annotations on the list above: Read 1st, Read 2nd, Reference materials]
OpenXmlDeveloper.org
Formed by 40 companies to share developer information about the Office Open XML file formats
Articles with full source code for C#, VB, Java, XSLT
Forums for posting technical questions