day of dot net ann arbor 2007

Creating Office Documents with Open XML
David Truxall, Ph.D., Principal Consultant, NuSoft Solutions

Upload: david-truxall

Post on 01-Jul-2015

DESCRIPTION

My presentation on Office 2007/OpenXML file formats

TRANSCRIPT

Page 1: Day Of Dot Net Ann Arbor 2007

Creating Office Documents with Open XML

David Truxall, Ph.D.
Principal Consultant
NuSoft Solutions

Page 2: Day Of Dot Net Ann Arbor 2007

Agenda

Overview
System.IO.Packaging
Building Documents with .NET

Page 3: Day Of Dot Net Ann Arbor 2007

Open XML

A standard that describes a family of XML schemas (Ecma standard)
Defines the XML vocabularies for word-processing, spreadsheet, and presentation documents
Defines the packaging of documents that conform to these schemas

Page 4: Day Of Dot Net Ann Arbor 2007

Features of Office Open XML

Page 5: Day Of Dot Net Ann Arbor 2007

Support for Open XML

iPhone, iWork, Microsoft Office, OpenOffice, Gnumeric, WordPerfect, Palm OS, NeoOffice
PHP, Java, Monarch v.9.0, OpenXML Writer, Word Counter 2.2.1, Altsoft XML2PDF, MindMapping, XmlSpy

Page 6: Day Of Dot Net Ann Arbor 2007

Open XML Format Architecture

User view: single Office file
Developer view: modular file

File Container
  Document Properties
  Comments
  WordML / SpreadsheetML
  Custom XML
  Embedded Code
  Images / Video / Sound

Document Parts
  Most parts are XML
  Each XML part is a discrete component
  Can add, extract and modify individual parts without using Office programs
  Corruption of any part would not prohibit the file from opening
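The "modular file" idea can be tried without Office installed. The sketch below is an illustration in Python's standard zipfile module, not the .NET API this talk covers; the part contents are placeholders and the helper names (make_package, list_parts, replace_part) are made up for this example.

```python
import io
import zipfile

def make_package(parts):
    """Zip a {part name: bytes} mapping into an in-memory package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in parts.items():
            archive.writestr(name, data)
    return buf.getvalue()

def list_parts(package_bytes):
    """Return the part names stored in the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.namelist()

def replace_part(package_bytes, name, new_data):
    """Copy the package, swapping the content of one part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.namelist():
            dst.writestr(item, new_data if item == name else src.read(item))
    return out.getvalue()

# Placeholder content only: these are not valid Open XML parts.
package = make_package({
    "[Content_Types].xml": b"<Types/>",
    "word/document.xml": b"<document/>",
    "docProps/core.xml": b"<coreProperties/>",
})
updated = replace_part(package, "docProps/core.xml",
                       b"<coreProperties>edited</coreProperties>")
print(list_parts(updated))
```

A ZIP archive cannot be edited in place with this module, so replace_part rewrites the whole container and leaves every other part untouched.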

Page 7: Day Of Dot Net Ann Arbor 2007

Open Packaging Organization

Package – The container (a ZIP archive)
Document Parts – The files inside the container
Relationships – Every part that references other parts does so via a relationship

Example package (diagram): Document Properties, Application Properties, Custom Properties, Workbook, Sheet 1, Sheet 2, Sheet 3, Strings, Theme
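Relationships are themselves stored as small XML parts. As a rough sketch (Python standard library, not the .NET API), the following reads a package-level relationships part and looks up targets by relationship type. The sample XML is hand-written for this example, and targets_by_type is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                   "relationships/officeDocument")
CORE_PROPERTIES = ("http://schemas.openxmlformats.org/package/2006/"
                   "relationships/metadata/core-properties")

# Hand-written sample of a _rels/.rels part (illustrative only).
ROOT_RELS = f"""<Relationships xmlns="{RELS_NS}">
  <Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" Target="xl/workbook.xml"/>
  <Relationship Id="rId2" Type="{CORE_PROPERTIES}" Target="docProps/core.xml"/>
</Relationships>"""

def targets_by_type(rels_xml, rel_type):
    """Return the Target of every relationship with the given Type."""
    root = ET.fromstring(rels_xml)
    return [rel.get("Target")
            for rel in root.findall(f"{{{RELS_NS}}}Relationship")
            if rel.get("Type") == rel_type]

print(targets_by_type(ROOT_RELS, OFFICE_DOCUMENT))
```

Looking parts up by relationship type rather than by file name is the point: the workbook could live under a different name and this lookup would still find it.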

Page 8: Day Of Dot Net Ann Arbor 2007

Exploring the Document Package

Page 9: Day Of Dot Net Ann Arbor 2007

Reference Schemas

XML Reference Schemas
  80+ that make up the standard
  Display oriented
  Document format
Custom Schemas
  Specific to your business
  Data oriented
  Business information

Page 10: Day Of Dot Net Ann Arbor 2007

Custom XML Content

Enables interoperability with other systems
  Documents can provide a rich view of back-end data sources
  Documents can update back-end data sources
Exposes business data in Open XML documents
  Heterogeneous systems can easily read data from documents
  Business-specific semantics can be applied to document data
Separates presentation and data
Simplified programming model for all of the above

Custom XML schema support was a key design objective for Open XML: any schema can be used in Open XML documents.

Page 11: Day Of Dot Net Ann Arbor 2007

System.IO.Packaging

Part of Windows Presentation Foundation
Installed with .NET 3.0
Requires .NET 2.0 Runtime
Enables package manipulation for
  Office Open XML File Formats
  XML Paper Specification Files
  Any Open Packaging Convention files

Page 12: Day Of Dot Net Ann Arbor 2007

The Package

Package Class
  Provides methods to create, enumerate and delete the following entities:
    Package
    Package Properties
    PackageRelationships
    PackageParts

Diagram: Package Relationships lead to the Common Package Parts (Core Properties, Digital Signatures) and to the Specific Format Parts (Office Document); Part Relationships lead from the Office Document to further XML Parts, each with its own Part Rels, etc.

Page 13: Day Of Dot Net Ann Arbor 2007

The PackagePart

A PackagePart is the object of data within the Package
It provides support to create, enumerate and delete part relationships
Get data as a System.IO.Stream
PackagePart properties:
  CompressionOption
  ContentType
  Package
  Uri

Page 14: Day Of Dot Net Ann Arbor 2007

PackageRelationship

Required to find parts (part names are not guaranteed)
Iterate through a PackageRelationshipCollection by type or ID
Relationship Properties
  Id
  Package
  RelationshipType
  SourceUri
  TargetMode
  TargetUri

Page 15: Day Of Dot Net Ann Arbor 2007

Package Uri Helper

Find a related PackagePart by searching relationships, either by relationship type or relationship ID
  This returns a list of PackageRelationship objects
A PackageRelationship defines two relative URIs
  Source URI, pointing to the source PackagePart
  Target URI, pointing to the target PackagePart
Retrieve a PackagePart by using a URI relative to the root of the Package
  Translation of Source and Target URIs is required
  Use the PackUriHelper class to aid in the translation
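The translation step amounts to resolving a relative target against the folder of the source part. In .NET this is what PackUriHelper.ResolvePartUri is for; the sketch below only mimics the idea with Python's posixpath, and resolve_part_uri is a hypothetical helper rather than a port of the .NET method.

```python
import posixpath

def resolve_part_uri(source_part_uri, target_uri):
    """Resolve a relationship target against its source part.

    Both URIs use forward slashes; the result is relative to the
    package root (it starts with "/").
    """
    base = posixpath.dirname(source_part_uri)
    return posixpath.normpath(posixpath.join(base, target_uri))

# Package-level relationship: the source is the package root "/".
print(resolve_part_uri("/", "xl/workbook.xml"))
# Part-level relationship from the workbook to a sheet.
print(resolve_part_uri("/xl/workbook.xml", "worksheets/sheet1.xml"))
# A target that climbs out of the source part's folder.
print(resolve_part_uri("/xl/worksheets/sheet1.xml", "../tables/table1.xml"))
```

Targets that already start with "/" are returned unchanged, because joining an absolute path discards the base.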

Page 16: Day Of Dot Net Ann Arbor 2007

System.IO.Packaging

Page 17: Day Of Dot Net Ann Arbor 2007

SpreadsheetML

Workbook parts (diagram): properties, table, chart, styles, calcChain, sharedStrings, sheet1..N, drawing

Workbooks, Worksheets
Rows, Columns, Values
Formulas

Page 18: Day Of Dot Net Ann Arbor 2007

The Minimal xlsx

Required: workbook.xml, the document “start part”
Required: at least one sheet, worksheet.xml
Required: one relationship part (.rels)
  Must be in a _rels folder
Required: [Content_Types].xml
  Required part for all Open XML documents
  Three content types must be defined:
    SpreadsheetML main document (for the start part)
    Worksheet
    Package relationships (for the required relationships)
Everything else is optional
  Worksheet <sheetData> is required, but may be empty
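The checklist above can be sketched as a small script. This is an assumption-laden illustration in Python's standard library, not the .NET approach from the talk: the part paths (xl/workbook.xml, xl/worksheets/sheet1.xml) follow common convention, the function name is made up, and the result has not been validated against Excel here. Because the sheet is referenced from the workbook by r:id, the sketch also writes a part-level relationships file for the workbook next to the package-level .rels.

```python
import io
import zipfile

PKG = "http://schemas.openxmlformats.org/package/2006"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
SML = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
CT = "application/vnd.openxmlformats-officedocument.spreadsheetml"

PARTS = {
    # The three content types: relationships, main document, worksheet.
    "[Content_Types].xml": f"""<Types xmlns="{PKG}/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/xl/workbook.xml" ContentType="{CT}.sheet.main+xml"/>
  <Override PartName="/xl/worksheets/sheet1.xml" ContentType="{CT}.worksheet+xml"/>
</Types>""",
    # Package-level relationship pointing at the start part.
    "_rels/.rels": f"""<Relationships xmlns="{PKG}/relationships">
  <Relationship Id="rId1" Type="{DOC_REL}/officeDocument" Target="xl/workbook.xml"/>
</Relationships>""",
    # The start part, listing one sheet by relationship id.
    "xl/workbook.xml": f"""<workbook xmlns="{SML}" xmlns:r="{DOC_REL}">
  <sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>
</workbook>""",
    # Part-level relationship from the workbook to the sheet.
    "xl/_rels/workbook.xml.rels": f"""<Relationships xmlns="{PKG}/relationships">
  <Relationship Id="rId1" Type="{DOC_REL}/worksheet" Target="worksheets/sheet1.xml"/>
</Relationships>""",
    # A worksheet with an empty sheetData element.
    "xl/worksheets/sheet1.xml": f"""<worksheet xmlns="{SML}"><sheetData/></worksheet>""",
}

def build_minimal_xlsx():
    """Return the bytes of a ZIP package holding the parts above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, xml in PARTS.items():
            archive.writestr(name, xml)
    return buf.getvalue()

if __name__ == "__main__":
    with open("minimal.xlsx", "wb") as handle:
        handle.write(build_minimal_xlsx())
```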

Page 19: Day Of Dot Net Ann Arbor 2007

SpreadsheetML Tables

SpreadsheetML tables provide structure and formatting for worksheet information
Separation of presentation and data:
  Data stays in the worksheet
  Table definition in separate part (implicit relationship)
Open XML has different types of tables for each document type, optimized for different scenarios:
  WordprocessingML has its tbl element
  SpreadsheetML has its table element
  PresentationML uses DrawingML tables (tbl inside graphicData)

Page 20: Day Of Dot Net Ann Arbor 2007

SpreadsheetML Table

Worksheet (sheet1.xml); the headings are shared strings (t="s"):

<sheetData>
  <row r="1" spans="1:2">
    <c r="A1" t="s"><v>0</v></c>
    <c r="B1" t="s"><v>1</v></c>
  </row>
  <row r="2" spans="1:2">
    <c r="A2"><v>1</v></c>
    <c r="B2"><v>4</v></c>
  </row>
  <row r="3" spans="1:2">
    <c r="A3"><v>2</v></c>
    <c r="B3"><v>5</v></c>
  </row>
  <row r="4" spans="1:2">
    <c r="A4"><v>3</v></c>
    <c r="B4"><v>6</v></c>
  </row>
</sheetData>
...
<tableParts count="1">
  <tablePart r:id="rId2"/>
</tableParts>

Table definition (table1.xml):

<table … ref="A1:B4" …>
  <autoFilter ref="A1:B4"/>
  <tableColumns count="2">
    <tableColumn id="1" name="Column1" />
    <tableColumn id="2" name="Column2" />
  </tableColumns>
  <tableStyleInfo …/>
</table>
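To show how the t="s" cells are resolved, here is a minimal reader for the worksheet fragment above, written with Python's standard library as an illustration only. The shared string values are an assumption (the slide does not show sharedStrings.xml; the table column names are reused), and read_rows is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<sheetData>
  <row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1" t="s"><v>1</v></c></row>
  <row r="2"><c r="A2"><v>1</v></c><c r="B2"><v>4</v></c></row>
  <row r="3"><c r="A3"><v>2</v></c><c r="B3"><v>5</v></c></row>
  <row r="4"><c r="A4"><v>3</v></c><c r="B4"><v>6</v></c></row>
</sheetData>
</worksheet>"""

# Assumed content of the shared strings part, indexed from zero.
SHARED_STRINGS = ["Column1", "Column2"]

def read_rows(sheet_xml, shared_strings):
    """Return cell values row by row, resolving shared-string indexes."""
    root = ET.fromstring(sheet_xml)
    rows = []
    for row in root.findall("m:sheetData/m:row", NS):
        values = []
        for cell in row.findall("m:c", NS):
            value = cell.find("m:v", NS)
            text = value.text if value is not None else None
            if text is None:
                values.append(None)
            elif cell.get("t") == "s":
                # The stored value is an index into the shared strings.
                values.append(shared_strings[int(text)])
            else:
                values.append(float(text))
        rows.append(values)
    return rows

for row in read_rows(SHEET_XML, SHARED_STRINGS):
    print(row)
```

Only the shared-string and plain numeric cases are handled; other cell types (booleans, inline strings, formulas) are left out of this sketch.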

Page 21: Day Of Dot Net Ann Arbor 2007

ExcelPackage

Open Source API on CodePlex
Wraps System.IO.Packaging and SpreadsheetML

http://www.codeplex.com/ExcelPackage

Page 22: Day Of Dot Net Ann Arbor 2007

WordprocessingML Document

Document parts (diagram): body, properties, fontTable, headers/footers, images, numberingDefinitions, styles, customXML, footnotes/endnotes, comments

A WordprocessingML file is a collection of multiple “stories”:
  The main story
  Header(s) / Footer(s)
  Footnote(s) / Endnote(s)
  Subdocuments
  Comment(s)

Page 23: Day Of Dot Net Ann Arbor 2007

Main Document Part

The top-level element in the start part (e.g., document.xml) is document
document has two optional child elements:
  The background element, which specifies the settings for the background for the document
  The body element, which contains the content of the main story

Page 24: Day Of Dot Net Ann Arbor 2007

Block-Level Elements

The body element contains the main document story, made up of block-level elements:
  Paragraphs
  Tables
  Custom XML markup
  Alternate format chunks
  Subdocuments
  Final section properties
  Future extensibility containers
Nested elements: a table may contain a table which contains a paragraph, etc.

Page 25: Day Of Dot Net Ann Arbor 2007

Inline Structures

The <w:p> paragraph element contains inline structures:
  Runs (containing <w:t> text regions)
  Custom Markup (can occur at block or inline level)
  Annotations (comments, tracked changes, bookmarks)
  DrawingML elements
  Fields (date, page number, document creator, etc.)
  Hyperlinks

Page 26: Day Of Dot Net Ann Arbor 2007

Paragraphs <w:p>

The most basic unit of a WordprocessingML document
Contains three pieces of information:
  Paragraph properties
  Inline content
  Optional revision IDs used for document merge and compare
A paragraph may occur at any location which allows block level content:
  At the top-most level within a story (e.g. header, footer, main document)
  Nested within a table cell
  Nested within a structured document tag or annotation markers

Page 27: Day Of Dot Net Ann Arbor 2007

Paragraph Properties

Can be set directly on a paragraph (below) or in a paragraph style
24 total property settings

<w:p>
  <w:pPr>
    <w:widowControl w:val="on" />
    <w:keepNext/>
    <w:keepLines/>
    <w:pageBreakBefore/>
    <w:suppressLineNumbers />
    <w:suppressAutoHyphens />
    <w:textBoxTightWrap />
  </w:pPr>
  … runs, paragraph content …
</w:p>

Page 28: Day Of Dot Net Ann Arbor 2007

Runs <w:r>

A run is a region of text with a common set of properties
All text must be contained within runs
All runs must be contained within paragraphs
A run contains three types of information:
  Run properties
  Run content (text, fields, soft line breaks, pictures, etc.)
  Optional revision IDs for document comparison
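Since all text sits in runs and all runs sit in paragraphs, the text of a document can be recovered by walking body, then w:p, then w:r, then w:t. The sketch below does this with Python's standard library for a hand-written sample document; the sample text and the paragraph_texts name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written sample of a main document part (illustrative only).
DOCUMENT_XML = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:r><w:t xml:space="preserve">Hello, </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Open XML</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>Second paragraph</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""

def paragraph_texts(document_xml):
    """Return the text of each top-level paragraph in the main story."""
    body = ET.fromstring(document_xml).find("w:body", NS)
    texts = []
    for paragraph in body.findall("w:p", NS):
        # Concatenate the text regions of the runs directly in the paragraph.
        pieces = [t.text or "" for t in paragraph.findall("w:r/w:t", NS)]
        texts.append("".join(pieces))
    return texts

print(paragraph_texts(DOCUMENT_XML))
```

Paragraphs nested in tables and runs nested in hyperlinks or custom markup are not visited; a fuller reader would recurse through those containers.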

Page 29: Day Of Dot Net Ann Arbor 2007

Run Properties

Define formatting for individual characters
Font attributes, size/position, etc.
24 total properties

<w:r>
  <w:rPr>
    <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial" />
    <w:b/>
    <w:i/>
    <w:sz w:val="11" />
    <w:dstrike w:val="true" />
  </w:rPr>
  … run content …
</w:r>

Page 30: Day Of Dot Net Ann Arbor 2007

PresentationML

Presentation parts (diagram): Presentation, View Properties, Presentation Properties, Code, Themes, Fonts, Slide Masters, Slide Layouts, Slides, Notes Masters, Notes Slides, Handout Masters

Page 31: Day Of Dot Net Ann Arbor 2007

The Minimal pptx

Presentation Element
  Presentation.xml
    Slide Masters
    Notes Masters
    Handout Masters
    Slides
Relationships Part
  Links to slide parts

Page 32: Day Of Dot Net Ann Arbor 2007

Slide Parts

The shape tree of this slide holds a shape (7-Point Star 1), a textbox (TextBox 2), and a chart (Chart 3):

<p:sld xmlns:p="…/presentationml/2006/main" xmlns:a="…/drawingml/2006/main" …>
  <p:cSld>
    <p:spTree>
      <p:sp>
        <p:nvSpPr>
          <p:cNvPr id="2" name="7-Point Star 1" />
          …
      <p:sp>
        <p:nvSpPr>
          <p:cNvPr id="3" name="TextBox 2" />
          …
      <p:graphicFrame>
        <p:nvGraphicFramePr>
          <p:cNvPr id="4" name="Chart 3" />
          …
    </p:spTree>
  </p:cSld>
  <p:clrMapOvr>
    <a:masterClrMapping />
  </p:clrMapOvr>
</p:sld>
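A slide part can be inspected the same way as any other XML part. The sketch below lists the objects in a shape tree using Python's standard library. The sample is a completed, well-formed version of the slide fragment shown in the talk, with the elided content simply left out, and list_slide_objects is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

P = "http://schemas.openxmlformats.org/presentationml/2006/main"
NS = {"p": P}

# Element names that represent drawable objects in a shape tree.
OBJECT_KINDS = {"sp", "grpSp", "cxnSp", "pic", "graphicFrame"}

SLIDE_XML = f"""<p:sld xmlns:p="{P}">
  <p:cSld>
    <p:spTree>
      <p:sp><p:nvSpPr><p:cNvPr id="2" name="7-Point Star 1"/></p:nvSpPr></p:sp>
      <p:sp><p:nvSpPr><p:cNvPr id="3" name="TextBox 2"/></p:nvSpPr></p:sp>
      <p:graphicFrame>
        <p:nvGraphicFramePr><p:cNvPr id="4" name="Chart 3"/></p:nvGraphicFramePr>
      </p:graphicFrame>
    </p:spTree>
  </p:cSld>
</p:sld>"""

def list_slide_objects(slide_xml):
    """Return (element kind, id, name) for each object in the shape tree."""
    tree = ET.fromstring(slide_xml).find("p:cSld/p:spTree", NS)
    objects = []
    for child in tree:
        kind = child.tag.split("}")[1]  # strip the namespace part of the tag
        if kind not in OBJECT_KINDS:
            continue
        props = child.find(".//p:cNvPr", NS)
        if props is not None:
            objects.append((kind, int(props.get("id")), props.get("name")))
    return objects

for item in list_slide_objects(SLIDE_XML):
    print(item)
```

The shape and the textbox both come back as sp elements, while the chart comes back as a graphicFrame.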

Page 33: Day Of Dot Net Ann Arbor 2007

Object Parts – DrawingML

Diagram: the slide's Shape, Textbox, and Chart objects; the Chart Part (chart1.xml); the Data source

Page 34: Day Of Dot Net Ann Arbor 2007

DrawingML

5 main types of objects
  Shape
  Group Shape
  Connector
  Picture
  Graphic Frame
    General-purpose container
    Used for Charts, Diagrams, Tables
Most widely used elements are Property elements
  Non-Visible Properties (nvPrs): union of common nvPrs and object specific nvPrs
  Visible Properties: object specific

Page 35: Day Of Dot Net Ann Arbor 2007

Shapes

Preset geometry
  Pick the preset shape
  Specify the adjust values for the shape
Text geometry
  Pick the preset text shape
  Specify the adjust values for the text shape
Custom geometry
  Not covered in this course

Page 36: Day Of Dot Net Ann Arbor 2007

Shape Line and Fill Properties

BLIP (Binary Large Image or Pictures) Fill; r:embed indicates the relationship id to the image data:

<a:blipFill>
  <a:blip r:embed="rId2" />
  <a:stretch>
    <a:fillRect />
  </a:stretch>
</a:blipFill>

Dashed Line with Solid Fill:

<a:ln>
  <a:solidFill>
    <a:srgbClr val="4F81BD" />
  </a:solidFill>
  <a:prstDash val="sysDash" />
</a:ln>

Gradient Fill; each <a:gs> is a gradient stop and its color:

<a:gradFill flip="none" rotWithShape="1">
  <a:gsLst>
    <a:gs pos="0">
      <a:srgbClr val="DDEBCF" />
    </a:gs>
    <a:gs pos="50000">
      <a:srgbClr val="9CB86E" />
    </a:gs>
    ...
  </a:gsLst>
  <a:lin ang="4200000" scaled="0" />
  <a:tileRect />
</a:gradFill>
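The numbers in the gradient markup are scaled integers: DrawingML stores stop positions in thousandths of a percent and angles in 60,000ths of a degree. The following sketch (Python standard library) decodes the two stops shown on the slide; the stops elided by "..." are not reconstructed, and describe_gradient is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A}

# Only the two gradient stops visible on the slide are included.
GRADIENT_XML = f"""<a:gradFill xmlns:a="{A}" flip="none" rotWithShape="1">
  <a:gsLst>
    <a:gs pos="0"><a:srgbClr val="DDEBCF"/></a:gs>
    <a:gs pos="50000"><a:srgbClr val="9CB86E"/></a:gs>
  </a:gsLst>
  <a:lin ang="4200000" scaled="0"/>
</a:gradFill>"""

def describe_gradient(gradient_xml):
    """Return ([(position in percent, RGB hex)], angle in degrees)."""
    root = ET.fromstring(gradient_xml)
    stops = []
    for stop in root.findall("a:gsLst/a:gs", NS):
        percent = int(stop.get("pos")) / 1000      # 50000 means 50%
        color = stop.find("a:srgbClr", NS).get("val")
        stops.append((percent, color))
    angle = int(root.find("a:lin", NS).get("ang")) / 60000  # 60,000 units per degree
    return stops, angle

stops, angle = describe_gradient(GRADIENT_XML)
print(stops, angle)
```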

Page 37: Day Of Dot Net Ann Arbor 2007

Pictures

<p:pic>
  <p:nvPicPr>
    <p:cNvPr id="4" name="lake.jpeg" />
    <p:cNvPicPr>
      <a:picLocks noChangeAspect="1" />
    </p:cNvPicPr>
    <p:nvPr />
  </p:nvPicPr>
  <p:blipFill>
    <a:blip r:embed="rId2" />
    <a:stretch>
      <a:fillRect />
    </a:stretch>
  </p:blipFill>
  <p:spPr>
    <a:xfrm>
      <a:off x="762000" y="571500" />
      <a:ext cx="7620000" cy="5715000" />
    </a:xfrm>
    <a:prstGeom prst="rect">
      <a:avLst />
    </a:prstGeom>
  </p:spPr>
</p:pic>

Define a Picture: <p:pic/>
Source image rel. id: <a:blip r:embed="rId2"/>
Acts similar to a shape: <p:spPr/>
Non-Visual picture properties convey picture specific save properties: <p:nvPicPr/>
Similar for Audio & Video
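The offsets and extents in a:xfrm are expressed in English Metric Units (EMU), with 914,400 EMU to the inch and 12,700 EMU to the point. A small conversion sketch in Python, using the values from the picture above (the function names are made up for this example):

```python
EMU_PER_INCH = 914400
EMU_PER_POINT = 12700

def emu_to_inches(emu):
    """Convert English Metric Units to inches."""
    return emu / EMU_PER_INCH

def emu_to_points(emu):
    """Convert English Metric Units to points (1/72 inch)."""
    return emu / EMU_PER_POINT

# Values taken from the <a:off> and <a:ext> elements of the picture.
offset = (762000, 571500)
extent = (7620000, 5715000)

print("offset (in):", [round(emu_to_inches(v), 3) for v in offset])
print("extent (in):", [round(emu_to_inches(v), 3) for v in extent])
print("extent (pt):", [emu_to_points(v) for v in extent])
```

For this picture the extent works out to roughly 8.33 by 6.25 inches, a 4:3 aspect ratio.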

Page 38: Day Of Dot Net Ann Arbor 2007

Pictures vs. Shapes

Shapes
  1. Single fill allowed
  2. Borders grow in/outward
  3. Must be done by app
  4. Can have text attached
  5. Can have shape properties
  6. Shape specific UI enabled

Pictures
  1. Two overlaid fills allowed
  2. Borders grow outward
  3. Lock aspect ratio flag
  4. Cannot have text attached
  5. Can have shape properties
  6. Picture specific UI enabled

Page 39: Day Of Dot Net Ann Arbor 2007

Graphic Objects

graphic element represents a single graphical object
graphicData element and uri attribute
  Specifies the namespace for the embedded content
  Tells the consumer how to interpret the graphicData
  Ability to render is application specific
  Office supports a set of specific URI values:
    http://schemas.openxmlformats.org/drawingml/2006/chart
    http://schemas.openxmlformats.org/drawingml/2006/diagrams

Graphic Object (the URI means a chart follows):

<graphic>
  <a:graphicData uri="http://schemas.../drawingml/2006/chart">
    <c:chart xmlns:c="http://schemas.../drawingml/2006/chart"
             xmlns:r="http://schemas.../officeDocument/2006/relationships"
             r:id="rd123232" />
  </a:graphicData>
</graphic>

Page 40: Day Of Dot Net Ann Arbor 2007

Charts

Graphic Object definition
  References separate XML chart part
  Defined in DrawingML namespace
Chart XML Part
  Visual representation of data.
  Includes a cache of data for chart.
  Includes formatting using DrawingML.
Data Relationship
  External relationship to file, or
  Internal relationship to embedded spreadsheet
  Spreadsheets point to their own data.
Chart Drawing
  Contains shapes and pictures drawn on chart

Diagram: Graphic Object, XML Chart Part, Data Source, Chart Drawing

Page 41: Day Of Dot Net Ann Arbor 2007

Build a Document in Code

Page 42: Day Of Dot Net Ann Arbor 2007

Resources

OpenXMLDeveloper.org
OpenXMLSDK
Package Explorer
Code Snippets

http://blogs.nusoftsolutions.com/DTruxall/

Page 43: Day Of Dot Net Ann Arbor 2007
Page 44: Day Of Dot Net Ann Arbor 2007

[email protected]@nusoftsolutions.com

http://blogs.nusoftsolutions.com/DTruxall/