Transforming TEI with OxGarage
tei.oucs.ox.ac.uk/talks/2014-11-warsaw/talk-3-01-oxgarage.pdf
Post on 19-Apr-2020
How hard is it to convert a Word file to other formats?
It is relatively easy to
Save to HTML from Word (it's not as bad as it used to be)
Save to HTML and then tidy up (many utilities, e.g. tidy)
Read Word into OpenOffice and use its slightly better export
but we can also access Word 2007 (and later) files more directly and more flexibly.
Semantically, this is:
<div>
  <head>Cats</head>
  <p>Cats are nice. Really quite <hi rend="bold">nice</hi>.</p>
</div>
<div>
  <head>Dogs</head>
  <p>Dogs are horrid, because</p>
  <list type="ordered">
    <item>They jump around</item>
    <item>They bark</item>
    <item>They steal the sofa</item>
  </list>
</div>
What are the files for?
[Content_Types].xml             MIME types of files
_rels/.rels                     links between names and objects
word/_rels/document.xml.rels    links between names and support files
word/document.xml               document body
word/media/image1.jpeg          picture
word/theme/theme1.xml
docProps/thumbnail.jpeg         document thumbnail
word/settings.xml               settings
word/webSettings.xml            settings for HTML export
word/styles.xml                 style definitions
word/numbering.xml              numbering schemes
docProps/core.xml               document properties
word/fontTable.xml              font details
docProps/app.xml                application details
Most of these are XML files.
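
A DOCX file is a ZIP package, so the parts listed above can be inspected with any ZIP library. The following is a minimal sketch using Python's standard `zipfile` module; the function names and the tiny demo package are hypothetical and stand in for a real Word file.

```python
import io
import zipfile


def list_docx_parts(source):
    """Return (part name, is_xml) pairs for every file inside a DOCX package.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return [
            # .rels files are XML too, despite the extension.
            (name, name.endswith((".xml", ".rels")))
            for name in package.namelist()
        ]


def build_demo_package():
    """Build a tiny in-memory stand-in for a DOCX (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/media/image1.jpeg", b"\xff\xd8\xff")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, is_xml in list_docx_parts(build_demo_package()):
        print(f"{name:30} {'XML' if is_xml else 'binary'}")
```

With a real document, `list_docx_parts("report.docx")` would be called on the file path instead of the demo buffer.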
A list item in DOCX
<p rsidR="00272A5B" rsidRDefault="00272A5B" rsidP="00272A5B">
  <r><t>Dogs are horrid, because</t></r>
</p>
<p rsidR="00272A5B" rsidRDefault="00272A5B" rsidP="00272A5B">
  <pPr>
    <pStyle val="ListParagraph"/>
    <numPr><ilvl val="0"/><numId val="1"/></numPr>
  </pPr>
  <r><t>They jump around</t></r>
</p>
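
The fragment shows why conversion needs some work: a list item is just a paragraph whose properties carry a `numPr` element. As a rough sketch of how a converter might tell the two apart, the Python below classifies paragraphs in the simplified, prefix-free markup used on the slide. The `<body>` wrapper and the function name are assumptions made for illustration; a real `word/document.xml` puts these elements in the WordprocessingML namespace (usually prefixed `w:`), so the lookups would need namespace-qualified names.

```python
import xml.etree.ElementTree as ET

# The slide's fragment (rsid attributes dropped for brevity), wrapped in a
# hypothetical <body> root so that it parses as a single document.
FRAGMENT = """<body>
<p><r><t>Dogs are horrid, because</t></r></p>
<p>
  <pPr>
    <pStyle val="ListParagraph"/>
    <numPr><ilvl val="0"/><numId val="1"/></numPr>
  </pPr>
  <r><t>They jump around</t></r>
</p>
</body>"""


def classify_paragraphs(xml_text):
    """Return one (kind, text) pair per <p>; kind is 'item' or 'p'."""
    result = []
    for para in ET.fromstring(xml_text).iter("p"):
        # Concatenate the text of every run in the paragraph.
        text = "".join(t.text or "" for t in para.iter("t"))
        # A paragraph with numbering properties is treated as a list item.
        is_list_item = para.find("pPr/numPr") is not None
        result.append(("item" if is_list_item else "p", text))
    return result


if __name__ == "__main__":
    for kind, text in classify_paragraphs(FRAGMENT):
        print(f"<{kind}>{text}</{kind}>")
```

Grouping consecutive items into a single `<list>` is a further step that this sketch does not attempt.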
OxGarage to the rescue!
OxGarage is a web app (http://oxgarage.oucs.ox.ac.uk:8080/ege-webclient) which provides document transformations, featuring
Web and REST interface
Chained XSLT conversions
Uses headless OpenOffice for binary conversions
Uses TEI XML as pivot format
Supports Stylesheets “profiles” for variations
Open source across the board
History and dependencies
Built at the Poznań Supercomputing and Networking Center for ENRICH, an EU-funded eContent+ project.
It was called the EGE ('ENRICH Garage Engine') and designed as a conversion pipeline for manuscript descriptions, using conversions and libraries from the University of Oxford
Now much further developed and maintained as a fork by the University of Oxford
Java servlet, running under Tomcat in current instances
Almost all work done as XSLT transforms using Saxon processor
Uses headless OpenOffice to read/write .doc, .xls, .ppt files etc.
http://www.github.com/sebastianrahtz/oxgarage
OxGarage in Oxford
OxGarage is used:
As a data/text cleanup/normalization tool by humanities researchers (e.g. converting doc to TEI XML, TEI to Excel, Wordpress blog to LaTeX, TEI to Word)
As an enabling technology for the IT Services course booking system (creating Word files for download)
As a component of teaching in the Digital Humanities Summer School and other IT Learning Programme courses where the Text Encoding Initiative is covered (teaching students how to make different outputs)
As an enabling technology for schema creation by TEI users worldwide, underlying the Roma application (http://www.tei-c.org/Roma/)
OxGarage is currently unofficial in its support and maintenance; we are applying for an internal project to transition it to being a proper service.
OxGarage: constructing a path
http://oxgarage.oucs.ox.ac.uk:8080/ege-webservice/Conversions/format/format/?properties
Each ‘format’ is a name followed by a MIME type, with the separators percent-encoded (%3A is ‘:’). For example:
format      code
ePub        application%3Aepub+zip
XSL FO      application%3Axslfo+xml
LaTeX       application%3Ax-latex
TEI LITE    text%3Axml
ODD HTML    application%3Axhtml+xml
ODD Json    application%3Ajson
ODT         application%3Avnd.oasis.opendocument.text
RDF         application%3Ardf+xml
RELAX NG    application%3Axml-relaxng
TEI ODD     ODD%3Atext%3Axml/
TEI P5      TEI%3Atext%3Axml/
Word        docx%3Aapplication%3Avnd.openxmlformats-officedocument.wordprocessingml.document/
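
Typing these percent-encoded paths by hand is error-prone, so a small helper can assemble them. The sketch below rests on an assumption inferred from the codes above and from the web service examples: each step is written `name:type:subtype`, and every `:` is then percent-encoded. The function names are hypothetical and not part of OxGarage.

```python
from urllib.parse import quote

# Base URL of the conversion service, as shown on the slide.
BASE = "http://oxgarage.oucs.ox.ac.uk:8080/ege-webservice/Conversions/"


def format_code(name, mime_type):
    """Encode one format as name:type:subtype with reserved characters escaped."""
    # Assumption: the '/' of the MIME type is written as ':' before encoding.
    return quote(f"{name}:{mime_type.replace('/', ':')}", safe="")


def conversion_url(steps):
    """Join encoded (name, mime_type) steps into a conversion path."""
    return BASE + "".join(format_code(n, m) + "/" for n, m in steps)


if __name__ == "__main__":
    docx = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    print(conversion_url([
        ("ODD", "text/xml"),
        ("ODDC", "text/xml"),
        ("TEI", "text/xml"),
        ("docx", docx),
    ]))
```

Note that `quote` also escapes `+` as `%2B`, which matches the form used in web service example (2) rather than the bare `+` shown in some of the codes above.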
OxGarage web service example (1)
Process ODD to compiled ODD, then to TEI Lite, then to DOCX
curl -s -F upload=@test.odd -o test.docx http://oxgarage.oucs.ox.ac.uk:8080/ege-webservice/Conversions/ODD%3Atext%3Axml/ODDC%3Atext%3Axml/TEI%3Atext%3Axml/docx%3Aapplication%3Avnd.openxmlformats-officedocument.wordprocessingml.document/
OxGarage web service example (2)
ODD to HTML, in French
curl -s -F upload=@test.odd -o test.html "http://oxgarage.oucs.ox.ac.uk:8080/ege-webservice/Conversions/ODD%3Atext%3Axml/ODDC%3Atext%3Axml/oddhtml%3Aapplication%3Axhtml%2Bxml/?properties=<conversions><conversion%20index='1'><property%20id='oxgarage.lang'>fr</property></conversion></conversions>"
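
The `properties` parameter is a small XML document embedded in the query string. A possible way to generate it is sketched below. The helper name is hypothetical, and the sketch percent-encodes every reserved character, not only the spaces as in the command above, on the assumption that the service decodes standard percent-encoding.

```python
from urllib.parse import quote, unquote
from xml.sax.saxutils import escape


def properties_query(index, settings):
    """Build the ?properties=... suffix for one conversion step.

    `index` is the step the properties apply to (1 in the example above);
    `settings` maps property ids such as 'oxgarage.lang' to values.
    """
    props = "".join(
        "<property id='{}'>{}</property>".format(key, escape(value))
        for key, value in settings.items()
    )
    xml = "<conversions><conversion index='{}'>{}</conversion></conversions>".format(
        index, props
    )
    # Percent-encode everything so the URL is safe to paste into a shell.
    return "?properties=" + quote(xml, safe="")


if __name__ == "__main__":
    query = properties_query(1, {"oxgarage.lang": "fr"})
    print(query)
    print(unquote(query))  # decodes back to the readable XML form
```

Appending the returned string to a conversion path gives a URL equivalent in intent to the one in the curl command.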