Formatting JATS: as easy as 1-2-3

Tony Graham
Mentea
13 Kelly’s Bay Beach
Skerries, Co Dublin, Ireland
[email protected]
@MenteaXML
http://www.mentea.net

Version 1.0 – 2 April 2014
© 2014 Mentea
All rights reserved. The author grants the U.S. National Library of Medicine permission to archive and post a copy of this paper on the Journal Article Tag Suite Conference proceedings website.



Formatting JATS: as easy as 1-2-3

JATS Preview stylesheets

Aside: GitHub

XSLT 1.0: Government body

XSLT 2.0: PLOS ONE

XSLT 3.0: xslt3testbed

References

Appendix A – About


Formatting JATS: as easy as 1-2-3 1

• JATS Preview stylesheets

• XSLT 1.0

• XSLT 2.0

• XSLT 3.0

The 1-2-3 comes from using JATS with three versions of XSLT.

JATS Preview stylesheets 2

https://github.com/NCBITools/JATSPreviewStylesheets

• XSLT 1.0

• Public domain

• No copyright issues

• Developed for NCBI by Mulberry Technologies

JATS Preview with “selfie” 3

Paper for this talk formatted using the JATS Preview stylesheets, with a picture of the paper formatted using the JATS Preview stylesheets.

(‘selfie’ was added to the OED in 2013 so maybe it doesn’t need quotes)


Reconstructed timeline 4

[Figure: timeline from 2000 to 2015 marking XSLT 1.0, initial design, NLM Preview release, JATS Preview 1.0, XSLT 2.0, and XSLT 3.0 (?).]

Reconstructed from comments in code, downloads, and emails with Kim Tryka and Tommie Usdin.

Why still XSLT 1.0 in 2012? 5

• XSLT 1.0 still dominant on some platforms

• .NET

• Linux/Unix

• Also tested with XSLT 2.0

• NLM stylesheets developed circa 2006/2007

• One well-known XSLT 2.0 processor

• Java only

What does it do? 6

• Preprocessing

• Convert OASIS tables to HTML tables

• Massage citation format

• Some require XSLT 2.0

• Formatting

• XML to HTML

• XML to XSL-FO for formatting as PDF

• Post-processing

• HTML to XHTML for MathML

The only part that I’ve needed to use, and the only part being covered, is the transformation to XSL-FO and formatting to PDF.


Customizability 7

“These stylesheets are provided as a point of entry for JATS users who may not have the resources to create them from scratch. Because there are many varied implementations of JATS, you should have no expectation that these stylesheets will create production ready files in any arbitrary system. Instead, the stylesheets should be customized for your particular needs.”

“Because we view these stylesheets as a template for a customized solution, not the solution itself, we will accept changes that fix an actual bug, but we will not merge in changes that we view as “customization”. For example, we will accept changes that fix a problem which otherwise leads to failure in creating a final output file, but we will not accept changes that focus on presentational aspects of the final output (such as font changes, margin changes, graphics sizing, etc).”

Statement about customisation from the JATSPreviewStylesheets README, with added emphasis.

XSLT features supporting customizability 8

• Templates

• Modular stylesheets

• Named attribute sets

Templates 9

• match matches a context in source XML

• Content of xsl:template instantiated when template is applied

<xsl:template match="td">
  <fo:table-cell xsl:use-attribute-sets="td">
    <xsl:call-template name="process-table-cell"/>
  </fo:table-cell>
</xsl:template>

Elements in the body of the template that are not in the XSLT namespace are copied to the result, and elements and attributes in the XSLT namespace are acted on by the XSLT processor.


Modular stylesheets 10

<xsl:include href = uri-reference />

• href refers to other stylesheet

• Children of other xsl:stylesheet replace xsl:include

<xsl:import href = uri-reference />

• href refers to other stylesheet

• Imported definitions and template rules not part of importing stylesheet

• Have lower import precedence

Imports in JATS XSL-FO preview 11

[Diagram: jats-xslfo.xsl and xhtml-tables-fo.xsl, with a key distinguishing xsl:include from xsl:import.]

There are more interesting block diagrams later.

Overriding templates 12

• Template in importing stylesheet overrides same context in imported

• Good when overriding complete function of template

• Extra overhead if you just want to change one little thing
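As a minimal sketch of the first bullet, a customisation layer can import the core stylesheet and redefine one template. The relative path to jats-xslfo.xsl, the choice of disp-quote as the element to override, and the formatting properties are all assumptions made for illustration; none of them comes from the projects described in this talk.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- xsl:import must precede every other top-level element.
       The path is an assumption; point it at your own copy. -->
  <xsl:import href="jats-xslfo.xsl"/>

  <!-- Declared in the importing stylesheet, so this rule has higher
       import precedence than any rule for disp-quote in the import. -->
  <xsl:template match="disp-quote">
    <fo:block margin-left="2em" margin-right="2em" font-style="italic">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

When only one property needs to change, replacing the whole template still means restating its body, which is the overhead the last bullet refers to.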


Attribute sets 13

• Named set of attribute definitions

• Use in multiple places

• Definitions evaluated in each context where used

<xsl:attribute-set name="fig">
  <xsl:attribute name="keep-together.within-page">always</xsl:attribute>
  <xsl:attribute name="id">
    <xsl:value-of select="generate-id()"/>
  </xsl:attribute>
</xsl:attribute-set>

Since attribute definitions in attribute sets are evaluated each time the attribute set is used, the value of the id attribute will be unique to each context.
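A sketch of how such an attribute set might be applied follows. Only the fig attribute set is taken from the slide; the template and the fo:block wrapper are illustrative assumptions rather than code from the JATS Preview stylesheets.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:attribute-set name="fig">
    <xsl:attribute name="keep-together.within-page">always</xsl:attribute>
    <xsl:attribute name="id">
      <xsl:value-of select="generate-id()"/>
    </xsl:attribute>
  </xsl:attribute-set>

  <!-- Hypothetical template: generate-id() is evaluated with the
       matched fig as the context node, so each fo:block gets the
       id generated for its own fig element. -->
  <xsl:template match="fig">
    <fo:block xsl:use-attribute-sets="fig">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```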

JATS Preview supporting customizability 14

• Global variables

• Attribute sets

• Named templates


Example customization 15

• Add to attribute set from JATS stylesheets

<xsl:attribute-set name="td">
  <xsl:attribute name="line-stacking-strategy">max-height</xsl:attribute>
</xsl:attribute-set>

• New attribute set reusing merged td attribute set

<xsl:attribute-set name="td-small" use-attribute-sets="td">
  <xsl:attribute name="line-height">10pt</xsl:attribute>
  <xsl:attribute name="border">none</xsl:attribute>
  <xsl:attribute name="padding-top">0pt</xsl:attribute>
  <xsl:attribute name="padding-bottom">0pt</xsl:attribute>
</xsl:attribute-set>

• Override JATS stylesheet in more-specific context

<xsl:template match="td[ancestor::table[@style = 'small']]">
  <fo:table-cell xsl:use-attribute-sets="td-small">
    <xsl:call-template name="process-table-cell"/>
  </fo:table-cell>
</xsl:template>

The xsl:attribute-set extends the ‘td’ defined in the JATS Preview stylesheet.

The new ‘td-small’ attribute set includes the attribute definitions from all declarations for the ‘td’ attribute set plus the definitions contained in its own definition.

The template matches on a more-specific context than the general-purpose template for td in the JATS Preview stylesheets, so in those particular contexts the XSLT processor uses this template. It adds a different set of attributes to the generated fo:table-cell but still uses the ‘process-table-cell’ named template from the JATS Preview stylesheets, just as the original template for td does.

This illustrates in a nutshell how a customisation is able to extend, override, and reuse the constructs in the core JATS Preview stylesheets.

Summary: JATS Preview 16

• XSLT 1.0

• Not accepting customisations into core

• Stylesheet structure facilitates customisations


Aside: GitHub 17

• “World’s largest open source community”

• Git distributed version control system

• Easy to “fork” – make your own version of projects

• Easy to “pull” merge requests from other projects

XSLT 1.0: Government body 18

The paper for this talk formatted using XSLT 1.0 stylesheets


Project details 19

• Source: variation on JATS Blue with custom metadata

• Result: similar page design to JATS preview stylesheets

• XSLT 1.0 because...

• Client preference

• Body and back content unchanged from JATS

• Page design similar to JATS preview

• Customisation...

• Changes in new modules

• Import JATS Preview stylesheets

Import structure 20


MathML fix-up modules 21

• Separate modules that can be dropped when problems solved

• project-mathml.xsl – add parentheses around display equation number

• format-mathml.xsl – workaround too-high accented characters

[Figure: sample equation before and after the fix-up.]

(Latest formatter has rewritten MathML support)


What’s in project-xslfo.xsl? 22

Summary: XSLT 1.0 23

• Customisation on top of JATS Preview stylesheets

• Preview stylesheets provided sufficient hooks


XSLT 2.0: PLOS ONE 24

Sample PLOS ONE pages.

Project details 25

• Peer-reviewed, open-access, online publication

• Public Library of Science

• JATS/NLM markup

• Lights-out batch formatting with XSL-FO

• Previously produced using 3B2 and (presumably) manual fix-up

• XSLT 2.0 because...

• Big differences in metadata, figure, table handling

• Needed vendor extensions

• Customisation...

• Modified version of jats-xslfo.xsl

• Additional XSLT modules


PONE “features” 26

• Figures and tables float to top (or bottom) of page

• Figures column-wide or page-wide

• No size information in XML

• Figure graphic+caption can’t overflow page

• Tables column-wide, page-wide, or page-high

• Page-high may be single column

• May be multiple pages

• No width indication in XML

• No row spanning (thank goodness!)

• No figures or tables allowed after start of back matter

XSLT/XSL-FO “features” 27

• Page-wide floats

• Vendor extension for column-wide

• Floats don’t break

• Floats only at top of page

• Bottom-float extension available but unused

• Graphic size not available to XSLT

• Fire-and-forget processing

Table handling 28

• “Pre-format” tables in three widths on long pages

• Column-wide, page-wide, (width of) page-high

• Prefix table IDs with string indicating width

• Format to area tree XML

• Compare area trees for each table

• Use width with least area and no overflow

• Recreate as multiple fo:float if overflows page

• Re-use table column widths from area tree to remain consistent
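The "least area and no overflow" choice can be sketched in XSLT 2.0. Area tree XML is vendor-specific, so this sketch assumes the measurements have already been extracted into a hypothetical summary document; the sizes, size, choices, and choice element names, their attributes, and the width-prefix convention in the IDs are all invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input (hypothetical, not real area tree XML):
       <sizes>
         <size id="column-T1" width="240" height="300" overflow="true"/>
         <size id="page-T1"   width="500" height="120" overflow="false"/>
         <size id="high-T1"   width="650" height="110" overflow="false"/>
       </sizes>
       The ID prefix before the first hyphen names the trial width. -->
  <xsl:template match="/sizes">
    <choices>
      <!-- One group per table: the part of the ID after the prefix. -->
      <xsl:for-each-group select="size"
                          group-by="substring-after(@id, '-')">
        <!-- Keep only trials that did not overflow, smallest area first. -->
        <xsl:variable name="candidates" as="element(size)*">
          <xsl:perform-sort select="current-group()[@overflow = 'false']">
            <xsl:sort select="xs:double(@width) * xs:double(@height)"/>
          </xsl:perform-sort>
        </xsl:variable>
        <!-- width is empty when every trial overflowed. -->
        <choice table="{current-grouping-key()}"
                width="{substring-before($candidates[1]/@id, '-')}"/>
      </xsl:for-each-group>
    </choices>
  </xsl:template>

</xsl:stylesheet>
```

A table whose choice has an empty width is one that overflowed at every trial width, which corresponds to the case the slide handles by recreating the table as multiple fo:float.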


Picking “Best” Tables 29

Three tables formatted in each of three widths, with preferred versions highlighted.

Sized and placed tables 30

Column-wide and page-wide tables placed on pages.


Usual processing model 31

[Diagram: XML source, the plos-xslfo XSLT stylesheet, and the resulting XSL-FO.]

The conventional XSLT–XSL-FO processing model.

Table-handling processing model 32

[Diagram: XML source, the table-chooser and sizer XSLT stylesheets, and the XSL-FO produced along the way.]

The processing model including preprocessing tables to generate an area tree from which to determine the preferred width for each table.

Graphics handling 33

• Get TIFF graphics

• ImageMagick identify gives graphic size and resolution

• “Pre-format” caption at both widths to get exact size

• Choose best width

• (Possibly) scale down graphic so caption also fits on page
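One way the graphic size could reach the stylesheet is through a small text file read with unparsed-text(). The sketch below assumes a file holding the pixel width and height separated by a space, written beforehand by something like `identify -format "%w %h" fig1.tif > fig1.size`. The m: namespace, both function names, the file convention, and the use of a single common unit for all heights are hypothetical, not the project's actual code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:m="http://example.org/ns/sketch"
    exclude-result-prefixes="xs m">

  <!-- Hypothetical helper: returns (width, height) as integers,
       read from a text file assumed to contain e.g. "413 600". -->
  <xsl:function name="m:graphic-size" as="xs:integer*">
    <xsl:param name="size-uri" as="xs:string"/>
    <xsl:sequence select="
        for $t in tokenize(normalize-space(unparsed-text($size-uri)), ' ')
        return xs:integer($t)"/>
  </xsl:function>

  <!-- Hypothetical helper: scale factor, never more than 1, that lets
       the graphic plus its caption fit in the available height.
       All three heights are assumed to be in the same unit. -->
  <xsl:function name="m:scale" as="xs:double">
    <xsl:param name="graphic-height" as="xs:double"/>
    <xsl:param name="caption-height" as="xs:double"/>
    <xsl:param name="page-height" as="xs:double"/>
    <xsl:sequence select="
        min((1e0, ($page-height - $caption-height) div $graphic-height))"/>
  </xsl:function>

</xsl:stylesheet>
```

The caption height would come from the "pre-format" pass described above; here it is simply a parameter.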


Figure-handling processing model 34

[Diagram: the table-handling model extended with graphic dimensions (for example 413 × 600) read into the stylesheets with unparsed-text().]

Processing model when graphics handling added.

Floats after back matter 35

Figures and tables are required to not appear after the start of the back matter.


Splitting at back matter 36

• Format “final” FO with right-width tables and figures to area tree

• Compare positions of first “back” content and last float

• back plus bits from front, body

• Generate new FO with either one or two fo:page-sequence

• If there is a second fo:page-sequence, it contains only back matter, so floats in the first appear before the back matter

Putting It All Together 37

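[Diagram: XML source, the table-chooser, sizer, and splitter XSLT stylesheets, the XSL-FO produced along the way, and graphic dimensions (for example 413 × 600) read with unparsed-text().]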

The full processing model.

Import structure 38

All the top-level stylesheets use plos-xslfo.xsl for basic formatting.

splitter.xsl does everything size-chooser.xsl does, and more, so it imports that file rather than importing plos-xslfo.xsl directly.


Summary: XSLT 2.0 39

• It shouldn’t be this hard

• Column-wide floats require vendor extension

• Navigating area tree isn’t easy

• No standard for area tree XML made it harder and even less portable

• Creating new FO and reprocessing easier than rewriting area tree

• EXPath Binary Module (and a TIFF-handling library!) could avoid using ImageMagick

• Or use vendor extension

XSLT 3.0: xslt3testbed 40

https://github.com/MenteaXML/xslt3testbed

• Trying out new XSLT 3.0 features

• Converting existing JATS stylesheets to XSLT 3.0

Why? 41

“...the design process does not include enough feedback; by the time people start reporting their usability experiences, the decisions are difficult to change.”

• Early start on patterns and idioms to help adoption

• Find infelicities in spec (and implementations)

• The time is right

• Project started November 2013

• XSLT 3.0 Last Call WD – 12 December 2013

Quote from Michael Kay, editor of the XSLT 3.0 spec: http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201403/msg00332.html

Motivation comes from looking for a better way to get people using the new version:

• 1997: Wanted to discuss DSSSL so started DSSSList

• 1998: XSL-List started – people tried every new XSL feature as it came out

• 2004–2007++: People had working XSLT 1.0 systems and there weren’t many XSLT 2.0 processors, so adoption was slow

• 2013–2014: Looking for a quicker win than mailing lists, and people are now used to working with GitHub projects

W3C Process 42

• End game for a W3C spec:

• Last Call

• Candidate Recommendation

• Proposed Recommendation

• Recommendation

• Changes after “Last Call” require more documentation and substantiation


Why JATS? 43

• Simpler than, e.g., DocBook or TEI

• Not a toy

• Potentially useful to authors and archives

• Existing XSLT stylesheets available

Why JATSPreviewStylesheets? 44

https://github.com/NCBITools/JATSPreviewStylesheets

• XSLT 1.0

• Easy for new contributors to add XSLT 2.0-isms

• Public domain

• No copyright issues

• XSLT 3.0 stylesheets also public domain

• Explicitly not supporting gazillion customisation parameters, PIs, etc.

• Simpler processing

• Fewer user expectations

xslt3testbed goals 45

• Trial different techniques

• Open for dipping into to try random ideas

• Develop patterns and idioms

• Develop XSLT 3.0 package for XHTML tables

• xsl:package new in XSLT 3.0

• XHTML tables used in many document types

xslt3testbed non-goals 46

• Single best way of doing anything

• Multiple ways to solve the same problem are okay

• Definitive XSLT 3.0 testbed

• It’s easy to fork and make your own version

• Complete stylesheet for all of JATS

• Existing stylesheets don’t cover everything yet either

Results so far 47

• Trying out maps, anonymous functions, and xsl:iterate

• Small advances in multiple areas

• Both XSL-FO and XHTML stylesheets

• More details in XML Prague 2014 talk: http://www.mentea.net/resources/xslt30testbed-slides.pdf
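For readers who have not met the three features named in the first bullet, here is a small sketch that uses each once. It is not taken from xslt3testbed: the element mapping, variable names, and output markup are assumptions, and the syntax follows the final XSLT 3.0 Recommendation, which may differ in places from the drafts current when this talk was given.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Map: JATS inline element name to an (illustrative) HTML name. -->
  <xsl:variable name="inline-map" as="map(xs:string, xs:string)"
      select="map { 'bold': 'b', 'italic': 'i', 'monospace': 'code' }"/>

  <!-- Anonymous function bound to a variable: looks the name up in the
       map and falls back to 'span' when there is no entry. -->
  <xsl:variable name="html-name" as="function(xs:string) as xs:string"
      select="function($n as xs:string) as xs:string {
                ($inline-map($n), 'span')[1] }"/>

  <xsl:template match="bold | italic | monospace">
    <xsl:element name="{$html-name(local-name())}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- xsl:iterate: process paragraphs in order while carrying a
       running word count from one iteration to the next. -->
  <xsl:template match="body">
    <div>
      <xsl:iterate select="p">
        <xsl:param name="words" as="xs:integer" select="0"/>
        <xsl:on-completion>
          <p class="total">Words: <xsl:value-of select="$words"/></p>
        </xsl:on-completion>
        <p><xsl:apply-templates/></p>
        <xsl:next-iteration>
          <xsl:with-param name="words"
              select="$words + count(tokenize(normalize-space(.), ' '))"/>
        </xsl:next-iteration>
      </xsl:iterate>
    </div>
  </xsl:template>

</xsl:stylesheet>
```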


6 W3C Bugzilla bu^H^Htickets so far 48

5 JATSPreviewStylesheets patches so far 49

Other results 50

• One XSLT processor bug

• One change to Wendell Piez’s JATS Oxygen plug-in

• Technique for hosting Oxygen plugins on GitHub


Summary: XSLT 3.0 51

https://github.com/MenteaXML/xslt3testbed

• The time is right

• Useful in multiple arenas

• Results summarised on project wiki and http://inasmuch.as/

• Well suited for trying things out

• Go fork and multiply

Conclusion 52

• JATS Preview stylesheets:

• Explicitly don’t support customisation

• Good basis for your own customization

• Customise by:

• Layer on top of existing stylesheets

• Modify your copy of the stylesheets

• Usable with XSLT 1.0, 2.0, or 3.0

References 53

• slide 41 – Michael Kay: http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201403/msg00332.html

• slide 42 – W3C Process Document: http://www.w3.org/2005/10/Process-20051014/tr.html

• slide 48 – Bugs so far: https://www.w3.org/Bugs/Public/buglist.cgi?email1=tgraham%40mentea.net&emailreporter1=1&emailtype1=substring&product=XPath%20%2F%20XQuery%20%2F%20XSLT&query_format=advanced


Appendix A – About

Tony Graham

Tony Graham has been working with markup since 1991, with XML since 1996, and with XSLT/XSL-FO since 1998. He is Chair of the Print and Page Layout Community Group at the W3C and previously an invited expert on the W3C XML Print and Page Layout Working Group (XPPL) defining the XSL-FO specification, as well as an acknowledged expert in XSLT, developer of the open source xmlroff XSL formatter, a committer to both the XSpec and Juxy XSLT testing frameworks, the author of “Unicode: A Primer”, a member of the XML Guild, and a qualified trainer.

Tony’s career in XML and SGML spans Japan, USA, UK, and Ireland, working with data in English, Chinese, Japanese, and Korean, and with academic, automotive, publishing, software, and telecommunications applications. He has also spoken about XML, XSLT, XSL-FO, EPUB, and related technologies to clients and conferences in North America, Europe, and Australia.

Mentea

Mentea specialises in consulting and training in XML, XSL-FO, & XSLT. We are available for on-site meetings and classes worldwide, and we also routinely keep in touch with clients through email, Skype, instant messaging, and telephone, and through a secure, per-client or per-project wiki, revision-control, and issue-tracking system.

Our staff have been working with markup since 1991, with XML since 1996, and with XSLT/XSL-FO since 1998. Based in Dublin, Ireland, Mentea has a global reach: in recent projects, we have helped companies and organisations in the USA, Ireland, England, and France with their XSLT, XSL, and XML, including:

• Writing Schematron for a professional body

• Augmenting an XSLT-based automated schema documentation system that produces both HTML and PDF

• Extending FOP for a software company

• Training in XML, oXygen, DocBook, XSLT 2.0, and XSL-FO

• Formatting JATS to PDF for a scientific journal

• Writing XSLT stylesheets to convert non-XML into XML then into EPUB

• Writing XSLT to convert Excel into XML for a commercial bank

Mentea presents a unique range of skills extending beyond XML and XSL-FO/XSLT into Unicode, SGML, DSSSL, and programming in C, Java, Perl, Lisp, and other languages.

We understand how markup works. Our staff has worked with markup in Japan, USA, UK, and Ireland as user, consultant, and developer, with data in English, French, Chinese, Japanese, and Korean, with academic, automotive, publishing, software, and telecommunications applications, and in the Web Services and document processing arenas.

We are also interested in applying the tools for ensuring software quality – unit testing, code coverage, profiling, and other tools – to XML and XSLT/XSL-FO processing.

Through our associations and affiliations with other consultants around the world, we can call on extra help for large or specialised projects.
