PDF/A-3 for preservation: Notes on embedded files and JPEG 2000
DESCRIPTION
Johan van der Knijff of the National Library of the Netherlands presented his views on ‘PDF/A-3 for preservation’, based on notes on embedded files and JPEG 2000. The presentation was given at a DPC briefing (http://bit.ly/1b487mD) that introduced and reviewed recent developments in the PDF/A standard, with particular emphasis on PDF/A version 3, published in October 2012. The meeting took place in Leeds on 13 March 2013.
TRANSCRIPT
![Page 1: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/1.jpg)
SCAPE
Johan van der Knijff
Koninklijke Bibliotheek – National Library of the Netherlands
DPC, PDF/A-3 Briefing, Leeds, 13.3.2013
PDF/A-3 for preservation: Notes on embedded files and JPEG 2000
![Page 2: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/2.jpg)
Part 1: Embedded files
PDF/A-3: embedding of any file (type)
![Page 3: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/3.jpg)
![Page 4: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/4.jpg)
Key point:
“Embedded files” really means “embedded file streams”: a specific data structure in PDF!
![Page 5: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/5.jpg)
File specification dictionary
31 0 obj
<</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >>
endobj
![Page 6: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/6.jpg)
File specification dictionary
The /EF key points to the embedded file stream (object 32 0 in the example)
![Page 7: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/7.jpg)
Embedded file stream
32 0 obj
<</Type /EmbeddedFile /Subtype /image#2Fsvg+xml /Length 72>>
stream
…SVG Data…
endstream
endobj
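The two objects on these slides can be generated mechanically, which makes the relationship between them easier to see. The following is a minimal sketch in Python, not a full PDF writer: the helper names `pdf_name` and `embedded_file_objects` are hypothetical, the object numbers default to the ones on the slides, and the file name is assumed to be plain ASCII without parentheses or backslashes. It also shows why the MIME type appears as `/image#2Fsvg+xml`: a slash inside a PDF name has to be written as `#2F`.

```python
def pdf_name(text: str) -> str:
    """Write text as a PDF name, escaping bytes that may not appear literally."""
    # Delimiter characters and '#' must be written as #XX inside a name.
    special = b"()<>[]{}/%#"
    out = ["/"]
    for byte in text.encode("utf-8"):
        if byte < 0x21 or byte > 0x7E or byte in special:
            out.append(f"#{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)


def embedded_file_objects(filename: str, mime_type: str, data: bytes,
                          spec_num: int = 31, stream_num: int = 32) -> bytes:
    """Serialize a file specification dictionary and its embedded file stream.

    Assumption: filename is plain ASCII without parentheses or backslashes,
    so it can be written as a literal string without further escaping.
    """
    # The file specification dictionary: its /EF entry references the stream.
    filespec = (
        f"{spec_num} 0 obj\n"
        f"<</Type /Filespec /F ({filename}) /EF <</F {stream_num} 0 R>> >>\n"
        "endobj\n"
    ).encode("ascii")
    # The embedded file stream: /Subtype carries the MIME type as a name.
    stream_head = (
        f"{stream_num} 0 obj\n"
        f"<</Type /EmbeddedFile /Subtype {pdf_name(mime_type)} "
        f"/Length {len(data)}>>\n"
        "stream\n"
    ).encode("ascii")
    return filespec + stream_head + data + b"\nendstream\nendobj\n"


if __name__ == "__main__":
    svg = b"<svg xmlns='http://www.w3.org/2000/svg'/>"
    print(embedded_file_objects("mysvg.svg", "image/svg+xml", svg).decode("ascii"))
```

The output is only the pair of objects; a valid PDF would also need a catalog, a cross-reference table and a reference to the file specification from an annotation or the name dictionary.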
![Page 8: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/8.jpg)
Uses of embedded file streams
![Page 9: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/9.jpg)
![Page 10: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/10.jpg)
File attachments not meant to be rendered by viewer
![Page 11: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/11.jpg)
File attachment annotation
EmbeddedFiles entry in name dictionary
PDF/A-3
![Page 12: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/12.jpg)
![Page 13: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/13.jpg)
Rendered in/by PDF viewer
![Page 14: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/14.jpg)
Rendition actions
Screen annotations
PDF/A-3
![Page 15: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/15.jpg)
What about inline images?
![Page 16: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/16.jpg)
Not based on the “embedded file stream” structure, but on the “Image XObject” data structure (which allows only a limited set of pre-defined formats)
![Page 17: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/17.jpg)
Embedded files wrap-up:
No impact on content that is meant to be rendered by the PDF viewer
But a PDF/A-3 file may contain files of any possible format as attachments
![Page 18: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/18.jpg)
Part 2: JPEG 2000
Supported since PDF/A-2
![Page 19: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/19.jpg)
![Page 20: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/20.jpg)
Image XObject
1614 0 obj
<</Subtype/Image/Width 615/Height 978/ColorSpace/DeviceRGB /BitsPerComponent 8/Interpolate true/Length 5278 /Filter/JPXDecode>>
stream
… Image data …
endstream
endobj
![Page 21: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/21.jpg)
Image XObject
The /Filter /JPXDecode entry identifies the object as a JPEG 2000 image
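Because the filter entry is what marks an Image XObject as JPEG 2000, a rough way to find such images is to look for `/JPXDecode` in the dictionaries of image objects. The sketch below is a deliberately naive illustration in Python: the function name `find_jpx_images` is hypothetical, it only sees objects stored uncompressed in the file body (not those inside object streams), and it would be confused by stream data that happens to contain the keyword `endobj`.

```python
import re

# "N G obj ... endobj", non-greedy so each match covers a single object.
_OBJECT_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
_IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
# The filter may be given directly or as the first entry of an array.
_JPX_RE = re.compile(rb"/Filter\s*(\[\s*)?/JPXDecode\b")


def find_jpx_images(pdf_bytes: bytes) -> list[int]:
    """Return object numbers of Image XObjects that use the JPXDecode filter."""
    found = []
    for match in _OBJECT_RE.finditer(pdf_bytes):
        body = match.group(3)
        if b"stream" not in body:
            continue  # not a stream object, so it cannot be an image
        # Only inspect the dictionary, i.e. everything before the stream data.
        dictionary = body.split(b"stream", 1)[0]
        if _IMAGE_RE.search(dictionary) and _JPX_RE.search(dictionary):
            found.append(int(match.group(1)))
    return found


if __name__ == "__main__":
    sample = (
        b"31 0 obj <</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >> endobj\n"
        b"1614 0 obj <</Subtype/Image/Width 615/Height 978"
        b"/ColorSpace/DeviceRGB /BitsPerComponent 8/Length 4"
        b" /Filter/JPXDecode>> stream\nDATA\nendstream endobj\n"
    )
    print(find_jpx_images(sample))
```

For real-world files a proper PDF parser is the safer choice; the point here is only that the image format is declared by the filter entry, not by a file extension or MIME type.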
![Page 22: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/22.jpg)
ISO 19005-2 (PDF/A-2):
JPEG 2000 support based on subset of JPEG 2000 Part 2 (JPX baseline)
Only Part 1 of the standard (JP2) is commonly used for archival applications!
![Page 23: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/23.jpg)
JP2 vs JPX
JP2: JPEG 2000 Part 1, the basic still image format
JPX: JPEG 2000 Part 2 = JP2 + assorted advanced stuff …
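In practice the two formats can be told apart from the first bytes of the image data: both start with the same 12-byte signature box, followed by a file type box whose brand field is `jp2 ` for JP2 and `jpx ` for JPX. The sketch below illustrates that check in Python. The function name `classify_jpeg2000` is hypothetical, and the check only looks at the declared brand; it does not verify that the rest of the file actually conforms to either part of the standard.

```python
# The 12-byte JPEG 2000 signature box shared by JP2 and JPX files.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def classify_jpeg2000(data: bytes) -> str:
    """Classify image data as 'JP2', 'JPX', 'raw codestream' or 'unknown'."""
    # A bare codestream (no file wrapper) starts with the SOC marker FF4F.
    if data[:2] == b"\xff\x4f":
        return "raw codestream"
    if data[:12] != JP2_SIGNATURE:
        return "unknown"
    # The file type box must follow directly: 4-byte length, then 'ftyp'.
    if len(data) < 24 or data[16:20] != b"ftyp":
        return "unknown"
    brand = data[20:24]
    return {b"jp2 ": "JP2", b"jpx ": "JPX"}.get(brand, "unknown")


if __name__ == "__main__":
    ftyp_jpx = bytes.fromhex("00000014") + b"ftyp" + b"jpx " + bytes(4) + b"jp2 "
    print(classify_jpeg2000(JP2_SIGNATURE + ftyp_jpx))
```

Applied to the stream data of a JPXDecode image extracted from a PDF, a check like this gives a first indication of whether the producer wrote plain JP2 or made use of the JPX wrapper.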
![Page 24: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/24.jpg)
Fragmented codestreams
Allowed in JPX Baseline!
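A fragmented codestream is described in JPX by a fragment table box (type `ftbl`) instead of a single contiguous codestream box, so its presence can be detected by walking the top-level boxes of the file. The following Python sketch does that under simplifying assumptions: the function names are hypothetical, only top-level boxes are inspected, and a hit means the feature is declared, not that the fragments have been validated.

```python
import struct


def top_level_boxes(data: bytes):
    """Yield (box_type, payload_start, box_end) for each top-level box."""
    pos = 0
    while pos + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:
            # Extended length: the real size is in the next 8 bytes.
            if pos + 16 > len(data):
                return
            (length,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif length == 0:
            # Length 0 means the box runs to the end of the file.
            length = len(data) - pos
        if length < header or pos + length > len(data):
            return  # malformed or truncated box, stop walking
        yield box_type, pos + header, pos + length
        pos += length


def has_fragment_table(data: bytes) -> bool:
    """True if a top-level fragment table box ('ftbl') is present."""
    return any(box_type == b"ftbl" for box_type, _, _ in top_level_boxes(data))


if __name__ == "__main__":
    def box(box_type: bytes, payload: bytes) -> bytes:
        return struct.pack(">I4s", 8 + len(payload), box_type) + payload

    sample = (
        box(b"jP  ", bytes.fromhex("0D0A870A"))
        + box(b"ftyp", b"jpx " + bytes(4) + b"jp2 ")
        + box(b"ftbl", b"placeholder")
    )
    print(has_fragment_table(sample))
```

A check of this kind could be used to flag images that the decoders listed on the next slide are unlikely to handle.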
![Page 25: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/25.jpg)
Open-source PDF viewers – JPEG 2000 libraries
Ghostscript: OpenJPEG or JasPer
Evince: OpenJPEG
MuPDF: OpenJPEG
Firefox PDF viewer: built-in decoder
None of these libraries supports fragmented codestreams!
![Page 26: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/26.jpg)
Is it really a problem?
Fragmented codestreams are extremely rare
But why is this feature even allowed in a long-term archival format?
Open-source support of JPEG 2000 in general remains problematic
![Page 27: PDF/A-3 for preservation. Notes on embedded files and JPEG2000](https://reader033.vdocuments.us/reader033/viewer/2022042813/546cfa25b4af9f7f2c8b527b/html5/thumbnails/27.jpg)
#SCAPEProject
http://www.scape-project.eu
Funding
This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).