SCAPE Information Day at BL - Flint, a format and file validation tool
DESCRIPTION
Alecs Geuder from the British Library presented ‘Flint’, a new tool developed by the SCAPE project, at the ‘SCAPE Information Day at the British Library’ on 14 July 2014. Flint is a format and file validation tool that can be used to validate files and/or formats against a policy. At the British Library, Flint is used to deal with non-print legal deposit. The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants.
TRANSCRIPT
Flint – a format and file validation tool
Alecs Geuder
SCAPE Information Day
British Library, UK, 14th July 2014
Introducing Flint: Presentation Structure
• Introduction
• What does Flint do?
• Flint-the-API
• Policy-focused Validation
• Flint-the-toolbox
• Format-specific Implementations
• How we are using it
• Mini-demo
Introduction
• Flint facilitates [file/format validation against a policy]
• The code centres on individual file format modules (PDF, EPUB, ..)
• Comes with a command line interface, GUIs and a Hadoop MapReduce program
Flint – core features
Schematron Policy
• categoryA – three tests
• categoryB – two tests
Input file of specific format
PolicyAware (Uses schematron-utils)
categoryC – two tests
Format specific Implementation
• canCheck
• validationResult
• ..
<checkresult file="input file" result="passed">
  <categoryA result="passed"/>
  <categoryB result="failed"/>
  <testB.1 result="failed"/>
  <testB.2 result="failed"/>
  <categoryC result="passed"/>
</checkresult>
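As an illustration of the report shape above, here is a minimal Python sketch that assembles per-category test outcomes into a checkresult element. It is not Flint's own code: the function name build_check_result, the input dictionary layout, and the aggregation rules are all assumptions. In particular it assumes a category fails when any of its tests fails, lists only the failed tests after a failed category, and marks the whole file as failed when any category fails. The slide's sample shows a top-level result of passed despite a failed category, so Flint's real aggregation rule may differ.

```python
import xml.etree.ElementTree as ET


def build_check_result(file_name, categories):
    """Build a <checkresult> element (hypothetical helper, not Flint's API).

    categories maps a category name to a dict of test name -> bool,
    where True means the test passed.
    """
    def label(ok):
        return "passed" if ok else "failed"

    root = ET.Element("checkresult", {"file": file_name})
    all_ok = True
    for category, tests in categories.items():
        # Assumption: a category passes only if every test in it passes.
        category_ok = all(tests.values())
        all_ok = all_ok and category_ok
        ET.SubElement(root, category, {"result": label(category_ok)})
        # Mirror the slide layout: failed tests follow their category
        # as sibling elements.
        for test, ok in tests.items():
            if not ok:
                ET.SubElement(root, test, {"result": label(ok)})
    # Assumption: the file passes only if every category passes.
    root.set("result", label(all_ok))
    return root


if __name__ == "__main__":
    report = build_check_result("input file", {
        "categoryA": {"testA.1": True, "testA.2": True, "testA.3": True},
        "categoryB": {"testB.1": False, "testB.2": False},
        "categoryC": {"testC.1": True, "testC.2": True},
    })
    print(ET.tostring(report, encoding="unicode"))
```

The test names under categoryA and categoryC are invented for the example; the slide names only testB.1 and testB.2.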
configuration
code
Set of internal & third party tools
The Flint ecosystem
config
code
CLI
GUIs
Hadoop
EPUB
Geospatial data
…
Entry points
Format/Feature specific Implementations
CORE
DRM-detection PDF/EPUB
Input file
<checkResult>
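The flow in this diagram (an entry point hands an input file to the core, which passes it to whichever format modules can handle it) could be sketched roughly as follows. This is a hypothetical illustration, not Flint's actual API: the class names, the Python method names can_check and validation_result (modelled on the canCheck and validationResult labels from the core-features slide), the extension-based matching, and the placeholder result are all assumptions.

```python
from pathlib import Path


class FormatModule:
    """Hypothetical base class for a format-specific module."""
    name = "generic"
    suffixes = ()

    def can_check(self, path):
        # Assumption: a module claims a file purely by its extension.
        return Path(path).suffix.lower() in self.suffixes

    def validation_result(self, path):
        # Placeholder: a real module would run its tools and policy here.
        return {"file": str(path), "module": self.name, "result": "not-run"}


class PdfModule(FormatModule):
    name = "pdf"
    suffixes = (".pdf",)


class EpubModule(FormatModule):
    name = "epub"
    suffixes = (".epub",)


def check_file(path, modules):
    """Return one result per module that reports it can check the file."""
    return [m.validation_result(path) for m in modules if m.can_check(path)]


if __name__ == "__main__":
    modules = [PdfModule(), EpubModule()]
    for candidate in ("report.pdf", "book.EPUB", "notes.txt"):
        print(candidate, check_file(candidate, modules))
```

Keeping the dispatch in one small function means a CLI, a GUI, or a Hadoop job could all call the same code, which is the separation between entry points and core that the diagram suggests.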
How we are using it
• To deal with non-print legal deposit
What’s next
• Add additional format/feature modules (geospatial, etc.)
Mini-demo