Creating a New JHOVE2 Format Module
Sheila Morrissey, Portico
Code4Lib 2011, Bloomington IN, February 7, 2011
The preservation problem
Managing the gap between what you were given and what you need
– That gap is only manageable if it is quantifiable
– Characterization tells you what you have, as a stable starting point for iterative preservation planning and action
Adapted from A. Brown, “Developing Practical Approaches to Active Preservation,” IJDC 2:1 (June 2007): 3-11.
Characterization
Preservation action
Preservation planning
“What? So what?”
Characterization is the automated determination of the intrinsic and extrinsic properties of a formatted object
– Identification
– Feature extraction
– Validation
– Assessment
Determining the presumptive format of a digital object based on suggestive extrinsic hints and intrinsic signatures
Reporting the intrinsic properties of an object significant for classification, analysis, and planning
Supported formats
JHOVE2 can identify (by DROID) many more formats than it can validate (by modules)
– PRONOM registry documents over 550 “formats” (http://www.nationalarchives.gov.uk/PRONOM)
Supported formats
ICC color profile: ICC.1:2004-10
JPEG 2000: JP2 (ISO/IEC 15444-1), JPX (ISO/IEC 15444-2)
PDF: PDF 1.0 – 1.7, ISO 32000-1, PDF/A-1 (ISO 19005-1), PDF/X-1 (ISO 15930-1), -1a (ISO 15930-4), -2 (ISO 15930-5), -3 (ISO 15930-6)
SGML
Shapefile: Main, Index, dBASE, …
TIFF: TIFF 4 – 6, Class B, F, G, P, R, Y, TIFF/EP (ISO 12234-2), TIFF/IT (ISO 12639), GeoTIFF, Exif (JEITA CP-3451), DNG
UTF-8: ASCII (ANSI X3.4)
WAVE: BWF (EBU N22-1997)
XML
Zip
Contributed format modules
From Wegener Institute (http://www.awi-potsdam.de)
– netCDF
– Grib
From Bibliothèque nationale de France (BnF) (http://www.bnf.fr/fr/acc/x.accueil.html)
– arc
– gzip
YOU!!!
– ???
Characterization strategy
1. Identify format (if not previously identified)
2. Dispatch to appropriate format module
   a) Extract format features and validate
      – If a nested source unit is found, process recursively…
   b) Validate format profiles (if registered)
3. Assess
4. If unitary source unit, calculate message digests (optional)
5. If an aggregate source unit, try to identify aggregate format, and if successful, process recursively…
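The recursion in this strategy can be sketched in a few lines of Java. This is a much simplified illustration, not the JHOVE2 implementation: `SourceUnit` and `StrategySketch` are hypothetical names, each step is reduced to a log entry, and every child of an aggregate is simply processed recursively.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical source unit: a file (no children) or an aggregate (with children). */
class SourceUnit {
    final String name;
    final List<SourceUnit> children = new ArrayList<>();

    SourceUnit(String name, SourceUnit... kids) {
        this.name = name;
        for (SourceUnit kid : kids) {
            children.add(kid);
        }
    }

    boolean isAggregate() {
        return !children.isEmpty();
    }
}

class StrategySketch {
    /** Appends the steps taken for this unit, and recursively for its children, to log. */
    static List<String> characterize(SourceUnit unit, List<String> log) {
        log.add("identify " + unit.name);   // step 1
        log.add("dispatch " + unit.name);   // step 2: feature extraction and validation
        log.add("assess " + unit.name);     // step 3
        if (unit.isAggregate()) {
            for (SourceUnit child : unit.children) {
                characterize(child, log);   // step 5: recurse into the aggregate
            }
        } else {
            log.add("digest " + unit.name); // step 4: unitary source units only
        }
        return log;
    }

    public static void main(String[] args) {
        SourceUnit dir = new SourceUnit("directory", new SourceUnit("xyz.pdf"));
        for (String step : characterize(dir, new ArrayList<>())) {
            System.out.println(step);
        }
    }
}
```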
Characterization strategy
directory/
  abc.shp  (Main)
  abc.shx  (Index)
  abc.dbf  (dBASE)
  abc.tif  (GeoTIFF)
  xyz.pdf
Characterization strategy
directory/
  clump: Shapefile
    abc.shp  (Main)
    abc.shx  (Index)
    abc.dbf  (dBASE)
  abc.tif  (GeoTIFF)
  xyz.pdf
Characterization strategy
directory/
  clump: “GIS object”
    clump: Shapefile
      abc.shp  (Main)
      abc.shx  (Index)
      abc.dbf  (dBASE)
    abc.tif  (GeoTIFF)
  xyz.pdf
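One way to picture the "clump" step is as grouping files that share a base name and checking whether the group holds all the parts of a known aggregate format. The sketch below does this for the Shapefile example only. `ClumpSketch` and `findShapefileClumps` are hypothetical names, and the matching rule is an assumption for illustration, not necessarily how JHOVE2 recognizes aggregates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ClumpSketch {
    /** Extensions that together make up a Shapefile in this sketch. */
    static final Set<String> SHAPEFILE_PARTS = new TreeSet<>(Arrays.asList("dbf", "shp", "shx"));

    /** Returns the base names whose files include all three Shapefile parts. */
    static List<String> findShapefileClumps(List<String> fileNames) {
        Map<String, Set<String>> extensionsByBase = new TreeMap<>();
        for (String fileName : fileNames) {
            int dot = fileName.lastIndexOf('.');
            if (dot <= 0) {
                continue; // no extension: cannot belong to a clump here
            }
            String base = fileName.substring(0, dot);
            String ext = fileName.substring(dot + 1).toLowerCase();
            extensionsByBase.computeIfAbsent(base, k -> new TreeSet<>()).add(ext);
        }
        List<String> clumps = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : extensionsByBase.entrySet()) {
            if (entry.getValue().containsAll(SHAPEFILE_PARTS)) {
                clumps.add(entry.getKey());
            }
        }
        return clumps;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("abc.shp", "abc.shx", "abc.dbf", "abc.tif", "xyz.pdf");
        System.out.println(findShapefileClumps(files)); // prints [abc]
    }
}
```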
API design idioms
Separation of concerns
– Annotation and reflection
  confluence.ucop.edu/display/JHOVE2Info/Background+Papers
Inversion of control (IOC) / dependency injection
– Martin Fowler
  martinfowler.com/articles/injection.html
– Spring framework
  www.springsource.org/
Separation of concerns
“Let POJOs be POJOs”
– Focus on modeling the format itself
“Let the code write itself”
– Reportables “know” how to expose their properties for display
– Reference documentation generated from the code
Annotation and Reflection: Reportable properties
Each reportable property is represented by a field and accessor and mutator methods.
The accessor method must be marked with the @ReportableProperty annotation.
public class MyReportable implements Reportable {
    protected String myProperty;

    @ReportableProperty(order=1, value="description", ref="reference")
    public String getMyProperty() {
        return this.myProperty;
    }
    public void setMyProperty(String property) {
        this.myProperty = property;
    }
}
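To make the "annotation and reflection" idea concrete, here is a small self-contained sketch of how a displayer could discover annotated accessors at run time. The `@Reported` annotation and the `ReflectionSketch` class are simplified, hypothetical stand-ins; they are not the JHOVE2 `@ReportableProperty` annotation or its displayer code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

/** Simplified, hypothetical stand-in for JHOVE2's @ReportableProperty. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Reported {
    int order() default 1;
    String value() default "Not available.";
}

class MyReportable {
    protected String myProperty = "example";

    @Reported(order = 1, value = "description")
    public String getMyProperty() {
        return this.myProperty;
    }

    // Not annotated, so the reporter below ignores it.
    public String getInternalState() {
        return "hidden";
    }
}

class ReflectionSketch {
    /** Collects accessor name -> value for every annotated accessor, sorted by name. */
    static Map<String, Object> report(Object reportable) {
        Map<String, Object> properties = new TreeMap<>();
        for (Method method : reportable.getClass().getMethods()) {
            if (method.isAnnotationPresent(Reported.class)) {
                try {
                    properties.put(method.getName(), method.invoke(reportable));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot read " + method.getName(), e);
                }
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(report(new MyReportable())); // prints {getMyProperty=example}
    }
}
```

The class being reported on contains no display logic at all, which is the point of the "Let POJOs be POJOs" idiom.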
Dependency injection
All JHOVE2 function is embodied in pluggable modules
– Flexible customization
   • Re-sequencing of pre-existing modules
– Easy extensibility
   • Additional format modules and profiles
   • Additional aggregate identifiers
   • Additional displayers
   • New behaviors
RenderabilityModule
JHOVE2 framework
Embodiment of a characterization strategy as a configurable sequence of command-invoked modules
public void characterize(Source source, Input input)
    throws IOException, JHOVE2Exception
{
    source.getTimerInfo().setStartTime();
    /* Update summary counts of source units, by type. */
    this.sourceCounter.incrementSourceCounter(source);
    for (Command command : this.commands) {
        TimerInfo time2 = command.getTimerInfo();
        time2.resetStartTime();
        try {
            command.execute(this, source, input);
        } finally {
            time2.setEndTime();
        }
    }
    source.getTimerInfo().setEndTime();
}
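The following stripped-down sketch shows why an injected command list makes the strategy configurable: the framework only iterates over whatever sequence it is handed. The `Command` interface and `FrameworkSketch` class here are hypothetical simplifications (the real commands also receive the framework, source, and input), and the list is built by hand rather than by Spring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical command interface; the real JHOVE2 commands take more arguments. */
interface Command {
    void execute(List<String> trace);
}

class FrameworkSketch {
    private final List<Command> commands;

    /** The command sequence is injected, so it can be re-ordered without code changes. */
    FrameworkSketch(List<Command> commands) {
        this.commands = commands;
    }

    /** Runs every configured command in order and returns what each one recorded. */
    List<String> characterize() {
        List<String> trace = new ArrayList<>();
        for (Command command : commands) {
            command.execute(trace);
        }
        return trace;
    }

    public static void main(String[] args) {
        Command identify = trace -> trace.add("identify");
        Command dispatch = trace -> trace.add("dispatch");
        Command digest = trace -> trace.add("digest");
        FrameworkSketch framework = new FrameworkSketch(Arrays.asList(identify, dispatch, digest));
        System.out.println(framework.characterize()); // prints [identify, dispatch, digest]
    }
}
```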
Creating a New Format Module: What are the deliverables?
• Source code
• Configuration files
• Sample (test) files
• Documents
Format Module Artifacts: Source Code
• Module classes
  – Module (extends org.jhove2.module.format.BaseFormatModule)
  – Profiles (extend org.jhove2.module.format.AbstractFormatProfile) as required by format
  – Supporting classes expressing format content model as required by format
• Test classes
  – JUnit test(s)
Format Module Artifacts: Configuration Files
• Spring IOC Bean XML configuration files
  – For Module
  – For unit test as needed
  – For Assessment criteria
• Messages properties file additions if needed
• Properties files
  – Displayer
  – Units of measure
  – Module-specific
Format Module Artifacts: Sample (Test) Files
– Sample files used in unit test
  • Valid files
  • Invalid files to exercise validity constraints
Format Module Artifacts: Documentation
• Module Specification Document
See examples on the JHOVE2 wiki “Modules Documents” page
<https://bitbucket.org/jhove2/main/wiki/Module>
Format Module Artifacts List
New CSV Format Module
Source code
  src/main/java/org/jhove2/module/format/csv/CsvModule.java
  src/test/java/org/jhove2/module/format/csv/CsvModuleTest.java
Configuration files
  Spring
    config/spring/module/format/csv/jhove2-csv-config.xml
    config/spring/module/assess/jhove2-ruleset-csv-config.xml
    src/test/resources/config/module/format/csv/test-config.xml
  Messages
    config/messages/jhove2_message.properties (update, not new)
  Display
    config/properties/module/displayer/org/jhove2/module/format/csv/CsvModule_displayer.properties
    config/properties/module/units/org/jhove2/module/format/csv/CsvModule_unit.properties (optional)
  Module-specific properties files
    config/properties/module/format/csv/csv.properties (optional, implementation-determined)
Test File(s)
  src/test/resources/examples/csv/goodFile.csv
  src/test/resources/examples/csv/badFile01.csv
  src/test/resources/examples/csv/badFile02.csv
  ….
Documentation
  CSV Module specification document: JHOVE2 wiki
Format Module Artifacts: The Good News
• Generate module and profile from interfaces and base classes via inheritance
  – Classes reflect format’s own content model: cross-cutting “JHOVE2” concerns handled via annotation (persistence, serialization, generation of JHOVE2 identifiers for reportable properties)
• Template for Spring XML Module configuration files
• Utilities to generate
  – Displayer properties files
  – Units of measure properties files
  – XML assessment configuration file
• Utilities for specification document
  – Script to generate tabular content for specification document
  – Macro to import utility-generated tabular content
Format Module: Research and Analysis
• Format Definition (org.jhove2.core.format.Format)
  – Names
  – Type (format/family)
  – Ambiguity (ambiguous/unambiguous)
  – Identifiers
  – Specifications
  – Validity (comprehensive/selective)
  – Profiles (none)
• Significant (Reportable) properties (org.jhove2.module.format.csv.CsvModule)
Format Definition: CSV Names
• JHOVE2 canonical name
  – Comma Separated Values
• Format aliases
  – CSV
  – DSV
Might already be defined in config/spring/module/format/jhove2-otherFormats-config.xml
Format Definition: CSV Formal Identifiers
• JHOVE2 identifier (see org.jhove2.core.I8R$Namespace)
  – [JHOVE2] http://jhove2.org/terms/format/csv
• PRONOM (PUID) identifier (used by DROID)
  – [PUID] x-fmt/18
• MIME type identifier
  – [MIME] text/csv
• RFC identifier
  – [RFC] RFC 4180
• Other identifiers in other namespaces (see org.jhove2.core.I8R$Namespace)
Might already be defined in config/spring/module/format/jhove2-otherFormats-config.xml
If you are not using DROID, then you MUST have the identifier(s) from the namespace of your identification tool
Format Definition: CSV Formal Identifiers in Spring
<!-- Comma Separated Values JHOVE2 identifier bean -->
<!-- (canonical identifier in JHOVE2 namespace) -->
<!-- Single constructor arg defaults to JHOVE2 namespace -->
<bean id="CommaSeparatedValuesIdentifier" class="org.jhove2.core.I8R" scope="singleton">
    <constructor-arg type="java.lang.String" value="http://jhove2.org/terms/format/csv"/>
</bean>
<!-- Comma Separated Values PUID identifier bean -->
<!-- (canonical identifier in PRONOM namespace (used by DROID identifier tool)) -->
<bean id="CommaSeparatedValuesPUID1" class="org.jhove2.core.I8R" scope="singleton">
    <constructor-arg type="java.lang.String" value="x-fmt/18"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace" value="PUID"/>
</bean>
Format Definition: CSV Formal Identifiers in Spring
<!-- Comma Separated Values MIME type aliasIdentifier bean -->
<bean id="CommaSeparatedValuesMIMEType" class="org.jhove2.core.I8R" scope="singleton">
    <constructor-arg type="java.lang.String" value="text/csv"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace" value="MIME"/>
</bean>
<!-- Comma Separated Values RFC aliasIdentifier bean -->
<bean id="CommaSeparatedValuesRFC4180" class="org.jhove2.core.I8R" scope="singleton">
    <constructor-arg type="java.lang.String" value="RFC 4180"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace" value="RFC"/>
</bean>
Format Definition: CSV Specifications
• For CSV, many variants
• Closest document to a format spec is RFC
  – RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt)
Format Definition: CSV Specification in Spring
<bean id="CsvSpec" class="org.jhove2.core.Document" scope="singleton">
    <constructor-arg type="java.lang.String" value="RFC 4180 Common Format and MIME Type for CSV Files"/>
    <constructor-arg type="org.jhove2.core.Document$Type" value="Specification"/>
    <constructor-arg type="org.jhove2.core.Document$Intention" value="Authoritative"/>
    <property name="author" value="Y. Shafranovich"/>
    <property name="date" value="October 2005"/>
    <property name="identifiers">
        <list value-type="org.jhove2.core.I8R">
            <ref bean="CsvSpecificationURI"/>
        </list>
    </property>
    <property name="publisher" value="The Internet Engineering Task Force (IETF)"/>
</bean>
<!-- CSV RFC specification URI bean -->
<bean id="CsvSpecificationURI" class="org.jhove2.core.I8R" scope="singleton">
    <constructor-arg type="java.lang.String" value="http://www.ietf.org/rfc/rfc4180.txt"/>
    <constructor-arg type="org.jhove2.core.I8R$Namespace" value="URI"/>
</bean>
Format Definition: CSV Format Bean Definition in Spring
<!-- Bean for the JHOVE2 Comma Separated Values Format Bean -->
<bean id="CommaSeparatedValuesFormat" class="org.jhove2.core.format.Format" scope="singleton">
    <constructor-arg type="java.lang.String" value="Comma Separated Values"/>
    <constructor-arg ref="CommaSeparatedValuesIdentifier"/>
    <constructor-arg type="org.jhove2.core.format.Format$Type" value="Format"/>
    <constructor-arg type="org.jhove2.core.format.Format$Ambiguity" value="Unambiguous"/>
    <property name="aliasIdentifiers">
        <set value-type="org.jhove2.core.I8R">
            <ref bean="CommaSeparatedValuesIdentifier"/>
            <ref bean="CommaSeparatedValuesPUID1"/>
            <ref bean="CommaSeparatedValuesMIMEType"/>
            <ref bean="CommaSeparatedValuesRFC4180"/>
        </set>
    </property>
    <property name="aliasNames">
        <set>
            <value>CSV</value>
            <value>DSV</value>
        </set>
    </property>
    <property name="specifications">
        <list value-type="org.jhove2.core.Document">
            <ref bean="CsvSpec"/>
        </list>
    </property>
</bean>
Format Module: Format Module Recipe
• Create package
• Place in inheritance hierarchy
• Enforce persistence requirements
• Populate static (non-user-configurable) fields
• Implement 2-argument constructor
• Create module’s Spring Bean
• Define reportable properties and associated methods
• Annotate reportable properties accessors
• Configure Message properties file
• Override parse() method
• Implement Validator interface methods
Format Module: Inheritance Hierarchy
• Inheritance
  – Extends org.jhove2.module.format.BaseFormatModule
  – Implements org.jhove2.module.format.Validator
Format Module: Persistence requirements
• Module must be annotated with the Berkeley DB JE @Persistent annotation
• Module must have a 0-argument constructor
• Module should not contain any non-static nested (inner) classes
• Module field type must be
  – “simple” Java type or
  – Persistent type or
  – Have a com.sleepycat.persist.model.PersistentProxy implementation created for it in package org.jhove2.persist.berkeleydpl.proxies
Annotate Reportable Properties
Format Module: Persistence requirements
import com.sleepycat.persist.model.Persistent;

// Berkeley DB JE annotation
@Persistent
public class CsvModule extends BaseFormatModule implements Validator {
    /**
     * No-arg constructor required by persistence layer
     */
    @SuppressWarnings("unused")
    private CsvModule() {
        this(null, null);
    }
    …
Format Module: Non-configurable fields
@Persistent
public class CsvModule extends BaseFormatModule implements Validator {
    /** CSV module version identifier. */
    public static final String VERSION = "n.n.n";
    /** CSV module release date. */
    public static final String RELEASE = "yyyy-mm-dd";
    /** CSV module rights statement. */
    public static final String RIGHTS = "Copyright YYYY by "
        + "Copyright holder name "
        + "Available under the terms of the BSD license.";
    /** Module validation coverage. */
    public static final Coverage COVERAGE = Coverage.Inclusive;
    /** CSV validation status. */
    protected Validity validity;
Format Module: Two-argument Constructor
/**
 * @param format
 * @param formatModuleAccessor
 */
public CsvModule(Format format, FormatModuleAccessor formatModuleAccessor) {
    super(VERSION, RELEASE, RIGHTS, format, formatModuleAccessor);
    this.validity = Validity.Undetermined;
}
…
Format Module: Spring Bean
<bean id="CSVModule" class="org.jhove2.module.format.csv.CsvModule" scope="prototype">
    <constructor-arg ref="CommaSeparatedValuesFormat"/>
    <!-- persistence manager bean ref; same for all format modules -->
    <constructor-arg ref="FormatModuleAccessor"/>
    <property name="developers">
        <list value-type="org.jhove2.core.Agent">
            <ref bean="CSVAgent"/>
        </list>
    </property>
</bean>
<!-- Module author bean -->
<bean id="CSVAgent" class="org.jhove2.core.Agent" scope="singleton">
    <constructor-arg type="java.lang.String" value="CSV Author Name"/>
    <constructor-arg type="org.jhove2.core.Agent$Type" value="Personal"/> <!-- Personal or Corporate -->
    <property name="URI" value="http://www.csvagent.org/"/>
</bean>
Format Module: Reportable Properties: CSV Base Definition
file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name)
record = field *(COMMA field)
name = field
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
COMMA = %x2C
CR = %x0D ;as per section 6.1 of RFC 2234 [2]
DQUOTE = %x22 ;as per section 6.1 of RFC 2234 [2]
LF = %x0A ;as per section 6.1 of RFC 2234 [2]
CRLF = CR LF ;as per section 6.1 of RFC 2234 [2]
TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
From RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt)
Format Module: Reportable Properties: CSV Complications
• Delimiter character might be “;” instead of “,”
• EOL might be “\n” instead of “\r\n”
• EOL might be embedded in contents of field
• Different implementations escape the escape character differently
  – “” vs. \”
• Last record in file might not have EOL
• All records might not have same number of fields
• Some implementations trim leading/trailing whitespace in escaped fields
• Some implementations allow characters other than ASCII-printable characters
• No syntactic way to detect if first record is “header” record
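Several of these complications can be handled by a small character-by-character scanner. The sketch below is one possible approach, not the JHOVE2 CSV module: `CsvParseSketch` and `parse` are hypothetical names, the delimiter is passed in by the caller, both "\n" and "\r\n" are accepted as EOL, and only the doubled-quote ("") escape style is recognized.

```java
import java.util.ArrayList;
import java.util.List;

class CsvParseSketch {
    /** Splits text into records of fields; quoted fields may hold the delimiter, EOLs and "" pairs. */
    static List<List<String>> parse(String text, char delimiter) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"'); // "" inside a quoted field is one literal quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    field.append(c);   // includes embedded delimiters and EOLs
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == delimiter) {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n' || c == '\r') {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;               // treat \r\n as a single EOL
                }
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else {
                field.append(c);
            }
            i++;
        }
        // The last record might not end with an EOL.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "name;note\r\n\"Smith; J\";\"said \"\"hi\"\"\nbye\"";
        System.out.println(parse(sample, ';').size()); // prints 2
    }
}
```

A real module would also need to record where each complication occurred, since those observations are what end up as reportable properties.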
Format Module: CSV Reportable Properties
• Delimiter character
• EOL character(s)
• Escape character
• Escape character sequence within field
• Number of records
• Number of fields
  – First record
  – Max
  – Min
  – Per record
• Field names from header row
• Count of records with embedded EOL
• Count of records with embedded escape characters
• Count of records with leading/trailing whitespace in escaped fields
• Does last record in file have EOL?
• Does file contain characters other than ASCII-printable ones?
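As an illustration of how the count-style properties might be derived once records have been parsed, here is a minimal sketch. `CsvStatsSketch` is a hypothetical helper, and it assumes the records are already available as lists of field strings.

```java
import java.util.Arrays;
import java.util.List;

class CsvStatsSketch {
    int recordCount;
    int fieldCountFirstRecord;
    int fieldCountMax;
    int fieldCountMin;

    /** Derives record and field counts from already-parsed records. */
    static CsvStatsSketch of(List<List<String>> records) {
        CsvStatsSketch stats = new CsvStatsSketch();
        stats.recordCount = records.size();
        if (records.isEmpty()) {
            return stats; // all counts stay 0 for an empty file
        }
        stats.fieldCountFirstRecord = records.get(0).size();
        stats.fieldCountMax = Integer.MIN_VALUE;
        stats.fieldCountMin = Integer.MAX_VALUE;
        for (List<String> record : records) {
            stats.fieldCountMax = Math.max(stats.fieldCountMax, record.size());
            stats.fieldCountMin = Math.min(stats.fieldCountMin, record.size());
        }
        return stats;
    }

    public static void main(String[] args) {
        List<List<String>> records = Arrays.asList(
                Arrays.asList("id", "name", "note"),
                Arrays.asList("1", "Smith"),
                Arrays.asList("2", "Jones", "x", "y"));
        CsvStatsSketch stats = of(records);
        System.out.println(stats.fieldCountMin + ".." + stats.fieldCountMax); // prints 2..4
    }
}
```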
Format Module: CSV Reportable Properties
• Add significant properties as protected fields to module class
  – Might need to create ancillary @Persistent class to reflect model of format
  – Class should extend org.jhove2.core.reportable.AbstractReportable
• Create public accessors for those fields
• Annotate accessors with @ReportableProperty annotation
Format Module: Reportable Properties: Fields
// Add significant properties as protected fields
protected String delimiterCharacter;
protected String eolString;
protected String escapeCharacter;
protected String escapeCharacterSequenceWithinField;
protected int recordCount;
protected int fieldCountFirstRecord;
protected int fieldCountMax;
protected int fieldCountMin;
protected List<Integer> fieldsPerRecord;
protected List<String> fieldNames;
protected int recordsWithEmbeddedEolCount;
protected int recordsWithEmbeddedEscapeCharCount;
protected int recordsWithUntrimmedWhitespaceCount;
protected boolean eolInLastRecord;
protected boolean containsNonAsciiPrintableChars;
Format Module: Reportable Properties: Accessors
// Create public accessors for reportable properties fields
public String getDelimiterCharacter() {...}
public String getEolString() {...}
public String getEscapeCharacter() {...}
public String getEscapeCharacterSequenceWithinField() {...}
public int getRecordCount() {...}
public int getFieldCountFirstRecord() {...}
public int getFieldCountMax() {...}
public int getFieldCountMin() {...}
public List<Integer> getFieldsPerRecord() {...}
public List<String> getFieldNames() {...}
public int getRecordsWithEmbeddedEolCount() {...}
public int getRecordsWithEmbeddedEscapeCharCount() {...}
public int getRecordsWithUntrimmedWhitespaceCount() {...}
public boolean isEolInLastRecord() {...}
public boolean isContainsNonAsciiPrintableChars() {...}
Format Module: Reportable Properties: Annotation
public @interface ReportableProperty {
    /** Default description and reference value. */
    public static final String DEFAULT = "Not available.";

    /**
     * Property type: raw or descriptive. A raw property reports itself in the exact form that was found
     * in the source unit; a descriptive property reports itself in a more human-readable form.
     */
    public enum PropertyType {Default, Raw, Descriptive}

    /**
     * Ordinal position of this property relative to all properties directly defined in a class.
     */
    public int order() default 1;

    /**
     * Property reference, a citation to an external source document that defines the property.
     */
    public String ref() default DEFAULT;

    /** Property type: raw or descriptive. */
    public PropertyType type() default PropertyType.Default;

    /** Property description. */
    public String value() default DEFAULT;
}
Format Module: Reportable Properties: Annotation
@ReportableProperty(order=10,
    value="Character used to delimit fields in record.",
    ref="RFC 4180, Section 2, paragraph 4")
public String getDelimiterCharacter() {
    return delimiterCharacter;
}
Format Module: Reportable Message Properties
import org.jhove2.core.Message;
…
// (Reportable) Message properties
protected Message delimiterCharNotFoundMessage;
Format Module: Configure Message Properties File
#############################################################################
# Message templates for class org.jhove2.module.format.csv.CsvModule
#############################################################################
org.jhove2.module.format.csv.CsvModule.DelimitorCharacterNotFoundMessage=No occurrence of delimiter character {0} found in source
#
Added to file config/messages/jhove2_messages.properties
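The {0} in the template is a positional placeholder that is filled from the message arguments. The sketch below shows the substitution using the standard java.text.MessageFormat class; whether JHOVE2 itself uses MessageFormat internally is an assumption, and `MessageTemplateSketch` is a hypothetical name.

```java
import java.text.MessageFormat;

class MessageTemplateSketch {
    /** Template text copied from the properties entry above. */
    static final String TEMPLATE = "No occurrence of delimiter character {0} found in source";

    /** Fills the positional placeholders in the template with the given arguments. */
    static String render(Object... args) {
        return MessageFormat.format(TEMPLATE, args);
    }

    public static void main(String[] args) {
        // prints: No occurrence of delimiter character ; found in source
        System.out.println(render(";"));
    }
}
```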
Format Module: Message Creation
Object[] messageArgs = new Object[]{csvDelimiterChar};
delimiterCharNotFoundMessage = new Message(
    Severity.WARNING,
    Context.OBJECT,
    "org.jhove2.module.format.csv.CsvModule.DelimitorCharacterNotFoundMessage",
    messageArgs,
    jhove2.getConfigInfo());
Format Module: Override parse() method
/**
 * Parse a source unit.
 * @param jhove2 JHOVE2 framework
 * @param source Source unit
 * @param input CSV source input
 * @return Number of bytes consumed
 * @throws EOFException
 * @throws IOException
 * @throws JHOVE2Exception
 */
@Override
public long parse(JHOVE2 jhove2, Source source, Input input)
    throws IOException, JHOVE2Exception
{
    // where the real work happens
    // parse the Source (take care of those CSV complications!!)
    // populate reportable properties
    // construct any Error, Warning, or Info messages
    return 0;
}
Format Module: Override parse() method
Some Implementation Choices:
• Write from scratch
  – TIFF
  – WAV
  – UTF-8
  – ICC
• Wrap existing Java library
  – XML
  – Beware of persistence traps: inner classes, non-persistable fields
• Wrap existing non-Java library
  – SGML
  – Beware of performance hits (shell out) or memory leaks (JNI)
Format Module: Implement Validator methods
/* (non-Javadoc)
 * @see org.jhove2.module.format.Validator#getCoverage()
 */
@Override
public Coverage getCoverage() {
    // COVERAGE is a static constant, so it is referenced without "this"
    return COVERAGE;
}
/* (non-Javadoc)
 * @see org.jhove2.module.format.Validator#isValid()
 */
@Override
public Validity isValid() {
    return this.validity;
}
Format Module: Implement Validator methods
/* (non-Javadoc)
 * @see org.jhove2.module.format.Validator#validate(org.jhove2.core.JHOVE2, org.jhove2.core.source.Source, org.jhove2.core.io.Input)
 */
@Override
public Validity validate(JHOVE2 jhove2, Source source, Input input)
    throws JHOVE2Exception
{
    // Parse might already have set validity; if not, test
    // reportable fields values and set
    if (this.validity.equals(Validity.Undetermined)) {
        //...
    }
    return this.validity;
}
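To show what might go inside the elided branch, here is a self-contained sketch that sets validity from a reportable field. Everything in it is illustrative: `ValidateSketch` is a hypothetical class, the nested `Validity` enum is a stand-in (only `Undetermined` appears in the slides; `True` and `False` are assumed names), and the rule "all records have the same number of fields" is just one possible validity constraint.

```java
import java.util.Arrays;
import java.util.List;

class ValidateSketch {
    /** Stand-in for the JHOVE2 Validity type; True and False are assumed names. */
    enum Validity { True, False, Undetermined }

    Validity validity = Validity.Undetermined;
    final List<Integer> fieldsPerRecord;

    ValidateSketch(List<Integer> fieldsPerRecord) {
        this.fieldsPerRecord = fieldsPerRecord;
    }

    /** Example rule (an assumption): valid when every record has the same field count. */
    Validity validate() {
        if (this.validity == Validity.Undetermined) {
            boolean uniform = true;
            for (int count : fieldsPerRecord) {
                if (count != fieldsPerRecord.get(0)) {
                    uniform = false;
                }
            }
            this.validity = uniform ? Validity.True : Validity.False;
        }
        return this.validity;
    }

    public static void main(String[] args) {
        System.out.println(new ValidateSketch(Arrays.asList(3, 3, 3)).validate()); // prints True
        System.out.println(new ValidateSketch(Arrays.asList(3, 2, 3)).validate()); // prints False
    }
}
```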
Format Module: Unit Test
package org.jhove2.module.format.csv;

import static org.junit.Assert.*;
import org.junit.Before;
import org.junit.Test;

public class CsvModuleTest {
    @Before
    public void setUp() throws Exception {
    }
    @Test
    public void testValidate() {
        fail("Not yet implemented");
    }
    @Test
    public void testParse() {
        fail("Not yet implemented");
    }
}
Format Module: Unit Test: Where it Goes
Unit tests: src/test/java/org/jhove2/module/format/csv
Sample (test) files: src/test/resources/examples/csv
Spring beans for unit tests: src/test/resources/config/module/format/csv
– Update Spring configuration file filepaths-config.xml with base path of your sample file
<bean id="csvDirBasePath" class="java.lang.String">
    <constructor-arg type="java.lang.String" value="examples/csv/"/>
</bean>
Format Module Artifacts: What’s Left?
• Source code
• Configuration files
• Sample (test) files
• Documents
Format Module Artifacts: Configuration Files
• Spring IOC Bean XML configuration files
  – For Module
  – For unit test as needed
  – For assessment
• Messages properties file additions if needed
• Properties files
  – Displayer
  – Units of measure
  – Module-specific
Format Module: CSV Assessment Criteria
• Delimiter character = ?
• EOL character(s) = ?
• Escape character = ?
• Escape character sequence = ?
• All records have same number of columns?
• Contains no escaped fields with untrimmed whitespace?
• Contains no characters other than ASCII-printable?
• Contains no fields with embedded EOL?
See Richard Anderson’s workshop this afternoon!!!!
Configuration Files: “We’ve got an app for that!”
• Displayer
  – jhove2_dpfg.cmd (Windows)
  – jhove2_dpfg.sh (Unix)
• Units of measure
  – jhove2_upfg.cmd (Windows)
  – jhove2_upfg.sh (Unix)
Configuration Files: Displayer Properties
USAGE:
jhove2_dpfg.cmd <fully-qualified-classname> <output-directory-path>
Configuration Files: Displayer Properties
Example:
jhove2_dpfg.cmd org.jhove2.module.format.csv.CsvModule c:\props
Command line output:
Succesfully created displayer property file for class org.jhove2.module.format.csv.CsvModule
File can be found at c:\props\org\jhove2\module\format\csv\CsvModule_displayer.properties
Configuration Files: Editable File
# _displayer.properties
# The visibility directives control the display of the properties identified by URI
# The directives can be: Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
# IfNonZero, IfPositive, IfTrue, IfZero, Never
# A property is not displayed if its value is not consistent with the directive.
# Negative means ...,-2,-1; NonNegative means 0,1,2...
# Positive means 1,2,3,...; NonPositive means ...,-2,-1,0
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/EolString Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/EscapeCharacter Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/EscapeCharacterSequenceWithinField Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMax Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMin Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldNames Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldsPerRecord Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEolCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEscapeCharCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithUntrimmedWhitespaceCount Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isContainsNonAsciiPrintableChars Always | Never | IfTrue | IfFalse
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isEolInLastRecord Always | Never | IfTrue | IfFalse
Configuration Files: Editable File
# _displayer.properties
# The visibility directives control the display of the properties identified by URI
# The directives can be: Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
# IfNonZero, IfPositive, IfTrue, IfZero, Never
# A property is not displayed if its value is not consistent with the directive.
# Negative means ...,-2,-1; NonNegative means 0,1,2...
# Positive means 1,2,3,...; NonPositive means ...,-2,-1,0
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter Always | Never
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord Always | Never | IfNegative | IfNonNegative | IfNonPositive | IfNonZero | IfPositive | IfZero
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isContainsNonAsciiPrintableChars Always | Never | IfTrue | IfFalse
Configuration Files: Editable File
# _displayer.properties
# The visibility directives control the display of the properties identified by URI
# The directives can be: Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
# IfNonZero, IfPositive, IfTrue, IfZero, Never
# A property is not displayed if its value is not consistent with the directive.
# Negative means ...,-2,-1; NonNegative means 0,1,2...
# Positive means 1,2,3,...; NonPositive means ...,-2,-1,0
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter Always
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord IfPositive
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/isContainsNonAsciiPrintableChars IfTrue
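The semantics of the visibility directives, as described in the file's header comment, can be written down as a small decision function. This is an illustrative reading of those comments, not the JHOVE2 displayer code; `VisibilitySketch` and `isVisible` are hypothetical names, and treating a boolean-only directive on a numeric property as "show" is an assumption.

```java
class VisibilitySketch {
    /** Directive names as listed in the generated file's header comment. */
    enum Directive {
        Always, IfFalse, IfNegative, IfNonNegative, IfNonPositive,
        IfNonZero, IfPositive, IfTrue, IfZero, Never
    }

    /** Decides whether a numeric property value should be displayed. */
    static boolean isVisible(Directive directive, long value) {
        switch (directive) {
            case Always:        return true;
            case Never:         return false;
            case IfNegative:    return value < 0;
            case IfNonNegative: return value >= 0;
            case IfNonPositive: return value <= 0;
            case IfNonZero:     return value != 0;
            case IfPositive:    return value > 0;
            case IfZero:        return value == 0;
            default:            return true; // boolean-only directives: assumed to show
        }
    }

    /** Decides whether a boolean property value should be displayed. */
    static boolean isVisible(Directive directive, boolean value) {
        switch (directive) {
            case Never:   return false;
            case IfTrue:  return value;
            case IfFalse: return !value;
            default:      return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isVisible(Directive.IfPositive, 0));  // prints false
        System.out.println(isVisible(Directive.IfTrue, true));   // prints true
    }
}
```

With the edited file above, a FieldCountFirstRecord of 0 would be hidden and isContainsNonAsciiPrintableChars would appear only when true.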
Configuration Files: Units of Measure Properties
USAGE:
jhove2_upfg.cmd <fully-qualified-classname> <output-directory-path>
Configuration Files: Units of Measure Properties
Example:
jhove2_upfg.cmd org.jhove2.module.format.csv.CsvModule c:\props
Command line output:
Succesfully created unit property file for class org.jhove2.module.format.csv.CsvModule
File can be found at c:\props\org\jhove2\module\format\csv\CsvModule_unit.properties
Configuration Files: Editable File
# Units of measure properties
# Note: These unit of measure labels are descriptive only; changing the label
# does NOT change the determination of the underlying property value.
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordCount
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithUntrimmedWhitespaceCount
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEscapeCharCount
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMax
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountMin
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/FieldCountFirstRecord
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEolCount
Configuration Files: Editable File
# Units of measure properties
# Note: These unit of measure labels are descriptive only; changing the label
# does NOT change the determination of the underlying property value.
http\://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/RecordsWithEmbeddedEolCount record
Format Module Artifacts: What’s Left?
• Source code
• Configuration files
• Sample (test) files
• Documents
  – Format Module Specification Document
    • “We’ve got an app for (part of) that!”
Documentation: Specification Sections
1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes
Documentation: Minimal template edit
1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes
Documentation: Sections from Tabular Data
1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes
Documentation: Write “By Hand”
1. Introduction
2. Identification
3. References
4. Terminology and Conventions
5. Validity
6. Format Profiles
7. Reportable Properties
8. Configuration
9. Implementation Notes
Documentation: Module Specification Recipe
• Create module specification from Word Template
• Generate tabular information (reportable properties)
• Use Word macro to format tabular information for pasting into module specification
• Complete other sections
• Add specification document to JHOVE2 wiki
Documentation: Create Tabular Data
• Generate tabular information (reportable properties) for format module specification
  – jhove2_doc.cmd (Windows)
  – jhove2_doc.sh (Unix)
Documentation: Create Tabular Data
USAGE:
jhove2_doc.cmd <fully-qualified-classname> <output-directory-path>
Documentation: Create Tabular Data
• Outputs
  – CsvModule_id.txt
    • (Section 2: Identification)
  – CsvModule_ref.txt
    • (Section 3: References)
  – CsvModule_Reportable_properties.txt
    • (Section 7: Reportable properties)
Documentation: Format tabular data with Macro
• Edit the output file in WordPad or Notepad (to save with MS line endings)
• Follow instructions in Macro file to create formatted text
• Copy and paste into Specification document
Documentation: Create Tabular Data
In generated file:
Property DelimiterCharacter
Identifier http://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter
Type java.lang.String
Description Character used to delimit fields in record.
Reference RFC 4180, Section 2, paragraph 4
Documentation: Create Tabular Data
DelimiterCharacter Property
Identifier   http://jhove2.org/terms/property/org/jhove2/module/format/csv/CsvModule/DelimiterCharacter
Type         java.lang.String
Description  Character used to delimit fields in record.
Reference    RFC 4180, Section 2, paragraph 4
Documentation: Module Specification Recipe
• Create module specification from Word Template
• Generate tabular information (reportable properties)
• Use Word macro to format tabular information for pasting into module specification
• Complete other sections
• Add specification document to JHOVE2 wiki
Questions?
http://jhove2.org
[email protected]@listserv.ucop.edu
CDL
  Stephen Abrams
  Patricia Cruse
  John Kunze
  Isaac Rabinovitch
  Marisa Strong
  Perry Willett
Stanford University
  Richard Anderson
  Tom Cramer
  Hannah Frost
Portico
  John Meyer
  Sheila Morrissey
Library of Congress
  Martha Anderson
  Justin Littman
With help from
  Walter Henry
  Nancy Hoebelheinrich
  Keith Johnson
  Evan Owens
Advisory Board
  Deutsche Nationalbibliothek
  Dspace / MIT
  Ex Libris
  Fedora Commons / Rutgers
  Florida Center for Library Automation
  Harvard University
  Koninklijke Bibliotheek
  National Archives (UK)
  National Archives (US)
  National Library of Australia
  National Library of New Zealand
  Bibliothèque nationale de France (BnF)
  Planets / Universität zu Köln
  Tessella