Using Scalable and Secure Web Technologies to Design Global Format Registry
Muluwork Geremew, Sangchul Song and Joseph JaJa
Institute for Advanced Computer Science Studies
Department of ECE, University of Maryland
Sponsored by Library of Congress and NSF
Motivation
• Handling of digital formats is an essential part of long-term preservation.
• Format obsolescence
– Technology evolution and the obsolescence of systems and application software may leave users unable to access their old files.
– Software developers may go out of business and no longer support their applications.
• Digital preservation requires
– Different essential aspects of objects.
– Tools for capturing the essential format characteristics of information stored as a digital object and for processing it.
Existing Methodologies
• Standardizing digital content to a few common formats.
– JPEG2000, OMF, and PDF/A are among the few selected open standard formats.
• Migration
– Transforms older versions to newer formats.
– Tends to be costly and prone to errors.
• Emulation
– The original bit-streams are executed using an emulator.
– Implementing such a strategy is extremely challenging and can be viewed as a transformation.
Our Goal
• A flexible framework for incorporating advances achieved through the existing approaches.
• Development of an efficient, scalable and platform-independent prototype to enable the tracking and handling of format obsolescence.
– Development of a Global Digital Format Registry (GDFR)
– FOrmat CUration Service (FOCUS)
– Development of enabler modules that can interface between GDFR and end-user applications.
FOCUS Architecture
FOCUS on LDAP and SOAP
• Interoperability
– Protocols are platform independent
• Performance
– Most operations are read-only queries. LDAP gives high performance in this environment.
• Extensibility
– LDAP schema can be easily extended
• Scalability
– By the use of Distributed LDAP
• Security
– SOAP can run on top of SSL (https)
– LDAP-based Format Registry can be easily integrated with any other LDAP-based authentication/authorization mechanisms.
Global Digital Format Registry
• GDFR serves to provide detailed information about formats.
• Existing Format Registries:
– UPenn’s FRED (http://tom.library.upenn.edu/fred)
– PRONOM (http://www.nationalarchives.gov.uk/pronom/)
– Wotsit’s Format (http://www.wotsit.org)
• It is not clear how extensible or scalable these registries are, or how they can be interfaced with existing preservation systems.
FOCUS
• The registry contains information on
– File formats
– Software tools
• Multiple ways to access GDFR in FOCUS are provided.
– Directly through LDAP interface
– Indirectly through SOAP interface
[Figure: Software, Web Service Agent, Global Digital Format Registry]
GDFR-Internal Structure
dc=umiacs, dc=umd, dc=edu
  ou=Format-Registry
    ou=Applications
      Adobe Acrobat v6.0
      Adobe Photoshop v7.0
      Jhove 1.0
    ou=Formats
      Adobe PDF v1.4
      CompuServe GIF 1989a
      JPEG Image Format 2000
• Format entries
– General descriptive properties.
– Processing: rendering, editing, conversion and validation services/systems.
• Application entries
– General descriptive properties.
– Processing: format taken as input and/or output.
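As an illustration of how entries in this tree could be addressed, the following minimal sketch builds distinguished names (DNs) under the two branches shown above. The slides do not say which naming attribute the registry uses, so `cn` is an assumption, and `escape_rdn_value` and `entry_dn` are hypothetical helper names. Escaping follows the string rules of RFC 4514 (NUL handling is omitted for brevity).

```python
# Base of the registry subtree, as shown on the slide.
BASE_DN = "ou=Format-Registry,dc=umiacs,dc=umd,dc=edu"

# The two branches shown on the slide.
BRANCHES = ("Applications", "Formats")


def escape_rdn_value(value: str) -> str:
    """Escape characters that are special inside a DN attribute value."""
    special = set(',+"\\<>;')
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch in special:
            out.append("\\" + ch)
        elif ch == "#" and i == 0:
            out.append("\\#")          # leading '#' must be escaped
        elif ch == " " and i in (0, last):
            out.append("\\ ")          # leading or trailing space
        else:
            out.append(ch)
    return "".join(out)


def entry_dn(branch: str, name: str) -> str:
    """Build the DN of an entry; 'cn' as naming attribute is an assumption."""
    if branch not in BRANCHES:
        raise ValueError(f"unknown branch: {branch!r}")
    return f"cn={escape_rdn_value(name)},ou={branch},{BASE_DN}"


if __name__ == "__main__":
    print(entry_dn("Formats", "Adobe PDF v1.4"))
    print(entry_dn("Applications", "Jhove 1.0"))
```

A client with direct LDAP access would use such a DN as the search base or target entry. A client going through the Web Service never needs to see it.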
Web-Service Agent
• Mediator between user and registry
• Serviced via SOAP
• Contains a file format identifier module, FIDER
– Java module for format identification
– Uses file magic number
– Sequential from restrictive to general
[Figure: Client, Format Inquiry, Web Service Agent, Global Digital Format Registry]
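Magic-number identification can be sketched as follows. This is not FIDER itself (which the slides describe as a Java module) but a small Python illustration. It reads "sequential from restrictive to general" as trying the most specific signatures first, which is an interpretation. The signature table and format labels are illustrative assumptions, not the registry's contents.

```python
from typing import List, Optional, Tuple

# Ordered from most restrictive to most general: a version-specific
# signature must come before the generic one it is a special case of.
SIGNATURES: List[Tuple[str, bytes]] = [
    ("Adobe PDF v1.4", b"%PDF-1.4"),
    ("Adobe PDF (any version)", b"%PDF-"),
    ("GIF 1989a", b"GIF89a"),
    ("GIF 1987a", b"GIF87a"),
    ("JPEG 2000 (JP2)", bytes.fromhex("0000000c6a5020200d0a870a")),
]


def identify(data: bytes) -> Optional[str]:
    """Return the first format whose magic number prefixes the data."""
    for name, magic in SIGNATURES:
        if data.startswith(magic):
            return name
    return None  # no known signature matched


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n..."))   # matches the version-specific entry
    print(identify(b"%PDF-1.7\n..."))   # falls through to the generic entry
    print(identify(b"plain text"))      # no match
```

Ordering matters here: if the generic `%PDF-` entry came first, a PDF 1.4 file would never be reported with its specific version.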
Web-Service Agent
• Tailorability
– Specific needs of an existing preservation system can be met by custom-tailoring the Web Service.
• Interoperability
– Independent of OS and languages
• Convenience
– Multiple LDAP queries can be reduced to one Web Service function call.
– Any updates can be done in a single place, without having to distribute new modules to end users.
FOCUS - Supplementary Tools
• Validation Software
– Verifies and validates the file format of a given file.
• Rendering Software
– Interprets bit streams of files into a human-friendly representation on the screen.
• Editing Software
– Adds/deletes/modifies the contents of a given file, keeping the correct file format.
• Conversion Software
– Converts a file format to current or emerging formats.
FOCUS Service Model
[Figure: Web Service Agent, Format Registry, Identification Service, Validation Software, Rendering Software, Conversion Software]
• Identifies the format of a specific digital object (DO) using the internal signature.
• Determines a verification service to verify the format of a specific DO.
• Identifies current rendering conditions for a specific digital format.
• Locates transformation services to convert a DO from the source format to the format of interest.
Example Scenario: Digital Object Format Verification
[Figure: Web Service Agent, Format Registry, ID Service, Validation Service, Rendering Service, Conversion Service; message labels: "Format?", "Format ID / Format Info", "Verifier?", "App ID / App Info", "Verify this?", "Valid/Well-formed"]
Step 1: User requests identification of the format of a file via the Web Service.
Step 2: Registry returns the format ID and format information.
Step 3: User requests information on available verifiers for this format.
Step 4: Registry returns the validation service ID and information, such as its service location.
Step 5: User connects to the validation service and verifies the format.
Step 6: Validation service returns the verification result.
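The six steps of this scenario can be sketched with in-memory stand-ins. Everything named here is hypothetical: the class and method names, the IDs, the `local://` location scheme and the toy PDF check are assumptions made for illustration. A real deployment would issue SOAP calls to the Web Service Agent and to the validation service instead of calling Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FormatInfo:
    format_id: str
    name: str


@dataclass
class ServiceInfo:
    service_id: str
    location: str


class WebServiceAgent:
    """In-memory stand-in for the agent that fronts the format registry."""

    def __init__(self) -> None:
        self._signatures = {b"%PDF-": FormatInfo("fmt-pdf", "Adobe PDF")}
        self._verifiers = {
            "fmt-pdf": ServiceInfo("svc-pdf-validator", "local://pdf-validator")
        }

    def identify_format(self, data: bytes) -> Optional[FormatInfo]:
        for magic, info in self._signatures.items():
            if data.startswith(magic):
                return info
        return None

    def find_verifier(self, format_id: str) -> Optional[ServiceInfo]:
        return self._verifiers.get(format_id)


def toy_pdf_validator(data: bytes) -> str:
    """Toy check only: header present and an end-of-file marker at the end."""
    ok = data.startswith(b"%PDF-") and data.rstrip().endswith(b"%%EOF")
    return "Valid/Well-formed" if ok else "Not well-formed"


# Maps a service location to something callable; stands in for the network.
SERVICES: Dict[str, Callable[[bytes], str]] = {
    "local://pdf-validator": toy_pdf_validator,
}


def verify_object(agent: WebServiceAgent, data: bytes) -> str:
    fmt = agent.identify_format(data)               # Steps 1-2
    if fmt is None:
        return "Unknown format"
    service = agent.find_verifier(fmt.format_id)    # Steps 3-4
    if service is None:
        return "No verifier available"
    return SERVICES[service.location](data)         # Steps 5-6


if __name__ == "__main__":
    print(verify_object(WebServiceAgent(), b"%PDF-1.4\n...\n%%EOF\n"))
```

The client only learns the validation service's location at run time from the registry, so a verifier can be replaced or relocated without changing client code.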
Demo
Conclusion
• FOCUS design offers maximum
– Flexibility: the Web Service Agent can be easily tailored to meet the various needs of different preservation institutions.
– Scalability: the format registry can also be distributed.
• FOCUS integrates current format preservation techniques and makes them available through a SOAP-based web interface.
• In summary, we believe that the FOCUS prototype represents a significant advance towards the development of a secure and scalable digital format registry.