Versioning of Digital Objects in a Fedora-based Repository
Matthias Razum
FIZ Karlsruhe
DORSDL Workshop
Alicante
September 21, 2006
September 21, 2006, ECDL – DORSDL Workshop, Alicante
Outline
• Motivation
• Versioning Concepts in eSciDoc
• Content Models
• Technical Approach
• Conclusion
Project Setup and Mission
• eSciDoc is a joint project of the Max Planck Society (MPS) and FIZ Karlsruhe
• Funded by a five-year, 6 million € grant (2004 – 2009) from the German Federal Ministry of Education and Research
• It aims to build an integrated information, communication and publishing platform for web-based scientific work, demonstrated by way of example for multi-disciplinary applications in the MPS
• eSciDoc is not a mere research project; it aims to establish an innovative production system
Repositories for eScience
• The contents of an institutional repository or a digital library form the ‘institutional memory’ of an organization
• And just like human memory, they should allow for associating information objects in novel contexts, thus creating new scholarship
• Interdisciplinary work is becoming increasingly important, so systems have to span scientific disciplines
• Repositories should be open, application-independent and flexible, thus laying the ground today for repurposing the information in future applications
Turning Static Objects into ‘Living’ Knowledge
• e-Scholarship makes it possible to publish all intermediate results of knowledge generation, from first ideas, theories, and discussions with peers to final results
• Institutional Repositories and Digital Libraries need to support scholars from the early steps of this process onwards, enabling them to share their work in progress with peers
• Thinking a step further leads to interactive authoring environments with support for collaboration and annotations
• As a result, objects lose their static nature and become ‘active nodes’ in a network of knowledge
Implications
• The concept of ‘ownership’ of an artifact is loosened and partly replaced by an ongoing authoring process which spans persons, places, and time
• Collaborative authoring raises an issue familiar to software developers: versioning of digital objects
• All intermediate or working versions of artifacts should become part of the repository, not just the final versions
• Good Scientific Practice requires provenance data for objects and versioning
Versioning on Object Level
• Fedora’s basic object model – as defined in FOXML – is composed of an identifier, some key descriptive properties and a set of datastreams
• Currently, each change to a datastream leads to a new version of the datastream, but not of the object itself.
• On the other hand, authors and editors perceive objects as one coherent entity, not as a set of datastreams.
• They ask for ‘whole-object’ versioning that matches their mental model.
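The gap between the two views can be illustrated with a minimal Python sketch. The class and method names (`DigitalObject`, `modify`, `snapshot`, `view`) are hypothetical and are not part of Fedora or eSciDoc. The model only mimics the behaviour described above: each datastream keeps its own history, and a whole-object version is a recorded combination of datastream versions.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Toy model: per-datastream histories plus explicit whole-object snapshots."""
    pid: str
    datastreams: dict = field(default_factory=dict)  # datastream id -> list of contents
    snapshots: list = field(default_factory=list)    # each entry: {datastream id -> version index}

    def modify(self, ds_id, content):
        # Datastream-level versioning: only this datastream gets a new version.
        self.datastreams.setdefault(ds_id, []).append(content)

    def snapshot(self):
        # Whole-object versioning: record which version of every datastream is current.
        state = {ds_id: len(history) - 1 for ds_id, history in self.datastreams.items()}
        self.snapshots.append(state)
        return len(self.snapshots) - 1

    def view(self, object_version):
        # Reassemble the object as it looked at the given whole-object version.
        state = self.snapshots[object_version]
        return {ds_id: self.datastreams[ds_id][idx] for ds_id, idx in state.items()}


obj = DigitalObject("demo:1")
obj.modify("DC", "title v1")
obj.modify("FULLTEXT", "draft")
v0 = obj.snapshot()
obj.modify("FULLTEXT", "final")
v1 = obj.snapshot()
```

In this sketch `obj.view(v0)` still returns the draft full text, although the `FULLTEXT` datastream has since moved on, which is the coherent-entity view that authors and editors expect.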
Fixed and Floating Object References
• Scholarly work strongly relies on citations and external references to existing material (e.g. primary data and supplementary material)
• In the context of digital repositories, these associations are expressed as object relations.
• Versioning of objects then raises the question of how to handle relations pointing to a versioned object.
• eSciDoc implements two approaches: fixed relations, which point to exactly one given version of an object, and floating relations, which always point to the latest version of an object.
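A small sketch of how the two relation types could be resolved. It assumes that a relation stores a target identifier plus an optional version number; the names `Relation` and `resolve` and the PID `item:7` are made up for illustration and do not reflect the eSciDoc implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    target_pid: str
    target_version: Optional[int] = None  # None marks a floating relation


def resolve(relation, latest_versions):
    """Return the (pid, version) pair the relation points to right now."""
    if relation.target_version is not None:
        # Fixed relation: always the version that was recorded.
        return relation.target_pid, relation.target_version
    # Floating relation: whatever the latest version currently is.
    return relation.target_pid, latest_versions[relation.target_pid]


latest = {"item:7": 3}
fixed = Relation("item:7", target_version=2)
floating = Relation("item:7")

before = (resolve(fixed, latest), resolve(floating, latest))
latest["item:7"] = 4  # a new version of item:7 is created
after = (resolve(fixed, latest), resolve(floating, latest))
```

After the new version is created, only the floating relation resolves to a different target; the fixed relation keeps pointing to version 2.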
Internal and Public Versions
• Versions represent intermediate working states and are visible only to the authors of a digital object.
• Revisions are published versions of objects with persistent identifiers.
• Creating a revision is an intellectual step which most often includes some form of quality assurance, whereas versioning is an automated process.
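The distinction can be sketched as a simple visibility rule. The assumption here is that a revision is a version that carries a persistent identifier; `Version` and `visible_versions` are hypothetical names, and the identifier is the sample value used on the Object Version XML slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    version_id: str
    persistent_id: Optional[str] = None  # set only when the version is released as a revision

    @property
    def is_revision(self):
        return self.persistent_id is not None


def visible_versions(history, is_author):
    """Authors see every version; everyone else sees revisions only."""
    return [v for v in history if is_author or v.is_revision]


history = [
    Version("1.0"),
    Version("1.1"),
    Version("1.2", persistent_id="doi:10.11.1234"),  # sample identifier
]
public_ids = [v.version_id for v in visible_versions(history, is_author=False)]
author_ids = [v.version_id for v in visible_versions(history, is_author=True)]
```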
Container Objects
• eSciDoc allows the grouping of objects by means of container objects like collections or bundles.
• A change to one of the contained objects substantially changes the container object as well. Therefore, any change to a contained object should lead to a new version of the container object.
• The same applies to revisioning: container objects are citable objects with their own persistent identifier. Revisioning of contained objects forces a new revision of the container object too.
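The propagation rule can be sketched as an upward walk through the container hierarchy. This is an illustration only: `Node` and its methods are hypothetical, and the sketch assumes that containers form an acyclic hierarchy.

```python
class Node:
    """Hypothetical versioned object; containers hold other nodes as members."""

    def __init__(self, pid):
        self.pid = pid
        self.version = 1
        self.revision = 0
        self.parents = []  # containers this node is a member of

    def add_member(self, member):
        member.parents.append(self)

    def new_version(self):
        self.version += 1
        for parent in self.parents:  # a changed member changes its container
            parent.new_version()

    def new_revision(self):
        self.revision += 1
        for parent in self.parents:  # revisioning propagates upwards as well
            parent.new_revision()


collection = Node("collection:1")
bundle = Node("bundle:1")
item = Node("item:1")
collection.add_member(bundle)
bundle.add_member(item)

item.new_version()
item.new_revision()
```

A single change to `item` bumps the version of `bundle` and `collection` as well, which shows why container versioning multiplies the number of versions to be tracked.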
Content Models in General
• An important part of implementing a Fedora repository is modeling the different classes or “genres” of digital objects that will be created, stored, and managed in the repository.
• A content model will typically describe the following:
  – Datastream composition
    • the number and kinds of datastreams that must be present in the digital object
    • the format(s) for those datastreams, either MIME or format identifiers
    • whether each kind of datastream is required or optional
    • whether each kind of datastream has cardinality constraints
  – Semantic identifiers for each kind of datastream relationships
    • in the cases where a content model is a “graph” of related content models
  – Disseminators (optional)
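The datastream-composition part of a content model can be thought of as a set of rules to validate an object against. The following is a minimal sketch under that assumption; `DatastreamRule`, `validate` and the sample "article" rules are invented for illustration and are not a Fedora or eSciDoc API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatastreamRule:
    kind: str
    mime_types: tuple
    min_count: int = 0                # 0 means the datastream is optional
    max_count: Optional[int] = None   # None means no upper bound


def validate(datastreams, rules):
    """Check (kind, mime_type) pairs against the rules; return a list of problems."""
    problems = []
    known = {rule.kind for rule in rules}
    for kind, _mime in datastreams:
        if kind not in known:
            problems.append(f"unexpected datastream kind: {kind}")
    for rule in rules:
        found = [mime for kind, mime in datastreams if kind == rule.kind]
        if len(found) < rule.min_count:
            problems.append(f"{rule.kind}: expected at least {rule.min_count}, found {len(found)}")
        if rule.max_count is not None and len(found) > rule.max_count:
            problems.append(f"{rule.kind}: expected at most {rule.max_count}, found {len(found)}")
        for mime in found:
            if mime not in rule.mime_types:
                problems.append(f"{rule.kind}: format {mime} not allowed")
    return problems


# Hypothetical content model for an "article" object.
article_rules = [
    DatastreamRule("METADATA", ("text/xml",), min_count=1, max_count=1),
    DatastreamRule("FULLTEXT", ("application/pdf",), min_count=1),
    DatastreamRule("THUMBNAIL", ("image/png", "image/jpeg"), max_count=1),
]

ok = validate([("METADATA", "text/xml"), ("FULLTEXT", "application/pdf")], article_rules)
bad = validate([("FULLTEXT", "text/plain")], article_rules)
```

Here `ok` is empty, while `bad` reports the missing `METADATA` datastream and the disallowed format of `FULLTEXT`.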
Structural View of Content Item
[Diagram. Nodes: Content Item, Content Component, CC License, License, Metadata, EssentialProperties, eSciDoc Metadata, CC Metadata. Relations with cardinalities: hasRevision (*), hasComponent (*), hasLicense (*, appears twice), hasMD (*), hasProperties (1), hasDefaultMD (1), hasMD (1).]
Content Item Modeled as Fedora Object
[Diagram: two Fedora objects and their datastreams, linked by hasComponent (*)]
• Content Item: RELS-EXT, eSciDoc MD, MD1 ... MDn, WOV MD
• Content Component: RELS-EXT, CC MD, License1 ... Licensen, Content Stream
Container Modeled as Fedora Object
[Diagram: two Fedora objects and their datastreams, linked by hasMember (*)]
• Container: RELS-EXT, eSciDoc MD, MD1 ... MDn, Structure Map, WOV MD
• Content Item: RELS-EXT, eSciDoc MD, MD1 ... MDn, WOV MD
Whole-Object Versioning Metadata
• Fedora versioning works automatically within objects
• The eSciDoc middleware keeps track of whole object versions via objectVersion metadata
• The eSciDoc middleware can also tag particular whole-object versions as “revisions”, which are the officially published views of the object
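A sketch of how a middleware layer might append such metadata entries. The element and attribute names (`objectVersion`, `versionID`, `revisionID`, `comment`, `component`, `PID`, `dateTime`) follow the Object Version XML slide; the wrapper element `wov` and the function `record_version` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def record_version(wov, version_id, components, comment, revision_id=None):
    """Append one objectVersion entry; components maps PID -> datastream timestamp."""
    entry = ET.SubElement(wov, "objectVersion", versionID=version_id)
    if revision_id is not None:
        entry.set("revisionID", revision_id)  # marks this version as a published revision
    ET.SubElement(entry, "comment").text = comment
    for pid, timestamp in components.items():
        ET.SubElement(entry, "component", PID=pid, dateTime=timestamp)
    return entry


wov = ET.Element("wov")  # hypothetical wrapper element, not taken from the slides
record_version(
    wov, "1.0",
    {"child:5": "2006-05-10T12:21:57Z"},
    "first whole object version",
)
record_version(
    wov, "1.1",
    {"child:5": "2006-05-10T12:21:57Z", "child:7": "2006-08-11T09:23:09Z"},
    "child:7 ingested",
    revision_id="doi:10.11.1234",
)
xml_text = ET.tostring(wov, encoding="unicode")
```

Each entry pins every component to a timestamp, so a whole-object version stays reconstructible even after individual components have changed.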
Animated View
[Animation: Content Item parent:1 and its components CC1, CC2, CC3 (PIDs child:1, child:2, child:3) over the time steps t0 to t4. Each row is one whole-object version; the component columns give the time step of the component version in use.]

Time  VersionID  DOI                   child:1  child:2  child:3
t0    1.0        --                    t0       t0       (none)
t1    1.1        --                    t0       t1       (none)
t2    1.2        --                    t0       t1       t2
t3    1.3        x.y/rev:1 (Revision)  t0       t1       t2
t4    1.4        --                    t4       t1       t2
Object Version XML
<objectVersion versionID="1.0">
  <comment>this is the first whole object version</comment>
  <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
  <component PID="child:6" dateTime="2006-05-10T12:21:57Z"/>
</objectVersion>
<objectVersion versionID="1.1" revisionID="doi:10.11.1234">
  <comment>child:5 is the same; child:6 modified; child:7 ingested</comment>
  <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
  <component PID="child:6" dateTime="2006-08-11T09:23:09Z"/>
  <component PID="child:7" dateTime="2006-08-11T09:23:09Z"/>
</objectVersion>
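Reading this metadata back is what lets a client reassemble a particular whole-object version. The sketch below parses the two entries from the slide with Python's standard library. The wrapper element `wov` is an assumption added so that the fragment has a single root, and `components_at` and `revisions` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

WOV_XML = """
<wov>
  <objectVersion versionID="1.0">
    <comment>this is the first whole object version</comment>
    <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
    <component PID="child:6" dateTime="2006-05-10T12:21:57Z"/>
  </objectVersion>
  <objectVersion versionID="1.1" revisionID="doi:10.11.1234">
    <comment>child:5 is the same; child:6 modified; child:7 ingested</comment>
    <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
    <component PID="child:6" dateTime="2006-08-11T09:23:09Z"/>
    <component PID="child:7" dateTime="2006-08-11T09:23:09Z"/>
  </objectVersion>
</wov>
"""


def components_at(root, version_id):
    """Map each component PID to the timestamp pinned by the given whole-object version."""
    for entry in root.iter("objectVersion"):
        if entry.get("versionID") == version_id:
            return {c.get("PID"): c.get("dateTime") for c in entry.findall("component")}
    raise KeyError(version_id)


def revisions(root):
    """Map each revision identifier to the whole-object version it was assigned to."""
    return {
        entry.get("revisionID"): entry.get("versionID")
        for entry in root.iter("objectVersion")
        if entry.get("revisionID")
    }


root = ET.fromstring(WOV_XML)
first = components_at(root, "1.0")
second = components_at(root, "1.1")
published = revisions(root)
```

Comparing `first` and `second` shows that child:5 keeps its timestamp, child:6 gets a newer one, and child:7 appears only in version 1.1.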
Conclusion
• Versioning is essential for repositories which cover the whole object lifecycle
• Fedora already comes with a powerful versioning mechanism, but cannot fulfill all requirements of eSciDoc
• Atomistic content models make versioning even more complex
• The proposed approach provides a solution for advanced versioning requirements and at the same time demonstrates Fedora’s flexibility and adaptability
Acknowledgements
The concepts in this presentation are based on
• eSciDoc’s Logical Data Model, created by Natasa Bulatovic (ZIM, Max Planck Society)
• a joint workshop of ZIM and FIZ with Sandy Payette and Carl Lagoze
Questions
[email protected]/homepage.html