-
A strategic view of document and digital object management for the University of the Witwatersrand, Johannesburg
Prof Derek W. Keats
Deputy Vice Chancellor (Knowledge & Information Management)
The University of the Witwatersrand, Johannesburg
http://[email protected]
-
-
What are documents?
How does the computer 'see' them?
-
The storage view
-
The manipulation view
-
The structural view
-
The operational view
-
The storage view
The operational view
The manipulation view
The structural view
-
Require software that understands the 'document' and knows how to present it.
The storage view
The operational view
The manipulation view
The structural view
Time
-
The future
Today
Physical deterioration
Digital obsolescence
Accidental damage
Loss of metadata
Survival
Devices
File formats
-
A major threat to proprietary file formats common in proprietary systems
Today
Physical deterioration
Digital obsolescence
Accidental damage
Loss of metadata
Survival
Devices
File formats
-
Device obsolescence
-
File format obsolescence
Software supporting the format fails in the marketplace or is bought by a competitor and withdrawn.
-
File format obsolescence
Software upgrades fail to support legacy files
The format itself is superseded by another or evolves in complexity
The format "take up" is low or industry fails to create compatible software
The format fails, stagnates, or is no longer compatible with the current environment
-
A small subset of commonly used media formats!
Media
-
If you don't have the software, even a perfectly preserved document is of no use.
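One way to make this risk visible is to take stock of which formats a collection actually holds. The following is a minimal sketch, assuming file extensions are a good enough first approximation of format; the function name `count_by_extension` is hypothetical and not part of any tool named in these slides.

```shell
#!/bin/bash
# Hypothetical helper: a rough format inventory, keyed on file extension only.
# Extensions can be missing or wrong, so treat the result as a first pass.

# count_by_extension DIR: print "count extension" lines for files under DIR,
# folding upper and lower case together and sorting by extension.
count_by_extension() {
    find "$1" -type f -name '*.*' \
        | sed 's/.*\.//' \
        | tr 'A-Z' 'a-z' \
        | sort \
        | uniq -c \
        | awk '{print $1, $2}'
}
```

Running `count_by_extension /path/to/collection` gives a quick list of the formats that will each need supporting software in the future.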
-
Digitization
Document management
Born digital
Digital recovery
Digital archiving
Digital preservation
Analogue
Digital
Time
Digital assets
Risk without long term planning
-
As a component of how we manage our digital assets
-
Why digital asset management?
● We are a knowledge organization
● Knowledge workers spend 30-40% of their time on document related tasks
● This increases significantly when other digital assets are taken into consideration
● Digital assets are increasing and increasingly easy to lose
● Digital assets form the basis of much of our research
-
Digital archiving and preservation
● Institutional papers and documents
Other digital assets
● Historical papers
● Library collections
● Various history projects
● Rock art collections
● Video and audio collections
● e.g. Wits TV
● Donations of significant collections from industry
● History of human evolution research
● Research output and theses
● Research data
-
The curse of the born-analogue
-
Capture
Create
Classify
Share
Archive
Destroy
Protect
Retain
Find & use
Preserve
Route
-
Creating semantic and socially connected
document stores
archives
repositories
museums
herbaria
21st Century
-
Chisimba
Semantic and social 'X'
● Fedora commons
● Fedora commons SWORD API
● Chisimba
Fedora Commons
SWORD API
Chisimba API
XMPP
eLearning
'Portals'
-
Workflow
WEWE
-
Workflow
-
WeWe Basics
● Rules-driven workflow engine
● Rules represented in XML
● Sequential event support
● Conditional Return support
● Written in Perl
● Uses PostgreSQL Database
● Open Source
● Originally developed for The University of the Witwatersrand, Johannesburg
● Multiple Management interfaces
-
WeWe Designer
● Web-based design tool for designing workflows
● Supports multiple events with multiple return types/states
● Drag and drop interface
● Written in jQuery
● Open Source Interface
● Adapt from Design “Template” support
-
WeWe Developer
● Developers create Rules Modules
● Modules can be written in Perl or any other language that can be executed from the Linux command line
● API
● Command line Interface
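The idea of modules that run from the command line, chained as sequential events with a conditional return, can be sketched in a few lines of shell. This is an illustration only, not WeWe's engine, rule format, or API: `run_workflow` and the three stand-in modules are hypothetical names, and the exit status of each module is assumed to play the role of its return state.

```shell
#!/bin/bash
# Hypothetical sketch, NOT the WeWe engine or its API.
# A "module" is anything that can be executed; its exit status is its return state.

# run_workflow STEP...: run steps in order. A zero return moves on to the next
# step; any other return stops the workflow and is passed back to the caller.
run_workflow() {
    local step status
    for step in "$@"; do
        if "$step"; then
            status=0
        else
            status=$?
        fi
        if [ "$status" -ne 0 ]; then
            echo "$step: returned $status, stopping"
            return "$status"
        fi
        echo "$step: ok"
    done
    echo "workflow complete"
}

# Stand-in modules (assumptions for illustration). Real modules could be
# Perl scripts or any other executable on the PATH.
capture()  { return 0; }
classify() { return 0; }
reject()   { return 3; }
```

Calling `run_workflow capture classify` runs both steps to completion, while `run_workflow capture reject classify` stops at `reject` and never reaches `classify`.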
-
Workflow Process
-
Enterprise document management
An approach using private cloud
[Diagram labels: Folder server; WEWE; Chisimba; Private cloud infrastructure; Site; Shared folder; Network; WWW; Ingest; Born digital]
Workflow managed by WEWE layer
-
[Diagram labels: Hosted services; Digital archive; Virtualization; Chisimba; Fedora; Other; Wits portals; eLearning; email; Zimbra; iRODS; WEWE; SOA layer; OS: Open Solaris; Private cloud infrastructure; Remote site; Compute cloud; Hierarchical storage: Robotic tape library, Spinning disks, Flash memory]
-
Private cloud infrastructure
Use in establishing digital archive
[Diagram labels: Compute cloud; Storage cloud; Robotic tape library; First tier storage; Digital archive; Fedora; WEWE; Chisimba; Archon; SOA layer; OS: Open Solaris; Remote site; Source artifacts; Digital conversion; Born digital: Docs, Audio, Video, etc; WEWE rules; Ingest]
-
Private cloud infrastructure
Use in establishing digital archive
[Diagram labels: Compute cloud; Storage cloud; Robotic tape library; First tier storage; Digital archive; Fedora; WEWE; Chisimba; Archon; SOA layer; OS: Open Solaris; Remote site; Source artifacts; Digital conversion; Born digital: Docs, Audio, Video, etc; WEWE rules; Ingest; Scanning & assembly]
-
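#!/bin/bash
# Takes one argument: the name of the output document (without .pdf)
# Scan in the pages
scanadf --mode "Black & White" --resolution 200
# Convert each page to a pdf file
# (loop header reconstructed; that line was lost from the slide text)
for file in image*; do
    convert "$file" "$file.pdf"
    rm "$file"
done
# Concatenate all the individual pdf files
pdftk image*.pdf cat output "$1.pdf"
rm image*.pdf
# Send the finished document to the monitored folder
mv "$1.pdf" "/home/$USER/monitored/outgoing/"
exit 0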
The real challenge is getting the document scanned and into a PDF and sent off to somewhere meaningful.
That's why we need expensive document imaging software.
Right?
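The script above ends by dropping the finished PDF into a monitored folder. What happens on the receiving side is not shown in these slides, so the following is only a sketch under assumptions: `sweep_outgoing` is a hypothetical name, and the destination stands in for wherever the workflow layer would pick documents up.

```shell
#!/bin/bash
# Hypothetical sketch of one pass over a monitored folder.
# Not taken from WEWE; paths and names are assumptions for illustration.

# sweep_outgoing SRC DEST: move every PDF found in SRC into DEST and print
# how many files were moved. Non-PDF files are left where they are.
sweep_outgoing() {
    local src=$1 dest=$2 moved=0 f
    mkdir -p "$dest"
    for f in "$src"/*.pdf; do
        [ -e "$f" ] || continue   # the glob matched nothing
        mv -- "$f" "$dest/" && moved=$((moved + 1))
    done
    echo "$moved"
}
```

Run from cron or a simple loop, a sweep like this is enough to hand scanned documents on to the next stage without any dedicated imaging software.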
-
Let's have one digital asset management project for Wits and let us create the synergy
that leads to innovation.
-
-
Attribution file: http://www.dkeats.com/usrfiles/users/1563080430/attribution/attrib.txt