Foundations of Excellence
DSpace vs Fedora: Or what I do on my summer vacation
TRLN: Staff Enrichment Series: 8 Nov, 2007
Objectives
  Background: Why we even considered a digital repository
  FOE – version 1
  DSpace & Fedora: 50,000 foot view
  FOE – version 2
  FOE – version 3
  Where to from here?
Background
75th Anniversary
  Duke University School of Medicine established in 1930
  2005 – year-long celebration
    New published history
    Articles, videos, speeches
    Alumni weekend gala event
  Josiah C. Trent Foundation Grant
Digitization Project
  500 images documenting the first 3 decades of the School of Medicine and Hospital
  Image groups:
    Buildings
    Education
    Events
    Clinical
    People
    Technology
Digitization Project (cont.)
  Selection – Whole staff
  Digitization – Outsourced to University Photography
  Description – Technical services and Reference coordinators
  Subject terms – Technical services coordinator, Head, Cataloging services
  Controlled vocabulary – Notetab templates and libraries
FOE1.0
  XML, XSLT, and PostgreSQL
FOE1.0
  600 images = 600 XML files = 2 XSLT stylesheets
  XML = EAD2002
  XSLT = 1) convert XML to HTML; 2) convert XML to SQL statements
  PostgreSQL database used only for search
  Result: http://archives.mc.duke.edu/projects/bld/bld00012.html
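The XML-to-SQL step above can be sketched in a few lines. This is not the original XSLT stylesheet: it is a minimal Python stand-in that assumes a tiny EAD-style record (the sample title and description are made up for illustration), uses the standard library's xml.etree instead of XSLT, and uses an in-memory SQLite database in place of PostgreSQL. The table and function names are hypothetical.

```python
import sqlite3
from xml.etree import ElementTree as ET

# Hypothetical EAD-style item record; only the element names follow EAD 2002.
SAMPLE_RECORD = """
<c level="item">
  <did>
    <unitid>bld00012</unitid>
    <unittitle>Example building photograph</unittitle>
  </did>
  <scopecontent><p>Hypothetical description used only for this sketch.</p></scopecontent>
</c>
"""


def record_to_row(xml_text):
    """Pull the searchable fields out of one XML record."""
    root = ET.fromstring(xml_text)
    unitid = root.findtext("did/unitid", default="").strip()
    title = root.findtext("did/unittitle", default="").strip()
    description = " ".join(
        "".join(p.itertext()).strip() for p in root.iterfind("scopecontent/p")
    )
    return unitid, title, description


def build_search_db(records):
    """Load every record into a search-only table (SQLite stands in for PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (unitid TEXT PRIMARY KEY, title TEXT, description TEXT)"
    )
    # Parameterized inserts avoid the quoting problems of hand-built SQL strings.
    conn.executemany(
        "INSERT INTO images VALUES (?, ?, ?)", [record_to_row(r) for r in records]
    )
    return conn


def search(conn, term):
    """Return the ids of records whose title or description contains the term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT unitid FROM images WHERE title LIKE ? OR description LIKE ? "
        "ORDER BY unitid",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


conn = build_search_db([SAMPLE_RECORD])
print(search(conn, "building"))
```

The database holds only what search needs; the HTML pages themselves would still come from the other stylesheet.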
Issues
  SQL search statements worked…not
  No indexing by search engines
  JDBC
  I am not a programmer
  Definite need for improvements
DSpace & Fedora: A Bird's-eye View
Need for a Digital Repository
  DSpace
    First released in 2002. Developed by MIT Libraries and Hewlett-Packard (USA Today)
    Current version (download)
    Optimal performance in a *nix environment, but should operate in any environment
    Written in Java
    VERY active listservs
    Manakin – TAMU-created “front-end” which makes for easier UI localization
Need for a Digital Repository (cont.)
  FEDORA (Flexible Extensible Digital Object and Repository Architecture)
    Began as a DARPA and NSF-funded research project at Cornell in 1997
    2001, UVA and Cornell: $1M Mellon grant
    1.0 released 2003
    Current version 2.2.1 (download)
    Optimal performance in a *nix env, but will run on Windows based systems
    Written in Java
    Several front-end tools developed (more in a moment)
Side by side testing
  Testing environment:
    Lenovo T60, 120 GB hard drive, 2 GB memory, Fedora 7 Linux, 2.6.23 kernel, Java 1.5
Requirements
  DSpace
    Java 1.4+
    Apache Ant 1.6.2+
    PostgreSQL 7.3+ (or Oracle 9+)
    Jakarta Tomcat 4.x/5.x (I used 6.x)
    Can also run on Jetty or Caucho Resin
  Fedora
    JDK 1.5+
    Optional
      MySQL
      PostgreSQL
      Oracle 9
      Jakarta Tomcat
      Ant 1.6.5+ if building from source code
File Size & Download times
  DSpace
    16 MB
    1:43 over a T1 line
    1:13 on a T line
  Fedora
    72 MB
    7:49 over a T1 line
    1:53 over a T line
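As a sanity check, the timings above can be converted to effective throughput. A minimal sketch, assuming 1 MB = 1,000,000 bytes and that the sizes on the slide are exact; the function name is illustrative only.

```python
def effective_mbps(size_mb: float, elapsed: str) -> float:
    """Effective throughput in Mbit/s for a download timed as 'm:ss'."""
    minutes, seconds = elapsed.split(":")
    total_seconds = int(minutes) * 60 + int(seconds)
    return size_mb * 8 / total_seconds


# (package, size in MB, elapsed time, line) exactly as reported on the slide
downloads = [
    ("DSpace", 16, "1:43", "T1 line"),
    ("Fedora", 72, "7:49", "T1 line"),
    ("DSpace", 16, "1:13", "T line"),
    ("Fedora", 72, "1:53", "T line"),
]

for package, size_mb, elapsed, line in downloads:
    rate = effective_mbps(size_mb, elapsed)
    print(f"{package} over a {line}: {rate:.2f} Mbit/s")
```

Both T1 downloads come out at roughly 1.2 Mbit/s, which sits under the nominal 1.544 Mbit/s T1 rate, so the two T1 timings are consistent with each other.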
Installation time
  DSpace
    PostgreSQL installation and set up: 8 minutes
    Ant build and configuration: 8 minutes
    DSpace/Tomcat configuration and deployment: 8 minutes
    Total time to live: 24 minutes
  Fedora
    PostgreSQL installation and set up: 8 minutes
    Fedora install: 5 minutes
    Total time to live: 13 minutes
Initial Live View
  DSpace Front Page
  Fedora Front Page
FOE2.0
  Choosing our Digital Repository
Deciding Factors
  DSpace
    Off-the-shelf view
    Workflow process
    Individual submitters, one project admin
    Item submission form (link here)
    Bulk load script (dc, item, mapfile)
    Searchbot harvestable
    OAI harvestable
  Fedora
    Off-the-shelf view
    One submitter
    Item submission not intuitive (link)
    Bulk load script (FOXML)
    Content Models (will return)
    Disseminators
    Behavior Definitions
    Would require extensive programming
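The "dc, item" part of the DSpace bulk load can be sketched as below. This is not the script used for the project: it is a hedged illustration that assumes the DSpace Simple Archive Format layout (one directory per item holding dublin_core.xml, a contents file, and the files themselves). The helper name, directory names, and image file name are hypothetical, and the image bytes are a placeholder. The mapfile is not written here; as far as I understand it, the DSpace importer produces it to record which item directory became which handle.

```python
import tempfile
from pathlib import Path
from xml.etree import ElementTree as ET


def write_item(batch_dir, item_name, dc_fields, files):
    """Write one item directory: dublin_core.xml, contents, and the bitstreams.

    dc_fields is a list of (element, qualifier, value) tuples.
    files is a list of (file_name, bytes) tuples.
    """
    item_dir = Path(batch_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per metadata field.
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc_fields:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # contents: one file name per line, then the files themselves.
    listing = "".join(f"{name}\n" for name, _ in files)
    (item_dir / "contents").write_text(listing, encoding="utf-8")
    for name, data in files:
        (item_dir / name).write_bytes(data)
    return item_dir


# Metadata values taken from the search snippet quoted elsewhere in this deck.
fields = [
    ("title", "none", "A. Jack Tannenbaum"),
    ("date", "issued", "2005-11-10"),
    ("description", "abstract",
     "A. Jack Tannenbaum received his medical degree from Duke University in 1935."),
]

with tempfile.TemporaryDirectory() as tmp:
    item = write_item(tmp, "item_000", fields, [("per00165.jpg", b"placeholder")])
    print(sorted(p.name for p in item.iterdir()))
```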
FOE2.0 = DSpace: Cup is Half Full
  March 2006
  Foundations new home
  Data submission form
  Item View bld00012
  Item Update
  Access Restrictions
  Handle server
FOE2.0 = DSpace: Cup is Half Empty
  Object is entered as one item
  DSpace is self-contained
  No real way to show complex relationships
  All or nothing metadata
  Access Restrictions
  Handle server
  Searchbot indexing:
    DSpace@DukeMed: Item 2193/77
    Title:, A. Jack Tannenbaum. Issue Date:, 10-Nov-2005 ... Abstract:, A. Jack Tannenbaum received his medical degree from Duke University in 1935. ...
FOE3.0
  “Our goal is to never be satisfied”
Content Models
  Reusing datastreams
  (next 2 slides borrowed from EDUCAUSE 2004 presentation by Grizzle, Wayland, and Wilper)
Atomistic Model
  [Diagram: a TEI etext of a 3-page letter stored as one object (Persistent ID (PID), Disseminators, System Metadata, text with three image pointer tags). The pointers refer to the Page 1, Page 2, and Page 3 images, each stored as a separate object with its own Persistent ID (PID), Disseminators, System Metadata, and image.]
Compound Model
  [Diagram: a TEI etext of a 3-page letter stored as a single object (Persistent ID (PID), Disseminators, System Metadata) that holds the text with three image pointer tags together with the screen size images for pages 1, 2, and 3.]
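The difference between the two models can be sketched with plain data structures. This is a conceptual illustration only, not the Fedora API: the class names, datastream ids, and PIDs (such as demo:letter1) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Datastream:
    ds_id: str
    mime_type: str


@dataclass
class DigitalObject:
    pid: str
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    related_pids: List[str] = field(default_factory=list)

    def add(self, datastream: Datastream) -> None:
        self.datastreams[datastream.ds_id] = datastream


def build_atomistic(letter_pid: str, page_count: int) -> List[DigitalObject]:
    """One object for the text plus one separate object per page image."""
    letter = DigitalObject(pid=letter_pid)
    letter.add(Datastream("TEI", "text/xml"))
    objects = [letter]
    for page in range(1, page_count + 1):
        image = DigitalObject(pid=f"{letter_pid}-page{page}")
        image.add(Datastream("IMAGE", "image/jpeg"))
        # The text object only points at the image object.
        letter.related_pids.append(image.pid)
        objects.append(image)
    return objects


def build_compound(letter_pid: str, page_count: int) -> DigitalObject:
    """A single object that carries the text and every page image itself."""
    letter = DigitalObject(pid=letter_pid)
    letter.add(Datastream("TEI", "text/xml"))
    for page in range(1, page_count + 1):
        letter.add(Datastream(f"PAGE{page}", "image/jpeg"))
    return letter


atomistic = build_atomistic("demo:letter1", 3)
compound = build_compound("demo:letter1", 3)
print(len(atomistic), "objects in the atomistic model")
print(len(compound.datastreams), "datastreams in the one compound object")

# Reuse: in the atomistic model another object can point at an existing page image.
exhibit = DigitalObject(pid="demo:exhibit1", related_pids=[atomistic[1].pid])
print(exhibit.related_pids)
```

In the atomistic layout each page image has its own PID, so it can be referenced from more than one place; in the compound layout the images travel with the letter and are reachable only through it.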
An old favorite blanket
  2005-2007 Fedora minimally utilized
  Primarily used for archiving Library Administrative documents (Council and Management Team minutes, and Policies and procedures)
  Use of XACML policies to restrict access (156\.16\.\d{1,3}\.\d{1,3} lock down)
  Began looking at front-end GUIs
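The address pattern in the XACML bullet above can be tried out on its own. A minimal sketch in Python rather than in an XACML policy; the function name and the test addresses are illustrative, and the assumption is that the whole client address must match the pattern.

```python
import re

# Pattern copied from the slide: any address beginning with 156.16.
LOCKDOWN_PATTERN = re.compile(r"156\.16\.\d{1,3}\.\d{1,3}")


def is_allowed(ip_address: str) -> bool:
    """True when the entire address matches the lock-down pattern."""
    # fullmatch prevents a longer string from passing on a partial match.
    return LOCKDOWN_PATTERN.fullmatch(ip_address) is not None


for address in ["156.16.4.20", "156.160.4.20", "10.156.16.1"]:
    print(address, is_allowed(address))
```

Note that \d{1,3} also accepts values such as 999, so the pattern restricts the prefix without validating that the string is a well-formed IPv4 address.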
Front End tools
  Fez – A web front-end management system for Fedora that is developed in PHP. Fez functionality includes: Web-based browsing and searching; Semi-advanced searching; Complex security; Basic image handling; Dublin Core. http://espace.library.uq.edu.au/documentation/
  Elated – ELATED is a lightweight, general-purpose application for managing digital files. ELATED is built on top of the Fedora Repository system, and can be used as a digital assets management system, an institutional repository, or to meet other collection archiving, publishing and searching needs. Dublin Core metadata entry and search; Custom metadata by collection; Automatic previews for images; Collections with simple editorial workflow; Indexing and searching of content; User feedback, enabled by collection; Select and import existing Fedora objects. http://elated.sourceforge.net/
  Both require extensive programming for localization
External Forces at play
  Fall 2006 we began a project to digitize 10,000+ cytopathology slides.
    Images converted to JPEG2000 to improve user experience (example)
    Archives purchased Aware JPEG2000 Image Server
  History of Medicine image database, Historical Images in Medicine (HIM), needed new platform
Call out of the blue
  VTLS – Vital
  Open Repositories
FOE3.0 = Fedora/Vital: Cup is Half Full
  June 2007
  Foundations new home (link)
  Data submission (3 ways to enter items)
  Item View bld00012
  Object is entered as many datastreams (fedora view)
  Vital/Fedora/Aware…interoperability
  Complex relationships
  Multiple metadata streams
  Handle server
  Searchbot indexing:
    A. Jack Tannenbaum. | MeDSpace
    Description: A. Jack Tannenbaum received his medical degree from Duke University in 1935. ... per00165, A. Jack Tannenbaum. 302.3 kB, JPEG 2000 Image ...
FOE3.0 = Fedora/Vital: Cup is Half Empty
  Fedora is open source, Vital is not
  Customization possible with programming knowledge
  No way at this time to implement XACML policies (workarounds exist)
  Vital upgrades require full software installation
  Local customization can cause breaks in certain functions
Conclusions and obligatory links
Selected Links
  DSpace – http://dspace.org
  Manakin – http://di.tamu.edu/projects/xmlui/install
  Fedora – http://www.fedora-commons.org/
  Elated – http://elated.sourceforge.net/
  Fez – http://espace.library.uq.edu.au/documentation/
  Vital – http://vtls.com
  DSpace@DukeMed – http://dspace.mclibrary.duke.edu
  MeDSpace – http://medspace.mc.duke.edu/vital/access/manager/Index