Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation


Upload: julie-george

Post on 26-Dec-2015


Page 1: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Foundations of Excellence

DSpace vs Fedora: Or what I do on my summer vacation

Page 2: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

TRLN: Staff Enrichment Series: 8 Nov, 2007

Objectives

Background: Why we even considered a digital repository
FOE – version 1
DSpace & Fedora: 50,000 foot view
FOE – version 2
FOE – version 3
Where to from here?

Page 3: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Background

Page 4: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

75th Anniversary

Duke University School of Medicine established in 1930
2005 – year-long celebration
  New published history
  Articles, videos, speeches
  Alumni weekend gala event
Josiah C. Trent Foundation Grant

Page 5: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Digitization Project

500 images documenting the first 3 decades of the School of Medicine and Hospital
Image groups: Buildings, Education, Events, Clinical, People, Technology

Page 6: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Digitization Project (cont.)

Selection – Whole staff
Digitization – Outsourced to University Photography
Description – Technical services and Reference coordinators
Subject terms – Technical services coordinator; Head, Cataloging services
Controlled vocabulary – Notetab templates and libraries

Page 7: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

FOE1.0

XML, XSLT, and PostgreSQL

Page 8: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

FOE1.0

600 images = 600 XML files = 2 XSLT stylesheets
XML = EAD2002
XSLT = 1) convert XML to HTML; 2) convert XML to SQL statements
PostgreSQL database used only for search
Result: http://archives.mc.duke.edu/projects/bld/bld00012.html
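The second stylesheet's job, turning each XML record into rows for the search database, can be sketched in a few lines. This is only an illustration, written in Python rather than the XSLT the project used, with SQLite standing in for PostgreSQL. `unitid`, `unittitle`, and `abstract` are EAD 2002 elements, but which fields FOE actually loaded, the `images` table layout, and the sample record are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAD-style record; the real files are not shown in the slides.
SAMPLE_XML = """
<ead>
  <archdesc>
    <did>
      <unitid>bld00012</unitid>
      <unittitle>Example building photograph</unittitle>
      <abstract>Placeholder description used for illustration only.</abstract>
    </did>
  </archdesc>
</ead>
"""

def record_from_xml(xml_text):
    """Pull the searchable fields out of one XML record."""
    did = ET.fromstring(xml_text.strip()).find("./archdesc/did")
    return {
        "id": did.findtext("unitid", default="").strip(),
        "title": did.findtext("unittitle", default="").strip(),
        "abstract": did.findtext("abstract", default="").strip(),
    }

def load_records(conn, xml_texts):
    """Insert one row per XML file, using bound parameters instead of string-built SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images (id TEXT PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    rows = [record_from_xml(text) for text in xml_texts]
    conn.executemany(
        "INSERT INTO images (id, title, abstract) VALUES (:id, :title, :abstract)", rows
    )
    conn.commit()

def search(conn, term):
    """Substring search over title and abstract; returns matching record ids."""
    pattern = f"%{term}%"
    cur = conn.execute(
        "SELECT id FROM images WHERE title LIKE ? OR abstract LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return [row[0] for row in cur.fetchall()]
```

Binding parameters, rather than generating literal SQL text from the XML, sidesteps quoting problems when a description contains an apostrophe.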

Page 9: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Issues

SQL search statements worked…not
No indexing by search engines
JDBC
I am not a programmer
Definite need for improvements

Page 10: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

DSpace & Fedora: A Bird's-eye View

Page 11: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Need for a Digital Repository

DSpace
  First released in 2002. Developed by MIT Libraries and Hewlett-Packard (USA Today)
  Current version (download)
  Optimal performance in a *nix environment, but should operate in any environment
  Written in Java
  VERY active listservs
  Manakin – TAMU-created "front-end" which makes for easier UI localization

Page 12: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Need for a Digital Repository (cont.)

FEDORA (Flexible Extensible Digital Object and Repository Architecture)
  Began as a DARPA and NSF-funded research project at Cornell in 1997
  2001, UVA and Cornell: $1M Mellon grant
  1.0 released 2003
  Current version 2.2.1 (download)
  Optimal performance in a *nix env, but will run on Windows-based systems
  Written in Java
  Several front-end tools developed (more in a moment)

Page 13: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Side by side testing

Testing environment: Lenovo T60, 120 GB hard drive, 2 GB memory, Fedora 7, 2.6.23 kernel, Java 1.5

Page 14: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Requirements

DSpace
  Java 1.4+
  Apache Ant 1.6.2+
  PostgreSQL 7.3+ (or Oracle 9+)
  Jakarta Tomcat 4.x/5.x (I used 6.x)
  Can also run on Jetty or Caucho Resin

Fedora
  JDK 1.5+
  Optional:
    MySQL
    PostgreSQL
    Oracle 9
    Jakarta Tomcat
    Ant 1.6.5+ if building from source code

Page 15: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

File Size & Download times

DSpace
  16 MB
  1:43 over a T1 line
  1:13 on a T line

Fedora
  72 MB
  7:49 over a T1 line
  1:53 over a T line
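As a quick sanity check on these timings, the effective throughput of each download can be worked out from size and elapsed time. The sketch below assumes 1 MB means 10**6 bytes and uses the nominal T1 rate of 1.544 Mbit/s for comparison; the slide does not say which kind of line the second "T line" was, so it is left unnamed here.

```python
def effective_mbps(size_mb, minutes, seconds):
    """Effective throughput in megabits per second, taking 1 MB as 10**6 bytes."""
    total_seconds = minutes * 60 + seconds
    return size_mb * 8 / total_seconds

T1_MBPS = 1.544  # nominal T1 line rate, for comparison

# (package, line) -> (size in MB, minutes, seconds), copied from the slide
downloads = {
    ("DSpace", "T1 line"): (16, 1, 43),
    ("Fedora", "T1 line"): (72, 7, 49),
    ("DSpace", "T line"): (16, 1, 13),
    ("Fedora", "T line"): (72, 1, 53),
}

rates = {key: round(effective_mbps(*value), 2) for key, value in downloads.items()}

for (package, line), mbps in rates.items():
    print(f"{package} over {line}: {mbps} Mbit/s")
```

Both T1 downloads work out to roughly 1.2 Mbit/s, a little under the nominal line rate, so the two T1 timings are consistent with each other.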

Page 16: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

TRLN: Staff Enrichment Series: 8 Nov, 2007

Installation time

DSpace
  PostgreSQL installation and set up: 8 minutes
  Ant build and configuration: 8 minutes
  DSpace/Tomcat configuration and deployment: 8 minutes
  Total time to live: 24 minutes

Fedora
  PostgreSQL installation and set up: 8 minutes
  Fedora install: 5 minutes
  Total time to live: 13 minutes

Page 17: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Initial Live View

DSpace Front Page
Fedora Front Page

Page 18: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

FOE2.0

Choosing our Digital Repository

Page 19: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Deciding Factors

DSpace
  Off-the-shelf view
  Workflow process
  Individual submitters, one project admin
  Item submission form (link here)
  Bulk load script (dc, item, mapfile)
  Searchbot harvestable
  OAI harvestable

Fedora
  Off-the-shelf view
  One submitter
  Item submission not intuitive (link)
  Bulk load script (foxml)
  Content Models (will return)
  Disseminators
  Behavior Definitions
  Would require extensive programming
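The DSpace bulk load mentioned above works from a directory of items, each holding a `dublin_core.xml` file, a `contents` file listing the bitstreams, and the files themselves; the mapfile is written by the importer to record which item directory became which repository item. A minimal sketch of preparing one such item directory follows. The function name, directory names, and sample metadata are hypothetical, and running the importer itself is not shown.

```python
from pathlib import Path
from xml.etree import ElementTree as ET

def write_item(archive_dir, item_name, metadata, files):
    """Write one item directory: dublin_core.xml, a contents file, and the bitstreams.

    metadata: list of (element, qualifier, value) tuples; qualifier may be None
    files: dict mapping filename -> bytes
    """
    item_dir = Path(archive_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # Each metadata value becomes a <dcvalue element="..." qualifier="..."> entry.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy the bitstreams in and list them, one filename per line, in "contents".
    for filename, data in files.items():
        (item_dir / filename).write_bytes(data)
    (item_dir / "contents").write_text(
        "".join(f"{name}\n" for name in files), encoding="utf-8"
    )
    return item_dir
```

A call such as `write_item("archive", "item_000", [("title", None, "Example title")], {"bld00012.jpg": image_bytes})` would produce one directory ready for the importer, assuming `image_bytes` holds the scanned image.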

Page 20: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

FOE2.0 = DSpace
Cup is Half Full

March 2006
Foundations new home
Data submission form
Item View bld00012
Item Update
Access Restrictions
Handle server

Page 21: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

FOE2.0 = DSpace
Cup is Half Empty

Object is entered as one item
DSpace is self-contained
No real way to show complex relationships
All or nothing metadata
Access Restrictions
Handle server
Searchbot indexing:
  DSpace@DukeMed: Item 2193/77
  Title:, A. Jack Tannenbaum. Issue Date:, 10-Nov-2005 ... Abstract:, A. Jack Tannenbaum received his medical degree from Duke University in 1935. ...

Page 22: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

FOE3.0

“Our goal is to never be satisfied”

Page 23: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Content Models

Reusing datastreams
(next 2 slides borrowed from EDUCAUSE 2004 presentation by Grizzle, Wayland, and Wilper)

Page 24: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Atomistic Model

[Diagram: the TEI e-text of a 3-page letter is one object (Persistent ID (PID), Disseminators, System Metadata) whose text contains three image pointer tags. Each tag points to a separate image object for the page 1, page 2, and page 3 images, and each image object has its own Persistent ID (PID), Disseminators, and System Metadata.]

Page 25: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Compound Model

[Diagram: the TEI e-text of a 3-page letter is a single object (Persistent ID (PID), Disseminators, System Metadata). Its text contains three image pointer tags, and the screen size images for pages 1, 2, and 3 are held in the same object.]
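The difference between the two content models can be sketched as plain data structures. This is an illustration only, not Fedora's API: the class names, PIDs in the `demo:` namespace, datastream ids, and file names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Datastream:
    ds_id: str
    mime_type: str
    location: str  # where the bytes live; hypothetical file names in this sketch

@dataclass
class DigitalObject:
    pid: str
    datastreams: list = field(default_factory=list)
    related_pids: list = field(default_factory=list)  # pointers to other objects

def atomistic_letter(pages):
    """One object for the TEI text plus one separate object per page image."""
    images = [
        DigitalObject(
            pid=f"demo:page{n}",
            datastreams=[Datastream("IMAGE", "image/jpeg", f"page{n}.jpg")],
        )
        for n in range(1, pages + 1)
    ]
    letter = DigitalObject(
        pid="demo:letter",
        datastreams=[Datastream("TEI", "text/xml", "letter.xml")],
        related_pids=[image.pid for image in images],
    )
    return [letter] + images

def compound_letter(pages):
    """A single object holding the TEI text and every page image as datastreams."""
    streams = [Datastream("TEI", "text/xml", "letter.xml")]
    streams += [
        Datastream(f"PAGE{n}", "image/jpeg", f"page{n}.jpg")
        for n in range(1, pages + 1)
    ]
    return [DigitalObject(pid="demo:letter", datastreams=streams)]
```

For a 3-page letter the atomistic form yields four objects with four identifiers, so a page image can be pointed to from other objects as well; the compound form yields one object with four datastreams, which is simpler to manage but keeps the images tied to that one letter.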

Page 26: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

An old favorite blanket

2005-2007 Fedora minimally utilized
Primarily used for archiving Library Administrative documents (Council and Management Team minutes, and Policies and procedures)
Use of XACML policies to restrict access (156\.16\.\d{1,3}\.\d{1,3} lock down)
Began looking at front-end GUIs
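The lock-down pattern quoted above is an ordinary regular expression over the client address, so its behaviour can be checked outside the repository. The sketch below assumes the pattern is applied to the whole address; how the policy engine actually anchors it is not shown in the slides, and the function names are hypothetical. A stricter variant using a network prefix is included for comparison, on the assumption that the pattern is meant to cover the 156.16.0.0/16 range.

```python
import ipaddress
import re

# Pattern as quoted on the slide; fullmatch() applies it to the entire address.
LOCKDOWN_PATTERN = re.compile(r"156\.16\.\d{1,3}\.\d{1,3}")

def allowed_by_pattern(ip_text):
    """True when the address matches the slide's regular expression."""
    return LOCKDOWN_PATTERN.fullmatch(ip_text) is not None

# Assumed equivalent network range, used only for the stricter comparison.
LOCKDOWN_NETWORK = ipaddress.ip_network("156.16.0.0/16")

def allowed_by_network(ip_text):
    """True only for syntactically valid addresses inside the assumed range."""
    try:
        return ipaddress.ip_address(ip_text) in LOCKDOWN_NETWORK
    except ValueError:
        return False
```

The regular expression also accepts strings such as `156.16.999.1`, which are not valid addresses; that is harmless when the input is always a real client IP, but the network-based check rejects them.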

Page 27: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Front End tools

Fez – A web front-end management system for Fedora that is developed in PHP. Fez functionality includes: Web-based browsing and searching; Semi-advanced searching; Complex security; Basic image handling; Dublin Core. http://espace.library.uq.edu.au/documentation/

Elated – ELATED is a lightweight, general-purpose application for managing digital files. ELATED is built on top of the Fedora Repository system, and can be used as a digital assets management system, an institutional repository, or to meet other collection archiving, publishing and searching needs. Dublin Core metadata entry and search; Custom metadata by collection; Automatic previews for images; Collections with simple editorial workflow; Indexing and searching of content; User feedback, enabled by collection; Select and import existing Fedora objects. http://elated.sourceforge.net/

Both require extensive programming for localization

Page 28: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

External Forces at play

Fall 2006 we began a project to digitize 10,000+ cytopathology slides.
  Images converted to JPEG2000 to improve user experience (example)
  Archives purchased Aware JPEG2000 Image Server
History of Medicine image database, Historical Images in Medicine (HIM), needed new platform

Page 29: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Call out of the blue

VTLS – Vital
Open Repositories

Page 30: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

FOE3.0 = Fedora/Vital
Cup is Half Full

June 2007
Foundations new home (link)
Data submission (3 ways to enter items)
Item View bld00012
Object is entered as many datastreams (fedora view)
Vital/Fedora/Aware…interoperability
Complex relationships
Multiple metadata streams
Handle server
Searchbot indexing:
  A. Jack Tannenbaum. | MeDSpace
  Description: A. Jack Tannenbaum received his medical degree from Duke University in 1935. ... per00165, A. Jack Tannenbaum. 302.3 kB, JPEG 2000 Image ...

Page 31: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

FOE3.0 = Fedora/Vital
Cup is Half Empty

Fedora is open source, Vital is not
Customization possible with programming knowledge
No way at this time to implement XACML policies (workarounds exist)
Vital upgrades require full software installation
Local customization can cause breaks in certain functions

Page 32: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Conclusions and obligatory links

Page 33: Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation

Selected Links

DSpace – http://dspace.org
Manakin – http://di.tamu.edu/projects/xmlui/install
Fedora – http://www.fedora-commons.org/
Elated – http://elated.sourceforge.net/
Fez – http://espace.library.uq.edu.au/documentation/
Vital – http://vtls.com
DSpace@DukeMed – http://dspace.mclibrary.duke.edu
MeDSpace – http://medspace.mc.duke.edu/vital/access/manager/Index