
Digital Asset Management and Publication with LadyBird

Eric James
programmer/analyst, library IT
Yale University Library
eric.james@yale.edu

12 July 2013

What is LadyBird?

• Bebop song by Tadd Dameron
• First Lady, Lyndon B. Johnson presidency
• Old dog from King of the Hill
• Digital asset management tool

LadyBird - Digital Asset Management Tool

From its origin, LadyBird has been a system that processes metadata and temporarily houses digital assets to be published. It provides a configurable system for migrating digital objects and collections, normalizing metadata, and preserving and publishing content.

It was initially written in Microsoft .NET and C#, hosted on Windows 2008 using Microsoft SQL Server 2008.

Some work has been done on Java modules (for import).
Wish list – migrate to JRuby/Rails.

LadyBird components

• Web interface
• Job processing engine – imports
• Export processing engine – exports
• Bag creation
• Heartbeat monitor
• Application cleanup system

This presentation will focus on the workflow and concepts involved in publication of digital objects with metadata to Fedora.

LadyBird concepts I

• The core of the application is the object table
• Collection – departments within the library and Yale (this comes into play later when discussing the c# tables)
• Project – projects specific to a collection
• An object belongs to a project and a project belongs to a collection
• Currently 16 collections with 34 projects and 1.53 million objects
• We call objects "oids"; technically "oid" means the object id column of the object table, but we tend to use it to describe the whole ball of wax
• User table – the cataloger is registered, and role and permission settings are used throughout the app

LadyBird concepts II

• Processing objects is all about the spreadsheet
• Each row is an object
• Each column represents either a function or a metadata field
• Function examples:
  {F1} is the object as identified by oid (primary key of the object table); if left blank, that is the signal to create a new oid
  {F4} is the parent oid (for complex objects)
  {F40} can have the value PUBLISH, telling LadyBird to auto-publish this object
• Metadata examples: {FDID=58} call number, {FDID=262} Host, creator, etc.

The cataloger can take advantage of Excel functionality (like repeating fields) to quickly create a spreadsheet for batch import.
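To make the column conventions concrete, here is a minimal Ruby sketch of how one tab-delimited row might be interpreted. It is not LadyBird's code (LadyBird itself is written in C#). The method names `parse_header` and `interpret_row`, the returned hash layout, and the case-insensitive match on PUBLISH are assumptions made for illustration; only the {F1}, {F4}, {F40} and {FDID=n} markers come from the slides.

```ruby
# Hypothetical sketch: classify spreadsheet column headers and interpret one row.
# Only the {F1}, {F4}, {F40} and {FDID=n} markers come from the slides.

FUNCTION_HEADER = /\A\{F(\d+)\}\z/
METADATA_HEADER = /\A\{FDID=(\d+)\}\z/

# Turn a header cell into [:function, n], [:metadata, fdid] or [:unknown, text].
def parse_header(cell)
  text = cell.strip
  if (m = FUNCTION_HEADER.match(text))
    [:function, m[1].to_i]
  elsif (m = METADATA_HEADER.match(text))
    [:metadata, m[1].to_i]
  else
    [:unknown, text]
  end
end

# Interpret one tab-delimited data line against the header line.
def interpret_row(header_line, data_line)
  headers = header_line.chomp.split("\t", -1).map { |h| parse_header(h) }
  values  = data_line.chomp.split("\t", -1)
  row = { oid: nil, parent_oid: nil, publish: false, metadata: {} }

  headers.zip(values).each do |(kind, id), value|
    value = value.to_s.strip
    case [kind, id]
    when [:function, 1]  then row[:oid] = value.empty? ? nil : value.to_i # blank => new oid
    when [:function, 4]  then row[:parent_oid] = value.empty? ? nil : value.to_i
    when [:function, 40] then row[:publish] = value.casecmp?("PUBLISH")
    else
      # Metadata columns are keyed by fdid; empty cells are skipped.
      row[:metadata][id] = value if kind == :metadata && !value.empty?
    end
  end
  row
end
```

A row whose {F1} cell is empty comes back with `oid: nil`, which a caller could treat as the request to create a new oid.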

LadyBird concepts III

field_definition (fdid) table (230 metadata fields)

51 Cataloger
52 Record source
53 Record date
54 Record modified date
55 Record ID
56 Local record ID
57 Local record ID, other
58 Call number
59 Accession number
60 Box

The values are either strings or acid values (more on acids later).

LadyBird concepts IV

• Import tables – all about the spreadsheets, though you can also import MARC or EAD records by bibid, barcode, or handle. In that case the records are deserialized into fdids, and any spreadsheet data overrides the records.

  im_job (1 master row for spreadsheet)
  Im_job_exHead (column headers from spreadsheet)
  im_job_contents (values)
  Im_files (for files)
  import_checksum (for files)
  im_job_contents_history

• Job tracking (overall tracking associates an imported oid with a specific job)

  trk_project
  trk_job
  trk_job_contents
  trk_oid

LadyBird concepts V

• The c# tables – c for "current", # for each collection
• The "metadata home" – data imported to the im tables is finally transferred here
• There is a set of tables for each collection

Ex: # = 13 (collection: Hydra, project: Hydra Test)

c13 – master list of oids
c13_strings
c13_longstrings
c13_acid

Each row basically contains an oid/fdid/value; thus, given an oid, one could get all metadata fields for that object as rows from these tables. Each row also has a favid for additional values associated with the fdid.

There are also corresponding p# tables, p for "past", that keep an audit trail of any updates to specific oids.

The c# tables are designed for high volume.
Exploring better options, such as hashing.
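The oid/fdid/value layout can be illustrated with a small in-memory stand-in. This is only a sketch: the `MetadataRow` struct, the `metadata_for` method and the sample values are invented for illustration, and the real tables live in SQL Server.

```ruby
# Hypothetical in-memory stand-in for a c13_strings-style table.
# Each row is oid / fdid / value, as described above; the sample values are invented.
MetadataRow = Struct.new(:oid, :fdid, :value)

SAMPLE_ROWS = [
  MetadataRow.new(10684347, 58, "Example call number"),
  MetadataRow.new(10684347, 59, "Example accession number"),
  MetadataRow.new(10684348, 58, "Another call number")
].freeze

# Collect every metadata value for one oid, grouped by fdid.
# A field may repeat, so each fdid maps to an array of values.
def metadata_for(rows, oid)
  rows.select { |r| r.oid == oid }
      .group_by(&:fdid)
      .transform_values { |matches| matches.map(&:value) }
end
```

Because every field is a row rather than a column, adding a new fdid needs no schema change, which fits a design aimed at high volume.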

LadyBird concepts VI

• Acid – authority control – a system for using controlled vocabulary for metadata fields

Fdid 62 = Host, Creator

acid    fdid  value
126434  62    Luhan, Mabel Dodge, 1879-1962
126626  62    Dobbs, Arthur, 1689-1765
126628  62    Filson, John, ca. 1747-1788
126630  62    Thomson, Charles, 1729-1824
126632  62    Hutchins, Thomas, 1730-1789
126635  62    Adair, James, ca. 1709-1783

So if, for an oid row in the spreadsheet, the fdid 62 column is given the value 126635, that field resolves to "Adair, James, ca. 1709-1783".

Currently 155,415 values.
Potential for more sophisticated uses with linked data.
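As a rough sketch of the lookup just described, the acid values can be treated as a keyed table. The hash layout and the `resolve_acid` method are assumptions for illustration; the three entries are taken from the acid listing above.

```ruby
# Hypothetical sketch of acid resolution, using rows from the acid listing above.
# The hash layout and the method name are assumptions, not LadyBird's schema.
ACIDS = {
  126434 => { fdid: 62, value: "Luhan, Mabel Dodge, 1879-1962" },
  126626 => { fdid: 62, value: "Dobbs, Arthur, 1689-1765" },
  126635 => { fdid: 62, value: "Adair, James, ca. 1709-1783" }
}.freeze

# Resolve a spreadsheet cell for an authority-controlled field.
# Returns the controlled value, or nil when the cell is not a known acid
# or the acid belongs to a different fdid.
def resolve_acid(fdid, cell)
  entry = ACIDS[Integer(cell, exception: false)]
  return nil if entry.nil? || entry[:fdid] != fdid

  entry[:value]
end
```

Checking the fdid as well as the acid guards against a value from one controlled vocabulary being pasted into the column of another.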

LadyBird sample workflow start

• Workstation mounted with a job folder for both import and export

  Windows: \\birdcage.library.yale.edu\project25\import\
  Mac: SMB://birdcage.library.yale.edu/project25//import//
  Windows: \\birdcage.library.yale.edu\project25\export\
  Mac: SMB://birdcage.library.yale.edu/project25//export//

• Project25 corresponds to the project table
• Create a folder in the import directory and drag files into folders or subfolders
• LadyBird will now have detected that folder and created a job for it under the "Dashboard" menu selection

LadyBird dashboard

Add digital object to folder

Go to dashboard and process this folder

Receive email confirmation

Subject: LadyBird Import Complete job: test_open_rep

Your import has been processed.
test_open_rep
Visit your dashboard in Ladybird for your most recent jobs.
http://ladybird.library.yale.edu/user_jobs.aspx

View job: http://ladybird.library.yale.edu/user_jobs.aspx?qa=query&qid=12307

* A jobcomplete.txt file with the time is added to the import folder so the app knows that the directory is complete.

View job

View set

New object->Metadata (form)

Or from View Set, "Export as Job"

Receive export email confirmation

Subject: LadyBird Export Ready

Your export is ready.
\\birdcage\project25\export\ermadmix_46371_06262013_165116.xls

Spreadsheet – fill in and save as tab-delimited text file

Import

Import Email Confirmation

Subject: LadyBird Import Complete job: ermadmix_import_062613_171134

Your import has been processed.
ermadmix_import_062613_171134
Visit your dashboard in Ladybird for your most recent jobs.
http://ladybird.library.yale.edu/user_jobs.aspx

View job: http://ladybird.library.yale.edu/user_jobs.aspx?qa=query&qid=12313

Publish

• Publishes automatically if {F40}=publish
• Or you can use the interface to check the file and metadata and explicitly click the publish button

Publish (behind the scenes)

• The oid is added to the hydra table with date (when added) and date published (when processing is complete) timestamps

id     oid       date                     date published
…      …         …                        …
39176  10684347  2013-06-26 16:01:11.043  2013-06-26 17:14:05.900
39177  10684348  2013-06-26 16:01:11.043  2013-06-26 17:14:07.457
39178  10684349  2013-06-26 16:01:11.043  2013-06-26 17:14:09.017
39179  10684350  2013-06-26 16:01:11.043  2013-06-26 17:14:10.577
39180  10684351  2013-06-26 16:01:11.043  2013-06-26 17:14:12.137
39181  10684352  2013-06-26 16:01:11.043  2013-06-26 17:14:13.697
…      …         …                        …

oid added to hydra_publish table

Key fields:
hpid: 23703
hcmid: 2
cid: 9
Pid: 27
Oid: 10681633
_oid: 0
zindex: 0
hydraID: null
dateReady: 2013-06-26 16:01:55.430
dateHydraStart: null

Rows for oid added to hydra_publish_path table

Key fields w/ example:
hppid: 139004
Hpid: 26340
Type: jp2
pathHTTP: http://lbfiles.library.yale.edu/10684274.jp2
pathUNC: \\storage.yale.edu\home\ladybird-801001-yul\ladybird\project27\publish\dl\10684274\1758.02.00.00_page1.jp2
Md5: 35433b00ca9de2cdaed275c455339090
controlGroup: M
mimeType: image/jp2
Dsid: jp2
ingestMethod: filepath
oidPointer: null

Hydra_publish_path – typical files

xml rights (hydra rights)
xml metadata (MODS descMetadata)
xml access (home grown granular rights)
pdf (transcript YIPP)
pdf2 (annotated transcript YIPP)
jp2 (derivative)
jpg (derivatives)
tif (master)

descMetadata - creation

A service (a C# class and its methods) is called on hydra publish. It iterates through all the fdids for an oid and uses the XML DOM to create a MODS file. This is basically a mapping of field definitions to the MODS schema.

There is the potential to map the fdids to any metadata format.
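The fdid-to-MODS mapping idea can be sketched in Ruby with REXML standing in for the XML DOM. This is not the C# service from the slides: the `FDID_TO_MODS` table, the choice of MODS elements for fdids 58 and 59, and the `build_mods` method are all assumptions for illustration, since the actual mapping is not shown.

```ruby
require "rexml/document"

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical mapping from fdid to a MODS element path. The real mapping lives
# in LadyBird's C# service and is not shown in the slides; only the fdid numbers
# and their labels come from the field_definition listing.
FDID_TO_MODS = {
  58 => { path: %w[location shelfLocator], attrs: {} },                   # Call number
  59 => { path: %w[identifier], attrs: { "type" => "accession number" } } # Accession number
}.freeze

# Build a MODS document from { fdid => [values] }, skipping unmapped fdids.
def build_mods(fields)
  doc  = REXML::Document.new
  root = doc.add_element("mods", { "xmlns" => MODS_NS })
  fields.each do |fdid, values|
    rule = FDID_TO_MODS[fdid]
    next unless rule

    Array(values).each do |value|
      # Walk the path, creating one nested element per path segment.
      node = rule[:path].inject(root) { |parent, name| parent.add_element(name) }
      rule[:attrs].each { |key, val| node.add_attribute(key, val) }
      node.text = value
    end
  end
  doc
end
```

Keeping the mapping in a data table rather than in code is one way to leave room for other target formats, as the slide suggests is possible.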

accessMetadata

Rights metadata

Transition into fedora hydra world

select * from hydra_content_model

id  date                     uid  contentModel
1   2013-04-25 08:50:20.043  1    simple
2   2013-04-25 08:50:26.350  1    complexParent
3   2013-04-25 08:50:30.420  1    complexChild

contentModel maps to an ActiveFedora model

Transition into fedora hydra world II

select * from hydra_content_model_ds

id  date                     uid  hcmid  dsid            ingMethod  required
1   2013-04-25 08:56:11.670  1    1      accessMetadata  pullHTTP   y
2   2013-04-25 08:56:11.670  1    1      descMetadata    pullHTTP   y
3   2013-04-25 08:56:11.670  1    1      rightsMetadata  pullHTTP   y
4   2013-04-25 08:56:11.670  1    1      tif             filepath   y
5   2013-04-25 08:56:11.670  1    1      jp2             filepath   y
6   2013-04-25 08:56:11.670  1    1      jpg             filepath   y
7   2013-04-25 08:56:11.670  1    2      accessMetadata  pullHTTP   y
8   2013-04-25 08:56:11.670  1    2      descMetadata    pullHTTP   y
9   2013-04-25 08:56:11.670  1    2      rightsMetadata  pullHTTP   y
10  2013-04-25 08:56:11.670  1    2      tif             filepath   n
11  2013-04-25 08:56:11.673  1    2      jp2             filepath   n
12  2013-04-25 08:56:11.673  1    2      jpg             filepath   n
13  2013-04-25 08:56:11.673  1    3      accessMetadata  pullHTTP   y
14  2013-04-25 08:56:11.673  1    3      descMetadata    pullHTTP   y
15  2013-04-25 08:56:11.673  1    3      rightsMetadata  pullHTTP   y
16  2013-04-25 08:56:11.673  1    3      tif             filepath   y
17  2013-04-25 08:56:11.673  1    3      jp2             filepath   y
18  2013-04-25 08:56:11.673  1    3      jpg             filepath   y
19  2013-05-31 10:48:25.620  1    2      oidPointer      pointer    n
20  2013-06-07 11:03:24.537  1    2      pdf             filepath   n
21  2013-06-07 11:03:52.933  1    2      pdf2            filepath   n
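One way to read the required column is as a checklist applied before ingest. The sketch below copies the hcmid 2 (complexParent) rows from the listing above into memory; the `DatastreamRule` struct and the `missing_required` method are hypothetical names, and the slides do not say how LadyBird or the ingest task actually enforces the flag.

```ruby
# Hypothetical sketch: check staged datastreams against a content model.
# The rows mirror hydra_content_model_ds for hcmid 2 (complexParent);
# the struct and method names are assumptions.
DatastreamRule = Struct.new(:hcmid, :dsid, :ingest_method, :required)

COMPLEX_PARENT_RULES = [
  DatastreamRule.new(2, "accessMetadata", "pullHTTP", true),
  DatastreamRule.new(2, "descMetadata",   "pullHTTP", true),
  DatastreamRule.new(2, "rightsMetadata", "pullHTTP", true),
  DatastreamRule.new(2, "tif",            "filepath", false),
  DatastreamRule.new(2, "jp2",            "filepath", false),
  DatastreamRule.new(2, "jpg",            "filepath", false),
  DatastreamRule.new(2, "oidPointer",     "pointer",  false),
  DatastreamRule.new(2, "pdf",            "filepath", false),
  DatastreamRule.new(2, "pdf2",           "filepath", false)
].freeze

# Return the dsids that the content model requires but that were not staged.
def missing_required(rules, staged_dsids)
  rules.select(&:required).map(&:dsid) - staged_dsids
end
```

For a complex parent, the image and pdf datastreams are optional, so only the three metadata datastreams can ever be reported as missing.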

Example - simple content model

require "active-fedora"
class Simple < ActiveFedora::Base
  belongs_to :collection, :property=> :is_member_of

  has_metadata :name => 'descMetadata', :type => Hydra::Datastream::SimpleMods
  has_metadata :name => 'accessMetadata', :type => Hydra::Datastream::AccessConditions
  has_metadata :name => 'rightsMetadata', :type => Hydra::Datastream::Rights
  has_metadata :name => 'propertyMetadata', :type => Hydra::Datastream::Properties

  delegate :oid, :to=>"propertyMetadata", :unique=>true
  delegate :projid, :to=>"propertyMetadata", :unique=>true
  delegate :cid, :to=>"propertyMetadata", :unique=>true
  delegate :zindex, :to=>"propertyMetadata", :unique=>true
  delegate :parentoid, :to=>"propertyMetadata", :unique=>true

end

Example – Properties Datastream

require 'active_fedora'

module Hydra
  module Datastream
    class Properties < ActiveFedora::OmDatastream

      #ERJ note ladybird pid = projid, ladybird _oid = parentoid
      set_terminology do |t|
        t.root(:path=>"root")

        t.oid(:path=>"oid")
        t.cid(:path=>"cid")
        t.projid(:path=>"projid")
        t.zindex(:path=>"zindex")
        t.parentoid(:path=>"parentoid")
        t.ztotal(:path=>"ztotal")
        t.oidpointer(:path=>"oidpointer")

      end

      def to_solr(solr_doc=Hash.new)
        super(solr_doc)
        solr_doc['oid_isi'] = oid
        solr_doc['cid_isi'] = cid
        solr_doc['projid_isi'] = projid
        solr_doc['zindex_isi'] = zindex
        solr_doc['parentoid_isi'] = parentoid
        solr_doc['ztotal_isi'] = ztotal
        solr_doc['oidpointer_isi'] = oidpointer
        solr_doc
      end
    end
  end
end

Workflow review

1. Add a folder with files to the import folder.
2. Process the folder. This creates the records in the database (oids, job tracking, c# instances, and file derivatives).
3. Export the spreadsheet. This creates a spreadsheet template for the folder of files in (1).
4. Fill in metadata in the spreadsheet – the main cataloging task.
5. Import the spreadsheet. This ultimately populates the c# tables with metadata from the oid rows of the spreadsheet.
6. Publish to hydra. This creates the hydra table rows with serialized metadata files (MODS, access rights), and stages files in storage for ingest.

Ingest task

• Set up within a hydra project
• gem 'tiny_tds' connects to the ladybird SQL Server database

app/models (objects)

• collection.rb – maps to pid (project) in ladybird, parent to simple.rb and complex_parent.rb
• simple.rb – 1 image w/derivatives, no hierarchy
• complex_parent.rb – parent to a set of images (like a book or image set)
• complex_child.rb – 1 image w/derivatives (like a page)

These relate to the hydra_content_model table.

app/model (datastreams)

• coll_properties.rb
• properties.rb
• rights.rb
• access_conditions.rb
• simple_mods.rb

simple_mods.rb - indexing

rake yulhy4:ingest I

Properties:
• SQL Server connection config
• Mount of ladybird storage

Uses the hydra_publish table as a queue (driven by this query until done):

select top 1 a.hpid, a.oid, a.cid, a.pid, b.contentModel, a._oid
from dbo.hydra_publish a, dbo.hydra_content_model b
where a.dateHydraStart is null
  and a.dateReady is not null
  and a._oid=0
  and a.hcmid is not null
  and a.hcmid=b.hcmid
  and a.action='insert'
order by a.dateReady
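The table-as-queue pattern can be sketched without a database. Below, an in-memory array stands in for hydra_publish and the filter mirrors the WHERE and ORDER BY clauses of the query above. The `PublishRow` struct and both method names are hypothetical, and stamping `date_hydra_start` when a row is picked up is an assumption about how the real task keeps a row from being selected twice.

```ruby
# Hypothetical in-memory model of the hydra_publish queue.
# parent_oid stands for the _oid column (0 means a top-level object).
PublishRow = Struct.new(:hpid, :oid, :hcmid, :parent_oid, :action,
                        :date_ready, :date_hydra_start)

# Mirror the query: the oldest ready, top-level 'insert' row that has not been
# started yet, or nil when the queue is drained.
def next_in_queue(rows)
  candidates = rows.select do |r|
    r.date_hydra_start.nil? && !r.date_ready.nil? &&
      r.parent_oid == 0 && !r.hcmid.nil? && r.action == "insert"
  end
  candidates.min_by(&:date_ready)
end

# Process rows one at a time until none qualify. Marking the start time first
# (an assumption) is what removes the row from the next selection.
def drain(rows, now: Time.now)
  while (row = next_in_queue(rows))
    row.date_hydra_start = now
    yield row
  end
end
```

Selecting one row per iteration, as `top 1` does, keeps the task restartable: whatever was not stamped is simply picked up on the next run.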

rake yulhy4:ingest II

ActiveFedora ingest

Create a new object based on the content model:

obj = Simple.new
obj = ComplexParent.new
obj = ComplexChild.new
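Choosing the class from the contentModel value could look like the lookup below. The empty classes are stand-ins so the sketch runs without Fedora; in the real task they would be the ActiveFedora models. The `MODEL_CLASSES` table and `new_object_for` are hypothetical names, and the slides do not show how the rake task actually performs this dispatch.

```ruby
# Stand-in classes so the sketch runs without Fedora; in the real task these
# would be the ActiveFedora models of the same names.
class Simple; end
class ComplexParent; end
class ComplexChild; end

# contentModel values from hydra_content_model mapped to model classes.
MODEL_CLASSES = {
  "simple"        => Simple,
  "complexParent" => ComplexParent,
  "complexChild"  => ComplexChild
}.freeze

# Instantiate the model for a contentModel value, failing loudly on unknown names.
def new_object_for(content_model)
  klass = MODEL_CLASSES.fetch(content_model) do
    raise ArgumentError, "unknown content model: #{content_model.inspect}"
  end
  klass.new
end
```

An explicit table avoids turning database strings into class names with `constantize`-style lookups, so an unexpected value in the table fails with a clear error.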

rake yulhy4:ingest III

Iterate through all datastreams for the content model:

select hcmds.dsid as dsid, hcmds.ingestMethod as ingestMethod, hcmds.required as required
from dbo.hydra_content_model hcm, dbo.hydra_content_model_ds hcmds
where hcm.contentModel = '#{contentModel}' and hcm.hcmid = hcmds.hcmid

For each row in the above query, get the datastream info for the oid:

select type, pathHTTP, pathUNC, md5, controlGroup, mimeType, dsid, OIDpointer
from dbo.hydra_publish_path
where hpid=#{i["hpid"]} and dsid='#{dsid}'

Verify checksums and use ActiveFedora to ingest the datastreams.
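The checksum step can be sketched with Ruby's standard Digest library. The `checksum_matches?` helper is a hypothetical name; it assumes the md5 column of hydra_publish_path holds a hex digest, as in the example row shown earlier, and that the file is reachable through the mounted storage path.

```ruby
require "digest"

# Hypothetical helper: compare a staged file's MD5 with the md5 value recorded
# in hydra_publish_path before handing the file to Fedora.
def checksum_matches?(path, expected_md5)
  # Digest::MD5.file streams the file, so large tif masters are not read into memory.
  Digest::MD5.file(path).hexdigest == expected_md5.to_s.strip.downcase
end
```

A mismatch would be the point to stop and report the oid rather than ingest a file that was corrupted between staging and ingest.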

rake yulhy4:ingest IV

Add ladybird-specific info to the properties datastream:
• oid
• cid
• pid
• zindex
• _oid

Add hierarchical info to RELS-EXT:
• Simple and complex_parent – is_member_of a collection
• Complex_child – is member of a complex_parent

Some discussion about adding more linked data.

rake yulhy4:ingest V

rake yulhy4:ingest VI

Blacklight

review

future

hydra_publish – revise already ingested content
• action='update'
• action='insert'

Archivematica (by Artefactual)
• Replace the ingest task with a custom workflow
• GUI interface
• Human decision points and manual processing
• Technical metadata generation (FITS)
• Provenance (jhove)
• Issues – how to employ OAIS packages (SIP, AIP, DIP) for objects without a natural package structure?

Contributors

• Eric James
• Lakeisha Robinson
• Kalee Sprague
• Osman Din
• Jay Terray
• Rebekeh Irwin
• Mike Friscia

Thank you