Implementing Durham E-Theses - Sebastian Palucha (Pecha Kucha)
DESCRIPTION
Pecha Kucha slides on Durham University's experience of implementing its E-Theses system, presented by Sebastian Palucha on Friday 2nd August 2013 at Repository Fringe 2013.
TRANSCRIPT
Implementing Durham E-Theses
Presented by Sebastian Palucha
#rfringe13
CC BY jitze http://www.flickr.com/photos/jitze1942/3521700792
∂
Durham E-Theses
Initial project spring/summer 2009
First deposit September 2009
~300 research theses per year
Simple deposit, single PDF
EThOS interoperability
EPrints 3.1.3 (born 2009)
CC BY didbygraham http://www.flickr.com/photos/didbygraham/5646920685/
∂
Registered: EThOS, Driver, OCL Digital Gateway (2010 spr.)
EThOS harvest in operation (2010 sum.)
Google Analytics stats (2010 dec.)
EThOS digitised theses loaded (2011 sum.)
Google Custom Search (aut. 2011)
Collaboration with the BL to improve EThOS services (aut. 2011 – spr. 2012)
EU/ICO Cookie Law support (2013 sum.)
Local digitisation project, 10k theses (2012 spr. – )
MySQL migrated to UTF-8 (2013 spring)
Creative Commons Licences introduced (2012 aut.)
CC BY AlishaV http://www.flickr.com/photos/alishav/3156574283
Key milestones
∂
Branding: uniform user experience
• Issues: browsers, branding changes
• Durham University CMS CSS
• EPrints 3 CSS
∂
Simplistic single PDF deposit
• Details > Upload > Deposit
• LDAP integration + user field population
• Embargo implemented in first screen
CC BY Pink Sherbet Photography http://www.flickr.com/photos/pinksherbet/236299644
∂
Cover pages
Highly customized LaTeX code
Issues with UTF-8 in both LaTeX and plugin
Issues with dynamic if/else
∂
Google Analytics: full text downloads
• Two steps:
1. PDF download link (core code)
2. Special GA profile
• URL structure includes department codes: ?DDD32
• Internal code modification
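The department-code idea above can be sketched roughly as follows. This is an illustration in Python, not the EPrints core code that was modified; the function names and the example host are hypothetical, and the only detail taken from the slide is the bare query string form `?DDD32`.

```python
from typing import Optional
from urllib.parse import urlsplit


def tag_download_url(pdf_url: str, dept_code: str) -> str:
    """Append a department code as a bare query string, e.g. ...pdf?DDD32."""
    if "?" in pdf_url:
        # Assumption: download links carry no other query parameters.
        raise ValueError("URL already has a query string")
    return f"{pdf_url}?{dept_code}"


def department_from_url(logged_url: str) -> Optional[str]:
    """Recover the department code from a logged download URL, if present."""
    query = urlsplit(logged_url).query
    return query or None


# Hypothetical example URL; the real repository paths may differ.
url = tag_download_url("http://etheses.example.org/123/1/thesis.pdf", "DDD32")
print(url)
print(department_from_url(url))
```

A separate analytics profile could then filter or group full-text downloads on that query string.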
∂
EThOS interoperability through OAI-PMH harvest
• Issues with out-of-the-box plug-in; changes to XML schema needed
• uketdterms:qualificationlevel not defined in EPrints data model
• Embargo date not included. Plugin assumes embargo at record level, whereas EPrints applies it at document level!
• Added department names
• Occasional issues with UTF-8 encoding
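One way to bridge the record-level versus document-level embargo mismatch is to derive a single date per record. The sketch below is an assumption about how that could be done, not the change actually made to the plug-in: it takes the latest document embargo as the record embargo, which is the conservative choice. The function name is hypothetical.

```python
from datetime import date
from typing import List, Optional


def record_embargo(document_embargoes: List[Optional[date]]) -> Optional[date]:
    """Collapse per-document embargo dates into one record-level date.

    None means a document has no embargo. The latest date wins, so the
    record counts as embargoed until every document is open.
    """
    dates = [d for d in document_embargoes if d is not None]
    return max(dates) if dates else None


# A thesis with one open document and two embargoed ones.
embargo = record_embargo([None, date(2014, 9, 1), date(2015, 1, 31)])
print(embargo)
```

Choosing the earliest date instead would expose the record sooner but could advertise documents that are still closed.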
∂
EThOS download WS
• Script for mass download https://github.com/paluchas/ethos-bl
groovy EthosDownloadClient.groovy -i 238830 -m download
∂
EThOS: avoiding duplication
• We store EThOS persistent IDs
• We modified the /cgi/oai2 script to conditionally exclude EThOS records
• Modified records can be exposed to EThOS harvest in future
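The conditional exclusion can be pictured as a simple filter over records before they are served to the harvester. The real change was made in the EPrints /cgi/oai2 script (Perl); the Python below is only a sketch, and the field names `ethos_id` and `expose_to_ethos` are hypothetical.

```python
from typing import Dict, List


def visible_to_ethos(record: Dict) -> bool:
    """Decide whether a record should appear in the EThOS harvest."""
    came_from_ethos = bool(record.get("ethos_id"))  # hypothetical field
    force_expose = bool(record.get("expose_to_ethos"))  # hypothetical flag
    # Records that originated from EThOS are hidden unless explicitly
    # re-exposed, e.g. after the local copy has been modified.
    return force_expose or not came_from_ethos


def records_for_harvest(records: List[Dict]) -> List[Dict]:
    """Keep only the records that EThOS should see."""
    return [r for r in records if visible_to_ethos(r)]


sample = [
    {"id": 1, "ethos_id": None},
    {"id": 2, "ethos_id": "uk.bl.ethos.000000"},  # placeholder ID
    {"id": 3, "ethos_id": "uk.bl.ethos.000001", "expose_to_ethos": True},
]
harvest_ids = [r["id"] for r in records_for_harvest(sample)]
print(harvest_ids)
```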
∂
UTF-8 issues
Unknown copy/paste issues seen:
OAI-PMH
Cover Pages LaTeX
Abstract pages
Solution:
Code modification
Whole MySQL database migration to UTF-8, fortunately double encoding
CC BY familymwr http://www.flickr.com/photos/familymwr/5548057120
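If "double encoding" on the UTF-8 slide means UTF-8 bytes that were read as Latin-1 and then encoded to UTF-8 a second time (an assumption; the slides do not say), the damage is reversible. The sketch below shows the idea in Python; it is not the MySQL migration that was actually run, and the function name is hypothetical.

```python
def repair_double_encoding(text: str) -> str:
    """Undo one round of UTF-8 -> Latin-1 -> UTF-8 double encoding.

    Text that is not double-encoded is returned unchanged.
    """
    try:
        # Map each character back to the byte it stood for, then decode
        # those bytes as the UTF-8 they originally were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside Latin-1, or bytes that are not valid UTF-8:
        # the text was not double-encoded in this way.
        return text


# Build a double-encoded sample rather than typing mojibake by hand.
broken = "Zürich café".encode("utf-8").decode("latin-1")
fixed = repair_double_encoding(broken)
print(fixed)
```

The check is heuristic: a string that legitimately contains a sequence such as "Ã©" would also be rewritten, so a bulk repair is worth reviewing on a sample first.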
∂
Creative Commons Licences
Approached by student: specific query about particular CC licence to be used
A lot of redefinition in code
∂
CC outreach
∂
Better search, DRO integration
Google Custom Search with modified search results
∂
Retrospective digitisation project
• 10k paper theses being digitised by local company
• Mass upload with metadata in an XML file and digitised material in PDF files (web and archive versions). A lot of metadata and quality issues
• Interesting samples of other materials: big prints, DVDs, CDs, cassette tapes, microfilms, small datasets and research software.
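Metadata and quality issues in a batch like this can be caught before upload with a simple pre-flight check. The sketch below assumes a hypothetical XML layout (`<thesis id>` with `<title>`, `<web>` and `<archive>` children); the supplier's real schema is not given in the slides.

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def check_batch(metadata_xml: str, available_files: Set[str]) -> List[str]:
    """Report missing titles and missing web/archive PDFs for a batch."""
    problems = []
    root = ET.fromstring(metadata_xml)
    for thesis in root.findall("thesis"):
        thesis_id = thesis.get("id", "(no id)")
        title = (thesis.findtext("title") or "").strip()
        if not title:
            problems.append(f"{thesis_id}: missing title")
        for version in ("web", "archive"):
            filename = (thesis.findtext(version) or "").strip()
            if not filename:
                problems.append(f"{thesis_id}: no {version} file listed")
            elif filename not in available_files:
                problems.append(f"{thesis_id}: {version} file {filename} not found")
    return problems


# Hypothetical batch: T1 is complete, T2 lacks a title and an archive PDF.
SAMPLE = """<theses>
  <thesis id="T1"><title>A thesis</title><web>T1_web.pdf</web><archive>T1_archive.pdf</archive></thesis>
  <thesis id="T2"><title></title><web>T2_web.pdf</web></thesis>
</theses>"""
problems = check_batch(SAMPLE, {"T1_web.pdf", "T1_archive.pdf", "T2_web.pdf"})
for line in problems:
    print(line)
```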
∂
EU/ICO Cookies Law
CC BY USAG-Humphreys http://www.flickr.com/photos/31687107@N07/6206906748
∂
Repository versus real life
• Users would like to deposit files other than PDFs.
• Requested “Dark” storage
• Encrypted PDFs
• Take-down requests and web-cached content. How far should we liaise with the external world?
• Some students are not aware of the consequences of web deposits: 3rd party copyright, sensitive data not embargoed etc.
• Disciplinary differences; not only humanities vs. sciences.
• External users requesting contact with authors or supervisors
∂
Sustainability
• Operational: virtualization, operating systems support, database
• Customization: bespoke changes and technology deficit
• Support: hard to coordinate across the University departments
CC BY Rennett Stowe http://www.flickr.com/photos/tomsaint/4515448425
∂
Future plans
Review process, be paper free, include pass list, extend workflow to exam board
Actively encourage students to use CC licences by demonstrating their benefits
Encourage deposit of key data sets and explore data visualization
Migrate to new repository framework
Integration with Durham University RIS
Google Analytics live stats, integration with IRUS-UK
CC BY Boston Public Library http://www.flickr.com/photos/boston_public_library/8902381985/
∂
Repository of the future
CC BY Keoni Cabral http://www.flickr.com/photos/52193570@N04/7069578953