hub distributed model 2009
The Archives Hub: Interoperability, Spokes and the Distributed Model
The Hub in a Nutshell
• Based at Mimas, University of Manchester
• In service since 2000
• Over 23,000 collection descriptions
• 170 repositories
• JISC funded
• Management and service team at Manchester
• Development team at Liverpool
• Cheshire software
• Cheshire for Archives – works with EAD descriptions
• Distributed system
Hub Workshop 2009
Content and contributors
• Strategic aim: build and enhance content
• Meeting the needs of the UK research community
• Meeting the needs of the wider community
• Archives for education and research
Flickr cc licence: eileenaway's photostream
The success of the Hub is a reflection of the rich content available from Hub contributors
Current contributors
• Higher/Further Education
• Consortium contributions
• Institutions with a research agenda
• Others on a case-by-case basis
• We encourage institutions to contact us
John Rylands Library, Manchester
Collection or lower-level…?
• Originally funded for collection-level
• Software/searches effective with both
• Complementary approaches
• Researchers ask for detail
Flickr cc licence: Muffet’s photostream
Flickr cc licence: soylentgreen23’s photostream
• Images useful at item level
JISC Information Environment
… a vast and sometimes bewildering range of potential sources of electronic information. Each source of information has its own name, its own interface, features and search facilities. Little wonder, then, that many users remain unaware of their existence or fail to discover their value for their own learning, teaching or research.
A key challenge is therefore to achieve a managed, coherent and shared information environment that will overcome these obstacles.
Being able to cross-search and use customised, value added and other services will considerably simplify users’ interactions with online resources. This should encourage take-up and greatly improve means of accessing these resources.
…these activities need to be based on standards for the creation, access, use, preservation and interoperability of networked resources.
http://www.jisc.ac.uk/index.cfm?name=ie_home
JISC Information Environment
Most content providers will already offer a Web site through which end-users can access their content. To be a part of the JISC-IE, content providers also need to support machine oriented interfaces to their resources.
1. Support searching using Z39.50/SRW
2. Support metadata harvesting using OAI-PMH
Andy Powell
5 step guide to becoming a content provider in the JISC Information Environment
http://www.ariadne.ac.uk/issue33/info-environment
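The two machine interfaces above serve different purposes: searching (Z39.50/SRW) answers a query, while OAI-PMH lets another service copy records in bulk. Below is a minimal sketch of the harvesting side in Python, using only the standard library. It assumes a Spoke exposes an OAI-PMH endpoint at a hypothetical address and serves the mandatory oai_dc format. The sample response and its identifier are invented for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request. resumptionToken is an exclusive argument,
    so on follow-up requests it replaces metadataPrefix."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], token); token is None on the last page."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    token_element = root.find(".//oai:resumptionToken", NS)
    token = token_element.text if token_element is not None and token_element.text else None
    return records, token

# Invented sample response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:spoke.example.ac.uk:example-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Papers of an example family</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
print(records)
print(list_records_url("http://spoke.example.ac.uk/oai", token))
```

A harvester would loop over these steps: request a page, parse it, and repeat with the returned token until the token comes back as None.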
E-GIF, open source and open standards
e-GIF version 6.1 (18th March 2005)
– The e-Government Interoperability Framework (e-GIF) sets out the government’s technical policies and specifications for achieving interoperability… across the public sector.
– There is a strategic decision to adopt XML and XSL as the core standards for data integration and management.
– It is a pragmatic strategy that aims to reduce cost and risk for government systems while aligning them to the global Internet revolution.
http://www.govtalk.gov.uk/documents/eGIF%20v6_1(1).pdf
Open Source, Open Standards and Re-Use: Government Action Plan http://www.netvibes.com/cabinetoffice#Open_Source
Isn’t technology brilliant?!!
• Technical know-how
• XML
• Data creation/editing template
• Web interface
• Machine interfaces
• Distributed model
• Web 2.0
• Dissemination
= Satisfying user experience
+ understanding users
Hub Data Flow
• Sustainable model
• Data held as XML
• Efficient search mechanism
• Flexible access
• Easy to become a Spoke
The Distributed Hub
Flickr cc licence: Thomas Hawk
The main goal of a distributed computing system is to connect users and resources in a transparent, open, and scalable way. Ideally this arrangement is drastically more fault tolerant and more powerful than many combinations of stand-alone computer systems.
[Wikipedia]
• Administration interface
• Customisable web front-end
• Machine-to-machine interfaces
• Data Creation Template
• Local control
• Technical support locally
• Hub team support
Spokes software
• Offers a means of storing and sharing archival descriptions in XML
• Provides machine-to-machine access to the descriptions through Z39.50 and SRU (Search and Retrieve via URL), and OAI-PMH for harvesting records
• Provides a customisable Web search interface
• Is open source and based on open standards
• Includes a data creation and editing template
Anatomy of a Spoke
[Diagram: EAD XML files are held in Cheshire indexes of EAD data. These are exposed through a Web search interface (HTTP) and through Z39.50 and SRU, giving other applications direct searching access through standards-based machine-to-machine protocols … including the central Hub!]
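The slide treats the central Hub as one more client of each Spoke's search interface. The sketch below illustrates that fan-out pattern in Python, with in-memory Spoke objects standing in for real SRU or Z39.50 endpoints. The class, method names and records are hypothetical, and this is not how Cheshire implements cross-searching. It also shows one benefit the distributed model aims for: an unreachable Spoke is reported instead of breaking the whole search.

```python
from concurrent.futures import ThreadPoolExecutor

class Spoke:
    """Hypothetical in-memory stand-in for one Spoke's search service.
    A real Spoke would be queried over SRU or Z39.50."""

    def __init__(self, name, records, available=True):
        self.name = name
        self.records = records        # list of (identifier, title) pairs
        self.available = available    # False simulates an outage

    def search(self, term):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        needle = term.lower()
        return [(self.name, ident, title)
                for ident, title in self.records if needle in title.lower()]

def cross_search(spokes, term, max_workers=4):
    """Query all Spokes in parallel; return (hits, names of Spokes that failed)."""
    hits, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(spoke.search, term): spoke for spoke in spokes}
        for future, spoke in futures.items():   # submission order keeps output stable
            try:
                hits.extend(future.result())
            except ConnectionError:
                failed.append(spoke.name)
    return hits, failed

spokes = [
    Spoke("liverpool", [("l1", "Papers of a shipping company")]),
    Spoke("rylands", [("r1", "Shipping ledgers"), ("r2", "Family letters")]),
    Spoke("offline", [("o1", "Shipping registers")], available=False),
]
hits, failed = cross_search(spokes, "shipping")
print(hits)
print(failed)
```

Against real network services, a timeout on `future.result()` would also be needed, so that one slow Spoke cannot hold up the combined result.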
Spokes indexes
The database will provide indexes based on the following standards:
Index                  EAD data field(s)
cql.anywhere           full text
dc.description         unittitle, controlaccess and scopecontent fields
dc.title               collection title (titleproper)
dc.creator             creator of the collection
dc.identifier          eadid
dc.date                unitdate
dc.subject             controlaccess fields
bath.name              personal, family, corporate and geographic names
bath.personalName      personal names
bath.corporateName     corporate names
bath.geographicName    geographic names
bath.genreForm         genre
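To make the index table concrete, the sketch below extracts the text each index would receive from a small EAD file. It is a Python illustration using the standard library's ElementTree, not the Cheshire configuration the Spoke software actually uses. The element paths are our reading of the table; for example, "creator of the collection" is assumed to map to <origination>, and the subject index uses the standard Dublin Core name dc.subject. It assumes un-namespaced EAD, and the sample record is invented.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from index names to EAD element paths (relative to <ead>).
# Un-namespaced EAD is assumed; namespaced EAD 2002 would need qualified paths.
INDEX_PATHS = {
    "dc.identifier": ["eadheader/eadid"],
    "dc.title": ["eadheader/filedesc/titlestmt/titleproper"],
    "dc.description": ["archdesc/did/unittitle", "archdesc/controlaccess/*",
                       "archdesc/scopecontent"],
    "dc.creator": ["archdesc/did/origination"],
    "dc.date": ["archdesc/did/unitdate"],
    "dc.subject": ["archdesc/controlaccess/*"],
    "bath.name": ["archdesc/controlaccess/persname", "archdesc/controlaccess/famname",
                  "archdesc/controlaccess/corpname", "archdesc/controlaccess/geogname"],
    "bath.personalName": ["archdesc/controlaccess/persname"],
    "bath.corporateName": ["archdesc/controlaccess/corpname"],
    "bath.geographicName": ["archdesc/controlaccess/geogname"],
    "bath.genreForm": ["archdesc/controlaccess/genreform"],
}

def text_of(element):
    """All text inside an element, with runs of whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())

def index_terms(ead_xml):
    """Map each index name to the list of values it would hold for one record."""
    root = ET.fromstring(ead_xml)
    terms = {"cql.anywhere": [text_of(root)]}   # full-text index
    for index, paths in INDEX_PATHS.items():
        values = [text_of(el) for path in paths for el in root.iterfind(path)]
        if values:                               # skip indexes with nothing to index
            terms[index] = values
    return terms

# Invented sample record.
SAMPLE_EAD = """<ead>
  <eadheader>
    <eadid>GB 0000 EXAMPLE</eadid>
    <filedesc><titlestmt><titleproper>Papers of Jane Example</titleproper></titlestmt></filedesc>
  </eadheader>
  <archdesc level="fonds">
    <did>
      <unittitle>Papers of Jane Example</unittitle>
      <unitdate>1890-1925</unitdate>
      <origination><persname>Example, Jane, 1870-1930</persname></origination>
    </did>
    <scopecontent><p>Letters and diaries.</p></scopecontent>
    <controlaccess>
      <persname>Example, Jane, 1870-1930</persname>
      <geogname>Manchester</geogname>
    </controlaccess>
  </archdesc>
</ead>"""

for index, values in index_terms(SAMPLE_EAD).items():
    print(index, values)
```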
Administration Interface
http://spoke.mimas.archiveshub.ac.uk/ead/admin/
Liverpool Spoke
John Rylands Spoke
Agreement with Spokes
Hosted Spokes
• Spokes at Manchester
– Configuration
– Agreement between parties
• Manchester team undertake agreed level of support
• Institution still responsible for the data
Being an Archives Hub Spoke…
• Gives you control over your own EAD files
– Allows you to update and add new files when you need to
• Exposes your EAD to other applications which need to cross-search the descriptions
– Using standards-compliant methods
• Means you benefit from using software that has been developed with the Archives Hub community
Collaboration & Sharing
• Networks and communities – the National Archives Network
• Cross-service and cross-domain collaboration
– Copac
– Intute
– Digitisation Projects
• Expand and share content
– import/export/M2M
• Links to other archive services
– NRA
The National Archives Network
‘Our vision of the future of British archives is of a flow of archival information which takes account of all the opportunities offered by digital networks and offers opportunity for exploration - historical, personal, social - to the broadest possible range of people wherever they can use it - in the home, the classroom or the office.’
British Archives: The Way Forward (NCA, 2000)
A comprehensive national resource discovery mechanism
The importance
‘There can be no higher priority for archives than the creation of this collaborative electronic network, overcoming the limitations of geography, crossing the many archival sectors and creating a truly unified digital directory or encyclopaedia of British historical documents.’
British Archives: The Way Forward (NCA, 2000)
National Archives Network
The opportunity
‘Outreach has been a developing preoccupation for archives in recent years, but the arrival of the internet age provides the opportunities to take archives, as never before, to the doorstep of the community at large.’
British Archives: The Way Forward (NCA, 2000)
Progress of the NAN
• Many archives took part in this drive towards a national archives network
• …many still are taking part
• The importance of recognised standards
• Intention to create collection-level catalogues of all substantial collections within a defined timeframe
Success of the NAN
• Strands of the national archives network provide access to archives that were previously inaccessible
• The Heritage Lottery Fund (HLF) has played a major role in enabling access and online discovery
• Users of archives have benefited enormously
• Data standards have become of central importance
Shortcomings of the NAN
• We don’t have a single national network
• Differences in data structure; content; search capabilities; look and feel
• Strands are not fully interoperable
• Politics, funding and willpower may not combine in favour of this approach
• The landscape has changed substantially since 2000 – maybe this solution is no longer appropriate?
The NAN today
• Many ‘strands’
• Only a few use EAD (or support EAD export)
• Lack of funding for a joint solution
The key is interoperability and machine-to-machine interfaces:
• NAN as a community, sharing knowledge and experiences
• NAN as a promoter of standards and facilitator for data sharing
• NAN strands as promoters of flexible and open approaches
The Interoperable Hub
The ability of software and hardware on different machines to share data
• Content standards
• Structural standards
• Validation of content
• Data Editor
• Training and awareness
• Contributor responsibility
• Networking and community building
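Of the items above, validation of content is the easiest to automate. Below is a minimal sketch of a pre-submission check in Python. It assumes a hypothetical minimum rule set: eadid, unittitle, unitdate and repository must be present, and unitdate should carry an ISO 8601 style normal attribute such as 1890/1925. These rules are illustrative only, not the Hub's published requirements. A real workflow would also validate against the EAD DTD or schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical minimum rule set: element path (relative to <ead>) -> message.
REQUIRED = {
    "eadheader/eadid": "missing <eadid> (needed for a persistent identifier)",
    "archdesc/did/unittitle": "missing <unittitle>",
    "archdesc/did/unitdate": "missing <unitdate>",
    "archdesc/did/repository": "missing <repository>",
}
# Simple years or year ranges, e.g. "1890" or "1890/1925" (a subset of ISO 8601).
NORMAL_DATE = re.compile(r"\d{4}(/\d{4})?")

def has_text(root, path):
    """True if the element exists and contains some non-blank text."""
    element = root.find(path)
    return element is not None and "".join(element.itertext()).strip() != ""

def validate(ead_xml):
    """Return a list of problems; an empty list means the record passed."""
    try:
        root = ET.fromstring(ead_xml)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = [message for path, message in REQUIRED.items()
                if not has_text(root, path)]
    for date in root.iterfind("archdesc/did/unitdate"):
        normal = date.get("normal")
        if normal is None:
            problems.append("<unitdate> has no normal attribute, so date searches may miss it")
        elif not NORMAL_DATE.fullmatch(normal):
            problems.append(f"<unitdate normal={normal!r}> is not YYYY or YYYY/YYYY")
    return problems

record = ('<ead><eadheader/><archdesc><did><unittitle>Letters</unittitle>'
          '<unitdate normal="c1890">c.1890</unitdate></did></archdesc></ead>')
for problem in validate(record):
    print(problem)
```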
Machine-to-machine interfaces
• Web access is just one means of access to the data
• Machine access provides flexible access, so people can set their own agendas
– Z39.50
– SRU
– OAI-PMH (harvester)
• Need to provide semantic data – properly marked up, well-structured
Pilot project for SRU: Genesis portal for Women’s Studies
• Hub hosts data
• Genesis searches the Hub using SRU
• Implications for data – how to search just for appropriate descriptions?
• Possible issues with search speeds
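One way a portal such as Genesis could search only appropriate descriptions is to AND the user's terms with a subject restriction in the CQL query it sends. The sketch below shows this in Python. It assumes a hypothetical SRU endpoint and example subject headings, and it does not describe what the pilot actually did. The index names come from the CQL and Dublin Core context sets (cql.anywhere, dc.subject). CQL gives all boolean operators the same precedence, so the subject list is bracketed.

```python
from urllib.parse import urlencode

# Hypothetical SRU base URL; substitute the real service address.
HUB_SRU = "http://hub.example.ac.uk/sru"

def quote_term(term):
    """Wrap a search term in double quotes, escaping backslashes and quotes for CQL."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def scoped_query(user_terms, subjects):
    """AND the user's words with an OR-list of subject headings.
    CQL boolean operators share one precedence level, so the OR-list is bracketed."""
    subject_clause = " or ".join(f"dc.subject = {quote_term(s)}" for s in subjects)
    return f"cql.anywhere all {quote_term(user_terms)} and ({subject_clause})"

def search_retrieve_url(query, maximum_records=20):
    """Build an SRU 1.1 searchRetrieve request URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": maximum_records,
    }
    return f"{HUB_SRU}?{urlencode(params)}"

query = scoped_query("suffrage", ["Women", "Feminism"])
print(query)
# prints: cql.anywhere all "suffrage" and (dc.subject = "Women" or dc.subject = "Feminism")
print(search_retrieve_url(query))
```

How well this works depends on the data: the restriction only finds descriptions whose controlaccess terms were applied consistently, which ties back to the "Implications for data" point.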
Persistent Identifiers
• All Hub descriptions have their own identifiers – a unique reference
• Gives them their own web address – can point to any description
• Facilitates linking, e.g. from National Register of Archives
• Enables bookmarking of content
http://www.archiveshub.ac.uk/arch/glossary.shtml#identifier
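Persistent identifiers only work if each one is unique and keeps pointing at the same description. The sketch below turns an eadid into a web address and refuses any new identifier that would collide with an existing one. It assumes a hypothetical base address and a hypothetical normalisation rule (lower-case, letters and digits only). The Hub's actual identifier scheme is described at the glossary link above.

```python
import re

# Hypothetical base address and normalisation rule, for illustration only.
BASE = "http://hub.example.ac.uk/data/"

def normalise(eadid):
    """Lower-case the identifier and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", eadid.lower())

class IdentifierRegistry:
    """Hands out one stable URL per eadid and rejects colliding identifiers."""

    def __init__(self):
        self._by_key = {}   # normalised key -> original eadid

    def register(self, eadid):
        key = normalise(eadid)
        if not key:
            raise ValueError(f"identifier {eadid!r} has no usable characters")
        existing = self._by_key.get(key)
        if existing is not None and existing != eadid:
            raise ValueError(f"{eadid!r} collides with {existing!r} as {key!r}")
        self._by_key[key] = eadid
        return BASE + key

registry = IdentifierRegistry()
print(registry.register("GB 133 ABC"))
```

Registering the same eadid again returns the same address, so links from services such as the National Register of Archives and users' bookmarks stay valid when a description is updated.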
Challenges (of which there are many)
• Understanding our users
• Encouraging item-level descriptions
• Encouraging images/links to content
• Which technology?
• Perceptions of relevancy
• Understanding impact
• Sustainability
Flickr cc licence: hoodwink’s photostream
Moving Forward
• Increasing content and contributors
• Branding and new Website
• More engagement with users / user generated content
• Continuing to be standards-based, open and interoperable