Open Archives Forum 3rd Workshop, Berlin 28th March 2003
Overview – European activities of Open Archives Multimedia Projects
Philip Hunter
UKOLN
Introduction to finding distributed multimedia resources
Simple Web searching for Multimedia resources and its deficiencies
The advantages of proper metadata for finding and using multimedia
Quick review of the key features of the OAI Protocol for Metadata Harvesting
How Metadata Harvesting is being (and might be) used by Multimedia Projects
European activities of open archives multimedia projects
A selection of European Projects using (or considering using) OAI as a means to make Multimedia collections more accessible
Significant organisational Issues affecting take-up
A need for the Creative Commons to promote take-up?
Finding Multimedia Resources
How are multimedia resources to be located and accessed?
Search Engines – Google, Altavista, Alltheweb, Lycos, etc?
Through a single point of access? (Portal)
Through multiple points of access? (Project Web sites)
What if we want sophisticated services?
The Importance of Metadata for accessing Multimedia Resources
Metadata allows users to find out about the existence and availability of your resources.
Metadata allows users to find out important detail about the contents of your collections – whether these are digital or in some other form.
Metadata allows sophisticated use of your collections' resources (whether or not your metadata is publicly available).
Finding the elusive Multimedia Object
These objects are connected by more than the word ‘Tyger’…
Simple Web Search
Search engines bring up ‘relevant’ file names and files referenced from other files which contain ‘relevant’ text.
The right file is here, but so are lots of other irrelevant files. The noise level is high.
It would be impossible to build a useful service using this technique, even if clever algorithms are involved.
Google Advanced Search
Advanced search options are - not very advanced
Freetext searching with Boolean operators and string matching
Colour preference!
Filetype options
Content Filtering
Google Advanced Search Results
Advanced search results still contain irrelevant returns.
It would be impossible to pull an appropriate image or multimedia object into an e-learning module (for example) using this level of technology.
The problem is the limited and unstructured metadata available for these digital objects.
The elusive Multimedia Object found?
The graphical file of the text of the poem is available as an inline file from a home page at the Institute of Genomic Research.
This is unlikely to be a source that can deal with the digital rights management issues for the image.
Metadata is the solution for finding (usable) Multimedia objects
The creation of good metadata, or the repurposing of existing metadata (catalogue descriptions, etc), opens the possibility of sophisticated use and re-use of multimedia objects.
You might (for example) create a virtual reconstruction of an Assyrian palace from the 8th Century BCE, using digitisations of wall reliefs which are now distributed among museums in several countries.
Good interoperable metadata means the multimedia object can be requested and referenced for the appropriate context.
www-oi.uchicago.edu/OI/INFO/LEGACY/Leg_Assyrian_Reliefs.htm
Finding the Metadata for your Multimedia resources
How will users find the metadata for your collections (and hence the multimedia objects) any more easily than finding resources via Google?
A solution (among others) is the Open Archives Initiative Protocol for Metadata Harvesting.
Key Features of the Protocol for Metadata Harvesting
Specification of a Protocol for the exchange of Metadata
Specification of XML Markup format for the Metadata
The concept of ‘Metadata Exposure’
The concept of ‘Metadata Harvesting’
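As a rough illustration of the protocol side, an OAI-PMH request is an ordinary HTTP request whose query string carries a verb and its arguments. The sketch below builds such request URLs. The six verb names, the `metadataPrefix` and `from` arguments, and the `oai_dc` prefix come from the OAI-PMH 2.0 specification; the base URL `http://archive.example.org/oai` and the helper name `build_request` are assumptions made for this example only.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, **params):
    """Return the URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    for key, value in params.items():
        # "from" is a Python keyword, so callers pass from_ instead.
        query[key.rstrip("_")] = value
    return base_url + "?" + urlencode(query)


# Hypothetical repository address, used for illustration only.
url = build_request(
    "http://archive.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
    from_="2003-01-01",
)
print(url)
```

A harvester would issue a request like this periodically and store the returned records locally, asking only for records changed since its last visit.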
Metadata Harvesting
Extraction of metadata from various sources
Building services on local copies of metadata
Resources themselves are not harvested
Any kind of resource can be referenced by the metadata
[Diagram: a user search for “William Blake, tyger tyger” is run against the local copy of metadata held by the Service Provider, where all searching, browsing, etc. is performed. The metadata is harvested offline from the Data Providers. Each node is independently maintained, and individual nodes may support direct user interaction.]
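The harvesting model above can be sketched in a few lines: a service provider parses a `ListRecords` response, keeps only the metadata, and runs the user's search against that local copy. This is a minimal sketch, not a full harvester. The XML namespaces are those of OAI-PMH 2.0 and Dublin Core; the sample record, its identifiers, and the function names `parse_records` and `search` are invented for the example, and the response is embedded as a string so that no network access is needed.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample of what a data provider might return for ListRecords.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:tyger-01</identifier>
        <datestamp>2003-03-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>The Tyger</dc:title>
          <dc:creator>William Blake</dc:creator>
          <dc:type>Image</dc:type>
          <dc:identifier>http://archive.example.org/images/tyger-01</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Turn a ListRecords response into a list of plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        entry = {"oai_id": rec.findtext("oai:header/oai:identifier", namespaces=NS)}
        for field in ("title", "creator", "type", "identifier"):
            entry[field] = [e.text or "" for e in rec.iterfind(f".//dc:{field}", NS)]
        records.append(entry)
    return records


def search(records, term):
    """Search the local metadata copy; the resources are never fetched."""
    term = term.lower()
    return [
        r for r in records
        if any(term in value.lower() for f in ("title", "creator") for value in r[f])
    ]


records = parse_records(SAMPLE_RESPONSE)
hits = search(records, "blake")
print(hits[0]["identifier"])
```

Note that the hit carries only a reference (`dc:identifier`) to the image; the object itself stays with the data provider.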
What ‘Exposing Metadata’ means:
Placing your metadata in a repository
Becoming a Data Provider
Making your metadata repository available to harvesters
Registering your metadata repository
Using a Service Provider
What Service Providers do:
Collect metadata available from Data Providers
Place aggregated metadata in a repository
Expose aggregated metadata via a Web Interface
Provide end-user facilities (most facilities provided are currently like those available with library OPACs)
Open Archives
Archive:
repository of digital information
Open archive:
provides a machine interface for making its content available to external services
Resources which can be made accessible using OAI-PMH
Peer reviewed papers (postprints)
Grey literature (university reports, departmental documents)
Theses
Collections (as Collection Level Descriptions)
Images
Multimedia (audio & video), and e-Learning Objects
Virtually anything which can be described and referenced in some way
Third Generation Multimedia
Multimedia and OAI
A search on the terms ‘oai, multimedia’ within the magazine Cultivate Interactive returns no hits
There are lots of projects implementing OAI, and lots implementing Multimedia, but not (so far) many projects implementing both.
Multimedia and OAI
Current state of play:
Archives making metadata for their resources available
Some Archives using metadata for processing their data
Few Service providers so far providing sophisticated third party services for users of multimedia resources
e-learning modules using metadata for processing and integration of resources (?)
The New Opportunities Fund (NOF)-Digitise Programme (UK).
A NOF Portal has been created
The NOF-Digitise Programme has explicitly required projects to make metadata about their collections available for harvesting (Collection Level Description)
The programme has created a technical standards document containing general guidance which digitisation programmes might follow
The New Opportunities Fund (NOF)-Digitise Programme (UK) – Collection Level & Item Level Description
OAI and the NOF-digitise Programme
In the context of the NOF-digitise Programme, OAI-PMH could be used at item-level, to enable users to search across a number of projects in order to find materials that were of interest to them. As an example, a user might be interested in anything that refers to the town of Chipping Norton. It is unlikely that any of the Collections Level records for projects and learning resources would include this term. However, there are likely to be items of interest in NOF-digitise projects such as the British Pathe newsreels, the Taunt Collection digitised by English Heritage and the Great British Historical Atlas. The user would have to find out that these projects existed, and then visit each in turn in order to see if there was information that was relevant. Few users would invest the time needed to navigate through many different websites, and would give up in frustration. Internet search engines, such as Google or Yahoo would be of little added help, as most of the items will be within databases, and therefore be hidden from them in the so-called 'dark web'.
From: Open Archives Initiative, Metadata Harvesting and the NOF Portal - David Dawson, Re:Source (UK) February 2003. http://www.ukoln.ac.uk/nof/support/help/papers/oai-pmh/
The New Opportunities Fund (NOF)-Digitise Programme (UK) – Collection Level & Item Level Description
English Heritage Viewfinder project
England at Work (industrial history) and the Henry W Taunt Collection (vintage scenes of Oxfordshire and the Thames).
Currently makes Collection Level Description metadata available for harvesting
The New Opportunities Fund (NOF)-Digitise Programme (UK) – Collection Level & Item Level Description
Simple Search Screen for Viewfinder, allowing searching of a minimal set of record fields. This allows very imprecise, Google-ish searches.
The New Opportunities Fund (NOF)-Digitise Programme (UK) – Collection Level & Item Level Description
Advanced Search Screen for Viewfinder, allowing searching of record fields. This allows very precise searches.
If these fields are wrapped in XML, and then exposed via an OAI-compliant (i.e. harvestable) metadata repository, then the archives supplying the project with resources would be interoperable with other archives.
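To make the idea of wrapping record fields in XML concrete, here is a minimal sketch that serialises a record as unqualified Dublin Core in the `oai_dc` container, the metadata format every OAI-PMH repository must support. The namespace URIs are the standard ones. The field values and the function name `to_oai_dc` are invented for illustration, and the mapping from Viewfinder's actual search fields to Dublin Core elements is an assumption, not something the project has published here.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when the XML is written out.
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_oai_dc(fields):
    """Wrap a dict of Dublin Core element names and values in oai_dc XML."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Invented example record; the values are placeholders, not real catalogue data.
record = {
    "title": "Example photograph of a Thames lock",
    "creator": "Example photographer",
    "coverage": "Oxfordshire",
    "type": "Image",
}
xml_record = to_oai_dc(record)
print(xml_record)
```

Once records exist in this form, any OAI-PMH harvester can read them without knowing anything about the database behind the advanced search screen.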
The New Opportunities Fund (NOF)-Digitise Programme (UK) – EnrichUK Portal
The NOF-Digitise Programme Portal is called EnrichUK. It gives access to around 150 Digitisation Projects.
Each of these projects supplies Collection Level Metadata about the materials which they have digitised, and which are now available via the EnrichUK portal.
The advanced search options available are based on the record fields in the Collection Level Descriptions
The New Opportunities Fund (NOF)-Digitise Programme (UK) – EnrichUK Portal
The advanced search option for EnrichUK is not very sophisticated, since the information about collections contained within a Collection Level Description record is not very detailed.
The New Opportunities Fund (NOF)-Digitise Programme (UK) – EnrichUK Portal
The EnrichUK portal will display the full metadata records on request.
These records are available for harvesting by other service providers.
BIRTH: Building an Interactive Research and delivery network for Television Heritage
Content from the time when television was in its infancy is barely accessible. On the one hand this is due to the fact that information on this material is stored in legacy databases or paper-based catalogues; on the other hand the content is recorded onto film or tape formats out of use.
In order to provide access to this content the project BIRTH has been brought into being following a call for proposals for pilot projects in the MEDIA PLUS Programme of the European Community. Out of 50 proposals BIRTH has been selected together with five other projects for funding. Led by the Austrian, non-profit R&D organisation JOANNEUM RESEARCH, a consortium of six audiovisual archives together with another technical partner will select archive material and present it on a coherent web-portal.
So material from the early days of television is prepared together with information on how television production looked in these early days. The content provided will not only be early programs but in addition also schedules, stills, statistical figures and much more. Particular attention is given to provide language independent search possibilities and to offer possibilities to compare the different development paths in several countries over Europe.
The following audiovisual archives are partner in the BIRTH project:
- BBC (UK)
- Nederlands Instituut voor Beeld en Geluid (NL)
- ORF (A)
- RTBF (B)
- Südwestrundfunk (D)
- Telewizja Polska (PL)
Technical Partners are the Belgian company Streamcase and the Austrian, non-profit R&D organisation JOANNEUM RESEARCH.
BIRTH provides easy access to material - which is up till now buried in the archives - to the professional user, the scientific community and the general public.
No website for the BIRTH project as yet.
(Photo of tape formats from 1969 to the present day courtesy of Kazimierz Schmidt)
The New Opportunities Fund (NOF)-Digitise Programme (UK) – British Pathe Newsreels
The site makes it possible to preview all items from the 3500 hour British Pathe Film Archive (90,000 individual items) which covers news, sport, social history and entertainment from 1896 to 1970.
Higher resolution copies of the same items can be licensed for (they say) PowerPoint Presentations and Web Publishing.
The New Opportunities Fund (NOF)-Digitise Programme (UK) – British Pathe Newsreels
Result of a search for the electronic instrument designer Peter Zinovieff
Each clip has an associated description which is brought up as a result of a user search
These descriptions are not in any standard metadata format.
The LACITO Archive
An archive of natural speech in "rare" languages
The LACITO Archive provides free access to documents of connected, spontaneous speech, mostly in "rare" or endangered languages, recorded in their cultural context and transcribed in consultation with native speakers. Its goal is to contribute to the documentation and study of a precious human heritage: the world's languages. At present, the archive contains some 78 documents in 15 languages.
A sound archive with synchronized transcriptions
For linguistic science, language is first and foremost spoken language. The medium of spoken language is sound. The LACITO archive gives access to original recordings simultaneously with transcriptions and translations, as a guarantee of authenticity and as a resource for further research.
A structured, open architecture
The archived data is structured in accordance with the latest data-processing standards, in an open format, and may be downloaded for research purposes. The software used to prepare and disseminate it is open-source.
Copyright
The Archive is an ongoing project of the research group "Oral Tradition: languages and civilizations" of the French National Center for Scientific Research.
The LACITO Archive
ArtWorld
A three-year project, led by the University of East Anglia, across a partnership group of museums, art galleries and academic departments. The objective of the project is the provision of digital images and associated resources for the enhancement of learning and teaching in world art studies. It is designed to facilitate access for students and teachers to primary visual resource materials that are normally relatively inaccessible or widely scattered.
ArtWorld
To make the collections’ databases and resources applicable for independent use in life-long learning.
To link up with other on-line museums’ collections’ databases with related aims and other appropriate networks of digital resources.
Artiste/Sculpteur
Partners
- Giunti Interactive Labs (Italy)
- IT Innovation (UK)
- University of Southampton, ECS Dept. (UK)
- GET-ENST – University of Paris
- C2RMF / Louvre labs (France)
- The Museum of Cherbourg (France)
- The Uffizi Gallery (Italy)
- The Victoria and Albert Museum (UK)
- The National Gallery (UK)
Semantic and Content-based mULtimedia exPloiTation for EURopean benefit
This is a three year project funded by the 5th Framework Programme of the European Community. Began in May 2002.
Artiste/Sculpteur
The objectives are to:
develop a multimedia digital library system, with specific support for 3D objects;
develop a semantic layer and tools to populate it with metadata based on image and object content, existing metadata, and information from the Web;
develop and influence open distributed multimedia interoperability protocols;
build semantic knowledge bases using the above techniques, and exploit them through e-learning products.
Results will be applicable to many sectors: 3D object algorithms; an ontology that links multimedia and metadata; metadata generation tools; interoperable protocols; an integrated system; and semantic knowledge bases exploited by e-learning products.
Resource Discovery Network (RDN)
The RDN is a collaborative network of subject gateways, funded for use by UK Higher and Further Education by the JISC (though it is used much more widely).
Each subject gateway, as part of its service, provides the end user with access to databases of descriptions of freely available, high quality, Web resources.
Resource Discovery Network (RDN) – Resource Discovery Service
Each RDN gateway maintains one or more databases of metadata records. There are two possible approaches to providing a multidisciplinary cross search of all these databases:
a distributed search, where a "broker" sends the same request to each of the databases in turn, retrieves results from them, and presents the whole set back to the user.
a single database, where the records from each of the databases are pulled into a central store and indexed and served from there.
ResourceFinder initially used a distributed search approach, based on WHOIS++ and later Z39.50.
ResourceFinder now uses a single "union" database of all RDN records.
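The two cross-search approaches can be contrasted in a short sketch. This is illustrative only: the gateway names and records are invented, in-memory lists stand in for the gateway databases, and simple substring matching stands in for WHOIS++, Z39.50, or a real index. It is not a description of how ResourceFinder is implemented.

```python
# Hypothetical gateway databases, each holding its own metadata records.
GATEWAYS = {
    "gateway_a": [
        {"title": "Introduction to Assyrian reliefs"},
        {"title": "Thames photographs"},
    ],
    "gateway_b": [
        {"title": "Assyrian palace reconstruction"},
    ],
}


def search_one(records, term):
    """Search a single database by case-insensitive substring match."""
    term = term.lower()
    return [r for r in records if term in r["title"].lower()]


def distributed_search(gateways, term):
    """Broker: send the same request to each database in turn, merge results."""
    results = []
    for name, records in gateways.items():
        for hit in search_one(records, term):
            results.append({"source": name, **hit})
    return results


def build_union(gateways):
    """Pull every record into one central store, remembering its source."""
    union = []
    for name, records in gateways.items():
        union.extend({"source": name, **r} for r in records)
    return union


# Approach 1: query every gateway at search time.
broker_hits = distributed_search(GATEWAYS, "assyrian")

# Approach 2: build the union database once, then query it alone.
union_db = build_union(GATEWAYS)
union_hits = search_one(union_db, "assyrian")

print(broker_hits == union_hits)
```

Both approaches return the same hits here. The practical difference is that the broker depends on every gateway being reachable at search time, whereas the union database needs a mechanism for gathering the records in advance, which is the role harvesting plays.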
Resource Discovery Network (RDN) – Resource Discovery Service
UKOLN developed the Perl implementation of an OAI repository that is now in use across the RDN. It consists of two scripts: one to convert the records from the gateway database format (ROADS, MySQL tables, etc.) to DC XML records, and the other an OAI front end to that repository. In this implementation, a repository consists of one or more directories on the local file system, containing metadata records as individual files. This approach makes generating a repository very easy as there is no need to interface with databases or the like.
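The file-per-record design can be sketched as follows. This is a Python illustration of the idea, not the UKOLN Perl code: the function names, the identifier prefix `oai:example.org:`, and the sample records are all assumptions made for the example, and a real front end would also handle the OAI-PMH response envelope, datestamps, and errors.

```python
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
PREFIX = "oai:example.org:"  # hypothetical identifier prefix


def export_records(records, repo_dir):
    """Script 1: write each gateway record as its own DC XML file."""
    repo = Path(repo_dir)
    repo.mkdir(parents=True, exist_ok=True)
    for rec_id, fields in records.items():
        body = "".join(f"<dc:{k}>{escape(v)}</dc:{k}>" for k, v in fields.items())
        xml = (
            f'<oai_dc:dc xmlns:oai_dc="{OAI_DC_NS}" xmlns:dc="{DC_NS}">'
            f"{body}</oai_dc:dc>"
        )
        (repo / f"{rec_id}.xml").write_text(xml, encoding="utf-8")


def list_identifiers(repo_dir):
    """Script 2 (front end): the identifiers are just the file names."""
    return sorted(PREFIX + p.stem for p in Path(repo_dir).glob("*.xml"))


def get_record(repo_dir, identifier):
    """Script 2 (front end): serve one record by reading its file."""
    path = Path(repo_dir) / (identifier[len(PREFIX):] + ".xml")
    return path.read_text(encoding="utf-8")


# Invented records standing in for rows exported from a gateway database.
with tempfile.TemporaryDirectory() as tmp:
    export_records(
        {
            "rec1": {"title": "Example resource", "creator": "A. Author"},
            "rec2": {"title": "Another resource", "creator": "B. Author"},
        },
        tmp,
    )
    ids = list_identifiers(tmp)
    first = get_record(tmp, ids[0])

print(ids)
```

The front end never touches the gateway database. Regenerating the repository means re-running the export and replacing the files.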
Resource Discovery Network (RDN) – Resource Discovery Service
It would have taken time and effort for each of the gateways to develop its own OAI front end to its database. It was much simpler for each to write a short script to export its data. The RDN OAI software included a sample ROADS to DC XML conversion script.
Exporting the data into an OAI repository means the RDN could (though currently it does not) enhance or adapt the metadata as part of the export process. It could make different attributes available in different OAI repositories for different audiences or licensees of RDN data.
Open Archives Forum Resource Database
The Open Archives Forum has a database of OAI related activity in Europe, which is publicly available.
Why is there not (yet) more OAI related Multimedia activity?
IPR
The Branding and Ownership issue
Rights management & Payments
Control over the context in which the resource appears
The project funding is limited
Just because it is possible to make everything available, do you want to do this?
OAI PMH as a tool in the Multimedia World
How it looks:-
United Kingdom – significant activity and lots of interest
Europe – significant activity and lots of interest
CEE – not much activity, but a good deal of interest, and an awareness of the possibilities
Scandinavia – some activity, and a lot of interest
http://www.oaforum.org