arts and humanities e-science—current practices and future challenges



Future Generation Computer Systems 25 (2009) 474–480

Contents lists available at ScienceDirect

Future Generation Computer Systems

journal homepage: www.elsevier.com/locate/fgcs

Arts and humanities e-science—Current practices and future challenges

Tobias Blanke a,∗, Mark Hedges b, Stuart Dunn a

a Arts and Humanities e-Science Support Centre, King’s College London, United Kingdom
b Centre for e-Research, King’s College London, United Kingdom

Article info

Article history:
Received 16 June 2008
Received in revised form 24 September 2008
Accepted 1 October 2008
Available online 17 October 2008

Keywords:
Arts
Humanities
Data management
Access grid
High performance computing
Virtual workbench

Abstract

This article offers an analysis of UK arts and humanities e-Science practices in order to identify current trends. It also considers challenges of how arts and humanities disciplines fit into the overall e-Science agenda. We will discuss a first phase of early experimentation projects in 2007 and continue with a second phase since 2007, which more systematically investigates methodologies and technologies that could provide answers to grand challenges in digital arts and humanities research.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction: e-Science in the arts and humanities

This article offers an analysis of UK arts and humanities e-Science practices in order to identify current trends. It also considers challenges of how arts and humanities disciplines fit into the overall e-Science agenda. We aim to develop a specific research agenda for arts and humanities e-Science that builds upon experiences from early experiments and subsequent systematic investigations during the first three years of the UK arts and humanities e-Science projects. The article will focus on research projects and trends and therefore not cover the institutional infrastructure to support them.¹

The UK arts and humanities e-Science initiative is among the best established in the world, but it is by no means the only one. There have been several projects in the US that try to link arts and humanities research with cyberinfrastructure developments. In particular, the recently funded Bamboo project² aims to deliver shared technology services to the arts and humanities community. HASTAC is another significant platform for the US.³ In Germany, TextGrid has been very successful in building a community grid for textual editing as part of the national D-Grid initiative.⁴ Its approach, however, differs from the UK one in so far as almost the complete investment for arts and humanities e-Science went into this single project. There have been several other attempts worldwide to use grids and other e-Science technologies for humanities projects, particularly in library and information science, but these have mostly remained isolated projects. UK arts and humanities e-Science is following a different approach by distributing the existing money among as many researchers as possible to make it easier to find out about grass-roots research interests in the community.

This article argues that from the grass-roots activities in the UK specific research interests in arts and humanities e-Science have been developing over the past three years, which justify further investigations. The contribution of this article will be to bring them together into a common research agenda. Rather than simply surveying these activities, we offer a classification of them that allows us to position them within the newly emerging discipline of arts and humanities e-Science. Therefore, we have concentrated on common agenda items such as e-Science empowered collaboration in arts or virtual workbenches for support of research on digital data. These projects are all good indicators of what the future might deliver, as grand challenges for the arts and humanities e-Science programme such as the data deluge [4] emerge. For example, the Bush administration will have produced over 100 million emails by the end of its term [8]. These can provide the basis for new types of historical and socio-political research that will take advantage of computational methods to deal with mass digital data.

The integration of data items into arts and humanities research is non-trivial, as complicated semantics underlie the archives of human reports. Humanities data may be highly contextual, its interpretation depending on relationships to other resources and collections, which are not necessarily digital. For instance, a historian discussing the war in Iraq and the involvement of the Bush family might refer to two different historical events, depending on the historical context they are writing about. Also, new retrieval methods for digital data must be intuitive for the user and not based on complicated metadata schemes. They have to be specific in their return and deliver exactly the particular piece of information the researcher is interested in. This is fairly straightforward for structured information if it is correctly described, but highly complex for unstructured information [8], as is most common in the arts and humanities.

Some answers to such challenges have already been tried out in the early experimentation projects in UK arts and humanities e-Science, but they have been even more systematically investigated in the second generation projects since 2007.

∗ Corresponding author. E-mail address: [email protected] (T. Blanke).
1 An overview of these activities can be found on the Arts and Humanities e-Science Support Centre website: http://www.ahessc.ac.uk.
2 http://projectbamboo.org/.
3 http://www.hastac.org.
4 http://www.textgrid.de.

0167-739X/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2008.10.004

2. Early experimentations in arts and humanities e-science

The first set of projects within the UK arts and humanities e-Science programme, which ended in 2007, can be seen as a pioneering phase, in which ad hoc experiments of early adopters have dominated [2]. The second phase, which will be discussed in the next section, can be considered as one of systematic investigations where specific experimentations deliver specific building blocks of an e-Infrastructure for future arts and humanities research. The challenge of the first phase was to prove that e-Science in the arts and humanities holds substantial benefits for research and can therefore spark interest among researchers. At the same time, the first phase also confronted established methods in computing with new challenges specific to arts and humanities research data. The second phase of systematic investigation continues to work on these specific challenges. It explores how e-Science tools and methodologies are useful for research in the arts and humanities. Methodological commonalities are clearly emerging.

In this section, which covers the early experimentations, we will first describe the initial user requirements gathering attempts; second, consider the early ideas about how to deal with the data deluge; third, analyse what virtual workbenches could mean for arts and humanities research; and fourth, consider the enthusiastic uptake of e-Science in performance theory and practice.

2.1. User requirements

Over the past few years, substantial efforts have gone into the capture of user requirements for arts and humanities e-Science and e-Research [9]. It has long been noticed that the uptake of e-Science infrastructures seems to be particularly challenging for the arts and humanities [5]. There are some exceptions of highly integrated disciplines in the broad range of arts and humanities sub-disciplines, but generally speaking, uptake is slow and more challenging than in more quantitative-oriented research areas such as the natural sciences or quantitative social sciences. Having said that, without a concrete user need e-Science would be an ineffective exercise.

In Oxford, three workshops entitled User Requirements Gathering for the Humanities were held at the Centre for the Study of Ancient Documents in order to analyse best practices in user requirement studies for e-Science solutions that work for humanities research.⁵ They complemented the larger e-Science Scoping Study,⁶ which produced reports on ‘grand challenges’ for different subject areas in the arts and humanities. These grand challenges are mostly linked to the specificities of arts and humanities data and the complex requirements to run, for example, information extraction algorithms on fuzzy and incomplete historical records.

Due to space limitations, we would like to focus on another project to look at the use of the video conferencing tool Access Grid. A series of workshops and experiments organised by the Humanities Research Institute in Sheffield investigated The Access Grid in Collaborative Arts and Humanities Research.⁷ For some time now, the Access Grid has sparked interest in arts and humanities, as their research often takes place in highly specialised domains and subdisciplines, niche subjects with expertise spread across universities and countries. The Access Grid can provide a cheaper alternative to face-to-face meetings.

All of the Sheffield workshops were remote collaborations across the Access Grid.⁸ The first workshop, for instance, dealt with digital images, as they become more and more important for many aspects of humanities research. Recent years have seen a steady increase in digitisation projects that demand marrying high-resolution image technologies with grid environments. In Section 2.3, we will discuss the digitisation example of the Froissart Chronicles, which have been used in grid research activities of the World Universities Network.⁹ The main finding of this workshop and its experiments with digital image applications in the Access Grid has been that the limited number of applications realised to work directly within the Access Grid restricts real interaction. Although the Access Grid might have been designed as a conference tool, the researchers missed additional services, e.g. the opportunity to collaboratively annotate the digitised images and add them to their research repositories. Access Grid development seems to be too focussed on advancing its video conferencing facilities, while little if any attention has gone into tools and methodologies for sharing and collaborating in computer supported collaborative research.

Overall, the Sheffield experiments confirmed that Access Grid might be a good environment to substitute some face-to-face meetings, but lacks innovative means of collaboration, which can be especially important in arts and humanities research. The researchers’ concern has been how to realise real multicast interaction, as it has been done in VNC technology or basic wiki technology. These could support new models of collaboration in which, as the workshop organizers stress, the physical organisation of the Access Grid suite can be accommodated to specific needs that would e.g. allow participants to walk around and interact more like in a real room. The procedure of Access Grid sessions could also be changed, away from static meetings towards more dynamic collaborations.

The workshop series in Sheffield offered insights into specific interests and concerns of humanities researchers in dealing with advanced network technologies such as the Access Grid. The next subsection will look into the needs of researchers with regard to the new mushrooming of digital resources. A preparatory project at University College London (UCL) looked at new research that might develop out of newly available large amounts of data in historical census holdings.

5 http://ahessc.ac.uk/files/active/0/URH-report.pdf.
6 http://ahds.ac.uk/e-science/e-science-scoping-study.htm.
7 http://ahessc.ac.uk/files/active/0/AG-report.pdf.
8 Information on all the completed projects, including their reports, can be found at: http://www.ahessc/projects.
9 http://wungrid.org/.

2.2. Managing the data deluge

ReACH, a workshop about Researching e-Science Analysis of Census Holdings, was held at UCL in London.¹⁰ Discussions started off from the assumption that the use of high-performance computing for humanities data is currently underdeveloped, partly due to the lack of available data sets that could form the basis for pilot projects. ReACH set out to work on available digitisations of historical census data held at the private company Ancestry.co.uk. Ancestry has built up several terabytes of census holdings data worldwide and has digitised the censuses of England and Wales under license from the UK’s National Archives.

For e-Science research, these datasets pose specific challenges. First and foremost, records are incomplete due to the fact that they are created by humans. This might lead to inconsistencies, but what makes things worse is the fact that these are 19th century records when data acquisition standards were not yet well developed. Normally, the censuses were captured by visiting households and speaking to whoever answered the door. No other information but that interview was considered. Say, an interview in 1851 recorded a 19-year-old male named Adam in a particular household. Then, it can easily be the case that, in 1861, there is either no Adam in the household and nobody remembers his existence, or there is an Adam, but now he is supposedly 41. In order to establish whether it is the same Adam, census historians apply heuristic methods that deliver probabilities of data matches or data links across different census holdings [6]. In order to effectively support such research, a system of (semi-)automatic matches of records would be needed to create what is known as a longitudinal database of individuals across the census. For that, systems have to be trained to incorporate modelling procedures as currently applied by census historians.

As for all data-driven development, managing the data deluge in the humanities means starting from the requirements of the specific data set in scope. Then, data cleansing for manually created data as in historical census sets is part of the process and not something that can be done prior to applying ready-made algorithms. It is also clear that at least semi-automatic record matching can only be done with significant processing power to cope with repeated iterations for statistical calculations on 29 million records per census [6]. With appropriate processing power, missing people and missing data could be reconstructed through contextual information.

For security reasons and to protect the rights of Ancestry, the commercially sensitive data should be run on a stand-alone machine under the supervision of Ancestry staff; it should also be delivered on physical media rather than via the internet. Arts and humanities can rely on a range of expertise in health sciences and bio-informatics when it comes to dealing with complex ethical and security issues for data sets. More research, however, is needed on the ethics underlying the arts and humanities research with human data. Lastly, as the public is generally very interested in census data, it seems feasible to use Web 2.0 techniques and social computing applications to enhance the current data resources. Many members of the public are keen genealogists and the best experts on their family’s history.

10 http://ahessc.ac.uk/files/active/0/ReACH-report.pdf.

Data management and enhancement as envisioned in ReACH are only the first steps for arts and humanities e-Science to find answers to the grand challenges emerging from their specific data needs. The next step should be to offer tools that allow processing and analysing the data. Virtual workbenches, as discussed in the following subsection, generally offer easy access to large data sets as well as the tools to work with these data sets. They have already been successfully applied in many areas of e-Science.
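The heuristic record matching across census years described in this subsection can be illustrated with a minimal Python sketch. It is not the method used in ReACH or by census historians: the record fields, the weights, the age tolerance and the threshold are all assumptions chosen for illustration, and a real system would be trained on the historians' own modelling procedures.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class CensusRecord:
    """One person as recorded in one census year (hypothetical fields)."""
    name: str
    age: int
    parish: str
    year: int


def name_similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means identical after normalisation."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def age_agreement(earlier: CensusRecord, later: CensusRecord,
                  tolerance: int = 5) -> float:
    """Score how well the later age fits the earlier age plus elapsed years."""
    expected = earlier.age + (later.year - earlier.year)
    error = abs(later.age - expected)
    return max(0.0, 1.0 - error / tolerance)


def match_score(earlier: CensusRecord, later: CensusRecord) -> float:
    """Combine field agreements into one heuristic score in [0, 1]."""
    # Illustrative weights only; they are not taken from the source.
    w_name, w_age, w_parish = 0.5, 0.3, 0.2
    parish = 1.0 if earlier.parish.lower() == later.parish.lower() else 0.0
    return (w_name * name_similarity(earlier.name, later.name)
            + w_age * age_agreement(earlier, later)
            + w_parish * parish)


def best_match(earlier: CensusRecord, candidates: Iterable[CensusRecord],
               threshold: float = 0.7) -> Optional[Tuple[CensusRecord, float]]:
    """Return the best-scoring candidate and its score, or None below threshold."""
    scored = [(c, match_score(earlier, c)) for c in candidates]
    if not scored:
        return None
    record, score = max(scored, key=lambda pair: pair[1])
    return (record, score) if score >= threshold else None
```

With these assumed weights, the Adam recorded as 19 in 1851 would link to a 30-year-old Adam in 1861 but not to one recorded as 41, because the age disagreement removes the whole age contribution from the score. Scoring every pair across two censuses of 29 million records each is what makes the processing power mentioned above necessary.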

2.3. Virtual workbench

Virtual workbenches offer a unified work environment to researchers. Successful examples from the sciences include UK AstroGrid, designed to create a working Virtual Observatory for UK and international astronomers.¹¹ Key features include data and information discovery with seamless access to a wide range of astrophysical databases as well as services to run applications against the data stores. The early experimentation phase of e-Science for arts and humanities in the UK included two virtual workbench proof-of-concept demonstrators. The first one was Oxford’s Virtual Workspace for the Study of Ancient Documents (VWSAD).¹² It allowed researchers working in the field of ancient documents to organise and annotate textual and related image data. The workspace also included a framework to add services to mediate online discussions about the data among the researchers.

The second demonstrator to prove the potential of virtual workbenches for arts and humanities e-Science was the Virtual Vellum project in Sheffield.¹³ Its aim was to enable researchers to gain access to high-resolution digitisations. The test bed was a set of illuminated manuscripts of Jean Froissart’s ‘Chronicles’, a record of the Hundred Years’ War between England and France from the 14th century. These images are not only of high interest to research, but also extremely costly. Thus, archives hesitate to give scholars access, not to mention the general public. Subject to Intellectual Property Rights, Virtual Vellum promises free access to the ‘Chronicles’ and other manuscripts for users.

The challenge for Virtual Vellum was to enable immediate access to such large-scale images without too much of a compromise in terms of viewing quality, as researchers are particularly interested in details that show differences and commonalities between images. Virtual Vellum uses tile-based data structures and has scoped the JPEG 2000 format to realise these requirements. Together with XML configuration files, JPEG 2000 can semi-automate the tiling process. Employing tiling, users do not have to view the whole image at a time, but only relevant tiles. However, at low levels of magnification this might lead to problems, since a zoomed-out view would need many full-resolution tiles at once. Therefore, the images were rescaled to half their sizes and tiled again. In the end, we have collections of tiles at different resolutions for different magnifications. Virtual Vellum has been successfully tested in Access Grid sessions where it was considered to be an interesting alternative to the too often non-interactive collaboration sessions, as discussed earlier. It is also designed to be embedded in a framework of data grid technologies such as Storage Resource Broker, which can be used to keep the image resources distributed.

The Virtual Workbench is one approach to deal with complex arts and humanities data that can already be found in data storages. The next subsection classifies performance research in arts and humanities e-Science. Here, researchers are faced not only with incomplete and fuzzy data sets but with the need to effectively integrate live data from performances into e-Science environments.
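The tiling scheme described for Virtual Vellum, in which an image is repeatedly rescaled to half its size and tiled again, can be sketched as a simple resolution pyramid. This is a hedged illustration in Python, not the project's implementation and not a JPEG 2000 codec: the 256-pixel tile size and the function names `pyramid_levels` and `visible_tiles` are assumptions made for this example.

```python
import math
from typing import List, Tuple


def pyramid_levels(width: int, height: int,
                   tile: int = 256) -> List[Tuple[int, int, int, int]]:
    """Return (level_width, level_height, tile_columns, tile_rows) per level.

    Level 0 is the full-resolution image. Each further level halves both
    dimensions (rounding up) until the whole image fits into a single tile.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height and tile must be positive")
    levels = []
    w, h = width, height
    while True:
        levels.append((w, h, math.ceil(w / tile), math.ceil(h / tile)))
        if w <= tile and h <= tile:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)
    return levels


def visible_tiles(x: int, y: int, view_w: int, view_h: int,
                  level_w: int, level_h: int,
                  tile: int = 256) -> List[Tuple[int, int]]:
    """List the (column, row) tiles that intersect a viewport at one level."""
    # Clip the viewport to the image bounds at this level.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(level_w, x + view_w), min(level_h, y + view_h)
    if x0 >= x1 or y0 >= y1:
        return []
    cols = range(x0 // tile, (x1 - 1) // tile + 1)
    rows = range(y0 // tile, (y1 - 1) // tile + 1)
    return [(c, r) for r in rows for c in cols]
```

A viewer built on such a pyramid would pick the level whose resolution is closest to the current magnification and request only the tiles returned by `visible_tiles`, so that a zoomed-out view reads a few small tiles instead of the full-resolution image.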

11 http://www2.astrogrid.org/.
12 http://ahessc.ac.uk/files/active/0/VWSAD-report.pdf.
13 http://ahessc.ac.uk/files/active/0/VV-report.pdf.


2.4. Performance

When it comes to e-Science research and performance arts, there have been two general tendencies. The first one is to use existing e-Science tools to help with the creation of practice-led research in performance arts. The second one is to support the analysis of performance data by providing effective access to heterogeneous live data resources. The first project we look at attempts to cover both approaches by creating new performance data in distributed live performances via the Access Grid and by semantically annotating them at the same time for future reuse. Held at the University of Bristol, the workshop and experimentation series Performativity, Place, Space—Locating Grid Technologies examined the use of grid and related network technologies to support research processes in performing arts.¹⁴ The Access Grid has already been widely used to realise distributed performances [1]. Unique to the experiments in the workshops, however, was the employment of the Memetic technology, a JISC-funded Virtual Research Environment (VRE) to implement dialogue mapping tools in Access Grid sessions, originally designed to annotate meetings.¹⁵ A core component of Memetic is the Open University’s Compendium, a hypermedia tool for mapping meetings, with which events can be non-chronologically linked.¹⁶ The combination of Access Grid and Compendium can deliver semantically richly annotated video recordings of live performances for later use in research and analysis.

At the Bristol workshop series, the Access Grid was more closely examined as an environment for practice-led research. The Access Grid was reconceptualised as a medium for performances. Together with Memetic, it would be a new interface for live data in performance research. The results of the Bristol series of experimentations are similar to the ones discussed above about the Sheffield workshops on the Access Grid in humanities research. Both series of workshops included experimentations which changed Access Grid into something more than just a video-conferencing tool, with a static configuration for the room it is installed in. In performance, for instance, the positioning of the cameras within the Access Grid room would be part of the performance. Within these experiments, the Access Grid has become more like its original promise, more like a new interface to the computing and network resources. This interface would include a whole room as an alternative to the classic desktop, as it has been envisioned in ‘Group-Oriented Collaboration: The Access Grid Collaboration System’ [3].

On the one hand, new technologies can deliver enhancements of arts practices, while on the other, technology and the digital world are still not able to fully mimic the analogue world. This is often seen as a limitation for arts practices. Timing exercises during the Bristol experiments revealed issues of network delay, sound latency, and sound transmissions in Access Grids. Networks naturally produce delays in transmissions, and Access Grids are therefore never able to fully synchronise activities across two different sites, which is to the disadvantage of performances. This delay, however, can be seen either as a problem or as an integral part of the creative process itself. Networked artworks could, for example, introduce these delays into their own scripts and thus acknowledge the specific characteristics of this new material for arts. Technology and creativity are not dichotomous, but are mutually dependent.

Next, we will look at another type of interface to performance arts data captured with motion capture technology.

14 http://ahessc.ac.uk/files/active/0/PPS-report.pdf.
15 http://www.memetic-vre.net/.
16 http://kmi.open.ac.uk/projects/compendium/.

In Newcastle, Culture Lab together with the North East Regional e-Science Centre and the Centre for Rehabilitation and Engineering Studies experimented with Data Services for Associated Motion Capture User Categories (AMUC).¹⁷ In this project, e-Science tools are again used to create novel and innovative interfaces to research data. AMUC targets the tracking and capturing of motion that goes beyond the everyday use of human bodies. Grid technologies provide the infrastructure to adjust motion capture data to specific user needs and to distribute it across multiple research sites. Complex, coordinated movements produced by performing artists should benefit areas that require exact measurements of human body movement such as medical engineering. Here, the complex and fuzzy data typical of the humanities can assist the wider research.

The AMUC collaboration first worked on the exact definition of user requirements regarding motion capture data in order to develop data retrieval methods for motion capture resources via grid technologies. This work included capture and storage with a Vicon advanced motion capture system, and advanced computational methods for analysis and visualisation of such data, which comprised methods for the sketch-based retrieval of the data. This newly developed retrieval method, which mimics the way performance arts researchers think about motions, does not rely on prior knowledge of metadata annotations and is not textual. Different input devices can be used, from touch screens and haptic devices to traditional computing elements. The project successfully implemented the prototype database over distributed data resources with a distributed index. Queries can be sent to the network of databases using a web service. Furthermore, customisable indexers can help new user groups to define their specific interests and needs. One indexer can access one or more data channels and return user-identified indexable features. Future work could include on-the-fly indexing for interactive installations.

The completed projects, not only in performance arts, showed that ad hoc experiments in arts and humanities e-Science offer some promising initial results. In particular, the broad range of approaches and the fact that there are nevertheless strong commonalities mean that an e-Science agenda has been developing, almost organically. From September 2007, new projects have taken this agenda forward and elaborated some of the research questions raised in the first set of projects while at the same time posing new ones.

3. Systematic investigations in arts and humanities e-science

The second phase of arts and humanities e-Science more systematically investigates methodologies and technologies that could provide answers to grand challenges in arts and humanities research. In this section, we are going to discuss the current work in the UK arts and humanities e-Science initiative. Again, the data grand challenge is at the centre of most projects, but particularly those looking to realise a better virtual library for arts and humanities research. This will be analysed in the next subsection. The second subsection will see how collaboration in arts further develops. Next to performance, music research plays an important vanguard function here, mostly because many music resources are already available in digital formats. Building on the earlier investigations into the data deluge and how to deal with it, many projects look into the so-called knowledge technologies that help with data and text mining as well as simulations in decision support for arts and humanities research. The last subsection will introduce these approaches.

17 http://ahessc.ac.uk/files/active/0/AMUC-report.pdf.

3.1. Serving the library

Serving the library with e-Science means first and foremost data integration via virtualisation that will hide ‘irrelevant’ differences between data resources, making integration easier. In this subsection two projects are presented. The first one is based at the Arts and Humanities Data Service (AHDS) and is already fairly advanced. Until the end of its funding in March 2008, the AHDS had been the central place in the UK to store digital outputs from AHRC-funded projects for more than 10 years. It also pioneered novel methods in dealing with arts and humanities data.

In order to satisfy the needs of a highly diversified community, the AHDS had to look at virtualisation technologies that improve resource-sharing as well as accessibility. This is particularly important in the arts and humanities where manually curated data is predominant. Humanities and arts data objects can have complex structures, with many internal relationships, both structural and semantic. Moreover, they are highly contextual, with many relationships to other resources and collections. These problems have led the AHDS to the flexible preservation system Fedora, 18 which supports the representation of compound digital objects and aggregations of in principle arbitrary complexity, and which allows multiple heterogeneous metadata schemas to be associated with an object. It contains built-in support for semantically representing (as an RDF/OWL graph) the internal structure of compound digital objects and relationships between objects.

The architecture of Fedora is essentially service-orientated, with all functionality being exposed as web services; in particular, all data and metadata stored within a Fedora object are made available via web services. Fedora has successfully been integrated with Storage Resource Broker (SRB), providing virtualised storage. 19 Essentially, the data storage area of Fedora was mapped onto an SRB zone. One of the AHDS successor organisations will now take first steps into using iRODS for preservation. 20 iRODS stands for integrated Rule-Oriented Data System and is based on SRB. This might be more promising as the data grid software will be able to make use of the complex metadata stored within Fedora. Data management and preservation processes on an iRODS data grid can be coded as rules controlling smaller atomic actions called micro-services. The sequence of actions performed when executing a rule can be changed by adding, removing or replacing individual services. iRODS provides an abstraction for data management policies and processes, as SRB does for storage and data. Fedora (and other) repositories can become structured data resources within the grid, while grid technologies can provide access to the contents of distributed repositories belonging to different administrative domains.

Another library and archive focussed project is based at UCL. E-Curator: 3D colour scans for remote object identification and assessment 21 will use UCL’s collections and a state-of-the-art 3D colour scanner, which can revolutionise traditional methods in museums and archives based on text and images. UCL hosts 3 museums, 10 departmental collections and half a million objects and specimens. Their Arius3D scanner is the first of its kind in Europe, providing high resolution 3D-geometry through the use of a laser triangulation system at a 100 micron point spacing. Colour information is captured with red, green and blue lasers. The project uses 3D-recording to describe artifacts as a whole, in the first instance in 6 pilot projects serving as case studies. This method will offer as yet unknown details and insights into the object’s structure. Such 3D-scans could, for instance, help with the identification of degraded surfaces. Of particular interest to the project are grid technologies which allow sharing such objects, as they are often difficult to analyse by a single research team alone and will have to be correlated with others. As with the AHDS Fedora project, data grid technologies like SRB are used by curators to share and organise data. A portal will allow annotating and viewing the objects and sharing information about them. Users can, for example, zoom or rotate the 3D-representations or change the lighting conditions and the colour mixture. In this way, UCL hopes to work towards establishing 3D-scan-data as a curatorial means of investigation. Thus, e-Curator is as much a methodological experiment as it is a technological one. The next subsection will turn towards new collaborations in arts and music.

18 http://www.fedora.info.
19 http://www.itee.uq.edu.au/eresearch/projects/dart/outcomes/FedoraDB.php.
20 http://irods.sdsc.edu/.
21 http://www.museums.ucl.ac.uk/research/ecurator/.
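As a rough illustration of the rule and micro-service model described for iRODS in this subsection, the following Python sketch treats a rule as an ordered list of small actions that share a context. It is not iRODS rule syntax and does not use any iRODS API; the names `Rule`, `compute_checksum`, `replicate`, `record_audit` and the store label `second-store` are hypothetical and chosen only for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical types: a context is a plain dictionary describing one data
# object, and a micro-service is a small function that reads and updates it.
Context = Dict[str, Any]
MicroService = Callable[[Context], None]


def compute_checksum(ctx: Context) -> None:
    """Store a SHA-256 fixity value for the object's content."""
    ctx["checksum"] = hashlib.sha256(ctx["content"]).hexdigest()


def replicate(ctx: Context) -> None:
    """Pretend to copy the object to a second (made-up) storage resource."""
    ctx.setdefault("replicas", []).append("second-store")


def record_audit(ctx: Context) -> None:
    """Append a human-readable audit entry."""
    ctx.setdefault("audit", []).append(f"processed {ctx['name']}")


@dataclass
class Rule:
    """A named policy: an ordered sequence of micro-services."""
    name: str
    services: List[MicroService] = field(default_factory=list)

    def execute(self, ctx: Context) -> Context:
        # Run each micro-service in order against the same context.
        for service in self.services:
            service(ctx)
        return ctx

    def replace(self, old: MicroService, new: MicroService) -> None:
        # Changing the policy means swapping one step, not rewriting the rule.
        self.services = [new if s is old else s for s in self.services]


def demo() -> Context:
    rule = Rule("on_ingest", [compute_checksum, replicate, record_audit])
    return rule.execute({"name": "scan-001", "content": b"example bytes"})
```

Calling `demo()` returns the context after all three steps have run. The point of the design is that a preservation policy can be adjusted by editing the list of services, which mirrors the adding, removing or replacing of micro-services mentioned above.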

3.2. Collaboration in arts and music

The Bedfordshire, Manchester and Open University based project Relocating Choreographic Process (e-Dance) on The Impact of Grid technologies and collaborative memory on the documentation of practice-led research in dance continues parts of the successful workshops on Performativity, Place, Space. 22 E-Dance will proceed with the creative and critical engagement of the dance community with e-Science and specifically Access Grid. It will investigate how video conferencing and video annotation tools can help practice-led research. One main focus is to use collective tools to make sense of the processes that take place during dance. The Access Grid will be reinvented for use in performance arts as a context for distributed performance and digital documentation of performance as research. In the end, the project will have produced a large data repository for future dance research using a combination of manual and automated annotation of the media produced in the Access Grid sessions.

Since the web has matured, readily available digital music resources have grown rapidly. At Goldsmiths, University of London, the project Purcell Plus will build upon the successful collaboration Online Musical Recognition and Searching (OMRAS), 23 which has just achieved a second phase of funding by the EPSRC. With OMRAS, it will be possible to efficiently search large-scale distributed digital music collections for related passages. It uses grid technologies to index the very large distributed music resources. Purcell Plus will make use of the latest explosion in digital data for music research. It uses Purcell’s autograph MS of ‘Fantazies and In Nomines for instrumental ensemble’ in order to investigate the methodological problems of using toolkits like OMRAS in musicology research.

Music resources available on the internet shall also be employed in musicSpace: Using and Evaluating e-Science Design Methods and Technologies to Improve Access to Heterogeneous Music Resources for Musicology in Southampton. musicSpace will bring together different resources into one single user interface, thus avoiding researchers having to carry out multiple searches on multiple resources for their research questions. They will work with the AHRC resources Cecilia, EPSRC’s CHARM, MIMAS’s COPAC and with some other commercial resources in order to best support musicology research. Decisively, new resources can be flexibly integrated and the framework of data changed. This will significantly reduce the amount of time researchers spend on looking through independent resources with the same or related queries. Knowledge in research can therefore easily be accumulated and concentrated. The next subsection discusses how arts and humanities research sees knowledge technologies benefiting their research.

22 http://www.ahessc.ac.uk/e-dance.
23 http://www.omras.org/.
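The single-interface idea described for musicSpace in this subsection can be illustrated with a small federated-search sketch in Python. It makes no claim about how musicSpace is built and does not call any real catalogue service: the classes `Record` and `FederatedSearch`, the helper `make_adapter`, and the two in-memory catalogues with their field names are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Record:
    """Common result shape shared by all (hypothetical) resources."""
    source: str
    title: str
    creator: str


Adapter = Callable[[str], Iterable[Record]]


def make_adapter(source: str, rows: List[Dict[str, str]],
                 title_key: str, creator_key: str) -> Adapter:
    """Wrap one catalogue, mapping its own field names to Record."""
    def adapter(query: str) -> Iterator[Record]:
        q = query.lower()
        for row in rows:
            if q in row[title_key].lower() or q in row[creator_key].lower():
                yield Record(source, row[title_key], row[creator_key])
    return adapter


class FederatedSearch:
    """One query interface in front of many registered resources."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, name: str, adapter: Adapter) -> None:
        # New resources can be added without touching the search logic.
        self._adapters[name] = adapter

    def search(self, query: str) -> List[Record]:
        results: List[Record] = []
        for name in sorted(self._adapters):  # stable, predictable order
            results.extend(self._adapters[name](query))
        return results


# Two made-up catalogues that describe items with different field names.
archive_rows = [{"name": "Fantazia No. 7", "composer": "Henry Purcell"}]
recording_rows = [{"track": "In Nomine", "artist": "Purcell Consort"},
                  {"track": "Goldberg Variations", "artist": "J. S. Bach"}]

portal = FederatedSearch()
portal.register("archive", make_adapter("archive", archive_rows, "name", "composer"))
portal.register("recordings", make_adapter("recordings", recording_rows, "track", "artist"))
```

A call such as `portal.search("purcell")` then queries both catalogues at once and returns results in the shared `Record` form, so the researcher issues one query rather than one per resource.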

3.3. Generating knowledge

Considering the structure of arts and humanities data, it is not surprising that in the latest funding round for UK arts and humanities e-Science, knowledge technologies such as data mining have been strongly represented. The York-based Archaeology Data Service (ADS) will be responsible for developing Archaeotools: Data mining, facetted classification and e-Archaeology. 24 Over 40,000 reports of grey literature on archaeological excavations lie potentially idle, as they are hard to access. Archaeotools is an attempt to provide access to these records using automated metadata generation techniques. These will index datasets for new links in the records in terms of When, What and Where. The underlying datasets combine over one million records from the National Monuments Records of Scotland, Wales and England as well as Historic Environment Records from numerous local authorities and the ADS’s own archive holdings. 25 The formation of the facets is supported by existing thesauri and the University of Edinburgh’s geoXwalk service, 26 which will provide geospatial information access to the data. A 3D-space will visualise facets and their links and provide access to deeper unpublished archaeological literature, while users will be able to ask for their own specific research interests to be represented in the indexing of these research records. This will create flexible access to resources up to now neglected in research.

In Oxford, a project on Image, Text, Interpretation: e-Science, Technology and Documents concentrates on deciphering and interpreting manuscripts. These are often difficult to read, as they might be fragmentary, stained, and damaged. Advanced imaging technology shall help develop and deliver an image-processing tool that can be applied by researchers to a range of documents [7]. The overall aim is to build an advanced software tool to help Classicists with deciphering texts, analogous to decision support systems used in medical research to help with reasoning under uncertain conditions.

The Medieval Warfare on the Grid: The Case of Manzikert project in Birmingham 27 will investigate the need for medieval states to sustain armies by organising and distributing resources. A grid-based framework shall virtually reenact the Battle of Manzikert in 1071, a key event in Byzantine history. Agent-based modelling technologies will attempt to find out more about the reasons why the Byzantine army was so heavily defeated by the Seljuk Turks. Grid environments offer the chance to solve such complex human modelling problems through distributed simultaneous computing. With them, it is possible to run simulations on data from different sources such as transport infrastructure, agriculture and military organisation.

In all the new projects, we can identify a clear trend towards the investigation of new methodologies for arts and humanities research, possible only because grid technologies offer previously unavailable computational resources. It is interesting to see that although none of the original projects has achieved continuation funding, the tendency towards new types of scholarship that began with the early ad hoc experimentation continues.

24 http://ads.ahds.ac.uk/project/archaeotools/.
25 http://www.nesc.ac.uk/action/esi/download.cfm?index=3714.
26 http://www.geoxwalk.ac.uk/.
27 http://www.cs.bham.ac.uk/research/projects/mwgrid/.
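The facetted When, What and Where indexing described for Archaeotools in this subsection can be sketched in a few lines of Python. This is only a minimal illustration under the assumption that facet values have already been extracted from each report; it does not reproduce the project's metadata generation techniques, and the class `FacetIndex`, the record identifiers and the sample values are invented.

```python
from collections import defaultdict
from typing import Dict, Optional, Set

# The three facets named in the text; values are assumed to be already
# extracted from each report (the extraction step is not modelled here).
FACETS = ("when", "what", "where")


class FacetIndex:
    """Inverted index from (facet, value) to the records carrying it."""

    def __init__(self) -> None:
        self._index: Dict[str, Dict[str, Set[str]]] = {
            facet: defaultdict(set) for facet in FACETS
        }

    def add(self, record_id: str, **facet_values: str) -> None:
        for facet, value in facet_values.items():
            if facet not in self._index:
                raise KeyError(f"unknown facet: {facet}")
            self._index[facet][value.lower()].add(record_id)

    def query(self, **facet_values: str) -> Set[str]:
        """Return records matching every given facet value (logical AND)."""
        matches: Optional[Set[str]] = None
        for facet, value in facet_values.items():
            ids = self._index[facet].get(value.lower(), set())
            matches = set(ids) if matches is None else matches & ids
        return matches if matches is not None else set()

    def counts(self, facet: str) -> Dict[str, int]:
        """Number of records per value, as a facet browser would display."""
        return {value: len(ids) for value, ids in self._index[facet].items()}


# Invented sample records, for illustration only.
index = FacetIndex()
index.add("report-1", when="Roman", what="villa", where="York")
index.add("report-2", when="Roman", what="fort", where="York")
index.add("report-3", when="Medieval", what="fort", where="Durham")
```

With this data, `index.query(when="Roman", where="York")` narrows the collection to the first two reports, and `index.counts("what")` gives the per-value totals that a facet browser would show next to each term.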

4. Conclusion and future trends

The e-Science initiative in the UK has sparked enthusiasm and desire in the arts and humanities community to work together with information science and computing researchers to solve challenges posed by the new digital resources available in arts and humanities. The activities within the UK’s arts and humanities e-Science community demonstrate the specific needs that have to be addressed to make e-Science work within these disciplines. The early experimentation phase delivered projects that were mostly trying out existing approaches in e-Science. They demonstrated the need for a new methodology to meet the requirements of humanities data, which is particularly fuzzy and inconsistent, as it is not automatically produced but is the result of human effort. The data is fragile and its presentation often difficult, as for example with data in the performing arts that only exists as an event. We have classified the early experimentation projects into 4 separate areas and argued that an identifiable research agenda has been emerging. From these early experiments, we could see how e-Science in the arts and humanities has matured towards the development of concrete systems that systematically investigate the use of e-Science for research. Whether it is simulation of past battles or musicology using state-of-the-art information retrieval techniques, this research would not have been possible before the shift in methodology towards e-Science and e-Research.

However, gaps remain, which need to be filled by future projects. In order for text mining and other automated knowledge generation technologies to work for humanities researchers, there is a strong need for advanced optical character recognition (OCR). A proposal by the authors and the German TextGrid consortium 28 aims to bring together advanced OCR technologies with existing text mining applications in order to support the whole life cycle of textual research. Another commonly debated suggestion would be to use the knowledge of the ‘crowds’ to support data integration for arts and humanities research. Collaborative approaches to metadata creation and editing are particularly interesting in circumstances where strong cultural differences prevent the integration of research data. The authors together with the Taiwanese Academia Sinica Grid Computing are involved in a proposal to the AHRC to use volunteer thinking for the translation and adoption of Chinese metadata records in the Taiwanese National Digital Archives. Another good example of future activities that build upon existing work is the ANCHisAE proposal to implement a collaborative, distributed infrastructure for manuscript scholarship using the knowledge and expertise of Virtual Vellum. These proposals are all under consideration by funding bodies in Europe and the UK and are mentioned here only because the authors are directly involved in them. There are many other grass-roots activities originating from the multi-faceted research agenda the arts and humanities e-Science initiative has brought about.

Acknowledgments

We would like to thank all the principal investigators of the projects discussed here for their collaboration and support. It would take too much space to list all of them here; a detailed description of their work and their reports can be found on the AHeSSC website. 29 Their research has been the foundation for the success of the Arts and Humanities e-Science Initiative. Furthermore, AHRC, EPSRC and JISC have shown a remarkable will to experimentation in their commitments to arts and humanities e-Science and to cross-council collaboration.

28 http://www.textgrid.de.

References

[1] T. Blanke, S. Dunn, The arts and humanities e-Science initiative in the UK, in: E-SCIENCE ’06: Proceedings of the Second IEEE International Conference on e-Science and Grid Computing, IEEE Computer Society, Washington, DC, USA, 2006.

[2] T. Blanke, S. Dunn, M. Hedges, The arts and humanities e-Science initiative in the UK, in: E-Science in the Arts and Humanities: From Early Experimentation to Systematic Investigation, IEEE Computer Society, Washington, DC, USA, 2007.

[3] I. Foster, C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure, 2nd ed., Morgan-Kaufmann, 2004.

[4] T. Hey, A. Trefethen, The data deluge: An e-Science perspective, in: F. Berman, A. Hey, G. Fox (Eds.), Grid Computing: Making the Global Infrastructure a Reality, John Wiley and Sons, Hoboken, NJ, 2003.

[5] M. Nentwich, Cyberscience. Research in the Age of the Internet, Austrian Academy of Science Press, Vienna, 2003.

[6] K. Schuerer, M. Woollard, National sample from the 1881 census of Great Britain 5% random sample. Working Documentation v1.1, 2002.

[7] M. Terras, Image to Interpretation. An Intelligent System to Aid Historians in Reading the Vindolanda Texts, OUP, 2006.

[8] J. Unsworth, The Draft Report of the American Council of Learned Societies Commission on Cyberinfrastructure for Humanities and Social Sciences, 2006.

[9] A. Voss, e-Research infrastructure development and community engagement, in: UK e-Science All Hands, Nottingham, 2007.

29 http://www.ahessc.ac.uk/projects/.

Tobias Blanke is working as a Research Fellow at the Arts and Humanities e-Science Support Centre (http://www.ahessc.ac.uk) at King’s College London, researching the use of e-Science technologies in the arts and humanities. He holds a Ph.D. in philosophy and is secretary of the OGF Humanities, Arts and Social Sciences Community Group. Tobias is a theme leader at the e-Science institutes and leads the technical work package for DARIAH, a European project to create a research infrastructure for arts and humanities.

Mark Hedges is Deputy Director of the Centre for e-Research at King’s College London, and before this was Technical Manager of the Arts and Humanities Data Service. At both of these institutions he has worked in the fields of data and information management, digital repositories, digital libraries and e-research infrastructures. Prior to this, he was employed for 17 years in the software industry, taking the lead on a number of large-scale development projects for industrial and commercial clients. His academic background is in mathematics and philosophy (he has a Ph.D. in mathematics) and, more recently, in Byzantine studies.

Stuart Dunn is a Research Fellow in the Centre for e-Research at King’s College London. Stuart received a Ph.D. in Aegean Bronze Age Archaeology from the University of Durham in 2002. Before joining King’s he worked for the AHRC’s ICT in Arts and Humanities Research Programme. He has published on topics in e-Science generally, on e-Science methods in archaeology, and in the fields of Minoan environmental archaeology and geospatial archaeological computing. He is also a Visiting Research Fellow in the School of Human and Environmental Science’s archaeology department at the University of Reading.