Lecture Notes in Computer Science 9316

Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison
Lancaster University, Lancaster, UK

Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA

Josef Kittler
University of Surrey, Guildford, UK

Jon M. Kleinberg
Cornell University, Ithaca, NY, USA

Friedemann Mattern
ETH Zurich, Zürich, Switzerland

John C. Mitchell
Stanford University, Stanford, CA, USA

Moni Naor
Weizmann Institute of Science, Rehovot, Israel

C. Pandu Rangan
Indian Institute of Technology, Madras, India

Bernhard Steffen
TU Dortmund University, Dortmund, Germany

Demetri Terzopoulos
University of California, Los Angeles, CA, USA

Doug Tygar
University of California, Berkeley, CA, USA

Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/7409

Sarantos Kapidakis • Cezary Mazurek
Marcin Werla (Eds.)

Research and Advanced Technology for Digital Libraries

19th International Conference on Theory and Practice of Digital Libraries, TPDL 2015
Poznań, Poland, September 14–18, 2015
Proceedings

Editors

Sarantos Kapidakis
Ionian University
Corfu
Greece

Cezary Mazurek
Poznań Supercomputing and Networking Center
Poznań
Poland

Marcin Werla
Poznań Supercomputing and Networking Center
Poznań
Poland

ISSN 0302-9743    ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-24591-1    ISBN 978-3-319-24592-8 (eBook)
DOI 10.1007/978-3-319-24592-8

Library of Congress Control Number: 2015949408

LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)

Preface

We are proud to present the proceedings of TPDL 2015, the 19th International Conference on Theory and Practice of Digital Libraries, held in Poznań, Poland, during September 14–18, 2015, organized by the Poznań Supercomputing and Networking Center (PSNC).

The International Conference on Theory and Practice of Digital Libraries (TPDL), formerly known as European Conference on Research and Advanced Technology for Digital Libraries (ECDL), constitutes a leading scientific forum on digital libraries that brings together researchers, developers, content providers, and users in the field of digital libraries. The advent of the technologies that enhance the exchange of information with rich semantics is of particular interest in the community. Information providers inter-link their metadata with user-contributed data and offer new services for the development of a web of data and for addressing the interoperability and long-term preservation challenges.

TPDL 2015 had the general theme “Connecting Digital Collections” and was focused on four major topics:

– Connecting digital libraries
– Practice of digital libraries
– Digital libraries in science
– Users, communities, personal data

There was also a special call for industry submissions, with the “Systems and Products” conference track dedicated to them.

There were 44 full paper and six short paper submissions in the main call. All submissions were independently reviewed in a triple peer review process, initially by four members of the Program Committee. A senior Program Committee member subsequently coordinated a discussion among the four reviewers. The selection stage that followed compared the paper evaluations and finalized the conference program. As a result, 22 submissions were accepted as papers and some of the remaining submissions were redirected for evaluation as potential posters or demonstrations. These redirected submissions were re-evaluated together with the 11 additional poster and demonstration submissions. Finally, 15 poster/demo submissions were accepted. The dedicated “Systems and Products” call brought an additional six accepted submissions, which are not included in this proceedings volume, but published in a separate booklet and distributed among conference participants.

The most popular topics of submissions were user interfaces and user experience, user studies for and evaluation of digital library systems and applications, applications of digital libraries, infrastructures supporting content processing, social–technical perspectives of digital information, interoperability and information integration, and digital humanities. Regarding the number of accepted papers, the top five countries of authors were: USA, The Netherlands, Greece, Germany, and Brazil. Regarding the number of submitted papers, the top five countries of authors were: Greece, Germany, USA, New Zealand, and The Netherlands.

Besides the submitted contributions, three keynote speakers were invited to present their views on crucial aspects of digital libraries. The opening keynote, given by David Giaretta, was focused on issues related to long-term data preservation. The second keynote speaker, Joseph Cancellaro, showed the user perspective in the context of retrieval of digital audio assets, and Costis Dallas in the closing keynote shared his thoughts about the scholarly practice related to access and use of humanities data, in light of his work conducted within the DARIAH-EU community. Another part of the conference was a discussion panel organized by Vittore Casarosa aiming to discuss open access to research data. Abstracts of all keynote speeches and the panel are included in the conference proceedings.

Around the main conference several side activities were organized, together creating a five-day long series of events focused on digital libraries. The overall program began with five tutorials:

– Automatic Methods for Disambiguating Author Names in Bibliographic Data Repositories
– Building Digital Library Collections with Greenstone 3
– Catmandu – A (Meta)Data Toolkit
– Dynamic Data Citation – Enabling Reproducibility in Evolving Environment
– Mappings, Application Profiles and Extensions for Cross-Domain Metadata in the Europeana Context and Beyond

Following the main conference, several workshops were organized:

– 5th International Workshop on Semantic Digital Archives (SDA 2015)
– Cloud-based Services for Digital Libraries
– Extending, Mapping, and Focusing the CRM
– Kick-Off Workshop of the IMPACT-OPF MOOC on Digitization and Digital Preservation
– Networked Knowledge Organization Systems and Services (NKOS)

In this context we are very grateful to the tutorial chairs, Giorgio Maria Di Nunzio and Giannis Tsakonas, for their hard work on attracting and evaluating the proposed tutorials. Also the workshop chairs, Trond Aalberg and Antoine Isaac, gave essential feedback to the conference tutorials. We would also like to thank all members of the conference Program Committee and especially the posters and demos chairs, José Borbinha and Preben Hansen, who did great work in evaluating a significant number of submissions and creating a very interesting program for the conference.

Of course the conference could not happen without proper publicity, which was assured by the publicity chairs, Marcos Goncalves, Raul Palma, Shigeo Sugimoto and Hussein Suleman, and by the conference media partners: Coalition for Networked Information and Digital Meets Culture.

For the Poznań Supercomputing and Networking Center, the operator of the Polish National Research and Education Network PIONIER and the main organizer of the conference, it was a great occasion to actively support knowledge exchange and networking in the worldwide digital libraries research community. Within the broad range of PSNC research and development activities, those related to the digital libraries domain have been very dynamic since 1999 and have resulted in a number of national and international projects in which PSNC and its Digital Libraries and Knowledge Platforms Department are continuously involved.

We hope that you will enjoy the proceedings and will be inspired to participate in the following editions of the TPDL conference.

September 2015

Sarantos Kapidakis
Cezary Mazurek
Marcin Werla

Organizations

General Chairs

Cezary Mazurek, PSNC, Poland
Marcin Werla, PSNC, Poland

Program Chair

Sarantos Kapidakis, Ionian University, Greece

Organizing Chair

Damian Niemir, PSNC, Poland

Workshops Chairs

Trond Aalberg, Norwegian University of Science and Technology, Norway
Antoine Isaac, VU University Amsterdam, The Netherlands

Posters and Demos Chairs

José Borbinha, IST/INESC-ID, Portugal
Preben Hansen, Stockholm University, Sweden

Tutorials Chairs

Giorgio Maria Di Nunzio, University of Padova, Italy
Giannis Tsakonas, University of Patras, Greece

Publicity Chairs

Marcos Goncalves, Universidade Federal de Minas Gerais, Brazil
Raul Palma, PSNC, Poland
Shigeo Sugimoto, University of Tsukuba, Japan
Hussein Suleman, University of Cape Town, South Africa

Senior Program Committee

Trond Aalberg, Norwegian University of Science and Technology, Norway

Maristella Agosti, University of Padua, Italy
Thomas Baker, DCMI Ltd., USA
Janusz Bień, University of Warsaw, Poland
José Borbinha, IST/INESC-ID, Portugal
George Buchanan, City University London, UK
Donatella Castelli, CNR - ISTI, Italy
Stavros Christodoulakis, Technical University of Crete, Greece
Panos Constantopoulos, Athens University of Economics and Business, Greece
Sally Jo Cunningham, Waikato University, New Zealand
Erik Duval, K.U.Leuven, Belgium
Edward Fox, Virginia Polytechnic Institute and State University, USA
Geneva Henry, Rice University, USA
Martin Klein, Los Alamos National Laboratory, USA
Stefanos Kollias, NTUA, Greece
Laszlo Kovacs, MTA SZTAKI, Hungary
Carl Lagoze, University of Michigan, USA
Ronald Larsen, University of Pittsburgh, USA
Clifford Lynch, CNI, USA
Carlo Meghini, CNR - ISTI, Italy
Erich Neuhold, University of Vienna, Austria
Christos Papatheodorou, Ionian University, Greece
Andreas Rauber, Vienna University of Technology, Austria
Thomas Risse, L3S Research Center, Germany
Giannis Tsakonas, University of Patras, Greece

Program Committee

Robert Allen, Yonsei University, Korea
David Bainbridge, University of Waikato, New Zealand
Christoph Becker, University of Toronto, Canada
Maria Bielikova, Slovak University of Technology in Bratislava, Slovakia
Tobias Blanke, University of Glasgow, UK
Pável Calado, IST/INESC-ID, Portugal
José H. Canós, Universitat Politècnica de València, Spain
Vittore Casarosa, CNR - ISTI, Italy
Lillian Cassel, Villanova University, USA
Fabio Crestani, University of Lugano, Switzerland
Theodore Dalamagas, IMIS-“Athena” R.C., Greece
Lois Delcambre, Portland State University, USA
Giorgio Maria Di Nunzio, University of Padua, Italy
Boris Dobrov, Research Computing Center of Moscow State University, Russia
J. Stephen Downie, The University of Illinois at Urbana-Champaign, USA
Fabien Duchateau, Université Claude Bernard Lyon 1 - LIRIS, France

Floriana Esposito, Università Aldo Moro Bari, Italy
Pierluigi Feliciati, Università degli studi di Macerata, Italy
Nicola Ferro, University of Padua, Italy
Schubert Foo, Nanyang Technological University, Singapore
Nuno Freire, The European Library, The Netherlands
Ingo Frommholz, University of Bedfordshire, UK
Norbert Fuhr, University of Duisburg-Essen, Germany
Richard Furuta, Texas A&M University, USA
Emmanouel Garoufallou, Alexander TEI of Thessaloniki, Greece
Manolis Gergatsoulis, Ionian University, Greece
C. Lee Giles, Pennsylvania State University, USA
Marcos Goncalves, Federal University of Minas Gerais, Brazil
Jane Greenberg, Drexel University, USA
Preben Hansen, Stockholm University, Sweden
Bernhard Haslhofer, AIT, Austria
Annika Hinze, University of Waikato, New Zealand
Nikos Houssos, National Documentation Centre/National Hellenic Research Foundation, Greece
Jane Hunter, University of Queensland, Australia
Antoine Isaac, Europeana and VU University Amsterdam, The Netherlands
Adam Jatowt, Kyoto University, Japan
Jaap Kamps, University of Amsterdam, The Netherlands
Michael Khoo, Drexel University, USA
Claus-Peter Klas, GESIS, Germany
Alexandros Koulouris, TEI of Athens, Greece
Michał Kozak, Poznań Supercomputing and Networking Center, Poland
Alberto Laender, Federal University of Minas Gerais, Brazil
Ray Larson, University of California, Berkeley, USA
Fernando Loizides, Cyprus University of Technology, Cyprus
Zinaida Manžuch, Vilnius University, Lithuania
Bruno Martins, IST - Instituto Superior Técnico, Portugal
Dana Mckay, Swinburne University of Technology, Australia
Andras Micsik, MTA SZTAKI, Hungary
Agnieszka Mykowiecka, IPI PAN, Poland
Wolfgang Nejdl, L3S and University of Hannover, Germany
Michael Nelson, Old Dominion University, USA
David Nichols, University of Waikato, New Zealand
Ragnar Nordlie, Oslo and Akershus University College, Norway
Kjetil Nørvåg, Norwegian University of Science and Technology, Norway
Pasquale Pagano, CNR - ISTI, Italy
Raul Palma, Poznań Supercomputing and Networking Center, Poland
Ioannis Papadakis, Ionian University, Greece

Maggy Pezeril, University of Montpellier, France
Dimitris Plexousakis, FORTH, Greece
Edie Rasmussen, University of British Columbia, Canada
Laurent Romary, Inria and HUB-ISDL, France
Mike Rosner, UM, Malta
Seamus Ross, University of Toronto, Canada
Raivo Ruusalepp, National Library of Estonia
Heiko Schuldt, University of Basel, Switzerland
Timos Sellis, RMIT University, Australia
Michalis Sfakakis, Ionian University, Greece
Frank Shipman, Texas A&M University, USA
Nicolas Spyratos, University of Paris South, France
Shigeo Sugimoto, University of Tsukuba, Japan
Hussein Suleman, University of Cape Town, South Africa
Tamara Sumner, University of Colorado at Boulder, USA
Atsuhiro Takasu, National Institute of Informatics, Japan
Manfred Thaller, Universität zu Köln, Germany
Chrisa Tsinaraki, European Union - Joint Research Center (EU - JRC), Italy
Yannis Tzitzikas, University of Crete and FORTH-ICS, Greece
Pertti Vakkari, University of Tampere, Finland
Felisa Verdejo, Universidad Nacional de Educacion a Distancia, Spain
Jan Weglarz, Poznan University of Technology, Poland
Iris Xie, University of Wisconsin-Milwaukee, USA
Maja Žumer, University of Ljubljana, Slovenia

Additional Reviewers

Assante, Massimiliano
Bikakis, Nikos
Candela, Leonardo
Chandrasekar, Prashant
Coro, Gianpaolo
Fafalios, Pavlos
Kalogeros, Eleftherios
Kanellos, Ilias
Kotzinos, Dimitri
Manghi, Paolo
Manguinhas, Hugo
Mckay, Dana
Moro, Robert
Papadakis, Manos
Papadakos, Panagiotis
Salem, Joseph
Stefanidis, Kostas
Students, Threephd
Students, Twophd
Williams, Kyle
Wu, Jian
Zagganas, Kostis

Outlines of Keynote Presentations

Data - Unbound by Time or Discipline – Challenges and New Skills Needed

David Giaretta

Giaretta Associates Ltd., Yetminster, Dorset, [email protected]

Abstract. We live in an exciting information age, where the deluge of data enables the 4th paradigm to be used by the greatest number of scientists who have ever lived, able to connect to hundreds of thousands of sources of information which are encoded digitally and used in an ever changing technological network.

To take advantage of these opportunities presents challenges. The most obvious involves simply coping with the volumes of data with which one has some familiarity, from familiar sources.

However, in order to combine data from multitudes of unfamiliar sources, covering a variety of disciplines, created over timescales which are long compared to technological and even many conceptual and terminological cycles, there are new challenges both for the researchers and the infrastructure needed to support them.

This presentation will focus on these challenges raised by the need to ensure we can deal with the unfamiliar and outline the resources, both human and technical, which will be needed to address them.

1 Opportunities

The term “4th paradigm” was coined by Jim Gray and colleagues to express the idea that in addition to the empirical, theoretical and computational paradigms we now have data exploration enabled by the vast amount of data that is being produced. This has been explored in the literature as a source for scientific progress. However, there are far broader opportunities which those who fund the research are interested in.

The Riding the Wave report provided a vision for 2030 which addressed the question, as part of the EU Digital Agenda, “How Europe can gain from the rising tide of scientific data”.

The starting point was the observation that “A fundamental characteristic of our age is the rising tide of data – global, diverse, valuable and complex. In the realm of science, this is both an opportunity and a challenge.”

The vision was of “a scientific e-Infrastructure that supports seamless access, use, re-use and trust of data. In a sense, the physical and technical infrastructure becomes invisible and the data themselves become the infrastructure – a valuable asset, on which science, technology, the economy and society can advance.”

2 Challenges

An underlying challenge was sustaining the availability and usability of the digitally encoded information across disciplines and over time. An associated, fundamental, question was “who pays and why”. While data is newly created and of obvious use there will be resources available, but as the Blue Ribbon Task Force pointed out, the value of much data is potential – it may be useful in the future, but this is not certain.

Resources are needed to address the many V’s [1] which are normally discussed in terms of big data – but which are also relevant to small data, since, as noted [2], the real revolution is the mass democratisation of the means of access, storage and processing of data – small as well as big.

In this presentation I divide these Vs into two groups. The first consists of Volume, Velocity, Variety and Volatility, which relate more to data management – i.e. issues which arise even if the data is being used by the researchers who created it and over just a few years. The other group consists of Veracity, Validity and Value, which this presentation will focus on for the following reasons.

Veracity, including Understandability and Authenticity, is vital for using data from unfamiliar sources and with which the researcher is unfamiliar – otherwise how can a researcher use the data and trust that it is what it is claimed to be? The challenge will be exacerbated by the data management “Vs” noted previously, in particular scaling with Variety.

Validity (including correctness, data quality and legality) is of vital interest to researchers if they wish to undertake scientifically useful work.

Value (or potential value) must be identified in order to justify keeping the data in the long term – and even in the short term (related to Volatility) – because keeping data requires resources. The minimum, relatively easily identified, costs are those for storage, which tend to scale with Volume and are very front-loaded. Other costs, which are less obvious and more uncertain, are those associated with maintaining Veracity and Validity.

3 Solutions

The bulk of the presentation will look at practical solutions to the challenges presented by the second group of V’s. These solutions involve underlying consistent concepts, technology and widely agreed procedures, all supported by skilled and well trained humans, across the whole lifecycle of data from conception through to and including curation.

They will help put in place the data infrastructure which can be used across disciplines and across time for the benefit of science, technology, the economy and society.

[1] http://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/
[2] http://www.theguardian.com/news/datablog/2013/apr/25/forget-big-data-small-data-revolution

Digital Audio Asset Archival and Retrieval: A User's Perspective

Joseph Cancellaro

Interactive Arts and Media Department
Columbia College Chicago

[email protected]

Abstract. Both academically and professionally, the problem of coding and formatting audio for archiving and retrieval is constantly present. Many factors weigh in on how to build naming conventions, search criteria, and meta tags for audio assets, particularly in large scale non-linear virtual environment productions. The tendency to build new libraries for each project is most intensive but allows a user to constantly be aware of the assets available and who created them. This requires inventing or reusing a new data retrieval and archiving engine or platform for each project. Needless to say, this approach is impractical and inefficient. The music industry has solved some of these issues through the convenience of having stylized “brands” of music to label. The user can decide whether he/she likes the music and move on to the next piece of similar character. Examples of engines like this include Pandora, Rdio, Spotify and of course Apple and Google. These automated music streaming and recommendation services function moderately well until they don't. In sound effects, this solution is more complex. When there are thousands of samples of differing footsteps, squeaks, knocks, thumps, etc. to manage per category, a more refined Content Management System (CMS) is required. Again, the impractical issue of building from scratch is introduced. One method being looked at is Fourier transform based pattern recognition analysis and algorithms. This method uses discrete Fourier transform based pattern classifiers, defined and correlated by a designer, to map, or compare against, a predefined data vector specifying a particular sound pattern. This approach has already been experimented with, but usually not in the form of a searchable library tool. An added layer to this method includes the use of artificial intelligence programming as part of a machine learning tool to help expedite searching and filing of data assets. This is currently being experimented with in the Interactive Arts and Media Department, where students and faculty are working on solutions to some of these problems. The landscape of audio, which includes music, sound and silence, is massive in size and complexity. Methods and strategies for solving for a universal identifying and archival tool are constantly on the minds of composers, sound designers and all who deal with audio assets in linear and non-linear environments. My discussion will raise some of the issues surrounding classifying audio and storage as well as problems encountered by sound designers and composers in the field.
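
The Fourier transform based comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the classifier used in the author's department, which the abstract does not specify. It assumes that a sound "pattern" is simply the magnitude spectrum of a short clip and that matching means picking the stored spectrum with the highest cosine similarity. The function names (`dft_magnitudes`, `similarity`, `classify`, `tone`), the labels "thump" and "squeak", and the synthetic sine-wave clips are all hypothetical, chosen only for illustration.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of the first N/2 discrete Fourier transform bins (naive O(N^2) DFT)."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def unit(vector):
    """Scale a vector to unit length so that loudness does not affect matching."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else list(vector)


def similarity(a, b):
    """Cosine similarity of two magnitude spectra (1.0 means identical shape)."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def classify(samples, templates):
    """Return the label of the predefined spectrum that best matches the clip."""
    spectrum = dft_magnitudes(samples)
    return max(templates, key=lambda label: similarity(spectrum, templates[label]))


def tone(freq_bin, n=64):
    """Synthetic stand-in for a recorded clip: a sine wave centred on one DFT bin."""
    return [math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]


# Hypothetical "predefined data vectors": one low and one high pitched pattern.
templates = {
    "thump": dft_magnitudes(tone(2)),
    "squeak": dft_magnitudes(tone(20)),
}

# A quieter high pitched clip should still match the "squeak" pattern.
query = [0.8 * s for s in tone(20)]
label = classify(query, templates)
```

In a searchable library the stored spectra would presumably serve as index entries next to naming conventions and meta tags; real recordings would also need windowing and a faster transform than this naive loop.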

The Era of the Post-repository: Scholarly Practice, Information and Systems in the Digital Continuum

Costis Dallas

Abstract. Research in the arts and humanities is often associated with the world of the solitary scholar, surrounded by dusty books, manuscripts, or artefacts. As early as 1959, C.P. Snow lamented the “gulf of mutual incomprehension” separating humanities scholars from scientists. Yet, the wide-ranging changes in scholarly practice associated with digital technology, the crisis of disciplinarity, the rise of new methodological and theoretical frameworks, the increased risks to the longevity of cultural resources, and the emergence of new fields of contestation around their interpretation and value, cast a different light on Snow’s notion of the “two cultures”, introducing new challenges and opportunities.

For information researchers and computer scientists engaged with the conceptualization, design and development of digital infrastructures, tools and services in the domain of the arts, humanities and cultural heritage, understanding the nature and direction of these changes is of paramount importance. The advanced field of digital humanities is only part of the story. In fact, humanists working with big data, crafting their own schemas and encoding formalisms, engaging in ontological modeling, and scripting their own analytical and representational tools are but a small minority among an increasing number of scholars producing influential, highly cited research merely facilitated by digital technology – what may be called digitally-enabled humanists. As indicated by a survey conducted by the Digital Methods and Practices Observatory Working Group of DARIAH-EU, digitally-enabled humanists frequently use applications such as word processors and spreadsheets besides repositories to organize and curate research resources, controlled vocabularies and classification systems that are more often homegrown than standard, and a variety of readily available online services and social media for information discovery, collaboration and dissemination. And, in tandem with changes in scholarly practice, the rise of computational intelligence and social and participatory media, as well as the increasing availability of humanities and heritage resources in the networked and mobile digital environment at a time of globalization, bring about new important stakeholders in their representation and interpretation, such as descendant and source communities, amateurs engaging in citizen science, and culture and heritage publics.

Established wisdom on digital infrastructures for the arts and humanities is shaped by a notion of centralized custodial control, replicating the traditional structures of the physical archive, library and museum: in other words, on the notion that research resources can be curated and preserved in the future in large-scale, centralized digital repositories. This becomes problematic as financial means grow increasingly scarce, and as the cultural record broadens to include a proliferation of born digital resources, grey literature, outcomes of independent and commercial research, fruits of self-publication, remix and social media interaction, and manifestations of community and personal memory. In fact, research on emerging digital research practices in a discipline such as archaeology shows how the availability of multimodal, interactive, real time recording and documentation technologies, and the plurality of research actors, interpretations and uses of archaeological knowledge give rise to multiple kinds of densely interconnected digital resources (e.g., GIS, LiDAR, formatted data, 3D models, annotations, interpretive narratives, video documentation, blogs, and social media interactions) and intertwine the ostensibly distinct processes of data recording and interpretation. The shift towards a digital infrastructure for humanities and heritage resources, in tandem with these changes, brings about a rising “curation crisis” which calls for a radical reconsideration of priorities in the specification and design of digital infrastructures.

Central to this reconsideration is the concept of the records continuum, originally advanced by Australian archival scholars to indicate the limits of a lifecycle approach in dealing with the capabilities and challenges of digital information. Criticizing the custodial notion of archives as data mortuaries, continuum thinking calls for a unified approach to recordkeeping capable of attending to records from the point of creation to their “pluralizing” interpretation and use by diverse communities. It resonates with a call for a radical re-examination of the theory and practice of digital curation, based on the recognition that curation of research resources facilitated by ubiquitous pervasive digital technology takes place increasingly “in the wild”, involves multiple stakeholders “exercising the archive” beyond data custodians, concerns not merely information resources qua digital objects but also their evolving epistemic content and context, and thus requires a rethink of the requirements, affordances and priorities of digital infrastructures.

The promise of going beyond traditional repositories to deploy a digital infrastructure which explicitly focuses on the provision of curation capabilities is demonstrated by the Metadata and Object Repository (MORe), a system deployed by the Digital Curation Unit, IMIS-Athena Research Centre to support the dynamic evolution and continuous semantic enrichment of heterogeneous metadata and registry descriptions of arts and humanities resources and collections. MORe has been used extensively for Europeana metadata aggregation in the CARARE and LoCloud projects, supporting semi-automated and manual digital curation activities, and leveraging workflows of external services such as historic names gazetteers and SKOS vocabularies. It supports the curation of resources “in the wild” such as Wikimedia assets, and connects with client systems, including Omeka-based LoCloud Collections, and a Metadata Entry Tool that could support “sheer curation” on a digital tablet at the point of creation.

Systems such as MORe herald a new approach to digital infrastructures, beyond the architecture and functionalities of traditional repositories such as Fedora or DSpace. Yet, a key challenge remains how to address the fact that arts and humanities scholars, amateur researchers, memory institutions, collectors, and online users and curators of digital information assets will continue to employ a bricolage of digital tools and methods available “at hand”, some of which may be imprisoned within technical or commercially-controlled silos. An overarching vision for future infrastructures might thus call for a radically expanded version of custodial repositories, combining open cloud storage of dynamic, potentially intelligent and self-documenting information objects with curation-enabled, distributed information systems and orchestrated, user-configurable services accessible to multiple interfaces of end-user tools and applications in the continuum.

Open Access to Research Data: Is it a Solution or a Problem?

Vittore Casarosa

CNR - ISTI, Italy
Panel held in connection with TPDL 2015

Panel Introduction

In the past few years, we have witnessed a (slow) paradigm shift about the way in which research results are being published and disseminated. More and more we have seen the push for publishing research results as Open Access (OA) “digital publications” and more recently the push (especially from the European Commission) towards “Open Science”. This means not only the OA publication of research results, but also the OA publication of the “input to research”, i.e. the raw material underlying the research process, generically identified as Research Data.

The main argument in favour of Open Access is that most research is done with public funds, with research results and research data being produced in the public interest, and therefore they should remain publicly available. Availability should be restricted only for legitimate reasons, such as privacy protection or intellectual copyright. Of course, Open Access does not prevent commercial exploitation and protection of the research results and the research data, with patents and copyrights.

Following the recommendations of the European Commission, to ensure open access, publication should be done either by self archiving the material in an online repository, or by open access publication in peer-reviewed open access journals, which very often charge “Article Processing Charges” to the authors, to offset the cost of making the content of the journal freely available. The first alternative is commonly indicated as “Green OA” and the second one as “Gold OA”. The diagram below, borrowed from the European Commission, summarizes these concepts.

The push towards open access to research data has only increased the number of issues generally encountered with open access to scientific publications. More than ten years ago, a report from OECD [1] identified and categorized the main issues related to Open Access.

• Technological issues: Broad access to research data, and their optimum exploitation, requires appropriately designed technological infrastructure, broad international agreement on interoperability, and effective data quality controls;

• Institutional and managerial issues: While the core open access principle applies to all science communities, the diversity of the scientific enterprise suggests that a variety of institutional models and tailored data management approaches are most effective in meeting the needs of researchers;

• Financial and budgetary issues: Scientific data infrastructure requires continued, and dedicated, budgetary planning and appropriate financial support. The use of research data cannot be maximized if access, management, and preservation costs are an add-on or after-thought in research projects;

• Legal and policy issues: National laws and international agreements directly affect data access and sharing practices, despite the fact that they are often adopted without due consideration of the impact on the sharing of publicly funded research data;

• Cultural and behavioural issues: Appropriate reward structures are a necessary component for promoting data access and sharing practices. These apply to those who produce and those who manage research data.

Panel Objectives

Given the breadth and depth of all the issues, it should be clear that the main objective of the panel is not to solve the issues of Open Access. From one point of view, Open Access to Research Data is the “solution” to achieve a “better and more efficient science (Science 2.0)”. From another point of view, Open Access to Research Data brings with it so many issues and problems that it might become an impediment to the dissemination of research results. The panel will try to stimulate a discussion and an exchange of ideas among the panellists, which is expected to trigger a wider discussion with the audience, touching (some of) the benefits and issues mentioned before. The panellists will bring to the table their experience in many of the issues mentioned before, such as infrastructures and institutional repositories, data curation in libraries and archives, long term preservation of data, education and training for data producers and data curators, and so on.

As is often the case in this type of event, most probably at the end of the panel there will be even more questions than answers, but hopefully it will have contributed to gaining a more global view and a better understanding of the issues related to the actual implementation of Open Access to Research Data.

Panel coordinator
Vittore Casarosa (CNR-ISTI, Italy)

Panel participants
David Giaretta (Alliance for Permanent Access)
Steve Griffin (University of Pittsburgh, Pittsburgh, USA)
Herbert Maschner (University of Southern Florida, Tampa, USA)
Cezary Mazurek (Poznan Supercomputing and Networking Center, Poznan, Poland)
Andy Rauber (Technical University of Vienna, Vienna, Austria)
Anna Maria Tammaro (University of Parma, Parma, Italy)

Reference

[1] Arzberger, P., et al.: Promoting access to public research data for scientific, economic, and social development. Data Sci. J. 3 (2004)

Contents

Interoperability and Information Integration

Web Archive Profiling Through CDX Summarization
Sawood Alam, Michael L. Nelson, Herbert Van de Sompel, Lyudmila L. Balakireva, Harihar Shankar, and David S.H. Rosenthal

Quantifying Orphaned Annotations in Hypothes.is
Mohamed Aturban, Michael L. Nelson, and Michele C. Weigle

Query Expansion for Survey Question Retrieval in the Social Sciences
Nadine Dulisch, Andreas Oskar Kempf, and Philipp Schaer

Multimedia Information Management and Retrieval and Digital Curation

Practice-Oriented Evaluation of Unsupervised Labeling of Audiovisual Content in an Archive Production Environment
Victor de Boer, Roeland J.F. Ordelman, and Josefien Schuurman

Measuring Quality in Metadata Repositories
Dimitris Gavrilis, Dimitra-Nefeli Makri, Leonidas Papachristopoulos, Stavros Angelis, Konstantinos Kravvaritis, Christos Papatheodorou, and Panos Constantopoulos

Personal Information Management and Personal Digital Libraries

Memsy: Keeping Track of Personal Digital Resources Across Devices and Services
Matthias Geel and Moira C. Norrie

Digital News Resources: An Autoethnographic Study of News Encounters
Sally Jo Cunningham, David M. Nichols, Annika Hinze, and Judy Bowen

Exploring Semantic Web and Linked Data

On a Linked Data Platform for Irish Historical Vital Records
Christophe Debruyne, Oya Deniz Beyan, Rebecca Grant, Sandra Collins, and Stefan Decker

Keywords-To-SPARQL Translation for RDF Data Search and Exploration
Katerina Gkirtzou, Kostis Karozos, Vasilis Vassalos, and Theodore Dalamagas

Author Profile Enrichment for Cross-Linking Digital Libraries
Arben Hajra, Vladimir Radevski, and Klaus Tochtermann

User Studies for and Evaluation of Digital Library Systems and Applications

On the Impact of Academic Factors on Scholar Popularity: A Cross-Area Study
Pablo Figueira, Gabriel Pacheco, Jussara M. Almeida, and Marcos A. Gonçalves

A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems
Joeran Beel and Stefan Langer

Connecting Emotionally: Effectiveness and Acceptance of an Affective Information Literacy Tutorial
Yan Ru Guo and Dion Hoe-Lian Goh

Applications of Digital Libraries

A Survey of FRBRization Techniques
Joffrey Decourselle, Fabien Duchateau, and Nicolas Lumineau

Are There Any Differences in Data Set Retrieval Compared to Well-Known Literature Retrieval?
Dagmar Kern and Brigitte Mathiak

tc-index: A New Research Productivity Index Based on Evolving Communities
Thiago H.P. Silva, Ana Paula Couto da Silva, and Mirella M. Moro

Digital Humanities

Detecting Off-Topic Pages in Web Archives
Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson

Supporting Exploration of Historical Perspectives Across Collections
Daan Odijk, Cristina Gârbacea, Thomas Schoegje, Laura Hollink, Victor de Boer, Kees Ribbens, and Jacco van Ossenbruggen

Impact Analysis of OCR Quality on Research Tasks in Digital Archives
Myriam C. Traub, Jacco van Ossenbruggen, and Lynda Hardman

Social-Technical Perspectives of Digital Information

Characteristics of Social Media Stories
Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson

Tyranny of Distance: Understanding Academic Library Browsing by Refining the Neighbour Effect
Dana McKay, George Buchanan, and Shanton Chang

The Influence and Interrelationships Among Chinese Library and Information Science Journals in Taiwan
Ya-Ning Chen, Hui-Hsin Yeh, and Po-Jui Lai

Poster and Demo Papers

An Experimental Evaluation of Collaborative Search Result Division Strategies
Thilo Böhm, Claus-Peter Klas, and Matthias Hemmje

State-of-the-Art of Open Access Textbooks and Their Implications for Information Provision
Ya-Ning Chen

Adaptive Information Retrieval Support for Multi-session Information Tasks
Daniel Backhausen, Claus-Peter Klas, and Matthias Hemmje

Transformation of a Library Catalogue into RDA Linked Open Data
Gustavo Candela, Pilar Escobar, Manuel Marco-Such, and Rafael C. Carrasco

Segmenting Oral History Transcripts
Ryan Shaw

Digital Libraries Unfurled: Supporting the New Zealand Flag Debate
Brandon M. Thomas, Joanna M. Stewart, David Bainbridge, David M. Nichols, William J. Rogers, and Geoff Holmes

Evaluating Auction Mechanisms for the Preservation of Cost-Aware Digital Objects Under Constrained Digital Preservation Budgets
Jose Antonio Olvera, Paulo Nicolás Carrillo, and Josep Lluis de la Rosa

Mobile Annotation of Geo-locations in Digital Books
Annika Hinze, Haley Littlewood, and David Bainbridge

Teaching Machine Learning: A Geometric View of Naïve Bayes
Giorgio Maria Di Nunzio

Study About the Capes Portal of E-Journals Non-users
Wesley Rodrigo Fernandes and Beatriz Valadares Cendón

Czech Digital Library – Big Step to the Aggregation of Digital Content in the Czech Republic
Tomas Foltyn and Martin Lhotak

MirPub v2: Towards Ranking and Refining miRNA Publication Search Results
Ilias Kanellos, Vasiliki Vlachokyriakou, Thanasis Vergoulis, Georgios Georgakilas, Yannis Vassiliou, Artemis K. Hatzigeorgiou, and Theodore Dalamagas

A Proposal for Autonomous Scientific Publishing Agent
Adam Sofronijevic, Aleksandar Jerkov, and Dejana Kavaja Stanisic

Extracting a Topic Specific Dataset from a Twitter Archive
Clare Llewellyn, Claire Grover, Beatrice Alex, Jon Oberlander, and Richard Tobin

Author Index