Common Language Resources and Technology Infrastructure www.clarin.eu Second Periodic Report D1M-2 30 September 2011 Editors: Steven Krauwer, Hetty Winkel


Common Language Resources and Technology Infrastructure

www.clarin.eu

Second Periodic Report

D1M-2

30 September 2011

Editors: Steven Krauwer, Hetty Winkel


The ultimate objective of CLARIN is to create a European federation of existing digital repositories that include language-based data, to provide uniform access to the data, wherever it is, and to provide existing language and speech technology tools as web services to retrieve, manipulate, enhance, explore and exploit the data. The primary target audience is researchers in the humanities and social sciences and the aim is to cover all languages relevant for the user community. The objective of the current CLARIN Preparatory Phase Project (2008-2010) is to lay the technical, linguistic and organisational foundations, to provide and validate specifications for all aspects of the infrastructure (including standards, usage, IPR) and to secure sustainable support from the funding bodies in the (now 23) participating countries for the subsequent construction and exploitation phases beyond 2010.


Second Periodic Report

EC FP7 project no. 212230

Deliverable/Milestone: D1M-2

Responsible: Steven Krauwer

Contributing Partners: UU, MPI, OTA, HASRIL, UTU, UHEL, UAIC, UCPH

© all rights reserved by UU on behalf of CLARIN


Scope of the Document

This is the second periodic report, covering the period July 2009 – June 2011.


Contents

1 Project objectives for period II
2 Work progress and achievements for period II
2.1 WP1C Coordination
2.1.1 Objectives
2.1.2 Summary of progress towards the objectives
2.1.3 Details of each task
2.1.3.1 Relation with national activities
2.1.3.2 Collaboration with DARIAH and the e-Infrastructures: SDH2010 and NEERI2010
2.1.3.3 Collaboration with other ESFRI infrastructures
2.1.3.4 Internal communication
2.1.3.5 CLARIN Reference Manual
2.2 WP2 Technical Infrastructure
2.2.1 Objectives
2.2.2 Summary of progress towards the objectives
2.2.3 Details of each task
2.2.3.1 Centres set-up (D2R-1a, -1b)
2.2.3.2 AAI infrastructure (D2R-2a, 3a, -3b)
2.2.3.3 Metadata (D2R-4, -5a, -5b)
2.2.3.4 Web Services (D2R-6b, -7a, -7b, -8)
2.2.3.5 EPIC persistent identifier service (D2R-2b)
2.2.3.6 Cost estimations (D2R-9a, suppressed: -9b)
2.2.3.7 Software development
2.3 WP3 Humanities Overview
2.3.1 Objectives
2.3.2 Summary of progress towards the objectives
2.3.3 Details of each task
2.4 WP5 Language Resources and Technology Overview
2.4.1 Objectives
2.4.2 Summary of progress towards the objectives
2.4.3 Details of each task
2.5 WP6 Dissemination
2.5.1 Objectives
2.5.2 Summary of progress towards the objectives
2.5.3 Details of each task
2.6 WP7 Legal Issues
2.6.1 Objectives
2.6.2 Summary of progress towards the objectives
2.6.3 Details of each task
2.7 WP8 Construction and Exploitation Agreement
2.7.1 Objectives
2.7.2 Summary of progress towards the objectives
2.7.3 Details of each task
3 Project Management
3.1 Consortium management tasks and achievements
3.2 Problems that have occurred and how they were solved
3.3 Changes in the consortium
3.4 List of project meetings, dates and venues
3.5 Development of project website and communication with Consortium
3.6 Communication and liaisons with other projects or programmes
3.7 Project planning and status
3.8 Any changes to the legal status of any of the beneficiaries
3.9 Use of foreground and dissemination activities during this period
APPENDIX: table of deliverables and milestones


1 Project objectives for period II

At the general management and coordination level the main objectives were to monitor the progress of the project, to keep the various strands of the project together and ensure their convergence, to maintain contacts with sister projects and with similar initiatives elsewhere, to promote and present the project at different forums and to coordinate the production of the CLARIN Reference Manual.

Research work on the technical infrastructure focused on federation building, creation of the registry infrastructure, collection of requirements for web services and workflows and their creation, building services and applications and making cost estimations for the construction phase.

At the user end the objectives were centered around the humanities projects (their launch and execution) and around the surveys, the impact studies and the strategic plan.

On the language technology side the coordination objectives were the formulation of interoperability requirements, the assessment of current LRT coverage and quality, reporting on the LRT survey and the taxonomy, finalizing the BLARK description, and the production of the interoperability report. Research work along the same strand aimed at providing interoperability scenarios and workflows, making plans for providing LRTs and filling LRT gaps, the integration of LRTs and making proposals for validation.

The objectives of our dissemination activities were the further development of the website and the continued publication of the newsletter. A second objective was to provide help desk and directory facilities and to prepare an outline plan for help desk facilities in the construction phase.

Work in the area of licensing and authorization aimed at analyzing licensing problems and providing templates, at the production of various federation agreements and at collaboration with neighboring initiatives.

The main objective related to governance was the preparation of the ready-to-sign agreement on the construction and exploitation phase, which eventually materialized as the ERIC application.


2 Work progress and achievements for period II

Please note that all references to project deliverables and milestones use the internal numbering that has been used in the Description of Work (e.g. D5R-2.5, meaning Deliverable 2.5 in Research Work package 5), and not the numbering that was imposed by the EC’s software. In order for the reader to keep track of the references we have included at the end of this document a table of deliverables using the original numbering.

2.1 WP1C Coordination

2.1.1 Objectives

The main objectives of the second phase were not much different from those of the first phase:
- the consolidation of the organizational and support structures for the overall coordination of the project
- the preparation and finalizing of the 2nd periodic report and of the final report
- the preparation and finalizing of two amendments of the Grant Agreement
- the organisation of meetings of the various project bodies
- communication with other projects and national funding agencies
- overall coordination of the project

2.1.2 Summary of progress towards the objectives

The following bodies constitute the CLARIN organisation:
- the Executive Board (EB), consisting of all WP leaders and the DARIAH liaison officer
- the Strategic Coordination Board (SCB), consisting of representatives appointed by the funding agencies
- the Scientific Board (SB), consisting of scientists appointed by the funding agencies
- the International Advisory Board (IAB).

During the reporting period the EB met approximately every three months, and some virtual meetings were organized as well.

After a first joint meeting of the SCB and SB in period I, both Boards had a second meeting in March 2010, together with the IAB. This second meeting was also attended by representatives of the Dutch ministry of research and the Dutch research council. The two important results of this meeting were that all three Boards expressed their satisfaction with the progress made by the project and the direction it had taken, and that there was general agreement on the proposal made by the EB to move towards the creation of an ERIC as the governing body for the future CLARIN infrastructure. Both the Dutch ministry and the research council endorsed this decision and offered their further support during the implementation process.

The SCB met again in March and April 2011 in joint meetings with representatives of the ministries in the participating countries, for the preparation of the CLARIN ERIC; see also the report of WP8. No further face-to-face meetings with the SB and IAB were organized, but they have commented on and approved our Deliverables and have been informed by e-mail of important developments in CLARIN.

A number of meetings were organized with other RI projects (such as ESFRI, ESFRI-SSH), IST projects (e.g. FlaReNet, META-Net, TrebleClef, MEDAR) and national funding agencies (in part via mail or email exchanges). See the list at the end of this section for meetings during this period.

The overall coordination of the project has been in the hands of the Coordinator and the Executive Board, assisted by the Assistant Coordinator, Hetty Winkel.


2.1.3 Details of each task

The EB has been operational from the very beginning (and even before). During this reporting period the EB again consisted of all WP leaders: Steven Krauwer, UU - WP1; Peter Wittenburg, MPI - WP2; Tamás Váradi, HASRIL - WP3; Erhard Hinrichs, UTU - WP5; Dan Cristea, UAIC - WP6; Kimmo Koskenniemi, UHEL - WP7; Bente Maegaard, UCPH - WP8; and Martin Wynne, OTA – our liaison officer with our sister ESFRI PP project DARIAH. The EB is responsible for the execution of the contract and for the overall coordination of the project. The following persons also participated in EB meetings: Dieter Van Uytvanck (MPI) for WP2, Krister Lindén (UHEL) for WP7, Hanne Fersøe (UCPH) for WP8, and Piroska Lendvai (HASRIL) for WP6.

The SCB and the SB became operational in early 2009 and had their first meeting in March 2009. The members were appointed by the national funding agencies. Some countries did not appoint a member. There were a few changes in the reporting period. The table below reflects the situation at the end of the project:

Country Scientific Board Strategic Coord. Board

Austria Prof. Wolfgang Ullrich Dressler Mrs Andrea Schmölzer

Belgium/Flanders Prof.dr. Jean-Pierre Martens Dr. Peter Spyns

Bulgaria Prof. Dr. Galya Angelova

Croatia Marko Tadic

Czech Republic Petr Ventluka

Denmark Mrs Lene Schosler Mr Peter Uffe Meier

Estonia Mrs Kadri Vider

Finland Prof. Arto Mustajoki Ms Helena Vänskä

France Prof. Benoit Habert

Germany Prof. Dr. Ludwig M. Eichinger Dr. Helge Kahler

Greece Mrs Afroditi Patroni

Hungary Prof Laszlo Hunyadi Dr. István Kenesei

Italy Dr. Nicoletta Calzolari Zamorani (chair)

Latvia Dr. Ilze Auzina Dr. Irina Arhipova

Malta Dr. Ray Fabri Dr. Nicholas J. Sammut

Netherlands Prof.dr.H.W. Bennis Prof.dr. Ans van Kemenade

Norway Mr Arne Svindland Mrs Siri Lader Bruhn

Poland Mr Maciej Piasecki (observer) Mr Maciej Piasecki (observer)

Portugal Prof Miguel Filgueiras Mrs Isabel Figueiredo

Romania Prof. Dan Tufis (vice chair) Mrs Ruxandra Popescu

Spain

Sweden Prof Joakim Nivre

U.K. Prof. Yorick Wilks Mr Matthew Dovey


The International Advisory Board (IAB) was an advisory body consisting of high-level experts from outside Europe. We organized a first (and last) meeting with them in March 2010.

Name | Department | Organisation | City | Country
Chad Kainz | Networking Services and Information Techn. | University of Chicago | Chicago | United States
Harald Baayen | Department of Linguistics | University of Alberta | Edmonton | Canada
Martha Palmer | Dept. of Linguistics and Computer Science | University of Colorado | Boulder | United States
Brian MacWhinney | Dept. of Psychology | Carnegie Mellon University | Pittsburgh | United States
Sue Ellen Wright | Modern Language and Classical Studies | Kent State University | Kent | United States
Debjani Ganguly | Research School of Humanities | Australian National University | Canberra | Australia
Steven Bird | Dept. of Computer Science | University of Melbourne | Victoria | Australia
Sadaoki Furui | Dept. of Computer Science | Tokyo Institute of Technology | Tokyo | Japan
Sebastian Drude | Linguistica, CCH | Museu Paraense Emilio Goeldi | Belem | Brazil
Neil Fraistat | Maryland Institute for Technology | University of Maryland | College Park | United States
Justus Roux | School of Languages | North West Univ | Potchefstroom | South Africa
Annie Zaenen | Palo Alto Research Centre | XEROX | Palo Alto | United States

2.1.3.1 Relation with national activities

Participation in CLARIN is not limited to consortium partners. The CLARIN network grew from 120 member institutions to over 200, covering 33 countries. Three new partners joined the CLARIN Consortium during period II, so that by the end of the project we had 36 consortium members, representing 26 countries. The new partners are:

• Technical University of Istanbul, Turkey
• University of Iceland, Reykjavik
• Vytautas Magnus University, Kaunas, Lithuania

The original plan to fully integrate CLARIN activities at the national level into the CLARIN preparatory phase project (short: CLARIN-EU, used only where confusion is possible) had to be abandoned. The main obstacles were:
- the complete absence of national funding in some countries
- the fact that different countries had widely different approaches to the creation of the national roadmap and the time schedule for this process
- the fact that in most of those countries where funding for CLARIN was made available the funding was granted on a project basis, after competitive calls for proposals; this had two serious consequences:
  • some strong players in the CLARIN-EU project did not (yet) succeed in the application
  • in successful cases the projects did not have sufficient flexibility in their programmes to accommodate tasks following from the CLARIN-EU project


Even though the activities undertaken as part of national CLARIN projects constituted, without exception, valuable contributions to the construction and the population of the future CLARIN infrastructure, most of them did not feed directly into the CLARIN project. Notable exceptions were NL, DE and FI.

The national roadmap in NL was finished in early 2008, a first call for proposals was issued in the same year, and after a successful application the Dutch CLARIN-NL project could start in early 2009, with national funding for the remainder of the preparatory phase and for the start of the construction phase. CLARIN-NL was shaped as a mix of a project and a programme, which gave it sufficient flexibility to accommodate new tasks, including tasks following from CLARIN-EU. In DE national funding was obtained for the preparatory phase, with sufficient flexibility for the CLARIN-DE team to contribute directly to CLARIN-EU. The same was the case for FI.

To this should be added that in many countries the local CLARIN teams have contributed significantly more to CLARIN-EU than could be expected on the basis of their funding from CLARIN-EU, and in many cases without any national funding. The experiences with the relation between nationally funded and CLARIN-EU activities have had a strong impact on the shape of the future CLARIN infrastructure as it has emerged from WP8.

CLARIN representatives have continued to present and promote the project at national meetings of the language resources and technology communities. An overview of this is included in the reports of the separate work packages.

2.1.3.2 Collaboration with DARIAH and the e-Infrastructures: SDH2010 and NEERI2010

From 19 to 21 October 2010 CLARIN organized a conference together with our colleagues from the DARIAH project: Supporting the Digital Humanities (SDH2010). The conference was followed by the second Networking Event for Research Infrastructures: NEERI2010. The local organization was in the hands of our CLARIN colleague Gerhard Budin and his team of the University of Vienna and the Austrian Academy of Sciences. The venue was a centrally located building of the Technical University of Vienna.

SDH2010 consisted of a number of topical sessions where providers and users presented and discussed results, obstacles and opportunities for digitally supported humanities research. Whereas the focus of SDH2010 was on the types of research made possible by research computing, NEERI2010 addressed the technical, architectural and social challenges of building the infrastructure. NEERI focused on what we share and what we can learn from each other. Examples of such commonalities are architectural issues, communication with users and integration of services and tools.

Although it was the final conference of the CLARIN and DARIAH projects, we hope and expect that it will be the start of a series of e-Humanities conferences or meetings in the future. All abstracts of the sessions, both of SDH and NEERI, can be found at the DARIAH website: http://www.dariah.eu/index.php?option=com_docman&task=cat_view&gid=87&Itemid=200.

2.1.3.3 Collaboration with other ESFRI infrastructures

CLARIN has actively participated in an informal committee of ESFRI RI-coordinators (ePPCC), in order to exchange and share experiences, problems and solutions. This committee worked in close collaboration with the EC, and it has had regular meetings (mostly virtual), sent out questionnaires and organised a number of internal workshops and contributed to workshops organised by the EC. This activity will continue under the auspices of COPORI.


2.1.3.4 Internal communication

An internal website was used to share information and documents within the project. Several mailing lists were maintained to communicate with the CLARIN community at large or specific sub-communities.

2.1.3.5 CLARIN Reference Manual

This new deliverable targets those who will be in charge of building and operating the CLARIN infrastructure at the national level. It aims at bringing together all relevant information emerging from the project that is needed to set up a national CLARIN consortium. It covers technical, linguistic, legal and governance issues, and it will also address the knowledge sharing infrastructure that will accompany the technical infrastructure. After the completion of the project it will be continuously updated and enhanced so as to reflect the latest stage of the development of the infrastructure. For technical details and discussions it will refer to project deliverables and other documents, which will be kept accessible on-line for this purpose.

Meetings organized/participated (Steven Krauwer and Hetty Winkel)

Description | Dates | Venue
EB meeting | 22-23/10, 2009 | University of Copenhagen, Denmark
EB meeting | 21-22/1, 2010 | EWI, Brussels, Belgium
EB meeting | 02-03-2010 | Utrecht University, the Netherlands
EB meeting | 17-18/06, 2010 | Zagreb University, Dubrovnik, Croatia
EB meeting | 23-24/09, 2010 | Paris, France
EB meeting | 10-11/01, 2011 | Universidad Autónoma, Madrid, Spain
EB meeting | 16-17/03, 2011 | Oxford University, U.K.
EB meeting | 20-21/06, 2011 | University of Copenhagen, Denmark
Advisory Boards meeting | 3-4/03, 2010 | Utrecht University, the Netherlands
Consortium meeting | 18-10, 2010 | Technical University Vienna, Austria
NEERI-2009 | 29/9-3/10 2009 | University of Helsinki
National Coordinators meeting | 29-30/06, 2011 | Hungarian Academy of Sciences, Institute of Linguistics, Budapest
SDH2010 preparation | 14-16/3, 2010 | Vienna
SDH2010 | 19-20/10, 2010 | Technical University Vienna, Austria
NEERI2010 | 21-10, 2010 | Technical University Vienna, Austria


Meetings attended by Steven Krauwer (WP1C)

Description | Dates | Venue
Digital Humanities Conf. | 22-29/06, 2009 | University of Maryland
LTC 2009 | 6-8/11, 2009 | Poznan, Poland
INFuture 2009 | 3-6/11, 2009 | Zagreb, Croatia
CHAINS meeting | 26-27/10, 2009 | London
CLARIN meeting MPI | 3/9, 2009 | Nijmegen
Speech workshop | 25-26/11, 2009 | Stockholm
FLaReNet meeting | 10-14/2, 2010 | Barcelona
LTDays INFSO | 22-24/03, 2010 | Luxembourg
CLARIN Portugal meeting | 18-21/03, 2010 | Lisbon
PROPOR2010 | 27/4-3/5, 2010 | Porto Alegre
LREC2010 | 18-24/5, 2010 | Malta
META-NET meeting | 4-6/06, 2010 | Berlin
CLARIN Basque workshop | 7-9/6, 2010 | San Sebastian
Baltic HLT2010 | 6-9/10, 2010 | Riga
EGI Technical Forum | 16-17/9, 2010 | Amsterdam
EHRI launch/META Forum | 16-17/11, 2010 | Brussels
CLARIN-CAT meeting | 18-20/1, 2011 | Barcelona
Closing the Deal | 2/12, 2010 | Leiden
CLARIN-CZ LINDAT | 21-22/3, 2011 | Prague
FLaReNet Forum | 25-28/5, 2011 | Venice
Dealing with Data | 8/6, 2011 | The Hague
Multimodal workshop | 23-25/6, 2011 | Lyon

Travels of CLARIN Participants to special events paid from the WP1C budget:

Person/Institute | Dates | Objective/venue
Fabio Zanzotto | 11-13/5, 2009 | Consortium meeting, Barcelona
Antonio Branco | 30/9-3/10, 2009 | NEERI/WP2 workshop, Helsinki
Andreas Witt | 30/9-3/10, 2009 | NEERI/WP2 workshop, Helsinki
Kai Wörner | 30/9-3/10, 2009 | NEERI/WP2 workshop, Helsinki
Andreas Schmidt | 30/9-3/10, 2009 | NEERI/WP2 workshop, Helsinki
Adam Przepiórkowski | 30/9-3/10, 2009 | NEERI/WP2 workshop, Helsinki
Sue Ellen Wright | 30/9-3/10, 2009 | NEERI/WP2 workshop, Helsinki
Senka Drobac | 30/9-3/10, 2009 | NEERI/WP2 workshop, Helsinki
Matej Durco | 30/9-3/10, 2009 | NEERI/WP2 workshop, Helsinki
Michael Zock | 30/9-3/10, 2009 | NEERI/WP2 workshop, Helsinki
Lawrence Lannom | 30/9-3/10, 2009 | NEERI/WP2 workshop, Helsinki
Artotel Leipzig | 18-19/11, 2009 | WP2 workshop, Leipzig
Prof. dr. Gurevych | 18-19/11, 2009 | WP2 workshop, Leipzig
Norman May | 18-19/11, 2009 | WP2 workshop, Leipzig
Richard Eckart de Castillo | 18-19/11, 2009 | WP2 workshop, Leipzig
Kirsten Bergmann | 25-26/11, 2009 | Speech workshop, Stockholm
Klara Vicsi | 25-26/11, 2009 | Speech workshop, Stockholm
Christoph Draxler | 25-26/11, 2009 | Speech workshop, Stockholm
Jean Carletta | 25-26/11, 2009 | Speech workshop, Stockholm
Nick Campbell | 25-26/11, 2009 | Speech workshop, Stockholm
Volker Steinbiss | 25-26/11, 2009 | Speech workshop, Stockholm
Ilze Auzina | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Galia Angelova | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Helge Kahler | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Istvan Kenesei | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Miguel Filgueiras | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Dan Tufis | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Sue Ellen Wright | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Jean-Pierre Martens | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Petr Ventluka | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Justus Roux | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Annie Zaenen | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Helena Vänskä | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Arto Mustajoki | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Sebastian Drude | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Neil Fraistat | 2-3/3, 2010 | CLARIN Boards meeting, Utrecht
Gulsen Eryigit | 19-20/10, 2010 | SDH2010, Vienna
Albert Esteve Palos | 19-20/10, 2010 | SDH2010, Vienna
Orsolya Vincze | 19-20/10, 2010 | SDH2010, Vienna
Bea Ehmann | 19-20/10, 2010 | SDH2010, Vienna
Frans Wiering | 19-20/10, 2010 | SDH2010, Vienna
Laszlo Janos | 19-20/10, 2010 | SDH2010, Vienna
Dan Tufis | 19-20/10, 2010 | SDH2010, Vienna
Alan Marsden | 19-20/10, 2010 | SDH2010, Vienna
David Bretherton | 19-20/10, 2010 | SDH2010, Vienna
Andrius Utka | 19-20/10, 2010 | SDH2010, Vienna
Vladislav Fomin | 19-20/10, 2010 | SDH2010, Vienna
Hans Bennis | 19-20/10, 2010 | SDH2010, Vienna
Vladimir Stissi | 19-20/10, 2010 | SDH2010, Vienna
Helen Katsiadakis | 19-20/10, 2010 | SDH2010, Vienna
Tibor Polya | 19-20/10, 2010 | SDH2010, Vienna
Louise Corti | 19-20/10, 2010 | SDH2010, Vienna
Tanja Petrovic | 19-20/10, 2010 | SDH2010, Vienna
Maria Ilvanidou | 19-20/10, 2010 | SDH2010, Vienna
Richard Lewis | 19-20/10, 2010 | SDH2010, Vienna
Todirascu Amalia | 19-20/10, 2010 | SDH2010, Vienna
Malte Rehbein | 19-20/10, 2010 | SDH2010, Vienna
Stuart Dunn | 19-20/10, 2010 | SDH2010, Vienna
Michael B. Toth | 19-20/10, 2010 | SDH2010, Vienna
Neil Fraistat | 19-20/10, 2010 | SDH2010, Vienna
Marinus van Horik | 19-20/10, 2010 | SDH2010, Vienna
Richard Deswarte | 19-20/10, 2010 | SDH2010, Vienna
Elena Pierazzo | 19-20/10, 2010 | SDH2010, Vienna
Thierry Declerck | 19-20/10, 2010 | SDH2010, Vienna
Giuseppe Longobardi | 19-20/10, 2010 | SDH2010, Vienna
Matthew Driscoll | 19-20/10, 2010 | SDH2010, Vienna
K. Mandemakers | 19-20/10, 2010 | SDH2010, Vienna
Johannes Wagner | 23-25/6, 2011 | Multimodal workshop, Lyon
Marianne Gullberg | 23-25/6, 2011 | Multimodal workshop, Lyon
Andrius Utka | 29-30/6, 2011 | CLARIN coordinators, Budapest
Peter Wittenburg | 23-25/6, 2011 | Multimodal workshop, Lyon
Erhard Hinrichs | 29-30/6, 2011 | CLARIN coordinators, Budapest
Koenraad de Smedt | 29-30/6, 2011 | CLARIN coordinators, Budapest
Kadri Vider | 29-30/6, 2011 | CLARIN coordinators, Budapest
Krista Liin | 29-30/6, 2011 | CLARIN coordinators, Budapest
Kaili Muurisep | 29-30/6, 2011 | CLARIN coordinators, Budapest
Peter Wittenburg | 29-30/6, 2011 | CLARIN coordinators, Budapest

Meetings attended by Steven Krauwer and/or Hetty Winkel (WP8)

| Description | Dates | Venue |
|---|---|---|
| WP8 meeting | 12/13-10, 2009 | Copenhagen |
| ESFRI meeting (Krauwer and Winkel) | 29/30-10, 2009 | EC, Brussels |
| WP8 meeting (Krauwer) | 10/11-11, 2009 | Copenhagen |
| Meeting OCW | 9/12-2009 | The Hague |
| Meeting NTU (Krauwer) | 15/2, 2010 | The Hague |
| Meeting with WP8 coord. | 2-5/2, 2010 | Copenhagen |
| SSH workshop | 19-22/10, 2010 | Brussels |
| Meeting with WP8 coord. | 9-10/3, 2010 | Copenhagen |
| Preparation Meeting ERIC | 30/6-2/7, 2010 | Copenhagen |
| Digital Humanities 2010 | 7-11/, 2010 | London |
| Meertens/KB meeting | 26/8, 2010 | The Hague |
| Planning meeting ERIC | 8-10/10, 2010 | Copenhagen |
| ESFRI PPP (Winkel) | 14-12, 2010 | Brussels |
| CLARIN WP8 meeting | 25-26/1, 2011 | The Hague |
| CLARIN WP8 meeting | 3-6/1, 2011 | Copenhagen |

2.2 WP2 Technical Infrastructure

2.2.1 Objectives

The objective of WP2 was to specify the technical infrastructure on which CLARIN will be based and to build an experimental prototype of the service infrastructure that will serve to validate the specifications from various perspectives and to experiment with a variety of different web services for the prospective user community.

2.2.2 Summary of progress towards the objectives

The WP has four main branches, most of which have both a specification track and a prototype implementation track:

1. Identification of centres that are strong enough to play a role as stable backbone in the emerging infrastructure, based on a self-assessment and selection procedure.

2. Establishing a federation of language resource and technology archives/repositories based on the DAM-LR experiences, integrating it with existing national identity federations, and establishing a few typical domain-wide services such as one for registering and resolving persistent identifiers.

3. Specifying and prototyping a comprehensive metadata registry system that is based on the long experience in the discipline. This branch first gathered the experiences and described all requirements, and then started building a new, more flexible component architecture.

4. Discussing the requirements for establishing a Service Oriented Architecture and the possibilities of workflow systems. This specification work is closely tied to linguistic expertise about formats and encoding systems and could only be settled after broad discussions that included other relevant initiatives.

2.2.3 Details of each task

2.2.3.1 Centres set-up (D2R-1a, -1b)

About 25 centres now host a repository, linguistic web services and/or applications. Of these, 17 offer their metadata via OAI-PMH, 3 give long-term preservation guarantees, and 11 offer persistent identifiers for the resources they host. Details about the centre setup can be found at http://www.clarin.eu/centres
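As an illustration of what offering metadata via OAI-PMH means for a harvester, the following minimal sketch builds the request URLs a client would send to a centre. The verb and argument names (`ListRecords`, `metadataPrefix`, `set`, `resumptionToken`) come from the OAI-PMH protocol itself; the endpoint `https://repository.example.org/oai` and the function name are hypothetical and do not refer to any actual CLARIN centre.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc",
                         set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is a hypothetical repository endpoint. When a resumption
    token is given it is the only argument sent besides the verb, as the
    protocol requires for follow-up requests of a paged result.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # First request of a harvest, asking for Dublin Core records.
    print(oai_list_records_url("https://repository.example.org/oai"))
```

A harvester would fetch this URL, read the records from the XML response, and repeat the request with the returned resumption token until none is returned.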

2.2.3.2 AAI infrastructure (D2R-2a, 3a, -3b)

Nine service providers have founded the CLARIN Service Provider Federation, offering services to all Dutch, German, Finnish, Danish and Swedish academic users. This temporary setup can easily be transferred to the CLARIN-ERIC as a legal entity. More details about the AAI setup can be found at http://www.clarin.eu/spf

2.2.3.3 Metadata (D2R-4, -5a, -5b)

ISOcat

The Athens Core group is formed by a broad expert panel from the Language Resources and Technology community. Originally initiated in Athens, around January 2009, it proposed an initial list of about 200 ISOcat entries related to the Metadata field. These entries have been translated into roughly 17 languages. Details about the Athens Core data categories can be found at http://www.isocat.org/rest/dcs/387

Metadata infrastructure

The Component Based Metadata XML framework has been polished and has reached a stable phase. In addition, a range of related software has been released:

• The Component Registry. With this web application users can create reusable metadata components (including links to the ISOcat data category registry) and group components into a metadata profile. These profiles can then be used to create metadata descriptions with e.g. ARBIL.

• ARBIL, a professional CMDI-compatible metadata editor.

• The CLARIN Metadata Browser. This highly sophisticated metadata repository provides a front-end to the <eXist/> XML database where all CMDI records are stored and features integration with ISOcat, relCAT and the Component Registry.

• The Virtual Collection Registry enables users to create personalized and yet shareable collections of links to research data.

• The Virtual Language Observatory is an easy-to-use portal offering access to all metadata that is gathered in CLARIN. It supports facet search, geographic browsing and uses ISOcat for the semantic mapping between the diverse CMDI profiles.

Details about the metadata infrastructure setup and links to the mentioned software can be found at http://www.clarin.eu/cmdi
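The relationship between components, profiles and data categories described above can be sketched as a small data model. This is an illustrative sketch only: the class names, field names and concept URLs are assumptions made for the example and are not the CMDI schema or the Component Registry API. It shows how a component can be reused in several profiles, and how differently named elements can be matched through a shared concept link, which is the idea behind the semantic mapping mentioned for the Virtual Language Observatory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    name: str
    concept_link: str  # hypothetical data category URL, not a real ISOcat entry


@dataclass
class Component:
    name: str
    elements: list[Element] = field(default_factory=list)


@dataclass
class Profile:
    name: str
    components: list[Component] = field(default_factory=list)


def elements_by_concept(profiles):
    """Group (profile name, element name) pairs by the concept they link to."""
    mapping = {}
    for profile in profiles:
        for component in profile.components:
            for element in component.elements:
                mapping.setdefault(element.concept_link, []).append(
                    (profile.name, element.name))
    return mapping


# One reusable component shared by two profiles (all names are made up).
general = Component("GeneralInfo", [Element("Title", "https://example.org/dc/title")])
text_corpus = Profile("TextCorpus", [general])
lexicon = Profile("Lexicon", [
    general,
    Component("LexiconInfo", [Element("Name", "https://example.org/dc/title")]),
])

mapping = elements_by_concept([text_corpus, lexicon])
```

In this sketch a search on the title concept would reach both the `Title` and the `Name` elements, even though the two profiles use different element names.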

2.2.3.4 Web Services (D2R-6b, -7a, -7b, -8)

The CLARIN centres are offering a multitude of linguistic web services. A subset of these can be orchestrated via the WebLicht web application. In addition, efforts have been made to set up a federated content and metadata search, based on the SRU/CQL protocol and web services provided by some centres. More details about the CLARIN web services can be found at http://www.clarin.eu/webservice
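To make the idea of a federated search over SRU/CQL more concrete, the sketch below builds one `searchRetrieve` request per centre for the same CQL query. The parameter names (`operation`, `version`, `query`, `maximumRecords`) are standard SRU 1.2 request parameters; the endpoint URLs and function names are hypothetical, and the sketch says nothing about how the CLARIN prototype itself was implemented.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, maximum_records=10):
    """Build an SRU 1.2 searchRetrieve URL for a single endpoint."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
    }
    return f"{base_url}?{urlencode(params)}"


def federated_search_urls(endpoints, cql_query, maximum_records=10):
    """Fan the same CQL query out to every endpoint in the federation."""
    return [sru_search_url(e, cql_query, maximum_records) for e in endpoints]


if __name__ == "__main__":
    # Hypothetical centre endpoints, used for illustration only.
    centres = ["https://centre-a.example.org/sru",
               "https://centre-b.example.org/sru"]
    for url in federated_search_urls(centres, "corpus and dutch", 5):
        print(url)
```

A federated search client would send these requests in parallel and merge the XML responses into one result list.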

2.2.3.5 EPIC persistent identifier service (D2R-2b)

A handle-based registration and management service for persistent identifiers, labelled European Persistent Identifier Consortium (EPIC), has been established. Several centres are using this service. More details about EPIC can be found at http://www.pidconsortium.eu/
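A handle-based persistent identifier consists of a prefix and a suffix separated by a slash, and can be turned into a resolvable URL through a Handle proxy. The sketch below assumes the public proxy at `https://hdl.handle.net`; the example handle `12345/ABC-123` and the function name are made up for illustration and do not refer to a registered EPIC identifier.

```python
from urllib.parse import quote


def handle_to_url(handle, proxy="https://hdl.handle.net"):
    """Turn a handle of the form 'prefix/suffix' into a resolver URL."""
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a valid handle: {handle!r}")
    # Percent-encode characters that are not safe in a URL path.
    return f"{proxy}/{quote(prefix)}/{quote(suffix, safe='/')}"


if __name__ == "__main__":
    # Hypothetical handle, used for illustration only.
    print(handle_to_url("12345/ABC-123"))
```

Because citations store only the handle, a centre can move a resource and update the registered location without breaking existing references.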

2.2.3.6 Cost estimations (D2R-9a, suppressed: -9b)

Cost estimations and calculation examples have been made and have fed into the WP8 documents in preparation of CLARIN ERIC. Deliverable D2R-9b has been suppressed as a separate deliverable as it lost its relevance in the light of the proposed financial structure of CLARIN ERIC, where the technical infrastructure will be operated and funded at the national level, coordinated by CLARIN ERIC.

2.2.3.7 Software development

The source code of all software that has been developed within the CLARIN project is published at http://www.clarin.eu/sourcecode

Events and activities

| Description | Dates | Venue |
|---|---|---|
| Workshop on phonological corpora | 31/7, 2009 | Augsburg |
| Developers meeting | 29/9, 2009 | Helsinki |
| Standards workshop and NEERI | 30/9, 2009 | Helsinki |
| WP2/WP7 meeting | 8/10, 2009 | Nijmegen |
| DRIVER conference | 20/10, 2009 | Ghent |
| Videoconference with TERENA | 30/10, 2009 | |
| Centres workshop | 5/11, 2009 | Prague |
| Web service workshop | 19/11, 2009 | Leipzig |
| APA conference | 24/11, 2009 | The Hague |
| Speech resources workshop | 25/11, 2009 | Stockholm |
| eHumanities conference | 9/12, 2009 | Oxford |
| Videoconference with WP7 | 18/12, 2009 | |
| Relation Registry workshop | 8/1, 2010 | Nijmegen |
| Demonstrator workshop | 25/1, 2010 | Amsterdam |
| CMDI meeting | 30/1, 2010 | Leiden |
| Grid meeting | 2/2, 2010 | Jülich |
| Videoconference with TERENA | 23/2, 2010 | |
| Big Grid meeting | 8/3, 2010 | Amsterdam |
| eInfrastructure presentation | 19/3, 2010 | Oslo |
| ISOcat workshop | 25/3, 2010 | Utrecht |
| Developers meeting | 29/3, 2010 | Vienna |
| EGEE user forum | 12/4, 2010 | Uppsala |
| EPIC workshop | 19/4, 2010 | Göttingen |
| LREC workshop and conference | 17/5, 2010 | Valletta |
| CLARIN-DARIAH meeting | 1/6, 2010 | The Hague |
| Videoconference with TERENA | 23/6, 2010 | |
| ISOcat workshop | 14/8, 2010 | Dublin |
| CLARA summer school | 1/8, 2010 | Bad Homburg |
| APA metadata workshop | 7/9, 2010 | Nijmegen |
| EGEE technical forum | 14/9, 2010 | Amsterdam |
| CMDI developers meeting | 16/9, 2010 | Nijmegen |
| ISO metadata meeting | 13/10, 2010 | Berlin |
| SDH/NEERI conference | 19-22/10, 2010 | Vienna |
| CMDI developers meeting | 21/10, 2010 | Vienna |
| ISO metadata meeting | 30/10, 2010 | Berlin |
| CMDI tutorial | 17/1, 2011 | Nijmegen |
| Federated search workshop | 9/5, 2011 | Nijmegen |
| XLDB workshop | 8/6, 2011 | Edinburgh |
| RAMIRI workshop | 14/6, 2011 | Amsterdam |
| WP2 session | 29/6, 2011 | Budapest |

2.3 WP3 Humanities Overview

2.3.1 Objectives

WP3 addresses several facets of supporting the Humanities and Social Sciences (HSS) research community in using the CLARIN infrastructure, which is a central aim of the CLARIN mission. Already in the preparatory phase, it is important to build bridges to the target communities, developing expertise and thereby not only improving the overall impact of the project, but also reducing the risk of either being too far ahead of them or not understanding their real needs, and hence not serving them. Researchers in the HSS domain, as in several other domains, are so far only marginally or not at all familiar with linguistic processing tools and methods, so the "Build it and they'll come" strategy may not be wise to follow: they may come but easily become bewildered, frustrated, or unsure how to get started. It proved vital for the further development of the CLARIN infrastructure to gain first-hand knowledge about the research interests, methods, and conceptual frameworks on which HSS scholars base their research. This knowledge is used in tailoring the services to suit the needs of our target communities. Reaching out to the users rested on three pillars: (1) survey and liaison work, (2) engaging in pilot collaborations, and (3) training and guidance, of which WP3 has addressed (1) and (2).

2.3.2 Summary of progress towards the objectives

2.3.3 Details of each task

Task 1 Working group formation and establishing the detailed activity plan

The working groups were formed to address the issues that are necessary for charting the target domain of WP3 (HSS users utilizing language technology and its resources) and for identifying our potential audience. (For details, see D3C-1.1).

Working Group 3.1. Scoping and Impact Study

The aim of this working group was to identify, mobilize and bring together a critical mass of producers and users around the infrastructure of language tools and resources across the heterogeneous fields that constitute the Humanities. The scoping study combines well-established social science techniques, such as interviews with key stakeholders, with new cybermetric techniques that can 'crawl' the content of the web and the hyperlinking structures between web content to identify a networked perspective of current producers and users of language resources. The scoping study is complemented by a subsequent impact study of the infrastructure.

Working Group 3.2. Overview of relevant Humanities projects and professional associations

WG3.2 focuses on the analysis of past and existing projects in the Humanities, with the aim of maximizing impact on the field, and on establishing contact with leading professional associations in the Humanities that are potential partners in employing language technology in their research. We aimed to obtain a comprehensive overview, in order to develop a thorough understanding of success stories as well as less successful attempts at using language technology in Humanities studies. To this end, a survey of Humanities projects and research methods has been conducted to analyze research concerns as well as methods of the Humanities field (cf. Task 3).

Working Group 3.3. Call for Humanities Projects

One way of securing impact was to engage in joint collaborative work with ongoing HSS projects. At the end of 2008 a Call for Proposals for Collaborating with Humanities and Social Science Projects was issued by WP3 of CLARIN (for details, see D3C-4.1), targeted at research institutions or consortia featuring high research capability, but in need of complementing their expertise and resources with specific language technology (LT) components for advancing beyond the established state of the art. CLARIN's contribution was meant to consist of guidance about, and access to, the language resources and LT that would be desirable to adopt for the particular objectives of a project. For the evaluation of the received proposals, see Task 4.

Task 2 Scoping and impact study to identify potential collaboration partners to integrate the communities (Deliverable 3C-2.1)

At the start of the CLARIN preparatory phase the following key partners were identified:

• DARIAH
• Coalition of Humanities and Arts Infrastructures and Networks (CHAIN)
• EGI.eu

And the following key collaboration activities were carried out:

• Summit of Digital Humanities Centres, London, July 2010
• Networking Events for the European Research Infrastructures (NEERI) 2009 and 2011
• Supporting the Digital Humanities (SDH) 2010 & 2011

Liaison with the DARIAH research infrastructure was institutionalized at the start of the project by the arrangement for the University of Oxford to act as the official liaison partner, participating in both the CLARIN Preparatory Phase Project and its counterpart, Preparing DARIAH. Communications between the two projects were good, and numerous joint activities and projects resulted. The most significant within the work plans of the projects were the Supporting the Digital Humanities conference, held in Vienna in October 2010, and the initiation of the Coalition of Humanities and Arts Infrastructures and Networks. Through these activities, the differing roles and activities of the two infrastructures have been clarified for all concerned, and both look forward to continuing to work closely together. At the national level, in many countries where both CLARIN and DARIAH have a presence, they are working closely together to build carefully coordinated or joint national research infrastructures, thus ensuring that DARIAH and CLARIN will work together in complementary activities, with maximal synergies, maximum value for money, and minimum overlap. Furthermore, the DASISH and EUDAT projects involve both infrastructures. While these ongoing activities are outside the scope of the CLARIN Preparatory Phase Project, and outside the scope of this report, their existence is largely a successful result of the liaison between the two projects carried out within this project.

In 2009 CLARIN initiated a panel discussion at the Digital Humanities 2009 conference (at the University of Maryland) involving leaders of the key projects, associations and networks with interests in building digital infrastructure services for the humanities, including CLARIN, DARIAH, the Alliance of Digital Humanities Organizations (ADHO), centerNet and Project Bamboo. Each panellist addressed the ways in which their activity was addressing various aspects of the challenges of building research infrastructures. A general discussion then examined how these different activities might work together better. In the aftermath of the discussion it was decided to set up an umbrella organisation to discuss and promote interoperability of services across the different initiatives. At a subsequent meeting in London, the Coalition of Humanities and Arts Infrastructures and Networks (CHAIN) was founded. CHAIN is a forum for discussion and collaboration, with a very light-weight organisational structure, with fluid membership and boundaries, no budget, and meeting only when necessary. CHAIN participants have resolved to work together on advocacy for improved infrastructure, and on aligning their infrastructure initiatives to allow the maximum interoperability of services. In 2010, a follow-up panel session was held at the Digital Humanities Conference in London.

A CHAIN technical workshop in Oxford in March 2011 brought together more than twenty application and infrastructure architects and developers, to discuss practical measures to promote interoperability of the various services under construction by the key international research infrastructure initiatives. This workshop was called by CLARIN to encourage the participants in CHAIN to go beyond strategic alliances and expressions of goodwill towards each other, and to work together on concrete technical problems. The participants included CLARIN, DARIAH, TextGRID, eSciDoc, ENROLLER, GATE, Bamboo Technology Project, centerNet, ADHO, and the Network of Expert Centres in the Digital Humanities in Britain and Ireland. There was also participation from experts in building research infrastructure for the sciences from the National Grid Service in the UK. Outcomes of the workshop included plans for the alignment of ongoing design and construction of services by the various initiatives to ensure maximum interoperability, and a project to widen participation in the CLARIN cross-search demonstrator to other centres.

In July 2011 EGI (the European Grid Infrastructure) signed a Letter of Intent with the two European Commission funded projects CLARIN and DARIAH, aiming to provide the arts and humanities with an e-Infrastructure of their own. The two projects both signed the three-way agreement, which has the express intention of ensuring that the technology developed by the two ESFRI projects and EGI is compatible and provides the best service to their users. The agreement also provides a blueprint for similar arrangements between EGI and other user communities that are interested in exploring grid technology but are temporarily constrained from establishing a Virtual Research Community, pending the establishment of legal entities for the infrastructures.

While there is a wide range of research using the European grid, the arts and humanities are currently under-represented, but both DARIAH and CLARIN aim to remedy this. As the key European Strategy Forum on Research Infrastructures (ESFRI) programme projects building infrastructure for the Arts and Humanities, as well as the interpretive Social Sciences, they will play a key role in providing access to high performance research computing facilities for these domains. This initiative helps to support a coherent approach to policy-making on research infrastructures in Europe so that researchers get the resources and technologies they need.

This latest agreement cements the relationship between the two ESFRI projects and EGI.eu, helping all three to develop common tools and technologies while exploring further opportunities for collaboration. Connecting EGI with the infrastructures developed by these two projects will bring benefits for the sites within partner countries that provide the resources, for the countries themselves in terms of integrating the resource centres with the EGI monitoring and support services and, finally, for the research communities in terms of support, advice and influence over the evolving infrastructure. The first priority for the ongoing work will be to understand the needs of the communities that DARIAH and CLARIN represent, so that they can build and manage an infrastructure for their users. The areas of greatest concern identified in initial discussions are data storage and availability, the hosting and monitoring of services, as well as authentication and authorization issues.

Impact in these areas was also measured in other ways. Information relating to 58 publications directly related to CLARIN, plus 49 conference or workshop presentations, has been gathered. These figures are likely to be underestimates due to the difficulty in gathering data on all activities in all countries, and in finding all publications which mention or have been made possible by CLARIN. At least one interview with the press has been published (in International Science Grid This Week). Furthermore, the CLARIN Newsletter has been very widely distributed, downloaded and read. The publications came from at least 37 authors across the various centres involved in the project, including chapters in academic collections on topics as diverse as lexicography, metadata and web services architectures. Presentations on various aspects of CLARIN were made across Europe, from Riga to Athens and from London to Poznan and beyond, at events in the domains of Grid and e-Science, cultural heritage, humanities, linguistics, computational linguistics, corpus linguistics, digital libraries, as well as at events for ESFRI projects and e-infrastructures.

CLARIN also made some first steps towards training the next generation of scholars and developers, running a major summer school in Germany in 2010 (D-SPIN Sommerschule in Bad Homburg) and contributing to the RAMIRI training programme, in which young CLARIN managers also participated as trainees.

It can be seen that in many ways we were able to go beyond the proposed project output of the identification of potential partners, and to make concrete alliances and collaborations. While more can clearly be done in the next phase of CLARIN to build collaborations, particularly with stakeholders in the many research communities with whom we will work, it is assured that we will start the construction phase of our infrastructure with key collaborations already in place. CLARIN was first conceived as a response to the problem of fragmentation of resources, tools and projects in the Humanities. As well as addressing the fragmentation of the core domains from which our actors, tools and resources originate (linguistics and computational linguistics, described in reports from work packages 2 and 5), the activities described above demonstrate how we have played a key, in many cases central, role in wider developments across the Humanities and Social Sciences and beyond.

While the problem of fragmentation of isolated silos of digital activity remains, important initiatives now exist to address the problem by building infrastructure services, and promoting standards, interoperability, collaboration and reuse of resources.

We modestly believe that the activities of work package 3 of the CLARIN Preparatory Phase have made an important contribution to this change.

Task 3 Analysis of past and ongoing projects in the humanities

The focus of Task 3 has been to gain a comprehensive view of the state of the art of applying language technology and creating modern infrastructures for it, as well as the identification of LT requirements of the relevant HSS domains. The benefits of applying LRT infrastructure in HSS domains are manifold, but the most fundamental ones needed to be clearly communicated and showcased to our target user groups. Namely, enrichment of textual data with linguistic information yields major improvement over manual text analysis and synthesis: it allows for obtaining more precise hits in content analysis (achieving cost and time reduction), allows for replicating experimental output, and secures access to a continually updated supply of LT resources. By compiling deliverable D3C-3.2, CLARIN wished to serve several purposes: (i) to inform CLARIN partners about the landscape of HSS research; (ii) to identify current levels of uptake and use of LRTs in HSS; (iii) to identify barriers to the uptake of LRTs in HSS; (iv) to identify trends in user requirements within the HSS community.

Separate chapters in this document are devoted to the emerging step change taking place in traditional research in HSS disciplines, including a list of aspects of, as well as conclusions reached by, relevant expert centres, institutions, and projects in Europe. In the light of the complexity of HSS and LT research themes, directions, and layers, digital HSS research is characterized in terms of content, data types, tools, and requirements, whereas barriers and challenges are detailed along the dimensions of legal issues, uptake, and cross-community engagement for raising awareness of the role LT can play for HSS scholars.

Task 4 Selected exemplary humanities projects (Milestone 3C-5.1 and Deliverable 3C-5.1)

Evaluation of the received proposals was based on the following criteria. Applicants had to be organizations or consortia from any sector, including at least one organization established in a Member State or Associated Country with respect to the European Commission Seventh Framework Programme. Research activities were to consist primarily of research or technological development within HSS, addressing new and pertinent research questions, and may also have included demonstration activities designed to prove the viability of new approaches or their exploitation. The research plan should have reached some maturity and should profit from one component involving a specifically described need for language resources technology. The national representatives for CLARIN provided advice on how well each proposal fit national priorities. The final decision about selection was taken by the CLARIN Executive Board, on the basis of expert peer reviews. Of all the candidate applications CLARIN received, nine selected projects started to collaborate with CLARIN. The goals of these projects posed intriguing challenges to current LT infrastructure, concepts, and standards, and as such were capable of demonstrating the potential of LT infrastructure applied to data from HSS disciplines.

| Acronym | Name | Proposer |
|---|---|---|
| CAP | A Hierarchical Lexical Function related to Proper Nouns | Université de Strasbourg |
| CKCC | Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic | University of Utrecht |
| CONPLISIT | Consumption patterns and life-style in Swedish literature – novels 1830-1860 | Göteborgs universitet |
| DATIST | Studying speech and language therapy diagnosis, using statistical analysis and textual statistics | ATILF-CNRS |
| DID-Cph | Consultancy for the Dictionary of Danish Insular Dialects | University of Copenhagen |
| HISTOPOL | Narrative Social Psychological Studies of European History | Hungarian Academy of Sciences |
| HTM4EmodE | Historical text mining for assisting the study of discourses in Early Modern England | University of Liverpool |
| LaMeCos | Database and Database Analyzer of Medieval Latin Scientific Terminology | Universitat de Barcelona |
| MLT-CPhil | Multilingual language technology for classical philology research | University of Hamburg |
| Pro_Trans | The professionalisation of translation in the North of Portugal | University of Minho |

The first group of projects comprised those that best demonstrate the use of LRT and would show the potential of a research infrastructure in the Humanities. It is therefore in the interest of CLARIN to monitor and advise these projects. They include CAP, CKCC and MLT-CPhil. The second group, DATIST, DID-Cph and LaMeCos, have been selected for limited cooperation and support as it was felt that while the projects are of interest to CLARIN, the scope and nature of the required language technology work is very specific, allowing relatively little room for testing the capabilities of the CLARIN infrastructure. The third group consists of projects (i.e. CONPLISIT, HISTOPOL and HTM4EmodE) that already have some ongoing cooperation with some CLARIN partners, which is hereby endorsed with a formal arrangement. Such cooperation typically does not require central CLARIN funding and may rely on local resources.

Task 5 Guide the different project teams (Deliverable 3C-5.1)

A one-year collaboration period was offered by CLARIN to HSS scholars in order to help further articulate the relationship between new technologies and the humanities. One objective of the collaboration was to help researchers manage and enrich their own digital research data and thus to advance scholarship by means of enhanced, automated curation, analysis, or interpretation of digital texts. To this end, CLARIN provided knowledge about and expertise in applying LRT tools for linguistic and semantic processing of data sets from new domains. Another objective was to raise researchers' awareness of linguistically enabled research and to reach the target group via real-life use cases. Each supported project aimed to benefit scholarly research in a knowledge institute by incorporating LT in preparing data or tools to be used for scientific investigation. The following HSS disciplines were addressed in the projects:

– Linguistics (domain-specific language use; dialectology; lexicography; lexical semantics)
– History
– Cross-disciplinary research involving History and Psychology (on historical narratives)
– Cross-disciplinary research involving History and Philosophy (on the circulation of knowledge).

In turn, CLARIN has gained first-hand knowledge about the research interests, methods, and conceptual frameworks on which HSS scholars base their research. This knowledge is the key to tailoring the services of the CLARIN infrastructure to suit the needs of its target communities.

Besides the community of humanists, the targeted user group included the IT personnel of HSS researchers, as experts possessing the skills for developing and delivering digitized data on which the scholars would want to perform automated analysis and enhancement. Profiles of the users necessarily varied; typically, the research teams possessed very limited prior knowledge of LRT before the start of the cooperation: they were able to formulate valid needs with respect to the required output, but they (even the IT staff) could easily be mistaken about the necessary amount of work, the kind of tools and workflow, and the actual implementation or methodologies that the project needs would imply.

D3C-5.1 describes the detailed research planning of each individual project. Specifically, the cooperation was meant to provide practice for HSS researchers in elaborating on the following issues: (i) What are common methodologies in LT, what are best practices to locate and to develop basic computational resources for text-based research? (ii) How could the adaptation of language-specific tools be performed to enable processing other languages? (iii) Along which dimensions and how can one make extensions to existing domain-specific LT approaches for full applicability to data from a project's target domain? (iv) To what extent can LT techniques (e.g., keyword and concept extraction) be applied to a specific task aimed at by a specific project (e.g., to analyzing knowledge circulation)?

Results

By means of a network of experts who were ready and able to respond to end-user needs, LT infrastructures and techniques have been brought closer to HSS scholars through hands-on exercises. Users have overcome a significant initial barrier in utilizing new methods, and they realized that LT is an agglomerate of acquired skills, methods, and ways of thinking. The so-called “Humanities BLARK” defines, links, and documents HSS user requirements with respect to specific LT applications, and identifies the modules and data that are needed to build those applications (cf. Section 3 and Appendix A of CLARIN D5C-4). In the CLARIN collaboration projects, feedback on issues pertaining to the Humanities BLARK has been collected from the HSS users via the questionnaire survey, and was reported in D3C-6.1. In the same document, we report on LT resources (data and tools) utilized or adapted in the collaboration use cases, as well as LT created within the projects themselves.

Scholarly activities that have been conducted can be characterized along the following dimensions.

Quantitative research carried out in terms of:

• Most frequent words, their contexts, and their frequent collocates
• Statistical analyses run on the tables of metadata and term data
• Term occurrence counts
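The quantitative measures listed above can be illustrated with a minimal Python sketch. It is not taken from any of the collaboration projects: the sample text, the function names and the window size are assumptions chosen only to show how word frequencies and collocate counts are typically computed.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def collocates(tokens, target, window=2):
    """Count the words found within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


# Invented sample text, used only for illustration.
sample = "The old mill stood by the river. The river ran past the old mill."
tokens = tokenize(sample)
freq = word_frequencies(tokens)
river_collocates = collocates(tokens, "river", window=2)

print(freq.most_common(3))
print(river_collocates.most_common(3))
```

A real project would normally replace the simple regular-expression tokenizer with a language-specific one and would lemmatize the tokens before counting.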

Qualitative research issues addressed:

• Finding out diagnostic labels (terms) used by speech and language therapists
• Extracting information from a corpus of transcribed tape recordings in a dictionary project. All kinds of linguistic information are of potential use in the entries of the dictionary (phonetic, morphological, syntactic, semantic, pragmatic), including information that is not normally attested in the collection of paper slips
• Topic analysis
• Search for named entity pairs: (Person, Person), (Person, Organization), (Organization, Organization)
• Thesaurus-expanded browsing of 19th century fiction in order to answer a historical research question
• Subjective time experience as a personality trait
• Subjective time experience as marker of depression
• Social representation of the events of Hungarian history
• National and European identity
• Historical trajectory research
• Distant monitoring of psychodynamics of isolated small groups
• Space psychology (content analysis of crew reports in space-analogue experiments)
• How knowledge was spreading in time and space
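The search for named entity pairs mentioned in the list above could, for instance, be approximated by counting which typed entities occur in the same document. The following Python sketch is only an illustration under stated assumptions: the entity annotations are invented, the function name `count_pairs` is hypothetical, and co-occurrence within a document is just one possible definition of a "pair".

```python
from collections import Counter
from itertools import combinations

# Hypothetical NER output: one list of (name, entity type) tuples per document.
docs = [
    [("Anna", "Person"), ("Bela", "Person"), ("Academy", "Organization")],
    [("Anna", "Person"), ("Bela", "Person")],
    [("Bela", "Person"), ("Academy", "Organization"), ("Museum", "Organization")],
]


def count_pairs(docs, wanted_types):
    """Count in how many documents two entities of the wanted types co-occur.

    Pairs are unordered; each key lists the two names in alphabetical order.
    """
    wanted = tuple(sorted(wanted_types))
    counts = Counter()
    for entities in docs:
        # set() removes repeated mentions, sorted() makes the pair keys stable.
        for a, b in combinations(sorted(set(entities)), 2):
            if tuple(sorted((a[1], b[1]))) == wanted:
                counts[(a[0], b[0])] += 1
    return counts


person_person = count_pairs(docs, ("Person", "Person"))
person_org = count_pairs(docs, ("Person", "Organization"))

print(person_person)
print(person_org)
```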

Processes that could additionally be automated include the following:

• the generation of semantically similar search terms
• input of new documents into the system with XML encoding
• a single query language to search all the existing corpora simultaneously.
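Two of the processes above, generating semantically similar search terms and searching several corpora with one query, can be sketched together in a few lines of Python. The thesaurus, the corpora and the function names below are invented for illustration; a real system would draw its related terms from a proper lexical resource and would query the corpora through their own search interfaces.

```python
import re

# Hypothetical toy thesaurus and corpora; all contents are invented.
THESAURUS = {"letter": {"epistle", "missive"}}

CORPORA = {
    "corpus_a": ["An epistle arrived from Paris.", "The harvest was poor."],
    "corpus_b": ["She sealed the letter.", "A missive was sent north."],
}


def expand(term, thesaurus):
    """Return the term together with its related terms from the thesaurus."""
    return {term} | thesaurus.get(term, set())


def search_all(term, corpora, thesaurus):
    """Run one expanded query over every corpus; return (corpus name, sentence) hits."""
    terms = expand(term.lower(), thesaurus)
    hits = []
    for name, sentences in corpora.items():
        for sentence in sentences:
            words = set(re.findall(r"\w+", sentence.lower()))
            if words & terms:
                hits.append((name, sentence))
    return hits


hits = search_all("letter", CORPORA, THESAURUS)
for corpus_name, sentence in hits:
    print(corpus_name, "|", sentence)
```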

A demo/prototype system has been (partially) created during the projects in several cases, or will be created in a next stage. These feature the following functionalities:

• Solutions to the specific needs of the project, for instance the possibility of displaying full text and printing of paper slips with relevant occurrences of words, including basic information and a certain amount of surrounding text around the words
• Provide a list of patterns available in Unitex format in several languages
• Various modes to select documents; facets: topic, person, location, language; visualizations: word cloud, time line, map, social network; provide links to transcription of the documents.

Task 6 Making an integrated strategic plan for support for the Humanities by CLARIN

(Deliverable 3C-6.1)

The deliverable provides an overview of the results achieved during the one-year collaboration within CLARIN-supported projects between CLARIN experts as advisors and HSS researchers as users. The strategic suggestions put forward in this document aim to guide providers of linguistic resources and infrastructure services in interacting with their target audience in collaboration scenarios.

Uninitiated users (non-IT HSS scholars, typically researchers and students from academia) want to notice LT in their workflow as little as possible, but there are also users with a technological background (e.g. the IT personnel of these scholars) who are interested in portability and tailoring of resources to new data and needs. In interacting with the user groups, infrastructure providers such as CLARIN are to serve both interests, which requires a step change in the mentality of both parties. Providers must learn to cope with a changing view on infrastructure building, and be ready to serve specific user needs that ask for maximal applied utility, allowing flexibility for tailor-made solutions and preparing accessible documentation. They should not expect users to learn to think in the terms and methodology of LT and to turn into experts: cross-disciplinary and cross-community interactions may emerge and progress only slowly. Joint work entails the influx of new concepts and approaches into each other’s everyday work, in which informing and coaching each other is a long and never-ending process. Infrastructure providers need to make users aware that LT has not emerged as a supporting discipline of the Humanities; it is applied NLP, a complex discipline that is not intuitive to learn and use. Users need to realize that LT is not perfect: it cannot provide full domain and language coverage. They also need to understand that there are distinct application workflows (for data preparation, i.e. formalization, cleanup, etc., as opposed to data processing, and so on).

Consolidating the impact of cross-community interaction and collaboration might take various forms, ranging from the joint development of LT to the writing of project reports, the documentation of demonstrator applications, and collaboration on articles. All of these are excellent means of documenting the process and the expertise gained from the partnership. Last but not least, for maximum impact two key factors need careful attention: a large amount of time must regularly be allocated, and a suitable workforce is required in the form of designated personnel within a project. The latter entails that more language resource creators and LT programmers need to be educated and employed. CLARIN deliverables of both WP3 and WP5 specify in detail the types of LT resources that are most needed by HSS users, and the CLARIN Reference Manual will inform about several aspects of knowledge and resource sharing.

Cooperation and connection with other projects

Cooperation with the DARIAH project helps to organize interactions with leading initiatives and organizations in the Humanities, and further broaden the picture of how language resources and technologies are used in various disciplines, and what the needs of Humanities scholars are.

Meetings, workshops and conferences

Description | Date | Venue
Digital Humanities 2008 Conference | June 2008 | Oulu, Finland
New Trends in Language Corpora | September 2008 | Granada, Spain
Digital Humanities 2009 Conference | June 2009 | Maryland, USA
Poetics and Linguistics Association Annual Conference | July 2009 | Middelburg, Netherlands
Corpus Linguistics 2009 | July 2009 | Liverpool, UK
Digital Resources in the Humanities and Arts Conference | September 2009 | Belfast, UK
CKCC collaboration kickoff meeting | September 2009 | Lancaster, UK
European Conference of Digital Libraries | September 2009 | Corfu, Greece
Battle of Ideas conference | October 2009 | London, UK
CKCC internal workshop | June 2010 | The Hague, NL
Digital Humanities 2010 | July 2010 | London, UK
Supporting Digital Humanities 2010 | October 2010 | Vienna, Austria
CKCC internal workshop | January 2011 | The Hague, NL
Digital Humanities Collective Intelligence workshop | February 2011 | London, UK
CHAIN workshop | April 2011 | Oxford, UK
ICAME conference | June 2011 | Oslo, Norway

2.4 WP5 Language Resources and Technology Overview

2.4.1 Objectives

The aim of this WP is twofold, and is split into two closely connected subcomponents, WP5C and WP5R (where C stands for Coordination and R for Research). In WP5C we specify and implement standards and strategies for achieving interoperability between language resources and tools of all kinds, and we validate the technical standards and service specifications of the CLARIN infrastructure. While WP5C is about the specification of the technological details of language resources and technology, WP5R is about implementing activities as far as they are required in the preparatory phase. Here the focus is on four activities: (1) investigating all aspects that have to do with the integration of resources and tools (that may originate from widely different sources) into the infrastructure; (2) studying usage scenarios including chains of operation (work flows) in detail; (3) the integration of selected language resources into the web service-based infrastructure; (4) validating the work done in WP2.

2.4.2 Summary of progress towards the objectives

The aim of WP 5 in the second phase of the project has been reached overall. The focus has been on the exploration of interoperability and standards, which are a prerequisite for the construction of an infrastructure of language resources and tools. Criteria, priorities, and conditions for LRT were specified. Much time has been spent on the construction of a taxonomical system for LRT, as well as on the specification of a BLARK (basic language resource kit) matrix for LRT needed by researchers of the social sciences and humanities, and on a matrix of existing LRT for 13 European languages. A Pan-European Action plan to fill LRT gaps was developed.

2.4.3 Details of each task

Task C1 Working group formation and establishing the detailed activity plan.

In the first reporting phase, the working groups (WG 5.1 Tools, WG 5.2 Lexical Resources, WG 5.3 Corpora, WG 5.4 Basic Language Resource Toolkit (BLARK), WG 5.5 Taxonomy, and WG 5.6 LRT Integration) were established. In month 28, WG 5.7 Interoperability and Standards, responsible for D5C-3, was launched.

Task C2 Create a broad and detailed survey of language resources and technology and produce a comprehensive taxonomy.

In a collaboration of WP 2 and WP 5, during the first project phase, the CLARIN Resource Inventory (http://www.clarin.eu/view_resources) and the CLARIN Tools Inventory (http://www.clarin.eu/view_tools) were launched. They contain an exhaustive collection of the resources and tools made available by the CLARIN member institutions. A variety of metadata is listed, including contact data of the LRT creator. The Resource and Tools Inventories can be queried for the following LRT metadata: name, type, input/output language of the tool or language of the resource, and the distribution type of the resource/tool, as well as the name and country of its creator. At the end of the CLARIN project, the Inventory contains 890 resources and 231 tools. Working group 5.5 on taxonomy was launched in 2009. The Deliverable D5C-2 Language Resources and Tools Survey and Taxonomy and Criteria for the Quality Assessment, with findings of the survey of existing LRT, the taxonomical system for LRT and recommendations for the metadata scheme, was published in January 2010.

Task C3 Work out the structure and encoding characteristics of resources and the interfacing and import/export characteristics of tools

In April 2010, working group 5.7 Interoperability and Standards was launched. It was divided into the subgroups Annotations, Lexica, Media Encoding, Ontologies, Translation, and Written Corpora. Agreements on standards are a necessary prerequisite for the interoperability of resources and tools, as well as for the evaluation of LRT. As part of the work on milestones M5C-3b and M5C-3c Study of Interoperability Requirements at month 24 and month 36, current standards have been explored and compared, and best practices have been suggested. The WP 5 subgroups evaluated the current situation concerning general standards, lexica and terminology standards, ontologies, written corpora, annotation, multimedia encoding and annotation, and translation. The results can be found in Deliverable D5C-3 Interoperability and Standards, which was supplied according to plan at the end of year 2.

Task C4 Specify a Basic Language Resources Toolkit (BLARK) and work out criteria for the quality assessment of resources and tools

Working Group 5.4 has the goal of specifying both existing and needed language resources and tools for the social sciences and humanities. Ten representative humanities disciplines were chosen for the investigation of user needs in the form of a survey. Additionally, a survey on existing resources and tools for 13 European languages was conducted, covering both major languages and languages with fewer speakers. Mapping the needed LRT onto the matrix of existing LRT led to the identification of gaps in the infrastructure of language resources and tools for the investigated languages.
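The mapping of needed LRT onto the matrix of existing LRT can be pictured as a set difference per language. The Python sketch below is purely illustrative: the LRT types, the language labels and the function name `find_gaps` are hypothetical and do not reproduce the actual BLARK matrices.

```python
# Hypothetical example data: LRT types needed by researchers and LRT types
# that exist for each language. None of these values come from the survey.
needed = {"tokenizer", "lemmatizer", "pos_tagger", "named_entity_recognizer"}

existing = {
    "language_a": {"tokenizer", "lemmatizer", "pos_tagger", "named_entity_recognizer"},
    "language_b": {"tokenizer", "pos_tagger"},
    "language_c": {"tokenizer"},
}


def find_gaps(needed, existing):
    """Map each language to the sorted list of needed LRT types it lacks."""
    return {lang: sorted(needed - available) for lang, available in existing.items()}


gaps = find_gaps(needed, existing)
for lang, missing in gaps.items():
    print(lang, "->", missing if missing else "no gaps")
```

A fuller treatment would also record the quality and availability of each existing resource, since a resource that exists but cannot be accessed or is of low quality may still count as a gap.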

Criteria for the quality assessment and priorities and recommendations for filling the gaps in the CLARIN infrastructure were worked out in close cooperation with WP2 and the other CLARIN partners. The findings of this working group were published in the deliverable D5C-4 Description of the BLARK, the situation of individual languages, by the end of year two (M25). Work on the coverage of existing LRT and their quality continued and is published as Milestone M5C-4b Results about the current coverage of LRT and their quality.

Task R1 Establish criteria and set priorities for the adaptation, encapsulation and integration of language resources and technology and selectively integrate them.

Criteria and priorities for integration of LRT in the CLARIN infrastructure have been established as planned in close collaboration with WP2. The results were accomplished in phase 1 and published in the Deliverable Criteria and Priorities for Integration (D5R-C1). The milestone Comprehensive Interoperability Scenario (M5R-1.1) specifies the criteria, priorities, and conditions to be met by language data and resources and was met in M24 according to schedule.

Task R2 Definition of representative usage scenarios that will help us to assess the added value of interoperable language resources.

In project phase one, a call for contribution of scenarios was launched. Four generic scenarios were chosen as possible cases for the prototype. All the scenarios proposed by CLARIN members have served to identify some basic core functionalities and pipelines, which appear to be fundamental for further processing of data and relatively stable and uncontroversial in the community. Results are presented in Deliverable D5R-2 Usage Scenarios and Interoperability case studies at the end of phase one (M18).

The Milestone Plan for providing users with LRT (M5R-2) is based primarily on the outcomes of D5R-2, but it also builds on work in various other activities within the whole project in order to define an effective plan for providing CLARIN users with useful LRT. It informs WP2 and the subsequent construction phase in setting the priorities for adapting resources and tools to the infrastructure (i.e. as web services). M5R-2 was presented at the end of year 2 (M24).

Task R3 Integration of language resources into the emerging web service infrastructure

In consultation with WP2 and based on the findings of tasks C3 and C4, the creation of web services is underway. Results were reported at the end of year two (M24, M5R-3 First description of typical LRT operation workflows and D5R-3a Linguistic processing chains as Web Services: Initial linguistic considerations). D5R-3a focuses on obtaining representative examples of the LRTs available as web services and getting an understanding about their status. The exploration results presented facilitate the selection of appropriate standards for the resources and tools to be integrated in the course of further CLARIN activities. Deliverable D5R-3b Integration of Language Resources into Web service infrastructure (M39) contains final conclusions on the subject of harmonized access to resources via published interfaces to enable the interoperable domain. This final deliverable was published at the end of the project (M39).

Task R4 Validation of the technical standards set in WP2. Validation of representational standards and the interoperability concept.

The validation of technical standards based on task R2 started as scheduled at M30. Note that the responsibilities for Task R3 and Task R4 have been swapped for logistic reasons between UHEL and IPIPAN with the agreement of the Executive Board. Deliverable D5R-4 Validation of technical standards and infrastructure prototype – The Virtual Language Observatory and WebLicht/D-SPIN (M39) presents the findings of the validation of the CLARIN infrastructure with specific focus on the two target language tools. Evaluation has been undertaken in the form of a questionnaire survey in order to see how the project meets the specifications and functional user requirements. The conclusions of the questionnaire are as follows: criticism mainly focused on problems with usability. All but a few users were cautious with praise. The users who offered the most enthusiastic praise were excited by the possibility to quickly experiment and put together workflows for simple experiments that would otherwise be quite time-consuming. Milestone M5R-4 Pan-European Action plan to fill LRT gaps (M28) provides an overview of existing LRT in 13 European languages based on the deliverable D5C-4 with an interpolated perspective of relative importance and presents an action plan including suggestions and some practices in order to fill LRT gaps in individual languages.

Cooperation and connection with other projects

WP5 is conducting work in particularly close cooperation with WP2. There are naturally close connections to the German counterpart of CLARIN (D-SPIN), but also to ongoing European initiatives: A CLARIN, FlaReNet, DARIAH, and ELRA summit on LRT Standards took place in mid-July 2009 in Berlin (hosted by the BBAW).

Meetings, workshops and conferences

Description | Dates | Venue
Joint CLARIN/FlaReNet Workshop on Standards for LRT | 10-11 July, 2009 | Berlin
CLARIN Centres workshop | 5-6 November, 2010 | Prague
D-SPIN Workshop: Web service and workflow aspects workshop | 20-21 November, 2010 | Leipzig
CLARIN Workshop on Spoken Language Resources | 24-26 November, 2010 | Stockholm
CLARIN Demonstrator workshop | 25-26 January, 2010 | Amsterdam
CLARIN Boards meeting | 1-5 March, 2010 | Utrecht
CLARIN Meeting with the Portuguese CLARIN members | 18-20 March, 2010 | Lisbon
Planning of CLARIN-D | 31 May, 2010 | Mannheim
CMDI Workshop | 26-28 May, 2010 | Nijmegen
CLARIN Meeting with the Norwegian CLARIN members | 17-19 June, 2010 | Oslo
Workshop on Lexicon Tools and Lexicon Standards | 4-5 August, 2010 | Nijmegen
D-SPIN summer school | 30-31 August, 2010 | Bad Homburg
CLARIN consortium meeting, CLARIN/DARIAH conference, NEERI 2010 | 18-21 October, 2010 | Vienna
W3C Workshop on CLARIN Web Services | 25-26 October, 2010 | Madrid
CMDI Tutorial | 17 January, 2011 | Nijmegen
CLARIN-ERIC Meeting | 17-20 March, 2011 | The Hague
SHARE Workshop: Exploring Collaboration with CLARIN | 4-6 April, 2011 | Oxford
CLARIN National Coordinators Meeting | 28-30 June, 2011 | Budapest

As a member of the Executive Board, Erhard Hinrichs has participated in the following consortium and EB meetings:

Description | Dates | Venue
CLARIN EB meeting | 22-23 October, 2010 | Copenhagen
CLARIN EB meeting and Advisory Boards meeting | 2-4 March, 2010 | Utrecht
CLARIN EB Meeting | 17-18 June, 2010 | Dubrovnik
CLARIN Consortium Meeting | 18-20 October, 2010 | Vienna
CLARIN EB meeting | 23-24 September, 2010 | Paris
CLARIN EB meeting | 18-19 April, 2011 | Oxford
CLARIN National Coordinators Meeting | 28-30 June, 2011 | Budapest

2.5 WP6 Dissemination

2.5.1 Objectives

WP6 co-ordinates all the dissemination and outreach activities of the CLARIN project. WP6 generates plans for the dissemination and outreach material to provide advice and support for researchers in the construction and operational phases of CLARIN. WP6 closely interacts with WP2 and WP5 to understand and disseminate all new technical developments, and works with WP3 to react to humanities and social science requests for dissemination activities. Communication and dissemination channels include a website, mailing lists, a newsletter, a registry of experts and a referral help-desk and expertise addressing all categories of users interacting with the CLARIN infrastructure.

2.5.2 Summary of progress towards the objectives

The objectives set out for this period have been achieved and will feed into the construction phase: the development of the website and the publication of the newsletter will be handed over to CLARIN ERIC, and the first design of help desk services and the knowledge sharing infrastructure will be taken up by CLARIN ERIC for further elaboration and subsequent implementation.

2.5.3 Details of each task

Task 1. WP formation, planning for activity and dissemination

Task 1 has been completed in the 1st period.

Task 2. Editing a website and a newsletter

The Internal Website

Following recommendations from the EB, the internal website has been improved several times, including a few months after the start of the second period. A final theme has been established. Modifications were made to the structure of the contents. The Drupal platform was updated to version 6. Other modifications were made to improve the usability of the website.

The Newsletter

All the newsletters are now available at http://www.clarin.eu/view_newsletters. In addition to the original work plan, the CLARIN Newsletter issue #13 has also been published.

Task 3. Referral Help Desk and Registry of Expertise

A Frequently Asked Questions section has been organised in the CLARIN external web space and is therefore accessible to anyone. Various design alternatives for helpdesk systems were proposed:

• In a first proposal, the ALPE methodology (developed by Cristea et al.) has been adapted to cover the basic needs of a CLARIN help desk system. ALPE defines a hierarchy of XML annotation schemas complemented with processing power. A prototype design for the CLARIN help-desk, restricted to the BLARK (Basic Language Resource Kit), has been created. The idea is to show the user a visualisation of what the BLARK is and what type of knowledge about the text each component of the resource kit adds through its annotation. The proposed service configures a Processing Chains Help Service which would allow users to have an unintermediated interaction with the CLARIN infrastructure for solving their NLP-related research problems. The system can hint at possible processes that fit the format of an input file, can automatically configure processes in a chain based on the available input and requested output, and can advise on alternative solutions in order to optimise costs or address IPR issues.

• A natural-language-based question answering approach has been proposed to complement other classical ways of helping the user to find CLARIN-related information. Such a system would allow users to submit questions formulated in natural language and receive precise answers. Two variants have been suggested: one accessing the registry of expertise database and the other accessing internal CLARIN documents, user manuals, or any comprehensive collection of texts about the CLARIN technology.

• The Stack Overflow technology allows the development of help desks addressing large and well-structured communities of users who can ask and answer questions, and can rate other users’ answers. Each user has a number of reputation points which represents how much the other users can rely on that user’s answers. Questions can be organized in topics and a knowledge base is formed quickly. By embedding a number of CLARIN experts in a Stack Overflow based community, the level of professionalism can be greatly improved.

• The tickets-based technology is the most common approach to a human-intermediated help desk. A support ticket system manages and maintains lists of issues submitted by users. A ticket is an element which contains information about support interventions made by technical support staff on behalf of an end-user who has asked a question. A ticket has a life cycle in which staff members can perform operations such as answering, closing, reopening, posting an internal note, and transferring it to other staff colleagues.

Task 4. Help Desk services in the construction phase
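As a rough illustration of the ticket life cycle on which a tickets-based help desk relies, the operations named in the text (answering, closing, reopening, posting an internal note, transferring) can be modelled as a small state machine. The Python sketch below is not the data model of the prototype or of any particular ticketing product: the status names, the transition rules and the class layout are assumptions made for this example.

```python
from enum import Enum


class Status(Enum):
    OPEN = "open"
    ANSWERED = "answered"
    CLOSED = "closed"


# Allowed status changes per operation. The operation names follow the text;
# the status names and the transition rules are assumptions.
TRANSITIONS = {
    ("answer", Status.OPEN): Status.ANSWERED,
    ("answer", Status.ANSWERED): Status.ANSWERED,
    ("close", Status.OPEN): Status.CLOSED,
    ("close", Status.ANSWERED): Status.CLOSED,
    ("reopen", Status.CLOSED): Status.OPEN,
}


class Ticket:
    def __init__(self, question, assignee):
        self.question = question
        self.assignee = assignee
        self.status = Status.OPEN
        self.history = []  # chronological log of (operation, detail) entries

    def apply(self, operation, detail=""):
        """Apply a status-changing operation, rejecting invalid transitions."""
        key = (operation, self.status)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {operation} a ticket that is {self.status.value}")
        self.status = TRANSITIONS[key]
        self.history.append((operation, detail))

    def post_internal_note(self, note):
        """Log an internal note; notes never change the status."""
        self.history.append(("note", note))

    def transfer(self, new_assignee):
        """Hand the ticket to another staff member without changing its status."""
        self.history.append(("transfer", f"{self.assignee} -> {new_assignee}"))
        self.assignee = new_assignee


ticket = Ticket("Where can I find corpora for my language?", assignee="staff_a")
ticket.post_internal_note("Check the resource inventory first.")
ticket.transfer("staff_b")
ticket.apply("answer", "Pointed the user to the inventory.")
ticket.apply("close")
ticket.apply("reopen", "User asked a follow-up question.")
print(ticket.status.value, ticket.assignee, len(ticket.history))
```

Keeping the transitions in a single table makes it easy to see at a glance which operations are permitted in which state, and to extend the life cycle later without touching the `Ticket` class.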

A helpdesk prototype has been created, for which the tickets-based technology was adopted. The application allows users to ask questions, while also providing an interface for the experts to manage and answer questions. The entire system is easily configurable through a visual interface.

The question submitting interface is published at http://radsimu.digitalform.ro/helpdesk/osTicket/

The staff member and admin interfaces are published at http://radsimu.digitalform.ro/helpdesk/osTicket/scp/

For the prototype staff member account, you can use: User: manager1 Pass: the_password

For the prototype administrator account, you can use: User: the_admin Pass: the_password

We have also covered solutions for extending this system to reduce the required amount of work.

Additional Task 1. ISOcat screencasts

A screencast for the ISOcat web application has been created and is available at http://www.isocat.org/files/manual.html

Additional Task 2. The Knowledge Sharing Infrastructure

Knowledge sharing is of major importance in the future phases of CLARIN, Construction and Exploitation. To exploit the language processing infrastructure that will be developed by CLARIN to its full benefit, the development efforts should be accompanied by an equally sophisticated knowledge sharing infrastructure. In the deliverable D6C-4.1 we described the main functionalities of such a complex environment, oriented to accomplish the following goals:

- to make known to the world of HSS researchers CLARIN goals and achievements, and the advantages a researcher could get by accessing the CLARIN environment;
- to offer a communication gateway among potential users, where the community of creators and consumers could interact in a Question&Answer manner;
- to facilitate the fostering of joint research communities at European level, rooted in national CLARIN organisations;
- to offer high-level possibilities for searching for information about experts, technologies, data;
- to offer help and to guide novice users in accessing CLARIN technologies and resources;
- to accommodate different modalities of training and education at various levels in order to keep the CLARIN community updated on the latest developments while also facilitating contact with the CLARIN infrastructure for the younger generation of researchers.

2.6 WP7 Legal Issues

2.6.1 Objectives

Legal and ethical issues are of key importance to the viability of the CLARIN infrastructure. CLARIN is committed to open access. However, the LRT domain includes material that can only be made available subject to a variety of legal and ethical restrictions. This WP is for building the necessary legal and ethical agreement patterns in CLARIN. Agreements and licenses are needed for successful cooperation among the various actors and users of CLARIN, and for achieving and maintaining sufficient levels of trust. It is the task of this WP to create sufficient model agreements for the operational prototype and test beds that will be built in CLARIN during the preparatory phase and to find out what their coverage and level of acceptance is. This work will feed directly into WP2 where these schemes will have to be implemented, as well as WP5, which works with specifying guidelines for CLARIN metadata where terms of access and use will be represented.

2.6.2 Summary of progress towards the objectives

The objectives set out for this period have been achieved and the results will feed into the construction phase.

2.6.3 Details of each task

Task 1 Working Group formation and activity plan.

Already concluded in period 1.

Task 2 Providing a framework for licensing and authorization between CLARIN and external providers to enable incorporating new and existing resources and technology into CLARIN. This may extend to models for the inclusion of commercial resources and software.

(Deliverable 7S-2.1)

The AAI outline was disseminated and approved (M7S-2.1). Existing licenses for LRT were collected (M7S-2.2), a first comprehensive summary of licensing problems was produced (M7S-2.3), and an initial version of licensing templates and categories was created (M7S-2.4) and reviewed. The final form of the CLARIN licensing categories, accommodating new as well as legacy licenses, was approved by the Executive Board in spring 2010 (D7S-2.1) and presented at the LREC conference in May 2010.

Task 3 Defining the relation to existing initiatives such as in particular ELRA/ELDA and the inclusion of their offering.

(Deliverable 7S-3.1)

When the decision to make CLARIN into an ERIC was taken, it became clear that ELRA/ELDA, as a private organization, would not be part of the CLARIN ERIC. This also clarified the role of ELRA/ELDA as one of the possible complementary commercial distribution channels for some of the language resources provided to the research community via CLARIN. A check-list was prepared on how national coordinators of CLARIN should relate to ELRA/ELDA and a whole set of other resource providers and distributors (D7S-3.1).

Task 4 Providing a set of legal agreement templates to be used between the various actors in order to establish the necessary trust relations and to define the obligations and rights. (Deliverable 7S-4.1)

The Federation Agreement was disseminated among the first wave of CLARIN centers. The prototype SP and AAI Federation became functional in January 2010. New CLARIN centers as well as national Identity Provider Federations were then incorporated into the CLARIN SP Federation, reaching a final number of 9 centers and 8 identity federations in the prototype in June 2010.

Cooperation and connection with other projects

FIN-CLARIN implemented a piece of software called the Language Bank Rights; this work was subcontracted from CSC – IT Center for Science. The software takes care of the licensing procedure for end-users in a standardized way. During 2011, UHEL was also involved in the METANORD project, a project affiliated with METANET. UHEL produced a report on how CLARIN and METANET licenses differ, with the aim of harmonizing the two. At the end of June 2010, the lawyers of METANET confirmed that they will also create licenses accommodating the legacy licenses of existing resources and thereby allow for a mapping between the CLARIN categories and the METASHARE licenses.

Meetings, workshops and conferences

Description | Dates | Venue
EB meeting | 22-23/10, 2009 | University of Copenhagen, Denmark
EB meeting | 21-22/1, 2010 | EWI, Brussels, Belgium
EB meeting | 02-03-2010 | Utrecht University, the Netherlands
EB meeting | 17-18/06, 2010 | Zagreb University, Dubrovnik, Croatia
EB meeting | 23-24/09, 2010 | Paris, France
EB meeting | 10-11/01, 2011 | Universidad Autónoma, Madrid, Spain
EB meeting | 16-17/03, 2011 | Oxford University, U.K.
EB meeting | 20-21/06, 2011 | University of Copenhagen, Denmark
Advisory Boards meeting | 3-4/03, 2010 | Utrecht University, the Netherlands
Consortium meeting | 18/10, 2010 | Technical University Vienna, Austria
National Coordinators meeting | 29-30/06, 2011 | Hungarian Academy of Sciences, Institute of Linguistics, Budapest
SDH2010 | 19-20/10, 2010 | Technical University Vienna, Austria
NEERI2010 | 21/10, 2010 | Technical University Vienna, Austria

2.7 WP8 Construction and Exploitation Agreement

2.7.1 Objectives

WP8 was completely dedicated to the preparation of an agreement between the funding agencies in the participating countries about the construction and exploitation phase of the CLARIN infrastructure. This included the investigation of possible legal, financial and organisational models.

2.7.2 Summary of progress towards the objectives

According to the Description of Work, the main deliverable of this WP is the CLARIN Construction and Exploitation Agreement, which should form the basis for the joint construction and exploitation of the CLARIN Infrastructure by the participating countries. As the European Commission introduced a new legal entity, the European Research Infrastructure Consortium (ERIC), this is the form that has been adopted by CLARIN. The main deliverables have therefore been the documents required for the submission of a request to establish an ERIC, in particular the statutes for CLARIN ERIC and the Technical and Scientific Description for CLARIN ERIC. The first electronic submission was made on 23 May 2011, and will be followed by the formal submission in July/August 2011. CLARIN has also been actively involved in the ePP-CC, a bottom-up initiative by a number of ESFRI-PP coordinators to create working groups and organize workshops in order to discuss issues of common interest. This group has been operating in close collaboration with the EC. The work of WP8 has been done in close collaboration with WP1.

2.7.3 Details of each task

These details concern only the work performed in the reporting period.

Task 1 The Expert Group was already formed in the previous period.

Task 2 From the DoW: “An organisation plan, including both governance and operational level will be worked out. This could also include the creation of new legal and other entities to support governance and operational management. Types of entities to be looked at include (but are not limited to) EEIGs (or similar structures being worked out by the EC), company structures, foundations, associations and other transnational cooperation structures. The expert groups will include experts from various disciplines and groups such as people from the EC or ministries that are involved in working out organization models.”

Work on task 2 was already started in the previous period. In the current period the efforts have been concentrated on using the ERIC Regulation, as well as the previously acquired knowledge, to work out proposals for statutes for CLARIN ERIC. The workshops organised by the EC have been very useful for discussing and understanding the regulation better, and for exchanging views with other Preparatory Phase research infrastructures.

Task 3 From the DoW: “A financial plan, including estimations of the cost of construction and exploitation as well as a plan to cover these costs on the basis of generated income, financial commitments from the participating countries and other participants, and contributions expected from other sources. Here we have to find out what the possibilities are to use EC Structural funds and whether the member states are willing to include the EIB.”

A document on the costs of a CLARIN ERIC was made and discussed with the Strategic Coordination Board. A proposal for the calculation of the annual fee for members of CLARIN ERIC was made and discussed with the Advisory Board, and with the Steering Committee which was formed by the countries who signed the Memorandum of Understanding for creating CLARIN ERIC. The distribution of the contributions has been fixed for the first 5 years and is now annexed to the statutes.

Task 4 From the DoW: “A coordination plan that specifies how CLARIN will address coordination with national and EU R&D programmes and third parties; problems arising from different economic conditions in different countries, problems arising from lack of synchronisation in national budgetary procedures, and problems arising from differences in R&D funding allocation procedures in different countries.”

A first report was made, describing various schemes for collaboration with third countries. The second report provides more specific suggestions for collaboration opportunities. CLARIN has been met with very positive reactions in several continents, so real opportunities exist and should be exploited.

Task 5 From the DoW: “WP2 will work out a plan for the building and construction of the core infrastructure for CLARIN and WP5 will work out plans for integrating resources and tools into this infrastructure. Both plans will contain priorities, time planning and cost estimates based on various contributions also from the other WPs. WP8 will monitor these activities and discuss all aspects with the funding agencies iteratively in order to ensure broad acceptance.”

The elaboration of the statutes has taken into account the plan for the building and construction of the core technical infrastructure for CLARIN, and WP8 discussed with WP2 how the management of the technical infrastructure for CLARIN ERIC can be integrated in the governance structure. The input from all other work packages has also been taken into account; in particular, input on legal issues from WP7 has been integrated.

However, the major part of WP8 work has been the iterative discussions with the funding agencies from the involved countries. The Strategic Coordination Board consisted of representatives from or appointed by the funding agencies or ministries in the countries. In this reporting period meetings were held in spring 2010 and 2011. During 2010 it became clear that The Netherlands would be prepared to host CLARIN ERIC and that the other countries welcomed this. The Dutch Ministry of Education, Culture and Science (OCW) consequently became an important partner for WP8, and several meetings took place in the ministry premises in The Hague.

The OCW distributed a Memorandum of Understanding (MoU) to be signed by those countries that aimed at joining the CLARIN ERIC. The MoU was signed by 14 countries and 1 intergovernmental institution. A Steering Committee was formed, consisting of representatives of those countries that signed the MoU, with other countries as observers. This Steering Committee took over the role of the Strategic Coordination Board, as it became necessary to give the committed countries a stronger influence. Strategic Coordination Board members participated for the last time in the March meeting in the ministry of OCW, together with the Steering Committee, and the meeting in April 2011 was exclusively for the Steering Committee. This transition went smoothly.

The Steering Committee was chaired by the ministry of OCW, and the main decisions taken were: 1) The adoption of the CLARIN ERIC statutes, including governance principles and principles for the distribution of the costs of running the ERIC. 2) The adoption of the Technical and Scientific Description.

As mentioned above, the electronic submission of the request to establish CLARIN ERIC was made on 23 May, and on 27 June the project received feedback from the EC, including an assessment by independent experts. This feedback contained some required formal changes to be made to the statutes, and some suggestions, as well as an invitation to submit the formal request for setting up CLARIN ERIC. The changes have been made, and the request will be submitted as soon as the necessary signatures have been collected.

Cooperation and connection with other projects

As mentioned above, CLARIN-WP8 is actively involved in the initiative by a number of ESFRI-PP coordinators to create working groups and organize workshops in order to discuss issues of common interest, the ePP-CC. This group has been operating in close collaboration with the EC.

Meetings, workshops and conferences

Note: WP8 participation in general meetings such as EB meetings (WP1) and Consortium meetings is not repeated here.

Description | Dates | Venue
NEERI | 30/9-2/10-2009 | Helsinki
Legal workshop | 29-30/10-2009 | EC, Brussels
Workshop organised by WP2 | 4-6/11-2009 | Charles University, Prague
Strategic Coordination Board | 3-4/3-2010 | Utrecht
ECRI conference | 22-24/3-2010 | Barcelona
ePP-CC workshop | 27-28/5-2010 | EC, Brussels
SDH 2010 | 19-20/10, 2010 | Technical University, Vienna
NEERI 2010 | 21/10, 2010 | Technical University, Vienna
Meeting with NL Ministry for the preparation of the ERIC | 25-26/1, 2011 | Ministry, The Hague
Meeting with EC for the preparation of the ERIC | 21/2-2011 | EC, Brussels
Meeting with NL Ministry for the preparation of the ERIC | 22-23/2-2011 | Ministry, The Hague
Meeting with NL Ministry for the preparation of the ERIC | 10-11/3-2011 | Ministry, The Hague
Meeting with the Steering Committee and Strategic Coordination Board | 17-18/3-2011 | Ministry, The Hague
Meeting with the Steering Committee | 14-15/4 2011 | Ministry, The Hague
ePP-CC workshop | | EC, Brussels
Meta-Forum and National Coordinators meeting | 26-30/6, 2011 | Marriott Hotel and Hungarian Academy of Sciences, Budapest
ERIC Committee | 1/7 2011 | EC, Brussels

Special issues to be addressed

In the previous periodic report it was mentioned that the project had realised that some of the results would have to be made available earlier than foreseen in the DoW, in order to allow for the long process to arrive at a common understanding and agreement on the core issues. This has been done, and the process has been successful.

3 Project Management

3.1 Consortium management tasks and achievements

A central CLARIN office at Utrecht University is responsible for all aspects of the financial, administrative and legal management of the project. The CLARIN Office serves as the main administrative and financial centre and as the central contact point for all project participants. It is the main contact with the EC, and it also liaises between the EC and the partners. To this end it makes use of an internal administrative website and electronic repository where partners can access all relevant financial and administrative documents.

3.2 Problems that have occurred and how they were solved

No significant problems have been observed during this period.

3.3 Changes in the consortium

In 2010 three new partners joined the Consortium: Haskoli Islands (the University of Iceland), Istanbul Teknik Universitesi, and Vytauto Didiojo Universitetas (Lithuania). This has no consequences for the budget; the new partners will participate in CLARIN activities with national money. One more country has formally expressed an interest in joining CLARIN: Luxembourg. From other countries, such as Slovenia and Serbia, we also received expressions of interest to join CLARIN in the Construction phase.

3.4 List of project meetings, dates and venues

See the list included in the activities of WP1C. NB: this table includes the participation of the project coordinator and his assistant in meetings and workshops related to project management (WP 1M) and coordination (WP1C).

3.5 Development of project website and communication with Consortium

Communication with the consortium has taken place through the internal website and several mailing lists. In addition, a consortium meeting was organized in conjunction with the SDH2010 conference in Vienna. In June 2011 we organized a meeting in Budapest for all (prospective) leaders of the emerging national CLARIN consortia to inform them of the final results of CLARIN and the way the infrastructure will evolve in the future. The external website was further developed. An important new addition was the Virtual Language Observatory, which allows users to get access to the world of language resources in various ways.

3.6 Communication and liaisons with other projects or programmes

See WP 1C

3.7 Project planning and status

The project has achieved all objectives, with some minor delays in July and August due to the vacation period and illness of the assistant to the coordinator.

3.8 Any changes to the legal status of any of the beneficiaries

1. Partner 6 - FFCUL (PIC 999580539) - change of legal address

Old legal address: Campo Grande, Edificio C7, 1 Piso, LISBOA, 1749016, PORTUGAL
New legal address: CAMPO GRANDE EDIFICIO C1 PISO 3, LISBOA, 1749016, PORTUGAL
Effective date: 01/01/2008

2. Partner 8 - IPPBAS (PIC 999598775) - change of legal address

Old legal address: Acad. G. Bonchev Str. 25 A, SOFIA, 1113, BULGARIA
New legal address: G BONCHEV STREET 25 A, SOFIA, 1113, BULGARIA
Effective date: 01/01/2008

3. Partner 27 - ELDA (PIC 999749028) - the legal status has changed from SME to NONE

3.9 Use of foreground and dissemination activities during this period

In accordance with our principle of sharing any project results with the whole research community, all our results are freely accessible to anyone as soon as they have reached a certain state of maturity. Staff from institutions that have registered as CLARIN members have access to documents and results from an earlier stage. As is reflected in the various tables in the WP reports, the CLARIN project and its (intermediary) results have been presented at many conferences and other events, in the form of submitted papers, posters, demos, invited talks, and panel sessions.

APPENDIX: Table of deliverables and milestones

Table 1. Deliverables (D) and Milestones (M), ordered by WP

Del. no. | Deliverable name | WP no. | Lead participant | Nature | Dissem. level | Due month | Delivered Yes/No | Actual / Forecast month | Comments
D1C-1 | Implementation plan for nationally funded projects | 1C | UU | R | PU | 12 | yes | 15 | ok
D1C-2 | Annual status report 2008 | 1C | UU | R | PU | 12 | yes | 16 | ok
D1C-3 | Annual status report 2009 | 1C | UU | R | PU | 24 | yes | 26 | ok
D1C-4 | Annual status report 2010 | 1C | UU | R | PU | 36 | yes | 42 | is D1M-2
D1C-5 | The CLARIN Reference Manual | 1C | UU | R | PU | 42 | yes | 42 | ok
M1C-1 | Overview of national funding situation | 1C | UU | R | PU | 06 | yes | 15 | ok
M1C-2 | Internal communication facilities in place | 1C | UU | R | PU | 03 | yes | 03 | ok
M1C-3 | Semestrial report 1 to SB and SCB (suppressed) | 1C | UU | R | PU | 06 | no | -- | exit
M1C-4 | Semestrial report 2 to SB and SCB (coincides with D1C-2, Annual Status Report 2008) | 1C | UU | R | PU | 12 | yes | 15 | ok
M1C-5 | Semestrial report 3 to SB and SCB (coincides with D1M-1) | 1C | UU | R | PU | 18 | yes | 20 | is D1M-1
M1C-6 | Semestrial report 4 to SB and SCB (coincides with D1C-3) | 1C | UU | R | PU | 24 | yes | 26 | is D1C-3
M1C-7 | Semestrial report 5 to SB and SCB (suppressed) | 1C | UU | R | PU | 30 | no | -- | exit
M1C-8 | Semestrial report 6 to SB and SCB | 1C | UU | R | PU | 36 | yes | 42 | is D1M-2
D1M-1 | 1st Periodical management and financial report | 1M | UU | R | PP | 18 | yes | 20 | ok
D1M-2 | 2nd Periodical management and financial report | 1M | UU | R | PP | 42 | yes | 42 | ok
M1M-1 | Consortium agreement | 1M | UU | R | PU | 02 | yes | 08 | ok
D2R-1a | Centres Network Formation | 2 | MPG | R | PU | 03 | yes | 14 | ok
D2R-1b | Centres Network Formation - Centre types | 2 | MPG | R | PU | 03 | yes | 14 | ok
D2R-2a | Federation Foundation - LRT | 2 | MPG | R | PU | 06 | yes | 14 | ok
D2R-2b | Federation Foundation - PIDs | 2 | MPG | R | PU | 06 | yes | 14 | ok
D2R-3a | Federation Building - v1 | 2 | MPG | P | PU | 18 | yes | 22 | ok
D2R-3b | Federation Building - v2 | 2 | MPG | P | PU | 36 | yes | 38 | ok
D2R-4 | Registry Requirements | 2 | MPG | R | PU | 06 | yes | 14 | ok
D2R-5a | Registry Infrastructure - v1 | 2 | MPG | P | PU | 18 | yes | 22 | ok
D2R-5b | Registry Infrastructure - v2 | 2 | MPG | P | PU | 36 | yes | 38 | ok
D2R-6a | Web Services and Workflow Requirements - v1 | 2 | MPG | R | PU | 12 | yes | 18 | ok
D2R-6b | Web Services and Workflow Requirements - v2 | 2 | MPG | R | PU | 24 | yes | 24 | ok
D2R-7a | Web Services and Workflow Creation - v1 | 2 | MPG | P | PU | 24 | yes | 24 | ok
D2R-7b | Web Services and Workflow Creation - v2 | 2 | MPG | P | PU | 36 | yes | 42 | ok
D2R-8 | Service and Application Building | 2 | MPG | R | PU | 36 | yes | 36 | ok
D2R-9a | Cost Estimates - v1 | 2 | MPG | R | PU | 24 | no | 24 | ok
D2R-9b | Cost Estimates - v2 | 2 | MPG | R | PU | 36 | yes | 42 | ok
M2R-3 | After 18 months a first network of collaborating centres | 2 | MPG | R | PU | 18 | yes | 18 | ok
M2R-5 | After 18 months a first version of the new registry infrastructure | 2 | MPG | R | PU | 18 | yes | 18 | ok
M2R-6 | After 12 months a specification for LRT web services (see D2R-6a) | 2 | MPG | R | PU | 12 | yes | 15 | ok
M2R-7.1 | After 24 months a first set of integrated web services | 2 | MPG | R | PU | 24 | yes | 24 | ok
M2R-7.2 | After 30 months a first version of a simple workflow system | 2 | MPG | R | PU | 30 | no | 30 | ok
M2R-8 | After 30 months a first design for a set of web applications | 2 | MPG | R | PU | 30 | no | 30 | ok
D3C-1.1 | Working Group Formation Report and Activity Plan | 3 | HASRIL | R | PP | 06 | yes | 06 | ok
D3C-2.1 | Longitudinal study of impacts | 3 | OTA | R | PU | 36 | yes | 42 | ok
D3C-3.2 | Humanities Organizations, Initiatives and Projects Report | 3 | OTA | R | PU | 24 | yes | 26 | ok
D3C-4.1 | Call for Humanities Projects | 3 | HASRIL | R | PU | 12 | yes | 12 | ok
D3C-5.1 | Detailed Project Plans | 3 | HASRIL | R | PU | 21 | yes | 21 | ok
D3C-6.1 | Integrated strategic plan for supporting SSH research | 3 | HASRIL | R | PU | 36 | yes | 36 | ok
M3C-1.1 | A first Working group activity plan | 3 | HASRIL | R | PU | 03 | yes | 10 | ok
M3C-2.1 | Complete survey of initial state of knowledge and impacts | 3 | OTA | R | PU | 12 | yes | 12 | ok
M3C-2.2 | Complete survey of final state of knowledge and impacts | 3 | OTA | R | PU | 32 | yes | 42 | is D3C-2.1
M3C-3.1 | Overview of Humanities Projects | 3 | HASRIL | R | PU | 12 | yes | 12 | ok
M3C-4.1 | A First Round of Projects to start | 3 | HASRIL | R | PU | 16 | yes | 20 | ok
M3C-5.1 | Required LR/T capabilities in place | 3 | HASRIL | R | PU | 24 | yes | 24 | ok
D5C-1 | Report about outcome of liaison with other European projects and initiatives | 5C | UTU | R | PU | 12 | yes | 14 | ok
D5C-2 | Language Resources and Tools Survey and taxonomy and criteria for the Quality assessment | 5C | UTU | R | PU | 24 | yes | 24 | ok
D5C-3 | Interoperability and Standards | 5C | UTU | R | PU | 36 | yes | 38 | ok
D5C-4 | Description of the BLARK, the situation of individual languages | 5C | UTU | R | PU | 24 | yes | 25 | ok
M5C-1 | Working groups and activity plan established | 5C | UTU | R | PU | 03 | yes | 03 | ok
M5C-2 | Broad LRT Survey and Comprehensive Taxonomy | 5C | UTU | R | PU | 12 | yes | 18 | ok
M5C-3a | Study of Interoperability Requirements (see D5R-2) | 5C | UTU | R | PU | 12 | yes | 17 | ok
M5C-3b | Study of Interoperability Requirements | 5C | UTU | R | PU | 24 | yes | 24 | ok
M5C-3c | Study of Interoperability Requirements | 5C | UTU | R | PU | 36 | yes | 42 | ok
M5C-4a | Results about the current coverage of LRT and their quality (see 1st periodic report, full report in D5C-2) | 5C | UTU | R | PU | 12 | yes | 12 | ok
M5C-4b | Results about the current coverage of LRT and their quality | 5C | UTU | R | PU | 24 | yes | 24 | ok
M5C-4c | Results about the current coverage of LRT and their quality | 5C | UTU | R | PU | 36 | no | 36 | ok
D5R-1 | Criteria and Priorities for integration | 5R | UTU | R | PU | 12 | yes | 14 | ok
D5R-2 | Usage Scenarios and Interoperability case studies | 5R | CNR-ILC | R | PU | 12 | yes | 18 | ok
D5R-3a | Integration of LR - v1: Linguistic processing chains as Web Services: Initial linguistic considerations | 5R | IPIPAN | R | PU | 24 | yes | 24 | ok
D5R-3b | Integration of LR - v2 | 5R | IPIPAN | R | PU | 36 | yes | 39 | ok
D5R-4 | Validation of technical standards and infrastructure prototype | 5R | IPIPAN | R | PU | 36 | yes | 38 | ok
M5R-1 | Comprehensive interoperability scenario | 5R | UTU | R | PU | 24 | yes | 24 | ok
M5R-2 | Plan for providing users with LRT | 5R | CNR-ILC | R | PU | 24 | yes | 24 | ok
M5R-3 | First description of typical LRT operation workflows | 5R | IPIPAN | R | PU | 24 | yes | 26 | ok
M5R-4 | Pan-European Action plan to fill LRT gaps | 5R | UHEL | R | PU | 24 | yes | 24 | ok
D6C-1.1 | Report on setting up network of correspondents and dissemination plan | 6 | UAIC | R | PU | 03 | yes | 10 | ok
D6C-2.1 | Revised Web-Site | 6 | UAIC | O | PU | 06 | yes | 10 | ok
D6C-3.1 | Help-Desk and Registry Prototype Report | 6 | UAIC | R | PU | 18 | yes | 42 | ok
D6C-4.1 | Recommendations for future help-desk and advice infrastructure | 6 | UAIC | R | PU | 36 | yes | 42 | ok
M6C-1.1 | Working and dissemination plan | 6 | UAIC | R | PU | 03 | yes | 10 | ok
M6C-2.01 | First release of website | 6 | UAIC | R | PU | 04 | yes | 04 | ok
M6C-2.02 | Newsletter 01 | 6 | FFZG | R | PU | 03 | yes | 05 | ok
M6C-2.03 | Newsletter 02 | 6 | FFZG | R | PU | 06 | yes | 07 | ok
M6C-2.04 | Newsletter 03 | 6 | FFZG | R | PU | 09 | yes | 09 | ok
M6C-2.05 | Newsletter 04 | 6 | FFZG | R | PU | 12 | yes | 12 | ok
M6C-2.06 | Newsletter 05 | 6 | FFZG | R | PU | 15 | yes | 15 | ok
M6C-2.07 | Newsletter 06 | 6 | FFZG | R | PU | 18 | yes | 18 | ok
M6C-2.08 | Newsletter 07 | 6 | FFZG | R | PU | 21 | yes | 21 | ok
M6C-2.09 | Newsletter 08 | 6 | FFZG | R | PU | 24 | yes | 27 | ok
M6C-2.10 | Newsletter 09-10 | 6 | FFZG | R | PU | 27 | yes | 30 | ok
M6C-2.11 | Newsletter 10 (=Newsletter 9-10) | 6 | FFZG | R | PU | 30 | yes | 30 | is M6C-2.10
M6C-2.12 | Newsletter 11-12 | 6 | FFZG | R | PU | 33 | yes | 33 | ok
M6C-2.13 | Newsletter 13 | 6 | FFZG | R | PU | 36 | yes | 42 | ok
M6C-3.1 | Help-desk and registry prototype design | 6 | UAIC | R | PU | 12 | yes | 18 | ok
M6C-3.2 | Help-desk and registry - v1 | 6 | UAIC | R | PU | 24 | yes | 34 | ok
M6C-3.3 | Help-desk and registry - v2 | 6 | UAIC | R | PU | 36 | yes | 36 | ok
D7S-2.1 | A report including Model Licensing Templates and Authorization and Authentication Scheme | 7 | UHEL | R | PU | 36 | yes | 38 | ok
D7S-3.1 | Collaboration Plan between CLARIN and external services | 7 | UHEL | R | PU | 24 | yes | 39 | ok
D7S-4.1 | Set of Federation Agreements for CLARIN centres | 7 | UHEL | R | PU | 36 | yes | 38 | ok
M7S-1.1 | Working groups formed and Activity Plan exists | 7 | UHEL | R | PU | 03 | yes | 05 | ok
M7S-2.1 | Initial authorization and authentication scheme plan exists | 7 | UHEL | R | PU | 09 | yes | 20 | ok
M7S-2.2 | Registry of existing Licenses for LRT in operation and incrementally growing | 7 | UHEL | R | PU | 12 | yes | 18 | ok
M7S-2.3 | First comprehensive summary about licensing problems distributed | 7 | UHEL | R | PU | 18 | yes | 21 | ok
M7S-2.4 | Initial versions of licensing templates distributed for review | 7 | UHEL | R | PU | 24 | yes | 24 | ok
M7S-4.1 | Repository federations and initial set of Federation Agreements between CLARIN centres | 7 | UHEL | R | PU | 12 | yes | 20 | ok
D8S-1.1 | Requirements and best practice overview for governance | 8 | UCPH | R | PU | 12 | yes | 20 | ok
D8S-1.2 | Analysis and proposal(s) for governance | 8 | UCPH | R | PU | 24 | yes | 24 | ok
D8S-1.2a | The shape of CLARIN (annex to D8S-1.2) | 8 | UU | R | PU | 24 | yes | 26 | ok
D8S-2.1a | Cost estimations for construction and exploitation phase - v1 | 8 | UCPH | R | PU | 24 | yes | 26 | ok
D8S-2.1b | Cost estimations for construction and exploitation phase - v2 | 8 | UCPH | R | PU | 36 | yes | 36 | see D8S-2.2
D8S-2.2 | Financial plan for construction and exploitation phase | 8 | UCPH | R | PP | 24 | yes | 41 | ok
D8S-3.1 | Requirements and best practice for transnational coordination and collaboration with third parties | 8 | UCPH | R | PU | 12 | yes | 20 | ok
D8S-3.2 | Analysis and proposal(s) for coordination | 8 | UCPH | R | PU | 24 | yes | 42 | ok
D8S-4 | Draft CLARIN Construction and Exploitation Agreement | 8 | UCPH | R | PU | 36 | yes | 40 | ok
D8S-4a | Draft CLARIN ERIC Statutes | 8 | UCPH | R | PU | 36 | yes | 24 | in D8S-4
D8S-5 | CLARIN Construction and Exploitation Agreement (final version) | 8 | UCPH | R | PU | 42 | yes | 42 | ok
D8S-5a | CLARIN ERIC Statutes (final version) | 8 | UCPH | R | PU | 42 | yes | 42 | in D8S-5
M8S-1 | Creation of expert groups | 8 | UCPH | R | PU | 03 | yes | 03 | ok
M8S-2 | Governance proposals ready for discussion | 8 | UCPH | R | PU | 24 | yes | 24 | see D8S-1.2
M8S-3 | Financial proposals ready for discussion | 8 | UCPH | R | PU | 24 | yes | 41 | ok
M8S-4 | Coordination proposals ready for discussion | 8 | UCPH | R | PU | 24 | no | 42 | see D8S-3.2
M8S-5 | Agreement on CLARIN Construction and Exploitation phase | 8 | UCPH | R | PU | 36 | yes | 42 | see D8S-5
X8S-01 | Lisbon Meeting Proceedings | 8 | FFCUL | R | PU | ~ | yes | 27 | ok