18-03-2013 Hung LST Day 1
Language Technology for the Humanities: why and how?
Steven Krauwer
Utrecht University
CLARIN ERIC Executive Director
Overview
Why?
How? CLARIN in a nutshell; the dream; the vision; phasing; CLARIN ERIC; the nightmare; the challenge
Why join?
Concluding remarks
Why (1)
A wealth of digital language data, spread all over Europe in archives, repositories and libraries
It reflects human behaviour, communication, knowledge, culture, etc.
A rich source of data, information and knowledge for Humanities and Social Sciences (HSS) scholars (historians, philosophers, social scientists, …)
In addition: the results of 30 years of European Human Language Technology (HLT) efforts
In brief: a great opportunity for HSS to innovate itself and to become world leaders, especially because of our multilinguality
BUT…
Why (2)
BUT…
How do HSS scholars know what data exists?
How can they get access to data from all over Europe?
How do they know what tools exist to retrieve, explore and exploit these data?
How do they know how to decompose their HSS research questions into sub-questions that can be answered by digital methods?
OUR ANSWER: CLARIN, the Common Language Resources and Technology Infrastructure for the Humanities and Social Sciences
How: CLARIN in a nutshell
Common Language Resources and Technology Infrastructure (http://www.clarin.eu)
Basic idea: a European federation of digital repositories
with language data and tools (text, speech, multimodal, gesture …)
with access to language and speech technology tools through web services to retrieve, manipulate, enhance, explore and exploit data
with uniform single sign-on access to archives and tools
Target audience: humanities and social sciences scholars
To cover all EU and associated countries, and all languages relevant for the target audience
The CLARIN dream
give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)
give me all negative articles about Islam or about soccer in the Slovenski Narod daily newspaper (1868-1943)
find European TV news interviews that involve speakers with a Hungarian accent
summarize all articles in European newspapers of August 2012 about OCR – in Portuguese
show me the pronoun systems of the languages of Nepal
The vision: the role of language
Language is at the heart of many disciplines in the Humanities and Social Sciences (HSS), e.g.:
as an object of study
as a means of human communication
as a means of human expression
as a record of our history
as part of one's cultural identity
as a carrier of knowledge and information
CLARIN wants to support them all
Language and speech technology are part of this (e.g. in the form of computational linguistics or speech science) – essential, but just a part!
The vision: what CLARIN wants to offer
CLARIN makes it possible for the researcher to find resources (metadata search) and to refer to them in a persistent way (persistent identifiers)
CLARIN allows for content search in and across collections
CLARIN offers access to web services and workflows to perform complex linguistic and content operations and visualisations
CLARIN covers both historical and contemporary language material in all modalities
CLARIN serves both expert and non-expert users
CLARIN offers depositing and long-term preservation services
Ultimate goal: advancing HSS in order to get a better understanding of our society at a European scale
Phasing of CLARIN
Does CLARIN exist? Yes and no.
2008-2011: CLARIN Preparatory Phase Project, 26 countries, EC funded. Goal: designing the infrastructure technically and organisationally, and lining up the players
2012-2015: Construction Phase, jointly funded by the participating countries, no EC funding. Goal: building the European infrastructure
2015-…: Exploitation Phase, jointly funded by the participating countries, no EC funding. Goal: making and keeping it running, populating it, and ensuring that it follows new trends in technology and research, covering all EU and associated countries
CLARIN ERIC
CLARIN ERIC is the governance and coordination body, but will not run or fund operational data services
An ERIC (European Research Infrastructure Consortium) is a new type of intergovernmental legal entity, created by the EC: essentially a consortium of countries, with no end point
CLARIN ERIC member countries pay a modest annual fee
Countries will each set up a national CLARIN consortium that will provide data and linguistic services and create data and tools
It is up to the countries to decide how to shape and fund their CLARIN consortia and how to relate them to other activities at the national level (e.g. research programmes, digitisation programmes, etc.)
CLARIN ERIC was established by the EC on 29 February 2012, with 9 founding members: AT, BG, CZ, DE, DK, EE, NL, PL, DLU
More are in the pipeline, with NO (Norway) joining at this moment – but we need all European countries!
What is so nice about ERICs?
They are legal entities, not projects, which helps to make them more sustainable
Members are governments, committing themselves for longer periods of time (min. 5 years)
CLARIN ERIC is a sign of recognition by governments and EC of the importance of sharing language resources
Closeness to funding agencies may help to enforce use of standards and sharing of data in projects they fund
Good starting point for international collaboration as third countries can join or make collaboration agreements (e.g. through agencies or data centres)
ERICs may submit proposals for EC funding
But: the bulk of the funding depends on the funding mechanisms and cycles in the participating countries – NOT on the EC
The CLARIN nightmare
give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)
give me all negative articles about Islam or about soccer in the Slovenski Narod daily newspaper (1868-1943)
find European TV news interviews that involve speakers with a Hungarian accent
summarize all articles in European newspapers of August 2012 about OCR – in Portuguese
show me the pronoun systems of the languages of Nepal
The CLARIN nightmare, example 1
give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)
"All" means from all countries and all archives, not just some archives in some (now 10) CLARIN ERIC member countries
If contemporary docs exist in digital form at all, they are probably pictures: how do we get access to the content? Is OCR doable?
Can we rely on standardized metadata to find them?
Are our topic detection technologies good enough?
Many of the docs may be in Latin: can we handle that, and what about other languages, e.g. Hungarian?
How would a non-technical scholar know how to formulate this query?
The CLARIN Challenge
Do HSS scholars realize at all that they should be interested in these things?
Some do, most don't; we should make an effort to show them the potential benefits of adopting these new methods
Showcases and visualisation tools are indispensable
Distinguish between the lost generation and the future generation
Are the tools offered by language and speech technology the direct answers to the problems of HSS scholars as they see them?
Major technological efforts are needed, but technologists have a strong tendency to offer more and better gearboxes to people who are just waiting for a bus with comfortable seats (and a gearbox)
Technologies that work for modern versions of big languages may not work for older versions, or may not even exist for digitally less favoured languages
Use and adaptation of existing tools to specific HSS questions may always require intervention by technologically skilled people
What would it take to join?
Only countries can be ERIC members, not individual research institutions; countries that join CLARIN ERIC would have to:
recognize the ERIC as a legal entity (done for EU countries)
commit themselves for at least 5 years
pay an annual membership fee (ranging from 12,000 to 200,000 euro, depending on GDP; for HU ca. 12,000 euro)
set up and fund a national CLARIN consortium (universities, data archives, etc.) to provide access to their data, and to create new data and tools according to their national research priorities
identify (and fund) at least one existing data centre as the national hub that is linked to the rest of CLARIN
commit themselves to sharing resources and adopting CLARIN standards in nationally funded projects
The benefits of joining
Access to the CLARIN Infrastructure, i.e. to all CLARIN language resources and technology services for scholars in the humanities and social sciences (HSS)
Access to expertise from all over Europe via the CLARIN knowledge sharing infrastructure
Embedding in mainstream European HSS research community, with access to the same data
Better visibility of their research results, their resources, their language and their cultural heritage in the European research community
Open doors for cross-lingual and cross-cultural research
Embedding in the European Research Area
Opportunities to participate in EU projects initiated by CLARIN ERIC
What if Hungary does not join?
The bright side:
No need to pay an annual 12,000 euro membership fee
No need to agree on and comply with standards intended to facilitate the exchange of data
No obligation to share and preserve digital results from publicly funded projects after their completion
No need to set up a national consortium to coordinate infrastructure building and the creation of data and tools at the national level
No need to collaborate with European partners to make tools and resources interoperable at the European level
Researchers whose horizon lies within Hungary wouldn’t even notice!
What if Hungary does not join?
The less bright side for Hungarian researchers:
They would have to make their own individual arrangements to get access to data and services outside Hungary
Not having access to the same data and tools might create obstacles for cross-national collaboration
Their data and tools might be less visible in the European research community, and their results not reproducible and therefore not recognized
Hungary was one of the leading players in the CLARIN project and risks gradually lagging behind
The less bright side for CLARIN:
We would have to do without the excellent human and linguistic resources we know the Hungarian research community has to offer
We would have no alternative way to cover the Hungarian language and to provide the HSS research community with access to its data collections
What makes CLARIN interesting in comparison with other research infrastructures (RIs)?
No cash contribution beyond the annual fee, which pays for governance and coordination; otherwise no cross-border funding
Fee fixed for 5 years with a 2% annual increase: no surprises
Commitment to investing at the national level, but no major capital investment required and no fixed prescribed amounts
Selection of data and tools to be created follows from each country's own research priorities and economic situation – not centrally decided
HSS scholars have no digital tradition: unique opportunity to innovate research
HSS scholars tend to work in isolation: unique opportunity to become part of the mainstream European research community
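To show what "fixed for 5 years with a 2% annual increase" could mean in euro, here is a minimal Python sketch. It makes two assumptions that the slides do not state. First, the approximate Hungarian fee of 12,000 euro applies in year 1. Second, the 2% increase compounds each year from year 2 onward. The helper name `fee_schedule` is hypothetical, and the actual CLARIN ERIC fee rules may differ.

```python
def fee_schedule(first_year_fee: float, years: int = 5, annual_increase: float = 0.02) -> list[float]:
    """Hypothetical helper: fee per year, ASSUMING the first-year fee grows by
    annual_increase (compounded) in every subsequent year."""
    return [round(first_year_fee * (1 + annual_increase) ** year, 2) for year in range(years)]


# Assumption: 12,000 euro (the approximate HU fee from the slides) is the year-1 fee.
schedule = fee_schedule(12_000)
for year, fee in enumerate(schedule, start=1):
    print(f"Year {year}: {fee:,.2f} EUR")
print(f"Total over {len(schedule)} years: {sum(schedule):,.2f} EUR")
```

Under these assumptions the yearly fee rises from 12,000 to about 12,990 euro. The five-year total comes to roughly 62,450 euro.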
Concluding remarks
CLARIN has a lot to offer to the Hungarian research community in terms of access to data, tools and expertise, and participation in CLARIN will move Hungary forward towards full participation in the Digital Age
Hungary has a lot to offer to CLARIN, as is demonstrated by its successful participation in the CLARIN Preparatory Phase and in sister initiatives such as META / CESAR
In times of crisis it is hard for funding bodies to assign priorities to competing research infrastructure initiatives, but it should be kept in mind that in financial terms CLARIN is a low-cost-entry research infrastructure with no financial risks
With its language, Hungary has a unique selling point in Europe!