From DOBES to CLARIN and beyond. Axel Horstmann (VolkswagenFoundation), Peter Wittenburg (MPI for Psycholinguistics), Erhard Hinrichs (University of Tübingen)

Post on 27-Jan-2016

Page 1: From DOBES to CLARIN and beyond

From DOBES to CLARIN and beyond

Axel Horstmann

Peter Wittenburg

Erhard Hinrichs

VolkswagenFoundation

MPI for Psycholinguistics

University of Tübingen

Page 2: From DOBES to CLARIN and beyond

FACTS AND FIGURES

• Non-profit foundation established under private law, based in Hanover
• Not affiliated with the car manufacturer of the same name
• Founded in 1961 by the governments of the Federal Republic of Germany and the State of Lower Saxony
• Objective: to support science and technology as well as the humanities and the social sciences in research and university teaching
• Assets: about 2.45 billion euros
• Funding p.a.: about 110 million euros
• One of the financially strongest private research funding foundations in Europe

Page 3: From DOBES to CLARIN and beyond

FOCUS ON HUMANITIES AND SOCIAL SCIENCES

• Current funding initiatives (see KURZINFORMATION / BASIC INFORMATION): about 45 to 50 % of the funds go to the humanities and social sciences (H&SC)

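• Initiatives focussing on infrastructural support of H&SC:
  • Kulturwissenschaftliche Dokumentation (Documentation in Cultural Studies; closed)
  • Archive als Fundus der Forschung (Archives as a Resource for Research; closed)
  • DOBES: Dokumentation bedrohter Sprachen (Documentation of Endangered Languages)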

• Projects including infrastructural support of H&SC:
  • Strategy building on digitization of endangered books
  • Digitization of the so-called “Aschebücher” of the HAAB Weimar (in preparation)

Page 4: From DOBES to CLARIN and beyond

"E-HUMANITIES": POSSIBILITIES AND PERSPECTIVES

• Strong interest in innovative approaches

• Funds available for projects involving activities towards "E-Humanities" (e.g.: digitization of data, collections, archival material) within current funding initiatives

• Funding possibilities for meetings, workshops, conferences etc. focussing on "E-Humanities" (within the funding initiative Symposia and Summer Schools)

• New perspectives on "E-Humanities" may open up within a new funding initiative for Research in Museums (currently in planning), which will include digitization activities to a certain extent

• ... and not to forget the flagship "DOBES"

Page 5: From DOBES to CLARIN and beyond

Concrete steps or Tower of Babel?

• we don’t know exactly what eHumanities means

• we feel that mechanisms in research processes are changing rapidly, with technological innovation as the motor
• but we can’t say: “we are now going to design eHumanities”
• we probably can say: “let’s plan further concrete projects and actions and see”

• many excellent projects around; let me just refer to the good sides of DOBES (Documentation of Endangered Languages, funded by VolkswagenFoundation) as one of these steps

Page 6: From DOBES to CLARIN and beyond

What is DOBES?

• 44 DOBES teams working fully distributed and self-organized, incl. linguists, anthropologists, musicologists, ethno-biologists, etc.
• In addition, VWF installed a central archive
• Start in 2000

Page 7: From DOBES to CLARIN and beyond

What changed in DOBES?

• handing over all data to an archive after a limited time was completely new and is an explicit step, even though the results are not yet ready at that point

• there is a push to make data accessible to others from the beginning, which is also new for many and not without conflicts

• asking researchers to categorize and organize material according to agreed metadata was also new and still requires evangelizing

• including multimedia in the documentation and treating audio/video as the basis was fairly new and requires technical know-how

Page 8: From DOBES to CLARIN and beyond

Which infrastructure by DOBES?

• a stable, reliable and open repository/archiving system handling 30 TB
• data storage not encapsulated and in open formats
• introduction of persistent identifiers to protect the investments made in relating fragments to each other
• a network of 12 centers worldwide included in data distribution
• of these, 6 copies are held in centers with a hardware migration strategy
• a number of web-based applications offering various ways to access the data
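
The role of persistent identifiers in an archive with several distributed copies can be illustrated with a minimal sketch. It is not the DOBES implementation: the class `PidRegistry`, the identifier string and the URLs are all hypothetical, and the registry is a plain in-memory dictionary. The point it shows is that references stay valid when a copy moves, because only the registry entry changes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PidRegistry:
    """Toy in-memory registry: persistent identifier -> current copy locations.

    Hypothetical illustration only; a real service would be a shared,
    persistent resolution system.
    """
    _locations: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, pid: str, url: str) -> None:
        # Record one more location (copy) for the given identifier.
        self._locations.setdefault(pid, []).append(url)

    def migrate(self, pid: str, old_url: str, new_url: str) -> None:
        # A copy moved (e.g. hardware migration): update the location only.
        urls = self._locations[pid]
        urls[urls.index(old_url)] = new_url

    def resolve(self, pid: str) -> List[str]:
        # Return all currently known locations for the identifier.
        return list(self._locations[pid])


registry = PidRegistry()
pid = "pid:example/recording-1"  # hypothetical identifier
registry.register(pid, "https://archive-a.example/rec1.wav")
registry.register(pid, "https://archive-b.example/rec1.wav")

# An annotation that cites `pid` keeps working after this migration.
registry.migrate(pid, "https://archive-a.example/rec1.wav",
                 "https://archive-a.example/store2/rec1.wav")
locations = registry.resolve(pid)
```

Anything that cites the identifier rather than a location is unaffected by the move, which is the sense in which the investment in relating fragments is protected.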

Page 9: From DOBES to CLARIN and beyond

CLARIN/D-SPIN Challenges

eResearch is about global collaboration in key areas of science and the next generation of infrastructure that will enable it (J. Taylor)

• goal is an open research infrastructure to overcome the huge fragmentation of language resources and tools and to offer them to research communities - in particular to humanities

• help tackle the LARGE challenges (multilingual societies)

• but also help the individual researcher
  • example: align a transcription and an audio signal
  • how many researchers know how to do this?

• see CLARIN/D-SPIN as a huge virtual marketplace of resources and tools that can be combined thanks to integration and interoperability solutions
• not to forget Henry Thompson (one of the XML fathers): we don't have an agreed descriptive system in our domain

Page 10: From DOBES to CLARIN and beyond

CLARIN/D-SPIN Research Infrastructure

• the basis of a big supermarket is classification and convincing organization principles
  • based on 10 years of experience we know that only a flexible component model will be accepted

• we seem to be moving towards a Federation of LRT producers that can make contracts with Identity Federations
  • just one signature is necessary to get all researchers integrated with their home identity
  • we have already set up a first small test federation (EC-DAM-LR)

• researchers' dream: building virtual collections and creating workflows flexibly
  • not trivial due to import/export aspects
  • LREC showed that we already know a lot about the problem

Page 11: From DOBES to CLARIN and beyond

CLARIN/D-SPIN Network of Service Centers

• need a network of strong and persistent centers of "new" type

• researchers will only adapt if they can rely on new mechanisms

• need to simplify the IPR/license situation

Page 12: From DOBES to CLARIN and beyond

towards eHumanities

• CLARIN has > 100 members from 32 countries
• in Germany 9 well-known centers, and some more will join
• it is an enormous challenge to make a real step ahead in CLARIN

• can we all together extend to eHumanities infrastructure or are we already close to collapse?

Page 13: From DOBES to CLARIN and beyond

a few questions I

• will there be a separate infrastructure for each H discipline?

• NO

• there will be several shared services such as a PID registration and resolution service

• however:
  • building a joint infrastructure has to do with community building, trust, common language etc.
  • communities that are too big would not work
  • so let's move on in TextGrid, DARIAH, CLARIN etc.
  • but let's keep in close and fair contact to find synergies

• competition will become heavy and our competitors are the Googles of the world!

Page 14: From DOBES to CLARIN and beyond

a few questions II

• will there be a single market place for the humanities?

• NO

• acceptance of a market place depends on classification and organization principles, as already said
• these are different in all disciplines

• so we have to start from the disciplines in our solutions
• already difficult enough

• leave it to Semantic Web guys to enable cross-walk

Page 15: From DOBES to CLARIN and beyond

a few questions III

• who will be the main players?

• of course the big libraries, archives and museums
• but what about the universities and big organizations such as MPG?

• important:
  • we see new requirement profiles emerging
  • a kind of job sharing can be predicted

• of course: close collaboration with innovative libraries such as SUB etc is required

computer centers: RZG, GWDG

curation centers: MPDL + few domain MPIs

content centers: a number of domain MPIs

highly specialized groups: highly specialized MPI departments

Page 16: From DOBES to CLARIN and beyond

a few questions IV

• key bricks for interoperability?

• we need open registries of all sorts and smart registry frameworks
  • schema registries
  • concept registries (ISOcat, a creation of ISO TC37/SC4)
  • relation registries
  • etc.

• however:
  • a very complex landscape seems to emerge
  • how to make it usable by laymen?
  • how to convince researchers to work with them?

• no one knows yet - we need to try out - what else?
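
How schema and concept registries can work together as bricks for interoperability can be sketched roughly as follows. This is an assumption-laden toy, not ISOcat or any CLARIN service: the concept identifiers, the two schemas and the function `crosswalk` are all invented for illustration. Each schema maps its local field names to shared concept identifiers, and a record is translated by going through those concepts.

```python
from typing import Any, Dict

# Hypothetical concept registry: shared concept id -> human-readable definition.
CONCEPTS: Dict[str, str] = {
    "concept:speaker-age": "Age of the recorded speaker in years",
    "concept:language-name": "Name of the documented language",
}

# Hypothetical schema registry entries: local field name -> concept id.
SCHEMA_A: Dict[str, str] = {
    "age": "concept:speaker-age",
    "lang": "concept:language-name",
}
SCHEMA_B: Dict[str, str] = {
    "SpeakerAge": "concept:speaker-age",
    "Language": "concept:language-name",
}


def crosswalk(record: Dict[str, Any],
              source: Dict[str, str],
              target: Dict[str, str]) -> Dict[str, Any]:
    """Translate a record between schemas via their shared concepts.

    Fields without a registered concept, or without a counterpart in the
    target schema, are dropped rather than guessed.
    """
    concept_to_target = {concept: name for name, concept in target.items()}
    translated: Dict[str, Any] = {}
    for name, value in record.items():
        concept = source.get(name)
        if concept in concept_to_target:
            translated[concept_to_target[concept]] = value
    return translated


record_a = {"age": 42, "lang": "Trumai", "note": "unregistered field"}
record_b = crosswalk(record_a, SCHEMA_A, SCHEMA_B)
```

With n schemas this needs n mappings to the concept registry instead of a separate converter for every pair of schemas. The usability questions raised above remain open: someone still has to create and maintain those mappings.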

Page 17: From DOBES to CLARIN and beyond

Summary

• we need initiatives again and again to stepwise advance the borders

• it is now also time to transform existing knowledge into persistent infrastructures

• will need a lot of sensitivity and patience - RI building costs time

• emerging landscapes will have an underlying complexity
  • need to offer discipline vocabulary
  • need to hide complexity to a certain extent
  • need to offer persistency

Project solutions are not per se useful as infrastructure solutions!

Page 18: From DOBES to CLARIN and beyond

End

• in Germany we already have a good mixture with TextGrid, DOBES, eAqua, DARIAH and CLARIN/D-SPIN
• we have to get together frequently

Thanks for the attention.