![Page 1: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/1.jpg)
From DOBES to CLARIN and beyond
Axel Horstmann, VolkswagenFoundation
Peter Wittenburg, MPI for Psycholinguistics
Erhard Hinrichs, University of Tübingen
![Page 2: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/2.jpg)
FACTS AND FIGURES
• Non-profit foundation established under private law, based in Hanover
• Not affiliated with the car manufacturer of the same name
• Founded in 1961 by the governments of the Federal Republic of Germany and the State of Lower Saxony
• Objective: to support science and technology as well as the humanities and the social sciences in research and university teaching
• Assets: about 2.45 billion euros
• Funding p.a.: about 110 million euros
• One of the best-endowed private research funding foundations in Europe
![Page 3: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/3.jpg)
FOCUS ON HUMANITIES AND SOCIAL SCIENCES
• Current funding initiatives (see KURZINFORMATION / BASIC INFORMATION): about 45 to 50 % of the funds go to the humanities and social sciences (H&SC)
• Initiatives focussing on infrastructural support of H&SC:
  • Kulturwissenschaftliche Dokumentation (documentation in cultural studies; closed)
  • Archive als Fundus der Forschung (archives as a resource for research; closed)
  • DOBES: Dokumentation bedrohter Sprachen (documentation of endangered languages)
• Projects including infrastructural support of H&SC:
  • Strategy building on digitization of endangered books
  • Digitization of the so-called “Aschebücher” (ash books) of the HAAB Weimar (in preparation)
![Page 4: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/4.jpg)
"E-HUMANITIES": POSSIBILITIES AND PERSPECTIVES
• Strong interest in innovative approaches
• Funds available for projects involving activities towards "E-Humanities" (e.g.: digitization of data, collections, archival material) within current funding initiatives
• Funding possibilities for meetings, workshops, conferences etc. focussing on "E-Humanities" (within the funding initiative Symposia and Summer Schools)
• New perspectives on "E-Humanities" may open up within a new funding initiative aiming at Research in Museums (currently in planning), which will include digitization activities to a certain extent
• ... and not to forget the flagship, "DOBES"
![Page 5: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/5.jpg)
Concrete steps or Tower of Babel?
• we don’t know exactly what eHumanities means
• we feel that the mechanisms of research processes are changing rapidly, with technological innovation as the motor
• but we can’t say: “we are now going to design eHumanities”
• we probably can say: “let’s plan further concrete projects and actions and see”
• there are many excellent projects around; let me just refer to the good sides of DOBES (Documentation of Endangered Languages, funded by the VolkswagenFoundation) as one of these steps
![Page 6: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/6.jpg)
What is DOBES?
• 44 DOBES teams working fully distributed and self-organized, incl. linguists, anthropologists, musicologists, ethno-biologists, etc.
• In addition, VWF installed a central archive
• Start in 2000
![Page 7: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/7.jpg)
What changed in DOBES?
• handing over all data to an archive after a limited time was completely new; it is an explicit step, even though the results are not yet final
• there is a push to make data accessible to others from the beginning, which is also new for many and not without conflicts
• asking researchers to categorize and organize material according to agreed metadata was also new and still requires evangelizing
• including multimedia in the documentation and treating audio/video as the basis was fairly new and requires technical know-how
![Page 8: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/8.jpg)
Which infrastructure did DOBES build?
• a stable, reliable and open repository/archiving system handling 30 TB
• data storage not encapsulated and in open formats
• introduction of persistent identifiers to protect investments in relating fragments
• a network of 12 centers worldwide included in data distribution
• of these, 6 copies are held at centers with a hardware migration strategy
• a number of web-based applications offering various ways to access the data
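The point about persistent identifiers can be illustrated with a minimal sketch. It assumes a toy in-memory registry; the class name `PidRegistry`, the prefix `pid:example` and the example URLs are hypothetical and do not describe the actual DOBES archive software. The idea shown is only that links between fragments store the identifier, not the file location, so they survive when data moves between centers.

```python
class PidRegistry:
    """Toy in-memory registry: persistent identifier -> current location."""

    def __init__(self, prefix="pid:example"):
        # "pid:example" is a made-up prefix used for illustration only.
        self.prefix = prefix
        self._locations = {}
        self._counter = 0

    def register(self, location):
        """Mint a new identifier for a resource and remember where it lives."""
        self._counter += 1
        pid = f"{self.prefix}/{self._counter:06d}"
        self._locations[pid] = location
        return pid

    def relocate(self, pid, new_location):
        """Update the location after a migration; the identifier stays the same."""
        if pid not in self._locations:
            raise KeyError(f"unknown identifier: {pid}")
        self._locations[pid] = new_location

    def resolve(self, pid):
        """Return the current location for an identifier."""
        return self._locations[pid]


registry = PidRegistry()
pid = registry.register("https://archive-a.example.org/session/recording.wav")

# An annotation refers to the recording by identifier, not by URL.
annotation = {"target": pid, "text": "transcribed utterance"}

# The recording is migrated to another center; the annotation is untouched.
registry.relocate(pid, "https://archive-b.example.org/store/recording.wav")
print(registry.resolve(annotation["target"]))
```

A production service would add persistence, access control and replication across centers; the sketch leaves all of that out.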
![Page 9: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/9.jpg)
CLARIN/D-SPIN Challenges
eResearch is about global collaboration in key areas of science and the next generation of infrastructure that will enable it (J. Taylor)
• the goal is an open research infrastructure that overcomes the huge fragmentation of language resources and tools and offers them to research communities, in particular to the humanities
• help to tackle the LARGE challenges (multilingual societies)
• but also help the individual researcher
  • example: align a transcription and an audio signal
  • how many researchers know how to do this?
• see CLARIN/D-SPIN as a huge virtual marketplace of resources and tools that can be combined thanks to integration and interoperability solutions
• not to forget the point made by Henry Thompson (one of the XML fathers): we don't have an agreed descriptive system in our domain
![Page 10: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/10.jpg)
CLARIN/D-SPIN Research Infrastructure
• the basis of a big supermarket is classification and convincing organization principles
  • based on 10 years of experience, we know that only a flexible component model will be accepted
• we seem to be heading towards a Federation of LRT producers that can make contracts with Identity Federations
  • just one signature is necessary to integrate all researchers with their home identity
  • we have already set up a first small test federation (EC-DAM-LR)
• researchers' dream: building virtual collections and creating workflows flexibly
  • not trivial due to import/export aspects
  • LREC showed that we already know a lot about the problem
![Page 11: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/11.jpg)
CLARIN/D-SPIN Network of Service Centers
• need a network of strong and persistent centers of "new" type
• researchers will only adapt if they can rely on new mechanisms
• need to simplify the IPR/license situation
![Page 12: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/12.jpg)
towards eHumanities
• CLARIN has > 100 members from 32 countries
• in Germany, 9 well-known centers take part and some more will join
• it is an enormous challenge to make a real step ahead in CLARIN
• can we all together extend this to an eHumanities infrastructure, or are we already close to collapse?
![Page 13: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/13.jpg)
a few questions I
• will there be a separate infrastructure for each H discipline?
• NO
• there will be several shared services such as a PID registration and resolution service
• however:
  • building a joint infrastructure has to do with community building, trust, common language etc.
  • communities that are too big would not work
  • so let's move on in TextGrid, DARIAH, CLARIN etc.
  • but let's stay in close and fair contact to find synergies
• competition will become heavy and our competitors are the Googles of the world!
![Page 14: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/14.jpg)
a few questions II
• will there be a single market place for the humanities?
• NO
• acceptance of a market place depends on classification and organization principles, as already said
• these differ from discipline to discipline
• so our solutions have to start from the disciplines
• that is already difficult enough
• leave it to the Semantic Web people to enable crosswalks
![Page 15: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/15.jpg)
a few questions III
• who will be the main players?
• of course the big libraries, archives and museums
• but what about the universities and big organizations such as MPG?
• important:
  • we see new requirement profiles emerging
  • a kind of job sharing can be predicted:
    • computer centers: RZG, GWDG
    • curation centers: MPDL + few domain MPIs
    • content centers: a number of domain MPIs
    • highly specialized groups: highly specialized MPI departments
• of course: close collaboration with innovative libraries such as SUB etc. is required
![Page 16: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/16.jpg)
a few questions IV
• key bricks for interoperability?
• we need open registries of all sorts and smart registry frameworks
  • schema registries
  • concept registries (ISOcat - a creation of ISO TC37/SC4)
  • relation registries
  • etc.
• however:
  • a very complex landscape seems to be emerging
  • how to make it usable by laypeople?
  • how to convince researchers to work with them?
• no one knows yet - we need to try things out - what else?
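To make the registry idea on this slide more concrete, here is a minimal sketch of how a shared concept registry could support interoperability between two metadata schemas. Everything in it is an assumption for illustration: the concept IDs `C1` and `C2`, the two schemas and the function `crosswalk` are hypothetical and are not taken from ISOcat or any CLARIN component.

```python
# Hypothetical concept registry: shared concepts, each with a stable ID.
CONCEPTS = {
    "C1": "language name",
    "C2": "recording date",
}

# Each schema keeps its own field names but links them to concept IDs.
SCHEMA_A = {"Language": "C1", "Date": "C2"}
SCHEMA_B = {"lang": "C1", "created": "C2"}


def crosswalk(record, source_schema, target_schema):
    """Translate a record between schemas via the shared concept IDs.

    Fields without a registered concept, or without a counterpart in the
    target schema, are dropped rather than guessed.
    """
    concept_to_target = {cid: name for name, cid in target_schema.items()}
    translated = {}
    for field_name, value in record.items():
        concept_id = source_schema.get(field_name)
        target_name = concept_to_target.get(concept_id)
        if target_name is not None:
            translated[target_name] = value
    return translated


record_a = {"Language": "ExampleLang", "Date": "2000-01-01", "Comment": "n/a"}
print(crosswalk(record_a, SCHEMA_A, SCHEMA_B))
```

The design choice is that neither schema needs to know about the other; both only reference the registry. A relation registry would extend this by storing links between concepts (for example "is narrower than"), which this sketch does not attempt.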
![Page 17: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/17.jpg)
Summary
• we need initiatives again and again to advance the borders step by step
• it is now also time to transform existing knowledge into persistent infrastructures
• this will need a lot of sensitivity and patience - RI building takes time
• emerging landscapes will have an underlying complexity
  • need to offer discipline vocabulary
  • need to hide complexity to a certain extent
  • need to offer persistency
Project solutions are not per se useful as infrastructure solutions!
![Page 18: From DOBES to CLARIN and beyond](https://reader035.vdocuments.us/reader035/viewer/2022062519/5681557d550346895dc346b0/html5/thumbnails/18.jpg)
End
in Germany we already have a good mixture with TextGrid, DOBES, eAqua, DARIAH and CLARIN/D-SPIN
we have to get together frequently
Thanks for your attention.