Challenges for the Language Technology Industry
DESCRIPTION
Presentation at the LT-Innovate Summit, Brussels, June 24-25 2014. http://www.lt-innovate.eu/event/item/lt-innovate-summit-2014-brussels
TRANSCRIPT
![Page 1: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/1.jpg)
Challenges for the LT Industry
Antoine Isaac
LT-Innovate Summit 2014
Brussels, June 25, 2014
![Page 2: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/2.jpg)
Europe’s platform to access cultural heritage
Currently 33M objects
![Page 3: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/3.jpg)
Built on descriptive metadata from a broad, heterogeneous network
Audiovisual collections
National Aggregators
Regional Aggregators
Archives
Thematic collections
Libraries
Musées Lausannois
Culture.fr
The European Library
APEX
European Film Gateway
Europeana Fashion
2,300 galleries, museums, archives and libraries
![Page 4: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/4.jpg)
Platform implies network
![Page 5: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/5.jpg)
Accessing items from 36 countries
top 16
Portal interface in 31 languages
Metadata in 33 languages
![Page 6: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/6.jpg)
Serving Europe’s citizens
5M visits on Europeana.eu
7M Facebook impressions
API use…
![Page 7: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/7.jpg)
Facilitating re-use on the language side?
Our network needs automatic translation tools to address information needs all over Europe
![Page 8: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/8.jpg)
Gathering/linking existing multilingual data
![Page 9: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/9.jpg)
Related projects applying NLP tools
E.g. a project (PATHS) developed techniques to enrich English and Spanish collections
1) Identification of key entities
2) Detection of (typed) similarities between objects, using metadata
3) “Background links” to external resources such as Wikipedia
4) Classification of objects against a hierarchy of topics
Applying these to other languages would require work
1) -> requires language-specific tools (PoS tagging, lemmatization)
2) -> straightforward to apply to new languages
3) -> requires language-specific tools
4) -> depends on (3) and on translation of some topics
http://www.paths-project.eu/eng/Resources/Semantic-Enrichment-of-Cultural-Heritage-content-in-PATHS
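Why step 2 carries over to new languages can be sketched in a few lines. The slides do not describe how PATHS actually computes similarities, so the snippet below swaps in a plain Jaccard overlap computed per metadata field; the field names (`creator`, `subject`, `place`) and the two records are hypothetical. Because it only compares metadata values as opaque strings, nothing in it depends on PoS tagging or lemmatization.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections of values (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def typed_similarities(obj1, obj2, fields=("creator", "subject", "place")):
    """Return one similarity score per metadata field.

    The field name acts as the "type" of the similarity.
    Field names are illustrative assumptions, not the PATHS schema.
    """
    return {f: jaccard(obj1.get(f, ()), obj2.get(f, ())) for f in fields}


# Hypothetical records, for illustration only.
record_a = {
    "creator": ["Agence Rol"],
    "subject": ["photography", "paris"],
    "place": ["Paris"],
}
record_b = {
    "creator": ["Agence Rol"],
    "subject": ["photography", "machines"],
    "place": [],
}

scores = typed_similarities(record_a, record_b)
```

Here the two records would be linked strongly by creator and weakly by subject, whatever language the values are written in.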
![Page 10: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/10.jpg)
Language challenges for Digital Libraries
Typical queries are very short
Average < 2 terms
Identification of query language is not easy, even manually
39% of queries may belong to several languages
Plenty of named entities
60% of queries are for persons & places
These issues are not limited to queries: they also apply to the descriptive metadata
Studies by Humboldt University on Europeana and The European Library
http://www.clef-initiative.eu/documents/71612/86374/CLEF2010wn-LogCLEF-StillerEt2010.pdf
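The ambiguity of short queries can be illustrated with a toy sketch. It is not the method used in the studies cited above: it assumes a naive dictionary lookup over tiny, hand-picked word lists (hypothetical, four languages only) and returns every language in which all query terms are known words.

```python
# Toy vocabularies: hypothetical, tiny word lists used only for illustration.
TOY_VOCAB = {
    "en": {"brand", "kind", "pain", "chat", "book", "machine"},
    "de": {"brand", "kind", "buch", "maschine"},
    "nl": {"brand", "kind", "boek", "machine"},
    "fr": {"pain", "chat", "livre", "machine"},
}


def candidate_languages(query):
    """Return the languages whose toy vocabulary contains every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    return {
        lang
        for lang, vocab in TOY_VOCAB.items()
        if all(term in vocab for term in terms)
    }


def is_ambiguous(query):
    """A query is ambiguous when more than one language matches."""
    return len(candidate_languages(query)) > 1
```

With one or two terms there is little context to narrow the candidates: `candidate_languages("brand")` matches English, German and Dutch, while a named entity such as `"mozart"` matches no vocabulary at all.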
![Page 11: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/11.jpg)
Language issues at the scale of Europe
![Page 12: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/12.jpg)
Very diverse domains, probably with few training corpora available
Tools, UCL Museums, CC-BY-NC-SA
Paris, nouvelle machine à paver : [photographie de presse] / [Agence Rol], National Library of France, Public Domain
St. Philip holding a book and St. James (the Less?) holding a book, National Library of the Netherlands, Public Domain
La paloma / O sole mio, Dalane Folkemuseum, CC0
![Page 13: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/13.jpg)
Relevant LT can come from everywhere in Europe, raising interoperability issues
![Page 14: Challenges for the Language Technology Industry](https://reader033.vdocuments.us/reader033/viewer/2022061220/54bc9d184a7959906e8b459c/html5/thumbnails/14.jpg)
Resource problem
Both for us and our partners - libraries, archives, museums
Not much money
Few technical experts
Emphasis on open source technology
We can provide interesting challenges for the industry in terms of (open) data availability, users and scenarios.
But we're not (yet) a market as large as others