Multilingual Europe in late 2016
A Strategic Research and Innovation Agenda for the Multilingual Digital Single Market

Georg Rehm
Coordinator CRACKER, General Secretary META-NET
DFKI, Germany
[email protected]

FETLT 2016, 2nd International Workshop – Seville, Spain, 30th November 2016

META-NET has received funding from the EU’s Horizon 2020 research and innovation programme through the contract CRACKER (grant agreement no.: 645357). Formerly co-funded by FP7 and ICT PSP through the contracts T4ME (grant agreement no.: 249119), CESAR (grant agreement no.: 271022), METANET4U (grant agreement no.: 270893) and META-NORD (grant agreement no.: 270899).



Outline
• Initiatives for Multilingual Europe
• Towards the Multilingual Digital Single Market
• The MDSM SRIA V0.9
• Multilingual Europe in late 2016 – where do we stand?

http://www.meta-net.eu – http://www.cracker-project.eu

• META-NET: 60 research centres in 34 countries (founded in 2010)
  - Chair of Executive Board: Jan Hajic (CUNI)
  - Deputies: J. van Genabith (DFKI), A. Vasiljevs (Tilde)
  - General Secretary: Georg Rehm (DFKI)
• META (Multilingual Europe Technology Alliance): 826 members in 67 countries
• Strategic Research Agenda (published in 2013); Language White Papers (31 volumes; published in 2012)
• Projects: T4ME (META-NET), CESAR, METANET4U, META-NORD

Languages covered by the META-NET White Papers (* = official EU language):
Basque, Bulgarian*, Catalan, Croatian*, Czech*, Danish*, Dutch*, English*, Estonian*, Finnish*, French*, Galician, German*, Greek*, Hungarian*, Icelandic, Irish*, Italian*, Latvian*, Lithuanian*, Maltese*, Norwegian, Polish*, Portuguese*, Romanian*, Serbian, Slovak*, Slovene*, Spanish*, Swedish*, Welsh

http://www.meta-net.eu/whitepapers

Level of language technology (LT) support by area

MT
  Excellent: (none)
  Good: English
  Moderate: French, Spanish
  Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
  Weak or no support through LT: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh

Speech
  Excellent: (none)
  Good: English
  Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
  Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
  Weak or no support through LT: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh

Text Analytics
  Excellent: (none)
  Good: English
  Moderate: Dutch, French, German, Italian, Spanish
  Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
  Weak or no support through LT: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh

Resources
  Excellent: (none)
  Good: English
  Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
  Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
  Weak or no support through LT: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh

[Figure: level of support per language, on a scale of Weak/none, Fragmentary, Moderate, Good, Excellent. Languages shown, in the order extracted: Welsh, Maltese, Lithuanian, Latvian, Icelandic, Irish, Croatian, Serbian, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English. Languages with names in red have little or no MT support.]

Results of the META-NET White Paper Study (2012)

Strategic Research Agenda (2013)
• Addresses the problems we identified when preparing the white papers.
• Can put Europe ahead of its competitors in this technology area.
• 200 contributors; >2 years. 54% industry; 46% research; 4% (inter)national institutions.
• Presented and discussed at 90+ conferences and major workshops.
• Published in early 2013.
• http://www.meta-net.eu/sra

Priority Research Themes
• Three priority research themes:
  - Translingual Cloud
  - Social Intelligence and e-Participation
  - Socially-Aware Interactive Assistants
• Two additional themes:
  - European Service Platform for Language Technologies
  - Core Technologies for Language Analysis and Production

Cracking the Language Barrier
Coordination, Evaluation and Resources for European MT Research
Coordination and Support Action, H2020-ICT17, 2015–2017, 36 months – http://www.cracker-project.eu

Partners:
1. DFKI, Germany, Georg Rehm
2. CUNI, Czech Republic, Jan Hajic
3. ELDA, France, Khalid Choukri
4. FBK, Italy, Marcello Federico
5. ATHENA RC, Greece, Stelios Piperidis
6. UEDIN, UK, Philipp Koehn
7. USFD, UK, Lucia Specia

THREE PRIORITY AREAS FOR ACHIEVING THE MULTILINGUAL DIGITAL SINGLE MARKET

1. Multilingual access to all digital goods and services across Europe

Geo-blocking:
due to nationality, location, or residence
customers

Language-blocking:
languages they do not speak
however, current online translation is insufficient
trying to conduct
common languages

Geo-blocking and language-blocking are barriers to access

Both geo-blocking and language-blocking are daily problems for tens of millions of EU citizens.

Customers are six times more likely to buy from sites in their native language.

Most EU languages address less than 3% of the market, fundamentally limiting SMEs operating in countries where those languages are spoken.

Lack of language technology support (automatic translation, tools to assist human translators, and multilingual support in
European businesses.

Language can be expensive for SMEs

Online businesses face around €5,000 in up-front costs for each new language they translate their websites into, plus similar
and marketing costs.

Even when sites are translated, the vast majority of SMEs cannot respond to support requests or customer feedback in other languages. Such responsiveness is needed to achieve customer satisfaction and build brand loyalty.

English is not the answer

52% of EU customers do not purchase

Adding even a few languages to an SME’s website beyond English can have a major impact on revenue. Large organizations today
to increase market share.

[Figure: likelihood of purchasing, site in buyer’s native language vs. site in foreign language: 6x more likely to purchase.]


Communities
• META-NET incl. META-SHARE and META
• MT evaluation initiatives – WMT, IWSLT, MT Marathons
• MT and other LT industry
• Language resources – META-SHARE, ELRA
• HT/MT evaluation tools – translate5
• Translation industry, translation profession
• MT user communities

Strategic Agenda for the Multilingual Digital Single Market
• Version 0.5 presented at META-FORUM 2015 (Riga)
• Version 0.9 presented at META-FORUM 2016 (Lisbon)

Strategic Research and Innovation Agenda
Language as a Data Type and Key Challenge for Big Data
Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content
SRIA Editorial Team
Version 0.9 – July 2016

Selected Activities

[Timeline figure, 2015 to 2017 (M1, M12, M24, M36): kick-off meeting for all ICT-17 projects; translate5; WMT 2016, WMT 2017; IWSLT 2015, IWSLT 2016, IWSLT 2017; QT Marathon 2015, QT Marathon 2016; Roadmap for European MT Research; Survey on the State of HQMT in Industry and LSPs; SRIA (initial version), SRIA (update), SRIA (final); further labels: version 1, version 2.]

• Production of resources (e.g., for WMT 2016 and 2017, IWSLT 2015-2017)
• Tools (quality control, evaluations)
• Strategies and roadmaps (SRIA, Roadmap for European MT Research)
• Exchange and sharing facility for resources (META-SHARE)

Recent or Upcoming Events
• LREC Workshop on MT Eval. (May 25)
• META-FORUM 2016 (July 4/5, Lisbon)
• WMT 2016 (Aug. 11/12, Berlin)
• IWSLT 2016 (Dec. 8/9, Seattle)

• Riga Summit 2015 and Riga Declaration.
• Federation of European projects and organisations working on technologies for a multilingual Europe.
• Multi-lateral Memorandum of Understanding; 10 organisations and 24 projects on board.
• Getting new members on a regular basis.
• Selected areas of collaboration: data management and repositories, tools, shared tasks, evaluations, events.
• Goal: provide one umbrella organisation for the whole community.

http://www.cracking-the-language-barrier.eu

Digital Single Market (DSM)
• Top priority in the European Union.
• Expected to add €400 billion to European GDP and create hundreds of thousands of new jobs.
• Unfortunately, the language topic is not included in the EC’s Digital Single Market strategy (published in May 2015).

A. Ansip’s May 2016 Blog Post
• Posted on 27 May 2016.
• First public acknowledgment by the EC that the language topic is of very high relevance for the Digital Single Market.
• “Overcoming language barriers is vital for building the DSM, which is by definition multilingual. It is now time to reduce and remove the language barriers that are holding back its advance, and turn them into competitive advantages.”

MDSM SRIA
• Version 0.5 unveiled at META-FORUM 2015
• Version 0.9 unveiled at META-FORUM 2016
• Version 1.0 foreseen for early 2017
• Prepared and presented by the Cracking the Language Barrier federation (editorial team: 13 colleagues)
• SRIA addresses how the LT community is going to act united in order to make the DSM multilingual
• Aligned to three of the BDVA SRIA V2.0’s technical priorities: Data Management, Data Analysis, Data Processing.

Strategic Agenda for the Multilingual Digital Single Market
Technologies for Overcoming Language Barriers towards a truly integrated European Online Market
DRAFT
Version 0.5 – April 22, 2015


MDSM: Goals and Needs
• Crosslingual communication for SMEs, public institutions, citizens
• Crosslingual SME presales communication and aftersales services
• Multilingual (big) data, language and knowledge value chains
• Multilingual websites, product catalogues, product descriptions
• Multilingual knowledge bases and knowledge graphs (and services)
• Multilingual conversational interfaces for connected devices (IoT)
• Crosslingual business intelligence (e.g., based on UGC)
• Crosslingual social media analytics for EU-wide societal issues
• Multilingual text and report generation (knowledge/data to text)
• All services must be domain-adaptable (no one size fits all)
• Translation Centre (Cloud) – HQ automated translation for all

MLV Programme
• Multilingual Value Programme*
  - Three-year programme
  - Requires modest investment
• “Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content”
• Three components address the main needs of the Multilingual DSM (MDSM) and how to put them into practice:
  1. Multilingual Application Areas
  2. Multilingual Services
  3. Research

* SRIA V0.9 and MLV Programme devised before re-organisation of DG CONNECT.

MLV Programme

[Figure: MLV Programme overview for the Multilingual Digital Single Market.
Layers: Multilingual Applications; Multilingual Services; Research.
Labels: E-Commerce; Content, Media, Verticals; Translation, Language, Knowledge, Data; Automated Translation; Knowledge and Data Repositories; Conversational Technologies; Crosslingual Big Data Language Analytics; Meaning, Semantics, Knowledge; High-Quality Machine Translation; Multimodal Interaction; Language Processing, Analysis and Production – Language Resources; Citizens; Public; Business; SMEs; CEF DSIs; IT Integrators; Research.
Funding labels: H2020 RIAs; H2020 CSAs, IAs, RIAs; H2020 CSAs, RAs, national funding.
Annotations: provide innovative applications; fills gaps; interoperable and standardised; collaboration with member states.]

Application Areas (Selection)
• Multilingual E-commerce
  - Customer-facing vs. back-office facing (after-market, after-sales)
  - Crosslingual search, CRM, helpdesks, processes, workflows
  - Semantic, crosslingual product descriptions and catalogues
  - Online dispute resolution
• Multilingual Content, Media, Verticals
  - Content analytics, curation, generation (incl. authoring support)
  - Multimodal communication (conversational, written, IoT)
  - Vertical domains: health, government, mobility, energy, legal.
• Translation, Language, Knowledge, Data
  - Translation Cloud – written/spoken, automatic/human
  - Crosslingual public and social intelligence, business intelligence
  - HQ resources, under-resourced languages, domain-specific LRs

Setup – Timeframe – Costs
• Close collaboration with EC, EP and all other stakeholders (including SMEs, research centres, universities, NGOs etc.).
• Mix of funding sources:
  - Horizon 2020 (WP 2018-2020) for EU projects (RA, RIA, CSA)
  - National/regional funding sources for work on monolingual LTs and LRs and also to support and grow SMEs in this area
  - Include, strengthen and broaden role of CEF AT (public services)
• Estimated costs for basic MLV implementation: ca. 175-200M€
  - Includes set of mission-critical services and applications
  - Timeframe: 2018, 2019, 2020

Context – Current Developments
• Multilingual Europe: danger of digital language extinction; all languages are equal; multilingual DSM; world class LT research in Europe.
• Artificial Intelligence: Important breakthroughs and massive investments (USA, Asia) in AI R&D and applications (deep learning, DNNs).
• Need for LT: not only Multilingual DSM but also Translation, Internet of Things, Industrie 4.0, HCI, smart personal assistants etc.
• Need for European LT: US and other non-European technologies are not the solution! Europe must not make its crucial IT infrastructure dependent on non-European solutions (same reason why EU is building GALILEO).
• Digitalisation of our continent: SMEs, enterprises, public administrations are struggling to cope with the digital revolution (see Industrie 4.0, IoT etc.).
• Security and Privacy: Secure systems on European servers are essential for large-scale industry adoption.
• Growing need for Language Technologies made in Europe for Europe.

Multilingual Europe through Technology: Current Initiatives and Activities

[Figure: overview diagram centred on "Multilingual Europe in late 2016".
Topics:
• Multilingual Strategy of the EU: more tech support for multilingualism
• Language Technologies for Europe's digital public services
• Technologies for the Multilingual Digital Single Market
• Language Technologies for Big Data text analytics
• The Human Language Project – long-term R&D&I, post-H2020
• Language Technologies R&D&I (H2020, WP 2018-20)
Annotations:
• Open calls and upcoming service contracts
• Dec. 2016: EC brainstorming meeting on future LT priorities in and post Horizon 2020. Maybe a new document is needed?
• Jan. 2017: STOA workshop and study on LT for Europe
• Dec. 2017: LT Session at BDVA Summit in Valencia
• Q1 2017: MDSM SRIA V1.0
• Policy change and initiative towards a European digital public sphere enabled by MT/LT
• WP 2018-20 (incl. IoT, I4.0, assistants, robots etc.)
• Shared programme between EU and MS; MLV Programme
• DG CONNECT; DGT and DG CONNECT; DG CONNECT
• CEF AT, ELRC]

Thank you for your attention.

[email protected]

http://www.meta-net.eu
http://www.facebook.com/META.Alliance
http://www.cracker-project.eu
http://www.cracking-the-language-barrier.eu