© Copyright 2013 ABBYY. Confidential

NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET

Alexander Rylov

LTi Summit 2013

Market fragmentation

● By domains
● By languages

WHY SHOULD LT VENDORS SHARE THEIR RESOURCES?

● Many LT vendors have their own LT

● LTs are focused on particular domain/language(s)

● Resources are critical for enabling such technologies

● If they share, vendors may lose their competitive advantage

Technologies: abilities and restrictions

● Language specific = language centric = limited by language

● Difficulties:
    ● Controlled links
    ● Anaphora
    ● Long-distance links
    ● Ellipsis

● Ontologies, dictionaries, statistics = trained on a limited set of data = cover only a limited variety of meaning representations = sometimes 40% recall counts as a good result (NER, US DoD track)

WHAT IS BIGDATA…

● Multilingual
● Covers more than 1 domain
● 85 – 90% is in unstructured text documents
● Linguistic expressions of the same meaning vary in countless ways

A FUNDAMENTAL NATURAL LANGUAGE TECHNOLOGY IS REQUIRED, SCALABLE BY DOMAINS AND LANGUAGES

ABBYY Compreno as a proposal

● Interlingua approach:
    ● The semantic model is based on a universal, language-independent representation of both lexis and grammar

● Working languages:
    ● Russian, English: at the stage of terminological and collocation expansion
    ● German: full prototype (lexis, syntax) is completed; at the stage of main lexis expansion (from core to periphery)
    ● French: full prototype is completed (tested on a controlled MT task)
    ● Chinese: lexical system prototype is completed (a challenging task never carried out before)

● It is proven that Compreno is a scalable technology that can be used for any language

[Diagram labels: Universal Semantic Hierarchy; statistics and machine learning; syntactic and semantic analysis]

Complete syntactic and semantic analysis

The bank was located at the bank of the river; it was closed.

Complete analysis helps overcome linguistic problems in the text, if any.

Compreno current achievements

Russian syntax analysis, 2011

System      Precision   Recall   F
Compreno    0.95        0.98     0.97
System 2    0.93        0.98     0.96
System 3    0.90        0.98     0.94
System 4    0.89        0.95     0.92
System 5    0.86        0.98     0.92
System 6    0.86        0.86     0.86
System 7    0.79        0.98     0.87

Fact extraction, 2013

                  Compreno   System 1   Compreno   System 2   Compreno   System 3
Precision         0.95       0.95       0.96       0.98       0.92       0.92
Recall            0.93       0.70       0.84       0.44       0.92       0.74
F-measure         0.94       0.81       0.90       0.61       0.92       0.82
ABBYY advantage   14%                   32%                   10%
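
As a reading aid, the F-measure figures for fact extraction can be recomputed from the reported precision and recall. The sketch below assumes the slides use the balanced F-measure (F1, the harmonic mean of precision and recall); the slides do not state which variant was used, and the function and variable names are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the slides report F1; the variant is not stated there.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (system, precision, recall) pairs copied from the 2013 fact-extraction figures.
fact_extraction_2013 = [
    ("Compreno", 0.95, 0.93), ("System 1", 0.95, 0.70),
    ("Compreno", 0.96, 0.84), ("System 2", 0.98, 0.44),
    ("Compreno", 0.92, 0.92), ("System 3", 0.92, 0.74),
]

for name, precision, recall in fact_extraction_2013:
    print(f"{name}: F = {f_measure(precision, recall):.2f}")
```

Under the F1 assumption, the six computed values round to the F-measure figures reported for fact extraction (0.94, 0.81, 0.90, 0.61, 0.92, 0.82).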

Applications

● BigData analytics – analysis of facts, extraction of objects
● Intelligence, eDiscovery (any kind)
● Search by meaning rather than by concepts
● Natural-language dialogue systems
● Translation

A few facts about Compreno

● 18 years of development
● About 350 people involved
● More than 2000 man-years

Barriers to wide implementation

● At least 3 years per language
● At least 30 linguists per language
● At least 12M € per language

● Then support and improvement

EU project idea

● Describe ALL EU languages
● Describe major domains: healthcare, law, government, major industries

● ABBYY commitment:
    ● Methodology, management, instruments

EU BENEFITS – CREATE SINGLE DIGITAL LT MARKET

● Operate not with a language but with a universal model of it – the interlingual approach
● Describe one domain in one language – apply it in all other languages

● A platform for LT vendors to create solutions and products easily scalable by languages and domains
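
The "describe a domain once, apply it in all languages" idea can be illustrated with a toy sketch. This is not Compreno's model or API: the concept names, the tiny lexicons, and both helper functions are hypothetical and exist only to show a domain pattern defined over language-independent concepts, with per-language lexicons as the only language-specific part.

```python
# Hypothetical per-language lexicons mapping surface words to
# language-independent concept labels (all names invented for illustration).
LEXICONS = {
    "en": {"doctor": "PHYSICIAN", "prescribes": "PRESCRIBE", "aspirin": "DRUG"},
    "de": {"arzt": "PHYSICIAN", "verschreibt": "PRESCRIBE", "aspirin": "DRUG"},
}

# A healthcare-domain pattern written once, in terms of concepts only.
DOMAIN_PATTERN = ("PHYSICIAN", "PRESCRIBE", "DRUG")


def to_concepts(text: str, lang: str) -> list[str]:
    """Map known words to concept labels; words missing from the lexicon are skipped."""
    lexicon = LEXICONS[lang]
    return [lexicon[word] for word in text.lower().split() if word in lexicon]


def matches_domain_pattern(text: str, lang: str) -> bool:
    """Check whether the text's concept sequence equals the domain pattern."""
    return tuple(to_concepts(text, lang)) == DOMAIN_PATTERN


print(matches_domain_pattern("The doctor prescribes aspirin", "en"))
print(matches_domain_pattern("Der Arzt verschreibt Aspirin", "de"))
```

In this sketch, supporting another language only requires a new lexicon entry; the domain pattern itself stays unchanged. A real system would rely on full syntactic and semantic analysis rather than word lookup.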