![Page 1: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/1.jpg)
© Copyright 2013 ABBYY. Confidential
NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET
Alexander Rylov
LTi Summit 2013
![Page 2: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/2.jpg)
Market fragmentation
● By domains
● By languages
![Page 3: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/3.jpg)
WHY SHOULD LT VENDORS SHARE THEIR RESOURCES?
● Many LT vendors have their own language technology (LT)
● LTs are focused on a particular domain or language(s)
● Resources are critical for enabling such technologies
● If vendors share resources, they may lose their competitive advantage
![Page 4: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/4.jpg)
Technology capabilities and restrictions
● Language-specific = language-centric = limited by language
● Difficulties:
  ● Controlled links
  ● Anaphora
  ● Long-distance links
  ● Ellipsis
● Ontologies, dictionaries, statistics = trained on a limited set of data = cover only a limited variety of meaning representations = reaching even 40% recall is sometimes considered good (NER, US DoD track)
![Page 5: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/5.jpg)
WHAT IS BIG DATA…
● Multilingual
● Covers more than one domain
● 85–90% is in unstructured text documents
● The same meaning can be expressed in language in countless ways
![Page 6: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/6.jpg)
A FUNDAMENTAL NATURAL LANGUAGE TECHNOLOGY IS REQUIRED, SCALABLE BY DOMAINS AND LANGUAGES
![Page 7: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/7.jpg)
ABBYY Compreno as a proposal
● Interlingua approach:
  ● the semantic model is based on a universal, language-independent representation of both lexis and grammar
● Working languages:
  ● Russian, English: at the stage of terminological and collocation expansion
  ● German: full prototype (lexis, syntax) is completed; at the stage of main lexis expansion (from core to periphery)
  ● French: full prototype is completed (tested on a controlled MT task)
  ● Chinese: lexical system prototype is completed (a challenging task never carried out before)
● Compreno has proved to be a scalable technology that can be used for any language
[Diagram labels: Universal Semantic Hierarchy; Statistics and machine learning; Syntactic and semantic analysis]
![Page 8: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/8.jpg)
Complete syntactic and semantic analysis
The bank was located at the bank of the river; it was closed.
Complete analysis helps overcome linguistic problems in the text, if any; in this example, the two senses of "bank" and the referent of "it".
![Page 9: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/9.jpg)
Compreno current achievements
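Russian syntax analysis, 2011

| System   | Precision | Recall | F    |
|----------|-----------|--------|------|
| Compreno | 0.95      | 0.98   | 0.97 |
| System 2 | 0.93      | 0.98   | 0.96 |
| System 3 | 0.90      | 0.98   | 0.94 |
| System 4 | 0.89      | 0.95   | 0.92 |
| System 5 | 0.86      | 0.98   | 0.92 |
| System 6 | 0.86      | 0.86   | 0.86 |
| System 7 | 0.79      | 0.98   | 0.87 |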
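Fact Extraction, 2013 (three pairwise comparisons)

|                 | Compreno | System 1 | Compreno | System 2 | Compreno | System 3 |
|-----------------|----------|----------|----------|----------|----------|----------|
| Precision       | 0.95     | 0.95     | 0.96     | 0.98     | 0.92     | 0.92     |
| Recall          | 0.93     | 0.70     | 0.84     | 0.44     | 0.92     | 0.74     |
| F-measure       | 0.94     | 0.81     | 0.90     | 0.61     | 0.92     | 0.82     |
| ABBYY advantage | 14%      |          | 32%      |          | 10%      |          |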
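The F-measure figures can be cross-checked from the precision and recall values. The sketch below assumes the reported F is the balanced F-measure (F1, the harmonic mean of precision and recall); the slides do not state which variant was used, and the function name `f_measure` is chosen here for illustration only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the Fact Extraction 2013 figures.
comparisons = {
    "Compreno vs System 1": ((0.95, 0.93), (0.95, 0.70)),
    "Compreno vs System 2": ((0.96, 0.84), (0.98, 0.44)),
    "Compreno vs System 3": ((0.92, 0.92), (0.92, 0.74)),
}

for name, (compreno, other) in comparisons.items():
    # Round to two decimals to match the precision used on the slide.
    print(name, round(f_measure(*compreno), 2), round(f_measure(*other), 2))
```

Under the F1 assumption, the recomputed values agree with the F-measure row of the Fact Extraction figures to two decimal places.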
![Page 10: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/10.jpg)
Applications
● Big Data analytics: analysis of facts, extraction of objects
● Intelligence, eDiscovery (any kind)
● Search by meaning rather than by concepts
● Natural-language dialogue systems
● Translation
![Page 11: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/11.jpg)
A few facts about Compreno
● 18 years of development
● About 350 people involved
● More than 2000 man-years
![Page 12: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/12.jpg)
Barriers to wide implementation
● At least 3 years per language
● At least 30 linguists per language
● At least €12M per language
● After that, ongoing support and improvement
![Page 13: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/13.jpg)
EU project idea
● Describe ALL EU languages
● Describe major domains: healthcare, law, government, major industries
● ABBYY commitment:
  ● Methodology, management, instruments
![Page 14: © Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649ea75503460f94baadf0/html5/thumbnails/14.jpg)
EU BENEFITS – CREATE A SINGLE DIGITAL LT MARKET
● Operate not with a language but with a universal model of it (interlingual approach)
● Describe one domain in one language, then apply it in all other languages
● A platform for LT vendors to create solutions and products that scale easily across languages and domains
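The idea of describing a domain once and applying it in every language can be illustrated with a toy sketch. It is not Compreno's implementation: the lexicons, concept IDs, hierarchy, and function names below are all hypothetical, and real analysis involves full syntactic and semantic parsing rather than word lookup. The point is only that the domain rule refers to language-independent concepts, so it is written once.

```python
# Toy illustration only; every name and data entry here is hypothetical.

# Language-specific lexicons map surface words to language-independent concept IDs.
LEXICONS = {
    "en": {"aspirin": "ASPIRIN", "treats": "TREAT", "migraine": "MIGRAINE"},
    "de": {"aspirin": "ASPIRIN", "behandelt": "TREAT", "migräne": "MIGRAINE"},
    "es": {"aspirina": "ASPIRIN", "trata": "TREAT", "migraña": "MIGRAINE"},
}

# A tiny shared hierarchy: concept -> parent class.
HIERARCHY = {"ASPIRIN": "DRUG", "MIGRAINE": "SYMPTOM"}


def to_concepts(text, lang):
    """Map a sentence to concept IDs, dropping words the lexicon does not know."""
    lexicon = LEXICONS[lang]
    words = text.lower().rstrip(".").split()
    return [lexicon[w] for w in words if w in lexicon]


def extract_treatment(concepts):
    """Domain rule written once, over concepts only (no language-specific words)."""
    if "TREAT" not in concepts:
        return None
    drugs = [c for c in concepts if HIERARCHY.get(c) == "DRUG"]
    symptoms = [c for c in concepts if HIERARCHY.get(c) == "SYMPTOM"]
    if drugs and symptoms:
        return {"drug": drugs[0], "symptom": symptoms[0]}
    return None


sentences = [
    ("en", "Aspirin treats migraine."),
    ("de", "Aspirin behandelt Migräne."),
    ("es", "La aspirina trata la migraña."),
]
for lang, sentence in sentences:
    print(lang, extract_treatment(to_concepts(sentence, lang)))
```

In this sketch, adding a language means adding a lexicon entry in `LEXICONS`; the rule in `extract_treatment` stays unchanged, which mirrors the "describe one domain in one language, apply in all others" claim in miniature.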