Anatoly Starostin (ABBYY), "ABBYY InfoExtractor: technology...

ABBYY InfoExtractor: technology of producing domain oriented information extraction systems

Starostin A.S.

NLP technologies

Rule-based - technologies that rely on hand-written linguistic rules tailored to a particular task.

Statistics-based - technologies based on machine learning over large text corpora (labeled or parallel).

Hybrid - technologies that combine several approaches, for example rule-based + statistics-based.

Model-based - technologies based on universal (complete) language modeling.

ABBYY Compreno

Universal Semantic Hierarchy

• It’s a tree

• Intermediate nodes represent semantic classes (concepts)

• Leaves represent lexical classes

• Concrete lexemes are linked to lexical classes

• All nodes are labeled with grammatical and semantic information (a set of grammemes and a set of semantemes)
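The bullets above describe a labeled tree, which can be sketched as a small data structure. This is only an illustration, not the Compreno implementation: the `Node` class, its field names, and the sample class names and labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """A hierarchy node: a semantic class (inner node) or a lexical class (leaf)."""
    name: str
    grammemes: set[str] = field(default_factory=set)
    semantemes: set[str] = field(default_factory=set)
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)
    lexemes: list[str] = field(default_factory=list)  # linked only to leaves

    def add_child(self, child: "Node") -> "Node":
        """Attach a child node and return it, so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child

    def is_lexical_class(self) -> bool:
        """In this sketch a leaf is a lexical class."""
        return not self.children

    def ancestors(self) -> list[str]:
        """Names of the semantic classes from the parent up to the root."""
        names = []
        node = self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


# Hypothetical fragment: names and labels are invented for illustration.
root = Node("ENTITY")
organization = root.add_child(Node("ORGANIZATION", semantemes={"Collective"}))
company = organization.add_child(
    Node("company:COMPANY", grammemes={"Noun"}, lexemes=["company", "firm"])
)
```

Walking `ancestors()` from a lexical class gives the chain of concepts it belongs to, which is the kind of lookup a tree-shaped hierarchy makes cheap.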

Syntactic-Semantic tree

Google sold Motorola to Lenovo for $2.91 billion.

OWL ontology

RDF graph
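To make the idea of an RDF graph concrete, the example sentence "Google sold Motorola to Lenovo for $2.91 billion." could be stored as subject-predicate-object triples. The sketch below uses plain Python tuples; the `http://example.org/ie#` namespace, the `Sale` class and the property names are hypothetical and do not come from the talk or from any ABBYY ontology.

```python
# Hypothetical namespace and vocabulary, assumed for illustration only.
EX = "http://example.org/ie#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(local_name: str) -> str:
    """Build a full identifier inside the example namespace."""
    return EX + local_name


# Each triple is (subject, predicate, object).
triples = [
    (uri("google"), RDF_TYPE, uri("Organization")),
    (uri("motorola"), RDF_TYPE, uri("Organization")),
    (uri("lenovo"), RDF_TYPE, uri("Organization")),
    (uri("sale1"), RDF_TYPE, uri("Sale")),
    (uri("sale1"), uri("seller"), uri("google")),
    (uri("sale1"), uri("buyer"), uri("lenovo")),
    (uri("sale1"), uri("object"), uri("motorola")),
    (uri("sale1"), uri("priceAmount"), "2910000000"),
    (uri("sale1"), uri("priceCurrency"), "USD"),
]


def objects(subject: str, predicate: str) -> list[str]:
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

Modeling the sale as its own node (`sale1`) lets one fact carry several participants and a price, which a single binary relation between two companies could not do.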

IE development factory

Extraction algorithm

Parse subtree interpretation rules

Identification rules

Type of statements

IE system production

Design

• Input: customer needs (informal), text examples (marked up or not)

• Output: OWL ontology in which every object is well documented

Development

• Input: well-documented OWL ontology, marked-up text examples

• Output: production system of rules

Testing

• Nightly testing (marked-up corpora)

• Reclamations (pointed error examples)

All three activities take place within one framework, called OntoDPS.
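Testing against a marked-up corpus amounts to comparing the facts the system extracts with the facts annotated by hand. The talk does not say which metrics the nightly runs report; the sketch below assumes precision, recall and F1 as a common choice, and the `evaluate` function and the sample facts are hypothetical.

```python
def evaluate(expected: set[tuple], extracted: set[tuple]) -> dict[str, float]:
    """Compare extracted facts with gold-standard facts from a marked-up corpus."""
    true_positives = len(expected & extracted)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold annotation and system output for one sentence.
gold = {
    ("sale1", "seller", "Google"),
    ("sale1", "buyer", "Lenovo"),
    ("sale1", "object", "Motorola"),
}
system = {
    ("sale1", "seller", "Google"),
    ("sale1", "buyer", "Motorola"),  # wrong buyer: counts against precision
}

scores = evaluate(gold, system)
```

Running the same comparison every night over a fixed corpus makes a regression visible as a drop in these numbers after a rule change.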

IE system design

IE system design (marked up text example)

IE system development: libraries

IE system development: projects

IE system development: reuse and customization

Adding new items to dictionaries

Adding new instances to ontologies

Reuse of libraries and rules

Complex rule customization

IE system testing: nightly testing

Thank you! Questions?
