Rule-based Information Extraction is Dead! Long Live Rule-based Information Extraction Systems!


Upload: yunyao-li

Post on 04-Jul-2015


DESCRIPTION

Poster for our ACL 2013 short paper "Rule-based Information Extraction is Dead! Long Live Rule-based Information Extraction Systems!"

TRANSCRIPT

Page 1: Rule-based Information Extraction is Dead! Long Live Rule-based Information Extraction Systems!

Laura Chiticariu, Yunyao Li, Frederick Reiss IBM Research - Almaden

Rule-based Information Extraction is DEAD

Long Live Rule-based Information Extraction Systems!

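[Figure: Implementations of Entity Extraction. Share of each approach (Rule-Based / Hybrid / Machine Learning Based), on a 0% to 100% scale.]

NLP Papers (2003-2012): Rule-Based 3.5%, Hybrid 21%, Machine Learning Based 75%

Commercial Products (2013), Large Vendors: Rule-Based 67%, Hybrid 17%, Machine Learning Based 17%

Commercial Products (2013), All Vendors: Rule-Based 45%, Hybrid 22%, Machine Learning Based 33%

[Figure: Entity Extraction Papers by Year. x-axis: Year of Publication; y-axis: Fraction of NLP Papers; series: Hybrid, Machine Learning Based, Rule-Based.]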

THE DISCONNECT: ACADEMIC vs. INDUSTRY

THE EXPLANATIONS

Rule-based IE

PROs

• Declarative
• Easy to comprehend
• Easy to maintain
• Easy to incorporate domain knowledge
• Easy to debug

CONs

• Heuristic
• Requires tedious manual labor

ML-based IE

PROs

• Trainable
• Adaptable
• Reduces manual effort

CONs

• Requires labeled data
• Requires retraining for domain adaptation
• Requires ML expertise to use or maintain
• Opaque

Where is the research in rule-based IE?

Making it more principled, effective, and efficient

Define standard IE rule language and data model.

• What is the right data model to capture text, annotations over text, and their properties?
• Can we establish a standard declarative extensible rule language to solve most IE tasks encountered so far?
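To make the data-model question concrete, here is a minimal sketch of one possible representation: annotations as typed character spans with properties, plus one extraction rule built from a dictionary and a regular expression. The names `Span`, `dictionary_matches`, `regex_matches`, `follows`, and `person_rule` are hypothetical and invented for this illustration; this is not the rule language or data model of any particular system.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """An annotation over text: character offsets, a type, and properties."""
    begin: int
    end: int
    type: str
    props: tuple = ()  # (key, value) pairs; a tuple keeps the span hashable

    def text(self, doc: str) -> str:
        return doc[self.begin:self.end]


def dictionary_matches(doc, entries, type_):
    """Annotate each whole-word, case-insensitive occurrence of a dictionary entry."""
    spans = []
    for entry in entries:
        pattern = r"\b" + re.escape(entry) + r"\b"
        for m in re.finditer(pattern, doc, re.IGNORECASE):
            spans.append(Span(m.start(), m.end(), type_, (("entry", entry),)))
    return sorted(spans, key=lambda s: s.begin)


def regex_matches(doc, pattern, type_):
    """Annotate each match of a regular expression."""
    return [Span(m.start(), m.end(), type_) for m in re.finditer(pattern, doc)]


def follows(left, right, doc, max_gap=1):
    """True when `right` starts after `left` with at most max_gap whitespace characters between."""
    gap = doc[left.end:right.begin]
    return left.end <= right.begin and len(gap) <= max_gap and gap.strip() == ""


def person_rule(doc, first_names):
    """Rule: a dictionary first name followed by a capitalized word is a Person."""
    firsts = dictionary_matches(doc, first_names, "FirstName")
    caps = regex_matches(doc, r"\b[A-Z][a-z]+\b", "CapsWord")
    return [Span(f.begin, c.end, "Person")
            for f in firsts for c in caps if follows(f, c, doc)]


doc = "Laura Chiticariu and Yunyao Li met Frederick Reiss in Almaden."
people = person_rule(doc, ["Laura", "Yunyao", "Frederick"])
print([p.text(doc) for p in people])
# → ['Laura Chiticariu', 'Yunyao Li', 'Frederick Reiss']
```

Keeping annotations as plain offsets into the original text means that every extracted result can be traced back to the primitive matches that produced it, which is one reason rules are considered easy to debug.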

Systems research based on standard IE rule language.

• Data representation
• Automatic performance optimization
• Exploring modern hardware …

ML research based on standard IE rule language

• How to learn basic primitives such as regular expressions and dictionaries?
• How to automatically generate rules that are understandable and maintainable?
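As one very simple illustration of learning a basic primitive, the sketch below induces a dictionary from labeled tokens by keeping entries that are labeled positive often enough and reliably enough. The function `learn_dictionary`, its thresholds, and the toy data are all assumptions made for this example; the poster does not prescribe any particular learning method.

```python
from collections import Counter


def learn_dictionary(labeled_tokens, min_support=2, min_precision=0.8):
    """Keep tokens that are labeled positive often enough and reliably enough.

    labeled_tokens: iterable of (token, is_positive) pairs from annotated text.
    """
    seen = Counter()
    positive = Counter()
    for token, is_positive in labeled_tokens:
        key = token.lower()
        seen[key] += 1
        if is_positive:
            positive[key] += 1
    return sorted(
        tok for tok, n_pos in positive.items()
        if n_pos >= min_support and n_pos / seen[tok] >= min_precision
    )


# Toy labeled data (invented for illustration): True marks a city mention.
examples = [
    ("Paris", True), ("Paris", True), ("Paris", True),
    ("Reading", True), ("Reading", False), ("Reading", False),
    ("Oslo", True), ("Oslo", True),
    ("Lima", True),
]
print(learn_dictionary(examples))
# → ['oslo', 'paris']
```

The output is an ordinary word list, so a rule developer can still read, edit, and maintain what was learned.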

BRIDGING THE GAP

Evaluating Benefits of IE

Academia: Evaluating IE on its own; Precision and Recall

Industry: Evaluating IE as part of a larger process; Using ill-defined metrics that are subject to change

Evaluating Costs of IE

• Labor cost
• Hardware cost
• Labor cost of writing rules
• Business risk
• Others

What’s the research in Rule-based IE?
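For reference, the precision and recall metrics listed under evaluating the benefits of IE can be computed as below. This is a minimal sketch that assumes exact-match scoring over `(begin, end, type)` annotations; the function name `precision_recall` and the sample annotations are invented for illustration, and real evaluations often use partial-match variants.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over (begin, end, type) annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard and system annotations for one document.
gold = {(0, 16, "Person"), (21, 30, "Person"), (35, 50, "Person"), (54, 61, "Location")}
predicted = {(0, 16, "Person"), (21, 30, "Person"), (35, 44, "Person")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
# → precision=0.67 recall=0.50
```

Two of the three predicted annotations match the gold standard exactly, giving a precision of 2/3, and two of the four gold annotations are found, giving a recall of 2/4.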