Rule-based Information Extraction is Dead! Long Live Rule-based Information Extraction Systems!

Posted on 04-Jul-2015


DESCRIPTION

Poster for our ACL 2013 short paper "Rule-based Information Extraction is Dead! Long Live Rule-based Information Extraction Systems!"

TRANSCRIPT

Laura Chiticariu, Yunyao Li, Frederick Reiss (IBM Research - Almaden)

Rule-based Information Extraction is DEAD! Long Live Rule-based Information Extraction Systems!

THE DISCONNECT: ACADEMIC vs. INDUSTRY

[Figure. Left, "Entity Extraction Papers by Year": fraction of NLP papers (2003-2012), by year of publication, that use rule-based, hybrid, or machine-learning-based entity extraction; overall 3.5% rule-based, 21% hybrid, 75% machine-learning-based. Right, "Implementations of Entity Extraction" in commercial products (2013): all vendors 45% rule-based, 22% hybrid, 33% machine-learning-based; large vendors 67% rule-based, 17% hybrid, 17% machine-learning-based.]

THE EXPLANATIONS

Rule-based IE
PROs
• Declarative
• Easy to comprehend
• Easy to maintain
• Easy to incorporate domain knowledge
• Easy to debug
CONs
• Heuristic
• Requires tedious manual labor

ML-based IE
PROs
• Trainable
• Adaptable
• Reduces manual effort
CONs
• Requires labeled data
• Requires retraining for domain adaptation
• Requires ML expertise to use or maintain
• Opaque
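To make the rule-based PROs concrete, here is a minimal sketch in plain Python. It is not the authors' system or rule language: the rule format, the `extract` function, and the dictionary contents are all hypothetical. Rules are ordinary data (a dictionary of known names and a regular expression), so domain knowledge goes in directly, and every match records which rule produced it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    begin: int  # inclusive character offset
    end: int    # exclusive character offset
    text: str
    rule: str   # name of the rule that produced this match (provenance for debugging)


def dictionary_rule(name, entries):
    """Turn a list of known strings into a named regex rule (longest entries tried first)."""
    alternatives = sorted(entries, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(e) for e in alternatives) + r")\b"
    return name, re.compile(pattern)


# Hypothetical rule set: domain knowledge is written down directly as data.
RULES = [
    dictionary_rule("OrgDictionary", ["IBM Research", "IBM"]),
    ("OrgSuffix", re.compile(r"\b[A-Z][a-z]+ (?:Inc|Corp|Ltd)\.")),
]


def extract(text, rules=RULES):
    """Apply every rule and return all matches ordered by position."""
    matches = [
        Match(m.start(), m.end(), m.group(), name)
        for name, pattern in rules
        for m in pattern.finditer(text)
    ]
    return sorted(matches, key=lambda m: (m.begin, m.end))
```

Because each match carries the name of the rule that fired, a wrong result can be traced back to a single dictionary entry or pattern. This is the kind of transparency the "easy to debug" and "opaque" bullets contrast.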

WHERE IS THE RESEARCH IN RULE-BASED IE?

Making it more principled, effective, and efficient:

Define a standard IE rule language and data model.

• What is the right data model to capture text, annotations over text, and their properties?
• Can we establish a standard, declarative, extensible rule language that solves most IE tasks encountered so far?
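One possible answer to the data-model question can be sketched as follows. This is a hypothetical illustration, not the data model the poster proposes. Annotations are typed spans (character offsets into an unchanged document text) that carry a dictionary of properties. Rules then combine annotations through span predicates such as containment. The names `Span`, `Annotation`, and `contained_in` are invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Span:
    doc: str    # the full document text the span points into
    begin: int  # inclusive character offset
    end: int    # exclusive character offset

    @property
    def text(self) -> str:
        return self.doc[self.begin:self.end]

    def contains(self, other: "Span") -> bool:
        # Containment is only meaningful for spans over the same document.
        return self.doc == other.doc and self.begin <= other.begin and other.end <= self.end


@dataclass
class Annotation:
    type: str  # e.g. "Person", "Organization"
    span: Span
    properties: dict[str, Any] = field(default_factory=dict)


def contained_in(inner: list[Annotation], outer: list[Annotation]) -> list[tuple[Annotation, Annotation]]:
    """Pair every inner annotation with each outer annotation whose span contains it."""
    return [(i, o) for i in inner for o in outer if o.span.contains(i.span)]
```

Storing offsets instead of copied substrings lets many annotation types be layered over the same text and compared with cheap integer tests. That is one reason span-based models are a common choice, though not the only possible one.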

Systems research based on a standard IE rule language
• Data representation
• Automatic performance optimization
• Exploring modern hardware …

ML research based on a standard IE rule language
• How can we learn basic primitives such as regular expressions and dictionaries?
• How can we automatically generate rules that are understandable and maintainable?
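As a baseline for learning one basic primitive, a dictionary can be built from labeled data. The sketch below is hypothetical and is not a method from the poster. It collects candidate entries from gold mentions and keeps only those whose precision over all of their occurrences in the training text meets a threshold, so the learned dictionary stays small and readable. The function name and the `min_precision` parameter are assumptions.

```python
import re


def learn_dictionary(texts, labeled_mentions, min_precision=0.8):
    """Hypothetical sketch: learn a dictionary of entity strings from labeled text.

    texts: list of document strings.
    labeled_mentions: list of sets; labeled_mentions[i] holds the (begin, end)
        offsets of gold entity mentions in texts[i].
    Returns the candidate strings whose precision, measured over every
    occurrence in the training texts, is at least min_precision.
    """
    candidates = {texts[i][b:e] for i, spans in enumerate(labeled_mentions) for b, e in spans}
    learned = []
    for entry in sorted(candidates):
        hits = total = 0
        # Word boundaries assume entries start and end with word characters.
        pattern = re.compile(r"\b" + re.escape(entry) + r"\b")
        for text, gold in zip(texts, labeled_mentions):
            for m in pattern.finditer(text):
                total += 1
                hits += (m.start(), m.end()) in gold
        if total and hits / total >= min_precision:
            learned.append(entry)
    return learned
```

For example, an entry such as "Apple" that is labeled in some occurrences but not in others (e.g. "Apple pie") is dropped at the default threshold. The output is a plain list a rule writer can inspect and edit, in line with the poster's goal of rules that are understandable and maintainable.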

BRIDGING THE GAP

Evaluating Benefits of IE
• Academia: evaluating IE on its own, using precision and recall
• Industry: evaluating IE as part of a larger process, using ill-defined metrics that are subject to change

Evaluating Costs of IE
• Academia: labor cost of writing rules
• Industry: labor cost, hardware cost, business risk, others
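Academic evaluation scores an extractor on its own with precision and recall. Below is a minimal sketch of that computation, assuming exact span-and-type matching. Many evaluations instead give partial credit for overlapping spans. The function name and the tuple layout are illustrative assumptions.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match scores over collections of (begin, end, entity_type) tuples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Convention assumed here: an empty prediction or gold set scores 0.0.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

These numbers measure the extractor in isolation. They say nothing about how the extractor performs inside a larger process, or about the labor, hardware, and business-risk costs that industry also weighs.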
