Rule-based Information Extraction is Dead! Long Live Rule-based Information Extraction Systems!
Posted on 04-Jul-2015
Laura Chiticariu, Yunyao Li, Frederick Reiss (IBM Research - Almaden)
THE DISCONNECT: ACADEMIC vs. INDUSTRY
[Figure: Implementations of Entity Extraction. NLP Papers (2003-2012), "Entity Extraction Papers by Year" (fraction of NLP papers vs. year of publication): 3.5% rule-based, 21% hybrid, 75% machine-learning-based. Commercial Products (2013): all vendors 45% rule-based, 22% hybrid, 33% machine-learning-based; large vendors 67% rule-based, 17% hybrid, 17% machine-learning-based.]
THE EXPLANATIONS
Rule-based IE
PROs
• Declarative
• Easy to comprehend
• Easy to maintain
• Easy to incorporate domain knowledge
• Easy to debug
CONs
• Heuristic
• Requires tedious manual labor

ML-based IE
PROs
• Trainable
• Adaptable
• Reduces manual effort
CONs
• Requires labeled data
• Requires retraining for domain adaptation
• Requires ML expertise to use or maintain
• Opaque
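To make the "declarative" and "easy to debug" points concrete, here is a minimal sketch of a rule-based entity extractor in plain Python. It is not the authors' system or rule language; the rule names, patterns, and dictionary entries are hypothetical. Rules are plain data (name, label, pattern), and every match records which rule produced it, so a wrong extraction can be traced straight back to the rule responsible.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    start: int
    end: int
    text: str
    label: str
    rule: str  # name of the rule that produced this match (for debugging)


# Hypothetical regex rules, written as plain data: (rule name, output label, regex).
REGEX_RULES = [
    ("org_suffix", "Organization", r"\b[A-Z][a-z]+ (?:Inc|Corp|Ltd)\.?"),
    ("phone_us", "Phone", r"\(\d{3}\) \d{3}-\d{4}"),
]

# Hypothetical dictionary rule: a list of known organization names.
ORG_DICTIONARY = ["IBM Research"]


def extract(text: str) -> list[Match]:
    """Apply every rule to the text and return all matches in document order."""
    matches = []
    for name, label, pattern in REGEX_RULES:
        for m in re.finditer(pattern, text):
            matches.append(Match(m.start(), m.end(), m.group(), label, name))
    for entry in ORG_DICTIONARY:
        # re.escape makes the dictionary entry match as a literal string.
        for m in re.finditer(re.escape(entry), text):
            matches.append(
                Match(m.start(), m.end(), m.group(), "Organization", "org_dictionary")
            )
    return sorted(matches, key=lambda match: (match.start, match.end))


if __name__ == "__main__":
    for match in extract("Call Acme Inc. at (408) 555-0100 or visit IBM Research."):
        print(match.label, repr(match.text), "via", match.rule)
```

Adding domain knowledge here means editing a pattern or a dictionary entry, which is the kind of maintenance the PROs list refers to; the CONs show up as well, since every pattern had to be written by hand.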
Where is the research in rule-based IE? Making it more principled, effective, and efficient:
• Define a standard IE rule language and data model.
  • What is the right data model to capture text, annotations over text, and their properties?
  • Can we establish a standard, declarative, extensible rule language to solve most IE tasks encountered so far?
• Systems research based on the standard IE rule language:
  • Data representation
  • Automatic performance optimization
  • Exploring modern hardware …
• ML research based on the standard IE rule language:
  • How to learn basic primitives such as regular expressions and dictionaries?
  • How to automatically generate rules that are understandable and maintainable?
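One possible answer to the data-model question (text, annotations over text, and their properties) is to treat every annotation as a labeled character span and to write rules as operations that combine sets of spans into new spans. The sketch below is an illustration under that assumption only. `Span` and the `follows` rule are hypothetical names and are not taken from any specific rule language. The example text and offsets are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled annotation over a document: character offsets [begin, end)."""
    begin: int
    end: int
    label: str

    def covered_text(self, doc: str) -> str:
        return doc[self.begin:self.end]


def follows(left: list[Span], right: list[Span], max_gap: int, label: str) -> list[Span]:
    """Hypothetical join rule: for each pair where a right span starts at most
    max_gap characters after a left span ends, emit one span covering both."""
    return [
        Span(left_span.begin, right_span.end, label)
        for left_span in left
        for right_span in right
        if 0 <= right_span.begin - left_span.end <= max_gap
    ]


if __name__ == "__main__":
    doc = "Contact Jane Doe at (408) 555-0100."
    persons = [Span(8, 16, "Person")]  # "Jane Doe"
    phones = [Span(20, 34, "Phone")]   # "(408) 555-0100"
    for span in follows(persons, phones, max_gap=5, label="PersonPhone"):
        print(span.label, span.covered_text(doc))
```

Because rules only consume and produce spans, they compose. The `PersonPhone` output could feed another rule unchanged, and a system could in principle reorder or optimize such operations automatically, which is the direction the systems-research bullet points toward.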
BRIDGING THE GAP
Academia vs. Industry

Evaluating benefits of IE
• Academia: evaluating IE on its own; precision and recall
• Industry: evaluating IE as part of a larger process; using ill-defined metrics that are subject to change

Evaluating costs of IE
• Labor cost
• Hardware cost
• Labor cost of writing rules
• Business risk
• Others

What’s the research in rule-based IE?
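The academic side of the comparison evaluates IE on its own with precision and recall. Below is a minimal sketch of exact-match precision, recall, and F1, assuming annotations are represented as hypothetical `(begin, end, label)` tuples and that a prediction counts only when all three fields match a gold annotation. The gold and predicted data are made up for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match evaluation: a prediction is correct only if its
    (begin, end, label) tuple is identical to some gold annotation."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(5, 14, "Organization"), (18, 32, "Phone"), (42, 54, "Organization")]
    predicted = [(5, 14, "Organization"), (18, 32, "Phone")]
    print(precision_recall_f1(predicted, gold))
```

This scores the extractor in isolation. It does not capture the industry column's end-to-end, application-level metrics, nor any of the costs listed above.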