TARTAR Information Extraction

Transforming Arbitrary Tables into F-Logic Frames with TARTAR

Aleksander Pivk, York Sure, Philipp Cimiano, Matjaz Gams, Vladislav Rajkovic, Rudi Studer

Presented By Stephen Lynn

Information Extraction

Free-form Text
  Linguistic/NLP approaches

Tabular Structures
  Table comprehension task
  HTML, Excel, PDF, text, etc.
  Semantic interpretation task
  More effort???

TARTAR Architecture

Semantic Representation

Frame Logic (F-Logic)
  Model-theoretic semantics
  Complete resolution-based proof theory
  Expressive power of logic
  Availability of efficient reasoning tools

F-Logic Frame

Table Comprehension

Dimensions – a grouping of cells representing similar entities
Stub – dimension with headers used to index elements in body
Box head – column headers (often nested)
Body – data values
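
The terms above can be illustrated with a minimal Python sketch. It assumes the table is already a rectangular grid of strings and that the number of header rows and stub columns is known in advance. The function name `split_table` and its parameters are hypothetical and do not come from TARTAR itself.

```python
from typing import List, Tuple

Grid = List[List[str]]

def split_table(grid: Grid, head_rows: int = 1, stub_cols: int = 1) -> Tuple[Grid, Grid, Grid]:
    """Split a rectangular grid into box head, stub, and body.

    Assumption: the first `head_rows` rows hold column headers and the
    first `stub_cols` columns hold the row headers (the stub).
    """
    box_head = [row[stub_cols:] for row in grid[:head_rows]]
    stub = [row[:stub_cols] for row in grid[head_rows:]]
    body = [row[stub_cols:] for row in grid[head_rows:]]
    return box_head, stub, body

# Illustrative data only
grid = [
    ["Tour", "Price", "Days"],
    ["Alps", "100", "3"],
    ["Coast", "180", "5"],
]
box_head, stub, body = split_table(grid)
```

Here the box head is the row of column headers, the stub indexes the rows, and the body holds the data values.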

Table Classes

1D, 2D, Complex

Methodology

Cleaning & Canonicalization

Clean DOM tree
  CyberNeko HTML Parser
Rowspan/Colspan expansion
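
Rowspan/colspan expansion can be sketched in Python as follows. This is an illustration, not the TARTAR implementation: it assumes cells have already been read from the DOM as `(text, rowspan, colspan)` tuples, and that the content of a spanning cell is copied into every grid position it covers. The name `expand_spans` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, int, int]  # (text, rowspan, colspan)

def expand_spans(rows: List[List[Cell]]) -> List[List[Optional[str]]]:
    """Expand spanning cells into a rectangular grid.

    Assumption: a spanning cell's text is duplicated into every
    position it covers; positions no cell covers become None.
    """
    occupied: Dict[Tuple[int, int], str] = {}
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already claimed by a rowspan from an earlier row
            while (r, c) in occupied:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied[(r + dr, c + dc)] = text
            c += colspan
    if not occupied:
        return []
    n_rows = max(r for r, _ in occupied) + 1
    n_cols = max(c for _, c in occupied) + 1
    return [[occupied.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]

# Illustrative table: "Tour" spans two rows, "Price" spans two columns
rows = [
    [("Tour", 2, 1), ("Price", 1, 2)],
    [("Single", 1, 1), ("Double", 1, 1)],
    [("Alps", 1, 1), ("100", 1, 1), ("180", 1, 1)],
]
expanded = expand_spans(rows)
```

After expansion every row has the same number of cells, which makes the later structure-detection steps simpler to express.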

Structure Detection

Token Type Hierarchy
Assign Functional Types and Probabilities
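
The slide does not spell out the hierarchy or the scoring, so the following Python sketch is an illustration under assumptions. The token type names, the `"header"`/`"data"` labels, and the share-of-numeric-tokens heuristic are all hypothetical stand-ins, chosen only to show the idea of typing tokens and then attaching a functional type with a probability to each cell.

```python
import re
from typing import Tuple

def token_type(token: str) -> str:
    """Assign a token to a leaf of a small, hypothetical type hierarchy."""
    if re.fullmatch(r"[+-]?\d+(?:[.,]\d+)*", token):
        return "NUMERIC"
    if token.isalpha():
        if token.isupper():
            return "ALPHA_UPPER"
        if token[0].isupper():
            return "ALPHA_CAPITALIZED"
        return "ALPHA_LOWER"
    if all(not ch.isalnum() for ch in token):
        return "PUNCTUATION"
    return "MIXED"

def functional_type(cell: str) -> Tuple[str, float]:
    """Guess whether a cell is a header or a data value.

    Assumed heuristic: the share of numeric tokens is used as the
    probability that the cell holds data.
    """
    tokens = cell.split()
    if not tokens:
        return ("header", 0.5)  # no evidence either way
    numeric = sum(token_type(t) == "NUMERIC" for t in tokens)
    p_data = numeric / len(tokens)
    if p_data >= 0.5:
        return ("data", p_data)
    return ("header", 1.0 - p_data)
```

For example, `functional_type("Price")` yields a header label and `functional_type("100")` a data label under this heuristic.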

Structure Detection

Detect Logical Table Orientation
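
One way to picture orientation detection is to ask whether neighbouring cells look more alike down the columns or across the rows. The Python sketch below does this with a deliberately crude cell signature (numeric versus text). The scoring, the function names, and the `"vertical"`/`"horizontal"` labels are assumptions for illustration, not the measure used by TARTAR.

```python
from typing import List

def cell_kind(cell: str) -> str:
    """Crude cell signature: 'num' for plain numbers, 'text' otherwise."""
    return "num" if cell.replace(".", "", 1).isdigit() else "text"

def orientation(grid: List[List[str]]) -> str:
    """Return 'vertical' if cells agree more down columns than across rows."""
    n_rows, n_cols = len(grid), len(grid[0])
    col_agree = sum(
        cell_kind(grid[r][c]) == cell_kind(grid[r + 1][c])
        for r in range(n_rows - 1) for c in range(n_cols)
    )
    row_agree = sum(
        cell_kind(grid[r][c]) == cell_kind(grid[r][c + 1])
        for r in range(n_rows) for c in range(n_cols - 1)
    )
    col_pairs = (n_rows - 1) * n_cols
    row_pairs = n_rows * (n_cols - 1)
    col_score = col_agree / col_pairs if col_pairs else 0.0
    row_score = row_agree / row_pairs if row_pairs else 0.0
    return "vertical" if col_score >= row_score else "horizontal"

# Illustrative data: each column holds one kind of value
grid = [
    ["Tour", "Price", "Guide"],
    ["Alps", "100", "Anna"],
    ["Coast", "180", "Ben"],
    ["Lakes", "95", "Carl"],
]
transposed = [list(col) for col in zip(*grid)]
```

With this data the columns are homogeneous, so the table reads top to bottom. Transposing the grid flips the result.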

Structure Detection

Discover and Level Regions
  Logical Units

FTM Building

Functional Table Model (FTM)
  Arrange regions into a tree
  Leaf nodes are data
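
The tree idea can be sketched in Python for a simple 2D table. The sketch assumes one level of stub labels and one level of box head labels. The names `Node`, `build_tree`, and `leaf_labels` are hypothetical, and the real FTM may arrange regions differently.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

def build_tree(box_head: List[str], stub: List[str], body: List[List[str]]) -> Node:
    """Arrange header labels as inner nodes and data values as leaves."""
    root = Node("table")
    for row_label, row in zip(stub, body):
        row_node = Node(row_label)
        for col_label, value in zip(box_head, row):
            # Each column header gets exactly one data leaf per row
            row_node.children.append(Node(col_label, [Node(value)]))
        root.children.append(row_node)
    return root

def leaf_labels(node: Node) -> List[str]:
    """Collect the labels of all leaf nodes, left to right."""
    if not node.children:
        return [node.label]
    out: List[str] = []
    for child in node.children:
        out.extend(leaf_labels(child))
    return out

# Illustrative data only
tree = build_tree(["Price", "Days"], ["Alps", "Coast"], [["100", "3"], ["180", "5"]])
```

Walking from the root to a leaf reads off the headers that index a data value, for example table, Alps, Price, 100.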

Semantic Enriching of FTM

Labeling
  WordNet and GoogleSets
Map FTM to a frame
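
The final mapping step can be pictured as serializing a labeled concept and its attributes into F-Logic signature syntax, where `=>` declares the range of an attribute. The Python sketch below is illustrative. The function `to_frame`, the concept name, and the range names are assumptions, not output from TARTAR.

```python
from typing import List, Tuple

def to_frame(concept: str, attributes: List[Tuple[str, str]]) -> str:
    """Render a concept and its (attribute, range) pairs as an F-Logic frame."""
    slots = "; ".join(f"{name} => {rng}" for name, rng in attributes)
    return f"{concept}[{slots}]"

# Illustrative concept and ranges only
frame = to_frame("Tour", [("Price", "NUMBER"), ("Guide", "STRING")])
print(frame)
```

A frame in this form can be loaded into an F-Logic reasoner, which is what allows queries to be run over tables from different sources.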

Evaluation

Crawl, extract, filter web tables
  135 tables
  85.4% success rate
  Mostly problems with complex tables

Compare auto-generated frames with human-generated frames
  14 people transformed 3 tables each
  21 total tables (each done twice)
  Syntactic/Semantic correctness (Strict and Soft)

Results

Inter-annotator agreement

System-annotator agreement

Benefits

Fully automated knowledge formalization
Arbitrary tables
Independent of domain knowledge
Independent of document type
Explicit semantics of generated frames
Query answering over heterogeneous tables
