Towards Efficient and Effective Semantic Table Interpretation
Ziqi Zhang, Department of Computer Science, University of Sheffield
Outline
• Define semantic table interpretation
• State-of-the-art and motivation
• The method – TableMiner
• Evaluation
Z. Zhang / Towards Efficient and Effective Semantic Table Interpretation
Semantic Table Interpretation
• Input
• Ontology
• Relational table
• Goals/Tasks
• Label columns by concepts
• Link cells to named entities
• Connect columns by relations
Figure: ontology fragment (concepts Thing, Work, Film, Artist, Actor/Actress, Location, Country, …; entities Ent:USA, Ent:UK, …; relation Rel:performIn) linked to the table below
Table of Best Actor/Actress
     Name             Film          Country
1    Tom Hanks        Philadelphia  USA
2    Jamie Foxx       Ray           USA
3    Kate Winslet     The Reader    UK
…    …                …             …
99   Charlize Theron  Monster       South Africa
Semantic Table Interpretation
• Goals/Tasks and their names
• Label columns by concepts: column classification / header disambiguation
• Link cells to named entities: cell disambiguation
• Connect columns by relations: relation interpretation
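As a rough illustration of the three tasks, the output for the example table could be held in a structure like the one below. This is only a sketch: the class name `TableAnnotation`, its field names, and the placement of `Rel:performIn` between the Name and Film columns are assumptions made for illustration, not part of any system described here.

```python
from dataclasses import dataclass, field


@dataclass
class TableAnnotation:
    """Hypothetical container for the three kinds of annotation."""
    # column index -> concept label (column classification)
    column_concepts: dict = field(default_factory=dict)
    # (row index, column index) -> entity identifier (cell disambiguation)
    cell_entities: dict = field(default_factory=dict)
    # (subject column, object column) -> relation (relation interpretation)
    column_relations: dict = field(default_factory=dict)


header = ["Name", "Film", "Country"]
rows = [
    ["Tom Hanks", "Philadelphia", "USA"],
    ["Jamie Foxx", "Ray", "USA"],
    ["Kate Winslet", "The Reader", "UK"],
]

annotation = TableAnnotation()
# Label columns by concepts.
annotation.column_concepts = {0: "Actor/Actress", 1: "Film", 2: "Country"}
# Link cells to named entities (only the Country cells are shown).
annotation.cell_entities = {(0, 2): "Ent:USA", (1, 2): "Ent:USA", (2, 2): "Ent:UK"}
# Connect columns by relations (assumed here to hold between Name and Film).
annotation.column_relations = {(0, 1): "Rel:performIn"}
```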
Motivation and State-of-the-art
• 154 million relational tables on the Web, and growing [Cafarella2008]
• Classic Information Extraction methods do not work [Limaye2010, Lu2013]
• They cannot model the complex interdependence among table components
Motivation and State-of-the-art
• State-of-the-art (SoA) semantic table interpretation methods, e.g., [Limaye2010, Venetis2011, Mulwad2013]
Limitation 1: Inference is ‘exhaustive’, which is unnecessary
Table of Best Actor/Actress
     Name             Film          Country
1    Tom Hanks        Philadelphia  USA
2    Jamie Foxx       Ray           USA
3    Kate Winslet     The Reader    UK
…    …                …             …
99   Charlize Theron  Monster       South Africa
Goal: assign a concept to a column
Hint: the content of the column gives useful clues
How much of the column do we need for inference (99 rows in this example)?
- Human: SOME (learn by example)
- SoA: ALL
Limitation 2: Contextual features for inference
Table of Best Actor/Actress
SoA: features only from within the table
Context outside the table also provides hints for interpretation. E.g., the words in the paragraph around the table are often found in descriptions of actors
TableMiner
• Two tasks:
• Column classification
• Cell disambiguation
• Non-exhaustive inference in a bootstrapping pattern
• phase 1 – inference with partial content
• phase 2 – propagation and update
• Contextual features both inside and outside tables
TableMiner – Phase 1 I-Inf
• Incremental inference with stopping (I-Inf)
• Tj – a column; Cj – candidate concepts for the column; Ei,j – candidate entities for a cell
• Each iteration (Itr. 1, Itr. 2, Itr. 3, … until stop):
• Ei,j = {<e1,s1>, <e2,s2>, …}
• concepts = {<c1,s1>, <c2,s2>, …}
• Cj = {<c1,s1’>, <c2,s2’>, …}
• |H(Cj) – H(prevCj)| < t? Yes – stop; No – next iteration
• Itr. 1: concepts = {<c1,s1>, <c2,s2>, …}; Cj = {<c1,s1’>, <c2,s2’>}
• Itr. 2: concepts = {<c1,s1>, <c3,s3>, …}; Cj = {<c1,s1’>, <c2,s2’>, <c3,s3’>}
• Itr. 3: concepts = {<c11,s11>}; Cj = {<c1,s1’>, <c2,s2’>, <c3,s3’>, …, <c11,s11’>}
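The I-Inf loop can be sketched roughly as follows. The slides do not define H, so this sketch assumes it is the Shannon entropy of the normalised scores in Cj; it also assumes that scores are accumulated by simple addition. The callback `concepts_for_cell` and the toy scores are hypothetical.

```python
import math


def entropy(scores):
    """Shannon entropy of a {concept: score} dict normalised to sum to 1.

    Treating H as entropy is an assumption of this sketch.
    """
    total = sum(scores.values())
    if total <= 0:
        return 0.0
    return -sum((s / total) * math.log2(s / total)
                for s in scores.values() if s > 0)


def incremental_inference(cells, concepts_for_cell, t=0.01):
    """Process cells one at a time; stop once |H(Cj) - H(prevCj)| < t.

    concepts_for_cell is a hypothetical callback returning the
    {concept: score} contribution of one cell.
    """
    c_j = {}          # candidate concepts for the column, with running scores
    prev_h = None     # entropy of Cj after the previous iteration
    processed = 0
    for cell in cells:
        for concept, score in concepts_for_cell(cell).items():
            c_j[concept] = c_j.get(concept, 0.0) + score  # assumed update rule
        processed += 1
        h = entropy(c_j)
        if prev_h is not None and abs(h - prev_h) < t:
            break  # Cj has stabilised: stop without reading the rest
        prev_h = h
    return c_j, processed


# Toy column in which every cell supports the same (made-up) concept.
toy_concepts = {
    "Tom Hanks": {"Actor": 1.0},
    "Jamie Foxx": {"Actor": 1.0},
    "Kate Winslet": {"Actor": 1.0},
    "Charlize Theron": {"Actor": 1.0},
}
c_j, processed = incremental_inference(list(toy_concepts), toy_concepts.get)
```

In this toy column the entropy does not change between the first and second iteration, so the loop stops after two of the four cells.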
TableMiner – Phase 1 I-Inf
• To compute scores of candidate named entities (e.g., <e1,s1>) and concepts (e.g., <c1,s1’>)
• Candidate NE
• Build a feature vector of a candidate using the ontology
• Build a feature vector of the cell/column header using its context
• Compute vector similarity
• Candidate concept: same principle, but also depends on score of contributing NEs
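One simple way to realise the candidate NE scoring above is a bag-of-words vector per text and cosine similarity between them. The slides do not say which features or similarity measure are used, so both are assumptions of this sketch, and the candidate descriptions and context string are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased token counts; a deliberately simple feature vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_candidates(cell_context, candidate_descriptions):
    """Score each candidate entity against the context of the cell."""
    context_vec = bag_of_words(cell_context)
    return {
        name: cosine(bag_of_words(description), context_vec)
        for name, description in candidate_descriptions.items()
    }


# Hypothetical candidates for the cell "Ray"; descriptions stand in for
# whatever the ontology provides about each entity.
candidates = {
    "Ray (film)": "2004 film starring Jamie Foxx as Ray Charles",
    "Ray (optics)": "line of light in geometric optics",
}
# Context drawn from the row and the table caption.
context = "Jamie Foxx Ray USA best actor film"
scores = score_candidates(context, candidates)
best = max(scores, key=scores.get)
```

Context from outside the table (for example, the surrounding paragraph) could be appended to `context` in the same way.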
TableMiner – Phase 2 Propagate, Update
• When I-Inf stops
• Select the highest scoring candidate concept c+ to label the column
• Propagate: use c+ as a constraint to disambiguate the remaining cells – candidate NEs not belonging to c+ are discarded
• Update:
• Re-compute c+ after all cells are disambiguated
• If the new c+ is different, revise disambiguation across the entire column with it as new constraint
• Repeat until no change
Cj = {<c1,s1’>, <c2,s2’>, <c3,s3’>, …, <c11,s11’>} → rank and select c+ → use c+ as constraint to disambiguate cells
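The propagate and update steps can be sketched as the loop below. Several details are assumptions not stated on the slides: concept scores are recomputed by summing the scores of the chosen entities, a cell with no candidate belonging to c+ falls back to its best unconstrained candidate, and `max_rounds` guards against oscillation. All entity names and scores are made up.

```python
def pick_best(concept_scores):
    """Return the highest scoring concept (c+)."""
    return max(concept_scores, key=concept_scores.get)


def propagate_and_update(cell_candidates, initial_concept_scores, max_rounds=10):
    """cell_candidates: {cell: [(entity, score, set_of_concepts), ...]}."""
    c_plus = pick_best(initial_concept_scores)
    chosen = {}
    for _ in range(max_rounds):
        # Propagate: discard candidates that do not belong to c+.
        chosen = {}
        for cell, candidates in cell_candidates.items():
            # Assumed fallback: keep all candidates if none belongs to c+.
            pool = [c for c in candidates if c_plus in c[2]] or candidates
            if pool:
                chosen[cell] = max(pool, key=lambda c: c[1])
        # Update: re-compute c+ from all disambiguated cells.
        new_scores = {}
        for _entity, score, concepts in chosen.values():
            for concept in concepts:
                new_scores[concept] = new_scores.get(concept, 0.0) + score
        new_c_plus = pick_best(new_scores) if new_scores else c_plus
        if new_c_plus == c_plus:
            break  # no change: the annotations are stable
        c_plus = new_c_plus  # revise the whole column under the new constraint
    return c_plus, {cell: cand[0] for cell, cand in chosen.items()}


# Toy column: phase 1 saw only cell "A" and preferred "Politician".
cells = {
    "A": [("a_pol", 0.9, {"Politician"}), ("a_act", 0.8, {"Actor"})],
    "B": [("b_act", 0.7, {"Actor"})],
    "C": [("c_act", 0.6, {"Actor"})],
}
label, entities = propagate_and_update(cells, {"Politician": 0.9, "Actor": 0.8})
```

Here the first round keeps "Politician" for cell A, the update step finds that "Actor" now scores higher across the column, and the second round revises cell A accordingly.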
Evaluation
TableMiner – Evaluation
• Data
• Freebase as reference ontology/background knowledge
• Limaye112 – 112 Web tables from Limaye2010 originally annotated with Wikipedia
• Cells are automatically mapped to Freebase – some are unmapped
• Columns are manually annotated
• IMDB – 7,354 “cast” tables of films mapped to Freebase
TableMiner – Evaluation
• Baselines (both use exhaustive inference)
• Bfirst - cell disambiguation: choose the top ranked NE candidate in the Freebase search result
- column classification: each disambiguated cell casts a vote to the set of concepts the NEs belong to, and the majority wins
• Bsim - cell disambiguation: string similarity + feature vector similarity (in-table context only)
- column classification: the majority vote method as above + string similarity
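The majority-vote step shared by both baselines can be sketched as below. Only the voting is shown; the string-similarity component of Bsim is omitted, tie-breaking is left to `Counter.most_common`, and the concept names are made up.

```python
from collections import Counter


def majority_vote(cell_entity_concepts):
    """Each disambiguated cell votes for every concept its entity belongs to.

    cell_entity_concepts: one set of concepts per disambiguated cell.
    Returns the concept with the most votes, or None if there are no votes.
    """
    votes = Counter()
    for concepts in cell_entity_concepts:
        votes.update(concepts)  # one vote per concept per cell
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Three disambiguated cells of one column (hypothetical concepts).
column_votes = [{"Actor"}, {"Actor", "Producer"}, {"Politician"}]
winner = majority_vote(column_votes)
```

Because every cell must be disambiguated before the vote is counted, this baseline is exhaustive by construction.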
TableMiner – Evaluation Results
• Cell disambiguation
Manual validation of 932 cell annotations in Limaye112 not covered by the above results (i.e., unmapped cells)
Considering only those cells where at least one system predicts correctly
TableMiner – Evaluation Results
• Column classification
best only – a column is labelled correctly only if the concept is suitable for the data in the column and is specific enough
best or ok – a column is labelled correctly if the concept is suitable for the data in the column, though not very specific (e.g., ‘Film Actors’ may be the best, while ‘Artist’ or ‘Person’ is OK, but ‘Engineer’ is incorrect)
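The two evaluation modes differ only in which judgements count as correct. The sketch below shows that difference on made-up judgements; the function name, the judgement labels and the choice to report a simple share of correct columns are assumptions, not the evaluation code or results behind these slides.

```python
def share_correct(judgements, mode):
    """Share of labelled columns counted as correct under a given mode.

    judgements: list of "best", "ok" or "wrong", one per labelled column.
    mode: "best only" or "best or ok".
    """
    if mode == "best only":
        accepted = {"best"}
    elif mode == "best or ok":
        accepted = {"best", "ok"}
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not judgements:
        return 0.0
    return sum(j in accepted for j in judgements) / len(judgements)


# Hypothetical judgements, e.g. 'Film Actors' = best, 'Person' = ok,
# 'Engineer' = wrong.
toy_judgements = ["best", "ok", "wrong", "best"]
strict = share_correct(toy_judgements, "best only")
lenient = share_correct(toy_judgements, "best or ok")
```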
TableMiner – Evaluation Results
• Efficiency – TableMiner is efficient because
• Column classification: processes only part of each column's content (avg. 57% on Limaye112, 43% on IMDB)
• Cell disambiguation: constrained by column classification, resulting in a smaller NE candidate space (avg. 32% reduction on Limaye112, 24% on IMDB)
• Fewer candidates => less time spent on retrieval and feature space creation (typically >90% of CPU time in the pipeline [Limaye2010])
TableMiner – Conclusion
• TableMiner take-home messages
• How can it be more effective?
• Use both context within and outside tables as features for inference
• How can it be more efficient?
• Perform inference with partial data and follow the bootstrapping pattern of learning
References
• [Cafarella2008] Cafarella, M.J., Halevy, A., Wang, D.Z., Wu, E., Zhang, Y. 2008: WebTables: exploring the power of tables on the web. Proceedings of the VLDB Endowment 1(1), 538–549
• [Limaye2010] Limaye, G., Sarawagi, S., Chakrabarti, S. 2010: Annotating and searching web tables using entities, types and relationships. Proceedings of the VLDB Endowment 3(1-2), 1338–134
• [Lu2013] Lu, C., Bing, L., Lam, W., Chan, K., Gu, Y. 2013: Web entity detection for semi-structured text data records with unlabeled data. International Journal of Computational Linguistics and Applications
• [Mulwad2013] Mulwad, V., Finin, T., Joshi, A. 2013: Semantic message passing for generating linked data from tables. In: International Semantic Web Conference (1). pp. 363–378. Lecture Notes in Computer Science, Springer
• [Venetis2011] Venetis, P., Halevy, A., Madhavan, J., Paşca, M., Shen, W., Wu, F., Miao, G., Wu, C. 2011: Recovering semantics of tables on the web. Proceedings of the VLDB Endowment 4(9), 528–538
Thank you