A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia
DESCRIPTION
Most information extraction approaches available today focus either on extracting simple relations or on scenarios where data extracted from text must be normalized into a database schema or ontology. Some relevant information in natural language texts, however, is irregular, highly contextualized, poorly structured, and intrinsically ambiguous, and involves complex semantic dependency relations. An information extraction approach should support these characteristics as well. To cope with this scenario, this work introduces a semantic best-effort information extraction approach, in which text information is extracted under a pay-as-you-go data quality perspective, trading high accuracy, schema consistency, and terminological normalization for domain independence, context capture, wider extraction scope, and maximal extraction and representation of the text semantics. A semantic information extraction framework (Graphia) is implemented and evaluated over the Wikipedia corpus.
TRANSCRIPT
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia
André Freitas, Danilo Carvalho, J. C. P. da Silva, Sean O’Riain, Edward Curry
Outline
- Motivation
- Representation
  - Requirements
  - Semantic Best-effort Representation
- Extraction
  - Graphia Extractor
  - Preliminary Evaluation
  - Extraction Examples
- Conclusion
Motivation
Linked Data
- Terminological and structural regularity
- Shared semantic agreement between data consumers
Natural language texts
- No terminological or structural regularity
- Highly contextualized
- Complex semantic dependency relations
- Ambiguity
- Information selection/normalization
Minus vocabulary constraints, plus entity-centric, plus pay-as-you-go data semantics = semantic best-effort
Motivation
Vocabulary-independent (schema-free queries)
- How to abstract users from knowing the data representation?
- Semantic matching
Schemaless databases, in the limit, demand vocabulary independence
How is information extraction reshaped in this scenario?
Motivational Scenario
What is the relationship between Barack Obama and Indonesia?
Sentence: From age six to ten, Obama attended local schools in Jakarta, including Besuki Public School and St Francis Assisi School.
Semantic Best-effort Extraction
Entity-centric text representation
Representation
Computational Linguistics Perspective
What is already there to represent natural language (NL)?
- Discourse Representation Theory (DRT)
- Semantic Role Labeling (SRL)
Discourse Representation Theory (DRT)
“The key idea behind (...) Discourse Representation Theory is that each new sentence of a discourse is interpreted in the context provided by the sentences preceding it.” (van Eijck and Kamp)
- Models propositions in discourse (multiple sentences).
- Discourse representation structures (DRS).
Example: John enters a card. Every card is green.
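The two-sentence example can be sketched as a toy discourse representation structure. This is only an illustration, not a DRT implementation: the `DRS` class, its `merge` method, and the spelling of the conditions are assumptions introduced here. It shows the second sentence being interpreted in the context of the first by merging both into one structure.

```python
from dataclasses import dataclass, field


@dataclass
class DRS:
    """A toy discourse representation structure: referents plus conditions."""
    referents: list = field(default_factory=list)
    conditions: list = field(default_factory=list)

    def merge(self, other):
        """Interpret `other` in the context of this DRS by combining both."""
        return DRS(self.referents + other.referents,
                   self.conditions + other.conditions)


# Sentence 1: "John enters a card."
s1 = DRS(referents=["x", "y"],
         conditions=[("John", "x"), ("card", "y"), ("enters", "x", "y")])

# Sentence 2: "Every card is green." is a conditional between two sub-DRSs.
antecedent = DRS(referents=["z"], conditions=[("card", "z")])
consequent = DRS(conditions=[("green", "z")])
s2 = DRS(conditions=[("implies", antecedent, consequent)])

# The discourse is the first sentence extended with the second.
discourse = s1.merge(s2)
print(discourse.referents)        # ['x', 'y']
print(len(discourse.conditions))  # 4
```

The universally quantified referent `z` stays local to the nested structure, so only `x` and `y` are available at the top level of the discourse.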
Semantic Role Labeling (SRL)
- Shallow semantic parsing.
- Detection of the arguments associated with a predicate.
- Association of semantic types with the arguments.
Bill cut his hair with a razor.
[Agent Bill] cut [Patient his hair] [Instrument with a razor].
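As a rough sketch, the labelled sentence can be held as a predicate with typed arguments. The dictionary layout and the `frame_to_edges` helper are hypothetical and only illustrate the idea; they do not correspond to the output format of any particular SRL tool.

```python
# Hypothetical role-labelled frame for "Bill cut his hair with a razor."
frame = {
    "predicate": "cut",
    "roles": {
        "Agent": "Bill",
        "Patient": "his hair",
        "Instrument": "with a razor",
    },
}


def frame_to_edges(frame):
    """Turn a frame into (predicate, role, argument) edges."""
    return [(frame["predicate"], role, arg)
            for role, arg in frame["roles"].items()]


for edge in frame_to_edges(frame):
    print(edge)
```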
Semantic Best-Effort
Objectives:
- Entity-centric & standardized: easier to integrate with other resources
- Remove the formal constraints and the ‘baggage’ from existing approaches
- Representation robust to extraction limitations/errors
Semantic Best-Effort Requirements
- Text segmentation into (s,p,o)s
- Context representation
- Conceptual model independency
- Resolve co-references (pay-as-you-go)
- Represent recurrent discourse structures
- Standardized representation (RDF(S))
- Principled interpretation (compositionality)
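To make the first two requirements concrete, here is a minimal sketch of how the motivational sentence about Obama and Jakarta might be segmented into a core (s,p,o) with context attached through RDF reification. The `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` terms are standard RDF vocabulary; the `example.org` namespace, the property names (`attended`, `in`, `from_age`, `to_age`), and the `reify` helper are assumptions made for illustration and are not Graphia's actual output.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def reify(stmt_id, s, p, o):
    """Describe the triple (s, p, o) as an rdf:Statement resource."""
    return [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
    ]


# Core (s, p, o) segment of the sentence.
core = (EX + "Obama", EX + "attended", EX + "local_schools")

graph = [core]
stmt = EX + "stmt1"
graph += reify(stmt, *core)

# Context is attached to the statement, not to the entities themselves.
graph.append((stmt, EX + "in", EX + "Jakarta"))
graph.append((stmt, EX + "from_age", "six"))
graph.append((stmt, EX + "to_age", "ten"))

for triple in graph:
    print(triple)
```

Hanging the place and age range off the statement keeps the core triple readable on its own while preserving the context in which it holds.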
Examples
- Text segmentation into (s,p,o)s
- Context representation
- Resolve co-references (pay-as-you-go)
- Conceptual model independency
Examples
- Context representation
Examples
- Represent recurrent discourse structures
Examples
- Represent recurrent discourse structures
- Resolve co-references (pay-as-you-go)
Examples
- Represent recurrent discourse structures
SDG Elements
- Named, non-named entities and properties
- Quantifiers & operators
- Triple Trees
- Context elements
- Co-Referential elements
- Resolved & normalized entities
Graph Patterns
[[Interpretation]]
Graph traversal – deref sequence
Extraction
SBE Graph Extraction Tool
Extraction Pipeline Architecture
- Subject
- Predicate
- Object
- Prepositional phrase & Noun complement
- Reification
- Time
Preliminary Evaluation
- 1033 relations (triples) extracted from 150 sentences taken from 5 randomly selected Wikipedia articles
- The extracted graphs were manually classified by error category and accuracy
Other Extraction Examples
Conclusion
- Main direction for improvement is completeness
- Aligned with the pay-as-you-go scenario
- Still need to define clear criteria for what you can’t extract
- There is still a long way to go (e.g. complex subordination)
- Investigation using existing n-ary relation patterns
- Context (reification) should be a first-class citizen in the representation of natural language
- Focus on getting the semantic pivots (rigid designators) right
- Worth putting effort into enumerable patterns (timestamps, operators)
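As an illustration of what an enumerable pattern can look like, the sketch below matches a few English date formats with a regular expression. The patterns and the `find_timestamps` helper are assumptions made for this example, cover only a small subset of real timestamp expressions, and are not taken from the Graphia extractor.

```python
import re

# Illustrative patterns only; a real extractor would need far more coverage.
YEAR = r"(?:1[0-9]{3}|20[0-9]{2})"
MONTH = (r"(?:January|February|March|April|May|June|July|August|"
         r"September|October|November|December)")
DATE_RE = re.compile(
    rf"\b(?:{MONTH}\s+\d{{1,2}},\s+{YEAR}|{MONTH}\s+{YEAR}|{YEAR})\b"
)


def find_timestamps(text):
    """Return the date-like substrings found in `text`, left to right."""
    return DATE_RE.findall(text)


sample = "Born on August 4, 1961, he moved in 1967 and returned in June 1971."
print(find_timestamps(sample))
# ['August 4, 1961', '1967', 'June 1971']
```

The alternatives are ordered from most to least specific so that a full date is not split into a bare year.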