Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases: A Distributional-Compositional Semantics Perspective
André Freitas, Sean O’Riain, Edward Curry
DEOS 2013, Oxford, UK
Big Data
Big Data: More complete data-based picture of the world.
Growing Schema Size
10s-100s attributes
1,000s-1,000,000s attributes
Heterogeneous, complex and large-scale databases.
Very large and dynamic “schemas”.
Growing Semantic Heterogeneity
Multiple perspectives (conceptualizations) of reality.
Ambiguity, vagueness, inconsistency.
Problem
Structured queries are still the primary way to query databases.
Structured query
Figure: query construction time (low to high) versus schema size and heterogeneity (low: 10s-100s of attributes; high: 10^3-10^6 attributes).
Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to?
Schema-agnostic queries
Possible representations
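To make the vocabulary problem concrete, the following minimal sketch stores the same facts under two vocabularies as entity-attribute-value triples. The attribute names (`child`, `spouse`, `hasDaughter`, `marriedTo`) and the helpers `lookup` and `daughters_spouse` are illustrative assumptions, not taken from the talk or from a specific dataset. It shows that a query phrased in the user's own terms returns nothing under exact matching unless the user already knows the vocabulary of the dataset.

```python
# Two hypothetical datasets describing the same facts with different vocabularies.
dataset_a = [
    ("Bill Clinton", "child", "Chelsea Clinton"),
    ("Chelsea Clinton", "spouse", "Marc Mezvinsky"),
]
dataset_b = [
    ("Bill Clinton", "hasDaughter", "Chelsea Clinton"),
    ("Chelsea Clinton", "marriedTo", "Marc Mezvinsky"),
]


def lookup(triples, entity, attribute):
    """Return the values whose attribute name matches the query term exactly."""
    return [v for (e, a, v) in triples if e == entity and a == attribute]


def daughters_spouse(triples, person, daughter_term, spouse_term):
    """Follow two attributes in sequence using exact string matching only."""
    results = []
    for daughter in lookup(triples, person, daughter_term):
        results.extend(lookup(triples, daughter, spouse_term))
    return results


# Terms as a user might phrase them, without knowing either schema.
user_terms = ("daughter", "married to")
answer_a = daughters_spouse(dataset_a, "Bill Clinton", *user_terms)
answer_b = daughters_spouse(dataset_b, "Bill Clinton", *user_terms)

# The same question succeeds only when the dataset's own terms are used.
answer_exact = daughters_spouse(dataset_a, "Bill Clinton", "child", "spouse")
```

Both `answer_a` and `answer_b` come back empty, while `answer_exact` finds the answer. A schema-agnostic query mechanism aims to close this gap without asking the user to learn each vocabulary.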
Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to?
Semantic Gap
Lexical-level
Abstraction-level
Structural-level
Figure: the semantic gap (lexical, abstraction and structural levels) between the query and the data.
Solution: Schema-agnostic queries
Lexical-level
Abstraction-level
Structural-level
Distributional Semantics
Compositional Semantics
Based on the statistical analysis of large unstructured corpora
Query Processing and Planning
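As a rough illustration of what "statistical analysis of large unstructured corpora" can mean, the sketch below builds word co-occurrence vectors from a toy corpus and compares them with cosine similarity. The corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are assumptions made for this example. The talk does not specify the distributional model it uses, and a real system would rely on a far larger corpus and a weighting scheme.

```python
import math
from collections import Counter, defaultdict

# Toy corpus standing in for a large unstructured text collection.
corpus = [
    "his daughter is the child of the president",
    "the child and the daughter visited the family",
    "she married her husband and became his spouse",
    "the spouse she married was her husband",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(corpus)
daughter_child = cosine(vectors["daughter"], vectors["child"])
daughter_spouse = cosine(vectors["daughter"], vectors["spouse"])
```

On this toy corpus `daughter` ends up closer to `child` than to `spouse`, because the first two words share more contexts. That is the kind of signal a distributional relatedness measure exploits to bridge lexical and abstraction-level differences.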
Statistical analysis
Datasets
Core Elements of the Proposed Approach
Hybrid model database/IR/QA.
Ranked query results.
Existing IR approaches: traditional Vector Space Models (VSMs) were not able to:
(i) capture the structure of data.
(ii) support a precise and comprehensive semantic matching.
A VSM supporting these two requirements was formulated: Ƭ-Space.
Ranking function based on a distributional semantic relatedness measure.
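The sketch below shows one way a relatedness-based ranking function could be combined with the structure of the data to return ranked results. It is not the Ƭ-Space formulation itself. The relatedness scores in `RELATEDNESS`, the threshold, the choice to multiply scores along a path, and the function names `rank_attributes` and `answer` are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical relatedness scores between query terms and dataset attributes.
RELATEDNESS = {
    ("daughter", "child"): 0.8,
    ("daughter", "spouse"): 0.3,
    ("daughter", "birthPlace"): 0.1,
    ("married", "spouse"): 0.9,
    ("married", "child"): 0.3,
    ("married", "birthPlace"): 0.05,
}

# Small entity-attribute-value dataset used only for illustration.
TRIPLES = [
    ("Bill Clinton", "child", "Chelsea Clinton"),
    ("Bill Clinton", "spouse", "Hillary Clinton"),
    ("Bill Clinton", "birthPlace", "Hope, Arkansas"),
    ("Chelsea Clinton", "spouse", "Marc Mezvinsky"),
    ("Chelsea Clinton", "birthPlace", "Little Rock, Arkansas"),
]


def rank_attributes(entity, query_term, threshold=0.5):
    """Rank the attributes of `entity` by relatedness to `query_term`,
    keeping only those at or above `threshold`, best first."""
    attributes = {a for (e, a, _) in TRIPLES if e == entity}
    scored = [(RELATEDNESS.get((query_term, a), 0.0), a) for a in attributes]
    return sorted([(s, a) for (s, a) in scored if s >= threshold], reverse=True)


def answer(pivot, query_terms):
    """Start at the pivot entity and, for each query term, follow the
    attributes that pass the threshold, multiplying scores along the path."""
    frontier = [(1.0, pivot)]
    for term in query_terms:
        next_frontier = []
        for path_score, entity in frontier:
            for score, attribute in rank_attributes(entity, term):
                for (e, a, v) in TRIPLES:
                    if e == entity and a == attribute:
                        next_frontier.append((path_score * score, v))
        frontier = sorted(next_frontier, reverse=True)
    return frontier


results = answer("Bill Clinton", ["daughter", "married"])
```

Here the structure of the data constrains the search, since only attributes attached to the current entity are candidates, while the relatedness scores decide which of them match the query terms and in what order the results are returned.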
Does it work?
DBpedia 3.7 + YAGO. 102 natural language queries (QALD 2011).
Entity-Attribute-Value (EAV) Dataset:
45,767 predicates
5,556,492 classes
9,434,677 instances
Selected Publications
André Freitas, Edward Curry, João Gabriel Oliveira, João C. Pereira da Silva, Sean O'Riain, Querying the Semantic Web using Semantic Relatedness: A Vocabulary Independent Approach. Data & Knowledge Engineering (DKE) Journal, 2013 (Article).
André Freitas, Fabricio de Faria, Sean O'Riain, Edward Curry, Answering Natural Language Queries over Linked Data Graphs: A Distributional Semantics Approach. In Proceedings of the 36th Annual ACM SIGIR Conference, Dublin, Ireland, 2013 (Demonstration Paper in Proceedings).
André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends. IEEE Internet Computing, Special Issue on Internet-Scale Data, 2012 (Article).
André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, A Distributional Structured Semantic Space for Querying RDF Graph Data. International Journal of Semantic Computing (IJSC), 2012 (Article).
http://treo.deri.ie