![Page 1: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/1.jpg)
Schema Free Querying of Semantic Data
Lushan Han Advisor: Dr. Tim Finin
May 23, 2014
1
![Page 2: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/2.jpg)
Road Map
• Introduction
• Related Work
• SFQ Interface
• Schema Network and Association Models
• Query Interpretation
• Evaluation
• Conclusion
2
![Page 3: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/3.jpg)
Part 1. Introduction
3
![Page 4: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/4.jpg)
Semantic Data
A network of entities, which are annotated with types and interlinked with properties.
Increasing amount of Semantic Data
Examples: RDF semantic data, LOD, DBpedia, Freebase
4
![Page 5: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/5.jpg)
Objectives
• Develop schema-free query interfaces
  • Work with “semantic data” in many forms, e.g., RDF, Freebase, RDBMS
  • Allow casual users to freely query semantic data without learning its schema
  • Queries should be in the user’s conceptual world
• Two existing interfaces: Natural Language Interface (NLI) and Keyword Interface
• Three hard problems
5
![Page 6: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/6.jpg)
P1. No Practical Interface
Natural language interface
• NLP techniques are still not reliable enough to parse the full relational structure out of natural language questions
  (e.g. Who was the author of the Adventures of Tom Sawyer and where was he born?)
Keyword interface
• Ambiguity and limited expressiveness
  (e.g. “president children spouse”)
6
![Page 7: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/7.jpg)
SFQ Interface
• Still in the user’s conceptual world
• Makes the implicit structure of NL questions explicit
Who was the author of the Adventures of Tom Sawyer and where was he born?
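One way to picture an SFQ for this question is as a small graph of typed nodes joined by named relations. The sketch below is only an illustration under stated assumptions: the `Node` and `Relation` classes, their field names, and the type words chosen (`Book`, `Author`, `Place`) are hypothetical and are not the deck's own notation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    type_word: str               # the user's own word for the class (hypothetical field)
    name: Optional[str] = None   # entity name, or None when the node is a variable
    is_answer: bool = False      # True when the user wants this node returned


@dataclass(frozen=True)
class Relation:
    subject: Node
    predicate: str               # the user's own word for the relation
    obj: Node


# "Who was the author of the Adventures of Tom Sawyer and where was he born?"
book = Node("Book", name="The Adventures of Tom Sawyer")
author = Node("Author", is_answer=True)
place = Node("Place", is_answer=True)

sfq: List[Relation] = [
    Relation(book, "author", author),
    Relation(author, "born in", place),
]
```

The point of the structure is that the user states the relations explicitly, so no parser has to recover them from free text.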
7
![Page 8: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/8.jpg)
P2. Semantic Heterogeneity Problem
Many different ways to express (model) the same meaning
Vocabulary and structure mismatches between the user’s query and the machine’s representation
Existing methods: labor-intensive and ad hoc
• Domain-specific syntactic or semantic grammars
• Mapping lexicons (mapping rules)
• Templates
A thesaurus (e.g. WordNet) is insufficient
8
![Page 9: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/9.jpg)
P2. Examples
9
![Page 10: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/10.jpg)
P2. More Examples
10
![Page 11: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/11.jpg)
A purely computational approach
• Lexical semantic similarity measures: capture flexible semantics
• Statistical association measures: carry out disambiguation
• A novel “overall semantic similarity” or fitness metric that combines
  • lexical semantic similarity measures
  • statistical association measures
  • structure features
• Context-sensitive mapping algorithms
11
![Page 12: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/12.jpg)
P3. Heterogeneous or unknown schema
Hard to reach consensus on a schema for the world
Open domain semantic data has heterogeneous or even unknown schema (e.g. Semantic Web data, DBpedia)
Traditional NLI systems are difficult to apply
Some modern systems
• Do not produce formal queries (e.g. SQL or SPARQL)
• Search directly in the entity network for matches
• Are computationally expensive and ad hoc in nature
12
![Page 13: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/13.jpg)
The schema network
• Learn a schema statistically from the entity network by exploiting co-occurrences
  • The schema itself is also represented as a network
• Map the user’s query into the schema network instead of the entity network
  • Much more scalable
  • Produces formal queries
  • Enables joint disambiguation and a context-sensitive mapping algorithm
13
![Page 14: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/14.jpg)
Thesis Statement
We can develop an effective and efficient algorithm to map a casual user's schema-free query into a formal knowledge base query language that overcomes vocabulary and structure mismatch problems by exploiting lexical semantic similarity measures, association degree measures and structural features.
14
![Page 15: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/15.jpg)
Contributions
An intuitive SFQ interface that avoids the problem of extracting relational structure from NL queries
Novel algorithms mapping SFQ queries to KB queries addressing both vocabulary and structure mismatches
A novel approach to handle heterogeneous or unknown schemas by building a schema from an entity network
Define the probability of observing a path in a schema network and develop two novel statistical association models
An improved PMI metric and new semantic text similarity measures and algorithms
15
![Page 16: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/16.jpg)
Part 2. Related Work
16
![Page 17: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/17.jpg)
Natural Language Interface to Database (NLIDB) Systems
• Early systems in the 70s (e.g. LUNAR and LADDER)
  • Domain-specific syntactic or semantic grammars
  • Heavily customized to a particular application
• Later systems in the 80s and 90s (e.g. TEAM, ASK, MASQUE)
  • More general parsers
  • Require human-crafted lexicons, mapping rules and domain knowledge to interpret the parse tree
  • Allow knowledge engineers or end users to enrich lexicons and add new mapping rules through an interactive interface
  • More portable than early systems
17
![Page 18: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/18.jpg)
Recent NLI Systems

| System | Data | NL Parsing | Vocabulary Mismatch / Structure Mismatch | Automatic | Limitations |
|---|---|---|---|---|---|
| PRECISE | DB | Tokenizer to get a collection of tokens | Lexicon / Bipartite matching | Yes | Very restricted domains |
| SCISSOR | KB | Semantic parser | Machine learning | Yes | Very restricted domains; manually annotated training data |
| NaLIX | XML | Dependency parser to get adjacent tokens | Lexicon / Adjacency matching | No | Restricted domains |
| ORAKEL | RDF | Syntactic parser to get a logical lambda-calculus query | Lexicon / Lexicon | No | Restricted domains; simple NL questions |
| FREyA | RDF | Syntactic parser to get a collection of terms | Lexicon / No | No | Restricted domains |
| Aqualog | RDF | Shallow parsing and pattern rules to get relations | Lexicon / No | No | Restricted domains |
| PANTO | RDF | Syntactic parser and a head-driven algorithm to get relations | Lexicon / No | Yes | Very restricted domains; simple NL questions |
| True Knowledge (Evi) | KB | 1,200 templates; a very large repository of query rephrasing | Lexicon, 1,200 templates and a very large repository of query rephrasing | Yes | Extremely laborious |
| PowerAqua | RDF | Shallow parsing and pattern rules to get relations | Lexicon / Partial matching | Yes | Directly matches into the entity network; does not produce formal queries |
| Treo | RDF | Syntactic parser to get an ordered list of terms | Semantic similarity / No | Yes | Directly matches into the entity network; does not produce formal queries; queries must be a single path; produces no exact answers but triple paths that may contain the answers |
18
![Page 19: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/19.jpg)
Part 3. SFQ Interface
19
![Page 20: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/20.jpg)
SFQ Examples
1. Where was the author of the Adventures of Tom Sawyer born?
2. Give me authors in the CIKM conference
3. A more complicated one
20
![Page 21: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/21.jpg)
Default Relations
The relation name can be left out
A stop word list filters out relation names made up of words such as in, of, has, from, belong, part of, locate, etc.
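A minimal sketch of the stop-word check, assuming a relation counts as a default relation when its name is empty or consists only of stop words. The function name `is_default_relation` and the exact matching rule are assumptions for illustration; the word list is taken from the slide.

```python
# Stop words listed on the slide ("part of" is covered by "part" and "of").
STOP_WORDS = {"in", "of", "has", "from", "belong", "part", "locate"}


def is_default_relation(relation_name: str) -> bool:
    """Return True when the name is empty or made up only of stop words."""
    words = relation_name.lower().split()
    return all(word in STOP_WORDS for word in words)


print(is_default_relation(""))         # relation name left out
print(is_default_relation("part of"))  # only stop words
print(is_default_relation("born in"))  # carries a content word
```

Inflected forms such as "belongs" or "located" would need stemming before the lookup, which this sketch leaves out.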
21
![Page 22: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/22.jpg)
Envisioned Web Interface
22
![Page 23: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/23.jpg)
Output (1)
23
![Page 24: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/24.jpg)
Output (2)
24
![Page 25: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/25.jpg)
Part 4. Schema Network and Association Models
25
![Page 26: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/26.jpg)
Instance Data (ABox)
Two datasets
• The relation dataset (all relations between instances)
• The type dataset (all type definitions for instances)
Integrate all RDF data types into five types that are familiar to users
• ˆNumber, ˆDate, ˆYear, ˆText and ˆLiteral
• ˆLiteral is the super type of the other four
We use DBpedia for examples in the following slides
26
![Page 27: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/27.jpg)
Automatically enrich the set of types
Automatically deduce types from relations
• Infer attribute types from data type properties
  e.g. <Beijing>, population, “20693000” => ˆPopulation
• Infer classes from object properties
  e.g. <Zelig>, director, <Woody Allen> => ˜Director
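The two inference rules above can be sketched as a single function over one triple. This is an illustration only: the function name, the `obj_is_literal` flag, and the capitalization rule for turning a property name into a type name are assumptions; the two prefixes are copied from the slide examples.

```python
ATTRIBUTE_PREFIX = "ˆ"  # prefix the slides use for attribute types, e.g. ˆPopulation
CLASS_PREFIX = "˜"      # prefix the slides use for inferred classes, e.g. ˜Director


def infer_type(prop: str, obj: str, obj_is_literal: bool):
    """Return (object, inferred type) suggested by one (subject, prop, object) triple."""
    label = prop[:1].upper() + prop[1:]
    if obj_is_literal:
        # Data type property: the literal value gets an attribute type.
        return obj, ATTRIBUTE_PREFIX + label
    # Object property: the object entity gets a class named after the property.
    return obj, CLASS_PREFIX + label


print(infer_type("population", "20693000", obj_is_literal=True))
print(infer_type("director", "Woody Allen", obj_is_literal=False))
```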
27
![Page 28: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/28.jpg)
Counting Co-occurrence
28
![Page 29: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/29.jpg)
The Schema Network
A statistical meta description of the underlying entity network, which is a network itself.
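As a rough illustration of how such a meta description could be derived, the sketch below counts how often a subject class, a property, and an object class co-occur in the instance data. The function name, the input shapes, and the toy triples are assumptions made here; the deck's actual counting scheme is shown only in a figure that did not survive extraction.

```python
from collections import Counter
from itertools import product


def build_schema_network(relations, types):
    """Count (subject class, property, object class) co-occurrences.

    relations: iterable of (subject, property, object) instance triples
    types:     dict mapping an instance to the set of its classes
    """
    edges = Counter()
    for s, p, o in relations:
        # Every class of the subject co-occurs with every class of the object.
        for cs, co in product(types.get(s, ()), types.get(o, ())):
            edges[(cs, p, co)] += 1
    return edges


relations = [
    ("Zelig", "director", "Woody Allen"),
    ("Manhattan", "director", "Woody Allen"),
    ("Zelig", "starring", "Mia Farrow"),
]
types = {
    "Zelig": {"Film"},
    "Manhattan": {"Film"},
    "Woody Allen": {"Person"},
    "Mia Farrow": {"Person", "Actor"},
}
schema_edges = build_schema_network(relations, types)
```

Each key of `schema_edges` is an edge of the schema network and each value is its co-occurrence count, which is the raw material for the probabilities discussed on the following slides.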
29
![Page 30: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/30.jpg)
The Schema Path
A path on the schema network is called a schema path
A schema path P represents a composite relation
Example 1.
Example 2.
30
![Page 31: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/31.jpg)
The Schema Path Probability
Measure the reasonableness of a path
The probability of “observing” a path on the schema network
(A1) we select the starting node c0 of the path randomly from all the nodes in the schema network
(A2) observe the path in a random walk starting with c0
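Assumptions (A1) and (A2) can be sketched as a product of a start probability and per-step transition probabilities. In the sketch below the uniform start distribution, the normalization of outgoing edge counts, and the restriction to forward edges are all assumptions made for illustration; the deck's own transition probability formula is not recoverable from the transcript.

```python
from collections import defaultdict


def transition_probs(edge_counts):
    """Normalize the outgoing edge counts of every node into probabilities."""
    totals = defaultdict(float)
    for (src, _prop, _dst), n in edge_counts.items():
        totals[src] += n
    return {edge: n / totals[edge[0]] for edge, n in edge_counts.items()}


def path_probability(path, start_prob, trans):
    """path = [c0, p0, c1, p1, ..., cl]; multiply start and step probabilities."""
    prob = start_prob[path[0]]
    for i in range(0, len(path) - 2, 2):
        prob *= trans[(path[i], path[i + 1], path[i + 2])]
    return prob


edge_counts = {
    ("Book", "author", "Person"): 3,
    ("Book", "publisher", "Company"): 1,
    ("Person", "birthPlace", "Place"): 2,
}
nodes = ["Book", "Person", "Company", "Place"]
start_prob = {n: 1 / len(nodes) for n in nodes}  # assumption: uniform start (A1)
trans = transition_probs(edge_counts)

example = ["Book", "author", "Person", "birthPlace", "Place"]
print(path_probability(example, start_prob, trans))  # 0.25 * 0.75 * 1.0 = 0.1875
```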
31
![Page 32: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/32.jpg)
Compute Transition Probability
0 ≤ ≤ 1
32
![Page 33: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/33.jpg)
A Property about Schema Path
A schema path P and its return path P’ represent the same relation.
Given a schema path P and its return path P’, we have P(P) = P(P’).
[Figure: a schema path P and its return path P’]
33
![Page 34: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/34.jpg)
Schema Path Model
• Meant to store and index all the schema paths whose length is no larger than a given threshold, together with their probabilities
• The only supported function is to return all the schema paths, and their probabilities, between two given classes
• Kept in memory for fast computation
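A minimal sketch of such a model: enumerate every path up to a length threshold once, index the paths by their end classes, and answer lookups from memory. The class name `SchemaPathModel`, the method `paths_between`, the toy probabilities, and the forward-only traversal are assumptions for illustration, not the deck's implementation.

```python
from collections import defaultdict


class SchemaPathModel:
    """In-memory index of schema paths up to a maximum length."""

    def __init__(self, trans, start_prob, max_len=2):
        # trans: {(src_class, property, dst_class): transition probability}
        self._index = defaultdict(list)
        out = defaultdict(list)
        for (src, prop, dst), p in trans.items():
            out[src].append((prop, dst, p))

        def extend(path, prob, length):
            if length > 0:
                self._index[(path[0], path[-1])].append((tuple(path), prob))
            if length == max_len:
                return
            for prop, dst, p in out[path[-1]]:
                extend(path + [prop, dst], prob * p, length + 1)

        for node, p0 in start_prob.items():
            extend([node], p0, 0)

    def paths_between(self, c_from, c_to):
        """Return every stored (path, probability) pair between two classes."""
        return list(self._index.get((c_from, c_to), []))


toy_trans = {
    ("Book", "author", "Person"): 0.75,
    ("Book", "publisher", "Company"): 0.25,
    ("Person", "birthPlace", "Place"): 1.0,
}
toy_start = {c: 0.25 for c in ("Book", "Person", "Company", "Place")}
model = SchemaPathModel(toy_trans, toy_start, max_len=2)
print(model.paths_between("Book", "Place"))
```

Precomputing the index trades memory for lookup speed, which matches the slide's choice to keep the model in memory.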
34
![Page 35: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/35.jpg)
Schema Path Model Optimization
35
![Page 36: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/36.jpg)
Concept Path
Group all the edges with the same direction between two nodes into a single edge
By analogy to schema path, we have concept path probability
Concept path frequency
36
![Page 37: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/37.jpg)
Concept Association Knowledge (CAK) model
Pairwise associations
• (i) direct association between classes and properties
• (ii) indirect association between two classes
PMI measure
Our improved PMI measure
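For reference, standard pointwise mutual information can be computed from raw co-occurrence counts as sketched below. Only the textbook PMI is shown; the formula of the improved measure appears on the slide as an image and is not reproduced or guessed here. The function name and the count-based signature are assumptions.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Standard PMI: log2( p(x, y) / (p(x) * p(y)) ), estimated from counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# x and y always occur together: strongly positive association.
print(pmi(count_xy=10, count_x=10, count_y=10, total=100))
# x and y co-occur exactly as often as chance predicts: PMI near zero.
print(pmi(count_xy=10, count_x=20, count_y=50, total=100))
```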
37
![Page 38: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/38.jpg)
Concept Association Knowledge (CAK) model
Direct association between a directed class and a property p
Indirect association between two directed classes
38
![Page 39: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/39.jpg)
CAK Examples
39
![Page 40: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/40.jpg)
PMI* vs PMI
The most associated property for “Person” in DBpedia
PMI* PMI
40
![Page 41: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/41.jpg)
Part 5. Query Interpretation
41
![Page 42: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/42.jpg)
SFQ Interpretation
42
![Page 43: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/43.jpg)
Two Phase Mapping Algorithm
43
![Page 44: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/44.jpg)
Generating Candidates via Lexical Semantic Similarity
![Page 45: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/45.jpg)
Disambiguation via Optimization
![Page 46: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/46.jpg)
Concept Mapping Optimization Problem
46
![Page 47: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/47.jpg)
A joint disambiguation example
![Page 48: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/48.jpg)
Time Complexity of Concept Mapping Algorithm
A straightforward concept mapping algorithm
After exploiting locality – the optimal mapping choice of a property can be determined locally when the two classes it links are fixed
48
![Page 49: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/49.jpg)
Relation Mapping Optimization Problem
H*: the set of top k3 concept mapping hypotheses
• The reduced mapping space for the SFQ
The optimization problem
49
![Page 50: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/50.jpg)
Computing the fitness of a mapping σ on a relation r
Let
Two features and one parameter β Joint lexical semantic similarity between and P The schema path frequency of P The parameter β adjusts the relative importance of the two
features
50
![Page 51: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/51.jpg)
Align terms in P to terms in r
The relation
The path C = <c0, c1, …, cl-1, cl>
P = <p0, p1, …, pl-1>
We already know and are paired with c0 and cl
We ignore all the intermediate classes c1, …, cl-1
• The semantics in c1, …, cl-1 overlaps with that in p0, p1, …, pl-1
• The fewer terms we join, the less likely errors are to occur
The only unaligned terms are and p0, p1, …, pl-1
51
![Page 52: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/52.jpg)
Semantic Stretch and Heterogeneous Alignments
52
![Page 53: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/53.jpg)
Cutting Function and Cutting Objective Function
Each cutting defines a function, referred to as ω
The product of similarity of the pairs in the minimum pair set that covers and every
53
![Page 54: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/54.jpg)
Cutting Optimization Problem and a Greedy Algorithm
Cutting Optimization
The cutting space Ω has a size of Total running time
Greedy Algorithm: SmartCutter that run in First find the property in P that is the most similar to and
assume it is in the predicate region. Stretch to the left until we meet a property u that is more similar
to the and stretch to the right until meeting a property v that is more similar to the
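The greedy idea can be sketched as follows. Several symbols are missing from the transcript, so this sketch assumes the three comparison terms are the subject, predicate, and object of the user's relation, and that "more similar" means more similar to the subject (or object) than to the predicate. The function name `smart_cut`, the toy similarity table, and the return convention are hypothetical.

```python
def smart_cut(props, subject, predicate, obj, sim):
    """Greedily split path properties into subject, predicate and object regions.

    Returns (left, right): props[:left] is the subject region,
    props[left:right] the predicate region, props[right:] the object region.
    """
    if not props:
        raise ValueError("the path must contain at least one property")
    n = len(props)
    # Seed: the property most similar to the predicate term.
    seed = max(range(n), key=lambda i: sim(props[i], predicate))
    left = seed
    # Stretch left until a property is more similar to the subject term.
    while left > 0 and sim(props[left - 1], predicate) >= sim(props[left - 1], subject):
        left -= 1
    right = seed + 1
    # Stretch right until a property is more similar to the object term.
    while right < n and sim(props[right], predicate) >= sim(props[right], obj):
        right += 1
    return left, right


# Toy similarity table, invented for illustration only.
TOY_SIM = {
    ("writer", "author"): 0.9,
    ("writer", "born in"): 0.1,
    ("birthPlace", "born in"): 0.8,
    ("country", "nation"): 0.9,
    ("country", "born in"): 0.2,
}


def toy_sim(a, b):
    return TOY_SIM.get((a, b), 0.0)


cut = smart_cut(["writer", "birthPlace", "country"], "author", "born in", "nation", toy_sim)
print(cut)
```

One pass to find the seed plus one outward scan in each direction keeps the work linear in the path length, in contrast to searching the whole cutting space.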
54
![Page 55: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/55.jpg)
Joint Lexical Semantic Similarity
tends to be biased towards small l, short paths. α is a parameter in the range [0..1].
High similarities in the subject and object regions but low similarities in the predicate region can still have a fairly high
The joint lexical semantic similarity between and P
55
![Page 56: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/56.jpg)
Deal with Default Relation
Combining and
θ is a parameter in the range [0, ∞).
56
![Page 57: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/57.jpg)
Formal Query Generating and Entity Matching
57
![Page 58: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/58.jpg)
Part 6. Evaluation
58
![Page 59: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/59.jpg)
Evaluation Settings
• Two very different datasets
  • DBLP+: DBLP augmented with data from CiteSeerX and ArnetMiner (narrow domain)
  • DBpedia: structured data in Wikipedia (open domain)
• Three similarity measures
  • LSA semantic similarity (purely statistical)
  • Hybrid semantic similarity (LSA + WordNet)
  • String similarity (bigrams + Dice coefficient)
• Performance metrics
  • Mean Reciprocal Rank (how high the first correct interpretation is ranked in the top-10 list)
  • Mean Precision and Recall (evaluate the answers produced by the SPARQL queries)
• Test environment: PC with 2.33GHz Intel Core2 CPU and 8GB memory
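Two of the items above are simple enough to sketch directly: the string similarity baseline and Mean Reciprocal Rank. The sketch assumes character bigrams collected into sets and 1-based ranks with `None` for a miss; whether the original system used sets or multisets, or character or word bigrams, is not stated, so treat those choices as assumptions.

```python
def bigrams(text):
    """Return the set of character bigrams of a lower-cased string."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice_similarity(a, b):
    """Dice coefficient over bigram sets: 2 * |A & B| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def mean_reciprocal_rank(first_correct_ranks):
    """Ranks are 1-based; None means no correct interpretation was listed."""
    scores = [0.0 if r is None else 1.0 / r for r in first_correct_ranks]
    return sum(scores) / len(scores)


print(dice_similarity("night", "nacht"))     # shares only "ht": 2 * 1 / 8 = 0.25
print(mean_reciprocal_rank([1, 2, None]))    # (1 + 0.5 + 0) / 3 = 0.5
```

A purely string-based measure scores "author" and "writer" as nearly unrelated, which is one plausible reason the string baseline trails the semantic measures in the results that follow.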
59
![Page 60: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/60.jpg)
DBLP+ Dataset Statistics
60
![Page 61: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/61.jpg)
Degree of connectivity 18 x 18 class pairs resulted from pairing every C with every C
61
[Figure: distribution of connectivity degree when distance = 1, and distribution of connectivity degree when distance ≤ 3 (x-axis: degree of connectivity; y-axis: number of class pairs)]
![Page 62: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/62.jpg)
DBLP+ Query Set
• 64 test questions
  • 31 Direct Single (DS) questions (e.g. Give me author x of the paper y)
  • 15 Indirect Single (IS) questions (e.g. Show person x who cites the person y)
  • 8 Direct Multiple (DM) questions (e.g. List person x who published the book y with ISBN z)
  • 10 Indirect Multiple (IM) questions (e.g. List the institutions u of the author y with whom the person x from the organization z has co-authored)
• Rephrased to 220 SFQ queries; for example, “Give me author x of the paper y” is rephrased to seven SFQ queries
62
![Page 63: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/63.jpg)
Resolving Parameters
Use sufficiently large numbers to set k1, k2 and k3
• k1 = 10 (the size of the class candidates list)
• k2 = 20 (the size of the property candidates list)
• k3 = 40 (the number of top hypotheses returned by the concept mapping phase)
Resolving α, β, γ, and θ
• First, tune α and γ while fixing β = 0 and θ = 1
• Next, tune β while still fixing θ = 1
• Last, tune θ
63
![Page 64: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/64.jpg)
Results of Tuning Parameters
Top-10 coverage of 220 SFQ queries: hybrid 99.5%, LSA 98.2%, string 56.4%
64
![Page 65: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/65.jpg)
Cross-Validation
Using all the queries:
65
![Page 66: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/66.jpg)
DBpedia Dataset Statistics
66
![Page 67: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/67.jpg)
Degree of connectivity 249 x 249 class pairs resulted from pairing every C with every C
67
[Figure: connectivity degree among 249 classes when distance = 1, and connectivity degree among 249 classes when distance ≤ 2 (axis label: degree of connectivity)]
![Page 68: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/68.jpg)
DBpedia Query Set
• 2011 QALD (QA over Linked Data) workshop
  • 50 training and 50 test questions on DBpedia 3.6
  • Ground truth answers
• 33 questions from the 50 QALD test questions that can be answered using only the DBpedia ontology
  • Modified 7 questions due to unsupported operations and 1 question due to a data issue, but we preserve the relational structure and vocabulary of the questions
  • 27 DS questions (e.g. Which river does the Brooklyn Bridge cross?)
  • 6 DM questions that contain two relations (e.g. Give me the official websites of actors of the television show Charmed.)
• Three graduate students who are unfamiliar with DBpedia independently translated them into 99 SFQ queries.
68
![Page 69: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/69.jpg)
Results
Top-10 coverage of 99 SFQ queries: hybrid 88.9%, LSA 82.8%, string 51.5%
The coverage has an upper limit of 91.9%
• 5 test cases due to ambiguity: the translators’ interpretation changed the questions.
• 3 test cases due to an incorrect property name.
69
![Page 70: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/70.jpg)
Generating SPARQL queries and answers
Use a non-empty strategy to automatically generate answers for an SFQ query
• Run the SPARQL query generated for the best interpretation of the SFQ query.
• If an empty result is returned, go to the next interpretation, and so on.
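The non-empty strategy amounts to a short loop over the ranked interpretations. In this sketch `run_query` stands in for whatever executes a SPARQL string against the endpoint; it is a hypothetical callable supplied by the caller, and the query strings in the usage lines are placeholders rather than real SPARQL.

```python
def first_non_empty_answer(ranked_queries, run_query):
    """Try ranked SPARQL queries in order; return the first non-empty result."""
    for sparql in ranked_queries:
        rows = run_query(sparql)
        if rows:
            return sparql, rows
    # Every interpretation returned an empty result.
    return None, []


# Usage with a fake endpoint: only the second interpretation has answers.
fake_results = {"query-1": [], "query-2": ["Mark Twain"], "query-3": ["someone else"]}
chosen, answers = first_non_empty_answer(
    ["query-1", "query-2", "query-3"],
    lambda q: fake_results[q],
)
print(chosen, answers)
```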
Results on 99 SFQ queries (33 NL questions) using the parameters learned on the DBLP+ dataset.
70
![Page 71: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/71.jpg)
Compare with QALD Systems
Compare with two QALD systems on 30 test questions
Three questions are excluded because we made them easier by dropping the aggregation functions.
Among 30 questions, PowerAqua modified 8 questions and FREyA modified 4 questions.
71
[Chart legend: Our system (hybrid), Our system (LSA), FREyA, PowerAqua]
![Page 72: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/72.jpg)
Compare with QALD Systems
Compare with two QALD systems on 6 two-relation questions
Our Differences
• Both PowerAqua and FREyA use lexicons
• FREyA highly depends on the user’s interaction to perform mappings
• PowerAqua and FREyA tuned their systems on 50 training questions
• Both PowerAqua and FREyA use TBox data of the DBpedia Ontology
72
[Chart legend: Our system (hybrid), Our system (LSA), FREyA, PowerAqua]
![Page 73: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/73.jpg)
Compare with Online Systems
Compare with two online systems on 33 test questions
Both True Knowledge and PowerAqua online systems include DBpedia data as part of their knowledge base.
73
[Chart legend: Our system (hybrid), Our system (LSA), True Knowledge, PowerAqua]
![Page 74: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/74.jpg)
Running time Comparison
• QALD reported systems
  • FREyA: 36 seconds per question
  • PowerAqua: N/A
• Online systems
  • True Knowledge: a few seconds
  • PowerAqua: 143.7 seconds
• Our systems
  • Hybrid: 0.721 seconds
  • LSA: 0.766 seconds
Both True Knowledge and PowerAqua online systems include DBpedia data as part of their knowledge base.
74
![Page 75: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/75.jpg)
Conclusion and Future Work
• The schema-free structured query approach allows people to query semantic data without mastering formal queries or acquiring detailed knowledge of the classes.
• Our system uses statistical data about lexical semantics and semantic data to generate the most appropriate formal queries from a user’s intuitive query.
• Our evaluation showed that the approach was both effective and efficient for two very different, large datasets.
• Our next step is to make the approach easier to apply to new RDF data collections and to a large LOD cloud, and to develop the envisioned web interface.
75
![Page 76: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/76.jpg)
Contributions
• The SFQ interface, which works around the unsolved problem of parsing the full relational structure from natural language queries.
• Novel context-sensitive and fully computation-based mapping algorithms that address both vocabulary and structure mismatch problems.
• A novel approach to build a schema network from the entity network to deal with heterogeneous or unknown schemas
• Define the probability of observing a path on the schema network and develop two novel statistical association models
• Improve a popular statistical association measure, PMI
• Develop state-of-the-art and novel semantic similarity measures
76
![Page 77: Schema Free Querying of Semantic Data](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815874550346895dc5d309/html5/thumbnails/77.jpg)
End
Thank you!!! Questions?
77