dbtrends exploring query logs for ranking rdf data · 2019-09-19 · edgard marx, gustavo publio...
TRANSCRIPT
CACAOConditional Spread Activationfor Keyword Query Interpretation
15th Internation Conference on Semantic WebKarlsruhe 2019
Edgard Marx, Gustavo Publio & Thomas Riechert
LiberAI
2
openQA
*https://github.com/AKSW/openQA
Previous Works
LiberAI
3
KBox
KBox
Is the Knowledge
Graph in KBox?
YES
NOResolve
User provides SPARQLquery and the target Knowledge Graph URI
ExecuteSPARQL Query
DDNS
User/Application
SELECT ?s ?p ?o ...
+http://knowledgegraph.uri
Dereference
*https://github.com/AKSW/KBoxLiberAI
Previous Works
4
NSpM
*https://github.com/AKSW/NSpMLiberAI
Previous Works
5
DBNQA
*https://github.com/AKSW/DBNQA
https://wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
Previous Works
LiberAI
Outline
• Motivation
• Problem Statment
• Background
• Approach
• Evaluation
• Counclusion & Future Works
6LiberAI
Resource Description
Framework (RDF)
Unstructured
Linked Data
Structured
Crowd
Linked Data
LiberAI
7
Motivation
8
• Entity Retrieval• Question Answering• Entity Linking
Linked Data
Resource Description
Framework (RDF)Access (the information needed)
LiberAI
Motivation
9
Resource Description
Framework (RDF)Entity Retrieval
Formally, a top-k entity retrieval takes a keyword query Q,
an integer 0 < k, a set of entities E={e1, e2, . . . , e|E|},
and returns the top-k entities based on
a scoring function S(Q, e)
LiberAI
Problem statement
10
Resource Description
Framework (RDF)Entity Retrieval
Select * where {
?s dbo:birthPlace dbpedia:Leipzig
}
LiberAI
Problem statement
TF-IDF
11
𝑤𝑡, 𝑑= 𝑡𝑓𝑡, 𝑑 ∙ 𝑙𝑜𝑔
|𝐷|
| 𝑑′ ∈ 𝐷 𝑡 ∈ 𝑑′
LiberAI
Background
BM25
12
"This can opener can open any can that a can opener can and if this can opener cannot open any can that a can opener can, then this can opener is free"
“If this can opener does not open any can, then you can take it for free”
LiberAI
Background
BM25
13LiberAI
Background
BM25F
14LiberAI
Background
Field Weighted
15LiberAI
weight(p1) > weight(p2) > weight(p3) …
Background
Learning to Rank
16LiberAI
Background
17LiberAI
Entity Link in Queries
Entity Linking
Query
Entity Candidate Generation
Ranking
Entities
Query Expansion
Query
Linked Entities
Entity Retrieval
Background
General IR Architecture
18
Entity Retrieval, Entity Linking, Question Answering
Query
Entity Candidate Generation
Ranking
Result
Query Expansion
LiberAI
Background
Interaction
19
Entity Retrieval Entity Linking Question Answering
Query
Entity Candidate Generation
Ranking
Entities
Query Expansion
Query
Entity Candidate Generation
Ranking
Linked Entities
Query Expansion
Query
Entity Candidate Generation
Ranking
Answer
Query Expansion
LiberAI
Background
Lastest Architecture
20
Entity Retrieval Entity Linking
Query
Entity Candidate Generation
Ranking
Entities
Query Expansion
Query
Entity Candidate Generation
Ranking
Linked Entities
Query Expansion
LiberAI
Background
Lastest Architecture
21LiberAI
Background
Current Scenario
22
Entity Retrieval Entity Linking Question Answering
Query
Entity Candidate Generation
Ranking
Entities
Query Expansion
Query
Entity Candidate Generation
Ranking
Linked Entities
Query Expansion
Query
Entity Candidate Generation
Ranking
Answer
Query Expansion
LiberAI
Background
Return linked resources
23
Entity Retrieval Entity Linking Question Answering
Query
Entity Candidate Generation
Ranking
Linked Resources
Query Expansion
Query
Entity Candidate Generation
Ranking
Linked Entities
Query Expansion
Query
Entity Candidate Generation
Ranking
Answer
Query Expansion
LiberAI
1
Proposal
Return the answer
24
Entity Retrieval Entity Linking Question Answering
Query
Entity Candidate Generation
Ranking
Linked Resources +
Answer
Query Expansion
Query
Entity Candidate Generation
Ranking
Linked Entities
Query Expansion
Query
Entity Candidate Generation
Ranking
Answer
Query Expansion
LiberAI
1
2
Proposal
Challenge 1: *Local Term Frequency
25LiberAI
*not to confuse with global
dbr:Albert_Leadbr:Albert_Lea_Public_Library …
18
1
“Albert Lea …”
“Albert Lea …”
“Albert Lea …” …
7
1
“Albert Lea …”
“Albert Lea …”
“Albert Lea …”
Q: “Albert Lea”
Challenge 2: Cohesion
26LiberAI
Q: “Albert Lea”
dbr:Albert_Leadbr:Albert_Lea_Public_Library …
18
1
“Albert Lea …”
“Albert Lea …”
“Albert Lea …” …
7
1
“Albert Lea …”
“Albert Lea …”
“Albert Lea …”
A B
27LiberAI
Q: “Albert Lea”
dbr:Albert_Leadbr:Albert_Lea_Public_Library …
18
1
“Albert Lea …”
“Albert Lea …”
“Albert Lea …” …
7
1
“Albert Lea …”
“Albert Lea …”
“Albert Lea …”
Preposition 1 [1][2]: maximize the number of tokens and reduce the number of mapped entities
A B
[1] SWJ, Shekarpour et al. 2015, [2] NAACL, Sakor et al. 2019
Challenge 2: Cohesion
28LiberAI
Q: “Albert Lea”
dbr:Albert_Leadbr:Albert_Lea_Public_Library
18“Albert Lea …”
7“Albert Lea …”
Proposition 1 [1][2]: maximize the number of tokens and reduce the number of mapped entities
A B
Does NOT satisfy: Either A and B comply with the proposition 1
[1] SWJ, Shekarpour et al. 2015, [2] NAACL, Sakor et al. 2019
Challenge 2: Cohesion
29LiberAI
Q: “Albert Lea”
dbr:Albert_Leadbr:Albert_Lea_Public_Library …
18
1
“Albert Lea …”
“Albert Lea …”
“Albert Lea …” …
7
1
“Albert Lea …”
“Albert Lea …”
“Albert Lea …”
Proposition 2 [3][4]: use the context
A B
[3] ISWC, Usbeck et al. 2014, [4] KCap, Moussallem et al. 2017
Challenge 2: Cohesion
30LiberAI
Q: “Albert Lea”
Proposition 2 [3][4]: use the context
Does NOT satisfy: Either A and B comply with preposition 2
[3] ISWC, Usbeck et al. 2014, [4] KCap, Moussallem et al. 2017
dbr:Albert_Leadbr:Albert_Lea_Public_Library
18“Albert Lea …”
7“Albert Lea …”
A B
Challenge 2: Cohesion
31LiberAI
Q: “strawberry ice cream”
Proposition 3 [5][6]: Max Similarity (Levenshtein and Jaccard)
dbr:Strawberry_ice_cream
“Strawberry_ice_cream”A
dbr:Banana_split
“strawberry”B
“ice cream”
[5] AAAI, Zhang et al. 2016, [6] VLDB, Zheng et al. 2019
Challenge 2: Cohesion
32LiberAI
Q: “strawberry ice cream”
Proposition 3 [5][6]: Max Similarity (Levenshtein and Jaccard)
Does NOT satisfy: Either A and B comply with preposition 3
[5] AAAI, Zhang et al. 2016, [6] VLDB, Zheng et al. 2019
dbr:Strawberry_ice_cream
“Strawberry_ice_cream”A
dbr:Banana_split
“strawberry”B
“ice cream”
1
1/3
2/3
Challenge 2: Cohesion
Challenge 3: Scoring Function
33LiberAI
0
1
2
3
4
5
6
7
8
9
10
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
BM25F
Score
Where is the answer?
Approach
34LiberAI
Can we do better?
Conditional Spread Activation
35LiberAI
dbr:Albert_Leadbr:Albert_Lea_Public_Library …
18
1
“Albert Lea …”
“Albert Lea …”
“Albert Lea …” …
7
1
“Albert Lea …”
“Albert Lea …”
“Albert Lea …”
s = 0
s = 0s > 0
s > 0
Q: “Albert Lea”Challenge 1: Local Term Frequency
A B
Approach
Field Weight
36LiberAI
dbr:Albert_Leadbr:Albert_Lea_Public_Library
1“Albert Lea …”
1“Albert Lea …”
s > 0 s > 0
Q: “Albert Lea”Challenge 1: Local Term FrequencyChallenge 2: Cohesion
weight(ptype) > weight(plabel) > weight(pothers)
A B
Approach
Field Weight
37LiberAI
dbr:Albert_Leadbr:Albert_Lea_Public_Library
1“Albert Lea …”
1“Albert Lea …”
Q: “Albert Lea”Challenge 1: Local Term FrequencyChallenge 2: Cohesion
𝑆 𝑄𝑖, 𝐿𝑖 = ∑𝑄𝑖𝐿𝑖 if ∑𝑄𝑖𝐿𝑖 = 1
A B
Approach
Field Weight
38LiberAI
dbr:Albert_Leadbr:Albert_Lea_Public_Library
1“Albert Lea …”
1“Albert Lea …”
1 > j(𝑄𝑖, 𝐿𝑖)
Q: “Albert Lea”Challenge 1: Local Term FrequencyChallenge 2: Cohesion
𝑆(𝑄𝑖, 𝐿𝑖) = ∑𝑄𝑖𝐿𝑖
22 = 4
A B
𝑆 𝑄𝑖, 𝐿𝑖 = ∑𝑄𝑖𝐿𝑖 if ∑𝑄𝑖𝐿𝑖 = 1
Approach
Field Weight
39LiberAI
Challenge 1: Local Term FrequencyChallenge 2: CohesionChallenge 3: Scoring Function
0
500
1000
1500
2000
2500
1 2 3 4 5 6
*PThe answer!!
Approach
Pipeline
40
Query
Entity Candidate Generation
Ranking
Result
Query Expansion
LiberAI
Evaluation
Benchmark
41LiberAI
Evaluation
DBpedia-Entity
42LiberAI
Evaluation
QALD-4
43LiberAI
Evaluation
Conclusion & Future Works
44LiberAI
• We have shown a promising algorithm that outperform state-of-the-art ER engines on standard benchmarks by addressing:
o Local Term Frequency;o Cohesion, and;o Score Function problems
• It is still too soon to trace conclusions (law of small numbers)achieves that• Improve the algorithm runtime so it can be used at scale
Acknowledgements
45
LiberAI
@theLiberAI @HTWKLeipzig @edgardmarx @akswgroup
Acknowledgements
46
@theLiberAI @HTWKLeipzig @edgardmarx @akswgroup
Thank you!!!* 100+ GitHub stars
* 40+ forks