Posted on 18-Dec-2015
April 18, 2023, Sergey Chernov
Extracting Semantic Relationships between Wikipedia Categories
By Sergey Chernov, Tereza Iofciu, Wolfgang Nejdl, Xuan Zhou, Michal Kopycki, Przemyslaw Rys
Preliminaries
WIKIPEDIA: the largest knowledge-sharing system
Many pages are assigned to CATEGORIES
All links are NAVIGATIONAL
Can we extract SEMANTIC links?
MOTIVATION
Wikipedia Categories Example
MOTIVATION
Possible benefits
Semi-structured queries: “find Countries which had Democratic Non-Violent Revolutions”
rephrased as
“find pages from category Countries that are connected to some page in category Non-Violent Revolutions”
Hints for authors
“You are editing a page from category Countries; do you want to add a link to a page in category Capital?”
Raw data for manual semantic markup
MOTIVATION
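The query rewriting above can be sketched over a toy link graph. This is a minimal illustration under stated assumptions, not the authors' system: the page-to-category map, the link map, the function names `pages_in` and `connected_pages`, and the example pages are all hypothetical, and "connected" is taken to mean an outgoing link.

```python
from typing import Dict, Set

# Hypothetical toy data (not taken from the INEX collection).
PAGE_CATEGORIES: Dict[str, Set[str]] = {
    "Georgia": {"Countries"},
    "Ukraine": {"Countries"},
    "Denmark": {"Countries"},
    "Rose Revolution": {"Non-Violent Revolutions"},
    "Orange Revolution": {"Non-Violent Revolutions"},
}
LINKS: Dict[str, Set[str]] = {
    "Georgia": {"Rose Revolution"},
    "Ukraine": {"Orange Revolution"},
    "Denmark": set(),
}


def pages_in(category: str, page_categories: Dict[str, Set[str]]) -> Set[str]:
    """Return all pages assigned to the given category."""
    return {p for p, cats in page_categories.items() if category in cats}


def connected_pages(source_cat: str, target_cat: str,
                    page_categories: Dict[str, Set[str]],
                    links: Dict[str, Set[str]]) -> Set[str]:
    """Pages of source_cat with at least one outgoing link to a page of target_cat."""
    targets = pages_in(target_cat, page_categories)
    return {p for p in pages_in(source_cat, page_categories)
            if links.get(p, set()) & targets}


# "find pages from category Countries that are connected to some page
#  in category Non-Violent Revolutions"
print(sorted(connected_pages("Countries", "Non-Violent Revolutions",
                             PAGE_CATEGORIES, LINKS)))  # ['Georgia', 'Ukraine']
```

The point of the rewrite is that the original question needs no understanding of "had a revolution": it reduces to a set intersection between one category's outgoing links and another category's pages.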
Heuristics
Experiments
[Figure: links between pages in category Countries (Denmark, Austria, Germany, France) and pages in category Capitals (Berlin, Stockholm, Vienna, Paris)]
Number of links: NL = 3
Connectivity Ratio: CR = NL / number of pages in the category = 3/4 = 0.75
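A minimal sketch of the two heuristics on this example. The slide does not say which country links to which capital, so the three links below (Germany to Berlin, Austria to Vienna, France to Paris) are an assumption, as is treating CR as NL divided by the size of the source category; the function names are hypothetical.

```python
from typing import Dict, Set


def number_of_links(cat_a: Set[str], cat_b: Set[str],
                    links: Dict[str, Set[str]]) -> int:
    """NL: count links from pages of category A to pages of category B."""
    return sum(len(links.get(page, set()) & cat_b) for page in cat_a)


def connectivity_ratio(cat_a: Set[str], cat_b: Set[str],
                       links: Dict[str, Set[str]]) -> float:
    """CR: NL normalized by the number of pages in A (assumed denominator)."""
    if not cat_a:
        return 0.0
    return number_of_links(cat_a, cat_b, links) / len(cat_a)


countries = {"Denmark", "Austria", "Germany", "France"}
capitals = {"Berlin", "Stockholm", "Vienna", "Paris"}
# Assumed links: Denmark has none, and Stockholm receives none.
links = {"Germany": {"Berlin"}, "Austria": {"Vienna"}, "France": {"Paris"}}

print(number_of_links(countries, capitals, links))     # 3
print(connectivity_ratio(countries, capitals, links))  # 0.75
```

Normalizing by category size means a large category does not score high merely because it contains many pages.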
Dataset
INEX 2006 Wikipedia collection
Sample category rankings
Experiments
Manual assessment methodology
Semantic Connection Strength (SCS) measure:
2 = strong semantic relationship
1 = average semantic relationship
0 = weak or no semantic relationship
Instructions for assessors
“category A is strongly related to category B (value 2) if you believe that every page in A should conceptually have at least one semantic link to B;”
“A and B are averagely related (value 1) if you believe that 50% of the pages in A should have semantic links to B;”
“otherwise, A and B are weakly related (value 0).”
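The assessor instructions can be read as a rule from an estimated share of pages to an SCS value. This sketch is only an aid to reading the scale: the assessments were manual, `scs_from_fraction` is a hypothetical name, and reading "50%" as "at least half" is an assumption.

```python
def scs_from_fraction(fraction: float) -> int:
    """Map the estimated share of pages in A that should link to B to an SCS value.

    Hypothetical helper: assessors judged this manually; it only restates the
    written instructions, reading "50%" as "at least half" (an assumption).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction == 1.0:
        return 2  # every page in A should link to B: strong relationship
    if fraction >= 0.5:
        return 1  # about half of the pages: average relationship
    return 0      # otherwise: weak or no relationship


print([scs_from_fraction(f) for f in (1.0, 0.6, 0.5, 0.1)])  # [2, 1, 1, 0]
```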
Experiments with Number of Links
Average semantic connection strength (SCS) for 100 sample categories, with related categories extracted using Number of Links.
Experiments
Experiments with Connectivity Ratio
Average semantic connection strength (SCS) for 100 sample categories, with related categories extracted using Connectivity Ratio.
Experiments
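One plausible way to produce averages like these, assuming each sample category gets a heuristic ranking of related categories whose top entries are scored by assessors, is to average the SCS values at each rank position. The data layout, the function name, and the scores below are hypothetical, not the paper's data.

```python
from typing import Dict, List


def average_scs_by_rank(assessments: Dict[str, List[int]], depth: int) -> List[float]:
    """Average SCS at rank positions 1..depth across sample categories.

    assessments maps each sample category to the SCS values (0, 1 or 2)
    assessors gave its related categories, in the order a heuristic
    (NL or CR) ranked them. Categories with fewer entries are skipped at
    deeper ranks.
    """
    averages: List[float] = []
    for rank in range(depth):
        scores = [s[rank] for s in assessments.values() if len(s) > rank]
        averages.append(sum(scores) / len(scores) if scores else 0.0)
    return averages


# Hypothetical scores for two sample categories (not results from the paper).
example = {"category_A": [2, 2, 1], "category_B": [2, 1, 0]}
print(average_scs_by_rank(example, 3))  # [2.0, 1.5, 0.5]
```

Running the same function on NL-ranked and CR-ranked lists would give two curves that can be compared rank by rank.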
General Results and Conclusions
Results are skewed toward the Countries category
Connectivity Ratio is a better measure than Number of Links
Inlinks perform better than outlinks
Summary
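To make the inlink/outlink distinction concrete, the sketch below computes CR in both directions for a category A against a candidate category B. Which links count as inlinks, and the choice to normalize both directions by the size of A, are assumptions; the function name and link data are hypothetical.

```python
from typing import Dict, Set, Tuple


def directional_cr(cat_a: Set[str], cat_b: Set[str],
                   links: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Return (outlink CR, inlink CR) of category A with respect to category B.

    Outlinks: links from pages of A to pages of B.
    Inlinks: links from pages of B to pages of A.
    Both counts are divided by the number of pages in A (an assumption).
    """
    if not cat_a:
        return 0.0, 0.0
    out_count = sum(len(links.get(p, set()) & cat_b) for p in cat_a)
    in_count = sum(len(links.get(p, set()) & cat_a) for p in cat_b)
    return out_count / len(cat_a), in_count / len(cat_a)


countries = {"Denmark", "Austria", "Germany", "France"}
capitals = {"Berlin", "Stockholm", "Vienna", "Paris"}
links = {  # hypothetical links in both directions
    "Germany": {"Berlin"}, "Austria": {"Vienna"},
    "Berlin": {"Germany"}, "Vienna": {"Austria"}, "Paris": {"France"},
}
print(directional_cr(countries, capitals, links))  # (0.5, 0.75)
```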
Future Steps
More manual exploration to find additional heuristics
Consider more categories
SCS composed of:
Is this a “part of” relation? (W1)
Is this an “is a” relation? (W2)
Is this a “synonym” relation? (W3)
Is this an “antonym” relation? (W4)
Is it related in a different way? Which one? (W5)
Summary
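If W1 to W5 are read as weights, a composite SCS could be a weighted sum over the relation types an assessor confirms. This is only one possible reading of the slide: the weighted-sum form, the key names, and the weight values are all hypothetical.

```python
from typing import Dict

# Hypothetical weights: the slide names W1..W5 but gives no values.
WEIGHTS: Dict[str, float] = {
    "part_of": 2.0,  # W1
    "is_a": 2.0,     # W2
    "synonym": 1.0,  # W3
    "antonym": 1.0,  # W4
    "other": 0.5,    # W5: related in some other way
}


def composite_scs(answers: Dict[str, bool],
                  weights: Dict[str, float] = WEIGHTS) -> float:
    """Sum the weights of the relation types the assessor answered 'yes' to."""
    unknown = set(answers) - set(weights)
    if unknown:
        raise KeyError(f"unknown relation types: {sorted(unknown)}")
    return sum((weights[rel] for rel, yes in answers.items() if yes), 0.0)


print(composite_scs({"part_of": True, "is_a": False, "other": True}))  # 2.5
```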
Thank You!