Post on 12-Sep-2014
ESWC - May 2012 Assessing Linked Data mappings 1/25
Assessing Linked Data Mappings using Network Measures
Christophe Guéret, Paul Groth, Claus Stadler, Jens Lehmann
9th Extended Semantic Web Conference (ESWC), May 29, 2012
http://latc-project.eu http://www.vu.nl http://aksw.org
The next 25+5 minutes
The impact of links in the Web of Data
Main questions
What is the impact of link creation?
Can we detect “bad” links based on their impact?
Is adding links always a good thing?
Contributions
A framework to assess the impact of links
Results for 5 metrics
Is this a good or a bad link?
Measuring the Web of Data
Look at the topology using network analysis tools
Impossible to get the complete graph
Sampling of the graph focusing on specific nodes
See the bigger picture through aggregation
Build the local network around a resource
Repeat the process a sufficient number of times
Network sampling process
Use a SPARQL endpoint or dereference the resources to get the descriptions
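A minimal sketch of the sampling step, assuming the descriptions have already been retrieved and are held in memory. The `TRIPLES` list, the `describe` helper and the `depth` parameter are illustrative names, not part of LinkQA; a real run would replace `describe` with a SPARQL query or a dereferencing call.

```python
from collections import deque

# Hypothetical in-memory stand-in for descriptions fetched from a SPARQL
# endpoint or by dereferencing; each entry is (subject, predicate, object).
TRIPLES = [
    ("ex:A", "ex:knows", "ex:B"),
    ("ex:B", "ex:knows", "ex:C"),
    ("ex:C", "owl:sameAs", "ex:D"),
    ("ex:D", "ex:knows", "ex:E"),
]


def describe(resource, triples=TRIPLES):
    """Return every triple in which the resource is subject or object."""
    return [t for t in triples if t[0] == resource or t[2] == resource]


def local_network(target, depth=2, fetch=describe):
    """Collect the triples found within `depth` hops of `target`."""
    seen = {target}
    edges = set()
    queue = deque([(target, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # nodes on the boundary of the sample are not expanded
        for s, p, o in fetch(node):
            edges.add((s, p, o))
            neighbour = o if s == node else s
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return edges
```

Calling `local_network` once per sampled resource and collecting the per-resource scores is one way to obtain the aggregated view described on the previous slide.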
Aggregation of local results
[Figure: aggregated local results; legend: Observed, Target]
Metrics
Compute local scores for a resource
Criteria
Use only the local network
Representative of a global property
Not sensitive to change of observation scale
5 metrics currently available in LinkQA
What do we want to see?
Increase of connectivity within topical groups
Increase chances of finding related information
More bridges between topical groups
Improve browsing capabilities
More connectivity around hubs
Decrease the dependency upon the hubs
Metric 1 – Degree
Metric
Number of edges around the target node
Target
Power-law distribution of values
Intuition
Presence of hubs
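The degree metric can be sketched as follows, assuming the local network is given as a list of (source, target) pairs. The function names and the toy `EDGES` list are illustrative only; comparing the resulting distribution against a power law is left out.

```python
from collections import Counter

# Toy local network: a hub "H" linked to three nodes, plus one extra link.
EDGES = [("H", "a"), ("H", "b"), ("H", "c"), ("a", "b")]


def degree(edges, node):
    """Number of edges (incoming or outgoing) touching `node`."""
    return sum(1 for s, o in edges if node in (s, o))


def degree_distribution(edges, nodes):
    """Map each degree value to the number of sampled nodes having it."""
    return Counter(degree(edges, n) for n in nodes)
```

For the toy network, `degree(EDGES, "H")` is 3, and the distribution over the four nodes has one node of degree 3, two of degree 2 and one of degree 1.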
Metric 2 – Clustering coefficient
Metric
Density of links around the target node
Target
Increase clustering around nodes
Intuition
Topical clusters
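As an illustration, the sketch below uses the standard local clustering coefficient on an undirected view of the network: the fraction of pairs of neighbours of the target that are themselves linked. Whether LinkQA uses exactly this variant is an assumption; the names and the toy data are illustrative.

```python
from itertools import combinations

# Toy local network as (source, target) pairs; direction is ignored here.
EDGES = [("H", "a"), ("H", "b"), ("H", "c"), ("a", "b")]


def clustering_coefficient(edges, node):
    """Fraction of neighbour pairs of `node` that are linked to each other."""
    neighbours = {o if s == node else s for s, o in edges if node in (s, o)}
    neighbours.discard(node)  # ignore self-loops
    if len(neighbours) < 2:
        return 0.0  # no pair of neighbours to close
    linked = {frozenset(edge) for edge in edges}
    pairs = list(combinations(sorted(neighbours), 2))
    closed = sum(1 for pair in pairs if frozenset(pair) in linked)
    return closed / len(pairs)
```

In the toy network, "H" has three neighbours and only the pair (a, b) is linked, so its coefficient is one third.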
Metric 3 – Centrality
Metric
Ratio between outgoing and incoming links
Target
Lower the discrepancy between the values
Intuition
Hubs are sensitive
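Taking the slide's definition literally, the ratio can be sketched as below. The function name, the toy data and the choice of returning `None` for a node without incoming links are assumptions made for the example.

```python
# Toy local network as directed (source, target) pairs.
EDGES = [("a", "H"), ("b", "H"), ("H", "c")]


def link_ratio(edges, node):
    """Outgoing links divided by incoming links; None if nothing comes in."""
    outgoing = sum(1 for s, _ in edges if s == node)
    incoming = sum(1 for _, o in edges if o == node)
    return outgoing / incoming if incoming else None
```

Here "H" has one outgoing and two incoming links, giving a ratio of 0.5.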
Metric 4 – SameAs chains
Metric
Number of “open” sameAs chains
Target
No open sameAs
Intuition
Peer agreement
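The slide does not spell out when a chain counts as "open". The sketch below assumes one possible reading: an owl:sameAs link from the target is open when the resource it reaches has no owl:sameAs link of its own to a further resource. Both this definition and the names used are assumptions.

```python
SAME_AS = "owl:sameAs"

# Toy data: A sameAs B sameAs C, and A sameAs D where D links no further.
TRIPLES = [
    ("ex:A", SAME_AS, "ex:B"),
    ("ex:B", SAME_AS, "ex:C"),
    ("ex:A", SAME_AS, "ex:D"),
]


def open_sameas_chains(triples, node):
    """Count sameAs links from `node` that are not continued (assumed reading)."""
    same = [(s, o) for s, p, o in triples if p == SAME_AS]
    count = 0
    for s, o in same:
        if s != node:
            continue
        # A link back to `node` itself does not count as a continuation.
        continued = any(s2 == o and o2 != node for s2, o2 in same)
        if not continued:
            count += 1
    return count
```

Under this reading, "ex:A" has one open chain (the link to "ex:D"), since the link to "ex:B" is continued towards "ex:C".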
Metric 5 – Description enrichment
Metric
Richness of resource description
Target
Increase as much as possible
Intuition
“SameAsed” resources are complementary
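One way to make "richness" concrete, assumed here for illustration only, is to count the distinct (predicate, object) statements a resource gains from the resources it is linked to with owl:sameAs. The function names and toy data are hypothetical.

```python
SAME_AS = "owl:sameAs"

# Toy data: B repeats A's label and adds one new statement.
TRIPLES = [
    ("ex:A", "rdfs:label", "Amsterdam"),
    ("ex:A", SAME_AS, "ex:B"),
    ("ex:B", "rdfs:label", "Amsterdam"),
    ("ex:B", "ex:population", "800000"),
]


def description(triples, node):
    """Distinct (predicate, object) statements about `node`, sameAs excluded."""
    return {(p, o) for s, p, o in triples if s == node and p != SAME_AS}


def enrichment(triples, node):
    """Number of new statements gained through outgoing sameAs links."""
    own = description(triples, node)
    gained = set()
    for s, p, o in triples:
        if s == node and p == SAME_AS:
            gained |= description(triples, o)
    return len(gained - own)
```

In the toy data, linking "ex:A" to "ex:B" adds a single new statement, because the label is already present.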
Under the hood of LinkQA
http://www.flickr.com/photos/cradlehall/5747161514
Workflow of an analysis
Output of an analysis
Results on the node and aggregated scale
Per metric:
Indication of change with respect to the target
List of outlier nodes, sorted by their distance to the target
Plus, a global ranking of nodes
=> Input for manual inspection by an expert
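The per-metric outlier list and the global ranking can be sketched as follows. Two simplifications are assumed: each target is reduced to a single number, and the global ranking sums the per-metric rank positions. Neither is stated on the slide, and all names are illustrative.

```python
def outliers(scores, target):
    """Nodes sorted by decreasing distance between their score and the target."""
    return sorted(scores, key=lambda n: abs(scores[n] - target), reverse=True)


def global_ranking(per_metric):
    """Combine per-metric outlier lists by summing rank positions (assumed rule).

    `per_metric` maps a metric name to a (scores, target) pair.
    """
    total = {}
    for scores, target in per_metric.values():
        for position, node in enumerate(outliers(scores, target)):
            total[node] = total.get(node, 0) + position
    return sorted(total, key=lambda n: total[n])


# Toy scores for three nodes under two metrics.
PER_METRIC = {
    "degree": ({"a": 1, "b": 5, "c": 2}, 2),
    "clustering": ({"a": 0.8, "b": 0.1, "c": 0.5}, 0.5),
}
```

With the toy scores, node "b" is the furthest from the target under both metrics and therefore comes first in the global ranking handed to the expert.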
Experimental results
Global impact of links
Observe the distributions to detect bad links
First evaluation
160 linking specifications for Silk, developed in the context of LATC
6 linking specifications with manual verification of results
50 positive links
50 negative links
Execute LinkQA with 10 samples of 50 links
Results of the detection
“C” if change detected in > 50% of runs
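The marking rule can be written down directly. In this sketch `runs` is a hypothetical list of booleans, one per LinkQA run, and the empty string stands in for whatever the results table shows when no change is reported.

```python
def change_flag(runs):
    """Return "C" if a change was detected in more than half of the runs."""
    if not runs:
        return ""  # no runs, nothing to report
    return "C" if sum(runs) / len(runs) > 0.5 else ""
```

With ten runs, six detections yield "C", while exactly five do not, since the threshold is strictly greater than 50%.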
Some explanations
Low sensitivity of metrics:
Lack of data
Stable change
50/50 accuracy of detection:
Targets may not be the right ones
Sample may not be big enough
Semantics-agnostic measures perform worse
A closer look at the outliers
See if the outliers are necessarily bad links
Second evaluation
Linking specifications for Silk, developed in the context of LATC
All linking specifications sampled to have
45 positive links
5 negative links
Execute LinkQA five times, on five samples
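Drawing such samples might look like the sketch below. It assumes the verified positive and negative links are available as two Python lists; the function name, the `seed` parameter and the toy link identifiers are illustrative.

```python
import random


def draw_sample(positive, negative, n_pos=45, n_neg=5, seed=None):
    """Draw a shuffled sample with a fixed number of positive and negative links."""
    rng = random.Random(seed)  # a seed makes the sample reproducible
    sample = rng.sample(positive, n_pos) + rng.sample(negative, n_neg)
    rng.shuffle(sample)
    return sample


# Toy pools of links, tagged so the composition of a sample can be checked.
POSITIVE = [("pos", i) for i in range(100)]
NEGATIVE = [("neg", i) for i in range(20)]

# Five samples, one per run, each drawn with a different seed.
SAMPLES = [draw_sample(POSITIVE, NEGATIVE, seed=run) for run in range(5)]
```

Each of the five samples then contains 50 links, 45 of them positive, and LinkQA would be executed once per sample.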
Rank of positive and negative links
Take home message
LinkQA is a node-centric approach to measuring the impact of links in the Web of Data (WoD) network
Scalable, can be distributed
Current results show that
The 5 metrics defined need to be improved
Metrics considering Semantics perform better
The network sample seems too small
Outlier detection improves with the number of metrics