Using the Structure of DBpedia for Exploratory Search

Posted on 21-Nov-2014

DESCRIPTION

Presentation at Mining Data Semantics in Heterogeneous Networks Workshop at KDD 2013

TRANSCRIPT

Using the Structure of DBpedia for Exploratory Search

Speaker: Samantha Lam
Supervisor: Conor Hayes

Motivating Work

DBpedia: a heterogeneous graph

2

Motivating Work

Background

Network Similarity: PathSim, NetClus, RankClus

Faceted Search: facets for refining search

specific schema, (semi-)supervised

→ good for search when the user is familiar with the query

→ ...but what about complete beginners?

→ Requires exploratory search, which is unsupervised

3
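PathSim, listed above under network similarity, scores two nodes of the same type by how many instances of a symmetric meta-path connect them, normalised by how many connect each node to itself: s(x, y) = 2·|paths x⇝y| / (|paths x⇝x| + |paths y⇝y|). The sketch below computes it for an Artist-Genre-Artist meta-path using the commuting matrix M = A·Aᵀ. The artist/genre matrix and the function names are hypothetical toy choices made for illustration; they are not taken from DBpedia or from this work.

```python
from typing import List

Matrix = List[List[int]]


def commuting_matrix(adj: Matrix) -> Matrix:
    """Return M = A * A^T for a bipartite adjacency matrix A (rows: artists, cols: genres).

    M[x][y] counts path instances x -> genre -> y, i.e. the symmetric
    meta-path Artist-Genre-Artist.
    """
    n = len(adj)
    return [
        [sum(a * b for a, b in zip(adj[x], adj[y])) for y in range(n)]
        for x in range(n)
    ]


def pathsim(m: Matrix, x: int, y: int) -> float:
    """PathSim: 2 * |paths x~y| / (|paths x~x| + |paths y~y|)."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


# Hypothetical toy data: three artists linked to three genres.
artist_genre = [
    [1, 1, 0],  # artist 0: genres 0 and 1
    [1, 0, 0],  # artist 1: genre 0
    [0, 0, 1],  # artist 2: genre 2
]
m = commuting_matrix(artist_genre)
print(pathsim(m, 0, 1))  # 2 * 1 / (2 + 1) = 0.666...
print(pathsim(m, 0, 2))  # no shared genre -> 0.0
```

Because the score is normalised by each node's self-paths, a prolific artist linked to many genres does not automatically look similar to everyone, which is the property that distinguishes PathSim from a raw path count. Note that it needs a chosen meta-path, i.e. a specific schema.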

Exploratory Search?

Given a query, how do we organise the results in a way that is ‘useful’, i.e. that aids exploratory search?

E.g. suppose you hear a song on the radio...

Solution:

Classify results according to their contexts

Why? It reduces the need for in-depth reading and guides the user

4

Assumption

similarity ⊂ relatedness

5

Research Questions

1 Can we provide an effective graph-based framework that can aid exploratory search?

2 To do this, what are DBpedia’s graph structures with respect to its different datasets?

6

DBpedia graphs summary

Infobox properties

emergent, crowd-sourced; heterogeneous ‘types’; dense

Infobox ontology, SKOS/Wiki Category, YAGO

agreed rules; is-A structure; sparse, tree-like

Infobox good for → relatedness

Ontology good for → labelling similar items

7
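The dense versus sparse, tree-like contrast on this slide can be made concrete with two simple measures: edge density, 2|E| / (|V|(|V| − 1)) for an undirected graph, and a tree check (connected with exactly |V| − 1 edges). The sketch below applies both to two hypothetical four-node graphs, one shaped like an infobox graph and one like a category hierarchy. The graphs and function names are illustrative assumptions; real SKOS category data has multiple parents and cycles, so it is only tree-like, not a strict tree.

```python
from collections import defaultdict, deque
from typing import List, Tuple

Edge = Tuple[int, int]


def density(n: int, edges: List[Edge]) -> float:
    """Fraction of possible undirected edges present (assumes no duplicates or self-loops)."""
    return 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0


def is_tree(n: int, edges: List[Edge]) -> bool:
    """A graph on nodes 0..n-1 is a tree iff it has n-1 edges and is connected."""
    if n == 0 or len(edges) != n - 1:
        return False
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    # Breadth-first search from node 0 to test connectivity.
    seen, queue = {0}, deque([0])
    while queue:
        for w in neighbours[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n


# Hypothetical toy graphs on 4 nodes.
infobox_like = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
category_like = [(0, 1), (0, 2), (2, 3)]
print(density(4, infobox_like), is_tree(4, infobox_like))    # 0.833..., False
print(density(4, category_like), is_tree(4, category_like))  # 0.5, True
```

The dense graph offers many alternative paths between entities (useful for relatedness), while the tree offers a single chain of broader parents per node (useful for attaching a label to a group).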

Research Q1 Proposition

General Framework:

8

Sample Query & Results

Query: Lisa Hannigan

Two methods, Weighted (W) and Uniform (U), with 6 clusters

Cluster 1 (W, U) instruments

Top label: (W, U) Musical instruments

Cluster 2 (W) songs (U) albums and songs

Top label: (W) Songs by artist (U) Albums by artist

Cluster 3 (W) albums (U) albums, music genres and songs

Top label: (W) Albums by artist (U) Music subgenres by genre

9
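The slide does not say how a cluster's "top label" is chosen. One simple reading, sketched here only as an assumption, is to count the Wikipedia categories attached to the cluster's members and take the most frequent one. The item names and category assignments below are hypothetical placeholders, not results from the Lisa Hannigan query, and `top_label` is an illustrative name rather than part of this work.

```python
from collections import Counter
from typing import Dict, List, Optional


def top_label(cluster: List[str], categories: Dict[str, List[str]]) -> Optional[str]:
    """Return the category attached to the most members of the cluster.

    Hypothetical labelling rule: simple frequency count over member categories.
    Ties are resolved by Counter.most_common.
    """
    counts: Counter = Counter()
    for item in cluster:
        # dict.fromkeys dedupes while keeping order, so each item votes once per category
        counts.update(list(dict.fromkeys(categories.get(item, []))))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical cluster members and their categories.
categories = {
    "item_a": ["Albums by artist", "2008 albums"],
    "item_b": ["Albums by artist", "2011 albums"],
    "item_c": ["Songs by artist"],
}
print(top_label(["item_a", "item_b", "item_c"], categories))  # Albums by artist
```

A frequency rule like this rewards categories that are shared, which tends to produce specific labels for coherent clusters; for mixed clusters no single category dominates and the winner can look arbitrary.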

Sample Query & Results

Query: Lisa Hannigan

Cluster 4 (W) mixed, (U) mixed

Top label: (W) Songs by artist (U) Missing people

Cluster 5 (W) mixed, (U) mixed

Top label: (W) Albums by artist (U) Towns and villages in the Republic of Ireland by county

Cluster 6 (W) musicians and bands, (U) musicians and bands

Top label: (W) Place of birth missing (living people) (U) Place of birth missing (living people)

10

Sample Query & Results

Summary:

Weighted produced 4 out of 6 coherent clusters, whereas Unweighted produced only 2.

DBpedia Ontology labelling (see paper) provided broader labels for messier clusters, e.g. the top label was MusicalWork for mixed clusters

→ Categories better for more specific clusters.

11
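One way an ontology produces broader labels such as MusicalWork for mixed clusters is to label a cluster with the most specific class that all of its members' types share. The sketch below finds that shared superclass in a single-parent hierarchy. This is an illustrative assumption, not necessarily the labelling method described in the paper, and the child-to-parent map is a simplified, assumed fragment rather than the full DBpedia Ontology.

```python
from typing import Dict, List, Optional


def ancestors(cls: str, parent: Dict[str, str]) -> List[str]:
    """Class followed by its superclasses, most specific first (assumes no cycles)."""
    path = [cls]
    while cls in parent:
        cls = parent[cls]
        path.append(cls)
    return path


def common_label(types: List[str], parent: Dict[str, str]) -> Optional[str]:
    """Most specific class that is the class itself or a superclass of every member type."""
    if not types:
        return None
    paths = [ancestors(t, parent) for t in types]
    shared = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    # paths[0] is ordered most specific first, so the first shared class is the lowest one.
    return next((c for c in paths[0] if c in shared), None)


# Simplified, assumed fragment of a class hierarchy (child -> parent).
parent = {"Song": "MusicalWork", "Album": "MusicalWork", "MusicalWork": "Work", "Work": "Thing"}
print(common_label(["Album", "Album"], parent))         # Album
print(common_label(["Song", "Album", "Song"], parent))  # MusicalWork
```

A homogeneous cluster keeps its specific class, while mixing songs and albums forces the label up to their common parent. This matches the pattern reported above: ontology labels stay broad on messier clusters, and categories are better for more specific ones.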

Ongoing Challenges

Evaluation

User Study:

- compare only Weighted versus Unweighted results, different labelling methods?

Comparison:

- possible to compare against other faceted methods?

- compare with a plain list for recall?

12

Summary

Investigated graph structure of DBpedia datasets

Framework to utilise this finding in exploratory search, with example results

Ongoing challenge: evaluation

Thanks for listening! Questions welcome!

13
