Swoogle Semantic Search Engine Web-enhanced Information Management Bin Wang

Post on 22-Dec-2015


Page 1: Swoogle Swoogle Semantic Search Engine Web-enhanced Information Management Bin Wang

Swoogle

Semantic Search Engine

Web-enhanced Information Management

Bin Wang


Outline

Background Introduction
•Semantic Web
•Semantic Search

Swoogle – Semantic Search Engine
•Swoogle Architecture
•Semantic Web documents
•Finding SWDs
•Ranking SWDs
•Swoogle Indexing and Retrieval

Conclusion


Background Introduction

What is the Semantic Web?

An evolving development of the WWW.

The semantics of information and services on the web are well defined.

This makes it possible for the web to understand and satisfy the requests of people and machines to use web content.


Background Introduction

What is the Semantic Web?

[Figure: The Semantic Web layers]


Background Introduction

What is Semantic Search?

A set of techniques for managing documents, especially semantically supported document retrieval.

There are two forms of search: navigational search and research search. Semantic search belongs to the second category.

It attempts to augment and improve traditional search results by using data from the Semantic Web.


Swoogle – Semantic Search Engine

Swoogle is a crawler-based indexing and retrieval system for Semantic Web documents (SWDs): RDF and OWL documents encoded in XML or N3.

It automatically discovers SWDs, indexes their metadata, and answers queries about them.

SWDs are characterized by semantic annotation and meaningful references to other SWDs; conventional search engines do not take advantage of these features.


Swoogle Search Interface

Developed at UMBC (University of Maryland, Baltimore County)


Activities Swoogle supports

Finding appropriate ontologies

It allows users to query for ontologies that contain specified terms anywhere in the document. The ontologies returned are ranked by the Ontology Rank algorithm.

Finding instance data

It helps users integrate data distributed across the web.

Characterizing the Semantic Web

It reveals interesting structural properties of the Semantic Web by extracting metadata, especially inter-document relations.


Swoogle Architecture

Four main components:

SWD discovery, metadata creation, data analysis and interface


Swoogle Architecture

SWD discovery component: discovers potential SWDs throughout the web and keeps up-to-date information about them.

Metadata creation component: generates objective metadata about SWDs at both the syntax level and the semantic level.

Data analysis component: derives analytical reports, such as the classification of SWOs and SWDBs, the rank of SWDs, and the IR index of SWDs.

Interface component:


Semantic Web Documents (SWDs)

A SWD is a document in a Semantic Web language (based on RDF, e.g. RDFS, DAML+OIL, and OWL) that is online and accessible to web users and software agents.

There are two kinds of SWDs:

•SWOs (Semantic Web Ontologies)
•SWDBs (Semantic Web Databases)


Semantic Web Documents(SWDs)

SWOs (Semantic Web Ontologies)

A SWD in which a significant proportion of the statements define new terms or extend the definitions of terms defined in other SWDs.

SWDBs (Semantic Web Databases)

A SWD that does not define or extend a significant number of terms.
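
The SWO/SWDB distinction above can be read as a threshold on the share of term-defining statements. The following is a minimal sketch under that reading; the function name `classify_swd`, the statement counts, and the 0.8 threshold are illustrative assumptions, since the slides do not say what counts as a "significant proportion".

```python
def classify_swd(defining_statements: int, total_statements: int,
                 threshold: float = 0.8) -> str:
    """Label a SWD as "SWO" or "SWDB" from its share of defining statements.

    defining_statements: statements that define or extend terms.
    total_statements:    all statements in the document.
    threshold:           hypothetical cut-off for "significant proportion".
    """
    if total_statements <= 0:
        raise ValueError("total_statements must be positive")
    ratio = defining_statements / total_statements
    return "SWO" if ratio >= threshold else "SWDB"
```

A document with 9 defining statements out of 10 would be labelled an SWO under this assumed threshold, while one with 1 out of 10 would be labelled an SWDB.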


Finding SWDs

Develop a Google crawler that searches for URLs using the Google Web Service.

It starts with file extensions that are good SWD indicators (e.g. .rdf, .owl, .daml, and .n3).

Develop a focused crawler that crawls documents within a given website.

It only crawls URLs relative to the given base URL.

Swoogle invites the Semantic Web community to submit URLs.
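
The two URL filters described above (extension-based SWD indicators and staying under a base URL) can be sketched as follows. This is an illustration, not Swoogle's crawler code: the function names and the exact matching rules are assumptions, and only the extension list comes from the slides.

```python
from urllib.parse import urlparse

# File extensions the slides name as good SWD indicators.
SWD_EXTENSIONS = (".rdf", ".owl", ".daml", ".n3")


def looks_like_swd(url: str) -> bool:
    """Return True if the URL path ends with a typical SWD extension."""
    path = urlparse(url).path.lower()
    return path.endswith(SWD_EXTENSIONS)


def within_base(url: str, base_url: str) -> bool:
    """Return True if url is on the base URL's host and under its directory."""
    u, b = urlparse(url), urlparse(base_url)
    # Directory part of the base path, always ending in "/".
    base_dir = b.path.rsplit("/", 1)[0] + "/"
    return u.netloc == b.netloc and (u.path or "/").startswith(base_dir)
```

A focused crawler would apply `within_base` to every extracted link before queueing it, while `looks_like_swd` is only a cheap pre-filter: a matching extension suggests, but does not prove, that the document is a SWD.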


Finding SWDs

Develop the JENA2-based Swoogle crawler.

It verifies whether a document is a SWD.

It revisits discovered URLs to check for updates.

Some heuristics are used to discover new SWDs through semantic relations:

--A URIref is highly likely to be the URL of a SWD.

--owl:imports links to an external ontology, which is a SWD.

--etc.
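
The owl:imports heuristic can be illustrated with a short sketch that pulls import targets out of an RDF/XML document. Swoogle itself is described as using JENA2; the sketch below swaps in Python's standard `xml.etree.ElementTree` parser purely for illustration, handles only the RDF/XML serialization, and uses a hypothetical function name and sample document.

```python
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def imported_ontologies(rdf_xml: str) -> list:
    """Return the rdf:resource targets of all owl:imports elements."""
    root = ET.fromstring(rdf_xml)
    urls = []
    for node in root.iter("{%s}imports" % OWL_NS):
        target = node.get("{%s}resource" % RDF_NS)
        if target:
            urls.append(target)
    return urls


# Hypothetical sample document used only to exercise the function.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://example.org/onto">
    <owl:imports rdf:resource="http://example.org/base.owl"/>
  </owl:Ontology>
</rdf:RDF>"""
```

Each URL returned this way is a candidate SWD that a crawler could add to its queue and then verify.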


SWD Metadata

It is collected to make SWD search more efficient and effective.

It is derived from the content of SWDs as well as the relations among SWDs.

Swoogle identifies three categories of metadata:

•Basic metadata
•Relations
•Analytical results, such as SWO/SWDB classification and SWD ranking


SWD Metadata

1. Basic metadata

It considers the syntactic and semantic features of a SWD.

•Language features: properties describing the syntactic or semantic features of a SWD.

•RDF statistics: properties summarizing the node distribution of the RDF graph of a SWD.

•Ontology annotations: properties that describe a SWD as an ontology.


SWD Metadata

2. Relations among SWDs

Swoogle focuses on SWD-level relations, which generalize RDF node-level relations. The following relations are captured:

•TM/IN: captures term reference relations between two SWDs;

•IM: shows that an ontology imports another ontology;

•EX: shows that an ontology extends another ontology;

•PV: shows that an ontology is a prior version of another;

•CPV: shows that an ontology is a prior version of and is compatible with another;

•IPV: shows that an ontology is a prior version of and is incompatible with another.


Ranking SWDs

Rational Random Surfer

A user arrives at a given page in one of two ways:

->by directly addressing it;

->by following one of the links pointing to it.

Different links may stand for different relations and thus have different weights.

[Figure: flowchart of the rational random surfer. Actions: "Jump to a random page", "Follow a random link", "Explore all linked SWOs". Yes/no decisions: "bored?", "SWO?"]


Ranking SWDs

Rational Random Surfer - raw rank

T(x): the set of SWDs that x links to;

L(a): the set of SWDs that link to a;

d: a damping factor, typically set to 0.85.

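$$\mathrm{rawPR}(a) = (1 - d) + d \sum_{x \in L(a)} \mathrm{rawPR}(x)\,\frac{f(x, a)}{f(x)}$$

$$f(x, a) = \sum_{l \in \mathrm{links}(x, a)} \mathrm{weight}(l)$$

$$f(x) = \sum_{a \in T(x)} f(x, a)$$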
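
The raw rank recurrence, rawPR(a) = (1 - d) + d times the sum over x in L(a) of rawPR(x) * f(x, a) / f(x), can be approximated by repeated substitution. The sketch below is one way to do that, not Swoogle's implementation: the function name, the `(source, target, weight)` input format, the fixed iteration count, and the starting value of 1.0 are all assumptions, and link weights are assumed to be positive.

```python
from collections import defaultdict


def raw_page_rank(links, d=0.85, iterations=50):
    """Iterate the rawPR recurrence over weighted links.

    links: iterable of (source, target, weight) triples, one per link;
           weight is the (positive) weight of that link's relation type.
    """
    # f_pair[(x, a)] = f(x, a): total weight of all links from x to a.
    f_pair = defaultdict(float)
    nodes = set()
    for x, a, w in links:
        f_pair[(x, a)] += w
        nodes.update((x, a))

    # f_total[x] = f(x): sum of f(x, a) over every a that x links to.
    f_total = defaultdict(float)
    for (x, _a), w in f_pair.items():
        f_total[x] += w

    rank = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_rank = {n: 1.0 - d for n in nodes}
        for (x, a), w in f_pair.items():
            new_rank[a] += d * rank[x] * w / f_total[x]
        rank = new_rank
    return rank
```

Because each source's rank is split in proportion to f(x, a) / f(x), a target reached through a heavily weighted relation receives a larger share than one reached through a lightly weighted relation from the same source.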


Ranking SWDs

Rational Random Surfer – final rank

TC(A) is the transitive closure of SWOs imported by A.

Swoogle computes the rank of an SWDB with the first formula and the rank of an SWO with the second:

$$\mathrm{PR}(A) = \mathrm{rawPR}(A)$$

$$\mathrm{PR}(A) = \sum_{X_i \in TC(A)} \mathrm{rawPR}(X_i)$$
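
The final rank step (an SWDB keeps its raw rank; an SWO sums the raw ranks over the transitive closure of the SWOs it imports) can be sketched as below. The function names and the `imports` dictionary format are assumptions. The slides do not say whether A itself belongs to TC(A), so the sketch exposes that choice as a hypothetical `include_self` flag rather than deciding it.

```python
def transitive_imports(swo, imports):
    """Return every SWO reachable from swo by following imports edges."""
    seen = set()
    stack = list(imports.get(swo, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(imports.get(current, ()))
    return seen


def final_rank(doc, raw_rank, imports, is_swo, include_self=False):
    """Final rank: raw rank for an SWDB, closure sum for an SWO.

    raw_rank: dict mapping document -> rawPR value.
    imports:  dict mapping an SWO -> list of SWOs it imports directly.
    include_self: assumption flag, whether doc counts as part of TC(doc).
    """
    if not is_swo:
        return raw_rank[doc]
    closure = transitive_imports(doc, imports)
    if include_self:
        closure.add(doc)
    return sum(raw_rank[x] for x in closure)
```

For example, if A imports B and B imports C, the closure used for A is {B, C}, plus A itself when `include_self` is set.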


Swoogle Indexing and Retrieval

Swoogle adapts Sire, a custom indexing and retrieval engine:

It employs a TF/IDF model with a standard cosine similarity metric.

It indexes discovered documents using either character N-grams or URIrefs as keywords, both to find relevant documents and to compute the similarity among a set of documents.
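
The combination of character N-grams, TF/IDF weighting, and cosine similarity can be sketched in a few lines. This is not Sire's code: the function names are hypothetical, the trigram size is an arbitrary choice, and the smoothed IDF formula is one common variant picked for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split lower-cased text into overlapping character n-grams."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Build one sparse TF/IDF vector (dict) per document."""
    counts = [Counter(char_ngrams(doc, n)) for doc in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # count documents containing each gram
    total = len(docs)
    vectors = []
    for c in counts:
        vec = {}
        for gram, tf in c.items():
            # Smoothed IDF (an assumed variant, not taken from the slides).
            idf = math.log((1 + total) / (1 + doc_freq[gram])) + 1.0
            vec[gram] = tf * idf
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(gram, 0.0) for gram, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Using URIrefs instead of character N-grams would only change the tokenizer: each URIref in a document becomes one keyword, and the weighting and similarity steps stay the same.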


Conclusion

Swoogle is a prototype crawler-based indexing and retrieval system for Semantic Web documents.

One of the interesting properties computed for each SWD is its ontology rank. It is based on the rational random surfer model, which differs from the model used in conventional search engines.




Thank You!