swoogle swoogle semantic search engine web-enhanced information management bin wang
Post on 22-Dec-2015
215 views
TRANSCRIPT
Swoogle
Semantic Search Engine
Web-enhanced Information Management
Bin Wang
Outline
Background IntroductionSemantic WebSemantic Search
Swoogle – Semantic Search EngineSwoogle ArchitectureSemantic Web documentsFinding SWDsRanking SWDsSwoogle Indexing and Retrieval
Conclusion
Background Introduction
What is Semantic Web?An evolving development of WWW.
The semantics of information and services in the web is well-defined.
It makes it possible for web to understand and satisfy the requests of people and machines to use the web content.
Background Introduction
What is Semantic Web?
The Semantic Web Layers
Background Introduction
What is Semantic Search?A set of techniques on the management of
documents, especially semantically supported document retrieval.
Two forms of Search: Navigational Search, Research Search; Semantic Search belongs to the second category.
It attempts to augment and improve traditional search results by using data from Semantic Web.
Swoogle – Semantic Search Engine
Swoogle – A crawler-based indexing and retrieval system for semantic web – RDF and OWL documents encoded in XML and N3
It automatically discovers SWDs, indexes the metadata and answers queries about it.
SWDs are characterized by semantic annotation and meaningful references to other SWDs; conventional search engines do not take advantage of these features.
Swoogle Search Interface
Developed by UMD
Activities that Swoogle can do
Finding appropriate ontologiesIt allows users to query for ontologies that
contain specified terms anywhere in the document. The ontologies returned are ranked by Ontology Rank algorithm.
Finding instance dataIt helps users to integrate data distributed on
the web.
Characterizing the Semantic WebIt reveals interesting structural properties
about the semantic web by extracting metadata and especially inter-document relations.
Swoogle Architecture
Four main components:
SWD discovery, metadata creation, data analysis and interface
Swoogle Architeture
SWD discovery component:discovers potential SWDs throughout the web keeps up-to-date information about SWDs.
Metadata creation component:generates objective metadata about SWDs at
both the syntax level and the semantic level.
Data analysis component:derives analytical reports, such as
classification of SWOs and SWDBs, rank of SWDs and IR index of SWDs
Interface component:
Semantic Web Documents(SWDs) A SWD is a document in a semantic
web language(based on RDF, e.g. RDFS, DAML+OIL, and OWL) that is online and accessible to web users and software agents.
There are two kinds of documents in SWDs:
SWOs (Semantic Web Ontology)SWDBs (Semantic Web Databases)
Semantic Web Documents(SWDs)
SWOs(Semantic Web Ontology)A SWD with a significant proportion of the
statements it makes define new terms or extend the definitions of terms defined in other SWDs.
SWDBs(Semantic Web Databases)A SWD without defining or extending a
significant number of terms.
Finding SWDs
Develop a Google Crawler to search URLs using the Google Web Service.
starts with type extensions(e.g. .rdf, .owl, .daml, and .n3, good SWD indicators )
Develop a Focused Crawler to crawl documents within a given website.
only crawls URLs relative to the given base URL
invites SW community to submit the URLs
Finding SWDs
Develop the JENA2 based Swoogle Crawler.
It verifies if a document is a SWD or not
It revisits discovered URLs to check updates
Some heuristics are used to discover new SWDs through semantic relations.
--A URIref is highly likely to be the URL of a SWD.
--OWL: imports links to an external ontology, which is a SWD.
--etc. .
SWD Metadata
It is collected to make SWD search more efficient and effective.
It is derived from the content of SWDs as well as the relations among SWDs.
Swoogle identifies three categories of metadata:
Basic metadata;Relations;Analytical results such as SWO/SWDB
classification, and SWD ranking.
SWD Metadata
1. Basic metadataIt considers the syntactic and semantic
features of a SWD.Language feature It refers to the properties describing the
syntactic or semantic features of a SWD.RDF statisticsIt refers to the properties summarizing node
distribution of the RDF graph of a SWD.Ontology annotationIt refers to the properties that describe a
SWD as an ontology.
SWD Metadata
2.Relations among SWDs
Swoogle focuses on SWD level relations which generalize RDF node level relations. The following relations are captured:•TM/IN: captures term reference relations between two
SWDs;
•IM: shows that an ontology imports another ontology;
•EX: shows that an ontology extends another ontology;
•PV: shows that an ontology is a prior version of another;
•CPV: shows that an ontology is a prior version of and is compatible with another;
•IPV: shows that an ontology is a prior version of and is incompatible with another.
Ranking SWDs
Rational Random Surfer
A user will arrive at a given page
->by directly addressing it
->by following one of the links pointing to it;
Different links may stand for different relations, thus have different weights.
Jump to arandom page
Follow arandom link
bored?
SWO?
no
yes
no
yes
Explore all linked
SWOs
Ranking SWDs
Rational Random Surfer - raw rank
T(x): the set of SWDs that x links to;
L(a): the set of SWDs that links to a;
d: a damping factor, typically set to 0.85.
)( )(
),()()1()(
aLx xf
axfxrawPRddarawPR
),(
)(),(axlinksl
lweightaxf
)(
),()(xTa
axfxf
Ranking SWDs
Rational Random Surfer – final rank
TC(A) is the transitive closure of SWOs imported by a.
Swoogle computes the rank for SWDBs using the first one, and computes the rank for SWOs using the sec one.
)()( ArawPRAPR
)(
)()(ATCXi
XirawPRAPR
Swoogle Indexing and Retrival
Swoogle adapts the Sire, a custom indexing and retrieval engine:
It employs a TF/IDF model with a standard cosine similarity metric.
It indexes discovered documents by using either character N-Gram or URIrefs as keywords to find relevant documents and to compute the similarity among a set of documents.
Conclusion
Introduces a prototype crawler-based indexing and retrieval system for Semantic Web documents.
One of the interesting properties computed for each SWD is its ontology rank. Here it uses the rational surfing model, different from what is used in conventional search engine.
References
Li Ding , Tim Finin , Anupam Joshi , Rong Pan , R. Scott Cost , Yun Peng , Pavan Reddivari , Vishal Doshi , Joel Sachs, Swoogle: a search and metadata engine for the semantic web, Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 08-13, 2004, Washington, D.C., USA
R. Guha , Rob McCool , Eric Miller, Semantic search, Proceedings of the 12th international conference on World Wide Web, May 20-24, 2003, Budapest, Hungary
Berners-Lee, Tim; James Hendler and Ora Lassila (May 17, 2001). "The Semantic Web". Scientific American Magazine.
Thank You!