ios press watson, more than a semantic web search engine · pdf fileundefined 0 (0) 1 1 ios...

9
Undefined 0 (0) 1 1 IOS Press Watson, more than a Semantic Web search engine Mathieu d’Aquin and Enrico Motta Knowledge Media Institute, The Open University Walton Hall, Milton Keynes, UK {m.daquin, e.motta}@open.ac.uk} Abstract. In this tool report, we present an overview of the Watson system, a Semantic Web search engine providing various functionalities not only to find and locate ontologies and semantic data online, but also to explore the content of these semantic documents. Beyond the simple facade of a search engine for the Semantic Web, we show that the avail- ability of such a component brings new possibilities in terms of developing semantic applications that exploit the content of the Semantic Web. Indeed, Watson provides a set of APIs containing high level functions for finding, exploring and querying semantic data and ontologies that have been pub- lished online. Thanks to these APIs, new applications have emerged that connect activities such as ontology construc- tion, matching, sense disambiguation and question answer- ing to the Semantic Web, developed by our group and others. In addition, we also describe Watson as a unprecedented re- search platform for the study the Semantic Web, and of for- malised knowledge in general. Keywords: Watson, Semantic Web search engine, Semantic Web index, Semantic Web applications 1. Introduction The work on the Watson system 1 originated from the idea that formalised knowledge and semantic data was to be made available online, for applications to find and exploit. However, for knowledge to be avail- able does not directly imply that it can be discov- ered, explored and combined easily and efficiently. New mechanisms are required to enable the develop- 1 http://watson.kmi.open.ac.uk ment of applications exploring large scale, online se- mantics [15]. Watson collects, analyses and gives access to on- tologies and semantic data available online. In princi- ple, it is a search engine dedicated to specific types of ‘documents’, which rely on standard Semantic Web formats. Its architecture (see next section) therefore includes a crawler, indexes and query mechanisms to these indexes. However, beyond this simple facade of a Semantic Web search engine (see Section 3), the main objective of Watson is to represent a homogeneous and efficient access point to knowledge published online, a gateway to the Semantic Web. It therefore provides many advanced functionalities to applications, through a set of APIs (see section 4), not only to find and lo- cate semantic documents, but also to explore them, ac- cess their content and query them, including basic level reasoning mechanisms, metrics and links to user eval- uation tools. Of course, Watson is not the only tool of its kind (see related work in Section 6), but it can be distinguished from others by its focus on providing a complete infrastructure component for the develop- ment of applications of the Semantic Web. It has led to the development of a large variety of applications, both from our group and from others. In this paper, we present a complete, up-to-date overview of the Watson system, as well as of applications which are made pos- sible by the functionalities it provides. We also show through several examples how, as a side effect of pro- viding a gateway to the Semantic Web, Watson is being used as a platform to support research activities related to the Semantic Web (see Section 5). 2. Anatomy of a Semantic Web search engine Watson realises three main activities: 1. it collects available semantic content on the Web, 2. it analyses it to extract useful metadata and in- dexes, and 0000-0000/0-1900/$00.00 c 0 – IOS Press and the authors. All rights reserved

Upload: vankhanh

Post on 25-Mar-2018

217 views

Category:

Documents


1 download

TRANSCRIPT

Undefined 0 (0) 1 1IOS Press

Watson, more than a Semantic Web searchengine

Mathieu d’Aquin and Enrico MottaKnowledge Media Institute, The Open UniversityWalton Hall, Milton Keynes, UK{m.daquin, e.motta}@open.ac.uk}

Abstract. In this tool report, we present an overview of theWatson system, a Semantic Web search engine providingvarious functionalities not only to find and locate ontologiesand semantic data online, but also to explore the content ofthese semantic documents. Beyond the simple facade of asearch engine for the Semantic Web, we show that the avail-ability of such a component brings new possibilities in termsof developing semantic applications that exploit the contentof the Semantic Web. Indeed, Watson provides a set of APIscontaining high level functions for finding, exploring andquerying semantic data and ontologies that have been pub-lished online. Thanks to these APIs, new applications haveemerged that connect activities such as ontology construc-tion, matching, sense disambiguation and question answer-ing to the Semantic Web, developed by our group and others.In addition, we also describe Watson as a unprecedented re-search platform for the study the Semantic Web, and of for-malised knowledge in general.

Keywords: Watson, Semantic Web search engine, SemanticWeb index, Semantic Web applications

1. Introduction

The work on the Watson system1 originated fromthe idea that formalised knowledge and semantic datawas to be made available online, for applications tofind and exploit. However, for knowledge to be avail-able does not directly imply that it can be discov-ered, explored and combined easily and efficiently.New mechanisms are required to enable the develop-

1http://watson.kmi.open.ac.uk

ment of applications exploring large scale, online se-mantics [15].

Watson collects, analyses and gives access to on-tologies and semantic data available online. In princi-ple, it is a search engine dedicated to specific typesof ‘documents’, which rely on standard Semantic Webformats. Its architecture (see next section) thereforeincludes a crawler, indexes and query mechanisms tothese indexes. However, beyond this simple facade of aSemantic Web search engine (see Section 3), the mainobjective of Watson is to represent a homogeneous andefficient access point to knowledge published online,a gateway to the Semantic Web. It therefore providesmany advanced functionalities to applications, througha set of APIs (see section 4), not only to find and lo-cate semantic documents, but also to explore them, ac-cess their content and query them, including basic levelreasoning mechanisms, metrics and links to user eval-uation tools. Of course, Watson is not the only toolof its kind (see related work in Section 6), but it canbe distinguished from others by its focus on providinga complete infrastructure component for the develop-ment of applications of the Semantic Web. It has ledto the development of a large variety of applications,both from our group and from others. In this paper, wepresent a complete, up-to-date overview of the Watsonsystem, as well as of applications which are made pos-sible by the functionalities it provides. We also showthrough several examples how, as a side effect of pro-viding a gateway to the Semantic Web, Watson is beingused as a platform to support research activities relatedto the Semantic Web (see Section 5).

2. Anatomy of a Semantic Web search engine

Watson realises three main activities:

1. it collects available semantic content on the Web,2. it analyses it to extract useful metadata and in-

dexes, and

0000-0000/0-1900/$00.00 c© 0 – IOS Press and the authors. All rights reserved

2 M. d’Aquin and E. Motta / Watson

Fig. 1. Overview of the Watson architecture.

3. it implements efficient query facilities to accessthe data.

While these three tasks are generally at the basis of anyclassical Web search engine, their implementation israther different when dealing with semantic content asopposed to Web pages.

To realise these tasks, Watson is based on a numberof components depicted in Figure 1, relying on exist-ing, standard and open technologies. Locations of ex-isting semantic documents are first discovered througha crawling and tracking component, using Heritrix, theInternet Archive’s Crawler2. The Validation and Anal-ysis component is then used to create a sophisticatedsystem of indexes for the discovered documents, usingthe Apache Lucene indexing system3. Based on theseindexes, a core API is deployed that provides all thefunctionalities to search, explore and exploit the col-lected semantic documents. This API also links to theRevyu.com Semantic Web based reviewing system toallow users to rate and publish reviews on ontologies.

Different sources are used by the crawler of Wat-son to discover ontologies and semantic data: Google,Swoogle4, PingTheSemanticWeb5, manually submit-ted URLs. Specialised crawlers were designed forthese repositories, extracting potential locations bysending queries that are intended to be covered by alarge number of ontologies. For example, the keyword

2http://crawler.archive.org/3http://lucene.apache.org/4http://swoogle.umbc.edu5http://pingthesemanticweb.com/

search facility provided by search engines such asSwoogle and Google is exploited with queries contain-ing terms from the top most common words in the en-glish language. Another crawler heuristically exploresWeb pages to discover new repositories and to locatedocuments written in certain ontology languages, byincluding “filetype:owl” in a query to Google. Finally,already collected semantic documents are frequentlyre-crawled, to discover evolutions of known semanticcontent or new elements at the same location.

Once located and retrieved, these documents are fil-tered to keep only the elements that characterise theSemantic Web. In particular, to keep only the docu-ments that contain semantic data or ontologies, thecrawler eliminates any document that cannot be parsedby Jena6. In that way, only valid RDF-based docu-ments are considered. Furthermore, a restriction ex-ists which imposes that all RDF based semantic docu-ments be collected with the exception of RSS. The rea-son to exclude these elements is that, even if they aredescribed in RDF, RSS feeds represent semanticallyweak documents, relying on RDF Schema more as away to describe a syntax than as an ontology language.

Many different elements of information are ex-tracted from the collected semantic documents: in-formation about the entities and literals they contain,about the employed languages, about the relations withother documents, etc. This requires analysing the con-tent of the retrieved documents in order to extract rel-evant information (metadata) to be used by the searchfunctionality of Watson.

Besides trivial information, such as the labels andcomments of ontologies, some of the metadata that areextracted from the collected ontologies influence theway Watson is designed. For instance, there are severalways to declare the URI of an ontology: as the names-pace of the document, using the xml:base attribute,as the identifier of the ontology header, or even, if itis not declared, as the URL of the document. URIs aresupposed to be unique identifiers in the scope of theSemantic Web. However, two ontologies that are in-tended to be different may declare the same URI. Forthese reasons, Watson uses internal identifiers that maydiffer from the URIs of the collected semantic docu-ments. When communicating with users and applica-

6http://jena.sourceforge.net/

M. d’Aquin and E. Motta / Watson 3

Fig. 2. Overview of the Watson Web interface.

tions, these identifiers are transformed into common,non-ambiguous URIs from the original documents.

Another important step in the analysis of a semanticdocument is to characterise it in terms of its content.Watson extracts, exploits, and stores a large range ofdeclared metadata or computed measures, such as theemployed languages/vocabularies (RDF, RDFS, OWL,DAML+OIL), information about the contained enti-ties (classes, properties, individuals and literals), ormeasures concerning the richness of the knowledgecontained in the document (e.g., the expressiveness ofthe employed language, the density of the class def-initions, etc.) These elements are then stored and ex-

ploited to provide quality related filtering, ranking andanalysis of the collected semantic content.

3. Watson as a Semantic Web search engine

Even if the first goal of Watson is to support se-mantic applications, it is important to provide Web in-terfaces that facilitate the access to ontologies for hu-man users. Users may have different requirements anddifferent levels of expertise concerning semantic tech-nologies. For this reason, Watson provides different‘perspectives’, from the most simple keyword search,

4 M. d’Aquin and E. Motta / Watson

to complex, structured queries using SPARQL (see fig-ure 2).

The keyword search feature of Watson is similar inits use to usual Web or desktop search systems (seefigure 2(a)). The set of keywords entered by the user ismatched to the local names, labels, comments, or liter-als of entities occurring in semantic documents. A listof matching ontologies is then displayed with, for eachontology, some information about it (languages, size,expressivity of the underlying description logic) andthe list of entities matching each keyword. The searchcan also be restricted to consider only certain types ofentities (classes, properties, individuals) or certain de-scriptors (labels, comments, local names, literals).

One principle applied to the Watson interface is thatevery URI is clickable. A URI displayed in the resultof the search is a link to a page giving the details of ei-ther the corresponding ontology or a particular entity.Since these descriptions also show relations to otherelements, this allows the user to navigate among enti-ties and ontologies. It is therefore possible to explorethe content of ontologies, navigating through the rela-tions between entities (displayed as a list of relations–Figure 2(b)– or a graph –Figure 2(c)), as well as toinspect ontologies and their metadata.

In order to facilitate the assessment and selectionof ontologies by users, it is crucial to provide easyto read and understand overviews of ontologies, bothat the level of the automatically extracted metadataabout them, as well as at the level of their content. Foreach collected semantic document, Watson providesa page that summarises essential information such asthe size of the document (in bytes, triples, number ofclasses, properties and individuals), the languages used(OWL, RDF-S and DAML+OIL, as well as the un-derlying description logic), the links with other docu-ments (through imports) and the reviews from users ofWatson (see Figure 2(d)). Watson also generates smallgraphs (Figure 2(e)), showing the 6 first key-conceptsof each ontology and an abstract representation of theexisting relations between these concepts, based in thekey concept extraction method described in [21].

Finally, a SPARQL endpoint has been deployed onthe Watson server and is customisable to the semanticdocument to be queried. A simple interface allows toenter a SPARQL query and to execute it on the selectedsemantic document (Figure 2(f)). This feature can be

Fig. 3. Using the Watson API to build applications that exploit theSemantic Web as source of knowledge.

seen as the last step of a chain of selection and accesstasks using the Watson Web interface. Indeed, keywordsearch and ontology exploration allow the user to se-lect the appropriate semantic document to be queried.

4. Building Semantic Web applications withWatson

As explained above, the focus of Watson is on im-plementing an infrastructure component, a gateway,for applications to find, access and exploit ontologiesand semantic data published online. To realise this,Watson implements a set of APIs giving access to itsfunctionalities through online services (see Figure 3).

4.1. The Watson APIs

The most commonly used and complete API toWatson is a Java library, giving remote access to themany functions of Watson through a set of SOAP ser-

M. d’Aquin and E. Motta / Watson 5

vices7. The basic design requirements for these APIsis that they should allow applications to exploit on-tologies online, which they might have to identify atruntime, while not having to download these ontolo-gies and the corresponding data, or to implement theirown infrastructure for handling, accessing and explor-ing them. More precisely the Watson Java/SOAP APIgives access, through three different services and in alightweight (for the application) way to functions re-lated to:

Searching ontologies and semantic documents: Usingkeywords and restrictions, related for example tothe type of entities (classes, properties, individu-als) the keywords should match to, or the placewhere they can match (name, label, comment orother literals in the entity), these functions allowto locate ontologies that relate to a particular do-main and contain particular concepts.

Searching in ontologies and semantic documents:Similarly, using keywords and restrictions, func-tions are provided to identify, within an givenontology or semantic document, the entities thatmatch the given keywords.

Retrieving metadata about an ontology: Many func-tions are implemented that allow to characterisea particular ontology through automatically gen-erated metadata (such as the languages used, thesize, labels and comments, imported ontologies,etc.), as well as evaluations from users of Watson.

Retrieving metrics on ontologies and entities: Mea-sures are provided in a dedicated service regard-ing ontologies and entities (e.g., depth of the hi-erarchy of an ontology or ‘density’ of an entity interms of connections). This allows applications todefine filters and selection criteria ensuring cer-tain characteristics from the elements they ex-ploit.

Exploring the content of ontologies: Functions areprovided that allow an application to access thecontent of an ontology, through exploring its en-tities and their connections. These functions in-clude the possibility to ask for the subclassesof an RDF, DAML or OWL class in any of thecollected ontologies, the labels of a given entityor any relation ‘pointing to’ a given individual.Some of these functions are also available in avariant providing basic level reasoning, in order

7http://watson.kmi.open.ac.uk/WS_and_API-v2.html

to, for example, obtain all the subclasses of aclass, i.e., both the directly declared ones and theones inferred from the transitivity of the subclassrelation.

SPARQL Querying: In case the functions providedare not sufficient, and complex queries are re-quired, a SPARQL query can be executed directlyfrom the API, or alternatively by using the de-ployed SPARQL endpoint.

[12] provides an example of a simple, lightweightapplication using the Watson API overviewed aboveto realise some basic ontology-based query expansionmechanisms for a search engine. This application sim-ply suggests to the user keywords that are either moregeneral or more specific than the ones used, by query-ing Watson for the subclasses and superclasses of cor-responding classes in online ontologies. While beingonly a basic demonstrator, this application shows oneof the key contributions of Watson: giving applicationsthe ability to efficiently make use of large scale seman-tics in an open domain. More advanced examples aredescribed in the next section.

Similarly to the Java/SOAP API, Watson also pro-vides a set of REST services/APIs8 including a sub-set of Watson’s functionality. This API is more conve-nient (and more often used) in dynamic Web applica-tions using scripting languages such as JavaScript (anexample of such an application is described in [18]).

4.2. Derived tools and example applications

The Watson APIs described above as been devel-oped for, and following the requirements of, newSemantic Web applications exploiting online knowl-edge. Many of such applications have been developed,with varying degrees of complexity and sophistication,mostly by research groups involved in the SemanticWeb area. Details of several of them are available inthe two papers [15] and [11].

Probably one of the most obvious applications of anautomated ontology search mechanism is for ontologyengineering itself. Related to this task, the Watson plu-gin [10] is an extension of an ontology editor (namely,the NeOn toolkit9) that allows the ontology engineerto check for knowledge included in online ontologies,

8http://watson.kmi.open.ac.uk/REST_API.html9http://neon-toolkit.org

6 M. d’Aquin and E. Motta / Watson

and reuse part of them while building a new ontology.It can integrate statements from ontologies discoveredby Watson and keep links between the created ontol-ogy and the elements reused by generating mappingsconnecting entities on the local ontologies with theones of identified, online ontologies.

Staying in the Semantic Web domain, Scarlet10 fol-lows the paradigm of automatically selecting and ex-ploring online ontologies to discover relations betweentwo given terms, as a way of realising ontology match-ing [23]. It realises this by finding, through Watson,ontologies that contain relations between the two giventerms, possibly exploring multiple ontologies and de-riving a relation from complete paths across differentontologies. When evaluated on two large scale agri-culture thesauri, Scarlet demonstrated good precision,while using hundreds of external ontologies identifiedat run-time.

Using PowerAqua11, a user can simply ask a ques-tion, such as “Who are the members of the rock bandNirvana?” and obtain an answer, in this case in theform of a list of musicians (Kurt Cobain, Dave Grohl,Krist Novoselic and other former members of thegroup). The main strength of PowerAqua resides in thefact that this answer is derived dynamically from therelevant data available on the Semantic Web, as dis-covered and explored through Watson.

Garcia et al. in [17] exploit Watson to tackle the taskof word sense disambiguation (WSD). Specifically,they propose a novel, unsupervised, multi-ontologymethod which 1) relies on dynamically identified on-line ontologies as sources for candidate word sensesand 2) employs algorithms that combine informationavailable both on the Semantic Web and the Web in or-der to integrate and select the right senses. They haveshown in particular that the Semantic Web provides agood source of word senses that can complement tra-ditional resources such as WordNet12.

Also, in [22], the authors use Watson in a sophisti-cated process to gather information about people andinclude this information in an integrated way into alearning mechanism for the purpose of identifying

10http://scarlet.open.ac.uk/11http://kmi.open.ac.uk/technologies/

poweraqua/12http://wordnet.princeton.edu

Web citations. In [20], Watson is used as a main sourceof knowledge for annotating Web services from theirdocumentation published online.

Finally, the Watson engine itself is at the basis of theCupboard13 ontology publishing tool, which providesfunctionalities to expose, promote, evaluate and reuseontologies for the Semantic Web community [14].

These are only brief descriptions of a few of theapplications developed so far that rely on the Watsonplatform. Moreover, as described in the next section,Watson is also used within research processes, as a wayto obtain corpora of real-life, online formalised knowl-edge for analysis and tests.

5. Using Watson as a research platform

A Semantic Web search engine such as Watson isnot only a service supporting the development of Se-mantic Web applications. It also represents a unprece-dented resource for researchers to study the Seman-tic Web, and more specifically, how formalised knowl-edge and data are produced, shared and consumed on-line [13].

For example, to give an account of the way seman-tic technologies are used to publish knowledge on theWeb, of the characteristics of the published knowl-edge, and of the networked aspects of the SemanticWeb, [7] presented an analysis of a sample of 25,500semantic documents collected by Watson. This analy-sis looked in particular into the use of Semantic Weblanguages and of their primitives. One noticeable factthat was derived from analysing both the OWL (ver-sion 1) species and the description logics used in on-tologies is that, while a large majority of the ontologiesin the set were in OWL Full (the most complex variantof OWL 1, which is undecidable), most of them werein reality very simple, only using a small subset of theprimitives offered by the language (95% of the ontolo-gies where based on the ALH(D) description logic).

More recently, research work has been conductedusing Watson as a corpus to detect and study variousimplicit relationships between ontologies and semanticdocuments on the Web [1]. For example, in [6], we in-troduced fine-grained measures of agreement and dis-

13http://cupboard.open.ac.uk

M. d’Aquin and E. Motta / Watson 7

agreement between ontologies, which were tested onreal-life ontologies collected by Watson. We also de-rived from agreement and disagreement, measures ofconsensus and controversy regarding particular state-ments, within a large collection of ontologies, suchas the one of Watson. Indeed, an implementation ofsuch measures allowed us to build a tool indicatingthe level of consensus and controversy that exist on agiven statement with respect to online ontologies. Werecently integrated this tool in the NeOn toolkit on-tology editor, as a way to provide an overview of thedeveloped ontology with respect to its agreement withother, online ontologies [8].

Many other aspects of online ontologies can also beconsidered for study, including for example how on-tologies evolve online [2], as well as for testing newtechniques and approaches applicable to ontologies atlarge. In [5] for example, Watson is used to provideontologies where ‘anti-patterns’ can be identified. Ina more systematic manner, in [9], the Watson API isused to constitute groups of ontologies with varyingcharacteristics to be used to benchmark semantic toolson resource-limited devices. In such a case, the largenumber and variety of ontologies online represent anadvantage for testing systems and technologies.

6. Related Work

There are a number of systems similar to Watson,falling into the category of Semantic Web search en-gines. However, Watson differs from these systems ina number of ways, the main one being that Watson isthe only tool to provide the necessary level of servicesfor applications to dynamically exploit Semantic Webdata. Indeed, we can mention the following systems:

Swoogle has been one of the first and most popularSemantic Web search engine [16]. It runs an auto-mated hybrid crawl to harvest Semantic Web datafrom the Web, and then provide search servicesfor finding relevant ontologies, documents andterms using keywords and additional semanticconstraints. In addition to search, Swoogle alsoprovides aggregated statistical metadata about theindexed Semantic Web documents and SemanticWeb terms.

Sindice14 is a Semantic Web index or entity look-upservice that focuses on scaling to very large quan-tities of data. It provides keyword and URI based

search, structured-query and rely on some sim-ple reasoning mechanisms for inverse-functionalproperties [25].

Falcons15 is a keyword-based semantic entity searchengine. It provides a sophisticated Web interfacethat allows to restrict the search according to rec-ommended concepts or vocabularies [4].

SWSE16 is also a keyword-based entity search engine,but that focuses on providing semantic informa-tion about the resulting entities rather than onlylinks to the corresponding data sources [19]. Itscollection is automatically gathered by crawlers.SWSE also provides a SPARQL endpoint en-abling structured query on the entire collection.

Semantic Web Search17 is also a semantic entitysearch engine based on keywords, but that allowsto restrict the search to particular types of enti-ties (e.g. DOAP Projects) and provides structuredqueries.

OntoSelect18 provides a browsable collection of on-tologies that can be searched by looking at key-words in the title of the ontology or by providinga topic [3].

OntoSearch219 is a Semantic Web Search engine thatallows for keyword search, formal queries andfuzzy queries on a collection of manually submit-ted OWL ontologies. It relies on scalable reason-ing capabilities based on a reduction of OWL on-tologies into DL-Lite ontologies [24].

Sqore20 is a prototype search engine that allows forstructured queries in the form of OWL descrip-tions [26]. Desired properties of entities to befound in ontologies are described as OWL entitiesand the engine searches for similar descriptions inits collection.

Among these, Sindice for example, is one of themost popular. However, while Sindice indexes a verylarge amount of semantic data, it only provides a sim-ple look-up service allowing applications/users to ‘lo-cate’ semantic documents. Therefore, it is still nec-essary to download and process these documents lo-cally to exploit them, which in many cases, is not feasi-ble. The Swoogle system is closer to Watson, but doesnot provide some of the advanced search and explo-ration functions that are present in the Watson APIs(including the SPARQL querying facility). The Fal-cons Semantic Web search engine has been focusingmore on the user interface aspects, but now providesan initial API including a sub-set of the functions pro-vided by Watson. The other systems focus on a re-

8 M. d’Aquin and E. Motta / Watson

stricted set of scenarios or functionalities (e.g., anno-tation and language information in OntoSelect), andhave not been developed and used further than as re-search prototypes.

Another important aspect to consider is how openSemantic Web Search engines are. Indeed, Watson isthe only Semantic Web search engine to provide un-limited access to its functionalities. Sindice, Swoogleand Falcons are, on the contrary, restricting the possi-bility they offer by limiting the number of queries ex-ecutable in a day or the number of results for a givenquery.

Finally, it is worth noticing that the issue of collect-ing semantic data from the Web has recently reached abroader scope, with the appearance of features withinmainstream Web search engine exploiting structureddata to improve the search experience and presenta-tion. Indeed, Yahoo! SearchMonkey21 crawls and in-dexes semantic information embedded in webpagesas RDFa22 or microformats23, in order to provide en-riched snippets describing the webpages in the searchresults. Similarly, Google Rich Snippets24 makes useof collected semantic data using specific schemas inwebpages to add information to the presentation ofresults. Watson currently focuses on individual RDFdocuments and does not index embedded formats suchas RDFa. Such an extension if planned to be realisedin a near future.

7. Future work and planned functionalities

With many users, both humans and applications25,and several years of development (the very first ver-sion was released in 2007), Watson is now a maturesystem that provides constant services, with very raredown times. It has evolved based on the requirements,requests and feedbacks from the community of devel-opers using it in many different applications.

21http://developer.yahoo.com/searchmonkey/22http://www.w3.org/TR/xhtml-rdfa-primer/23http://microformats.org/24http://googlewebmastercentral.blogspot.

com/2009/05/introducing-rich-snippets.html25There is currently about 2,500 queries and 8,000 pages viewed

per month on the Watson user interface. The activities of applica-tions using the Watson APIs cannot be traced, but is believed to gen-erate a significantly greater number of requests than the user inter-face, as some of the applications described earlier in this paper canmake several thousands of calls to the APIs in a very short time.

Of course, Watson is still being developed, includ-ing an ever growing index of semantic documents fromthe Web. New specialised indexes have been createdrecently, with manually initiated crawls of large linkeddata nodes such as DBPedia26. Also, the Cupboardsystem mentioned earlier is an ontology publicationplatform based on the engine of Watson, which meansthat any document submitted to it is being indexed us-ing the same process as the one of Watson, and is madeavailable through compatible APIs. A ‘federated ser-vice’ where different instances of Watson, includingthe current system and Cupboard, can be connected inorder to return aggregated results has been developedand is currently being tested. This means in particu-lar that user will be soon able to contribute ontologiesto the Watson collection in real time, shortcutting thecrawler, by simply submitting them the Cupboard.

New functionalities are also being considered. Inparticular, Watson includes a minimalistic evaluationmechanism through the connection with Revyu.com.Many refinements could be imagined, from simpleintegrations with social platforms (e.g., a Facebook‘like’ button for ontologies) to monitoring and keep-ing the connections between communities behind on-tologies, and the communities using particular ontolo-gies. This requires extensive research on the social as-pects of ontologies and how to keep track of them, butwould ultimately improve the ability of Watson to sup-port users in selecting ontologies.

An important characteristic of ontologies is that theyare not isolated artefacts. They are related to each otherin a network of semantic relations. However, apartfrom exceptions (noticeably, import), these relationsare mostly kept implicit. Extensive research work isbeing realised currently on formalising such relations(e.g., inclusion, versioning, similarity, see [1]) and de-ploying efficient methods to detect them in a largescale collection such as the one of Watson, as well asto evaluate the benefit of structuring search results onthe basis of ontology relations, for a more efficient on-tology selection approach.

References

[1] C. Allocca, M. d’Aquin, and E. E. Motta. DOOR: Towards aFormalization of Ontology Relations. In Proc.of InternationalConference on Knowledge Engineering and Ontology Devel-opment (KEOD), 2009.

26http://dbpedia/org

M. d’Aquin and E. Motta / Watson 9

[2] C. Allocca, M. d’Aquin, and E. Motta. Detecting different ver-sions of ontologies in large ontology repositories. In Interna-tional Workshop on Ontology Dynamic, IWOD 2009 at ISWC,2009.

[3] Paul Buitelaar, Thomas Eigner, and Thierry Declerck. Ontos-elect: A dynamic ontology library with support for ontologyselection. In Proc. of the Demo Session at the InternationalSemantic Web Conference, 2004.

[4] Gong Cheng, Weiyi Ge, and Yuzhong Qu. Falcons: searchingand browsing entities on the semantic web. In WWW confer-ence, pages 1101–1102. ACM, 2008.

[5] O. Corcho, C. Roussey, F. Scharffe, and V. Svatek. SPARQL-based detection of anti-patterns in OWL ontologies. In EKAW2010 Conference - Knowledge Engineering and KnowledgeManagement by the Masses, Poster session, 2010.

[6] M. d’Aquin. Formally Measuring Agreement and Disagree-ment in Ontologies. In International Conference on Knowl-edge Capture - K-CAP 2009, 2009.

[7] M. d’Aquin, C. Baldassarre, L. Gridinoc, S. Angeletou,M. Sabou, and E. Motta. Characterizing Knowledge on theSemantic Web with Watson. In Evaluation of Ontologiesand Ontology-based tools, 5th International EON Workshop atISWC 2007, 2007.

[8] M. d’Aquin and E. Motta. Visualising Consensus with On-line Ontologies to Support Quality in Ontology Development(submitted). In EKAW 2010 Workshop on Ontology Quality (toappear), 2010.

[9] M. d’Aquin, A. Nikolov, and E. Motta. How much SemanticData on Small Devices? In EKAW 2010 Conference - Knowl-edge Engineering and Knowledge Management by the Masses,2010.

[10] M. d’Aquin, M. Sabou, and E. Motta. Reusing knowledge fromthe semantic web with the watson plugin. In Demo, Interna-tional Semantic Web Conference, ISWC 2008, 2008.

[11] M. d’Aquin, M. Sabou, E. Motta, S. Angeletou, L. Gridinoc,V. Lopez, and F. Zablith. What can be done with the Se-mantic Web? An Overview of Watson-based Applications. In5th Workshop on Semantic Web Applications and Perspectives,SWAP 2008, 2008.

[12] Mathieu d’Aquin. Building Semantic Web Based Applicationswith Watson. In WWW2008 - The 17th International WorldWide Web Conference - Developers’ Track, 2008.

[13] Mathieu d’Aquin, Carlo Allocca, and Enrico Motta. A plat-form for semantic web studies. In Web Science Conference,poster session, 2010.

[14] Mathieu d’Aquin, Jérôme Euzenat, Chan Le Duc, and HolgerLewen. Sharing and reusing aligned ontologies with cupboard.In Demo, International Conference on Knowledge Capture -

K-CAP 2009, 2009.[15] Mathieu d’Aquin, Enrico Motta, Marta Sabou, Sofia Angele-

tou, Laurian Gridinoc, Vanessa Lopez, and Davide Guidi. To-ward a new generation of semantic web applications. Intelli-gent Systems, 23(3):20–28, 2008.

[16] Li Ding, Tim Finin, Anupam Joshi, Rong Pan, R. Scott Cost,Yun Peng, Pavan Reddivari, Vishal C Doshi, and Joel Sachs.Swoogle: A search and metadata engine for the semantic web.In CIKM’04: the Proceedings of ACM Thirteenth Conferenceon Information and Knowledge Management, 2004.

[17] J. Garcia and E. Mena. Overview of a semantic disambiguationmethod for unstructured web contexts. In Proceedings of thefifth international conference on Knowledge capture table ofcontents,K-CAP 2009, Poster session, 2009.

[18] L. Gridinoc and M. d’Aquin. Moaw - uri’s everywhere. In 4thWorkshop on Scripting for the Semantic Web (challenge entry),ESWC 2008, 2008.

[19] Andreas Harth, Aidan Hogan, Renaud Delbru, J?rgen Um-brich, Se?n O’Riain, and Stefan Decker. Swse: Answers be-fore links! In Semantic Web Challenge, volume 295 of CEURWorkshop Proceedings, 2007.

[20] M. Maleshkova, C. Pedrinaci, and J. Domingue. SemanticAnnotation of Web APIs with SWEET. In 6th Workshop onScripting and Development for the Semantic Web at ESWC2010, 2010.

[21] S. Peroni, E. Motta, and M. d’Aquin. Identifying key conceptsin an ontology through the integration of cognitive principleswith statistical and topological measures. In Proceedings of theThird Asian Semantic Web Conference, ASWC 2008, 2009.

[22] M. Rowe and F. Ciravegna. Disambiguating Identity Web Ref-erences using Web 2.0 Data and Semantics. Journal of WebSemantics, 2010.

[23] M. Sabou, M. d’Aquin, and Motta E. Exploring the semanticweb as background knowledge for ontology matching. Journalof Data Semantics, 2008.

[24] Edward Thomas, Jeff Z. Pan, and Derek H. Sleeman. On-tosearch2: Searching ontologies semantically. In OWLEDworkshop, volume 258 of CEUR Workshop Proceedings, 2007.

[25] Giovanni Tummarello, Eyal Oren, and Renaud Delbru.Sindice.com: Weaving the open linked data. In Proceedings ofthe 6th International Semantic Web Conference and 2nd AsianSemantic Web Conference (ISWC/ASWC2007), Busan, SouthKorea, volume 4825 of LNCS, pages 547–560, Berlin, Heidel-berg, November 2007. Springer Verlag.

[26] Rachanee Ungrangsi, Chutiporn Anutariya, and Vilas Wu-wongse. Sqore-based ontology retrieval system. In DEXA con-ference, volume 4653 of Lecture Notes in Computer Science,pages 720–729. Springer, 2007.