
Shortipedia: Aggregating and Curating Semantic Web Data

Denny Vrandečić (a), Varun Ratnakar (b), Markus Krötzsch (c), Yolanda Gil (b)

(a) Institute AIFB, KIT Karlsruhe Institute of Technology, Karlsruhe, Germany, [email protected]
(b) ISI, USC University of Southern California, Marina del Rey, CA, USA, {varunr|gil}@isi.edu
(c) Computing Laboratory, University of Oxford, Oxford, UK, [email protected]

Abstract

Shortipedia is a Web-based knowledge repository that pulls together a growing number of sources in order to provide a comprehensive, diversified view on entities of interest. Contributors to Shortipedia can easily add claims to the knowledge base, provide sources for their claims, and find links to knowledge already available on the Semantic Web.

Keywords: semantic wikis, data integration, semantic infrastructure technology, technology for collaboration

1. Introduction

The goal of Shortipedia is to develop a reference Web site that combines automatic aggregation of facts on the Semantic Web and manual curation by volunteer users. Since different sources can take different views and contain inconsistent information, central to the design of Shortipedia is that any assertion is in principle possibly true and must be included in the repository together with its provenance. Instead of taking assertions as true factual knowledge, we interpret them as assertions about an entity that are true within the context of their provenance record.
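To make this model concrete, the following minimal sketch (our illustration, not code from the Shortipedia implementation; all identifiers are hypothetical) shows how an assertion might be stored together with its provenance record:

    from typing import NamedTuple

    class Claim(NamedTuple):
        # A provenance-qualified assertion: the triple itself plus a
        # record of where it came from.
        subject: str    # internal entity URI
        prop: str       # internal property URI
        value: str      # internal URI or literal
        source: str     # external source URI, or a user edit-log entry

    # Two sources may disagree; both claims are kept, and each is taken
    # to be true only within the context of its provenance record.
    claims = [
        Claim("sp:Shanghai", "sp:population", "23019148",
              "http://dbpedia.org/resource/Shanghai"),
        Claim("sp:Shanghai", "sp:population", "23000000",
              "user-edit-log:example"),
    ]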

Key features of Shortipedia are that it

1. assigns a unique identifier to entities of interest using DBpedia [4] as the basis for identity and Wikipedia as the basis for consensus on entities of interest,

2. collects those identifiers in a consensus-built resource,

3. aggregates facts about those entities from the Web of Data and other available data sources,

4. embraces the diversity of views on any given fact and documents their provenance, and

5. supports the curation of its contents by volunteers.

Shortipedia is built on top of a number of Web services and frameworks in order to provide a comprehensive and rewarding workflow for contributors to the project, and thus ultimately to create a reference website for all kinds of facts. Whereas DBpedia already shows the high potential of providing a nucleus for the Web of Data, an editable extension of such a nucleus could join the advantages of Wikipedia and DBpedia, building on both services and enriching them.

As a knowledge repository, Shortipedia contains the following types of information:

• a core of entities, based on Wikipedia articles,

• mappings of those entities to entities in the Web of Data,

• aggregated assertions from Web of Data sources,

• provenance records of all assertions, and

• direct contributions and corrections from users.

While the first four types of information deal with aggregating and curating Semantic Web data, the last type – the direct contributions – allows users to add and correct knowledge in Shortipedia immediately, thus providing a Wikipedia-like incentive and growth system. This turns Shortipedia into a system for knowledge acquisition, where the provenance of the assertion is provided by the user editing log.


This results in a system that provides

• a knowledge repository containing an aggregated and curated view on Semantic Web data,

• a query and browsing interface for Linked Open Data, and

• a neutral and unbiased reference site that includes alternative points of view.

Shortipedia was publicly released as a beta in November 2010, and the experiences gathered with Shortipedia currently inform the development of a similar project based on a more scalable approach. This system description first gives an overview of the architecture and of related approaches in Sections 2 and 3. In Section 4 we note some of the design decisions we have made, followed by a summary of lessons learned in Section 5. The system is available online at http://shortipedia.org.

2. Architecture

Shortipedia is an application of MediaWiki, a highly extensible, open source wiki software [2]. MediaWiki is widely used both on the Web, most famously on Wikipedia [1], and in corporate intranets for a huge variety of tasks. In order to support these tasks, more than 1,200 extensions are provided for download from the MediaWiki homepage (http://www.mediawiki.org/wiki/Manual:Extensions). Shortipedia is based on a newly developed extension that can be deployed on top of existing MediaWiki installations. The system builds upon functionality that is provided by a number of existing MediaWiki extensions. We describe both the newly developed software and its relationship to existing components in this section.

The major extension used in Shortipedia is Semantic MediaWiki (SMW). SMW enables adding metadata to wiki pages and querying the collected metadata of the wiki [15]. Due to this functionality, the wiki can also be regarded as a light-weight, schemaless database editable through the wiki user interface. So in addition to the richly formatted, human-readable wiki pages about persons, cities, companies, music bands, etc., the system allows machine-readable metadata to be associated with these pages. The particular strength of SMW is that it closely integrates both aspects, enabling the collaborative maintenance of semantic data as part of the established editing process. Various querying and browsing features provide immediate means for re-using the data within the wiki – a crucial incentive for editors to contribute metadata in the first place. Further details can be found in [15]. Shortipedia builds on the technical infrastructure of SMW but replaces the normal text-based editing interface of MediaWiki (and thus Semantic MediaWiki) with a purely form-based interface. The template syntax of MediaWiki is heavily used to create a wiki that is primarily aimed at editing data instead of text.

Building on this baseline, we already have a system that is known to be scalable with regard to the size of the data and the size of the community it is supposed to handle, and that provides us with all the basic functionality: user management, rights management, an infrastructure for RESTful APIs, complete and rich article history features, and a comprehensive extension framework.

Moreover, SMW is part of a vibrant ecosystem of related extensions that can be used to further extend the services provided by Shortipedia. The functionality provided by available extensions ranges from additional query services, such as the rich query capabilities of Ask the Wiki [13] or Halo (http://smwforum.ontoprise.com/), to support for cross-site data exchange and synchronization, as provided by DataPress (http://projects.csail.mit.edu/datapress/) or Distributed SMW [20]. Most of these extensions can be readily deployed in our scenario, but we focus on the core functionality of Shortipedia within this system description.

For Shortipedia we developed a MediaWiki extension to provide us with the missing features we need for the system. These include the following:

Loading Data from the Semantic Web. Information that is available as Linked Open Data [3] is integrated into the system. This data is then visualized for the end user, and can be mapped to the wiki-internal vocabulary. The information can then be copied into the wiki system while retaining the link to the original source as a reference (a sketch of this step follows the feature list below).

Simple Editing of Facts and References. Shortipedia features a custom user interface, so that no wiki syntax needs to be learned by the user. The new interface completely replaces the classical editing of the wiki's markup language, and the latter is disabled.

Multi-language support. Support for internationalization generally increases usability, but is of particular relevance for establishing an integrated knowledge base like Shortipedia.

Figure 1: The English Shortipedia page for Shanghai (http://shortipedia.org/topic/en/Shanghai), including Wikipedia article text (left), facts gathered by Shortipedia (top right), and related entities on the Web of Data (bottom right)
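To illustrate the data-loading feature above, the following sketch shows how data for an entity can be dereferenced and parsed. It uses the Python rdflib library rather than the ARC2-based PHP code of the actual extension, and it relies on DBpedia answering with RDF via content negotiation:

    from rdflib import Graph, URIRef

    # Dereference a Linked Open Data URI; rdflib issues the HTTP request
    # and parses the RDF that the server returns.
    uri = URIRef("http://dbpedia.org/resource/Shanghai")
    g = Graph()
    g.parse(uri)

    # List the retrieved assertions about the entity, so that a user can
    # inspect them, map their vocabulary, and copy selected facts into
    # the wiki together with a reference to the source.
    for _, p, o in g.triples((uri, None, None)):
        print(p, o)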

Each entity in Shortipedia is represented by a dedicated page, as illustrated in Fig. 1. If available, the page displays the related Wikipedia article for the entity. Wikipedia and DBpedia [4] together have created the largest resource of community-built consensus on the meaning of Web identifiers. For bootstrapping, we assume the mapping that DBpedia provides for Wikipedia articles, i.e. we assume that the Wikipedia article at http://en.wikipedia.org/wiki/X is about the entity referenced by http://dbpedia.org/resource/X.
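Under this bootstrapping assumption the mapping is purely syntactic, as the following sketch (our own illustration) shows:

    import urllib.parse

    def dbpedia_uri(wikipedia_title: str) -> str:
        # The article http://en.wikipedia.org/wiki/X is assumed to be
        # about the entity http://dbpedia.org/resource/X.
        x = wikipedia_title.replace(" ", "_")
        return "http://dbpedia.org/resource/" + urllib.parse.quote(x)

    print(dbpedia_uri("Shanghai"))  # http://dbpedia.org/resource/Shanghai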

Using the DBpedia identifier, we then query the sameAs.org service [12], which, given a URI, provides further URIs that are considered co-referent. Moreover, the Semantic Web search engine Sindice provides entities given a search term, among other services [18]. We thus also query Sindice for data about the title of the entity (the Wikipedia page name). Shortipedia is designed to allow further sources of identity to be incorporated later, e.g., for special fields such as protein research (e.g., through Chem2Bio2RDF [7]) or music (e.g., MusicBrainz [22]).
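A lookup against sameAs.org might look like the following sketch. The JSON endpoint and the response layout (a list of bundles with a "duplicates" field) are assumptions based on our understanding of the service and may differ from its actual interface:

    import json
    import urllib.parse
    import urllib.request

    def coreferent_uris(uri: str) -> list:
        # Ask sameAs.org for URIs considered co-referent with this one.
        url = ("http://sameas.org/json?uri="
               + urllib.parse.quote(uri, safe=""))
        with urllib.request.urlopen(url) as resp:
            bundles = json.load(resp)
        return [u for b in bundles for u in b.get("duplicates", [])]

    for u in coreferent_uris("http://dbpedia.org/resource/Shanghai"):
        print(u)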

To process the data that is retrieved from the Semantic Web, Shortipedia incorporates the RDF parser ARC2 (http://arc.semsol.org/). This allows us to dereference the URIs suggested by sameAs.org and Sindice, and to provide retrieved data to the user upon request. An example is shown in Fig. 1, where the entity retrieved from the New York Times (bottom right) has been expanded. We use AJAX (via the jQuery library, http://jquery.com/) to execute such requests dynamically. This requires the information about the respective entities to be available as Linked Open Data. The system then displays the assertions gathered from the Web of Data, and allows the user to map external vocabulary to Shortipedia vocabulary.


Figure 2: View on DBpedia data on the English Shortipedia page for Shanghai

The respective interface is shown in Fig. 2. Blue terms have been mapped to the internal vocabulary, while red terms still need to be mapped. Mappings are managed globally, so that every external vocabulary URI needs to be mapped only once. When the vocabulary of a fact has been mapped, the user can add it to the Shortipedia knowledge base by clicking on the plus icon in Fig. 2. The provenance trail of the fact is preserved in this operation.
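Schematically, the mapping store and the "add fact" operation behave as in the following Python sketch (the actual extension is written in PHP; names and data structures here are our own):

    # Global table of mappings from external vocabulary URIs to internal
    # Shortipedia URIs; each external URI is mapped only once.
    mappings = {
        "http://dbpedia.org/ontology/country": "sp:country",
        "http://dbpedia.org/resource/China": "sp:China",
    }

    kb = []  # the knowledge base: (subject, property, value, source) rows

    def add_external_fact(subject, prop, value, source):
        # A fact can only be added once its property (and, for non-literal
        # values, the value) has been mapped to the internal vocabulary.
        if prop not in mappings:
            raise ValueError("unmapped property: " + prop)
        if value.startswith("http") and value not in mappings:
            raise ValueError("unmapped value: " + value)
        # The provenance trail is preserved alongside the fact.
        kb.append((subject, mappings[prop],
                   mappings.get(value, value), source))

    add_external_fact("sp:Shanghai",
                      "http://dbpedia.org/ontology/country",
                      "http://dbpedia.org/resource/China",
                      "http://dbpedia.org/resource/Shanghai")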

3. Related Approaches

Various approaches to collecting structured knowledge from volunteers have been studied in prior research. The OpenMind constellation of projects (http://www.openmind.org) has been collecting common sense knowledge (http://openmind.media.mit.edu) to create structured repositories (e.g., http://csc.media.mit.edu/conceptnet). The Cyc FACTory [17] allows contributors to add facts to the Cyc knowledge base [16], but the contributions have to conform to the pre-defined schema. The ESP game (http://www.espgame.org) can also be considered to structure contributions, as it motivates users to provide common labels to pictures through rewards. Our own work on Learner [8, 9, 10] studied the contribution of common knowledge by volunteers, and reported on issues of quality of the content and the design of the user interaction to maximize the breadth of contributions as well as the automated reasoning they enable.

Sig.ma [24] provides an interface to the Web of Data, in which each user can create a live collection of data, pulling it together from different sources, that can be republished in several different ways. Sig.ma does not provide a place for the collaborative curation and collection of data, but gives this power only to the individual user for the creation of their own view on the Web of Data.

There are several efforts to extract structured knowledge from Wikipedia. The Intelligence in Wikipedia project [26, 27, 28, 29, 14] combines automated text extraction with community-created content. Researchers have used structured information from Wikipedia to study relatedness [19], expertise [21], and document indexing [23].

DBpedia [4] is a project to extract structured information from Wikipedia. It provides the Web of Data with a much needed nucleus of stable, Wikipedia-based URIs. DBpedia uses a multitude of automatic and semi-automatic techniques to extract information from Wikipedia articles. Whereas Shortipedia also takes Wikipedia as its starting point for entities, especially for establishing identity, it is not based on an automatic extraction from Wikipedia, but rather on a completely human effort of extracting and curating single facts. Also, DBpedia does not provide for direct editing of its content but relies on respective changes in the original source, i.e. Wikipedia, or in the extraction mechanism. Recently, Ultrapedia (http://wiking.vulcan.com/up/index.php/Main_Page) has been released to provide a wiki-based interface on top of DBpedia data, including the possibility to easily and selectively edit the source Wikipedia text and thus provide a feedback loop to the original source.

Freebase is a Web-based database that enables the creation and editing of data entries for any entity of general interest [6]. While close in spirit, Shortipedia puts a strong emphasis on retaining references for every piece of information. Freebase in turn provides a more polished user interface, but that interface (not the underlying model) is, as of writing, only available in English, whereas Shortipedia was multilingual from the beginning.



The main difference between Shortipedia and Freebase is the completely schemaless environment of Shortipedia, a trait inherited from Semantic MediaWiki. Freebase, on the other hand, uses types and their schemas for entities. This allows for a more supportive interface in Freebase, but at the same time it constrains the use of arbitrary properties that is possible in Shortipedia. We expect this to lead to a more active community with regard to the definition of new properties in Shortipedia, but we also expect that Freebase will in general have more complete and consistent data. Due to the limited release of Shortipedia this hypothesis cannot be tested yet.

Not surprisingly, both Freebase and DBpedia are frequent sources suggested for mappings in Shortipedia, and thus data from both efforts are reused inside the project.

Also, using external data in Semantic MediaWiki is not a new idea [11]. Instead of allowing the tight integration for querying that many of these other solutions provide, Shortipedia rather aims at further integrating external data within the system, and then allows querying this integrated data set. Indeed, Shortipedia is rather different from typical uses of SMW, regarding both the type of information that is managed and the associated workflow and interface. In particular, the active discovery of new data sources, and the integration of data from these sources, distinguishes our approach from more static cross-site data retrieval methods proposed for SMW before.

4. Design Decisions

In this section we describe some key decisions with regard to the design of Shortipedia, and our rationales behind them. We acknowledge that these decisions are not imperative, but could often have been taken differently, resulting in a different system.

Wikipedia and DBpedia for identity. In order to sidestep the core problem of how to establish identity on the Semantic Web, we build upon the work already provided by these two established sources.

Consistency is not required. The system is deliberately designed so that it allows contributors to add data that leads to inconsistencies. Instead of ensuring data quality by enforcing some form of logical consistency or formal constraints, we merely require data sources to be referenced well. We thus exploit the wiki paradigm of community-based editing, which shows that high quality content can be obtained by collaboratively improving individual contributions in an incremental fashion. Although we generally assume these principles to be effective for managing data as well, we expect that communities will develop "gardening" approaches that are specific to the type of content in Shortipedia. For example, possible errors can be detected more easily in machine-readable data using consistency rules, even if the system cannot repair potential problems automatically [25].
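As an example of such a gardening aid, a simple consistency rule could flag subjects for which a property expected to be functional carries several distinct values (a hypothetical sketch, not part of Shortipedia itself):

    from collections import defaultdict

    FUNCTIONAL = {"sp:capital"}  # properties expected to have one value

    def find_conflicts(claims):
        # claims: iterable of (subject, property, value, source) tuples.
        # Collect values per (subject, property) and report those where a
        # functional property has more than one distinct value; a system
        # would flag these for curators rather than repair them.
        seen = defaultdict(set)
        for s, p, v, src in claims:
            if p in FUNCTIONAL:
                seen[(s, p)].add(v)
        return {k: vs for k, vs in seen.items() if len(vs) > 1}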

Triples are immutable. We do not allow facts to be changed. Instead, contributors can add new facts and provide better sources, or they can delete facts. We retain a history of all edits to enable easy recovery from vandalism.

Don't trust the triples. Every fact can be given a reference, and the user can decide which referenced sources to trust. We considered implementing a rating system for triples directly, e.g., users would be able to agree with, disagree with, or like the fact that Paris is the capital of France, but we dropped that idea, since it would be unclear what such an endorsement would even mean (see also the discussion on Wikipedia and democracy [5]). Instead we focus on external references.

sameAs is different. Stating co-reference for two URIs is not regarded as merely another assertion in the system; rather, it is completely materialized – meaning that, once an assertion is added to the system, it does not keep track of the original URIs used to describe the assertion. Deleting the mapping to the external URI will thus not delete the assertions added while the URI was mapped. We do not expect this to become a problem since (1) facts still need to be added manually even though two URIs have been mapped, and (2) a specific external source often uses a specific URI for an entity, and since the source information is retained the URI can then be reconstructed. In order to enable this, the system also enforces that the mapping property is inverse functional, i.e. that the same external URI cannot be mapped to two different URIs in the Shortipedia namespace.
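The inverse-functional constraint amounts to a simple check when a mapping is registered, as in this sketch (our illustration; identifiers are hypothetical):

    # external URI -> internal URI; the mapping property is inverse
    # functional, so an external URI maps to at most one internal URI.
    external_to_internal = {}

    def map_uri(external, internal):
        existing = external_to_internal.get(external)
        if existing is not None and existing != internal:
            raise ValueError(external + " is already mapped to " + existing)
        external_to_internal[external] = internal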

First map, then add. Facts from the Web of Data can only be added to the system after a mapping to internal entities has been established. Both the property and the value (if not a literal) have to be mapped to URIs internal to Shortipedia. This ensures that data integration is part of the editing process, preventing Shortipedia from becoming a mere data aggregator for facts discovered on the Web.

The browser does the work. In order to provide a performant and responsive interface, we delegate much of the processing to the browser. We think that a compelling UI is important to engage a broader community. The downside is that complex client-side processing may exclude some browsers or devices. To address this, we plan to add a simple browse-only interface that cannot be used to edit Shortipedia.

5. Lessons Learnt

This section discusses a number of sometimes unexpected problems that we encountered in the development of Shortipedia.

Data is noisy or hard to understand. The recent explosive growth of the Linked Open Data cloud gave us hope of finding huge amounts of interesting and useful data on the Web – and indeed, huge amounts of data we found. But it is often hard to give a meaningful interpretation to this data, and some datasets, especially automatically extracted ones, contain an unfavorable amount of noise. These challenges point to interesting future work.

What's in a name? Many RDF resources on the Web carry only few labeling annotations, or have them scattered across documents, raising the question of how to display those entities to the end user. Whereas entities, when dereferenced, often do have some labeling annotation for themselves, they do not carry such annotations for the other entities mentioned in the RDF document. A possible solution is to dereference all mentioned entities, giving rise to the next issue.

The Semantic Web is slow. Dereferencing big numbers of entities, e.g. loading several hundred RDF documents just to create a single page, would be unacceptably slow. We thus had to find a viable balance between the number of calls and the additional information we expected from the calls. Another approach was to execute some calls only when explicitly requested by the user instead of providing all relevant data as part of the page that is initially sent to the browser.
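One way to strike this balance is to cache dereferencing results and fall back gracefully when a source is unavailable, as in this sketch (again using Python and rdflib for illustration; the deployed system combines server-side calls with on-demand AJAX requests):

    from functools import lru_cache
    from rdflib import RDFS, Graph, URIRef

    @lru_cache(maxsize=4096)
    def label_of(uri: str) -> str:
        # Dereference an entity at most once per process and remember the
        # result, so that rendering a page does not trigger hundreds of
        # live HTTP requests.
        g = Graph()
        try:
            g.parse(uri)
        except Exception:
            return uri  # source unavailable: fall back to the raw URI
        label = g.value(URIRef(uri), RDFS.label)
        return str(label) if label is not None else uri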

Semantic Web browsers cause problems. We wanted to enable users to browse the referenced data sources, but faced two problems: (1) there are many different Semantic Web browsers, and linking to all of them clutters the interface, and (2) the browsers often did not work for extended periods of time, or with the given data source, or had some other problems. In order to circumvent these problems, we created a web service called the Linked Open Data Browser Switch (http://browse.semanticweb.org), a site that lets the user decide on the browser to use and remembers that decision for the next time.

6. Conclusions

Shortipedia aims to show that (1) it is possible to collect encyclopedic-style structured knowledge in the form of object-property-value triples that can be aggregated to answer structured queries, and (2) that volunteer contributors can integrate and validate a variety of existing sources of such triples. It builds on the widely used Semantic MediaWiki framework, which supports the entry of structured facts about the topic of any given page. We expect Shortipedia to provide valuable feedback and experiences for the development of the scalable and open collaboratively-built knowledge bases of the future.

Acknowledgements

Work presented in this paper has been funded by the EU IST FP7 project RENDER and the National Science Foundation (NSF) under grant number IIS-0948429. We thank the MediaWiki and Semantic MediaWiki communities for their continuous support of the SMW platform and for being the incubator for many of the ideas presented in this work.



References

[1] Phoebe Ayers, Charles Matthews, and Ben Yates. How Wikipedia Works. No Starch Press, San Francisco, CA, October 2008.

[2] Daniel J. Barrett. MediaWiki. O'Reilly, 2008.

[3] Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked Data – the story so far. International Journal on Semantic Web and Information Systems, 2009. Special Issue on Linked Data.

[4] Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. DBpedia – a crystallization point for the Web of Data. Journal of Web Semantics, 7(3):154–165, 2009.

[5] Laura Black, Ted Welser, Jocelyn DeGroot, and Daniel Cosley. "Wikipedia is not a democracy": Deliberation and policy-making in an online community. In Annual Meeting of the International Communication Association, Montreal, Quebec, Canada, May 2008.

[6] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD'08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250, New York, NY, USA, 2008. ACM.

[7] Bin Chen, Ying Ding, Huijun Wang, D.J. Wild, Xiao Dong, Yuyin Sun, Qian Zhu, and M. Sankaranarayanan. Chem2Bio2RDF: a linked open data portal for systems chemical biology. In Nick Cercone and Jimmy Huang, editors, Proceedings of the International IEEE/WIC/ACM Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Toronto, Canada, August 2010.

[8] Tim Chklovski. Designing interfaces for guided collection of knowledge about everyday objects from volunteers. In Proceedings of the 2005 International Conference on Intelligent User Interfaces (IUI'05), San Diego, CA, January 2005.

[9] Tim Chklovski and Yolanda Gil. An analysis of knowledge collected from volunteer contributors. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI'05), Pittsburgh, PA, July 2005.

[10] Tim Chklovski and Yolanda Gil. Towards managing knowledge collection from volunteer contributors. In Proceedings of the 2005 AAAI Spring Symposium on Knowledge Collection from Volunteer Contributors (KCVC), Stanford, CA, March 2005.

[11] Basil Ell. Integration of external data in semantic wikis. Master's thesis, Hochschule Mannheim, 2009.

[12] Hugh Glaser, Afraz Jaffri, and Ian Millard. Managing co-reference on the Semantic Web. In Linked Data on the Web (LDOW'09), Madrid, Spain, April 2009.

[13] Peter Haase, Daniel M. Herzig, Mark Musen, and Duc Thanh Tran. Semantic wiki search. In 6th Annual European Semantic Web Conference (ESWC'09), Heraklion, Crete, Greece, volume 5554 of LNCS, pages 445–460. Springer Verlag, June 2009.

[14] Raphael Hoffmann, Saleema Amershi, Kayur Patel, Fei Wu, James Fogarty, and Daniel S. Weld. Amplifying community content creation with mixed initiative information extraction. In Proceedings of the 27th International Conference on Human Factors in Computing Systems (CHI'09), Boston, MA, April 2009.

[15] Markus Krötzsch, Denny Vrandečić, Max Völkel, Heiko Haller, and Rudi Studer. Semantic Wikipedia. Journal of Web Semantics, 5:251–261, September 2007.

[16] Douglas B. Lenat and R.V. Guha. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley Publishing Company, Reading, Massachusetts, 1989.

[17] Cynthia Matuszek, Michael J. Witbrock, Robert C. Kahlert, John Cabral, David Schneider, Purvesh Shah, and Douglas B. Lenat. Searching for common sense: Populating Cyc from the Web. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI'05), Pittsburgh, PA, July 2005.

[18] Eyal Oren, Renaud Delbru, Michele Catasta, Richard Cyganiak, Holger Stenzhorn, and Giovanni Tummarello. Sindice.com: a document-oriented lookup index for open linked data. International Journal of Metadata, Semantics and Ontologies, 3(1), 2008.

[19] Simone Paolo Ponzetto and Michael Strube. Knowledge derived from Wikipedia for computing semantic relatedness. Journal of Artificial Intelligence Research, 30(1), 2007.

[20] Charbel Rahhal, Hala Skaf-Molli, Pascal Molli, and Stéphane Weiss. Multi-synchronous collaborative semantic wikis. In Gottfried Vossen, Darrell D. E. Long, and Jeffrey Xu Yu, editors, Proceedings of the International Conference on Web Information Systems Engineering (WISE'09), volume 5802 of LNCS, Poznan, Poland, October 2009.

[21] Terrell Russell, Bongwon Suh, and Ed Chi. A comparison of generated Wikipedia profiles using social labeling and automatic keyword extraction. In Fourth International Conference on Weblogs and Social Media (ICWSM'10), Washington, DC, May 2010.

[22] Aaron Swartz. MusicBrainz: a semantic web service. IEEE Intelligent Systems, 17(1):76–77, 2002.

[23] Zareen Syed, Tim Finin, and Anupam Joshi. Wikipedia as an ontology for describing documents. In Second International Conference on Weblogs and Social Media (ICWSM'08), Seattle, WA, March 2008.

[24] Giovanni Tummarello, Richard Cyganiak, Michele Catasta, Szymon Danielczyk, Renaud Delbru, and Stefan Decker. Sig.ma: live views on the Web of Data. Journal of Web Semantics, 2010.

[25] Denny Vrandečić. Ontology Evaluation. PhD thesis, KIT Karlsruhe Institute of Technology, Karlsruhe, Germany, 2010.

[26] Daniel S. Weld, Fei Wu, Eytan Adar, Saleema Amershi, James Fogarty, Raphael Hoffmann, Kayur Patel, and Michael Skinner. Intelligence in Wikipedia. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI'08), Chicago, IL, July 2008.

[27] Fei Wu, Raphael Hoffmann, and Daniel S. Weld. Information extraction from Wikipedia: moving down the long tail. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, August 2008.

[28] Fei Wu and Daniel S. Weld. Autonomously semantifying Wikipedia. In Proceedings of the ACM Sixteenth Conference on Information and Knowledge Management (CIKM'07), Lisbon, Portugal, November 2007.

[29] Fei Wu and Daniel S. Weld. Automatically refining the Wikipedia infobox ontology. In Proceedings of the 17th International Conference on World Wide Web (WWW'08), Beijing, China, April 2008.
