
Expertise classification of recommenders in the Wikipedia Recommender System

Povilas Pilkauskas

Kongens Lyngby 2010
IMM-M.Sc-2010-77

Technical University of Denmark
Informatics and Mathematical Modelling
Building 321, DK-2800 Kongens Lyngby, Denmark
Phone +45 45253351, Fax +45 [email protected]

Abstract

Wikipedia is a well-known online encyclopedia which is open to everyone. It is based on a collaborative authoring principle, which gives great value to the online encyclopedia. As a result, Wikipedia has over three million articles in English. Nowadays the growth in the number of articles is slowing, but it still remains stunning. Wikipedia attracts millions of visitors, who are allowed to read, edit or create new articles. This openness is one of the crucial properties of Wikipedia, and it has ambiguous implications. Being open to everyone makes Wikipedia the biggest online encyclopedia; on the other hand, it leads to some negative features: the information in the articles may suffer from compromised integrity, erroneous facts and personal opinion propagation.

The Wikipedia Recommender System (WRS) is a collaborative filtering system that makes it easy to determine the quality of an article and allows users to submit ratings and categories. The WRS includes trust metrics, which allow any user to calculate a decentralized trust profile for any other user. Once article classification was implemented in the WRS, the classification scheme started playing a vital role in determining a user's area of expertise. User expertise is expressed by assigning a different trust value to each user for each top-level category. This improves the credibility of users' ratings and implies enhanced article quality.

The goal of this Master's thesis is to determine whether people agree about the classification of articles in Wikipedia. To evaluate people's agreement on the classification of Wikipedia articles, a survey was arranged. It investigates four different information classification schemes: Citizendium, Dewey Decimal Classification, Open Directory Project (Dmoz) and top-level Wikiportals. The


purpose of the survey is to determine the scheme that enables people to classify articles most consistently, with the highest level of agreement.

Preface

This thesis was prepared at the Department of Informatics and Mathematical Modelling, the Technical University of Denmark, in partial fulfillment of the requirements for acquiring the M.Sc. degree in engineering. It was carried out in the period from March 10th, 2010 to September 10th, 2010. The thesis has been written under the supervision of Associate Professor Christian Damsgaard Jensen.

I would like to take this opportunity to thank my supervisor for his help and guidance. I would also like to thank Thomas Lefevre for providing useful information about the Wikipedia Recommender System. My special thanks go to my friends and colleagues. Finally, thanks to my girlfriend and family for their support and encouragement.

Lyngby, September 2010

Povilas Pilkauskas
s081544


Contents

Abstract

Preface

1 Introduction
1.1 Introduction
1.2 Wikipedia Recommender System
1.3 Improvements of Wikipedia Recommender System
1.4 Structure of thesis
1.5 Terms definitions

2 State of the Art
2.1 Wikipedia Recommender System
2.2 Classification schemes
2.3 Impact of Classification on Trust Dynamics
2.4 Summary

3 Analysis
3.1 The criteria of appropriate scheme
3.2 The metrics of classification scheme
3.3 Survey legitimacy
3.4 The results of the survey
3.5 Evaluation
3.6 Summary

4 Implementation
4.1 WRS overview
4.2 Applet interface
4.3 Rating Module
4.4 Trust Module
4.5 Summary

5 Evaluation
5.1 Overhead caused by the WRS
5.2 Limitation of WRS
5.3 Summary

6 Conclusion
6.1 Wikipedia Recommender System
6.2 Summary of Conclusion
6.3 Future works and research

A DDC as automatic classification

List of Tables

2.1 Interaction with weights value
2.2 Citizendium workgroups
2.3 Dewey Decimal Classification classes
2.4 Open Directory Project top-level categories
2.5 Wikipedia top-level portals
2.6 Implemented properties in trust evolution function of WRS

3.1 Possible answers into evaluation questions
3.2 The value of answers
3.3 Online schedule for schemes
3.4 People experience scores for different questions to all classification schemes

5.1 Page loading time with/without WRS


List of Figures

2.1 Flowchart for HTML document in WRS plugin
2.2 Types of classification
2.3 Flat and hierarchical classification
2.4 Trust Model proposed by Jonker and Treur
2.5 Example how n influences the curvature of curve
2.6 The trust evolution function in WRS with n = 2

3.1 The example of Experience value calculation
3.2 Articles classification results with Citizendium
3.3 Answers into questions about Citizendium
3.4 Articles classification results with Dewey Decimal Classification
3.5 Answers into questions about Dewey Decimal Classification
3.6 Articles classification results with Open Directory Project
3.7 Answers into questions about Open Directory Project
3.8 Articles classification results with Wikiportals
3.9 Answers into questions about Wikiportals
3.10 Total results

4.1 WRS plugin components
4.2 WRS Applet

5.1 Loading times


Chapter 1

Introduction

1.1 Introduction

Wikipedia is a well-known online encyclopedia which is open to everyone. Everybody can not only read articles, but also edit existing articles or add their own article on a certain topic. This collaborative authoring principle gives great value to the online encyclopedia. As a result, Wikipedia has 3,362,133¹

articles in English, and there are also more than 800,000 articles in other languages. Nowadays the growth in the number of articles is slowing, but it still remains stunning. Wikipedia attracts millions of visitors, because it is fully integrated with the biggest search engines (Google, AltaVista, Lycos, etc.) and the online encyclopedia contains a wide variety of articles. Visitors of Wikipedia can read, edit or create new articles.

The possibility to create or edit an article is open to everyone. This is one of the crucial statements describing Wikipedia, and it has ambiguous implications. Being open to everyone makes Wikipedia the biggest online encyclopedia in the world; on the other hand, it enables some negative features: the information in the articles may suffer from compromised integrity, erroneous facts and personal opinion propagation. Therefore, the wiki,

¹ http://en.wikipedia.org/wiki/Special:Statistics


or collaboratively written, approach on which Wikipedia is based is associated with the following risks [10]:

Accuracy. The absence of accuracy makes people hesitate about the trustworthiness of information. Misinformation has negative value and an inglorious reputation among Wikipedia users.

Motives. The motives for contributing articles are unknown. They may be benevolent, political, commercial, humoristic or even vandalistic.

Uncertain Expertise. Some Wikipedia users contribute articles exceeding their expertise and submit hearsay, rumors, speculation or other misinformation. There are difficulties in determining contributors' qualifications; moreover, since contributors appear under pseudonyms, it is difficult to check their credentials and sources.

Volatility. Contributions and edits can be contradicted by future contributions or corrections. This volatility may cause ambiguity for citation: should a quotation refer to the version that was read or to the latest version, which might have changed significantly?

Sources. There is an insufficient quantity of independent citations. Minor articles contain citations to works that are not digitized and stored on the open Internet.

These negative features reduce Wikipedia's credibility and pose new challenges. The Wikipedia Recommender System is one response to these challenges, aiming to establish more trust in Wikipedia.

1.2 Wikipedia Recommender System

A number of projects have attempted to establish more trust in Wikipedia articles. WikiTrust is one of them. Its crucial idea is to analyze the text of a Wikipedia article as well as the article's recent edit history and to derive some measure of trust from this information. However, a stable text does not guarantee a higher-quality article.

An alternative approach to establishing more credibility for Wikipedia articles is the Wikipedia Recommender System (WRS). The system acts as a mediator between the user and Wikipedia, helping the user assess the credibility of an article. The WRS is a rating-based system which employs trust values.


Each user of the WRS has his/her own trust profile, which is calculated according to the user's ability to classify articles. Employing trust profiles also gives the benefit of weighting the ratings: a user with a higher trust value has greater influence on the overall article rating than a user with a lower value. This feature can also prevent attacks such as promoting a specific item or article. A single attack on the trust metric requires the attacker to be trusted by other users, which means the attacker would have to know the preferences of users in order to gain trust. However, once the malicious purpose comes into the open, the other users can easily distrust the attacker, which means complete isolation.

The usage of trust metrics does not cause any inconvenience for the user: he/she does not have to bother with WRS configuration or other concepts [9]. The trust profile is calculated by the system according to the user's ability to classify articles. The top-level categories of the Dewey Decimal Classification are used as the choices for classifying articles. The classification scheme plays a vital role in determining a user's area of expertise. User expertise is expressed by assigning a different trust value in each top-level category. This improves the credibility of the user's ratings and implies enhanced article quality.

The WRS is closely related to MediaWiki. Article ratings and trust values are uploaded to and stored in the central repository of Wikipedia automatically by the WRS. The WRS itself is a plugin for the Scone proxy server, which allows manipulation of HTML documents and manages users' trust and article ratings.

1.3 Improvements of Wikipedia Recommender System

To calculate the trust value for each Wikipedia category, it is necessary to determine which information classification scheme suits best. Thomas Lefevre, in his Master's thesis, reviewed the most prominent classification schemes: Library of Congress Classification, Universal Decimal Classification and Dewey Decimal Classification. A small-scale survey was held analyzing only the Dewey Decimal Classification scheme. This scheme was evaluated only by the majority aspect: if a large enough percentage of users assign the same category to an article, the scheme is considered appropriate for classification. However, the fact that a majority of people assign the same category to an article is not sufficient to evaluate the scheme's feasibility for classifying Wikipedia articles. Therefore, this Master's project reviews and investigates possible information classification schemes with the goal of determining whether people agree about the classification of Wikipedia articles. Moreover, the thesis


analyses approaches to evaluating classification schemes. For this purpose, metrics are utilized. These metrics enable us to objectively determine the most suitable classification scheme for the WRS.

Determining the best classification scheme for the WRS requires as large a survey as possible. Its results should clearly indicate the classification scheme with which people can classify Wikipedia articles most reliably. We provide a survey which analyzes four classification schemes, Citizendium, Dewey Decimal Classification, Open Directory Project (Dmoz) and Wikiportals, applied to articles on different topics. The schemes differ not only in the number of categories but also in the perspective from which they perceive the whole of information.

The classification of information plays an important role in information management and retrieval tasks. Good information management is ensured by selecting a proper classification scheme. In order to evaluate the feasibility of adopting a classification scheme for the WRS, the use of metrics should be considered. However, classification schemes do not come with any means to evaluate their own appropriateness. Therefore, metrics for evaluating an information scheme should be created according to the eligibility criteria for the WRS classification scheme.

There are three very important and related things for establishing more trust in Wikipedia articles: the classification scheme, trust values and article ratings. The better the classification scheme used for article classification, the more precise the trust value that can be calculated for a user. A precise trust value per user provides article ratings with great accuracy; therefore, the general credibility of Wikipedia articles is enhanced. This is the overall model for establishing more trust in Wikipedia, in which the WRS plays a great role.

1.4 Structure of thesis

Chapter 1 (Introduction) contains an introduction to the credibility issue which Wikipedia is facing. The chapter also reviews the WRS, explaining the importance of the classification scheme to the user trust profile.

Chapter 2 (State of the art) reviews the information classification schemes which are used in the analysis. A general discussion of trust dynamics and its influence on the WRS is also presented.

Chapter 3 (Analysis) discusses the criteria which a classification scheme should fulfill in order to be used in the WRS. The description of the metrics, which


evaluate the classification schemes, is set out. The analysis of the survey results is presented with explanations.

Chapter 4 (Implementation) describes the updates which were made in order to change the classification scheme in the WRS. The chapter also reviews the WRS and the edits made to it to accommodate Wikipedia's changed API.

Chapter 5 (Evaluation) discusses the results achieved by measuring the additional time imposed by the WRS.

Chapter 6 (Conclusion) reviews the contributions made in the thesis and indicates future work to make the WRS a more sophisticated system.

1.5 Terms definitions

Definitions of the terms used in this thesis:

Wikipedia. Wikipedia is an online encyclopedia; in the rest of the thesis the term refers to the English Wikipedia found at http://en.wikipedia.org.

WRS. The Wikipedia Recommender System is a rating-based system that provides recommendations to users.

Article. This term refers to a Wikipedia article. An article describes some information in various media in order to inform or to discuss current news and general interests.

The trustor. The trustor is a user who uses the WRS in order to obtain recommendations about articles in Wikipedia. He/she benefits from the recommendations, which are assigned by other users.

The trustee. The trustee refers to any WRS user other than the trustor.

Trust profile. In the WRS each user has a trust profile. It stores information about how many interactions a user has had and the experience gained from them. Moreover, it records whether the user is optimistic or cautious. All the information stored in the trust profile is calculated into the trust value.

Trust value. The trust value is a value between -1 and +1 expressing how much one user trusts another.


Classification schemes. A classification scheme is an approach to classifying information. Classification schemes contain different categories which correspond to different sorts of information. This term refers to one of the classification schemes analyzed in the thesis (Citizendium, Dewey Decimal Classification, Dmoz, Wikiportals).

Rating. The rating is a figure from 1 to 9 that can be assigned to a Wikipedia article by the trustor or a trustee using the WRS.

Annotation. The annotation consists of the rating and the category assigned to a Wikipedia article. Annotations are stored in the central repository of Wikipedia.

Category. A category refers to one of the top-level classes in a classification scheme.

Chapter 2

State of the Art

In this chapter the recent achievements of the Wikipedia Recommender System are introduced, the fundamentals of classification schemes are described, and trust dynamics, together with the impact of classification, are reviewed.

2.1 Wikipedia Recommender System

2.1.1 WRS as a tool for credibility

The Wikipedia Recommender System was developed to establish more trust in Wikipedia articles by helping users determine the credibility of an article. Readers of Wikipedia are provided with the ratings submitted by previous users. Ratings submitted without using the WRS are considered worthless. The WRS is a rating-based collaborative filtering system that uses trust metrics [9]. Traditional collaborative filtering systems suffer from a personalization issue: the usefulness of the system varies on a per-user basis, and while a particular filter is useful for one user it might give less value to another. Therefore, a trust metric is employed. The trust profile is unique to each user and contains the trust values for each category of the classification scheme. Moreover, the trust profile is maintained locally on the individual user's


computer. When an article is being viewed, the system searches for users who have rated the article, calculates the average of the ratings and determines the category assigned to the article by the majority of readers. After the article is rated, the WRS stores the data in the central repository, which is available to everyone; furthermore, the trust profile is updated depending on the ratings assigned by the user and the ratings which were assigned previously. The recommendations are stored in user subpages mirroring the Wikipedia articles [9]; for example, the ratings for the article http://en.wikipedia.org/wiki/Rotavirus are stored at http://en.wikipedia.org/wiki/User:Recommendations/Rotavirus.

Using ratings and categories as the basis for user credibility evaluation in the WRS, two metrics are introduced. The rating metric assesses the quality of the writing and the structure of an article, whereas the category metric assesses a user's understanding of an article. The rating metric is considered the primary metric, because what matters most is that an article is well written and structured, while the category metric is the secondary metric, which depends on an individual user's abilities, for instance perception of the information.

With two metrics, the rating metric as the primary and the category metric as the secondary, the combination of metrics becomes complex. Five different situations, shown in Table 2.1, can have great influence on the variation of a user's trust value. It is worth noticing that majority means the category supported by the most people, while minority means a category assigned to a particular article by only a few of them.

Weight    Rating      Category
  0.5     Agree       Disagree
  1       Agree       Agree
 -1       Disagree    Agree
 -1.5     Disagree    Disagree with majority
 -0.5     Disagree    Disagree with minority

Table 2.1: Interaction with weights value

The situation where the user agrees on the rating but disagrees on the category can be explained by the user having misunderstood the article; however, since the primary metric matches the majority rating value, the entire interaction is still positive, with value 0.5. Disagreement on both the rating and the majority category reveals that the user has misjudged the area of expertise; therefore, the user is penalized with -1.5 on the trust value.
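As a minimal sketch, the weighting rules of Table 2.1 can be expressed as a simple lookup. The function name and boolean parameters below are illustrative, not taken from the WRS implementation:

```python
# Hypothetical sketch of the Table 2.1 weighting logic; names are
# illustrative and do not come from the WRS source code.

def interaction_weight(rating_agrees: bool,
                       category_agrees: bool,
                       disagrees_with_majority: bool = True) -> float:
    """Map one (rating, category) interaction to a trust-value weight.

    `disagrees_with_majority` only matters in the double-disagreement
    case: disagreeing with the majority category is penalized harder
    (-1.5) than disagreeing with a minority category (-0.5).
    """
    if rating_agrees and category_agrees:
        return 1.0
    if rating_agrees and not category_agrees:
        return 0.5
    if not rating_agrees and category_agrees:
        return -1.0
    # rating and category both disagree
    return -1.5 if disagrees_with_majority else -0.5
```

The asymmetry between 0.5 and -1.5 reflects that the rating metric is primary: agreeing on the rating keeps the interaction positive even when the category is wrong.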

The trust metric embedded in the WRS enables us to determine a trust value for each category. This implies that the trust value has an impact


on the weighting of the ratings. The more trusted a user is within a particular category, the greater impact the user should have on the rating of articles belonging to that category. The weighting formula is:

c_j = ( Σ_{i=0}^{n} t_i · c_{i,j} ) / ( Σ_{i=0}^{n} t_i ),   t_i > 0.0

where t_i denotes the trust value of user i for category j and c_{i,j} denotes the choice of category j by user i.
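One possible reading of this formula in code, restricted to users with positive trust as the condition t_i > 0.0 requires; the function name and the zero-score fallback are assumptions, not part of the WRS:

```python
# Illustrative implementation of the weighting formula:
# c_j = (sum_i t_i * c_{i,j}) / (sum_i t_i), considering only t_i > 0.

def weighted_category_score(trust_values, category_choices):
    """trust_values[i] is user i's trust value t_i; category_choices[i]
    is c_{i,j}, e.g. 1.0 if user i chose category j, else 0.0."""
    pairs = [(t, c) for t, c in zip(trust_values, category_choices) if t > 0.0]
    total_trust = sum(t for t, _ in pairs)
    if total_trust == 0.0:
        return 0.0  # no positively trusted raters: no evidence for category j
    return sum(t * c for t, c in pairs) / total_trust
```

With trust values [0.5, 0.25, -0.3] and choices [1.0, 0.0, 1.0], the distrusted third user is ignored and the score is 0.5 / 0.75, i.e. about 0.67.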

2.1.2 WRS as a part of Scone

Scone is a programmable proxy which provides a convenient infrastructure for developing Web enhancements. The main advantage of Scone is that it allows dynamic enhancements in the form of Scone plugins. Scone is based on IBM's Web Intermediaries (WBI). The Scone proxy contains four components, each of which provides a tool for a developer:

• Proxy. The only component used in the WRS. It allows the interception and manipulation of HTML documents.

• NetObjects. It saves data related to the proxy, such as URIs, HTML documents, servers, links, users and their activities.

• Robot. A Web "assistant" which maintains a database of links and documents based on their attributes and provides the possibility of filtering them.

• Access Tracking. It is responsible for tracking users' actions and generating events.

2.1.2.1 WRS plugin

The WRS plugin is the main entry point for the HTML stream [9], from which the remaining utilities are invoked in the proper sequence. First, it checks whether the page belongs to the http://en.wikipedia.org domain; if the page is outside that domain, it is displayed to the user without any changes. However, if the WRS plugin determines that the page belongs to http://en.wikipedia.org, the following actions are performed:

1. Extraction of the annotations (ratings, categories, etc.) for the article.


2. Processing of the annotations, retrieving the relevant information.

3. Determination whether the user has rated the article before.

(a) If the user has rated the article before, the user is removed from that set of annotations.

(b) Updating the trust value according to the extracted annotations.

4. Calculation of the score and article category according to the ratings.

5. Insertion of the score and category into the HTML document.

6. Delivery of the HTML document to the user's browser.

When this sequence of actions is completed, the user can read the article and use the applet to rate it. After the user gives feedback, the WRS prepares the annotation data and uploads it. The trust value is updated according to the extracted annotations.
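The action sequence can be sketched roughly as follows. All names and data shapes here are invented for illustration, and several steps (the applet, trust update and upload) are simplified away:

```python
# Hypothetical, simplified sketch of the WRS plugin's per-page sequence.
from dataclasses import dataclass

WIKIPEDIA_DOMAIN = "en.wikipedia.org"

@dataclass
class Annotation:
    user: str
    rating: int      # 1..9, as defined in section 1.5
    category: str

def process_page(url: str, html: str, current_user: str,
                 annotations: list) -> str:
    # Pages outside the Wikipedia domain pass through unchanged.
    if WIKIPEDIA_DOMAIN not in url:
        return html
    # 3a. If the user rated the article before, drop their own annotation.
    others = [a for a in annotations if a.user != current_user]
    if not others:
        return html
    # 4. Average rating and majority category.
    score = sum(a.rating for a in others) / len(others)
    cats = [a.category for a in others]
    category = max(set(cats), key=cats.count)
    # 5. Insert the score and category into the HTML document.
    return html + f"<!-- WRS score={score:.1f} category={category} -->"
```

For two other users rating an article 8 and 6 in the category "Science", the page would be annotated with a score of 7.0 and the majority category "Science".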

The WRS plugin can be divided into one main governing module (the WRS plugin itself) and three utility modules with specific responsibilities. The flow of an HTML document is depicted in Figure 2.1.

Figure 2.1: Flowchart for HTML document in WRS plugin
(The figure shows the Page Module, Rating Module and Trust Module and their components: Page Modifier, Page Extractor, Rating Extractor, Rating Calculator, Session Ratings, Recommendation Submitter, Trust Updater and Web of Trust, together with the flow of the HTML document, recommendations and feedback between Wikipedia, the plugin and the user.)

The responsibilities of each module in the WRS plugin are:

Page Module. This module is responsible for extracting the annotations of a particular article. Furthermore, when the version number of the article


is extracted, the applet, score and category need to be inserted into the HTML document. Finally, when the user gives feedback on the article, the annotations are prepared and inserted into the central repository located on Wikipedia.

Rating Module. This module is responsible for storing, calculating and providing the rating data in a particular structure. The data structure for ratings contains the necessary information and the methods for utilizing it. Additionally, it facilitates the representation of categories in the WRS. The rating module provides a container to aggregate score and category data. Besides collecting a subset of all rating information, the container calculates the sum of the aforementioned experiences modified by time decay [9] [7]. This value is aggregated into the trust value.

Trust Module. This module is responsible for the data structure of the Web of Trust. It is a simple data structure with insertion of ratings and retrieval of trust values. Each user has Web of Trust data, where the trust values and the Ring of Review are stored. The Web of Trust data structure is important for the trust evolution function to update a user's trust value.
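A minimal sketch of such a Web of Trust structure is given below, assuming a simple linear update for illustration; the real WRS uses the trust evolution function discussed in section 2.3, and all names here are invented:

```python
# Hypothetical sketch of a per-user, per-category Web of Trust with
# insertion of weighted interactions and retrieval of trust values.
# The linear 0.1-step update is an assumption for illustration only;
# the WRS applies a trust evolution function instead.

class WebOfTrust:
    def __init__(self):
        self._trust = {}   # (trustee, category) -> trust value in [-1, +1]

    def insert_rating(self, trustee: str, category: str, weight: float):
        """Fold one weighted interaction (cf. Table 2.1) into the trust
        value, clamping to the [-1, +1] range used by the WRS."""
        key = (trustee, category)
        updated = self._trust.get(key, 0.0) + 0.1 * weight
        self._trust[key] = max(-1.0, min(1.0, updated))

    def trust_value(self, trustee: str, category: str) -> float:
        """Unknown (trustee, category) pairs default to neutral trust 0.0."""
        return self._trust.get((trustee, category), 0.0)
```

Keying the store by (trustee, category) is what lets the WRS hold a different trust value per top-level category for the same user.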

2.1.3 WRS as a decentralized trust propagation system

Nowadays, when decentralized computing methods like P2P and Grid are popular, the decentralized annotation propagation approach is becoming increasingly interesting for recommender systems [9]. The WRS is a decentralized recommender system where each user is responsible for managing his own recommendation data. In a centralized recommendation system, by contrast, a single entity is responsible for establishing the recommendations. Centralized systems store the annotations in a central, easy-to-access location, whereas annotation storage varies in decentralized systems. The trust profile in a centralized system is constantly updated when new annotations are uploaded, whereas in a decentralized system each entity has to check the repository for new annotations. Moreover, the history of an entity's actions has to be stored within the entity. The history helps a decentralized entity check previously rated items for new annotations, making the entity's trust profile more mature [9].

Trust propagation in a decentralized system requires the information to be distributed amongst the users. According to information privacy policies, the uploaded annotations do not contain any trust information; they need to be processed by each entity in order to obtain trust information. Due to these privacy policies, sharing of information becomes problematic, because the recommender systems are not symmetric in terms of trust.


2.2 Classification schemes

Classification plays a vital role in many tasks of information management and retrieval [12]. The digital classification of information is essential because it assists the development of web directories, contextual advertising and the improvement of Web search quality. Each classification scheme produces a different approach to perceiving the whole of information. There are thousands of schemes and systems for classifying information, with different purposes and ways of adaptation. The most popular and well known are the Dewey Decimal Classification, the Universal Decimal Classification and the Library of Congress Classification. These classification schemes provide frameworks for organizing information according to their categories. When arranging or classifying material (books, etc.) or information according to one of these systems, it is worth knowing that there cannot be a single best classification scheme. Regardless of how good or popular a classification system is, some adaptation or investigation of its suitability for a particular purpose has to be accomplished.

The general problem of Web page information classification can be split into more detailed problems [12]:

• Subject classification

• Functional classification

• Sentiment classification

• Other types of classification

Subject classification analyzes to what subject or topic the information on a Web page can be allocated. Functional classification is concerned with the role of the Web page; a Web page can serve different purposes, for instance a "personal page", "course page" or "registration page". Sentiment classification is based on the author's opinion about a particular theme. The other types of classification focus on the genre of the information.

According to how many classes a scheme contains, classification can be divided into binary classification (Figure 2.2a) and multiclass classification. Binary classification sorts an instance into one of two classes, whereas a multiclass classification scheme contains more than two classes. According to the number of classes (categories) which can be assigned to an instance, classification schemes can be divided into single-label and multi-label classifications. In single-label classification, only one class can be assigned to


an instance, whereas multi-label classification allows more than one class per instance. Multiclass classification can be either single-label or multi-label. In single-label multiclass classification only one class can be assigned to an instance (Figure 2.2b), whereas in multi-label multiclass classification an instance can belong to one or many classes (Figure 2.2c). There are two types of class assignment: soft classification and hard classification. In soft classification the prediction assigns the instance to classes with corresponding likelihoods (Figure 2.2d), whereas in hard classification an instance either is or is not in a class, with no intermediate state.
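The soft/hard distinction can be illustrated with a toy example; the class names and raw scores below are made up:

```python
# Toy illustration of soft vs. hard class assignment: a soft classifier
# returns a likelihood per class, a hard classifier commits to membership.

def soft_classify(scores: dict) -> dict:
    """Normalize raw per-class scores into likelihoods that sum to 1."""
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

def hard_classify(scores: dict) -> str:
    """Commit to the single most likely class (single-label assignment)."""
    return max(scores, key=scores.get)
```

For raw scores {"Science": 3.0, "Arts": 1.0}, the soft classifier reports likelihoods 0.75 and 0.25, while the hard classifier simply answers "Science" with no intermediate state.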


Figure 2.2: Types of classification

Web page information classification can also be divided into two types according to the organization of categories: flat classification (Figure 2.3a) and hierarchical classification (Figure 2.3b). Flat classification organizes the categories in parallel, meaning that no category subsumes another, whereas in hierarchical classification the categories are structured as a hierarchical tree.


Analyzing the particular case of WRS, where the information classification scheme is used as a tool in forming a per-category trust value for a user, it is necessary to investigate how much users agree on the classification of articles. Therefore, the investigation compares several schemes with the purpose of selecting the best of the set: Citizendium, the Dewey Decimal Classification, the Open Directory Project and Wikiportals. The Dewey Decimal Classification is the most popular and well-known scheme; its biggest advantages are that it is continuously updated and that it uses numerical notation for meaningful division. The Citizendium top-level workgroups scheme is distinguished by its conciseness: it contains only 6 top-level categories. The Open Directory Project (Dmoz) offers the biggest and most widely distributed database of classified Web content; according to it, all content can be assigned to one of 15 top-level categories, which cover the whole of information from a modern point of view. The last classification scheme is Wikiportals: the 12 top-level portals in Wikipedia that classify all the content of the online encyclopedia. To find out which classification scheme suits best, a survey on how people are able to classify Wikipedia's articles was conducted.


Figure 2.3: Flat and hierarchical classification


2.2.1 Citizendium

Citizendium, which stands for "citizens' compendium of everything", is an open wiki project with the purpose of creating a repository for sharing knowledge. The Citizendium project was launched on March 25, 2007 and currently encompasses 13,914 articles. The crucial idea of Wikipedia and Citizendium is the same: the articles are written collaboratively, an approach known as wiki [5]. Many people would therefore ask what the differences between these two online encyclopedias are, and why it was necessary to create Citizendium, since Wikipedia is obviously the biggest and has no equal rival at the moment. The necessity of creating Citizendium becomes clear after looking at its fundamental ideas and principles. The intention of the Citizendium project is to share knowledge that has some kind of approval from specifically skilled people. Articles may be developed and revised by Citizendium participants, but subject experts make the final decision on the default article that is published. Approved articles are locked and cannot be edited directly, but there is still an opportunity to edit them in draft mode.

Using real names on Citizendium makes it reasonable to expect people to behave more responsibly when their name is published together with what they have written. In this way people are impelled to think about their contributions more responsibly, because it is always possible to identify Citizendium users. All the policies established on Citizendium are enforced by users known as constables.

Articles in Citizendium are assigned not to categories or projects, but to workgroups. Workgroups are employed to classify the information on Citizendium into six different areas, listed in table 2.2. These workgroups are also used in the survey as one of the four classification schemes.

Natural Sciences    Social Sciences              Humanities
Arts                Applied Arts and Sciences    Recreation

Table 2.2: Citizendium workgroups

2.2.2 Dewey Decimal Classification

The Dewey Decimal Classification (DDC) is a means for general knowledge organization that is steadily revised in order to keep pace with knowledge. DDC was devised by Melvil Dewey in 1876. Nowadays, DDC is the most widely used classification scheme in the world. Its biggest advantage is that it is continuously updated by the Dewey editorial office. Moreover, an idiosyncrasy of the classification is its use of Arabic numerical notation for meaningful division. A well-developed hierarchy with well-defined classes and wide coverage of topics ensure that DDC is a sophisticated tool for the classification of information.

The conceptual framework of DDC consists of 10 main top-level categories, shown in table 2.3 and known as the "DDC Summaries" [6]. Each main class is divided into ten divisions, each of which in turn consists of ten sections. The DDC scheme can thus be pictured as a tree in which each leaf is subordinate to its parent.

000 Computer science, information and general works

100 Philosophy and Psychology

200 Religion

300 Social sciences

400 Language

500 Science

600 Technology

700 Arts and recreation

800 Literature

900 History and geography

Table 2.3: Dewey Decimal Classification classes
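The numerical hierarchy can be sketched in a few lines; the function below is our own illustration (not from the thesis) of how a three-digit DDC notation encodes its main class and division arithmetically.

```python
# Illustrative helper (name is ours): each DDC main class spans one hundred
# numbers and each division ten, so both levels can be read off arithmetically.
def ddc_levels(notation: int) -> tuple[int, int]:
    main_class = (notation // 100) * 100   # e.g. 516 -> 500 (Science)
    division = (notation // 10) * 10       # e.g. 516 -> 510
    return main_class, division

assert ddc_levels(516) == (500, 510)
```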

2.2.3 Open Directory Project - Dmoz

The Open Directory Project (Dmoz) is the best known, biggest and most exhaustive Web directory in the world [11]. The growth of the Internet is constantly accelerating, and instead of resisting it, the Open Directory Project proposes a means of organizing the Internet. Everyone can contribute to the organizing effort, because the project is based on the Open Source spirit. For this reason the Open Directory Project is also known as the Internet Brain: it offers the biggest and most widely distributed database of Web content classified by contributors [11]. The Open Directory Project is employed by well-known search engines and portals, including Google, AOL Search, Netscape Search, Lycos, DirectHit and others.

The Open Directory Project contains 15 top-level categories, listed in table 2.4. All the categories have a hierarchical structure which varies from the very abstract (top level) to very specific subjects. The Web directory is maintained by experts who are responsible for the inclusion of sites in the directory. The Open Directory Project is like a library where, instead of book collections, web links are collected in the repository.

Arts Business Computers

Games Health Home

Kids and Teens News Recreation

Reference Regional Science

Shopping Society Sport

Table 2.4: Open Directory Project top-level categories

2.2.4 Wikiportals

Wikipedia contains 12 major categories known as Portals (table 2.5). Every portal may contain smaller portals under the top-level portal. The information classification of Wikipedia sometimes appears simply bizarre: each top-level portal has a different structure, and some portals do not have any categories at all. The absence of a clear and logical structure makes it difficult to search for information using a hierarchical approach.

The Wikiportals classification scheme is also used for analyzing how well Wikipedia articles can be classified. The scheme of 12 different categories covers the wide range of information published on Wikipedia. The survey of article classification therefore determines how consistently people are able to classify with Wikipedia's top-level portals scheme.


General reference History and events Philosophy and thinking

Culture and the arts Mathematics and logic Religion and belief systems

Geography and places Natural and physical sciences Society and social sciences

Health and fitness People and self Technology and applied sciences

Table 2.5: Wikipedia top-level portals

2.3 Impact of Classification on Trust Dynamics

At first glance, trust and information classification do not have anything in common; however, in the WRS these factors become the crucial points in reengineering the Wikipedia for reputation.

2.3.1 Trust and Trust Management

Trust is omnipresent in all interactions amongst people [3]; society cannot exist without it. The concept of trust is difficult to observe and to relate to simple observable facts, and trust is better characterized as constantly volatile rather than static. In the context of trust management it is generally agreed that trust evolves from the amount of interaction, based on [4]:

• Personal experience.

• Recommendations and the reputation of the other party.

• Environmental factors (situational trust) and individual factors (dispositional trust).

2.3.1.1 Initial Trust

The introduction of a new entity into the system requires assigning it an initial trust. Considering that the trust value ranges from -1.0 to 1.0, Jonker and Treur [7] suggested two types of initial trust: initially trusting and initially distrusting, where initially trusting values vary from 0.0 to 1.0 and initially distrusting values from -1.0 to 0.0. A new user in WRS obtains an initial trust value equal to 0.0. According to the technical report by Stephen Paul Marsh, a trust value of 0.0 may be assigned to a new entity introduced into the system, or to a trustee that had a distrusted reputation but, after subsequent positive interactions, climbed back to a trust value of 0.0.


Figure 2.4: Trust Model proposed by Jonker and Treur

2.3.1.2 Trust Dynamics

Trust dynamics express different approaches to how trust values can evolve. Positive and negative experiences have a particular influence on trust. Each communication between entities is followed by feedback, which can be used to determine the relationship between them; the feedback is also used in updating the trust function. Jonker and Treur [7] have identified six types of trust dynamics:

Blindly positive. This type of trust dynamics belongs to the unconditional trust state: the trustor blindly relies on the trustee after some positive experiences. Once this state is reached, it remains for all future interactions.

Fast positive, slow negative. The trustor establishes trust in the trustee after a few positive interactions, and many negative interactions are required for the trustor to destroy that trust.

Balanced fast. The trustor can both establish and destroy trust in the trustee rapidly.

Balanced slow. The trustor can both establish and destroy trust in the trustee slowly.

Slow positive, fast negative. The trustor establishes trust in the trustee only after many positive interactions, and only a few negative interactions are required for the trustor to destroy that confidence.

Blindly negative. This type of trust dynamics belongs to the unconditional distrust state: the trustor blindly distrusts the trustee after some negative experiences. Once this state is reached, it remains for all future interactions.

A human experiment with trust dynamics [3] revealed that the order of positive and negative experiences matters for trust dynamics. Moreover, old interactions may be less valuable than recent ones, so an aging strategy should be defined to take this into account: interactions older than 3 months are worth only 50% of their original value, interactions older than 9 months only 25%, and after one year they are no longer counted. In the WRS case there is a validity limit for ratings being included in the trust value update; when the limit is reached, the ratings are no longer employed. The aging strategy is adopted in the same manner as described above. Each personal interaction has a value equal to 1/10, in the positive direction as well as the negative.
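The aging strategy above can be sketched as follows; the function name and signature are our own, chosen only to illustrate the 50%/25%/0% decay schedule.

```python
# A sketch of the aging strategy (function name is ours): an interaction
# keeps 100% of its value for the first 3 months, 50% from 3 to 9 months,
# 25% up to one year, and is discarded afterwards. Each interaction is
# worth 1/10 = 0.1 in either direction.
def aged_value(months_old: float, value: float = 0.1) -> float:
    if months_old < 3:
        return value
    if months_old < 9:
        return value * 0.50
    if months_old <= 12:
        return value * 0.25
    return 0.0

assert aged_value(6) == 0.05   # 50% of 0.1
```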

2.3.1.3 Trust Evolution

The trust evolution function is the crucial point of the trust model. The function describes, as a mathematical expression, how the trust value is derived, and implements the initial trust and trust dynamics. Jonker and Treur [7] have described 16 properties of trust evolution functions, 11 of which are implemented in the trust evolution function of WRS (Table 2.6). As mentioned above, the initial trust for a new trustee is neutral and equal to 0.0; therefore, properties 4 and 5 hold. The WRS does not include old ratings in the trust value calculation, which implies that property 8 holds. Properties 9 and 10 also hold, because after a maximum of 10 positive/negative experiences the trust will always be positive/negative.


Property                                                      Description
1. Future independence                                        The values of the evolution function depend only on past experience.
2. Monotonicity                                               The trust function is monotonic, x ≤ y ⇒ f(x) ≤ f(y).
4. Maximal initial trust                                      The limitation of the maximum trust value for a trustee at the initial point.
5. Minimal initial trust                                      The limitation of the minimum trust value for a trustee at the initial point.
6. Positive trust extension                                   Expresses that after a positive experience, trust will increase.
7. Negative trust extension                                   Expresses that after a negative experience, trust will decrease.
8. Forgetting about the past                                  The function applies the age limitation for values.
9. Degree of trust dropping                                   Expresses after how many negative experiences trust will always be negative.
10. Degree of trust gaining                                   Expresses after how many positive experiences trust will always be positive.
11. Positive limit approximation (continuous metric case)     Expresses the conditions under which trust will become maximal.
13. Negative limit approximation (continuous metric case)     Expresses the conditions under which trust will become minimal.

Table 2.6: Implemented properties in trust evolution function of WRS

The WRS trust evolution function is based on a superellipse. Its formula is:

| x/a |^n + | y/b |^n = 1

The parameter x expresses the sum of interactions and y defines the calculated trust value; the trust on the trustee lies in 0.0 ≤ y ≤ 1.0 (in case of trust) and −1.0 ≤ y ≤ 0.0 (in case of distrust). The parameters a and b express the radii of the superellipse; in WRS both are equal to 1. n denotes the curvature of the curve, from which the characteristics of the trustor can be determined. The initial value of n is 1 and it can be incremented or decremented in steps of 0.1 depending on the trustor's characteristics, showing whether the trustor is cautious or optimistic: a cautious curve has n < 1.0, whereas an optimistic curve has n > 1.0. Examples of how n influences the curvature are depicted in figures 2.5a and 2.5b.



Figure 2.5: Example how n influences the curvature of curve

The trust evolution function produces outcomes only in the 1st and 3rd quadrants of the coordinate system. The 2nd quadrant would express the situation where the trustor has a positive trust value with negative interactions; such a situation is not possible, because only positive interactions can cause a positive trust value. The 4th quadrant would express the situation where the trustor has a negative trust value after positive interactions; this is likewise impossible, because a negative trust value is obtained only after negative interactions.

Four situations in WRS relating to the trust evolution function need to be described: an optimistic curve in trust (2.1), a cautious curve in trust (2.2), an optimistic curve in distrust (2.3) and a cautious curve in distrust (2.4). The expressions are also depicted in figure 2.6. It is worth mentioning that all these formulas, which express the different scenarios in WRS, hold the properties of the trust evolution function described in table 2.6.

| x − 1 |^n + | y |^n = 1   for 0.0 ≤ x ≤ 1.0, 0.0 ≤ y ≤ 1.0, n ≥ 1.0   (2.1)

| x |^n + | y − 1 |^n = 1   for 0.0 ≤ x ≤ 1.0, 0.0 ≤ y ≤ 1.0, n ≥ 1.0   (2.2)

| x |^n + | y + 1 |^n = 1   for −1.0 ≤ x ≤ 0.0, −1.0 ≤ y ≤ 0.0, n ≥ 1.0   (2.3)

| x + 1 |^n + | y |^n = 1   for −1.0 ≤ x ≤ 0.0, −1.0 ≤ y ≤ 0.0, n ≥ 1.0   (2.4)
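Solving each superellipse for y gives the trust value directly. The following sketch (function names are our own) implements formulas (2.1) to (2.4) under the assumption a = b = 1 stated above.

```python
# A sketch of the WRS trust evolution curves (function names are ours).
# x is the accumulated interaction sum in [-1, 1], n the curvature; each
# function solves its superellipse for the trust value y.
def trust_optimistic(x: float, n: float) -> float:
    # (2.1): |x - 1|^n + |y|^n = 1, for 0 <= x <= 1
    return (1 - abs(x - 1) ** n) ** (1 / n)

def trust_cautious(x: float, n: float) -> float:
    # (2.2): |x|^n + |y - 1|^n = 1, for 0 <= x <= 1 (y <= 1)
    return 1 - (1 - abs(x) ** n) ** (1 / n)

def distrust_optimistic(x: float, n: float) -> float:
    # (2.3): |x|^n + |y + 1|^n = 1, for -1 <= x <= 0 (y >= -1)
    return (1 - abs(x) ** n) ** (1 / n) - 1

def distrust_cautious(x: float, n: float) -> float:
    # (2.4): |x + 1|^n + |y|^n = 1, for -1 <= x <= 0 (y <= 0)
    return -((1 - abs(x + 1) ** n) ** (1 / n))

# With n = 2 (Figure 2.6) the curves are quarter circles: ten positive
# interactions (x = 1.0) yield full trust, ten negative full distrust.
assert abs(trust_optimistic(1.0, 2) - 1.0) < 1e-9
assert abs(distrust_cautious(-1.0, 2) + 1.0) < 1e-9
```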

2.3.2 Impact for WRS

Figure 2.6: The trust evolution function in WRS with n = 2

The initial version of WRS, developed by Thomas Rune Korsgaard [8], had only one trust value for each user. Subsequent analysis revealed that a single trust value per user is not sufficient to determine the user's credibility. An example shows the drawback of a single trust value: a user visits the website and rates many articles on sports, so the trust value is formed from these interactions; after some time the user decides to rate an article on religion, which has nothing in common with the sports articles on which the trust value was formed. Such a trust value could clearly distort the rating of the religion article. Therefore, Thomas Lefevre [9] in his Master's thesis introduced the category as a choice together with rating the articles. The introduction of this new feature into WRS manifests itself in the number of trust values: each user has several trust values, one per top-level category defined in the classification scheme implemented in WRS. These enhancements establish more credibility for the ratings a user assigns to an article. Furthermore, Thomas Lefevre arranged a small-scale survey to determine how well people could classify with the Dewey Decimal Classification (DDC) scheme. The results showed that articles could be classified reliably with DDC; therefore, the DDC top-level categories were implemented in WRS.

To conclude, the introduction of categories in WRS has a considerable impact on rating credibility, which in turn makes Wikipedia articles more reliable. Moreover, the improved accuracy of WRS makes it possible to discover the trustee's areas of competence.

2.4 Summary

In this chapter the Wikipedia Recommender System is described in detail, and the differences between centralized and decentralized collaborative filtering are set out. The classification schemes and their types are also explained, with a focus on finding the best classification scheme to employ in the WRS for establishing more credibility of the Wikipedia articles. The chapter contains a definition of Trust Management covering initial trust, trust dynamics and trust evolution. Finally, it introduces the category as a choice that establishes more accuracy in article ratings and determines the user's competence in a specific area.

Chapter 3

Analysis

Chapter 3 analyses four different classification schemes in order to choose the best one for WRS. The criteria for a classification scheme and the metrics for evaluating the schemes are defined. A survey is held to outline which scheme is the most suitable for classifying Wikipedia articles in WRS, and the survey results are presented, identifying the scheme that classifies most reliably.

3.1 The criteria of appropriate scheme

Four classification schemes are analyzed in this investigation. In order to select the most appropriate classification scheme for WRS, we should define some criteria for the best scheme [2]. The criteria not only define what features are expected from the scheme, but are also valuable contributors to the development of metrics. The desired features of the WRS classification scheme are:

• Intuitive

– People should find it easy to classify an article

• Complete


– There should be a class for every article

– There should be enough classes to encompass the entire body of hu-man knowledge

• Concise

– There should be an article for every class

– There should be few classes so that people could remember them

• Unambiguous

– People should generally agree on the classification of articles

• Useful

– The classification should improve the ratings of WRS

3.2 The metrics of classification scheme

There are hundreds of classification schemes used to classify information in the world. Classification specialists and scientists unanimously agree that there is no "best" classification scheme, yet some schemes are very popular. The factor that determines the popularity of a scheme is universal adaptation, which relates to the coverage of different topics; a logical structure and easy searching are the keys to popularity. On the other hand, placing information into a particular domain sometimes requires some justification from the classification scheme.

The fact that there is no "best" classification scheme suggests that no complete measurement technique exists for evaluating classification schemes. Some research has been done on evaluating one or more aspects of classification schemes within a specific body of information, but a general concept for the evaluation of classification schemes has not been created yet. The reason why it is so complicated to apply objective methods to evaluate classification schemes is that a scheme that is meaningful in one domain or context can be meaningless in another.

The absence of an established approach to evaluating classification schemes motivated this research, the purpose of which was to discover how classification schemes could be evaluated. The evaluation should explicitly indicate which scheme classifies articles best and most satisfyingly.


3.2.1 Implemented metrics

The assessment strategy consists of two parts: how well people are able to classify the articles according to a particular scheme, and people's satisfaction with the classification scheme. The first part of the assessment contains a statistical calculation of the answers using a specific feature called the majority flag. The majority flag indicates that, of all the possible categories for a particular article, exactly one category of the classification scheme was chosen by 50% or more of the respondents; this complies with the mathematical description of a majority. Thomas Lefevre [9], however, defined a majority category in his survey as a category chosen by 80% or more of the people, considering a majority of 50% or more inadequate to distinguish between severities of penalties. Such a high majority boundary is, however, more related to trust dynamics logic than to the evaluation of a particular scheme. Moreover, his survey took into account only one classification scheme (the Dewey Decimal Classification), so it was not possible to compare several schemes and how consistently they were classified.

In practice, if the majority of people agree that a given article resides in a specific category of a classification scheme, that scheme gets a flag value equal to 1. The majority flag only designates the existence of one category dominating (≥ 50%) over all the other categories; the term flag refers only to this fact. For simplicity, the symbol 1 denotes the presence of a majority and 0 its absence.
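A minimal sketch of the majority flag calculation; the function name and the vote-count representation are our own.

```python
# Majority flag sketch (name is ours): given vote counts per category for
# one article, the flag is 1 when exactly one category holds 50% or more
# of all votes, and 0 otherwise.
def majority_flag(votes: dict[str, int]) -> int:
    total = sum(votes.values())
    if total == 0:
        return 0
    dominant = [c for c, v in votes.items() if v / total >= 0.5]
    return 1 if len(dominant) == 1 else 0

assert majority_flag({"Science": 8, "Arts": 2}) == 1           # 80% majority
assert majority_flag({"Science": 4, "Arts": 3, "History": 3}) == 0
```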

The second part of the assessment concerns the evaluation of people's experience with the respective classification scheme. It is done by interviewing people after they classify the articles, asking them to answer 4 questions:

1. Was it easy to find the right category?

2. Could you understand the logic of the category scheme easily?

3. Were the categories specific enough?

4. Did the names of the categories make sense?

The questions are the same for all the classification schemes, to ensure that all schemes are analyzed under the same conditions. The possible answer set, shown in table 3.1, is also the same for all the questions. A respondent can choose only one answer from the set per question.

Strongly agree   Moderately agree   Undecided   Moderately disagree   Strongly disagree

Table 3.1: Possible answers to the evaluation questions

Each answer from the set has a different weight, shown in table 3.2, which indicates how strongly people agree or disagree with the question. We determined that the answers Strongly agree and Strongly disagree correspond to +5 and -5 respectively, Moderately agree and Moderately disagree correspond to +3 and -3 respectively, and Undecided equals 0. This notation allows us to indicate the differences between answers explicitly.

Value   Answer
 +5     Strongly agree
 +3     Moderately agree
  0     Undecided
 -3     Moderately disagree
 -5     Strongly disagree

Table 3.2: The value of answers

To calculate the Experience value for a question, it is first necessary to compute the percentage of people who chose each answer from Strongly agree to Strongly disagree. Each answer weight (+5 ... -5) is then multiplied by the percentage of people who chose that answer, and the products are summed into the Experience value (Formula 3.1).

The percentage value is used in the calculation, rather than the exact number of people who chose an answer, because a different number of people may classify articles under each classification scheme. The purpose of the Experience value is to express, in numerical form, the people's attitude to a particular question asked after they finished classifying the articles with a particular scheme. When the Experience value for each question has been calculated, the total experience value is computed by adding the experience values of all questions (Formula 3.2). The total value indicates people's experience with the respective classification scheme.

Experience value = Σ_{i=1}^{5} (Answer value_i × Percentage value_i)   (3.1)

Total experience value = Σ_{q=1}^{4} Σ_{i=1}^{5} (Answer value_i × Percentage value_i)   (3.2)
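Formulas 3.1 and 3.2 can be sketched as follows; the function names and the list-of-percentages representation are our own.

```python
# Sketch of Formulas 3.1 and 3.2 (names are ours). Percentages are the
# fractions of respondents (in %) choosing each of the five answers, ordered
# from "Strongly agree" down to "Strongly disagree".
WEIGHTS = [5, 3, 0, -3, -5]

def experience_value(percentages: list[float]) -> float:
    # Formula 3.1: sum over the five answers of weight * percentage.
    return sum(w * p for w, p in zip(WEIGHTS, percentages))

def total_experience_value(per_question: list[list[float]]) -> float:
    # Formula 3.2: sum of the experience values of all four questions.
    return sum(experience_value(p) for p in per_question)

# E.g. 40% strongly agree, 30% moderately agree, 20% undecided,
# 10% moderately disagree, 0% strongly disagree:
assert experience_value([40, 30, 20, 10, 0]) == 260
```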



Figure 3.1: The example of Experience value calculation

3.2.2 Considered metrics

The aforementioned metrics propose two approaches to evaluating the classification schemes: a statistical calculation of how well people can classify the articles, and people's experience with the schemes. Besides these two, another two approaches to evaluating classification schemes were analyzed. Later investigation revealed that these metrics are either ineffective or require special software to be developed in order to obtain the essential information. The following discussion explains why these assessment approaches are inapplicable.

Analyzing possible approaches to the assessment of classification schemes, the calculation of a conciseness value appeared to be a reasonable metric. The conciseness value itself indicates how reliably people can classify the articles. It is calculated as the number of selected top-level categories divided by the total number of top-level categories in a particular classification scheme (Formula 3.3). For example, the Dewey Decimal Classification scheme contains 10 top-level categories; if people assigned 6 different ones to a particular article, the conciseness value for that article equals 6/10 = 0.6.

Conciseness value = Number of selected categories / Total number of top-level categories in the scheme   (3.3)
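Formula 3.3 is a one-line computation; the sketch below (function name is ours) reproduces the DDC example.

```python
# Conciseness value sketch (Formula 3.3; function name is ours).
def conciseness_value(selected: int, total_top_level: int) -> float:
    return selected / total_top_level

# The DDC example: 6 of its 10 top-level categories were used.
assert conciseness_value(6, 10) == 0.6
```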

One statement could be deduced: a small conciseness value means that people classify the article more reliably. However, later analysis showed that this value is not as useful as expected, because the ability to classify an article depends on many factors (education, experience, capability to understand the written information, etc.). For instance, when classifying an article with the Dewey Decimal Classification scheme, one top-level category scored 80% and the remaining 9 shared 20%, so the conciseness value equals 1. The same article, classified with the Citizendium scheme of 6 top-level categories, had 3 categories scoring 33%, 33% and 34%, giving a conciseness value of 0.5. According to the statement above, people classified the article more reliably with the Citizendium scheme, yet it is impossible to determine explicitly to which category the article belongs, because there is no majority category; with the Dewey Decimal Classification scheme the category can be determined clearly even though the conciseness value is high. In summary, the conciseness value does not provide any tangible advantage in assessing a scheme, because the categories comprising a minority of the votes enlarge the conciseness value while having insufficient influence either to determine the category or to evaluate the scheme.

The idea of employing the history information of a Wikipedia article appeared interesting and beneficial, with the aim of extracting important information and assembling it into a metric. Each Wikipedia article has a history page where all the edits, and the users who made them, are recorded. By selecting a particular user, all the contributions he or she has made to Wikipedia are displayed. Analysis of the articles edited by a certain user could help elicit which types or categories of articles the user contributes to most. If a user contributes to only one category of articles, it can be stated that the user is proficient in that category, and if the same or a very similar category exists in a given classification scheme, the user's classifications could receive additional weight under that scheme. To use this method, a special application would have to be programmed, because each article contains many contributions from different users, and analyzing the categories each user has contributed to requires sifting through a lot of information. The reasons why this metric is not appropriate at the moment are therefore quite obvious; however, given a suitable application, the benefit of such a metric would be undeniable.

3.2.3 Summary of metrics

The investigation takes four different classification schemes into consideration. They are evaluated according to two metrics: the majority flag and the user experience value. These metrics are used only for assessing the schemes and are not utilized in WRS. The WRS contains two metrics for evaluating user credibility: the rating metric as primary and the category metric as secondary. These two kinds of metrics have nothing in common, and the rest of the thesis focuses on assessing the classification schemes with the aforementioned metrics (majority flag and user experience value).

3.3 Survey legitimacy

An online survey appeared to be the most objective way to select the best classification scheme for classifying Wikipedia articles in WRS. The survey comprises:

• Four different classification schemes.

• Four different Wikipedia articles.

• Four questions for evaluating the classification scheme.

The choice of Wikipedia articles is one of the crucial points in holding the online survey. The idea of how to select the articles evolved from analyzing the current top-level Wikiportals in the Wikipedia classification scheme. It was noticed that some articles belong to 3 different Wikiportals, making it difficult to define the main category or Wikiportal. Such distribution of articles across various categories (Wikiportals) is the result of differing perceptions of information and the freedom of Wikipedia, because users determine the Wikiportals for articles. Therefore, some criteria were established before selecting the articles analyzed in the survey:

• A possibility to assign more than one category for an article.

• The title of an article does not contain words directly related to any category name.

The first Wikipedia article selected to be classified with the different classification schemes is about Albert Einstein [15]. This article is located under the Wikiportal named People and self; however, links to it can be found in at least two more Wikiportals, Natural and physical sciences and Technology and applied sciences. The second article is about the Moon landing [18]. According to the Wikiportals, this article could also appear in several categories, such as Natural and physical sciences and Technology and applied sciences. The third article describes the virus called Rotavirus [19]. There are also two Wikiportals to which this article could be assigned, namely Natural and physical sciences and Health and fitness. The last article is about Basketball [17], which can be categorized more explicitly.

As long as an article could belong to several categories, it is a relevant candidate for the survey. Sometimes the logic by which articles are classified under Wikiportals raises doubts, but considering that all classification in Wikipedia is based solely on users' opinions, there will always be points open to negotiation. Wikipedia can therefore be likened to a cumulative and changeable entity, which depends heavily on external factors.

The management of the online survey requires some provisions to ensure that all the classification schemes get an equal opportunity to be online. The idea is that each of the four schemes is online for 6 hours per day, following the shifting schedule shown in table 3.3. This schedule guarantees that every scheme appears online at a different time of day each day. Moreover, rotating the online time helps to balance the number of people classifying articles with each scheme, because the probability that people classify articles at night is lower than during the day. The intention is that over the course of the survey each classification scheme receives approximately the same number of answers. This helps to avoid situations such as 20 people classifying articles with the Dewey Decimal Classification scheme while 40 people use Citizendium. Balancing the number of answers ensures an equal chance for the relevant classification scheme to be chosen for the WRS and that none of the schemes is favoured.

Date\Time    1st slot (7-13)  2nd slot (13-19)  3rd slot (19-01)  4th slot (01-07)
2010-05-19   DDC              Wikiportals       Citizendium       Dmoz
2010-05-20   Dmoz             DDC               Wikiportals       Citizendium
2010-05-21   Citizendium      Dmoz              DDC               Wikiportals
2010-05-22   Wikiportals      Citizendium       Dmoz              DDC
2010-05-23   DDC              Wikiportals       Citizendium       Dmoz
2010-05-24   ...              ...               ...               ...

Table 3.3: Online schedule for schemes
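The shifting schedule of Table 3.3 is a simple daily rotation: each scheme moves one six-hour slot forward per day, so every scheme visits every slot once per four-day cycle. A minimal sketch of this rotation (class and method names are illustrative; day 0 corresponds to 2010-05-19):

```java
// Sketch of the Table 3.3 rotation: the four schemes shift one slot per day.
public class SchemeSchedule {
    private static final String[] SCHEMES =
            {"DDC", "Wikiportals", "Citizendium", "Dmoz"};

    /** Scheme online on the given day (0-based) and slot (0 = 07-13, ..., 3 = 01-07). */
    public static String schemeFor(int day, int slot) {
        // Math.floorMod keeps the index non-negative when slot < day.
        return SCHEMES[Math.floorMod(slot - day, SCHEMES.length)];
    }
}
```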


3.4 The results of the survey

The online survey ran for 5 weeks. During that period, 130 people from different countries and continents participated. The online survey offered a convenient way of processing data and expanded the surveying area globally. People were asked to read 4 Wikipedia articles and categorize them according to the classification scheme that was online at the time. When the article categorization was completed, participants were asked to evaluate their experience in the survey by answering 4 questions. At the end of the survey, all the data were processed. The next sections describe the results collected for each classification scheme.

3.4.1 Citizendium

The Citizendium classification scheme contains only 6 top-level categories, which could be rather insufficient to encompass all the information to be classified. However, the survey results for the Citizendium classification scheme differed from what we had expected. The results of article classification with the Citizendium scheme are depicted in figure 3.2.

Two of the four articles have the majority flag, meaning that for each of them one category was selected by the majority of people. None of the categories scored more than 50% for the articles on Albert Einstein and Moon landing. This could be explained by the fact that each of these articles may be assigned to 2 or 3 different categories, which confuses people when choosing the right category.

The names of the top-level categories of Citizendium are rather abstract and encompass a wide range of information. Not being familiar enough with the meaning of each top-level category, people assign the most relevant category according to their own understanding of the article's content and of the category name.

The remaining two articles, on Rotavirus and Basketball, were classified more reliably. The results revealed that the majority of people assigned the top-level category Natural Sciences to the article on Rotavirus (Figure 3.2c) and Recreation to the article on Basketball (Figure 3.2d). Most people could easily find the top-level categories for these two articles because they could match the content of the article with the designation of a top-level category without much difficulty.

The statistical calculation shows that there are 2 articles with a category, which

Figure 3.2: Articles classification results with Citizendium

was assigned by the majority of the people who categorized the articles with the Citizendium scheme. Consequently, the majority flag metric has the value 2 of 4, which means that according to this calculation people are able to classify the information moderately well; however, the user experience value revealed that people are not satisfied with this scheme. The total user experience value equals 113. The results of the questions asked at the end of classification with the Citizendium scheme are depicted in figure 3.3.

The most interesting fact is that people clearly disagree with the statement "Are categories specific enough" (Figure 3.3c). Looking at Figure 3.3c, it is not immediately obvious why. The reason is that the number of negative answers (Moderately disagree and Strongly disagree) is greater than the number of positive answers (Strongly agree and Moderately agree); therefore, user satisfaction equals -28.
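One plausible literal reading of the description above is that the per-question satisfaction score is the count of positive answers minus the count of negative answers, with neutral answers ignored; the thesis may weight answer strengths differently, so the class below is only a sketch of that reading, and the answer counts in the test are hypothetical.

```java
// Sketch of the per-question user satisfaction score as described above:
// positive answers minus negative answers, neutral answers ignored.
public class SatisfactionScore {
    public static int score(int stronglyAgree, int moderatelyAgree,
                            int neutral, int moderatelyDisagree, int stronglyDisagree) {
        return (stronglyAgree + moderatelyAgree)
             - (moderatelyDisagree + stronglyDisagree);
    }
}
```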

Figure 3.3: Answers to questions about Citizendium

To conclude, it is worth mentioning that people's experience with this scheme is poor. However, the classification results obtained with the Citizendium scheme revealed that people are able to classify the information moderately well. This scheme is therefore not suitable as the main classification scheme in the WRS, because the way people assigned the categories does not match their experience with the Citizendium scheme.

3.4.2 Dewey Decimal Classification

The Dewey Decimal Classification scheme is one of the most widely and frequently used classification schemes in the world. The scheme comprises 10 top-level categories enumerated from 0 to 9. The Dewey Decimal Classification scheme is already used in the WRS as the main classification scheme to classify articles and to assign a user trust value for each category. Thomas Lefevre, in his Master's thesis [9], determined that the Dewey Decimal Classification scheme is the most suitable scheme for classifying the entire body of information. The current analysis showed that none of the Wikipedia articles had a category assigned by the majority of the people who categorized the articles with the Dewey Decimal Classification scheme. The results are shown in figure 3.4. The majority flag value is equal to 0 out of 4, which means that people classified articles poorly with this scheme. Mostly, their choices for an article were distributed over 3 categories, with no single category earning a majority of votes. The typical distribution of categories for the Dewey Decimal Classification scheme is depicted in figures 3.4a, c and d.

Figure 3.4: Articles classification results with Dewey Decimal Classification

To explain such a distribution of categories, it is worth remembering that the articles about Albert Einstein and Moon landing could belong to several categories. Categories like Computer science, information and general works and Sciences and Mathematics overlap, which may mislead respondents. Moreover, analyzing how people classified the articles on Rotavirus and Basketball raises concern not only about the benefit and accuracy of the Dewey Decimal Classification scheme, but also about how responsibly people did the tasks. It is surprising that 15% of people assigned the article on Basketball to the Computer science, information and general works category, and another 15% thought it corresponds to Social Science, while the most suitable category, Arts and Recreation, was chosen by only 44% of all the people who evaluated the articles with the Dewey Decimal Classification scheme. Probably the first word, Arts, misled people when assigning the right category.

Figure 3.5: Answers to questions about Dewey Decimal Classification

The results revealed that the people who did the classification with the Dewey Decimal Classification scheme are moderately satisfied with it. The total user experience value equals 576, five times greater than that of the Citizendium scheme. Overall, people agree with how the scheme classifies information; however, the actual figures show no congruence between how people assign the categories and how satisfied they are with the scheme. People evaluate the experience of using the scheme more positively than their actual classification performance warrants. The scheme appears compact, having not too many categories; however, it is not very intuitive to find the right category using the Dewey Decimal Classification (Figure 3.5a): positive answers amount to 56% whereas negative answers amount to 36%. The difference between positive and negative answers is only 20 percentage points, which could explain this.

3.4.3 Open Directory Project - Dmoz

The Open Directory Project is the best known and most exhaustive Web directory in the world, with a database of classified Web content. Everyone who wants to help organize the Internet can contribute to the classification of Web content. According to the Open Directory Project, all Internet content is classified into 15 top-level categories, which encompass the entire information space. The results of how well people classified using the Open Directory Project top-level categories are even better than expected. All four articles have a category that was assigned as the most appropriate one by the majority of people. Therefore, the majority flag metric is equal to 4 of 4. The results of article classification with the Open Directory Project scheme are depicted in figure 3.6.

The articles on Albert Einstein and Moon landing, which generally attracted several categories with the other classification schemes, were classified reliably with the Open Directory Project classification scheme. 61% of people assigned the Science category to the article on Albert Einstein, and 63% of people stated that the article on the Moon landing also belongs to the Science category (Figure 3.6b). Such reliable category assignment could be explained by the fact that the categories of the Open Directory Project classification scheme are specific enough and do not overlap with each other.

The articles on Rotavirus and Basketball were classified even more reliably, with scores of 68% in the category Health and 68% in the category Sport, respectively. Generally, the statistical results are very impressive compared with the other classification schemes. Moreover, the analysis also covers the evaluation of people's experience, which is very important for choosing the most appropriate classification scheme for the WRS. The total people experience value for the Open Directory Project classification scheme equals 1080 points, which is the highest result of all 4 classification schemes. All the Open Directory Project scores for the evaluation questions are the highest (Table 3.4) across all the classification schemes analyzed in this research. People highly agree that it is easy to understand the logic of

Figure 3.6: Articles classification results with Open Directory Project

the category scheme and that the designation of the categories convinces them (Figure 3.7d). This is one of the key questions, reflecting the ability to choose the proper category easily and correctly. The high values from both the people experience evaluation and the statistical calculation of category assignment make the Open Directory Project classification scheme very promising for classifying the entire body of information in the WRS.

Figure 3.7: Answers to questions about Open Directory Project

3.4.4 Wikiportals

The categories in Wikipedia are known as Wikiportals. There are 12 different top-level Wikiportals, which are used to classify all the information on Wikipedia. The analysis includes the top-level Wikiportals in the investigation of the best classification scheme for the WRS.

The results revealed (Figure 3.8) that for two of the Wikipedia articles one category was selected by the majority of people; consequently, the majority flag metric is equal to 2 of 4. For the articles on Albert Einstein and Basketball, none of the categories scored more than 50%. To explain this fact, it is necessary to remember that each article may be assigned to 2 or 3 different categories, which confuses people when choosing the right category.

Figure 3.8: Articles classification results with Wikiportals.

More reliable classification was achieved for the Moon landing and Rotavirus articles: 51% of people assigned the History and Events category to Moon landing, and 66% thought that Rotavirus should belong to Health and fitness. According to the majority flag, which equals 2 of 4, people are able to classify the information moderately well.

The total experience value of 742 points revealed that people are satisfied with the Wikiportals classification scheme. They also agreed that there were no difficulties in finding the right category and that the category definitions are lucid and meaningful (Figure 3.9d). However, although people's experience value is quite high, the reliability of the resulting article classification is not sufficient.

Figure 3.9: Answers to questions about Wikiportals

3.5 Evaluation

The investigation has examined four different classification schemes, each representing a different approach to perceiving the whole of information. There are thousands of schemes, with different purposes and modes of adaptation, for classifying information. The most popular and well known, also analyzed in this investigation, is the Dewey Decimal Classification. Its biggest advantages are that the scheme is continuously updated and that it has a numerical notation for meaningful division. Another classification scheme that was analyzed is the Citizendium top-level workgroups. This scheme is distinguished by its conciseness: it contains only 6 top-level categories. The Open Directory Project (Dmoz) offers the biggest and most widely distributed database of classified Web content; according to it, all content can be assigned to 15 different top-level categories, which perceive the whole of information from a modern point of view. The last classification scheme analyzed in this investigation is Wikiportals: the 12 top-level portals in Wikipedia, which help to classify the entire content of the online encyclopedia.

All the articles were classified with the aforementioned classification schemes. In order to evaluate the schemes, we employed two kinds of metrics: the majority flag metric, which indicates that one of the categories was chosen by more than 50% of the people, and the user experience value, a numerical aggregate calculated from people's answers to the scheme evaluation questions.

No.  Question                                                        Citizendium  DDC  Dmoz  Wikiportals
1    Was it easy to find the right category?                                  53   79   255          187
2    Could you easily understand the logic of the category scheme?            31  160   315          175
3    Are categories specific enough?                                         -28   97   151           87
4    Does the name of the categories make sense?                              56  239   357          290

Table 3.4: People experience scores for the different questions, for all classification schemes

Table 3.4 clearly indicates that the Open Directory Project (Dmoz) classification scheme lives up to the expectation of being the main scheme for classifying articles in the WRS. Moreover, the total evaluation of the analyzed classification schemes, depicted in figure 3.10, confirms that the Open Directory Project classification scheme is the most appropriate one, because all the articles were classified most reliably and with the highest scores assigned by people. Furthermore, the selected classification scheme meets the criteria described in section 3.1:

• Intuitive - comparing all the schemes, most people indicated that it was easy to classify with the Open Directory Project classification scheme (Table 3.4, Question 1).

• Complete - the Open Directory Project contains the most top-level categories of all the schemes considered. The categories encompass the entire information space of modern human society.

• Concise - the Open Directory Project classification scheme contains 15 top-level categories. Each category is part of a meaningful logical structure that can accommodate articles of every kind. The scheme perceives the whole of information from a modern point of view, including top-level categories for what is current or fashionable at the time. Consequently, the number of top-level classes rises to 15, which is in some tension with conciseness. However, the investigation showed that schemes with few top-level categories, like Citizendium, do not provide better classification, and furthermore their people experience scores are low.

• Unambiguous - in general, people agreed on how the information is classified with the Open Directory Project classification scheme. Their agreement is clearly indicated by the two metrics: the majority flag and the total experience value have the highest scores compared with the other classification schemes.

• Useful - this Master's thesis does not investigate how useful the Open Directory Project classification scheme is. A new investigation would be necessary to reveal how this classification scheme could improve the ratings of the WRS. The classification scheme is employed for building user trust in each category, which also influences the ratings of articles, because the greater a user's trust value in a particular category, the greater the influence that user has on the ratings of articles in that category.

Figure 3.10: Total results

3.6 Summary

In this chapter we have analyzed 4 different classification schemes in order to select the best one for the Wikipedia Recommender System. Criteria for a classification scheme and the metrics for evaluating the schemes were defined. To determine which scheme is the most suitable for classifying Wikipedia articles in the WRS, we conducted a survey. The results of the survey showed that the Open Directory Project classification scheme yields the most reliable classification and has the highest people experience score.


Chapter 4

Implementation

This chapter presents an overview of the WRS and describes the updates that were made in order to run the system and to change its classification scheme. The modifications comprise minor edits to the Rating Module, the Trust Module and the Applet Interface.

4.1 WRS overview

The WRS was developed to establish more trust in Wikipedia articles by helping users to determine the credibility of an article. Readers of Wikipedia are provided with ratings that have been submitted by previous users. The WRS is a rating-based collaborative filtering system that uses trust metrics to determine the weight of the different recommendations in the overall trust value. Each WRS user has a unique trust profile, maintained locally on the user's computer, which records a trust value for each user and each classification scheme category. When the user rates an article, the WRS stores the data in a central repository and updates the user's trust profile depending on the rating just assigned and the ratings assigned before. A user's trust value affects the weighting of the ratings: the more trusted a user is within a particular category, the greater the impact that user has on the rating of articles belonging to that category.
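The weighting principle just described can be sketched as a trust-weighted mean: each rating is weighted by its submitter's trust value in the article's category, so more trusted users move the aggregate more. This is only an illustration of the principle; the actual WRS RatingCalculator is more involved, and the class and arrays here are assumptions.

```java
// Minimal sketch of trust-weighted rating aggregation.
public class TrustWeightedRating {

    /** Weighted mean of ratings, using per-user trust values as weights. */
    public static double aggregate(double[] ratings, double[] trustValues) {
        double weightedSum = 0.0, totalTrust = 0.0;
        for (int i = 0; i < ratings.length; i++) {
            weightedSum += ratings[i] * trustValues[i];
            totalTrust += trustValues[i];
        }
        // With no trust at all, fall back to a neutral 0.0 rating.
        return totalTrust == 0.0 ? 0.0 : weightedSum / totalTrust;
    }
}
```

With equal trust this reduces to a plain average; raising one user's trust pulls the aggregate toward that user's rating.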


The WRS is developed as a plugin for a programmable proxy called Scone, which allows manipulation of HTML documents. The WRS plugin can be divided into one governing module (the WRS plugin itself) and three utility modules with specific responsibilities, described in Chapter 2. The general picture of the WRS plugin modules is depicted in figure 4.1.

[Component diagram: the Page Module, Rating Module and Trust Module, comprising the Page Extractor, Page Modifier, Rating Extractor, Rating Calculator, Session Ratings, Recommendation Submitter, Trust Updater and Web of Trust components, connecting the user with the Wikipedia HTML document, the recommendations and the feedback.]

Figure 4.1: WRS plugin components

The WRS is a program that uses Wikipedia as a central repository for storing all recommendations. The Wikipedia API is therefore an essential part of the WRS for communicating with the repository and posting recommendations.

The goal of the Wikipedia Application Programming Interface (API) is to provide direct, high-level access to the data stored in the MediaWiki databases [16]. Client programs, such as the WRS, can use the API to log in, get or upload data, and post changes. Each change to the API has direct implications for the client programs, which must be updated accordingly.
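For illustration, a client can address the MediaWiki action API by building request URLs such as the one below, which fetches the latest revision content of a page. The helper class is hypothetical and only constructs the URL; the endpoint and parameters follow the public MediaWiki action API, not necessarily the exact calls used by the WRS.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of building a MediaWiki action API request URL (no network I/O here).
public class MediaWikiApi {
    public static String queryUrl(String title) {
        return "https://en.wikipedia.org/w/api.php?action=query"
             + "&prop=revisions&rvprop=content&format=json"
             + "&titles=" + URLEncoder.encode(title, StandardCharsets.UTF_8);
    }
}
```

A client would issue an HTTP GET on this URL and parse the JSON response; logging in and posting changes use other actions of the same API.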

Once the most suitable classification scheme had been determined, it was necessary to implement it in the WRS. However, due to recent changes in the Wikipedia API, it was impossible to run the WRS: the login procedures could not establish a connection. It was therefore necessary to change the login function. Moreover, the procedures for uploading certificates and ratings were not working; consequently, they were edited by changing some parameters and procedure names. When all these edits had been implemented in the WRS, it appeared that, again due to the changed API, the extraction of ratings from the repository failed. To fix this, it was necessary to make some edits in the ExtractRating class, which belongs to the Page Module.

The ExtractRatings constructor takes the URL of the recommendations repository and constructs an input stream reader. It loads all the tokens that exist in the HTML page into a vector. The reader then searches through the vector until it finds the text area token.

// Once the "WikiTrustComment" marker is found, concatenate the text tokens
// until the closing comment delimiter ("--&gt" or "-->") is reached, and
// store the collected raw rating.
if (sig.substring(0, 16).equals("WikiTrustComment")) {
    String raw_rating = "";
    for (String temp = ((HtmlTextToken) tokens.remove(0)).getText();
            !(temp.contains("--&gt") || temp.contains("-->"));
            temp = ((HtmlTextToken) tokens.remove(0)).getText()) {
        raw_rating += temp;
    }
    raw_ratingsVector.add(raw_rating);
}

The start of a text area in the HTML document is indicated by "WikiTrustComment" and the end by "-->". However, the escaped delimiter code "--&gt", which should mean the same as "-->", was not always what the parser produced, which caused errors. Therefore, the loop condition containing !temp.contains("--&gt") was extended with an OR clause, indicating that "-->" also marks the end of the text area.

All these edits made to the WRS were not directly related to the main objective, the change of the WRS classification scheme. However, it required quite a bit of time and effort to understand the code and to find what was not functioning as it should. Besides, it was not possible to debug or trace the code: the WRS code was compiled, and all the files related to Scone were attached as compiled files. The only way to debug the code was to use the System.out.println() function to print out text or the values of variables.

The sections below describe the modifications of the WRS modules that were made in order to change the classification scheme.

4.2 Applet interface

An applet is a special program written in Java that can be included in an HTML document [13]. When an Internet browser with Java enabled opens a page that contains an applet, the applet's code is transferred to the system and executed by the browser's Java Virtual Machine (JVM). The WRS applet is a part of the WRS. It is based on Java applet technology and is inserted into articles from the http://en.wikipedia.org domain. The applet displays the following information to the user:

• The average rating of the article

• The majority category assigned by users who viewed and rated the article before.

In addition to these functions, the WRS applet allows ratings and categories to be assigned to an article. The overall view of the WRS applet is depicted in Figure 4.2.

Figure 4.2: WRS Applet.

In order to change the top-level categories on the WRS applet, it is necessary to edit the entries of the JComboBox list. When the list is changed, the user sees the new categories on the applet and can assign one of them to a particular article. The Java code below shows that the Open Directory Project top-level categories are listed when the applet appears in the user's browser.

String[] categoryStrings = { "Please select a category",
                             "Arts",
                             "Business",
                             ...
                             "Society",
                             "Sport" };
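A minimal sketch of how such an array might feed the applet's drop-down list is shown below. The class and method names are assumptions, not the actual WRS code, and the middle entries, elided in the listing above, are filled in here from the 15 ODP top-level categories named in the Category enum later in this chapter (rendering KIDSTEENS as "Kids and Teens" is a guess):

```java
import javax.swing.JComboBox;

public class CategoryBox {
    // The 15 ODP top-level categories preceded by the prompt entry.
    static final String[] CATEGORY_STRINGS = {
        "Please select a category",
        "Arts", "Business", "Computers", "Games", "Health",
        "Home", "Kids and Teens", "News", "Recreation", "Reference",
        "Regional", "Science", "Shopping", "Society", "Sport"
    };

    // Build the drop-down list shown on the applet.
    static JComboBox<String> buildCategoryBox() {
        JComboBox<String> box = new JComboBox<>(CATEGORY_STRINGS);
        box.setSelectedIndex(0); // show the prompt until the user picks a category
        return box;
    }

    public static void main(String[] args) {
        System.out.println(buildCategoryBox().getItemCount()); // 16: prompt + 15 categories
    }
}
```

Replacing the array contents is all that is needed to switch the applet to a different classification scheme, since the JComboBox is populated directly from it.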


4.3 Rating Module

The classes Category and RatingCalculator are the parts of the Rating Module that need to be slightly modified. The latter class contains the method which processes the extracted ratings: it calculates the average rating and the majority category of the article. Moreover, according to the extracted ratings, the method computes the user's trust values within all the categories. For this reason the categories array is used as a repository for storing the specific trust value in each category. The previous classification scheme used in the WRS contained 10 top-level categories, whereas the Open Directory Project classification scheme contains 15 top-level categories; consequently, the size of the array has to be enlarged.

double[] categories = new double[15];
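The core of that processing can be sketched as follows. This is a simplified illustration, not the actual RatingCalculator code: the method names and the integer encoding of ratings and category choices are assumptions.

```java
public class RatingStats {
    // Number of ODP top-level categories used in the WRS.
    static final int NUM_CATEGORIES = 15;

    // Average of the submitted ratings (hypothetical helper; the real
    // RatingCalculator works on the rating objects extracted from the repository).
    static double averageRating(int[] ratings) {
        double sum = 0;
        for (int r : ratings) sum += r;
        return sum / ratings.length;
    }

    // Index of the category chosen most often by previous reviewers.
    static int majorityCategory(int[] categoryChoices) {
        int[] counts = new int[NUM_CATEGORIES];
        for (int c : categoryChoices) counts[c]++;
        int best = 0;
        for (int i = 1; i < NUM_CATEGORIES; i++)
            if (counts[i] > counts[best]) best = i;
        return best;
    }

    public static void main(String[] args) {
        int[] ratings = {4, 5, 3};
        int[] choices = {2, 2, 11}; // two votes for category 2, one for category 11
        System.out.println(averageRating(ratings));    // 4.0
        System.out.println(majorityCategory(choices)); // 2
    }
}
```

The per-category trust values would be accumulated into the categories array in the same pass over the extracted ratings.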

The introduction of the new classification scheme causes some changes in the Category class. This class is responsible for handling the categories in the WRS: assigning the category name according to its numeric value and getting the type of the categories. The Java enum data structure is used in the Category class to represent the fixed set of categories used in the WRS. Due to the introduction of the new scheme, this set of constants needs to be changed. The modified enum structure is shown below.

public enum Type {
    ARTS, BUSINESS, COMPUTERS, GAMES,
    HEALTH, HOME, KIDSTEENS, NEWS,
    RECREATION, REFERENCE, REGIONAL, SCIENCE,
    SHOPPING, SOCIETY, SPORT
}
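Since the enum constants are declared in the same order as the numeric category values, resolving a stored number back to a category name reduces to an index lookup. A hedged sketch (fromDigit is a hypothetical helper name, not necessarily the method used in the WRS):

```java
public class CategoryDemo {
    // The modified enum from the Category class.
    public enum Type {
        ARTS, BUSINESS, COMPUTERS, GAMES,
        HEALTH, HOME, KIDSTEENS, NEWS,
        RECREATION, REFERENCE, REGIONAL, SCIENCE,
        SHOPPING, SOCIETY, SPORT
    }

    // Resolve a category from its numeric value, as the Category class
    // does when decoding stored ratings (helper name is an assumption).
    static Type fromDigit(int value) {
        return Type.values()[value];
    }

    public static void main(String[] args) {
        System.out.println(Type.values().length); // 15 top-level categories
        System.out.println(fromDigit(2));         // COMPUTERS
    }
}
```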

4.4 Trust Module

Marginal edits are required in the Reviewer class, which belongs to the Trust Module. This class implements the trust dynamics and calculates the trust value using the trust evolution function. It also implements four basic methods, which retrieve and assign the username and the trust value associated with a category. The Reviewer constructor, besides implementing some trust dynamics logic, contains 7 arrays whose size needs to be enlarged as well, because the number of top-level categories was extended to 15. The modified constructor is shown beneath.

public Reviewer() {
    noOfPosInteractions = new int[15];
    noOfNegInteractions = new int[15];
    curve = new Curve[15];
    state = new State[15];
    xValue = new double[15];
    nValue = new double[15];
    trustValue = new double[15];
    for (int n = 0; n < 15; n++) {
        noOfPosInteractions[n] = 0;
        ...
        trustValue[n] = 0.0;
    }
}
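The four basic accessors mentioned above might look roughly like this; the method names are assumptions, and the trust evolution logic of the real Reviewer class is omitted.

```java
public class Reviewer {
    private String username;
    private final double[] trustValue = new double[15]; // one slot per ODP top-level category

    // Retrieve and assign the username (names are assumptions).
    public String getUsername() { return username; }
    public void setUsername(String name) { this.username = name; }

    // Retrieve and assign the trust value associated with a category.
    public double getTrustValue(int category) { return trustValue[category]; }
    public void setTrustValue(int category, double value) { trustValue[category] = value; }

    public static void main(String[] args) {
        Reviewer r = new Reviewer();
        r.setUsername("alice");
        r.setTrustValue(11, 0.8); // e.g. trusted in the Science category
        System.out.println(r.getUsername() + " " + r.getTrustValue(11));
    }
}
```

Keeping the trust values in a per-category array is what makes the enlargement to 15 slots the only change the new scheme forces on this class.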

4.5 Summary

Changing the classification scheme required only minor edits in the WRS. The modifications involved the Applet Interface, the Rating Module and the Trust Module, with some minor changes in the files of each module. The edits were mostly related to enlarging array sizes and renaming the top-level categories. Besides the modifications related to the classification scheme, it was necessary to adapt the Wikipedia API procedures in the WRS.

Chapter 5

Evaluation

This chapter evaluates the additional time imposed by using the WRS. The evaluation covers a description of the testing environment and the results of the evaluation. Moreover, the limitations of the WRS are reviewed.

5.1 Overhead caused by the WRS

To evaluate the computational overhead caused by the WRS, it is necessary to describe some features of the evaluation environment. First, the proxy server runs on a machine with the parameters listed below:

• Operating System: Windows 7 Professional x86.

• RAM: 4 GB. Only 3 GB are in use.

• CPU: Intel Core 2 Duo, 2.1 GHz.

• Connection: 10/100/1000 BASE-T Gigabit Ethernet.

Secondly, Firebug 1.5.4 was used to measure the time overhead caused by the WRS. It is a Web Development add-on for Mozilla Firefox. This add-on displays information about the page loading and splits it into different processes such as Connecting, Waiting and Receiving, each with its own duration. It is very useful for analyzing which process contributes most to the total loading time.

To evaluate the extra time caused by the WRS when browsing http://en.wikipedia.org, it is necessary to compare the loading times with and without the WRS. The usage of the WRS plugin imposes some extra work on the workstation's CPU before the Web page is displayed to the user. The time depends on whether the page belongs to http://en.wikipedia.org or not. If the page does not belong to the domain, the additional loading time amounts only to a few milliseconds, which are spent on recognizing the page; the rest of the procedures are skipped. When a page belongs to the domain and has recommendations, the loading time increases. The experiment takes into account 90 recommendations. This amount was chosen according to a statement made by Thomas Rune Korsgaard [8] in his Master's thesis: in order to have a confident result about an article, it has to contain between 80 and 100 recommendations. The experiment considered 3 different sizes of Wikipedia page:

1. Small - 28.6 KB¹ - http://en.wikipedia.org/wiki/Vilnius_University

2. Medium - 60.8 KB - http://en.wikipedia.org/wiki/Rotavirus

3. Large - 137.4 KB - http://en.wikipedia.org/wiki/World_War_II

At the beginning, the loading time of all pages was measured without the WRS. Afterwards, 3 recommendations were inserted into each article and the page loading time was measured with the WRS in use. Finally, 90 recommendations were inserted into each article and the time was measured again with the WRS. The results of the experiment are shown in Table 5.1.

              Without WRS   3 recommendations   90 recommendations
                            with WRS            with WRS
Small page    0.109 sec.    2.26 sec.           2.35 sec.
Medium page   0.124 sec.    2.99 sec.           3.06 sec.
Large page    0.211 sec.    6.17 sec.           1 min. 26 sec.

Table 5.1: Page loading time with/without WRS.

Concluding the evaluation of the WRS, it is worth mentioning that the overhead imposed by the WRS depends on several factors, such as the Internet throughput and the technical parameters of the computer. The latter factor influences the overhead the least, because the time difference between calculating 90 and only a few recommendations is insignificant (Figure 5.1), except for large articles. It was noticed that on large articles with 90 recommendations the WRS spent more than a minute connecting to Wikipedia, and only a few hundred milliseconds were taken to process the recommendations. In summary, the WRS spends most of its time establishing the connection with the repository.

¹It was measured with Firebug 1.5.4, taking the size of the received response.

Figure 5.1: Loading times.


5.2 Limitation of WRS

The current version of the WRS has one weakness related to how browsers control the cache. The WRS interface is injected into the HTML document, so all responsibility for handling the page belongs to the browser. The tested browsers (Mozilla Firefox, Internet Explorer, Chrome) cannot properly manage the cache even though the instruction (<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">) is inserted into the Web page.

When a cached page is visited, the WRS does not calculate any ratings, because the browser skips the Scone proxy and loads the page from the cache. For this reason a notification is given to the user, emphasizing that the page was loaded from the cache and that it is necessary to delete the browser cache and reload the page.

5.3 Summary

In this chapter we have evaluated the WRS with the new classification scheme embedded into it. The experiment results showed that the additional time spent by the WRS is mostly needed to establish the connection with the repository. The increased number of recommendations expanded the total loading time only insignificantly on small and medium pages.

Chapter 6

Conclusion

This chapter concludes the Master's thesis by emphasizing the achievements of this project and outlines future work and research on the WRS.

6.1 Wikipedia Recommender System

The Wikipedia Recommender System is a collaborative filtering system that makes it easy to determine the quality of an article and allows users to submit ratings and categories. The WRS includes trust metrics, which allow any user to calculate a decentralized trust profile for any other user. Once the article classification was implemented in the WRS, the classification scheme started playing a vital role in determining the user's area of expertise. The user's expertise is expressed by assigning different trust values to each user according to the top-level category. This improves the credibility of the user's ratings and implies an enhanced quality of the articles.

The goal of the Master's thesis was to determine whether people agree about the classification of articles in the Wikipedia, because the assessment of recommender expertise would not provide useful information if people generally disagreed about the topic of articles. To evaluate people's agreement on the classification of Wikipedia articles, a survey was arranged. It investigated four different information classification schemes: Citizendium, Dewey Decimal Classification, Open Directory Project (Dmoz) and the top-level Wikiportals. The purpose of the survey was to determine the scheme that enables people to classify articles most consistently, with the highest level of agreement.

The absence of a formal approach to evaluating classification schemes initiated a piece of research whose purpose was to discover how classification schemes could be evaluated. The assessment of the schemes was captured by two metrics: the majority flag and the user experience value. The initial goal of the majority flag was to indicate that, among all possible categories in a particular scheme, there is one category which was chosen by a majority of people (≥ 50%). The experience value was considered the primary metric for evaluating schemes. It was designed according to the criteria which indicate what features a classification scheme should have in order to be feasible in the WRS. To calculate the experience value, people were asked to answer four questions after classifying the articles. Each answer had a weight value, which was multiplied by the percentage of people who selected that answer and added to the experience value. When the experience value for each question had been calculated, the total experience value was computed by adding the experience values of all four questions. The total value indicates people's experience with the respective classification scheme.
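Under these rules, the experience value for one question is a weighted sum over its answers, and the total is the sum over the four questions. A small sketch with made-up weights and percentages (the actual weights are defined by the criteria in Chapter 3):

```java
public class ExperienceValue {
    // Weighted sum for one question: each answer's weight times the
    // percentage of respondents who chose that answer.
    static double questionValue(double[] weights, double[] percentages) {
        double value = 0;
        for (int i = 0; i < weights.length; i++)
            value += weights[i] * percentages[i];
        return value;
    }

    // Total experience value: the sum over all questions of the survey.
    static double totalExperience(double[][] weights, double[][] percentages) {
        double total = 0;
        for (int q = 0; q < weights.length; q++)
            total += questionValue(weights[q], percentages[q]);
        return total;
    }

    public static void main(String[] args) {
        double[] w = {3, 2, 1};    // hypothetical answer weights for one question
        double[] p = {60, 30, 10}; // percent of respondents per answer
        System.out.println(questionValue(w, p)); // 3*60 + 2*30 + 1*10 = 250.0
    }
}
```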

All the articles were classified with the aforementioned classification schemes. In order to assess the schemes we employed the two metrics: the user experience value and the majority flag. The results for the user experience value are as follows: Citizendium scored 113, Dewey Decimal Classification 576, Open Directory Project 1080 and Wikiportals 742. The results for the majority flag are as follows: Citizendium 2 out of 4, Dewey Decimal Classification 0 out of 4, Open Directory Project 4 out of 4 and Wikiportals 2 out of 4. According to these metrics, the Open Directory Project achieves the best results for classifying Wikipedia articles. The reliable classification can be explained by the fact that the categories of the Open Directory Project classification scheme are specific enough and do not overlap with each other. Furthermore, it contains 15 top-level categories and suggests a classification framework which corresponds to the expectations of modern society.

As mentioned before, all schemes were assessed using the user experience value and the majority flag. The uncertainty of the majority flag metric raised new discussions about its usefulness. The initial goal of the majority flag is to indicate that there is one category which was chosen by a majority of people. It could also be used as a means of determining the consistency of user classification. However, later analysis revealed that this metric creates uncertainty when the majority is only slightly above 50%. A minimal preponderance of the majority, for instance 51%, creates a maximum level of uncertainty (49%): the categories which constitute the minority could still influence the majority. Moreover, maximum uncertainty indicates that there is no consistency in people's agreement on the classification of articles. Therefore, I believe that the majority flag would probably make more sense if the majority boundary were set to at least 60%. The enhanced boundary would mitigate the uncertainty level.

In conclusion, the majority flag is not as important as the user experience value, because the user experience value was designed according to the criteria. The criteria describe what features a classification scheme should have in order to be used in the WRS (Chapter 3).

6.2 Summary of Conclusion

The assessment of the classification schemes revealed that the Open Directory Project (Dmoz) classification scheme lives up to our expectations as the main scheme for classifying articles in the WRS. The Open Directory Project classification scheme considers all information from a modern point of view and includes top-level categories that reflect what is current at this time. According to the survey results, the classification was accomplished most reliably compared with the remaining schemes, and the user experience value is the highest of all the schemes. This means that people generally agree on how information can be classified with the Open Directory Project classification scheme.

In conclusion, the classification scheme takes an important place in the WRS, because it is part of the credibility enhancement of Wikipedia articles. The more consistently people classify, the more precise the trust values that can be calculated and, therefore, the more accurate the ratings provided for an article.

6.3 Future works and research

The newly implemented article classification scheme requires an assessment of how useful it is. It would be beneficial to see how the classification scheme directly influences the ratings.

From a security point of view, the way the WRS logs on to the Wikipedia should be changed. The credentials, which are sent in a POST message, are not encrypted and can be read in clear text. If the WRS logged on via https://secure.wikimedia.org/wikipedia/en/w/index.php?title=Special:UserLogin, it would be possible to solve this security issue.


One important area of research for the WRS is the empirical analysis of the trust function. It would require a large data set for comparing the prediction algorithm currently implemented in the WRS with other algorithms that are popular in recommendation systems.

The cold start problem is a common problem for recommender systems. It appears when a new user is introduced into the recommendation system. To deal with this problem, trust propagation is used. However, the decentralized approach of the WRS makes trust distribution difficult. Moreover, there are probably no obvious solutions for extending the current trust propagation scheme.

Appendix A

DDC as automatic classification

Various projects are exploring the possibility of replacing human classification with software that can classify resources automatically [1]. The Dewey Decimal Classification (DDC) has been explored for performing automatic Web classification, but only a few studies propose approaches for how this could be achieved.

The DDC is a large-scale classification scheme, which imposes some obstacles to text classification, including a very deep hierarchy, data sparseness and an uneven category distribution. To resolve these problems, the scheme needs to be reconstructed. This means that the automatic process should follow a path different from that of a human classifier when traversing the category hierarchy. Therefore, Jun Wang and Meng-Chen Lee [14] proposed an algorithm which reshapes the DDC into a balanced virtual taxonomy by merging sparse categories, cutting dense branches and flattening the hierarchy. Consequently, the trimmed DDC contains far fewer categories (in this case counting all categories, not just the top level) and the taxonomy depth is reduced from 23 to 4. However, the average number of documents per category is increased significantly. The dramatic reconstruction does not have a considerable negative impact on classification: the reconstructed DDC with interactive classification improves the accuracy to nearly 90%, reaching the level of practical application.


The results of automatic classification of Wikipedia articles could be used as a template against which to compare how WRS users classify the articles. Based on that, it would be possible to determine how reliably users classify articles and to penalize or reward them in a specific category. However, automatic text categorization is based on machine learning, and implementing it in the WRS would require developing software. I believe that the usage of such software would significantly increase the extra time needed to process a Wikipedia article with the WRS. Moreover, it would be necessary to investigate the accuracy of the classification, which strongly depends on the classifier. A human classifier still has no strong rivals when it comes to consistent classification.

Bibliography

[1] Sue Batley. Classification in Theory and Practice. Chandos Publishing, Oxford, UK, 2005.

[2] C. Jensen and T. Lefevre. Evaluating the expertise of recommenders in the Wikipedia Recommender System. In Short Paper Proceedings of IFIPTM 2010, Morioka, Japan, 2010.

[3] Catholijn M. Jonker, Joost J.P. Schalken, Jan Theeuwes and Jan Treur. Human experiments in trust dynamics. Springer-Verlag, Berlin Heidelberg, 2004.

[4] Christian Damsgaard Jensen and Thomas Rune Korsgaard. Dynamics of trust evolution. In Proceedings of the International Conference on Security and Cryptography, pages 509-518, Portugal, 2008.

[5] Citizendium. Citizendium. http://en.citizendium.org/wiki/Citizendium, April 2010.

[6] Joan S. Mitchell and Diane Vizine-Goetz. The Dewey Decimal Classification. OCLC Online Computer Library Center, Inc., 2009.

[7] Catholijn M. Jonker and Jan Treur. Formal analysis of models for the dynamics of trust based on experiences. In Proceedings of the 9th European Workshop on Modelling Autonomous Agents in a Multi-Agent World: Multi-Agent System Engineering, pages 221-232. Springer Verlag, 1999.

[8] Thomas Rune Korsgaard. Improving trust in the Wikipedia. Master's thesis, Technical University of Denmark, 2007.


[9] Thomas Lefevre. Extending the Wikipedia Recommender System: Assessing expertise of recommenders. Master's thesis, Technical University of Denmark, 2009.

[10] Peter Denning, Jim Horning, David Parnas and Lauren Weinstein. Wikipedia risks. Communications of the ACM, 2005.

[11] Open Directory Project. About the Open Directory Project. http://www.dmoz.org/about.html, April 2002.

[12] Xiaoguang Qi and Brian D. Davison. Web page classification: Features and algorithms. ACM Computing Surveys, Vol. 42, No. 2, Article 12, 2009.

[13] Java Sun. Applets. http://java.sun.com/applets, July 2010.

[14] Jun Wang and Meng-Chen Lee. Reconstructing DDC for interactive classification. In CIKM '07, Lisboa, Portugal, November 6-8, 2007.

[15] Wikipedia. Albert Einstein. http://en.wikipedia.org/wiki/Albert_Einstein, March 2010.

[16] Wikipedia. API. http://www.mediawiki.org/wiki/API, July 2010.

[17] Wikipedia. Basketball. http://en.wikipedia.org/wiki/Basketball, March 2010.

[18] Wikipedia. Moon landing. http://en.wikipedia.org/wiki/Moon_landing, March 2010.

[19] Wikipedia. Rotavirus. http://en.wikipedia.org/wiki/Rotavirus, March 2010.