[ieee 2010 5th international symposium on telecommunications (ist) - tehran, iran...

A Social Network-based Meta Search Engine

Mohammad Ali Ghaderi Central & intelligent processing center of excellence, School of

ECE University of Tehran

Tehran, Iran

Nasser Yazdani Central & intelligent processing center of excellence, School of

ECE University of Tehran


Behzad Moshiri Central & intelligent processing center of excellence, School of

ECE University of Tehran


Abstract— Social networks, an application of the social web, have attracted many users and have become one of the most important services on the Internet. The interaction of users in a social network generates a huge amount of data. In this paper, a novel meta-search engine is proposed that exploits social network data to improve web search results. The system modifies the multi-agent architecture of meta-search engines by adding new agents that gather users’ interaction data and process it to create user profiles, following methods from previous research. These profiles are used to re-rank the top results of a web search engine and increase retrieval effectiveness. The Normalized Discounted Cumulative Gain (NDCG) measure is used to evaluate the system. Experimental results show the potential usefulness of social network data for improving web search effectiveness.

Keywords-social network; meta search engine; user social model; user profiling; social search

I. INTRODUCTION

The Web has grown greatly in recent years. Many users use the Web for different purposes such as shopping, getting information, communication and other web services. Web 2, or the social web, is a major revolution of the Web. Web 2 includes people as a part of the Web and makes it possible for them to interact and build web content. Many services have been built on top of the social web, such as wikis and weblogs, to name a few. Social networks are one of the most important services of Web 2. They provide a friendly environment in which users share their ideas and interests. Social networks have attracted a very large number of users in recent years. For example, Facebook, a well-known social network, has reached more than 500 million active users [8]. On the one hand, these users generate large amounts of data and share them with their social friends, and it is worth looking for ways to use this kind of data. On the other hand, “the most web search engines in use fail to take advantage of the intentions, interests and preferences of their users” [21]. Thus, this paper investigates the potential use of data obtained from social networks to improve web search results.

Many people use search engines to find what they need on the Web. Research shows that each search engine covers only part of the Web. Therefore, meta-search engines were invented to combine the results of different search engines and increase web search effectiveness through larger coverage of the indexed Web. Today’s meta-search engines do more than simply combine search engine results. They try to create profiles for their users and personalize search results by taking these profiles into account. This process is called search personalization, and its use is not limited to meta-search engines. One of the most important personalization methods is re-ranking, which changes the rank of the final results according to users’ interests as captured in their profiles, so that the top results are more likely to interest the user. Many studies show that users find personalization an effective way to find the information they need faster.

In this paper, the multi-agent architecture of meta-search engines is modified to handle social network data. New agents gather different kinds of data from social networks and process them to build user profiles. These profiles are used to re-rank search engine results and provide better results by considering users’ interests. To evaluate the potential of social networks for search personalization, the Normalized Discounted Cumulative Gain (NDCG) measure is calculated from the explicit judgments of a limited number of users.

This paper is organized as follows. The next section gives a brief overview of social search, meta-search engines and search personalization. Section III explains our novel meta-search engine, its architecture, user interface, and user profiles. The evaluation process and the experimental results of re-ranking based on the user social model are presented in Section IV. Finally, the paper concludes with a short summary and a discussion of future work.

II. BACKGROUND KNOWLEDGE AND RELATED WORKS

A. Meta-search engines

Meta-search engines have been an active research area in recent years [6]. Many meta-search engines have been created to combine the results of different information retrieval systems, such as ProFusion [1], SavvySearch [2], WebFusion [3][7] and I-SPY [16], to name a few. Some of them use multi-agent systems for their architecture [1][3]. In these systems, agents act as autonomous components and each agent has its own territory of activity. In this research, these architectures are reviewed and an extension to them is proposed.

2010 5th International Symposium on Telecommunications (IST'2010)

978-1-4244-8185-9/10/$26.00 ©2010 IEEE

B. Social Search

“Web 2”, the social web, has grown quickly since its appearance. As a result, the use of social information for information retrieval has received increasing attention in recent years. A proper definition of this concept is: “Social information retrieval refers to a family of information retrieval techniques that assist users in obtaining information to meet their information needs by harnessing other users’ expert knowledge or search experience” [12]. Freyne and Smyth’s I-SPY system [16][20] is one of the earliest meta-search engines that considers search communities and brings together ideas from web search, social networking and personalization. Kirsch has classified social networks for information retrieval and compared them with other approaches [11]. The usefulness of Social Network Analysis (SNA) methods for internet search has also been studied [18][22]. Some innovative applications, such as ArnetMiner [9][10], have been created to solve specific information retrieval problems, such as person search using social networks extracted from publications.

C. Search Personalization

Many studies argue that personalization can improve the relevance of search results [23]. Collaborative filtering, which is widely used in recommender systems, has found its way into search personalization [15], and the impact of user collaboration on search systems has been investigated [14]. Several methods can be used for personalization, such as re-ranking, query expansion, and query modification [17]. Clustering users into groups and using a shared profile for all members of a group has also been shown to have some advantages over using a single profile for each user [24]. Different kinds of data can be used for personalization, such as desktop data, emails, search histories and anthologies, to name a few [17]. The objective of this paper is to examine whether data obtained from social networks is useful for personalization by the re-ranking method.

D. Background knowledge

The Vector Space Model (VSM) [19] is used to represent the gathered text data and to create user profiles. In this model, each term is a feature with an assigned weight. Terms are weighted by combining term frequency with inverse document frequency (TF*IDF) [19].
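The TF*IDF weighting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper does not state which TF*IDF variant was used, so the raw term count and the natural-log IDF below are assumptions, and the function name `tfidf_vectors` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse {term: weight} vector per tokenized document.

    Assumed variant: weight = raw term count * ln(N / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        term_freq = Counter(doc)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


# Hypothetical, already tokenized documents.
docs = [
    ["social", "network", "search"],
    ["search", "engine", "search"],
    ["social", "web"],
]
vectors = tfidf_vectors(docs)
```

With this variant, a term that occurs in every document gets weight zero, while a term that is frequent in one document and rare elsewhere gets a high weight.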

The NDCG measure [27], a well-known measure of how well a ranking places highly relevant results near the top, is used for evaluation. To calculate it, the DCG value is first calculated using the following formula:

$$
DCG[i] = \begin{cases}
G[1], & \text{if } i = 1 \\
DCG[i-1] + G[i] / \log_b i, & \text{otherwise}
\end{cases}
\qquad (1)
$$

Here G is the gain vector, each element of which is the user’s explicit judgment value for one document, and i is the index (rank) of each document. Let SortedG be the gain vector sorted in descending order. NDCG is then calculated by dividing the DCG value of G by the DCG value of SortedG.
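The calculation can be sketched in a few lines of Python. The paper does not state the logarithm base b, so the default `base=2` below is an assumption, and the function names `dcg` and `ndcg` are illustrative.

```python
import math


def dcg(gains, base=2):
    """Discounted cumulative gain of a ranked gain vector, per formula (1)."""
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        if rank == 1:
            total = float(gain)                    # DCG[1] = G[1]
        else:
            total += gain / math.log(rank, base)   # DCG[i-1] + G[i] / log_b(i)
    return total


def ndcg(gains, base=2):
    """DCG of the ranking divided by the DCG of the ideally sorted ranking."""
    ideal = dcg(sorted(gains, reverse=True), base)
    return dcg(gains, base) / ideal if ideal > 0 else 0.0


# Hypothetical judgments (0, 1 or 2) for the top three results of one query.
worst_first = ndcg([0, 1, 2])
best_first = ndcg([2, 1, 0])
```

A ranking that already lists results in descending order of judged relevance scores 1.0, and any other ordering scores less.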

III. A PROTOTYPE SYSTEM ARCHITECTURE, USER INTERFACE AND USER PROFILES

A. The proposed architecture

Our architecture extends the base meta-search engine architecture proposed by Susan Gauch in ProFusion [4] by adding new agents that prepare it for using social network data. Figure 1 shows the Convenient architecture of meta-search engines [5] and Figure 2 depicts the ProFusion meta-search engine architecture [4].

Figure 1. Convenient architecture of Meta search engines

In the Convenient architecture, each agent is responsible for fetching data from, and interacting with, one underlying search engine.

Figure 2. ProFusion agents control flow

The ProFusion architecture has several agents. The Dispatch agent selects among the underlying search engines and gets results from them. The Learning agent collects users’ feedback and uses it to adapt the meta-search engine to users’ interests. The Guarding agent checks the accessibility of each underlying search engine.


Figure 3. A multi-agent based architecture for the social network based Meta search engines.

The ProFusion architecture lacks agents for dealing with social networks. To fill this gap, four new abstract types of agents are added to the architecture. These new agents are shown as blue boxes in Figure 3. Arrows indicate data flow between agents.

1) SNCrawler (Social network information crawler)

This agent gets information from social networks. SNCrawler is an abstract agent and should be implemented separately for each type of data to be gathered. This means that each type of available data, such as “Notes”, “Photos”, “Videos” and “News”, requires its own SNCrawler agent. Our first implementation had only one instance of this agent, but we found that it became too complicated over time. The main reasons for this complexity are the variety of data types and the different time intervals required for crawling them. For example, “News” data should be gathered at shorter intervals than “Friends” data. The architecture was therefore changed to handle each type of data with its own specific agent.
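One way to picture an abstract crawler with one concrete agent per data type is the sketch below. It is only an illustration of the design described above: the class and method names, the placeholder `fetch` bodies, and the crawl intervals (15 minutes and one day) are all assumptions, not values from the paper.

```python
from abc import ABC, abstractmethod


class SNCrawler(ABC):
    """Abstract crawler agent: one concrete subclass per data type."""

    data_type = ""          # e.g. "News" or "Friends"
    interval_seconds = 0    # assumed re-crawl period for this data type

    def __init__(self):
        self.last_run = None  # time of the last crawl, in seconds

    def is_due(self, now):
        """True if this agent has never run or its interval has elapsed."""
        return self.last_run is None or now - self.last_run >= self.interval_seconds

    @abstractmethod
    def fetch(self, user_id):
        """Return the raw items of this data type for one user."""

    def run(self, user_id, now):
        """Crawl only when due; otherwise return nothing."""
        if not self.is_due(now):
            return []
        self.last_run = now
        return self.fetch(user_id)


class NewsCrawler(SNCrawler):
    data_type = "News"
    interval_seconds = 15 * 60  # assumed: every 15 minutes

    def fetch(self, user_id):
        # Placeholder: a real agent would call the social network API here.
        return [{"user": user_id, "type": self.data_type}]


class FriendsCrawler(SNCrawler):
    data_type = "Friends"
    interval_seconds = 24 * 60 * 60  # assumed: once a day

    def fetch(self, user_id):
        # Placeholder: a real agent would call the social network API here.
        return [{"user": user_id, "type": self.data_type}]


news, friends = NewsCrawler(), FriendsCrawler()
first = news.run("u1", now=0)
too_soon = news.run("u1", now=60)
```

Keeping the schedule inside each subclass means that adding a new data type only requires a new agent, without touching the existing ones.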

2) SNModeler (Social network user modeler)

After information is obtained from the social network, it must be processed to make it useful for the re-ranking process. The SNModeler agent processes this data and builds a user profile. The main activity of SNModeler is similar to that of the UserModeler agent of ProFusion; the data, however, is different. The methods used for creating user profiles are described in Section III.C.

3) SNTokenManager (Social network token manager)

Tokens give an application permission to gather information from a social network. Some tokens expire after a certain amount of time because of user inactivity. Even Facebook’s “Forever” tokens, those obtained using the “offline_access” permission, have this problem. During our experiments, expired tokens caused problems for the SNCrawler agents. The SNTokenManager agent was therefore created to manage tokens and keep them up to date. Its main activity is to check tokens and their validity and to request new tokens whenever required.
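The check-then-renew behavior of this agent might look like the sketch below. The paper does not describe how tokens are validated or renewed, so the two callables passed to the constructor are hypothetical stand-ins for calls to the social network, and the fake backends exist only to make the example runnable.

```python
class SNTokenManager:
    """Keeps one access token per user and renews it when it is invalid."""

    def __init__(self, is_valid, request_new_token):
        self._is_valid = is_valid                    # hypothetical: token -> bool
        self._request_new_token = request_new_token  # hypothetical: user_id -> token
        self._tokens = {}

    def store(self, user_id, token):
        self._tokens[user_id] = token

    def get_token(self, user_id):
        """Return a valid token, requesting a new one if needed."""
        token = self._tokens.get(user_id)
        if token is None or not self._is_valid(token):
            token = self._request_new_token(user_id)
            self._tokens[user_id] = token
        return token


# Fake backends, for illustration only.
expired = {"old-token"}
issued = []


def fake_is_valid(token):
    return token not in expired


def fake_request_new_token(user_id):
    token = f"new-token-for-{user_id}"
    issued.append(token)
    return token


manager = SNTokenManager(fake_is_valid, fake_request_new_token)
manager.store("u1", "old-token")
manager.store("u2", "good-token")
```

Crawler agents would then ask the manager for a token before each crawl instead of holding tokens themselves.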

4) SNAnalyzer (Social network analyzer)

The main activity of the SNAnalyzer agent is to perform social network analysis (SNA) on the gathered data. Although the results in this paper are not based on measures from the SNA literature, we kept this agent in the architecture to make it more complete for future research. SNAnalyzer should be an abstract agent, which means that each type of analysis method requires its own agent. For instance, one agent is required to compute the “Centrality” measure of a social network item, another to calculate the “Distance” measure, and so on. These measures can be used to improve the effectiveness of web search, as discussed by Gummadi et al. [18].
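To make the two example measures concrete, the sketch below computes degree centrality and hop distance on a small friendship graph. It is not part of the evaluated system: the paper does not specify which centrality definition an SNAnalyzer agent would use, so degree centrality is an assumption, and the graph and names are hypothetical.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {
        node: (len(neighbors) / (n - 1) if n > 1 else 0.0)
        for node, neighbors in graph.items()
    }


def distance(graph, source, target):
    """Number of hops on a shortest path (breadth-first search), or None."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


# Hypothetical undirected friendship graph as adjacency sets.
friends_graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann"},
    "cat": {"ann", "dan"},
    "dan": {"cat"},
}
centrality = degree_centrality(friends_graph)
hops = distance(friends_graph, "bob", "dan")
```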

B. User Interface

Figure 4. Integrated User Interface with Facebook

The user interface of this application is designed like that of most familiar web search engines. Figure 4 shows a view of this interface. Users can interact with it inside their Facebook account and also on a separate website. The UI is used to save users’ queries, search results and explicit relevance judgments.

[Figure 3 labels: SNTokenManager; Social Network (SN); SNCrawler (News, Demographic, Friends); SNModeler; SNAnalyser (Centrality, Degree, Betweenness); data flows: Tokens, SN Data, SN Saved Data, SN Models, SN-based User Model, Re-Ranking]

C. User Profiles

Since the purpose of this research is to investigate the usefulness of social network data, not personalization methods or techniques for creating and using profiles, the Vector Space Model (VSM) [17] is used to represent the obtained text data. This model has also been used to represent user models in previous personalization research [28], [29].

Different types of data can be retrieved from the Facebook API, such as “Notes”, “News” and “Friendship relations”, to name a few. We decided to build our profiles on “News” data because, in our experience, it is the richest data that can be gathered from Facebook. “News” data differs from RSS feeds (RSS is most commonly expanded as Really Simple Syndication [30]) in that it contains not only news shared by the fan pages the user subscribes to but also items shared by other related users (i.e. friends). In addition, users can comment on “News” items, mark them as liked, or re-share them with other friends. “News” data in Facebook is a type of “Post” data and has several properties, including “id”, “from”, “to”, “message”, “picture”, “link”, “name”, “caption”, “description”, “source”, “icon”, “attribution”, “actions” (like, share), “likes” (the number of people who like the item), “created_time”, “updated_time” and “type” (i.e. link, photo, video). We merged text properties such as “message” and “description” to build our profiles. Our profiles are therefore a set of VSM vectors derived from Facebook “News” data. The weight of each word is calculated by multiplying its frequency by its inverse document frequency (TF*IDF). The similarity of each search result to the user’s profile determines the rank of that result.

Such a profile may obviously contain many words irrelevant to the user’s interests. We therefore tried to optimize our profiles with heuristics proposed in previous research [28], such as removing stop words (stop word list [32]), removing words with a document frequency below 1 percent, and removing words shorter than 3 characters. In addition, we trimmed each profile to its most valuable terms, using a threshold chosen by trial and error.

As mentioned above, creating these user profiles is the main activity of the SNModeler agent.
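The term-filtering heuristics used when building a profile can be sketched as follows. This is an illustration under stated assumptions: the tokenizer, the tiny `STOP_WORDS` set (a stand-in for the stop word list [32]), the function names and the sample posts are hypothetical, and the final trial-and-error threshold on the most valuable terms is not shown.

```python
import re
from collections import Counter

# Tiny stand-in for a real stop word list.
STOP_WORDS = {"the", "and", "for", "with", "this", "that"}


def post_text(post):
    """Merge the text properties of one "News" post."""
    return " ".join(post.get(field, "") for field in ("message", "description"))


def tokenize(text):
    # Assumed tokenizer: lowercase runs of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())


def profile_vocabulary(posts, min_df_ratio=0.01, min_length=3):
    """Terms kept after the stop word, length and document frequency filters."""
    docs = [tokenize(post_text(post)) for post in posts]
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    return {
        term
        for term, df in doc_freq.items()
        if term not in STOP_WORDS
        and len(term) >= min_length
        and df / n_docs >= min_df_ratio
    }


# Hypothetical posts with the "message" and "description" properties.
posts = [
    {"message": "The new search engine is fast", "description": "AI search"},
    {"message": "Go to the web"},
]
vocabulary = profile_vocabulary(posts)
```

The surviving terms would then be weighted with TF*IDF to form the profile vectors.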

IV. EVALUATION PROCESS AND EXPERIMENTAL RESULTS

This section describes the evaluation process and the experimental results obtained.

To evaluate the potential usefulness of social network data for improving web search results, a meta-search engine application was developed. This system does not implement all of the described agents, so the architecture should be seen as a prototype for future systems. Future development of this system will nevertheless be based on the proposed architecture.

This application uses the Facebook Graph API (application programming interface) and the OAuth 2.0 authentication API [26] with the “offline_access” permission to fetch Facebook data. The Graph API is the core of the Facebook Platform and enables applications to read and write data to Facebook [26].

The evaluation method is similar to that of previous personalization studies such as [24], [31]. Users are asked to register in the application with their social network account. This registration permits the application to fetch their data, which is used to build user profiles. Next, users are asked to enter queries related to the gathered data. Although these queries bias the results upward because of their strong relationship to the created profiles, they are suitable for testing the hypothesis that this kind of data is useful for personalizing specific types of queries. Finally, users are asked to judge the results retrieved by the meta-search engine by choosing one of three scores for each result:

0- Non-relevant (or not interested in)

1- Relevant (or somewhat interested)

2- Highly relevant (or highly interested)

These judgments are made on the top 25 results of each query and are used to calculate the NDCG measure (1).

Cosine similarity is used in a simple formula to calculate the similarity between each web search result and the vectors of the user profile for the re-ranking process.
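A minimal sketch of cosine-based re-ranking is given below. The paper does not spell out its "simple formula", so this example assumes the profile has been collapsed into a single sparse term vector and that results are simply sorted by their similarity to it. The function names, result identifiers and weights are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(results, profile):
    """Sort (result_id, vector) pairs by descending similarity to the profile.

    The sort is stable, so ties keep the search engine's original order.
    """
    return sorted(results, key=lambda item: cosine(item[1], profile), reverse=True)


# Hypothetical profile and search results.
profile = {"football": 2.0, "tehran": 1.0}
results = [
    ("a", {"cooking": 1.0}),
    ("b", {"football": 1.0, "league": 1.0}),
    ("c", {"football": 1.0, "tehran": 1.0}),
]
reranked_ids = [result_id for result_id, _ in rerank(results, profile)]
```

Here the result that shares the most weighted terms with the profile moves to the top, and the result with no shared terms moves to the bottom.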

The NDCG value reported for each user is the average of the NDCG values over all of that user’s queries. In other words, the values shown in the results are averages; some queries score above the average and others below it.

Figure 5 shows achieved averaged NDCG value for each user.

Figure 5. Experimental results

It can be inferred from this chart that profiles created from social network data have the potential to represent users’ interests and to improve web search effectiveness. The results for “User 5” differ from those of the other users: for this user, the re-ranking process not only fails to increase search effectiveness but actually decreases it. There may be two reasons for this:

1- The profile of the user was not rich enough for certain queries.

2- Most of the queries used by this user were not suitable for personalization. In other words, the original top results were already the best results, and the re-ranking process assigned them lower ranks.

However, further research is required to analyze the flaws of the profiles and find ways to deal with them. We should emphasize again that these results are biased and may not be as convincing as they should be; our aim here is to look for potential improvements.

The values depicted in Figure 5 are averages of the NDCG values over all queries of each user. This means that the re-ranking process might not increase the NDCG value for every query. Figure 6 shows the NDCG values calculated for each query of one user.

Figure 6. Experimental results of one user by queries

As can be seen in Figure 6, the performance of “Q1” is decreased by the re-ranking process. The averaged NDCG value for “User 1”, however, is promising.

V. CONCLUSION AND FUTURE WORKS

In this paper, an extended architecture for meta-search engines based on social networks is proposed, and the new agents, their activities and their communications are explained. We implemented the required agents in a prototype system and integrated it with the Facebook social network to evaluate the usefulness of social network data for web search. A number of users were asked to enter queries into the proposed system and judge the results explicitly. Although the results are not as convincing as they should be for a real personalization system, they show that profiles created from social network data have the potential to be used for personalization and for improving web search results. However, further research is still required on creating profiles and using the data. Social search systems are at the earliest stages of their development, so there is still a long way to go.

In future work, we want to analyze the flaws of the profiles and build better ones. We also want to extend the profiles by considering the data of users’ friends and adding the history of visited web pages. The new profiles will then contain information not only from social network data but also about the user’s behavior in the meta-search engine. The re-ranking method used in this paper was very simple, so we also want to develop a more effective re-ranking algorithm.

REFERENCES

[1] S. Gauch, G. Wang, M. Gomez. ProFusion: Intelligent Fusion from Multiple, Distributed Search Engines. Journal of Universal Computer Science, 2(9), 1996.

[2] A. E. Howe and D. Dreilinger. SavvySearch: A meta-search engine that learns which search engines to query. AI Magazine, 18(2), 1997.

[3] A. H. Keyhanipour, B. Moshiri, M. Kazemian, M. Piroozmand, C. Lucas. Aggregation of web search engines based on users’ preferences in WebFusion. Knowledge-Based Systems, vol. 20, pp. 321–328, 2007.

[4] Y. Fan, S. Gauch, Adaptive Agents for Information Gathering from Multiple, Distributed Information Sources, University of Kansas, 1999.

[5] W. Meng, K.L. Lui, Building Efficient and Effective Metasearch engines, ACM Computing Surveys, 34(1), March 2002, pp. 48-84, 2002.

[6] A.H. Keyhanipour, M. Piroozmand, B. Moshiri, C. Lucas, A multilayer/ multi-agent architecture for meta-search engines, in: Proceedings of ICGST International Conference on Artificial Intelligence and Machine Learning (AIML-05), Cairo, Egypt, 2005.

[7] A. H. Keyhanipour, B. Moshiri, M. Piroozmand, C. Lucas. WebFusion: Fundamentals and Principals of a Novel Meta Search Engine. IJCNN'2006, pp.4126-4131, 2006.

[8] Zuckerberg, Mark (2010-07-21). "500 Million Stories". Facebook. http://blog.facebook.com/blog.php?post=409753352130. Retrieved 2010-07-21.

[9] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, "Arnetminer: extraction and mining of academic social networks," in KDD '08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, 2008, pp. 990-998. [Online]. Available: http://dx.doi.org/10.1145/1401890.1402008

[10] J. Tang, J. Zhang, D. Zhang, L. Yao, C. Zhu and J. Li, Arnetminer: An expertise oriented search system for web community. Proceedings of ISWC 2007, 2007.

[11] S. M. Kirsch, Social Information Retrieval, Diploma Thesis in Computer Science, Rheinische Friedrich-Wilhelms-Universität Bonn, 2005. [Online] http://www.sebastian-kirsch.org/moebius/docs/socialir-thesis.pdf

[12] D. Goh, D. Goh, and S. Foo, Social Information Retrieval Systems: Emerging Technologies and Applications for Searching the Web Effectively. Hershey, PA: Information Science Reference - Imprint of: IGI Publishing, 2007.

[13] J. Freyne and B. Smyth, "An experiment in social search," in Proceeding of 3rd International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, Lecture Notes in Computer Science 3137. Springer-Verlag, 2004, pp. 95-103.

[14] A. K. Agrahri, Divya, and J. Riedl, "Can people collaborate to improve the relevance of search results?" in RecSys '08: Proceedings of the 2008 ACM conference on Recommender systems. New York, NY, USA: ACM, 2008, pp. 283-286. [Online]. Available: http://dx.doi.org/10.1145/1454008.1454052

[15] U. Rohini and V. Ambati, Improving Re-ranking of Search Results Using Collaborative Filtering, 2006. [Online]. Available: http://dx.doi.org/10.1007/11880592_16


[16] B. Smyth, J. Freyne, M. Coyle, P. Briggs, E. Balfe, and T. Building, "I-spy: anonymous, community-based personalization by collaborative web search," in Proceedings of the 23rd SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, 2003, pp. 367-380. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.95.6536

[17] A. Micarelli, F. Gasparetti, F. Sciarrone, and S. Gauch, "Personalized search on the world wide web," 2007, pp. 195-230. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-72079-9_6.

[18] K. P. Gummadi, A. Mislove, and P. Druschel, "Exploiting social networks for internet search," in Proc. 5th Workshop on Hot Topics in Networks, Irvine, CA, 2006, pp. 79-84.

[19] F. Sebastiani. Machine learning in automated text categorization. ACM Comput. Surv. 34, 1 (Mar. 2002), pp. 1-47, 2002. DOI= http://doi.acm.org/10.1145/505282.505283.

[20] J. Freyne and B. Smyth, "An experiment in social search," in Proceeding of 3rd International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, Lecture Notes in Computer Science 3137. Springer-Verlag, pp. 95-103, 2004.

[21] Pujol, J.M., Sanguesa, R., & Bermudez, J. (2003, May 20-24). Porqpine: A distributive and collaborative search engine. Proceedings of the 12th International World Wide Web Conference (WWW2003, p. S25), Budapest, Hungary. Retrieved May 27, 2006. [Online]. Available: http://www2003.org/cdrom/papers/poster/p341/p341-pujol.html.html

[22] Kirchhoff, L., Stanoevska-Slabeva, K., Nicolai, T., Fleck, M.: Using social network analysis to enhance information retrieval systems. In: Social network applications conference, 2008.

[23] J. Pitkow, H. Schütze, T. Cass, R. Cooley, D. Turnbull, A. Edmonds, E. Adar, and T. Breuel, "Personalized search," Commun. ACM, vol. 45, no. 9, pp. 50-55, September 2002. [Online]. Available: http://dx.doi.org/10.1145/567498.567526 .

[24] J. Teevan, M. R. Morris, and S. Bush, "Discovering and using groups to improve personalized search," in WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining. New York, NY, USA: ACM, 2009, pp. 15-24. [Online]. Available: http://dx.doi.org/10.1145/1498759.1498786 .

[25] F. Sebastiani. Machine learning in automated text categorization. ACM Comput. Surv. 34, 1 (Mar. 2002), pp. 1-47, 2002. DOI= http://doi.acm.org/10.1145/505282.505283.

[26] http://developers.facebook.com/docs/ retrieved 2010-oct-05.

[27] K. Järvelin and J. Kekäläinen, "IR evaluation methods for retrieving highly relevant documents," in SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, 2000, pp. 41-48. [Online]. Available: http://dx.doi.org/10.1145/345508.345545

[28] F. Liu, C. Yu, and W. Meng, "Personalized web search by mapping user queries to categories," in CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management. New York, NY, USA: ACM Press, 2002, pp. 558-565. [Online]. Available: http://dx.doi.org/10.1145/584792.584884

[29] J. Teevan, S. T. Dumais, and E. Horvitz, "Personalizing search via automated analysis of interests and activities," in SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, 2005, pp. 449-456. [Online]. Available: http://dx.doi.org/10.1145/1076034.1076111

[30] http://en.wikipedia.org/wiki/RSS retrieved 2010-oct-05.

[31] J. Teevan, S. T. Dumais, and E. Horvitz, "Potential for personalization," ACM Trans. Comput.-Hum. Interact., vol. 17, April 2010. [Online]. Available: http://doi.acm.org/10.1145/1721831.1721835

[32] http://members.unine.ch/jacques.savoy/clef/ retrieved 2010-oct-05.

