social network search as a volatile multi-armed bandit problemfelner/2013/sn_vmab.pdf · in this...

15
Social Network Search as a Volatile Multi-armed Bandit Problem Zahy Bnaya Information Systems Engineering Department Ben Gurion University, Beer Sheva, Israel [email protected] Rami Puzis Information Systems Engineering Department Ben Gurion University, Beer Sheva, Israel [email protected] Roni Stern SEAS Harvard University, MA, USA [email protected] Ariel Felner Information Systems Engineering Department Ben Gurion University, Beer Sheva, Israel [email protected] ABSTRACT In many cases the best way to find a profile or a set of profiles matching some criteria in a social network is via targeted crawling. An important chal- lenge in targeted crawling is choosing the next profile to explore. Existing heuristics for targeted crawl- ing are usually tailored for specific search criterion and could lead to short-sighted crawling decisions. In this paper we propose and evaluate a generic ap- proach for guiding targeted crawling which is based on recent developments in Artificial Intelligence. Our approach, based on the recently introduced variant of the Multi-Armed Bandit problem with volatile arms (VMAB), aims to provide a proper balance between exploration and exploitation during the crawling pro- cess. Unlike other heuristics which are hand tailored for specific type of search queries, our approach is general-purpose. In addition, it provides provable performance guarantees. Experimental results indi- cate that our approach compares favorably with the best existing heuristics on two different domains. I INTRODUCTION Online social networks (SNs) such as Facebook, Twit- ter, Google+, or Academia.edu are a part of every- day life for many people around the world. SNs are a source of valuable information. This information can be used for tracking public opinions, political trends, disseminate news, or just socialize by finding the right group of people to communicate with. Extracting information from SNs, referred to as SN querying, is widely practiced by commercial compa- nies, government agencies and even individual people. The objective of SN queries is to pinpoint profiles that provide valuable information according to some cri- teria. SN queries can be used to select subjects for surveys, research, marketing purposes etc. For example, consider an SN query designed to help the chairs of a conference program to assemble pro- gram committee and/or find reviewers. Meeting the criteria for this query may be to match profiles of re- searchers that had written a number of papers in a particular field and participated in previous confer- ences. Another possible example is selecting from a group of known candidates, the ones that are most influential or lead public opinions in a certain mat- ter of interest, such as a specific consumer’s product or political issues. These individuals can be found by examining followers on Twitter or Facebook and checking the topic of the tweets or posts on which the followers are active. Profiles that are active on the specific matter and tend to be followed can be regarded as satisfying this query criteria. Lately, Facebook realized the need for SN queries and introduced the SN search feature. However, many other SN services lag behind and do not provide search capabilities at all. Although very intuitive, the search capabilities currently provided by Face- book are very limited. In many cases a complex cri- teria is required to define which profiles are relevant and which are not. The search capabilities provided by Facebook are not sufficient to execute complex searches where the relevance of a profile can only be evaluated by deep analysis on the profile content. In case of complex SN queries, the relevant profiles are commonly pinpointed using dedicated crawls on the SN. We assume that during such crawls a function for analyzing a profile and evaluating its relevance, referred to as profile-acquisition, is repeatedly exe- cuted in order to: 1) Extract information from the analyzed profiles and 2) Extract new profiles from their lists-of-friend (LOF) which become candidates for future profile-acquisition. The aim of this paper is to provide efficient policies for guiding the SN crawl- ing process by intelligently choosing the next profile to acquire. The challenge is to focus the search at relevant areas of the network that are most likely to contain valuable profiles which will satisfy the criteria of the query. Previous work has shown that blind, arbitrary selec- tion of next profiles to examine (aka. ”Snowball“ crawling) is inefficient [1]. Thus, a number of intelli- gent selection methods for the next profile to acquire Page 1 of 15 c ASE 2012

Upload: others

Post on 21-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

Social Network Search as a Volatile Multi-armed Bandit Problem

Zahy BnayaInformation Systems

Engineering DepartmentBen Gurion University,

Beer Sheva, [email protected]

Rami PuzisInformation Systems

Engineering DepartmentBen Gurion University,

Beer Sheva, [email protected]

Roni SternSEAS

Harvard University, MA,USA

[email protected]

Ariel FelnerInformation Systems

Engineering DepartmentBen Gurion University,

Beer Sheva, [email protected]

ABSTRACT

In many cases the best way to find a profile or aset of profiles matching some criteria in a socialnetwork is via targeted crawling. An important chal-lenge in targeted crawling is choosing the next profileto explore. Existing heuristics for targeted crawl-ing are usually tailored for specific search criterionand could lead to short-sighted crawling decisions.In this paper we propose and evaluate a generic ap-proach for guiding targeted crawling which is basedon recent developments in Artificial Intelligence. Ourapproach, based on the recently introduced variant ofthe Multi-Armed Bandit problem with volatile arms(VMAB), aims to provide a proper balance betweenexploration and exploitation during the crawling pro-cess. Unlike other heuristics which are hand tailoredfor specific type of search queries, our approach isgeneral-purpose. In addition, it provides provableperformance guarantees. Experimental results indi-cate that our approach compares favorably with thebest existing heuristics on two different domains.

I INTRODUCTION

Online social networks (SNs) such as Facebook, Twit-ter, Google+, or Academia.edu are a part of every-day life for many people around the world. SNs are asource of valuable information. This information canbe used for tracking public opinions, political trends,disseminate news, or just socialize by finding the rightgroup of people to communicate with.

Extracting information from SNs, referred to as SNquerying, is widely practiced by commercial compa-nies, government agencies and even individual people.The objective of SN queries is to pinpoint profiles thatprovide valuable information according to some cri-teria. SN queries can be used to select subjects forsurveys, research, marketing purposes etc.

For example, consider an SN query designed to helpthe chairs of a conference program to assemble pro-gram committee and/or find reviewers. Meeting the

criteria for this query may be to match profiles of re-searchers that had written a number of papers in aparticular field and participated in previous confer-ences. Another possible example is selecting from agroup of known candidates, the ones that are mostinfluential or lead public opinions in a certain mat-ter of interest, such as a specific consumer’s productor political issues. These individuals can be foundby examining followers on Twitter or Facebook andchecking the topic of the tweets or posts on whichthe followers are active. Profiles that are active onthe specific matter and tend to be followed can beregarded as satisfying this query criteria.

Lately, Facebook realized the need for SN queries andintroduced the SN search feature. However, manyother SN services lag behind and do not providesearch capabilities at all. Although very intuitive,the search capabilities currently provided by Face-book are very limited. In many cases a complex cri-teria is required to define which profiles are relevantand which are not. The search capabilities providedby Facebook are not sufficient to execute complexsearches where the relevance of a profile can only beevaluated by deep analysis on the profile content. Incase of complex SN queries, the relevant profiles arecommonly pinpointed using dedicated crawls on theSN. We assume that during such crawls a functionfor analyzing a profile and evaluating its relevance,referred to as profile-acquisition, is repeatedly exe-cuted in order to: 1) Extract information from theanalyzed profiles and 2) Extract new profiles fromtheir lists-of-friend (LOF) which become candidatesfor future profile-acquisition. The aim of this paper isto provide efficient policies for guiding the SN crawl-ing process by intelligently choosing the next profileto acquire. The challenge is to focus the search atrelevant areas of the network that are most likely tocontain valuable profiles which will satisfy the criteriaof the query.

Previous work has shown that blind, arbitrary selec-tion of next profiles to examine (aka. ”Snowball“crawling) is inefficient [1]. Thus, a number of intelli-gent selection methods for the next profile to acquire

Page 1 of 15c©ASE 2012

Page 2: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

were introduced [1–3]. These methods of intelligentcrawling usually devise task-specific heuristics, basedon the homophile principle, which choose SN profilesthat tend to have many properties in common suchas age-group, gender, common friends, etc.

However, these works overlooked an important as-pect of SN querying - the exploration vs. exploita-tion tradeoff. A deeper look reveals that the effectsof acquiring a profile are twofold. 1) exploitation:It may immediately contribute to the knowledge col-lected by the process (e.g., if this profile fits the searchcriteria) and; 2) exploration: Exposing the LOF ofthis profile enriches the available information aboutthe SN, thus, allowing more informed acquisition de-cisions in later stages. These two effects constitute anatural exploration versus exploitation tradeoff.

Existing SN targeted crawling methods are pure-exploitation in the sense that a specific, task-dependent heuristic function is suggested which aimsto predict the likelihood that the profile will meetthe query criteria. Profiles that achieve the maximalheuristic value are selected for the next acquisition.None of these methods give weight to exploration as-pects.

In this paper we introduce a method which consid-ers both exploration and exploitation in SN query-ing. A common approach in the Artificial Intelligence(AI) literature for balancing exploration and exploita-tion is via the formulation of the Multi-armed banditproblem (MAB) [4]. We propose a generic algorithm(with a number of variants) for SN querying based onMAB which provides a viable balance between explo-ration and exploitation. Two important properties ofour new approach are: (1) it guarantees performancebounds on the crawling process and (2) it does notrequire an ad hoc heuristic function tailored to thesearch query. Instead, it is based merely on past ex-perience of profiles acquired during the search.

We provide experimental results on two domains withdifferent search query definitions and compare our ap-proach to state of the art heuristics in Section X. Thefirst domain is the Target Oriented Network Intelli-gence Collection (TONIC) [1]. The second domain iscommunity search in the DBLP network 1. We showthat our approach outperforms existing methods onthe tested domains and provides performance guar-antees.

The rest of the paper is structured as follows. In Sec-

tion II we formally define the SN querying problem.Section III discusses best-first search approach for an-swering SN queries. Sections IV, V, and VI discussMAB and its applicability to SN querying and pro-vide deeper insights on this approach. We introducethe Volatile-MAB variant in Section VII.

1 ETHICAL ISSUES

Social media analysis can be provided as a service toSN users or used to improve existing services. Never-theless, it can also be used to infringe the privacy ofSN users. This matter becomes extremely sensitivewhen large enterprises and commercial companies areinvolved in this analysis [5, 6]. For example, it iscommon practice for many commercial companies toinspect the Facebook and LinkedIn profiles of candi-date employees in order to extract information aboutpast projects, colleagues and managers even withoutexplicit permission of the new employee.

The legitimacy of collecting personal informationfrom open sources is an open discussion in many com-munities and was discussed in previous work [6,7, in-ter alia]. On one hand, parents rightly insist thatstrangers should not be able to track the whereaboutsof their child through a social network. On the otherhand, government agencies and law enforcement act-ing under a court order occasionally do explore theSN looking for specific information on terrorists orcriminal organizations. Thus, it is important thatusers of targeted crawling algorithms, such as thosepresented in this work, will consider the ethical issuesbefore using such algorithms. We note that all the ex-periments performed in this work only used publiclyavailable data, without violating the privacy of noone.

II PROBLEM DESCRIPTION

In this section we define the SN querying problem,providing necessary notation and listing the assump-tions used in this work. A SN can be represented asa graph G = (V,E) where V is the set of the SN pro-files and E are links between them. A link betweenprofiles in a SN can mean for example, “is a friendof”, “is following”, “commented on a post of”, “likeda post of”, etc.

Each profile v ∈ V might reveal relevant informationfor the SN query once its content is extracted and

1http://dblp.uni-trier.de/xml/

Page 2 of 15c©ASE 2012

Page 3: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

analyzed. Let ω(v) denote the utility gained by ex-amining profile v. Intuitively, this utility indicatesthe degree of relevance of profile v to the criteria of agiven query. For example, if the query is to find for-mer students of Stanford University, ω(v) could beevaluated to 1 if v is a former student and 0 other-wise. Note that ω(v) is not restricted to be a booleanfunction and can also be a continuous function.

We assume that a profile acquisition operator is avail-able, defined as follows. When a given profile p isacquired (using the profile acquisition operator), itsinternal content is revealed. In particular, the opera-tion of profile-acquisition of a given profile v consistsof two functions:

• Find neighbors this functions reveals all theedges and vertices connected to v. This func-tion is general to all SN queries.

• Calculate ω(v). Using the data extracted fromthe profile, the utility of v, ω(v) is calculated.This function must be supplied by the user.

The profile-acquisition operation is done via externalinformation extraction methods that download theprofile data, parse it and extract relevant informa-tion from it. The bottleneck of the acquisition oper-ation is the actual downloading of profile data. Wethus assume that once the data was downloaded, itis possible to quickly extract any available informa-tion from this data, including the neighbors of thatprofile. The information extraction methods used toimplement the profile acquisition action are beyondthe scope of this paper and have been discussed byothers in prior work [8–10].

Due to the size of current SNs, we assume that only asmall subgraph of the entire SN (G) is known duringthe process of SN querying. Following the notation inprior work [1, 11], let CKG = (V ′, E′) represent thecurrently known graph, where V ′ ⊆ V and E′ ⊆ E.The CKG contains two types of nodes:

• Internal nodes - Represent profiles that havebeen acquired.

• Frontier nodes - Represent profiles that havenot been acquired but only revealed by otheracquired profiles (i.e., by the internal nodes).

Edges incident to a node are only revealed when a

node is acquired. Thus, the CKG contains only edgesconnecting internal nodes to internal nodes or edgesfrom internal nodes to frontier nodes. For exam-ple, consider the CKG shown in Figure 2(a). Thedark nodes are internal nodes that were acquired. Alledges incident with them are known. In addition theirneighbors (that are not internal) constitute the set offrontier nodes. Thus, acquiring a profile v, removes itfrom the set of frontier nodes and adds it to the set ofinternal nodes. In the crawling process the aim is tochoose the next frontier node to acquire. Note thatω(v) is only known for internal nodes but is unknownfor frontier nodes.

In addition to the utility function ω and the profileacquisition operator, we assume to receive as inputan initial set of known profiles. These profiles arecalled the seeds. The CKG is initially composed ofthe seeds and their neighbors, which form the initialset of frontier nodes.

Many SNs restrict crawling operations and limitbandwidth. Also, in some cases, the stealth of theoverall process is important. Thus, we assume thata budget b is imposed (but not necessarily known),such that the SN query process can perform at mostb profile acquisitions before it is forced to halt. Thetask in SN querying is to acquire b profiles such thatthe sum of utilities for these b profiles is maximized.We denote by Ω this overall accumulated utility thatneed to be maximized, i.e., Ω =

∑v∈I ω(v) where I

is the set of internal nodes in the CKG obtained afteracquiring these b profiles.2

III SN QUERIES AS A BEST-FIRSTSEARCH

Prior work solved SN querying using best-firstsearch [1]. Best-first search maintains two lists ofnodes: an OPEN list and a CLOSED list. At everystep, the best node (according to some cost or utilityfunction) from OPEN is expanded, i.e., it is poppedfrom OPEN and moved to CLOSED. Its children aregenerated and being inserted to OPEN.

The SN querying process can be modeled as a best-first search process. Initially, the seeds are generatedand added to OPEN. The expansion of nodes is donewith the profile acquisition operator. Therefore, theinternal nodes are the nodes that have been expanded(those in CLOSED) and the frontier nodes are thosethat have been generated but not yet expanded (those

2We assume that the limit b is not necessarily known. However, we also assume the limit b is large enough, such that thelong-term considerations for the acquisition decisions should be taken into account.

Page 3 of 15c©ASE 2012

Page 4: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

in OPEN). Best-first search uses a cost/utility func-tion to select the next node to expand.

1 TOPOLOGICAL HEURISTICS FOR SNQUERIES

The heart of the querying task is to choose a fron-tier node (from OPEN) to acquire next. The contentof a given profile v such as its residence information,education, etc, might provide an indication about itsutility ω(v). However, all of this information is avail-able only after the profile is acquired and its utilityω(v) calculated. Frontier nodes do not contain any ofthat information, since they have not been acquiredyet. Thus, such content information cannot be di-rectly used and the crawling should be guided by aheuristic function which estimates the merit of v, e.g.,a prediction of ω(v).

Mostly, heuristic functions for a frontier node v arebased on two types of information: (a) topologicalproperties derived from v’s connections in the CKG(e.g., its known degree in the CKG) and (b) the al-ready known utility ω(v) of the internal nodes thatare connected to v. We call this type of heuristic func-tions topological heuristics. Many existing heuristicsfor SN querying fall into this category of topologicalheuristic functions [1, 11].

Developing a heuristic function for a SN queryingtask, is done by carefully examining the query andunderstanding the topological features that can indi-cate relevance for that specific query. Moreover, suchheuristic functions usually make some assumptionsabout the topology of the graph and the relationsbetween neighbors.

For example, consider the target oriented intelli-gence crawling (TONIC) domain [1], where the querysearches for profiles that know or have interacted witha certain specific profile (called the target). The util-ity of a profile ω(v) in this case is a boolean func-tion indicating whether a profile contains informationabout the target or not. A heuristic function thatproved efficient in this domain is called known degree(KD). The KD heuristic gives high ranks to frontiernodes that have many connections with other nodesin CKG that were already found to have informationabout the target profile. This heuristic is based on theassumption that profiles that have information abouta target profile are likely to have connections to otherprofiles that have information about that target pro-file. In [1], other topological heuristics were given, allbased on this principle.

the main problem with heuristic functions that arebased on assumptions made on the topology or con-tent is that the heuristics are hand tailored for aparticular domain. In addition, the topological as-sumptions can be hard to find and may not be accu-rate across all problem instances even within a givendomain. Furthermore, existing topological heuristicfunctions, such as the KD heuristic described above,indicate how likely it is for a given frontier node tohave a high utility. Thus, SN queries guided by thebest-first-search paradigm which expand the frontier-node that achieved the best heuristic will maximizethe probability to increase the overall utility obtainedso far by performing exactly one acquisition decision.Thus, we consider Best-First heuristic acquisitions asgreedy or myopic decisions, collecting only the imme-diate utility. In this sense, these heuristics functionsare pure exploitation heuristics. We thus introducea general approach that is domain independent, doesnot require any topological assumptions and balancesexploration and exploitation.

2 OUR APPROACH: BALANCING EX-PLORATION AND EXPLOITATION

An interesting property of topological heuristic func-tions is that many frontier nodes share the same topo-logical properties and thus have the same heuristicvalue. We use the notion of structural equivalenceclasses (SEC) to group frontier nodes having exactlythe same set of neighbors [12]. Figure 1 demonstratesthree different SECs (E1,E2,E3). The KD heuris-tic value (and also the value of any other topologicalheuristic) for all frontier nodes that belong to thesame SEC is identical because these nodes (beforeacquisition) are identical from the perspective of thenetwork topology.

Let e be a SEC. Consider ω(e) to be a random vari-able representing the utility of a random frontiernode that belongs to SEC e. Let E[ω(e)] be theexpected utility obtained by selecting to acquire aprofile that belongs to e. Obviously, if E[ω(e)] wasknown for all SECs, the optimal acquisition decisionin any stage is to acquire a profile from the SEC thathas the maximal expected utility. However, E[ω(e)]is not known for every e, as this would require ac-quiring the entire SN graph G.

Let ω(e) be the average utility obtained so far byacquiring profiles from SEC e. ω(e) can be used asan estimator for E[ω(e)] if sufficient profiles from ewere acquired. We call ω(e) the exploitation value of

Page 4 of 15c©ASE 2012

Page 5: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

e since by deciding to acquire a profile that achievedthe maximal value of ω(e), we exploit the knowledgewe accumulated so far about the SN graph.

As the number of acquisitions from a specific equiva-lence class increases, the estimation of ω(e) becomesmore accurate and can be used to perform better ac-quisition decisions in the future. Thus, performingacquisitions of frontier nodes that belong to SEC ealso explores the potential of selecting profiles fromSEC e for future acquisitions. We call the numberof profiles acquired from a given SEC e, the explo-ration value of e. Low exploration value indicatesthat the estimated value ω(e) may not be accurate.Therefore, simply choosing to acquire a node from theSEC with maximal ω(e) on each acquisition, withoutconsidering the exploration may lead to suboptimalperformance.

A tradeoff exists between the exploration and the ex-ploitation values of SECs. Specifically, the dilemmain SN querying is to decide which of the followingnodes to acquire: 1) A profile that belongs to theSEC with the maximal estimate (ω(e)), thus max-imizing the probability to obtain immediate utility.2) A profile from a SEC which has not been “sam-pled” enough times, thus exploring the topology andproviding better estimates for that SEC.

Our approach aims to balance exploration and ex-ploitation during targeted crawling. We base thison previous work on balancing exploration and ex-ploitation for the classical Multi-armed bandit (MAB)problem. In the next sections, we introduce the MABproblem and show how we adopt it to our problem.

IV MULTI-ARMED BANDIT PROBLEM

The multi-arm bandit problem (MAB) [4] assumes aset of K independent gambling machines (or arms).At each turn (n), a player pulls one of the arms (ai)and receives a reward (Wi,n) drawn from some un-known distribution with an unknown mean value µi.

A policy for MAB chooses the next arm to pull basedon previously observed rewards. A MAB policy candecide to “exploit” the knowledge accumulated so farby pulling the arm that was shown to produce thebest sequence of rewards. Alternatively, a policy candecide to “explore” an arm that was only pulled a fewtimes even though its reward history is less attractivethan other arms, in order to get a more accurate es-timation of the arm’s expected reward.

MAB policies are often designed to minimize their ac-cumulated regret (Rn), defined as the expected loss ofrewards incurred by not always pulling the arm withthe maximal expected reward (denoted as µ∗) in the

turns up to n. Formally, Rn = n · µ∗ −∑K

i=1 µi ·E[Ti(n)] where E[Ti(n)] is the expected number ofpulls of arm i in the first n turns.

The Upper Confidence Bound 1 policy [13] (denotedUCB1) is a common policy ensuring that Rn =O(log(n)). Initially, UCB1 pulls each arm once.Then, on each turn n UCB1 pulls an arm ai thatmaximizes the following expression:

Wi +

√2 · lnnTi(n)

(1)

where Wi is the average reward observed so far bypulling arm ai.

Many other MAB policies exist, such as the ε-Greedypolicy [14]. This policy chooses the best arm (ex-ploit) with probably 1 − ε and chooses an arbitraryarm with less than maximal average reward (explore)with probably ε.

MAB applications spread on many areas such as clini-cal trials, web search, Internet advertising and multi-agent systems [15–18]. Modeling these problems asMAB and using MAB policies for solving them hasbeen shown to be very successful.

2

E2 E3

1

3 4 5

E1

Figure 1: Structural Equivalence classes in CKG

V SOCIAL NETWORK QUERIES ASMULTI-ARMED BANDITS PROBLEM

We now turn to model the problem of SN query-ing as a MAB problem and develop suitable algo-rithms. Each SEC ei in the CKG represents an armai. Pulling an arm ai corresponds to acquiring a pro-file from SEC ei. The immediate reward received by

Page 5 of 15c©ASE 2012

Page 6: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

pulling ai is the ω of the corresponding profile. Theexpected reward µi of each arm ai is the expectedutility of acquiring a random frontier node from ei.Thus -

µi = E[ω(ei)]

To demonstrate this model, consider Figure 1. TheCKG consists of 5 internal nodes (numbered 1-5) andthree SECS. All frontier nodes that are connected tonodes 1 and 2 are part of SEC E1. SEC E2 containsall frontier nodes connected to nodes 4 and 5 and SECE3 contains all frontier nodes that are connected tonode 5.

Assume a boolean utility function, where each fron-tier node v can have either ω(v) of 1 (colored red inFigure 1) or 0 (colored black). Thus, the actual ex-pected rewards of the arms in this example are 5

9 , 12

and 45 for SECs E1, E2 and E3 respectively. After a

profile was selected and acquired, the immediate re-ward is received and the average utility of the SEC iscalculated.

For example, assume that three acquisitions wereperformed from SEC E1, two acquisitions were per-formed from SEC E2 and additional two acquisitionswere performed from E3. All three acquisitions offrontier nodes from E1 produced utility of 1, bothacquisitions of frontier nodes from E2 produced util-ity of 0. The acquisitions of frontier nodes from E3produced utility of 1 and 0. Therefore, the averageutilities of the SECs E1, E2 and E3 are now 1,0 and12 , respectively. A “pure exploitation” MAB policywill choose to repeatedly acquire frontier nodes fromE1 until its average utility degrades. A MAB pol-icy that balances exploration and exploitation mightchoose to explore E2 and E3 earlier in order to pro-duce better estimates for these SECs.

VI THE INFLUENCE OF PROFILE AC-QUISITIONS ON SECS

Acquiring profiles causes dynamic changes to theCKG, the SECs and their expected utility. In thissection we describe these changes and show how wemodify our MAB formalization accordingly.

1 NON-STATIONARY BANDITS

As described earlier, when a frontier node v fromSEC ei is acquired it becomes an internal node andits ω(v) is revealed. Since v cannot be acquired againit is no longer relevant to the calculation of µi. This issimilar to a case of sampling without replacement [19].Thus, the value of µi can potentially change aftereach pull of an arm. For example, in Figure 1, ac-quiring the single frontier node that produced utilityof 0 from SEC E3, change µ3 from 4

5 to 1.

This phenomenon is called non-stationary banditsand it is discussed especially in the context of Monte-Carlo tree search (MCTS) algorithms [20, 21] wherethe expected rewards of branches in the search treechange as the search progresses. A simple modifica-tion to the UCB1 formula exists such that the regretbounds are still logarithmically on such cases [21].The modified formula is shown in Equation 2.

Wi + 2Cp

√lnn

Ti(n)(2)

where Cp is an appropriate constant defined by theUCT algorithm [21]. Since bandits in SN queryingare non-stationary, in our algorithms (described inlater sections of the paper) we use Equation 2 insteadof Equation 1.

Page 6 of 15c©ASE 2012

Page 7: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

1 2

3 4 5

(a) Initial

1 2

3 4 5

(b) Disappear

43

6

1 2

5

(c) Appear

1 2

43 5

(d) Split

Figure 2: Structural Equivalence Classes events. The initial CKG, with a single SEC is shown in Fig 2(a).Fig 2(b) demonstrates the “disappear” event, when nodes 3,4 and 5 are acquired. Fig 2(c) shows the “appear”event, triggered by an acquisition of node 5. Fig 2(d) shows an alternative scenario of acquiring node 5 whichtriggers a “split” event.

2 DYNAMIC CHANGES OF ARMS

Furthermore, besides the fact that µ can change, thestructures of the SECs themselves may dynamicallychange too. As exploration continues and frontiernodes are being acquired, SECs (arms) can: (1) ap-pear (2) disappear or (3) split into two different arms.We define the three events, triggered by a profile ac-quisition action and demonstrate each event via Fig-ure 2. Internal nodes are dark and frontier nodesare light. The CKG prior to any of the events is de-picted at Figure 2(a). In this example there are twointernal nodes (1 and 2) and three frontier nodes (3,4and 5). Edges connecting nodes 3-5 to other parts ofthe graph are still unknown. This CKG consists of asingle SEC (nodes 3,4,5).

Disappear event. When all the nodes in SEC eihave been acquired, the arm ai can no longer be“pulled” and it is said to “disappear”. Figure 2(b)demonstrates this event. The frontier nodes (3,4 and5) were acquired and no new edges had been found.Thus, the SEC defined by the internal nodes 1 and 2“disappears”.

Appear event. If a profile v is acquired and noneof the new profiles which are friends of v are alreadyrepresented in CKG, a new SEC is formed and a newarm is said to “appear”. Figure 2(c) demonstratesthis event. By acquiring node 5, the frontier node 6is found and added to CKG, making it the only nodeconnected to nodes 1,2 and 5. Thus, node 6 is a solemember of a new SEC defined by connection to nodes1,2 and 5 while nodes 3 and 4 form their own SEC(the SECs are marked by a red circle).

Split event. When a profile v is acquired and one

or more of its new neighbors are already representedin CKG, these profiles are no longer associated withtheir original SEC. The original SEC is said to “split”into two SECs. Figure 2(d) demonstrates this casewhere node 5 is acquired, revealing that it is con-nected to node 4. In this case node 4 forms a newSEC (connected to 1,2 and 5) and node 3 stays in theoriginal SEC (connected to 1 and 2).

In order to deal with this dynamic behavior, wenow introduce an extension of MAB called ”VolatileMulti-Armed Bandits“ (VMAB) 3 where arms can“appear” or “disappear” unexpectedly. We introducea policy for VMAB that achieves a bounded regretand use this policy to solve the SN querying prob-lem.

*

*

*

*

*

t=0 t=0 MAB VMAB

1

2

.

.

K

Figure 3: Regular MAB (left) and Volatile MAB(right).

VII VOLATILE MULTI-ARMED BAN-DITS

Standard MAB problems assume a constant set ofK arms which are available indefinitely. We proposean extension of MAB, where new arms can “appear”or “disappear” on each turn. (Unlike the “appear”and “disappear” events, the “split” event is not part

3A preliminary version of this extension was briefly introduced in [22]

Page 7 of 15c©ASE 2012

Page 8: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

of the definition of (VMAB). However, we describehow to deal with “split” events when we describeour SN querying algorithm in Section VIII). We callthis MAB variant the Volatile-multi-Arm bandit prob-lem (VMAB). In VMAB, every arm ai is associatedwith a lifespan (zi, ti) during which this arm is avail-able. Figure 3 illustrates the difference between MABand VMAB. The standard MAB (left) has a fixed setof K arms, one of them is optimal (colored red). InVMAB (right), arms “appear” and “disappear” andthe optimal arm may change over time. Note thatthe arms’ lifespans are unknown in advance.

There are many other cases where it is beneficial toextend MAB model to the model of VMAB, even forapplications unrelated to SN queries. For example,consider an Internet advertisements matching wherethe task is to choose an advertisement that is mostlikely to be clicked by some user. On some cases,new advertisements might suddenly become availablewhen new campaigns are approved. Alternatively, ex-isting advertisements can become unavailable whenthe campaign budget runs out.

1 VMAB POLICY

A policy for VMAB chooses on each turn n an armai with a valid lifespan zi ≤ n ≤ ti. The expectedregret of a VMAB policy (labeled Rv

n) is:

Rvn =

n∑t=1

µ∗(t)− µ(I(t))

Where µ∗(t) is the expected reward of the optimalarm at turn t and I(t) is the arm selected at turn t.

Obviously, the regret of VMAB depends on the avail-able arms at each time step. Consider a set of K armswith mutual-exclusive lifespans. In this case, the ac-cumulated regret is 0 as there is only one arm to pullat each time. By contrast, if all arms have the samelifespan, the problem is identical to standard MAB.

For solving a VMAB problem, we propose the follow-ing policy, which we call VUCB1. VUCB1 selects thearm that maximizes Equation 3.

Wi +

√2 · ln(n− zi)

Ti(n)(3)

where zi is the turn in which arm i first “appeared”,n is the number of pulls performed and Ti(n) is thenumber of times arm i had been pulled. When Ti(n)

is 0, the expression in Equation 3 is evaluated to pos-itive infinity and thus, this new arm will be chosen.

2 BOUNDED REGRET FOR VUCB1

Beyond being effective in practice, which we show inSection X, we next provide theoretical bounds on theregret of using VUCB1.

We define an epoch as an interval in which new armsare not created (arms can “disappear” in a givenepoch but new arms cannot “appear”).

The following proof is based on the regret boundproof for UCB1 [13] and uses the same notationswhen possible. Similar to the UCB1 proof, we boundthe number of times each suboptimal arm i had beenselected.

Unlike the original MAB problem, arms in VMABmight be suboptimal at some intervals while beingoptimal at other intervals. It is also possible forthe same arm to have several non-adjacent intervalswhere it is suboptimal. Therefore, we deal with eachsuch “suboptimal”-interval independently. We usethe term Ti(n) to denote the number of pulls of armi in the “suboptimal”-interval [so, n], where arm i issuboptimal throughout the whole interval.

Also, the “suboptimal”-interval [so, n] itself might bedecomposed to several sub-intervals, where on eachsub-interval, a different arm is optimal. For simplic-ity, we demonstrate our proof based on a special casewhere a single optimal arm exists throughout [so, n].

We denote all the properties related to the currentoptimal arm with ∗, for example z∗ is the turn inwhich the optimal arm first “appeared”.

We prove our bounds for an individual “suboptimal”-interval. The generalization for multiple sub-intervalsis not complex and produces similar bounds. An intu-ition for this could be found by further dividing each“suboptimal”-interval to its sub-intervals and provingthe same bounds on each in a similar manner.

Lemma 1. The accumulated expected regret Rn ofVUCB1 policy within a single epoch is O(ln(n)).

Proof Sketch:

Let It denote the arm selected at time t. Also, us-ing the notion of [13], for any predicate Π, we defineΠ = 1 if the predicate Π is true and Π = 0 if Π

Page 8 of 15c©ASE 2012

Page 9: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

is false. Let l be an arbitrary positive integer and

ct,z,s =

√2 · ln(t− z)

s

Following the proof presented in [13]:

Ti(n) =

n∑t=so

It = i

≤ l +

n∑t=so

It = i, Ti(t− 1) ≥ l

≤ l +

n∑t=so

XT∗(t−1) + ct−1,z∗,T∗(t−1) ≤ Xi,Ti(t−1) +

ct−1,zi,Ti(t−1),Ti(t−1) ≥ l

≤ l +

n∑t=so

minso<s<t

X∗s + ct−1,z∗,s ≤ maxl≤si<t

Xi,si +

ct−1,zi,si

≤ l+n∑

t=so

t−1∑s=so

t−1∑si=l

X∗s + ct−1,z∗,s ≤ Xi,si + ct−1,zi,si

Where X∗s is the average of the optimal arm up to

time s and Xi,si is the average of the suboptimal armup to time si.

The inequality X∗s + ct−1,z∗,s ≤ Xi,si + ct−1,zi,si istrue only when at least one of the following threeconditions are true:

X∗s ≤ µ∗ − ct−1,z∗,s (4)

Xi,si ≤ µi + ct−1,zi,si (5)

µ∗ < µi + 2 · ct−1,zi,si (6)

By using the Chernoff-Hoffding inequality [13], wecan bound the probabilities for conditions 4 and 5:

P(X∗s ≤ µ∗ − ct−1,z∗,s) ≤ (t− z∗)−4

P(Xi,si ≤ µi + ct−1,zi,si) ≤ (t− zi)−4

Let ∆i be the difference between the expected re-wards of the arms, µ∗ − µi.

Condition 6 is false when l is 8 ln(n−zi)∆2

isince when

si ≥ l, µ∗ − µi − 2 · ct−1,zi,si ≥ µ∗ − µi −∆i = 0.

Thus,

Ti(n) ≤ 8 ln(n− zi)∆2

i

+

n∑t=so

t−1∑s=so

t−1∑si=l

(t−z∗)−4+(t−zi)−4

We define

n∑t=so

t−1∑s=so

t−1∑si=l

(t − z∗)−4 + (t − zi)−4 as a

constant M .

We define Kt to be the number of suboptimal armsat turn t. Therefore the number of suboptimal armselections at turn n is at most:

8 ·∑

1<i<Kt

∑0<t<n

ln(n− zi)∆2

i

+M

The regret for a single suboptimal arm is thus

( 8 ln(n−zi)∆2

i+ M) · ∆i. And therefor, the total regret

in an epoch is:

8 ·∑

1<i<Kt

∑0<t<n

ln(n− zi)∆i

+M

Theorem 1. Let B be the number of epochs upto turn n. The accumulated expected regret Rn ofVUCB1 policy is O(B · ln(n)).

Proof Sketch: Following Lemma 1, each epoch pro-duces a logarithmic regret. Since B epochs had oc-curred up to time n, the accumulated expected regretRn of a VUCB1 policy is O(B · ln(n)).

VIII BANDIT ALGORITHM FOR SNQUERYING

1: While |acquisitions| < b2: Select SEC e∗ that maximizes:

3: ω(e) + Cp

√ln(n−z e)

|acquisitions(e)|4: p ← rand(e∗)5: acquire(p)

Figure 4: SN-UCB1 algorithm

We now introduce our algorithm for SN queryingwhich is denoted as SN-UCB1. SN-UCB1 models SNquerying as a VMAB problem and uses our suggested

Page 9 of 15c©ASE 2012

Page 10: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

VUCB1 policy. Since our bandits are non-stationarywe use the formula shown in Equation 2, taken fromthe UCT algorithm [21].

The pseudo code for SN-UCB1 is given in Figure 4.We denote the total number of profile acquired by|acquisitions| and denote |acquisitions(e)| as thenumber of profile acquired from SEC e. SN-UCB1 re-peatedly performs profile acquisitions until the limit bis reached. In line 3, the SEC with the maximal valueaccording to the VUCB1 formula is selected (denotedas e∗).

Since SECs are dynamic, we use the VUCB1 policyand consider also the turn in which each the armfirst “appeared” (ze). As explained above, to dealwith non-stationary bandits we use the modified for-mula provided in Equation 2 (instead of the UCB1formula). Next, in line 4, one of the profiles is ran-domly selected and acquired, discovering its utilityand LOF. The LOF is added to the CKG.

1 “SPLIT” EVENTS

SECs are dynamic. Acquired profiles can migratefrom one SEC to a different one when “splits” occur.Therefore, we re-calculate ω(e) before each iterationaccording to current acquired profiles that belong toeach SEC.

On some cases, the “split” event creates a new SEC,for which the value of ω is unknown since none of theprofiles in the new SEC had been acquired.

According to the VUCB1 policy, in these cases, a pro-file acquisition from this new SEC must immediatelybe performed. However, in order to “reuse” informa-tion collected so far that can be used as an indica-tion for the performance of this SEC, our algorithminitializes a “default” value and assigns it as the av-erage utility for this SEC (with no prior acquisitions)instead of immediately performing acquisitions fromit.

Thus, the average utility of a new SEC e with noprior acquisitions is initialized according to the fol-lowing formula:

ω(e) = P · ω(e1) + (1− P ) · h(e)

where e1 is a SEC from which e had “split”, P is theproportion of e’s frontier-leads in e1 and h(e) is theheuristic value of e.

2 SN-UCB1 IMPROVEMENTS

Several improvements can be added to the simplevariant of SN-UCB1.

2.1 ARM TIE-BREAKING

When a topological heuristic function exists, theheuristics values can be used to break ties amongarms. Other tie-breaking methods are also possible.

2.2 VIRTUAL ACQUISITIONS

When calculating the VUCB1 formula for each SEC,we add to the calculation M “virtual” acquisitions(M is an adjustable parameter). These acquisitionshad not been performed in reality. We assign a de-fault value for these acquisitions according to someheuristic function. Thus, new arms are formed withsome “virtual-experience”. This prevents frequent“exploration” acquisitions when the rate of appear-ance of new SECs is high. A similar idea was alsoused for solving the Canadian Traveler Problem [23].

IX RELATED WORK

Obtaining information from a SN by intelligent crawl-ing [3,24,25] or by other methods [26] was previouslydiscussed, where the goal was to uncover large por-tions of the SN.

In addition, Fire et al, [2] proposed a method forfinding profiles that are part of a given organizationby first acquiring profiles with the largest number ofknown friends that are within the targeted organi-zation. Stern et al. [1] suggested other methods fortargeted oriented intelligence crawling where the taskis to find profiles that are connected to a given tar-get. Unlike our proposed approach, these works re-quire specific domain dependent heuristic functionsand can not provide guarantees on the performanceof the entire querying process.

Our new variant, VMAB, is reminiscent of Mor-tal Multi-Armed Bandits [27] designed for online-advertising. They also assume that arms can expireor become available. However, the number of arms isfixed, i.e., whenever one arm disappears, a new armappears. Their analysis focuses on when to stay withthe previous best arm and when to select a new arm,a problem which in inherently different than ours.Restless-bandits [28] assumes a fixed set of arms, from

Page 10 of 15c©ASE 2012

Page 11: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

which only a subset is available at each turn. Severalother similar variants exists [29,30] having a differentset of assumptions than ours.

X EXPERIMENTAL RESULTS

To demonstrate SN-UCB1 we performed experimentson two domains: the Target Oriented Network Intelli-gence Collection (TONIC) problem [1] on a Google+dataset and finding research communities on a co-authorship network of computer science bibliogra-phy, constructed from DBLP [31]. For each domainwe conducted up to 100 experiments, using differentgraphs and seeds each time. We then average the re-sults across all experiments. For SN-UCB1 we set the“virtual” acquisitions M to 20 and broke ties betweenarms using the Known Degree heuristic.

1 TONIC

In TONIC we are interested in retrieving profiles con-nected to a given profile of interest, denoted as thetarget. Due to privacy issues, the target SN profile isinaccessible to third parties. However, other SN pro-files contain information about the target and are ac-cessible, having more relaxed privacy settings. Suchprofiles are called leads. The TONIC problem is tofind as many leads as possible while minimizing thenumber of analyzed profiles.

Solving TONIC consists of analyzing SN profiles,which is the equivalent of the profile acquisition op-erator discussed in this paper. If a profile is a leadthen acquiring it, e.g., with information extractionmethods [8–10], reveals information about the tar-get, and may provide pointers to additional potentialleads. Several TONIC heuristics were proposed todecide which potential lead to acquire next in orderto find more leads [1]. The best performing TONICheuristic, called BysP, associates each lead with apromising rate (PR), which is intended to representhow likely it is for a random profile connected to thislead to also be a lead. PR of a lead l would ideally bethe percentage of leads out of the total profiles con-nected to l, and is estimated by considering only thepreviously acquired profiles. BysP then prioritizedpotential leads by aggregating the PR values of theleads they are connected to. For further details onTONIC and BysP, see [1].

Our experiments for TONIC were performed on adata set obtained from the Google+ network and in-cludes 211K profiles with 1.5M links between them.

This data set was collected and made available byFire et al. [32]. We selected 100 random profiles fromthis data set, each having at least 30 friends. Theseprofiles were used as the targets in our experiments.We assume that an experiment set of size 100 is largeenough to reflect the performance of our algorithms.We also assume that finding 30 leads is challengingenough for the TONIC problem. For each target weselected three random profiles as seeds.

Figure 5: Google+ TONIC recall comparison.

Figure 6: Average recall of research communities

Figure 5 demonstrates the results of our experimentson the TONIC domain. The X axis describes the per-centage of profiles acquired during the experiment.The Y axis, describes the percentage of leads foundfrom the total leads that exist in the network. Thecurves show the averages over all 100 targets. Whenmore acquisitions are made more leads are found andin the end all nodes have been acquired and thus allleads were found. The challenge is to find as manyleads as early as possible.

We compared the state of the art topological heuris-tic for TONIC (BysP) and SN-UCB1. In addition wecompared two benchmark policies: a random policy

Page 11 of 15c©ASE 2012

Page 12: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

that chooses an arm randomly, and an oracle policythat chooses the arm with the actual maximal ex-pected utility (denoted as the Optimal Arm in Fig-ure 5).

Both BysP and SN-UCB1 perform well and are closeto optimal. Yet, SN-UCB1 policy slightly outper-forms BysP even though BysP is a hand-tailoredheuristic while SN-UCB1 is a general-purpose heuris-tic.

2 FINDING COMMUNITIES IN DBLPGRAPH

Community search is our second domain for demon-strating our approach. We use the data of the DBLPcomputer science bibliography. This data was col-lected and made available by [31]. The task in thisdomain is to find research communities. A researchcommunity is defined by multiple authors who pub-lish in a certain journal or conference. We use a co-authorship network that is constructed such that ev-ery two authors who published at least one papertogether according to DBLP are connected.

We performed experiments on 37 different researchcommunities, having at least 100 members in eachcommunity. From each community we selected threerandom profiles as seeds. We experimented with thesame four polices described above for TONIC andshow the averages of these experiments in Figure 6.In this domain, our algorithm SN-UCB1 achieved aconsistent improvement over BysP. Since the amountof members in this domain is relatively large weused a smaller experiment set than the one used inTONIC (37 experiments in DBLP vs. 100 experi-ments in TONIC). However, we still regard 37 as alarge enough experiment set to demonstrate the ad-vantage of using our algorithm.

The oracle method provides an indication about themaximal potential improvement in performance thatcan be gained by selecting the best arm at each ac-quisition. We expect to see a more significant im-provement of SN-UCB1 over other heuristics whenthe advantage of the oracle method over these heuris-tics is larger. The clear advantage of SN-UCB1 overthe state of the art heuristics in the DBLP domainas opposed to the less evident improvement in theTONIC domain, supports this idea.

Table 1: Average number of acquisitions required toreach specific level of recall.

RecallDBLP TONIC

BysP SN-UCB1 BysP SN-UCB10.75 1275 1200 900 9000.80 2100 1883 1150 11500.85 3025 2725 1400 13750.90 3600 3425 2025 19000.95 5650 4518 3075 28250.96 5877 5500 3325 31750.97 6957 5582 3700 35750.98 7509 6561 4150 39750.99 8020 7464 4925 4600

Figure 7: Average number of arms in Google+

Table 1 describes the average number of acquisitionsrequired by each of the algorithms to reach specificlevel of recall on the DBLP and TONIC domains.On average, much less acquisitions are needed by SN-UCB1 to achieve the same level of recall compared toBysP on both domains.

3 STATISTICAL SIGNIFICANCE OF RE-SULTS

In both domains, our experiments demonstrate onlya subtle advantage for our method compared to thestate of the art heuristics. We measure the statisti-cal significance of our results by performing a pairedt-test on the performance obtained on each of theexperiments by each of the algorithms. We performthese statistical tests on both DBLP and TONIC do-mains in a similar way.

For each experiment and algorithm, we produce a re-call curve, similar to the curves presented in Figures5 and 6. The X axis on these curves is the percent-age profile acquisitions out of the total profile acqui-

Page 12 of 15c©ASE 2012

Page 13: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

sitions performed on that experiment. The Y axisis the percentage of successful finds of leads or com-munity members in the experiment. We then calcu-late the Area Under the Curve(AUC) for each curve.We pair the AUC obtained by our algorithm and thestate of the art heuristics based on the specific ex-periment performed and use this data as input fora paired t-test. For the DBLP domain we obtainedan increase in performance (mean 0.74 versus 0.72)with statistical significance of p=0.00063. The dis-tribution of the AUC obtained on each experimentis demonstrated in Figure 8. Although less appar-ent, the TONIC domain still demonstrates a slightimprovement (0.814 vs. 0.810). However, the signifi-cance levels were much lower (p=0.15657).

Figure 8: AUC per experiment in DBLP communitysearch. Each pair of bars represent the AUC obtainedon a given experiments by BysP and SN-UCB1.

4 ANALYSIS OF ARM DYNAMIC BE-HAVIOR

We also examined the average number of arms duringour experiments. Figure 7 shows a stacked graph ofthe averages over our entire TONIC domain experi-ment set. After each acquisition we show the numberof expired arms, new arms, which had not been pulledbefore and arms that were pulled at least once (expe-rienced arms). We also show the average number ofacquisitions (pulls) performed so far on the selectedSEC at each acquisition.

As the querying process progresses, the number of ac-quisitions get larger, arms “disappear” and new arms“appear”. However, there is also substantial amountof existing SECs (experienced arms) with different av-erage utilities. Our algorithm decides whether to trya new arm or an experienced arm. In the latter case,our algorithm decides on which arm to try in orderto balance exploration and exploitation.

XI CONCLUSION AND SUMMARY

We proposed a novel, general-purpose, approach forguiding social network crawler, balancing explorationand exploitation in the crawling process. This newapproach is based on modeling social network query-ing as a Multi-Armed Bandit problem.

We formally defined a new MAB variant, called MABwith volatile arms (VMAB), and provided a policy forVMAB with bounded regret. Then, we showed thatSN querying can be modeled as a VMAB by splittingthe frontier nodes into disjoint sets of structurallyequivalent nodes (SECs). Each SEC is regarded asan “arm” in our VMAB modeling. A correspondingalgorithm called SN-UCB1 is presented, based on ourregret bounded VUCB1 policy.

Importantly, SN-UCB1 is not tailored to a specifictype of SN queries as other heuristics. However, itcan be augmented with query specific heuristics us-ing Virtual Acquisitions and tie-breaking. Evaluationresults show that the new policy has non negligiblecontribution over the state of the art saving up to20% requests as shown in Table 1.

Unlike previous existing heuristics, SN-UCB1 hasperformance guarantees of being logarithmic in theaccumulated regret. Moreover, experiments per-formed with an oracle that chooses the SEC havingthe highest expected utility demonstrate that thereis much more space for improving state of the artcrawling policies using our approach.

XII ACKNOWLEDGMENTS

This research was supported by the IDF, the IsraeliMinistry of Industry Trade and Commerce, Office ofthe Chief Scientist and the Israeli Science Foundation(ISF) under grant #305/09 to Ariel Felner.

References

[1] R. Stern, L. Samama, R. Puzis, A. Felner,Z. Bnaya, and T. Beja, “Target oriented net-work intelligence collection for the social web,”in AAAI, 2013.

[2] M. Fire, R. Puzis, and Y. Elovici, “Organizationmining using online social networks,” CoRR, vol.abs/1303.3741, 2013.

[3] A. Korolova, R. Motwani, S. U. Nabar, andY. Xu, “Link privacy in social networks,” in

Page 13 of 15c©ASE 2012

Page 14: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

Proceedings of the 17th ACM conference on In-formation and knowledge management. ACM,2008, pp. 289–298.

[4] H. Robbins, “Some aspects of the sequential de-sign of experiments,” in Herbert Robbins SelectedPapers. Springer, 1985, pp. 169–177.

[5] C. S. Fleisher, “Using open source data in devel-oping competitive and marketing intelligence,”European journal of marketing, vol. 42, no. 7/8,pp. 852–866, 2008.

[6] T. S. Teo and W. Y. Choo, “Assessing the im-pact of using the internet for competitive intel-ligence,” Information & management, vol. 39,no. 1, pp. 67–83, 2001.

[7] H. Bean, “Is open source intelligence an ethicalissue?” Research in Social Problems and PublicPolicy, vol. 19, pp. 385–402, 2011.

[8] C. Chang, M. Kayed, M. Girgis, K. Shaalanet al., “A survey of web information extractionsystems,” IEEE transactions on knowledge anddata engineering, vol. 18, no. 10, p. 1411, 2006.

[9] J. Tang, L. Yao, D. Zhang, and J. Zhang, “Acombination approach to web user profiling,”ACM Trans. Knowl. Discov. Data, vol. 5, no. 1,pp. 2:1–2:44, 2010.

[10] P. Pawlas, A. Domanski, and J. Domanska,“Universal web pages content parser,” in Com-puter Networks, ser. Communications in Com-puter and Information Science. Springer BerlinHeidelberg, 2012, vol. 291, pp. 130–138.

[11] R. Stern, M. Kalech, and A. Felner, “Findingpatterns in an unknown graph,” AI Commun.,vol. 25, no. 3, pp. 229–256, 2012.

[12] L. D. Sailer, “Structural equivalence: Meaningand definition, computation and application,”Social Networks, vol. 1, no. 1, pp. 73 – 90, 1978.

[13] P. Auer, N. Cesa-Bianchi, and P. Fischer,“Finite-time analysis of the multiarmed banditproblem,” Mach. Learn., vol. 47, no. 2-3,pp. 235–256, May 2002. [Online]. Available:http://dx.doi.org/10.1023/A:1013689704352

[14] R. S. Sutton and A. G. Barto, Reinforcementlearning: An introduction. Cambridge UnivPress, 1998, vol. 1, no. 1.

[15] F. Radlinski, R. Kleinberg, and T. Joachims,“Learning diverse rankings with multi-armed

bandits,” in Proceedings of the 25th interna-tional conference on Machine learning. ACM,2008, pp. 784–791.

[16] Y. Yue and T. Joachims, “Interactively opti-mizing information retrieval systems as a du-eling bandits problem,” in Proceedings of the26th Annual International Conference on Ma-chine Learning. ACM, 2009, pp. 1201–1208.

[17] S. Pandey and C. Olston, “Handling advertise-ments of unknown quality in search advertising,”Advances in Neural Information Processing Sys-tems, vol. 19, p. 1065, 2007.

[18] D. Bouneffouf, A. Bouzeghoub, and A. L.Gancarski, “A contextual-bandit algorithm formobile context-aware recommender system,” inNeural Information Processing. Springer, 2012,pp. 324–331.

[19] D. G. Horvitz and D. J. Thompson, “A general-ization of sampling without replacement from afinite universe,” Journal of the American Statis-tical Association, vol. 47, no. 260, pp. 663–685,1952.

[20] C. B. Browne, E. Powley, D. Whitehouse, S. M.Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener,D. Perez, S. Samothrakis, and S. Colton, “A sur-vey of monte carlo tree search methods,” Com-putational Intelligence and AI in Games, IEEETransactions on, vol. 4, no. 1, pp. 1–43, 2012.

[21] L. Kocsis and C. Szepesvari, “Bandit basedmonte-carlo planning,” Machine Learning:ECML 2006, pp. 282–293, 2006.

[22] Z. Bnaya, R. Puzis, R. Stern, and A. Felner,“Volatile multi-armed bandits for guaranteedtargeted social crawling,” in Late breaking pa-pers at the Twenty-Seventh AAAI Conference onArtificial Intelligence, 2013.

[23] P. Eyerich, T. Keller, and M. Helmert, “High-quality policies for the canadian traveler’s prob-lem,” in Third Annual Symposium on Combina-torial Search, 2010.

[24] D. H. Chau, S. Pandit, S. Wang, and C. Falout-sos, “Parallel crawling for online social net-works,” in Proceedings of the 16th internationalconference on World Wide Web. ACM, 2007,pp. 1283–1284.

[25] A. Mislove, M. Marcon, K. P. Gummadi, P. Dr-uschel, and B. Bhattacharjee, “Measurement

Page 14 of 15c©ASE 2012

Page 15: Social Network Search as a Volatile Multi-armed Bandit Problemfelner/2013/sn_vmab.pdf · In this paper we introduce a method which consid-ers both exploration and exploitation in

and analysis of online social networks,” in Pro-ceedings of the 7th ACM SIGCOMM conferenceon Internet measurement. ACM, 2007, pp. 29–42.

[26] J. Bonneau, J. Anderson, and G. Danezis,“Prying data out of a social network,” inSocial Network Analysis and Mining, 2009.ASONAM’09. International Conference on Ad-vances in. IEEE, 2009, pp. 249–254.

[27] D. Chakrabarti, R. Kumar, F. Radlinski, andE. Upfal, “Mortal multi-armed bandits,” in Neu-ral Information Processing Systems, 2008, pp.273–280.

[28] P. Whittle, “Restless bandits: Activity alloca-tion in a changing world,” Journal of appliedprobability, pp. 287–298, 1988.

[29] ——, “Arm-acquiring bandits,” The Annals ofProbability, pp. 284–292, 1981.

[30] R. Kleinberg, A. Niculescu-Mizil, andY. Sharma, “Regret bounds for sleepingexperts and bandits,” Machine learning, vol. 80,no. 2-3, pp. 245–272, 2010.

[31] J. Yang and J. Leskovec, “Defining and eval-uating network communities based on ground-truth,” CoRR, vol. abs/1205.6233, 2012.

[32] M. Fire, L. Tenenboim, O. Lesser, R. Puzis,L. Rokach, and Y. Elovici, “Link predictionin social networks using computationally effi-cient topological features,” in Privacy, security,risk and trust (passat), 2011 ieee third interna-tional conference on and 2011 ieee third inter-national conference on social computing (social-com). IEEE, 2011, pp. 73–80.

Page 15 of 15c©ASE 2012