
An Adaptive Algorithm for Selecting Profitable Keywords for Search-Based Advertising Services

Paat Rusmevichientong
Cornell University

[email protected]

David P. Williamson
Cornell University

[email protected]

ABSTRACT

Increases in online search activities have spurred the growth of search-based advertising services offered by search engines. These services enable companies to promote their products to consumers based on their search queries. In most search-based advertising services, a company selects a set of keywords, determines a bid price for each keyword, and designates an ad associated with each selected keyword. When a consumer searches for one of the selected keywords, search engines then display the ads associated with the highest bids for that keyword on the search result page. A company whose ad is displayed pays the search engine only when the consumer clicks on the ad. With millions of available keywords and a highly uncertain clickthru rate associated with the ad for each keyword, identifying the most effective set of keywords and determining the corresponding bid prices becomes challenging for companies wishing to promote their goods and services via search-based advertising.

Motivated by these challenges, we formulate a model of keyword selection in search-based advertising services. We develop an algorithm that adaptively identifies the set of keywords to bid on based on historical performance. The algorithm prioritizes keywords based on a prefix ordering: a sorting of the keywords in descending order of profit-to-cost ratio. We show that the average expected profit generated by the algorithm converges to near-optimal profits. Furthermore, the convergence rate is independent of the number of keywords and scales gracefully with the problem's parameters. Extensive numerical simulations show that our algorithm outperforms existing methods, increasing profits by about 7%.

1. INTRODUCTION

Search-based advertising services offered by search engines enable companies to promote their products to targeted groups of consumers based on their search queries. Examples of search-based advertising services include Google Adwords (http://www.google.com/adwords), Yahoo! Sponsored Search (http://searchmarketing.yahoo.com/srch/index.php), and Microsoft's forthcoming MSN AdCenter (http://advertising.msn.com/searchadv/). In most search-based advertising services, a company wishing to advertise sets a daily budget, selects a set of keywords, and determines the bid price for each selected keyword. When consumers search for one of the selected keywords, search engines then display the ads associated with the highest bids for that keyword on the search result page. A company whose ad is displayed pays the search engine only when the consumer clicks on the ad. Note that the ad is not displayed if the company's spending that day has exceeded its budget. Figure 3 in Appendix A shows examples of the ads shown to a user under the search-based advertising programs offered by Google and Yahoo!.

Details of the actual cost of each keyword, of how a search query matches a keyword, and of how different bid amounts translate to positions on the search result page vary from one search engine to another. Under the Google Adwords program, for instance, each click costs the ad's owner an amount equal to the next highest bid plus one cent. Moreover, when determining the ad's placement on the search result page, the Google Adwords program takes into account not only the bid amount, but also the ad's popularity and clickthrus. On the other hand, under the Yahoo! Sponsored Search program, the highest bidder receives the top spot.

With millions of available keywords, a user of search-based advertising services faces challenging decisions. The user not only has to identify an effective set of keywords, but also needs to develop ads that will draw consumers and to determine a bid price for each keyword that balances the tradeoffs between the costs and profits that may result from each click on the ad associated with the keyword. Furthermore, much of the information required to compute these tradeoffs is not known in advance and is difficult to estimate a priori. For instance, the clickthru probability associated with the ad for each keyword – the probability that the consumer will click on the ad once it appears on the search result page – varies dramatically depending on the keyword, the ad itself, and the position of the ad on the search result page.

In this paper, we consider one of the challenges faced by users of search-based advertising services: given a fixed daily budget and unknown clickthru probabilities, develop a policy that adaptively selects a set of keywords in order to maximize total expected profits. We assume that we have N keywords indexed by 1, 2, . . . , N. For any A ⊆ {1, . . . , N}, let the random variable Z_A denote the profits from selecting the keywords in the set A. A policy φ = (A_1, A_2, . . .) is a sequence of random variables, where for all t, A_t denotes the subset of keywords that will be selected in period t; A_t is allowed to depend on the observed outcomes in the previous t − 1 periods. For any T ≥ 1, we aim to find a policy φ = (A_1, A_2, . . .) that maximizes the total expected profits over T periods, i.e., Σ_{t=1}^{T} E[Z_{A_t}].

Identifying profitable keywords requires knowledge of the clickthru probability of the ad associated with each keyword. Unfortunately, the clickthru rate is generally not known in advance, and we can obtain an estimate of its value only by selecting the keyword and observing the resulting displays of the ad – commonly referred to as its impressions – and the number of clicks on it. Since we pay for each click on the ad, this process can potentially result in significant costs, yet may offer an opportunity to discover potentially profitable keywords. We thus must balance the tradeoffs between selecting keywords that seem to yield high average profits based on past performance and selecting previously unused keywords in order to learn about their clickthru probabilities. This is usually cast in the machine learning literature as balancing "exploitation" (of known good options) and "exploration" (of unknown options that might be better than known options).

We model the problem as follows: we assume that in every time period some number of queries arrive, where the number of queries is independent and identically distributed across periods. At the beginning of each time period, we must specify a subset of keywords on which we will bid. Then queries arrive sequentially; keyword i is queried with probability λ_i. If we have enough money remaining in our budget, our ad is displayed. With some probability p_i, the user clicks on our ad; we receive profit π_i and must pay cost c_i. We assume that the probabilities λ_i, costs c_i, and profits π_i are known; justification of our ability to know the first two quantities from publicly available sources is given in Section 2.1. The probabilities p_i are unknown, and we learn them over time. Our goal is to give an algorithm that converges to near-optimal profits as the number of time periods tends towards infinity.

We begin by considering the static case – when the probabilities p_i are known. This problem is related to the well-known stochastic knapsack problem and is NP-complete. Following the approximation algorithms of Sahni [23] for the standard knapsack problem and the nonadaptive algorithm of Dean, Goemans, and Vondrak [7] for the stochastic knapsack problem, we show that we can obtain a near-optimal approximation algorithm by considering prefix orderings. We order the keywords in decreasing order of profit-to-cost ratio; that is, π_1/c_1 ≥ π_2/c_2 ≥ · · · ≥ π_N/c_N. Then our algorithm chooses a set of the form {1, . . . , ℓ} for some integer ℓ; we call such a set a prefix of the keywords. We show that if the cost of each item is sufficiently small compared to our budget, if the expected number of arrivals of any given keyword is not too large, and if the total number of search queries concentrates around its mean, then in expectation our algorithm returns a near-optimal solution, where the closeness to optimality depends on how well these assumptions are satisfied. We give evidence that in practice these assumptions are quite likely to be true.
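The prefix idea above can be illustrated in a few lines. This is a minimal sketch, not the paper's exact algorithm: it assumes the prefix is cut off once the expected per-period spend (µ · λ_i · p_i · c_i, summed over the prefix) would exceed the budget, and all parameter names are our own.

```python
# A minimal sketch of the static prefix heuristic, under the assumption
# (ours, for illustration) that we stop once expected spend exceeds budget.

def best_prefix(profits, costs, clicks, lambdas, mu, budget):
    """Return the indices of a prefix of keywords, taken in descending
    order of profit-to-cost ratio, whose expected spend fits the budget."""
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / costs[i], reverse=True)
    spend, chosen = 0.0, []
    for i in order:
        # Expected spend on keyword i per period: mu * lambda_i * p_i * c_i
        expected_cost = mu * lambdas[i] * clicks[i] * costs[i]
        if spend + expected_cost > budget:
            break                      # keep the selected set a prefix
        spend += expected_cost
        chosen.append(i)
    return chosen
```

Because the loop stops at the first keyword that would overrun the budget, the chosen set is always a prefix of the ratio-sorted order, matching the structure analyzed in Section 3.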

Note that the static case of our problem is somewhat different than the stochastic knapsack problem. In that problem, profits are known, but sizes are drawn from arbitrary known distributions, and items can be placed in the knapsack in a specified order. Here costs (corresponding to item sizes) are deterministic and known, but query arrival (corresponding to item placement) is random and not under our control.

Our result in the static case guides the development of an adaptive algorithm for the case when the clickthru probabilities p_i are unknown and we must learn them over time. In each time period, we either choose a random prefix of keywords, or a prefix of keywords using our algorithm from the static case applied to our estimates of the p_i from the last time period. We show that, averaged over the number of time periods T, in expectation our algorithm converges to near-optimal profits, where the performance guarantee is almost the same as that of the static case modulo some terms that vanish as T goes to infinity.
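The explore/exploit loop just described can be sketched as follows. The mixing rule `explore_prob` and the helper `select_prefix` are illustrative stand-ins, not the paper's actual exploration schedule.

```python
import random

def adaptive_period(stats, num_keywords, select_prefix, explore_prob=0.1):
    """One period of an explore/exploit scheme over prefixes. stats[i] is a
    [clicks, impressions] pair accumulated for keyword i; select_prefix maps
    clickthru estimates to a prefix length, e.g. via the static rule."""
    if random.random() < explore_prob:
        ell = random.randint(1, num_keywords)   # explore: a random prefix
    else:
        # exploit: plug current clickthru estimates into the static rule;
        # unseen keywords get an optimistic estimate of 1.0 (our choice)
        p_hat = [c / imp if imp > 0 else 1.0 for c, imp in stats]
        ell = select_prefix(p_hat)
    return list(range(ell))
```

Restricting both branches to prefixes is what keeps the search space small: there are only N prefixes, and in practice only the first ℓ of them matter.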

This result should be compared to traditional multi-armed bandit algorithms [3, 4, 8, 15, 16, 17]. These algorithms must choose one of N slot machines (or "bandits") to play in each time step. Each play of a bandit will yield a reward, whose distribution is not known in advance. Several algorithms have been proposed for the multi-armed bandit problem whose average expected reward also converges to the optimal. If we associate each bandit with a prefix of the keywords, we can show that in expectation these algorithms converge to near-optimal profits with the same performance guarantee as our static case. The convergence rates of these algorithms, however, depend on the number of possible keywords N, which might be extremely large in our case. In contrast, the convergence rate of our prefix-based adaptive algorithm is independent of N; it instead depends primarily on the value of the largest ℓ such that the expected cost of the prefix {1, . . . , ℓ} does not exceed our budget. We believe that in practice the value of ℓ will be significantly smaller than the total number of keywords N. Furthermore, our dependence on T is much better than that of the multi-armed bandit algorithms, so our algorithm converges faster. Finally, when compared to two standard multi-armed bandit algorithms, extensive numerical simulations show that in as little as 30 time periods our algorithm outperforms these two algorithms, yielding significantly better profits. In one set of simulations, our profits were about 7% better, and in another about 20% better.
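For concreteness, a standard bandit index rule such as UCB1 (a classic algorithm from this literature, shown here as our illustrative choice, not necessarily one of the two algorithms used in the paper's simulations) would treat each prefix as an arm:

```python
import math

def ucb1_choose(rewards, counts, t):
    """Standard UCB1 index rule: play each arm once, then pick the arm
    maximizing (mean reward) + sqrt(2 ln t / n_i). In our setting each
    "arm" would be a prefix of the keywords, so N arms in all."""
    for i, n in enumerate(counts):
        if n == 0:
            return i                    # initialization: try each arm once
    return max(range(len(counts)),
               key=lambda i: rewards[i] / counts[i]
                             + math.sqrt(2.0 * math.log(t) / counts[i]))
```

The initialization pass alone already costs N periods, which is exactly the dependence on N that the prefix-based algorithm avoids.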

To the best of our knowledge, this work is the first publicly available study that addresses the problem of identifying profitable sets of keywords, realistically taking into account the uncertainty of the clickthru probability and the need to estimate its value based on historical performance. While there has been a great deal of interest from researchers in the area of search-based advertising services, much of it has focused on the design of auctions and payment mechanisms [1, 5, 6, 9, 11, 18, 21]. Our paper is one of only a few that focus on the users of search-based advertising. Kitts et al. [12] consider the problem of finding optimal bidding strategies under a variety of objectives, but only in our static setting in which the clickthru probabilities are known in advance. We believe that removing this assumption is a crucial step towards a usable algorithm. Furthermore, Kitts et al. do not offer an algorithm for finding an optimal set of keywords when the objective is to maximize expected profit subject to a budget constraint, which again is needed given the requirements of using a service such as Google Adwords.

We have attempted to construct a model of the problem that is simultaneously tractable and realistic, justifying our choices with real data whenever possible. Our resulting algorithm is simple to state and code, and provides good results in simulations, beating out several potential competitors.

Our paper is structured as follows. We present our model in Section 2. We analyze the static case in Section 3. We discuss the connection of our problem with multi-armed bandits in Section 4.1, then give our algorithm in Section 4.2. We give the results of an experimental study comparing our algorithm with multi-armed bandit algorithms in Section 5. We discuss some possible extensions of our model in Section 6. Because of space constraints, many proofs and supporting figures have been placed in the appendix.

We conclude our introduction by mentioning some other related literature. During the past few years, there has been resurgent interest in the study of online decision making with extremely large decision spaces [2, 13, 14, 24]. However, the algorithms developed in these papers are problem-specific and are not applicable to the problem considered here.

We should note that Mehta et al. [19] have considered the problem faced by a search engine of how to optimally match incoming queries with the bids placed by advertisers. This problem is very different from the one studied in this paper; our focus is on the company trying to advertise rather than on the search engine providing the advertising service.

2. PROBLEM FORMULATION AND MODEL DESCRIPTION

In this section, we develop a model of search-based advertising services and formulate the problem of identifying profitable sets of keywords. Sections 2.1 and 6 provide additional discussion of the problem formulation, justification of the model's assumptions, and discussion of the publicly available data sources that can be used to estimate some of the model's parameters. Assume that we have a probability space (Ω, F, P) and N keywords indexed by 1, 2, . . . , N. For any t ≥ 1, let S_t denote the total number of search queries that arrive in period t. We assume that S_1, S_2, . . . are independent and identically distributed random variables with mean 1 < µ < ∞ and that the distribution of S_t is known in advance.

We assume that, in each time period, the search queries arrive sequentially, and each query can correspond to any one of the N keywords. For any t ≥ 1 and r ≥ 1, let the random variable Q^t_r denote the keyword associated with the rth search query in period t. We assume that {Q^t_r : t ≥ 1, r ≥ 1} are independent and identically distributed random variables with P{Q^t_r = i} = λ_i for i = 1, . . . , N, where Σ_{i=1}^{N} λ_i = 1. We assume that the λ_i are known in advance. In Section 2.1, we will show an approach for estimating the distribution of S_t and the parameters λ_i using publicly available information provided by search engines.

For each keyword 1 ≤ i ≤ N, let 0 ≤ p_i ≤ 1, c_i > 0, and π_i ≥ 0 denote the clickthru probability, the ad cost or cost-per-click (CPC), and the expected profit (per click) of keyword i, respectively. We will assume that the CPC and the expected profits (c_i and π_i) are known in advance and remain constant. Although the CPC of each keyword depends on the bidding behaviors of other users, we believe this requirement is a reasonable model in the short term. In Section 2.1, we will provide additional discussion of this assumption and show how we can approximate the current CPC associated with each keyword using publicly available information provided by search engines.

At the beginning of each time period, we start with a balance of B dollars and must select a set of keywords, which will determine the set of ads that will be shown to consumers. As a search query arrives, if the query matches one of the selected keywords and we have enough money remaining in the account, the ad associated with the keyword appears on the search result page. Each display of an ad is called an impression. If the consumer then clicks on the ad, we receive a profit, pay the cost associated with the keyword to the search engine, update our remaining balance, and wait for the arrival of another search query. The process terminates once all search queries have arrived for the time period.

By appropriate scaling, we may assume without loss of generality that the beginning balance in each time period is $1. Let {1, . . . , N} denote the set of possible keywords on which we can bid. Let 1(·) denote the indicator function, and let the random variable B^{A_t}_r indicate the remaining balance when the rth search query appears, given that we have decided to select the keywords in the set A_t ⊆ {1, . . . , N} in period t. Let X^t_{ri} be a Bernoulli random variable with parameter p_i indicating whether or not the consumer clicks on the ad associated with keyword i during the arrival of the rth search query in period t. If we select the set of keywords A_t ⊆ {1, . . . , N} in period t, the expected profit E[Z_{A_t}] is given by

E[Z_{A_t}] = E[ Σ_{r=1}^{S_t} Σ_{i=1}^{N} π_i 1{i ∈ A_t, Q^t_r = i, B^{A_t}_r ≥ c_i, X^t_{ri} = 1} ].   (1)

We assume that, for any i, the random variables {X^t_{ri} : t ≥ 1, r ≥ 1} are independent and identically distributed and are independent of the arrival process.

The above expression for the expected profit reflects our model of the search-based advertising dynamics. Under this model, we receive a profit from keyword i during the arrival of the rth search query in period t if and only if

(a) we bid on keyword i at the beginning of period t, i.e., i ∈ A_t;

(b) the rth query corresponds to keyword i, i.e., Q^t_r = i;

(c) we have enough balance in the account to pay for the cost of the keyword if the consumer clicks on the ad, i.e., B^{A_t}_r ≥ c_i;

(d) the consumer actually clicks on the ad, i.e., X^t_{ri} = 1.
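Conditions (a)–(d) translate directly into a per-period simulation. The sketch below uses hypothetical parameters; only clicks affect the balance in this model, so impressions that draw no click are not tracked.

```python
import random

def simulate_period(A, lambdas, clicks, costs, profits, num_queries, budget=1.0):
    """Simulate one period: queries arrive sequentially; a click on keyword
    i's ad earns profit pi_i and pays cost c_i, provided i was selected and
    enough budget remains -- conditions (a)-(d) above. Parameter values are
    hypothetical stand-ins for the paper's quantities."""
    balance, profit = budget, 0.0
    keywords = list(range(len(lambdas)))
    for _ in range(num_queries):
        i = random.choices(keywords, weights=lambdas)[0]   # (b): Q_r = i
        if i in A and balance >= costs[i]:                  # (a) and (c)
            if random.random() < clicks[i]:                 # (d): X_ri = 1
                profit += profits[i]
                balance -= costs[i]
    return profit, balance
```

Running this repeatedly and averaging the returned profit gives a Monte Carlo estimate of E[Z_{A_t}] in Equation (1).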

As mentioned in the previous section, the clickthru rate p_i for any keyword i is generally not known in advance, and we can obtain an estimate of its value only by selecting the keywords and observing the resulting impressions and clicks. Therefore, we must balance the tradeoffs between choosing keywords that seem to yield high average profits based on past performance and trying previously unused keywords in order to learn about their clickthru probabilities.

To analyze the tradeoffs between exploitation and exploration, let us introduce the following notation. Let (F_t : t ≥ 1) denote a filtration on the probability space (Ω, F, P), where F_t corresponds to all events that have happened up to time t. A policy φ = (A_1, A_2, . . .) is a sequence of random variables, where A_t ⊆ {1, 2, . . . , N} corresponds to the set of keywords selected in period t. We will focus exclusively on a class of non-anticipatory policies defined as follows.

Definition 1. A policy φ = (A_1, A_2, . . .) is non-anticipatory if A_t is F_t-measurable for all t ≥ 1. Let Π denote the collection of all non-anticipatory policies.

For any T ≥ 1, we aim to find a non-anticipatory policy that maximizes the expected profit over T periods, i.e., sup_{φ∈Π} Σ_{t=1}^{T} E[Z_{A_t}]. In general, the above optimization is intractable, and we aim to find an approximation algorithm that yields near-optimal expected profit.

2.1 Model Calibration

Although we aim to estimate adaptively the clickthru probability associated with the ad of each keyword based on past performance, our formulation assumes that we can estimate a priori the arrival process, the query distributions, and the current CPC of each keyword. In this section, we identify publicly available data sources that might be used to estimate these parameters.

Figure 4(a) in Appendix A shows the estimated number of search queries on Yahoo! in September 2005 related to the keyword "cheap Europe trip". This data is obtained through the publicly available Keyword Selector tool provided by the Yahoo! Sponsored Search program (http://inventory.overture.com/d/searchinventory/suggestion/). Using this information, it might be possible to estimate the probability that each query will correspond to a particular keyword.
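For instance, a simple plug-in estimate of the λ_i from such query counts might look like the following (the keyword counts here are hypothetical):

```python
def estimate_lambdas(monthly_counts):
    """Plug-in estimate of the query probabilities lambda_i: normalize
    per-keyword search counts (e.g. monthly figures from a keyword tool;
    the counts used in the test below are hypothetical)."""
    total = sum(monthly_counts.values())
    return {kw: n / total for kw, n in monthly_counts.items()}
```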

Our model also assumes that the cost-per-click (CPC) associated with each keyword is known in advance. Although the actual CPC depends on the outcome of the bidding behaviors of other users, as indicated in Section 2, we believe that this requirement is a reasonable assumption in the short term. Whereas the clickthru probability associated with the ad of each keyword can be estimated only by bidding on the keywords and observing the resulting impressions and clicks, it is possible to estimate the CPC associated with each keyword using data provided by search engines. For instance, Figure 4(b) shows an example of the current outstanding bids (as of 10/18/05) for the keyword "cheap Europe trip" under the Yahoo! Sponsored Search Marketing program. This information is publicly available through the Bid Tools offered by Yahoo! at http://uv.bidtool.overture.com/d/search/tools/bidtool/index.jhtml. Given the outstanding bids for a keyword, we might approximate the CPC of each keyword as the median bid price, the kth maximum bid price, or the average bid price.
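The three CPC proxies just mentioned are easy to compute from a list of outstanding bids; the bid values used below are hypothetical:

```python
import statistics

def cpc_estimates(bids, k=2):
    """Approximate a keyword's CPC from its outstanding bids using the three
    proxies mentioned above: the median bid, the k-th highest bid, and the
    average bid."""
    ranked = sorted(bids, reverse=True)
    return {
        "median": statistics.median(bids),
        "kth_max": ranked[min(k, len(ranked)) - 1],
        "average": sum(bids) / len(bids),
    }
```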

3. A STATIC SETTING: WHEN THE p_i ARE KNOWN

In this section, we will focus on the static setting, where the clickthru probability associated with the ad of each keyword is known in advance. The structure of the policy found in this section will guide the development of the adaptive policy in the next section.

Since we know the clickthru probability associated with the ad for each keyword, the problem reduces to the following static optimization, which we will call the STATIC BIDDING problem:

Z* ≡ max_{A ⊆ {1,...,N}} E[Z_A] = max_{A ⊆ {1,...,N}} E[ Σ_{r=1}^{S} Σ_{i=1}^{N} π_i 1{i ∈ A, Q_r = i, B^A_r ≥ c_i, X_{ri} = 1} ],

where E[Z_A] denotes the expected profit from selecting the keywords in A. The above expression of the objective function follows directly from the general expression given in Equation 1 in Section 2; we simply drop the superscript t denoting the time period. The following lemma allows us to simplify the objective function of the STATIC BIDDING problem; its proof appears in Appendix B.

Lemma 1. For any A ⊆ {1, 2, . . . , N}, r ≥ 1, and keyword i ∈ A,

P{ Q_r = i, B^A_r ≥ c_i, X_{ri} = 1 | S ≥ r } = λ_i p_i P{ B^A_r ≥ c_i | S ≥ r }

and E[Z_A] = Σ_{i∈A} π_i λ_i p_i Σ_{r=1}^{∞} E[ 1{B^A_r ≥ c_i, S ≥ r} ].

3.1 Computational Complexity and Cost Assumptions

Note that if p_i = 1 for all i, then the consumer will always click on the ad. Thus, the problem becomes a variation of the stochastic knapsack problem, which is known to be NP-complete [7, 10]. Unless P = NP, it is thus unlikely that there exists a polynomial-time algorithm for solving the STATIC BIDDING problem. In this section, we identify special structure of our problem that will enable the development of an efficient approximation algorithm in Section 3.2. Throughout this paper, we will make the following assumptions on the ordering of keywords and their costs.

Assumption 1.

(a) The keywords are indexed such that π_1/c_1 ≥ π_2/c_2 ≥ · · · ≥ π_N/c_N.

(b) Σ_{i=1}^{N} c_i p_i λ_i > 1.

(c) There exist k ≥ 1 and 0 ≤ α < 1 such that c_i ≤ 1/k and λ_i µ ≤ k^α for all i.

Before we proceed to the statement of our theorem, let us first discuss each of these assumptions. The first two assumptions are without loss of generality. Assumption 1(a) states that the keywords are sorted by a prefix ordering – in descending order of profit-to-cost ratio. Moreover, by adding fictitious keywords with zero profit, we can assume without loss of generality that Σ_{i=1}^{N} c_i p_i λ_i > 1, which is the statement of Assumption 1(b).

Thus our key assumption is Assumption 1(c). This assumption places constraints on the magnitude of the cost and the expected number of arrivals associated with each keyword. The requirement that c_i ≤ 1/k implies that each click on the ad associated with keyword i will cost at most a 1/k fraction of the total budget. As demonstrated in Figure 5 in Appendix A, we believe that this assumption is a reasonable requirement. Figure 5 shows the distribution of the median bid price for each of the 842 travel-related keywords. These keywords were chosen based on the authors' experience in working with an online seller of travel packages. Examples of the keywords include "hawaii vacation package", "africa safari", and "london discount airfare". A complete list of the 842 keywords used in this study is available at http://www.orie.cornell.edu/∼paatrus/keywordlist.txt. The data was collected over a 2-day period (10/17/05-10/18/05) from the publicly available Bid Tool offered by the Yahoo! Sponsored Search program at http://uv.bidtool.overture.com/d/search/tools/bidtool/index.jhtml. As seen from the figure, the average median bid price for a keyword is $0.30-$0.35. A daily budget of $300-$350/day translates to a value of k around 1000. Of course, as the daily budget increases, so will the value of k.

The second part of Assumption 1(c) requires that the expected number of search queries for keyword i, λ_i µ, is at most k^α for some 0 ≤ α < 1. Although the expected number of search queries for a keyword can vary dramatically, we believe that this assumption should hold for a large proportion of keywords, as shown in Figure 6 in Appendix A. Figure 6 shows the distribution of the average daily searches on Yahoo! for the same 842 travel-related keywords during September 2005. This information is obtained through the publicly available Keyword Selector tool offered by the Yahoo! Sponsored Search program at http://inventory.overture.com/d/searchinventory/suggestion/. As seen from the figure, for over 80% of the keywords, the average number of daily searches is at most 20 searches per day. If the value of k is about 1000, this translates to a value of α of around 0.43.
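The k and α of Assumption 1(c) can be backed out from campaign data along the lines of the discussion above. The sketch below (with a helper name of our own) reproduces the running example: a $300 budget, $0.30 costs, and 20 daily searches giving k ≈ 1000 and α ≈ 0.43.

```python
import math

def assumption_parameters(costs, daily_queries, budget):
    """Back out Assumption 1(c)'s k and alpha from campaign data: normalize
    costs by the budget, set k = 1/max(c_i), and take the smallest alpha
    with lambda_i * mu <= k**alpha for every keyword."""
    scaled = [c / budget for c in costs]       # budget rescaled to $1
    k = 1.0 / max(scaled)
    max_arrivals = max(daily_queries)          # max_i lambda_i * mu
    alpha = math.log(max_arrivals) / math.log(k) if max_arrivals > 1 else 0.0
    return k, alpha
```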

3.2 An Approximation Algorithm

The main result of this section is stated in the following theorem, whose proof is given in Section 3.3.

Theorem 1. Let k and α be defined as in Assumption 1. Suppose the probabilities p_i are known and 1/k^{1−α} ≤ 1 − 1/k − 1/k^{(1−α)/3}. Let I_U be defined by

I_U = max{ ℓ : µ Σ_{i=1}^{ℓ} c_i p_i λ_i ≤ 1 − 1/k − 1/k^{(1−α)/3} }.

If ρ = E[min{S, µ}]/µ, P = {1, 2, . . . , I_U}, and Z* denotes the optimal profit, then

ρ ( 1 − 1/k^{(1+2α)/3} − 2/k^{(1−α)/3} ) Z* ≤ E[Z_P] ≤ Z*.

Before we proceed to the proof of the theorem, let us discuss the implications of the above result. The theorem shows that the quality of the prefix-based approximation algorithm depends on two factors. The first factor, E[min{S, µ}]/µ, measures how tightly the distribution of S concentrates around its mean. In the most important example, where S follows a Poisson distribution, this ratio can be computed explicitly; it is given in the following lemma, whose proof appears in Appendix C.

Lemma 2. If S is a Poisson random variable with mean µ > 1, then 1 − 3/√µ ≤ E[min{S, µ}]/µ ≤ 1.
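The ratio in Lemma 2 is easy to check numerically, using the identity E[min{S, m}] = Σ_{r=1}^{m} P(S ≥ r) for integer m. The following is an illustrative stdlib-only sketch (the function name is ours, not the paper's):

```python
import math

def poisson_min_ratio(mu: int) -> float:
    """Compute E[min{S, mu}]/mu for S ~ Poisson(mu), via
    E[min{S, mu}] = sum_{r=1}^{mu} P(S >= r)."""
    pmf = math.exp(-mu)   # P(S = 0)
    cdf = pmf             # P(S <= 0)
    total = 0.0
    for r in range(1, mu + 1):
        total += 1.0 - cdf        # add P(S >= r) = 1 - P(S <= r - 1)
        pmf *= mu / r             # advance pmf to P(S = r)
        cdf += pmf                # now cdf = P(S <= r)
    return total / mu

# Lemma 2 bounds hold at several scales:
for mu in (25, 100, 400):
    ratio = poisson_min_ratio(mu)
    assert 1 - 3 / math.sqrt(mu) <= ratio <= 1.0
```

For µ = 100 the ratio is already about 0.96, consistent with the remark below that the factor is close to one for realistic query volumes.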

In most applications, the expected number of search queries µ will be very large, ranging from hundreds of thousands to millions of queries per day. If the arrivals follow a Poisson distribution, the ratio E[min{S, µ}]/µ will therefore be very close to one.

The second factor that influences the quality of our approximation algorithm is the pair of values k and α. Since 0 ≤ α < 1, as k (the ratio between the budget and the CPC for each keyword) increases, the expected profit approaches the optimal. As indicated in the previous section, we believe this is a reasonable assumption: if the budget for a campaign ranges from $200-$400 per day and the maximum CPC for each keyword ranges from 30 to 50 cents, this translates to a value of k from 400 to 1200.

In addition, for our approximation algorithm to perform well, the value of α should be small. Recall from Assumption 1 that for any keyword i, λiµ ≤ k^α; thus, α bounds the expected number of search queries for each keyword. As shown in Figure 6 in Appendix A, most keywords average fewer than 20 queries per day, leading us to believe that this requirement holds for a large proportion of keywords.

Figure 1 shows the value of the performance bound given in Theorem 1 (with ρ = 1.0) for various values of α and k, with k ranging from 10^3 to 10^6. We observe, for instance, that if k is 1000, α is 0.2, and ρ is 1, Theorem 1 shows that the algorithm achieves at least a 0.68 fraction of the optimal profit.
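The factor in Theorem 1 is straightforward to evaluate directly. A small helper (the function name is ours) illustrates the behavior plotted in Figure 1:

```python
def theorem1_bound(k: float, alpha: float, rho: float = 1.0) -> float:
    """Evaluate the Theorem 1 factor
    rho * (1 - 1/k^((1+2*alpha)/3) - 2/k^((1-alpha)/3))."""
    return rho * (1.0
                  - k ** (-(1 + 2 * alpha) / 3)
                  - 2.0 * k ** (-(1 - alpha) / 3))

# The guarantee improves as k grows and degrades as alpha grows.
assert theorem1_bound(10**6, 0.2) > theorem1_bound(10**3, 0.2)
assert theorem1_bound(10**3, 0.1) > theorem1_bound(10**3, 0.5)
```

For example, with ρ = 1 the factor rises from roughly 0.64 at k = 10^3, α = 0.2 to about 0.95 at k = 10^6, α = 0.2.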

[Figure 1 appears here: the performance bound (vertical axis, 0 to 1) plotted against α (horizontal axis, 0 to 1) for k = 10^3, 10^4, 10^5, and 10^6.]

Figure 1: Plot of the performance bound given in Theorem 1 for various values of k and α (with ρ = 1.0).

3.3 Proof of Theorem 1

The proof of Theorem 1 relies on the following lemmas. The first lemma establishes an upper bound on the optimal expected profit in terms of the optimal value of a linear program.

Lemma 3. Let Z_LP be defined by

Z_LP ≡ max { Σ_{i=1}^{N} π_i λ_i p_i z_i : Σ_{i=1}^{N} c_i λ_i p_i z_i ≤ 1, 0 ≤ z_i ≤ µ ∀i }.

Then, Z* ≤ Z_LP.

Proof. Consider any A ⊆ {1, 2, . . . , N}. Let the random variable C_A denote the total cost associated with bidding on keywords in A. By definition, we know that C_A ≤ 1 with probability 1, which implies that

1 ≥ E[C_A] = E[ Σ_{r=1}^{S} Σ_{i=1}^{N} c_i 1{i ∈ A, Q_r = i, B_r^A ≥ c_i, X_{ri} = 1} ]
= Σ_{i∈A} c_i λ_i p_i Σ_{r=1}^{∞} E[1{B_r^A ≥ c_i, S ≥ r}],

where the last equality follows from Lemma 1. Note that for any i,

Σ_{r=1}^{∞} E[1{B_r^A ≥ c_i, S ≥ r}] ≤ Σ_{r=1}^{∞} E[1(S ≥ r)] = E[S] = µ.

For any 1 ≤ i ≤ N, let 0 ≤ z_i ≤ µ be defined by z_i = 1(i ∈ A) · Σ_{r=1}^{∞} E[1{B_r^A ≥ c_i, S ≥ r}]. Then, we have that Σ_{i=1}^{N} c_i p_i λ_i z_i ≤ 1 and also that

E[Z_A] = E[ Σ_{r=1}^{S} Σ_{i=1}^{N} π_i 1{i ∈ A, Q_r = i, B_r^A ≥ c_i, X_{ri} = 1} ]
= Σ_{i∈A} π_i λ_i p_i Σ_{r=1}^{∞} E[1{B_r^A ≥ c_i, S ≥ r}]
= Σ_{i=1}^{N} π_i λ_i p_i z_i.

Since A is arbitrary, the desired result follows.
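Because the LP in Lemma 3 has a single budget constraint plus box constraints, it is a continuous knapsack, and a greedy fill in descending profit-to-cost order solves it exactly; this is the same ordering that underlies the prefix policy. An illustrative stdlib sketch (function name and signature are ours), assuming all c_i > 0:

```python
def lp_upper_bound(pi, c, lam, p, mu, budget=1.0):
    """Solve  max sum_i pi_i*lam_i*p_i*z_i
       s.t.   sum_i c_i*lam_i*p_i*z_i <= budget,  0 <= z_i <= mu.
    Continuous knapsack: fill z_i = mu in descending pi_i/c_i order,
    taking a fractional amount of the last keyword that fits."""
    order = sorted(range(len(pi)), key=lambda i: pi[i] / c[i], reverse=True)
    remaining, value = budget, 0.0
    for i in order:
        w = c[i] * lam[i] * p[i] * mu          # budget used if z_i = mu
        take = min(1.0, remaining / w) if w > 0 else 0.0
        value += take * pi[i] * lam[i] * p[i] * mu
        remaining -= take * w
        if remaining <= 1e-12:
            break
    return value
```

For instance, with two keywords of equal cost and densities 2 and 1 (π = [2, 1], c = [1, 1], λ = [0.5, 0.5], p = [1, 1], µ = 1), the bound is 1.5 at budget 1 and drops to 0.5 at budget 0.25, where only half of the first keyword's traffic can be funded.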

The next lemma relates the optimal value Z_LP of the linear program defined in Lemma 3 to the expected profit obtained when we select keywords according to the prefix ordering; its proof is in Appendix D.

Lemma 4. Let H = {1, . . . , u}, where u is any positive integer such that 0 < µ Σ_{i=1}^{u} c_i λ_i p_i ≤ 1 − 1/k − 1/k^{(1−α)/3}, so that u ≤ I_U. Let Z_LP denote the optimal value of the linear program defined in Lemma 3. Then,

Σ_{i∈H} π_i λ_i p_i ≥ ( Σ_{i∈H} c_i λ_i p_i ) Z_LP ≥ ( Σ_{i∈H} c_i λ_i p_i ) Z*.

The final lemma provides a lower bound on the probability P{B_r^H ≥ c_i | S ≥ r} that we have enough money remaining if the rth search query corresponds to keyword i, given that we bid on a set of keywords H and r ≤ µ. Since the proof is fairly technical, the details are given in Appendix E.

Lemma 5. Let H = {1, . . . , u}, where u is any positive integer such that 0 < µ Σ_{j=1}^{u} c_j λ_j p_j ≤ 1 − 1/k − 1/k^{(1−α)/3}, so that u ≤ I_U. Then, for any keyword i ∈ H and 1 ≤ r ≤ µ,

P{B_r^H ≥ c_i | S ≥ r} ≥ 1 − [ (µ/k) Σ_{i∈H} c_i λ_i p_i ] / [ 1 − 1/k − µ Σ_{i∈H} c_i λ_i p_i ]^2 ≥ 1 − 1/k^{(1+2α)/3}.

Here is the proof of Theorem 1.

Proof. Note that since by hypothesis 1/k^{1−α} ≤ 1 − 1/k − 1/k^{(1−α)/3}, it follows that I_U is well-defined. It follows from Lemma 5 that for any keyword i ∈ P and 1 ≤ r ≤ µ, P{B_r^P ≥ c_i | S ≥ r} ≥ 1 − 1/k^{(1+2α)/3}. Therefore,

E[Z_P] = Σ_{i∈P} π_i λ_i p_i Σ_{r=1}^{∞} P{B_r^P ≥ c_i, S ≥ r}
≥ Σ_{i∈P} π_i λ_i p_i Σ_{r=1}^{⌊µ⌋} P{B_r^P ≥ c_i, S ≥ r}
≥ (1 − 1/k^{(1+2α)/3}) ( Σ_{r=1}^{⌊µ⌋} P{S ≥ r} ) Σ_{i∈P} π_i λ_i p_i
= (E[min{S, µ}]/µ) (1 − 1/k^{(1+2α)/3}) µ Σ_{i∈P} π_i λ_i p_i
≥ (E[min{S, µ}]/µ) (1 − 1/k^{(1+2α)/3}) Z* µ Σ_{i∈P} c_i λ_i p_i
≥ (E[min{S, µ}]/µ) (1 − 1/k^{(1+2α)/3} − 2/k^{(1−α)/3}) Z*,

where the second inequality follows from Lemma 5, and the third inequality follows from Lemma 4. The final inequality follows from Assumption 1(c), which implies that, for any keyword i, µc_iλ_ip_i ≤ 1/k^{1−α}; it then follows from the definition of I_U that µ Σ_{i=1}^{I_U} c_i λ_i p_i ≥ 1 − 1/k − 1/k^{(1−α)/3} − 1/k^{1−α}. Since k ≥ 1 and 0 < α < 1, we have 1/k^{(1−α)/3} ≥ 1/k^{1−α}, so that

(1 − 1/k^{(1+2α)/3}) (1 − 1/k − 1/k^{(1−α)/3} − 1/k^{1−α})
≥ (1 − 1/k^{(1+2α)/3}) (1 − 1/k − 2/k^{(1−α)/3})
≥ 1 − 1/k^{(1+2α)/3} − 2/k^{(1−α)/3} + 2/k^{(2+α)/3} − 1/k
≥ 1 − 1/k^{(1+2α)/3} − 2/k^{(1−α)/3},

which is the desired result.

4. AN ADAPTIVE SETTING: WHEN THE p_i ARE UNKNOWN

In this section, we consider the case when the clickthru probabilities are not known in advance. To estimate these probabilities, we need to select keywords and observe the resulting impressions and clicks, a process that can potentially incur significant costs. We thus need to carefully balance the tradeoff between exploration and exploitation. In Section 4.1, we show how this problem can be formulated as a multi-armed bandit problem, where each bandit corresponds to a subset of keywords. We then aim to develop a strategy that adaptively selects bandits that yield near-optimal profits.

By formulating the problem as an instance of the multi-armed bandit problem, we can apply existing adaptive algorithms developed for this class of problems. Unfortunately, existing multi-armed bandit algorithms have performance guarantees that deteriorate as the number of keywords increases, motivating us to develop an alternative adaptive approximation algorithm that leverages the special structure of our problem.

In Section 4.2, we describe our improved algorithm and show that it exhibits a superior performance bound compared with traditional multi-armed bandit algorithms. We show that the quality of the solution generated by our algorithm is independent of the number of keywords and scales gracefully with the problem's other parameters. Then, in Section 5, we show through extensive simulations that our algorithm outperforms traditional multi-armed bandit algorithms, increasing profits by about 7%.

4.1 A Traditional Multi-Armed Bandit Approach

Theorem 1 shows that by considering only subsets of keywords that follow the prefix ordering (a descending order of profit-to-cost ratio), we can achieve near-optimal profits. This result enables us to reduce the size of the decision space from 2^N (all possible subsets of N keywords) to just N.

Assume that the keywords are indexed as in Assumption 1. For any 1 ≤ i ≤ N, let D_i = {1, 2, . . . , i}. We can view each D_i as a decision whose payoff corresponds to the expected profit that results from selecting keywords in D_i, i.e., E[Z_{D_i}]. Let Z_prefix denote the maximum profit among the decisions D_1, . . . , D_N, i.e., Z_prefix = max_{1≤i≤N} E[Z_{D_i}].

Viewing each D_i as a bandit whose reward corresponds to the profit from selecting keywords in D_i, we can formulate our problem as a multi-armed bandit problem. The multi-armed bandit problem has received much attention, and many algorithms exist for this class of problems. Two such algorithms are UCB1 (see [3]) and EXP3 (see [4]). Both algorithms adaptively select a bandit in each time period based on past observations. Let (Z_t(UCB1) : t ≥ 1) and (Z_t(EXP3) : t ≥ 1) denote the sequences of profits obtained by the UCB1 and EXP3 algorithms, respectively. The average expected profits generated by these algorithms satisfy (see [3, 4]):

1 ≥ [ Σ_{t=1}^{T} Z_t(UCB1) ] / (T Z_prefix) ≥ 1 − L (α_N log T + β_N) / (T Z_prefix),

1 ≥ [ Σ_{t=1}^{T} Z_t(EXP3) ] / (T Z_prefix) ≥ 1 − 3 (L N log N)^{1/3} / ( (2T)^{1/3} Z_prefix ),

where L ≡ max { Σ_{i=1}^{N} π_i x_i : Σ_{i=1}^{N} c_i x_i ≤ 1, x_i ∈ ℕ ∀i } is an upper bound on the maximum possible profit in any one period, α_N = Σ_{1≤i≤N : Δ_i>0} 8/Δ_i, β_N = (1 + π²/3) Σ_{i=1}^{N} Δ_i, and for any 1 ≤ i ≤ N, Δ_i = Z_prefix − E[Z_{D_i}].

The above results show that, as T increases to infinity, the average profits earned by the UCB1 and EXP3 algorithms converge to the maximum expected profit Z_prefix among D_1, . . . , D_N. We know from Theorem 1 that Z_prefix provides a good approximation to Z*, suggesting that the UCB1 and EXP3 algorithms may provide viable algorithms for solving our problem.

Unfortunately, these two generic algorithms for the multi-armed bandit problem ignore any special structure of the problem, and thus their rates of convergence depend on the number of keywords N. As the number of keywords N increases, the rate of convergence deteriorates, clearly an undesirable feature in our setting, where we can have millions of keywords. In fact, the UCB1 algorithm requires us to try each of the N possible decisions once during the first N time periods. Clearly, for values of N as large as in our setting, this might not be feasible.
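For concreteness, here is a minimal UCB1 sketch in the spirit of [3], with the N prefix decisions abstracted as arms; the `pull` callback, all names, and the rescaling of per-period profits into [0, 1] (e.g. by dividing by L) are our assumptions for illustration:

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB1: play each arm once, then repeatedly play the arm maximizing
    empirical mean + sqrt(2 ln t / n_i).  `pull(i)` must return a reward
    in [0, 1].  Returns the average reward over `horizon` rounds."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                      # mandatory initial pass over all arms
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # running average
        total += r
    return total / horizon

# Toy usage: three Bernoulli "prefix decisions" with unknown payoffs.
random.seed(0)
probs = [0.2, 0.8, 0.5]
avg = ucb1(lambda i: 1.0 if random.random() < probs[i] else 0.0, 3, 10000)
```

Note the initial pass over all N arms, which is exactly the feature criticized above: with millions of prefix decisions, even getting started is impractical.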

4.2 An Improved Adaptive Approximation Algorithm

Motivated by the results of the previous section, we develop an alternative algorithm that exploits the special structure of our problem. We show that our algorithm exhibits a performance bound that scales gracefully with the problem's parameters and is independent of the number of keywords. Then, through extensive simulations in Section 5, we show that our algorithm outperforms both the UCB1 and EXP3 algorithms.

Our algorithm, which we refer to as the ADAPTIVE BIDDING algorithm, is defined as follows.

ADAPTIVE BIDDING

• INITIALIZATION:

– For any i, let y_i and x_i denote the cumulative total number of impressions and the total number of clicks, respectively, that the ad associated with keyword i has received. Set y_i = x_i = 0 for all 1 ≤ i ≤ N. Set p⁰_i = 1 for all i.

– Choose a sequence (γ_t ∈ [0, 1] : t ≥ 1), where γ_t denotes the probability that we choose a random decision at time t. We suggest good choices for the γ_t later (see the end of this section and Appendix F).

• DEFINITION: For t = 1, 2, . . .

– Let ℓ_t be the index such that

µ Σ_{u=1}^{ℓ_t} c_u λ_u p^{t−1}_u ≤ 1 − 1/k − 2/k^{(1−α)/3} < µ Σ_{u=1}^{ℓ_t+1} c_u λ_u p^{t−1}_u.

– Let θ_t denote an integer chosen uniformly at random from the set {1, 2, . . . , N}.

– Let F_t denote an independent binary random variable such that P{F_t = 1} = 1 − γ_t and P{F_t = 0} = γ_t.

– Let the decision g_t be defined by g_t = ℓ_t if F_t = 1, and g_t = θ_t otherwise.

– Let G_t = {1, 2, . . . , g_t}. Bid on the keywords in G_t.

– Observe the resulting impressions and clicks.

– Updates: For any keyword i, let V^t_i and W^t_i denote the number of impressions and clicks, respectively, that keyword i receives in this period. Then, for all i,

y_i ← y_i + V^t_i and x_i ← x_i + W^t_i,

p^t_i = 1 if y_i = 0, and p^t_i = x_i/y_i if y_i > 0.

• OUTPUT: A sequence of decisions (G_t : t ≥ 1) and the corresponding sequence of indexes (g_t : t ≥ 1).
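The steps above can be sketched in code as follows. This is an illustrative sketch, not the paper's implementation: the period simulator, parameter names, and the assumption that at least the first keyword always fits within the cost cap are ours.

```python
import random

def adaptive_bidding(T, c, lam, mu, k, alpha, simulate_period,
                     gamma=lambda t: 1.0 / t**2):
    """One-run sketch of ADAPTIVE BIDDING.  `simulate_period(g)` is assumed
    to return two length-g lists: per-keyword impressions and clicks for a
    period in which we bid on the prefix {0, ..., g-1} (0-indexed)."""
    N = len(c)
    y = [0] * N                 # cumulative impressions
    x = [0] * N                 # cumulative clicks
    p_hat = [1.0] * N           # optimistic initial clickthru estimates p^0_i = 1
    cap = 1 - 1.0 / k - 2.0 / k ** ((1 - alpha) / 3)
    history = []
    for t in range(1, T + 1):
        # l_t: longest prefix whose estimated expected cost stays within cap
        cost, l_t = 0.0, 0
        for i in range(N):
            cost += mu * c[i] * lam[i] * p_hat[i]
            if cost > cap:
                break
            l_t = i + 1
        l_t = max(l_t, 1)       # assumption: the first keyword always fits
        # exploit the prefix w.p. 1 - gamma(t); otherwise explore a random prefix
        g_t = l_t if random.random() > gamma(t) else random.randrange(1, N + 1)
        imps, clicks = simulate_period(g_t)
        for i in range(g_t):
            y[i] += imps[i]
            x[i] += clicks[i]
            if y[i] > 0:
                p_hat[i] = x[i] / y[i]   # empirical clickthru estimate p^t_i
        history.append(g_t)
    return history, p_hat

# Toy usage: each keyword in the chosen prefix gets one impression per period.
random.seed(1)
p_true = [0.5, 0.1, 0.1, 0.1]
def one_period(g):
    return [1] * g, [1 if random.random() < p_true[i] else 0 for i in range(g)]

history, est = adaptive_bidding(T=2000, c=[0.1] * 4, lam=[0.25] * 4, mu=10,
                                k=100, alpha=0.3, simulate_period=one_period)
```

As the estimates p̂_i fall from their optimistic initial value of 1 toward the true clickthru rates, the estimated cost of the prefix shrinks and the chosen index ℓ_t grows, exactly the self-correcting behavior discussed below.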

In the above algorithm, p^t_i represents our estimate, at the end of t periods, of the clickthru probability of the ad associated with keyword i. Note that as long as the ad associated with keyword i has not received any impressions (i.e., we have no data on its clickthru probability), we set p^t_i to 1. Once the ad receives at least one impression, we set p^t_i to the empirical average clickthru rate.

For ease of exposition, let us introduce the following notation that will be used throughout the paper. Let I_U be defined by

I_U = max { ℓ : µ Σ_{u=1}^{ℓ} c_u λ_u p_u ≤ 1 − 1/k − 1/k^{(1−α)/3} }.

Observe that I_U is the same as the index of the prefix we would use in the algorithm of Theorem 1 if we knew the probabilities p_i. We will show below that we can state the convergence of our algorithm in terms of I_U rather than N, the total number of keywords. Intuitively speaking, it does not make sense for us to choose prefixes whose index is larger than I_U too frequently, since this would cause us to exceed our budget.

The main result of this section is stated in the following theorem.

Theorem 2. Let (G_t : t ≥ 1) denote the sequence of decisions generated by the ADAPTIVE BIDDING algorithm. Let ρ = E[min{S, µ}]/µ, and assume that 1/k^{1−α} ≤ 1 − 1/k − 1/k^{(1−α)/3}. Then, under Assumption 1, for any T ≥ 1,

[ Σ_{t=1}^{T} E[Z_{G_t}] ] / (T Z*) ≥ ρ (1 − 1/k^{(1+2α)/3} − 4/k^{(1−α)/3}) − (Σ_{t=1}^{T} γ_t)/T − M/T,

where

M = 32µI_U / [ kρ (1 − 1/k^{(1+2α)/3}) ] + 12 I_U² Σ_{i=1}^{I_U} [ λ_iµ / (1 − e^{−λ_iµ}) ] / k^{(7−4α)/3}
≤ 32µI_U / [ kρ (1 − 1/∛k) ] + 12 I_U² Σ_{i=1}^{I_U} [ λ_iµ / (1 − e^{−λ_iµ}) ] / k.

Before we proceed to the proof of the theorem, let us compare the performance bound of our adaptive bidding algorithm with those obtained from the UCB1 and EXP3 algorithms for the multi-armed bandit problem. It follows from the discussion in Section 4.1 and from Theorem 1 that, under the algorithms UCB1 and EXP3, we have the following performance bounds: let κ ≡ ρ (1 − 1/k^{(1+2α)/3} − 2/k^{(1−α)/3}); then

[ Σ_{t=1}^{T} Z_t(UCB1) ] / (T Z*) ≥ κ − L (α_N log T + β_N) / (T Z*),

[ Σ_{t=1}^{T} Z_t(EXP3) ] / (T Z*) ≥ κ − 3 (L N log N)^{1/3} / ( (2T)^{1/3} Z* ),

where L denotes the maximum possible profit in any one period.

When we compare the performance of UCB1 and EXP3 with the result of Theorem 2, we observe a slight decrease in the guarantee of our algorithm, from 2/k^{(1−α)/3} to 4/k^{(1−α)/3}. However, the convergence rate of our algorithm is superior to that of both UCB1 and EXP3, as indicated below.

(a) Unlike its counterparts, the convergence rate of our algorithm in Theorem 2 is independent of the number of keywords N. Instead, it depends only on I_U, which reflects the maximum number of keywords (chosen based on the prefix ordering) whose expected cost does not exceed a 1 − 1/k − 1/k^{(1−α)/3} fraction of the budget. In practice, I_U should be significantly smaller than N.

(b) Although the convergence rate of our algorithm depends on the expected number of search queries µ in any given period, this quantity should be comparable to the constant L (the maximum profit that can be obtained in any one period) appearing in the convergence rates of UCB1 and EXP3 above. To see this, note that by the definition of L, we have that

L ≥ E[ Z_{{1,...,I_U}} ] ≥ µρ (1 − 1/k^{(1+2α)/3}) Σ_{i=1}^{I_U} π_i λ_i p_i,

where the last inequality follows from the proof of Theorem 1. Moreover, the convergence rate of our algorithm depends on µ through the product µI_U. By definition, the value of I_U decreases as µ increases, suggesting that the convergence rate of our algorithm should be insensitive to increases in the parameter µ.

(c) With millions of available keywords, the arrival rate λ_iµ of some keywords can be extremely small. The convergence rate of our algorithm, however, is insensitive to small arrival rates, because the expression λ_iµ/(1 − e^{−λ_iµ}) remains bounded, converging to 1 as λ_iµ tends to 0.

(d) We should also point out that, if we set γ_t = 1/t² for all t ≥ 1, then Σ_{t=1}^{∞} γ_t < ∞. This implies that the average reward generated by our algorithm over T time periods converges to near-optimal profits at a rate of 1/T, which is faster than the convergence rates of the other two algorithms.

(e) In their seminal paper, Lai and Robbins [16] established a lower bound on the minimum regret of any adaptive strategy, showing that for any non-anticipatory strategy φ,

Σ_{t=1}^{T} ( Z* − E[Z_t(φ)] ) = Ω(log T),

where E[Z_t(φ)] denotes the expected profit in period t under the strategy φ. This result shows that the total expected profit over T periods of any adaptive strategy must differ from the optimal expected profit by at least Ω(log T); thus, the average expected reward per time period cannot converge to the optimal expected profit faster than O((log T)/T). We should note that our result in Theorem 2 is consistent with this lower bound. According to Theorem 2, we have that

Σ_{t=1}^{T} ( Z* − E[Z_{G_t}] ) ≤ νZ*T + Z* ( Σ_{t=1}^{T} γ_t + M ),

where ν = 1 − ρ (1 − 1/k^{(1+2α)/3} − 4/k^{(1−α)/3}). Clearly, the upper bound on the right-hand side of the above inequality increases linearly with T. Intuitively speaking, the average expected profit under our algorithm converges to the optimal expected profit under the prefix ordering, which serves only as an approximation to the optimal expected profit (over all subsets of keywords).

Finally, in addition to its nicer theoretical properties, we will further show in Section 5, through simulations on many large problem instances, that our adaptive bidding algorithm outperforms both UCB1 and EXP3, increasing profits by an average of 7%.

4.3 Sketch of the Proof of Theorem 2

We provide a sketch of the proof of Theorem 2. For more details, the reader is referred to Appendix G. In Lemma 8 in Appendix G, we establish an upper bound on the probability that ℓ_t > I_U, showing that for all t > 1, P{ℓ_t > I_U} ≤ 1/k^{(1−α)/3}. For any T ≥ 1, let the random variable J_T ⊆ {1, 2, . . . , T} denote the set of time periods when the index of the last keyword chosen based on the prefix ordering is less than or equal to I_U, i.e., J_T = {t ≤ T : F_t = 1, ℓ_t ≤ I_U}. Note that when F_t = 1, the algorithm chooses the decision based on the prefix ordering, i.e., g_t = ℓ_t. The result of Lemma 8 shows that J_T contains a large fraction of the time periods, enabling us to restrict our attention to time periods in J_T.

Let ξ = 1 − 1/k^{(1+2α)/3}. For any keyword i, let

ζ_{i,T} = E[ Σ_{t∈J_T : g_t ≥ i} ( p_i − p^{t−1}_i ) ]

denote the expected cumulative error, over time periods in J_T, between our estimate of the clickthru probability for keyword i and its true value. Lemma 9 (in Appendix G) then relates the performance of our algorithm to the accuracy of our estimates of the clickthru probabilities associated with keywords 1, . . . , I_U during time periods in J_T, showing that for any T ≥ 1,

[ Σ_{t=1}^{T} E[Z_{G_t}] ] / (T Z*) ≥ ρ ( ξ − 3/k^{(1−α)/3} ) − (Σ_{t=1}^{T} γ_t)/T − ρξ (µ/T) Σ_{i=1}^{I_U} c_i λ_i ζ_{i,T}.

In Lemma 10 in Appendix G, we then show that, on average, the cumulative errors over time periods in J_T do not increase with T. In fact, they are bounded by a small constant plus a term that vanishes as T increases, i.e., for any T ≥ 1 and 0 < δ < 1,

(µ/T) Σ_{i=1}^{I_U} c_i λ_i ζ_{i,T} ≤ 1/k^{(1−α)/3} + (1/T) [ 8µI_U / (δ²kρ²ξ²) + 6 I_U² Σ_{i=1}^{I_U} [ λ_iµ / (1 − e^{−λ_iµ}) ] / ( k^{(7−4α)/3} (1 − δ) ρξ ) ].

The result of Theorem 2 follows from the last two inequalities by setting δ = 1/2. The intuition underlying the last inequality is the following. Consider any keyword i ≤ I_U. Suppose J_T consists of t_1 < . . . < t_w, and consider any t_{h+1} ∈ J_T. During the previous h periods t_1, . . . , t_h, the total expected number of queries associated with keyword i is λ_iµh. By the definition of J_T, the prefix chosen during these h periods has final index at most I_U. We can thus apply Lemma 5 to show that with probability at least ξ we have enough budget to display an ad for each of these λ_iµh queries for keyword i. Hence, the expected cumulative number of impressions that the ad for keyword i receives just before period t_{h+1} is at least Ω(ξλ_iµh). We can then apply a Chernoff bound to show that, at time t_{h+1}, the error between our empirical estimate of the probability and its true value is at most O(e^{−ξλ_iµh}), declining to zero exponentially in h. Thus, the total error across all time periods in J_T (summing over h) is finite and independent of T.

5. EXPERIMENTS

In the previous section, we developed an alternative algorithm that exploits the prefix ordering of the keywords. We showed that the average expected profit generated by the algorithm converges to near-optimal profits and that the convergence rate is independent of the number of keywords. In this section, we compare the performance of the algorithm against the UCB1 and EXP3 algorithms for multi-armed bandit problems.

We ran two experiments: SMALL and LARGE. Each experiment has 100 randomly generated problem instances. In the SMALL experiment, each problem instance has 8,000 keywords, 200 time periods, a $400 budget per period, and a number of search queries per time period that follows a Poisson distribution with a mean of 40,000. In the LARGE experiment, each problem instance has 50,000 keywords, 200 time periods, a $1,000 budget per period, and a number of search queries per time period that follows a Poisson distribution with a mean of 150,000.¹

In both sets of experiments, for each problem instance, the cost-per-click of each keyword is chosen uniformly at random from the interval [0.1, 0.3) and the profit-per-click of each keyword is chosen uniformly at random from the interval [0, 1). The clickthru rate associated with each keyword is a random number from [0.0, 0.2). To generate the probability that a search query will correspond to each keyword, we assign random numbers (chosen independently and uniformly from [0, 1)) to all keywords and normalize them.

¹Since the mean number of search queries µ is very large, to speed up the simulation, we approximate the Poisson distribution with a Gaussian distribution with mean µ and standard deviation √µ.
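The instance generation just described can be sketched as follows; the function name and return convention are ours:

```python
import random

def random_instance(n_keywords, seed=None):
    """Generate one problem instance following the experimental setup:
    CPC ~ U[0.1, 0.3), profit-per-click ~ U[0, 1), clickthru ~ U[0, 0.2),
    and query probabilities drawn uniformly at random, then normalized."""
    rng = random.Random(seed)
    c = [rng.uniform(0.1, 0.3) for _ in range(n_keywords)]   # cost-per-click
    pi = [rng.random() for _ in range(n_keywords)]           # profit-per-click
    p = [0.2 * rng.random() for _ in range(n_keywords)]      # clickthru rate
    raw = [rng.random() for _ in range(n_keywords)]
    s = sum(raw)
    lam = [w / s for w in raw]                               # query probabilities
    return c, pi, p, lam

c, pi, p, lam = random_instance(8000, seed=0)
```

Note that the paper's analysis normalizes the budget to 1 and costs to at most 1/k; this sketch stays on the experiments' dollar scale, so a separate rescaling step would be needed before applying the bounds.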

For each problem instance, we compare our algorithm with EXP3² and UCB1³, two standard algorithms for the multi-armed bandit problem (see Section 4.1 for more details). For our prefix-based algorithm, we set the randomization probability to γ_t = 1/t² for all t ≥ 1.

Figures 2(a) and 2(b) show the simulation results for the SMALL experiment. Figure 2(a) shows the average profit over time for all three algorithms, averaged over all 100 problem instances in the SMALL experiment. The dashed lines above and below the solid lines correspond to 95% confidence intervals. As seen from the figure, our prefix-based algorithm outperforms both the UCB1 and EXP3 algorithms, yielding an increase of about 7% in average profits. Figure 2(b) shows the distribution over the 100 problem instances (in the SMALL experiment) of the relative difference between the linear programming upper bound (Lemma 3) and the profit under our algorithm. As seen from the figure, the profit generated by our algorithm differs from the linear programming bound by about 30%. Since the optimal solution of the linear program provides an upper bound on the optimal expected profit, this result implies that the average profit generated by our algorithm differs from the optimal by at most 30%.

Note that for the SMALL experiment, the value of k is about 1200 and the value of α is around 0.33 (see Assumption 1 for more details). The bound of Theorem 2 implies that for these values our algorithm should converge to at least 0.16 of the linear programming bound, but our simulation results show that our algorithm performs much better than this. As indicated by Theorems 1 and 2, as the value of k increases and the value of α decreases, the average profit should approach the optimal. This prediction is confirmed by the simulation results for the LARGE experiment, shown in Figures 2(c) and 2(d). For the LARGE experiment, the values of k and α are around 3,333 and 0.22, respectively. For these values, the bound of Theorem 2 implies that our algorithm should converge to at least 0.61 of the linear programming bound, and again our simulations do better than this. In Figure 2(c), we see that the average profit generated by our algorithm is about 20% higher than the profits under the UCB1 and EXP3 algorithms. In addition, as shown in Figure 2(d), the average profit differs from the optimal by at most 20%.

Appendix F provides additional discussion of the impact of different choices of the randomization parameters γ_t and of the initialization of the estimated clickthru probabilities.

6. MODEL DISCUSSIONS AND EXTENSIONS

In this section, we provide additional discussion of the strengths and weaknesses of our formulation, along with possible extensions of our model. First, in most search-based advertising services, changing the bid price of a keyword alters the position of the ad on the search result page.

²For EXP3, we tested three different settings for the "exploration probability": 1%, 5%, and 10%. All three settings exhibited similar performance, so we set the exploration probability to 5%.
³The UCB1 algorithm requires us to try each of the N prefix sets once during the first N periods. We choose these sets by random sampling from among those that have not been tried.

[Figure 2 appears here. Panels: (a) running average profit over 200 time periods under UCB1, EXP3, and the prefix-based adaptive algorithm for the SMALL experiment (100 problem instances, 8K keywords, avg. 40K queries, $400 budget; dashed lines give 95% confidence intervals); (b) distribution of relative errors between the LP bound and the profit under the prefix algorithm for the SMALL experiment; (c) and (d) the corresponding plots for the LARGE experiment (100 problem instances, 50K keywords, avg. 150K queries, $1,000 budget).]

Figure 2: The top ((a) and (b)) and bottom ((c) and (d)) rows show the results for the SMALL and LARGE experiments, respectively. The figures in the left-hand column ((a) and (c)) compare the average profit over time under UCB1, EXP3, and our prefix-based algorithm. The figures in the right-hand column ((b) and (d)) show the relative difference between the linear programming bound and the average profit generated by our algorithm over 100 problem instances.

This work focuses primarily on identifying profitable subsets of keywords. Our formulation assumes that the bid price of each keyword remains constant and that the ad always appears in the same spot on the search result page. Incorporating the impact of the ad's position on the search result page would allow for a richer class of models; finding an appropriate model that enables the development of efficient computational methods remains an open question. Another open question is that of finding an appropriate model for competitive bidding on keywords. Our model assumes that the cost of a keyword is fixed; of course, these costs depend on what others bid for the same keywords. Incorporating competitive aspects into our model is an interesting and challenging research direction.

7. REFERENCES
[1] G. Aggarwal and J. D. Hartline, "Knapsack Auctions," Workshop on Sponsored Search Auctions, EC 2005.
[2] R. Agrawal, "The continuum-armed bandit problem," SIAM J. Control and Optimization, vol. 33, pp. 1926–1951, 1995.
[3] P. Auer, N. Cesa-Bianchi, P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Machine Learning, pp. 235–256, 2002.
[4] P. Auer, N. Cesa-Bianchi, Y. Freund, R. E. Schapire, "Gambling in a rigged casino: The adversarial multi-armed bandit problem," FOCS 1995.
[5] M.-F. Balcan, A. Blum, J. D. Hartline, Y. Mansour, "Sponsored Search Auction Design via Machine Learning," Workshop on Sponsored Search Auctions, EC 2005.
[6] J. Chen, D. Liu, A. B. Whinston, "Designing Share Structure in Auctions of Divisible Goods," Workshop on Sponsored Search Auctions, EC 2005.
[7] B. Dean, M. X. Goemans, and J. Vondrak, "Approximating the Stochastic Knapsack Problem: The Benefit of Adaptivity," FOCS 2004, pp. 208–217.
[8] E. Even-Dar, S. Mannor, and Y. Mansour, "PAC bounds for multi-armed bandit and Markov decision processes," Fifteenth Annual Conference on Computational Learning Theory, pp. 255–270, 2002.
[9] J. Goodman, "Pay-Per-Percentage of Impressions: An Advertising Method that is Highly Robust to Fraud," Workshop on Sponsored Search Auctions, EC 2005.
[10] R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan, "Optimization and approximation in deterministic sequencing and scheduling: A survey," Annals of Discrete Mathematics, 5:287–326, 1979.
[11] B. J. Jansen and M. Resnick, "Examining Searcher Perceptions of and Interactions with Sponsored Results," Workshop on Sponsored Search Auctions, EC 2005.
[12] B. Kitts, P. Laxminarayan, B. LeBlanc, R. Meech, "A Formal Analysis of Search Auctions Including Predictions on Click Fraud and Bidding Tactics," Workshop on Sponsored Search Auctions, EC 2005.
[13] R. Kleinberg, "Online Decision Problems with Large Strategy Sets," Ph.D. dissertation, MIT, 2005.
[14] R. Kleinberg, "Nearly tight bounds for the continuum-armed bandit problem," NIPS 2005.
[15] S. R. Kulkarni and G. Lugosi, "Finite-time lower bounds for the two-armed bandit problem," IEEE Trans. Aut. Control, pp. 711–714, 2000.
[16] T. L. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," Advances in Applied Mathematics, vol. 6, pp. 4–22, 1985.
[17] S. Mannor and J. N. Tsitsiklis, "Lower bounds on the sample complexity of exploration in the multiarmed bandit problem," Sixteenth Annual Conference on Computational Learning Theory, pp. 418–432, Springer, 2003.
[18] C. Meek, D. M. Chickering, D. B. Wilson, "Stochastic and Contingent-Payment Auctions," Workshop on Sponsored Search Auctions, EC 2005.
[19] A. Mehta, A. Saberi, U. Vazirani, and V. Vazirani, "AdWords and Generalized Online Matching," FOCS 2005.
[20] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge, 2005.
[21] D. C. Parkes and T. Sandholm, "Optimize-and-Dispatch Architecture for Expressive Ad Auctions," Workshop on Sponsored Search Auctions, EC 2005.
[22] D. Pollard, Convergence of Stochastic Processes, Springer-Verlag, 1994.
[23] S. Sahni, "Approximate Algorithms for the 0/1 Knapsack Problem," Journal of the ACM, 22:115–124, 1975.
[24] M. Zinkevich, "Online convex programming and generalized infinitesimal gradient ascent," Proceedings of the 20th International Conference on Machine Learning, pp. 928–936, 2003.

APPENDIX

A. FIGURES

Figure 3: Examples of ads shown under search-based advertising programs. The left and right figures show the results of searching for "cheap Europe trip" (on 10/18/05) on Google and Yahoo!, respectively. The links in the top (shaded) area and in the right-hand column of the search result page correspond to ads by companies participating in the search-based advertising programs.

(a) (b)

Figure 4: Figure (a) is a screen shot showing the estimated number of search queries on Yahoo! in September 2005 related to the keyword "cheap Europe trip". The data is available through the Keyword Selector Tool offered by Yahoo! at http://inventory.overture.com/d/searchinventory/suggestion/. Figure (b) shows the current bids (as of 10/18/05) for the same keyword under the Yahoo! Sponsored Search program. This information is publicly available through the Bid Tool offered by Yahoo! at http://uv.bidtool.overture.com/d/search/tools/bidtool/index.jhtml.

B. PROOF OF LEMMA 1

Proof. Recall that $B^A_r$ denotes the remaining account balance when the rth search query arrives, given that we bid on the keywords in A. This implies that the event $\{B^A_r \ge c_i\}$ depends only on the previous $r-1$ search queries; it does not depend on whether the consumer clicks on the ad associated with keyword i when the rth search query arrives, nor does it depend on

Figure 5: A distribution of the median bid price (median CPC in $, as a percentage of total keywords) under the Yahoo! Sponsored Search program for 842 travel-related keywords collected during a 2-day period from 10/17/05 to 10/18/05. A complete list of the 842 keywords used in this study can be found at http://www.orie.cornell.edu/∼paatrus/keywordlist.txt.

Figure 6: A distribution of the average number of daily searches on Yahoo! for the 842 travel-related keywords during September 2005.

what the rth search query corresponds to. Thus,

$$\begin{aligned}
\Pr\{Q_r = i,\ B^A_r \ge c_i,\ X_{ri} = 1 \mid S \ge r\}
&= \mathbb{E}\left[\mathbf{1}\{Q_r = i,\ B^A_r \ge c_i,\ X_{ri} = 1\} \mid S \ge r\right] \\
&= \mathbb{E}\left[\mathbf{1}\{B^A_r \ge c_i\} \mid S \ge r\right] \mathbb{E}\left[\mathbf{1}\{Q_r = i,\ X_{ri} = 1\} \mid S \ge r\right] \\
&= \lambda_i p_i\, \mathbb{E}\left[\mathbf{1}\{B^A_r \ge c_i\} \mid S \ge r\right] \\
&= \lambda_i p_i \Pr\{B^A_r \ge c_i \mid S \ge r\},
\end{aligned}$$

where the third equality follows from our assumption that the clickthrus of the ads are independent of the keyword corresponding to the rth search query, and both of these random variables are independent of the total number of arrivals S. The second equality in the lemma follows directly from the above result.

C. PROOF OF LEMMA 2

The proof of Lemma 2 makes use of the following bound on the tail of a Poisson random variable. Since the proof follows from a standard application of the Chernoff bound (cf. Theorem 4.5 in [20]), we omit the details.

Lemma 6. If S is a Poisson random variable with mean $\mu$, then for any $0 < \delta < 1$, $\Pr\{S \le (1-\delta)\mu\} \le e^{-\mu\delta^2/2}$.

Here is the proof of Lemma 2.

Proof. We have that

$$\sum_{r=1}^{\lfloor\mu\rfloor} \Pr\{S \ge r\} = \sum_{r=1}^{\lfloor\mu\rfloor} \left(1 - \Pr\{S < r\}\right) = \lfloor\mu\rfloor - \sum_{r=1}^{\lfloor\mu\rfloor} \Pr\{S \le r-1\}.$$

It follows from Lemma 6 that for any $1 \le r \le \lfloor\mu\rfloor$,

$$\Pr\{S \le r-1\} = \Pr\left\{S \le \frac{r-1}{\mu}\,\mu\right\} \le \exp\left\{-\frac{\mu}{2}\left(1 - \frac{r-1}{\mu}\right)^2\right\} = e^{-(1+\mu-r)^2/2\mu} \le e^{-(1+\lfloor\mu\rfloor-r)^2/2\mu},$$

which implies that

$$\sum_{r=1}^{\lfloor\mu\rfloor} \Pr\{S \le r-1\} \le \sum_{r=1}^{\lfloor\mu\rfloor} e^{-(1+\lfloor\mu\rfloor-r)^2/2\mu} = \sum_{r=1}^{\lfloor\mu\rfloor} e^{-r^2/2\mu} \le \int_0^\infty e^{-x^2/2\mu}\,dx = \frac{\sqrt{2\pi\mu}}{2}.$$

Therefore, we have that

$$\sum_{r=1}^{\lfloor\mu\rfloor} \Pr\{S \ge r\} \ge \lfloor\mu\rfloor - \frac{\sqrt{2\pi\mu}}{2} = \mu\left(\frac{\lfloor\mu\rfloor}{\mu} - \frac{\sqrt{2\pi}}{2\sqrt{\mu}}\right) \ge \mu\left(1 - \frac{1}{\mu} - \frac{\sqrt{2\pi}}{2\sqrt{\mu}}\right) \ge \mu\left(1 - \frac{3}{\sqrt{\mu}}\right),$$

where the last inequality follows from our assumption that $\mu > 1$, which implies that $1/\sqrt{\mu} > 1/\mu$, and the fact that $\sqrt{2\pi}/2 \le 2$.
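The Poisson tail bound of Lemma 6 is easy to sanity-check numerically. The sketch below (with illustrative values of $\mu$ and $\delta$, not parameters from the paper) samples Poisson variates via Knuth's multiplicative method and compares the empirical tail probability against $e^{-\mu\delta^2/2}$:

```python
import math
import random

def poisson(lam: float) -> int:
    """Sample a Poisson(lam) variate via Knuth's multiplicative method."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

random.seed(0)
mu, delta, trials = 20.0, 0.5, 200_000
threshold = (1 - delta) * mu
# Empirical estimate of Pr{S <= (1-delta)*mu}.
empirical = sum(poisson(mu) <= threshold for _ in range(trials)) / trials
bound = math.exp(-mu * delta**2 / 2)  # Lemma 6's Chernoff bound
assert empirical <= bound
```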

D. PROOF OF LEMMA 4

Proof. Since the second inequality follows directly from Lemma 3, it suffices to prove only the first inequality. Let m denote the index such that

$$\mu\sum_{i=1}^{m} c_i\lambda_i p_i \le 1 < \mu\sum_{i=1}^{m+1} c_i\lambda_i p_i.$$

Since $\pi_1/c_1 \ge \pi_2/c_2 \ge \cdots \ge \pi_n/c_n$, it follows from the definition of $Z_{LP}$ that

$$Z_{LP} = \mu\sum_{i=1}^{m}\pi_i\lambda_i p_i + \pi_{m+1}\lambda_{m+1}p_{m+1}\cdot\frac{1 - \mu\sum_{i=1}^{m} c_i\lambda_i p_i}{c_{m+1}\lambda_{m+1}p_{m+1}}.$$

It is easy to verify that $u < m$, since $c_i\lambda_i p_i\mu \le 1/k^{1-\alpha} \le 1/k^{(1-\alpha)/3}$ for all i by Assumption 1(c). Moreover, since $\pi_1/c_1 \ge \cdots \ge \pi_N/c_N$, we have that

$$\frac{\sum_{i=1}^{u}\pi_i\lambda_i p_i}{\sum_{i=1}^{u}c_i\lambda_i p_i} \ge \frac{\sum_{i=u+1}^{m}\pi_i\lambda_i p_i}{\sum_{i=u+1}^{m}c_i\lambda_i p_i} \quad\text{and}\quad \frac{\sum_{i=1}^{u}\pi_i\lambda_i p_i}{\sum_{i=1}^{u}c_i\lambda_i p_i} \ge \frac{\pi_{m+1}\lambda_{m+1}p_{m+1}}{c_{m+1}\lambda_{m+1}p_{m+1}},$$

which implies that

$$\sum_{i=u+1}^{m}\pi_i\lambda_i p_i \le \left(\sum_{i=1}^{u}\pi_i\lambda_i p_i\right)\frac{\sum_{i=u+1}^{m}c_i\lambda_i p_i}{\sum_{i=1}^{u}c_i\lambda_i p_i} \quad\text{and}\quad \pi_{m+1}\lambda_{m+1}p_{m+1} \le \left(\sum_{i=1}^{u}\pi_i\lambda_i p_i\right)\frac{c_{m+1}\lambda_{m+1}p_{m+1}}{\sum_{i=1}^{u}c_i\lambda_i p_i}.$$

Putting everything together, we have

$$\begin{aligned}
Z_{LP} &= \mu\sum_{i=1}^{u}\pi_i\lambda_i p_i + \mu\sum_{i=u+1}^{m}\pi_i\lambda_i p_i + \pi_{m+1}\lambda_{m+1}p_{m+1}\cdot\frac{1-\mu\sum_{i=1}^{m}c_i\lambda_i p_i}{c_{m+1}\lambda_{m+1}p_{m+1}} \\
&\le \left(\mu\sum_{i=1}^{u}\pi_i\lambda_i p_i\right)\left(1 + \frac{\sum_{i=u+1}^{m}c_i\lambda_i p_i}{\sum_{i=1}^{u}c_i\lambda_i p_i} + \frac{1}{\mu}\cdot\frac{c_{m+1}\lambda_{m+1}p_{m+1}}{\sum_{i=1}^{u}c_i\lambda_i p_i}\cdot\frac{1-\mu\sum_{i=1}^{m}c_i\lambda_i p_i}{c_{m+1}\lambda_{m+1}p_{m+1}}\right) \\
&= \left(\mu\sum_{i=1}^{u}\pi_i\lambda_i p_i\right)\left(1 + \frac{\sum_{i=u+1}^{m}c_i\lambda_i p_i}{\sum_{i=1}^{u}c_i\lambda_i p_i} + \frac{1-\mu\sum_{i=1}^{m}c_i\lambda_i p_i}{\mu\sum_{i=1}^{u}c_i\lambda_i p_i}\right) \\
&= \left(\mu\sum_{i=1}^{u}\pi_i\lambda_i p_i\right)\frac{1}{\mu\sum_{i=1}^{u}c_i\lambda_i p_i},
\end{aligned}$$

which is the desired result.

E. PROOF OF LEMMA 5

The proof of Lemma 5 makes use of the following result.

Lemma 7. For any $A \subseteq \{1, 2, \ldots, N\}$,

$$\mathrm{Var}\left(\sum_{s=1}^{r-1}\sum_{i\in A} c_i \mathbf{1}(Q_s = i,\ X_{si} = 1)\ \middle|\ S \ge r\right) \le \frac{r-1}{k}\sum_{i\in A} c_i\lambda_i p_i.$$

Proof. Since the $Q_s$'s are independent and, for any i, the $X_{si}$'s are also independent, it suffices to show that for any $s < r$,

$$\mathrm{Var}\left(\sum_{i\in A} c_i \mathbf{1}(Q_s = i,\ X_{si} = 1)\ \middle|\ S \ge r\right) \le \frac{1}{k}\sum_{i\in A} c_i\lambda_i p_i.$$

Note that since $Q_s$ and $X_{si}$ are independent, we have $\mathbb{E}\left[\mathbf{1}(Q_s = i,\ X_{si} = 1) \mid S \ge r\right] = \lambda_i p_i$, which implies that

$$\begin{aligned}
\mathrm{Var}\left(\sum_{i\in A} c_i \mathbf{1}(Q_s = i, X_{si} = 1)\ \middle|\ S \ge r\right)
&= \mathbb{E}\left[\left(\sum_{i\in A} c_i\left(\mathbf{1}(Q_s = i, X_{si} = 1) - \lambda_i p_i\right)\right)^2\ \middle|\ S \ge r\right] \\
&= \sum_{i\in A} c_i^2\, \mathbb{E}\left[\left(\mathbf{1}(Q_s = i, X_{si} = 1) - \lambda_i p_i\right)^2\ \middle|\ S \ge r\right] \\
&\quad + \sum_{i,j\in A:\ i\ne j} c_i c_j\, \mathbb{E}\left[\left(\mathbf{1}(Q_s = i, X_{si} = 1) - \lambda_i p_i\right)\left(\mathbf{1}(Q_s = j, X_{sj} = 1) - \lambda_j p_j\right)\ \middle|\ S \ge r\right] \\
&= \sum_{i\in A} c_i^2 \lambda_i p_i (1 - \lambda_i p_i) - \sum_{i,j\in A:\ i\ne j} c_i c_j \lambda_i p_i \lambda_j p_j \\
&\le \frac{1}{k}\sum_{i\in A} c_i\lambda_i p_i,
\end{aligned}$$

where the third equality follows from the fact that $\mathbf{1}(Q_s = i)\mathbf{1}(Q_s = j) = 0$ for all $i \ne j$. The final inequality follows from Assumption 1(c), which states that $c_i \le 1/k$ for all i.
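The single-query variance bound in Lemma 7 can be checked by simulation. In the sketch below, the parameter values ($k$, the $c_i \le 1/k$, $\lambda_i$, $p_i$) are illustrative, not taken from the paper; each trial simulates which keyword in A (if any) the query matches and whether the ad is clicked, and the sample variance of the resulting spend is compared with $(1/k)\sum_{i\in A} c_i\lambda_i p_i$:

```python
import random

# Illustrative (hypothetical) parameters satisfying Assumption 1(c): c_i <= 1/k.
k = 10
c = [0.10, 0.08, 0.05]    # cost per click for the keywords in A
lam = [0.30, 0.20, 0.10]  # probability that a query matches keyword i
p = [0.40, 0.60, 0.20]    # clickthru probability of keyword i's ad

random.seed(1)
trials = 400_000
spends = []
for _ in range(trials):
    u = random.random()
    spend, acc = 0.0, 0.0
    for ci, li, pi in zip(c, lam, p):
        if acc <= u < acc + li:        # the query is for keyword i
            if random.random() < pi:   # the consumer clicks the ad
                spend = ci
            break
        acc += li
    spends.append(spend)

mean = sum(spends) / trials
var = sum((x - mean) ** 2 for x in spends) / trials
bound = sum(ci * li * pi for ci, li, pi in zip(c, lam, p)) / k
assert var <= bound  # Lemma 7, single-query case
```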

Here is the proof of Lemma 5.

Proof. Let $T^H_r$ denote the total amount of money that we have already spent when the rth search query arrives. Then, for any keyword $i \in H$,

$$\Pr\{B^H_r \ge c_i \mid S \ge r\} = \Pr\{T^H_r \le 1 - c_i \mid S \ge r\} = 1 - \Pr\{T^H_r > 1 - c_i \mid S \ge r\}.$$

We will focus on developing an upper bound for the expression $\Pr\{T^H_r > 1 - c_i \mid S \ge r\}$. By the definition of $T^H_r$, we know that, with probability 1,

$$T^H_r = \sum_{s=1}^{r-1}\sum_{i\in H} c_i \mathbf{1}\left(Q_s = i,\ B^H_s \ge c_i,\ X_{si} = 1\right) \le \sum_{s=1}^{r-1}\sum_{i\in H} c_i \mathbf{1}(Q_s = i,\ X_{si} = 1).$$

Moreover, by the hypothesis of the lemma,

$$\mathbb{E}\left[\sum_{s=1}^{r-1}\sum_{i\in H} c_i \mathbf{1}(Q_s = i, X_{si} = 1)\ \middle|\ S \ge r\right] = (r-1)\sum_{i\in H} c_i\lambda_i p_i \le \mu\sum_{i\in H} c_i\lambda_i p_i \le 1 - \frac{1}{k} - \frac{1}{k^{(1-\alpha)/3}},$$

and it follows from Chebyshev's inequality and the previous lemma that

$$\begin{aligned}
\Pr\{T^H_r > 1 - c_i \mid S \ge r\}
&\le \Pr\left\{\sum_{s=1}^{r-1}\sum_{i\in H} c_i \mathbf{1}(Q_s = i, X_{si} = 1) > 1 - c_i\ \middle|\ S \ge r\right\} \\
&\le \frac{\mathrm{Var}\left(\sum_{s=1}^{r-1}\sum_{i\in H} c_i \mathbf{1}(Q_s = i, X_{si} = 1)\ \middle|\ S \ge r\right)}{\left(1 - c_i - (r-1)\sum_{i\in H} c_i\lambda_i p_i\right)^2} \\
&\le \frac{\frac{r-1}{k}\sum_{i\in H} c_i\lambda_i p_i}{\left(1 - c_i - (r-1)\sum_{i\in H} c_i\lambda_i p_i\right)^2} \le \frac{\frac{\mu}{k}\sum_{i\in H} c_i\lambda_i p_i}{\left(1 - \frac{1}{k} - \mu\sum_{i\in H} c_i\lambda_i p_i\right)^2} \\
&\le \frac{k^{(2-2\alpha)/3}}{k}\,\mu\sum_{i\in H} c_i\lambda_i p_i \le \frac{1}{k^{(1+2\alpha)/3}},
\end{aligned}$$

which is the desired result. Here the penultimate inequality uses the hypothesis $\mu\sum_{i\in H} c_i\lambda_i p_i \le 1 - 1/k - 1/k^{(1-\alpha)/3}$, so that the denominator is at least $1/k^{(2-2\alpha)/3}$, and the final inequality uses $\mu\sum_{i\in H} c_i\lambda_i p_i \le 1$.

F. IMPACT OF RANDOMIZATION AND INITIALIZATION

In this section, we explore the impact of different choices of randomization and initialization. We ran our algorithm on the 100 problem instances in the LARGE experiment using four different randomization and initialization settings:

(a) No randomization ($\gamma_t = 0$ for all t), with the initial estimate of the clickthru probability associated with the ad of each keyword set to zero ($\hat p^0_i = 0$ for all i).

(b) $1/t$ randomization ($\gamma_t = 1/t$ for all t), with the initial estimate of the clickthru probability associated with the ad of each keyword set to one ($\hat p^0_i = 1$ for all i).

(c) $1/t^2$ randomization ($\gamma_t = 1/t^2$ for all t), with the initial estimate of the clickthru probability associated with the ad of each keyword set to one ($\hat p^0_i = 1$ for all i).

(d) No randomization ($\gamma_t = 0$ for all t), with the initial estimate of the clickthru probability associated with the ad of each keyword set to one ($\hat p^0_i = 1$ for all i).

Figure 7 shows the average profit over time under each of the four parameter settings. Recall that our ADAPTIVE ALGORITHM sets the initial estimate of the clickthru probability to 1, corresponding to cases (b), (c), and (d). In this case, we clearly see the benefits of randomization: allowing for random decisions with small probability (either $\gamma_t = 1/t$ or $1/t^2$) yields significantly higher average profits. It is interesting to note that although setting $\gamma_t = 1/t$ yields slightly higher profits than $\gamma_t = 1/t^2$, the difference is quite small.

By setting the initial clickthru probability for each keyword to 1, we limit the number of keywords chosen in each time period. Initially, we select a small subset of keywords and gradually enlarge the set of keywords as we observe more data on impressions and clicks. The opposite extreme of this approach is to set the initial clickthru probability for each keyword to 0, implying that we select all keywords in the first time period. Although this setting yields the highest average profit, as seen from the top line in Figure 7, we do not have a formal proof of convergence under this setting.
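The four settings differ only in the exploration schedule $\gamma_t$ and the initial estimates $\hat p^0_i$. A minimal sketch of a period-t decision rule, under the assumption (hypothetical function and variable names) that an exploratory decision picks a uniformly random prefix length while exploitation uses the prefix computed from the current estimates:

```python
import random

def choose_prefix_length(t, greedy_length, n_keywords, schedule="1/t"):
    """Return the number of top-ranked keywords to bid on in period t.

    With probability gamma_t we make a random decision (here: a uniform
    prefix length); otherwise we use the prefix-ordering decision derived
    from the current clickthru estimates. The names and the uniform
    exploration rule are illustrative assumptions, not the paper's exact
    specification.
    """
    if schedule == "1/t":
        gamma_t = 1.0 / t
    elif schedule == "1/t^2":
        gamma_t = 1.0 / t**2
    else:  # "none": no randomization, as in settings (a) and (d)
        gamma_t = 0.0
    if random.random() < gamma_t:
        return random.randint(1, n_keywords)  # exploratory random decision
    return greedy_length                      # prefix based on estimates

random.seed(0)
lengths = [choose_prefix_length(t, greedy_length=5, n_keywords=100)
           for t in range(1, 201)]
```

With the "none" schedule the rule is fully greedy, which corresponds to settings (a) and (d) above.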

Figure 7: Impact of different randomization and initialization settings. The figure plots the average profit up to time t for the prefix-based algorithm in the LARGE experiment under the four settings: no randomization with initial probability estimates equal to zero; 1/t randomization with initial estimates equal to one; 1/t² randomization with initial estimates equal to one; and no randomization with initial estimates equal to one.

G. PROOF OF THEOREM 2

The proof of Theorem 2 depends on a series of lemmas. The first lemma establishes an upper bound on the probability that $\ell_t > I_U$. The proof appears in Appendix G.1.

Lemma 8. For any $t \ge 1$, $\Pr\{\ell_t > I_U\} \le 1/k^{(1-\alpha)/3}$.

The next lemma establishes a lower bound on the average total profit. The proof of this result appears in Appendix G.2.

Lemma 9. For any $T \ge 1$,

$$\frac{\sum_{t=1}^{T}\mathbb{E}[Z_{G_t}]}{TZ^*} \ge \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{3}{k^{(1-\alpha)/3}}\right) - \frac{\sum_{t=1}^{T}\gamma_t}{T} - \frac{\rho}{T}\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\mu\sum_{i=1}^{I_U} c_i\lambda_i\,\mathbb{E}\left[\sum_{t:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t \le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right].$$

We should note that if $J_T = \{t \le T : F_t = 1,\ \ell_t \le I_U\}$, then since the keywords are indexed starting from 1, we have that

$$\sum_{t:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t \le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right| = \sum_{t\in J_T:\ g_t\ge i}\left|p_i - \hat p^{\,t-1}_i\right|,$$

which is consistent with the notation in Section 4.3.

The next result establishes an upper bound on the accumulated difference between the empirical estimate $\hat p^{\,t-1}_i$ of the clickthru probability and its true value $p_i$, corresponding to the second term in the above lemma. The proof appears in Appendix G.3.

Lemma 10. For any $T \ge 1$ and $0 < \delta < 1$,

$$\mu\sum_{i=1}^{I_U} c_i\lambda_i\,\mathbb{E}\left[\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t \le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right] \le \frac{T}{k^{(1-\alpha)/3}} + \frac{8\mu I_U}{\delta^2 k\rho^2\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)^2} + \frac{6 I_U^2\sum_{i=1}^{I_U}\lambda_i\mu/\left(1 - e^{-\lambda_i\mu}\right)}{k^{(7-4\alpha)/3}(1-\delta)\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)}.$$

Finally, here is the proof of Theorem 2.

Proof. It follows from Lemmas 9 and 10 that, for any $0 < \delta < 1$,

$$\begin{aligned}
\frac{\sum_{t=1}^{T}\mathbb{E}[Z_{G_t}]}{TZ^*} &\ge \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{4}{k^{(1-\alpha)/3}}\right) - \frac{\sum_{t=1}^{T}\gamma_t}{T} \\
&\quad - \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\frac{1}{T}\left(\frac{8\mu I_U}{\delta^2 k\rho^2\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)^2}\right) \\
&\quad - \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\frac{1}{T}\left(\frac{6 I_U^2\sum_{i=1}^{I_U}\lambda_i\mu/\left(1 - e^{-\lambda_i\mu}\right)}{k^{(7-4\alpha)/3}(1-\delta)\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)}\right).
\end{aligned}$$

Setting $\delta = 1/2$, we get that

$$\frac{\sum_{t=1}^{T}\mathbb{E}[Z_{G_t}]}{TZ^*} \ge \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{4}{k^{(1-\alpha)/3}}\right) - \frac{\sum_{t=1}^{T}\gamma_t}{T} - \frac{M}{T},$$

where

$$M = \frac{32\mu I_U}{k\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)} + \frac{12 I_U^2\sum_{i=1}^{I_U}\lambda_i\mu/\left(1 - e^{-\lambda_i\mu}\right)}{k^{(7-4\alpha)/3}} \le \frac{32\mu I_U}{k\rho\left(1 - 1/\sqrt[3]{k}\right)} + \frac{12 I_U^2\sum_{i=1}^{I_U}\lambda_i\mu/\left(1 - e^{-\lambda_i\mu}\right)}{k},$$

where the inequality follows from the fact that $0 \le \alpha < 1$, which implies that $1 - 1/k^{(1+2\alpha)/3} \ge 1 - 1/\sqrt[3]{k}$ and $k^{(7-4\alpha)/3} \ge k$. This is the desired result.

G.1 Proof of Lemma 8

Proof. For any keyword u, let $Y_u(t)$ denote the cumulative total number of impressions that the ad associated with keyword u has received up to the end of period t. It follows from the definition of $\ell_t$ that

$$\begin{aligned}
\Pr\{\ell_t > I_U\} &= \Pr\left\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u \hat p^{\,t-1}_u \le 1 - \frac{1}{k} - \frac{2}{k^{(1-\alpha)/3}}\right\} \\
&= \Pr\left\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\left(\hat p^{\,t-1}_u - p_u\right) \le 1 - \frac{1}{k} - \frac{2}{k^{(1-\alpha)/3}} - \mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u\right\} \\
&= \Pr\left\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\left(p_u - \hat p^{\,t-1}_u\right) \ge \mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u - \left(1 - \frac{1}{k} - \frac{2}{k^{(1-\alpha)/3}}\right)\right\} \\
&\le \Pr\left\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\left(p_u - \hat p^{\,t-1}_u\right) > \frac{1}{k^{(1-\alpha)/3}}\right\} \\
&\le \Pr\left\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\left(p_u - \hat p^{\,t-1}_u\right)\mathbf{1}\left(Y_u(t-1)\ge 1\right) > \frac{1}{k^{(1-\alpha)/3}}\right\},
\end{aligned}$$

where the first inequality follows from the definition of $I_U$, which implies that $1 - 1/k - 1/k^{(1-\alpha)/3} < \mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u$, and therefore

$$\mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u - \left(1 - \frac{1}{k} - \frac{2}{k^{(1-\alpha)/3}}\right) > \frac{1}{k^{(1-\alpha)/3}}.$$

The final inequality follows from the fact that for any keyword u such that $Y_u(t-1) = 0$, we have $\hat p^{\,t-1}_u = 1$, which implies that $p_u - \hat p^{\,t-1}_u \le 0$.

By Markov's inequality, we have

$$\begin{aligned}
&\Pr\left\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\left(p_u - \hat p^{\,t-1}_u\right)\mathbf{1}\left(Y_u(t-1)\ge 1\right) > \frac{1}{k^{(1-\alpha)/3}}\right\} \le \frac{\mathbb{E}\left[\left\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\left(p_u - \hat p^{\,t-1}_u\right)\mathbf{1}\left(Y_u(t-1)\ge 1\right)\right\}^2\right]}{\left(1/k^{(1-\alpha)/3}\right)^2} \\
&\quad= k^{(2-2\alpha)/3}\sum_{u=1}^{I_U+1}\left(\mu c_u\lambda_u\right)^2\mathbb{E}\left[\left(p_u - \hat p^{\,t-1}_u\right)^2\mathbf{1}\left(Y_u(t-1)\ge 1\right)\right] \\
&\qquad + 2k^{(2-2\alpha)/3}\sum_{u<v}\mu^2 c_u c_v\lambda_u\lambda_v\,\mathbb{E}\left[\left(p_u - \hat p^{\,t-1}_u\right)\left(p_v - \hat p^{\,t-1}_v\right)\mathbf{1}\left(Y_u(t-1)\ge 1,\ Y_v(t-1)\ge 1\right)\right].
\end{aligned}$$

Now, consider any $1 \le u < v \le I_U + 1$:

$$\begin{aligned}
&\mathbb{E}\left[\left(p_u - \hat p^{\,t-1}_u\right)\left(p_v - \hat p^{\,t-1}_v\right)\mathbf{1}\left(Y_u(t-1)\ge 1,\ Y_v(t-1)\ge 1\right)\right] \\
&\quad= \sum_{y_u\ge 1}\sum_{y_v\ge 1}\Pr\{Y_u(t-1) = y_u,\ Y_v(t-1) = y_v\}\,\mathbb{E}\left[\left(p_u - \hat p^{\,t-1}_u\right)\left(p_v - \hat p^{\,t-1}_v\right)\ \middle|\ Y_u(t-1) = y_u,\ Y_v(t-1) = y_v\right] \\
&\quad= \sum_{y_u\ge 1}\sum_{y_v\ge 1}\Pr\{Y_u(t-1) = y_u,\ Y_v(t-1) = y_v\}\,\mathbb{E}\left[\left(\frac{1}{y_u}\sum_{s=1}^{y_u}\left(X_{su} - p_u\right)\right)\left(\frac{1}{y_v}\sum_{s=1}^{y_v}\left(X_{sv} - p_v\right)\right)\ \middle|\ Y_u(t-1) = y_u,\ Y_v(t-1) = y_v\right] = 0,
\end{aligned}$$

where $X_{su}$ is an indicator random variable indicating whether or not the consumer clicks on the ad associated with keyword u the sth time that the ad receives an impression. The final equality follows from the fact that $X_{su}$ and $X_{sv}$ are independent of $Y_u(t)$ and $Y_v(t)$; moreover, $X_{su}$ and $X_{tv}$ are independent for any s and t, and $\mathbb{E}[X_{su} - p_u] = \mathbb{E}[X_{sv} - p_v] = 0$ for all s.

In addition,

$$\begin{aligned}
\mathbb{E}\left[\left(p_u - \hat p^{\,t-1}_u\right)^2\mathbf{1}\left(Y_u(t-1)\ge 1\right)\right]
&= \sum_{y_u\ge 1}\Pr\{Y_u(t-1) = y_u\}\,\mathbb{E}\left[\left(p_u - \hat p^{\,t-1}_u\right)^2\ \middle|\ Y_u(t-1) = y_u\right] \\
&= \sum_{y_u\ge 1}\Pr\{Y_u(t-1) = y_u\}\,\mathbb{E}\left[\left(\frac{1}{y_u}\sum_{s=1}^{y_u}\left(X_{su} - p_u\right)\right)^2\ \middle|\ Y_u(t-1) = y_u\right] \\
&= \sum_{y_u\ge 1}\Pr\{Y_u(t-1) = y_u\}\,\mathbb{E}\left[\frac{1}{y_u^2}\sum_{s=1}^{y_u}\left(X_{su} - p_u\right)^2\ \middle|\ Y_u(t-1) = y_u\right] \\
&= \sum_{y_u\ge 1}\Pr\{Y_u(t-1) = y_u\}\,\frac{p_u(1-p_u)}{y_u} \le p_u,
\end{aligned}$$

where the third equality follows from the fact that the $X_{su}$'s are independent and identically distributed Bernoulli random variables with parameter $p_u$, independent of $Y_u(t)$, with $\mathbb{E}[X_{su} - p_u] = 0$ for all s, and the fourth equality follows from the fact that each $X_{su}$ is a Bernoulli random variable with parameter $p_u$.

Putting everything together, we have that

$$\Pr\{\ell_t > I_U\} \le k^{(2-2\alpha)/3}\sum_{u=1}^{I_U+1}\left(\mu c_u\lambda_u\right)^2 p_u \le \frac{k^{(2-2\alpha)/3}}{k^{1-\alpha}}\sum_{u=1}^{I_U+1}\mu c_u\lambda_u p_u = \frac{1}{k^{(1-\alpha)/3}}\,\mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u \le \frac{1}{k^{(1-\alpha)/3}},$$

where the second inequality follows from Assumption 1(c), which implies that, for any keyword i, $\mu c_i\lambda_i \le 1/k^{1-\alpha}$. The final inequality follows from the definition of $I_U$, which implies that

$$\mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u \le 1 - \frac{1}{k} - \frac{1}{k^{(1-\alpha)/3}} + \frac{1}{k^{1-\alpha}} \le 1.$$

G.2 Proof of Lemma 9

Proof. Recall that, for any $t \ge 1$, $G_t = \{1, \ldots, g_t\}$. Let $S_t$ denote the total number of search queries in period t, and let $B^{g_t}_r$ denote the remaining account balance in period t just before the arrival of the rth search query (in period t), assuming that we have bid on the set of keywords $\{1, 2, \ldots, g_t\}$ in period t. Also, let the random variable $Q^t_r$ denote the keyword associated with the rth search query in period t, and let $X_{tri}$ denote a Bernoulli random variable indicating whether or not the consumer clicks on the ad associated with keyword i at the arrival of the rth search query in period t. Then, we have that

$$\begin{aligned}
\sum_{t=1}^{T}\mathbb{E}[Z_{G_t}] &= \sum_{t=1}^{T}\mathbb{E}\left[\sum_{r=1}^{S_t}\sum_{i=1}^{g_t}\pi_i\mathbf{1}\left(Q^t_r = i,\ B^{g_t}_r \ge c_i,\ X_{tri} = 1\right)\right] \\
&= \sum_{t=1}^{T}\mathbb{E}\left[\sum_{i=1}^{g_t}\sum_{r=1}^{\infty}\pi_i\mathbf{1}\left(Q^t_r = i,\ B^{g_t}_r \ge c_i,\ X_{tri} = 1,\ S_t\ge r\right)\right] \\
&= \sum_{t=1}^{T}\mathbb{E}\left[\mathbb{E}\left[\sum_{i=1}^{g_t}\sum_{r=1}^{\infty}\pi_i\mathbf{1}\left(Q^t_r = i,\ B^{g_t}_r \ge c_i,\ X_{tri} = 1,\ S_t\ge r\right)\ \middle|\ g_t\right]\right] \\
&= \sum_{t=1}^{T}\mathbb{E}\left[\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\infty}\mathbb{E}\left[\mathbf{1}\left(B^{g_t}_r \ge c_i,\ S_t\ge r\right)\ \middle|\ g_t\right]\right] \\
&= \sum_{t=1}^{T}\mathbb{E}\left[\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\infty}\mathbf{1}\left(B^{g_t}_r \ge c_i,\ S_t\ge r\right)\right],
\end{aligned}$$

where the penultimate equality follows from Lemma 1.

Recall that $F_t \in \{0, 1\}$ denotes a binary random variable indicating whether or not we choose a decision based on the prefix ordering in period t: $F_t = 1$ implies that $g_t$ equals $\ell_t$ (the decision based on the prefix ordering), and when $F_t = 0$, $g_t$ is a random decision. Note that, by definition, $F_t = 1$ with probability $1 - \gamma_t$. Then, we have that

$$\begin{aligned}
\sum_{t=1}^{T}\mathbb{E}[Z_{G_t}] &= \sum_{t=1}^{T}\mathbb{E}\left[\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\infty}\mathbf{1}\left(B^{g_t}_r \ge c_i,\ S_t\ge r\right)\right] \\
&\ge \sum_{t=1}^{T}\mathbb{E}\left[\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\infty}\mathbf{1}\left(B^{g_t}_r \ge c_i,\ S_t\ge r\right)\right] \\
&= \sum_{t=1}^{T}\mathbb{E}\left[\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\mathbb{E}\left[\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\infty}\mathbf{1}\left(B^{g_t}_r \ge c_i,\ S_t\ge r\right)\ \middle|\ \ell_t, F_t\right]\right] \\
&\ge \sum_{t=1}^{T}\mathbb{E}\left[\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\mathbb{E}\left[\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\lfloor\mu\rfloor}\mathbf{1}\left(B^{g_t}_r \ge c_i,\ S_t\ge r\right)\ \middle|\ \ell_t, F_t\right]\right] \\
&\ge \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\sum_{t=1}^{T}\mathbb{E}\left[\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\mu\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\right],
\end{aligned}$$

where the final inequality follows from the fact that when $F_t = 1$ and $\ell_t \le I_U$, we have $g_t = \ell_t$, which implies that

$$\begin{aligned}
\mathbb{E}\left[\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\lfloor\mu\rfloor}\mathbf{1}\left(B^{g_t}_r \ge c_i,\ S_t\ge r\right)\ \middle|\ \ell_t, F_t\right]
&= \sum_{i=1}^{\ell_t}\pi_i\lambda_i p_i\,\mathbb{E}\left[\sum_{r=1}^{\lfloor\mu\rfloor}\mathbf{1}\left(B^{\ell_t}_r \ge c_i,\ S_t\ge r\right)\ \middle|\ \ell_t, F_t\right] \\
&= \sum_{i=1}^{\ell_t}\pi_i\lambda_i p_i\sum_{r=1}^{\lfloor\mu\rfloor}\Pr\left\{B^{\ell_t}_r \ge c_i\ \middle|\ \ell_t, F_t,\ S_t\ge r\right\}\Pr\left\{S_t\ge r\ \middle|\ \ell_t, F_t\right\} \\
&\ge \left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\sum_{i=1}^{\ell_t}\pi_i\lambda_i p_i\sum_{r=1}^{\lfloor\mu\rfloor}\Pr\left\{S_t\ge r\ \middle|\ \ell_t, F_t\right\} \\
&= \left(\sum_{r=1}^{\lfloor\mu\rfloor}\Pr\{S_t\ge r\}\right)\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\sum_{i=1}^{\ell_t}\pi_i\lambda_i p_i \\
&= \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\mu\sum_{i=1}^{g_t}\pi_i\lambda_i p_i,
\end{aligned}$$

where the inequality follows from Lemma 5 and the fact that, conditioned on $\ell_t$ and $S_t \ge r$, $B^{\ell_t}_r$ is independent of $F_t$. The penultimate equality follows from the fact that $S_t$ is independent of $\ell_t$ and $F_t$. The final equality follows from the observation that

$$\sum_{r=1}^{\lfloor\mu\rfloor}\Pr\{S_t\ge r\} = \mathbb{E}\left[\min\{S_t, \mu\}\right] = \rho\mu.$$

Next, note that if $F_t = 1$ and $\ell_t \le I_U$, then with probability 1, $g_t = \ell_t$ and, by the definition of $I_U$,

$$\mu\sum_{i=1}^{g_t} c_i\lambda_i p_i = \mu\sum_{i=1}^{\ell_t} c_i\lambda_i p_i \le 1 - \frac{1}{k} - \frac{1}{k^{(1-\alpha)/3}},$$

and it follows from Lemma 4 that

$$\sum_{i=1}^{\ell_t}\pi_i\lambda_i p_i \ge Z^*\sum_{i=1}^{\ell_t} c_i\lambda_i p_i.$$

Therefore,

$$\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\mu\sum_{i=1}^{g_t}\pi_i\lambda_i p_i \ge Z^*\,\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\mu\sum_{i=1}^{g_t} c_i\lambda_i p_i,$$

and hence

$$\sum_{t=1}^{T}\mathbb{E}[Z_{G_t}] \ge \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)Z^*\sum_{t=1}^{T}\mathbb{E}\left[\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\mu\sum_{i=1}^{g_t} c_i\lambda_i p_i\right].$$

Let $C^* = 1 - 1/k - 2/k^{(1-\alpha)/3} - 1/k^{1-\alpha}$. Note that by the definition of $\ell_t$, we have

$$\mu\sum_{i=1}^{\ell_t} c_i\lambda_i \hat p^{\,t-1}_i \le 1 - \frac{1}{k} - \frac{2}{k^{(1-\alpha)/3}} < \mu\sum_{i=1}^{\ell_t+1} c_i\lambda_i \hat p^{\,t-1}_i,$$

which implies that $\mu\sum_{i=1}^{\ell_t} c_i\lambda_i \hat p^{\,t-1}_i \ge C^*$, since it follows from Assumption 1(c) that $c_i\lambda_i\mu \le 1/k^{1-\alpha}$ for all i. Then we have the following inequalities:

$$\begin{aligned}
\sum_{t=1}^{T}\mathbb{E}[Z_{G_t}] &\ge \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)Z^*\sum_{t=1}^{T}\mathbb{E}\left[\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\mu\sum_{i=1}^{g_t} c_i\lambda_i p_i\right] \\
&= \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)Z^*\sum_{t=1}^{T}\mathbb{E}\left[\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\left\{\mu\sum_{i=1}^{g_t} c_i\lambda_i \hat p^{\,t-1}_i + \mu\sum_{i=1}^{g_t} c_i\lambda_i\left(p_i - \hat p^{\,t-1}_i\right)\right\}\right] \\
&\ge \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)Z^*\sum_{t=1}^{T}\mathbb{E}\left[\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\left\{C^* - \mu\sum_{i=1}^{g_t} c_i\lambda_i\left|p_i - \hat p^{\,t-1}_i\right|\right\}\right],
\end{aligned}$$

where the last inequality follows from the fact that when $F_t = 1$, $g_t = \ell_t$, so that $\mu\sum_{i=1}^{g_t} c_i\lambda_i\hat p^{\,t-1}_i \ge C^*$. Therefore,

$$\begin{aligned}
\sum_{t=1}^{T}\mathbb{E}[Z_{G_t}] &\ge \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)Z^* C^*\sum_{t=1}^{T}\mathbb{E}\left[\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\right] \\
&\quad - \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)Z^*\,\mathbb{E}\left[\sum_{t=1}^{T}\mathbf{1}\left(F_t = 1;\ \ell_t\le I_U\right)\sum_{i=1}^{g_t} c_i\lambda_i\mu\left|p_i - \hat p^{\,t-1}_i\right|\right] \\
&= \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)Z^* C^*\sum_{t=1}^{T}\Pr\{F_t = 1\}\Pr\{\ell_t\le I_U\} \\
&\quad - \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)Z^*\,\mathbb{E}\left[\sum_{t=1}^{T}\sum_{i=1}^{g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)} c_i\lambda_i\mu\left|p_i - \hat p^{\,t-1}_i\right|\right] \\
&\ge Z^* C^*\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\left(1 - \frac{1}{k^{(1-\alpha)/3}}\right)\sum_{t=1}^{T}(1 - \gamma_t) \\
&\quad - \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)Z^*\,\mathbb{E}\left[\sum_{i=1}^{\max_t g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)} c_i\lambda_i\mu\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right],
\end{aligned}$$

where the first equality follows from the fact that $\ell_t$ and $F_t$ are independent, the penultimate inequality follows from Lemma 8, and the final step exchanges the order of summation. Here we adopt the convention that $\sum_{i=1}^{0} c_i\lambda_i\mu\left|p_i - \hat p^{\,t-1}_i\right| = 0$.

Note that when $F_t = 1$, $g_t = \ell_t$, so that $g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U) \le I_U$. We therefore have

$$\mathbb{E}\left[\sum_{i=1}^{\max_t g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)} c_i\lambda_i\mu\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right] \le \mu\sum_{i=1}^{I_U} c_i\lambda_i\,\mathbb{E}\left[\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right].$$

Putting everything together, we have that

$$\begin{aligned}
\frac{\sum_{t=1}^{T}\mathbb{E}[Z_{G_t}]}{TZ^*} &\ge \rho C^*\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\left(1 - \frac{1}{k^{(1-\alpha)/3}}\right)\frac{1}{T}\sum_{t=1}^{T}(1 - \gamma_t) \\
&\quad - \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\frac{1}{T}\,\mu\sum_{i=1}^{I_U} c_i\lambda_i\,\mathbb{E}\left[\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right] \\
&\ge \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{3}{k^{(1-\alpha)/3}}\right)\frac{1}{T}\sum_{t=1}^{T}(1 - \gamma_t) - \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\frac{1}{T}\,\mu\sum_{i=1}^{I_U} c_i\lambda_i\,\mathbb{E}\left[\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right] \\
&\ge \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{3}{k^{(1-\alpha)/3}}\right) - \frac{\sum_{t=1}^{T}\gamma_t}{T} - \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\frac{1}{T}\,\mu\sum_{i=1}^{I_U} c_i\lambda_i\,\mathbb{E}\left[\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right],
\end{aligned}$$

which is the desired result. The second inequality above follows from the fact that

$$C^*\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\left(1 - \frac{1}{k^{(1-\alpha)/3}}\right) \ge 1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{3}{k^{(1-\alpha)/3}}.$$

To verify this, expand the product:

$$\begin{aligned}
&C^*\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\left(1 - \frac{1}{k^{(1-\alpha)/3}}\right) \\
&\quad= \left(1 - \frac{1}{k} - \frac{2}{k^{(1-\alpha)/3}} - \frac{1}{k^{1-\alpha}}\right)\left(1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{1}{k^{(1-\alpha)/3}} + \frac{1}{k^{(2+\alpha)/3}}\right) \\
&\quad= \left(1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{1}{k^{(1-\alpha)/3}} + \frac{1}{k^{(2+\alpha)/3}}\right) + \left(-\frac{1}{k} + \frac{1}{k^{(4+2\alpha)/3}} + \frac{1}{k^{(4-\alpha)/3}} - \frac{1}{k^{(5+\alpha)/3}}\right) \\
&\qquad + \left(-\frac{2}{k^{(1-\alpha)/3}} + \frac{2}{k^{(2+\alpha)/3}} + \frac{2}{k^{(2-2\alpha)/3}} - \frac{2}{k}\right) + \left(-\frac{1}{k^{1-\alpha}} + \frac{1}{k^{(4-\alpha)/3}} + \frac{1}{k^{(4-4\alpha)/3}} - \frac{1}{k^{(5-2\alpha)/3}}\right) \\
&\quad= \left(1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{3}{k^{(1-\alpha)/3}}\right) + \left(\frac{3}{k^{(2+\alpha)/3}} - \frac{3}{k}\right) + \left(\frac{1}{k^{(4+2\alpha)/3}} - \frac{1}{k^{(5+\alpha)/3}}\right) \\
&\qquad + \left(\frac{2}{k^{(4-\alpha)/3}} - \frac{1}{k^{(5-2\alpha)/3}}\right) + \left(\frac{2}{k^{(2-2\alpha)/3}} - \frac{1}{k^{1-\alpha}}\right) + \frac{1}{k^{(4-4\alpha)/3}} \\
&\quad\ge 1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{3}{k^{(1-\alpha)/3}},
\end{aligned}$$

since $0 \le \alpha < 1$ implies that each parenthesized group in the last expression is nonnegative: $(2+\alpha)/3 \le 1$, $(4+2\alpha)/3 \le (5+\alpha)/3$, $(4-\alpha)/3 \le (5-2\alpha)/3$, and $(2-2\alpha)/3 \le 1-\alpha$.
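The inequality $C^*(1 - 1/k^{(1+2\alpha)/3})(1 - 1/k^{(1-\alpha)/3}) \ge 1 - 1/k^{(1+2\alpha)/3} - 3/k^{(1-\alpha)/3}$ used in the last step can also be spot-checked numerically; the sketch below evaluates both sides on an illustrative grid of $(k, \alpha)$ values:

```python
# Spot-check: C* (1 - a)(1 - b) >= 1 - a - 3b, with a = k^{-(1+2*alpha)/3},
# b = k^{-(1-alpha)/3}, and C* = 1 - 1/k - 2b - k^{-(1-alpha)}.
# The grid of (k, alpha) values is illustrative.
for k in (2, 5, 10, 100, 1000):
    for alpha in (0.0, 0.25, 0.5, 0.9, 0.99):
        a = k ** (-(1 + 2 * alpha) / 3)
        b = k ** (-(1 - alpha) / 3)
        c_star = 1 - 1 / k - 2 * b - k ** (-(1 - alpha))
        assert c_star * (1 - a) * (1 - b) >= 1 - a - 3 * b - 1e-12
```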

G.3 Proof of Lemma 10

The proof of Lemma 10 makes use of the following results. The first result provides an upper bound on the expected deviation of the sample average of Bernoulli random variables. Since the result follows from a standard application of the Chernoff bound (cf. [20]), we omit the proof.

Lemma 11. Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed Bernoulli random variables with parameter p, i.e., for all i, $\Pr\{X_i = 1\} = 1 - \Pr\{X_i = 0\} = p$. Then, for any $0 < \varepsilon < 1$,

$$\mathbb{E}\left|p - \frac{1}{n}\sum_{i=1}^{n} X_i\right| \le \varepsilon + 2e^{-n\varepsilon^2/3}.$$
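As a quick Monte Carlo sanity check of Lemma 11 (illustrative parameters and a fixed seed, not values from the paper):

```python
import math
import random

random.seed(0)
p, n, eps, trials = 0.3, 200, 0.1, 50_000

# Estimate E|p - sample mean| over n i.i.d. Bernoulli(p) draws.
total = 0.0
for _ in range(trials):
    sample_mean = sum(random.random() < p for _ in range(n)) / n
    total += abs(p - sample_mean)
empirical = total / trials
bound = eps + 2 * math.exp(-n * eps**2 / 3)  # Lemma 11's bound
assert empirical <= bound
```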

The next result is a restatement of Bernstein's inequality for real-valued random variables. For a proof, the reader is referred to Appendix B in [22].

Theorem 3. Let $Y_1, \ldots, Y_n$ be independent nonnegative random variables with $0 \le Y_i \le M$ for all i. For any i, let $\sigma_i^2 = \mathrm{Var}(Y_i)$. Let $Y = \sum_{i=1}^{n} Y_i$ and $\mu = \mathbb{E}[Y]$. If $V \ge \sum_{i=1}^{n}\sigma_i^2$, then for any $0 < \delta < 1$,

$$\Pr\{Y \le (1-\delta)\mu\} \le \exp\left\{-\frac{\frac{1}{2}\delta^2\mu^2}{V + \frac{1}{3}M\delta\mu}\right\}.$$

Using the fact that, if a, b, and c are nonnegative, the function $x \mapsto (ax^2)/(b + cx)$ is increasing in x for $x \ge 0$, the corollary below follows immediately from the theorem.

Corollary 1. Let $Y_1, \ldots, Y_n$ be independent nonnegative random variables with $0 \le Y_i \le M$ for all i. For any i, let $\sigma_i^2 = \mathrm{Var}(Y_i)$. Let $Y = \sum_{i=1}^{n} Y_i$ and $\mu = \mathbb{E}[Y]$. If $V \ge \sum_{i=1}^{n}\sigma_i^2$, then for any $0 < \delta < 1$ and $0 \le \eta \le \mu$,

$$\Pr\{Y \le (1-\delta)\eta\} \le \exp\left\{-\frac{\frac{1}{2}\delta^2\eta^2}{V + \frac{1}{3}M\delta\eta}\right\}.$$

The next lemma establishes an upper bound on the expected accumulated difference between $p_i$ and $\hat p^{\,t-1}_i$.

Lemma 12. For any keyword i and $0 < \varepsilon, \delta < 1$,

$$\mathbb{E}\left[\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right] \le \varepsilon T + \frac{1}{1 - \exp\left\{-\frac{\delta^2\lambda_i\rho^2\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)^2}{4}\right\}} + \frac{2}{1 - \exp\left\{-\frac{\varepsilon^2(1-\delta)\lambda_i\mu\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)}{3}\right\}}.$$

Proof. Let $0 < \varepsilon, \delta < 1$ be given. Note that when $F_t = 1$, we have $g_t = \ell_t$; therefore, it suffices to consider only keywords i such that $i \le I_U$. So, let any $i \le I_U$ be given. To prove the desired result, it suffices to show that the upper bound holds for any arbitrary set of periods $1 \le t_1 < t_2 < \cdots < t_w \le T$ such that

$$g_{t_h}\cdot\mathbf{1}\left(F_{t_h} = 1,\ \ell_{t_h}\le I_U\right) \ge i, \quad \forall h = 1, \ldots, w.$$

Note that during the time periods $t_1, \ldots, t_w$, we have selected keywords based on the prefix ordering (i.e., $F_{t_h} = 1$ and $g_{t_h} = \ell_{t_h}$ for all h), whose largest index $g_{t_h}$ includes i and is less than or equal to $I_U$. We will show that, for any $t_1, \ldots, t_w$ satisfying the above inequality, the conditional expectation

$$\mathbb{E}\left[\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\ \middle|\ \left(t_1, \ell_{t_1}, F_{t_1}\right), \ldots, \left(t_w, \ell_{t_w}, F_{t_w}\right)\right]$$

is bounded above by the expression given in Lemma 12. For ease of exposition, we will drop the reference to the conditioning information $(t_1, \ell_{t_1}, F_{t_1}), \ldots, (t_w, \ell_{t_w}, F_{t_w})$ when it is clear from the context.

Recall that $Y_i(t)$ denotes the total cumulative number of impressions that the ad associated with keyword i has received by the end of period t. Conditioned on $(t_1, \ell_{t_1}, F_{t_1}), \ldots, (t_w, \ell_{t_w}, F_{t_w})$, we have that

$$\begin{aligned}
\mathbb{E}\left[\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right]
&= \sum_{h=1}^{w}\mathbb{E}\left|\hat p^{\,t_h-1}_i - p_i\right| \\
&= \sum_{h=1}^{w}\mathbb{E}\left[\left|\hat p^{\,t_h-1}_i - p_i\right|\mathbf{1}\left\{Y_i(t_h - 1) \le (1-\delta)(h-1)\lambda_i\rho\mu\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)\right\}\right] \\
&\quad + \sum_{h=1}^{w}\mathbb{E}\left[\left|\hat p^{\,t_h-1}_i - p_i\right|\mathbf{1}\left\{Y_i(t_h - 1) > (1-\delta)(h-1)\lambda_i\rho\mu\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)\right\}\right] \\
&\le \sum_{h=1}^{w}\Pr\left\{Y_i(t_h - 1) \le (1-\delta)(h-1)\lambda_i\rho\mu\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)\right\} \\
&\quad + \sum_{h=1}^{w}\left(\varepsilon + 2\exp\left\{-\frac{\varepsilon^2(1-\delta)(h-1)\lambda_i\rho\mu\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)}{3}\right\}\right) \\
&\le \sum_{h=1}^{w}\Pr\left\{Y_i(t_h - 1) \le (1-\delta)(h-1)\lambda_i\rho\mu\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)\right\} + \varepsilon T + \frac{2}{1 - \exp\left\{-\frac{\varepsilon^2(1-\delta)\lambda_i\rho\mu\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)}{3}\right\}}, \quad (2)
\end{aligned}$$

where the first inequality follows from Lemma 11 and the final inequality follows from the formula for the geometric series. Thus, to establish the desired result, it suffices to show that

$$\sum_{h=1}^{w}\Pr\left\{Y_i(t_h - 1) \le (1-\delta)(h-1)\lambda_i\rho\mu\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)\right\} \le \frac{1}{1 - \exp\left\{-\frac{\delta^2\lambda_i\rho^2\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)^2}{4}\right\}}.$$

Recall that $B^{G_{t_h}}_r$ denotes the remaining account balance in period $t_h$ just before the arrival of the rth search query (in period $t_h$), assuming that we bid on the keywords in $G_{t_h} = \{1, \ldots, g_{t_h}\}$, and $Q^{t_h}_r$ denotes the keyword associated with the rth search query in period $t_h$. For any $1 \le h \le w$, let $U_h$ be defined by

$$U_h \equiv \sum_{r=1}^{\min\{S_{t_h},\lfloor\mu\rfloor\}}\mathbf{1}\left(Q^{t_h}_r = i,\ B^{G_{t_h}}_r \ge c_i\right) = \sum_{r=1}^{\infty}\mathbf{1}\left(Q^{t_h}_r = i,\ B^{G_{t_h}}_r \ge c_i,\ \min\{S_{t_h}, \lfloor\mu\rfloor\}\ge r\right) = \sum_{r=1}^{\lfloor\mu\rfloor}\mathbf{1}\left(Q^{t_h}_r = i,\ B^{G_{t_h}}_r \ge c_i,\ S_{t_h}\ge r\right).$$

The random variable $U_h$ provides a lower bound on the number of impressions that keyword i receives in period $t_h$. Thus, for any $1 \le h \le w$,

$$\sum_{v=1}^{h-1} U_v \le Y_i(t_h - 1).$$

Note that, conditioned on $(t_1, \ell_{t_1}, F_{t_1}), \ldots, (t_w, \ell_{t_w}, F_{t_w})$, the random variables $U_1, U_2, \ldots, U_w$ are independent, because the $S_t$'s are i.i.d. and the distribution of $U_h$ depends only on $G_{t_h}$, which is fixed given our conditioning information. We can thus apply the result of Corollary 1 to the random variable $\sum_{v=1}^{h-1} U_v$. To do so, we need to determine its expectation and variance. Conditioned on $t_1, \ldots, t_w$, we will show that for any $1 \le h \le w$,

$$\mathbb{E}\left[\sum_{v=1}^{h-1} U_v\right] \ge (h-1)\lambda_i\mu\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right) \quad\text{and}\quad \sum_{v=1}^{h-1}\mathrm{Var}[U_v] \le (h-1)\lambda_i\mu^2.$$

To prove this result, note that for any $1 \le h \le w$,

$$\mathbb{E}[U_h] = \sum_{r=1}^{\lfloor\mu\rfloor}\lambda_i\Pr\left\{B^{G_{t_h}}_r \ge c_i,\ S_{t_h}\ge r\right\} \ge \lambda_i\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\sum_{r=1}^{\lfloor\mu\rfloor}\Pr\{S_{t_h}\ge r\} = \lambda_i\,\mathbb{E}\left[\min\{S_{t_h}, \mu\}\right]\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right) = \lambda_i\mu\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right),$$

where the inequality follows from Lemma 5 and the fact that $G_{t_h} = \{1, 2, \ldots, g_{t_h}\}$ with $g_{t_h} \le I_U$ for all $1 \le h \le w$. Note that since $g_{t_h} \le I_U$ for all h, it follows from the definition of $I_U$ that

$$\mu\sum_{u=1}^{g_{t_h}} c_u\lambda_u p_u \le 1 - \frac{1}{k} - \frac{1}{k^{(1-\alpha)/3}},$$

which is the required hypothesis for the application of Lemma 5. The above result implies that, for any $1 \le h \le w$,

$$\mathbb{E}\left[\sum_{v=1}^{h-1} U_v\right] \ge (h-1)\lambda_i\mu\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right).$$

Moreover, conditioned on $(t_1, \ell_{t_1}, F_{t_1}), \ldots, (t_w, \ell_{t_w}, F_{t_w})$, for any $1 \le h \le w$,

$$\begin{aligned}
\mathrm{Var}[U_h] \le \mathbb{E}\left[U_h^2\right] &= \mathbb{E}\left[\left(\sum_{r=1}^{\min\{S_{t_h},\lfloor\mu\rfloor\}}\mathbf{1}\left(Q^{t_h}_r = i,\ B^{G_{t_h}}_r \ge c_i\right)\right)^2\right] \le \mathbb{E}\left[\left(\sum_{r=1}^{\lfloor\mu\rfloor}\mathbf{1}\left(Q^{t_h}_r = i\right)\right)^2\right] \\
&= \mathbb{E}\left[\sum_{r=1}^{\lfloor\mu\rfloor}\mathbf{1}\left(Q^{t_h}_r = i\right) + \sum_{1\le q,r\le\lfloor\mu\rfloor:\ q\ne r}\mathbf{1}\left(Q^{t_h}_r = i,\ Q^{t_h}_q = i\right)\right] = \lambda_i\lfloor\mu\rfloor + \lambda_i^2\lfloor\mu\rfloor\left(\lfloor\mu\rfloor - 1\right) \le \lambda_i\mu^2,
\end{aligned}$$

where the last equality follows from the fact that, for $r \ne q$, the arrivals of the rth and qth queries are independent of each other, and the final inequality follows from our assumption that $\mu > 1$. The above result implies that

$$\sum_{v=1}^{h-1}\mathrm{Var}[U_v] \le (h-1)\lambda_i\mu^2.$$

Note that $0 \le U_h \le \mu$ for all h and i. Therefore, by applying Corollary 1, we can conclude that for any $1 \le h \le w$,

$$\begin{aligned}
\Pr\left\{Y_i(t_h-1) \le (1-\delta)(h-1)\lambda_i\mu\rho\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)\right\}
&\le \Pr\left\{\sum_{v=1}^{h-1} U_v \le (1-\delta)(h-1)\lambda_i\mu\rho\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)\right\} \\
&\le \exp\left\{-\frac{\frac{1}{2}\left[\delta(h-1)\lambda_i\mu\rho\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)\right]^2}{(h-1)\lambda_i\mu^2 + \frac{1}{3}\mu\delta(h-1)\lambda_i\mu\rho\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)}\right\} \\
&= \exp\left\{-\frac{\frac{1}{2}\delta^2(h-1)\lambda_i\rho^2\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)^2}{1 + \frac{1}{3}\delta\rho\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)}\right\} \\
&\le \exp\left\{-\frac{\delta^2(h-1)\lambda_i\rho^2\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)^2}{4}\right\},
\end{aligned}$$

where the last inequality follows from the fact that $1 + \frac{1}{3}\delta\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right) \le 2$.

Hence,

$$\sum_{h=1}^{w}\Pr\left\{Y_i(t_h-1) \le (1-\delta)(h-1)\lambda_i\mu\rho\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)\right\} \le \sum_{h=1}^{w}\exp\left\{-\frac{\delta^2(h-1)\lambda_i\rho^2\left(1 - \tfrac{1}{k^{(1+2\alpha)/3}}\right)^2}{4}\right\} \le \frac{1}{1 - \exp\left\{-\frac{\delta^2\lambda_i\rho^2\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)^2}{4}\right\}},$$

where the last inequality follows from the standard bound for the geometric series. Since $(t_1, \ell_{t_1}, F_{t_1}), \ldots, (t_w, \ell_{t_w}, F_{t_w})$ are arbitrary, the desired result follows.

Finally, here is the proof of Lemma 10.

Proof. It follows from Lemma 12 that

$$\begin{aligned}
\sum_{i=1}^{I_U} c_i\lambda_i\mu\,\mathbb{E}\left[\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right]
&\le \sum_{i=1}^{I_U}\varepsilon c_i\lambda_i\mu T + \sum_{i=1}^{I_U}\frac{c_i\lambda_i\mu}{1 - \exp\left\{-\frac{\delta^2\lambda_i\rho^2\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)^2}{4}\right\}} + \sum_{i=1}^{I_U}\frac{2 c_i\lambda_i\mu}{1 - \exp\left\{-\frac{\varepsilon^2(1-\delta)\lambda_i\rho\mu\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)}{3}\right\}} \\
&\le \varepsilon T\sum_{i=1}^{I_U} c_i\lambda_i\mu + \frac{8\mu}{\delta^2\rho^2\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)^2}\sum_{i=1}^{I_U} c_i + \frac{6}{\varepsilon^2(1-\delta)\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)}\sum_{i=1}^{I_U}\frac{c_i\lambda_i\mu}{1 - e^{-\lambda_i\mu}},
\end{aligned}$$

where the last inequality follows from the fact that for any $0 \le x \le 1$ and any $d \ge 0$,

$$\frac{1}{1 - e^{-x}} \le \frac{2}{x} \quad\text{and}\quad \frac{1}{1 - e^{-dx}} \le \frac{1}{x\left(1 - e^{-d}\right)},$$

which implies that

$$\frac{1}{1 - \exp\left\{-\frac{\delta^2\lambda_i\rho^2\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)^2}{4}\right\}} \le \frac{8}{\delta^2\lambda_i\rho^2\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)^2} \quad\text{and}\quad \frac{1}{1 - \exp\left\{-\frac{\varepsilon^2(1-\delta)\lambda_i\mu\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)}{3}\right\}} \le \frac{3}{\varepsilon^2(1-\delta)\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\left(1 - e^{-\lambda_i\mu}\right)}.$$

Using our assumptions that $c_i \le 1/k$ and $\lambda_i\mu \le k^\alpha$ for all i, we can conclude that for any $0 < \varepsilon, \delta < 1$,

$$\sum_{i=1}^{I_U} c_i\lambda_i\mu\,\mathbb{E}\left[\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right] \le \frac{\varepsilon T I_U}{k^{1-\alpha}} + \frac{8\mu I_U}{\delta^2 k\rho^2\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)^2} + \frac{6\sum_{i=1}^{I_U}\lambda_i\mu/\left(1 - e^{-\lambda_i\mu}\right)}{\varepsilon^2 k(1-\delta)\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)}.$$

Now set $\varepsilon = k^{(2-2\alpha)/3}/I_U$. By the definition of $I_U$, it is easy to verify that $\varepsilon < 1$. Then, for any $0 < \delta < 1$,

$$\sum_{i=1}^{I_U} c_i\lambda_i\mu\,\mathbb{E}\left[\sum_{t\le T:\ g_t\cdot\mathbf{1}(F_t = 1;\ \ell_t\le I_U)\ge i}\left|p_i - \hat p^{\,t-1}_i\right|\right] \le \frac{T}{k^{(1-\alpha)/3}} + \frac{8\mu I_U}{\delta^2 k\rho^2\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)^2} + \frac{6 I_U^2\sum_{i=1}^{I_U}\lambda_i\mu/\left(1 - e^{-\lambda_i\mu}\right)}{k^{(7-4\alpha)/3}(1-\delta)\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)},$$

which is the desired result.