Page 1: Analyse, Target & Advertise - Reportsecowinetcourse.epfl.ch/.../Final_Report.pdf · The overview of a mobile advertising system is shown in Fig.1. Typically, we have a mobile terminal

Analyse, Target & Advertise
Privacy in mobile ads

Tomasz Trzcinski
Supervision: Prof. Jean-Pierre Hubaux, Nevena Vratonjic, Marcin Poturalski

Laboratory for Computer Communications and Applications
EPFL, Lausanne, Switzerland

Email: [email protected]

Abstract—Online advertising has become ubiquitous among many Internet services, such as email, websites and search engines. Furthermore, the rising market of mobile ads facilitates the proliferation of applications and mobile websites that incorporate ads as well. However, several significant privacy-related questions regarding this phenomenon have not been answered yet. For instance, there is no publicly accessible specification of how the major ad providers operate or how they enhance their targeting mechanisms using data gathered from users. In this report, we focus on developing a systematic and robust method for the analysis of retrieved mobile ads. In the analysis, we investigate what information is gathered by the ad providers, which part of this information is used for ad targeting, and what the discriminative power of each part of that information is. Overall, we design a robust, modular and comprehensive system to assess the influence of user characteristics on the targeting mechanism of mobile ads.

Index Terms—Advertising, Privacy, Mobile Ads, AdMob

I. INTRODUCTION

A rapid increase in the number of mobile devices capable of connecting to the Internet has recently aroused the interest of advertisers in capitalizing on the generated traffic. One of the proposed methods is mobile advertising, which is essentially based on the same principle as online advertising. Mobile ads, similarly to online ads, are distributed by providers that are contracted by the advertisers to present their ads in various forms: images, videos or plain text. In order to better distribute the contracted ads, providers cooperate with website publishers and application developers, so that they incorporate the ads into their services. In order to promote the contracted ads, the advertisers reward the publishers mainly for the number of impressions (the number of times the ad is shown) and mouse clicks. In fact, they employ complex billing mechanisms to combine these metrics, which include e.g. fraud detection methods.

The overview of a mobile advertising system is shown in Fig. 1. Typically, a mobile terminal requests a website from the publisher. Since the publisher has incorporated some ads in its website, it sends an ad request to the advertising system (e.g. AdMob [1]). The ad server processes the request and returns the HTML code with a URL of an ad. The publisher website can now be passed on to the terminal along with the ad.

One can expect that, in order to increase the number of user clicks, the displayed ads should be appropriately selected. Indeed, it has been reported that a successful targeting mechanism can increase the probability that the user will follow the displayed ad [2]. There exist several approaches to targeting: behavioural targeting, contextual targeting, targeting based on demographics, etc. In any of these approaches, successful targeting requires gathering additional information about the user, for instance web search history, behavioural patterns or mobile locations. As this implies wide access to rather private information, understandable concerns about user privacy have appeared.

Fig. 1: Architecture of a mobile advertising system.

In spite of all the potential threats to user privacy, there is no publicly accessible source of information on how the targeting mechanisms in the major advertising systems work, nor on what user data they gather. As a matter of fact, many of those advertising systems are governed by entities that have access to a wide selection of private user data (e.g. Google). This certainly increases the social and economic importance of this issue.

Moreover, the advertisers distribute the ads among the publishers using generic code snippets for different programming languages and platforms, e.g. HTTP, JavaScript, etc. This raises additional security concerns, since several methods of exploiting the HTTP protocol and injecting malicious JavaScript code have been reported. What is more, the ad request containing private data is transferred between the user and the publisher without authentication, increasing the potential threat of it being intercepted. This can clearly lead to a severe privacy breach.

As the latter issues have already been discussed in the literature [3], in this report we mainly focus on the privacy of user data from the advertising system point of view. More precisely, we attempt to determine which type of information is gathered and used by the advertising systems to target the ads. We propose a systematic approach for measuring a mobile advertising system. We design a set of novel experiments along with corresponding metrics that are used for quantitative assessment of the discriminative power of a particular information type. By addressing the main challenges of measuring advertising systems, we provide a complete toolbox for robust analysis of the ads. Finally, we present example results obtained for the AdMob advertising system.

II. RELATED WORK

There has been relatively little past work on the topic of mobile advertising. However, there are several related areas of research that may be of use in that matter. The most relevant of them can be divided into the following categories:

A. Ad selection algorithms

The complex problem of accurately targeting user interests and needs has recently gained wide attention from the research community. As this problem combines various fields, ranging from psychology to mathematics, several important notions have emerged in the literature, e.g. user data pattern mining [4] or comprehensive program event analysis [5]. However, contextual and behavioural targeting quickly became the most significant ones. While contextual targeting is based on the conceptual analysis of the content of an ad's surroundings (website, application), behavioural targeting uses the user's information and history of activity to learn their interests and accurately target the ads. As a matter of fact, it has been shown that user characteristics, such as gender, age or behavioural patterns, can highly influence the user's impression of a particular ad [2][6]. For instance, a young male who frequently visits sport-related websites will be more interested in a sport-related ad and, hence, will be more prone to click on this ad, increasing the number of mouse clicks. As this quickly became apparent to many researchers and entrepreneurs, various ad targeting algorithms based on user data were designed and applied. Using machine learning and data mining techniques, they substantially increased the efficiency of the targeting methods. At the same time, however, serious issues regarding the privacy of user data appeared and started a worldwide debate on that matter [7].

B. Privacy of user information

In order to guarantee successful ad targeting, various user characteristics must be known to the targeting algorithm. Assuming this data is gathered from several sources, e.g. social networking profiles or web search history, this leads to several important privacy-related questions: how is this data gathered and transmitted, is it stored safely, and what entities can have access to it? Some of these questions can be answered with reference to the specifications of the service providers (social network administrators, website publishers, etc.). Most of these questions, however, cannot be answered simply because providers keep their business operations private. Hence, the research done in the field of user information privacy is mainly focused on the detection of privacy leakage and the proposal of new methods to protect private data [8][9].

C. Advertising systems

Another research path is oriented towards the design and analysis of advertising systems as an entire multi-functional tool that gathers information about the users, selects the appropriate ad using a sophisticated targeting algorithm and presents it to the user [10]. Since most of the insight into the advertising system is restricted to in-house administrators, the system becomes a black box, which makes its analysis a complex task.

However, a few recent proposals have shed some light on that problem. One of the most interesting was presented by Guha et al. in [11], where they propose a user-oriented method of measuring the performance of an advertising system. Their approach is focused on the analysis of online advertising systems from the user perspective. Their measurement methodology provides a similarity metric that is used for the analysis of how websites use user profile information. It is robust to noise and performs relatively well with online ads. In fact, the methodology presented in [11] is complementary to the methodology described in this report. We also aim at a robust analysis of the ad targeting mechanisms. However, we propose to analyse this behaviour from the publisher side, i.e. using the ad requests sent from the publisher server. Furthermore, as we analyse mobile ads, we extend the range of the investigated parameters by those that are not applicable to ordinary online ads, such as current location coordinates or terminal type. Nevertheless, [11] provides essential insight into the problem of ad system analysis and we compare our results to those of Guha et al. in Section IV.

III. EXPERIMENTAL SETUP

In order to understand the behaviour of the mobile advertising system more deeply, we propose an experimental setup which is implemented from the website or application point of view rather than the user's. Our main focus is to analyse the retrieved mobile ads so that we can draw valuable conclusions about the type of user data that is being gathered by the advertising system and determine how it influences the contents of the presented ads. More precisely, we propose a publisher-side systematic approach to the problem of measuring the discriminative power of a particular data type. Our main contribution is a comprehensive modular system for the analysis of an advertising system.

A. Challenges

Due to the complexity of the system and the temporal characteristics of advertising campaigns, several challenges have to be faced while designing and performing the experiments. We describe them thoroughly and address each one of them so that the experiment results become more precise and more robust to measurement noise.

Fig. 2: Single ad having multiple instances: (a) Instance 1, (b) Instance 2, (c) Instance 3.

    // AdMob Publisher Code
    $admob_params = array(
      'PUBLISHER_ID' => 'a14cb325d33ef43',
      'ANALYTICS_ID' => 'your_analytics_site_id',
      'AD_REQUEST' => true,
      'ANALYTICS_REQUEST' => false,
      'TEST_MODE' => false,
      // optional parameters
      'OPTIONAL' => array());

    $params = array('rt=' . $rt,
      'z=' . ($sec + $usec),
      'u=' . urlencode($_SERVER['HTTP_USER_AGENT']),
      'i=' . urlencode($_SERVER['REMOTE_ADDR']),
      'p=' . urlencode("$protocol://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']),
      'v=' . urlencode('20081105-PHPCURL-acda0040bcdea222'));
    ?>

Fig. 3: Format of the HTML AdMob response.

1) Ads' selection algorithm unknown: As it is the core technology in the business model of any advertising company, the algorithm used to determine the appropriate ad to be displayed is not public. Thus, the only way to analyse the performance of the system is an empirical assessment of the incoming ads. As a consequence, the designed experiment must retrieve a sufficient number of ads to reduce the measurement noise and gain statistical relevance.

2) No information about the ads: One should also note that there is no information about the cardinality of the ad set. In other words, the number of all possible ads to be retrieved is unknown. Hence, in our experiments we deal with an incomplete information problem and we assume that only a subset of all ads is retrieved in one attempt. In order to extend the subset of retrieved ads so that it overlaps significantly with the entire set, we perform the ad retrieval several times.

3) User profile: User profiling done by major advertisers is a common way to exploit the information gathered from the user's web search and browsing history, etc. This is mainly done using appropriately configured cookies and it may significantly contribute to the targeting algorithm, as it provides a substantial amount of data about user behavioural patterns and interests. For instance, Google learns short-term and long-term user interests based on the above-mentioned data and automatically creates a user profile. It can also be accessed and modified by each user¹. While targeting the ads, Google frequently analyses this user profile and selects the presented ads accordingly.

The problem of privacy leakage in the case of cookie-based tracking has been discussed widely in the literature [8]. Since we are more interested in the privacy of user data in mobile advertising, our research is mostly focused on the parameters that cannot be recovered trivially from web search history, e.g. location, terminal type, etc. Thus, in our experiments we assume that there is no user profile available and that our system is memory-less.

4) Comparison: The main objective of our work is to assess the discriminative power of a particular user parameter. The simplest approach to that problem consists of creating two ad requests that differ in only one parameter (the remaining ones are identical) and comparing the two outputs obtained by launching these requests. To compare them, we first need to identify the ads of the same product so that they are treated as one and do not introduce additional noise. This is not a trivial task, since ads can have many different instances leading to the same product or service (see Fig. 2). One may expect that the advertising systems present the ad using its unique identifier. Unfortunately, this is not the case for AdMob.

Responding to the ad request, AdMob returns HTML code of the format shown in Fig. 3. In this generic example, we can see both an image and a text ad retrieved. This is the default AdMob setting, which may be changed so that only a text or only a graphical ad is displayed. As we can see, the other fields of the returned code contain links to the product site. Unfortunately, this is only a redirect URL and it does not provide a unique ad identifier, but only a unique ad impression identifier. In other words, although the user is redirected to the product website after clicking the ad, this website address is only revealed after the click. Since clicking on the ad to obtain the website address would amount to fraud, we cannot do that to classify the ads. The last field of the retrieved code contains the most important request parameters (request type, timestamp, cookie number, etc.).

¹ http://www.google.com/ads/preferences

To summarize, the HTML code returned by AdMob in response to the request does not provide any unique data that can be used to identify the same ad among many of its instances in a straightforward manner. Thus, we need to develop a method to identify and possibly cluster the ads of the same product.

5) Similarity assessment: After capturing a sufficient number of ads, we compare their distributions (histograms in our implementation) to verify their similarity. There are several different approaches that can be employed to compare distributions, such as a simple goodness-of-fit test, Pearson's chi-square test [12], the Jaccard similarity coefficient or the entropy-based Kullback-Leibler divergence. Therefore, we need to determine which of the possible similarity measures will perform accurately in our experimental setup.

B. System overview

Bearing in mind all the above-mentioned challenges, we propose a robust integrated system for the analysis of the retrieved mobile advertisements. Its main objective is to assess the discriminative power of a certain user parameter; its overview is shown in Fig. 4.

In order to quantify the influence of a certain parameter m, three ad requests are prepared: the first, where m is not set; the second, where m is equal to the first proposed value; and the third, where it is equal to the second proposed value. By comparing the outcomes of these requests we are able to determine the incremental discriminative power $d^m_i$ of the parameter m (comparing the outcome of the first and second request) and the differential discriminative power $d^m_d$ of m (comparing the outcome of the second and third request).

As discussed in the previous section, the number of retrieved ads should be sufficiently large to reduce the measurement noise and increase the statistical relevance of the result. From the initial experiments, we estimated the number of retrievals at N = 100. The ads are retrieved one after another from the AdMob advertising system by means of the PHP (cURL) POST method.

The ad request has been prepared using previously captured AdMob requests from an Android application and a designated website that we created for this purpose. After a detailed analysis of the requests captured with Wireshark², we selected the parameters that are necessary for the correct retrieval process. We have intentionally disabled the cookie parameter of the request, which is responsible for recording user activity for the purposes of user profiling. This way we approach the memory-less condition for the system and reduce the bias that stems from user profiling. Using the publisher code provided by AdMob, we have implemented the request module as a PHP script.

² http://www.wireshark.org/

Having received the ads, we process them so that the multiple instances of the same ad are treated as one. As a result, we obtain a distribution of ad occurrences over a unique set of ads. As mentioned earlier, the key challenge that we have to face at this stage is identifying the various instances of the same ad using the HTML code provided by AdMob and clustering them together. One should recall that the retrieved code does not contain any information unique to a certain ad (as opposed to an ad impression), and the final redirection website, which could serve as such information, is revealed only after clicking the ad. Taking all of the above into account, we propose a method of classification based on the text description that can be found in the retrieved HTML code.

We implement our classification module using AlchemyAPI³, a publicly available interface that provides a rich suite of content analysis and meta-data annotation tools. It uses machine learning algorithms and statistical language processing methods to extract the keywords from a given text.

After obtaining the keywords specific to each ad, we build a histogram of their occurrences among all the ads. Since one ad can be associated with more than one keyword, we normalise the contribution of each ad so that it does not depend on the number of keywords assigned.
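The normalisation step can be sketched as follows. This is a minimal illustration, not the report's implementation: it assumes that each ad carries a total weight of 1 that is split evenly over its distinct keywords, and that the result is rescaled to sum to 1. The function name `keyword_histogram` and the sample keywords are hypothetical.

```python
from collections import Counter


def keyword_histogram(ads_keywords):
    """Build a normalised keyword histogram from per-ad keyword lists.

    Assumption: every ad contributes a total weight of 1, split evenly over
    its distinct keywords, so an ad with many keywords does not dominate.
    """
    hist = Counter()
    for keywords in ads_keywords:
        unique = set(keywords)
        if not unique:
            continue  # an ad without extracted keywords contributes nothing
        share = 1.0 / len(unique)
        for keyword in unique:
            hist[keyword] += share
    total = sum(hist.values())
    if total == 0:
        return {}
    # Rescale so the histogram sums to 1 and can be treated as a distribution.
    return {keyword: weight / total for keyword, weight in hist.items()}


# Hypothetical keyword lists for three retrieved ads.
ads = [["football", "tickets"], ["football"], ["cars", "insurance", "loans"]]
hist = keyword_histogram(ads)
```

With this choice, the three-keyword ad and the one-keyword ad each account for one third of the total mass.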

When trying to determine the influence of a particular parameter, we have to compare the obtained histograms and quantify the differences between them. Since the ad retrieval can be interpreted as an information retrieval process, we refer to information theory in order to compute the similarity between two histograms. As is typical in information theory, two probability distributions P and Q can be compared using the Kullback-Leibler divergence:

$$D_{KL}(P \| Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}. \tag{1}$$

However, one can easily note that the Kullback-Leibler divergence is non-symmetric. We can address this problem by introducing the averaged sum of two complementary Kullback-Leibler divergences:

$$D_{JS}(P \| Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M) \tag{2}$$

where M is the average of distributions P and Q: $M = \frac{1}{2}(P + Q)$. This gives us the Jensen-Shannon divergence $D_{JS}$, which is symmetric and nonnegative. Additionally, when the logarithm is taken to base 2, it is a finite value that ranges from 0 (iff P = Q) to 1 (when the two distributions are completely different from each other, i.e. have no ads in common).
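Equations (1) and (2) can be sketched in a few lines of Python. The sketch assumes base-2 logarithms, histograms stored as dictionaries that map an ad (or keyword) to its probability, and the usual convention that terms with P(i) = 0 contribute nothing; the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in bits; terms with P(i) = 0 contribute 0 by convention.

    Assumes Q(i) > 0 wherever P(i) > 0, which always holds when Q is the
    mixture M used by the Jensen-Shannon divergence.
    """
    return sum(pi * math.log2(pi / q[key]) for key, pi in p.items() if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, following Eq. (2)."""
    keys = set(p) | set(q)
    # M is the pointwise average of P and Q over the union of their supports.
    m = {key: 0.5 * (p.get(key, 0.0) + q.get(key, 0.0)) for key in keys}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


identical = js_divergence({"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5})
disjoint = js_divergence({"a": 1.0}, {"b": 1.0})
```

Comparing against the mixture M rather than against Q directly avoids the division by zero that Eq. (1) would hit whenever an ad appears in only one of the two histograms.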

³ http://www.alchemyapi.com/

Fig. 4: Overview of the integrated system for mobile advertisements analysis.

As we are interested in determining the influence of a parameter on the retrieved ads' distribution, one can instantly see that the Jensen-Shannon divergence gives us a straightforward projection of the parameter's discriminative power onto the [0, 1] interval. Let p and q be the request parameters to be analysed. By changing the value of p we obtain two ad distributions P and P′ for which $D^p_{JS}(P \| P') = 1$ (P and P′ completely different). Similarly, by changing the value of q we obtain two other distributions Q and Q′ for which $D^q_{JS}(Q \| Q') = 0$ (Q and Q′ identical). Then, we can conclude that the discriminative power of the parameter p is higher than that of q. The corresponding metrics follow the same inequality: $D^p_{JS}(P \| P') > D^q_{JS}(Q \| Q')$. Thus, we can see that the Jensen-Shannon divergence can be used as a discriminative power measure.
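The two extreme values used in this argument can be checked directly from Eqs. (1) and (2). The short derivation below assumes base-2 logarithms, which the report does not state explicitly but which the [0, 1] range implies.

```latex
\begin{align*}
\text{Identical: } P = Q
  &\;\Rightarrow\; M = P, \quad
  D_{JS}(P \| Q) = \tfrac{1}{2} D_{KL}(P \| P) + \tfrac{1}{2} D_{KL}(P \| P) = 0. \\
\text{Disjoint: } P(i)\,Q(i) = 0 \ \forall i
  &\;\Rightarrow\; M(i) = \tfrac{1}{2} P(i) \text{ wherever } P(i) > 0, \\
D_{KL}(P \| M)
  &= \sum_i P(i) \log_2 \frac{P(i)}{\tfrac{1}{2} P(i)}
   = \log_2 2 \sum_i P(i) = 1, \\
D_{JS}(P \| Q)
  &= \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 1 = 1
  \quad \text{(by symmetry, } D_{KL}(Q \| M) = 1 \text{ as well)}.
\end{align*}
```

Any partial overlap between the two sets of ads therefore yields a value strictly between 0 and 1.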

Additionally, in order to verify our results, we compute the extended Jaccard index (also called cosine similarity):

$$JI(P \| Q) = \frac{\bar{P} \cdot \bar{Q}}{\|\bar{P}\| \cdot \|\bar{Q}\|} \tag{3}$$

where $\bar{P} = [w_{P,e}]$; $w_{P,e}$ is some non-zero weight if ad e exists in distribution P, or 0 if it does not.

This index enables us to compute the overlap of the two distributions. Its variation with non-zero weights calculated as the logarithm of the number of ad occurrences is reported to perform well in the analysis of online advertisements [11]. Inversely to the Jensen-Shannon divergence, a Jaccard index equal to 0 implies no overlap, and equal to 1 implies identical distributions. Thus, in order to obtain the discriminative power value, we modify the index so that $JI'(P \| Q) = 1 - JI(P \| Q)$.
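Eq. (3) and the modified index JI′ can be sketched as below. The weighting is an assumption: the sketch uses log(1 + n) for an ad seen n times, so that an ad seen exactly once still receives a non-zero weight; the exact form of the logarithmic weight is not given in the text. All function names are hypothetical.

```python
import math


def log_weights(counts):
    """Map raw occurrence counts to weights log(1 + n); absent ads are omitted (weight 0)."""
    return {ad: math.log1p(n) for ad, n in counts.items() if n > 0}


def cosine_similarity(p_counts, q_counts):
    """JI(P || Q) from Eq. (3), computed on log-weighted occurrence vectors."""
    wp, wq = log_weights(p_counts), log_weights(q_counts)
    dot = sum(w * wq.get(ad, 0.0) for ad, w in wp.items())
    norm_p = math.sqrt(sum(w * w for w in wp.values()))
    norm_q = math.sqrt(sum(w * w for w in wq.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an empty set of ads overlaps with nothing
    return dot / (norm_p * norm_q)


def jaccard_discriminative_power(p_counts, q_counts):
    """JI'(P || Q) = 1 - JI(P || Q): 0 for identical sets, 1 for no overlap."""
    return 1.0 - cosine_similarity(p_counts, q_counts)


# Hypothetical occurrence counts for two request variants.
power = jaccard_discriminative_power({"ad1": 2, "ad2": 1}, {"ad2": 1, "ad3": 4})
```

After the 1 − JI transformation, both measures point in the same direction: larger values mean that changing the parameter changed the served ads more.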

C. Parameters

There are plenty of parameters that may be used by advertising systems to correctly target the ads they are distributing:
• terminal type
• browser specification
• current location
• address
• age
• gender
• interests

The publisher or application developer may obtain this data from the user by various means, e.g. by an input form, profile registration, IP tracking, etc. However, it depends on the advertising system whether this data is going to be used in the targeting algorithm. In other words, the parameters sent in the ad request may or may not take part in the ad selection process.

TABLE I: AdMob request parameters with the assigned values.

Parameter   Description     Value1                    Value2
---------   -------------   -----------------------   -----------------------
u           User Agent      BlackBerry8703e           iPad
i           IP Address      192.14.14.14 (USA)        109.169.41.20 (UK)
d[coord]    Coordinates     177.563657, -32.324807    37.563657, -122.324807
d[pc]       Postal Code     94107                     31532
d[ac]       Area Code       650                       320
d[dob]      Date of Birth   1962.01.23                1986.11.06
d[gender]   Gender          m(ale)                    f(emale)
k           Keywords        "sports baseball"         "cars"

In order to verify which parameters are used in the AdMob advertising system and how they influence the returned ads, we have selected several AdMob request parameters⁴ and assigned each of them three values: NULL (unset), value1 and value2. An exception to this rule has been made for the user agent parameter u and the IP address parameter i, because these parameters have to be set for the request to be valid. In their case, we assigned only two values (without NULL). The parameters and their values are shown in Tab. I.
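The construction of the request variants can be sketched as follows. The values are transcribed from Tab. I; the string encodings (for instance a comma-separated pair for the coordinates), the baseline values of u and i used while another parameter is varied, and the names `PARAMETER_VALUES`, `BASE_REQUEST` and `request_variants` are all assumptions made for illustration. No request is actually sent.

```python
# Values transcribed from Tab. I. None marks the unset (NULL) variant.
# The string encodings (e.g. "x,y" for coordinates) are assumptions.
PARAMETER_VALUES = {
    "u": ["BlackBerry8703e", "iPad"],        # mandatory: no NULL variant
    "i": ["192.14.14.14", "109.169.41.20"],  # mandatory: no NULL variant
    "d[coord]": [None, "177.563657,-32.324807", "37.563657,-122.324807"],
    "d[pc]": [None, "94107", "31532"],
    "d[ac]": [None, "650", "320"],
    "d[dob]": [None, "1962.01.23", "1986.11.06"],
    "d[gender]": [None, "m", "f"],
    "k": [None, "sports baseball", "cars"],
}

# Hypothetical baseline: the report does not say which u and i values are
# used while another parameter is being varied.
BASE_REQUEST = {"u": "BlackBerry8703e", "i": "192.14.14.14"}


def request_variants(name, base=BASE_REQUEST):
    """Return one request dict per value of `name`; all other fields stay identical."""
    variants = []
    for value in PARAMETER_VALUES[name]:
        request = dict(base)  # copy so the baseline is never mutated
        if value is None:
            request.pop(name, None)  # NULL: leave the parameter out entirely
        else:
            request[name] = value
        variants.append(request)
    return variants


gender_requests = request_variants("d[gender]")  # unset, value1, value2
agent_requests = request_variants("u")           # value1, value2 only
```

Keeping every other field identical across the variants is what allows a difference in the retrieved ads to be attributed to the single parameter under test.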

As shown in Fig. 4, for each of the parameters and for each of their values, we launch a set of ad requests. Afterwards, we compute the Jensen-Shannon divergence $D_{JS}(P \| V_1)$, where P is the distribution of ads obtained for the unset parameter (set to NULL) and $V_1$ the distribution for the parameter set to value1. This way we can compute the incremental discriminative power $d^m_i = D_{JS}(P \| V_1)$ of parameter m. We can also compute the differential discriminative power $d^m_d = D_{JS}(V_1 \| V_2)$, where $V_2$ is, by analogy, the distribution obtained for m = value2. Additionally, we compute the incremental discriminative power $\hat{d}^m_i$ and the differential discriminative power $\hat{d}^m_d$ using the extended Jaccard index. Hence, we have $\hat{d}^m_i = JI'(P \| V_1) = 1 - JI(P \| V_1)$ and $\hat{d}^m_d = JI'(V_1 \| V_2) = 1 - JI(V_1 \| V_2)$.

⁴ http://developer.admob.com/wiki/Requests

Fig. 5: Discriminative power values of particular request parameters measured using Jensen-Shannon divergence (JSD): (a) User agent, (b) IP address, (c) Coordinates, (d) Postal code, (e) Area code, (f) Date of birth, (g) Gender, (h) Keywords.

D. Noise

During the initial experiments we observed that the above considerations do not account for the measurement noise, which happens to be rather high. In the extreme case, two distributions obtained for the same ad request at the same time may be substantially different. This may be due to the influence of ad churn, which is the constant process of deactivating old ads and activating new ones. It may also be caused by other factors (e.g. DNS load balancing) or may simply be the consequence of randomization of the displayed ad during the targeting phase on the advertiser's side.

In order to monitor the noise level in the system, we propose to compute additional control values of the Jensen-Shannon divergence and the extended Jaccard index for two distributions resulting from the same request, launched one after another. This way, we are able to account for the noise and mitigate its influence while analysing the results.

IV. RESULTS

In this section we present the results obtained according to the experimental setup described in detail above. The main objective is to verify which parameters are used for mobile ad targeting and what their discriminative power is. We are restricted to such a simplified analysis since we cannot control all the inputs of the ad targeting mechanism and, hence, treat it as a black box. Because of that, we have to leave some questions unanswered. For instance, we cannot assess the influence of the ad clicking history on the process of ad selection.

The presented results were obtained over a period of five days. For each experiment we configure the requests by assigning the values from Tab. I to the selected parameter. Additionally, we create two identical requests to serve as the noise-level control. If the discriminative power of the control pair (noise level) is much lower than the discriminative power of the other measurements, we consider that the input parameter in which the other measurements differ strongly influences the ad selection algorithm. If those two values of the discriminative power are comparable, we conclude that the influence cannot be observed.
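The decision rule can be sketched as a comparison between the mean measured value and the mean control value. The margin `factor` is a hypothetical stand-in for "much lower"; the report gives no numeric threshold, so the default of 3 is an assumption chosen only for illustration, as is the function name.

```python
from statistics import mean


def influences_targeting(measurements, controls, factor=3.0):
    """Return True when the mean discriminative power clearly exceeds the noise.

    measurements: values for request pairs that differ in one parameter.
    controls: values for pairs of identical requests (the noise level).
    factor: hypothetical margin standing in for "much lower"; not from the report.
    """
    return mean(measurements) > factor * mean(controls)


# Jensen-Shannon values reported for the IP address parameter and its control.
ip_address = influences_targeting([0.87], [0.07])

# Hypothetical values for a parameter whose effect is comparable to the noise.
other = influences_targeting([0.09, 0.11], [0.08, 0.10])
```

With these inputs the IP address is flagged as influential, while the hypothetical second parameter is not. A stricter or looser margin would change borderline cases only.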

The results are shown in Fig. 5 and Fig. 6. Fig. 5 plots the discriminative power values obtained using the Jensen-Shannon divergence, whereas Fig. 6 plots these values computed as a modified version of the extended Jaccard index. It can easily be seen that the plots in the two figures are strongly correlated. This confirms that the Jensen-Shannon divergence, proposed here as a metric for assessing the discriminative power, is coherent with the extended Jaccard index proposed as a metric in [11].


Fig. 6: Discriminative power values of particular request parameters measured using extended Jaccard index (JI). Panels: (a) User agent, (b) IP address, (c) Coordinates, (d) Postal code, (e) Area code, (f) Date of birth, (g) Gender, (h) Keywords.

It is also worth mentioning that plots (c)-(h) in both figures exhibit regular variations over time, and that the fluctuations are consistent across all those graphs. This can be explained by the fact that at some specific point of the day the ads that exhausted their daily budget on the previous day are reactivated. Thus, the cardinality of the set of all available ads increases, which results in more varied distributions and a higher variance of the results. This has been verified by manual inspection of the gathered data. Indeed, the number of unique ads is greater at the time instants when the variations occur than at the times when the plots are smoother. This also confirms the observations of these regular fluctuations reported by Guha et al. in [10].

From the analysis of the discriminative power values for eight different AdMob request parameters, we can conclude that only the user agent and IP address parameters affect the targeting algorithm. In fact, they completely change the distribution of the ads: there is almost no overlap between the ads served for BlackBerry and iPad. The same applies to the IP addresses: the average differential discriminative power of the parameter i computed for American and British IP addresses equals 0.87 and 0.86 (Jensen-Shannon divergence and extended Jaccard index, respectively), i.e. the sets of retrieved ads differ almost completely. The corresponding control values equal 0.07 / 0.08, respectively. However, we expected the AdMob targeting mechanism to use the information obtained from other parameters as well. Apparently, the discriminative power of the other parameters is relatively low and comparable to the noise level (control pair discriminative power). Hence, we conclude that those parameters have a much smaller influence on the set of retrieved ads than the IP address and terminal type.

The above considerations seem to confirm the results presented in [11] for the online advertising system of Google. Guha et al. have also reported that, since website advertising is mainly based on contextual information, behavioural targeting may have almost no influence on the set of retrieved ads. However, due to the high level of noise they observed while performing the experiments, they found the results inconclusive. From our results, we can confirm that the discriminative power of the behavioural parameters, such as gender or date of birth, is indeed relatively low.

Nevertheless, we may expect the results to evolve in the future. As behavioural targeting algorithms improve over time, we assume that the parameters that are not used now may be employed in the future. We find this to be a plausible reason why AdMob leaves those parameters in the request format.

V. CONCLUSIONS

In this report we have proposed a comprehensive modular system for the analysis of the mobile advertising system. We have also presented the corresponding metrics that can be used to assess the discriminative power of the ad request parameters. Furthermore, this report contains the results of the experiments performed on the AdMob advertising system. Using the methodology described above, we have concluded that the current targeting algorithm of AdMob does not use behavioural data and that the ads are mainly influenced by the user terminal type and its IP address.


VI. FUTURE WORK

As is in the nature of measurement studies, the presented results can only be treated as a snapshot in time. Thus, the proposed measurement methodology should be treated as a tool to analyse the evolution of the targeting algorithm in mobile advertising systems over time. Furthermore, it should be noted that the proposed system is modular and its individual elements can be modified without raising compatibility issues with other elements. Thus, additional future work may include increasing the efficiency of detecting multiple instances of the same ad, as this problem appears to be relevant also for other applications (e.g. valuation of mobile advertisement campaigns).

REFERENCES

[1] AdMob, http://www.admob.com/.

[2] J. Yan, N. Liu, G. Wang, W. Zhang, Y. Jiang, and Z. Chen, “How much can behavioral targeting help online advertising?” in 18th International World Wide Web Conference (WWW2009), 2009. [Online]. Available: http://data.semanticweb.org/conference/www/2009/paper/27

[3] M. Ter Louw, K. T. Ganesh, and V. Venkatakrishnan, “Adjail: Practical enforcement of confidentiality and integrity policies on web advertisements,” in 19th USENIX Security Symposium, Aug. 2010.

[4] V. Ng and K.-H. Mok, “An intelligent agent for web advertisements,” in CODAS ’01: Proceedings of the Third International Symposium on Cooperative Database Systems for Advanced Applications. Washington, DC, USA: IEEE Computer Society, 2001, p. 102.

[5] A. Thawani and S. Gopalan, “Event driven semantics based ad selection,” in ICME ’04: IEEE International Conference on Multimedia and Expo, 2004.

[6] J. Jaworska and M. Sydow, “Behavioural targeting in on-line advertising: An empirical study,” in WISE ’08: Proceedings of the 9th International Conference on Web Information Systems Engineering. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 62–76.

[7] “Thirty-one privacy and civil liberties organizations urge Google to suspend Gmail,” http://www.privacyrights.org/ar/GmailLetter.htm.

[8] B. Krishnamurthy and C. E. Wills, “On the leakage of personally identifiable information via online social networks,” in Proceedings of ACM SIGCOMM Workshop on Online Social Networks, 2009.

[9] ——, “Privacy leakage in mobile online social networks,” in Proceedings of Workshop on Online Social Networks, 2010.

[10] S. Guha, B. Cheng, A. Reznichenko, H. Haddadi, and P. Francis, “Privad: Rearchitecting online advertising for privacy,” in Proceedings of Hot Topics in Networking (HotNets), 2009.

[11] S. Guha, B. Cheng, and P. Francis, “Challenges in measuring online advertising systems,” in Proceedings of Internet Measurement Conference (IMC), Melbourne, Australia, Nov. 2010.

[12] J.-Y. Le Boudec, Performance Evaluation of Computer and Communication Systems. EPFL Press, Lausanne, Switzerland, 2010.