AJ Guardado Comp. Sci. 49S

Upload: lambert-mcdaniel

Post on 17-Jan-2016


Page 1: AJ Guardado Comp. Sci. 49S.  In order to correctly manage programs (AdSense, AdWords), properly charge for the PPC revenue model, and detect invalid

AJ Guardado Comp. Sci. 49S

Page 2:

In order to correctly manage programs (AdSense, AdWords), properly charge for the PPC revenue model, and detect invalid clicks, Google must collect a great deal of data about querying and clicking activities.

All of this data is accumulated by Google and contains information about a visitor’s activities on the Google Network.

Page 3:

The “post-clicking” data about conversion actions on the advertiser’s website makes up a large piece of this collected data.

If the advertiser formally agrees to provide this information, Google collects data on which “conversion” pages (checkout page, form-filling pages, etc.) the user visited on the advertised site.

This data is limited to what the ADVERTISER decides to provide to GOOGLE. Some decide to opt out from providing this conversion data.

This “raw” data is cleaned, preprocessed and stored in various internal logs by Google for different types of analysis.

Page 4:

A weakness of Google’s data collection effort is its inability to get full access to all clicking activities of visitors.

The conversion data they collect is only part of all the activity of a visitor on the advertised site.

This data is important for detecting invalid clicks, but Google and many other search engines don’t have full access to it.

This isn’t Google’s fault; it is a limitation of the types of data available to Google.

Page 5:

Advertisers get reports describing clicking and billing activities from Google.

These reports are coarse. The smallest unit of analysis is one day, so advertisers can’t tell whether a particular click was marked as valid or invalid by Google, and Google won’t give them this information.

Advertisers feel they have the right to know this, but if Google disclosed it, Google would open itself up to more click fraud, because it would be giving hints about how click detection works.

Page 6:

One definition of invalid clicks: “When a person, automated script or computer program imitates a legitimate user of a web browser clicking on an ad, for the purpose of generating an improper charge per click”.

Invalid clicks can be made by humans or computer programs.

To evaluate how valid a click is, you have to understand what the intent of clicking the ad was.

Page 7:

Need to determine if the click is generated “artificially” or not, by way of a list of “prohibited means” that Google follows: (https://www.google.com/adsense/policies?sourceid=asos&subid=ww-ww-et-HC_entry&medium=link)

Many can be detected, but some elude Google, like a person looking at an ad a second time to make sure he’s certain what the ad entailed.

Double-clicks are also sometimes disputed as valid or invalid. Let p be the time difference between the two clicks; if p is relatively large, the second click is valid.
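The double-click rule can be sketched in a few lines of Python. This is only an illustration: the slides give no value for p, so the 2-second threshold below is an assumption, as is the choice to treat a second click that arrives sooner than the threshold as invalid. The function name and its parameters are hypothetical.

```python
def label_repeat_clicks(click_times, min_gap_seconds=2.0):
    """Label clicks on one ad by one visitor, given click times in seconds.

    min_gap_seconds stands in for the threshold on p; its value is an
    assumption, not a figure from the slides.
    """
    labels = []
    previous = None
    for t in sorted(click_times):
        # p is the time difference between this click and the previous one.
        p = None if previous is None else t - previous
        if p is not None and p < min_gap_seconds:
            labels.append("invalid")  # repeat click too soon after the last
        else:
            labels.append("valid")    # first click, or p is large enough
        previous = t
    return labels


print(label_repeat_clicks([0.0, 0.4, 30.0]))
```

Here the click at 0.4 s follows the first one too closely and is labeled invalid, while the click at 30 s is far enough away to count as valid.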

Page 8:

These acts come from a malicious intent to make an advertiser pay for unnecessary clicks.

Fraudulent clicks are invalid clicks made with malicious intent.

An example of an invalid but non-fraudulent click is a person double-clicking an ad out of habit.

Fraudulent clicks may come from software or “bots” designed to click on ads, people manipulating pages, advertisers clicking on the ads of their competitors, or multiple accounts from AdSense publishers.

Goal of the Click Quality team is to identify all invalid clicks regardless of nature, but they’re not there yet.

Page 9:

Anomaly-based: Too many clicks in a given amount of time (Ex: 100 times a day).

Rule-based: IF-THEN rules established.

Classifier-based: One learns to recognize invalid clicks from past experiences with invalid clicks.

Google uses the first two often and rarely uses the third.
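To make the first two approaches concrete, here is a minimal Python sketch. The 100-clicks-a-day limit comes from the slide's example; everything else (the record fields `source`, `day`, `user_agent`, `seconds_since_last_click`, and the two IF-THEN rules) is a hypothetical illustration, since Google's actual thresholds and rules are not public.

```python
from collections import Counter

DAILY_CLICK_LIMIT = 100  # the slide's example; real thresholds are not public


def anomaly_flags(clicks, limit=DAILY_CLICK_LIMIT):
    """Anomaly-based: return (source, day) pairs with at least `limit` clicks."""
    counts = Counter((c["source"], c["day"]) for c in clicks)
    return {key for key, n in counts.items() if n >= limit}


def rule_flags(click):
    """Rule-based: apply hypothetical IF-THEN rules and return the reasons hit."""
    reasons = []
    # IF the click carries no user agent THEN flag it.
    if not click.get("user_agent"):
        reasons.append("empty user agent")
    # IF the click repeats within one second THEN flag it.
    if click.get("seconds_since_last_click", float("inf")) < 1.0:
        reasons.append("repeat click within one second")
    return reasons


clicks = [{"source": "a", "day": "2016-01-17"} for _ in range(100)]
clicks += [{"source": "b", "day": "2016-01-17"} for _ in range(3)]
print(anomaly_flags(clicks))
print(rule_flags({"user_agent": "", "seconds_since_last_click": 0.2}))
```

A classifier-based approach would replace the hand-written rules with a model trained on clicks already labeled valid or invalid; it is left out here because the slides give no detail on it.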

Page 10:

No real definition of invalid clicks, and a definition can’t be given to the public because unethical users will take advantage.

Search engines must either assure advertisers that they are doing everything possible, or use independent third-party vendors to solve the problem.

Page 11:

Click Quality team tries to protect Google’s advertising and provide customer service.

Does this through prevention and detection.

Filtering and detection on several levels help solve the problem.

Pre-filtering, online filtering, post-filtering, automated monitoring, manual reviews (proactive and reactive).

Page 12:

Started with only 3 filters, steadily grew over the years. Prioritizes filters by order in which they are used in checking invalid clicks.

Google tests filters before actually using them; those that pass require constant tuning and maintenance to keep performing.

When Google sees the filters missed invalid clicks, they give credits to the advertisers and try to fix their filters.

Page 13:

4 types of clicks:

True Positive (TP): invalid, correctly identified as invalid.

True Negative (TN): valid, correctly identified as valid.

False Positive (FP): valid, incorrectly identified as invalid.

False Negative (FN): invalid, incorrectly identified as valid.

TP + TN + FP + FN = N (total number of clicks).

Accuracy rate of a filter is equal to (TP + TN) / N, and error rate to (FP + FN) / N.
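The accuracy and error rates follow directly from the four counts. A small Python sketch, with made-up counts used purely for illustration (the slides report no actual figures):

```python
def filter_rates(tp, tn, fp, fn):
    """Return (accuracy, error) for a filter from its four click counts."""
    n = tp + tn + fp + fn  # N, the total number of clicks
    if n == 0:
        raise ValueError("need at least one click")
    accuracy = (tp + tn) / n  # share of clicks classified correctly
    error = (fp + fn) / n     # share of clicks classified incorrectly
    return accuracy, error


# Illustrative counts only, not data from Google.
accuracy, error = filter_rates(tp=80, tn=900, fp=5, fn=15)
print(accuracy, error)
```

Because every click falls into exactly one of the four categories, the two rates always sum to 1.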

Page 14:

These counts are hard for Google to obtain, because it doesn’t know the actual validity of clicks.

Each filter detects only an additional 2-3% of invalid clicks not already detected by other filters.

Offline invalid-click detection methods detect few invalid clicks in comparison to the filters.
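One way to measure how much a single filter adds on top of the others is to count the clicks that only it flags. The Python sketch below assumes each filter's output is available as a set of click IDs and takes the share relative to all flagged clicks; the slides do not say what the 2-3% is measured against, so that denominator, the function name, and the sample data are all assumptions.

```python
def unique_share(filter_hits):
    """For each filter, the share of all flagged clicks that only it flagged.

    filter_hits maps a (hypothetical) filter name to the set of click IDs
    that filter marked as invalid.
    """
    all_flagged = set().union(*filter_hits.values())
    shares = {}
    for name, hits in filter_hits.items():
        # Clicks flagged by any of the other filters.
        others = set().union(
            *(h for other, h in filter_hits.items() if other != name)
        )
        only_this_filter = hits - others
        shares[name] = len(only_this_filter) / len(all_flagged) if all_flagged else 0.0
    return shares


# Made-up click IDs for illustration.
hits = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 5, 6}}
print(unique_share(hits))
```

In this toy data filter B contributes nothing that A and C do not already catch, which is the kind of heavy overlap the slide describes.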