

    Prognosis - An Approach to Predictive Analytics

    WHITE PAPER

    Abstract

    Prediction is a statement made about the future, an anticipatory vision or
    perception. This white paper discusses the emergence of technology that
    enables precise predictions in varied fields, and the application of
    exploratory and normative methods to augment decision making.

    Forecasting is primarily based on mining historical data sets, extracting
    hidden patterns and transforming them into valuable information through a
    process of classification, clustering, regression and association rule
    learning.

    The white paper also describes Impetus' implementation of Behavioral
    Targeting for the ad world: a widely accepted statistical machine learning
    approach that helps select the most relevant ads to be displayed to a web
    user based on that user's historical data.

    Impetus Technologies Inc.

    www.impetus.com

    November 2011


    Table of Contents

    Introduction
        Large scale data analytics
        Algorithms for forecasting and prediction
    Behavioral Targeting
        Advantages and threats
        Industry impact
        Generic approach to BT problem solving
    Large scale implementation of BT
        Linear Poisson Regression
        Implementing BT using Linear Poisson Regression
            1. Data Preparation
            2. Model Training
            3. Model Evaluation
    Summary

    Introduction

    A prediction is a statement about the way things will happen in the future,
    often but not always based on experience or knowledge. Prediction is
    necessary to allow plans to be made about possible developments. Large
    corporations invest heavily in this kind of activity to help focus
    attention on possible events, risks and business opportunities. Such work
    brings together all available past and current data as a basis for
    developing reasonable expectations about the future.

    The basic idea behind any such algorithm is to gather very large volumes of
    behavioral data describing the historical series of events, actions and
    behavior of the entity in question. This data is fed into machines and run
    through complex machine learning algorithms to derive models. The models
    serve as the basis for predictions, i.e. given input criteria, the models
    infer the expected behavior of the entity.

    The application of prediction algorithms has gained prominence in a wide
    range of fields such as finance (stock market predictions), insurance
    (predicting life expectancy), science (weather forecasting, predicting
    natural disasters), medical science (treating developmental disabilities),
    marketing (behavioral targeting) and many more.

    Typically, with predictions, there is a huge amount of historical data,
    time is of the essence, and there is always a current activity happening
    that impacts the future. In many cases, freshness of data is a key factor
    and plays a major role in forecasting the future course of action. In
    other instances, the entire data set has equal relevance and contributes
    to determining the future.

    Large scale data analytics

    Projects related to future predictions and forecasting point to a huge
    increase in the amount of data that must not only be stored but also
    processed quickly and efficiently. These challenges present at once a
    daunting and an exciting chance to use data to create a positive impact.

    Often, there is an immediate need to analyze the data at hand, to discover
    patterns, reveal threats, monitor critical systems, and make decisions
    about the direction the organization should take. Several constraints are
    always present: the need to implement new analytics quickly enough to
    capitalize on new data sources, limits on the scope of development
    efforts, and the pressure to expand mission capability without an increase
    in budgets. For many of these applications, the large data processing
    stack (which includes the simplified programming model Map-Reduce,
    distributed file systems, semi-structured stores, and integration
    components, all running on commodity-class hardware) has opened up a new
    avenue for scaling out efforts and enabling analytics that were impossible
    in previous architectures. This new ecosystem has proved remarkably
    versatile at handling various types of data and classes of analytics.

    Perhaps the most exciting benefit of moving to these highly scalable
    architectures, however, is that after the immediate issues have been
    solved, often with a system that can handle today's requirements and scale
    up to 10x or more, new analytics and capabilities can be developed,
    evaluated and integrated easily. This is owing to the speed and ease of
    Map-Reduce, Pig, Hive, and other technologies. More than ever, the
    large-scale data analysis software stack is proving to be a platform for
    innovation.

    Algorithms for forecasting and prediction

    There are several classes of statistical algorithms that are well suited
    to these kinds of problems, which are associated with trend analysis,
    pattern generation and artificial intelligence based predictions. Some of
    the most common ones are:

    - Conjoint Analysis: expert opinion and Delphi surveys
    - Quantitative: statistical, suited to predicting trends, e.g. Linear
      Poisson Regression, exponential smoothing
    - Qualitative: subjective, providing a range of possible outcomes, e.g.
      the Bayesian approach
    - Statistical combination: a mix of quantitative and qualitative
      techniques, e.g. Quasi-Bayes

    Behavioral Targeting

    Behavioral targeting (BT) leverages historical user behavior to select the
    most relevant ads to display. The state of the art in BT derives a Linear
    Poisson Regression model from fine-grained user behavioral data and
    predicts click-through rate (CTR) from user history.

    Behavioral targeting is an application of modern statistical machine
    learning methods to online advertising. Unlike other computational
    advertising techniques, BT does not primarily rely on contextual
    information such as the query (sponsored search) or the web page (content
    match). Instead, BT learns from past user behavior, especially the
    implicit feedback (i.e., ad clicks), to match the best ads to users.

    This gives BT a broader applicability, such as graphical display ads, or
    at least a valuable user dimension complementary to other contextual
    advertising techniques. In today's practice, behaviorally targeted
    advertising inventory comes in the form of a demand-driven taxonomy;
    hierarchical examples are Finance/Investment and Technology/Consumer
    Electronics/Cellular Telephones. Within a category of interest, a BT model
    derives a relevance score for each user from past activity. Should the
    user appear online during a targeting time window, the ad serving system
    will qualify this user (to be shown an ad in this category) if the score
    is above a certain threshold. One de facto measure of relevance is CTR,
    and the threshold is predetermined in such a way that both a desired level
    of relevance (measured by the cumulative CTR of a collection of targeted
    users) and the volume of targeted ad impressions (also called reach) can
    be achieved.
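    As a hedged illustration of this qualification rule, the following sketch
    scores a user by predicted CTR and applies the threshold; the class, names
    and the example threshold are ours, not the actual ad-serving API.

        // Sketch of the qualification rule described above (illustrative names).
        public final class BtQualifier {
            private final double ctrThreshold; // tuned so relevance and reach targets are both met

            public BtQualifier(double ctrThreshold) {
                this.ctrThreshold = ctrThreshold;
            }

            // Qualify a user for a category if the predicted CTR clears the threshold.
            public boolean qualify(double predictedClicks, double predictedViews) {
                if (predictedViews <= 0) {
                    return false; // no expected impressions, nothing to target
                }
                return predictedClicks / predictedViews >= ctrThreshold;
            }

            public static void main(String[] args) {
                BtQualifier q = new BtQualifier(0.002); // example threshold: 0.2% CTR
                System.out.println(q.qualify(1.5, 500.0)); // true: predicted CTR = 0.003
            }
        }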

    The impact of behavioral targeting can be negative if consumers feel
    annoyed or threatened by the use of their personal data. However, as
    demonstrated by Amazon, when personal information and technology enhance
    the online experience, there is less risk of a negative response.

    Advantages and threats

    There are many advantages attributed to ad targeting and behavioral
    analysis, but at the same time it is important to look at the downsides
    and surface the threats they pose. Some of the advantages that can be seen
    right away are:

    - Reaching the right audience at the right time (of the day, week or life
      stage), with clear behavioral assumptions
    - Standing out in a cluttered category
    - Reaching target audiences when context inventory is sold out (reaching
      the same target in alternative content)
    - Avoiding the high cost of entry into desired content (reaching the same
      target in alternative content at lower cost)
    - Tailoring the message to behavioral patterns to make it more relevant

    As mentioned earlier, there are some downsides to BT:

    - Achieving high reach is difficult. Within extremely targeted segments,
      the potential universe available may be very limited, and there may be a
      limit to the sites currently allowing behavioral targeting.
    - Inconsistencies within segment classifications. The definition of a
      common behavioral segment may differ by publisher (e.g., a job seeker
      searching Monster.com is not the same job seeker as one reading a
      job-related article on iVillage). Also, as the technology is cookie
      enabled, it suffers the usual issues of cookie stability and data
      accuracy.
    - The ultimate issue of behavioral targeting clutter. Other advertisers
      within the same vertical will compete in the same space/segments. This
      is not an issue today, but in time the current positives of cost,
      clutter and inventory availability will become challenges (as seen in
      paid search). In the future, as targeting matures and advertisers have
      measurable results, historical data will be a key indicator of which
      assumptions work. This will provide optimization insights. Collecting
      and analyzing response data generated from different segments are
      important prerequisites for success.

    Industry impact

    Behavioral targeting, as a concept, has wide acceptance in the industry.

    Indicated below are some use-cases where it is being successfully implemented

    as a tool for predicting user behavior:

    - Ad targeting and predicting the buying behavior of users
    - Relationship building
    - Audience targeting
    - Presidential candidates using BT to target persuasion
    - Treatment of mental disorders and developmental disabilities


    There is a vast horizon where BT, or BT-based solutions, are being used to
    successfully predict and forecast behavior in order to increase reach,
    accessibility and revenue.

    Generic approach to BT problem solving

    - Data mining involves extracting hidden patterns from data to transform
      it into valuable information, using computer power to apply knowledge
      discovery methodologies.
    - It applies knowledge discovery and prediction through a process of
      classification, clustering, regression and association rule learning.
    - The value of the information depends on the collection of indicative and
      representative data.
    - Cookies for behavioral advertising usually contain text that uniquely
      identifies the browser, so that advertisers or ad networks can recognize
      the same Internet user across different web sites or multiple areas on
      the same site.

    Large scale implementation of BT

    Linear Poisson Regression

    This is a statistical method used to calculate the probability of an
    event, given the rate of occurrence of the event in disjoint timeframes.
    It is suited to analyzing outcomes that take non-negative values.

    Linear Poisson Regression works particularly well where the input data is
    sparse, i.e. its results are valid for rare events. It can model rare
    events both when everyone is followed for the same length of time and when
    people have follow-ups of different lengths.
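    In symbols (the notation is ours, but this is the standard statement of
    the model): for example i with feature-count vector x_i, the count y_{ik}
    of target event k is modeled as

        y_{ik} \sim \mathrm{Poisson}(\lambda_{ik}), \qquad
        \lambda_{ik} = \mathbf{w}_k^{\top}\mathbf{x}_i = \sum_j w_{kj}\, x_{ij},

    with non-negative weights w_{kj} fitted by maximizing the log-likelihood,
    which up to a constant is \sum_i ( y_{ik} \log \lambda_{ik} - \lambda_{ik} ).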

    Implementing BT using Linear Poisson Regression

    Behavioral targeting can be effectively implemented using the Linear
    Poisson Regression algorithm, as it maps well to the nature of the input
    data and the kind of predictions that organizations are looking for. The
    flow of the algorithm is explained step by step below.


    Impetus Technologies implemented behavioral targeting using the Linear
    Poisson Regression algorithm. The algorithm was deployed on the Hadoop
    ecosystem: it was decomposed into individual steps, each step was
    implemented as a Hadoop M/R job, and the jobs were run sequentially using
    the Oozie workflow engine. The results of the implementation were models
    for different categories. These models were stored in the HBase data store
    and later consumed for analytics and behavioral predictions. The steps
    involved in this implementation are explained below.
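    As an illustration of this sequential chaining (the production deployment
    expressed the ordering declaratively in Oozie; the job names and HDFS
    paths below are invented, not Impetus' actual ones), a plain Java driver
    might look like:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

        // Illustrative driver chaining the BT steps one after another.
        public class BtPipelineDriver {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                runStep(conf, "feature-extractor", "/bt/raw", "/bt/extracted");
                runStep(conf, "feature-generator", "/bt/extracted", "/bt/features");
                runStep(conf, "poisson-entity-dictionary", "/bt/features", "/bt/dictionary");
                // ... remaining steps: feature vectors, initializer, multiplicative passes
            }

            private static void runStep(Configuration conf, String name,
                                        String in, String out) throws Exception {
                Job job = Job.getInstance(conf, name);
                // Each real step would set its own mapper, reducer and key/value classes here.
                FileInputFormat.addInputPath(job, new Path(in));
                FileOutputFormat.setOutputPath(job, new Path(out));
                if (!job.waitForCompletion(true)) {
                    throw new IllegalStateException("Step failed: " + name);
                }
            }
        }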

    1. Data Preparation

    In this preprocessing step, the data fields of interest were extracted
    from the raw data feeds, thus reducing the size of the data.

    The raw data was related to user behavior with respect to one or more ads.
    It included ad clicks, ad views, page views, searches, organic clicks and
    overture clicks.


    1. The raw data came from the user base.
    2. The system stored the raw data in HDFS.
    3. The raw data was sent to the data preparation module, which:
       a. Aggregated event counts over a configurable period of time, to
          further shrink the data size
       b. Merged the counts into a single entry, with the cookie as the unique
          key
       c. Ran two M/R jobs: Feature-Extractor and Feature-Generator

    1.1 Feature-Extractor

    Input- Raw data feeds

    Output-

    1.2 Feature-Generator

    Input-

    Output-
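    As a hedged sketch of the Feature-Extractor job, a mapper might look as
    follows. The raw feed layout assumed here (tab-separated cookie,
    timestamp, event type, entity id) is our guess, not the actual format:

        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        // Emits (cookie, "eventType:entityId") pairs; a downstream reducer counts them.
        public class FeatureExtractorMapper
                extends Mapper<LongWritable, Text, Text, Text> {

            private final Text outKey = new Text();
            private final Text outValue = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                String[] fields = line.toString().split("\t");
                if (fields.length < 4) {
                    return; // skip malformed records
                }
                outKey.set(fields[0]);                     // cookie
                outValue.set(fields[2] + ":" + fields[3]); // e.g. adClick:ad42
                context.write(outKey, outValue);
            }
        }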

    2. Model Training

    This fitted the Linear Poisson Regression model from the preprocessed data
    and involved the following:

    1. Feature selection
    2. Generation of training examples
    3. Model weights initialization
    4. Multiplicative recurrence to converge the model weights

    2.1 Poisson-Entity-Dictionary

    It mainly performed feature selection and inverted indexing. It did this
    by counting entity frequency in terms of touching cookies and selecting
    the most frequent entities in the given feature space.

    Output: a hashmap from entity name to index (the inverted index) for all
    entity types

    An entity referred to the name (unique identifier) of an event (e.g. an ad
    id, a space id for a page, or a query). An entity was different from a
    feature, since the latter was uniquely identified by the pair of event
    type and entity. In the context of BT, there were three types of entities:
    ad, page and search.

    The Poisson entity dictionary included three M/R jobs: PoissonEntityUnit,
    PoissonEntitySum, and PoissonEntityHash.
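    A minimal, single-machine sketch of this selection rule (count how many
    distinct cookies touch each entity, keep the top N, and assign each a
    dense index); the method name and data layout are ours, not the M/R
    implementation:

        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;
        import java.util.Set;
        import java.util.stream.Collectors;

        public class EntityDictionary {
            // cookiesByEntity: entity name -> set of cookies that touched it.
            static Map<String, Integer> build(Map<String, Set<String>> cookiesByEntity,
                                              int topN) {
                List<String> selected = cookiesByEntity.entrySet().stream()
                        .sorted((a, b) -> Integer.compare(b.getValue().size(),
                                                          a.getValue().size()))
                        .limit(topN)
                        .map(Map.Entry::getKey)
                        .collect(Collectors.toList());
                Map<String, Integer> invertedIndex = new HashMap<>();
                for (int i = 0; i < selected.size(); i++) {
                    invertedIndex.put(selected.get(i), i); // entity name -> feature index
                }
                return invertedIndex;
            }
        }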

    2.2 Poisson-Feature-Vector

    This generated the training examples (feature vectors) that were directly
    used later by model initialization and multiplicative recurrence.

    It used a sparse data structure for feature vectors, because behavioral
    count data is very sparse by nature: for a given user, in a given time
    period, his or her activity involves only a limited number of events.
    Impetus used a pair of arrays of the same length to represent a feature
    vector or a target vector: an integer array for features and a float array
    for values (float to allow for possible decaying), with a common array
    index linking each feature-value pair.

    Feature selection and inverted indexing: with the feature space selected
    from PoissonEntityDictionary, Impetus discarded the unselected events from
    the training data on the feature (input variable) side. On the target
    (response variable) side, Impetus took the option of using either all
    features or only the selected features to categorize them into target
    event counts.

    With the inverted index built from PoissonEntityDictionary, from the
    PoissonFeatureVector step onwards Impetus referenced an original feature
    name by its index. The same idea was also applied to cookies, since the
    cookie field itself was irrelevant to the model.

    Several pre-computations were performed at this stage:

    1. Impetus further aggregated feature counts into a time window with a
       size larger than or equal to the resolution from data preparation.
    2. It decayed counts over time using a configurable factor.
    3. It realized a causal approach to generating examples (the causal
       approach collects features before targets temporally, while the
       non-causal approach generates targets and features from the same period
       of history).
    4. Impetus used binary representation (serialized objects in Java) and
       data compression (sequence files with BLOCK compression in the Hadoop
       framework) for feature vectors.

    Data structure for the feature vector:

        int[targetLength]   targetIndexArray
        float[targetLength] targetValueArray
        int[inputLength]    inputIndexArray
        float[inputLength]  inputValueArray
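    In Java, this data structure amounts to the following; the dot-product
    helper is our addition, anticipating the Poisson mean computation in step
    2.4:

        // Sparse vector exactly as described above: parallel index/value arrays
        // for the target portion and the input portion.
        public final class PoissonFeatureVector {
            final int[]   targetIndex;
            final float[] targetValue;
            final int[]   inputIndex;
            final float[] inputValue;

            PoissonFeatureVector(int[] ti, float[] tv, int[] ii, float[] iv) {
                this.targetIndex = ti; this.targetValue = tv;
                this.inputIndex  = ii; this.inputValue  = iv;
            }

            // Dot product of a dense weight vector with the sparse input portion,
            // i.e. the Poisson mean lambda used by the multiplicative update.
            double dot(float[] denseWeights) {
                double sum = 0.0;
                for (int p = 0; p < inputIndex.length; p++) {
                    sum += denseWeights[inputIndex[p]] * inputValue[p];
                }
                return sum;
            }
        }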

    Input-

    Output-

    Target counts were collected from a sliding time window, and feature
    counts were aggregated (possibly with decay) from a time period preceding
    the target window. The size of the sliding window was kept relatively
    small, because a large window effectively discards many co-occurrences
    within that window. For example, the following setup yielded superior
    long-term models (the window arithmetic is sketched in code after this
    list):

    a. A target window of size one day
    b. Sliding over a one-week period
    c. Preceded by a four-week feature window (also sliding along with the
       target window)
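    A minimal sketch of that window arithmetic, assuming day granularity and a
    zero-length gap (the gap window is introduced in the algorithm below):

        // Prints the four sliding-window boundaries used by the algorithm below.
        public class SlidingWindows {
            public static void main(String[] args) {
                int featureDays = 28, targetDays = 1, gapDays = 0, slideOverDays = 7;
                for (int slide = 0; slide < slideOverDays; slide++) {
                    int featureBegin = slide;
                    int featureEnd   = featureBegin + featureDays;
                    int targetBegin  = featureEnd + gapDays; // gap emulates online latency
                    int targetEnd    = targetBegin + targetDays;
                    System.out.printf("features [%d,%d) -> targets [%d,%d)%n",
                            featureBegin, featureEnd, targetBegin, targetEnd);
                }
            }
        }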

    The algorithm included the following:

    1. For each cookie, Impetus cached all the event count data.
    2. It sorted events by time, forming an event stream for this particular
       cookie covering the entire time period of interest.
    3. Impetus pre-computed the boundaries of the sliding window. Four
       boundaries were specified: featureBegin, featureEnd, targetBegin and
       targetEnd. Separating featureEnd and targetBegin allowed a gap window
       in between, which was necessary to emulate possible latency in online
       prediction.


    4. The company maintained three iterators on the event stream, referencing
       the previous featureBegin, the current featureBegin, and targetBegin.
       It used a pair of TreeMap objects (inputMap and targetMap) to hold the
       features and targets of a feature vector as the data was being
       processed.

    2.3 Poisson-Initializer

    It initialized the model weights (the coefficients of the regressors) by
    scanning the training data once. In the notation used here:

    k: index of target variables
    j: index of features or input variables
    i: index of examples

    A unigram (j) is one occurrence of feature j; a bigram (k, j) is one
    co-occurrence of target k and feature j. The basic idea of this
    bigram-based initialization was to allocate each weight w(k, j) as a
    normalized number of co-occurrences of (k, j).

    The output of PoissonInitializer was an initialized weight matrix of
    dimensionality: number of targets by number of features.

    1. Impetus distributed the computation of counting the bigrams by a
       composite key, and effectively pre-computed the total bigram counts of
       all examples before the final stage.
    2. The M/R framework provides a single-key data structure. In order to
       distribute the bigram counts, Impetus needed an efficient function to
       transform a composite key (two integers) into a single key, and to
       recover the composite key when needed (see the sketch after this
       list):

       bigramKey(k, j) = a long integer obtained by bitwise left shift of k by
       32 bits, then bitwise OR with j

    3. The Impetus team cached the output of the first mapper (the emitted
       key-value pairs).
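    In Java, the packing and unpacking from step 2 look like this; the masking
    of j's sign bits is our addition, to keep negative values from corrupting
    the key:

        public final class BigramKey {
            static long pack(int k, int j) {
                return ((long) k << 32) | (j & 0xFFFFFFFFL);
            }
            static int targetOf(long key)  { return (int) (key >>> 32); }
            static int featureOf(long key) { return (int) key; }

            public static void main(String[] args) {
                long key = pack(7, 42);
                System.out.println(targetOf(key) + "," + featureOf(key)); // 7,42
            }
        }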

    2.4 Poisson-Multiplicative

    It updated the model weights by scanning the training data iteratively,
    using a highly effective multiplicative recurrence. Computing the
    normalizer (the Poisson mean) involved a dot product of the previous
    weight vector with a feature vector (the input portion).

    Input:

    Output: updated w(k) for all k
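    The recurrence itself is not reproduced in the text. For a linear Poisson
    model, the standard likelihood-ascending multiplicative update, which we
    take to be the one meant here, is

        w_{kj} \leftarrow w_{kj} \,
        \frac{\sum_i x_{ij}\, y_{ik} / \lambda_{ik}}{\sum_i x_{ij}},
        \qquad \lambda_{ik} = \mathbf{w}_k^{\top}\mathbf{x}_i,

    where the numerator uses the Poisson mean (the dot product mentioned
    above) and the denominator is the total occurrence count of feature j.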


    1. Impetus represented the model weight matrix as K dense weight vectors
       (arrays) of length J, where K was the number of targets and J the
       number of features.
    2. Using weight vectors was more scalable in terms of memory footprint
       than a matrix representation, but it raised challenges in disk IO.
       Impetus addressed this problem via in-memory caching. Caching the
       weight vectors was not the solution; the trick was to cache the input
       examples. After caching, Impetus maintained a hashmap that recorded all
       relevant targets for the cached feature vectors, and provided a
       constant-time lookup from target index to array index.
    3. Impetus also used Hadoop's distributed cache, which copied the
       requested files from HDFS to the slave nodes before the task was
       executed. It copied the files only once per job for each task tracker,
       to be shared by that tracker's M/R tasks.
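    For reference, a hedged sketch of that distributed-cache setup, using the
    org.apache.hadoop.filecache API current at the time; the file paths are
    invented:

        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.filecache.DistributedCache;

        public class CacheSetup {
            // Files are copied once per job to each tasktracker, then shared by
            // that tracker's M/R tasks, as described above.
            public static void addModelFiles(Configuration conf) throws Exception {
                DistributedCache.addCacheFile(new URI("/bt/dictionary/part-00000"), conf);
                DistributedCache.addCacheFile(new URI("/bt/weights/previous-pass"), conf);
            }
        }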

    3. Model Evaluation

    This tested the trained model on a test data set. The main tasks were:

    1. Predicting expected target counts (clicks and views)
    2. Scoring (CTR)
    3. Ranking the scores of a test set
    4. Calculating and reporting performance metrics such as CTR lift and area
       under the ROC curve
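    A hedged sketch of tasks 2 and 3, scoring users by smoothed predicted CTR
    and ranking them; the smoothing constants are our assumption, not values
    from the implementation:

        import java.util.Comparator;
        import java.util.List;

        public class CtrScorer {
            static final double ALPHA = 1.0, BETA = 100.0; // assumed smoothing priors

            record UserPrediction(String cookie, double expectedClicks,
                                  double expectedViews) {}

            // Smoothed CTR estimate from the model's expected counts.
            static double score(UserPrediction p) {
                return (p.expectedClicks() + ALPHA) / (p.expectedViews() + BETA);
            }

            // Rank a test set from most to least relevant.
            static void rank(List<UserPrediction> users) {
                users.sort(Comparator.comparingDouble(CtrScorer::score).reversed());
            }
        }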

    This component contained three sequential steps:

    3.1 Poisson-Feature-Vector-Eval

    It was identical to Poisson-Feature-Vector, with the following
    differences:

    - There was no need to bookkeep the summary statistics used for training,
      such as the total counts of examples, feature unigrams and target
      unigrams.
    - Decay was typically necessary in generating test data, since it enabled
      efficient incremental predicting as new events flowed in, while
      exponentially diminishing the obsolete long history.
    - Sampling and heuristic-based robot filtering were not applied when
      generating test data.
    - Impetus could remove examples without a target from the test dataset,
      since these records did not impact the measured performance, no matter
      how the model predicted them. However, examples with targets were kept,
      even those without any inputs, because these records still counted
      toward the evaluation metrics.


    Summary

    As explained above, a prediction is a statement made about the future. A
    very popular area of application that has flourished in recent times is
    behavioral targeting (BT). BT is a large scale machine learning problem
    that leverages historical user behavior to select the most relevant ads to
    display. The process basically involves mining historical data sets and
    extracting hidden patterns (trends) to predict user interests.

    Major IT giants like Yahoo, Google and Amazon have used behavioral
    targeting and achieved major gains in terms of reach and CTR increase.
    There are several implementations of BT that employ various statistical
    algorithms and processes to extract the behavioral traits of the users in
    question.

    The input to the BT engine is a historical sequence of the activities
    undertaken by users over the Internet. These activities include ad clicks,
    ad views, page views, search queries and search clicks. As users browse
    the Internet they unknowingly leave a trail of footprints in terms of
    visited pages, ads, cookies, etc. These footprints reveal a lot about
    their personality traits. BT leverages these subtle inputs and, without
    compromising the privacy of the users, draws their personality sketch.
    Based on these inferences, advertisers are able to target their audience
    and show them relevant ads.

    Impetus applied the Linear Poisson Regression algorithm for its
    implementation. This was deployed on the Hadoop environment using chained
    Map-Reduce jobs as an Oozie workflow.

    Disclaimers

    The information contained in this document is the proprietary and
    exclusive property of Impetus Technologies Inc. except as otherwise
    indicated. No part of this document, in whole or in part, may be
    reproduced, stored, transmitted, or used for design purposes without the
    prior written permission of Impetus Technologies Inc.

    About Impetus

    Impetus Technologies offers Product Engineering and Technology R&D
    services for software product development. With ongoing investments in
    research and application of emerging technology areas, innovative business
    models, and an agile approach, we partner with our client base comprising
    large scale ISVs and technology innovators to deliver cutting-edge
    software products. Our expertise spans the domains of Big Data, SaaS,
    Cloud Computing, Mobility Solutions, Test Engineering, Performance
    Engineering, and Social Media among others.

    Impetus Technologies, Inc.

    5300 Stevens Creek Boulevard, Suite 450, San Jose, CA 95129, USA
    Tel: 408.252.7111 | Email: [email protected]

    Regional Development Centers - INDIA: New Delhi, Bangalore, Indore,
    Hyderabad

    To know more, visit: http://www.impetus.com