devoxx real-time learning

1©MapR Technologies - Confidential

Real-time Learning


whoami – Ted Dunning

Chief Application Architect, MapR Technologies Committer, member, Apache Software Foundation– particularly Mahout, Zookeeper and Drill

(we’re hiring)

Contact me [email protected]@[email protected]@ted_dunning

mailto:[email protected]




Slides and such (available late tonight):– http://www.mapr.com/company/events/devoxx-3-29-2013

Hash tags: #mapr #devoxxfr


Agenda

What is real-time learning? A sample problem Philosophy, statistics and the nature of the knowledge A solution System design


What is Real-time Learning?

Training data arrives one record at a time

The system improves a mathematical model based on a small amount of training data

We retain at most a fixed amount of state

Each learning step takes O(1) time and memory


We have a product to sell … from a web-site


What picture?

What tag-line?

What call to action?


The Challenge

Design decisions affect probability of success– Cheesy web-sites don’t even sell cheese

The best designers do better when allowed to fail– Exploration juices creativity

But failing is expensive– If only because we could have succeeded– But also because offending or disappointing customers is bad


A Quick Diversion

You see a coin– What is the probability of heads?– Could it be larger or smaller than that?

I flip the coin and while it is in the air ask again I catch the coin and ask again I look at the coin (and you don’t) and ask again Why does the answer change?– And did it ever have a single value?


A Philosophical Conclusion

Probability as expressed by humans is subjective and depends on information and experience


So now you understand Bayesian probability


Another Quick Diversion

Let’s play a shell game This is a special shell game It costs you nothing to play The pea has constant probability of being under each shell

(trust me)

How do you find the best shell? How do you find it while maximizing the number of wins?


Pause for short con-game


Conclusions

Can you identify winners or losers without trying them out?No

Can you ever completely eliminate a shell with a bad streak?No

Should you keep trying apparent losers?Yes, but at a decreasing rate


So now you understand multi-armed bandits


Is there an optimum strategy?


Thompson Sampling

Select each shell according to the probability that it is the best

Probability that it is the best can be computed using posterior

But I promised a simple answer


Thompson Sampling – Take 2

Sample θ

Pick i to maximize reward

Record result from using i


Nearly Forgotten until Recently

Citations for Thompson sampling


Bayesian Bandit for the Shells

Compute distributions based on data so far Sample p1, p2 and p3 from these distributions

Pick shell i where i = argmaxi pi

Lemma 1: The probability of picking shell i will match the probability it is the best shell

Lemma 2: This is as good as it gets


And it works!


Video Demo


The Basic Idea

We can encode a distribution by sampling Sampling allows unification of exploration and exploitation

Can be extended to more general response models


The Original Problem

x1x2

x3


Mathematical Statement

Logistic or probit regression


Same Algorithm

Sample θ

Pick design x to maximize reward


Context Variables

x1x2

x3

y1=user.geo y2=env.time y3=env.day_of_week y4=env.weekend


Two Kinds of Variables

The web-site design - x1, x2, x3– We can change these– Different values give different web-site designs

The environment or context – y1, y2, y3, y4– We can’t change these– They can change themselves

Our model should include interactions between x and y


Same Algorithm, More Greek Letters

Sample θ, π, φ

Pick design x to maximize reward, y’s are constant

This looks very fancy, but is actually pretty simple


Surprises

We cannot record a non-conversion until we wait

We cannot record a conversion until we wait for the same time

Learning from conversions requires delay

We don’t have to wait very long


Required Steps

Learn distribution of parameters from data– Logistic regression or probit regression (can be on-line!)– Need Bayesian learning algorithm

Sample from posterior distribution– Generally included in Bayesian learning algorithm

Pick design– Simple sequential search

Record data


Required system design


t

now

Hadoop is Not Very Real-time

UnprocessedData

Fully processed

Latest full period

Hadoop job takes this long for this data


t

now

Hadoop works great back here

Storm workshere

Real-time and Long-time together

Blended view

Blended view

Blended View


Traditional Hadoop Design

Can use Kafka cluster to queue log lines Can use Storm cluster to do real time learning Can host web site on NAS Can use Flume cluster to import data from Kafka to Hadoop Can record long-term history on Hadoop Cluster

How many clusters?


Kafka

Kafka Cluster

Kafka Cluster

Kafka Cluster

Storm

Users

Web Site

Kafka API

Web Service NAS

Design Targeting

Hadoop

HDFS Data

Flume


That is a lot of moving parts!


Alternative Design

Can host log catcher on MapR via NFS Storm can read data directly from queue Can host web server directly on cluster

Only one cluster needed– Total instances drops by 3x– Admin burden massively decreased


Users

Catcher Storm

Topic Queue

Web-server

http

Web Data

MapR


You can do thisyourself!


Contact Me!

We’re hiring at MapR in US and Europe

MapR software available for research use

Contact me at [email protected] or @ted_dunning

Share news with @apachemahout

Tweet #devoxxfr #mapr #mahout @ted_dunning


devoxx real-time learning

Technology

mapr technologies committer

best probability

best shell lemma

distributions pick shell

pq d pick i

algorithm sample q

w x q i ij

best designers