Journée Thématique Big Data
perso.univ-lyon1.fr/khalid.benabdeslem/ihp/o_giroud.pdf
TRANSCRIPT
1
Journée Thématique Big Data
13/03/2015
2
Agenda
• About Flaminem
• What Do We Want To Predict?
• What Is The Machine Learning Theory Behind It?
• How Does It Work In Practice?
• What Is Happening When Data Gets Big? …And How Big Is Big?
• MapReduce And Spark Explained With Card Games
3
We Collect Browsing Data Way Beyond Our Clients' Properties And Analyze Online Behaviors to Predict Intent
[Diagram: our clients' properties within the wider Internet]
Some figures in France:
• We track the online journeys of 150 million cookies (of which 50% mobile, the other half on tablet/PC)…
• …on 1 million websites in France (via HTTP cookies with opt-out)
• …and detect, for each single cookie, about 5 to 30 visits each active day
• We analyze the context of these visits thanks to our automated classification engine (based on semantic analysis of page content)
4
We Find the Best Moments to Engage a Conversation With Our Clients' Customers and Prospective Customers
[Diagram: Customer Decision Journey (Consideration → Intention → Purchase Decision) plotted against level of intent]
• What we predict: complex and rare decisions taking months to consider, evaluate and decide. Examples: car and real-estate purchases, subscriptions to insurance or credit.
• Not in our focus: impulse purchases. Examples: eCommerce, travel.
5
Our Solutions Are Based On Modelling And Predicting Online Behaviors
• Data for Client Acquisition: upper-funnel and mid-funnel targeting segments for RTB; Paid Search optimization data
• Client Scoring and Lead Scoring for Call Centers: churn scoring; cross-sell and up-sell scoring; lead scoring for use in call centers to optimize lead conversions
• Price Sensitivity Scoring (R&D): scores to assess prospects' / clients' price sensitivity to improve product pricing
6
What Do We Want To Predict?
• We predict when a user fills an online form: a “client conversion”
• We represent the conversion with a binary variable $y \in \{0, 1\}$
• In the general case we call $y$ the outputs (or responses)
• To predict conversion, we learn from features (e.g., browsing history), as sketched after this slide
• We represent these input variables (called features or predictors) with a $d$-dimensional vector $x_i \in \mathbb{R}^d$
• For each example, the goal is to use the inputs to predict the value of the outputs. We call this exercise Supervised Learning
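As a concrete illustration (not from the slides), a minimal Python sketch of this setup; the feature names and values are invented:

import numpy as np

# Each row is one cookie: d = 4 hypothetical browsing-history features
# (e.g., counts of visits to car, real-estate, insurance and credit pages).
X = np.array([
    [12, 0, 3, 1],   # cookie 1
    [ 0, 5, 0, 0],   # cookie 2
    [ 7, 1, 9, 4],   # cookie 3
])
# Binary outputs: 1 if the cookie later filled an online form ("client conversion").
y = np.array([1, 0, 1])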
7
What Is The Machine Learning Theory Behind It?
• We consider the outputs $y_i = f(x_i)$
• We want to minimize the risk $\mathbb{E}_{P_{X,Y}}\left[\, l(y, f(x)) \,\right]$, where $l$ is the loss function
• In practice, we minimize the empirical risk (the expectation is approximated by averaging over the sample data): $\frac{1}{n}\sum_{i=1}^{n} l(y_i, f(x_i))$
• After adding a regularization term, the goal is to solve:
$$\min_f \left[\, \underbrace{\frac{1}{n}\sum_{i=1}^{n} l(y_i, f(x_i))}_{\text{loss function}} + \underbrace{\lambda\, \Omega(f)}_{\text{regularization}} \,\right]$$
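To make the objective concrete, here is a small illustrative sketch (my own example, not the presenters' code) of the regularized empirical risk for a linear model $f(x) = w \cdot x$ with logistic loss and L2 regularization $\Omega(f) = \lVert w \rVert^2$:

import numpy as np

def logistic_loss(y, score):
    # l(y, f(x)) for y in {0, 1}: negative log-likelihood of a logistic model.
    return y * np.log1p(np.exp(-score)) + (1 - y) * np.log1p(np.exp(score))

def regularized_empirical_risk(w, X, y, lam):
    # (1/n) * sum_i l(y_i, f(x_i)) + lambda * Omega(f), with f(x) = X @ w.
    scores = X @ w
    return logistic_loss(y, scores).mean() + lam * np.dot(w, w)

# Tiny usage example on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)
print(regularized_empirical_risk(np.zeros(5), X, y, lam=0.1))   # ~log(2) for w = 0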
8
How Does It Work In Practice?
Data Ingestion:
• Batch update
• Quality control
• Jobs automation and monitoring
Data Processing (a small sketch follows this slide):
• Events-to-features building
• Feature aggregation
• Feature enrichment with a temporal dimension
• Pattern analysis
• Data formatting
Machine Learning:
• Feature engineering
• Model fitting
• Hybrid optimization: 1st epoch with SGD, then high-precision convergence with L-BFGS
What tools are available when dealing with “small” data?
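For the events-to-features and aggregation steps, a small hypothetical pandas sketch (the column names and values are invented for illustration):

import pandas as pd

# Hypothetical raw events: one row per page visit (cookie, timestamp, page category).
events = pd.DataFrame({
    "cookie_id": ["a", "a", "b", "a", "b"],
    "ts": pd.to_datetime(["2015-03-01", "2015-03-02", "2015-03-02",
                          "2015-03-05", "2015-03-06"]),
    "category": ["car", "car", "travel", "insurance", "travel"],
})

# Events-to-features: count visits per (cookie, category), one row per cookie.
features = (events
            .groupby(["cookie_id", "category"]).size()
            .unstack(fill_value=0))

# Temporal enrichment: e.g., days since each cookie's last visit.
features["days_since_last_visit"] = (
    events["ts"].max() - events.groupby("cookie_id")["ts"].max()
).dt.days

print(features)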
9
What Is Happening When Data Gets Big? …By The Way, How Big Is Big?
• Data Ingestion: 10+ GB ingested daily
• Data Processing: queries (join, sort, …) on very large tables of 300+ GB
• Machine Learning: a 10M-line sub-sampled dataset with 30,000 features, very sparse (e.g., 60 active features per line; see the sketch below)
How do traditional tools behave with such constraints? They become too slow, the data is too large to fit in RAM, and it needs distributed storage.
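As a rough illustration of why sparsity matters at this scale (my own back-of-the-envelope sketch, not figures from the talk): with 30,000 features but only about 60 active per line, a compressed sparse row (CSR) representation takes a few gigabytes where a dense matrix would take terabytes.

import numpy as np
from scipy import sparse

n_rows, n_features, n_active = 10_000_000, 30_000, 60

# Dense storage: every cell materialized as a float64 (~2.4 TB, hopeless in RAM).
dense_bytes = n_rows * n_features * 8

# CSR storage: only the ~60 non-zeros per row (value + column index) plus row pointers.
csr_bytes = n_rows * n_active * (8 + 4) + (n_rows + 1) * 4   # ~7 GB

print(f"dense: {dense_bytes / 1e12:.1f} TB, sparse CSR: {csr_bytes / 1e9:.1f} GB")

# Building a tiny CSR matrix the same way (3 rows, 30,000 columns, a few non-zeros):
rows = [0, 0, 1, 2]
cols = [5, 17, 20_000, 29_999]
vals = [1.0, 3.0, 2.0, 1.0]
X = sparse.csr_matrix((vals, (rows, cols)), shape=(3, n_features))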
10
Our Tools
[Diagram: one tool per stage (logos): Data Ingestion, Data Processing, Machine Learning / Exploration, and Machine Learning at Scale + Interactive Analytics]
11
Let’s play a game!
12
Let’s play a game! Method 1: Use the RAM
[Diagram: cards 2, 3, 4, 5 kept in memory]
13
Let’s play a game!
14
Let’s play a game! Method 2: Sort the deck
[Diagram: the sorted deck; cards 7, 8 and 3]
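Assuming the game is to find which cards are missing from the deck (the slides only show the card diagrams), Methods 1 and 2 could look like this in plain Python:

# Method 1: use the RAM, i.e. hold everything in memory and take a set difference.
deck = set(range(1, 11))
seen = [1, 2, 4, 5, 6, 9, 10]            # hypothetical cards drawn so far
missing_ram = deck - set(seen)
print(sorted(missing_ram))               # [3, 7, 8]

# Method 2: sort the deck, then scan for gaps between consecutive cards.
cards = sorted(seen)
missing_sorted = [v for a, b in zip([0] + cards, cards + [11])
                  for v in range(a + 1, b)]
print(missing_sorted)                    # [3, 7, 8]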
15
Let’s play a game! Method 3: Use MapReduce
[Diagram: the deck is dealt into four hands, each missing one card; 7, 8 and 3 come out of the reduce step. Stages: Split → Map → Partition → Sort → Shuffle → Reduce]
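A toy sketch of the MapReduce version in plain Python (my own illustration of the stages; the exact hands and demo code are assumptions): the map phase emits (card, 1) pairs, the shuffle/sort groups them by card value, and the reduce phase counts copies, so any value with fewer copies than expected is missing.

from itertools import groupby

splits = [
    [1, 2, 3, 4, 5, 6, 8, 9, 10],        # 7 missing from this hand
    [1, 2, 3, 4, 5, 6, 7, 9, 10],        # 8 missing
    [1, 2, 4, 5, 6, 7, 8, 9, 10],        # 3 missing
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],     # complete hand
]

def map_phase(split):
    # Map: emit (card, 1) for every card seen in this split.
    return [(card, 1) for card in split]

# Shuffle and sort: bring all intermediate pairs together, ordered by key.
intermediate = sorted(pair for split in splits for pair in map_phase(split))

def reduce_phase(card, counts):
    # Reduce: total number of copies observed for one card value.
    return card, sum(counts)

counts = dict(reduce_phase(card, (c for _, c in group))
              for card, group in groupby(intermediate, key=lambda kv: kv[0]))

# With four hands, any value seen fewer than 4 times is missing somewhere.
missing = [card for card in range(1, 11) if counts.get(card, 0) < 4]
print(missing)   # [3, 7, 8]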
16
Let’s play a game! Method 4: Use Spark
[Diagram: the deck is split into partitions of cards 1-5 and 6-10 spread across workers and kept in memory across successive steps; the missing cards 7, 3 and 8 come out]
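The Spark version keeps the cards in memory as an RDD and expresses the same computation as a few transformations. A minimal PySpark sketch (illustrative only, not the original demo):

from pyspark import SparkContext

sc = SparkContext(appName="missing-cards")

# Hypothetical hands, as in the MapReduce sketch: each is missing one card.
splits = [
    [1, 2, 3, 4, 5, 6, 8, 9, 10],
    [1, 2, 3, 4, 5, 6, 7, 9, 10],
    [1, 2, 4, 5, 6, 7, 8, 9, 10],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
]

# parallelize plays the role of the split; the RDD stays in memory between steps.
cards = sc.parallelize([card for split in splits for card in split])

# Map each card to (card, 1), then reduceByKey to count the copies of each value.
counts = cards.map(lambda card: (card, 1)).reduceByKey(lambda a, b: a + b)

# A value with fewer than 4 copies is missing from some hand.
missing = counts.filter(lambda kv: kv[1] < 4).keys().collect()
print(sorted(missing))   # [3, 7, 8]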
17
Wrapping up
Used for: Machine Learning (Exploration)
Pros:
• Simple to use
• Wide ML library
Cons:
• You shall not scale
• Code can get messy
18
Wrapping up
Used for: Data storage + Batch Processing + SQL
Pros:
• Scalable to 1000+ machines
• Failure resilience
• Mature
• Very powerful data structures
Cons:
• Heavy to handle
• MapReduce jobs are hard to write
• SQL can’t do everything
19
Wrapping up
Used for: Machine Learning at Scale + Interactive Analytics
Pros:
• Very versatile
• Faster than MapReduce
• Easy to write jobs
• Works (not only) with Hadoop
• Easy to interface with SQL
• Strong community, fast growth
• Many connectors (Python, R)
Cons:
• Hard to master
• Young
• Less stable
• Less feature-complete
20
(A brief, non-exhaustive)
History of Hadoop (and co.)
[Timeline: Google publishes papers on its proprietary Google File System (2003), MapReduce (2004) and BigTable (2006); open-source counterparts follow in 2005-2006, with further open-source projects around 2006, 2007-2008, 2010-2013, 2010, 2011 and 2012]
22
Our Machine Learning Algorithms

Feature Engineering:
• Timestamps added to our features
• Feature selection: features’ frequencies and strengths; noisy features removed; L1 regularization to select top features; stepwise subset selection
• “User groups” classification
• “Sites/pages” classification (semi-supervised)
• “User groups” crossed with “site groups”
• Feature hierarchizing and crossing

Our Models:
• Logistic regression
• L2 regularization

Our Optimizers:
• Distributed algorithms in Spark
• Hybrid models: 1st epoch with SGD, then high-precision convergence with L-BFGS (OWL-QN for non-smooth functions); a small sketch follows this slide

Current explorations:
• Evaluation of factorization machines and multinomial logistic regression
• Finalization of decision-process modelling through a hidden Markov model (HMM)
• Gradient boosted decision trees used to generate more complex features
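To illustrate the hybrid optimization idea, here is a single-machine sketch under my own assumptions (NumPy/SciPy rather than the distributed Spark implementation): one rough SGD epoch over the L2-regularized logistic loss, then L-BFGS warm-started from that point for a high-precision finish.

import numpy as np
from scipy.optimize import minimize

def objective_and_grad(w, X, y, lam):
    # L2-regularized logistic loss and its gradient, for labels y in {0, 1}.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)) + lam * w @ w
    grad = X.T @ (p - y) / len(y) + 2 * lam * w
    return loss, grad

# Synthetic stand-in for the real training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X @ rng.normal(size=20) + rng.normal(size=1000) > 0).astype(float)
lam = 1e-3

# 1st epoch with SGD: one rough pass over the data, one example at a time.
w = np.zeros(20)
learning_rate = 0.1
for i in rng.permutation(len(y)):
    _, g = objective_and_grad(w, X[i:i + 1], y[i:i + 1], lam)
    w -= learning_rate * g

# High-precision convergence with L-BFGS, warm-started from the SGD solution.
result = minimize(objective_and_grad, w, args=(X, y, lam), jac=True, method="L-BFGS-B")
print(result.fun, result.nit)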
23
What is FlamSpark? A ML Framework sitting on top
of Apache Spark and exploiting existing mathematical
libraries (MLLib, Breeze)
Why Spark? Our models in Python couldn’t scale
above 5 Million observations and 20,000 features.
Distributed Computing was our preferred option to
scale and Spark the best framework for this
Why FlamSpark?
• Simplify data manipulation
• Rapid ML model prototyping in Spark
• Support custom mathematical features beyond
existing Libs (e.g., LBFGS & OWLQN optimizers,
ElasticNet and Adaptative regularization,
Factorization Machine Algorithm)
How does it scale?• Run Logistic Regression with 100 Million
observations and 500,000 features (on a 10
nodes cluster)
ROC + Precision / Recall
+ Segments generation
for use in RTB
Raw Data
Meet FlamSpark, Our Machine Learning Framework
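FlamSpark itself is not shown in the transcript; for a sense of what a distributed logistic regression on Spark looks like, here is a standard Spark MLlib sketch (my own illustration, not FlamSpark’s API), using a tiny sparse dataset as a stand-in for the real one:

from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("lr-at-scale").getOrCreate()

# Tiny stand-in for the real sparse training set: (label, sparse feature vector).
train = spark.createDataFrame([
    (1.0, Vectors.sparse(5, [0, 3], [1.0, 2.0])),
    (0.0, Vectors.sparse(5, [1], [1.0])),
    (1.0, Vectors.sparse(5, [0, 2, 4], [1.0, 1.0, 3.0])),
], ["label", "features"])

# L2-regularized logistic regression (elasticNetParam=0.0 means pure L2),
# trained as a distributed Spark job; the same code scales with the cluster.
lr = LogisticRegression(maxIter=50, regParam=0.01, elasticNetParam=0.0)
model = lr.fit(train)
print(model.coefficients)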