big data solutions for advanced marketing analytics

Big Data Solutions for Marketing Analytics Natalino Busa @natalinobusa

Upload: natalino-busa

Post on 28-Nov-2014


Category:

Technology



DESCRIPTION

Now more than ever, the retail banking market demands that we stay close to our customers and carefully understand which services, products, and wishes are relevant to each customer at any given time. This sort of marketing research is often beyond the capacity of traditional BI reporting frameworks. In this talk, we illustrate how we team up data scientists and big data engineers to create and scale distributed analyses on a big data platform. By using Hadoop and open source statistical languages and tools such as R and Python, we can execute a variety of machine learning algorithms and scale them out on a distributed computing framework.

TRANSCRIPT

Page 1: Big data solutions for advanced marketing analytics

Big Data Solutions for Marketing Analytics

Natalino Busa @natalinobusa

Page 2: Big data solutions for advanced marketing analytics

Parallelism Hadoop Cassandra Akka

Machine Learning Statistics Big Data

Algorithms Cloud Computing Scala Spray

Natalino Busa @natalinobusa

www.natalinobusa.com

Page 3: Big data solutions for advanced marketing analytics

Humanize Data

Page 4: Big data solutions for advanced marketing analytics

The bank statements

Page 5: Big data solutions for advanced marketing analytics

Back to routine. Grocery, broken washing machine.

After-vacation fun. Pancake house.

Traveling back.

Just back home. Pizza.

Shopping in Sicily. Vacation!

The bank statements How I read the bank bills

Page 6: Big data solutions for advanced marketing analytics

Back to routine. Grocery, broken washing machine.

After-vacation fun. Pancake house.

Traveling back.

Just back home. Pizza.

Shopping in Sicily. Vacation!

The bank statements How I read the bank bills What happened those days

Page 7: Big data solutions for advanced marketing analytics

Data is the fabric of our lives. Let’s give more meaning and context to data.

Page 8: Big data solutions for advanced marketing analytics

Abraham Harold Maslow (April 1, 1908 – June 8, 1970) was an American psychologist who was best known for creating Maslow's hierarchy of needs

Page 9: Big data solutions for advanced marketing analytics

Very human needs

Physiology: breathing, food, water, sleep

Contractual: security of body, resources, health, employment, property

Love & Caring: friends, family, partner, security of love and belonging

Esteem: self-esteem, confidence, achievements, respect

Self-actualization: spontaneity, creativity, acceptance, freedom, ethics

Page 10: Big data solutions for advanced marketing analytics

How much caring can technology be?

Page 11: Big data solutions for advanced marketing analytics

Technology is reaching out

Physiology: connectivity, electricity, hardware / infra

Contractual: security of basic operations, REST APIs, encryption, authentication

Love & Caring: notifications, alerts, social bonding, predictions

Esteem: set goals, planning, achievements, advisory role

Self-actualization: freedom, trusted companion

Page 12: Big data solutions for advanced marketing analytics

Data science top 3

- Dimensionality Reduction

- Predictive Analytics

- Clustering, Segmentation

Page 13: Big data solutions for advanced marketing analytics

Data science: what’s working?

- Random Forests

- Artificial Neural Networks

- Clustering Algorithms

- Pattern Recognition

- Time-series analysis

- Regression

Most actual models are a combination of these.

Page 14: Big data solutions for advanced marketing analytics

Data science ^.^/

keep it scientific

cross-validate your models

keep it measurable

play with it

create new features

explore the available data
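
As an illustration of "cross-validate your models", here is a minimal sketch of k-fold cross-validation in plain Python. The helper names (`k_fold_indices`, `fit_line`, `cross_validate`) and the toy data are assumptions made for this example and are not from the talk; folds are contiguous and unshuffled to keep the sketch short, whereas real data would normally be shuffled first.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x for a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def cross_validate(xs, ys, k=5):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        a, b = fit_line(train_x, train_y)   # fit only on the training folds
        scores.append(mean((ys[i] - (a + b * xs[i])) ** 2 for i in test_idx))
    return scores

# Toy data (made up for illustration): an exactly linear relation y = 2x + 1
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
fold_errors = cross_validate(xs, ys, k=5)
```

Each entry of `fold_errors` is measured on data the model did not see during fitting, which is what keeps the evaluation "scientific" and "measurable" in the sense of this slide.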

Page 15: Big data solutions for advanced marketing analytics

How to code data science?

Page 16: Big data solutions for advanced marketing analytics

# Multiple Linear Regression Example

fit <- lm(y ~ x1 + x2 + x3, data=mydata)

summary(fit) # show results

● Language for statistics
● Easy to analyze and shape data
● Advanced statistical packages
● Fueled by academia and professionals
● Very clean visualization packages

Packages for machine learning: time series forecasting, clustering, classification, decision trees, neural networks

Remote procedure calls (RPC): from Scala/Java via RProcess and Rserve

Data Science: R

Page 17: Big data solutions for advanced marketing analytics

>>> from sklearn.datasets import load_iris
>>> from sklearn import tree
>>> iris = load_iris()
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(iris.data, iris.target)

● Flexible, concise language
● Quick to code and prototype
● Portable, visualization libraries

Machine learning libraries: scipy, statsmodels, sklearn, matplotlib, ipython

Web libraries: flask, tornado, (no)SQL clients

Data Science: Python

Page 18: Big data solutions for advanced marketing analytics

Earn the trust

Page 19: Big data solutions for advanced marketing analytics

The customer’s context

Personal history: amount of transactions ever done

Long-term interaction: how a user's actions correlate with those of others

Real-time events: trends and recent events

Page 20: Big data solutions for advanced marketing analytics

The customer’s context

context is related to time:

slow changing: the defining characteristic of a person

fast changing: events which influence our lives, trends

These require very different technology solutions!
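
A minimal sketch of how the two time scales might sit side by side in one record. The class and field names (`CustomerContext`, `segment`, `lifetime_transactions`, `recent_events`) and the sample values are hypothetical and chosen only for this example: the slow-changing part is assumed to be refreshed by an occasional batch job, while the fast-changing part is a small bounded window of recent events.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class CustomerContext:
    # Slow-changing part: assumed to be recomputed occasionally by a batch job
    segment: str
    lifetime_transactions: int
    # Fast-changing part: only the three most recent events are kept
    recent_events: deque = field(default_factory=lambda: deque(maxlen=3))

    def record(self, event: str) -> None:
        """Append a real-time event; the oldest is dropped once the window is full."""
        self.recent_events.append(event)

# Made-up customer and events, for illustration only
ctx = CustomerContext(segment="frequent traveller", lifetime_transactions=1200)
for event in ["flight booking", "hotel", "pizza", "grocery"]:
    ctx.record(event)
```

After the loop, `ctx.recent_events` holds only the last three events, while the slow-changing fields are untouched until the next batch refresh.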

Page 21: Big data solutions for advanced marketing analytics

Challenges

Not much time to react
● Events must be delivered fast to the new machine APIs
● It’s Web and Mobile Apps: the latency budget is limited

Loads of information to process
● Understand the user history well
● Access a larger context

Page 22: Big data solutions for advanced marketing analytics

Big Data and Fast data

ranking and preference

segmentation and clustering

short term trending topics

rule-based recommendations

Tens of terabytes of data. This can take hours…

Hundreds of events per second. This must be fast…
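
On the fast-data side, "short term trending topics" can be sketched as a count over a sliding time window. The class `TrendingWindow`, its parameters, and the sample events below are assumptions made for this example, and the sketch assumes events arrive in timestamp order.

```python
from collections import Counter, deque

class TrendingWindow:
    """Counts topics seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events = deque()        # (timestamp, topic), oldest first
        self.counts = Counter()

    def add(self, timestamp: float, topic: str) -> None:
        self.events.append((timestamp, topic))
        self.counts[topic] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop events that have fallen out of the window
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_topic = self.events.popleft()
            self.counts[old_topic] -= 1
            if self.counts[old_topic] == 0:
                del self.counts[old_topic]

    def top(self, n: int):
        return self.counts.most_common(n)

# Made-up event stream: (seconds since start, topic)
window = TrendingWindow(window_seconds=60)
for ts, topic in [(0, "pizza"), (10, "travel"), (20, "travel"),
                  (70, "grocery"), (75, "travel")]:
    window.add(ts, topic)
```

Each update touches only the newest event and the expired ones, which is the kind of constant, small amount of work a limited latency budget calls for; the hours-long batch computations stay on the big-data side.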

Page 23: Big data solutions for advanced marketing analytics

Back to the drawing board

Page 24: Big data solutions for advanced marketing analytics

core banking systems

SOAP services and DBs

System BUS

customer-facing apps

channels

A high-level bank schematic

Page 25: Big data solutions for advanced marketing analytics

Higher separation!

Fewer silos

Interactions with core systems

Bigger and Faster

Page 26: Big data solutions for advanced marketing analytics

Human-centric applications

Page 27: Big data solutions for advanced marketing analytics

Some techs

Page 28: Big data solutions for advanced marketing analytics

Hadoop: Distributed Data OS

Reliable: distributed, replicated file system

Low cost: ↓ cost vs ↑ performance/storage

Computing powerhouse: all of the cluster's CPUs work in parallel to run queries
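
The "all CPUs working in parallel" idea follows the map/reduce pattern: each partition of the data is summarized independently, then the partial results are merged. The sketch below is plain single-process Python, not the Hadoop API; the function names and the sample transactions are made up for illustration.

```python
from collections import Counter
from functools import reduce

def map_partition(transactions):
    """Map step: total spend per category within one partition."""
    totals = Counter()
    for category, amount in transactions:
        totals[category] += amount
    return totals

def merge(left, right):
    """Reduce step: combine two partial results (amounts here are all positive)."""
    return left + right

# Made-up data, already split into three partitions
partitions = [
    [("grocery", 40), ("travel", 300)],
    [("grocery", 25), ("pizza", 15)],
    [("travel", 120)],
]

# On a cluster, each partition would be processed on a different node in parallel
partials = [map_partition(p) for p in partitions]
totals = reduce(merge, partials, Counter())
```

Because `map_partition` never looks outside its own partition, the map step can be spread over as many machines as there are partitions.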

Page 29: Big data solutions for advanced marketing analytics

Cassandra: A low-latency 2D store

Reliable: distributed, replicated data store

Low latency: sub-millisecond read/write operations

Tunable CAP: define your level of consistency

Data model: hashed rows, sorted wide columns

Architecture model: no SPOF, ring of nodes, homogeneous system
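
The "hashed rows, sorted wide columns" data model can be pictured with a toy in-memory store: the row key is hashed to choose a node, and within a row the columns are kept sorted by name so that range slices are cheap. This is only an illustration, not the Cassandra API; the class `WideRowStore`, its methods, and the sample data are hypothetical, and the modulo placement stands in for the token ring that a real cluster uses.

```python
import bisect
import hashlib

class WideRowStore:
    """Toy model: row key hashed to a node, columns kept sorted by name."""

    def __init__(self, node_count: int):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, row_key: str) -> dict:
        # Hash the row key so that rows spread evenly over the nodes
        digest = hashlib.sha256(row_key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, row_key: str, column: str, value) -> None:
        row = self._node_for(row_key).setdefault(row_key, [])
        names = [name for name, _ in row]
        pos = bisect.bisect_left(names, column)
        if pos < len(row) and row[pos][0] == column:
            row[pos] = (column, value)        # overwrite an existing column
        else:
            row.insert(pos, (column, value))  # keep columns sorted

    def slice(self, row_key: str, start: str, end: str):
        """Return columns with start <= name < end, in sorted order."""
        row = self._node_for(row_key).get(row_key, [])
        names = [name for name, _ in row]
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_left(names, end)
        return row[lo:hi]

# Made-up example: one row per customer, one column per transaction date
store = WideRowStore(node_count=3)
store.put("customer-42", "2014-08-03", "pizza")
store.put("customer-42", "2014-08-01", "shopping in Sicily")
store.put("customer-42", "2014-08-10", "grocery")
first_week = store.slice("customer-42", "2014-08-01", "2014-08-08")
```

Using the date as the column name means a customer's history for a period is one contiguous slice of a single row, which suits the low-latency reads this slide describes.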

Page 30: Big data solutions for advanced marketing analytics

Scala / Akka / Spray: a reactive web API framework

[Diagram: Actor A, Actor B, and Actor C exchanging messages msg 1, msg 2, msg 3, msg 4]

● it scales horizontally (can run in cluster mode)

● maximum use of the available cores/memory

● processing is non-blocking, threads are re-used

● can parallelize computing power across many actors

Very fast: 1000s of messages/sec

Very reliable: auto recovery

Lazy: compute only when required
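
The actor idea on this slide (private state, a mailbox, non-blocking message processing) can be sketched in Python with `asyncio`. This is an analogy only and not Akka's API; the `CounterActor` class and its `tell` method are names invented for this example.

```python
import asyncio

class CounterActor:
    """An actor owns its state and handles one mailbox message at a time."""

    def __init__(self):
        self.mailbox = asyncio.Queue()
        self.total = 0

    async def run(self):
        while True:
            message = await self.mailbox.get()   # waits without blocking a thread
            if message is None:                  # sentinel: stop the actor
                break
            self.total += message

    async def tell(self, message):
        """Fire-and-forget send: just enqueue the message."""
        await self.mailbox.put(message)

async def main():
    actor = CounterActor()
    task = asyncio.create_task(actor.run())
    for amount in [1, 2, 3, 4]:
        await actor.tell(amount)
    await actor.tell(None)
    await task
    return actor.total

result = asyncio.run(main())
```

Since only the actor's own loop touches `total`, no locks are needed; senders interact with it purely through messages.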

Page 31: Big data solutions for advanced marketing analytics

Putting it all together

Hadoop

application (actor based)

millions of millions of

λ = conversions (lambda)

Data queues

Page 32: Big data solutions for advanced marketing analytics

Science & Engineering

Statistics, Data Science

Python, R, Visualization

IT Infra, Big Data

Java, Scala, SQL

Hadoop: Big Data Infrastructure, Data Science on large datasets

Big Data and Fast Data require different profiles to achieve the best results

Page 33: Big data solutions for advanced marketing analytics

Some lessons learned

● Mixing and matching technologies is a good thing
● Fast Data must complement Big Data
● Ease integration among teams
● Hadoop, Cassandra, and Akka
● Data Science takes time to figure out

Page 34: Big data solutions for advanced marketing analytics

Parallelism Mathematics Programming

Languages Machine Learning Statistics

Big Data Algorithms Cloud Computing

Natalino Busa @natalinobusa

www.natalinobusa.com

Thanks! Any questions?