Open Source for Customer Analytics

Matthias Funke, Business & Technology Consultant

Agenda Topics

Open Source Software

Data Products

The “Data Process”

Tying it together

Open Source Software

Examples: Linux, LibreOffice, Eclipse, Hadoop

Source Code open, e.g. github.com (>3M users, 6.8M repos)

Governed by foundations, e.g. Apache Software Foundation, Free Software Foundation

Contributors / committers: Academia, start-ups, corporations, specialised OSS companies

Popular Apache Software Projects

Project      Donated by
Cassandra    Facebook (2008)
Storm        Twitter (2013)
Hadoop       Yahoo (2008)
Kafka        LinkedIn

Apache Software Foundation Sponsors

Google, Yahoo, Microsoft, Facebook, Citrix…

HP, IBM, Hortonworks, Cloudera, Comcast

Auto & General, Huawei, Pivotal, …

Talend, Twitter

Benefits, Drawbacks & Facts

Benefits

● No licence cost
● Huge amount of knowledge in the community
● High speed of innovation
● Funny names

Drawbacks

● Overwhelming choices
● Varying maturity
● Skills challenge (for newer projects)

Facts of Life

● Professional services / support not free

“Data Products”

Core: valuable data. Tools to display and manipulate.

Good: live, visual, searchable

Types:

● Exploratory
● Internal production
● Publicly facing (but free)
● Commercial = monetised

VOLUME

VARIETY

VELOCITY

VERACITY

Popular Data Products

Google Flights (not a booking engine!)

CIA World Fact Book (simple presentation)

Inside AirBnB (“activist”)

data.gov.uk

The Data Process

1. Obtain data
2. Explore & clean data
3. Analyse & model
4. Visualise
5. Productionise & automate Data Pipeline
   a. How and where to distribute?
   b. How to scale?
   c. How to secure?
   d. How to manage day-to-day?
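The five steps above can be sketched as a chain of small functions. This is a minimal illustration only: the inline sample data, the column names and the function names are hypothetical, and a real pipeline would read from files, APIs or a cluster rather than a string.

```python
import csv
import io
import statistics

# Hypothetical inline sample standing in for a downloaded listings file.
RAW_CSV = """neighbourhood,price,availability_365
Camden,120,200
Camden,,365
Hackney,80,15
Hackney,95,90
"""

def obtain(text):
    """Step 1: read raw rows (here from a string instead of a file or API)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Step 2: drop rows with a missing price and convert types."""
    cleaned = []
    for row in rows:
        if not row["price"]:
            continue
        cleaned.append({
            "neighbourhood": row["neighbourhood"],
            "price": float(row["price"]),
            "availability_365": int(row["availability_365"]),
        })
    return cleaned

def analyse(rows):
    """Step 3: mean price per neighbourhood."""
    by_area = {}
    for row in rows:
        by_area.setdefault(row["neighbourhood"], []).append(row["price"])
    return {area: statistics.mean(prices) for area, prices in by_area.items()}

def visualise(summary):
    """Step 4: crude text bar chart, one '#' per 10 price units."""
    return [f"{area:<10} {'#' * int(value // 10)} {value:.1f}"
            for area, value in sorted(summary.items())]

def run_pipeline(text):
    """Step 5: chain the steps so the whole run can be automated."""
    return visualise(analyse(clean(obtain(text))))

if __name__ == "__main__":
    for line in run_pipeline(RAW_CSV):
        print(line)
```

Keeping each step a separate function makes it easier to later move a single step (for example the analysis) onto a scheduler or a cluster without rewriting the others.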

Data Exploration on One PC

Using ggplot2 for exploratory graphs

library(ggplot2)

# host: data frame of listings, assumed to be loaded already
qplot(host$availability_365,
      geom = "histogram",
      binwidth = 5,
      main = "Histogram for Availability",
      xlab = "AirBnB in London",
      fill = I("blue"))
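To make the binwidth argument concrete, the following sketch counts values per bin of width 5 in plain Python. The sample values are hypothetical, and ggplot2 may align its bin edges differently from the simple floor-based rule assumed here.

```python
from collections import Counter

def histogram_counts(values, binwidth=5):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter((v // binwidth) * binwidth for v in values)
    return dict(sorted(counts.items()))

# Hypothetical availability values (days per year), not real listing data.
availability = [0, 3, 4, 5, 9, 12, 364, 365]

if __name__ == "__main__":
    print(histogram_counts(availability))
```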

Statistical Analysis

SIMPLE

● Sum, Count, Mean / Median

● Variance / Standard Deviation

E.g. Average Revenue per User per Neighbourhood (by Month of the Year)
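The simple statistics above, including the revenue-per-user example, might be computed as follows. This is a sketch under stated assumptions: the booking records, their field layout and the function name are invented for illustration, and variance and standard deviation are the sample versions.

```python
import statistics
from collections import defaultdict

# Hypothetical records: (user_id, neighbourhood, month, revenue).
bookings = [
    ("u1", "Camden", 1, 100.0),
    ("u1", "Camden", 1, 50.0),
    ("u2", "Camden", 1, 250.0),
    ("u3", "Hackney", 1, 80.0),
    ("u3", "Hackney", 2, 120.0),
]

def arpu_by_neighbourhood_month(records):
    """Average revenue per user for each (neighbourhood, month) pair."""
    # First total the revenue of each user within a neighbourhood and month.
    per_user = defaultdict(float)
    for user, area, month, revenue in records:
        per_user[(area, month, user)] += revenue
    # Then average those per-user totals within each group.
    grouped = defaultdict(list)
    for (area, month, _user), total in per_user.items():
        grouped[(area, month)].append(total)
    return {key: statistics.mean(totals) for key, totals in grouped.items()}

revenues = [revenue for *_rest, revenue in bookings]
summary = {
    "sum": sum(revenues),
    "count": len(revenues),
    "mean": statistics.mean(revenues),
    "median": statistics.median(revenues),
    "variance": statistics.variance(revenues),  # sample variance
    "stdev": statistics.stdev(revenues),        # sample standard deviation
}

if __name__ == "__main__":
    print(summary)
    print(arpu_by_neighbourhood_month(bookings))
```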

MORE COMPLEX

● Clustering

● Covariance matrix (dependencies between variables)

● Predictive Models

● Machine Learning
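As one example from the list above, a covariance matrix can be computed by hand to show what it measures. The sketch below uses only plain Python; the two columns and their values are hypothetical, and the sample (n - 1) form of covariance is assumed.

```python
def covariance_matrix(columns):
    """Sample covariance matrix for a dict of equally long numeric columns."""
    names = list(columns)
    n = len(columns[names[0]])
    means = {name: sum(columns[name]) / n for name in names}
    matrix = {}
    for a in names:
        for b in names:
            total = sum((x - means[a]) * (y - means[b])
                        for x, y in zip(columns[a], columns[b]))
            matrix[(a, b)] = total / (n - 1)
    return matrix

# Hypothetical listing data, not taken from a real data set.
listings = {
    "price": [50.0, 80.0, 110.0, 140.0],
    "availability_365": [300.0, 200.0, 150.0, 50.0],
}
cov = covariance_matrix(listings)

if __name__ == "__main__":
    for pair, value in cov.items():
        print(pair, round(value, 2))
```

A negative off-diagonal entry, as in this made-up sample, indicates that one variable tends to fall as the other rises; the diagonal entries are the variances of the individual columns.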

Big Data Architectures (simplified)

“Big” Database Hadoop Cluster / File System

Query Engine (Data Access)

Execution Engine (Business Logic)

Search Engine (Accessibility)

Visualisation Layer

Visualisation using KIBANA

Trusted Analytics Platform - Brand New OSS

Interactive Notebooks

New breed of software to work interactively on data

Spark/Scala Notebook

Apache Zeppelin

Databricks: cloud (proprietary but built on Spark)
