How R Transformed the Analytics Paradigm at Millward Brown - Juan Manuel Hernández
TRANSCRIPT
1
How R Transformed the Analytics Paradigm at Millward Brown
Juan Manuel Hernández
2
help(Millward Brown)
WHAT WE DO FOR OUR CLIENTS
3
library(Millward Brown)
For more information, go to www.millwardbrown.com
40 years collecting consumer attitudinal data - demographics, social, economic, and cultural habits & opinions.
Marketing communications, media, digital and brand equity research. A lot of touch points <- attitudinal and behavioural data
We work with 90% of the world’s leading brands. Categories and consumer profiles <- data of all types
Offices in 56 countries. Global/Regional <- data from almost anywhere.
Brand Strategy
Creative Development
Channel Optimization
Brand Performance
4
data(BrandZ™)
BrandZ™ is just one example of the type and magnitude of studies carried out by MB around the world.
BrandZ™ is the world's largest brand equity database. Created in 1998 and continually updated. It contains data on brands gathered from interviews with over 150,000 people every year in up to 400 studies around the world.
5
MB_2014 <- sum(seq_along(MB))
# THE JOURNEY TO A NEW ANALYTICS PARADIGM
6
library(BrandDynamics)
Analytics originally written in SPSS!
Measuring brand equity with BrandDynamics™
[Figure: Voltage2.0 Brand Map. Axes: Presence (Low to High) and Voltage 2.0 (Low to High). Quadrants: Strong Equity (large strong brands), Declining Equity (large weaker brands), Little Equity (small weaker brands), Growing Equity (small strong brands).]
7
ts(BrandDynamics™)
20 years provide a lot of learning - conceptual, analytical, and operational. What, why, where, when, and how to analyse brand equity?
[Timeline figure, 1992 to 2012. Year labels shown: 1992, 1996, 1998, 2003, 2003, 2005, 2009, 2010, 2010, 2012. Milestones shown: Launch of BrandDynamics; Launch of BRANDZ; Launch of the ‘Paw Print’ analysis; Bonding Factor analysis; Launch of D&A; Meaningfully Different Framework; Development of Voltage2.0; Development of the Brand Strength Score; Development of the Value Driver workshops.]
8
The new framework required a new calculation engine.
Traditional software development teams aren’t usually skilled in high-level statistics.
Real, enterprise software is much more than making sure calculations are correct.
Error in library(MDf) : no package called ‘MDf’
The original development estimate was 2 years!
9
Wikipedia :: define(“enterprise software”)
Traditional statisticians have never even heard of enterprise software!
“Enterprise software, also known as enterprise software application (ESA), is purposed-designed computer software used to satisfy the needs of an organization rather than individual users […] Enterprise software is an integral part of a (computer based) Information System, and as such includes web site software production.” - http://en.wikipedia.org/wiki/Enterprise_software
Global, enterprise systems have to consider:
• Systems architecture
• Support
• Deployment
• Source control
• General software dev. best practices
• Development programs (e.g. versioning, agile vs. waterfall development, etc.)
10
Open Source – We could build a free prototype
Statistical Power – Endless, dynamic array of statistical and data processing capabilities
There were hidden demons - an R script/package is not the same as an enterprise analytic system
install.packages(MB_R) # the New Calculation Engine
11
Unsupported
Library quality
Resource Management
“Unknown language”
“R encountered a fatal error”
Open-source software, like R, can be difficult to manage for the enterprise.
12
installed.packages() # Developers & R Statisticians
R was an appealing solution, but very little expertise was available.
An imbalance between the new requirements for analytics systems and the skill sets available became apparent immediately.
Many of our solutions would require a degree of automation of statistical analyses that our developers could not deliver fast enough.
[Diagram: Requirement vs. Skills, with the labels Developers, R, and Stats.]
13
load(R Leap of Faith)
# MB BOLDLY GOES FOR R
14
A Universe of Data <- 40 Years
Our analyses are cleverly designed to be applied at a global, generic level, maximizing insight and minimizing noise through dynamic learning.
“Pathological” Data <- Messy
Survey data is always challenging. Over time, big improvements in data collection have been made, but many unexpected inconsistencies and biases remain a constant presence that we need to control.
A difficult mixture of effects on data quality and tidiness makes harnessing so much data a major challenge, and Millward Brown has a lot of data.
15
DESCRIPTION
We went for it and came up with a plan. An R-based enterprise system would require a wrapper to deliver and supply services to the calculation engine. R would have to cooperate with other programming languages.
Agile Development
Calculation Engine
Input Loading
Validation
Analytics
Output
Wrapper
Distribution to Users
Internal Infrastructure
16
sapply(Validation)
Business Logic
Data Validation
Validation routines allowed us to control and enhance the level of flexibility in our analyses and systems.
Validating inputs guarantees required input consistency.
Provides (near immediate) feedback to the user if sufficient conditions aren’t met for successful processing.
Happens promptly to avoid wasting time.
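The validation idea above can be sketched in a few lines of base R. This is only an illustration, not the Beast's actual code: the function name `validate_input` and the column names `respondent_id`, `brand`, and `score` are hypothetical. It checks inputs before any analysis runs and returns every problem found, so the user gets feedback straight away.

```r
# Hypothetical validator (names and required columns are assumptions).
# It collects all problems so the user sees them in one pass.
validate_input <- function(df, required_cols = c("respondent_id", "brand", "score")) {
  problems <- character(0)

  # Nothing else can be checked if the input is not tabular
  if (!is.data.frame(df)) {
    return("input is not a data frame")
  }

  # Required columns must all be present
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }

  # There must be something to analyse
  if (nrow(df) == 0) {
    problems <- c(problems, "no rows to analyse")
  }

  # Scores must be numeric before any statistics are computed
  if ("score" %in% names(df) && !is.numeric(df$score)) {
    problems <- c(problems, "column 'score' must be numeric")
  }

  problems
}

good <- data.frame(respondent_id = 1:3,
                   brand = c("A", "B", "A"),
                   score = c(4, 5, 3))
bad <- data.frame(respondent_id = 1:3, score = c("4", "5", "3"))

validate_input(good)  # no problems reported
validate_input(bad)   # reports the missing column and the non-numeric score
```

Returning a vector of messages rather than stopping at the first error is one possible design choice; it lets a wrapper show the user the full list before any processing time is spent.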
17
• Quickly embraced R’s excellent data manipulation functionality.
• Implementing any sort of statistical analysis/model was possible.
• Vibrant open-source community provides the best possible support, if you know how to harness it!
• Learning process is difficult but rewarding.
MB_Analytics <- function(BrandEquity)
We quickly learned we could do everything we required in R.
18
read/write.output(MB)
We could plug R into a system/architecture that would make the most of its analytical capabilities.
R can read in data from almost any data source.
R can generate pretty much any type of output.
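As a minimal sketch of that read-in, analyse, write-out round trip, the base R example below uses CSV files only. The data, column names, and temporary files are made up for illustration; other sources such as SPSS or SAS files would typically need an add-on package, which is not shown here.

```r
# Hypothetical respondent-level data, written to a temporary CSV
# so the example is self-contained.
survey <- data.frame(brand = c("A", "A", "B", "B"),
                     score = c(4, 2, 5, 3))
in_file <- tempfile(fileext = ".csv")
write.csv(survey, in_file, row.names = FALSE)

# Input: read the data back in, as a calculation engine would
loaded <- read.csv(in_file)

# Analytics: mean score per brand
summary_tbl <- aggregate(score ~ brand, data = loaded, FUN = mean)

# Output: write the summary for a downstream wrapper or report
out_file <- tempfile(fileext = ".csv")
write.csv(summary_tbl, out_file, row.names = FALSE)

print(summary_tbl)
```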
19
require(Beast)
The Beast is MB’s R-based analytics service.
“’Brand Equity Analytics…’, get the name to spell ‘Beast’ and you’re onto something.” – Dale Smith, Global Head of Analytical Innovations, MB
20
SPSS, CSV, SAS
str(Beast)
R
Outputs
Enriched respondent level data
(Tabs, dashboards)
Summary reports (XLS, PPT, PDF)
Normative database
Wrapper
Inputs
What started out as a calculation engine quickly evolved into a grand vision that empowered MB’s statistical minds.
Analytics contributed by MB’s statistical community
21
print(MB_Beast)
# LESSONS LEARNED, BLOWS TAKEN, REWARDS REAPED
22
Warning: Planning returned NA
As statisticians, we had to learn a great deal about standard software best practices that are alien to us by nature!
Versioning & Source Control
• R, CRAN, & custom library versions
• As the team grows, how will several programmers contribute code?
Single Analysis vs. Process
• Processing time, memory usage
• Hardware vs. code optimization
Testing & Exception Handling
• Unit, regression testing
• Error handling
Code/Package Structure & Quality
• Code re-usability
• Documentation
• Classes, methods, functions, services
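To make the testing and exception handling items concrete, here is a small base R sketch. The names `brand_mean` and `safe_run` are hypothetical and only illustrate the pattern: an analysis step that fails loudly on bad input, a wrapper that turns errors into a structured result a calling service could act on, and a couple of unit-style checks.

```r
# Hypothetical analysis step: fails loudly on unusable input
brand_mean <- function(scores) {
  if (!is.numeric(scores) || length(scores) == 0) {
    stop("scores must be a non-empty numeric vector")
  }
  mean(scores, na.rm = TRUE)
}

# Hypothetical wrapper: converts any error into a structured result
# instead of letting it crash the whole job
safe_run <- function(f, ...) {
  tryCatch(
    list(ok = TRUE, value = f(...), message = NA_character_),
    error = function(e) {
      list(ok = FALSE, value = NA, message = conditionMessage(e))
    }
  )
}

# Unit-style checks: a known answer and a known failure
stopifnot(brand_mean(c(1, 2, 3)) == 2)
res <- safe_run(brand_mean, "oops")
stopifnot(!res$ok)
```

In a packaged code base, checks like these would normally live in a dedicated test suite and be re-run on every change, which is what regression testing amounts to in practice.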
sessionInfo() # R & the Beast over time
23
The Beast
Timeline: January 2013, March 2013, May 2013, August 2013, August 2014
Phase 1: Prototype built. Single, lonely R programmer wrote one epic script run locally!
Phase 2: The Beast is born! An app-based .Net wrapper is created to deliver functionality. Beast team: 1 R and 1 .Net developer.
Phase 3: 2 developers for each language. The Beast is packaged!
Phase 4: The Beast code is structured in line with a service-oriented architecture. 3 R developers!
Beast Community: MB now has 6 R developers in the Global Analytics team. MB Global R Community is born: 70 active members worldwide.
24
summary(Beast)
Today, the Beast is at the heart of much of what we deliver to our clients, with around 1000 different jobs being processed over the last 12 months around the world.
[Chart values: 10%, 5%, 13%, 6%, 31%, 35%]
25
Statisticians aren’t software developers, but they can learn and harness the best of both worlds.
Building enterprise software with R is challenging: open source languages require you to consider things you wouldn’t otherwise have to worry about.
R can communicate with all sorts of platforms, enabling efficient gateways for analytics system success. Statisticians and developers make powerful allies!
return(Beast)
26
How R Transformed the Analytics Paradigm at Millward Brown
Juan Manuel Hernández