![Page 1: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/1.jpg)
Jeroen Hardon | Venture Café | March 2017
Bringing big data to life
![Page 2: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/2.jpg)
Big data is everywhere
1.3 exabytes
2.9 million per second
375 megabytes per day
24 petabytes per day
50 million per day
700 billion minutes per month
73 items per second
20 hours per minute
![Page 3: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/3.jpg)
A journey in segmentation with data scientists and big data.
![Page 4: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/4.jpg)
What was the problem?
What was the solution?
How well did it work?
![Page 5: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/5.jpg)
Original segmentation study
Needs-based segmentation
7 segments created
Classifier tool built, using 10 questions
![Page 6: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/6.jpg)
This resulted in a happy client.
![Page 7: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/7.jpg)
“Let’s tag a segment to each person in our database of 40 million”
![Page 8: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/8.jpg)
Attitudes ≠ Demographics
12000 people from the database answered the classifier questions
Those 12000 were classified into 1 of the 7 segments
Attitudinal segments were not explained by demographics
![Page 9: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/9.jpg)
New classification tool
Revised segments should align better with big data
Must predict original segments in segmentation study
Merging the 2 types of data
![Page 10: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/10.jpg)
Ensembles
The database and survey demographics did not match
We built classifiers on survey data matched to resemble the database
We generated many samples of our survey data and built an ensemble of classifiers
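The ensemble idea on this slide can be sketched roughly as follows. This is a minimal illustration, not the method used in the study: the nearest-centroid classifier, the `young`/`old` groups, the feature values, and the `database_shares` figures are all hypothetical. Each survey row is weighted so that resampled data resembles the database's demographic mix, many weighted bootstrap samples are drawn, one classifier is fitted per sample, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter


def demographic_weights(survey_groups, database_shares):
    """Weight each survey row by (database share) / (survey share) of its group."""
    counts = Counter(survey_groups)
    n = len(survey_groups)
    return [database_shares[g] / (counts[g] / n) for g in survey_groups]


def fit_centroids(rows):
    """Toy classifier (an assumption): mean feature vector per segment."""
    sums, counts = {}, Counter()
    for features, segment in rows:
        acc = sums.setdefault(segment, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[segment] += 1
    return {s: [v / counts[s] for v in acc] for s, acc in sums.items()}


def predict_one(centroids, features):
    """Assign the segment whose centroid is closest in squared distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda s: dist2(centroids[s]))


def fit_ensemble(rows, groups, database_shares, n_models=25, seed=0):
    """Fit one classifier per weighted bootstrap sample of the survey rows."""
    rng = random.Random(seed)
    weights = demographic_weights(groups, database_shares)
    models = []
    for _ in range(n_models):
        sample = rng.choices(rows, weights=weights, k=len(rows))
        models.append(fit_centroids(sample))
    return models


def predict_ensemble(models, features):
    """Majority vote across the ensemble."""
    votes = Counter(predict_one(m, features) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical survey rows: (features, segment), with one demographic group each.
rows = [
    ([0.1, 0.2], "A"), ([0.0, 0.3], "A"), ([0.2, 0.1], "A"),
    ([0.1, 0.1], "A"), ([0.2, 0.3], "A"), ([0.0, 0.2], "A"),
    ([0.9, 0.8], "B"), ([1.0, 0.7], "B"), ([0.8, 1.0], "B"),
    ([0.9, 0.9], "B"), ([1.0, 1.0], "B"), ([0.8, 0.8], "B"),
]
groups = (["young"] * 4 + ["old"] * 2) * 2
database_shares = {"young": 0.4, "old": 0.6}  # assumed database mix

models = fit_ensemble(rows, groups, database_shares)
print(predict_ensemble(models, [0.1, 0.2]))
```

Weighting by the ratio of shares is only one way to make a survey sample resemble a database; the slides do not say which matching scheme was used.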
![Page 11: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/11.jpg)
While building ensembles of classifiers helped, it was still inadequate.
We needed to strengthen the demographic / behavioral signal
![Page 12: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/12.jpg)
Expectation Maximization?
![Page 13: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/13.jpg)
Expectation Maximization
5
![Page 14: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/14.jpg)
Expectation Maximization
How do I "assign" each of the individual fruits to a tree type?
What are the characteristics of the fruit of each tree type?
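The two questions on this slide map onto the two alternating steps of expectation maximization. The sketch below assumes a one-dimensional Gaussian mixture over made-up fruit weights in grams; the data, the starting values, and the function name `em_gaussian_mixture` are illustrative only. The E-step soft-assigns each fruit to a tree type, and the M-step re-estimates each tree type's characteristics from those assignments.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gaussian_mixture(xs, means, variances, shares, n_iter=50):
    """EM for a 1-D Gaussian mixture; each component stands for one tree type."""
    means, variances, shares = list(means), list(variances), list(shares)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: "assign" each fruit to each tree type with a probability.
        resp = []
        for x in xs:
            joint = [shares[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: update the characteristics of the fruit of each tree type.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsed component
            shares[j] = nj / len(xs)
    return means, variances, shares, resp


# Hypothetical fruit weights (grams) from two unlabeled tree types.
fruit_weights = [98, 102, 100, 101, 99, 148, 152, 150, 149, 151]
means, variances, shares, resp = em_gaussian_mixture(
    fruit_weights, means=[90.0, 160.0], variances=[100.0, 100.0], shares=[0.5, 0.5]
)
print([round(m, 1) for m in means])
```

Neither question can be answered without the other, which is why the algorithm starts from a guess and iterates until the assignments and the characteristics stop changing.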
![Page 15: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/15.jpg)
Expectation Maximization
![Page 16: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/16.jpg)
Expectation Maximization
![Page 17: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/17.jpg)
Expectation Maximization
Observed data:
Initial segmentation data, 6500 respondents (known, fixed segment)
Augment of 12000 from big data (unknown segment)
+ Model 1
![Page 18: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/18.jpg)
Expectation Maximization
Observed data:
Initial segmentation data, 6500 respondents (known, fixed segment)
Augment of 12000 from big data (unknown segment)
+ Big data variables
Model 2
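One way to read the known/unknown split on these slides is as semi-supervised EM: rows with a known segment stay fixed, and only rows with an unknown segment are re-estimated. The sketch below is an assumption about how that could look, not the model built in the study. It uses a naive Bayes mixture over categorical variables, and the variables (`urban`/`rural`, `app`/`store`), segment names, and smoothing constant `alpha` are all hypothetical.

```python
def semi_supervised_em(rows, labels, segments, n_iter=20, alpha=1.0):
    """EM over categorical rows; labels[i] is a segment name or None if unknown."""
    n_feat = len(rows[0])
    values = [sorted({r[f] for r in rows}) for f in range(n_feat)]

    def one_hot(label):
        return {s: 1.0 if s == label else 0.0 for s in segments}

    # Known rows are clamped to their segment; unknown rows start uniform.
    uniform = {s: 1.0 / len(segments) for s in segments}
    resp = [one_hot(l) if l is not None else dict(uniform) for l in labels]

    for _ in range(n_iter):
        # M-step: segment shares and smoothed per-variable value probabilities.
        prior = {s: sum(r[s] for r in resp) / len(rows) for s in segments}
        cond = {}
        for s in segments:
            seg_mass = sum(r[s] for r in resp)
            for f in range(n_feat):
                total = seg_mass + alpha * len(values[f])
                for v in values[f]:
                    mass = sum(r[s] for r, row in zip(resp, rows) if row[f] == v)
                    cond[s, f, v] = (mass + alpha) / total
        # E-step: update only the rows whose segment is unknown.
        for i, (row, label) in enumerate(zip(rows, labels)):
            if label is not None:
                continue
            joint = {s: prior[s] for s in segments}
            for s in segments:
                for f, v in enumerate(row):
                    joint[s] *= cond[s, f, v]
            z = sum(joint.values())
            resp[i] = {s: joint[s] / z for s in segments}
    return resp


# Hypothetical data: four rows with a known segment, three without.
rows = [
    ("urban", "app"), ("urban", "app"), ("rural", "store"), ("rural", "store"),
    ("urban", "app"), ("rural", "store"), ("urban", "store"),
]
labels = ["A", "A", "B", "B", None, None, None]

resp = semi_supervised_em(rows, labels, segments=["A", "B"])
print({s: round(p, 2) for s, p in resp[4].items()})
```

Because the unknown rows also feed the M-step, the variables observed on them shape the segment profiles, which is one sense in which the two data sources are modeled jointly.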
![Page 19: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/19.jpg)
We got classifiers that were slightly less accurate in predicting survey data, but much more aligned with the big data.
We made sure not to let the predictive accuracy drop below 70% (originally 80%).
![Page 20: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/20.jpg)
How well did it work?
Rows: initial classifier segment. Columns: revised classifier segment.

| | Seg 1 | Seg 2 | Seg 3 | Seg 4 | Seg 5 | Seg 6 | Seg 7 |
|---|---|---|---|---|---|---|---|
| Seg 1 | 564 | 84 | 15 | 56 | 36 | 14 | 18 |
| Seg 2 | 68 | 844 | 84 | 13 | 7 | 13 | 10 |
| Seg 3 | 33 | 72 | 561 | 2 | 3 | 1 | 5 |
| Seg 4 | 34 | 8 | 0 | 567 | 5 | 81 | 29 |
| Seg 5 | 27 | 12 | 1 | 6 | 635 | 50 | 57 |
| Seg 6 | 21 | 27 | 6 | 76 | 43 | 873 | 30 |
| Seg 7 | 18 | 28 | 9 | 50 | 59 | 52 | 1193 |

Only 19% changed
Data Source: Survey Data of 6500
![Page 21: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/21.jpg)
How well did it work?
Rows: initial classifier segment. Columns: revised classifier segment.

| | Seg 1 | Seg 2 | Seg 3 | Seg 4 | Seg 5 | Seg 6 | Seg 7 |
|---|---|---|---|---|---|---|---|
| Seg 1 | 135 | 102 | 18 | 66 | 207 | 157 | 45 |
| Seg 2 | 119 | 545 | 171 | 58 | 174 | 203 | 101 |
| Seg 3 | 55 | 113 | 316 | 44 | 240 | 219 | 72 |
| Seg 4 | 90 | 67 | 4 | 283 | 233 | 287 | 69 |
| Seg 5 | 303 | 169 | 41 | 216 | 1994 | 925 | 205 |
| Seg 6 | 325 | 259 | 36 | 261 | 646 | 1591 | 127 |
| Seg 7 | 52 | 26 | 3 | 90 | 193 | 191 | 156 |

Over 58% changed
Data Source: Augment of 12000
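The "changed" percentages on the two result slides can be reproduced from the cross-tabulations, assuming "changed" means a respondent falls off the diagonal (initial segment differs from revised segment). The function name `share_changed` is illustrative; the counts are copied from the slides.

```python
def share_changed(matrix):
    """Share of respondents whose revised segment differs from the initial one."""
    total = sum(sum(row) for row in matrix)
    unchanged = sum(matrix[i][i] for i in range(len(matrix)))
    return 1 - unchanged / total


# Rows: initial classifier segment; columns: revised classifier segment.
survey = [
    [564, 84, 15, 56, 36, 14, 18],
    [68, 844, 84, 13, 7, 13, 10],
    [33, 72, 561, 2, 3, 1, 5],
    [34, 8, 0, 567, 5, 81, 29],
    [27, 12, 1, 6, 635, 50, 57],
    [21, 27, 6, 76, 43, 873, 30],
    [18, 28, 9, 50, 59, 52, 1193],
]
augment = [
    [135, 102, 18, 66, 207, 157, 45],
    [119, 545, 171, 58, 174, 203, 101],
    [55, 113, 316, 44, 240, 219, 72],
    [90, 67, 4, 283, 233, 287, 69],
    [303, 169, 41, 216, 1994, 925, 205],
    [325, 259, 36, 261, 646, 1591, 127],
    [52, 26, 3, 90, 193, 191, 156],
]

print(f"survey:  {share_changed(survey):.1%}")   # 19.4%
print(f"augment: {share_changed(augment):.1%}")  # 58.2%
```

The survey counts sum to exactly 6500, and the augment counts sum to 12002, close to the 12000 quoted on the slide.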
![Page 22: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/22.jpg)
Conclusions
Big data cannot predict everything
No need to be scared of big data. Surveys and big data can coexist.
Expectation maximization provides a framework for joint modeling
![Page 23: Bringing big data to life](https://reader034.vdocuments.us/reader034/viewer/2022050406/5f83a9ba4d0a232e8945881d/html5/thumbnails/23.jpg)
So what?