Bayesian Networks: a new tool for consumer segmentation. Skim Conference – Barcelona – May 28th 2008

Upload: francois-abiven

Post on 16-Jan-2015


Page 1: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Bayesian Networks: a new tool for consumer segmentation
Skim Conference – Barcelona – May 28th 2008

Page 2: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Summary

• Introduction to consumer segmentations
• A brief overview of Bayesian Networks
• Computing a segmentation with Bayesian Networks
• Conclusion

Page 3: RepèRes Bayesia   Consumer Segmentation   Skim Conf08


Introduction to consumer segmentations


Page 4: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Why a segmentation?

• Valuable tool to understand a market
• Homogeneous marketing targets
  - people who behave the same way
  - people who have homogeneous motivations / attitudes
• Groups of people to whom it is possible to speak the same language

Different marketing strategies:
  - Concepts
  - Products
  - Communication
  - Advertising

MORE EFFICIENT

Page 5: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

A good segmentation: some important features

• Homogeneous segments
• Clear differences between segments
• Stable…
• Easy to understand
• Operational / Actionable
• Fair representation of the real world

[Diagram: Preparation stage, Statistical procedure, Output, Interpretation / Analysis]

Statistical procedure (TECHNICAL QUALITY): only a part of the whole process. How important is it?

AND OTHER VERY IMPORTANT ELEMENTS

Page 6: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

The marketer’s dream… and cruel reality

The dream:
• Obvious groups!
• Any kind of computation should lead to the same results

The reality:
• More complicated
• Unlimited number of typologies

The procedure should guarantee a relevant clustering.

Page 7: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Classical procedures

• A factorial analysis followed by a clustering of the individuals
• Canonical segmentation

[Diagram: canonical analysis of ATTITUDES and BEHAVIOURS, then projection of the individuals on the factorial axes, then clustering of the individuals]

Drawbacks: difficult to decide which statements are attitudes and which are behaviours (declarative statements); time-consuming.

Page 8: RepèRes Bayesia   Consumer Segmentation   Skim Conf08


A brief overview of Bayesian Networks


Page 9: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Bayesian Networks

• A computational tool to model uncertainty, based both on
  - graph theory: readability, powerful communication tool
  - probability theory: sound computations
• Manual modelling through brainstorming: Probabilistic Expert Systems
• Induction by automatic learning: data analysis, data mining
• Growing popularity: Industry, Defense, Health, … and now, Market Research

Page 10: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

A complete framework for Data Mining

• Parametric estimation
  - Use of the database to estimate the probabilities of a given structure
• Robust missing values processing
  - Expectation-Maximization (EM)
  - Structural EM
• Structural learning
  - Unsupervised learning to discover all the direct probabilistic relations
  - Supervised learning to characterize a target variable
  - Variable clustering to induce “factors” made of highly connected variables
  - Probabilistic Structural Equations

and… Data Clustering to find groups of data sharing the same characteristics
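To make "parametric estimation" concrete, here is a minimal sketch of estimating one conditional probability table by counting rows, for a structure that is already given. The function name, the variable names and the four survey rows are hypothetical and only serve the illustration; the smoothing parameter `alpha` is an optional extra, not something stated on the slide.

```python
from collections import Counter, defaultdict

def estimate_cpt(rows, child, parents, alpha=0.0):
    """Estimate P(child | parents) by counting rows of a discrete dataset.

    rows  : list of dicts mapping variable name -> observed state
    alpha : optional smoothing pseudo-count (0 gives the plain frequency estimate)
    """
    child_states = sorted({row[child] for row in rows})
    counts = defaultdict(Counter)
    for row in rows:
        parent_config = tuple(row[p] for p in parents)
        counts[parent_config][row[child]] += 1

    cpt = {}
    for parent_config, state_counts in counts.items():
        total = sum(state_counts.values()) + alpha * len(child_states)
        cpt[parent_config] = {
            state: (state_counts[state] + alpha) / total for state in child_states
        }
    return cpt

# Hypothetical survey extract: does liking originality relate to buying brand X?
rows = [
    {"likes_originality": "yes", "buys_brand_x": "often"},
    {"likes_originality": "yes", "buys_brand_x": "often"},
    {"likes_originality": "yes", "buys_brand_x": "rarely"},
    {"likes_originality": "no", "buys_brand_x": "rarely"},
]
cpt = estimate_cpt(rows, child="buys_brand_x", parents=["likes_originality"])
```

With missing values, plain counting no longer works directly, which is where the EM procedures listed on the slide come in.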

Page 11: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Formalism: 2 distinct parts

• Structure: directed acyclic graph
• Parameters: probability distributions associated with each node

Example: an anti-doping agency using two different tests to screen competitors
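The two parts of the formalism can be sketched in a few lines. The layout below (one "doped" node pointing to two test nodes) and every probability are assumptions made for illustration; the slide's actual network and figures are not given in the transcript. The joint probability of a full assignment is the product of each node's probability given its parents.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import product

# Structure: each node lists its parents (assumed layout).
parents = {
    "doped": [],
    "test_a": ["doped"],
    "test_b": ["doped"],
}

# Parameters: P(node is True | parent values); all numbers are made up.
p_true = {
    "doped": {(): 0.05},
    "test_a": {(True,): 0.90, (False,): 0.02},
    "test_b": {(True,): 0.80, (False,): 0.01},
}

def is_acyclic(graph):
    """A Bayesian network structure must be a directed acyclic graph."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

def joint_probability(assignment):
    """Chain rule: product over the nodes of P(node | its parents)."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        p = p_true[node][parent_values]
        prob *= p if assignment[node] else 1.0 - p
    return prob

p_all_positive = joint_probability({"doped": True, "test_a": True, "test_b": True})

# Sanity check: the joint distribution sums to 1 over all assignments.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product([True, False], repeat=len(parents))
)
```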

Page 12: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

A reasoning engine 1/3

• Sound evidence propagation on the entire network
  - Simulation
  - Diagnosis
  - And any combination of these 2 types of inference

Page 13: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

A reasoning engine 2/3

• Sound evidence propagation on the entire network
  - Simulation
  - Diagnosis

If a competitor is doped… there is a 99.5% chance that he is disqualified.

Page 14: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

A reasoning engine 3/3

• Sound evidence propagation on the entire network
  - Simulation
  - Diagnosis: thinking the other way round

If a competitor has been disqualified… there is a slight probability (8%) that he is nevertheless clean.
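Simulation and diagnosis are the same computation read in two directions. The sketch below enumerates every possible world of a small doping model and conditions on evidence. The prior, the test error rates and the rule "one positive test is enough to disqualify" are all hypothetical, so the results are not meant to reproduce the 99.5% and 8% figures of the slides.

```python
from itertools import product

P_DOPED = 0.05                        # hypothetical prior
P_POSITIVE = {                        # hypothetical P(test positive | doped?)
    "test_a": {True: 0.90, False: 0.02},
    "test_b": {True: 0.80, False: 0.01},
}

def joint(doped, a, b):
    """Joint probability of one world: doping status and both test results."""
    p = P_DOPED if doped else 1.0 - P_DOPED
    p *= P_POSITIVE["test_a"][doped] if a else 1.0 - P_POSITIVE["test_a"][doped]
    p *= P_POSITIVE["test_b"][doped] if b else 1.0 - P_POSITIVE["test_b"][doped]
    return p

def disqualified(a, b):
    # Assumed rule: one positive test is enough to disqualify.
    return a or b

def probability(event, given):
    """P(event | given), summing the joint over all worlds matching the evidence."""
    num = den = 0.0
    for doped, a, b in product([True, False], repeat=3):
        world = {"doped": doped, "disqualified": disqualified(a, b)}
        if given(world):
            p = joint(doped, a, b)
            den += p
            if event(world):
                num += p
    return num / den

# Simulation: from cause to effect.
p_disq_given_doped = probability(lambda w: w["disqualified"], lambda w: w["doped"])
# Diagnosis: from effect back to cause.
p_clean_given_disq = probability(lambda w: not w["doped"], lambda w: w["disqualified"])
```

Enumeration is only practical for tiny networks; dedicated inference engines propagate evidence without listing every world.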

Page 15: RepèRes Bayesia   Consumer Segmentation   Skim Conf08


Segmentation with Bayesian Networks

Real case study: segmentation of women as regards shopping and fashion.
For confidentiality reasons, consumer statements and outputs have been modified.

Page 16: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

1st stage: segmentation induction

Page 17: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Unsupervised learning: discovering relations between consumer statements

Usage and attitude survey conducted for a clothes retailer.

Sample = 1065 women.

234 consumer statements: attitudes and behaviours towards fashion in general, retailers, brand image…

• Heuristic search algorithm to find the best representation of the joint probability distribution.
• Minimum Description Length (MDL) score to evaluate the quality of the network, based on fit and compactness.

[Figure: induced network]
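The trade-off between fit and compactness can be illustrated with a common two-part form of the MDL score: a structure cost that grows with the number of free parameters, plus the number of bits needed to encode the data given the network. This is a textbook variant assumed for illustration, not necessarily the exact score used by the tool; the function name and data layout are hypothetical.

```python
import math
from collections import Counter

def mdl_score(rows, parents):
    """Two-part MDL score in bits (lower is better).

    structure cost : 0.5 * log2(N) bits per free parameter (compactness)
    data cost      : -log2 likelihood of the rows under frequency estimates (fit)
    """
    n = len(rows)
    states = {v: {row[v] for row in rows} for v in parents}
    structure_bits = data_bits = 0.0
    for node, node_parents in parents.items():
        n_parent_configs = math.prod(len(states[p]) for p in node_parents)
        free_params = (len(states[node]) - 1) * n_parent_configs
        structure_bits += 0.5 * math.log2(n) * free_params

        joint = Counter((tuple(r[p] for p in node_parents), r[node]) for r in rows)
        marginal = Counter(tuple(r[p] for p in node_parents) for r in rows)
        for (config, _state), count in joint.items():
            data_bits -= count * math.log2(count / marginal[config])
    return structure_bits + data_bits

# Hypothetical data: statements "a" and "b" agree for 36 respondents out of 40.
dependent_rows = (
    [{"a": "yes", "b": "yes"}] * 18 + [{"a": "yes", "b": "no"}] * 2
    + [{"a": "no", "b": "yes"}] * 2 + [{"a": "no", "b": "no"}] * 18
)
score_without_arc = mdl_score(dependent_rows, {"a": [], "b": []})
score_with_arc = mdl_score(dependent_rows, {"a": [], "b": ["a"]})
```

On this data the arc pays for itself: the extra parameter costs fewer bits than it saves when encoding the rows. On independent data the comparison goes the other way, which is how the score keeps the network compact.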

Page 18: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Variable clustering and factor induction: simplifying the information

• Analysis of the network to discover groups of variables that are strongly connected and that form a “concept”
  - Ascendant (agglomerative) hierarchical clustering algorithm based on the arcs’ Kullback-Leibler forces (a non-linear and global measure: the contribution of the relation to the network).
• For each cluster of variables
  - Creation of a latent variable summarizing the information.

42 factors computed.

Example of factor 15: a dimension summarizing originality. It is based on attitude statements (importance of being original, liking to stand out through clothes) and behaviours (buying brands X, Y and Z more often).

[Figure: latent variable]
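As a rough illustration of variable clustering, the sketch below measures the strength of the link between two variables with their mutual information, which is the Kullback-Leibler divergence between their joint distribution and the product of their marginals, and then merges the most strongly linked groups first. This is a simplification: the procedure described on the slide works on the arcs of the learned network, whereas this stand-in looks at all pairs of variables. Names, data and the threshold are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(rows, x, y):
    """KL divergence (in bits) between P(x, y) and P(x) * P(y)."""
    n = len(rows)
    pxy = Counter((r[x], r[y]) for r in rows)
    px = Counter(r[x] for r in rows)
    py = Counter(r[y] for r in rows)
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

def cluster_variables(rows, variables, min_strength):
    """Agglomerative (single-link) clustering of variables by link strength."""
    clusters = [{v} for v in variables]
    strength = {
        frozenset(pair): mutual_information(rows, *pair)
        for pair in combinations(variables, 2)
    }

    def link(c1, c2):
        return max(strength[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]),
        )
        if link(clusters[i], clusters[j]) < min_strength:
            break                      # remaining links are too weak to merge
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Hypothetical statements: two about originality move together, one about price does not.
rows = [
    {"original": o, "stand_out": o, "price_focus": p}
    for o in ("yes", "no") for p in ("yes", "no") for _ in range(5)
]
clusters = cluster_variables(rows, ["original", "stand_out", "price_focus"], min_strength=0.1)
```

Each resulting group of variables would then be summarized by one latent variable (a "factor").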

Page 19: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Factor clustering: overview of the procedure
Segmentation of the individuals based on the main factors

• Introducing a new variable (consumer segments) which is the hidden cause of the main factors.
• Learning the probabilities with Expectation-Maximisation
• Score derived from MDL to assess the quality of the clustering
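A hidden "segment" variable that is the parent of every factor is a latent class model, and its probabilities can be learned with EM. The sketch below is a minimal pure-Python version under that assumption; the function name, the random starting point, the small pseudo-count and the toy data are choices made for this illustration, not details from the slides.

```python
import math
import random

def em_latent_class(rows, n_clusters, n_iter=50, seed=0):
    """rows: list of tuples of discrete factor states.
    Returns (priors, cpts, posteriors) of a model where a hidden cluster
    variable is the only parent of every factor."""
    rng = random.Random(seed)
    n_vars = len(rows[0])
    states = [sorted({row[j] for row in rows}) for j in range(n_vars)]
    # Start from a random soft assignment of respondents to clusters.
    post = []
    for _ in rows:
        w = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(w)
        post.append([x / total for x in w])

    for _ in range(n_iter):
        # M-step: re-estimate P(cluster) and P(factor state | cluster) from soft counts.
        priors = [sum(p[k] for p in post) / len(rows) for k in range(n_clusters)]
        cpts = []
        for k in range(n_clusters):
            per_var = []
            for j in range(n_vars):
                counts = {s: 1e-6 for s in states[j]}   # tiny pseudo-count avoids zeros
                for row, p in zip(rows, post):
                    counts[row[j]] += p[k]
                total = sum(counts.values())
                per_var.append({s: c / total for s, c in counts.items()})
            cpts.append(per_var)
        # E-step: posterior probability of each cluster for each respondent.
        post = []
        for row in rows:
            w = [
                priors[k] * math.prod(cpts[k][j][row[j]] for j in range(n_vars))
                for k in range(n_clusters)
            ]
            total = sum(w)
            post.append([x / total for x in w])
    return priors, cpts, post

# Hypothetical factor scores for two very distinct groups of respondents.
rows = [("high", "high", "low")] * 30 + [("low", "low", "high")] * 30
priors, cpts, post = em_latent_class(rows, n_clusters=2)
labels = [max(range(2), key=lambda k: p[k]) for p in post]
```

EM only finds a local optimum, so in practice several starting points are usually tried and the best-scoring result kept.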

Page 20: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Selecting the number of clusters

• Pseudo random walk to find the best number of clusters
  - Example: find the best clustering with a random walk between 2 and 6 clusters, 20 iterations
• Also possible to define the desired number of clusters
• Possible to define the minimal purity of the clusters. The purity is computed as the mean of the probability of each cluster point.

The best segmentation is the one that minimizes the score.
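The purity rule and the selection rule can be sketched as follows, reading "the probability of each cluster point" as the posterior probability of a respondent for the cluster it is assigned to. That reading, the function names and the candidate scores are assumptions; the pseudo random walk itself is not reproduced here, only the final choice among candidate segmentations.

```python
def cluster_purities(posteriors):
    """posteriors[i][k] = P(cluster k | respondent i). Returns {cluster: purity},
    the mean posterior of the respondents assigned to each cluster."""
    by_cluster = {}
    for probs in posteriors:
        best = max(range(len(probs)), key=probs.__getitem__)
        by_cluster.setdefault(best, []).append(probs[best])
    return {k: sum(v) / len(v) for k, v in by_cluster.items()}

def pick_segmentation(candidates, min_purity=0.0):
    """candidates: list of (n_clusters, score, posteriors).
    The lowest score wins among candidates whose clusters all reach min_purity."""
    admissible = [
        c for c in candidates
        if min(cluster_purities(c[2]).values()) >= min_purity
    ]
    return min(admissible, key=lambda c: c[1]) if admissible else None

# Two hypothetical candidate segmentations of three respondents.
two_clusters = (2, 120.0, [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])
three_clusters = (3, 110.0, [[0.9, 0.05, 0.05], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]])

best_unconstrained = pick_segmentation([two_clusters, three_clusters])
best_pure = pick_segmentation([two_clusters, three_clusters], min_purity=0.65)
```

Without a purity constraint the three-cluster solution wins on score; requiring a purity of at least 0.65 rules it out because one of its clusters is too fuzzy.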

Page 21: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

2nd stage: segmentation analysis

Page 22: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Supervised learning: focusing on consumer clusters

• LEARNING the relations between…
  - THE TARGET VARIABLE = SEGMENTATION
  - THE CONSTITUTIVE VARIABLES = CONSUMER STATEMENTS

[Figure: network with target variable = consumer segments]

Page 23: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Cluster profile: using the network to describe the consumer groups

• Identification of the key variables and associated values
  - For each consumer group, we use the % of shared information to sort the variables according to their importance in the characterisation of the group.

[Figure: the 4 most contributing variables for Cluster #5. Arrows symbolize the change in the probability distribution when observing cluster #5.]

Compared with the total sample, women of cluster #5:
- Buy brand X more often
- Are older (59 on average)
- Do not consider originality as important
- Do not like discovering new shops
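One plausible reading of "% of shared information" is the mutual information between a variable and the indicator "belongs to this cluster", expressed as a share of the indicator's entropy. The sketch below ranks variables that way; the definition, the function names and the four rows are assumptions for illustration, and the tool may compute the measure differently.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a list of discrete observations."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def shared_information(target, variable):
    """Mutual information I(target; variable) as a share of the target's entropy.
    Assumes the target takes at least two distinct values."""
    mi = entropy(target) + entropy(variable) - entropy(list(zip(target, variable)))
    return mi / entropy(target)

def rank_variables(rows, segment_column, segment, variables):
    """Sort variables by how much they tell us about membership of one segment."""
    in_segment = [row[segment_column] == segment for row in rows]
    scores = {
        v: shared_information(in_segment, [row[v] for row in rows])
        for v in variables
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical respondents: brand X purchases follow segment 5, shop discovery does not.
rows = [
    {"segment": 5, "buys_brand_x": "often", "likes_new_shops": "no"},
    {"segment": 5, "buys_brand_x": "often", "likes_new_shops": "yes"},
    {"segment": 1, "buys_brand_x": "rarely", "likes_new_shops": "no"},
    {"segment": 2, "buys_brand_x": "rarely", "likes_new_shops": "yes"},
]
ranking = rank_variables(rows, "segment", 5, ["likes_new_shops", "buys_brand_x"])
```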

Page 24: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Generation of the cluster mapping

[Figure: map generation]

• The size of a cluster is proportional to its probability
• The proximity of the clusters is a probabilistic proximity
• The darkness of the blue is proportional to the purity of the cluster (in this example all clusters have a purity > 95%)

Page 25: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Summarizing segmentation results

[Figure: map of the consumer segments. Segments: Superstars, Fashion cheap, Classical upmarket, Functional above all, Neutral, Classical, Young manager / executive women. Segment sizes shown: 8%, 10%, 18%, 20%, 18%, 20%, 14%. Axes: money devoted to clothes (-- to ++), age, fashionable originality.]

Page 26: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Going further: identifying a more compact target model

• Markov procedure to select a subset of statements to determine to which category consumers belong
  - Selection of a subset of variables…
  - …knowing the values of these variables makes the target independent of all the other variables

Subset of 11 variables

Overall prediction score = 68%

Useful to quickly recruit consumer groups from the total population.
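The subset that makes the target independent of all other variables is known as the Markov blanket of the target: its parents, its children, and the other parents of its children. A minimal sketch on a hypothetical five-node network (the node names and arcs are invented for the example):

```python
def markov_blanket(parents, target):
    """parents maps each node to the list of its parents in the DAG."""
    children = [n for n, ps in parents.items() if target in ps]
    blanket = set(parents[target]) | set(children)
    for child in children:
        blanket |= set(parents[child])   # spouses: other parents of the children
    blanket.discard(target)
    return blanket

# Hypothetical network around the segment variable.
parents = {
    "age": [],
    "income": [],
    "segment": ["age"],
    "buys_brand_x": ["segment", "income"],
    "likes_new_shops": ["buys_brand_x"],
}
blanket = markov_blanket(parents, "segment")
```

Here `likes_new_shops` stays outside the blanket: once `buys_brand_x` is known, it adds nothing about the segment. In a recruitment questionnaire, only the blanket variables would need to be asked.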

Page 27: RepèRes Bayesia   Consumer Segmentation   Skim Conf08


Conclusion


Page 28: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Benefits

• Our experience: a powerful tool
  - Relevant typologies
  - Easy to carry out
• Modelling the consumer variables: good representation of reality
  - Unsupervised modelling: no strong hypothesis
  - Discovering interactions between variables (behaviours / attitudes)
  - Use of qualitative / quantitative variables
• Data clustering quality
  - Possible to set the minimum purity of the clusters: enables the marketer to discover “niche” markets (usually less pure) or focus on mainstream groups.
• Added value in the analysis of the clusters
  - Easy ranking of the key variables for each consumer cluster
  - Proximity mapping to summarize results
• Development of robust models to identify consumer groups
  - Useful for upcoming recruitment.

Page 29: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Some drawbacks. How to deal with them?

• Modelling the consumer network and computing latent variables can take a long time when the number of variables is very large.
  - 234 variables and 1065 rows: 30-40 minutes
  - To speed up the process, it is possible to learn a simplified network, e.g. a maximum spanning tree, or to increase the structural complexity parameter.
• Continuous variables have to be discretized
  - Results will depend on the quality of the discretization.
  - Possible to use K-Means to adapt the discretization to the distribution of the data.
  - Expertise of the user also helps.

And most of the time in consumer research, variables are discrete!
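K-Means discretization of a single continuous variable can be sketched as follows: cluster the values, then assign each value to the bin of its nearest centre. The deterministic starting centres, the function names and the list of ages are choices made for this illustration.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on one continuous variable. Returns sorted cluster centres."""
    ordered = sorted(values)
    # Deterministic start: centres spread evenly over the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous centre.
        new_centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
        if new_centres == centres:
            break
        centres = new_centres
    return sorted(centres)

def discretize(value, centres):
    """Bin index = index of the nearest centre (cut points are the midpoints)."""
    return min(range(len(centres)), key=lambda i: abs(value - centres[i]))

ages = [19, 21, 22, 24, 38, 40, 41, 43, 58, 59, 61, 63]   # hypothetical respondents
centres = kmeans_1d(ages, k=3)
bins = [discretize(a, centres) for a in ages]
```

Compared with equal-width intervals, the cut points follow the natural groups in the data, which is the adaptation to the distribution mentioned above.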

Page 30: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Perspectives

• Flexibility: can be used far beyond usage and attitudes surveys
• Easy to carry out
• Can be adapted to any type of data
• Well designed to process large amounts of data
• Example: segmentation of trains using the client’s internal data
  - Travelers' data: 10 million individuals
  - Train data (turnover, occupancy rate…): 15,000 trains
  - Clustering of trains
• In the future…
  - typology of clients (turnover, potential…) to feed a business strategy
  - segmentation of consumers based on utilities (CBC data)

Page 31: RepèRes Bayesia   Consumer Segmentation   Skim Conf08

Contact

Jouffe Lionel, Managing Director

[email protected]

Craignou Fabien, Data Mining Department Manager

[email protected]