APJEM
Arth Prabhand: A Journal of Economics and Management
Vol. 3 Issue 12 December 2014, ISSN 2278-0629, pp. 86-96 (Special Issue on Basic and Applied Sciences)
Pinnacle Research Journals
http://www.prj.co.in
RFM-BASED E-MARKETS SEGMENTATION USING SELF-ORGANIZING MAPS
BAHRAM IZADI*, ATIYE SABAGHINIA**
* Department of Management, Faculty of Administrative Sciences and Economics, University of Isfahan, Iran
** MS Student of Business Administration- Marketing, Sheikh Bahaei University, Isfahan, Iran
ABSTRACT
As companies have set mass marketing aside and turned to direct marketing, precise segmentation
and the prioritization of market segments with appropriate tools have become significant issues
in marketing. The purpose of this paper is to segment e-markets on three variables, recency,
frequency and monetary value, using a neural network clustering method. To this end, the ADSL
customers of an e-company are clustered with the self-organizing map (SOM), a well-known
clustering method in data mining.
KEYWORDS: market segmentation, data mining, RFM model, self-organizing maps
INTRODUCTION
In the past, managers believed in mass marketing, and the debate centered on creating
potentially large markets, which led to lower costs and higher income. Today, many companies
have moved from mass marketing to smaller groups of buyers with specific needs and behavioral
characteristics, who require tailored products and marketing mixes (Schejter et al., 2010).
Companies have realized that they cannot appeal to all customers, or at least not to all of them
in the same way. Buyers are too numerous and geographically widespread, and they differ in
their needs, demands and shopping experiences. In addition, companies differ widely in their
ability to serve different sectors of the market (Kotler & Armstrong, 2011). Market segmentation
is the process by which a market is divided into distinct segments of customers with similar
needs and characteristics (Walker et al., 2005).
Companies are not only looking to sell goods; they are trying to create and keep profitable
customers. They can identify such customers by segmenting their market into different groups
based on specific criteria. Market segmentation methods fall into two main groups: a priori
approaches, in which segments are selected on the basis of known features of a known population,
and post hoc methods, which are empirical and identify segments through multivariate analysis
(Hanafizadeh & Mirzazadeh, 2011). In the past, segmentation relied on subjective methods based
on the researcher's perception. The number and type of segments were predetermined, customers
were grouped subjectively on predetermined variables, and the resulting segments, being based
on speculation, were not consistently related to one another. Today, most companies follow a
customer orientation and rely on data-driven segmentation methods. They study not only the
features of each segment but also its size and profitability, and they use methods that are easy
to understand and free of speculation. This newer form of segmentation identifies the valuable,
more profitable segments of a market more accurately. Given the shift towards customer
orientation, segmenting markets to identify valuable customers, and to keep and attract them,
is essential.
Understanding customers and separating them by their needs and by the marketing mix applied to
them plays a vital role in marketing (Liu et al., 2012). Precisely determining the relevant
market is equally important, and customer segmentation with well-known modern methods supports
it. With the expansion of information technology and the huge volume of data available on
customers, new and efficient techniques that integrate several disciplines are used for
effective segmentation and for developing appropriate approaches in various industries.
Companies face very large data sets in their databases and, because of the value of these data,
have turned to segmenting customers with new data mining methods in order to exploit them. The
large volume of data and the poor performance of traditional statistical techniques on such
data motivate the search for effective segmentation tools that discover useful information
about markets and customers; data mining is one solution to this problem (Hiziroglu, 2013).
Data mining is a powerful tool for finding rules, connections and behavior patterns through the
analysis of large quantities of data (Xiao & Fan, 2014). In recent years, several studies have
analyzed customer transaction data with data mining techniques, using models and variables that
differ from those of traditional segmentation. Li et al. (2011) analyzed the characteristics of
the customers of a spinning factory using clustering techniques. Their customer relationship
model is LRFM, an extension of RFM in which L denotes the length of the relationship. They
grouped customers into five clusters with the k-means method and identified groups such as
potential customers, new customers and valuable core customers. Wei et al. (2012) segmented the
customers of a dental clinic with a neural network approach (self-organizing maps) and the LRFM
model, where L denotes the number of days from the first visit to the last visit. After
obtaining 12 clusters, they used a customer relationship matrix for analysis. Wei et al. (2013)
investigated customer relationships in the hairdressing industry using data mining techniques.
They segmented the customers of hairdressing salons in Taiwan by combining k-means and
self-organizing maps on RFM criteria, identified customer types including loyal, potential and
new customers, and proposed appropriate marketing strategies for each cluster.
As mentioned, customer segmentation can be done in various ways. In this paper, the variables
of the RFM model are used as the input variables of a self-organizing map in order to segment
an electronic market.
This article seeks an efficient method for segmenting electronic markets and their customers.
The RFM model is used to determine the variables, and data mining is used to segment the market
and group customers on the three variables of recency, frequency and monetary value. The paper
is organized as follows: Section II describes the proposed methodology, Section III presents
the results of applying the model to an e-marketplace, and the final section presents
appropriate strategies for each segment and suggestions for future research.
The proposed method of research
This section presents the proposed method, which is based on the RFM model and segments the
market with a self-organizing map, a neural network approach. First, the relevant data are
extracted from the company's database, and the three variables of recency, frequency and
monetary value are prepared and weighted. Next, a self-organizing map is used to obtain the
market segments and determine the different clusters of customers. Finally, a strategy is
developed for each cluster.
Data Preparation
Customer databases consist of massive transaction data, some of which are irrelevant, redundant
or useless and can be removed through data preparation.
The first step in cleaning data is to detect discrepancies in the data, which may have several
causes. Preparation involves cleaning a subset of the data, filling in or estimating missing
values, and integrating the data (Han et al., 2011).
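A minimal sketch of such a cleaning step, assuming each transaction is a (date, user ID, credit) tuple; the function name `prepare_records` and the sample rows are hypothetical and not taken from the study. The sketch simply drops incomplete and duplicate records, which is only one of the options named above; estimating the missing values would be an alternative.

```python
def prepare_records(records):
    """Return records without incomplete rows or exact duplicates.

    records: iterable of (date, user_id, credit) tuples; None marks a missing value.
    """
    seen = set()
    cleaned = []
    for record in records:
        if any(value is None for value in record):
            continue  # incomplete row: dropped here, could be imputed instead
        if record in seen:
            continue  # redundant copy of an earlier row
        seen.add(record)
        cleaned.append(record)
    return cleaned


# Hypothetical raw rows, for illustration only.
raw = [
    ("2012-06-06", 7560, 30000),
    ("2012-06-06", 7560, 30000),  # duplicate
    ("2012-06-05", 7555, None),   # missing credit
    ("2012-06-05", 7554, 3000),
]
cleaned = prepare_records(raw)
print(cleaned)
```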
RFM model
The RFM model is general and flexible: it can be adapted and localized to the characteristics
of a particular industry. It can be used on its own or in combination with other data mining
tools, and despite its simplicity it yields useful, applicable results for marketing
strategies, including market segmentation and strategy formulation.
RFM is a powerful and well-known tool in database marketing that is widely used to measure the
value of customers based on their purchasing behavior. The RFM model was first introduced by
Hughes in 1994 (Han Hua et al., 2013). The variables in the model are:
1- Recency: the time interval between a customer's last purchase and a reference date (the
end of the period). The shorter this interval, the higher the value of this index in the
model.
2- Frequency: the number of transactions a customer has made in a specific period of time.
The greater the number of transactions, the higher the value of this index in the model.
3- Monetary: the amount of money a customer has paid for transactions in a specific period of
time. The greater the amount paid, the higher the value of this index in the model
(Coussement et al., 2014).
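The three variables can be derived from raw transactions roughly as follows. This is a sketch under stated assumptions: transactions are (customer ID, purchase date, amount) tuples, recency is measured in days before a chosen end-of-period date, and the name `rfm_table` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, period_end):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    period_end:   the reference date that closes the analysis period.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        # Recency only needs the most recent purchase date per customer.
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1      # number of transactions
        monetary[customer_id] += amount  # total amount paid
    return {
        cid: ((period_end - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# Hypothetical transactions, not taken from the paper's data set.
sample = [
    ("A", date(2012, 6, 1), 30000),
    ("A", date(2012, 6, 5), 7000),
    ("B", date(2012, 5, 20), 9000),
]
rfm = rfm_table(sample, period_end=date(2012, 6, 6))
print(rfm)
```

Here a smaller recency value means a more recent purchase, so a scoring step that follows would map small recency values to high scores.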
Several extended versions of the model have been developed in later research, including:
1. eRFM-EMO: combines the model with demographic and historical data and is used for
predicting customer loss rates.
2. TRFM: combines the model with quarterly information and is used for seasonal products.
3. RFD: combines the model with time periods in order to analyze the customers of web sites.
4. RML: combines the model with a loyalty factor in order to evaluate customer loyalty.
5. FRAT: combines the model with the amount and type of goods sold in order to improve
customer clustering, and is applied according to the classification of each product.
6. RFR: combines the model with influence and network access for the analysis of social
networks (Wei et al., 2010).
7. WRFM: uses the analytic hierarchy process to determine the relative weights of the RFM
criteria for better decision making.
Self-organizing maps
Self-organizing maps are powerful and attractive tools for displaying multi-dimensional data in
spaces of lower dimension (usually one or two dimensions). They are also a method for
clustering and preprocessing information, and as visualization tools for exploratory data
analysis they make relationships within large amounts of data easy for humans to observe.
Self-organizing maps were developed by Professor Teuvo Kohonen of Helsinki University of
Technology, Finland.
The self-organizing map algorithm is a recursive regression process that fits a set of
prototype vectors $m_i \in R^n$ to the space of input vectors $x \in R^n$ through the
following steps:
At each training step, a sample vector x is selected at random from the input data set, and the
distances between x and all the prototype vectors are calculated. The Best Matching Unit (BMU),
denoted b, is the map unit whose prototype is closest to x, as given by equation 1 (Vesanto et
al., 2000):
$$\| x - m_b \| = \min_i \| x - m_i \| \qquad (1)$$
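The search in equation 1 can be sketched in a few lines. The function name `best_matching_unit` and the sample prototypes are illustrative assumptions, not part of the study.

```python
import math


def best_matching_unit(x, prototypes):
    """Return the index b of the prototype closest to x (Euclidean distance)."""
    distances = [math.dist(x, m) for m in prototypes]
    return min(range(len(prototypes)), key=distances.__getitem__)


# Hypothetical prototype vectors in a three-dimensional (R, F, M) score space.
prototypes = [(1.0, 1.0, 1.0), (3.0, 3.0, 3.0), (5.0, 5.0, 5.0)]
b = best_matching_unit((4.0, 5.0, 4.0), prototypes)
print(b)  # 2
```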
Next, the prototype vectors are updated: the best matching unit and its topological neighbors
are moved closer to the input vector in the input space. The prototype vector of unit i is
updated according to equation 2:
$$m_i(t+1) = m_i(t) + \alpha(t)\, h_{bi}(t)\, [x(t) - m_i(t)] \qquad (2)$$
where t denotes time in the recursive training process, $\alpha(t)$ is the learning rate, which
sets the rate of adaptation and decreases monotonically with time, and $h_{bi}(t)$ is the
neighborhood kernel, a decreasing function of the distance between the i-th and b-th units on
the map grid, centered on the winning unit. The neighborhood function is taken to be the
Gaussian of equation 3 (Vesanto & Alhoniemi, 2000):
$$h_{bi}(t) = \exp\left( -\frac{\| r_i - r_b \|^2}{2\sigma^2(t)} \right) \qquad (3)$$
where $\sigma(t)$ is the radius of the neighborhood function and $r_i \in R^2$ and
$r_b \in R^2$ are the positions of the i-th and b-th neurons on the map grid; the radius
decreases as training proceeds. There is no specific approach for determining the number of
clusters; only a general rule of thumb has been proposed, namely $\sqrt{N}$ clusters, where N
is the number of samples in the data set. The self-organizing map algorithm minimizes the error
function in equation 4:
$$E = \sum_{i=1}^{N} \sum_{j=1}^{C} h_{bj}\, \| x_i - m_j \|^2 \qquad (4)$$
where N is the number of samples, C is the number of clusters, and the neighborhood kernel
$h_{bj}$ is centered on unit b, the best matching unit of vector $x_i$, and evaluated at unit
j. According to equation 4, the SOM penalizes large errors (greater distances) more heavily.
The input data of the SOM are vectors with n elements. Different array configurations can be
considered for clustering these input vectors. As previously mentioned, there is no
predetermined approach for choosing the number of categories. The number of neurons in a
two-dimensional array is the product of the numbers of categories along its two dimensions.
The Euclidean distance is used to compare distances within a cluster and between clusters
(Schatzmann, 2003).
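Equations 2 to 4 can be put together in a short sketch. It assumes a Gaussian neighborhood, a fixed learning rate and radius for a single step, and a tiny hypothetical 1x2 map; the function names are illustrative, and the sketch is not the toolbox implementation used in the study.

```python
import math


def neighborhood(r_i, r_b, sigma):
    """Gaussian kernel of equation 3 for grid positions r_i and r_b."""
    return math.exp(-math.dist(r_i, r_b) ** 2 / (2.0 * sigma ** 2))


def bmu_index(x, prototypes):
    """Index of the prototype closest to x (equation 1)."""
    return min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))


def som_step(x, prototypes, grid, alpha, sigma):
    """One update of equation 2: pull every prototype toward x, weighted by h_bi."""
    b = bmu_index(x, prototypes)
    return [
        [m_k + alpha * neighborhood(grid[i], grid[b], sigma) * (x_k - m_k)
         for m_k, x_k in zip(m, x)]
        for i, m in enumerate(prototypes)
    ]


def distortion(data, prototypes, grid, sigma):
    """Error of equation 4: kernel-weighted squared distances over all samples."""
    total = 0.0
    for x in data:
        b = bmu_index(x, prototypes)
        for j, m in enumerate(prototypes):
            total += neighborhood(grid[j], grid[b], sigma) * math.dist(x, m) ** 2
    return total


# Hypothetical 1x2 map with two-dimensional prototypes, for illustration only.
grid = [(0, 0), (0, 1)]
prototypes = [[0.0, 0.0], [1.0, 1.0]]
updated = som_step([0.2, 0.2], prototypes, grid, alpha=1.0, sigma=1.0)
print(updated)
```

With a learning rate of 1 the winning prototype lands exactly on the input, while its neighbor moves only part of the way, scaled by the kernel value.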
In this algorithm, at each training step the training vectors are presented to the network in
random order, and the weights and biases are updated after each vector is presented. The
self-organizing map network was trained to cluster the input data set using random order
incremental training. Training stops when one of the following criteria is met: the maximum
number of epochs is reached, the minimum error is reached, or the maximum amount of time is
exceeded. At each learning step the network identifies the winning neuron, and the weights of
the winning neuron and its neighbors move closer to the input vector by an amount set by the
learning rate. The learning rate and the neighbor distance are updated in two phases, an
ordering phase and a tuning phase. In the ordering phase, the learning rate decreases from its
initial value and the neighbor distance decreases from the maximum distance between neurons to
1. In this phase the neuron weights are expected to order themselves in the input space
consistently with the positions of the corresponding neurons, so that the weights of all
neurons reach a general arrangement in large strides. The learning rate is therefore relatively
large, and the ordering phase ends after a fixed number of steps. In the tuning phase, by
contrast, the learning rate is smaller and decreases slowly, so that small changes in the
weights lead to a more accurate final adjustment and finally to convergence. In this phase the
weights are expected to spread out over the input space while preserving the topological order
found in the ordering phase. Thus, while learning to cluster the inputs, the feature map also
learns the topology and the distribution of the inputs (Demuth et al., 2008).
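The two-step schedule described above, a first stage with large steps followed by a slower fine-adjustment stage, might look like the following sketch. The function name `schedule`, the linear decay, and every default value (step count, learning rates, distances) are assumptions chosen for illustration; the paper does not report the settings it used.

```python
def schedule(step, order_steps=1000, order_lr=0.9, tune_lr=0.02,
             max_nd=5.0, tune_nd=1.0):
    """Return (learning_rate, neighbor_distance) for a 1-based training step."""
    if step <= order_steps:
        # First stage: learning rate and neighbor distance shrink linearly.
        frac = 1.0 - (step - 1) / order_steps
        lr = tune_lr + (order_lr - tune_lr) * frac
        nd = tune_nd + (max_nd - tune_nd) * frac
    else:
        # Second stage: neighbor distance stays fixed, learning rate decays slowly.
        lr = tune_lr * order_steps / step
        nd = tune_nd
    return lr, nd


for step in (1, 500, 1000, 1001, 5000):
    print(step, schedule(step))
```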
The research data analysis
The case studied in this research is the ADSL customer base of an internet service provider in
Iran. The research used the company's transaction data from 2006 to 2012, which comprised
30000 records. After data preparation, the three variables of recency, frequency and monetary
value were extracted and 6000 cleaned records were obtained. Part of these data is given in
table 1.
Table 1: Part of the cleaned data
No.  Date        User ID  User Changed Credit
1    2012-06-06  7560     30000
2    2012-06-06  7559     7000
3    2012-06-06  7558     8000
4    2012-06-06  7557     9000
5    2012-06-06  7556     10000
6    2012-06-05  7555     9000
7    2012-06-05  7554     3000
8    2012-06-05  7553     2000
9    2012-06-05  7552     3000
10   2012-06-05  7551     3000
The weights of the variables were determined from the intuitive judgments of experts and
professionals in the IT industry: 0.5 for frequency, 0.3 for recency and 0.2 for monetary
value. These weights indicate that repeat purchasing is the most important factor in electronic
markets: repeated and recent purchases support long-term relationships, increase loyalty and
retain customers, and thereby sustain profitability in the relevant market. Some of the
weighted data are given in table 2.
Table 2: Part of the data related to RFM score
No.  User ID  Recency  Frequency  Monetary  Recency Score  Frequency Score  Monetary Score  RFM Score
1    1        2010     1          5000      1              1                1               1.000
2    2        2010     1          122       1              1                1               1.000
3    3        2010     1          5000      1              1                1               1.000
4    4        2007     1          120000    1              1                4               1.600
5    5        2007     1          1000000   1              1                4               1.600
6    6        2007     1          125000    1              1                4               1.600
7    7        2007     1          120000    1              1                4               1.600
8    8        2007     1          1000000   1              1                4               1.600
9    9        2007     1          100088    1              1                4               1.600
10   10       2007     1          12000     1              1                2               1.200
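The RFM scores in table 2 are consistent with a weighted sum of the three individual scores using the expert weights (0.3 for recency, 0.5 for frequency, 0.2 for monetary value). The paper does not state the formula explicitly, so the sketch below is an assumption checked only against the rows shown; the function name `weighted_rfm` is hypothetical.

```python
# Expert weights reported in the text.
WEIGHTS = {"recency": 0.3, "frequency": 0.5, "monetary": 0.2}


def weighted_rfm(recency_score, frequency_score, monetary_score, weights=WEIGHTS):
    """Weighted sum of the three scores (assumed form of the RFM score)."""
    return (weights["recency"] * recency_score
            + weights["frequency"] * frequency_score
            + weights["monetary"] * monetary_score)


# (recency, frequency, monetary) score triples as they appear in table 2.
rows = [(1, 1, 1), (1, 1, 4), (1, 1, 2)]
scores = [round(weighted_rfm(*row), 3) for row in rows]
print(scores)  # [1.0, 1.6, 1.2]
```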
After data preparation, which was based on the RFM model and carried out with Clementine
software, these data were used as input variables for clustering with the SOM method. As
mentioned, this method is a neural network that partitions the data set into distinct clusters
such that records within a cluster are similar and records in different clusters are dissimilar.
In this method, the basic units are neurons, organized in an input layer and an output layer.
Every input neuron is connected to every output neuron, and each connection has its own weight.
During training, each neuron competes with all the other neurons to win. This process is
repeated many times, until the changes become very small (Hong, 2012). The result is a
two-dimensional map of the clusters in which similar records lie close together and dissimilar
records lie far apart. In the present study, this method divided the data into three clusters,
which are shown in table 3.
Table 3: Customer segments identified according to the RFM model
Cluster   Number of Customers  Recency Score Mean  Frequency Score Mean  Monetary Score Mean
Cluster1  2476                 2.09                1.21                  2.34
Cluster2  2109                 4.01                4.46                  3.57
Cluster3  1051                 3.11                2.84                  2.79
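One way to compare the segments in table 3 is to apply the same expert weights (0.3 recency, 0.5 frequency, 0.2 monetary) to the cluster means. This ranking is an illustrative assumption, not a calculation reported by the authors.

```python
WEIGHTS = {"recency": 0.3, "frequency": 0.5, "monetary": 0.2}

# Mean scores per cluster, copied from table 3.
clusters = {
    "Cluster1": {"recency": 2.09, "frequency": 1.21, "monetary": 2.34},
    "Cluster2": {"recency": 4.01, "frequency": 4.46, "monetary": 3.57},
    "Cluster3": {"recency": 3.11, "frequency": 2.84, "monetary": 2.79},
}


def weighted_mean_score(means, weights=WEIGHTS):
    """Weighted sum of a cluster's mean recency, frequency and monetary scores."""
    return sum(weights[name] * means[name] for name in weights)


ranking = sorted(clusters, key=lambda c: weighted_mean_score(clusters[c]), reverse=True)
print(ranking)  # ['Cluster2', 'Cluster3', 'Cluster1']
```

Under this assumption Cluster2, with the highest mean on all three scores, would be the most valuable segment, followed by Cluster3 and Cluster1.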
As mentioned, these maps are also good visualization tools for displaying data. Figure 1 shows
the result as a two-dimensional map, in which the different colors indicate the number of
customers in each cell of the map.
Figure 1: Self-organizing map
CONCLUSION
In today's world of marketing, market segmentation is of paramount importance for better
planning and a sharper focus on markets and customers. It enables companies to align more
closely with customers by highlighting specific customer needs and targeting them effectively.
Segmentation that integrates methods from data mining, statistics, operations research and
computer science, in line with technological development, can be even more helpful for
companies.
This paper emphasized the importance of e-market segmentation and the need to carry out
segmentation with new methods such as data mining. One of these techniques, the self-organizing
map, which is a neural network approach, was applied. The RFM variables of recency, frequency
and monetary value were used as input variables for segmenting the market. The results revealed
three distinct segments and, more importantly, the valuable market segment of the ADSL company.
REFERENCES
Biranty, D. (2010). Data Mining Using RFM Analysis. Knowledge-Oriented Applications in Data
Mining, vol. 18.
Coussement, K., Van den Bossche, F. A. M. and Bock, K. W. D. (2014). Data accuracy's impact
on performance: Benchmarking RFM analysis, logistic regression, and decision trees.
Journal of Business Research, 67: 2751-2785.
Demuth, H., Beale, M. and Hagan, M. (2008). Neural Network Toolbox (MATLAB), version 6.
The MathWorks, Inc.
Han Hua, H., Cheng-Kui Huang, T. and Hua Kao, Y. (2013). Knowledge discovery of weighted RFM
sequential patterns from customer sequence databases. The Journal of Systems and Software, 86:
779-788.
Han, J., Kamber, M. and Pei, J. (2011). Data Mining: Concepts and Techniques. Third Edition.
Printed in United States of America.
Hanafizadeh, P. and Mirzazadeh, M. (2011). Visualizing market segmentation using
self-organizing maps and Fuzzy Delphi method – ADSL market of a telecommunication company.
Expert Systems with Applications, 38: 198-205.
Hiziroglu, A. (2013). Soft computing applications in customer segmentation: State-of-art review
and critique. Expert Systems with Applications, 40: 6491-6507.
Hong, C. W. (2012). Using the Taguchi method for effective market segmentation. Expert Systems
with Applications, 39: 5451-5459.
Kotler, P. and Armstrong, G. M. (2011). Principles of Marketing. 14th Edition. Prentice Hall.
Li, D. C., Dai, W. L. and Tseng, W. T. (2011). A two-stage clustering method to analyze
customer characteristics to build discriminative customer management: A case of textile
manufacturing business. Expert Systems with Applications, 38: 7186-7191.
Liu, Y., Kiang, M. and Brusco, M. (2012). A unified framework for market segmentation and its
applications. Expert Systems with Applications, 39: 10292-10302.
Schatzmann, J. (2003). Using Self-Organizing Maps to Visualize Clusters and Trends in
Multidimensional Datasets. Department of Computing, Data Mining Group, Imperial College,
London.
Schejter, A. M., Serenko, A., Turel, O. and Zahaf, M. (2010). Policy implications of market
segmentation as a determinant of fixed-mobile service substitution: What it means for carriers
and policy makers. Telematics and Informatics, 27: 90-102.
Vesanto, J. and Alhoniemi, E. (2000). Clustering of the Self-Organizing Map. IEEE Transactions
on Neural Networks, 11(3): 586-600.
Vesanto, J., Himberg, J., Alhoniemi, E. and Parhankangas, J. (2000). SOM Toolbox for Matlab 5.
Helsinki University of Technology.
Walker, O. C., Boyd, H. W., Mullins, J. and Larreche, J. C. (2005). Marketing Strategy: A
Decision-Focused Approach. McGraw-Hill Irwin.
Wei, J. T., Lin, S. Y. and Wu, H. H. (2010). A review of the application of RFM model. African
Journal of Business Management, Vol. 4.
Wei, J. T., Lee, M. C., Chen, H. K. and Wu, H. H. (2013). Customer relationship management in
the hairdressing industry: An application of data mining techniques. Expert Systems with
Applications, 40: 7513-7518.
Wei, J. T., Lin, S. Y., Weng, C. C. and Wu, H. H. (2012). A case study of applying LRFM model
in market segmentation of a children's dental clinic. Expert Systems with Applications, 39:
5529-5533.
Xiao, F. and Fan, C. (2014). Data mining in building automation system for improving building
operational performance. Energy and Buildings, 75: 109-118.