Mining Best Utility Pattern from RFID Data Warehouse through Genetic Algorithm

VIPS VIVEKANANDA JOURNAL OF RESEARCH



Dr. Barjesh Kochar & Prof. Pankaj Lathar


ABSTRACT

Identifying sequential patterns in a huge sequence database is a central problem in knowledge discovery and data mining, and the stored information is useful only if an efficient mining technique is applied. In an earlier effort, a data mining technique based on sequential pattern mining and fuzzy logic was used to mine RFID data efficiently. In a large database, however, presenting the entire set of sequential patterns makes the mining result difficult for the user to understand and employ. Moreover, even with the efficient algorithms that have been proposed, mining a large number of sequential patterns from huge databases remains computationally costly. To overcome this issue, an efficient data mining system that generates the most favorable sequential pattern is proposed. The main aim of this work is to develop an RFID data mining technique that takes utility into account. The first stage of the proposed technique generates a dataset from the warehoused RFID data. Sequential patterns are then mined with various pattern length combinations, and fuzzy rules are generated from them. Each pattern has its own utility. From the mined sequential patterns, the most favorable one is generated by a Genetic Algorithm (GA), whose fitness function is used to find the sequential pattern with maximum profit. The implementation results show that the proposed mining system accurately extracts the important RFID tags and their combinations, the nature of movement of the tags, and the optimum sequential patterns. Focusing only on the significant sequential patterns that users find interesting leads to productive trade in RFID-enabled applications.

Keywords: Data Mining System, RFID, Genetic Algorithm (GA), Fuzzy rules



INTRODUCTION

Owing to recent developments in information technology and the availability of low-priced storage, huge collections of data have been accumulated over the past decades. The eventual purpose of collecting this data is to gain competitive advantage by analyzing it, i.e., by determining previously unknown patterns that can direct the process of decision making [1] [8]. Conventional data analysis methods generally rely on a person handling the data directly, and they do not scale to large data sets. Database technology provides the primary tools for efficient storage and searching of large data sets, but helping humans analyze and understand large masses of data remains a challenging and unsettled issue. The emerging field of data mining promises novel techniques and intelligent tools to meet these challenges. Data mining, also called Knowledge Discovery in Databases (KDD), is defined as “The non-trivial extraction of implicit, previously unknown, and potentially useful information from data” [2] [7]. Data mining [3], a multidisciplinary joint effort of databases, machine learning, and statistics, succeeds in turning masses of data into small valuable pieces. In a real-world application, the ultimate goal of a data mining task might be, for example, to allow a company either to improve its marketing, sales, and customer support operations or, through a better understanding of its customers, to recognize a fraudulent customer. Data mining methods have been applied successfully in a variety of fields including marketing [10], manufacturing, process control, fraud detection [9], bioinformatics, information retrieval, adaptive hypermedia, electronic commerce and network management [4]. Data mining tasks are of two types: descriptive mining and predictive mining [5]. Descriptive mining portrays the fundamental characteristics or common properties of the data in the database. Predictive mining extracts patterns from the data that enable predictions to be made; it includes tasks such as classification, regression and deviation detection.

Mining information from huge databases has found many new and emerging applications. One field that incorporates sequential pattern mining is Radio Frequency Identification (RFID). RFID is a high-speed, real-time, precise information gathering and processing technology that uses radio-frequency signals to identify objects uniquely [6]. RFID technology helps a wide variety of organizations and individuals, for instance hospitals and patients, retailers and customers, and manufacturers and distributors throughout the supply chain, to achieve substantial productivity gains and efficiencies [11]. Motivated by long sequences in text data, biological data, software engineering, and sensor networks, the mining of repetitive gapped subsequences has been studied to capture the occurrences of sequential patterns that repeat within each sequence of a large database and to use them as features for classification or prediction. RFID tags differ greatly from printed barcodes in their capacity to hold data, the range at which they can be read, and the absence of line-of-sight constraints.


The goal of sequential pattern mining is to find all frequent sequential patterns that satisfy a user-specified minimum support. Sequential pattern mining approaches are usually either generate-and-test (also known as Apriori), pattern growth (also known as divide-and-conquer), or vertical format methods [13]. Most of the many approaches [15] proposed for sequential pattern mining focus on two issues: (1) improving the efficiency of the mining process and (2) extending the mining of sequential patterns to other types of time-related patterns [16]. The problem of discovering sequential patterns was motivated by the retailing industry. However, the results apply to numerous scientific and business domains, such as stock and market basket analysis, natural disasters (e.g. earthquakes), DNA sequence analysis, gene structure analysis, web log click stream analysis, and so on [18]. Time is the most important factor for this task, mainly when the results are needed within a limited period [17].
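The notion of a user-specified minimum support can be illustrated with a minimal Python sketch. It assumes each database entry is an ordered list of reader locations visited by one tag and that a pattern matches when its items occur in order, with gaps allowed. The location names, the candidate patterns, and the helper names `is_subsequence`, `support` and `frequent_patterns` are hypothetical and are not taken from the proposed system.

```python
from typing import Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: Sequence[Sequence[str]]) -> float:
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        return 0.0
    hits = sum(is_subsequence(pattern, seq) for seq in database)
    return hits / len(database)


def frequent_patterns(candidates, database, min_support):
    """Keep only the candidates whose support reaches the minimum support."""
    return [p for p in candidates if support(p, database) >= min_support]


# Hypothetical tag movement sequences (assumed data, for illustration only).
database = [
    ["dock", "shelf_a", "packing", "exit"],
    ["dock", "shelf_b", "packing", "exit"],
    ["dock", "shelf_a", "exit"],
    ["shelf_a", "packing"],
]
candidates = [("dock", "exit"), ("shelf_a", "packing"), ("shelf_b", "exit")]

print(frequent_patterns(candidates, database, min_support=0.5))
```

A full miner such as an Apriori-style or pattern-growth method would also generate the candidates; this sketch only shows the support test that such methods rely on.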

Although the efficiency of mining the complete set of sequential patterns has improved considerably, sequential pattern mining still faces hard challenges in both effectiveness and efficiency in many cases. A large database may contain a huge number of sequential patterns, of which only a small subset usually interests a user. Presenting the complete set of sequential patterns makes the mining result hard to understand and hard to employ [22]. A Genetic Algorithm (GA) is employed to optimize the cost of the interesting sequential patterns. GA optimizers are robust and work well with discontinuous and non-differentiable functions, where conventional local optimizers fail. These optimization techniques are designed on the concepts of evolution and use processes such as genetic recombination, mutation, and natural selection.

Even with the efficient algorithms that have been proposed, mining a large number of sequential patterns from huge databases is a computationally expensive task. In this work, an effective data mining system that generates the optimum sequential pattern is proposed. The main aim of the work is to develop an RFID data mining technique that takes utility into account, so that an optimum sequential pattern is discovered on the basis of utility. The rest of the paper is organized as follows: Section 2 describes some recent related works. Section 3 gives a brief account of GA, and Section 4 details the proposed method, the optimization of sequential patterns using GA. Experimental results and analysis of the proposed methodology are discussed in Section 5. Finally, concluding remarks are provided in Section 6.

Researchers have proposed numerous approaches for effective data mining. In this section, a brief review of some important

contributions from the existing literature is


J. Hu and A. Mojsilovic [18] presented an algorithm for frequent item set mining that identifies high-utility item combinations. In contrast to traditional association rule and frequent item mining methods, the objective of the algorithm was to locate segments of data, defined through combinations of some items (rules), which satisfy certain conditions as a group and maximize a predefined objective function. They formulated the task as an optimization problem, presented an efficient approximation to solve it through specialized partition trees, called High-Yield Partition Trees, and examined the performance of different splitting strategies. The algorithm was tested on “real-world” data sets and achieved very good results.

Jian Pei et al. [19] observed that constraints are essential for many sequential pattern mining applications, yet no systematic study of constraint-based sequential pattern mining was available. Their paper investigated this issue and pointed out that the framework developed for constrained frequent-pattern mining did not fit the task well. They developed an extended framework based on a sequential pattern growth methodology. Their study showed that, under this new framework, constraints can be pushed deep into sequential pattern mining effectively and efficiently. Furthermore, their framework can be extended to constraint-based structured pattern mining as well.

Themis P. Exarchos et al. [21] presented a two-stage methodology for sequence classification that utilizes sequential pattern mining and optimization. In the first stage, a sequence classification model based on a set of sequential patterns was defined, and two sets of weights were introduced, one for the patterns and one for the classes. In the second stage, an optimization technique was used to estimate the weight values that achieve the best classification accuracy. The methodology was evaluated extensively by varying the number of sequences, the number of patterns and the number of classes, and it was

compared with similar sequence classification


S. Shankar et al. [22] noted the well-accepted fact that the data mining process produces numerous patterns from the given data. Discovering frequent item sets and association rules is among the most important tasks in data mining, and several efficient algorithms for it are available in the literature. In recent years, incorporating utility considerations in data mining tasks has been gaining popularity. Certain association rules improve business value, and the data mining community has long recognized such rules as interesting. The discovery of frequent item sets and association rules from transaction databases benefits numerous business applications. Their paper presented a comprehensive survey and study of the existing techniques for frequent item set mining and association rule mining with utility considerations.

Mourad Ykhlef and Hebah ElGibreen [23] described mining sequential patterns in large databases as a vital data mining task with broad applications, one that describes potential sequenced relationships among items in a database. Numerous algorithms have been introduced for this task. Conventional algorithms find the exact optimal sequential pattern rule, but they take a long time, particularly when applied to large databases. More recently, evolutionary algorithms such as Particle Swarm Optimization and Genetic Algorithm have been proposed and applied to this problem. To speed up the convergence of evolutionary algorithms, their paper introduced a new hybrid evolutionary algorithm, referred to as SP-GAPSO, that combines Genetic Algorithm (GA) with Particle Swarm Optimization (PSO) to mine sequential patterns.

A genetic algorithm (GA) is a search and optimization technique inspired by nature's evolutionary processes. A population of candidates iterates through multiple generations of selection, crossover, and mutation until an optimized solution survives, much in the manner of “survival of the fittest”. GAs are computer-based optimization techniques that use Darwinian evolution as a model [24]. They gained wide popularity through the work of Holland (1975). They are usually employed for problems that have an immense and complex search space with a large number of local optima [27]. The strength of GAs lies in the fact that the search space is traversed in parallel by randomly generating solutions, which are continually evaluated with a fitness function [25]. A GA generally has three search phases: (1) creating an initial population; (2) evaluating the population with a fitness function; (3) producing a new population [21]. In GA, the solutions are termed individuals or chromosomes [27]. The genetic search starts with a randomly generated population, in which a fitness function evaluates every individual.
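The three search phases can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the configuration used in this work: chromosomes are plain bit strings, the better half of each generation is carried over unchanged, and the new individuals are produced by mutation only (crossover is left out for brevity). The function name `run_ga` and all parameter values are hypothetical.

```python
import random


def run_ga(fitness, n_genes, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Toy GA loop over bit-string chromosomes (illustrative assumptions only)."""
    rng = random.Random(seed)

    # Phase 1: create an initial, randomly generated population.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        # Phase 2: evaluate every individual with the fitness function.
        ranked = sorted(population, key=fitness, reverse=True)

        # Phase 3: produce a new population. The fitter half survives as is,
        # the rest is refilled with mutated copies of the survivors.
        survivors = ranked[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = rng.choice(survivors)
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in parent]
            children.append(child)
        population = survivors + children

    return max(population, key=fitness)


# Example: maximize the number of 1-bits in a 10-gene chromosome.
best = run_ga(sum, n_genes=10)
print(best, sum(best))
```

Because the fitter half is always retained, the best fitness in the population never decreases from one generation to the next in this sketch.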

The individuals of the current and following generations are duplicated or eliminated on the basis of their fitness values. Further generations are produced by applying the GA operators [21], i.e. reproduction, crossover and mutation, which are applied sequentially to each individual with certain probabilities [23], [22]. The first operator, reproduction (elitism), produces one or more copies of any individual that possesses a high fitness value; otherwise, the individual is removed from the solution pool [29]. The crossover operator takes two randomly chosen parent individuals as input and combines them to generate two children. The combination is carried out by choosing two crossover points in the strings of the parents and exchanging the genes between these two points [26]. The next step in each generation is the mutation of individuals through the alteration of parts of their genes [30]. Mutation introduces variation into the population of the succeeding generation by altering a gene of a chromosome. Its main goal is to ensure that the search algorithm does not get stuck in a local optimum [22]. It also ensures that all possible alleles can enter the population, thereby preserving population diversity [21]. Mutation is a very important component of GAs: it is the variation operator that produces diversity [28].
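The two-point crossover and mutation operators described above can be sketched in Python as follows. The sketch assumes chromosomes are equal-length lists of genes and that mutation redraws a gene from a given set of alleles. The function names, the `alphabet` parameter and the example genes are hypothetical choices for illustration.

```python
import random


def two_point_crossover(parent_a, parent_b, rng):
    """Exchange the genes lying between two random cut points; return two children."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    # Two distinct cut positions in 0..len, in ascending order.
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


def mutate(chromosome, alphabet, rate, rng):
    """Replace each gene, with probability `rate`, by a random allele from `alphabet`."""
    return [rng.choice(alphabet) if rng.random() < rate else gene
            for gene in chromosome]


rng = random.Random(1)
parent_a = list("AAAAAA")
parent_b = list("BBBBBB")
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
print("".join(child_a), "".join(child_b))
print("".join(mutate(child_a, ["A", "B", "C"], 0.2, rng)))
```

At every position the two children together hold exactly the genes of the two parents, so crossover only recombines existing material. New alleles, such as `C` in the example, can enter the population only through mutation.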

The RFID data has been warehoused effectively by means of a novel data cleaning, transformation and loading technique proposed specifically for RFID data. The previous works showed that the proposed RFID data mining system efficiently mines the required knowledge from the warehoused RFID data. The present work aims to discover an optimum sequential pattern on the basis of its cost, termed the assigned utility. A GA-based technique is employed to identify the optimal sequential pattern. After the fuzzy rules are created from the sequential patterns, the GA-based method identifies the optimal sequential patterns according to their assigned utility, and the fitness function of the GA finds the sequential pattern with maximum profit. For ease of understanding, the optimal sequential pattern of RFID data is described briefly in the following sub-section, before the proposed mining system is detailed.
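A utility-based fitness function of this kind can be sketched in Python under a simple assumption: the utility of a pattern is the sum of profit values assigned to the tags it contains. This additive form, the tag names and the profit values are hypothetical and only illustrate how a fitness function can rank patterns by profit.

```python
def pattern_utility(pattern, tag_profit):
    """Assumed utility: sum of the profit assigned to each tag in the pattern."""
    return sum(tag_profit.get(tag, 0.0) for tag in pattern)


def best_pattern(patterns, tag_profit):
    """Return the pattern with the maximum utility (the fitness criterion)."""
    return max(patterns, key=lambda p: pattern_utility(p, tag_profit))


# Hypothetical profit per tag and mined sequential patterns.
tag_profit = {"tag1": 4.0, "tag2": 1.5, "tag3": 6.0}
patterns = [("tag1", "tag2"), ("tag2", "tag3"), ("tag1", "tag3")]

for p in patterns:
    print(p, pattern_utility(p, tag_profit))
print("best:", best_pattern(patterns, tag_profit))
```

With only a handful of patterns an exhaustive `max` is enough. A GA becomes useful when the space of candidate patterns is too large to enumerate, in which case `pattern_utility` would serve as its fitness function.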












In this paper, we have presented a data mining system for mining information about the type of movement of the tags attached to warehouse goods. The proposed system mines knowledge from the warehoused data by generating the I-dataset, mining sequential patterns, and then generating fuzzy rules from the sequential patterns. The sequential patterns are then optimized with GA on the basis of their assigned utility. The outcome of the system, optimized fuzzy rules with their corresponding profit, describes the type of tag movement with a fuzzy score. Given part of the movement of a tag (which indirectly refers to a product), the fuzzy rules capture the remaining path of the tag (product). In this manner, tag combinations of different lengths have been considered and their movement has been understood. Movements are considered only for certain important tags and combinations, not for all tags and their combinations. From the implementation results and comparative analysis, we observed that the proposed system efficiently identifies the optimum sequential pattern. With the help of the presented optimized data mining system, goods in large warehouses can therefore be tracked efficiently. Because we concentrated only on the optimized sequential patterns, the cost of mining the sequential patterns is minimized. The extracted

information would be helpful for warehouse


1. Bin Li and Dennis Shasha, "Free Parallel Data Mining", ACM SIGMOD Record, Vol. 27, No. 2, pp. 541-543, June 1998.

2. Anand, Bell and Hughes, "EDM: A general framework for data mining based on evidence theory", Data and Knowledge Engineering, Vol. 18, No. 3, pp. 189-223, 1996.

3. Agrawal, Imielinski and Swami, "Database Mining: A Performance Perspective", IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, pp. 914-925, 1993.

4. Chen and Liu, "Data mining from 1994 to 2004: an application-oriented review", International Journal of Business Intelligence and Data Mining, Vol. 1, No. 1, pp. 4-11, 2005.

5. Yashpal Singh and Alok Singh Chauhan, "Neural Networks In Data Mining", Journal of Theoretical and Applied Information Technology, Vol. 5, No. 6, pp. 36-42, 2009.

6. C.M. Roberts, "Radio frequency identification (RFID)", Computers & Security, Vol. 25, pp. 18-26, 2006.

7. Hatim A. Aboalsamh, "A novel Boolean algebraic framework for association and pattern mining", WSEAS Transactions on Computers, Vol. 7, No. 8, pp. 1352-1361, August

8. Sathiyamoorthi and Murali Bhaskaran, "Mining for Intelligent Enterprise Resource Planning System", International Journal of Recent Trends in Engineering, Vol. 2, No. 3, pp. 1-5, November 2009.

9. Jayanthi Ranjan and Vishal Bhatnagar, "Review of Data Mining Tools In Customer Relationship Management", Journal of Knowledge Management Practice, Vol. 9, No. 1, March 2008.

10. Michael J. Shaw, Chandrasekar Subramaniam, Gek Woo Tan and Michael E. Welge, "Knowledge management and data mining for marketing", Decision Support Systems, Vol. 31, No. 1, pp. 127-137, May 2001.

11. Asghar Sabbaghi and Ganesh Vaidyanathan, "Effectiveness and Efficiency of RFID technology in Supply Chain Management: Strategic values and Challenges", Journal of Theoretical and Applied Electronic Commerce Research, Vol. 3, No. 2, pp. 71-81, 2008.

12. Asif, Z., Mandviwalla, M., "Integrating the supply chain with RFID: a technical and business analysis", Communications of the Association for Information Systems, Vol. 15, No. 24, pp. 393-427, 2005.

13. Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Jianyong Wang, Helen Pinto, Qiming Chen, Umeshwar Dayal and Mei-Chun Hsu, "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach", IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 10, pp. 1-17, October

14. M.S. Chen, J. Han, P.S. Yu, "Data mining: an overview from a database perspective", IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.

15. Yen-Liang Chen and Ya-Han Hu, "Constraint-based sequential pattern mining: The consideration of recency and compactness", Decision Support Systems, Vol. 42, pp. 1203-1215, 2006.

16. Kuen-Fang Jea, Ke-Chung Lin and I-En Liao, "Mining hybrid sequential patterns by hierarchical mining technique", International Journal of Innovative Computing, Information and Control, Vol. 5, No. 8, August 2009.

17. Dhany Saputra, Dayang R.A. Rambli and Oi Mean Foong, "Mining Sequential Patterns Using I-PrefixSpan", International Journal of Computer Science and Engineering, Vol. 2, No. 2, pp. 49-554, 2008.

18. J. Hu and A. Mojsilovic, "High-utility pattern mining: A method for discovery of high-utility item sets", Pattern Recognition, Vol. 40, pp. 3317-3324, 2007.

19. Jian Pei, Jiawei Han and Wei Wang, "Constraint-based sequential pattern mining: The pattern-growth methods", Journal of Intelligent Information Systems, Vol. 28, No. 2, pp. 133-160, April 2007.

20. Shigeaki Sakurai, Youichi Kitahara and Ryohei Orihara, "A Sequential Pattern Mining Method based on Sequential Interestingness", International Journal of Computational Intelligence, Vol. 4, No. 4, pp. 252-260, 2008.

21. Themis P. Exarchos, Markos G. Tsipouras, Costas Papaloukas and Dimitrios I. Fotiadis, "A two-stage methodology for sequence classification based on sequential pattern mining and optimization", Data & Knowledge Engineering, Vol. 66, pp. 467-487, 2008.

22. Shankar and Purusothaman, "Utility Sentient Frequent Itemset Mining and Association Rule Mining: A Literature Survey and Comparative Study", International Journal of Soft Computing Applications, Vol. 10, No. 4, pp. 81-95, 2009.

23. Mourad Ykhlef and Hebah ElGibreen, "Mining Sequential Patterns Using Hybrid Evolutionary Algorithm", World Academy of Science, Engineering and Technology, Vol. 60, pp. 863-870, 2009.

24. Jyothi Pillai and O.P. Vyas, "Overview of Itemset Utility Mining and its Applications", International Journal of Computer Applications (0975-8887), Vol. 5, No. 11, pp. 9-13, August 2010.

25. M. Sedighizadeh and A. Rezazadeh, "Using Genetic Algorithm for Distributed Generation Allocation to Reduce Losses and Improve Voltage Profile", World Academy of Science, Engineering and Technology, Vol. 37, 2008.

26. P. Radhakrishnan, V.M. Prasad and M.R. Gopalan, "Optimizing Inventory Using Genetic Algorithm for Efficient Supply Chain Management", Journal of Computer Science, Vol. 5, No. 3, pp. 233-241, 2009.

27. Basheer M. Al-Maqaleh and Kamal K. Bharadwaj, "Genetic Programming Approach to Hierarchical Production Rule Discovery", World Academy of Science, Engineering and Technology, Vol. 11, pp. 43-46, 2005.

28. Timo Mantere, "A Min-Max Genetic Algorithm with Alternating Multiple Sorting for Solving Constrained Problems", in Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence, 2006.

29. Pedro A. Diaz-Gomez and Dean F. Hougen, "Improved Off-Line Intrusion Detection Using A Genetic Algorithm", Proceedings of the Seventh International Conference on Enterprise Information Systems, pp. 66-73, May 25-28, Miami, USA, 2005.

30. S. Ramanarayana Reddy, "Selection of RTOS for an Efficient Design of Embedded Systems", International Journal of Computer Science and Network Security, Vol. 6, No. 6, pp. 29-37, June 2006.

31. Stefan Schenk and Klaus Hanke, "Genetic Algorithms With Imperfect And Subdivided Features For The Automatic Registration Of Point Clouds (GAREG-ISF)", Proceedings of the 3rd ISPRS International Workshop, Vol. 38,

32. Imtiaz Korejo, Shengxiang Yang and Changhe Li, "A Comparative Study of Adaptive Mutation Operators for Genetic Algorithms", in Proceedings of the 8th Metaheuristic International Conference, July 13-16, 2009.

33. Mike Sewell, Jagath Samarabandu, Ranga Rodrigo, and Kenneth McIsaac, "The Rank-scaled Mutation Rate for Genetic Algorithms", International Journal of Information Technology, Vol. 3, No. 1, 2006.

34. Zorana Bankovic, José M. Moya, Alfaro Araujo, Slobodan Bojanic and Octavio Nieto-Taladriz, "A Genetic Algorithm-based Solution for Intrusion Detection", Journal of Information Assurance and Security, Vol. 4, pp. 192-199, 2009.