Data Mining Tool for Effective Classification and Retrieval of
Relevant User Data Using Fuzzy and BSO
1 Antony Rosewelt &
2 Arokia Renjit
1Department of CSE, Stella Mary's College of Engineering, Nagercoil, India
2Department of CSE, Jeppiaar Engineering College, Chennai, India
Abstract – Data mining techniques are now widely used to retrieve information from large
data sources such as data warehouses, repositories and the World Wide Web. Huge volumes of
user data are stored in cloud repositories, and relevant information is stored and maintained
on the Internet. The efficiency of a data mining tool can be judged by the volume of relevant
data or information that it successfully retrieves from the source. Classification also plays a
major role in identifying the right data or information and categorizing it for retrieval, storage
and maintenance. For this purpose, we propose a new data mining tool that retrieves data
effectively by using pre-processing and classification. We introduce a new semantic based
data pre-processing technique, and we propose a new classification algorithm that uses fuzzy
rules and the Bees Swarm Optimization based Information Retrieval algorithm. In addition,
the relevant data and web pages are grouped using the existing k-means clustering algorithm.
During the retrieval process, the inter and intra coupling relationships between the data are
analysed using an existing semantic model: common terms are used to identify the
intra-relationship between the data, and the partial order relation is used to identify the
inter-relationship. Finally, the proposed mining tool has been evaluated using the well-known
Web-docs and Wiki-links repositories and user feedback collected from users by Amazon.
Keywords - Information retrieval, Data mining, Bees Swarm Optimization algorithm, Fuzzy
rules, Clustering, Classification, Coupling inter-relationship, Coupling intra-relationship.
1. INTRODUCTION
With the rapid growth of the internet and its related data, information retrieval tools play a
crucial role in extracting relevant data. Because of the large volume of data available, current
internet users depend fully on these tools to retrieve the information or data they need to
improve their knowledge and business. Conventional information retrieval methods work on
keywords, such as positive and negative keywords, which are useful for identifying related
terms. Even so, the existing information retrieval methods do not fully satisfy internet users
because of semantic challenges such as polysemy and synonymy. Researchers and
academicians refer to these challenges as vocabulary or word mismatch (Furnas et al., 1987).
International Journal of Pure and Applied Mathematics, Volume 119, No. 16, 2018, 1239-1256. ISSN: 1314-3395 (on-line version). url: http://www.acadpubl.eu/hub/ (Special Issue)
Researchers have made considerable efforts to address the word mismatch issue, notably
through query expansion methods and the lattice based information retrieval approach. Query
expansion generates a new query by augmenting the original query with additional keywords
of the same meaning, extracted from a dictionary such as WordNet or from relevance
feedback (Carpineto and Romano, 2012). Extra keywords can also be taken from the original
data sources to expand the query. The lattice based information retrieval technique can both
refine and expand the query, and it supports navigational search by exploiting the specificity
and generality relations of the lattice (Carpineto and Romano, 2005).
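As an illustration of dictionary based query expansion, the following minimal Python sketch
appends synonyms to the terms of a query. The synonym table and the function name
expand_query are assumptions made for this example; a real system would draw the extra
keywords from a resource such as WordNet or from relevance feedback.

```python
from typing import Dict, List

# Hypothetical synonym table standing in for a dictionary such as WordNet.
SYNONYMS: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> List[str]:
    """Return the original query terms followed by their synonyms, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for candidate in synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    print(expand_query("cheap car"))
```

Terms that have no entry in the table are passed through unchanged, so the expanded query
always contains the original one.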
Fuzzy logic is used, through the development of formal concept analysis, to overcome
uncertainty issues. Typical uncertainty issues are data vagueness and implicit information in
the queries and the related documents from which relevant data are retrieved. Many fuzzy
logic and lattice based techniques built on formal concept analysis have been proposed to
handle these issues (Poelmans et al., 2014; Kumar et al., 2015). Many existing methods use
the partial order relation of the concepts available on the web, in databases and in repositories
to compute the inter and intra relationships between concepts, related web documents and
repository data, and they return the related web documents or data for a given user query.
However, all of these methods neglect the semantic data shared between concepts, such as
common objects and common attributes. As a result, the coupling relationship between
concepts, which consists of common objects, common attributes and the partial order
relationship of concepts, is neglected. The coupling relationship has nevertheless been shown
to be valuable for improving existing analysis and learning tasks such as data clustering, data
classification, recommendation systems, queries and outlier term detection (Pang et al 2016).
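The coupling relationship can be illustrated with a small Python sketch. It assumes that a
formal concept is a pair of an object set (extent) and an attribute set (intent), measures the
overlap of common attributes with a Jaccard ratio, and tests the partial order by extent
inclusion. The Jaccard ratio and all names below are illustrative choices, not the measures
used in the cited works.

```python
from typing import FrozenSet, NamedTuple


class Concept(NamedTuple):
    """A formal concept: a set of objects (extent) and a set of attributes (intent)."""
    extent: FrozenSet[str]
    intent: FrozenSet[str]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Size of the intersection divided by the size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def common_attribute_coupling(c1: Concept, c2: Concept) -> float:
    """Assumed coupling measure: overlap of the attributes shared by two concepts."""
    return jaccard(c1.intent, c2.intent)


def is_subconcept(c1: Concept, c2: Concept) -> bool:
    """Partial order of formal concept analysis: c1 <= c2 when extent(c1) is a subset of extent(c2)."""
    return c1.extent <= c2.extent


if __name__ == "__main__":
    specific = Concept(frozenset({"d1"}), frozenset({"fuzzy", "rule", "retrieval"}))
    general = Concept(frozenset({"d1", "d2"}), frozenset({"fuzzy", "retrieval"}))
    print(common_attribute_coupling(specific, general))
    print(is_subconcept(specific, general))
```

In this toy example the two concepts share two of three attributes, and the more specific
concept lies below the more general one in the partial order.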
In this work, a new data mining tool is proposed for retrieving data effectively by using data
pre-processing and data classification. A new semantic approach is proposed for data
pre-processing, and a new classification algorithm is proposed that uses fuzzy rules together
with the existing Bees Swarm Optimization based Information Retrieval method. The existing
k-means clustering algorithm is used to group the data according to their relevancy score.
Relevancy is considered in terms of the inter and intra coupling relationships between the
data, which are analysed using an existing semantic model. The partial order relationship is
used to identify the level of relationship between the data. Finally, the proposed data mining
tool has been tested with data collected from the well-known Web-docs and Wiki-links
repositories and with user feedback.
The rest of this paper is organized as follows: Section 2 discusses the existing data mining
tools developed by researchers in this direction. Section 3 explains the overall architecture of
the proposed system. Section 4 describes the proposed semantic based data pre-processing,
clustering and data classification process. Section 5 presents the results and discussion.
Section 6 gives the conclusion and the future works in this direction.
2. LITERATURE SURVEY
Many works have been carried out in the areas of semantic based data pre-processing,
information retrieval, data clustering and data classification (Arokia Renjit and
Shunmuganathan 2010 and 2011). Among them, Youcef et al (2018) explored advances in data
mining techniques for solving the basic document retrieval problem. Their technique
discovers useful knowledge with data mining techniques and then uses that knowledge to
explore the full document collection efficiently. They combined data pre-processing,
clustering and Bees Swarm Optimization to explore the clustered and grouped documents in
depth. Their approach improved the quality of the retrieved relevant documents in less time.
Shufeng et al (2018) introduced a new framework based on lattices and coupling relationship
analysis. Their framework represents the queries and the documents with formal concepts
extracted by fuzzy formal concept analysis. The intra and inter concept coupling relationships
are then analysed and applied to rank the web documents.
Fabricio et al (2015) investigated the use of a bi-clustering method to capture local coherence
across subsets of records and the available fields. They addressed the dimensionality
problem, reduced the redundancy of correlated features and improved separability and
classification accuracy.
Thiago et al (2018) developed a new supervised classifier that applies the limiting
probabilities of random walk theory to underlying networks constructed from the labelled
input data. They also presented examples in which their classifier combines low and high
level attributes.
Fuji (2009) explored a few main areas of advanced information retrieval. The author
concentrates on cross lingual, multimedia and semantic based information retrieval. Cross
lingual information retrieval deals with posing queries in one language and retrieving the
related web documents in various languages. Semantic based information retrieval goes
beyond surface level data by using the concepts represented in the web documents and the
user queries to improve the retrieval process.
Antonio et al (2018) took a new clustering based strategy as their starting point. They
improved its performance by addressing the major issue of records located near the cluster
boundaries, enlarging the cluster size, and they also used Deep Neural Networks to learn a
suitable representation for the classification task. They achieved reasonable classification
accuracy over eight different datasets.
Mao et al (2013) addressed problems such as finding relevant documents, complicated
language use, ambiguous language and inaccurate results. They developed a new semantic
based content mapping technique for the information retrieval model. Their model uses
standard semantic features and an ontological structure to construct a new content map, and it
improved the accuracy of relevant document or data retrieval.
Olga et al (2017) described an effective method called PolaritySim for determining word
level contextual polarity, which uses readily available consumer rated reviews as the only
external resource.
Preben et al (2005) investigated expressions of collaborative activity within information
searching and retrieval processes. They presented empirical results from a real world
information setting within the domain, and they categorised the collaborative activities and
related them to the various stages of the information searching and retrieval process. Finally,
they introduced an improved information retrieval model that includes collaborative aspects.
Rabia et al (2006) employed new algorithms for ranking the documents automatically. They
merged the retrieval results of multiple systems using various data fusion algorithms, treated
the top-ranked documents as relevant and used these documents to evaluate and rank the
methods. They also introduced a new approach for selecting the information retrieval systems
to be used for effective data fusion. Finally, the authors showed that their method performs
better than the existing automatic ranking techniques.
Goran et al (2014) presented new methods for document retrieval and multi-document
summarization. Their method measures the similarity between queries and web documents by
combining graph kernels on event graphs. Their model achieved better clustering
performance and relevant multi-document summarization.
Antonio et al (2010) proposed a new algorithm for refining the ontologies used in
information retrieval tasks, with preliminary positive results. Andrea et al (2012) presented
their experience in using X.MAS, a generic multi-agent architecture aimed at retrieving
relevant information, filtering data and reorganizing information according to user requests.
Tatiana et al (2013) described the basic theories of human development used to explain the
specifics of young users, such as their cognitive skills, fine motor skills, knowledge, memory
and emotional states, in so far as they differ from those of adults.
Sairamesh et al (2015) proposed a new algorithm that infers user interests from the user
queries and the fast profile logs and provides relevant information based on user
personalization. They also introduced a new classifier for classifying the data and applied a
new ranking algorithm for categorizing the relevant data. Kulunchakov et al (2017) proposed
a novel approach for constructing new ranking algorithms for effective retrieval of relevant
information or data. Mehrbakhsh et al (2018) proposed a new recommendation system based
on ontology and dimensionality reduction techniques to address the sparsity and scalability
problems.
Obada et al (2017) proposed a novel method that uses fuzzy logic to model tasks, user
profiles and documents in order to capture the user's information searching behaviour.
Relevance feedback is estimated in their work with a linear regression model that predicts the
relevancy of a web document from implicit relevance indicators. Fuzzy rule based
summarisation was also used to integrate the profiles. Their method was evaluated with the
precision and recall metrics, which showed significant improvements in retrieving relevant
information for user queries.
3. SYSTEM ARCHITECTURE
The overall architecture of the proposed system for analysing web data and documents is
shown in Figure 1. It consists of six modules: web documents/feedback data, a user interface
module, an intelligent data mining tool, a rule manager, a rule base and results.
Figure 1. System Architecture
[Block diagram. Components: Web Documents/Feedback data; User Interface Module;
Intelligent Data Mining Tool (Data Pre-processing, Document Clustering, Data Classification);
Rule Manager; Rule Base; Result]
The web documents and feedback data module consists of a large volume of web documents
and feedback data available on the Amazon website and in cloud repositories. This collection
of web documents and feedback data is the input dataset in this work. The user interface
module collects the necessary web documents and feedback data from business websites such
as Amazon. The intelligent data mining tool consists of three sub modules: data
pre-processing, clustering and data classification. The data pre-processing sub module
removes noisy data, null data and meaningless data. The clustering sub module groups the
relevant data or web documents using the existing k-means clustering algorithm. The
classification sub module categorizes the data or documents using intelligent fuzzy rules. The
rule manager manages the fuzzy rules and interacts with the data mining tool and the rule
base; it stores fuzzy rules in, and retrieves them from, the knowledge base. The rule base
stores all the fuzzy rules that are used for categorizing the feedback data and classifying the
web documents. The proposed model refers to the rule base built around user queries. The
result module holds the documents or feedback data returned for the user query.
4. PROPOSED WORK
This section discusses the proposed model, which combines data pre-processing, clustering
and classification. A new semantic based data pre-processing technique is proposed for
identifying the original and useful data for the analysis. An existing clustering algorithm is
used to group the relevant data or web documents so that further analysis is quick. In
addition, a new classifier is proposed for effective data or document classification. This
section is organized into subsections on semantic based data pre-processing, K-means
clustering, fuzzy rule and BSO based classification, and the overall intelligent data mining
tool.
4.1 Semantic based Data Pre-processing
The main aim of data pre-processing is to enhance the capability of the existing data mining
tools that extract the relevant data or documents, which are later used by the proposed fuzzy
rule and BSO based classifier. Pre-processing removes unnecessary data, such as null values,
from the input dataset. It also checks whether semantic data or content is available in the
dataset. The proposed pre-processing phase tokenizes the input content, checks its grammar
and checks whether the content is semantically correct.
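The null removal and tokenization steps can be sketched as follows in Python. This is a
minimal illustration only: the set of null markers, the token pattern and the function name
preprocess are assumptions, and the grammar and semantic checks of the proposed technique
are not covered.

```python
import re
from typing import List, Optional

# Assumed markers for null entries; the paper does not list them.
NULL_MARKERS = {"", "null", "none", "n/a", "nan"}


def preprocess(records: List[Optional[str]]) -> List[List[str]]:
    """Drop null or empty records and split the remaining ones into lowercase tokens."""
    cleaned: List[List[str]] = []
    for record in records:
        if record is None or record.strip().lower() in NULL_MARKERS:
            continue  # null data
        tokens = re.findall(r"[a-z0-9']+", record.lower())
        if tokens:  # records without any word-like token are treated as meaningless
            cleaned.append(tokens)
    return cleaned


if __name__ == "__main__":
    feedback = ["Great phone!", None, "   ", "N/A", "Battery doesn't last.", "!!!"]
    print(preprocess(feedback))
```

Each surviving record becomes a list of tokens that later stages, such as clustering, can
turn into term vectors.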
4.2 K-Means Clustering
The K-means algorithm is used in this work to group the ‘n’ data points into ‘k’ subsets
$Sub_j$, where each subset (cluster) contains $ns_j$ data points. First, the data points are
assigned randomly to the k clusters and the centroid of each cluster is calculated. Then, each
data point is assigned to the cluster whose centroid is closest to it. These steps are repeated
until no data point is reassigned to a different cluster. In this work, the K-means clustering
algorithm is adapted in two steps. In the first step, a weightage is assigned to every document
from the weightages of its individual words. In the second step, the term frequency and the
relevancy score are calculated from the word weightages and the occurrences of each word in
a document.
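A compact Python sketch of k-means over term-frequency vectors is given below. It is a
simplified illustration under stated assumptions: documents are compared with cosine
similarity, the first k documents seed the centroids instead of a random assignment so that
the result is reproducible, and plain term frequencies stand in for the expert-assigned word
weightages. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tf_vector(tokens: List[str]) -> Vector:
    """Relative term frequency of every token in one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of sparse vectors."""
    total: Dict[str, float] = {}
    for vector in vectors:
        for term, w in vector.items():
            total[term] = total.get(term, 0.0) + w
    return {term: w / len(vectors) for term, w in total.items()}


def kmeans(vectors: List[Vector], k: int, max_iter: int = 20) -> List[int]:
    """Return one cluster label per vector; stops when no label changes."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of documents")
    centroids = [dict(v) for v in vectors[:k]]  # assumption: first k documents as seeds
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # no document changed cluster
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


if __name__ == "__main__":
    docs = [
        ["battery", "life", "good"],
        ["screen", "display", "bright"],
        ["battery", "charge", "life"],
        ["display", "screen", "sharp"],
    ]
    print(kmeans([tf_vector(d) for d in docs], k=2))
```

In the toy data, the two battery reviews and the two display reviews end up in separate
clusters because they share no terms across groups.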
4.3 Fuzzy Rule and BSO based Classification
This subsection explains the new fuzzy rule and BSO based classifier that is incorporated into
the proposed intelligent data mining tool for effective retrieval of relevant information from
repositories. The proposed classifier combines the existing BSO based algorithm developed
in [1] with the rules needed to make effective decisions during the retrieval process on web
data.
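To make the role of fuzzy rules concrete, the sketch below classifies a relevancy score in
[0, 1] with three triangular membership functions and one rule per fuzzy set. The membership
shapes, the three labels and every name in the code are assumptions chosen for illustration;
the paper does not specify its rules, and the BSO search of [1] is not reproduced here.

```python
from typing import Dict


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet at a and c and peak at b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def memberships(score: float) -> Dict[str, float]:
    """Degree to which a relevancy score is low, medium or high (assumed fuzzy sets)."""
    return {
        "low": triangular(score, -0.5, 0.0, 0.5),
        "medium": triangular(score, 0.0, 0.5, 1.0),
        "high": triangular(score, 0.5, 1.0, 1.5),
    }


# Assumed rule base: IF relevancy is <fuzzy set> THEN document is <class>.
RULES = {"low": "irrelevant", "medium": "partially relevant", "high": "relevant"}


def classify(score: float) -> str:
    """Fire every rule and return the class of the rule with the strongest membership."""
    degrees = memberships(score)
    strongest = max(degrees, key=lambda name: degrees[name])
    return RULES[strongest]


if __name__ == "__main__":
    for s in (0.1, 0.5, 0.9):
        print(s, classify(s))
```

A rule manager in this style would only need to store the RULES table and the parameters
of the membership functions.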
4.4 Intelligent Data Mining Tool for Information Retrieval
In this work, a new intelligent data mining tool has been designed for retrieving relevant data
from large databases and cloud repositories. It uses semantic based data pre-processing,
clustering and classification techniques for effective data retrieval, and it has three phases,
one for each of these activities.
Input: Web documents
Output: Relevant documents
Phase 1: Pre-processing
Step 1: Read the documents one by one from the list DL = (d1, d2, d3, …, dn).
Step 2: Read the first line from the first document Di and check its tokens against the
standard metadata.
Step 3: Apply the parser to the line (sentence) ‘l’.
Step 4: Call the syntax analyser for grammar checking.
Step 5: If the line does not have any grammatical error then
    Apply LSA (DLi, S, l)
Else
    Correct the grammatical errors and go to Step 5.
Step 6: Create a semantic network for the line by calling the procedure semantic_network().
Step 7: Compare, node by node for the whole sentence, the semantic structure developed for
the line with the semantic metadata of the line.
Step 8: If the line matches the metadata semantically then
Step 9: If the end of the data has not been reached then
    Display the semantic analysis results
Else
    Go to Step 13.
Step 10: Apply the procedure Pragmatic_Analysis().
Step 11: If the anaphora is resolved then go to Step 9
Step 12: Else check the file status.
Step 13: If EOF then stop
Step 14: Else go to Step 1.
Phase 2: Clustering
Step 1: Set the number of clusters ‘k’.
Step 2: Select ‘k’ initial center points, one for each of the ‘k’ groups.
Step 3: Assign weightages to all the words in the document, as a word representation, using
the expert guidelines stored in the database.
Step 4: Read the first document from the set of documents.
Step 5: Find the cosine similarity for the words that belong to the first group and store it in m.
Step 6: Check the cosine similarity of the words of each document.
Step 7: If the cosine similarity of the words is less than m, take it as the minimum cosine
value of the whole data.
Step 8: If none of the document words changes the average score of a group then
    Stop the process and exit
Else
    Find the new center point of each group in the cluster.
Step 9: Return the clustered document set.
Phase 3: Classification
Step 1: Accept the user request from the user's queries.
Step 2: Apply the existing Bees Swarm Optimization based Information Retrieval algorithm
to the clustered documents in the document list.
Step 3: Provide the relevant content or data to the user and apply the fuzzy rules.
Step 4: Map the semantic fuzzy rules to the nodes of the newly constructed semantic tree.
Step 5: If the nodes and rules match then
    Produce all the relevant contents.
End if
Step 6: Call the procedure for retrieving the exact contents.
The proposed data mining tool thus performs three actions: semantic based pre-processing,
k-means clustering for grouping the data and documents, and fuzzy rule based BSO-IR for
effective retrieval of relevant data from the databases.
5. RESULTS AND DISCUSSION
This section describes the test bed used to evaluate the proposed data mining tool for
retrieving relevant data or documents from the web or from repositories. Well-known
performance metrics are used to measure its performance. The experiments were conducted
on web documents that contain product reviews, given as feedback about a product or
company, and on a CSV file that contains user feedback about Amazon products. The data
mining tool was implemented in Java. The prediction accuracy over the documents or user
data was calculated using precision, recall and F-measure, which are defined below:
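$$\mathrm{PRECISION} = \frac{\text{Number of retrieved relevant documents}}{\text{Number of documents retrieved}} \tag{1}$$
$$\mathrm{RECALL} = \frac{\text{Number of retrieved relevant documents}}{\text{Total number of relevant documents in the collection}} \tag{2}$$
$$\mathrm{F\text{-}MEASURE} = \frac{2 \times \mathrm{PRECISION} \times \mathrm{RECALL}}{\mathrm{PRECISION} + \mathrm{RECALL}} \tag{3}$$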
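The three metrics can be computed from a set of retrieved document identifiers and a set of
relevant ones, as in the minimal Python sketch below. Recall is taken here in its standard
form, the fraction of the relevant documents in the collection that were retrieved. The
function name and the toy identifiers are assumptions for this example.

```python
from typing import Set, Tuple


def precision_recall_f(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one query."""
    hits = len(retrieved & relevant)  # retrieved documents that are relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    retrieved = {"d1", "d2", "d3", "d4"}
    relevant = {"d1", "d2", "d5"}
    print(precision_recall_f(retrieved, relevant))
```

With four documents retrieved, two of them relevant, and three relevant documents in total,
precision is 2/4, recall is 2/3 and the F-measure is 4/7. Because the F-measure is a harmonic
mean, it always lies between precision and recall.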
Five experiments were conducted to evaluate the proposed algorithm over the web
documents. Figure 2 shows the performance analysis of the proposed data mining tool for
retrieving relevant information from the documents available in the database. The
experiments used collections of 600, 700, 800, 900 and 1000 web documents.
Figure 2. Performance Analysis of the data mining tool over the web documents
[Plot "Performance Analysis". x-axis: No. of Web Documents (600 to 1000); y-axis:
Accuracy (%); series: Without Semantic Indexing, With Semantic Indexing]
Figure 2 shows the performance of the proposed data mining tool, which combines data
pre-processing, clustering and classification. The accuracy of the tool changes significantly
with the number of sentences considered in the experiments.
Figure 3 shows the relevancy score analysis between the proposed IRA and the proposed IRA
with semantic indexing. Here, different numbers of documents are considered for the
experiments, namely 50, 100, 150, 200, 250 and 300. The document sets contain documents
of different levels.
Figure 3. Relevancy Score Analysis between IRA and IRA with Semantic Indexed table
[Plot "Relevancy Score Analysis". x-axis: No. of documents (100 to 500); y-axis:
Relevancy Score (%); series: IRA, Semantic Indexed table+IRA]
From Figure 3, it can be observed that the proposed IRA performs better with the semantic
index table than without it.
Figure 4 shows the relevancy score analysis between the existing FTCM algorithm and the
proposed combination of FTCM and IRA with the semantic index table. Here, this work
considered document sets of 50, 100, 150, 200, 250 and 300 documents for the experiments.
Figure 4. Relevancy score Analysis between the proposed system and FTCM
[Plot "Relevancy Score Analysis". x-axis: No. of documents (100 to 500); y-axis:
Relevancy Score (%); series: FTCM, Semantic Indexed table+IRA+FTCM]
From Figure 4, it can be observed that the proposed system performs better than the existing
FTCM algorithm.
Table 1 shows the performance of the proposed system and the existing systems. It gives the
precision, recall, F-measure and accuracy values for each of them.
Table 1. Performance Analysis
Method Name                      Precision   Recall   F-Measure   Accuracy (%)
FCM                              93.21       92.98    93.98       93.41
FTCM                             98.24       98.52    98.75       98.56
FTCM+IRA                         98.93       98.96    99.02       98.98
FTCM+IRA+Semantic Index Table    99.45       99.65    99.78       99.72
From Table 1, it is observed that the precision, recall, F-measure and accuracy of the
proposed system are higher than those of the existing systems. This is because the proposed
system provides a semantic index table, which is useful for syntax and semantics oriented
ordering of documents.
Figure 5 shows the accuracy analysis between the proposed recommendation system and the
existing model. Here, sets of 50, 100, 150, 200, 250 and 300 documents were considered for
the experiments.
Figure 5. Accuracy Analysis between the proposed data mining tool and the existing
model
[Plot "Recommendation Accuracy Analysis". x-axis: No. of Documents (50 to 300); y-axis:
Accuracy (%); series: Existing Model, Proposed Recommendation System]
From Figure 5, it can be observed that the accuracy of the proposed recommendation system
is better than that of the existing model by more than 2%. This is due to the use of effective
semantic based pre-processing, effective document clustering and the fuzzy rule based
BSO-IR.
6. CONCLUSION AND FUTURE WORKS
A new data mining tool has been developed in this work for retrieving data effectively by
using pre-processing and classification. A new semantic based data pre-processing technique
has been proposed and implemented, together with a new classification algorithm that uses
fuzzy rules and the Bees Swarm Optimization based Information Retrieval method. The
relevant data and web pages are grouped using the existing k-means clustering algorithm.
During the retrieval process, the inter and intra coupling relationships between the data are
analysed using an existing semantic model: common terms are used to identify the
intra-relationship between the data, and the partial order relation is used to identify the
inter-relationship. Finally, the proposed mining tool has been evaluated using the well-known
Web-docs and Wiki-links repositories and user feedback collected from users by Amazon.
Future work in this direction could introduce a new document clustering algorithm to enhance
the performance of the classifier used for retrieving relevant documents from web databases.
REFERENCES
1. Youcef Djenouri, Asma Belhadi, Riadh Belkebir, "Bees swarm optimization guided by
data mining techniques for document information retrieval", Expert Systems with
Applications, Vol. 94, pp. 126–136, 2018.
2. Shufeng Hao, Chongyang Shi, Zhendong Niu, Longbing Cao, "Concept coupling
learning for improving concept lattice-based document retrieval", Engineering
Applications of Artificial Intelligence, Vol. 69, pp. 65–75, 2018.
3. Fabrício O. de França, André L.V. Coelho, "A biclustering approach for classification
with mislabeled data", Expert Systems with Applications, Vol. 42, pp. 5065–5075, 2015.
4. Thiago Henrique Cupertino, Murillo Guimarães Carneiro, Qiusheng Zheng, Junbao
Zhang, Liang Zhao, "A scheme for high level data classification using random walk
and network measures", Expert Systems with Applications, Vol. 92, pp. 289–303, 2018.
5. Fuji Ren, "Advanced Information Retrieval", Electronic Notes in Theoretical Computer
Science, Vol. 225, pp. 303–317, 2009.
6. Antonio-Javier Gallego, Jorge Calvo-Zaragoza, Jose J. Valero-Mas, Juan R. Rico-Juan,
"Clustering-based k-nearest neighbor classification for large-scale data with neural codes
representation", Pattern Recognition, Vol. 74, pp. 531–543, 2018.
7. Mao-Yuan Pai, Ming-Yen Chen, Hui-Chuan Chu, Yuh-Min Chen, "Development of a
semantic-based content mapping mechanism for information retrieval", Expert Systems
with Applications, Vol. 40, pp. 2447–2461, 2013.
8. Olga Vechtomova, "Disambiguating context-dependent polarity of words:
An information retrieval approach", Information Processing and Management, Vol. 53,
pp. 1062–1079, 2017.
9. Preben Hansen, Kalervo Jarvelin, "Collaborative Information Retrieval in an
information-intensive domain", Information Processing and Management, Vol. 41,
pp. 1101–1119, 2005.
10. Rabia Nuray, Fazli Can, "Automatic ranking of information retrieval systems using data
fusion", Information Processing and Management, Vol. 42, pp. 595–614, 2006.
11. Goran Glavaš, Jan Šnajder, "Event graphs for information retrieval and multi-document
summarization", Expert Systems with Applications, Vol. 41, pp. 6904–6916,
2014.
12. Obada Alhabashneh, Rahat Iqbal, Faiyaz Doctor, Anne James, "Fuzzy rule based
profiling approach for enterprise information seeking and retrieval", Information
Sciences, Vol. 394–395, pp. 18–37, 2017.
13. A.S. Kulunchakov, V.V. Strijov, "Generation of simple structured information retrieval
functions by genetic algorithm without stagnation", Expert Systems with Applications,
Vol. 85, pp. 221–230, 2017.
14. Andrea Addis, Giuliano Armano, Eloisa Vargiu, "Multiagent systems and information
retrieval our experience with X.MAS", Expert Systems with Applications, Vol. 39, pp.
2509–2523, 2012.
15. Antonio Jimeno-Yepes, Rafael Berlanga-Llavori, Dietrich Rebholz-Schuhmann,
"Ontology refinement for improved information retrieval", Information Processing and
Management, Vol. 46, pp. 426–435, 2010.
16. Tatiana Gossen, Andreas Nürnberger, "Specifics of information retrieval for young
users: A survey", Information Processing and Management, Vol. 49, pp. 739–756, 2013.
17. L Sai Ramesh, Sannasi Ganapathy, R Bhuvaneshwari, Kanagasabai Kulothungan, V
Pandiyaraju, Arputharaj Kannan, "Prediction of user interests for providing relevant
information using relevance feedback and re-ranking", International Journal of Intelligent
Information Technologies (IJIIT), Vol. 11, No. 4, pp. 55-71, 2015.
18. Mehrbakhsh Nilashi, Othman Ibrahim, Karamollah Bagherifard, "A recommender
system based on collaborative filtering using ontology and dimensionality reduction
techniques", Expert Systems with Applications, Vol. 92, pp. 507–520, 2018.
19. J Arokia Renjit, KL Shunmuganathan, "Network based anomaly intrusion detection
system using SVM", Indian Journal of Science and Technology, Vol. 4, No. 9,
pp. 1105-1108, 2011.
20. J Arokia Renjit, KL Shunmuganathan, "Mining the Data from Distributed Database using
an Improved Mining Algorithm", International Journal of Computer Science and
Information Security, Vol. 7, No. 3, pp. 116-121, 2010.
AUTHORS BIOGRAPHY
Antony Rosewelt completed his undergraduate degree in Electronics and Communication
Engineering and his postgraduate degree in Computer Science and Engineering at Anna
University. He is currently working as an Assistant Professor in the Department of Computer
Science and Engineering, Stella Mary's College of Engineering. His research interests are
Data Mining, Soft Computing and Ad-hoc Networks.
J. Arokia Renjith completed his undergraduate degree in Electrical and Electronics
Engineering at Bharathiyar University, his postgraduate degree in Computer Science and
Engineering at Anna University and his PhD in Computer Science and Engineering at
Sathyabama University. He has been in the teaching field for the past 16 years. He is currently
working as Professor and Head of the Department of Computer Science and Engineering,
Jeppiaar Engineering College. His research interests are Data Mining, Cloud Computing and
Network Security.