
Advanced Journal of Computer Science and Engineering (AJCST)

ISSN : 2393-8390 (O) AJCST Vol. 4, Issue 3, April – June 2015

A Detailed Survey of Decision Trees Classifiers

Chanchal
Dept. of Computer Science and Technology
R P Inderprastha Institute of Technology (RPIIT)
[email protected]

Er. Namita
Dept. of Computer Science and Technology
R P Inderprastha Institute of Technology (RPIIT)

Abstract— Decision trees have played a significant role in data mining and machine learning for many years. They produce white-box classification and regression models, which makes them useful for feature selection and sample prediction. The transparency of these models is a big advantage over black-box learners: the models are easy to understand and interpret, and they can be readily extracted and implemented in any programming language (as nested if-else statements) for use in production environments. Furthermore, decision trees need very little data preparation (e.g., normalization) and can handle both numerical and nominal/categorical data. Decision trees may also be pruned or bundled into ensembles of trees (e.g., random forests) in order to remedy over-fitting and enhance prediction accuracy. This paper surveys decision tree based learning algorithms.

Index Terms—Data Mining, Reduced Error Pruning, Decision Trees, Classification.

I. INTRODUCTION

Decision trees are a simple but powerful form of multiple variable analysis [1]. They capture relationships between inputs and targets as rules that can be selected and used for prediction, and they supplement, complement, and substitute for traditional statistical forms of analysis (such as multiple linear regression). As illustrated in Figure 1, a decision tree can model both a continuous and a categorical object of analysis. A tree may be "learned" by splitting the source data set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner known as recursive partitioning.

Fig 1: An overview of Decision Trees in Machine learning and Data Mining

The recursion terminates when the subset at a node has the same value of the target variable, or when splitting no longer adds value to the predictions. This process of top-down induction of decision trees is an example of a greedy algorithm, and it is by far the most common strategy for learning decision trees from data. Decision trees also provide unique capabilities to supplement, complement, and substitute for a variety of data mining tools and techniques (such as neural networks). Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. It is one of the predictive modelling approaches used in statistics, data mining and machine learning. Tree models in which the target variable takes a finite set of values are called classification trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.

Decision trees where the target variable takes continuous values (typically real numbers) are called regression trees. A decision tree provides a sensible way to visually analyze and describe the relationship between the inputs and the target, and to predict the values of new or unseen observations that have values for the inputs but not for the target field. The target field may also be called an outcome, response, or dependent variable. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data rather than decisions; the resulting classification tree is instead an input for decision making. This paper addresses decision trees in data mining. A decision tree is a simple representation for classifying examples, and decision tree learning is one of the most successful techniques for supervised classification learning. For this discussion, assume that all of the features have finite discrete domains and that there is a single target feature called the classification. Each element of the domain of the classification is called a class. A decision tree or classification tree is a tree in which each internal (non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with a feature are labeled with each of the possible values of that feature. Each leaf of the tree is labeled with a class or a probability distribution over the classes. The general form of this modeling strategy is illustrated in Figure 1.


The root node of the tree reflects the entire data set under analysis, and the decision rule that forms the branches or segments underneath the root node is based on a value of one of the input fields; new or unseen observations may not yet contain values for the targets. Decision trees are a straightforward but effective kind of multiple variable analysis. In data mining, decision trees can also be described as the combination of mathematical and computational techniques that aid the description, categorisation and generalisation of a given set of data. Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments; these segments form an inverted decision tree that originates with a root node at the top of the tree, beneath which the distribution of the values found in the target field is displayed. Decision tree learning is a method commonly used in data mining [2]. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables, with edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf. Once this relationship is extracted, one or more decision rules can be derived that describe the branches or segments, and the values in the input fields are used to estimate the most likely value of the target field.
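As noted in the abstract, the rules along each root-to-leaf path translate directly into nested if-else statements. The following minimal Python sketch is illustrative only; the attributes, the cut-point and the class labels are hypothetical and not taken from the paper.

# Illustrative only: a tiny decision tree hand-translated into nested
# if-else statements. The attributes "age" and "income", the cut-point 30
# and the class labels are hypothetical examples.
def classify(record):
    """Map a record (a dict of input-field values) to a class label."""
    if record["age"] <= 30:                 # internal node: test on 'age'
        if record["income"] == "high":      # internal node: test on 'income'
            return "YES"                    # leaf: class label
        return "NO"
    return "YES"

print(classify({"age": 25, "income": "high"}))  # prints: YES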

II. CLASSIFICATION

There are various classes of decision trees discussed in the literature [3]:

1. Axis-Parallel Decision Trees: Axis-parallel decision trees are the most common kind found in the literature, mainly because this type of tree is usually much easier to interpret than an oblique tree. We divide the analysis of axis-parallel decision trees according to the main steps of the evolutionary process: how solutions are encoded; which methods are used for initializing the population of decision trees; the most common strategies for fitness evaluation; the genetic operators that are proposed to evolve individuals; and other related issues.

a. Solution Encoding: Some terminology issues are usually dictated by the EA solution encoding scheme. Nomenclature aside, decision tree encoding is either tree-based or non-tree-based; we comment on both next. Tree-based encoding is the most common way of encoding individuals in EAs for decision tree induction, and it is a natural choice when dealing with decision trees. One line of work applies competitive co-evolution to decision tree induction and uses a tree-encoding scheme in which binary decision trees are represented with every node embodied by a 4-tuple; each component is a numeric value that can be adjusted throughout the evolutionary process.

b. Population Initialization:

An EA's initial population has to provide enough diversity of individuals so that the genetic operators can search for solutions in a comprehensive search space, avoiding local optima. Nonetheless, a huge search space can result in extremely slow convergence, preventing the EA from finding a near-optimal solution. In this case, task-dependent bias constraints can speed up convergence by avoiding the search in "dead zones" of the solution space. Clearly, there is a thin line between the right amount of diversification for avoiding local optima and the task-dependent bias constraints that speed up convergence.

c. Fitness Evaluation Methods: Evolutionary decision tree induction algorithms can be roughly divided into two threads with respect to fitness evaluation: single-objective optimization and multi-objective optimization. EAs that perform single-objective optimization use a single measure to guide the search for near-optimal solutions. The most common measure for evaluating individuals in evolutionary algorithms for decision tree induction is classification accuracy:

acc = c / m,

where c is the number of correctly classified instances and m is the total number of instances.
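A minimal Python sketch of this fitness measure; the name accuracy_fitness and the predict argument (standing in for whatever classifier the individual encodes) are hypothetical.

def accuracy_fitness(predict, instances, labels):
    """Classification-accuracy fitness: acc = c / m.

    predict   -- hypothetical function mapping an instance to a class label
    instances -- list of instances (m = len(instances))
    labels    -- true class labels aligned with the instances
    """
    c = sum(1 for x, y in zip(instances, labels) if predict(x) == y)
    return c / len(instances)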

d. Selection Methods and Genetic Operators: Selection is the procedure that chooses which individuals will undergo crossover and mutation. In evolutionary induction of decision trees, the most frequently used selection method is tournament selection. Another popular choice in EAs for decision tree induction is roulette wheel selection, while a less common option is rank-based selection. Two operators commonly used to evolve a population of individuals are crossover and mutation. In EAs for decision tree induction, crossover is usually performed in two different ways according to the individual representation. For fixed-length binary string encoding, it is common to apply the well-known 1-point crossover, as sketched below.
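The following Python sketch illustrates the two operators just named, tournament selection and 1-point crossover on a fixed-length binary string encoding; the function names and the tournament size default are illustrative assumptions.

import random

def tournament_select(population, fitness, k=2):
    """Tournament selection: sample k individuals and keep the fittest."""
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)

def one_point_crossover(parent_a, parent_b):
    """1-point crossover on two fixed-length binary strings (lists of 0/1)."""
    point = random.randrange(1, len(parent_a))   # cut point, never at either end
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b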

e. Parameter Setting: The parameter values of an EA can largely determine whether the algorithm will find a near-optimal solution, and whether it will find such a solution efficiently. The most common parameters in EAs for decision tree induction are population size, number of generations, the probabilities of applying the different genetic operators, and the maximum size of the decision trees at initialization or throughout the evolutionary process. In practice, many preliminary runs are usually needed to tune these parameters. However, most authors prefer to


present a set of default parameter values followed by a sentence like "parameter values were empirically defined".

2. Oblique Decision Trees: Oblique decision trees, also referred to as (non-)linear decision trees, are a popular alternative to the traditional axis-parallel approach. Oblique decision trees are usually much smaller and often more accurate than axis-parallel decision trees, though at the cost of extra computational effort and a loss of comprehensibility. In oblique decision trees, each internal node tests a hyperplane that divides the feature space into two distinct regions.
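To make the contrast concrete, the Python sketch below shows the kind of test an oblique node performs: a single hyperplane over all features instead of a threshold on one axis. The weight values are purely illustrative.

def oblique_test(x, w, b):
    """Oblique (linear) split: route an instance by the sign of w . x + b.

    x -- feature vector, w -- hyperplane weights, b -- bias (all illustrative).
    Returns True for one half-space and False for the other.
    """
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0

# An axis-parallel test is the special case with a single non-zero weight.
goes_left = oblique_test([1.5, 2.0], w=[0.8, -0.3], b=0.1)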

III. DECISION TREE ALGORITHM

1. C4.5

The C4.5 algorithm [4] generates a decision tree for the given data by recursively splitting that data. The decision tree grows using a depth-first strategy. The C4.5 algorithm considers all of the possible tests that can split the data and selects the test that gives the best information gain ratio; this criterion removes ID3's bias in favor of wide decision trees. For each discrete attribute, one test is used that produces as many outcomes as there are distinct values of the attribute. For each continuous attribute, the data is sorted and the entropy gain is computed for binary cuts on each distinct value in one scan of the sorted data; this procedure is repeated for all continuous attributes. The C4.5 algorithm allows pruning of the resulting decision trees. Pruning increases the error rate on the training data but, crucially, decreases the error rate on unseen test data. The C4.5 algorithm can also deal with numeric attributes, missing values, and noisy data. It has the following advantages and disadvantages:
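As a concrete illustration of the treatment of continuous attributes described above, the Python sketch below sorts an attribute once and evaluates every candidate binary cut by entropy-based information gain. It is a simplified sketch under those assumptions, not the full C4.5 procedure.

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_binary_cut(values, labels):
    """One scan over the sorted continuous attribute; return the cut point
    with the highest information gain (simplified sketch, not full C4.5)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                  # equal values: no new cut
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut, best_gain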

Advantages: C4.5 can handle both continuous and discrete

attributes. In order to handle continuous attributes, it creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.

C4.5 allows attribute values to be marked as "?" for missing. Missing attribute values are simply not used in gain and entropy calculations.

C4.5 goes back through the tree once it's been created and attempts to remove branches that do not help by replacing them with leaf nodes.

Disadvantages: C4.5 constructs empty branches, and this is a crucial issue for rule generation. Many nodes carry zero values or values close to zero; these values neither contribute to generating rules nor help to construct any class for the classification task. Rather, they make the tree bigger and more complex.

Over-fitting happens when the model picks up data with uncommon characteristics. In general, the C4.5 algorithm constructs trees and grows their branches "just deep enough to perfectly classify the training examples".

Susceptible to noise.

1.1 Decision Trees and C4.5: A decision tree is a classifier that performs recursive partitioning of the instance space. A typical decision tree is composed of internal nodes, edges and leaf nodes. Each internal node, called a decision node, represents a test on an attribute or a subset of attributes, and each edge is labeled with a specific value or range of values of the input attributes. In this way, internal nodes together with their edges split the instance space into two or more partitions. Each leaf node is a terminal node of the tree with a class label. For example, Figure 1 provides an illustration of a basic decision tree, where circles denote decision nodes and squares denote leaf nodes. In this example there are three splitting attributes, i.e., age, gender and criteria 3, along with two class labels, i.e., YES and NO. Each path from the root node to a leaf node forms a classification rule.

Figure 1 Illustration of Decision Tree

The general process of building a decision tree is as follows. Given a set of training data, apply a measurement function to all attributes to find the best splitting attribute. Once the splitting attribute is determined, the instance space is partitioned into several parts. Within each partition, if all training instances belong to one single class, the algorithm terminates; otherwise, the splitting process is performed recursively until each whole partition is assigned to the same class. Once a decision tree is built, classification rules can easily be generated and used for classifying new instances with unknown class labels. A simplified sketch of this recursive procedure is given below.
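A hedged Python sketch of the recursive build process just described; best_split is a placeholder for whatever measurement function is used (e.g., information gain or gain ratio), and the dictionary representation of nodes is an assumption for illustration.

from collections import Counter

def build_tree(instances, labels, attributes, best_split):
    """Recursive partitioning sketch (not a specific published algorithm).

    best_split(instances, labels, attributes) is assumed to return the chosen
    attribute and a dict mapping each of its values to the covered row indices.
    """
    # Stop: every training instance in this partition has the same class.
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    # Stop: no attributes left to split on; predict the majority class.
    if not attributes:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    attr, partitions = best_split(instances, labels, attributes)
    children = {}
    for value, idx in partitions.items():
        children[value] = build_tree([instances[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attributes if a != attr],
                                     best_split)
    return {"split_on": attr, "children": children}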

C4.5 [5] is a standard algorithm for inducing classification rules in the form of a decision tree. As an extension of ID3 [6], the default criterion for choosing splitting attributes in C4.5 is the information gain ratio. Instead of using information gain as


in ID3, the information gain ratio avoids the bias of selecting attributes with many values.
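For reference, a minimal Python sketch of the gain ratio criterion, which normalizes information gain by the split information of the partition; the helper name and argument layout are assumptions.

from math import log2

def gain_ratio(information_gain, partition_sizes):
    """Gain ratio = information gain / split information.

    partition_sizes -- number of instances falling into each branch of the split.
    """
    total = sum(partition_sizes)
    split_info = -sum((s / total) * log2(s / total)
                      for s in partition_sizes if s > 0)
    return information_gain / split_info if split_info > 0 else 0.0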

2. REDUCED ERROR PRUNING

Reduced Error Pruning Tree (REPTree) is a fast decision tree learner that builds a decision tree based on information gain or variance reduction [7]. The basis of this algorithm's pruning is that it uses reduced error pruning (REP) with backfitting. It sorts the values of numerical attributes only once, and it handles missing values with the method embedded in C4.5, namely fractional instances. In this algorithm we can thus see the splitting method taken from C4.5 combined with basic REP in its pruning process.
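A hedged Python sketch of the basic reduced error pruning idea on the node dictionaries produced by the earlier build_tree sketch: working bottom-up, a subtree is collapsed into a majority-class leaf whenever that does not increase the error on the held-out pruning instances reaching that node. It illustrates the general REP principle only (here the majority class is taken from the pruning instances), not the exact REPTree or C4.5 procedure.

from collections import Counter

def predict(tree, x):
    """Follow the node dictionaries from the build_tree sketch to a leaf."""
    while "leaf" not in tree:
        child = tree["children"].get(x[tree["split_on"]])
        if child is None:                                  # unseen attribute value:
            child = next(iter(tree["children"].values()))  # fall back to a branch
        tree = child
    return tree["leaf"]

def reduced_error_prune(node, prune_data, prune_labels):
    """Bottom-up REP sketch: collapse a subtree into a leaf if the error on the
    held-out pruning instances reaching this node does not increase."""
    if "leaf" in node or not prune_labels:
        return
    # Route the pruning instances reaching this node down to the children.
    for value, child in node["children"].items():
        idx = [i for i, x in enumerate(prune_data) if x[node["split_on"]] == value]
        reduced_error_prune(child,
                            [prune_data[i] for i in idx],
                            [prune_labels[i] for i in idx])
    subtree_errors = sum(predict(node, x) != y
                         for x, y in zip(prune_data, prune_labels))
    majority = Counter(prune_labels).most_common(1)[0][0]
    leaf_errors = sum(y != majority for y in prune_labels)
    if leaf_errors <= subtree_errors:          # pruning does not hurt accuracy
        node.clear()
        node["leaf"] = majority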

IV. RELATED WORK

Rodrigo Coelho Barros et al., 2012 [8] present a survey of evolutionary algorithms designed for decision tree induction. Most of the paper focuses on approaches that evolve decision trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, they present some alternative methods that make use of evolutionary algorithms to improve particular components of decision tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy which addresses works that evolve decision trees as well as works that design decision tree components using evolutionary algorithms. Raj Kumar et al., 2012 [9] describe classification as a model-finding process used for partitioning data into different classes according to some constraints; in other words, classification is the process of generalizing the data according to different instances. Several major kinds of classification algorithms are considered, including C4.5, the k-nearest neighbor classifier, Naive Bayes, SVM, Apriori, and AdaBoost, and the paper provides an inclusive survey of these different classification algorithms. A.S. Galathiya et al., 2012 [10] compare ID3, C4.5 and C5.0. Among these classifiers, C5.0 gives more accurate and efficient output with comparatively high speed. Memory usage to store the rule set is lower for the C5.0 classifier because it generates a smaller decision tree. This research work supports high accuracy, good speed and low memory usage, as the proposed system uses C5.0 as the base classifier. The classification process has low memory usage compared to other techniques because it generates fewer rules. Accuracy is high

as the error rate is low on unseen cases, and it is fast due to the generation of pruned trees. Susan Lomax et al., 2013 [11] observe that in the last decade there has been increasing usage of data mining techniques on medical data for discovering useful trends or patterns that are used in diagnosis and decision making. Data mining techniques such as clustering, classification, regression, association rule mining and CART (Classification and Regression Trees) are widely used in the healthcare domain. Data mining algorithms, when appropriately used, are capable of improving the quality of prediction, diagnosis and disease classification. The main focus of this paper is to analyze the data mining techniques required for medical data mining, especially to discover locally frequent diseases such as heart ailments, lung cancer, breast cancer and so on. They evaluate the data mining techniques for finding locally frequent patterns in terms of cost, performance, speed and accuracy, and also compare data mining techniques with conventional methods. Anuja Priyama et al., 2013 [12] note that the amount of data stored in educational databases is increasing swiftly. These databases contain hidden information for the improvement of students' performance. Classification of data objects is a data mining and knowledge management technique used for grouping similar data objects together. There are many classification algorithms available in the literature, but the decision tree is the most commonly used because of its ease of execution and because it is easier to understand than other classification algorithms. The ID3, C4.5 and CART decision tree algorithms have previously been applied to student data to predict performance, but these are used only for small data sets and require that all or a portion of the entire dataset remain permanently in memory. Richa Sharma et al., 2013 [13] attempt to develop a decision tree classification (DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source data mining software. The classified image is compared with the image classified using classical ISODATA clustering and Maximum Likelihood Classifier (MLC) algorithms. Classification based on the DTC method provided better visual depiction than results produced by ISODATA clustering or by MLC algorithms. Leszek Rutkowski et al., 2013 [14] note that in mining data streams the most popular tool is the Hoeffding tree algorithm. It uses Hoeffding's bound to determine the smallest number of examples needed at a node to select a


splitting attribute. In the literature the same Hoeffding's bound was used for any evaluation function (heuristic measure), e.g. information gain or the Gini index. In this paper it is shown that Hoeffding's inequality is not appropriate for solving the underlying problem. They prove two theorems presenting the McDiarmid's bound for both the information gain, used in the ID3 algorithm, and the Gini index, used in the CART algorithm. The results of the paper guarantee that a decision tree learning system, applied to data streams and based on the McDiarmid's bound, has the property that its output is nearly identical to that of a conventional learner. The results have a great impact on the state of the art of mining data streams, and various methods and algorithms developed so far should be reconsidered. Nirmal Kumar et al., 2013 [15] note that land capability classification (LCC) of a soil map unit is sought for sustainable use, management and conservation practices. The high speed, high precision and simple rule generation of machine learning algorithms can be utilized to construct pre-defined rules for LCC of soil map units when developing decision support systems for land use planning of an area. The decision tree (DT) is one of the most popular classification algorithms in machine learning and data mining. The generation of a Best First Tree (BF Tree) from qualitative soil survey data for LCC, reported in the reconnaissance soil survey data of Wardha district, Maharashtra, is demonstrated in the study, with soil depth, slope and erosion as attributes for LCC. A 10-fold cross validation provided an accuracy of 100%. The results indicated that the BF Tree algorithm has good potential for automation of LCC of soil survey data, which in turn will help to develop decision support systems to suggest suitable land use systems and soil and water conservation practices. Dursun Delen et al., 2013 [16] note that determining firm performance using a set of financial measures/ratios has been an interesting and challenging problem for many researchers and practitioners. Identification of factors (i.e., financial measures/ratios) that can accurately predict firm performance is of great interest to any decision maker. In this study, they employed a two-step analysis methodology: first, using exploratory factor analysis (EFA) they identified (and validated) the underlying dimensions of the financial ratios, followed by the use of predictive modeling methods to discover the potential relationships between firm performance and financial ratios. Kalpesh Adhatrao et al., 2013 [17] note that an educational institution needs to have an approximate prior knowledge of enrolled students to predict their performance in future academics. This helps it to identify promising students and also provides an opportunity to pay

attention to and improve those who would probably get lower grades. As a solution, they developed a system which can predict the performance of students from their previous performance using concepts of data mining techniques under classification. They analyzed a data set containing information about students, such as gender, marks scored in the board examinations of classes X and XII, marks and rank in entrance examinations, and results in the first year of the previous batch of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms to this data, they predicted the general and individual performance of freshly admitted students in future examinations. Delveen Luqman Abd et al., 2013 [18] study a comparison among three classification algorithms: the k-nearest neighbor classifier, decision trees and Bayesian networks. The paper demonstrates the strength and accuracy of each algorithm for classification in terms of performance efficiency and the time complexity required. For model validation purposes, a twenty-four-month data analysis is conducted on a mock-up basis. Michal Wozniak et al., 2014 [19] note that a current focus of intense research in pattern classification is the combination of several classifier systems, which can be built following either the same or different models and/or dataset building approaches. These systems perform information fusion of classification decisions at different levels, overcoming limitations of traditional approaches based on single classifiers. The paper presents an up-to-date survey on multiple classifier systems (MCS) from the point of view of hybrid intelligent systems. The article discusses major issues, such as diversity and decision fusion methods, and provides a vision of the spectrum of applications that are currently being developed. Brijain R. Patel et al., 2014 [20] describe data mining as the process of discovering or extracting new patterns from large data sets involving methods from statistics and artificial intelligence. Classification and prediction are the techniques used to make out important data classes and predict probable trends. The decision tree is an important classification method in data mining; it is commonly used in marketing, surveillance, fraud detection and scientific discovery. The classical decision tree algorithms ID3, C4.5 and C5.0 have the merits of high classification speed, strong learning ability and simple construction. However, these algorithms are also unsatisfactory in practical application: when used for classification, there exists the problem of inclining to choose attributes which have more values and overlooking attributes which have fewer values. This paper


focuses on the various decision tree algorithms, their characteristics, challenges, advantages and disadvantages.

V. CONCLUSION AND FUTURE WORKS

A decision tree is a tree-shaped data structure that embodies the divide-and-conquer approach and is used for supervised learning. It is a tree-structured model in which the local region is identified recursively through a small number of successive splits. A decision tree consists of internal decision nodes and terminal leaves. In future work we will study a variant of decision trees in which the classification error is minimized using reduced error pruning; this algorithm will be based on the principle of calculating the information gain with entropy and reducing the error arising from variance. With the help of this method, the complexity of the decision tree model can be decreased and the error arising from variance is reduced.

VI. REFERENCES

[1]. Oliver, Jonathan J., and David J. Hand. "On pruning and averaging decision trees." In Machine Learning: Proceedings of the Twelfth International Conference, pp. 430-437. 2014.

[2]. Larose, Daniel T. Discovering knowledge in data: an introduction to data mining. John Wiley & Sons, 2014.

[3]. Quinlan, J. Ross. C4.5: Programs for Machine Learning. Elsevier, 2014.

[4]. Mašetic, Zerina, and Abdulhamit Subasi. "Detection of congestive heart failures using C4.5 Decision Tree." SouthEast Europe Journal of Soft Computing 2, no. 2 (2013).

[5]. Singh, Naveen Choudhary, and Dharm Jully Samota. "Analysis of Data Mining Classification with Decision Tree Technique." Global Journal of Computer Science and Technology 13, no. 13 (2014).

[6]. Oliver, Jonathan J., and David J. Hand. "On pruning and averaging decision trees." In Machine Learning: Proceedings of the Twelfth International Conference, pp. 430-437. 2014.

[7]. Raj Kumar and Rajesh Verma. "Classification algorithms for data mining: A survey." International Journal of Innovations in Engineering and Technology (IJIET) 1, no. 2 (2012): 7-14.

[8]. Rodrigo Coelho Barros, Marcio Porto Basgalupp, A. C. P. L. F. De Carvalho, and Alex Alves Freitas. "A survey of evolutionary algorithms for decision-tree induction." Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 42, no. 3 (2012): 291-312.

[9]. Raj Kumar and Rajesh Verma. "Classification algorithms for data mining: A survey." International Journal of Innovations in Engineering and Technology (IJIET) 1, no. 2 (2012): 7-14.

[10]. A.S. Galathiya, A. P. Ganatra, and C. K. Bhensdadia. "Improved Decision Tree Induction Algorithm with Feature Selection, Cross Validation, Model Complexity and Reduced Error Pruning." International Journal of Computer Science and Information Technologies 3, no. 2 (2012): 3427-3431.

[11]. Susan Lomax and Sunil Vadera. "A survey of cost-sensitive decision tree induction algorithms." ACM Computing Surveys (CSUR) 45, no. 2 (2013): 16.

[12]. Mohammed Abdul Khaleel, Sateesh Kumar Pradham, and G. N. Dash. "A survey of data mining techniques on medical data for finding locally frequent diseases." Int. J. Adv. Res. Comput. Sci. Softw. Eng 3, no. 8 (2013).

[13]. Anuja Priyama, Rahul Guptaa Abhijeeta, Anju Ratheeb, and Saurabh Srivastavab. "Comparative Analysis of Decision Tree Classification Algorithms." International Journal of Current Engineering and Technology 3, no. 2 (2013): 866-883.

[14]. Richa Sharma, Aniruddha Ghosh, and P. K. Joshi. "Decision tree approach for classification of remotely sensed satellite data using open source support." Journal of Earth System Science 122, no. 5 (2013): 1237-1247.

[15]. Leszek Rutkowski, Lena Pietruczuk, Piotr Duda, and Maciej Jaworski. "Decision trees for mining data streams based on the McDiarmid's bound." Knowledge and Data Engineering, IEEE Transactions on 25, no. 6 (2013): 1272-1279.

[16]. Nirmal Kumar, G. P. Reddy, and S. Chatterji. "Evaluation of Best First Decision Tree on Categorical Soil Survey Data for Land Capability Classification." International Journal of Computer Applications 72, no. 4 (2013).

[17]. Dursun Delen, Cemil Kuzey, and Ali Uyar. "Measuring firm performance using financial ratios: A decision tree approach." Expert Systems with Applications 40, no. 10 (2013): 3970-3983.

[18]. Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha, and Vipul Honrao. "Predicting Students' Performance using ID3 and C4.5 Classification Algorithms." arXiv preprint arXiv:1310.2071 (2013).

[19]. Delveen Luqman Abd AL-Nabi and Shereen Shukri Ahmed. "Survey on Classification Algorithms for Data Mining: (Comparison and Evaluation)." Computer Engineering and Intelligent Systems 4, no. 8 (2013): 18-24.

[20]. Michal Wozniak, Manuel Graña, and Emilio Corchado. "A survey of multiple classifier systems as hybrid systems." Information Fusion 16 (2014): 3-17.

[21]. Brijain R. Patel, and Kaushik K. Rana. "Use of Renyi Entropy Calculation Method for ID3 Algorithm for Decision tree Generation in Data Mining." International Journal 2, no. 5 (2014).