decision trees
DATA WAREHOUSING AND DATA MINING
DECISION TREE
Contents
•Introduction
•Decision Tree
•Decision Tree Algorithm
•Decision Tree Based Algorithm
•Algorithm
•Decision Tree Advantages and Disadvantages
Introduction
•Classification is one of the most familiar and most popular data mining techniques.
•Classification applications include image and pattern recognition, loan approval, and fault detection in industrial applications.
•All approaches to performing classification assume some knowledge of the data.
•A training set is used to develop the specific parameters required by the technique.
Decision Tree
•Decision Tree (DT):
▫Tree where the root and each internal node is labeled with a question.
▫The arcs represent each possible answer to the associated question.
▫Each leaf node represents a prediction of a solution to the problem.
•Popular technique for classification; each leaf node indicates the class to which the corresponding tuple belongs.
![Page 5: Decision trees](https://reader036.vdocuments.us/reader036/viewer/2022083119/586fcf581a28aba24c8b8013/html5/thumbnails/5.jpg)
Decision Tree Example
Decision Tree
•A Decision Tree Model is a computational model consisting of three parts:
▫Decision tree
▫Algorithm to create the tree
▫Algorithm that applies the tree to data
•Creation of the tree is the most difficult part.
•Processing is basically a search similar to that in a binary search tree (although a DT may not be binary).
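A minimal Python sketch of the first and third parts (the tree itself and the algorithm that applies it). The loan-approval tree, the names `Leaf`, `Node`, `classify`, `income_band`, and the income thresholds are all hypothetical and not taken from the slides. The root has three arcs, so this tree is not binary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    label: str  # predicted class


@dataclass
class Node:
    # The question maps a tuple to the answer that labels one outgoing arc.
    question: Callable[[dict], str]
    arcs: Dict[str, Union["Node", Leaf]]


def classify(tree, tup):
    """Apply the tree: follow one arc per level until a leaf is reached."""
    while isinstance(tree, Node):
        answer = tree.question(tup)
        tree = tree.arcs[answer]
    return tree.label


def income_band(t):
    # Hypothetical thresholds, chosen only for illustration.
    if t["income"] < 30_000:
        return "low"
    if t["income"] < 80_000:
        return "medium"
    return "high"


tree = Node(income_band, {
    "low": Leaf("reject"),
    "medium": Node(lambda t: "yes" if t["collateral"] else "no",
                   {"yes": Leaf("approve"), "no": Leaf("reject")}),
    "high": Leaf("approve"),
})

print(classify(tree, {"income": 50_000, "collateral": True}))
```

As in a search tree, the work done per tuple is proportional to the depth of the path followed, not to the size of the whole tree.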
Decision Tree Algorithm
Algorithm Definition
•The decision tree approach is most useful in classification problems. With this technique, a tree is constructed to model the classification process.
•Once the tree is built, it is applied to each tuple in the database and results in a classification for that tuple.
•There are two basic steps in this technique: building the tree and applying the tree to the database.
•The decision tree approach to classification is to divide the search space into rectangular regions. A tuple is classified based on the region into which it falls.
•Definition: Given a database D = {t1, ..., tn}, where ti = <ti1, ..., tih> and the database schema consists of the attributes {A1, A2, ..., Ah}, and given a set of classes C = {C1, ..., Cm}, a decision tree (DT), or classification tree, is a tree associated with D that has the following properties:
▫Each internal node is labeled with an attribute Ai.
▫Each arc is labeled with a predicate that can be applied to the attribute associated with the parent.
▫Each leaf node is labeled with a class Cj.
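The rectangular-region view can be made concrete with a small Python sketch. It assumes numeric attributes and binary threshold predicates (`attr <= threshold` on one arc, `attr > threshold` on the other). The names `Split`, `Leaf`, `leaf_regions` and the example tree are hypothetical. Walking the tree and narrowing one attribute's bounds at each internal node yields one axis-aligned rectangle per leaf.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class Cj


@dataclass
class Split:
    attr: str                    # attribute Ai tested at this internal node
    threshold: float             # arc predicates: attr <= threshold, attr > threshold
    left: Union["Split", Leaf]   # subtree for attr <= threshold
    right: Union["Split", Leaf]  # subtree for attr > threshold


def leaf_regions(tree, bounds):
    """Return (label, bounds) per leaf; bounds[attr] = (low, high) means low < value <= high."""
    if isinstance(tree, Leaf):
        return [(tree.label, dict(bounds))]
    low, high = bounds[tree.attr]
    left = {**bounds, tree.attr: (low, min(high, tree.threshold))}
    right = {**bounds, tree.attr: (max(low, tree.threshold), high)}
    return leaf_regions(tree.left, left) + leaf_regions(tree.right, right)


tree = Split("x", 5.0,
             Leaf("A"),
             Split("y", 2.0, Leaf("B"), Leaf("C")))
whole_space = {"x": (-math.inf, math.inf), "y": (-math.inf, math.inf)}

for label, region in leaf_regions(tree, whole_space):
    print(label, region)
```

Every region is bounded only by lines parallel to the axes, which is why a class boundary that runs diagonally through the data can only be approximated by many small rectangles.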
Algorithm
•Input:
D // Training data
•Output:
T // Decision tree
•DTBuild algorithm
// Simplistic algorithm to illustrate naive approach to building DT
T = ∅;
Determine best splitting criterion;
T = Create root node and label with splitting attribute;
T = Add arc to root node for each split predicate and label;
for each arc do
    D' = database created by applying splitting predicate to D;
    if stopping point reached for this path, then
        T' = Create leaf node and label with appropriate class;
    else
        T' = DTBuild(D');
    T = Add T' to arc;
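One possible way to make DTBuild runnable is sketched below in Python. The slides leave the splitting criterion and the stopping point open, so the choices here are assumptions: categorical attributes, one arc per observed attribute value, information gain (lowest weighted entropy) as the splitting criterion, and a leaf labeled with the majority class once a subset is pure or no attributes remain. The function names and the toy training data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def partition(rows, attr):
    """Apply the split predicates: one subset of rows per value of attr."""
    parts = {}
    for r in rows:
        parts.setdefault(r[attr], []).append(r)
    return parts


def best_attribute(rows, attrs, target):
    """Assumed splitting criterion: lowest weighted entropy (highest information gain)."""
    def weighted_entropy(attr):
        return sum(len(p) / len(rows) * entropy(p, target)
                   for p in partition(rows, attr).values())
    return min(attrs, key=weighted_entropy)


def dt_build(rows, attrs, target):
    classes = Counter(r[target] for r in rows)
    # Assumed stopping point: pure subset, or no attributes left to split on.
    if len(classes) == 1 or not attrs:
        return classes.most_common(1)[0][0]  # leaf labeled with the majority class
    attr = best_attribute(rows, attrs, target)
    remaining = [a for a in attrs if a != attr]
    # Internal node labeled with the splitting attribute; one arc per value.
    return {attr: {value: dt_build(subset, remaining, target)
                   for value, subset in partition(rows, attr).items()}}


# Hypothetical training data D.
data = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
tree = dt_build(data, ["outlook", "windy"], "play")
print(tree)
```

Each recursive call receives its own subset of rows, so the data used for one arc never leaks into the subtree built for another arc.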
DT Advantages/Disadvantages
•Advantages:
▫Easy to understand.
▫Easy to generate rules.
•Disadvantages:
▫May suffer from overfitting.
▫Classifies by rectangular partitioning.
▫Does not easily handle nonnumeric data.
▫Can be quite large – pruning is necessary.
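The "easy to generate rules" advantage can be illustrated with a short Python sketch: every root-to-leaf path becomes one IF-THEN rule. The nested-dict tree representation (`{attribute: {value: subtree_or_class}}`), the function name `tree_to_rules`, and the example tree are assumptions made for illustration.

```python
def tree_to_rules(tree, conditions=()):
    """Turn each root-to-leaf path into one IF-THEN rule."""
    if not isinstance(tree, dict):           # leaf: tree is the class label
        antecedent = " AND ".join(conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {tree}"]
    (attr, arcs), = tree.items()             # internal node: exactly one attribute
    rules = []
    for value, subtree in arcs.items():
        rules += tree_to_rules(subtree, conditions + (f"{attr} = {value}",))
    return rules


# Hypothetical tree: outlook at the root, windy below the "rainy" arc.
tree = {"outlook": {"sunny": "yes",
                    "rainy": {"windy": {"no": "yes", "yes": "no"}}}}

for rule in tree_to_rules(tree):
    print(rule)
```

The number of rules equals the number of leaves, so a large unpruned tree also produces a large rule set.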
Made by: Jagjit Singh Wilku