data mining: classification & predication hosam al-samarraie, phd. centre for instructional...
![Page 1: Data Mining: Classification & Predication Hosam Al-Samarraie, PhD. Centre for Instructional Technology & Multimedia Universiti Sains Malaysia](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649ebc5503460f94bc5846/html5/thumbnails/1.jpg)
Data Mining: Classification & Prediction
Hosam Al-Samarraie, PhD.
Centre for Instructional Technology & Multimedia
Universiti Sains Malaysia
What Does Data Mining Do?
• Extract patterns from data
– Pattern? A mathematical (numeric and/or symbolic) relationship among data items.
• Types of patterns
– Association
– Classification & Prediction
– Cluster (segmentation)
Knowledge Discovery
Steps in a Knowledge Discovery process
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
– Supervision: The training data (observations, constructs, variables, eye-movement parameters, etc.) are accompanied by labels indicating the class of the observations (output, dependent variable, known class, etc.). The result is a model to be tested.
• Unsupervised learning (clustering & association)
– The class labels are unknown. Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Classification vs. Prediction
Classification:
– Predicts categorical class labels
– Classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses the model in classifying new data
Prediction (Regression):
– Similar to classification, but the model estimates unknown or missing numeric (continuous) values rather than class labels
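To make the distinction concrete, here is a minimal sketch in plain Python. The data, the attribute `years`, and the helper names `classify_1nn` and `fit_line` are all made up for illustration; the nearest-neighbour and least-squares methods are simply convenient stand-ins, not techniques named on the slide.

```python
# Hypothetical toy data (all values invented for illustration).
years = [1, 2, 3, 5, 7, 8]
tenured = ["no", "no", "no", "yes", "yes", "yes"]   # categorical target
salary = [30.0, 32.0, 34.0, 38.0, 42.0, 44.0]       # numeric target

def classify_1nn(x):
    """Classification: return the class label of the closest training case."""
    nearest = min(range(len(years)), key=lambda i: abs(years[i] - x))
    return tenured[nearest]

def fit_line(xs, ys):
    """Numeric prediction: least-squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(years, salary)
print(classify_1nn(6))   # output is a class label
print(a + b * 6)         # output is a number
```

The first call returns one of the predefined labels; the second returns a value on a continuous scale, which is the sense of "prediction" used in this deck.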
Classification
My DV
My IV
Classification: A Two-Step Process
• Model construction: describing a set of predetermined classes
– Each case/instance is assumed to belong to a predefined class, as determined by the class label attribute (DV)
– The set of cases used for model construction is called the training set
• Model usage: for classifying future or unknown objects
– Estimate accuracy of the model
– The known label of each test sample is compared with the classified result from the model
– Accuracy rate is the percentage of test set samples that are correctly classified by the model
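The two steps can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: the cases are invented, and the "model" is just a single cut-off on a hypothetical `years` attribute chosen to fit the training set.

```python
# Hypothetical labelled cases: (years, class label). Values are invented.
cases = [(1, "no"), (2, "no"), (3, "no"), (4, "yes"), (6, "yes"),
         (7, "yes"), (2, "no"), (5, "yes"), (3, "yes"), (8, "yes")]

# Hold out the last four cases as the test set.
train, test = cases[:6], cases[6:]

def train_threshold(training_set):
    """Step 1 (model construction): pick the cut-off on years that
    classifies the training set best."""
    best_cut, best_correct = None, -1
    for cut in sorted({x for x, _ in training_set}):
        correct = sum(("yes" if x >= cut else "no") == label
                      for x, label in training_set)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

cut = train_threshold(train)

def classify(x):
    """The learned model: years >= cut means 'yes'."""
    return "yes" if x >= cut else "no"

# Step 2 (model usage): compare known test labels with the model's output.
correct = sum(classify(x) == label for x, label in test)
accuracy = 100.0 * correct / len(test)
print(f"model: years >= {cut}; test accuracy: {accuracy:.0f}%")
```

Accuracy is measured on cases the model did not see during construction, which is why the test set is kept separate from the training set.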
Classification Process (1): Model Construction
Training Data → Classification Algorithms → Classifier (Model)
Example rule: IF Hosam = 'Senior lecturer' OR years > 3 THEN tenured = 'yes'
Classification Process (2): Use the Model in Prediction
Testing Data → Classifier
Unseen Data: (Anwer, Associate, 4) → Classifier → Bonus?
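As a rough sketch, the rule from the model-construction slide can be written as a function and applied to the unseen tuple. Several things are assumed here: the tuple is read as (name, rank, years), the rule's first condition is treated as a test on a `rank` field, and any case the rule does not cover is labelled 'no' (the slide gives no ELSE branch). Note that the rule outputs `tenured`, while this slide asks about a bonus.

```python
def tenured(rank, years):
    """Slide rule: rank is 'Senior lecturer' OR years > 3 -> 'yes'.
    The 'no' default is an assumption; the slide states no ELSE branch."""
    if rank == "Senior lecturer" or years > 3:
        return "yes"
    return "no"

# Unseen tuple from the slide, read here as (name, rank, years).
name, rank, years = ("Anwer", "Associate", 4)
print(name, "->", tenured(rank, years))
```

The rank does not match, but `years > 3` holds, so the rule fires on its second condition.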
Learning and using a model
• Learning
– Learning algorithm takes instances of concept as input
– Produces a structural description (model) as output
Input: concept to learn → Learning algorithm → Model
• Prediction
– Model takes new instance as input
– Outputs prediction
Input → Model → Prediction
Other Classification Techniques
• Decision tree analysis, J48 (most popular)
• Neural networks
• Support vector machines (most popular)
• Naïve Bayes (most popular)
Classification by Decision Tree Induction
Decision tree
– A flow-chart-like tree structure
– Internal node denotes a test on an attribute
– Branch represents an outcome of the test
– Leaf nodes represent class labels or class distribution
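One way to picture this structure is as nested dictionaries. The tree below is hand-built for illustration, not induced by J48 or any other algorithm; it borrows the attribute names Stirrup and Dewlap and the animal labels from the table later in this deck, and the dictionary keys `attribute` and `branches` are arbitrary names chosen for the sketch.

```python
# A hand-built tree (an assumption for illustration, not learned from data).
tree = {
    "attribute": "Stirrup",                 # internal node: test on an attribute
    "branches": {                           # one branch per outcome of the test
        "Yes": "Horse",                     # leaf: class label
        "No": {
            "attribute": "Dewlap",
            "branches": {"Yes": "Sheep", "No": "Rabbit"},
        },
    },
}

def classify(node, instance):
    """Follow branches from the root until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        outcome = instance[node["attribute"]]
        node = node["branches"][outcome]
    return node

print(classify(tree, {"Stirrup": "No", "Dewlap": "Yes"}))
```

Classifying an instance is a walk from the root to a leaf, answering one attribute test per internal node.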
Accuracy Measures
Most accuracy measures are derived from the classification matrix (also called the confusion matrix).
This matrix summarizes the correct and incorrect classifications that a classifier produced for a certain dataset.
Rows and columns of the confusion matrix correspond to the true and predicted classes, respectively.
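A small sketch of how such a matrix can be tallied, with rows as true classes and columns as predicted classes. The true and predicted labels below are invented for illustration; they are not results from any classifier in this deck.

```python
from collections import Counter

# Hypothetical true and predicted labels for seven test cases.
actual    = ["Horse", "Horse", "Sheep", "Rabbit", "Rabbit", "Sheep", "Horse"]
predicted = ["Horse", "Sheep", "Sheep", "Rabbit", "Rabbit", "Sheep", "Horse"]

classes = sorted(set(actual) | set(predicted))
counts = Counter(zip(actual, predicted))

# Rows are true classes, columns are predicted classes.
matrix = [[counts[(t, p)] for p in classes] for t in classes]

# Correct classifications sit on the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(classes))) / len(actual)

print("columns (predicted):", classes)
for true_class, row in zip(classes, matrix):
    print(true_class, row)
print(f"accuracy = {accuracy:.2f}")
```

Off-diagonal cells show which classes are confused with which; here one Horse is misclassified as Sheep.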
ROC Curves
• Receiver operating characteristic
• Summarizes and presents the performance of any binary classification model
• Shows the model's ability to distinguish between false and true positives
ROC Curves (cont.)
• Receiver operating characteristic (ROC) curves are commonly used to show how the number of correctly classified positive examples (true positives) varies with the number of incorrectly classified negative examples (false positives).
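The points of an ROC curve can be computed by lowering the decision threshold one example at a time and recording the false positive rate and true positive rate at each step. The scores and labels below are invented, and the function names `roc_points` and `auc` are chosen for this sketch only.

```python
# Hypothetical classifier scores (higher = more likely positive) and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, from the strictest threshold to the loosest.
    Assumes all scores are distinct."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1      # a positive correctly classified
        else:
            fp += 1      # a negative incorrectly classified as positive
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

points = roc_points(scores, labels)
print(points)
print(f"AUC = {auc(points):.3f}")
```

A curve that climbs quickly toward the top-left corner means many positives are found before many negatives are misclassified; the area under the curve summarizes that in one number.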
ROC vs Precision & Recall (PR)
Classification?
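• I use a classifier to learn the characteristics of each animal; the resulting model is used later for testing prediction.

| Tail | Hoof | Rib | Dewlap | Stirrup | Reins | Twist | Animal |
|------|------|-----|--------|---------|-------|-------|--------|
| Yes | Yes | No | No | Yes | Yes | No | Horse |
| Yes | Yes | No | No | Yes | Yes | No | Horse |
| No | Yes | No | Yes | No | No | Yes | Sheep |
| Yes | No | Yes | No | No | No | No | Rabbit |
| Yes | No | Yes | No | No | No | No | Rabbit |
| No | Yes | No | Yes | No | No | Yes | Sheep |
| Yes | Yes | No | No | Yes | Yes | No | Horse |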
Prediction?
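• We have the characteristics but do not know which animal they belong to!

| Tail | Hoof | Rib | Dewlap | Stirrup | Reins | Twist | Animal |
|------|------|-----|--------|---------|-------|-------|--------|
| Yes | Yes | No | No | Yes | Yes | No | ? |
| Yes | Yes | No | No | Yes | Yes | No | ? |
| No | Yes | No | Yes | No | No | Yes | ? |
| Yes | No | Yes | No | No | No | No | ? |
| Yes | No | Yes | No | No | No | No | ? |
| No | Yes | No | Yes | No | No | Yes | ? |
| Yes | Yes | No | No | Yes | Yes | No | ? |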
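As a rough illustration of filling in the "?" column, the sketch below takes the labelled rows from the Classification slide as the training set and assigns each unlabelled row the animal of its closest training row. Rows are encoded as strings of Y/N in the column order Tail, Hoof, Rib, Dewlap, Stirrup, Reins, Twist. The nearest-neighbour method with a mismatch count is an assumption chosen for brevity; it is not the classifier used in the deck.

```python
# Labelled rows from the Classification slide, Y = Yes, N = No.
training = [
    ("YYNNYYN", "Horse"), ("YYNNYYN", "Horse"), ("NYNYNNY", "Sheep"),
    ("YNYNNNN", "Rabbit"), ("YNYNNNN", "Rabbit"), ("NYNYNNY", "Sheep"),
    ("YYNNYYN", "Horse"),
]

# Unlabelled rows from the Prediction slide, same encoding.
unknown = ["YYNNYYN", "YYNNYYN", "NYNYNNY", "YNYNNNN",
           "YNYNNNN", "NYNYNNY", "YYNNYYN"]

def mismatches(a, b):
    """Number of attributes on which two rows disagree."""
    return sum(x != y for x, y in zip(a, b))

def predict(row):
    """Return the animal of the training row that disagrees least with `row`."""
    _, label = min(training, key=lambda case: mismatches(case[0], row))
    return label

predictions = [predict(row) for row in unknown]
for row, animal in zip(unknown, predictions):
    print(row, "->", animal)
```

Because every unlabelled row here matches a training row exactly, the predictions simply reproduce the known animals; a row that differs slightly, such as `"NYNYNNN"`, would still be assigned to its closest match.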
Summary
• Classification predicts class labels
• Numeric prediction models continuous-valued functions
• Two steps of classification:
– 1) Training
– 2) Testing and using
• Now let's check it out using Weka