a simple introduction to weka
TRANSCRIPT
A SIMPLE INTRODUCTION TO WEKA
Contents What is WEKA? WEKA Explorer Preprocessing the data Classification Clustering Association Rules Attribute Selection Data Visualization
Waikato Environment for Knowledge Analysis
Developed by Department of Computer Science, University of Waikato, New Zealand.
Weka is also a bird found only on theislands of New Zealand.
1 What is WEKA?
A collection of machine learning algorithmsfor data mining tasks.
Download and Install WEKA
Website:http://www.cs.waikato.ac.nz/~ml/weka/index.html
Platform independent
Exploratory data analysis
Experimental environment
New process model inspired interface
Command Line Interface
WEKA GUI
WEKA Explorer
2
Preprocessing the data
Classification
Clustering
Association Rules
Attribute Selection
Data Visualization
“Pre-Processing the data
Data can be imported from a file in various formats.ARFF-Attribute-Relation File FormatCSV - Comma Separated Values
Data can be read from a URL or from a SQL database.
Filters are used for pre-processing
@relation heart-disease-simplified
@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}
@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...
WEKA only deals with flat files
05/02/2023University of Waikato 13
05/02/2023University of Waikato 14
05/02/2023University of Waikato 15
05/02/2023University of Waikato 16
05/02/2023University of Waikato 17
05/02/2023University of Waikato 18
05/02/2023University of Waikato 19
05/02/2023University of Waikato 20
05/02/2023University of Waikato 21
05/02/2023University of Waikato 22
05/02/2023University of Waikato 23
05/02/2023University of Waikato 24
05/02/2023University of Waikato 25
05/02/2023University of Waikato 26
05/02/2023University of Waikato 27
05/02/2023University of Waikato 28
05/02/2023University of Waikato 29
05/02/2023University of Waikato 30
Building Classifiers Classifiers in WEKA are models for predicting nominal
or numeric quantities
Implemented learning schemes include:
Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptron, logistic regression, Bayes’ nets, …
age income student credit_rating buys_computer<=30 high no fair no<=30 high no excellent no31…40 high no fair yes>40 medium no fair yes>40 low yes fair yes>40 low yes excellent no31…40 low yes excellent yes<=30 medium no fair no<=30 low yes fair yes>40 medium yes fair yes<=30 medium yes excellent yes31…40 medium no excellent yes31…40 high yes fair yes>40 medium no excellent no
Decision Tree Induction: Training Dataset
age?
overcast
student? credit rating?
<=30 >40
no yes yes
yes
31..40
nofairexcellentyesno
Output: A Decision Tree for “buys_computer”
05/02/2023 University of Waikato 34
05/02/2023 University of Waikato 35
05/02/2023 University of Waikato 36
05/02/2023University of Waikato 37
05/02/2023University of Waikato 38
05/02/2023University of Waikato 39
05/02/2023University of Waikato 40
05/02/2023University of Waikato 41
05/02/2023University of Waikato 42
05/02/2023University of Waikato 43
05/02/2023University of Waikato 44
05/02/2023University of Waikato 45
05/02/2023University of Waikato 46
05/02/2023University of Waikato 47
05/02/2023University of Waikato 48
05/02/2023University of Waikato 49
05/02/2023University of Waikato 50
05/02/2023University of Waikato 51
05/02/2023University of Waikato 52
05/02/2023University of Waikato 53
05/02/2023University of Waikato 54
05/02/2023University of Waikato 55
Clustering data Finding groups of similar instances in a
dataset
Implemented schemes in WEKA are:k-Means, EM, Cobweb, X-means, FarthestFirst
Finding Associations WEKA contains an implementation of the Apriori
algorithm for learning association rules
Works only with discrete data
Can identify statistical dependencies between groups of attributes:
05/02/2023University of Waikato 58
05/02/2023University of Waikato 59
05/02/2023University of Waikato 60
05/02/2023University of Waikato 61
05/02/2023University of Waikato 62
Used to determine the most predictive attributes
Consists of two parts
1.) A search method : best-first, forward selection, random, exhaustive, genetic algorithm and etc.
2.)An evaluation method : correlation-based, wrapper, information gain an etc.
Attribute Selection
05/02/2023University of Waikato 64
05/02/2023University of Waikato 65
05/02/2023University of Waikato 66
05/02/2023University of Waikato 67
05/02/2023University of Waikato 68
05/02/2023University of Waikato 69
05/02/2023University of Waikato 70
05/02/2023University of Waikato 71
Data Visualization WEKA can visualize single attributes (1-d) and
pairs of attributes (2-d)
Color-coded class values
Use of “jitter” option
“Zoom-in” function
05/02/2023University of Waikato 73
05/02/2023University of Waikato 74
05/02/2023University of Waikato 75
05/02/2023University of Waikato 76
05/02/2023University of Waikato 77
05/02/2023University of Waikato 78
05/02/2023University of Waikato 79
05/02/2023University of Waikato 80
05/02/2023University of Waikato 81
05/02/2023University of Waikato 82
The End