weka: the bird · machine learning for data mining 2 6/11/2013 university of waikato 3 weka: the...
TRANSCRIPT
Machine Learning for Data Mining 1
Department of Computer Science, University of Waikato, New Zealand
Eibe Frank
WEKA: A Machine Learning Toolkit
The Explorer• Classification and
Regression
• Clustering
• Association Rules
• Attribute Selection
• Data Visualization
The Experimenter
The Knowledge Flow GUI
Conclusions
Machine Learning with WEKA
6/11/2013 University of Waikato 2
WEKA: the bird
Copyright: Martin Kramer ([email protected])
Machine Learning for Data Mining 2
6/11/2013 University of Waikato 3
WEKA: the software
Machine learning/data mining software written in Java (distributed under the GNU Public License)
Used for research, education, and applications
Complements “Data Mining” by Witten & Frank
Main features: Comprehensive set of data pre-processing tools,
learning algorithms and evaluation methods
Graphical user interfaces (incl. data visualization)
Environment for comparing learning algorithms
6/11/2013 University of Waikato 4
WEKA: versions There are several versions of WEKA:
WEKA 3.0: “book version” compatible with description in data mining book
WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
WEKA 3.3: “development version” with lots of improvements
This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
Machine Learning for Data Mining 3
6/11/2013 University of Waikato 5
@relation heart-disease-simplified
@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}
@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...
WEKA only deals with “flat” files
6/11/2013 University of Waikato 6
@relation heart-disease-simplified
@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}
@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...
WEKA only deals with “flat” files
Machine Learning for Data Mining 4
6/11/2013 University of Waikato 7
6/11/2013 University of Waikato 8
Machine Learning for Data Mining 5
6/11/2013 University of Waikato 9
6/11/2013 University of Waikato 10
Explorer: pre-processing the data Data can be imported from a file in various
formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for: Discretization, normalization, resampling, attribute
selection, transforming and combining attributes, …
Machine Learning for Data Mining 6
6/11/2013 University of Waikato 11
6/11/2013 University of Waikato 12
Machine Learning for Data Mining 7
6/11/2013 University of Waikato 13
6/11/2013 University of Waikato 14
Machine Learning for Data Mining 8
6/11/2013 University of Waikato 15
6/11/2013 University of Waikato 16
Machine Learning for Data Mining 9
6/11/2013 University of Waikato 17
6/11/2013 University of Waikato 18
Machine Learning for Data Mining 10
6/11/2013 University of Waikato 19
6/11/2013 University of Waikato 20
Machine Learning for Data Mining 11
6/11/2013 University of Waikato 21
6/11/2013 University of Waikato 22
Machine Learning for Data Mining 12
6/11/2013 University of Waikato 23
6/11/2013 University of Waikato 24
Machine Learning for Data Mining 13
6/11/2013 University of Waikato 25
6/11/2013 University of Waikato 26
Machine Learning for Data Mining 14
6/11/2013 University of Waikato 27
6/11/2013 University of Waikato 28
Machine Learning for Data Mining 15
6/11/2013 University of Waikato 29
6/11/2013 University of Waikato 30
Machine Learning for Data Mining 16
6/11/2013 University of Waikato 31
6/11/2013 University of Waikato 32
Explorer: building “classifiers” Classifiers in WEKA are models for predicting
nominal or numeric quantities
Implemented learning schemes include: Decision trees and lists, instance-based classifiers,
support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
“Meta”-classifiers include: Bagging, boosting, stacking, error-correcting output
codes, locally weighted learning, …
Machine Learning for Data Mining 17
6/11/2013 University of Waikato 33
6/11/2013 University of Waikato 34
Machine Learning for Data Mining 18
6/11/2013 University of Waikato 35
6/11/2013 University of Waikato 36
Machine Learning for Data Mining 19
6/11/2013 University of Waikato 37
6/11/2013 University of Waikato 38
Machine Learning for Data Mining 20
6/11/2013 University of Waikato 39
6/11/2013 University of Waikato 40
Machine Learning for Data Mining 21
6/11/2013 University of Waikato 41
6/11/2013 University of Waikato 42
Machine Learning for Data Mining 22
6/11/2013 University of Waikato 43
6/11/2013 University of Waikato 44
Machine Learning for Data Mining 23
6/11/2013 University of Waikato 45
6/11/2013 University of Waikato 46
Machine Learning for Data Mining 24
6/11/2013 University of Waikato 47
6/11/2013 University of Waikato 48
Machine Learning for Data Mining 25
6/11/2013 University of Waikato 49
6/11/2013 University of Waikato 50
Machine Learning for Data Mining 26
6/11/2013 University of Waikato 51
6/11/2013 University of Waikato 52
Machine Learning for Data Mining 27
6/11/2013 University of Waikato 53
6/11/2013 University of Waikato 54
Machine Learning for Data Mining 28
6/11/2013 University of Waikato 55
6/11/2013 University of Waikato 56
Machine Learning for Data Mining 29
6/11/2013 University of Waikato 57
6/11/2013 University of Waikato 58
Machine Learning for Data Mining 30
6/11/2013 University of Waikato 59
6/11/2013 University of Waikato 60
Machine Learning for Data Mining 31
6/11/2013 University of Waikato 61
6/11/2013 University of Waikato 62
Machine Learning for Data Mining 32
6/11/2013 University of Waikato 63
6/11/2013 University of Waikato 64
Machine Learning for Data Mining 33
6/11/2013 University of Waikato 65QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.
6/11/2013 University of Waikato 66QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.
Machine Learning for Data Mining 34
6/11/2013 University of Waikato 67QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.
6/11/2013 University of Waikato 68
Machine Learning for Data Mining 35
6/11/2013 University of Waikato 69
6/11/2013 University of Waikato 70
Machine Learning for Data Mining 36
6/11/2013 University of Waikato 71
6/11/2013 University of Waikato 72
Machine Learning for Data Mining 37
6/11/2013 University of Waikato 73
6/11/2013 University of Waikato 74
Machine Learning for Data Mining 38
6/11/2013 University of Waikato 75
QuickTime™ and a TIFF (LZW) decompressor are needed to see this pict
6/11/2013 University of Waikato 76
Machine Learning for Data Mining 39
6/11/2013 University of Waikato 77
6/11/2013 University of Waikato 78
Machine Learning for Data Mining 40
6/11/2013 University of Waikato 79
6/11/2013 University of Waikato 80
QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.
Machine Learning for Data Mining 41
6/11/2013 University of Waikato 81
QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.
6/11/2013 University of Waikato 82
Machine Learning for Data Mining 42
6/11/2013 University of Waikato 83
QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.
6/11/2013 University of Waikato 84
Machine Learning for Data Mining 43
6/11/2013 University of Waikato 85
6/11/2013 University of Waikato 86
Machine Learning for Data Mining 44
6/11/2013 University of Waikato 87
6/11/2013 University of Waikato 88
Machine Learning for Data Mining 45
6/11/2013 University of Waikato 89
6/11/2013 University of Waikato 90
Machine Learning for Data Mining 46
6/11/2013 University of Waikato 91
6/11/2013 University of Waikato 92
Explorer: clustering data WEKA contains “clusterers” for finding groups of
similar instances in a dataset
Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst
Clusters can be visualized and compared to “true” clusters (if given)
Evaluation based on loglikelihood if clustering scheme produces a probability distribution
Machine Learning for Data Mining 47
6/11/2013 University of Waikato 93
6/11/2013 University of Waikato 94
Machine Learning for Data Mining 48
6/11/2013 University of Waikato 95
6/11/2013 University of Waikato 96
Machine Learning for Data Mining 49
6/11/2013 University of Waikato 97
6/11/2013 University of Waikato 98
Machine Learning for Data Mining 50
6/11/2013 University of Waikato 99
6/11/2013 University of Waikato 100
Machine Learning for Data Mining 51
6/11/2013 University of Waikato 101
6/11/2013 University of Waikato 102
Machine Learning for Data Mining 52
6/11/2013 University of Waikato 103
6/11/2013 University of Waikato 104
Machine Learning for Data Mining 53
6/11/2013 University of Waikato 105
6/11/2013 University of Waikato 106
Machine Learning for Data Mining 54
6/11/2013 University of Waikato 107
6/11/2013 University of Waikato 108
Explorer: finding associations WEKA contains an implementation of the Apriori
algorithm for learning association rules Works only with discrete data
Can identify statistical dependencies between groups of attributes: milk, butter bread, eggs (with confidence 0.9 and
support 2000)
Apriori can compute all rules that have a given minimum support and exceed a given confidence
Machine Learning for Data Mining 55
6/11/2013 University of Waikato 109
6/11/2013 University of Waikato 110
Machine Learning for Data Mining 56
6/11/2013 University of Waikato 111
6/11/2013 University of Waikato 112
Machine Learning for Data Mining 57
6/11/2013 University of Waikato 113
6/11/2013 University of Waikato 114
Machine Learning for Data Mining 58
6/11/2013 University of Waikato 115
6/11/2013 University of Waikato 116
Explorer: attribute selection Panel that can be used to investigate which
(subsets of) attributes are the most predictive ones
Attribute selection methods contain two parts: A search method: best-first, forward selection,
random, exhaustive, genetic algorithm, ranking
An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
Very flexible: WEKA allows (almost) arbitrary combinations of these two
Machine Learning for Data Mining 59
6/11/2013 University of Waikato 117
6/11/2013 University of Waikato 118
Machine Learning for Data Mining 60
6/11/2013 University of Waikato 119
6/11/2013 University of Waikato 120
Machine Learning for Data Mining 61
6/11/2013 University of Waikato 121
6/11/2013 University of Waikato 122
Machine Learning for Data Mining 62
6/11/2013 University of Waikato 123
6/11/2013 University of Waikato 124
Machine Learning for Data Mining 63
6/11/2013 University of Waikato 125
Explorer: data visualization
Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
WEKA can visualize single attributes (1-d) and pairs of attributes (2-d) To do: rotating 3-d visualizations (Xgobi-style)
Color-coded class values
“Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
“Zoom-in” function
6/11/2013 University of Waikato 126
Machine Learning for Data Mining 64
6/11/2013 University of Waikato 127
6/11/2013 University of Waikato 128
Machine Learning for Data Mining 65
6/11/2013 University of Waikato 129
6/11/2013 University of Waikato 130
Machine Learning for Data Mining 66
6/11/2013 University of Waikato 131
6/11/2013 University of Waikato 132
Machine Learning for Data Mining 67
6/11/2013 University of Waikato 133
6/11/2013 University of Waikato 134
Machine Learning for Data Mining 68
6/11/2013 University of Waikato 135
6/11/2013 University of Waikato 136
Machine Learning for Data Mining 69
6/11/2013 University of Waikato 137
6/11/2013 University of Waikato 138
Performing experiments Experimenter makes it easy to compare the
performance of different learning schemes
For classification and regression problems
Results can be written into file or database
Evaluation options: cross-validation, learning curve, hold-out
Can also iterate over different parameter settings
Significance-testing built in!
Machine Learning for Data Mining 70
6/11/2013 University of Waikato 139
6/11/2013 University of Waikato 140
Machine Learning for Data Mining 71
6/11/2013 University of Waikato 141
6/11/2013 University of Waikato 142
Machine Learning for Data Mining 72
6/11/2013 University of Waikato 143
6/11/2013 University of Waikato 144
Machine Learning for Data Mining 73
6/11/2013 University of Waikato 145
6/11/2013 University of Waikato 146
Machine Learning for Data Mining 74
6/11/2013 University of Waikato 147
6/11/2013 University of Waikato 148
Machine Learning for Data Mining 75
6/11/2013 University of Waikato 149
6/11/2013 University of Waikato 150
Machine Learning for Data Mining 76
6/11/2013 University of Waikato 151
6/11/2013 University of Waikato 152
The Knowledge Flow GUI New graphical user interface for WEKA
Java-Beans-based interface for setting up and running machine learning experiments
Data sources, classifiers, etc. are beans and can be connected graphically
Data “flows” through components: e.g.,
“data source” -> “filter” -> “classifier” -> “evaluator”
Layouts can be saved and loaded again later
Machine Learning for Data Mining 77
6/11/2013 University of Waikato 153
6/11/2013 University of Waikato 154
Machine Learning for Data Mining 78
6/11/2013 University of Waikato 155
6/11/2013 University of Waikato 156
Machine Learning for Data Mining 79
6/11/2013 University of Waikato 157
6/11/2013 University of Waikato 158
Machine Learning for Data Mining 80
6/11/2013 University of Waikato 159
6/11/2013 University of Waikato 160
Machine Learning for Data Mining 81
6/11/2013 University of Waikato 161
6/11/2013 University of Waikato 162
Machine Learning for Data Mining 82
6/11/2013 University of Waikato 163
6/11/2013 University of Waikato 164
Machine Learning for Data Mining 83
6/11/2013 University of Waikato 165
6/11/2013 University of Waikato 166
Machine Learning for Data Mining 84
6/11/2013 University of Waikato 167
6/11/2013 University of Waikato 168
Machine Learning for Data Mining 85
6/11/2013 University of Waikato 169
6/11/2013 University of Waikato 170
Machine Learning for Data Mining 86
6/11/2013 University of Waikato 171
6/11/2013 University of Waikato 172
Machine Learning for Data Mining 87
6/11/2013 University of Waikato 173
Conclusion: try it yourself!
WEKA is available at
http://www.cs.waikato.ac.nz/ml/weka
Also has a list of projects based on WEKA
WEKA contributors:
Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer , Brent Martin, Peter Flach, Eibe Frank ,Gabi Schmidberger ,Ian H. Witten , J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall ,Remco Bouckaert , Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy,
Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang