microsoft powerpoint - weka [read-only]

47
Machine Learning for Data Mining 1 Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression Clustering Association Rules Attribute Selection Data Visualization The Experimenter The Knowledge Flow GUI Conclusions Machine Learning with WEKA 2/4/2004 University of Waikato 2 WEKA: the bird Copyright: Martin Kramer ([email protected])

Upload: butest

Post on 27-Jan-2015

126 views

Category:

Documents


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 1

Department of Computer Science, University of Waikato, New Zealand

Eibe Frank

� WEKA: A Machine Learning Toolkit

� The Explorer• Classification and

Regression

• Clustering

• Association Rules

• Attribute Selection

• Data Visualization

� The Experimenter

� The Knowledge Flow GUI

� Conclusions

Machine Learning with WEKA

2/4/2004 University of Waikato 2

WEKA: the bird

Copyright: Martin Kramer ([email protected])

Page 2: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 2

2/4/2004 University of Waikato 3

WEKA: the software

� Machine learning/data mining software written in Java (distributed under the GNU Public License)

� Used for research, education, and applications� Complements “Data Mining” by Witten & Frank� Main features:

� Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods

� Graphical user interfaces (incl. data visualization)� Environment for comparing learning algorithms

2/4/2004 University of Waikato 4

WEKA: versions

� There are several versions of WEKA:� WEKA 3.0: “book version” compatible with

description in data mining book� WEKA 3.2: “GUI version” adds graphical user

interfaces (book version is command-line only)� WEKA 3.3: “development version” with lots of

improvements

� This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

Page 3: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 3

2/4/2004 University of Waikato 5

@relation heart-disease-simplified

@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}

@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...

WEKA only deals with “flat” files

2/4/2004 University of Waikato 6

@relation heart-disease-simplified

@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}

@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...

WEKA only deals with “flat” files

Page 4: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 4

2/4/2004 University of Waikato 7

2/4/2004 University of Waikato 8

Page 5: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 5

2/4/2004 University of Waikato 9

2/4/2004 University of Waikato 10

Explorer: pre-processing the data

� Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary

� Data can also be read from a URL or from an SQL database (using JDBC)

� Pre-processing tools in WEKA are called “filters”� WEKA contains filters for:

� Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …

Page 6: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 6

2/4/2004 University of Waikato 11

2/4/2004 University of Waikato 12

Page 7: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 7

2/4/2004 University of Waikato 13

2/4/2004 University of Waikato 14

Page 8: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 8

2/4/2004 University of Waikato 15

2/4/2004 University of Waikato 16

Page 9: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 9

2/4/2004 University of Waikato 17

2/4/2004 University of Waikato 18

Page 10: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 10

2/4/2004 University of Waikato 19

2/4/2004 University of Waikato 20

Page 11: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 11

2/4/2004 University of Waikato 21

2/4/2004 University of Waikato 22

Page 12: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 12

2/4/2004 University of Waikato 23

2/4/2004 University of Waikato 24

Page 13: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 13

2/4/2004 University of Waikato 25

2/4/2004 University of Waikato 26

Page 14: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 14

2/4/2004 University of Waikato 27

2/4/2004 University of Waikato 28

Page 15: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 15

2/4/2004 University of Waikato 29

2/4/2004 University of Waikato 30

Page 16: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 16

2/4/2004 University of Waikato 31

2/4/2004 University of Waikato 32

Explorer: building “classifiers”

� Classifiers in WEKA are models for predicting nominal or numeric quantities

� Implemented learning schemes include:� Decision trees and lists, instance-based classifiers,

support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …

� “Meta”-classifiers include:� Bagging, boosting, stacking, error-correcting output

codes, locally weighted learning, …

Page 17: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 17

2/4/2004 University of Waikato 33

2/4/2004 University of Waikato 34

Page 18: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 18

2/4/2004 University of Waikato 35

2/4/2004 University of Waikato 36

Page 19: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 19

2/4/2004 University of Waikato 37

2/4/2004 University of Waikato 38

Page 20: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 20

2/4/2004 University of Waikato 53

2/4/2004 University of Waikato 92

Explorer: clustering data

� WEKA contains “clusterers” for finding groups of similar instances in a dataset

� Implemented schemes are:� k-Means, EM, Cobweb, X-means, FarthestFirst

� Clusters can be visualized and compared to “true” clusters (if given)

� Evaluation based on loglikelihood if clustering scheme produces a probability distribution

Page 21: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 21

2/4/2004 University of Waikato 108

Explorer: finding associations

� WEKA contains an implementation of the Apriori algorithm for learning association rules� Works only with discrete data

� Can identify statistical dependencies between groups of attributes:� milk, butter ⇒ bread, eggs (with confidence 0.9 and

support 2000)

� Apriori can compute all rules that have a given minimum support and exceed a given confidence

2/4/2004 University of Waikato 109

Page 22: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 22

2/4/2004 University of Waikato 110

2/4/2004 University of Waikato 111

Page 23: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 23

2/4/2004 University of Waikato 112

2/4/2004 University of Waikato 113

Page 24: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 24

2/4/2004 University of Waikato 114

2/4/2004 University of Waikato 115

Page 25: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 25

2/4/2004 University of Waikato 116

Explorer: attribute selection

� Panel that can be used to investigate which (subsets of) attributes are the most predictive ones

� Attribute selection methods contain two parts:� A search method: best-first, forward selection,

random, exhaustive, genetic algorithm, ranking� An evaluation method: correlation-based, wrapper,

information gain, chi-squared, …

� Very flexible: WEKA allows (almost) arbitrary combinations of these two

2/4/2004 University of Waikato 117

Page 26: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 26

2/4/2004 University of Waikato 118

2/4/2004 University of Waikato 119

Page 27: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 27

2/4/2004 University of Waikato 120

2/4/2004 University of Waikato 121

Page 28: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 28

2/4/2004 University of Waikato 122

2/4/2004 University of Waikato 123

Page 29: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 29

2/4/2004 University of Waikato 124

2/4/2004 University of Waikato 125

Explorer: data visualization

� Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem

� WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)� To do: rotating 3-d visualizations (Xgobi-style)

� Color-coded class values� “Jitter” option to deal with nominal attributes (and

to detect “hidden” data points)� “Zoom-in” function

Page 30: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 30

2/4/2004 University of Waikato 126

2/4/2004 University of Waikato 127

Page 31: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 31

2/4/2004 University of Waikato 128

2/4/2004 University of Waikato 129

Page 32: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 32

2/4/2004 University of Waikato 130

2/4/2004 University of Waikato 131

Page 33: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 33

2/4/2004 University of Waikato 132

2/4/2004 University of Waikato 133

Page 34: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 34

2/4/2004 University of Waikato 134

2/4/2004 University of Waikato 135

Page 35: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 35

2/4/2004 University of Waikato 136

2/4/2004 University of Waikato 137

Page 36: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 36

2/4/2004 University of Waikato 138

Performing experiments

� Experimenter makes it easy to compare the performance of different learning schemes

� For classification and regression problems� Results can be written into file or database� Evaluation options: cross-validation, learning

curve, hold-out� Can also iterate over different parameter settings� Significance-testing built in!

2/4/2004 University of Waikato 152

The Knowledge Flow GUI

� New graphical user interface for WEKA� Java-Beans-based interface for setting up and

running machine learning experiments� Data sources, classifiers, etc. are beans and can

be connected graphically� Data “flows” through components: e.g.,

“data source” -> “filter” -> “classifier” -> “evaluator”� Layouts can be saved and loaded again later

Page 37: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 37

2/4/2004 University of Waikato 153

2/4/2004 University of Waikato 154

Page 38: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 38

2/4/2004 University of Waikato 155

2/4/2004 University of Waikato 156

Page 39: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 39

2/4/2004 University of Waikato 157

2/4/2004 University of Waikato 158

Page 40: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 40

2/4/2004 University of Waikato 159

2/4/2004 University of Waikato 160

Page 41: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 41

2/4/2004 University of Waikato 161

2/4/2004 University of Waikato 162

Page 42: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 42

2/4/2004 University of Waikato 163

2/4/2004 University of Waikato 164

Page 43: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 43

2/4/2004 University of Waikato 165

2/4/2004 University of Waikato 166

Page 44: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 44

2/4/2004 University of Waikato 167

2/4/2004 University of Waikato 168

Page 45: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 45

2/4/2004 University of Waikato 169

2/4/2004 University of Waikato 170

Page 46: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 46

2/4/2004 University of Waikato 171

2/4/2004 University of Waikato 172

Page 47: Microsoft PowerPoint - weka [Read-Only]

Machine Learning for Data Mining 47

2/4/2004 University of Waikato 173

Conclusion: try it yourself!

� WEKA is available athttp://www.cs.waikato.ac.nz/ml/weka

� Also has a list of projects based on WEKA� WEKA contributors:

Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, BernhardPfahringer , Brent Martin, Peter Flach, Eibe Frank ,Gabi Schmidberger ,Ian H. Witten , J. Lindgren, Janice Boughton, Jason Wells, Len Trigg,Lucio de Souza Coelho, Malcolm Ware, Mark Hall ,Remco Bouckaert , Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy,

Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang