weka toolkit introduction

Weka – A Machine Learning Toolkit October 2, 2008 Keum-Sung Hwang

Page 1: Weka toolkit introduction

Weka – A Machine Learning Toolkit

October 2, 2008

Keum-Sung Hwang

Page 2: Weka toolkit introduction

Agenda

• WEKA: A Machine Learning Toolkit

• The Explorer

– Classification and Regression

– Clustering

– Association Rules

– Attribute Selection

– Data Visualization

• The Experimenter

• The Knowledge Flow GUI

• Conclusions

Page 3: Weka toolkit introduction

WEKA

• A flightless bird species endemic to New Zealand

Copyright: Martin Kramer

Page 4: Weka toolkit introduction

WEKA

• Machine learning/data mining software written in Java (distributed under the GNU General Public License)

• Used for research, education, and applications

• Main features:

– Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods

– Graphical user interfaces (incl. data visualization)

– Environment for comparing learning algorithms

Page 5: Weka toolkit introduction

WEKA: Versions

• There are several versions of WEKA:

– WEKA 3.0: “book version” compatible with description in data mining book

– WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)

– WEKA 3.4: “development version” with lots of improvements

• This talk is based on a snapshot of WEKA 3.3

Page 7: Weka toolkit introduction

Explorer: Pre-processing

• Data can be imported from a file in various formats:

– ARFF, CSV, C4.5, binary

• Data can also be read from a URL or from an SQL database (using JDBC)

• Pre-processing tools in WEKA are called “filters”

• WEKA contains filters for:

– Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
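
To make two of the listed filter types concrete, here is a minimal Python sketch of min-max normalization and equal-width discretization. It is not WEKA code and does not use the WEKA API; the function names `normalize` and `discretize` and the sample values are assumptions made for illustration only.

```python
# Illustrative sketch only: these helpers are hypothetical, not WEKA filters.

def normalize(values):
    """Min-max scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Equal-width binning: map each number to a bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


temperatures = [64, 68, 72, 80, 85]  # made-up sample attribute
print(normalize(temperatures))
print(discretize(temperatures, 3))
```

Both functions take a whole attribute column and return a transformed copy, which mirrors the idea of a filter as a step applied to the data before learning.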

Page 29: Weka toolkit introduction

Explorer: Building “Classifiers”

• Classifiers in WEKA are models for predicting nominal or numeric quantities

• Implemented learning schemes include:

– Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …

• “Meta”-classifiers include:

– Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
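
As one example of the scheme families above, an instance-based classifier can be sketched in a few lines of Python. This is a generic k-nearest-neighbour sketch under simple assumptions (numeric features, Euclidean distance, majority vote); the name `knn_predict` and the toy data are hypothetical, and it is not how WEKA implements its own classifiers.

```python
# Illustrative sketch only: a generic k-nearest-neighbour classifier.
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Predict a nominal label for `query`.

    train: list of (features, label) pairs, where features is a tuple of numbers.
    Returns the majority label among the k training instances closest to query.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training data with two well-separated classes.
train = [
    ((1, 1), "a"), ((1, 2), "a"),
    ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b"),
]
print(knn_predict(train, (1.5, 1.5)))
print(knn_predict(train, (5.5, 5.5)))
```

Nothing is learned ahead of time here: the training instances are the model, which is what "instance-based" refers to.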

Page 77: Weka toolkit introduction

Explorer: Clustering Data

• WEKA contains “clusterers” for finding groups of similar instances in a dataset

• Implemented schemes are:

– k-Means, EM, Cobweb, X-means, FarthestFirst

• Clusters can be visualized and compared to “true” clusters (if given)

• Evaluation based on log-likelihood if clustering scheme produces a probability distribution
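
For readers new to clustering, the k-Means scheme named above can be sketched as follows. This is a plain textbook version (alternating assignment and centroid update) with the starting centroids passed in explicitly; the function name `k_means`, the fixed iteration count, and the sample points are assumptions, and the sketch does not reproduce WEKA's implementation.

```python
# Illustrative sketch only: textbook k-means with caller-supplied start centroids.
import math


def k_means(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations.

    points: list of numeric tuples; centroids: list of starting centroids.
    Returns (centroids, assignments), where assignments[i] is the index of
    the centroid closest to points[i] in the final assignment step.
    """
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]  # made-up data
final_centroids, final_assignments = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(final_centroids, final_assignments)
```

When true class labels are available, the returned assignments are what would be compared against them.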

Page 93: Weka toolkit introduction

Explorer: Finding Associations

• WEKA contains an implementation of the Apriori algorithm for learning association rules

– Works only with discrete data

• Can identify statistical dependencies between groups of attributes:

– milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)

• Apriori can compute all rules that have a given minimum support and exceed a given confidence
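
The two measures attached to a rule can be computed directly from a list of transactions. The Python sketch below shows only those measures for a rule such as "milk and butter imply bread and eggs"; it does not implement the Apriori search itself. The helper names `support` and `confidence` and the four baskets are made up for illustration, and support is taken here as an absolute transaction count.

```python
# Illustrative sketch only: rule measures, not the Apriori search.

def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


baskets = [  # made-up transactions
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter"},
    {"bread"},
]
print(support(baskets, {"milk", "butter", "bread", "eggs"}))
print(confidence(baskets, {"milk", "butter"}, {"bread", "eggs"}))
```

A rule miner keeps only the rules whose support and confidence clear the chosen minimum thresholds.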

Page 101: Weka toolkit introduction

Explorer: Attribute Selection

• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones

• Attribute selection methods contain two parts:

– A search method:

• best-first, forward selection, random, exhaustive, genetic algorithm, ranking

– An evaluation method:

• correlation-based, wrapper, information gain, chi-squared, …

• Very flexible: allows arbitrary combinations of these two
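
One such combination is a ranking search paired with an information-gain evaluator. A minimal Python sketch of that pairing follows, assuming nominal attributes stored as dictionaries; the names `entropy`, `information_gain`, `rank_attributes` and the four-row dataset are hypothetical and are not taken from WEKA.

```python
# Illustrative sketch only: ranking attributes by information gain.
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of nominal labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting rows on `attribute`."""
    labels = [r[target] for r in rows]
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(rows, attributes, target):
    """Evaluation method + ranking search: sort attributes by gain, best first."""
    return sorted(attributes, key=lambda a: information_gain(rows, a, target), reverse=True)


rows = [  # made-up data: "outlook" determines "play", "windy" does not
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]
print(rank_attributes(rows, ["windy", "outlook"], "play"))
```

Swapping `information_gain` for another scoring function changes the evaluation method without touching the ranking step, which is the kind of flexibility the slide describes.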

Page 110: Weka toolkit introduction

Explorer: Data Visualization

• Visualization very useful in practice:

– e.g. helps to determine difficulty of the learning problem

• WEKA can visualize single attributes and pairs of attributes

– To do: rotating 3-d visualizations (Xgobi-style)

• Color-coded class values

• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)

• “Zoom-in” function
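
The idea behind the jitter option can be shown in a few lines: points that share the same nominal value are plotted on top of each other, and a small random offset separates them. The Python sketch below is an assumption about the general technique, not WEKA's code; the name `jitter`, the uniform noise, and the fixed seed are illustrative choices.

```python
# Illustrative sketch only: jittering plot coordinates for nominal values.
import random


def jitter(values, amount, seed=0):
    """Return a copy of `values` with uniform noise in [-amount, +amount] added.

    Only the plotted coordinates change; the underlying data is untouched.
    """
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


codes = [0, 0, 0, 1, 1]  # made-up nominal values encoded as integers
print(jitter(codes, 0.2))
```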

Page 122: Weka toolkit introduction

Performing Experiments

• Experimenter makes it easy to compare the performance of different learning schemes

• For classification and regression problems

• Results can be written into file or database

• Evaluation options: cross-validation, learning curve, hold-out

• Can also iterate over different parameter settings

• Significance-testing built in!
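
The core loop of such a comparison, cross-validating two learning schemes on the same data, can be sketched in Python as below. This is a simplified illustration, not the Experimenter: folds are formed by taking every k-th instance, significance testing is left out, and the names `cross_validate`, `majority_learner`, `nearest_neighbour_learner` and the dataset are all hypothetical.

```python
# Illustrative sketch only: comparing two learners with k-fold cross-validation.
from collections import Counter


def cross_validate(rows, labels, train_fn, k=5):
    """Mean accuracy over k folds.

    train_fn(rows, labels) must return a function predict(row) -> label.
    Assumes len(rows) >= k so that no fold is empty.
    """
    n = len(rows)
    accuracies = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th instance forms this fold
        train_rows = [rows[i] for i in range(n) if i not in test_idx]
        train_labels = [labels[i] for i in range(n) if i not in test_idx]
        predict = train_fn(train_rows, train_labels)
        correct = sum(1 for i in test_idx if predict(rows[i]) == labels[i])
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def majority_learner(rows, labels):
    """Baseline: always predict the most frequent training label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda row: most_common


def nearest_neighbour_learner(rows, labels):
    """1-nearest-neighbour on a single numeric attribute."""
    def predict(row):
        best = min(range(len(rows)), key=lambda i: abs(rows[i] - row))
        return labels[best]
    return predict


rows = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]  # made-up one-attribute dataset
labels = ["low"] * 5 + ["high"] * 5
for name, learner in [("majority", majority_learner), ("1-NN", nearest_neighbour_learner)]:
    print(name, cross_validate(rows, labels, learner, k=5))
```

Because both learners see exactly the same folds, their per-fold accuracies are paired, which is what a built-in significance test would then work from.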

Page 135: Weka toolkit introduction

The Knowledge Flow GUI

• New graphical user interface for WEKA

• Java-Beans-based interface for setting up and running machine learning experiments

• Data sources, classifiers, etc. are beans and can be connected graphically

• Data “flows” through components: e.g.,

“data source” -> “filter” -> “classifier” -> “evaluator”

• Layouts can be saved and loaded again later
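
The flow of data through connected components can be mimicked with chained Python generators. The sketch below only illustrates the "data source, filter, classifier, evaluator" chain; it is not the Knowledge Flow or its Java Beans, and every component name, parameter, and data value in it is an assumption.

```python
# Illustrative sketch only: a component chain built from generators.

def data_source():
    """Source component: yields (value, label) instances one at a time."""
    yield from [(2.0, "a"), (4.0, "a"), (6.0, "b"), (8.0, "b")]


def scale_filter(instances, factor):
    """Filter component: rescales each numeric value as it streams past."""
    for value, label in instances:
        yield value * factor, label


def threshold_classifier(instances, threshold):
    """Classifier component: emits (actual label, predicted label) pairs."""
    for value, label in instances:
        yield label, ("a" if value < threshold else "b")


def evaluator(results):
    """Evaluator component: fraction of predictions that match the label."""
    results = list(results)
    return sum(1 for actual, predicted in results if actual == predicted) / len(results)


# Connect the components: source -> filter -> classifier -> evaluator.
accuracy = evaluator(threshold_classifier(scale_filter(data_source(), 0.1), 0.5))
print(accuracy)
```

Each component only knows about the stream it receives, so components can be rearranged or replaced without changing the others.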

Page 153: Weka toolkit introduction

Conclusion: Try It Yourself!

• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka

• Also has a list of projects based on WEKA

• WEKA contributors:

Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang