an introduction to weka

79
An Introduction to WEKA ECLT 5810/ SEEM 5750 The Chinese University of Hong Kong

Upload: others

Post on 09-Jan-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Introduction to WEKA

An Introduction to

WEKAECLT 5810/ SEEM 5750

The Chinese University of Hong Kong

Page 2: An Introduction to WEKA

Content

What is WEKA?

The Explorer:

Preprocess data

Classification

Clustering

Association Rules

Attribute Selection

Data Visualization

References and Resources

4/8/2019 2

Page 3: An Introduction to WEKA

What is WEKA?

Waikato Environment for

Knowledge Analysis It’s a data mining/machine learning tool developed by

Department of Computer Science, University of Waikato,

New Zealand.

Weka is also a bird found only on the islands of New

Zealand.

3

Page 4: An Introduction to WEKA

Download and Install WEKA

Website:

http://www.cs.waikato.ac.nz/~ml/weka/index.html

Support multiple platforms (written in java):

Windows, Mac OS X and Linux

4

Page 5: An Introduction to WEKA

Main GUI

Three graphical user interfaces

“The Explorer” (exploratory data analysis)

“The Experimenter” (experimental environment)

“The KnowledgeFlow” (new process model inspired interface)

5

Page 6: An Introduction to WEKA

Content

What is WEKA?

The Explorer:

Preprocess data

Classification

Clustering

Association Rules

Attribute Selection

Data Visualization

References and Resources

4/8/2019 6

Page 7: An Introduction to WEKA

Explorer: pre-processing the

data

Data can be imported from a file in various formats:

ARFF, CSV, C4.5, binary

Data can also be read from a URL or from an SQL

database (using JDBC)

Pre-processing tools in WEKA are called “filters”

WEKA contains filters for:

Discretization, normalization, resampling, attribute

selection, transforming and combining attributes, …

4/8/2019 7

Page 8: An Introduction to WEKA

ARFF Format

@relation heart-disease-simplified

@attribute age numeric

@attribute sex { female, male}

@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}

@attribute cholesterol numeric

@attribute exercise_induced_angina { no, yes}

@attribute class { present, not_present}

@data

63,male,typ_angina,233,no,not_present

67,male,asympt,286,yes,present

67,male,asympt,229,yes,present

38,female,non_anginal,?,no,not_present

...4/8/2019 8

Page 9: An Introduction to WEKA

4/8/2019University of Waikato 9

Page 10: An Introduction to WEKA

4/8/2019University of Waikato 10

Page 11: An Introduction to WEKA

4/8/2019University of Waikato 11

Page 12: An Introduction to WEKA

4/8/2019University of Waikato 12

Page 13: An Introduction to WEKA

4/8/2019University of Waikato 13

Page 14: An Introduction to WEKA

4/8/2019University of Waikato 14

Page 15: An Introduction to WEKA

4/8/2019University of Waikato 15

Page 16: An Introduction to WEKA

4/8/2019University of Waikato 16

Page 17: An Introduction to WEKA

4/8/2019University of Waikato 17

Page 18: An Introduction to WEKA

4/8/2019University of Waikato 18

Page 19: An Introduction to WEKA

4/8/2019University of Waikato 19

Page 20: An Introduction to WEKA

4/8/2019University of Waikato 20

Page 21: An Introduction to WEKA

4/8/2019University of Waikato 21

Page 22: An Introduction to WEKA

4/8/2019University of Waikato 22

Page 23: An Introduction to WEKA

4/8/2019University of Waikato 23

Page 24: An Introduction to WEKA

4/8/2019University of Waikato 24

Page 25: An Introduction to WEKA

4/8/2019University of Waikato 25

Page 26: An Introduction to WEKA

4/8/2019University of Waikato 26

Page 27: An Introduction to WEKA

4/8/2019University of Waikato 27

Page 28: An Introduction to WEKA

4/8/2019University of Waikato 28

Page 29: An Introduction to WEKA

4/8/2019University of Waikato 29

Page 30: An Introduction to WEKA

Explorer: building classifiers

Classifiers in WEKA are models for predicting nominal or

numeric quantities

Implemented learning schemes include:

Decision trees and lists, instance-based classifiers, support

vector machines, multi-layer perceptrons, logistic

regression, Bayes’ nets, …

4/8/2019 30

Page 31: An Introduction to WEKA

4/8/2019University of Waikato 31

Page 32: An Introduction to WEKA

4/8/2019University of Waikato 32

Page 33: An Introduction to WEKA

4/8/2019University of Waikato 33

Page 34: An Introduction to WEKA

4/8/2019University of Waikato 34

Page 35: An Introduction to WEKA

4/8/2019University of Waikato 35

Page 36: An Introduction to WEKA

4/8/2019University of Waikato 36

Page 37: An Introduction to WEKA

4/8/2019University of Waikato 37

Page 38: An Introduction to WEKA

4/8/2019University of Waikato 38

Page 39: An Introduction to WEKA

4/8/2019University of Waikato 39

Page 40: An Introduction to WEKA

4/8/2019University of Waikato 40

Page 41: An Introduction to WEKA

4/8/2019University of Waikato 41

Page 42: An Introduction to WEKA

4/8/2019University of Waikato 42

Page 43: An Introduction to WEKA

4/8/2019University of Waikato 43

Page 44: An Introduction to WEKA

4/8/2019University of Waikato 44

Page 45: An Introduction to WEKA

4/8/2019University of Waikato 45

Page 46: An Introduction to WEKA

4/8/2019University of Waikato 46

Page 47: An Introduction to WEKA

4/8/2019University of Waikato 47

Page 48: An Introduction to WEKA

4/8/2019University of Waikato 48

Page 49: An Introduction to WEKA

4/8/2019University of Waikato 49

Page 50: An Introduction to WEKA

4/8/2019University of Waikato 50

Page 51: An Introduction to WEKA

4/8/2019University of Waikato 51

Page 52: An Introduction to WEKA

4/8/2019University of Waikato 52

Page 53: An Introduction to WEKA

Explorer: finding associations

WEKA contains an implementation of the Apriori

algorithm for learning association rules

Works only with discrete data

Can identify statistical dependencies between groups of

attributes:

milk, butter bread, eggs (with confidence 0.9 and

support 2000)

Apriori can compute all rules that have a given

minimum support and exceed a given confidence

4/8/2019 55

Page 54: An Introduction to WEKA

4/8/2019University of Waikato 56

Page 55: An Introduction to WEKA

4/8/2019University of Waikato 57

Page 56: An Introduction to WEKA

4/8/2019University of Waikato 58

Page 57: An Introduction to WEKA

4/8/2019University of Waikato 59

Page 58: An Introduction to WEKA

4/8/2019University of Waikato 60

Page 59: An Introduction to WEKA

Explorer: attribute selection

Panel that can be used to investigate which (subsets of)

attributes are the most predictive ones

Attribute selection methods contain two parts:

A search method: best-first, forward selection, random,

exhaustive, genetic algorithm, ranking

An evaluation method: correlation-based, wrapper,

information gain, chi-squared, …

Very flexible: WEKA allows (almost) arbitrary

combinations of these two

4/8/2019 61

Page 60: An Introduction to WEKA

4/8/2019University of Waikato 62

Page 61: An Introduction to WEKA

4/8/2019University of Waikato 63

Page 62: An Introduction to WEKA

4/8/2019University of Waikato 64

Page 63: An Introduction to WEKA

4/8/2019University of Waikato 65

Page 64: An Introduction to WEKA

4/8/2019University of Waikato 66

Page 65: An Introduction to WEKA

4/8/2019University of Waikato 67

Page 66: An Introduction to WEKA

4/8/2019University of Waikato 68

Page 67: An Introduction to WEKA

4/8/2019University of Waikato 69

Page 68: An Introduction to WEKA

Explorer: data visualization

Visualization very useful in practice: e.g. helps to

determine difficulty of the learning problem

WEKA can visualize single attributes (1-d) and pairs of

attributes (2-d)

To do: rotating 3-d visualizations (Xgobi-style)

Color-coded class values

“Jitter” option to deal with nominal attributes (and to

detect “hidden” data points)

“Zoom-in” function

4/8/2019 70

Page 69: An Introduction to WEKA

4/8/2019University of Waikato 71

Page 70: An Introduction to WEKA

4/8/2019University of Waikato 72

Page 71: An Introduction to WEKA

4/8/2019University of Waikato 73

Page 72: An Introduction to WEKA

4/8/2019University of Waikato 74

Page 73: An Introduction to WEKA

4/8/2019University of Waikato 75

Page 74: An Introduction to WEKA

4/8/2019University of Waikato 76

Page 75: An Introduction to WEKA

4/8/2019University of Waikato 77

Page 76: An Introduction to WEKA

4/8/2019University of Waikato 78

Page 77: An Introduction to WEKA

4/8/2019University of Waikato 79

Page 78: An Introduction to WEKA

4/8/2019University of Waikato 80

Page 79: An Introduction to WEKA

References and Resources

References:

WEKA website: http://www.cs.waikato.ac.nz/~ml/weka/index.html

WEKA Tutorial: Machine Learning with WEKA: A presentation demonstrating

all graphical user interfaces (GUI) in Weka.

WEKA Data Mining Book: Ian H. Witten and Eibe Frank, Data Mining: Practical

Machine Learning Tools and Techniques (Second Edition)

WEKA Wiki: http://weka.sourceforge.net/wiki/index.php/Main_Page

Others: Jiawei Han and Micheline Kamber, Data Mining:

Concepts and Techniques, 2nd ed.