a simple introduction to weka

Post on 16-Apr-2017

A SIMPLE INTRODUCTION TO WEKA

Contents

What is WEKA?

WEKA Explorer

Preprocessing the data

Classification

Clustering

Association Rules

Attribute Selection

Data Visualization

Waikato Environment for Knowledge Analysis

Developed by Department of Computer Science, University of Waikato, New Zealand.

The weka is also a bird found only on the islands of New Zealand.

1 What is WEKA?

A collection of machine learning algorithms for data mining tasks.

Download and Install WEKA

Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html

Platform independent

WEKA GUI

Explorer: exploratory data analysis

Experimenter: experimental environment

KnowledgeFlow: new process-model-inspired interface

Simple CLI: command-line interface

2 WEKA Explorer

Preprocessing the data

Classification

Clustering

Association Rules

Attribute Selection

Data Visualization

Preprocessing the data

Data can be imported from a file in various formats:

ARFF - Attribute-Relation File Format

CSV - Comma Separated Values

Data can be read from a URL or from a SQL database.

Filters are used for pre-processing
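To make the idea of a pre-processing filter concrete, here is a minimal Python sketch of two common transformations: filling missing values with the attribute mean and rescaling a numeric attribute to [0, 1]. It is not WEKA code. The function names are hypothetical, and the sample values are borrowed from the cholesterol attribute of the ARFF example in this section. WEKA ships comparable filters (for example ReplaceMissingValues and Normalize), but their exact behaviour is not described in these slides.

```python
from statistics import mean


def replace_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Cholesterol column with one missing value (None stands in for ARFF's "?").
cholesterol = [233, 286, 229, None]
filled = replace_missing_with_mean(cholesterol)
scaled = normalize(filled)
print(filled)
print(scaled)
```

Chaining the two functions mirrors how filters are applied one after another before a learning scheme sees the data.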

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...

WEKA only deals with flat files
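The ARFF layout shown above (a header of @relation and @attribute lines followed by comma-separated @data rows) can be read with very little code. The following Python sketch is an illustration only, not WEKA's loader: `parse_arff` is a hypothetical helper that handles just numeric and nominal attributes with unquoted names, and it treats "?" as a missing value. The rows elided by "..." in the example are left out.

```python
ARFF_TEXT = """\
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    attributes is a list of (name, kind) pairs, where kind is the string
    "numeric" or a list of allowed nominal values. Missing cells become None.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            cells = [c.strip() for c in line.split(",")]
            row = []
            for (name, kind), cell in zip(attributes, cells):
                if cell == "?":
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(cell))
                else:
                    row.append(cell)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, len(attributes), len(rows))
```

Because every row has the same attributes in the same order, the result is a single flat table, which is the only shape of data WEKA works with.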

05/02/2023University of Waikato 13

Building Classifiers

Classifiers in WEKA are models for predicting nominal or numeric quantities

Implemented learning schemes include:

Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptron, logistic regression, Bayes’ nets, …

Decision Tree Induction: Training Dataset

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no

Output: A Decision Tree for “buys_computer”

age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: credit rating?
    excellent: no
    fair: yes
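One way to see why "age" ends up at the root of this tree is to score each attribute by information gain on the 14 training rows. The Python sketch below does that. It assumes information gain as the splitting criterion, which the slides do not state, and it is not WEKA's implementation; the function names are hypothetical.

```python
from collections import Counter
from math import log2

COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Entropy of the class minus the weighted entropy after splitting on one attribute."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[class_index] for r in rows]) - remainder


gains = {COLUMNS[i]: information_gain(ROWS, i) for i in range(4)}
best = max(gains, key=gains.get)
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gain:.3f}")
print("root attribute:", best)
```

Repeating the same scoring inside each branch (for example, only the rows with age <=30) yields the "student?" and "credit rating?" tests one level down.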

Clustering data

Finding groups of similar instances in a dataset

Implemented schemes in WEKA are: k-Means, EM, Cobweb, X-means, FarthestFirst
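As an illustration of the first scheme in that list, here is a minimal k-means sketch in Python for 2-d points. It is a plain textbook version (assign each point to its nearest centroid, then move each centroid to the mean of its members), not WEKA's implementation. The sample points and starting centroids are made up for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Run k-means on 2-d points, starting from the given centroids.

    Returns the final centroids and the cluster index of every point.
    """
    assignment = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignment


# Hypothetical data: two visibly separate groups of instances.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignment = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(assignment)
print(centroids)
```

The starting centroids are fixed here so the run is repeatable; practical implementations usually pick them at random.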

Finding Associations

WEKA contains an implementation of the Apriori algorithm for learning association rules

Works only with discrete data

Can identify statistical dependencies between groups of attributes:
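The level-wise idea behind Apriori can be sketched in a few lines of Python: count the support of single items, keep the frequent ones, join them into larger candidates, and prune any candidate that has an infrequent subset. This is a simplified illustration, not WEKA's implementation. The transactions, the 0.5 support threshold, and the function names are all hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets (level-wise search)."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    size = 1
    while current:
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        size += 1
        # Join step: union pairs of frequent sets into candidates of the next size.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        current = [
            c
            for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent."""
    return support(antecedent | consequent, transactions) / support(
        antecedent, transactions
    )


# Hypothetical market-basket data; every item is a discrete value.
TRANSACTIONS = [
    frozenset(t)
    for t in (
        {"milk", "bread"},
        {"milk", "bread", "butter"},
        {"bread", "butter"},
        {"milk", "butter"},
        {"milk", "bread", "butter"},
    )
]
frequent = apriori(TRANSACTIONS, min_support=0.5)
rule_conf = confidence(frozenset({"milk"}), frozenset({"bread"}), TRANSACTIONS)
print(len(frequent), rule_conf)
```

A rule such as milk => bread is then reported with its support and confidence, which is the kind of dependency between attribute groups the slide refers to.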

Attribute Selection

Used to determine the most predictive attributes

Consists of two parts:

1.) A search method: best-first, forward selection, random, exhaustive, genetic algorithm, etc.

2.) An evaluation method: correlation-based, wrapper, information gain, etc.
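The split into a search method and an evaluation method can be sketched as two separate functions. In the Python example below, `forward_selection` is the search and `subset_merit` is the evaluator. Scoring a whole subset by the information gain of its joint values is an assumption made for this illustration and is not one of WEKA's evaluators. The tiny dataset and all names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def subset_merit(rows, subset, class_index=-1):
    """Evaluation method: information gain of the class given the joint values of subset."""
    if not subset:
        return 0.0
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[i] for i in subset), []).append(r[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[class_index] for r in rows]) - remainder


def forward_selection(rows, n_attributes, min_improvement=1e-9):
    """Search method: greedily add the attribute that most improves the merit."""
    selected, best = [], 0.0
    while len(selected) < n_attributes:
        candidates = [i for i in range(n_attributes) if i not in selected]
        scored = [(subset_merit(rows, selected + [i]), i) for i in candidates]
        merit, choice = max(scored)
        if merit - best <= min_improvement:
            break  # no remaining attribute adds predictive value
        selected.append(choice)
        best = merit
    return selected, best


# Hypothetical data: attribute 0 determines the class, attribute 1 is weakly
# related to it, and attribute 2 is constant. The last column is the class.
ROWS = [
    ("x", "p", "k", "yes"),
    ("x", "q", "k", "yes"),
    ("y", "p", "k", "no"),
    ("y", "q", "k", "no"),
    ("x", "p", "k", "yes"),
    ("y", "q", "k", "no"),
]
selected, merit = forward_selection(ROWS, n_attributes=3)
print(selected, merit)
```

Swapping in a different `subset_merit` (for example, the cross-validated accuracy of a classifier, which is the wrapper idea) changes the evaluation method without touching the search.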

Data Visualization

WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)

Color-coded class values

Use of “jitter” option

“Zoom-in” function
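Jitter is useful when many instances share exactly the same coordinates, as happens in a scatter plot of two nominal attributes: a small random offset spreads the points so that overlaps become visible. The Python sketch below shows the idea only. The `jitter` function, its `amount` and `seed` parameters, and the sample points are hypothetical and do not reflect how WEKA implements the option.

```python
import random


def jitter(points, amount, seed=0):
    """Shift each (x, y) point by uniform noise in [-amount, amount] per axis."""
    rng = random.Random(seed)  # fixed seed so the plot is repeatable
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Hypothetical plot data: eight instances that occupy only two distinct positions.
points = [(1, 2)] * 5 + [(3, 2)] * 3
spread = jitter(points, amount=0.2)
print(len(set(points)), "distinct positions before jitter")
print(len(set(spread)), "distinct positions after jitter")
```

The offset affects only where points are drawn; the underlying attribute values stay unchanged.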

The End
