
Post on 16-Oct-2020


Learn to Use Weka

Jue Wang (Joyce)

Department of Computer Science, University of Massachusetts, Boston

Feb-09-2010

Outline

• Introduction of Weka

• Explorer
    Filter
    Classify
    Cluster

• Experimenter

• KnowledgeFlow

• Simple CLI

What is Weka?

Waikato Environment for Knowledge Analysis

Copyright: Martin Kramer (mkramer@wxs.nl)

Introduction of Weka

• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
    Supports MS Windows, Mac OS X and GNU/Linux

• Used for research, education, and applications

• Main features:
    Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
    Graphical user interfaces (incl. data visualization)
    Environment for comparing learning algorithms

Data Set

% This is a toy example, the UCI weather dataset.
% Any relation to real weather is purely coincidental.

Comment lines at the beginning of the dataset should give an indication of its source, context and meaning.

@relation golfWeatherMichigan_1988/02/10_14days

Here we state the internal name of the dataset. Try to be as comprehensive as possible.

@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}

Here we define two nominal attributes, outlook and windy. The former has three values: sunny, overcast and rainy; the latter two: TRUE and FALSE. Nominal values with special characters, commas or spaces are enclosed in 'single quotes'.

@attribute temperature real
@attribute humidity real

These lines define two numeric attributes. Instead of real, integer or numeric can also be used. While double floating point values are stored internally, only seven decimal digits are usually processed.

@attribute play {yes, no}

The last attribute is the default target or class variable used for prediction. In our case it is a nominal attribute with two values, making this a binary classification problem.

@data
sunny,FALSE,85,85,no
sunny,TRUE,80,90,no
overcast,FALSE,83,86,yes
rainy,FALSE,70,96,yes
rainy,FALSE,68,?,yes

The rest of the dataset consists of the token @data, followed by comma-separated values for the attributes, one line per example. In our case there are five examples.

More details: http://www.cs.waikato.ac.nz/~ml/weka/arff.html
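To make the layout of an ARFF file concrete, here is a minimal sketch of a reader written in plain Python. It is not Weka's own loader: the function name parse_arff is hypothetical, and the sketch assumes an unquoted subset of the format (comments, @relation, @attribute, @data, and "?" for a missing value). Quoted values, string and date attributes, and sparse data are not handled.

```python
ARFF_TEXT = """\
% This is a toy example, the UCI weather dataset.
% Any relation to real weather is purely coincidental.
@relation golfWeatherMichigan_1988/02/10_14days
@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}
@attribute temperature real
@attribute humidity real
@attribute play {yes, no}
@data
sunny,FALSE,85,85,no
sunny,TRUE,80,90,no
overcast,FALSE,83,86,yes
rainy,FALSE,70,96,yes
rainy,FALSE,68,?,yes
"""

NUMERIC_TYPES = {"real", "integer", "numeric"}


def parse_arff(text):
    """Parse a small, unquoted subset of ARFF (hypothetical helper)."""
    relation = None
    attributes = []  # (name, kind); kind is a list of values for nominal attributes
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comment lines
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row[name] = None  # "?" marks a missing value
                elif kind == "numeric":
                    row[name] = float(value)
                else:
                    row[name] = value
            rows.append(row)
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        rest = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "@relation":
            relation = rest
        elif keyword == "@attribute":
            name, _, spec = rest.partition(" ")
            spec = spec.strip()
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            elif spec.lower() in NUMERIC_TYPES:
                kind = "numeric"
            else:
                kind = "other"  # e.g. string or date; kept as raw text here
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, len(attributes), len(rows))
```

Each example becomes a dictionary keyed by attribute name, so the missing humidity value in the last example is available as None.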

Explorer


• Filter transforms datasets:
    removing or adding attributes
    resampling the dataset
    removing examples
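The three kinds of transformation listed for filters can be sketched on a list of examples in plain Python. This is an illustration of the idea only, not Weka's filter classes: the function names and the sample rows are assumptions made for this example.

```python
import random

# A few made-up examples in the spirit of the weather data.
rows = [
    {"outlook": "sunny", "windy": "FALSE", "humidity": 85.0, "play": "no"},
    {"outlook": "sunny", "windy": "TRUE", "humidity": 90.0, "play": "no"},
    {"outlook": "overcast", "windy": "FALSE", "humidity": 86.0, "play": "yes"},
    {"outlook": "rainy", "windy": "FALSE", "humidity": 96.0, "play": "yes"},
    {"outlook": "rainy", "windy": "FALSE", "humidity": None, "play": "yes"},
]


def remove_attribute(data, name):
    """Return a copy of the dataset without the named attribute."""
    return [{k: v for k, v in row.items() if k != name} for row in data]


def resample(data, size, seed=1):
    """Draw `size` examples at random, with replacement."""
    rng = random.Random(seed)
    return [dict(rng.choice(data)) for _ in range(size)]


def remove_examples(data, predicate):
    """Drop every example for which `predicate` is true."""
    return [row for row in data if not predicate(row)]


without_windy = remove_attribute(rows, "windy")
sample = resample(rows, 3)
complete = remove_examples(rows, lambda row: row["humidity"] is None)
print(len(without_windy[0]), len(sample), len(complete))
```

Each function returns a new list and leaves the input untouched, which mirrors the way a filter produces a transformed dataset from an existing one.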

Explorer
• Filter

Explorer
• Classify

Explorer
• Cluster

Experimenter


• Experimenter makes it easy to compare the performance of different learning schemes

• For classification and regression problems

• Results can be written to a file or database

• Evaluation options: cross-validation, learning curve, hold-out

• Can also iterate over different parameter settings

• Significance testing built in!
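As a rough illustration of what comparing learning schemes by cross-validation involves, the sketch below runs k-fold cross-validation for two toy learners in plain Python. Nothing here is Weka code: the learners, the function names and the dataset are made up for this example, and the significance test that the Experimenter adds on top is not reproduced.

```python
import random


def k_fold_indices(n, k, seed=1):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_learner(train):
    """Baseline: always predict the most frequent training label."""
    labels = [label for _, label in train]
    majority = max(sorted(set(labels)), key=labels.count)
    return lambda x: majority


def nearest_neighbour_learner(train):
    """Predict the label of the closest training example (one numeric feature)."""
    return lambda x: min(train, key=lambda example: abs(example[0] - x))[1]


def cross_validate(learner, data, k=5, seed=1):
    """Return the accuracy on each held-out fold."""
    accuracies = []
    for test_indices in k_fold_indices(len(data), k, seed):
        held_out = set(test_indices)
        train = [ex for i, ex in enumerate(data) if i not in held_out]
        test = [data[i] for i in test_indices]
        model = learner(train)
        correct = sum(model(x) == label for x, label in test)
        accuracies.append(correct / len(test))
    return accuracies


# Made-up data: one numeric feature, labelled "yes" when it exceeds 50.
data = [(x, "yes" if x > 50 else "no") for x in range(0, 100, 5)]

for name, learner in [("majority", majority_learner),
                      ("1-NN", nearest_neighbour_learner)]:
    scores = cross_validate(learner, data)
    print(name, sum(scores) / len(scores))
```

Both learners are evaluated on the same folds because they share the seed, which is what makes the per-fold accuracies comparable.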

KnowledgeFlow


• New graphical user interface for WEKA

• Java-Beans-based interface for setting up and running machine learning experiments

• Data sources, classifiers, etc. are beans and can be connected graphically

• Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”

• Layouts can be saved and loaded again later
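The idea of data flowing through connected components can be sketched in a few lines of Python. This is only an analogy for the wiring, not the KnowledgeFlow or Java Beans API: the Component class, its methods and the toy components are hypothetical names introduced for this example.

```python
class Component:
    """A processing step that forwards its output to every connected listener."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.listeners = []
        self.last_output = None

    def connect(self, other):
        """Wire this component's output to `other`; returns `other` for chaining."""
        self.listeners.append(other)
        return other

    def receive(self, payload):
        """Process the payload, remember the result, and pass it downstream."""
        self.last_output = self.func(payload)
        for listener in self.listeners:
            listener.receive(self.last_output)


def train_majority(rows):
    """Toy classifier: always predicts the most frequent label."""
    labels = [label for _, label in rows]
    majority = max(sorted(set(labels)), key=labels.count)
    return {"predict": lambda features: majority, "rows": rows}


def evaluate(trained):
    """Toy evaluator: accuracy on the same rows the model was trained on."""
    rows = trained["rows"]
    hits = sum(trained["predict"](x) == label for x, label in rows)
    return hits / len(rows)


# (humidity, play) pairs; None stands for a missing value.
RAW = [(85.0, "no"), (90.0, "no"), (86.0, "yes"), (96.0, "yes"), (None, "yes")]

source = Component("data source", lambda _: list(RAW))
drop_missing = Component("filter", lambda rows: [r for r in rows if None not in r])
classifier = Component("classifier", train_majority)
evaluator = Component("evaluator", evaluate)

# "data source" -> "filter" -> "classifier" -> "evaluator"
source.connect(drop_missing).connect(classifier).connect(evaluator)
source.receive(None)
print(evaluator.last_output)
```

Because each component only knows its listeners, a step can be swapped or an extra one inserted by changing the connect calls, without touching the other components.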


Simple CLI


• The Simple CLI provides full access to all Weka classes, i.e., classifiers, filters, clusterers, etc., but without the hassle of the CLASSPATH (it uses the one with which Weka was started).

• The Simple CLI is the place to test code that calls Weka from another program.

Simple CLI

java <classname> [<args>]
    invokes a Java class with the given arguments (if any)

break
    stops the current thread, e.g., a running classifier, in a friendly manner

kill
    stops the current thread, e.g., a running classifier, in an unfriendly fashion

cls
    clears the output area

exit
    exits the Simple CLI

help [<command>]
    provides an overview of the available commands if called without a command name as argument, otherwise more help on the specified command
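Since the Simple CLI is meant for trying out a call before making it from another program, the sketch below shows how the same call might be prepared in Python. The class weka.classifiers.trees.J48 and its -t option (training file) belong to Weka; the jar location weka.jar, the file weather.arff and both helper function names are assumptions for this example. The command is only built and printed here, not executed.

```python
import shlex


def build_weka_command(classname, args, weka_jar="weka.jar"):
    """Argument list for starting a Weka class in its own JVM.

    Outside the Simple CLI the classpath has to be given explicitly;
    the default jar path is an assumption.
    """
    return ["java", "-cp", weka_jar, classname, *args]


def simple_cli_line(classname, args):
    """The equivalent line to type into the Simple CLI (no classpath needed)."""
    return shlex.join(["java", classname, *args])


command = build_weka_command("weka.classifiers.trees.J48", ["-t", "weather.arff"])
cli_line = simple_cli_line("weka.classifiers.trees.J48", ["-t", "weather.arff"])

print(shlex.join(command))
print(cli_line)
```

With Java and the Weka jar installed, passing the list to subprocess.run(command) would launch the classifier from Python; the Simple CLI line is the quicker way to check the class name and options first.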

Thanks
