Post on 22-May-2020

Getting started with Weka

Yishuang Geng, Kexin Shi, Pei Zhang, Angel Trifonov, Jiefeng He, Xiaolu Xiong

Lesson 1.1 - Introduction

Purpose of this course

● Take the mystery out of data mining.

● How to use the Weka workbench for data mining.

● Explain the basic principles of several popular algorithms

Data mining with Weka

● What’s data mining?
○ We are overwhelmed with data
○ Data mining is about going from the raw data to information.

● What could data mining do?
○ You’re at the supermarket checkout and you’re happy with your bargains … and the supermarket is happy you’ve bought some more stuff
○ You want a child, but you and your partner can’t have one.
○ You want to monitor the firefighters' status but you cannot get into the burning houses to watch them.

What is Weka?

1. A bird found only in New Zealand
2. Waikato Environment for Knowledge Analysis

● Weka includes:
○ 100+ algorithms for classification
○ 75 for data preprocessing
○ 25 to assist with feature selection
○ 20 for clustering, finding association rules, etc.

Textbook

Data Mining: Practical Machine Learning Tools and Techniques, by Ian H. Witten, Eibe Frank and Mark A. Hall. Morgan Kaufmann, 2011

Learning outcome of the course

● Load data into Weka and look at it
● Use filters to preprocess it
● Explore it using interactive visualization
● Apply classification algorithms
● Interpret the output
● Understand evaluation methods and their implications
● Understand various representations for models
● Explain how popular machine learning algorithms work
● Be aware of common pitfalls with data mining

Use Weka on your own data… and understand what you are doing!

A simple application

○ You want to monitor the firefighters' status but you cannot get into the burning houses to watch them.

Motion Detection Using RF Signals for the First Responder in Emergency Operations
● Firefighters

○ Sensors monitor their physiological information, with personal area communication capability to a centroid node.

○ The centroid node has local area communication capability to link to the terminals outside the burning house.

○ If we want to monitor their motion, what should we do?

Existing approaches

● Pros
○ High detection rate.
○ Low computational cost.
● Cons
○ Add extra load to the firefighter.
○ Limited sensor location, usually on shoes.
○ Cannot detect multiple motions; mainly used for fall detection.

Raw data

Data mining

Information from the raw data

Summary

● Why take this course
● Materials
○ Weka
○ Textbook
● Course schedule
○ Lectures
○ Activities
○ Assessments
● Learning outcome
● A simple application

Lesson 1.2 - Exploring the Explorer

Setting up Weka

● Download the latest version (Weka 3.6.10) from http://www.cs.waikato.ac.nz/ml/weka/downloading.html
● Self-extracting executable
○ Java VM included (if needed)
● Create a shortcut to the Data folder in your computer’s My Documents

● Use the Weka shortcut from the program folder

Weka Interface

● Weka interfaces
○ Explorer
○ Experimenter
○ GUI
○ Command-line

● Explorer will be used the most

Explorer Interface

● Explorer Panels
○ Preprocess
● Opening datasets
○ File
● Filter
○ Supervised
○ Unsupervised

Filters

● Difference

● An additional two kinds of filtering
○ Instances
○ Attributes

More Preprocess Information

● Relation
○ Attributes
○ Instances
● Selected Attribute
○ Name
○ Type
○ Other Info
● Attributes
○ Editing
○ Removing
● Class Visualization
● Status and log

Lesson 1.3 - Exploring datasets

Classification

Nominal vs. Numerical

ARFF file format
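
To make the slide title concrete, here is a minimal sketch of what an ARFF file looks like and how its header could be read. It assumes the usual layout of `@relation`, `@attribute` and `@data` lines. The sample text is only modelled on Weka's weather data and is abbreviated, and `parse_arff` is a hypothetical helper written for illustration, not Weka's own loader. It ignores quoting, sparse data and missing values, and it treats every non-nominal attribute as numeric.

```python
# Illustrative ARFF text, modelled on Weka's weather data (abbreviated).
SAMPLE_ARFF = """\
% lines starting with % are comments
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
"""


def parse_arff(text):
    """Hypothetical helper: return (relation, attributes, rows).

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of the allowed nominal values.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # convert numeric columns to float, keep nominal ones as strings
            rows.append([
                float(v) if kind == "numeric" else v
                for v, (_, kind) in zip(values, attributes)
            ])
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = "numeric"  # sketch: any non-nominal type is numeric
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
for name, kind in attributes:
    label = "numeric" if kind == "numeric" else "nominal " + str(kind)
    print(name, "->", label)
print(len(rows), "instances in relation", relation)
```

The header is where the nominal vs. numerical distinction lives: a nominal attribute lists its allowed values in braces, while a numeric one only names its type.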

Lesson 1.4 - Building a classifier

Classifying the glass dataset
Interpreting J48 output
J48 configuration panel
... option: pruned vs unpruned trees
... option: avoid small leaves

Jiefeng

Use the confusion matrix to determine how many headlamps instances were misclassified as build wind float.

3

What is the percentage of correctly classified instances?
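
Both questions can be answered by reading a confusion matrix, and the sketch below shows the arithmetic. It assumes the layout used in Weka's output, where each row is the actual class and each column is the predicted class. The three-class matrix, the labels and the function names are made up for illustration and are not the glass dataset results.

```python
def misclassified_as(matrix, labels, actual, predicted):
    """Count instances of class `actual` that were predicted as `predicted`."""
    return matrix[labels.index(actual)][labels.index(predicted)]


def percent_correct(matrix):
    """Correct predictions sit on the diagonal; divide by the grand total."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Hypothetical confusion matrix: rows = actual class, columns = predicted.
labels = ["a", "b", "c"]
matrix = [
    [8, 1, 1],  # actual a
    [2, 6, 2],  # actual b
    [3, 0, 7],  # actual c
]

print(misclassified_as(matrix, labels, "c", "a"))  # 3
print(percent_correct(matrix))                     # 70.0
```

In this made-up matrix, 21 of the 30 instances lie on the diagonal, which gives 70% correctly classified.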

Turning pruning off results in larger trees, and often yields worse results because the classifier may "overfit" the data. However, in some cases the unpruned tree performs better than the pruned one.

1.4 Summary

Building a classifier
Classifying the glass dataset
Interpreting J48 output
J48 configuration panel
... option: pruned vs unpruned trees
... option: avoid small leaves

Lesson 1.5 - Using a filter

Use a filter to remove an attribute

Open weather.nominal.arff

Check the filters

Set attributeIndices to 3 and click OK

Apply the filter
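
The effect of the steps above can be sketched outside Weka as follows. This assumes that attributeIndices counts attributes from 1, so that 3 refers to the third attribute (humidity in the weather data). `remove_attributes` is a hypothetical helper, not Weka's filter, and it only understands plain comma-separated numbers, not ranges. The rows are a small excerpt used for illustration.

```python
def remove_attributes(names, rows, attribute_indices):
    """Drop the columns named by a 1-based, comma-separated index string.

    Sketch only: handles plain numbers such as "3" or "1,3", not ranges.
    """
    drop = {int(part) - 1 for part in attribute_indices.split(",")}
    keep = [i for i in range(len(names)) if i not in drop]
    new_names = [names[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_names, new_rows


names = ["outlook", "temperature", "humidity", "windy", "play"]
rows = [
    ["sunny", "hot", "high", "FALSE", "no"],
    ["overcast", "hot", "high", "FALSE", "yes"],
    ["rainy", "mild", "high", "FALSE", "yes"],
]

new_names, new_rows = remove_attributes(names, rows, "3")
print(new_names)    # ['outlook', 'temperature', 'windy', 'play']
print(new_rows[0])  # ['sunny', 'hot', 'FALSE', 'no']
```

The number of instances is unchanged; only the selected column disappears from every row.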

Lesson 1.6 - Visualizing your data

Raw data visualization

Sepalwidth vs. petalwidth

Zoom in

Error visualization

Thank you!

Questions?
