an introduction to weka - nus computing · 2006-04-17 · lecture at national yang ming university,...

Post on 22-May-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Lecture at National Yang Ming University, June 2006

An Introduction to WEKA

Lecture by Limsoon WongSlides prepared by Dong Difeng

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

2

Outline

• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

3

What is WEKA

• Developed at Univ of Waikato in New Zealand

• A collection of state-of-art machine learning algorithms and data preprocessing tools

• Provide implementation of – Regression– Classification– Clustering– Association rules– Feature selection

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

4

What is WEKA

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

5

Outline

• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

6

Knowledge Flow

• Experiment 1:– Type: Classification– Feature selection: GainRatio; Ranker (top 3)– Algorithm: ID3– Training: Weather_nominal.arff– Test: Weather_nominal.arff

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

7

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

8

Knowledge Flow

• Source file (.ARFF)

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

9

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

10

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

11

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

12

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

13

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

14

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

15

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

16

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

17

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

18

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

19

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

20

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

21

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

22

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

23

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

24

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

25

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

26

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

27

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

28

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

29

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

30

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

31

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

32

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

33

Knowledge Flow

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

34

Outline

• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

35

Explorer

• Do the same experiment…• Experiment 1:

– Type: Classification– Feature selection: GainRatio; Ranker top 3– Algorithm: ID3– Training: Weather_nominal.arff– Test: Weather_nominal.arff

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

36

Explorer

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

37

Explorer

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

38

Explorer

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

39

Explorer

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

40

Explorer

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

41

Explorer

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

42

Outline

• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

43

Why Knowledge Flow

• There are some jobs we cannot do in explorer …– Combine feature selection– Build more complicated systems

• KF describes the process more clearly– Never regard the training and test data to be

separate in the previous example in explorer• KF help us to access some mid-process info of

the machine learning method– Cross Validation

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

44

Outline

• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

45

Cross Validation

• Experiment 2:– Type: Classification– Feature selection: GainRatio; Ranker top 3– Algorithm: ID3– Training: Weather_nominal.arff (CV)– Test: Weather_nominal.arff (CV)– CV type: 3-folder CV

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

46

Cross Validation

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

47

Cross ValidationWhat do we view in this case?Text1 VS. Text2 (1)

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

48

Cross Validation

Text1 VS. Text2 (2)

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

49

Cross Validation

Text1 VS. Text2 (3)

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

50

Cross Validation

Text3 VS. Text4 (1)

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

51

Cross Validation

Text3 VS. Text4 (2)

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

52

Cross Validation

Text3 VS. Text4 (3)

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

53

Cross ValidationTrees

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

54

Cross ValidationEvaluation of result

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

55

Cross Validation

• Conclusion:– Source data are separated into several folders for

cross validation– Feature selection is done for each training folder

(only training) folder separately– Different trees are build in different cases– The evaluation of classification is by overall results

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

56

Cross Validation

• Experiment 3:– Type: Classification– Feature selection: GainRatio; Ranker top 2– Algorithm: ID3– Training: Weather_nominal.arff– Test: Weather_nominal.arff

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

57

Cross Validation

Ranker top 3 VS. Ranker top 2

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

58

Cross Validation

• Conclusion:– Attribute “windy” was ignored. In this case, the

classifier only consider the attribute that was kept

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

59

Reference

• http://www.cs.waikato.ac.nz/~ml/

• Ian H. Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)

top related