an introduction to weka - nus computing · 2006-04-17 · lecture at national yang ming university,...

59
Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong Difeng

Upload: others

Post on 22-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006

An Introduction to WEKA

Lecture by Limsoon WongSlides prepared by Dong Difeng

Page 2: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

2

Outline

• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference

Page 3: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

3

What is WEKA

• Developed at Univ of Waikato in New Zealand

• A collection of state-of-art machine learning algorithms and data preprocessing tools

• Provide implementation of – Regression– Classification– Clustering– Association rules– Feature selection

Page 4: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

4

What is WEKA

Page 5: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

5

Outline

• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference

Page 6: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

6

Knowledge Flow

• Experiment 1:– Type: Classification– Feature selection: GainRatio; Ranker (top 3)– Algorithm: ID3– Training: Weather_nominal.arff– Test: Weather_nominal.arff

Page 7: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

7

Knowledge Flow

Page 8: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

8

Knowledge Flow

• Source file (.ARFF)

Page 9: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

9

Knowledge Flow

Page 10: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

10

Knowledge Flow

Page 11: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

11

Knowledge Flow

Page 12: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

12

Knowledge Flow

Page 13: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

13

Knowledge Flow

Page 14: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

14

Knowledge Flow

Page 15: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

15

Knowledge Flow

Page 16: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

16

Knowledge Flow

Page 17: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

17

Knowledge Flow

Page 18: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

18

Knowledge Flow

Page 19: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

19

Knowledge Flow

Page 20: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

20

Knowledge Flow

Page 21: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

21

Knowledge Flow

Page 22: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

22

Knowledge Flow

Page 23: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

23

Knowledge Flow

Page 24: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

24

Knowledge Flow

Page 25: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

25

Knowledge Flow

Page 26: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

26

Knowledge Flow

Page 27: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

27

Knowledge Flow

Page 28: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

28

Knowledge Flow

Page 29: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

29

Knowledge Flow

Page 30: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

30

Knowledge Flow

Page 31: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

31

Knowledge Flow

Page 32: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

32

Knowledge Flow

Page 33: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

33

Knowledge Flow

Page 34: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

34

Outline

• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference

Page 35: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

35

Explorer

• Do the same experiment…• Experiment 1:

– Type: Classification– Feature selection: GainRatio; Ranker top 3– Algorithm: ID3– Training: Weather_nominal.arff– Test: Weather_nominal.arff

Page 36: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

36

Explorer

Page 37: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

37

Explorer

Page 38: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

38

Explorer

Page 39: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

39

Explorer

Page 40: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

40

Explorer

Page 41: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

41

Explorer

Page 42: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

42

Outline

• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference

Page 43: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

43

Why Knowledge Flow

• There are some jobs we cannot do in explorer …– Combine feature selection– Build more complicated systems

• KF describes the process more clearly– Never regard the training and test data to be

separate in the previous example in explorer• KF help us to access some mid-process info of

the machine learning method– Cross Validation

Page 44: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

44

Outline

• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference

Page 45: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

45

Cross Validation

• Experiment 2:– Type: Classification– Feature selection: GainRatio; Ranker top 3– Algorithm: ID3– Training: Weather_nominal.arff (CV)– Test: Weather_nominal.arff (CV)– CV type: 3-folder CV

Page 46: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

46

Cross Validation

Page 47: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

47

Cross ValidationWhat do we view in this case?Text1 VS. Text2 (1)

Page 48: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

48

Cross Validation

Text1 VS. Text2 (2)

Page 49: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

49

Cross Validation

Text1 VS. Text2 (3)

Page 50: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

50

Cross Validation

Text3 VS. Text4 (1)

Page 51: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

51

Cross Validation

Text3 VS. Text4 (2)

Page 52: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

52

Cross Validation

Text3 VS. Text4 (3)

Page 53: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

53

Cross ValidationTrees

Page 54: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

54

Cross ValidationEvaluation of result

Page 55: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

55

Cross Validation

• Conclusion:– Source data are separated into several folders for

cross validation– Feature selection is done for each training folder

(only training) folder separately– Different trees are build in different cases– The evaluation of classification is by overall results

Page 56: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

56

Cross Validation

• Experiment 3:– Type: Classification– Feature selection: GainRatio; Ranker top 2– Algorithm: ID3– Training: Weather_nominal.arff– Test: Weather_nominal.arff

Page 57: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

57

Cross Validation

Ranker top 3 VS. Ranker top 2

Page 58: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

58

Cross Validation

• Conclusion:– Attribute “windy” was ignored. In this case, the

classifier only consider the attribute that was kept

Page 59: An Introduction to WEKA - NUS Computing · 2006-04-17 · Lecture at National Yang Ming University, June 2006 An Introduction to WEKA Lecture by Limsoon Wong Slides prepared by Dong

Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng

59

Reference

• http://www.cs.waikato.ac.nz/~ml/

• Ian H. Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)