an introduction to weka - nus computing · 2006-04-17 · lecture at national yang ming university,...
TRANSCRIPT
Lecture at National Yang Ming University, June 2006
An Introduction to WEKA
Lecture by Limsoon WongSlides prepared by Dong Difeng
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
2
Outline
• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
3
What is WEKA
• Developed at Univ of Waikato in New Zealand
• A collection of state-of-art machine learning algorithms and data preprocessing tools
• Provide implementation of – Regression– Classification– Clustering– Association rules– Feature selection
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
4
What is WEKA
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
5
Outline
• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
6
Knowledge Flow
• Experiment 1:– Type: Classification– Feature selection: GainRatio; Ranker (top 3)– Algorithm: ID3– Training: Weather_nominal.arff– Test: Weather_nominal.arff
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
7
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
8
Knowledge Flow
• Source file (.ARFF)
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
9
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
10
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
11
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
12
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
13
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
14
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
15
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
16
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
17
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
18
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
19
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
20
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
21
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
22
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
23
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
24
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
25
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
26
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
27
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
28
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
29
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
30
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
31
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
32
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
33
Knowledge Flow
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
34
Outline
• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
35
Explorer
• Do the same experiment…• Experiment 1:
– Type: Classification– Feature selection: GainRatio; Ranker top 3– Algorithm: ID3– Training: Weather_nominal.arff– Test: Weather_nominal.arff
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
36
Explorer
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
37
Explorer
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
38
Explorer
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
39
Explorer
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
40
Explorer
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
41
Explorer
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
42
Outline
• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
43
Why Knowledge Flow
• There are some jobs we cannot do in explorer …– Combine feature selection– Build more complicated systems
• KF describes the process more clearly– Never regard the training and test data to be
separate in the previous example in explorer• KF help us to access some mid-process info of
the machine learning method– Cross Validation
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
44
Outline
• What is WEKA• Knowledge Flow• Explorer• Why Knowledge Flow• Cross Validation• Reference
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
45
Cross Validation
• Experiment 2:– Type: Classification– Feature selection: GainRatio; Ranker top 3– Algorithm: ID3– Training: Weather_nominal.arff (CV)– Test: Weather_nominal.arff (CV)– CV type: 3-folder CV
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
46
Cross Validation
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
47
Cross ValidationWhat do we view in this case?Text1 VS. Text2 (1)
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
48
Cross Validation
Text1 VS. Text2 (2)
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
49
Cross Validation
Text1 VS. Text2 (3)
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
50
Cross Validation
Text3 VS. Text4 (1)
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
51
Cross Validation
Text3 VS. Text4 (2)
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
52
Cross Validation
Text3 VS. Text4 (3)
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
53
Cross ValidationTrees
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
54
Cross ValidationEvaluation of result
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
55
Cross Validation
• Conclusion:– Source data are separated into several folders for
cross validation– Feature selection is done for each training folder
(only training) folder separately– Different trees are build in different cases– The evaluation of classification is by overall results
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
56
Cross Validation
• Experiment 3:– Type: Classification– Feature selection: GainRatio; Ranker top 2– Algorithm: ID3– Training: Weather_nominal.arff– Test: Weather_nominal.arff
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
57
Cross Validation
Ranker top 3 VS. Ranker top 2
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
58
Cross Validation
• Conclusion:– Attribute “windy” was ignored. In this case, the
classifier only consider the attribute that was kept
Lecture at National Yang Ming University, June 2006 Copyright 2006 © Dong Difeng
59
Reference
• http://www.cs.waikato.ac.nz/~ml/
• Ian H. Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)