rapidminer - upec/files_1112/lab05-lab.pdf · rapidminer ‐i.com/ open‐source data mining with...


Upload: hanhi

Post on 01-Apr-2018


Page 1: Rapidminer - UPec/files_1112/lab05-lab.pdf · Rapidminer ‐i.com/ Open‐Source Data Mining with the Java Software RapidMiner “RapidMiner is the world‐wide leading open‐source

Rapidminer

http://rapid-i.com/
Open-Source Data Mining with the Java Software RapidMiner

“RapidMiner is the world-wide leading open-source data mining solution due to the combination of its leading-edge technologies and its functional range. Applications of RapidMiner cover a wide range of real-world data mining tasks.”


lab 05‐risk.xls


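We use k-medoids because k-means only works with numerical attributes: k-means computes each cluster centre as the mean of its members, which is undefined for nominal values, whereas k-medoids uses an actual example as the centre and only needs a distance between examples.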
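To make the difference concrete, here is a minimal sketch of k-medoids in plain Python. It is not RapidMiner's implementation: it uses the simple alternating variant (assign to the nearest medoid, then re-pick each medoid), and the records, the `mixed_distance` function, and all parameter names are hypothetical illustrations.

```python
import random

def mixed_distance(a, b):
    """Dissimilarity for records mixing nominal and numeric attributes.

    Nominal (string) values contribute 0 if equal and 1 otherwise;
    numeric values (assumed pre-scaled to [0, 1]) contribute |x - y|.
    """
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0
        else:
            total += abs(x - y)
    return total / len(a)

def assign(points, medoids, dist):
    """For each point, return the position of its closest medoid."""
    return [min(range(len(medoids)),
                key=lambda c: dist(points[i], points[medoids[c]]))
            for i in range(len(points))]

def k_medoids(points, k, dist=mixed_distance, max_iter=100, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # medoids are point indices
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep the old medoid
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total
            # distance to all other members of its cluster.
            best = min(members,
                       key=lambda m: sum(dist(points[m], points[j])
                                         for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)

# Hypothetical records: (marital status, mortgage, income scaled to [0, 1])
records = [
    ("single", "yes", 0.20),
    ("single", "yes", 0.25),
    ("single", "no", 0.30),
    ("married", "no", 0.80),
    ("married", "no", 0.85),
    ("married", "yes", 0.90),
]
medoids, labels = k_medoids(records, k=2)
for rec, lab in zip(records, labels):
    print(lab, rec)
```

No step averages attribute values, so nominal attributes cause no problem; the only requirement is a distance function that accepts them.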


k‐means example


NAME                 Calories  Protein  Fat  Calcium  Iron  Label
BEEF BRAISED              340       20   28        9   2.6      1
HAMBURGER                 245       21   17        9   2.7      1
BEEF ROAST                420       15   39        7     2      1
BEEF STEAK                375       19   32        9   2.6      1
BEEF CANNED               180       22   10       17   3.7      1
CHICKEN BROILED           115       20    3        8   1.4      2
CHICKEN CANNED            170       25    7       12   1.5      2
BEEF HEART                160       26    5       14   5.9      3
LAMB LEG ROAST            265       20   20        9   2.6      1
LAMB SHOULDER ROAST       300       18   25        9   2.3      1
SMOKED HAM                340       20   28        9   2.5      1
PORK ROAST                340       19   29        9   2.5      1
PORK SIMMERED             355       19   30        9   2.4      1
BEEF TONGUE               205       18   14        7   2.5      1
VEAL CUTLET               185       23    9        9   2.7      1
BLUEFISH BAKED            135       22    4       25   0.6      2
CLAMS RAW                  70       11    1       82     6      3
CLAMS CANNED               45        7    1       74   5.4      3
CRABMEAT CANNED            90       14    2       38   0.8      2
HADDOCK FRIED             135       16    5       15   0.5      2
MACKEREL BROILED          200       19   13        5     1      2
MACKEREL CANNED           155       16    9      157   1.8      3
PERCH FRIED               195       16   11       14   1.3      2
SALMON CANNED             120       17    5      159   0.7      3
SARDINES CANNED           180       22    9      367   2.5      3
TUNA CANNED               170       25    7        7   1.2      2
SHRIMP CANNED             110       23    1       98   2.6      3
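As a rough illustration of what k-means does with numeric data like this, the sketch below runs a plain-Python version on a few rows copied from the table. It is not the RapidMiner operator. The min-max normalization, the choice of k = 3, the random seed, and the function names are assumptions made for the example; the slides do not say how the Label column was produced, so the output should not be expected to reproduce it.

```python
import math
import random

# A few rows from the table: (name, calories, protein, fat, calcium, iron)
FOODS = [
    ("BEEF BRAISED", 340, 20, 28, 9, 2.6),
    ("BEEF ROAST", 420, 15, 39, 7, 2.0),
    ("CHICKEN BROILED", 115, 20, 3, 8, 1.4),
    ("TUNA CANNED", 170, 25, 7, 7, 1.2),
    ("CLAMS RAW", 70, 11, 1, 82, 6.0),
    ("SALMON CANNED", 120, 17, 5, 159, 0.7),
    ("SARDINES CANNED", 180, 22, 9, 367, 2.5),
    ("SHRIMP CANNED", 110, 23, 1, 98, 2.6),
]

def normalize(rows):
    """Min-max scale each column to [0, 1] so no attribute dominates."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in rows]

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: math.dist(point, centroids[c]))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, labels

features = [row[1:] for row in FOODS]
scaled = normalize(features)
centroids, labels = k_means(scaled, k=3)
for (name, *_), lab in zip(FOODS, labels):
    print(f"{name:<16} cluster_{lab}")
```

The centroid update averages each column, which is exactly the step that has no meaning for nominal attributes.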


DBSCAN example
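For readers without the slide images, the following is a minimal plain-Python sketch of the DBSCAN idea: points with at least `min_pts` neighbours within radius `eps` are core points, clusters grow outward from core points, and everything unreachable is noise. The function names, parameter names, and the sample points are hypothetical and are not taken from the lab data or from RapidMiner.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points)
            if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: expand
                queue.extend(j_neighbors)
    return labels

# Hypothetical 2-D points: two dense groups and one isolated point
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 1.0),
]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike k-means and k-medoids, the number of clusters is not given in advance; it follows from `eps` and `min_pts`, and isolated points are reported as noise instead of being forced into a cluster.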


Labeled data


Results with k‐means


DBSCAN


References

Data Mining: Concepts and Techniques, Jiawei Han, Micheline Kamber (Morgan Kaufmann, 2000)

Data Mining: Introductory and Advanced Topics, Margaret Dunham (Prentice Hall, 2002)

A Tutorial on Clustering Algorithms, http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/index.html

Clustering Web Search Results, Iwona Białynicka-Birula, http://www.di.unipi.it/~iwona/Clustering.ppt


Solutions nearly always come from the direction you least expect, which means there's no point in trying to look in that direction because it won't be coming from there.

Douglas Adams