Mining for Patterns Based on Contingency Tables by KL-Miner
First Experience
Jan Rauch, Milan Šimůnek (PhD. student), Václav Lín (student)
University of Economics, Prague
FDM 2003 2
… KL-Miner, First Experience
KL-Miner Basic features
Application example
Implementation principles
Scalability
Concluding remarks
KL-Miner -- Data and Patterns
Data: data matrix M
M    A1   A2   …   AP
o1   2    12   …   1
o2   1    5    …   4
…    …    …    …   …
on   3    9    …   2
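Patterns, i.e. KL-hypotheses: $R \sim C / \gamma$, where
$R \in \{A_1, \dots, A_P\}$ … row attribute with possible values (categories) $r_1, \dots, r_K$
$C \in \{A_1, \dots, A_P\}$ … column attribute with possible values (categories) $c_1, \dots, c_L$
$\gamma$ … Boolean attribute (condition) derived from the other attributes $A_1, \dots, A_P$
$\sim$ … KL-quantifier, i.e. a condition imposed on the contingency table of R and C computed on the objects that satisfy $\gamma$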
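A minimal Python sketch of what the contingency table of R and C under a condition γ contains, assuming the data matrix is a list of dicts mapping attribute names to integer categories and γ is given as a predicate on one object. The function name `contingency_table`, the attributes `A1` to `A3` and the data are hypothetical illustrations, not KL-Miner's implementation (which works with bit strings, as described under implementation principles).

```python
from typing import Callable, Sequence

# One object (row) of the data matrix M: attribute name -> category.
Row = dict[str, int]

def contingency_table(
    rows: Sequence[Row],
    r_attr: str, r_cats: Sequence[int],
    c_attr: str, c_cats: Sequence[int],
    gamma: Callable[[Row], bool],
) -> list[list[int]]:
    """n[k][l] = number of objects satisfying gamma with R = r_k and C = c_l."""
    r_index = {cat: k for k, cat in enumerate(r_cats)}
    c_index = {cat: l for l, cat in enumerate(c_cats)}
    table = [[0] * len(c_cats) for _ in r_cats]
    for row in rows:
        if gamma(row):  # only objects satisfying the condition are counted
            table[r_index[row[r_attr]]][c_index[row[c_attr]]] += 1
    return table

# Hypothetical data matrix with five objects.
M = [
    {"A1": 1, "A2": 1, "A3": 1},
    {"A1": 1, "A2": 2, "A3": 1},
    {"A1": 2, "A2": 2, "A3": 0},
    {"A1": 2, "A2": 3, "A3": 1},
    {"A1": 3, "A2": 3, "A3": 1},
]
# R = A1, C = A2, gamma: A3 = 1
table = contingency_table(M, "A1", [1, 2, 3], "A2", [1, 2, 3], lambda o: o["A3"] == 1)
print(table)  # [[1, 1, 0], [0, 0, 1], [0, 0, 1]]
```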
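KL-quantifiers
Contingency table of R and C ($n_{k,l}$ = number of objects satisfying $\gamma$ with $R = r_k$ and $C = c_l$):
$$
\begin{array}{c|ccc}
 & c_1 & \cdots & c_L \\ \hline
r_1 & n_{1,1} & \cdots & n_{1,L} \\
\vdots & \vdots & \ddots & \vdots \\
r_K & n_{K,1} & \cdots & n_{K,L}
\end{array}
$$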
Examples of quantifiers:
Simple aggregate function:
Kendall’s quantifier: e.g. $|\tau_b| \ge p$
Kendall’s quantifier
$\tau_b \in \langle -1; 1 \rangle$
$\tau_b > 0$ … positive ordinal dependence
$\tau_b < 0$ … negative ordinal dependence
$\tau_b = 0$ … ordinal independence
$|\tau_b| = 1$ … C is a function of R
Kendall’s quantifier: e.g. $|\tau_b| \ge p$ or $|\tau_b| \le p$
Kendall’s coefficient $\tau_b$ is computed from the frequencies of the contingency table of R and C.
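For reference, one standard way to write Kendall's $\tau_b$ in terms of the frequencies $n_{k,l}$ of the $K \times L$ contingency table, assuming the categories are ordered $r_1 < \dots < r_K$ and $c_1 < \dots < c_L$. The symbols $P$, $Q$, $n_{k,*}$, $n_{*,l}$ are introduced here and may differ from the notation on the original slide:

```latex
\tau_b = \frac{2(P - Q)}{\sqrt{\left(n^2 - \sum_{k} n_{k,*}^2\right)\left(n^2 - \sum_{l} n_{*,l}^2\right)}}
\qquad\text{where}\qquad
\begin{aligned}
P &= \sum_{k,l} n_{k,l} \sum_{i > k} \sum_{j > l} n_{i,j} && \text{(concordant pairs)}\\
Q &= \sum_{k,l} n_{k,l} \sum_{i > k} \sum_{j < l} n_{i,j} && \text{(discordant pairs)}\\
n_{k,*} &= \sum_{l} n_{k,l}, \quad n_{*,l} = \sum_{k} n_{k,l}, \quad n = \sum_{k,l} n_{k,l}
\end{aligned}
```

The factor 2 appears because the usual pair counts $\binom{n}{2} - \sum_k \binom{n_{k,*}}{2}$ simplify to $\tfrac{1}{2}\left(n^2 - \sum_k n_{k,*}^2\right)$.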
KL-Miner application example: STULONG project, 1419 patients, entry examination
See http://euromise.vse.cz
STULONG attributes examples (1)
Systolic blood pressure
Smoking
Group of patients
STULONG attributes examples (2)
Skinfold above musculus triceps (mm)
Beer – amount / day
219 attributes total
38 ordinal attributes
We use 17 ordinal attributes
Example – analytic question
Are there any ordinal dependencies among attributes under some conditions?
at least 50 patients
$|\tau_b| \ge 0.75$
relevant conditions:
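The requirements of the analytic question can be read as one quantifier: the table covers at least 50 patients and $|\tau_b| \ge 0.75$. A minimal Python sketch, assuming the table is a list of rows of frequencies with ordered categories; `kendall_tau_b` and `kendall_quantifier` are hypothetical names, and the direct nested loop favours readability over speed.

```python
import math
from typing import Sequence

def kendall_tau_b(n: Sequence[Sequence[int]]) -> float:
    """Kendall's tau-b of a K x L contingency table with ordered rows and columns."""
    K, L = len(n), len(n[0])
    total = sum(sum(row) for row in n)
    concordant = discordant = 0
    for k in range(K):
        for l in range(L):
            for i in range(k + 1, K):  # pairs with a higher row category
                concordant += n[k][l] * sum(n[i][j] for j in range(l + 1, L))
                discordant += n[k][l] * sum(n[i][j] for j in range(l))
    row_sq = sum(sum(row) ** 2 for row in n)
    col_sq = sum(sum(n[k][l] for k in range(K)) ** 2 for l in range(L))
    denom = math.sqrt((total ** 2 - row_sq) * (total ** 2 - col_sq))
    if denom == 0:
        raise ValueError("tau-b is undefined when R or C takes a single value")
    return 2 * (concordant - discordant) / denom

def kendall_quantifier(n: Sequence[Sequence[int]], min_count: int = 50, p: float = 0.75) -> bool:
    """Hypothetical quantifier: at least min_count objects and |tau_b| >= p."""
    total = sum(sum(row) for row in n)
    return total >= min_count and abs(kendall_tau_b(n)) >= p

print(kendall_tau_b([[2, 1, 0], [1, 2, 1], [0, 1, 2]]))  # about 0.606 (= 20/33)
print(kendall_quantifier([[30, 0], [0, 30]]))            # True
```

The second call also illustrates the property $|\tau_b| = 1$: in a diagonal table C is a function of R.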
Example – relevant condition specification (1)
Group of patients (normal), Group of patients (risk), …
Beer 10(yes), Beer 12(yes), …, Beer 10(yes) ∧ Beer 12(yes)
Sliding windows …
Example – relevant condition specification (2)
Sliding window (figure: a window shown at successive positions along the ordered categories 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, ….., 43, 44, 45, 46, 47, 48, 49, 50)
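A sketch of how sliding-window conditions could be enumerated, assuming a window is a run of `length` consecutive categories of one ordinal attribute and each window position yields one Boolean attribute "value lies in the window". The window length, the attribute name `A` and both helper names are hypothetical; the slides do not state KL-Miner's exact settings.

```python
from typing import Callable, Iterator, Sequence

def sliding_windows(categories: Sequence[int], length: int) -> Iterator[tuple[int, ...]]:
    """Yield each run of `length` consecutive categories, left to right."""
    for start in range(len(categories) - length + 1):
        yield tuple(categories[start:start + length])

def as_condition(attribute: str, window: tuple[int, ...]) -> Callable[[dict[str, int]], bool]:
    """Boolean attribute attribute(window): true for objects whose value falls in the window."""
    allowed = frozenset(window)
    return lambda row: row[attribute] in allowed

categories = [4, 5, 6, 7, 9, 10]   # a prefix of the ordered categories in the figure
windows = list(sliding_windows(categories, 3))
print(windows)  # [(4, 5, 6), (5, 6, 7), (6, 7, 9), (7, 9, 10)]
print(as_condition("A", windows[0])({"A": 5}))  # True
```

Windows run over the categories that actually occur, so a gap in the order (here 8) does not break a window.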
Example – output overview
Running time: 2 min 1 sec (PC with 3.06 GHz, 512 MB DDR SDRAM)
550 310 verifications
25 hypotheses
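Plain arithmetic on the reported figures gives the average throughput; it averages over the whole run and includes any overhead beyond the verifications themselves.

```python
# Figures from the output overview of the example run.
verifications = 550_310
seconds = 2 * 60 + 1  # 2 min 1 sec

rate = verifications / seconds                    # verifications per second
per_verification_ms = 1000 * seconds / verifications
print(f"{rate:.0f} verifications/s, {per_verification_ms:.2f} ms each")
# 4548 verifications/s, 0.22 ms each
```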
Example – output detail (1)
$\tau_b = 0.82$ (i.e. strong positive ordinal dependence)
Example – output detail (2)
$\tau_b = 0.78$ (i.e. strong positive ordinal dependence)
Implementation principles (1)
     Attributes             Cards of categories of A1
M    A1   A2   …   AP       A1[1]   A1[2]   A1[3]
o1   2    12   …   1        0       1       0
o2   1    5    …   4        1       0       0
…    …    …    …   …        …       …       …
on   3    9    …   2        0       0       1
Attributes are represented by cards of categories, i.e. strings of bits: the card A1[k] has a 1 exactly for the objects whose value of A1 is k.
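A minimal sketch of building cards of categories, assuming a Python `int` serves as an arbitrary-length bit string with bit i standing for object o_(i+1). This representation and the helper names are illustrative assumptions, not KL-Miner's internal data structure.

```python
from typing import Sequence

def cards_of_categories(column: Sequence[int], categories: Sequence[int]) -> dict[int, int]:
    """Map each category k to its card: bit i is 1 iff object o_(i+1) has value k."""
    cards = {k: 0 for k in categories}
    for i, value in enumerate(column):
        cards[value] |= 1 << i  # set the bit of this object in its category's card
    return cards

def as_bits(card: int, n_objects: int) -> str:
    """Render a card as one 0/1 character per object, o1 first."""
    return "".join("1" if (card >> i) & 1 else "0" for i in range(n_objects))

# Column A1 of the table above, treating o1, o2, on as three objects.
cards = cards_of_categories([2, 1, 3], categories=[1, 2, 3])
for k in (1, 2, 3):
    print(f"A1[{k}]", as_bits(cards[k], 3))
# A1[1] 010
# A1[2] 100
# A1[3] 001
```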
Implementation principles (2)
$\mathrm{CARD}[\gamma]$ = bit-string representation of the Boolean attribute $\gamma$
CARD[Group of patients (normal) ∧ Beer 10(yes) ∧ Beer 12(yes)]
= Group of patients[normal] ∧ Beer 10[yes] ∧ Beer 12[yes] (bitwise AND of the cards of categories)
Count(·) – number of 1s in a bit string
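The card of a conjunction is the bitwise AND of the cards of its parts, and Count is a population count. A sketch with hypothetical cards over eight objects (bit i = object o_(i+1), so o1 is the rightmost bit of each literal); the variable names and data are illustrative only.

```python
def count(card: int) -> int:
    """Count(): number of 1 bits in the card."""
    return bin(card).count("1")

# Hypothetical cards of categories for 8 objects.
group_normal = 0b1011_0111   # Group of patients[normal]
beer10_yes   = 0b0011_1101   # Beer 10[yes]
beer12_yes   = 0b1001_0101   # Beer 12[yes]

# CARD[Group of patients(normal) ∧ Beer 10(yes) ∧ Beer 12(yes)]
gamma = group_normal & beer10_yes & beer12_yes
print(bin(gamma), count(gamma))  # 0b10101 3
```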
Implementation principles (3)
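$n_{1,1} = \mathrm{Count}(R[r_1] \wedge C[c_1] \wedge \mathrm{CARD}[\gamma])$, and in general $n_{k,l} = \mathrm{Count}(R[r_k] \wedge C[c_l] \wedge \mathrm{CARD}[\gamma])$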
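Applying this formula to every cell gives the whole contingency table from cards alone. A sketch assuming Python `int` bit strings (bit i = object o_(i+1)) and hypothetical data for six objects; it computes $R[r_k] \wedge \mathrm{CARD}[\gamma]$ once per row and reuses it, an optimisation chosen here for illustration, not something the slides state about KL-Miner.

```python
from typing import Sequence

def count(card: int) -> int:
    """Count(): number of 1 bits in the card."""
    return bin(card).count("1")

def contingency_table_from_cards(
    r_cards: Sequence[int], c_cards: Sequence[int], gamma_card: int
) -> list[list[int]]:
    """n[k][l] = Count(R[r_k] AND C[c_l] AND CARD[gamma])."""
    table = []
    for rk in r_cards:
        r_and_gamma = rk & gamma_card  # reused for every column category
        table.append([count(r_and_gamma & cl) for cl in c_cards])
    return table

# Hypothetical: R = [1, 1, 2, 2, 3, 3], C = [1, 2, 2, 2, 3, 3], gamma false only for o4.
r_cards = [0b000011, 0b001100, 0b110000]
c_cards = [0b000001, 0b001110, 0b110000]
gamma_card = 0b110111
print(contingency_table_from_cards(r_cards, c_cards, gamma_card))
# [[1, 1, 0], [0, 1, 0], [0, 0, 2]]
```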
Scalability
75 000 verifications
approximately linear
Concluding remarks
KL-Miner gives practically interesting results
Suitable for interactive work
Further quantifiers
Combinations with further mining procedures