Meta-Association Rules for Fusing Regular Association Rules from Different Databases
M.D. Ruiz, J. Gomez-Romero, M.J. Martin-Bautista, D. Sanchez, M. Delgado
9th July 2014
Motivation
- Exponential growth of available data in the Data Mining area.
- Datasets are often distributed.
- Datasets are processed separately (several mining processes are carried out over data with similar meaning coming from different sources).
⇒ The extracted information should be fused in order to provide the user with a unified, non-overwhelming view.
Several problems arise when using association rule algorithms in distributed databases:
1. Obtaining rules from very large datasets can be difficult and time-consuming.
   • Parallel versions of rule mining algorithms, e.g. MapReduce.
2. Handling distributed databases with similar meaning but different descriptions, which cannot be merged directly.
Solution:
Data Mining + Information Fusion
Overview
1. Example in Crime Data Analysis
2. Proposal
   • Brief Introduction to Association Rules
   • Meta-Association Rules
3. Algorithm and Implementation Issues
4. Experimental Evaluation
5. Discussion and Future Research
Example in Crime Data Analysis
- We want to study the crime incidents that happened in the city of Chicago.
- Each district of Chicago has its own dataset, D1, D2, …, Dk, and some of them share some of their attributes.
- Association rule mining algorithms are executed separately in each district, yielding different sets of rules R1, R2, …, Rk.
- There are several additional attributes describing aspects of the districts: at1, at2, …, atm.
Proposal:
Fusing this information by means of Meta-Association Rules
Proposal
[Figure: the rule sets R1, R2, …, Rk−1, Rk are compiled into a meta-database (rules r1, r2, …, rn | additional attributes at1, …, atm), from which meta-association rules are mined.]
Brief Introduction to Association Rules
- Data is usually stored in datasets D composed of transactions ti (rows) and attributes (columns).
- We call an item a pair ⟨attribute, value⟩ or ⟨attribute, interval⟩.
| D | i1 | i2 | … | ij | ij+1 | … | im |
|---|---|---|---|---|---|---|---|
| t1 | 1 | 0 | … | 0 | 1 | … | 0 |
| t2 | 0 | 1 | … | 1 | 1 | … | 1 |
| ⋮ | ⋮ | ⋮ | | ⋮ | ⋮ | | ⋮ |
| tn | 1 | 1 | … | 0 | 1 | … | 1 |
- Association rules are expressions of the form A → B, where A and B are non-empty, disjoint sets of items.
- An association rule represents a relation given by the joint co-occurrence of A and B.
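To make the boolean representation concrete, here is a minimal Python sketch that turns attribute/value records into such a table. It assumes every attribute is already discrete (⟨attribute, interval⟩ items would need a prior discretization step); `to_boolean_db` is a hypothetical helper, not the authors' code, and the records are toy values using two of the crime-data attributes.

```python
def to_boolean_db(records):
    """Build a boolean transaction table from attribute -> value records.

    Each distinct (attribute, value) pair becomes one item (column);
    cell [t][j] is 1 when transaction t contains item j.
    """
    # Collect every distinct item; key=str keeps sorting safe for mixed value types
    items = sorted({(attr, val) for rec in records for attr, val in rec.items()}, key=str)
    matrix = [[1 if attr in rec and rec[attr] == val else 0 for attr, val in items]
              for rec in records]
    return items, matrix


# Toy records (illustrative values, not taken from the experiments)
records = [
    {"Day-Period": "night", "Arrest": "false"},
    {"Day-Period": "morning", "Arrest": "false"},
]
items, matrix = to_boolean_db(records)
print(items)   # [('Arrest', 'false'), ('Day-Period', 'morning'), ('Day-Period', 'night')]
print(matrix)  # [[1, 0, 1], [1, 1, 0]]
```

Each attribute contributes one column per observed value, so an attribute such as Day-Period yields as many items as it has distinct values.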
- The support of an itemset A is defined as the probability that a transaction contains A:
$$\mathrm{supp}(A) = \frac{|\{t \in D : A \subseteq t\}|}{|D|}$$
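- To assess the validity of association rules, the most common measures are support (the joint probability P(A ∪ B), i.e. the probability that a transaction contains both A and B) and confidence (the conditional probability P(B | A)):
$$\mathrm{Supp}(A \to B) = \mathrm{supp}(A \cup B), \qquad \mathrm{Conf}(A \to B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)}$$
These must be ≥ minsupp and ≥ minconf, respectively (thresholds imposed by the user); a rule that meets both is frequent and confident.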
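The support and confidence measures can be computed directly from their definitions. A minimal Python sketch, assuming transactions are Python sets of items; the function names and the toy database are illustrative only:

```python
def supp(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def rule_supp(a, b, db):
    """Supp(A -> B) = supp(A ∪ B)."""
    return supp(a | b, db)


def rule_conf(a, b, db):
    """Conf(A -> B) = supp(A ∪ B) / supp(A); assumes supp(A) > 0."""
    return supp(a | b, db) / supp(a, db)


def frequent_and_confident(a, b, db, minsupp, minconf):
    """True when the rule passes both user thresholds."""
    return rule_supp(a, b, db) >= minsupp and rule_conf(a, b, db) >= minconf


db = [{"x", "y"}, {"x"}, {"x", "y"}, {"z"}]
print(rule_supp({"x"}, {"y"}, db))  # 0.5
print(rule_conf({"x"}, {"y"}, db))  # ≈ 0.667
print(frequent_and_confident({"x"}, {"y"}, db, minsupp=0.4, minconf=0.6))  # True
```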
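- An alternative framework is to measure the accuracy by means of the certainty factor, CF(A → B):
$$
\mathrm{CF}(A \to B) =
\begin{cases}
\dfrac{\mathrm{Conf}(A \to B) - \mathrm{supp}(B)}{1 - \mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \to B) > \mathrm{supp}(B) \\
\dfrac{\mathrm{Conf}(A \to B) - \mathrm{supp}(B)}{\mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \to B) < \mathrm{supp}(B) \\
0 & \text{otherwise.}
\end{cases}
$$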
- CF measures how our belief that B is in a transaction changes when we are told that A is in that transaction.
- The certainty factor has better properties than confidence and other quality measures; in particular, it helps reduce the number of rules obtained by filtering out those corresponding to statistical independence or negative dependence.
- When CF(A → B) ≥ minCF, the rule is called certain.
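A direct Python transcription of the three-case CF definition, offered as a sketch; `certainty_factor` and `is_certain` are hypothetical names, and the inputs are plain probabilities rather than values from a real dataset:

```python
def certainty_factor(conf, supp_b):
    """CF(A -> B) from Conf(A -> B) and supp(B), following the three cases."""
    if conf > supp_b:   # positive dependence; here supp_b < 1, so no division by zero
        return (conf - supp_b) / (1 - supp_b)
    if conf < supp_b:   # negative dependence; here supp_b > 0
        return (conf - supp_b) / supp_b
    return 0.0          # statistical independence


def is_certain(conf, supp_b, min_cf):
    """True when the rule reaches the user threshold minCF."""
    return certainty_factor(conf, supp_b) >= min_cf


print(certainty_factor(0.8, 0.5))    # ≈ 0.6
print(certainty_factor(0.25, 0.5))   # -0.5
print(certainty_factor(0.5, 0.5))    # 0.0
print(certainty_factor(0.9, 0.95))   # ≈ -0.053
```

The last call illustrates the filtering effect described above: a confidence of 0.9 looks high, but because it is below supp(B) = 0.95 the rule has a negative CF and fails any positive minCF.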
Meta-Association Rules
Meta-association rules are association rules whose antecedent or consequent can contain regular rules that have previously been extracted with high reliability in a high percentage of the source databases.
Algorithm and Implementation Issues
1. From each database a set of rules Ri is obtained.
2. We compile these rules into a new database D, together with the attributes at1, …, atm.
| D | r1 | r2 | ⋯ | rn | at1 | ⋯ | atm |
|---|---|---|---|---|---|---|---|
| D1 | 1 | 1 | ⋯ | 0 | 1 | ⋯ | 1 |
| D2 | 0 | 1 | ⋯ | 0 | 0 | ⋯ | 1 |
| ⋮ | ⋮ | ⋮ | | ⋮ | ⋮ | | ⋮ |
| Dk | 1 | 0 | ⋯ | 1 | 1 | ⋯ | 0 |
3. This information is fused by finding meta-association rules (involving the previously extracted rules r1, …, rn and the attributes at1, …, atm).
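Step 2 can be sketched in Python as follows, assuming each Ri is a set of rules encoded as strings and each district's attributes are written as `name=value` strings. `build_meta_database` and the toy data are hypothetical, chosen only to mirror the layout of the table above:

```python
def build_meta_database(rule_sets, district_attrs):
    """One row per source database Di: a column is 1 if that rule was mined
    from Di (or that attribute item describes Di), else 0."""
    rules = sorted(set().union(*rule_sets))        # r1, ..., rn: all distinct rules
    attrs = sorted(set().union(*district_attrs))   # at1, ..., atm
    header = rules + attrs
    rows = [[1 if col in r_i or col in a_i else 0 for col in header]
            for r_i, a_i in zip(rule_sets, district_attrs)]
    return header, rows


# Toy input for three districts (illustrative, not experimental data)
rule_sets = [
    {"Location-Description=RESIDENCE -> Arrest=false"},
    {"Location-Description=STREET -> Domestic=false"},
    {"Location-Description=RESIDENCE -> Arrest=false",
     "Location-Description=STREET -> Domestic=false"},
]
district_attrs = [{"Safety-Index=High"}, {"Safety-Index=Medium"}, {"Safety-Index=High"}]
header, rows = build_meta_database(rule_sets, district_attrs)
print(rows)  # [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 0]]
```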
Meta-Association Rules
Formally, we will obtain three types of meta-association rules:
- ri → rj, where ri and rj can be rules or conjunctions of rules, e.g. ri = ri1 ∧ · · · ∧ ris.
- ati → atj, where ati and atj can be attributes or conjunctions of attributes.
- ri → atj or atj → ri, where ri and atj can be a conjunction of rules and a conjunction of attributes, respectively, and the two kinds can be mixed.
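One way to represent these three types, sketched in Python under the assumption that meta-database items are either `Rule` objects (regular rules) or attribute strings. The `Rule` class and `meta_rule_type` function are hypothetical, and type 3 is read here as any meta-rule whose items include both kinds:

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes Rule hashable, so it can sit inside sets
class Rule:
    """A regular association rule used as an item of the meta-database."""
    antecedent: frozenset
    consequent: frozenset


def meta_rule_type(antecedent, consequent):
    """Return 1 (rules -> rules), 2 (attributes -> attributes) or 3 (mixed).
    Assumes both sides are non-empty."""
    items = set(antecedent) | set(consequent)
    n_rules = sum(isinstance(i, Rule) for i in items)
    if n_rules == len(items):
        return 1
    if n_rules == 0:
        return 2
    return 3


# The first meta-rule reported in the experimental evaluation: two rules -> one attribute
r_minor = Rule(frozenset({"Crime-Description=$500 under"}), frozenset({"Arrest=false"}))
r_res = Rule(frozenset({"Location-Description=RESIDENCE"}), frozenset({"Arrest=false"}))
print(meta_rule_type({r_minor, r_res}, {"Safety-Index=High"}))  # 3
```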
Meta-Association Rule Mining Algorithm
Input: D1, …, Dk, minsupp, minCF
Output: MR (set of meta-association rules)
1: for all Di such that 1 ≤ i ≤ k do
2:   # Di preprocessing
3:   Read Di and store the items I
4:   Transform Di into a boolean database
5:   Store the database into a vector of BitSets
6:   # Mine very strong rules
7:   Compute the candidate set C of frequent itemsets, Supp(X) ≥ minsupp
8:   Store the BitSet vector indexes of X ∈ C and Supp(X)
9:   Compose the rules X ⇒ Y with X, Y ∈ C
10:  if Supp(X ⇒ Y) ≥ minsupp and CF(X ⇒ Y) ≥ minCF then
11:    The rule is a very strong rule
12:  end if
13: end for
14: # D creation
15: Compile all different rules from R1, …, Rk
16: Create D using the compiled rules and the additional attributes
17: # Mining meta-association rules
18: Repeat steps 1-13 to mine meta-association rules from D
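The pseudocode can be sketched in runnable Python as follows. This is an illustrative reconstruction, not the authors' implementation: Python integers stand in for the BitSets, frequent itemsets are found with a simple level-wise (Apriori-style) search capped at `max_len` items, the same minsupp and minCF are reused at the meta level as in the listing, and all function names and toy data are hypothetical.

```python
from itertools import combinations


def certainty_factor(conf, supp_y):
    if conf > supp_y:
        return (conf - supp_y) / (1 - supp_y)
    if conf < supp_y:
        return (conf - supp_y) / supp_y
    return 0.0


def mine_strong_rules(transactions, minsupp, min_cf, max_len=3):
    """Steps 2-12 for one non-empty database: rules X => Y with
    Supp >= minsupp and CF >= min_cf. Items must be hashable."""
    n = len(transactions)
    # Steps 3-5: one int per item acts as the BitSet of rows containing it
    bits = {}
    for row, items in enumerate(transactions):
        for item in items:
            bits[item] = bits.get(item, 0) | (1 << row)
    all_rows = (1 << n) - 1

    def supp(itemset):
        mask = all_rows
        for item in itemset:
            mask &= bits[item]  # AND of BitSets = rows containing the whole itemset
        return bin(mask).count("1") / n

    # Steps 7-8: level-wise frequent itemsets with their supports
    frequent = {}
    level = {frozenset([item]) for item in bits}
    size = 1
    while level and size <= max_len:
        kept = {x: supp(x) for x in level}
        kept = {x: s for x, s in kept.items() if s >= minsupp}
        frequent.update(kept)
        size += 1
        level = {a | b for a in kept for b in kept if len(a | b) == size}

    # Steps 9-12: split each frequent itemset Z into X => Y with Y = Z - X
    rules = set()
    for z, supp_z in frequent.items():  # Supp(X => Y) = supp(Z) >= minsupp already
        for r in range(1, len(z)):
            for x in map(frozenset, combinations(z, r)):
                y = z - x
                if certainty_factor(supp_z / supp(x), supp(y)) >= min_cf:
                    rules.add((x, y))
    return rules


def mine_meta_rules(databases, district_attrs, minsupp, min_cf):
    """Steps 1-18: mine every Di, build the meta-database, mine it again."""
    rule_sets = [mine_strong_rules(d, minsupp, min_cf) for d in databases]
    # Steps 15-16: each row holds the rules found in Di plus Di's attribute items
    meta_db = [set(r) | set(a) for r, a in zip(rule_sets, district_attrs)]
    return mine_strong_rules(meta_db, minsupp, min_cf)  # step 18


# Toy usage with two tiny databases (illustrative only)
d1 = [{"a", "b"}, {"a", "b"}, {"c"}]
d2 = [{"a", "b"}, {"c"}, {"c"}]
meta = mine_meta_rules([d1, d2], [{"Safety=High"}, {"Safety=Low"}], 0.5, 0.5)
rule_ab = (frozenset({"a"}), frozenset({"b"}))
print((frozenset({rule_ab}), frozenset({"Safety=High"})) in meta)  # True
```

Because a mined rule is stored as a hashable pair of frozensets, it can be used directly as an item of the meta-database, which lets step 18 reuse the same miner unchanged.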
Experimental Evaluation: DataSet Description
- 22 databases about crime in the districts of the city of Chicago.
- Number of transactions: min = 5694 and max = 22493.
- 6 types of attributes (around 300 items) in each database:
  • Quarter of the year in which the incident happened.
  • Day period: morning, afternoon, evening, night.
  • Crime description according to standard police protocols.
  • Location description: street, residence, etc.
  • Arrest: whether an arrest is associated with the crime.
  • Domestic: whether the crime happened in a domestic environment.
- Additional attributes about the districts:
  • Number of students in the district: low, medium, high, very high.
  • Number of misconducts reported in the district: very low, low, medium, high, very high.
  • Perceived safety index, obtained by means of surveys: low, medium, high.
Experimental Evaluation: Some Results
Example of obtained meta-association rule:
“IF (Crime-Description=$500 under → Arrest=false)
AND (Location-Description=RESIDENCE → Arrest=false)
THEN Safety-Index=High”
with Supp = 0.136 and CF = 1.
That means that a high perception of safety is frequent when there are crimes of minor relevance without arrests in residential areas.
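A quick consistency check on the reported figures, assuming the meta-database has one row per district (k = 22, as in the dataset description) and that CF follows the three-case definition given earlier, with B the consequent Safety-Index=High:

$$
\mathrm{Supp} = 0.136 \approx \frac{3}{22} = 0.1363\ldots,
\qquad
\mathrm{CF} = \frac{\mathrm{Conf} - \mathrm{supp}(B)}{1 - \mathrm{supp}(B)} = 1 \iff \mathrm{Conf} = 1 \quad (\mathrm{supp}(B) < 1)
$$

Only the first branch of the CF definition can reach 1, so under these assumptions the two rules and a high safety index co-occur in about 3 of the 22 districts, and no district contains both rules without a high safety index.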
Another example of obtained meta-association rule:
“IF Safety-Index=Medium
THEN (Location-Description=STREET → Domestic=false)
AND Number-of-Students=Very High”
with Supp = 0.136 and CF = 0.511.
Interpretation: in some districts (13.6%), a higher safety perception (medium) is frequently associated with non-domestic crimes happening in the streets and with a very high number of students in the district.
Discussion and Future Research
We have identified several problems or deficiencies of our approach that can be improved.
- We have only taken into account the presence/absence of a rule in D.
  • It would be convenient to consider the degree of importance of the rule.
  Future: improvement taking into account fuzzy association rules.
- The databases considered have the same structure.
  • It would be convenient to address the problem of having datasets with different structures or different attribute descriptions but very similar meaning.
  Future: using a knowledge repository to assist the algorithm in matching items with the same meaning.
Thank you. Any questions?