elnaz delpisheh york university department of computer science and engineering april 13, 2015...

Post on 14-Jan-2016


Elnaz Delpisheh

York University

Department of Computer Science and Engineering

April 21, 2023

Identifying Interesting Association Rules with Genetic Algorithms

Data mining

[Diagram labels: Data, Data Mining, Association rules, Too much data]

• I = {i1, i2, ..., in} is a set of items.
• D = {t1, t2, ..., tn} is a transactional database.
• Each ti is a nonempty subset of I.
• An association rule is of the form A → B, where A and B are itemsets, A ⊂ I, B ⊂ I, and A ∩ B = ∅.
• The Apriori algorithm is the one most often used for association rule mining.
• Example: {milk, eggs} → {bread}.

Apriori Algorithm

TID     List of item IDs
T100    I1, I2, I3
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3
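The level-wise search that Apriori performs on the nine transactions above can be sketched as follows. This is a minimal illustration, not the slides' own implementation; the function name `apriori` and the minimum support count of 2 are assumptions chosen for the example.

```python
from itertools import combinations

# Transactions copied from the table above (TID -> item IDs).
TRANSACTIONS = {
    "T100": {"I1", "I2", "I3"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
    "T700": {"I1", "I3"},
    "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}


def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets found in >= min_count transactions."""
    rows = [frozenset(t) for t in transactions]
    items = sorted({i for t in rows for i in t})
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the threshold.
        counts = {c: sum(1 for t in rows if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every k-item subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


# min_count=2 is an assumed threshold, not a value taken from the slides.
frequent = apriori(TRANSACTIONS.values(), min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
```

With this threshold, I5 is dropped at the first level, and candidates such as {I1, I2, I4} are pruned without being counted because {I1, I4} is not frequent.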

Apriori Algorithm (Cont.)

Association rule mining

[Diagram labels: Too many association rules, Data, Data Mining, Association rules, Too much data]

Interestingness criteria

• Comprehensibility
• Conciseness
• Diversity
• Generality
• Novelty
• Utility
• ...

Interestingness measures

Subjective measures
• Data and the user’s prior knowledge are considered.
• Comprehensibility, novelty, surprisingness, utility.

Objective measures
• The structure of an association rule is considered.
• Conciseness, diversity, generality, peculiarity.
• Example: Support
  o It represents the generality of a rule.
  o It counts the number of transactions containing both A and B.
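As a small illustration of the support measure, the sketch below counts the transactions that contain both A and B. The function name `support_count` and the five baskets are hypothetical and only serve the example.

```python
def support_count(transactions, antecedent, consequent):
    """Number of transactions that contain every item of A and of B."""
    needed = set(antecedent) | set(consequent)
    return sum(1 for t in transactions if needed <= set(t))


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"milk", "eggs", "bread"},
    {"milk", "bread"},
    {"eggs", "bread", "butter"},
    {"milk", "eggs", "bread", "butter"},
    {"milk", "eggs"},
]

count = support_count(baskets, {"milk", "eggs"}, {"bread"})
relative = count / len(baskets)  # support as a fraction of the database
print(count, relative)
```

Dividing the count by the number of transactions gives the relative form of support, which is the form usually compared against a threshold.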

Drawbacks of objective measures
• Database-dependence
  o Lack of knowledge about the database
• Threshold dependence

Solution
• Multiple database reanalysis

Problem
• Large number of disk I/O operations

Database-independence

Genetic algorithm-based learning (ARMGA)
1. Initialize population
2. Evaluate individuals in population
3. Repeat until a stopping criterion is met
   A. Select individuals from the current population
   B. Recombine them to obtain more individuals
   C. Evaluate new individuals
   D. Replace some or all the individuals of the current population by offspring
4. Return the best individual seen so far
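The four steps above can be sketched as a generic loop. This is not ARMGA itself: the truncation selection, the fixed generation budget, the bit-counting toy fitness, and every name in the block (`genetic_search`, `random_bits`, `one_point`) are assumptions made so the example runs on its own.

```python
import random


def genetic_search(fitness, random_individual, recombine,
                   pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    # 1. Initialize population; 2. evaluate individuals.
    population = [random_individual(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    # 3. Repeat until the stopping criterion (a fixed generation budget here).
    for _ in range(generations):
        # A. Select: keep the fitter half (truncation selection, an assumption).
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        # B. Recombine parents to obtain more individuals.
        children = [recombine(rng.choice(parents), rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        # C. Evaluate new individuals; D. replace part of the population.
        population = parents + children
        best = max(population + [best], key=fitness)
    # 4. Return the best individual seen so far.
    return best


# Toy problem (hypothetical): maximise the number of 1-bits in a 16-bit list.
def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(16)]


def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    # Flip one random bit so the search keeps exploring.
    i = rng.randrange(len(child))
    child[i] ^= 1
    return child


best = genetic_search(sum, random_bits, one_point)
print(best, sum(best))
```

Because `best` is carried across generations, the returned individual is never worse than the best member of the initial population.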

ARMGA Modeling
• Given an association rule X → Y
• Requirement: Conf(X → Y) > Supp(Y)
• Aim is to maximise
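The requirement Conf(X → Y) > Supp(Y) can be checked directly on the nine-transaction example from the Apriori slide. The sketch below assumes relative support (a fraction of the database) and the usual definition Conf(X → Y) = Supp(X ∪ Y) / Supp(X); the function names are illustrative, and the quantity to be maximised is not reproduced here.

```python
# The nine transactions from the Apriori example (T100 to T900).
TRANSACTIONS = [
    {"I1", "I2", "I3"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, x, y):
    """Conf(X -> Y) = Supp(X ∪ Y) / Supp(X); 0.0 if X never occurs."""
    supp_x = support(transactions, x)
    if supp_x == 0:
        return 0.0
    return support(transactions, set(x) | set(y)) / supp_x


def meets_requirement(transactions, x, y):
    """True when the rule X -> Y satisfies Conf(X -> Y) > Supp(Y)."""
    return confidence(transactions, x, y) > support(transactions, y)


print(meets_requirement(TRANSACTIONS, {"I1"}, {"I3"}))  # conf 5/6 vs supp 7/9
print(meets_requirement(TRANSACTIONS, {"I2"}, {"I3"}))  # conf 5/7 vs supp 7/9
```

The rule {I1} → {I3} passes because knowing I1 raises the chance of seeing I3 above its baseline frequency, while {I2} → {I3} fails because it lowers it.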

ARMGA Encoding
• Michigan strategy
• Given an association k-rule X → Y, where X, Y ⊂ I, I = {i1, i2, ..., in} is a set of items, and X ∩ Y = ∅.
• For example: {A1, ..., Aj} → {Aj+1, ..., Ak}

ARMGA Encoding (Cont.)

The aforementioned encoding depends heavily on the length of the chromosome.

We use another type of encoding:
• Given a set of items {A, B, C, D, E, F}
• Association rule ACF → B is encoded as follows:

00A11B00C01D11E00F

• 00: Item is in the antecedent
• 11: Item is in the consequent
• 01/10: Item is absent
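A small encoder and decoder for this two-bit scheme might look like the sketch below. It assumes single-character item labels and always writes absent items as 01; the names `encode` and `decode` are illustrative. Read with the legend above, the string on the slide places E in the consequent as well, so the example encodes ACF → B itself instead of reusing that string.

```python
ITEMS = ["A", "B", "C", "D", "E", "F"]
ANTECEDENT, CONSEQUENT, ABSENT = "00", "11", "01"


def encode(antecedent, consequent, items=ITEMS):
    """Build a chromosome string: a two-bit code followed by each item label."""
    genes = []
    for item in items:
        if item in antecedent:
            code = ANTECEDENT
        elif item in consequent:
            code = CONSEQUENT
        else:
            code = ABSENT  # "10" would carry the same meaning
        genes.append(code + item)
    return "".join(genes)


def decode(chromosome):
    """Recover (antecedent, consequent) from a chromosome string."""
    antecedent, consequent = set(), set()
    # Each gene is three characters: two code bits, then the item label.
    for i in range(0, len(chromosome), 3):
        code, item = chromosome[i:i + 2], chromosome[i + 2]
        if code == ANTECEDENT:
            antecedent.add(item)
        elif code == CONSEQUENT:
            consequent.add(item)
        # "01" and "10" both mean the item is absent from the rule.
    return antecedent, consequent


chromosome = encode({"A", "C", "F"}, {"B"})
print(chromosome)
print(decode(chromosome))
```

Every chromosome has one gene per item, so its length is fixed by the item set and does not change with the size of the rule.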

ARMGA Operators

• Select
• Crossover
• Mutation

ARMGA Operators-Select
• Select(c, ps): acts as a filter of the chromosome
  o c: chromosome
  o ps: pre-specified probability

ARMGA Operators-Crossover
• This operation uses a two-point strategy
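A generic two-point crossover can be sketched as follows: two cut points are drawn and the segment between them is exchanged between the parents. The details of ARMGA's own operator are not given here, so the function name, the list-of-genes representation, and the sample parents are assumptions.

```python
import random


def two_point_crossover(parent_a, parent_b, rng=random):
    """Swap the segment between two random cut points of equal-length parents."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    # Two distinct cut positions, in increasing order.
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


# Hypothetical parents: one two-bit gene per item, as in the encoding above.
rng = random.Random(1)
a = ["00", "11", "00", "01", "01", "00"]
b = ["01", "00", "11", "00", "10", "11"]
child_a, child_b = two_point_crossover(a, b, rng)
print(child_a)
print(child_b)
```

Each position in the children holds the same pair of genes as the parents, only possibly swapped, so no gene is lost or duplicated by the exchange.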

ARMGA Operators-Mutate

ARMGA Initialization

ARMGA Algorithm

Empirical studies and Evaluation
• Implement the entire procedure using Visual C++
• Use WEKA to produce interesting association rules
• Compare the results
