chapter 10 association rule

41
Chapter 10 ASSOCIATION RULE By: Aris D.(13406054) Ricky A.(13406058) Nadia FR. (13406069) Amirah K.(13406070) Paramita AW.(13406091) Bahana W.(13406102)

Upload: nadia-friza

Post on 18-Nov-2014


DESCRIPTION

Presentation Slides of Association Rule (Decision Support System)

TRANSCRIPT

Page 1: Chapter 10 Association Rule

Chapter 10

ASSOCIATION RULE

By:

Aris D.(13406054)

Ricky A.(13406058)

Nadia FR. (13406069)

Amirah K.(13406070)

Paramita AW.(13406091)

Bahana W.(13406102)

Page 2: Chapter 10 Association Rule

Introduction

• Affinity Analysis

Study of attributes or characteristics that “go together”.

• Market Basket Analysis

A method that uncovers rules for quantifying the relationship between two or more attributes. Rules take the form:

“If antecedent, then consequent”

Page 3: Chapter 10 Association Rule

Affinity Analysis & Market Basket Analysis

• Example: A supermarket may find that of the 1000 customers shopping on a Thursday night, 200 bought diapers, and of the 200 who bought diapers, 50 bought beer.

The association rule: “If buy diapers, then buy beer”, with support of 50/1000 = 5% and confidence of 50/200 = 25%.
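
The arithmetic in this example can be reproduced with a few lines of Python. This is only a minimal sketch; the function name `support_and_confidence` and its parameter names are hypothetical and chosen for illustration.

```python
def support_and_confidence(n_total, n_antecedent, n_both):
    """Return (support, confidence) for the rule 'if A, then B'.

    n_total      -- total number of transactions
    n_antecedent -- number of transactions containing A
    n_both       -- number of transactions containing both A and B
    """
    support = n_both / n_total          # P(A and B)
    confidence = n_both / n_antecedent  # P(B | A)
    return support, confidence


# Counts from the diapers-and-beer example: 1000 customers,
# 200 bought diapers, 50 of those also bought beer.
support, confidence = support_and_confidence(1000, 200, 50)
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
# prints "support = 5%, confidence = 25%"
```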

Page 4: Chapter 10 Association Rule

Affinity Analysis & Market Basket Analysis (2)

Examples in business & research:

• Investigating the proportion of subscribers to your company’s cell phone plan that respond positively to an offer of a service upgrade

• Examining the proportion of children whose parents read to them who are themselves good readers

• Predicting degradation in telecommunications networks

• Finding out which items in a supermarket are purchased together & which are never purchased together

• Determining the proportion of cases in which a new drug will exhibit dangerous side effects

Page 5: Chapter 10 Association Rule

Affinity Analysis & Market Basket Analysis (3)

• The number of possible association rules grows exponentially in the number of attributes.

• If there are k binary attributes (yes/no), then there are k × 2^(k-1) possible association rules.

• Example: a convenience store that sells 100 items. Possible association rules = 100 × 2^99 ≈ 6.3 × 10^31

• The a priori (“prior”) algorithm reduces the search problem to a more manageable size.
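
The rule count for the convenience-store example can be checked directly, since Python integers have arbitrary precision. This is a small sketch; the function name `count_possible_rules` is a hypothetical helper, not part of any library.

```python
def count_possible_rules(k):
    """Number of possible association rules for k binary attributes,
    using the k * 2^(k-1) formula quoted on this slide."""
    return k * 2 ** (k - 1)


n_rules = count_possible_rules(100)
# The exact integer is large; scientific notation shows its magnitude.
print(f"{float(n_rules):.2e}")
```

The result is on the order of 10^31, which is why an exhaustive search over all rules is impractical.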

Page 6: Chapter 10 Association Rule

Notation for Data Representation in Market Basket Analysis

• A farmer sells the set of items I = {asparagus, beans, broccoli, corn, green peppers, squash, tomatoes}

• A customer puts a subset of I in a basket, e.g. {broccoli, corn}

• The subset doesn’t keep track of how much of each item is purchased, just the names of the items.

Page 7: Chapter 10 Association Rule

Transactional Data Format

Page 8: Chapter 10 Association Rule

Tabular Data Format
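
The two data formats can be illustrated with a short sketch. It assumes the usual meaning of the terms: transactional format stores one (transaction ID, item) record per purchased item, while tabular format stores one row per transaction with a 0/1 flag per item. The sample records and the helper `to_tabular` are made up for illustration and are not taken from the slides' tables.

```python
ITEMS = ["asparagus", "beans", "broccoli", "corn",
         "green peppers", "squash", "tomatoes"]

# Hypothetical transactional data: one record per (transaction ID, item).
transactional = [
    (1, "broccoli"), (1, "corn"),
    (2, "beans"), (2, "squash"), (2, "corn"),
]


def to_tabular(records, items):
    """Convert transactional records into one 0/1 flag row per transaction."""
    table = {}
    for tid, item in records:
        # Create an all-zero row the first time a transaction ID is seen.
        row = table.setdefault(tid, {i: 0 for i in items})
        row[item] = 1
    return table


tabular = to_tabular(transactional, ITEMS)
for tid, row in tabular.items():
    print(tid, [row[i] for i in ITEMS])
```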

Page 9: Chapter 10 Association Rule

Support, Confidence, Frequent Itemsets, & the Apriori Property

• Example:

D : set of transactions represented in Table 10.1

T : a transaction in D, representing a set of items

I : set of items

Set of items A : {beans, squash}

Set of items B : {asparagus}

Then an association rule takes the form “if A, then B” (A ⇒ B), where A and B are proper subsets of I and are mutually exclusive.

Page 10: Chapter 10 Association Rule

Table of Transaction Made

Page 11: Chapter 10 Association Rule

Support and Confidence

• Support, s, is the proportion of transactions in D that contain both A and B.

$$\text{support} = P(A \cap B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{total number of transactions}}$$

• Confidence, c, is a measure of the accuracy of the rule.

$$\text{confidence} = P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{number of transactions containing } A}$$

• Analysts prefer rules with high support AND high confidence.

Page 12: Chapter 10 Association Rule

Frequent Itemset

Definition: an itemset is a set of items contained in I, and a k-itemset is an itemset containing k items, e.g. {beans, squash} is a 2-itemset.

The itemset frequency is the number of transactions that contain the particular itemset.

A frequent itemset is an itemset that occurs at least a certain minimum number of times, i.e. has itemset frequency ≥ φ.

Example: if we set φ = 4, then itemsets that occur four or more times are said to be frequent.

Page 13: Chapter 10 Association Rule

Mining Association Rules

• It is a two-step process:

1. Find all frequent itemsets (all itemsets with frequency ≥ φ).

2. From the frequent itemsets, generate association rules satisfying the minimum support and confidence conditions.

The Apriori Property

• The a priori property states that if an itemset Z is not frequent, then adding another item A to the itemset Z will not make Z more frequent; that is, if Z is not frequent, Z ∪ A is not frequent either. This helpful property significantly reduces the search space for the a priori algorithm.

Page 14: Chapter 10 Association Rule

How does the Apriori Algorithm Work?

• Part 1: Generating Frequent Itemsets

• Part 2: Generating Association Rules

Page 15: Chapter 10 Association Rule

Generating Frequent Itemsets

• Example: let φ = 4, so that an itemset is frequent if it occurs four or more times in D.

• F1 = {asparagus, beans, broccoli, corn, green peppers, squash, tomatoes}

• F2: the algorithm first constructs a set Ck of candidate k-itemsets by joining Fk-1 with itself, then prunes Ck using the a priori property. For k = 2, Ck consists of all the combinations of vegetables in Table 10.4.

• F3: the steps are the same as for F2, with k = 3.
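
The join-and-prune loop described here can be sketched in Python. This is a minimal, unoptimized illustration and not the implementation used by any particular tool. The transactions in `TOY_TRANSACTIONS` are invented for the example (they are not the data of Table 10.1), and the function name `apriori_frequent_itemsets` is hypothetical.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_count):
    """Return {itemset: frequency} for every itemset whose frequency
    is at least min_count (the threshold called phi on the slides)."""
    transactions = [frozenset(t) for t in transactions]

    def frequency(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # F1: frequent 1-itemsets.
    all_items = set().union(*transactions)
    current = {frozenset([item]) for item in all_items
               if frequency(frozenset([item])) >= min_count}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = frequency(itemset)
        k += 1
        # Join step: build candidate k-itemsets Ck from F(k-1) joined with itself.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step (a priori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Keep only the candidates that are actually frequent: this is Fk.
        current = {c for c in candidates if frequency(c) >= min_count}
    return frequent


# Hypothetical toy data, with a lower threshold than the slides' phi = 4.
TOY_TRANSACTIONS = [
    {"beans", "squash", "asparagus"},
    {"beans", "squash"},
    {"beans", "squash", "asparagus"},
    {"corn", "tomatoes"},
    {"beans", "corn"},
    {"asparagus", "squash", "beans"},
]

result = apriori_frequent_itemsets(TOY_TRANSACTIONS, min_count=3)
for itemset, freq in sorted(result.items(),
                            key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), freq)
```

Recounting frequencies by scanning all transactions keeps the sketch short; practical implementations count candidates in a single pass per level.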

Page 16: Chapter 10 Association Rule

Table 10.3 (pg.183)

Page 17: Chapter 10 Association Rule

Table 10.4 (pg. 185)

Page 18: Chapter 10 Association Rule

• However, consider s = {beans, corn, squash}.

The subset {corn, squash} has frequency 3 < 4 = φ, so {corn, squash} is not frequent.

By the a priori property, therefore, {beans, corn, squash} cannot be frequent; it is pruned and doesn’t appear in F3.

The same holds for s = {beans, squash, tomatoes}, which has a subset whose frequency is < 4.

Page 19: Chapter 10 Association Rule

Generating Association Rules

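For each frequent itemset s:

1. Generate all subsets of s.

2. Let ss be a nonempty proper subset of s and consider the association rule R : ss ⇒ (s − ss), where (s − ss) is the set s without ss. Generate R if it fulfills the minimum confidence requirement.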
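
The rule-generation step can be sketched as follows. This is an illustrative sketch only: the transactions are invented toy data (not the slides' Table 10.1), and `generate_rules` is a hypothetical helper name. The confidence of ss ⇒ (s − ss) is computed as frequency(s) / frequency(ss).

```python
from itertools import combinations


def generate_rules(s, transactions, min_confidence):
    """Return (antecedent, consequent, confidence) triples for every rule
    ss => (s - ss) that meets min_confidence, where ss ranges over the
    nonempty proper subsets of the itemset s."""
    s = frozenset(s)
    transactions = [frozenset(t) for t in transactions]

    def frequency(itemset):
        return sum(1 for t in transactions if itemset <= t)

    freq_s = frequency(s)
    rules = []
    for size in range(1, len(s)):            # sizes 1 .. len(s) - 1
        for subset in combinations(sorted(s), size):
            ss = frozenset(subset)
            freq_ss = frequency(ss)
            if freq_ss == 0:                 # ss never occurs: no rule
                continue
            confidence = freq_s / freq_ss    # P(s - ss | ss)
            if confidence >= min_confidence:
                rules.append((ss, s - ss, confidence))
    return rules


# Hypothetical toy data, used only to exercise the function.
TOY_TRANSACTIONS = [
    {"beans", "squash", "asparagus"},
    {"beans", "squash"},
    {"beans", "squash", "asparagus"},
    {"corn", "tomatoes"},
    {"beans", "corn"},
    {"asparagus", "squash", "beans"},
]

rules = generate_rules({"asparagus", "beans", "squash"},
                       TOY_TRANSACTIONS, min_confidence=0.8)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "=>", sorted(consequent), f"{conf:.0%}")
```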

Page 20: Chapter 10 Association Rule

Example with Two Antecedents

• Total number of transactions = 14

• Transactions that include asparagus and beans = 5

• Transactions that include asparagus and squash = 5

• Transactions that include beans and squash = 6

Page 21: Chapter 10 Association Rule

Ranked by support x Confidence

• Minimum Confidence 80%

Page 22: Chapter 10 Association Rule

Clementine generating Association

Rules

Page 23: Chapter 10 Association Rule

Clementine Generating Association Rules (2)

• In Clementine, “support” refers to occurrences of the antecedent, which is different from what we defined before.

• The first column indicates the number of times the antecedent occurs.

• To find the actual “support” using Clementine, multiply the reported support by the confidence.
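
Why the multiplication works: if the reported support is read as the proportion of transactions containing the antecedent, P(A), then multiplying by the confidence P(B|A) recovers the support as defined earlier. As a check, the numbers below reuse the diapers-and-beer example (200 of 1000 customers bought diapers, with confidence 25%).

```latex
\[
\underbrace{P(A)}_{\text{reported support}} \times \underbrace{P(B \mid A)}_{\text{confidence}}
  = P(A \cap B) = \text{support}
\]
\[
\frac{200}{1000} \times \frac{50}{200} = 0.20 \times 0.25 = 0.05 = 5\%
\]
```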

Page 24: Chapter 10 Association Rule

Extension From Flag Data to General Categorical Data

- Association rules are not restricted to flag (Boolean) data.

- The a priori algorithm can also be applied to categorical data.

Page 25: Chapter 10 Association Rule

Example using Clementine

• Recall the normalized adult data set from Chapters 6 and 7

Page 26: Chapter 10 Association Rule

Information-Theoretic Approach: Generalized Rule Induction Method

Why GRI?

• The a priori algorithm is not well equipped to handle numerical attributes; they need discretization

• Discretization can lead to loss of information

• GRI can handle either categorical or numerical variables as inputs, but still requires categorical variables as output

Page 27: Chapter 10 Association Rule

Generalized Rule Induction Method (2)

J-Measure

• p(x) probability of the value of x (antecedent)

• p(y) probability of the value of y (consequent)

• p(y|x) conditional probability of y given that x has occurred

$$J = p(x)\left[\, p(y \mid x)\,\ln\frac{p(y \mid x)}{p(y)} + \bigl[1 - p(y \mid x)\bigr]\,\ln\frac{1 - p(y \mid x)}{1 - p(y)} \right]$$

Page 28: Chapter 10 Association Rule

Generalized Rule Induction Method (3)

• The J-measure quantifies the “interestingness” of a rule

• In GRI, the user specifies how many association rules will be reported

• If the “interestingness” of a new rule exceeds the current minimum J in the rule table, the new rule is inserted and the rule with minimum J is eliminated
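
The rule-table bookkeeping described above can be sketched with a bounded min-heap. This is not GRI's actual implementation; the class name `RuleTable`, its methods, and the rule labels are hypothetical, and the sketch assumes that any rule is accepted while the table still has free slots.

```python
import heapq


class RuleTable:
    """Keep only the `capacity` rules with the largest J-measure."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap of (J, rule); smallest J sits at index 0

    def offer(self, j_value, rule):
        """Try to insert a rule; return True if it entered the table."""
        if len(self._heap) < self.capacity:
            # Assumption: while the table is not full, every rule is kept.
            heapq.heappush(self._heap, (j_value, rule))
            return True
        if j_value > self._heap[0][0]:
            # New rule beats the current minimum J: replace that rule.
            heapq.heapreplace(self._heap, (j_value, rule))
            return True
        return False

    def rules(self):
        """Rules ordered from most to least interesting."""
        return sorted(self._heap, reverse=True)


table = RuleTable(capacity=2)
table.offer(0.010, "rule 1")
table.offer(0.030, "rule 2")
accepted = table.offer(0.020, "rule 3")   # evicts "rule 1"
rejected = table.offer(0.005, "rule 4")   # below the current minimum J
print(table.rules())
```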

Page 29: Chapter 10 Association Rule

Application of GRI

p(x) : female, never married

p(x) = 0.1463

Page 30: Chapter 10 Association Rule

Application of GRI (2)

p(y) : work class = private

p(y) = 0.6958

Page 31: Chapter 10 Association Rule

Application of GRI (3)

p(y|x) : work class = private; given : female, never married

p(y|x) = conditional probability = 0.763

Page 32: Chapter 10 Association Rule

Application of GRI

Calculation :

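$$
\begin{aligned}
J &= p(x)\left[\, p(y \mid x)\,\ln\frac{p(y \mid x)}{p(y)} + \bigl[1 - p(y \mid x)\bigr]\,\ln\frac{1 - p(y \mid x)}{1 - p(y)} \right] \\
  &= 0.1463\left[\, 0.763\,\ln\frac{0.763}{0.6958} + (0.237)\,\ln\frac{0.237}{0.3042} \right] \\
  &= 0.1463\,\bigl[\, 0.763\,\ln(1.0966) + (0.237)\,\ln(0.7791) \bigr] \\
  &= 0.001637
\end{aligned}
$$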
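
The J-measure for this rule can be checked numerically. The sketch below uses a hypothetical function name `j_measure` and assumes all probabilities lie strictly between 0 and 1 (otherwise the logarithms are undefined).

```python
import math


def j_measure(p_x, p_y, p_y_given_x):
    """J-measure of the rule 'if x, then y'.

    Assumes 0 < p_y < 1 and 0 < p_y_given_x < 1.
    """
    term_y = p_y_given_x * math.log(p_y_given_x / p_y)
    term_not_y = (1 - p_y_given_x) * math.log((1 - p_y_given_x) / (1 - p_y))
    return p_x * (term_y + term_not_y)


# Values from the slides: x = female, never married; y = work class = private.
j = j_measure(p_x=0.1463, p_y=0.6958, p_y_given_x=0.763)
print(round(j, 4))  # prints 0.0016
```

The unrounded result agrees with the slide's 0.001637 up to small rounding differences in the intermediate values.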

Page 33: Chapter 10 Association Rule

When not to use Association Rules

• Association rules found by the a priori algorithm can be selected based on:

▫ Confidence

▫ Confidence Difference

▫ Confidence Ratio

• Association rules need to be applied with care because the results are sometimes unreliable.

Page 34: Chapter 10 Association Rule

When not to use Association Rules (2)

Association Rules chosen a priori, based on confidence

• Applying this association rule reduces the probability of selecting the desired data, compared with selecting at random.

• Even though the rule is useless, the software still reported it, probably because the default ranking mechanism for the a priori algorithm is confidence.

• We should never simply believe the computer output without making the effort to understand the models and mechanisms underlying the result.

Page 35: Chapter 10 Association Rule

When not to use Association Rules (3)

Association Rules chosen a priori, based on confidence

Page 36: Chapter 10 Association Rule

When not to use Association Rules (4)

Association Rules chosen a priori, based on confidence difference

• A random selection from the database would have provided more effective results than applying a useless association rule (under this criterion no useless rules are reported).

• This rule provides the greatest increase in confidence from the prior to the posterior.

• The evaluation measure is the absolute difference between the prior and posterior confidences.

Page 37: Chapter 10 Association Rule

When not to use Association Rules (5)

Association Rules chosen a priori, based on confidence difference

Page 38: Chapter 10 Association Rule

When not to use Association Rules (6)

Association Rules chosen a priori, based on confidence ratio

• Analysts may prefer to use the confidence ratio to evaluate potential rules.

• The confidence difference criterion yielded the very same rules as the confidence ratio criterion.

Page 39: Chapter 10 Association Rule

When not to use Association Rules (7)

Association Rules chosen a priori, based on confidence ratio

• Example:

If Marital_Status = Divorced, then sex = Female. p(y) = 0.3317 and p(y|x) = 0.60
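
Both evaluation measures can be computed for this example. The confidence difference follows the definition given earlier (absolute difference between prior and posterior confidence). The confidence ratio formula used here, 1 − min(p(y|x)/p(y), p(y)/p(y|x)), is an assumption based on the Clementine-style definition and is not stated on the slides; the function names are hypothetical.

```python
def confidence_difference(prior, posterior):
    """Absolute difference between prior p(y) and posterior p(y|x)."""
    return abs(posterior - prior)


def confidence_ratio(prior, posterior):
    """Assumed definition: 1 - min(posterior/prior, prior/posterior).
    Requires both probabilities to be greater than 0."""
    return 1 - min(posterior / prior, prior / posterior)


# Rule from the slide: if Marital_Status = Divorced, then sex = Female.
prior, posterior = 0.3317, 0.60
diff = confidence_difference(prior, posterior)
ratio = confidence_ratio(prior, posterior)
print(round(diff, 4), round(ratio, 4))  # prints 0.2683 0.4472
```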

Page 40: Chapter 10 Association Rule

Do Association Rules Represent Supervised or Unsupervised Learning?

• Supervised learning:

▫ The target variable is prespecified

▫ The algorithm is provided with a rich collection of examples where possible associations between the target variable and the predictor variables may be uncovered

• Unsupervised learning:

▫ No target variable is identified explicitly

▫ The algorithm searches for patterns and structure among all the variables

• Association rules are generally used for unsupervised learning but can also be applied to supervised learning for classification tasks

Page 41: Chapter 10 Association Rule

Local Patterns Versus Global Models

• Model: a global description or explanation of a data set.

• Patterns: essentially local features of the data.

• Association rules are well suited to uncovering local patterns in data.

• Applying the “if” clause drills down deep into the data set, uncovering a hidden local pattern that might be relevant.

• Finding local patterns is one of the most important goals in data mining. It can lead to new profitable initiatives.