Mining Frequent itemset Association Analysis

Upload: sandeep-dwivedi

Post on 15-May-2017



Mining Frequent Itemsets

Frequent patterns are patterns that appear frequently in a data set. For example, a set of items such as milk and bread that appear together frequently in a transaction data set is a frequent itemset.

Frequent pattern mining searches for recurring relationships in a given data set.

Market Basket Analysis

• Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional or relational data sets.

• A typical example of frequent itemset mining is market basket analysis. This process analyzes customer buying habits by finding associations between the items that customers place in their shopping baskets.

• Market basket analysis may be performed on the retail data of customer transactions at a store. The results can be used to plan marketing and advertising strategies, or in the design of a new catalog.

• The patterns can be represented in the form of association rules. For example, the information that customers who purchase computers also tend to buy antivirus software at the same time is represented by the association rule:

computer => antivirus_software [support = 2%, confidence = 60%]

• A support of 2% means that 2% of all transactions contain both items. A confidence of 60% means that 60% of the customers who bought a computer also bought the software.

Frequent Itemsets, Closed Itemsets, and Association Rules

• Let $I = \{I_1, I_2, I_3, \ldots, I_m\}$ be a set of items. Let D, the task-relevant data, be a set of database transactions, where each transaction T is a set of items such that $T \subseteq I$.

• $\text{support}(A \Rightarrow B) = P(A \cup B)$, the probability that a transaction contains every item of both A and B.

• $\text{confidence}(A \Rightarrow B) = P(B \mid A)$

• Rules that satisfy both a minimum support threshold and a minimum confidence threshold are said to be strong.

• Occurrence frequency: the occurrence frequency of an itemset is the number of transactions that contain the itemset. It is also called the frequency, support count, or count of the itemset.

• Itemset support is also referred to as relative support, whereas the occurrence frequency is called the absolute support.

• $\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\text{support}(A \cup B)}{\text{support}(A)} = \frac{\text{support\_count}(A \cup B)}{\text{support\_count}(A)}$

• Association rule mining can be viewed as a two-step process:

1. Find all the frequent itemsets.
2. Generate strong association rules from the frequent itemsets.
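
The support and confidence definitions above can be checked on a small example. The sketch below is illustrative only: the five transactions are made up, and the function names `support_count`, `support`, and `confidence` are assumptions chosen to mirror the formulas, not part of any library.

```python
def support_count(itemset, transactions):
    """Absolute support: number of transactions containing every item."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Relative support: fraction of transactions containing the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support_count(A u B) / support_count(A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)


# Hypothetical toy data, not taken from the slides.
transactions = [
    frozenset({"computer", "antivirus_software"}),
    frozenset({"computer", "printer"}),
    frozenset({"computer", "antivirus_software", "printer"}),
    frozenset({"printer"}),
    frozenset({"computer"}),
]

rule_support = support({"computer", "antivirus_software"}, transactions)
rule_confidence = confidence({"computer"}, {"antivirus_software"}, transactions)
print(rule_support, rule_confidence)
```

On this toy data, two of the five transactions contain both items, and two of the four computer purchases include the software.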

Closed Frequent Itemsets and Maximal Frequent Itemsets

• An itemset X is closed in a data set S if there exists no proper super-itemset Y such that Y has the same support count as X in S.

• An itemset X is a closed frequent itemset in S if X is both closed and frequent in S.

• An itemset X is a maximal frequent itemset in S if X is frequent and there exists no super-itemset Y such that $X \subset Y$ and Y is frequent in S.
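
A short sketch of how these two definitions can be applied, assuming the complete set of frequent itemsets and their support counts is already known. The function name `closed_and_maximal` and the example counts are hypothetical. Checking only frequent supersets is enough for closedness here, because a superset with the same support count as a frequent itemset is itself frequent.

```python
def closed_and_maximal(frequent):
    """Split frequent itemsets into closed and maximal ones.

    `frequent` maps each frequent itemset (a frozenset) to its support
    count and is assumed to contain ALL frequent itemsets.
    """
    closed, maximal = set(), set()
    for x, count in frequent.items():
        # Proper frequent supersets of x.
        supersets = [y for y in frequent if x < y]
        if not supersets:
            maximal.add(x)
        if all(frequent[y] != count for y in supersets):
            closed.add(x)
    return closed, maximal


# Hypothetical support counts (minimum support count = 2).
frequent = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 3,
    frozenset({"a", "b"}): 3,
}
closed, maximal = closed_and_maximal(frequent)
print(closed, maximal)
```

Here {b} is not closed because {a, b} has the same support count, while {a, b} is both closed and maximal.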

Frequent Pattern Mining

• Market basket analysis is just one form of frequent pattern mining.

• Frequent pattern mining can be classified in various ways, based on the following criteria:
  • the completeness of the patterns to be mined
  • the levels of abstraction involved in the rule set
  • the number of data dimensions involved in the rule
  • the types of values handled in the rule
  • the kinds of rules to be mined
  • the kinds of patterns to be mined

Efficient and Scalable Frequent Itemset Mining Methods

• Apriori is the basic algorithm for finding frequent itemsets.

• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules. The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties.

• Apriori employs an iterative approach known as level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets.

Apriori property

• All nonempty subsets of a frequent itemset must also be frequent.

• The Apriori property is based on the following observation: if an itemset I does not satisfy the minimum support threshold min_sup, then I is not frequent, that is, $P(I) < \text{min\_sup}$. Adding an item to I cannot make the resulting itemset occur more often than I, so the larger itemset is not frequent either.

• The property is called antimonotone in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well.
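
The Apriori property is what lets a candidate be discarded without counting it. The sketch below illustrates that pruning test; the helper name `has_infrequent_subset` and the example itemsets are assumptions made for illustration, and the function is meant for candidates of size 2 or more.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_prev):
    """Return True if some (k-1)-subset of a k-item candidate is not frequent.

    By the Apriori property such a candidate cannot be frequent,
    so it can be pruned before any database scan.
    """
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_prev
        for subset in combinations(candidate, k - 1)
    )


# Hypothetical frequent 2-itemsets.
frequent_2 = {
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"b", "d"}),
}

keep = has_infrequent_subset(frozenset({"a", "b", "c"}), frequent_2)
prune = has_infrequent_subset(frozenset({"b", "c", "d"}), frequent_2)
print(keep, prune)
```

The candidate {b, c, d} is pruned because its subset {c, d} is not among the frequent 2-itemsets.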

Apriori Algorithm

Input:
  I: set of items
  D: database of transactions
  s: minimum support

Output:
  L: large (frequent) itemsets

Apriori algorithm:
  k = 0                        // k is used as the scan number
  L = ∅
  C_1 = I                      // initial candidates are set to be the items
  repeat
    k = k + 1
    L_k = ∅
    for each I_i ∈ C_k do
      c_i = 0                  // initial counts for each itemset are 0
    for each t_j ∈ D do
      for each I_i ∈ C_k do
        if I_i ⊆ t_j then
          c_i = c_i + 1
    for each I_i ∈ C_k do
      if c_i ≥ (s × |D|) then
        L_k = L_k ∪ {I_i}
    L = L ∪ L_k
    C_{k+1} = Apriori-Gen(L_k)
  until C_{k+1} = ∅
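
The pseudocode above can be turned into a compact runnable sketch. This is one possible reading rather than a reference implementation: the `apriori_gen` shown here (join two frequent itemsets, then prune by the Apriori property) is an assumed version of the Apriori-Gen step, which the slides name but do not spell out, and the transactions are invented.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Build k-item candidates from frequent (k-1)-itemsets, then prune."""
    prev = list(prev_frequent)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            # Keep the union only if it has k items and every
            # (k-1)-subset is frequent (Apriori property).
            if len(union) == k and all(
                frozenset(sub) in prev_frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset to its support count."""
    transactions = [frozenset(t) for t in transactions]
    threshold = min_support * len(transactions)  # s * |D|
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: 0 for c in candidates}
        for t in transactions:          # one scan of D per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c for c, n in counts.items() if n >= threshold}
        frequent.update({c: counts[c] for c in level})
        k += 1
        candidates = apriori_gen(level, k)
    return frequent


# Hypothetical toy data.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]
frequent = apriori(transactions, min_support=0.5)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum support of 50%, the three single items and the three pairs are frequent, while the three-item set occurs in only two of five transactions and is dropped.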

Generating Association Rules from Frequent Itemsets
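
A minimal sketch of this step, assuming the frequent itemsets and their support counts are already available (for example from Apriori). For every frequent itemset, each nonempty proper subset is tried as the antecedent, and the rule is kept when its confidence reaches the threshold. The function name `generate_rules` and the counts are hypothetical.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for every strong rule.

    `frequent` maps each frequent itemset (frozenset) to its support
    count and must include all subsets of every itemset it contains.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                # confidence = support_count(A u B) / support_count(A)
                conf = count / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical support counts.
frequent = {
    frozenset({"milk"}): 4,
    frozenset({"bread"}): 4,
    frozenset({"butter"}): 4,
    frozenset({"milk", "bread"}): 3,
    frozenset({"milk", "butter"}): 3,
    frozenset({"bread", "butter"}): 3,
}
rules = generate_rules(frequent, min_confidence=0.7)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "=>", sorted(consequent), conf)
```

Because the itemsets are already frequent, every rule produced this way automatically satisfies minimum support; only confidence needs to be tested.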

Improving the Efficiency of Apriori

• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting

Mining Frequent Itemsets without Candidate Generation (FP-Growth)

FP-Growth Algorithm

Mining Frequent Itemsets Using the Vertical Data Format

Mining Various Kinds of Association Rules

• Multilevel association rules involve concepts at different levels of abstraction.
• Multidimensional association rules involve more than one dimension or predicate.
• Quantitative association rules involve numeric attributes that have an implicit ordering among their values.

Mining Multilevel Association Rules

• Finding strong association rules at low or primitive levels of abstraction is difficult, because the data at those levels are sparse.

• A data mining system should provide capabilities for mining association rules at multiple levels of abstraction.

• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (ancestors).

A concept hierarchy for AllElectronics

Multilevel Association Rules

• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.

• Multilevel association rules can be mined using concept hierarchies under a support-confidence framework.

• A top-down strategy is employed, where counts are accumulated for the calculation of frequent itemsets at each concept level.

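• Using uniform minimum support for all levels: the same minimum support threshold is applied at every level of abstraction.

• Using reduced minimum support at lower levels: each level of abstraction has its own threshold, and the deeper the level, the smaller the threshold.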
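
The difference between the two support strategies can be seen on a toy concept hierarchy. Everything in the sketch below is hypothetical (the hierarchy, the transactions, and the thresholds), and it is simplified to single items at two levels; it does not restrict the lower level to children of frequent ancestors, as a full top-down miner would.

```python
from collections import Counter

# Hypothetical two-level concept hierarchy: item -> ancestor.
hierarchy = {
    "laptop": "computer",
    "desktop": "computer",
    "inkjet printer": "printer",
    "laser printer": "printer",
}

# Hypothetical transactions at the lowest level of abstraction.
transactions = [
    frozenset({"laptop", "inkjet printer"}),
    frozenset({"laptop"}),
    frozenset({"desktop", "laser printer"}),
    frozenset({"desktop"}),
    frozenset({"laptop", "laser printer"}),
    frozenset({"inkjet printer"}),
    frozenset({"desktop"}),
    frozenset({"laptop"}),
]


def generalize(transaction):
    """Replace every item by its ancestor in the concept hierarchy."""
    return frozenset(hierarchy.get(item, item) for item in transaction)


def frequent_items(transactions, min_support):
    """Return {item: relative support} for items meeting min_support."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    return {item: c / n for item, c in counts.items() if c / n >= min_support}


level1 = frequent_items([generalize(t) for t in transactions], 0.5)
level2_uniform = frequent_items(transactions, 0.5)   # same threshold
level2_reduced = frequent_items(transactions, 0.3)   # lower threshold
print(level1, level2_uniform, level2_reduced)
```

With a uniform threshold of 50%, only "laptop" survives at the lower level; reducing the threshold to 30% there also keeps "desktop".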

Disadvantages of Mining Multilevel Association Rules

• Redundant rules may be generated across multiple levels of abstraction because of the ancestor relationships among items.

• If a rule does not provide any new information, it should be removed.

• If a rule R1 is more general than a rule R2 (that is, R1 is an ancestor of R2) and R2 adds no new information, then R2 does not need to be specified.

Mining Multidimensional Association Rules from Relational Databases and Data Warehouses

• Association rules that involve a single predicate, such as the predicate buys, are called single-dimensional or intradimensional association rules. For example:

buys(X, "digital camera") => buys(X, "HP printer")

• Association rules that involve two or more dimensions or predicates are called multidimensional association rules. For example:

age(X, "20…29") ^ buys(X, "laptop") => buys(X, "HP printer")
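
To make the multidimensional rule concrete, the sketch below evaluates its support and confidence over a handful of relational tuples. The customer rows are invented, and the predicate helpers and `rule_stats` are hypothetical names; each predicate is modeled as a function that tests one tuple.

```python
# Hypothetical customer tuples: one quantitative-turned-categorical
# attribute (age range) and the set of purchased items.
customers = [
    {"age": "20…29", "buys": {"laptop", "HP printer"}},
    {"age": "20…29", "buys": {"laptop"}},
    {"age": "30…39", "buys": {"laptop", "HP printer"}},
    {"age": "20…29", "buys": {"laptop", "HP printer"}},
    {"age": "40…49", "buys": {"HP printer"}},
]


def antecedent(row):
    """age(X, "20…29") ^ buys(X, "laptop")"""
    return row["age"] == "20…29" and "laptop" in row["buys"]


def consequent(row):
    """buys(X, "HP printer")"""
    return "HP printer" in row["buys"]


def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) of antecedent => consequent."""
    n_antecedent = sum(1 for r in rows if antecedent(r))
    n_both = sum(1 for r in rows if antecedent(r) and consequent(r))
    support = n_both / len(rows)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(customers, antecedent, consequent)
print(support, confidence)
```

Three customers satisfy the antecedent and two of them also bought the printer, so the rule has support 2/5 and confidence 2/3 on this toy table.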

Two Basic Approaches

• Database attributes can be of two types: categorical and quantitative.

• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.

• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).

Mining Multidimensional Association Rules Using Static Discretization of Quantitative Attributes

• Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques, where numeric values are replaced by interval labels.

• Multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.

Mining Quantitative Association Rules

• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the mining criteria.

• The rules considered here have two quantitative attributes on the left-hand side and one categorical attribute on the right-hand side:

$A_{quant1} \wedge A_{quant2} \Rightarrow A_{cat}$

Association Rule Clustering System (ARCS)

• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.

• The following steps are involved in ARCS:

1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:
   • Equal-width binning
   • Equal-frequency binning
   • Clustering-based binning

2. Finding frequent predicate sets: Once the binned counts for each category have been set up, they can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.

3. Clustering the association rules: The strong association rules obtained in the previous step are then mapped to a 2-D grid.
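
The binning step (step 1) can be sketched for the first two strategies. The functions `equal_width_bins` and `equal_frequency_bins` are hypothetical helpers, the ages are made up, and the sketch assumes the values are not all identical; with tied values, the equal-frequency version may place equal values in different bins.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value is placed in the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign values to k bins holding (roughly) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Hypothetical ages.
ages = [20, 22, 25, 31, 38, 45, 52, 60]
width_bins = equal_width_bins(ages, 4)
frequency_bins = equal_frequency_bins(ages, 4)
print(width_bins)
print(frequency_bins)
```

Equal-width binning leaves the bins unevenly filled when the values cluster at one end, whereas equal-frequency binning puts two ages in each of the four bins here.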

From Association Mining to Correlation Analysis

• Even strong association rules can be misleading.

• The support-confidence framework can be supplemented with additional measures based on statistical significance and correlation analysis.

• Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)P(B)$; otherwise, A and B are dependent and correlated as events.

$\text{lift}(A, B) = \frac{P(A \cup B)}{P(A)P(B)}$

• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.
• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.
• If lift(A, B) = 1, A and B are independent and there is no correlation between them.

• Lift can equivalently be written as

$\text{lift}(A, B) = \frac{P(B \mid A)}{P(B)} = \frac{\text{conf}(A \Rightarrow B)}{\text{sup}(B)}$

Correlation analysis using lift

Let $\overline{game}$ refer to the transactions that do not contain computer games, and $\overline{video}$ refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.

• Probability of purchasing a computer game: P(game) = 0.60
• Probability of purchasing a video: P(video) = 0.75
• Probability of purchasing both: P(game, video) = 0.40

$\text{lift}(game, video) = \frac{0.40}{0.60 \times 0.75} = 0.89$

Because the lift is less than 1, the purchase of computer games is negatively correlated with the purchase of videos.
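
The lift calculation for the game and video example can be reproduced in a few lines. The function names `lift` and `interpret` are illustrative choices, and the probabilities are the ones given in the text.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A u B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


def interpret(value, tolerance=1e-9):
    """Map a lift value to the three cases listed above."""
    if abs(value - 1.0) <= tolerance:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"


p_game, p_video, p_both = 0.60, 0.75, 0.40
game_video_lift = lift(p_game, p_video, p_both)

# Equivalent form: conf(game => video) / sup(video).
conf_game_video = p_both / p_game
lift_from_confidence = conf_game_video / p_video

print(round(game_video_lift, 2), interpret(game_video_lift))
```

Both forms give the same value, about 0.89, even though the rule game => video has a confidence of roughly 67%, which would look strong on its own.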

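Correlation analysis using $\chi^2$

$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(4000 - 4500)^2}{4500} + \frac{(3500 - 3000)^2}{3000} + \frac{(2000 - 1500)^2}{1500} + \frac{(500 - 1000)^2}{1000} = 555.6$

Because the $\chi^2$ value is greater than 1 and the observed number of transactions containing both items (4000) is less than the expected number (4500), the purchases of games and videos are negatively correlated.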
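
The same statistic can be computed from a 2 x 2 contingency table, deriving each expected count from the row and column totals. The cell layout used below (rows: video, no video; columns: game, no game) is an assumption inferred from the probabilities and observed counts quoted in the text, since the table itself is not reproduced here; `chi_square` is a hypothetical helper name.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Assumed layout: rows = (video, no video), columns = (game, no game).
table = [
    [4000, 3500],
    [2000, 500],
]
chi2 = chi_square(table)
print(round(chi2, 1))
```

The expected counts derived from the totals are 4500, 3000, 1500, and 1000, matching the denominators in the formula above.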

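Other Correlation Measures

• All_confidence
• Cosine

$\text{all\_conf}(X) = \frac{\text{sup}(X)}{\text{max\_item\_sup}(X)}$

where max_item_sup(X) is the maximum single-item support among the items in X.

$\text{cosine}(A, B) = \frac{P(A \cup B)}{\sqrt{P(A) \times P(B)}} = \frac{\text{sup}(A \cup B)}{\sqrt{\text{sup}(A) \times \text{sup}(B)}}$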
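
Both measures are simple to compute once the supports are known. In the sketch below the function names are illustrative, and the supports reuse the game and video figures from the lift example (0.60, 0.75, and 0.40 for both together).

```python
from math import sqrt


def all_confidence(sup_itemset, item_supports):
    """all_conf(X) = sup(X) / max single-item support in X."""
    return sup_itemset / max(item_supports)


def cosine(sup_ab, sup_a, sup_b):
    """cosine(A, B) = sup(A u B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)


sup_game, sup_video, sup_both = 0.60, 0.75, 0.40
game_video_all_conf = all_confidence(sup_both, [sup_game, sup_video])
game_video_cosine = cosine(sup_both, sup_game, sup_video)
print(round(game_video_all_conf, 3), round(game_video_cosine, 3))
```

Unlike lift, neither formula uses the total number of transactions once it cancels out of the support ratios, which is why these two measures behave differently when many unrelated transactions are present.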

Comparison of Four Correlation Measures on Typical Data Sets

• A null-transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null-transactions.
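
The effect of null-transactions can be demonstrated by computing lift and cosine twice on the same counts, once without and once with a large number of null-transactions. The counts and the helper name `lift_and_cosine` are hypothetical.

```python
from math import sqrt


def lift_and_cosine(n_ab, n_a_only, n_b_only, n_null):
    """Compute (lift, cosine) for itemsets A and B from transaction counts.

    n_ab: transactions containing both A and B
    n_a_only / n_b_only: transactions containing only A / only B
    n_null: transactions containing neither (null-transactions)
    """
    n = n_ab + n_a_only + n_b_only + n_null
    p_a = (n_ab + n_a_only) / n
    p_b = (n_ab + n_b_only) / n
    p_ab = n_ab / n
    return p_ab / (p_a * p_b), p_ab / sqrt(p_a * p_b)


# Same A/B counts, with and without 10,000 null-transactions.
lift_few, cosine_few = lift_and_cosine(1000, 100, 100, 0)
lift_many, cosine_many = lift_and_cosine(1000, 100, 100, 10000)
print(round(lift_few, 3), round(lift_many, 3))
print(round(cosine_few, 3), round(cosine_many, 3))
```

Lift moves from just under 1 to above 9 when the null-transactions are added, while cosine stays at about 0.909: on these counts cosine behaves as a null-invariant measure and lift does not.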

Constraint-Based Association Mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.

• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.

• The constraints can include the following:
  • Knowledge type constraints
  • Data constraints
  • Dimension/level constraints
  • Interestingness constraints
  • Rule constraints

Metarule-Guided Mining of Association Rules

• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.

• Example of metarule-guided mining: finding associations between customer traits and the items that customers buy, where the analyst is interested only in determining which pairs of customer traits promote the sale of office software.

Constraint Pushing

Rule Constraints

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain all of them with an example in DMQL.
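
The first two categories can be illustrated with a small sketch in Python rather than DMQL. It assumes a hypothetical price list with nonnegative prices and checks, by brute force over every itemset of that list, that an upper bound on total price behaves antimonotonically and a lower bound behaves monotonically. The exhaustive check is only an illustration on this toy catalog, not a proof, and convertible and inconvertible constraints are not covered.

```python
from itertools import combinations

# Hypothetical item prices (all nonnegative).
prices = {"pen": 5, "notebook": 20, "printer": 150, "laptop": 900}


def total_price(itemset):
    return sum(prices[item] for item in itemset)


def powerset(items):
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_antimonotone(constraint, items):
    """If an itemset violates the constraint, so do all its supersets."""
    sets = powerset(items)
    return all(
        not constraint(y)
        for x in sets if not constraint(x)
        for y in sets if x <= y
    )


def is_monotone(constraint, items):
    """If an itemset satisfies the constraint, so do all its supersets."""
    sets = powerset(items)
    return all(
        constraint(y)
        for x in sets if constraint(x)
        for y in sets if x <= y
    )


def at_most_100(itemset):
    return total_price(itemset) <= 100   # sum(price) <= v


def at_least_100(itemset):
    return total_price(itemset) >= 100   # sum(price) >= v


print(is_antimonotone(at_most_100, prices), is_monotone(at_most_100, prices))
print(is_antimonotone(at_least_100, prices), is_monotone(at_least_100, prices))
```

An antimonotonic constraint can be pushed into a level-wise search in the same way as minimum support: once an itemset exceeds the price limit, none of its supersets needs to be generated.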

Page 2: Mining Frequent Itemset-Association Analysis

Mining Frequent Item set

Frequent patterns are patterns that appear in the dataset frequentlyFor example a set of items such as milk and bread that appear frequently together in a transaction data set is a frequent itemset

Frequent pattern mining searches for recurring relationships in a given data set

Market Basket Analysis

bull Frequent itemset mining leads to the discovery of associations and correlations among items in largetransactional or relational data sets

bull A example of frequent item set mining is market basket analysisThis process analyzes customer buying habits by finding association between items that customers place in their shopping baskets

Market Basket Analysis

Market Basket Analysisbull Market Basket analysis may be performed on the retail

data of customer transaction at your storeThe results can be used to plan marketing and advertising strategies or in the design of a new catalog

bull The patterns can be represented in the form of association rulesFor example the information that customers who purchase computers also tend to buy anti virus software at the same time is represented by the association rule

bull Computer=gtantivirus_software[support=2confidence=60]

Frequent ItemsetClosed Itemset and Association Rules

bull Let I=I1I2I3helliphellipIm) be a set of items Let D be the task relevant data or be a set database transaction where each transaction T is a set of items such as T c I

bull support(A=gtB)=P(AUB)bull Confidence(A=gtB)=P(BA)bull Rules that satisfy the minimum support

threshold and minimum confidenece threshold are said to be strong

Frequent ItemsetClosed Itemset and Association Rules

bull Occurrence-The occurrence frequency of an itemset is the number of transaction that contain the itemsetThis is called as frequencysuppoert count or count of the itemset

bull Itemset support is also referred to as relative support where as the occurrence frequency is called as the absolute support

Frequent ItemsetClosed Itemset and Association Rules

bull Confidence(A=gtB)=P(BA)=support(AUB)bull support_count(A)

bull Association Rules can be viewed as a 2 step process-

bull Find all the frequent itemsetbull Generate strong association rules from the

frequent itemset

Closed Frequent Itemset amp Maximal Frequent Item set

bull An item set X is a closed frequent itemset in a set S if there exist no proper super-itemset Y such as Y has a same support count as X in S

bull An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S

bull An itemset X is a maximal frequent itemset in set S if X frequent and there exists no super-itemset Y such that XCY and Y is frequent in S

Frequent Pattern miningbull Market basket is just one form of frequent pattern miningbull Frequent pattern mining can be classified in various ways

based on the following criteriabull Based on completeness of patterns to be minedbull Based on level of abstraction involved in the rule setbull Based on the number of data dimension involved in the

rulebull Base on the types of values handled in the rulebull Based on kind of rules to be minedbull Based on the kind of patterns to be mined

Efficient and Scalable Frequent Itemset Mining method

bull Apriori is the basic algorithm for designing frequent itemset

bull Apriori is a seminal algorithm proposed by RAgrawal and RSrikant in 1994 for mining frequent itemset for Boolean association rulesThe name of the algorithm is based on the fact that the algorithm usest prior knowledge of frequent item set properties

bull Apriori employs an iterative approach known as level wise search

Apriori property

bull All non empty subsets of a frequent itemset must also be frequentThe Apriori property is based on the fact that if the itemset doesnot satisfy the minimum support threshold min_sup then I is not frequent that

is P(I)lt min_supThe property is called antimonotone in the sense

that if the a set cannot pass the test all of its superset will fail the same test as well

Apriori Algorithm

Apriori Algo

bull Input-I ItemsetD Database of transactionS Support

bull OutputL Large itemset

Apriori Algo(Contdhellip)

Apriori algorrithmK=0 k is used as the scan numberL=nullC1=I intial candidate are set to be the itemsRepeatK=k+1Lk=nullFor each Ii in Ck do

Apriori algo(contd)Ci=0 Intial counts for each itemset are 0For each tj in D do For each Ii in Ck do if Ii C tj then Ci=Ci+1 For each Ii C Ck do

if ci gt= (s|D|) do

LK=LK UII

L=L U Lk

CK+1=Apriori-Gen(LK)

Until Ck+1= null

Generating Association Rules from Frequent Itemsets

Improving the Efficiency of Apriori

bull Hash based techniquebull Transaction reductionbull Partitioningbull Samplingbull Dynamic Item set Counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kind of Association Rules

Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit

ordering among values

Mining Multi Level Association Rules

bull Finding strong association rule at low or primitive level of abstraction is very difficult

bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction

bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor

A concept hierarchy for All Electronics

Multiple level Association Rules

bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules

bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework

bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Disadvantages of mining multilevel association rules

Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items

Disadvantages of mining multilevel association rules

Disadvantages of mining multilevel association rules

bull If a rule doesnrsquot provide any new information then it should be removed

bull A rule R1 is more generalized than rule R2so need to specify rule R2

Mining Multidimensional association rules from Relational Database and DW

Association rules that imply a single predicate that is the predicate buys

Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra

dimensional association rules

Mining Multidimensional association rules from Relational Database and DW

Mining multidimensional database association rules-

Associations rules that involves two or more dimensions or predicate are called multidimensional association rules

Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)

Two Basic Approaches

bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of

possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes

Two Basic Approaches

Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels

Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules

Mining Quantitative Association Rules

bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria

bull A(quant1) ^ A(quant2) =gt Acat

Association Rule Clustering System

bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition

bull The following steps are involved in ARCS-

Steps involved in ARCS

1 Binning- Quantitative attributes can have a very wide range of values defining their domain

The partioning process is called binning The intervals are considered as binsThe common binning strategies are-

Equal width binning Equal Frequency Binning Clustering based binning

Steps involved in ARCS(contd)

2 Finding frequent predicate sets-Once the distribution takes place later each category can be scanned to find most frquent itemset that statisfy minimum support and minimum confidence

3Clustering the association rules-the strong association rules are then mapped to 2-D grid

Steps involved in ARCS(contdhellip)

Steps involved in ARCS(contdhellip)

From Association Mining to Correlation Analysis

Even strong association rules can be misleadingSupport Confidence framework can be

supplemented by additional measure based on statistical significance and correlational analysis

Lift is a simple correlation measureThe occurrence of itemset A is independent of the

occurrence of itemset B if P(AUB)=P(A)P(B) otherwise A and B are correlated and dependent as event

From Association Mining to Correlation Analysis

lift(AB)= P(AUB)P(A)P(B)If lift(AB)lt1 then the occurrence of A is

negatively correlated with occurrence of BIf Lift(AB) gt1 then the occurrence of A is

positively correlated with occurrence of BIf Lift(AB)=1 then A and B are independent and

no correlation

From Association Mining to Correlation Analysis

P(BA)P(B) or con f(A=gtB)sup(B)

bull Correlation Analysis using lift-

Let game refer to tranasction that donot contains games And Video refer to transaction that donot contain videosThe transaction can be summarized in the contingency table

Probabiltiy of purchasing a computer game=60Probability of purchsing a video=75Probability of purchasing both=40By rule=406075=89So as by rule it is less than 1 so it is negatively correlated

Examples

Correlation analysis using X2bull c2 = Sbull (observed - expected)2expectedbull =bull (400010485764500)2bull 4500bull +bull (350010485763000)2bull 3000bull +bull (200010485761500)2bull 1500bull +bull (50010485761000)2bull 1000bull = 5556bull So it is negatively correlated

Other correlation measure

bull All_confidencebull Cosine

bull all_conf(X)=sup(X)mx_item_sup(x)

bull Cos(AB)=P(AUB)sqrt(P(A)P(B)=sup(AUB)sqrtsup(A)sup(B)

Comparison of four correlation measures on typical data set

bull A null transaction is a transaction that does not contain any of the itemsets being examined

bull A measure is null-variant if its value is free from the influence of null transaction

Constraint Based Association mining

bull A data mining process may uncover thousands of rules from a given datamost of which end up being unrelated or uninteresting to the users

bull A good heuristic is to have the users specify such intuition or expectation as constraints to confine the search spaceThis strategy is known as constraint based mining

Constraint Based Association mining

bull The constraint can include the following-Knowledge type constraintData constraintDimensionlevel constraintInterestingness constraintRule constraint

metarule-Guided Mining of association rules

bull Metarules allows users to specify the syntatic form of rules that they are interested in mining

bull Metarule-guided mining-Finding association between customer traits and

the items that customers buyOnly interested in determining which pairs of customer traits promote the sale of office software

Constraint Pushing

Rule Constraint

bull Antimonotonicbull Monotonicbull Convertiblebull Inconvertible

Explain all of them with example of DMQL

  • Mining Frequent itemset
  • Mining Frequent Item set
  • Market Basket Analysis
  • Market Basket Analysis (2)
  • Market Basket Analysis (3)
  • Frequent ItemsetClosed Itemset and Association Rules
  • Frequent ItemsetClosed Itemset and Association Rules (2)
  • Frequent ItemsetClosed Itemset and Association Rules (3)
  • Closed Frequent Itemset amp Maximal Frequent Item set
  • Frequent Pattern mining
  • Efficient and Scalable Frequent Itemset Mining method
  • Apriori property
  • Apriori Algorithm
  • Slide 14
  • Apriori Algo
  • Apriori Algo(Contdhellip)
  • Apriori algo(contd)
  • Generating Association Rules from Frequent Itemsets
  • Improving the Efficiency of Apriori
  • Mining Frequent Itemset without Candidate Generation(FP Growth)
  • FP Growth
  • FP Growth Algo
  • Mining Frequent Itemset using Vertical Format
  • Mining frequent Itemset using Vertical Data Format
  • Slide 25
  • Slide 26
  • Mining Various Kind of Association Rules
  • Mining Multi Level Association Rules
  • A concept hierarchy for All Electronics
  • Multiple level Association Rules
  • Using uniform minimum support for all levels
  • Using reduced minimum support at lower levels
  • Disadvantages of mining multilevel association rules
  • Disadvantages of mining multilevel association rules (2)
  • Disadvantages of mining multilevel association rules (3)
  • Mining Multidimensional association rules from Relational Datab
  • Mining Multidimensional association rules from Relational Datab (2)
  • Two Basic Approaches
  • Two Basic Approaches (2)
  • Mining Multidimensional Association Rules using Static Discreti
  • Slide 41
  • Mining Quantitative Association Rules
  • Association Rule Clustering System
  • Steps involved in ARCS
  • Steps involved in ARCS(contd)
  • Steps involved in ARCS(contdhellip)
  • Steps involved in ARCS(contdhellip) (2)
  • From Association Mining to Correlation Analysis
  • From Association Mining to Correlation Analysis (2)
  • From Association Mining to Correlation Analysis (3)
  • Examples
  • Correlation analysis using X2
  • Other correlation measure
  • Comparison of four correlation measures on typical data set
  • Constraint Based Association mining
  • Constraint Based Association mining (2)
  • metarule-Guided Mining of association rules
  • Constraint Pushing
  • Rule Constraint
Page 3: Mining Frequent Itemset-Association Analysis

Market Basket Analysis

bull Frequent itemset mining leads to the discovery of associations and correlations among items in largetransactional or relational data sets

bull A example of frequent item set mining is market basket analysisThis process analyzes customer buying habits by finding association between items that customers place in their shopping baskets

Market Basket Analysis

Market Basket Analysisbull Market Basket analysis may be performed on the retail

data of customer transaction at your storeThe results can be used to plan marketing and advertising strategies or in the design of a new catalog

bull The patterns can be represented in the form of association rulesFor example the information that customers who purchase computers also tend to buy anti virus software at the same time is represented by the association rule

bull Computer=gtantivirus_software[support=2confidence=60]

Frequent ItemsetClosed Itemset and Association Rules

bull Let I=I1I2I3helliphellipIm) be a set of items Let D be the task relevant data or be a set database transaction where each transaction T is a set of items such as T c I

bull support(A=gtB)=P(AUB)bull Confidence(A=gtB)=P(BA)bull Rules that satisfy the minimum support

threshold and minimum confidenece threshold are said to be strong

Frequent ItemsetClosed Itemset and Association Rules

bull Occurrence-The occurrence frequency of an itemset is the number of transaction that contain the itemsetThis is called as frequencysuppoert count or count of the itemset

bull Itemset support is also referred to as relative support where as the occurrence frequency is called as the absolute support

Frequent ItemsetClosed Itemset and Association Rules

bull Confidence(A=gtB)=P(BA)=support(AUB)bull support_count(A)

bull Association Rules can be viewed as a 2 step process-

bull Find all the frequent itemsetbull Generate strong association rules from the

frequent itemset

Closed Frequent Itemset amp Maximal Frequent Item set

bull An item set X is a closed frequent itemset in a set S if there exist no proper super-itemset Y such as Y has a same support count as X in S

bull An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S

bull An itemset X is a maximal frequent itemset in set S if X frequent and there exists no super-itemset Y such that XCY and Y is frequent in S

Frequent Pattern Mining

• Market basket analysis is just one form of frequent pattern mining.
• Frequent pattern mining can be classified in various ways, based on the following criteria:
  • the completeness of the patterns to be mined
  • the levels of abstraction involved in the rule set
  • the number of data dimensions involved in the rule
  • the types of values handled in the rule
  • the kinds of rules to be mined
  • the kinds of patterns to be mined

Efficient and Scalable Frequent Itemset Mining Methods

• Apriori is the basic algorithm for finding frequent itemsets.

• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules. The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties.

• Apriori employs an iterative approach known as level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets.

Apriori property

• All nonempty subsets of a frequent itemset must also be frequent.

• The Apriori property is based on the following observation: if an itemset I does not satisfy the minimum support threshold min_sup, then I is not frequent, that is, $P(I) < \text{min\_sup}$. Adding an item A to I cannot make the resulting itemset occur more often than I, so $I \cup A$ is not frequent either.

• The property is called antimonotone in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well.

Apriori Algorithm

Apriori Algo

• Input:
    I: itemsets
    D: database of transactions
    s: support
• Output:
    L: large itemsets

Apriori algorithm:

    k = 0                          // k is used as the scan number
    L = ∅
    C_1 = I                        // initial candidates are set to be the items
    repeat
        k = k + 1
        L_k = ∅
        for each I_i in C_k do
            c_i = 0                // initial counts for each itemset are 0
        for each t_j in D do
            for each I_i in C_k do
                if I_i ⊆ t_j then
                    c_i = c_i + 1
        for each I_i in C_k do
            if c_i >= (s × |D|) then
                L_k = L_k ∪ {I_i}
        L = L ∪ L_k
        C_{k+1} = Apriori-Gen(L_k)
    until C_{k+1} = ∅
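The pseudocode above can be turned into a short runnable sketch. The version below is a simplified illustration rather than an optimized implementation: the join step in `apriori_gen` simply unions any two frequent (k-1)-itemsets whose union has k items and then applies the subset-based prune, and the four-transaction database is a hypothetical example.

```python
from itertools import combinations


def apriori_gen(prev_level, k):
    """Build candidate k-itemsets from frequent (k-1)-itemsets (join + prune)."""
    prev = set(prev_level)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k:
                # Prune: every (k-1)-subset of a candidate must be frequent.
                if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                    candidates.add(union)
    return candidates


def apriori(transactions, s):
    """Return {itemset: support_count} for all itemsets with support >= s."""
    db = [frozenset(t) for t in transactions]
    min_count = s * len(db)
    candidates = {frozenset([item]) for t in db for item in t}  # C_1
    large = {}
    k = 1
    while candidates:
        counts = {c: 0 for c in candidates}
        for t in db:                       # one database scan per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}  # L_k
        large.update(level)
        k += 1
        candidates = apriori_gen(level.keys(), k)                    # C_{k+1}
    return large


# Hypothetical database of four transactions, minimum support 50%.
result = apriori([{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}], s=0.5)
print(result[frozenset({2, 3, 5})])  # 2
```

On this input the candidate {1, 2, 3} is never counted: its subset {1, 2} is not frequent, so the prune step discards it before the database is scanned.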

Generating Association Rules from Frequent Itemsets
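A minimal sketch of this step, assuming the frequent itemsets and their support counts are already available as a dictionary: for each frequent itemset, every nonempty proper subset is tried as an antecedent, and the rule is kept when its confidence reaches the threshold. The function name, the `frequent` dictionary and the threshold are hypothetical.

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for every strong rule.

    `frequent` maps each frequent itemset (a frozenset) to its support count
    and must contain every nonempty subset of each itemset it lists.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # confidence(A => B) = support_count(A u B) / support_count(A)
                conf = count / frequent[a]
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules


# Hypothetical support counts for a small set of frequent itemsets.
frequent = {
    frozenset({2}): 3, frozenset({3}): 3, frozenset({5}): 3,
    frozenset({2, 3}): 2, frozenset({2, 5}): 3, frozenset({3, 5}): 2,
    frozenset({2, 3, 5}): 2,
}
rules = generate_rules(frequent, min_conf=1.0)
print(len(rules))  # 4
```

Because every rule is built from an itemset that is already frequent, each generated rule automatically satisfies minimum support; only confidence has to be checked.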

Improving the Efficiency of Apriori

• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format
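In the vertical data format each item is stored with the set of IDs of the transactions that contain it (its TID set), and the support count of an itemset is the size of the intersection of its items' TID sets. The sketch below follows that idea in the style of the Eclat algorithm; the transaction IDs, the data and the function names are assumptions made for illustration.

```python
def to_vertical(transactions):
    """Map each item to its TID set: the IDs of the transactions containing it."""
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def mine_vertical(transactions, min_count):
    """Depth-first mining by intersecting TID sets; returns {itemset: count}."""
    vertical = to_vertical(transactions)
    result = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            new_tids = tids if prefix_tids is None else prefix_tids & tids
            if len(new_tids) >= min_count:
                itemset = prefix | {item}
                result[itemset] = len(new_tids)
                # Only later items are tried, so each itemset is built once.
                extend(itemset, new_tids, candidates[i + 1:])

    extend(frozenset(), None, sorted(vertical.items()))
    return result


# Hypothetical horizontal database keyed by transaction ID.
horizontal = {
    "T100": {1, 3, 4},
    "T200": {2, 3, 5},
    "T300": {1, 2, 3, 5},
    "T400": {2, 5},
}
found = mine_vertical(horizontal, min_count=2)
print(found[frozenset({2, 3, 5})])  # 2
```

After the initial conversion no further database scan is needed: the TID set carried along with each itemset already holds all the information required to count its extensions.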

Mining Various Kinds of Association Rules

• Multilevel association rules: involve concepts at different levels of abstraction.
• Multidimensional association rules: involve more than one dimension or predicate.
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values.

Mining Multilevel Association Rules

• Finding strong association rules at low or primitive levels of abstraction is often difficult, because the data at those levels tend to be sparse.

• Data mining systems should provide capabilities for mining association rules at multiple levels of abstraction.

• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (their ancestors).

A concept hierarchy for All Electronics

Multiple-Level Association Rules

• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.

• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework.

• A top-down strategy is employed, where counts are accumulated for the calculation of frequent itemsets at each concept level, starting at the most general level and working downward toward the more specific levels.

Using uniform minimum support for all levels

Using reduced minimum support at lower levels
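The two threshold strategies named in the headings above can be compared with a small sketch. It works on single items only, and the concept hierarchy `PARENT`, the transactions and the thresholds (0.5 at every level versus 0.3 at the lower level) are all assumptions made for illustration.

```python
# Hypothetical two-level concept hierarchy: item -> ancestor.
PARENT = {
    "laptop": "computer",
    "desktop": "computer",
    "inkjet": "printer",
    "laser": "printer",
}

transactions = [
    {"laptop", "inkjet"},
    {"desktop", "inkjet"},
    {"laptop", "laser"},
    {"desktop"},
    {"laptop", "inkjet"},
]


def generalize(transaction):
    """Replace every item by its level-1 ancestor."""
    return {PARENT[item] for item in transaction}


def frequent_items(db, min_sup):
    """Return {item: relative support} for items meeting min_sup."""
    counts = {}
    for t in db:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    return {i: c / len(db) for i, c in counts.items() if c / len(db) >= min_sup}


# Top-down: mine level 1 first, then descend only below frequent ancestors.
level1 = frequent_items([generalize(t) for t in transactions], min_sup=0.5)
level2_uniform = {i: s for i, s in frequent_items(transactions, 0.5).items()
                  if PARENT[i] in level1}
level2_reduced = {i: s for i, s in frequent_items(transactions, 0.3).items()
                  if PARENT[i] in level1}

print(sorted(level2_uniform))  # ['inkjet', 'laptop']
print(sorted(level2_reduced))  # ['desktop', 'inkjet', 'laptop']
```

With a uniform threshold the item desktop is missed at the lower level even though its ancestor computer is frequent; lowering the threshold at that level recovers it, at the cost of examining more candidates.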

Disadvantages of mining multilevel association rules

• Redundant rules are generated across multiple levels of abstraction because of the ancestor relationships among items.

• If a rule does not provide any new information, it should be removed.

• When a rule R1 is more general than a rule R2 (R1 is an ancestor of R2) and R2 adds nothing beyond what R1 already implies, R2 need not be reported.

Mining Multidimensional association rules from Relational Database and DW

Some association rules involve a single predicate, here the predicate buys:

buys(X, "digital camera") => buys(X, "HP printer")

Such rules are called single-dimensional or intradimensional association rules, because they contain a single distinct predicate that occurs more than once.

Mining Multidimensional association rules from Relational Database and DW

Mining multidimensional association rules:

Association rules that involve two or more dimensions or predicates are called multidimensional association rules, for example:

age(X, "20…29") ^ buys(X, "laptop") => buys(X, "HP printer")

Two Basic Approaches

• Database attributes can be of two types: categorical and quantitative.

• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.

Two Basic Approaches

• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.

The multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.

Mining Quantitative Association Rules

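• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criteria.

• A typical form has two quantitative attributes on the left-hand side and one categorical attribute on the right-hand side: $A_{\text{quan1}} \wedge A_{\text{quan2}} \Rightarrow A_{\text{cat}}$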

Association Rule Clustering System

• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.

• The following steps are involved in ARCS:

Steps involved in ARCS

1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals.

The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:

• Equal-width binning
• Equal-frequency binning
• Clustering-based binning
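The first two strategies can be sketched as follows. This is an illustration under simple assumptions: the values are not all equal (otherwise the equal-width interval size would be zero), bins are returned as integer indices, and the `ages` list is hypothetical. Clustering-based binning is not shown.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index using n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bin indices so each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


ages = [21, 22, 23, 25, 29, 35, 48, 61]  # hypothetical attribute values
print(equal_width_bins(ages, 4))      # [0, 0, 0, 0, 0, 1, 2, 3]
print(equal_frequency_bins(ages, 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
```

On skewed data such as these ages, equal-width binning puts most values into the first bin, while equal-frequency binning spreads them evenly but produces intervals of very different sizes.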

Steps involved in ARCS(contd)

2. Finding frequent predicate sets: once the count distribution for each category has been set up, it can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.

3. Clustering the association rules: the strong association rules are then mapped to a 2-D grid.

Steps involved in ARCS(contdhellip)

Steps involved in ARCS(contdhellip)

From Association Mining to Correlation Analysis

Even strong association rules can be misleading.

The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.

Lift is a simple correlation measure.

The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)P(B)$; otherwise A and B are dependent and correlated as events.

From Association Mining to Correlation Analysis

$\text{lift}(A, B) = \dfrac{P(A \cup B)}{P(A)P(B)}$

If lift(A, B) < 1, then the occurrence of A is negatively correlated with the occurrence of B.

If lift(A, B) > 1, then the occurrence of A is positively correlated with the occurrence of B.

If lift(A, B) = 1, then A and B are independent and there is no correlation between them.

From Association Mining to Correlation Analysis

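Lift can equivalently be written as $\dfrac{P(B \mid A)}{P(B)}$ or $\dfrac{\text{conf}(A \Rightarrow B)}{\text{sup}(B)}$.

• Correlation analysis using lift:

Let $\overline{\text{game}}$ refer to the transactions that do not contain computer games, and $\overline{\text{video}}$ refer to those that do not contain videos. The transactions can be summarized in a contingency table.

Probability of purchasing a computer game = 0.60
Probability of purchasing a video = 0.75
Probability of purchasing both = 0.40
By the rule, lift = 0.40 / (0.60 × 0.75) = 0.89
Because the lift is less than 1, the occurrences of game and video are negatively correlated.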
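The same calculation can be checked with a few lines of Python. The three probabilities come from the example above; the function names and the tolerance used to test for independence are assumptions.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A u B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


def interpret(value, tol=1e-9):
    """Classify a lift value; tol is an assumed tolerance around 1."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"


game_video = lift(p_a=0.60, p_b=0.75, p_ab=0.40)
print(round(game_video, 2), interpret(game_video))  # 0.89 negatively correlated
```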

Examples

Correlation analysis using $\chi^2$

$\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}} = \dfrac{(4000 - 4500)^2}{4500} + \dfrac{(3500 - 3000)^2}{3000} + \dfrac{(2000 - 1500)^2}{1500} + \dfrac{(500 - 1000)^2}{1000} = 555.6$

The observed count for buying both a game and a video (4000) is below its expected count (4500), so the two are negatively correlated.
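The expected counts used above need not be supplied by hand: they follow from the row and column totals of the contingency table. The sketch below recomputes the statistic from the four observed counts in the calculation; the arrangement of those counts into rows (video, no video) and columns (game, no game) is an assumption, chosen so that the derived expected counts match the ones shown.

```python
# Observed counts from the chi-square computation above; the row/column
# layout (rows: video / no video, columns: game / no game) is assumed.
observed = [[4000, 3500],
            [2000, 500]]


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of the row and column events.
            expected = row_sums[i] * col_sums[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


print(round(chi_square(observed), 1))  # 555.6
```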

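Other correlation measures

• All_confidence
• Cosine

• $\text{all\_conf}(X) = \dfrac{\text{sup}(X)}{\text{max\_item\_sup}(X)}$, where max_item_sup(X) is the largest support of any single item in X

• $\text{cosine}(A, B) = \dfrac{P(A \cup B)}{\sqrt{P(A) \times P(B)}} = \dfrac{\text{sup}(A \cup B)}{\sqrt{\text{sup}(A) \times \text{sup}(B)}}$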

Comparison of four correlation measures on typical data set

• A null-transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null-transactions.
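The effect of null-transactions can be seen by computing the measures twice on the same itemset counts while changing only the total number of transactions. The counts below are hypothetical, and the sketch covers lift, all_confidence and cosine for a pair of itemsets A and B.

```python
from math import sqrt


def measures(n_ab, n_a, n_b, n_total):
    """Return lift, all_confidence and cosine from absolute counts.

    n_ab: transactions containing both A and B
    n_a, n_b: transactions containing A, respectively B
    n_total: all transactions, including null-transactions
    """
    p_ab, p_a, p_b = n_ab / n_total, n_a / n_total, n_b / n_total
    return {
        "lift": p_ab / (p_a * p_b),
        "all_conf": p_ab / max(p_a, p_b),
        "cosine": p_ab / sqrt(p_a * p_b),
    }


# Hypothetical counts: identical A/B transactions, but the second database
# contains many more null-transactions (neither A nor B).
few_nulls = measures(n_ab=100, n_a=200, n_b=200, n_total=400)
many_nulls = measures(n_ab=100, n_a=200, n_b=200, n_total=40000)

for name in ("lift", "all_conf", "cosine"):
    print(name, round(few_nulls[name], 3), round(many_nulls[name], 3))
```

In this example lift moves from about 1 to about 100 purely because null-transactions were added, whereas all_confidence and cosine stay at 0.5 in both databases, which is the behaviour the definition of null-invariance describes.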

Constraint Based Association mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.

• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.

Constraint Based Association mining

• The constraints can include the following:
  • Knowledge type constraints
  • Data constraints
  • Dimension/level constraints
  • Interestingness constraints
  • Rule constraints

metarule-Guided Mining of association rules

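• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.

• Metarule-guided mining example: when looking for associations between customer traits and the items that customers buy, the analyst may be interested only in determining which pairs of customer traits promote the sale of office software. Such a metarule can be written as $P_1(X, Y) \wedge P_2(X, W) \Rightarrow \text{buys}(X, \text{"office software"})$, where $P_1$ and $P_2$ are predicate variables.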

Constraint Pushing

Rule Constraint

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain all of them with example of DMQL
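The first two categories can be illustrated with a constraint on the total price of an itemset, written here in Python rather than DMQL. The prices, limits and function names are hypothetical, and convertible and inconvertible constraints are not covered by this sketch.

```python
# Hypothetical item prices (non-negative), assumed for illustration.
PRICE = {"pen": 2, "ink": 15, "paper": 5, "printer": 120}


def total_price(itemset):
    return sum(PRICE[i] for i in itemset)


def within_budget(itemset, limit=20):
    """Antimonotonic: once violated, every superset also violates it."""
    return total_price(itemset) <= limit


def at_least(itemset, floor=100):
    """Monotonic: once satisfied, every superset also satisfies it."""
    return total_price(itemset) >= floor


def grow(itemset, constraint):
    """Extend an itemset by one item, keeping only extensions that pass."""
    return [itemset | {i} for i in PRICE
            if i not in itemset and constraint(itemset | {i})]


print(grow(frozenset({"pen"}), within_budget))           # two extensions survive
print(grow(frozenset({"ink", "paper"}), within_budget))  # []
```

An antimonotonic constraint can be pushed into the mining loop as a pruning test, exactly like minimum support: an itemset that exceeds the budget is dropped together with all of its supersets. A monotonic constraint works the other way round, since it no longer needs to be rechecked once an itemset satisfies it.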

Page 4: Mining Frequent Itemset-Association Analysis

Market Basket Analysis

Market Basket Analysisbull Market Basket analysis may be performed on the retail

data of customer transaction at your storeThe results can be used to plan marketing and advertising strategies or in the design of a new catalog

bull The patterns can be represented in the form of association rulesFor example the information that customers who purchase computers also tend to buy anti virus software at the same time is represented by the association rule

bull Computer=gtantivirus_software[support=2confidence=60]

Frequent ItemsetClosed Itemset and Association Rules

bull Let I=I1I2I3helliphellipIm) be a set of items Let D be the task relevant data or be a set database transaction where each transaction T is a set of items such as T c I

bull support(A=gtB)=P(AUB)bull Confidence(A=gtB)=P(BA)bull Rules that satisfy the minimum support

threshold and minimum confidenece threshold are said to be strong

Frequent ItemsetClosed Itemset and Association Rules

bull Occurrence-The occurrence frequency of an itemset is the number of transaction that contain the itemsetThis is called as frequencysuppoert count or count of the itemset

bull Itemset support is also referred to as relative support where as the occurrence frequency is called as the absolute support

Frequent ItemsetClosed Itemset and Association Rules

bull Confidence(A=gtB)=P(BA)=support(AUB)bull support_count(A)

bull Association Rules can be viewed as a 2 step process-

bull Find all the frequent itemsetbull Generate strong association rules from the

frequent itemset

Closed Frequent Itemset amp Maximal Frequent Item set

bull An item set X is a closed frequent itemset in a set S if there exist no proper super-itemset Y such as Y has a same support count as X in S

bull An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S

bull An itemset X is a maximal frequent itemset in set S if X frequent and there exists no super-itemset Y such that XCY and Y is frequent in S

Frequent Pattern miningbull Market basket is just one form of frequent pattern miningbull Frequent pattern mining can be classified in various ways

based on the following criteriabull Based on completeness of patterns to be minedbull Based on level of abstraction involved in the rule setbull Based on the number of data dimension involved in the

rulebull Base on the types of values handled in the rulebull Based on kind of rules to be minedbull Based on the kind of patterns to be mined

Efficient and Scalable Frequent Itemset Mining method

bull Apriori is the basic algorithm for designing frequent itemset

bull Apriori is a seminal algorithm proposed by RAgrawal and RSrikant in 1994 for mining frequent itemset for Boolean association rulesThe name of the algorithm is based on the fact that the algorithm usest prior knowledge of frequent item set properties

bull Apriori employs an iterative approach known as level wise search

Apriori property

bull All non empty subsets of a frequent itemset must also be frequentThe Apriori property is based on the fact that if the itemset doesnot satisfy the minimum support threshold min_sup then I is not frequent that

is P(I)lt min_supThe property is called antimonotone in the sense

that if the a set cannot pass the test all of its superset will fail the same test as well

Apriori Algorithm

Apriori Algo

bull Input-I ItemsetD Database of transactionS Support

bull OutputL Large itemset

Apriori Algo(Contdhellip)

Apriori algorrithmK=0 k is used as the scan numberL=nullC1=I intial candidate are set to be the itemsRepeatK=k+1Lk=nullFor each Ii in Ck do

Apriori algo(contd)Ci=0 Intial counts for each itemset are 0For each tj in D do For each Ii in Ck do if Ii C tj then Ci=Ci+1 For each Ii C Ck do

if ci gt= (s|D|) do

LK=LK UII

L=L U Lk

CK+1=Apriori-Gen(LK)

Until Ck+1= null

Generating Association Rules from Frequent Itemsets

Improving the Efficiency of Apriori

bull Hash based techniquebull Transaction reductionbull Partitioningbull Samplingbull Dynamic Item set Counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kind of Association Rules

Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit

ordering among values

Mining Multi Level Association Rules

bull Finding strong association rule at low or primitive level of abstraction is very difficult

bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction

bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor

A concept hierarchy for All Electronics

Multiple level Association Rules

bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules

bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework

bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Disadvantages of mining multilevel association rules

Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items

Disadvantages of mining multilevel association rules

Disadvantages of mining multilevel association rules

bull If a rule doesnrsquot provide any new information then it should be removed

bull A rule R1 is more generalized than rule R2so need to specify rule R2

Mining Multidimensional association rules from Relational Database and DW

Association rules that imply a single predicate that is the predicate buys

Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra

dimensional association rules

Mining Multidimensional association rules from Relational Database and DW

Mining multidimensional database association rules-

Associations rules that involves two or more dimensions or predicate are called multidimensional association rules

Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)

Two Basic Approaches

bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of

possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes

Two Basic Approaches

Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels

Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules

Mining Quantitative Association Rules

bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria

bull A(quant1) ^ A(quant2) =gt Acat

Association Rule Clustering System

bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition

bull The following steps are involved in ARCS-

Steps involved in ARCS

1 Binning- Quantitative attributes can have a very wide range of values defining their domain

The partioning process is called binning The intervals are considered as binsThe common binning strategies are-

Equal width binning Equal Frequency Binning Clustering based binning

Steps involved in ARCS(contd)

2 Finding frequent predicate sets-Once the distribution takes place later each category can be scanned to find most frquent itemset that statisfy minimum support and minimum confidence

3Clustering the association rules-the strong association rules are then mapped to 2-D grid

Steps involved in ARCS(contdhellip)

Steps involved in ARCS(contdhellip)

From Association Mining to Correlation Analysis

Even strong association rules can be misleadingSupport Confidence framework can be

supplemented by additional measure based on statistical significance and correlational analysis

Lift is a simple correlation measureThe occurrence of itemset A is independent of the

occurrence of itemset B if P(AUB)=P(A)P(B) otherwise A and B are correlated and dependent as event

From Association Mining to Correlation Analysis

lift(AB)= P(AUB)P(A)P(B)If lift(AB)lt1 then the occurrence of A is

negatively correlated with occurrence of BIf Lift(AB) gt1 then the occurrence of A is

positively correlated with occurrence of BIf Lift(AB)=1 then A and B are independent and

no correlation

From Association Mining to Correlation Analysis

P(BA)P(B) or con f(A=gtB)sup(B)

bull Correlation Analysis using lift-

Let game refer to tranasction that donot contains games And Video refer to transaction that donot contain videosThe transaction can be summarized in the contingency table

Probabiltiy of purchasing a computer game=60Probability of purchsing a video=75Probability of purchasing both=40By rule=406075=89So as by rule it is less than 1 so it is negatively correlated

Examples

Correlation analysis using X2bull c2 = Sbull (observed - expected)2expectedbull =bull (400010485764500)2bull 4500bull +bull (350010485763000)2bull 3000bull +bull (200010485761500)2bull 1500bull +bull (50010485761000)2bull 1000bull = 5556bull So it is negatively correlated

Other correlation measure

bull All_confidencebull Cosine

bull all_conf(X)=sup(X)mx_item_sup(x)

bull Cos(AB)=P(AUB)sqrt(P(A)P(B)=sup(AUB)sqrtsup(A)sup(B)

Comparison of four correlation measures on typical data set

bull A null transaction is a transaction that does not contain any of the itemsets being examined

bull A measure is null-variant if its value is free from the influence of null transaction

Constraint Based Association mining

bull A data mining process may uncover thousands of rules from a given datamost of which end up being unrelated or uninteresting to the users

bull A good heuristic is to have the users specify such intuition or expectation as constraints to confine the search spaceThis strategy is known as constraint based mining

Constraint Based Association mining

bull The constraint can include the following-Knowledge type constraintData constraintDimensionlevel constraintInterestingness constraintRule constraint

metarule-Guided Mining of association rules

bull Metarules allows users to specify the syntatic form of rules that they are interested in mining

bull Metarule-guided mining-Finding association between customer traits and

the items that customers buyOnly interested in determining which pairs of customer traits promote the sale of office software

Constraint Pushing

Rule Constraint

bull Antimonotonicbull Monotonicbull Convertiblebull Inconvertible

Explain all of them with example of DMQL

  • Mining Frequent itemset
  • Mining Frequent Item set
  • Market Basket Analysis
  • Market Basket Analysis (2)
  • Market Basket Analysis (3)
  • Frequent ItemsetClosed Itemset and Association Rules
  • Frequent ItemsetClosed Itemset and Association Rules (2)
  • Frequent ItemsetClosed Itemset and Association Rules (3)
  • Closed Frequent Itemset amp Maximal Frequent Item set
  • Frequent Pattern mining
  • Efficient and Scalable Frequent Itemset Mining method
  • Apriori property
  • Apriori Algorithm
  • Slide 14
  • Apriori Algo
  • Apriori Algo(Contdhellip)
  • Apriori algo(contd)
  • Generating Association Rules from Frequent Itemsets
  • Improving the Efficiency of Apriori
  • Mining Frequent Itemset without Candidate Generation(FP Growth)
  • FP Growth
  • FP Growth Algo
  • Mining Frequent Itemset using Vertical Format
  • Mining frequent Itemset using Vertical Data Format
  • Slide 25
  • Slide 26
  • Mining Various Kind of Association Rules
  • Mining Multi Level Association Rules
  • A concept hierarchy for All Electronics
  • Multiple level Association Rules
  • Using uniform minimum support for all levels
  • Using reduced minimum support at lower levels
  • Disadvantages of mining multilevel association rules
  • Disadvantages of mining multilevel association rules (2)
  • Disadvantages of mining multilevel association rules (3)
  • Mining Multidimensional association rules from Relational Datab
  • Mining Multidimensional association rules from Relational Datab (2)
  • Two Basic Approaches
  • Two Basic Approaches (2)
  • Mining Multidimensional Association Rules using Static Discreti
  • Slide 41
  • Mining Quantitative Association Rules
  • Association Rule Clustering System
  • Steps involved in ARCS
  • Steps involved in ARCS(contd)
  • Steps involved in ARCS(contdhellip)
  • Steps involved in ARCS(contdhellip) (2)
  • From Association Mining to Correlation Analysis
  • From Association Mining to Correlation Analysis (2)
  • From Association Mining to Correlation Analysis (3)
  • Examples
  • Correlation analysis using X2
  • Other correlation measure
  • Comparison of four correlation measures on typical data set
  • Constraint Based Association mining
  • Constraint Based Association mining (2)
  • metarule-Guided Mining of association rules
  • Constraint Pushing
  • Rule Constraint
Page 5: Mining Frequent Itemset-Association Analysis

Market Basket Analysisbull Market Basket analysis may be performed on the retail

data of customer transaction at your storeThe results can be used to plan marketing and advertising strategies or in the design of a new catalog

bull The patterns can be represented in the form of association rulesFor example the information that customers who purchase computers also tend to buy anti virus software at the same time is represented by the association rule

bull Computer=gtantivirus_software[support=2confidence=60]

Frequent ItemsetClosed Itemset and Association Rules

bull Let I=I1I2I3helliphellipIm) be a set of items Let D be the task relevant data or be a set database transaction where each transaction T is a set of items such as T c I

bull support(A=gtB)=P(AUB)bull Confidence(A=gtB)=P(BA)bull Rules that satisfy the minimum support

threshold and minimum confidenece threshold are said to be strong

Frequent ItemsetClosed Itemset and Association Rules

bull Occurrence-The occurrence frequency of an itemset is the number of transaction that contain the itemsetThis is called as frequencysuppoert count or count of the itemset

bull Itemset support is also referred to as relative support where as the occurrence frequency is called as the absolute support

Frequent ItemsetClosed Itemset and Association Rules

bull Confidence(A=gtB)=P(BA)=support(AUB)bull support_count(A)

bull Association Rules can be viewed as a 2 step process-

bull Find all the frequent itemsetbull Generate strong association rules from the

frequent itemset

Closed Frequent Itemset amp Maximal Frequent Item set

bull An item set X is a closed frequent itemset in a set S if there exist no proper super-itemset Y such as Y has a same support count as X in S

bull An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S

bull An itemset X is a maximal frequent itemset in set S if X frequent and there exists no super-itemset Y such that XCY and Y is frequent in S

Frequent Pattern miningbull Market basket is just one form of frequent pattern miningbull Frequent pattern mining can be classified in various ways

based on the following criteriabull Based on completeness of patterns to be minedbull Based on level of abstraction involved in the rule setbull Based on the number of data dimension involved in the

rulebull Base on the types of values handled in the rulebull Based on kind of rules to be minedbull Based on the kind of patterns to be mined

Efficient and Scalable Frequent Itemset Mining method

bull Apriori is the basic algorithm for designing frequent itemset

bull Apriori is a seminal algorithm proposed by RAgrawal and RSrikant in 1994 for mining frequent itemset for Boolean association rulesThe name of the algorithm is based on the fact that the algorithm usest prior knowledge of frequent item set properties

bull Apriori employs an iterative approach known as level wise search

Apriori property

bull All non empty subsets of a frequent itemset must also be frequentThe Apriori property is based on the fact that if the itemset doesnot satisfy the minimum support threshold min_sup then I is not frequent that

is P(I)lt min_supThe property is called antimonotone in the sense

that if the a set cannot pass the test all of its superset will fail the same test as well

Apriori Algorithm

Apriori Algo

bull Input-I ItemsetD Database of transactionS Support

bull OutputL Large itemset

Apriori Algo(Contdhellip)

Apriori algorrithmK=0 k is used as the scan numberL=nullC1=I intial candidate are set to be the itemsRepeatK=k+1Lk=nullFor each Ii in Ck do

Apriori algo(contd)Ci=0 Intial counts for each itemset are 0For each tj in D do For each Ii in Ck do if Ii C tj then Ci=Ci+1 For each Ii C Ck do

if ci gt= (s|D|) do

LK=LK UII

L=L U Lk

CK+1=Apriori-Gen(LK)

Until Ck+1= null

Generating Association Rules from Frequent Itemsets

Improving the Efficiency of Apriori

bull Hash based techniquebull Transaction reductionbull Partitioningbull Samplingbull Dynamic Item set Counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kind of Association Rules

Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit

ordering among values

Mining Multi Level Association Rules

bull Finding strong association rule at low or primitive level of abstraction is very difficult

bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction

bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor

A concept hierarchy for All Electronics

Multiple level Association Rules

bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules

bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework

bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Disadvantages of mining multilevel association rules

Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items

Disadvantages of mining multilevel association rules

Disadvantages of mining multilevel association rules

bull If a rule doesnrsquot provide any new information then it should be removed

bull A rule R1 is more generalized than rule R2so need to specify rule R2

Mining Multidimensional association rules from Relational Database and DW

Association rules that imply a single predicate that is the predicate buys

Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra

dimensional association rules

Mining Multidimensional association rules from Relational Database and DW

Mining multidimensional database association rules-

Associations rules that involves two or more dimensions or predicate are called multidimensional association rules

Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)

Two Basic Approaches

• Database attributes can be of two types: categorical and quantitative.

• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.

• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.

Multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.

Mining Quantitative Association Rules

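• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criteria, such as maximizing the confidence or compactness of the rules mined.

• A_quan1 ^ A_quan2 => A_cat, where A_quan1 and A_quan2 are tests on quantitative attribute intervals and A_cat tests a categorical attribute.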

Association Rule Clustering System

• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.

• The following steps are involved in ARCS:

Steps involved in ARCS

1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:

• Equal-width binning
• Equal-frequency binning
• Clustering-based binning

2. Finding frequent predicate sets: Once the 2-D array containing the count distribution for each category is set up, it can be scanned to find the frequent predicate sets (those satisfying minimum support) that also satisfy minimum confidence.

3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
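The binning step (step 1) can be illustrated with a short Python sketch. The function names and the sample ages are assumptions made for this example, and the sketch assumes the values are not all equal (otherwise the bin width would be zero). It shows how equal-width and equal-frequency binning can assign different bins to the same values.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Assign values to bins so that each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

ages = [21, 22, 25, 29, 34, 41, 58, 60]  # hypothetical attribute values
width_bins = equal_width_bins(ages, 3)
freq_bins = equal_frequency_bins(ages, 3)
```

Equal-width binning puts four of the ages into the first interval because they are close together, whereas equal-frequency binning spreads the values as evenly as the count allows.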

From Association Mining to Correlation Analysis

Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.

Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise, A and B are dependent and correlated as events.

lift(A, B) = P(A ∪ B) / (P(A) P(B))

• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.
• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.
• If lift(A, B) = 1, A and B are independent and there is no correlation between them.

Lift can equivalently be written as P(B|A) / P(B), or conf(A => B) / sup(B).

• Correlation analysis using lift:

Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.

Probability of purchasing a computer game = 0.60. Probability of purchasing a video = 0.75. Probability of purchasing both = 0.40. Lift = 0.40 / (0.60 × 0.75) = 0.89. Because the lift is less than 1, the purchases of games and videos are negatively correlated.
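The lift calculation for the game and video probabilities can be checked with a few lines of Python. The function and variable names are chosen for this sketch only; it also computes the equivalent form conf(A => B) / sup(B).

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A ∪ B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

p_game, p_video, p_both = 0.60, 0.75, 0.40

game_video_lift = lift(p_game, p_video, p_both)

# Equivalent form: confidence of game => video divided by the support of video.
conf_game_video = p_both / p_game
lift_from_conf = conf_game_video / p_video

is_negatively_correlated = game_video_lift < 1
```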

Examples

Correlation analysis using χ²

χ² = Σ (observed − expected)² / expected
   = (4000 − 4500)²/4500 + (3500 − 3000)²/3000 + (2000 − 1500)²/1500 + (500 − 1000)²/1000
   = 555.6

Because the χ² value is greater than 1 and the observed count for transactions containing both game and video (4000) is less than the expected count (4500), the two purchases are negatively correlated.
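As a quick arithmetic check, the χ² sum can be reproduced in Python from the four observed and expected counts listed above. The variable names are assumptions of this sketch.

```python
# Observed and expected counts for the four contingency-table cells.
observed = [4000, 3500, 2000, 500]
expected = [4500, 3000, 1500, 1000]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```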

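Other correlation measures

• All_confidence
• Cosine

• all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the maximum single-item support among the items in X

• cosine(A, B) = P(A ∪ B) / sqrt(P(A) × P(B)) = sup(A ∪ B) / sqrt(sup(A) × sup(B))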
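Both measures can be sketched in Python. The example reuses the game and video supports from the lift example (0.60, 0.75, and 0.40 for both); the function names are illustrative assumptions.

```python
import math

def all_confidence(sup_itemset, item_supports):
    """all_conf(X) = sup(X) / maximum single-item support among the items in X."""
    return sup_itemset / max(item_supports)

def cosine(sup_a, sup_b, sup_ab):
    """cosine(A, B) = sup(A ∪ B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / math.sqrt(sup_a * sup_b)

sup_game, sup_video, sup_both = 0.60, 0.75, 0.40

all_conf_value = all_confidence(sup_both, [sup_game, sup_video])
cosine_value = cosine(sup_game, sup_video, sup_both)
```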

Comparison of four correlation measures on typical data set

• A null transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null transactions.
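The effect of null transactions can be illustrated with a small Python sketch. The counts are invented for this example: the same A and B counts are evaluated once in a small database and once after many null transactions are added. Lift changes with the number of null transactions, while cosine does not.

```python
import math

def lift_from_counts(n_a, n_b, n_ab, n_total):
    """Lift computed from counts: (n_ab / N) / ((n_a / N) * (n_b / N))."""
    return n_ab * n_total / (n_a * n_b)

def cosine_from_counts(n_a, n_b, n_ab):
    """Cosine depends only on the counts of A, B, and A ∪ B, not on N."""
    return n_ab / math.sqrt(n_a * n_b)

# Hypothetical counts: 1000 transactions contain A, 1000 contain B, 100 contain both.
n_a, n_b, n_ab = 1000, 1000, 100

lift_few_nulls = lift_from_counts(n_a, n_b, n_ab, n_total=2_000)
lift_many_nulls = lift_from_counts(n_a, n_b, n_ab, n_total=200_000)
cosine_value = cosine_from_counts(n_a, n_b, n_ab)
```

In this toy data lift moves from below 1 to above 1 purely because null transactions were added, so it is not null-invariant; cosine stays at the same value in both cases.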

Constraint Based Association mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.

• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.

• The constraints can include the following:
• Knowledge type constraints
• Data constraints
• Dimension/level constraints
• Interestingness constraints
• Rule constraints

Metarule-Guided Mining of Association Rules

• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.

• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where the analyst is interested only in determining which pairs of customer traits promote the sale of office software.
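One way to picture a metarule is as a template of the form P1(X, Y) ^ P2(X, W) => buys(X, "office software"), where P1 and P2 are customer-trait predicates. The Python sketch below filters candidate rules against that template. The trait names, the rule encoding, and the candidate rules are all hypothetical; a real system would push the template into the mining step rather than filter afterwards.

```python
# Hypothetical set of predicates that count as customer traits.
TRAITS = {"age", "income", "occupation"}

def matches_metarule(rule):
    """True if the rule has two distinct trait predicates and concludes buys(X, "office software")."""
    antecedent, consequent = rule
    predicates = [predicate for predicate, _ in antecedent]
    return (
        len(antecedent) == 2
        and len(set(predicates)) == 2
        and set(predicates) <= TRAITS
        and consequent == ("buys", "office software")
    )

# Hypothetical candidate rules: (list of (predicate, value) pairs, consequent pair).
candidate_rules = [
    ([("age", "30...39"), ("income", "41K...60K")], ("buys", "office software")),
    ([("age", "20...29")], ("buys", "office software")),
    ([("age", "30...39"), ("income", "41K...60K")], ("buys", "laptop")),
]

selected = [rule for rule in candidate_rules if matches_metarule(rule)]
```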

Constraint Pushing

Rule Constraint

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain all of them with example of DMQL
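As an illustration of the first category (in Python rather than DMQL), a constraint such as sum(price) <= 100 is antimonotonic when prices are non-negative: once an itemset violates it, every superset violates it too, so the itemset can be pruned before it is extended. The prices and the limit below are assumptions made for this sketch.

```python
from itertools import combinations

# Hypothetical item prices and price limit.
PRICE = {"mouse": 20, "keyboard": 50, "monitor": 150, "printer": 120}
MAX_TOTAL = 100

def satisfies(itemset):
    """Antimonotonic constraint: the total price must not exceed MAX_TOTAL."""
    return sum(PRICE[i] for i in itemset) <= MAX_TOTAL

def grow(level):
    """Extend only the surviving itemsets by one item and keep the valid extensions."""
    next_level = set()
    for itemset in level:
        for item in PRICE:
            if item not in itemset:
                candidate = itemset | {item}
                if satisfies(candidate):
                    next_level.add(candidate)
    return next_level

level1 = {frozenset([i]) for i in PRICE if satisfies(frozenset([i]))}
level2 = grow(level1)

# Brute-force check that pruning loses nothing: every valid 2-itemset is found.
all_valid_pairs = {frozenset(c) for c in combinations(PRICE, 2) if satisfies(frozenset(c))}
```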


Frequent Itemsets, Closed Itemsets, and Association Rules

• Let I = {I1, I2, I3, ..., Im} be a set of items. Let D, the task-relevant data, be a set of database transactions where each transaction T is a set of items such that T ⊆ I.

• support(A => B) = P(A ∪ B)
• confidence(A => B) = P(B|A)
• Rules that satisfy both a minimum support threshold and a minimum confidence threshold are said to be strong.

• The occurrence frequency of an itemset is the number of transactions that contain the itemset. This is also known as the frequency, support count, or count of the itemset.

• Itemset support, expressed as a probability, is also referred to as relative support, whereas the occurrence frequency is called the absolute support.

• confidence(A => B) = P(B|A) = support(A ∪ B) / support(A) = support_count(A ∪ B) / support_count(A)

• Association rule mining can be viewed as a two-step process:

1. Find all the frequent itemsets.
2. Generate strong association rules from the frequent itemsets.
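The support and confidence definitions can be tried out on a toy data set. The five transactions, the function names, and the thresholds below are assumptions made for illustration; the sketch evaluates the rule computer => antivirus_software.

```python
# Hypothetical transaction database.
transactions = [
    {"computer", "antivirus_software"},
    {"computer", "antivirus_software", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "printer"},
]

def support_count(itemset, transactions):
    """Absolute support: number of transactions that contain the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Relative support: fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(a, b, transactions):
    """confidence(A => B) = support_count(A ∪ B) / support_count(A)."""
    return support_count(a | b, transactions) / support_count(a, transactions)

A, B = {"computer"}, {"antivirus_software"}
rule_support = support(A | B, transactions)
rule_confidence = confidence(A, B, transactions)

# A rule is strong if it meets both thresholds (values assumed for this example).
MIN_SUP, MIN_CONF = 0.3, 0.5
is_strong = rule_support >= MIN_SUP and rule_confidence >= MIN_CONF
```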

Closed Frequent Itemsets & Maximal Frequent Itemsets

• An itemset X is closed in a data set S if there exists no proper super-itemset Y such that Y has the same support count as X in S.

• An itemset X is a closed frequent itemset in S if X is both closed and frequent in S.

• An itemset X is a maximal frequent itemset in S if X is frequent and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in S.
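The difference between closed and maximal frequent itemsets is easier to see on a small example. The brute-force Python sketch below enumerates every itemset, which is only practical for toy data; the transactions and the minimum support count are assumptions.

```python
from itertools import combinations

# Hypothetical transactions and minimum support count.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
MIN_COUNT = 2
items = sorted(set().union(*transactions))

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

# Enumerate all non-empty itemsets and keep the frequent ones with their counts.
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        itemset = frozenset(combo)
        count = support_count(itemset)
        if count >= MIN_COUNT:
            frequent[itemset] = count

# Closed: no proper superset has the same support count.
closed = {
    x for x, n in frequent.items()
    if not any(x < y and frequent[y] == n for y in frequent)
}
# Maximal: no proper superset is frequent at all.
maximal = {x for x in frequent if not any(x < y for y in frequent)}
```

Here all seven itemsets are frequent, three of them are closed, and only {a, b, c} is maximal, so every maximal frequent itemset is closed but not the other way round.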

Frequent Pattern mining

• Market basket analysis is just one form of frequent pattern mining.
• Frequent pattern mining can be classified in various ways, based on the following criteria:
• Completeness of the patterns to be mined
• Levels of abstraction involved in the rule set
• Number of data dimensions involved in the rule
• Types of values handled in the rule
• Kinds of rules to be mined
• Kinds of patterns to be mined

Efficient and Scalable Frequent Itemset Mining Methods

• Apriori is the basic algorithm for finding frequent itemsets.

• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules. The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties.

• Apriori employs an iterative approach known as level-wise search, where k-itemsets are used to explore (k+1)-itemsets.

Apriori property

• All nonempty subsets of a frequent itemset must also be frequent. The Apriori property is based on the following observation: if an itemset I does not satisfy the minimum support threshold, min_sup, then I is not frequent; that is, P(I) < min_sup. The property is called antimonotone in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well.

Apriori Algorithm

Apriori Algo

• Input: I (itemsets), D (database of transactions), s (support)

• Output: L (large itemsets)

Apriori algorithm:

    k = 0                          // k is used as the scan number
    L = ∅
    C1 = I                         // initial candidates are set to be the items
    repeat
        k = k + 1
        Lk = ∅
        for each Ii in Ck do
            ci = 0                 // initial counts for each itemset are 0
        for each tj in D do
            for each Ii in Ck do
                if Ii ⊆ tj then
                    ci = ci + 1
        for each Ii in Ck do
            if ci >= (s × |D|) then
                Lk = Lk ∪ Ii
        L = L ∪ Lk
        Ck+1 = Apriori-Gen(Lk)
    until Ck+1 = ∅
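The pseudocode above can be turned into a compact Python sketch. This is one possible rendering rather than a reference implementation: it takes an absolute minimum count (min_count) instead of the relative support s to avoid floating-point comparisons, and apriori_gen is written here as a simple join-and-prune step, which is an assumption about what Apriori-Gen does in the slides. The transactions are made up.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets into k-item candidates, then prune any
    candidate that has an infrequent (k-1)-subset (the Apriori property)."""
    candidates = set()
    for a, b in combinations(list(prev_frequent), 2):
        union = a | b
        if len(union) == k and all(
            frozenset(sub) in prev_frequent for sub in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates

def apriori(transactions, min_count):
    """Return a dict mapping each frequent itemset to its support count."""
    transactions = [frozenset(t) for t in transactions]
    # C1: every single item is an initial candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One scan of the database per level to count the candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c for c, n in counts.items() if n >= min_count}
        frequent.update({c: counts[c] for c in level})
        k += 1
        candidates = apriori_gen(level, k)
    return frequent

# Hypothetical market-basket data.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]
result = apriori(transactions, min_count=3)
```

With a minimum count of 3, all three single items and all three pairs are frequent, while the triple {milk, bread, butter} appears only twice and is dropped.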

Generating Association Rules from Frequent Itemsets

Improving the Efficiency of Apriori

• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kinds of Association Rules

• Multilevel association rules involve concepts at different levels of abstraction.
• Multidimensional association rules involve more than one dimension or predicate.
• Quantitative association rules involve numeric attributes that have an implicit ordering among values.




  • Mining Frequent itemset
  • Mining Frequent Item set
  • Market Basket Analysis
  • Market Basket Analysis (2)
  • Market Basket Analysis (3)
  • Frequent ItemsetClosed Itemset and Association Rules
  • Frequent ItemsetClosed Itemset and Association Rules (2)
  • Frequent ItemsetClosed Itemset and Association Rules (3)
  • Closed Frequent Itemset amp Maximal Frequent Item set
  • Frequent Pattern mining
  • Efficient and Scalable Frequent Itemset Mining method
  • Apriori property
  • Apriori Algorithm
  • Slide 14
  • Apriori Algo
  • Apriori Algo(Contdhellip)
  • Apriori algo(contd)
  • Generating Association Rules from Frequent Itemsets
  • Improving the Efficiency of Apriori
  • Mining Frequent Itemset without Candidate Generation(FP Growth)
  • FP Growth
  • FP Growth Algo
  • Mining Frequent Itemset using Vertical Format
  • Mining frequent Itemset using Vertical Data Format
  • Slide 25
  • Slide 26
  • Mining Various Kind of Association Rules
  • Mining Multi Level Association Rules
  • A concept hierarchy for All Electronics
  • Multiple level Association Rules
  • Using uniform minimum support for all levels
  • Using reduced minimum support at lower levels
  • Disadvantages of mining multilevel association rules
  • Disadvantages of mining multilevel association rules (2)
  • Disadvantages of mining multilevel association rules (3)
  • Mining Multidimensional association rules from Relational Datab
  • Mining Multidimensional association rules from Relational Datab (2)
  • Two Basic Approaches
  • Two Basic Approaches (2)
  • Mining Multidimensional Association Rules using Static Discreti
  • Slide 41
  • Mining Quantitative Association Rules
  • Association Rule Clustering System
  • Steps involved in ARCS
  • Steps involved in ARCS(contd)
  • Steps involved in ARCS (contd…)
  • Steps involved in ARCS (contd…) (2)
  • From Association Mining to Correlation Analysis
  • From Association Mining to Correlation Analysis (2)
  • From Association Mining to Correlation Analysis (3)
  • Examples
  • Correlation analysis using X2
  • Other correlation measure
  • Comparison of four correlation measures on typical data set
  • Constraint Based Association mining
  • Constraint Based Association mining (2)
  • metarule-Guided Mining of association rules
  • Constraint Pushing
  • Rule Constraint
Page 8: Mining Frequent Itemset-Association Analysis

Frequent Itemset, Closed Itemset and Association Rules

• confidence(A => B) = P(B | A) = support(A ∪ B) / support(A) = support_count(A ∪ B) / support_count(A)

• Association rule mining can be viewed as a two-step process:

• Find all the frequent itemsets
• Generate strong association rules from the frequent itemsets
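The support and confidence definitions above can be checked on a small example. The following is a minimal sketch; the transactions and the function names are made up for illustration and are not part of the slides.

```python
# Toy transactions (hypothetical data, not from the slides).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "bread", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Relative support: fraction of transactions containing `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support_count(A u B) / support_count(A)."""
    return (support_count(antecedent | consequent, transactions)
            / support_count(antecedent, transactions))

sup = support({"milk", "bread"}, transactions)
conf = confidence({"milk"}, {"bread"}, transactions)
print(f"support = {sup:.2f}, confidence = {conf:.2f}")
```

With assumed thresholds such as min_sup = 0.5 and min_conf = 0.7, the rule milk => bread would count as strong on this toy data.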

Closed Frequent Itemset & Maximal Frequent Itemset

• An itemset X is closed in a data set S if there exists no proper super-itemset Y such that Y has the same support count as X in S

• An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S

• An itemset X is a maximal frequent itemset in set S if X is frequent and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in S
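To make the difference between closed and maximal frequent itemsets concrete, here is a brute-force sketch on a tiny data set. The data, the threshold `min_count` and the helper names are assumptions for illustration; enumerating all itemsets this way is only feasible for very small item universes.

```python
from itertools import combinations

# Hypothetical transactions and minimum support count.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a"},
]
min_count = 2

items = sorted(set().union(*transactions))

def count(itemset):
    """Support count of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Enumerate every non-empty itemset and keep the frequent ones.
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        c = count(frozenset(combo))
        if c >= min_count:
            frequent[frozenset(combo)] = c

def is_closed(x):
    # A superset with the same count as a frequent X is itself frequent,
    # so it is enough to look among the frequent itemsets.
    return not any(x < y and frequent[y] == frequent[x] for y in frequent)

def is_maximal(x):
    # Maximal: no proper superset is frequent at all.
    return not any(x < y for y in frequent)

closed = {x for x in frequent if is_closed(x)}
maximal = {x for x in frequent if is_maximal(x)}
print(sorted(map(sorted, closed)), sorted(map(sorted, maximal)))
```

On this data every maximal frequent itemset is also closed, but not the other way round: {a} and {a, b} are closed without being maximal.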

Frequent Pattern mining

• Market basket analysis is just one form of frequent pattern mining
• Frequent pattern mining can be classified in various ways, based on the following criteria:
• Based on the completeness of the patterns to be mined
• Based on the levels of abstraction involved in the rule set
• Based on the number of data dimensions involved in the rule
• Based on the types of values handled in the rule
• Based on the kinds of rules to be mined
• Based on the kinds of patterns to be mined

Efficient and Scalable Frequent Itemset Mining method

• Apriori is the basic algorithm for finding frequent itemsets

• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules. The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties

• Apriori employs an iterative approach known as level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets

Apriori property

• All nonempty subsets of a frequent itemset must also be frequent. The Apriori property is based on the fact that if an itemset I does not satisfy the minimum support threshold min_sup, then I is not frequent, that is, P(I) < min_sup. Adding an item to I cannot make the resulting itemset occur more often than I, so the larger itemset is not frequent either. The property is called antimonotone in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well

Apriori Algorithm

Apriori Algo

• Input: I (itemsets), D (database of transactions), s (support)

• Output: L (large itemsets)

Apriori Algo (Contd…)

Apriori algorithm
k = 0        // k is used as the scan number
L = ∅
C1 = I       // initial candidates are set to be the items
repeat
    k = k + 1
    Lk = ∅
    for each Ii ∈ Ck do

Apriori algo (contd)
        ci = 0        // initial counts for each itemset are 0
    for each tj ∈ D do
        for each Ii ∈ Ck do
            if Ii ⊆ tj then
                ci = ci + 1
    for each Ii ∈ Ck do
        if ci >= (s × |D|) then
            Lk = Lk ∪ Ii
    L = L ∪ Lk
    Ck+1 = Apriori-Gen(Lk)
until Ck+1 = ∅
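A runnable version of the level-wise loop might look like the sketch below. The slides do not spell out Apriori-Gen, so `apriori_gen` here is an assumed join-and-prune step based on the Apriori property; the transactions and the support value are hypothetical.

```python
from itertools import combinations

def apriori_gen(prev_level, k):
    """Join frequent (k-1)-itemsets into k-itemset candidates and prune
    candidates that have an infrequent (k-1)-subset (Apriori property)."""
    candidates = set()
    for a in prev_level:
        for b in prev_level:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev_level
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

def apriori(transactions, s):
    """Return {itemset: count} for all itemsets with count >= s * |D|."""
    threshold = s * len(transactions)
    items = set().union(*transactions)
    candidates = {frozenset([i]) for i in items}   # C1 = I
    large = {}                                     # L
    k = 1
    while candidates:                              # stop when C(k+1) is empty
        counts = {c: 0 for c in candidates}        # c_i = 0
        for t in transactions:                     # one scan of D per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= threshold}  # Lk
        large.update(level)                        # L = L u Lk
        k += 1
        candidates = apriori_gen(set(level), k)    # C(k+1)
    return large

# Hypothetical market-basket data.
D = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = apriori(D, s=0.5)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The sketch uses a `while` loop instead of repeat/until, which behaves the same whenever the database contains at least one item.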

Generating Association Rules from Frequent Itemsets

Improving the Efficiency of Apriori

• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kinds of Association Rules

Mining
• multilevel association rules: involve concepts at different levels of abstraction
• multidimensional association rules: involve more than one dimension or predicate
• quantitative association rules: involve numeric attributes that have an implicit ordering among values

Mining Multi Level Association Rules

• Finding strong association rules at low or primitive levels of abstraction is difficult because the data are sparse at those levels

• Data mining systems should provide capabilities for mining association rules at multiple levels of abstraction

• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (ancestors)

A concept hierarchy for All Electronics

Multiple level Association Rules

• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules

• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework

• A top-down strategy is employed, where counts are accumulated for the calculation of frequent itemsets at each concept level
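The top-down idea can be sketched with a two-level concept hierarchy: items are first generalized to their ancestors and counted, then counted again at the item level. The hierarchy, the transactions and the thresholds below are hypothetical, and only single items are counted to keep the sketch short.

```python
from collections import Counter

# Hypothetical two-level concept hierarchy: item -> ancestor (category).
hierarchy = {
    "laptop": "computer",
    "desktop": "computer",
    "inkjet printer": "printer",
    "laser printer": "printer",
}

transactions = [
    {"laptop", "inkjet printer"},
    {"desktop", "laser printer"},
    {"laptop", "laser printer"},
    {"desktop"},
]

def generalize(transaction):
    """Replace each item by its ancestor in the concept hierarchy."""
    return {hierarchy[item] for item in transaction}

def frequent_items(transactions, min_sup):
    """Single items whose relative support is at least min_sup."""
    counts = Counter(item for t in transactions for item in t)
    n = len(transactions)
    return {item: c / n for item, c in counts.items() if c / n >= min_sup}

# Top-down: the category level first, then the more specific item level.
level1 = frequent_items([generalize(t) for t in transactions], min_sup=0.5)
level2_uniform = frequent_items(transactions, min_sup=0.5)    # same threshold
level2_reduced = frequent_items(transactions, min_sup=0.25)   # lower threshold

print(level1)
print(sorted(level2_uniform), sorted(level2_reduced))
```

With the same threshold at both levels, "inkjet printer" is lost at the item level; lowering the threshold for the lower level keeps it.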

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Disadvantages of mining multilevel association rules

Generation of redundant rules across multiple levels of abstraction due to the ancestor relationships among items

Disadvantages of mining multilevel association rules

Disadvantages of mining multilevel association rules

• If a rule does not provide any new information, it should be removed

• If a rule R1 is more general than (an ancestor of) a rule R2, and R2 adds no new information beyond R1, then R2 is redundant and need not be specified

Mining Multidimensional association rules from Relational Database and DW

Association rules that involve a single predicate, here the predicate buys:

buys(X, "digital camera") => buys(X, "HP printer")

Such rules are called single-dimensional or intradimensional association rules

Mining Multidimensional association rules from Relational Database and DW

Mining multidimensional association rules:

Association rules that involve two or more dimensions or predicates are called multidimensional association rules

age(X, "20…29") ^ buys(X, "laptop") => buys(X, "HP printer")
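One way to evaluate such a rule is to treat every tuple as a set of (predicate, value) pairs, so that support and confidence are computed exactly as for itemsets. The sketch below assumes this encoding; the tuples are invented for illustration.

```python
# Each customer tuple is a set of (predicate, value) pairs (hypothetical data).
tuples = [
    {("age", "20...29"), ("buys", "laptop"), ("buys", "HP printer")},
    {("age", "20...29"), ("buys", "laptop")},
    {("age", "30...39"), ("buys", "laptop"), ("buys", "HP printer")},
    {("age", "20...29"), ("buys", "laptop"), ("buys", "HP printer")},
    {("age", "20...29"), ("buys", "digital camera")},
]

def count(predicates):
    """Number of tuples that satisfy every predicate in the set."""
    return sum(1 for t in tuples if predicates <= t)

antecedent = {("age", "20...29"), ("buys", "laptop")}
consequent = {("buys", "HP printer")}

support = count(antecedent | consequent) / len(tuples)
confidence = count(antecedent | consequent) / count(antecedent)
dimensions = {pred for pred, _ in antecedent | consequent}

print(f"support = {support:.2f}, confidence = {confidence:.2f}")
print("predicates involved:", sorted(dimensions))
```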

Two Basic Approaches

• Database attributes can be of two types: categorical and quantitative

Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes

Two Basic Approaches

Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price)

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques, where numeric values are replaced by interval labels

The discretized multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules
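Static discretization can be sketched as a lookup that replaces each numeric value by an interval label before any mining takes place. The cut points, labels and records below are assumptions; in practice they would come from a predefined concept hierarchy or a discretization method.

```python
from bisect import bisect_right

# Assumed cut points and interval labels for the attribute "age".
cuts = [20, 30, 40]
labels = ["<20", "20...29", "30...39", "40+"]

def age_label(age):
    """Map a numeric age to the label of the interval that contains it."""
    return labels[bisect_right(cuts, age)]

records = [
    {"age": 23, "buys": "laptop"},
    {"age": 35, "buys": "printer"},
    {"age": 41, "buys": "laptop"},
]

# After discretization every record is a set of (predicate, value) pairs
# that an ordinary frequent itemset miner can process.
discretized = [
    {("age", age_label(r["age"])), ("buys", r["buys"])} for r in records
]
print(discretized)
```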

Mining Quantitative Association Rules

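• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the mining criteria, such as maximizing the confidence of the rules mined

• A_quant1 ^ A_quant2 => A_cat, where A_quant1 and A_quant2 are tests on quantitative attribute intervals and A_cat tests a categorical attribute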

Association Rule Clustering System

• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition

• The following steps are involved in ARCS:

Steps involved in ARCS

1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals

The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:

• Equal-width binning
• Equal-frequency binning
• Clustering-based binning
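The first two binning strategies can be sketched in a few lines. The function names and the sample ages are hypothetical, and the equal-width version assumes that the values are not all identical (otherwise the interval width would be zero). Clustering-based binning is not shown.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign bins so that each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

ages = [21, 22, 23, 25, 40, 60]
width_bins = equal_width_bins(ages, 3)
freq_bins = equal_frequency_bins(ages, 3)
print(width_bins, freq_bins)
```

On skewed data such as these ages, equal-width binning puts most values into the first bin, while equal-frequency binning spreads them evenly.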

Steps involved in ARCS (contd)

2. Finding frequent predicate sets: Once the count distribution for each category has been set up, it can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence

3. Clustering the association rules: the strong association rules are then mapped to a 2-D grid
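Steps 2 and 3 can be illustrated with a small grid of counts. This is only a sketch under assumed data: each row holds two bin indices (for example age and income) and a flag for the categorical condition, and the thresholds are invented. The final merging of adjacent cells into rectangular clusters is not implemented.

```python
from collections import defaultdict

# Hypothetical tuples after binning: (age_bin, income_bin, satisfies_category).
rows = [
    (0, 0, False), (0, 1, True), (0, 1, True), (0, 1, False),
    (1, 1, True), (1, 1, True), (1, 2, False), (2, 2, False),
]
min_sup, min_conf = 0.2, 0.6

total = defaultdict(int)     # number of tuples per grid cell
positive = defaultdict(int)  # tuples per cell that satisfy the category

for age_bin, income_bin, ok in rows:
    total[(age_bin, income_bin)] += 1
    if ok:
        positive[(age_bin, income_bin)] += 1

n = len(rows)
# A cell yields a strong rule if it meets both thresholds.
strong_cells = sorted(
    cell for cell in total
    if positive[cell] / n >= min_sup
    and positive[cell] / total[cell] >= min_conf
)
print(strong_cells)
```

Here the two strong cells share the same income bin and have adjacent age bins, so a clustering step could merge them into a single rule covering both age intervals.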

Steps involved in ARCS (contd…)

Steps involved in ARCS (contd…)

From Association Mining to Correlation Analysis

Even strong association rules can be misleading

The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis

Lift is a simple correlation measure

The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise, A and B are dependent and correlated as events

From Association Mining to Correlation Analysis

lift(A, B) = P(A ∪ B) / (P(A) P(B))

If lift(A, B) < 1, then the occurrence of A is negatively correlated with the occurrence of B

If lift(A, B) > 1, then the occurrence of A is positively correlated with the occurrence of B

If lift(A, B) = 1, then A and B are independent and there is no correlation between them

From Association Mining to Correlation Analysis

lift(A, B) can also be written as P(B | A) / P(B), or conf(A => B) / sup(B)

• Correlation analysis using lift:

Let ¬game refer to transactions that do not contain computer games, and ¬video refer to transactions that do not contain videos. The transactions can be summarized in a contingency table

Probability of purchasing a computer game = 0.60
Probability of purchasing a video = 0.75
Probability of purchasing both = 0.40
lift = 0.40 / (0.60 × 0.75) = 0.89
Because the lift is less than 1, the purchase of computer games and the purchase of videos are negatively correlated
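The lift computation for the game and video example can be reproduced directly from the three probabilities given above. The function name and parameter names are chosen for this sketch only.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A u B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

p_game, p_video, p_both = 0.60, 0.75, 0.40

value = lift(p_game, p_video, p_both)
# Equivalent form: P(video | game) / P(video).
alternative = (p_both / p_game) / p_video

print(round(value, 2), round(alternative, 2))
```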

Examples

Correlation analysis using χ²

• χ² = Σ (observed − expected)² / expected
  = (4000 − 4500)² / 4500 + (3500 − 3000)² / 3000 + (2000 − 1500)² / 1500 + (500 − 1000)² / 1000
  = 555.6
• The χ² value is large, and the observed count for buying both game and video (4000) is less than the expected count (4500), so the two purchases are negatively correlated
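The χ² value can be recomputed from the four observed counts. The contingency table itself did not survive in the transcript, so the assignment of counts to cells below is an assumption; it is chosen so that the expected counts derived from the row and column totals match the ones used in the formula (4500, 3000, 1500, 1000).

```python
# Observed counts; the cell layout is an assumption (see the lead-in).
observed = {
    ("game", "video"): 4000,
    ("not game", "video"): 3500,
    ("game", "not video"): 2000,
    ("not game", "not video"): 500,
}
n = sum(observed.values())

# Marginal totals for each attribute value.
game_totals, video_totals = {}, {}
for (g, v), c in observed.items():
    game_totals[g] = game_totals.get(g, 0) + c
    video_totals[v] = video_totals.get(v, 0) + c

chi2 = 0.0
expected = {}
for (g, v), obs in observed.items():
    # Expected count under independence of the two attributes.
    expected[(g, v)] = game_totals[g] * video_totals[v] / n
    chi2 += (obs - expected[(g, v)]) ** 2 / expected[(g, v)]

print(round(chi2, 1))
```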

Other correlation measures

• All_confidence
• Cosine

• all_conf(X) = sup(X) / max_item_sup(X)

• cosine(A, B) = P(A ∪ B) / sqrt(P(A) × P(B)) = sup(A ∪ B) / sqrt(sup(A) × sup(B))
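Both measures are simple ratios of supports. The sketch below applies them to the game and video probabilities from the lift example (0.60, 0.75 and 0.40); the function names are chosen for illustration.

```python
from math import sqrt

def all_confidence(sup_itemset, item_sups):
    """all_conf(X) = sup(X) / (largest support of any single item in X)."""
    return sup_itemset / max(item_sups)

def cosine(sup_a, sup_b, sup_ab):
    """cosine(A, B) = sup(A u B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)

all_conf_value = all_confidence(0.40, [0.60, 0.75])
cosine_value = cosine(0.60, 0.75, 0.40)
print(round(all_conf_value, 3), round(cosine_value, 3))
```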

Comparison of four correlation measures on typical data set

bull A null transaction is a transaction that does not contain any of the itemsets being examined

• A measure is null-invariant if its value is free from the influence of null transactions
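The effect of null transactions can be seen by computing lift and cosine on the same counts with and without a large number of added null transactions. The counts are hypothetical; the sketch only illustrates that cosine stays the same while lift changes.

```python
from math import sqrt

def measures(n_ab, n_a_only, n_b_only, n_null):
    """Return (lift, cosine) for itemsets A and B from transaction counts."""
    n = n_ab + n_a_only + n_b_only + n_null
    p_a = (n_ab + n_a_only) / n
    p_b = (n_ab + n_b_only) / n
    p_ab = n_ab / n
    return p_ab / (p_a * p_b), p_ab / sqrt(p_a * p_b)

# Same A/B counts, with and without null transactions.
lift_few, cos_few = measures(100, 50, 50, n_null=0)
lift_many, cos_many = measures(100, 50, 50, n_null=10000)

print(round(lift_few, 2), round(lift_many, 2))   # lift changes
print(round(cos_few, 3), round(cos_many, 3))     # cosine does not
```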

Constraint Based Association mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users

• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining

Constraint Based Association mining

• The constraints can include the following:
• Knowledge type constraints
• Data constraints
• Dimension/level constraints
• Interestingness constraints
• Rule constraints

metarule-Guided Mining of association rules

• Metarules allow users to specify the syntactic form of the rules that they are interested in mining

• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where the user is only interested in determining which pairs of customer traits promote the sale of office software
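A metarule for this example could take the form P1(X, Y) ^ P2(X, W) => buys(X, "office software"), where P1 and P2 stand for customer-trait predicates. The sketch below treats the metarule as a filter over candidate rules; the rule encoding and the candidate rules are assumptions made for illustration.

```python
# A rule is (antecedent, consequent); each side is a tuple of
# (predicate, value) pairs. The candidate rules are hypothetical.
rules = [
    ((("age", "30...39"), ("income", "41K...60K")),
     (("buys", "office software"),)),
    ((("age", "20...29"),),
     (("buys", "office software"),)),
    ((("occupation", "student"), ("age", "20...29")),
     (("buys", "laptop"),)),
]

def matches_metarule(rule):
    """Template: P1(X, Y) ^ P2(X, W) => buys(X, "office software"),
    with P1 and P2 two different customer-trait predicates."""
    antecedent, consequent = rule
    traits = {pred for pred, _ in antecedent}
    return (
        len(antecedent) == 2
        and len(traits) == 2
        and "buys" not in traits
        and consequent == (("buys", "office software"),)
    )

selected = [r for r in rules if matches_metarule(r)]
print(selected)
```

Only the first candidate has exactly two trait predicates and the required consequent, so it is the only rule that matches the template.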

Constraint Pushing

Rule Constraint

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain all of them with examples in DMQL
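The first two categories can be illustrated with constraints on the total price of an itemset, written here in Python rather than DMQL. Assuming non-negative prices, sum(price) <= v is antimonotonic and sum(price) >= v is monotonic. The prices, thresholds and helper names are hypothetical.

```python
from itertools import combinations

prices = {"pen": 2, "ink": 15, "printer": 120}  # hypothetical, all >= 0

def total(itemset):
    return sum(prices[i] for i in itemset)

def within_budget(itemset, v=20):
    """Antimonotonic: once sum(price) <= v fails, it fails for every superset."""
    return total(itemset) <= v

def big_ticket(itemset, v=100):
    """Monotonic: once sum(price) >= v holds, it holds for every superset."""
    return total(itemset) >= v

def supersets(itemset):
    """All proper supersets of `itemset` within the item universe."""
    rest = [i for i in prices if i not in itemset]
    for k in range(1, len(rest) + 1):
        for extra in combinations(rest, k):
            yield set(itemset) | set(extra)

# {ink, printer} violates the budget, and so does every superset of it.
all_supersets_violate = all(
    not within_budget(s) for s in supersets({"ink", "printer"})
)
# {printer} satisfies the monotonic constraint, and so does every superset.
all_supersets_satisfy = all(big_ticket(s) for s in supersets({"printer"}))

print(all_supersets_violate, all_supersets_satisfy)
```

An antimonotonic constraint can therefore be used to prune candidates during mining in the same way as the Apriori property, while a monotonic constraint never needs to be rechecked once it holds.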

Page 10: Mining Frequent Itemset-Association Analysis

Frequent Pattern miningbull Market basket is just one form of frequent pattern miningbull Frequent pattern mining can be classified in various ways

based on the following criteriabull Based on completeness of patterns to be minedbull Based on level of abstraction involved in the rule setbull Based on the number of data dimension involved in the

rulebull Base on the types of values handled in the rulebull Based on kind of rules to be minedbull Based on the kind of patterns to be mined

Efficient and Scalable Frequent Itemset Mining method

bull Apriori is the basic algorithm for designing frequent itemset

bull Apriori is a seminal algorithm proposed by RAgrawal and RSrikant in 1994 for mining frequent itemset for Boolean association rulesThe name of the algorithm is based on the fact that the algorithm usest prior knowledge of frequent item set properties

bull Apriori employs an iterative approach known as level wise search

Apriori property

bull All non empty subsets of a frequent itemset must also be frequentThe Apriori property is based on the fact that if the itemset doesnot satisfy the minimum support threshold min_sup then I is not frequent that

is P(I)lt min_supThe property is called antimonotone in the sense

that if the a set cannot pass the test all of its superset will fail the same test as well

Apriori Algorithm

Apriori Algo

bull Input-I ItemsetD Database of transactionS Support

bull OutputL Large itemset

Apriori Algo(Contdhellip)

Apriori algorrithmK=0 k is used as the scan numberL=nullC1=I intial candidate are set to be the itemsRepeatK=k+1Lk=nullFor each Ii in Ck do

Apriori algo(contd)Ci=0 Intial counts for each itemset are 0For each tj in D do For each Ii in Ck do if Ii C tj then Ci=Ci+1 For each Ii C Ck do

if ci gt= (s|D|) do

LK=LK UII

L=L U Lk

CK+1=Apriori-Gen(LK)

Until Ck+1= null

Generating Association Rules from Frequent Itemsets

Improving the Efficiency of Apriori

bull Hash based techniquebull Transaction reductionbull Partitioningbull Samplingbull Dynamic Item set Counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kind of Association Rules

Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit

ordering among values

Mining Multi Level Association Rules

bull Finding strong association rule at low or primitive level of abstraction is very difficult

bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction

bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor

A concept hierarchy for All Electronics

Multiple level Association Rules

bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules

bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework

bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Disadvantages of mining multilevel association rules

Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items

Disadvantages of mining multilevel association rules

Disadvantages of mining multilevel association rules

bull If a rule doesnrsquot provide any new information then it should be removed

bull A rule R1 is more generalized than rule R2so need to specify rule R2

Mining Multidimensional association rules from Relational Database and DW

Association rules that imply a single predicate that is the predicate buys

Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra

dimensional association rules

Mining Multidimensional association rules from Relational Database and DW

Mining multidimensional database association rules-

Associations rules that involves two or more dimensions or predicate are called multidimensional association rules

Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)

Two Basic Approaches

bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of

possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes

Two Basic Approaches

Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels

Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules

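Mining Quantitative Association Rules

• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criteria, such as maximizing the confidence or compactness of the rules mined.

• A common form has two quantitative attributes on the left-hand side and one categorical attribute on the right-hand side: A_quan1 ^ A_quan2 => A_cat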

Association Rule Clustering System

• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.

• The following steps are involved in ARCS:

Steps involved in ARCS

1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are equal-width binning, equal-frequency binning, and clustering-based binning.

2. Finding frequent predicate sets: Once the values have been distributed into bins, each category can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.

3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
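The first two binning strategies named in step 1 can be sketched as follows. This is a minimal illustration and not the ARCS implementation; the function names and the sample incomes are hypothetical, and the equal-width version assumes the values are not all identical.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes hi > lo
    # The maximum value would land in bin k, so it is clamped into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

incomes = [20, 22, 25, 30, 60, 100]  # hypothetical incomes, in thousands
print(equal_width_bins(incomes, 2))      # [0, 0, 0, 0, 1, 1]
print(equal_frequency_bins(incomes, 2))  # [0, 0, 0, 1, 1, 1]
```

Equal-width binning leaves most of these skewed values in the first bin, while equal-frequency binning splits them evenly.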


From Association Mining to Correlation Analysis

Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.

Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A) P(B); otherwise, A and B are dependent and correlated as events.

lift(A, B) = P(A ∪ B) / (P(A) P(B))

If lift(A, B) < 1, then the occurrence of A is negatively correlated with the occurrence of B. If lift(A, B) > 1, then the occurrence of A is positively correlated with the occurrence of B. If lift(A, B) = 1, then A and B are independent and there is no correlation between them.

Lift can equivalently be written as P(B|A) / P(B), or conf(A => B) / sup(B).

• Correlation analysis using lift:

Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.

The probability of purchasing a computer game is P(game) = 0.60, the probability of purchasing a video is P(video) = 0.75, and the probability of purchasing both is P(game, video) = 0.40. The lift is therefore 0.40 / (0.60 × 0.75) = 0.89. Because the lift is less than 1, the purchases of computer games and videos are negatively correlated.
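The lift calculation for the game and video example can be checked with a short sketch. The function name lift and its parameter names are illustrative only.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A and B together) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

# Probabilities from the game/video example.
game_video_lift = lift(p_a=0.60, p_b=0.75, p_ab=0.40)
print(round(game_video_lift, 2))  # 0.89

if game_video_lift < 1:
    print("negatively correlated")
elif game_video_lift > 1:
    print("positively correlated")
else:
    print("independent")
```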

Examples

Correlation analysis using χ²

χ² = Σ (observed − expected)² / expected
   = (4000 − 4500)² / 4500 + (3500 − 3000)² / 3000 + (2000 − 1500)² / 1500 + (500 − 1000)² / 1000
   = 555.6

The χ² value is much greater than 1, so game and video purchases are not independent. Because the observed count for (game, video), 4000, is less than the expected count, 4500, the two are negatively correlated.
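The χ² value can be recomputed from the observed counts alone. In the sketch below, the assignment of the four observed counts to cells is an assumption derived from the stated probabilities (60% game, 75% video, 40% both) over 10,000 transactions; the function name chi_square is illustrative.

```python
# Observed contingency counts (cell assignment inferred from the probabilities above).
observed = {
    ("game", "video"): 4000,
    ("not game", "video"): 3500,
    ("game", "not video"): 2000,
    ("not game", "not video"): 500,
}

def chi_square(observed):
    """Sum (observed - expected)^2 / expected over all cells of a 2-D contingency table."""
    n = sum(observed.values())
    row_totals, col_totals = {}, {}
    for (g, v), count in observed.items():
        row_totals[g] = row_totals.get(g, 0) + count
        col_totals[v] = col_totals.get(v, 0) + count
    total = 0.0
    for (g, v), count in observed.items():
        # Expected count under independence of the two attributes.
        expected = row_totals[g] * col_totals[v] / n
        total += (count - expected) ** 2 / expected
    return total

chi2 = chi_square(observed)
print(round(chi2, 1))  # 555.6
```

The expected counts computed here (4500, 3000, 1500, 1000) match the denominators in the worked calculation.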

Other Correlation Measures

• All_confidence
• Cosine

• all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the maximum support among the single items in X

• cosine(A, B) = P(A ∪ B) / sqrt(P(A) P(B)) = sup(A ∪ B) / sqrt(sup(A) sup(B))
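Both measures can be evaluated on the same game and video supports used for lift. This is a sketch with illustrative function names; the supports (0.60, 0.75, 0.40) come from the example, and the printed values are rounded.

```python
from math import sqrt

def all_confidence(sup_itemset, item_sups):
    """all_conf(X) = sup(X) / (largest support of any single item in X)."""
    return sup_itemset / max(item_sups)

def cosine(sup_a, sup_b, sup_ab):
    """cosine(A, B) = sup(A and B together) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)

all_conf_value = all_confidence(0.40, [0.60, 0.75])
cosine_value = cosine(0.60, 0.75, 0.40)
print(round(all_conf_value, 3), round(cosine_value, 3))  # 0.533 0.596
```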

Comparison of Four Correlation Measures on Typical Data Sets

• A null-transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null-transactions.
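The effect of null transactions can be seen by holding the counts for A and B fixed and changing only the number of transactions that contain neither. The counts below are invented for illustration, and the function name measures is hypothetical.

```python
from math import sqrt

def measures(n_ab, n_a_only, n_b_only, n_null):
    """Compute lift and cosine from the four cells of a 2x2 contingency table."""
    n = n_ab + n_a_only + n_b_only + n_null
    sup_a = (n_ab + n_a_only) / n
    sup_b = (n_ab + n_b_only) / n
    sup_ab = n_ab / n
    return {
        "lift": sup_ab / (sup_a * sup_b),
        "cosine": sup_ab / sqrt(sup_a * sup_b),
    }

few_nulls = measures(100, 50, 50, n_null=0)
many_nulls = measures(100, 50, 50, n_null=10000)
print(few_nulls)
print(many_nulls)
```

With these counts, lift moves from below 1 to well above 1 as null transactions are added, while cosine stays the same, because the total number of transactions cancels out of the cosine formula.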

Constraint-Based Association Mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.

• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.

• The constraints can include the following:
• Knowledge type constraints
• Data constraints
• Dimension/level constraints
• Interestingness constraints
• Rule constraints

Metarule-Guided Mining of Association Rules

• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.

• Example of metarule-guided mining: finding associations between customer traits and the items that customers buy, where the analyst is interested only in determining which pairs of customer traits promote the sale of office software.

Constraint Pushing

Rule Constraint

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain all of them with an example in DMQL.

Page 11: Mining Frequent Itemset-Association Analysis

Efficient and Scalable Frequent Itemset Mining Methods

• Apriori is the basic algorithm for finding frequent itemsets.

• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules. The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties.

• Apriori employs an iterative approach known as level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets.

Apriori Property

• All nonempty subsets of a frequent itemset must also be frequent.

• The Apriori property is based on the following observation: if an itemset I does not satisfy the minimum support threshold min_sup, then I is not frequent, that is, P(I) < min_sup. Adding an item A to I cannot make the resulting itemset occur more often than I, so I ∪ {A} is not frequent either.

• The property is called antimonotone in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well.
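The Apriori property is what allows a candidate itemset to be discarded without counting it: if any of its subsets one item smaller is not frequent, the candidate cannot be frequent. The sketch below illustrates that check; the helper name has_infrequent_subset and the sample frequent 2-itemsets are assumptions made for the example.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_smaller):
    """Return True if some (k-1)-subset of the k-itemset `candidate` is not frequent."""
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )

# Hypothetical frequent 2-itemsets found in an earlier pass.
frequent_2 = {
    frozenset(pair)
    for pair in [("milk", "bread"), ("milk", "butter"), ("bread", "butter"), ("bread", "jam")]
}

# All 2-subsets of {milk, bread, butter} are frequent, so it stays a candidate.
pruned_mbb = has_infrequent_subset(frozenset({"milk", "bread", "butter"}), frequent_2)
# {milk, jam} is not frequent, so {milk, bread, jam} can be pruned without a database scan.
pruned_mbj = has_infrequent_subset(frozenset({"milk", "bread", "jam"}), frequent_2)
print(pruned_mbb, pruned_mbj)  # False True
```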

Apriori Algorithm

• Input:
  I: itemsets
  D: database of transactions
  s: support

• Output:
  L: large itemsets

Apriori algorithm:

k = 0                          // k is used as the scan number
L = ∅
C1 = I                         // initial candidates are set to be the items
repeat
    k = k + 1
    Lk = ∅
    for each Ii ∈ Ck do
        ci = 0                 // initial counts for each itemset are 0
    for each tj ∈ D do
        for each Ii ∈ Ck do
            if Ii ⊆ tj then
                ci = ci + 1
    for each Ii ∈ Ck do
        if ci >= (s × |D|) then
            Lk = Lk ∪ {Ii}
    L = L ∪ Lk
    Ck+1 = Apriori-Gen(Lk)
until Ck+1 = ∅
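The level-wise loop above can be sketched in Python as follows. This is a minimal illustration rather than an optimized implementation: the slides do not spell out Apriori-Gen, so apriori_gen here is one straightforward way to write the join and prune steps, and the toy transactions are invented for the example.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Join frequent k-itemsets into (k+1)-candidates and prune by the Apriori property."""
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in frequent_k for sub in combinations(union, len(a))
            ):
                candidates.add(union)
    return candidates

def apriori(items, transactions, s):
    """Return {itemset: count} for itemsets contained in at least s * |D| transactions."""
    min_count = s * len(transactions)
    large = {}                                     # L
    candidates = {frozenset([i]) for i in items}   # C1 = I
    while candidates:                              # until C(k+1) is empty
        counts = {c: 0 for c in candidates}        # ci = 0
        for t in transactions:                     # one scan of D per level
            for c in candidates:
                if c <= t:                         # Ii is a subset of tj
                    counts[c] += 1
        level = {c for c, n in counts.items() if n >= min_count}  # Lk
        large.update({c: counts[c] for c in level})               # L = L ∪ Lk
        candidates = apriori_gen(level)
    return large

# Toy database, invented for illustration.
D = [frozenset(t) for t in (
    {"milk", "bread"}, {"milk", "bread", "butter"},
    {"bread", "butter"}, {"milk", "butter"}, {"milk", "bread", "butter"},
)]
L = apriori({"milk", "bread", "butter"}, D, s=0.5)
for itemset, count in sorted(L.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this toy database the three single items and the three pairs are frequent, while the 3-itemset appears in only two of five transactions and is dropped.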

Generating Association Rules from Frequent Itemsets

Improving the Efficiency of Apriori

• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kinds of Association Rules

Mining
• Multilevel association rules: involve concepts at different levels of abstraction
• Multidimensional association rules: involve more than one dimension or predicate
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values

Mining Multilevel Association Rules

• Finding strong association rules at low or primitive levels of abstraction is very difficult because the data at those levels are sparse.

• Data mining systems should provide capabilities for mining association rules at multiple levels of abstraction.

• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (ancestors).

A concept hierarchy for All Electronics

Multiple-Level Association Rules

• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.

• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework.

• A top-down strategy is used, in which counts are accumulated for the calculation of frequent itemsets at each concept level.
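Counting support at each concept level can be sketched as below. The two-level hierarchy, the transactions, and the function name support_counts are all hypothetical; a real miner would count itemsets rather than single items and would descend the hierarchy level by level.

```python
from collections import Counter

# Hypothetical concept hierarchy: low-level item -> higher-level ancestor.
PARENT = {
    "laptop": "computer", "desktop": "computer",
    "inkjet printer": "printer", "laser printer": "printer",
}

transactions = [
    {"laptop", "inkjet printer"},
    {"desktop", "laser printer"},
    {"laptop", "laser printer"},
    {"desktop"},
]

def support_counts(transactions, generalize=False):
    """Count transactions per item, optionally replacing each item by its ancestor."""
    counts = Counter()
    for t in transactions:
        items = {PARENT[i] for i in t} if generalize else t
        counts.update(items)
    return counts

high_level = support_counts(transactions, generalize=True)
low_level = support_counts(transactions)
print(high_level)  # computer: 4, printer: 3
print(low_level)   # laptop: 2, desktop: 2, laser printer: 2, inkjet printer: 1
```

With a uniform minimum count of 3, only the high-level concepts would be frequent here, which is why a reduced threshold is often used at lower levels.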

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Page 13: Mining Frequent Itemset-Association Analysis

Apriori Algorithm

Apriori Algo

bull Input-I ItemsetD Database of transactionS Support

bull OutputL Large itemset

Apriori Algo(Contdhellip)

Apriori algorrithmK=0 k is used as the scan numberL=nullC1=I intial candidate are set to be the itemsRepeatK=k+1Lk=nullFor each Ii in Ck do

Apriori algo(contd)Ci=0 Intial counts for each itemset are 0For each tj in D do For each Ii in Ck do if Ii C tj then Ci=Ci+1 For each Ii C Ck do

if ci gt= (s|D|) do

LK=LK UII

L=L U Lk

CK+1=Apriori-Gen(LK)

Until Ck+1= null

Generating Association Rules from Frequent Itemsets

Improving the Efficiency of Apriori

bull Hash based techniquebull Transaction reductionbull Partitioningbull Samplingbull Dynamic Item set Counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kind of Association Rules

Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit

ordering among values

Mining Multi Level Association Rules

bull Finding strong association rule at low or primitive level of abstraction is very difficult

bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction

bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor

A concept hierarchy for All Electronics

Multiple level Association Rules

bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules

bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework

bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Disadvantages of mining multilevel association rules

Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items

Disadvantages of mining multilevel association rules

Disadvantages of mining multilevel association rules

bull If a rule doesnrsquot provide any new information then it should be removed

bull A rule R1 is more generalized than rule R2so need to specify rule R2

Mining Multidimensional association rules from Relational Database and DW

Association rules that imply a single predicate that is the predicate buys

Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra

dimensional association rules

Mining Multidimensional association rules from Relational Database and DW

Mining multidimensional database association rules-

Associations rules that involves two or more dimensions or predicate are called multidimensional association rules

Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)

Two Basic Approaches

• Database attributes can be of two types: categorical and quantitative.

• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.

Two Basic Approaches

• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.

Multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
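Static discretization can be illustrated as follows. The interval boundaries, labels, and records are assumptions chosen for the example; the `Counter` of (age interval, item) cells stands in for a very small 2-D slice of a data cube.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical predefined intervals for the attribute "age".
AGE_EDGES = [20, 30, 40]                       # lower bounds of the 2nd..4th intervals
AGE_LABELS = ["<20", "20..29", "30..39", "40+"]


def age_label(age):
    """Replace a numeric age by its interval label (fixed before mining starts)."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


# Made-up (age, item bought) records.
records = [(23, "laptop"), (27, "laptop"), (35, "printer"), (19, "laptop")]

# Aggregated counts per (age interval, item) cell.
cells = Counter((age_label(age), item) for age, item in records)
print(cells)
```

After this step the mining algorithm treats the interval labels like any other categorical value.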

Mining Quantitative Association Rules

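• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criterion, such as maximizing the confidence or compactness of the rules mined.

• A two-dimensional quantitative association rule has the form A_quan1 ^ A_quan2 => A_cat, where A_quan1 and A_quan2 are tests on quantitative attribute intervals and A_cat tests a categorical attribute.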

Association Rule Clustering System

• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.

• ARCS involves the following steps:

Steps involved in ARCS

1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the range of each attribute is partitioned into intervals.

The partitioning process is called binning, and the intervals are the bins. Common binning strategies are:

• Equal-width binning
• Equal-frequency binning
• Clustering-based binning
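The first two binning strategies can be sketched as below. The function names and the sample values are assumptions for illustration; clustering-based binning is omitted because it depends on the choice of clustering method.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / k
    if width == 0:                      # all values identical: one bin
        return [0 for _ in values]
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - low) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign values to k bins holding (roughly) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


values = [1, 2, 3, 4, 5, 100]
width_bins = equal_width_bins(values, 3)
freq_bins = equal_frequency_bins(values, 3)
print(width_bins)
print(freq_bins)
```

The outlier 100 shows the difference: equal-width binning leaves the middle bin empty, while equal-frequency binning puts two values in every bin.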

Steps involved in ARCS(contd)

2. Finding frequent predicate sets: Once the 2-D array containing the count distribution for each category has been set up, it can be scanned to find the frequent predicate sets (those satisfying minimum support) that also satisfy minimum confidence.

3. Clustering the association rules: The strong association rules are then mapped onto a 2-D grid, where adjacent rules can be merged into clusters.
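Steps 1 and 2 can be sketched together. This is a simplified illustration, not the ARCS implementation: the attributes (age, income in thousands), the bin boundaries, the tuples, and the thresholds are all assumptions, and the final merging of adjacent strong cells into rectangular clusters is left out.

```python
from collections import Counter


def bin_index(value, low, width, n_bins):
    """Equal-width bin index, clamped so large values land in the last bin."""
    return min(int((value - low) // width), n_bins - 1)


def strong_cells(tuples, target, min_sup, min_conf):
    """Return the grid cells (age_bin, income_bin) that form strong rules for `target`."""
    all_counts, cat_counts = Counter(), Counter()
    for age, income, category in tuples:
        cell = (bin_index(age, 20, 10, 3), bin_index(income, 20, 20, 3))
        all_counts[cell] += 1
        if category == target:
            cat_counts[cell] += 1
    n = len(tuples)
    return {
        cell
        for cell, count in cat_counts.items()
        if count / n >= min_sup and count / all_counts[cell] >= min_conf
    }


# Made-up tuples: (age, income in thousands, item bought).
data = [
    (25, 30, "HDTV"), (27, 35, "HDTV"), (26, 32, "none"),
    (35, 45, "HDTV"), (36, 50, "HDTV"), (45, 70, "none"),
]

cells_loose = strong_cells(data, "HDTV", min_sup=0.3, min_conf=0.6)
cells_strict = strong_cells(data, "HDTV", min_sup=0.3, min_conf=0.8)
print(cells_loose, cells_strict)
```

Each returned cell corresponds to one rule of the form age interval ^ income interval => buys(X, "HDTV").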

From Association Mining to Correlation Analysis

Even strong association rules can be misleading.

The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.

Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)P(B)$; otherwise, itemsets A and B are dependent and correlated as events.

From Association Mining to Correlation Analysis

$\mathrm{lift}(A, B) = \dfrac{P(A \cup B)}{P(A)\,P(B)}$

If $\mathrm{lift}(A, B) < 1$, the occurrence of A is negatively correlated with the occurrence of B.

If $\mathrm{lift}(A, B) > 1$, the occurrence of A is positively correlated with the occurrence of B.

If $\mathrm{lift}(A, B) = 1$, A and B are independent and there is no correlation between them.

From Association Mining to Correlation Analysis

Equivalently, $\mathrm{lift}(A, B) = \dfrac{P(B \mid A)}{P(B)} = \dfrac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{sup}(B)}$.

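• Correlation analysis using lift:

Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.

Probability of purchasing a computer game: P({game}) = 60% = 0.60
Probability of purchasing a video: P({video}) = 75% = 0.75
Probability of purchasing both: P({game, video}) = 40% = 0.40

$\mathrm{lift}(game, video) = \dfrac{0.40}{0.60 \times 0.75} = 0.89$

Because the lift is less than 1, the purchase of computer games is negatively correlated with the purchase of videos.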
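The lift computation for the game and video example can be checked with a short script. The function names are illustrative; the probabilities are the ones given above.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A and B together) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


def lift_from_confidence(confidence, sup_b):
    """Equivalent form: conf(A => B) / sup(B)."""
    return confidence / sup_b


p_game, p_video, p_both = 0.60, 0.75, 0.40

game_video_lift = lift(p_game, p_video, p_both)
# conf(game => video) = P(both) / P(game)
same_lift = lift_from_confidence(p_both / p_game, p_video)

print(round(game_video_lift, 2))
```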

Examples

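Correlation analysis using χ²

$\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$

$= \dfrac{(4000 - 4500)^2}{4500} + \dfrac{(3500 - 3000)^2}{3000} + \dfrac{(2000 - 1500)^2}{1500} + \dfrac{(500 - 1000)^2}{1000} = 555.6$

The χ² value is much greater than 1, and the observed count for the (game, video) cell, 4000, is less than its expected count, 4500, so buying games and buying videos are negatively correlated.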
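The χ² value can be recomputed from the observed counts alone, since the expected counts follow from the row and column totals. In the sketch below the arrangement of the four observed counts into rows (video, ¬video) and columns (game, ¬game) is an assumption, chosen because it reproduces the expected values 4500, 3000, 1500, and 1000 used in the formula.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Rows: video, not video. Columns: game, not game.
observed = [
    [4000, 3500],
    [2000, 500],
]

chi2 = chi_square(observed)
print(round(chi2, 1))
```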

Other correlation measure

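• All_confidence
• Cosine

• $\mathrm{all\_conf}(X) = \dfrac{\mathrm{sup}(X)}{\mathrm{max\_item\_sup}(X)}$, where max_item_sup(X) is the maximum single-item support among the items in X.

• $\mathrm{cosine}(A, B) = \dfrac{P(A \cup B)}{\sqrt{P(A)\,P(B)}} = \dfrac{\mathrm{sup}(A \cup B)}{\sqrt{\mathrm{sup}(A)\,\mathrm{sup}(B)}}$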
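Both measures are one-liners in code. The sketch below applies them to the game and video supports (0.60, 0.75, and 0.40 for both) purely as an illustration; the function names are assumptions.

```python
from math import sqrt


def all_confidence(sup_itemset, item_supports):
    """sup(X) divided by the largest single-item support in X."""
    return sup_itemset / max(item_supports)


def cosine(sup_a, sup_b, sup_ab):
    """sup(A and B together) divided by the geometric mean of sup(A) and sup(B)."""
    return sup_ab / sqrt(sup_a * sup_b)


all_conf_value = all_confidence(0.40, [0.60, 0.75])
cosine_value = cosine(0.60, 0.75, 0.40)
print(round(all_conf_value, 3), round(cosine_value, 3))
```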

Comparison of four correlation measures on typical data set

• A null transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null transactions. All_confidence and cosine are null-invariant; lift and χ² are not.
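The effect of null transactions can be seen by holding the counts for A and B fixed and varying only the number of transactions that contain neither. The counts below are made up for illustration.

```python
from math import sqrt


def lift_from_counts(n_a, n_b, n_ab, n_total):
    """Lift computed from raw counts; depends on the total number of transactions."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))


def cosine_from_counts(n_a, n_b, n_ab):
    """Cosine computed from raw counts; the total never enters the formula."""
    return n_ab / sqrt(n_a * n_b)


n_a, n_b, n_ab = 1000, 1000, 100
non_null = n_a + n_b - n_ab            # transactions containing A or B

lift_no_nulls = lift_from_counts(n_a, n_b, n_ab, non_null)
lift_many_nulls = lift_from_counts(n_a, n_b, n_ab, non_null + 100_000)
cosine_value = cosine_from_counts(n_a, n_b, n_ab)

print(lift_no_nulls, lift_many_nulls, cosine_value)
```

With the same A and B counts, lift moves from below 1 to well above 1 as null transactions are added, while cosine stays at 0.1.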

Constraint Based Association mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.

• A good heuristic is to have the users specify their intuition or expectations as constraints that confine the search space. This strategy is known as constraint-based mining.

Constraint Based Association mining

• The constraints can include the following:
  • Knowledge type constraints
  • Data constraints
  • Dimension/level constraints
  • Interestingness constraints
  • Rule constraints

metarule-Guided Mining of association rules

• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.

• Metarule-guided mining example: suppose we want to find associations between customer traits and the items that customers buy, but are only interested in determining which pairs of customer traits promote the sale of office software.
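A metarule for this example can be written as P1(X, Y) ^ P2(X, W) => buys(X, "office software"), where P1 and P2 are instantiated with customer traits during mining. The sketch below assumes that template; the customer records, trait names, and thresholds are invented for illustration.

```python
from collections import Counter
from itertools import combinations

TRAITS = ("age", "income", "occupation")   # hypothetical candidate predicates

# Made-up customer records.
CUSTOMERS = [
    {"age": "30..39", "income": "41K..60K", "occupation": "engineer",
     "buys": {"office software"}},
    {"age": "30..39", "income": "41K..60K", "occupation": "teacher",
     "buys": {"office software"}},
    {"age": "20..29", "income": "41K..60K", "occupation": "engineer",
     "buys": set()},
    {"age": "30..39", "income": "20K..40K", "occupation": "teacher",
     "buys": {"printer"}},
]


def mine_metarule(customers, target, min_sup, min_conf):
    """Instantiate P1 and P2 with every pair of traits and keep the strong rules."""
    rules = []
    n = len(customers)
    for p1, p2 in combinations(TRAITS, 2):
        body, both = Counter(), Counter()
        for c in customers:
            key = (c[p1], c[p2])
            body[key] += 1
            if target in c["buys"]:
                both[key] += 1
        for key, count in both.items():
            sup, conf = count / n, count / body[key]
            if sup >= min_sup and conf >= min_conf:
                rules.append(((p1, key[0]), (p2, key[1]), sup, conf))
    return rules


rules = mine_metarule(CUSTOMERS, "office software", min_sup=0.5, min_conf=0.9)
print(rules)
```

Only rules that match the template are ever generated, which is how the metarule confines the search space.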

Constraint Pushing

Rule Constraint

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain each of them with an example in DMQL.
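As a Python illustration (not DMQL) of why the category of a rule constraint matters, the sketch below pushes an antimonotonic constraint, sum(price) <= budget, into a level-wise search. The prices and the budget are made up. With non-negative prices, once an itemset violates the constraint every superset also violates it, so violating candidates are never extended.

```python
# Hypothetical item prices.
PRICES = {"pen": 2, "notebook": 5, "printer": 120, "laptop": 900}


def total_price(itemset):
    return sum(PRICES[item] for item in itemset)


def itemsets_within_budget(items, budget):
    """Enumerate all itemsets with sum(price) <= budget, pruning level by level."""
    level = [frozenset([i]) for i in items if PRICES[i] <= budget]
    result = []
    while level:
        result.extend(level)
        next_level = set()
        for itemset in level:
            for item in items:
                if item not in itemset:
                    candidate = itemset | {item}
                    # Antimonotonic pruning: a violating candidate is dropped,
                    # so none of its supersets is ever generated.
                    if total_price(candidate) <= budget:
                        next_level.add(candidate)
        level = list(next_level)
    return result


found = itemsets_within_budget(list(PRICES), budget=130)
print(len(found))
```

A monotonic constraint such as sum(price) >= v behaves the other way round: once an itemset satisfies it, every superset does too, so the check need not be repeated for supersets.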

  • Mining Frequent itemset
  • Mining Frequent Item set
  • Market Basket Analysis
  • Market Basket Analysis (2)
  • Market Basket Analysis (3)
  • Frequent ItemsetClosed Itemset and Association Rules
  • Frequent ItemsetClosed Itemset and Association Rules (2)
  • Frequent ItemsetClosed Itemset and Association Rules (3)
  • Closed Frequent Itemset & Maximal Frequent Item set
  • Frequent Pattern mining
  • Efficient and Scalable Frequent Itemset Mining method
  • Apriori property
  • Apriori Algorithm
  • Slide 14
  • Apriori Algo
  • Apriori Algo(Contd…)
  • Apriori algo(contd)
  • Generating Association Rules from Frequent Itemsets
  • Improving the Efficiency of Apriori
  • Mining Frequent Itemset without Candidate Generation(FP Growth)
  • FP Growth
  • FP Growth Algo
  • Mining Frequent Itemset using Vertical Format
  • Mining frequent Itemset using Vertical Data Format
  • Slide 25
  • Slide 26
  • Mining Various Kind of Association Rules
  • Mining Multi Level Association Rules
  • A concept hierarchy for All Electronics
  • Multiple level Association Rules
  • Using uniform minimum support for all levels
  • Using reduced minimum support at lower levels
  • Disadvantages of mining multilevel association rules
  • Disadvantages of mining multilevel association rules (2)
  • Disadvantages of mining multilevel association rules (3)
  • Mining Multidimensional association rules from Relational Datab
  • Mining Multidimensional association rules from Relational Datab (2)
  • Two Basic Approaches
  • Two Basic Approaches (2)
  • Mining Multidimensional Association Rules using Static Discreti
  • Slide 41
  • Mining Quantitative Association Rules
  • Association Rule Clustering System
  • Steps involved in ARCS
  • Steps involved in ARCS(contd)
  • Steps involved in ARCS(contd…)
  • Steps involved in ARCS(contd…) (2)
  • From Association Mining to Correlation Analysis
  • From Association Mining to Correlation Analysis (2)
  • From Association Mining to Correlation Analysis (3)
  • Examples
  • Correlation analysis using X2
  • Other correlation measure
  • Comparison of four correlation measures on typical data set
  • Constraint Based Association mining
  • Constraint Based Association mining (2)
  • metarule-Guided Mining of association rules
  • Constraint Pushing
  • Rule Constraint
Page 14: Mining Frequent Itemset-Association Analysis

Apriori Algo

• Input:
  I: itemsets
  D: database of transactions
  s: support

• Output:
  L: large itemsets

Apriori Algo (Contd…)

Apriori algorithm:

k = 0;                          // k is used as the scan number
L = ∅;
C1 = I;                         // initial candidates are set to be the items
repeat
    k = k + 1;
    Lk = ∅;
    for each Ii ∈ Ck do
        ci = 0;                 // initial counts for each itemset are 0
    for each tj ∈ D do
        for each Ii ∈ Ck do
            if Ii ⊆ tj then
                ci = ci + 1;
    for each Ii ∈ Ck do
        if ci >= (s × |D|) then
            Lk = Lk ∪ Ii;
    L = L ∪ Lk;
    Ck+1 = Apriori-Gen(Lk);
until Ck+1 = ∅;
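The pseudocode translates fairly directly into Python. The version below is a minimal sketch, not an optimized implementation: the slides do not spell out Apriori-Gen, so `apriori_gen` here is one straightforward way to do the join and prune steps, and the transactions are made up.

```python
from itertools import combinations


def apriori_gen(prev_level):
    """Join large k-itemsets into (k+1)-candidates; prune by the Apriori property."""
    prev = set(prev_level)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == len(a) + 1:
                # Keep a candidate only if every k-subset is itself large.
                if all(frozenset(sub) in prev
                       for sub in combinations(union, len(a))):
                    candidates.add(union)
    return candidates


def apriori(items, transactions, s):
    """Return {itemset: count} for all itemsets with count >= s * |D|."""
    transactions = [frozenset(t) for t in transactions]
    min_count = s * len(transactions)
    large = {}
    candidates = {frozenset([i]) for i in items}      # C1 = I
    while candidates:                                  # one database scan per pass
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:                             # candidate contained in t
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        large.update(level)                            # L = L ∪ Lk
        candidates = apriori_gen(level)                # Ck+1
    return large


D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
large = apriori({"bread", "milk", "butter"}, D, s=0.5)
for itemset, count in sorted(large.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy run the candidate {bread, milk, butter} is pruned without being counted, because its subset {milk, butter} is not large.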

Generating Association Rules from Frequent Itemsets

Improving the Efficiency of Apriori

• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kind of Association Rules

Mining:

• Multilevel association rules: involve concepts at different levels of abstraction.
• Multidimensional association rules: involve more than one dimension or predicate.
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values.

Page 15: Mining Frequent Itemset-Association Analysis

Page 16: Mining Frequent Itemset-Association Analysis

Apriori algo(contd)Ci=0 Intial counts for each itemset are 0For each tj in D do For each Ii in Ck do if Ii C tj then Ci=Ci+1 For each Ii C Ck do

if ci gt= (s|D|) do

LK=LK UII

L=L U Lk

CK+1=Apriori-Gen(LK)

Until Ck+1= null

Generating Association Rules from Frequent Itemsets

Improving the Efficiency of Apriori

bull Hash based techniquebull Transaction reductionbull Partitioningbull Samplingbull Dynamic Item set Counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kind of Association Rules

Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit

ordering among values

Mining Multi Level Association Rules

bull Finding strong association rule at low or primitive level of abstraction is very difficult

bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction

bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor

A concept hierarchy for All Electronics

Multiple level Association Rules

bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules

bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework

bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Disadvantages of mining multilevel association rules

Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items

Disadvantages of mining multilevel association rules

Disadvantages of mining multilevel association rules

bull If a rule doesnrsquot provide any new information then it should be removed

bull A rule R1 is more generalized than rule R2so need to specify rule R2

Mining Multidimensional association rules from Relational Database and DW

Association rules that imply a single predicate that is the predicate buys

Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra

dimensional association rules

Mining Multidimensional association rules from Relational Database and DW

Mining multidimensional database association rules-

Associations rules that involves two or more dimensions or predicate are called multidimensional association rules

Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)

Two Basic Approaches

bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of

possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes

Two Basic Approaches

Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels

Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules

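Mining Quantitative Association Rules

• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criteria (for example, maximizing the confidence of the rules mined).

• A two-dimensional quantitative rule has the form A_quant1 ^ A_quant2 => A_cat, where A_quant1 and A_quant2 are tests on intervals of quantitative attributes and A_cat is a test on a categorical attribute.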

Association Rule Clustering System

• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.

• The following steps are involved in ARCS:

Steps involved in ARCS

1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:

• Equal-width binning
• Equal-frequency binning
• Clustering-based binning
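The first two binning strategies can be sketched as follows. This is an illustration only: the function names and the income values are hypothetical, and the equal-width version assumes the values are not all identical.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes hi > lo
    # The maximum value would land just past the last bin, so clamp it.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal numbers of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

incomes = [10, 12, 15, 20, 40, 100]  # hypothetical values
width_bins = equal_width_bins(incomes, 3)
freq_bins = equal_frequency_bins(incomes, 3)
print(width_bins, freq_bins)
```

On skewed data such as this, equal-width binning puts most values in the first interval, while equal-frequency binning spreads them evenly across bins.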

Steps involved in ARCS (contd.)

2. Finding frequent predicate sets: Once the values have been distributed into bins, each category can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.

3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.

From Association Mining to Correlation Analysis

Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.

Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise A and B are dependent and correlated as events.

$$\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$

• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.
• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.
• If lift(A, B) = 1, A and B are independent and there is no correlation between them.

Equivalently,

$$\mathrm{lift}(A, B) = \frac{P(B \mid A)}{P(B)} = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{sup}(B)}$$

• Correlation analysis using lift:

Let ¬game refer to transactions that do not contain computer games, and ¬video refer to transactions that do not contain videos. The transactions can be summarized in a contingency table.

Probability of purchasing a computer game = 0.60. Probability of purchasing a video = 0.75. Probability of purchasing both = 0.40. By the lift formula, lift = 0.40 / (0.60 × 0.75) = 0.89. Since this value is less than 1, the purchase of games and the purchase of videos are negatively correlated.
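The lift calculation for the game and video example can be checked with a few lines of Python. The function names are illustrative; the three probabilities are the ones given above.

```python
def lift(p_ab, p_a, p_b):
    """lift(A, B) = P(A and B together) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

def interpret(value, tol=1e-9):
    """Map a lift value to the interpretation used in the slides."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"

game_video_lift = lift(0.40, 0.60, 0.75)
print(round(game_video_lift, 2), interpret(game_video_lift))
# prints: 0.89 negatively correlated
```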

Examples

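Correlation analysis using χ²

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(4000 - 4500)^2}{4500} + \frac{(3500 - 3000)^2}{3000} + \frac{(2000 - 1500)^2}{1500} + \frac{(500 - 1000)^2}{1000} = 555.6$$

The χ² value is large, and the observed count for transactions containing both a game and a video (4000) is lower than the expected count (4500), so buying games and buying videos are negatively correlated.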
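The χ² value above can be reproduced from the observed counts alone, since the expected counts follow from the row and column totals. In this sketch the layout of the table (which count sits in which cell) is an assumption inferred from the probabilities in the lift example; the function name is illustrative.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            total += (obs - expected) ** 2 / expected
    return total

# Assumed layout. Rows: video, no video. Columns: game, no game.
observed = [[4000, 3500],
            [2000, 500]]
chi2 = chi_square(observed)
print(round(chi2, 1))  # prints: 555.6
```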

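Other correlation measures

• All_confidence
• Cosine

$$\mathrm{all\_conf}(X) = \frac{\mathrm{sup}(X)}{\mathrm{max\_item\_sup}(X)}$$

where max_item_sup(X) is the largest single-item support among the items in X.

$$\mathrm{cosine}(A, B) = \frac{P(A \cup B)}{\sqrt{P(A)\,P(B)}} = \frac{\mathrm{sup}(A \cup B)}{\sqrt{\mathrm{sup}(A)\,\mathrm{sup}(B)}}$$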
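A small sketch of both measures computed directly from transactions. The toy transactions and helper names are hypothetical and serve only to illustrate the two formulas.

```python
from math import sqrt

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def all_confidence(itemset, transactions):
    """sup(X) divided by the largest single-item support among the items of X."""
    max_item_sup = max(support({i}, transactions) for i in itemset)
    return support(itemset, transactions) / max_item_sup

def cosine(a, b, transactions):
    """sup(A with B) / sqrt(sup(A) * sup(B))."""
    both = support(set(a) | set(b), transactions)
    return both / sqrt(support(a, transactions) * support(b, transactions))

# Hypothetical toy data.
transactions = [{"milk", "bread"}, {"milk", "bread"}, {"milk"}, {"bread"}, {"eggs"}]
ac = all_confidence({"milk", "bread"}, transactions)
cs = cosine({"milk"}, {"bread"}, transactions)
print(round(ac, 3), round(cs, 3))
```

Both measures take values between 0 and 1, with higher values indicating a stronger association between the items.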

Comparison of four correlation measures on a typical data set

• A null transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null transactions.
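To see what freedom from the influence of null transactions means in practice, the sketch below adds 10,000 null transactions to a hypothetical data set and recomputes lift and cosine. All counts are invented for illustration.

```python
from math import sqrt

def lift_from_counts(n_ab, n_a, n_b, n):
    return (n_ab / n) / ((n_a / n) * (n_b / n))

def cosine_from_counts(n_ab, n_a, n_b, n):
    return (n_ab / n) / sqrt((n_a / n) * (n_b / n))

# Hypothetical counts: 200 transactions contain A, 200 contain B, 100 contain both.
n_ab, n_a, n_b = 100, 200, 200

results = []
for n_null in (0, 10000):
    # Total = transactions containing A or B, plus the null transactions.
    n = n_a + n_b - n_ab + n_null
    results.append((n_null,
                    lift_from_counts(n_ab, n_a, n_b, n),
                    cosine_from_counts(n_ab, n_a, n_b, n)))

for n_null, lift_value, cosine_value in results:
    print(n_null, round(lift_value, 2), round(cosine_value, 2))
```

Lift moves from below 1 to far above 1 purely because of the added null transactions, while cosine stays at 0.5. In that sense cosine is not affected by null transactions and lift is.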

Constraint Based Association mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.

• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.

• The constraints can include the following:
  • Knowledge type constraints
  • Data constraints
  • Dimension/level constraints
  • Interestingness constraints
  • Rule constraints

Metarule-Guided Mining of association rules

• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.

• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where we are only interested in determining which pairs of customer traits promote the sale of office software.
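One way to picture a metarule is as a template that candidate rules are matched against. The sketch below assumes a template of the form P1(X, Y) ^ P2(X, W) => buys(X, "office software"), where P1 and P2 are two different customer-trait predicates. The template form, the trait names, the rule encoding and the sample rules are all assumptions made for illustration.

```python
# A rule is (antecedent, consequent); each side is a set of (predicate, value) pairs.
def matches_metarule(rule, trait_predicates, target):
    """True if the rule has the form P1(X, Y) ^ P2(X, W) => target,
    with P1 and P2 two different customer-trait predicates."""
    antecedent, consequent = rule
    predicates = {p for p, _ in antecedent}
    return (
        len(antecedent) == 2
        and len(predicates) == 2
        and predicates <= trait_predicates
        and consequent == {target}
    )

TRAITS = {"age", "income", "occupation"}  # hypothetical trait predicates
TARGET = ("buys", "office software")

rules = [
    ({("age", "30..39"), ("income", "41K..60K")}, {TARGET}),
    ({("age", "30..39")}, {TARGET}),                      # only one trait
    ({("age", "30..39"), ("buys", "laptop")}, {TARGET}),  # "buys" is not a trait
    ({("age", "30..39"), ("income", "41K..60K")}, {("buys", "printer")}),
]
selected = [r for r in rules if matches_metarule(r, TRAITS, TARGET)]
print(len(selected))  # prints: 1
```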

Constraint Pushing

Rule Constraint

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain all of them with an example in DMQL.
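As an illustration of the antimonotonic category (in Python rather than DMQL), consider the constraint sum(price) <= budget with non-negative prices. If an itemset violates it, every superset violates it as well, so violating itemsets never need to be extended. The item names, prices and function name below are hypothetical.

```python
def itemsets_within_budget(prices, budget):
    """Level-wise enumeration of itemsets whose total price is <= budget.

    With non-negative prices the constraint is antimonotonic: once an
    itemset exceeds the budget, all of its supersets do too, so only
    itemsets that satisfy the constraint are extended.
    """
    items = sorted(prices)
    level = [(i,) for i in items if prices[i] <= budget]
    result = []
    while level:
        result.extend(level)
        next_level = []
        for itemset in level:
            # Extend only with items later in sorted order,
            # so each itemset is generated exactly once.
            for item in items:
                if item > itemset[-1]:
                    candidate = itemset + (item,)
                    if sum(prices[i] for i in candidate) <= budget:
                        next_level.append(candidate)
        level = next_level
    return result

prices = {"cable": 10, "mouse": 25, "keyboard": 45, "monitor": 150}  # hypothetical
cheap_sets = itemsets_within_budget(prices, 60)
print(cheap_sets)
```

The monitor is pruned at the first level, so no itemset containing it is ever generated or checked.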

Page 17: Mining Frequent Itemset-Association Analysis

Generating Association Rules from Frequent Itemsets

Improving the Efficiency of Apriori

• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kinds of Association Rules

Mining:

• Multilevel association rules: involve concepts at different levels of abstraction.
• Multidimensional association rules: involve more than one dimension or predicate.
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values.

Mining Multi Level Association Rules

• Finding strong association rules at low or primitive levels of abstraction is very difficult because the data are sparse at those levels.

• A data mining system should provide capabilities for mining association rules at multiple levels of abstraction.

• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (their ancestors).

A concept hierarchy for All Electronics

Multiple level Association Rules

• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.

• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework.

• A top-down strategy is employed, where counts are accumulated for the calculation of frequent itemsets at each concept level.
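The top-down strategy can be sketched with a two-level concept hierarchy. Everything in this example is hypothetical (the hierarchy, the transactions, the thresholds and the function names); it only shows how counts are accumulated per concept level and how the choice between a uniform and a reduced minimum support changes what is found at the lower level.

```python
# Hypothetical two-level concept hierarchy: item -> ancestor.
PARENT = {
    "laptop": "computer",
    "desktop": "computer",
    "inkjet printer": "printer",
    "laser printer": "printer",
}

def support_counts(transactions, level):
    """Count, per concept, the transactions that contain it.

    Level 1 generalizes every item to its ancestor; level 2 uses the raw items.
    """
    counts = {}
    for t in transactions:
        concepts = {PARENT[i] for i in t} if level == 1 else set(t)
        for c in concepts:
            counts[c] = counts.get(c, 0) + 1
    return counts

def frequent(counts, min_count):
    return {c for c, n in counts.items() if n >= min_count}

transactions = [
    {"laptop", "inkjet printer"},
    {"desktop", "laser printer"},
    {"laptop"},
    {"desktop", "inkjet printer"},
]

# Top-down: mine the general level first, then the more specific one.
level1 = frequent(support_counts(transactions, 1), min_count=3)
level2_uniform = frequent(support_counts(transactions, 2), min_count=3)
level2_reduced = frequent(support_counts(transactions, 2), min_count=2)
print(level1, level2_uniform, level2_reduced)
```

With the same threshold at both levels nothing is frequent at the item level, while lowering the threshold there recovers three frequent items.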

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Page 19: Mining Frequent Itemset-Association Analysis

Mining Frequent Itemset without Candidate Generation(FP Growth)

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kind of Association Rules

Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit

ordering among values

Mining Multi Level Association Rules

bull Finding strong association rule at low or primitive level of abstraction is very difficult

bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction

bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor

A concept hierarchy for All Electronics

Multiple level Association Rules

bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules

bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework

bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Disadvantages of mining multilevel association rules

Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items

Disadvantages of mining multilevel association rules

Disadvantages of mining multilevel association rules

bull If a rule doesnrsquot provide any new information then it should be removed

bull A rule R1 is more generalized than rule R2so need to specify rule R2

Mining Multidimensional association rules from Relational Database and DW

Association rules that imply a single predicate that is the predicate buys

Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra

dimensional association rules

Mining Multidimensional association rules from Relational Database and DW

Mining multidimensional database association rules-

Associations rules that involves two or more dimensions or predicate are called multidimensional association rules

Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)

Two Basic Approaches

bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of

possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes

Two Basic Approaches

Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels

Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules

Mining Quantitative Association Rules

bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria

bull A(quant1) ^ A(quant2) =gt Acat

Association Rule Clustering System

bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition

bull The following steps are involved in ARCS-

Steps involved in ARCS

1 Binning- Quantitative attributes can have a very wide range of values defining their domain

The partioning process is called binning The intervals are considered as binsThe common binning strategies are-

Equal width binning Equal Frequency Binning Clustering based binning

Steps involved in ARCS(contd)

2 Finding frequent predicate sets-Once the distribution takes place later each category can be scanned to find most frquent itemset that statisfy minimum support and minimum confidence

3Clustering the association rules-the strong association rules are then mapped to 2-D grid

Steps involved in ARCS(contdhellip)

Steps involved in ARCS(contdhellip)

From Association Mining to Correlation Analysis

Even strong association rules can be misleadingSupport Confidence framework can be

supplemented by additional measure based on statistical significance and correlational analysis

Lift is a simple correlation measureThe occurrence of itemset A is independent of the

occurrence of itemset B if P(AUB)=P(A)P(B) otherwise A and B are correlated and dependent as event

From Association Mining to Correlation Analysis

lift(AB)= P(AUB)P(A)P(B)If lift(AB)lt1 then the occurrence of A is

negatively correlated with occurrence of BIf Lift(AB) gt1 then the occurrence of A is

positively correlated with occurrence of BIf Lift(AB)=1 then A and B are independent and

no correlation

From Association Mining to Correlation Analysis

P(BA)P(B) or con f(A=gtB)sup(B)

bull Correlation Analysis using lift-

Let game refer to tranasction that donot contains games And Video refer to transaction that donot contain videosThe transaction can be summarized in the contingency table

Probabiltiy of purchasing a computer game=60Probability of purchsing a video=75Probability of purchasing both=40By rule=406075=89So as by rule it is less than 1 so it is negatively correlated

Examples

Correlation analysis using χ²

• χ² = Σ (observed − expected)² / expected
• = (4000 − 4500)² / 4500 + (3500 − 3000)² / 3000 + (2000 − 1500)² / 1500 + (500 − 1000)² / 1000
• = 555.6
• Because the χ² value is greater than 1 and the observed count for (game, video), 4000, is less than the expected count of 4500, buying games and buying videos are negatively correlated.
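
A minimal sketch of the same computation, assuming the four observed counts correspond to the cells (game, video), (¬game, video), (game, ¬video), and (¬game, ¬video) in that order, which is consistent with the 60%, 75%, and 40% probabilities quoted earlier. The expected counts are derived from the row and column totals under independence rather than typed in.

```python
observed = {
    ("game", "video"): 4000,
    ("no game", "video"): 3500,
    ("game", "no video"): 2000,
    ("no game", "no video"): 500,
}

n = sum(observed.values())
row_totals, col_totals = {}, {}
for (g, v), count in observed.items():
    row_totals[g] = row_totals.get(g, 0) + count
    col_totals[v] = col_totals.get(v, 0) + count

# Expected count of a cell if the two purchases were independent.
expected = {(g, v): row_totals[g] * col_totals[v] / n for (g, v) in observed}

chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
print(expected[("game", "video")], round(chi2, 1))
```

The derived expected counts (4500, 3000, 1500, 1000) match the denominators in the slide, and the sum comes to about 555.6.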

Other correlation measures

• All_confidence
• Cosine

• all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the largest support of any single item in X

• cosine(A, B) = P(A ∪ B) / sqrt(P(A) P(B)) = sup(A ∪ B) / sqrt(sup(A) sup(B))
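
Both measures can be evaluated on the game/video example. This sketch assumes support counts of 6000, 7500, and 4000 out of 10,000 transactions, which follow from the probabilities and counts used in the lift and χ² examples; the function names are illustrative.

```python
from math import sqrt


def all_confidence(sup_itemset, item_supports):
    """sup(X) divided by the largest support among the single items of X."""
    return sup_itemset / max(item_supports)


def cosine(sup_ab, sup_a, sup_b):
    """sup(A ∪ B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)


sup_game, sup_video, sup_both = 6000, 7500, 4000

all_conf = all_confidence(sup_both, [sup_game, sup_video])
cos = cosine(sup_both, sup_game, sup_video)
print(round(all_conf, 3), round(cos, 3))
```

Neither formula refers to the total number of transactions, which is why these two measures are unaffected by transactions containing neither item.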

Comparison of four correlation measures on typical data set

• A null transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null transactions.
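
To illustrate what this means in practice, the sketch below pads the game/video data with 90,000 hypothetical null transactions and recomputes lift and cosine. The counts 6000, 7500, and 4000 follow from the earlier example; the padding amount and function names are assumptions made for the demonstration.

```python
from math import sqrt


def lift(n, sup_a, sup_b, sup_ab):
    """Lift computed from support counts over n transactions."""
    return (sup_ab / n) / ((sup_a / n) * (sup_b / n))


def cosine(sup_a, sup_b, sup_ab):
    """Cosine uses only the three support counts, never n."""
    return sup_ab / sqrt(sup_a * sup_b)


sup_a, sup_b, sup_ab = 6000, 7500, 4000

lift_before = lift(10_000, sup_a, sup_b, sup_ab)
# Add 90,000 null transactions: n grows, the support counts do not.
lift_after = lift(100_000, sup_a, sup_b, sup_ab)

cos_before = cosine(sup_a, sup_b, sup_ab)
cos_after = cosine(sup_a, sup_b, sup_ab)  # n is not an input, so nothing changes

print(round(lift_before, 2), round(lift_after, 2), round(cos_before, 3))
```

Lift flips from below 1 to well above 1 purely because of the added null transactions, whereas cosine stays the same, so lift is not null-invariant and cosine is.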

Constraint Based Association mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.

• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.

Constraint Based Association mining

• The constraints can include the following:
  • Knowledge type constraints
  • Data constraints
  • Dimension/level constraints
  • Interestingness constraints
  • Rule constraints

metarule-Guided Mining of association rules

• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.

• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where we are interested only in determining which pairs of customer traits promote the sale of office software.
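
A metarule for this example can be read as the template P1(X, Y) ∧ P2(X, W) ⇒ buys(X, "office software"), where P1 and P2 are placeholders for customer traits. The sketch below instantiates that template over every pair of trait attributes and keeps the strong instances. The customer records, attribute names, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_metarule(customers, traits, target, min_sup, min_conf):
    """Instantiate 'trait1 AND trait2 => target' for every pair of traits."""
    n = len(customers)
    rules = []
    for t1, t2 in combinations(traits, 2):
        body = Counter()  # customers matching a (value1, value2) pair
        both = Counter()  # ... who also satisfy the target predicate
        for c in customers:
            key = (c[t1], c[t2])
            body[key] += 1
            if c[target]:
                both[key] += 1
        for key, count in both.items():
            sup, conf = count / n, count / body[key]
            if sup >= min_sup and conf >= min_conf:
                rules.append(((t1, key[0]), (t2, key[1]), sup, conf))
    return rules


# Hypothetical customer table
customers = [
    {"age": "30...39", "income": "41K...60K", "buys_office_software": True},
    {"age": "30...39", "income": "41K...60K", "buys_office_software": True},
    {"age": "20...29", "income": "21K...40K", "buys_office_software": False},
    {"age": "30...39", "income": "21K...40K", "buys_office_software": False},
]
rules = mine_metarule(customers, ["age", "income"], "buys_office_software", 0.25, 0.7)
print(rules)
```

Because the template fixes the shape of the rule, only pairs of traits on the left-hand side and the single target on the right are ever examined, which is how a metarule narrows the search.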

Constraint Pushing

Rule Constraint

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain all of them with examples in DMQL.
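
As one illustration of the first category (written in Python rather than DMQL): a constraint is antimonotonic if, once an itemset violates it, every superset violates it too, so the itemset can be pruned before it is extended. The sketch assumes non-negative prices and a hypothetical constraint sum(price) ≤ 130; the item names and prices are invented for the example.

```python
prices = {"pen": 2, "notebook": 5, "printer": 120, "laptop": 900}  # hypothetical
BUDGET = 130


def within_budget(itemset):
    """sum(price) <= BUDGET is antimonotonic when all prices are non-negative."""
    return sum(prices[i] for i in itemset) <= BUDGET


def grow(level, items):
    """Extend surviving itemsets by one surviving item, dropping violators at once."""
    return {
        s | {i}
        for s in level
        for i in items
        if i not in s and within_budget(s | {i})
    }


level1 = {frozenset([i]) for i in prices if within_budget([i])}
items = {i for s in level1 for i in s}  # "laptop" is already gone
level2 = grow(level1, items)
level3 = grow(level2, items)
print(sorted(sorted(s) for s in level2))
```

The single item "laptop" fails the constraint, so no itemset containing it is ever generated. The mirror-image constraint sum(price) ≥ v would be monotonic: once satisfied, it stays satisfied for every superset.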

Page 20: Mining Frequent Itemset-Association Analysis

FP Growth

FP Growth Algo

Mining Frequent Itemset using Vertical Format

Mining frequent Itemset using Vertical Data Format

Mining Various Kinds of Association Rules

Mining:
• Multilevel association rules: involve concepts at different levels of abstraction
• Multidimensional association rules: involve more than one dimension or predicate
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values

Mining Multi Level Association Rules

• Finding strong association rules at low or primitive levels of abstraction is very difficult because the data at those levels are sparse.

• A data mining system should provide capabilities for mining association rules at multiple levels of abstraction.

• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (their ancestors).

A concept hierarchy for All Electronics

Multiple level Association Rules

• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.

• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework.

• A top-down strategy is used, in which counts are accumulated for the calculation of frequent itemsets at each concept level.
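
The top-down strategy can be sketched for single items on a two-level hierarchy. The hierarchy, transactions, and thresholds below are hypothetical, and the sketch only counts individual items rather than full itemsets. Passing the same threshold for both levels corresponds to uniform minimum support; passing a smaller one for the lower level corresponds to reduced minimum support.

```python
from collections import Counter

# Hypothetical concept hierarchy: specific item -> more general ancestor
parent = {
    "IBM laptop": "laptop computer",
    "Dell laptop": "laptop computer",
    "HP printer": "printer",
}

transactions = [
    {"IBM laptop", "HP printer"},
    {"Dell laptop"},
    {"IBM laptop"},
    {"HP printer"},
    {"Dell laptop", "HP printer"},
]


def mine_levels(transactions, min_sup_top, min_sup_low):
    """Count the top level first, then descend only under frequent ancestors."""
    n = len(transactions)
    top = Counter()
    for t in transactions:
        top.update({parent[i] for i in t})  # each ancestor counted once per transaction
    frequent_top = {c for c, k in top.items() if k / n >= min_sup_top}

    low = Counter()
    for t in transactions:
        low.update(i for i in t if parent[i] in frequent_top)
    frequent_low = {i for i, k in low.items() if k / n >= min_sup_low}
    return frequent_top, frequent_low


uniform = mine_levels(transactions, 0.5, 0.5)
reduced = mine_levels(transactions, 0.5, 0.3)
print(uniform)
print(reduced)
```

With a uniform threshold of 0.5 the individual laptop brands (support 0.4 each) are missed even though "laptop computer" is frequent; lowering the threshold to 0.3 at the lower level recovers them.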

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Disadvantages of mining multilevel association rules

• Redundant rules are generated across multiple levels of abstraction because of the ancestor relationships among items.

Disadvantages of mining multilevel association rules

• If a rule doesn't provide any new information, it should be removed.

• If a rule R1 is more general than a rule R2 (R1 is an ancestor of R2), R2 needs to be specified only when it adds information beyond what R1 already conveys.

Mining Multidimensional association rules from Relational Database and DW

Some association rules involve a single predicate, here the predicate buys:

buys(X, "digital camera") ⇒ buys(X, "HP printer")

Such rules are called single-dimensional or intradimensional association rules.

Mining Multidimensional association rules from Relational Database and DW

Mining multidimensional association rules:

Association rules that involve two or more dimensions or predicates are called multidimensional association rules, for example:

age(X, "20...29") ∧ buys(X, "laptop") ⇒ buys(X, "HP printer")

Two Basic Approaches

• Database attributes can be of two types: categorical and quantitative.

• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.

Two Basic Approaches

• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.

Multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
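
Static discretization can be sketched as a preprocessing step that turns each tuple into a set of predicates before any mining happens. The interval edges, the record layout, and the helper names are assumptions; the label format assumes integer-valued attributes such as age.

```python
def interval_label(value, edges):
    """Replace a numeric value by the label of the predefined interval containing it."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return f"{lo}...{hi - 1}"
    return None  # value falls outside every predefined interval


def to_predicates(row, age_edges):
    """Turn one relational tuple into the predicates a rule miner would see."""
    age = {f'age("{interval_label(row["age"], age_edges)}")'}
    buys = {f'buys("{item}")' for item in row["buys"]}
    return age | buys


age_edges = [20, 30, 40, 50]  # hypothetical predefined intervals
row = {"age": 24, "buys": ["laptop", "HP printer"]}
predicates = to_predicates(row, age_edges)
print(sorted(predicates))
```

Once every tuple is expressed this way, each predicate can be treated like an item, so a frequent itemset algorithm can be applied to find frequent predicate sets.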

Mining Quantitative Association Rules

• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the mining criteria.

• A_quan1 ∧ A_quan2 ⇒ A_cat, where A_quan1 and A_quan2 are tests on quantitative attribute intervals and A_cat tests a categorical attribute.

Page 23: Mining Frequent Itemset-Association Analysis

Mining frequent Itemset using Vertical Data Format

Mining Various Kinds of Association Rules

Mining:
• Multilevel association rules: involve concepts at different levels of abstraction.
• Multidimensional association rules: involve more than one dimension or predicate.
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values.

Mining Multi Level Association Rules

• Finding strong association rules at low or primitive levels of abstraction is difficult because the data are sparse at those levels.

• A data mining system should provide capabilities for mining association rules at multiple levels of abstraction.

• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (their ancestors).

A concept hierarchy for All Electronics

Multiple level Association Rules

• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.

• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework.

• A top-down strategy is used, in which counts are accumulated for the calculation of frequent itemsets at each concept level.

Using uniform minimum support for all levels

Using reduced minimum support at lower levels
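The two threshold strategies named in the slide titles above can be compared on a toy example. This is a minimal sketch, assuming a hypothetical two-level concept hierarchy, hypothetical transactions, and illustrative thresholds (0.3 and 0.2). It counts single items only and descends top-down into the children of frequent ancestors.

```python
from collections import Counter

# Hypothetical two-level concept hierarchy: low-level item -> ancestor.
PARENT = {
    "laptop computer": "computer",
    "desktop computer": "computer",
    "inkjet printer": "printer",
}

# Hypothetical transactions recorded at the low (primitive) level.
TRANSACTIONS = [
    {"laptop computer", "inkjet printer"},
    {"laptop computer"},
    {"desktop computer"},
    {"inkjet printer"},
    {"laptop computer", "desktop computer"},
    {"laptop computer"},
    {"inkjet printer"},
    {"laptop computer"},
    {"inkjet printer"},
    {"laptop computer"},
]


def mine_two_levels(transactions, parent, min_sup_high, min_sup_low):
    """Return (frequent ancestors, frequent low-level items) for single items."""
    n = len(transactions)
    high, low = Counter(), Counter()
    for t in transactions:
        low.update(t)                         # count each item once per transaction
        high.update({parent[i] for i in t})   # count each ancestor once per transaction
    frequent_high = {c for c, k in high.items() if k / n >= min_sup_high}
    # Top-down: only children of frequent ancestors are examined.
    frequent_low = {i for i, k in low.items()
                    if parent[i] in frequent_high and k / n >= min_sup_low}
    return frequent_high, frequent_low


# Uniform minimum support: the same threshold at both levels.
uniform = mine_two_levels(TRANSACTIONS, PARENT, 0.3, 0.3)
# Reduced minimum support: a lower threshold at the lower level.
reduced = mine_two_levels(TRANSACTIONS, PARENT, 0.3, 0.2)
```

With the uniform threshold, "desktop computer" (support 0.2) is missed even though its ancestor "computer" is frequent. The reduced threshold at the lower level recovers it.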

Disadvantages of mining multilevel association rules

• Redundant rules are generated across multiple levels of abstraction because of the ancestor relationships among items.

• If a rule does not provide any new information, it should be removed.

• If a rule R1 is more general than a rule R2 (R1 is an ancestor of R2) and R2 adds no information beyond R1, there is no need to report R2 separately.

Mining Multidimensional association rules from Relational Database and DW

• Some association rules involve a single predicate, for example the predicate buys:

buys(X, "digital camera") => buys(X, "HP printer")

• Such rules are called single-dimensional or intradimensional association rules.

• Association rules that involve two or more dimensions or predicates are called multidimensional association rules:

age(X, "20...29") ^ buys(X, "laptop") => buys(X, "HP printer")
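To make the multidimensional rule concrete, the sketch below evaluates its support and confidence over a handful of customer records. The records are hypothetical, and each predicate is modelled as a plain Python function on a record, which is only one possible representation.

```python
# Hypothetical customer records with one quantitative and one set-valued attribute.
RECORDS = [
    {"age": 23, "buys": {"laptop", "HP printer"}},
    {"age": 27, "buys": {"laptop"}},
    {"age": 25, "buys": {"laptop", "HP printer"}},
    {"age": 41, "buys": {"laptop", "HP printer"}},
    {"age": 22, "buys": {"digital camera"}},
]


def age_between(lo, hi):
    """Predicate age(X, "lo...hi")."""
    return lambda r: lo <= r["age"] <= hi


def buys(item):
    """Predicate buys(X, item)."""
    return lambda r: item in r["buys"]


def support_confidence(records, antecedent, consequent):
    """Support and confidence of antecedent => consequent over the records."""
    n_ante = sum(all(p(r) for p in antecedent) for r in records)
    n_both = sum(all(p(r) for p in antecedent + consequent) for r in records)
    support = n_both / len(records)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# age(X, "20...29") ^ buys(X, "laptop") => buys(X, "HP printer")
sup, conf = support_confidence(
    RECORDS,
    antecedent=[age_between(20, 29), buys("laptop")],
    consequent=[buys("HP printer")],
)
```

Three records satisfy the antecedent and two of them also satisfy the consequent, so the support is 2/5 and the confidence is 2/3 on this toy data.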

Two Basic Approaches

• Database attributes can be of two types: categorical and quantitative.

• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.

• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.

Multidimensional data are used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.

Mining Quantitative Association Rules

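• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criteria, such as maximizing the confidence or compactness of the rules mined.

• The rules considered here have the form A_quan1 ^ A_quan2 => A_cat, with two quantitative attributes on the left-hand side and one categorical attribute on the right-hand side.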

Association Rule Clustering System

• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.

• The following steps are involved in ARCS:

Steps involved in ARCS

1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the range of each attribute is partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:
  • Equal-width binning
  • Equal-frequency binning
  • Clustering-based binning
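The first two binning strategies can be sketched in a few lines of plain Python. The function names and the sample ages are hypothetical. Clustering-based binning is omitted because it depends on the clustering method chosen.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # The maximum value is placed in the last bin instead of an overflow bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


ages = [21, 22, 23, 25, 30, 38, 45, 60, 61]  # hypothetical attribute values
width_bins = equal_width_bins(ages, 3)
freq_bins = equal_frequency_bins(ages, 3)
```

On this skewed sample, equal-width binning puts five of the nine ages in the first interval, while equal-frequency binning puts three ages in each bin.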

Steps involved in ARCS (contd.)

2. Finding frequent predicate sets: Once the 2-D array containing the count distribution for each category is set up, it can be scanned to find the frequent predicate sets (those that satisfy minimum support) that also satisfy minimum confidence.

3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid, where rules in adjacent cells can be combined.
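Steps 2 and 3 can be sketched on already-binned data as follows. This is a simplified illustration rather than the ARCS algorithm itself. The tuples, the thresholds, and the merging rule (joining cells in the same row with consecutive columns) are assumptions made for the example.

```python
from collections import Counter

# Hypothetical binned tuples: (age_bin, income_bin, satisfies_category_condition).
TUPLES = [
    (1, 2, True), (1, 2, True), (1, 2, False),
    (1, 3, True), (1, 3, True),
    (0, 0, False), (0, 0, True),
    (0, 1, False),
    (2, 2, False), (2, 2, False),
]


def strong_cells(tuples, min_sup, min_conf):
    """Grid cells whose rule (age_bin ^ income_bin => category) is strong."""
    total, hits = Counter(), Counter()
    for age_bin, income_bin, ok in tuples:
        total[(age_bin, income_bin)] += 1
        if ok:
            hits[(age_bin, income_bin)] += 1
    n = len(tuples)
    return sorted(
        cell for cell in hits
        if hits[cell] / n >= min_sup and hits[cell] / total[cell] >= min_conf
    )


def merge_adjacent(cells):
    """Merge strong cells that share an age bin and have consecutive income bins.
    Returns (age_bin, first_income_bin, last_income_bin) triples."""
    clusters = []
    for age_bin, income_bin in sorted(cells):
        if clusters and clusters[-1][0] == age_bin and clusters[-1][2] == income_bin - 1:
            clusters[-1] = (age_bin, clusters[-1][1], income_bin)
        else:
            clusters.append((age_bin, income_bin, income_bin))
    return clusters


cells = strong_cells(TUPLES, min_sup=0.2, min_conf=0.6)
clusters = merge_adjacent(cells)
```

Here the cells (1, 2) and (1, 3) are both strong and adjacent, so they collapse into one rule covering income bins 2 to 3 for age bin 1.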

From Association Mining to Correlation Analysis

Even strong association rules can be misleading.

The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.

Lift is a simple correlation measure.

The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)P(B)$; otherwise, A and B are dependent and correlated as events.

From Association Mining to Correlation Analysis

$$\text{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$

• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.

• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.

• If lift(A, B) = 1, A and B are independent and there is no correlation between them.

• Lift can equivalently be written as

$$\text{lift}(A, B) = \frac{P(B \mid A)}{P(B)} = \frac{\text{conf}(A \Rightarrow B)}{\text{sup}(B)}$$

• Correlation analysis using lift:

Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to those that do not contain videos. The transactions can be summarized in a contingency table.

The probability of purchasing a computer game is P({game}) = 0.60, the probability of purchasing a video is P({video}) = 0.75, and the probability of purchasing both is P({game, video}) = 0.40. The lift of the rule is therefore 0.40 / (0.60 × 0.75) = 0.89. Because this value is less than 1, the purchases of games and videos are negatively correlated.
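The arithmetic of the game/video example can be checked with a short script. It uses the three probabilities given above. The function name and the equivalent confidence-based form are included for illustration only.

```python
def lift(p_ab, p_a, p_b):
    """lift(A, B) = P(A u B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


P_GAME = 0.60   # probability of purchasing a computer game
P_VIDEO = 0.75  # probability of purchasing a video
P_BOTH = 0.40   # probability of purchasing both

game_video_lift = lift(P_BOTH, P_GAME, P_VIDEO)

# Equivalent form: conf(game => video) / sup(video).
confidence = P_BOTH / P_GAME
lift_from_confidence = confidence / P_VIDEO
```

Both forms give about 0.89. The confidence of game => video is about 67%, which looks strong in isolation but is below the 75% of all transactions that contain a video.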

Examples

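Correlation analysis using χ²

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(4000 - 4500)^2}{4500} + \frac{(3500 - 3000)^2}{3000} + \frac{(2000 - 1500)^2}{1500} + \frac{(500 - 1000)^2}{1000} = 555.6$$

• The χ² value is far greater than 1, and the observed count for (game, video), 4000, is less than the expected count of 4500, so buying games and buying videos are negatively correlated.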
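The χ² value can be reproduced from the observed counts alone, because each expected count follows from the row and column totals. A minimal sketch is given below. The 2×2 layout is an assumption inferred from the four observed and expected values used in the computation above: rows are video / no video, columns are game / no game.

```python
# Observed counts; rows: video, no video; columns: game, no game.
OBSERVED = [
    [4000, 3500],
    [2000, 500],
]


def chi_square(observed):
    """Pearson chi-square statistic for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2


chi2_value = chi_square(OBSERVED)
```

The computed expected counts are 4500, 3000, 1500 and 1000, which match the denominators in the formula.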

Other correlation measure

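• All_confidence
• Cosine

$$\text{all\_conf}(X) = \frac{\text{sup}(X)}{\text{max\_item\_sup}(X)}$$

where max_item_sup(X) is the maximum support among the single items in X.

$$\text{cosine}(A, B) = \frac{P(A \cup B)}{\sqrt{P(A)\,P(B)}} = \frac{\text{sup}(A \cup B)}{\sqrt{\text{sup}(A)\,\text{sup}(B)}}$$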
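Both measures are easy to evaluate once the supports are known. The sketch below applies them to the game/video supports from the lift example (0.60, 0.75 and 0.40). The function names are illustrative.

```python
from math import sqrt


def all_confidence(sup_itemset, item_supports):
    """all_conf(X) = sup(X) / max support of any single item in X."""
    return sup_itemset / max(item_supports)


def cosine(sup_ab, sup_a, sup_b):
    """cosine(A, B) = sup(A u B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)


SUP_GAME, SUP_VIDEO, SUP_BOTH = 0.60, 0.75, 0.40

all_conf_value = all_confidence(SUP_BOTH, [SUP_GAME, SUP_VIDEO])
cosine_value = cosine(SUP_BOTH, SUP_GAME, SUP_VIDEO)
```

On these supports all_confidence is about 0.53 and cosine is about 0.60. Unlike lift, neither formula involves the total number of transactions.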

Comparison of four correlation measures on typical data set

• A null transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null transactions.
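The influence of null transactions can be seen by holding the counts for A, B and A-with-B fixed and changing only the number of transactions that contain neither. The counts below are hypothetical. Lift is computed from counts as n_ab × n_total / (n_a × n_b), which is the same formula as before with probabilities expanded.

```python
from math import sqrt


def lift_from_counts(n_ab, n_a, n_b, n_total):
    """lift = P(A u B) / (P(A) P(B)), written in terms of transaction counts."""
    return n_ab * n_total / (n_a * n_b)


def cosine_from_counts(n_ab, n_a, n_b):
    """cosine = sup(A u B) / sqrt(sup(A) sup(B)); the total cancels out."""
    return n_ab / sqrt(n_a * n_b)


# Hypothetical counts: 200 transactions contain A, 200 contain B, 100 contain both,
# so 300 transactions contain A or B. Everything else is a null transaction.
N_A, N_B, N_AB = 200, 200, 100
N_NON_NULL = N_A + N_B - N_AB

lift_few_nulls = lift_from_counts(N_AB, N_A, N_B, N_NON_NULL + 100)
lift_many_nulls = lift_from_counts(N_AB, N_A, N_B, N_NON_NULL + 10_000)
cosine_any_nulls = cosine_from_counts(N_AB, N_A, N_B)
```

With 100 null transactions the lift is 1.0 (apparent independence). With 10,000 it rises to 25.75 (apparent strong positive correlation), although the relationship between A and B has not changed. Cosine stays at 0.5 in both cases.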

A concept hierarchy for All Electronics

Multiple level Association Rules

bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules

bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework

bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level

Using uniform minimum support for all levels

Using reduced minimum support at lower levels

Disadvantages of mining multilevel association rules

Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items

Disadvantages of mining multilevel association rules

Disadvantages of mining multilevel association rules

bull If a rule doesnrsquot provide any new information then it should be removed

bull A rule R1 is more generalized than rule R2so need to specify rule R2

Mining Multidimensional association rules from Relational Database and DW

Association rules that imply a single predicate that is the predicate buys

Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra

dimensional association rules

Mining Multidimensional association rules from Relational Database and DW

Mining multidimensional database association rules-

Associations rules that involves two or more dimensions or predicate are called multidimensional association rules

Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)

Two Basic Approaches

bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of

possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes

Two Basic Approaches

Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.

The multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
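A small sketch of static discretization follows. The interval boundaries, the toy tuples and the names `AGE_INTERVALS`, `discretize_age` and `cell_counts` are assumptions chosen for illustration. Numeric ages are replaced by interval labels before any counting, and the counts of predicate pairs play the role of the cells of a 2-D cuboid (age × buys).

```python
from collections import Counter

# Hypothetical predefined hierarchy for age: (low, high, label), bounds inclusive.
AGE_INTERVALS = [
    (0, 19, "0...19"),
    (20, 29, "20...29"),
    (30, 39, "30...39"),
    (40, 120, "40+"),
]


def discretize_age(age):
    """Replace a numeric age by the label of the interval that contains it."""
    for low, high, label in AGE_INTERVALS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


# Toy relational tuples (age, item bought); assumed data.
ROWS = [(23, "laptop"), (27, "laptop"), (35, "printer"),
        (21, "laptop"), (45, "printer"), (29, "printer")]

# Each tuple becomes a set of (attribute, value) predicates after discretization.
tuples = [{("age", discretize_age(a)), ("buys", item)} for a, item in ROWS]

# Count each predicate pair, as the cells of an (age x buys) cuboid would.
cell_counts = Counter(frozenset(t) for t in tuples)

cell = frozenset({("age", "20...29"), ("buys", "laptop")})
support = cell_counts[cell] / len(tuples)
print(support)
```

Because the intervals are fixed before mining, the same labels can be reused for every rule; the trade-off is that the boundaries do not adapt to the data.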

Mining Quantitative Association Rules

• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the mining criteria.

• A typical form has two quantitative attributes on the left-hand side and one categorical attribute on the right-hand side: A_quan1 ^ A_quan2 => A_cat

Association Rule Clustering System

• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.

• The following steps are involved in ARCS:

Steps involved in ARCS

1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:

• Equal-width binning
• Equal-frequency binning
• Clustering-based binning

2. Finding frequent predicate sets: Once the count distribution for each category has been set up, it can be scanned to find the frequent predicate sets (those satisfying minimum support) that also satisfy minimum confidence.

3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
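The first two binning strategies from step 1 can be sketched as below. This is an illustrative sketch only: the function names and the sample incomes are assumptions, bins are numbered from 0, and the equal-width version assumes the values are not all identical (otherwise the width would be zero).

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width (bin ids 0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes hi > lo
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


incomes = [20, 22, 25, 30, 31, 35, 80, 90, 100]  # assumed sample, in thousands
print(equal_width_bins(incomes, 3))      # skewed data leaves the middle bin empty
print(equal_frequency_bins(incomes, 3))  # three values per bin
```

On skewed data equal-width binning can leave some bins empty, which is one reason equal-frequency or clustering-based binning may be preferred.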

From Association Mining to Correlation Analysis

Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.

Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise A and B are dependent and correlated as events.

lift(A, B) = P(A ∪ B) / (P(A)P(B))

• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.

• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.

• If lift(A, B) = 1, A and B are independent and there is no correlation between them.

Lift can equivalently be written as

lift(A, B) = P(B|A) / P(B) = conf(A => B) / sup(B)

• Correlation analysis using lift:

Let ¬game refer to the transactions that do not contain computer games, and ¬video to the transactions that do not contain videos. The transactions can be summarized in a contingency table.

The probability of purchasing a computer game is P(game) = 0.60, the probability of purchasing a video is P(video) = 0.75, and the probability of purchasing both is P(game, video) = 0.40. The lift is therefore 0.40 / (0.60 × 0.75) = 0.89. Because the lift is less than 1, the purchase of computer games and the purchase of videos are negatively correlated.
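The lift calculation above can be reproduced from cell counts. The sketch assumes a total of 10,000 transactions, which is consistent with the stated probabilities (60%, 75%, 40%); the dictionary layout and variable names are illustrative.

```python
# 2x2 contingency counts: (buys game?, buys video?) -> number of transactions.
# Assumed total of 10,000 transactions, matching the stated probabilities.
counts = {
    (True, True): 4000,
    (True, False): 2000,
    (False, True): 3500,
    (False, False): 500,
}

n = sum(counts.values())
p_game = (counts[(True, True)] + counts[(True, False)]) / n    # 0.60
p_video = (counts[(True, True)] + counts[(False, True)]) / n   # 0.75
p_both = counts[(True, True)] / n                              # 0.40

# lift(game, video) = P(game and video) / (P(game) * P(video))
lift = p_both / (p_game * p_video)

# Equivalent form: conf(game => video) / sup(video)
confidence = p_both / p_game
lift_from_confidence = confidence / p_video

print(round(lift, 2))
```

A value below 1 means the two items co-occur less often than independence would predict, even though the rule game => video has a confidence of about 67%.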

Examples

Correlation analysis using χ²

χ² = Σ (observed − expected)² / expected

= (4000 − 4500)² / 4500 + (3500 − 3000)² / 3000 + (2000 − 1500)² / 1500 + (500 − 1000)² / 1000

= 555.6

The χ² value is large, and the observed count for (game, video), 4000, is less than the expected count, 4500, so buying games and buying videos are negatively correlated.
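The χ² computation can be checked with a few lines of code. The observed counts are the ones used in the formula above; the expected counts are derived from the row and column totals under the independence assumption. The labels and variable names are illustrative.

```python
# Observed counts for each (game, video) cell.
observed = {
    ("game", "video"): 4000,
    ("not game", "video"): 3500,
    ("game", "not video"): 2000,
    ("not game", "not video"): 500,
}

n = sum(observed.values())

# Marginal totals for each row label (game / not game) and column label (video / not video).
row_total = {}
col_total = {}
for (g, v), count in observed.items():
    row_total[g] = row_total.get(g, 0) + count
    col_total[v] = col_total.get(v, 0) + count

# Expected count under independence: row total * column total / n.
expected = {(g, v): row_total[g] * col_total[v] / n for (g, v) in observed}

chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)

print(expected[("game", "video")], round(chi2, 1))
```

The χ² value alone only indicates that the items are dependent; the direction (negative here) comes from comparing the observed and expected counts of the (game, video) cell.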

Other correlation measures

• All_confidence
• Cosine

• all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the largest support of any single item in X

• cosine(A, B) = P(A ∪ B) / sqrt(P(A) × P(B)) = sup(A ∪ B) / sqrt(sup(A) × sup(B))
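Both measures follow directly from itemset supports, as the sketch below shows. The helper names (`support`, `all_confidence`, `cosine`) and the ten toy transactions are assumptions made for the example.

```python
from math import sqrt


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def all_confidence(itemset, transactions):
    """sup(X) divided by the largest single-item support among the items of X."""
    max_item_sup = max(support({i}, transactions) for i in itemset)
    return support(itemset, transactions) / max_item_sup


def cosine(a, b, transactions):
    """sup(A u B) / sqrt(sup(A) * sup(B))."""
    joint = support(set(a) | set(b), transactions)
    return joint / sqrt(support(a, transactions) * support(b, transactions))


# Assumed toy data: 4 baskets with both items, 2 game only, 3 video only, 1 empty.
transactions = (
    [{"game", "video"}] * 4 + [{"game"}] * 2 + [{"video"}] * 3 + [set()]
)

print(round(all_confidence({"game", "video"}, transactions), 3))
print(round(cosine({"game"}, {"video"}, transactions), 3))
```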

Comparison of four correlation measures on typical data set

• A null transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null transactions.
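The effect of null transactions can be seen by computing two measures before and after padding a data set with transactions that contain neither itemset. The counts below are assumed for illustration: lift changes with the number of null transactions, while cosine does not.

```python
from math import sqrt


def lift(n_ab, n_a, n_b, n_total):
    """lift = P(A u B) / (P(A) * P(B)), computed from transaction counts."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))


def cosine(n_ab, n_a, n_b, n_total):
    """cosine = P(A u B) / sqrt(P(A) * P(B)), computed from transaction counts."""
    return (n_ab / n_total) / sqrt((n_a / n_total) * (n_b / n_total))


# Assumed counts: 200 transactions contain A, 200 contain B, 100 contain both.
n_ab, n_a, n_b = 100, 200, 200

small = 1_000    # total number of transactions
large = 10_000   # same data plus 9,000 extra null transactions

print(lift(n_ab, n_a, n_b, small), lift(n_ab, n_a, n_b, large))
print(cosine(n_ab, n_a, n_b, small), cosine(n_ab, n_a, n_b, large))
```

The total number of transactions cancels out of the cosine formula, which is why the extra null transactions leave it unchanged.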

Constraint Based Association mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.

• A good heuristic is to have the users specify their intuition or expectations as constraints that confine the search space. This strategy is known as constraint-based mining.

• The constraints can include the following:
  • Knowledge type constraints
  • Data constraints
  • Dimension/level constraints
  • Interestingness constraints
  • Rule constraints

Metarule-Guided Mining of Association Rules

• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.

• Metarule-guided mining example: suppose we want to find associations between customer traits and the items that customers buy, but are only interested in determining which pairs of customer traits promote the sale of office software.
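One way to read the office-software example is as a rule template with two trait predicates on the left and a fixed consequent on the right. The sketch below assumes that template, a toy customer table, and hypothetical names (`CUSTOMERS`, `TRAITS`, `mine_metarule`); it simply enumerates every pair of trait values and keeps those meeting the support and confidence thresholds.

```python
from collections import Counter
from itertools import combinations

# Assumed toy customer table; attribute values are illustrative.
CUSTOMERS = [
    {"age": "30...39", "income": "high", "occupation": "engineer", "buys_office_software": True},
    {"age": "30...39", "income": "high", "occupation": "teacher", "buys_office_software": True},
    {"age": "30...39", "income": "high", "occupation": "engineer", "buys_office_software": True},
    {"age": "20...29", "income": "low", "occupation": "student", "buys_office_software": False},
    {"age": "30...39", "income": "low", "occupation": "teacher", "buys_office_software": False},
    {"age": "20...29", "income": "high", "occupation": "engineer", "buys_office_software": False},
]
TRAITS = ("age", "income", "occupation")


def mine_metarule(customers, traits, min_sup, min_conf):
    """Find trait pairs (t1=v1, t2=v2) such that the pair => buys office software."""
    n = len(customers)
    body = Counter()  # customers matching the two-trait antecedent
    both = Counter()  # ... who also buy office software
    for c in customers:
        for t1, t2 in combinations(traits, 2):
            key = ((t1, c[t1]), (t2, c[t2]))
            body[key] += 1
            if c["buys_office_software"]:
                both[key] += 1
    rules = {}
    for key, count in both.items():
        sup, conf = count / n, count / body[key]
        if sup >= min_sup and conf >= min_conf:
            rules[key] = (sup, conf)
    return rules


rules = mine_metarule(CUSTOMERS, TRAITS, min_sup=0.4, min_conf=0.9)
print(rules)
```

Restricting the search to rules that match the template keeps the number of candidates small compared with mining all rules and filtering afterwards.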

Constraint Pushing

Rule Constraint

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain each of them with an example in DMQL.
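As an illustration of pushing an antimonotonic constraint into the search, consider the constraint "total price of the itemset is at most a given budget". With non-negative prices, once an itemset violates it, every superset violates it too, so the itemset never needs to be extended. The sketch below is written in Python rather than DMQL, uses assumed prices and names (`PRICE`, `satisfies`, `constrained_itemsets`), and leaves out support counting to keep the pruning step visible.

```python
# Hypothetical item prices; they must be non-negative for the constraint
# sum(price) <= max_total to be antimonotonic.
PRICE = {"pen": 5, "ink": 20, "paper": 10, "printer": 150}


def satisfies(itemset, max_total):
    """Constraint check: total price of the itemset does not exceed max_total."""
    return sum(PRICE[i] for i in itemset) <= max_total


def constrained_itemsets(items, max_total):
    """Level-wise enumeration that extends only itemsets satisfying the constraint."""
    level = [frozenset([i]) for i in items if satisfies([i], max_total)]
    result = list(level)
    while level:
        # Join step: combine surviving k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        # Prune step: a violating candidate is dropped and never extended.
        level = [c for c in candidates if satisfies(c, max_total)]
        result.extend(level)
    return result


result = constrained_itemsets(PRICE, max_total=30)
print(sorted(sorted(s) for s in result))
```

Here "printer" is discarded at the first level, so no itemset containing it is ever generated; a monotonic constraint (for example, total price at least some value) could not be used for pruning in this way.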

Page 30: Mining Frequent Itemset-Association Analysis

Disadvantages of mining multilevel association rules

Mining at multiple levels of abstraction generates redundant rules because of the ancestor relationships among items.

Disadvantages of mining multilevel association rules

• If a rule does not provide any new information, it should be removed.

• If a rule R1 is more general than a rule R2 (R1 is an ancestor of R2) and R2 adds nothing beyond what is expected from R1, there is no need to specify rule R2.

Mining Multidimensional association rules from Relational Database and DW

Some association rules involve a single predicate, here the predicate buys:

buys(X, "digital camera") => buys(X, "HP printer")

Such rules are called single-dimensional or intradimensional association rules.

Mining Multidimensional association rules from Relational Database and DW

Multidimensional association rules:

Association rules that involve two or more dimensions or predicates are called multidimensional association rules.

age(X, "20…29") ^ buys(X, "laptop") => buys(X, "HP printer")
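As a rough illustration of the distinction, a rule can be modelled as two lists of (predicate, value) pairs and classified by counting the distinct predicate names it uses. The tuple representation and the helper names `dimensions` and `is_multidimensional` are assumptions made for this sketch; they do not come from the slides.

```python
# A rule is modelled as (antecedent, consequent), each a list of
# (predicate, value) pairs. This representation is an assumption of the sketch.

def dimensions(rule):
    """Return the set of distinct predicate names used anywhere in the rule."""
    antecedent, consequent = rule
    return {predicate for predicate, _ in antecedent + consequent}


def is_multidimensional(rule):
    """A rule is multidimensional when it involves two or more predicates."""
    return len(dimensions(rule)) >= 2


single = ([("buys", "digital camera")], [("buys", "HP printer")])
multi = ([("age", "20…29"), ("buys", "laptop")], [("buys", "HP printer")])

print(is_multidimensional(single))  # False: only the predicate buys occurs
print(is_multidimensional(multi))   # True: the predicates age and buys occur
```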

Two Basic Approaches

• Database attributes can be of two types: categorical and quantitative.

• Categorical attributes have a finite number of possible values with no ordering among the values (e.g. occupation, brand, color). Categorical attributes are also called nominal attributes.

Two Basic Approaches

• Quantitative attributes are numeric and have an implicit ordering among values (e.g. age, income, price).

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.

The multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
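A minimal sketch of static discretization, assuming a predefined set of age intervals: each numeric value is replaced by the label of the interval it falls into before any mining takes place. The cut points, the labels other than "20…29", and the helper name `age_label` are hypothetical choices for illustration.

```python
from bisect import bisect_right

# Hypothetical cut points for age. The slides give no concrete intervals
# beyond the "20…29" label used in the example rule.
AGE_CUTS = [20, 30, 40, 50]
AGE_LABELS = ["<20", "20…29", "30…39", "40…49", "50+"]


def age_label(age):
    """Replace a numeric age by the label of the interval it falls into."""
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]


ages = [18, 20, 29, 30, 47, 63]
print([age_label(a) for a in ages])
```

Because the intervals are fixed up front, the labelled attribute can be treated like any categorical attribute during mining.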

Mining Quantitative Association Rules

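• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the mining criteria.

• A_quan1 ^ A_quan2 => A_cat, where A_quan1 and A_quan2 are tests on intervals of two quantitative attributes and A_cat is a test on a categorical attribute.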

Association Rule Clustering System

• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.

• The following steps are involved in ARCS:

Steps involved in ARCS

1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the range of each attribute is partitioned into intervals. This partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:

• Equal-width binning
• Equal-frequency binning
• Clustering-based binning
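The first two strategies can be sketched in a few lines. This is an illustrative sketch on made-up income values, with hypothetical helper names; clustering-based binning is not shown.

```python
def equal_width_edges(values, k):
    """Split the range [min, max] into k intervals of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


def equal_frequency_bins(values, k):
    """Split the sorted values into k bins holding (nearly) equal numbers of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


incomes = [20, 22, 25, 31, 38, 40, 45, 90, 120]  # made-up values, in thousands

print(equal_width_edges(incomes, 4))     # bin boundaries
print(equal_frequency_bins(incomes, 3))  # bin contents
```

With skewed data such as these incomes, equal-width bins leave some intervals nearly empty, while equal-frequency bins keep the counts balanced at the cost of uneven interval widths.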

Steps involved in ARCS(contd)

2. Finding frequent predicate sets: Once the 2-D array holding the count distribution for each category has been set up, it can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.

3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid, where rules in adjacent cells can be combined.
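The scan in step 2 can be sketched as follows, assuming the tuples have already been binned. The data, the bin indices, and the function name `strong_cells` are made up for illustration; this is not the ARCS implementation, and the merging of adjacent cells in step 3 is left out.

```python
from collections import Counter

# Made-up, already binned tuples: (age_bin, income_bin, satisfies_condition).
# satisfies_condition stands for the categorical condition on the rule's
# right-hand side.
rows = [
    (3, 4, True), (3, 4, True), (3, 4, False),
    (3, 5, True), (3, 5, True),
    (2, 1, False), (2, 1, False), (2, 1, True),
    (4, 4, True), (1, 1, False),
]


def strong_cells(rows, min_sup, min_conf):
    """Return the grid cells whose rule 'cell => condition' is strong."""
    cell_total = Counter((a, i) for a, i, _ in rows)        # all tuples per cell
    cell_hits = Counter((a, i) for a, i, ok in rows if ok)  # tuples meeting the condition
    n = len(rows)
    strong = []
    for cell, hits in cell_hits.items():
        support = hits / n
        confidence = hits / cell_total[cell]
        if support >= min_sup and confidence >= min_conf:
            strong.append(cell)
    return sorted(strong)


print(strong_cells(rows, min_sup=0.2, min_conf=0.6))  # [(3, 4), (3, 5)]
```

The two surviving cells share an age bin and have neighbouring income bins, which is the kind of adjacency the clustering step exploits when it combines rules on the grid.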

From Association Mining to Correlation Analysis

Even strong association rules can be misleading.

The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.

Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B), where P(A ∪ B) is the probability that a transaction contains both A and B; otherwise, A and B are dependent and correlated as events.

From Association Mining to Correlation Analysis

lift(A, B) = P(A ∪ B) / (P(A) P(B))

If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.

If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.

If lift(A, B) = 1, A and B are independent and there is no correlation between them.

From Association Mining to Correlation Analysis

Lift is equivalent to P(B|A) / P(B), or conf(A => B) / sup(B).

• Correlation analysis using lift:

Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.

Probability of purchasing a computer game = 0.60
Probability of purchasing a video = 0.75
Probability of purchasing both = 0.40

lift(game, video) = 0.40 / (0.60 × 0.75) = 0.89

Because the lift is less than 1, the purchase of computer games is negatively correlated with the purchase of videos.
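The game/video calculation can be reproduced with a small helper. The function names `lift` and `describe` and the tolerance used for the independence check are choices made for this sketch.

```python
def lift(p_a, p_b, p_both):
    """lift(A, B) = P(A ∪ B) / (P(A) * P(B)), where P(A ∪ B) is the joint probability."""
    return p_both / (p_a * p_b)


def describe(value, tol=1e-9):
    """Interpret a lift value using the three cases given for lift."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"


game_video = lift(0.60, 0.75, 0.40)
print(round(game_video, 2), describe(game_video))  # 0.89 negatively correlated
```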

Examples

Correlation analysis using χ²

χ² = Σ (observed - expected)² / expected

   = (4000 - 4500)² / 4500 + (3500 - 3000)² / 3000 + (2000 - 1500)² / 1500 + (500 - 1000)² / 1000

   = 555.6

Because the χ² value is greater than 1 and the observed number of transactions containing both items (4000) is less than the expected number (4500), buying computer games and buying videos are negatively correlated.
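The same statistic can be checked numerically. In this sketch the observed counts are the four values that appear in the sum, arranged as a 2×2 table; the assignment of counts to cells is inferred from the stated probabilities (60% game, 75% video, 40% both) and should be read as an assumption. The expected counts are derived from the row and column totals.

```python
# Observed counts from the four terms of the sum, arranged as a 2x2 table.
# Rows: video, no video. Columns: game, no game.
observed = [
    [4000, 3500],
    [2000, 500],
]


def chi_square(table):
    """Sum (observed - expected)^2 / expected over all cells of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, obs in enumerate(row):
            # Expected count under independence of the row and column events.
            expected = row_totals[r] * col_totals[c] / total
            stat += (obs - expected) ** 2 / expected
    return stat


print(round(chi_square(observed), 1))  # 555.6
```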

Other correlation measures

• All_confidence
• Cosine

• all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the maximum support of any single item in X

• cosine(A, B) = P(A ∪ B) / sqrt(P(A) × P(B)) = sup(A ∪ B) / sqrt(sup(A) × sup(B))
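Both measures are direct to compute from supports. The sketch below applies them to the game/video supports from the lift example (0.60, 0.75 and 0.40); the function names are choices made here.

```python
from math import sqrt


def all_confidence(sup_itemset, item_sups):
    """all_conf(X) = sup(X) / (largest support among the single items of X)."""
    return sup_itemset / max(item_sups)


def cosine(sup_a, sup_b, sup_ab):
    """cosine(A, B) = sup(A ∪ B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)


print(round(all_confidence(0.40, [0.60, 0.75]), 3))  # 0.533
print(round(cosine(0.60, 0.75, 0.40), 3))            # 0.596
```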

Comparison of four correlation measures on typical data set

• A null transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null transactions.
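The effect of null transactions can be seen on a small made-up example: lift depends on the total number of transactions, so adding null transactions changes it, whereas cosine depends only on the supports of A, B and A ∪ B. The counts below are invented for illustration.

```python
from math import sqrt


def lift(n_a, n_b, n_ab, n_total):
    """Lift from raw counts; the total number of transactions matters."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))


def cosine(n_a, n_b, n_ab):
    """Cosine from raw counts; the total cancels out, so null transactions have no effect."""
    return n_ab / sqrt(n_a * n_b)


n_a, n_b, n_ab = 1000, 1000, 100  # made-up counts

for nulls in (0, 100000):
    total = n_a + n_b - n_ab + nulls  # transactions with A or B, plus null transactions
    print(nulls, round(lift(n_a, n_b, n_ab, total), 2), round(cosine(n_a, n_b, n_ab), 2))
```

With no null transactions the lift is 0.19, which reads as a negative correlation; with 100000 null transactions it becomes 10.19, which reads as a positive one. The cosine stays at 0.1 in both cases.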

Constraint Based Association mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.

• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.

Constraint Based Association mining

The constraints can include the following:

• Knowledge type constraints
• Data constraints
• Dimension/level constraints
• Interestingness constraints
• Rule constraints

metarule-Guided Mining of association rules

• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.

• Metarule-guided mining, example: when finding associations between customer traits and the items that customers buy, the analyst may be interested only in determining which pairs of customer traits promote the sale of office software.
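One way to picture a metarule is as a template that candidate rules must match. The sketch below assumes the template "two different customer traits => buys office software", which is one reading of the example; the set of trait predicates, the sample rules, and the function name `matches_metarule` are hypothetical.

```python
# Rules are (antecedent, consequent) lists of (predicate, value) pairs.
# The trait predicates and the sample rules are made up for illustration.
CUSTOMER_TRAITS = {"age", "income", "occupation"}


def matches_metarule(rule):
    """True if the rule has the form trait1 ^ trait2 => buys(X, "office software")."""
    antecedent, consequent = rule
    predicates = [predicate for predicate, _ in antecedent]
    return (
        len(antecedent) == 2
        and len(set(predicates)) == 2
        and set(predicates) <= CUSTOMER_TRAITS
        and consequent == [("buys", "office software")]
    )


rules = [
    ([("age", "30…39"), ("income", "41K…60K")], [("buys", "office software")]),
    ([("age", "20…29")], [("buys", "office software")]),
    ([("age", "30…39"), ("income", "41K…60K")], [("buys", "laptop")]),
]

print([matches_metarule(r) for r in rules])  # [True, False, False]
```

Filtering by the template before or during mining keeps the search confined to rules of the shape the user asked for.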

Constraint Pushing

Rule Constraint

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain all of them with an example in DMQL.
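As a hedged illustration of the first category, written in Python rather than DMQL: with non-negative prices, the constraint sum(price) <= budget is antimonotonic, because once an itemset violates it every superset violates it too. Such a constraint can therefore be pushed into a level-wise search, so that violating itemsets are never extended. The item names, prices, and function names below are made up.

```python
PRICES = {"pen": 2, "ink": 5, "paper": 8, "printer": 120}  # made-up prices


def total_price(itemset):
    """Sum of the prices of the items in the itemset."""
    return sum(PRICES[item] for item in itemset)


def itemsets_within_budget(items, budget):
    """Level-wise enumeration that keeps only itemsets with sum(price) <= budget."""
    level = [frozenset([item]) for item in items if PRICES[item] <= budget]
    result = []
    while level:
        result.extend(level)
        # Only surviving itemsets are extended: a superset of a violating
        # itemset cannot satisfy an antimonotonic constraint.
        candidates = {s | {item} for s in level for item in items if item not in s}
        level = [c for c in candidates if total_price(c) <= budget]
    return result


found = itemsets_within_budget(list(PRICES), 15)
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), total_price(itemset))
```

By contrast, sum(price) >= v is monotonic under the same assumption: once an itemset satisfies it, every superset does as well, so the check need not be repeated for supersets.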


Mining Multidimensional association rules from Relational Database and DW

Association rules that involve two or more dimensions or predicates are called multidimensional association rules. For example:

age(X, "20...29") ^ buys(X, "laptop") => buys(X, "HP printer")
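As a rough illustration, the support and confidence of the rule above can be computed over a small relational table. The tuples and the helper name `rule_stats` are hypothetical and not taken from the slides.

```python
# Hypothetical customer tuples: (age, bought_laptop, bought_hp_printer).
customers = [
    (23, True, True),
    (27, True, False),
    (25, True, True),
    (41, True, True),
    (22, False, False),
    (35, False, True),
]

def rule_stats(rows):
    """Support and confidence of
    age(X, "20...29") ^ buys(X, "laptop") => buys(X, "HP printer")."""
    # Tuples that satisfy both predicates on the left-hand side.
    antecedent = [r for r in rows if 20 <= r[0] <= 29 and r[1]]
    # Of those, the tuples that also satisfy the right-hand side.
    both = [r for r in antecedent if r[2]]
    support = len(both) / len(rows)
    confidence = len(both) / len(antecedent) if antecedent else 0.0
    return support, confidence

support, confidence = rule_stats(customers)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Each predicate here tests a different dimension (age and purchases), which is what distinguishes this rule from a single-dimensional rule over `buys` alone.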

Two Basic Approaches

• Database attributes can be of two types: categorical and quantitative.

• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.

• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).

Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes

Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques in which numeric values are replaced by interval labels.

Multidimensional data are used to construct data cubes. Data cubes are well suited for mining multidimensional association rules.
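A minimal sketch of static discretization, assuming integer ages and fixed 10-year bands (the band width and the function name `age_label` are illustrative choices, not part of the slides):

```python
def age_label(age):
    """Map a numeric age to an interval label (hypothetical 10-year bands)."""
    low = (age // 10) * 10
    return f"{low}...{low + 9}"

ages = [23, 29, 30, 47]
labels = [age_label(a) for a in ages]
print(labels)  # ['20...29', '20...29', '30...39', '40...49']
```

Once every numeric value has been replaced by a label in this way, the attribute can be treated like a categorical one during mining.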

Mining Quantitative Association Rules

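• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the mining criteria.

• The rules considered here have the form A_quant1 ^ A_quant2 => A_cat, with two quantitative attributes on the left-hand side and one categorical attribute on the right-hand side.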

Association Rule Clustering System

• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.

• ARCS involves the following steps:

Steps involved in ARCS

1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:

   • Equal-width binning
   • Equal-frequency binning
   • Clustering-based binning
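The first two binning strategies can be sketched as follows. This is an illustrative sketch rather than the ARCS implementation: the function names and the income values are made up, and equal-width binning assumes the values are not all identical. Clustering-based binning is omitted.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal size."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes hi > lo
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign values to k bins holding roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

incomes = [20, 22, 25, 30, 60, 100]  # hypothetical incomes, in thousands
width_bins = equal_width_bins(incomes, 3)
freq_bins = equal_frequency_bins(incomes, 3)
print(width_bins)  # [0, 0, 0, 0, 1, 2]
print(freq_bins)   # [0, 0, 1, 1, 2, 2]
```

On skewed data such as these incomes, equal-width binning crowds most values into the first bin, while equal-frequency binning spreads them evenly.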

2. Finding frequent predicate sets: Once the count distribution for each category has been built, it can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.

3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
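Steps 2 and 3 can be sketched on a toy grid. The tuples, thresholds, and the function name `strong_cells` are assumptions made for illustration; the sketch only marks the grid cells that yield strong rules and does not merge adjacent cells into clusters.

```python
from collections import Counter

# Hypothetical, already-binned tuples: (age_bin, income_bin, buys_tv).
tuples = [
    (0, 0, False), (0, 1, True), (0, 1, True), (1, 1, True),
    (1, 1, True), (1, 1, False), (1, 2, True), (2, 2, False),
]

def strong_cells(rows, min_sup, min_conf):
    """Grid cells (age_bin, income_bin) whose rule 'cell => buys_tv' is strong."""
    total = Counter((a, i) for a, i, _ in rows)      # all tuples per cell
    hits = Counter((a, i) for a, i, c in rows if c)  # tuples meeting the condition
    n = len(rows)
    return sorted(
        cell for cell, h in hits.items()
        if h / n >= min_sup and h / total[cell] >= min_conf
    )

cells = strong_cells(tuples, min_sup=0.2, min_conf=0.6)
print(cells)  # [(0, 1), (1, 1)]
```

The two surviving cells are adjacent on the grid, so a clustering step could merge them into one rule covering age bins 0 to 1 at income bin 1.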


From Association Mining to Correlation Analysis

Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.

Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise, A and B are dependent and correlated as events.

lift(A, B) = P(A ∪ B) / (P(A)P(B))

• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.
• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.
• If lift(A, B) = 1, A and B are independent and there is no correlation between them.

Equivalently, lift(A, B) = P(B|A) / P(B) = conf(A => B) / sup(B).

• Correlation analysis using lift:

Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.

Probability of purchasing a computer game = 0.60
Probability of purchasing a video = 0.75
Probability of purchasing both = 0.40

lift(game, video) = 0.40 / (0.60 × 0.75) = 0.89. Since the lift is less than 1, the purchases of computer games and videos are negatively correlated.
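The lift calculation for the game/video example can be checked with a few lines of code. The function name `lift` and its parameter names are illustrative.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A and B together) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

value = lift(p_a=0.60, p_b=0.75, p_ab=0.40)
print(round(value, 2))  # 0.89
```

A result below 1 means the two items occur together less often than they would if they were independent.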

Correlation analysis using χ²

χ² = Σ (observed − expected)² / expected
   = (4000 − 4500)²/4500 + (3500 − 3000)²/3000 + (2000 − 1500)²/1500 + (500 − 1000)²/1000
   = 555.6

Because the χ² value is greater than 1 and the observed value of the first cell (4000) is less than its expected value (4500), the two purchases are negatively correlated.
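The χ² value can be reproduced from a 2×2 contingency table. The assignment of the four observed counts to cells is an assumption: it is inferred from the observed/expected pairs in the sum above and from the stated purchase probabilities, since the table itself did not survive in the transcript.

```python
# Assumed cell assignment (inferred, not given explicitly in the slides).
observed = {
    ("game", "video"): 4000,
    ("no game", "video"): 3500,
    ("game", "no video"): 2000,
    ("no game", "no video"): 500,
}

n = sum(observed.values())
# Row and column totals of the contingency table.
row = {g: sum(c for (gg, _), c in observed.items() if gg == g)
       for g in ("game", "no game")}
col = {v: sum(c for (_, vv), c in observed.items() if vv == v)
       for v in ("video", "no video")}

# Expected count under independence: row total * column total / n.
expected = {(g, v): row[g] * col[v] / n for (g, v) in observed}

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
print(round(chi2, 1))  # 555.6
```

The expected counts come out as 4500, 3000, 1500, and 1000, matching the denominators in the sum.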

Other correlation measures

• All_confidence
• Cosine

• all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the maximum support of any single item in X

• cosine(A, B) = P(A ∪ B) / sqrt(P(A)P(B)) = sup(A ∪ B) / sqrt(sup(A) sup(B))
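Both measures are easy to compute once the supports are known. The sketch below reuses the game/video probabilities from the lift example (0.60, 0.75, and 0.40 for both); the function names are illustrative.

```python
import math

def all_confidence(sup_itemset, item_sups):
    """sup(X) divided by the largest single-item support in X."""
    return sup_itemset / max(item_sups)

def cosine(sup_a, sup_b, sup_ab):
    """sup(A and B together) divided by sqrt(sup(A) * sup(B))."""
    return sup_ab / math.sqrt(sup_a * sup_b)

all_conf = all_confidence(0.40, [0.60, 0.75])
cos = cosine(0.60, 0.75, 0.40)
print(round(all_conf, 2), round(cos, 2))
```

Cosine has the same numerator as lift but takes the square root of the product in the denominator, so the total number of transactions cancels out of it.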

Comparison of four correlation measures on typical data set

• A null transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null transactions.
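The effect of null transactions can be seen by padding a data set with transactions that contain neither item. The counts below are assumptions: they reuse the game/video example with an assumed total of 10,000 transactions, then add 90,000 null transactions.

```python
import math

def lift_and_cosine(n_ab, n_a, n_b, n_total):
    """Return (lift, cosine) from raw transaction counts."""
    p_a, p_b, p_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    return p_ab / (p_a * p_b), p_ab / math.sqrt(p_a * p_b)

# Same item counts, but the second data set has 90,000 extra null transactions.
base = lift_and_cosine(4000, 6000, 7500, 10_000)
padded = lift_and_cosine(4000, 6000, 7500, 100_000)

print(f"lift:   {base[0]:.2f} -> {padded[0]:.2f}")
print(f"cosine: {base[1]:.3f} -> {padded[1]:.3f}")
```

Lift moves from below 1 to well above 1 purely because of the added null transactions, while cosine stays the same. Cosine is therefore unaffected by null transactions, and lift is not.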

Constraint Based Association mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.

• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.

• The constraints can include the following:
   • Knowledge type constraints
   • Data constraints
   • Dimension/level constraints
   • Interestingness constraints
   • Rule constraints

Metarule-Guided Mining of Association Rules

• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.

• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where the user is only interested in determining which pairs of customer traits promote the sale of office software.
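One way to picture a metarule is as a template that candidate rules must match. The sketch below assumes a template with exactly two distinct customer-trait predicates on the left and buys(X, "office software") on the right. The rule representation, the sample rules, and the function name `matches_metarule` are all hypothetical.

```python
# Hypothetical mined rules: (antecedent predicates, consequent).
rules = [
    ((("age", "30...39"), ("income", "41K...60K")), ("buys", "office software")),
    ((("age", "20...29"),), ("buys", "office software")),
    ((("age", "30...39"), ("income", "41K...60K")), ("buys", "laptop")),
]

def matches_metarule(rule):
    """True if the rule has two distinct trait predicates on the left and
    buys(X, "office software") on the right."""
    antecedent, consequent = rule
    predicates = {name for name, _ in antecedent}
    return (len(antecedent) == 2 and len(predicates) == 2
            and consequent == ("buys", "office software"))

selected = [r for r in rules if matches_metarule(r)]
print(selected)
```

In practice the template would be used to restrict the search while mining rather than to filter rules afterwards, but the matching condition is the same.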

Constraint Pushing

Rule Constraint

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain each of them with an example in DMQL.
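As an illustration of the first category, a constraint such as sum(price) <= v is antimonotonic when prices are non-negative: once an itemset violates it, every superset also violates it, so the itemset never needs to be extended. The sketch below is written in Python rather than DMQL, uses made-up prices, and shows only constraint pruning (no support counting).

```python
# Hypothetical item prices; assumed non-negative so the constraint is antimonotonic.
prices = {"pen": 5, "ink": 20, "lamp": 40, "desk": 120}

def satisfies(itemset, max_total):
    """Constraint: sum of item prices <= max_total."""
    return sum(prices[i] for i in itemset) <= max_total

def itemsets_within_budget(max_total):
    """Level-wise enumeration that extends only itemsets satisfying the constraint."""
    items = [i for i in prices if satisfies([i], max_total)]
    level = [frozenset([i]) for i in items]
    kept = list(level)
    while level:
        # Violating itemsets were dropped at the previous level, so none of
        # their supersets are ever generated.
        candidates = {s | {i} for s in level for i in items if i not in s}
        level = [s for s in candidates if satisfies(s, max_total)]
        kept.extend(level)
    return kept

kept = itemsets_within_budget(max_total=60)
print(sorted(sorted(s) for s in kept))
```

The mirror-image constraint sum(price) >= v is monotonic: once an itemset satisfies it, every superset does too, so the check need not be repeated for supersets.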








Mining Quantitative Association Rules

bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria

bull A(quant1) ^ A(quant2) =gt Acat

Association Rule Clustering System

bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition

bull The following steps are involved in ARCS-

Steps involved in ARCS

1 Binning- Quantitative attributes can have a very wide range of values defining their domain

The partioning process is called binning The intervals are considered as binsThe common binning strategies are-

Equal width binning Equal Frequency Binning Clustering based binning

Steps involved in ARCS(contd)

2 Finding frequent predicate sets-Once the distribution takes place later each category can be scanned to find most frquent itemset that statisfy minimum support and minimum confidence

3Clustering the association rules-the strong association rules are then mapped to 2-D grid

Steps involved in ARCS(contdhellip)

Steps involved in ARCS(contdhellip)

From Association Mining to Correlation Analysis

Even strong association rules can be misleading. The support-confidence framework can be supplemented with additional measures based on statistical significance and correlation analysis.

Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)\,P(B)$; otherwise, A and B are dependent and correlated as events.

From Association Mining to Correlation Analysis

$\mathrm{lift}(A, B) = \dfrac{P(A \cup B)}{P(A)\,P(B)}$

• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.
• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.
• If lift(A, B) = 1, A and B are independent and there is no correlation between them.

From Association Mining to Correlation Analysis

Equivalently, $\mathrm{lift}(A, B) = \dfrac{P(B \mid A)}{P(B)} = \dfrac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{sup}(B)}$

• Correlation analysis using lift:

Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.

Probability of purchasing a computer game = 0.60
Probability of purchasing a video = 0.75
Probability of purchasing both = 0.40
lift(game, video) = 0.40 / (0.60 × 0.75) = 0.89
Because the lift is less than 1, the purchases of computer games and videos are negatively correlated.
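
The lift calculation for the game and video figures can be checked with a few lines of Python. The function names and the tolerance used to test for independence are assumptions made for this sketch.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A and B occur together) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


def interpret(value, tol=1e-9):
    """Map a lift value to the interpretation used on the slides."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"


# P(game) = 0.60, P(video) = 0.75, P(game and video) = 0.40
game_video_lift = lift(0.60, 0.75, 0.40)
print(round(game_video_lift, 2), interpret(game_video_lift))
```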

Examples

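Correlation analysis using χ²

• $\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$

• $\chi^2 = \dfrac{(4000 - 4500)^2}{4500} + \dfrac{(3500 - 3000)^2}{3000} + \dfrac{(2000 - 1500)^2}{1500} + \dfrac{(500 - 1000)^2}{1000} = 555.6$

• The χ² value is far greater than 1, and the observed count of transactions containing both a game and a video (4000) is below its expected count (4500), so the two purchases are negatively correlated.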
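
A small sketch that recomputes the expected counts and the χ² statistic from the observed counts. The row and column layout of the table is an assumption made here; it is chosen so that the marginals agree with the 60% game and 75% video figures.

```python
# Observed counts. Rows: video, not video. Columns: game, not game.
observed = [
    [4000, 3500],
    [2000, 500],
]


def chi_square(table):
    """Return the chi-square statistic and the expected counts for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    expected = []
    for i, row in enumerate(table):
        exp_row = []
        for j, obs in enumerate(row):
            # Expected count under independence of the row and column events.
            exp = row_totals[i] * col_totals[j] / n
            exp_row.append(exp)
            stat += (obs - exp) ** 2 / exp
        expected.append(exp_row)
    return stat, expected


stat, expected = chi_square(observed)
print(round(stat, 1))
print(expected)
```

The statistic alone only indicates that the two events are dependent. The direction of the correlation comes from comparing the observed and expected counts for the cell of interest.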

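Other correlation measures

• All_confidence
• Cosine

• $\mathrm{all\_conf}(X) = \dfrac{\mathrm{sup}(X)}{\mathrm{max\_item\_sup}(X)}$, where max_item_sup(X) is the largest support of any single item in X.

• $\mathrm{cosine}(A, B) = \dfrac{P(A \cup B)}{\sqrt{P(A)\,P(B)}} = \dfrac{\mathrm{sup}(A \cup B)}{\sqrt{\mathrm{sup}(A)\,\mathrm{sup}(B)}}$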
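
Both measures are one-line computations. The sketch below applies them to the game and video supports used in the lift example (0.60, 0.75, and 0.40 for both); the function names are hypothetical.

```python
from math import sqrt


def all_confidence(sup_itemset, item_sups):
    """all_conf(X) = sup(X) / (largest support of any single item in X)."""
    return sup_itemset / max(item_sups)


def cosine(sup_a, sup_b, sup_ab):
    """cosine(A, B) = sup(A and B together) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)


all_conf_gv = all_confidence(0.40, [0.60, 0.75])
cosine_gv = cosine(0.60, 0.75, 0.40)
print(round(all_conf_gv, 3), round(cosine_gv, 3))
```

Both values lie between 0 and 1, and larger values indicate a stronger association between the items.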

Comparison of four correlation measures on typical data set

• A null transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null transactions.
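
The effect of null transactions can be illustrated by adding them to a fixed set of counts and recomputing two measures. The counts below are hypothetical; the sketch only shows that lift depends on the total number of transactions while cosine does not.

```python
from math import sqrt


def lift_from_counts(n_a, n_b, n_ab, n_total):
    # (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total)), simplified
    return n_ab * n_total / (n_a * n_b)


def cosine_from_counts(n_a, n_b, n_ab):
    # The total number of transactions cancels out of the cosine measure.
    return n_ab / sqrt(n_a * n_b)


N_A, N_B, N_AB = 1000, 1000, 100  # hypothetical counts for A, B, and A with B

rows = []
for n_null in (0, 10_000, 100_000):
    n_total = N_A + N_B - N_AB + n_null
    rows.append((
        n_null,
        lift_from_counts(N_A, N_B, N_AB, n_total),
        cosine_from_counts(N_A, N_B, N_AB),
    ))

for n_null, lift_value, cosine_value in rows:
    print(n_null, round(lift_value, 2), round(cosine_value, 2))
```

With these counts the lift moves from below 1 to above 1 purely because null transactions were added, while the cosine stays at 0.1.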

Constraint Based Association mining

• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.

• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.

Constraint Based Association mining

• The constraints can include the following:
  • Knowledge type constraints
  • Data constraints
  • Dimension/level constraints
  • Interestingness constraints
  • Rule constraints

metarule-Guided Mining of association rules

• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.

• Metarule-guided mining example: when looking for associations between customer traits and the items that customers buy, the analyst may be interested only in determining which pairs of customer traits promote the sale of office software.
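
One way to picture a metarule is as a template that candidate rules must match. The sketch below assumes a template of the form P1(X, Y) ∧ P2(X, W) ⇒ buys(X, "office software"), where P1 and P2 are two different customer traits. The trait names and candidate rules are hypothetical.

```python
from typing import NamedTuple, Tuple


class Rule(NamedTuple):
    antecedent: Tuple[Tuple[str, str], ...]  # (predicate, value) pairs
    consequent: Tuple[str, str]


CUSTOMER_TRAITS = {"age", "income", "occupation"}  # hypothetical trait predicates
TARGET = ("buys", "office software")


def matches_metarule(rule):
    """True if the rule pairs two distinct customer traits with the target consequent."""
    predicates = [name for name, _ in rule.antecedent]
    return (
        len(predicates) == 2
        and len(set(predicates)) == 2
        and all(p in CUSTOMER_TRAITS for p in predicates)
        and rule.consequent == TARGET
    )


candidates = [
    Rule((("age", "30..39"), ("income", "41K..60K")), TARGET),
    Rule((("age", "30..39"),), TARGET),                                      # only one trait
    Rule((("age", "30..39"), ("income", "41K..60K")), ("buys", "printer")),  # wrong consequent
    Rule((("age", "30..39"), ("buys", "laptop")), TARGET),                   # "buys" is not a trait
]
kept = [rule for rule in candidates if matches_metarule(rule)]
print(kept)
```

In practice the template would be used during mining to limit which predicate sets are generated, rather than to filter rules afterwards.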

Constraint Pushing

Rule Constraint

• Antimonotonic
• Monotonic
• Convertible
• Inconvertible

Explain all of them with an example in DMQL.
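
As a rough illustration in Python rather than DMQL, the first two categories can be checked by brute force on a tiny item set. A constraint is antimonotonic if every superset of a violating itemset also violates it, and monotonic if every superset of a satisfying itemset also satisfies it. The items, prices, and thresholds are hypothetical, and a check on one small item set can refute a property but not prove it in general.

```python
from itertools import combinations

prices = {"pen": 5, "ink": 20, "printer": 90, "laptop": 600}  # hypothetical prices


def total(itemset):
    return sum(prices[item] for item in itemset)


def all_itemsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            yield frozenset(combo)


def is_antimonotonic(constraint, items):
    """Once an itemset violates the constraint, every proper superset violates it too."""
    sets = list(all_itemsets(items))
    return all(not constraint(t)
               for s in sets if not constraint(s)
               for t in sets if s < t)


def is_monotonic(constraint, items):
    """Once an itemset satisfies the constraint, every proper superset satisfies it too."""
    sets = list(all_itemsets(items))
    return all(constraint(t)
               for s in sets if constraint(s)
               for t in sets if s < t)


def sum_at_most_100(s):
    return total(s) <= 100


def sum_at_least_100(s):
    return total(s) >= 100


def avg_at_most_50(s):
    return total(s) / len(s) <= 50


for check in (sum_at_most_100, sum_at_least_100, avg_at_most_50):
    print(check.__name__, is_antimonotonic(check, prices), is_monotonic(check, prices))
```

With non-negative prices, the upper bound on the sum behaves antimonotonically and the lower bound monotonically. The average constraint is neither on this data, which is the kind of constraint that needs a suitable item ordering before it can be pushed into the mining process.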

Page 44: Mining Frequent Itemset-Association Analysis

From Association Mining to Correlation Analysis

Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.

Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)P(B)$; otherwise A and B are dependent and correlated as events.

Page 45: Mining Frequent Itemset-Association Analysis

From Association Mining to Correlation Analysis

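$$\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$

Here $P(A \cup B)$ is the probability that a transaction contains all the items of both A and B.

If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.

If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.

If lift(A, B) = 1, A and B are independent and there is no correlation between them.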
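The three cases above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names, the tolerance, and the probabilities passed in are all assumptions chosen only to hit each case once.

```python
def lift(p_ab: float, p_a: float, p_b: float) -> float:
    """Return lift(A, B) = P(A ∪ B) / (P(A) * P(B))."""
    if p_a <= 0 or p_b <= 0:
        raise ValueError("P(A) and P(B) must be positive")
    return p_ab / (p_a * p_b)


def interpret_lift(value: float, tol: float = 1e-9) -> str:
    """Map a lift value to the three cases listed on the slide."""
    if abs(value - 1.0) <= tol:
        return "independent"
    if value > 1.0:
        return "positively correlated"
    return "negatively correlated"


# Hypothetical probabilities, one per case (not taken from the slides).
print(interpret_lift(lift(0.30, 0.50, 0.40)))
print(interpret_lift(lift(0.20, 0.50, 0.40)))
print(interpret_lift(lift(0.10, 0.50, 0.40)))
```

The tolerance is there because lift values computed from real supports are floating-point numbers and will rarely equal 1 exactly.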

Page 46: Mining Frequent Itemset-Association Analysis

From Association Mining to Correlation Analysis

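Lift can equivalently be written as $\mathrm{lift}(A, B) = \frac{P(B \mid A)}{P(B)} = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{sup}(B)}$, since $P(B \mid A) = P(A \cup B) / P(A)$.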

Page 47: Mining Frequent Itemset-Association Analysis

Examples

• Correlation analysis using lift

Let ¬game refer to the transactions that do not contain computer games, and ¬video to the transactions that do not contain videos. The transactions can be summarized in a contingency table.

Probability of purchasing a computer game = 0.60
Probability of purchasing a video = 0.75
Probability of purchasing both = 0.40

lift(game, video) = 0.40 / (0.60 × 0.75) = 0.89. Because the lift is less than 1, the purchases of computer games and videos are negatively correlated.
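The lift example on this slide can be checked from raw counts. The contingency table itself did not survive in the transcript, so the cell counts below are an assumption: they are chosen to agree with the three probabilities stated here and with the observed values used on the χ² slide of this deck (10,000 transactions in total). The dictionary keys are hypothetical labels.

```python
# Assumed cell counts, consistent with P(game) = 0.60, P(video) = 0.75,
# P(game and video) = 0.40 and a total of 10,000 transactions.
counts = {
    ("game", "video"): 4000,
    ("game", "no_video"): 2000,
    ("no_game", "video"): 3500,
    ("no_game", "no_video"): 500,
}

total = sum(counts.values())
p_game = (counts[("game", "video")] + counts[("game", "no_video")]) / total
p_video = (counts[("game", "video")] + counts[("no_game", "video")]) / total
p_both = counts[("game", "video")] / total

lift_value = p_both / (p_game * p_video)
print(f"P(game)={p_game:.2f}  P(video)={p_video:.2f}  P(both)={p_both:.2f}")
print(f"lift(game, video) = {lift_value:.2f}")
```

Working from counts rather than rounded percentages makes it easy to reuse the same table for other measures.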

Page 48: Mining Frequent Itemset-Association Analysis

Correlation analysis using χ²

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(4000 - 4500)^2}{4500} + \frac{(3500 - 3000)^2}{3000} + \frac{(2000 - 1500)^2}{1500} + \frac{(500 - 1000)^2}{1000} = 555.6$$

• The χ² value is much greater than 1, and the first observed count (4000) is below its expected value (4500), so the two items are negatively correlated.
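The expected values in the sum above come from the row and column totals of the contingency table (expected = row total × column total / grand total). The sketch below recomputes the statistic that way. The arrangement of the four observed counts into rows and columns is an assumption, picked so that the derived expected values match the 4500, 3000, 1500 and 1000 shown above; the function name is hypothetical.

```python
def chi_square_2x2(observed):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of the row and column events.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


# Assumed layout: rows = (video, no video), columns = (game, no game).
observed = [[4000, 3500],
            [2000, 500]]
chi2 = chi_square_2x2(observed)
print(f"chi-square = {chi2:.1f}")
```

The statistic only says that the two purchases are dependent; the direction (negative here) comes from comparing an observed cell with its expected value.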

Page 49: Mining Frequent Itemset-Association Analysis

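Other correlation measures

• All_confidence
• Cosine

• $\mathrm{all\_conf}(X) = \frac{\mathrm{sup}(X)}{\mathrm{max\_item\_sup}(X)}$, where max_item_sup(X) is the maximum support of any single item in X.

• $\mathrm{cosine}(A, B) = \frac{P(A \cup B)}{\sqrt{P(A)\,P(B)}} = \frac{\mathrm{sup}(A \cup B)}{\sqrt{\mathrm{sup}(A)\,\mathrm{sup}(B)}}$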
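Both measures are short to compute once the supports are known. The sketch below is illustrative only: the function names are assumptions, and the inputs reuse the game/video supports from the lift example in this deck (0.60, 0.75, and 0.40 for both together).

```python
import math


def all_confidence(sup_itemset: float, item_supports: list[float]) -> float:
    """all_conf(X) = sup(X) / (largest support of any single item in X)."""
    return sup_itemset / max(item_supports)


def cosine(sup_ab: float, sup_a: float, sup_b: float) -> float:
    """cosine(A, B) = sup(A ∪ B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / math.sqrt(sup_a * sup_b)


# Supports taken from the game/video example: sup(game) = 0.60,
# sup(video) = 0.75, sup({game, video}) = 0.40.
all_conf_value = all_confidence(0.40, [0.60, 0.75])
cosine_value = cosine(0.40, 0.60, 0.75)
print(f"all_conf = {all_conf_value:.3f}, cosine = {cosine_value:.3f}")
```

Cosine differs from lift only by the square root in the denominator, which is what removes its dependence on the total number of transactions.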

Page 50: Mining Frequent Itemset-Association Analysis

Comparison of four correlation measures on typical data sets

• A null transaction is a transaction that does not contain any of the itemsets being examined.

• A measure is null-invariant if its value is free from the influence of null transactions.
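One way to see what null-invariance means is to hold the transactions that contain A or B fixed and vary only the number of null transactions. The sketch below does this for lift, cosine and all_confidence. All counts are hypothetical, and the function name is an assumption.

```python
import math


def measures(both: int, only_a: int, only_b: int, null: int) -> dict:
    """Compute lift, cosine and all_confidence from transaction counts."""
    n = both + only_a + only_b + null
    sup_a = (both + only_a) / n
    sup_b = (both + only_b) / n
    sup_ab = both / n
    return {
        "lift": sup_ab / (sup_a * sup_b),
        "cosine": sup_ab / math.sqrt(sup_a * sup_b),
        "all_conf": sup_ab / max(sup_a, sup_b),
    }


# Same A/B transactions, different numbers of null transactions.
few_nulls = measures(both=4000, only_a=2000, only_b=3500, null=500)
many_nulls = measures(both=4000, only_a=2000, only_b=3500, null=50000)

for name in ("lift", "cosine", "all_conf"):
    print(f"{name:8s} few nulls: {few_nulls[name]:.3f}   many nulls: {many_nulls[name]:.3f}")
```

In this run lift moves from below 1 to above 1 purely because null transactions were added, while cosine and all_confidence stay the same, which is the behaviour the definition of null-invariance describes.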
