michael r. wick and paul j. wagner department of computer science

Connecting Discrete Structures to the “Real World”

Using Market Basket Analysis (and Gray Codes) to Integrate and Motivate Topics in Discrete Structures

Michael R. Wick and Paul J. Wagner

Department of Computer Science

University of Wisconsin - Eau Claire

Eau Claire, WI 54701

Road Map

• Introduction
• Our Discrete Structures Course
• Application: Market Basket Analysis
    • The Apriori Algorithm
    • Set Theory
    • Dynamic Programming
    • Algorithm Analysis
• Application: Binary Reflected Gray Codes
    • Applications
    • Recursion
    • Algorithm Analysis
    • Divide-and-Conquer
    • Dynamic Programming
• Summary
• Contact Information

Introduction

• Perceived disconnect with Discrete Structures
    • Rest of curriculum
    • Application to “real world”
• Particularly problematic in applied programs
• We claim this course for our own
    • Replaced similar course in Mathematics
    • Retained rigor
    • Infused applications and algorithmics

Our Discrete Structures Course

Topics

• Logic: Expert Systems, Algorithm Correctness Proof
• Proof Techniques, Recursion: Gray Codes, Divide and Conquer, Dynamic Programming
• Sets & Relations: Market-Basket Analysis, compareTo and equals implementations
• Functions: Algorithm Analysis
• Combinatorics/Probability: Expert Systems
• Matrices: Graphics/Transmission Errors
• Graphs and Trees: Shortest Path, Iterative Deepening, Huffman Coding

Application: Market-Basket Analysis

Sets are a powerful way to describe the application

Market Basket Analysis: the use of association techniques to find groups of items that tend to occur together in transactions

frequent item sets

• sets of items whose number of occurrences meets some minimum threshold (called the minimum support)

• example: {a,b,c,d} occurs 12 times (min. support == 10)

association rules

• $a,b,c \rightarrow d$ iff $\mathrm{support}(\{a,b,c,d\}) / \mathrm{support}(\{a,b,c\}) \ge r$ (where $r$ is called the minimum confidence)

• $a,b \rightarrow c,d$ iff $\mathrm{support}(\{a,b,c,d\}) / \mathrm{support}(\{a,b\}) \ge r$

• how many such rules are there?

Suggestive Sell

• When the client selects the antecedent items, suggest that they select the consequent items
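The definitions of support and confidence can be sketched in a few lines of Python. This is only an illustration: the transaction list and the function names `support`, `confidence`, and `candidate_rules` are assumptions made for the example, not part of the slides. The last function enumerates every split of an item set into a non-empty antecedent and consequent, which suggests one answer to the question of how many rules a single item set can produce.

```python
from itertools import combinations

# Hypothetical transactions over items a..d (illustrative data only).
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a", "b"},
    {"c", "d"},
]

def support(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """support(antecedent | consequent) / support(antecedent)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

def candidate_rules(itemset):
    """All splits of itemset into a non-empty antecedent and consequent."""
    items = frozenset(itemset)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(sorted(items), size):
            a = frozenset(antecedent)
            rules.append((a, items - a))
    return rules
```

For an item set with n items, `candidate_rules` returns 2^n - 2 pairs, since every non-empty proper subset can serve as the antecedent.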

Application: Market-Basket Analysis

Apriori Algorithm (1997)

Principles

• Every subset of a frequent item set must be frequent

• Every frequent item set of cardinality $n+1$ must have at least two frequent item sets of cardinality $n$ as subsets

• The intersection of these two subsets must have a cardinality of $n-1$

• We can build every possible frequent item set of size $n+1$ from the union of frequent item sets of size $n$.


Apriori Algorithm (1997)

Example: minSupport = 2

I = {Table Saw, Router, Kreg Jig, Sander, Drill Press}

T = {{Table Saw, Router, Drill Press},
     {Router, Sander},
     {Router, Kreg Jig},
     {Table Saw, Router, Sander},
     {Table Saw, Kreg Jig},
     {Router, Kreg Jig},
     {Table Saw, Kreg Jig},
     {Table Saw, Router, Kreg Jig, Drill Press},
     {Table Saw, Router, Kreg Jig}}

$L_1$ = { {T}, {R}, {K}, {S}, {D} }

$L_2$ = { {R,T}, {K,T}, {D,T}, {K,R}, {R,S}, {D,R} }

$L_3$ = { {K,R,T}, {D,R,T} }

$L_4$ = $\emptyset$

Rules = ????
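The level-by-level construction in this example can be reproduced with a short Python sketch. It is a minimal illustration under stated assumptions rather than the authors' implementation: the names `TRANSACTIONS`, `support`, and `apriori` are hypothetical, and items are abbreviated to single letters (T = Table Saw, R = Router, K = Kreg Jig, S = Sander, D = Drill Press).

```python
from itertools import combinations

# Transactions from the example, one letter per item.
TRANSACTIONS = [set(t) for t in
                ["TRD", "RS", "RK", "TRS", "TK", "RK", "TK", "TRKD", "TRK"]]

def support(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_support):
    """Return [L1, L2, ...] as sets of frozensets, stopping at the first empty level."""
    items = set().union(*transactions)
    level = {frozenset([i]) for i in items
             if support({i}, transactions) >= min_support}
    levels = []
    while level:
        levels.append(level)
        k = len(next(iter(level))) + 1
        candidates = set()
        for s_i, s_j in combinations(level, 2):
            union = s_i | s_j
            # Join: two (k-1)-sets that differ in exactly one item each way.
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset of the candidate must be frequent.
            if all(frozenset(sub) in level
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        level = {c for c in candidates
                 if support(c, transactions) >= min_support}
    return levels
```

Calling `apriori(TRANSACTIONS, 2)` yields three non-empty levels. The only size-4 candidate, {D,K,R,T}, is pruned because its subset {D,K,T} is not frequent.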

Application: Market-Basket Analysis

Apriori Algorithm (1997)

Let $I = \{a, b, c, \ldots\}$ be a set of all items in the domain

Let $T = \{ S \mid S \subseteq I \}$ be a bag of all transaction records of item sets

Let $\mathrm{support}(S) = |\{ A \mid A \in T \wedge S \subseteq A \}|$

Let $L_1 = \{ \{a\} \mid a \in I \wedge \mathrm{support}(\{a\}) \ge \mathrm{minSupport} \}$

$\forall k\, (k > 1 \wedge L_{k-1} \ne \emptyset)$ Let
$L_k = \{ S_i \cup S_j \mid (S_i \in L_{k-1}) \wedge (S_j \in L_{k-1}) \wedge (|S_i - S_j| = 1) \wedge (|S_j - S_i| = 1) \wedge (\forall S\, [((S \subseteq S_i \cup S_j) \wedge (|S| = k-1)) \rightarrow S \in L_{k-1}]) \wedge (\mathrm{support}(S_i \cup S_j) \ge \mathrm{minSupport}) \}$

The set of all frequent item sets is given by

$L = \bigcup_k L_k$

and the set of all association rules is given by

$R = \{ A \rightarrow C \mid (F \in L) \wedge (A \subset F) \wedge (C = F - A) \wedge (A \ne \emptyset) \wedge (C \ne \emptyset) \wedge (\mathrm{support}(F) / \mathrm{support}(A) \ge \mathrm{minConfidence}) \}$


Dynamic Programming Approach

Want proof of principle of optimality and overlapping subproblems

Principle of Optimality

• The optimal solution to $L_k$ includes the optimal solution of $L_{k-1}$

• Proof by contradiction

Overlapping Subproblems

• Lemma: every subset of a frequent item set is a frequent item set

• Proof by contradiction
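One way the contradiction argument for the lemma might be written out is sketched below. It assumes the usual definitions of support (the number of transactions in $T$ containing an item set) and of frequent (support at least minSupport); the symbols $A$, $B$, and $X$ are introduced only for this sketch.

```latex
\begin{align*}
&\text{Suppose } A \text{ is frequent, } B \subseteq A,
  \text{ and, for contradiction, } B \text{ is not frequent.}\\
&\text{Every transaction } X \in T \text{ with } A \subseteq X
  \text{ also satisfies } B \subseteq X, \text{ because } B \subseteq A \subseteq X.\\
&\text{Hence } \mathrm{support}(B) \ge \mathrm{support}(A) \ge \mathrm{minSupport},\\
&\text{which contradicts the assumption that } B \text{ is not frequent.}
\end{align*}
```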

Application: Market-Basket Analysis

Rule Generation Algorithm

Let $L = \bigcup_k L_k$

Let $T = \{ S \mid S \subseteq I \}$ be the set of all transactions.

Let $\langle A, C \rangle$ be an association rule with antecedent $A$ and consequent $C$.

Let $\mathrm{confid}(\langle A, C \rangle) = |\{ B \mid B \in T \wedge (A \cup C) \subseteq B \}| \,/\, |\{ B \mid B \in T \wedge A \subseteq B \}|$

Let $R_1 = \{ \langle F - \{a\}, \{a\} \rangle \mid F \in L \wedge a \in F \wedge \mathrm{confid}(\langle F - \{a\}, \{a\} \rangle) \ge \mathrm{min\_confid} \}$ and

$\forall k\, [\, (k > 1) \wedge (R_{k-1} \ne \emptyset) \rightarrow R_k = \{ \langle A, C_i \cup C_j \rangle \mid (\langle A, C_i \rangle \in R_{k-1}) \wedge (\langle A, C_j \rangle \in R_{k-1}) \wedge (|C_i - C_j| = 1 \wedge |C_j - C_i| = 1) \wedge (\forall S\, [((S \subset C_i \cup C_j) \wedge (|S| = k-1)) \rightarrow \langle A, S \rangle \in R_{k-1}]) \wedge (\mathrm{confid}(\langle A, C_i \cup C_j \rangle) \ge \mathrm{min\_confid}) \} \,]$

then $R = \bigcup_k R_k$ is the set of all confident association rules.

Given as a homework problem on sets
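For comparison with the level-wise definition of $R_k$, a brute-force sketch of rule generation is shown here. It is not the construction in the slide: it simply tries every non-empty proper subset of each frequent item set as an antecedent. The names `TRANSACTIONS`, `support`, and `confident_rules` are assumptions for the example, and the data are the woodworking transactions with one letter per item.

```python
from itertools import combinations

# Example transactions: T = Table Saw, R = Router, K = Kreg Jig,
# S = Sander, D = Drill Press.
TRANSACTIONS = [set(t) for t in
                ["TRD", "RS", "RK", "TRS", "TK", "RK", "TK", "TRKD", "TRK"]]

def support(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confident_rules(frequent_itemsets, transactions, min_confid):
    """Brute force: return {(antecedent, consequent)} meeting min_confid.

    Each element of frequent_itemsets is expected to be a frozenset.
    """
    rules = set()
    for f in frequent_itemsets:
        f_support = support(f, transactions)
        for size in range(1, len(f)):
            for antecedent in combinations(sorted(f), size):
                a = frozenset(antecedent)
                # confid(<A, F - A>) = support(F) / support(A)
                if f_support / support(a, transactions) >= min_confid:
                    rules.add((a, f - a))
    return rules
```

With `min_confid = 1.0`, the item set {D,R,T} yields the rules D → R,T; D,R → T; and D,T → R, because every transaction containing a Drill Press in the example also contains a Table Saw and a Router.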

Application: Binary Reflected Gray Codes

Formal Definition:

• A binary reflected Gray code is a one-to-one function mapping the integers $0 \le i \le 2^n - 1$ to $n$-bit binary numbers so that every two consecutive binary numbers differ in exactly one bit.

Origin

• Used by Emile Baudot in telegraph in 1878.

• Used by Frank Gray in 1953 patent for pulse-code modulation tube

• Prevented large noise spikes when vacuum tube counters incremented

Example:

000
001
011
010
110
111
101
100
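The one-bit-difference property of the example can be checked mechanically. The sketch below is illustrative only: `is_gray_sequence` is a hypothetical helper, and `gray` uses the well-known closed form `i ^ (i >> 1)` for the binary reflected Gray code, which is a different technique from the recursive construction developed in these slides.

```python
def gray(i):
    """Closed-form binary reflected Gray code of i (not taken from the slides)."""
    return i ^ (i >> 1)

def is_gray_sequence(codes):
    """True if codes are distinct and each consecutive pair differs in exactly one bit."""
    distinct = len(set(codes)) == len(codes)
    one_bit = all(bin(a ^ b).count("1") == 1
                  for a, b in zip(codes, codes[1:]))
    return distinct and one_bit

# The 3-bit example sequence, written as binary literals.
example = [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]
```

Both `is_gray_sequence(example)` and the comparison `[gray(i) for i in range(8)] == example` hold for the 3-bit sequence.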


Appears in a curiously large number of applications

• Towers of Hanoi
• Robotic Arm Angle measurement
• Hamiltonian Circuits
• …


Why is it called “Binary Reflected”?

• Binary is obvious
    • Strings are drawn from alphabet of 0s and 1s

• Reflected is less obvious
    • Each half of the code sequence is built from a reflected copy of the other half

000
001
011
010
110
111
101
100

[Figure: Visual Representation]


A Simple Recursive Definition

• Let $G(k, n)$ represent the $k$th code in the $n$-bit binary reflected Gray code sequence

• Computed in $\Theta(n)$ time (for $n$ bits)

• For single Gray code value, this is optimal

• Typically, however, desire entire code sequence
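The formula for $G(k, n)$ itself did not survive in this text, so the following Python sketch is an assumption about its shape, based on the reflection property: the first half of the sequence is the $(n-1)$-bit sequence with a leading 0, and the second half is that sequence in reverse order with a leading 1. Codes are returned as strings and $k$ is taken to be 0-based.

```python
def G(k, n):
    """k-th code (0-based) of the n-bit binary reflected Gray code, as a string.

    A sketch of one possible recursive definition; it makes n recursive calls.
    """
    if n == 0:
        return ""
    half = 1 << (n - 1)
    if k < half:
        # First half: leading 0, then the k-th (n-1)-bit code.
        return "0" + G(k, n - 1)
    # Second half: leading 1, then the mirrored position in the (n-1)-bit sequence.
    return "1" + G(2 * half - 1 - k, n - 1)
```

Calling `G(i, 3)` for `i` from 0 to 7 reproduces the 3-bit sequence 000, 001, 011, 010, 110, 111, 101, 100.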


A Naïve Implementation

• To generate the entire sequence, call $G(i, n)$ with $i$ going from 0 to $k-1$.

A priori Analysis

• Each invocation of $G$ requires $\Theta(n)$ time
• $G$ is invoked $k$ times
• $k$ is equal to $2^n$
• Therefore, $\Theta(n \cdot 2^n)$ time and $\Theta(2^n)$ space
• Optimal is $\Theta(2^n)$ time and space


What is the source of the inefficiency?

• Repeated work.


A Dynamic Programming Approach

Naïve Dynamic Programming Implementation

Requirement

• We must generate and store the entire $(n-1)$-bit Gray code sequence prior to starting the $n$-bit Gray code sequence

Approach

• Use two-dimensional matrix to store previously calculated Gray code sequences
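A minimal sketch of this naive table-based approach follows. The name `gray_table` and the use of a list of lists of strings for the two-dimensional matrix are assumptions for illustration; row `b` of the table holds the complete `b`-bit sequence.

```python
def gray_table(n):
    """Return table where table[b] is the full b-bit Gray code sequence (strings)."""
    table = [[""]]  # the 0-bit sequence contains only the empty code
    for bits in range(1, n + 1):
        prev = table[bits - 1]
        # First half: previous sequence with a leading 0.
        # Second half: previous sequence reversed, with a leading 1.
        row = ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]
        table.append(row)
    return table
```

Because every earlier row is kept, the table for `n` bits stores 1 + 2 + 4 + … + 2^n = 2^(n+1) - 1 codes in total, even though only the previous row is ever read.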


Analysis

• Time

• Space

Application: Binary Reflected Gray Codes

Notice the classic time/space trade-off

• Naïve Iterative
    • Time: $\Theta(n \cdot 2^n)$
    • Space: $\Theta(2^n)$

• Naïve Dynamic Programming
    • Time: $\Theta(2^{n+1})$
    • Space: $\Theta(2^{n+1})$

What are the sources of the remaining inefficiencies?

• Time: Spends too much time copying values
    • 2nd half of $n$-bit sequence is copy (plus “0”) of 1st half

• Space: Only require previous Gray code sequence, not all previous sequences

Time/space trade-off is just a rule of thumb


Improved Approach

• Use integers rather than strings to represent codes
    • Binary representation of integer is equivalent to the string version
    • Requires only 1 bit per bit of code.

• Reuse the $(n-1)$-bit sequence directly as the first half of the $n$-bit sequence
    • Most-significant bit is already correct, as the first half must contain leading zeros.
    • To set leading one of second half, just add $2^{n-1}$
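The improved approach might look like the following Python sketch, which grows the sequence in place inside a single integer array. The function name `gray_sequence` and the in-place layout are assumptions made for this example.

```python
def gray_sequence(n):
    """Return the n-bit binary reflected Gray code as a list of 2**n integers."""
    codes = [0] * (1 << n)
    size = 1  # codes[:size] holds the sequence for the bits built so far
    for bit in range(n):
        high = 1 << bit  # value of the new leading bit
        # The first half is reused untouched (its leading bit is already 0).
        # The second half mirrors the first half and adds the leading 1.
        for j in range(size):
            codes[size + j] = codes[size - 1 - j] + high
        size *= 2
    return codes
```

Each of the 2^n entries is written once and only one array is kept, so the sketch does work and uses storage proportional to the length of the output. Formatting the result with `format(c, "03b")` for `n = 3` gives the same strings as the example sequence.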


Analysis

• Produces and stores

• Time and Space


Posteriori Analysis

[Figure: time versus n, for n from 0 to 18 and time from 0 to 10000. Series: Divide and Conquer, Dynamic Programming, Optimal, Recipe.]

Summary

• Revised Discrete Structures Course
• Explicit connection to curriculum
• Infusion of “real-world” applications
• Applications allow infusion of
    • Dynamic Programming
    • Divide-and-Conquer
    • Set Theory
    • Algorithm Analysis
    • Recursion
    • Proof Techniques
    • Logic

Contact Information

Michael R. Wick ([email protected])

Paul J. Wagner ([email protected])

Department of Computer Science

University of Wisconsin – Eau Claire

Eau Claire, WI 54701

www.cs.uwec.edu
