Connecting Discrete Structures to the “Real World”
Using Market Basket Analysis (and Gray Codes) to Integrate and Motivate Topics in Discrete Structures
Michael R. Wick and Paul J. Wagner
Department of Computer Science
University of Wisconsin - Eau Claire
Eau Claire, WI 54701
Road Map
• Introduction
• Our Discrete Structures Course
• Application: Market Basket Analysis
  • The Apriori Algorithm
  • Set Theory
  • Dynamic Programming
  • Algorithm Analysis
• Application: Binary Reflected Gray Codes
  • Applications
  • Recursion
  • Algorithm Analysis
  • Divide-and-Conquer
  • Dynamic Programming
• Summary
• Contact Information
Introduction
• Perceived disconnect between Discrete Structures and
  • the rest of the curriculum
  • application to the “real world”
• Particularly problematic in applied programs
• We claim this course for our own
  • Replaced a similar course in Mathematics
  • Retained rigor
  • Infused applications and algorithmics
Our Discrete Structures Course: Topics (and the applications that motivate them)
• Logic: Expert Systems, Algorithm Correctness Proofs
• Proof Techniques: Recursion, Gray Codes, Divide and Conquer, Dynamic Programming
• Sets & Relations: Market-Basket Analysis, compareTo and equals implementations
• Functions: Algorithm Analysis
• Combinatorics/Probability: Expert Systems
• Matrices: Graphics/Transmission Errors
• Graphs and Trees: Shortest Path, Iterative Deepening, Huffman Coding
Application: Market-Basket Analysis
Sets are a powerful way to describe the application.
Market Basket Analysis: the use of association techniques to find groups of items that tend to occur together in transactions
• frequent item sets
  • sets of items that occur at least some minimum number of times (called the minimum support)
  • example: {a,b,c,d} occurs 12 times (min. support == 10)
• association rules
  • a,b,c → d iff support({a,b,c,d}) / support({a,b,c}) ≥ r (r is called the minimum confidence)
  • a,b → c,d iff support({a,b,c,d}) / support({a,b}) ≥ r
  • how many such rules are there?
Suggestive Sell
• When the client selects the antecedent items, suggest that they also select the consequent items.
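A minimal Python sketch of the two ratios above, assuming each transaction is a `frozenset` of items and the transactions are kept in a plain list (so repeated baskets are counted, as a bag requires). The function names and the five toy baskets are hypothetical, chosen only to illustrate the rules a,b,c → d and a,b → c,d.

```python
from typing import FrozenSet, List

Basket = FrozenSet[str]

def support(itemset: Basket, transactions: List[Basket]) -> int:
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent: Basket, consequent: Basket,
               transactions: List[Basket]) -> float:
    """support(antecedent ∪ consequent) / support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Hypothetical baskets over items a, b, c, d (frozenset("abc") == {"a", "b", "c"})
transactions = [frozenset(s) for s in ["abcd", "abcd", "abc", "ab", "cd"]]

print(support(frozenset("abcd"), transactions))                    # 2
print(confidence(frozenset("abc"), frozenset("d"), transactions))  # 0.666... (2/3)
print(confidence(frozenset("ab"), frozenset("cd"), transactions))  # 0.5
```

Both rules share the numerator support({a,b,c,d}); only the antecedent in the denominator changes, which is why a rule with a smaller antecedent can never have a larger denominator-adjusted count than its items allow.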
Application: Market-Basket Analysis
Apriori Algorithm (1997): Principles
• Every subset of a frequent item set must be frequent.
• Every frequent item set of cardinality n+1 must have at least two frequent item sets of cardinality n as subsets.
• The intersection of these two subsets must have a cardinality of n−1.
• Therefore, we can build every possible frequent item set of size n+1 from the union of frequent item sets of size n.
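The principles translate into a join-and-prune candidate step. This is a minimal sketch under the assumption that item sets are `frozenset`s; the function name `candidates` is hypothetical. It only proposes candidates: their support still has to be counted against the transactions.

```python
from itertools import combinations
from typing import FrozenSet, Set

def candidates(frequent_n: Set[FrozenSet[str]], n: int) -> Set[FrozenSet[str]]:
    """Size-(n+1) candidates built as unions of two frequent n-sets.

    Two distinct n-sets have a union of size n+1 exactly when their
    intersection has size n-1 (third principle); a candidate is kept
    only if every one of its n-subsets is frequent (first principle).
    """
    result = set()
    for a, b in combinations(frequent_n, 2):
        union = a | b
        if len(union) != n + 1:
            continue
        if all(frozenset(s) in frequent_n for s in combinations(union, n)):
            result.add(union)
    return result

# L2 from the woodworking example, with one-letter item names
L2 = {frozenset(p) for p in ["RT", "KT", "DT", "KR", "RS", "DR"]}
print(sorted("".join(sorted(c)) for c in candidates(L2, 2)))  # ['DRT', 'KRT']
```

Candidates such as {R, S, T} are discarded without touching the transactions, because their subset {S, T} is not frequent.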
Application: Market-Basket Analysis
Apriori Algorithm (1997): Example with minSupport = 2
I = {Table Saw, Router, Kreg Jig, Sander, Drill Press}
T = {{Table Saw, Router, Drill Press},
     {Router, Sander},
     {Router, Kreg Jig},
     {Table Saw, Router, Sander},
     {Table Saw, Kreg Jig},
     {Router, Kreg Jig},
     {Table Saw, Kreg Jig},
     {Table Saw, Router, Kreg Jig, Drill Press},
     {Table Saw, Router, Kreg Jig}}
$L_1 = \{ \{T\}, \{R\}, \{K\}, \{S\}, \{D\} \}$
$L_2 = \{ \{R,T\}, \{K,T\}, \{D,T\}, \{K,R\}, \{R,S\}, \{D,R\} \}$
$L_3 = \{ \{K,R,T\}, \{D,R,T\} \}$
$L_4 = \emptyset$
Rules = ????
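The whole level-wise computation for this example fits in a short Python sketch. It assumes the slide's one-letter abbreviations (T, R, K, S, D) and stores the bag of transactions as a list so that repeated baskets both count; the names `TRANSACTIONS` and `apriori` are hypothetical.

```python
from itertools import combinations

# The nine transactions from the slide, abbreviated as T(able Saw), R(outer),
# K(reg Jig), S(ander), D(rill Press); a list, so repeated baskets both count
TRANSACTIONS = [frozenset(t) for t in
                ["TRD", "RS", "RK", "TRS", "TK", "RK", "TK", "TRKD", "TRK"]]

def apriori(transactions, min_support):
    """Return [L1, L2, ...]; the last entry is the first empty level."""
    def support(s):
        return sum(1 for t in transactions if s <= t)

    items = set().union(*transactions)
    levels = [{frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}]
    k = 2
    while levels[-1]:
        prev = levels[-1]
        # join: unions of two (k-1)-sets that share k-2 items
        joined = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # prune: every (k-1)-subset must already be frequent
        pruned = {c for c in joined
                  if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        levels.append({c for c in pruned if support(c) >= min_support})
        k += 1
    return levels

for k, level in enumerate(apriori(TRANSACTIONS, 2), start=1):
    print(f"L{k} =", sorted("".join(sorted(s)) for s in level))
```

Run as written, this should reproduce $L_1$ through $L_3$ above and an empty $L_4$: the only joined 4-set, {D, K, R, T}, is pruned because its subset {D, K, R} is not frequent.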
Application: Market-Basket Analysis
Apriori Algorithm (1997)
Let $I = \{a, b, c, \ldots\}$ be the set of all items in the domain.
Let $T = \{ S \mid S \subseteq I \}$ be a bag of all transaction records of item sets.
Let $\mathrm{support}(S) = \lvert \{ A \mid A \in T \wedge S \subseteq A \} \rvert$.
Let $L_1 = \{ \{a\} \mid a \in I \wedge \mathrm{support}(\{a\}) \geq \mathit{minSupport} \}$.
$$\forall k \, \Big( (k > 1 \wedge L_{k-1} \neq \emptyset) \rightarrow L_k = \big\{ S_i \cup S_j \mid (S_i \in L_{k-1}) \wedge (S_j \in L_{k-1}) \wedge (\lvert S_i - S_j \rvert = 1) \wedge (\lvert S_j - S_i \rvert = 1) \wedge \forall S \, [((S \subseteq S_i \cup S_j) \wedge (\lvert S \rvert = k-1)) \rightarrow S \in L_{k-1}] \wedge (\mathrm{support}(S_i \cup S_j) \geq \mathit{minSupport}) \big\} \Big)$$
The set of all frequent item sets is given by
$$L = \bigcup_k L_k$$
and the set of all association rules is given by
$$R = \big\{ A \rightarrow C \mid F \in L \wedge A \in \mathcal{P}(F) \wedge C = F - A \wedge A \neq \emptyset \wedge C \neq \emptyset \wedge \mathrm{support}(F) / \mathrm{support}(A) \geq \mathit{minConfidence} \big\}$$
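The definition of $R$ can be evaluated by brute force: for every frequent set $F$, try every nonempty proper subset as the antecedent. This Python sketch hard-codes the example's frequent sets ($L_1$ through $L_3$); the slide gives no value for minConfidence, so the 1.0 used below is an arbitrary illustrative choice, and all names are hypothetical.

```python
from itertools import combinations

TRANSACTIONS = [frozenset(t) for t in
                ["TRD", "RS", "RK", "TRS", "TK", "RK", "TK", "TRKD", "TRK"]]
# Every frequent item set from the example (L1, L2 and L3; L4 is empty)
FREQUENT = [frozenset(s) for s in
            ["T", "R", "K", "S", "D", "RT", "KT", "DT", "KR", "RS", "DR", "KRT", "DRT"]]

def support(s, transactions):
    return sum(1 for t in transactions if s <= t)

def all_rules(frequent, transactions, min_confidence):
    """Every A -> C with A ∪ C = F for some frequent F, A and C nonempty,
    and support(F) / support(A) >= min_confidence."""
    found = []
    for f in frequent:
        for r in range(1, len(f)):            # nonempty proper subsets A of F
            for a in map(frozenset, combinations(sorted(f), r)):
                if support(f, transactions) / support(a, transactions) >= min_confidence:
                    found.append((a, f - a))
    return found

# minConfidence is not given on the slide; 1.0 is an arbitrary illustrative value
for a, c in all_rules(FREQUENT, TRANSACTIONS, 1.0):
    print(",".join(sorted(a)), "->", ",".join(sorted(c)))
```

With this threshold only rules whose antecedent never appears without the consequent survive; for example, every basket with a Drill Press also holds a Table Saw and a Router.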
Application: Market-Basket Analysis
Dynamic Programming Approach
• Want proof of the principle of optimality and of overlapping subproblems
• Principle of Optimality: the optimal solution to $L_k$ includes the optimal solution of $L_{k-1}$
  • Proof by contradiction
• Overlapping Subproblems: lemma that every subset of a frequent item set is a frequent item set
  • Proof by contradiction
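For reference, the lemma also follows directly from the definition of support (a direct argument, not the slide's proof by contradiction): every transaction that contains a frequent set $F$ also contains each subset $S$ of $F$, so

$$S \subseteq F \;\Rightarrow\; \{A \mid A \in T \wedge F \subseteq A\} \subseteq \{A \mid A \in T \wedge S \subseteq A\} \;\Rightarrow\; \mathrm{support}(S) \geq \mathrm{support}(F) \geq \mathit{minSupport}.$$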
Application: Market-Basket Analysis
Rule Generation Algorithm
Let $L = \bigcup_k L_k$.
Let $T = \{ S \mid S \subseteq I \}$ be the set of all transactions.
Let $\langle A, C \rangle$ be an association rule with antecedent $A$ and consequent $C$.
Let
$$\mathrm{confid}(\langle A, C \rangle) = \frac{\lvert \{ B \mid B \in T \wedge (A \cup C) \subseteq B \} \rvert}{\lvert \{ B \mid B \in T \wedge A \subseteq B \} \rvert}$$
Let $R_1 = \{ \langle F - \{a\}, \{a\} \rangle \mid F \in L \wedge a \in F \wedge \mathrm{confid}(\langle F - \{a\}, \{a\} \rangle) \geq \mathit{min\_confid} \}$ and
$$\forall k \, \Big[ (k > 1 \wedge R_{k-1} \neq \emptyset) \rightarrow R_k = \big\{ \langle A, C_i \cup C_j \rangle \mid (\langle A, C_i \rangle \in R_{k-1}) \wedge (\langle A, C_j \rangle \in R_{k-1}) \wedge (\lvert C_i - C_j \rvert = 1 \wedge \lvert C_j - C_i \rvert = 1) \wedge \forall S \, [((S \subseteq C_i \cup C_j) \wedge (\lvert S \rvert = k-1)) \rightarrow \langle A, S \rangle \in R_{k-1}] \wedge (\mathrm{confid}(\langle A, C_i \cup C_j \rangle) \geq \mathit{min\_confid}) \big\} \Big]$$
then $R = \bigcup_k R_k$ is the set of all confident association rules.
(Given as a homework problem on sets.)
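A sketch of the $R_k$ recurrence in Python, assuming rules are `(antecedent, consequent)` tuples of `frozenset`s and reusing the example data. One deviation is labeled in the code: $R_1$ skips one-item sets $F$ so that antecedents are nonempty, which the formula above does not state. The subset check is sound because confidence is anti-monotone in the consequent: if $A \rightarrow C$ is confident, so is $A \rightarrow S$ for every $S \subseteq C$. All names are hypothetical.

```python
from itertools import combinations

TRANSACTIONS = [frozenset(t) for t in
                ["TRD", "RS", "RK", "TRS", "TK", "RK", "TK", "TRKD", "TRK"]]
FREQUENT = [frozenset(s) for s in
            ["T", "R", "K", "S", "D", "RT", "KT", "DT", "KR", "RS", "DR", "KRT", "DRT"]]

def confid(a, c, transactions):
    """|{B in T : A ∪ C ⊆ B}| / |{B in T : A ⊆ B}|"""
    hits = sum(1 for b in transactions if (a | c) <= b)
    return hits / sum(1 for b in transactions if a <= b)

def level_wise_rules(frequent, transactions, min_confid):
    # R_1: rules <F - {x}, {x}>; |F| = 1 is skipped here so that A is nonempty
    level = {(f - {x}, frozenset([x]))
             for f in frequent if len(f) > 1 for x in f
             if confid(f - {x}, frozenset([x]), transactions) >= min_confid}
    rules = set(level)
    k = 2
    while level:                                   # R_{k-1} is nonempty
        nxt = set()
        for (a, ci), (a2, cj) in combinations(level, 2):
            union = ci | cj
            # same antecedent; (k-1)-item consequents that differ in one item
            if a != a2 or len(union) != k:
                continue
            # every (k-1)-subset S of Ci ∪ Cj must give a rule <A, S> in R_{k-1}
            if all((a, frozenset(s)) in level for s in combinations(union, k - 1)) \
                    and confid(a, union, transactions) >= min_confid:
                nxt.add((a, union))
        rules |= nxt
        level = nxt
        k += 1
    return rules

for a, c in sorted(level_wise_rules(FREQUENT, TRANSACTIONS, 1.0),
                   key=lambda r: (sorted(r[0]), sorted(r[1]))):
    print(",".join(sorted(a)), "->", ",".join(sorted(c)))
```

On the woodworking data with min_confid = 1.0 (an arbitrary value), $R_1$ holds five rules and $R_2$ adds D → R,T, built from D → R and D → T.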
Application: Binary Reflected Gray Codes
Formal Definition: A binary reflected Gray code is a one-to-one function mapping the integers $0 \leq i \leq 2^n - 1$ to $n$-bit binary numbers so that every two consecutive binary numbers differ in exactly one bit.
Origin
• Used by Emile Baudot in the telegraph in 1878.
• Used by Frank Gray in a 1953 patent for a pulse-code modulation tube
  • Prevented large noise spikes when vacuum tube counters incremented
Example:
000
001
011
010
110
111
101
100
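The formal definition can be checked mechanically. This Python sketch (the name `is_gray_sequence` is hypothetical) assumes the codes are given as equal-length strings of 0s and 1s in sequence order.

```python
def is_gray_sequence(codes):
    """Check the formal definition for a list of n-bit strings: there are
    2**n distinct codes (one-to-one) and each consecutive pair differs in
    exactly one bit."""
    n = len(codes[0])
    if not all(len(c) == n and set(c) <= {"0", "1"} for c in codes):
        return False
    if len(codes) != 2 ** n or len(set(codes)) != len(codes):
        return False
    return all(sum(x != y for x, y in zip(a, b)) == 1
               for a, b in zip(codes, codes[1:]))

print(is_gray_sequence(["000", "001", "011", "010", "110", "111", "101", "100"]))  # True
print(is_gray_sequence(["00", "01", "10", "11"]))  # False: 01 -> 10 changes two bits
```

Ordinary binary counting fails exactly where a vacuum tube counter would produce a spike: several bits change at once.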
Application: Binary Reflected Gray Codes
Appears in a curiously large number of applications
• Towers of Hanoi
• Robotic Arm Angle measurement
• Hamiltonian Circuits
• …
Application: Binary Reflected Gray Codes
Why is it called “Binary Reflected”?
• Binary is obvious
  • Strings are drawn from an alphabet of 0s and 1s
• Reflected is less obvious
  • Ignoring the leading bit, each half of the code sequence is a reflected (reversed) copy of the other half
000
001
011
010
110
111
101
100
Visual Representation
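The reflection property can be stated recursively and checked in code: the leading bits are all 0s and then all 1s, the second half (minus its leading bit) is the first half read backwards, and the first half (minus its leading bit) is itself a binary reflected Gray code. A minimal Python sketch, with the hypothetical name `is_brgc` and codes as strings:

```python
def is_brgc(codes):
    """Recursive reflection check on a list of equal-length bit strings."""
    if codes == [""]:                  # the 0-bit code: a single empty string
        return True
    if not codes:
        return False
    half = len(codes) // 2
    first, second = codes[:half], codes[half:]
    rest = [c[1:] for c in first]      # first half without its leading bit
    return (len(codes) == 2 * half
            and all(c.startswith("0") for c in first)
            and all(c.startswith("1") for c in second)
            and [c[1:] for c in second] == rest[::-1]   # the mirror image
            and is_brgc(rest))

print(is_brgc(["000", "001", "011", "010", "110", "111", "101", "100"]))  # True
print(is_brgc(["00", "01", "10", "11"]))  # False: 0,1 is not the reverse of 0,1
```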
Application: Binary Reflected Gray Codes
A Simple Recursive Definition
• Let G(k,n) represent the kth code in the n-bit binary reflected Gray code sequence
• Computed in Θ(n) time (for n bits)
  • For a single Gray code value, this is optimal
  • Typically, however, we want the entire code sequence
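The slide's own definition of G(k,n) did not survive extraction, so the following is one plausible recursion consistent with the reflection property, not necessarily the authors'. It returns the code as an integer and makes one recursive call per bit, so it runs in Θ(n) calls, assuming constant-time integer arithmetic.

```python
def G(k: int, n: int) -> int:
    """k-th code (0 <= k < 2**n) of the n-bit BRGC, as an integer."""
    if n == 0:
        return 0
    half = 1 << (n - 1)                # 2**(n-1)
    if k < half:
        # first half: leading 0, same as the k-th (n-1)-bit code
        return G(k, n - 1)
    # second half: leading 1, reflected index into the (n-1)-bit code
    return half + G(2 * half - 1 - k, n - 1)

print([format(G(k, 3), "03b") for k in range(8)])
# ['000', '001', '011', '010', '110', '111', '101', '100']
```

A well-known closed form, `k ^ (k >> 1)`, gives the same values and is a convenient cross-check.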
Application: Binary Reflected Gray Codes
A Naïve Implementation
• To generate the entire sequence, call G(i,n) with i going from 0 to k−1.
A Priori Analysis
• Each invocation of G requires Θ(n) time
• G is invoked k times
• k is equal to $2^n$
• Therefore, $\Theta(n \cdot 2^n)$ time and $\Theta(2^n)$ space
• Optimal is $\Theta(2^n)$ time and space
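The naïve generator is a single loop over G. This self-contained sketch repeats the same hypothetical recursive G (one plausible definition, since the slide's version was not preserved) and calls it $2^n$ times, each call doing Θ(n) work.

```python
def G(k, n):
    """k-th n-bit BRGC code as an integer (hypothetical recursive sketch)."""
    if n == 0:
        return 0
    half = 1 << (n - 1)
    return G(k, n - 1) if k < half else half + G(2 * half - 1 - k, n - 1)

def naive_sequence(n):
    """Call G(i, n) for every i in [0, 2**n): 2**n calls of Θ(n) work each."""
    return [G(i, n) for i in range(2 ** n)]

print(naive_sequence(3))  # [0, 1, 3, 2, 6, 7, 5, 4]
```

Each call recomputes the whole chain of n recursive steps from scratch, even though neighbouring values of i share most of that chain.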
Application: Binary Reflected Gray Codes
What is the source of the inefficiency? Repeated work.
Application: Binary Reflected Gray Codes
A Dynamic Programming Approach
Application: Binary Reflected Gray Codes
Naïve Dynamic Programming Implementation
• Requirement
  • We must generate and store the entire $(n-1)$-bit Gray code sequence prior to starting the $n$-bit Gray code sequence
• Approach
  • Use a two-dimensional matrix to store previously calculated Gray code sequences
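A minimal sketch of this naïve dynamic program, assuming the "matrix" is a Python list of rows where row j holds the complete j-bit sequence as strings; `dp_table` is a hypothetical name. Each row is built only from the stored row before it: prefix "0" to that row, then prefix "1" to the same row reversed.

```python
def dp_table(n):
    """Rows 0..n; row j holds the complete j-bit BRGC as strings."""
    table = [[""]]                      # row 0: the single 0-bit code
    for j in range(1, n + 1):
        prev = table[j - 1]             # stored subproblem, never recomputed
        table.append(["0" + c for c in prev] + ["1" + c for c in reversed(prev)])
    return table

print(dp_table(3)[3])  # ['000', '001', '011', '010', '110', '111', '101', '100']
```

Every row is kept, and every code of the previous row is copied (with a new leading bit) twice.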
Application: Binary Reflected Gray Codes
Analysis
• Time: $\Theta(2^{n+1})$
• Space: $\Theta(2^{n+1})$
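Counting each stored code as one unit, the bound comes from a geometric series, since row j of the matrix holds the $2^j$ codes of the j-bit sequence. Because $2^{n+1} = 2 \cdot 2^n$, the result differs from $\Theta(2^n)$ only by a constant factor of two:

$$\sum_{j=1}^{n} 2^{j} \;=\; 2^{n+1} - 2 \;\in\; \Theta\!\left(2^{n+1}\right) \;=\; \Theta\!\left(2^{n}\right)$$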
Application: Binary Reflected Gray Codes
Notice the classic time/space trade-off
• Naïve Iterative: Time $\Theta(n \cdot 2^n)$, Space $\Theta(2^n)$
• Naïve Dynamic Programming: Time $\Theta(2^{n+1})$, Space $\Theta(2^{n+1})$
What are the sources of the remaining inefficiencies?
• Time: spends too much time copying values
  • The 1st half of the $n$-bit sequence is just a copy (plus a leading “0”) of the $(n-1)$-bit sequence
• Space: only the previous Gray code sequence is required, not all previous sequences
The time/space trade-off is just a rule of thumb.
Application: Binary Reflected Gray Codes
Improved Approach
• Use integers rather than strings to represent codes
  • The binary representation of an integer is equivalent to the string version
  • Requires only 1 bit per bit of code
• Reuse the $(n-1)$-bit sequence directly as the first half of the $n$-bit sequence
  • The most-significant bit is already correct, since the first half must contain leading zeros
  • To set the leading one of the second half, just add $2^{n-1}$
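A sketch of the improved approach, assuming one Python list of length $2^n$ holds the codes as integers; `brgc` is a hypothetical name. After pass j, the first $2^j$ entries hold the j-bit sequence: the (j−1)-bit sequence already in place is reused unchanged as the first half, and the second half is written as its reflection plus $2^{j-1}$.

```python
def brgc(n):
    """n-bit BRGC as integers, built in place in one array of length 2**n."""
    codes = [0] * (2 ** n)
    for j in range(1, n + 1):
        half = 1 << (j - 1)             # 2**(j-1): the new leading one
        for i in range(half):
            # mirror entry i into the second half and set its leading 1
            codes[2 * half - 1 - i] = codes[i] + half
    return codes

print(brgc(3))  # [0, 1, 3, 2, 6, 7, 5, 4]
```

Pass j writes $2^{j-1}$ values, so the total work is $2^n - 1$ writes, and only the final array is stored.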
Application: Binary Reflected Gray Codes
Analysis
• Produces and stores each of the $2^n$ codes exactly once
• $\Theta(2^n)$ time and space
Application: Binary Reflected Gray Codes
A Posteriori Analysis
[Figure: measured time vs. n (n = 0 to 18) for the Divide and Conquer, Dynamic Programming, Optimal, and Recipe implementations]
Summary
• Revised Discrete Structures Course
  • Explicit connection to curriculum
  • Infusion of “real-world” applications
• Applications allow infusion of
  • Dynamic Programming
  • Divide-and-Conquer
  • Set Theory
  • Algorithm Analysis
  • Recursion
  • Proof Techniques
  • Logic
Contact Information
Michael R. Wick ([email protected])
Paul J. Wagner ([email protected])
Department of Computer Science
University of Wisconsin – Eau Claire
Eau Claire, WI 54701
www.cs.uwec.edu