Instructor: Prof. Marina Gavrilova
Goal
The goal of this presentation is to discuss in detail how data mining methods are used in market analysis.
Outline of Presentation
- Motivation based on types of learning (supervised/unsupervised)
- Market Basket Analysis
- Association Rule Algorithms
  - More abstract problem redux
  - Breadth-first search
  - Depth-first search
- Summary
What to Learn/Discover?
- Statistical Summaries
- Generators
- Density Estimation
- Patterns/Rules
- Associations
- Clusters/Groups
- Exceptions/Outliers
- Changes in Patterns Over Time or Location
Market Basket Analysis
Consider a shopping cart filled with several items. Market basket analysis tries to answer the following questions:
- Who makes purchases?
- What do customers buy together?
- In what order do customers purchase items?
Market Basket Analysis
Given: a database of customer transactions, where each transaction is a set of items.
Example: the transaction with TID 111 contains the items {Pen, Ink, Milk, Juice}.
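| TID | CID | Date | Item | Qty |
|-----|-----|------|------|-----|
| 111 | 201 | 5/1/99 | Pen | 2 |
| 111 | 201 | 5/1/99 | Ink | 1 |
| 111 | 201 | 5/1/99 | Milk | 3 |
| 111 | 201 | 5/1/99 | Juice | 6 |
| 112 | 105 | 6/3/99 | Pen | 1 |
| 112 | 105 | 6/3/99 | Ink | 1 |
| 112 | 105 | 6/3/99 | Milk | 1 |
| 113 | 106 | 6/5/99 | Pen | 1 |
| 113 | 106 | 6/5/99 | Milk | 1 |
| 114 | 201 | 7/1/99 | Pen | 2 |
| 114 | 201 | 7/1/99 | Ink | 2 |
| 114 | 201 | 7/1/99 | Juice | 4 |

(TID = transaction ID, CID = customer ID.)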
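A minimal Python sketch of how such a table becomes a list of transactions, assuming the rows are already loaded as tuples (the names `ROWS` and `baskets` are hypothetical). Rows are grouped by TID and quantities are dropped, because for market basket analysis a transaction is only the set of distinct items it contains.

```python
from collections import defaultdict

# The transaction table, one tuple per row: (TID, CID, Date, Item, Qty).
ROWS = [
    (111, 201, "5/1/99", "Pen", 2),
    (111, 201, "5/1/99", "Ink", 1),
    (111, 201, "5/1/99", "Milk", 3),
    (111, 201, "5/1/99", "Juice", 6),
    (112, 105, "6/3/99", "Pen", 1),
    (112, 105, "6/3/99", "Ink", 1),
    (112, 105, "6/3/99", "Milk", 1),
    (113, 106, "6/5/99", "Pen", 1),
    (113, 106, "6/5/99", "Milk", 1),
    (114, 201, "7/1/99", "Pen", 2),
    (114, 201, "7/1/99", "Ink", 2),
    (114, 201, "7/1/99", "Juice", 4),
]


def baskets(rows):
    """Group table rows into transactions: TID -> set of items bought.

    Customer, date and quantity are ignored here; only item membership matters.
    """
    by_tid = defaultdict(set)
    for tid, _cid, _date, item, _qty in rows:
        by_tid[tid].add(item)
    return dict(by_tid)


if __name__ == "__main__":
    for tid, items in sorted(baskets(ROWS).items()):
        print(tid, sorted(items))
```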
Market Basket Analysis (Contd.)
- Co-occurrences: 80% of all customers purchase items X, Y and Z together.
- Association rules: 60% of all customers who purchase X and Y also buy Z.
- Sequential patterns: 60% of customers who first buy X also purchase Y within three weeks.

Example: face recognition for vending machine product recommendation.
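Confidence and Support
We prune the set of all possible association rules using two interestingness measures:
- Support of a rule: $X \Rightarrow Y$ has support $s$ if $P(X \wedge Y) = s$, i.e., X and Y are purchased together in a fraction $s$ of all transactions.
- Confidence of a rule: $X \Rightarrow Y$ has confidence $c$ if $P(Y \mid X) = P(X \wedge Y) / P(X) = c$, i.e., a fraction $c$ of the transactions that contain X also contain Y.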
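Both measures are simple ratios of transaction counts. A minimal Python sketch, assuming each transaction is a Python set of item names; the data are the four transactions of the table above, and the function names are hypothetical.

```python
# The four transactions (TIDs 111-114) of the table above, as item sets.
TRANSACTIONS = [
    {"Pen", "Ink", "Milk", "Juice"},
    {"Pen", "Ink", "Milk"},
    {"Pen", "Milk"},
    {"Pen", "Ink", "Juice"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Confidence of lhs => rhs: P(rhs | lhs) = support(lhs and rhs) / support(lhs).

    Undefined (raises ZeroDivisionError here) when lhs never occurs.
    """
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)
```

For instance, `confidence({"Pen"}, {"Milk"}, TRANSACTIONS)` divides the 3 transactions containing both items by the 4 containing Pen.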
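Example
Using the transaction table above:
- {Pen} => {Milk}: support 75%, confidence 75%
- {Ink} => {Pen}: support 75%, confidence 100% (Ink and Pen occur together in 3 of the 4 transactions, and every transaction containing Ink also contains Pen)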
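Worked out by hand for the two example rules: Pen occurs in all four transactions (111 to 114), Milk in 111, 112 and 113, and Ink in 111, 112 and 114.

$$
\begin{aligned}
\operatorname{support}(\{\text{Pen}\} \Rightarrow \{\text{Milk}\}) &= \frac{|\{111, 112, 113\}|}{4} = 75\%, &
\operatorname{confidence}(\{\text{Pen}\} \Rightarrow \{\text{Milk}\}) &= \frac{3}{|\{111, 112, 113, 114\}|} = 75\%, \\
\operatorname{support}(\{\text{Ink}\} \Rightarrow \{\text{Pen}\}) &= \frac{|\{111, 112, 114\}|}{4} = 75\%, &
\operatorname{confidence}(\{\text{Ink}\} \Rightarrow \{\text{Pen}\}) &= \frac{3}{|\{111, 112, 114\}|} = 100\%.
\end{aligned}
$$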
Example
Find all itemsets with support >= 75% in the transaction table above.
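One way to answer this exercise, sketched in Python under the assumption that the table has already been reduced to four item sets: enumerate every non-empty itemset over the items and keep those whose support fraction reaches the threshold. This brute-force check is only practical for a handful of items; the function name is hypothetical.

```python
from itertools import combinations

# The four transactions (TIDs 111-114) from the table above.
TRANSACTIONS = [
    frozenset({"Pen", "Ink", "Milk", "Juice"}),
    frozenset({"Pen", "Ink", "Milk"}),
    frozenset({"Pen", "Milk"}),
    frozenset({"Pen", "Ink", "Juice"}),
]


def frequent_itemsets_brute_force(transactions, min_fraction):
    """Return {itemset: support fraction} for every non-empty itemset whose
    support is at least `min_fraction`, checking all 2^k - 1 candidates."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            fraction = sum(candidate <= t for t in transactions) / len(transactions)
            if fraction >= min_fraction:
                result[candidate] = fraction
    return result


if __name__ == "__main__":
    for itemset, s in frequent_itemsets_brute_force(TRANSACTIONS, 0.75).items():
        print(sorted(itemset), f"{s:.0%}")
```

With a threshold of 0.75 it reports {Pen} (100%) and {Ink}, {Milk}, {Pen, Ink}, {Pen, Milk} (75% each).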
Example
Find all association rules with support >= 50% in the transaction table above.
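A brute-force Python sketch for this exercise, assuming the same four item sets: every itemset with at least 50% support is split into every possible non-empty left-hand side and right-hand side, and each rule's confidence is reported alongside its support. Because the slide sets no confidence threshold, none is applied; the function names are hypothetical.

```python
from itertools import combinations

# The four transactions (TIDs 111-114) from the table above.
TRANSACTIONS = [
    frozenset({"Pen", "Ink", "Milk", "Juice"}),
    frozenset({"Pen", "Ink", "Milk"}),
    frozenset({"Pen", "Milk"}),
    frozenset({"Pen", "Ink", "Juice"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def association_rules(transactions, min_support):
    """List (lhs, rhs, support, confidence) for every rule lhs => rhs whose
    combined itemset lhs | rhs has support >= min_support."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s < min_support:
                continue
            # Every non-empty proper subset can serve as the left-hand side.
            for lhs_size in range(1, size):
                for lhs_combo in combinations(combo, lhs_size):
                    lhs = frozenset(lhs_combo)
                    rhs = itemset - lhs
                    rules.append((lhs, rhs, s, s / support(lhs, transactions)))
    return rules
```

On these data it yields 22 rules: 10 from the five frequent pairs and 12 from the two frequent triples {Pen, Ink, Milk} and {Pen, Ink, Juice}.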
Market Basket Analysis: Applications
Sample applications:
- Direct marketing
- Fraud detection for medical insurance
- Floor/shelf planning
- Web site layout
- Cross-selling
Applications of Frequent Itemsets
- Market basket analysis
- Association rules
- Classification (especially: text, rare classes)
- Seeds for construction of Bayesian networks
- Web log analysis
- Collaborative filtering
Association Rule Algorithms
- Abstract problem redux
- Breadth-first search
- Depth-first search
Problem Redux
Abstract:
- A set of items $\{1, 2, \ldots, k\}$
- A database of transactions (itemsets) $D = \{T_1, T_2, \ldots, T_n\}$, where each $T_j \subseteq \{1, 2, \ldots, k\}$

GOAL: Find all itemsets that appear in at least $x$ transactions. ("Appear in" means "are subsets of": if $I \subseteq T$, we say that T supports I.)
For an itemset I, the number of transactions it appears in is called the support of I; $x$ is called the minimum support.

Concrete:
- I = {milk, bread, cheese, …}
- D = {{milk, bread, cheese}, {bread, cheese, juice}, …}

GOAL: Find all itemsets that appear in at least 1000 transactions.
For example, the transaction {milk, bread, cheese} supports the itemset {milk, bread}.
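The definitions translate almost directly into set operations. A minimal Python sketch; the function names and the two-transaction database `D` are hypothetical stand-ins (the slide's D is elided), and `min_support` plays the role of x as an absolute count.

```python
def supports(transaction, itemset):
    """A transaction T supports an itemset I when I is a subset of T."""
    return set(itemset) <= set(transaction)


def support_count(itemset, database):
    """Support of I: the number of transactions in the database that support it."""
    return sum(supports(t, itemset) for t in database)


def is_frequent(itemset, database, min_support):
    """True when the itemset appears in at least `min_support` (x) transactions."""
    return support_count(itemset, database) >= min_support


# Illustrative two-transaction database (hypothetical, not the slide's full D).
D = [{"milk", "bread", "cheese"}, {"bread", "cheese", "juice"}]
```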
Problem Redux (Cont.)
Definitions:
- An itemset is frequent if it is a subset of at least $x$ transactions (FI).
- An itemset is maximally frequent if it is frequent and does not have a frequent proper superset (MFI).

GOAL: Given $x$, find all frequent (maximally frequent) itemsets, to be stored in FI (MFI).
Obvious relationship: $\text{MFI} \subseteq \text{FI}$.

Example: $D = \{\{1,2,3\}, \{1,2,3\}, \{1,2,3\}, \{1,2,4\}\}$, minimum support $x = 3$.
- {1,2} is frequent: Support({1,2}) = 4.
- {1,2,3} is maximal frequent: its support is 3, and its only proper superset, {1,2,3,4}, has support 0.
- All maximal frequent itemsets: {1,2,3}.
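A brute-force Python sketch of both definitions, applied to the example above. It enumerates every candidate itemset, so it only illustrates what FI and MFI contain, not how to compute them efficiently; the function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(database, x):
    """FI: every non-empty itemset contained in at least x transactions."""
    items = sorted(set().union(*database))
    return [
        frozenset(c)
        for size in range(1, len(items) + 1)
        for c in combinations(items, size)
        if sum(set(c) <= t for t in database) >= x
    ]


def maximal_frequent_itemsets(fi):
    """MFI: members of FI with no frequent proper superset (hence MFI is a subset of FI)."""
    return [i for i in fi if not any(i < j for j in fi)]


D = [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 4}]
FI = frequent_itemsets(D, 3)
MFI = maximal_frequent_itemsets(FI)
```

For x = 3 it finds seven frequent itemsets ({1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}) and the single maximal one, {1,2,3}.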
The Itemset Lattice
- Level 0: {}
- Level 1: {1}, {2}, {3}, {4}
- Level 2: {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}
- Level 3: {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}
- Level 4: {1,2,3,4}
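The lattice over k items contains all 2^k itemsets, with C(k, j) of them at level j (for k = 4: 1, 4, 6, 4, 1). A small Python sketch that generates the levels; the helper name is hypothetical.

```python
from itertools import combinations


def itemset_lattice(items):
    """Return the levels of the itemset lattice over `items`.

    Level j holds every j-item subset; itemsets on adjacent levels are
    connected in the lattice when they differ by exactly one item.
    """
    items = sorted(items)
    return [[set(c) for c in combinations(items, j)] for j in range(len(items) + 1)]


if __name__ == "__main__":
    for j, level in enumerate(itemset_lattice({1, 2, 3, 4})):
        print(j, level)
```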
Frequent Itemsets
[Figure: the itemset lattice over {1, 2, 3, 4}, with each itemset marked as a frequent itemset or an infrequent itemset]
Breadth-First Search: 1-Itemsets
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Infrequent, Frequent, Currently examined, Don't know]
Breadth-First Search: 2-Itemsets
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Infrequent, Frequent, Currently examined, Don't know]
Breadth-First Search: 3-Itemsets
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Infrequent, Frequent, Currently examined, Don't know]
Breadth-First Search: Remarks
- We prune infrequent itemsets and avoid counting them.
- To find a frequent itemset with k items, we need to count all $2^k$ of its subsets.
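A compact Python sketch of this level-wise search in the style of Apriori, assuming transactions are given as Python sets and the minimum support x as an absolute count. It is an illustration rather than an optimized implementation: candidate generation joins every pair of frequent itemsets, and support is counted by scanning all transactions once per level. The function name is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Breadth-first (level-wise) search for all frequent itemsets.

    `min_support` is an absolute transaction count (x).
    Returns a dict mapping each frequent itemset (frozenset) to its count.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    frequent = {}
    level = [frozenset([i]) for i in items]  # candidate 1-itemsets
    k = 1
    while level:
        # One pass over the database counts every candidate of size k.
        counts = {c: sum(c <= t for t in transactions) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune (anti-monotonicity): every (k-1)-subset must itself be frequent,
        # so supersets of infrequent itemsets are never counted.
        level = [c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent
```

On the four market basket transactions with x = 3, the 2-itemset {Ink, Milk} turns out infrequent, so the candidate {Pen, Ink, Milk} is pruned without being counted.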
Depth-First Search (1)
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Infrequent, Frequent, Currently examined, Don't know]
Depth-First Search (2)
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Infrequent, Frequent, Currently examined, Don't know]
Depth-First Search (3)
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Infrequent, Frequent, Currently examined, Don't know]
Depth-First Search (4)
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Infrequent, Frequent, Currently examined, Don't know]
Depth-First Search (5)
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Infrequent, Frequent, Currently examined, Don't know]
BFS Versus DFS
Breadth-first search:
- Prunes infrequent itemsets
- Uses anti-monotonicity: every superset of an infrequent itemset is infrequent

Depth-first search:
- Prunes frequent itemsets (subsets of an itemset already known to be frequent need not be counted)
- Uses monotonicity: every subset of a frequent itemset is frequent
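A Python sketch of depth-first search for maximal frequent itemsets, assuming sets as transactions and an absolute minimum support x. The look-ahead check (skip a subtree whose itemsets all lie inside an already found maximal itemset) is one common way to realize "prunes frequent itemsets"; it is an illustrative choice, not necessarily the exact scheme behind the slides. The function name is hypothetical.

```python
def dfs_maximal(transactions, min_support):
    """Depth-first search for the maximal frequent itemsets (MFI).

    Items are explored in sorted order; `head` is the current itemset and
    `tail` holds the items that may still be appended to it.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    mfi = []

    def count(itemset):
        return sum(itemset <= t for t in transactions)

    def explore(head, tail):
        # Look-ahead pruning of frequent itemsets: if head plus tail lies inside a
        # maximal itemset already found, every itemset in this subtree is known
        # to be frequent (monotonicity), so nothing here needs counting.
        if any(head | frozenset(tail) <= m for m in mfi):
            return
        extended = False
        for i, item in enumerate(tail):
            candidate = head | {item}
            # Infrequent extensions are dropped together with their whole subtree.
            if count(candidate) >= min_support:
                extended = True
                explore(candidate, tail[i + 1:])
        # A leaf not covered by any known maximal itemset is maximal itself.
        if not extended and head and not any(head <= m for m in mfi):
            mfi.append(head)

    explore(frozenset(), items)
    return mfi
```

On D = {{1,2,3}, {1,2,3}, {1,2,3}, {1,2,4}} with x = 3, the search goes straight down 1, 1-2, 1-2-3, records {1,2,3}, and afterwards only confirms that the remaining branches are covered by it.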
Extensions
- Imposing constraints
  - Only find rules involving the dairy department
  - Only find rules involving expensive products
  - Only find "expensive" rules
  - Only find rules with "whiskey" on the right-hand side
  - Only find rules with "milk" on the left-hand side
- Hierarchies on the items
- Calendars (every Sunday, every 1st of the month)
Itemset Constraints
Definition: A constraint is an arbitrary property of itemsets.
Examples:
- The itemset has support greater than 1000.
- No element of the itemset costs more than $40.
- The items in the set average more than $20.

Goal: Find all itemsets satisfying a given constraint P.
"Solution": If P is a support constraint, use the Apriori algorithm.
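Two Trivial Observations
- Apriori can be applied to any anti-monotone constraint P (whenever an itemset satisfies P, so do all of its subsets). Start from the empty set. Prune supersets of sets that do not satisfy P.
- The itemset lattice is a Boolean algebra, so Apriori also applies to any monotone constraint Q (whenever an itemset satisfies Q, so do all of its supersets). Start from the set of all items instead of the empty set. Prune subsets of sets that do not satisfy Q.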
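A Python sketch of both directions, assuming constraints are passed as predicates on frozensets. The prices in `PRICES` are hypothetical, and the two example constraints (no item above $40, which is anti-monotone; total price of at least $50, which is monotone for positive prices) are illustrative; the function names are also hypothetical.

```python
# Hypothetical unit prices, for illustration only.
PRICES = {"Pen": 2, "Ink": 15, "Milk": 3, "Juice": 45}


def bottom_up(items, p):
    """Apriori-style search for an anti-monotone constraint p.

    Start from the empty set and grow one item per level; a candidate is
    tested only if all of its one-smaller subsets satisfied p, so supersets
    of sets that fail p are never examined.
    """
    items = sorted(items)
    level = [frozenset()] if p(frozenset()) else []
    found = list(level)
    while level:
        current = set(level)
        candidates = {s | {i} for s in level for i in items if i not in s}
        level = [c for c in candidates
                 if all(c - {i} in current for i in c) and p(c)]
        found.extend(level)
    return found


def top_down(items, q):
    """Dual search for a monotone constraint q on the Boolean lattice.

    Start from the set of all items and remove one item per level; a
    candidate is tested only if all of its one-larger supersets satisfied q,
    so subsets of sets that fail q are never examined.
    """
    universe = frozenset(items)
    level = [universe] if q(universe) else []
    found = list(level)
    while level:
        current = set(level)
        candidates = {s - {i} for s in level for i in s}
        level = [c for c in candidates
                 if all(c | {i} in current for i in universe - c) and q(c)]
        found.extend(level)
    return found


no_item_over_40 = lambda s: all(PRICES[i] <= 40 for i in s)
total_at_least_50 = lambda s: sum(PRICES[i] for i in s) >= 50
```

With these prices, `bottom_up(PRICES, no_item_over_40)` returns the 8 subsets of {Pen, Ink, Milk} (including {}), and `top_down(PRICES, total_at_least_50)` returns the 5 itemsets whose prices add up to at least 50.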
Negative Pruning of a Monotone Q
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Satisfies Q, Doesn't satisfy Q, Currently examined, Don't know]
Positive Pruning in Apriori
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Frequent, Infrequent, Currently examined, Don't know]
Positive Pruning in Apriori
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Frequent, Infrequent, Currently examined, Don't know]
Positive Pruning in Apriori
[Figure: the itemset lattice over {1, 2, 3, 4}; legend: Frequent, Infrequent, Currently examined, Don't know]
The Problem
Current techniques: approximate the difficult constraints.
New goal: given constraints P and Q, with P a support constraint (anti-monotone) and Q a statistical constraint (monotone), find all itemsets that satisfy both P and Q.
Recent solutions: newer algorithms can handle both P and Q.
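A straightforward baseline for the combined goal is to run the support-based search for P first and only then test Q on each result. A Python sketch of that baseline, assuming the four market basket transactions from earlier, a hypothetical price table, and a hypothetical monotone Q (total price of at least $50). It uses Q only as a post-filter, never to prune the search; the function names are hypothetical.

```python
# The four market basket transactions (TIDs 111-114).
TRANSACTIONS = [
    frozenset({"Pen", "Ink", "Milk", "Juice"}),
    frozenset({"Pen", "Ink", "Milk"}),
    frozenset({"Pen", "Milk"}),
    frozenset({"Pen", "Ink", "Juice"}),
]
# Hypothetical unit prices, for illustration only.
PRICES = {"Pen": 2, "Ink": 15, "Milk": 3, "Juice": 45}


def satisfies_p(itemset, min_count=2):
    """P: support constraint (anti-monotone), itemset in >= min_count transactions."""
    return sum(itemset <= t for t in TRANSACTIONS) >= min_count


def satisfies_q(itemset, min_total=50):
    """Q: hypothetical monotone constraint, total price of the itemset >= min_total."""
    return sum(PRICES[i] for i in itemset) >= min_total


def p_then_filter_q(items):
    """Baseline: level-wise (Apriori-style) search for P, then keep itemsets satisfying Q."""
    level = [frozenset([i]) for i in sorted(items) if satisfies_p(frozenset([i]))]
    found = []
    while level:
        found.extend(level)
        current = set(level)
        # Grow each itemset by one item; keep a candidate only if all of its
        # one-smaller subsets satisfied P (anti-monotone pruning).
        candidates = {s | {i} for s in level for i in items if i not in s}
        level = [c for c in candidates
                 if all(c - {i} in current for i in c) and satisfies_p(c)]
    return [s for s in found if satisfies_q(s)]
```

Here all 11 itemsets that satisfy P are generated, and only {Ink, Juice} and {Pen, Ink, Juice} survive the Q filter.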
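[Figure: the itemset lattice spanning {} and D, showing the region that satisfies P (all subsets of a set satisfying P also satisfy P), the region that satisfies Q (all supersets of a set satisfying Q also satisfy Q), and their overlap, which satisfies P & Q]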
Applications
- Spatial association rules
- Web mining
- Market basket analysis
- User/customer profiling
Review Questions
- What are supervised and unsupervised learning? Is clustering a supervised or an unsupervised type of learning?
- What are association rule algorithms?
- Differentiate breadth-first search and depth-first search with the help of an example.
Useful links
- http://www.oracle.com/technology/industries/life_sciences/pdf/ls_sup_unsup_dm.pdf
- http://www.autonlab.org/tutorials/
- http://www.bandmservices.com/Clustering/Clustering.htm
- http://www.cs.sunysb.edu/~skiena/combinatorica/animations/search.html
- http://www.codeproject.com/KB/java/BFSDFS.aspx