『 Data Mining 』
By Jung, Hae-sun
1. Introduction
2. Definition
3. Data Mining Applications
4. Data Mining Tasks
5. Overview of the System
6. Data Mining Analysis
7. Application
8. References
1. Introduction
Data mining is related to
- Data warehousing
- Online analytical processing (OLAP)
- Data visualization
Data mining needs a data warehouse for effective mining. The aims of OLAP and data mining are similar, but only data mining involves looking for unknown patterns. Finally, data mining requires data visualization for the presentation of results.
2. Definition
A technique that uses software tools geared toward users who typically do not know exactly what they are searching for but are looking for particular patterns or trends. Data mining is the process of sifting through large amounts of data to uncover relationships in the data content. This is also known as data surfing.
3. Data Mining Applications
Applications in financial, telecom, insurance and retail companies for
- market segmentation
- fraud detection
- better marketing
- trend analysis
- market basket analysis
- customer churn
4. Data Mining Tasks
- Class description
- Association
- Sequential patterns
- Time-series analysis
- Prediction
- Classification
- Clustering
5. Overview of the System - Recommender System
[Figure: overview of the recommender system. Components shown:
- Inputs: Product Database, Customer Purchase Database, Target Customer
- Data Mining, Clustering (grouping between customer & product): normalized customer vectors, cluster assignments, cluster-specific product lists
- Data Mining, Associations (grouping between products): product affinities
- Matching Algorithm: shown with the vector for the target customer, the products list for the target customer's cluster, and the products eligible for recommendation
- Output: Personalized Recommendation List]
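The data flow in the overview can be illustrated with a minimal Python sketch. The slides do not describe the matching algorithm itself, so everything below is an assumption made for illustration: the function names (`normalize`, `cluster_product_list`, `recommend`), the toy customers, the pre-computed cluster assignments, and the ranking rule (most-purchased products in the target customer's cluster, excluding products the customer already bought). Product affinities from the association step are left out.

```python
from collections import Counter
from typing import Dict, List

def normalize(purchases: Dict[str, float]) -> Dict[str, float]:
    """Turn raw purchase counts into a normalized customer vector."""
    total = sum(purchases.values())
    return {p: v / total for p, v in purchases.items()} if total else {}

def cluster_product_list(vectors: Dict[str, Dict[str, float]],
                         assignments: Dict[str, int],
                         cluster: int) -> List[str]:
    """Rank products by their summed weight inside one cluster."""
    totals: Counter = Counter()
    for customer, vector in vectors.items():
        if assignments[customer] == cluster:
            totals.update(vector)
    # Sort by descending weight, then by name so ties are deterministic.
    return [p for p, _ in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))]

def recommend(target: str,
              vectors: Dict[str, Dict[str, float]],
              assignments: Dict[str, int],
              top_n: int = 3) -> List[str]:
    """Placeholder matching step: cluster favourites the target has not bought."""
    ranked = cluster_product_list(vectors, assignments, assignments[target])
    already_bought = {p for p, v in vectors[target].items() if v > 0}
    return [p for p in ranked if p not in already_bought][:top_n]

# Hypothetical toy data; cluster assignments are assumed to come from a
# separate clustering step that is not shown here.
raw_purchases = {
    "ann": {"tea": 2, "cat food": 2},
    "bob": {"tea": 1, "cat food": 1, "soft drinks": 2},
    "cid": {"dog food": 3, "soft drinks": 1},
}
assignments = {"ann": 0, "bob": 0, "cid": 1}
vectors = {name: normalize(p) for name, p in raw_purchases.items()}

print(recommend("ann", vectors, assignments))
```

Filtering out already-purchased products mirrors the remark in the results that the recommendation list contains no products the customer bought before.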
6. Data Mining Analysis (1)
▶ Clustering
- Neural Clustering Algorithm
- Demographic Clustering Algorithm
▶ Association Rule
- Apriori Algorithm
- AprioriAll Algorithm
- AprioriTid Algorithm
- DynamicSome Algorithm
- FP-Growth
Matching Algorithm (Key points in this paper)
6. Data Mining Analysis (2)
▶ Association Rule - Concept
- Search for interesting relationships among items in a given data set.
▶ Association Rule - Procedure
1. Find all frequent itemsets.
   Each of these itemsets occurs at least as frequently as a predetermined minimum support.
2. Generate strong association rules from the frequent itemsets.
   These rules must satisfy minimum support and minimum confidence.
6. Data Mining Analysis (3)
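▶ Association Rule - Measure
- Support:
$$\mathrm{Support}(A \Rightarrow B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{total number of transactions}} = P(A \cap B)$$
- Confidence:
$$\mathrm{Confidence}(A \Rightarrow B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{number of transactions containing } A} = \frac{P(A \cap B)}{P(A)} = P(B \mid A)$$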
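The two measures translate directly into code. The following is a minimal sketch, assuming each transaction is represented as a set of item names; the function names `support` and `confidence` and the toy baskets are illustrative and not part of the original slides.

```python
from typing import FrozenSet, Iterable, List

def support(transactions: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = frozenset(items)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)

def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimate P(consequent | antecedent) as support(A and B) / support(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    support_a = support(transactions, a)
    if support_a == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(transactions, both) / support_a

# Hypothetical toy baskets, made up for illustration.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support(baskets, {"bread", "milk"}))       # 2 of 4 baskets -> 0.5
print(confidence(baskets, {"bread"}, {"milk"}))  # 0.5 / 0.75, about 0.667
```

Note that confidence is not symmetric: here the rule bread ⇒ milk and the rule milk ⇒ bread share the same support, and they would differ in confidence whenever bread and milk appear in a different number of baskets.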
6. Data Mining Analysis (4)
▶ Association Rule - Example

Purchased products
              A  B  C  D  E  F
Customer 1    1  0  0  0  0  1
Customer 2    1  1  0  1  0  1
Customer 3    1  0  1  1  0  1
Customer 4    1  0  0  1  0  1
Customer 5    1  1  0  0  1  0

Support of A & D = 3/5 = 0.6
Support of A & F = 4/5 = 0.8
Support of A & E = 1/5 = 0.2

Step 1: Find all frequent itemsets. Minimum support = 60%

Large Itemset   # of transactions   Support (%)
A               5                   100
D               3                   60
F               4                   80
A,D             3                   60
A,F             4                   80
D,F             3                   60
A,D,F           3                   60
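Step 1 can be reproduced on the five customers above with a short level-wise search. This is a simplified sketch in the spirit of Apriori (grow candidates one item at a time and prune any candidate that has an infrequent subset), not the exact algorithm from the referenced paper; the function name `frequent_itemsets` is an illustrative choice.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# The purchase table from the example: one set of products per customer.
transactions: List[Set[str]] = [
    {"A", "F"},
    {"A", "B", "D", "F"},
    {"A", "C", "D", "F"},
    {"A", "D", "F"},
    {"A", "B", "E"},
]

def frequent_itemsets(transactions: List[Set[str]],
                      min_support: float) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose support is at least min_support, with its count."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], int] = {}
    current = [frozenset([item]) for item in items]  # level 1 candidates
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune
        # candidates that have any infrequent k-item subset.
        k += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
    return result

found = frequent_itemsets(transactions, min_support=0.6)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(",".join(sorted(itemset)), count, f"{100 * count / len(transactions):.0f}%")
```

With a minimum support of 60% the search yields the same seven itemsets and counts as the table: A, D, F, {A,D}, {A,F}, {D,F} and {A,D,F}.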
6. Data Mining Analysis (5)
Step 2: Generate strong association rules from the frequent itemsets.

Rule        Support P(A ∩ B)   Prob. of condition   Confidence
A ⇒ F       80%                100%                 0.8
A ⇒ D       60%                100%                 0.6
D ⇒ F       60%                60%                  1
D, F ⇒ A    60%                60%                  1

A ⇒ D: Confidence = 60% / 100% = 0.6
D ⇒ F: Confidence = 60% / 60% = 1
Minimum confidence = 90%
Strong association rules: D ⇒ F, etc.
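Step 2 can be sketched the same way: split every frequent itemset into an antecedent and a consequent, and keep the rules whose confidence reaches the threshold. The support counts are copied from the frequent-itemset table of the example; the function name `strong_rules` is an illustrative assumption.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

N_TRANSACTIONS = 5

# Support counts of the frequent itemsets found in Step 1.
support_count: Dict[FrozenSet[str], int] = {
    frozenset("A"): 5, frozenset("D"): 3, frozenset("F"): 4,
    frozenset("AD"): 3, frozenset("AF"): 4, frozenset("DF"): 3,
    frozenset("ADF"): 3,
}

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]

def strong_rules(support_count: Dict[FrozenSet[str], int],
                 n_transactions: int,
                 min_confidence: float) -> List[Rule]:
    """Return (antecedent, consequent, support, confidence) for each strong rule."""
    rules: List[Rule] = []
    for itemset, count in support_count.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                antecedent = frozenset(lhs)
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is itself frequent,
                # so the antecedent's count is always available.
                conf = count / support_count[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, count / n_transactions, conf))
    return rules

rules = strong_rules(support_count, N_TRANSACTIONS, min_confidence=0.9)
for antecedent, consequent, sup, conf in rules:
    print(f"{','.join(sorted(antecedent))} => {','.join(sorted(consequent))}: "
          f"support={sup:.0%}, confidence={conf:.2f}")
```

At a minimum confidence of 90%, this keeps D ⇒ F and D,F ⇒ A from the table and drops A ⇒ F (0.8) and A ⇒ D (0.6). It also reports the remaining rules with confidence 1 that the slide summarizes as "etc.": D ⇒ A, F ⇒ A, D ⇒ A,F and A,D ⇒ F.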
7. Application (1) - Safeway Stores
▶ Data Collection
- Duration: 7 months
- Number of customers: 200
- Recommended products per customer: 10~20
▶ Safeway product taxonomy
[Figure: product taxonomy with three levels.
- Product classes (99), e.g. Tea, Petfoods, Soft Drinks
- Product subclasses (2302), e.g. Dried Cat Food, Dried Dog Food, Canned Dog Food, Canned Cat Food
- Products (~30000), e.g. Friskies Liver (250g)]
Problem: Multilevel Products (Data Mining Issue), Seasonal Products
7. Application (2) - Safeway Stores
7. Application (3) - Safeway Stores
▶ Results
- 1957 products were recommended. Of these, 120 (6.1%) were chosen.
  (It is important to recall that the recommendation list contains no products previously purchased by the customer.)
- This system can be used as a reasonable tool for recommending new products in supermarkets.
8. References
- Agrawal, R. and Srikant, R., "Fast Algorithms for Mining Association Rules," in Proc. of the VLDB Conf., 1994
- Two Crows, "Data Mining Glossary," http://www.twocrows.com/glossary.htm
- "Data Mining," http://www.mis.postech.ac.kr/topic/dm_e.html
- http://wwwmaths.anu.edu.au/~steve/pdcn.pdf