『 data mining 』 by jung, hae-sun. 1.introduction 2.definition 3.data mining applications 4.data...

Data Mining By Jung, hae-sun

Upload: ernest-johnston

Post on 11-Jan-2016


Page 1: 『 Data Mining 』 By Jung, hae-sun. 1.Introduction 2.Definition 3.Data Mining Applications 4.Data Mining Tasks 5. Overview of the System 6. Data Mining

『 Data Mining 』

By Jung, hae-sun

1. Introduction
2. Definition
3. Data Mining Applications
4. Data Mining Tasks
5. Overview of the System
6. Data Mining Analysis
7. Application
8. References

1. Introduction

Data mining is related to

- Data warehousing

- Online analytical processing (OLAP)

- Data visualization

Data mining needs a data warehouse for effective mining. The aims of OLAP and data mining are similar, but only data mining involves looking for unknown patterns. Finally, data mining requires data visualization for the presentation of results.

2. Definition

Data mining is a technique, supported by software tools, geared toward users who typically do not know exactly what they are searching for but are looking for particular patterns or trends. It is the process of sifting through large amounts of data to uncover relationships within the data content. This is also known as data surfing.

3. Data Mining Applications

Applications in financial, telecom, insurance and retail companies for

- market segmentation

- fraud detection

- better marketing

- trend analysis

- market basket analysis

- customer churn

4. Data Mining Tasks

- Class description
- Association
- Sequential patterns
- Time-series analysis
- Prediction
- Classification
- Clustering

5. Overview of the System - Recommender System

[Figure: flow diagram of the recommender system. Labels recovered from the diagram:]

- Product Database
- Customer Purchase Database
- Data Mining: Clustering (grouping between customer & product)
- Cluster-specific product lists
- Data Mining: Associations (grouping between products)
- Matching Algorithm
- Personalized Recommendation List
- Products eligible for recommendation
- Cluster assignments
- Normalized customer vectors
- Vector for target customer
- Product affinities
- Target Customer
- Products list for target customer's cluster

6. Data Mining Analysis (1)

▶ Clustering

- Neural Clustering Algorithm
- Demographic Clustering Algorithm

▶ Association Rule

- Apriori Algorithm
- AprioriAll Algorithm
- AprioriTid Algorithm
- DynamicSome Algorithm
- FP-Growth

Matching Algorithm (Key points in this paper)

6. Data Mining Analysis (2)

▶ Association Rule - Concept

- Search for interesting relationships among items in a given data set.

▶ Association Rule - Procedure

1. Find all frequent itemsets.

   Each of these itemsets occurs at least as frequently as a pre-determined minimum support.

2. Generate strong association rules from the frequent itemsets.

   These rules must satisfy minimum support and minimum confidence.

6. Data Mining Analysis (3)

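▶ Association Rule - Measure

\[
\mathrm{Support}(A \Rightarrow B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{total number of transactions}} = P(A \cap B)
\]

\[
\mathrm{Confidence}(A \Rightarrow B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{number of transactions containing } A} = \frac{P(A \cap B)}{P(A)} = P(B \mid A)
\]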
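The two measures can be computed directly by counting transactions. The following is a minimal sketch, assuming each transaction is represented as a Python set of item names; the function names `support` and `confidence` and the `baskets` data are hypothetical, invented for illustration and not taken from the slides.

```python
from typing import Collection, Iterable, Set


def support(transactions: Collection[Set[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent) -> float:
    """Estimate P(B | A) as support(A and B) / support(A)."""
    support_a = support(transactions, antecedent)
    if support_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support_a


# Hypothetical baskets, invented purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of 3 bread baskets
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Support is symmetric in A and B, while confidence depends on which side is the condition, so swapping the antecedent and consequent generally changes the confidence value.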

6. Data Mining Analysis (4)

▶ Association Rule - Example

Purchased products

              A  B  C  D  E  F
Customer 1    1  0  0  0  0  1
Customer 2    1  1  0  1  0  1
Customer 3    1  0  1  1  0  1
Customer 4    1  0  0  1  0  1
Customer 5    1  1  0  0  1  0

Support of A & D = 3/5 = 0.6
Support of A & F = 4/5 = 0.8
Support of A & E = 1/5 = 0.2

Step 1: Find all frequent itemsets.

Minimum support = 60%

Large Itemset    # of transactions    Support (%)
A                5                    100
D                3                    60
F                4                    80
A,D              3                    60
A,F              4                    80
D,F              3                    60
A,D,F            3                    60
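Step 1 can be sketched as a level-wise search over the five customers in the example. This is a simplified illustration in the spirit of Apriori, not the exact procedure from the cited paper; the names `TRANSACTIONS` and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations

# Items bought by each of the five customers in the example table.
TRANSACTIONS = [
    {"A", "F"},
    {"A", "B", "D", "F"},
    {"A", "C", "D", "F"},
    {"A", "D", "F"},
    {"A", "B", "E"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: transaction count} for itemsets meeting min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = {}
    current = [frozenset([item]) for item in items]  # 1-item candidates
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: k for c, k in counts.items() if k / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates and keep only
        # those whose k-item subsets are all frequent.
        size = len(next(iter(level))) + 1 if level else 0
        joined = {a | b for a in level for b in level if len(a | b) == size}
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


result = frequent_itemsets(TRANSACTIONS, min_support=0.6)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(",".join(sorted(itemset)), count, f"{100 * count // len(TRANSACTIONS)}%")
```

With a minimum support of 60%, the search keeps A, D, F and their combinations and discards B, C and E after the first pass, so no candidate containing those items is ever counted.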

6. Data Mining Analysis (5)

Step 2: Generate strong association rules from the frequent itemsets.

Rules        Support P(A ∩ B)    Prob. of condition    Confidence
A ⇒ F        80%                 100%                  0.8
A ⇒ D        60%                 100%                  0.6
D ⇒ F        60%                 60%                   1
D, F ⇒ A     60%                 60%                   1

A ⇒ D: Confidence = 60% / 100% = 0.6
D ⇒ F: Confidence = 60% / 60% = 1

Minimum confidence = 90%

Strong association rules: D ⇒ F, etc.
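Step 2 can be sketched by splitting each frequent itemset into a condition and a consequence and keeping the splits whose confidence reaches the threshold. This is an illustrative sketch only; the names `FREQUENT`, `N_TRANSACTIONS` and `strong_rules` are hypothetical, and the counts are copied from the Step 1 table of large itemsets.

```python
from itertools import combinations

# Frequent itemsets and their transaction counts, from the Step 1 table.
FREQUENT = {
    frozenset("A"): 5,
    frozenset("D"): 3,
    frozenset("F"): 4,
    frozenset("AD"): 3,
    frozenset("AF"): 4,
    frozenset("DF"): 3,
    frozenset("ADF"): 3,
}
N_TRANSACTIONS = 5


def strong_rules(frequent, n_transactions, min_confidence):
    """Return (condition, consequence, support, confidence) for strong rules."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for r in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), r):
                lhs = frozenset(lhs_items)
                # Every subset of a frequent itemset is itself frequent,
                # so its count is available in the table.
                conf = count / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, count / n_transactions, conf))
    return rules


rules = strong_rules(FREQUENT, N_TRANSACTIONS, min_confidence=0.9)
for lhs, rhs, sup, conf in rules:
    print(f"{','.join(sorted(lhs))} => {','.join(sorted(rhs))}: "
          f"support={sup:.0%}, confidence={conf:.2f}")
```

Because every itemset in the table already meets the 60% minimum support, only the confidence check is needed here. The rules A ⇒ F (0.8) and A ⇒ D (0.6) fall below the 90% threshold, while D ⇒ F and D, F ⇒ A pass with confidence 1.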

7. Application (1) - Safeway Stores

▶ Data Collection

- Duration: 7 months
- Number of customers: 200
- Recommended products per customer: 10 to 20

7. Application (2) - Safeway Stores

▶ Safeway product taxonomy

[Figure: three-level product taxonomy. Labels recovered from the diagram:]

- Product classes (99), e.g. Tea, Petfoods, Soft Drinks
- Product subclasses (2302), e.g. Dried Cat Food, Dried Dog Food, Canned Dog Food, Canned Cat Food
- Products (~30000), e.g. Friskies Liver (250g)

Problems: Multilevel Products (Data Mining Issue), Seasonal Products

7. Application (3) - Safeway Stores

▶ Results

- 1957 products were recommended. Of these, 120 (6.1%) were chosen.

(It is important to recall that the recommendation list contains no products previously purchased by this customer.)

This system can be used as a reasonable tool for recommending new products in a supermarket.

8. References

Agrawal, R. and Srikant, R., "Fast Algorithms for Mining Association Rules," in Proc. of the VLDB Conf., 1994.

Two Crows, "Data Mining Glossary," http://www.twocrows.com/glossary.htm

"Data Mining," http://www.mis.postech.ac.kr/topic/dm_e.html

http://wwwmaths.anu.edu.au/~steve/pdcn.pdf
