Data Mining Introduction

TYNE SYSTEM. Chun-hung, Chou. 2003.12.09

Page 1: Data Mining Introduction

Data Mining Introduction

TYNE SYSTEM

Chun-hung, Chou

2003.12.09

Page 2: Data Mining Introduction

Outline

1. Data Mining Overview

2. Functionalities

3. Software

4. R function

5. Example

6. Q & A

Page 3: Data Mining Introduction

Data Mining Overview

Page 4: Data Mining Introduction

Knowledge Discovery Process

1. Data cleaning - remove noise and inconsistent data

2. Data integration - combine multiple data sources

3. Data selection - retrieve the data relevant to the analysis task

4. Data transformation - transform the data into forms appropriate for mining

5. Data mining

6. Pattern evaluation - identify the interesting patterns

7. Knowledge presentation
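
The seven steps above can be sketched as a chain of small functions. This is only an illustrative sketch in Python, not part of the original slides: the records, the field names (`temp`, `label`), the 21.0 cut-off, and the co-occurrence counting used as the "mining" step are all assumptions chosen to keep the example small.

```python
from collections import Counter

# Hypothetical toy records from two sources; None marks a missing reading.
source_a = [{"id": 1, "temp": 20.5}, {"id": 2, "temp": None}, {"id": 3, "temp": 22.0}]
source_b = [{"id": 1, "label": "ok"}, {"id": 2, "label": "ok"}, {"id": 3, "label": "fail"}]

def clean(rows):
    """Step 1: drop records that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(left, right):
    """Step 2: join the two sources on the shared 'id' key."""
    by_id = {r["id"]: r for r in right}
    return [{**l, **by_id[l["id"]]} for l in left if l["id"] in by_id]

def select(rows, fields):
    """Step 3: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows):
    """Step 4: discretize 'temp' into a band (21.0 is an assumed cut-off)."""
    return [{"temp_band": "high" if r["temp"] >= 21.0 else "low", "label": r["label"]}
            for r in rows]

def mine(rows):
    """Step 5: count (temp_band, label) co-occurrences as candidate patterns."""
    return Counter((r["temp_band"], r["label"]) for r in rows)

def evaluate(patterns, min_count):
    """Step 6: keep only patterns seen at least min_count times."""
    return {p: c for p, c in patterns.items() if c >= min_count}

prepared = transform(select(integrate(clean(source_a), source_b), ["temp", "label"]))
patterns = evaluate(mine(prepared), min_count=1)

# Step 7: present the surviving patterns.
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
```

Each function takes plain lists of dictionaries, so any single step can be swapped out without touching the others.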

Page 5: Data Mining Introduction

What is Data Mining?

• Viewed as part of the Knowledge Discovery process.

• Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data.

• Uses tools from Computer Science and Artificial Intelligence as well as Statistics.

Page 6: Data Mining Introduction

Why do we need data mining?

– Large number of records (cases): 10^8 to 10^12 bytes

– High-dimensional data (variables): 10 to 10^4 attributes

– Only a small portion, typically 5% to 10%, of the collected data is ever analyzed.

– Data that may never be explored continues to be collected out of fear that something that may prove important in the future may be missing.

– The magnitude of the data precludes most traditional analysis: ANOVA/PC/

Page 7: Data Mining Introduction

Potential Applications

– Fraud Detection

– Manufacturing Processes

– Targeting Markets

– Scientific Data Analysis

– Risk Management

– Web Intelligence

– Bioinformatics

– ...

Page 8: Data Mining Introduction

Data Mining Myths

• Data mining tools need no guidance.

• Data mining models explain behavior.

• Data mining requires no data analysis skill.

• Data mining tools are "different" from statistics.

• Data mining eliminates the need to understand your business and your data.

Page 9: Data Mining Introduction

Data Mining Functionalities

• Concept/Class Description

• Association Analysis

• Classification Analysis

• Cluster Analysis

• Outlier Analysis

• Evolution Analysis

Page 10: Data Mining Introduction

Concept Description

Generate descriptions for characterization and comparison of data

Characterization: summarizes and describes a collection of data, e.g. mean, distribution, percentile, ...

Comparison: summarizes and distinguishes one collection of data from other collection(s) of data

Page 11: Data Mining Introduction

Concept Description

Method:

Visualization: e.g. boxplot, bar chart, histogram, ...

Statistics/tabulation: e.g. mean, std, proportion, contingency table, ...
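
As an illustration of the statistics route, the sketch below characterizes two collections with mean, standard deviation, and median, then compares them by the gap between their means. It uses Python's standard `statistics` module; the two groups of numbers and the helper names `characterize` and `compare` are made up for this example.

```python
import statistics

# Hypothetical measurements for two collections of data.
group_a = [4.8, 5.1, 5.0, 4.9, 5.2]
group_b = [5.6, 5.9, 6.1, 5.8, 6.0]

def characterize(values):
    """Summarize one collection: mean, sample standard deviation, median."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "median": statistics.median(values),
    }

def compare(a, b):
    """Distinguish two collections by the difference of their means."""
    return statistics.mean(b) - statistics.mean(a)

summary_a = characterize(group_a)
summary_b = characterize(group_b)
mean_gap = compare(group_a, group_b)

print("A:", summary_a)
print("B:", summary_b)
print("mean(B) - mean(A):", mean_gap)
```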

Page 12: Data Mining Introduction

Association Analysis

Goal: find interesting relationships among items in a given data set

Page 13: Data Mining Introduction

Association Analysis

Example:

• Market Basket Analysis - an example of rule-based machine learning

• Customer Analysis

– Market Basket Analysis uses the information about what a customer purchases to give us insight into who they are and why they make certain purchases.

• Product Analysis

– Market Basket Analysis gives us insight into the merchandise by telling us which products tend to be purchased together and which are most amenable to purchase.
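
One common way to quantify "purchased together" is with support and confidence. The slides do not name these measures, so treat them as an assumption of this sketch; the baskets are also invented. Support is the fraction of baskets containing an itemset, and confidence estimates how often the consequent appears when the antecedent does.

```python
from itertools import combinations

# Hypothetical transactions: each basket is the set of items in one purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Item pairs bought together in at least half of the baskets.
items = sorted(set().union(*baskets))
frequent_pairs = [pair for pair in combinations(items, 2)
                  if support(pair, baskets) >= 0.5]

print(frequent_pairs)
print(confidence({"butter"}, {"bread"}, baskets))
```

Enumerating every pair is fine for a handful of items; real market-basket tools prune the search instead of checking all combinations.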

Page 14: Data Mining Introduction

Classification Analysis

Goal:

Build a model that describes a predetermined set of data classes or concepts, and use the model for prediction

Page 15: Data Mining Introduction

Classification Analysis

Method:

- Decision tree

- Bayesian network

- Bayesian belief network

- Neural network

- k-nearest neighbor

- Case-based reasoning

- Genetic algorithm

- Rough sets

- Fuzzy logic
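
To make one of the listed methods concrete, here is a minimal k-nearest-neighbor sketch: a new point takes the majority label of its k closest training points. The training data, the labels, and the function name `knn_predict` are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# Hypothetical labelled training points: (feature vector, class label).
training = [
    ((1.0, 1.0), "normal"),
    ((1.2, 0.8), "normal"),
    ((0.9, 1.1), "normal"),
    ((4.0, 4.2), "abnormal"),
    ((4.1, 3.9), "abnormal"),
    ((3.8, 4.0), "abnormal"),
]

def knn_predict(point, examples, k=3):
    """Return the majority label among the k examples closest to point."""
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 1.0), training))
print(knn_predict((4.0, 4.0), training))
```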

Page 16: Data Mining Introduction

Cluster Analysis

Goal:

Group a set of physical or abstract objects into classes of similar objects

Page 17: Data Mining Introduction

Cluster Analysis

• Method:

Partitioning methods: k-means

Hierarchical methods: top-down, bottom-up

Density-based methods: clusters of arbitrary shape

Grid-based methods: cells

Model-based methods: best fit to a given model
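
As a sketch of the partitioning approach, the code below runs k-means (Lloyd's algorithm) from explicitly supplied starting centroids. The points, the starting centroids, and the fixed iteration count are assumptions made so the example is deterministic; practical implementations usually pick random starts and stop on convergence.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update for a fixed number of rounds."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

print(centroids)
print(clusters)
```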

Page 18: Data Mining Introduction

Outlier Analysis

Outlier: a data object that is inconsistent with the rest of a given data set

Goal: find an efficient method to mine the outliers

Page 19: Data Mining Introduction

Outlier Analysis

Method:

- Statistical-Based Outlier Detection

- Distance-Based Outlier Detection

- Deviation-Based Outlier Detection
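
The first two approaches can be sketched in a few lines. The statistical version below flags values whose z-score exceeds a threshold; the distance-based version flags values with too few neighbors within a radius. The readings, the thresholds, and the function names are illustrative assumptions, and deviation-based detection is not covered here.

```python
import statistics

# Hypothetical sensor readings with one suspicious value.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 15.0]

def zscore_outliers(values, threshold=2.0):
    """Statistical: flag values more than `threshold` sample std devs from the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / std > threshold]

def distance_outliers(values, radius=1.0, min_neighbors=2):
    """Distance-based: flag values with fewer than min_neighbors others within radius."""
    flagged = []
    for i, v in enumerate(values):
        neighbors = sum(1 for j, w in enumerate(values)
                        if j != i and abs(v - w) <= radius)
        if neighbors < min_neighbors:
            flagged.append(v)
    return flagged

print(zscore_outliers(readings))
print(distance_outliers(readings))
```

Note that the outlier itself inflates the mean and standard deviation, so a z-score threshold has to be chosen with the sample size in mind.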

Page 20: Data Mining Introduction

Evolution Analysis

• Goal:

Describe and model regularities or trends for objects whose behavior changes over time

Page 21: Data Mining Introduction

Evolution Analysis

• Method:

Statistical Method

Trend Analysis

Similarity Search in Time-Series Analysis

Sequential Pattern Mining

Periodicity Analysis
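
As one possible reading of trend analysis, the sketch below smooths a series with a moving average and estimates its overall direction with a least-squares slope against the time index. The series and the window size are invented; the slides do not say which trend technique was intended.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def linear_trend(series):
    """Least-squares slope of the series against time index 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator

# Hypothetical measurements taken at equally spaced times.
series = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0]
smoothed = moving_average(series, 3)
slope = linear_trend(series)

print(smoothed)
print(slope)
```

A positive slope indicates an upward trend over the observed period; the smoothed series removes the short-term wobble that hides it.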

Page 22: Data Mining Introduction

Commercial Software

• Full Suite

Product             Company          Price (US$)
Enterprise Miner    SAS              >75000
Clementine          SPSS             ~50000
Intelligent Miner   IBM              ??
Data Miner          STATISTICA       ~50000
IndexMiner          Index Software   ??

Page 23: Data Mining Introduction

Method in R

Task      R function or package
Tree      tree
Cluster   clara
Cluster   diana
Cluster   fanny
Cluster   mona
Cluster   hclust
Cluster   kmeans
Cluster   cluster

Page 24: Data Mining Introduction

Example: Decision Tree

• Decision tree for tool abnormality detection

[Figure: decision tree; surviving node labels: AWD080; AWD030, AWD050]
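
The tree itself did not survive in this transcript, so the following is only a generic sketch of how a single decision-tree split can be chosen: try each midpoint between adjacent values and keep the threshold with the lowest weighted Gini impurity. The readings and labels are hypothetical and are not the data behind the slide.

```python
def gini(labels):
    """Gini impurity of a two-class label list (0.0 means pure)."""
    if not labels:
        return 0.0
    p = sum(1 for label in labels if label == "abnormal") / len(labels)
    return 2 * p * (1 - p)

def best_split(rows):
    """rows: list of (value, label). Return (threshold, weighted Gini impurity)."""
    values = sorted({v for v, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [label for v, label in rows if v <= threshold]
        right = [label for v, label in rows if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical tool readings labelled as normal or abnormal.
rows = [
    (0.8, "normal"), (1.1, "normal"), (1.3, "normal"),
    (2.4, "abnormal"), (2.9, "abnormal"), (3.1, "abnormal"),
]
threshold, impurity = best_split(rows)

print(threshold, impurity)
```

A full decision tree repeats this search recursively on each side of the split until the groups are pure enough or too small to divide.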

Page 25: Data Mining Introduction

Example: Decision Tree

Page 26: Data Mining Introduction

Example: Cluster

Page 27: Data Mining Introduction

Question & Suggestion

Page 28: Data Mining Introduction

Thanks !