Introduction to Data Mining & Warehousing

Post on 01-Aug-2020


Page 1: Introduction to Data Mining & Warehousing · Data Mining Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful

Introduction to Data Mining & Warehousing


Objectives

After finishing this class, students will:

Understand the basic terms in Data Mining and Warehousing

Understand their necessity in business and information systems (IS)


Objectives

Understand the basic concepts of Data Mining and Warehousing

Understand the implementation processes of those concepts


Motivation


Motivation

Lots of data is being collected and warehoused:

Web data, e-commerce

Purchases at department/grocery stores

Bank/credit card transactions


Motivation

Computers have become cheaper and more powerful

Competitive pressure is strong

Businesses need better, customized services to gain an edge (e.g., in Customer Relationship Management)


Motivation


Data Warehousing

A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site
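
A minimal Python sketch of the definition above, assuming two hypothetical source systems (`web_orders` and `store_sales`) whose field names differ. All names and values are made up for illustration; a real warehouse would normally sit in a database and be loaded by an ETL process.

```python
# Hypothetical source records; each system uses its own field names.
web_orders = [
    {"cust": "C1", "total_usd": 20.0},
    {"cust": "C2", "total_usd": 35.5},
]
store_sales = [
    {"customer_id": "C1", "amount": 12.5},
]


def to_unified(record, source):
    """Map a source-specific record onto one shared (unified) schema."""
    if source == "web":
        return {"customer_id": record["cust"],
                "amount": record["total_usd"],
                "source": source}
    if source == "store":
        return {"customer_id": record["customer_id"],
                "amount": record["amount"],
                "source": source}
    raise ValueError(f"unknown source: {source}")


def build_warehouse(web, store):
    """Collect the records of all sources in one place (the 'single site')."""
    rows = [to_unified(r, "web") for r in web]
    rows += [to_unified(r, "store") for r in store]
    return rows


def total_by_customer(rows):
    """A query that only works because every row shares the same schema."""
    totals = {}
    for row in rows:
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0.0) + row["amount"]
    return totals


warehouse = build_warehouse(web_orders, store_sales)
print(total_by_customer(warehouse))
```

The point of the sketch is the mapping step: once every record follows the same schema, one query can span all sources.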


Data Warehousing

A data warehouse is only half of the solution for mining huge volumes of data


Typical Data Warehousing Architecture


Data Mining


Data Mining

Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

Non-trivial extraction of implicit, previously unknown and potentially useful information from data


Data Mining

Data mining is the process of discovering actionable information from large sets of data.

Data mining uses mathematical analysis to derive patterns and trends that exist in data.

Typically, these patterns cannot be discovered by traditional data exploration because the relationships are too complex or because there is too much data.


Data Mining

Is often used as a synonym for Knowledge Discovery in Databases (KDD)


Discovering the knowledge

Data cleaning

Remove noise and irrelevant data

Data integration

Combine multiple data sources

Data selection

Retrieve the data relevant to the analysis task


Discovering the knowledge

Data transformation

Transform and consolidate data into a form appropriate for mining

Data mining

Pattern evaluation

Identify the interesting patterns that represent knowledge


Discovering the knowledge

Knowledge Presentation

Visualize and present the mined knowledge to the user
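
The knowledge discovery steps listed above (cleaning, integration, selection, transformation, mining, pattern evaluation, presentation) can be sketched as a chain of small Python functions. This is an illustration under assumptions, not a reference implementation: the records, field names, age threshold and `min_spend` cut-off are all hypothetical, and the "mining" step is just a per-group average.

```python
from statistics import mean

# Hypothetical raw records from two sources; None marks a missing value.
source_a = [{"id": 1, "age": 25, "spend": 120.0},
            {"id": 2, "age": None, "spend": 80.0}]
source_b = [{"id": 3, "age": 40, "spend": 300.0},
            {"id": 4, "age": 35, "spend": 260.0}]


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for src in sources for r in src]


def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: bucket age into a coarse category."""
    return [{"age_group": "under_30" if r["age"] < 30 else "30_plus",
             "spend": r["spend"]} for r in records]


def mine(records):
    """Data mining: average spend per age group (a very simple pattern)."""
    groups = {}
    for r in records:
        groups.setdefault(r["age_group"], []).append(r["spend"])
    return {g: mean(v) for g, v in groups.items()}


def evaluate(patterns, min_spend):
    """Pattern evaluation: keep only groups with a high enough average."""
    return {g: avg for g, avg in patterns.items() if avg >= min_spend}


def present(patterns):
    """Knowledge presentation: render the result as readable text."""
    return [f"{g}: average spend {avg:.2f}" for g, avg in sorted(patterns.items())]


data = integrate(clean(source_a), clean(source_b))
data = select(data, ["age", "spend"])
patterns = mine(transform(data))
report = present(evaluate(patterns, min_spend=200.0))
print(report)
```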

Typical Data Mining Architecture

Data Mining Tasks

Prediction Methods

Use some variables to predict unknown or future values of other variables.

Data Mining Tasks

Description Methods

Find human-interpretable patterns that describe the data.

Data Mining Tasks

Classification [Predictive]

Clustering [Descriptive]

Association Rule Discovery [Descriptive]

Sequential Pattern Discovery [Descriptive]

Regression [Predictive]

Deviation Detection [Predictive]

Data Mining Algorithms

Classification algorithms

predict one or more discrete variables, based on the other attributes in the dataset.

Regression algorithms

predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset.
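
To make the contrast concrete, here is a small Python sketch with one example of each kind: a 1-nearest-neighbour classifier (discrete output) and a least-squares line fit (continuous output). Both techniques and all data are chosen only for illustration; the slides do not name specific algorithms.

```python
def nearest_neighbor_label(train, point):
    """Classification: return the label of the closest training example."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda example: sq_dist(example[0], point))
    return label


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up training data: (income in thousands, purchases) -> customer segment
train = [((20, 1), "low"), ((25, 2), "low"), ((80, 9), "high"), ((90, 8), "high")]
label = nearest_neighbor_label(train, (85, 7))
print(label)  # a discrete class

# Made-up data: advertising spend vs. profit
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope * 5 + intercept)  # a continuous prediction
```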

Data Mining Algorithms

Segmentation algorithms

divide data into groups, or clusters, of items that have similar properties
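
As one possible illustration of segmentation, the sketch below runs k-means on one-dimensional values. k-means is an assumption here (the slides do not name an algorithm), and the starting centers are supplied by the caller so the run is deterministic.

```python
def kmeans_1d(values, centers, iterations=10):
    """1-D k-means sketch: returns the final centers and the last assignment."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Made-up values, e.g. monthly purchases per customer.
centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers)
print(clusters)
```

Items end up grouped with the values they most resemble, which is the "similar properties" idea stated above.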

Data Mining Algorithms

Association algorithms

find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis.
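
A hedged Python sketch of market basket analysis: it counts item pairs and keeps rules "lhs -> rhs" that meet a minimum support (share of baskets containing both items) and confidence (share of baskets with lhs that also contain rhs). The baskets and thresholds are made up, and real association algorithms such as Apriori also handle larger itemsets.

```python
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for item pairs above both thresholds."""
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for basket in transactions:
        items = sorted(set(basket))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["butter"],
]
rules = pair_rules(baskets, min_support=0.5, min_confidence=0.8)
print(rules)
```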

Data Mining Algorithms

Sequence analysis algorithms

summarize frequent sequences or episodes in data, such as a Web path flow.
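
For the Web path example, a very small sketch is to count how many sessions contain each page-to-page step and keep the frequent ones. The session data and the `min_count` threshold are hypothetical, and only steps of length two are considered.

```python
from collections import Counter


def frequent_steps(sessions, min_count):
    """Count consecutive page transitions, at most once per session."""
    counts = Counter()
    for path in sessions:
        counts.update(set(zip(path, path[1:])))
    return {step: c for step, c in counts.items() if c >= min_count}


# Made-up click paths, one list per visitor session.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["home", "search", "product"],
]
steps = frequent_steps(sessions, min_count=2)
print(steps)
```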

Data Mining Models

The patterns and trends that are found can be collected and defined as a data mining model.

Data Mining Models

Forecasting

Estimating sales, predicting server loads or server downtime
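
As a deliberately simple illustration of forecasting, the sketch below predicts the next value as the average of the most recent observations. The moving-average method, the window size and the sales figures are assumptions made for this example only.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    recent = history[-window:]
    return sum(recent) / window


# Made-up monthly sales figures.
sales = [100, 120, 110, 130, 150]
forecast = moving_average_forecast(sales, window=3)
print(forecast)
```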

Data Mining Models

Risk and probability

Choosing the best customers for targeted mailings, determining the probable break-even point for risk scenarios, assigning probabilities to diagnoses or other outcomes

Data Mining Models

Recommendations

Determining which products are likely to be sold together, generating recommendations

Data Mining Models

Finding sequences

Analyzing customer selections in a shopping cart, predicting next likely events

Data Mining Models

Grouping

Separating customers or events into clusters of related items, analyzing and predicting affinities


References

J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2001


Dr. Ir. Muhammad Ikhwan Jambak, MEng