Part I Data Mining Fundamentals. Data Mining: A First View, Chapter 1

Part I Data Mining Fundamentals

Upload: vernon-cunningham

Post on 04-Jan-2016


Page 1: Part I Data Mining Fundamentals. Data Mining: A First View Chapter 1

Part I

Data Mining Fundamentals


Data Mining: A First View

Chapter 1


1.1 Data Mining: A Definition


Data Mining

The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.


Induction-based Learning

The process of forming general concept definitions by observing specific examples of concepts to be learned.


Knowledge Discovery in Databases (KDD)

The application of the scientific method to data mining. Data mining is one step of the KDD process.


1.2 What Can Computers Learn?


Four Levels of Learning

• Facts

• Concepts

• Procedures

• Principles


Concepts

Computers are good at learning concepts. Concepts are the output of a data mining session.


Three Concept Views

• Classical View

• Probabilistic View

• Exemplar View


Supervised Learning

• Build a learner model using data instances of known origin.

• Use the model to determine the outcome of new instances of unknown origin.


Supervised Learning: A Decision Tree Example


Decision Tree

A tree structure where non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.


Table 1.1 • Hypothetical Training Data for Disease Diagnosis

Patient ID#  Sore Throat  Fever  Swollen Glands  Congestion  Headache  Diagnosis
1            Yes          Yes    Yes             Yes         Yes       Strep throat
2            No           No     No              Yes         Yes       Allergy
3            Yes          Yes    No              Yes         No        Cold
4            Yes          No     Yes             No          No        Strep throat
5            No           Yes    No              Yes         No        Cold
6            No           No     No              Yes         No        Allergy
7            No           No     Yes             No          No        Strep throat
8            Yes          No     No              Yes         Yes       Allergy
9            No           Yes    No              Yes         Yes       Cold
10           Yes          Yes    No              Yes         Yes       Cold


Figure 1.1 A decision tree for the data in Table 1.1

Swollen Glands
    Yes: Diagnosis = Strep Throat
    No:  Fever
        Yes: Diagnosis = Cold
        No:  Diagnosis = Allergy
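The slides do not say which algorithm produced this tree. As a minimal sketch, assuming an ID3-style learner that splits on the attribute with the highest information gain, the same tree can be induced from Table 1.1. The function names (`entropy`, `split`, `information_gain`, `build_tree`) and the nested-dictionary representation are illustrative choices, not taken from the source.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Sore Throat", "Fever", "Swollen Glands", "Congestion", "Headache"]

# Table 1.1: attribute values in ATTRIBUTES order, followed by the diagnosis.
ROWS = [
    ("Yes", "Yes", "Yes", "Yes", "Yes", "Strep throat"),
    ("No", "No", "No", "Yes", "Yes", "Allergy"),
    ("Yes", "Yes", "No", "Yes", "No", "Cold"),
    ("Yes", "No", "Yes", "No", "No", "Strep throat"),
    ("No", "Yes", "No", "Yes", "No", "Cold"),
    ("No", "No", "No", "Yes", "No", "Allergy"),
    ("No", "No", "Yes", "No", "No", "Strep throat"),
    ("Yes", "No", "No", "Yes", "Yes", "Allergy"),
    ("No", "Yes", "No", "Yes", "Yes", "Cold"),
    ("Yes", "Yes", "No", "Yes", "Yes", "Cold"),
]
TRAINING = [dict(zip(ATTRIBUTES + ["Diagnosis"], row)) for row in ROWS]


def entropy(instances):
    """Shannon entropy (in bits) of the Diagnosis values in a set of instances."""
    counts = Counter(inst["Diagnosis"] for inst in instances)
    total = len(instances)
    return -sum(c / total * log2(c / total) for c in counts.values())


def split(instances, attribute):
    """Group instances by their value for one attribute."""
    groups = {}
    for inst in instances:
        groups.setdefault(inst[attribute], []).append(inst)
    return groups


def information_gain(instances, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    groups = split(instances, attribute).values()
    remainder = sum(len(g) / len(instances) * entropy(g) for g in groups)
    return entropy(instances) - remainder


def build_tree(instances, attributes):
    """Return a diagnosis string (leaf) or {attribute: {value: subtree}}."""
    classes = Counter(inst["Diagnosis"] for inst in instances)
    if len(classes) == 1 or not attributes:
        return classes.most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(instances, a))
    remaining = [a for a in attributes if a != best]
    return {best: {value: build_tree(group, remaining)
                   for value, group in split(instances, best).items()}}


tree = build_tree(TRAINING, ATTRIBUTES)
print(tree)
```

On this training set the learner tests Swollen Glands at the root and Fever below it, matching Figure 1.1. Other splitting criteria could yield a different tree on other data.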


Table 1.2 • Data Instances with an Unknown Classification

Patient ID#  Sore Throat  Fever  Swollen Glands  Congestion  Headache  Diagnosis
11           No           No     Yes             Yes         Yes       ?
12           Yes          Yes    No              No          Yes       ?
13           No           No     No              No          Yes       ?
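The tree in Figure 1.1 can be applied to these instances directly. The sketch below hand-codes that tree as a function; `diagnose` and `UNKNOWN` are illustrative names, not from the source.

```python
def diagnose(patient):
    """Follow the Figure 1.1 tree: test Swollen Glands first, then Fever."""
    if patient["Swollen Glands"] == "Yes":
        return "Strep throat"
    if patient["Fever"] == "Yes":
        return "Cold"
    return "Allergy"


# Table 1.2: instances whose diagnosis is unknown, keyed by patient ID.
UNKNOWN = {
    11: {"Sore Throat": "No", "Fever": "No", "Swollen Glands": "Yes",
         "Congestion": "Yes", "Headache": "Yes"},
    12: {"Sore Throat": "Yes", "Fever": "Yes", "Swollen Glands": "No",
         "Congestion": "No", "Headache": "Yes"},
    13: {"Sore Throat": "No", "Fever": "No", "Swollen Glands": "No",
         "Congestion": "No", "Headache": "Yes"},
}

predictions = {patient_id: diagnose(p) for patient_id, p in UNKNOWN.items()}
for patient_id, diagnosis in predictions.items():
    print(patient_id, diagnosis)
# 11 Strep throat
# 12 Cold
# 13 Allergy
```

Only Swollen Glands and Fever influence the result. The tree ignores the other three attributes.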


Production Rules

IF Swollen Glands = Yes

THEN Diagnosis = Strep Throat

IF Swollen Glands = No & Fever = Yes

THEN Diagnosis = Cold

IF Swollen Glands = No & Fever = No

THEN Diagnosis = Allergy
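These rules can be held as plain data and checked against the training set. The following is a sketch only: the `RULES` list, the `apply_rules` helper, and the first-match strategy are assumptions made for illustration.

```python
# Each rule pairs a set of attribute conditions with a conclusion.
RULES = [
    ({"Swollen Glands": "Yes"}, "Strep throat"),
    ({"Swollen Glands": "No", "Fever": "Yes"}, "Cold"),
    ({"Swollen Glands": "No", "Fever": "No"}, "Allergy"),
]


def apply_rules(instance, rules=RULES):
    """Return the conclusion of the first rule whose conditions all hold."""
    for conditions, diagnosis in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return diagnosis
    return None  # no rule fired


# (Swollen Glands, Fever, Diagnosis) for patients 1 to 10 of Table 1.1.
TABLE_1_1 = [
    ("Yes", "Yes", "Strep throat"), ("No", "No", "Allergy"),
    ("No", "Yes", "Cold"), ("Yes", "No", "Strep throat"),
    ("No", "Yes", "Cold"), ("No", "No", "Allergy"),
    ("Yes", "No", "Strep throat"), ("No", "No", "Allergy"),
    ("No", "Yes", "Cold"), ("No", "Yes", "Cold"),
]

correct = sum(
    apply_rules({"Swollen Glands": glands, "Fever": fever}) == diagnosis
    for glands, fever, diagnosis in TABLE_1_1
)
print(f"{correct} of {len(TABLE_1_1)} training instances classified correctly")
# 10 of 10 training instances classified correctly
```

Perfect accuracy on the training data is expected here, since the rules were derived from that same data. It says nothing about accuracy on new patients.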


Unsupervised Clustering

A data mining method that builds models from data without predefined classes.


Table 1.3 • Acme Investors Incorporated

Customer ID  Account Type  Margin Account  Transaction Method  Trades/Month  Sex  Age    Favorite Recreation  Annual Income
1005         Joint         No              Online              12.5          F    30–39  Tennis               40–59K
1013         Custodial     No              Broker              0.5           F    50–59  Skiing               80–99K
1245         Joint         No              Online              3.6           M    20–29  Golf                 20–39K
2110         Individual    Yes             Broker              22.3          M    30–39  Fishing              40–59K
1001         Individual    Yes             Online              5.0           M    40–49  Golf                 60–79K
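The slides name no particular clustering algorithm. As a rough sketch of grouping without predefined classes, the code below links customers that agree on at least a chosen number of categorical attributes and merges linked customers into one group. The similarity measure (count of matching values), the `min_matches` threshold, and the omission of the numeric Trades/Month column are all assumptions made for illustration.

```python
from itertools import combinations

# Categorical attributes from Table 1.3 (Trades/Month is numeric and left out).
ATTRIBUTES = ["Account Type", "Margin Account", "Transaction Method",
              "Sex", "Age", "Favorite Recreation", "Annual Income"]

CUSTOMERS = {
    1005: ("Joint", "No", "Online", "F", "30–39", "Tennis", "40–59K"),
    1013: ("Custodial", "No", "Broker", "F", "50–59", "Skiing", "80–99K"),
    1245: ("Joint", "No", "Online", "M", "20–29", "Golf", "20–39K"),
    2110: ("Individual", "Yes", "Broker", "M", "30–39", "Fishing", "40–59K"),
    1001: ("Individual", "Yes", "Online", "M", "40–49", "Golf", "60–79K"),
}


def matches(a, b):
    """Number of attributes on which two customers have the same value."""
    return sum(x == y for x, y in zip(a, b))


def cluster(customers, min_matches):
    """Merge customers into groups whenever a pair shares >= min_matches values."""
    cluster_of = {cid: {cid} for cid in customers}
    for a, b in combinations(customers, 2):
        similar = matches(customers[a], customers[b]) >= min_matches
        if similar and cluster_of[a] is not cluster_of[b]:
            merged = cluster_of[a] | cluster_of[b]
            for cid in merged:
                cluster_of[cid] = merged
    unique = {frozenset(group) for group in cluster_of.values()}
    return sorted(sorted(group) for group in unique)


print(cluster(CUSTOMERS, min_matches=3))
# [[1001, 1005, 1245, 2110], [1013]]
```

No class label is supplied anywhere: the groups emerge from attribute similarity alone. With only five rows the result is purely illustrative, and raising `min_matches` to 4 leaves every customer in its own group.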


1.3 Is Data Mining Appropriate for My Problem?


Data Mining or Data Query?

• Shallow Knowledge

• Multidimensional Knowledge

• Hidden Knowledge

• Deep Knowledge


Data Mining vs. Data Query: An Example

• Use a data query if you already know, or nearly know, what you are looking for.

• Use data mining to find regularities in data that are not obvious.
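A data query states the condition of interest up front. The sketch below runs one such query over a few columns of Table 1.3; the question asked (which customers hold margin accounts) is a hypothetical example, and the list-of-dictionaries layout is an illustrative choice.

```python
# A subset of the Table 1.3 columns, one dictionary per customer.
CUSTOMERS = [
    {"Customer ID": 1005, "Margin Account": "No", "Transaction Method": "Online", "Trades/Month": 12.5},
    {"Customer ID": 1013, "Margin Account": "No", "Transaction Method": "Broker", "Trades/Month": 0.5},
    {"Customer ID": 1245, "Margin Account": "No", "Transaction Method": "Online", "Trades/Month": 3.6},
    {"Customer ID": 2110, "Margin Account": "Yes", "Transaction Method": "Broker", "Trades/Month": 22.3},
    {"Customer ID": 1001, "Margin Account": "Yes", "Transaction Method": "Online", "Trades/Month": 5.0},
]

# Data query: the analyst already knows the condition to test.
margin_ids = [c["Customer ID"] for c in CUSTOMERS if c["Margin Account"] == "Yes"]
print(margin_ids)  # [2110, 1001]
```

The query only retrieves rows that satisfy a condition the analyst supplied. Data mining would instead be used to find which attributes, if any, characterize margin-account customers.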


1.4 Expert Systems or Data Mining?


Expert System

A computer program that emulates the problem-solving skills of one or more human experts.


Knowledge Engineer

A person trained to interact with an expert in order to capture their knowledge.


Figure 1.2 Data mining vs. expert systems

[Figure: two routes to the same rule.
 Data -> Data Mining Tool -> "If Swollen Glands = Yes Then Diagnosis = Strep Throat"
 Human Expert -> Knowledge Engineer -> Expert System Building Tool -> "If Swollen Glands = Yes Then Diagnosis = Strep Throat"]


1.5 A Simple Data Mining Process Model


Figure 1.3 A simple data mining process model

[Figure components: Operational Database, SQL Queries, Data Warehouse, Data Mining, Interpretation & Evaluation, Result Application]


Assembling the Data

• The Data Warehouse

• Relational Databases and Flat Files


Mining the Data

Interpreting the Results

Result Application


1.7 Data Mining Applications