Regression, Classification and Clustering

Mah-Rukh Fida, June 2012

Upload: mahrukh-fida

Post on 05-Apr-2018

7/31/2019 Regression, Classification and Clustering

http://slidepdf.com/reader/full/regression-classification-and-clustering 1/23

Topics to be discussed

Data Mining

Regression

Classification

Clustering

DATA MINING 

Definition

Exploring hidden information

Models of data mining

Two categories of data mining models

Prediction Model

Makes predictions using known results found from different data objects.

Descriptive Model

Identifies patterns or relationships in data.

Explores properties of the data examined

Does not predict new properties.

REGRESSION 

Definition

Numeric prediction of the value of a dependent variable.

The relationship between the dependent and independent variable(s) is expressible through a mathematical equation.

 Types of regression

Types of Regression

Linear regression

$y = c + mx$, where $c$ and $m$ are regression coefficients.

Multi-Linear regression

$y = c_0 + c_1 x_1 + c_2 x_2 + \dots + c_n x_n$, where $c_0, c_1, \dots, c_n$ are regression coefficients and $x_1, x_2, \dots, x_n$ are independent variables.
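As an illustration of the linear form y = c + mx, the coefficients can be estimated from data by ordinary least squares. The slides do not say how the coefficients are obtained, so the method, the function names `fit_line` and `predict`, and the data below are assumptions made for this sketch.

```python
def fit_line(xs, ys):
    """Return (c, m) that minimise the squared error of y = c + m*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx               # slope
    c = mean_y - m * mean_x     # intercept
    return c, m


def predict(c, m, x):
    """Numeric prediction of the dependent variable for a new x."""
    return c + m * x


# Made-up data lying exactly on y = 1 + 2x (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
c, m = fit_line(xs, ys)
y_new = predict(c, m, 4.0)
```

Here `c` plays the role of the intercept and `m` the slope from the slide; the multi-linear case follows the same idea with one coefficient per independent variable.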

Regression Continued …

Regression model is selected when

Prediction of a continuous or numerical value is needed

The relationship of predictor and response can be expressed in the form of a curve or a mathematical equation

Regression is not suitable when

Data may not fit a linear model

A linear fit may be poor due to noise or outliers.

Data is non-numeric
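The sensitivity to outliers can be seen with a small experiment: fit the same line twice, once on clean data and once after replacing a single value with an outlier. This is only a sketch; the least-squares helper `fit_line` and both data sets are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimate of (c, m) for y = c + m*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return mean_y - m * mean_x, m


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean_ys = [0.0, 2.0, 4.0, 6.0, 8.0]      # exactly y = 2x
noisy_ys = [0.0, 2.0, 4.0, 6.0, 30.0]     # last value replaced by an outlier

c_clean, m_clean = fit_line(xs, clean_ys)
c_noisy, m_noisy = fit_line(xs, noisy_ys)

print("slope without outlier:", m_clean)
print("slope with one outlier:", m_noisy)
```

A single extreme point pulls the fitted slope well away from the trend followed by the other four points, which is the kind of poor linear fit the slide warns about.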

CLASSIFICATION 

Definition

Predicts class membership of data instances

Classes are non-overlapping

Classes are already defined

Basic Steps for Prediction

Model Construction

Model Usage

Example :

• Height-based output follows the division criteria given below:

- 2m ≤ Height: Tall

- 1.7m < Height < 2m: Medium

- Height ≤ 1.7m: Short

• Classify <Pat, F, 1.6> using KNN with K=5.

- The five nearest neighbours are {<Kristina, F, 1.6>, <Kathy, F, 1.6>, <Stephanie, F, 1.7>, <Dave, M, 1.7>, <Wynette, F, 1.75>}.

- Four of the five neighbours are Short and one is Medium, so Pat is Short.
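The example above can be sketched in code. The five named records come from the slide; the two `Extra` records are hypothetical additions so that the neighbour selection has something to discard. Measuring distance on height alone and breaking ties by input order are assumptions of this sketch, as are the function names.

```python
from collections import Counter


def height_class(height_m):
    """Division criteria from the slide."""
    if height_m >= 2.0:
        return "Tall"
    if height_m > 1.7:
        return "Medium"
    return "Short"


def knn_classify(training, query_height, k):
    """Majority vote among the k records closest in height."""
    nearest = sorted(training, key=lambda rec: abs(rec[2] - query_height))[:k]
    votes = Counter(height_class(rec[2]) for rec in nearest)
    return votes.most_common(1)[0][0], nearest


training = [
    ("Kristina", "F", 1.6),
    ("Kathy", "F", 1.6),
    ("Stephanie", "F", 1.7),
    ("Dave", "M", 1.7),
    ("Wynette", "F", 1.75),
    ("Extra1", "M", 1.95),   # hypothetical record, not in the slides
    ("Extra2", "M", 2.1),    # hypothetical record, not in the slides
]

# Classify <Pat, F, 1.6> with K=5.
label, neighbours = knn_classify(training, 1.6, k=5)
print(label, [name for name, _, _ in neighbours])
```

The vote is four Short against one Medium, matching the result stated on the slide.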

Validation Criteria

CLUSTERING

Definition

Grouping of similar data objects

Groups are not predefined

Four Clusters

Clustering Algorithms

Clustering Algorithm

Result Validation

If clusters do not make sense, go back to the prior stage

Check for tendency of clusters in the data set
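The slides do not name a specific clustering algorithm. As one possible illustration, the sketch below uses k-means, a common choice, on made-up two-dimensional points. The function names, the data, and the explicitly supplied starting centroids are all assumptions of this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Made-up data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

No groups are defined in advance: the labels emerge from the data, and if the resulting clusters do not make sense the run can be repeated with a different number of clusters or different starting centroids.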

Selection Criteria

Simplification

Useful in data concept construction

Unsupervised learning

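Validation Criteria

External criteria

Entropy, F-Measure, NMI (normalized mutual information), Purity

Internal criteria

Sum of Squared Error (SSE), BIC (Bayesian Information Criterion), CH (Calinski-Harabasz index), DB (Davies-Bouldin index), SIL (Silhouette coefficient), DUNN (Dunn index)

Relative criteria

Entropy, SSE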
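To make two of these criteria concrete, the sketch below computes SSE, which needs only the clustering itself, and purity, which needs externally known class labels. The points, cluster assignments, centroids and true classes are made up for illustration, and the function names are assumptions.

```python
from collections import Counter


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


def purity(labels, true_classes):
    """Fraction of points that belong to the majority class of their cluster."""
    clusters = {}
    for lab, cls in zip(labels, true_classes):
        clusters.setdefault(lab, []).append(cls)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels)


# Made-up clustering result: two clusters of two points each.
points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (8.0, 10.0)]
labels = [0, 0, 1, 1]
centroids = [(1.0, 2.0), (8.0, 9.0)]
true_classes = ["a", "a", "b", "a"]   # hypothetical ground truth

sse_value = sse(points, labels, centroids)
purity_value = purity(labels, true_classes)
print(sse_value, purity_value)
```

A lower SSE indicates tighter clusters, while purity closer to 1 indicates that clusters line up with the known classes.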

END