
DATA MINING: CONCEPTS AND TECHNIQUES
UNIT-III
Part-I: Classification and Prediction
DATA MINING CSE@HCST

Classification and Prediction (Outline)

What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
*Rule-based classification
Classification by back propagation (neural networks)
*Support Vector Machines (SVM)
*Associative classification
Lazy learners (or learning from your neighbors)
Other classification methods
*Prediction
*Accuracy and error measures
*Ensemble methods
*Model selection
Summary

Objectives (figure slide)

Classification vs. Prediction

Classification:
Predicts categorical class labels (discrete or nominal).
Classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data.

Prediction:
Models continuous-valued functions, i.e., predicts unknown or missing values.

Typical applications: credit approval, document categorization, target marketing, medical diagnosis, treatment effectiveness analysis, fraud detection.

Classification Types (figure slide)

Classification: A Two-Step Process

Model construction: describing a set of predetermined classes.
Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
The set of tuples used for model construction is the training set.
The model is represented as classification rules, decision trees, or mathematical formulae.

Model usage: classifying future or unknown objects.
Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model; the accuracy rate is the percentage of test-set samples that are correctly classified by the model.
The test set is independent of the training set, otherwise over-fitting will occur.
If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.
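A minimal sketch of the two steps in Python with scikit-learn; the iris dataset, the 70/30 split, and the decision-tree learner are illustrative assumptions, not part of the slides:

```python
# Step 1: build a model on a training set.
# Step 2: estimate its accuracy on an independent test set before using it on unknown tuples.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                          # labelled tuples (illustrative dataset)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = DecisionTreeClassifier().fit(X_train, y_train)     # model construction
accuracy = accuracy_score(y_test, model.predict(X_test))   # accuracy on the independent test set
print(f"accuracy = {accuracy:.2f}")
# If the accuracy is acceptable, model.predict() can be applied to tuples
# whose class labels are not known.
```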


Example-1: Model Construction (figure slide)

Example-1: Using the Model in Prediction (figure slide)

Example-2, Process (1): Model Construction

Training data is fed to a classification algorithm, which produces the classifier (model). The learned rule here is:

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Process (2): Using the Model in Prediction

The classifier is first evaluated on testing data, then applied to unseen data.

Testing data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4)  ->  Tenured?
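A small Python sketch of applying the learned rule to the test tuples and the unseen tuple; the rule and tuples come from the slides, the function name is mine:

```python
# Learned classifier from the slide: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
def classify(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

test_data = [("Tom", "Assistant Prof", 2), ("Merlisa", "Associate Prof", 7),
             ("George", "Professor", 5), ("Joseph", "Assistant Prof", 7)]

for name, rank, years in test_data:         # compare predictions with the known labels
    print(name, classify(rank, years))
# 3 of the 4 test tuples are classified correctly (Merlisa, labelled 'no', is predicted 'yes'),
# giving a 75% accuracy estimate on this test set.

print("Jeff ->", classify("Professor", 4))  # unseen data: (Jeff, Professor, 4) -> 'yes'
```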

How Does Classification Work? (figure slide)

Supervised vs. Unsupervised Learning

Supervised learning (classification):
Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
New data is classified based on the training set.

Unsupervised learning (clustering):
The class labels of the training data are unknown.
Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

Issues: Data Preparation

Data cleaning: preprocess data in order to reduce noise and handle missing values.
Relevance analysis (feature selection): remove irrelevant or redundant attributes.
Data transformation: generalize and/or normalize data.

Issues: Evaluating Classification Methods

Accuracy: classifier accuracy (predicting class labels) and predictor accuracy (guessing values of predicted attributes).
Speed: time to construct the model (training time) and time to use the model (classification/prediction time).
Robustness: handling noise and missing values.
Scalability: efficiency on disk-resident databases.
Interpretability: understanding and insight provided by the model.
Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules.

Decision Tree Induction: Training Dataset

age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31...40  high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31...40  low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31...40  medium  no       excellent      yes
31...40  high    yes      fair           yes
>40      medium  no       excellent      no

This follows an example of Quinlan's ID3 (Playing Tennis).

Decision Tree (figure slides)

Decision Tree Induction (figure slides)
Decision Tree Boundary (figure slide)

Attribute Selection Measure: Information Gain (ID3/C4.5)

Select the attribute with the highest information gain.
Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D|.

Expected information (entropy) needed to classify a tuple in D:
Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

Information needed (after using A to split D into v partitions) to classify D:
Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)

Information gained by branching on attribute A:
Gain(A) = Info(D) - Info_A(D)
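A small Python sketch of these formulas applied to the buys_computer training data above; the helper names are mine, the counts (9 'yes' / 5 'no', the age partitions) come from the table:

```python
from math import log2
from collections import Counter

def info(labels):
    """Expected information (entropy): Info(D) = -sum_i p_i * log2(p_i)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """Gain(A) = Info(D) - sum_j |D_j|/|D| * Info(D_j), splitting on one attribute."""
    total = len(labels)
    partitions = {}
    for value, label in zip(attr_values, labels):
        partitions.setdefault(value, []).append(label)
    info_a = sum(len(part) / total * info(part) for part in partitions.values())
    return info(labels) - info_a

# buys_computer column of the 14-tuple training set: 9 "yes", 5 "no"
labels = ["no","no","yes","yes","yes","no","yes","no","yes","yes","yes","yes","yes","no"]
ages   = ["<=30","<=30","31...40",">40",">40",">40","31...40",
          "<=30","<=30",">40","<=30","31...40","31...40",">40"]

print(round(info(labels), 3))            # Info(D)   = 0.940
print(round(info_gain(ages, labels), 3)) # Gain(age) = 0.246
```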


Gain Ratio for Attribute Selection (C4.5) (figure slide)
Gini Index (CART, IBM IntelligentMiner) (figure slide)
Comparisons of Attribute Selection Measures (figure slide)

Other Attribute Selection Measures

CHAID: a popular decision tree algorithm; its measure is based on the χ2 test for independence.
C-SEP: performs better than information gain and the Gini index in certain cases.
G-statistic: has a close approximation to the χ2 distribution.
MDL (Minimal Description Length) principle (i.e., the simplest solution is preferred): the best tree is the one that requires the fewest bits to both (1) encode the tree and (2) encode the exceptions to the tree.
Multivariate splits (partitions based on combinations of multiple variables): CART finds multivariate splits based on a linear combination of attributes.

Which attribute selection measure is the best? Most give good results; none is significantly superior to the others.

Decision Tree Induction [IMPORTANT] (figure slides)
EXAMPLE: Decision Tree Induction (figure slides)
EXAMPLE: Calculating Gain Ratio (figure slide)
Gini Index (figure slide)
Calculating Gini Index (figure slide)

Overfitting and Tree Pruning

Overfitting: an induced tree may overfit the training data.
Too many branches, some of which may reflect anomalies due to noise or outliers.
Poor accuracy for unseen samples.

Two approaches to avoid overfitting:
Prepruning: halt tree construction early; do not split a node if doing so would make the goodness measure fall below a threshold. It is difficult to choose an appropriate threshold.
Postpruning: remove branches from a "fully grown" tree to get a sequence of progressively pruned trees. Use a set of data different from the training data to decide which is the "best pruned tree".


Enhancements to Basic Decision Tree Induction

Allow for continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals.
Handle missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values.
Attribute construction: create new attributes based on existing ones that are sparsely represented. This reduces fragmentation, repetition, and replication.

Bayesian Classification: Why?

A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities.
Foundation: based on Bayes' theorem.
Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers.
Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data.
Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured.

Bayes' Theorem: Basics

Total probability theorem:  P(B) = \sum_{i=1}^{M} P(B | A_i) P(A_i)

Bayes' theorem:  P(H | X) = P(X | H) P(H) / P(X)

Let X be a data sample ("evidence") whose class label is unknown, and let H be the hypothesis that X belongs to class C.
Classification is to determine P(H|X), the posterior probability that the hypothesis holds given the observed data sample X.
P(H) (prior probability): the initial probability, e.g., that X will buy a computer, regardless of age, income, etc.
P(X): the probability that the sample data is observed.
P(X|H) (likelihood): the probability of observing the sample X given that the hypothesis holds, e.g., given that X will buy a computer, the probability that X is 31...40 with medium income.

Prediction Based on Bayes' Theorem

Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

P(H | X) = P(X | H) P(H) / P(X)

Informally: posterior = likelihood x prior / evidence.

Predict that X belongs to C_i iff the probability P(C_i|X) is the highest among all the P(C_k|X) for the k classes.
Practical difficulty: it requires initial knowledge of many probabilities, involving significant computational cost.

Classification Is to Derive the Maximum Posteriori

Let D be a training set of tuples and their associated class labels, with each tuple represented by an n-dimensional attribute vector X = (x1, x2, ..., xn).
Suppose there are m classes C1, C2, ..., Cm.
Classification is to derive the maximum posteriori, i.e., the maximal P(C_i|X). From Bayes' theorem:

P(C_i | X) = P(X | C_i) P(C_i) / P(X)

Since P(X) is constant for all classes, only P(X | C_i) P(C_i) needs to be maximized.

Naïve Bayes Classifier

A simplifying assumption: attributes are conditionally independent given the class (i.e., no dependence relation between attributes):

P(X | C_i) = \prod_{k=1}^{n} P(x_k | C_i) = P(x_1 | C_i) \times P(x_2 | C_i) \times ... \times P(x_n | C_i)

This greatly reduces the computation cost: only the class distribution needs to be counted.
If A_k is categorical, P(x_k|C_i) is the number of tuples in C_i having value x_k for A_k, divided by |C_{i,D}| (the number of tuples of C_i in D).
If A_k is continuous-valued, P(x_k|C_i) is usually computed from a Gaussian distribution with mean μ and standard deviation σ:

g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}

and P(x_k | C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i}).

Naïve Bayes Classifier: Training Dataset

Classes: C1: buys_computer = 'yes';  C2: buys_computer = 'no'.
Data to be classified: X = (age <= 30, income = medium, student = yes, credit_rating = fair).
The training data are the 14 tuples shown in the Decision Tree Induction training dataset above.

Naïve Bayes Classifier: An Example

Priors, P(C_i):
P(buys_computer = "yes") = 9/14 = 0.643
P(buys_computer = "no")  = 5/14 = 0.357

Compute P(X|C_i) for each class:
P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4

X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X | buys_computer = "no")  = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X | buys_computer = "yes") * P(buys_computer = "yes") = 0.028
P(X | buys_computer = "no")  * P(buys_computer = "no")  = 0.007

Therefore, X belongs to the class buys_computer = "yes".
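A minimal Python sketch that reproduces these numbers directly from the training table by counting (the data layout and variable names are mine):

```python
from collections import Counter

# (age, income, student, credit_rating, buys_computer): the 14 training tuples
data = [
    ("<=30","high","no","fair","no"),          ("<=30","high","no","excellent","no"),
    ("31...40","high","no","fair","yes"),      (">40","medium","no","fair","yes"),
    (">40","low","yes","fair","yes"),          (">40","low","yes","excellent","no"),
    ("31...40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
    ("<=30","low","yes","fair","yes"),         (">40","medium","yes","fair","yes"),
    ("<=30","medium","yes","excellent","yes"), ("31...40","medium","no","excellent","yes"),
    ("31...40","high","yes","fair","yes"),     (">40","medium","no","excellent","no"),
]
X = ("<=30", "medium", "yes", "fair")          # tuple to classify

class_counts = Counter(row[-1] for row in data)
scores = {}
for c, n_c in class_counts.items():
    # P(X|Ci) = product over attributes of P(x_k|Ci), then multiply by the prior P(Ci)
    likelihood = 1.0
    for k, value in enumerate(X):
        n_match = sum(1 for row in data if row[-1] == c and row[k] == value)
        likelihood *= n_match / n_c
    scores[c] = likelihood * (n_c / len(data))

print(scores)                       # {'no': ~0.007, 'yes': ~0.028}
print(max(scores, key=scores.get))  # 'yes' -> X buys a computer
```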

Avoiding the Zero-Probability Problem

Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise the predicted probability

P(X | C_i) = \prod_{k=1}^{n} P(x_k | C_i)

will be zero.
Example: suppose a dataset with 1000 tuples has income = low (0 tuples), income = medium (990), and income = high (10).
Use the Laplacian correction (Laplacian estimator): add 1 to each case.
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
The "corrected" probability estimates are close to their "uncorrected" counterparts.
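A one-line Python sketch of the Laplacian correction for the income example above (add one count per attribute value, hence the denominator 1000 + 3 = 1003):

```python
counts = {"low": 0, "medium": 990, "high": 10}   # income counts among 1000 tuples
total = sum(counts.values())
corrected = {v: (c + 1) / (total + len(counts)) for v, c in counts.items()}
print(corrected)   # low: 1/1003, medium: 991/1003, high: 11/1003
```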

Naïve Bayes Classifier: Comments

Advantages:
Easy to implement. Good results obtained in most cases.

Disadvantages:
The class conditional independence assumption causes a loss of accuracy, because in practice dependencies exist among variables.
E.g., in hospital data a patient's profile (age, family history, etc.), symptoms (fever, cough, etc.), and disease (lung cancer, diabetes, etc.) are not independent.
Such dependencies cannot be modeled by the naïve Bayes classifier.
How to deal with these dependencies? Bayesian belief networks.

Classification by Backpropagation

Backpropagation: a neural network learning algorithm.
Started by psychologists and neurobiologists to develop and test computational analogues of neurons.
A neural network: a set of connected input/output units where each connection has a weight associated with it.
During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples.
Also referred to as connectionist learning because of the connections between units.

Neural Network as a Classifier

Weaknesses:
Long training time.
Requires a number of parameters that are typically best determined empirically, e.g., the network topology or "structure".
Poor interpretability: it is difficult to interpret the symbolic meaning behind the learned weights and the "hidden units" in the network.

Strengths:
High tolerance to noisy data.
Ability to classify untrained patterns.
Well suited for continuous-valued inputs and outputs.
Successful on a wide array of real-world data.
Algorithms are inherently parallel.
Techniques have recently been developed for extracting rules from trained neural networks.

A Neuron (= a Perceptron)

The n-dimensional input vector x is mapped into the output y by means of the scalar product with the weight vector w and a nonlinear activation function:

y = sign\left(\sum_{i=0}^{n} w_i x_i - \mu_k\right)

(Figure: inputs x_0, x_1, ..., x_n with weights w_0, w_1, ..., w_n feed a weighted sum, which passes through the activation function f, with bias/threshold \mu_k, to produce the output y.)
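A minimal Python sketch of this weighted sum with a sign activation; the particular weights, inputs, and bias value are made up for illustration:

```python
def perceptron(x, w, bias):
    """y = sign(sum_i w_i * x_i - bias); 'bias' plays the role of the threshold mu_k."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - bias
    return 1 if s >= 0 else -1

x = [0.5, -1.0, 2.0]   # input vector (illustrative)
w = [0.4, 0.3, 0.9]    # weight vector (illustrative)
print(perceptron(x, w, bias=1.0))   # -> 1, since 0.2 - 0.3 + 1.8 - 1.0 = 0.7 >= 0
```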

A Multi-Layer Feed-Forward Neural Network

The input vector X feeds the input layer; weighted connections w_{ij} feed the hidden layer; the hidden layer feeds the output layer, which emits the output vector.

Net input to unit j:        I_j = \sum_i w_{ij} O_i + \theta_j
Output of unit j:           O_j = \frac{1}{1 + e^{-I_j}}
Error of an output unit:    Err_j = O_j (1 - O_j)(T_j - O_j)
Error of a hidden unit:     Err_j = O_j (1 - O_j) \sum_k Err_k w_{jk}
Weight update:              w_{ij} = w_{ij} + (l)\, Err_j O_i
Bias update:                \theta_j = \theta_j + (l)\, Err_j

where l is the learning rate and T_j is the known target value.

How Does a Multi-Layer Neural Network Work?

The inputs to the network correspond to the attributes measured for each training tuple.
Inputs are fed simultaneously into the units making up the input layer.
They are then weighted and fed simultaneously to a hidden layer.
The number of hidden layers is arbitrary, although usually only one is used.
The weighted outputs of the last hidden layer are input to the units making up the output layer, which emits the network's prediction.
The network is feed-forward in that none of the weights cycles back to an input unit or to an output unit of a previous layer.
From a statistical point of view, networks perform nonlinear regression: given enough hidden units and enough training samples, they can closely approximate any function.

Defining a Network Topology

First decide the network topology: the number of units in the input layer, the number of hidden layers (if more than one), the number of units in each hidden layer, and the number of units in the output layer.
Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0].
Use one input unit per domain value, each initialized to 0.
For classification with more than two classes, use one output unit per class.
Once a network has been trained, if its accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights.

Backpropagation

Iteratively process a set of training tuples and compare the network's prediction with the actual known target value.
For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value.
Modifications are made in the "backwards" direction: from the output layer, through each hidden layer, down to the first hidden layer; hence "backpropagation".

Steps:
Initialize weights (to small random numbers) and biases in the network.
Propagate the inputs forward (by applying the activation function).
Backpropagate the error (by updating weights and biases).
Check the terminating condition (e.g., when the error is very small).
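A compact Python/NumPy sketch of these steps for a single hidden layer, using the update rules from the network slide above. The tiny XOR-style dataset, the 2-4-1 layer sizes, the learning rate, and the epoch count are assumptions for illustration, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Tiny illustrative dataset (XOR): 2 inputs -> 1 output
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)   # input -> hidden weights/biases
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)   # hidden -> output weights/biases
l = 0.5                                                      # learning rate

for epoch in range(5000):
    for x, t in zip(X, T):
        # Propagate the inputs forward
        O1 = sigmoid(x @ W1 + b1)                        # hidden-layer outputs
        O2 = sigmoid(O1 @ W2 + b2)                       # output-layer outputs
        # Backpropagate the error
        err2 = O2 * (1 - O2) * (t - O2)                  # Err_j for output units
        err1 = O1 * (1 - O1) * (W2 @ err2)               # Err_j for hidden units
        # Update weights and biases: w_ij += l*Err_j*O_i, theta_j += l*Err_j
        W2 += l * np.outer(O1, err2); b2 += l * err2
        W1 += l * np.outer(x, err1);  b1 += l * err1

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))  # should approach [[0],[1],[1],[0]]
```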

Multilayer Neural Network (figure slides)

Lazy vs. Eager Learning

Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple.
Eager learning (the methods discussed above): given a training set, constructs a classification model before receiving new (e.g., test) data to classify.
Lazy learning takes less time in training but more time in predicting.
Accuracy: a lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function; an eager method must commit to a single hypothesis that covers the entire instance space.

Lazy Learner: Instance-Based Methods

Instance-based learning: store training examples and delay the processing ("lazy evaluation") until a new instance must be classified.
Typical approaches:
k-nearest neighbor approach: instances are represented as points in a Euclidean space.
Locally weighted regression: constructs a local approximation.
Case-based reasoning: uses symbolic representations and knowledge-based inference.

The k-Nearest Neighbor Algorithm

All instances correspond to points in the n-dimensional space.
The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2).
The target function can be discrete- or real-valued.
For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to x_q.
Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples.
(Figure: a query point x_q surrounded by + and - training examples.)

Page 81: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST81

Page 82: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST82

Page 83: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST83

Page 84: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST84

Page 85: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST85

Page 86: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST86

Page 87: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST87

Page 88: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST88

Page 89: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST89

Page 90: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST90

Page 91: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST91

Page 92: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST92

Page 93: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST93

Page 94: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST94

Page 95: DATA MINING: CONCEPTS AND TECHNIQUES UNIT-III Part-I Classification and Predictions September 10, 2015 DATA MINING CSE@HCST 1

April 21, 2023DATA MINING CSE@HCST95

Example: k-Nearest Neighbor Classifier

Customer  Age  Income  No. credit cards  Response
John      35   35K     3                 No
Rachel    22   50K     2                 Yes
Hannah    63   200K    1                 No
Tom       59   170K    1                 No
Nellie    25   40K     4                 Yes
David     37   50K     2                 ?

Example: k-Nearest Neighbor Classifier (continued)

Distance from David (income in thousands):
John:   sqrt[(35-37)^2 + (35-50)^2 + (3-2)^2] = 15.16
Rachel: sqrt[(22-37)^2 + (50-50)^2 + (2-2)^2] = 15
Hannah: sqrt[(63-37)^2 + (200-50)^2 + (1-2)^2] = 152.23
Tom:    sqrt[(59-37)^2 + (170-50)^2 + (1-2)^2] = 122
Nellie: sqrt[(25-37)^2 + (40-50)^2 + (4-2)^2] = 15.74

With k = 3, the nearest neighbors are Rachel (Yes), John (No) and Nellie (Yes), so David's predicted response is Yes.
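A small Python sketch that reproduces these distances and the majority vote for David (income expressed in thousands as in the table; k = 3 is an assumption consistent with the slide's answer):

```python
from math import sqrt

# (name, age, income in K, no. of credit cards, response)
training = [("John", 35, 35, 3, "No"), ("Rachel", 22, 50, 2, "Yes"),
            ("Hannah", 63, 200, 1, "No"), ("Tom", 59, 170, 1, "No"),
            ("Nellie", 25, 40, 4, "Yes")]
david = (37, 50, 2)

def distance(p, q):
    """Euclidean distance between two attribute vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

neighbours = sorted(training, key=lambda r: distance(r[1:4], david))
for name, *features, response in neighbours:
    print(f"{name:7s} {distance(features, david):7.2f} {response}")

k = 3
votes = [r[-1] for r in neighbours[:k]]                      # Rachel, John, Nellie
print("Prediction for David:", max(set(votes), key=votes.count))   # -> Yes
```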

Genetic Algorithms (GA-Part-I)

A genetic algorithm is based on an analogy to biological evolution.
An initial population is created consisting of randomly generated rules.
Each rule is represented by a string of bits; e.g., "IF A1 AND NOT A2 THEN C2" can be encoded as 100. If an attribute has k > 2 values, k bits can be used.
Based on the notion of survival of the fittest, a new population is formed consisting of the fittest rules and their offspring.
The fitness of a rule is represented by its classification accuracy on a set of training examples.
Offspring are generated by crossover and mutation.
The process continues until a population P evolves in which each rule in P satisfies a prespecified fitness threshold.
Slow, but easily parallelizable.
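A minimal Python sketch of this bit-string rule encoding and fitness evaluation. The mapping of the third bit to the class (0 meaning C2), the restriction of accuracy to the tuples the rule covers, and the tiny dataset are my own illustrative assumptions:

```python
# Each rule is a bit string: bit 0 = A1 required true, bit 1 = A2 required true, bit 2 = class.
# "IF A1 AND NOT A2 THEN C2" -> "100" (A1 = 1, A2 = 0, class bit 0 taken to mean C2).
def rule_matches(rule, a1, a2):
    return a1 == int(rule[0]) and a2 == int(rule[1])

def predicted_class(rule):
    return "C1" if rule[2] == "1" else "C2"

def fitness(rule, examples):
    """Classification accuracy of the rule on the training examples it covers."""
    covered = [(a1, a2, c) for a1, a2, c in examples if rule_matches(rule, a1, a2)]
    if not covered:
        return 0.0
    return sum(c == predicted_class(rule) for _, _, c in covered) / len(covered)

examples = [(1, 0, "C2"), (1, 0, "C2"), (1, 1, "C1"), (0, 1, "C1")]  # illustrative data
print(fitness("100", examples))   # 1.0: the rule classifies its covered tuples correctly
```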


Genetic Algorithms (GA)

To use a genetic algorithm, you must encode solutions to your problem in a structure that can be stored in the computer. This object is a genome (or chromosome).
The genetic algorithm creates a population of genomes, then applies crossover and mutation to the individuals in the population to generate new individuals.
It uses various selection criteria to pick the best individuals for mating (and subsequent crossover).
The objective function determines how 'good' each individual is.

The genetic algorithm is very simple, yet it performs well on many different types of problems.
There are many ways to modify the basic algorithm, and many parameters that can be 'tweaked'.
Basically, if you get the objective function, the representation, and the operators right, then variations on the genetic algorithm and its parameters will result in only minor improvements.

Representation

You can use any representation for the individual genomes in the genetic algorithm. Holland worked primarily with strings of bits, but you can use arrays, trees, lists, or any other object.
You must define genetic operators (initialization, mutation, crossover, comparison) for whatever representation you decide to use.
Each individual must represent a complete solution to the problem you are trying to optimize.


Mutation operators (trees)

These are some sample tree mutation operators (figure).
You can use more than one operator during an evolution.
The mutation operator introduces a certain amount of randomness into the search. It can help the search find solutions that crossover alone might not encounter.


Crossover operators

These are some sample tree crossover operators (figure).
Typically, crossover is defined so that two individuals (the parents) combine to produce two more individuals (the children). But you can define asexual crossover or single-child crossover as well.
The primary purpose of the crossover operator is to pass genetic material from the previous generation to the subsequent generation.


Mutation operators (lists)

These are some sample list mutation operators (figure). Lists may be fixed or variable length. Also common are order-based lists, in which the sequence is important and nodes cannot be duplicated during the genetic operations.
You can use more than one operator during an evolution.
The mutation operator introduces a certain amount of randomness into the search and can help the search find solutions that crossover alone might not encounter.


Genetic Algorithms (GA)

Two of the most common genetic algorithm implementations are 'simple' and 'steady state'.
The simple GA is a generational algorithm in which the entire population is replaced each generation.
The steady-state genetic algorithm is used by the Genitor program. In this algorithm, only a few individuals are replaced each 'generation'. This type of replacement is often referred to as overlapping populations.

Source: http://lancet.mit.edu/mbwall/presentations

Genetic Algorithm

Outline

Introduction to Genetic Algorithms (GA)
GA components: representation, recombination, mutation, parent selection, survivor selection
Example

Introduction to GA (1)

(Figure: taxonomy of search techniques.) Search techniques include calculus-based techniques (e.g., Fibonacci search), enumerative techniques (BFS, DFS, dynamic programming), and guided random search techniques (tabu search, hill climbing, simulated annealing, evolutionary algorithms). Genetic programming and genetic algorithms are evolutionary algorithms.

Introduction to GA (2)

"Genetic Algorithms are good at taking large, potentially huge search spaces and navigating them, looking for optimal combinations of things, solutions you might not otherwise find in a lifetime." (Salvatore Mangano, Computer Design, May 1995)

Originally developed by John Holland (1975).
The genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution.
It uses the concepts of "natural selection" and "genetic inheritance" (Darwin, 1859).

Use of GA

Widely used in business, science, and engineering.
Optimization and search problems.
Scheduling and timetabling.

Let's Learn Biology (1)

Our body is made up of trillions of cells. Each cell has a core structure (nucleus) that contains your chromosomes.
Each chromosome is made up of tightly coiled strands of deoxyribonucleic acid (DNA). Genes are segments of DNA that determine specific traits, such as eye or hair color. You have more than 20,000 genes.
A gene mutation is an alteration in your DNA. It can be inherited or acquired during your lifetime, as cells age or are exposed to certain chemicals. Some changes in your genes result in genetic disorders.

Let's Learn Biology (2) and (3) (figure slides)
Source: http://www.riversideonline.com/health_reference/Tools/DS00549.cfm

Let's Learn Biology (4)

Natural selection, from Darwin's theory of evolution: only the organisms best adapted to their environment tend to survive and transmit their genetic characteristics in increasing numbers to succeeding generations, while those less adapted tend to be eliminated.

Source: http://www.bbc.co.uk/programmes/p0022nyy

GA Is Inspired by Nature

A genetic algorithm maintains a population of candidate solutions for the problem at hand, and makes it evolve by iteratively applying a set of stochastic operators.

Nature vs. GA

The computer model introduces simplifications relative to the real biological mechanisms, BUT surprisingly complex and interesting structures have emerged out of evolutionary algorithms.

High-level Algorithm

produce an initial population of individuals
evaluate the fitness of all individuals
while termination condition not met do
    select fitter individuals for reproduction
    recombine between individuals
    mutate individuals
    evaluate the fitness of the modified individuals
    generate a new population
end while

GA Components (figure slide)
Source: http://www.engineering.lancs.ac.uk

GA Components with an Example

The MAXONE problem: suppose we want to maximize the number of ones in a string of L binary digits.
It may seem trivial because we know the answer in advance.
However, we can think of it as maximizing the number of correct answers, each encoded by 1, to L difficult yes/no questions.

GA Components: Representation (Encoding)

An individual is encoded (naturally) as a string of L binary digits.
Let's say L = 10. Then 1 = 0000000001 (10 bits).


Initial Population

We start with a population of n random strings. Suppose that L = 10 and n = 6.
We toss a fair coin 60 times and get the following initial population:

s1 = 1111010101
s2 = 0111000101
s3 = 1110110101
s4 = 0100010011
s5 = 1110111101
s6 = 0100110000


Fitness Function: f()

The fitness of a string is the number of ones it contains:

s1 = 1111010101   f(s1) = 7
s2 = 0111000101   f(s2) = 5
s3 = 1110110101   f(s3) = 7
s4 = 0100010011   f(s4) = 4
s5 = 1110111101   f(s5) = 8
s6 = 0100110000   f(s6) = 3

Total fitness = 34


Selection (1)

Next we apply fitness-proportionate selection with the roulette-wheel method: individual i is chosen with probability

f(i) / \sum_j f(j)

(roulette wheel: each individual's slice of the wheel is proportional to its fitness value).
We repeat the extraction as many times as the number of individuals we need, to keep the same parent population size (6 in our case).
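A small Python sketch of this fitness-proportionate draw; random.choices performs the weighted selection, and the seed is only there to make a run repeatable:

```python
import random

population = {"s1": 7, "s2": 5, "s3": 7, "s4": 4, "s5": 8, "s6": 3}   # individual -> fitness
total = sum(population.values())                                       # 34

random.seed(42)
# Each individual i is drawn with probability f(i) / sum_j f(j); repeat 6 times.
parents = random.choices(list(population), weights=list(population.values()),
                         k=len(population))
print(parents)   # e.g. ['s5', 's1', ...]: fitter strings tend to appear more often
```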

Selection (2)

Suppose that, after performing selection, we get the following population:

s1` = 1111010101 (s1)
s2` = 1110110101 (s3)
s3` = 1110111101 (s5)
s4` = 0111000101 (s2)
s5` = 0100010011 (s4)
s6` = 1110111101 (s5)


Recombination (1)

Recombination is also known as crossover.
For each couple, we decide according to a crossover probability (for instance 0.6) whether to actually perform crossover or not.
Suppose that we decide to actually perform crossover only for the couples (s1`, s2`) and (s5`, s6`).
For each couple, we randomly extract a crossover point, for instance 2 for the first and 5 for the second.

Recombination (2) (figure slide: the two crossovers, cut after bits 2 and 5)
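A Python sketch of the one-point crossover performed here: (s1`, s2`) are cut after bit 2 and (s5`, s6`) after bit 5, which yields exactly the s`` strings shown on the next slide (the function name is mine):

```python
def crossover(parent1, parent2, point):
    """Swap the tails of two bit strings after the given crossover point."""
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

s1c, s2c = crossover("1111010101", "1110110101", point=2)   # (s1`, s2`) cut after bit 2
s5c, s6c = crossover("0100010011", "1110111101", point=5)   # (s5`, s6`) cut after bit 5
print(s1c, s2c)   # 1110110101 1111010101
print(s5c, s6c)   # 0100011101 1110110011
```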


Mutation (1)

Before applying mutation:
s1`` = 1110110101
s2`` = 1111010101
s3`` = 1110111101
s4`` = 0111000101
s5`` = 0100011101
s6`` = 1110110011

After applying mutation:
s1``` = 1110100101
s2``` = 1111110100
s3``` = 1110101111
s4``` = 0111000101
s5``` = 0100011101
s6``` = 1110110001

Mutation (2)

The final step is to apply random mutation: for each bit that we copy to the new population, we allow a small probability of error (for instance 0.1).
Mutation causes movement in the search space (local or global) and restores lost information to the population.
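A Python sketch of this per-bit mutation step, flipping each copied bit with probability 0.1 as on the slide; because the flips are random, a given run will generally not reproduce the exact s``` strings listed above:

```python
import random

def mutate(bits, p_flip=0.1):
    """Flip each bit independently with probability p_flip."""
    return "".join(b if random.random() > p_flip else str(1 - int(b)) for b in bits)

random.seed(7)
for s in ["1110110101", "1111010101", "1110111101", "0111000101", "0100011101", "1110110011"]:
    print(mutate(s))
```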


Fitness of the New Population

After applying mutation:
s1``` = 1110100101   f(s1```) = 6
s2``` = 1111110100   f(s2```) = 7
s3``` = 1110101111   f(s3```) = 8
s4``` = 0111000101   f(s4```) = 5
s5``` = 0100011101   f(s5```) = 5
s6``` = 1110110001   f(s6```) = 6

Total fitness = 37

Example (End)

In one generation, the total population fitness changed from 34 to 37, an improvement of about 9%.
At this point, we go through the same process all over again, until a stopping criterion is met.
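Putting the pieces together, a compact Python sketch of a whole MAXONE run with L = 10, n = 6, crossover probability 0.6, and mutation probability 0.1 as in the example; the 50-generation limit and the seed are my own choices, and the operators are the ones sketched earlier, repeated so the block is self-contained:

```python
import random

L, N, P_CROSS, P_MUT, GENERATIONS = 10, 6, 0.6, 0.1, 50

fitness = lambda s: s.count("1")                 # MAXONE: number of ones in the string

def select(pop):                                 # fitness-proportionate (roulette-wheel) selection
    return random.choices(pop, weights=[fitness(s) for s in pop], k=len(pop))

def crossover(p1, p2):
    if random.random() < P_CROSS:
        point = random.randint(1, L - 1)
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1, p2

def mutate(s):
    return "".join(b if random.random() > P_MUT else str(1 - int(b)) for b in s)

random.seed(0)
population = ["".join(random.choice("01") for _ in range(L)) for _ in range(N)]
for gen in range(GENERATIONS):
    parents = select(population)
    offspring = []
    for i in range(0, N, 2):                     # pair up parents and produce children
        c1, c2 = crossover(parents[i], parents[i + 1])
        offspring += [mutate(c1), mutate(c2)]
    population = offspring
    print(gen, sum(map(fitness, population)), max(population, key=fitness))
```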

Distribution of Individuals (figure slide: distribution of individuals in generation 0 vs. generation N)

Issues

Choosing basic implementation options: representation; population size, mutation rate, etc.; selection and deletion policies; crossover and mutation operators.
Termination criteria.
Performance and scalability.
The solution is only as good as the evaluation function (often the hardest part).

When to Use a GA

Alternative solutions are too slow or overly complicated.
You need an exploratory tool to examine new approaches.
The problem is similar to one that has already been successfully solved by using a GA.
You want to hybridize with an existing solution.
The benefits of GA technology meet key problem requirements.

Conclusion

Genetic algorithms are inspired by nature, have many areas of application, and are powerful.


END